49 changes: 48 additions & 1 deletion src/explanation/whats-new-22.md
@@ -1,6 +1,6 @@
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph.

> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.

@@ -201,9 +201,56 @@ class MyTable(dj.Manual):

Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.

## Graph-Driven Diagram Operations

DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.

### From Visualization to Operations

In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed.

In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios.

### The Preview-Then-Execute Pattern

The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute:

```python
# Build the dependency graph
diag = dj.Diagram(schema)

# Apply cascade restriction — nothing is deleted yet
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute only after reviewing the blast radius
restricted.delete(prompt=False)
```

This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.

### Two Propagation Modes

The diagram supports two restriction propagation modes with different convergence semantics:

**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, a child row is affected if *any* parent path reaches it. This is the right semantics for delete: if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can be called only once, and only on a diagram that has no restrictions yet.

**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables.

The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics.
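
The difference between the two convergence rules can be sketched with plain Python sets. This is a toy model, not DataJoint code: the helper names `converge_or` and `converge_and` and the row keys are made up for illustration, and each set stands for the child rows that one restricted ancestor reaches.

```python
def converge_or(reached_by_parent):
    """cascade()-style: a row is affected if ANY restricted ancestor reaches it."""
    return set().union(*reached_by_parent.values())


def converge_and(reached_by_parent):
    """restrict()-style: a row is kept only if ALL restricted ancestors reach it."""
    sets = list(reached_by_parent.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical child-row keys reached from two restricted ancestors.
reached = {
    "Session": {1, 2, 3},
    "Stimulus": {3, 4},
}

print(sorted(converge_or(reached)))   # [1, 2, 3, 4]
print(sorted(converge_and(reached)))  # [3]
```

Under OR the affected set can only grow as more ancestors are restricted, while under AND it can only shrink, which is one way to see why mixing the two on a single diagram would be ambiguous.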

### Architecture

`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed.

## See Also

- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md/) — Connection setup
- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations
- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide
35 changes: 35 additions & 0 deletions src/how-to/delete-data.md
@@ -189,8 +189,43 @@ count = (Subject & restriction).delete(prompt=False)
print(f"Deleted {count} subjects")
```

## Diagram-Level Delete

!!! version-added "New in 2.2"
    Diagram-level delete was added in DataJoint 2.2.

For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing.

### Build, Preview, Execute

```python
import datajoint as dj

# 1. Build the dependency graph
diag = dj.Diagram(schema)

# 2. Apply cascade restriction (nothing deleted yet)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# 3. Preview: see affected tables and row counts
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# 4. Execute only after reviewing
restricted.delete(prompt=False)
```

### When to Use

- **Preview blast radius**: Understand what a cascade delete will affect before committing
- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows

For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram-level API is for when you need more visibility or control.
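
As a rough illustration of the "programmatic control" case, the sketch below gates an unattended delete on the preview result. It operates on a plain dict shaped like the `preview()` output shown above; the function `within_budget` and its `max_total_rows` and `protected` parameters are hypothetical policy knobs, not part of DataJoint.

```python
def within_budget(counts, max_total_rows=1000, protected=()):
    """Return True if a preview result is small enough to delete unattended.

    counts: dict mapping full table names to affected row counts.
    max_total_rows, protected: hypothetical policy settings for this sketch.
    """
    touches_protected = any(counts.get(name, 0) > 0 for name in protected)
    return not touches_protected and sum(counts.values()) <= max_total_rows


# Same shape as the preview() output in the example above (93 rows in total).
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

print(within_budget(counts))                               # True
print(within_budget(counts, max_total_rows=50))            # False
print(within_budget(counts, protected=["`lab`.`trial`"]))  # False
```

In an automated workflow, a check like this would sit between `restricted.preview()` and `restricted.delete(prompt=False)`, so the delete runs only when the check passes.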

## See Also

- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations
- [Master-Part Tables](master-part.ipynb) — Compositional data patterns
- [Model Relationships](model-relationships.ipynb) — Foreign key patterns
- [Insert Data](insert-data.md) — Adding data to tables
3 changes: 3 additions & 0 deletions src/reference/specs/data-manipulation.md
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
2. Recursively delete matching rows in child tables
3. Delete rows in target table

!!! version-added "New in 2.2"
    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged: the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).

### 4.3 Basic Usage

```python
156 changes: 154 additions & 2 deletions src/reference/specs/diagram.md
@@ -117,6 +117,153 @@ dj.Diagram(Subject) + dj.Diagram(analysis).collapse()

---

## Operational Methods

!!! version-added "New in 2.2"
    Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`) were added in DataJoint 2.2.

Diagrams can propagate restrictions through the dependency graph and execute data operations (delete, drop) using the graph structure. These methods turn Diagram from a visualization tool into an operational component.

### `cascade()`

```python
diag.cascade(table_expr, part_integrity="enforce")
```

Apply a cascade restriction and propagate it downstream through the dependency graph. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `table_expr` | QueryExpression | — | A restricted table expression (e.g., `Session & 'subject_id=1'`) |
| `part_integrity` | str | `"enforce"` | Master-part integrity policy |

**Returns:** New `Diagram` with cascade restrictions applied.

**Constraints:**

- `cascade()` can only be called **once** on an unrestricted Diagram
- Cannot be mixed with `restrict()` — the two modes are mutually exclusive
- `table_expr.full_table_name` must be a node in the diagram

**`part_integrity` values:**

| Value | Behavior |
|-------|----------|
| `"enforce"` | Error if parts would be deleted before masters |
| `"ignore"` | Allow deleting parts without masters |
| `"cascade"` | Also delete masters when parts are deleted |

```python
# Build a cascade from a restricted table
diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})
```

### `restrict()`

```python
diag.restrict(table_expr)
```

Apply a restrict condition and propagate it downstream. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `table_expr` | QueryExpression | — | A restricted table expression |

**Returns:** New `Diagram` with restrict conditions applied.

**Constraints:**

- Cannot be called on a cascade-restricted Diagram (mutually exclusive with `cascade()`)
- `table_expr.full_table_name` must be a node in the diagram
- **Can be chained** — call `restrict()` multiple times to add conditions from different tables

```python
# Chain multiple restrictions (AND semantics)
diag = dj.Diagram(schema)
restricted = (
    diag
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"')
)
```

### `delete()`

```python
diag.delete(transaction=True, prompt=None)
```

Execute a cascading delete using previously applied cascade restrictions. Tables are deleted in reverse topological order (leaves first) to maintain referential integrity.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `transaction` | bool | `True` | Wrap in atomic transaction |
| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` |

**Returns:** Number of rows deleted from the root table.

**Requires:** `cascade()` must be called first.

```python
diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})
restricted.preview() # inspect what will be deleted
restricted.delete() # execute the delete
```
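
The "leaves first" ordering described above can be illustrated without a database. The sketch below uses the standard library's `graphlib` in place of `networkx` purely so that it is self-contained; the table names and the `parents` mapping are hypothetical and do not reflect DataJoint internals.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each table maps to the tables it references
# through foreign keys (its parents).
parents = {
    "session": set(),
    "trial": {"session"},
    "processed_data": {"trial"},
}

# static_order() yields parents before children (topological order).
topological = list(TopologicalSorter(parents).static_order())

# Reversing it gives leaves first, so no row is deleted while
# rows in a child table still reference it.
delete_order = list(reversed(topological))
print(delete_order)  # ['processed_data', 'trial', 'session']
```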

### `drop()`

```python
diag.drop(prompt=None, part_integrity="enforce")
```

Drop all tables in the diagram in reverse topological order.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` |
| `part_integrity` | str | `"enforce"` | `"enforce"` or `"ignore"` |

**Note:** Unlike `delete()`, `drop()` does not use cascade restrictions. It drops all tables in the diagram.

### `preview()`

```python
diag.preview()
```

Show affected tables and row counts without modifying data. Works with both `cascade()` and `restrict()` restrictions.

**Returns:** `dict[str, int]` — mapping of full table names to affected row counts.

**Requires:** `cascade()` or `restrict()` must be called first.

```python
diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
```

### Restriction Propagation

When `cascade()` or `restrict()` propagates a restriction from a parent table to a child table, one of three rules applies depending on the foreign key relationship:

**Rule 1 — Direct copy:** When the foreign key is non-aliased and the restriction attributes are a subset of the child's primary key, the restriction is copied directly to the child.

**Rule 2 — Aliased projection:** When the foreign key uses attribute renaming (e.g., `subject_id` → `animal_id`), the parent is projected with the attribute mapping to match the child's column names.

**Rule 3 — Full projection:** When the foreign key is non-aliased but the restriction uses attributes not in the child's primary key, the parent is projected (all attributes) and used as a restriction on the child.
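
The choice among the three rules can be summarized as a small decision function. This is only a sketch of the rule selection as described above, under the assumption that aliasing is checked first; `propagation_rule` and its arguments are hypothetical names and not part of the DataJoint API.

```python
def propagation_rule(fk_is_aliased, restriction_attrs, child_primary_key):
    """Pick which of the three rules applies to one foreign key edge."""
    if fk_is_aliased:
        return "aliased projection"  # Rule 2: rename parent attributes
    if set(restriction_attrs) <= set(child_primary_key):
        return "direct copy"         # Rule 1: restriction fits the child's key
    return "full projection"         # Rule 3: restrict by the projected parent


# Restriction on subject_id, which is part of the child's primary key.
print(propagation_rule(False, {"subject_id"}, {"subject_id", "session_id"}))

# Foreign key renames subject_id to animal_id.
print(propagation_rule(True, {"subject_id"}, {"animal_id", "session_id"}))

# Restriction on species, which is not in the child's primary key.
print(propagation_rule(False, {"species"}, {"subject_id", "session_id"}))
```

The three calls print `direct copy`, `aliased projection`, and `full projection` respectively.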

**Convergence behavior:**

When a child table has multiple restricted ancestors, the convergence rule depends on the mode:

- **`cascade()` (OR):** A child row is affected if *any* path from a restricted ancestor reaches it. This is appropriate for delete — if any reason exists to delete a row, it should be deleted.
- **`restrict()` (AND):** A child row is included only if *all* restricted ancestors match. This is appropriate for export — only rows satisfying every condition are selected.

---

## Output Methods

### Graphviz Output
@@ -299,18 +446,23 @@ combined = dj.Diagram.from_sequence([schema1, schema2, schema3])

## Dependencies

Diagram visualization requires optional dependencies:
Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`) use `networkx`, which is always installed as a core dependency.

Diagram **visualization** requires optional dependencies:

```bash
pip install matplotlib pygraphviz
```

If dependencies are missing, `dj.Diagram` displays a warning and provides a stub class.
If visualization dependencies are missing, `dj.Diagram` displays a warning and provides a stub class. Operational methods remain available regardless.
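
To check ahead of time whether the visualization extras are importable, a minimal sketch using only the standard library might look like the following. The helper `missing_visualization_deps` is hypothetical and is not provided by DataJoint; it only tests whether the modules named in the `pip install` line above can be found.

```python
from importlib.util import find_spec


def missing_visualization_deps(modules=("matplotlib", "pygraphviz")):
    """Return the names from `modules` that cannot be imported."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_visualization_deps()
if missing:
    print("Visualization unavailable; missing:", ", ".join(missing))
else:
    print("Visualization dependencies found")
```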

---

## See Also

- [How to Read Diagrams](../../how-to/read-diagrams.ipynb/)
- [Delete Data](../../how-to/delete-data.md/) — Diagram-level delete workflow
- [What's New in 2.2](../../explanation/whats-new-22.md/) — Motivation and design
- [Data Manipulation](data-manipulation.md) — Insert, update, delete specification
- [Query Algebra](query-algebra.md)
- [Table Declaration](table-declaration.md)