From 693abe0025695f1fed603029eab7819a89a50046 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Sun, 22 Feb 2026 11:53:53 -0600 Subject: [PATCH] docs: document restricted diagram operations (new in 2.2) Add documentation for cascade(), restrict(), delete(), drop(), and preview() methods on dj.Diagram, including restriction propagation rules and OR-vs-AND convergence semantics. Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 49 ++++++- src/how-to/delete-data.md | 35 +++++ src/reference/specs/data-manipulation.md | 3 + src/reference/specs/diagram.md | 156 ++++++++++++++++++++++- 4 files changed, 240 insertions(+), 3 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 0f437c08..16c7f0b1 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -1,6 +1,6 @@ # What's New in DataJoint 2.2 -DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. +DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph. > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. @@ -201,9 +201,56 @@ class MyTable(dj.Manual): Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema. +## Graph-Driven Diagram Operations + +DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting. 
+ +### From Visualization to Operations + +In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed. + +In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios. + +### The Preview-Then-Execute Pattern + +The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute: + +```python +# Build the dependency graph +diag = dj.Diagram(schema) + +# Apply cascade restriction — nothing is deleted yet +restricted = diag.cascade(Session & {'subject_id': 'M001'}) + +# Inspect: what tables and how many rows would be affected? +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} + +# Execute only after reviewing the blast radius +restricted.delete(prompt=False) +``` + +This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. + +### Two Propagation Modes + +The diagram supports two restriction propagation modes with different convergence semantics: + +**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, the child row is affected if *any* parent path reaches it. This is the right semantics for delete — if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can only be called once on an unrestricted diagram. + +**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. 
This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. + +The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. + +### Architecture + +`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed. + ## See Also - [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide - [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial - [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings - [Configure Database](../how-to/configure-database.md/) — Connection setup +- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations +- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide diff --git a/src/how-to/delete-data.md b/src/how-to/delete-data.md index 545788bb..72724075 100644 --- a/src/how-to/delete-data.md +++ b/src/how-to/delete-data.md @@ -189,8 +189,43 @@ count = (Subject & restriction).delete(prompt=False) print(f"Deleted {count} subjects") ``` +## Diagram-Level Delete + +!!! version-added "New in 2.2" + Diagram-level delete was added in DataJoint 2.2. + +For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing. + +### Build, Preview, Execute + +```python +import datajoint as dj + +# 1. Build the dependency graph +diag = dj.Diagram(schema) + +# 2. Apply cascade restriction (nothing deleted yet) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) + +# 3. 
Preview: see affected tables and row counts +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} + +# 4. Execute only after reviewing +restricted.delete(prompt=False) +``` + +### When to Use + +- **Preview blast radius**: Understand what a cascade delete will affect before committing +- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation +- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows + +For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram-level API is for when you need more visibility or control. + ## See Also +- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations - [Master-Part Tables](master-part.ipynb) — Compositional data patterns - [Model Relationships](model-relationships.ipynb) — Foreign key patterns - [Insert Data](insert-data.md) — Adding data to tables diff --git a/src/reference/specs/data-manipulation.md b/src/reference/specs/data-manipulation.md index e2841efa..160aee16 100644 --- a/src/reference/specs/data-manipulation.md +++ b/src/reference/specs/data-manipulation.md @@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables: 2. Recursively delete matching rows in child tables 3. Delete rows in target table +!!! version-added "New in 2.2" + `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged — the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods). 
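
As a minimal sketch of how a preview could gate a delete in an automated workflow, the helper below checks the total affected row count against a threshold. It assumes only the `dict[str, int]` shape that `preview()` is documented to return. The helper name `within_budget` and the `max_rows` threshold are hypothetical and are not part of DataJoint; the snippet does not call DataJoint at all.

```python
def within_budget(counts: dict[str, int], max_rows: int) -> bool:
    """Return True when the total number of affected rows does not exceed max_rows.

    Hypothetical helper for illustration; not part of the DataJoint API.
    """
    return sum(counts.values()) <= max_rows


# Sample mapping copied from the preview() example in this documentation.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

total = sum(counts.values())
safe = within_budget(counts, max_rows=100)
print(f"{total} rows affected; proceed: {safe}")
```

In a real workflow, the result would decide whether to call `delete(prompt=False)` on the cascade-restricted diagram or to stop and ask for review.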
+ ### 4.3 Basic Usage ```python diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 58aba574..c2fc9174 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -117,6 +117,153 @@ dj.Diagram(Subject) + dj.Diagram(analysis).collapse() --- +## Operational Methods + +!!! version-added "New in 2.2" + Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`) were added in DataJoint 2.2. + +Diagrams can propagate restrictions through the dependency graph and execute data operations (delete, drop) using the graph structure. These methods turn Diagram from a visualization tool into an operational component. + +### `cascade()` + +```python +diag.cascade(table_expr, part_integrity="enforce") +``` + +Apply a cascade restriction and propagate it downstream through the dependency graph. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `table_expr` | QueryExpression | — | A restricted table expression (e.g., `Session & 'subject_id=1'`) | +| `part_integrity` | str | `"enforce"` | Master-part integrity policy | + +**Returns:** New `Diagram` with cascade restrictions applied. 
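
The OR rule, and its contrast with the AND rule that `restrict()` applies, can be illustrated with a toy model. This is a sketch under stated assumptions, not DataJoint's implementation: plain Python dictionaries stand in for rows of a child table that references two parent tables, and every name in it (`child_rows`, `converge`, the `subject` and `rig` fields) is hypothetical.

```python
# Toy model only: plain Python data stands in for database rows.
# Nothing here calls DataJoint; all names are hypothetical.

# Each child row references one row in each of two parent tables.
child_rows = [
    {"id": 1, "subject": "M001", "rig": "A"},
    {"id": 2, "subject": "M001", "rig": "B"},
    {"id": 3, "subject": "M002", "rig": "A"},
    {"id": 4, "subject": "M002", "rig": "B"},
]

# Restrictions that reach the child along two different parent paths.
restricted_subjects = {"M001"}
restricted_rigs = {"A"}


def converge(rows, mode):
    """Return the ids of child rows selected under "or" or "and" convergence."""
    combine = any if mode == "or" else all
    return {
        row["id"]
        for row in rows
        if combine(
            [row["subject"] in restricted_subjects, row["rig"] in restricted_rigs]
        )
    }


cascade_ids = converge(child_rows, "or")    # cascade-style: any path matches
restrict_ids = converge(child_rows, "and")  # restrict-style: every path matches
print(sorted(cascade_ids), sorted(restrict_ids))  # [1, 2, 3] [1]
```

Under OR, only row 4 survives because neither parent path reaches it. Under AND, only row 1 is selected because it is the only row matched by both paths.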
+ +**Constraints:** + +- `cascade()` can only be called **once** on an unrestricted Diagram +- Cannot be mixed with `restrict()` — the two modes are mutually exclusive +- `table_expr.full_table_name` must be a node in the diagram + +**`part_integrity` values:** + +| Value | Behavior | +|-------|----------| +| `"enforce"` | Error if parts would be deleted before masters | +| `"ignore"` | Allow deleting parts without masters | +| `"cascade"` | Also delete masters when parts are deleted | + +```python +# Build a cascade from a restricted table +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +``` + +### `restrict()` + +```python +diag.restrict(table_expr) +``` + +Apply a restrict condition and propagate it downstream. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `table_expr` | QueryExpression | — | A restricted table expression | + +**Returns:** New `Diagram` with restrict conditions applied. + +**Constraints:** + +- Cannot be called on a cascade-restricted Diagram (mutually exclusive with `cascade()`) +- `table_expr.full_table_name` must be a node in the diagram +- **Can be chained** — call `restrict()` multiple times to add conditions from different tables + +```python +# Chain multiple restrictions (AND semantics) +diag = dj.Diagram(schema) +restricted = (diag + .restrict(Subject & {'species': 'mouse'}) + .restrict(Session & 'session_date > "2024-01-01"')) +``` + +### `delete()` + +```python +diag.delete(transaction=True, prompt=None) +``` + +Execute a cascading delete using previously applied cascade restrictions. Tables are deleted in reverse topological order (leaves first) to maintain referential integrity. 
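
The leaves-first ordering can be illustrated without a database. The sketch below is not DataJoint's code; it uses the standard-library `graphlib` module on a hypothetical four-table chain (`subject`, `session`, `trial`, `processed_data`) to show why reversing a topological order removes dependent rows before the rows they reference.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it depends on (its parents).
parents = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}

# static_order() yields every parent before any of its children.
topological = list(TopologicalSorter(parents).static_order())

# Reversing gives leaves first, so child rows are removed before the
# parent rows they reference and no foreign key is left dangling.
delete_order = list(reversed(topological))
print(delete_order)  # ['processed_data', 'trial', 'session', 'subject']
```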
+ +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `transaction` | bool | `True` | Wrap in atomic transaction | +| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | + +**Returns:** Number of rows deleted from the root table. + +**Requires:** `cascade()` must be called first. + +```python +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +restricted.preview() # inspect what will be deleted +restricted.delete() # execute the delete +``` + +### `drop()` + +```python +diag.drop(prompt=None, part_integrity="enforce") +``` + +Drop all tables in the diagram in reverse topological order. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | +| `part_integrity` | str | `"enforce"` | `"enforce"` or `"ignore"` | + +**Note:** Unlike `delete()`, `drop()` does not use cascade restrictions. It drops all tables in the diagram. + +### `preview()` + +```python +diag.preview() +``` + +Show affected tables and row counts without modifying data. Works with both `cascade()` and `restrict()` restrictions. + +**Returns:** `dict[str, int]` — mapping of full table names to affected row counts. + +**Requires:** `cascade()` or `restrict()` must be called first. 
+ +```python +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} +``` + +### Restriction Propagation + +When `cascade()` or `restrict()` propagates a restriction from a parent table to a child table, one of three rules applies depending on the foreign key relationship: + +**Rule 1 — Direct copy:** When the foreign key is non-aliased and the restriction attributes are a subset of the child's primary key, the restriction is copied directly to the child. + +**Rule 2 — Aliased projection:** When the foreign key uses attribute renaming (e.g., `subject_id` → `animal_id`), the parent is projected with the attribute mapping to match the child's column names. + +**Rule 3 — Full projection:** When the foreign key is non-aliased but the restriction uses attributes not in the child's primary key, the parent is projected (all attributes) and used as a restriction on the child. + +**Convergence behavior:** + +When a child table has multiple restricted ancestors, the convergence rule depends on the mode: + +- **`cascade()` (OR):** A child row is affected if *any* path from a restricted ancestor reaches it. This is appropriate for delete — if any reason exists to delete a row, it should be deleted. +- **`restrict()` (AND):** A child row is included only if *all* restricted ancestors match. This is appropriate for export — only rows satisfying every condition are selected. + +--- + ## Output Methods ### Graphviz Output @@ -299,18 +446,23 @@ combined = dj.Diagram.from_sequence([schema1, schema2, schema3]) ## Dependencies -Diagram visualization requires optional dependencies: +Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`) use `networkx`, which is always installed as a core dependency. 
+ +Diagram **visualization** requires optional dependencies: ```bash pip install matplotlib pygraphviz ``` -If dependencies are missing, `dj.Diagram` displays a warning and provides a stub class. +If visualization dependencies are missing, `dj.Diagram` displays a warning and provides a stub class. Operational methods remain available regardless. --- ## See Also - [How to Read Diagrams](../../how-to/read-diagrams.ipynb/) +- [Delete Data](../../how-to/delete-data.md/) — Diagram-level delete workflow +- [What's New in 2.2](../../explanation/whats-new-22.md/) — Motivation and design +- [Data Manipulation](data-manipulation.md) — Insert, update, delete specification - [Query Algebra](query-algebra.md) - [Table Declaration](table-declaration.md)