Skip to content

Cascade SpecificationΒΆ

This document specifies how DataJoint propagates restrictions across the foreign-key (FK) graph for cascade delete and cascade preview operations. It defines the propagation rules, the role of part_integrity, and the Part-to-Master upward propagation that handles renamed FKs and Part-of-Part chains.

For the user-facing entry points, see Delete Data. For dependent concepts, see Master-Part, Diagram, and Data Manipulation.

OverviewΒΆ

A cascade starts at a (possibly restricted) seed table and propagates the restriction to every table that depends on it via foreign keys, so that a delete or preview affects all dependents consistently. Cascade is invoked by:

  • Table.delete(part_integrity=...) β€” executes the cascade as a delete.
  • Diagram.cascade(table_expr, part_integrity=...).counts() β€” preview, no mutation.

Both follow the same propagation rules; only the terminal step (delete vs. count) differs.

Dependency graphΒΆ

DataJoint loads its FK structure into a directed acyclic graph (Connection.dependencies, a networkx.DiGraph). The graph encodes two kinds of structure:

Element Encodes
Node A table, named by its fully-qualified SQL identifier (e.g. `schema`.`table_name`). The graph stores the table's primary key set as node data.
Edge parent β†’ child An FK constraint from child to parent. Edge data: attr_map (dict mapping child's FK columns β†’ parent's referenced columns), aliased (true iff any column was renamed), primary (true iff the FK is in the child's primary key).
Alias node Inserted between a parent and child when the FK is aliased=True. Named by an integer-string ("1", "2", …) to distinguish from real table nodes. Both half-edges (parent β†’ alias, alias β†’ child) carry the same attr_map and aliased props. Treat as transparent when walking.

The cascade engine operates on a copy of this graph (the Diagram class), recording per-table restrictions in _cascade_restrictions and the set of restricted attributes in _restriction_attrs.

Restriction propagation rulesΒΆ

When propagating a restriction across an edge parent β†’ child, one of three rules applies. The rule depends on whether the FK renames columns (aliased) and on whether the parent's currently-restricted attributes (parent_attrs) are contained in the child's primary key (child_pk).

Forward propagation (parent β†’ child)ΒΆ

Rule Trigger Effect on child
F1. Copy not aliased and parent_attrs βŠ† child_pk Child inherits the parent's restriction directly (same attribute names; literal restriction values copy as-is).
F2. Aliased rename aliased Child's restriction is parent.proj(**{fk_col: parent_col for fk_col, parent_col in attr_map.items()}) β€” the parent expression with columns renamed to match the child's column names.
F3. Project not aliased and parent_attrs βŠ„ child_pk Child's restriction is parent.proj() β€” the parent projected to its primary key.

After applying the rule, the child's restricted-attributes set is updated to track what's now constrained on it. The child becomes a propagation source for its own children in the next pass.

Upward propagation (child β†’ parent)ΒΆ

Symmetric inverses of the forward rules. Used by part_integrity="cascade" to propagate a Part's restriction up to its Master through the FK chain. Edge metadata is the same; the direction of travel is reversed.

Rule Trigger Effect on parent
U1. Copy not aliased and child_attrs βŠ† parent_pk Parent inherits the child's restriction directly (shared attribute names).
U2. Aliased reverse-rename aliased Parent's restriction is child.proj(**{parent_col: fk_col for fk_col, parent_col in attr_map.items()}) β€” the child expression with FK columns renamed back to the parent's column names.
U3. Project not aliased and child_attrs βŠ„ parent_pk Parent's restriction is child.proj() β€” the child projected to attributes in the parent's primary key.

Cascade flowΒΆ

flowchart TB
    A[seed table_expr] --> B[Diagram.cascade]
    B --> C{seed is a Part?
part_integrity=cascade?} C -- yes --> D[Upward walk:
_propagate_part_to_master] C -- no --> E[Forward propagation
over descendants] D --> E E --> F{any new master pulled in?} F -- yes --> E F -- no --> G[Build subgraph,
count or delete]

The engine performs multiple passes when part_integrity="cascade" is in effect: each pass forward-propagates over all allowed_nodes, and a pass may pull in a new master (with its descendants) that requires another pass. The loop terminates because the graph is a DAG and only finitely many nodes can be added.

part_integrity modesΒΆ

Master-Part integrity reflects the contract that a Master row exists for every Part row that references it. Three modes govern how cascade enforces or relaxes that contract:

Mode Behavior
"enforce" (default) If a delete would remove a Part row without removing the corresponding Master row, the entire delete is rolled back with DataJointError. The Master row is checked after the delete; the integrity violation is detected post-hoc and reversed.
"ignore" No upward propagation; no post-check. Use when the Master row is intentionally preserved and the user has accepted that Part rows may be orphaned. Caller is responsible for the consequences.
"cascade" Upward propagation enabled. When cascade reaches a Part, the Master is also restricted (via the upward rules below), and the Master then forward-cascades back down to all its Parts (siblings of the originating Part included). Used when the user wants the master-part group treated atomically.

This document focuses on the "cascade" mode; the "enforce" and "ignore" modes do not change the propagation graph.

Part-to-Master upward propagationΒΆ

When part_integrity="cascade" and the cascade reaches (or starts at) a Part node, the engine triggers an upward walk of the FK graph from the Part to its Master, applying the upward rules (U1, U2, U3) at each edge.

Identifying the MasterΒΆ

The Master is identified by naming convention via dependencies.extract_master: a Part table's SQL name ends in master__partname, and the Master is the prefix before the __. The graph must contain a node with that Master name; otherwise the upward walk is skipped.

Walking the FK pathΒΆ

The walk uses nx.shortest_path(master, part) to find the FK chain from Master to Part:

Master β†’ [intermediate Part(s)] β†’ Part

Alias nodes ("1", "2", …) on the path are skipped β€” both half-edges carry the same attr_map and aliased props, so the engine reads props from one and reasons about the real edge. Each real edge along the walk fires one upward rule (U1, U2, or U3) per the edge's metadata. Intermediate Parts in a Part-of-Part chain are restricted along the way β€” not only the Master.

Materialization at the MasterΒΆ

After the upward walk completes, the Master's accumulated restrictions are materialized to a literal value tuple via (master_ft & restrictions).proj().to_arrays() and stored as a single condition. Subsequent forward propagation from the Master back down to its other Parts then generates WHERE pk IN (literal-list) rather than WHERE pk IN (SELECT ... FROM <originating-part>).

Why materialization matters. Without this step, the Master's restriction would be a QueryExpression referencing the same Part being deleted. When the Master forward-cascades back to its Parts (including the originating Part), the generated DELETE contains a subquery against the table being modified β€” which MySQL forbids ("error 1093: You can't specify target table 'T' for update in FROM clause"). Materializing converts the restriction into a static value set, which both backends accept.

Intermediate Parts in the chain are not materialized β€” they appear only as restrictions on the path, not as forward-cascade sources, so the self-reference issue doesn't arise there.

Seed-is-Part caseΒΆ

When Table.delete(part_integrity="cascade") is invoked on a Part directly (e.g. Master.PartB.delete(part_integrity="cascade")), the cascade seed is the Part itself. A leaf Part has no out-edges, so the main propagation loop β€” which fires the part_integrity block from inside the out_edges iteration β€” would not trigger.

The engine handles this case by invoking the upward walk explicitly for the seed before the main loop runs. After the walk pulls in the Master and any intermediate Parts, the main loop then forward-cascades from the Master to its remaining descendants in the usual way.

Worked examplesΒΆ

Example 1: Part-of-Part with renamed FKΒΆ

import datajoint as dj

schema = dj.Schema("imaging")

@schema
class Subject(dj.Manual):
    definition = """
    subject_id : int32
    """

    class Session(dj.Part):
        definition = """
        -> master
        session_id : int32
        """

    class Recording(dj.Part):
        definition = """
        -> Subject.Session.proj(src_subject='subject_id', src_session='session_id')
        recording_id : int32
        """

Recording's columns are {src_subject, src_session, recording_id} β€” none of them are named subject_id. The FK from Subject.Session β†’ Subject.Recording is aliased.

When (Subject.Recording & {"recording_id": 5}).delete(part_integrity="cascade") runs:

  1. Seed-is-Part check. extract_master(Recording) == Subject. Trigger the upward walk.
  2. FK path. shortest_path(Subject, Recording) = [Subject, Subject.Session, alias_node, Subject.Recording] β†’ real path [Subject, Subject.Session, Subject.Recording].
  3. Walk reversed.
    • Edge Subject.Session β†’ Subject.Recording: aliased=True. Apply U2 β€” Subject.Session is restricted by Subject.Recording.proj(subject_id='src_subject', session_id='src_session').
    • Edge Subject β†’ Subject.Session: aliased=False, child_attrs={subject_id, session_id} βŠ† parent_pk={subject_id}? No (session_id not in parent pk). Apply U3 β€” Subject is restricted by Subject.Session.proj() (projected to subject_id).
  4. Materialize Master. Subject's restriction is fetched into a value tuple; replaces the chained QueryExpression.
  5. Forward pass. Master forward-cascades back down to Subject.Session and Subject.Recording (and any sibling Parts not on the original path), now with the materialized restriction.

Without the FK walk (the pre-fix behavior), the engine joined subject_ft.proj() & recording_ft.proj() on shared attribute names. Subject has subject_id; Recording has src_subject. No shared columns β†’ empty restriction β†’ Master not restricted. This is the failure mode from #1429 Case 1.

Example 2: Part-of-Part with no Master reference in PartBΒΆ

@schema
class Master(dj.Manual):
    definition = """
    master_id : int32
    """

    class PartA(dj.Part):
        definition = """
        -> master
        part_a_id : int32
        """

    class PartB(dj.Part):
        definition = """
        -> Master.PartA
        part_b_id : int32
        """

PartB's definition references Master.PartA, not master directly. The FK chain Master β†’ PartA β†’ PartB still exists in the dependency graph (PartA's FK to Master is aliased=False; PartA β†’ PartB is also aliased=False).

For (Master.PartB & {"master_id": 1}).delete(part_integrity="cascade"):

  1. Upward walk: PartB β†’ PartA β†’ Master.
    • Edge PartA β†’ PartB: aliased=False, child_attrs βŠ† parent_pk? child_attrs = {master_id}, parent_pk = {master_id, part_a_id}. Yes βŠ†. Apply U1 β€” PartA inherits PartB's restriction directly.
    • Edge Master β†’ PartA: aliased=False, child_attrs = {master_id} βŠ† parent_pk = {master_id}. Apply U1 β€” Master inherits PartA's restriction.
  2. Materialize Master.
  3. Forward cascade Master β†’ PartA β†’ PartB picks up all sibling rows under master_id=1.

Without the FK walk (the pre-fix behavior), the engine jumped directly from PartB to Master via master_ft.proj() & partB_ft.proj(). PartA was never restricted, and the chain semantics were silently incorrect. This is #1429 Case 2.

Algorithmic complexityΒΆ

For a cascade subgraph with N nodes and E edges, propagation runs in at most O(N Β· E) edge visits per pass, with at most one pass per master pulled in by upward propagation. For typical schemas (small Master-Part groups, shallow chains), the cost is dominated by the materialization fetch at the Master β€” one SELECT ... FROM master_ft over the restricted master rows.

What is not part of this specificationΒΆ

  • Diagram.trace() for general upstream restriction propagation: a related but distinct feature tracked in #1423. trace() exposes upstream propagation as a first-class operator; the cascade engine's upward walk above is a closed implementation detail of part_integrity="cascade".
  • Custom propagation rules (user-defined): not supported. The three forward and three upward rules cover the cases the FK graph can produce.
  • Cross-schema cascade: handled by dependencies.load_all_downstream() called from Diagram.cascade(); orthogonal to the propagation rules described here.

ReferencesΒΆ

  • Source: src/datajoint/diagram.py β€” Diagram.cascade, _propagate_restrictions, _apply_propagation_rule, _apply_propagation_rule_upward, _propagate_part_to_master.
  • Source: src/datajoint/dependencies.py β€” graph construction, extract_master, alias node insertion.
  • Issue: datajoint-python #1429 β€” bug report and motivating examples for the upward propagation rules.
  • Master-Part Specification β€” Part-Master contract.
  • Diagram Specification β€” graph operations on the dependency graph.
  • Data Manipulation Specification β€” delete() and delete_quick() user-facing API.