The Relational Workflow Model¶

The relational data model has historically been interpreted through two conceptual frameworks: Codd's mathematical foundation, which views tables as logical predicates, and Chen's Entity-Relationship Model, which views tables as entity types and relationships. The relational workflow model introduces a third paradigm: tables represent workflow steps, rows represent workflow artifacts, and foreign key dependencies prescribe execution order. This adds an operational dimension absent from both predecessors—the schema specifies not only what data exists but how it is derived.

The relational workflow model and its technical innovations are formally defined in Yatsenko & Nguyen, 2026. DataJoint's schema definition language and query algebra were first formalized in Yatsenko et al., 2018.

Three Paradigms Compared¶

| Aspect | Mathematical (Codd) | Entity-Relationship (Chen) | Relational Workflow (DataJoint) |
|---|---|---|---|
| Core question | What functional dependencies exist? | What entity types exist? | When/how are entities created? |
| Table semantics | Logical predicate | Entity or relationship | Workflow step |
| Row semantics | True proposition | Entity instance | Workflow artifact |
| Foreign keys | Referential integrity | Relationship | Execution order |
| Computation | Not addressed | Not addressed | Declared in schema |
| Provenance | Not addressed | Not addressed | Structural |
| Implementation gap | High | High | None |

Codd's Mathematical Foundation¶

Codd's mathematical foundation views tables as logical predicates and rows as true propositions—rigorous but abstract.

Chen's Entity-Relationship Model¶

Chen's Entity-Relationship Model shifted focus to domain modeling with entities, attributes, and relationships—more intuitive, but lacking any workflow or computational dimension.

Core Concepts¶

Workflow Steps and Artifacts¶

Tables are classified into tiers by data entry mode:

| Tier | Role | make() |
|---|---|---|
| Manual | Receive direct user entry | No |
| Lookup | Hold reference data | No |
| Imported | Reach out to data sources outside the DataJoint system (instruments, electronic lab notebooks, external databases) | Yes |
| Computed | Derive their contents entirely from upstream DataJoint tables | Yes |

Imported and Computed tables define computations via make() methods. The make() method specifies how each entity is derived—this computation logic is declared within the table definition, making it part of the schema itself rather than an external workflow specification.
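As a rough illustration of the make() contract, the following toy sketch uses plain Python dictionaries instead of a database. The classes Table, ComputedTable, and ScanStats are hypothetical stand-ins, not DataJoint's own classes; the method names make(), populate(), and insert1() are borrowed for familiarity, and their behavior here is greatly simplified.

```python
class Table:
    """Toy in-memory table: maps a primary-key tuple to a dict of attributes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert1(self, key, **attrs):
        if key in self.rows:
            raise KeyError(f"duplicate key {key} in {self.name}")
        self.rows[key] = attrs


class ComputedTable(Table):
    """Toy computed tier: rows are created only by make(), never by direct entry."""

    def __init__(self, name, upstream):
        super().__init__(name)
        self.upstream = upstream

    def make(self, key):
        raise NotImplementedError("subclasses declare the computation here")

    def populate(self):
        # Call make() once for every upstream key that has no result yet.
        pending = [k for k in self.upstream.rows if k not in self.rows]
        for key in pending:
            self.make(key)
        return len(pending)


class ScanStats(ComputedTable):
    def make(self, key):
        # The derivation lives inside the table class itself.
        pixels = self.upstream.rows[key]["pixels"]
        self.insert1(key, mean=sum(pixels) / len(pixels))


scan = Table("Scan")
scan.insert1((1,), pixels=[2, 4, 6])
scan.insert1((2,), pixels=[10, 20])

stats = ScanStats("ScanStats", upstream=scan)
first_run = stats.populate()   # computes both missing entries
second_run = stats.populate()  # nothing left to compute
```

The second populate() call does no work because every upstream key already has a result, which is the sense in which the computation is tied to the data rather than to an external script.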

Dependencies as Foreign Keys¶

Foreign keys define computational dependencies, not only referential integrity. The dependency graph is explicit, queryable, and enforced by the database.

graph LR
    A[Session] --> B[Scan]
    B --> C[Segmentation]
    C --> D[Analysis]
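The execution order implied by the diagram above can be recovered with a topological sort. This is a minimal sketch using Python's standard-library graphlib. The depends_on mapping is written by hand here for illustration, whereas in a real pipeline it would be read from the foreign keys in the schema.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it references by foreign key.
depends_on = {
    "Session": set(),
    "Scan": {"Session"},
    "Segmentation": {"Scan"},
    "Analysis": {"Segmentation"},
}

# static_order() yields every table after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```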

Master-Part Relationships¶

Master-part relationships declare transactional grouping directly in the schema: the master table represents the workflow step, while part tables hold the individual items it produces. A master entry and its parts are inserted together and deleted together as a unit, so transactional semantics are enforced without application code.
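The all-or-nothing behavior can be sketched with SQLite from the Python standard library. This is an analogy, not DataJoint's implementation: the table names segmentation and segmentation_roi and the helper insert_segmentation are hypothetical, and the grouping is enforced here by an explicit transaction plus ON DELETE CASCADE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE segmentation (
    scan_id INTEGER PRIMARY KEY
);
CREATE TABLE segmentation_roi (
    scan_id INTEGER NOT NULL
        REFERENCES segmentation(scan_id) ON DELETE CASCADE,
    roi_id  INTEGER NOT NULL,
    area    REAL NOT NULL CHECK (area > 0),
    PRIMARY KEY (scan_id, roi_id)
);
""")


def insert_segmentation(conn, scan_id, areas):
    """Insert one master row and all of its part rows in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO segmentation VALUES (?)", (scan_id,))
        conn.executemany(
            "INSERT INTO segmentation_roi VALUES (?, ?, ?)",
            [(scan_id, roi_id, area) for roi_id, area in enumerate(areas)],
        )


insert_segmentation(conn, 1, [12.5, 30.0])
try:
    insert_segmentation(conn, 2, [8.0, -1.0])  # second part violates the CHECK
except sqlite3.IntegrityError:
    pass  # the master row for scan 2 was rolled back along with its parts

masters = [row[0] for row in conn.execute(
    "SELECT scan_id FROM segmentation ORDER BY scan_id")]

with conn:
    conn.execute("DELETE FROM segmentation WHERE scan_id = 1")
parts_left = conn.execute("SELECT COUNT(*) FROM segmentation_roi").fetchone()[0]
```

A failed part insert leaves no partial master behind, and deleting the master removes its parts, so the group is never observed in an incomplete state.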

Directed Acyclic Graph¶

Dependencies between tables form a directed acyclic graph (DAG); aggregated dependencies between schemas likewise form a DAG. Unlike task DAGs in workflow managers, these are relational schema DAGs—they define data structure and relationships, not just execution steps.
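One way to picture the schema-level graph is to collapse table-level edges into edges between schemas and then check the result for cycles. The sketch below assumes a hypothetical naming scheme of "schema.Table" and hypothetical schema names; schema_graph and is_acyclic are illustrative helpers, not part of DataJoint.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical table-level dependencies, written as "schema.Table".
table_deps = {
    "subject.Subject": set(),
    "acquisition.Session": {"subject.Subject"},
    "acquisition.Scan": {"acquisition.Session"},
    "imaging.Segmentation": {"acquisition.Scan"},
    "imaging.Analysis": {"imaging.Segmentation"},
}


def schema_of(table):
    return table.split(".", 1)[0]


def schema_graph(table_deps):
    """Collapse table-level edges into schema-level edges, ignoring intra-schema ones."""
    graph = {}
    for table, parents in table_deps.items():
        child = schema_of(table)
        graph.setdefault(child, set())
        for parent in parents:
            if schema_of(parent) != child:
                graph[child].add(schema_of(parent))
    return graph


def is_acyclic(graph):
    """Return True if the dependency graph has no cycles."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


schemas = schema_graph(table_deps)
print(schemas, is_acyclic(schemas))
```

A check like this could run in a test suite to catch a foreign key that would make two schemas depend on each other.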

Active Schemas¶

The key distinction from classical models: traditional schemas are passive—containers for data produced by external processes. In the relational workflow model, the schema is active—Computed tables declare how their contents are derived, making the schema itself the workflow specification. Schemas are defined as Python classes, and entire pipelines are organized as self-contained code repositories—version-controlled, testable, and deployable using standard software engineering practices.

A useful analogy: electronic spreadsheets unified data and computation—cells with values alongside cells with formulas. Yet this integration never penetrated relational databases in their 50+ years of history. The relational workflow model brings to databases what spreadsheets brought to tabular calculation: the recognition that data and the computations that produce it belong together. The analogy has limits: spreadsheets' coupling is also the source of their well-known fragility. DataJoint addresses this through formal schema constraints and explicit dependency declaration rather than ad-hoc cell references.

Workflow Normalization¶

"Every table represents an entity type created at a specific workflow step, and all attributes describe that entity as it exists at that step."

Database normalization decomposes data into tables to eliminate redundancy. Classical normalization theory achieves this through normal forms based on functional dependencies. Entity normalization asks whether each attribute describes the entity identified by the primary key. Workflow normalization extends these principles with a temporal dimension.

A Session table contains attributes known when the session is entered (date, experimenter, subject). Analysis parameters determined later belong in Computed tables that depend on Session. This discipline prevents tables from accumulating attributes from different workflow stages, which would obscure provenance and complicate updates.

Entity Integrity¶

All data is represented as well-formed entity sets with primary keys identifying each entity uniquely. This eliminates redundancy and ensures consistent updates.

When upstream data is deleted, dependent results cascade-delete automatically—including associated objects in external storage. To correct errors, you delete, reinsert, and recompute, ensuring every result represents a consistent computation from valid inputs.
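The cascading part of this behavior can be sketched with SQLite's ON DELETE CASCADE. The table and column names below are hypothetical, and the sketch covers only rows inside the database, not objects in external storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE session (
    session_id   INTEGER PRIMARY KEY,
    session_date TEXT NOT NULL
);
CREATE TABLE scan (
    session_id INTEGER NOT NULL
        REFERENCES session(session_id) ON DELETE CASCADE,
    scan_id    INTEGER NOT NULL,
    PRIMARY KEY (session_id, scan_id)
);
CREATE TABLE analysis (
    session_id INTEGER NOT NULL,
    scan_id    INTEGER NOT NULL,
    result     REAL NOT NULL,
    PRIMARY KEY (session_id, scan_id),
    FOREIGN KEY (session_id, scan_id)
        REFERENCES scan(session_id, scan_id) ON DELETE CASCADE
);
INSERT INTO session VALUES (1, '2024-01-15'), (2, '2024-01-16');
INSERT INTO scan VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO analysis VALUES (1, 1, 0.5), (1, 2, 0.7), (2, 1, 0.9);
""")


def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Deleting one upstream session removes its scans and their analysis results.
with conn:
    conn.execute("DELETE FROM session WHERE session_id = 1")

remaining = (count("session"), count("scan"), count("analysis"))
print(remaining)
```

After the delete, no analysis row refers to a scan or session that no longer exists, so a corrected session can be reinserted and its results recomputed from scratch.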

Query Algebra¶

DataJoint provides a five-operator algebra:

| Operator | Symbol | Purpose |
|---|---|---|
| Restrict | `&` | Filter entities by attribute values or membership in other relations |
| Project | `.proj()` | Select and rename attributes, compute derived values |
| Join | `*` | Combine related entities across relations |
| Aggregate | `.aggr()` | Group entities and compute summary statistics |
| Union | `+` | Combine entity sets with compatible structure |

The algebra is closed: every operator produces a valid entity set with a well-defined primary key, so operators can be composed without limit. Because every query result is itself a proper entity set with a clear identity, entity integrity is preserved. This distinguishes DataJoint's algebra from SQL, where query results lack both a well-defined primary key and a clear entity type.
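To make the closure property concrete, here is a toy in-memory Relation class in which every operator returns another Relation carrying its own primary key. It is a simplified sketch, not DataJoint's implementation: the class and its rules for deriving result keys are assumptions chosen for illustration, restriction accepts only a dict of attribute values, and union requires identical headings.

```python
class Relation:
    """Toy entity set: rows are dicts; primary_key names the identifying attributes."""

    def __init__(self, primary_key, heading, rows):
        self.primary_key = tuple(primary_key)
        self.heading = tuple(heading)  # all attribute names
        self.rows = [{a: r[a] for a in self.heading} for r in rows]
        keys = [self._key(r) for r in self.rows]
        if len(keys) != len(set(keys)):
            raise ValueError("primary key does not identify rows uniquely")

    def _key(self, row):
        return tuple(row[a] for a in self.primary_key)

    def __and__(self, condition):
        """Restrict: keep rows that match every attribute value in condition."""
        kept = [r for r in self.rows
                if all(r[a] == v for a, v in condition.items())]
        return Relation(self.primary_key, self.heading, kept)

    def proj(self, *attrs):
        """Project: keep the primary key plus the named attributes."""
        extra = tuple(a for a in attrs if a not in self.primary_key)
        return Relation(self.primary_key, self.primary_key + extra, self.rows)

    def __mul__(self, other):
        """Join: match rows on all shared attribute names; keys are combined."""
        shared = [a for a in self.heading if a in other.heading]
        heading = self.heading + tuple(
            a for a in other.heading if a not in shared)
        key = self.primary_key + tuple(
            a for a in other.primary_key if a not in self.primary_key)
        rows = [{**r, **s} for r in self.rows for s in other.rows
                if all(r[a] == s[a] for a in shared)]
        return Relation(key, heading, rows)

    def aggr(self, other, **summaries):
        """Aggregate: one row per entity in self, summarizing its rows in other."""
        rows = []
        for r in self.rows:
            group = [s for s in other.rows
                     if all(s[a] == r[a] for a in self.primary_key)]
            row = {a: r[a] for a in self.primary_key}
            row.update({name: fn(group) for name, fn in summaries.items()})
            rows.append(row)
        return Relation(self.primary_key,
                        self.primary_key + tuple(summaries), rows)

    def __add__(self, other):
        """Union: merge two sets with the same key and heading, dropping duplicates."""
        if (self.primary_key, self.heading) != (other.primary_key, other.heading):
            raise ValueError("union requires matching primary key and heading")
        merged = {self._key(r): r for r in other.rows + self.rows}
        return Relation(self.primary_key, self.heading, merged.values())


session = Relation(
    ["session_id"], ["session_id", "experimenter"],
    [{"session_id": 1, "experimenter": "ana"},
     {"session_id": 2, "experimenter": "ben"}])
scan = Relation(
    ["session_id", "scan_id"], ["session_id", "scan_id", "depth"],
    [{"session_id": 1, "scan_id": 1, "depth": 100},
     {"session_id": 1, "scan_id": 2, "depth": 150},
     {"session_id": 2, "scan_id": 1, "depth": 120}])

# Operators chain freely because each intermediate result is again a Relation.
joined = session * scan
result = (session & {"experimenter": "ana"}).aggr(
    joined.proj("depth"), n_scans=len)
```

Here result has the same primary key as session, so it could be restricted, joined, or aggregated again without any special handling.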

From Transactions to Transformations¶

| Traditional View | Workflow View |
|---|---|
| Tables store data | Tables represent workflow steps |
| Rows are records | Rows are workflow artifacts |
| Foreign keys enforce consistency | Foreign keys prescribe execution order |
| Updates modify state | Computations create new states |
| Schemas organize storage | Schemas specify pipelines |
| Queries retrieve data | Queries trace provenance |

Summary¶

The relational workflow model offers a new way to understand relational databases—not merely as storage systems but as computational substrates. By interpreting tables as workflow steps and foreign keys as execution dependencies, the schema becomes a complete specification of how data is derived, not just what data exists.