Concepts

Understanding the principles behind DataJoint.

DataJoint implements the Relational Workflow Model, a paradigm that extends relational databases with native support for computational workflows. This section explains the core concepts that make DataJoint pipelines reliable, reproducible, and scalable.

Core Concepts

  • Entity Integrity

    Primary keys and the three questions. Ensuring one-to-one correspondence between entities and records.

  • Normalization

    Schema design principles. Organizing tables around workflow steps to minimize redundancy.

  • Query Algebra

    The five operators: restriction, join, projection, aggregation, union. Workflow-aware query semantics.

  • Type System

    Three-layer architecture: native, core, and codec types. In-table and in-store storage modes.

  • Computation Model

    AutoPopulate and Jobs 2.0. Automated, reproducible, distributed computation.

  • Custom Codecs

    Extend DataJoint with domain-specific types. The codec extensibility system.

  • Data Pipelines

    From workflows to complete data operations systems. Project structure and object-augmented schemas.

  • Semantic Matching

    How DataJoint ensures safe joins through attribute lineage tracking.

  • What's New in 2.0

    Major changes, new features, and migration guidance for DataJoint 2.0.

  • FAQ

    How DataJoint compares to ORMs, workflow managers, and lakehouses. Common questions answered.

Why These Concepts Matter

Traditional databases store data; DataJoint pipelines also define how that data is processed. Understanding the Relational Workflow Model helps you:

  • Design schemas that naturally express your workflow
  • Write queries that are both powerful and intuitive
  • Build computations that scale from laptop to cluster
  • Maintain data integrity throughout the pipeline lifecycle
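
To make the five operators listed under Query Algebra more concrete, here is a minimal sketch in plain Python over lists of dictionaries. It is not DataJoint's API: every function name and the sample `subjects` and `sessions` data are hypothetical, chosen only to show what restriction, join, projection, aggregation, and union each do to a set of records. The join here matches on shared attribute names alone, so it does not capture the lineage-based semantic matching or workflow-aware semantics described above.

```python
# Plain-Python illustration of the five operators. NOT DataJoint's API;
# all function names and sample data below are hypothetical.

def restrict(rows, **cond):
    """Restriction: keep rows whose attributes match every condition."""
    return [r for r in rows if all(r.get(k) == v for k, v in cond.items())]

def join(left, right):
    """Natural join: pair rows that agree on all shared attribute names."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out

def project(rows, *attrs):
    """Projection: keep only the named attributes (no de-duplication here)."""
    return [{k: r[k] for k in attrs} for r in rows]

def aggregate(rows, by, attr, func):
    """Aggregation: group rows by `by` and summarize `attr` with `func`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[by], []).append(r[attr])
    return [{by: key, attr: func(vals)} for key, vals in groups.items()]

def union(a, b):
    """Union: combine rows from both inputs, dropping exact duplicates."""
    seen, out = set(), []
    for r in a + b:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

# Hypothetical sample data: two subjects and three recording sessions.
subjects = [{"subject": 1, "species": "mouse"}, {"subject": 2, "species": "rat"}]
sessions = [
    {"subject": 1, "session": 1, "duration": 30},
    {"subject": 1, "session": 2, "duration": 45},
    {"subject": 2, "session": 1, "duration": 20},
]

mice = restrict(subjects, species="mouse")            # only the mouse
mouse_sessions = join(mice, sessions)                 # sessions for that mouse
durations = project(mouse_sessions, "subject", "duration")
totals = aggregate(sessions, by="subject", attr="duration", func=sum)
everyone = union(mice, subjects)                      # duplicates removed
```

Each function takes a collection of records and returns a new one, so the operators compose freely: the output of a restriction can feed a join, whose output can feed a projection, and so on.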