Specifications

Formal specifications of DataJoint's data model and behavior.

These documents define how DataJoint works at a detailed level. They serve as authoritative references for:

  • Understanding exact behavior of operations
  • Implementing compatible tools and extensions
  • Debugging complex scenarios

How to Use These Specifications

If you're new to DataJoint: Start with the tutorials and how-to guides before diving into specifications. Specs are technical references, not learning materials.

If you're implementing features: Use specs as the authoritative source for behavior. Start with your target specification's prerequisites (listed under Reading Order and Specifications by Topic below) and work up to the target.

If you're debugging: Specs clarify exact behavior when documentation or examples are ambiguous.
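The "start with prerequisites and work up" advice can be sketched as a topological sort. The example below is only an illustration: the `PREREQUISITES` mapping is transcribed from the Specifications by Topic tables on this page, and `reading_order` is a hypothetical helper, not part of DataJoint.

```python
from graphlib import TopologicalSorter

# Prerequisites transcribed from the "Specifications by Topic" tables.
PREREQUISITES = {
    "Table Declaration": [],
    "Master-Part Relationships": ["Table Declaration"],
    "Virtual Schemas": ["Table Declaration"],
    "Primary Keys": ["Table Declaration"],
    "Query Operators": ["Table Declaration", "Primary Keys"],
    "Semantic Matching": ["Query Operators"],
    "Fetch API": ["Query Operators"],
    "Diagram": ["Table Declaration"],
    "Type System": [],
    "Codec API": ["Type System"],
    "<npy> Codec": ["Type System"],
    "Object Store Configuration": ["Type System"],
    "Data Manipulation": ["Table Declaration"],
    "AutoPopulate": ["Table Declaration", "Data Manipulation"],
    "Job Metadata": ["AutoPopulate"],
}


def reading_order(target: str) -> list[str]:
    """Return the target spec preceded by all of its transitive prerequisites."""
    # Collect the target and everything it depends on, directly or indirectly.
    needed: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        spec = stack.pop()
        if spec not in needed:
            needed[spec] = PREREQUISITES[spec]
            stack.extend(PREREQUISITES[spec])
    # static_order() yields each spec only after all of its prerequisites.
    return list(TopologicalSorter(needed).static_order())


print(reading_order("Job Metadata"))
# ['Table Declaration', 'Data Manipulation', 'AutoPopulate', 'Job Metadata']
```

Where several orders are valid (for example, specs that share a single prerequisite), any order that respects the prerequisites works equally well.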

Reading Order

Start Here

  1. Database Backends: Supported databases (MySQL, PostgreSQL)
  2. Table Declaration: How to define tables
  3. Primary Keys: Key propagation rules
  4. Type System: Three-layer type architecture

Next, choose based on your needs:

  • Working with data? → Data Operations
  • Building queries? → Query Algebra
  • Using large data? → Object Storage

Query Algebra

Prerequisites: Table Declaration, Primary Keys

  1. Query Operators: Restrict, proj, join, aggr, union
  2. Semantic Matching: Attribute lineage
  3. Fetch API: Data retrieval

Data Operations

Prerequisites: Table Declaration

  1. Data Manipulation: Insert, update, delete
  2. AutoPopulate: Jobs 2.0 system
  3. Job Metadata: Hidden job tracking columns

Object Storage

Prerequisites: Type System

  1. Object Store Configuration: Store setup
  2. Codec API: Custom type implementation
  3. <npy> Codec: NumPy array storage

Advanced Topics

  1. Master-Part Relationships: Compositional modeling
  2. Virtual Schemas: Schema introspection without source

Document Structure

Each specification follows a consistent structure:

  1. Overview: What this specifies
  2. User Guide: Practical usage
  3. API Reference: Methods and signatures
  4. Concepts: Definitions and rules
  5. Implementation Details: Internal behavior
  6. Examples: Concrete code samples
  7. Best Practices: Recommendations

Specifications by Topic

Schema Definition

Specification | Prerequisites | Related How-To | Related Explanation
Table Declaration | None | Define Tables | Relational Workflow Model
Master-Part Relationships | Table Declaration | Model Relationships | Data Pipelines
Virtual Schemas | Table Declaration | - | -

Key concepts: Table tiers (Manual, Lookup, Imported, Computed, Part), foreign keys, dependency graphs, compositional modeling


Query Algebra

Specification | Prerequisites | Related How-To | Related Explanation
Query Operators | Table Declaration, Primary Keys | Query Data | Query Algebra
Semantic Matching | Query Operators | Model Relationships | Query Algebra
Primary Keys | Table Declaration | Design Primary Keys | Entity Integrity
Fetch API | Query Operators | Fetch Results | -
Diagram | Table Declaration | Read Diagrams | -

Key concepts: Restriction (&, -), projection (.proj()), join (*), aggregation (.aggr()), union, universal set (U()), attribute lineage, schema visualization
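As a rough mental model of restriction, projection, and join, the sketch below applies them to plain lists of dictionaries. It does not use DataJoint: the helper names (`restrict`, `exclude`, `proj`, `join`) and the sample data are hypothetical. It also simplifies in two ways that matter: this `proj` keeps only the named attributes rather than tracking a primary key, and this `join` matches on shared attribute names alone, ignoring the attribute lineage that Semantic Matching specifies.

```python
def restrict(rows, condition):
    """Model of `table & condition`: keep rows matching every key/value pair."""
    return [r for r in rows if all(r.get(k) == v for k, v in condition.items())]


def exclude(rows, condition):
    """Model of `table - condition`: keep rows that do not match the condition."""
    return [r for r in rows if not all(r.get(k) == v for k, v in condition.items())]


def proj(rows, *attrs):
    """Simplified projection: keep only the named attributes of each row."""
    return [{a: r[a] for a in attrs} for r in rows]


def join(left, right):
    """Model of `left * right`: natural join on the attribute names both sides share."""
    out = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out


# Hypothetical sample data.
subjects = [
    {"subject_id": 1, "species": "mouse"},
    {"subject_id": 2, "species": "rat"},
]
sessions = [
    {"subject_id": 1, "session": 1},
    {"subject_id": 1, "session": 2},
    {"subject_id": 2, "session": 1},
]

mice = restrict(subjects, {"species": "mouse"})
not_mice = exclude(subjects, {"species": "mouse"})
mouse_sessions = join(mice, sessions)
session_keys = proj(mouse_sessions, "subject_id", "session")
print(session_keys)
```

For the exact behavior of each operator, including aggregation, union, and the universal set, the Query Operators specification remains the reference.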


Type System

Specification | Prerequisites | Related How-To | Related Explanation
Type System | None | Choose a Storage Type | Type System
Codec API | Type System | Create Custom Codec | Custom Codecs
<npy> Codec | Type System | Use Object Storage | -

Key concepts: Native types (backend-specific, e.g. MySQL), core types (portable), codec types (Python objects), in-table vs object storage, addressing schemes


Object Storage

Specification | Prerequisites | Related How-To | Related Explanation
Object Store Configuration | Type System | Configure Object Storage | Data Pipelines (OAS)

Key concepts: Hash-addressed storage (deduplication), schema-addressed storage (browsable paths), filepath storage (user-managed), store configuration, path generation
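To illustrate why hash-addressed storage deduplicates, here is a minimal sketch in which an object's path is derived from a hash of its content, so identical content always maps to the same path. Everything in it is an assumption made for illustration: the choice of SHA-256, the `_hash` prefix, the two-level folder layout, and the `InMemoryStore` class are hypothetical and are not taken from the Object Store Configuration specification.

```python
import hashlib


def hash_address(content: bytes, prefix: str = "_hash") -> str:
    """Derive a storage path from the content itself (assumed layout)."""
    digest = hashlib.sha256(content).hexdigest()
    # Fan out into subfolders by leading hex characters to keep folders small.
    return f"{prefix}/{digest[:2]}/{digest[2:4]}/{digest}"


class InMemoryStore:
    """Toy stand-in for an object store, keyed by hash-derived path."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        path = hash_address(content)
        # Identical content yields the same path, so it is stored only once.
        self.objects.setdefault(path, content)
        return path


store = InMemoryStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
other = store.put(b"different bytes")
print(first == second, len(store.objects))  # True 2
```

Schema-addressed and filepath storage, by contrast, derive paths from something other than content, so they do not get this deduplication for free.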


Data Operations

Specification | Prerequisites | Related How-To | Related Explanation
Data Manipulation | Table Declaration | Insert Data | Normalization
AutoPopulate | Table Declaration, Data Manipulation | Run Computations, Distributed Computing | Computation Model
Job Metadata | AutoPopulate | Handle Errors | Computation Model

Key concepts: Insert patterns, transactional integrity, workflow normalization, Jobs 2.0, job coordination, populate(), make() method, job states
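The relationship between `populate()`, `make()`, and job states can be pictured with a small loop: for every upstream key that has no result yet, call `make` and record what happened. This is a simplified sketch, not the Jobs 2.0 implementation. The function signature, the in-memory `results` dictionary, and the state names `"success"` and `"error"` are assumptions; the AutoPopulate and Job Metadata specifications define the real states and coordination rules.

```python
def populate(key_source, results, make):
    """Call make(key) for each key without a result; return a per-key state map."""
    states = {}
    for key in key_source:
        if key in results:
            continue  # already computed, so no job is needed
        try:
            results[key] = make(key)
            states[key] = "success"
        except Exception as exc:
            # A failed job is recorded but does not stop the remaining keys.
            states[key] = f"error: {exc}"
    return states


def make(key):
    """Hypothetical computation: fails for key 3 to show error handling."""
    if key == 3:
        raise ValueError("bad input")
    return key * 2


results = {1: 2}  # key 1 was computed in an earlier run
states = populate([1, 2, 3], results, make)
print(states)   # {2: 'success', 3: 'error: bad input'}
print(results)  # {1: 2, 2: 4}
```

Skipping keys that already have results is what makes a repeated run safe: calling `populate` again would only retry key 3.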