Specifications

Formal specifications of DataJoint's data model and behavior.

These documents define how DataJoint works at a detailed level. They serve as authoritative references for:

Understanding exact behavior of operations
Implementing compatible tools and extensions
Debugging complex scenarios

How to Use These Specifications

If you're new to DataJoint: Start with the tutorials and how-to guides before diving into specifications. Specs are technical references, not learning materials.

If you're implementing features: Use specs as authoritative sources for behavior. Start with the prerequisites of your target specification (listed under Reading Order and Specifications by Topic below) and work up to the target itself.

If you're debugging: Specs clarify exact behavior when documentation or examples are ambiguous.
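For the implementer's path, the prerequisite chain can be worked out mechanically. The sketch below is illustrative only: it copies the Prerequisites column from the tables under Specifications by Topic into a dictionary and uses Python's standard `graphlib` module to order them. The `PREREQUISITES` name and the `reading_path` helper are hypothetical and not part of DataJoint.

```python
from graphlib import TopologicalSorter

# Prerequisites copied from the "Specifications by Topic" tables on this page.
PREREQUISITES = {
    "Table Declaration": [],
    "Master-Part Relationships": ["Table Declaration"],
    "Virtual Schemas": ["Table Declaration"],
    "Query Operators": ["Table Declaration", "Primary Keys"],
    "Semantic Matching": ["Query Operators"],
    "Primary Keys": ["Table Declaration"],
    "Fetch API": ["Query Operators"],
    "Diagram": ["Table Declaration"],
    "Type System": [],
    "Codec API": ["Type System"],
    "<npy> Codec": ["Type System"],
    "Object Store Configuration": ["Type System"],
    "Data Manipulation": ["Table Declaration"],
    "AutoPopulate": ["Table Declaration", "Data Manipulation"],
    "Job Metadata": ["AutoPopulate"],
}


def reading_path(target):
    """Return the target spec preceded by all of its transitive prerequisites."""
    needed = {}
    stack = [target]
    while stack:
        spec = stack.pop()
        if spec not in needed:
            needed[spec] = PREREQUISITES[spec]
            stack.extend(PREREQUISITES[spec])
    # static_order() yields every prerequisite before the spec that needs it.
    return list(TopologicalSorter(needed).static_order())


print(reading_path("Semantic Matching"))
```

Calling `reading_path("Job Metadata")` in the same way lists Table Declaration, Data Manipulation, and AutoPopulate before the target.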

Reading Order

Start Here

Database Backends — Supported databases (MySQL, PostgreSQL)
Table Declaration — How to define tables
Primary Keys — Key propagation rules
Type System — Three-layer type architecture

Next, choose based on your needs:

Working with data? → Data Operations
Building queries? → Query Algebra
Using large data? → Object Storage

Query Algebra

Prerequisites: Table Declaration, Primary Keys

Query Operators — Restrict, proj, join, aggr, union
Semantic Matching — Attribute lineage
Fetch API — Data retrieval

Data Operations

Prerequisites: Table Declaration

Data Manipulation — Insert, update, delete
AutoPopulate — Jobs 2.0 system
Job Metadata — Hidden job tracking columns

Object Storage

Prerequisites: Type System

Object Store Configuration — Store setup
Codec API — Custom type implementation
<npy> Codec — NumPy array storage

Advanced Topics

Master-Part Relationships — Compositional modeling
Virtual Schemas — Schema introspection without source

Document Structure

Each specification follows a consistent structure:

Overview — What this specifies
User Guide — Practical usage
API Reference — Methods and signatures
Concepts — Definitions and rules
Implementation Details — Internal behavior
Examples — Concrete code samples
Best Practices — Recommendations

Specifications by Topic

Schema Definition

Specification	Prerequisites	Related How-To	Related Explanation
Table Declaration	None	Define Tables	Relational Workflow Model
Master-Part Relationships	Table Declaration	Model Relationships	Data Pipelines
Virtual Schemas	Table Declaration	—	—

Key concepts: Table tiers (Manual, Lookup, Imported, Computed, Part), foreign keys, dependency graphs, compositional modeling

Query Algebra

Specification	Prerequisites	Related How-To	Related Explanation
Query Operators	Table Declaration, Primary Keys	Query Data	Query Algebra
Semantic Matching	Query Operators	Model Relationships	Query Algebra
Primary Keys	Table Declaration	Design Primary Keys	Entity Integrity
Fetch API	Query Operators	Fetch Results	—
Diagram	Table Declaration	Read Diagrams	—

Key concepts: Restriction (&, -), projection (.proj()), join (*), aggregation (.aggr()), union, universal set (U()), attribute lineage, schema visualization
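To give a feel for what the first few operators mean, here is a toy model over plain lists of dictionaries. It is a sketch under stated assumptions, not DataJoint's implementation: the function names are hypothetical stand-ins for `&`, `-`, `.proj()`, and `*`, the join matches attributes by name only (the Semantic Matching spec defines the real rule, which uses attribute lineage), and aggregation, union, and `U()` are left out.

```python
def _matches(row, cond):
    """True when the row agrees with every attribute/value pair in cond."""
    return all(row.get(name) == value for name, value in cond.items())


def restrict(rows, cond):
    """Toy stand-in for `table & cond`: keep the rows that match cond."""
    return [row for row in rows if _matches(row, cond)]


def antirestrict(rows, cond):
    """Toy stand-in for `table - cond`: keep the rows that do not match cond."""
    return [row for row in rows if not _matches(row, cond)]


def proj(rows, primary_key, *attrs):
    """Toy stand-in for `.proj()`: keep the primary key plus the named attributes."""
    keep = list(primary_key) + [a for a in attrs if a not in primary_key]
    return [{name: row[name] for name in keep} for row in rows]


def join(left, right):
    """Toy stand-in for `left * right`: pair rows that agree on shared attribute names."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[n] == r_row[n] for n in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical example data, not taken from any DataJoint schema.
subjects = [
    {"subject_id": 1, "species": "mouse"},
    {"subject_id": 2, "species": "rat"},
]
sessions = [
    {"subject_id": 1, "session": 1, "duration": 30},
    {"subject_id": 1, "session": 2, "duration": 45},
    {"subject_id": 2, "session": 1, "duration": 20},
]

print(restrict(sessions, {"subject_id": 1}))
print(antirestrict(sessions, {"subject_id": 1}))
print(proj(sessions, ("subject_id", "session")))
print(join(subjects, sessions))
```

Each function returns a new list and leaves its inputs untouched, which mirrors the idea that query expressions describe results rather than modify tables.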

Type System

Specification	Prerequisites	Related How-To	Related Explanation
Type System	None	Choose a Storage Type	Type System
Codec API	Type System	Create Custom Codec	Custom Codecs
<npy> Codec	Type System	Use Object Storage	—

Key concepts: Native types (backend-specific, e.g. MySQL), core types (portable), codec types (Python objects), in-table vs object storage, addressing schemes

Object Storage

Specification	Prerequisites	Related How-To	Related Explanation
Object Store Configuration	Type System	Configure Object Storage	Data Pipelines (OAS)

Key concepts: Hash-addressed storage (deduplication), schema-addressed storage (browsable paths), filepath storage (user-managed), store configuration, path generation
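The difference between hash-addressed and schema-addressed storage can be illustrated with a short sketch. Everything here is an assumption made for illustration: the SHA-256 digest, the two-level directory fan-out, the `name=value` path segments, and the function names are hypothetical, and the actual path generation rules are the ones defined in the Object Store Configuration specification.

```python
import hashlib
import tempfile
from pathlib import Path


def put_hash_addressed(store_root, payload):
    """Write payload under a path derived from its content digest.

    Identical payloads map to the same path, so a repeated put stores nothing new.
    """
    digest = hashlib.sha256(payload).hexdigest()
    # Hypothetical layout: two fan-out directories, then the full digest.
    path = Path(store_root) / digest[:2] / digest[2:4] / digest
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
    return path


def schema_addressed_key(schema, table, primary_key, field):
    """Build a browsable key from where the value lives in the schema."""
    key_part = "/".join(f"{name}={primary_key[name]}" for name in sorted(primary_key))
    return f"{schema}/{table}/{key_part}/{field}"


with tempfile.TemporaryDirectory() as root:
    first = put_hash_addressed(root, b"same bytes")
    second = put_hash_addressed(root, b"same bytes")
    stored = [p for p in Path(root).rglob("*") if p.is_file()]
    # Both puts resolve to one path and one file on disk.
    print(first == second, len(stored))

print(schema_addressed_key("lab", "recording", {"subject_id": 1, "session": 2}, "trace"))
```

The hash-addressed path says nothing about which row owns the data, which is what makes deduplication possible; the schema-addressed key trades that away for a path a person can browse.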

Data Operations

Specification	Prerequisites	Related How-To	Related Explanation
Data Manipulation	Table Declaration	Insert Data	Normalization
AutoPopulate	Table Declaration, Data Manipulation	Run Computations, Distributed Computing	Computation Model
Job Metadata	AutoPopulate	Handle Errors	Computation Model

Key concepts: Insert patterns, transactional integrity, workflow normalization, Jobs 2.0, job coordination, populate(), make() method, job states
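The relationship between populate() and make() can be pictured with a minimal, in-memory sketch. It assumes only the general idea that populate calls make once for each pending key; the `results` dictionary stands in for a table, the standalone `populate` function is hypothetical, and transactions, job states, and coordination between workers are deliberately omitted. The AutoPopulate and Job Metadata specifications define the real behavior.

```python
def populate(key_source, results, make):
    """Call make(key) once for every key in key_source that has no result yet.

    Returns the number of keys computed during this call.
    """
    computed = 0
    for key in key_source:
        key_id = tuple(sorted(key.items()))  # hashable form of the key
        if key_id in results:
            continue  # already computed, so skip it
        results[key_id] = make(key)
        computed += 1
    return computed


calls = []


def make(key):
    """Hypothetical computation: record the call and derive a value from the key."""
    calls.append(key["session"])
    return key["session"] * 10


results = {}
keys = [{"session": 1}, {"session": 2}]

print(populate(keys, results, make))  # first run computes every key
print(populate(keys, results, make))  # second run finds nothing pending
```

Because completed keys are skipped, running the function again after new keys arrive computes only the new ones.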