Specifications
Formal specifications of DataJoint's data model and behavior.
These documents define how DataJoint works at a detailed level. They serve as authoritative references for:
- Understanding exact behavior of operations
- Implementing compatible tools and extensions
- Debugging complex scenarios
How to Use These Specifications
If you're new to DataJoint: Start with the tutorials and how-to guides before diving into specifications. Specs are technical references, not learning materials.
If you're implementing features: Use specs as authoritative sources for behavior. Start with dependencies (see below) and work up to your target specification.
If you're debugging: Specs clarify exact behavior when documentation or examples are ambiguous.
Reading Order
Start Here
- Database Backends: Supported databases (MySQL, PostgreSQL)
- Table Declaration: How to define tables
- Primary Keys: Key propagation rules
- Type System: Three-layer type architecture
Next, choose based on your needs:
- Working with data? → Data Operations
- Building queries? → Query Algebra
- Using large data? → Object Storage
Query Algebra
Prerequisites: Table Declaration, Primary Keys
- Query Operators: Restrict, proj, join, aggr, union
- Semantic Matching: Attribute lineage
- Fetch API: Data retrieval
Data Operations
Prerequisites: Table Declaration
- Data Manipulation: Insert, update, delete
- AutoPopulate: Jobs 2.0 system
- Job Metadata: Hidden job tracking columns
Object Storage
Prerequisites: Type System
- Object Store Configuration: Store setup
- Codec API: Custom type implementation
- <npy> Codec: NumPy array storage
Advanced Topics
- Master-Part Relationships: Compositional modeling
- Virtual Schemas: Schema introspection without source
Document Structure
Each specification follows a consistent structure:
- Overview: What this specifies
- User Guide: Practical usage
- API Reference: Methods and signatures
- Concepts: Definitions and rules
- Implementation Details: Internal behavior
- Examples: Concrete code samples
- Best Practices: Recommendations
Specifications by Topic
Schema Definition
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Table Declaration | None | Define Tables | Relational Workflow Model |
| Master-Part Relationships | Table Declaration | Model Relationships | Data Pipelines |
| Virtual Schemas | Table Declaration | - | - |
Key concepts: Table tiers (Manual, Lookup, Imported, Computed, Part), foreign keys, dependency graphs, compositional modeling
Query Algebra
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Query Operators | Table Declaration, Primary Keys | Query Data | Query Algebra |
| Semantic Matching | Query Operators | Model Relationships | Query Algebra |
| Primary Keys | Table Declaration | Design Primary Keys | Entity Integrity |
| Fetch API | Query Operators | Fetch Results | - |
| Diagram | Table Declaration | Read Diagrams | - |
Key concepts: Restriction (&, -), projection (.proj()), join (*), aggregation (.aggr()), union, universal set (U()), attribute lineage, schema visualization
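As a rough orientation for the operators listed above, the following sketch mimics restriction, anti-restriction, projection, and join over plain lists of dictionaries. It is not DataJoint's implementation and uses no DataJoint calls: the helper names, the sample rows, and the attribute names are all hypothetical. The join here matches on shared attribute names only, whereas the Semantic Matching specification defines matching by attribute lineage, so treat this as a simplified mental model.

```python
# Illustrative plain-Python analogues; all names below are hypothetical.

def matches(row, cond):
    """True if the row agrees with every attribute/value pair in cond."""
    return all(row.get(k) == v for k, v in cond.items())

def restrict(rows, cond):
    """Analogue of `table & cond`: keep rows that match cond."""
    return [r for r in rows if matches(r, cond)]

def antirestrict(rows, cond):
    """Analogue of `table - cond`: keep rows that do not match cond."""
    return [r for r in rows if not matches(r, cond)]

def proj(rows, primary_key, *attrs):
    """Analogue of `.proj(...)`: primary key attributes are always kept."""
    keep = list(primary_key) + [a for a in attrs if a not in primary_key]
    return [{a: r[a] for a in keep} for r in rows]

def join(left, right):
    """Analogue of `left * right`, matching on shared attribute names only."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Hypothetical sample data
subjects = [
    {"subject_id": 1, "species": "mouse"},
    {"subject_id": 2, "species": "rat"},
]
sessions = [
    {"subject_id": 1, "session": 1},
    {"subject_id": 1, "session": 2},
    {"subject_id": 2, "session": 1},
]

mice = restrict(subjects, {"species": "mouse"})
not_mice = antirestrict(subjects, {"species": "mouse"})
keys_only = proj(subjects, ["subject_id"])
joined = join(subjects, sessions)
```

The Query Operators specification remains the authority on edge cases such as restriction by another table, renamed attributes, and aggregation, none of which this sketch covers.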
Type System
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Type System | None | Choose a Storage Type | Type System |
| Codec API | Type System | Create Custom Codec | Custom Codecs |
| <npy> Codec | Type System | Use Object Storage | - |
Key concepts: Native types (MySQL), core types (portable), codec types (Python objects), in-table vs object storage, addressing schemes
Object Storage
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Object Store Configuration | Type System | Configure Object Storage | Data Pipelines (OAS) |
Key concepts: Hash-addressed storage (deduplication), schema-addressed storage (browsable paths), filepath storage (user-managed), store configuration, path generation
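To make the deduplication property of hash-addressed storage concrete, here is a minimal in-memory sketch. It assumes SHA-256 as the hash and a two-level fan-out path layout; both are illustrative assumptions, as is the `HashAddressedStore` class itself. The actual hash function and path generation rules are defined in the Object Store Configuration specification.

```python
import hashlib

class HashAddressedStore:
    """Hypothetical in-memory stand-in for a hash-addressed object store."""

    def __init__(self):
        self.objects = {}  # maps storage path -> stored bytes

    def put(self, data: bytes) -> str:
        """Store data under a path derived from its content; return the path."""
        digest = hashlib.sha256(data).hexdigest()  # assumed hash function
        path = f"{digest[:2]}/{digest[2:4]}/{digest}"  # assumed path layout
        # Identical content maps to the same path, so it is stored only once.
        self.objects.setdefault(path, data)
        return path

    def get(self, path: str) -> bytes:
        """Retrieve the bytes stored at the given path."""
        return self.objects[path]

store = HashAddressedStore()
first = store.put(b"same payload")
second = store.put(b"same payload")   # duplicate content
other = store.put(b"other payload")   # different content
```

Because the path depends only on the content, writing the same bytes twice yields the same path and a single stored object. Schema-addressed and filepath storage trade this property for browsable or user-managed paths.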
Data Operations
| Specification | Prerequisites | Related How-To | Related Explanation |
|---|---|---|---|
| Data Manipulation | Table Declaration | Insert Data | Normalization |
| AutoPopulate | Table Declaration, Data Manipulation | Run Computations, Distributed Computing | Computation Model |
| Job Metadata | AutoPopulate | Handle Errors | Computation Model |
Key concepts: Insert patterns, transactional integrity, workflow normalization, Jobs 2.0, job coordination, populate(), make() method, job states
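The relationship between populate() and make() can be pictured with a small stand-alone sketch: for each key that has not been computed yet, call make on it and record the outcome. Everything here is a simplification under stated assumptions. The table is modeled as a plain dictionary, this `make` returns a value instead of inserting into a table, and the state strings are hypothetical placeholders rather than the job states defined by the AutoPopulate and Job Metadata specifications.

```python
def populate(key_source, table, make):
    """Call make(key) for every key in key_source that is not yet in table.

    Returns a dict of hypothetical per-key outcomes for this run.
    """
    states = {}
    for key in key_source:
        k = tuple(sorted(key.items()))  # hashable form of the key
        if k in table:
            continue  # already computed, nothing to do
        try:
            table[k] = make(key)
            states[k] = "success"
        except Exception as exc:
            # Record the failure and keep going with the remaining keys.
            states[k] = f"error: {exc}"
    return states

def make(key):
    """Hypothetical computation for one key."""
    if key["subject_id"] == 3:
        raise ValueError("no data")
    return key["subject_id"] * 10

key_source = [{"subject_id": 1}, {"subject_id": 2}, {"subject_id": 3}]
table = {(("subject_id", 1),): 10}  # subject 1 was computed earlier

first_run = populate(key_source, table, make)
second_run = populate(key_source, table, make)  # only the failed key is retried
```

Running the loop twice shows the idempotent character of this pattern: keys already present are skipped, so only the failed key is attempted again.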