Read Schema Diagrams¶
DataJoint diagrams visualize schema structure as directed acyclic graphs (DAGs). This guide teaches you to:
- Interpret line styles and their semantic meaning
- Recognize dimensions (underlined vs non-underlined tables)
- Use diagram operations to explore large schemas
- Compare DataJoint notation to traditional ER diagrams
import datajoint as dj
schema = dj.Schema('howto_diagrams')
schema.drop(prompt=False)
schema = dj.Schema('howto_diagrams')
[2026-02-06 11:44:10] DataJoint 2.1.0 connected to datajoint@127.0.0.1:5432
Quick Reference¶
| Line Style | Relationship | Child's Primary Key |
|---|---|---|
| Thick Solid ━━━ | Extension | Parent PK only (one-to-one) |
| Thin Solid ─── | Containment | Parent PK + own fields (one-to-many) |
| Dashed ┄┄┄ | Reference | Own independent PK (one-to-many) |
Key principle: Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.
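The rule in the table above can be written as a small decision function. This is a minimal sketch in plain Python, not part of the DataJoint API: `line_style` and its arguments are hypothetical names, and it assumes each foreign key sits entirely above or entirely below the `---` line.

```python
def line_style(child_pk, fk_attrs):
    """Classify a foreign key edge by where its attributes sit in the child.

    child_pk: primary key attribute names of the child table
    fk_attrs: attribute names contributed by one foreign key
    (Hypothetical helper for illustration only.)
    """
    child_pk, fk_attrs = set(child_pk), set(fk_attrs)
    if not fk_attrs <= child_pk:
        return "dashed"       # FK is a secondary attribute: independent identity
    if fk_attrs == child_pk:
        return "thick solid"  # FK is the entire PK: one-to-one extension
    return "thin solid"       # FK is part of the PK: one-to-many containment


# CustomerPreferences: the PK is exactly the inherited customer_id
print(line_style({"customer_id"}, {"customer_id"}))                 # thick solid
# Account: the PK is customer_id plus its own account_num
print(line_style({"customer_id", "account_num"}, {"customer_id"}))  # thin solid
# Employee: the PK is employee_id, dept_id is a secondary attribute
print(line_style({"employee_id"}, {"dept_id"}))                     # dashed
```

The three calls correspond to the three examples in the sections that follow.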
Thick Solid Line: Extension (One-to-One)¶
The foreign key is the entire primary key. The child extends the parent.
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int64
    ---
    name : varchar(60)
    """

@schema
class CustomerPreferences(dj.Manual):
    definition = """
    -> Customer                # FK is entire PK
    ---
    theme : varchar(20)
    notifications : bool
    """
dj.Diagram(Customer) + dj.Diagram(CustomerPreferences)
Equivalent ER Diagram:
DataJoint vs ER: The thick solid line immediately shows this is one-to-one. In ER notation, you must read the crow's foot symbols (||--o|).
Note: CustomerPreferences is not underlined — it exists in the Customer dimension space.
Thin Solid Line: Containment (One-to-Many)¶
The foreign key is part of the primary key, with additional fields.
@schema
class Account(dj.Manual):
    definition = """
    -> Customer                # Part of PK
    account_num : int32        # Additional PK field
    ---
    balance : decimal(10,2)
    """
dj.Diagram(Customer) + dj.Diagram(Account)
Equivalent ER Diagram:
DataJoint vs ER: The thin solid line shows containment — accounts belong to customers. In ER, you see ||--o{ (one-to-many).
Note: Account is underlined — it introduces the Account dimension.
Dashed Line: Reference (One-to-Many)¶
The foreign key is a secondary attribute (below the --- line).
@schema
class Department(dj.Manual):
    definition = """
    dept_id : int32
    ---
    dept_name : varchar(60)
    """

@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int64        # Own independent PK
    ---
    -> Department              # Secondary attribute
    employee_name : varchar(60)
    """
dj.Diagram(Department) + dj.Diagram(Employee)
Equivalent ER Diagram:
DataJoint vs ER: Both show one-to-many, but DataJoint's dashed line tells you immediately that Employee has independent identity. In ER, you must examine whether the FK is part of the PK.
Note: Both tables are underlined — each introduces its own dimension.
Dimensions and Underlined Names¶
A dimension is a new entity type introduced by a table that defines new primary key attributes. Each underlined table introduces exactly one dimension—even if it has multiple new PK attributes, together they identify one new entity type.
| Visual | Meaning |
|---|---|
| Underlined | Introduces a new dimension (new entity type) |
| Not underlined | Exists in the space defined by dimensions from referenced tables |
Key rules:
- Computed tables never introduce dimensions (always non-underlined)
- Part tables can introduce dimensions (may be underlined)
@schema
class Subject(dj.Manual):
    definition = """
    subject_id : varchar(16)   # NEW dimension
    ---
    species : varchar(50)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject                 # Inherits subject_id
    session_idx : int32        # NEW dimension
    ---
    session_date : date
    """

@schema
class SessionQC(dj.Computed):
    definition = """
    -> Session                 # Inherits both, adds nothing
    ---
    passed : bool
    """

    def make(self, key):
        self.insert1({**key, 'passed': True})
dj.Diagram(schema)
In this diagram:
- Subject is underlined: it introduces the Subject dimension
- Session is underlined: it introduces the Session dimension (within each Subject)
- SessionQC is not underlined: it exists in the Session dimension space and adds no new dimension
Why this matters: Dimensions determine attribute lineage. Primary key attributes trace back to the dimension where they originated, enabling semantic matching for safe joins.
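To make attribute lineage concrete, the sketch below traces each primary key attribute back to the table that introduced it. It is a plain-Python illustration under stated assumptions, not DataJoint's implementation: the `tables` dictionary, `pk_lineage`, and `is_underlined` are hypothetical names, and only foreign keys above the `---` line are modeled.

```python
# Hypothetical description of the three tables above.
# "own_pk": primary key attributes the table declares itself.
# "parents": tables referenced above the --- line.
tables = {
    "Subject":   {"own_pk": ["subject_id"],  "parents": []},
    "Session":   {"own_pk": ["session_idx"], "parents": ["Subject"]},
    "SessionQC": {"own_pk": [],              "parents": ["Session"]},
}


def pk_lineage(name):
    """Map each primary key attribute of `name` to the table that introduced it."""
    lineage = {}
    for parent in tables[name]["parents"]:
        lineage.update(pk_lineage(parent))   # inherited attributes keep their origin
    for attr in tables[name]["own_pk"]:
        lineage[attr] = name                 # new attributes originate here
    return lineage


def is_underlined(name):
    """A table introduces a dimension when it declares new PK attributes."""
    return bool(tables[name]["own_pk"])


for name in tables:
    print(name, is_underlined(name), pk_lineage(name))
```

Here `SessionQC` has the same lineage as `Session`, which is why it is drawn without an underline.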
Many-to-Many: Converging Lines¶
Many-to-many relationships appear as tables with multiple solid lines converging.
@schema
class Student(dj.Manual):
    definition = """
    student_id : int64
    ---
    name : varchar(60)
    """

@schema
class Course(dj.Manual):
    definition = """
    course_code : char(8)
    ---
    title : varchar(100)
    """

@schema
class Enrollment(dj.Manual):
    definition = """
    -> Student
    -> Course
    ---
    grade : enum('A','B','C','D','F')
    """
dj.Diagram(Student) + dj.Diagram(Course) + dj.Diagram(Enrollment)
Equivalent ER Diagram:
DataJoint vs ER: Both show the association table pattern. DataJoint's converging solid lines immediately indicate the composite primary key.
Note: Enrollment is not underlined — it exists in the space defined by Student × Course dimensions.
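The effect of the composite primary key can be illustrated without a database. The following is a minimal sketch that models Enrollment as a dictionary keyed by `(student_id, course_code)`; the `enroll` helper and the sample values are made up for illustration.

```python
enrollment = {}  # maps (student_id, course_code) -> grade


def enroll(student_id, course_code, grade):
    """Insert one row, rejecting a repeated (student_id, course_code) pair."""
    key = (student_id, course_code)
    if key in enrollment:
        raise KeyError(f"duplicate primary key {key}")
    enrollment[key] = grade


enroll(1, "MATH101", "A")
enroll(1, "PHYS101", "B")   # same student, another course
enroll(2, "MATH101", "C")   # same course, another student

courses_of_student_1 = sorted(c for s, c in enrollment if s == 1)
students_in_math = sorted(s for s, c in enrollment if c == "MATH101")
print(courses_of_student_1)  # ['MATH101', 'PHYS101']
print(students_in_math)      # [1, 2]
```

Neither foreign key alone identifies a row, so one student can appear with many courses and one course with many students, while each pair appears at most once.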
Orange Dots: Renamed Foreign Keys¶
When a table references the same parent more than once, use .proj() to rename the foreign key attributes so their names do not collide. Orange dots indicate renamed FKs.
@schema
class Person(dj.Manual):
    definition = """
    person_id : int64
    ---
    name : varchar(60)
    """

@schema
class Marriage(dj.Manual):
    definition = """
    marriage_id : int64
    ---
    -> Person.proj(spouse1='person_id')
    -> Person.proj(spouse2='person_id')
    marriage_date : date
    """
dj.Diagram(Person) + dj.Diagram(Marriage)
The orange dots between Person and Marriage indicate that projections renamed the foreign key attributes (spouse1 and spouse2 both reference person_id).
Tip: In Jupyter, hover over orange dots to see the projection expression.
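One way to think about the orange dot is as a flag on any foreign key whose attribute names differ from the names they reference. The sketch below expresses that idea in plain Python; `renamed_attributes` and the mapping format are hypothetical and do not reflect how DataJoint stores this information internally.

```python
def renamed_attributes(fk_mapping):
    """Return the FK attributes whose name differs from the attribute they reference.

    fk_mapping: {attribute name in child: attribute name in parent}
    (Hypothetical representation for illustration only.)
    """
    return {child: parent for child, parent in fk_mapping.items() if child != parent}


# Marriage references Person twice, renaming person_id each time
marriage_fks = [{"spouse1": "person_id"}, {"spouse2": "person_id"}]
# Account references Customer without renaming
account_fk = {"customer_id": "customer_id"}

for fk in marriage_fks:
    print(renamed_attributes(fk))      # non-empty: drawn with an orange dot
print(renamed_attributes(account_fk))  # {}: plain edge
```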
Diagram Operations¶
Filter and combine diagrams to explore large schemas:
# Entire schema
dj.Diagram(schema)
# Session and 1 level upstream (dependencies)
dj.Diagram(Session) - 1
# Subject and 2 levels downstream (dependents)
dj.Diagram(Subject) + 2
Operation Reference:
| Operation | Meaning |
|---|---|
| dj.Diagram(schema) | Entire schema |
| dj.Diagram(Table) - N | Table + N levels upstream |
| dj.Diagram(Table) + N | Table + N levels downstream |
| D1 + D2 | Union of two diagrams |
| D1 * D2 | Intersection (common nodes) |
Finding paths: Use intersection to find connection paths:
(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)
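The path-finding expression works because a node lies on a path between two tables exactly when it is both downstream of the first and upstream of the second. The sketch below reproduces that logic on a plain dictionary graph; it is an illustration, not DataJoint's implementation, and `expand`, `children`, and `parents` are hypothetical names.

```python
# Parent -> children edges for some of the tables defined in this guide
children = {
    "Subject": ["Session"],
    "Session": ["SessionQC"],
    "SessionQC": [],
    "Department": ["Employee"],
    "Employee": [],
}

# Reverse the edges to get child -> parents
parents = {name: [] for name in children}
for parent, kids in children.items():
    for kid in kids:
        parents[kid].append(parent)


def expand(start, neighbors, levels):
    """Collect `start` plus every node within `levels` hops along `neighbors`."""
    seen = {start}
    frontier = {start}
    for _ in range(levels):
        frontier = {n for f in frontier for n in neighbors[f]} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


one_up = expand("Session", parents, 1)        # analogous to dj.Diagram(Session) - 1
downstream = expand("Subject", children, 100)  # analogous to dj.Diagram(Subject) + 100
upstream = expand("SessionQC", parents, 100)   # analogous to dj.Diagram(SessionQC) - 100
path = downstream & upstream                   # analogous to the * intersection

print(sorted(one_up))  # ['Session', 'Subject']
print(sorted(path))    # ['Session', 'SessionQC', 'Subject']
```

Department and Employee are absent from the result because they are neither downstream of Subject nor upstream of SessionQC.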
Layout Direction¶
New in DataJoint 2.1
Control the flow direction of diagrams via configuration:
| Direction | Description |
|---|---|
| "TB" | Top to bottom (default) |
| "LR" | Left to right |
# Horizontal layout using config override
with dj.config.override(display__diagram_direction="LR"):
    display(dj.Diagram(schema))
# Generate Mermaid syntax
print((dj.Diagram(Subject) + 2).make_mermaid())
flowchart LR
classDef manual fill:#90EE90,stroke:#006400
classDef lookup fill:#D3D3D3,stroke:#696969
classDef computed fill:#FFB6C1,stroke:#8B0000
classDef imported fill:#ADD8E6,stroke:#00008B
classDef part fill:#FFFFFF,stroke:#000000
classDef collapsed fill:#808080,stroke:#404040
subgraph __main__
Subject[Subject]:::manual
Session[Session]:::manual
SessionQC([SessionQC]):::computed
end
Subject --> Session
Session --> SessionQC
Copy this output into any Mermaid-compatible viewer (GitHub Markdown, MkDocs with mermaid plugin, https://mermaid.live) to render the diagram.
Saving to file:
dj.Diagram(schema).save("pipeline.mmd") # .mmd or .mermaid extension
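To show how little structure the Mermaid text needs, here is a minimal sketch that builds a flowchart from a list of edges and writes it to a .mmd file. It is not how `make_mermaid()` or `save()` are implemented: `to_mermaid` is a hypothetical helper, and it omits the class styling and subgraph grouping seen in the real output above.

```python
import tempfile
from pathlib import Path


def to_mermaid(edges, direction="TB"):
    """Render (parent, child) pairs as Mermaid flowchart text."""
    if direction not in ("TB", "LR"):
        raise ValueError(f"unsupported direction: {direction!r}")
    lines = [f"flowchart {direction}"]
    lines += [f"    {parent} --> {child}" for parent, child in edges]
    return "\n".join(lines)


text = to_mermaid([("Subject", "Session"), ("Session", "SessionQC")], direction="LR")
print(text)

# Write the text to a throwaway .mmd file and read it back
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pipeline.mmd"
    target.write_text(text, encoding="utf-8")
    saved = target.read_text(encoding="utf-8")
```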
Multi-Schema Pipelines¶
Real-world pipelines often span multiple schemas (modules).
New in DataJoint 2.1: Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label.
# Create a second schema for analysis
howto_analysis = dj.Schema('howto_analysis')
howto_analysis.drop(prompt=False)
howto_analysis = dj.Schema('howto_analysis')
# Reference tables from the first schema
@howto_analysis
class Experimenter(dj.Manual):
    definition = """
    experimenter : varchar(32)
    ---
    email : varchar(100)
    """

@howto_analysis
class Experiment(dj.Manual):
    definition = """
    -> Subject                 # Cross-schema reference
    -> Experimenter
    experiment_date : date
    ---
    notes : varchar(1000)
    """

@howto_analysis
class Analysis(dj.Computed):
    definition = """
    -> Experiment
    ---
    result : float64
    """

    def make(self, key):
        self.insert1({**key, 'result': 0.0})
# Combine both schemas - tables are automatically grouped
multi_schema_diagram = dj.Diagram(schema) + dj.Diagram(howto_analysis)
multi_schema_diagram
Tables are grouped by their database schema automatically. The group label shows the Python module name when available (following the DataJoint convention of one module per schema).
Multi-schema diagrams are useful for:
- Visualizing pipelines spanning multiple schemas
- Understanding which tables belong to which module
- Documentation for multi-module architectures
Collapsing Schemas¶
New in DataJoint 2.1
For high-level pipeline views, collapse entire schemas into single nodes using .collapse(). This is useful for showing relationships between modules without the detail of individual tables.
# Show howto_diagrams expanded, howto_analysis collapsed into a single node
dj.Diagram(schema) + dj.Diagram(howto_analysis).collapse()
The collapsed node shows the module name and table count. Edges from the expanded schema connect to the collapsed node.
"Expanded wins" rule: If a table appears in both a collapsed and non-collapsed diagram, it stays expanded. This applies even when expanding a single table from a collapsed schema:
# "Expanded wins": Experimenter is expanded even though howto_analysis is collapsed
dj.Diagram(Subject) + dj.Diagram(Experimenter) + dj.Diagram(howto_analysis).collapse()
Preserving directionality: Collapsing middle layers of a pipeline preserves the DAG structure. The collapsed node sits between expanded tables, maintaining edge directions:
# Create a separate schema for this example
sandwich = dj.Schema('howto_sandwich')
sandwich.drop(prompt=False)
sandwich = dj.Schema('howto_sandwich')
# Linear pipeline: RawData -> Filtered -> Normalized -> FinalResult
@sandwich
class RawData(dj.Manual):
    definition = """
    data_id : int32
    """

@sandwich
class Filtered(dj.Computed):
    definition = """
    -> RawData
    ---
    filtered_value : float32
    """

    def make(self, key):
        pass

@sandwich
class Normalized(dj.Computed):
    definition = """
    -> Filtered
    ---
    normalized_value : float32
    """

    def make(self, key):
        pass

@sandwich
class FinalResult(dj.Computed):
    definition = """
    -> Normalized
    ---
    result : float32
    """

    def make(self, key):
        pass
# Sandwich collapse: expand top and bottom, collapse middle processing steps
dj.Diagram(RawData) + dj.Diagram(FinalResult) + dj.Diagram(sandwich).collapse()
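The "expanded wins" rule and the preserved edge directions can be modeled as a graph contraction. The sketch below is a plain-Python illustration under stated assumptions, not DataJoint's implementation: `collapse_edges` is a hypothetical helper, and the table count in its label reflects only the tables it hides, which may differ from what DataJoint displays.

```python
def collapse_edges(edges, collapsed_tables, expanded_tables, label):
    """Merge hidden tables into one node while keeping edge directions.

    Tables named in both sets stay expanded ("expanded wins").
    """
    hidden = set(collapsed_tables) - set(expanded_tables)
    node = f"{label} ({len(hidden)} tables)"

    def rename(table):
        return node if table in hidden else table

    result = []
    for parent, child in edges:
        edge = (rename(parent), rename(child))
        # Drop edges internal to the collapsed node and avoid duplicates
        if edge[0] != edge[1] and edge not in result:
            result.append(edge)
    return result


pipeline = [
    ("RawData", "Filtered"),
    ("Filtered", "Normalized"),
    ("Normalized", "FinalResult"),
]
collapsed = collapse_edges(
    pipeline,
    collapsed_tables=["RawData", "Filtered", "Normalized", "FinalResult"],
    expanded_tables=["RawData", "FinalResult"],
    label="howto_sandwich",
)
for edge in collapsed:
    print(edge)
```

The two remaining edges still run from RawData into the collapsed node and from the collapsed node into FinalResult, so the result is still a DAG.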
Extended Example: Multi-Module Pipeline¶
Here's a realistic example with three modules that have cross-schema dependencies:
demo_modules/acquisition.py - Core data acquisition:
@schema
class Lab(dj.Manual):
    definition = """lab : varchar(32) ..."""

@schema
class Subject(dj.Manual):
    definition = """subject_id : varchar(16) --- -> Lab ..."""

@schema
class Session(dj.Manual):
    definition = """-> Subject session_date : date ..."""
demo_modules/processing.py - Data processing (references acquisition):
@schema
class ProcessingParams(dj.Lookup):
    definition = """params_id : int16 ..."""

@schema
class ProcessedSession(dj.Computed):
    definition = """-> acquisition.Session -> ProcessingParams ..."""

@schema
class EventDetection(dj.Computed):
    definition = """-> ProcessedSession event_id : int32 ..."""
demo_modules/analysis.py - Analysis (references both modules):
@schema
class AnalysisParams(dj.Lookup):
    definition = """analysis_id : int16 ..."""

@schema
class SubjectAnalysis(dj.Computed):
    definition = """-> acquisition.Subject -> AnalysisParams ..."""

@schema
class CrossSessionAnalysis(dj.Computed):
    definition = """-> acquisition.Subject -> processing.ProcessingParams ..."""
# Import the demo modules
from demo_modules import acquisition, processing, analysis
# Activate schemas (creates tables on first run)
acquisition.schema.activate('demo_acquisition')
processing.schema.activate('demo_processing')
analysis.schema.activate('demo_analysis')
# Drop and recreate for clean state
for s in [analysis.schema, processing.schema, acquisition.schema]:
    s.drop(prompt=False)
acquisition.schema.activate('demo_acquisition')
processing.schema.activate('demo_processing')
analysis.schema.activate('demo_analysis')
# Full pipeline diagram - all modules expanded
# Note: dj.Diagram(module) works when the module has a `schema` attribute
full_pipeline = dj.Diagram(acquisition) + dj.Diagram(processing) + dj.Diagram(analysis)
full_pipeline
The full diagram shows all three modules with cross-schema references:
- acquisition provides core tables (Lab, Subject, Session)
- processing references Session from acquisition
- analysis references Subject from acquisition and ProcessingParams from processing
Now let's see collapse in action:
# Two schemas collapsed: acquisition expanded, downstream modules collapsed
# This shows acquisition's internal structure while abstracting processing & analysis
dj.Diagram(acquisition) + dj.Diagram(processing).collapse() + dj.Diagram(analysis).collapse()
# "Expanded wins": Subject stays expanded even though analysis references it
dj.Diagram(acquisition.Subject) + dj.Diagram(analysis).collapse()
# Schema-level DAG: all modules collapsed
# Shows the dependency structure between modules at a glance
dj.Diagram(acquisition).collapse() + dj.Diagram(processing).collapse() + dj.Diagram(analysis).collapse()
Key observations:
- Collapsed nodes show table count: for example, "processing (3 tables)"
- Cross-schema edges preserved: dependencies between modules are shown as edges between collapsed nodes
- "Expanded wins": if you explicitly include a table (like Subject), it stays expanded even if a collapsed schema references it
- Schema-level DAG: collapsing all schemas reveals the high-level module dependency graph, useful for understanding pipeline architecture
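The schema-level DAG can be derived from table-level edges by keeping only those that cross a module boundary. The following is a minimal sketch using the foreign keys listed in the three demo modules above; `module_of` and `table_edges` are hand-written for illustration rather than read from DataJoint.

```python
# Which module each demo table belongs to
module_of = {
    "Lab": "acquisition", "Subject": "acquisition", "Session": "acquisition",
    "ProcessingParams": "processing", "ProcessedSession": "processing",
    "EventDetection": "processing",
    "AnalysisParams": "analysis", "SubjectAnalysis": "analysis",
    "CrossSessionAnalysis": "analysis",
}

# (parent, child) pairs taken from the definitions shown above
table_edges = [
    ("Lab", "Subject"), ("Subject", "Session"),
    ("Session", "ProcessedSession"), ("ProcessingParams", "ProcessedSession"),
    ("ProcessedSession", "EventDetection"),
    ("Subject", "SubjectAnalysis"), ("AnalysisParams", "SubjectAnalysis"),
    ("Subject", "CrossSessionAnalysis"), ("ProcessingParams", "CrossSessionAnalysis"),
]

# Keep only edges whose endpoints live in different modules
module_edges = sorted(
    {(module_of[a], module_of[b]) for a, b in table_edges if module_of[a] != module_of[b]}
)
for upstream_module, downstream_module in module_edges:
    print(f"{upstream_module} --> {downstream_module}")
```

Nine table-level edges reduce to three module-level edges, which is the view the fully collapsed diagram presents.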
DataJoint vs Traditional ER Notation¶
| Feature | Chen's ER | Crow's Foot | DataJoint |
|---|---|---|---|
| Cardinality | Numbers | Line symbols | Line style |
| Direction | None | None | Top-to-bottom |
| Cycles | Allowed | Allowed | Not allowed |
| PK cascade | Not shown | Not shown | Solid lines |
| Identity sharing | Not indicated | Not indicated | Thick solid |
| New dimensions | Not indicated | Not indicated | Underlined |
Why DataJoint differs:
- DAG structure — No cycles means schemas read as workflows (top-to-bottom)
- Line semantics — Immediately reveals relationship type
- Executable — Diagram is generated from schema, cannot drift out of sync
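Because the dependency graph has no cycles, it always admits a top-to-bottom ordering in which every table appears after the tables it depends on. The sketch below demonstrates this with Python's standard-library `graphlib` on the Subject, Session, SessionQC chain from this guide; the `cyclic` dictionary is a made-up counterexample to show what a cycle would break.

```python
from graphlib import CycleError, TopologicalSorter

# Each table mapped to the set of tables it depends on
depends_on = {
    "Session": {"Subject"},
    "SessionQC": {"Session"},
}
order = list(TopologicalSorter(depends_on).static_order())
print(order)  # ['Subject', 'Session', 'SessionQC']

# A hypothetical pair of tables referencing each other has no valid ordering
cyclic = {"A": {"B"}, "B": {"A"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

The same ordering is what lets a diagram be read as a workflow: upstream tables are populated before the tables that depend on them.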
Summary¶
| Visual | Meaning |
|---|---|
| Thick solid | One-to-one extension |
| Thin solid | One-to-many containment |
| Dashed | Reference (independent identity) |
| Underlined | Introduces new dimension |
| Orange dots | Renamed FK via .proj() |
| Colors | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |
| Grouped boxes | Tables grouped by schema/module |
| 3D box (gray) | Collapsed schema (2.1+) |
| Feature | Method |
|---|---|
| Layout direction | dj.config.display.diagram_direction |
| Mermaid output | .make_mermaid() |
| Collapse schema | .collapse() (2.1+) |
# Cleanup all schemas
# Demo modules
for s in [analysis.schema, processing.schema, acquisition.schema]:
    if s.is_activated():
        s.drop(prompt=False)
# Earlier examples
sandwich.drop(prompt=False)
howto_analysis.drop(prompt=False)
schema.drop(prompt=False)