Design Primary Keys¶
Choose effective primary keys for your tables.
Primary Key Principles¶
Primary key attributes:
- Uniquely identify each entity
- Cannot be NULL
- Cannot be changed after insertion
- Are inherited by dependent tables via foreign keys
Natural Keys¶
Use meaningful identifiers when they exist:
@schema
class Subject(dj.Manual):
definition = """
subject_id : varchar(16) # Lab-assigned ID like 'M001'
---
species : varchar(32)
"""
Good candidates: - Lab-assigned IDs - Standard identifiers (NCBI accession, DOI) - Meaningful codes with enforced uniqueness
Composite Keys¶
Combine attributes when a single attribute isn't unique:
@schema
class Session(dj.Manual):
definition = """
-> Subject
session_idx : int32 # Session number within subject
---
session_date : date
"""
The primary key is (subject_id, session_idx).
Surrogate Keys¶
Use UUIDs when natural keys don't exist:
@schema
class Experiment(dj.Manual):
definition = """
experiment_id : uuid
---
description : varchar(500)
"""
Generate UUIDs:
import uuid
Experiment.insert1({
'experiment_id': uuid.uuid4(),
'description': 'Pilot study'
})
Why DataJoint Avoids Auto-Increment¶
DataJoint discourages auto_increment for primary keys:
-
Encourages lazy design โ Users treat it as "row number" rather than thinking about what uniquely identifies the entity in their domain.
-
Incompatible with composite keys โ DataJoint schemas routinely use composite keys like
(subject_id, session_idx, trial_idx). MySQL allows only one auto_increment column per table, and it must be first in the key. -
Breaks reproducibility โ Auto_increment values depend on insertion order. Rebuilding a pipeline produces different IDs.
-
No client-server handshake โ The client discovers the ID only after insertion, complicating error handling and concurrent access.
-
Meaningless foreign keys โ Downstream tables inherit opaque integers rather than traceable lineage.
Instead, use: - Natural keys that identify entities in your domain - UUIDs when no natural identifier exists - Composite keys combining foreign keys with sequence numbers
Foreign Keys in Primary Key¶
Foreign keys above the --- become part of the primary key:
@schema
class Trial(dj.Manual):
definition = """
-> Session # In primary key
trial_idx : int32 # In primary key
---
-> Stimulus # NOT in primary key
outcome : enum('hit', 'miss')
"""
Key Design Guidelines¶
Keep Keys Small¶
Prefer int32 over int64 when the range allows:
# Good: Appropriate size
session_idx : int32 # Max 65,535 sessions per subject
# Avoid: Unnecessarily large
session_idx : int64 # Wastes space, slower joins
Use Fixed-Width for Joins¶
Fixed-width types join faster:
# Good: Fixed width
subject_id : char(8)
# Acceptable: Variable width
subject_id : varchar(16)
Avoid Dates as Primary Keys¶
Dates alone rarely guarantee uniqueness:
# Bad: Date might not be unique
session_date : date
---
...
# Good: Add a sequence number
-> Subject
session_idx : int32
---
session_date : date
Avoid Computed Values¶
Primary keys should be stable inputs, not derived:
# Bad: Derived from other data
hash_id : varchar(64) # MD5 of some content
# Good: Assigned identifier
recording_id : uuid
Migration Considerations¶
Once a table has data, primary keys cannot be changed. Plan carefully:
# Consider future needs
@schema
class Scan(dj.Manual):
definition = """
-> Session
scan_idx : int16 # Might need int32 for high-throughput
---
...
"""
See Also¶
- Define Tables โ Table definition syntax
- Model Relationships โ Foreign key patterns