DataJoint Table Declaration Specification¶
Overview¶
This document specifies the table declaration mechanism in DataJoint Python. Table declarations define the schema structure using a domain-specific language (DSL) embedded in Python class definitions.
1. Table Class Structure¶
1.1 Basic Declaration Pattern¶
@schema
class TableName(dj.Manual):
definition = """
# table comment
primary_attr : int32
---
secondary_attr : float64
"""
1.2 Table Tiers¶
| Tier | Base Class | Table Prefix | Purpose |
|---|---|---|---|
| Manual | dj.Manual |
(none) | User-entered data |
| Lookup | dj.Lookup |
# |
Reference/enumeration data |
| Imported | dj.Imported |
_ |
Data from external sources |
| Computed | dj.Computed |
__ |
Derived from other tables |
| Part | dj.Part |
master__ |
Detail records of master table |
1.3 Class Naming Rules¶
- Format: Strict CamelCase (e.g.,
MyTable,SessionAnalysis) - Pattern:
^[A-Z][A-Za-z0-9]*$ - Conversion: CamelCase to snake_case for SQL table name
- Examples:
SessionTrial->session_trialProcessedEMG->processed_emg
1.4 Table Name Constraints¶
- Maximum length: 64 characters (MySQL limit)
- Final name: prefix + snake_case(class_name)
- Validation: Checked at declaration time
2. Definition String Grammar¶
2.1 Overall Structure¶
[table_comment]
primary_key_section
---
secondary_section
2.2 Table Comment (Optional)¶
# Free-form description of the table purpose
- Must be first non-empty line if present
- Starts with
# - Cannot start with
#: - Stored in MySQL table COMMENT
2.3 Primary Key Separator¶
---
- Three or more dashes
- Separates primary key attributes (above) from secondary attributes (below)
- Required if table has secondary attributes
2.4 Line Types¶
Each non-empty, non-comment line is one of:
- Attribute definition
- Foreign key reference
- Index declaration
2.5 Singleton Tables (Empty Primary Keys)¶
New in 2.1
Singleton tables were introduced in DataJoint 2.1.
A singleton table can hold at most one row. It is declared with no attributes in the primary key section:
@schema
class Config(dj.Lookup):
definition = """
# Global configuration
---
setting1 : varchar(100)
setting2 : int32
"""
Behavior:
| Operation | Result |
|---|---|
| Insert | Works without specifying a key |
| Second insert | Raises DuplicateError |
fetch1() |
Returns the single row |
heading.primary_key |
Returns [] (empty) |
Use cases:
- Global configuration settings
- Pipeline parameters
- Summary statistics
- State tracking
Implementation:
Internally, singleton tables use a hidden _singleton attribute of type bool as the primary key. This attribute is:
- Automatically created and populated
- Excluded from
heading.attributes - Excluded from
fetch()results - Excluded from join matching
3. Attribute Definition¶
3.1 Syntax¶
attribute_name [= default_value] : type [# comment]
3.2 Components¶
| Component | Required | Description |
|---|---|---|
attribute_name |
Yes | Identifier for the column |
default_value |
No | Default value (before colon) |
type |
Yes | Data type specification |
comment |
No | Documentation (after #) |
3.3 Attribute Name Rules¶
- Pattern:
^[a-z_][a-z0-9_]*$ - Start: Lowercase letter or underscore
- Contains: Lowercase letters, digits, underscores
- Convention: snake_case
3.4 Hidden Attributes¶
Attributes with names starting with an underscore (_) are hidden. The hidden-attribute mechanism is reserved for platform-managed columns โ bookkeeping that DataJoint itself adds to support the data pipeline โ and is intentionally not exposed for user-defined attributes. Attempting to declare an attribute name with a leading underscore raises:
DataJointError: Attribute name in line "_hidden: bool" starts with an underscore.
Names with leading underscore are reserved for platform-managed columns
(e.g. _job_start_time, _singleton). Use a regular attribute name; if you
need to control visibility at the call site, use proj().
Platform-managed hidden attributes are added automatically when DataJoint declares certain table types. Users do not write these in the definition; the framework injects them programmatically after parsing.
| Hidden attribute | Added to | Purpose |
|---|---|---|
_job_start_time |
Computed, Imported |
Wall-clock start of the populate call |
_job_duration |
Computed, Imported |
Elapsed seconds for the populate call |
_job_version |
Computed, Imported |
Library version that produced the row |
_singleton |
Singleton tables | Implementation detail of the singleton pattern |
These columns are populated by DataJoint internals via raw SQL during the populate() lifecycle, not via insert/update1. They are filtered out of every public API surface so they don't clutter joins, fetches, or displays.
Behavior. The filter is implemented in Heading.attributes, which all visible code paths consume; raw SQL strings bypass it.
| Context | Hidden attributes |
|---|---|
heading.attributes, heading.names, heading.primary_key |
Excluded |
heading._attributes (internal) |
Included |
Table display / repr / _repr_html_ |
Excluded |
fetch(), fetch1(), to_dicts(), to_pandas() (default) |
Excluded |
fetch("_name") / fetch1("_name") (explicit) |
Rejected (Attribute not found) โ use raw SQL via conn.query(...) |
proj("_name") (explicit) |
Rejected (same reason) |
| Natural-join namesake matching | Excluded |
Dict restriction Table & {"_name": value} |
Silently ignored |
String restriction Table & "_name = ..." |
Included (passes to SQL) |
Top(order_by="_name") |
Allowed (passes through to SQL ORDER BY; no heading validation) |
insert(), insert1() |
Rejected โ KeyError("`_name` is not in the table heading") |
update1() |
Rejected โ DataJointError("Attribute `_name` not found.") |
insert(..., ignore_extra_fields=True) |
Silently dropped (key not written) |
describe() / reverse-engineered definition |
Excluded |
unique index (..., _name) |
Allowed |
Why users can't declare them. Allowing user-defined hidden attributes would expose a feature with no public-API write path (insert/update1 reject the keys; ignore_extra_fields=True drops them silently), no describe() round-trip (the regenerated definition would be missing the column), and silent filtering on dict restrictions. The cases users typically reach for hidden attributes โ most commonly an index-backing derived column โ are better served by a regular attribute.
Inspecting platform-managed hidden columns:
# Default fetch โ hidden columns excluded
results = MyTable.to_dicts()
# To inspect platform-managed hidden columns, query raw SQL.
# The public API (fetch / proj) intentionally rejects them.
conn = MyTable.connection
rows = conn.query(
f"SELECT _job_start_time, _job_duration, _job_version "
f"FROM {MyTable.full_table_name}"
).fetchall()
# String restriction works (passes through to SQL)
MyTable & "_job_start_time > '2024-01-01'"
# Dict restriction is silently dropped โ does NOT filter
MyTable & {'_job_start_time': some_date} # โ ignored
Use a regular attribute instead. When you want a column that's part of the schema-level contract (backing an index, storing a derived value, etc.) but isn't featured in default displays, declare it as a regular attribute and use proj() at the call site if you want to omit it from a particular query result. For example, a hash column backing a unique index:
@schema
class TaskParams(dj.Manual):
definition = """
task_id : int32
---
tool : varchar(32)
params : json
params_hash : varchar(32)
unique index (tool, params_hash)
"""
# Inserts work directly:
TaskParams.insert1({'task_id': 1, 'tool': 't', 'params': {...}, 'params_hash': h})
# Dict restrictions work:
TaskParams & {'params_hash': h}
# Hide from a specific result set with proj() if needed:
TaskParams.proj('tool', 'params').fetch()
3.5 Examples¶
definition = """
# Experimental session with subject and timing info
session_id : int32 # auto-assigned
---
subject_name : varchar(100) # subject identifier
trial_number = 1 : int32 # default to 1
score = null : float32 # nullable
timestamp = CURRENT_TIMESTAMP : datetime # auto-timestamp
notes = '' : varchar(4000) # empty default
"""
4. Type System¶
4.1 Core Types¶
Scientist-friendly type names with guaranteed semantics:
| Type | SQL Mapping | Size | Description |
|---|---|---|---|
int8 |
tinyint |
1 byte | 8-bit signed integer |
int16 |
tinyint unsigned |
1 byte | 8-bit unsigned integer |
int16 |
smallint |
2 bytes | 16-bit signed integer |
int32 |
smallint unsigned |
2 bytes | 16-bit unsigned integer |
int32 |
int |
4 bytes | 32-bit signed integer |
int64 |
int unsigned |
4 bytes | 32-bit unsigned integer |
int64 |
bigint |
8 bytes | 64-bit signed integer |
int64 |
bigint unsigned |
8 bytes | 64-bit unsigned integer |
float32 |
float |
4 bytes | 32-bit IEEE 754 float |
float64 |
double |
8 bytes | 64-bit IEEE 754 float |
bool |
tinyint |
1 byte | Boolean (0 or 1) |
uuid |
binary(16) |
16 bytes | UUID stored as binary |
bytes |
longblob |
Variable | Binary data (up to 4GB) |
4.2 String Types¶
| Type | SQL Mapping | Description |
|---|---|---|
char(N) |
char(N) |
Fixed-length string |
varchar(N) |
varchar(N) |
Variable-length string (max N) |
enum('a','b',...) |
enum(...) |
Enumerated values |
4.3 Temporal Types¶
| Type | SQL Mapping | Description |
|---|---|---|
date |
date |
Date (YYYY-MM-DD) |
datetime |
datetime |
Date and time |
datetime(N) |
datetime(N) |
With fractional seconds (0-6) |
4.4 Other Types¶
| Type | SQL Mapping | Description |
|---|---|---|
json |
json |
JSON document |
decimal(P,S) |
decimal(P,S) |
Fixed-point decimal |
4.5 Native SQL Types (Passthrough)¶
These SQL types are accepted but generate a warning recommending core types:
- Integer variants:
tinyint,smallint,mediumint,bigint,integer,serial - Float variants:
float,double,real(with size specifiers) - Text variants:
tinytext,mediumtext,longtext - Blob variants:
tinyblob,smallblob,mediumblob,longblob - Temporal:
time,timestamp,year - Numeric:
numeric(P,S)
4.6 Codec Types¶
Format: <codec_name> or <codec_name@store>
| Codec | In-table dtype | In-store dtype | Purpose |
|---|---|---|---|
<blob> |
bytes |
<hash> |
Serialized Python objects |
<hash> |
N/A (in-store only) | json |
Hash-addressed deduped storage |
<attach> |
bytes |
<hash> |
File attachments with filename |
<filepath> |
N/A (in-store only) | json |
Reference to managed file |
<object> |
N/A (in-store only) | json |
Object storage (Zarr, HDF5) |
In-store storage syntax:
- <blob@> - default store
- <blob@store_name> - named store
4.7 Type Reconstruction¶
Core types and codecs are stored in the SQL COMMENT field for reconstruction:
COMMENT ':float32:user comment here'
COMMENT ':<blob@store>:user comment'
5. Default Values¶
5.1 Syntax¶
attribute_name = default_value : type
5.2 Literal Types¶
| Value | Meaning | SQL |
|---|---|---|
null |
Nullable attribute | DEFAULT NULL |
CURRENT_TIMESTAMP |
Server timestamp | DEFAULT CURRENT_TIMESTAMP |
"string" or 'string' |
String literal | DEFAULT "string" |
123 |
Numeric literal | DEFAULT 123 |
true/false |
Boolean | DEFAULT 1/DEFAULT 0 |
5.3 Constant Literals¶
These values are used without quotes in SQL:
- NULL
- CURRENT_TIMESTAMP
5.4 Nullable Attributes¶
score = null : float32
- The special default
null(case-insensitive) makes the attribute nullable - Nullable attributes can be omitted from INSERT
- Primary key attributes CANNOT be nullable
5.5 Blob/JSON Default Restrictions¶
Blob and JSON attributes can only have null as default:
# Valid
data = null : <blob>
# Invalid - raises DataJointError
data = '' : <blob>
6. Foreign Key References¶
6.1 Syntax¶
-> [options] ReferencedTable
6.2 Options¶
| Option | Effect |
|---|---|
nullable |
All inherited attributes become nullable |
unique |
Creates UNIQUE INDEX on FK attributes |
Options are comma-separated in brackets:
-> [nullable, unique] ParentTable
6.3 Attribute Inheritance¶
Foreign keys automatically inherit all primary key attributes from the referenced table:
# Parent
class Subject(dj.Manual):
definition = """
subject_id : int32
---
name : varchar(100)
"""
# Child - inherits subject_id
class Session(dj.Manual):
definition = """
-> Subject
session_id : int32
---
session_date : date
"""
6.4 Position Rules¶
| Position | Effect |
|---|---|
Before --- |
FK attributes become part of primary key |
After --- |
FK attributes are secondary |
6.5 Nullable Foreign Keys¶
-> [nullable] OptionalParent
- Only allowed after
---(secondary) - Primary key FKs cannot be nullable
- Creates optional relationship
6.6 Unique Foreign Keys¶
-> [unique] ParentTable
- Creates UNIQUE INDEX on inherited attributes
- Enforces one-to-one relationship from child perspective
6.7 Nullable Unique Foreign Keys¶
-> [nullable, unique] ParentTable
- Combines nullable and unique constraints
- Multiple rows can have NULL values (SQL standard: NULLs are not considered equal in UNIQUE constraints)
- At most one row per non-NULL parent reference
- Use case: optional one-to-one relationships where the child may not reference any parent
6.8 Projections in Foreign Keys¶
-> Parent.proj(alias='original_name')
- Reference same table multiple times with different attribute names
- Useful for self-referential or multi-reference patterns
6.8 Referential Actions¶
All foreign keys use:
- ON UPDATE CASCADE - Parent key changes propagate
- ON DELETE RESTRICT - Cannot delete parent with children
6.9 Lineage Tracking¶
Foreign key relationships are recorded in the ~lineage table:
{
'child_attr': ('parent_schema.parent_table', 'parent_attr')
}
Used for semantic attribute matching in queries.
7. Index Declarations¶
7.1 Syntax¶
index(attr1, attr2, ...)
unique index(attr1, attr2, ...)
7.2 Examples¶
definition = """
# User contact information
user_id : int32
---
first_name : varchar(50)
last_name : varchar(50)
email : varchar(100)
index(last_name, first_name)
unique index(email)
"""
7.3 Computed Expressions¶
Indexes can include SQL expressions:
index(last_name, (YEAR(birth_date)))
7.4 Limitations¶
- Cannot be altered after table creation (via
table.alter()) - Must reference existing attributes
8. Part Tables¶
8.1 Declaration¶
@schema
class Master(dj.Manual):
definition = """
master_id : int32
"""
class Detail(dj.Part):
definition = """
-> master
detail_id : int32
---
value : float32
"""
8.2 Naming¶
- SQL name:
master_table__part_name - Example:
experiment__trial
8.3 Master Reference¶
Within Part definition, use:
- -> master (lowercase keyword)
- -> MasterClassName (class name)
8.4 Constraints¶
- Parts must reference their master
- Cannot delete Part records directly (use master)
- Cannot drop Part table directly (use master)
- Part inherits master's primary key
9. Auto-Populated Tables¶
9.1 Classes¶
dj.Imported- Data from external sourcesdj.Computed- Derived from other DataJoint tables
9.2 Primary Key Constraint¶
All primary key attributes must come from foreign key references.
Valid:
class Analysis(dj.Computed):
definition = """
-> Session
-> Parameter
---
result : float64
"""
Invalid (by default):
class Analysis(dj.Computed):
definition = """
-> Session
analysis_id : int32 # ERROR: non-FK primary key
---
result : float64
"""
Override:
dj.config['jobs.allow_new_pk_fields_in_computed_tables'] = True
9.3 Job Metadata¶
When config['jobs.add_job_metadata'] = True, auto-populated tables receive:
| Column | Type | Description |
|---|---|---|
_job_start_time |
datetime(3) |
Job start timestamp |
_job_duration |
float64 |
Duration in seconds |
_job_version |
varchar(64) |
Code version |
10. Validation¶
10.1 Parse-Time Checks¶
| Check | Error |
|---|---|
| Unknown type | DataJointError: Unsupported attribute type |
| Invalid attribute name | DataJointError: Declaration error |
Comment starts with : |
DataJointError: comment must not start with colon |
| Non-null blob default | DataJointError: default value for blob can only be NULL |
10.2 Declaration-Time Checks¶
| Check | Error |
|---|---|
| Table name > 64 chars | DataJointError: Table name exceeds max length |
| No primary key | DataJointError: Table must have a primary key |
| Nullable primary key attr | DataJointError: Primary key attributes cannot be nullable |
| Invalid CamelCase | DataJointError: Invalid table name |
| FK resolution failure | DataJointError: Foreign key reference could not be resolved |
10.3 Insert-Time Validation¶
The table.validate() method checks:
- Required fields present
- NULL constraints satisfied
- Primary key completeness
- Codec validation (if defined)
- UUID format
- JSON serializability
11. SQL Generation¶
11.1 CREATE TABLE Template¶
CREATE TABLE `schema`.`table_name` (
`attr1` TYPE1 NOT NULL COMMENT "...",
`attr2` TYPE2 DEFAULT NULL COMMENT "...",
PRIMARY KEY (`pk1`, `pk2`),
FOREIGN KEY (`fk_attr`) REFERENCES `parent` (`pk`)
ON UPDATE CASCADE ON DELETE RESTRICT,
INDEX (`idx_attr`),
UNIQUE INDEX (`uniq_attr`)
) ENGINE=InnoDB COMMENT="table comment"
11.2 Type Comment Encoding¶
Core types and codecs are preserved in comments:
`value` float NOT NULL COMMENT ":float32:measurement value"
`data` longblob DEFAULT NULL COMMENT ":<blob>:serialized data"
`archive` json DEFAULT NULL COMMENT ":<blob@cold>:in-store data"
12. Implementation Files¶
| File | Purpose |
|---|---|
declare.py |
Definition parsing, SQL generation |
heading.py |
Attribute metadata, type reconstruction |
table.py |
Base Table class, declaration interface |
user_tables.py |
Tier classes (Manual, Computed, etc.) |
schemas.py |
Schema binding, table decoration |
codecs.py |
Codec registry and resolution |
lineage.py |
Attribute lineage tracking |
13. Future Considerations¶
Potential improvements identified for the declaration system:
- Better error messages with suggestions and context
- Import-time validation via
__init_subclass__ - Parser alternatives (regex-based for simpler grammar)
- SQL dialect abstraction for multi-database support
- Extended constraints (CHECK, custom validation)
- Migration support for schema evolution
- Definition caching for performance
- IDE tooling support via structured intermediate representation