Query Operators Reference¶
DataJoint provides a small set of operators for querying data. All operators return new query expressions without modifying the original—queries are immutable and composable.
Operator Summary¶
| Operator | Syntax | Description |
|---|---|---|
| Restriction | A & condition |
Select rows matching condition |
| Anti-restriction | A - condition |
Select rows NOT matching condition |
| Projection | A.proj(...) |
Select, rename, or compute attributes |
| Join | A * B |
Combine tables on matching attributes |
| Extension | A.extend(B) |
Add attributes from B, keeping all rows of A |
| Aggregation | A.aggr(B, ...) |
Group B by A's primary key and compute summaries |
| Union | A + B |
Combine entity sets |
Restriction (&)¶
Select rows that match a condition.
# String condition (SQL expression)
Session & "session_date > '2024-01-01'"
Session & "duration BETWEEN 30 AND 60"
# Dictionary (exact match)
Session & {'subject_id': 'M001'}
Session & {'subject_id': 'M001', 'session_idx': 1}
# Query expression (matching keys)
Session & Subject # Sessions for subjects in Subject table
Session & (Subject & "sex = 'M'") # Sessions for male subjects
# List (OR of conditions)
Session & [{'subject_id': 'M001'}, {'subject_id': 'M002'}]
Chaining: Multiple restrictions combine with AND:
Session & "duration > 30" & {"experimenter": "alice"}
Top N Rows (dj.Top)¶
Restrict to the top N rows with optional ordering:
# First row by primary key
Session & dj.Top()
# First 10 rows by primary key (ascending)
Session & dj.Top(10)
# First 10 rows by primary key (descending)
Session & dj.Top(10, 'KEY DESC')
# Top 5 by score descending
Result & dj.Top(5, 'score DESC')
# Top 10 most recent sessions
Session & dj.Top(10, 'session_date DESC')
# Pagination: skip 20, take 10
Session & dj.Top(10, 'session_date DESC', offset=20)
# All rows ordered (no limit)
Session & dj.Top(None, 'session_date DESC')
Parameters:
limit(default=1): Maximum rows. UseNonefor no limit.order_by(default="KEY"): Attribute(s) to sort by."KEY"expands to all primary key attributes. AddDESCfor descending order (e.g.,"KEY DESC","score DESC"). UseNoneto inherit existing order.offset(default=0): Rows to skip.
Chaining Tops: When chaining multiple Top restrictions, the second Top can inherit the first's ordering by using order_by=None:
# First Top sets the order, second inherits it
(Session & dj.Top(100, 'date DESC')) & dj.Top(10, order_by=None)
# Result: top 10 of top 100 by date descending
Note: dj.Top can only be used with restriction (&), not with anti-restriction (-).
Anti-Restriction (-)¶
Select rows that do NOT match a condition.
# Subjects without any sessions
Subject - Session
# Sessions not from subject M001
Session - {'subject_id': 'M001'}
# Sessions without trials
Session - Trial
Projection (.proj())¶
Select, rename, or compute attributes. Primary key is always included.
# Primary key only
Subject.proj()
# Specific attributes
Subject.proj('species', 'sex')
# All attributes
Subject.proj(...)
# All except some
Subject.proj(..., '-notes', '-internal_id')
# Rename attribute
Subject.proj(animal_species='species')
# Computed attribute (SQL expression)
Subject.proj(weight_kg='weight / 1000')
Session.proj(year='YEAR(session_date)')
Trial.proj(is_correct='response = stimulus')
Join (*)¶
Combine tables on shared attributes. DataJoint matches attributes by semantic matching—only attributes with the same name AND same origin (through foreign keys) are matched.
# Join Subject and Session on subject_id
Subject * Session
# Three-way join
Subject * Session * Experimenter
# Join then restrict
(Subject * Session) & "sex = 'M'"
# Restrict then join (equivalent)
(Subject & "sex = 'M'") * Session
Primary key of result: Determined by functional dependencies between operands. See Query Algebra Specification for details.
Extension (.extend())¶
Add attributes from another table while preserving all rows. This is useful for adding optional attributes.
# Add experimenter info to sessions
# Sessions without an experimenter get NULL values
Session.extend(Experimenter)
Requirement: The left operand must "determine" the right operand—all of B's primary key attributes must exist in A.
Aggregation (.aggr())¶
Group one entity type by another and compute summary statistics.
# Count trials per session
Session.aggr(Session.Trial, n_trials='count(trial_idx)')
# Multiple aggregates
Session.aggr(
Session.Trial,
n_trials='count(trial_idx)',
n_correct='sum(correct)',
avg_rt='avg(reaction_time)',
min_rt='min(reaction_time)',
max_rt='max(reaction_time)'
)
# Count sessions per subject
Subject.aggr(Session, n_sessions='count(session_idx)')
Default behavior: Keeps all rows from the grouping table (left operand), even those without matches. Use count(pk_attribute) to get 0 for entities without matches.
# All subjects, including those with 0 sessions
Subject.aggr(Session, n_sessions='count(session_idx)')
# Only subjects with at least one session
Subject.aggr(Session, n_sessions='count(session_idx)', exclude_nonmatching=True)
Common Aggregate Functions¶
| Function | Description |
|---|---|
count(attr) |
Count non-NULL values |
count(*) |
Count all rows (including NULL) |
sum(attr) |
Sum of values |
avg(attr) |
Average |
min(attr) |
Minimum |
max(attr) |
Maximum |
std(attr) |
Standard deviation |
group_concat(attr) |
Concatenate values |
Union (+)¶
Combine entity sets from two tables with the same primary key.
# All subjects that are either mice or rats
Mouse + Rat
Requirements:
- Same primary key attributes
- No overlapping secondary attributes
Universal Set (dj.U())¶
Create ad-hoc groupings or extract unique values.
Unique Values¶
# Unique species
dj.U('species') & Subject
# Unique (year, month) combinations
dj.U('year', 'month') & Session.proj(year='YEAR(session_date)', month='MONTH(session_date)')
Aggregation by Non-Primary-Key Attributes¶
# Count sessions by date (session_date is not a primary key)
dj.U('session_date').aggr(Session, n='count(session_idx)')
# Count by experimenter
dj.U('experimenter_id').aggr(Session, n='count(session_idx)')
Universal Aggregation (Single Row Result)¶
# Total count across all sessions
dj.U().aggr(Session, total='count(*)')
# Global statistics
dj.U().aggr(Trial,
total='count(*)',
avg_rt='avg(reaction_time)',
std_rt='std(reaction_time)'
)
Operator Precedence¶
Python operator precedence applies:
| Precedence | Operator | Operation |
|---|---|---|
| Highest | * |
Join |
+, - |
Union, Anti-restriction | |
| Lowest | & |
Restriction |
Use parentheses to make intent clear:
# Join happens before restriction
Subject * Session & condition # Same as: (Subject * Session) & condition
# Use parentheses to restrict first
(Subject & condition) * Session
Semantic Matching¶
DataJoint uses semantic matching for joins and restrictions by query expression. Attributes match only if they have:
- The same name
- The same origin (traced through foreign key lineage)
This prevents accidental matches on attributes that happen to share names but represent different things (like generic id columns in unrelated tables).
# These match on subject_id because Session references Subject
Subject * Session # Correct: subject_id has same lineage
# These would error if both have 'name' from different origins
Student * Course # Error if both define their own 'name' attribute
Resolution: Rename attributes to avoid conflicts:
Student * Course.proj(..., course_name='name')
See Also¶
- Query Algebra Specification — Complete formal specification
- Fetch API — Retrieving query results
- Queries Tutorial — Hands-on examples