Queries¶
This tutorial covers how to query data in DataJoint. You'll learn:
- Restriction (
&,-) — Filtering rows - Top (
dj.Top) — Limiting and ordering results - Projection (
.proj()) — Selecting and computing columns - Join (
*) — Combining tables - Extension (
.extend()) — Adding optional attributes - Aggregation (
.aggr()) — Grouping and summarizing - Fetching — Retrieving data in various formats
DataJoint queries are lazy—they build SQL expressions that execute only when you fetch data.
import datajoint as dj
import numpy as np
schema = dj.Schema('tutorial_queries')
[2026-02-06 11:44:51] DataJoint 2.1.0 connected to datajoint@127.0.0.1:5432
# Define tables for this tutorial
@schema
class Subject(dj.Manual):
definition = """
subject_id : varchar(16)
---
species : varchar(50)
date_of_birth : date
sex : enum('M', 'F', 'U')
weight : decimal(4,1) # grams
"""
@schema
class Experimenter(dj.Manual):
definition = """
experimenter_id : varchar(16)
---
full_name : varchar(100)
"""
@schema
class Session(dj.Manual):
definition = """
-> Subject
session_idx : int32
---
-> Experimenter
session_date : date
duration : decimal(4,1) # minutes
"""
class Trial(dj.Part):
definition = """
-> master
trial_idx : int32
---
stimulus : varchar(50)
response : varchar(50)
correct : bool
reaction_time : decimal(3,2) # seconds
"""
# Insert sample data
import random
random.seed(42)
Experimenter.insert([
{'experimenter_id': 'alice', 'full_name': 'Alice Smith'},
{'experimenter_id': 'bob', 'full_name': 'Bob Jones'},
])
subjects = [
{'subject_id': 'M001', 'species': 'Mus musculus',
'date_of_birth': '2026-01-15', 'sex': 'M', 'weight': 25.3},
{'subject_id': 'M002', 'species': 'Mus musculus',
'date_of_birth': '2026-02-01', 'sex': 'F', 'weight': 22.1},
{'subject_id': 'M003', 'species': 'Mus musculus',
'date_of_birth': '2026-02-15', 'sex': 'M', 'weight': 26.8},
{'subject_id': 'R001', 'species': 'Rattus norvegicus',
'date_of_birth': '2024-01-01', 'sex': 'F', 'weight': 280.5},
]
Subject.insert(subjects)
# Insert sessions
sessions = [
{'subject_id': 'M001', 'session_idx': 1, 'experimenter_id': 'alice',
'session_date': '2026-01-06', 'duration': 45.0},
{'subject_id': 'M001', 'session_idx': 2, 'experimenter_id': 'alice',
'session_date': '2026-01-07', 'duration': 50.0},
{'subject_id': 'M002', 'session_idx': 1, 'experimenter_id': 'bob',
'session_date': '2026-01-06', 'duration': 40.0},
{'subject_id': 'M002', 'session_idx': 2, 'experimenter_id': 'bob',
'session_date': '2026-01-08', 'duration': 55.0},
{'subject_id': 'M003', 'session_idx': 1, 'experimenter_id': 'alice',
'session_date': '2026-01-07', 'duration': 35.0},
]
Session.insert(sessions)
# Insert trials
trials = []
for s in sessions:
for i in range(10):
trials.append({
'subject_id': s['subject_id'],
'session_idx': s['session_idx'],
'trial_idx': i + 1,
'stimulus': random.choice(['left', 'right']),
'response': random.choice(['left', 'right']),
'correct': random.random() > 0.3,
'reaction_time': random.uniform(0.2, 0.8)
})
Session.Trial.insert(trials)
print(f"Subjects: {len(Subject())}, Sessions: {len(Session())}, "
f"Trials: {len(Session.Trial())}")
Subjects: 4, Sessions: 5, Trials: 50
# Simple comparison
Subject & "weight > 25"
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
Total: 3
# Date comparison
Session & "session_date > '2026-01-06'"
| subject_id | session_idx | experimenter_id | session_date | duration |
|---|---|---|---|---|
| M001 | 2 | alice | 2026-01-07 | 50.0 |
| M002 | 2 | bob | 2026-01-08 | 55.0 |
| M003 | 1 | alice | 2026-01-07 | 35.0 |
Total: 3
# Multiple conditions with AND
Subject & "sex = 'M' AND weight > 25"
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 2
Dictionary Conditions¶
Dictionaries specify exact matches:
# Single attribute
Subject & {'sex': 'F'}
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
Total: 2
# Multiple attributes (AND)
Session & {'subject_id': 'M001', 'session_idx': 1}
| subject_id | session_idx | experimenter_id | session_date | duration |
|---|---|---|---|---|
| M001 | 1 | alice | 2026-01-06 | 45.0 |
Total: 1
Restriction by Query Expression¶
Restrict by another query expression. DataJoint uses semantic matching: attributes with the same name are matched only if they share the same origin through foreign key lineage. This prevents accidental matches on unrelated attributes that happen to share names (like generic id columns in unrelated tables).
See Semantic Matching for the full specification.
# Subjects that have at least one session
Subject & Session
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 3
# Subjects without any sessions (R001 has no sessions)
Subject - Session
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
Total: 1
Collection Conditions (OR)¶
Lists create OR conditions:
# Either of these subjects
Subject & [{'subject_id': 'M001'}, {'subject_id': 'M002'}]
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
Total: 2
Chaining Restrictions¶
Sequential restrictions combine with AND:
# These are equivalent
result1 = Subject & "sex = 'M'" & "weight > 25"
result2 = (Subject & "sex = 'M'") & "weight > 25"
print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")
Result 1: 2 rows Result 2: 2 rows
Top Restriction (dj.Top)¶
dj.Top is a special restriction that limits and orders query results. Unlike fetch-time order_by and limit, dj.Top applies within the query itself, making it composable with other operators.
query & dj.Top(limit=N, order_by='attr DESC', offset=M)
This is useful when you need the "top N" rows as part of a larger query—for example, the 5 highest-scoring trials per session.
# Top 2 heaviest subjects
Subject & dj.Top(limit=2, order_by='weight DESC')
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 2
# Skip first 2, then get next 2 (pagination)
Subject & dj.Top(limit=2, order_by='weight DESC', offset=2)
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
Total: 2
# Combine with other restrictions
(Subject & "sex = 'M'") & dj.Top(limit=1, order_by='weight DESC')
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 1
When to use dj.Top vs fetch-time order_by/limit:
- Use
dj.Topwhen the limited result needs to be joined or restricted further - Use fetch-time parameters (
to_dicts(order_by=..., limit=...)) for final output
Note: Some databases (including MySQL 8.0) don't support LIMIT in certain subquery contexts. If you encounter this limitation, fetch the keys first and use them as a restriction:
# Get trials only from the 2 longest sessions
# Workaround: fetch keys first, then use as restriction
longest_session_keys = (Session & dj.Top(limit=2, order_by='duration DESC')).keys()
Session.Trial & longest_session_keys
| subject_id | session_idx | trial_idx | stimulus | response | correct | reaction_time |
|---|---|---|---|---|---|---|
| M001 | 2 | 1 | right | left | True | 0.23 |
| M001 | 2 | 2 | left | right | True | 0.71 |
| M001 | 2 | 3 | left | right | False | 0.58 |
| M001 | 2 | 4 | right | left | True | 0.33 |
| M001 | 2 | 5 | right | left | True | 0.30 |
| M001 | 2 | 6 | left | left | True | 0.36 |
| M001 | 2 | 7 | left | right | True | 0.67 |
| M001 | 2 | 8 | left | left | True | 0.44 |
| M001 | 2 | 9 | left | left | True | 0.54 |
| M001 | 2 | 10 | right | left | True | 0.44 |
| M002 | 2 | 1 | right | left | False | 0.66 |
| M002 | 2 | 2 | left | right | True | 0.27 |
...
Total: 20
# Primary key only (no arguments)
Subject.proj()
| subject_id |
|---|
| M001 |
| M002 |
| M003 |
| R001 |
Total: 4
# Primary key + specific attributes
Subject.proj('species', 'sex')
| subject_id | species | sex |
|---|---|---|
| M001 | Mus musculus | M |
| M002 | Mus musculus | F |
| M003 | Mus musculus | M |
| R001 | Rattus norvegicus | F |
Total: 4
# All attributes (using ellipsis)
Subject.proj(...)
| subject_id | species | date_of_birth | sex | weight |
|---|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
Total: 4
# All except specific attributes
Subject.proj(..., '-weight')
| subject_id | species | date_of_birth | sex |
|---|---|---|---|
| M001 | Mus musculus | 2026-01-15 | M |
| M002 | Mus musculus | 2026-02-01 | F |
| M003 | Mus musculus | 2026-02-15 | M |
| R001 | Rattus norvegicus | 2024-01-01 | F |
Total: 4
Renaming Attributes¶
# Rename 'species' to 'animal_species'
Subject.proj(animal_species='species')
| subject_id | animal_species |
|---|---|
| M001 | Mus musculus |
| M002 | Mus musculus |
| M003 | Mus musculus |
| R001 | Rattus norvegicus |
Total: 4
Computed Attributes¶
# Arithmetic computation
Subject.proj('species', weight_kg='weight / 1000')
| subject_id | species | weight_kg |
|---|---|---|
| M001 | Mus musculus | 0.02530000000000000000 |
| M002 | Mus musculus | 0.02210000000000000000 |
| M003 | Mus musculus | 0.02680000000000000000 |
| R001 | Rattus norvegicus | 0.28050000000000000000 |
Total: 4
# Date functions
Session.proj('session_date', year='YEAR(session_date)', month='MONTH(session_date)')
| subject_id | session_idx | session_date | year | month |
|---|---|---|---|---|
| M001 | 1 | 2026-01-06 | 2026 | 1 |
| M001 | 2 | 2026-01-07 | 2026 | 1 |
| M002 | 1 | 2026-01-06 | 2026 | 1 |
| M002 | 2 | 2026-01-08 | 2026 | 1 |
| M003 | 1 | 2026-01-07 | 2026 | 1 |
Total: 5
Join (*)¶
Join combines tables on shared attributes. Unlike SQL, which offers many join variants (INNER, LEFT, RIGHT, FULL, CROSS, NATURAL), DataJoint provides one rigorous join operator with strict semantic rules.
The * operator:
- Matches only semantically compatible attributes (same name AND same origin via foreign key lineage)
- Produces a result with a valid primary key determined by functional dependencies
- Follows clear algebraic properties
This simplicity makes DataJoint queries unambiguous and composable.
# Join Subject and Session on subject_id
Subject * Session
| subject_id | session_idx | experimenter_id | session_date | duration | species | date_of_birth | sex | weight |
|---|---|---|---|---|---|---|---|---|
| M001 | 1 | alice | 2026-01-06 | 45.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M001 | 2 | alice | 2026-01-07 | 50.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | 1 | bob | 2026-01-06 | 40.0 | Mus musculus | 2026-02-01 | F | 22.1 |
| M002 | 2 | bob | 2026-01-08 | 55.0 | Mus musculus | 2026-02-01 | F | 22.1 |
| M003 | 1 | alice | 2026-01-07 | 35.0 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 5
# Join then restrict
(Subject * Session) & "sex = 'M'"
| subject_id | session_idx | experimenter_id | session_date | duration | species | date_of_birth | sex | weight |
|---|---|---|---|---|---|---|---|---|
| M001 | 1 | alice | 2026-01-06 | 45.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M001 | 2 | alice | 2026-01-07 | 50.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M003 | 1 | alice | 2026-01-07 | 35.0 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 3
# Restrict then join (equivalent result)
(Subject & "sex = 'M'") * Session
| subject_id | session_idx | experimenter_id | session_date | duration | species | date_of_birth | sex | weight |
|---|---|---|---|---|---|---|---|---|
| M001 | 1 | alice | 2026-01-06 | 45.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M001 | 2 | alice | 2026-01-07 | 50.0 | Mus musculus | 2026-01-15 | M | 25.3 |
| M003 | 1 | alice | 2026-01-07 | 35.0 | Mus musculus | 2026-02-15 | M | 26.8 |
Total: 3
# Three-way join
(Subject * Session * Experimenter).proj('species', 'session_date', 'full_name')
| subject_id | session_idx | session_date | species | full_name |
|---|---|---|---|---|
| M001 | 1 | 2026-01-06 | Mus musculus | Alice Smith |
| M001 | 2 | 2026-01-07 | Mus musculus | Alice Smith |
| M002 | 1 | 2026-01-06 | Mus musculus | Bob Jones |
| M002 | 2 | 2026-01-08 | Mus musculus | Bob Jones |
| M003 | 1 | 2026-01-07 | Mus musculus | Alice Smith |
Total: 5
Primary Keys in Join Results¶
Every query result has a valid primary key. For joins, the result's primary key depends on functional dependencies between the operands:
| Condition | Result Primary Key |
|---|---|
A → B (A determines B) |
PK(A) |
B → A (B determines A) |
PK(B) |
| Both | PK(A) |
| Neither | PK(A) ∪ PK(B) |
"A determines B" means all of B's primary key attributes exist in A (as primary or secondary attributes).
In our example:
Sessionhas PK:(subject_id, session_idx)Trialhas PK:(subject_id, session_idx, trial_idx)
Since Session's PK is a subset of Trial's PK, Session → Trial. The join Session * Trial has the same primary key as Session.
See the Query Algebra Specification for the complete functional dependency rules.
Extension (.extend())¶
Sometimes you want to add attributes from a related table without losing rows that lack matching entries. The extend operator is a specialized join for this purpose.
A.extend(B) is equivalent to a left join: it preserves all rows from A, adding B's attributes where matches exist (with NULL where they don't).
Requirement: A must "determine" B—all of B's primary key attributes must exist in A. This ensures the result maintains A's entity identity.
# Session contains experimenter_id (FK to Experimenter)
# extend adds Experimenter's attributes while keeping all Sessions
Session.extend(Experimenter)
| subject_id | session_idx | experimenter_id | session_date | duration | full_name |
|---|---|---|---|---|---|
| M001 | 1 | alice | 2026-01-06 | 45.0 | Alice Smith |
| M001 | 2 | alice | 2026-01-07 | 50.0 | Alice Smith |
| M002 | 1 | bob | 2026-01-06 | 40.0 | Bob Jones |
| M002 | 2 | bob | 2026-01-08 | 55.0 | Bob Jones |
| M003 | 1 | alice | 2026-01-07 | 35.0 | Alice Smith |
Total: 5
Why extend instead of join?
A regular join (*) would exclude sessions if their experimenter wasn't in the Experimenter table. Extend preserves all sessions, filling in NULL for missing experimenter data. This is essential when you want to add optional attributes without filtering your results.
Aggregation (.aggr())¶
DataJoint aggregation operates entity-to-entity: you aggregate one entity type with respect to another. This differs fundamentally from SQL's GROUP BY, which groups by arbitrary attribute sets.
In DataJoint:
Session.aggr(Trial, n_trials='count(*)')
This reads: "For each Session entity, aggregate its associated Trial entities."
The equivalent SQL would be:
SELECT session.*, COUNT(*) as n_trials
FROM session
JOIN trial USING (subject_id, session_idx)
GROUP BY session.subject_id, session.session_idx
The key insight: aggregation always groups by the primary key of the left operand. This enforces meaningful groupings—you aggregate over well-defined entities, not arbitrary attribute combinations.
# Count trials per session
Session.aggr(Session.Trial, n_trials='count(*)')
| subject_id | session_idx | n_trials |
|---|---|---|
| M001 | 1 | 10 |
| M001 | 2 | 10 |
| M002 | 1 | 10 |
| M002 | 2 | 10 |
| M003 | 1 | 10 |
Total: 5
# Multiple aggregates
Session.aggr(
Session.Trial,
n_trials='count(*)',
n_correct='sum(CASE WHEN correct THEN 1 ELSE 0 END)',
avg_rt='avg(reaction_time)'
)
| subject_id | session_idx | n_trials | n_correct | avg_rt |
|---|---|---|---|---|
| M001 | 1 | 10 | 8 | 0.50800000000000000000 |
| M001 | 2 | 10 | 9 | 0.46000000000000000000 |
| M002 | 1 | 10 | 7 | 0.45900000000000000000 |
| M002 | 2 | 10 | 6 | 0.50300000000000000000 |
| M003 | 1 | 10 | 6 | 0.51100000000000000000 |
Total: 5
# Count sessions per subject
Subject.aggr(Session, n_sessions='count(*)')
| subject_id | n_sessions |
|---|---|
| M001 | 2 |
| M002 | 2 |
| M003 | 1 |
| R001 | 1 |
Total: 4
The exclude_nonmatching Parameter¶
By default, aggregation keeps all entities from the grouping table, even those without matches. This ensures you see zeros rather than missing rows.
However, count(*) counts the NULL-joined row as 1. To correctly count 0 for entities without matches, use count(pk_attribute) which excludes NULLs:
# All subjects, including those without sessions (n_sessions=0)
# count(session_idx) returns 0 for NULLs, unlike count(*)
Subject.aggr(Session, n_sessions='count(session_idx)')
| subject_id | n_sessions |
|---|---|
| M001 | 2 |
| M002 | 2 |
| M003 | 1 |
| R001 | 0 |
Total: 4
# Only subjects that have at least one session (exclude those without matches)
Subject.aggr(Session, n_sessions='count(session_idx)', exclude_nonmatching=True)
| subject_id | n_sessions |
|---|---|
| M001 | 2 |
| M002 | 2 |
| M003 | 1 |
Total: 3
Universal Set (dj.U())¶
What if you need to aggregate but there's no appropriate entity to group by? DataJoint provides dj.U() (the "universal set") for these cases.
dj.U() (no attributes) represents the singleton entity—the "one universe." Aggregating against it produces a single row with global statistics.
dj.U('attr1', 'attr2') creates an ad-hoc grouping entity from the specified attributes. This enables aggregation when no table exists with those attributes as its primary key.
For example, suppose you want to count sessions by session_date, but no table has session_date as its primary key. You can use dj.U('session_date') to create the grouping:
# Group by session_date (not a primary key in any table)
dj.U('session_date').aggr(Session, n_sessions='count(*)', total_duration='sum(duration)')
| session_date | n_sessions | total_duration |
|---|---|---|
| 2026-01-06 | 2 | 85.0 |
| 2026-01-07 | 2 | 85.0 |
| 2026-01-08 | 1 | 55.0 |
Total: 3
# Universal aggregation: dj.U() with no attributes produces one row
# This aggregates against the singleton "universe"
dj.U().aggr(Session, total_sessions='count(*)', avg_duration='avg(duration)')
| total_sessions | avg_duration |
|---|---|
| 5 | 45.0000000000000000 |
Total: 1
# Group by experimenter_id (a foreign key in Session, not part of Session's PK)
# Without dj.U(), we couldn't aggregate sessions by experimenter
dj.U('experimenter_id').aggr(Session, n_sessions='count(*)')
| experimenter_id | n_sessions |
|---|---|
| alice | 3 |
| bob | 2 |
Total: 2
# Unique values
dj.U('species') & Subject
| species |
|---|
| Mus musculus |
| Rattus norvegicus |
Total: 2
# Get all rows as list of dicts
rows = Subject.to_dicts()
rows[:2]
[{'subject_id': 'M001',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 1, 15),
'sex': 'M',
'weight': Decimal('25.3')},
{'subject_id': 'M002',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 2, 1),
'sex': 'F',
'weight': Decimal('22.1')}]
to_pandas() — DataFrame¶
# Get as pandas DataFrame (primary key as index)
df = Subject.to_pandas()
df
| species | date_of_birth | sex | weight | |
|---|---|---|---|---|
| subject_id | ||||
| M001 | Mus musculus | 2026-01-15 | M | 25.3 |
| M002 | Mus musculus | 2026-02-01 | F | 22.1 |
| M003 | Mus musculus | 2026-02-15 | M | 26.8 |
| R001 | Rattus norvegicus | 2024-01-01 | F | 280.5 |
to_arrays() — NumPy Arrays¶
# Structured array (all columns)
arr = Subject.to_arrays()
arr
array([('M001', 'Mus musculus', datetime.date(2026, 1, 15), 'M', Decimal('25.3')),
('M002', 'Mus musculus', datetime.date(2026, 2, 1), 'F', Decimal('22.1')),
('M003', 'Mus musculus', datetime.date(2026, 2, 15), 'M', Decimal('26.8')),
('R001', 'Rattus norvegicus', datetime.date(2024, 1, 1), 'F', Decimal('280.5'))],
dtype=[('subject_id', 'O'), ('species', 'O'), ('date_of_birth', 'O'), ('sex', 'O'), ('weight', 'O')])
# Specific columns as separate arrays
species, weights = Subject.to_arrays('species', 'weight')
print(f"Species: {species}")
print(f"Weights: {weights}")
Species: ['Mus musculus' 'Mus musculus' 'Mus musculus' 'Rattus norvegicus']
Weights: [Decimal('25.3') Decimal('22.1') Decimal('26.8') Decimal('280.5')]
keys() — Primary Keys¶
# Get primary keys for iteration
keys = Session.keys()
keys[:3]
[{'subject_id': 'M001', 'session_idx': 1},
{'subject_id': 'M001', 'session_idx': 2},
{'subject_id': 'M002', 'session_idx': 1}]
fetch1() — Single Row¶
# Fetch one row (raises error if not exactly 1)
row = (Subject & {'subject_id': 'M001'}).fetch1()
row
{'subject_id': 'M001',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 1, 15),
'sex': 'M',
'weight': Decimal('25.3')}
# Fetch specific attributes from one row
species, weight = (Subject & {'subject_id': 'M001'}).fetch1('species', 'weight')
print(f"{species}: {weight}g")
Mus musculus: 25.3g
Ordering and Limiting¶
# Sort by weight descending, get top 2
Subject.to_dicts(order_by='weight DESC', limit=2)
[{'subject_id': 'R001',
'species': 'Rattus norvegicus',
'date_of_birth': datetime.date(2024, 1, 1),
'sex': 'F',
'weight': Decimal('280.5')},
{'subject_id': 'M003',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 2, 15),
'sex': 'M',
'weight': Decimal('26.8')}]
# Sort by primary key
Subject.to_dicts(order_by='KEY')
[{'subject_id': 'M001',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 1, 15),
'sex': 'M',
'weight': Decimal('25.3')},
{'subject_id': 'M002',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 2, 1),
'sex': 'F',
'weight': Decimal('22.1')},
{'subject_id': 'M003',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 2, 15),
'sex': 'M',
'weight': Decimal('26.8')},
{'subject_id': 'R001',
'species': 'Rattus norvegicus',
'date_of_birth': datetime.date(2024, 1, 1),
'sex': 'F',
'weight': Decimal('280.5')}]
Lazy Iteration¶
Iterating directly over a table streams rows efficiently:
# Stream rows (single database cursor)
for row in Subject:
print(f"{row['subject_id']}: {row['species']}")
M001: Mus musculus M002: Mus musculus M003: Mus musculus R001: Rattus norvegicus
Query Composition¶
Queries are composable and immutable. Build complex queries step by step:
# Build a complex query step by step
male_mice = Subject & "sex = 'M'" & "species LIKE '%musculus%'"
sessions_with_subject = male_mice * Session
alice_sessions = sessions_with_subject & {'experimenter_id': 'alice'}
result = alice_sessions.proj('session_date', 'duration', 'weight')
result
| subject_id | session_idx | session_date | duration | weight |
|---|---|---|---|---|
| M001 | 1 | 2026-01-06 | 45.0 | 25.3 |
| M001 | 2 | 2026-01-07 | 50.0 | 25.3 |
| M003 | 1 | 2026-01-07 | 35.0 | 26.8 |
Total: 3
# Or as a single expression
((Subject & "sex = 'M'" & "species LIKE '%musculus%'")
* Session
& {'experimenter_id': 'alice'}
).proj('session_date', 'duration', 'weight')
| subject_id | session_idx | session_date | duration | weight |
|---|---|---|---|---|
| M001 | 1 | 2026-01-06 | 45.0 | 25.3 |
| M001 | 2 | 2026-01-07 | 50.0 | 25.3 |
| M003 | 1 | 2026-01-07 | 35.0 | 26.8 |
Total: 3
Operator Precedence¶
Python operator precedence applies:
*(join) — highest+,-(union, anti-restriction)&(restriction) — lowest
Use parentheses for clarity:
# Without parentheses: join happens first
# Subject * Session & condition means (Subject * Session) & condition
# With parentheses: explicit order
result1 = (Subject & "sex = 'M'") * Session # Restrict then join
result2 = Subject * (Session & "duration > 40") # Restrict then join
print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")
Result 1: 3 rows Result 2: 3 rows
Quick Reference¶
Operators¶
| Operation | Syntax | Description |
|---|---|---|
| Restrict | A & cond |
Select matching rows |
| Anti-restrict | A - cond |
Select non-matching rows |
| Top | A & dj.Top(limit, order_by) |
Limit/order results |
| Project | A.proj(...) |
Select/compute columns |
| Join | A * B |
Combine tables |
| Extend | A.extend(B) |
Add B's attributes, keep all A rows |
| Aggregate | A.aggr(B, ...) |
Group and summarize |
| Union | A + B |
Combine entity sets |
Fetch Methods¶
| Method | Returns | Use Case |
|---|---|---|
to_dicts() |
list[dict] |
JSON, iteration |
to_pandas() |
DataFrame |
Data analysis |
to_arrays() |
np.ndarray |
Numeric computation |
to_arrays('a', 'b') |
tuple[array, ...] |
Specific columns |
keys() |
list[dict] |
Primary keys |
fetch1() |
dict |
Single row |
See the Query Algebra Specification and Fetch API for complete details.
Next Steps¶
- Computation — Building computational pipelines
# Cleanup
schema.drop(prompt=False)