Queries¶

This tutorial covers how to query data in DataJoint. You'll learn:

Restriction (&, -) — Filtering rows
Top (dj.Top) — Limiting and ordering results
Projection (.proj()) — Selecting and computing columns
Join (*) — Combining tables
Extension (.extend()) — Adding optional attributes
Aggregation (.aggr()) — Grouping and summarizing
Fetching — Retrieving data in various formats

DataJoint queries are lazy—they build SQL expressions that execute only when you fetch data.

In [1]:

Copied!

import datajoint as dj
import numpy as np

schema = dj.Schema('tutorial_queries')
import datajoint as dj
import numpy as np

schema = dj.Schema('tutorial_queries')

[2026-05-18 19:42:52] DataJoint 2.2.2 connected to postgres@postgres:5432

In [2]:

Copied!





# Define tables for this tutorial
@schema
class Subject(dj.Manual):
    definition = """
    subject_id : varchar(16)
    ---
    species : varchar(50)
    date_of_birth : date
    sex : enum('M', 'F', 'U')
    weight : decimal(4,1)        # grams
    """

@schema
class Experimenter(dj.Manual):
    definition = """
    experimenter_id : varchar(16)
    ---
    full_name : varchar(100)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int32
    ---
    -> Experimenter
    session_date : date
    duration : decimal(4,1)      # minutes
    """

    class Trial(dj.Part):
        definition = """
        -> master
        trial_idx : int32
        ---
        stimulus : varchar(50)
        response : varchar(50)
        correct : bool
        reaction_time : decimal(3,2)  # seconds
        """
# Define tables for this tutorial
@schema
class Subject(dj.Manual):
    definition = """
    subject_id : varchar(16)
    ---
    species : varchar(50)
    date_of_birth : date
    sex : enum('M', 'F', 'U')
    weight : decimal(4,1)        # grams
    """

@schema
class Experimenter(dj.Manual):
    definition = """
    experimenter_id : varchar(16)
    ---
    full_name : varchar(100)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int32
    ---
    -> Experimenter
    session_date : date
    duration : decimal(4,1)      # minutes
    """

    class Trial(dj.Part):
        definition = """
        -> master
        trial_idx : int32
        ---
        stimulus : varchar(50)
        response : varchar(50)
        correct : bool
        reaction_time : decimal(3,2)  # seconds
        """

In [3]:

Copied!





# Insert sample data
import random
random.seed(42)

Experimenter.insert([
    {'experimenter_id': 'alice', 'full_name': 'Alice Smith'},
    {'experimenter_id': 'bob', 'full_name': 'Bob Jones'},
])

subjects = [
    {'subject_id': 'M001', 'species': 'Mus musculus',
     'date_of_birth': '2026-01-15', 'sex': 'M', 'weight': 25.3},
    {'subject_id': 'M002', 'species': 'Mus musculus',
     'date_of_birth': '2026-02-01', 'sex': 'F', 'weight': 22.1},
    {'subject_id': 'M003', 'species': 'Mus musculus',
     'date_of_birth': '2026-02-15', 'sex': 'M', 'weight': 26.8},
    {'subject_id': 'R001', 'species': 'Rattus norvegicus',
     'date_of_birth': '2024-01-01', 'sex': 'F', 'weight': 280.5},
]
Subject.insert(subjects)

# Insert sessions
sessions = [
    {'subject_id': 'M001', 'session_idx': 1, 'experimenter_id': 'alice',
     'session_date': '2026-01-06', 'duration': 45.0},
    {'subject_id': 'M001', 'session_idx': 2, 'experimenter_id': 'alice',
     'session_date': '2026-01-07', 'duration': 50.0},
    {'subject_id': 'M002', 'session_idx': 1, 'experimenter_id': 'bob',
     'session_date': '2026-01-06', 'duration': 40.0},
    {'subject_id': 'M002', 'session_idx': 2, 'experimenter_id': 'bob',
     'session_date': '2026-01-08', 'duration': 55.0},
    {'subject_id': 'M003', 'session_idx': 1, 'experimenter_id': 'alice',
     'session_date': '2026-01-07', 'duration': 35.0},
]
Session.insert(sessions)

# Insert trials
trials = []
for s in sessions:
    for i in range(10):
        trials.append({
            'subject_id': s['subject_id'],
            'session_idx': s['session_idx'],
            'trial_idx': i + 1,
            'stimulus': random.choice(['left', 'right']),
            'response': random.choice(['left', 'right']),
            'correct': random.random() > 0.3,
            'reaction_time': random.uniform(0.2, 0.8)
        })
Session.Trial.insert(trials)

print(f"Subjects: {len(Subject())}, Sessions: {len(Session())}, "
      f"Trials: {len(Session.Trial())}")
# Insert sample data
import random
random.seed(42)

Experimenter.insert([
    {'experimenter_id': 'alice', 'full_name': 'Alice Smith'},
    {'experimenter_id': 'bob', 'full_name': 'Bob Jones'},
])

subjects = [
    {'subject_id': 'M001', 'species': 'Mus musculus',
     'date_of_birth': '2026-01-15', 'sex': 'M', 'weight': 25.3},
    {'subject_id': 'M002', 'species': 'Mus musculus',
     'date_of_birth': '2026-02-01', 'sex': 'F', 'weight': 22.1},
    {'subject_id': 'M003', 'species': 'Mus musculus',
     'date_of_birth': '2026-02-15', 'sex': 'M', 'weight': 26.8},
    {'subject_id': 'R001', 'species': 'Rattus norvegicus',
     'date_of_birth': '2024-01-01', 'sex': 'F', 'weight': 280.5},
]
Subject.insert(subjects)

# Insert sessions
sessions = [
    {'subject_id': 'M001', 'session_idx': 1, 'experimenter_id': 'alice',
     'session_date': '2026-01-06', 'duration': 45.0},
    {'subject_id': 'M001', 'session_idx': 2, 'experimenter_id': 'alice',
     'session_date': '2026-01-07', 'duration': 50.0},
    {'subject_id': 'M002', 'session_idx': 1, 'experimenter_id': 'bob',
     'session_date': '2026-01-06', 'duration': 40.0},
    {'subject_id': 'M002', 'session_idx': 2, 'experimenter_id': 'bob',
     'session_date': '2026-01-08', 'duration': 55.0},
    {'subject_id': 'M003', 'session_idx': 1, 'experimenter_id': 'alice',
     'session_date': '2026-01-07', 'duration': 35.0},
]
Session.insert(sessions)

# Insert trials
trials = []
for s in sessions:
    for i in range(10):
        trials.append({
            'subject_id': s['subject_id'],
            'session_idx': s['session_idx'],
            'trial_idx': i + 1,
            'stimulus': random.choice(['left', 'right']),
            'response': random.choice(['left', 'right']),
            'correct': random.random() > 0.3,
            'reaction_time': random.uniform(0.2, 0.8)
        })
Session.Trial.insert(trials)

print(f"Subjects: {len(Subject())}, Sessions: {len(Session())}, "
      f"Trials: {len(Session.Trial())}")

Subjects: 4, Sessions: 5, Trials: 50

Restriction (`&` and `-`)¶

Restriction filters rows based on conditions. Use & to select matching rows, - to exclude them.

String Conditions¶

SQL expressions using attribute names:

In [4]:

Copied!

# Simple comparison
Subject & "weight > 25"
# Simple comparison
Subject & "weight > 25"

Out[4]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M003	Mus musculus	2026-02-15	M	26.8
R001	Rattus norvegicus	2024-01-01	F	280.5

Total: 3

In [5]:

Copied!

# Date comparison
Session & "session_date > '2026-01-06'"
# Date comparison
Session & "session_date > '2026-01-06'"

Out[5]:

subject_id	session_idx	experimenter_id	session_date	duration
M001	2	alice	2026-01-07	50.0
M002	2	bob	2026-01-08	55.0
M003	1	alice	2026-01-07	35.0

Total: 3

In [6]:

Copied!

# Multiple conditions with AND
Subject & "sex = 'M' AND weight > 25"
# Multiple conditions with AND
Subject & "sex = 'M' AND weight > 25"

Out[6]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M003	Mus musculus	2026-02-15	M	26.8

Total: 2

Dictionary Conditions¶

Dictionaries specify exact matches:

In [7]:

Copied!

# Single attribute
Subject & {'sex': 'F'}
# Single attribute
Subject & {'sex': 'F'}

Out[7]:

subject_id	species	date_of_birth	sex	weight
M002	Mus musculus	2026-02-01	F	22.1
R001	Rattus norvegicus	2024-01-01	F	280.5

Total: 2

In [8]:

Copied!

# Multiple attributes (AND)
Session & {'subject_id': 'M001', 'session_idx': 1}
# Multiple attributes (AND)
Session & {'subject_id': 'M001', 'session_idx': 1}

Out[8]:

subject_id	session_idx	experimenter_id	session_date	duration
M001	1	alice	2026-01-06	45.0

Total: 1

Restriction by Query Expression¶

Restrict by another query expression. DataJoint uses semantic matching: attributes with the same name are matched only if they share the same origin through foreign key lineage. This prevents accidental matches on unrelated attributes that happen to share names (like generic id columns in unrelated tables).

See Semantic Matching for the full specification.

In [9]:

Copied!

# Subjects that have at least one session
Subject & Session
# Subjects that have at least one session
Subject & Session

Out[9]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M002	Mus musculus	2026-02-01	F	22.1
M003	Mus musculus	2026-02-15	M	26.8

Total: 3

In [10]:

Copied!

# Subjects without any sessions (R001 has no sessions)
Subject - Session
# Subjects without any sessions (R001 has no sessions)
Subject - Session

Out[10]:

subject_id	species	date_of_birth	sex	weight
R001	Rattus norvegicus	2024-01-01	F	280.5

Total: 1

Collection Conditions (OR)¶

Lists create OR conditions:

In [11]:

Copied!

# Either of these subjects
Subject & [{'subject_id': 'M001'}, {'subject_id': 'M002'}]
# Either of these subjects
Subject & [{'subject_id': 'M001'}, {'subject_id': 'M002'}]

Out[11]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M002	Mus musculus	2026-02-01	F	22.1

Total: 2

Chaining Restrictions¶

Sequential restrictions combine with AND:

In [12]:

Copied!





# These are equivalent
result1 = Subject & "sex = 'M'" & "weight > 25"
result2 = (Subject & "sex = 'M'") & "weight > 25"

print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")
# These are equivalent
result1 = Subject & "sex = 'M'" & "weight > 25"
result2 = (Subject & "sex = 'M'") & "weight > 25"

print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")

Result 1: 2 rows
Result 2: 2 rows

Top Restriction (`dj.Top`)¶

dj.Top is a special restriction that limits and orders query results. Unlike fetch-time order_by and limit, dj.Top applies within the query itself, making it composable with other operators.

query & dj.Top(limit=N, order_by='attr DESC', offset=M)

This is useful when you need the "top N" rows as part of a larger query—for example, the 5 highest-scoring trials per session.

In [13]:

Copied!

# Top 2 heaviest subjects
Subject & dj.Top(limit=2, order_by='weight DESC')
# Top 2 heaviest subjects
Subject & dj.Top(limit=2, order_by='weight DESC')

Out[13]:

subject_id	species	date_of_birth	sex	weight
R001	Rattus norvegicus	2024-01-01	F	280.5
M003	Mus musculus	2026-02-15	M	26.8

Total: 2

In [14]:

Copied!

# Skip first 2, then get next 2 (pagination)
Subject & dj.Top(limit=2, order_by='weight DESC', offset=2)
# Skip first 2, then get next 2 (pagination)
Subject & dj.Top(limit=2, order_by='weight DESC', offset=2)

Out[14]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M002	Mus musculus	2026-02-01	F	22.1

Total: 2

In [15]:

Copied!

# Combine with other restrictions
(Subject & "sex = 'M'") & dj.Top(limit=1, order_by='weight DESC')
# Combine with other restrictions
(Subject & "sex = 'M'") & dj.Top(limit=1, order_by='weight DESC')

Out[15]:

subject_id	species	date_of_birth	sex	weight
M003	Mus musculus	2026-02-15	M	26.8

Total: 1

When to use dj.Top vs fetch-time order_by/limit:

Use dj.Top when the limited result needs to be joined or restricted further
Use fetch-time parameters (to_dicts(order_by=..., limit=...)) for final output

Note: Some databases (including MySQL 8.0) don't support LIMIT in certain subquery contexts. If you encounter this limitation, fetch the keys first and use them as a restriction:

In [16]:

Copied!





# Get trials only from the 2 longest sessions
# Workaround: fetch keys first, then use as restriction
longest_session_keys = (Session & dj.Top(limit=2, order_by='duration DESC')).keys()
Session.Trial & longest_session_keys
# Get trials only from the 2 longest sessions
# Workaround: fetch keys first, then use as restriction
longest_session_keys = (Session & dj.Top(limit=2, order_by='duration DESC')).keys()
Session.Trial & longest_session_keys

Out[16]:

subject_id	session_idx	trial_idx	stimulus	response	correct	reaction_time
M001	2	1	right	left	True	0.23
M001	2	2	left	right	True	0.71
M001	2	3	left	right	False	0.58
M001	2	4	right	left	True	0.33
M001	2	5	right	left	True	0.30
M001	2	6	left	left	True	0.36
M001	2	7	left	right	True	0.67
M001	2	8	left	left	True	0.44
M001	2	9	left	left	True	0.54
M001	2	10	right	left	True	0.44
M002	2	1	right	left	False	0.66
M002	2	2	left	right	True	0.27

...

Total: 20

Projection (`.proj()`)¶

Projection selects, renames, or computes attributes.

Selecting Attributes¶

In [17]:

Copied!

# Primary key only (no arguments)
Subject.proj()
# Primary key only (no arguments)
Subject.proj()

Out[17]:

subject_id
M001
M002
M003
R001

Total: 4

In [18]:

Copied!

# Primary key + specific attributes
Subject.proj('species', 'sex')
# Primary key + specific attributes
Subject.proj('species', 'sex')

Out[18]:

subject_id	species	sex
M001	Mus musculus	M
M002	Mus musculus	F
M003	Mus musculus	M
R001	Rattus norvegicus	F

Total: 4

In [19]:

Copied!

# All attributes (using ellipsis)
Subject.proj(...)
# All attributes (using ellipsis)
Subject.proj(...)

Out[19]:

subject_id	species	date_of_birth	sex	weight
M001	Mus musculus	2026-01-15	M	25.3
M002	Mus musculus	2026-02-01	F	22.1
M003	Mus musculus	2026-02-15	M	26.8
R001	Rattus norvegicus	2024-01-01	F	280.5

Total: 4

In [20]:

Copied!

# All except specific attributes
Subject.proj(..., '-weight')
# All except specific attributes
Subject.proj(..., '-weight')

Out[20]:

subject_id	species	date_of_birth	sex
M001	Mus musculus	2026-01-15	M
M002	Mus musculus	2026-02-01	F
M003	Mus musculus	2026-02-15	M
R001	Rattus norvegicus	2024-01-01	F

Total: 4

Renaming Attributes¶

In [21]:

Copied!

# Rename 'species' to 'animal_species'
Subject.proj(animal_species='species')
# Rename 'species' to 'animal_species'
Subject.proj(animal_species='species')

Out[21]:

subject_id	animal_species
M001	Mus musculus
M002	Mus musculus
M003	Mus musculus
R001	Rattus norvegicus

Total: 4

Computed Attributes¶

In [22]:

Copied!

# Arithmetic computation
Subject.proj('species', weight_kg='weight / 1000')
# Arithmetic computation
Subject.proj('species', weight_kg='weight / 1000')

Out[22]:

subject_id	species	weight_kg
M001	Mus musculus	0.02530000000000000000
M002	Mus musculus	0.02210000000000000000
M003	Mus musculus	0.02680000000000000000
R001	Rattus norvegicus	0.28050000000000000000

Total: 4

In [23]:

Copied!

# Date functions
Session.proj('session_date', year='YEAR(session_date)', month='MONTH(session_date)')
# Date functions
Session.proj('session_date', year='YEAR(session_date)', month='MONTH(session_date)')

Out[23]:

subject_id	session_idx	session_date	year	month
M001	1	2026-01-06	2026	1
M001	2	2026-01-07	2026	1
M002	1	2026-01-06	2026	1
M002	2	2026-01-08	2026	1
M003	1	2026-01-07	2026	1

Total: 5

Join (`*`)¶

Join combines tables on shared attributes. Unlike SQL, which offers many join variants (INNER, LEFT, RIGHT, FULL, CROSS, NATURAL), DataJoint provides one rigorous join operator with strict semantic rules.

The * operator:

Matches only semantically compatible attributes (same name AND same origin via foreign key lineage)
Produces a result with a valid primary key determined by functional dependencies
Follows clear algebraic properties

This simplicity makes DataJoint queries unambiguous and composable.

In [24]:

Copied!

# Join Subject and Session on subject_id
Subject * Session
# Join Subject and Session on subject_id
Subject * Session

Out[24]:

subject_id	session_idx	experimenter_id	session_date	duration	species	date_of_birth	sex	weight
M001	1	alice	2026-01-06	45.0	Mus musculus	2026-01-15	M	25.3
M001	2	alice	2026-01-07	50.0	Mus musculus	2026-01-15	M	25.3
M002	1	bob	2026-01-06	40.0	Mus musculus	2026-02-01	F	22.1
M002	2	bob	2026-01-08	55.0	Mus musculus	2026-02-01	F	22.1
M003	1	alice	2026-01-07	35.0	Mus musculus	2026-02-15	M	26.8

Total: 5

In [25]:

Copied!

# Join then restrict
(Subject * Session) & "sex = 'M'"
# Join then restrict
(Subject * Session) & "sex = 'M'"

Out[25]:

subject_id	session_idx	experimenter_id	session_date	duration	species	date_of_birth	sex	weight
M001	1	alice	2026-01-06	45.0	Mus musculus	2026-01-15	M	25.3
M001	2	alice	2026-01-07	50.0	Mus musculus	2026-01-15	M	25.3
M003	1	alice	2026-01-07	35.0	Mus musculus	2026-02-15	M	26.8

Total: 3

In [26]:

Copied!

# Restrict then join (equivalent result)
(Subject & "sex = 'M'") * Session
# Restrict then join (equivalent result)
(Subject & "sex = 'M'") * Session

Out[26]:

subject_id	session_idx	experimenter_id	session_date	duration	species	date_of_birth	sex	weight
M001	1	alice	2026-01-06	45.0	Mus musculus	2026-01-15	M	25.3
M001	2	alice	2026-01-07	50.0	Mus musculus	2026-01-15	M	25.3
M003	1	alice	2026-01-07	35.0	Mus musculus	2026-02-15	M	26.8

Total: 3

In [27]:

Copied!

# Three-way join
(Subject * Session * Experimenter).proj('species', 'session_date', 'full_name')
# Three-way join
(Subject * Session * Experimenter).proj('species', 'session_date', 'full_name')

Out[27]:

subject_id	session_idx	session_date	species	full_name
M001	1	2026-01-06	Mus musculus	Alice Smith
M001	2	2026-01-07	Mus musculus	Alice Smith
M002	1	2026-01-06	Mus musculus	Bob Jones
M002	2	2026-01-08	Mus musculus	Bob Jones
M003	1	2026-01-07	Mus musculus	Alice Smith

Total: 5

Primary Keys in Join Results¶

Every query result has a valid primary key. For joins, the result's primary key depends on functional dependencies between the operands:

Condition	Result Primary Key
`A → B` (A determines B)	PK(A)
`B → A` (B determines A)	PK(B)
Both	PK(A)
Neither	PK(A) ∪ PK(B)

"A determines B" means all of B's primary key attributes exist in A (as primary or secondary attributes).

In our example:

Session has PK: (subject_id, session_idx)
Trial has PK: (subject_id, session_idx, trial_idx)

Since Session's PK is a subset of Trial's PK, Session → Trial. The join Session * Trial has the same primary key as Session.

See the Query Algebra Specification for the complete functional dependency rules.

Extension (`.extend()`)¶

Sometimes you want to add attributes from a related table without losing rows that lack matching entries. The extend operator is a specialized join for this purpose.

A.extend(B) is equivalent to a left join: it preserves all rows from A, adding B's attributes where matches exist (with NULL where they don't).

Requirement: A must "determine" B—all of B's primary key attributes must exist in A. This ensures the result maintains A's entity identity.

In [28]:

Copied!

# Session contains experimenter_id (FK to Experimenter)
# extend adds Experimenter's attributes while keeping all Sessions
Session.extend(Experimenter)
# Session contains experimenter_id (FK to Experimenter)
# extend adds Experimenter's attributes while keeping all Sessions
Session.extend(Experimenter)

Out[28]:

subject_id	session_idx	experimenter_id	session_date	duration	full_name
M001	1	alice	2026-01-06	45.0	Alice Smith
M001	2	alice	2026-01-07	50.0	Alice Smith
M002	1	bob	2026-01-06	40.0	Bob Jones
M002	2	bob	2026-01-08	55.0	Bob Jones
M003	1	alice	2026-01-07	35.0	Alice Smith

Total: 5

Why extend instead of join?

A regular join (*) would exclude sessions if their experimenter wasn't in the Experimenter table. Extend preserves all sessions, filling in NULL for missing experimenter data. This is essential when you want to add optional attributes without filtering your results.

Aggregation (`.aggr()`)¶

DataJoint aggregation operates entity-to-entity: you aggregate one entity type with respect to another. This differs fundamentally from SQL's GROUP BY, which groups by arbitrary attribute sets.

In DataJoint:

Session.aggr(Trial, n_trials='count(*)')

This reads: "For each Session entity, aggregate its associated Trial entities."

The equivalent SQL would be:

SELECT session.*, COUNT(*) as n_trials
FROM session
JOIN trial USING (subject_id, session_idx)
GROUP BY session.subject_id, session.session_idx

The key insight: aggregation always groups by the primary key of the left operand. This enforces meaningful groupings—you aggregate over well-defined entities, not arbitrary attribute combinations.

In [29]:

Copied!

# Count trials per session
Session.aggr(Session.Trial, n_trials='count(*)')
# Count trials per session
Session.aggr(Session.Trial, n_trials='count(*)')

Out[29]:

subject_id	session_idx	n_trials
M001	1	10
M001	2	10
M002	1	10
M002	2	10
M003	1	10

Total: 5

In [30]:

Copied!





# Multiple aggregates
Session.aggr(
    Session.Trial,
    n_trials='count(*)',
    n_correct='sum(CASE WHEN correct THEN 1 ELSE 0 END)',
    avg_rt='avg(reaction_time)'
)
# Multiple aggregates
Session.aggr(
    Session.Trial,
    n_trials='count(*)',
    n_correct='sum(CASE WHEN correct THEN 1 ELSE 0 END)',
    avg_rt='avg(reaction_time)'
)

Out[30]:

subject_id	session_idx	n_trials	n_correct	avg_rt
M001	1	10	8	0.50800000000000000000
M001	2	10	9	0.46000000000000000000
M002	1	10	7	0.45900000000000000000
M002	2	10	6	0.50300000000000000000
M003	1	10	6	0.51100000000000000000

Total: 5

In [31]:

Copied!

# Count sessions per subject
Subject.aggr(Session, n_sessions='count(*)')
# Count sessions per subject
Subject.aggr(Session, n_sessions='count(*)')

Out[31]:

subject_id	n_sessions
M001	2
M002	2
M003	1
R001	1

Total: 4

The `exclude_nonmatching` Parameter¶

By default, aggregation keeps all entities from the grouping table, even those without matches. This ensures you see zeros rather than missing rows.

However, count(*) counts the NULL-joined row as 1. To correctly count 0 for entities without matches, use count(pk_attribute) which excludes NULLs:

In [32]:

Copied!

# All subjects, including those without sessions (n_sessions=0)
# count(session_idx) returns 0 for NULLs, unlike count(*)
Subject.aggr(Session, n_sessions='count(session_idx)')
# All subjects, including those without sessions (n_sessions=0)
# count(session_idx) returns 0 for NULLs, unlike count(*)
Subject.aggr(Session, n_sessions='count(session_idx)')

Out[32]:

subject_id	n_sessions
M001	2
M002	2
M003	1
R001	0

Total: 4

In [33]:

Copied!

# Only subjects that have at least one session (exclude those without matches)
Subject.aggr(Session, n_sessions='count(session_idx)', exclude_nonmatching=True)
# Only subjects that have at least one session (exclude those without matches)
Subject.aggr(Session, n_sessions='count(session_idx)', exclude_nonmatching=True)

Out[33]:

subject_id	n_sessions
M001	2
M002	2
M003	1

Total: 3

Universal Set (`dj.U()`)¶

What if you need to aggregate but there's no appropriate entity to group by? DataJoint provides dj.U() (the "universal set") for these cases.

dj.U() (no attributes) represents the singleton entity—the "one universe." Aggregating against it produces a single row with global statistics.

dj.U('attr1', 'attr2') creates an ad-hoc grouping entity from the specified attributes. This enables aggregation when no table exists with those attributes as its primary key.

For example, suppose you want to count sessions by session_date, but no table has session_date as its primary key. You can use dj.U('session_date') to create the grouping:

In [34]:

Copied!

# Group by session_date (not a primary key in any table)
dj.U('session_date').aggr(Session, n_sessions='count(*)', total_duration='sum(duration)')
# Group by session_date (not a primary key in any table)
dj.U('session_date').aggr(Session, n_sessions='count(*)', total_duration='sum(duration)')

Out[34]:

session_date	n_sessions	total_duration
2026-01-06	2	85.0
2026-01-07	2	85.0
2026-01-08	1	55.0

Total: 3

In [35]:

Copied!

# Universal aggregation: dj.U() with no attributes produces one row
# This aggregates against the singleton "universe"
dj.U().aggr(Session, total_sessions='count(*)', avg_duration='avg(duration)')
# Universal aggregation: dj.U() with no attributes produces one row
# This aggregates against the singleton "universe"
dj.U().aggr(Session, total_sessions='count(*)', avg_duration='avg(duration)')

Out[35]:

total_sessions	avg_duration
5	45.0000000000000000

Total: 1

In [36]:

Copied!

# Group by experimenter_id (a foreign key in Session, not part of Session's PK)
# Without dj.U(), we couldn't aggregate sessions by experimenter
dj.U('experimenter_id').aggr(Session, n_sessions='count(*)')
# Group by experimenter_id (a foreign key in Session, not part of Session's PK)
# Without dj.U(), we couldn't aggregate sessions by experimenter
dj.U('experimenter_id').aggr(Session, n_sessions='count(*)')

Out[36]:

experimenter_id	n_sessions
alice	3
bob	2

Total: 2

In [37]:

Copied!

# Unique values
dj.U('species') & Subject
# Unique values
dj.U('species') & Subject

Out[37]:

species
Mus musculus
Rattus norvegicus

Total: 2

Fetching Data¶

DataJoint 2.0 provides explicit methods for different output formats.

`to_dicts()` — List of Dictionaries¶

In [38]:

Copied!

# Get all rows as list of dicts
rows = Subject.to_dicts()
rows[:2]
# Get all rows as list of dicts
rows = Subject.to_dicts()
rows[:2]

Out[38]:

[{'subject_id': 'M001',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 1, 15),
  'sex': 'M',
  'weight': Decimal('25.3')},
 {'subject_id': 'M002',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 2, 1),
  'sex': 'F',
  'weight': Decimal('22.1')}]

`to_pandas()` — DataFrame¶

In [39]:

Copied!

# Get as pandas DataFrame (primary key as index)
df = Subject.to_pandas()
df
# Get as pandas DataFrame (primary key as index)
df = Subject.to_pandas()
df

Out[39]:

	species	date_of_birth	sex	weight
subject_id
M001	Mus musculus	2026-01-15	M	25.3
M002	Mus musculus	2026-02-01	F	22.1
M003	Mus musculus	2026-02-15	M	26.8
R001	Rattus norvegicus	2024-01-01	F	280.5

`to_arrays()` — NumPy Arrays¶

In [40]:

Copied!

# Structured array (all columns)
arr = Subject.to_arrays()
arr
# Structured array (all columns)
arr = Subject.to_arrays()
arr

Out[40]:

array([('M001', 'Mus musculus', datetime.date(2026, 1, 15), 'M', Decimal('25.3')),
       ('M002', 'Mus musculus', datetime.date(2026, 2, 1), 'F', Decimal('22.1')),
       ('M003', 'Mus musculus', datetime.date(2026, 2, 15), 'M', Decimal('26.8')),
       ('R001', 'Rattus norvegicus', datetime.date(2024, 1, 1), 'F', Decimal('280.5'))],
      dtype=[('subject_id', 'O'), ('species', 'O'), ('date_of_birth', 'O'), ('sex', 'O'), ('weight', 'O')])

In [41]:

Copied!





# Specific columns as separate arrays
species, weights = Subject.to_arrays('species', 'weight')
print(f"Species: {species}")
print(f"Weights: {weights}")
# Specific columns as separate arrays
species, weights = Subject.to_arrays('species', 'weight')
print(f"Species: {species}")
print(f"Weights: {weights}")

Species: ['Mus musculus' 'Mus musculus' 'Mus musculus' 'Rattus norvegicus']
Weights: [Decimal('25.3') Decimal('22.1') Decimal('26.8') Decimal('280.5')]

`keys()` — Primary Keys¶

In [42]:

Copied!

# Get primary keys for iteration
keys = Session.keys()
keys[:3]
# Get primary keys for iteration
keys = Session.keys()
keys[:3]

Out[42]:

[{'subject_id': 'M001', 'session_idx': 1},
 {'subject_id': 'M001', 'session_idx': 2},
 {'subject_id': 'M002', 'session_idx': 1}]

`fetch1()` — Single Row¶

In [43]:

Copied!

# Fetch one row (raises error if not exactly 1)
row = (Subject & {'subject_id': 'M001'}).fetch1()
row
# Fetch one row (raises error if not exactly 1)
row = (Subject & {'subject_id': 'M001'}).fetch1()
row

Out[43]:

{'subject_id': 'M001',
 'species': 'Mus musculus',
 'date_of_birth': datetime.date(2026, 1, 15),
 'sex': 'M',
 'weight': Decimal('25.3')}

In [44]:

Copied!

# Fetch specific attributes from one row
species, weight = (Subject & {'subject_id': 'M001'}).fetch1('species', 'weight')
print(f"{species}: {weight}g")
# Fetch specific attributes from one row
species, weight = (Subject & {'subject_id': 'M001'}).fetch1('species', 'weight')
print(f"{species}: {weight}g")

Mus musculus: 25.3g

Ordering and Limiting¶

In [45]:

Copied!

# Sort by weight descending, get top 2
Subject.to_dicts(order_by='weight DESC', limit=2)
# Sort by weight descending, get top 2
Subject.to_dicts(order_by='weight DESC', limit=2)

Out[45]:

[{'subject_id': 'R001',
  'species': 'Rattus norvegicus',
  'date_of_birth': datetime.date(2024, 1, 1),
  'sex': 'F',
  'weight': Decimal('280.5')},
 {'subject_id': 'M003',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 2, 15),
  'sex': 'M',
  'weight': Decimal('26.8')}]

In [46]:

Copied!

# Sort by primary key
Subject.to_dicts(order_by='KEY')
# Sort by primary key
Subject.to_dicts(order_by='KEY')

Out[46]:

[{'subject_id': 'M001',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 1, 15),
  'sex': 'M',
  'weight': Decimal('25.3')},
 {'subject_id': 'M002',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 2, 1),
  'sex': 'F',
  'weight': Decimal('22.1')},
 {'subject_id': 'M003',
  'species': 'Mus musculus',
  'date_of_birth': datetime.date(2026, 2, 15),
  'sex': 'M',
  'weight': Decimal('26.8')},
 {'subject_id': 'R001',
  'species': 'Rattus norvegicus',
  'date_of_birth': datetime.date(2024, 1, 1),
  'sex': 'F',
  'weight': Decimal('280.5')}]

Lazy Iteration¶

Iterating directly over a table streams rows efficiently:

In [47]:

Copied!

# Stream rows (single database cursor)
for row in Subject:
    print(f"{row['subject_id']}: {row['species']}")
# Stream rows (single database cursor)
for row in Subject:
    print(f"{row['subject_id']}: {row['species']}")

M001: Mus musculus
M002: Mus musculus
M003: Mus musculus
R001: Rattus norvegicus

Query Composition¶

Queries are composable and immutable. Build complex queries step by step:

In [48]:

Copied!





# Build a complex query step by step
male_mice = Subject & "sex = 'M'" & "species LIKE '%musculus%'"
sessions_with_subject = male_mice * Session
alice_sessions = sessions_with_subject & {'experimenter_id': 'alice'}
result = alice_sessions.proj('session_date', 'duration', 'weight')

result
# Build a complex query step by step
male_mice = Subject & "sex = 'M'" & "species LIKE '%musculus%'"
sessions_with_subject = male_mice * Session
alice_sessions = sessions_with_subject & {'experimenter_id': 'alice'}
result = alice_sessions.proj('session_date', 'duration', 'weight')

result

Out[48]:

subject_id	session_idx	session_date	duration	weight
M001	1	2026-01-06	45.0	25.3
M001	2	2026-01-07	50.0	25.3
M003	1	2026-01-07	35.0	26.8

Total: 3

In [49]:

Copied!





# Or as a single expression
((Subject & "sex = 'M'" & "species LIKE '%musculus%'") 
 * Session 
 & {'experimenter_id': 'alice'}
).proj('session_date', 'duration', 'weight')
# Or as a single expression
((Subject & "sex = 'M'" & "species LIKE '%musculus%'") 
 * Session 
 & {'experimenter_id': 'alice'}
).proj('session_date', 'duration', 'weight')

Out[49]:

subject_id	session_idx	session_date	duration	weight
M001	1	2026-01-06	45.0	25.3
M001	2	2026-01-07	50.0	25.3
M003	1	2026-01-07	35.0	26.8

Total: 3

Operator Precedence¶

Python operator precedence applies:

* (join) — highest
+, - (union, anti-restriction)
& (restriction) — lowest

Use parentheses for clarity:

In [50]:

Copied!





# Without parentheses: join happens first
# Subject * Session & condition  means  (Subject * Session) & condition

# With parentheses: explicit order
result1 = (Subject & "sex = 'M'") * Session   # Restrict then join
result2 = Subject * (Session & "duration > 40")  # Restrict then join

print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")
# Without parentheses: join happens first
# Subject * Session & condition  means  (Subject * Session) & condition

# With parentheses: explicit order
result1 = (Subject & "sex = 'M'") * Session   # Restrict then join
result2 = Subject * (Session & "duration > 40")  # Restrict then join

print(f"Result 1: {len(result1)} rows")
print(f"Result 2: {len(result2)} rows")

Result 1: 3 rows
Result 2: 3 rows

Quick Reference¶

Operators¶

Operation	Syntax	Description
Restrict	`A & cond`	Select matching rows
Anti-restrict	`A - cond`	Select non-matching rows
Top	`A & dj.Top(limit, order_by)`	Limit/order results
Project	`A.proj(...)`	Select/compute columns
Join	`A * B`	Combine tables
Extend	`A.extend(B)`	Add B's attributes, keep all A rows
Aggregate	`A.aggr(B, ...)`	Group and summarize
Union	`A + B`	Combine entity sets

Fetch Methods¶

Method	Returns	Use Case
`to_dicts()`	`list[dict]`	JSON, iteration
`to_pandas()`	`DataFrame`	Data analysis
`to_arrays()`	`np.ndarray`	Numeric computation
`to_arrays('a', 'b')`	`tuple[array, ...]`	Specific columns
`keys()`	`list[dict]`	Primary keys
`fetch1()`	`dict`	Single row

See the Query Algebra Specification and Fetch API for complete details.

Next Steps¶

Computation — Building computational pipelines

In [51]:

Copied!

# Cleanup
schema.drop(prompt=False)
# Cleanup
schema.drop(prompt=False)

Queries¶

Restriction (& and -)¶

String Conditions¶

Dictionary Conditions¶

Restriction by Query Expression¶

Collection Conditions (OR)¶

Chaining Restrictions¶

Top Restriction (dj.Top)¶

Projection (.proj())¶

Selecting Attributes¶

Renaming Attributes¶

Computed Attributes¶

Join (*)¶

Primary Keys in Join Results¶

Extension (.extend())¶

Aggregation (.aggr())¶

The exclude_nonmatching Parameter¶

Universal Set (dj.U())¶

Fetching Data¶

to_dicts() — List of Dictionaries¶

to_pandas() — DataFrame¶

to_arrays() — NumPy Arrays¶

keys() — Primary Keys¶

fetch1() — Single Row¶

Ordering and Limiting¶

Lazy Iteration¶

Query Composition¶

Operator Precedence¶

Quick Reference¶

Operators¶

Fetch Methods¶

Next Steps¶

Restriction (`&` and `-`)¶

Top Restriction (`dj.Top`)¶

Projection (`.proj()`)¶

Join (`*`)¶

Extension (`.extend()`)¶

Aggregation (`.aggr()`)¶

The `exclude_nonmatching` Parameter¶

Universal Set (`dj.U()`)¶

`to_dicts()` — List of Dictionaries¶

`to_pandas()` — DataFrame¶

`to_arrays()` — NumPy Arrays¶

`keys()` — Primary Keys¶

`fetch1()` — Single Row¶