Skip to content

Query Data

Filter, join, and transform data with DataJoint operators.

Restriction (&)

Filter rows that match a condition:

# String condition
Session & "session_date > '2026-01-01'"
Session & "duration BETWEEN 30 AND 60"

# Dictionary (exact match)
Session & {'subject_id': 'M001'}
Session & {'subject_id': 'M001', 'session_idx': 1}

# Query expression
Session & Subject                    # Sessions for subjects in Subject
Session & (Subject & "sex = 'M'")    # Sessions for male subjects

# List (OR)
Session & [{'subject_id': 'M001'}, {'subject_id': 'M002'}]

Top N Rows (dj.Top)

Limit results with optional ordering:

# First 10 by primary key
Session & dj.Top(10)

# Top 10 by date (descending)
Session & dj.Top(10, 'session_date DESC')

# Pagination: skip 20, take 10
Session & dj.Top(10, 'session_date DESC', offset=20)

# All rows ordered
Session & dj.Top(None, 'session_date DESC')

Use "KEY" for primary key ordering, "KEY DESC" for reverse:

Session & dj.Top(10, 'KEY DESC')  # Last 10 by primary key

Anti-Restriction (-)

Filter rows that do NOT match:

Subject - Session          # Subjects without sessions
Session - {'subject_id': 'M001'}

Projection (.proj())

Select, rename, or compute attributes:

# Primary key only
Subject.proj()

# Specific attributes
Subject.proj('species', 'sex')

# All attributes
Subject.proj(...)

# All except some
Subject.proj(..., '-notes')

# Rename
Subject.proj(animal_species='species')

# Computed
Subject.proj(weight_kg='weight / 1000')

Join (*)

Combine tables on matching attributes:

Subject * Session
Subject * Session * Experimenter

# Restrict then join
(Subject & "sex = 'M'") * Session

Aggregation (.aggr())

Group and summarize:

# Count trials per session
Session.aggr(Session.Trial, n_trials='count(trial_idx)')

# Multiple aggregates
Session.aggr(
    Session.Trial,
    n_trials='count(trial_idx)',
    avg_rt='avg(reaction_time)',
    min_rt='min(reaction_time)'
)

# Exclude sessions without trials
Session.aggr(Session.Trial, n='count(trial_idx)', exclude_nonmatching=True)

Universal Set (dj.U())

Group by arbitrary attributes:

# Unique values
dj.U('species') & Subject

# Group by non-primary-key attribute
dj.U('session_date').aggr(Session, n='count(session_idx)')

# Global aggregation (one row)
dj.U().aggr(Session, total='count(*)')

Extension (.extend())

Add attributes without losing rows:

# Add experimenter info, keep all sessions
Session.extend(Experimenter)

Chain Operations

result = (
    Subject
    & "sex = 'M'"
    * Session
    & "duration > 30"
).proj('species', 'session_date', 'duration')

Operator Precedence

Priority Operator Operation
Highest * Join
+, - Union, Anti-restriction
Lowest & Restriction

Use parentheses for clarity:

(Subject & condition) * Session    # Restrict then join
Subject * (Session & condition)    # Join then restrict

View Query

# See generated SQL
print((Subject & condition).make_sql())

# Count rows without fetching
len(Subject & condition)

See Also