Skip to content

Expressions

Query expressions and operators

QueryExpression

QueryExpression implements query operators to derive new entity set from its input. A QueryExpression object generates a SELECT statement in SQL. QueryExpression operators are restrict, join, proj, aggr, and union.

A QueryExpression object has a support, a restriction (an AndList), and heading. Property heading (type dj.Heading) contains information about the attributes. It is loaded from the database and updated by proj.

Property support is the list of table names or other QueryExpressions to be joined.

The restriction is applied first without having access to the attributes generated by the projection. Then projection is applied by selecting modifying the heading attribute.

Application of operators does not always lead to the creation of a subquery. A subquery is generated when: 1. A restriction is applied on any computed or renamed attributes 2. A projection is applied remapping remapped attributes 3. Subclasses: Join, Aggregation, and Union have additional specific rules.

connection property

connection

a dj.Connection object

support property

support

A list of table names or subqueries to from the FROM clause

heading property

heading

a dj.Heading object, reflects the effects of the projection operator .proj

original_heading property

original_heading

a dj.Heading object reflecting the attributes before projection

restriction property

restriction

a AndList object of restrictions applied to input to produce the result

restriction_attributes property

restriction_attributes

the set of attribute names invoked in the WHERE clause

make_sql

make_sql(fields=None)

Make the SQL SELECT statement.

Parameters:

Name Type Description Default
fields list

Used to explicitly set the select attributes.

None

Returns:

Type Description
str

The SQL SELECT statement.

make_subquery

make_subquery()

create a new SELECT statement where self is the FROM clause

restrict

restrict(restriction, semantic_check=True)

Produces a new expression with the new restriction applied.

rel.restrict(restriction) is equivalent to rel & restriction. rel.restrict(Not(restriction)) is equivalent to rel - restriction.

The primary key of the result is unaffected. Successive restrictions are combined as logical AND: r & a & b is equivalent to r & AndList((a, b)). Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists (logical disjunction of conditions). Inverse restriction is accomplished by either using the subtraction operator or the Not class.

The expressions in each row are equivalent::

rel & True                          rel
rel & False                         the empty entity set
rel & 'TRUE'                        rel
rel & 'FALSE'                       the empty entity set
rel - cond                          rel & Not(cond)
rel - 'TRUE'                        rel & False
rel - 'FALSE'                       rel
rel & AndList((cond1,cond2))        rel & cond1 & cond2
rel & AndList()                     rel
rel & [cond1, cond2]                rel & OrList((cond1, cond2))
rel & []                            rel & False
rel & None                          rel & False
rel & any_empty_entity_set          rel & False
rel - AndList((cond1,cond2))        rel & [Not(cond1), Not(cond2)]
rel - [cond1, cond2]                rel & Not(cond1) & Not(cond2)
rel - AndList()                     rel & False
rel - []                            rel
rel - None                          rel
rel - any_empty_entity_set          rel

When arg is another QueryExpression, the restriction rel & arg restricts rel to elements that match at least one element in arg (hence arg is treated as an OrList). Conversely, rel - arg restricts rel to elements that do not match any elements in arg. Two elements match when their common attributes have equal values or when they have no common attributes. All shared attributes must be in the primary key of either rel or arg or both or an error will be raised.

QueryExpression.restrict is the only access point that modifies restrictions. All other operators must ultimately call restrict().

Parameters:

Name Type Description Default
restriction QueryExpression, AndList, str, dict, list, or array-like

A sequence or an array (treated as OR list), another QueryExpression, an SQL condition string, or an AndList.

required
semantic_check bool

If True (default), use semantic matching - only match on homologous namesakes and error on non-homologous namesakes. If False, use natural matching on all namesakes (no lineage checking).

True

Returns:

Type Description
QueryExpression

A new QueryExpression with the restriction applied.

join

join(other, semantic_check=True, left=False, allow_nullable_pk=False)

Create the joined QueryExpression.

a * b is short for a.join(b).

Parameters:

Name Type Description Default
other QueryExpression

QueryExpression to join with.

required
semantic_check bool

If True (default), use semantic matching - only match on homologous namesakes (same lineage) and error on non-homologous namesakes. If False, use natural join on all namesakes (no lineage checking).

True
left bool

If True, perform a left join (retain all rows from self). Default False.

False
allow_nullable_pk bool

If True, bypass the left join constraint that requires self to determine other. When bypassed, the result PK is the union of both operands' PKs, and PK attributes from the right operand could be NULL. Used internally by aggregation when exclude_nonmatching=False. Default False.

False

Returns:

Type Description
QueryExpression

The joined QueryExpression.

extend

extend(other, semantic_check=True)

Extend self with attributes from other.

The extend operation adds attributes from other to self while preserving self's entity identity. It is semantically equivalent to self.join(other, left=True) but expresses a clearer intent: extending an entity set with additional attributes rather than combining two entity sets.

Requirements: self โ†’ other (self determines other). Every attribute in other's primary key must exist in self. This ensures:

  • All rows of self are preserved (no filtering)
  • Self's primary key remains the result's primary key (no NULL PKs)
  • The operation is a true extension, not a Cartesian product

Conceptual model: Unlike a general join (Cartesian product restricted by matching attributes), extend is closer to projectionโ€”it adds new attributes to existing entities without changing which entities are in the result.

Examples:

Session determines Trial (session_id is in Trial's PK), but Trial does NOT determine Session (trial_num not in Session).

Valid - extend trials with session info:

>>> Trial.extend(Session)  # Adds 'date' from Session to each Trial

Invalid - Session cannot extend to Trial:

>>> Session.extend(Trial)  # Error: trial_num not in Session

Parameters:

Name Type Description Default
other QueryExpression

QueryExpression whose attributes will extend self.

required
semantic_check bool

If True (default), require homologous namesakes. If False, match on all namesakes without lineage checking.

True

Returns:

Type Description
QueryExpression

Extended QueryExpression with self's PK and combined attributes.

Raises:

Type Description
DataJointError

If self does not determine other.

proj

proj(*attributes, **named_attributes)

Projection operator.

Primary key attributes cannot be excluded but may be renamed. If the attribute list contains an Ellipsis ..., then all secondary attributes are included too. Prefixing an attribute name with a dash -attr removes the attribute from the list if present. Keyword arguments can be used to rename attributes as in name='attr', duplicate them as in name='(attr)', or compute new attributes.

  • self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
  • self.proj() -- include only primary key
  • self.proj('attr1', 'attr2') -- include primary key and attributes attr1 and attr2
  • self.proj(..., '-attr1', '-attr2') -- include all attributes except attr1 and attr2
  • self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
  • self.proj('attr1', dup='(attr1)') -- include primary key and attr1 twice, with the duplicate 'dup'
  • self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax) from other attributes available before the projection.

Each attribute name can only be used once.

Parameters:

Name Type Description Default
*attributes str

Attributes to be included in the result. The primary key is already included.

()
**named_attributes str

New attributes computed or renamed from existing attributes.

{}

Returns:

Type Description
QueryExpression

The projected expression.

aggr

aggr(group, *attributes, exclude_nonmatching=False, **named_attributes)

Aggregation/grouping operation, similar to proj but with computations over a grouped relation.

By default, keeps all rows from self (like proj). Use exclude_nonmatching=True to keep only rows that have matches in group.

Parameters:

Name Type Description Default
group QueryExpression

The query expression to be aggregated.

required
*attributes str

Attributes to include in the result.

()
exclude_nonmatching bool

If True, exclude rows from self that have no matching entries in group (INNER JOIN). Default False keeps all rows (LEFT JOIN).

False
**named_attributes str

Computations of the form new_attribute="sql expression on attributes of group".

{}

Returns:

Type Description
QueryExpression

The derived query expression.

Examples:

Count sessions per subject (keeps all subjects, even those with 0 sessions)::

Subject.aggr(Session, n="count(*)")

Count sessions per subject (only subjects with at least one session)::

Subject.aggr(Session, n="count(*)", exclude_nonmatching=True)

fetch

fetch(*attrs, offset=None, limit=None, order_by=None, format=None, as_dict=None, squeeze=False)

Fetch data from the table (backward-compatible with DataJoint 0.14).

.. deprecated:: 2.0 Use the new explicit output methods instead: - to_dicts() for list of dictionaries - to_pandas() for pandas DataFrame - to_arrays() for numpy structured array - to_arrays('a', 'b') for tuple of arrays - keys() for primary keys

Parameters:

Name Type Description Default
*attrs str

Attributes to fetch. If empty, fetches all.

()
offset int

Number of tuples to skip.

None
limit int

Maximum number of tuples to return.

None
order_by str or list

Attribute(s) for ordering results.

None
format str

Output format: 'array' or 'frame' (pandas DataFrame).

None
as_dict bool

Return as list of dicts instead of structured array.

None
squeeze bool

Remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
np.recarray, list[dict], or pd.DataFrame

Query results in requested format.

fetch1

fetch1(*attrs, squeeze=False)

Fetch exactly one row from the query result.

If no attributes are specified, returns the result as a dict. If attributes are specified, returns the corresponding values as a tuple.

Parameters:

Name Type Description Default
*attrs str

Attribute names to fetch. If empty, fetch all as dict.

()
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
dict or tuple or value

Dict (no attrs) or tuple/value (with attrs).

Raises:

Type Description
DataJointError

If not exactly one row in result.

Examples:

::

d = table.fetch1()              # returns dict with all attributes
a, b = table.fetch1('a', 'b')   # returns tuple of attribute values
value = table.fetch1('a')       # returns single value

to_dicts

to_dicts(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a list of dictionaries.

For object storage types (attachments, filepaths), files are downloaded to config["download_path"]. Use config.override() to change::

with dj.config.override(download_path="/data"):
    data = table.to_dicts()

Parameters:

Name Type Description Default
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
list[dict]

List of dictionaries, one per row.

to_pandas

to_pandas(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a pandas DataFrame with primary key as index.

Parameters:

Name Type Description Default
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
DataFrame

DataFrame with primary key columns as index.

to_polars

to_polars(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a polars DataFrame.

Requires polars: pip install datajoint[polars]

Parameters:

Name Type Description Default
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
DataFrame

Polars DataFrame.

to_arrow

to_arrow(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a PyArrow Table.

Requires pyarrow: pip install datajoint[arrow]

Parameters:

Name Type Description Default
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
Table

PyArrow Table.

to_arrays

to_arrays(*attrs, include_key=False, order_by=None, limit=None, offset=None, squeeze=False)

Fetch data as numpy arrays.

If no attrs specified, returns a numpy structured array (recarray) of all columns. If attrs specified, returns a tuple of numpy arrays (one per attribute).

Parameters:

Name Type Description Default
*attrs str

Attribute names to fetch. If empty, fetch all.

()
include_key bool

If True and attrs specified, prepend primary keys as list of dicts. Default False.

False
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None
squeeze bool

If True, remove extra dimensions from arrays. Default False.

False

Returns:

Type Description
np.recarray or tuple of np.ndarray

Numpy recarray (no attrs) or tuple of arrays (with attrs). With include_key=True: (keys, *arrays) where keys is list[dict].

Examples:

Fetch as structured array::

data = table.to_arrays()

Fetch specific columns as separate arrays::

a, b = table.to_arrays('a', 'b')

Fetch with primary keys for later restrictions::

keys, a, b = table.to_arrays('a', 'b', include_key=True)
# keys = [{'id': 1}, {'id': 2}, ...]  # same format as table.keys()

keys

keys(order_by=None, limit=None, offset=None)

Fetch primary key values as a list of dictionaries.

Parameters:

Name Type Description Default
order_by str or list

Attribute(s) to order by, or "KEY"/"KEY DESC".

None
limit int

Maximum number of rows to return.

None
offset int

Number of rows to skip.

None

Returns:

Type Description
list[dict]

List of dictionaries containing only primary key columns.

head

head(limit=25)

Preview the first few entries from query expression.

Parameters:

Name Type Description Default
limit int

Number of entries. Default 25.

25

Returns:

Type Description
list[dict]

List of dictionaries.

tail

tail(limit=25)

Preview the last few entries from query expression.

Parameters:

Name Type Description Default
limit int

Number of entries. Default 25.

25

Returns:

Type Description
list[dict]

List of dictionaries.

cursor

cursor(as_dict=False)

Execute the query and return a database cursor.

Parameters:

Name Type Description Default
as_dict bool

If True, rows are returned as dictionaries. Default False.

False

Returns:

Type Description
cursor

Database query cursor.

preview

preview(limit=None, width=None)

Return a string preview of the contents of the query.

Parameters:

Name Type Description Default
limit int

Maximum number of rows to preview.

None
width int

Maximum width of the preview output.

None

Returns:

Type Description
str

A string preview of the contents of the query.

Aggregation

Bases: QueryExpression

Aggregation.create(arg, group, comp1='calc1', ..., compn='calcn') yields an entity set with primary key from arg. The computed arguments comp1, ..., compn use aggregation calculations on the attributes of group or simple projections and calculations on the attributes of arg. Aggregation is used QueryExpression.aggr and U.aggr. Aggregation is a private class in DataJoint, not exposed to users.

create classmethod

create(groupby, group, keep_all_rows=False)

Create an aggregation expression.

Parameters:

Name Type Description Default
groupby QueryExpression

The expression to GROUP BY (determines the result's primary key).

required
group QueryExpression

The expression to aggregate over.

required
keep_all_rows bool

If True, use left join to keep all rows from groupby. Default False.

False

Returns:

Type Description
Aggregation

The aggregation expression.

Union

Bases: QueryExpression

Union is the private DataJoint class that implements the union operator.

from_clause

from_clause()

The union does not use a FROM clause.

where_clause

where_clause()

The union does not use a WHERE clause.

U

Universal set representing all possible values of specified attributes.

dj.U objects cannot be queried on their own but are useful for forming certain queries. dj.U('attr1', ..., 'attrn') represents the universal set with primary key attributes attr1 ... attrn. Without any attributes, dj.U() represents a set with one element that has no attributes.

Restriction - Enumerate unique combinations of attribute values:

dj.U('contrast', 'brightness') & stimulus

This yields all unique combinations of contrast and brightness in stimulus.

Aggregation - Summary calculations over an entire set:

dj.U().aggr(expr, n='count(*)') # Total count

dj.U().aggr(expr, n='count(distinct attr)') # Distinct values

dj.U().aggr(expr, s='sum(attr)') # Sum of attribute

dj.U('attr1', 'attr2').aggr(expr, n='count(*)') # Count per group

Joins - Promote attributes to primary key:

If expr has attributes attr1 and attr2, then expr * dj.U('attr1', 'attr2') yields the same result as expr but with attr1 and attr2 promoted to the primary key.

.. note:: The * operator with dj.U has been removed in DataJoint 2.0. This pattern is no longer necessary with the new semantic matching system.

aggr

aggr(group, **named_attributes)

Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression").

Has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.

Note: exclude_nonmatching is always True for dj.U (cannot keep all rows from infinite set).

Parameters:

Name Type Description Default
group QueryExpression

The query expression to be aggregated.

required
**named_attributes str

Computations of the form new_attribute="sql expression on attributes of group".

{}

Returns:

Type Description
QueryExpression

The derived query expression.