Expressions¶

Query expressions and operators

QueryExpression ¶

QueryExpression implements query operators to derive new entity set from its input. A QueryExpression object generates a SELECT statement in SQL. QueryExpression operators are restrict, join, proj, aggr, and union.

A QueryExpression object has a support, a restriction (an AndList), and heading. Property heading (type dj.Heading) contains information about the attributes. It is loaded from the database and updated by proj.

Property support is the list of table names or other QueryExpressions to be joined.

The restriction is applied first without having access to the attributes generated by the projection. Then projection is applied by selecting modifying the heading attribute.

Application of operators does not always lead to the creation of a subquery. A subquery is generated when: 1. A restriction is applied on any computed or renamed attributes 2. A projection is applied remapping remapped attributes 3. Subclasses: Join, Aggregation, and Union have additional specific rules.

connection `property` ¶

connection

a dj.Connection object

support `property` ¶

support

A list of table names or subqueries to from the FROM clause

heading `property` ¶

heading

a dj.Heading object, reflects the effects of the projection operator .proj

original_heading `property` ¶

original_heading

a dj.Heading object reflecting the attributes before projection

restriction `property` ¶

restriction

a AndList object of restrictions applied to input to produce the result

restriction_attributes `property` ¶

restriction_attributes

the set of attribute names invoked in the WHERE clause

make_sql ¶

make_sql(fields=None)

Make the SQL SELECT statement.

Parameters:

Name	Type	Description	Default
`fields`	`list`	Used to explicitly set the select attributes.	`None`

Returns:

Type	Description
`str`	The SQL SELECT statement.

make_subquery ¶

make_subquery()

create a new SELECT statement where self is the FROM clause

restrict ¶

restrict(restriction, semantic_check=True)

Produces a new expression with the new restriction applied.

rel.restrict(restriction) is equivalent to rel & restriction. rel.restrict(Not(restriction)) is equivalent to rel - restriction.

The primary key of the result is unaffected. Successive restrictions are combined as logical AND: r & a & b is equivalent to r & AndList((a, b)). Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists (logical disjunction of conditions). Inverse restriction is accomplished by either using the subtraction operator or the Not class.

The expressions in each row are equivalent::

rel & True                          rel
rel & False                         the empty entity set
rel & 'TRUE'                        rel
rel & 'FALSE'                       the empty entity set
rel - cond                          rel & Not(cond)
rel - 'TRUE'                        rel & False
rel - 'FALSE'                       rel
rel & AndList((cond1,cond2))        rel & cond1 & cond2
rel & AndList()                     rel
rel & [cond1, cond2]                rel & OrList((cond1, cond2))
rel & []                            rel & False
rel & None                          rel & False
rel & any_empty_entity_set          rel & False
rel - AndList((cond1,cond2))        rel & [Not(cond1), Not(cond2)]
rel - [cond1, cond2]                rel & Not(cond1) & Not(cond2)
rel - AndList()                     rel & False
rel - []                            rel
rel - None                          rel
rel - any_empty_entity_set          rel

When arg is another QueryExpression, the restriction rel & arg restricts rel to elements that match at least one element in arg (hence arg is treated as an OrList). Conversely, rel - arg restricts rel to elements that do not match any elements in arg. Two elements match when their common attributes have equal values or when they have no common attributes. All shared attributes must be in the primary key of either rel or arg or both or an error will be raised.

QueryExpression.restrict is the only access point that modifies restrictions. All other operators must ultimately call restrict().

Parameters:

Name	Type	Description	Default
`restriction`	`QueryExpression, AndList, str, dict, list, or array-like`	A sequence or an array (treated as OR list), another QueryExpression, an SQL condition string, or an AndList.	required
`semantic_check`	`bool`	If True (default), use semantic matching - only match on homologous namesakes and error on non-homologous namesakes. If False, use natural matching on all namesakes (no lineage checking).	`True`

Returns:

Type	Description
`QueryExpression`	A new QueryExpression with the restriction applied.

join ¶

join(other, semantic_check=True, left=False, allow_nullable_pk=False)

Create the joined QueryExpression.

a * b is short for a.join(b).

Parameters:

Name	Type	Description	Default
`other`	`QueryExpression`	QueryExpression to join with.	required
`semantic_check`	`bool`	If True (default), use semantic matching - only match on homologous namesakes (same lineage) and error on non-homologous namesakes. If False, use natural join on all namesakes (no lineage checking).	`True`
`left`	`bool`	If True, perform a left join (retain all rows from self). Default False.	`False`
`allow_nullable_pk`	`bool`	If True, bypass the left join constraint that requires self to determine other. When bypassed, the result PK is the union of both operands' PKs, and PK attributes from the right operand could be NULL. Used internally by aggregation when exclude_nonmatching=False. Default False.	`False`

Returns:

Type	Description
`QueryExpression`	The joined QueryExpression.

extend ¶

extend(other, semantic_check=True)

Extend self with attributes from other.

The extend operation adds attributes from other to self while preserving self's entity identity. It is semantically equivalent to self.join(other, left=True) but expresses a clearer intent: extending an entity set with additional attributes rather than combining two entity sets.

Requirements: self → other (self determines other). Every attribute in other's primary key must exist in self. This ensures:

All rows of self are preserved (no filtering)
Self's primary key remains the result's primary key (no NULL PKs)
The operation is a true extension, not a Cartesian product

Conceptual model: Unlike a general join (Cartesian product restricted by matching attributes), extend is closer to projection—it adds new attributes to existing entities without changing which entities are in the result.

Examples:

Session determines Trial (session_id is in Trial's PK), but Trial does NOT determine Session (trial_num not in Session).

Valid - extend trials with session info:

>>> Trial.extend(Session)  # Adds 'date' from Session to each Trial

Invalid - Session cannot extend to Trial:

>>> Session.extend(Trial)  # Error: trial_num not in Session

Parameters:

Name	Type	Description	Default
`other`	`QueryExpression`	QueryExpression whose attributes will extend self.	required
`semantic_check`	`bool`	If True (default), require homologous namesakes. If False, match on all namesakes without lineage checking.	`True`

Returns:

Type	Description
`QueryExpression`	Extended QueryExpression with self's PK and combined attributes.

Raises:

Type	Description
`DataJointError`	If self does not determine other.

proj ¶

proj(*attributes, **named_attributes)

Projection operator.

Primary key attributes cannot be excluded but may be renamed. If the attribute list contains an Ellipsis ..., then all secondary attributes are included too. Prefixing an attribute name with a dash -attr removes the attribute from the list if present. Keyword arguments can be used to rename attributes as in name='attr', duplicate them as in name='(attr)', or compute new attributes.

self.proj(...) or self.proj(Ellipsis) -- include all attributes (return self)
self.proj() -- include only primary key
self.proj('attr1', 'attr2') -- include primary key and attributes attr1 and attr2
self.proj(..., '-attr1', '-attr2') -- include all attributes except attr1 and attr2
self.proj(name1='attr1') -- include primary key and 'attr1' renamed as name1
self.proj('attr1', dup='(attr1)') -- include primary key and attr1 twice, with the duplicate 'dup'
self.proj(k='abs(attr1)') adds the new attribute k with the value computed as an expression (SQL syntax) from other attributes available before the projection.

Each attribute name can only be used once.

Parameters:

Name	Type	Description	Default
`*attributes`	`str`	Attributes to be included in the result. The primary key is already included.	`()`
`**named_attributes`	`str`	New attributes computed or renamed from existing attributes.	`{}`

Returns:

Type	Description
`QueryExpression`	The projected expression.

aggr ¶

aggr(group, *attributes, exclude_nonmatching=False, **named_attributes)

Aggregation/grouping operation, similar to proj but with computations over a grouped relation.

By default, keeps all rows from self (like proj). Use exclude_nonmatching=True to keep only rows that have matches in group.

Parameters:

Name	Type	Description	Default
`group`	`QueryExpression`	The query expression to be aggregated.	required
`*attributes`	`str`	Attributes to include in the result.	`()`
`exclude_nonmatching`	`bool`	If True, exclude rows from self that have no matching entries in group (INNER JOIN). Default False keeps all rows (LEFT JOIN).	`False`
`**named_attributes`	`str`	Computations of the form `new_attribute="sql expression on attributes of group"`.	`{}`

Returns:

Type	Description
`QueryExpression`	The derived query expression.

Examples:

Count sessions per subject (keeps all subjects, even those with 0 sessions)::

Subject.aggr(Session, n="count(*)")

Count sessions per subject (only subjects with at least one session)::

Subject.aggr(Session, n="count(*)", exclude_nonmatching=True)

fetch ¶

fetch(*attrs, offset=None, limit=None, order_by=None, format=None, as_dict=None, squeeze=False)

Fetch data from the table (backward-compatible with DataJoint 0.14).

.. deprecated:: 2.0 Use the new explicit output methods instead: - to_dicts() for list of dictionaries - to_pandas() for pandas DataFrame - to_arrays() for numpy structured array - to_arrays('a', 'b') for tuple of arrays - keys() for primary keys

Parameters:

Name	Type	Description	Default
`*attrs`	`str`	Attributes to fetch. If empty, fetches all.	`()`
`offset`	`int`	Number of tuples to skip.	`None`
`limit`	`int`	Maximum number of tuples to return.	`None`
`order_by`	`str or list`	Attribute(s) for ordering results.	`None`
`format`	`str`	Output format: 'array' or 'frame' (pandas DataFrame).	`None`
`as_dict`	`bool`	Return as list of dicts instead of structured array.	`None`
`squeeze`	`bool`	Remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`np.recarray, list[dict], or pd.DataFrame`	Query results in requested format.

fetch1 ¶

fetch1(*attrs, squeeze=False)

Fetch exactly one row from the query result.

If no attributes are specified, returns the result as a dict. If attributes are specified, returns the corresponding values as a tuple.

Parameters:

Name	Type	Description	Default
`*attrs`	`str`	Attribute names to fetch. If empty, fetch all as dict.	`()`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`dict or tuple or value`	Dict (no attrs) or tuple/value (with attrs).

Raises:

Type	Description
`DataJointError`	If not exactly one row in result.

Examples:

::

d = table.fetch1()              # returns dict with all attributes
a, b = table.fetch1('a', 'b')   # returns tuple of attribute values
value = table.fetch1('a')       # returns single value

to_dicts ¶

to_dicts(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a list of dictionaries.

For object storage types (attachments, filepaths), files are downloaded to config["download_path"]. Use config.override() to change::

with dj.config.override(download_path="/data"):
    data = table.to_dicts()

Parameters:

Name	Type	Description	Default
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`list[dict]`	List of dictionaries, one per row.

to_pandas ¶

to_pandas(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a pandas DataFrame with primary key as index.

Parameters:

Name	Type	Description	Default
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`DataFrame`	DataFrame with primary key columns as index.

to_polars ¶

to_polars(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a polars DataFrame.

Requires polars: pip install datajoint[polars]

Parameters:

Name	Type	Description	Default
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`DataFrame`	Polars DataFrame.

to_arrow ¶

to_arrow(order_by=None, limit=None, offset=None, squeeze=False)

Fetch all rows as a PyArrow Table.

Requires pyarrow: pip install datajoint[arrow]

Parameters:

Name	Type	Description	Default
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`Table`	PyArrow Table.

to_arrays ¶

to_arrays(*attrs, include_key=False, order_by=None, limit=None, offset=None, squeeze=False)

Fetch data as numpy arrays.

If no attrs specified, returns a numpy structured array (recarray) of all columns. If attrs specified, returns a tuple of numpy arrays (one per attribute).

Parameters:

Name	Type	Description	Default
`*attrs`	`str`	Attribute names to fetch. If empty, fetch all.	`()`
`include_key`	`bool`	If True and attrs specified, prepend primary keys as list of dicts. Default False.	`False`
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`
`squeeze`	`bool`	If True, remove extra dimensions from arrays. Default False.	`False`

Returns:

Type	Description
`np.recarray or tuple of np.ndarray`	Numpy recarray (no attrs) or tuple of arrays (with attrs). With `include_key=True`: `(keys, *arrays)` where keys is `list[dict]`.

Examples:

Fetch as structured array::

data = table.to_arrays()

Fetch specific columns as separate arrays::

a, b = table.to_arrays('a', 'b')

Fetch with primary keys for later restrictions::

keys, a, b = table.to_arrays('a', 'b', include_key=True)
# keys = [{'id': 1}, {'id': 2}, ...]  # same format as table.keys()

keys ¶

keys(order_by=None, limit=None, offset=None)

Fetch primary key values as a list of dictionaries.

Parameters:

Name	Type	Description	Default
`order_by`	`str or list`	Attribute(s) to order by, or "KEY"/"KEY DESC".	`None`
`limit`	`int`	Maximum number of rows to return.	`None`
`offset`	`int`	Number of rows to skip.	`None`

Returns:

Type	Description
`list[dict]`	List of dictionaries containing only primary key columns.

head ¶

head(limit=25)

Preview the first few entries from query expression.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Number of entries. Default 25.	`25`

Returns:

Type	Description
`list[dict]`	List of dictionaries.

tail ¶

tail(limit=25)

Preview the last few entries from query expression.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Number of entries. Default 25.	`25`

Returns:

Type	Description
`list[dict]`	List of dictionaries.

cursor ¶

cursor(as_dict=False)

Execute the query and return a database cursor.

Parameters:

Name	Type	Description	Default
`as_dict`	`bool`	If True, rows are returned as dictionaries. Default False.	`False`

Returns:

Type	Description
`cursor`	Database query cursor.

preview ¶

preview(limit=None, width=None)

Return a string preview of the contents of the query.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of rows to preview.	`None`
`width`	`int`	Maximum width of the preview output.	`None`

Returns:

Type	Description
`str`	A string preview of the contents of the query.

Aggregation ¶

Bases: QueryExpression

Aggregation.create(arg, group, comp1='calc1', ..., compn='calcn') yields an entity set with primary key from arg. The computed arguments comp1, ..., compn use aggregation calculations on the attributes of group or simple projections and calculations on the attributes of arg. Aggregation is used QueryExpression.aggr and U.aggr. Aggregation is a private class in DataJoint, not exposed to users.

create `classmethod` ¶

create(groupby, group, keep_all_rows=False)

Create an aggregation expression.

Parameters:

Name	Type	Description	Default
`groupby`	`QueryExpression`	The expression to GROUP BY (determines the result's primary key).	required
`group`	`QueryExpression`	The expression to aggregate over.	required
`keep_all_rows`	`bool`	If True, use left join to keep all rows from groupby. Default False.	`False`

Returns:

Type	Description
`Aggregation`	The aggregation expression.

Union ¶

Bases: QueryExpression

Union is the private DataJoint class that implements the union operator.

from_clause ¶

from_clause()

The union does not use a FROM clause.

where_clause ¶

where_clause()

The union does not use a WHERE clause.

U ¶

Universal set representing all possible values of specified attributes.

dj.U objects cannot be queried on their own but are useful for forming certain queries. dj.U('attr1', ..., 'attrn') represents the universal set with primary key attributes attr1 ... attrn. Without any attributes, dj.U() represents a set with one element that has no attributes.

Restriction - Enumerate unique combinations of attribute values:

dj.U('contrast', 'brightness') & stimulus

This yields all unique combinations of contrast and brightness in stimulus.

Aggregation - Summary calculations over an entire set:

dj.U().aggr(expr, n='count(*)') # Total count

dj.U().aggr(expr, n='count(distinct attr)') # Distinct values

dj.U().aggr(expr, s='sum(attr)') # Sum of attribute

dj.U('attr1', 'attr2').aggr(expr, n='count(*)') # Count per group

Joins - Promote attributes to primary key:

If expr has attributes attr1 and attr2, then expr * dj.U('attr1', 'attr2') yields the same result as expr but with attr1 and attr2 promoted to the primary key.

.. note:: The * operator with dj.U has been removed in DataJoint 2.0. This pattern is no longer necessary with the new semantic matching system.

aggr ¶

aggr(group, **named_attributes)

Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression").

Has the primary key ('attr1','attr2') and performs aggregation computations for all matching elements of group.

Note: exclude_nonmatching is always True for dj.U (cannot keep all rows from infinite set).

Parameters:

Name	Type	Description	Default
`group`	`QueryExpression`	The query expression to be aggregated.	required
`**named_attributes`	`str`	Computations of the form `new_attribute="sql expression on attributes of group"`.	`{}`

Returns:

Type	Description
`QueryExpression`	The derived query expression.

Expressions¶

QueryExpression ¶

connection property ¶

support property ¶

heading property ¶

original_heading property ¶

restriction property ¶

restriction_attributes property ¶

make_sql ¶

make_subquery ¶

restrict ¶

join ¶

extend ¶

proj ¶

aggr ¶

fetch ¶

fetch1 ¶

to_dicts ¶

to_pandas ¶

to_polars ¶

to_arrow ¶

to_arrays ¶

keys ¶

head ¶

tail ¶

cursor ¶

preview ¶

Aggregation ¶

create classmethod ¶

Union ¶

from_clause ¶

where_clause ¶

U ¶

aggr ¶

connection `property` ¶

support `property` ¶

heading `property` ¶

original_heading `property` ¶

restriction `property` ¶

restriction_attributes `property` ¶

create `classmethod` ¶