Expressions¶
Query expressions and operators
QueryExpression ¶
QueryExpression implements query operators to derive new entity set from its input. A QueryExpression object generates a SELECT statement in SQL. QueryExpression operators are restrict, join, proj, aggr, and union.
A QueryExpression object has a support, a restriction (an AndList), and heading.
Property heading (type dj.Heading) contains information about the attributes.
It is loaded from the database and updated by proj.
Property support is the list of table names or other QueryExpressions to be joined.
The restriction is applied first without having access to the attributes generated by the projection. Then projection is applied by selecting modifying the heading attribute.
Application of operators does not always lead to the creation of a subquery. A subquery is generated when: 1. A restriction is applied on any computed or renamed attributes 2. A projection is applied remapping remapped attributes 3. Subclasses: Join, Aggregation, and Union have additional specific rules.
heading
property
¶
heading
a dj.Heading object, reflects the effects of the projection operator .proj
original_heading
property
¶
original_heading
a dj.Heading object reflecting the attributes before projection
restriction
property
¶
restriction
a AndList object of restrictions applied to input to produce the result
restriction_attributes
property
¶
restriction_attributes
the set of attribute names invoked in the WHERE clause
make_sql ¶
make_sql(fields=None)
Make the SQL SELECT statement.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
fields
|
list
|
Used to explicitly set the select attributes. |
None
|
Returns:
| Type | Description |
|---|---|
str
|
The SQL SELECT statement. |
restrict ¶
restrict(restriction, semantic_check=True)
Produces a new expression with the new restriction applied.
rel.restrict(restriction) is equivalent to rel & restriction.
rel.restrict(Not(restriction)) is equivalent to rel - restriction.
The primary key of the result is unaffected.
Successive restrictions are combined as logical AND: r & a & b is equivalent to
r & AndList((a, b)).
Any QueryExpression, collection, or sequence other than an AndList are treated as OrLists
(logical disjunction of conditions).
Inverse restriction is accomplished by either using the subtraction operator or the Not class.
The expressions in each row are equivalent::
rel & True rel
rel & False the empty entity set
rel & 'TRUE' rel
rel & 'FALSE' the empty entity set
rel - cond rel & Not(cond)
rel - 'TRUE' rel & False
rel - 'FALSE' rel
rel & AndList((cond1,cond2)) rel & cond1 & cond2
rel & AndList() rel
rel & [cond1, cond2] rel & OrList((cond1, cond2))
rel & [] rel & False
rel & None rel & False
rel & any_empty_entity_set rel & False
rel - AndList((cond1,cond2)) rel & [Not(cond1), Not(cond2)]
rel - [cond1, cond2] rel & Not(cond1) & Not(cond2)
rel - AndList() rel & False
rel - [] rel
rel - None rel
rel - any_empty_entity_set rel
When arg is another QueryExpression, the restriction rel & arg restricts rel to elements
that match at least one element in arg (hence arg is treated as an OrList).
Conversely, rel - arg restricts rel to elements that do not match any elements in arg.
Two elements match when their common attributes have equal values or when they have no
common attributes.
All shared attributes must be in the primary key of either rel or arg or both or an error
will be raised.
QueryExpression.restrict is the only access point that modifies restrictions. All other operators must ultimately call restrict().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
restriction
|
QueryExpression, AndList, str, dict, list, or array-like
|
A sequence or an array (treated as OR list), another QueryExpression, an SQL condition string, or an AndList. |
required |
semantic_check
|
bool
|
If True (default), use semantic matching - only match on homologous namesakes and error on non-homologous namesakes. If False, use natural matching on all namesakes (no lineage checking). |
True
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
A new QueryExpression with the restriction applied. |
join ¶
join(other, semantic_check=True, left=False, allow_nullable_pk=False)
Create the joined QueryExpression.
a * b is short for a.join(b).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
other
|
QueryExpression
|
QueryExpression to join with. |
required |
semantic_check
|
bool
|
If True (default), use semantic matching - only match on homologous namesakes (same lineage) and error on non-homologous namesakes. If False, use natural join on all namesakes (no lineage checking). |
True
|
left
|
bool
|
If True, perform a left join (retain all rows from self). Default False. |
False
|
allow_nullable_pk
|
bool
|
If True, bypass the left join constraint that requires self to determine other. When bypassed, the result PK is the union of both operands' PKs, and PK attributes from the right operand could be NULL. Used internally by aggregation when exclude_nonmatching=False. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
The joined QueryExpression. |
extend ¶
extend(other, semantic_check=True)
Extend self with attributes from other.
The extend operation adds attributes from other to self while preserving
self's entity identity. It is semantically equivalent to self.join(other, left=True)
but expresses a clearer intent: extending an entity set with additional attributes
rather than combining two entity sets.
Requirements: self โ other (self determines other). Every attribute in
other's primary key must exist in self. This ensures:
- All rows of self are preserved (no filtering)
- Self's primary key remains the result's primary key (no NULL PKs)
- The operation is a true extension, not a Cartesian product
Conceptual model: Unlike a general join (Cartesian product restricted by matching attributes), extend is closer to projectionโit adds new attributes to existing entities without changing which entities are in the result.
Examples:
Session determines Trial (session_id is in Trial's PK), but Trial does NOT
determine Session (trial_num not in Session).
Valid - extend trials with session info:
>>> Trial.extend(Session) # Adds 'date' from Session to each Trial
Invalid - Session cannot extend to Trial:
>>> Session.extend(Trial) # Error: trial_num not in Session
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
other
|
QueryExpression
|
QueryExpression whose attributes will extend self. |
required |
semantic_check
|
bool
|
If True (default), require homologous namesakes. If False, match on all namesakes without lineage checking. |
True
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
Extended QueryExpression with self's PK and combined attributes. |
Raises:
| Type | Description |
|---|---|
DataJointError
|
If self does not determine other. |
proj ¶
proj(*attributes, **named_attributes)
Projection operator.
Primary key attributes cannot be excluded but may be renamed.
If the attribute list contains an Ellipsis ..., then all secondary attributes
are included too.
Prefixing an attribute name with a dash -attr removes the attribute from the list
if present.
Keyword arguments can be used to rename attributes as in name='attr', duplicate
them as in name='(attr)', or compute new attributes.
self.proj(...)orself.proj(Ellipsis)-- include all attributes (return self)self.proj()-- include only primary keyself.proj('attr1', 'attr2')-- include primary key and attributes attr1 and attr2self.proj(..., '-attr1', '-attr2')-- include all attributes except attr1 and attr2self.proj(name1='attr1')-- include primary key and 'attr1' renamed as name1self.proj('attr1', dup='(attr1)')-- include primary key and attr1 twice, with the duplicate 'dup'self.proj(k='abs(attr1)')adds the new attribute k with the value computed as an expression (SQL syntax) from other attributes available before the projection.
Each attribute name can only be used once.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*attributes
|
str
|
Attributes to be included in the result. The primary key is already included. |
()
|
**named_attributes
|
str
|
New attributes computed or renamed from existing attributes. |
{}
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
The projected expression. |
aggr ¶
aggr(group, *attributes, exclude_nonmatching=False, **named_attributes)
Aggregation/grouping operation, similar to proj but with computations over a grouped relation.
By default, keeps all rows from self (like proj). Use exclude_nonmatching=True to
keep only rows that have matches in group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
group
|
QueryExpression
|
The query expression to be aggregated. |
required |
*attributes
|
str
|
Attributes to include in the result. |
()
|
exclude_nonmatching
|
bool
|
If True, exclude rows from self that have no matching entries in group (INNER JOIN). Default False keeps all rows (LEFT JOIN). |
False
|
**named_attributes
|
str
|
Computations of the form |
{}
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
The derived query expression. |
Examples:
Count sessions per subject (keeps all subjects, even those with 0 sessions)::
Subject.aggr(Session, n="count(*)")
Count sessions per subject (only subjects with at least one session)::
Subject.aggr(Session, n="count(*)", exclude_nonmatching=True)
fetch ¶
fetch(*attrs, offset=None, limit=None, order_by=None, format=None, as_dict=None, squeeze=False)
Fetch data from the table (backward-compatible with DataJoint 0.14).
.. deprecated:: 2.0
Use the new explicit output methods instead:
- to_dicts() for list of dictionaries
- to_pandas() for pandas DataFrame
- to_arrays() for numpy structured array
- to_arrays('a', 'b') for tuple of arrays
- keys() for primary keys
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*attrs
|
str
|
Attributes to fetch. If empty, fetches all. |
()
|
offset
|
int
|
Number of tuples to skip. |
None
|
limit
|
int
|
Maximum number of tuples to return. |
None
|
order_by
|
str or list
|
Attribute(s) for ordering results. |
None
|
format
|
str
|
Output format: 'array' or 'frame' (pandas DataFrame). |
None
|
as_dict
|
bool
|
Return as list of dicts instead of structured array. |
None
|
squeeze
|
bool
|
Remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
np.recarray, list[dict], or pd.DataFrame
|
Query results in requested format. |
fetch1 ¶
fetch1(*attrs, squeeze=False)
Fetch exactly one row from the query result.
If no attributes are specified, returns the result as a dict. If attributes are specified, returns the corresponding values as a tuple.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*attrs
|
str
|
Attribute names to fetch. If empty, fetch all as dict. |
()
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
dict or tuple or value
|
Dict (no attrs) or tuple/value (with attrs). |
Raises:
| Type | Description |
|---|---|
DataJointError
|
If not exactly one row in result. |
Examples:
::
d = table.fetch1() # returns dict with all attributes
a, b = table.fetch1('a', 'b') # returns tuple of attribute values
value = table.fetch1('a') # returns single value
to_dicts ¶
to_dicts(order_by=None, limit=None, offset=None, squeeze=False)
Fetch all rows as a list of dictionaries.
For object storage types (attachments, filepaths), files are downloaded
to config["download_path"]. Use config.override() to change::
with dj.config.override(download_path="/data"):
data = table.to_dicts()
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dictionaries, one per row. |
to_pandas ¶
to_pandas(order_by=None, limit=None, offset=None, squeeze=False)
Fetch all rows as a pandas DataFrame with primary key as index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
DataFrame with primary key columns as index. |
to_polars ¶
to_polars(order_by=None, limit=None, offset=None, squeeze=False)
Fetch all rows as a polars DataFrame.
Requires polars: pip install datajoint[polars]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Polars DataFrame. |
to_arrow ¶
to_arrow(order_by=None, limit=None, offset=None, squeeze=False)
Fetch all rows as a PyArrow Table.
Requires pyarrow: pip install datajoint[arrow]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
Table
|
PyArrow Table. |
to_arrays ¶
to_arrays(*attrs, include_key=False, order_by=None, limit=None, offset=None, squeeze=False)
Fetch data as numpy arrays.
If no attrs specified, returns a numpy structured array (recarray) of all columns. If attrs specified, returns a tuple of numpy arrays (one per attribute).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*attrs
|
str
|
Attribute names to fetch. If empty, fetch all. |
()
|
include_key
|
bool
|
If True and attrs specified, prepend primary keys as list of dicts. Default False. |
False
|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
squeeze
|
bool
|
If True, remove extra dimensions from arrays. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
np.recarray or tuple of np.ndarray
|
Numpy recarray (no attrs) or tuple of arrays (with attrs).
With |
Examples:
Fetch as structured array::
data = table.to_arrays()
Fetch specific columns as separate arrays::
a, b = table.to_arrays('a', 'b')
Fetch with primary keys for later restrictions::
keys, a, b = table.to_arrays('a', 'b', include_key=True)
# keys = [{'id': 1}, {'id': 2}, ...] # same format as table.keys()
keys ¶
keys(order_by=None, limit=None, offset=None)
Fetch primary key values as a list of dictionaries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str or list
|
Attribute(s) to order by, or "KEY"/"KEY DESC". |
None
|
limit
|
int
|
Maximum number of rows to return. |
None
|
offset
|
int
|
Number of rows to skip. |
None
|
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dictionaries containing only primary key columns. |
head ¶
head(limit=25)
Preview the first few entries from query expression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Number of entries. Default 25. |
25
|
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dictionaries. |
tail ¶
tail(limit=25)
Preview the last few entries from query expression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Number of entries. Default 25. |
25
|
Returns:
| Type | Description |
|---|---|
list[dict]
|
List of dictionaries. |
cursor ¶
cursor(as_dict=False)
Execute the query and return a database cursor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
as_dict
|
bool
|
If True, rows are returned as dictionaries. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
cursor
|
Database query cursor. |
preview ¶
preview(limit=None, width=None)
Return a string preview of the contents of the query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Maximum number of rows to preview. |
None
|
width
|
int
|
Maximum width of the preview output. |
None
|
Returns:
| Type | Description |
|---|---|
str
|
A string preview of the contents of the query. |
Aggregation ¶
Bases: QueryExpression
Aggregation.create(arg, group, comp1='calc1', ..., compn='calcn') yields an entity set with primary key from arg. The computed arguments comp1, ..., compn use aggregation calculations on the attributes of group or simple projections and calculations on the attributes of arg. Aggregation is used QueryExpression.aggr and U.aggr. Aggregation is a private class in DataJoint, not exposed to users.
create
classmethod
¶
create(groupby, group, keep_all_rows=False)
Create an aggregation expression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
groupby
|
QueryExpression
|
The expression to GROUP BY (determines the result's primary key). |
required |
group
|
QueryExpression
|
The expression to aggregate over. |
required |
keep_all_rows
|
bool
|
If True, use left join to keep all rows from groupby. Default False. |
False
|
Returns:
| Type | Description |
|---|---|
Aggregation
|
The aggregation expression. |
Union ¶
Bases: QueryExpression
Union is the private DataJoint class that implements the union operator.
U ¶
Universal set representing all possible values of specified attributes.
dj.U objects cannot be queried on their own but are useful for forming
certain queries. dj.U('attr1', ..., 'attrn') represents the universal set
with primary key attributes attr1 ... attrn. Without any attributes,
dj.U() represents a set with one element that has no attributes.
Restriction - Enumerate unique combinations of attribute values:
dj.U('contrast', 'brightness') & stimulus
This yields all unique combinations of contrast and brightness in stimulus.
Aggregation - Summary calculations over an entire set:
dj.U().aggr(expr, n='count(*)') # Total count
dj.U().aggr(expr, n='count(distinct attr)') # Distinct values
dj.U().aggr(expr, s='sum(attr)') # Sum of attribute
dj.U('attr1', 'attr2').aggr(expr, n='count(*)') # Count per group
Joins - Promote attributes to primary key:
If expr has attributes attr1 and attr2, then
expr * dj.U('attr1', 'attr2') yields the same result as expr
but with attr1 and attr2 promoted to the primary key.
.. note::
The * operator with dj.U has been removed in DataJoint 2.0.
This pattern is no longer necessary with the new semantic matching system.
aggr ¶
aggr(group, **named_attributes)
Aggregation of the type U('attr1','attr2').aggr(group, computation="QueryExpression").
Has the primary key ('attr1','attr2') and performs aggregation computations for all
matching elements of group.
Note: exclude_nonmatching is always True for dj.U (cannot keep all rows from infinite set).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
group
|
QueryExpression
|
The query expression to be aggregated. |
required |
**named_attributes
|
str
|
Computations of the form |
{}
|
Returns:
| Type | Description |
|---|---|
QueryExpression
|
The derived query expression. |