DataJoint 2.0 Fetch API Specification¶

Overview¶

DataJoint 2.0 replaces the complex fetch() method with a set of explicit, composable output methods. This provides better discoverability, clearer intent, and more efficient iteration.

Design Principles¶

Explicit over implicit: Each output format has its own method
Composable: Use existing .proj() for column selection
Lazy iteration: Single cursor streaming instead of fetch-all-keys
Modern formats: First-class support for polars and Arrow

New API Reference¶

Output Methods¶

Method	Returns	Description
`to_dicts()`	`list[dict]`	All rows as list of dictionaries
`to_pandas()`	`DataFrame`	pandas DataFrame with primary key as index
`to_polars()`	`polars.DataFrame`	polars DataFrame (requires `datajoint[polars]`)
`to_arrow()`	`pyarrow.Table`	PyArrow Table (requires `datajoint[arrow]`)
`to_arrays()`	`np.ndarray`	numpy structured array (recarray)
`to_arrays('a', 'b')`	`tuple[array, array]`	Tuple of arrays for specific columns
`keys()`	`list[dict]`	Primary key values only
`fetch1()`	`dict`	Single row as dict (raises if not exactly 1)
`fetch1('a', 'b')`	`tuple`	Single row attribute values
`head(limit=25)`	`list[dict]`	Preview first N entries
`tail(limit=25)`	`list[dict]`	Preview last N entries
`cursor(as_dict=False)`	`cursor`	Raw database cursor for manual iteration

Common Parameters¶

All output methods accept these optional parameters:

table.to_dicts(
    order_by=None,      # str or list: column(s) to sort by, e.g. "KEY", "name DESC"
    limit=None,         # int: maximum rows to return
    offset=None,        # int: rows to skip
    squeeze=False,      # bool: remove singleton dimensions from arrays
)

For in-store types (attachments, filepaths), files are downloaded to config["download_path"]. Use config.override() to change:

with dj.config.override(download_path="/data"):
    data = table.to_dicts()

Iteration¶

# Lazy streaming - yields one dict per row from database cursor
for row in table:
    process(row)  # row is a dict

Migration Guide¶

Basic Fetch Operations¶

Old Pattern (1.x)	New Pattern (2.0)
`table.fetch()`	`table.to_arrays()` or `table.to_dicts()`
`table.fetch(format="array")`	`table.to_arrays()`
`table.fetch(format="frame")`	`table.to_pandas()`
`table.fetch(as_dict=True)`	`table.to_dicts()`

Attribute Fetching¶

Old Pattern (1.x)	New Pattern (2.0)
`table.fetch('a')`	`table.to_arrays('a')`
`a, b = table.fetch('a', 'b')`	`a, b = table.to_arrays('a', 'b')`
`table.fetch('a', 'b', as_dict=True)`	`table.proj('a', 'b').to_dicts()`

Primary Key Fetching¶

Old Pattern (1.x)	New Pattern (2.0)
`table.fetch('KEY')`	`table.keys()`
`table.fetch(dj.key)`	`table.keys()`
`keys, a = table.fetch('KEY', 'a')`	See note below

For mixed KEY + attribute fetch:

# Old: keys, a = table.fetch('KEY', 'a')
# New: Combine keys() with to_arrays()
keys = table.keys()
a = table.to_arrays('a')
# Or use to_dicts() which includes all columns

Ordering, Limiting, Offset¶

Old Pattern (1.x)	New Pattern (2.0)
`table.fetch(order_by='name')`	`table.to_arrays(order_by='name')`
`table.fetch(limit=10)`	`table.to_arrays(limit=10)`
`table.fetch(order_by='KEY', limit=10, offset=5)`	`table.to_arrays(order_by='KEY', limit=10, offset=5)`

Single Row Fetch (fetch1)¶

Old Pattern (1.x)	New Pattern (2.0)
`table.fetch1()`	`table.fetch1()` (unchanged)
`a, b = table.fetch1('a', 'b')`	`a, b = table.fetch1('a', 'b')` (unchanged)
`table.fetch1('KEY')`	`table.fetch1()` then extract pk columns

Configuration¶

Old Pattern (1.x)	New Pattern (2.0)
`dj.config['fetch_format'] = 'frame'`	Use `.to_pandas()` explicitly
`with dj.config.override(fetch_format='frame'):`	Use `.to_pandas()` in the block

Iteration¶

Old Pattern (1.x)	New Pattern (2.0)
`for row in table:`	`for row in table:` (same syntax, now lazy!)
`list(table)`	`table.to_dicts()`

Column Selection with proj()¶

Use .proj() for column selection, then apply output method:

# Select specific columns
table.proj('col1', 'col2').to_pandas()
table.proj('col1', 'col2').to_dicts()

# Computed columns
table.proj(total='price * quantity').to_pandas()

Removed Features¶

Removed Methods and Parameters¶

fetch() method - use explicit output methods
fetch('KEY') - use keys()
dj.key class - use keys() method
format= parameter - use explicit methods
as_dict= parameter - use to_dicts()
config['fetch_format'] setting - use explicit methods

Removed Imports¶

# Old (removed)
from datajoint import key
result = table.fetch(dj.key)

# New
result = table.keys()

Examples¶

Example 1: Basic Data Retrieval¶

# Get all data as DataFrame
df = Experiment().to_pandas()

# Get all data as list of dicts
rows = Experiment().to_dicts()

# Get all data as numpy array
arr = Experiment().to_arrays()

Example 2: Filtered and Sorted Query¶

# Get recent experiments, sorted by date
recent = (Experiment() & "date > '2024-01-01'").to_pandas(
    order_by='date DESC',
    limit=100
)

Example 3: Specific Columns¶

# Fetch specific columns as arrays
names, dates = Experiment().to_arrays('name', 'date')

# Or with primary key included
names, dates = Experiment().to_arrays('name', 'date', include_key=True)

Example 4: Primary Keys for Iteration¶

# Get keys for restriction
keys = Experiment().keys()
for key in keys:
    process(Session() & key)

Example 5: Single Row¶

# Get one row as dict
row = (Experiment() & key).fetch1()

# Get specific attributes
name, date = (Experiment() & key).fetch1('name', 'date')

Example 6: Lazy Iteration¶

# Stream rows efficiently (single database cursor)
for row in Experiment():
    if should_process(row):
        process(row)
    if done:
        break  # Early termination - no wasted fetches

Example 7: Modern DataFrame Libraries¶

# Polars (fast, modern)
import polars as pl
df = Experiment().to_polars()
result = df.filter(pl.col('value') > 100).group_by('category').agg(pl.mean('value'))

# PyArrow (zero-copy interop)
table = Experiment().to_arrow()
# Can convert to pandas or polars with zero copy

Performance Considerations¶

Lazy Iteration¶

The new iteration is significantly more efficient:

# Old (1.x): N+1 queries
# 1. fetch("KEY") gets ALL keys
# 2. fetch1() for EACH key

# New (2.0): Single query
# Streams rows from one cursor
for row in table:
    ...

Memory Efficiency¶

to_dicts(): Returns full list in memory
for row in table:: Streams one row at a time
to_arrays(limit=N): Fetches only N rows

Format Selection¶

Use Case	Recommended Method
Data analysis	`to_pandas()` or `to_polars()`
JSON API responses	`to_dicts()`
Numeric computation	`to_arrays()`
Large datasets	`for row in table:` (streaming)
Interop with other tools	`to_arrow()`

Error Messages¶

When attempting to use removed methods, users see helpful error messages:

>>> table.fetch()
AttributeError: fetch() has been removed in DataJoint 2.0.
Use to_dicts(), to_pandas(), to_arrays(), or keys() instead.
See table.fetch.__doc__ for details.

Optional Dependencies¶

Install optional dependencies for additional output formats:

# For polars support
pip install datajoint[polars]

# For PyArrow support
pip install datajoint[arrow]

# For both
pip install datajoint[polars,arrow]