DataJoint 2.0 Fetch API Specification

Overview

DataJoint 2.0 replaces the complex fetch() method with a set of explicit, composable output methods. This provides better discoverability, clearer intent, and more efficient iteration.

Design Principles

  1. Explicit over implicit: Each output format has its own method
  2. Composable: Use existing .proj() for column selection
  3. Lazy iteration: Single cursor streaming instead of fetch-all-keys
  4. Modern formats: First-class support for polars and Arrow

New API Reference

Output Methods

| Method | Returns | Description |
|---|---|---|
| to_dicts() | list[dict] | All rows as a list of dictionaries |
| to_pandas() | DataFrame | pandas DataFrame with primary key as index |
| to_polars() | polars.DataFrame | polars DataFrame (requires datajoint[polars]) |
| to_arrow() | pyarrow.Table | PyArrow Table (requires datajoint[arrow]) |
| to_arrays() | np.ndarray | numpy structured array (recarray) |
| to_arrays('a', 'b') | tuple[array, array] | Tuple of arrays for specific columns |
| keys() | list[dict] | Primary key values only |
| fetch1() | dict | Single row as dict (raises if not exactly 1) |
| fetch1('a', 'b') | tuple | Single row attribute values |
| head(limit=25) | list[dict] | Preview first N entries |
| tail(limit=25) | list[dict] | Preview last N entries |
| cursor(as_dict=False) | cursor | Raw database cursor for manual iteration |

Common Parameters

All output methods accept these optional parameters:

table.to_dicts(
    order_by=None,      # str or list: column(s) to sort by, e.g. "KEY", "name DESC"
    limit=None,         # int: maximum rows to return
    offset=None,        # int: rows to skip
    squeeze=False,      # bool: remove singleton dimensions from arrays
)
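
One way to picture how these parameters combine is as the tail of a SQL query. The helper below is a hypothetical sketch, not DataJoint's implementation: build_tail and its primary_key argument are invented for illustration, and it assumes that "KEY" stands for the primary key columns in their declared order. Rejecting an offset without a limit is a choice made for this sketch only.

```python
def build_tail(primary_key, order_by=None, limit=None, offset=None):
    """Hypothetical helper: render order_by/limit/offset as a SQL tail."""
    if offset is not None and limit is None:
        # MySQL only accepts OFFSET together with LIMIT.
        raise ValueError("offset requires limit")
    if isinstance(order_by, str):
        order_by = [order_by]
    columns = []
    for item in order_by or []:
        name, _, direction = item.partition(" ")
        if name == "KEY":
            # Assumption: "KEY" expands to every primary key column.
            columns.extend(f"{pk} {direction}".strip() for pk in primary_key)
        else:
            columns.append(item)
    parts = []
    if columns:
        parts.append("ORDER BY " + ", ".join(columns))
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    if offset is not None:
        parts.append(f"OFFSET {int(offset)}")
    return " ".join(parts)


tail = build_tail(
    ["subject_id", "session"],
    order_by=["KEY", "name DESC"],
    limit=10,
    offset=5,
)
print(tail)
```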

For in-store types (attachments, filepaths), files are downloaded to config["download_path"]. Use config.override() to change:

with dj.config.override(download_path="/data"):
    data = table.to_dicts()
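
The with block suggests that the override applies only inside the block and that the previous value returns afterwards. As a rough mental model, and not DataJoint's actual code, a scoped override can be sketched with a plain dict and contextlib. The names settings and override below are hypothetical stand-ins.

```python
from contextlib import contextmanager

settings = {"download_path": "."}  # stand-in for a config object


@contextmanager
def override(**changes):
    """Hypothetical stand-in: apply changes, then restore the old values."""
    saved = {name: settings[name] for name in changes}
    settings.update(changes)
    try:
        yield settings
    finally:
        # Restore even if the body of the with block raises.
        settings.update(saved)


with override(download_path="/data"):
    inside = settings["download_path"]
after = settings["download_path"]
print(inside, after)
```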

Iteration

# Lazy streaming - yields one dict per row from database cursor
for row in table:
    process(row)  # row is a dict

Migration Guide

Basic Fetch Operations

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| table.fetch() | table.to_arrays() or table.to_dicts() |
| table.fetch(format="array") | table.to_arrays() |
| table.fetch(format="frame") | table.to_pandas() |
| table.fetch(as_dict=True) | table.to_dicts() |

Attribute Fetching

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| table.fetch('a') | table.to_arrays('a') |
| a, b = table.fetch('a', 'b') | a, b = table.to_arrays('a', 'b') |
| table.fetch('a', 'b', as_dict=True) | table.proj('a', 'b').to_dicts() |

Primary Key Fetching

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| table.fetch('KEY') | table.keys() |
| table.fetch(dj.key) | table.keys() |
| keys, a = table.fetch('KEY', 'a') | See note below |

For mixed KEY + attribute fetch:

# Old: keys, a = table.fetch('KEY', 'a')
# New: Combine keys() with to_arrays()
keys = table.keys()
a = table.to_arrays('a')
# Or use to_dicts() which includes all columns
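
Calling keys() and to_arrays('a') separately means two requests, and this specification does not say whether both return rows in the same order. Deriving both pieces from a single to_dicts() result keeps them aligned. The sketch below works on plain dictionaries; split_keys is a hypothetical helper, and the primary key names are passed in by hand as an assumption about the table.

```python
def split_keys(rows, primary_key, *attrs):
    """Split row dicts into key dicts plus one list per requested attribute."""
    keys = [{name: row[name] for name in primary_key} for row in rows]
    columns = tuple([row[attr] for row in rows] for attr in attrs)
    return (keys, *columns)


# Sample rows standing in for the result of table.to_dicts()
rows = [
    {"subject_id": 1, "session": 1, "a": 0.5},
    {"subject_id": 1, "session": 2, "a": 0.7},
]
keys, a = split_keys(rows, ["subject_id", "session"], "a")
print(keys)
print(a)
```

The lists can be wrapped in numpy arrays afterwards if array semantics are needed.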

Ordering, Limiting, Offset

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| table.fetch(order_by='name') | table.to_arrays(order_by='name') |
| table.fetch(limit=10) | table.to_arrays(limit=10) |
| table.fetch(order_by='KEY', limit=10, offset=5) | table.to_arrays(order_by='KEY', limit=10, offset=5) |

Single Row Fetch (fetch1)

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| table.fetch1() | table.fetch1() (unchanged) |
| a, b = table.fetch1('a', 'b') | a, b = table.fetch1('a', 'b') (unchanged) |
| table.fetch1('KEY') | table.fetch1() then extract pk columns |
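
The "exactly one row" rule and the key extraction step can be illustrated on plain dictionaries. This is only a sketch: exactly_one is a hypothetical helper, ValueError stands in for whatever exception DataJoint actually raises, and the primary key names are assumed rather than read from a table.

```python
def exactly_one(rows):
    """Return the only row, or raise if there are zero or several."""
    rows = list(rows)
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, got {len(rows)}")
    return rows[0]


primary_key = ["subject_id", "session"]  # assumed primary key attributes

# The dict below stands in for the result of table.fetch1()
row = exactly_one([{"subject_id": 1, "session": 2, "name": "pilot"}])

# Replacement for the old fetch1('KEY'): keep only the primary key columns
key = {name: row[name] for name in primary_key}
print(key)
```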

Configuration

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| dj.config['fetch_format'] = 'frame' | Use .to_pandas() explicitly |
| with dj.config.override(fetch_format='frame'): | Use .to_pandas() in the block |

Iteration

| Old Pattern (1.x) | New Pattern (2.0) |
|---|---|
| for row in table: | for row in table: (same syntax, now lazy!) |
| list(table) | table.to_dicts() |

Column Selection with proj()

Use .proj() for column selection, then apply output method:

# Select specific columns
table.proj('col1', 'col2').to_pandas()
table.proj('col1', 'col2').to_dicts()

# Computed columns
table.proj(total='price * quantity').to_pandas()

Removed Features

Removed Methods and Parameters

  • fetch() method - use explicit output methods
  • fetch('KEY') - use keys()
  • dj.key class - use keys() method
  • format= parameter - use explicit methods
  • as_dict= parameter - use to_dicts()
  • config['fetch_format'] setting - use explicit methods

Removed Imports

# Old (removed)
from datajoint import key
result = table.fetch(key)

# New
result = table.keys()

Examples

Example 1: Basic Data Retrieval

# Get all data as DataFrame
df = Experiment().to_pandas()

# Get all data as list of dicts
rows = Experiment().to_dicts()

# Get all data as numpy array
arr = Experiment().to_arrays()

Example 2: Filtered and Sorted Query

# Get recent experiments, sorted by date
recent = (Experiment() & "date > '2024-01-01'").to_pandas(
    order_by='date DESC',
    limit=100
)

Example 3: Specific Columns

# Fetch specific columns as arrays
names, dates = Experiment().to_arrays('name', 'date')

# Or with primary key included (assuming the keys come back as the first element)
keys, names, dates = Experiment().to_arrays('name', 'date', include_key=True)

Example 4: Primary Keys for Iteration

# Get keys for restriction
keys = Experiment().keys()
for key in keys:
    process(Session() & key)

Example 5: Single Row

# Get one row as dict
row = (Experiment() & key).fetch1()

# Get specific attributes
name, date = (Experiment() & key).fetch1('name', 'date')

Example 6: Lazy Iteration

# Stream rows efficiently (single database cursor)
for row in Experiment():
    if should_process(row):
        process(row)
    if done:
        break  # Early termination - no wasted fetches

Example 7: Modern DataFrame Libraries

# Polars (fast, modern)
import polars as pl
df = Experiment().to_polars()
result = df.filter(pl.col('value') > 100).group_by('category').agg(pl.mean('value'))

# PyArrow (zero-copy interop)
table = Experiment().to_arrow()
# Convert with table.to_pandas() or pl.from_arrow(table); copies are avoided where the column types allow

Performance Considerations

Lazy Iteration

Iteration in 2.0 issues a single query instead of one query per row:

# Old (1.x): N+1 queries
# 1. fetch("KEY") gets ALL keys
# 2. fetch1() for EACH key

# New (2.0): Single query
# Streams rows from one cursor
for row in table:
    ...
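
The difference between the two access patterns can be reproduced with any SQL database. The sketch below uses Python's built-in sqlite3 module purely as a stand-in backend and counts the statements each pattern issues. None of it is DataJoint code; the table name, the run helper, and the stream generator are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [(i, f"exp{i}") for i in range(5)],
)

log = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Execute a statement and record it so queries can be counted."""
    log.append(sql)
    return conn.execute(sql, params)


# N+1 pattern: one query for all keys, then one query per key
ids = [r["id"] for r in run("SELECT id FROM experiment ORDER BY id")]
slow = [
    dict(run("SELECT * FROM experiment WHERE id = ?", (i,)).fetchone())
    for i in ids
]
n_plus_one = len(log)

log.clear()


def stream():
    """Single-cursor pattern: rows are yielded as the cursor advances."""
    for r in run("SELECT * FROM experiment ORDER BY id"):
        yield dict(r)


fast = list(stream())
single = len(log)
print(n_plus_one, single)
```

With five rows, the first pattern sends six statements and the second sends one, while both produce the same rows.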

Memory Efficiency

  • to_dicts(): Returns full list in memory
  • for row in table (iteration): Streams one row at a time
  • to_arrays(limit=N): Fetches only N rows
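
Between loading everything and streaming row by row, the limit and offset parameters allow processing in fixed-size pages. The generator below is a sketch under assumptions: iter_pages and fake_fetch are hypothetical, and fake_fetch stands in for a call such as table.to_dicts(order_by="KEY", limit=..., offset=...). A stable order_by matters here, since pages could otherwise overlap or skip rows.

```python
def iter_pages(fetch_page, page_size):
    """Yield lists of rows until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        offset += page_size


# In-memory data standing in for a table
data = [{"id": i} for i in range(7)]


def fake_fetch(limit, offset):
    """Hypothetical stand-in for an output method called with limit/offset."""
    return data[offset:offset + limit]


sizes = [len(page) for page in iter_pages(fake_fetch, page_size=3)]
print(sizes)
```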

Format Selection

| Use Case | Recommended Method |
|---|---|
| Data analysis | to_pandas() or to_polars() |
| JSON API responses | to_dicts() |
| Numeric computation | to_arrays() |
| Large datasets | for row in table: (streaming) |
| Interop with other tools | to_arrow() |
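
For the JSON use case, rows from to_dicts() may hold values that the json module cannot serialize directly. The sketch below assumes date attributes arrive as datetime.date objects, which this specification does not confirm. The sample rows and the to_json helper are hypothetical.

```python
import json
from datetime import date

# Sample rows standing in for the result of table.to_dicts()
rows = [{"name": "pilot", "date": date(2024, 1, 15), "n_trials": 40}]


def to_json(rows):
    """Serialize row dicts, rendering dates as ISO 8601 strings."""
    def encode(value):
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(rows, default=encode)


payload = to_json(rows)
print(payload)
```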

Error Messages

Calling a removed method raises an error that points to its replacement:

>>> table.fetch()
AttributeError: fetch() has been removed in DataJoint 2.0.
Use to_dicts(), to_pandas(), to_arrays(), or keys() instead.
See table.fetch.__doc__ for details.
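
One way such a message could be produced is to keep the removed method as a stub that raises, which also keeps the __doc__ reference valid. This is a guess at the mechanism, not a description of DataJoint's source. The Table class below is a hypothetical stand-in.

```python
class Table:
    """Hypothetical stand-in for a query expression class."""

    def fetch(self, *args, **kwargs):
        """Removed in 2.0: use to_dicts(), to_pandas(), to_arrays(), or keys()."""
        raise AttributeError(
            "fetch() has been removed in DataJoint 2.0. "
            "Use to_dicts(), to_pandas(), to_arrays(), or keys() instead."
        )


try:
    Table().fetch()
except AttributeError as err:
    message = str(err)

print(message)
```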

Optional Dependencies

Install optional dependencies for additional output formats:

# For polars support
pip install datajoint[polars]

# For PyArrow support
pip install datajoint[arrow]

# For both
pip install datajoint[polars,arrow]
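
Before calling to_polars() or to_arrow(), a script can check whether the underlying packages are importable. The sketch below uses only the standard library. The available_formats helper is hypothetical, and the package names polars and pyarrow are assumed to be what the two extras install.

```python
from importlib.util import find_spec


def available_formats():
    """Map each optional output method to whether its package is importable."""
    packages = {"to_polars": "polars", "to_arrow": "pyarrow"}
    return {
        method: find_spec(package) is not None
        for method, package in packages.items()
    }


status = available_formats()
for method, ok in status.items():
    print(f"{method}: {'available' if ok else 'missing'}")
```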