
Codecs

Type codec system

Codec type system for DataJoint.

This module provides the Codec base class for creating custom data types that extend DataJoint's native type system. Codecs provide encode/decode semantics for complex Python objects.

Codecs auto-register when subclassed - no decorator needed (Python 3.10+).

Example:

class GraphCodec(dj.Codec):
    name = "graph"

    def get_dtype(self, is_store: bool) -> str:
        return "<blob>"

    def encode(self, graph, *, key=None, store_name=None):
        return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}

    def decode(self, stored, *, key=None):
        import networkx as nx
        G = nx.Graph()
        G.add_nodes_from(stored['nodes'])
        G.add_edges_from(stored['edges'])
        return G

# Then use in table definitions:
class MyTable(dj.Manual):
    definition = '''
    id : uint16
    ---
    data : <graph>
    '''

Codec

Bases: ABC

Base class for codec types. Subclasses auto-register by name.

Requires Python 3.10+.

Attributes:

name (str or None): Unique identifier used in <name> syntax. Must be set by subclasses.

Examples:

>>> class GraphCodec(dj.Codec):
...     name = "graph"
...
...     def get_dtype(self, is_store: bool) -> str:
...         return "<blob>"
...
...     def encode(self, graph, *, key=None, store_name=None):
...         return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}
...
...     def decode(self, stored, *, key=None):
...         import networkx as nx
...         G = nx.Graph()
...         G.add_nodes_from(stored['nodes'])
...         G.add_edges_from(stored['edges'])
...         return G

Use in table definitions:

class Connectivity(dj.Manual):
    definition = '''
    id : uint16
    ---
    graph_data : <graph>
    '''

Skip auto-registration for abstract base classes:

class ExternalOnlyCodec(dj.Codec, register=False):
    '''Abstract base - not registered.'''
    ...

get_dtype abstractmethod

get_dtype(is_store)

Return the storage dtype for this codec.

Parameters:

is_store (bool, required): True if the @ modifier is present (object store rather than inline storage).

Returns:

str: A core type (e.g., "bytes", "json") or another codec (e.g., "<hash>").

Raises:

DataJointError: If store mode is requested but not supported.
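As an illustration of the two outcomes described above, here is a hedged sketch of two `get_dtype` implementations. The classes are plain stand-ins rather than `dj.Codec` subclasses, `SketchError` stands in for `DataJointError`, and the codec names are invented for the example.

```python
class SketchError(Exception):
    """Stand-in for DataJointError so the sketch runs without DataJoint."""


class InlineOrStoreCodec:
    """Illustrative codec that supports both inline and store modes."""

    name = "sketch_array"  # hypothetical codec name

    def get_dtype(self, is_store: bool) -> str:
        # Store mode delegates to another codec; inline mode uses a core type.
        return "<hash>" if is_store else "bytes"


class InlineOnlyCodec:
    """Illustrative codec that rejects the @ modifier."""

    name = "sketch_small"  # hypothetical codec name

    def get_dtype(self, is_store: bool) -> str:
        if is_store:
            raise SketchError(f"<{self.name}> does not support store mode")
        return "json"
```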

encode abstractmethod

encode(value, *, key=None, store_name=None)

Encode Python value for storage.

Parameters:

value (any, required): The Python object to store.

key (dict, default None): Primary key values. May be needed for path construction.

store_name (str, default None): Target store name for object storage.

Returns:

any: Value in the format expected by the dtype.

decode abstractmethod

decode(stored, *, key=None)

Decode stored value back to Python.

Parameters:

stored (any, required): Data retrieved from storage.

key (dict, default None): Primary key values.

Returns:

any: The reconstructed Python object.
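A codec is only useful if `decode` inverts `encode`. The sketch below checks that round trip for a small value type. It assumes a codec whose dtype is "json", so `encode` returns a JSON-compatible dict; `Interval` and `IntervalCodecSketch` are hypothetical, and the class is a plain stand-in that mirrors the signatures documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    stop: float


class IntervalCodecSketch:
    """Plain class mirroring the encode/decode signatures described here."""

    name = "interval"  # hypothetical codec name

    def get_dtype(self, is_store: bool) -> str:
        return "json"

    def encode(self, value, *, key=None, store_name=None):
        # Return a JSON-compatible structure, since get_dtype reports "json".
        return {"start": value.start, "stop": value.stop}

    def decode(self, stored, *, key=None):
        # Rebuild the original object from the stored structure.
        return Interval(stored["start"], stored["stop"])


codec = IntervalCodecSketch()
original = Interval(0.5, 2.0)
restored = codec.decode(codec.encode(original, key={"id": 1}), key={"id": 1})
assert restored == original  # decode inverts encode
```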

validate

validate(value)

Validate a value before encoding.

Override this method to add type checking or domain constraints. Called automatically before encode() during INSERT operations. The default implementation accepts any value.

Parameters:

value (any, required): The value to validate.

Raises:

TypeError: If the value has an incompatible type.

ValueError: If the value fails domain validation.
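A `validate` override might separate type errors from domain errors as in this sketch. `ProbabilityCodecSketch` is a hypothetical plain class, and `insert_value` is an invented stand-in for the insert path that only illustrates the validate-then-encode order stated above.

```python
class ProbabilityCodecSketch:
    """Illustrative codec that accepts numbers in [0, 1] only."""

    name = "probability"  # hypothetical codec name

    def validate(self, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"probability must be within [0, 1], got {value}")

    def encode(self, value, *, key=None, store_name=None):
        return float(value)


def insert_value(codec, value):
    """Hypothetical stand-in for the insert path: validate, then encode."""
    codec.validate(value)
    return codec.encode(value)
```

Raising before `encode` runs keeps invalid values from being partially written, which is presumably why validation happens first.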

parse_type_spec

parse_type_spec(spec)

Parse a type specification into type name and optional store parameter.

Parameters:

spec (str, required): Type specification string, with or without angle brackets.

Returns:

tuple[str, str | None]: (type_name, store_name). store_name is None if no store is specified, or an empty string if @ is present without a name (default store).

Examples:

>>> parse_type_spec("<blob>")
("blob", None)
>>> parse_type_spec("<blob@cold>")
("blob", "cold")
>>> parse_type_spec("<blob@>")
("blob", "")
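The three cases above (no store, named store, default store) can be reproduced with a few lines of string handling. This is an illustrative reimplementation under the documented behavior, not the library's code; `parse_type_spec_sketch` is a hypothetical name.

```python
def parse_type_spec_sketch(spec):
    """Split "<name@store>" into (name, store); angle brackets are optional."""
    spec = spec.strip()
    if spec.startswith("<") and spec.endswith(">"):
        spec = spec[1:-1]
    name, sep, store = spec.partition("@")
    # No "@" means no store (None); a bare "@" means the default store ("").
    return name.strip(), (store.strip() if sep else None)
```

Using `str.partition` keeps the distinction between a missing `@` and an empty store name, which a plain `split("@")` would blur.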

unregister_codec

unregister_codec(name)

Remove a codec from the registry.

Primarily useful for testing. Use with caution in production code.

Parameters:

name (str, required): The codec name to unregister.

Raises:

DataJointError: If the codec is not registered.

get_codec

get_codec(name)

Retrieve a registered codec by name.

Looks up the codec in the explicit registry first, then attempts to load from installed packages via entry points.

Parameters:

name (str, required): The codec name, with or without angle brackets. Store parameters (e.g., "<blob@cold>") are stripped.

Returns:

Codec: The registered Codec instance.

Raises:

DataJointError: If the codec is not found.

list_codecs

list_codecs()

List all registered codec names.

Returns:

list[str]: Sorted list of registered codec names.

is_codec_registered

is_codec_registered(name)

Check if a codec name is registered.

Parameters:

name (str, required): The codec name to check (store parameters are ignored).

Returns:

bool: True if the codec is registered.

resolve_dtype

resolve_dtype(dtype, seen=None, store_name=None)

Resolve a dtype string, following codec chains.

If dtype references another codec (e.g., "<hash>"), recursively resolves to find the ultimate storage type. Store parameters are propagated through the chain.

Parameters:

dtype (str, required): The dtype string to resolve (e.g., "<blob>", "<blob@cold>", "bytes").

seen (set[str], default None): Set of already-seen codec names (for cycle detection).

store_name (str, default None): Store name from outer type specification (propagated inward).

Returns:

tuple[str, list[Codec], str | None]: (final_storage_type, codec_chain, resolved_store_name). Chain is ordered from outermost to innermost codec.

Raises:

DataJointError: If a circular type reference is detected.

Examples:

>>> resolve_dtype("<blob>")
("bytes", [BlobCodec], None)
>>> resolve_dtype("<blob@cold>")
("<hash>", [BlobCodec], "cold")
>>> resolve_dtype("bytes")
("bytes", [], None)
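To make the chain-following and cycle detection concrete, here is a self-contained sketch with two invented codecs, where `<outer>` is stored as `<inner>` and `<inner>` is stored as "bytes". It does not reproduce the built-in codecs or their exact results; `CODECS`, `SketchError`, and the rule that an `@` on an inner spec overrides the propagated store are all assumptions of the sketch.

```python
class SketchError(Exception):
    """Stand-in for DataJointError."""


class OuterCodec:
    name = "outer"

    def get_dtype(self, is_store):
        return "<inner>"  # delegates to another codec


class InnerCodec:
    name = "inner"

    def get_dtype(self, is_store):
        return "bytes"  # core type: ends the chain


CODECS = {c.name: c() for c in (OuterCodec, InnerCodec)}  # hypothetical registry


def resolve_dtype_sketch(dtype, seen=None, store_name=None):
    seen = set() if seen is None else seen
    if not (dtype.startswith("<") and dtype.endswith(">")):
        return dtype, [], store_name  # not a codec reference, nothing to follow
    name, sep, store = dtype[1:-1].partition("@")
    if sep:
        store_name = store  # assumed: an explicit @ at this level takes effect
    if name in seen:
        raise SketchError(f"circular codec reference at <{name}>")
    seen.add(name)
    codec = CODECS[name]
    inner = codec.get_dtype(is_store=store_name is not None)
    final, chain, store_name = resolve_dtype_sketch(inner, seen, store_name)
    return final, [codec] + chain, store_name  # outermost codec first


final, chain, store = resolve_dtype_sketch("<outer@cold>")
print(final, [c.name for c in chain], store)
```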

lookup_codec

lookup_codec(codec_spec)

Look up a codec from a type specification string.

Parses a codec specification (e.g., "<blob@store>") and returns the codec instance along with any store name.

Parameters:

codec_spec (str, required): The codec specification, with or without angle brackets. May include a store parameter (e.g., "<blob@cold>").

Returns:

tuple[Codec, str | None]: (codec_instance, store_name). store_name is None when no store is specified.

Raises:

DataJointError: If the codec is not found.

decode_attribute

decode_attribute(attr, data, squeeze=False)

Decode raw database value using attribute's codec or native type handling.

This is the central decode function used by all fetch methods. It handles:

- Codec chains (e.g., blob@store → <hash> → bytes)
- Native type conversions (JSON, UUID)
- Object storage downloads (via config["download_path"])

Args:

attr: Attribute from the table's heading.

data: Raw value fetched from the database.

squeeze: If True, remove singleton dimensions from numpy arrays.

Returns:

Decoded Python value.
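Decoding through a codec chain can be sketched as applying each codec's `decode` in reverse chain order, so the innermost codec sees the raw stored value first. This ordering is an assumption drawn from the outermost-to-innermost chain returned by `resolve_dtype`, as is the pass-through of `None`; the codecs and `decode_through_chain` are hypothetical, and native type conversions, `squeeze`, and object storage downloads are left out.

```python
import json


class JsonTextCodec:
    """Innermost stand-in: stored text -> Python structure."""

    def decode(self, stored, *, key=None):
        return json.loads(stored)


class PointCodec:
    """Outermost stand-in: structure -> (x, y) tuple."""

    def decode(self, stored, *, key=None):
        return (stored["x"], stored["y"])


def decode_through_chain(chain, data, key=None):
    """Apply decoders from innermost to outermost (assumed order)."""
    if data is None:
        return None  # assumed: NULL values pass through undecoded
    for codec in reversed(chain):
        data = codec.decode(data, key=key)
    return data


chain = [PointCodec(), JsonTextCodec()]  # outermost first
print(decode_through_chain(chain, '{"x": 1, "y": 2}'))
```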