Codecs
Type codec system
Codec type system for DataJoint.
This module provides the Codec base class for creating custom data types that extend DataJoint's native type system. Codecs provide encode/decode semantics for complex Python objects.
Codecs register automatically when subclassed, so no decorator is needed (requires Python 3.10+).
Example:

    import datajoint as dj

    class GraphCodec(dj.Codec):
        name = "graph"

        def get_dtype(self, is_store: bool) -> str:
            # Chain to the built-in <blob> codec for serialization.
            return "<blob>"

        def encode(self, graph, *, key=None, store_name=None):
            # Store a networkx graph as plain node and edge lists.
            return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}

        def decode(self, stored, *, key=None):
            import networkx as nx
            G = nx.Graph()
            G.add_nodes_from(stored['nodes'])
            G.add_edges_from(stored['edges'])
            return G

    # Then use in table definitions:
    class MyTable(dj.Manual):
        definition = '''
        id : uint16
        ---
        data : <graph>
        '''
Codec
Bases: ABC
Base class for codec types. Subclasses auto-register by name.
Requires Python 3.10+.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | str or None | Unique identifier used in |
Examples:
>>> import datajoint as dj
>>> class GraphCodec(dj.Codec):
...     name = "graph"
...
...     def get_dtype(self, is_store: bool) -> str:
...         return "<blob>"
...
...     def encode(self, graph, *, key=None, store_name=None):
...         return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}
...
...     def decode(self, stored, *, key=None):
...         import networkx as nx
...         G = nx.Graph()
...         G.add_nodes_from(stored['nodes'])
...         G.add_edges_from(stored['edges'])
...         return G
Use in table definitions::

    class Connectivity(dj.Manual):
        definition = '''
        id : uint16
        ---
        graph_data : <graph>
        '''
Skip auto-registration for abstract base classes::

    class ExternalOnlyCodec(dj.Codec, register=False):
        '''Abstract base: not registered.'''
        ...
get_dtype (abstractmethod)
get_dtype(is_store)
Return the storage dtype for this codec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `is_store` | bool | True if | required |
Returns:
| Type | Description |
|---|---|
| str | A core type (e.g., |
Raises:
| Type | Description |
|---|---|
| DataJointError | If store mode is requested but not supported. |
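A codec's `get_dtype` can branch on `is_store` and raise when store mode is requested but unsupported. This sketch assumes `is_store` is True when the attribute spec carries a store parameter (as in `<blob@cold>`); `InlineOnlyCodec`, `DualModeCodec`, and the local `DataJointError` class are hypothetical, and the `"<hash>"` delegation mirrors the `resolve_dtype` examples rather than a documented rule.

```python
class DataJointError(Exception):
    """Local stand-in for datajoint's DataJointError."""


class InlineOnlyCodec:
    """Hypothetical codec that only supports in-table storage."""

    name = "inline_only"

    def get_dtype(self, is_store: bool) -> str:
        if is_store:
            # Store mode was requested (e.g. "<inline_only@cold>") but is not supported.
            raise DataJointError(f"<{self.name}> does not support store mode")
        return "bytes"


class DualModeCodec:
    """Hypothetical codec that supports both in-table and store mode."""

    name = "dual"

    def get_dtype(self, is_store: bool) -> str:
        # In store mode, delegate to another codec type instead of a core type.
        return "<hash>" if is_store else "bytes"
```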
encode (abstractmethod)
encode(value, *, key=None, store_name=None)
Encode Python value for storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | any | The Python object to store. | required |
| `key` | dict | Primary key values. May be needed for path construction. | `None` |
| `store_name` | str | Target store name for object storage. | `None` |
Returns:
| Type | Description |
|---|---|
| any | Value in the format expected by the dtype. |
decode (abstractmethod)
decode(stored, *, key=None)
Decode stored value back to Python.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stored` | any | Data retrieved from storage. | required |
| `key` | dict | Primary key values. | `None` |
Returns:
| Type | Description |
|---|---|
| any | The reconstructed Python object. |
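`encode` and `decode` are meant to be inverses: `encode` returns something the declared dtype can hold, and `decode` rebuilds the original object from it. A minimal round-trip check for a hypothetical `FractionCodec`, written as a plain class so it runs without DataJoint (in practice it would subclass `dj.Codec`):

```python
from fractions import Fraction


class FractionCodec:
    """Hypothetical codec storing a Fraction as plain integers."""

    name = "fraction"

    def get_dtype(self, is_store: bool) -> str:
        return "<blob>"  # chain to <blob>, as the GraphCodec example does

    def encode(self, value, *, key=None, store_name=None):
        # Plain ints survive any serializer used further down the chain.
        return {"num": value.numerator, "den": value.denominator}

    def decode(self, stored, *, key=None):
        return Fraction(stored["num"], stored["den"])


codec = FractionCodec()
stored = codec.encode(Fraction(3, 4), key={"id": 1})
restored = codec.decode(stored, key={"id": 1})
```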
validate
validate(value)
Validate a value before encoding.
Override this method to add type checking or domain constraints.
Called automatically before encode() during INSERT operations.
The default implementation accepts any value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `value` | any | The value to validate. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If the value has an incompatible type. |
| ValueError | If the value fails domain validation. |
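A `validate` override typically checks the type first and the domain second, raising `TypeError` and `ValueError` respectively. The sketch below uses a hypothetical `PositiveIntCodec` and a hypothetical `prepare_for_insert` helper that only mimics the documented order (validate, then encode); it is not DataJoint's INSERT path.

```python
class PositiveIntCodec:
    """Hypothetical codec that accepts only positive integers."""

    name = "positive_int"

    def validate(self, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected int, got {type(value).__name__}")
        if value <= 0:
            raise ValueError(f"expected a positive integer, got {value}")

    def encode(self, value, *, key=None, store_name=None):
        return value


def prepare_for_insert(codec, value, key=None):
    """Hypothetical helper: run validate() before encode()."""
    codec.validate(value)
    return codec.encode(value, key=key)
```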
parse_type_spec
parse_type_spec(spec)
Parse a type specification into type name and optional store parameter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `spec` | str | Type specification string, with or without angle brackets. | required |
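Returns:
| Type | Description |
|---|---|
| `tuple[str, str \| None]` | The type name and the store name. The store name is `None` when the spec has no `@`, and an empty string when `@` has nothing after it. |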
Examples:
>>> parse_type_spec("<blob>")
("blob", None)
>>> parse_type_spec("<blob@cold>")
("blob", "cold")
>>> parse_type_spec("<blob@>")
("blob", "")
unregister_codec
unregister_codec(name)
Remove a codec from the registry.
Primarily useful for testing. Use with caution in production code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | str | The codec name to unregister. | required |
Raises:
| Type | Description |
|---|---|
| DataJointError | If the codec is not registered. |
get_codec
get_codec(name)
Retrieve a registered codec by name.
Looks up the codec in the explicit registry first, then attempts to load from installed packages via entry points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | str | The codec name, with or without angle brackets. Store parameters (e.g., | required |
Returns:
| Type | Description |
|---|---|
| Codec | The registered Codec instance. |
Raises:
| Type | Description |
|---|---|
| DataJointError | If the codec is not found. |
list_codecs
list_codecs()
List all registered codec names.
Returns:
| Type | Description |
|---|---|
| list[str] | Sorted list of registered codec names. |
is_codec_registered
is_codec_registered(name)
Check if a codec name is registered.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | str | The codec name to check (store parameters are ignored). | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the codec is registered. |
resolve_dtype
resolve_dtype(dtype, seen=None, store_name=None)
Resolve a dtype string, following codec chains.
If `dtype` references another codec (e.g., "<hash>"), it is resolved recursively to find the ultimate storage type. Store parameters are propagated through the chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dtype` | str | The dtype string to resolve (e.g., | required |
| `seen` | set[str] | Set of already-seen codec names (for cycle detection). | `None` |
| `store_name` | str | Store name from the outer type specification (propagated inward). | `None` |
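Returns:
| Type | Description |
|---|---|
| `tuple[str, list[Codec], str \| None]` | The final storage dtype, the codecs traversed along the chain, and the store name (or `None`). |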
Raises:
| Type | Description |
|---|---|
| DataJointError | If a circular type reference is detected. |
Examples:
>>> resolve_dtype("<blob>")
("bytes", [BlobCodec], None)
>>> resolve_dtype("<blob@cold>")
("<hash>", [BlobCodec], "cold")
>>> resolve_dtype("bytes")
("bytes", [], None)
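The recursion, cycle detection, and store propagation can be sketched with toy codecs. Everything here is hypothetical: the codec names, the placeholder core-type strings, and the rule that an explicit `@store` on an inner link overrides the inherited one. The source's own `<blob@cold>` example stops at `"<hash>"`, so the real function may treat some codec types differently; this sketch always follows the chain to a non-codec type.

```python
class DataJointError(Exception):
    """Local stand-in for datajoint's DataJointError."""


class ToyCodec:
    """Hypothetical codec whose dtype depends only on store mode."""

    def __init__(self, name: str, inline_dtype: str, store_dtype: str):
        self.name = name
        self._dtypes = {False: inline_dtype, True: store_dtype}

    def get_dtype(self, is_store: bool) -> str:
        return self._dtypes[is_store]


# Toy registry; names and core-type strings are placeholders.
REGISTRY = {
    "graph": ToyCodec("graph", "<pack>", "<pack>"),
    "pack": ToyCodec("pack", "bytes", "<ref>"),
    "ref": ToyCodec("ref", "varchar(255)", "varchar(255)"),
    "ping": ToyCodec("ping", "<pong>", "<pong>"),  # deliberately circular
    "pong": ToyCodec("pong", "<ping>", "<ping>"),
}


def resolve_dtype(dtype, seen=None, store_name=None):
    seen = set() if seen is None else seen
    if not (dtype.startswith("<") and dtype.endswith(">")):
        return dtype, [], store_name  # a non-codec type ends the chain
    name, sep, store = dtype[1:-1].partition("@")
    if sep:
        store_name = store  # assumption: an explicit "@store" overrides the inherited one
    if name in seen:
        raise DataJointError(f"Circular codec reference involving <{name}>")
    seen.add(name)
    codec = REGISTRY[name]
    inner = codec.get_dtype(is_store=store_name is not None)
    final, chain, store_name = resolve_dtype(inner, seen, store_name)
    return final, [codec, *chain], store_name
```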
lookup_codec
lookup_codec(codec_spec)
Look up a codec from a type specification string.
Parses a codec specification (e.g., "<blob@store>") and returns the codec instance along with any store name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `codec_spec` | str | The codec specification, with or without angle brackets. May include a store parameter (e.g., | required |
Returns:
| Type | Description |
|---|---|
| `tuple[Codec, str \| None]` | The codec instance and the store name (`None` if the specification has none). |
Raises:
| Type | Description |
|---|---|
| DataJointError | If the codec is not found. |
decode_attribute
decode_attribute(attr, data, squeeze=False)
Decode a raw database value using the attribute's codec or native type handling.
This is the central decode function used by all fetch methods. It handles:
- Codec chains (e.g., blob@store →
Args:
- `attr`: Attribute from the table's heading.
- `data`: Raw value fetched from the database.
- `squeeze`: If True, remove singleton dimensions from numpy arrays.
Returns: Decoded Python value.
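Decoding a codec chain means undoing the encode steps in reverse order, so the innermost codec's `decode` runs first. A minimal sketch with hypothetical `Attribute`, `FractionCodec`, `JsonTextCodec`, and `encode_attribute` stand-ins; numpy is optional and used only for `squeeze`, and native-type handling is reduced to passing the value through unchanged.

```python
import json
from dataclasses import dataclass, field
from fractions import Fraction

try:
    import numpy as np
except ImportError:  # numpy is optional; squeeze is skipped without it
    np = None


class FractionCodec:
    """Hypothetical outer codec: Fraction <-> dict."""

    def encode(self, value, *, key=None, store_name=None):
        return {"num": value.numerator, "den": value.denominator}

    def decode(self, stored, *, key=None):
        return Fraction(stored["num"], stored["den"])


class JsonTextCodec:
    """Hypothetical inner codec: dict <-> JSON text."""

    def encode(self, value, *, key=None, store_name=None):
        return json.dumps(value)

    def decode(self, stored, *, key=None):
        return json.loads(stored)


@dataclass
class Attribute:
    """Hypothetical stand-in for a heading attribute."""

    name: str
    codecs: list = field(default_factory=list)  # outermost codec first


def encode_attribute(attr, value):
    for codec in attr.codecs:  # outermost codec encodes first
        value = codec.encode(value)
    return value


def decode_attribute(attr, data, squeeze=False):
    value = data
    for codec in reversed(attr.codecs):  # innermost codec decodes first
        value = codec.decode(value)
    if squeeze and np is not None and isinstance(value, np.ndarray):
        value = value.squeeze()  # drop singleton dimensions
    return value
```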