Configuration Reference¶

DataJoint configuration options and settings.

Example: See lcms-demo for a complete project with configuration files.

Configuration Sources¶

Configuration is loaded in priority order:

Environment variables (highest priority)
Secrets directory (.secrets/)
Config file (datajoint.json)
Defaults (lowest priority)

Database Settings¶

Setting	Environment	Default	Description
`database.backend`	`DJ_BACKEND`	`mysql`	Database backend: `mysql` or `postgresql` (new in 2.1)
`database.host`	`DJ_HOST`	`localhost`	Database server hostname
`database.port`	`DJ_PORT`	`3306`/`5432`	Database server port (auto-detects from backend)
`database.user`	`DJ_USER`	—	Database username (required)
`database.password`	`DJ_PASS`	—	Database password (required)
`database.name`	`DJ_DATABASE_NAME`	`None`	Database name for PostgreSQL connections. Defaults to `"postgres"` if not set. (new in 2.2.1)
`database.reconnect`	—	`True`	Auto-reconnect on connection loss
`database.use_tls`	`DJ_USE_TLS`	`None`	Enable TLS encryption (env var new in 2.1)
`database.database_prefix`	`DJ_DATABASE_PREFIX`	`""`	(Deprecated — use `database.name` instead) Prefix for database/schema names
`database.create_tables`	`DJ_CREATE_TABLES`	`True`	Default for `Schema(create_tables=)`. Set `False` for production mode

Connection Settings¶

Setting	Default	Description
`connection.init_function`	`None`	SQL function to run on connect
`connection.charset`	`""`	Character set (pymysql default)

Stores Configuration¶

Unified storage configuration for all in-store types (<blob@>, <attach@>, <filepath@>, <object@>, <npy@>).

Default stores:

DataJoint uses two default settings to reflect the architectural distinction between integrated and reference storage:

Setting	Default	Description
`stores.default`	—	Default store for integrated storage (`<blob@>`, `<attach@>`, `<object@>`, `<npy@>`)
`stores.filepath_default`	—	Default store for filepath references (`<filepath@>`) — often different from `stores.default`

Why separate defaults? Hash and schema-addressed storage are integrated into the Object-Augmented Schema (OAS)—DataJoint manages paths, lifecycle, and integrity. Filepath storage is user-managed references to existing files—DataJoint only stores the path. These are architecturally distinct and often use different storage locations.

Common settings (all protocols):

Setting	Required	Description
`stores.<name>.protocol`	Yes	Storage protocol: `file`, `s3`, `gcs`, `azure`
`stores.<name>.location`	Yes	Base path or prefix (includes project context)
`stores.<name>.hash_prefix`	No	Path prefix for hash-addressed section (default: `"_hash"`)
`stores.<name>.schema_prefix`	No	Path prefix for schema-addressed section (default: `"_schema"`)
`stores.<name>.filepath_prefix`	No	Required path prefix for filepath section, or `null` for unrestricted (default: `null`)
`stores.<name>.subfolding`	No	Directory nesting for hash-addressed storage, e.g., `[2, 2]` (default: no subfolding)
`stores.<name>.partition_pattern`	No	Path partitioning for schema-addressed storage, e.g., `"subject_id/session_date"` (default: no partitioning)
`stores.<name>.token_length`	No	Random token length for schema-addressed filenames (default: `8`)

Storage sections:

Each store is divided into sections defined by prefix configuration. The *_prefix parameters set the path prefix for each storage section:

hash_prefix: Defines the hash-addressed section for <blob@> and <attach@> (default: "_hash")
schema_prefix: Defines the schema-addressed section for <object@> and <npy@> (default: "_schema")
filepath_prefix: Optionally restricts the filepath section for <filepath@> (default: null = unrestricted)

Prefixes must be mutually exclusive (no prefix can be a parent/child of another). This allows mapping DataJoint to existing storage layouts:

{
  "stores": {
    "legacy": {
      "protocol": "file",
      "location": "/data/existing_storage",
      "hash_prefix": "content_addressed",      // Path prefix for hash section
      "schema_prefix": "structured_data",       // Path prefix for schema section
      "filepath_prefix": "raw_files"            // Path prefix for filepath section
    }
  }
}

S3-specific settings:

Setting	Required	Description
`stores.<name>.endpoint`	Yes	S3 endpoint URL (e.g., `s3.amazonaws.com`)
`stores.<name>.bucket`	Yes	Bucket name
`stores.<name>.access_key`	Yes	S3 access key ID
`stores.<name>.secret_key`	Yes	S3 secret access key
`stores.<name>.secure`	No	Use HTTPS (default: `True`)

GCS-specific settings:

Setting	Required	Description
`stores.<name>.bucket`	Yes	GCS bucket name
`stores.<name>.token`	Yes	Authentication token path
`stores.<name>.project`	No	GCS project ID

Azure-specific settings:

Setting	Required	Description
`stores.<name>.container`	Yes	Azure container name
`stores.<name>.account_name`	Yes	Storage account name
`stores.<name>.account_key`	Yes	Storage account key
`stores.<name>.connection_string`	No	Alternative to account_name + account_key

How storage methods use stores:

Hash-addressed (<blob@>, <attach@>): {location}/{hash_prefix}/{schema}/{hash} with optional subfolding
Schema-addressed (<object@>, <npy@>): {location}/{schema_prefix}/{partition}/{schema}/{table}/{key}/{field}.{token}.{ext} with optional partitioning
Filepath (<filepath@>): {location}/{filepath_prefix}/{user_path} (user-managed, cannot use hash or schema prefixes)

All storage methods share the same stores and default store. DataJoint reserves the configured hash_prefix and schema_prefix sections for managed storage; <filepath@> references can use any other paths (unless filepath_prefix is configured to restrict them).

Path structure examples:

Without partitioning:

{location}/_hash/{schema}/ab/cd/abcd1234...                    # hash-addressed with subfolding
{location}/_schema/{schema}/{table}/{key}/data.x8f2a9b1.zarr   # schema-addressed, no partitioning

With partition_pattern: "subject_id/session_date":

{location}/_schema/subject_id=042/session_date=2024-01-15/{schema}/{table}/{remaining_key}/data.x8f2a9b1.zarr

If table lacks partition attributes, it follows normal path structure.

Credentials should be stored in secrets:

.secrets/
├── stores.main.access_key
├── stores.main.secret_key
├── stores.archive.access_key
├── stores.archive.secret_key
└── stores.uc.token            # any stores.<name>.<attr> file (new in 2.2.4)

New in 2.2.4

Any stores.<name>.<attr> file is loaded, not only access_key / secret_key. This supports plugin-registered adapters with arbitrary field names (e.g. a Bearer token). See Storage Adapter API.

Jobs Settings¶

Setting	Default	Description
`jobs.auto_refresh`	`True`	Auto-refresh job queue on populate
`jobs.keep_completed`	`False`	Retain success records in jobs table
`jobs.stale_timeout`	`3600`	Seconds before stale job cleanup
`jobs.default_priority`	`5`	Default priority (0-255, lower = more urgent)
`jobs.version_method`	`None`	Version tracking: `git`, `none`, or `None` (disabled)
`jobs.add_job_metadata`	`False`	Add hidden metadata to computed tables
`jobs.allow_new_pk_fields_in_computed_tables`	`False`	Allow non-FK primary key fields

Display Settings¶

Setting	Environment	Default	Description
`display.limit`	—	`12`	Max rows to display
`display.width`	—	`14`	Column width
`display.show_tuple_count`	—	`True`	Show row count in output
`display.diagram_direction`	`DJ_DIAGRAM_DIRECTION`	`LR`	Diagram layout: `LR` (left-right) or `TB` (top-bottom) (new in 2.1)

Top-Level Settings¶

Setting	Environment	Default	Description
`loglevel`	`DJ_LOG_LEVEL`	`INFO`	Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
`safemode`	—	`True`	Require confirmation for destructive operations
`enable_python_native_blobs`	—	`True`	Allow Python-native blob serialization
`cache`	—	`None`	Path for query result cache
`query_cache`	—	`None`	Path for compiled query cache
`download_path`	—	`.`	Download location for attachments/filepaths
`stores`	`DJ_STORES`	`{}`	JSON-encoded `stores` dict. Replaces the `stores` block from `datajoint.json`. (new in 2.2.4)
`ignore_config_file`	`DJ_IGNORE_CONFIG_FILE`	`False`	Skip `datajoint.json` and the secrets directory. (new in 2.2.4)

Example Configuration¶

datajoint.json (Non-sensitive settings)¶

{
    "database.host": "mysql.example.com",
    "database.port": 3306,
    "stores": {
        "default": "main",
        "filepath_default": "raw_data",
        "main": {
            "protocol": "s3",
            "endpoint": "s3.amazonaws.com",
            "bucket": "datajoint-bucket",
            "location": "neuroscience-lab/production",
            "partition_pattern": "subject_id/session_date",
            "token_length": 8
        },
        "archive": {
            "protocol": "s3",
            "endpoint": "s3.amazonaws.com",
            "bucket": "archive-bucket",
            "location": "neuroscience-lab/long-term",
            "subfolding": [2, 2]
        },
        "raw_data": {
            "protocol": "file",
            "location": "/mnt/acquisition",
            "filepath_prefix": "recordings"
        }
    },
    "jobs": {
        "add_job_metadata": true
    }
}

.secrets/ (Credentials - never commit!)¶

.secrets/
├── database.user                      # analyst
├── database.password                  # dbpass123
├── stores.main.access_key             # AKIAIOSFODNN7EXAMPLE
├── stores.main.secret_key             # wJalrXUtnFEMI/K7MDENG...
├── stores.archive.access_key          # AKIAIOSFODNN8EXAMPLE
└── stores.archive.secret_key          # xKbmsYVuoGFNJ/L8NEOH...

Add .secrets/ to .gitignore:

echo ".secrets/" >> .gitignore

Environment Variables (Alternative to .secrets/)¶

# Database
export DJ_HOST=mysql.example.com
export DJ_USER=analyst
export DJ_PASS=secret
export DJ_DATABASE_NAME=my_database  # PostgreSQL only (new in 2.2.1)

# Stores (new in 2.2.4) — JSON-encoded copy of the stores block
export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "datajoint-bucket",
    "location": "neuroscience-lab/production",
    "access_key": "AKIA...",
    "secret_key": "wJal..."
  }
}'

# Skip datajoint.json and .secrets/ entirely (new in 2.2.4)
export DJ_IGNORE_CONFIG_FILE=true

New in 2.2.4

DJ_STORES carries a JSON-encoded copy of the stores block. DJ_IGNORE_CONFIG_FILE=true skips datajoint.json, the project .secrets/, and /run/secrets/datajoint/ — useful for env-var-only deployments (Kubernetes pods, the DataJoint platform). See Manage Secrets.

Programmatic Access¶

Reading and Writing Settings¶

Access settings using dot notation on dj.config:

import datajoint as dj

# Read settings
print(dj.config.database.host)
print(dj.config.display.diagram_direction)

# Write settings
dj.config.database.host = "mysql.example.com"
dj.config.display.diagram_direction = "TB"

Temporary Overrides¶

Use dj.config.override() to temporarily change settings within a context:

with dj.config.override(safemode=False):
    # safemode is False here
    (Table & key).delete()
# safemode is restored to original value

Nested settings syntax: Since Python doesn't allow dots in keyword argument names, use double underscores (__) to access nested settings in override():

# These are equivalent:
dj.config.display.diagram_direction = "TB"  # direct assignment

with dj.config.override(display__diagram_direction="TB"):  # in override()
    ...

The double underscore maps to the dot-notation path: display__diagram_direction → display.diagram_direction.

Instance API¶

New in 2.2

dj.Instance provides isolated config and connection for multi-tenant and thread-safe use.

dj.Instance encapsulates a Config and Connection pair, independent of global state:

inst = dj.Instance(host="localhost", user="root", password="secret")
schema = inst.Schema("my_schema")

Parameters¶

Parameter	Type	Default	Description
`host`	str	—	Database hostname (required)
`user`	str	—	Database username (required)
`password`	str	—	Database password (required)
`port`	int	from config	Database port
`use_tls`	bool or dict	`None`	TLS configuration
`**kwargs`	—	—	Config overrides (e.g., `safemode=False`, `database__reconnect=False`)

Attributes and Methods¶

Member	Description
`inst.config`	This Instance's Config object
`inst.connection`	This Instance's Connection object
`inst.Schema(name)`	Create a Schema bound to this Instance
`inst.FreeTable(full_name)`	Create a FreeTable bound to this Instance

See What's New in 2.2 for usage examples and rationale.

Thread-Safe Mode¶

New in 2.2

Thread-safe mode disables global singletons to prevent shared-state bugs in concurrent applications.

Setting	Environment	Default	Description
Thread-safe mode	`DJ_THREAD_SAFE`	`false`	Disable global `dj.config` and `dj.conn()`

Accepted values for DJ_THREAD_SAFE: true, 1, yes (enabled) or false, 0, no (disabled).

When enabled, the following raise ThreadSafetyError:

Any access to dj.config (read or write)
Calling dj.conn()
Calling dj.Schema() without an explicit connection

Only dj.Instance() works in thread-safe mode, enforcing explicit connection management.

import os
os.environ["DJ_THREAD_SAFE"] = "true"

import datajoint as dj

# These all raise ThreadSafetyError:
# dj.config.database.host
# dj.conn()
# dj.Schema("my_db")

# This works:
inst = dj.Instance(host="localhost", user="root", password="secret")
schema = inst.Schema("my_db")

API Reference¶

See Settings API for programmatic access.