
What's New in DataJoint 2.2.4

DataJoint 2.2.4 introduces env-var-only configuration of object stores, publishes the plugin-adapter contract for third-party storage protocols, and extends secrets-file loading to any store attribute.

Upgrading from 2.2.0–2.2.3? There are no breaking changes for projects using datajoint.json or .secrets/. The new env vars are purely additive.

Overview

The DataJoint platform, and many production deployments generally, provision configuration entirely from environment variables: there is no datajoint.json in the container image and no .secrets/ directory on disk. Until 2.2.4, this worked for the database connection (DJ_HOST, DJ_USER, DJ_PASS, …) but not for object stores: per-store credentials had to be configured through datajoint.json or .secrets/stores.<name>.<attr> files.

DataJoint 2.2.4 closes that gap with two new env vars, both purely additive:

  • DJ_STORES: a JSON-encoded copy of the entire stores dict, in the same shape used in datajoint.json.
  • DJ_IGNORE_CONFIG_FILE: a boolean flag that skips both datajoint.json and the secrets directory entirely.

The 2.2.4 release also formalizes the storage-adapter plugin contract (datajoint.storage entry-point group), which had been used internally since 2.0 but lacked a published spec. Third-party packages can now register storage protocols (Databricks Unity Catalog Volumes, custom HTTP-based stores, lab-specific archive systems, …) by subclassing dj.StorageAdapter and declaring an entry point.

DJ_STORES: JSON-encoded stores configuration

New in 2.2.4

DJ_STORES accepts a JSON object identical to the stores block of datajoint.json.

A single env var carries the entire stores dict. The format matches what users already write in datajoint.json, so config can be moved between file and env var by copy-paste, with no per-field naming scheme to learn.

export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "my-bucket",
    "location": "my-project/production",
    "access_key": "AKIA...",
    "secret_key": "wJal..."
  }
}'
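Because the value is plain JSON, it can also be generated rather than written by hand. The following is a minimal sketch, assuming the variable is placed in the process environment before DataJoint loads its configuration. The stores dict reuses the placeholder values from the example above, and nothing here calls DataJoint itself.

```python
import json
import os

# Same shape as the stores block of datajoint.json (placeholder values).
stores = {
    "default": "main",
    "main": {
        "protocol": "s3",
        "endpoint": "s3.amazonaws.com",
        "bucket": "my-bucket",
        "location": "my-project/production",
        "access_key": "AKIA...",
        "secret_key": "wJal...",
    },
}

# Serialize once; the resulting string is what DJ_STORES should contain.
encoded = json.dumps(stores)
os.environ["DJ_STORES"] = encoded

# Round-trip check: the env var decodes back to the original dict.
decoded = json.loads(os.environ["DJ_STORES"])
```

Generating the string with json.dumps avoids the quoting mistakes that are easy to make when embedding JSON in a shell command.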

For plugin-registered adapters, the field names are whatever the adapter defines (token, api_key, workspace_url, etc.):

export DJ_STORES='{
  "uc": {
    "protocol": "databricks",
    "workspace_url": "https://my-workspace.cloud.databricks.com",
    "volume": "main.default.my_volume",
    "token": "dapibd..."
  }
}'

Precedence

DJ_STORES, if set, replaces the stores block loaded from datajoint.json wholesale. The .secrets/ directory is still processed after DJ_STORES and fills in any attributes that DJ_STORES omits. This is useful if a deployment wants to keep some attributes, such as credentials, in secret files while supplying the rest of the store config through the env var.

Source                                    Priority
dj.config["stores"][...] (programmatic)   1 (highest)
DJ_STORES env var                         2
datajoint.json stores block               3
.secrets/stores.<name>.<attr> files       4 (fills missing attrs only)
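The ordering in the table above can be sketched as a small merge function. This is an illustration of the described precedence only, not DataJoint's implementation: resolve_stores and its parameters are hypothetical names, secrets are modelled as a dict keyed by (store, attribute), and only per-store dicts are handled.

```python
import copy
import json


def resolve_stores(file_stores, env_value=None, secrets=None, programmatic=None):
    """Illustrative merge of store config sources (not DataJoint's code)."""
    # DJ_STORES, when set, replaces the file's stores block wholesale.
    if env_value is not None:
        stores = json.loads(env_value)
    else:
        stores = copy.deepcopy(file_stores)

    # Secrets only fill attributes that are still missing.
    for (name, attr), value in (secrets or {}).items():
        stores.setdefault(name, {}).setdefault(attr, value)

    # Programmatic assignments have the highest priority, so they win last.
    for name, attrs in (programmatic or {}).items():
        stores.setdefault(name, {}).update(attrs)
    return stores


file_stores = {"main": {"protocol": "file", "location": "/data"}}
env_value = '{"main": {"protocol": "s3", "bucket": "my-bucket", "access_key": "from-env"}}'
secrets = {("main", "access_key"): "from-file", ("main", "secret_key"): "from-file"}

resolved = resolve_stores(file_stores, env_value, secrets)
```

In this example the file's location attribute does not survive, because the env var replaces the whole block, while secret_key is taken from the secrets source because the env var omitted it.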

Errors

If DJ_STORES is set but unparsable, DataJoint raises ValueError at config load time with the JSON error, rather than failing later with a confusing KeyError from a half-loaded store.

ValueError: DJ_STORES contains invalid JSON: Expecting property name enclosed in double quotes...
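The fail-early behaviour can be reproduced with the standard library. The sketch below assumes the error wraps the json module's own message, as the output above suggests. parse_dj_stores is a hypothetical helper, and the rejection of non-object values is an assumption rather than documented behaviour.

```python
import json


def parse_dj_stores(raw):
    """Parse a DJ_STORES value, failing early with a readable message."""
    try:
        stores = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Mirrors the message format shown above; the exact wording inside
        # DataJoint may differ.
        raise ValueError(f"DJ_STORES contains invalid JSON: {exc}") from exc
    if not isinstance(stores, dict):
        # Assumption: a non-object value (list, string, number) is rejected.
        raise ValueError("DJ_STORES must be a JSON object")
    return stores


try:
    parse_dj_stores("{'main': {'protocol': 's3'}}")  # single quotes are not JSON
except ValueError as err:
    message = str(err)
```

Running a check like this in a deployment pipeline catches quoting mistakes before the container starts.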

DJ_IGNORE_CONFIG_FILE: skip files entirely

New in 2.2.4

Set DJ_IGNORE_CONFIG_FILE=true to skip datajoint.json and the secrets directory.

For env-var-only deployments (Kubernetes pods, Lambda functions, the DataJoint platform), set:

export DJ_IGNORE_CONFIG_FILE=true

When true, DataJoint skips:

  • the recursive parent-directory search for datajoint.json
  • the project .secrets/ directory
  • the Docker/Kubernetes /run/secrets/datajoint/ directory

Only env vars (DJ_HOST, DJ_USER, DJ_PASS, DJ_STORES, …) and defaults apply. This guarantees that no stray file in a container image can leak into config.

Variable               Values                       Default  Description
DJ_IGNORE_CONFIG_FILE  true, 1, yes / false, 0, no  false    Skip file-based config sources
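The accepted spellings in the table can be interpreted with a small helper. This is a sketch only: env_flag is a hypothetical function, and both the case-insensitive matching and the rejection of unrecognized values are assumptions, since the table does not specify them.

```python
import os

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def env_flag(name, default=False, environ=None):
    """Interpret an env var using the truthy/falsy spellings from the table."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()  # assumption: case and whitespace are ignored
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Assumption: unrecognized spellings are rejected rather than guessed.
    raise ValueError(f"{name} must be one of true/1/yes or false/0/no, got {raw!r}")


ignore_files = env_flag(
    "DJ_IGNORE_CONFIG_FILE", environ={"DJ_IGNORE_CONFIG_FILE": "yes"}
)
```

Passing environ explicitly keeps the helper easy to test without touching the real process environment.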

.secrets/stores.<name>.<attr> accepts any attribute

New in 2.2.4

Any .secrets/stores.<name>.<attr> file loads into dj.config["stores"][<name>][<attr>], not just access_key / secret_key.

Previously, only .secrets/stores.<name>.access_key and .secrets/stores.<name>.secret_key were honored. Plugin-registered adapters often need other field names: a Databricks adapter wants a Bearer token, an HTTP adapter might want api_key, etc.

In 2.2.4, any file matching stores.<name>.<attr> under the secrets directory is loaded:

.secrets/
├── stores.uc.token         # Databricks Bearer token
├── stores.main.access_key  # S3 access key
└── stores.main.secret_key  # S3 secret key

Config-file values and DJ_STORES still take precedence: secrets only fill attributes that are not already set.
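The fill-only-missing rule can be sketched as follows. fill_from_secrets is a hypothetical helper rather than a DataJoint function. It assumes store names contain no dots and that surrounding whitespace in a secret file is stripped, and the demo writes its files to a temporary directory instead of a real .secrets/ folder.

```python
import tempfile
from pathlib import Path


def fill_from_secrets(stores, secrets_dir):
    """Fill missing store attributes from stores.<name>.<attr> files."""
    for path in sorted(Path(secrets_dir).glob("stores.*.*")):
        # Assumption: store names contain no dots, so the file name splits
        # into exactly three parts.
        parts = path.name.split(".")
        if len(parts) != 3 or not path.is_file():
            continue
        _, name, attr = parts
        value = path.read_text().strip()  # assumption: trailing newline dropped
        # setdefault keeps any value that a higher-priority source already set.
        stores.setdefault(name, {}).setdefault(attr, value)
    return stores


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "stores.uc.token").write_text("dapi-example\n")
    Path(tmp, "stores.main.access_key").write_text("from-secrets\n")
    stores = {"main": {"protocol": "s3", "access_key": "from-config"}}
    result = fill_from_secrets(stores, tmp)
```

Here the access_key already present in the config is kept, while the token for the uc store is taken from its secret file.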

Storage-adapter plugin contract

New in 2.2.4

The datajoint.storage entry-point group is now part of the public API.

Third-party packages can register additional storage protocols (Databricks Unity Catalog Volumes, custom HTTP-based stores, lab archive systems) by declaring an entry point. The built-in file, s3, gcs, and azure protocols continue to be served by the existing internal dispatch in StorageBackend; migrating them onto the public adapter contract is tracked separately.

# pyproject.toml of a plugin package
[project.entry-points."datajoint.storage"]
databricks = "dj_databricks:DatabricksVolumesAdapter"

Once installed, the protocol name (databricks in the example) is accepted in any stores.<name>.protocol field, and DataJoint will use the adapter to construct the underlying fsspec filesystem.

See Storage Adapter API for the full plugin contract.

See Also