Deployment Operations Specification¶
This document specifies the datajoint.deploy module: idempotent, re-runnable operations that configure an existing schema for its deployment environment (CDC tools, replication, role grants, performance tuning).
For one-shot schema-evolution operations (column migrations, lineage repair, retroactive job-metadata columns), see datajoint.migrate (referenced in the Data Manipulation Specification).
New in 2.3
The datajoint.deploy module is introduced in DataJoint 2.3, beginning with set_replica_identity for PostgreSQL CDC integration.
Scope: migration vs. deployment¶
DataJoint exposes two categories of operational helpers. The distinction is load-bearing: applying the wrong one at the wrong time produces inconsistent state.
| | datajoint.migrate | datajoint.deploy |
|---|---|---|
| Purpose | Schema/state evolution, fixing legacy | Configure an environment for a consumer's requirements |
| Cadence | One-shot transitions | Idempotent, re-runnable in deploy hooks |
| Trigger | Schema definition changed, or repair needed | Environment changes (new CDC consumer, replication topology) |
| Examples | migrate_columns, add_job_metadata_columns, rebuild_lineage | set_replica_identity |
A deployment operation must be safe to call repeatedly without accumulating side effects: re-running it brings the environment to the same end state and is a no-op when already there.
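The re-run contract can be made concrete with a small, self-contained sketch. Everything here is hypothetical and for illustration only: the in-memory `State` mapping stands in for a database catalog, and `set_all_full`, `append_audit_row`, and `is_idempotent` are not part of datajoint.deploy.

```python
import copy
from typing import Callable, Dict

# Hypothetical in-memory stand-in for a catalog: table name -> replica identity mode.
State = Dict[str, str]


def set_all_full(state: State) -> State:
    """Idempotent: sets an absolute end state, regardless of the starting point."""
    return {table: "full" for table in state}


def append_audit_row(state: State) -> State:
    """Not idempotent: every call adds another entry."""
    new_state = dict(state)
    new_state[f"audit_{len(state)}"] = "default"
    return new_state


def is_idempotent(op: Callable[[State], State], state: State) -> bool:
    """True when applying op twice gives the same result as applying it once."""
    once = op(copy.deepcopy(state))
    twice = op(copy.deepcopy(once))
    return once == twice


start = {"subject": "default", "session": "full"}
set_all_full_ok = is_idempotent(set_all_full, start)
append_audit_row_ok = is_idempotent(append_audit_row, start)
```

An operation that sets an absolute target state passes the check; one that accumulates entries on each call does not, which is the kind of helper that would not belong in a deploy hook.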
set_replica_identity¶
Apply ALTER TABLE ... REPLICA IDENTITY DEFAULT|FULL to every user table in a schema, or to a single table, on PostgreSQL.
Signature¶
    def set_replica_identity(
        target: Schema | Table,
        mode: Literal["default", "full"] = "full",
        dry_run: bool = True,
    ) -> dict
Parameters¶
| Name | Type | Default | Description |
|---|---|---|---|
| target | Schema or Table (class or instance) | (required) | Schema (all user tables) or a single table. |
| mode | str | "full" | "default" (PK only) or "full" (entire old row). |
| dry_run | bool | True | If True, collect DDL but do not execute. |
Return value¶
A dict:
| Key | Type | Description |
|---|---|---|
| tables_analyzed | int | Number of tables considered. |
| tables_modified | int | Tables on which the ALTER ran. 0 when dry_run=True. |
| ddl | list[str] | DDL statements that were (or would be) executed. |
Errors¶
| Condition | Behavior |
|---|---|
| Connection's adapter is not PostgreSQL | DataJointError: "set_replica_identity is PostgreSQL-only; …" |
| mode is not "default" or "full" | DataJointError: "mode must be 'default' or 'full'; …" |
| target is not a Schema or Table | DataJointError: "target must be a Schema or Table class/instance; …" |
Behavior¶
For each user table in the target (excluding ~-prefixed hidden tables), the function builds ALTER TABLE "{schema}"."{table}" REPLICA IDENTITY {MODE} via the PostgreSQL adapter's replica_identity_ddl() and either records it (dry-run) or executes it on the connection.
Both default and full produce explicit ALTER statements. default is not treated as a no-op; it actively resets the table to PostgreSQL's default, which is the right semantics when reverting from FULL.
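One way to confirm which mode a table ended up in is to read `pg_class.relreplident` from the PostgreSQL system catalog. The sketch below is not part of datajoint.deploy: `REPLICA_IDENTITY_QUERY` and `decode_replica_identity` are hypothetical names, the `%s` placeholder assumes a psycopg-style driver, and the query covers ordinary tables only (`relkind = 'r'`). The decoding step runs on sample rows so the example works without a database.

```python
from typing import Dict, Iterable, Tuple

# Lists each ordinary table in a schema with its replica identity code.
# The %s placeholder assumes a psycopg-style driver (an assumption).
REPLICA_IDENTITY_QUERY = """
SELECT c.relname, c.relreplident
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = %s AND c.relkind = 'r'
ORDER BY c.relname
"""

# pg_class.relreplident codes from the PostgreSQL catalog reference.
RELREPLIDENT_NAMES = {"d": "default", "n": "nothing", "f": "full", "i": "index"}


def decode_replica_identity(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map (table name, relreplident code) rows to readable mode names."""
    return {name: RELREPLIDENT_NAMES[code] for name, code in rows}


# Sample rows shaped like the query result; real rows would come from a cursor.
sample_rows = [("session", "f"), ("subject", "d")]
decoded = decode_replica_identity(sample_rows)
```

After reverting with mode="default", a table that previously decoded as "full" would be expected to decode as "default".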
The underlying ALTER is metadata-only, instant, and idempotent at the PostgreSQL layer (re-applying the same mode is a no-op at the storage layer).
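The flow described in this section can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `sketch_set_replica_identity`, the plain list of table names, the `~lineage` table name, and the `execute` callable are hypothetical stand-ins for the Schema/Table target, the adapter, and the connection. It also assumes that tables_analyzed counts user tables only.

```python
from typing import Callable, Iterable, List


def replica_identity_ddl(schema: str, table: str, mode: str) -> str:
    """Build the ALTER statement for one table (stand-in for the adapter method)."""
    if mode not in ("default", "full"):
        raise ValueError(f"mode must be 'default' or 'full'; got {mode!r}")
    return f'ALTER TABLE "{schema}"."{table}" REPLICA IDENTITY {mode.upper()}'


def sketch_set_replica_identity(
    schema: str,
    tables: Iterable[str],
    mode: str = "full",
    dry_run: bool = True,
    execute: Callable[[str], None] = lambda sql: None,
) -> dict:
    """Mimic the documented flow: skip ~-prefixed tables, then record or run the DDL."""
    user_tables = [t for t in tables if not t.startswith("~")]
    ddl = [replica_identity_ddl(schema, t, mode) for t in user_tables]
    modified = 0
    if not dry_run:
        for statement in ddl:
            execute(statement)  # a real connection would run the statement here
            modified += 1
    return {"tables_analyzed": len(user_tables), "tables_modified": modified, "ddl": ddl}


executed: List[str] = []
table_names = ["subject", "~lineage", "session"]
preview = sketch_set_replica_identity("lab", table_names)
applied = sketch_set_replica_identity(
    "lab", table_names, dry_run=False, execute=executed.append
)
```

The dry run and the real run produce the same `ddl` list; only `tables_modified` and the side effect on the connection differ.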
Design rationale¶
Three structural decisions distinguish dj.deploy from alternatives that were considered and rejected. Each is informed by the failure modes the alternative would have produced.
1. Migration-only, not auto-emit on declare()¶
Issue #1447 originally proposed two mechanisms: a database.replica_identity config flag applied automatically during declare(), plus a utility for existing tables. We collapsed to migration-only. Two mechanisms would produce mixed state: a deployment with the config set, applied mid-cycle, would have new tables at FULL and old tables at DEFAULT until someone remembered to run the migration. One mechanism is the only path that converges.
2. Not in dj.migrate¶
dj.migrate covers one-shot schema-evolution operations: fix lineage, add job-metadata columns, transform external store layouts. set_replica_identity is not a one-shot transition: a fresh declare in a staging environment may need it re-applied; deploy hooks may run it on every release. The cadence and trigger differ, and conflating them in one module obscures the difference.
3. New module for an emerging category¶
set_replica_identity is the first of a category. Plausible siblings, as needs arise:
- Publication membership for PostgreSQL logical replication (CREATE PUBLICATION … FOR TABLE …).
- Maintenance: vacuum_analyze, reindex, table-level autovacuum parameters.
- Role/grant management for shared environments.
Creating dj.deploy now, with one inhabitant, gives those future helpers a clear home and keeps dj.migrate focused. The cost is one file; the alternative is an indefinite period of "where do I put this?" for every operational helper.
Idempotency and re-running¶
Every function in datajoint.deploy must be safe to re-run. set_replica_identity satisfies this because:
- The DDL is generated freshly each call.
- The PostgreSQL ALTER is metadata-only and applying the same mode again is a no-op at the storage layer.
- The dry-run path produces a complete preview without executing.
Deploy hooks may call set_replica_identity(schema, mode="full", dry_run=False) on every release without accumulating side effects.
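A deploy hook built on this contract might preview the DDL before applying it. The sketch below is an assumption about how such a hook could look, not a documented recipe: `replica_identity_hook` is a hypothetical name, and `fake_set_replica_identity` is a stand-in that follows the documented signature and return keys so the example runs without a database. In a real hook, the function passed in would be the one from datajoint.deploy.

```python
from typing import Any, Callable, List


def replica_identity_hook(
    set_replica_identity: Callable[..., dict],
    schema: Any,
    mode: str = "full",
) -> dict:
    """Preview the DDL with a dry run, log it, then apply it."""
    preview = set_replica_identity(schema, mode=mode, dry_run=True)
    for statement in preview["ddl"]:
        print(f"[deploy] {statement}")
    return set_replica_identity(schema, mode=mode, dry_run=False)


calls: List[bool] = []  # records the dry_run flag of each call


def fake_set_replica_identity(target, mode="full", dry_run=True) -> dict:
    """Stand-in with the documented signature; does not touch a database."""
    calls.append(dry_run)
    ddl = [f'ALTER TABLE "{target}"."session" REPLICA IDENTITY {mode.upper()}']
    return {"tables_analyzed": 1, "tables_modified": 0 if dry_run else 1, "ddl": ddl}


# Two consecutive "releases": the second run reports the same result as the first.
first = replica_identity_hook(fake_set_replica_identity, "lab")
second = replica_identity_hook(fake_set_replica_identity, "lab")
```

Passing the function in as an argument keeps the hook testable with a stand-in and leaves the choice of schema and mode to the release script.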
Related¶
- Explanation: PostgreSQL CDC and Replica Identity
- Data Manipulation Specification: Data Manipulation (insert / update / delete; not deployment-time)
- PostgreSQL: Logical Replication (Replica Identity)