Data Entry¶
This tutorial covers how to manipulate data in DataJoint tables. You'll learn:
- Insert — Adding rows to tables
- Update — Modifying existing rows (for corrections)
- Delete — Removing rows with cascading
- Validation — Checking data before insertion
DataJoint is designed around insert and delete as the primary operations. Updates are intentionally limited to surgical corrections.
import datajoint as dj
import numpy as np
schema = dj.Schema('tutorial_data_entry')
[2026-02-06 11:44:48] DataJoint 2.1.0 connected to datajoint@127.0.0.1:5432
# Define tables for this tutorial
@schema
class Lab(dj.Manual):
    definition = """
    lab_id : varchar(16)
    ---
    lab_name : varchar(100)
    """

@schema
class Subject(dj.Manual):
    definition = """
    subject_id : varchar(16)
    ---
    -> Lab
    species : varchar(50)
    date_of_birth : date
    notes = '' : varchar(1000)
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int32
    ---
    session_date : date
    duration : decimal(4,1)  # minutes
    """

    class Trial(dj.Part):
        definition = """
        -> master
        trial_idx : int32
        ---
        outcome : enum('hit', 'miss', 'false_alarm', 'correct_reject')
        reaction_time : decimal(3,2)  # seconds
        """

@schema
class ProcessedData(dj.Computed):
    definition = """
    -> Session
    ---
    hit_rate : float32
    """

    def make(self, key):
        outcomes = (Session.Trial & key).to_arrays('outcome')
        n_trials = len(outcomes)
        hit_rate = np.sum(outcomes == 'hit') / n_trials if n_trials else 0.0
        self.insert1({**key, 'hit_rate': hit_rate})
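The hit-rate arithmetic inside make() can be checked on a plain numpy array, independent of any database. This is a standalone sketch of the same computation on a hand-built outcomes array:

```python
import numpy as np

# Same computation as make(), on a hand-built outcomes array
outcomes = np.array(['hit', 'miss', 'hit', 'false_alarm', 'hit'])
n_trials = len(outcomes)
hit_rate = np.sum(outcomes == 'hit') / n_trials if n_trials else 0.0
print(hit_rate)  # 3 hits out of 5 trials -> 0.6
```

The `if n_trials else 0.0` guard mirrors make()'s handling of a session with no trials, avoiding a division by zero.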
insert1() — Single Row¶
Use insert1() to add a single row, typically as a dictionary:
# Insert a single row
Lab.insert1({'lab_id': 'tolias', 'lab_name': 'Tolias Lab'})
Subject.insert1({
    'subject_id': 'M001',
    'lab_id': 'tolias',
    'species': 'Mus musculus',
    'date_of_birth': '2026-01-15'
})
Subject()
| subject_id | lab_id | species | date_of_birth | notes |
|---|---|---|---|---|
| M001 | tolias | Mus musculus | 2026-01-15 | |
Total: 1
insert() — Multiple Rows¶
Use insert() to add multiple rows at once. This is more efficient than calling insert1() in a loop.
# Insert multiple rows as a list of dictionaries
Subject.insert([
    {
        'subject_id': 'M002',
        'lab_id': 'tolias',
        'species': 'Mus musculus',
        'date_of_birth': '2026-02-01'
    },
    {
        'subject_id': 'M003',
        'lab_id': 'tolias',
        'species': 'Mus musculus',
        'date_of_birth': '2026-02-15'
    },
])
Subject()
| subject_id | lab_id | species | date_of_birth | notes |
|---|---|---|---|---|
| M001 | tolias | Mus musculus | 2026-01-15 | |
| M002 | tolias | Mus musculus | 2026-02-01 | |
| M003 | tolias | Mus musculus | 2026-02-15 | |
Total: 3
Accepted Input Formats¶
insert() accepts several formats:
| Format | Example |
|---|---|
| List of dicts | [{'id': 1, 'name': 'A'}, ...] |
| pandas DataFrame | pd.DataFrame({'id': [1, 2], 'name': ['A', 'B']}) |
| numpy structured array | np.array([(1, 'A')], dtype=[('id', int), ('name', 'U10')]) |
| QueryExpression | OtherTable.proj(...) (INSERT...SELECT) |
# Insert from pandas DataFrame
import pandas as pd
df = pd.DataFrame({
    'subject_id': ['M004', 'M005'],
    'lab_id': ['tolias', 'tolias'],
    'species': ['Mus musculus', 'Mus musculus'],
    'date_of_birth': ['2026-03-01', '2026-03-15']
})
Subject.insert(df)
print(f"Total subjects: {len(Subject())}")
Total subjects: 5
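The structured-array format from the table above can be built as follows. The subject IDs M010 and M011 are hypothetical, and the final insert call is shown commented since it needs a live connection:

```python
import numpy as np

# Build a structured array whose field names match Subject's attributes
subjects = np.array(
    [('M010', 'tolias', 'Mus musculus', '2026-04-01'),
     ('M011', 'tolias', 'Mus musculus', '2026-04-15')],
    dtype=[('subject_id', 'U16'), ('lab_id', 'U16'),
           ('species', 'U50'), ('date_of_birth', 'U10')],
)
print(subjects['subject_id'])
# Subject.insert(subjects)  # would insert both rows
```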
Handling Duplicates¶
By default, inserting a row with an existing primary key raises an error:
# This will raise an error - duplicate primary key
try:
    Subject.insert1({'subject_id': 'M001', 'lab_id': 'tolias',
                     'species': 'Mus musculus', 'date_of_birth': '2026-01-15'})
except Exception as e:
    print(f"Error: {type(e).__name__}")
    print("Cannot insert duplicate primary key!")
Error: DuplicateError
Cannot insert duplicate primary key!
Use skip_duplicates=True to silently skip rows with existing keys:
# Skip duplicates - existing row unchanged
Subject.insert1(
    {'subject_id': 'M001', 'lab_id': 'tolias', 'species': 'Mus musculus', 'date_of_birth': '2026-01-15'},
    skip_duplicates=True
)
print("Insert completed (duplicate skipped)")
Insert completed (duplicate skipped)
Note: replace=True is also available but has the same caveats as update1()—it bypasses immutability and can break provenance. Use sparingly for corrections only.
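The behavioral difference between skip_duplicates and replace can be pictured with a toy in-memory model. This is an analogy only, not DataJoint's implementation; `table` and `toy_insert1` are invented names:

```python
# Toy model: a dict keyed by primary key stands in for a table
table = {'M001': {'notes': 'original'}}

def toy_insert1(pk, row, skip_duplicates=False, replace=False):
    """Mimic duplicate handling: error, skip, or overwrite."""
    if pk in table:
        if replace:
            table[pk] = row        # overwrite, like replace=True
        elif skip_duplicates:
            return                 # keep existing row, like skip_duplicates=True
        else:
            raise ValueError('DuplicateError')
    else:
        table[pk] = row

toy_insert1('M001', {'notes': 'edited'}, skip_duplicates=True)
print(table['M001']['notes'])  # still 'original' -- duplicate skipped
toy_insert1('M001', {'notes': 'edited'}, replace=True)
print(table['M001']['notes'])  # now 'edited' -- row replaced
```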
Extra Fields¶
By default, inserting a row with fields not in the table raises an error:
try:
    Subject.insert1({'subject_id': 'M006', 'lab_id': 'tolias',
                     'species': 'Mus musculus', 'date_of_birth': '2026-04-01',
                     'unknown_field': 'some value'})  # Unknown field!
except Exception as e:
    print(f"Error: {type(e).__name__}")
    print("Field 'unknown_field' not in table!")
Error: KeyError
Field 'unknown_field' not in table!
# Use ignore_extra_fields=True to silently ignore unknown fields
Subject.insert1(
    {'subject_id': 'M006', 'lab_id': 'tolias', 'species': 'Mus musculus',
     'date_of_birth': '2026-04-01', 'unknown_field': 'ignored'},
    ignore_extra_fields=True
)
print(f"Total subjects: {len(Subject())}")
Total subjects: 6
Master-Part Tables and Transactions¶
Compositional integrity means that a master and all its parts must be inserted (or deleted) as an atomic unit. This ensures downstream computations see complete data.
- Auto-populated tables (Computed, Imported) enforce this automatically: make() runs inside a transaction
- Manual tables require explicit transactions to maintain compositional integrity
Inserting Master with Parts¶
# Use a transaction to ensure master and parts are inserted atomically
with dj.conn().transaction:
    Session.insert1({
        'subject_id': 'M001',
        'session_idx': 1,
        'session_date': '2026-01-06',
        'duration': 45.5
    })
    Session.Trial.insert([
        {'subject_id': 'M001', 'session_idx': 1, 'trial_idx': 1,
         'outcome': 'hit', 'reaction_time': 0.35},
        {'subject_id': 'M001', 'session_idx': 1, 'trial_idx': 2,
         'outcome': 'miss', 'reaction_time': 0.82},
        {'subject_id': 'M001', 'session_idx': 1, 'trial_idx': 3,
         'outcome': 'hit', 'reaction_time': 0.41},
        {'subject_id': 'M001', 'session_idx': 1, 'trial_idx': 4,
         'outcome': 'false_alarm', 'reaction_time': 0.28},
        {'subject_id': 'M001', 'session_idx': 1, 'trial_idx': 5,
         'outcome': 'hit', 'reaction_time': 0.39},
    ])
# Both master and parts committed together, or neither if error occurred
Session.Trial()
| subject_id | session_idx | trial_idx | outcome | reaction_time |
|---|---|---|---|---|
| M001 | 1 | 1 | hit | 0.35 |
| M001 | 1 | 2 | miss | 0.82 |
| M001 | 1 | 3 | hit | 0.41 |
| M001 | 1 | 4 | false_alarm | 0.28 |
| M001 | 1 | 5 | hit | 0.39 |
Total: 5
Update Operations¶
DataJoint provides only update1() for modifying single rows. This is intentional—updates bypass the normal workflow and should be used sparingly for corrective operations.
When to Use Updates¶
Appropriate uses:
- Fixing data entry errors (typos, wrong values)
- Adding notes or metadata after the fact
- Administrative corrections
Inappropriate uses (use delete + insert + populate instead):
- Regular workflow operations
- Changes that should trigger recomputation
# Update a single row - must provide all primary key values
Subject.update1({'subject_id': 'M001', 'notes': 'Primary subject for behavioral study'})
(Subject & {'subject_id': 'M001'}).fetch1()
{'subject_id': 'M001',
'lab_id': 'tolias',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 1, 15),
'notes': 'Primary subject for behavioral study'}
# Update multiple attributes at once
Subject.update1({
    'subject_id': 'M002',
    'notes': 'Control group',
    'species': 'Mus musculus (C57BL/6)'  # More specific
})
(Subject & {'subject_id': 'M002'}).fetch1()
{'subject_id': 'M002',
'lab_id': 'tolias',
'species': 'Mus musculus (C57BL/6)',
'date_of_birth': datetime.date(2026, 2, 1),
'notes': 'Control group'}
Update Requirements¶
- Complete primary key: All PK attributes must be provided
- Exactly one match: Must match exactly one existing row
- No restrictions: Cannot call on a restricted table
# Error: incomplete primary key
try:
    Subject.update1({'notes': 'Missing subject_id!'})
except Exception as e:
    print(f"Error: {type(e).__name__}")
    print("Primary key must be complete")
Error: DataJointError
Primary key must be complete
# Error: cannot update restricted table
try:
    (Subject & {'subject_id': 'M001'}).update1({'subject_id': 'M001', 'notes': 'test'})
except Exception as e:
    print(f"Error: {type(e).__name__}")
    print("Cannot update restricted table")
Error: DataJointError
Cannot update restricted table
Reset to Default¶
Setting an attribute to None resets it to its default value:
# Reset notes to default (empty string)
Subject.update1({'subject_id': 'M003', 'notes': None})
(Subject & {'subject_id': 'M003'}).fetch1()
{'subject_id': 'M003',
'lab_id': 'tolias',
'species': 'Mus musculus',
'date_of_birth': datetime.date(2026, 2, 15),
'notes': ''}
Delete Operations¶
delete() removes rows and cascades the deletion to all dependent tables, including computed results:
# First, let's see what we have
print(f"Sessions: {len(Session())}")
print(f"Trials: {len(Session.Trial())}")
# Populate computed table
ProcessedData.populate()
print(f"ProcessedData: {len(ProcessedData())}")
Sessions: 1
Trials: 5
ProcessedData: 1
# Delete a session - cascades to Trial and ProcessedData
(Session & {'subject_id': 'M001', 'session_idx': 1}).delete(prompt=False)
print("After delete:")
print(f"Sessions: {len(Session())}")
print(f"Trials: {len(Session.Trial())}")
print(f"ProcessedData: {len(ProcessedData())}")
[2026-02-06 11:44:48] Deleting 5 rows from "tutorial_data_entry"."session__trial"
[2026-02-06 11:44:48] Deleting 1 rows from "tutorial_data_entry"."__processed_data"
[2026-02-06 11:44:48] Deleting 1 rows from "tutorial_data_entry"."session"
After delete:
Sessions: 0
Trials: 0
ProcessedData: 0
Prompt Behavior¶
The prompt parameter controls whether delete() asks for confirmation. When prompt=None (default), the behavior is determined by dj.config['safemode']:
# Uses config['safemode'] setting (default)
(Table & condition).delete()
# Explicitly skip confirmation
(Table & condition).delete(prompt=False)
# Explicitly require confirmation
(Table & condition).delete(prompt=True)
# Add more data for demonstration
with dj.conn().transaction:
    Session.insert1({
        'subject_id': 'M002',
        'session_idx': 1,
        'session_date': '2026-01-07',
        'duration': 30.0
    })
    Session.Trial.insert([
        {'subject_id': 'M002', 'session_idx': 1, 'trial_idx': 1,
         'outcome': 'hit', 'reaction_time': 0.40},
        {'subject_id': 'M002', 'session_idx': 1, 'trial_idx': 2,
         'outcome': 'hit', 'reaction_time': 0.38},
    ])
# Delete with prompt=False (no confirmation prompt)
(Session & {'subject_id': 'M002', 'session_idx': 1}).delete(prompt=False)
[2026-02-06 11:44:48] Deleting 2 rows from "tutorial_data_entry"."session__trial"
[2026-02-06 11:44:48] Deleting 1 rows from "tutorial_data_entry"."session"
1
The Recomputation Pattern¶
When source data needs to change, the correct pattern is delete → insert → populate. This ensures all derived data remains consistent:
# Add a session with trials (using transaction for compositional integrity)
with dj.conn().transaction:
    Session.insert1({
        'subject_id': 'M003',
        'session_idx': 1,
        'session_date': '2026-01-08',
        'duration': 40.0
    })
    Session.Trial.insert([
        {'subject_id': 'M003', 'session_idx': 1, 'trial_idx': 1,
         'outcome': 'hit', 'reaction_time': 0.35},
        {'subject_id': 'M003', 'session_idx': 1, 'trial_idx': 2,
         'outcome': 'miss', 'reaction_time': 0.50},
    ])
# Compute results
ProcessedData.populate()
print("Before correction:", ProcessedData.fetch1())
Before correction: {'subject_id': 'M003', 'session_idx': 1, 'hit_rate': 0.5}
# Suppose we discovered trial 2 was actually a 'hit' not 'miss'
# WRONG: Updating the trial would leave ProcessedData stale!
# Session.Trial.update1({...}) # DON'T DO THIS
# CORRECT: Delete, reinsert, recompute
key = {'subject_id': 'M003', 'session_idx': 1}
# 1. Delete cascades to ProcessedData
(Session & key).delete(prompt=False)
# 2. Reinsert with corrected data (using transaction)
with dj.conn().transaction:
    Session.insert1({**key, 'session_date': '2026-01-08', 'duration': 40.0})
    Session.Trial.insert([
        {**key, 'trial_idx': 1, 'outcome': 'hit', 'reaction_time': 0.35},
        {**key, 'trial_idx': 2, 'outcome': 'hit', 'reaction_time': 0.50},
    ])
# 3. Recompute
ProcessedData.populate()
print("After correction:", ProcessedData.fetch1())
[2026-02-06 11:44:48] Deleting 2 rows from "tutorial_data_entry"."session__trial"
[2026-02-06 11:44:48] Deleting 1 rows from "tutorial_data_entry"."__processed_data"
[2026-02-06 11:44:48] Deleting 1 rows from "tutorial_data_entry"."session"
After correction: {'subject_id': 'M003', 'session_idx': 1, 'hit_rate': 1.0}
Validation¶
Use validate() to check data before insertion:
# Validate rows before inserting
rows_to_insert = [
    {'subject_id': 'M007', 'lab_id': 'tolias', 'species': 'Mus musculus', 'date_of_birth': '2026-05-01'},
    {'subject_id': 'M008', 'lab_id': 'tolias', 'species': 'Mus musculus', 'date_of_birth': '2026-05-15'},
]
result = Subject.validate(rows_to_insert)
if result:
    Subject.insert(rows_to_insert)
    print(f"Inserted {len(rows_to_insert)} rows")
else:
    print("Validation failed:")
    print(result.summary())
Inserted 2 rows
# Example of validation failure
bad_rows = [
    {'subject_id': 'M009', 'species': 'Mus musculus', 'date_of_birth': '2026-05-20'},  # Missing lab_id!
]
result = Subject.validate(bad_rows)
if not result:
    print("Validation failed!")
    for error in result.errors:
        print(f"  {error}")
Validation failed!
  (0, 'lab_id', "Required field 'lab_id' is missing")
Transactions¶
Single operations are atomic by default. Use explicit transactions for:
- Master-part inserts — Maintain compositional integrity
- Multi-table operations — All succeed or all fail
- Complex workflows — Coordinate related changes
# Atomic transaction - all inserts succeed or none do
with dj.conn().transaction:
    Session.insert1({
        'subject_id': 'M007',
        'session_idx': 1,
        'session_date': '2026-01-10',
        'duration': 35.0
    })
    Session.Trial.insert([
        {'subject_id': 'M007', 'session_idx': 1, 'trial_idx': 1,
         'outcome': 'hit', 'reaction_time': 0.33},
        {'subject_id': 'M007', 'session_idx': 1, 'trial_idx': 2,
         'outcome': 'miss', 'reaction_time': 0.45},
    ])
print(f"Session inserted with {len(Session.Trial & {'subject_id': 'M007'})} trials")
Session inserted with 2 trials
Best Practices¶
1. Prefer Insert/Delete Over Update¶
When source data changes, delete and reinsert rather than updating. Updates and replace=True bypass immutability and break provenance:
# Good: Delete and reinsert
(Trial & key).delete(prompt=False)
Trial.insert1(corrected_trial)
DerivedTable.populate()
# Avoid: Update that leaves derived data stale
Trial.update1({**key, 'value': new_value})
2. Use Transactions for Master-Part Inserts¶
# Ensures compositional integrity
with dj.conn().transaction:
    Session.insert1(session_data)
    Session.Trial.insert(trials)
3. Batch Inserts for Performance¶
# Good: Single insert call
Subject.insert(all_rows)
# Slow: Loop of insert1 calls
for row in all_rows:
    Subject.insert1(row)  # Creates many transactions
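For very large loads, a middle ground is to batch rows into fixed-size chunks so each insert() call stays a manageable size. Here `chunked` is a hypothetical helper, not part of DataJoint:

```python
def chunked(rows, size=1000):
    """Yield successive fixed-size batches from a list of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

batches = list(chunked(list(range(10)), size=4))
print([len(b) for b in batches])  # [4, 4, 2]
# Hypothetical usage with a large row list:
# for batch in chunked(all_rows, size=5000):
#     Subject.insert(batch)
```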
4. Validate Before Insert¶
result = Subject.validate(rows)
if not result:
    raise ValueError(result.summary())
Subject.insert(rows)
5. Configure Safe Mode for Production¶
# In production scripts, explicitly control prompt behavior
(Subject & condition).delete(prompt=False) # No confirmation
# Or configure globally via settings
dj.config['safemode'] = True # Require confirmation by default
Quick Reference¶
| Operation | Method | Use Case |
|---|---|---|
| Insert one | insert1(row) | Adding single entity |
| Insert many | insert(rows) | Bulk data loading |
| Update one | update1(row) | Surgical corrections only |
| Delete | delete() | Removing entities (cascades) |
| Delete quick | delete_quick() | Internal cleanup (no cascade) |
| Validate | validate(rows) | Pre-insert check |
See the Data Manipulation Specification for complete details.
Next Steps¶
- Queries — Filtering, joining, and projecting data
- Computation — Building computational pipelines
# Cleanup
schema.drop(prompt=False)