Skip to content

Virtual Schemas Specification

Overview

Virtual schemas provide a way to access existing database schemas without the original Python source code. This is useful for:

  • Exploring schemas created by other users
  • Accessing legacy schemas
  • Quick data inspection and queries
  • Schema migration and maintenance

1. Schema-Module Convention

DataJoint maintains a 1:1 mapping between database schemas and Python modules:

Database Python
Schema Module
Table Class

This convention reduces conceptual complexity: modules are schemas, classes are tables.

When you define tables in Python:

# lab.py module
import datajoint as dj
schema = dj.Schema('lab')

@schema
class Subject(dj.Manual):  # Subject class โ†’ `lab`.`subject` table
    ...

@schema
class Session(dj.Manual):  # Session class โ†’ `lab`.`session` table
    ...

Virtual schemas recreate this mapping when the Python source isn't available:

# Creates module-like object with table classes
lab = dj.virtual_schema('lab')
lab.Subject  # Subject class for `lab`.`subject`
lab.Session  # Session class for `lab`.`session`


2. Schema Introspection API

2.1 Direct Table Access

Access individual tables by name using bracket notation:

schema = dj.Schema('my_schema')

# By CamelCase class name
experiment = schema['Experiment']

# By snake_case SQL name
experiment = schema['experiment']

# Query the table
experiment.fetch()

2.2 get_table() Method

Explicit method for table access:

table = schema.get_table('Experiment')
table = schema.get_table('experiment')  # also works

Parameters: - name (str): Table name in CamelCase or snake_case

Returns: FreeTable instance

Raises: DataJointError if table doesn't exist

2.3 Iteration

Iterate over all tables in dependency order:

for table in schema:
    print(table.full_table_name, len(table))

Tables are yielded as FreeTable instances in topological order (dependencies before dependents).

2.4 Containment Check

Check if a table exists:

if 'Experiment' in schema:
    print("Table exists")

if 'nonexistent' not in schema:
    print("Table doesn't exist")

3. Virtual Schema Function

3.1 dj.virtual_schema()

The recommended way to access existing schemas as modules:

lab = dj.virtual_schema('my_lab_schema')

# Access tables as attributes (classes)
lab.Subject.fetch()
lab.Session & "subject_id='M001'"

# Full query algebra supported
(lab.Session * lab.Subject).fetch()

This maintains the module-class convention: lab behaves like a Python module with table classes as attributes.

Parameters:

Parameter Type Default Description
schema_name str required Database schema name
connection Connection None Database connection (uses default)
create_schema bool False Create schema if missing
create_tables bool False Allow new table declarations
add_objects dict None Additional objects for namespace

Returns: VirtualModule instance

3.2 VirtualModule Class

The underlying class (prefer virtual_schema() function):

module = dj.VirtualModule('lab', 'my_lab_schema')
module.Subject.fetch()

The first argument is the module display name, second is the schema name.

3.3 Accessing the Schema Object

Virtual modules expose the underlying Schema:

lab = dj.virtual_schema('my_lab_schema')
lab.schema.database  # 'my_lab_schema'
lab.schema.list_tables()  # ['subject', 'session', ...]

4. Table Class Generation

4.1 make_classes()

Create Python classes for all tables in a schema:

schema = dj.Schema('existing_schema')
schema.make_classes()

# Now table classes are available in local namespace
Subject.fetch()
Session & "date > '2024-01-01'"

Parameters: - into (dict, optional): Namespace to populate. Defaults to caller's locals.

4.2 Generated Class Types

Classes are created based on table naming conventions:

Table Name Pattern Generated Class
subject dj.Manual
#lookup_table dj.Lookup
_imported_table dj.Imported
__computed_table dj.Computed
master__part dj.Part

4.3 Part Table Handling

Part tables are attached to their master classes:

lab = dj.virtual_schema('my_lab')

# Part tables are nested attributes
lab.Session.Trial.fetch()  # Session.Trial is a Part table

5. Use Cases

5.1 Data Exploration

# Quick exploration of unknown schema
lab = dj.virtual_schema('collaborator_lab')

# List all tables
print(lab.schema.list_tables())

# Check table structure
print(lab.Subject.describe())

# Preview data
lab.Subject.fetch(limit=5)

5.2 Cross-Schema Queries

my_schema = dj.Schema('my_analysis')
external = dj.virtual_schema('external_lab')

# Reference external tables in queries
@my_schema
class Analysis(dj.Computed):
    definition = """
    -> external.Session
    ---
    result : float
    """

5.3 Schema Migration

old = dj.virtual_schema('old_schema')
new = dj.Schema('new_schema')

# Copy data in topological order (iteration yields dependencies first)
for table in old:
    new_table = new.get_table(table.table_name)
    # Server-side INSERT...SELECT (no client-side data transfer)
    new_table.insert(table)

5.4 Garbage Collection

from datajoint.gc import scan_hash_references

schema = dj.Schema('my_schema')

# Scan all tables for hash references
refs = scan_hash_references(schema, verbose=True)

6. Comparison of Methods

Method Use Case Returns
schema['Name'] Quick single table access FreeTable
schema.get_table('name') Explicit table access FreeTable
for t in schema Iterate all tables FreeTable generator
'Name' in schema Check existence bool
dj.virtual_schema(name) Module-like access VirtualModule
make_classes() Populate namespace None (side effect)

7. Implementation Reference

File Purpose
schemas.py Schema, VirtualModule, virtual_schema
table.py FreeTable class
gc.py Uses get_table() for scanning

8. Error Messages

Error Cause Solution
"Table does not exist" get_table() on missing table Check table name spelling
"Schema must be activated" Operations on unactivated schema Call schema.activate(name)
"Schema does not exist" Schema name not in database Check schema name, create if needed