# Manage volumetric fluorescence microscopy experiments with DataJoint Elements
This notebook will walk through processing volumetric two-photon calcium imaging
data with the DataJoint Workflow for volumetric image processing. The workflow
currently supports volumetric data collected
from ScanImage.
Please note that uploading data to BossDB via this pipeline requires the following:

- An API token, which can be obtained by creating an account at api.bossdb.io. You will also need resource manager permissions from the team at BossDB.
- A local setup of the intern REST API using the instructions below:
  - Create a new folder .intern in your root directory.
  - Create a configuration file intern.cfg within the .intern folder. The path to this file should be ~/.intern/intern.cfg.
  - The intern.cfg file should contain the following exactly as shown below:

# ~/.intern/intern.cfg
[Default]
protocol = https
host = api.bossdb.io
token = <your-api-key>
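As an optional sanity check (a sketch, assuming the intern package is installed; BossRemote and list_collections are part of intern's public API, but check the intern documentation for your version), you can confirm that intern picks up your configuration. BossRemote() reads ~/.intern/intern.cfg by default, so this fails fast if the file or token is misconfigured:

# Optional sanity check: BossRemote() reads ~/.intern/intern.cfg by default,
# so constructing it and listing collections fails fast on a bad token.
from intern.remote.boss import BossRemote

rmt = BossRemote()
print(rmt.list_collections())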
In this notebook, we will explain the following concepts as they relate to this workflow:

- What is an Element versus a Workflow?
- Plot the workflow with dj.Diagram
- Insert data into tables
- Query table contents
- Fetch table contents
- Run the workflow for your experiments
For detailed documentation and tutorials on general DataJoint principles that support collaboration, automation, reproducibility, and visualizations:

- DataJoint Interactive Tutorials - Fundamentals including table tiers, query operations, fetch operations, automated computations with the make function, etc.
- DataJoint Core - Documentation - Relational data model principles
- DataJoint Element for Volumetric Calcium Imaging - Documentation
Let's start by importing the packages necessary to run this workflow.
import os

# If this notebook is run from the notebooks/ directory, move up to the
# repository root so relative paths resolve correctly.
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
import datajoint as dj
import datetime
## The DataJoint Workflow for Volumetric Calcium Imaging is assembled from 5 DataJoint Elements
| Element | Source Code | Documentation | Description |
|---|---|---|---|
| Element Lab | Link | Link | Lab management related information, such as Lab, User, Project, Protocol, Source. |
| Element Animal | Link | Link | General animal metadata and surgery information. |
| Element Session | Link | Link | General information of experimental sessions. |
| Element Calcium Imaging | Link | Link | General information about the calcium imaging scan. |
| Element ZStack | Link | Link | Volumetric data segmentation and export. |
Each workflow is composed of multiple Elements. Each Element contains 1 or more modules, and each module declares its own schema in the database.
The Elements are imported within the workflow_zstack.pipeline script.
When the modules are imported for the first time, the schemas and tables are created in the database. Once created, importing the modules again will not recreate them; the existing schemas and tables are simply accessed.
The schema diagram (shown below) is a good reference for understanding the order of the tables within the workflow.
Let's activate the Elements.
from workflow_zstack.pipeline import (
    lab,
    subject,
    session,
    scan,
    volume,
    volume_matching,
    bossdb,
)
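As an optional check, you can confirm that the schemas now exist in the database with dj.list_schemas(), a built-in DataJoint utility:

# List the schemas the connected database user can access; the newly
# activated schemas for this workflow should appear here.
dj.list_schemas()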
## Diagram
We can plot the diagram of tables within multiple schemas and their dependencies using dj.Diagram(). For details, see the documentation.
(
    dj.Diagram(subject.Subject)
    + dj.Diagram(session.Session)
    + dj.Diagram(scan.Scan)
    + dj.Diagram(volume)
    + dj.Diagram(bossdb)
)
While the diagram above seems complex at first, it becomes clearer when approached as a hierarchy of tables that defines the order in which the workflow expects to receive data:

- The tables higher up in the diagram, such as subject.Subject(), should be the first to receive data.
- Data is manually entered into the green, rectangular tables with the insert1() method.
- Tables connected by a solid line depend on entries in the table above them.
There are 5 table tiers in DataJoint, several of which appear in the diagram above.
| Table tier | Color and shape | Description |
|---|---|---|
| Manual table | Green box | Data entered from outside the pipeline, either by hand or with external helper scripts. |
| Lookup table | Gray box | Small tables containing general facts and settings of the data pipeline; not specific to any experiment or dataset. |
| Imported table | Blue oval | Data ingested automatically inside the pipeline but requiring access to data outside the pipeline. |
| Computed table | Red circle | Data computed automatically entirely inside the pipeline. |
| Part table | Plain text | Part tables share the same tier as their master table. |
## Insert entries into manual tables
In this section, we will insert metadata about an animal subject, an experimental session, and an imaging scan.
Let's start with the first schema and table in the schema diagram (i.e. subject.Subject table).
Each module (e.g. subject) contains a schema object that enables interaction with the schema in the database.
subject.schema
Each table class in the module corresponds to a table in the database.
subject.Subject()
We can view the table dependencies and the attributes we need to insert by using describe() and heading. The describe() method displays the table definition with its foreign key references, while the heading property displays all attributes of the table. These are particularly useful if you are new to DataJoint Elements and are unsure of the attributes required for each table.
print(subject.Subject.describe())
subject.Subject.heading
We will insert data into the subject.Subject table.
subject.Subject.insert1(
    dict(
        subject="subject1",
        sex="M",
        subject_birth_date="2023-01-01",
        subject_description="Cellpose segmentation of volumetric data.",
    )
)
subject.Subject()
Let's continue inserting in the other manual tables. The Session table is next.
print(session.Session.describe())
session.Session.heading
The cells above show the dependencies and attributes of the session.Session table.

Notice that describe shows the dependencies of the table on upstream tables (i.e., foreign key references); the Session table depends on the upstream Subject table. In contrast, heading lists all attributes of the Session table, regardless of whether they are declared in an upstream table.
session_key = dict(
    subject="subject1",
    session_id=0,
)

session.Session.insert1(
    dict(
        session_key,
        session_datetime=datetime.datetime.now(),
    ),
)
session.Session()
The SessionDirectory table locates the relevant data files in a directory path
relative to the root directory defined in your dj.config["custom"]. More
information about dj.config is provided at the end of this tutorial and is
particularly useful for local deployments of this workflow.
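For example (a minimal sketch; the key name imaging_root_data_dir is an assumption here — check the workflow documentation for the exact key your deployment expects), the root directory can be set like this:

# Hypothetical example: point the workflow at your raw-data root directory.
# SessionDirectory paths are interpreted relative to this directory.
dj.config["custom"]["imaging_root_data_dir"] = "/path/to/your/data"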
print(session.SessionDirectory.describe())
session.SessionDirectory.heading
session.SessionDirectory.insert1(
    dict(session_key, session_dir="subject1/session1"),
    skip_duplicates=True,
)
session.SessionDirectory()
Each volume requires an entry in the Scan table from
element-calcium-imaging. Here, we'll use describe and heading for the Scan
table and insert an entry for the current session.
print(scan.Scan.describe())
scan.Scan.heading
scan.Scan.insert1(
    dict(
        session_key,
        scan_id=0,
        acq_software="ScanImage",
    ),
    skip_duplicates=True,
)
scan_key = (scan.Scan & "subject = 'subject1'").fetch1("KEY")
## Populate

### Automatically populate tables
volume.Volume is the first table in the pipeline that can be populated automatically.
If a table contains a part table, the part table is also populated during the
populate() call. populate() takes several arguments, including a restriction
such as a session key. The restriction limits populate() to the session of
interest rather than all possible sessions, which could be time-intensive for
databases with many entries.
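For illustration (a sketch; both calls use the standard populate() API), the difference looks like this:

# Unrestricted: process every pending entry (can be slow on large databases).
# volume.Volume.populate(display_progress=True)

# Restricted: process only the scan/session of interest.
# volume.Volume.populate(scan_key, display_progress=True)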
Let's view the volume.Volume table and populate it using the populate() call.
volume.Volume.heading
volume.Volume()
volume.Volume.populate(scan_key, display_progress=True)
Let's view the information that was entered into this table:
volume.Volume()
We're almost ready to perform volume segmentation with cellpose. An important
step before processing is managing the parameters to be used in that step. To
do so, we will insert the parameters required by cellpose into the
SegmentationParamSet table. This table keeps track of all combinations of your
image processing parameters; you can choose which parameter set is used during
processing in a later step.
Let's view the attributes and insert data into volume.SegmentationParamSet.
volume.SegmentationParamSet.heading
volume.SegmentationParamSet.insert_new_params(
    segmentation_method="cellpose",
    paramset_idx=1,
    params=dict(
        diameter=None,
        min_size=2,
        do_3d=False,
        anisotropy=0.5,
        model_type="nuclei",
        channels=[[0, 0]],
        z_axis=0,
    ),
)
DataJoint uses a SegmentationTask table to manage which Volume and
SegmentationParamSet should be used during processing. This table defines
several key aspects of downstream processing. Let's view its attributes to get
a better understanding.
print(volume.SegmentationTask.describe())
volume.SegmentationTask.heading
The SegmentationTask table contains two important attributes:

- paramset_idx
- task_mode

The paramset_idx attribute tracks your segmentation parameter sets. You can
choose the parameter set with which you want to run segmentation based on this
attribute; it tells the Segmentation table which set of parameters to use in a
given populate() call.
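For example, you can preview the parameters stored under a given paramset_idx with a simple restriction:

# Show the stored parameter set with paramset_idx = 1.
volume.SegmentationParamSet & "paramset_idx = 1"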
The task_mode attribute can be set to either load or trigger. When set to trigger, the
segmentation step will run cellpose on the raw data.
volume.SegmentationTask.insert1(
    dict(
        scan_key,
        paramset_idx=1,
        task_mode="trigger",
    ),
)
For now, Element ZStack only supports triggering cellpose. Now we can populate
the Segmentation table. This step may take several hours, depending on your
computer's capabilities.
volume.Segmentation.populate(scan_key, display_progress=True)
Volumetric data uploaded to BossDB requires information about voxel size. The
DataJoint table volume.VoxelSize can be used to insert this information for a
given dataset.
volume.VoxelSize.heading
volume.VoxelSize.insert1(dict(scan_key, width=0.001, height=0.001, depth=0.001))
Let's define an upload_key to automatically upload the volume to BossDB via the
bossdb schema. The upload_key combines information about the current scan from
scan.Scan and the paramset_idx from SegmentationParamSet.
upload_key = dict(scan_key, paramset_idx=1)
The first table in this schema is VolumeUploadTask. Let's define the upload task by naming the collection, experiment,
and channel where the data should be uploaded.
print(bossdb.VolumeUploadTask.describe())
bossdb.VolumeUploadTask.heading
collection = "dataJointTestUpload"
experiment = "CalciumImaging"
channel = "volume-image"
bossdb.VolumeUploadTask.insert1(
    dict(
        upload_key,
        collection_name=collection,
        experiment_name=experiment,
        channel_name=channel,
    )
)
Finally, we can upload data.
As a reminder, uploading data to BossDB via this pipeline requires an API token which can be obtained by creating an account at api.bossdb.io. You will also need resource manager permissions from the team at BossDB.
bossdb.VolumeUpload.populate(upload_key)
To visualize the volumetric data, fetch the Neuroglancer URL below and paste it into your browser.
(bossdb.VolumeUpload.WebAddress & upload_key & "upload_type='image+annotation'").fetch1(
    "web_address"
)
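Alternatively (a small convenience sketch using only the Python standard library), you can open the fetched URL directly:

import webbrowser

# Fetch the Neuroglancer link and open it in the default browser.
url = (
    bossdb.VolumeUpload.WebAddress & upload_key & "upload_type='image+annotation'"
).fetch1("web_address")
webbrowser.open(url)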