# Manage volumetric fluorescence microscopy experiments with DataJoint Elements
This notebook will walk through processing volumetric two-photon calcium imaging
data with the DataJoint Workflow for volumetric image processing. The workflow
currently supports volumetric data collected
from ScanImage.
Please note that uploading data to BossDB via this pipeline requires the following:

- An API token, which can be obtained by creating an account at api.bossdb.io. You will also need resource manager permissions from the team at BossDB.
- A local setup of the intern REST API using the instructions below:
  - Create a new folder .intern in your root directory.
  - Create a configuration file intern.cfg within the .intern folder. The path to this file should be ~/.intern/intern.cfg.
  - The intern.cfg file should contain the following exactly as shown below:

# ~/.intern/intern.cfg
[Default]
protocol = https
host = api.bossdb.io
token = <your-api-key>
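As an optional sanity check (a sketch, assuming the intern package is installed; BossRemote and list_collections are part of intern's public API, but check the intern documentation for your version), you can confirm that intern picks up your configuration. BossRemote() reads ~/.intern/intern.cfg by default, so this fails fast if the file or token is misconfigured:

# Optional sanity check: BossRemote() reads ~/.intern/intern.cfg by default,
# so constructing it and listing collections fails fast on a bad token.
from intern.remote.boss import BossRemote

rmt = BossRemote()
print(rmt.list_collections())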
In this notebook, we will explain the following concepts as they relate to this workflow:

- What is an Element versus a Workflow?
- Plot the workflow with dj.Diagram
- Insert data into tables
- Query table contents
- Fetch table contents
- Run the workflow for your experiments
For detailed documentation and tutorials on general DataJoint principles that support collaboration, automation, reproducibility, and visualizations:

- DataJoint Interactive Tutorials - Fundamentals including table tiers, query operations, fetch operations, automated computations with the make function, etc.
- DataJoint Core - Documentation - Relational data model principles
- DataJoint Element for Volumetric Calcium Imaging - Documentation
Let's start by importing the packages necessary to run this workflow.
import os

# If this notebook is run from the notebooks/ directory, move up to the
# repository root so relative paths resolve correctly.
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
import datajoint as dj
import datetime
## The DataJoint Workflow for Volumetric Calcium Imaging is assembled from 5 DataJoint Elements
| Element | Source Code | Documentation | Description |
|---|---|---|---|
| Element Lab | Link | Link | Lab management related information, such as Lab, User, Project, Protocol, Source. |
| Element Animal | Link | Link | General animal metadata and surgery information. |
| Element Session | Link | Link | General information of experimental sessions. |
| Element Calcium Imaging | Link | Link | General information about the calcium imaging scan. |
| Element ZStack | Link | Link | Volumetric data segmentation and export. |
Each workflow is composed of multiple Elements. Each Element contains 1 or more modules, and each module declares its own schema in the database.
The Elements are imported within the workflow_zstack.pipeline script.
When the modules are imported for the first time, the schemas and tables are created in the database. Once created, importing the modules again will not recreate them; the existing schemas and tables are simply accessed.
The schema diagram (shown below) is a good reference for understanding the order of the tables within the workflow.
Let's activate the Elements.
from workflow_zstack.pipeline import (
    lab,
    subject,
    session,
    scan,
    volume,
    volume_matching,
    bossdb,
)
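As an optional check, you can confirm that the schemas now exist in the database with dj.list_schemas(), a built-in DataJoint utility:

# List the schemas the connected database user can access; the newly
# activated schemas for this workflow should appear here.
dj.list_schemas()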
## Diagram
We can plot the diagram of tables within multiple schemas and their dependencies using dj.Diagram(). For details, see the documentation.
(
    dj.Diagram(subject.Subject)
    + dj.Diagram(session.Session)
    + dj.Diagram(scan.Scan)
    + dj.Diagram(volume)
    + dj.Diagram(bossdb)
)
While the diagram above seems complex at first, it becomes clearer when approached as a hierarchy of tables that defines the order in which the workflow expects to receive data:

- The tables higher up in the diagram, such as subject.Subject(), should be the first to receive data.
- Data is manually entered into the green, rectangular tables with the insert1() method.
- Tables connected by a solid line depend on entries in the table above them.
There are 5 table tiers in DataJoint, several of which appear in the diagram above.
| Table tier | Color and shape | Description |
|---|---|---|
| Manual table | Green box | Data entered from outside the pipeline, either by hand or with external helper scripts. |
| Lookup table | Gray box | Small tables containing general facts and settings of the data pipeline; not specific to any experiment or dataset. |
| Imported table | Blue oval | Data ingested automatically inside the pipeline but requiring access to data outside the pipeline. |
| Computed table | Red circle | Data computed automatically entirely inside the pipeline. |
| Part table | Plain text | Part tables share the same tier as their master table. |
## Insert entries into manual tables
In this section, we will insert metadata about an animal subject, an experimental session, and an imaging scan.
Let's start with the first schema and table in the schema diagram (i.e. subject.Subject table).
Each module (e.g. subject) contains a schema object that enables interaction with the schema in the database.
subject.schema
Each table class in the module corresponds to a table in the database.
subject.Subject()
We can view the table dependencies and the attributes we need to insert by using describe() and heading. The describe() method displays the table definition with its foreign key references, while the heading property displays all attributes of the table. These are particularly useful if you are new to DataJoint Elements and are unsure of the attributes required for each table.
print(subject.Subject.describe())
subject.Subject.heading
We will insert data into the subject.Subject table.
subject.Subject.insert1(
    dict(
        subject="subject1",
        sex="M",
        subject_birth_date="2023-01-01",
        subject_description="Cellpose segmentation of volumetric data.",
    )
)
subject.Subject()
Let's continue inserting in the other manual tables. The Session table is next.
print(session.Session.describe())
session.Session.heading
The cells above show the dependencies and attributes of the session.Session table.

Notice that describe shows the dependencies of the table on upstream tables (i.e., foreign key references); the Session table depends on the upstream Subject table. In contrast, heading lists all attributes of the Session table, regardless of whether they are declared in an upstream table.
session_key = dict(
    subject="subject1",
    session_id=0,
)

session.Session.insert1(
    dict(
        session_key,
        session_datetime=datetime.datetime.now(),
    ),
)
session.Session()
The SessionDirectory table locates the relevant data files in a directory path
relative to the root directory defined in your dj.config["custom"]. More
information about dj.config is provided at the end of this tutorial and is
particularly useful for local deployments of this workflow.
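For example (a minimal sketch; the key name imaging_root_data_dir is an assumption here — check the workflow documentation for the exact key your deployment expects), the root directory can be set like this:

# Hypothetical example: point the workflow at your raw-data root directory.
# SessionDirectory paths are interpreted relative to this directory.
dj.config["custom"]["imaging_root_data_dir"] = "/path/to/your/data"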
print(session.SessionDirectory.describe())
session.SessionDirectory.heading
session.SessionDirectory.insert1(
    dict(session_key, session_dir="subject1/session1"),
    skip_duplicates=True,
)
session.SessionDirectory()
Each volume requires an entry in the Scan table from
element-calcium-imaging. Here, we'll use describe and heading for the Scan
table and insert an entry for the current session.
print(scan.Scan.describe())
scan.Scan.heading
scan.Scan.insert1(
    dict(
        session_key,
        scan_id=0,
        acq_software="ScanImage",
    ),
    skip_duplicates=True,
)
scan_key = (scan.Scan & "subject = 'subject1'").fetch1("KEY")
## Populate

### Automatically populate tables
volume.Volume is the first table in the pipeline that can be populated automatically.
If a table contains a part table, the part table is also populated during the
populate() call. populate() takes several arguments, including a restriction
such as a session key. The restriction limits populate() to the session of
interest rather than all possible sessions, which could be time-intensive for
databases with many entries.
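For illustration (a sketch; both calls use the standard populate() API), the difference looks like this:

# Unrestricted: process every pending entry (can be slow on large databases).
# volume.Volume.populate(display_progress=True)

# Restricted: process only the scan/session of interest.
# volume.Volume.populate(scan_key, display_progress=True)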
Let's view the volume.Volume table and populate it using the populate() call.
volume.Volume.heading
volume.Volume()
volume.Volume.populate(scan_key, display_progress=True)
Let's view the information that was entered into this table:
volume.Volume()
We're almost ready to perform volume segmentation with cellpose. An important
step before processing is managing the parameters to be used in that step. To
do so, we will insert the parameters required by cellpose into the
SegmentationParamSet table. This table keeps track of all combinations of your
image processing parameters; you can choose which parameter set is used during
processing in a later step.
Let's view the attributes and insert data into volume.SegmentationParamSet.
volume.SegmentationParamSet.heading
volume.SegmentationParamSet.insert_new_params(
    segmentation_method="cellpose",
    paramset_idx=1,
    params=dict(
        diameter=None,
        min_size=2,
        do_3d=False,
        anisotropy=0.5,
        model_type="nuclei",
        channels=[[0, 0]],
        z_axis=0,
    ),
)
DataJoint uses a SegmentationTask table to manage which Volume and
SegmentationParamSet should be used during processing. This table defines
several key aspects of downstream processing. Let's view its attributes to get
a better understanding.
print(volume.SegmentationTask.describe())
volume.SegmentationTask.heading
The SegmentationTask table contains two important attributes:

- paramset_idx
- task_mode

The paramset_idx attribute tracks your segmentation parameter sets. You can
choose the parameter set with which you want to run segmentation based on this
attribute; it tells the Segmentation table which set of parameters to use in a
given populate() call.
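For example, you can preview the parameters stored under a given paramset_idx with a simple restriction:

# Show the stored parameter set with paramset_idx = 1.
volume.SegmentationParamSet & "paramset_idx = 1"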
The task_mode attribute can be set to either load or trigger. When set to trigger, the
segmentation step will run cellpose on the raw data.
volume.SegmentationTask.insert1(
    dict(
        scan_key,
        paramset_idx=1,
        task_mode="trigger",
    ),
)
For now, Element ZStack only supports triggering cellpose. Now we can populate
the Segmentation table. This step may take several hours, depending on your
computer's capabilities.
volume.Segmentation.populate(scan_key, display_progress=True)
Volumetric data uploaded to BossDB requires information about voxel size. The
DataJoint table volume.VoxelSize can be used to insert this information for a
given dataset.
volume.VoxelSize.heading
volume.VoxelSize.insert1(dict(scan_key, width=0.001, height=0.001, depth=0.001))
Let's define an upload_key to automatically upload the volume to BossDB via the
bossdb schema. The upload_key combines information about the current scan from
scan.Scan and the paramset_idx from SegmentationParamSet.
upload_key = dict(scan_key, paramset_idx=1)
The first table in this schema is VolumeUploadTask. Let's define the upload task by naming the collection, experiment,
and channel where the data should be uploaded.
print(bossdb.VolumeUploadTask.describe())
bossdb.VolumeUploadTask.heading
collection = "dataJointTestUpload"
experiment = "CalciumImaging"
channel = "volume-image"
bossdb.VolumeUploadTask.insert1(
    dict(
        upload_key,
        collection_name=collection,
        experiment_name=experiment,
        channel_name=channel,
    )
)
Finally, we can upload data.
As a reminder, uploading data to BossDB via this pipeline requires an API token which can be obtained by creating an account at api.bossdb.io. You will also need resource manager permissions from the team at BossDB.
bossdb.VolumeUpload.populate(upload_key)
To visualize the volumetric data, fetch the Neuroglancer URL below and paste it into your browser.
(bossdb.VolumeUpload.WebAddress & upload_key & "upload_type='image+annotation'").fetch1(
    "web_address"
)
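Alternatively (a small convenience sketch using only the Python standard library), you can open the fetched URL directly:

import webbrowser

# Fetch the Neuroglancer link and open it in the default browser.
url = (
    bossdb.VolumeUpload.WebAddress & upload_key & "upload_type='image+annotation'"
).fetch1("web_address")
webbrowser.open(url)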