Data Repository for Motta, Berning, Boergens, Staffler et al. (2019) Science

Department of Connectomics, Max Planck Institute for Brain Research, D-60438 Frankfurt, Germany
Probabilistic Numerics Group, Max Planck Institute for Intelligent Systems, D-72076 Tübingen, Germany
Equally contributing first authors
Corresponding author (mh@brain.mpg.de)

Research article
Supplementary materials
Electron microscopy volume browser
Data repository

When using any of these data, please cite as
Motta A, Berning M, Boergens KM, Staffler B, Beining M, Loomba S, Hennig Ph, Wissler H, Helmstaedter M (2019). Dense connectomic reconstruction in layer 4 of the somatosensory cortex. Science. DOI: 10.1126/science.aay3134

Datasets

All of the following datasets are available for download in the data repository at https://l4dense2019.brain.mpg.de/webdav.

This repository can also be accessed as a network-attached drive from Windows, macOS, and Linux.

Most datasets are provided as HDF5 files. This file format is supported in many programming environments, including MATLAB, Python, R, and Julia. The detailed structure of individual HDF5 files is described below.

Some of the training and validation data for machine learning algorithms are provided in the form of NML files. These files can be viewed and edited in the web browser using webKnossos, and can be processed using MATLAB or Python.
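NML files are XML documents. As a minimal sketch, the following Python snippet reads skeleton nodes and edges using only the standard library. It assumes the usual webKnossos layout (thing elements that contain node elements with id, x, y, and z attributes, and edge elements with source and target attributes). The embedded example string is made up for illustration and is not taken from the repository.

```python
import xml.etree.ElementTree as ET

# Made-up miniature NML document for illustration only.
EXAMPLE_NML = """<things>
  <thing id="1" name="example neurite">
    <nodes>
      <node id="1" x="100" y="200" z="300"/>
      <node id="2" x="110" y="205" z="301"/>
      <node id="3" x="120" y="210" z="302"/>
    </nodes>
    <edges>
      <edge source="1" target="2"/>
      <edge source="2" target="3"/>
    </edges>
  </thing>
</things>"""


def parse_nml(text):
    """Return {thing_id: {"name", "nodes", "edges"}} for an NML string."""
    root = ET.fromstring(text)
    skeletons = {}
    for thing in root.iter("thing"):
        # Node ID -> (x, y, z) coordinates as stored in the file.
        nodes = {
            int(n.get("id")): (int(n.get("x")), int(n.get("y")), int(n.get("z")))
            for n in thing.iter("node")
        }
        # Edges reference node IDs, not positions in the node list.
        edges = [
            (int(e.get("source")), int(e.get("target")))
            for e in thing.iter("edge")
        ]
        skeletons[int(thing.get("id"))] = {
            "name": thing.get("name", ""),
            "nodes": nodes,
            "edges": edges,
        }
    return skeletons


skeletons = parse_nml(EXAMPLE_NML)
```

To process a downloaded file instead of the example string, read its contents and pass them to parse_nml.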

Overview

Serial block-face scanning electron microscopy data from layer 4 of the primary somatosensory cortex of a 28-day-old mouse.
The first dimension (X) is oriented from pia mater towards white matter.
The nominal voxel size is 11.24 nm × 11.24 nm × 28 nm.
The electron microscopy data are stored as unsigned 8-bit integers.
The segment IDs are encoded as unsigned 32-bit integers.
The segment ID zero denotes unsegmented voxels.
All indices and coordinates are one-based.
Physical areas are measured in µm².
Coordinates are in voxel units.

Electron Microscopy Volume

The electron microscopy data are available

for viewing and annotation in the web browser at https://webknossos.org
for download in /electron-microscopy-volume

The electron microscopy volume was split into 216 separate HDF5 files. Each file contains a single dataset, /data, with a subvolume of (1024 voxels)³. The filename encodes the offset of its subvolume. x2y4z3.hdf5, for example, contains the data cube starting at position (X, Y, Z) = (2 × 1024 + 1, 4 × 1024 + 1, 3 × 1024 + 1). The full extent of the image volume is from 104 to 6793 along X, from 104 to 10048 along Y, and from 119 to 3538 along Z, including limits.

Note: The voxel data are stored in Z×Y×X row-major order, which is equivalent to X×Y×Z column-major order. If you're using Python (with NumPy) or C, for example, access the voxel data as data[z, y, x]. In MATLAB, Julia, and R, which use column-major memory layout, the index order is reversed: data(x, y, z) in MATLAB and data[x, y, z] in Julia and R.
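The file naming and index order described above can be sketched in Python as follows. The helper names locate_voxel and read_voxel are hypothetical. The sketch assumes that the third-party h5py package exposes /data in Z×Y×X order, as stated in the note, and that the directory argument points to a local copy of /electron-microscopy-volume.

```python
import os

CUBE_SIZE = 1024  # edge length of each subvolume in voxels


def locate_voxel(x, y, z):
    """Map one-based global voxel coordinates to a cube file and local index.

    Returns the file name and a zero-based (z, y, x) index tuple suitable
    for NumPy-style indexing of that file's /data dataset.
    """
    if min(x, y, z) < 1:
        raise ValueError("coordinates are one-based and must be >= 1")
    # Cube index along each axis: x2y4z3.hdf5 starts at (2*1024+1, ...).
    cx, cy, cz = ((c - 1) // CUBE_SIZE for c in (x, y, z))
    # Zero-based offsets within the cube.
    lx, ly, lz = ((c - 1) % CUBE_SIZE for c in (x, y, z))
    return f"x{cx}y{cy}z{cz}.hdf5", (lz, ly, lx)


def read_voxel(directory, x, y, z):
    """Read a single voxel value; requires the third-party h5py package."""
    import h5py  # imported lazily so that locate_voxel works without it

    name, index = locate_voxel(x, y, z)
    with h5py.File(os.path.join(directory, name), "r") as f:
        return f["/data"][index]


name, index = locate_voxel(2 * 1024 + 1, 4 * 1024 + 1, 3 * 1024 + 1)
```

For the example from the text, name is "x2y4z3.hdf5" and index is (0, 0, 0), the first voxel of that cube.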

Parts of this electron microscopy volume were previously published in

Yunfeng Hua, Philip Laserstein, Moritz Helmstaedter (2015) Nature Communications
Large-volume en-bloc staining for electron microscopy-based connectomics
DOI: 10.1038/ncomms8923
Manuel Berning, Kevin M. Boergens, Moritz Helmstaedter (2015) Neuron
SegEM: Efficient Image Analysis for High-Resolution Connectomics
DOI: 10.1016/j.neuron.2015.09.003
More: https://segem.brain.mpg.de
Kevin M. Boergens, Manuel Berning, Tom Bocklisch, Dominic Bräunlein, Florian Drawitsch, Johannes Frohnhofen, Tom Herold, Philipp Otto, Norman Rzepka, Thomas Werkmeister, Daniel Werner, Georg Wiese, Heiko Wissler, Moritz Helmstaedter (2017) Nature Methods
webKnossos: efficient online 3D data annotation for connectomics
DOI: 10.1038/nmeth.4331
Benedikt Staffler, Manuel Berning, Kevin M. Boergens, Anjali Gour, Patrick van der Smagt, Moritz Helmstaedter (2017) eLife
SynEM, automated synapse detection for connectomics
DOI: 10.7554/eLife.26414
More: https://synem.brain.mpg.de

Volume Segmentation

The volume segmentation is available

for viewing and annotation in the web browser at https://webknossos.org
The segmentation must first be turned on by clicking the cogwheel → selecting the "Dataset" tab → activating the "segmentation" layer.
The opacity of the segmentation layer can then be adjusted using the slider below.
for download in /segmentation-volume

The segmentation volume is stored analogously to the electron microscopy volume. /data contains the unsigned 32-bit segment IDs for a (1024 voxels)³ subvolume. The full extent of the segmentation volume is from 129 to 5574 along X, from 129 to 8509 along Y, and from 129 to 3414 along Z, including limits.
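When only part of the volume is needed, it helps to know which cube files to download. The following is a minimal sketch, assuming the file naming scheme described for the electron microscopy volume; the function name cubes_overlapping and the example bounding box are made up for illustration.

```python
from itertools import product

CUBE_SIZE = 1024  # edge length of each subvolume in voxels


def cubes_overlapping(x_range, y_range, z_range):
    """List cube file names touched by a one-based, inclusive bounding box."""
    axes = []
    for lo, hi in (x_range, y_range, z_range):
        if not 1 <= lo <= hi:
            raise ValueError("ranges must be one-based with lo <= hi")
        # Cube indices of the first and the last voxel along this axis.
        axes.append(range((lo - 1) // CUBE_SIZE, (hi - 1) // CUBE_SIZE + 1))
    return [f"x{cx}y{cy}z{cz}.hdf5" for cx, cy, cz in product(*axes)]


# A small box that straddles a cube boundary along X (arbitrary coordinates).
files = cubes_overlapping((1000, 1100), (2000, 2047), (129, 140))
```

Here, files lists the two cubes x0y1z0.hdf5 and x1y1z0.hdf5.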

Mapped Volume Segmentation

A volume segmentation in which all segments belonging to a given neurite were mapped to a single segment is available

for viewing and annotation in the web browser at https://webknossos.org
for download in /mapped-segmentation-volume

For further information on viewing the segmentation volume and on the structure of the HDF5 files, please consult the section above.

The relationship between mapped segment IDs and neurites is stored in the axons.hdf5 and dendrites.hdf5 files.

Blood Vessel Volume Segmentation

A volume segmentation of the blood vessels is available

for download in /blood-vessel-segmentation-volume

For further information on the structure of the HDF5 files, please consult the section above.

Training and Evaluation Data for Machine Learning Algorithms

Volume Segmentation (SegEM)

The electron microscopy volume was segmented using SegEM. CNN 20130516T204040_8,3 and associated parameters from Table 1 of Berning et al. (2015) Neuron were used. Code, training and evaluation data, and parameters of the trained CNN are available as supplementary material of that publication or at https://segem.brain.mpg.de.

Neurite Continuity Classification (ConnectEM)

The neurite continuity classifier, ConnectEM, was trained and evaluated on so-called "merger mode tracings". These are skeleton reconstructions in which nodes are spatially mapped onto the corresponding segments. Each skeleton represents the set of segments of a neurite. Interfaces between segments of the same neurite were used as positive classification samples, and interfaces between segments of different neurites as negative samples.

Three volumes of (5 µm)³ each were densely reconstructed this way. The skeleton reconstructions are available as NML files in /connectem/training-and-test-data.

The parameters for automated axon and dendrite reconstructions were separately optimized on a random subset of neurites. NML files with skeleton reconstructions of these neurites are available in /connectem/parameter-grid-search.

Neurite Type Classification (TypeEM)

Each segment was assigned probabilities of being part of an axon, of a dendrite, of an astrocyte, or of a spine head. These neurite type classifiers, called TypeEM, were trained and evaluated on an extended version of the ConnectEM merger mode tracings (see above) in which skeletons were labeled with the type of neurite. These ground truth data are available as NML files in /typeem/training-and-test-data.

Synapse Detection (SynEM)

Synapse-Vesicle-Mitochondrion CNN (SVM CNN)

The training and test data of the synapse-vesicle-mitochondrion CNN consist of seven roughly cubic regions in which voxels belonging to synapses, vesicle clouds, or mitochondria were manually labeled as such. Cubes 1-6 are each 300×300×120 voxels in size. Cube 7 is substantially larger and comprises 512×512×256 voxels.

These data are available in /synem/synapse-vesicle-mitochondrion-cnn/training-and-test-data. Each cube corresponds to an HDF5 file with two datasets:

/label is the three-dimensional label volume. Voxels belonging to synapses, vesicle clouds, and mitochondria are marked by the integers 1, 2, and 3, respectively. The remaining "background" voxels are labeled by zeroes. This information is stored in machine-readable form as attributes on the /label dataset.
/em contains the electron microscopy (EM) data corresponding to the label volume. The EM volume contains margins of 125 voxels on each side along X and Y, and margins of 50 voxels on each side along Z relative to the label data.

Note: Details on the format of voxel data are given above.
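Because /em is larger than /label, the two arrays have to be aligned before voxels can be compared. The following sketch computes the slices that crop /em to the labeled region. It assumes that both datasets are read in Z×Y×X order and that the margins are exactly as stated above; the function names are hypothetical.

```python
# Margins of /em relative to /label, per side, as stated above.
MARGIN_XY = 125
MARGIN_Z = 50


def em_slices_for_label(label_shape_zyx):
    """Slices that crop /em (Z, Y, X order) to the region covered by /label."""
    nz, ny, nx = label_shape_zyx
    return (
        slice(MARGIN_Z, MARGIN_Z + nz),
        slice(MARGIN_XY, MARGIN_XY + ny),
        slice(MARGIN_XY, MARGIN_XY + nx),
    )


def expected_em_shape(label_shape_zyx):
    """Shape that /em should have if the margins are exactly as described."""
    nz, ny, nx = label_shape_zyx
    return (nz + 2 * MARGIN_Z, ny + 2 * MARGIN_XY, nx + 2 * MARGIN_XY)


# Cubes 1-6: label volumes of 300 x 300 x 120 voxels (X x Y x Z).
label_shape = (120, 300, 300)
crop = em_slices_for_label(label_shape)
em_shape = expected_em_shape(label_shape)
```

Indexing a NumPy array or h5py dataset as em[crop] then yields a block with the same shape as the label volume. Comparing the actual shape of /em against em_shape is a quick check that the assumed index order is correct.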

Interface Classification using SynEM

The interface classifiers used for synapse detection were trained on an extended version of the training set published in Staffler et al. (2017) eLife. The training and test sets were furthermore complemented with annotations of the types of postsynaptic target. This separation of synapses onto spine heads, spine necks, dendritic shafts, or neuronal somata allowed the training of separate classifiers for spine and non-spine synapses.

Each of the HDF5 files in /synem/interface-classification/training-and-test-data is organized as follows:

/em contains a cuboid of electron microscopy data.
/segmentation contains the segmentation volume corresponding to /em.
/interfaces contains the contiguous interfaces between pairs of adjacent segments.
- /interfaces/7026, for example, contains the list of the voxels that constitute interface 7026. The voxels are specified by one-based column-major linear indices into the /em and /segmentation datasets.
/edges contains the IDs of the adjacent segments that induced each of the /interfaces.
/labels contains the synapse labels for each interface. The integers 1 and 2 denote synaptic interfaces in which the first of the two segments listed in /edges is pre- and postsynaptic, respectively. Non-synaptic interfaces are marked by the integer zero. This information is stored in machine-readable form as attributes on the /labels dataset.
/targetTypes contains the type of postsynaptic target. The possible target types and their integer encoding are: spine head (1), spine neck (2), dendritic shaft (3), and soma (4). Non-synaptic interfaces are marked by the integer zero. This information is stored in machine-readable form as attributes on the /targetTypes dataset.

Note: Details on the format of voxel data are given above.
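The index and label conventions above can be sketched in Python as follows. The function names are hypothetical, and shape_xyz stands for the X×Y×Z extent of the /em cuboid of the file at hand.

```python
def linear_to_subscripts(index, shape_xyz):
    """Convert a one-based column-major linear index to one-based (x, y, z)."""
    nx, ny, nz = shape_xyz
    if not 1 <= index <= nx * ny * nz:
        raise IndexError("linear index out of range")
    i = index - 1          # switch to zero-based arithmetic
    x = i % nx             # X varies fastest in column-major order
    y = (i // nx) % ny
    z = i // (nx * ny)
    return x + 1, y + 1, z + 1


def pre_and_post_segment(edge, label):
    """Order the two segments of an interface as (presynaptic, postsynaptic).

    Returns None for non-synaptic interfaces (label 0).
    """
    first, second = edge
    if label == 1:         # first segment is presynaptic
        return first, second
    if label == 2:         # first segment is postsynaptic
        return second, first
    return None


# Toy volume of 2 x 3 x 4 voxels (X x Y x Z).
subs = linear_to_subscripts(7, (2, 3, 4))
```

If the voxel data are held in a NumPy array of shape (Z, Y, X) in C order, the same voxel can also be addressed directly as em.reshape(-1)[index - 1], because X×Y×Z column-major and Z×Y×X row-major linear indices coincide.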

Test Set for Inhibitory Synapses

The performance of inhibitory synapse detection was evaluated on three inhibitory axons. These axons and their postsynaptic targets were (locally) volumetrically reconstructed using merger mode tracings (see above). The reconstructions and classification of postsynaptic targets are provided in /synem/interface-classification/test-data/inhibitory-axons.

Neurite Reconstructions

Axons

The axon reconstructions are stored in axons.hdf5. This file is organized as follows:

/axons/skeleton contains the skeleton reconstructions of axons. Axon 10298, for example, is stored in the following datasets:
- /axons/skeleton/10298/nodes is a 3×684 dataset that contains the X, Y, and Z coordinates of the 684 skeleton nodes.
- /axons/skeleton/10298/edges is a 2×781 dataset that contains the 781 edges of the skeleton. Each pair of integers corresponds to the one-based node indices that make up an undirected edge.
- /axons/skeleton/10298/segIds is a dataset with 684 entries that contains the segment ID corresponding to each skeleton node. Parts of the skeleton that were generated during focused error annotation using webKnossos flight mode are indicated by segment ID zero.
/axons/agglomerate contains the segment agglomerates (i.e., segment equivalence classes) derived from the skeleton reconstructions. These agglomerates were used for connectome reconstruction.
- /axons/agglomerate/10298, for example, is a dataset containing the IDs of the 394 segments making up axon 10298.
/axons/class is a dataset with 38962 entries that encodes the axon classification. The corticocortical, thalamocortical, inhibitory, and other axons are marked by the integers 1, 2, 3, and 4, respectively. This information is stored in machine-readable form as attributes on the /axons/class dataset.
/axons/mappedSegIds is a dataset with 38962 entries that contains the axons' segment IDs in the mapped segmentation.
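As an illustration of how the nodes and edges datasets fit together, the following sketch computes the path length of a skeleton in µm from voxel coordinates, one-based edge indices, and the nominal voxel size. The function name and the three-node example are made up. Depending on the HDF5 library, the nodes dataset may be returned as 3×N or N×3; check its shape and transpose if needed so that each entry is one (x, y, z) triple.

```python
import math

VOXEL_SIZE_NM = (11.24, 11.24, 28.0)  # nominal voxel size along X, Y, Z


def path_length_um(nodes, edges):
    """Total skeleton path length in micrometers.

    nodes: sequence of (x, y, z) voxel coordinates, one entry per node.
    edges: sequence of (i, j) pairs of one-based node indices.
    """
    total_nm = 0.0
    for i, j in edges:
        a, b = nodes[i - 1], nodes[j - 1]  # one-based -> zero-based
        # Scale each axis by its voxel size before taking the distance.
        total_nm += math.sqrt(
            sum(((p - q) * s) ** 2 for p, q, s in zip(a, b, VOXEL_SIZE_NM))
        )
    return total_nm / 1000.0


# Made-up three-node skeleton for illustration.
nodes = [(101, 101, 101), (201, 101, 101), (201, 101, 151)]
edges = [(1, 2), (2, 3)]
length = path_length_um(nodes, edges)
```

The first edge spans 100 voxels along X (1124 nm) and the second 50 voxels along Z (1400 nm), so the example skeleton is 2.524 µm long.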

Dendrites

The dendrite reconstructions are stored in dendrites.hdf5. Here, "dendrites" refers to all postsynaptic targets (including, for example, neuronal somata). The file is organized as follows:

/dendrites has the same structure as /axons (see above)
/dendrites/class encodes the postsynaptic target class. The classes and their integer coding are: somata (8), proximal (spiny) dendrites (9), smooth dendrites (5), apical dendrites (1), axon initial segments (2), and other dendrites (4). This information is stored in machine-readable form as attributes on the /dendrites/class dataset.
/dendrites/neuronId stores the neuron identity. Somata and dendrites are marked by the same positive integer if and only if they belong to the same neuron. Dendrites that could not be traced back to a soma are marked by zeroes.
/dendrites/mappedSegIds is a dataset with 11400 entries that contains the dendrites' segment IDs in the mapped segmentation.
/spineHeads/agglomerate contains the spine head agglomerates (i.e., segment equivalence classes).
- /spineHeads/agglomerate/4987, for example, contains the IDs of the segments that make up spine head 4987.
/spineHeads/volume contains the spine head volumes in µm³.

Manual Neuron Reconstructions

The manually generated neuron tracings that were used for quantitative evaluation of the soma-based neuron reconstructions and for analysis of the output synapses of L4 neurons are available as NML files in the manual-neuron-reconstructions directory.

Synapses and Connectome

The connectome shown in figure 3 is available as a CSV file. Rows and columns correspond to presynaptic axons and postsynaptic targets, respectively. The first column gives the axon class of each row, and the first row gives the target type of each column. Each connectome entry contains the number of synapses established between an axon-target pair.
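A file of this kind can be read with the Python standard library. The following sketch sums synapse counts per pair of axon class and target type. It assumes that the first column holds the axon class of each row and that the first row holds the target type of each column; the miniature CSV, its labels, and its counts are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up miniature connectome; labels and counts are illustrative only.
EXAMPLE_CSV = """,dendA,dendB,dendB
axA,0,2,1
axB,1,0,0
axA,3,0,4
"""


def synapses_by_class_pair(handle):
    """Sum synapse counts for every (axon class, target type) pair."""
    rows = csv.reader(handle)
    target_types = next(rows)[1:]  # first row: one label per column
    totals = Counter()
    for row in rows:
        axon_class, counts = row[0], row[1:]  # first column: axon class
        for target_type, count in zip(target_types, counts):
            totals[(axon_class, target_type)] += int(count)
    return totals


totals = synapses_by_class_pair(io.StringIO(EXAMPLE_CSV))
```

To process the downloaded file, open it with open(path, newline="") and pass the file handle instead of the StringIO object.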

Information on individual synapses is stored in synapses.hdf5.

Each synapse consists of a set of presynaptic segments and of a set of postsynaptic segments. The information pertaining to synapse 244418, for example, is stored in the following two datasets:

/synapses/preSegIds/244418 contains the presynaptic segment IDs
/synapses/postSegIds/244418 contains the postsynaptic segment IDs

The following information was then derived from all of the above:

/synapses/position contains the position of the synapse
/synapses/preAxonId contains the ID of the presynaptic axon (see /axons). Zero indicates that the presynaptic site was not part of the reconstructed axons.
/synapses/postDendriteId contains the ID of the postsynaptic dendrite (see /dendrites). Zero indicates that the postsynaptic site was not part of the reconstructed dendrites.
/synapses/postSpineHeadId contains the ID of the postsynaptic spine head (see /spineHeads). Zero indicates that the synapse did not innervate any of the reconstructed spine heads.
/synapses/type encodes the synapse type. The numbers 1 through 4 indicate primary spine, secondary spine, shaft, and soma synapses, respectively. This information is stored in machine-readable form as attributes on /synapses/type.
/synapses/preSplitAxonId contains the ID of the split presynaptic axon. The analysis of same-axon same-dendrite spine synapse pairs was performed after splitting the axon reconstructions at all (potentially merger-induced) branchpoints. This reduces the number of false same-axon same-dendrite synapse pairs.
/synapses/asiArea contains the axon-spine interface area in µm² for all spine synapses between split axons and dendrites. The entry is zero for all other synapses.
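The per-synapse datasets can be combined to count synapses between pairs of reconstructed axons and dendrites. The following is a minimal sketch with made-up input values; in practice the three lists would be read from /synapses/preAxonId, /synapses/postDendriteId, and /synapses/type. The function name is hypothetical.

```python
from collections import Counter

# Integer coding of /synapses/type as listed above.
SYNAPSE_TYPES = {1: "primary spine", 2: "secondary spine", 3: "shaft", 4: "soma"}


def count_connections(pre_axon_ids, post_dendrite_ids):
    """Count synapses per (axon, dendrite) pair, skipping unassigned sites."""
    counts = Counter()
    for axon_id, dendrite_id in zip(pre_axon_ids, post_dendrite_ids):
        if axon_id > 0 and dendrite_id > 0:  # zero: not reconstructed
            counts[(axon_id, dendrite_id)] += 1
    return counts


# Made-up values for five synapses.
pre_axon_ids = [12, 12, 0, 7, 12]
post_dendrite_ids = [5, 5, 5, 0, 9]
types = [1, 1, 3, 4, 2]

connections = count_connections(pre_axon_ids, post_dendrite_ids)
type_names = [SYNAPSE_TYPES[t] for t in types]
```

In this toy example, axon 12 makes two synapses onto dendrite 5 and one onto dendrite 9; the two synapses with an unassigned pre- or postsynaptic site are skipped.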

Utilities

The segments.hdf5 file contains the following supplementary datasets:

/segments/position contains the segment position
/segments/voxelCount contains the segment size in number of voxels
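Voxel counts can be converted to physical volumes using the nominal voxel size given in the overview. The following is a minimal sketch; the function name is hypothetical.

```python
VOXEL_SIZE_NM = (11.24, 11.24, 28.0)  # nominal voxel size along X, Y, Z


def voxel_count_to_um3(voxel_count):
    """Convert a voxel count to a volume in cubic micrometers."""
    voxel_volume_nm3 = VOXEL_SIZE_NM[0] * VOXEL_SIZE_NM[1] * VOXEL_SIZE_NM[2]
    return voxel_count * voxel_volume_nm3 * 1e-9  # 1 µm³ = 1e9 nm³


volume = voxel_count_to_um3(1_000_000)
```

A segment of one million voxels thus corresponds to roughly 3.54 µm³.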

Code

All programming code used in this study is available

for download as a ZIP file (code.zip)
in a git repository

Contact

mh@brain.mpg.de
Moritz Helmstaedter
Department of Connectomics
Max Planck Institute for Brain Research
Max-von-Laue-Strasse 4
D-60438 Frankfurt am Main
Germany
