NeuroHDF documentation

Neuroscientists need to manage and integrate anatomical, physiological, behavioral and simulation data on multiple spatial and temporal scales and across modalities, individuals and species. Large amounts of data with complex data types will be produced in the coming decades, and viable solutions for databasing, data sharing and interoperability of software tools are needed.

“Hierarchical Data Format (HDF5) is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.”

NeuroHDF is an effort to bring the flexibility and efficiency of HDF5 to neuroscience datasets by specifying a simple layout for different data types with minimal metadata. The NeuroHDF Interest Group consists of the members of this group.

Multi-compartment neural circuitry

The SWC format has become the quasi-standard for describing morphology reconstructions of single neurons. Describing larger neural circuits with many neurons and their synaptic connectivity calls for a new and efficient data format. NeuroHDF describes a multi-compartmental neural circuit, similar to SWC, with points in 3D space (vertices) and their connectivity (edges). Attributes such as the vertex type (skeleton node, connector, root), the edge type (presynaptic_to, postsynaptic_to) or the radius are expressed as arrays with one entry per vertex or edge.
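The layout described above can be sketched with h5py as a set of parallel arrays. This is only an illustration under stated assumptions: the group and dataset names (circuit, vertices, connectivity, location, index), the integer type codes and the "skeleton edge" type are hypothetical and are not taken from the NeuroHDF specification.

```python
import h5py
import numpy as np

# Hypothetical integer codes; the text names the types but not their encoding.
VERTEX_TYPE = {"skeleton node": 1, "connector": 2, "root": 3}
EDGE_TYPE = {"skeleton edge": 1, "presynaptic_to": 2, "postsynaptic_to": 3}


def write_circuit(h5file, locations, vertex_types, radii, edges, edge_types):
    """Write vertices, edges and their per-element attributes as parallel arrays."""
    locations = np.asarray(locations, dtype=np.float32)
    edges = np.asarray(edges, dtype=np.uint32)
    if len(vertex_types) != len(locations) or len(radii) != len(locations):
        raise ValueError("vertex attributes need one entry per vertex")
    if len(edge_types) != len(edges):
        raise ValueError("edge attributes need one entry per edge")

    circuit = h5file.create_group("circuit")  # assumed group name
    vertices = circuit.create_group("vertices")
    vertices.create_dataset("location", data=locations)  # shape (N, 3)
    vertices.create_dataset("type", data=np.asarray(vertex_types, dtype=np.uint8))
    vertices.create_dataset("radius", data=np.asarray(radii, dtype=np.float32))

    connectivity = circuit.create_group("connectivity")
    connectivity.create_dataset("index", data=edges)  # shape (M, 2), vertex indices
    connectivity.create_dataset("type", data=np.asarray(edge_types, dtype=np.uint8))
    return circuit


# In-memory file (core driver without backing store): nothing is written to disk.
with h5py.File("circuit_demo.h5", "w", driver="core", backing_store=False) as f:
    write_circuit(
        f,
        locations=[[0, 0, 0], [1, 0, 0], [2, 0, 0], [2, 1, 0]],
        vertex_types=[VERTEX_TYPE["root"], VERTEX_TYPE["skeleton node"],
                      VERTEX_TYPE["skeleton node"], VERTEX_TYPE["connector"]],
        radii=[1.0, 0.5, 0.5, 0.0],
        edges=[[0, 1], [1, 2], [2, 3]],
        edge_types=[EDGE_TYPE["skeleton edge"], EDGE_TYPE["skeleton edge"],
                    EDGE_TYPE["presynaptic_to"]],
    )
    print(f["circuit/vertices/location"].shape)
```

Keeping every attribute in its own array, indexed in the same order as the vertices or edges, means a reader can load only the attributes it needs.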

You can use HDFView to inspect an example NeuroHDF file. The software tool CATMAID for neural circuit reconstruction exports microcircuits in this format. Another emerging standard is libNeuroML.
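As a scripted alternative to a graphical viewer, the contents of any HDF5 file can be listed with h5py. The sketch below walks a file and reports every dataset with its shape and dtype. The helper name list_datasets and the demo dataset are hypothetical; for a real file, open it with h5py.File(path, "r") instead of the in-memory demo.

```python
import h5py
import numpy as np


def list_datasets(h5file):
    """Return (path, shape, dtype) for every dataset found in the file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    h5file.visititems(visit)
    return found


# Small in-memory file used only to demonstrate the helper.
with h5py.File("inspect_demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("demo/points", data=np.zeros((2, 3), dtype=np.float32))
    for path, shape, dtype in list_datasets(f):
        print(path, shape, dtype)
```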

Useful for neuroscience data types

  • Single cell morphology
  • Neural circuit reconstructions

Tool supporting this specification

  • CATMAID exports neural circuit reconstructions in NeuroHDF

N-dimensional, homogeneous arrays

A simple convention for describing metadata about the array axes makes basic information available for a sensible interpretation of the array’s content.
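One way such an axis convention could look is to attach a label, a unit and a spacing to every array dimension as HDF5 attributes. The attribute names used here (axisN_label, axisN_unit, axisN_spacing) and both helper functions are assumptions made for illustration; the actual NeuroHDF convention is not spelled out in this text.

```python
import h5py
import numpy as np


def _text(value):
    """Return attribute text as str, whatever string type h5py hands back."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def write_annotated_array(h5file, name, data, axes):
    """Store an array plus one (label, unit, spacing) description per axis."""
    data = np.asarray(data)
    if len(axes) != data.ndim:
        raise ValueError("need exactly one axis description per dimension")
    dset = h5file.create_dataset(name, data=data)
    for i, (label, unit, spacing) in enumerate(axes):
        # Hypothetical attribute names, not the NeuroHDF convention.
        dset.attrs[f"axis{i}_label"] = label
        dset.attrs[f"axis{i}_unit"] = unit
        dset.attrs[f"axis{i}_spacing"] = float(spacing)
    return dset


def describe_axes(dset):
    """Read the per-axis metadata back as a list of (label, unit, spacing)."""
    return [
        (_text(dset.attrs[f"axis{i}_label"]),
         _text(dset.attrs[f"axis{i}_unit"]),
         float(dset.attrs[f"axis{i}_spacing"]))
        for i in range(len(dset.shape))
    ]


# A 4D array with 3 spatial dimensions and 1 temporal dimension.
with h5py.File("axes_demo.h5", "w", driver="core", backing_store=False) as f:
    volume = np.zeros((4, 4, 3, 10), dtype=np.float32)
    dset = write_annotated_array(
        f, "fmri", volume,
        axes=[("x", "mm", 3.0), ("y", "mm", 3.0), ("z", "mm", 4.0), ("t", "s", 2.0)],
    )
    for axis in describe_axes(dset):
        print(axis)
```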

Useful for neuroscience data types

  • Electron microscopy: 3D array, 3 spatial dimensions after alignment
  • Optical microscopy: 4D array, 3 spatial dimensions and 1 channel dimension
  • Labeling
  • Functional MRI/PET: 4D array, with 3 spatial and 1 temporal dimension
  • Structural MRI: 3D array, with 3 spatial dimensions
  • Diffusion MRI: 4D array, with 3 spatial dimensions and 1 dimension for gradient directions; contains metadata tables for b-values and b-vectors

Tool supporting this specification

None so far. A zebrafish dataset available as HDF5 files uses a similar specification for axes units.

Example generation

Multiscale image datasets

As an extension of the generic N-dimensional, homogeneous array representation, subgroups for different scales can be added to represent multiple (spatial) scales.
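A minimal sketch of this idea stores one subgroup per scale, each holding a progressively coarser copy of the volume. The names multiscale, scaleN, data and downsampling_factor are hypothetical, and the plain stride-2 subsampling of a 3D volume stands in for whatever resampling a real pipeline would use.

```python
import h5py
import numpy as np


def write_pyramid(h5file, volume, levels=3):
    """Write a 3D volume at several scales, one subgroup per scale."""
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("this sketch only handles 3D volumes")
    root = h5file.create_group("multiscale")  # assumed group name
    current = volume
    for level in range(levels):
        grp = root.create_group(f"scale{level}")
        grp.create_dataset("data", data=current)
        # Factor relative to the full-resolution volume along each axis.
        grp.attrs["downsampling_factor"] = 2 ** level
        # Naive subsampling: keep every second voxel along each axis.
        current = current[::2, ::2, ::2]
    return root


with h5py.File("pyramid_demo.h5", "w", driver="core", backing_store=False) as f:
    root = write_pyramid(f, np.zeros((8, 8, 8), dtype=np.uint8))
    for name in sorted(root):
        print(name, root[name]["data"].shape)
```

A viewer can then open only the subgroup whose resolution matches the current zoom level.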

Useful for neuroscience data types

  • Multi-scale electron microscopy

Tool supporting this specification

None so far.

Example generation

Physiology

<Text and example dataset>

Useful for neuroscience data types

  • Extracellular recordings
  • Intracellular recordings
  • Calcium imaging
  • EEG
  • MEG
  • NIRS

Tool supporting this specification

None so far.

Example generation

Surfaces

A proliferation of file formats exists for 3D surfaces. The most widely used scheme stores surfaces as triangular meshes, using vertices (points in 3D) and faces (usually triangles, which describe the connectivity of the vertices). Values can additionally be stored on either the vertices or the faces. We propose the same convention to store surfaces in NeuroHDF. Level-of-detail meshes can be expressed with an additional group indirection.
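The vertex and face convention can be sketched as follows, with one group per mesh and a parent group per level of detail. The dataset names (vertices, faces, vertex_values), the lodN group names and the helper write_surface are assumptions for illustration, not part of a published NeuroHDF layout.

```python
import h5py
import numpy as np


def write_surface(parent, name, vertices, faces, vertex_values=None):
    """Store a triangular mesh as vertex and face arrays in its own group."""
    vertices = np.asarray(vertices, dtype=np.float32)
    faces = np.asarray(faces, dtype=np.int64)
    if vertices.ndim != 2 or vertices.shape[1] != 3:
        raise ValueError("vertices must have shape (N, 3)")
    if faces.ndim != 2 or faces.shape[1] != 3:
        raise ValueError("faces must have shape (M, 3)")
    if faces.size and (faces.min() < 0 or faces.max() >= len(vertices)):
        raise ValueError("faces must index existing vertices")

    grp = parent.create_group(name)
    grp.create_dataset("vertices", data=vertices)
    grp.create_dataset("faces", data=faces.astype(np.uint32))
    if vertex_values is not None:
        values = np.asarray(vertex_values, dtype=np.float32)
        if len(values) != len(vertices):
            raise ValueError("need one value per vertex")
        grp.create_dataset("vertex_values", data=values)
    return grp


# A tetrahedron stored as the finest level of detail of a hypothetical surface.
tetra_vertices = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
tetra_faces = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]

with h5py.File("surface_demo.h5", "w", driver="core", backing_store=False) as f:
    surface = f.create_group("surface")  # extra group level for level of detail
    mesh = write_surface(surface, "lod0", tetra_vertices, tetra_faces,
                         vertex_values=[0.1, 0.2, 0.3, 0.4])
    print(mesh["vertices"].shape, mesh["faces"].shape)
```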

Useful for neuroscience data types

  • Cortical and subcortical surface-based atlases
  • Morphology of neurons or subcellular components

Tool supporting this specification

None so far.

Example generation

Behavioral datasets

  • Behavioral experiments of tracked animals moving on a 2D plate
    • Irregular spatio-temporal data in a spatial reference system
  • Questionnaire results

Simulation

<Text and example dataset>

Useful for neuroscience data types

Tool supporting this specification

None so far.

Example generation

Serial section 2D images

<Text and example dataset>

Useful for neuroscience data types

  • Serial section electron microscopy

Tool supporting this specification

None so far.

Example generation

Evaluation of HDF5

Main HDF Group page http://www.hdfgroup.org/

Advantages of using HDF5

  • Compact binary data storage, extensible metadata
  • Fast random and parallel access, efficient, scalable
  • Widely used in High Performance Computing
  • Open source and cross-platform
  • HDF5-Fast Query and paper

Possible limitations of HDF5

Further reading

Biological image formats

Microscopy formats and metadata

General