Sharing neuroscience data in an open format

When sharing data and metadata, it is important to consider the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles. Interoperability and reusability in particular imply that data and metadata should be stored in a format that many different tools can read, and that the data should be thoroughly annotated with metadata.

In neurophysiology, a large fraction of experimental data are obtained using commercial setups with dedicated software that stores data in a proprietary format. Neo can read many of these proprietary formats, but (i) not everyone uses Python, and (ii) many metadata of interest cannot be stored in these formats, and their metadata labels generally do not use standardised terminologies.

Conversely, data from neuroscience simulations are often stored in open formats, such as plain text files or HDF5, but with considerable variability between users and little or no standardisation.

It is therefore advantageous to convert to, or write data directly in, an open format that conforms to FAIR principles.

Neo supports two such formats: NIX and Neurodata Without Borders (NWB).

Example

Before getting into the details of the formats, we present an example dataset that will be used to demonstrate the process of converting to open formats.

Our example is a public dataset, “Whole cell patch-clamp recordings of cerebellar granule cells”, contributed to the EBRAINS repository by Marialuisa Tognolina from the laboratory of Egidio D’Angelo at the University of Pavia.

According to the dataset description:

This dataset provides a characterization of the intrinsic excitability and synaptic properties of the cerebellar granule cells. Whole-cell patch-clamp recordings were performed on acute parasagittal cerebellar slices obtained from juvenile Wistar rats (p18-p24). Passive granule cells parameters were extracted in voltage-clamp mode by analyzing current relaxation induced by step voltage changes (IV protocol). Granule cells intrinsic excitability was investigated in current-clamp mode by injecting 2 s current steps (CC step protocol). Synaptic transmission properties were investigated in current clamp mode by an electrical stimulation of the mossy fibers bundle (5 pulses at 50 Hz, EPSP protocol).

The dataset contains recordings from multiple subjects. For this example, let’s download the data for Subject 15. You can download the files by hand, by selecting each file and then selecting “Download file”, or you can run the following code:

In [1]: from urllib.request import urlretrieve

In [2]: from urllib.parse import quote

In [3]: dataset_url = "https://object.cscs.ch/v1/AUTH_63ea6845b1d34ad7a43c8158d9572867/hbp-d000017_PatchClamp-GranuleCells_pub"

In [4]: folder = "GrC_Subject15_180116"

In [5]: filenames = ["180116_0004 IV -70.abf", "180116_0005 CC step.abf", "180116_0006 EPSP.abf"]

In [6]: for filename in filenames:
   ...:     datafile_url = f"{dataset_url}/{folder}/{quote(filename)}"
   ...:     local_file = urlretrieve(datafile_url, filename)
   ...: 
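The filenames contain spaces, which is why the loop passes each one through quote() before building the URL. As a minimal sketch of what that step produces, the following builds the same URLs and prints them without downloading anything. The name datafile_urls is introduced here for illustration only.

```python
from urllib.parse import quote

dataset_url = "https://object.cscs.ch/v1/AUTH_63ea6845b1d34ad7a43c8158d9572867/hbp-d000017_PatchClamp-GranuleCells_pub"
folder = "GrC_Subject15_180116"
filenames = ["180116_0004 IV -70.abf", "180116_0005 CC step.abf", "180116_0006 EPSP.abf"]

# quote() percent-encodes characters that are not allowed in a URL path,
# so each space becomes "%20"; letters, digits, "_", "." and "-" are kept.
datafile_urls = [f"{dataset_url}/{folder}/{quote(name)}" for name in filenames]

for url in datafile_urls:
    print(url)
```

Printing the URLs first is a cheap way to check them in a browser before running the download loop.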

Let’s start with the current-clamp data. The data are in Axon format (suffix “.abf”), so we could import AxonIO directly, but we can also ask Neo to guess the format using the get_io() function:

In [7]: from neo.io import get_io

In [8]: reader = get_io("180116_0005 CC step.abf")

In [9]: data = reader.read()
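get_io() chooses an IO class from the filename suffix. The idea can be sketched with a plain dictionary lookup. This is an illustration only, not Neo's implementation: the mapping SUFFIX_TO_IO_NAME and the function guess_io_name are hypothetical, and the mapping lists only the three formats used on this page.

```python
from pathlib import Path

# Illustrative mapping only: a hypothetical subset, not Neo's real registry.
SUFFIX_TO_IO_NAME = {
    ".abf": "AxonIO",
    ".nix": "NixIO",
    ".nwb": "NWBIO",
}


def guess_io_name(filename):
    """Return the name of the IO class associated with the filename suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUFFIX_TO_IO_NAME[suffix]
    except KeyError:
        raise ValueError(f"No IO class registered for suffix {suffix!r}") from None


# Spaces in the filename do not matter; only the final suffix is inspected.
io_name = guess_io_name("180116_0005 CC step.abf")
print(io_name)
```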

In [10]: data
Out[10]: 
[Block with [<neo.core.segment.Segment object at 0x7f6bb5c22990>, <neo.core.segment.Segment object at 0x7f6bb3172510>, <neo.core.segment.Segment object at 0x7f6bb3172bd0>, <neo.core.segment.Segment object at 0x7f6bb34e44d0>, <neo.core.segment.Segment object at 0x7f6bb31d29d0>, <neo.core.segment.Segment object at 0x7f6bb31812d0>, <neo.core.segment.Segment object at 0x7f6bb3183110>, <neo.core.segment.Segment object at 0x7f6bb3182750>, <neo.core.segment.Segment object at 0x7f6bb31822d0>, <neo.core.segment.Segment object at 0x7f6bb5973c90>, <neo.core.segment.Segment object at 0x7f6bb3180150>, <neo.core.segment.Segment object at 0x7f6bb317ff50>, <neo.core.segment.Segment object at 0x7f6bb3157450>, <neo.core.segment.Segment object at 0x7f6bb3156d50>, <neo.core.segment.Segment object at 0x7f6bb3157f50>] segments, [<neo.core.group.Group object at 0x7f6bb33ac050>] groups
 annotations: {'abf_version': 2.0}
 file_origin: '180116_0005 CC step.abf'
 rec_datetime: datetime.datetime(2016, 1, 18, 16, 27, 26, 875000)
 # segments (N=[<neo.core.segment.Segment object at 0x7f6bb5c22990>, <neo.core.segment.Segment object at 0x7f6bb3172510>, <neo.core.segment.Segment object at 0x7f6bb3172bd0>, <neo.core.segment.Segment object at 0x7f6bb34e44d0>, <neo.core.segment.Segment object at 0x7f6bb31d29d0>, <neo.core.segment.Segment object at 0x7f6bb31812d0>, <neo.core.segment.Segment object at 0x7f6bb3183110>, <neo.core.segment.Segment object at 0x7f6bb3182750>, <neo.core.segment.Segment object at 0x7f6bb31822d0>, <neo.core.segment.Segment object at 0x7f6bb5973c90>, <neo.core.segment.Segment object at 0x7f6bb3180150>, <neo.core.segment.Segment object at 0x7f6bb317ff50>, <neo.core.segment.Segment object at 0x7f6bb3157450>, <neo.core.segment.Segment object at 0x7f6bb3156d50>, <neo.core.segment.Segment object at 0x7f6bb3157f50>])
 0: Segment with [<AnalogSignal(array([[-65.7959 ],
       [-65.7959 ],
       [-65.64331],
       ...,
       [-65.76538],
       [-65.85693],
       [-65.85693]], dtype=float32) * mV, [0.0 s, 3.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.7959 ],
       [-65.7959 ],
       [-65.64331],
       ...,
       [-65.76538],
       [-65.85693],
       [-65.85693]], dtype=float32) * mV, [0.0 s, 3.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 0.0 s to 3.2 s
 1: Segment with [<AnalogSignal(array([[-66.74194 ],
       [-66.55884 ],
       [-66.77246 ],
       ...,
       [-64.575195],
       [-64.54468 ],
       [-64.575195]], dtype=float32) * mV, [4.0 s, 7.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-66.74194 ],
       [-66.55884 ],
       [-66.77246 ],
       ...,
       [-64.575195],
       [-64.54468 ],
       [-64.575195]], dtype=float32) * mV, [4.0 s, 7.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 4.0 s to 7.2 s
 2: Segment with [<AnalogSignal(array([[-66.40625 ],
       [-66.52832 ],
       [-66.28418 ],
       ...,
       [-65.24658 ],
       [-65.24658 ],
       [-65.582275]], dtype=float32) * mV, [8.0 s, 11.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-66.40625 ],
       [-66.52832 ],
       [-66.28418 ],
       ...,
       [-65.24658 ],
       [-65.24658 ],
       [-65.582275]], dtype=float32) * mV, [8.0 s, 11.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 8.0 s to 11.2 s
 3: Segment with [<AnalogSignal(array([[-64.88037],
       [-64.78882],
       [-64.63623],
       ...,
       [-66.28418],
       [-66.19263],
       [-66.19263]], dtype=float32) * mV, [12.0 s, 15.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-64.88037],
       [-64.78882],
       [-64.63623],
       ...,
       [-66.28418],
       [-66.19263],
       [-66.19263]], dtype=float32) * mV, [12.0 s, 15.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 12.0 s to 15.2 s
 4: Segment with [<AnalogSignal(array([[-66.65039 ],
       [-66.589355],
       [-66.589355],
       ...,
       [-65.826416],
       [-65.88745 ],
       [-65.704346]], dtype=float32) * mV, [16.0 s, 19.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-66.65039 ],
       [-66.589355],
       [-66.589355],
       ...,
       [-65.826416],
       [-65.88745 ],
       [-65.704346]], dtype=float32) * mV, [16.0 s, 19.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 16.0 s to 19.2 s
 5: Segment with [<AnalogSignal(array([[-65.88745 ],
       [-65.91797 ],
       [-65.7959  ],
       ...,
       [-65.979004],
       [-66.00952 ],
       [-65.85693 ]], dtype=float32) * mV, [20.0 s, 23.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.88745 ],
       [-65.91797 ],
       [-65.7959  ],
       ...,
       [-65.979004],
       [-66.00952 ],
       [-65.85693 ]], dtype=float32) * mV, [20.0 s, 23.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 20.0 s to 23.2 s
 6: Segment with [<AnalogSignal(array([[-65.85693],
       [-65.64331],
       [-65.94849],
       ...,
       [-65.7959 ],
       [-65.85693],
       [-66.00952]], dtype=float32) * mV, [24.0 s, 27.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.85693],
       [-65.64331],
       [-65.94849],
       ...,
       [-65.7959 ],
       [-65.85693],
       [-66.00952]], dtype=float32) * mV, [24.0 s, 27.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 24.0 s to 27.2 s
 7: Segment with [<AnalogSignal(array([[-65.093994],
       [-65.39917 ],
       [-65.00244 ],
       ...,
       [-64.88037 ],
       [-65.18555 ],
       [-65.06348 ]], dtype=float32) * mV, [28.0 s, 31.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.093994],
       [-65.39917 ],
       [-65.00244 ],
       ...,
       [-64.88037 ],
       [-65.18555 ],
       [-65.06348 ]], dtype=float32) * mV, [28.0 s, 31.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 28.0 s to 31.2 s
 8: Segment with [<AnalogSignal(array([[-65.39917 ],
       [-65.2771  ],
       [-65.39917 ],
       ...,
       [-66.467285],
       [-66.467285],
       [-66.345215]], dtype=float32) * mV, [32.0 s, 35.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.39917 ],
       [-65.2771  ],
       [-65.39917 ],
       ...,
       [-66.467285],
       [-66.467285],
       [-66.345215]], dtype=float32) * mV, [32.0 s, 35.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 32.0 s to 35.2 s
 9: Segment with [<AnalogSignal(array([[-66.52832],
       [-66.74194],
       [-66.43677],
       ...,
       [-65.73486],
       [-65.85693],
       [-65.91797]], dtype=float32) * mV, [36.0 s, 39.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-66.52832],
       [-66.74194],
       [-66.43677],
       ...,
       [-65.73486],
       [-65.85693],
       [-65.91797]], dtype=float32) * mV, [36.0 s, 39.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 36.0 s to 39.2 s
 10: Segment with [<AnalogSignal(array([[-65.91797 ],
       [-66.07056 ],
       [-65.73486 ],
       ...,
       [-65.49072 ],
       [-65.12451 ],
       [-65.582275]], dtype=float32) * mV, [40.0 s, 43.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.91797 ],
       [-66.07056 ],
       [-65.73486 ],
       ...,
       [-65.49072 ],
       [-65.12451 ],
       [-65.582275]], dtype=float32) * mV, [40.0 s, 43.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 40.0 s to 43.2 s
 11: Segment with [<AnalogSignal(array([[-65.18555 ],
       [-65.2771  ],
       [-65.216064],
       ...,
       [-64.697266],
       [-64.208984],
       [-64.39209 ]], dtype=float32) * mV, [44.0 s, 47.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.18555 ],
       [-65.2771  ],
       [-65.216064],
       ...,
       [-64.697266],
       [-64.208984],
       [-64.39209 ]], dtype=float32) * mV, [44.0 s, 47.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 44.0 s to 47.2 s
 12: Segment with [<AnalogSignal(array([[-65.30762 ],
       [-65.42969 ],
       [-65.216064],
       ...,
       [-63.964844],
       [-63.964844],
       [-64.02588 ]], dtype=float32) * mV, [48.0 s, 51.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-65.30762 ],
       [-65.42969 ],
       [-65.216064],
       ...,
       [-63.964844],
       [-63.964844],
       [-64.02588 ]], dtype=float32) * mV, [48.0 s, 51.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 48.0 s to 51.2 s
 13: Segment with [<AnalogSignal(array([[-64.30054 ],
       [-64.36157 ],
       [-64.36157 ],
       ...,
       [-63.201904],
       [-62.927246],
       [-63.23242 ]], dtype=float32) * mV, [52.0 s, 55.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-64.30054 ],
       [-64.36157 ],
       [-64.36157 ],
       ...,
       [-63.201904],
       [-62.927246],
       [-63.23242 ]], dtype=float32) * mV, [52.0 s, 55.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 52.0 s to 55.2 s
 14: Segment with [<AnalogSignal(array([[-62.98828 ],
       [-63.354492],
       [-63.079834],
       ...,
       [-61.187744],
       [-61.279297],
       [-61.309814]], dtype=float32) * mV, [56.0 s, 59.2 s], sampling rate: 4000.0 Hz)>] analogsignals, [<Event: >] events
    annotations: {'abf_version': 2.0}
    # analogsignals (N=[<AnalogSignal(array([[-62.98828 ],
       [-63.354492],
       [-63.079834],
       ...,
       [-61.187744],
       [-61.279297],
       [-61.309814]], dtype=float32) * mV, [56.0 s, 59.2 s], sampling rate: 4000.0 Hz)>])
    0: AnalogSignal with 1 channels of length 12800; units mV; datatype float32
       name: 'Signals'
       annotations: {'stream_id': '0'}
       sampling rate: 4000.0 Hz
       time: 56.0 s to 59.2 s]

We can see that the file contains a single Block, containing 15 Segments, and each segment contains one AnalogSignal with a single channel, and an Event.
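The numbers in the printed output are mutually consistent, which makes for a quick sanity check after reading a file. The sketch below uses values copied by hand from the output above; the variable names are ours and are not Neo attributes.

```python
# Values read off the printed output above (not queried from Neo here).
sampling_rate_hz = 4000.0   # "sampling rate: 4000.0 Hz"
sweep_duration_s = 3.2      # first segment: "time: 0.0 s to 3.2 s"
sweep_interval_s = 4.0      # segments start at 0.0 s, 4.0 s, 8.0 s, ...
n_sweeps = 15               # segments are numbered 0 to 14

# Number of samples in one sweep: duration multiplied by sampling rate.
samples_per_sweep = round(sweep_duration_s * sampling_rate_hz)

# Start time of every sweep, and the end time of the last one.
sweep_starts_s = [i * sweep_interval_s for i in range(n_sweeps)]
last_stop_s = sweep_starts_s[-1] + sweep_duration_s

print(samples_per_sweep, sweep_starts_s[-1], last_stop_s)
```

The computed sample count matches the "length 12800" reported for each AnalogSignal, and the final sweep spans the same interval as the last segment in the output (56.0 s to 59.2 s).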

To quickly take a look at the data, let’s plot it:

In [11]: import matplotlib.pyplot as plt

In [12]: fig = plt.figure(figsize=(10, 5))

In [13]: for segment in data[0].segments:
   ....:     signal = segment.analogsignals[0]
   ....:     plt.plot(signal.times, signal)
   ....: 

In [14]: plt.xlabel(f"Time ({signal.times.units.dimensionality.string})")
Out[14]: Text(0.5, 0, 'Time (s)')

In [15]: plt.ylabel(f"Voltage ({signal.units.dimensionality.string})")
Out[15]: Text(0, 0.5, 'Voltage (mV)')

In [16]: plt.savefig("open_format_example_cc_step.png")
[Figure: open_format_example_cc_step.png, the current-clamp recordings plotted as voltage (mV) against time (s), one trace per segment]

Now we’ve read the data into Neo, we’re ready to write them to an open format.

NIX

The NIX data model allows storing a fully annotated scientific dataset, i.e. the data together with its metadata, within a single container. The current implementations use the HDF5 file format as a storage backend.

For users of Neo, the advantage of NIX is that all Neo objects can be stored in an open format, HDF5, readable with many different tools, without needing to add extra annotations or structure the dataset in any specific way.

Using Neo’s NIXIO requires some additional dependencies. To install Neo with NIXIO support, run:

$ pip install neo[nixio]

Writing our example dataset to NIX format is straightforward:

In [17]: from neo.io import NixIO

In [18]: writer = NixIO("GrC_Subject15_180116.nix", mode="ow")

In [19]: writer.write(data)

Neurodata Without Borders (NWB)

Neurodata Without Borders (NWB:N) is an open standard file format for neurophysiology.

Using Neo’s NWBIO requires some additional dependencies. To install Neo with NWB support, run:

$ pip install neo[nwb]

NWBIO can read NWB 2.0-format files, and maps their structure onto Neo objects and annotations.

NWBIO can also write to NWB 2.0 format. Since NWB has a more complex structure than Neo’s basic Block - Segment hierarchy, and NWB requires fairly extensive metadata, it is recommended to annotate the Neo objects with special, NWB-specific annotations, to ensure data and metadata are correctly placed within the NWB file.

The location of data stored in an NWB file depends on the source of the data, e.g. whether they are stimuli, intracellular electrophysiology recordings, extracellular electrophysiology recordings, behavioural measurements, etc. To place the data correctly, we need to annotate all data objects with special metadata, identified by keys starting with “nwb_”:

In [20]: signal_metadata = {
   ....:     "nwb_group": "acquisition",
   ....:     "nwb_neurodata_type": ("pynwb.icephys", "PatchClampSeries"),
   ....:     "nwb_electrode": {
   ....:         "name": "patch clamp electrode",
   ....:         "description": "The patch-clamp pipettes were pulled from borosilicate glass capillaries "
   ....:                        "(Hilgenberg, Malsfeld, Germany) and filled with intracellular solution "
   ....:                        "(K-gluconate based solution)",
   ....:         "device": {
   ....:            "name": "patch clamp electrode"
   ....:         }
   ....:     },
   ....:     "nwb:gain": 1.0
   ....: }
   ....: 

In [21]: for segment in data[0].segments:
   ....:     signal = segment.analogsignals[0]
   ....:     signal.annotate(**signal_metadata)
   ....: 
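The call signal.annotate(**signal_metadata) relies on Python's keyword unpacking. One key in the example, "nwb:gain", uses a colon rather than an underscore and is not a valid Python identifier, yet it is still passed through, because a function that accepts **kwargs receives any string key. A minimal sketch, using a hypothetical stand-in function fake_annotate (not part of Neo) and shortened metadata values:

```python
# Same keys as the signal_metadata example above; values shortened here.
signal_metadata = {
    "nwb_group": "acquisition",
    "nwb_neurodata_type": ("pynwb.icephys", "PatchClampSeries"),
    "nwb_electrode": {"name": "patch clamp electrode"},
    "nwb:gain": 1.0,
}


def fake_annotate(**annotations):
    """Hypothetical stand-in for annotate(): return the keyword arguments received."""
    return dict(annotations)


received = fake_annotate(**signal_metadata)

# Collect the NWB-related keys, accepting both the "nwb_" and "nwb:" spellings
# that appear in the example (treating both as NWB-related is an assumption).
nwb_keys = sorted(key for key in received if key.startswith(("nwb_", "nwb:")))
print(nwb_keys)
```

Listing the keys this way is a simple check that every intended annotation reached the signal before the file is written.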

We can also provide global metadata, either attaching them to a Neo Block or passing them to the write() method. Here we take metadata from the dataset description on the EBRAINS search portal:

In [22]: global_metadata = {
   ....:     "session_start_time": data[0].rec_datetime,
   ....:     "identifier": data[0].file_origin,
   ....:     "session_id": "180116_0005",
   ....:     "institution": "University of Pavia",
   ....:     "lab": "D'Angelo Lab",
   ....:     "related_publications": "https://doi.org/10.1038/s42003-020-0953-x"
   ....: }
   ....: 

Now that we have annotated our dataset, we can write it to an NWB file:

In [23]: from neo.io import NWBIO

In [24]: writer = NWBIO("GrC_Subject15_180116.nwb", mode="w", **global_metadata)

In [25]: writer.write(data)

Note

Neo support for NWB is a work in progress; for example, it does not currently support NWB extensions. If you encounter a problem reading an NWB file with Neo, please make a bug report (see Reporting bugs, requesting new features).