=================================== Reading and analyzing data with Neo =================================== .. currentmodule:: neo.core Getting started =============== Neo is a library for working with neurophysiology data in the Python programming language. One of the big advantages of Neo is that it works with many different file formats: it doesn't matter which format your data is stored in, Neo provides a standard way to read the data, and then represents it in a standardised way, as a set of Python objects. .. loading data with an IO module The first step in reading data is to import the appropriate Neo input-output (or IO) module for your data. For this example, we're going to work with membrane potential traces stored in a text file, so we use the :class:`~neo.io.AsciiSignalIO` module: .. ipython:: In [1]: from neo.io import AsciiSignalIO In [2]: data = AsciiSignalIO("example_data.txt", delimiter=" ").read() .. link to list of IO modules .. note:: For a full list of IO modules provided by Neo, see :doc:`iolist`. .. navigating through the tree .. ipython:: In [3]: data Out[3]: [Block with 1 segments file_origin: 'example_data.txt' # segments (N=1) 0: Segment with 1 analogsignals # analogsignals (N=1) 0: AnalogSignal with 4 channels of length 10001; units mV; datatype float32 name: 'multichannel' sampling rate: 1.0 kHz time: 0.0 s to 10.001 s] Different data files can contain different amounts of data, from single traces to multiple recording sessions. To provide consistent behaviour, for all IO modules, the :meth:`read()` method returns a list of data blocks. A :class:`Block` typically represents a recording session. Each block contains a list of segments, where each :class:`Segment` contains data recorded at the same time. In this example file, we see a single type of data, the "analog signal", which represents continuous time series sampled at a fixed interval. The other types of data that can be contained in a :class:`Segment` are discussed below under :ref:`section-data-types`. .. note:: :meth:`read()` reads the entire file into memory at once. If you only want to access part of the data, you can do so using Neo's "lazy" data loading - see the section on :ref:`section-performance-memory` below. .. plotting data Neo data objects are based on NumPy_ arrays, and behave very similarly. For example, they can be plotted just like arrays: .. ipython:: In [4]: import matplotlib.pyplot as plt In [5]: signal = data[0].segments[0].analogsignals[0] In [6]: plt.plot(signal.times, signal) Out[6]: [, , , ] In [7]: plt.xlabel(f"Time ({signal.times.units.dimensionality.string})") Out[7]: Text(0.5, 0, 'Time (s)') In [8]: plt.ylabel(f"Membrane potential ({signal.units.dimensionality.string})") Out[8]: Text(0, 0.5, 'Membrane potential (mV)') In [9]: plt.savefig("example_plot.png") .. image:: example_plot.png .. link to examples. You now know enough to start using Neo. For more examples, see :doc:`../examples/index`. If you want to know more, read on. NumPy ===== Neo is based on NumPy_. All Neo data classes behave like NumPy arrays, but have extra functionality. The first addition is support for units. In contrast to a plain NumPy array, an :class:`AnalogSignal` knows the units of the data it contains, e.g.: .. ipython:: In [10]: signal.units Out[10]: array(1.) * mV This helps avoid errors like adding signals with different units, lets you auto-generate figure axis labels, and makes it easy to change units, like here from millivolts to volts e.g.: .. ipython:: In [11]: signal.magnitude[:5] Out[11]: array([[-58.469, -52.855, -58.79 , -58.815], [-59.061, -50.518, -52.003, -57.24 ], [-59.01 , -50.51 , -51.988, -57.199], [-58.96 , -50.503, -51.973, -57.158], [-58.911, -50.495, -51.958, -57.117]], dtype=float32) In [12]: signal.rescale("V").magnitude[:5] Out[12]: array([[-0.058469 , -0.052855 , -0.05879 , -0.058815 ], [-0.059061 , -0.050518 , -0.052003 , -0.05724001], [-0.05901 , -0.05051 , -0.051988 , -0.05719901], [-0.05896 , -0.050503 , -0.051973 , -0.057158 ], [-0.058911 , -0.050495 , -0.051958 , -0.057117 ]], dtype=float32) The second addition is support for structured metadata. Some of these metadata are *required*. For example, an :class:`AnalogSignal` must always have a :attr:`sampling_rate` attribute, and Neo will produce an Exception if you try to add two signals with different sampling rates: .. ipython:: In [13]: signal.sampling_rate Out[13]: array(1.) * kHz Some of these metadata are recommended but optional, like a name for each signal. Such metadata appear as attributes of the data objects: .. ipython:: In [14]: signal.name Out[14]: 'multichannel' And finally, some metadata are fully optional. These are stored in the :attr:`annotations` and :attr:`array_annotations` attributes: .. ipython:: In [15]: signal.array_annotations Out[15]: {'channel_index': array([0, 1, 2, 3])} For more information about this, see :ref:`section-annotations`. .. treating Neo data objects like numpy arrays, quantity arrays, converting between them, rescaling Most NumPy array methods also work on Neo data objects, e.g.: .. ipython:: In [16]: signal.mean() Out[16]: array(-56.33598, dtype=float32) * mV Data objects can be sliced like arrays (array annotations are automatically sliced appropriately): .. ipython:: In [17]: signal[100:110, 1:3] Out[17]: AnalogSignal with 2 channels of length 10; units mV; datatype float32 name: 'multichannel' sampling rate: 1.0 kHz time: 0.1 s to 0.11 s In [18]: signal[100:110, 1:3].array_annotations Out[18]: {'channel_index': array([1, 2])} To convert a Neo data object to a plain NumPy array, use the :attr:`magnitude` attribute: .. ipython:: In [19]: signal[100:110, 1:3].magnitude Out[19]: array([[-60.654, -60. ], [-60.724, -60. ], [-60.792, -60. ], [-60.859, -60. ], [-60.924, -60. ], [-60.989, -60. ], [-61.052, -60. ], [-61.114, -60. ], [-61.175, -60. ], [-61.234, -60. ]], dtype=float32) .. _section-data-types: Data types ========== The following classes directly represent data as arrays of numerical values with associated metadata (units, sampling frequency, etc.). * :py:class:`AnalogSignal`: A regular sampling of a single- or multi-channel continuous analog signal. * :py:class:`IrregularlySampledSignal`: A non-regular sampling of a single- or multi-channel continuous analog signal. * :py:class:`SpikeTrain`: A set of action potentials (spikes) emitted by the same unit in a period of time (with optional waveforms). * :py:class:`Event`: An array of time points representing one or more events in the data. * :py:class:`Epoch`: An array of time intervals representing one or more periods of time in the data. * :py:class:`ImageSequence`: A three dimensional array representing a sequence of images. AnalogSignal ------------ We have already met the :class:`AnalogSignal`, which represents continuous time series sampled at a fixed interval. In addition to reading data from a file, as above, it is also possible to create new signal objects directly, e.g.: .. ipython:: In [20]: import numpy as np In [21]: from quantities import mV, kHz In [22]: from neo import AnalogSignal In [23]: signal = AnalogSignal(np.random.normal(-65.0, 5.0, size=(100, 5)), ....: units=mV, sampling_rate=1 * kHz) In [24]: signal Out[24]: AnalogSignal with 5 channels of length 100; units mV; datatype float64 sampling rate: 1.0 kHz time: 0.0 s to 0.1 s IrregularlySampledSignal ------------------------ :class:`IrregularlySampledSignal` represents continuous time series sampled at non-regular time points. This means that instead of specifying the sampling rate or sampling interval, you must specify the array of times at which the signal was sampled. .. ipython:: In [25]: from quantities import ms, nA In [26]: from neo import IrregularlySampledSignal In [27]: isignal = IrregularlySampledSignal( ....: times=[0.0, 1.11, 4.27, 16.38, 19.33] * ms, ....: signal=[0.5, 0.8, 0.5, 0.7, 0.2] * nA, ....: description="input current") In [28]: isignal Out[28]: IrregularlySampledSignal with 1 channels of length 5; units nA; datatype float64 description: 'input current' sample times: [ 0. 1.11 4.27 16.38 19.33] ms .. note:: in case of multi-channel data, samples are assumed to have been taken at the same time points in all channels. If you need to specify different time points for different channels, use one signal object per channel. SpikeTrain ---------- A :class:`SpikeTrain` represents the times of occurrence of action potentials (spikes). .. ipython:: In [29]: from neo import SpikeTrain In [30]: spike_train = SpikeTrain([3, 4, 5], units='sec', t_stop=10.0) In [31]: spike_train Out[31]: SpikeTrain containing 3 spikes; units s; datatype float64 time: 0.0 s to 10.0 s It may also contain the waveforms of the action potentials, stored as :class:`AnalogSignals` *within* the spike train object - see the reference documentation for more on this. Event ----- It is common in electrophysiology experiments to record the times of specific events, such as the times at which stimuli are presented. An :class:`Event` contains an array of times at which events occurred, together with an *optional* array of labels for the events, e.g.: .. ipython:: In [32]: from neo import Event In [33]: events = Event(np.array([5, 15, 25]), units="second", ....: labels=["apple", "rock", "elephant"], ....: name="stimulus onset") In [34]: events Out[34]: Event containing 3 events with labels; time units s; datatype int64 name: 'stimulus onset' Epoch ----- A variation of events is where something occurs over a certain period of time, in which case we need to know both the start time and the duration. An :class:`Epoch` contains an array of start or onset times together with an array of durations (or a single value if all epochs have the same duration), and an optional array of labels. .. ipython:: In [35]: from neo import Epoch In [36]: epochs = Epoch(times=np.array([5, 15, 25]), ....: durations=2.0, ....: units="second", ....: labels=["apple", "rock", "elephant"], ....: name="stimulus presentations") In [37]: epochs Out[37]: Epoch containing 3 epochs with labels; time units s; datatype int64 name: 'stimulus presentations' ImageSequence ------------- In addition to electrophysiology, neurophysiology signals may be obtained through functional microscopy. The :class:`ImageSequence` class represents a sequence of images, as a 3D array organized as [frame][row][column]. It behaves similarly to :class:`AnalogSignal`, but in 3D rather than 2D. .. ipython:: In [38]: from quantities import Hz, micrometer In [39]: from neo import ImageSequence In [40]: img_sequence_array = [[[column for column in range(20)]for row in range(20)] ....: for frame in range(10)] In [41]: image_sequence = ImageSequence(img_sequence_array, units='dimensionless', ....: sampling_rate=1 * Hz, ....: spatial_scale=1 * micrometer) In [42]: image_sequence Out[42]: ImageSequence 10 frames with width 20 px and height 20 px; units dimensionless; datatype int64 sampling rate: 1.0 Hz spatial_scale: 1.0 um .. _section-annotations: Annotations =========== .. and array annotations Neo objects have certain *required* metadata, such as the :attr:`sampling_rate` for :class:`AnalogSignals`. There are also certain *recommended* metadata, such as a name and description. For any metadata not covered by the required or recommended fields, additional annotations can be added, e.g.: .. ipython:: In [43]: from quantities import um as µm In [44]: signal.annotate(pipette_tip_diameter=1.5 * µm) In [45]: signal.annotations Out[45]: {'pipette_tip_diameter': array(1.5) * um} For those IO modules that support writing data to file, annotations will also be written, provided they can be serialized to JSON format. .. todo: we should provide a custom ``JSONEncoder`` that supports quantities Array annotations ----------------- Since certain Neo objects contain array data, it is sometimes necessary to annotate individual array elements, or individual columns. For 1D arrays, the array annotations should have the same length as the array, e.g. .. ipython:: In [46]: events.shape In [47]: events.array_annotate(secondary_labels=["red", "green", "blue"]) For 2D arrays, the array annotations should match the shape of the channel dimension, e.g. .. ipython:: In [48]: signal.shape In [49]: signal.array_annotate(quality=["good", "good", "noisy", "good", "noisy"]) Dataset structure ================= The overall structure of a Neo dataset is shown in this figure: .. image:: images/base_schematic.png :height: 500 px :alt: Illustration of the main Neo data types :align: center Beyond the core data classes, Neo has various classes for grouping and structuring different data objects. We have already met two of them, the :class:`Block` and :class:`Segment`. Tree structure -------------- :class:`Block` and :class:`Segment` provide a basic two-level hierarchical structure: :class:`Blocks` contain :class:`Segments`, which contain data objects. :class:`Segments` are used to group data that have a common time basis, i.e. that were recorded at the same time. A :class:`Segment` can be considered as equivalent to a "trial", "episode", "run", "recording", etc., depending on the experimental context. :class:`Segments` have the following attributes, used to access lists of data objects: - :attr:`analogsignals` - :attr:`epochs` - :attr:`events` - :attr:`imagesequences` - :attr:`irregularlysampledsignals` - :attr:`spiketrains` :class:`Block` is the top-level container gathering all of the data, discrete and continuous, for a given recording session. It contains :class:`Segment` and :class:`Group` (see next section) objects in the attributes :attr:`segments` and :attr:`groups`. Grouping and linking objects ---------------------------- Sometimes your data have a structure that goes beyond a simple two-level hierarchy. For example, suppose that you wish to group together signals that were recorded from the same tetrode in multi-tetrode recording setup. For this, Neo provides a :class:`Group` class: .. ipython:: In [50]: from neo import Group In [51]: signal1 = AnalogSignal(np.random.normal(-65.0, 5.0, size=(100, 5)), units=mV, sampling_rate=1 * kHz) In [52]: signal2 = AnalogSignal(np.random.normal(-65.0, 5.0, size=(1000, 5)), units=nA, sampling_rate=10 * kHz) In [53]: group = Group(objects=(signal1, signal2)) In [54]: group Out[54]: Group with 2 analogsignals Since :class:`AnalogSignals` can contain data from multiple channels, sometimes we wish to include only a subset of channels in a group. For this, Neo provides the :class:`ChannelView` class, e.g.: .. ipython:: In [55]: from neo import ChannelView In [56]: channel_of_interest = ChannelView(obj=signal1, index=[2]) In [57]: signal_with_spikes = Group(objects=(channel_of_interest, spike_train)) In [58]: signal_with_spikes Out[58]: Group with 1 spiketrains, 1 channelviews .. todo: give some examples of using ROI with ImageSequence .. _section-performance-memory: Performance and memory consumption ================================== .. lazy loading In some cases you may not wish to load everything in memory, because it could be too big, or you know you only need to access a subset of the data in a file. For this scenario, some IO modules provide an optional argument to their :attr:`read()` methods: ``lazy=True/False``. With ``lazy=True`` all data objects (:class:`AnalogSignal`/:class:`SpikeTrain`/:class:`Event`/:class:`Epoch`/:class:`ImageSequence`) are replaced by proxy objects (:class:`AnalogSignalProxy`/:class:`SpikeTrainProxy`/:class:`EventProxy`/:class:`EpochProxy`/:class:`ImageSequenceProxy`). .. todo: implement ImageSequenceProxy By default (if not specified), ``lazy=False``, i.e. all data are loaded. These proxy objects contain metadata (:attr:`name`, :attr:`sampling_rate`, ...) so they can be inspected, but they do not contain any array-like data. When you want to load the actual data from a proxy object, use the :func:`load()` method to return a real data object of the appropriate type. Furthermore :func:`load()` has a ``time_slice`` argument, which allows you to load only a slice of data from the file. In this way the consumption of memory can be finely controlled. .. ipython: : .. In [59]: lazy_data = AsciiSignalIO("example_data.txt", delimiter=" ").read(lazy=True) .. In [60]: lazy_data .. todo: implement AsciiSignalRawIO Examples ======== For more examples of using Neo, see :doc:`../examples/index`. Citing Neo ========== If you use Neo in your work, please mention the use of Neo in your Methods section, using our RRID_: ``RRID:SCR_000634``. If you wish to cite Neo in publications, please use: Garcia S., Guarino D., Jaillet F., Jennings T.R., Pröpper R., Rautenberg P.L., Rodgers C., Sobolev A.,Wachtler T., Yger P. and Davison A.P. (2014) Neo: an object model for handling electrophysiology data in multiple formats. Frontiers in Neuroinformatics 8:10: doi:`10.3389/fninf.2014.00010`_ A BibTeX entry for LaTeX users is:: @article{neo14, author = {Garcia S. and Guarino D. and Jaillet F. and Jennings T.R. and Pröpper R. and Rautenberg P.L. and Rodgers C. and Sobolev A. and Wachtler T. and Yger P. and Davison A.P.}, doi = {10.3389/fninf.2014.00010}, full_text = {https://www.frontiersin.org/articles/10.3389/fninf.2014.00010/full}, journal = {Frontiers in Neuroinformatics}, month = {February}, title = {Neo: an object model for handling electrophysiology data in multiple formats}, volume = {8:10}, year = {2014} } .. _NumPy: https://numpy.org/ .. _`10.3389/fninf.2014.00010`: https://doi.org/10.3389/fninf.2014.00010 .. _RRID: https://www.rrids.org