
Jagular

Out-of-core pre-processing of big-ish electrophysiology data, including spike detection and alignment.

Installation

The easiest way to install jagular is to use pip. From the terminal, run:

pip install jagular

Alternatively, you can install the latest version of jagular by running the following commands:

git clone https://github.com/kemerelab/jagular.git
cd jagular
python setup.py [install, develop]

where the develop argument should be used if you want to modify the code.

What is Jagular used for?

We perform long (multiple-day to multiple-week) chronic in-vivo electrophysiology recordings, resulting in many terabytes of data per experiment. These long recording periods pose significant challenges to almost every part of a typical analysis pipeline, including filtering, spike detection, and alignment.

For spike detection, for example, we need to filter the wideband data prior to doing some form of threshold-crossing detection. But if we have terabytes of data, even a simple step such as filtering becomes tricky, since it has to be done out-of-core (the data does not fit into memory). In addition, there can be substantial drift on the electrodes over such a long period of time, so an adaptive threshold-crossing approach is more appropriate.

Jagular makes these out-of-core tasks easier by providing a simple interface to read well-defined chunks from multiple files in a seamless manner. These chunks can then be processed in a more manageable way. Jagular also has complete built-in support for the full (filtering)-(spike-detection)-(waveform-alignment) part of the analysis process, which works out-of-core and deals elegantly with electrode drift.
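
To make the chunked workflow concrete, here is a minimal sketch of seamless multi-file chunking (illustrative only; this is not jagular's actual API). It reads interleaved int16 samples and stitches chunks across file boundaries:

import numpy as np

def read_chunks(filenames, n_channels, chunk_size=2**20, dtype=np.int16):
    """Yield fixed-size chunks of interleaved channel data, stitched
    seamlessly across file boundaries, so downstream steps only ever
    hold one chunk in memory."""
    leftover = np.empty((0,), dtype=dtype)
    samples_per_chunk = chunk_size * n_channels
    for fname in filenames:
        with open(fname, 'rb') as f:
            while True:
                raw = np.fromfile(f, dtype=dtype,
                                  count=samples_per_chunk - leftover.size)
                buf = np.concatenate((leftover, raw))
                if buf.size < samples_per_chunk:
                    leftover = buf        # carry the remainder into the next file
                    break
                leftover = np.empty((0,), dtype=dtype)
                yield buf.reshape(-1, n_channels).T   # shape (n_channels, chunk_size)
    if leftover.size:                     # final partial chunk (assumes whole samples)
        yield leftover.reshape(-1, n_channels).T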

Where

download https://pypi.python.org/pypi/jagular
inspiration https://www.youtube.com/watch?v=WLCDAPNTpaM
docs coming soon!
code https://github.com/kemerelab/jagular

License

Jagular is distributed under the MIT license. See the LICENSE file for details.


jagular's Issues

implement: channel write generalization

Currently the channel write functionality in utils is a little restrictive. In particular, we need to support different formats:

from struct import Struct

# re-estimate the number of packets from the timestamp array
num_packets = len(ts)
my_ch_struct = Struct('<%dh' % num_packets)  # little-endian int16 channel samples
my_ts_struct = Struct('<%dI' % num_packets)  # little-endian uint32; ts_dtype should affect this!!!!
ts_packed = my_ts_struct.pack(*ts)

but other generalizations are necessary, too.
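
One possible generalization (a sketch; the format map and helper are illustrative, not the eventual API) is to derive the struct format character from the requested dtype instead of hard-coding it:

from struct import Struct

# illustrative subset of dtype name -> struct format character
_FMT = {'int16': 'h', 'uint16': 'H', 'int32': 'i', 'uint32': 'I',
        'int64': 'q', 'uint64': 'Q'}

def pack_samples(values, dtype='uint32', byteorder='<'):
    """Pack a sequence of samples using an arbitrary integer dtype."""
    fmt = Struct('%s%d%s' % (byteorder, len(values), _FMT[dtype]))
    return fmt.pack(*values)

ts = [0, 1, 2, 3]                               # example timestamps
ts_packed = pack_samples(ts, dtype='uint32')    # reproduces the current behavior
ch_packed = pack_samples([-5, 12], dtype='int16')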

refactor: within-epoch operations

Many operations need to be restricted to within-epoch data. These include filtering, spike detection and snippet extraction, resampling, __repr__, and so on.

Currently, I re-implement this functionality each time I need it. It would be great to come up with some abstract interface or service that I can invoke to make the code more readable, compact, and maintainable.

I don't have a good / clean way of doing this yet.

implement: downsampling

With an appropriate anti-aliasing filter applied first.

Should probably re-use the same underlying functions as jag.resample.
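
For reference, scipy bundles the anti-aliasing filter with the rate reduction; a sketch of the in-core operation (the out-of-core version would apply this per chunk/epoch):

import numpy as np
from scipy.signal import decimate

fs = 30000                     # wideband sampling rate (Hz)
x = np.random.randn(10 * fs)   # 10 s of synthetic wideband data
# 30 kHz -> 3 kHz: decimate by 10 with a zero-phase FIR anti-aliasing filter
x_3k = decimate(x, q=10, ftype='fir', zero_phase=True)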

Incompatibility between native Python and numpy types

In filtering.py, certain indices are computed by taking the difference between a native Python int and an np.uint64. Under NumPy's type-promotion rules, this mixed operation yields a float, which cannot be used for indexing. The result should be cast to an appropriate integer type (native or numpy) before indexing with it.
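
A minimal demonstration of the failure mode (under NumPy 1.x promotion rules):

import numpy as np

i = np.uint64(5)
print((10 - i).dtype)    # float64: mixing a Python int with np.uint64 promotes to float
a = np.arange(20)
# a[10 - i]              # IndexError: float indices are not allowed
idx = int(10 - i)        # cast back to an integer before indexing
print(a[idx])            # 5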

change assumption of integral timestamps

This assumption is prevalent in check_timestamps, has_duplicates, and so on, and is unnecessary for the most part. Contiguous-segment determination can get messy and tricky without integral timestamps, but it has, for the most part, already been converted to the more general form.
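
A sketch of what the more general form can look like, allowing floating-point timestamps by comparing the diffs against the expected step with a tolerance (names and tolerance handling are illustrative, not jagular's implementation):

import numpy as np

def get_contiguous_segments(ts, step=None, rtol=1e-3):
    """Return (start, stop) bounds of contiguous runs of timestamps."""
    ts = np.asarray(ts, dtype=float)
    if step is None:
        step = np.median(np.diff(ts))             # nominal sampling step
    breaks = np.nonzero(np.abs(np.diff(ts) - step) > rtol * step)[0]
    starts = np.insert(breaks + 1, 0, 0)
    stops = np.append(breaks + 1, len(ts))
    return [(ts[a], ts[b - 1] + step) for a, b in zip(starts, stops)]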

implement: waveform alignment

We plan to use the analytic solution to the peak-fit parabola as the peak location, and we plan to use jag.resample to resample the entire waveform using Fourier methods.
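
The analytic peak of a parabola through the three samples around a local extremum is standard; a minimal sketch (sample-domain only, before any jag.resample refinement):

import numpy as np

def parabolic_peak(y, i):
    """Sub-sample peak location from a parabola fit through
    y[i-1], y[i], y[i+1], where i is the sample-level peak index."""
    denom = y[i - 1] - 2.0 * y[i] + y[i + 1]
    delta = 0.5 * (y[i - 1] - y[i + 1]) / denom        # vertex offset, in (-1, 1)
    peak_value = y[i] - 0.25 * (y[i - 1] - y[i + 1]) * delta
    return i + delta, peak_value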

Write merged SD and rec file data directly to individual channels

In our current workflow, we take a .rec file along with an SD card file to create a merged .rec file. This intermediate step can be problematic for the following reasons:

  1. The merged rec file contains all neural data channels. This is inefficient, especially if we don't record from all channels. The same can be said for the DIOs.
  2. To get single-channel data out, we go from packed data (SD file) to packed data (merged rec file) to individual channel data. It is arguably more time-consuming to do it this way when we could eliminate the intermediate merged file and go directly from packed data in the SD file to individual channel data.

Finally, the timestamps should be in absolute time to allow for pain-free comparison across days (or even within a single day if there were multiple recordings taken in a day). See #22

Absolute time in merge rec

This issue is more relevant for the data format of input files to jagular but is documented here for bookkeeping purposes. A jagular file map will extract the first and last timestamps of the passed-in merged rec files, but this is a problem when the files span multiple days, as the timestamps recorded in .rec files restart every time a new rec file is opened. A solution is to store timestamps as 64-bit absolute time in merged rec files so that the timestamps are unambiguous.

implement: waveform extraction

We need to extract a larger snippet than what we really want, so that we can align the snippets without edge effects.
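
A sketch of the padding idea (the snippet geometry is illustrative): grab extra samples on each side, align, then trim back to the desired window.

n_before, n_after, pad = 10, 22, 8      # illustrative snippet geometry (samples)

def extract_padded_snippet(data, peak_idx):
    """Extract a snippet with `pad` extra samples per side so sub-sample
    alignment (shifting/resampling) does not suffer from edge effects."""
    start = peak_idx - n_before - pad
    stop = peak_idx + n_after + pad
    if start < 0 or stop > len(data):
        return None                     # too close to an epoch boundary; skip
    return data[start:stop]

# after alignment, trim the pad: snippet_aligned[pad:-pad]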

extend filter configuration for fast out-of-core processing

Currently filter epochs are computed internally, and the API is not properly exposed to allow a user to specify in-core or out-of-core, etc.

In addition, it may be desirable to pass in desired epochs (with index=True) directly to the filter object, so that they won't have to be recomputed every time we want to filter. This approach can also make it simpler to customize the epoch calculation to suit the application needs.

Checking if container is sorted

The numpy diff() function returns an array of the same type as its input. This means we have to be careful when checking whether a generic container is sorted if that container has an unsigned type, since the differences wrap around instead of going negative. Various jagular functions currently rely on diff(), including get_contiguous_segments(), and may have to be modified.
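
A quick demonstration of the pitfall, and a diff()-free check:

import numpy as np

a = np.array([3, 1, 2], dtype=np.uint64)   # clearly not sorted
print(np.diff(a))                          # [18446744073709551614  1]: wrapped around
print(np.all(np.diff(a) >= 0))             # True -- the naive check is fooled
print(bool(np.all(a[:-1] <= a[1:])))       # False -- comparison-based check is safe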

Different number of timestamps and channel data samples during raw data extraction

In utils.py, sometimes the raw data extraction fails. The error indicates that the number of timestamps to write to disk is different from the number of samples in the channel data.

For example, for the file install_07-09-2017_0200_0400_sd09_merge.rec, the error message was:

pack expected 2097334 items for packing (got 2097340)

The failed disk write happened on both the dev branch (commit c2424a5) and the master branch (commit 5d88362).

implement: spike detection

MAD-based spike detection.

We should think carefully about lock-out periods, peeling, and adaptive detection.

For adaptive detection, we could use the block size, and do detection on a per-block level. However, we will again need to have some padding to ensure that we don't cut some spikes in half, and that we don't detect the same spikes multiple times.
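
A minimal sketch of the MAD-based threshold (the 0.6745 scale factor assumes Gaussian background noise; recomputing the threshold per block is one way to make detection adaptive):

import numpy as np

def mad_threshold(x, k=5.0):
    """Robust detection threshold: k times the MAD-based noise estimate."""
    sigma = np.median(np.abs(x - np.median(x))) / 0.6745
    return k * sigma

def detect_crossings(x, thresh):
    """Indices where the signal first crosses below -thresh (negative spikes)."""
    below = x < -thresh
    return np.nonzero(below[1:] & ~below[:-1])[0] + 1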

Filtering failure while trying to index

See error and stack trace below:


IndexError Traceback (most recent call last)
<ipython-input-...> in <module>()
4 fs=30000,
5 fl=600,
----> 6 fh=6000)
7 filt_data_ch0 = np.fromfile('/home/jchu/data/install/long-recording/experiments/07-09-2017/filt-ch.00.raw',
8 np.int16)

~/code/jagular/jagular/filtering.py in filtfilt_mmap(timestamps, finname, foutname, fs, fl, fh, gpass, gstop, dtype, ftype, buffer_len, overlap_len, max_len, **kwargs)
109 overlap_len=overlap_len,
110 max_len=max_len,
--> 111 **kwargs)
112 return y
113

~/code/jagular/jagular/filtering.py in filtfilt_within_epochs_mmap(timestamps, finname, foutname, dtype, sos, buffer_len, overlap_len, max_len, filter_epochs, **kwargs)
178 assume_sorted=assume_sorted,
179 step=step,
--> 180 index=True)
181
182 for (start, stop) in filter_epochs:

~/code/jagular/jagular/utils.py in _get_contiguous_segments_fast(data, step, assume_sorted, index, inclusive)
277 starts = np.insert(breaks+1, 0, 0)
278 stops = np.append(breaks, len(data)-1)
--> 279 bdries = np.vstack((data[starts], data[stops] + step)).T
280 if index:
281 if inclusive:

IndexError: arrays used as indices must be of integer (or boolean) type

benchmark: channel writes

How does it scale with # channels written out? Does the unpacking take most of the time, or the writing back to disk?

How about the effect of the block size?
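
A rough way to start answering these questions is to time the pack and write steps separately per block (a sketch with synthetic data; block size and output path are arbitrary):

import time
from struct import Struct

n = 2**22                                      # samples per block (tune this)
values = tuple(i % 32768 for i in range(n))    # synthetic int16 channel data
st = Struct('<%dh' % n)

t0 = time.perf_counter()
packed = st.pack(*values)
t1 = time.perf_counter()
with open('/tmp/ch.raw', 'wb') as f:
    f.write(packed)
t2 = time.perf_counter()
print('pack: %.3f s, write: %.3f s' % (t1 - t0, t2 - t1))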

implement: scriptable interface

Either change the API to make it scriptable, or at least write an example script to do the following:

  • extract all or subset of channels
  • spike detection
    • filter to spike band
    • extract epoch-aware padded snippets
    • align snippets
    • save aligned snippets, as well as aligned spike times
      • peeling!?
  • downsample to 3 kHz (.eeg)
  • downsample to 1 kHz
    • LFP filter (.lfp)
    • ripple filter (.ripple)
  • downsample to 120 Hz
    • theta filter (.theta)

Number of timestamps and neural samples not the same in particular file

@jchutrue please post the code snippet below!

Context: one of the 7/10 .dat files (2200--2400) seems problematic. Using the default block size, I think it "worked", but created lots of small epochs (while using max_gap_size=300), which is unexpected for this file. When we used block_size=2**24, we got an error, saying that the number of expected channel data packets to pack was 0 (which implies an empty new_ts), but that approx 2 million samples were passed for packing.

What's going on?
Is jagular missing some special case? Is the data corrupt? Both?

...

Use `logging` and only warn on incomplete packet read, not incomplete block read

I don't actually think that

print("Read {} complete packets but requested {} packets".format(num_samples_read, block_size))

is an interesting occurrence. This is almost always invisible to the user, since the JagularFileMap gets the remaining packets from the next file. So this is only mildly interesting in the last file, but even then, not really.

What IS interesting to log and/or catch, is if we ever read an incomplete packet, not an incomplete block.

At any rate, we should avoid littering our code with print statements, and should use warnings at worst, or have an optional verbose mode, but logging is highly preferred.
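
For example (a sketch of the proposed change; the function and its incomplete-packet condition are hypothetical):

import logging

logger = logging.getLogger(__name__)

def report_read(num_samples_read, block_size, trailing_bytes):
    # incomplete *block* reads are expected at file boundaries; keep them quiet
    logger.debug("Read %d complete packets but requested %d packets",
                 num_samples_read, block_size)
    # an incomplete *packet*, however, suggests truncation or corruption
    if trailing_bytes:
        logger.warning("Incomplete packet read: %d trailing bytes", trailing_bytes)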

data: multi-file gap data

The timestamps could not be fully sanitized; not clear why yet. This needs to be investigated before this example dataset can be finalized.
