cortex-lab / phy

phy: interactive visualization and manual spike sorting of large-scale ephys data

License: BSD 3-Clause "New" or "Revised" License

Python 95.97% GLSL 3.66% Makefile 0.07% CSS 0.24% Batchfile 0.01% Shell 0.02% HTML 0.03%
data-analysis electrophysiology python

phy's Issues

Probe widget

  • HTML/SVG/d3.js view for a probe
  • Show the layout (channel positions) with discs
  • Equal normalization for x and y axes
  • Display it with _repr_html_() of the MEA class
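The `_repr_html_()` idea above could be sketched as follows. This is a minimal, hypothetical `MEA` class (not phy's actual implementation) that renders the channel positions as SVG discs, with a single scale factor for both axes so the normalization is equal in x and y:

```python
# Sketch of an SVG-based probe layout renderer (hypothetical MEA class;
# the real phy API may differ). Channel positions are drawn as discs,
# with the same scale factor applied to both axes.
class MEA:
    def __init__(self, positions):
        # positions: list of (x, y) channel coordinates
        self.positions = positions

    def _repr_html_(self):
        xs = [p[0] for p in self.positions]
        ys = [p[1] for p in self.positions]
        # Equal normalization: one scale factor for both axes.
        span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
        size = 200
        discs = []
        for x, y in self.positions:
            cx = (x - min(xs)) / span * size + 10
            cy = (y - min(ys)) / span * size + 10
            discs.append('<circle cx="%.1f" cy="%.1f" r="4" />' % (cx, cy))
        return ('<svg width="%d" height="%d">%s</svg>'
                % (size + 20, size + 20, ''.join(discs)))
```

In IPython, an instance would render automatically via its `_repr_html_()`; a d3.js version could generate the same SVG on the client side instead.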

KwikModel

  • In phy.io.kwik.model
  • Derive from BaseModel.
  • Load data from HDF5.
  • Save data in HDF5.
  • No high-performance feature/waveform loading yet, just read from HDF5.

Selector

An object that represents a selection of spikes.

  • Can be instantiated with spike_clusters
  • Selection by specifying a list of spikes or clusters (trait attributes)
  • Support a maximum number of spikes, with automatic subselection when the user selects too many
  • Can be linked with a Reader: when the selection changes, new data may need to be fetched from disk or cache

See the API on the wiki.
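A minimal sketch of the behaviour described above (class and method names are assumptions, not the final phy API): select spikes by cluster, with a regular subselection applied when the selection exceeds the maximum:

```python
import numpy as np

# Sketch of a Selector: selection by cluster, with automatic strided
# subselection when more than `max_n_spikes` spikes would be selected.
# Names are illustrative, not phy's actual API.
class Selector:
    def __init__(self, spike_clusters, max_n_spikes=100):
        self.spike_clusters = np.asarray(spike_clusters)
        self.max_n_spikes = max_n_spikes
        self.selected_spikes = np.array([], dtype=np.int64)

    def select_clusters(self, clusters):
        # All spikes belonging to the requested clusters.
        spikes = np.nonzero(np.isin(self.spike_clusters, clusters))[0]
        if len(spikes) > self.max_n_spikes:
            # Regular (strided) subselection to stay under the limit.
            step = int(np.ceil(len(spikes) / self.max_n_spikes))
            spikes = spikes[::step]
        self.selected_spikes = spikes
        return spikes
```

The linkage with a Reader would hang off the same point: whenever `selected_spikes` changes, the Reader fetches the corresponding rows from disk or cache.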

phy.cluster.manual.color subpackage

  • Facilities to generate distinct colors
  • Generate a random color
  • Generate a color distinct from a given color

(possibly: to be partially merged into VisPy later)
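The two generation facilities could look like this (hypothetical helpers, sketched with the standard library only): drawing colors in HSV space with a random hue but fixed saturation and value tends to keep colors visually comparable, and "distinct from a given color" can be enforced as a minimum hue distance:

```python
import colorsys
import random

# Hypothetical helpers for the proposed phy.cluster.manual.color
# subpackage; the actual API may differ.
def random_color(rng=random):
    # Random hue, fixed saturation/value.
    return colorsys.hsv_to_rgb(rng.random(), 0.7, 0.9)

def distinct_color(color, min_hue_distance=0.2, rng=random):
    """Return a color whose hue differs from `color`'s hue by at least
    `min_hue_distance` (hue wraps around 1)."""
    h0 = colorsys.rgb_to_hsv(*color)[0]
    while True:
        h = rng.random()
        d = abs(h - h0)
        if min(d, 1 - d) >= min_hue_distance:
            return colorsys.hsv_to_rgb(h, 0.7, 0.9)
```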

Common interface for sorting algorithms

Inspired by scikit-learn:

Spike detection

# We launch the spike detection.
# This will automatically use multiple CPUs if
# multiple engines have been launched with IPython.parallel.
# This call is asynchronous: the user can continue to work in the notebook,
# and request the task's status.
phy.spikedetect.run(model, algorithm="spikedetekt", ipp_view=c.load_balanced_view())

# Launch clustering.
phy.cluster.run(model, algorithm="klustakwik2", ipp_view=c.load_balanced_view())

ClusterManager class

A structure that handles:

  • moving clusters into groups
  • changing cluster colors
  • relabelling clusters

Similarity matrix

  • See this.
  • Put in phy.cluster.masked_em._stats.
  • Add many unit tests.

To do later: support sparse structures.

Efficient data structures for the features

Benchmarks need to be done in order to find efficient on-disk formats for the features.

  • Features are used for:
    • Feature View (a subset of the spikes, two features x and y)
    • Split action (find all spikes whose features x and y fall within a given polygon)
    • Similarity matrix (a subset of the spikes, but all feature columns)

Example size (high estimate): a (n_spikes, n_features) numerical matrix with:

  • n_spikes = 100,000,000
  • n_features = 10,000
  • about 20 non-null values per spike (sparse array)
  • float32 data type
  • total size (sparse): ~10 GB
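The ~10 GB figure can be checked with back-of-the-envelope arithmetic (the 4-byte column index is an assumption about a CSR-like layout):

```python
# Sanity check of the sparse size estimate above.
n_spikes = 100_000_000
nnz_per_spike = 20       # non-null values per spike
bytes_value = 4          # float32
bytes_index = 4          # assumed int32 column index (CSR-like layout)

values_gb = n_spikes * nnz_per_spike * bytes_value / 1e9          # 8 GB
total_gb = n_spikes * nnz_per_spike * (bytes_value + bytes_index) / 1e9  # 16 GB
```

That gives 8 GB for the values alone and ~16 GB with int32 indices, so the quoted ~10 GB is the right order of magnitude.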

Access patterns:

  1. View: arbitrary subset of <10,000 rows, 2 arbitrary columns x and y.
  2. Split: arbitrary subset of several tens of thousands of rows, 2 arbitrary columns x and y.
  3. Matrix: regular (strided) subset of ~10,000 rows, all columns.

Possibilities:

  • HDF5 (dense, sparse csr, something else)
  • sqlite
  • flat binary

Notes:

  • Possibility to duplicate the data on disk, using different structures for different access patterns.
  • Possibility to cache up to X GB of data, with X a user option (default 1?); the larger X, the better the performance.
  • We can restrict benchmarks to SSDs.

Improve ClusterView

  • Show selected/unselected
  • Allow multiple selection
  • Requires finding the appropriate HTML controls

Trace viewer

Possible starting point. Based on VisPy.

Features

  • Simple paging system.
  • Load the entire page into GPU memory, no dynamic undersampling (first approach).
  • Load and show the previous and next pages.
  • Pan & zoom.
  • Change channel scaling uniformly.
  • Optional automatic page scrolling with a timer.

Inputs

  • NumPy array (or memmap array) of size (nchannels, nsamples)
  • h5py dataset
  • [Optional] spike trains (spike times, neuron indices, masks) to show the spikes within the traces

Options

  • Color of the channels
  • Page size

First prototype: roadmap

  • KwikExperiment #59
  • Selector class #41
  • ClusterView: display all clusters in an IPython widget (HTML/CSS) #32
  • React to selected clusters (list traitlet attribute in the widget)
  • WaveformView #31
  • Session controller

Data structure for cluster-dependent information

We need an efficient structure for per-cluster data.

  • Based on 1D, 2D, or 3D NumPy arrays
  • Cluster list on 1 axis (e.g. cluster statistics) or 2 axes (e.g. CCGs)
  • Fast cluster indexing
  • Fast update when the cluster assignments change
  • Arbitrary cluster indices
  • Relabelling

We'll probably need a dynamic array implementation on top of NumPy (inspired by this for example). For dual cluster axis (CCGs) we'll need something specific as well.

Ideally, this structure would contain a cluster_map variable with the cluster assignments for all spikes. When this variable changes, the internal arrays are updated.
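The dynamic-array idea could work roughly as follows (an assumption about the eventual design, not phy's implementation): keep a NumPy buffer that grows geometrically, so appends are amortized O(1):

```python
import numpy as np

# Minimal sketch of a dynamic 1D array on top of NumPy. The buffer is
# doubled when full, so repeated appends are amortized O(1). Only the
# first `_n` entries are valid.
class DynamicArray:
    def __init__(self, dtype=np.int64):
        self._buf = np.empty(16, dtype=dtype)
        self._n = 0

    def append(self, values):
        values = np.asarray(values, dtype=self._buf.dtype)
        while self._n + len(values) > len(self._buf):
            # np.resize may repeat data in the tail; harmless since we
            # only ever read the first _n entries.
            self._buf = np.resize(self._buf, 2 * len(self._buf))
        self._buf[self._n:self._n + len(values)] = values
        self._n += len(values)

    @property
    def data(self):
        return self._buf[:self._n]
```

The dual-cluster-axis case (CCGs) would need a 2D variant keyed by cluster pairs.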

cc @nippoo

Basic WaveformView

  • Waveforms positioned with a probe geometry
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Implement traitlets so that selected spikes, cluster colors, and probe geometry can be easily changed through an API

Manual clustering Session object

Implement a user-level class with control actions:

class Session:
    def merge(self, clusters): ...
    def move(self, clusters, group): ...

    def undo(self): ...
    def redo(self): ...

    def start_wizard(self): ...
    def pause_wizard(self): ...
    def reset_wizard(self): ...

This class uses Clustering, ClusterMetadata, and Selection instances, and uses a GlobalHistory to track a unique undo stack with both clustering (merge, split, etc.) and cluster metadata (cluster moved, cluster color changed, etc.) actions.

This class can also update all views through the Selection instance. The different instances communicate with UpdateInfo instances.

Wizard

  • Keep a list of past actions (history):
    • ('move', [2], 0): move cluster 2 to group 0
    • ('merge', [3, 4], [10]): merge clusters 3 and 4 to cluster 10
  • Public methods:
    • next_best()
    • next_candidate()
    • next(): call next_candidate() or next_best() if there's no candidate left
    • merge(clusters, to): called by the Session controller
    • move(clusters, to): called by the Session controller
  • The Wizard keeps a reference to the similarity matrix.
  • What structure for the matrix? (see #43). Idea: defaultdict (cl1, cl2) ==> similarity, default value=0. When the pair doesn't exist, the structure returns 0. We just have to compute the similarity for clusters that have similar channel masks.
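The defaultdict idea above is a one-liner with the standard library: pairs that were never computed read back as 0, so only clusters with similar channel masks need entries.

```python
from collections import defaultdict

# Sparse similarity matrix as a defaultdict: (cl1, cl2) -> similarity,
# defaulting to 0 for pairs that were never computed. Values here are
# made up for illustration.
similarity = defaultdict(float)
similarity[(3, 4)] = 0.82
similarity[(3, 7)] = 0.15

assert similarity[(3, 4)] == 0.82
assert similarity[(5, 9)] == 0.0   # never computed: defaults to 0
```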

Add config toolbox

  • File format: key = value pairs
  • Global (user-wide) options in ~/.phy/config.py
  • Local (dataset-wide) options in ~/.phy/filename/config.py
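Since the config files are named config.py, one plausible mechanism (an assumption, not the final design) is to execute each file in a namespace and merge the resulting key = value pairs, with local options overriding global ones:

```python
import os

# Sketch of a config loader for `key = value` pairs stored in Python
# files (assumed mechanism; the real phy config toolbox may differ).
def load_config(path):
    ns = {}
    if os.path.exists(path):
        with open(path) as f:
            exec(f.read(), {}, ns)
    return {k: v for k, v in ns.items() if not k.startswith('_')}

def merged_config(global_path, local_path):
    config = load_config(global_path)
    config.update(load_config(local_path))  # local overrides global
    return config
```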

Basic FeatureView

  • Just a scatter plot of selected spikes
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Refactor WaveformVisual in a BaseVisual with baking mechanism

Raster plot

Based on VisPy.

Features

  • Optional paging system

Inputs

  • Spike times (seconds)
  • Neuron indices

Options

  • Positions of the neurons
  • Marker shape

Undo stack

  • Start from the original clustering
  • Save a stack of all actions: merge, custom spk->clu mapping (= split), move (only forward actions are needed)
  • Write an efficient function that applies a list of actions
  • The undo/redo stack comes for free
  • We can keep a limit to the history length: we save the complete mapping of the oldest item in the history, and apply further changes on it
  • Benchmark: <50 ms to apply 100 successive changes on a 10M-long vector, if we keep in memory a tuple (spike_changed, cluster) (works for both merge and split; those are actually similar actions)
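Since both merge and split reduce to "assign a new cluster to a set of spikes", the replay function is essentially a loop of fancy-indexed assignments (a sketch under that assumption; names are illustrative):

```python
import numpy as np

# Replay a list of forward actions on a spike->cluster vector.
# Each action is a (spikes_changed, new_cluster) tuple, which covers
# both merges and splits. Undo = replay all actions but the last,
# starting from the original clustering.
def apply_actions(spike_clusters, actions):
    sc = spike_clusters.copy()
    for spikes_changed, new_cluster in actions:
        sc[spikes_changed] = new_cluster
    return sc
```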

Improve Waveform view

  • Use VisPy transforms for box placement
  • Use ST instead of PanZoom (optional)
  • Support sparse waveforms
  • Better management of keyboard shortcuts
  • Add depth
  • Unit-test interactivity to increase coverage
  • More interactivity options

IPython visualization widget with traitlets

Each view for clustering will be an IPython widget exposing specific traitlet attributes:

  • clustering: a Clustering instance
  • selected_spikes: a ndarray of selected spikes (selection used for highlighting or splitting)
  • clusters: a list of selected clusters
  • cluster_order: a string specifying the cluster order (by index, cluster group, size...?)

A base widget will implement those, custom widgets will derive from it.

In the final interface in IPython, we'll link all these traitlets together using IPython's link() function. When a spike selection changes in one widget, it will also change in the others.
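The linking behaviour can be illustrated with a small pure-Python stand-in (the real views would use IPython's traitlets and its link() function; the guard against re-notification is the key detail):

```python
# Pure-Python stand-in for traitlet linking between two views.
# Setting the attribute on one widget propagates it to linked widgets;
# the equality check prevents infinite back-and-forth notification.
class Widget:
    def __init__(self):
        self._selected_spikes = []
        self._linked = []

    @property
    def selected_spikes(self):
        return self._selected_spikes

    @selected_spikes.setter
    def selected_spikes(self, value):
        self._selected_spikes = value
        for other in self._linked:
            if other._selected_spikes != value:
                other.selected_spikes = value

def link(a, b):
    a._linked.append(b)
    b._linked.append(a)

feature_view = Widget()
waveform_view = Widget()
link(feature_view, waveform_view)
feature_view.selected_spikes = [3, 7]
# waveform_view.selected_spikes is now [3, 7] as well
```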

To make this work, we'll need to implement specific traitlet types:

  • ndarray (see this)
  • Clustering

Provisional list of clustering widgets:

  • FeatureView
  • WaveformView
  • TraceView
  • GridView
  • CorrelogramsView
  • SimilarityView

Proper format for logging, error, warn

We need to standardise:

  • the format that error, warning, and log messages should take (tense, capitalisation, line breaks, when each should be used, etc.)
  • a convention for returning early from functions after an error

Add simple HDF5 functions

Create an io/h5.py module implementing a simple HDF5 API (on top of h5py).

with open_h5(filename, 'r') as f:
    data = f.read('/path/to/node')
    value = f.read_attr('/path/to/node', 'myattr')

with open_h5(filename, 'w') as f:
    f.write('/path/to/node', data)
    f.write_attr('/path/to/node', 'myattr', value)

Structures for time data

We need specific data structures to represent temporal data (as in the file format, but for in-memory structures). To be implemented in a dedicated phy.time package.

What are the different types of temporal data?

  • time series
  • continuous data
  • epochs
  • ...?

Structures

We could subclass ndarray to represent temporal data.

Time series

one array + metadata:

  • array of times
  • unit (second, samples with sampling rate, ...)

Continuous data

two arrays:

  • array of times (irregularly sampled data) or sampling rate
  • values

Epochs

one array + metadata:

  • a (2, N) array with start and end
  • unit

Array of time series

A time series plus another array with the indices (e.g. the neuron number for every spike)

Routines

(proposed by Adrien Peyrache)

  • Time series: rate, restrict(interval or other time series)
  • Continuous data: thresholdInterval(value), meanInterval(interval)
  • Epochs: union, intersection, duration, dropShort(ShorterThanThisValue), mergeClose(closerThanThisValue)
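A few of these routines fall out directly from the proposed (2, N) start/end representation. The sketch below is illustrative (function names are assumptions, not phy's API):

```python
import numpy as np

# Sketches of epoch and time-series routines on the (2, N) start/end
# representation proposed above. Illustrative only.
def durations(epochs):
    """epochs: (2, N) array with row 0 = starts, row 1 = ends."""
    return epochs[1] - epochs[0]

def drop_short(epochs, min_duration):
    """Drop epochs shorter than `min_duration`."""
    keep = durations(epochs) >= min_duration
    return epochs[:, keep]

def restrict(times, epochs):
    """Time-series restrict: keep times falling inside any epoch."""
    keep = np.zeros(len(times), dtype=bool)
    for start, end in epochs.T:
        keep |= (times >= start) & (times < end)
    return times[keep]
```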

Ping @kdharris101 @nippoo Adrien.

Find a dynamic layout library in JavaScript

Should offer the same experience as Qt's docking panels (resizable, drag-and-drop, fullscreen widgets).

We should experiment with a few of these libraries and try to implement a prototype (using PNG screenshots of KlustaViewa's views, for example).

ClusterView

  • IPython widget in HTML showing a list of clusters
  • Supporting multiple selection
  • Exposes a traitlet attribute with the list of selected clusters
