cortex-lab / phy

phy: interactive visualization and manual spike sorting of large-scale ephys data

License: BSD 3-Clause "New" or "Revised" License

Python 95.97% GLSL 3.66% Makefile 0.07% CSS 0.24% Batchfile 0.01% Shell 0.02% HTML 0.03%
data-analysis electrophysiology python

phy's Issues

Probe widget

  • HTML/SVG/d3.js view for a probe
  • Show the layout (channel positions) with discs
  • Equal normalization for x and y axes
  • Display it with _repr_html_() of the MEA class
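The `_repr_html_()` idea above could be sketched as follows. This is a minimal, hypothetical `MEA` class (not phy's actual implementation) that renders the channel positions as SVG discs, with a single scale factor for both axes so the normalization is equal in x and y:

```python
# Sketch of an SVG-based probe layout renderer (hypothetical MEA class;
# the real phy API may differ). Channel positions are drawn as discs,
# with the same scale factor applied to both axes.
class MEA:
    def __init__(self, positions):
        # positions: list of (x, y) channel coordinates
        self.positions = positions

    def _repr_html_(self):
        xs = [p[0] for p in self.positions]
        ys = [p[1] for p in self.positions]
        # Equal normalization: one scale factor for both axes.
        span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
        size = 200
        discs = []
        for x, y in self.positions:
            cx = (x - min(xs)) / span * size + 10
            cy = (y - min(ys)) / span * size + 10
            discs.append('<circle cx="%.1f" cy="%.1f" r="4" />' % (cx, cy))
        return ('<svg width="%d" height="%d">%s</svg>'
                % (size + 20, size + 20, ''.join(discs)))
```

In IPython, an instance would render automatically via its `_repr_html_()`; a d3.js version could generate the same SVG on the client side instead.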

KwikModel

  • In phy.io.kwik.model
  • Derive from BaseModel.
  • Load data from HDF5.
  • Save data in HDF5.
  • No high-performance feature/waveform loading yet, just read from HDF5.

Selector

An object that represents a selection of spikes.

  • Can be instantiated with spike_clusters
  • Selection by specifying a list of spikes or clusters (trait attributes)
  • Support a maximum number of spikes, with automatic subselection when the user selects too many
  • Can be linked with a Reader: when the selection changes, new data may need to be fetched from disk or cache

See the API on the wiki.
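A minimal sketch of the behaviour described above (class and method names are assumptions, not the final phy API): select spikes by cluster, with a regular subselection applied when the selection exceeds the maximum:

```python
import numpy as np

# Sketch of a Selector: selection by cluster, with automatic strided
# subselection when more than `max_n_spikes` spikes would be selected.
# Names are illustrative, not phy's actual API.
class Selector:
    def __init__(self, spike_clusters, max_n_spikes=100):
        self.spike_clusters = np.asarray(spike_clusters)
        self.max_n_spikes = max_n_spikes
        self.selected_spikes = np.array([], dtype=np.int64)

    def select_clusters(self, clusters):
        # All spikes belonging to the requested clusters.
        spikes = np.nonzero(np.isin(self.spike_clusters, clusters))[0]
        if len(spikes) > self.max_n_spikes:
            # Regular (strided) subselection to stay under the limit.
            step = int(np.ceil(len(spikes) / self.max_n_spikes))
            spikes = spikes[::step]
        self.selected_spikes = spikes
        return spikes
```

The linkage with a Reader would hang off the same point: whenever `selected_spikes` changes, the Reader fetches the corresponding rows from disk or cache.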

phy.cluster.manual.color subpackage

  • Facilities to generate distinct colors
  • Generate a random color
  • Generate a color distinct from a given color

(possibly: to be partially merged into VisPy later)
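The two generation facilities could look like this (hypothetical helpers, sketched with the standard library only): drawing colors in HSV space with a random hue but fixed saturation and value tends to keep colors visually comparable, and "distinct from a given color" can be enforced as a minimum hue distance:

```python
import colorsys
import random

# Hypothetical helpers for the proposed phy.cluster.manual.color
# subpackage; the actual API may differ.
def random_color(rng=random):
    # Random hue, fixed saturation/value.
    return colorsys.hsv_to_rgb(rng.random(), 0.7, 0.9)

def distinct_color(color, min_hue_distance=0.2, rng=random):
    """Return a color whose hue differs from `color`'s hue by at least
    `min_hue_distance` (hue wraps around 1)."""
    h0 = colorsys.rgb_to_hsv(*color)[0]
    while True:
        h = rng.random()
        d = abs(h - h0)
        if min(d, 1 - d) >= min_hue_distance:
            return colorsys.hsv_to_rgb(h, 0.7, 0.9)
```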

Common interface for sorting algorithms

Inspired by scikit-learn:

Spike detection

# We launch the spike detection.
# This will automatically use multiple CPUs if
# multiple engines have been launched with IPython.parallel.
# This call is asynchronous: the user can continue to work in the notebook,
# and request the task's status.
phy.spikedetect.run(model, algorithm="spikedetekt", ipp_view=c.load_balanced_view())

# Launch clustering.
phy.cluster.run(model, algorithm="klustakwik2", ipp_view=c.load_balanced_view())

ClusterManager class

A structure that handles:

  • moving clusters into groups
  • changing cluster colors
  • relabelling clusters

Similarity matrix

  • See this.
  • Put in phy.cluster.masked_em._stats.
  • Add many unit tests.

To do later: support sparse structures.

Efficient data structures for the features

Benchmarks need to be done in order to find efficient on-disk formats for the features.

  • Features are used for:
    • Feature View (a subset of the spikes, two features x and y)
    • Split action (find all spikes whose features x and y fall within a given polygon)
    • Similarity matrix (a subset of the spikes, but all feature columns)

Example size (high estimate): a (n_spikes, n_features) numerical matrix with:

  • n_spikes = 100,000,000
  • n_features = 10,000
  • about 20 non-null values per spike (sparse array)
  • float32 data type
  • total size (sparse): ~10 GB
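The ~10 GB figure can be checked with back-of-the-envelope arithmetic (the 4-byte column index is an assumption about a CSR-like layout):

```python
# Sanity check of the sparse size estimate above.
n_spikes = 100_000_000
nnz_per_spike = 20       # non-null values per spike
bytes_value = 4          # float32
bytes_index = 4          # assumed int32 column index (CSR-like layout)

values_gb = n_spikes * nnz_per_spike * bytes_value / 1e9          # 8 GB
total_gb = n_spikes * nnz_per_spike * (bytes_value + bytes_index) / 1e9  # 16 GB
```

That gives 8 GB for the values alone and ~16 GB with int32 indices, so the quoted ~10 GB is the right order of magnitude.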

Access patterns:

  1. View: arbitrary subset of <10,000 rows, 2 arbitrary columns x and y.
  2. Split: arbitrary subset of several tens of thousands of rows, 2 arbitrary columns x and y.
  3. Matrix: regular (strided) subset of ~10,000 rows, all columns.

Possibilities:

  • HDF5 (dense, sparse csr, something else)
  • sqlite
  • flat binary

Notes:

  • Possibility to duplicate the data on disk, using different structures for different access patterns.
  • Possibility to cache up to X GB of data, with X a user option (default 1?); the larger X, the better the performance.
  • We can restrict benchmarks to SSDs.

Improve ClusterView

  • Show selected/unselected
  • Allow multiple selection
  • Requires finding the appropriate HTML controls

Trace viewer

Possible starting point. Based on VisPy.

Features

  • Simple paging system.
  • Load the entire page into GPU memory, no dynamic undersampling (first approach).
  • Load and show the previous and next pages.
  • Pan & zoom.
  • Change channel scaling uniformly.
  • Optional automatic page scrolling with a timer.

Inputs

  • NumPy array (or memmap array) of size (nchannels, nsamples)
  • h5py dataset
  • [Optional] spike trains (spike times, neuron indices, masks) to show the spikes within the traces

Options

  • Color of the channels
  • Page size

First prototype: roadmap

  • KwikExperiment #59
  • Selector class #41
  • ClusterView: display all clusters in an IPython widget (HTML/CSS) #32
  • React to selected clusters (list traitlet attribute in the widget)
  • WaveformView #31
  • Session controller

Data structure for cluster-dependent information

We need an efficient structure for per-cluster data.

  • Based on 1D, 2D, or 3D NumPy arrays
  • Cluster list on 1 axis (e.g. cluster statistics) or 2 axes (e.g. CCGs)
  • Fast cluster indexing
  • Fast update when the cluster assignments change
  • Arbitrary cluster indices
  • Relabelling

We'll probably need a dynamic array implementation on top of NumPy (inspired by this for example). For dual cluster axis (CCGs) we'll need something specific as well.

Ideally, this structure would contain a cluster_map variable with the cluster assignments for all spikes. When this variable changes, the internal arrays are updated.
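The dynamic-array idea could work roughly as follows (an assumption about the eventual design, not phy's implementation): keep a NumPy buffer that grows geometrically, so appends are amortized O(1):

```python
import numpy as np

# Minimal sketch of a dynamic 1D array on top of NumPy. The buffer is
# doubled when full, so repeated appends are amortized O(1). Only the
# first `_n` entries are valid.
class DynamicArray:
    def __init__(self, dtype=np.int64):
        self._buf = np.empty(16, dtype=dtype)
        self._n = 0

    def append(self, values):
        values = np.asarray(values, dtype=self._buf.dtype)
        while self._n + len(values) > len(self._buf):
            # np.resize may repeat data in the tail; harmless since we
            # only ever read the first _n entries.
            self._buf = np.resize(self._buf, 2 * len(self._buf))
        self._buf[self._n:self._n + len(values)] = values
        self._n += len(values)

    @property
    def data(self):
        return self._buf[:self._n]
```

The dual-cluster-axis case (CCGs) would need a 2D variant keyed by cluster pairs.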

cc @nippoo

Basic WaveformView

  • Waveforms positioned with a probe geometry
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Implement traitlets so that selected spikes, cluster colors, and probe geometry can be easily changed through an API

Manual clustering Session object

Implement a user-level class with control actions:

class Session:
    def merge(self, clusters): ...
    def move(self, clusters, group): ...

    def undo(self): ...
    def redo(self): ...

    def start_wizard(self): ...
    def pause_wizard(self): ...
    def reset_wizard(self): ...

This class uses Clustering, ClusterMetadata, and Selection instances, and uses a GlobalHistory to track a unique undo stack with both clustering (merge, split, etc.) and cluster metadata (cluster moved, cluster color changed, etc.) actions.

This class can also update all views through the Selection instance. The different instances communicate with UpdateInfo instances.

Wizard

  • Keep a list of past actions (history):
    • ('move', [2], 0): move cluster 2 to group 0
    • ('merge', [3, 4], [10]): merge clusters 3 and 4 to cluster 10
  • Public methods:
    • next_best()
    • next_candidate()
    • next(): call next_candidate() or next_best() if there's no candidate left
    • merge(clusters, to): called by the Session controller
    • move(clusters, to): called by the Session controller
  • The Wizard keeps a reference to the similarity matrix.
  • What structure for the matrix? (see #43). Idea: defaultdict (cl1, cl2) ==> similarity, default value=0. When the pair doesn't exist, the structure returns 0. We just have to compute the similarity for clusters that have similar channel masks.
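The defaultdict idea above is a one-liner with the standard library: pairs that were never computed read back as 0, so only clusters with similar channel masks need entries.

```python
from collections import defaultdict

# Sparse similarity matrix as a defaultdict: (cl1, cl2) -> similarity,
# defaulting to 0 for pairs that were never computed. Values here are
# made up for illustration.
similarity = defaultdict(float)
similarity[(3, 4)] = 0.82
similarity[(3, 7)] = 0.15

assert similarity[(3, 4)] == 0.82
assert similarity[(5, 9)] == 0.0   # never computed: defaults to 0
```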

Add config toolbox

  • File format: key = value pairs
  • Global (user-wide) options in ~/.phy/config.py
  • Local (dataset-wide) options in ~/.phy/filename/config.py
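Since the config files are named config.py, one plausible mechanism (an assumption, not the final design) is to execute each file in a namespace and merge the resulting key = value pairs, with local options overriding global ones:

```python
import os

# Sketch of a config loader for `key = value` pairs stored in Python
# files (assumed mechanism; the real phy config toolbox may differ).
def load_config(path):
    ns = {}
    if os.path.exists(path):
        with open(path) as f:
            exec(f.read(), {}, ns)
    return {k: v for k, v in ns.items() if not k.startswith('_')}

def merged_config(global_path, local_path):
    config = load_config(global_path)
    config.update(load_config(local_path))  # local overrides global
    return config
```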

Basic FeatureView

  • Just a scatter plot of selected spikes
  • Subset of all spikes from a list of given clusters
  • Point colors as a function of the cluster
  • Refactor WaveformVisual in a BaseVisual with baking mechanism

Raster plot

Based on VisPy.

Features

  • Optional paging system

Inputs

  • Spike times (seconds)
  • Neuron indices

Options

  • Positions of the neurons
  • Marker shape

Undo stack

  • Start from the original clustering
  • Save a stack of all actions: merge, custom spk->clu mapping (= split), move (only forward actions are needed)
  • Write an efficient function that applies a list of actions
  • The undo/redo stack comes for free
  • We can keep a limit to the history length: we save the complete mapping of the oldest item in the history, and apply further changes on it
  • Benchmark: <50 ms to apply 100 successive changes on a 10M-long vector, if we keep in memory a tuple (spike_changed, cluster) (works for both merge and split; those are actually similar actions)
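Since both merge and split reduce to "assign a new cluster to a set of spikes", the replay function is essentially a loop of fancy-indexed assignments (a sketch under that assumption; names are illustrative):

```python
import numpy as np

# Replay a list of forward actions on a spike->cluster vector.
# Each action is a (spikes_changed, new_cluster) tuple, which covers
# both merges and splits. Undo = replay all actions but the last,
# starting from the original clustering.
def apply_actions(spike_clusters, actions):
    sc = spike_clusters.copy()
    for spikes_changed, new_cluster in actions:
        sc[spikes_changed] = new_cluster
    return sc
```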

Improve Waveform view

  • Use VisPy transforms for box placement
  • Use ST instead of PanZoom (optional)
  • Support sparse waveforms
  • Better management of keyboard shortcuts
  • Add depth
  • Unit-test interactivity to increase coverage
  • More interactivity options

IPython visualization widget with traitlets

Each view for clustering will be an IPython widget exposing specific traitlet attributes:

  • clustering: a Clustering instance
  • selected_spikes: a ndarray of selected spikes (selection used for highlighting or splitting)
  • clusters: a list of selected clusters
  • cluster_order: a string specifying the cluster order (by index, cluster group, size...?)

A base widget will implement those, custom widgets will derive from it.

In the final interface in IPython, we'll link all these traitlets together using IPython's link() function. When a spike selection changes in one widget, it will also change in the others.
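The linking behaviour can be illustrated with a small pure-Python stand-in (the real views would use IPython's traitlets and its link() function; the guard against re-notification is the key detail):

```python
# Pure-Python stand-in for traitlet linking between two views.
# Setting the attribute on one widget propagates it to linked widgets;
# the equality check prevents infinite back-and-forth notification.
class Widget:
    def __init__(self):
        self._selected_spikes = []
        self._linked = []

    @property
    def selected_spikes(self):
        return self._selected_spikes

    @selected_spikes.setter
    def selected_spikes(self, value):
        self._selected_spikes = value
        for other in self._linked:
            if other._selected_spikes != value:
                other.selected_spikes = value

def link(a, b):
    a._linked.append(b)
    b._linked.append(a)

feature_view = Widget()
waveform_view = Widget()
link(feature_view, waveform_view)
feature_view.selected_spikes = [3, 7]
# waveform_view.selected_spikes is now [3, 7] as well
```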

To make this work, we'll need to implement specific traitlet types:

  • ndarray (see this)
  • Clustering

Provisional list of clustering widgets:

  • FeatureView
  • WaveformView
  • TraceView
  • GridView
  • CorrelogramsView
  • SimilarityView

Proper format for logging, error, warn

We need to standardise:

  • the format that error, warning, and log messages should take (tense, capitalisation, line breaks, when each should be used, etc.)
  • a convention for returning early from functions after an error

Add simple HDF5 functions

Create an io/h5.py module implementing a simple HDF5 API (on top of h5py).

with open_h5(filename, 'r') as f:
    data = f.read('/path/to/node')
    value = f.read_attr('/path/to/node', 'myattr')

with open_h5(filename, 'w') as f:
    f.write('/path/to/node', data)
    f.write_attr('/path/to/node', 'myattr', value)

Structures for time data

We need specific data structures to represent temporal data (as in the file format, but for in-memory structures). To be implemented in a dedicated phy.time package.

What are the different types of temporal data?

  • time series
  • continuous data
  • epochs
  • ...?

Structures

We could subclass ndarray to represent temporal data.

Time series

one array + metadata:

  • array of times
  • unit (second, samples with sampling rate, ...)

Continuous data

two arrays:

  • array of times (irregularly sampled data) or sampling rate
  • values

Epochs

one array + metadata:

  • a (2, N) array with start and end
  • unit

Array of time series

A time series plus another array with the indices (e.g. the neuron number for every spike)

Routines

(proposed by Adrien Peyrache)

  • Time series: rate, restrict(interval or other time series)
  • Continuous data: thresholdInterval(value), meanInterval(interval)
  • Epochs: union, intersection, duration, dropShort(ShorterThanThisValue), mergeClose(closerThanThisValue)
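A few of these routines fall out directly from the proposed (2, N) start/end representation. The sketch below is illustrative (function names are assumptions, not phy's API):

```python
import numpy as np

# Sketches of epoch and time-series routines on the (2, N) start/end
# representation proposed above. Illustrative only.
def durations(epochs):
    """epochs: (2, N) array with row 0 = starts, row 1 = ends."""
    return epochs[1] - epochs[0]

def drop_short(epochs, min_duration):
    """Drop epochs shorter than `min_duration`."""
    keep = durations(epochs) >= min_duration
    return epochs[:, keep]

def restrict(times, epochs):
    """Time-series restrict: keep times falling inside any epoch."""
    keep = np.zeros(len(times), dtype=bool)
    for start, end in epochs.T:
        keep |= (times >= start) & (times < end)
    return times[keep]
```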

Ping @kdharris101 @nippoo Adrien.

Find a dynamic layout library in JavaScript

Should offer the same experience as Qt's docking panels (resizable, drag-and-drop, fullscreen widgets).

We should experiment with a few of these libraries and try to implement a prototype (using PNG screenshots of KlustaViewa's views, for example).

ClusterView

  • IPython widget in HTML showing a list of clusters
  • Supporting multiple selection
  • Exposes a traitlet attribute with the list of selected clusters
