spikefinder-datasets

This README describes how to load the datasets for the spikefinder analysis benchmarking challenge. You probably received this document when downloading a dataset. Visit the spikefinder repository for more information on the challenge.

Training datasets are provided with ground truth in CSV format. There are five datasets, numbered 1-5. For each one there is a calcium file with calcium fluorescence signals and a spikes file with spike rates, both sampled at a common rate of 100 Hz. The columns of each table are neurons and the rows are time points. Within a given dataset, some neurons have slightly different numbers of time points than others; this is expected.
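
A minimal sketch of loading one dataset in Python with pandas. The file names and the column label "0" below are placeholders for the files in your download, and the NaN padding is an assumption about how shorter traces appear when columns have different lengths:

import pandas as pd

# columns = neurons, rows = time points sampled at 100 Hz
calcium = pd.read_csv("1.train.calcium.csv")
spikes = pd.read_csv("1.train.spikes.csv")

# Shorter traces are padded with empty cells, which pandas reads as NaN,
# so drop the padding when pulling out a single neuron.
trace = calcium["0"].dropna().to_numpy()
rate = spikes["0"].dropna().to_numpy()
print(trace.shape, rate.shape)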

Along with the data itself, each download includes example loading scripts in Python and MATLAB, the source code of which is in this repository.

To contribute example loading scripts for other languages, just submit a pull request! If there are problems with the loading scripts, create a GitHub issue.

spikefinder-datasets's Issues

data format

Opening a discussion for how to format both the input data and the results / submissions.

According to @philippberens, the raw data will be both calcium fluorescence and spike rates, sampled to 100 Hz.

formatting the raw data

The raw data are basically just time series: continuous-valued (for fluorescence) and possibly sparse (for spike rates). The key thing here is that the format should be generic and easy to load in multiple environments. I kinda prefer CSV files for simplicity, so long as they don't get too large. Then, for each dataset, we provide either a single CSV file or two CSV files, depending on whether it's training or testing. And we include example scripts in this repo to load the data in Python, MATLAB, and any other language.
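
As a concrete sketch of that layout (assuming, as in the README, that columns are neurons and rows are time points, with shorter traces padded by empty cells), the standard library csv module is enough to write such a file; the traces below are made up for illustration:

import csv

# Hypothetical per-neuron traces of slightly different lengths.
traces = {"0": [0.1, 0.3, 0.2], "1": [0.0, 0.5, 0.4, 0.6]}

n_rows = max(len(t) for t in traces.values())
with open("example.calcium.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(traces.keys())  # header row: one column per neuron
    for i in range(n_rows):
        # pad shorter traces with empty cells
        writer.writerow([t[i] if i < len(t) else "" for t in traces.values()])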

training / testing

How many datasets / neurons do we have? If it's fewer than 10-20, it might be easiest to just treat each neuron as a separate "dataset", and pair them up so we have e.g. 00.00 and 00.00.test, then 00.01, 01.00, etc., where the first number is the source lab and the second number is the neuron.

formatting the results

Using JSON here is useful because it can easily be read and written in multiple environments (for comparison to ground truth), and is easily handled for web submissions. It's been successful so far in neurofinder for representing spatial regions.

The results are likely to be sparse in time, so one option would be a structure like this:

[
  {
    "dataset": "00.00.test",
    "time": [0, 10, 14, 100, ...],
    "rate": [1, 2, 1, 1.5, ...]
  },
...
]

For each dataset we basically have a sparse array, storing the times of all detected events and the corresponding numerical values. For algorithms that return binary events, we could assume that if no rate is specified, all values are 1.
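
A minimal sketch of writing and reading that structure in Python (the dataset names and event times are made up; the default of 1 for missing rates follows the convention suggested above):

import json

# Hypothetical detected events for two test datasets.
results = [
    {"dataset": "00.00.test", "time": [0, 10, 14, 100], "rate": [1, 2, 1, 1.5]},
    {"dataset": "00.01.test", "time": [3, 42]},  # binary events: no "rate" field
]

with open("results.json", "w") as f:
    json.dump(results, f)

# Read a submission back and expand each entry into a dense rate vector.
with open("results.json") as f:
    for entry in json.load(f):
        times = entry["time"]
        rates = entry.get("rate", [1] * len(times))  # assume rate 1 when unspecified
        dense = [0.0] * (max(times) + 1)
        for t, r in zip(times, rates):
            dense[t] = r
        print(entry["dataset"], "has", len(dense), "time points")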
