neurofinder's Introduction

neurofinder (join the chat at https://gitter.im/codeneuro/neurofinder)

benchmarking challenge for finding neurons in calcium imaging data.

Calcium imaging is a widely used technique in modern neuroscience for measuring the activity of large populations of neurons. Identifying individual neurons in these images remains a challenge, and most approaches still rely on manual inspection or annotation. We have assembled a collection of datasets with ground truth labels, and made a small web app for researchers to submit results and compare algorithms.

This repo contains the code for the web app (for displaying and submitting results) and the server (for retrieving and updating results from a database). This document describes how to download the data, develop algorithms in your favorite computing environment, and submit your results for evaluation!

For more info, check out the related codeneuro repositories.

step (1) download and develop

  1. Browse the list of datasets below.
  2. Download one or more of them.
  3. Use the example scripts to learn how to load the data (examples in python, javascript, and matlab).
  4. Develop and refine your algorithm.

During development, you might want to use the neurofinder python module to evaluate the performance of your algorithm on the training data. It computes the same metrics that will be used to evaluate your submission. If you are working in another language, you can look at that repository for a full explanation of the metrics and see the source code (it's pretty simple!).
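For example, here is a minimal python sketch of that evaluation; the load, centers, and shapes helpers (and the file names) are assumptions based on the module's documentation, so check its README for the exact API:

import neurofinder

# load the ground truth labels and your algorithm's output
# (file names here are just examples)
truth = neurofinder.load('regions/regions.json')
guess = neurofinder.load('my-regions.json')

# center matching gives recall and precision,
# pixel overlap gives inclusion and exclusion
recall, precision = neurofinder.centers(truth, guess)
inclusion, exclusion = neurofinder.shapes(truth, guess)

print('recall %.2f, precision %.2f' % (recall, precision))
print('inclusion %.2f, exclusion %.2f' % (inclusion, exclusion))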

step (2) submit your algorithm

  1. Run your algorithm on all the testing datasets.
  2. Go to the neurofinder website.
  3. Click the submit tab and upload your results file!

submission format

Your results should be formatted as a single JSON file with the coordinates of all identified neurons for all testing datasets, in the following format:

[
  {
    "dataset": "00.00.test",
    "regions": [{"coordinates": [[x, y], [x, y], ...]}, {"coordinates": [[x, y], [x, y], ...]}, ...]
  },
  {
    "dataset": "00.01.test",
    "regions": [{"coordinates": [[x, y], [x, y], ...]}, {"coordinates": [[x, y], [x, y], ...]}, ...]
  },
  ...
]

If you are working in python, you can generate this file by storing your results in a list of dictionaries and writing it to JSON:

import json

# one entry per testing dataset, each with a list of regions
results = [
  {'dataset': '00.00.test', 'regions': [{'coordinates': [[0, 1], [2, 3]]}]},
  {'dataset': '00.01.test', 'regions': [{'coordinates': [[0, 1], [2, 3]]}]},
]

# write everything to a single JSON file
with open('results.json', 'w') as f:
  json.dump(results, f)

If you are working in matlab, get jsonlab, then generate and save a nested struct array:

% one struct per testing dataset, each with nested region structs
results = [
  struct('dataset', '00.00.test', 'regions', struct('coordinates', [[0, 1]; [2, 3]])),
  struct('dataset', '00.01.test', 'regions', struct('coordinates', [[0, 1]; [2, 3]]))
];

% write everything to a single JSON file
savejson('', results, 'results.json');

If you are working in javascript, just build the array and write it to a file:

// one object per testing dataset, each with a list of regions
var results = [
  {dataset: '00.00.test', regions: [{coordinates: [[0, 1], [2, 3]]}]},
  {dataset: '00.01.test', regions: [{coordinates: [[0, 1], [2, 3]]}]},
]

// write everything to a single JSON file
require('fs').writeFileSync('results.json', JSON.stringify(results))
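Whatever language you use, a quick structural check can catch a malformed file before you upload it. Here is a minimal python sketch, assuming your file is named results.json and covers the nine testing datasets listed below:

import json

# the testing datasets listed in the "testing data" section below
expected = {
    '00.00.test', '00.01.test', '01.00.test', '01.01.test', '02.00.test',
    '02.01.test', '03.00.test', '04.00.test', '04.01.test',
}

with open('results.json') as f:
    results = json.load(f)

# every testing dataset should appear exactly once
names = [entry['dataset'] for entry in results]
assert sorted(names) == sorted(expected), 'missing or unexpected datasets'

# every region should be a list of two-element coordinate pairs
for entry in results:
    for region in entry['regions']:
        assert all(len(point) == 2 for point in region['coordinates'])

print('results.json looks well formed')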

datasets

Datasets have been generously provided by the following individuals and labs:

  • Simon Peron, Nicholas Sofroniew, & Karel Svoboda / Janelia Research Campus : 00, 02
  • Adam Packer, Lloyd Russell & Michael Häusser / UCL : 01
  • Jeff Zaremba, Patrick Kaifosh & Attila Losonczy / Columbia : 03
  • Selmaan Chettih, Matthias Minderer & Chris Harvey / Harvard : 04

All datasets are hosted on Amazon S3, and direct links to zipped downloads are below.

Each dataset includes

  • raw image data as a collection of 2D TIFF files that represent a single imaging plane over time
  • example scripts for loading the data in python, javascript, and matlab

The training data additionally includes the coordinates of identified neurons (the "ground truth") as JSON. Depending on the dataset, these ground truth labels are based on a separate anatomical nuclear marker and/or hand annotations from the dataset providers.
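For reference, here is a minimal python sketch of loading a training dataset. The folder layout (an images/ directory with one TIFF per time point and a regions/regions.json file with the labels) is an assumption based on the description above; defer to the example scripts bundled with each download:

import json
from glob import glob

import numpy as np
from PIL import Image  # any TIFF reader will do

# load the imaging frames, one 2D TIFF per time point
files = sorted(glob('neurofinder.00.00/images/*.tiff'))
imgs = np.array([np.array(Image.open(f)) for f in files])
print(imgs.shape)  # (time, height, width)

# load the ground truth regions (training datasets only)
with open('neurofinder.00.00/regions/regions.json') as f:
    regions = json.load(f)

# rasterize one region into a binary mask
# (the coordinate axis order may need swapping)
mask = np.zeros(imgs.shape[1:], dtype=bool)
coords = np.array(regions[0]['coordinates'])
mask[coords[:, 0], coords[:, 1]] = True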

training data

neurofinder.00.00 neurofinder.00.01 neurofinder.00.02 neurofinder.00.03 neurofinder.00.04 neurofinder.00.05 neurofinder.00.06 neurofinder.00.07 neurofinder.00.08 neurofinder.00.09 neurofinder.00.10 neurofinder.00.11 neurofinder.01.00 neurofinder.01.01 neurofinder.02.00 neurofinder.02.01 neurofinder.03.00 neurofinder.04.00 neurofinder.04.01

testing data

neurofinder.00.00.test neurofinder.00.01.test neurofinder.01.00.test neurofinder.01.01.test neurofinder.02.00.test neurofinder.02.01.test neurofinder.03.00.test neurofinder.04.00.test neurofinder.04.01.test

web app

To run the web app in development, clone this repo, then call

npm install
npm start

This will start the server and live-bundle the static assets.

To run in production, bundle the static assets using npm run build, then start the server using npm run serve.

You need to specify a mongo database inside server/config.js, and also set the environment variables MONGO_USER and MONGO_PASSWORD. A script for fetching datasets is included in server/fetch.js; it additionally requires the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, because the data is fetched from S3.

neurofinder's People

Contributors

freeman-lab, gitter-badger, mathisonian, maxwellrebo

neurofinder's Issues

add durations to data section

We should add something about the data durations and frame rates to the "download data" section of the main page, to make it a little easier to find at a glance.

fix links

Our handling for username links and repo links needs some work. For example, passing a string @name yields a GitHub link and [email protected] generates a mailto link, but a bare string creates an internal link, which is no good. Probably bare links should default to GitHub usernames.

For code repositories, we currently use full URIs directly, which is fine, but we need to parse partial links, e.g. github.com/..., as GitHub links.

Thanks to @jwittenbach for revealing these bugs!

The tested algorithms have not been developed for 1-2 minute long recordings

So why test them as such?

For the code I developed (Suite2P), several options would have to be altered to give best performance on such short data, which otherwise would have been robust on 20+ minute datasets.

To give a simple example, Suite2P works on a pre-determined number of PCs, which has a denoising effect. For 20+ minute datasets, just fixing this number to 1000 is fine. For much shorter datasets, I would set it to 100-200, otherwise the denoising advantage is lost.

Not sure how best to "fix" this issue, but I have suggested subsampled data in the past. Given the long time scale of the indicators, you don't get many independent samples if you record at 30 Hz, and the data could be subsampled at 3 Hz for the purpose of cell detection.
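As an illustration of the subsampling idea, here is a minimal python sketch that bins a movie from 30 Hz down to 3 Hz by averaging non-overlapping groups of 10 frames, assuming the frames are stacked in a (time, height, width) numpy array:

import numpy as np

def subsample(movie, factor=10):
    # average non-overlapping groups of `factor` frames, e.g. 30 Hz -> 3 Hz
    t = (movie.shape[0] // factor) * factor
    return movie[:t].reshape(-1, factor, *movie.shape[1:]).mean(axis=1)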

show mean images

For each submission, we should show a mean image when mousing over a dataset, ideally with the identified regions overlaid. We can use the space to the left of each table.

Add Data Explanation

Add a clear explanation of the test data preferences: the 00 series includes inactive neurons, while 01-04 prefer active neurons.

Add definitions of recall/precision/inclusion/exclusion, as well as predictions for the above differences in the data: for algorithms that prefer active neurons, the best results are expected on 01-04, with low recall but high precision on 00; for algorithms that prefer inactive neurons, the best results are expected on 00, with high recall but low precision on 01-04.

This will hopefully encourage labs to submit their algorithms even if they are not the most successful, because no algorithm is ideal across all of the datasets provided. Additionally, enable labs to post an explanation of their results so they can make themselves look good (and make sense of their results).

registration artifacts in 00 series due to line-by-line registration

Based on my previous correspondence with Jeremy, I think the 00 datasets have been registered with a line-by-line algorithm. Is it possible to redo this, please?

It did not work very well. There are horizontal break points at specific Y positions in the image. Check the top 100 SVD components to see this. Not the very top ones, but everything after ~5 SVDs has horizontal artifacts. This happens for all datasets in the 00 series, and I get lots of ROIs that are just horizontal lines. I can still see the ROIs on top of these horizontal lines, but it's not ideal.

"ground truth" discussion

There's been lots of discussion of the "ground truth" labels currently used for NeuroFinder, so we wanted to consolidate that discussion in one place, and get feedback on some new ideas for moving forward.

current concerns

The labels used now reflect a mix of approaches, including activity-independent nuclear labeling, hand labeling using the raw data, hand labeling using various summary statistics, and hand curation of semi-automated methods.

All have advantages and disadvantages, but the inconsistency has been a source of confusion for both algorithm developers and those trying to interpret the results (see for example #15 and #16). A particular concern is that the variability in performance across algorithms reflects not only differences in algorithms but also differences in how ground truth is defined.

moving forward

Ideally, we should have a ground truth definition that (1) can be arrived at by following a clearly specified procedure, (2) would yield similar answers if multiple people followed those instructions, and (3) is applied consistently to all training and testing datasets.

Here's one proposal:

  1. Provide each of several independent labelers (at least 3-5) with a mean and local correlation image
  2. Also provide several examples of what individual neurons look like to the labelers
  3. Have them label all datasets and aggregate the results via some consensus procedure

What do people think of this idea? Or other ideas?

cc @marius10p @agiovann @epnev @Selmaan @aaronkerlin @sofroniewn @svoboda314 @boazmohar @syncrostone

The dataset can't be downloaded correctly.

When I click the buttons for neurofinder.04.00.test and neurofinder.04.01.test, the dataset I download from the link is neurofinder.03.00.test.
Could you tell me how to download the true datasets for neurofinder.04.00.test and neurofinder.04.01.test? Sincerely, thank you.

Website gives connection error

From a Linux desktop running CentOS, most of the time I receive a connection error:
[screenshot of the connection error]

Sometimes the page seems to load, but I suspect that is a browser artifact serving the cached copy. Submitting results fails with the error "upload failed".

what are the performance measures?

I understand in principle what the metrics mean, but are there some precise definitions somewhere, perhaps a script that can be run on the provided training datasets? How are cells matched?

04.00 erroneous/inconsistent labels

Hi all,

It looks like the labels on 04.00 might be incorrect in some places. I'm not an expert with this kind of data (more of a CS background), but there seem to be many instances of non-neurons that look similar to neurons from other datasets but are not marked in this dataset. I've marked some examples from the thumbnail.png file below. Moreover, the submissions for 04.00.test seem to have very low precision even among the top solutions. Maybe the held-out dataset has similar problems? It seems that @mjlm is the original contributor for this dataset. I appreciate any feedback that can be offered.

Here is a video of the 04.00 dataset for easy reference: https://youtu.be/z-IMtnw8gfs
and the 04.01 dataset: https://youtu.be/eg1Gs1a4aUg

[annotated thumbnail.png with example regions marked]

ignore case on search

Searching for names / algorithms is currently case-sensitive. We should probably just ignore case during search. We might also want to normalize names and algorithms to all lowercase during submission, as mixed case looks slightly awkward:

[screenshot, 2016-03-24]

color themes

Consider switching the green color map to something else, as suggested by @mathisonian.

This is the current look, for reference

[screenshot of the current look, 2016-03-24]

full-length datasets

Current datasets are way too short and not at all representative of real use scenarios. Many more cells will be detected from a typical 1-2 hour recording than from the length of time provided here. These wouldn't have to be downloadable; perhaps they could be made available only for running algorithms on remotely?

Citation?

How would you like this repo to be cited? Who are the authors?

Here is a quick BibTeX citation placeholder I wrote.

@misc{neurofinder2016,
  title        = {neurofinder: benchmarking challenge for finding neurons in calcium imaging data},
  author       = {Peron, Simon and Sofroniew, Nicholas and Svoboda, Karel and Packer, Adam and Russell, Lloyd and Häusser, Michael and Zaremba, Jeff and Kaifosh, Patrick and Losonczy, Attila and Chettih, Selmaan and Minderer, Matthias and Harvey, Chris and Rebo, Maxwell and Conlen, Matthew and Freeman, Jeffrey},
  howpublished = {Available at \url{https://github.com/codeneuro/neurofinder}},
  year         = {2016},
  month        = {March},
  note         = {[Online; accessed 02-January-2024]}
}

Codeneuro website not working

I am writing a survey paper on neuron detection techniques and using Neurofinder as a benchmark. I just became aware that the leaderboard isn't loading anymore.
Is there any way I can access the leaderboard?

Neurofinder website not connecting

Hi all,

The neurofinder website is not responding and I cannot connect to it. Was this done intentionally or is there an error?

Thanks

Neurofinder Monday!

A bunch of us met during a workshop on large-scale imaging at Janelia Research Campus, on Monday November 4th, for a pow wow on the state of Neurofinder and where to take it next. Here are notes on what we discussed, and where we landed.

The following people were present: Darcy Peterka, Andrew Osheroff (@andrewosh), Jason Wittenbach (@jwittenbach), Tim Holy (@timholy), Nicholas Sofroniew (@sofroniewn), Konrad Kording, Adam Packer (@apacker83), Ferran Diego, Eftychios Pnevmatikakis (@epnev), Johannes Friedrich (@j-friedrich), Jeremy Freeman (@freeman-lab)

First we summarized the current state. We agreed that we've assembled a nice initial collection of datasets and evaluation metrics, with the help of many contributors, and we've made the data available in a variety of useful formats (including web access via notebooks, and download via these links).

But we also agreed that the current automated submission and algorithm running system, which requires that algorithms be written in Python for a standardized environment, and submitted via pull requests, has proven a barrier for algorithm developers because many are working in other languages (including Matlab and Julia) and/or find the process too disconnected from their existing workflows.

We discussed two alternatives for moving the project forward:

  1. Continue to provide only the training data for download, but allow people to submit algorithms in their language of choice, which we would still run automatically on the test data. This would hopefully broaden the community, while ensuring that algorithms can actually be run and reproduced. But we'd need to modify the testing framework to support multiple languages and handle complex environment specifications. Of course, we could require that people submit Docker images that we use to run their algorithm, but for most computational neuroscientists writing and building Docker images may be a significant barrier — as much or more so than the current system.
  2. Provide the training data and the (unlabeled) test data for download, and allow people to submit algorithm results on the test data. It was noted that this is more similar to benchmarking setups used e.g. for object recognition. People will still need to include a link to a github repo with their submission, but we won't run their code. In this version, we can't guarantee reproducibility, but it would open Neurofinder to the broadest community possible, and eliminate nearly all barriers to entry: presumably people can run their own code, so to submit they just need to run their code on the test data and submit the results.

After a lively debate, we all favored option 2. But to encourage reproducibility, we can request that users submit Docker images posted to DockerHub, or build Binders with Jupyter notebooks, that reproduce their results. This could be a 👍 next to their submission on the metrics page, and for these submissions we could also run the code to include stats like run time.

Feel free to add comments / ideas / anything I forgot here. Assuming we move ahead with this plan, the next step will be nailing down the format for submissions. We'll make another issue or PR to discuss that.

CC @broxtronix @poolio @mathisonian @logang

have a switch between 1) results as currently displayed, and 2) results ONLY on active cells

Due to the small size of these datasets, and due to the way cells are selected, most of the results we are seeing are really about cells from the mean image, the majority of which do not have activity. I don't care much about these cells, and no one should, because they are overwhelmingly neuropil-contaminated. It is perfectly possible that an algorithm detects a lot of these silent cells and does very well by your metrics, while doing very poorly on the 10% of cells that actually matter: the active cells.

I would suggest labelling every cell in your current ground truth as active or inactive, and then also running all the benchmarks on the active subset only. There could be a switch at the top of the website to flip to "active cells only". The definition of active should definitely subtract off neuropil from each ROI, before quantifying something about the variance of the trace, perhaps relative to very high-frequency content of that trace.
