codeneuro / neurofinder
benchmarking challenge for finding neurons in imaging data
Home Page: http://neurofinder.codeneuro.org
When I click the buttons for neurofinder.04.00.test and neurofinder.04.01.test, the dataset I download from the link is neurofinder.03.00.test.
Could you tell me how to download the correct datasets for neurofinder.04.00.test and neurofinder.04.01.test? Sincerely, thank you.
Our handling of username links and repo links needs some work. For example, passing a string @name yields a GitHub link and [email protected] generates a mailto link, but a bare string creates an internal link, which is no good. Bare strings should probably default to GitHub usernames.
For code repositories, we currently link full URIs directly, which is fine, but we also need to parse partial links, e.g. github.com/..., as GitHub links.
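Something like the following sketch captures the intended rules (a hypothetical helper, not the actual site code; the regexes are illustrative):

```python
import re

def linkify(s):
    """Map a raw string to a link target per the rules above (sketch only)."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", s):
        return "mailto:" + s                           # emails -> mailto links
    if s.startswith("@"):
        return "https://github.com/" + s[1:]           # @name -> GitHub profile
    m = re.match(r"(?:https?://)?github\.com/(\S+)", s)
    if m:
        return "https://github.com/" + m.group(1)      # partial repo links -> full URIs
    if re.match(r"https?://", s):
        return s                                       # other full URIs pass through
    return "https://github.com/" + s                   # bare strings default to usernames
```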
Thanks to @jwittenbach for revealing these bugs!
Consider switching the green color map to something else, as suggested by @mathisonian.
This is the current look, for reference
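If the thumbnails are rendered with matplotlib (an assumption on my part), the swap would be a one-line change; "viridis" is just one perceptually uniform candidate:

```python
import numpy as np
import matplotlib.pyplot as plt

mean_image = np.random.rand(512, 512)   # stand-in for a dataset's mean image

plt.imshow(mean_image, cmap="viridis")  # e.g. instead of the current cmap="Greens"
plt.axis("off")
plt.savefig("thumbnail.png", bbox_inches="tight")
```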
Most datasets are ~1 pixel per micron, but the Losonczy lab ones are ~2 pixels per micron.
A bunch of us met during a workshop on large-scale imaging at Janelia Research Campus, on Monday November 4th, to discuss the state of Neurofinder and where to take it next. Here are notes on what we discussed and where we landed.
The following people were present: Darcy Peterka, Andrew Osheroff (@andrewosh), Jason Wittenbach (@jwittenbach), Tim Holy (@timholy), Nicholas Sofroniew (@sofroniewn), Konrad Kording, Adam Packer (@apacker83), Ferran Diego, Eftychios Pnevmatikakis (@epnev), Johannes Friedrich (@j-friedrich), Jeremy Freeman (@freeman-lab)
First we summarized the current state. We agreed that we've assembled a nice initial collection of datasets and evaluation metrics, with the help of many contributors, and we've made the data available in a variety of useful formats (including web access via notebooks, and download via these links).
But we also agreed that the current automated submission and algorithm running system, which requires that algorithms be written in Python for a standardized environment, and submitted via pull requests, has proven a barrier for algorithm developers because many are working in other languages (including Matlab and Julia) and/or find the process too disconnected from their existing workflows.
We discussed two alternatives for moving the project forward:
After a lively debate, we all favored option 2. But to encourage reproducibility, we can request that users submit Docker images posted to DockerHub, or Binders built with Jupyter notebooks, that reproduce their results. Such submissions could get a 👍 next to their entry on the metrics page, and for these we could also run the code to include stats like run time.
Feel free to add comments / ideas / anything I forgot here. Assuming we move ahead with this plan, the next step will be nailing down the format for submissions. We'll make another issue or PR to discuss that.
We should add something about the data durations and frame rates to the "download data" section of the main page, to make it a little easier to find at a glance.
I understand in principle what the metrics mean, but are there some precise definitions somewhere, perhaps a script that can be run on the provided training datasets? How are cells matched?
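For concreteness, here is my current guess at the procedure, as a sketch: regions are paired greedily by centroid distance, and recall/precision are computed over the matched pairs. The 5-pixel threshold is a guess on my part, and the actual evaluation code may well differ.

```python
import numpy as np

def match_and_score(true_centers, found_centers, threshold=5.0):
    """Greedily pair ground-truth and detected cells by centroid distance.
    A sketch of one plausible matching scheme, not the official definition."""
    true_centers = np.asarray(true_centers, dtype=float)
    found_centers = np.asarray(found_centers, dtype=float)
    unused = list(range(len(found_centers)))
    matches = 0
    for center in true_centers:
        if not unused:
            break
        dists = np.linalg.norm(found_centers[unused] - center, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < threshold:        # close enough -> count as the same cell
            matches += 1
            unused.pop(j)
    recall = matches / len(true_centers)      # fraction of true cells found
    precision = matches / len(found_centers)  # fraction of detections that are real
    return recall, precision
```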
I am writing a survey paper on neuron detection techniques and using Neurofinder as a benchmark. I just became aware that the leaderboard isn't loading anymore.
Is there any way I can access the leaderboard?
There's been lots of discussion of the "ground truth" labels currently used for NeuroFinder, so we wanted to consolidate that discussion in one place, and get feedback on some new ideas for moving forward.
The labels used now reflect a mix of approaches, including activity-independent nuclear labeling, hand labeling using the raw data, hand labeling using various summary statistics, and hand curation of semi-automated methods.
All have advantages and disadvantages, but the inconsistency has been a source of confusion for both algorithm developers and those trying to interpret the results (see for example #15 and #16). A particular concern is that the variability in performance across algorithms reflects not only differences in algorithms but also differences in how ground truth is defined.
Ideally, we should have a ground-truth definition that (1) can be arrived at by following a clearly specified procedure, (2) would yield similar answers if multiple people followed those instructions, and (3) is applied consistently to all training and testing datasets.
Here's one proposal:
What do people think of this idea? Or other ideas?
cc @marius10p @agiovann @epnev @Selmaan @aaronkerlin @sofroniewn @svoboda314 @boazmohar @syncrostone
So why test them as such?
For the code I developed (Suite2P), several options would have to be altered to give the best performance on such short data, whereas the defaults are robust on 20+ minute datasets.
To give a simple example, Suite2P works on a pre-determined number of PCs, which has a denoising effect. For 20+ minute datasets, fixing this number to 1000 is fine. For much shorter datasets, I would set it to 100-200; otherwise the denoising advantage is lost.
I'm not sure how best to "fix" this issue, but I have suggested subsampled data in the past. Given the long time scale of the indicators, you don't get many independent samples when recording at 30Hz, and the data could be subsampled to 3Hz for the purpose of cell detection.
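For concreteness, a minimal subsampling sketch (assuming a frames × height × width movie at 30Hz; averaging rather than decimating keeps the slow indicator signal while suppressing frame-to-frame noise):

```python
import numpy as np

def subsample_movie(movie, factor=10):
    """Average non-overlapping blocks of `factor` frames, e.g. 30Hz -> 3Hz."""
    t = (movie.shape[0] // factor) * factor                # drop trailing partial block
    return movie[:t].reshape(-1, factor, *movie.shape[1:]).mean(axis=1)

movie = np.random.rand(3000, 256, 256).astype(np.float32)  # stand-in for real data
movie_3hz = subsample_movie(movie)                         # shape (300, 256, 256)
```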
Hi all,
The neurofinder website is not responding and I cannot connect to it. Was this done intentionally or is there an error?
Thanks
How would you like this repo to be cited? Who are the authors?
Here is a quick BibTeX citation placeholder I wrote.
@misc{neurofinder2016,
  title        = {neurofinder: benchmarking challenge for finding neurons in calcium imaging data},
  author       = {Peron, Simon and Sofroniew, Nicholas and Svoboda, Karel and Packer, Adam and Russell, Lloyd and Häusser, Michael and Zaremba, Jeff and Kaifosh, Patrick and Losonczy, Attila and Chettih, Selmaan and Minderer, Matthias and Harvey, Chris and Rebo, Maxwell and Conlen, Matthew and Freeman, Jeffrey},
  howpublished = {Available at \url{https://github.com/codeneuro/neurofinder}},
  year         = {2016},
  month        = mar,
  note         = {[Online; accessed 02-January-2024]}
}
Add clear statements of the test data's preferences: 00 includes inactive neurons, while 01-04 prefer active neurons.
Add definitions of recall/precision/inclusion/exclusion (a sketch of the latter two follows below), as well as predictions given the above differences in the data: algorithms that prefer active neurons should do best on 01-04, with low recall but high precision on 00; algorithms that prefer inactive neurons should do best on 00, with high recall but low precision on 01-04.
This will hopefully encourage labs to submit their algorithms even when they are not the most successful, because no algorithm is ideal across all of the provided datasets. Additionally, labs should be able to post an explanation of their results, so they can present their work well (and make sense of their results).
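As a starting point for those definitions, here is my understanding of inclusion/exclusion as pixel overlap between matched region pairs (a sketch only; it should be checked against the actual evaluation code):

```python
def inclusion_exclusion(true_pixels, found_pixels):
    """Pixel-overlap scores for one matched (true, found) region pair.
    inclusion -- fraction of the true region covered by the detection
    exclusion -- fraction of the detection lying inside the true region
    """
    true_pixels, found_pixels = set(true_pixels), set(found_pixels)
    overlap = len(true_pixels & found_pixels)
    return overlap / len(true_pixels), overlap / len(found_pixels)

# example with (row, col) pixel coordinates: both scores come out to 2/3
inc, exc = inclusion_exclusion([(0, 0), (0, 1), (1, 0)], [(0, 1), (1, 0), (1, 1)])
```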
For each submission, we should show a mean image when mousing over a dataset, ideally with the defined regions overlaid. We can use the space to the left of each table.
Due to the small size of these datasets, and due to the way cells are selected, most of the results we are seeing are really about cells from the mean image, the majority of which have no activity. I don't care much about these cells, and no one should, because they are overwhelmingly neuropil-contaminated. It is perfectly possible for an algorithm to detect a lot of these silent cells and do very well by your metrics, while doing very poorly on the 10% of cells that actually matter: the active cells.
I would suggest labelling every cell in your current ground truth as active or inactive, and then also running all the benchmarks on the active subset only. There could be a switch at the top of the website to flip to "active cells only". The definition of active should definitely subtract off neuropil from each ROI before quantifying something about the variance of the trace, perhaps relative to the very high-frequency content of that trace.
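To make "active" concrete, here is one possible scoring sketch (every constant here is hypothetical): subtract a scaled neuropil trace, estimate high-frequency noise from first differences, and score the trace's variance against that noise floor.

```python
import numpy as np

def activity_score(roi_trace, neuropil_trace, r=0.7):
    """Score how 'active' a cell is; all constants are hypothetical.
    Cells scoring above some threshold would count as active."""
    corrected = roi_trace - r * neuropil_trace              # neuropil subtraction
    noise = np.median(np.abs(np.diff(corrected))) / 0.6745  # robust high-frequency sigma
    return np.var(corrected) / (noise ** 2 + 1e-12)         # variance vs. noise floor
```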
and make them easily accessible from the leaderboard page
What is the license for the example datasets provided on http://neurofinder.codeneuro.org?
I'd like to repackage/redistribute some of this as NWBv2 files for testing some of our tools (https://github.com/OpenSourceBrain/NWBShowcase), and they could provide good examples for testing other NWB applications.
Hi all,
It looks like the labels on 04.00 might be incorrect in some places. I'm not an expert with this kind of data (my background is more CS), but there seem to be many objects that look similar to the neurons labeled in other datasets yet are not marked in this one. I've marked some examples from the thumbnail.png file below. Moreover, the submissions for 04.00.test seem to have very low precision even among the top solutions, so maybe the held-out dataset has similar problems? It seems that @mjlm is the original contributor for this dataset. I appreciate any feedback that can be offered.
Here is a video of the 04.00 dataset for easy reference: https://youtu.be/z-IMtnw8gfs
and the 04.01 dataset: https://youtu.be/eg1Gs1a4aUg
The current datasets are way too short and not at all representative of real use cases. Many more cells will be detected in a typical 1-2 hour recording than in the durations provided here. These longer datasets wouldn't have to be downloadable; perhaps they could be available only for running algorithms on remotely?
Based on my previous correspondence with Jeremy, I think the 00 datasets have been registered with a line-by-line algorithm. Is it possible to redo this, please?
It did not work very well: there are horizontal break points at specific Y positions in the image. Check the top 100 SVD components to see this; not the very top ones, but everything after ~5 components has horizontal artifacts. This happens for all datasets in the 00 series, and I get lots of ROIs that are just horizontal lines. I can still see the real ROIs on top of the horizontal lines, but it's not ideal.
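For anyone who wants to reproduce the check, a sketch (the file path is hypothetical; assumes a frames × height × width array, and note a full SVD of a long movie can be slow):

```python
import numpy as np
import matplotlib.pyplot as plt

movie = np.load("movie.npy")                  # hypothetical path; frames x height x width
T, H, W = movie.shape
flat = movie.reshape(T, H * W) - movie.mean(axis=0).ravel()

# spatial components from an SVD of the (time x pixels) matrix
_, _, vt = np.linalg.svd(flat, full_matrices=False)
for k in (5, 20, 50, 99):                     # artifacts appear after ~5 components
    plt.figure()
    plt.imshow(vt[k].reshape(H, W), cmap="gray")
    plt.title(f"SVD spatial component {k}")
plt.show()
```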