
full-length datasets · neurofinder (open, 7 comments)

codeneuro commented on August 30, 2024
full-length datasets

from neurofinder.

Comments (7)

freeman-lab commented on August 30, 2024

Echoing some comments by @sofroniewn from the gitter chat...

Most of the datasets are ~7 minutes (3000 frames @ 6-7 Hz); the frame rates are in the info.json with each dataset. That's not as long as some data in the literature, but it's also not completely atypical. To best support a diversity of languages and platforms, especially those that require a license, we can't run algorithms remotely, so for now we require people to download the data and run the algorithms themselves; that's the main reason for keeping the sizes reasonable!
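Since each dataset ships its frame rate in info.json, a minimal sketch of reading it in Python (the key name "rate-hz" is an assumption for illustration; inspect the actual info.json in each dataset for the real field name):

```python
import json

def read_frame_rate(info_path):
    """Return the acquisition rate (Hz) from a dataset's info.json.

    The "rate-hz" key is an assumed field name for illustration;
    check the info.json shipped with each dataset.
    """
    with open(info_path) as f:
        info = json.load(f)
    return float(info["rate-hz"])
```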

But here's the current lineup, and what we could add:

00
we have 3000 frames @ 7 Hz
we posted everything we have

01
we have 9000-20000 frames @ 30 Hz total
we posted 3000 @ 30 Hz, could post the rest as is, or downsample to ~8 Hz

02 
we have 8000 frames @ 8 Hz total
we posted 3000 @ 8 Hz, could post the rest as is

03 
we have 9000 frames @ 30 Hz total
we posted 3000 @ 30 Hz, could post the rest as is, or downsample to ~8 Hz

04
we posted 3000 frames @ 6.5 Hz
i think that's all we have

I think my vote would be to standardize all of them @ 8 Hz, downsampling where necessary, and post everything we have, though this will increase the data sizes by quite a bit.
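As a sketch of the kind of temporal downsampling being proposed here (averaging groups of consecutive frames, e.g. factor 4 takes 30 Hz to 7.5 Hz; not necessarily the exact method used for the posted data):

```python
import numpy as np

def downsample_time(movie, factor):
    """Average consecutive groups of `factor` frames along the time axis.

    `movie` is a (frames, height, width) array; trailing frames that do
    not fill a complete group are dropped.
    """
    n = (movie.shape[0] // factor) * factor  # drop incomplete trailing group
    grouped = movie[:n].reshape(-1, factor, *movie.shape[1:])
    return grouped.mean(axis=1)
```

Averaging (rather than simply decimating frames) also improves SNR, which matters for dim transients.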


marius10p commented on August 30, 2024

It would be useful to have even a single full-length recording, ideally one that includes one of the existing datasets, so the performance difference can be estimated.


marius10p commented on August 30, 2024

I see. I guess I was thinking of series 01 and 03, which are in fact 1-2 minutes long.

Sounds like a good idea to standardize @ 8 Hz and post everything.

The longest dataset will then be ~10 minutes long. What is the bandwidth limitation? Do you have to pay to host the data?

Maybe also consider adding a single good recording, with many neurons, lasting ~1 hour.


freeman-lab commented on August 30, 2024

Ok, updates are done! Data set durations are now as follows:

00
~3000 frames @ 7 Hz

01
~3000 frames @ 7.5 Hz

02
~8000 frames @ 8 Hz

03
~2500 frames @ 7.5 Hz

04
~3000 frames @ 6.5 Hz

So all are now close to 7-8 Hz, all are at least 7 min, and the longest is 17 min. We've now posted everything we have from the original providers. Will add this info to the website.

My only concern with adding a ~1 hr dataset to the test data is that its size @ 8 Hz could become onerous for some people's machines / some algorithms, and submitting already requires running algorithms across 7 moderately sized datasets. We could downsample a longer one even more, say to 8000 frames @ 2.5 Hz, but then it wouldn't be consistent with the sample rate of the others.
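To put the size concern in numbers, a back-of-envelope sketch (the 512×512, 16-bit frame format is an assumption for illustration, not the actual format of these datasets):

```python
# Frames and raw size for a 1 hr recording at 8 Hz,
# assuming 512x512 pixels at 2 bytes/pixel (illustrative only).
rate_hz = 8
duration_s = 3600
frames = rate_hz * duration_s        # 28800 frames
raw_bytes = frames * 512 * 512 * 2   # ~15.1 GB uncompressed
print(frames, round(raw_bytes / 1e9, 1))
```

Even before compression, a single 1 hr dataset at the standard rate is an order of magnitude larger than the current ~7-17 min datasets.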

That said, always happy to add extra datasets as training data of any size just for people to play with, storage / bandwidth isn't really an issue.


marius10p commented on August 30, 2024

Cool, thanks for expanding the datasets! I will be curious whether it improves the scores.

For someone who comes to the website to see which algorithms might be useful, a single full-length dataset would be invaluable for assessing not just accuracy on a typical recording, but also how fast the algorithms are on realistic recordings. Shouldn't any algorithm be able to run on a 2 hr dataset @ 30 Hz? Most of our data is in that range.


Selmaan commented on August 30, 2024

The frame rate for the Harvey lab datasets is not the same for .00 and .01. The frame rates saved in the info.json file for each submitted dataset should be correct (for .01 it is 3 Hz).

I think having (moderate) diversity in these datasets is a feature, not a bug. To me, the test results so far look like a lot of low-rank structure: much better performance on some datasets than others. It's useful to see this to understand the successes and breakdowns of the algorithms under different conditions (and truth definitions).


marius10p commented on August 30, 2024

Well, I think that the wide range of results across datasets has to do solely with the different ground-truth definitions, and very little with the algorithms. Which isn't great: this benchmark is supposed to test the algorithms, not the annotation method!

