Comments (3)

vitojph commented on July 25, 2024

Hi @mallamanis! I understand your reasons for keeping the annotations away from curious eyes, especially when the competition has just started. Still, I encourage you folks to release them in the near future to foster the evaluation of NLP techniques applied to search engines.

AFAIK, it's quite difficult to find freely available datasets and annotations for fully evaluating information retrieval systems. The TREC collections are one such resource, but your data collection would definitely add a lot of value for a different domain.

Thanks anyway for your effort :-)

from codesearchnet.

hamelsmu commented on July 25, 2024

I am not sure we are going to release them. I'll let my colleagues chime in on that:

@mmjb @hohsiangwu @mallamanis

mallamanis commented on July 25, 2024

Hi,
I am against publicly releasing the annotations at this point. By keeping them "hidden" behind the leaderboard evaluation, we are in less danger of overfitting on the dataset (or of someone "cheating" by looking at the test set). The test set is quite small, and sooner or later solutions will start overfitting to it.

Having said that, I think we should (a) eventually release them (e.g. after a year or so) and/or (b) share them with individuals who have a good reason (e.g. an alternate use case) and who verbally agree not to share the test set further and not to use it for the CodeSearchNet challenge.

Let me know what you think.
