jdc08161063 / nlp-progress

This project forked from sebastianruder/nlp-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

License: MIT License


Tracking Progress in Natural Language Processing

Table of contents

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Wish list

These are tasks and datasets that are still missing.

  • Bilingual dictionary induction
  • Discourse parsing
  • Keyphrase extraction
  • Knowledge base population (KBP)
  • More dialogue tasks
  • Relation extraction
  • Semi-supervised learning
  • Grammatical error correction
  • Word sense disambiguation

Contributing

If you would like to add a new result, you can do so with a pull request (PR). To minimize noise and keep maintenance manageable, results reported in published papers are preferred (indicate the venue of publication in your PR); an exception may be made for influential preprints. The result should include the name of the method, the citation, the score, and a link to the paper, and it should be inserted so that the table stays sorted (best result on top).
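The sorting rule above can be sketched in Python. This is a minimal illustration, not tooling that ships with the repository: the `Result` class and the `add_result` helper are hypothetical names, and the `higher_is_better` flag is an assumption added for metrics such as perplexity, where a lower score is better.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """One table row (hypothetical structure, for illustration only)."""
    model: str
    score: float
    paper: str  # citation, e.g. "Doe et al. (2018)"


def add_result(table: list[Result], new: Result,
               higher_is_better: bool = True) -> list[Result]:
    """Return a new table with `new` added and the best result on top."""
    rows = table + [new]
    # sorted() is stable even with reverse=True, so rows with equal
    # scores keep their existing order and the new row goes below them.
    return sorted(rows, key=lambda r: r.score, reverse=higher_is_better)


# Usage with placeholder entries (not real results):
table = [Result("Model A", 92.0, "Doe et al. (2018)"),
         Result("Model B", 90.5, "Roe et al. (2017)")]
table = add_result(table, Result("Model C", 91.0, "Poe et al. (2019)"))
print([r.model for r in table])  # ['Model A', 'Model C', 'Model B']
```

Returning a new list rather than mutating the input keeps the helper easy to test against the existing table.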

If your pull request contains a new result, please make sure that "new result" appears somewhere in the title of the PR. This way, we can track which tasks are the most active and receive the most attention.

To make reproduction easier, we recommend adding a link to an implementation for each method, if one is available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
| | | | Official |
| | | | Link |
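The Code-column convention can be illustrated with a small row formatter. This is a hedged sketch: `render_row` is a hypothetical helper, not part of the repository, and it assumes the tables are Markdown pipe tables with the columns Model, Score, Paper / Source and Code.

```python
from __future__ import annotations


def render_row(model: str, score: float, paper: str, paper_url: str,
               code_url: str | None = None, official: bool = False) -> str:
    """Format one Markdown table row (hypothetical helper)."""
    if code_url is None:
        code = ""  # no implementation available: leave the cell empty
    elif official:
        code = f"[Official]({code_url})"
    else:
        code = f"[Link]({code_url})"  # unofficial implementation
    return f"| {model} | {score} | [{paper}]({paper_url}) | {code} |"


# Placeholder values, not a real result:
print(render_row("Model A", 91.0, "Doe et al. (2018)",
                 "https://example.org/paper",
                 "https://example.org/code", official=True))
```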

To add a new dataset or task, follow the steps below. Any new dataset should have been used for evaluation in at least one published paper besides the one that introduced it.

  1. Fork the repository.
  2. If your task is completely new, create a new file and link to it in the table of contents above. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
  3. Briefly describe the dataset/task and include relevant references.
  4. Describe the evaluation setting and evaluation metric.
  5. Show what an annotated example of the dataset/task looks like.
  6. Add a download link if available.
  7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset).
  8. Submit your change as a pull request.

| Model | Score | Paper / Source | Code |
| --- | --- | --- | --- |
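Step 7 can be sketched as a small template generator for a new dataset section. This is an illustration under assumptions: `new_dataset_table` is a hypothetical helper, the tables are assumed to be Markdown pipe tables, and the check for at least two results mirrors the requirement stated in step 7.

```python
from __future__ import annotations


def new_dataset_table(metric: str,
                      rows: list[tuple[str, str, str, str]]) -> str:
    """Build the results table for a new dataset/task (hypothetical helper).

    `metric` replaces the generic "Score" header; each row is
    (model, score, paper/source cell, code cell), best result first.
    """
    if len(rows) < 2:
        raise ValueError("fill in at least two results, "
                         "including the state of the art")
    lines = [f"| Model | {metric} | Paper / Source | Code |",
             "| --- | --- | --- | --- |"]
    lines += [f"| {m} | {s} | {p} | {c} |" for m, s, p, c in rows]
    return "\n".join(lines)


# Placeholder rows, not real results:
print(new_dataset_table("F1", [
    ("Model A", "92.0", "[Doe et al. (2018)](https://example.org/a)", ""),
    ("Model B", "90.5", "[Roe et al. (2017)](https://example.org/b)", ""),
]))
```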

Things to do

  • Add a Code column (see above) to each table and a link to the source code for each method.
  • Add pointers on how to retrieve data.
  • Provide more details regarding the evaluation setup of each task.
  • Add an example to every task/dataset.
  • Add statistics to every dataset.
  • Provide a description and details for every task / dataset.
  • Add a table of contents to every file (particularly the large ones).
  • We could potentially use readthedocs to provide a clearer structure.
  • All current datasets in this list are for the English language (except for UD). In a separate section, we could add datasets for other languages.
