
NLP-NL: Dutch Natural Language Processing Datasets

Dutch variants of (English) Natural Language Processing datasets.

⚠️ Warning: Work in progress.

Datasets

WiC-NL [recreated WiC] (Word Sense Disambiguation)

Recreation of Words in Context (WiC) based on DutchSemCor.

WIP

WSC-NL [recreated WSC] (Coreference Resolution)

Recreation of The Winograd Schema Challenge (WSC) based on SemEval-2010 Task 1.

WIP

COPA-NL [translated COPA] (Causal Reasoning)

Translation of Choice of Plausible Alternatives (COPA).

| Split | Source                | Procedure                | English | Dutch |
| ----- | --------------------- | ------------------------ | ------- | ----- |
| train | COPA-dev (first 400)¹ | Google Translate + Human | 400     | 400   |
| dev   | COPA-dev (last 100)¹  | Google Translate + Human | 100     | 100   |
| test  | COPA-test             | Google Translate + Human | 500     | 500   |

¹ These splits are the same as in SuperGLUE.
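The train/dev split in the table can be sketched as a positional slice over the 500 COPA-dev examples. This is only an illustration: the function name `split_copa_dev` and the placeholder integer ids are hypothetical, and the sketch assumes the examples are kept in their original file order.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_copa_dev(examples: List[T], n_train: int = 400) -> Tuple[List[T], List[T]]:
    """Return (train, dev): the first n_train examples and the remainder."""
    return examples[:n_train], examples[n_train:]


# Placeholder ids standing in for the 500 COPA-dev examples (assumption).
copa_dev = list(range(500))
train, dev = split_copa_dev(copa_dev)
print(len(train), len(dev))
```

With 500 input examples this gives the 400/100 sizes listed for the train and dev rows.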

SQuAD-NL [translated SQuAD / XQuAD] (Question Answering)

Translation of The Stanford Question Answering Dataset (SQuAD).

v1.1

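| Split | Source                 | Procedure                | English | Dutch  |
| ----- | ---------------------- | ------------------------ | ------- | ------ |
| train | SQuAD-train-v1.1       | Google Translate         | 87,599  | 87,599 |
| dev   | SQuAD-dev-v1.1 \ XQuAD | Google Translate         | 9,380   | 9,380  |
| test  | SQuAD-dev-v1.1 & XQuAD | Google Translate + Human | 1,190   | 1,183  |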

v2.0

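| Split | Source                 | Procedure                | English | Dutch   |
| ----- | ---------------------- | ------------------------ | ------- | ------- |
| train | SQuAD-train-v2.0       | Google Translate         | 130,319 | 130,319 |
| dev   | SQuAD-dev-v2.0 \ XQuAD | Google Translate         | 10,174  | 10,174  |
| test  | SQuAD-dev-v2.0 & XQuAD | Google Translate + Human | 1,699   | 1,699   |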
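In the Source column, `\` reads as set difference and `&` as intersection: the dev split is the part of SQuAD-dev not covered by XQuAD, and the test split is the part that is. A minimal sketch of that partition follows. It assumes examples can be matched on a shared `id` field, which is not stated in this README, and `partition_dev` and the toy records are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Example = Dict[str, str]


def partition_dev(
    squad_dev: Iterable[Example], xquad_ids: Set[str]
) -> Tuple[List[Example], List[Example]]:
    """Split SQuAD-dev into (dev, test) by membership of each id in XQuAD."""
    dev: List[Example] = []
    test: List[Example] = []
    for example in squad_dev:
        # Examples also present in XQuAD form the test split; the rest stay in dev.
        if example["id"] in xquad_ids:
            test.append(example)
        else:
            dev.append(example)
    return dev, test


# Toy records; real ids and questions would come from the dataset files.
squad_dev = [
    {"id": "a1", "question": "first question"},
    {"id": "a2", "question": "second question"},
    {"id": "a3", "question": "third question"},
]
xquad_ids = {"a2"}

dev, test = partition_dev(squad_dev, xquad_ids)
print([e["id"] for e in dev], [e["id"] for e in test])
```

The two splits are disjoint by construction, and together they cover every SQuAD-dev example.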

Other Similar Datasets (External)

SICK-NL [translated SICK] (Natural Language Inference)

Translation of Sentences Involving Compositional Knowledge (SICK).

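| Split | Source     | Procedure     | English | Dutch |
| ----- | ---------- | ------------- | ------- | ----- |
| train | SICK-train | DeepL + Human | 4,439   | 4,439 |
| dev   | SICK-trial | DeepL + Human | 495     | 495   |
| test  | SICK-test  | DeepL + Human | 4,906   | 4,906 |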

Contributors

wietsedv
