Topic: nlp-datasets

👇 Here are 139 public repositories matching this topic...

aajanki / finnish-nlp-datasets

nlp-datasets,Open Finnish NLP datasets

User: aajanki

Home Page: https://aajanki.github.io/finnish-nlp-datasets/

nlp-datasets finnish opendata

afrisenti-semeval / afrisent-semeval-2023

nlp-datasets,AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/

Organization: afrisenti-semeval

african-languages africanlp low-resolution-data low-resouce-language low-resource-nlp opinion-mining semeval-sentiment sentiment sentiment-analysis sentiment-classification

andythefactory / romanian-nlp-datasets

nlp-datasets,A list of Romanian NLP Datasets

User: andythefactory

nlp nlp-datasets nlp-resources romanian romanian-language nlp-dataset nlp-data

aryashah2k / sasbitathon-winningsolution

nlp-datasets,1st place solution for the SAS | GIM Bitathon, an annual data science hackathon organized by SAS and Goa Institute of Management. The dataset used is a subset of the consumer complaints database provided by www.consumerfinance.gov

User: aryashah2k

data-science nlp-datasets natural-language-processing spacy heroku-deployment python streamlit hackathon-project machine-learning

bohdan-khomtchouk / nero-nlp

nlp-datasets,NERO-nlp is a PyPI package for biomedical Named Entity (Recognition) Ontology

User: bohdan-khomtchouk

Home Page: https://pypi.org/project/NERO-nlp

biomedical-named-entity-recognition natural-language-processing nlp nlp-datasets nlp-library

bothub-it / bothub

nlp-datasets,Bothub is an open platform for predicting, training and sharing NLP datasets in multiple languages

Organization: bothub-it

Home Page: https://bothub.it

ilhasoft nlp python data chatbot push bots sharing-nlp-datasets bothub docker multiple-languages issue-tracker nlp-datasets database webapp

cjiang2 / vdcnn

nlp-datasets,Implementation of Very Deep Convolutional Neural Network for Text Classification

User: cjiang2

convolutional-neural-networks keras keras-tensorflow nlp nlp-datasets tensorflow text-classification vdcnn

cybermatt / russian-names

nlp-datasets,Library for generating Russian names

User: cybermatt

text-processing text-generation nlp-datasets

d0rj / ruslit

nlp-datasets,📚 A small collection of Russian literature 📚

User: d0rj

dataset nlp-datasets nlp russian-literature kaggle

dibyakanti / autotnli-code

nlp-datasets,This repository contains the official code for the paper: Realistic Data Augmentation Framework for Enhancing Tabular Reasoning.

User: dibyakanti

Home Page: https://autotnli.github.io/

nlp inference transformer wikipedia emnlp2022 nli nlp-datasets nlp-machine-learning semi-structured-data tables

divkakwani / webcorpus

nlp-datasets,Generate large textual corpora for almost any language by crawling the web

User: divkakwani

news-crawler nlp multilingual nlp-datasets datasets indic-languages

dkulagin / kartaslov

nlp-datasets,Open linguistic datasets: the КартаСловСент (KartaSlovSent) sentiment lexicon of the Russian language, a semantics dataset, an association graph, and a dataset of spelling errors and typos.

User: dkulagin

nlp-datasets computational-linguistics datasets russian-specific

fido-ai / ua-datasets

nlp-datasets,A collection of datasets for Ukrainian language

Organization: fido-ai

Home Page: https://fido-ai.github.io/ua-datasets/

dataset ukrainian-language nlp text-classification token-classification question-answering nlp-datasets natural-language-processing

gcunhase / amicorpusxml

nlp-datasets,Extracts Transcript and Summary (Abstractive and Extractive) from the AMI Meeting Corpus

User: gcunhase

nlp-datasets meeting-dataset xml-to-story convert-to-cnn-dm-format

gkiril / benchie

nlp-datasets,Comprehensive evaluation framework for Open Information Extraction.

User: gkiril

Home Page: https://aclanthology.org/2022.acl-long.307/

open-information-extraction information-extraction benchmark-framework natural-language-processing natural-language-understanding nlp nlp-datasets dataset

gpt-tester / chatgpt-test-dataset-01

nlp-datasets,a small test dataset for use with OpenAI's ChatGPT

User: gpt-tester

chatgpt chatgpt-api openai openai-api go golang gpt-3 gpt3 python rust rust-lang nlp nlp-datasets nlp-machine-learning nlp-parsing

grammarly / ua-gec

nlp-datasets,UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Organization: grammarly

Home Page: https://ua-gec-dataset.grammarly.ai/

dataset corpus gec grammatical-error-correction ukrainian-language corpus-data corpus-tools natural-language-processing nlp-datasets

guhhhhaa / 4675-scifi

nlp-datasets,Chinese NLP corpus of Chinese science fiction: about 4675 Chinese science fiction novels. Serves as a Chinese science fiction text corpus and text database for natural language processing.

User: guhhhhaa

scifi corpus corpus-data nlp nlp-datasets nlp-machine-learning nlp-resources science-fiction chinese-nlp datasets

guhhhhaa / wula-scifi

nlp-datasets,Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula Science Fiction Website. Serves as a Chinese science fiction text corpus and text database for natural language processing.

User: guhhhhaa

corpus corpus-data nlp nlp-datasets nlp-machine-learning nlp-resources science-fiction scifi chinese-nlp datasets

hellohaptik / multi-task-nlp

nlp-datasets,multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

Organization: hellohaptik

Home Page: https://multi-task-nlp.readthedocs.io/en/latest/

context-awareness entailment intent-classification machine-comprehension multitask-learning named-entity-recognition nli-tasks nlp nlp-apis nlp-datasets nlp-library pytorch ranking sentence-classification sequence-labeling transformers

ink-usc / commongen

nlp-datasets,A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning

Organization: ink-usc

Home Page: http://inklab.usc.edu/CommonGen/

natural-language-processing commonsense-reasoning nlg-dataset natural-language-generation language-generation-dataset machine-reasoning deep-learning text-generation nlp-datasets

ink-usc / riddlesense

nlp-datasets,RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge

Organization: ink-usc

Home Page: https://inklab.usc.edu/RiddleSense/

commonsense riddles natural-language-processing datasets nlp-datasets question-answering

ink-usc / triggerner

nlp-datasets,TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

Organization: ink-usc

Home Page: https://arxiv.org/abs/2004.07493

named-entity-recognition dataset nlp-resources nlp-datasets information-extraction sequence-tagging low-resource

ink-usc / xcsr

nlp-datasets,Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Organization: ink-usc

Home Page: https://inklab.usc.edu/XCSR/

crosslingual-transfer natural-language-understanding commonsense-reasoning multilingual-models nlp-datasets

irfnrdh / awesome-indonesia-nlp

nlp-datasets,Resource NLP & Bahasa

User: irfnrdh

awesome nlp-resources nlp-datasets indonesian-language

jadynhax / scpscraper

nlp-datasets,A Python library designed for scraping data from the SCP wiki.

User: jadynhax

Home Page: https://pypi.org/project/scpscraper/

scp scp-foundation webscraping webscraper python python3 data-collection dataset-generation dataset-creation pypi

jamesohortle / loanwords_gairaigo

nlp-datasets,English loanwords in Japanese

User: jamesohortle

english japanese linguistics linguistics-databases nlp nlp-datasets phonetics

jasonshao55 / chinese_metaphor_explanation

nlp-datasets,An annotated Chinese metaphor dataset

User: jasonshao55

chinese metaphor nlp nlp-datasets

kelvin-jiang / freebaseqa

nlp-datasets,The release of the FreebaseQA data set (NAACL 2019).

User: kelvin-jiang

freebaseqa freebase kb-qa nlp-datasets question-answering naacl

liutiedong / goat

nlp-datasets,a Fine-tuned LLaMA that is Good at Arithmetic Tasks

User: liutiedong

ai llms nlp-datasets

marco-roberti / pytorch-e2e-dataset

nlp-datasets,The E2E Dataset, packed as a PyTorch DataSet subclass

User: marco-roberti

dataset e2e-dataset pytorch-dataset python python3 python-3 python-library pytorch pytorch-nlp pytorch-rnn

matt-seb-ho / wikiwhy

nlp-datasets,WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.

User: matt-seb-ho

artificial-intelligence dataset explainable-ai iclr2023 machine-learning nlp nlp-datasets open-domain-qa question-answering

maxent-ai / datasets

nlp-datasets,datasets with text data for use in NLP, Text analysis, information extraction, ML research.

Organization: maxent-ai

corpus corpus-data data-journalism data-science dataset india journalism machine-learning news nlg-dataset nlp nlp-datasets pandas-dataframe political-science politics pytorch sklearn text-analysis text-classification text-mining

mehrdad-dev / battle-of-the-wordsmiths

nlp-datasets,Official github repository: Battle of the Wordsmiths: Comparing ChatGPT, GPT-4, Claude, and Bard (dataset)

User: mehrdad-dev

bard chatgpt chatgpt-api chatgpt-app claude dataset google gpt gpt-3 gpt-4 gpt4-api language-model large-language-model large-language-models llm llms nlp nlp-datasets openai palm

mihail911 / nlp-library

nlp-datasets,curated collection of papers for the nlp practitioner 📖👩‍🔬

User: mihail911

neural-network dialogue nlp machine-learning neural-machine-translation deep-learning language-model nlp-datasets

minixc / opensubtitles-dataloader

nlp-datasets,Loads OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with pytorch.

User: minixc

dataset python pytorch dataloader nlp nlp-datasets

mnschmit / sherliic

nlp-datasets,A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference

User: mnschmit

nli lexical-inference lexical-semantics nlp nlp-resources nlp-datasets challenge acl2019 acl

mtala3t / identify-the-sentiments-av-nlp-contest

nlp-datasets,This project was submitted as a Python implementation for the Analytics Vidhya contest "Identify the Sentiments". I enjoyed taking part in the competition and its whole process. The submitted solution ranked 118 on the public leaderboard.

User: mtala3t

Home Page: https://datahack.analyticsvidhya.com/contest/linguipedia-codefest-natural-language-processing-1/

nlp-machine-learning nlp-datasets nlp sentiment-analysis sentiment-classification elmo neural-network-python svm-classifier word2vec sentimental-analysis

niger-volta-lti / yoruba-text

nlp-datasets,Yorùbá language training text for NLP, ASR and TTS tasks

Organization: niger-volta-lti

african-languages natural-language-processing diacritization machine-translation training-dataset nlp yoruba tts asr nlp-datasets

osintai / arabic-dictionaries

nlp-datasets,Arabic Dictionaries

User: osintai

arabic nlp-datasets data txt dictionary arabic-dictionaries arabic-lexicons

pzoom522 / histsumm

nlp-datasets,Code and data for "Summarising Historical Text in Modern Languages" (EACL 2021)

User: pzoom522

historical-text ancient-languages cross-lingual-summarization nlp-datasets summariser eacl2021

quincyliang / nlp-public-dataset

nlp-datasets,Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.

User: quincyliang

nlp-datasets machine-learning-dataset

secsilm / zi-dataset

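nlp-datasets,A Chinese character (hanzi) dataset with information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms.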

User: secsilm

nlp chinese-nlp chinese-dataset dataset hanzi nlp-datasets

selimfirat / bilkent-turkish-writings-dataset

nlp-datasets,Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.

User: selimfirat

Home Page: https://stars.bilkent.edu.tr/turkce/

dataset nlp-datasets creative-writing nlp pdf-conversion bilkent-university turkish turkish-language

semiringinc / mueller-report-corpus

nlp-datasets,The Mueller Report Corpus V 0.1

Organization: semiringinc

corpus corpus-linguistics nlp nlp-datasets

trisongz / pylines

nlp-datasets,Simplifying parsing of large jsonline files in NLP Workflows

User: trisongz

Home Page: https://pypi.org/project/pylines/

jsonlines jsonlines-data nlp-datasets json

uma-pi1 / opiec

nlp-datasets,Reading the data from OPIEC - an Open Information Extraction corpus

Organization: uma-pi1

Home Page: https://www.uni-mannheim.de/dws/research/resources/opiec/

open-information-extraction information-extraction corpus corpus-data corpus-tools natural-language-processing natural-language-understanding nlp nlp-resources nlp-datasets

uma-pi1 / opiec-pipeline

nlp-datasets,

Organization: uma-pi1

Home Page: https://www.uni-mannheim.de/dws/research/resources/opiec/

open-information-extraction text-processing corpus-data corpus-tools corpus-linguistics corpus-processing corpus-builder corpus-generator wikipedia wiki

utahnlp / infotabs-code

nlp-datasets,Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.

Organization: utahnlp

Home Page: https://infotabs.github.io/

nlp nlp-datasets nlp-machine-learning acl2020 wikipedia tables semi-structured-data svm roberta transformer

xtea / chinese_medical_words

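nlp-datasets,Manually curated corpus of medical-industry vocabulary and terminology. Can be used to train various NLP models, such as speech recognition and dialogue systems.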

User: xtea

nlp chinese-nlp nlp-datasets medical nlp-data-to-text chinese-word-segmentation
