
bahasa-indo-nlp-dataset's Introduction

Bahasa Indonesia Natural Language Processing (Indo NLP) Resource

A collection of Bahasa Indonesia (Indonesian) Natural Language Processing (NLP) software libraries, dictionaries, and corpora. Pull requests are always welcome.

Bahasa Indonesia NLP Libraries/Services

Natural Language Toolkit

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| bahasa | Pre-alpha development stage NLP toolkit for Bahasa Indonesia | Python | MIT License (MIT) | Sutrisno Efendi |

Sentiment Analysis

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| python-sentianalysis-id | Sentiment Analysis for Bahasa Indonesia | Python | | yasirutomo |

Part of Speech Tagging (POS Tagging)

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| Open NLP | POS tagging with predefined training and test data | Java | | yohanesgultom |

Named Entity Recognition

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| indonesia-ner | Named Entity Recognition for Bahasa Indonesia | Java | MIT License (MIT) | yusufsyaifudin |

Stemmer

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| sastrawi | High quality stemmer library for Indonesian Language (Bahasa) | PHP | MIT License (MIT) | sastrawi |

Word Embedding

| Library | Description | Programming Languages | License | Author & Link |
|---|---|---|---|---|
| indonesian-word-embedding | A web application that demonstrates Indonesian word embedding | Python | | galuhsahid |

Question Answering (Machine Comprehension)

| Service | Description | Language | Author & Link |
|---|---|---|---|
| QA | Question Answering System for Bahasa Indonesia | Java | takin |

Dictionaries / Translation Pairs / Parallel Corpus

| Library | Description | Size | Features | License | Link |
|---|---|---|---|---|---|
| MALINDO_Morph | Morphological dictionary for Malay / Indonesian | | English-Malay, English-Indonesian | CC BY-NC-SA 4.0 TH | english |
| TALPCo | The TUFS Asian Language Parallel Corpus | | Japanese -> Indonesian | Creative Commons Attribution 4.0 International (CC BY 4.0) | matbahasa |

Downloadable Text Corpus

| Library | Description | Size | Features | License | Link |
|---|---|---|---|---|---|
| Indonesian-annotated-conll17 | CoNLL Universal Dependency Parsing | 29.64 GB | Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts, provided for the CoNLL 2017 Shared Task in UD Parsing. | CC BY-NC-SA 4.0 TH | LINDAT / CLARIN |
| ID-OpinionWords | List of Opinion Words (positive/negative) in Bahasa Indonesia for Sentiment Analysis | | | | masdevid |
| freq-dist-id | Most Common Bahasa Words on Twitter, Wikipedia and other sources | | | | ardwort |
| idn-tagged-corpus | Indonesian Manually Tagged Corpus | | | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License | famrashel |
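An opinion-word list such as ID-OpinionWords is typically used for lexicon-based sentiment scoring: count the positive and negative words in a text and compare. The sketch below illustrates the idea only. The two tiny word sets are hypothetical stand-ins written for this example, not the contents or file format of ID-OpinionWords, and the function names are likewise assumptions.

```python
import re

# Hypothetical toy lexicons; a real run would load the full word lists instead.
POSITIVE = {"bagus", "senang", "baik"}
NEGATIVE = {"buruk", "jelek", "kecewa"}


def score(text: str) -> int:
    """Return (# positive words) - (# negative words) in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def label(text: str) -> str:
    """Map the raw score to a coarse sentiment label."""
    value = score(text)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


print(label("Film ini bagus dan saya senang"))
print(label("Pelayanan buruk, saya kecewa"))
```

This simple count ignores negation (for example "tidak bagus") and word forms with affixes, so in practice it would likely be combined with a stemmer and some negation handling.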

WordNet Bahasa

| Library | Description | License | Link |
|---|---|---|---|
| WordNet Bahasa | Wordnet Bahasa, inspired by the Princeton WordNet and the Global WordNet Grid. Large scale, freely available, semantic dictionary | MIT License (MIT) | |

Universal Dependency Treebank Bahasa

| Library | Description | License | Link |
|---|---|---|---|
| UD_Indonesian-GSD | The Indonesian UD is converted from the content head version of the universal dependency treebank v2.0. Query text by genre, domain | CC BY-NC-SA 3.0 US | |
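Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, `#` comment lines, and a blank line between sentences. The following is a minimal reader sketch, assuming a small hand-written sample sentence rather than real treebank data; the function name `parse_conllu` is made up for this example.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in their standard order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        current.append(dict(zip(FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written illustrative sample, not taken from UD_Indonesian-GSD.
SAMPLE = (
    "# text = Saya makan nasi\n"
    "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tnasi\tnasi\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
pos_tags = [(token["form"], token["upos"]) for token in sentences[0]]
print(pos_tags)
```

The sketch keeps every column as a string and does not treat multiword-token ranges or empty nodes specially; a dedicated CoNLL-U library is a better fit for full treebank processing.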

Pre-trained Word Vectors

| Pre-trained Model | Description | Size | Dimensions | License | Link |
|---|---|---|---|---|---|
| fastText | Skip-Gram model trained on Wikipedia using fastText | | 300 | CC BY-SA 3.0 | Facebook + Bin & Text + Text Only |
| word2vec | Indonesian | 402MB | 300 | | Indonesian |
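Pre-trained vectors in the text format used by fastText and word2vec start with a header line giving the vocabulary size and dimension, followed by one line per word with its space-separated components. The sketch below reads that layout and compares words by cosine similarity. It assumes a three-dimensional in-memory sample invented for this example in place of a real 300-dimensional download, and the helper names are hypothetical.

```python
import io
import math
from typing import Dict, List, TextIO


def load_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Read word vectors in the text format: header line, then one word per line."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented toy vectors; real files would be opened with open(path, encoding="utf-8").
SAMPLE = "3 3\nraja 1.0 0.0 0.0\nratu 0.9 0.1 0.0\nnasi 0.0 0.0 1.0\n"

vectors = load_vectors(io.StringIO(SAMPLE))
print(cosine(vectors["raja"], vectors["ratu"]) > cosine(vectors["raja"], vectors["nasi"]))
```

Loading a full model this way keeps everything in Python lists, which is slow and memory-hungry for large vocabularies; an array library or a dedicated embedding loader would usually be preferred.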

Grammar Resource Framework

| Model | Description | License | Link |
|---|---|---|---|
| INDRA | Indonesian Resource Grammar (INDRA) - an implemented HPSG grammar for Indonesian | MIT license | INDRA |

Not found? Try another Bahasa Indonesia NLP awesome list or resource, such as this one:

https://github.com/kmkurn/id-nlp-resource
