midlajjs313,github

github-slideshow

A robot powered training repository :robot:

hanlp

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

lstm_keras

competition: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview
evaluation: 6 classes, each binary; column-wise ROC AUC, https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge#evaluation
dataset: dir data/train.csv, data/test.csv
technology: deep learning
CNN: http://cs231n.github.io/convolutional-networks/
lstm: http://colah.github.io/posts/2015-08-Understanding-LSTMs/, my own write-up, posted as-is: https://zhuanlan.zhihu.com/p/35756075
bi-lstm:
word embedding: https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa
GloVe: embedding/glove.pdf
keras: https://keras.io/zh/preprocessing/text/
network: tokenize -> word embedding -> (bi-lstm or lstm) -> CNN (3*3) (or deeper, see lstm_keras.py) -> pooling -> dense net -> 6 binary sigmoid outputs for classification
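The column-wise ROC AUC evaluation mentioned above can be illustrated with a small sketch. It is not taken from the repository: the function names `roc_auc` and `mean_columnwise_auc` are hypothetical, the sample labels and scores are made up for illustration, and the AUC is computed with the rank-sum (Mann-Whitney U) identity rather than any particular library call. The assumption is that the final score is the plain average of the six per-label AUCs.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC for one binary column via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    # Assign 1-based ranks, averaging the rank across tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both classes present in the column")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_columnwise_auc(
    y_true_rows: Sequence[Sequence[int]],
    y_score_rows: Sequence[Sequence[float]],
) -> float:
    """Compute the AUC of each label column separately, then average them."""
    true_cols = list(zip(*y_true_rows))
    score_cols = list(zip(*y_score_rows))
    aucs = [roc_auc(t, s) for t, s in zip(true_cols, score_cols)]
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Two label columns instead of six, purely to keep the example short.
    labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
    scores = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
    print(mean_columnwise_auc(labels, scores))
```

Each row is one comment and each column one label, so the six sigmoid outputs of the network would be passed as six-element rows. A column containing only one class has no defined AUC, which is why the sketch raises an error instead of returning a value.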

malayalam-sentence-tokenizer

mlmorph

Malayalam Morphological Analyzer using Finite State Transducer

rnns-lstm_pos_blog_post

The CACCHT project (Creating Annotated Corpora of Classical Hebrew Texts), a joint project of the ETCBC and the Theological Seminary at Andrews University, started earlier this year. The project participants are Jarod Jacobs, Martijn Naaijer, Robert Rezetko, Oliver Glanz and Wido van Peursen, and the project focuses on statistically analyzing Ancient Hebrew texts. We make use of the BHSA and the extrabiblical module, but a comprehensive analysis needs more texts, especially the Dead Sea Scrolls and Rabbinic texts. The first step has now been made, and the results are on the [ETCBC GitHub page](https://github.com/ETCBC/dss): a brand new Text-Fabric module containing the Dead Sea Scrolls with morphological encoding. The DSS texts and the morphological data connected with them were generously provided by Martin Abegg and consist of two foundational sets of data: transcriptions and morphological tagging. The transcriptions come from various sources, but primarily reflect what is found in the Discoveries in the Judaean Desert series. Abegg started morphologically tagging the Qumran texts in the mid-90s with the assistance of several people. Over the following decades, he completed full morphological tagging of nearly every Hebrew and Aramaic scroll found in the Judaean Desert between 1947 and today. The data were converted to Text-Fabric by Dirk Roorda.

test

For testing

thamizhi-morph

ThamizhiMorph: A Tamil Morphological Analyser and Generator
