en-pos

A better English POS tagger written in JavaScript

Installation and usage

Install via NPM:

npm i --save en-pos

How to use

const Tag = require("en-pos").Tag;
const tags = new Tag(["this", "is", "my", "sentence"])
    .initial() // initial dictionary and pattern based tagging
    .smooth() // further context based smoothing
    .tags;
console.log(tags);
// ["DT","VBZ","PRP$","NN"]
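Since the result is a plain array of tags, it is often handy to pair each tag with its token. The sketch below assumes the tagger returns exactly one tag per input token, in the same order (as the output above suggests). `pairTokens` is a hypothetical helper and not part of en-pos; the tags are hard-coded from the example above so the snippet runs without the library installed.

```javascript
// Tokens and tags copied from the usage example above. In real use the tags
// array would come from new Tag(tokens).initial().smooth().tags.
const tokens = ["this", "is", "my", "sentence"];
const tags = ["DT", "VBZ", "PRP$", "NN"];

// Hypothetical helper (not part of en-pos): zip tokens with their tags.
// Assumes one tag per token, in the same order.
function pairTokens(tokens, tags) {
  if (tokens.length !== tags.length) {
    throw new Error("tokens and tags must have the same length");
  }
  return tokens.map((token, i) => ({ token, tag: tags[i] }));
}

const tagged = pairTokens(tokens, tags);
console.log(tagged.map(({ token, tag }) => `${token}/${tag}`).join(" "));
// this/DT is/VBZ my/PRP$ sentence/NN
```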

Annotation Specification

| Annotation | Name | Example |
|---|---|---|
| NN | Noun | dog man |
| NNS | Plural noun | dogs men |
| NNP | Proper noun | London Alex |
| NNPS | Plural proper noun | Smiths |
| VB | Base form verb | be |
| VBP | Present form verb | throw |
| VBZ | Present form (3rd person) | throws |
| VBG | Gerund form verb | throwing |
| VBD | Past tense verb | threw |
| VBN | Past participle verb | thrown |
| MD | Modal verb | can shall will may must ought |
| JJ | Adjective | big fast |
| JJR | Comparative adjective | bigger |
| JJS | Superlative adjective | biggest |
| RB | Adverb | not quickly closely |
| RBR | Comparative adverb | less-closely faster |
| RBS | Superlative adverb | fastest |
| DT | Determiner | the a some both |
| PDT | Predeterminer | all quite |
| PRP | Personal Pronoun | I you he she |
| PRP$ | Possessive Pronoun | my your his her |
| POS | Possessive ending | 's |
| IN | Preposition | of by in |
| PR | Particle | up off |
| TO | to | to |
| WDT | Wh-determiner | which that whatever whichever |
| WP | Wh-pronoun | who whoever whom what |
| WP$ | Wh-possessive | whose |
| WRB | Wh-adverb | how where |
| EX | Expletive there | there |
| CC | Coordinating conjunction | & and nor or |
| CD | Cardinal Numbers | 1 7 77 one |
| LS | List item marker | 1 B C One |
| UH | Interjection | ah oh oops |
| FW | Foreign Words | viva mon toujours |
| , | Comma | , |
| : | Mid-sent punct | : ; ... |
| . | Sent-final punct. | . ! ? |
| ( | Left parenthesis | ( { [ |
| ) | Right parenthesis | ) } ] |
| # | Pound sign | # |
| $ | Currency symbols | $ £ ¥ |
| SYM | Other symbols | + * / < > |
| EM | Emojis & emoticons | :) |
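The annotations above can be turned into a lookup for downstream code. The following is a minimal sketch that covers only a subset of the table. `TAG_NAMES`, `describeTag` and `coarseClass` are hypothetical names introduced for illustration and are not part of en-pos, and the coarse grouping by prefix is an assumption about how one might want to collapse the tags.

```javascript
// A small subset of the annotation table above, keyed by tag.
const TAG_NAMES = new Map([
  ["NN", "Noun"],
  ["NNS", "Plural noun"],
  ["NNP", "Proper noun"],
  ["VB", "Base form verb"],
  ["VBZ", "Present form (3rd person)"],
  ["JJ", "Adjective"],
  ["RB", "Adverb"],
  ["DT", "Determiner"],
  ["PRP$", "Possessive Pronoun"],
]);

// Hypothetical helper (not part of en-pos): human-readable name for a tag.
function describeTag(tag) {
  return TAG_NAMES.get(tag) || "Unknown tag";
}

// Hypothetical helper: collapse fine-grained tags into coarse classes by prefix.
function coarseClass(tag) {
  if (tag.startsWith("NN")) return "noun";
  if (tag.startsWith("VB") || tag === "MD") return "verb";
  if (tag.startsWith("JJ")) return "adjective";
  if (tag.startsWith("RB")) return "adverb";
  return "other";
}

const tags = ["DT", "VBZ", "PRP$", "NN"];
console.log(tags.map((t) => describeTag(t)).join(", "));
// Determiner, Present form (3rd person), Possessive Pronoun, Noun
console.log(tags.map((t) => coarseClass(t)).join(", "));
// other, verb, other, noun
```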

Accuracy and performance

TL;DR:

  • When smoothing is enabled: 96.43% accuracy (processing 132K tokens in 38 seconds)
  • When smoothing is disabled: 94.4% accuracy (processing 132K tokens in 3 seconds)

As of 25 Jan 2017, this library scored 96.43% on the Penn Treebank test (0.3% away from being a state-of-the-art tagger).

I think it's safe to say that this is the most accurate POS tagger written in JavaScript: the only other JS library I know of is pos-js, which scored 87.8% when I tested it on the same treebank, though it was faster than my implementation when smoothing is enabled.

However, if performance is what you're after rather than accuracy, you have the option to disable smoothing in this library. This substantially increases performance (132K tokens in 3 seconds instead of 38), making this library even faster than pos-js while keeping far better accuracy (94.4%).
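To make the trade-off concrete, the figures quoted above can be converted into throughput. This is only a back-of-the-envelope sketch: it assumes "132K" means 132,000 tokens, and `tokensPerSecond` is a hypothetical helper, not part of en-pos.

```javascript
// Figures quoted in the TL;DR above; "132K" is assumed to mean 132,000 tokens.
const TOKENS = 132000;
const runs = [
  { mode: "smoothing enabled", accuracy: 96.43, seconds: 38 },
  { mode: "smoothing disabled", accuracy: 94.4, seconds: 3 },
];

// Hypothetical helper: rounded throughput in tokens per second.
function tokensPerSecond(tokens, seconds) {
  return Math.round(tokens / seconds);
}

for (const run of runs) {
  const rate = tokensPerSecond(TOKENS, run.seconds);
  console.log(`${run.mode}: ~${rate} tokens/s at ${run.accuracy}%`);
}
// smoothing enabled: ~3474 tokens/s at 96.43%
// smoothing disabled: ~44000 tokens/s at 94.4%

const speedup = (runs[0].seconds / runs[1].seconds).toFixed(1);
const accuracyCost = (runs[0].accuracy - runs[1].accuracy).toFixed(2);
console.log(`Disabling smoothing: ~${speedup}x faster, ${accuracyCost} points less accurate`);
// Disabling smoothing: ~12.7x faster, 2.03 points less accurate
```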

Building from source and testing

  • Build: tsc (requires TypeScript)
  • Test: node test/test.ts

Credits
