hmmtagger's Introduction

POS Tagging with Hidden Markov Models

This repository contains an elegant implementation of a POS tagger based on Hidden Markov Models. For a closer look at the internal structure of the modules, see the Jupyter notebook in this repository.

The HMMTagger class in HMMTagger.py relies on a clean and well-documented implementation of another class that represents a discrete probability distribution with different smoothing techniques. The probabilistic components of the HMM can be smoothed using simple add-one smoothing or the more advanced absolute discounting.
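To make the two smoothing options concrete, here is a minimal sketch of both applied to a small count table. It is not the repository's class: the function names `add_one` and `absolute_discounting` are hypothetical, and the choice to share the discounted mass equally among unseen outcomes is an assumption (other variants back off to a lower-order distribution instead).

```python
from collections import Counter


def add_one(counts, vocab):
    """Add-one (Laplace) smoothing: every outcome in vocab gets one extra count."""
    total = sum(counts[x] for x in vocab) + len(vocab)
    return {x: (counts[x] + 1) / total for x in vocab}


def absolute_discounting(counts, vocab, d=0.3):
    """Subtract a fixed discount d (0 < d < 1) from every seen count.

    Assumption: the freed probability mass is split equally among the
    outcomes in vocab that were never observed.
    """
    total = sum(counts[x] for x in vocab)
    seen = [x for x in vocab if counts[x] > 0]
    unseen = [x for x in vocab if counts[x] == 0]
    if not unseen:
        # Nothing to redistribute to, so fall back to relative frequencies.
        return {x: counts[x] / total for x in vocab}
    freed = d * len(seen) / total
    probs = {x: (counts[x] - d) / total for x in seen}
    probs.update({x: freed / len(unseen) for x in unseen})
    return probs


# Hypothetical counts for the tag that follows some fixed previous tag.
counts = Counter({"NOUN": 3, "VERB": 1})
tags = ["NOUN", "VERB", "ADJ"]
laplace = add_one(counts, tags)
discounted = absolute_discounting(counts, tags, d=0.3)
```

Both functions return a proper distribution over `tags`. Add-one moves a fixed amount of mass to unseen outcomes regardless of how much data was observed, while absolute discounting removes the same small amount `d` from each seen outcome, which tends to distort frequent events less.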

(1) Training a POS Tagger

Assuming that we have training data in CoNLL format (check this folder for more info), a POS tagger can be trained as follows:

$ python  train_tagger.py \
  -train_dir data/ \
  -train_fileid 'de-train.tt' \
  -smoothing 'abs_disc' \
  -d 0.3 \
  -tagger_file 'POS_tagger'
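For orientation, the sketch below shows one way such training data could be read into sentences of (token, tag) pairs. The exact file layout is an assumption here: one token per line, token and tag separated by a tab, and a blank line between sentences. The helper name `read_tagged_sentences` is hypothetical and is not part of train_tagger.py.

```python
def read_tagged_sentences(lines):
    """Group tab-separated 'token<TAB>tag' lines into sentences.

    Assumption: blank lines mark sentence boundaries and the first two
    columns of each non-blank line are the token and its tag.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")[:2]
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


sample = ["Das\tDET", "Haus\tNOUN", "", "Er\tPRON"]
sentences = read_tagged_sentences(sample)
```

The function accepts any iterable of lines, so an open file object can be passed in directly.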

(2) Using a trained POS Tagger

After training a tagger as in the previous step, the tagger object is saved to disk and can be used to tag a test dataset:

$ python  test_tagger.py \
  -test_dir data/ \
  -test_fileid de-test.t \
  -POS_tagger POS_tagger \
  -tagged_output tagged_abs_disc.tt

(3) Evaluating the tagger

The Python file eval.py can be used to quantitatively evaluate the performance of the tagger:

$ python eval.py data/de-eval.tt tagged_abs_disc.tt

Output:

Comparing gold file "data/de-eval.tt" and system file "tagged_abs_disc.tt"

Precision, recall, and F1 score:

  DET 0.9092 0.9761 0.9415
 NOUN 0.8476 0.9835 0.9105
 VERB 0.9605 0.8712 0.9137
  ADP 0.9632 0.9762 0.9697
    . 0.9983 0.9992 0.9987
 CONJ 0.9544 0.8974 0.9250
 PRON 0.9391 0.8309 0.8817
  ADV 0.9234 0.7893 0.8511
  ADJ 0.7993 0.6485 0.7160
  NUM 0.9906 0.7778 0.8714
  PRT 0.8730 0.8730 0.8730
    X 0.2000 0.0909 0.1250

Accuracy: 0.9136
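The per-tag scores in this report follow the standard definitions of precision, recall, and F1. The sketch below shows how they can be computed from two aligned tag sequences. It is an illustration only: the function name `evaluate` is hypothetical, and eval.py may organize the computation differently.

```python
from collections import Counter


def evaluate(gold_tags, predicted_tags):
    """Return ({tag: (precision, recall, f1)}, accuracy) for aligned tag lists."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    correct, gold_count, pred_count = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_tags, predicted_tags):
        gold_count[gold] += 1
        pred_count[pred] += 1
        if gold == pred:
            correct[gold] += 1
    scores = {}
    for tag in set(gold_count) | set(pred_count):
        # Precision: share of predictions of this tag that were right.
        precision = correct[tag] / pred_count[tag] if pred_count[tag] else 0.0
        # Recall: share of gold occurrences of this tag that were found.
        recall = correct[tag] / gold_count[tag] if gold_count[tag] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[tag] = (precision, recall, f1)
    accuracy = sum(correct.values()) / len(gold_tags) if gold_tags else 0.0
    return scores, accuracy


gold = ["DET", "NOUN", "VERB", "NOUN"]
system = ["DET", "NOUN", "NOUN", "NOUN"]
scores, accuracy = evaluate(gold, system)
```

F1 is the harmonic mean of precision and recall, which matches the report: for DET, 2 × 0.9092 × 0.9761 / (0.9092 + 0.9761) ≈ 0.9415.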
