hidden-markov-model's Introduction

Before you start, make sure you have the data. Copy config.ini.template to config.ini and fill in the parent directory that contains the data. For example, if twi.train.json, twi.dev.json, twi.test.json and twi.bonus.json are in /home/data, then put that path in config.ini.
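As a rough illustration of how such a config file can be read, here is a minimal sketch using Python's standard configparser module. The section name `data` and the key `parent_dir` are assumptions made for this example only; the real names are whatever config.ini.template defines.

```python
import configparser
import os

# Hypothetical config contents. The section name "data" and key "parent_dir"
# are assumptions for this sketch; check config.ini.template for the real ones.
EXAMPLE_CONFIG = """
[data]
parent_dir = /home/data
"""

# File names taken from the README text above.
DATA_FILES = ["twi.train.json", "twi.dev.json", "twi.test.json", "twi.bonus.json"]


def data_paths(config_text):
    """Return the full path of each data file under the configured directory."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    parent = parser["data"]["parent_dir"]
    return [os.path.join(parent, name) for name in DATA_FILES]


paths = data_paths(EXAMPLE_CONFIG)
for path in paths:
    print(path)
```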

Use python run_bigram.py to run the bigram HMM on the test set.

To visualize the confusion matrix of the bigram model, install matplotlib and uncomment lines 18 and 19 in run_bigram.py.

Use python run_trigram.py to run the trigram HMM on the test set.

To change the input corpus, call one of these functions: Corpus.trainCorpus() for the training corpus; Corpus.devCorpus() for the dev corpus; Corpus.testCorpus() for the test corpus; Corpus.bonusCorpus() for the bonus corpus. Each accepts two parameters: ratio, the fraction of the corpus to use, and shuffle, whether to shuffle the corpus.

To get a mixed corpus, use Corpus.combinedCorpus(), passing the ratio of the corpus to use, shuffle, and *tags. Each tag should be one of 'train', 'dev', 'test', or 'bonus'.

You can also choose whether to do OOV handling by calling the corpus.replace_oov_with_UNK() method. It takes four parameters: unk_threshold, unk_oov_ratio, trans_prob, and known_unk_dict. The first three are described in the HW2.pdf file. The last lets you pass in an existing unk_dict; it is typically used to extend the unk_dict of an evaluation set with the unk_dict of the training set.
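The general idea of count-based OOV handling can be sketched as follows. This is not the repository's replace_oov_with_UNK() implementation: the helper names are hypothetical, and reading unk_threshold as "words seen at most this many times become UNK" is an assumption, since the real definition lives in HW2.pdf.

```python
from collections import Counter

UNK = "<UNK>"


def build_vocab(sentences, unk_threshold=1):
    """Keep only words seen more than unk_threshold times (assumed semantics)."""
    counts = Counter(word for sent in sentences for word in sent)
    return {word for word, count in counts.items() if count > unk_threshold}


def replace_oov(sentences, vocab):
    """Replace every word outside the vocabulary with the UNK token."""
    return [[word if word in vocab else UNK for word in sent] for sent in sentences]


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
dev = [["the", "bird", "sat"]]

# The vocabulary is built from training data only, then applied to both sets,
# so rare training words and unseen dev words map to the same token.
vocab = build_vocab(train, unk_threshold=1)
train_unk = replace_oov(train, vocab)
dev_unk = replace_oov(dev, vocab)
print(train_unk)
print(dev_unk)
```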

As for the HmmModel class, n is the order of the n-gram HMM to use and must be either 2 or 3. The k_lan_model and k_emiss_model parameters are also described in HW2.pdf.

Use python gridsearch_bigram to run a grid search on the bigram HMM model. You can update the parameters in updated_params. The available keys are unk_threshold, unk_oov_ratio, trans_prob, k_lan_model and k_emiss_model. Each value in that dictionary should be a list of the values you want to test.
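A dictionary of value lists like updated_params is typically expanded into every combination of settings. The sketch below shows that expansion with itertools.product. The functions param_grid, grid_search and toy_evaluate are hypothetical and are not the repository's code; in particular, toy_evaluate is a stand-in for training an HMM and scoring it on the dev set.

```python
from itertools import product


def param_grid(updated_params):
    """Yield one dict per combination of the listed parameter values."""
    keys = sorted(updated_params)
    for values in product(*(updated_params[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_search(updated_params, evaluate):
    """Return the best-scoring parameter combination and its score."""
    best_params, best_score = None, float("-inf")
    for params in param_grid(updated_params):
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Placeholder score; a real run would train the model and measure accuracy.
    return -abs(params["k_lan_model"] - 0.1) - abs(params["unk_threshold"] - 2)


updated_params = {
    "unk_threshold": [1, 2, 3],
    "k_lan_model": [0.01, 0.1, 1.0],
}
best_params, best_score = grid_search(updated_params, toy_evaluate)
print(best_params)
```

The number of runs is the product of the list lengths (9 here), so each extra key or value multiplies the total running time.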

Use python gridsearch_trigram to run a grid search on the trigram HMM model. Be aware that it takes a very long time to run.

hidden-markov-model's People

Contributors

yuchaz

hidden-markov-model's Issues

em-algorithms

Implement the EM algorithms once the Corpus class has been written and works.

corpus-class

The Corpus class should not be that hard. Just implement __iter__ along with size and shuffle functions. For shuffling, build an index map and iterate through it; there is no need to actually shuffle the data.
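The index-map idea can be sketched like this. The class name LazyCorpus and the seed parameter are hypothetical choices for the example, not the repository's actual Corpus class.

```python
import random


class LazyCorpus:
    """Holds sentences and iterates over them through an index map."""

    def __init__(self, sentences):
        self._sentences = list(sentences)
        # Index map: position in iteration order -> position in storage.
        self._order = list(range(len(self._sentences)))

    def __len__(self):
        return len(self._order)

    def __iter__(self):
        for index in self._order:
            yield self._sentences[index]

    def shuffle(self, seed=None):
        """Permute the index map only; the stored sentences never move."""
        random.Random(seed).shuffle(self._order)


corpus = LazyCorpus([["a"], ["b"], ["c"], ["d"]])
corpus.shuffle(seed=0)
shuffled = list(corpus)
print(len(corpus), shuffled)
```

Because only the list of indices is permuted, shuffling stays cheap even when each sentence is large.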

oov-handling

Check what each POS tag means. E.g. U means URL, # means hashtag (trend), @ means a mention of a person. That way you can use regex to change such tokens to "<URL>", "<TAG>", "<HASH>", etc. For other words, just change them to something like <VERB-UNK>.
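A minimal sketch of the regex approach described in this issue follows. The patterns and the unk_token helper are assumptions made for illustration. Building the fallback token directly from the POS tag (for example "<V-UNK>" instead of "<VERB-UNK>") is also a simplification, since mapping tag letters to readable names would need the tag set's documentation.

```python
import re

# Simple token-level patterns; real tweets may need more permissive ones.
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
MENTION_RE = re.compile(r"^@\w+$")
HASHTAG_RE = re.compile(r"^#\w+$")


def unk_token(word, pos_tag=None):
    """Map an unknown word to a class-specific placeholder token."""
    if URL_RE.match(word):
        return "<URL>"
    if MENTION_RE.match(word):
        return "<TAG>"
    if HASHTAG_RE.match(word):
        return "<HASH>"
    if pos_tag is not None:
        # Fallback built from the tag itself, e.g. "V" -> "<V-UNK>".
        return "<{}-UNK>".format(pos_tag)
    return "<UNK>"


for word in ["http://t.co/abc", "@yuchaz", "#nlp", "zzz"]:
    print(word, unk_token(word))
```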
