Minitagger (Python 3 + Numpy)

Minitagger tags words in sentences. It is built on an implementation of a multi-class SVM (Fan et al., 2008) and predicts each word's tag independently from its local context. Although this approach is completely unstructured (unlike CRFs), adding lexical representations lets it perform as well as structured models on some problems, such as POS tagging.
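To make "independent predictions based on local context" concrete, here is a minimal sketch of a window-based feature extractor. The function name, the feature strings, and the boundary symbols "<s>" and "</s>" are hypothetical and chosen for illustration; they are not Minitagger's actual feature templates.

```python
def local_context_features(words, i, window=1):
    """Return string features for position i using only nearby words.

    Hypothetical template for illustration; not Minitagger's own features.
    """
    feats = [f"w[0]={words[i].lower()}"]
    for offset in range(1, window + 1):
        # Pad with boundary symbols when the window runs off the sentence.
        left = words[i - offset] if i - offset >= 0 else "<s>"
        right = words[i + offset] if i + offset < len(words) else "</s>"
        feats.append(f"w[-{offset}]={left.lower()}")
        feats.append(f"w[+{offset}]={right.lower()}")
    return feats


sentence = ["The", "dog", "saw", "the", "cat"]
features_for_saw = local_context_features(sentence, 2)
```

Each position gets its own feature list, so a classifier can label "saw" without knowing the labels assigned to its neighbors.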

For experimental details, see: Simple Semi-Supervised POS Tagging (Stratos and Collins, 2015). You can obtain the word representations used in the experiments at: http://www.cs.columbia.edu/~stratos/research/wordrep.tar.gz.

Highlights

Minitagger can:

  1. Utilize bit string (Brown clusters) and real-valued (word embeddings) lexical features.
     • These lexical features must include a representation for unknown words. By default, the symbol "<?>" denotes this representation.
  2. Train from partially or completely labeled data in the following form, with one word per line and its label after a tab when the word is labeled (an empty line marks the end of a sentence):

      The
      dog
      saw	V
      the
      cat

  3. Perform active learning using whatever features it is equipped with.
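The data format and the unknown-word convention described in the list above can be sketched as follows. This is an illustration under stated assumptions, not Minitagger's own reader: the function names are hypothetical, and the sketch assumes the word and its label are separated by whitespace (a tab in the example).

```python
def read_tagged_sentences(lines):
    """Parse lines into sentences of (word, label) pairs.

    The label is None for unlabeled words. A blank line ends a sentence.
    Hypothetical helper; assumes whitespace separates word and label.
    """
    sentences, current = [], []
    for line in lines:
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        label = fields[1] if len(fields) > 1 else None
        current.append((fields[0], label))
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences


def lookup_representation(representations, word, unknown="<?>"):
    """Return the representation for word, falling back to the unknown symbol."""
    return representations.get(word, representations[unknown])


example = ["The\n", "dog\n", "saw\tV\n", "the\n", "cat\n", "\n"]
sentences = read_tagged_sentences(example)

# Toy bit strings, made up for this example.
bitstrings = {"dog": "0110", "<?>": "0000"}
zebra_bits = lookup_representation(bitstrings, "zebra")
```

Here only "saw" carries a label, so the sentence is partially labeled; "zebra" is absent from the toy table, so it receives the representation stored under "<?>".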

Usage

First, type make to compile the liblinear package.

Training and prediction

  • Try training a tagger with baseline features:

python3 minitagger.py example/example.train --model_path /tmp/example.model.baseline --train --feature_template baseline

  • Try training a tagger with bit string features:

python3 minitagger.py example/example.train --model_path /tmp/example.model.bitstring --train --feature_template bitstring --bitstring_path example/example.bitstring

  • Try training a tagger with embedding features:

python3 minitagger.py example/example.train --model_path /tmp/example.model.embedding --train --feature_template embedding --embedding_path example/example.embedding

Then try tagging test data, replacing [model] with the path of one of the models trained above:

python3 minitagger.py example/example.test --model_path [model] --prediction_path /tmp/example.test.prediction

Active learning

  • Try active learning with baseline features, seed size 1, and step size 1 (you can also provide a held-out dataset to monitor the improvement in a log file):

python3 minitagger.py example/example.train --train --feature_template baseline --active --active_output_path /tmp/active.baseline.seed1.step1 --active_seed_size 1 --active_step_size 1 --active_output_interval 1

Once you have actively selected examples, you can simply provide these partially labeled sentences as training data to train a model.
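As a rough illustration of what one active learning step does, the sketch below picks the examples whose classifier scores are least decisive. The margin-based criterion and the function name are assumptions made for this example; the text above does not say which selection criterion Minitagger uses, and its actual criterion may differ.

```python
def select_least_confident(scores, step_size):
    """Return indices of the step_size examples with the smallest margin.

    scores holds one row of per-class scores per example (at least two
    classes). The margin is the gap between the two highest scores; a
    small margin means the classifier is unsure. Hypothetical helper.
    """
    margins = []
    for index, row in enumerate(scores):
        best, second = sorted(row, reverse=True)[:2]
        margins.append((best - second, index))
    margins.sort()  # smallest margin first; ties broken by index
    return [index for _, index in margins[:step_size]]


# Made-up scores for three unlabeled examples and three classes.
scores = [
    [2.0, 0.1, -1.0],  # margin 1.9: confident
    [0.5, 0.4, -0.3],  # margin 0.1: very uncertain
    [1.0, -1.0, 0.2],  # margin 0.8
]
picked = select_least_confident(scores, step_size=1)
```

With a step size of 1, a single example is chosen for labeling per round, mirroring the role of --active_step_size in the command above.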
