

BIST Parsers

Graph & Transition based dependency parsers using BiLSTM feature extractors.

The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Further materials can be found here.

Required software

Train a parsing model

The software requires training.conll and development.conll files formatted according to the CoNLL data format. For the faster graph-based parser, change directory to bmstparser (1200 words/sec); for the more accurate transition-based parser, change directory to barchybrid (800 words/sec). The benchmark was performed on a MacBook Pro with an i7 processor. The graph-based parser achieves 93.8 UAS and the transition-based parser 94.7 UAS on the standard Penn Treebank dataset (Stanford Dependencies). The transition-based parser requires no part-of-speech tagging: setting all the tags to NN will produce the expected accuracy. The model and param files achieving those scores are available for download (Graph-based model, Transition-based model). The trained models include improvements beyond those described in the paper, to be published soon.
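Because the transition-based parser only needs every tag set to NN, an existing CoNLL file can be prepared with a few lines of Python. This is a minimal sketch that assumes tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). It overwrites both POS columns because which one the parser reads is not stated here. The function name `replace_pos_tags` is hypothetical, not part of the repository.

```python
# Assumed CoNLL-X layout (0-based column indices).
CPOSTAG, POSTAG = 3, 4

def replace_pos_tags(lines, tag="NN"):
    """Yield CoNLL lines (without trailing newline) with both POS columns set to `tag`.

    Blank lines (sentence separators) and '#' comment lines pass through unchanged.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.startswith("#"):
            yield stripped
            continue
        cols = stripped.split("\t")
        cols[CPOSTAG] = tag
        cols[POSTAG] = tag
        yield "\t".join(cols)
```

Usage, with hypothetical file names: open test.conll, write each yielded line plus "\n" to test.nn.conll, and parse test.nn.conll instead.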

To train a parsing model with either parsing architecture, type the following at the command prompt:

python src/parser.py --dynet-seed 123456789 [--dynet-mem XXXX] --outdir [results directory] --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 [--extrn extrn.vectors] --bibi-lstm

We use the same external embeddings used in Transition-Based Dependency Parsing with Stack Long Short-Term Memory, which can be downloaded from the authors' GitHub repository or directly here.
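Before passing a file to --extrn, it can help to check that it loads cleanly. The sketch below assumes a plain-text layout of one word per line followed by whitespace-separated floats, with an optional header line; this layout is an assumption, not something this README specifies. `load_embeddings` is a hypothetical helper, not the parser's own loader.

```python
def load_embeddings(lines, skip_header=False):
    """Map word -> list of floats, assuming '<word> <f1> <f2> ...' per line.

    Raises ValueError if vectors have inconsistent dimensions.
    """
    it = iter(lines)
    if skip_header:
        next(it, None)  # drop an assumed 'count dim' header line
    vectors = {}
    dim = None
    for line in it:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"inconsistent dimension for {word!r}: {len(values)} != {dim}")
        vectors[word] = values
    return vectors
```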

If you are training a transition-based parser, then for optimal results you should add the following options to the command line: --k 3 --usehead --userl. These switches set the stack to 3 elements, use the BiLSTM of the head of trees on the stack as feature vectors, and add the BiLSTM of the right/leftmost children to the feature vectors.
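When scripting several training runs, the flags above can be assembled programmatically. This is a sketch using only the flags documented in this README. `build_train_args` is a hypothetical helper, and the fixed --lstmdims/--lstmlayers values simply mirror the example command.

```python
def build_train_args(outdir, train, dev, transition_based=False,
                     extrn=None, mem=None, epochs=30):
    """Return the src/parser.py training command as an argument list."""
    args = ["python", "src/parser.py", "--dynet-seed", "123456789"]
    if mem is not None:
        args += ["--dynet-mem", str(mem)]  # optional DyNet memory setting
    args += ["--outdir", outdir, "--train", train, "--dev", dev,
             "--epochs", str(epochs), "--lstmdims", "125", "--lstmlayers", "2"]
    if extrn is not None:
        args += ["--extrn", extrn]  # optional external embeddings
    args.append("--bibi-lstm")
    if transition_based:
        # Recommended for barchybrid: 3-element stack, head-of-tree BiLSTM
        # features, and right/leftmost-children BiLSTM features.
        args += ["--k", "3", "--usehead", "--userl"]
    return args
```

The list can be passed to `subprocess.run(..., cwd="barchybrid", check=True)` so that it runs from the parser's directory.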

Note 1: You can run it without POS embeddings by setting the POS embedding dimensions to zero (--pembedding 0).

Note 2: The reported test result is the one matching the highest development score.
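The selection rule in Note 2 means the test score is read off at the epoch with the best development score, not at the best test epoch. This is a minimal sketch, assuming the per-epoch scores have been collected into (epoch, dev, test) tuples. `select_test_score` is a hypothetical name.

```python
def select_test_score(history):
    """Return the (epoch, dev_score, test_score) row with the highest dev score.

    On ties the earliest epoch wins, since max() returns the first maximal element.
    """
    return max(history, key=lambda row: row[1])
```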

Note 3: After each iteration, the parser calculates accuracies excluding punctuation symbols by running the eval.pl script from the CoNLL-X Shared Task, and stores the results in the directory specified by --outdir.
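To see what "excluding punctuation" means for the unlabeled score, here is a small Python approximation. It assumes eval.pl treats a token as punctuation when every character of its FORM is a Unicode punctuation character (category P*). That reading is our understanding, not a guarantee, so use eval.pl itself for reported numbers. The function names are hypothetical.

```python
import unicodedata

def is_punct(form):
    """True if every character of `form` is Unicode punctuation (category P*)."""
    return len(form) > 0 and all(unicodedata.category(ch).startswith("P") for ch in form)

def uas_excluding_punct(forms, gold_heads, pred_heads):
    """Unlabeled attachment score (%) computed over non-punctuation tokens only."""
    scored = correct = 0
    for form, gold, pred in zip(forms, gold_heads, pred_heads):
        if is_punct(form):
            continue  # punctuation tokens are not scored
        scored += 1
        correct += gold == pred
    return 100.0 * correct / scored if scored else 0.0
```

For example, a wrong head on a final "." token does not lower the score, because that token is skipped.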

Note 4: The external embeddings parameter is optional and is best omitted when training or predicting with a graph-based model.

Parse data with your parsing model

The command for parsing a test.conll file formatted according to the CoNLL data format with a previously trained model is:

python src/parser.py --predict --outdir [results directory] --test test.conll [--extrn extrn.vectors] --model [trained model file] --params [param file generated during training]

The parser stores the resulting CoNLL file in the output directory (--outdir).

Note 1: If you are using the arc-hybrid trained model we provided please use the --extrn flag and specify the location of the external embeddings file.

Note 2: If you are using the first-order trained model we provided please do not use the --extrn flag.
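The two notes above can be encoded as a check when scripting prediction. This is a sketch using only the flags shown in the parse command. `build_predict_args` and its `provided_model` values ("arc-hybrid", "first-order") are hypothetical labels for the downloadable models.

```python
def build_predict_args(outdir, test, model, params, provided_model=None, extrn=None):
    """Return the src/parser.py prediction command as an argument list.

    provided_model: "arc-hybrid" or "first-order" for the downloadable models,
    or None for a model you trained yourself (no --extrn check is applied).
    """
    if provided_model == "arc-hybrid" and extrn is None:
        raise ValueError("the provided arc-hybrid model needs --extrn")
    if provided_model == "first-order" and extrn is not None:
        raise ValueError("do not pass --extrn with the provided first-order model")
    args = ["python", "src/parser.py", "--predict", "--outdir", outdir,
            "--test", test, "--model", model, "--params", params]
    if extrn is not None:
        args += ["--extrn", extrn]
    return args
```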

Citation

If you make use of this software for research purposes, we would appreciate it if you cite the following:

@article{DBLP:journals/tacl/KiperwasserG16,
    author    = {Eliyahu Kiperwasser and Yoav Goldberg},
    title     = {Simple and Accurate Dependency Parsing Using Bidirectional {LSTM}
           Feature Representations},
    journal   = {{TACL}},
    volume    = {4},
    pages     = {313--327},
    year      = {2016},
    url       = {https://transacl.org/ojs/index.php/tacl/article/view/885},
    timestamp = {Tue, 09 Aug 2016 14:51:09 +0200},
    biburl    = {http://dblp.uni-trier.de/rec/bib/journals/tacl/KiperwasserG16},
    bibsource = {dblp computer science bibliography, http://dblp.org}
}

License

This software is released under the terms of the Apache License, Version 2.0.

Contact

For questions and usage issues, please contact [email protected]

Credits

Eliyahu Kiperwasser

Yoav Goldberg

