kevinalexmathews / minimalist-location-metonymy-resolution

This project forked from milangritta/minimalist-location-metonymy-resolution

The code and data accompanying the ACL 2017 "outstanding award" publication "Vancouver Welcomes You! Minimalist Location Metonymy Resolution"

License: GNU General Public License v3.0

Vancouver Welcomes You!

Minimalist Location Metonymy Resolution

Welcome to the home of the code and data accompanying the "outstanding award" publication at ACL 2017. The talk took place on 2 August 2017 in Vancouver, Canada. You can view my personal video recording at https://www.youtube.com/watch?v=LwGWyxRCtwY and the official footage at https://vimeo.com/channels/acl2017/234945689

"Science is a wonderful thing if one does not have to earn one's living at it." -- Albert Einstein


Python library requirements

  • keras 1.2.2 - www.keras.io (for best replicability, use the Theano backend, not TensorFlow)
  • spacy - www.spacy.io (also download the English embeddings as instructed on the website)
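
One way to pick the Theano backend is the KERAS_BACKEND environment variable, which Keras reads when it is first imported. The snippet below is a minimal sketch, not part of this repository; the helper name use_theano_backend is made up for illustration, and the Keras import is left commented out so the snippet runs even before the libraries are installed.

```python
import os


def use_theano_backend():
    """Ask Keras to use Theano. Must be called BEFORE `import keras`."""
    # KERAS_BACKEND is read once, at import time, by Keras.
    os.environ["KERAS_BACKEND"] = "theano"
    return os.environ["KERAS_BACKEND"]


# Hypothetical helper, not defined anywhere in this repository.
use_theano_backend()

# import keras  # uncomment once keras 1.2.2 and Theano are installed
```

Setting the variable inside the script keeps the choice next to the code that depends on it; editing the Keras configuration file is an alternative if you prefer a machine-wide setting.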

Embeddings

GitHub limits file sizes (100MB today), so the GloVe embeddings are not included in this repository. To fully replicate the results, download them from http://nlp.stanford.edu/projects/glove/, save them in a local directory, and change the PATH to the embeddings in the LSTM_(Train/Test).py files. Please use the 50D embeddings for publication results unless you want to experiment with bigger dimensions. The final file size is around 175MB. You can also Google the DOI of our paper for the complete set of data.
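
To check that the downloaded file is readable before pointing the LSTM scripts at it, something like the following may help. It is a sketch under assumptions: it relies on the plain-text GloVe layout (one word per line followed by space-separated floats), the function name load_glove is invented here, and the file name in the usage comment is only an example of what the download may contain. It is not the loading code used by LSTM_Train.py.

```python
import io


def load_glove(path, dim=50):
    """Read GloVe text vectors into a dict: word -> list of `dim` floats."""
    vectors = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                # Skip lines that do not have exactly one word plus `dim`
                # numbers instead of guessing how to repair them.
                continue
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Example usage (path and file name are assumptions, adjust to your setup):
# embeddings = load_glove("/path/to/glove.6B.50d.txt", dim=50)
# print(len(embeddings), len(embeddings["vancouver"]))
```

Passing a different dim lets you try the larger embeddings mentioned above; lines whose length does not match are silently skipped, so a vocabulary that comes back empty usually means the wrong dimension was given.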

How to replicate

I fixed some random seeds. However, due to the complexity of Keras/Theano, there is still some random initialization happening, so please do multiple runs and take the average.
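
Averaging over several runs can be sketched as below. Everything here is illustrative: average_over_runs and fake_run are hypothetical names, and fake_run only simulates a noisy score where a real setup would train and test the model once per seed.

```python
import random
import statistics


def average_over_runs(run_once, seeds):
    """Call `run_once(seed)` for each seed; return (mean, sample std dev)."""
    scores = [run_once(seed) for seed in seeds]
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, spread


def fake_run(seed):
    """Stand-in for one train/test cycle; returns a made-up accuracy."""
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.02, 0.02)


mean_score, spread = average_over_runs(fake_run, seeds=[1, 2, 3, 4, 5])
print("mean accuracy: %.3f (std %.3f over 5 runs)" % (mean_score, spread))
```

Reporting the spread next to the mean shows how much of a difference between two configurations could simply be initialization noise.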

  • ensemble.py -> this is the ensemble method evaluation script (accuracy, precision, recall, f-score). It can be used for ReLocaR and SemEval evaluation (see internal comments for usage instructions). It calculates both the ensemble and the single model results, see output.
  • create_prewin.py -> this is the preprocessing script that takes TEXT files and outputs the processed pickled files for LSTM_Train.py and LSTM_Test.py. This script applies the PREWIN method to text, see paper. There are ready pickled files in the /pickle/ directory (for replication) but feel free to create new input from new text.
  • create_baseline.py -> another preprocessing script for TEXT to PICKLED input (for replication purposes, use this script for ALL baselines in the paper). See internal comments for details of usage.
  • LSTM_Train.py -> The MAIN script for training the classifier. Please check the paths to input files (choose from /pickle/ or prepare your own with the create....py scripts), edit the path to the embeddings file and RUN :-) To get the EXACT numbers reported in the paper, you may have to adjust the number of epochs in training (± 1).
  • LSTM_Test.py -> The MAIN script for testing the classifier with the trained/saved weights. The output of the classifier is saved (please uncomment the code first) in either the /semeval/ or /relocar/ folder. This is then used as input to ensemble.py to produce evaluation metrics. Please see internal comments for details.
  • ReLocaR_XML/ -> this folder contains the RAW xml files (one of the paper contributions). There is the ReLocaR_Test.xml for testing and ReLocaR_Train.xml for training. Processed versions of both of these files are available for easy replication in the /pickle/ folder.
  • gold/ -> The gold standard results used for evaluation, no need to change.
  • data/ -> this directory contains text data from four different datasets (relocar, semeval, conll, wikipedia). Feel free to dabble and add new files, however, for replication purposes, these have already been processed and sit inside the /pickle/ folder.
  • data/locations.txt -> This is the list of locations used to construct ReLocaR as mentioned in the publication.
  • pickle/ -> this directory holds the input files (already processed with our methods such as PREWIN and BASELINE) ready to be plugged into the neural networks. The naming scheme should be self-explanatory (famous last words). Get in touch if unclear.
  • weights/ -> this directory stores the neural network trained weights. No need to change or modify.
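
For readers who want to see what the reported metrics mean, here is a small stand-alone sketch of accuracy, precision, recall and f-score, plus one possible way of combining several models. It is NOT the code in ensemble.py: the majority vote is an assumption made for illustration, the function names are invented, and the labels (1 for metonymic, 0 for literal) are a hypothetical encoding.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label."""
    # Assumption: an odd number of models and binary labels, so no ties.
    voted = []
    for labels in zip(*predictions_per_model):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def evaluate(gold, predicted, positive=1):
    """Return accuracy plus precision/recall/f-score for `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Toy data, invented for this example only.
gold = [1, 0, 1, 0]
models = [[1, 0, 0, 0], [1, 1, 1, 0], [0, 0, 1, 0]]
print(evaluate(gold, majority_vote(models)))
```

Scoring each single model with the same evaluate function makes it easy to compare single-model and combined results side by side.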

Issues during replication

Have fun and thanks for stopping by, my friend. I tried to make the replication effort as smooth as possible. I take scientific replication extremely seriously. In case something is missing, please raise an issue or email me (not difficult to find) and I will address the feedback. Enjoy!

Alternative link for the whole bundle: https://www.repository.cam.ac.uk/handle/1810/265068

Contributors

  • milangritta
