
lixinsu / r-net-in-keras


This project forked from yerevann/r-net-in-keras


R-NET implementation in Keras.

Home Page: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf

License: MIT License

Python 100.00%


R-NET implementation in Keras

This repository is an attempt to reproduce the results presented in the technical report by Microsoft Research Asia. The report describes a complex neural network called R-NET designed for question answering.

R-NET is currently (July 2017) the best model on the Stanford Question Answering Dataset (SQuAD). SQuAD uses two performance metrics, exact match (EM) and F1 score (F1). Human performance on the test set is estimated at EM=82.3% and F1=91.2%.
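The two metrics can be sketched in Python as follows. This is a minimal sketch modeled on the answer normalization used by the official SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace); the function names and the toy questions are illustrative and are not taken from this repository. Each prediction is scored against every acceptable reference answer and the best score is kept.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation, drop the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction, truth):
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # Counter & Counter keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """predictions: {qid: answer}; references: {qid: [acceptable answers]}.

    Returns (EM %, F1 %); a missing prediction counts as an empty answer.
    """
    em_total = f1_total = 0.0
    for qid, truths in references.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, t) for t in truths)
        f1_total += max(f1(pred, t) for t in truths)
    n = len(references)
    return 100.0 * em_total / n, 100.0 * f1_total / n


refs = {"q1": ["Paris", "Paris, France"], "q2": ["the Eiffel Tower"]}
preds = {"q1": "in Paris, France", "q2": "Eiffel Tower."}
print(score(preds, refs))  # EM = 50.0, F1 is approximately 90.0
```

In the toy example, "in Paris, France" is not an exact match for either reference but shares two of its three tokens with "Paris, France", which is why F1 rewards partially correct spans that EM scores as zero.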

The report describes two versions of R-NET:

  1. The first one is called R-NET (Wang et al., 2017), which refers to a paper that is not yet available online, and reaches EM=71.3% and F1=79.7% on the test set. It consists of input encoders, a modified version of Match-LSTM, a self-matching attention layer (the main contribution of the paper), and a pointer network.
  2. The second version, called R-NET (March 2017), has one additional BiGRU between the self-matching attention layer and the pointer network and reaches EM=72.3% and F1=80.7%.
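The self-matching attention layer lets every passage word attend over the whole passage. Below is a rough NumPy sketch of that step as we read the report: scores s[t, j] = v^T tanh(W_p v_j + W_p_tilde v_t), a softmax over j, a context vector c_t, and a sigmoid gate applied to [v_t, c_t]. The weights are random, the names and shapes are hypothetical, the gate has no bias term (an assumption), and the BiRNN that consumes the gated inputs is omitted; in the actual model these matrices are trainable Keras weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_matching_attention(v_p, W_p, W_p_tilde, v, W_g):
    """Sketch of self-matching attention followed by an input gate.

    v_p:       [n, d]   question-aware passage encodings v_1^P ... v_n^P
    W_p:       [h, d]   projection of the attended word v_j^P
    W_p_tilde: [h, d]   projection of the current word v_t^P
    v:         [h]      scoring vector
    W_g:       [2d, 2d] gate weights (hypothetical shape, no bias)
    Returns the gated inputs [n, 2d] for the next BiRNN and the weights [n, n].
    """
    proj_j = v_p @ W_p.T        # [n, h]
    proj_t = v_p @ W_p_tilde.T  # [n, h]
    # Broadcast to [n, n, h], then reduce with v: scores[t, j]
    scores = np.tanh(proj_t[:, None, :] + proj_j[None, :, :]) @ v
    attn = softmax(scores, axis=1)              # row t: distribution over words j
    context = attn @ v_p                        # c_t = sum_j attn[t, j] v_j, [n, d]
    x = np.concatenate([v_p, context], axis=1)  # [v_t, c_t], shape [n, 2d]
    gate = 1.0 / (1.0 + np.exp(-(x @ W_g.T)))   # sigmoid gate, values in (0, 1)
    return gate * x, attn


# Toy usage with random weights: 5 passage words, d = 4, h = 3.
rng = np.random.default_rng(0)
n, d, h = 5, 4, 3
v_p = rng.standard_normal((n, d))
gated, attn = self_matching_attention(
    v_p,
    rng.standard_normal((h, d)),
    rng.standard_normal((h, d)),
    rng.standard_normal(h),
    rng.standard_normal((2 * d, 2 * d)),
)
print(gated.shape, attn.shape)  # (5, 8) (5, 5)
```

Because each row of the attention matrix covers the entire passage, a word can collect evidence from distant parts of the passage, which a left-to-right recurrent encoder captures only weakly.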

The current best single model on the SQuAD leaderboard has a higher score, which means R-NET development continued after March 2017. Ensemble models reach even higher scores.

This repository contains an implementation of the first version, but we cannot yet reproduce the reported results. The best performance we got so far was EM=54.21% and F1=65.26% on the dev set. We are aware of a few differences between our implementation and the network described in the paper:

  1. We do not use character-level embeddings at the input.
  2. The first formula in equation (11) of the report contains a strange summand, W_v^Q V_r^Q. Both tensors are trainable and are not used anywhere else in the network, so we have replaced this product with a single trainable vector.
  3. According to the report, the hidden layer size should be 75, but we get better results with a smaller size. With 75 units, overfitting is severe.
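Because W_v^Q V_r^Q does not depend on the input, the product is just one learned vector, so replacing it with a single trainable vector b should not reduce what the layer can express. The NumPy sketch below assumes, as we read the report, that the first formula of (11) is s_j = v^T tanh(W_u^Q u_j^Q + W_v^Q V_r^Q), followed by a softmax and the pooled vector r^Q = sum_j a_j u_j^Q that initializes the pointer network. Weights are random and all names and shapes are hypothetical; the check shows that both parameterizations give the same r^Q when b = W_v^Q V_r^Q.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def question_pooling(u_q, W_u, b, v):
    """Attention-pool question encodings into r^Q using one trainable vector b.

    u_q: [m, d] question encodings u_1^Q ... u_m^Q
    W_u: [h, d] projection W_u^Q
    b:   [h]    stands in for the product W_v^Q V_r^Q
    v:   [h]    scoring vector
    """
    scores = np.tanh(u_q @ W_u.T + b) @ v  # s_j, shape [m]
    weights = softmax(scores)              # a_j, sums to 1
    return weights @ u_q                   # r^Q = sum_j a_j u_j^Q, shape [d]


def question_pooling_report(u_q, W_u, W_v, V_r, v):
    """Same pooling with separate parameters W_v: [h, k] and V_r: [k]."""
    scores = np.tanh(u_q @ W_u.T + W_v @ V_r) @ v
    return softmax(scores) @ u_q


rng = np.random.default_rng(1)
m, d, h, k = 6, 4, 3, 5
u_q = rng.standard_normal((m, d))
W_u = rng.standard_normal((h, d))
W_v = rng.standard_normal((h, k))
V_r = rng.standard_normal(k)
v = rng.standard_normal(h)

r_report = question_pooling_report(u_q, W_u, W_v, V_r, v)
r_single = question_pooling(u_q, W_u, W_v @ V_r, v)
print(np.allclose(r_report, r_single))  # True
```

The single vector has h parameters instead of h*k + k, and the two factors W_v^Q and V_r^Q receive no gradient from anywhere else in the network, so the factorization adds parameters without adding capacity.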

We are not sure whether we applied dropout correctly, and the report says nothing about weight initialization. We also cannot rule out bugs in our code.

Instructions

  1. Parse and split the data
    python parse_data.py data/train-v1.1.json --train_ratio 0.9 --outfile data/train_parsed.json --outfile_valid data/valid_parsed.json
    python parse_data.py data/dev-v1.1.json --outfile data/dev_parsed.json
  2. Preprocess the data
    python preprocessing.py data/train_parsed.json --outfile data/train_data.pkl
    python preprocessing.py data/valid_parsed.json --outfile data/valid_data.pkl
    python preprocessing.py data/dev_parsed.json --outfile data/dev_data.pkl
  3. Train the model
    python train.py --hdim 40 --batch_size 70 --nb_epochs 50 --optimizer adam --dropout 0.2
  4. Predict on dev/test set samples
    python predict.py model/your-model prediction.json
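The --train_ratio 0.9 option in the first step holds out part of the SQuAD training file as a validation set. We do not know exactly how parse_data.py performs the split; the sketch below is a hypothetical helper, split_squad, that splits at the article level of the SQuAD v1.1 JSON layout (data, then paragraphs, then qas). It runs on a toy dataset; the seed and helper names are illustrative only.

```python
import random


def split_squad(dataset, train_ratio=0.9, seed=0):
    """Hypothetical helper: split a SQuAD v1.1-style dict into (train, valid).

    Whole articles go to one side, so no article appears in both sets.
    """
    articles = list(dataset["data"])
    random.Random(seed).shuffle(articles)
    n_train = int(round(len(articles) * train_ratio))
    version = dataset.get("version")
    return ({"version": version, "data": articles[:n_train]},
            {"version": version, "data": articles[n_train:]})


def count_questions(dataset):
    """Number of question entries across all articles and paragraphs."""
    return sum(len(par["qas"]) for art in dataset["data"] for par in art["paragraphs"])


# Toy dataset in the SQuAD v1.1 layout: 10 articles with 2 questions each.
toy = {
    "version": "1.1",
    "data": [
        {"title": f"article_{i}",
         "paragraphs": [{"context": "some text",
                         "qas": [{"id": f"{i}-{j}", "question": "?",
                                  "answers": [{"answer_start": 0, "text": "some"}]}
                                 for j in range(2)]}]}
        for i in range(10)
    ],
}
train, valid = split_squad(toy, train_ratio=0.9)
print(len(train["data"]), len(valid["data"]))          # 9 1
print(count_questions(train), count_questions(valid))  # 18 2

# With the real file (not run here), after `import json`:
#   with open("data/train-v1.1.json") as f:
#       train, valid = split_squad(json.load(f), train_ratio=0.9)
```

Splitting by article rather than by question mirrors how SQuAD itself partitions Wikipedia articles between its train and dev sets, and it keeps questions about the same passage from leaking into validation.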

Contributors

mahnerak, hrant-khachatrian, tigrangalstyan, lixinsu
