Modeling Long-Distance Dependencies with Second-Order LSTMs

Stanford CS224N Natural Language Processing with Deep Learning

We present our results in this paper.


Datasets

The m-bounded Dyck-k parenthesis dataset is included in the data/mbounded-dyck-k directory.

To download the Penn Treebank and Wiki2 datasets, run:

sh getdata.sh

Training and testing the language model

run.sh includes all training and testing commands.

Flag Clarifications

  • When --log <LOG_NAME> is specified, the model parameters, checkpoints, and TensorBoard log are saved under /log/LOG_NAME. Without the flag, no logs are recorded.
  • If --is-stream 0 is specified, we treat each line in the dataset as a separate sample sentence. <start> and <end> tokens are added to each sentence, and all sentences are padded to the same length and then batched. The number of samples in the dataset must be divisible by the batch size. This option is used for the parenthesis dataset and the syntax evaluation dataset.
  • If --is-stream 1 is specified, we treat the entire dataset as one continuous stream. The last hidden state of a batch is used to initialize the LSTM for the next batch. The dataset is padded to a multiple of batch size and bptt. The backpropagation-through-time flag --bptt determines the sequence length of each batch. This option is used for natural language datasets like Penn Treebank and Wiki2.
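
The two --is-stream modes described above can be illustrated with a small, framework-free sketch. This is not the repository's data loader: the function names, the "<pad>" token, and the reading of "a multiple of batch size and bptt" as a multiple of batch_size * bptt are all assumptions made for illustration.

```python
from typing import List

# "<start>" and "<end>" come from the README; "<pad>" is an assumed padding token.
PAD, START, END = "<pad>", "<start>", "<end>"


def batch_sentences(lines: List[str], batch_size: int) -> List[List[List[str]]]:
    """Sketch of --is-stream 0: one sample per line, wrapped and padded."""
    if not lines:
        return []
    if len(lines) % batch_size != 0:
        raise ValueError("number of samples must be divisible by the batch size")
    samples = [[START] + line.split() + [END] for line in lines]
    max_len = max(len(s) for s in samples)
    # Pad every sentence to the length of the longest one.
    padded = [s + [PAD] * (max_len - len(s)) for s in samples]
    return [padded[i:i + batch_size] for i in range(0, len(padded), batch_size)]


def batch_stream(tokens: List[str], batch_size: int, bptt: int) -> List[List[List[str]]]:
    """Sketch of --is-stream 1: one continuous stream cut into bptt-long windows."""
    chunk = batch_size * bptt
    padded_len = -(-len(tokens) // chunk) * chunk  # round up to a multiple of chunk
    tokens = tokens + [PAD] * (padded_len - len(tokens))
    row_len = padded_len // batch_size
    # Each row is a contiguous slice of the stream, so row r of batch i+1
    # continues row r of batch i. That is why the last hidden state of one
    # batch can initialize the LSTM for the next batch.
    rows = [tokens[r * row_len:(r + 1) * row_len] for r in range(batch_size)]
    return [[row[t:t + bptt] for row in rows] for t in range(0, row_len, bptt)]


sentence_batches = batch_sentences(["( )", "( ( ) )"], batch_size=2)
stream_batches = batch_stream(list("abcdefg"), batch_size=2, bptt=2)
```

In the stream case, each row of a batch picks up where the same row of the previous batch ended, whereas in the sentence case every sample is independent and padded to a common length.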

Analysis

  • To show the principal angles between sub-spaces (PABS) for pairs of hidden-to-hidden weight matrices, run compare_weights.py with a model checkpoint.
  • To evaluate the syntax of generated texts, see the documentation in LM_syneval.
  • (Experimental) To generate heatmaps, go to the heatmap branch and run:
python3 run.py --mode train --model attention --train-path=./data/wikitext-2/train.txt --valid-path=./data/wikitext-2/valid.txt --checkpoint ./LOG_PATH/best_val_ppl.pth --hidden-size 600 --embedding-size 300 --batch-size 10 --bptt 70 --dropout 0.5 --lr 1e-4 --is-stream 1 --epochs 1000 --second-order-size 5 > ./data/heatmaps/sample_att_wiki_5cell.txt
  • The mode must be set to train because the temperature needs to be set high to show how confidently each word is assigned to each cell. The batch and attention information is printed and redirected to an output file, which is then used to generate the heatmaps:
python3 utils/heatmap.py --input data/heatmaps/sample_att_wiki_5cell.txt --output data/heatmaps/ --prefix att_wiki_5cell
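
For the first item in the list above, principal angles between two subspaces can be computed with a QR decomposition followed by an SVD. The NumPy sketch below shows that standard construction; it is not taken from compare_weights.py. Treating each subspace as the column span of a full-column-rank matrix is an assumption, and the function name is hypothetical.

```python
import numpy as np


def principal_angles(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between the column spans of a and b.

    Assumes both matrices have full column rank.
    """
    # Orthonormal bases for the two column spaces.
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


# Two planes in R^3 that share the x-axis and are otherwise orthogonal.
xy_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
xz_plane = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(xy_plane, xz_plane)
```

Small angles indicate that the two matrices span nearly the same directions; angles near pi/2 indicate nearly orthogonal directions. SciPy provides a comparable routine as scipy.linalg.subspace_angles.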

Future Work

We have modified the baseline and attention models to work with an RNN instead of an LSTM in the rnn branch. However, we did not have time to verify the correctness of the code or to aggregate results for those models.

Sanity Checks

  • test_dataset.py tests the validity of the dataset object
  • test_eval.py tests the validity of the evaluation metrics
  • test_model.py tests the validity of the models

Contributors

richardlyf, noa-codes, saspinner, dependabot[bot]
