unilight / r-net-in-tensorflow

R-NET implementation in TensorFlow.

Home Page: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf

Shell 0.96% Python 99.04%

machine-comprehension tensorflow squad nlp

R-NET in Tensorflow

This repository is a TensorFlow implementation of R-NET, a neural network designed to solve the Question Answering (QA) task.
This implementation is specifically designed for SQuAD, a large-scale dataset that has recently drawn a lot of attention in the field of QA.
If you have any questions, contact [email protected].

Updates and Acknowledgements

17.12.30

As several people have requested recently, I have released a set of trained model weights. Details can be found in the Current Results section below.

17.12.12

I'd like to thank Fan Yang for pointing out several bugs when evaluating models. First, the model to be evaluated needs to be explicitly specified when executing the evaluate.py program. See the Usage section below. Also, I fixed some problems when loading characters.

17.11.10

I'd like to thank Elías Jónsson for pointing out a problem in the mapping between characters and their indices. Previously, the indices used for training and testing (dev set) were inconsistent. In fact, a separate mapping for testing should not be constructed at all: if the model sees a character during testing that it has not seen in the training set, it should mark it as OOV. The table is now constructed from the training set only and is used in both training and testing.
Since some have asked how to turn the character embeddings off, this can now be done by changing the corresponding hyperparameter in Models/config.json.
I applied dropout to various components of the model, including all LSTM cells, passage and question encoding, question-passage matching, self-attention, and question representation. This led to an improvement of about 3%.
As I read the original paper more carefully, I found that the authors used Adadelta as the optimizer and 3 layers of bi-GRU to encode both passage and question. Changing from Adam to Adadelta led to an improvement of roughly 1%. In my experiments, stacking layers increased the number of epochs required for convergence, and 2 layers performed better than 3. Details are given in the Current Results section.
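
The character-index fix described above can be sketched as follows. This is only an illustration, not the code in this repository: the function names `build_char_table` and `encode_chars`, and the choice of index 0 for OOV, are assumptions made for the example.

```python
OOV_INDEX = 0  # assumption: index 0 is reserved for characters unseen in training


def build_char_table(training_texts):
    """Map every character seen in the training set to a unique index."""
    table = {}
    for text in training_texts:
        for ch in text:
            if ch not in table:
                table[ch] = len(table) + 1  # indices start at 1; 0 means OOV
    return table


def encode_chars(text, table):
    """Encode text with the training-set table; unseen characters become OOV."""
    return [table.get(ch, OOV_INDEX) for ch in text]


train_table = build_char_table(["abc", "cab"])
print(encode_chars("abd", train_table))  # → [1, 2, 0]; 'd' never appeared in training
```

The same `train_table` is reused for the dev set, so a character can never receive different indices in training and testing.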

Dependency

Python 3.6
Tensorflow-gpu 1.2.1
Numpy 1.13.1
NLTK

Usage

First we need to download SQuAD as well as the pre-trained GloVe word embeddings. This should take roughly 30 minutes, depending on network speed.

cd Data
sh download.sh
cd ..

Data preprocessing, including tokenizing and collection of pre-trained word embeddings, can take about 15 minutes. Two kinds of files, {data/shared}_{train/dev}.json, will be generated and stored in Data.
- shared: including the original and tokenized articles, GloVe word embeddings and character dictionaries.
- data: including the ID, corresponding article id, tokenized question and the answer indices.

python preprocess.py --gen_seq True
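
To confirm that preprocessing finished, one could check for the four files implied by the {data/shared}_{train/dev}.json pattern. The sketch below assumes the files are written directly into the Data directory; the helper names `expected_preprocessed_files` and `missing_files` are hypothetical.

```python
import os
from itertools import product


def expected_preprocessed_files(data_dir="Data"):
    """Return the four paths implied by {data/shared}_{train/dev}.json."""
    return [
        os.path.join(data_dir, f"{kind}_{split}.json")
        for kind, split in product(("data", "shared"), ("train", "dev"))
    ]


def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]


missing = missing_files(expected_preprocessed_files())
print("Missing files:", missing if missing else "none")
```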

Train R-NET by simply executing the following. The program will
1. Read the training data, and then build the model. This should take around an hour, depending on hardware.
2. Train for 12 epochs, by default.
Hyperparameters can be specified in Models/config.json. The training log, including the mean loss and mean EM score for each epoch, will be stored in Results/rnet_training_result.txt. Note that the scores that appear during training could be lower than the scores from the official evaluator. The models will be stored in Models/save/.

python rnet.py

To evaluate the model on the dev set, run evaluate.py as shown below. The result will be stored in Results/rnet_prediction.txt. Note that the scores that appear during evaluation could be lower than the scores from the official evaluator. Note: the model to be evaluated has to be specified explicitly. For example, if the model was trained for 12 epochs (the default), Models/save/ should contain 5 saved models:

rnet_model8.ckpt.meta
rnet_model8.ckpt.data-00000-of-00001
rnet_model8.ckpt.index
...
rnet_model11.ckpt.meta
rnet_model11.ckpt.data-00000-of-00001
rnet_model11.ckpt.index
rnet_model_final.ckpt.meta
rnet_model_final.ckpt.data-00000-of-00001
rnet_model_final.ckpt.index

Here, rnet_model11 and rnet_model_final are the same. For example, to evaluate rnet_model_final, run the following:

python evaluate.py --model_path Models/save/rnet_model_final.ckpt
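
Since the checkpoint has to be named explicitly, it can help to list which checkpoint prefixes are available. The sketch below is a hypothetical helper, not part of the repository; it assumes each saved model has one `.index` file, as in the listing above, and returns the prefix to pass to --model_path.

```python
import os


def list_checkpoints(save_dir="Models/save"):
    """Return checkpoint prefixes found in save_dir, one per .index file."""
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(save_dir)):
        if name.endswith(suffix):
            # Strip ".index" to get e.g. Models/save/rnet_model_final.ckpt
            prefixes.append(os.path.join(save_dir, name[: -len(suffix)]))
    return prefixes


if os.path.isdir("Models/save"):
    for prefix in list_checkpoints():
        print("python evaluate.py --model_path", prefix)
```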

To get the final official score, you need to use the official evaluation script, which is in the Results directory.

python Results/evaluate-v1.1.py Data/dev-v1.1.json Results/rnet_prediction.txt
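
For intuition about the two reported metrics, here is a simplified sketch of how EM and F1 can be computed for a single prediction against a single reference answer, in the style of SQuAD v1.1 scoring. It is not a replacement for the official script, which also handles multiple reference answers per question and averages over the whole dev set; the function names here are chosen for the example.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, truth):
    """True if both strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # → True
print(f1_score("in Paris France", "Paris"))
```

An answer that overlaps the reference without matching it exactly scores 0 on EM but still earns partial F1, which is why F1 is consistently higher than EM in the results table.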

Current Results

Model	Dev EM Score	Dev F1 Score
Original Paper	71.1	79.5
My (Adadelta, 2 layer, dropouts, w/o char emb)	62.6	71.5
My (Adadelta, 1 layer, dropouts, w/o char emb)	61.0	70.3
My (Adam, 1 layer, dropouts, w/o char emb)	60.8	70.5
My (Adam, 1 layer, w/o char emb)	57.8	67.9
My (Adam, 1 layer, w/ char emb)	60.1	68.9

You can find the current leaderboard and compare with other models.

Trained model weights

As several people have requested recently, a set of trained model weights can be downloaded here. Unzip the archive to find 3 files, put them in Models/save/, and evaluate them by following the instructions above. This set of parameters was obtained by training for 28 epochs with the current settings, and achieved 62.2/71.5 (EM/F1) on the dev set. I didn't save each set of model weights when I originally ran the experiments, so I reran the experiment, which resulted in a slight degradation compared with the best score in the table above. The difference may come from random initialization, so feel free to train your own model weights.

Discussion

Reproduction

As shown above, I have not yet been able to reproduce the results of the original paper. Several technical details concern me:

Data Preprocessing. I have tried two preprocessing approaches, one of which is used in the implementation of Match-LSTM, and the other is used in the implementation of Bi-DAF. While the latter approach includes lots of reasonable processing, I chose the former one empirically since it yields better performance.
As pointed out in another implementation of R-NET in Keras,

The first formula in (11) of the report contains a strange summand W_v^Q V_r^Q. Both tensors are trainable and are not used anywhere else in the network. We have replaced this product with a single trainable vector.

However, instead of replacing the product with a single trainable vector, I followed the notation and still used two vectors.
Variable sharing. The notation in the original paper was very confusing to me. For example, W_v^P appears in both equations (4) and (8). In my opinion, they should not be the same variable, since they are multiplied by vectors from totally different spaces. As a result, I treat them as different variables.
Hyperparameter ambiguity. Some hyperparameters weren't specified in the original paper, including the character embedding matrix dimension, the truncation length of articles and questions, and the maximum length of the answer span during inference. I chose my own values empirically, mostly following the settings of Match-LSTM and Bi-DAF.
Any other implementation mistakes and bugs.
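
To make the answer-span hyperparameter concrete, the sketch below shows one common way to pick an answer span at inference time under a maximum span length. It is an assumption about how such a limit is typically applied, not a description of this repository's code; the name `best_span` is hypothetical, and the default of 20 simply mirrors the `span_length` value in the configuration printed in one of the issues below.

```python
def best_span(start_probs, end_probs, max_span_length=20):
    """Return (start, end, score) maximizing start_probs[i] * end_probs[j]
    subject to i <= j < i + max_span_length."""
    best = (0, 0, -1.0)
    for i, p_start in enumerate(start_probs):
        last = min(len(end_probs), i + max_span_length)
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


start = [0.1, 0.7, 0.2]
end = [0.6, 0.1, 0.3]
print(best_span(start, end, max_span_length=2)[:2])  # → (1, 2)
print(best_span(start, end, max_span_length=1)[:2])  # → (1, 1)
```

A smaller limit rules out long spans that happen to have a high end probability, which changes predictions and therefore the scores; this is one reason an unspecified value makes exact reproduction hard.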

OOM

The full model could not be trained on an NVIDIA Tesla K40m with 12 GiB of memory; TensorFlow reports an OOM error. There are a few possible solutions.

Run on CPU. This can be achieved by masking the GPU devices on the command line as follows. In fact, the results shown in the previous section were generated by a model trained on CPU. However, training this way can be extremely slow. In my experience, it takes roughly 24 hours per epoch.

CUDA_VISIBLE_DEVICES="" python rnet.py
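
The same masking can be done from inside Python, which may be convenient in notebooks or wrapper scripts. This is a minimal sketch; it assumes the variable is set before TensorFlow initializes CUDA, and setting it before the import is the safe choice.

```python
import os

# Hide all GPUs from CUDA. Do this before TensorFlow is imported so the
# setting is in place when the devices are first enumerated.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf  # import only after the variable has been set
print("CUDA_VISIBLE_DEVICES =", repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```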

Reduce hyperparameters. Modifying these parameters might help:
- p_length
- Word embedding dimension: change from 300d GloVe vectors to 100d.
Don't use character embeddings. According to Bi-DAF, character embeddings don't help much. However, Bi-DAF uses 1D-CNNs to generate the character embeddings, while R-NET uses RNNs. As shown in the previous section, performance dropped by about 2% without them. Further investigation is needed here.
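
The hyperparameter changes above could be scripted instead of edited by hand. The sketch below is hypothetical: `override_config` is not part of the repository, and it assumes Models/config.json is a flat JSON object whose keys include `p_length` and `char_emb`, as suggested by the configuration printed in one of the issues below. Check the actual file before relying on it.

```python
import json


def override_config(path, **overrides):
    """Load a flat JSON config, override existing keys only, and write it back."""
    with open(path) as f:
        config = json.load(f)
    unknown = set(overrides) - set(config)
    if unknown:
        # Refuse to add keys the model code would silently ignore.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config


# Hypothetical usage: shorter passages and no character embeddings.
# override_config("Models/config.json", p_length=200, char_emb=False)
```

Rejecting unknown keys guards against typos such as `p_lenght`, which would otherwise leave the memory footprint unchanged without any warning.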

r-net-in-tensorflow's Issues

Fine-tune from checkpoint

Is it possible to do a transfer learning on a new similar QA dataset by using the checkpoint from Link as an initialization point?

the result is not good by default setting

11 epoch 1434 batch, Loss:213.51, Acc:0.35
11 epoch 1435 batch, Loss:264.95, Acc:0.37
11 epoch 1436 batch, Loss:293.33, Acc:0.27
11 epoch 1437 batch, Loss:219.04, Acc:0.35
11 epoch 1438 batch, Loss:232.19, Acc:0.42
11 epoch 1439 batch, Loss:199.04, Acc:0.47
11 epoch 1440 batch, Loss:228.41, Acc:0.32
11 epoch 1441 batch, Loss:283.48, Acc:0.30
11 epoch 1442 batch, Loss:183.52, Acc:0.50
11 epoch 1443 batch, Loss:390.74, Acc:0.25
11 epoch 1444 batch, Loss:155.02, Acc:0.53
11 epoch 1445 batch, Loss:168.39, Acc:0.53
11 epoch 1446 batch, Loss:275.70, Acc:0.27
11 epoch 1447 batch, Loss:185.64, Acc:0.50
11 epoch 1448 batch, Loss:207.22, Acc:0.37
11 epoch 1449 batch, Loss:292.28, Acc:0.30
11 epoch 1450 batch, Loss:244.72, Acc:0.25
11 epoch 1451 batch, Loss:214.99, Acc:0.43
11 epoch 1452 batch, Loss:194.86, Acc:0.30
11 epoch 1453 batch, Loss:285.40, Acc:0.35
11 epoch 1454 batch, Loss:225.28, Acc:0.43
11 epoch 1455 batch, Loss:206.02, Acc:0.38
11 epoch 1456 batch, Loss:194.24, Acc:0.47

save path: Models/save/rnet_model_final.ckpt
gpuws@gpuws32g:~/ub16_prj/R-NET-in-Tensorflow$

Why the max length of paragraph is 300?

Hi, @unilight
I have run the code successfully, but when reviewing the code I found that the max length of a paragraph is 300. As far as I know, paragraphs in SQuAD can be much longer than 300 words. Could you help me understand this? Thanks.

Can we do transfer learning on R-net?

i.e. how can we add new paragraphs for R-net to generate answer from in real practice? Thanks

best loss ?

Can you share the best train/val loss you observed ?

Best Model

Could you maybe make the best model accessible for download?
I tried to reach your results but I struggle with the default configuration to do so.

Issue in evaluation using pretrained weights

I was trying to run evaluate.py as using the pretrained weights provided here http://slam.iis.sinica.edu.tw/demo/RNet/release.zip
But I'm getting errors, and I have no clue why. Posting the whole output here.

python evaluate.py --model_path Models/save/rnet_model27.ckpt
Model Configs:
{'a_length': 20,
'batch_size': 60,
'char_emb': False,
'char_emb_mat_dim': 8,
'char_max_length': 37,
'char_vocab_size': 1368,
'emb_dim': 300,
'glove': '300',
'in_keep_prob': 1.0,
'p_length': 300,
'q_length': 30,
'share_context_LSTM': True,
'span_length': 20,
'state_size': 75,
'word_emb_dim': 300}
Reading data
Loaded 10570 examples from dev
/home/brojo/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Question and Passage Encoding
Tensor("encoding/stack:0", shape=(60, 30, 150), dtype=float32)
Tensor("encoding/stack_1:0", shape=(60, 300, 150), dtype=float32)
Question-Passage Matching
v_P Tensor("stack:0", shape=(60, 300, 75), dtype=float32)
Self-Matching Attention
h_P Tensor("Self_match/stack:0", shape=(60, 300, 150), dtype=float32)
Output Layer
r_Q Tensor("Sum_600:0", shape=(60, 150), dtype=float32)
h_t1a Tensor("Sum_600:0", shape=(60, 150), dtype=float32)
Tensor("strided_slice_1200:0", shape=(60, 150), dtype=float32)
h_t1a Tensor("strided_slice_1200:0", shape=(60, 150), dtype=float32)
[<tf.Tensor 'Softmax_601:0' shape=(60, 300) dtype=float32>, <tf.Tensor 'Softmax_602:0' shape=(60, 300) dtype=float32>]
Model built
W_uQ:0 (150, 75)
W_uP:0 (150, 75)
W_vP:0 (75, 75)
W_g_QP:0 (300, 300)
W_smP1:0 (75, 75)
W_smP2:0 (75, 75)
W_g_SM:0 (150, 150)
W_ruQ:0 (150, 150)
W_vQ:0 (75, 150)
W_VrQ:0 (30, 75)
W_hP:0 (150, 75)
W_ha:0 (150, 75)
B_v_QP:0 (75,)
B_v_SM:0 (75,)
B_v_rQ:0 (150,)
B_v_ap:0 (75,)
encoding/context_encoding/cell_0/fw/basic_lstm_cell/weights:0 (375, 300)
encoding/context_encoding/cell_0/fw/basic_lstm_cell/biases:0 (300,)
encoding/context_encoding/cell_0/bw/basic_lstm_cell/weights:0 (375, 300)
encoding/context_encoding/cell_0/bw/basic_lstm_cell/biases:0 (300,)
encoding/context_encoding/cell_1/fw/basic_lstm_cell/weights:0 (225, 300)
encoding/context_encoding/cell_1/fw/basic_lstm_cell/biases:0 (300,)
encoding/context_encoding/cell_1/bw/basic_lstm_cell/weights:0 (225, 300)
encoding/context_encoding/cell_1/bw/basic_lstm_cell/biases:0 (300,)
QP_match/basic_lstm_cell/weights:0 (375, 300)
QP_match/basic_lstm_cell/biases:0 (300,)
Self_match/bidirectional_rnn/fw/basic_lstm_cell/weights:0 (225, 300)
Self_match/bidirectional_rnn/fw/basic_lstm_cell/biases:0 (300,)
Self_match/bidirectional_rnn/bw/basic_lstm_cell/weights:0 (225, 300)
Self_match/bidirectional_rnn/bw/basic_lstm_cell/biases:0 (300,)
basic_lstm_cell/weights:0 (300, 600)
basic_lstm_cell/biases:0 (600,)
177 batches
2018-03-25 17:21:17.286704: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-25 17:21:17.286732: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-25 17:21:17.286748: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-03-25 17:21:17.286755: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-25 17:21:17.286763: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-03-25 17:21:22.248791: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key Self_match/bidirectional_rnn/fw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.249078: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key Self_match/bidirectional_rnn/fw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.249297: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key Self_match/bidirectional_rnn/bw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.249521: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key Self_match/bidirectional_rnn/bw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.249730: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key QP_match/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.249941: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key QP_match/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.251026: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_1/fw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.251265: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_1/fw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.251493: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_1/bw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.251702: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_1/bw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.251928: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_0/fw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.252137: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_0/fw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.252360: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_0/bw/basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.252568: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key encoding/context_encoding/cell_0/bw/basic_lstm_cell/biases not found in checkpoint
2018-03-25 17:21:22.252776: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key basic_lstm_cell/weights not found in checkpoint
2018-03-25 17:21:22.252999: W tensorflow/core/framework/op_kernel.cc:1152] Not found: Key basic_lstm_cell/biases not found in checkpoint
Traceback (most recent call last):
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/home/brojo/anaconda3/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Key Self_match/bidirectional_rnn/fw/basic_lstm_cell/weights not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "evaluate.py", line 105, in <module>
run()
File "evaluate.py", line 43, in run
new_saver.restore(sess, saved_model)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1457, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key Self_match/bidirectional_rnn/fw/basic_lstm_cell/weights not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]

Caused by op 'save/RestoreV2_9', defined at:
File "evaluate.py", line 105, in <module>
run()
File "evaluate.py", line 41, in run
new_saver = tf.train.Saver()
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1056, in __init__
self.build()
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1086, in build
restore_sequentially=self._restore_sequentially)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 669, in restore_v2
dtypes=dtypes, name=name)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/brojo/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()

NotFoundError (see above for traceback): Key Self_match/bidirectional_rnn/fw/basic_lstm_cell/weights not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]

settings for the best results

I just wonder what were your configurations for getting the best results of 62.6 and 71.5?
I used the default settings of your code and can only obtain 48 F1.

Resource exhausted: OOM when allocating tensor with shape[60,1200]

Hi @unilight

Thanks for your wonderful reproduction of R-Net. I encountered this error below:

W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[60,1200]

In your README you mentioned that there are some solutions to this OOM problem. Could you elaborate on the third solution, where you mentioned:

To achieve this one might have to hack into Models/models_rnet.

How to do it? Thanks :)

Need to add __init__.py to Models Package

Otherwise you get

Traceback (most recent call last):
  File "rnet.py", line 2, in <module>
    from Models import model_rnet
ImportError: No module named Models
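
A possible workaround, sketched under the assumption that the interpreter in use needs an `__init__.py` to treat Models as a regular package: create an empty one. The helper name `ensure_package` is hypothetical.

```python
from pathlib import Path


def ensure_package(directory):
    """Create an empty __init__.py in directory if it does not exist yet."""
    init_file = Path(directory) / "__init__.py"
    init_file.touch(exist_ok=True)
    return init_file


# Run from the repository root:
# ensure_package("Models")
```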