
seominjoon / qrn


Query-Reduction Networks (QRN)

Home Page: http://uwnlp.github.io/qrn/

License: MIT License

Python 95.54% Shell 0.24% HTML 3.55% Makefile 0.67%
qrn tensorflow babi dialog qa rnn university-of-washington iclr2017

qrn's Introduction

Query-Reduction Networks (QRN)

[Figure: teaser figure for QRN]

QRN is a purely sequential model like LSTM or GRU (but simpler than them) for story-based question answering (bAbI QA tasks). QRN is implemented using TensorFlow. Here are some notable results (error rates in %) on the bAbI QA dataset:

Task LSTM MemN2N Ours
1k avg 51.3 15.2 9.9
10k avg 36.4 4.2 0.3

See model details and more results in this paper.

1. Quick Start

We are assuming you are working in a Linux environment. Make sure that you have Python (verified on 3.5; issues have been reported with 2.x) and that you have installed these Python packages: tensorflow (>=0.8, <=0.11; issues have been reported with >=0.12) and progressbar2.
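The version constraints above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `tensorflow_supported`, and `python_supported` are hypothetical, and the version string is passed in by hand so the example runs without TensorFlow installed.

```python
import sys


def parse_version(text):
    """Turn '0.11.0' or '0.12.0rc1' into a (major, minor) tuple of integers."""
    parts = []
    for piece in text.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def tensorflow_supported(version_text):
    """True when the version is in the range listed above (>=0.8, <=0.11)."""
    return (0, 8) <= parse_version(version_text) <= (0, 11)


def python_supported(version_info=sys.version_info):
    """True for Python 3.x; problems have been reported with 2.x."""
    return version_info[0] == 3


if __name__ == "__main__":
    print("Python OK:", python_supported())
    for candidate in ("0.8.0", "0.11.0", "0.12.1"):
        print(candidate, "supported:", tensorflow_supported(candidate))
```

In a real environment, the string passed to `tensorflow_supported` would typically come from the installed package's reported version.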

First, download the bAbI QA dataset (note that this downloads the dataset to $HOME/data/babi):

chmod +x download.sh; ./download.sh 

Then preprocess the data for a particular task, say Task 2 (this stores the preprocessed data in data/babi/en/02/):

python -m prepro --task 2

Finally, train the model (testing is performed automatically at the end):

python -m babi.main --noload --task 2

It took ~3 minutes on my laptop using CPU.

You can run it several times (e.g. 10), each with a new weight initialization, and report the test result of the trial with the lowest dev loss:

python -m babi.main --noload --task 2 --num_trials 10

This is critical for stably obtaining the reported results; some weight initializations lead to bad optima.
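The selection rule (report the test result of the trial with the lowest dev loss) can be sketched as follows. This is an illustration only: the `trials` list, its keys, and all numbers in it are made-up placeholders, not output from this repository.

```python
def pick_best_trial(trials):
    """Return the trial record with the lowest dev loss."""
    if not trials:
        raise ValueError("need at least one trial")
    return min(trials, key=lambda trial: trial["dev_loss"])


# Placeholder records (hypothetical values, NOT real results).
trials = [
    {"trial": 1, "dev_loss": 0.42, "test_error": 12.0},
    {"trial": 2, "dev_loss": 0.05, "test_error": 0.7},
    {"trial": 3, "dev_loss": 0.31, "test_error": 9.5},
]

best = pick_best_trial(trials)
print("report trial", best["trial"], "with test error", best["test_error"])
```

Selecting on dev loss rather than on test error keeps the test set out of the model-selection step.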

2. Visualizing Results

After training and testing, the result is stored in evals/babi/en/02-None-00-01/test_0150.json. We can visualize the magnitudes of the update and reset gates using the result file. Note that you need jinja2 (Python package). Run the following command to host a web server for visualization and open it via browser:

python -m babi.visualize_result --task 2 --open True

Then click the file(s). It takes a few seconds to load the heatmap coloring of the gate values. You will see something like this:

[Figure: visualization]

By default, visualize_result retrieves the first trial (1). If you want to retrieve a particular trial, specify its number with the --trial_num option.

3. 10k and Other Options

To train the model on the 10k dataset, first preprocess the data with the large flag:

python -m prepro --task 2 --large True

Then train the model with the large flag as well:

python -m babi.main --noload --task 2 --large True --batch_size 128 --init_lr 0.1 --wd 0.0005 --hidden_size 200

Note that batch_size, init_lr, wd, and hidden_size are changed from their defaults for the 10k setting.
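The difference between the 1k and 10k training commands can be captured in a small helper. This is a sketch under the assumption that you want to script the runs; `build_train_command` is a hypothetical name, while the flags and values are the ones shown in the commands above.

```python
def build_train_command(task, large=False):
    """Build the argument list for training on the 1k (default) or 10k dataset."""
    cmd = ["python", "-m", "babi.main", "--noload", "--task", str(task)]
    if large:
        # Extra options used for the 10k setting in this README.
        cmd += [
            "--large", "True",
            "--batch_size", "128",
            "--init_lr", "0.1",
            "--wd", "0.0005",
            "--hidden_size", "200",
        ]
    return cmd


print(" ".join(build_train_command(2)))
print(" ".join(build_train_command(2, large=True)))
```

The returned list can be passed to `subprocess.run` from the repository root if you prefer launching runs from Python.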

Finally, visualization also requires the large flag:

python -m babi.visualize_result --task 2 --open True --large True

To control other parameters and see other options, type:

python -m babi.main -h

4. Run bAbI dialog

To train the model on bAbI dialog, first preprocess the bAbI dialog dataset:

python -m prepro-dialog --task 2

Then train the model:

python -m dialog.main --noload --task 2

To use match, the use_match flag is required:

python -m dialog.main --noload --task 2 --use_match True

To use the RNN decoder, the use_rnn flag is required:

python -m dialog.main --noload --task 2 --use_rnn True

qrn's People

Contributors

seominjoon, shmsw25


qrn's Issues

Taking only the system responses as Previous utterances

In the implementation of QRN for bAbI dialog, it seems that the examples include only the bot (system) responses as the previous utterances (x_1, x_2, ..., x_T) in the dialog. Shouldn't it take the sequence of both user utterances and system utterances as the previous utterances?

Thanks in advance.

InvalidArgumentError: Received a label value of 292 which is outside the valid range of [0, 10)

I get the following error when training the model for bAbI dialog task 5.
The command line args used are:
python dialog/main.py --load=False --task 5 --num_epochs 2 --data_dir "data/dialog-babi-tasks" --val_period 1 --save_period 1 --train=True --draft=True

The exact error is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 292 which is outside the valid range of [0, 10).  Label values: 0 0 0 0 0 0 0 4 227 292 0 0 0 4 0 0 0 7 0 1 0 32 9 0 0 0 0 0 0 0 0 0
[[Node: towers/gpu_0/loss/ans_loss/SparseSoftmaxCrossEntropyWithLogits_1/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](towers/gpu_0/class/Linear_1/out1, towers/gpu_0/loss/ans_loss/Gather_2)]]

After going through the code, I found that the answers placeholder is broken into 8 pieces, where each piece refers to a different part of the answer: https://github.com/uwnlp/qrn/blob/master/prepro-dialog.py#L232

So, we get logits for each part here separately as follows:

0 = {Tensor} Tensor("towers/gpu_0/class/Linear/out0:0", shape=(32, 15), dtype=float32, device=/device:GPU:0)
1 = {Tensor} Tensor("towers/gpu_0/class/Linear_1/out1:0", shape=(32, 10), dtype=float32, device=/device:GPU:0)
2 = {Tensor} Tensor("towers/gpu_0/class/Linear_2/out2:0", shape=(32, 10), dtype=float32, device=/device:GPU:0)
3 = {Tensor} Tensor("towers/gpu_0/class/Linear_3/out3:0", shape=(32, 4), dtype=float32, device=/device:GPU:0)
4 = {Tensor} Tensor("towers/gpu_0/class/Linear_4/out4:0", shape=(32, 3), dtype=float32, device=/device:GPU:0)
5 = {Tensor} Tensor("towers/gpu_0/class/Linear_5/out5:0", shape=(32, 674), dtype=float32, device=/device:GPU:0)
6 = {Tensor} Tensor("towers/gpu_0/class/Linear_6/out6:0", shape=(32, 645), dtype=float32, device=/device:GPU:0)
7 = {Tensor} Tensor("towers/gpu_0/class/Linear_7/out7:0", shape=(32, 2), dtype=float32, device=/device:GPU:0)

where the 2nd dimension is num_classes for that piece of the answer, if/when applicable. The 2nd dimension matches the dict sizes computed for the various answer positions when pre-processing the dataset:
<class 'list'>: [15, 10, 10, 4, 3, 674, 645, 2]

But, when I run the code, it throws the error mentioned above.

I'm using TensorFlow 0.12.1 because 0.11 is now deprecated, and there are no significant changes between the two releases according to https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md#release-0120
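As a diagnostic aid, the label values from the error message can be checked against the class count of the head that rejected them. This is a minimal sketch: `find_out_of_range` is a hypothetical helper, the labels are copied from the error above, and the class count of 10 is the size reported for `Linear_1/out1`.

```python
def find_out_of_range(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes)."""
    return [(i, v) for i, v in enumerate(labels) if not 0 <= v < num_classes]


# Label values copied from the InvalidArgumentError message (batch of 32).
labels = [0, 0, 0, 0, 0, 0, 0, 4, 227, 292, 0, 0, 0, 4, 0, 0,
          0, 7, 0, 1, 0, 32, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0]

bad = find_out_of_range(labels, num_classes=10)
print(bad)
```

The offending values (227, 292, 32) would fit only the two larger heads (674 and 645 classes), which may point to a mismatch between answer pieces and output heads. The sketch itself does not establish the cause.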

Error in Dialog/visualize_result.py

When I try to visualize the result of the dialog dataset after training, I get the following error:
Traceback (most recent call last):
  File "/home/prayalankar/anaconda3/envs/tyu/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/prayalankar/anaconda3/envs/tyu/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/prayalankar/qrn/dialog/visualize_result.py", line 174, in <module>
    list_results(ARGS)
  File "/home/prayalankar/qrn/dialog/visualize_result.py", line 88, in list_results
    X, Q, Y, Y1, Y2, Y3, Y4, Y5, Y6, Y7 = data[:10]
ValueError: not enough values to unpack (expected 10, got 4)

ValueError: The shape for towers/gpu_0/networks/Bi-RNN/layer_0/FW/while/Merge_3:0 is not an invariant for the loop. on babi_rnn

python3 -m babi_rnn.main --noload --task 3

.....

WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users//qrn/babi_rnn/main.py", line 249, in <module>
tf.app.run()
File "/Users//ve_tf0.11_py3/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users//qrn/babi_rnn/main.py", line 165, in main
summary = _main(config, num_trials)
File "/Users//qrn/babi_rnn/main.py", line 217, in _main
runner.initialize()
File "/Users//qrn/babi_rnn/base_model.py", line 71, in initialize
tower.initialize()
File "/Users//qrn/babi_rnn/model.py", line 165, in initialize
sequence_length=m_length, dtype='float', num_layers=L)
File "/Users//qrn/my/tensorflow/rnn.py", line 634, in dynamic_bidirectional_rnn
time_major=time_major, feed_prev_out=feed_prev_out, scope='FW')
File "/Users//qrn/my/tensorflow/rnn.py", line 488, in dynamic_rnn
swap_memory=swap_memory, sequence_length=sequence_length, feed_prev_out=feed_prev_out)
File "/Users/qrn/my/tensorflow/rnn.py", line 606, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/Users//ve_tf0.11_py3/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2518, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/Users//ve_tf0.11_py3/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2356, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/Users//ve_tf0.11_py3/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2337, in _BuildLoop
_EnforceShapeInvariant(m_var, n_var)
File "/Users//ve_tf0.11_py3/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 578, in _EnforceShapeInvariant
% (merge_var.name, m_shape, n_shape))
ValueError: The shape for towers/gpu_0/networks/Bi-RNN/layer_0/FW/while/Merge_3:0 is not an invariant for the loop. It enters the loop with shape (32, 91), but has shape (32, 122) after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.
(venv) ali-186590cc37a5:qrn$

babi-dialog task6

Everything works for me except the babi-dialog task6.

python -m prepro-dialog --task 6

python -m dialog.main --noload --task 6

Error message here:

Traceback (most recent call last):
  File "/home/jason/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/jason/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/jason/qrn/dialog/main.py", line 281, in <module>
    tf.app.run()
  File "/home/jason/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/jason/qrn/dialog/main.py", line 172, in main
    summary = _main(config, num_trials)
  File "/home/jason/qrn/dialog/main.py", line 238, in _main
    runner.initialize()
  File "dialog/base_model.py", line 65, in initialize
    tower.initialize()
  File "dialog/model.py", line 182, in initialize
    A = Alist[0] if self.rnn else Alist[i]
IndexError: list index out of range

Can you help? Thanks :)

Unsupported operand

Config ID <absl.flags._flag.Flag object at 0x7f3fcc0c47b8>, task <absl.flags._flag.Flag object at 0x7f3fcc0c42b0>, 1 trials
Traceback (most recent call last):
File "/home/aniket/anaconda3/envs/py305/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/aniket/anaconda3/envs/py305/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/aniket/qrn/babi/main.py", line 272, in <module>
tf.app.run()
File "/home/aniket/anaconda3/envs/py305/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/aniket/qrn/babi/main.py", line 181, in main
summary = _main(config, num_trials)
File "/home/aniket/qrn/babi/main.py", line 191, in _main
load_metadata(config)
File "/home/aniket/qrn/babi/main.py", line 135, in load_metadata
data_dir = os.path.join(config.data_dir, config.lang + ("-10k" if config.large else ""))
TypeError: unsupported operand type(s) for +: 'Flag' and 'str'

How to solve this error? Please help me.
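The traceback shows a flag object being concatenated with a string, and the first log line prints flag objects where plain values would be expected. One possible workaround, sketched below, is to unwrap each flag object to its `.value` before the config is used. This is an assumption about the cause, not a confirmed fix: `StandInFlag` is a hypothetical stand-in that mimics only the `value` attribute of an absl flag, and `unwrap` is a hypothetical helper, not a function in this repository.

```python
import os


class StandInFlag:
    """Hypothetical stand-in for an absl flag object; mimics only `.value`."""

    def __init__(self, value):
        self.value = value


def unwrap(config):
    """Replace flag-like objects with their plain values; leave others as-is."""
    return {name: getattr(item, "value", item) for name, item in config.items()}


# Hypothetical config mixing flag objects and plain values.
config = {
    "lang": StandInFlag("en"),
    "large": StandInFlag(False),
    "data_dir": "data/babi",
}

plain = unwrap(config)
# Same expression as in the traceback, now operating on plain values.
data_dir = os.path.join(plain["data_dir"], plain["lang"] + ("-10k" if plain["large"] else ""))
print(data_dir)
```

Running with a TensorFlow version in the range listed in the Quick Start section is the other option, since that is the configuration the README reports as verified.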

babi-dialog task5

Hello,

I ran into the problem below when running Task 5 in babi-dialog (tasks 1-4 are fine). I checked the code, and it seems like the loss is NaN in this case. Could you please help me with the issue?

InvalidArgumentError (see above for traceback): Nan in summary histogram for: HistogramSummary_8
	 [[Node: HistogramSummary_8 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](HistogramSummary_8/tag, gpu_sync/average_gradients/Mean_8)]]

My Python version is 3.5 and TensorFlow is 0.11.0.

@shmsw25 @seominjoon can you?
