
rw_fg's Introduction

This repository contains the dataset and code for the INLG 2019 paper "Revisiting Challenges in Data-to-Text Generation with Fact Grounding".

🤝Please kindly cite this work if it helps your research:

@inproceedings{wang-2019-revisiting,
title = "Revisiting Challenges in Data-to-Text Generation with Fact Grounding",
author = "Wang, Hongmin",
booktitle = "Proceedings of the 12th International Conference on Natural Language Generation",
month = oct # "{--}" # nov,
year = "2019",
address = "Tokyo, Japan",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W19-8639",
doi = "10.18653/v1/W19-8639",
pages = "311--322"}

Get the dataset

  • 👌The dataset/scripts directory contains the scripts to prepare the dataset from scratch.

    • The crawl, purification and enrichment directories each have their own README.md describing what they do.
  • 👍Alternatively, download the data from rotowire_fg and place the 3 folders under dataset/

    • The final purified, enriched and enlarged dataset is included in new_ncpcc, so there is no need to run the scripts.
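To confirm that the downloaded folders are in place before preprocessing, a small check like the following may help. It is a sketch under assumptions: the file names are copied from the preprocessing command quoted in the Issues section (not from the author's documentation), and the default base path dataset/new_ncpcc is a guess. Pass the real base directory as the first argument.

```python
"""Hypothetical sanity check for the dataset layout.

Assumption: the file names below come from the preprocessing command quoted
in the Issues section and may not match the released folders exactly.
"""
import os
import sys

# Paths relative to the dataset base directory; treat this list as illustrative.
EXPECTED_FILES = [
    "train/src_train.norm.trim.ncp.txt",
    "train/train_content_plan_ids.txt",
    "train/tgt_train.norm.filter.mwe.trim.txt",
    "train/train_ptrs.txt",
    "valid/src_valid.norm.trim.ncp.txt",
    "valid/valid_content_plan_ids.txt",
    "valid/tgt_valid.norm.filter.mwe.trim.txt",
]


def missing_files(base):
    """Return the expected files that do not exist under `base`."""
    return [rel for rel in EXPECTED_FILES
            if not os.path.isfile(os.path.join(base, rel))]


if __name__ == "__main__":
    # The default base path is a guess; pass the real one as the first argument.
    base = sys.argv[1] if len(sys.argv) > 1 else os.path.join("dataset", "new_ncpcc")
    missing = missing_files(base)
    if missing:
        print("Missing under %s:" % base)
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files found under %s" % base)
```

An empty result only means the files exist. It does not check their contents.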

Run the model

👉Please see the README under model/

Run Evaluation Script

👉Refer to the evaluation command in model/run.sh

Contributors

wanghm92

rw_fg's Issues

Evaluation error

When I use the evaluation script in this repo to evaluate the tgt_test.norm.filter.mwe.trim.txt file against the tgt_test.norm.filter.mwe.trim.txt file in the new_ncpcc folder, the precision is only 95.16, which is even lower than the precision reported for the NCP+TR model in your paper (95.70). There must be something wrong!
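Before suspecting the metric, it may be worth confirming that the two files being scored really are identical, for example that neither was re-tokenized or truncated. The helper below lists the 1-based line numbers where two text files differ. It is a hypothetical sketch (differing_lines is not part of the repository) and does not reproduce the repository's evaluation.

```python
import io
import sys


def differing_lines(path_a, path_b):
    """Return 1-based line numbers where two UTF-8 text files differ.

    Lines that exist in only one file (a length mismatch) are reported too.
    """
    with io.open(path_a, encoding="utf-8") as fa:
        a = fa.read().splitlines()
    with io.open(path_b, encoding="utf-8") as fb:
        b = fb.read().splitlines()
    diffs = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Trailing lines present only in the longer file also count as differences.
    diffs.extend(range(min(len(a), len(b)) + 1, max(len(a), len(b)) + 1))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    diffs = differing_lines(sys.argv[1], sys.argv[2])
    print("%d differing line(s); first few: %s" % (len(diffs), diffs[:5]))
```

An empty list means the inputs match line for line, so any remaining gap lies in the scoring itself rather than in the files.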

RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMasked.cu:13

Hi, interesting work and thanks for providing the code!
I cloned your code and tried to train the model, but got the following error. I am not sure what the cause is, as there are few posts about this issue on Google. Here is the exception:

Loading train dataset from ../dataset/new_ncpcc/pt_data/newcc-trl/roto-newcc-trl.train.1.pt, number of examples: 5232
rw_fg/model/onmt/modules/GlobalAttention.py:176: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  align_vectors = self.sm(align.view(batch*targetL, sourceL))
rw_fg/model/onmt/modules/CopyGenerator.py:100: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logits)
Traceback (most recent call last):
  File "train.py", line 463, in <module>
    main()
  File "train.py", line 455, in main
    train_model(model1, model2, fields, optim1, optim2, data_type, model_opt)
  File "train.py", line 261, in train_model
    train_stats, train_stats2 = trainer.train(train_iter, epoch, report_func)
  File "rw_fg/model/onmt/Trainer.py", line 182, in train
    report_stats, total_stats2, report_stats2, normalization)
  File "rw_fg/model/onmt/Trainer.py", line 404, in _gradient_accumulation
    self.model2((emb, trimmed_table_embs), tgt, src_lengths, dec_state)
  File "lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "rw_fg/model/onmt/Models.py", line 725, in forward
    memory_lengths=lengths)
  File "lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "rw_fg/model/onmt/Models.py", line 337, in forward
    tgt, (memory_bank, trimmed_table_embs), state, memory_lengths=memory_lengths)
  File "w_fg/model/onmt/Models.py", line 621, in run_forward_pass
    memory_lengths=memory_lengths)
  File "lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "rw_fg/model/onmt/modules/GlobalAttention.py", line 173, in forward
    align.data.masked_fill_(1 - mask, -float('inf'))
RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMasked.cu:13

More details:

  • My environment:
    pytorch==0.3.1
    python==2.7
  • Command:
    I changed $BASE in run.sh and ran the script.

Hope to hear from you soon,
Thanks!
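One plausible cause, stated as an assumption rather than a confirmed diagnosis, involves how the attention mask is built. In OpenNMT-py code of this vintage, GlobalAttention builds the mask with sequence_mask(memory_lengths). The mask's width then defaults to the longest entry in memory_lengths. If the memory actually attended over is wider (here the decoder receives (memory_bank, trimmed_table_embs)), the mask and the attention scores disagree in size, and masked_fill_ fails. The framework-free sketch below mimics that shape check. The names sequence_mask and checked_mask are illustrative stand-ins, not the repository's functions.

```python
"""Hypothetical, framework-free illustration of the mask/score shape check.

Assumption: per target step the attention scores have shape (batch, src_len),
and the mask is built from per-example source lengths. Not the repo's code.
"""


def sequence_mask(lengths, max_len=None):
    """Return a batch x max_len list of booleans: True where pos < length."""
    if max_len is None:
        # Defaulting to the longest length is what breaks when the memory
        # being attended over is wider than max(lengths).
        max_len = max(lengths)
    return [[pos < n for pos in range(max_len)] for n in lengths]


def checked_mask(score_shape, lengths, max_len=None):
    """Build the mask and raise ValueError if it cannot line up with the scores.

    score_shape is (batch, src_len) of the attention scores for one target step.
    """
    mask = sequence_mask(lengths, max_len)
    mask_shape = (len(mask), len(mask[0]) if mask else 0)
    if mask_shape != tuple(score_shape):
        raise ValueError("mask shape %r does not match scores %r"
                         % (mask_shape, tuple(score_shape)))
    return mask


if __name__ == "__main__":
    # Memory with 7 source positions, but the longest length is only 5.
    scores = (2, 7)
    lengths = [5, 3]
    try:
        checked_mask(scores, lengths)
    except ValueError as err:
        print("default width fails: %s" % err)
    # Using the scores' own source length as max_len keeps the shapes aligned.
    print(checked_mask(scores, lengths, max_len=scores[1])[1])
```

Later OpenNMT-py releases build this mask with sequence_mask(memory_lengths, max_len=align.size(-1)). If the repository's sequence_mask accepts max_len, passing the score width is one possible fix. Comparing the lengths passed at Models.py line 725 with the memory's size would confirm or rule out this explanation.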

What is the command to process the data?

Hi, interesting work and thanks for providing the code!
Based on the data you provided on Google Drive, I assume this is how to preprocess the data using OpenNMT:
python preprocess.py \
    -train_src1 $BASE/train/src_train.norm.trim.ncp.txt \
    -train_tgt1 $BASE/train/train_content_plan_ids.txt \
    -train_src2 $BASE/train/train_content_plan_ids.txt \
    -train_tgt2 $BASE/train/tgt_train.norm.filter.mwe.trim.txt \
    -valid_src1 $BASE/valid/src_valid.norm.trim.ncp.txt \
    -valid_tgt1 $BASE/valid/valid_content_plan_ids.txt \
    -valid_src2 $BASE/valid/valid_content_plan_ids.txt \
    -valid_tgt2 $BASE/valid/tgt_valid.norm.filter.mwe.trim.txt \
    -save_data $BASE/preprocess/ \
    -src_seq_length 1000 -tgt_seq_length 1000 \
    -dynamic_dict \
    -train_ptr $BASE/train/train_ptrs.txt

Is this correct? I assume train_content_plan_tks.txt is used for evaluation.
Thanks
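The pairing that this command relies on can be made explicit in a small builder. Judging by the file names, stage 1 maps records (src1) to content-plan ids (tgt1), and stage 2 maps the same content-plan ids (src2) to the summary text (tgt2). This is a sketch under that assumption: the flags are copied from the command in the question and have not been confirmed by the author, and preprocess_args is a hypothetical helper.

```python
import os


def preprocess_args(base, splits=("train", "valid")):
    """Rebuild the preprocess.py argument list from the question.

    Assumption (from the file names): stage 1 maps records (src1) to
    content-plan ids (tgt1); stage 2 feeds those same ids (src2) to the
    summary generator (tgt2).
    """
    args = ["python", "preprocess.py"]
    for split in splits:
        d = os.path.join(base, split)
        plan_ids = os.path.join(d, "%s_content_plan_ids.txt" % split)
        args += [
            "-%s_src1" % split, os.path.join(d, "src_%s.norm.trim.ncp.txt" % split),
            "-%s_tgt1" % split, plan_ids,
            "-%s_src2" % split, plan_ids,  # same file as tgt1
            "-%s_tgt2" % split, os.path.join(d, "tgt_%s.norm.filter.mwe.trim.txt" % split),
        ]
    args += [
        "-save_data", os.path.join(base, "preprocess", ""),  # keeps the trailing slash
        "-src_seq_length", "1000",
        "-tgt_seq_length", "1000",
        "-dynamic_dict",
        "-train_ptr", os.path.join(base, "train", "train_ptrs.txt"),
    ]
    return args


if __name__ == "__main__":
    # Print the command for inspection rather than running it.
    print(" ".join(preprocess_args(os.environ.get("BASE", "."))))
```

Generating the list makes the shared content-plan file between the two stages explicit. It also makes a mistyped split name easy to spot.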
