agakshat / visualdialog-pytorch

Community Regularization of Visually Grounded Dialog https://arxiv.org/abs/1808.04359

License: GNU General Public License v3.0

Language: Python 100.00%
Topics: visual-dialog, reinforcement-learning, multi-agent, natural-language-processing, curriculum-learning, reinforce, pytorch, emergent-behavior, communication, dialog

visualdialog-pytorch's Introduction

Community Regularization of Visually Grounded Dialog

Akshat Agarwal*, Swaminathan Gurumurthy*, Vasu Sharma, Mike Lewis, Katia Sycara

Carnegie Mellon University, University of Pittsburgh

This repository contains a PyTorch implementation of our arXiv paper 1808.04359 on Community Regularization of Visually Grounded Dialog. The task requires a goal-oriented exchange of information in natural language; however, asking the agents to maximize information exchange while also requiring them to adhere to the rules of human language is an ill-posed optimization problem. Our solution, Community Regularization, has each agent interact with and learn from multiple other agents, which results in more grammatically correct, relevant and coherent dialog without sacrificing information exchange. If you find this work useful, please cite our paper using the following BibTeX:

@inproceedings{agarwal2018community,
  title={Community Regularization of Visually-Grounded Dialog},
  author={Agarwal, Akshat and Gurumurthy, Swaminathan and Sharma, Vasu and Lewis, Michael and Sycara, Katia},
  booktitle={Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montreal, Canada},
  year={2019},
  organization={IFAAMAS}
}
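At a high level, community regularization means that each Q-Bot is paired with a randomly sampled A-Bot (and vice versa) at every training iteration, so no single pair can drift into a private, ungrammatical code. Below is a minimal, self-contained sketch of that pairing scheme only; the QBot/ABot classes and the loop body are placeholders, not the actual interfaces used in this repository (see main.py for the real training loop):

import random

class QBot:  # placeholder stand-in for a questioner agent
    def __init__(self, idx):
        self.idx = idx

class ABot:  # placeholder stand-in for an answerer agent
    def __init__(self, idx):
        self.idx = idx

qbots = [QBot(i) for i in range(1)]  # --num_qbots 1
abots = [ABot(i) for i in range(3)]  # --num_abots 3

for step in range(5):  # stands in for iterating over training batches
    # Community regularization: sample a fresh Q-Bot/A-Bot pairing each
    # iteration, so every agent must remain intelligible to several partners.
    qbot, abot = random.choice(qbots), random.choice(abots)
    print("step %d: Q-Bot %d talks to A-Bot %d" % (step, qbot.idx, abot.idx))
    # ...run the 10-round dialog and apply the SL/RL updates here...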

Installation and Downloading Data

# set up a clean virtual environment
virtualenv -p python3 ~/visualdialog
source ~/visualdialog/bin/activate # you will have to run this command in every new terminal; alternatively, add an alias to your .bashrc

pip3 install torch torchvision # or as appropriate for your setup, see pytorch.org
pip3 install tensorboardX h5py

git clone https://github.com/agakshat/visualdialog-pytorch.git
cd visualdialog-pytorch

# download visual dialog data
mkdir data
cd data
wget https://filebox.ece.vt.edu/~jiasenlu/codeRelease/visDial.pytorch/data/vdl_img_vgg.h5
wget https://filebox.ece.vt.edu/~jiasenlu/codeRelease/visDial.pytorch/data/visdial_data.h5
wget https://filebox.ece.vt.edu/~jiasenlu/codeRelease/visDial.pytorch/data/visdial_params.json
# however, these data files have 512x7x7 image embeddings; we instead use
# 4096-dimensional image embeddings, which we download into another folder
mkdir v09
cd v09
wget https://computing.ece.vt.edu/~abhshkdz/visdial/data/v0.9/visdial_params.json
wget https://computing.ece.vt.edu/~abhshkdz/visdial/data/v0.9/data_img_vgg16_relu7.h5

cd ../.. # back to the repository root
mkdir save
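Optionally, you can sanity-check that the HDF5 files downloaded completely and are readable. This snippet is not part of the repository; it only assumes the paths created by the commands above:

import h5py

# Paths follow the download commands above (run from the repository root).
paths = ["data/visdial_data.h5", "data/vdl_img_vgg.h5",
         "data/v09/data_img_vgg16_relu7.h5"]
for path in paths:
    with h5py.File(path, "r") as f:
        # List the top-level entries and dataset shapes to confirm the file
        # opened correctly.
        print(path)
        for key, item in f.items():
            shape = item.shape if hasattr(item, "shape") else "(group)"
            print("  ", key, shape)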

Training

# now run the code

# Option 1: Train from scratch, including 15 epochs of supervised learning
# followed by RL through curriculum
python main.py --num_abots 3 --num_qbots 1 --scratch --outf save/temp_dir

# Option 2: Start training from RL, assuming pretrained supervised learning agents
python main.py --num_abots 3 --num_qbots 1 --curr  --model_path save/pretrained_SL.pth --outf save/temp_dir

Important Command Line Arguments:

  1. --data_dir specifies the path to the data folder. Default is data/
  2. --v09_data_dir specifies the path to the alternative (v0.9 image files) data folder. Default is data/v09/ (there is no need to change these if you installed using the exact commands above)
  3. --num_qbots and --num_abots specify the number of Q-Bots and A-Bots, respectively
  4. --model_path specifies the torch .pth file with the pretrained agents to be loaded
  5. --outf specifies the save directory where the trained models will be saved epoch-wise, along with tensorboard logs
  6. --scratch if specified, the agents are trained from scratch, starting with supervised learning
  7. --curr if specified, the agents are trained from the beginning of the curriculum, assuming that --model_path has been specified to load SL-pretrained model files
  8. --start_curr K if specified, the agents start curriculum training not from the beginning, but after the first 10-K rounds of the curriculum have already happened. See main.py for details.
  9. --batch_size default is 75, which you may need to reduce depending on the GPU being used. Note that as curriculum training progresses, progressively more GPU memory is used, becoming constant only once the agents are training purely via RL. See the example invocation below.
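For example, to resume curriculum training from an SL-pretrained checkpoint on a GPU with less memory, you might combine the flags as follows (the checkpoint path and batch size are illustrative; adjust them to your setup):

python main.py --num_abots 3 --num_qbots 1 --curr --model_path save/pretrained_SL.pth --outf save/temp_dir --batch_size 40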

Evaluation

# To run only the evaluation, get image retrieval percentile scores and/or view generated dialog:
python main.py --num_abots 3 --num_qbots 1 --curr  --model_path save/pretrained_SL.pth --outf save/temp_dir --eval 1

# To get answer retrieval Mean Rank, MRR and Recall@k metrics:
python evaluate_mrr.py --num_abots 3 --num_qbots 1 --model_path save/pretrained_model_file.pth
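For reference, these retrieval metrics are standard: given the rank of the ground-truth answer among the candidate answers for each question, Mean Rank averages the ranks, MRR averages their reciprocals, and Recall@k is the fraction of questions whose ground-truth answer appears in the top k. A generic sketch, not the repository's evaluate_mrr.py (the ranks below are made-up example data):

# Generic retrieval metrics computed from 1-indexed ground-truth answer ranks.
ranks = [1, 3, 12, 2, 50, 7]  # hypothetical example ranks

mean_rank = sum(ranks) / len(ranks)
mrr = sum(1.0 / r for r in ranks) / len(ranks)
recall_at = {k: sum(r <= k for r in ranks) / len(ranks) for k in (1, 5, 10)}

print("Mean Rank: %.2f" % mean_rank)
print("MRR: %.4f" % mrr)
for k in sorted(recall_at):
    print("Recall@%d: %.2f%%" % (k, 100 * recall_at[k]))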

Example of generated dialog

(image: example of dialog generated by the agents)

Acknowledgement

Credits to Jiasen Lu for his network definitions of the A-Bot encoders and decoders

visualdialog-pytorch's Issues

Evaluation Metrics

Hi, thanks for your sharing.
But, I'm a little confused. Do you evaluate with only 20 candidate answers?

Error when training

Thanks for sharing the code. I got an error when training with the command "python main.py --num_abots 3 --num_qbots 1 --scratch --outf save/temp_dir". My environment is Python 3.6 with PyTorch 0.4.0. Can you help me with that?

(community) lvxinyu@12315:~$ CUDA_VISIBLE_DEVICES=4 python /home/lvxinyu/code/visualdialog-pytorch/main.py --num_abots 3 --num_qbots 1 --scratch --outf /home/lvxinyu/code/visualdialog-pytorch/data/v09/save/temp_dir
DataLoader loading: train
Loading image feature from data/vdl_img_vgg.h5
train number of data: 82783
Loading txt from data/visdial_data.h5
Vocab Size: 8964
DataLoader loading: test
Loading image feature from data/vdl_img_vgg.h5
test number of data: 40504
Loading txt from data/visdial_data.h5
Vocab Size: 8964
Initializing A-Bot and Q-Bot...
/home/lvxinyu/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Starting Epoch: 1 | K: 10
Done with Batch # 20 | Av. Time Per Batch: 1.090s
Done with Batch # 40 | Av. Time Per Batch: 1.015s
Done with Batch # 60 | Av. Time Per Batch: 0.998s
...
Done with Batch # 1080 | Av. Time Per Batch: 1.056s
Done with Batch # 1100 | Av. Time Per Batch: 1.070s
Traceback (most recent call last):
  File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 624, in <module>
    im_loss_epoch_n = train(epoch, k_curr)
  File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 132, in train
    lm_loss.backward()
  File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: The expanded size of the tensor (75) must match the existing size (58) at non-singleton dimension 0

Why is the encoder invoked once but the decoder twice?

Hi Akshat,

Sorry to trouble you again. Recently I have been reading your code and I have a naive question: for each bot, why do you invoke the encoder once but the decoder twice?

featG, ques_hidden1 = abots[abot_idx][0](ques_emb_g, his_emb_g, img_input, ques_hidden1, hist_hidden1, rnd+1)

_, ques_hidden1 = abots[abot_idx][2](featG.view(1, -1, opt.ninp), ques_hidden1)

logprob, _ = abots[abot_idx][2](ans_emb, ques_hidden1)

Usually, when I implement an encoder-decoder framework, I encode the input into a hidden state and then decode it into a sequence only once. I don't understand the second of the three lines of code above. Could you tell me what the second line is used for?

Thanks.

Running bugs, and what is the v09 folder used for?

Hi Akshat Agarwal,

Thanks for sharing this code. Recently I tried to run your code as a baseline, but I encountered some problems.

  1. When I run the main.py file I get the following issue. Are you sure your code can be run successfully with Python 3.6 and PyTorch 0.4.0?
/home/wanyao/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "main.py", line 651, in <module>
    optimizerAbotRLarr = []
  File "main.py", line 115, in train

  File "/home/wanyao/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wanyao/www/Dropbox/ghproj-py36/visualdialog-pytorch-original/networks/encoder_QIH.py", line 33, in forward
    img_emb = F.tanh(self.img_embed(img_raw.view(-1,1,self.img_feat_size)))
RuntimeError: invalid argument 2: size '[-1 x 1 x 4096]' is invalid for input with 7526400 elements at /pytorch/aten/src/TH/THStorage.c:37
  2. My second concern is: what is the v09 folder used for? I know that in Jiasen's visDial.pytorch repo only vdl_img_vgg.h5 is used to extract CNN features for the images, so why did you use both vdl_img_vgg.h5 and data_img_vgg16_pool5.h5 to extract image features? Furthermore, since data_img_vgg16_pool5.h5 is very large (50 GB), it takes a long time to load into memory, which makes it time-consuming to use for development/debugging. Do you have any suggestions for accelerating this process, or can I just use vdl_img_vgg.h5 to represent the images?
