BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps

License: MIT

This is the PyTorch implementation of our paper:

BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Wang Zhu*, Hexiang Hu*, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha
2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)

[arXiv] [GitHub]

Abstract

Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially. A specially designed memory buffer is used by the agent to turn its past experiences into contexts for future steps. The learning process is composed of two phases. In the first phase, the agent uses imitation learning from demonstration to accomplish BabySteps. In the second phase, the agent uses curriculum-based reinforcement learning to maximize rewards on navigation tasks with increasingly longer instructions. We create two new benchmark datasets (of long navigation tasks) and use them in conjunction with existing ones to examine BabyWalk's generalization ability. Empirical results show that BabyWalk achieves state-of-the-art results on several metrics; in particular, it is able to follow long instructions better.

Installation

  1. Install Python 3.7 (Anaconda recommended: https://www.anaconda.com/distribution/).
  2. Install PyTorch following the instructions on https://pytorch.org/ (we used PyTorch 1.1.0 in our experiments).
  3. Download this repository or clone with Git, and then enter the root directory of the repository:
git clone https://github.com/Sha-Lab/babywalk
cd babywalk
  4. Check that the packages listed in requirement.txt are installed (see the command after this list).
  5. Download and preprocess the data:
chmod +x download.sh
./download.sh
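
For step 4, the packages listed in requirement.txt can be installed with pip (a minimal sketch; adapt it to your Anaconda environment as needed):

pip install -r requirement.txt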

After the download in step 5 completes, check the following (a quick shell check is sketched after the list):

  • simulator/resnet_feature/ should contain ResNet-152-imagenet.tsv.
  • simulator/ should contain total_adj_list.json, which replaces the Matterport3D simulator.
  • src/vocab/vocab_data/ should contain the vocabulary file train_vocab.txt and its GloVe embedding file train_glove.npy.
  • tasks/ should contain R2R, R4R, R6R, R8R, and R2T8, each with a data folder containing training/evaluation data.
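
A quick way to confirm this layout from the shell (a minimal sketch; it assumes each per-task data folder is literally named data/):

ls simulator/resnet_feature/ResNet-152-imagenet.tsv simulator/total_adj_list.json
ls src/vocab/vocab_data/train_vocab.txt src/vocab/vocab_data/train_glove.npy
ls -d tasks/R2R/data tasks/R4R/data tasks/R6R/data tasks/R8R/data tasks/R2T8/data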

Training and evaluation

Here we take training BABYWALK on R2R as an example.

Warm-up with imitation learning (IL)

CUDA_VISIBLE_DEVICES=0 python src/train_follower.py \
    --split_postfix "_landmark" \
    --task_name R2R \
    --n_iters 30000 \
    --model_name "follower_bbw" \
    --il_mode "landmark_split" \
    --one_by_one \
    --one_by_one_mode "landmark" \
    --history \
    --log_every 1000

Training with curriculum-based reinforcement learning (CRL)

CUDA_VISIBLE_DEVICES=0 python src/train_follower.py \
    --split_postfix "_landmark" \
    --task_name R2R \
    --n_iters 30000 \
    --curriculum_iters 5000 \
    --model_name "follower_bbw_crl" \
    --one_by_one \
    --one_by_one_mode "landmark" \
    --history \
    --log_every 500 \
    --reward \
    --reward_type "cls" \
    --batch_size 64 \
    --curriculum_rl \
    --max_curriculum 4 \
    --no_speaker \
    --follower_prefix "tasks/R2R/follower/snapshots/follower_bbw_sample_train_iter_30000"

Other baselines

Here we take training on R2R as an example, using Speaker-Follower and Reinforced Cross-modal Matching.

  • Speaker-Follower
CUDA_VISIBLE_DEVICES=0 python src/train_follower.py \
    --task_name R2R \
    --n_iters 50000 \
    --model_name "follower_sf_aug" \
    --add_augment
CUDA_VISIBLE_DEVICES=0 python src/train_follower.py \
    --task_name R2R \
    --n_iters 20000 \
    --model_name "follower_sf" \
    --follower_prefix "tasks/R2R/follower/snapshots/best_model"
  • Reinforced Cross-modal Matching
CUDA_VISIBLE_DEVICES=0 python src/train_follower.py \
    --task_name R2R \
    --n_iters 20000 \
    --model_name "follower_rcm_cls" \
    --reward \
    --reward_type "cls" \
    --batch_size 64 \
    --no_speaker \
    --follower_prefix "tasks/R2R/follower/snapshots/follower_sf_aug_sample_train-literal_speaker_data_augmentation_iter_50000"

Evaluation

Here we take the BABYWALK model trained on R2R as an example.

  • Evaluate on the validation unseen data of Room 2-to-8.
CUDA_VISIBLE_DEVICES=0 python src/val_follower.py \
    --task_name R2T8 \
    --split_postfix "_landmark" \
    --one_by_one \
    --one_by_one_mode "landmark" \
    --model_name "follower_bbw" \
    --history \
    --follower_prefix "tasks/R2R/follower/snapshots/best_model"
  • Evaluate on the validation seen / unseen data of RxR (x=2,4,6,8).
    • change --task_name R2T8 to --task_name RxR
  • Evaluate on the test data of R2R.
    • set --task_name R2R
    • add --use test
  • For SF/RCM models, evaluate on RxR (x=2,4,6,8).
    • set --task_name RxR
    • set --max_steps 5*x and --max_ins_len 50*x (see the example command after this list)
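
For example, evaluating the RCM model from the baselines above on R8R (x=8, so --max_steps 40 and --max_ins_len 400) could look like the sketch below; it only combines flags already listed above, and the snapshot path reuses the best_model placeholder, so point --follower_prefix at your actual RCM checkpoint:

CUDA_VISIBLE_DEVICES=0 python src/val_follower.py \
    --task_name R8R \
    --model_name "follower_rcm_cls" \
    --max_steps 40 \
    --max_ins_len 400 \
    --follower_prefix "tasks/R2R/follower/snapshots/best_model"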

Download the models reported in our paper

chmod +x download_model.sh
./download_model.sh

Performance comparison on SDTW (success weighted by normalized Dynamic Time Warping)

Models trained on R4R

Model                 Eval R2R   Eval R4R   Eval R6R   Eval R8R
SF                    14.8       9.2        5.2        5.0
RCM(FIDELITY)         18.3       13.7       7.9        6.1
REGRETFUL             13.4       13.5       7.5        5.6
FAST                  14.2       15.5       7.7        6.3
BABYWALK              27.8       17.3       13.1       11.5
BABYWALK(COGROUND)    31.6       20.0       15.9       13.9

Models trained on R2R

Model                 Eval R2R   Eval R4R   Eval R6R   Eval R8R
SF                    27.2       6.7        7.2        3.8
RCM(FIDELITY)         34.4       7.2        8.4        4.3
REGRETFUL             40.6       9.8        6.8        2.4
FAST                  45.4       7.2        8.5        2.4
BABYWALK              36.9       13.8       11.2       9.8

Citation

Please cite the following BibTeX entry if you use any content from this repository:

@inproceedings{zhu2020babywalk,
    title = "{B}aby{W}alk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps",
    author = "Zhu, Wang and Hu, Hexiang and Chen, Jiacheng and Deng, Zhiwei and Jain, Vihan and Ie, Eugene and Sha, Fei",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    pages = "2539--2556",
}
