This project forked from flow-diffusion/avdc

Official repository of Learning to Act from Actionless Videos through Dense Correspondences.

Home Page: https://flow-diffusion.github.io/

License: MIT License

AVDC

The official codebase for training video policies in AVDC

NEWS: We have released another repository for running our Meta-World and iTHOR experiments here!

This repository contains the code for training video policies presented in our work
Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
website | paper | arXiv | experiment repo

@article{Ko2023Learning,
  title={{Learning to Act from Actionless Videos through Dense Correspondences}},
  author={Ko, Po-Chen and Mao, Jiayuan and Du, Yilun and Sun, Shao-Hua and Tenenbaum, Joshua B},
  journal={arXiv:2310.08576},
  year={2023},
}

Updates

  • 2023/10/21: Support custom task names and any number of videos (removed the task and video-count constraints left over from our experiments).
  • 2024/01/02: Released another repository for Meta-World and iTHOR experiments here.
  • 2024/01/03: Updated arguments for DDIM sampling and Classifier-Free Guidance.

Getting started

We recommend using conda to create a new environment with PyTorch installed.

conda create -n avdc python=3.9
conda activate avdc
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Next, clone the repository and install the requirements:

git clone https://github.com/flow-diffusion/AVDC
cd AVDC
pip install -r requirements.txt
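
After installation, a quick sanity check can confirm that the packages from the conda command above are importable. This is a minimal sketch and not part of the repository; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the conda install line.

```python
import importlib.util

# Packages installed by the conda command above.
REQUIRED = ("torch", "torchvision", "torchaudio")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import torch

    # torch.cuda.is_available() reports whether a usable CUDA device was found.
    print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

Run it inside the activated avdc environment. If CUDA is reported as unavailable, the pytorch-cuda build may not match the installed driver.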

Dataset structure

This repo contains an example dataset structure in datasets/.

The PyTorch dataset classes are defined in flowdiffusion/datasets.py.

Training models

For Meta-World experiments, run

cd flowdiffusion
python train_mw.py --mode train
# or python train_mw.py -m train

or run with accelerate

accelerate launch train_mw.py

For iTHOR experiments, run train_thor.py instead of train_mw.py
For bridge experiments, run train_bridge.py instead of train_mw.py

The trained model is saved in the ../results folder.

To resume training, use the -c (--checkpoint_num) argument.

# This resumes training from the 1st checkpoint (which should be named model-1.pt)
python train_mw.py --mode train -c 1
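
To find out which values are valid for -c, it can help to list the checkpoint numbers already present in a results folder. The following is a rough sketch under the assumption that checkpoints follow the model-[x].pt naming mentioned above; the function `available_checkpoints` is hypothetical and does not exist in the repository.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the README: model-<number>.pt
CHECKPOINT_PATTERN = re.compile(r"^model-(\d+)\.pt$")


def available_checkpoints(folder):
    """Return the sorted checkpoint numbers found directly inside `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    numbers = []
    for entry in folder.iterdir():
        match = CHECKPOINT_PATTERN.match(entry.name)
        if match and entry.is_file():
            numbers.append(int(match.group(1)))
    return sorted(numbers)


# Example: inspect the results folder relative to flowdiffusion/.
print(available_checkpoints("../results"))
```

The last number in the returned list is the most recent checkpoint and can be passed to -c.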

Inference

Use the following arguments for inference:
-p --inference_path: specify the input image path
-t --text: specify the text description of the task
-n sample_steps: optional, the number of steps used in test-time sampling. If the specified value is less than 100, DDIM sampling is used.
-g guidance_weight: optional, the weight used for classifier-free guidance. Set it to a positive value to turn on classifier-free guidance.

For example:

python train_mw.py --mode inference -c 1 -p ../examples/assembly.png -t assembly -g 2 -n 20
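
The two optional flags reduce to two simple decisions. The sketch below only restates the rules documented above and is not the repository's actual code: the function name `resolve_sampling_options`, the dictionary keys, and the default values (100 steps, weight 0) are assumptions made for illustration.

```python
def resolve_sampling_options(sample_steps=100, guidance_weight=0):
    """Illustrative only: mirror the documented behaviour of -n and -g."""
    if sample_steps <= 0:
        raise ValueError("sample_steps must be a positive integer")
    return {
        "sample_steps": sample_steps,
        # Documented rule: fewer than 100 steps means DDIM sampling is used.
        "use_ddim": sample_steps < 100,
        # Documented rule: a positive weight turns on classifier-free guidance.
        "use_cfg": guidance_weight > 0,
        "guidance_weight": guidance_weight,
    }


# Same values as the example command above: -g 2 -n 20
print(resolve_sampling_options(sample_steps=20, guidance_weight=2))
```

With the values from the example command, both DDIM sampling and classifier-free guidance are switched on. Omitting both flags would, under the assumed defaults, leave both off.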

Pretrained models

We also provide checkpoints of the models described in our experiments:
Meta-World | iTHOR | Bridge

Download the .pt file and put it in the results/[environment] folder. The resulting directory structure should be results/{mw, thor, bridge}/model-[x].pt, for example results/mw/model-24.pt.

Or use download.sh:

./download.sh metaworld
# ./download.sh ithor
# ./download.sh bridge

After this, you can use the argument -c [x] to resume training or run inference with our checkpoint. For example:

python train_mw.py --mode train -c 24

Or

python train_mw.py --mode inference -c 24 -p ../examples/assembly.png -t assembly
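
Before launching, it can be useful to confirm that a downloaded checkpoint sits where the layout above expects it. This is a minimal sketch that assumes only the results/{mw, thor, bridge}/model-[x].pt layout described in this section; the helper `checkpoint_path` and its `results_dir` parameter are hypothetical.

```python
from pathlib import Path

# Environment folder names taken from the directory layout described above.
ENVIRONMENTS = ("mw", "thor", "bridge")


def checkpoint_path(environment, checkpoint_num, results_dir="results"):
    """Build the expected path of checkpoint `checkpoint_num` for `environment`."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r}, expected one of {ENVIRONMENTS}")
    return Path(results_dir) / environment / f"model-{checkpoint_num}.pt"


path = checkpoint_path("mw", 24)
print(path.as_posix(), "exists" if path.is_file() else "is missing")
```

Run it from the repository root, or pass a different `results_dir` when calling it from inside flowdiffusion/.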

Acknowledgements

This codebase is modified from the following repositories:
imagen-pytorch
guided-diffusion
