
InsV2V: Instruct Video-to-Video

This is the code release for the ICLR2024 paper Consistent Video-to-Video Transfer Using Synthetic Dataset.

(Teaser figure)


Updates

  • 2024/02/13: The official synthetic data and model will not be released due to Amazon policy, but a third-party reproduction of the synthetic data and model weights is available. Please refer to this GitHub repo.
  • 2023/11/29: We have updated the paper with more comparisons to recent baseline methods and updated the comparison video. The Gradio demo code has been uploaded.

Installation

git clone https://github.com/amazon-science/instruct-video-to-video.git
cd instruct-video-to-video
pip install -r requirements.txt

NOTE: The code is tested on PyTorch 2.1.0+cu118 and the corresponding xformers version. Any PyTorch version > 2.0 should work, but please install the matching xformers version.
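The version requirement above can be checked with a small helper before running anything heavy. This is a sketch that is not part of the repo; the function names are ours, and it only illustrates comparing a PyTorch version string (e.g. the value of `torch.__version__`) against the 2.0 floor:

```python
# Minimal sketch (not part of the repo): check that an installed PyTorch
# version string satisfies the "> 2.0" requirement noted above.

def version_tuple(version: str) -> tuple:
    """Parse a version string like '2.1.0+cu118' into a comparable tuple."""
    core = version.split("+")[0]  # drop the local build tag, e.g. '+cu118'
    return tuple(int(part) for part in core.split("."))

def torch_is_supported(version: str) -> bool:
    """Return True when the version is 2.0 or newer."""
    return version_tuple(version) >= (2, 0)

if __name__ == "__main__":
    print(torch_is_supported("2.1.0+cu118"))  # True
    print(torch_is_supported("1.13.1"))       # False
```

In practice you would pass `torch.__version__` to `torch_is_supported` right after importing torch.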

Video Editing

The official model weights are not released (see Updates above); please use the third-party reproduction instead.

Download the InsV2V model weights and change the ckpt path in the following notebook.

✨🚀 This notebook provides sample code for text-based video editing.

Download LOVEU Dataset for Testing

Please follow the instructions in the LOVEU Dataset to download the dataset. Use the following script to run editing on the LOVEU dataset:

python insv2v_run_loveu_tgve.py \
    --config configs/instruct_v2v.yaml \
    --ckpt-path [PATH TO THE CHECKPOINT] \
    --data-dir [PATH TO THE LOVEU DATASET] \
    --with_optical_flow \
    --text-cfg 7.5 10 \
    --video-cfg 1.2 1.5 \
    --image-size 256 384

The --with_optical_flow flag enables motion compensation.

Note: you may need to try different combinations of image resolution and video/text classifier-free guidance scales to find the best editing results.
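The search suggested above can be automated by enumerating the combinations. The sketch below is not part of the repo: it assumes one value per flag per run (the README example passes two values per flag, which the script may already sweep internally), and the checkpoint/data paths are placeholders:

```python
# Hedged sketch: generate one command line per combination of guidance
# scales and image resolutions, mirroring the flags of the README command.
import itertools

TEXT_CFGS = [7.5, 10.0]
VIDEO_CFGS = [1.2, 1.5]
IMAGE_SIZES = [(256, 384), (384, 384)]  # second pair is an illustrative extra

def sweep_commands(ckpt_path: str, data_dir: str):
    """Yield one editing command per hyper-parameter combination."""
    for text_cfg, video_cfg, (h, w) in itertools.product(
        TEXT_CFGS, VIDEO_CFGS, IMAGE_SIZES
    ):
        yield (
            "python insv2v_run_loveu_tgve.py"
            " --config configs/instruct_v2v.yaml"
            f" --ckpt-path {ckpt_path}"
            f" --data-dir {data_dir}"
            " --with_optical_flow"
            f" --text-cfg {text_cfg}"
            f" --video-cfg {video_cfg}"
            f" --image-size {h} {w}"
        )

if __name__ == "__main__":
    for cmd in sweep_commands("[PATH TO THE CHECKPOINT]", "[PATH TO THE LOVEU DATASET]"):
        print(cmd)
```

Each printed line can be run (or submitted to a job scheduler) independently, and the outputs compared afterwards.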

Example results of editing LOVEU-TGVE Dataset:


Synthetic Video Prompt-to-Prompt Dataset

Generation pipeline of the synthetic video dataset: (pipeline figure)

Examples of the synthetic video dataset:


Training

Download Foundational Models

Download the foundational models and place them in the pretrained_models folder.

Download Synthetic Video Dataset

See the download link in the third-party reproduction.

Train the Model

Put the synthetic video dataset in the video_ptp folder.

Run the following command to train the model:

python main.py --config configs/instruct_v2v.yaml -r # add -r to resume training if the training is interrupted
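The actual resume logic lives in main.py; purely as an illustration of what a `-r` flag typically does, the hypothetical helper below picks the most recent checkpoint from a log directory so training can continue from it (the `.ckpt` suffix and directory layout are assumptions, not the repo's):

```python
# Hypothetical sketch: find the newest checkpoint file to resume from.
import os

def latest_checkpoint(log_dir: str):
    """Return the newest .ckpt file in log_dir, or None if there is none."""
    ckpts = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if name.endswith(".ckpt")
    ]
    return max(ckpts, key=os.path.getmtime) if ckpts else None
```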

Create Synthetic Video Dataset

If you want to create your own synthetic video dataset, please follow the instructions below:

  • Set the foundational model checkpoint paths in video_prompt_to_prompt.py:

vae_ckpt = 'VAE_PATH'
unet_ckpt = 'UNet_PATH'
text_model_ckpt = 'Text_MODEL_PATH'

  • Download the edit prompt file from Instruct Pix2Pix. The prompt file should be gpt-generated-prompts.jsonl; change the file path in video_prompt_to_prompt.py accordingly. Alternatively, download the WebVid prompt edit file proposed in our paper from To be released.
  • Run the command to generate the synthetic video dataset:
python video_prompt_to_prompt.py \
    --start [START INDEX] \
    --end [END INDEX] \
    --prompt_source [ip2p or webvid] \
    --num_sample_each_prompt [NUM SAMPLES FOR EACH PROMPT]
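Since the command above takes a `--start`/`--end` index range, generation can be parallelised by splitting the prompt indices across workers. A sketch, not part of the repo; the total prompt count and worker count are examples:

```python
# Hedged sketch: split [0, total) into near-equal [start, end) slices,
# one per worker, and print the corresponding generation commands.

def shard_ranges(total: int, workers: int):
    """Split range(total) into `workers` contiguous, near-equal slices."""
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, end))
        start = end
    return ranges

if __name__ == "__main__":
    for start, end in shard_ranges(1000, 4):
        print(
            f"python video_prompt_to_prompt.py --start {start} --end {end}"
            " --prompt_source ip2p --num_sample_each_prompt 1"
        )
```

Each printed command can then be launched on a separate GPU or machine.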

Visual Comparison to Other Methods

(Video: TGVE_video_edit_demo.mp4)

Links to the baselines used in the video:

| Tune-A-Video | ControlVideo | Vid2Vid Zero | Video P2P |

| TokenFlow | Rerender A Video | Pix2Video |

Credit

The code was implemented by Jiaxin Cheng during his internship at the AWS Shanghai Lablet.

References

Part of the code and the foundational models are adapted from the following works:
