This project forked from yumingj/text2performer

Code for Text2Performer. Paper: Text2Performer: Text-Driven Human Video Generation

Home Page: https://yumingj.github.io/projects/Text2Performer.html

License: Other

Text2Performer: Text-Driven Human Video Generation

¹S-Lab, Nanyang Technological University  ²Shanghai AI Laboratory

Paper | Project Page | Dataset | Video

Text2Performer synthesizes human videos taking text descriptions as the only input.

📖 For more visual results, check out our project page.

Installation

Clone this repo:

git clone https://github.com/yumingj/Text2Performer.git
cd Text2Performer

Dependencies:

conda env create -f env.yaml
conda activate text2performer

(1) Dataset Preparation

In this work, we contribute a human video dataset with rich labels and text annotations, named the Fashion-Text2Video Dataset.

You can download our processed dataset from this Google Drive. After downloading the dataset, unzip the file and put its contents under the datasets folder with the following structure:

./datasets
├── FashionDataset_frames_crop
    ├── xxxxxx
        ├── 000.png
        ├── 001.png
        ├── ...
    ├── xxxxxx
    └── xxxxxx
├── train_frame_num.txt
├── val_frame_num.txt
├── test_frame_num.txt
├── moving_frames.npy
├── captions_app.json
├── caption_motion_template.json
├── action_label
    ├── xxxxxx.txt
    ├── xxxxxx.txt
    ├── ...
    └── xxxxxx.txt
└── shhq_dataset % optional
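
Before training or sampling, it can help to confirm that the unzipped dataset matches the layout above. The following is a minimal sketch, not part of the repository: the helper names `missing_entries` and `count_frames` are hypothetical, and the list of required entries is simply copied from the tree (with `shhq_dataset` left out because it is marked optional).

```python
from pathlib import Path

# Top-level entries copied from the directory tree above.
# shhq_dataset is marked optional there, so it is not required here.
REQUIRED_ENTRIES = [
    "FashionDataset_frames_crop",
    "train_frame_num.txt",
    "val_frame_num.txt",
    "test_frame_num.txt",
    "moving_frames.npy",
    "captions_app.json",
    "caption_motion_template.json",
    "action_label",
]


def missing_entries(root):
    """Return the required entries that are absent under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]


def count_frames(root):
    """Map each video folder name to the number of .png frames it holds."""
    frames_dir = Path(root) / "FashionDataset_frames_crop"
    if not frames_dir.is_dir():
        return {}
    return {
        video.name: len(list(video.glob("*.png")))
        for video in sorted(frames_dir.iterdir())
        if video.is_dir()
    }


if __name__ == "__main__":
    missing = missing_entries("./datasets")
    print("missing entries:", missing or "none")
    print("video folders found:", len(count_frames("./datasets")))
```

An empty list from `missing_entries` only means the expected names exist; it does not validate the contents of the files.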

(2) Sampling

Pretrained Models

Pretrained models can be downloaded from the Google Drive. Unzip the file and put its contents under the pretrained_models folder with the following structure:

pretrained_models
├── sampler_high_res.pth
├── video_trans_high_res.pth
└── vqgan_decomposed_high_res.pth

After downloading pretrained models, you can use generate_long_video.ipynb to generate videos.
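
A quick way to catch a missing or truncated download before opening the notebook is to list the three checkpoint files with their sizes. This is a sketch under the assumption that the files sit directly under `pretrained_models` as shown above; `checkpoint_status` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# File names copied from the pretrained_models tree above.
CHECKPOINTS = [
    "sampler_high_res.pth",
    "video_trans_high_res.pth",
    "vqgan_decomposed_high_res.pth",
]


def checkpoint_status(model_dir="pretrained_models"):
    """Return {file name: size in bytes, or None if the file is missing}."""
    model_dir = Path(model_dir)
    status = {}
    for name in CHECKPOINTS:
        path = model_dir / name
        status[name] = path.stat().st_size if path.is_file() else None
    return status


if __name__ == "__main__":
    for name, size in checkpoint_status().items():
        label = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {label}")
```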

(3) Training Text2Performer

Stage I: Decomposed VQGAN

Train the decomposed VQGAN. If you want to skip the training of this network, you can download our pretrained model from here.

For better performance, we also use data from the SHHQ dataset to train this stage.

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29596 train_vqvae_iter_dist.py -opt ./configs/vqgan/vqgan_decompose_high_res.yml --launcher pytorch

Stage II: Video Transformer

Train the video transformer. If you want to skip the training of this network, you can download our pretrained model from here.

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29596 train_dist.py -opt ./configs/video_transformer/video_trans_high_res.yml --launcher pytorch

Stage III: Appearance Transformer

Train the appearance transformer. If you want to skip the training of this network, you can download our pretrained model from here.

python train_sampler.py -opt ./configs/sampler/sampler_high_res.yml
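
The three stage commands above differ only in the script, the config file, and whether the distributed launcher is used. The sketch below collects them in one place so the GPU count and port can be changed consistently; `STAGES` and `build_command` are hypothetical names introduced for illustration, and the default values are the ones from the commands above. With `torch.distributed.launch`, `--nproc_per_node` is normally set to the number of GPUs on the machine.

```python
import shlex

# (script, config, uses distributed launcher) for Stages I-III,
# copied from the commands above.
STAGES = [
    ("train_vqvae_iter_dist.py",
     "./configs/vqgan/vqgan_decompose_high_res.yml", True),
    ("train_dist.py",
     "./configs/video_transformer/video_trans_high_res.yml", True),
    ("train_sampler.py",
     "./configs/sampler/sampler_high_res.yml", False),
]


def build_command(script, config, distributed,
                  nproc_per_node=4, master_port=29596):
    """Build the argument list for one training stage."""
    if distributed:
        return [
            "python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc_per_node}",
            f"--master_port={master_port}",
            script, "-opt", config, "--launcher", "pytorch",
        ]
    return ["python", script, "-opt", config]


if __name__ == "__main__":
    # Print the commands rather than running them; each stage is a long job.
    for script, config, distributed in STAGES:
        print(shlex.join(build_command(script, config, distributed)))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` from the repository root if you prefer to launch the stages from Python.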

Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{jiang2023text2performer,
  title={Text2Performer: Text-Driven Human Video Generation},
  author={Jiang, Yuming and Yang, Shuai and Koh, Tong Liang and Wu, Wayne and Loy, Chen Change and Liu, Ziwei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}

πŸ—žοΈ License

Distributed under the S-Lab License. See LICENSE for more information.
