This project forked from opentalker/styleheat

[ECCV 2022] StyleHEAT: A framework for high-resolution editable talking face generation

License: MIT License

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN (ECCV 2022)

paper | project website

Abstract

We investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Based on the observation, we propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing.

Environment

git clone https://github.com/FeiiYin/StyleHEAT.git
cd StyleHEAT
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements
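After installation, it can help to confirm that the pinned PyTorch builds are the ones actually active in the conda environment. The following is a minimal sketch, not part of the repository; the helper names `base_version` and `check_environment` are hypothetical, and the expected versions are simply copied from the pip command above.

```python
import importlib

# Versions taken from the pip command above; the "+cu110" build tag is ignored.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def base_version(version):
    """Strip a local build tag such as '+cu110' from a version string."""
    return version.split("+")[0]


def check_environment():
    """Return {package: (installed base version or None, matches expected)}."""
    report = {}
    for name, wanted in EXPECTED.items():
        try:
            module = importlib.import_module(name)
            found = base_version(module.__version__)
        except ImportError:
            found = None  # package is not installed in this environment
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(name, found, "OK" if ok else "expected " + EXPECTED[name])
```

Run it inside the activated StyleHEAT environment; a mismatch usually means a different PyTorch wheel was pulled in by a later install step.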

Quick Start

Pretrained Models

Please download our pre-trained models and put them in ./checkpoints.

Model                                 Description
checkpoints/Encoder_e4e.pth           Pre-trained E4E StyleGAN Inversion Encoder.
checkpoints/hfgi.pth                  Pre-trained HFGI StyleGAN Inversion Encoder.
checkpoints/StyleGAN_e4e.pth          Pre-trained StyleGAN.
checkpoints/ffhq_pca.pt               StyleGAN editing directions.
checkpoints/ffhq_PCA.npz              StyleGAN optimization parameters.
checkpoints/interfacegan_directions/  StyleGAN editing directions.
checkpoints/stylegan2_d_256.pth       Pre-trained StyleGAN discriminator.
checkpoints/model_ir_se50.pth         Pre-trained id-loss discriminator.
checkpoints/StyleHEAT_visual.pt       Pre-trained StyleHEAT model.
checkpoints/BFM                       3DMM library. (Note the zip file should be unzipped to BFM/.)
checkpoints/Deep3D/epoch_20.pth       Pre-trained 3DMM extractor.
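
Before running inference, a quick check that every file from the table above is in place can save a failed run. This is a hedged sketch rather than a script shipped with the repository: the function name `missing_checkpoints` is hypothetical, and it assumes that `interfacegan_directions/` and the unzipped `BFM/` are directories while all other entries are regular files.

```python
from pathlib import Path

# Entries copied from the table above; a trailing "/" marks a directory
# (treating BFM as a directory is an assumption based on the unzip note).
REQUIRED = [
    "Encoder_e4e.pth",
    "hfgi.pth",
    "StyleGAN_e4e.pth",
    "ffhq_pca.pt",
    "ffhq_PCA.npz",
    "interfacegan_directions/",
    "stylegan2_d_256.pth",
    "model_ir_se50.pth",
    "StyleHEAT_visual.pt",
    "BFM/",
    "Deep3D/epoch_20.pth",
]


def missing_checkpoints(root="checkpoints"):
    """Return the entries of REQUIRED that are not present under root."""
    root = Path(root)
    missing = []
    for entry in REQUIRED:
        path = root / entry
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    for entry in missing_checkpoints():
        print("missing:", entry)
```

An empty result means all listed checkpoints were found; the script does not verify file contents or sizes.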

We also provide some example videos along with their corresponding 3DMM parameters in videos.zip. Please unzip it and put the contents in docs/demo/videos/ for later inference.

Inference

Same-Identity Reenactment with a video.

python inference.py \
 --config configs/inference.yaml \
 --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
 --output_dir=./docs/demo/output --if_extract

Cross-Identity Reenactment with a single image and a video.

python inference.py \
 --config configs/inference.yaml \
 --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
 --image_source=./docs/demo/images/100.jpg \
 --cross_id \
 --output_dir=./docs/demo/output

The --video_source and --image_source can be specified as either a single file or a folder.

For a better inversion result at the cost of a longer running time, specify --inversion_option=optimize, which optimizes the feature latent of StyleGAN-V2. Otherwise, specify --inversion_option=encode to use the HFGI encoder to obtain the style code and inversion condition.

If you need to align (crop) images during inference, specify --if_align. Alternatively, align the source images beforehand following the FFHQ dataset.

If you need to extract the 3DMM parameters of the target video during inference, specify --if_extract. Alternatively, extract the 3DMM parameters beforehand with the script TODO.sh and save them to {video_source}/3dmm/3dmm_{video_name}.npy.
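
The naming pattern for pre-extracted parameters can be sketched in a few lines of Python. This is an illustration only, under two assumptions that the text does not spell out: that {video_source} refers to the folder containing the video, and that {video_name} is the file name without its extension. The helper names are hypothetical.

```python
from pathlib import Path


def expected_3dmm_path(video_path):
    """Return <video folder>/3dmm/3dmm_<video name>.npy for a video file."""
    video = Path(video_path)
    return video.parent / "3dmm" / ("3dmm_" + video.stem + ".npy")


def needs_extraction(video_path):
    """True when no pre-extracted file exists at the expected location."""
    return not expected_3dmm_path(video_path).is_file()


if __name__ == "__main__":
    demo = "./docs/demo/videos/RD_Radio34_003_512.mp4"
    print(expected_3dmm_path(demo))
    print("pass --if_extract" if needs_extraction(demo) else "pre-extracted file found")
```

If `needs_extraction` returns True for a video, passing --if_extract is the simpler route.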

If you only need to edit the expression without modifying the pose, please specify --edit_expression_only.

Intuitive Editing.

python inference.py \
 --config configs/inference.yaml \
 --image_source=./docs/demo/images/40.jpg \
 --inversion_option=optimize \
 --intuitive_edit \
 --output_dir=./docs/demo/output \
 --if_extract

The 3DMM parameters of the images can be either pre-extracted or extracted online with --if_extract.

Attribute Editing.

python inference.py \
 --config configs/inference.yaml \
 --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
 --image_source=./docs/demo/images/40.jpg \
 --attribute_edit --attribute=young \
 --cross_id \
 --output_dir=./docs/demo/output

The supported editable attributes are young, old, beard, and lip. Note that, to preserve the details of the edited attributes in W space, the optimized inversion method (--inversion_option=optimize) is disabled here.
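
When attribute editing is driven from a script, the constraints above can be checked before inference.py is launched. The sketch below is a hypothetical wrapper, not part of the repository: it only assembles the same flags shown in the command above, rejects attributes outside the listed set, and refuses the optimize inversion option. The function name and its defaults are assumptions.

```python
import shlex

# Attribute names as listed in the text above.
SUPPORTED_ATTRIBUTES = ("young", "old", "beard", "lip")


def attribute_edit_command(video_source, image_source, attribute,
                           output_dir="./docs/demo/output",
                           inversion_option=None):
    """Build the argument list for an attribute-editing run."""
    if attribute not in SUPPORTED_ATTRIBUTES:
        raise ValueError("unsupported attribute: " + attribute)
    if inversion_option == "optimize":
        raise ValueError("optimized inversion is not available for attribute editing")
    command = [
        "python", "inference.py",
        "--config", "configs/inference.yaml",
        "--video_source=" + video_source,
        "--image_source=" + image_source,
        "--attribute_edit", "--attribute=" + attribute,
        "--cross_id",
        "--output_dir=" + output_dir,
    ]
    if inversion_option is not None:
        command.append("--inversion_option=" + inversion_option)
    return command


if __name__ == "__main__":
    cmd = attribute_edit_command(
        "./docs/demo/videos/RD_Radio34_003_512.mp4",
        "./docs/demo/images/40.jpg",
        "young",
    )
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to `subprocess.run` from the repository root if the command should actually be executed.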

Training

Data preprocessing.

  1. To train the VideoWarper, please follow video-preprocessing to download and pre-process the VoxCeleb dataset.

  2. To train the whole framework, please follow HDTF to download the HDTF dataset and see HDTF-preprocessing to pre-process the dataset.

  3. Please follow PIRenderer to extract the 3DMM parameters and prepare all the data into lmdb files.

Training includes two stages.

  1. Train VideoWarper
bash bash/train_video_warper.sh
  2. Train Video Calibrator
bash bash/train_video_styleheat.sh

Note that several dataset path hyper-parameters need to be modified before running the scripts.
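
For unattended runs, the two stages can be chained so that the second starts only if the first exits successfully. The following is a minimal sketch under the assumption that it is launched from the repository root after the dataset paths have been edited; the `run_training` helper and its `dry_run` switch are hypothetical and not part of the repository.

```python
import subprocess

# Stage names and scripts as listed above, in training order.
STAGES = [
    ("VideoWarper", ["bash", "bash/train_video_warper.sh"]),
    ("Video Calibrator", ["bash", "bash/train_video_styleheat.sh"]),
]


def run_training(dry_run=True):
    """Run the stages in order; with dry_run=True only print the commands."""
    completed = []
    for name, command in STAGES:
        print("Stage:", name, "->", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError and stops before the next stage.
            subprocess.run(command, check=True)
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_training(dry_run=True)
```

Switch to `dry_run=False` once the printed commands look right.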

Citation

If you find this work useful for your research, please cite:

@article{2203.04036,
      author = {Yin, Fei and Zhang, Yong and Cun, Xiaodong and Cao, Mingdeng and Fan, Yanbo and Wang, Xuan and Bai, Qingyan and Wu, Baoyuan and Wang, Jue and Yang, Yujiu},
      title = {StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN}, 
      journal = {arxiv:2203.04036},  
      year = {2022}
}

Acknowledgement

Thanks to StyleGAN-2, PIRenderer, HFGI, Barbershop, GFP-GAN, and Pixel2Style2Pixel for sharing their code.
