Giter Site home page Giter Site logo

waynetech / sadtalker Goto Github PK

View Code? Open in Web Editor NEW

This project forked from opentalker/sadtalker

0.0 0.0 0.0 92.13 MB

(CVPR 2023)SadTalker:Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Home Page: https://sadtalker.github.io/

License: MIT License

Shell 0.37% Python 98.21% Jupyter Notebook 1.42%

sadtalker's Introduction

            Open In Colab       Hugging Face Spaces

Wenxuan Zhang *,1,2Xiaodong Cun *,2Xuan Wang 3Yong Zhang 2Xi Shen 2
Yu Guo1 Ying Shan 2   Fei Wang 1

1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group  

CVPR 2023

sadtalker

TL;DR:       single portrait image 🙎‍♂️      +       audio 🎤       =       talking head video 🎞.


🔥 Highlight

  • 🔥 The extension of the stable-diffusion-webui is online. Just install it in extensions -> install from URL -> https://github.com/Winfredy/SadTalker, checkout more details here.
sadtalker_demo_short.mp4
  • 🔥 full image mode is online! checkout here for more details.
still+enhancer in v0.0.1 still + enhancer in v0.0.2 input image @bagbag1815
still_e_n.mp4
full_body_2.bus_chinese_enhanced.mp4
  • 🔥 Several new mode, eg, still mode, reference mode, resize mode are online for better and custom applications.

  • 🔥 Happy to see our method is used in various talking or singing avatar, checkout these wonderful demos at bilibili and twitter #sadtalker.

📋 Changelog (Previous changelog can be founded here)

  • [2023.04.08]: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abusing since it is very realistic.

  • [2023.04.08]: v0.0.2, full image animation, adding baidu driver for download checkpoints. Optimizing the logic about enhancer.

🚧 TODO

Previous TODOs
  • Generating 2D face from a single Image.
  • Generating 3D face from Audio.
  • Generating 4D free-view talking examples from audio and a single image.
  • Gradio/Colab Demo.
  • Full body/image Generation.
  • training code of each componments.
  • Audio-driven Anime Avatar.
  • interpolate ChatGPT for a conversation demo 🤔
  • integrade with stable-diffusion-web-ui. (stay tunning!)

Installing Sadtalker on Linux:

git clone https://github.com/Winfredy/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### tts is optional for gradio demo. 
### pip install TTS

More tips about installnation on Windows and the Docker file can be founded here

Sd-Webui-Extension:

CLICK ME

Installing the lastest version of stable-diffusion-webui and install the sadtalker via extension. image

Then, restarting the stable-diffusion-webui(The models will be downloaded automatically in the right place if you have a good speed network). Or (Important!) you need pre-download sadtalker checkpoints to SADTALKTER_CHECKPOINTS in webui_user.sh(linux) or webui_user.bat(windows) by:

# windows (webui_user.bat)
set COMMANDLINE_ARGS=--no-gradio-queue  --disable-safe-unpickle
set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints

# linux (webui_user.sh)
export COMMANDLINE_ARGS=--no-gradio-queue  --disable-safe-unpickle
export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints

After installation, the SadTalker can be used in stable-diffusion-webui directly. (Some important discussion if you are unable to use full mode).

image

Download Trained Models

CLICK ME

You can run the following script to put all the models in the right place.

bash scripts/download_models.sh

OR download our pre-trained model from google drive or our github release page, and then, put it in ./checkpoints.

OR we provided the downloaded model in 百度云盘 提取码: sadt.

Model Description
checkpoints/auido2exp_00300-model.pth Pre-trained ExpNet in Sadtalker.
checkpoints/auido2pose_00140-model.pth Pre-trained PoseVAE in Sadtalker.
checkpoints/mapping_00229-model.pth.tar Pre-trained MappingNet in Sadtalker.
checkpoints/mapping_00109-model.pth.tar Pre-trained MappingNet in Sadtalker.
checkpoints/facevid2vid_00189-model.pth.tar Pre-trained face-vid2vid model from the reappearance of face-vid2vid.
checkpoints/epoch_20.pth Pre-trained 3DMM extractor in Deep3DFaceReconstruction.
checkpoints/wav2lip.pth Highly accurate lip-sync model in Wav2lip.
checkpoints/shape_predictor_68_face_landmarks.dat Face landmark model used in dilb.
checkpoints/BFM 3DMM library file.
checkpoints/hub Face detection models used in face alignment.

🔮 Quick Start (Best Practice)

Animating Portrait Image from default config.

python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --enhancer gfpgan 

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.

More examples and configuration and tips can be founded in the >>> best practice documents <<<.

Full body/image Generation

Using --still to generate a natural full body video. You can add enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a file to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan 

Local Gradio demo.

A local gradio demo similar to our hugging-face demo can be run by:

## you need manually install TTS(https://github.com/coqui-ai/TTS) via `pip install tts` in advanced.

python app.py

🛎 Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

💗 Acknowledgements

Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and PIRender. We thank the authors for sharing their wonderful code. In training process, We also use the model from Deep3DFaceReconstruction and Wav2lip. We thank for their wonderful work.

🥂 Related Works

📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

LOGO: color and font suggestion: ChatGPT, logo font:Montserrat Alternates .

All the copyright of the demo images and audio are from communities users or the geneartion from stable diffusion. Free free to contact us if you feel uncomfortable.

sadtalker's People

Contributors

johndpope avatar thegenerativegeneration avatar vinthony avatar winfredy avatar zqq-judy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.