
🚀 EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Shuai Tan¹, Bin Ji¹, Mengxiao Bi², Ye Pan¹

1Shanghai Jiao Tong University
2NetEase Fuxi AI Lab

ECCV 2024




๐ŸŽ Abstract

Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the application and entertainment value of talking head generation. This necessitates a deep exploration of the decoupling space for facial features, ensuring that they a) operate independently without mutual interference and b) can be preserved to share with different modal inputs—both aspects often neglected in existing methods. To address this gap, this paper proposes a novel Efficient Disentanglement framework for Talking head generation (EDTalk). Our framework enables individual manipulation of mouth shape, head pose, and emotional expression, conditioned on both video and audio inputs. Specifically, we employ three lightweight modules to decompose the facial dynamics into three distinct latent spaces representing mouth, pose, and expression, respectively. Each space is characterized by a set of learnable bases whose linear combinations define specific motions. To ensure independence and accelerate training, we enforce orthogonality among the bases and devise an efficient training strategy to allocate motion responsibilities to each space without relying on external knowledge. The learned bases are then stored in corresponding banks, enabling shared visual priors with audio input. Furthermore, considering the properties of each space, we propose an Audio-to-Motion module for audio-driven talking head synthesis. Experiments demonstrate the effectiveness of EDTalk.
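The bases-bank idea above can be illustrated with a short sketch. This is an illustrative reconstruction, not the authors' code: the base count and dimension are toy values, and the penalty is one standard way to enforce orthogonality (Gram matrix close to identity).

```python
import numpy as np

# Illustrative sketch: each latent space holds a bank of learnable bases,
# a specific motion is a linear combination of them, and an orthogonality
# penalty keeps the bases mutually independent.

def orthogonality_penalty(bases: np.ndarray) -> float:
    """bases: (num_bases, dim). Penalize the Gram matrix's distance from identity."""
    b = bases / np.linalg.norm(bases, axis=1, keepdims=True)  # unit-norm rows
    gram = b @ b.T
    return float(np.sum((gram - np.eye(b.shape[0])) ** 2))

def motion(bases: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """A specific motion is a weighted sum (linear combination) of the bases."""
    return weights @ bases

# Perfectly orthogonal bases incur zero penalty.
ortho = np.eye(4, 16)  # 4 orthonormal bases of dimension 16 (toy sizes)
print(orthogonality_penalty(ortho))  # → 0.0
```

In training, such a penalty would be added to the task loss so gradient descent keeps each bank's bases decorrelated.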

💻 Overview



🔥 Update

  • 2024.07.01 - 💻 The inference code and pretrained models are available.
  • 2024.07.01 - 🎉 Our paper is accepted by ECCV 2024.
  • 2024.04.02 - 🛳️ This repo is released.

📅 TODO

  • Release training code.
  • Release inference code.
  • Release pre-trained models.
  • Release arXiv paper.

🎮 Installation

We train and test with Python 3.8 and PyTorch. First clone the repository:

git clone https://github.com/tanshuai0219/EDTalk.git
cd EDTalk

Then create the conda environment and install the Python dependencies:

conda create -n EDTalk python=3.8
conda activate EDTalk
pip install -r requirements.txt

🎬 Quick Start

Download the checkpoints and put them into ./ckpts.

EDTalk-A (lip + pose + exp): run the demo in the audio-driven setting (EDTalk-A):

For user-friendliness, we extract the weights of eight common emotions from the expression bases, so you can directly specify an emotion to generate emotional talking-face videos (recommended):

python demo_EDTalk_A_using_predefined_exp_weights.py --source_path path/to/image --audio_driving_path path/to/audio --pose_driving_path path/to/pose --exp_type type/of/expression --save_path path/to/save

Or you can provide an expression reference (image or video) to specify the expression:

python demo_EDTalk_A.py --source_path path/to/image --audio_driving_path path/to/audio --pose_driving_path path/to/pose --exp_driving_path path/to/expression --save_path path/to/save

The result will be stored in save_path.
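Conceptually, the predefined expression weights amount to storing one weight vector per emotion and composing the expression code as a linear combination of the bank's bases. A toy sketch, in which the bank contents, weight values, and dimensions are all made up for illustration (the released checkpoint stores the real extracted weights):

```python
import numpy as np

# Toy expression bank: 8 bases of dimension 16 (real sizes differ).
bank = np.random.default_rng(0).standard_normal((8, 16))

# Hypothetical per-emotion weight vectors over the 8 bases.
exp_weights = {
    "happy": np.array([0.7, 0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0]),
    "angry": np.array([0.0, 0.0, 0.6, 0.0, 0.3, 0.1, 0.0, 0.0]),
}

def expression_code(exp_type: str) -> np.ndarray:
    """Look up the stored weights and combine the bank's bases linearly."""
    return exp_weights[exp_type] @ bank

print(expression_code("happy").shape)  # → (16,)
```

Passing --exp_type then simply selects which stored weight vector drives the expression space.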

The source image and any driving videos must first be cropped using the scripts crop_image2.py and crop_video.py.

You can also use crop_image.py to crop the image, but increase_ratio must be set carefully and may take several tries to get the best result.


EDTalk-A (lip + pose, without exp): if you don't want to change the expression of the identity source, download EDTalk_lip_pose.pt and put it into ./ckpts.

If you only want to change the lip motion of the identity source, run:

 python demo_lip_pose.py --fix_pose --source_path path/to/image --audio_driving_path path/to/audio --save_path path/to/save

Or you can additionally control the head pose on top of the above via pose_driving_path:

 python demo_lip_pose.py --source_path path/to/image --audio_driving_path path/to/audio --pose_driving_path path/to/pose --save_path path/to/save

Run the demo in the video-driven setting (EDTalk-V):

python demo_EDTalk_V.py --source_path path/to/image --lip_driving_path path/to/lip --audio_driving_path path/to/audio --pose_driving_path path/to/pose --exp_driving_path path/to/expression --save_path path/to/save

The result will be stored in save_path.
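For batch processing, the demo scripts above can be driven from Python. This wrapper only assembles the documented command line for the predefined-expression demo (script name and flags as shown above; all paths are placeholders you must supply):

```python
import subprocess
from pathlib import Path

def build_edtalk_a_cmd(source, audio, pose, exp_type, save):
    """Assemble the documented command line for the predefined-expression demo."""
    return [
        "python", "demo_EDTalk_A_using_predefined_exp_weights.py",
        "--source_path", str(source),
        "--audio_driving_path", str(audio),
        "--pose_driving_path", str(pose),
        "--exp_type", exp_type,
        "--save_path", str(save),
    ]

def run_batch(sources, audio, pose, exp_type, out_dir):
    """Run the demo once per source image, saving one video per source."""
    out_dir = Path(out_dir)
    for src in sources:
        save = out_dir / (Path(src).stem + ".mp4")
        subprocess.run(build_edtalk_a_cmd(src, audio, pose, exp_type, save), check=True)
```

Remember that every source image and driving video must be cropped with the provided scripts before being passed in.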

🎓 Citation

@article{tan2024edtalk,
  title={EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis},
  author={Tan, Shuai and Ji, Bin and Bi, Mengxiao and Pan, Ye},
  journal={arXiv preprint arXiv:2404.01647},
  year={2024}
}

๐Ÿ™ Acknowledgement

Some code is borrowed from the following projects:

Some figures in the paper are inspired by:

Thanks for these great projects.
