Giter Site home page Giter Site logo

sibozhang / speech2video Goto Github PK

View Code? Open in Web Editor NEW
97.0 9.0 8.0 2.87 MB

Code for ACCV 2020 "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses"

Home Page: https://sites.google.com/view/sibozhang/speech2video

speech2video-synthesis video gan computer-vision simulation face gesture pose 3d body

speech2video's Introduction

Speech2Video

This is code for "Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses". ACCV 2020. Project Page

Introduction

We propose a novel approach to convert given speech audio to a photo-realistic speaking video of a specific person, where the output video has synchronized, realistic, and expressive rich body dynamics. We achieve this by first generating 3D skeleton movements from the audio sequence using a recurrent neural network (RNN), and then synthesizing the output video via a conditional generative adversarial network (GAN). To make the skeleton movement realistic and expressive, we embed the knowledge of an articulated 3D human skeleton and a learned dictionary of personal speech iconic gestures into the generation process in both learning and testing pipelines. The former prevents the generation of unreasonable body distortion, while the later helps our model quickly learn meaningful body movement through a few recorded videos. To produce photo-realistic and high-resolution video with motion details, we propose to insert part attention mechanisms in the conditional GAN, where each detailed part, e.g. head and hand, is automatically zoomed in to have their own discriminators.

Data / Preprocessing

pretrained model Download

Citation

Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses

Miao Liao*, Sibo Zhang*, Peng Wang, Hao Zhu, Xinxin Zuo, Ruigang Yang. PDF Result Video 1 min Spotlight 10 min Presentation

@inproceedings{liao2020speech2video,
  title={Speech2video synthesis with 3D skeleton regularization and expressive body poses},
  author={Liao, Miao and Zhang, Sibo and Wang, Peng and Zhu, Hao and Zuo, Xinxin and Yang, Ruigang},
  booktitle={Proceedings of the Asian Conference on Computer Vision},
  year={2020}
}

Ackowledgements

This code is based on the vid2vid framework.

speech2video's People

Contributors

sibozhang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

speech2video's Issues

demo效果

看了demo,感觉人脸的口型和语音没有匹配上。动作比较机械化。

code release

hello, appricate your work!
when will the code and the pretrained models released? i am looking forward to them !

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.