
syncnet-in-keras's Introduction

syncnet-in-keras

Keras version of SyncNet, by Joon Son Chung and Andrew Zisserman.

SyncNet paper: "Out of time: automated lip sync in the wild"

VGG webpage: VGG - SyncNet

Requirements

  1. The required Python libraries are listed in the requirements.txt file (I used Python 3).

  2. Keras

  3. Pre-trained weights, to be placed in the syncnet-weights directory. Instructions for downloading and placing the files are in the readme file inside the syncnet-weights directory.

IMPORTANT

  • SyncNet takes input images of size (112, 112, 5).

  • These input images have pixel values between 0 and 255! DON'T rescale image values to [0, 1], keep them in [0, 255].
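The two points above can be sketched as a small input-preparation helper. This is a minimal sketch, not the repository's own code: it assumes the last axis of the (112, 112, 5) input indexes five consecutive grayscale frames, and the names `stack_mouth_frames` and `looks_rescaled` are hypothetical. Random arrays stand in for real cropped frames.

```python
import numpy as np

FRAME_SIZE = 112  # height and width stated above
NUM_FRAMES = 5    # assumption: last axis = 5 consecutive grayscale frames


def stack_mouth_frames(frames):
    """Stack five 112x112 grayscale frames into one (112, 112, 5) array.

    Hypothetical helper. Pixel values are left untouched, in [0, 255].
    """
    frames = [np.asarray(f) for f in frames]
    if len(frames) != NUM_FRAMES:
        raise ValueError(f"expected {NUM_FRAMES} frames, got {len(frames)}")
    for f in frames:
        if f.shape != (FRAME_SIZE, FRAME_SIZE):
            raise ValueError(f"expected a 112x112 grayscale frame, got {f.shape}")
    # Frames go on the last axis; cast to float32 WITHOUT dividing by 255.
    return np.stack(frames, axis=-1).astype(np.float32)


def looks_rescaled(x):
    """Heuristic check: True if the values seem to have been scaled to [0, 1]."""
    return float(np.max(x)) <= 1.0


# Dummy data in place of real grayscale crops.
rng = np.random.default_rng(0)
dummy_frames = [
    rng.integers(0, 256, size=(FRAME_SIZE, FRAME_SIZE), dtype=np.uint8)
    for _ in range(NUM_FRAMES)
]

lip_input = stack_mouth_frames(dummy_frames)  # shape (112, 112, 5)
batch = lip_input[np.newaxis, ...]            # shape (1, 112, 112, 5)
```

A check such as `looks_rescaled(batch)` just before prediction is a cheap way to catch an accidental division by 255 somewhere earlier in a pipeline.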

syncnet-in-keras's Issues

Input to SyncNet

Hi, I'm a bit confused about different implementations.

The SyncNet paper suggests that the image input to the SyncNet model consists of mouth images, and you describe the procedure for extracting the mouth region in this issue: #1 (comment)

However, in the official repo the author mentions that the input to the pre-trained model is actually the full face, not the mouth region. They don't implement cropping of the mouth region in their code: joonson/syncnet_python#9

So are the pretrained SyncNet v4/v7 models for frontal and multi-view on the website here actually trained on the entire face? Wouldn't your code give inaccurate model outputs, since you use the mouth region instead of the full face?

Thanks

What are the inputs to the lip & audio models?

Thanks for this awesome repo @voletiv

Question about the input data shape

  1. You mentioned:

SyncNet takes input images of size (112, 112, 5).

What is the 5 for? With a standard cv2 imread, the shape is usually (height, width, channels), and channels would be 3 (B, G, R in OpenCV). So why 5?

  2. Is the image supposed to be the entire image? Or the entire face in the image? Or just the lips (as a region of interest) as determined by, say, dlib landmarks?

  3. I am completely new to audio processing, so I am not even sure where to begin. What exactly am I supposed to pass to the model? I read the SyncNet paper, but I'm still a bit confused.

As always, sample code is appreciated.
