mmaher22 / realistic-neural-talking-head-models

This project forked from jarvisss/realistic-neural-talking-head-models

License: GNU General Public License v3.0

Realistic-Neural-Talking-Head-Models

My implementation of Few-Shot Adversarial Learning of Realistic Neural Talking Head Models (Egor Zakharov et al.). https://arxiv.org/abs/1905.08233

[Images: Fake1 and Real1; Fake2 and Real2]

Inference after 5 epochs of training on the smaller test dataset. Due to a lack of compute resources I stopped early (the authors trained for 75 epochs with the fine-tuning method and 150 with the feed-forward method on the full dataset).

Prerequisites

1. Loading and converting the Caffe VGGFace model to PyTorch for the content loss:

Follow these instructions to install the VGGFace from the paper (https://arxiv.org/pdf/1703.07332.pdf):

$ wget http://www.robots.ox.ac.uk/~vgg/software/vgg_face/src/vgg_face_caffe.tar.gz
$ tar xvzf vgg_face_caffe.tar.gz
$ sudo apt install caffe-cuda
$ pip install mmdnn

Convert Caffe to IR (Intermediate Representation)

$ mmtoir -f caffe -n vgg_face_caffe/VGG_FACE_deploy.prototxt -w vgg_face_caffe/VGG_FACE.caffemodel -o VGGFACE_IR

If you run into a problem with pickle, uninstall numpy and reinstall it at version 1.16.1 (pip install numpy==1.16.1).

IR to Pytorch code and weights

$ mmtocode -f pytorch -n VGGFACE_IR.pb --IRWeightPath VGGFACE_IR.npy --dstModelPath Pytorch_VGGFACE_IR.py -dw Pytorch_VGGFACE_IR.npy

Pytorch code and weights to Pytorch model

$ mmtomodel -f pytorch -in Pytorch_VGGFACE_IR.py -iw Pytorch_VGGFACE_IR.npy -o Pytorch_VGGFACE.pth

At this point, you will have several files in your directory. To save space, you can delete everything except Pytorch_VGGFACE_IR.py and Pytorch_VGGFACE.pth.

1.1. Download the Caffe-trained version of VGG19 converted to PyTorch for the content loss:

Download the file from https://web.eecs.umich.edu/~justincj/models/vgg19-d01eb7cb.pth

Some layer names in the converted model do not match, so set VGG19_caffe_weight_path in params.py to your path and run:

$ python change_vgg19_caffelayer_name.py
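As a rough illustration of what a layer-renaming step like this involves, here is a minimal sketch that renames the keys of a checkpoint dictionary. It is not the content of change_vgg19_caffelayer_name.py: the helper names and the "features." prefix rule are hypothetical, and a plain dictionary stands in for the loaded weights.

```python
from collections import OrderedDict


def rename_state_dict_keys(state_dict, rename):
    """Return a copy of state_dict whose keys are passed through `rename`."""
    renamed = OrderedDict()
    for old_key, value in state_dict.items():
        new_key = rename(old_key)
        if new_key in renamed:
            raise KeyError(f"two layers map to the same name: {new_key}")
        renamed[new_key] = value
    return renamed


def add_features_prefix(key):
    """Hypothetical naming rule: put bare layer indices under 'features.'."""
    return key if key.startswith("features.") else "features." + key


# Stand-in for the loaded checkpoint: strings instead of weight tensors.
checkpoint = OrderedDict([("0.weight", "w0"), ("0.bias", "b0"), ("2.weight", "w2")])
fixed = rename_state_dict_keys(checkpoint, add_features_prefix)
print(list(fixed))
```

With real weights, the same function would be applied to the dictionary loaded from VGG19_caffe_weight_path before saving it again.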

2. Libraries

  • face-alignment
  • torch
  • numpy
  • cv2 (opencv-python)
  • matplotlib
  • tqdm

3. VoxCeleb2 Dataset

The VoxCeleb2 dataset provides its videos as zip archives. They are very heavy: 270 GB for the dev set and 8 GB for the test set. http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html

4. Optional: my pretrained weights

Available at https://drive.google.com/open?id=1vdFz4sh23hC_KIQGJjwbTfUdPG-aYor8

How to use:

  • modify the paths in the params folder to match your setup
  • preprocess.py: preprocess the data for faster inference and a lighter dataset
  • train.py: initialize and train the network, or continue training from a trained network
  • embedder_inference.py: (requires trained model) run the embedder on videos or images of a person and save the embedding vector in a tar file
  • fine_tuning_trainng.py: (requires trained model and embedding vector) fine-tune a trained model
  • webcam_inference.py: (requires trained model and embedding vector) run the model on webcam input using the person from the embedding vector; inference only
  • video_inference.py: like webcam_inference.py but on a video; change the path of the video at the start of the file

Architecture

I followed the architecture guidelines from the paper, along with additional details provided by Egor Zakharov.

The images fed from VoxCeleb2 are enlarged from 224x224 to 256x256 with zero-padding, so that spatial dimensions are not rounded when passing through the downsampling layers.
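The rounding issue can be checked with a few lines of arithmetic. This sketch assumes that each downsampling block halves the spatial size and that there are 6 such blocks (the count given for the embedder); the symmetric split of the padding is also an assumption, since the text does not say where the zeros go.

```python
def halve_repeatedly(size, n_blocks):
    """List the exact sizes after each halving, stopping at the first odd size."""
    sizes = [size]
    for _ in range(n_blocks):
        if sizes[-1] % 2:  # an odd size cannot be halved without rounding
            break
        sizes.append(sizes[-1] // 2)
    return sizes


def symmetric_padding(src, dst):
    """Split the extra pixels as evenly as possible between both sides."""
    total = dst - src
    return total // 2, total - total // 2


print(halve_repeatedly(224, 6))      # gets stuck at 7 after five halvings
print(halve_repeatedly(256, 6))      # reaches 4 after six halvings
print(symmetric_padding(224, 256))   # pixels of padding per side
```

Since 224 = 2^5 x 7, a sixth halving would give 3.5, whereas 256 = 2^8 survives all six.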

The residual blocks are from Large Scale GAN Training for High Fidelity Natural Image Synthesis (Andrew Brock, Jeff Donahue, Karen Simonyan).

Embedder

The embedder uses 6 downsampling residual blocks with no normalisation. A self-attention layer is added in the middle. The output of the last residual block is reduced to a vector of size 512 via max pooling.
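The final pooling step can be sketched with numpy as a maximum over the spatial axes. The 512x4x4 input shape is an assumption (256 halved six times), and random values stand in for real features.

```python
import numpy as np


def global_max_pool(feature_map):
    """Collapse a (channels, height, width) map to a (channels,) vector."""
    return feature_map.max(axis=(1, 2))


rng = np.random.default_rng(0)
features = rng.standard_normal((512, 4, 4))  # assumed output of the last block
embedding = global_max_pool(features)
print(embedding.shape)
```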

Generator

The downsampling part of the generator uses the same architecture as the embedder with instance normalization added at each block following the paper.

The same-dimension residual part uses 5 blocks. These blocks use adaptive instance normalization (AdaIN). In the AdaIN paper (Xun Huang et al.), the learnable scale and shift parameters of instance normalisation are replaced with the standard deviation and mean of the style input. Here, the adaptive parameters (mean and variance) are instead taken from psi, where psi = P*e, P is the projection matrix and e is the embedding vector calculated by the embedder.
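A minimal numpy sketch of one AdaIN layer driven by psi may make this concrete. Several things here are assumptions rather than the repository's code: the toy P covers a single 512-channel layer only, the first half of its slice of psi is read as the shift and the second half as the scale, and the scale is used as a standard deviation as in the AdaIN paper's formula.

```python
import numpy as np


def adain(features, style_mean, style_std, eps=1e-5):
    """Normalise each channel of a (C, H, W) map, then apply per-channel style."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return style_std[:, None, None] * normalized + style_mean[:, None, None]


rng = np.random.default_rng(0)
C = 512
e = rng.standard_normal(512)                   # stand-in embedding vector
P = rng.standard_normal((2 * C, 512)) * 0.01   # toy projection for ONE layer
psi = P @ e                                    # psi = P*e
style_mean, style_std = psi[:C], psi[C:]       # slicing order is an assumption

x = rng.standard_normal((C, 4, 4))             # stand-in feature map
out = adain(x, style_mean, style_std)
print(out.shape)
```

After the call, each channel of `out` has the mean given by `style_mean` and a spread given by the magnitude of `style_std`, whatever the statistics of `x` were.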

(P is of size 2*(512*2*5 + 512*2 + 512*2+ 512+256 + 256+128 + 128+64 + 64+3) x 512 = 17158 x 512)
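The row count can be re-derived term by term. The grouping below is my reading of the formula, not something the text states: 5 residual blocks with two 512-channel AdaIN layers each, then six upsampling blocks contributing one term per AdaIN layer, and a leading factor of 2 for the two adaptive statistics per channel.

```python
RES_BLOCKS = 5

# Assumed reading: two AdaIN layers of 512 channels in each residual block.
res_channels = [512 * 2] * RES_BLOCKS

# One entry per upsampling block, copied from the formula's remaining terms.
up_channels = [512 * 2, 512 * 2, 512 + 256, 256 + 128, 128 + 64, 64 + 3]

# Factor 2: two adaptive statistics per normalised channel.
rows = 2 * (sum(res_channels) + sum(up_channels))
print(rows, "x", 512)
```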

There are then 6 upsampling residual blocks. The final output is a tensor of dimensions 3x224x224. I rescale the image to the [0, 255] range by applying a sigmoid and multiplying by 255. There are two AdaIN layers in each upsampling block (they replace the normalisation layers from the BigGAN paper).
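The output rescaling amounts to the following per-value mapping. This is a plain-Python sketch with hypothetical function names; the actual model would apply the same operation to a whole tensor at once.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_pixel_range(x):
    """Map an unbounded activation to the [0, 255] pixel range."""
    return 255.0 * sigmoid(x)


print(to_pixel_range(0.0))  # midpoint of the range
```

Large negative activations approach 0 and large positive ones approach 255, so the output never leaves the valid pixel range.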

Self-attention layers are added both in the downsampling part and upsampling part of the generator.

Discriminator

The discriminator uses the same architecture as the embedder.
