
rudrabha / lipgan


This repository contains the code for LipGAN. LipGAN was published as part of the paper titled "Towards Automatic Face-to-Face Translation".

Home Page: http://cvit.iiit.ac.in/research/projects/cvit-projects/facetoface-translation

License: MIT License

Python 57.53% MATLAB 42.47%

lipgan's People

Contributors

prajwalkr, rudrabha


lipgan's Issues

What kind of video does the 'batch_inference.py' need?

Hi, @Rudrabha @prajwalkr thank you for your amazing work!

I'm trying to run inference on a local video using batch_inference.py, but I encountered the following error even though I tried different video sources (all in mp4 format):
[screenshot]

But when I used another video in mp4 format, batch_inference.py could read the frames in the video but got the following error:
[screenshot]
It seems that what batch_inference.py reads are invalid frames.

Thus, I wonder whether there are some requirements on the input video that have to be met for inference to proceed.

Sample Dataset for usage #2

Hi @Rudrabha ,
Amazing work.
I was working on generating a video from an image + audio, and it would be very helpful if you could post a sample image and audio file. I've been getting different errors every time I use a random image.

Thanks!

CONTINUOUS PLAYING

I want to create a chatbot that uses text-to-speech and LipGAN for face animation. Is there a way LipGAN can be used in real time to create something like a talking avatar driven by live text-to-speech? Any help would be valuable.

Keras requires TensorFlow 2.2 or higher

Hi,

I'm porting the colab version to Kaggle; on the last step (after uninstalling TF and installing a lower version) I get:

Keras requires TensorFlow 2.2 or higher

I checked my TF version, and even after uninstalling it I still get '2.3.0' when I should have 1.14.0.

Any idea why Kaggle is not allowing me to downgrade the TF version?
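For reference, a minimal sketch of forcing the downgrade in a notebook cell; the pin matches the 1.14.0 mentioned above, and a kernel restart is needed afterwards so the already-imported 2.x module is dropped:

# Notebook cell: remove any preinstalled TF, pin 1.14.0, then restart
# the kernel before importing anything that pulls in TensorFlow.
!pip uninstall -y tensorflow tensorflow-gpu
!pip install tensorflow-gpu==1.14.0

import tensorflow as tf
print(tf.__version__)  # should print 1.14.0 after the restart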

Thanks

melspectrogram tensor dimensions mismatch

hey, thanks for the great paper and repo! I have only one issue with running the pythonic implementation:

  1. I downloaded a random YouTube video of a talking person in .mp4 format
  2. with ffmpeg I produced both the .wav and .mp4 files
  3. then, running something like
    python batch_inference.py --checkpoint_path checkpoint.h5 --model residual --face youtube_video.mp4 --fps 30 --audio youtube_video.wav --results_dir result_dir
    I hit a problem with the melspectrogram tensor, since it has shape [80, ...], but input_audio in the pretrained model requires [12, ...] (and I have no idea how to solve it)

Wav file was generated with: ffmpeg -i youtube_video.mp4 youtube_video.wav

Could you please provide a proper example of how to use batch_inference.py on an arbitrary .mp4 video?
Or maybe you have an idea what causes such a dimension-size mismatch?
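For context, a minimal sketch (assuming librosa) of the 80-bin melspectrogram that the mel-based checkpoint consumes; a [12, ...] input_audio instead suggests an MFCC-based checkpoint (13 coefficients with the first discarded, as in the paper), so the checkpoint and the audio pipeline have to match:

# 80-bin melspectrogram, matching the [80, ...] tensor above; the sample
# rate here is an illustrative assumption, not the repo's hparams value.
import librosa

wav, sr = librosa.load("youtube_video.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=80)
print(mel.shape)  # (80, T)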

Thanks in advance!

the MFCC feature size

Hi, great work!
I am a little confused about the MFCC feature size. In the paper, you said

We extract 13 MFCC features from each audio segment (T = 350, F = 100) and discard the first feature similar to Chung et al.

However, in audio_hparams.py I found that num_mels equals 80, not 13, which differs from the paper's claim. Can you kindly explain the difference?
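For reference, a sketch of the paper's MFCC recipe with librosa (13 coefficients, the first one discarded, leaving 12); the windowing defaults here are assumptions. num_mels = 80 in audio_hparams.py instead configures the melspectrogram pipeline, which is a different feature from the MFCCs described in the paper:

# 13 MFCCs per segment, first coefficient dropped as in Chung et al.
import librosa

wav, sr = librosa.load("segment.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)
mfcc = mfcc[1:]    # discard the first coefficient
print(mfcc.shape)  # (12, T)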

Not using GPU

The process is not using the GPU, and on the CPU it takes 11 hours to complete a 1-minute video. Any idea what the reason could be?
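A quick, generic TF 1.x check (not specific to this repo) for whether a GPU is visible at all:

# If no '/device:GPU:...' entry is listed, TF will silently run on CPU.
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])
# e.g. ['/device:CPU:0', '/device:GPU:0'] when CUDA is set up correctly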

Need Help - ValueError: Layer #37 (named "batch_normalization_34" in the current model) was found to correspond to layer conv2d_35 in the save file. However the new layer batch_normalization_34 expects 4 weights, but the saved weights have 2 elements.

Hello Sir,

I like your project very much and I am trying it on Google Colab by following this link (https://colab.research.google.com/drive/1NLUwupCBsB1HrpEmOIHeMgU63sus2LxP). I'm attaching the video (output_00006.mp4) and audio (taunt.wav) files for your reference. After executing all the steps successfully, running the last step produced the log below. Please let me know if I am missing something, as I did not see an output file generated in the /content directory even after refreshing the folder in Google Colab.

/content/LipGAN
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
2000
Number of frames available for inference: 3841
(80, 328)
Length of mel chunks: 95
0% 0/1 [00:00<?, ?it/s]
0% 0/61 [00:00<?, ?it/s]
[... progress bar output trimmed ...]
100% 61/61 [00:26<00:00, 2.30it/s]
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2020-07-05 15:01:10.670458: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-05 15:01:10.674180: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-07-05 15:01:10.674808: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.675569: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xcd015b80 executing computations on platform CUDA. Devices:
2020-07-05 15:01:10.675601: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-07-05 15:01:10.677270: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000185000 Hz
2020-07-05 15:01:10.677454: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xcd015800 executing computations on platform Host. Devices:
2020-07-05 15:01:10.677481: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-07-05 15:01:10.677688: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.678238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-07-05 15:01:10.678764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-07-05 15:01:10.682118: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-07-05 15:01:10.684604: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-07-05 15:01:10.685235: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-07-05 15:01:10.688999: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-07-05 15:01:10.690899: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-07-05 15:01:10.691003: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-07-05 15:01:10.691109: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.691678: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.692183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-07-05 15:01:10.692252: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-07-05 15:01:10.693576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-05 15:01:10.693601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-07-05 15:01:10.693612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-07-05 15:01:10.693725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.694282: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-05 15:01:10.694774: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-07-05 15:01:10.694810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15059 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.


Layer (type) Output Shape Param # Connected to

input_face (InputLayer) (None, 96, 96, 6) 0


conv2d_1 (Conv2D) (None, 96, 96, 32) 9440 input_face[0][0]


batch_normalization_1 (BatchNor (None, 96, 96, 32) 128 conv2d_1[0][0]


activation_1 (Activation) (None, 96, 96, 32) 0 batch_normalization_1[0][0]


conv2d_2 (Conv2D) (None, 48, 48, 64) 51264 activation_1[0][0]


batch_normalization_2 (BatchNor (None, 48, 48, 64) 256 conv2d_2[0][0]


activation_2 (Activation) (None, 48, 48, 64) 0 batch_normalization_2[0][0]


conv2d_3 (Conv2D) (None, 48, 48, 64) 36928 activation_2[0][0]


batch_normalization_3 (BatchNor (None, 48, 48, 64) 256 conv2d_3[0][0]


activation_3 (Activation) (None, 48, 48, 64) 0 batch_normalization_3[0][0]


conv2d_4 (Conv2D) (None, 48, 48, 64) 36928 activation_3[0][0]


batch_normalization_4 (BatchNor (None, 48, 48, 64) 256 conv2d_4[0][0]


activation_4 (Activation) (None, 48, 48, 64) 0 batch_normalization_4[0][0]


add_1 (Add) (None, 48, 48, 64) 0 activation_4[0][0]
activation_2[0][0]


activation_5 (Activation) (None, 48, 48, 64) 0 add_1[0][0]


input_audio (InputLayer) (None, 80, 27, 1) 0


conv2d_5 (Conv2D) (None, 48, 48, 64) 36928 activation_5[0][0]


conv2d_27 (Conv2D) (None, 80, 27, 32) 320 input_audio[0][0]


batch_normalization_5 (BatchNor (None, 48, 48, 64) 256 conv2d_5[0][0]


batch_normalization_27 (BatchNo (None, 80, 27, 32) 128 conv2d_27[0][0]


activation_6 (Activation) (None, 48, 48, 64) 0 batch_normalization_5[0][0]


activation_36 (Activation) (None, 80, 27, 32) 0 batch_normalization_27[0][0]


conv2d_6 (Conv2D) (None, 48, 48, 64) 36928 activation_6[0][0]


conv2d_28 (Conv2D) (None, 80, 27, 32) 9248 activation_36[0][0]


batch_normalization_6 (BatchNor (None, 48, 48, 64) 256 conv2d_6[0][0]


batch_normalization_28 (BatchNo (None, 80, 27, 32) 128 conv2d_28[0][0]


activation_7 (Activation) (None, 48, 48, 64) 0 batch_normalization_6[0][0]


activation_37 (Activation) (None, 80, 27, 32) 0 batch_normalization_28[0][0]


add_2 (Add) (None, 48, 48, 64) 0 activation_7[0][0]
activation_5[0][0]


conv2d_29 (Conv2D) (None, 80, 27, 32) 9248 activation_37[0][0]


activation_8 (Activation) (None, 48, 48, 64) 0 add_2[0][0]


batch_normalization_29 (BatchNo (None, 80, 27, 32) 128 conv2d_29[0][0]


conv2d_7 (Conv2D) (None, 24, 24, 128) 73856 activation_8[0][0]


activation_38 (Activation) (None, 80, 27, 32) 0 batch_normalization_29[0][0]


batch_normalization_7 (BatchNor (None, 24, 24, 128) 512 conv2d_7[0][0]


add_10 (Add) (None, 80, 27, 32) 0 activation_38[0][0]
activation_36[0][0]


activation_9 (Activation) (None, 24, 24, 128) 0 batch_normalization_7[0][0]


activation_39 (Activation) (None, 80, 27, 32) 0 add_10[0][0]


conv2d_8 (Conv2D) (None, 24, 24, 128) 147584 activation_9[0][0]


conv2d_30 (Conv2D) (None, 80, 27, 32) 9248 activation_39[0][0]


batch_normalization_8 (BatchNor (None, 24, 24, 128) 512 conv2d_8[0][0]


batch_normalization_30 (BatchNo (None, 80, 27, 32) 128 conv2d_30[0][0]


activation_10 (Activation) (None, 24, 24, 128) 0 batch_normalization_8[0][0]


activation_40 (Activation) (None, 80, 27, 32) 0 batch_normalization_30[0][0]


conv2d_9 (Conv2D) (None, 24, 24, 128) 147584 activation_10[0][0]


conv2d_31 (Conv2D) (None, 80, 27, 32) 9248 activation_40[0][0]


batch_normalization_9 (BatchNor (None, 24, 24, 128) 512 conv2d_9[0][0]


batch_normalization_31 (BatchNo (None, 80, 27, 32) 128 conv2d_31[0][0]


activation_11 (Activation) (None, 24, 24, 128) 0 batch_normalization_9[0][0]


activation_41 (Activation) (None, 80, 27, 32) 0 batch_normalization_31[0][0]


add_3 (Add) (None, 24, 24, 128) 0 activation_11[0][0]
activation_9[0][0]


add_11 (Add) (None, 80, 27, 32) 0 activation_41[0][0]
activation_39[0][0]


activation_12 (Activation) (None, 24, 24, 128) 0 add_3[0][0]


activation_42 (Activation) (None, 80, 27, 32) 0 add_11[0][0]


conv2d_10 (Conv2D) (None, 24, 24, 128) 147584 activation_12[0][0]


conv2d_32 (Conv2D) (None, 27, 9, 64) 18496 activation_42[0][0]


batch_normalization_10 (BatchNo (None, 24, 24, 128) 512 conv2d_10[0][0]


batch_normalization_32 (BatchNo (None, 27, 9, 64) 256 conv2d_32[0][0]


activation_13 (Activation) (None, 24, 24, 128) 0 batch_normalization_10[0][0]


activation_43 (Activation) (None, 27, 9, 64) 0 batch_normalization_32[0][0]


conv2d_11 (Conv2D) (None, 24, 24, 128) 147584 activation_13[0][0]


conv2d_33 (Conv2D) (None, 27, 9, 64) 36928 activation_43[0][0]


batch_normalization_11 (BatchNo (None, 24, 24, 128) 512 conv2d_11[0][0]


batch_normalization_33 (BatchNo (None, 27, 9, 64) 256 conv2d_33[0][0]


activation_14 (Activation) (None, 24, 24, 128) 0 batch_normalization_11[0][0]


activation_44 (Activation) (None, 27, 9, 64) 0 batch_normalization_33[0][0]


add_4 (Add) (None, 24, 24, 128) 0 activation_14[0][0]
activation_12[0][0]


conv2d_34 (Conv2D) (None, 27, 9, 64) 36928 activation_44[0][0]


activation_15 (Activation) (None, 24, 24, 128) 0 add_4[0][0]


batch_normalization_34 (BatchNo (None, 27, 9, 64) 256 conv2d_34[0][0]


conv2d_12 (Conv2D) (None, 24, 24, 128) 147584 activation_15[0][0]


activation_45 (Activation) (None, 27, 9, 64) 0 batch_normalization_34[0][0]


batch_normalization_12 (BatchNo (None, 24, 24, 128) 512 conv2d_12[0][0]


add_12 (Add) (None, 27, 9, 64) 0 activation_45[0][0]
activation_43[0][0]


activation_16 (Activation) (None, 24, 24, 128) 0 batch_normalization_12[0][0]


activation_46 (Activation) (None, 27, 9, 64) 0 add_12[0][0]


conv2d_13 (Conv2D) (None, 24, 24, 128) 147584 activation_16[0][0]


conv2d_35 (Conv2D) (None, 27, 9, 64) 36928 activation_46[0][0]


batch_normalization_13 (BatchNo (None, 24, 24, 128) 512 conv2d_13[0][0]


batch_normalization_35 (BatchNo (None, 27, 9, 64) 256 conv2d_35[0][0]


activation_17 (Activation) (None, 24, 24, 128) 0 batch_normalization_13[0][0]


activation_47 (Activation) (None, 27, 9, 64) 0 batch_normalization_35[0][0]


add_5 (Add) (None, 24, 24, 128) 0 activation_17[0][0]
activation_15[0][0]


conv2d_36 (Conv2D) (None, 27, 9, 64) 36928 activation_47[0][0]


activation_18 (Activation) (None, 24, 24, 128) 0 add_5[0][0]


batch_normalization_36 (BatchNo (None, 27, 9, 64) 256 conv2d_36[0][0]


conv2d_14 (Conv2D) (None, 12, 12, 256) 295168 activation_18[0][0]


activation_48 (Activation) (None, 27, 9, 64) 0 batch_normalization_36[0][0]


batch_normalization_14 (BatchNo (None, 12, 12, 256) 1024 conv2d_14[0][0]


add_13 (Add) (None, 27, 9, 64) 0 activation_48[0][0]
activation_46[0][0]


activation_19 (Activation) (None, 12, 12, 256) 0 batch_normalization_14[0][0]


activation_49 (Activation) (None, 27, 9, 64) 0 add_13[0][0]


conv2d_15 (Conv2D) (None, 12, 12, 256) 590080 activation_19[0][0]


conv2d_37 (Conv2D) (None, 9, 9, 128) 73856 activation_49[0][0]


batch_normalization_15 (BatchNo (None, 12, 12, 256) 1024 conv2d_15[0][0]


batch_normalization_37 (BatchNo (None, 9, 9, 128) 512 conv2d_37[0][0]


activation_20 (Activation) (None, 12, 12, 256) 0 batch_normalization_15[0][0]


activation_50 (Activation) (None, 9, 9, 128) 0 batch_normalization_37[0][0]


conv2d_16 (Conv2D) (None, 12, 12, 256) 590080 activation_20[0][0]


conv2d_38 (Conv2D) (None, 9, 9, 128) 147584 activation_50[0][0]


batch_normalization_16 (BatchNo (None, 12, 12, 256) 1024 conv2d_16[0][0]


batch_normalization_38 (BatchNo (None, 9, 9, 128) 512 conv2d_38[0][0]


activation_21 (Activation) (None, 12, 12, 256) 0 batch_normalization_16[0][0]


activation_51 (Activation) (None, 9, 9, 128) 0 batch_normalization_38[0][0]


add_6 (Add) (None, 12, 12, 256) 0 activation_21[0][0]
activation_19[0][0]


conv2d_39 (Conv2D) (None, 9, 9, 128) 147584 activation_51[0][0]


activation_22 (Activation) (None, 12, 12, 256) 0 add_6[0][0]


batch_normalization_39 (BatchNo (None, 9, 9, 128) 512 conv2d_39[0][0]


conv2d_17 (Conv2D) (None, 12, 12, 256) 590080 activation_22[0][0]


activation_52 (Activation) (None, 9, 9, 128) 0 batch_normalization_39[0][0]


batch_normalization_17 (BatchNo (None, 12, 12, 256) 1024 conv2d_17[0][0]


add_14 (Add) (None, 9, 9, 128) 0 activation_52[0][0]
activation_50[0][0]


activation_23 (Activation) (None, 12, 12, 256) 0 batch_normalization_17[0][0]


activation_53 (Activation) (None, 9, 9, 128) 0 add_14[0][0]


conv2d_18 (Conv2D) (None, 12, 12, 256) 590080 activation_23[0][0]


conv2d_40 (Conv2D) (None, 9, 9, 128) 147584 activation_53[0][0]


batch_normalization_18 (BatchNo (None, 12, 12, 256) 1024 conv2d_18[0][0]


batch_normalization_40 (BatchNo (None, 9, 9, 128) 512 conv2d_40[0][0]


activation_24 (Activation) (None, 12, 12, 256) 0 batch_normalization_18[0][0]


activation_54 (Activation) (None, 9, 9, 128) 0 batch_normalization_40[0][0]


add_7 (Add) (None, 12, 12, 256) 0 activation_24[0][0]
activation_22[0][0]


conv2d_41 (Conv2D) (None, 9, 9, 128) 147584 activation_54[0][0]


activation_25 (Activation) (None, 12, 12, 256) 0 add_7[0][0]


batch_normalization_41 (BatchNo (None, 9, 9, 128) 512 conv2d_41[0][0]


conv2d_19 (Conv2D) (None, 6, 6, 512) 1180160 activation_25[0][0]


activation_55 (Activation) (None, 9, 9, 128) 0 batch_normalization_41[0][0]


batch_normalization_19 (BatchNo (None, 6, 6, 512) 2048 conv2d_19[0][0]


add_15 (Add) (None, 9, 9, 128) 0 activation_55[0][0]
activation_53[0][0]


activation_26 (Activation) (None, 6, 6, 512) 0 batch_normalization_19[0][0]


activation_56 (Activation) (None, 9, 9, 128) 0 add_15[0][0]


conv2d_20 (Conv2D) (None, 6, 6, 512) 2359808 activation_26[0][0]


conv2d_42 (Conv2D) (None, 3, 3, 256) 295168 activation_56[0][0]


batch_normalization_20 (BatchNo (None, 6, 6, 512) 2048 conv2d_20[0][0]


batch_normalization_42 (BatchNo (None, 3, 3, 256) 1024 conv2d_42[0][0]


activation_27 (Activation) (None, 6, 6, 512) 0 batch_normalization_20[0][0]


activation_57 (Activation) (None, 3, 3, 256) 0 batch_normalization_42[0][0]


conv2d_21 (Conv2D) (None, 6, 6, 512) 2359808 activation_27[0][0]


conv2d_43 (Conv2D) (None, 3, 3, 256) 590080 activation_57[0][0]


batch_normalization_21 (BatchNo (None, 6, 6, 512) 2048 conv2d_21[0][0]


batch_normalization_43 (BatchNo (None, 3, 3, 256) 1024 conv2d_43[0][0]


activation_28 (Activation) (None, 6, 6, 512) 0 batch_normalization_21[0][0]


activation_58 (Activation) (None, 3, 3, 256) 0 batch_normalization_43[0][0]


add_8 (Add) (None, 6, 6, 512) 0 activation_28[0][0]
activation_26[0][0]


conv2d_44 (Conv2D) (None, 3, 3, 256) 590080 activation_58[0][0]


activation_29 (Activation) (None, 6, 6, 512) 0 add_8[0][0]


batch_normalization_44 (BatchNo (None, 3, 3, 256) 1024 conv2d_44[0][0]


conv2d_22 (Conv2D) (None, 6, 6, 512) 2359808 activation_29[0][0]


activation_59 (Activation) (None, 3, 3, 256) 0 batch_normalization_44[0][0]


batch_normalization_22 (BatchNo (None, 6, 6, 512) 2048 conv2d_22[0][0]


add_16 (Add) (None, 3, 3, 256) 0 activation_59[0][0]
activation_57[0][0]


activation_30 (Activation) (None, 6, 6, 512) 0 batch_normalization_22[0][0]


activation_60 (Activation) (None, 3, 3, 256) 0 add_16[0][0]


conv2d_23 (Conv2D) (None, 6, 6, 512) 2359808 activation_30[0][0]


conv2d_45 (Conv2D) (None, 3, 3, 256) 590080 activation_60[0][0]


batch_normalization_23 (BatchNo (None, 6, 6, 512) 2048 conv2d_23[0][0]


batch_normalization_45 (BatchNo (None, 3, 3, 256) 1024 conv2d_45[0][0]


activation_31 (Activation) (None, 6, 6, 512) 0 batch_normalization_23[0][0]


activation_61 (Activation) (None, 3, 3, 256) 0 batch_normalization_45[0][0]


add_9 (Add) (None, 6, 6, 512) 0 activation_31[0][0]
activation_29[0][0]


conv2d_46 (Conv2D) (None, 3, 3, 256) 590080 activation_61[0][0]


activation_32 (Activation) (None, 6, 6, 512) 0 add_9[0][0]


batch_normalization_46 (BatchNo (None, 3, 3, 256) 1024 conv2d_46[0][0]


conv2d_24 (Conv2D) (None, 3, 3, 512) 2359808 activation_32[0][0]


activation_62 (Activation) (None, 3, 3, 256) 0 batch_normalization_46[0][0]


batch_normalization_24 (BatchNo (None, 3, 3, 512) 2048 conv2d_24[0][0]


add_17 (Add) (None, 3, 3, 256) 0 activation_62[0][0]
activation_60[0][0]


activation_33 (Activation) (None, 3, 3, 512) 0 batch_normalization_24[0][0]


activation_63 (Activation) (None, 3, 3, 256) 0 add_17[0][0]


conv2d_25 (Conv2D) (None, 1, 1, 512) 2359808 activation_33[0][0]


conv2d_47 (Conv2D) (None, 1, 1, 512) 1180160 activation_63[0][0]


batch_normalization_25 (BatchNo (None, 1, 1, 512) 2048 conv2d_25[0][0]


batch_normalization_47 (BatchNo (None, 1, 1, 512) 2048 conv2d_47[0][0]


activation_34 (Activation) (None, 1, 1, 512) 0 batch_normalization_25[0][0]


activation_64 (Activation) (None, 1, 1, 512) 0 batch_normalization_47[0][0]


conv2d_26 (Conv2D) (None, 1, 1, 512) 262656 activation_34[0][0]


conv2d_48 (Conv2D) (None, 1, 1, 512) 262656 activation_64[0][0]


batch_normalization_26 (BatchNo (None, 1, 1, 512) 2048 conv2d_26[0][0]


batch_normalization_48 (BatchNo (None, 1, 1, 512) 2048 conv2d_48[0][0]


activation_35 (Activation) (None, 1, 1, 512) 0 batch_normalization_26[0][0]


activation_65 (Activation) (None, 1, 1, 512) 0 batch_normalization_48[0][0]


concatenate_1 (Concatenate) (None, 1, 1, 1024) 0 activation_35[0][0]
activation_65[0][0]


conv2d_transpose_1 (Conv2DTrans (None, 3, 3, 512) 4719104 concatenate_1[0][0]


batch_normalization_49 (BatchNo (None, 3, 3, 512) 2048 conv2d_transpose_1[0][0]


activation_66 (Activation) (None, 3, 3, 512) 0 batch_normalization_49[0][0]


concatenate_2 (Concatenate) (None, 3, 3, 1024) 0 activation_33[0][0]
activation_66[0][0]


conv2d_transpose_2 (Conv2DTrans (None, 6, 6, 512) 4719104 concatenate_2[0][0]


batch_normalization_50 (BatchNo (None, 6, 6, 512) 2048 conv2d_transpose_2[0][0]


activation_67 (Activation) (None, 6, 6, 512) 0 batch_normalization_50[0][0]


conv2d_49 (Conv2D) (None, 6, 6, 512) 2359808 activation_67[0][0]


batch_normalization_51 (BatchNo (None, 6, 6, 512) 2048 conv2d_49[0][0]


activation_68 (Activation) (None, 6, 6, 512) 0 batch_normalization_51[0][0]


conv2d_50 (Conv2D) (None, 6, 6, 512) 2359808 activation_68[0][0]


batch_normalization_52 (BatchNo (None, 6, 6, 512) 2048 conv2d_50[0][0]


activation_69 (Activation) (None, 6, 6, 512) 0 batch_normalization_52[0][0]


add_18 (Add) (None, 6, 6, 512) 0 activation_69[0][0]
activation_67[0][0]


activation_70 (Activation) (None, 6, 6, 512) 0 add_18[0][0]


conv2d_51 (Conv2D) (None, 6, 6, 512) 2359808 activation_70[0][0]


batch_normalization_53 (BatchNo (None, 6, 6, 512) 2048 conv2d_51[0][0]


activation_71 (Activation) (None, 6, 6, 512) 0 batch_normalization_53[0][0]


conv2d_52 (Conv2D) (None, 6, 6, 512) 2359808 activation_71[0][0]


batch_normalization_54 (BatchNo (None, 6, 6, 512) 2048 conv2d_52[0][0]


activation_72 (Activation) (None, 6, 6, 512) 0 batch_normalization_54[0][0]


add_19 (Add) (None, 6, 6, 512) 0 activation_72[0][0]
activation_70[0][0]


activation_73 (Activation) (None, 6, 6, 512) 0 add_19[0][0]


concatenate_3 (Concatenate) (None, 6, 6, 1024) 0 activation_32[0][0]
activation_73[0][0]


conv2d_transpose_3 (Conv2DTrans (None, 12, 12, 256) 2359552 concatenate_3[0][0]


batch_normalization_55 (BatchNo (None, 12, 12, 256) 1024 conv2d_transpose_3[0][0]


activation_74 (Activation) (None, 12, 12, 256) 0 batch_normalization_55[0][0]


conv2d_53 (Conv2D) (None, 12, 12, 256) 590080 activation_74[0][0]


batch_normalization_56 (BatchNo (None, 12, 12, 256) 1024 conv2d_53[0][0]


activation_75 (Activation) (None, 12, 12, 256) 0 batch_normalization_56[0][0]


conv2d_54 (Conv2D) (None, 12, 12, 256) 590080 activation_75[0][0]


batch_normalization_57 (BatchNo (None, 12, 12, 256) 1024 conv2d_54[0][0]


activation_76 (Activation) (None, 12, 12, 256) 0 batch_normalization_57[0][0]


add_20 (Add) (None, 12, 12, 256) 0 activation_76[0][0]
activation_74[0][0]


activation_77 (Activation) (None, 12, 12, 256) 0 add_20[0][0]


conv2d_55 (Conv2D) (None, 12, 12, 256) 590080 activation_77[0][0]


batch_normalization_58 (BatchNo (None, 12, 12, 256) 1024 conv2d_55[0][0]


activation_78 (Activation) (None, 12, 12, 256) 0 batch_normalization_58[0][0]


conv2d_56 (Conv2D) (None, 12, 12, 256) 590080 activation_78[0][0]


batch_normalization_59 (BatchNo (None, 12, 12, 256) 1024 conv2d_56[0][0]


activation_79 (Activation) (None, 12, 12, 256) 0 batch_normalization_59[0][0]


add_21 (Add) (None, 12, 12, 256) 0 activation_79[0][0]
activation_77[0][0]


activation_80 (Activation) (None, 12, 12, 256) 0 add_21[0][0]


concatenate_4 (Concatenate) (None, 12, 12, 512) 0 activation_25[0][0]
activation_80[0][0]


conv2d_transpose_4 (Conv2DTrans (None, 24, 24, 128) 589952 concatenate_4[0][0]


batch_normalization_60 (BatchNo (None, 24, 24, 128) 512 conv2d_transpose_4[0][0]


activation_81 (Activation) (None, 24, 24, 128) 0 batch_normalization_60[0][0]


conv2d_57 (Conv2D) (None, 24, 24, 128) 147584 activation_81[0][0]


batch_normalization_61 (BatchNo (None, 24, 24, 128) 512 conv2d_57[0][0]


activation_82 (Activation) (None, 24, 24, 128) 0 batch_normalization_61[0][0]


conv2d_58 (Conv2D) (None, 24, 24, 128) 147584 activation_82[0][0]


batch_normalization_62 (BatchNo (None, 24, 24, 128) 512 conv2d_58[0][0]


activation_83 (Activation) (None, 24, 24, 128) 0 batch_normalization_62[0][0]


add_22 (Add) (None, 24, 24, 128) 0 activation_83[0][0]
activation_81[0][0]


activation_84 (Activation) (None, 24, 24, 128) 0 add_22[0][0]


conv2d_59 (Conv2D) (None, 24, 24, 128) 147584 activation_84[0][0]


batch_normalization_63 (BatchNo (None, 24, 24, 128) 512 conv2d_59[0][0]


activation_85 (Activation) (None, 24, 24, 128) 0 batch_normalization_63[0][0]


conv2d_60 (Conv2D) (None, 24, 24, 128) 147584 activation_85[0][0]


batch_normalization_64 (BatchNo (None, 24, 24, 128) 512 conv2d_60[0][0]


activation_86 (Activation) (None, 24, 24, 128) 0 batch_normalization_64[0][0]


add_23 (Add) (None, 24, 24, 128) 0 activation_86[0][0]
activation_84[0][0]


activation_87 (Activation) (None, 24, 24, 128) 0 add_23[0][0]


concatenate_5 (Concatenate) (None, 24, 24, 256) 0 activation_18[0][0]
activation_87[0][0]


conv2d_transpose_5 (Conv2DTrans (None, 48, 48, 64) 147520 concatenate_5[0][0]


batch_normalization_65 (BatchNo (None, 48, 48, 64) 256 conv2d_transpose_5[0][0]


activation_88 (Activation) (None, 48, 48, 64) 0 batch_normalization_65[0][0]


conv2d_61 (Conv2D) (None, 48, 48, 64) 36928 activation_88[0][0]


batch_normalization_66 (BatchNo (None, 48, 48, 64) 256 conv2d_61[0][0]


activation_89 (Activation) (None, 48, 48, 64) 0 batch_normalization_66[0][0]


conv2d_62 (Conv2D) (None, 48, 48, 64) 36928 activation_89[0][0]


batch_normalization_67 (BatchNo (None, 48, 48, 64) 256 conv2d_62[0][0]


activation_90 (Activation) (None, 48, 48, 64) 0 batch_normalization_67[0][0]


add_24 (Add) (None, 48, 48, 64) 0 activation_90[0][0]
activation_88[0][0]


activation_91 (Activation) (None, 48, 48, 64) 0 add_24[0][0]


conv2d_63 (Conv2D) (None, 48, 48, 64) 36928 activation_91[0][0]


batch_normalization_68 (BatchNo (None, 48, 48, 64) 256 conv2d_63[0][0]


activation_92 (Activation) (None, 48, 48, 64) 0 batch_normalization_68[0][0]


conv2d_64 (Conv2D) (None, 48, 48, 64) 36928 activation_92[0][0]


batch_normalization_69 (BatchNo (None, 48, 48, 64) 256 conv2d_64[0][0]


activation_93 (Activation) (None, 48, 48, 64) 0 batch_normalization_69[0][0]


add_25 (Add) (None, 48, 48, 64) 0 activation_93[0][0]
activation_91[0][0]


activation_94 (Activation) (None, 48, 48, 64) 0 add_25[0][0]


concatenate_6 (Concatenate) (None, 48, 48, 128) 0 activation_8[0][0]
activation_94[0][0]


conv2d_transpose_6 (Conv2DTrans (None, 96, 96, 32) 36896 concatenate_6[0][0]


batch_normalization_70 (BatchNo (None, 96, 96, 32) 128 conv2d_transpose_6[0][0]


activation_95 (Activation) (None, 96, 96, 32) 0 batch_normalization_70[0][0]


concatenate_7 (Concatenate) (None, 96, 96, 64) 0 activation_1[0][0]
activation_95[0][0]


conv2d_65 (Conv2D) (None, 96, 96, 16) 9232 concatenate_7[0][0]


batch_normalization_71 (BatchNo (None, 96, 96, 16) 64 conv2d_65[0][0]


activation_96 (Activation) (None, 96, 96, 16) 0 batch_normalization_71[0][0]


conv2d_66 (Conv2D) (None, 96, 96, 16) 2320 activation_96[0][0]


batch_normalization_72 (BatchNo (None, 96, 96, 16) 64 conv2d_66[0][0]


activation_97 (Activation) (None, 96, 96, 16) 0 batch_normalization_72[0][0]


conv2d_67 (Conv2D) (None, 96, 96, 3) 51 activation_97[0][0]


prediction (Activation) (None, 96, 96, 3) 0 conv2d_67[0][0]

Total params: 49,573,971
Trainable params: 49,543,123
Non-trainable params: 30,848


WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

Model Created
Traceback (most recent call last):
File "batch_inference.py", line 217, in
main()
File "batch_inference.py", line 193, in main
model.load_weights(args.checkpoint_path)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 1166, in load_weights
f, self.layers, reshape=reshape)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 1056, in load_weights_from_hdf5_group
' elements.')
ValueError: Layer #37 (named "batch_normalization_34" in the current model) was found to correspond to layer conv2d_35 in the save file. However the new layer batch_normalization_34 expects 4 weights, but the saved weights have 2 elements.
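One way to debug this kind of mismatch is to compare the layer names stored in the checkpoint against the model the script builds (e.g. residual vs. unet); a small sketch with h5py, where the path is whatever you pass as --checkpoint_path:

# List the layer names recorded in a Keras .h5 weights file; if they
# don't line up with the model summary above, load_weights pairs the
# layers up incorrectly, producing exactly this ValueError.
import h5py

with h5py.File("checkpoint.h5", "r") as f:
    for name in f.attrs["layer_names"]:
        print(name.decode() if isinstance(name, bytes) else name)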

Really Bad Lip Sync Results For Use Case 1

Hi,

Thank you for sharing the model and it's great to see the progress made overall.

When testing, I've observed that while the results are somewhat as expected for use case 2 (lip movements on a picture), they are really bad when generating correct lip motion on a random talking-face video, i.e. use case 1 as described in the repository. I am comparing against the results shown on GitHub and discussed in the paper.

In the results generated for use case 1, the lip motion seems almost the same as in the source video. Basically, I'm trying to understand whether it's supposed to be like that. There appears to be nearly no lip-sync at all: the lip movements mirror those in the source video and are not much indicative of the words in the input audio. If the input audio has a pause, the lips keep moving as long as the source video's lips were moving.

I'm sharing some examples of the results to give a better idea of the model's capabilities; they were generated using a sample video of Obama and a separate sample audio clip of Obama.

Here is another example with a different video of Obama and another sample audio of Obama:

To generate the results, I had someone create a colab notebook, and they used the librosa approach. I can share the notebook in case you want to check whether an error was made.

Again, I really appreciate the model and feel it represents a great advance in the overall tech; I'm creating this issue just to see whether this quality of results is what should be expected from the model or whether it can be improved. Thank you.

numpy.AxisError when training


File "train.py", line 143, in
(dummy_faces, audio), real_faces = next(train_datagen)
File "train.py", line 80, in datagen
mel_batch = np.expand_dims(np.asarray(mel_batch), 3)
File "<array_function internals>", line 6, in expand_dims
File "/home/user/anaconda3/envs/lipgan/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 597, in expand_dims
axis = normalize_axis_tuple(axis, out_ndim)
File "/home/user/anaconda3/envs/lipgan/lib/python3.7/site-packages/numpy/core/numeric.py", line 1358, in normalize_axis_tuple
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
File "/home/user/anaconda3/envs/lipgan/lib/python3.7/site-packages/numpy/core/numeric.py", line 1358, in
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
numpy.AxisError: axis 3 is out of bounds for array of dimension 2
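For what it's worth, "array of dimension 2" in the error implies np.asarray returned a 1-D object array (expand_dims reports ndim + 1), which is what happens when the mel chunks in a batch have unequal shapes and fail to stack; a minimal reproduction with hypothetical chunk sizes:

# Unequal chunk shapes prevent stacking into (N, 80, T); older numpy
# silently builds a 1-D object array, and expand_dims(..., 3) then fails.
import numpy as np

chunks = [np.zeros((80, 27)), np.zeros((80, 25))]   # hypothetical unequal chunks
batch = np.asarray(chunks, dtype=object)            # shape (2,), not (2, 80, T)
np.expand_dims(batch, 3)  # numpy.AxisError: axis 3 is out of bounds for array of dimension 2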

Without MATLAB

Thanks for this great project. Is there a way to process the audio without having MATLAB?

When using a single image UnboundLocalError: local variable 'full_frames' referenced before assignment

When using a single image, full_frames is not defined:

Traceback (most recent call last):
File "batch_inference.py", line 228, in <module>
main()
File "batch_inference.py", line 178, in main
print ("Number of frames to be used for inference: "+str(len(full_frames)))
UnboundLocalError: local variable 'full_frames' referenced before assignment

Also, FPS is a required parameter, but the docs don't mention it for the image case.
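A sketch of the kind of guard that would avoid the error, assuming main() branches on the file extension; read_video_frames is a hypothetical stand-in for the script's actual video-reading loop:

# Ensure full_frames is defined on every path before it is used.
import cv2

if args.face.split('.')[-1].lower() in ('jpg', 'jpeg', 'png'):
    full_frames = [cv2.imread(args.face)]       # single still image
else:
    full_frames = read_video_frames(args.face)  # hypothetical helper
print("Number of frames to be used for inference: " + str(len(full_frames)))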

About Chin movement

Hi, in the results I see that the chin isn't moving and it looks like a die-cut; may I ask how to make it include chin movement? Thanks.

Low GPU usage during inference

Inference is very slow for me. I can see that only the CPU is used at the beginning (while the two progress bars are loading); only after that, during the second stage, do things get a lot faster as the GPU kicks in to generate the final results. Would it be possible to use the GPU the entire time (especially at the beginning), or can that first stage only be done on the CPU? I'm pretty sure I installed everything correctly.
Thanks for sharing this amazing repo, I'm really impressed with the results!

Fully_pythonic branch issue: RuntimeError: Unable to open logs/mmod_human_face_detector.dat for reading.

Hello,
I have a query regarding the fully_pythonic branch execution.
I added the necessary packages and supplied the image and .wav audio file to create example #2 (create a video from an image and audio):
python batch_inference.py --checkpoint_path logs/lipgan_residual_mel.h5 --face face.jpg --audio try.wav --results_dir results

The script starts but ends abruptly with this error:
[screenshot]

Where can I find the mmod_human_face_detector.dat file? I only found a link to download the pretrained LipGAN checkpoint; where do I get this .dat file? Thanks a lot for your help.
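For anyone else hitting this: mmod_human_face_detector.dat is dlib's CNN face-detector model rather than a LipGAN artifact, and it can be fetched from dlib's standard model mirror; a sketch that unpacks it into logs/ where the script looks for it:

# Download and decompress dlib's CNN face detector into logs/.
import bz2
import urllib.request

url = "http://dlib.net/files/mmod_human_face_detector.dat.bz2"
data = bz2.decompress(urllib.request.urlopen(url).read())
with open("logs/mmod_human_face_detector.dat", "wb") as f:
    f.write(data)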

Didn't you apply any learning rate scheduler?

Hello, I have a question about training LipGAN.
In the train.py code, the lr is set to 1e-3, but there is no scheduler code to reduce the learning rate.
Is it right to keep the learning rate constant at 1e-3?
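If a decay were wanted, the standard Keras option is a LearningRateScheduler callback; this is a generic sketch, not something train.py ships with, and the halving interval below is an arbitrary assumption:

# Halve the 1e-3 base learning rate every 5 epochs (assumed schedule).
from keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    return 1e-3 * (0.5 ** (epoch // 5))

callbacks = [LearningRateScheduler(step_decay, verbose=1)]
# then pass callbacks=callbacks to the fit call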

epoch size during training

In train.py and train_unet.py, the default number of epochs is 20000000.
But in the paper you trained the model for 20 epochs on the LRS2 dataset.
For how many epochs do I have to train the model to get accuracy like your released trained weights?

Thanks!

Is it GPU accelerated?

Hi, thanks for your inspiring work. Is this project GPU accelerated? I could not find .cuda() in your source; or am I misunderstanding something? Thanks a lot!

colab

Can inference be done in Colab using Octave instead of MATLAB?

Any ideas to add blinks ?

Hello
This works really well. I was wondering if we could add eye blinks in some way; this would create more realism in the videos generated from a single image.
Any pointers would be lovely

Batch inference of model after training

Hi,
I trained my model and was trying to test it with batch_inference.py. However, my generator model gen.h5 got saved as a tuple (see line 2 in the following screenshot), and thus I'm getting:

[screenshot]
Also, if I try using either of those models individually, I get a further error that the number of layers is not consistent with the existing model.

ValueError: You are trying to load a weight file containing 145 layers into a model with 61 layers.

Do I need to change something?

AttributeError: 'tuple' object has no attribute 'load_weights'

Environment -

Ubuntu 18.04
Tensorflow-GPU - 1.18
Keras tried with 2.2.4 and 2.3.1
cuda-9.0

I also tried importing Keras from TensorFlow, but in all cases I get the following error.

Model Created

Traceback (most recent call last):
File "batch_inference.py", line 228, in
main()
File "batch_inference.py", line 206, in main
model.load_weights(args.checkpoint_path)
AttributeError: 'tuple' object has no attribute 'load_weights'
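If the loading code returned a (generator, discriminator)-style tuple, which is only an assumption based on the two errors above, unpacking the generator before calling load_weights would sidestep the AttributeError; a sketch with hypothetical names:

# Take the generator out of the tuple before loading inference weights.
model = create_model(args)      # hypothetical: whatever batch_inference.py builds
if isinstance(model, tuple):
    model = model[0]            # assume the generator comes first
model.load_weights(args.checkpoint_path)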

Alignment off

I got the colab version running, but the output didn't work at all. How can I fix this?

output.mp4

Speedups during inferencing & Performance Improvements

Hey @Rudrabha & @prajwalkr
Kudos on doing such awesome work!
I had some questions regarding general improvements:

  • Do you think training on more datasets like LRW (which has pauses in speech), LRS2, and LRS3 could improve accuracy and perhaps also fix the issues around pausing?

  • Any thoughts on improving prediction time, such as using something other than dlib (maybe a better and faster face detector)? CPU prediction time is quite slow; any more ideas on this would be really helpful :)

  • How much time did it take you to train on LRS2, and on how many Nvidia Titan X (Pascal) GPUs?

matlab issue

I have MATLAB 2020b installed on my Mac. I can't figure out how to run the create_mat function. What should I type into the terminal?
[screenshot]

Weird output for some videos

Hey @prajwalkr ,
I have attached the video used for running inference with the LipGAN model. I ran the following command:

!python batch_inference.py --checkpoint_path logs/lipgan_residual_mel.h5 --model residual --pad 0 0 0 0 --face "/content/male.mp4" --fps 29.970 --audio "/content/male.wav" --results_dir "/content/"

Here are the videos used. I don't think FPS is the problem here, as the model seems to work for variable FPS as well.
videos.zip

/content/LipGAN
[... numpy FutureWarning lines, identical to the log quoted in the earlier issue, trimmed ...]
Using TensorFlow backend.
Number of frames available for inference: 747
(80, 1874)
Length of mel chunks: 693
  0% 0/3 [00:00<?, ?it/s]
  0% 0/12 [00:00<?, ?it/s]
  8% 1/12 [00:02<00:25,  2.28s/it]
 17% 2/12 [00:02<00:13,  1.36s/it]
 25% 3/12 [00:03<00:09,  1.04s/it]
 33% 4/12 [00:03<00:07,  1.13it/s]
 42% 5/12 [00:03<00:05,  1.27it/s]
 50% 6/12 [00:04<00:04,  1.37it/s]
 58% 7/12 [00:04<00:03,  1.46it/s]
 67% 8/12 [00:05<00:02,  1.54it/s]
 75% 9/12 [00:05<00:01,  1.61it/s]
 83% 10/12 [00:06<00:01,  1.66it/s]
 92% 11/12 [00:06<00:00,  1.70it/s]
100% 12/12 [00:06<00:00,  1.78it/s]
Skipping 0
Skipping 1
Skipping 2
[... Skipping 3 through Skipping 691 trimmed ...]
Skipping 692

Traceback (most recent call last):
  File "batch_inference.py", line 217, in <module>
    main()
  File "batch_inference.py", line 210, in main
    out.release()
UnboundLocalError: local variable 'out' referenced before assignment
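
For context, this error usually means the frame loop never ran (every frame was skipped, as in the log above), so the video writer was never created before out.release() was called. Below is a minimal Python sketch of the failure pattern and a guard; the write_frames helper and its arguments are hypothetical illustrations, not the repo's exact code.

import cv2

def write_frames(frames, out_path, fps, frame_size):
    # 'out' is only bound once the first frame arrives; if 'frames'
    # is empty, calling out.release() raises the UnboundLocalError above.
    out = None
    for frame in frames:
        if out is None:
            out = cv2.VideoWriter(out_path,
                                  cv2.VideoWriter_fourcc(*'MJPG'),
                                  fps, frame_size)
        out.write(frame)
    if out is not None:
        out.release()
    else:
        print('No frames written; check the input video and face detection.')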

Weights for discriminator

Hello,

I tried fine-tuning just the Generator. The image quality does improve; however, the lip movements go slightly out of sync.
Could I also have the trained weights for the Discriminator, so I can start from them?

Error with ffmpeg config for batch inference

Hi, @Rudrabha @prajwalkr thanks for the amazing code and congrats on your paper!!

I am trying to run inference but I get the following error message with ffmpeg:

[buffer @ 0x6f2b80] Error setting option pix_fmt to value -1.
[graph 0 input from stream 1:0 @ 0x6f2e20] Error applying options to the filter.
Error opening filters!

Is there some config requirement that I am missing?
Looking forward to your response.
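
For anyone hitting the same message: "Error setting option pix_fmt to value -1" typically means ffmpeg could not resolve a pixel format for one of its inputs, so stating it explicitly on the output usually sidesteps the filter-graph failure. Here is a minimal sketch of the muxing step with an explicit pixel format; the file paths are hypothetical, and the flags are standard ffmpeg options rather than anything specific to this repo.

import subprocess

# Combine the silent video produced by inference with the driving audio,
# forcing a concrete pixel format so ffmpeg never resolves it to -1.
subprocess.check_call([
    'ffmpeg', '-y',
    '-i', 'temp/result.avi',     # hypothetical inference output
    '-i', 'input_audio.wav',     # hypothetical driving audio
    '-c:v', 'libx264',
    '-pix_fmt', 'yuv420p',       # explicit pixel format
    'results/output.mp4',
])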

When using video, Colab gets killed

Hey @prajwalkr
Thanks for sharing the codebase and models for such a wonderful project! Much appreciated.
I have successfully replicated Use Case 2 (lip sync with an image) on both my CPU and Google Colab.

However, for Use Case 1 (lip sync with a video), both my CPU and Google Colab kill the process because it consumes too much RAM. I have a short text, about 75 characters long, and an input video of 30 seconds at 60 FPS. How can I get this to work?
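
One likely culprit: a 30-second clip at 60 FPS is roughly 1,800 frames, and decoding them all into memory at once is enough to exhaust Colab's RAM. Below is a sketch of lazy frame reading with OpenCV, assuming you can patch the frame-loading step; the generator is illustrative, not the repo's code.

import cv2

def iter_frames(video_path):
    # Yield frames one at a time instead of materialising the whole clip.
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

# Process ~1800 frames without ever holding them all in RAM.
for frame in iter_frames('input_video.mp4'):
    pass  # run face detection / generation per frame here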

Generated Video Rambles & Stops (Bad Lip Sync)

Thanks for developing this software, as it works very well! There's an issue where, given an audio clip, the generated video just "rambles" right through it. Things I've tried:

  • Setting the correct FPS.
  • Setting the max seconds to match the audio/video clip.
  • Tried multiple video AND audio sources.
  • Reinstalling everything listed in your requirements.txt file.

It's very easy to reproduce: record some audio and try it. The model rambles right through it, and there is an elongated pause at the end of the video with the person frozen. If I had to guess, either something with the MFCC extraction (I'm unfamiliar with this as a whole) isn't working properly, or an interpolation step is needed to sync video with audio. Currently using the pretrained models from your Google Drive link. Any insight is appreciated!
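
For what it's worth, audio-to-video alignment in this kind of pipeline comes down to mapping each video frame to a window of audio-feature time steps; if the FPS flag doesn't match the video's true frame rate, the mouth drifts exactly as described. A sketch of the bookkeeping with illustrative constants (80 feature steps per second corresponds to a 200-sample hop at 16 kHz; the real checkpoint's values may differ).

# Map a video frame index to a window of spectrogram/MFCC columns.
steps_per_sec = 80.0   # illustrative: 16 kHz audio, 200-sample hop
fps = 25.0             # must match the input video's true frame rate
window = 27            # illustrative context window per frame

def feature_window(frame_idx):
    start = int(steps_per_sec * frame_idx / fps)
    return start, start + window

# If fps is wrong by 2x, frame 100 reads audio from the wrong second:
print(feature_window(100))  # (320, 347) at 25 fps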

Issue with running the Python Google Colab notebook

Hi, upon running the following cell:

!python3 batch_inference.py --checkpoint_path logs/lipgan_residual_mel.h5 --model residual --face "/content/output_00006.mp4" --fps 25 --audio /content/t.wav --results_dir /content

I get the following errors:

Using TensorFlow backend.
Number of frames available for inference: 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/librosa/core/audio.py", line 127, in load
    with sf.SoundFile(path) as sf_desc:
  File "/usr/local/lib/python3.6/dist-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/usr/local/lib/python3.6/dist-packages/soundfile.py", line 1184, in _open
    "Error opening {0!r}: ".format(self.name))
  File "/usr/local/lib/python3.6/dist-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/content/t.wav': System error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "batch_inference.py", line 217, in <module>
    main()
  File "batch_inference.py", line 166, in main
    wav = audio.load_wav(args.audio, 16000)
  File "/content/LipGAN/audio.py", line 10, in load_wav
    return librosa.core.load(path, sr=sr)[0]
  File "/usr/local/lib/python3.6/dist-packages/librosa/core/audio.py", line 142, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
  File "/usr/local/lib/python3.6/dist-packages/librosa/core/audio.py", line 164, in __audioread_load
    with audioread.audio_open(path) as input_file:
  File "/usr/local/lib/python3.6/dist-packages/audioread/__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "/usr/local/lib/python3.6/dist-packages/audioread/rawread.py", line 62, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/content/t.wav'

I ran all the existing cells before this one; do you know what the issue could be?
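
The immediate cause is that /content/t.wav does not exist when the cell runs (note also "Number of frames available for inference: 0", which suggests the face video isn't being read either). Below is a small sketch that checks for the wav and, if missing, extracts 16 kHz mono audio from a source video with standard ffmpeg flags; the paths are the ones from the command above and may need adjusting.

import os
import subprocess

audio_path = '/content/t.wav'
if not os.path.isfile(audio_path):
    # Extract 16 kHz mono PCM audio from the source video.
    subprocess.check_call([
        'ffmpeg', '-y', '-i', '/content/output_00006.mp4',
        '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1',
        audio_path,
    ])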

Multi-GPU inference

Thanks for your great work. I tried to run inference on a machine with multiple GPUs; it detects all of them (I also set n_gpus to the number of GPUs). The dlib part works fine, but after it finishes, the process halts just after printing "Model Created" and "Model Loaded".

Could I have some hints on how to run this on a multi-GPU machine? Thanks for your help!
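
As a workaround until multi-GPU inference is sorted out, pinning the process to a single visible GPU usually avoids this kind of hang. The environment variable must be set before TensorFlow/Keras is first imported; this is a sketch of the idea, not a guaranteed fix.

import os

# Must run before the first TensorFlow/Keras import, otherwise the
# device list has already been initialised with all GPUs visible.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import keras  # imported only after restricting the visible devices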
