afourast / deep_lip_reading

Code and models for evaluating a state-of-the-art lip reading network

Home Page: http://www.robots.ox.ac.uk/~vgg/research/deep_lip_reading/

License: Apache License 2.0

Python 99.69% Shell 0.31%

deep_lip_reading's Introduction

Deep Lip Reading

This repository contains code for evaluating the best performing lip reading model described in the paper Deep Lip Reading: A comparison of models and an online application. The model is based on the Transformer architecture.

[Figure: four example panels: Input, Crop, Enc-Dec Attention, Prediction]

Dependencies

System

  • ffmpeg

Python

  • TensorFlow
  • NumPy
  • PyAV

Optional, for visualization

  • MoviePy
  • Imageio-ffmpeg
  • OpenCV
  • TensorBoard

The recommended way to install the Python dependencies is to create a new virtual environment and then run:

pip install -r requirements.txt

Demo

To verify that everything works:

  1. Run ./download_models.sh to get the pretrained models
  2. Run a simple demo:
python main.py --lip_model_path models/lrs2_lip_model

Expected output:

(wer=0.0) IT'S-THAT-SIMPLE --> IT'S-THAT-SIMPLE
 1/1 [================] - ETA: 0:00:00 - cer: 0.00 - wer: 0.00

Visualization

To visualize the input, the attention matrix and the predictions, set the --tb_eval flag to 1 (not supported with beam search):

python main.py --lip_model_path models/lrs2_lip_model --tb_eval 1 --img_channels 3

Then point TensorBoard to the resulting log directory:

tensorboard --logdir=eval_tb_logs

Datasets

The models have been trained and evaluated on the LRW and LRS datasets as well as the non-public MVLRS dataset. More details can be found in the paper.

To evaluate on LRS2, download the dataset and extract it into a directory such as data/lrs2.

For a quick evaluation on the test set without beam search, run:

python main.py --gpu_id 0 --lip_model_path models/lrs2_lip_model --data_path data/lrs2/main --data_list media/lrs2_test_samples.txt 

This should take a few minutes on a GPU and result in a WER of approximately 58%.

Expected output:

(wer=116.7) AND-FOR-ME-THE-SURPRISE-WAS --> I-FOUND-FOR-ME-THAT-IT-IS-A-SURPRISE-RATE
   1/1243 [..............................] - ETA: 54:59 - cer: 0.6667 - wer: 1.1667
(wer=100.0) THEY'RE-MOVING-AROUND --> THEY-MOVED-IT-AROUND
   2/1243 [..............................] - ETA: 33:51 - cer: 0.5238 - wer: 1.0833
(wer=25.0) AND-WE-WERE-RIGHT --> AND-WE-WERE-READ
   3/1243 [..............................] - ETA: 26:30 - cer: 0.4276 - wer: 0.8056
(wer=100.0) AND-THE-NEXT-DAY --> IT'S-NOT-ACTUALLY
   4/1243 [..............................] - ETA: 22:40 - cer: 0.5395 - wer: 0.8542
(wer=62.5) WHEN-THERE-ISN'T-MUCH-ELSE-IN-THE-GARDEN --> WHETHER-IT'S-MUCH-HOLDING-THE-GARDEN
   5/1243 [..............................] - ETA: 21:46 - cer: 0.4916 - wer: 0.8083
                                         .
                                         .
                                         .
(wer=40.0) THESE-LAWS-WOULD-REMAIN-IN-PLACE-FOR-OVER-200-YEARS --> THESE-COURSE-WOULD-HAVE-REPLACED-FOR-OVER-200-YEARS
1239/1243 [============================>.] - ETA: 3s - cer: 0.3828 - wer: 0.5845
(wer=28.6) AS-A-RESULT-OF-THE-GUNPOWDER-PLOT --> AS-A-RESULT-OF-THE-COMPOUND-APPROACH
1240/1243 [============================>.] - ETA: 2s - cer: 0.3828 - wer: 0.5843
(wer=0.0) IT-MAY-TAKE-SOME-TIME --> IT-MAY-TAKE-SOME-TIME
1241/1243 [============================>.] - ETA: 1s - cer: 0.3824 - wer: 0.5838
(wer=0.0) YOU-KNOW-MOST-OF-IT --> YOU-KNOW-MOST-OF-IT
1242/1243 [============================>.] - ETA: 0s - cer: 0.3821 - wer: 0.5834
(wer=100.0) SO-I'LL-ASK-YOU-AGAIN --> WHEN-I-SAW-HIM
1243/1243 [==============================] - 951s 765ms/step - cer: 0.3825 - wer: 0.5837
lm=None, beam=0, bs=1, test_aug:0, horflip True: CER 0.3825, WER 0.583690
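
The per-sample wer values in the log above are consistent with a word-level edit distance divided by the number of words in the ground-truth transcript (the left-hand side of each arrow), and the progress-bar value with a running mean over samples. The following is a minimal sketch under that assumption. It is not the repository's own metric code, and the function names `edit_distance` and `wer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))  # distance from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, sep="-"):
    """Word error rate, normalized by the reference length (assumed convention)."""
    ref_words = reference.split(sep)
    hyp_words = hypothesis.split(sep)
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# The first three samples from the log above.
samples = [
    ("AND-FOR-ME-THE-SURPRISE-WAS", "I-FOUND-FOR-ME-THAT-IT-IS-A-SURPRISE-RATE"),
    ("THEY'RE-MOVING-AROUND", "THEY-MOVED-IT-AROUND"),
    ("AND-WE-WERE-RIGHT", "AND-WE-WERE-READ"),
]
scores = [wer(ref, hyp) for ref, hyp in samples]
for (ref, hyp), score in zip(samples, scores):
    print(f"(wer={100 * score:.1f}) {ref} --> {hyp}")
print(f"running mean wer: {sum(scores) / len(scores):.4f}")
```

Because insertions count as errors, a hypothesis that is longer than the reference can score above 100%, as in the first sample. A character error rate can be sketched the same way by passing character sequences instead of word lists.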

For the best results, run a full beam search, using the language model and performing simple test-time augmentation in the form of horizontal flips.

python main.py --gpu_id 0 --lip_model_path models/lrs2_lip_model --lm_path models/lrs2_language_model --data_path data/lrs2/main --data_list media/lrs2_test_samples.txt --graph_type infer --test_aug_times 2 --beam_size 35

This will take a few hours to complete on a GPU and give a WER of approximately 49%.
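
To illustrate why a wider beam can improve on greedy decoding, here is a generic beam search over a toy next-token table. This is a sketch only: it is not the Tensor2Tensor-derived search used in this repository, it omits the language model and length normalization, and `beam_search`, `toy_model` and `TABLE` are hypothetical names introduced for the example.

```python
import math

# Toy next-token probabilities, keyed by the prefix decoded so far.
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"X": 0.55, "Y": 0.45},
    ("B",): {"Z": 0.9, "Y": 0.1},
}


def toy_model(prefix):
    """Return log-probabilities for the next token; unknown prefixes end the sequence."""
    probs = TABLE.get(prefix, {"<eos>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


def beam_search(next_log_probs, beam_size, max_len, eos="<eos>"):
    """Keep the `beam_size` best partial hypotheses at every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            # Hypotheses ending in eos leave the beam and wait in `finished`.
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len without eos
    return max(finished, key=lambda c: c[1])


for size in (1, 2):
    tokens, score = beam_search(toy_model, beam_size=size, max_len=5)
    print(size, tokens, round(math.exp(score), 2))
```

With a beam of 1 the search commits to "A" (probability 0.6) and ends with a sequence of probability 0.33, while a beam of 2 keeps "B" alive and finds the more probable sequence B, Z (0.36). The same trade-off explains the longer running time of the full evaluation: each step scores many more candidates.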

Citation

If you use this code, please cite:

@InProceedings{Afouras18b,
  author       = "Afouras, T. and Chung, J.~S. and Zisserman, A.",
  title        = "Deep Lip Reading: a comparison of models and an online application",
  booktitle    = "INTERSPEECH",
  year         = "2018",
}

Acknowledgments

The Transformer model is based on the implementation of Kyubyong.

The beam search was adapted from Tensor2Tensor.

The char-RNN language model uses code from sherjilozair.

deep_lip_reading's People

Contributors

afourast

deep_lip_reading's Issues

TensorFlow 2.5.3 does not work

python main.py --lip_model_path models/lrs2_lip_model
2022-02-16 17:28:19.165150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    from language_model.char_rnn_lm import CharRnnLmWrapperSingleton
  File "/home/redpanda/codebase/deep_lip_reading-dependabot-pip-tensorflow-gpu-2.5.3/language_model/char_rnn_lm.py", line 27, in <module>
    from tensorflow.contrib import legacy_seq2seq
ModuleNotFoundError: No module named 'tensorflow.contrib'

It seems that this module is no longer available in TensorFlow 2.5.3.
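
The traceback above can be anticipated before running main.py. The `tensorflow.contrib` package shipped with the 1.x releases and was removed in TensorFlow 2.0, so any 2.x install will fail on that import. Below is a small sketch of such a check using only the standard library; the helper names `module_available` and `expects_contrib` are made up for this example and are not part of the repository.

```python
import importlib.util


def module_available(name):
    """Return True if the module can be located in the current environment."""
    try:
        # find_spec imports parent packages, but not the submodule itself.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. tensorflow) is missing or broken.
        return False


def expects_contrib(version):
    """tensorflow.contrib exists only in releases with major version 1."""
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    for version in ("1.15", "2.5.3", "2.8.0"):
        print(version, "should provide tensorflow.contrib:", expects_contrib(version))
    print("tensorflow.contrib found here:", module_available("tensorflow.contrib"))
```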

What version of tensorflow to use?

Hi!

Can I please check with the author which version of TensorFlow is recommended? Using 1.15 fails with "from tensorflow.python.profiler import trace" errors, while using 2.8.0 gives errors when importing tensorflow.contrib.

Best,
Sourav

Training data

I'm new to lip reading, so I'm sorry if this sounds like a very basic question. For clarification, are word boundaries needed for the training data?

train

Can anyone train this network?

MV-LRS dataset

Hi! Thanks for sharing the code. Could you please tell me where I can find the MV-LRS dataset? I cannot find it in the LRS paper. Is it a subset of LRS? Thank you!
