This project develops a deep learning model for lip reading using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) built from Long Short-Term Memory (LSTM) cells. Lip reading, also known as visual speech recognition, infers spoken words from lip movements alone, and it plays a crucial role in making communication accessible for individuals with hearing impairments and in noisy environments.
- `data/`: Contains the dataset used for training and evaluation.
- `models/`: Stores the trained models and checkpoints.
- `notebooks/`: Jupyter notebooks for data exploration, model development, and evaluation.
- `src/`: Source code for data preprocessing, model training, and inference.
- Python (>=3.6)
- TensorFlow (>=2.0)
- OpenCV (>=4.0)
- NumPy (>=1.16)
- Matplotlib (>=3.0)
- ImageIO (>=2.5)
- Clone the repository.
- Install the required dependencies.
- Prepare the dataset:
  - Place the dataset in the `data/` directory.
  - Preprocess the video data using the provided scripts.
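The preprocessing step might look like the following sketch. It is pure NumPy and assumes the raw frames have already been read into an array (e.g. with OpenCV or ImageIO); the function name, the fixed mouth crop box, and the frame dimensions are illustrative assumptions, not the project's actual scripts:

```python
import numpy as np

def preprocess_frames(frames, mouth_box=(190, 236, 80, 220)):
    """Sketch of per-video preprocessing: crop a fixed mouth region,
    convert to grayscale, and normalize to zero mean / unit variance.

    frames:    (T, H, W, 3) uint8 RGB array, one entry per video frame.
    mouth_box: (y1, y2, x1, x2) hypothetical fixed crop of the mouth region;
               a real pipeline would locate it with a face/landmark detector.
    """
    y1, y2, x1, x2 = mouth_box
    cropped = frames[:, y1:y2, x1:x2, :].astype("float32")
    # Luminance-weighted grayscale conversion (the standard ITU-R 601 weights).
    gray = cropped @ np.array([0.299, 0.587, 0.114], dtype="float32")
    # Per-video normalization helps training stability.
    gray = (gray - gray.mean()) / (gray.std() + 1e-8)
    return gray[..., np.newaxis]  # (T, 46, 140, 1) with the box above
```

The fixed crop keeps every sample the same shape, so the preprocessed videos can be stacked directly into a training batch.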
- Train the model:
  - Run the training script to train the deep learning model on the dataset.
- Evaluate the model:
  - Use the trained model to make predictions on the test data and evaluate performance metrics.
- Fine-tune and experiment with different architectures, hyperparameters, and optimization strategies to improve model performance.
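The CNN + LSTM architecture described above can be sketched in Keras as follows. A `TimeDistributed` CNN extracts per-frame features, and a bidirectional LSTM models the temporal lip movement. The specific dimensions (75 frames, 46×140 grayscale mouth crops, a 40-token vocabulary) and layer sizes are illustrative assumptions, not the project's final configuration:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lipreading_model(frames=75, height=46, width=140, vocab_size=40):
    """CNN + LSTM sketch: the same 2D CNN is applied to every frame via
    TimeDistributed, then a bidirectional LSTM models the frame sequence,
    ending in a per-time-step softmax over the vocabulary."""
    return models.Sequential([
        layers.Input(shape=(frames, height, width, 1)),
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu", padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu", padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Conv2D(96, 3, activation="relu", padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(2)),
        layers.TimeDistributed(layers.Flatten()),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dense(vocab_size, activation="softmax"),
    ])

model = build_lipreading_model()
# Output shape: (batch, frames, vocab_size) — one distribution per frame.
```

Because the softmax is emitted per time step, the model pairs naturally with a CTC loss for sentence-level training, or with frame-level cross-entropy for simpler setups.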
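For the evaluation step, two simple metrics/utilities could look like the sketch below: frame-level accuracy over per-time-step predictions, and a greedy CTC-style decoder that collapses repeated predictions and drops a blank token. Both function names and the blank-at-index-0 convention are assumptions for illustration:

```python
import numpy as np

def frame_accuracy(probs, labels):
    """Frame-level accuracy.
    probs:  (batch, time, vocab) softmax scores.
    labels: (batch, time) integer class ids.
    """
    preds = probs.argmax(axis=-1)
    return float((preds == labels).mean())

def greedy_decode(probs, idx_to_char):
    """Greedy CTC-style decoding sketch: argmax per frame, collapse
    consecutive repeats, and drop the blank (assumed to be index 0)."""
    out, prev = [], None
    for t in probs.argmax(axis=-1):
        if t != prev and t != 0:
            out.append(idx_to_char[int(t)])
        prev = t
    return "".join(out)
```

For reporting, word error rate (WER) and character error rate (CER) against the decoded transcripts are the usual lip-reading metrics to add on top of these.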
This project was developed by Varun Reddy, Darshan, Arya, and Sai as part of a university final-year project.