Greetings from Mudit, an ML enthusiast. Welcome to the Image Captioning repository! This project implements an image captioning system using deep learning techniques, with the aim of automatically generating descriptive captions for images.
Image captioning is a challenging problem that lies at the intersection of computer vision and natural language processing. This project uses convolutional neural networks (CNNs) to extract features from images and recurrent neural networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, to generate captions.
- Extracts features from images using a pre-trained CNN.
- Generates captions using an LSTM-based RNN.
- Supports training on custom datasets.
- Includes pre-processing and post-processing steps for images and text.
This project leverages several technologies:
- Python: The programming language used for development.
- TensorFlow/Keras: Deep learning framework for building and training the neural networks.
- NumPy: Library for numerical operations.
- Pandas: Library for data manipulation and analysis.
- OpenCV: Library for image processing.
- Matplotlib: Library for plotting and visualization.
We use a pre-trained convolutional neural network (CNN), specifically InceptionV3, to extract features from images. The CNN is truncated before the final classification layer, and the output from the last convolutional block is used as the image representation.
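As a sketch of this step, the snippet below builds an InceptionV3 feature extractor with Keras. The pooled 2048-dimensional output stands in for the image representation described above; `weights=None` is used here purely to keep the sketch lightweight (in practice you would load `weights="imagenet"`), and the random image is a placeholder.

```python
import numpy as np
import tensorflow as tf

# InceptionV3 with the classification head dropped (include_top=False);
# pooling="avg" yields one 2048-d feature vector per image.
# Use weights="imagenet" in real use -- weights=None here avoids the download.
extractor = tf.keras.applications.InceptionV3(
    weights=None, include_top=False, pooling="avg"
)

# InceptionV3 expects 299x299 RGB inputs scaled to [-1, 1].
img = np.random.rand(1, 299, 299, 3).astype("float32")
img = tf.keras.applications.inception_v3.preprocess_input(img * 255.0)

features = extractor(img)
print(features.shape)  # (1, 2048)
```

These feature vectors are typically precomputed once for the whole dataset, so the CNN never runs during caption-model training.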
For generating captions, we use an LSTM-based recurrent neural network. The process involves:
- Embedding Layer: Converts input words to dense vectors of fixed size.
- LSTM Layer: Processes the sequence of embedded words, maintaining a memory of the sequence.
- Dense Layer: Maps the LSTM output to the vocabulary size to predict the next word.
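The three layers above can be wired together roughly as follows. This is a minimal sketch using the Keras functional API; the dimensions (`VOCAB_SIZE`, `MAX_LEN`, etc.) are illustrative placeholders, not the repository's actual settings, and the image-branch projection is one common way to combine CNN features with the LSTM output.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 5000   # illustrative values, not the repo's configuration
MAX_LEN = 34
EMBED_DIM = 256
UNITS = 256
FEAT_DIM = 2048     # InceptionV3 pooled feature size

# Image branch: project the CNN feature vector to the LSTM's width.
img_in = layers.Input(shape=(FEAT_DIM,))
img_vec = layers.Dense(UNITS, activation="relu")(layers.Dropout(0.5)(img_in))

# Text branch: embed the partial caption, then run it through an LSTM.
txt_in = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(txt_in)
seq = layers.LSTM(UNITS)(layers.Dropout(0.5)(emb))

# Merge both branches; a final Dense layer predicts the next word.
merged = layers.add([img_vec, seq])
hidden = layers.Dense(UNITS, activation="relu")(merged)
out = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = tf.keras.Model([img_in, txt_in], out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
print(model.output_shape)  # (None, 5000)
```

Each training example pairs the image features with a caption prefix, and the target is the caption's next word.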
- Image Pre-processing: Resize and normalize images.
- Text Pre-processing: Tokenize captions, build vocabulary, and convert captions to sequences.
- Model Training: Train the model using pairs of image features and corresponding captions.
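The text pre-processing step might look like the sketch below, which tokenizes captions, builds a vocabulary, and pads the resulting sequences. The `startseq`/`endseq` markers and the two example captions are hypothetical; real training would read captions from the dataset.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical captions with start/end-of-sequence markers.
captions = [
    "startseq a dog runs on grass endseq",
    "startseq a man rides a bike endseq",
]

# Tokenize and build the vocabulary (index 0 is reserved for padding).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1

# Convert captions to integer sequences, padded to a fixed length.
seqs = tokenizer.texts_to_sequences(captions)
max_len = max(len(s) for s in seqs)
padded = pad_sequences(seqs, maxlen=max_len, padding="post")
print(padded.shape)  # (2, 7)
```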
- Python 3.x
- TensorFlow
- Keras
- NumPy
- Pandas
- OpenCV
- Matplotlib
- Clone the repository:

  git clone https://github.com/Mudit2003/imageCaptioning.git
  cd imageCaptioning

- Install the required packages:

  pip install -r requirements.txt
- Prepare your dataset of images and captions.
- Pre-process the images and captions using the provided scripts.
- Train the model:
python train.py --dataset_path path/to/dataset --epochs 50 --batch_size 64
To generate captions for new images:
python generate_caption.py --image_path path/to/image.jpg
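Caption generation of this kind usually uses greedy decoding: start from the start token, repeatedly feed the caption-so-far to the model, and append the most probable next word until the end token appears. The sketch below illustrates the loop with a stub predictor standing in for the trained model; `startseq`/`endseq` and the tiny vocabulary are assumptions for illustration only.

```python
import numpy as np

def greedy_caption(predict_fn, word_index, index_word, max_len):
    """Greedily decode a caption; predict_fn maps a token sequence
    to a probability distribution over the vocabulary."""
    caption = ["startseq"]
    for _ in range(max_len):
        seq = [word_index[w] for w in caption]
        probs = predict_fn(seq)                    # next-word distribution
        word = index_word[int(np.argmax(probs))]
        if word == "endseq":                       # stop at the end token
            break
        caption.append(word)
    return " ".join(caption[1:])

# Tiny stub vocabulary and scripted predictor, for illustration only.
word_index = {"startseq": 1, "a": 2, "dog": 3, "endseq": 4}
index_word = {v: k for k, v in word_index.items()}
script = iter([2, 3, 4])                           # emits "a dog", then stops
predict_fn = lambda seq: np.eye(5)[next(script)]   # one-hot "distribution"
print(greedy_caption(predict_fn, word_index, index_word, 10))  # a dog
```

Beam search is a common drop-in replacement for this loop when higher-quality captions are needed.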
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
If you have any questions or suggestions, feel free to open an issue or contact me at [email protected].
Happy coding!