This project implements an Image Caption Generator using Deep Learning techniques.
It combines a Convolutional Neural Network (CNN) for image feature extraction with a Long Short-Term Memory (LSTM) recurrent network that generates descriptive captions in natural language.
- Dataset: Flickr8k (8,000 images with 5 captions each)
- Model: CNN (InceptionV3) + LSTM
- Tech stack: Python, TensorFlow/Keras
- Goal: Generate accurate captions for unseen images by learning the mapping between image features and natural language.
- Image preprocessing and feature extraction using pre-trained InceptionV3 (a short sketch follows this list)
- Caption preprocessing with tokenization and padding (sketched below, after the dataset description)
- Sequence modeling with LSTM (an architecture sketch accompanies the training steps below)
- Evaluation using BLEU scores (a BLEU example appears at the end of this README)
- Inference script to generate captions for custom images
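A minimal sketch of the feature-extraction step, assuming images live under `data/Images/`; the helper name and file path are illustrative, not the repo's actual API:

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Pre-trained InceptionV3 without its classification head; global average
# pooling turns each image into a single 2048-dim feature vector.
feature_extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    # InceptionV3 expects 299x299 RGB input.
    img = load_img(image_path, target_size=(299, 299))
    x = img_to_array(img)
    x = preprocess_input(x)          # scale pixel values to [-1, 1]
    x = np.expand_dims(x, axis=0)    # add a batch dimension
    return feature_extractor.predict(x, verbose=0)[0]  # shape: (2048,)

features = extract_features("data/Images/example.jpg")  # hypothetical file
```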
Project structure:

```
data/             # Dataset: images and captions
src/              # Scripts: utils, feature extraction, model, training, inference
notebooks/        # Interactive notebooks for exploration, training, inference
requirements.txt  # Python dependencies
```
Dataset:

- Images: 8,000 images of everyday scenes.
- Captions: 5 captions per image (`Flickr8k.token.txt`).
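Caption preprocessing relies on Keras tokenization and padding. A minimal sketch; the `startseq`/`endseq` markers are a common convention assumed here, and the sample captions are made up:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Captions are wrapped in start/end markers so the decoder
# knows where a sentence begins and ends.
captions = [
    "startseq a dog runs through the grass endseq",
    "startseq two children play on the beach endseq",
]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1  # +1: index 0 is reserved for padding

sequences = tokenizer.texts_to_sequences(captions)
max_length = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_length, padding="post")
```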
Instructions:
- Download the Flickr8k dataset.
- Place images in `data/Images/` and captions in `data/Flickr8k.token.txt`.
- Clone the repo:

  ```bash
  git clone <your_repo_link>
  cd Image-Caption-Generator
  ```
- Install dependencies: `pip install -r requirements.txt`
- Run `notebooks/03_model_training.ipynb` to train the model.
- Training and validation loss will be plotted automatically.
- Model weights will be saved to `models/model_weights.h5`.
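For orientation, here is a sketch of the kind of "merge" CNN + LSTM architecture described above; the layer sizes and the `vocab_size`/`max_length` values are illustrative assumptions, not the notebook's exact configuration:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumption: size of the caption vocabulary
max_length = 34    # assumption: longest tokenized caption

# Image branch: the pre-extracted 2048-dim InceptionV3 feature vector.
img_input = Input(shape=(2048,))
img_dense = Dense(256, activation="relu")(Dropout(0.5)(img_input))

# Text branch: embed the partial caption and run it through an LSTM.
txt_input = Input(shape=(max_length,))
txt_embed = Embedding(vocab_size, 256, mask_zero=True)(txt_input)
txt_lstm = LSTM(256)(Dropout(0.5)(txt_embed))

# Merge both branches and predict the next word of the caption.
decoder = Dense(256, activation="relu")(add([img_dense, txt_lstm]))
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[img_input, txt_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

In this design the model predicts one word at a time: each training example pairs the image features and a partial caption with the next word as the target.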
- Run `notebooks/04_inference_demo.ipynb` to generate captions for sample images (a minimal decoding sketch follows below).
- Visualize results inline or save them in `examples/`.
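Under the hood, caption generation is typically a greedy decoding loop. A minimal sketch, assuming a trained `model`, a fitted `tokenizer`, the training-time `max_length`, and a 2048-dim `photo` feature vector; all names here are assumptions:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo, max_length):
    # Greedy decoding: feed the image features plus the caption so far,
    # then repeatedly append the most probable next word.
    text = "startseq"
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_length, padding="post")
        probs = model.predict([np.expand_dims(photo, 0), seq], verbose=0)[0]
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == "endseq":
            break
        text += " " + word
    return text.replace("startseq", "").strip()
```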
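BLEU evaluation can be done with NLTK's `corpus_bleu`; a small self-contained example with made-up token lists:

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of tokenized reference captions per image, and one
# tokenized generated caption per image.
references = [[
    ["a", "dog", "runs", "through", "the", "grass"],
    ["a", "brown", "dog", "is", "running", "outside"],
]]
hypotheses = [["a", "dog", "runs", "through", "the", "green", "grass"]]

print("BLEU-1: %.3f" % corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
print("BLEU-4: %.3f" % corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))
```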