This project lets a machine generate its own captions for an image. The model is trained on the Flickr dataset and uses an LSTM. The architecture has three parts:
- Image model: reduces the high-dimensional image to a compact set of selected features.
- Language model: represents the caption words as embeddings, which reduces the complexity of the model.
- Final model: concatenates the outputs of the two models above and passes them through an LSTM layer and a TimeDistributed layer to make the final predictions.
A woman is looking at some mannequins in a window display .
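
The three parts described above could be wired together roughly as follows. This is a minimal sketch in Keras, not the project's actual code: it assumes the image has already been turned into a fixed-length feature vector by a pretrained CNN, and the function name `build_caption_model`, the layer widths, and the default sizes (`feature_dim`, `vocab_size`, `max_len`, `embed_dim`) are all hypothetical values chosen for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_caption_model(feature_dim=2048, vocab_size=5000, max_len=20, embed_dim=128):
    """Sketch of an image + language merge model (all sizes are assumptions)."""
    # Image model: project the high-dimensional feature vector down to
    # embed_dim, then repeat it once per caption time step.
    image_in = keras.Input(shape=(feature_dim,), name="image_features")
    img = layers.Dense(embed_dim, activation="relu")(image_in)
    img = layers.RepeatVector(max_len)(img)            # (batch, max_len, embed_dim)

    # Language model: map integer word ids to dense embeddings.
    words_in = keras.Input(shape=(max_len,), dtype="int32", name="caption_tokens")
    txt = layers.Embedding(vocab_size, embed_dim)(words_in)  # (batch, max_len, embed_dim)

    # Final model: concatenate both branches along the feature axis, run an
    # LSTM over the sequence, and predict a word distribution at every step.
    merged = layers.Concatenate(axis=-1)([img, txt])   # (batch, max_len, 2 * embed_dim)
    x = layers.LSTM(256, return_sequences=True)(merged)
    out = layers.TimeDistributed(
        layers.Dense(vocab_size, activation="softmax")
    )(x)                                               # (batch, max_len, vocab_size)

    model = keras.Model(inputs=[image_in, words_in], outputs=out)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


if __name__ == "__main__":
    # Tiny random inputs, only to check that the shapes line up.
    model = build_caption_model(feature_dim=32, vocab_size=50, max_len=6, embed_dim=8)
    feats = np.random.rand(2, 32).astype("float32")
    tokens = np.random.randint(0, 50, size=(2, 6))
    print(model.predict([feats, tokens], verbose=0).shape)
```

Repeating the image vector with `RepeatVector` is one simple way to make the two branches the same length so they can be concatenated step by step; other designs feed the image only as the initial LSTM state.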
