Deep learning for image captioning: Should RNNs be blind?
Corresponding blog post: https://nchlis.github.io/2018_07_28/page.html
The purpose of each file is as follows:
- load_images.py: reads the Flickr8K images, encodes them with a pre-trained ResNet50, and saves the encodings to disk as Flickr8k_images_encoded.npy. The corresponding image filenames are saved to Flickr8k_images_filenames.npy (a sketch of the encoding step appears after this list).
- load_captions.py: reads the Flickr8K captions and saves them to captions.npy.
- train_model_GRU.py, train_model_LSTM.py: train the Merge architecture using a GRU and an LSTM, respectively (see the architecture sketch after this list).
- train_model_GRU_inject.py, train_model_LSTM_inject.py: same as above, but using the Inject architecture.
- evaluate_model.py: calculates BLEU scores for a given model (see the BLEU sketch below).
- plot_results.py: plots the results of all models (the figures in the blog post comparing the methods).
- generate_new_caption.py: loads a trained model and generates captions for all images in the ./captioned_images folder (see the decoding sketch below).
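
As a quick illustration of the encoding step, here is a minimal sketch assuming a Keras ResNet50 with its classification head removed, so that `pooling='avg'` yields one 2048-d feature vector per image; the exact preprocessing in load_images.py may differ:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# Headless ResNet50; pooling='avg' gives a single 2048-d vector per image.
encoder = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def encode_image(path):
    """Load an image, resize it to ResNet50's expected input, and encode it."""
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])  # add a batch dimension
    return encoder.predict(x)[0]              # shape: (2048,)
```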
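The Merge/Inject distinction is the core of the comparison: in Merge the RNN never sees the image (it is "blind" to it) and the image features join only after the RNN, while in Inject the image features are fed into the RNN itself. Below is a minimal, hypothetical Keras sketch of both, showing the "par-inject" variant; `VOCAB_SIZE`, `MAX_LEN`, and the layer sizes are illustrative assumptions, not the values used by the training scripts:

```python
from tensorflow.keras.layers import (Input, Dense, Embedding, GRU,
                                     RepeatVector, concatenate)
from tensorflow.keras.models import Model

VOCAB_SIZE, MAX_LEN, DIM = 5000, 40, 256  # illustrative values only

def merge_model():
    # Merge: the RNN reads only the words; the image features join
    # after the RNN, just before the word-prediction layer.
    img_in = Input(shape=(2048,))                   # encoded image
    img = Dense(DIM, activation='relu')(img_in)
    cap_in = Input(shape=(MAX_LEN,))                # caption so far
    cap = Embedding(VOCAB_SIZE, DIM, mask_zero=True)(cap_in)
    cap = GRU(DIM)(cap)
    out = Dense(VOCAB_SIZE, activation='softmax')(concatenate([img, cap]))
    return Model([img_in, cap_in], out)

def inject_model():
    # Inject (par-inject variant): the image features are fed into
    # the RNN at every timestep, alongside the word embeddings.
    img_in = Input(shape=(2048,))
    img = RepeatVector(MAX_LEN)(Dense(DIM, activation='relu')(img_in))
    cap_in = Input(shape=(MAX_LEN,))
    cap = Embedding(VOCAB_SIZE, DIM)(cap_in)
    rnn = GRU(DIM)(concatenate([img, cap]))
    out = Dense(VOCAB_SIZE, activation='softmax')(rnn)
    return Model([img_in, cap_in], out)
```

The *_LSTM scripts presumably swap the GRU layer for an LSTM, leaving the rest of the architecture unchanged.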
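For scoring, each generated caption is compared against the reference captions of its image. A minimal sketch with NLTK's corpus_bleu, with toy data standing in for real captions:

```python
from nltk.translate.bleu_score import corpus_bleu

# references: per image, a list of tokenized ground-truth captions
# (Flickr8K provides 5 per image); hypotheses: one generated caption each.
references = [[['a', 'dog', 'runs', 'on', 'grass'],
               ['a', 'dog', 'is', 'running', 'outside']]]
hypotheses = [['a', 'dog', 'runs', 'outside']]

# BLEU-1 through BLEU-4, the scores typically reported for Flickr8K.
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    print('BLEU-%d: %.3f' % (n, corpus_bleu(references, hypotheses,
                                            weights=weights)))
```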
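Caption generation is typically done by greedy decoding: start from a start token and repeatedly feed the growing sequence back into the model until an end token appears. A hypothetical sketch, where the 'startseq'/'endseq' tokens, the word_to_ix/ix_to_word vocabularies, and max_len are assumptions rather than the script's actual names:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, img_features, word_to_ix, ix_to_word, max_len=40):
    """Greedy decoding: pick the most probable next word at every step."""
    words = ['startseq']                                    # assumed start token
    for _ in range(max_len):
        seq = [word_to_ix[w] for w in words]
        seq = pad_sequences([seq], maxlen=max_len)          # shape: (1, max_len)
        probs = model.predict([img_features[np.newaxis, :], seq])[0]
        next_word = ix_to_word[int(np.argmax(probs))]       # greedy choice
        if next_word == 'endseq':                           # assumed stop token
            break
        words.append(next_word)
    return ' '.join(words[1:])  # drop the start token
```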