CNN-RNN Automatic Image Captioning

The objective of this project is to develop, train, and test a CNN-RNN model that automatically generates a caption for a given image, as shown in the example image below.

Dataset

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset commonly used to train and benchmark object detection, segmentation, and captioning algorithms. This dataset of image-caption pairs (obtained using the COCO API) is used in this project to train the CNN-RNN model to automatically generate captions from images.
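To make "image-caption pairs" concrete, the sketch below groups captions by image from a dictionary laid out like a COCO captions annotation file (an `images` list and an `annotations` list). It uses plain Python rather than the COCO API used in the project, and the function name `captions_by_image` and the sample entries are made up for illustration.

```python
from collections import defaultdict


def captions_by_image(annotations):
    """Group caption strings by image file name.

    `annotations` follows the COCO captions layout:
    {"images": [{"id": ..., "file_name": ...}, ...],
     "annotations": [{"image_id": ..., "id": ..., "caption": ...}, ...]}
    """
    file_names = {img["id"]: img["file_name"] for img in annotations["images"]}
    grouped = defaultdict(list)
    for ann in annotations["annotations"]:
        grouped[file_names[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


# Hypothetical two-image sample, invented for illustration (not real COCO data).
sample = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg"},
        {"id": 2, "file_name": "img_0002.jpg"},
    ],
    "annotations": [
        {"image_id": 1, "id": 10, "caption": "A dog runs on the beach. "},
        {"image_id": 2, "id": 11, "caption": "Two people ride bicycles."},
        {"image_id": 1, "id": 12, "caption": "A brown dog near the sea."},
    ],
}

pairs = captions_by_image(sample)
```

Each image typically has several reference captions, so one image contributes several training pairs.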

Encoder

The Encoder uses a pre-trained ResNet-50 (with its final fully-connected layer removed) to extract features from a batch of pre-processed images. The output is flattened to a vector and passed through a Linear layer so that the feature vector has the same size as the word embedding.

Decoder

The Decoder consists of an embedding layer that maps caption tokens to word embeddings, an LSTM layer that takes the image feature vector followed by the embedded caption as its input sequence, and a fully-connected output layer that produces a score for each word in the vocabulary.
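One way to arrange these three layers in PyTorch is sketched here. The parameter names (`hidden_size`, `vocab_size`, `num_layers`) and the choice to drop the last caption token so the output length matches the caption length are assumptions, not details confirmed by the description above.

```python
import torch
import torch.nn as nn


class DecoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Assumption: the last token is dropped so that, with the image
        # feature prepended, the sequence length equals the caption length.
        embeddings = self.embed(captions[:, :-1])                  # (B, T-1, E)
        inputs = torch.cat((features.unsqueeze(1), embeddings), 1)  # (B, T, E)
        hidden, _ = self.lstm(inputs)                               # (B, T, H)
        return self.fc(hidden)                                      # (B, T, V)
```

The output holds unnormalised scores over the vocabulary at every time step, so it can be fed directly to a cross-entropy loss.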

CNN-RNN Encoder-Decoder

The complete model combines the pre-trained ResNet-50 encoder (EncoderCNN) and the LSTM decoder (DecoderRNN) to automatically generate image captions.
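How the two halves might be trained together is sketched here as a single optimisation step. `training_step`, `TinyEncoder`, and `TinyDecoder` are hypothetical names; the tiny modules stand in for the ResNet-50 encoder and LSTM decoder so the snippet runs quickly, and cross-entropy over the vocabulary scores is an assumed loss.

```python
import torch
import torch.nn as nn


def training_step(encoder, decoder, images, captions, criterion, optimizer):
    """Run one optimisation step and return the scalar loss."""
    optimizer.zero_grad()
    features = encoder(images)               # (B, E)
    outputs = decoder(features, captions)    # (B, T, V)
    # Flatten batch and time so each position is one classification target.
    loss = criterion(outputs.reshape(-1, outputs.size(-1)), captions.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical stand-ins with the same interfaces as the real encoder/decoder.
class TinyEncoder(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, embed_size))

    def forward(self, images):
        return self.net(images)


class TinyDecoder(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        embedded = self.embed(captions[:, :-1])
        inputs = torch.cat((features.unsqueeze(1), embedded), dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)
```

A single optimizer over the parameters of both modules is enough, since the loss gradient flows from the decoder back into the encoder's trainable layers.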

Notebooks

The project is split across four Python notebooks:

Notebook 0 : Dataset - Explore the MS COCO dataset using the COCO API

Notebook 1 : Preliminaries - Explore the DataLoader, Obtain Batches, Experiment with the CNN Encoder and Implement the RNN Decoder

Notebook 2 : Training - Setup Training Process, Define & Tune Hyperparameters, Save Trained Models

Notebook 3 : Inference - Get Data Loader for Test Dataset, Define Decoder Sampler, Use trained model to generate captions for images in the test dataset.
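The "decoder sampler" step in Notebook 3 could be implemented as greedy decoding, sketched here. Greedy selection is an assumption (the notebook may use a different strategy), as are the function name `greedy_sample`, the `max_len` limit, and the `end_idx` value for the end-of-caption token.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def greedy_sample(embed, lstm, fc, features, max_len=20, end_idx=1):
    """Generate token ids for one image.

    features: (1, E) image feature; lstm must be built with batch_first=True.
    end_idx is an assumed index for the end-of-caption token.
    """
    inputs = features.unsqueeze(1)   # (1, 1, E), the first LSTM input
    states = None                    # LSTM starts from a zero state
    tokens = []
    for _ in range(max_len):
        hidden, states = lstm(inputs, states)   # (1, 1, H)
        scores = fc(hidden.squeeze(1))          # (1, V)
        predicted = scores.argmax(dim=1)        # (1,) most likely word
        tokens.append(predicted.item())
        if predicted.item() == end_idx:
            break
        # Feed the predicted word back in as the next input.
        inputs = embed(predicted).unsqueeze(1)  # (1, 1, E)
    return tokens
```

The returned ids would then be mapped back to words with the vocabulary to form the caption string.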

Results

The picture above samples some images in the test dataset and the corresponding (relatively accurate) predicted captions.

The picture above samples some images in the test dataset and the corresponding (relatively inaccurate) predicted captions.

Acknowledgement

Notebook Documentation, Images and Starter Code are part of project files provided by Udacity in the Computer Vision Nanodegree.
