The goal is to build a deep learning model that can describe the content of a food image in the form of text, given an input image of food.
The proposed methodology for food image captioning consists of three main modules.
- CNN-LSTM Caption Prediction (CLCP)
- Multi-Label Classification (MLC)
- Natural Language Generator (NLG)
- Python > 3.7
- TensorFlow > 2.0
- Google Transformer T5
- spaCy
- Yelp Food Dataset
- Yummly28k
- FFoCat dataset
IPYNB File: CNN_LSTM_MultiLabel.ipynb
This is the initial model, trained on images together with their captions. The module requires a JSON file with an image_id and caption for each sample, along with the images in a specific folder. The JSON file is read and the captions are pre-processed, and a word embedding is created by converting each word to a vector. The images from the given path are passed to the CNN for feature extraction. The words of the caption are fed to the LSTM sequentially, and the LSTM learns to predict the next word from the previous ones. The loss is computed at the end and used to update the model.
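A minimal sketch of how such a CNN-LSTM caption model could be assembled in Keras is shown below; the vocabulary size, caption length, embedding dimension, and backbone choice are illustrative assumptions, not the exact configuration used in the notebook.

```python
# Minimal sketch of a CNN-LSTM caption model (sizes and backbone are assumed).
import tensorflow as tf

VOCAB_SIZE = 5000      # assumed size of the word-to-vector vocabulary
MAX_LEN = 30           # assumed maximum caption length
EMBED_DIM = 256

# CNN feature extractor: a pre-trained backbone with its classifier removed.
cnn = tf.keras.applications.InceptionV3(include_top=False, pooling="avg",
                                        weights="imagenet")
cnn.trainable = False

# Image branch: CNN features projected to the embedding dimension.
image_input = tf.keras.Input(shape=(299, 299, 3))
img_features = tf.keras.layers.Dense(EMBED_DIM, activation="relu")(cnn(image_input))

# Text branch: the caption so far, embedded and fed to an LSTM.
caption_input = tf.keras.Input(shape=(MAX_LEN,))
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
lstm_out = tf.keras.layers.LSTM(256)(embedded)

# Merge both branches and predict the next word of the caption.
merged = tf.keras.layers.add([img_features, lstm_out])
output = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = tf.keras.Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```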
At inference time, an image from the folder is read and passed to the CNN feature extractor; the extracted features are passed to the LSTM, which generates the caption by comparing its output vector against the word2vec embedding.
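A hypothetical greedy decoding loop for this step is sketched below; the start/end tokens and the word-index dictionaries are assumed names, not the exact objects from the notebook.

```python
# Hypothetical greedy decoding loop using the `model` sketched above.
import numpy as np

def generate_caption(image, model, word_to_id, id_to_word, max_len=30):
    """Generate a caption for one pre-processed image, one word at a time."""
    caption_ids = [word_to_id["<start>"]]
    for _ in range(max_len):
        # Pad the partial caption to the fixed input length.
        padded = np.zeros((1, max_len), dtype="int32")
        padded[0, :len(caption_ids)] = caption_ids
        probs = model.predict([image[np.newaxis], padded], verbose=0)[0]
        next_id = int(np.argmax(probs))   # pick the most probable next word
        if id_to_word[next_id] == "<end>":
            break
        caption_ids.append(next_id)
    return " ".join(id_to_word[i] for i in caption_ids[1:])
```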
PY File: Mulit-label/food_category_classification.py
This module uses a transfer learning approach: InceptionV3 followed by a dense layer is trained on the FFoCat dataset, which has 156 labels.
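The classifier could look roughly like the sketch below: a frozen InceptionV3 backbone with a sigmoid dense head over the 156 FFoCat labels (layer sizes other than the label count are assumptions).

```python
# Sketch of the transfer-learning classifier for 156 multi-label food categories.
import tensorflow as tf

NUM_LABELS = 156

base = tf.keras.applications.InceptionV3(include_top=False, pooling="avg",
                                         weights="imagenet")
base.trainable = False  # transfer learning: reuse ImageNet features

inputs = tf.keras.Input(shape=(299, 299, 3))
x = base(inputs)
# Sigmoid (not softmax) because each image can carry several food labels.
outputs = tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```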
Set the path of the trained model in the model-loading section and the path of the image to be read, then run the script to get the predicted labels.
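A hypothetical usage snippet, with placeholder paths standing in for the ones to be set in the script:

```python
# Placeholder paths; replace them with the actual model and image locations.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("path/to/trained_model.h5")

img = tf.keras.preprocessing.image.load_img("path/to/food_image.jpg",
                                             target_size=(299, 299))
x = tf.keras.preprocessing.image.img_to_array(img)
x = tf.keras.applications.inception_v3.preprocess_input(x)

probs = model.predict(x[np.newaxis])[0]
# Keep every label whose probability crosses a threshold (threshold is assumed).
predicted_labels = [i for i, p in enumerate(probs) if p > 0.5]
print(predicted_labels)
```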
IPYNB File: NLG_Googles_T5_for_T2T.ipynb
The T5 model is pre-trained on the C4 corpus and fine-tuned on the Yummly28k dataset. The required captions must be present in a data frame that is fed to the model for training.
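One possible way to perform this fine-tuning is with the Hugging Face transformers implementation of T5; the notebook may instead use Google's own T5 code, and the data-frame columns and input format below are assumptions.

```python
# Assumed fine-tuning sketch using the Hugging Face `transformers` library (PyTorch).
import pandas as pd
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The captions are expected in a data frame: a source text (intermediate
# caption + labels) and the target caption to learn.
df = pd.DataFrame({
    "source": ["caption: pasta with tomato sauce | labels: pasta, tomato"],
    "target": ["Fresh pasta served with a rich tomato sauce"],
})

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for _, row in df.iterrows():
    enc = tokenizer(row["source"], return_tensors="pt")
    labels = tokenizer(row["target"], return_tensors="pt").input_ids
    loss = model(input_ids=enc.input_ids,
                 attention_mask=enc.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```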
The outputs of the Multi-Label classifier and the CNN-LSTM model are concatenated and fed into the trained T5 model to obtain the final caption.
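A sketch of this final step, reusing the fine-tuned model and tokenizer from the previous snippet; the variable names and the separator format of the concatenated input are assumptions.

```python
# The intermediate caption and labels are joined into one input string and
# passed to the fine-tuned T5 model (reusing `model` and `tokenizer` above).
intermediate_caption = "pasta with tomato sauce"        # from the CNN-LSTM module
intermediate_labels = ["pasta", "tomato", "basil"]      # from the multi-label module

source = f"caption: {intermediate_caption} | labels: {', '.join(intermediate_labels)}"

inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40, num_beams=4)
final_caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(final_caption)
```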
In this section we aim to generate captions for images from the Yummly28k dataset. The generated caption must be close to the given caption so that it can be evaluated using the evaluation metrics discussed in the section above.
The food images considered in Table 1.1 are from the Yummly28k dataset. The given caption is the ground-truth caption for the food image, the intermediate caption is the output of the CNN-LSTM model, the intermediate label is the output of the Multi-Label classifier, and the generated caption is the final generated caption.
Table 1.2 reports the performance achieved by the proposed caption-generation approach on the Yummly28k dataset under different evaluation metrics.
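As an illustration of how a generated caption can be scored against the ground truth, the snippet below computes sentence-level BLEU with NLTK; BLEU is only an assumed example of the metrics reported in Table 1.2.

```python
# Illustrative evaluation: sentence-level BLEU between a generated caption
# and the ground-truth caption (BLEU is an assumed example metric).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["grilled", "chicken", "salad", "with", "avocado"]
candidate = ["grilled", "chicken", "salad", "with", "fresh", "avocado"]

score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```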
In this section we attempt to generate captions for the Yelp food data. The captions are generated in two ways: for the uncaptioned data (Table 1.3), a caption is generated for images that have no caption; for the captioned data (Table 1.4), a caption is generated for images whose given caption is irrelevant, with the aim of providing a better caption than the one given.
The images considered in Table 1.3 are from the Yelp uncaptioned food dataset, so no captions are provided for them. The model generates good captions for these images by effectively using the intermediate caption and label predictions, followed by the NLG sentence generator for proper caption generation. The generated captions correlate positively with the images.
The images considered in Table 1.4 are from the Yelp captioned dataset, so a caption is provided for each image; however, for part of the Yelp data the given caption is irrelevant. Here we aim to generate a caption that describes the image better than the given one. The model generates good captions by effectively using the intermediate caption and label predictions, followed by the NLG sentence generator for proper caption generation. The generated captions correlate positively with the images.