This is my full implementation of this article.
The dataset used in this project can be found here.
Our goal is to build an image generation model for augmenting our dataset: it generates new crack images that resemble the real ones, with the aim of improving Faster R-CNN's performance at detecting pavement cracks.
The architecture of the image generation model is pretty creative. We first train a variational autoencoder (VAE) to learn latent encodings of real crack images. Then we set the decoder aside and sample from the latent vectors produced by the variational encoder. This sampling provides the randomness needed to generate new images. After that, we feed the sampled vector as input to the DCGAN model. Finally, we set the discriminator aside too, just as we did with the decoder!
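The sampling step can be illustrated with the usual VAE reparameterization trick. This is only a minimal sketch in plain Python, assuming the encoder outputs a mean and a log-variance per latent dimension; the function name `sample_latent` and its parameters are hypothetical and are not taken from this repository.

```python
import math
import random


def sample_latent(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick).

    mu, log_var: per-dimension mean and log-variance, assumed to come
    from the trained variational encoder (hypothetical interface).
    rng: any object with a gauss(mean, std) method, e.g. random.Random(seed).
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # the only source of randomness
        z.append(m + sigma * eps)
    return z


# Usage: one encoded image gives (mu, log_var); each call yields a new z
# that could then be passed to the DCGAN generator.
z = sample_latent([0.3, -1.2, 0.8], [-0.5, 0.1, -2.0], random.Random(0))
```

Because the noise is drawn fresh on every call, the same encoded image can yield many different latent vectors, which is what lets the generator produce varied crack images.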
The image below is from the article:
After training the VAE + DCGAN model, we fine-tuned the famous Faster R-CNN object detection model to find cracks in images. We tested two scenarios: in the first, we fine-tuned it on real images only; in the second, on a mix of real and generated images, to see whether Average Precision improves.
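For readers unfamiliar with Average Precision, here is a rough sketch of how it can be computed for a single class. It assumes an IoU threshold of 0.5, greedy matching of each detection to the best still-unmatched ground-truth box, and all-point interpolation of the precision-recall curve; the exact AP variant used in this project may differ, and the function names and data layout are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (image_id, score, box).
    ground_truth: dict mapping image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}

    # Walk detections from most to least confident, marking each TP or FP.
    tp_flags = []
    for image_id, _score, box in sorted(
            detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth.get(image_id, [])):
            if matched[image_id][idx]:
                continue  # each ground-truth box can be claimed only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[image_id][best_idx] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    # Cumulative precision and recall after each detection.
    tp, recalls, precisions = 0, [], []
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        recalls.append(tp / n_gt)
        precisions.append(tp / rank)

    # Make precision non-increasing, then sum the area under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Running this once on the detector fine-tuned with real images and once on the detector fine-tuned with the mixed set, against the same test annotations, gives two numbers that can be compared directly.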
We further evaluated our generative model with two well-known metrics: Inception Score (IS) and Fréchet Inception Distance (FID). You can read more about them here.
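For reference, the standard definitions of the two metrics are given below. Here $p(y \mid x)$ is the Inception classifier's label distribution for a generated image $x$, $p(y)$ is its average over generated images, and $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of Inception features for real and generated images. These are the textbook formulas; the specific feature layer and implementation used in this project are not stated here.

$$
\mathrm{IS} = \exp\!\Big( \mathbb{E}_{x \sim p_g}\big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \big] \Big)
$$

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\Big( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \Big)
$$

A higher IS and a lower FID both indicate better generated images; FID is zero when the two feature distributions have identical mean and covariance.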