Street View House Numbers (SVHN) is a real-world image dataset of house numbers cropped from Google Street View images. The dataset can be used to train models that automatically transcribe an address number from a geo-located patch of pixels; the transcribed number can then be used to pinpoint the location of the building it represents.
The challenge in transcribing text from images is that the visual appearance of the text varies across a wide range of fonts, colours, styles, orientations and character arrangements. The problem is further complicated by environmental factors such as lighting, shadows, specularities and occlusions, as well as image acquisition factors such as resolution, motion blur and focus blur.
The objective of this project is to learn how to implement a simple image classification pipeline based on a deep neural network, using the Street View House Numbers (SVHN) dataset.
The goals of this project are as follows:
- Load the dataset
- Understand the basic Image Classification pipeline and the data-driven approach (train/predict stages)
- Fetch the data and understand the train/val/test splits
- Implement and apply a deep neural network classifier (a feedforward neural network with ReLU activations)
- Implement batch normalization for training the neural network
- Print the classification accuracy metrics
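
The classifier, batch normalization and accuracy goals above can be sketched roughly as follows. This is a minimal NumPy illustration of the forward pass only, not the project's actual implementation: the layer sizes, the parameter names (`W1`, `gamma`, and so on) and the random stand-in inputs are all assumptions, batch normalization is shown in training mode (batch statistics only, no running averages), and there is no training loop, so the printed accuracy will be near chance.

```python
import numpy as np


def relu(x):
    # ReLU activation: negative values are clipped to zero.
    return np.maximum(0.0, x)


def batch_norm(x, gamma, beta, eps=1e-5):
    # Training-mode batch normalization: normalize each feature over the
    # batch axis, then apply a learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forward(x, params):
    # One hidden layer: affine -> batch norm -> ReLU -> affine -> softmax.
    h = x @ params["W1"] + params["b1"]
    h = batch_norm(h, params["gamma"], params["beta"])
    h = relu(h)
    logits = h @ params["W2"] + params["b2"]
    return softmax(logits)


def accuracy(probs, labels):
    # Fraction of samples whose most probable class matches the label.
    return float(np.mean(probs.argmax(axis=1) == labels))


# Hypothetical sizes: flattened 32x32 RGB crops, 64 hidden units, 10 digits.
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 32 * 32 * 3, 64, 10

# Untrained, randomly initialized parameters (illustrative names).
params = {
    "W1": rng.normal(0.0, 0.01, size=(n_features, n_hidden)),
    "b1": np.zeros(n_hidden),
    "gamma": np.ones(n_hidden),
    "beta": np.zeros(n_hidden),
    "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_classes)),
    "b2": np.zeros(n_classes),
}

# Random stand-ins for a batch of 16 images and their labels.
x = rng.random((16, n_features))
y = rng.integers(0, n_classes, size=16)

probs = forward(x, params)
acc = accuracy(probs, y)
print(f"Classification accuracy: {acc:.3f}")
```

In a full pipeline the same structure would typically be expressed with a deep learning framework, which also provides backpropagation and the inference-mode behaviour of batch normalization.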
The Street View House Numbers dataset can be downloaded from the following URL: https://drive.google.com/file/d/15ecgFbTSpzHhHQ2q7IKQj4L20bK28gAn/view?usp=sharing
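
The layout of the downloaded file is not described here, and it may already contain predefined splits. As a hedged illustration of the train/val/test idea, the sketch below uses random stand-in arrays shaped like 32x32 RGB digit crops rather than the real data; the function name `train_val_test_split`, the 80/10/10 proportions and the variable names are assumptions made for this example.

```python
import numpy as np


def train_val_test_split(images, labels, val_frac=0.1, test_frac=0.1, seed=0):
    # Shuffle the sample indices once, then carve out disjoint
    # test, validation and training subsets.
    n = len(images)
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return (
        (images[train_idx], labels[train_idx]),
        (images[val_idx], labels[val_idx]),
        (images[test_idx], labels[test_idx]),
    )


# Random stand-ins for the real data (assumption: 100 RGB crops of 32x32
# pixels with digit labels 0-9).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)

train, val, test = train_val_test_split(images, labels)

# Flatten each image to a vector and scale pixel values to [0, 1]
# so it can be fed to a feedforward network.
x_train = train[0].reshape(len(train[0]), -1).astype(np.float32) / 255.0
y_train = train[1]

print(len(train[0]), len(val[0]), len(test[0]), x_train.shape)
```

The validation split is used to tune hyperparameters during training, while the test split is held back for the final accuracy report.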