Go to the `model/` directory and unzip the file `model.zip` (pre-trained on the IAM dataset).
Take care that the unzipped files are placed directly into the `model/` directory and not into some subdirectory created by the unzip program.
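To avoid the subdirectory pitfall mentioned above, the extraction can be sketched in Python. This is a hypothetical helper (not part of the repository) that flattens any leading directory inside the archive so all files land directly in `model/`:

```python
import os
import zipfile

def unzip_flat(archive_path, target_dir):
    """Extract a zip archive so that every file lands directly in target_dir,
    stripping any leading subdirectory the archive may contain."""
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the file name, dropping any directory components.
            flat_name = os.path.basename(info.filename)
            if not flat_name:
                continue
            with zf.open(info) as src, \
                 open(os.path.join(target_dir, flat_name), 'wb') as dst:
                dst.write(src.read())

# Example: unzip_flat('../model/model.zip', '../model/')
```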
Afterwards, go to the `src/` directory and run `python main.py`.
The input image and the expected output are shown below.

```
> python main.py
Validation character error rate of saved model: 10.624916%
Init with stored values from ../model/snapshot-38
Recognized: "little"
Probability: 0.96625507
```
- `--train`: train the NN, details see below.
- `--validate`: validate the NN, details see below.
- `--beamsearch`: use vanilla beam search decoding (better, but slower) instead of best path decoding.
- `--wordbeamsearch`: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source; for more information see the corresponding section below. It should not be used when training the NN.
- `--dump`: dumps the output of the NN to CSV file(s) saved in the `dump/` folder. Can be used as input for the CTCDecoder.
If neither `--train` nor `--validate` is specified, the NN infers the text from the test image (`data/test.png`).
Two examples: if you want to infer using beam search, execute `python main.py --beamsearch`, while you have to execute `python main.py --train --beamsearch` if you want to train the NN and do the validation using beam search.
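The flags listed above are plain boolean switches, so their handling can be sketched with `argparse`. This is only an illustrative sketch of how such flags might be parsed, not the repository's actual argument-handling code:

```python
import argparse

# Hypothetical sketch of parsing main.py's command-line flags;
# the real implementation in the repository may differ.
parser = argparse.ArgumentParser()
parser.add_argument('--train', action='store_true', help='train the NN')
parser.add_argument('--validate', action='store_true', help='validate the NN')
parser.add_argument('--beamsearch', action='store_true',
                    help='use vanilla beam search decoding')
parser.add_argument('--wordbeamsearch', action='store_true',
                    help='use word beam search decoding')
parser.add_argument('--dump', action='store_true',
                    help='dump NN output to CSV files in dump/')

args = parser.parse_args(['--train', '--beamsearch'])
# With neither --train nor --validate set, the program falls back to
# inference on the test image.
```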
The data loader expects the IAM dataset [5] (or any other dataset that is compatible with it) in the `data/` directory.
Follow these instructions to get the dataset:
- Register for free at this website.
- Download `words/words.tgz`.
- Download `ascii/words.txt`.
- Put `words.txt` into the `data/` directory.
- Create the directory `data/words/`.
- Put the content (directories `a01`, `a02`, ...) of `words.tgz` into `data/words/`.
- Go to `data/` and run `python checkDirs.py` for a rough check if everything is ok.
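The rough check performed in the last step can be sketched as follows. This is a hypothetical stand-in for `checkDirs.py`, assuming only the layout described above; the real script may check more (or different) paths:

```python
import os

def check_dirs(data_dir='.'):
    """Roughly verify the expected IAM dataset layout under data_dir."""
    ok = True
    if not os.path.isfile(os.path.join(data_dir, 'words.txt')):
        print('missing words.txt')
        ok = False
    words_dir = os.path.join(data_dir, 'words')
    if not os.path.isdir(words_dir):
        print('missing words/ directory')
        ok = False
    elif not os.listdir(words_dir):
        print('words/ is empty (expected subdirectories a01, a02, ...)')
        ok = False
    return ok
```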
If you want to train the model from scratch, delete the files contained in the `model/` directory.
Otherwise, the parameters are loaded from the last model snapshot before training begins.
Then, go to the `src/` directory and execute `python main.py --train`.
After each epoch of training, validation is done on a validation set (the dataset is split into 95% of the samples used for training and 5% used for validation, as defined in the class `DataLoader`).
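The 95%/5% split described above can be sketched in a few lines. This is a simplified illustration, not the repository's `DataLoader` code, which may shuffle or partition the samples differently:

```python
def split_samples(samples, train_frac=0.95):
    """Split a sample list into a training part and a validation part."""
    split_idx = int(round(train_frac * len(samples)))
    return samples[:split_idx], samples[split_idx:]

# Hypothetical example with 100 dummy samples:
train_set, val_set = split_samples(list(range(100)))
# len(train_set) == 95, len(val_set) == 5
```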
If you only want to do validation given a trained NN, execute `python main.py --validate`.