The SUT Dataset was first introduced at the 13th International Conference on Computer and Knowledge Engineering (ICCKE 2023).
The authors aim to address the difficulty of obtaining large and diverse ground-truth data for training supervised models in Farsi document image analysis (DIA) tasks, including document image classification, text detection and recognition, and information retrieval.
The dataset comprises 62,453 images categorized into 21 distinct classes, including identity documents with synthetically generated personal information superimposed on various backgrounds.
The dataset also includes label files for the images. The ground-truth data is stored in CSV files that list each image path together with information about the data embedded in that image.
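The exact column layout of the CSV files is not documented here, so the following is only a minimal Python sketch. It assumes each CSV has a header row with an `image_path` column and a `class` column. Both column names, the sample rows, and the helper functions `load_ground_truth` and `count_per_class` are hypothetical. The sketch shows how such a file could be read with the standard `csv` module and summarized per class.

```python
import csv
import io
from collections import Counter


def load_ground_truth(csv_file):
    """Read a ground-truth CSV (with a header row) into a list of dicts, one per image."""
    reader = csv.DictReader(csv_file)
    return list(reader)


def count_per_class(rows, class_column="class"):
    """Count images per class; the column name "class" is a hypothetical assumption."""
    return Counter(row[class_column] for row in rows)


if __name__ == "__main__":
    # Hypothetical sample rows; the real SUT CSV headers and values may differ.
    sample = io.StringIO(
        "image_path,class\n"
        "images/id_card/0001.jpg,id_card\n"
        "images/id_card/0002.jpg,id_card\n"
        "images/passport/0001.jpg,passport\n"
    )
    rows = load_ground_truth(sample)
    for label, count in count_per_class(rows).items():
        print(label, count)
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. The explicit UTF-8 encoding matters because the embedded text is Farsi.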
Applicants seeking access to the SUT dataset are kindly requested to complete the formal application form and send it to the email address eshabaninia@gmail.com.
Upon receipt of your application, we will process it within 48-72 hours and then provide the download links for the dataset.
If you find this data useful, please consider citing our paper:🙏🌹
E. Shabaninia, F. s. Eslami, A. Afkari-Fahandari and H. Nezamabadi-pour, "SUT: a new multi-purpose synthetic dataset for Farsi document image analysis," 2023 13th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, Islamic Republic of, 2023, pp. 253-258, doi: 10.1109/ICCKE60553.2023.10326243.