SUT Dataset: A New Multi-purpose Synthetic Dataset for Farsi Document Image Analysis🕊️

1. Introduction

The SUT Dataset was first introduced at the 13th International Conference on Computer and Knowledge Engineering (ICCKE 2023).

The authors aim to tackle challenges associated with obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks, including document image classification, text detection and recognition, and information retrieval.

2. Dataset Description📝

The dataset comprises 62,453 images that have been categorized into 21 distinct classes, including identity documents featuring synthetically generated personal information superimposed on various backgrounds.

The dataset also includes corresponding files with labeling information for the images. The ground-truth data is organized in CSV files containing image paths and associated information about the embedded data.
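As a rough illustration of how such ground-truth files might be consumed, the sketch below reads a CSV into row dictionaries and counts rows per value of a chosen column. The column names `image_path` and `class_name` and the sample rows are hypothetical placeholders, not the actual SUT schema; check the header of the CSV files you receive and adjust accordingly.

```python
import csv
import io
from collections import Counter


def load_ground_truth(csv_file):
    """Read a ground-truth CSV (an open text file) into a list of dicts keyed by header name."""
    return list(csv.DictReader(csv_file))


def count_by_column(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


# Hypothetical sample data: these column names and values are assumptions
# for illustration only, not the real SUT ground-truth layout.
sample = io.StringIO(
    "image_path,class_name\n"
    "images/0001.jpg,class_a\n"
    "images/0002.jpg,class_a\n"
    "images/0003.jpg,class_b\n"
)

rows = load_ground_truth(sample)
for label, count in count_by_column(rows, "class_name").items():
    print(label, count)
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and may need to be changed to match the actual files.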

Dataset Statistics: The distribution of images across the dataset is shown below.

3. How to Access🤔💥

Applicants seeking access to the SUT dataset are kindly requested to complete the formal application form and email it to eshabaninia@gmail.com.

Upon receipt of your application, we will process it within 48 to 72 hours and then provide the download links for the dataset.

4. Citation

If you find this data useful, please consider citing our paper:🙏🌹

E. Shabaninia, F. S. Eslami, A. Afkari-Fahandari, and H. Nezamabadi-pour, "SUT: a new multi-purpose synthetic dataset for Farsi document image analysis," 2023 13th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 2023, pp. 253-258, doi: 10.1109/ICCKE60553.2023.10326243.
