QTID: Quran Text Image Dataset

Mahmoud Badry; Hesham Hassan; Hanaa Bayomi; Hussien Oakasha

doi:10.14569/IJACSA.2018.090351

Journal article

Open access

International journal of advanced computer science & applications, Vol.9(3), pp.385-391

2018

DOI: https://doi.org/10.14569/IJACSA.2018.090351

Subjects: Computer Science; Computer Science, Theory & Methods; Science & Technology; Technology

Abstract

Improving the accuracy of Arabic text recognition in imagery requires a large, modern dataset, since data is the fuel for many modern machine learning models. This paper proposes a new dataset called QTID (Quran Text Image Dataset), the first Arabic dataset that includes Arabic marks. It consists of 309,720 different 192x64 annotated Arabic word images, taken from the Holy Quran, that contain 2,494,428 characters in total. These finely annotated images were randomly divided into training, validation, and test sets of 90%, 5%, and 5%, respectively. To analyze QTID, several dataset statistics are presented. Experimental evaluation shows that the current best Arabic text recognition engines, such as Tesseract and ABBYY FineReader, do not perform well on word images from the proposed dataset.
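
The random 90%/5%/5% division described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `split_indices`, the fixed seed, and the use of integer image indices in place of real image files are all assumptions made for illustration.

```python
import random


def split_indices(n_items, percentages=(90, 5, 5), seed=0):
    """Shuffle the indices 0..n_items-1 and cut them into consecutive parts.

    Hypothetical helper: `percentages` must sum to 100, and the seed is an
    arbitrary choice made only so the shuffle is reproducible.
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    parts = []
    start = 0
    cumulative = 0
    for pct in percentages:
        cumulative += pct
        # Integer arithmetic avoids floating-point rounding at the cut points.
        end = n_items * cumulative // 100
        parts.append(indices[start:end])
        start = end
    return parts


# Total number of word images reported in the abstract.
TOTAL_IMAGES = 309_720

train, val, test = split_indices(TOTAL_IMAGES)
print(len(train), len(val), len(test))
```

Under these assumptions, the three parts would hold 278,748, 15,486, and 15,486 images, which together account for all 309,720 images. The exact set sizes used by the authors are not stated in this record.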

Files and links (1)

url

https://doi.org/10.14569/IJACSA.2018.090351

Published (Version of record) Open

Details

Title: QTID: Quran Text Image Dataset
Creators - without role: Mahmoud Badry - Fayoum University
Hesham Hassan - Cairo University
Hanaa Bayomi - Cairo University
Hussien Oakasha - Fayoum University
Publication Details: International journal of advanced computer science & applications, Vol.9(3), pp.385-391
Publisher: Science & Information Sai Organization Ltd
Number of pages: 7
Identifiers: 9923280808331
Academic Unit: King Khalid University
Language: English
Resource Type: Journal article