Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system

Hassanin M. Al-Barhamtoshy; Kamal M. Jambi; Hany Ahmed; Shaimaa Mohamed; Sherif M. Abdo; Mohsen A. Rashwan

doi:10.21786/bbrc/12.2/11

Journal article

BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, Vol.12(2), pp.283-296

25/06/2019

DOI: https://doi.org/10.21786/bbrc/12.2/11

Abstract

Biotechnology & Applied Microbiology

Life Sciences & Biomedicine

Science & Technology

This paper describes an Omni OCR system for recognizing typewritten and handwritten Arabic text documents. The proposed Arabic OCR system is organized into four main phases. The first phase is pre-processing; it covers binarization, skew correction, framing, and noise removal for the prepared documents (dataset). The second phase segments the preprocessed documents into lines and words. It involves two main tasks: building the language model with the Arabic dictionary in use, and detecting the segmented lines and words. The third phase is feature extraction; it extracts features for each segmented line or word according to the language model. Finally, the classifier (recognizer) converts each word or line into a text stream. The four phases are then evaluated to measure the accuracy of the Arabic OCR system. The recognition approach is based on Hidden Markov Models (HMM); the prepared datasets and the software development tool are also introduced and discussed. State-of-the-art OCR systems can currently achieve an accuracy of 70% on unconstrained Arabic text. However, this level is still far from what many useful applications require. The paper therefore proposes an approach based on a language model that handles ligatures and overlapping characters. A posterior word-based approach is used with a tri-gram model to recognize the Arabic text. Features are extracted from images of words and from patterns generated using the proposed solution. We test the proposed OCR system on different categories of Arabic documents: early printed or typewritten, printed, historical, and calligraphy documents. On our test bed, the system gives a 12.5% character error rate compared to the best OCR of other systems.
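The abstract reports its result as a character error rate (CER) but does not say how it was computed. A minimal sketch follows, assuming the common definition: the Levenshtein edit distance between the recognized text and the reference transcription, divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of eight in the reference.
    print(character_error_rate("abcdefgh", "abcdefgx"))
    # The same function works on Arabic strings, compared code point by code point.
    print(character_error_rate("كتاب", "كتب"))
```

Under this definition, a 12.5% CER corresponds on average to one character error per eight reference characters. Whether diacritics and ligatures are normalized before comparison changes the figure, so any such normalization would need to be stated alongside the result.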

Details

Title: Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system
Creators - without role: Hassanin M. Al-Barhamtoshy - King Abdulaziz University
Kamal M. Jambi - King Abdulaziz University
Hany Ahmed - Cairo University
Shaimaa Mohamed - Cairo University
Sherif M. Abdo - Cairo University
Mohsen A. Rashwan - Cairo University
Publication Details: BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, Vol.12(2), pp.283-296
Publisher: Soc Science & Nature
Number of pages: 14
Grant note: 11-INF-1997-03 / NSTIP strategic technologies program in the Kingdom of Saudi Arabia
Identifiers: 9935865008331
Academic Unit: King Abdulaziz University
Language: English
Resource Type: Journal article