Abstract
An Optical Character Recognition (OCR) system translates images of text into digitally editable text files. OCR systems exist for many languages, but developing one for Urdu is challenging because of the script's cursive nature and context sensitivity. This is the main reason that very little work has been done on Urdu OCR systems, and the existing systems are not very efficient at converting text images into editable text files. In this paper, we propose an Urdu OCR (UOCR) system for converting printed Nastalique Urdu script image files into digitally editable text files. It takes a printed Nastalique Urdu image file as input and, after performing binarization, segmentation, feature extraction and classification on the document, produces a digitally editable Urdu text file as output. We use the SIFT and SURF methods to compute descriptors in our feature extraction approach. Our proposed UOCR system is very efficient in conversion.
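To make the first two stages of this pipeline concrete, the following is a minimal sketch of binarization and text-line segmentation in plain Python. The abstract does not name the specific methods used, so Otsu thresholding and a horizontal projection profile are assumptions chosen for illustration, and the function names `otsu_threshold`, `binarize` and `segment_lines` are hypothetical. The sketch also assumes dark text on a light background. Feature extraction with SIFT or SURF and the classification stage are not shown, since they would require an image processing library.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of values in 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the gray level that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w_low += hist[t]
        if w_low == 0:
            continue  # no pixels at or below t yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is already in the low class
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var = w_low * w_high * (mean_low - mean_high) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: Image, threshold: int) -> Image:
    """Mark ink as 1 and background as 0 (assumes dark text on light paper)."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Split the page into text lines using the horizontal projection profile.

    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines
```

Row-wise projection separates only text lines. Because Nastalique is cursive and its characters overlap, splitting a line into ligatures or characters would need a further step, such as connected-component analysis, which this sketch does not attempt.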