Abstract
Text in natural scene images carries rich semantic information that plays an important role in content analysis. However, unlike Arabic text in documents, text in natural scene images exhibits much higher diversity and variability, especially in uncontrolled circumstances. In this paper, a hybrid feature extraction approach is presented to detect extremal regions of Arabic scene text. The binary image and the image mask are treated as variants of the input image, and concurrent extremal regions are sought in both. Once the conjoined extremal points are determined, a scale-invariant technique is applied to retain only those invariant points that are common to both images, based on their coordinate positions. To evaluate performance, a multidimensional long short-term memory (LSTM) network is adapted, which obtains 94.21% word recognition accuracy on the unconstrained Arabic scene text recognition (ASTR) dataset.
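The step of keeping only those points common to both images can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `common_points`, the pixel tolerance `tol`, and the sample coordinates are all hypothetical, and the point lists are assumed to have already been produced by some extremal-region or keypoint detector run on the binary image and on the image mask.

```python
from math import hypot


def common_points(points_a, points_b, tol=1.0):
    """Keep the points of points_a that have a counterpart in points_b.

    Two points are treated as the same location when their Euclidean
    distance is at most tol pixels (tol is an assumed parameter).
    """
    kept = []
    for xa, ya in points_a:
        # A point survives if any point of the second image lies within tol.
        if any(hypot(xa - xb, ya - yb) <= tol for xb, yb in points_b):
            kept.append((xa, ya))
    return kept


# Hypothetical (x, y) coordinates detected in each variant of the input image.
binary_pts = [(10, 12), (40, 41), (75, 20)]
mask_pts = [(10, 13), (90, 90), (75, 20)]

shared = common_points(binary_pts, mask_pts, tol=1.0)
print(shared)
```

With a tolerance of one pixel, the sketch keeps the two points that appear in both lists and discards the one found only in the binary image. Setting `tol=0` would require exact coordinate agreement.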