Conference Title: 2015 International Conference on Computer, Communications, and Control Technology (I4CT)
Conference Start Date: 2015, April 21
Conference End Date: 2015, April 23
Conference Location: Kuching, Sarawak, Malaysia

Abstract
Cameras in handheld devices such as mobile phones have become the fastest and easiest way to capture document images. However, document images captured with handheld cameras have rarely been collected and studied. Digitizing text from such images is challenging because they are prone to non-uniform illumination, skew, and shadows. This paper has two objectives: first, to provide a benchmark dataset of document images captured with modern handheld devices; and second, to evaluate several binarization methods (Niblack, Sauvola, Wolf, Nick, and Bataineh) on this dataset using a set of meaningful evaluation measures. The results show that the Nick and Bataineh methods performed best on the English Printed Document Images (EPDI) test, whereas the Nick and Sauvola methods outperformed the other methods on the Arabic Printed Document Images (APDI) test, which covers two decoration formats. On APDI documents without Harakat (Arabic diacritical marks), the Nick method outperformed the other methods; on documents with Harakat, the Sauvola method did.
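Niblack, Sauvola, and Nick are local thresholding methods: each pixel is compared against a threshold computed from the mean m and standard deviation s of a window around it. The sketch below uses the commonly cited forms T = m + k*s (Niblack), T = m*(1 + k*(s/R - 1)) (Sauvola), and T = m + k*sqrt(s^2 + m^2) (Nick). The window radius `r` and the values of `k` and `R` are typical defaults chosen as assumptions, not the parameters used in this paper, and the helper names are hypothetical. Wolf and Bataineh are omitted for brevity.

```python
import math

def local_stats(img, r):
    """Mean and standard deviation of each pixel's (2r+1)x(2r+1) window,
    with the window clipped at the image border."""
    h, w = len(img), len(img[0])
    means = [[0.0] * w for _ in range(h)]
    stds = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            var = sum((v - m) ** 2 for v in vals) / len(vals)
            means[y][x] = m
            stds[y][x] = math.sqrt(var)
    return means, stds

def niblack(m, s, k=-0.2):
    # Niblack: shift the local mean by a multiple of the local deviation.
    return m + k * s

def sauvola(m, s, k=0.5, R=128.0):
    # Sauvola: R is the assumed dynamic range of the standard deviation.
    return m * (1 + k * (s / R - 1))

def nick(m, s, k=-0.1):
    # Nick: s*s + m*m equals the mean of the squared pixel values in the window.
    return m + k * math.sqrt(s * s + m * m)

def binarize(img, rule, r=7, **params):
    """Map dark (text) pixels to 0 and the rest to 255 using a local rule.
    A strict '<' keeps perfectly uniform regions white."""
    means, stds = local_stats(img, r)
    return [[0 if img[y][x] < rule(means[y][x], stds[y][x], **params) else 255
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

For example, on a 5x5 gray image of value 200 with a single dark pixel of value 20 at the center, `binarize(img, sauvola, r=1)` marks only the center pixel as text; the same holds for `niblack` and `nick` with their default `k`. Pure Python keeps the sketch dependency-free; a practical implementation would compute the window statistics with integral images.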