Abstract
Real-word (also known as semantic or context-sensitive) spelling errors are a class of errors that escape typical spell checkers, which rely on dictionary look-up. This kind of error occurs when a user mistakenly types a correctly spelled word when another is intended, e.g., "I want a peace (piece) of cake." These errors are also common in text written by people with dyslexia. Real-word errors are harder to detect because the surrounding context must be considered. In this paper, we propose a spell checker that detects and corrects real-word errors in Arabic. Our method avoids predefined confusion sets, a simple approach used by many works on this problem that limits the list of words that can be detected and corrected. As a result, our system can detect and correct a larger set of real-word errors. For the detection phase, we employ word and stem n-gram (n = 1-3) language models together with machine learning, achieving a precision of 83.5% and a recall of 99.2%. For the correction phase, we use an n-gram model, which yields an accuracy of 98%. Our scheme is robust, performing well even when the percentage of real-word errors is high. This makes the system suitable for post-OCR error correction of Arabic text.
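As a rough illustration of n-gram-based detection and correction without a confusion set, the following Python sketch scores each word by the add-one-smoothed trigram log-probability of its context. It replaces the word with a vocabulary word within a small edit distance when that word scores clearly higher. This is a simplified stand-in, not the authors' system: it uses only word trigrams (no stem n-grams and no machine-learning classifier). The edit-distance candidate generation, the function names (`train_counts`, `context_score`, `correct_sentence`), the `margin` threshold, and the English toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log

def train_counts(sentences):
    """Count unigrams, bigrams and trigrams over tokenized, padded sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        uni.update(padded)
        bi.update(zip(padded, padded[1:]))
        tri.update(zip(padded, padded[1:], padded[2:]))
    return uni, bi, tri

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def trigram_logprob(w1, w2, w3, model):
    """Add-one smoothed log P(w3 | w1, w2); V = number of unigram types."""
    uni, bi, tri = model
    return log((tri[(w1, w2, w3)] + 1) / (bi[(w1, w2)] + len(uni)))

def context_score(tokens, i, model):
    """Sum the log-probabilities of every trigram that contains position i."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    j = i + 2  # index of tokens[i] inside padded
    return sum(
        trigram_logprob(padded[end - 2], padded[end - 1], padded[end], model)
        for end in range(j, min(j + 3, len(padded)))
    )

def correct_sentence(tokens, model, max_dist=2, margin=2.0):
    """Flag a word as a real-word error if some similarly spelled vocabulary
    word fits the context better by more than `margin` (in log space)."""
    uni, _, _ = model
    vocab = [w for w in uni if w not in ("<s>", "</s>")]
    out = list(tokens)
    for i in range(len(out)):
        base = context_score(out, i, model)
        best_word, best_score = out[i], base
        for cand in vocab:
            # Candidates: any in-vocabulary word within max_dist edits (hypothetical rule).
            if cand == out[i] or levenshtein(cand, out[i]) > max_dist:
                continue
            score = context_score(out[:i] + [cand] + out[i + 1:], i, model)
            if score > best_score:
                best_word, best_score = cand, score
        if best_score - base > margin:
            out[i] = best_word
    return out

corpus = [
    "i want a piece of cake".split(),
    "a piece of cake please".split(),
    "they want peace and quiet".split(),
    "we want peace now".split(),
]
model = train_counts(corpus)
print(correct_sentence("i want a peace of cake".split(), model))
# prints ['i', 'want', 'a', 'piece', 'of', 'cake']
```

Because candidates come from the whole vocabulary rather than a fixed confusion set, any in-vocabulary word that is close in spelling can be proposed. The `margin` value trades precision against recall, and a realistic Arabic system would need far larger corpora and stronger smoothing than this toy setup.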