Abstract
Pronunciation variability is by far the most critical issue for Arabic Automatic Speech Recognition (AASR). The problem is further complicated when AASR must handle both native and non-native accents. In this paper, we address the problem of non-native speech in a speaker-independent, large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We analyze major differences in phonetic confusion in order to determine which phonemes play a significant role in recognition performance for native and non-native speakers. The West Point Modern Standard Arabic database from the Linguistic Data Consortium (LDC) and the Hidden Markov Model Toolkit (HTK) are used in this research effort. We analyzed the performance of AASR at the phone and word levels and found that introducing the language model masks the pronunciation problems of non-native speakers.
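Phone-level confusion analysis of this kind generally rests on aligning each recognized phone sequence against its reference transcription and counting substitutions, deletions, and insertions. The sketch below illustrates that idea with a plain Levenshtein alignment. It is not the authors' HTK pipeline. The function names `align` and `confusions` and the toy phone labels are hypothetical, chosen only for illustration, and no results from this work are implied.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two phone sequences (hypothetical helper).

    Returns (ref_phone, hyp_phone) pairs; None on the hyp side marks a
    deletion, None on the ref side marks an insertion.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edit cost to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)

    # Backtrace from the bottom-right cell to recover one optimal alignment
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def confusions(utterances):
    """Accumulate (ref, hyp) confusion counts and the phone error rate.

    PER = (substitutions + deletions + insertions) / number of reference phones.
    """
    counts = Counter()
    errors = 0
    total = 0
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            counts[(r, h)] += 1
            if r != h:
                errors += 1
        total += len(ref)
    return counts, errors / total


# Toy, made-up phone sequences (not data from the corpus)
counts, per = confusions([
    (["q", "a", "l", "a"], ["k", "a", "l", "a"]),  # one substitution q -> k
    (["a", "b", "c"], ["a", "c"]),                 # one deletion of b
])
```

Off-diagonal entries of `counts` (pairs with `r != h`) point to the phonemes that are most often confused. Comparing these counts separately for native and non-native speaker groups is one way to see which phonemes drive the difference in recognition performance.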