CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition

Sarah Al-Shareef; Thomas Hain; International Speech Communication Association

Conference proceeding

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, pp.1822-1825

01/01/2012

Abstract

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Interdisciplinary Applications

Engineering

Engineering, Electrical & Electronic

Science & Technology

Technology

Most of the available resources of colloquial Arabic speech are transcribed without diacritics. These diacritics provide short vowels and other pronunciation information, and omitting them introduces a considerable amount of ambiguity. In this paper, we propose the use of an automatic diacritisation method as a front-end for training automatic speech recognition systems for colloquial Arabic. The system is based on conditional random fields trained on speaker and contextual information. This method outperforms other reported methods for diacritising colloquial Arabic by 13.2% relative. Empirical experiments show that applying this method to acoustic model training transcriptions improves recognition performance on Levantine colloquial Arabic by 1.8% relative.
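As a rough illustration of what "conditional random fields trained on speaker and contextual information" could look like in practice, the sketch below frames diacritisation as per-character sequence labelling and builds one feature dictionary per character. The feature names, the context window size, the speaker identifier field, and the Buckwalter-style example word "ktb" are all assumptions made for illustration; the abstract does not specify the actual feature set or labelling unit used by the authors.

```python
# Hypothetical sketch: feature extraction for CRF-based diacritisation.
# All feature names, the window size, and the speaker field are assumptions,
# not the configuration reported in the paper.

def char_features(word, i, speaker_id, window=2):
    """Build a feature dict for the character at position i of an undiacritised word."""
    feats = {
        "char": word[i],        # the character to be labelled with a diacritic
        "pos": i,               # position of the character within the word
        "speaker": speaker_id,  # speaker information (assumed to be a plain ID)
    }
    # Contextual information: neighbouring characters within the window,
    # padded with boundary symbols at the word edges.
    for offset in range(1, window + 1):
        left, right = i - offset, i + offset
        feats[f"char-{offset}"] = word[left] if left >= 0 else "<s>"
        feats[f"char+{offset}"] = word[right] if right < len(word) else "</s>"
    return feats


def word_to_sequence(word, speaker_id):
    """Turn one undiacritised word into the feature sequence a CRF would label."""
    return [char_features(word, i, speaker_id) for i in range(len(word))]


if __name__ == "__main__":
    for feats in word_to_sequence("ktb", "spk1"):
        print(feats)
```

A sequence of such dictionaries, paired with one diacritic label per character, is the kind of input a linear-chain CRF toolkit typically expects; the trained model would then predict the missing diacritics for transcriptions used in acoustic model training.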

Details

Title: CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition
Creators - without role: Sarah Al-Shareef - University of Sheffield
Thomas Hain - University of Sheffield
International Speech Communication Association
Publication Details: 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, pp.1822-1825
Publisher: Isca-Int Speech Communication Assoc
Number of pages: 4
Identifiers: 9931156708331
Academic Unit: Umm Al Qura University
Language: English
Resource Type: Conference proceeding