AVAS: Speech database for multimodal recognition applications

Samar Antar; Alaa Sagheer; Saleh Aly; Mohamed F Tolba

Conference proceeding

The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.123

01/12/2013

Abstract

Conference Title: 2013 13th International Conference on Hybrid Intelligent Systems (HIS)
Conference Start Date: 2013, Dec. 4
Conference End Date: 2013, Dec. 6
Conference Location: Gammarth, Tunisia

Audio-visual speech recognition (AVSR) systems represent an important branch of the human-computer interaction (HCI) domain, because they offer the simplest way to interact with a computer. However, visual variations in a video sequence can significantly degrade the recognition performance of AVSR systems. Although several corpora have been created in this area, most of them do not include realistic visual variations in the video sequence. This paper presents AVAS, the first audio-visual speech recognition corpus in the Arabic language. All AVAS samples contain two of the most important visual variations, illumination variations and head pose variations, in the same video recording. Hence, AVAS is useful for developing robust AVSR systems, automatic speech recognition ("audio-only") systems, lip-reading ("visual-only") systems, and face recognition across pose and illumination variations.

Details

Title: AVAS: Speech database for multimodal recognition applications
Creators: Samar Antar; Alaa Sagheer; Saleh Aly; Mohamed F Tolba
Publication Details: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.123
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Identifiers: 9918021008331
Academic Unit: Majmaah University; King Faisal University
Language: English
Resource Type: Conference proceeding