Abstract
Objective assessment of voice pathology is attracting growing interest. Automatic speech/speaker recognition (ASR) systems are commonly deployed in voice pathology detection. The aim of this work was to develop a novel feature extraction method for ASR that detects voice pathology by incorporating the distributions of voiced and unvoiced parts, together with voice onset and offset characteristics, in the time-frequency domain.
Speech samples from 70 dysphonic patients with six different types of voice disorders and from 50 normal subjects were analyzed. Spoken Arabic digits (1–10) were used as input. The proposed feature extraction method was embedded into an ASR system with a Gaussian mixture model (GMM) classifier to detect voice disorders.
An accuracy of 97.48% was obtained in the text-independent case (training on all digits), and an accuracy of over 99% was obtained in the text-dependent case (training on each digit separately). The proposed method outperformed conventional Mel-frequency cepstral coefficient (MFCC) features.
The results of this study show that incorporating voice onset and offset information leads to efficient automatic detection of voice disorders.