Abstract
This study investigates the potential use of speech rhythm metrics as a new feature for speech emotion recognition, gender identification, and regional accent identification. It also evaluates a new Arabic speech emotion corpus, the King Saud University Emotions (KSUEmotions) speech corpus, which contains five emotions: neutral, sadness, happiness, surprise, and anger. Speech acoustic features are extracted and used to classify the speakers' emotions. All classification results were obtained using multilayer perceptron (MLP) neural network and support vector machine (SVM) classifiers. The results demonstrate that rhythm metrics alone are not sufficient for speech emotion classification; nevertheless, they can improve classifier accuracy when combined with other speech acoustic features. The results also show that the average classification accuracy is 54.07% for KSUEmotions Phase 1 and 84.14% for Phase 2, and that sadness is the most accurately classified emotion.