Abstract
Recognizing human emotions is an indispensable requirement for efficient human-machine interaction. Besides facial expressions, speech poses one of the latest challenges in automatic emotion recognition. Current automatic speaker recognition systems are based partly or entirely on Gaussian mixture models (GMMs). In this research, we study and evaluate the combination of the GMM approach with different generative models (K-nearest neighbors, Naive Bayes, multilayer perceptron) and discriminative models (support vector machines, decision trees) in the setting of a robust emotion recognition system.
We illustrate this framework using Mel-frequency cepstral coefficients (MFCCs) as features and the Sequential Forward Selection method applied to GMM supervectors.