Abstract
In machine learning, AdaBoost with Support Vector Machine (SVM) based component classifiers has been shown to be a successful classification method on balanced datasets, in which all classes have relatively similar distributions. However, the success of this method is limited when it is applied to imbalanced datasets. In many real applications, classifying data with imbalanced class proportions is problematic because the algorithm can become biased and may assign all samples to the majority classes. Many studies have attempted to overcome the imbalanced data problem by using hybrid algorithms. In this paper, we propose an improved AdaBoost algorithm with an SVM-based weak learner that uses Gaussian Mixture Model (GMM) supervectors, called GSV-ADSVM. The GMM supervectors are constructed by applying maximum a posteriori (MAP) adaptation to the means of the mixture components, based on speech from a target phoneme of the TIMIT corpus. These supervectors are then used as the input datasets for the hybrid AdaBoost-SVM. The main goal of this paper is to investigate the impact of using GMM supervectors with the boosted SVM in a multi-class phoneme recognition problem, with the aim of improving the classification of imbalanced data, since certain classes of interest are very small.
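To make the supervector construction mentioned above concrete, the following is a minimal sketch of mean-only MAP adaptation in its commonly used relevance-factor form. It is an illustration under stated assumptions, not the implementation used in this work: the function name `map_adapt_supervector`, the diagonal-covariance background model, and the relevance factor of 16 are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt the means of a diagonal-covariance GMM to `frames` and stack them.

    frames:    (T, D) feature vectors for one phoneme segment
    weights:   (K,)   mixture weights of the background GMM
    means:     (K, D) mixture means of the background GMM
    variances: (K, D) diagonal covariances of the background GMM
    relevance: assumed relevance factor (hypothetical default, not from the paper)
    """
    # Log-density of every frame under every component -> shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Posterior responsibility of each component for each frame
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics
    n_k = post.sum(axis=0)                            # (K,)
    e_k = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between the data mean and the prior mean per component
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted_means = alpha * e_k + (1.0 - alpha) * means

    # The supervector is the concatenation of the K adapted means
    return adapted_means.reshape(-1)
```

Each phoneme segment then yields one fixed-length vector of dimension K x D, which is the kind of input a boosted SVM classifier can consume regardless of the segment's duration.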