Abstract
Identifying biomarkers from high-dimensional data is one of the most important emerging topics in genomics and personalized medicine. Gene selection aims to find a parsimonious subset of features that carries the most discriminative information for a specific disease. Variations in real clinical tests strongly affect diagnostic efficiency, which makes producing stable, robust signatures a crucial problem for feature selection algorithms, and one that has recently received considerable attention. In this paper, we propose a novel Meta-Ensemble Feature Selection approach (MEFS) for biomarker discovery. MEFS is based on the concept of the meta-ensemble, a promising new direction in machine learning. The objective is to produce a more parsimonious and robust selection with better classification accuracy. The proposed method differs from conventional ensemble learning techniques and uses Information Gain (IG) to evaluate the relevance of genes, since IG is simple, fast, and well suited to use within an ensemble. We demonstrate the efficiency and effectiveness of our method through comparisons with single and ensemble versions, as well as with other ensemble feature selection techniques. The results show that MEFS substantially increases the robustness of biomarker discovery while improving classification accuracy.
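To make the role of Information Gain concrete, the following is a minimal sketch of IG-based gene ranking aggregated over bootstrap resamples. It is not the MEFS algorithm itself, whose details are not given in the abstract. It illustrates a plain ensemble of IG rankings under stated assumptions: expression values are already discretized, IG is defined as IG(Y; X) = H(Y) - H(Y | X), and the helper names (`entropy`, `information_gain`, `ensemble_ig_ranking`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy within each feature value, weighted by its frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def ensemble_ig_ranking(X, y, n_bootstraps=50, seed=0):
    """Rank features by mean IG over bootstrap resamples (hypothetical helper)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals = [0.0] * p
    for _ in range(n_bootstraps):
        # Resample the patients with replacement, then score every gene.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        for j in range(p):
            totals[j] += information_gain([X[i][j] for i in idx], ys)
    means = [t / n_bootstraps for t in totals]
    ranking = sorted(range(p), key=lambda j: means[j], reverse=True)
    return ranking, means


# Toy data (assumed): 6 samples, 3 discretized genes.
# Gene 0 mirrors the class label, gene 1 is weakly related, gene 2 is constant.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1]]
y = [0, 0, 1, 1, 0, 1]
ranking, means = ensemble_ig_ranking(X, y)
print("genes ranked by mean IG:", ranking)
```

Averaging scores over resamples is one simple way to reduce the sensitivity of a ranking to small changes in the training data, which is the kind of instability the abstract highlights. Other aggregation rules, such as rank averaging or selection frequency, could be substituted.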