Subcellular localization using fluorescence imagery: Utilizing ensemble classification with diverse feature extraction strategies and data balancing

Muhammad Tahir; Asifullah Khan; Abdul Majid; Alessandra Lumini

doi:10.1016/j.asoc.2013.06.027

Journal article

Peer reviewed

Applied soft computing, Vol.13(11), pp.4231-4243

1 November 2013

Keywords: Data balancing; Ensemble classification; Random forest; Rotation forest; SMOTE; Subcellular localization

Highlights

• Random Forest and Rotation Forest classifiers are used for subcellular localization.
• Various feature extraction strategies are utilized.
• SMOTE is employed as a data balancing technique.
• SMOTE has improved prediction performance in classifying protein images.
• A web server is available online at http://111.68.99.218/RF-SubLoc.

Abstract

Protein subcellular localization plays a vital role in understanding proteins’ behavior under different circumstances. The effectiveness of various drugs can be assessed by the successful prediction of protein locations. Therefore, it is important to develop a prediction system that is sufficiently reliable and accurate in making decisions regarding protein localization. However, the main problem in developing a reliable, high-throughput prediction system is the presence of imbalanced data, which greatly affects the performance of a prediction system. To remedy this problem, we applied oversampling through the Synthetic Minority Oversampling Technique (SMOTE). Further, different feature extraction strategies and ensemble classification techniques are assessed for their contribution toward solving the challenging problem of subcellular localization. After applying the SMOTE data balancing technique, a remarkable improvement is observed in the performance of random forest and rotation forest ensemble classifiers for the CHOM, CHOA and VeroA datasets. It is anticipated that our proposed model might be helpful for the research community in the field of functional and structural proteomics as well as in drug discovery.
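
As a rough illustration of the oversampling idea named in the abstract, the sketch below generates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbors, which is the core step of SMOTE. It is not the authors' implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy feature vectors are assumptions made for this example, and the record does not describe the settings used in the paper.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class feature vectors.

    Hypothetical helper for illustration; not code from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j]))
        )
        # Pick one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # Place the new point at a random position between x and the neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Toy, made-up feature vectors (not data from the paper).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6, k=2)
```

In a pipeline like the one the abstract outlines, the synthetic points would be appended to the minority class of the training set only, before fitting an ensemble classifier such as a random forest; the test set would be left unbalanced so that reported performance reflects the original class distribution.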

Details

Title: Subcellular localization using fluorescence imagery: Utilizing ensemble classification with diverse feature extraction strategies and data balancing
Creators:
  Muhammad Tahir - Pakistan Institute of Engineering and Applied Sciences
  Asifullah Khan - Pakistan Institute of Engineering and Applied Sciences
  Abdul Majid - Pakistan Institute of Engineering and Applied Sciences
  Alessandra Lumini - University of Bologna
Publication Details: Applied soft computing, Vol.13(11), pp.4231-4243
Publisher: Elsevier B.V.
Identifiers: 9910073708331
Academic Unit: Saudi Electronic University
Language: English
Resource Type: Journal article