Abstract
Predicting peptides that can bind to MHC class I molecules is an important step in the vaccine design process. Computational approaches have the potential to provide good predictive models that save both time and cost in this process. The Position Specific Scoring Matrix (PSSM) is a reliable approach when dealing with amino acid sequences. PSSM formation involves carefully selecting the data and parameters used to construct it. In this work, we apply three different data splitting strategies and propose alternative values for the embedded PSSM parameters. The basic principle of data splitting is to choose training data that can represent the whole data set. We propose using the Kennard-Stone algorithm to highlight the importance of choosing the data that constitute the PSSM. Furthermore, this work proposes modifications to the PSSM parameters and studies the model behavior in response to each change. The model is applied to experimental data for the Major Histocompatibility Complex of class I. The modified parameters perform comparably to or better than the conventional parameters. Moreover, the Kennard-Stone data splitting algorithm contributed to a significant enhancement in model performance. © 2016 Nałęcz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences. Published by Elsevier Sp. z o.o. All rights reserved.
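The Kennard-Stone selection named in the abstract can be illustrated with a minimal sketch. The abstract does not state how peptides are encoded or which distance measure is used, so the code below assumes that each sample is already a numeric feature vector and that Euclidean distance is appropriate; the function name `kennard_stone` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def kennard_stone(points, n_train):
    """Return indices of n_train points chosen by the Kennard-Stone rule.

    Assumption: `points` is a sequence of equal-length numeric vectors.
    """
    if not 2 <= n_train <= len(points):
        raise ValueError("n_train must be between 2 and len(points)")

    # Step 1: start with the two points that are farthest apart.
    first, second = max(
        combinations(range(len(points)), 2),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(len(points)) if i not in selected]

    # Distance from each remaining point to its nearest selected point.
    nearest = {
        i: min(dist(points[i], points[s]) for s in selected) for i in remaining
    }

    # Step 2: repeatedly add the point farthest from the selected set.
    while len(selected) < n_train:
        chosen = max(remaining, key=lambda i: nearest[i])
        selected.append(chosen)
        remaining.remove(chosen)
        del nearest[chosen]
        for i in remaining:
            nearest[i] = min(nearest[i], dist(points[i], points[chosen]))

    return selected


# Toy one-dimensional example (hypothetical data, embedded in 2-D).
samples = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
train_idx = kennard_stone(samples, 3)
test_idx = [i for i in range(len(samples)) if i not in train_idx]
```

In this sketch the selected indices form the training set and the leftover indices form the test set, which reflects the stated principle of choosing training data that spans the whole data set. The initial pairwise search is quadratic in the number of samples, which is acceptable for an illustration but may need a vectorized implementation for larger data sets.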