Cross-validation and cross-study validation of kidney cancer with machine learning and whole exome sequences from the National Cancer Institute

Abdulrhman Aljouie; Nihir Patel; Usman Roshan; IEEE

Conference proceeding

2018 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), pp.61-66

01/01/2018

Subjects: Life Sciences & Biomedicine; Mathematical & Computational Biology; Science & Technology

Abstract

Accurate cancer risk prediction from genetic and environmental variables is a key problem in medicine. One approach is to use somatic mutations, which could potentially be used for early detection and prevention. SNP-based studies are the most common ones utilizing this approach; however, most lack a cross-study validation component across at least two independent studies. Here we explore cross-validation and cross-study validation for predicting kidney cancer cases and controls with SNPs obtained from whole exome sequences at the National Cancer Institute. From the Genomic Data Commons (GDC) portal we obtained aligned whole exome sequences from two different kidney cancer studies: 110 cases and controls from KIRP (renal papillary cell carcinoma) and 34 cases and controls from KICH (kidney chromophobe cell carcinoma). We performed a rigorous quality control procedure to obtain SNPs and ranked them with feature selection. On the top-ranked SNPs, the support vector machine achieves a cross-validation accuracy of 71% (with 10 SNPs) in KIRP and 72% (with 20 SNPs) in KICH. We then train a model on KIRP and, with 10 SNPs, achieve an accuracy of 66% on the KICH samples. Our work shows that we can predict kidney chromophobe carcinoma from a kidney papillary carcinoma dataset better than random classification, which would have 50% accuracy. In ongoing work we are expanding these sample sizes and extending cross-study validation to other kidney cancer datasets in the NCI GDC portal.
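The validation protocol summarized in the abstract (rank SNPs, train a support vector machine on the top-ranked ones, then measure accuracy by cross-validation within one study and by training on one study and testing on another) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name its ranking method, so a chi-squared score over the 3x2 genotype-by-class table is used as a stand-in; the linear SVM is a Pegasos-style subgradient solver written for this sketch; genotypes are assumed to be coded 0/1/2 over a common, aligned SNP set in both studies; and `simulate_study` produces hypothetical synthetic data, not KIRP or KICH samples.

```python
import random

def chi2_score(genotypes, labels):
    """Chi-squared statistic of the 3x2 table genotype (0/1/2) by class (0/1)."""
    counts = [[0, 0] for _ in range(3)]
    for g, c in zip(genotypes, labels):
        counts[g][c] += 1
    n = len(labels)
    row_tot = [sum(r) for r in counts]
    col_tot = [counts[0][c] + counts[1][c] + counts[2][c] for c in range(2)]
    stat = 0.0
    for g in range(3):
        for c in range(2):
            expected = row_tot[g] * col_tot[c] / n
            if expected > 0:  # skip genotypes that never occur
                stat += (counts[g][c] - expected) ** 2 / expected
    return stat

def rank_snps(X, y, k):
    """Column indices of the k highest-scoring SNPs, computed from (X, y) only."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    # Labels 0/1 become -1/+1; a constant 1.0 feature plays the role of the bias.
    data = [(list(row) + [1.0], 1.0 if c == 1 else -1.0) for row, c in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, s in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = s * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer (shrink) step
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w

def predict(w, X):
    return [1 if sum(wi * xi for wi, xi in zip(w, list(row) + [1.0])) > 0 else 0
            for row in X]

def accuracy(pred, y):
    return sum(p == c for p, c in zip(pred, y)) / len(y)

def select(X, cols):
    return [[row[j] for j in cols] for row in X]

def cross_validate(X, y, k_snps=10, folds=5, seed=0):
    """Mean k-fold accuracy; SNP ranking is redone inside every training fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        top = rank_snps(X_tr, y_tr, k_snps)
        w = train_linear_svm(select(X_tr, top), y_tr)
        accs.append(accuracy(predict(w, select(X_te, top)), y_te))
    return sum(accs) / folds

def cross_study(X_a, y_a, X_b, y_b, k_snps=10):
    """Rank SNPs and train on study A only, then score the model on study B."""
    top = rank_snps(X_a, y_a, k_snps)
    w = train_linear_svm(select(X_a, top), y_a)
    return accuracy(predict(w, select(X_b, top)), y_b)

def simulate_study(n, n_snps, informative, seed):
    """Hypothetical synthetic cohort: balanced classes, genotypes coded 0/1/2."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        c = i % 2  # 1 = case, 0 = control
        row = [(2 if c == 1 else 0) if j in informative and rng.random() < 0.8
               else rng.randint(0, 2) for j in range(n_snps)]
        X.append(row)
        y.append(c)
    return X, y

if __name__ == "__main__":
    snps = {0, 1, 2, 3, 4}  # hypothetical informative SNP columns
    X_a, y_a = simulate_study(100, 50, snps, seed=1)  # stand-in for study A
    X_b, y_b = simulate_study(40, 50, snps, seed=2)   # stand-in for study B
    print("5-fold CV accuracy:", round(cross_validate(X_a, y_a), 2))
    print("cross-study accuracy:", round(cross_study(X_a, y_a, X_b, y_b), 2))
```

In this sketch the ranking is repeated inside each fold using only the training samples, so held-out samples never influence which SNPs are chosen; ranking on the full dataset before splitting would tend to inflate cross-validation accuracy. In `cross_study`, both the SNP ranking and the model come entirely from study A, analogous to training on KIRP and testing on KICH.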

Details

Title: Cross-validation and cross-study validation of kidney cancer with machine learning and whole exome sequences from the National Cancer Institute
Creators: Abdulrhman Aljouie - Biostat
Nihir Patel - Supreme Council Of Health
Usman Roshan - New Jersey Institute of Technology
IEEE
Publication Details: 2018 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), pp.61-66
Publisher: IEEE
Number of pages: 6
Identifiers: 9920885508331
Language: English
Resource Type: Conference proceeding