Abstract
Germline variants can be useful early predictors of cancer risk. Here we present cross-study validation and cross-validation for two brain cancers: Glioblastoma Multiforme (GBM) and Lower Grade Glioma (LGG). We obtained whole-exome germline sequences of individuals of European ancestry with these cancers from The Cancer Genome Atlas, and of European-ancestry control individuals from the 1000 Genomes Project. We applied a rigorous, quality-controlled GATK procedure to call variants, which we then used in cross-study and cross-validation experiments. We find our germline variants to be highly predictive of both cancers, both in cross-study validation and in cross-validation. With a linear support vector machine classifier, training on GBM+controls and predicting LGG+controls gives 89% accuracy, and the reverse gives 88% accuracy. We find that the bulk of the accuracy comes from the SNP rs10792053, which lies in the gene OR9G1. In controls, this SNP is in Hardy-Weinberg equilibrium and has allele frequencies similar to previously published values, but this is not the case in our cases. Our manual inspection of the alignments reveals nothing unusual in the cases. Our other top-ranked SNPs lie in genes known to be connected to brain cancer and to cancer in general. This study identifies a highly discriminative germline SNP for GBM and LGG, but replication studies are required to verify it.
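To make the Hardy-Weinberg equilibrium (HWE) check concrete, here is a minimal sketch of the standard Pearson chi-square test (1 degree of freedom) for a biallelic SNP, computed from genotype counts. This is an illustration under stated assumptions, not necessarily the test used in this study (HWE is also commonly assessed with an exact test). The function name `hwe_chi_square` and the genotype counts in the usage example are hypothetical.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Illustrative sketch only (hypothetical helper, not the study's pipeline).
    Takes counts of homozygous-reference, heterozygous and homozygous-alternate
    genotypes at one biallelic SNP; returns (chi-square statistic, p-value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Reference-allele frequency estimated from the observed genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    # Genotype counts expected under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip classes with zero expected count (only happens at monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts that match HWE exactly (allele frequency 0.5).
    print(hwe_chi_square(25, 50, 25))
    # Hypothetical counts with no heterozygotes: a strong departure from HWE.
    print(hwe_chi_square(50, 0, 50))
```

A SNP whose genotype counts give a large p-value in a group is consistent with HWE in that group; running the test separately in controls and in cases is one way to see the kind of contrast described above.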