Abstract
•Copy number alterations are informative for predicting cancer patients’ survival.
•Copy number alteration data are highly correlated with segments.
•Simultaneous prediction and identification of important genomic regions can be achieved.
•Some regions in Chromosomes 3 and 7, including TERC, are identified to increase hazard.
•Some regions in Chromosomes 8 and 12, including MTSS and NDUFB9, are identified to lower hazard.
Copy number alterations (CNA) are structural variations in the genome in which some regions exhibit more or fewer than the normal two chromosomal copies. The genomic CNA profile provides critical information on tumour progression and is therefore informative for patients’ survival. Modelling patients’ survival from their genomic CNA profiles while simultaneously identifying the genomic regions associated with survival remains a statistical challenge. Several methods have been proposed, including the Cox proportional hazards (PH) model with ridge, lasso, or elastic net penalties. However, these methods do not take the general dependencies between genomic regions into account and produce results that are difficult to interpret. In this paper, we extend the elastic net penalty by introducing an additional penalty that takes into account general dependencies between genomic regions. The new model produces smooth parameter estimates while simultaneously performing variable selection via a sparse solution. In our simulation study, the proposed method shows better prediction performance than the other models, while enabling us to investigate regions in the genome that are associated with patients’ survival with sensible interpretation. We illustrate the method using a real dataset from a lung cancer cohort and simulated data.