Abstract
DNA methylation plays an important role in the initiation and development of human cancers and is therefore used as a biomarker for early cancer detection. Two main problems in this field are the very large number of features per sample and the small number of available samples. This paper presents novel vertical, horizontal, and cascaded methods for analyzing DNA methylation features in promoter regions. Vertical analysis processes each feature across all normal or cancer samples to obtain indicators of its methylation level. The resulting values are used to select the subset of features that fall within a given threshold. This subset then undergoes horizontal analysis, in which many features are grouped into a window that yields a single value. Hence, the original feature set of each sample goes through two reduction steps: first, unsupervised feature selection via vertical analysis of the features, and second, feature extraction via horizontal analysis of the selected features. For evaluation, we compared the proposed approaches with traditional feature selection methods and singular value decomposition (SVD) and found that the proposed approaches outperform all of them by a good margin. Vertical analysis or horizontal analysis alone gives better results than the traditional approaches, and combining both types of analysis improves the results further. With only 97 features, the proposed combined approach achieves 99.16% accuracy, whereas the best traditional classification achieves only 98.16% accuracy with 31 195 features. Compared with all other approaches, the combined approach improves the mean absolute error and the root-mean-square error by 8.8% to 54.3%. These results indicate that the cascaded approach performs considerably better than the previous approaches, improving system accuracy while reducing the space and processing complexity of the system.
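As a rough illustration of the two reduction steps, the following Python sketch applies a vertical selection followed by a horizontal windowing to a samples-by-features matrix. The abstract does not define the indicator, the threshold, or the window rule, so the per-feature mean, the `[low, high]` band, the fixed window size with averaging, and all function names here are assumptions made for illustration only.

```python
import numpy as np


def vertical_select(X, low, high):
    """Vertical step (assumed form): compute one indicator per feature,
    here the mean methylation level across all samples, and return the
    indices of the features whose indicator lies in [low, high]."""
    indicator = X.mean(axis=0)
    return np.flatnonzero((indicator >= low) & (indicator <= high))


def horizontal_extract(X, window):
    """Horizontal step (assumed form): group consecutive features into
    windows of `window` columns and replace each window by its mean.
    The last window may contain fewer columns."""
    n_samples, n_features = X.shape
    n_windows = int(np.ceil(n_features / window))
    out = np.empty((n_samples, n_windows))
    for w in range(n_windows):
        out[:, w] = X[:, w * window:(w + 1) * window].mean(axis=1)
    return out


def cascaded(X, low, high, window):
    """Cascade: vertical feature selection, then horizontal extraction."""
    keep = vertical_select(X, low, high)
    return horizontal_extract(X[:, keep], window)
```

Calling `cascaded(X, low, high, window)` on a matrix with one row per sample returns a much narrower matrix that could then be passed to any classifier. The selection step uses no class labels, which matches the abstract's description of it as unsupervised.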