Abstract
Conference Title: GLOBECOM 2017 - 2017 IEEE Global Communications Conference
Conference Start Date: 2017, Dec. 4
Conference End Date: 2017, Dec. 8
Conference Location: Singapore

With the exponential growth of Internet traffic, identifying the legitimate users who generate the bulk of that traffic is a growing concern. Anomalies in network traffic disrupt normal network functions such as traffic classification, resource allocation, and service management, so anomalies must be detected within a given time frame. The efficiency of an anomaly detection model depends mainly on the selection of relevant features and on the learning algorithm used to classify network traffic patterns. However, because of the curse of dimensionality, class imbalance, and variation in the types of anomalies, most existing solutions reported in the literature fail to handle the problems that occur when detecting anomalies in large-scale network data. To address these gaps, we propose a new hybrid anomaly detection scheme, the Ensemble-based Classification Model for Network Anomaly Detection (EnClass), to detect anomalies in real-world networking datasets. EnClass has three modules: (i) Hoeffding-bound-based clustering to identify the optimal subset of features for classifying network traffic, (ii) an eigenvalue computation module that refines the feature set by removing unnecessary attributes, and (iii) a very fast decision tree for network traffic classification. To validate the proposed anomaly detection model, an experimental evaluation is performed on the real-world Knowledge Discovery and Data Mining (KDD'99) dataset with respect to detection rate, false positive rate, and F-score.
The comparison with existing approaches demonstrates the effectiveness of EnClass in terms of detection rate (98.58%), false positive rate (0.42%), and F-score (96.06%).
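Two quantities named in the abstract can be made concrete with a minimal Python sketch. It assumes the standard textbook definitions, which the abstract itself does not spell out: the Hoeffding bound in the form commonly used by very fast decision trees, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and detection rate as recall, false positive rate as FP / (FP + TN), and F-score as the harmonic mean of precision and recall. The function names `hoeffding_bound` and `detection_metrics` and the confusion-matrix counts in the usage example are hypothetical and chosen only for illustration; they are not taken from the paper or from its KDD'99 results.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the observed random variable.
    delta: allowed probability that the true mean lies outside the bound.
    n: number of independent observations seen so far.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute detection rate, false positive rate, and F-score.

    Assumes anomalies are the positive class. A ratio whose denominator
    is zero is reported as 0.0 rather than raising an error.
    """
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + detection_rate
    f_score = 2.0 * precision * detection_rate / denom if denom else 0.0
    return {
        "detection_rate": detection_rate,
        "false_positive_rate": false_positive_rate,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Illustrative counts only (not results from the paper).
    metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")

    # The bound shrinks as more samples are observed.
    for n in (100, 1000, 10000):
        print(n, hoeffding_bound(value_range=1.0, delta=1e-7, n=n))
```

The bound narrows as the number of observations grows, which is why Hoeffding-based learners can commit to a decision after seeing only part of a large stream. How EnClass applies the bound inside its clustering module is not specified in the abstract, so this sketch covers only the bound itself.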