Architecting a machine learning pipeline for online traffic classification in software defined networking using Spark

Sama Salam Samaan, Hassan Awheed Jeiad

Abstract


Precise traffic classification is essential to numerous network functions such as routing, network management, and resource allocation. Traditional classification techniques have become insufficient because the massive growth of network traffic imposes high computational costs. The emerging paradigm of software defined networking (SDN) reshapes the network architecture around a centralized controller that maintains a global view of the entire network. This paper proposes a model for SDN traffic classification based on machine learning (ML) using the Spark framework. The proposed model consists of two phases: learning and deployment. In the learning phase, an ML pipeline is constructed, consisting of a set of stages combined into a single entity. Three ML models (decision tree, random forest, and logistic regression) are built and evaluated for classifying 75 well-known applications, including Google and YouTube, accurately and within a short time. A dataset of 3,577,296 flows with 87 features is used for training and testing the models. The decision tree model is selected for deployment based on the performance results, which show that it achieves the best accuracy, 0.98. The proposed model is also compared with state-of-the-art works and reports higher accuracy.
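
The pipeline idea described in the abstract (several stages combined into a single entity, with candidate models compared by accuracy before one is chosen for deployment) can be illustrated with a minimal sketch. This is plain Python, not Spark's API, and it is not the authors' implementation. The class names, the two toy features, the flow values, and the two stand-in classifiers (a majority baseline and a nearest-centroid model, used here in place of the decision tree, random forest, and logistic regression) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class Scaler:
    """Stage: rescales each feature to [0, 1] using ranges learned in fit()."""

    def fit(self, rows):
        cols = list(zip(*(r["features"] for r in rows)))
        self.lo = [min(c) for c in cols]
        self.span = [(max(c) - min(c)) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [
            {**r, "features": [(x - lo) / s
                               for x, lo, s in zip(r["features"], self.lo, self.span)]}
            for r in rows
        ]


class MajorityClassifier:
    """Stand-in model: always predicts the most frequent training label."""

    def fit(self, rows):
        self.label = Counter(r["label"] for r in rows).most_common(1)[0][0]
        return self

    def transform(self, rows):
        return [{**r, "prediction": self.label} for r in rows]


class NearestCentroidClassifier:
    """Stand-in model: predicts the label whose feature centroid is closest."""

    def fit(self, rows):
        groups = defaultdict(list)
        for r in rows:
            groups[r["label"]].append(r["features"])
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*feats)]
            for label, feats in groups.items()
        }
        return self

    def _predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)

    def transform(self, rows):
        return [{**r, "prediction": self._predict(r["features"])} for r in rows]


class Pipeline:
    """Chains stages so that they are fitted and applied as one unit."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        # Each stage is fitted on the output of the stages before it.
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def accuracy(rows):
    """Fraction of rows whose prediction matches the true label."""
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


def select_best(candidates, train, test):
    """Fits every candidate pipeline and returns the most accurate one's name."""
    scores = {name: accuracy(p.fit(train).transform(test))
              for name, p in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical flows: [mean packet size in bytes, flow duration in seconds].
train = [
    {"features": [1400.0, 30.0], "label": "YOUTUBE"},
    {"features": [1350.0, 45.0], "label": "YOUTUBE"},
    {"features": [1300.0, 40.0], "label": "YOUTUBE"},
    {"features": [200.0, 1.0], "label": "GOOGLE"},
    {"features": [250.0, 2.0], "label": "GOOGLE"},
]
test = [
    {"features": [1380.0, 35.0], "label": "YOUTUBE"},
    {"features": [220.0, 1.5], "label": "GOOGLE"},
]

candidates = {
    "majority": Pipeline([Scaler(), MajorityClassifier()]),
    "nearest_centroid": Pipeline([Scaler(), NearestCentroidClassifier()]),
}
best, scores = select_best(candidates, train, test)
print(best, scores)
```

In Spark itself, the same pattern would likely be expressed with the ML library's own pipeline, estimator, and evaluator classes; the sketch only shows why treating the stages as one object makes it straightforward to swap the final model and compare candidates on the same held-out data.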

Keywords


Big data; Machine learning pipeline; Software defined networking; Spark framework; Traffic classification



DOI: http://doi.org/10.11591/ijai.v12.i2.pp861-873


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.