Features analysis of internet traffic classification using interpretable machine learning models

Erick A. Adje, Vinasetan Ratheil Houndji, Michel Dossou

Abstract


Internet traffic classification is a fundamental task for network services and management. Existing machine learning models already identify traffic classes well; however, finding the most discriminating features remains essential for building efficient models. In this paper, we use interpretable machine learning algorithms such as decision tree, random forest and eXtreme gradient boosting (XGBoost) to find the most discriminating features for internet traffic classification. The dataset used contains 377,526 traffic flows, each described by 248 features. From these features, we propose a 12-feature model that reaches an accuracy of up to 99.76%. Tested on another dataset of 19,626 flows, it obtained an accuracy of 98.40%, which shows the efficiency and stability of our model. We also identify a set of 14 important features for internet traffic classification, two of which are crucial: the server port number and the minimum segment size from client to server.
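The following is a minimal sketch, in pure Python and not the authors' code, of the idea behind impurity-based feature ranking in tree models. Each feature is scored by how much its best single threshold split lowers Gini impurity. A full decision tree sums such decreases over all of its nodes, and random forests and XGBoost aggregate analogous split-gain quantities over many trees, so this one-split score is only a simplified stand-in. The feature names (`server_port`, `min_seg_size_c2s`, `noise`), the class labels, and the six toy flows are hypothetical and are not taken from the paper's datasets.

```python
from collections import Counter
from typing import List, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Largest weighted Gini decrease over thresholds 'x <= t' on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(rows: Sequence[Sequence[float]], labels: Sequence[str],
                  names: Sequence[str], k: int) -> List[str]:
    """Return the k feature names whose best split reduces impurity the most."""
    gains = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        gains[name] = best_split_gain(column, labels)
    return sorted(names, key=lambda f: gains[f], reverse=True)[:k]


# Hypothetical toy flows: (server port, min segment size client->server, noise).
rows = [
    (80, 20, 1),
    (80, 20, 3),
    (443, 20, 5),
    (443, 32, 2),
    (6881, 32, 4),
    (6881, 32, 6),
]
labels = ["WWW", "WWW", "WWW", "P2P", "P2P", "P2P"]
names = ["server_port", "min_seg_size_c2s", "noise"]

print(rank_features(rows, labels, names, k=2))  # → ['min_seg_size_c2s', 'server_port']
```

In this toy data the segment-size column separates the two classes perfectly (gain 0.5), the port column does so only partially (gain 0.25), and the noise column barely helps (gain about 0.1). Keeping the top-k names in this way mirrors, in miniature, the step of shrinking a large feature set to a small discriminating subset.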

Keywords


classification algorithm; internet traffic; machine learning; traffic classification; internet traffic discriminators



DOI: http://doi.org/10.11591/ijai.v11.i3.pp%25p


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.