Augmenting WIF library by a new Machine Learning Classifier
Author
Jáchym Hudlický
Year
2025
Type
Bachelor thesis
Supervisor
Ing. Richard Plný
Reviewers
Mgr. Martin Jureček, Ph.D.
Department
Summary
The thesis focuses on monitoring computer networks at the level of IP flows, specifically on network traffic classification and threat detection using machine learning methods. Its main goal is to extend the WIF library with a new machine-learning-based classifier, with strong emphasis on efficiency and processing speed.
After a thorough analysis of the available C++ machine learning libraries, the Mlpack library was selected, and a new WIF classifier based on it was designed and implemented. Its speed was compared with that of the existing WIF machine learning classifier, which relies on the scikit-learn library. Depending on the model, the new Mlpack-based classifier proved 21x to 190x faster. The new classifier is available in the official WIF repository on GitHub.
The thesis also describes the design and implementation of a proxy server detector, including a residential proxy detector, built on the newly designed classifier. The detector was tested on real data captured in the national CESNET3 network, which has half a million daily active users.