Ing. Dominik Soukup

Theses

Bachelor theses

WireGuard Traffic Detection Using Active Learning

Author
Štěpán Jílek
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Dominik Soukup
Reviewers
Ing. Jaroslav Pešek
Summary
This bachelor's thesis deals with the detection of encrypted traffic of the WireGuard protocol, which is used for encrypted VPN connections. The protocol is only a few years old and is rapidly growing in popularity. Detection is performed with the ALF framework, which takes advantage of machine learning and is improved by the universal Annotator module. The first part introduces the reader to the principles of network monitoring and the WireGuard protocol. The following parts present the design and implementation. Finally, the performance and accuracy of the detection are evaluated. Classification efficiency reaches up to 95% in some cases.

Master theses

Concept drift and model degradation in network traffic classification

Author
Lukáš Jančička
Year
2024
Type
Master thesis
Supervisor
Ing. Dominik Soukup
Reviewers
Ing. Josef Koumar
Summary
Machine learning is a highly effective and currently popular approach to network traffic classification. However, network traffic is a challenging domain, and trained models may degrade quickly after deployment. Besides biases introduced during data capture and model creation, concept drift is a major source of model degradation: as the distributions evolve, the patterns learned from the training data may stop being accurate. The thesis therefore focuses on creating a basis for a framework for concept drift detection and analysis tailored to the domain of network traffic. The behaviour of network traffic was examined in a variety of experiments that studied the development of distributions, simulated model deployment, and observed the degradation over time. Multiple recurring concepts were discovered, with weekend traffic differing from that of the working week. When concept drift was not addressed, the test F1 scores dropped from 0.92 to around 0.7 in a matter of days. Sometimes only a few severely drifted features were the source of model degradation, so a novel approach of weighting the drift result by the feature importances was devised. The created drift detector may be enhanced by modules for additional analysis of the detected drift. A novel idea of classifying types of drift for better drift understanding is introduced. The created detector was tested as a guide for model retraining and was able not only to prevent the model from degrading but also to improve its performance over time.
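The idea of weighting a drift result by feature importances can be illustrated with a minimal sketch. It is not the detector from the thesis: the two-sample Kolmogorov-Smirnov statistic as the per-feature drift measure, the weighted mean as the aggregation, and the names `ks_statistic` and `weighted_drift_score` are all assumptions made for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap between two step functions occurs at one of the sample points.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def weighted_drift_score(reference_rows, current_rows, importances):
    """Importance-weighted mean of per-feature drift (0 = no drift, 1 = maximal drift).

    Assumption: `importances` holds one non-negative weight per feature column,
    for example the feature importances reported by the deployed classifier.
    """
    total = sum(importances)
    score = 0.0
    for j, weight in enumerate(importances):
        ref_col = [row[j] for row in reference_rows]
        cur_col = [row[j] for row in current_rows]
        score += weight * ks_statistic(ref_col, cur_col)
    return score / total


# Toy data: feature 0 is unchanged, feature 1 has shifted completely.
reference = [[1, 10], [2, 20], [3, 30], [4, 40]]
current = [[1, 110], [2, 120], [3, 130], [4, 140]]

low = weighted_drift_score(reference, current, [0.9, 0.1])
high = weighted_drift_score(reference, current, [0.1, 0.9])
print(low, high)
```

With such weighting, the same drift in one feature yields a small score when the model barely relies on that feature and a large score when the feature is important.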

Framework for autonomous improvement of network traffic classification

Author
Jaroslav Pešek
Year
2022
Type
Master thesis
Supervisor
Ing. Dominik Soukup
Reviewers
Ing. Simona Fornůsek, Ph.D.
Summary
This diploma thesis deals with the classification of primarily encrypted network traffic by applying machine learning algorithms. Machine learning is a subfield of artificial intelligence that relies heavily on sufficiently large and general datasets. The first goal is to analyze methods that not only improve such classification over time but also iteratively build an updated dataset. The second goal is to create a prototype of a software framework capable of doing so, while also being able to evaluate the classification. The analysis part introduces the reader to the active learning method and discusses the state of the art and the relevance of the methods to the network traffic domain. The design part defines the requirements and designs the solution architecture. The final part of the thesis is focused on experiments. The output of the work is a prototype of the software framework and an evaluation of various active learning methods for the network traffic domain.
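As a rough illustration of active learning, the sketch below shows least-confidence uncertainty sampling, one common query strategy. The abstract does not say which strategies the thesis evaluates, so the choice of strategy, the function name `least_confident`, and the toy probabilities are assumptions.

```python
def least_confident(probabilities, k):
    """Return the indices of the k samples whose top class probability is lowest.

    `probabilities` holds one list of class probabilities per unlabeled sample,
    as produced by any probabilistic classifier (hypothetical input here).
    """
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:k]


# Toy predictions for four unlabeled flows and two classes.
predicted = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.99, 0.01],
]

# Flows the model is least sure about would be sent for annotation first.
to_annotate = least_confident(predicted, 2)
print(to_annotate)
```

The selected samples are labeled, added to the training dataset, and the model is retrained, which is how such a loop can both improve the classifier and build the dataset iteratively.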

Autonomous optimization of network traffic datasets

Author
Petr Skružný
Year
2022
Type
Master thesis
Supervisor
Ing. Dominik Soukup
Reviewers
doc. Ing. Tomáš Čejka, Ph.D.
Summary
The need for precise identification and monitoring of encrypted network traffic grows with the volume of such traffic. The solution to this problem often lies in the use of machine learning algorithms. Before such algorithms can be employed, they have to be trained on prepared training datasets. It is best to use datasets that are composed in part of real-world network traffic. The goal of this thesis is to analyze potential methods for network traffic dataset optimization and to propose viable algorithms for optimizing these datasets. The proposed algorithms are implemented and experimentally evaluated. The results of this thesis are a comparison of the experimental results of the proposed algorithms and a software prototype of these algorithms implemented in the Python programming language.