Ing. Dominik Soukup

Publications

Active Learning Framework For Long-term Network Traffic Classification

Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2023
Published
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 893-899. ISBN 979-8-3503-3286-5.
Type
Invited/Awarded proceedings paper
Annotation
Recent network traffic classification methods benefit from machine learning (ML) technology. However, the use of ML brings many challenges, such as the lack of high-quality annotated datasets, data drift and other effects that cause datasets and ML models to age, and high volumes of network traffic. This paper presents the benefits of augmenting traditional ML training and deployment workflows and of adapting the Active Learning (AL) concept to network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously and automatically evolves a dataset and an ML model. Moreover, ALF includes monitoring, dataset quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.

Evaluation of the Limit of Detection in Network Dataset Quality Assessment with PerQoDA

Authors
Wasielewska, K.; Soukup, D.; Čejka, T.; Camacho, J.
Year
2023
Published
ECML PKDD 2022: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Cham: Springer, 2023. p. 170-185. ISSN 1865-0929. ISBN 978-3-031-23632-7.
Type
Proceedings paper
Annotation
Machine learning is recognised as a relevant approach to detect attacks and other anomalies in network traffic. However, there are still no suitable network datasets that would enable effective detection. Preparing a network dataset is difficult, not only for privacy reasons but also because tools for assessing dataset quality are lacking. In a previous paper, we proposed a new method for data quality assessment based on permutation testing. This paper presents a parallel study on the limits of detection of such an approach. We focus on the problem of network flow classification and use well-known machine learning techniques. The experiments were performed using publicly available network datasets.

Automated Annotation of Network Traffic with Data from Web Browser

Authors
Kala, J.; Soukup, D.
Year
2022
Published
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 59-67. ISBN 978-80-01-07015-4.
Type
Proceedings paper
Annotation
Encrypted traffic classification requires Machine Learning (ML) algorithms and a large amount of data to learn patterns and classify network communication without decrypting it. For the learning stage of ML models, we need a reliable and trusted dataset that delivers the ground truth for the whole classification. However, building a dataset is a very complicated and time-consuming task that prevents ML from being used in the production environment of target networks. The aim of this work is to address this topic for network flow annotation using web traffic data. This paper introduces the problem of network IP flow monitoring, analysis and classification. This problem is demonstrated on the HTTP and HTTPS protocols. Moreover, this work describes a technique for collecting data from web browsers and pairing it with traffic flows to create a reliable annotated dataset automatically.

Vision of Active Learning Framework Approach to Network Traffic Analysis Research

Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2022
Published
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 68-72. ISBN 978-80-01-07015-4.
Type
Proceedings paper
Annotation
Current research in the network security domain intensively uses machine learning (ML) and artificial intelligence to automate processes and reveal hidden patterns in data. These technologies, however, require many training datasets, ideally of high quality. Additionally, network infrastructures continuously evolve, and thus network traffic changes dynamically over time as well. There is an urgent need to adapt machine learning models, update datasets with the latest samples of annotated network traffic, and retrain the models regularly to sustain feasible performance. Active Learning Framework (ALF) directly targets these demands and aims to provide a modular platform for scientific experiments and deployment in practice, as well as to support research activities regarding the quality of datasets. This paper describes the ALF software in particular and proposes its possible use cases in the research and practice domains.

Towards Evaluating Quality of Datasets for Network Traffic Domain

Authors
Soukup, D.; Tisovčík, P.; Hynek, K.; Čejka, T.
Year
2021
Published
Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 264-268. ISSN 2165-963X. ISBN 978-3-903176-36-2.
Type
Proceedings paper
Annotation
This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. Naturally, there is a long history of research targeted at data quality; however, it is focused mainly on data consistency, validity, precision, and other metrics, which are insufficient for network traffic use cases. The rise of machine learning usage in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and to decide on the usability of already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of network traffic datasets, and finally, describe a framework for dataset analysis.

Behavior Anomaly Detection in IoT Networks

Year
2020
Published
Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2019). Cham: Springer International Publishing, 2020. p. 465-473. Lecture Notes on Data Engineering and Communications Technologies. vol. 49. ISSN 2367-4520. ISBN 978-3-030-43192-1.
Type
Book chapter
Annotation
Data encryption makes deep packet inspection less suitable nowadays, and the need to analyze encrypted traffic is growing. Machine learning brings new options to recognize a type of communication despite the heterogeneity of encrypted IoT traffic right at the network edge. We propose the design of a scalable architecture and a method for behavior anomaly detection in IoT networks. The combination of two existing semi-supervised techniques that we used ensures higher reliability of anomaly detection and improves on the results achieved by a single method. We describe the classification and anomaly detection experiments we conducted, which were made possible by existing training datasets and our own. The satisfactory results presented provide a basis for further work and allow us to elaborate on this idea.

QoD: Ideas about Evaluating Quality of Datasets

Year
2020
Published
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 8-9. ISBN 978-80-01-06772-7.
Type
Proceedings paper
Annotation
The importance of computer networks is rising every year. The reason is that we are connecting more and more devices and applications, and our daily routines depend on connectivity. On the other hand, this creates great potential for attackers. They can hide their activities in a complex network environment and steal valuable data. Without a solid dataset, the evaluation score misrepresents the real score in a production environment; therefore, proper datasets play an essential role in the research and development of any ML-based classifier or detector. The main motivation for this paper is to find a way to evaluate the quality of any dataset in order to estimate whether it is good enough for ML experiments. To the best of our knowledge, there are only a few studies focused on quality evaluation of datasets with network traffic. For experiments, we selected datasets for the DNS over HTTPS (DoH) detection and URL classification problems, which are already being worked on. All metrics are calculated at the dataset level. The impact of these metrics is evaluated on a Random Forest (RF) model. We show the results we have discovered in our datasets and ML detection modules. Finally, we discuss possible next steps in this research.

Security Framework for IoT and Fog Computing Networks

Authors
Soukup, D.; Hujňák, O.; Štefunko, S.; Krejčí, R.; Grešák, E.
Year
2019
Published
3rd International conference on I-SMAC. Piscataway, NJ: IEEE, 2019. p. 87-92. ISBN 978-1-7281-4365-1.
Type
Invited/Awarded proceedings paper
Annotation
Our environment is becoming more and more interconnected. Various devices like refrigerators, doors or light bulbs communicate over different networks and provide information for applications that are supposed to make our lives easier and more comfortable. However, such data provide sensitive information about our presence or habits and become attractive to network attackers. It is very challenging to detect incidents in heterogeneous IoT networks where different devices come in and out or change their network profiles quite frequently. We propose a security framework for IoT and fog computing networks to address these challenges. Our framework is very flexible and designed even for devices with limited computational power. All components can be deployed on one network node or distributed among many, which also allows easy scalability. Part of our solution is a software IoT gateway that provides the capability to analyse traffic from non-IP IoT sensors. This project provides a full-stack security solution because it contains collectors, detectors and management tools. The framework has only software components with no relation to any specific hardware device. It is developed as an open-source project and is publicly available to the worldwide community. Currently developed detectors detect identified vulnerabilities for Z-Wave, Long Range Wide Area Network (LoRaWAN), BLE and IP-based IoT protocols.