Ing. Karel Hynek, Ph.D.

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

Authors

Koumar, J.; Hynek, K.; Čejka, T.; Pavel, Š.

Year

2025

Published in

Scientific Data. 2025, 12(1), ISSN 2052-4463.

Type

Article

DOI

10.1038/s41597-025-04603-x

Department

Department of Digital Design

Abstract

Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network—an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.
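As a rough illustration of the forecast-based detection idea mentioned in this abstract, the sketch below flags a point when it deviates strongly from a simple forecast. This is not the method from the paper: the moving-average forecast, the `window` and `threshold` parameters, and the sample numbers are all assumptions chosen only for illustration.

```python
from statistics import mean, stdev

def detect_anomalies(series, window=6, threshold=3.0):
    """Flag indices whose value deviates from a moving-average forecast.

    The forecast for each point is the mean of the previous `window` values;
    a point is flagged if its forecast error exceeds `threshold` times the
    standard deviation of that same window.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        forecast = mean(history)
        spread = stdev(history)
        # Guard against a perfectly flat history (zero spread).
        if spread == 0:
            spread = 1e-9
        if abs(series[i] - forecast) > threshold * spread:
            anomalies.append(i)
    return anomalies

# Hypothetical hourly byte counts for one IP address (made-up numbers).
traffic = [100, 104, 98, 101, 97, 103, 99, 102, 500, 100, 101]
print(detect_anomalies(traffic))  # → [8]
```

Note that the spike at index 8 also inflates the spread of the following windows, which is one reason such a naive forecaster is only a starting point.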

CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines

Authors

Hynek, K.; Luxemburk, J.; Pešek, J.; Čejka, T.; Šiška, P.

Year

2024

Published in

Scientific Data. 2024, 11(1), ISSN 2052-4463.

Type

Article

DOI

10.1038/s41597-024-03927-4

Department

Department of Digital Design

Abstract

The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

Comparative analysis of DNS over HTTPS detectors

Authors

Jeřábek, K.; Hynek, K.; Ryšavý, O.

Year

2024

Published in

Computer Networks. 2024, 247(247), ISSN 1389-1286.

Type

Article

DOI

10.1016/j.comnet.2024.110452

Department

Department of Digital Design

Abstract

DNS over HTTPS (DoH) is a protocol that encrypts DNS traffic to improve user privacy and security. However, its use also poses challenges for network operators and security analysts who need to detect and monitor network traffic for security purposes. Therefore, there are multiple DoH detection proposals that leverage machine learning to identify DoH connections; however, these proposals were often tested on different datasets, and their evaluation methodologies were not consistent enough to allow direct performance comparison. In this study, seven DoH detection proposals were recreated and evaluated with six different experiments to answer research questions that targeted specific deployment scenarios concerning ML-model transferability, usability, and longevity. For thorough testing, a large collection of DoH datasets, along with a novel 5-week dataset, was used, which enabled the evaluation of models’ longevity. This study provides insights into the current state of DoH detection techniques and evaluates the models in scenarios that have not been previously tested. Therefore, this paper goes beyond classical replication studies and shows previously unknown properties of seven published DoH detectors.

NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification

Authors

Koumar, J.; Hynek, K.; Pešek, J.; Čejka, T.

Year

2024

Published in

Computer Networks. 2024, 240, 1-22. ISSN 1389-1286.

Type

Article

DOI

10.1016/j.comnet.2023.110147

Department

Department of Digital Design

Abstract

Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large ISP networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended by additional features that enable network traffic analysis with high accuracy. These flow extensions are, however, often too large or hard to compute, which then allows only offline analysis or limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed) flow, based on analysing the time series of packet sizes. By thoroughly testing 25 different network traffic classification tasks, we show the broad applicability and high usability of NetTiSA flow. For practical deployment, we also consider the sizes of flows extended by NetTiSA features and evaluate the performance impacts of their computation in the flow exporter. The novel features proved to be computationally inexpensive and showed excellent discriminatory performance. The trained machine learning classifiers with proposed features mostly outperformed the state-of-the-art methods. NetTiSA finally bridges the gap and brings universal, small-sized, and computationally inexpensive features for traffic classification that can be scaled up to extensive monitoring infrastructures, bringing the machine learning traffic classification even to 100 Gbps backbone lines.

Towards reusable models in traffic classification

Authors

Luxemburk, J.; Hynek, K.

Year

2024

Published in

Proceedings of the 8th Network Traffic Measurement and Analysis Conference. Piscataway: IEEE, 2024. ISBN 978-3-903176-64-5.

Type

Proceedings paper

DOI

10.23919/TMA62044.2024.10559009

Department

Department of Digital Design

Abstract

The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools. In contrast, the network traffic classification field falls behind, and the lack of standard datasets and model architectures holds the entire field back. This paper aims to address this issue. We introduce CESNET Models, a package comprising pre-trained deep learning models tailored for traffic classification. The included models are trained on public datasets for the task of web service classification. Using the new package, researchers and practitioners can skip model design from scratch and the collection of large datasets but instead focus on fine-tuning and adapting the models to their specific needs, thus accelerating the pace of research and development in network traffic classification.

WIF: Efficient Library for Network Traffic Analysis

Authors

Plný, R.; Hynek, K.; Šiška, P.

Year

2024

Published in

2024 20th International Conference on Network and Service Management (CNSM). New York: IEEE, 2024. ISSN 2165-963X. ISBN 978-3-903176-66-9.

Type

Proceedings paper

DOI

10.23919/CNSM62983.2024.10814378

Department

Department of Digital Design

Abstract

Network traffic classification and analysis are crucial for maintaining computer security. Nevertheless, the rise of encrypted traffic has made reliable threat detection increasingly challenging, requiring more complex algorithms such as heterogeneous ensembles. These types of algorithms proved to be effective in complex threat detection while maintaining high accuracy and explainability. However, their complexity and time-consuming development process limit their widespread adoption. Therefore, we created a new library called Weak Indication Framework (WIF) for the faster development of heterogeneous ensembles, which minimizes the time between attack discovery and detection capability. Moreover, WIF-based detectors are efficient enough to operate on large Internet Service Provider networks—a single detector can protect millions of users. We demonstrate the effectiveness of the WIF library through four different detectors (TOR, Cryptomining, IoT Malware, and Tunnel detector), each achieving outstanding performance and quick deployment times.

BOTA: Explainable IoT malware detection in large networks

Authors

Uhříček, D.; Hynek, K.; Čejka, T.; Kolář, D.

Year

2023

Published in

IEEE Internet of Things Journal. 2023, 10(10), 8416-8431. ISSN 2327-4662.

Type

Article

DOI

10.1109/JIOT.2022.3228816

Department

Department of Digital Design

Abstract

Explainability and alert reasoning are essential but often neglected properties of intrusion detection systems. The lack of explainability reduces security personnel’s trust, limiting the overall impact of alerts. This paper proposes the BOTA (Botnet Analysis) system, which uses the concepts of weak indicators and heterogeneous meta-classifiers to maintain accuracy compared with state-of-the-art systems while also providing explainable results that are easy to understand. To evaluate the proposed system, we have implemented a demonstration of intrusion weak-indication detectors, each working on a different principle to ensure robustness. We tested the architecture with various real-world and lab-created datasets, and it correctly identified 94.3% of infected IoT devices without false positives. Furthermore, the implementation is designed to work on top of extended bidirectional flow data, making it deployable on 100 Gbps large-scale networks at the level of Internet Service Providers. Thus, a single instance of BOTA can protect millions of devices connected to end-users’ local networks and significantly reduce the threat arising from powerful IoT botnets.

CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines

Authors

Luxemburk, J.; Hynek, K.; Čejka, T.; Lukačovič, A.; Šiška, P.

Year

2023

Published in

Data in Brief. 2023, 2023(46), ISSN 2352-3409.

Type

Article

DOI

10.1016/j.dib.2023.108888

Department

Department of Digital Design
Department of Applied Mathematics

Abstract

The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

DataZoo: Streamlining Traffic Classification Experiments

Authors

Luxemburk, J.; Hynek, K.

Year

2023

Published in

SAFE '23: Proceedings of the 2023 on Explainable and Safety Bounded, Fidelitous, Machine Learning for Networking. New York: Association for Computing Machinery, 2023. p. 3-7. ISBN 979-8-4007-0449-9.

Type

Proceedings paper

DOI

10.1145/3630050.3630176

Department

Department of Digital Design

Abstract

The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification. DataZoo provides a standardized API for accessing three extensive datasets: CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-YEAR22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.

DNS Over HTTPS Detection Using Standard Flow Telemetry

Authors

Jeřábek, K.; Hynek, K.; Ryšavý, O.; Burgetová, I.

Year

2023

Published in

IEEE Access. 2023, 2023(11), 50000-50012. ISSN 2169-3536.

Type

Article

DOI

10.1109/ACCESS.2023.3275744

Department

Department of Digital Design

Abstract

The aim of DNS over HTTPS (DoH) is to enhance users’ privacy by encrypting DNS. However, it also enables adversaries to bypass security mechanisms that rely on inspecting unencrypted DNS. Therefore, in some networks, it is crucial to detect and block DoH to maintain security. Unfortunately, DoH is particularly challenging to detect, because it is designed to blend into regular HTTPS traffic. So far, there have been numerous proposals for DoH detection; however, they rely on specialized flow monitoring software that can export complex features that often cannot be computed on the running sequence or suffer from low accuracy. These properties significantly limit their mass deployment into real-world environments. Therefore, this study proposes a novel DoH detector that uses IP-based, machine learning, and active probing techniques to detect DoH effectively with standard flow monitoring software. The use of classical flow features also enables its deployment in any network infrastructure with flow-monitoring appliances such as intelligent switches, firewalls, or routers. The proposed approach was tested using lab-created and real-world ISP-based network data and achieved a high classification accuracy of 0.999 and an F1 score of 0.998 with no false positives.

Encrypted traffic classification: the QUIC case

Authors

Luxemburk, J.; Hynek, K.; Čejka, T.

Year

2023

Published in

Proceedings of the 7th Network Traffic Measurement and Analysis Conference. Piscataway: IEEE, 2023. ISBN 978-3-903176-58-4.

Type

Proceedings paper

DOI

10.23919/TMA58422.2023.10199052

Department

Department of Digital Design

Abstract

The QUIC protocol is a new reliable and secure transport protocol that is an alternative to TLS over TCP. However, compared to TLS, QUIC obfuscates the connection handshake and the server name indication domain, making a simple inspection more challenging. The classification of QUIC traffic has also received less attention than that of TLS. In this work, we present a comprehensive study aiming to explore the challenges of QUIC traffic classification. We selected three models: 1) multi-modal CNN, 2) LightGBM, and 3) IP-based classifier, and evaluated their properties using a large one-month CESNET-QUIC22 dataset with 102 web service labels. The developed classifiers reached up to 88% accuracy and set the new baseline in fine-grained QUIC service classification. Moreover, the real nature of the dataset and its long time span allowed us to collect a number of insights and measure the classifiers' performance in the presence of data drift.

Network Traffic Classification Based on Single Flow Time Series Analysis

Authors

Koumar, J.; Hynek, K.; Čejka, T.

Year

2023

Published in

2023 19th International Conference on Network and Service Management (CNSM). New York: IEEE, 2023. International Conference on Network and Service Management. vol. 19. ISSN 2165-9605. ISBN 978-3-903176-59-1.

Type

Proceedings paper

DOI

10.23919/CNSM59352.2023.10327876

Department

Department of Digital Design

Abstract

Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar to or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5 %.
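To make the notion of a single-flow time series more concrete, the sketch below treats a flow as a list of (timestamp, packet size) pairs and derives a handful of simple statistics from it. The feature names and the sample flow are hypothetical and are not taken from the paper, whose 69 features are not listed in this abstract.

```python
from statistics import mean, pstdev

def flow_features(packets):
    """Compute a few illustrative features from (timestamp, size) pairs.

    `packets` must contain at least one packet, ordered by timestamp.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-packet gaps; empty when the flow has a single packet.
    gaps = [b - a for a, b in zip(times, times[1:])]
    duration = times[-1] - times[0]
    return {
        "packet_count": len(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "duration": duration,
        "bytes_per_second": sum(sizes) / duration if duration > 0 else 0.0,
    }

# Hypothetical flow: timestamps in seconds, sizes in bytes (made-up values).
flow = [(0.00, 60), (0.05, 1500), (0.10, 1500), (0.40, 60)]
features = flow_features(flow)
print(features["packet_count"], features["mean_size"])
```

A real exporter would compute such values incrementally per packet rather than storing the whole series; the list-based form here is only for readability.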

Collection of datasets with DNS over HTTPS traffic

Authors

Jeřábek, K.; Hynek, K.; Čejka, T.; Ryšavý, O.

Year

2022

Published in

Data in Brief. 2022, 2022(42), ISSN 2352-3409.

Type

Article

DOI

10.1016/j.dib.2022.108310

Department

Department of Digital Design

Abstract

Recently, the Internet has adopted the DNS over HTTPS (DoH) resolution mechanism for privacy-aware network applications. As DoH becomes more disseminated, it has also become a network monitoring research topic. For comprehensive evaluation and comparison of developed classifiers, real-world datasets are needed, motivating this contribution. We created a new large-scale collection of datasets consisting of two classes of traffic: i) DoH HTTPS communication and ii) non-DoH HTTPS connections. The DoH traffic is captured for multiple DoH providers and clients to include nuances of various DoH implementations and configurations. The non-DoH HTTPS connections complement the DoH communication aiming to include a wide range of existing network applications. The dataset collection consists of network traffic generated in a controlled environment and traffic captured from a real ISP network. The resulting datasets thus provide real-world network traffic data suitable for evaluating existing classifiers and the development of new methods.

DeCrypto: Finding Cryptocurrency Miners on ISP networks

Authors

Plný, R.; Hynek, K.; Čejka, T.

Year

2022

Published in

Secure IT Systems. Cham: Springer, 2022. p. 139-158. ISSN 0302-9743. ISBN 978-3-031-22294-8.

Type

Proceedings paper

DOI

10.1007/978-3-031-22295-5_8

Department

Department of Digital Design
Department of Applied Mathematics

Abstract

With the rising popularity of cryptocurrencies and the increasing value of the whole industry, people are incentivized to join and earn revenues by cryptomining — using computational resources for cryptocurrency transaction verification. Nevertheless, there is an increasing number of abusive cryptomining cases, and it is reported that “coin miner malware” grew by more than 4000% in 2018. In this work, we analyzed the cryptominer network communication and proposed the DeCrypto system that can detect and report mining on high-speed 100 Gbps backbone Internet lines with millions of users. The detector uses the concept of heterogeneous weak-indication detectors (Machine-Learning-based, domain-based, and payload-based) that work together and create a robust and accurate detector with an extremely low false-positive rate. The detector was implemented and evaluated on a real nationwide high-speed network and proved efficient in a real-world deployment.

Detection of Cryptomining in High-speed Networks

Authors

Plný, R.; Hynek, K.

Year

2022

Published in

Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 59-67. ISBN 978-80-01-07015-4.

Type

Proceedings paper

Department

Department of Digital Design
Department of Applied Mathematics

Abstract

This paper addresses cryptomining from the security perspective with an emphasis on abusive mining. It explores the possibility of detecting cryptominers in high-speed computer networks using a flow-based monitoring approach. Based on the analysis of mining communication, we proposed a detection method that can be deployed on high-speed networks. The proposed solution was implemented as a group of NEMEA modules. Moreover, it was deployed and evaluated on the national network CESNET2 operated by CESNET.

Discovering Coordinated Groups of IP Addresses Through Temporal Correlation of Alerts

Authors

Žádník, M.; Wrona, J.; Hynek, K.; Čejka, T.; Husák, M.

Year

2022

Published in

IEEE Access. 2022, 10(2022), 82799-82813. ISSN 2169-3536.

Type

Article

DOI

10.1109/ACCESS.2022.3196362

Department

Department of Digital Design

Abstract

Network-based monitoring and intrusion detection systems generate a high number of alerts reporting the suspicious activity of IP addresses. The majority of alerts are dropped due to their low relevance, low priority, or the sheer number of alerts. We assume that these alerts still contain valuable information, namely, about the coordination of IP addresses. Knowledge of the coordinated IP addresses improves situational awareness and reflects the requirement of security analysts as well as automated reasoning tools to have as much contextual information as possible to make an informed decision. To validate our assumption, we introduce a novel method to discover the groups of coordinated IP addresses that exhibit a temporal correlation of their alerts. We evaluate our method on data from a real sharing platform reporting approximately 1.5 million alerts per day. The results show that our method can indeed discover groups of truly coordinated IP addresses.

Large Scale Analysis of DoH Deployment on the Internet

Authors

García, S.; Bogado Garcia, J.; Hynek, K.; Vekshin, D.; Čejka, T.; Wasicek, A.

Year

2022

Published in

Computer Security - ESORICS 2022. Cham: Springer International Publishing, 2022. p. 145-165. vol. 13556. ISSN 1611-3349. ISBN 978-3-031-17143-7.

Type

Proceedings paper

DOI

10.1007/978-3-031-17143-7_8

Department

Department of Digital Design

Abstract

DNS over HTTPS (DoH) is one of the standards to protect the security and privacy of users. The choice of DoH provider has controversial consequences, from monopolisation of surveillance to lost visibility by network administrators and security providers. More importantly, it is a novel security business. Software products and organisations depend on users choosing well-known and trusted DoH resolvers. However, there is no comprehensive study on the number of DoH resolvers on the Internet, their growth, and the trustworthiness of the organisations behind them. This paper studies the deployment of DoH resolvers by (i) scanning the whole Internet for DoH resolvers in 2021 and 2022; (ii) creating lists of well-known DoH resolvers by the community; (iii) characterising what those resolvers are; and (iv) comparing the growth and differences. Results show that (i) the number of DoH resolvers increased 4.8 times in the period 2021-2022, (ii) the number of organisations providing DoH services has doubled, and (iii) the number of DoH resolvers in 2022 is 28 times larger than the number of well-known DoH resolvers by the community. Moreover, 94% of the public DoH resolvers on the Internet are unknown to the community, 77% use certificates from free services, and 57% belong to unknown organisations or personal servers. We conclude that the number of DoH resolvers is growing at a fast rate and that at least 30% of them are not completely trustworthy, so users should be very careful when choosing a DoH resolver.

Summary of DNS Over HTTPS Abuse

Authors

Hynek, K.; Vekshin, D.; Luxemburk, J.; Čejka, T.; Wasicek, A.

Year

2022

Published in

IEEE Access. 2022, 10(2022), 54668-54680. ISSN 2169-3536.

Type

Article

DOI

10.1109/ACCESS.2022.3175497

Department

Department of Digital Design

Abstract

The Internet Engineering Task Force adopted the DNS over HTTPS protocol in 2018 to remediate privacy issues regarding the plain text transmission of the DNS protocol. According to our observations and the analysis described in this paper, protecting DNS queries using HTTPS entails security threats. This paper surveys DoH-related research works and analyzes malicious and unwanted activities that leverage DNS over HTTPS and can be currently observed in the wild. Additionally, we describe three real-world abuse scenarios observed in the web environment that reveal how service providers intentionally use DNS over HTTPS to violate policies. Last but not least, we identified several research challenges that we consider important for future security research.

Tunneling through DNS over TLS providers

Authors

Melcher, L.; Hynek, K.; Čejka, T.

Year

2022

Published in

Proceedings of 2022 18th International Conference on Network and Service Management (CNSM). New York: IEEE, 2022. p. 359-363. ISSN 2165-9605. ISBN 978-3-903176-51-5.

Type

Invited or awarded proceedings paper

DOI

10.23919/CNSM55787.2022.9964617

Department

Department of Digital Design

Abstract

DNS over TLS (DoT) is one of the approaches for private DNS resolution, which has already gained support by open resolvers. Moreover, DoT is used by default in Android operating systems. This study investigates the possibility of creating DNS covert channels using DoT, which is a security threat that benefits from the increased privacy of encrypted communication. We evaluated the performance and usability of DoT tunnels created via commonly used resolvers. Our results show that the performance characteristics of DoT tunnels differ vastly depending on the used DoT resolver; however, the creation of a DoT tunnel is possible, reaching speeds up to 232 Kbps. Moreover, we successfully transferred data via DoT servers claiming Anti-Virus protection and family-friendly content.

Detection of HTTPS Brute-Force Attacks with Packet-Level Feature Set

Authors

Luxemburk, J.; Hynek, K.; Čejka, T.

Year

2021

Published in

11th Annual Computing and Communication Workshop and Conference (CCWC2021). Piscataway (New Jersey): IEEE, 2021. p. 114-122. ISBN 978-1-6654-1490-6.

Type

Proceedings paper

DOI

10.1109/CCWC51732.2021.9375998

Department

Department of Digital Design

Abstract

This paper presents a novel approach to detect brute-force attacks against web services in high-speed networks. The prevalence of brute-force attacks is so high that service providers, such as ISPs or web-hosting providers, cannot depend on their customers' host-based defenses. Moreover, the rising usage of encryption makes it more difficult to detect attacks on the network level. In our research, we created a dataset, which consists of 1.8 million extended IP flows from a backbone network combined with IP flows generated with three popular open-source brute-forcing tools. We identified a distinctive packet-level feature set and trained a machine-learning classifier with a false positive rate of 10^-4 and a true positive rate (the ratio of discovered attacks) of 0.938. The achieved results surpass the state-of-the-art solutions and show that the developed HTTPS brute-force detection algorithm is viable for production deployment.

Novel HTTPS classifier driven by packet bursts, flows, and machine learning

Authors

Tropková, Z.; Hynek, K.; Čejka, T.

Year

2021

Published in

Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 345-349. ISSN 2165-963X. ISBN 978-3-903176-36-2.

Type

Proceedings paper

DOI

10.23919/CNSM52442.2021.9615561

Department

Department of Digital Design

Abstract

Encryption of network traffic has recently started to cover the remaining readable information, which is heavily used by current monitoring systems; thus, it is time to focus on novel methods of encrypted traffic analysis and classification. The aim of this paper is to define a new network traffic characteristic called Sequence of packet Burst Length and Time (SBLT), which was inspired by existing approaches and definitions. Contrary to other works, SBLT is feasible even for high-speed backbone networks as a part of IP flow data. The advantage of SBLT features is shown using a machine learning classification model for HTTPS traffic types as an example. This paper presents the definition of SBLT, proposes a new annotated public dataset of HTTPS traffic with 5 categories, and evaluates the developed classifier reaching accuracy over 99 %. This classifier can help analysts to deal with a huge amount of encrypted traffic and maintain situational awareness.

Towards Evaluating Quality of Datasets for Network Traffic Domain

Authors

Soukup, D.; Tisovčík, P.; Hynek, K.; Čejka, T.

Year

2021

Published in

Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 264-268. ISSN 2165-963X. ISBN 978-3-903176-36-2.

Type

Proceedings paper

DOI

10.23919/CNSM52442.2021.9615601

Department

Department of Digital Design

Abstract

This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. Naturally, there is a long epoch of research targeted at data quality; however, it is focused mainly on data consistency, validity, precision, and other metrics, which are insufficient for network traffic use-cases. The rise of machine learning usage in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and decide the usability of the already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of the network traffic datasets, and finally, describe a framework for dataset analysis.

Behavior Anomaly Detection in IoT Networks

Authors

Soukup, D.; Čejka, T.; Hynek, K.

Year

2020

Published in

Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2019). Cham: Springer International Publishing, 2020. p. 465-473. Lecture Notes on Data Engineering and Communications Technologies. vol. 49. ISSN 2367-4520. ISBN 978-3-030-43192-1.

Type

Book chapter

DOI

10.1007/978-3-030-43192-1_53

Department

Department of Digital Design

Abstract

Data encryption makes deep packet inspection less suitable nowadays, and the need for analyzing encrypted traffic is growing. Machine learning brings new options to recognize a type of communication despite the heterogeneity of encrypted IoT traffic right at the network edge. We propose the design of a scalable architecture and a method for behavior anomaly detection in IoT networks. The combination of the two existing semi-supervised techniques that we used ensures higher reliability of anomaly detection and improves on the results achieved by a single method. We describe the classification and anomaly detection experiments we conducted, made possible by existing datasets and our own training datasets. The satisfying results presented provide a subject for further work and allow us to elaborate on this idea.

DoH detection: Discovering hidden DNS

Authors

Hynek, K.; Čejka, T.; Vekshin, D.

Year

2020

Published in

Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 14-16. ISBN 978-80-01-06772-7.

Type

Proceedings paper

Department

Department of Digital Design

Abstract

The necessity of securing users’ privacy on the internet has given rise to a new protocol called DNS over HTTPS (DoH). It aims to replace traditional DNS for domain name translation with encryption as a benefit. Unfortunately, the laudable attempt to increase the privacy of users also brings some security threats. Readable information from DNS is one of the most essential data sources in computer security, especially for security forensic analysis. The DNS queries in the network can reveal malicious activity such as the presence of malware, botnet communication, and data exfiltration. Thus, network administrators might want to block encrypted DoH in their network; however, the currently available approaches are based on lists of IP addresses of well-known DoH providers/resolvers. This way of detection can be easily bypassed by using a private or not generally known DoH resolver. Since the presence of DoH communication might also indicate some malicious activity or at least a policy violation, we decided to find a possible way to detect DoH based on the traffic behavior. This research aims to recognize DoH from extended IP flow data by machine learning, regardless of IP addresses.

DoH Insight: Detecting DNS over HTTPS by Machine Learning

Autoři

Vekshin, D.; Hynek, K.; Čejka, T.

Rok

2020

Publikováno

ARES '20: Proceedings of the 15th International Conference on Availability, Reliability and Security. New York: ACM, 2020. p. 1-8. ISBN 978-1-4503-8833-7.

Typ

Stať ve sborníku

DOI

10.1145/3407023.3409192

Pracoviště

Katedra číslicového návrhu

Anotace

Over the past few years, a new protocol, DNS over HTTPS (DoH), has been created to improve users' privacy on the internet. DoH can be used instead of traditional DNS for domain name translation, with encryption as a benefit. This new feature also brings some threats because various security tools depend on readable information from DNS to identify, e.g., malware, botnet communication, and data exfiltration. Therefore, this paper focuses on the possibilities of encrypted traffic analysis, especially on the accurate recognition of DoH. The aim is to evaluate what information (if any) can be gained from HTTPS extended IP flow data using machine learning. We evaluated five popular ML methods to find the best DoH classifiers. The experiments show that the accuracy of DoH recognition is over 99.9 %. Additionally, it is also possible to identify the application that was used for DoH communication, since we have discovered (using created datasets) significant differences in the behavior of Firefox, Chrome, and cloudflared. Our trained classifier can distinguish between DoH clients with 99.9 % accuracy.

Evaluating Bad Hosts Using Adaptive Blacklist Filter

Autoři

Hynek, K.; Čejka, T.; Žádník, M.; Kubátová, H.

Rok

2020

Publikováno

Proceedings of the 9th Mediterranean Conference on Embedded Computing - MECO'2020. Institute of Electrical and Electronics Engineers, Inc., 2020. p. 306-310. ISSN 2637-9511. ISBN 978-1-7281-6949-1.

Typ

Stať ve sborníku

DOI

10.1109/MECO49872.2020.9134244

Pracoviště

Katedra číslicového návrhu

Anotace

Publicly available blacklists are popular tools to capture and spread information about misbehaving entities on the Internet. In some cases, their straightforward utilization leads to many false positives. In this work, we propose a system that combines blacklists with network flow data while introducing automated evaluation techniques to avoid reporting unreliable alerts. The core of the system is formed by an Adaptive Filter together with an Evaluator module. The assessment of the system was performed on data obtained from a national backbone network. The results show the contribution of such a system to the reduction of unreliable alerts.

Pipelined ALU for effective external memory access in FPGA

Autoři

Beneš, T.; Kekely, M.; Hynek, K.; Čejka, T.

Rok

2020

Publikováno

Proceedings of the 23rd Euromicro Conference on Digital Systems Design. Los Alamitos, CA: IEEE Computer Soc., 2020. p. 97-100. ISBN 978-1-7281-9535-3.

Typ

Stať ve sborníku

DOI

10.1109/DSD51259.2020.00026

Pracoviště

Katedra číslicového návrhu

Anotace

External memories in digital designs typically come with high response times. The most common approach to mitigating this latency is adding a caching mechanism to the memory subsystem. This solution might be sufficient in a CPU architecture, where operations can be rescheduled when a cache miss occurs. However, FPGA architectures are usually accelerators with simple functionality, where it is not possible to postpone work. A cache miss often leads to a stall of the whole pipeline or even to data loss. The architecture we present in this paper reduces this problem by aggregating arithmetic operations into the memory subsystem itself. Our architecture reaches a speed of 200 Mp/s (operations carried out). It is designed to be used in systems with link speeds of 100 Gb/s. It outperforms other implementations by a factor of at least 3. An additional benefit of our architecture is a reduction in the number of memory transactions by a factor of two on real-world datasets.

Privacy Illusion: Beware of Unpadded DoH

Autoři

Hynek, K.; Čejka, T.

Rok

2020

Publikováno

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). Montreal: IEEE, 2020. p. 621-628. ISSN 2644-3163. ISBN 978-1-7281-8416-6.

Typ

Stať ve sborníku vyzvaná či oceněná

DOI

10.1109/IEMCON51383.2020.9284864

Pracoviště

Katedra číslicového návrhu

Anotace

DNS over HTTPS (DoH) has been created with ambitions to improve the privacy of users on the internet. Domain names that are being resolved by DoH are transferred via an encrypted channel, ensuring that nobody should be able to read the content. However, even though the communication is encrypted, we show that it still leaks some private information, which can be misused. Therefore, this paper studies the behavior of the DoH protocol implementation in the Firefox and Chrome web browsers, and the level of detail that can be revealed by observing and analyzing packet-level information. The aim of this paper is to evaluate and highlight discovered privacy weaknesses hidden in DoH. Using a trained machine learning classifier, it is possible to infer individual domain names only from the captured encrypted DoH connection. The resulting trained classifier can infer the domain name from encrypted DNS traffic with surprisingly high accuracy: up to 90% on HTTP 1.1 and up to 70% on HTTP 2.

QoD: Ideas about Evaluating Quality of Datasets

Autoři

Soukup, D.; Hynek, K.; Čejka, T.

Rok

2020

Publikováno

Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 8-9. ISBN 978-80-01-06772-7.

Typ

Stať ve sborníku

Pracoviště

Katedra číslicového návrhu

Anotace

The importance of computer networks is rising every year. The reason is that we are connecting more and more devices and applications, and our daily routines depend on connectivity. On the other hand, this offers great potential for attackers. They can hide their activities in a complex network environment and steal valuable data. Without a solid dataset, an evaluation score misrepresents the real score in a production environment, and, therefore, proper datasets have an essential role in the research and development of any ML-based classifier or detector. The main motivation for this paper is to find a way to evaluate the quality of any dataset in order to estimate whether it is good enough for ML experiments. To the best of our knowledge, there are only a few studies focused on quality evaluation of datasets with network traffic. For experiments, we selected datasets for the DNS over HTTPS (DoH) detection and URL classification problems, which are already being worked on. All metrics are calculated at the dataset level. The impact of these metrics is evaluated on a Random Forest (RF) model. We show the results we have discovered in our datasets and ML detection modules. Finally, we discuss possible next steps in this research.

Refined detection of SSH brute-force attackers using machine learning

Autoři

Hynek, K.; Beneš, T.; Čejka, T.; Kubátová, H.

Rok

2020

Publikováno

ICT Systems Security and Privacy Protection. Cham: Springer, 2020. p. 49-63. IFIP Advances in Information and Communication Technology. vol. 580. ISSN 1868-4238. ISBN 978-3-030-58200-5.

Typ

Stať ve sborníku

DOI

10.1007/978-3-030-58201-2_4

Pracoviště

Katedra číslicového návrhu

Anotace

This paper presents a novel approach to detecting SSH brute-force (BF) attacks in high-speed networks. Contrary to host-based approaches, we focus on network traffic analysis to identify attackers. Recent papers describe how to detect BF attacks using pure NetFlow data. However, our evaluation shows a significant number of false-positive (FP) results for the current solution. To overcome the issue of the high FP rate, we propose a machine learning (ML) approach to detection using specially extended IP flows. The contributions of this paper are a new dataset from a real environment, an experimentally selected ML method that performs with high accuracy and a low FP rate, and an architecture of the detection system. The dataset for training was created using extensive evaluation of captured real traffic, manually prepared legitimate SSH traffic with characteristics similar to BF attacks, and, finally, a packet trace with SSH logs from real production servers.

An Example of PCB Reverse Engineering - Reconstruction of Digilent JTAG SMT3 Schematic

Autoři

Bartík, M.; Beneš, T.; Hynek, K.

Rok

2019

Publikováno

The 7th IEEE Workshop on Advances in Information, Electronic and Electrical Engineering. Piscataway (New Jersey): IEEE, 2019. ISBN 978-1-7281-6730-5.

Typ

Stať ve sborníku

DOI

10.1109/AIEEE48629.2019.8977100

Pracoviště

Katedra číslicového návrhu

Anotace

This paper presents a successful reverse engineering process of the Digilent JTAG-SMT3-NC module, revealing the identity of all key components. The reconstruction required deep knowledge of PCB (Printed Circuit Board) design and the manufacturing process, as well as knowledge of the (elementary) function principles and behavior of the examined device. We were able to reveal 80% of the schematic via analysis of publicly available resources such as original high-resolution images and BOM (Bill of Materials) fragments. The remaining 20% was obtained using non-invasive test equipment such as a multimeter and a microscope. The reconstructed schematic has been verified by designing our own PCB implementing the original SMT3 function.

Future approaches to monitoring in high-speed backbone networks

Autoři

Hynek, K.; Beneš, T.; Čejka, T.; Kubátová, H.

Rok

2019

Publikováno

Proceedings of the 7th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2019. p. 27-28. ISBN 978-80-01-06607-2.

Typ

Stať ve sborníku

Pracoviště

Katedra číslicového návrhu

Anotace

Network monitoring features have always been a challenge in high-speed networks. Some of them, like detailed traffic analysis and packet inspection, are not suited or simply not feasible even on modern hardware. The challenges are becoming even greater with the rise of encrypted traffic. This leaves a large opportunity for threat actors to take advantage of. Therefore, it is necessary to develop a new generation of monitoring tools that can deal with the current issues for security purposes. This research aims to improve traffic analysis techniques to handle encrypted traffic, and also to adapt hardware-accelerated monitoring components for processing.

Ultra High Resolution Jitter Measurement Method for Ethernet Based Networks

Autoři

Hynek, K.; Beneš, T.; Bartík, M.; Kubalík, P.

Rok

2019

Publikováno

The 9th IEEE Annual Computing and Communication Workshop and Conference (CCWC). Piscataway: IEEE, 2019. p. 847-851. ISBN 9781728105543.

Typ

Stať ve sborníku

DOI

10.1109/CCWC.2019.8666446

Pracoviště

Katedra číslicového návrhu

Anotace

This document presents a new approach to network jitter measurement and analysis in asynchronous data networks such as Ethernet. The developed monitoring device is capable of analyzing an incoming stream at a speed of 1 Gb/s with a resolution of up to 8 ns. The system architecture supports network speeds of up to 100 Gb/s. The presented architecture can provide several statistical functions, such as measuring network jitter using the Interarrival Histograms method, which provides the mean value as well as the peak-to-peak value. The architecture was implemented and tested on a Xilinx Kintex UltraScale FPGA chip using the Avnet AES-KU040-DB-G development board.
