Ing. Karel Hynek, Ph.D.

Publications

NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification

Authors
Year
2024
Published in
Computer Networks. 2024, 240 1-22. ISSN 1389-1286.
Type
Article
Abstract
Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large ISP networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended by additional features that enable network traffic analysis with high accuracy. These flow extensions are, however, often too large or hard to compute, which then allows only offline analysis or limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed) flow, based on analysing the time series of packet sizes. By thoroughly testing 25 different network traffic classification tasks, we show the broad applicability and high usability of NetTiSA flow. For practical deployment, we also consider the sizes of flows extended by NetTiSA features and evaluate the performance impacts of their computation in the flow exporter. The novel features proved to be computationally inexpensive and showed excellent discriminatory performance. The trained machine learning classifiers with proposed features mostly outperformed the state-of-the-art methods. NetTiSA finally bridges the gap and brings universal, small-sized, and computationally inexpensive features for traffic classification that can be scaled up to extensive monitoring infrastructures, bringing the machine learning traffic classification even to 100 Gbps backbone lines.

Collection of datasets with DNS over HTTPS traffic

Authors
Jeřábek, K.; Hynek, K.; Čejka, T.; Ryšavý, O.
Year
2022
Published in
Data in Brief. 2022, 2022(42), ISSN 2352-3409.
Type
Article
Abstract
Recently, the Internet has adopted the DNS over HTTPS (DoH) resolution mechanism for privacy-aware network applications. As DoH becomes more disseminated, it has also become a network monitoring research topic. For comprehensive evaluation and comparison of developed classifiers, real-world datasets are needed, motivating this contribution. We created a new large-scale collection of datasets consisting of two classes of traffic: i) DoH HTTPS communication and ii) non-DoH HTTPS connections. The DoH traffic is captured for multiple DoH providers and clients to include nuances of various DoH implementations and configurations. The non-DoH HTTPS connections complement the DoH communication aiming to include a wide range of existing network applications. The dataset collection consists of network traffic generated in a controlled environment and traffic captured from a real ISP network. The resulting datasets thus provide real-world network traffic data suitable for evaluating existing classifiers and the development of new methods.

DeCrypto: Finding Cryptocurrency Miners on ISP networks

Year
2022
Published in
Secure IT Systems. Cham: Springer, 2022. p. 139-158. ISSN 0302-9743. ISBN 978-3-031-22294-8.
Type
Conference paper
Abstract
With the rising popularity of cryptocurrencies and the increasing value of the whole industry, people are incentivized to join and earn revenues by cryptomining — using computational resources for cryptocurrency transaction verification. Nevertheless, there is an increasing number of abusive cryptomining cases, and it is reported that “coin miner malware” grew by more than 4000% in 2018. In this work, we analyzed the cryptominer network communication and proposed the DeCrypto system that can detect and report mining on high-speed 100 Gbps backbone Internet lines with millions of users. The detector uses the concept of heterogeneous weak-indication detectors (Machine-Learning-based, domain-based, and payload-based) that work together and create a robust and accurate detector with an extremely low false-positive rate. The detector was implemented and evaluated on a real nationwide high-speed network and proved efficient in a real-world deployment.

Detection of Cryptomining in High-speed Networks

Authors
Year
2022
Published in
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 59-67. ISBN 978-80-01-07015-4.
Type
Conference paper
Abstract
This paper addresses cryptomining from the security perspective with an emphasis on abusive mining. It explores the possibility of detecting cryptominers in high-speed computer networks using a flow-based monitoring approach. Based on the analysis of mining communication, we propose a detection method that can be deployed on high-speed networks. The proposed solution was implemented as a group of NEMEA modules. Moreover, it was deployed and evaluated on the national network CESNET2 operated by CESNET.

Discovering Coordinated Groups of IP Addresses Through Temporal Correlation of Alerts

Authors
Žádník, M.; Wrona, J.; Hynek, K.; Čejka, T.; Husák, M.
Year
2022
Published in
IEEE Access. 2022, 10(2022), 82799-82813. ISSN 2169-3536.
Type
Article
Abstract
Network-based monitoring and intrusion detection systems generate a high number of alerts reporting the suspicious activity of IP addresses. The majority of alerts are dropped due to their low relevance, low priority, or the sheer number of alerts. We assume that these alerts still contain valuable information, namely, about the coordination of IP addresses. Knowledge of the coordinated IP addresses improves situational awareness and reflects the requirement of security analysts as well as automated reasoning tools to have as much contextual information as possible to make an informed decision. To validate our assumption, we introduce a novel method to discover the groups of coordinated IP addresses that exhibit a temporal correlation of their alerts. We evaluate our method on data from a real sharing platform reporting approximately 1.5 million alerts per day. The results show that our method can indeed discover groups of truly coordinated IP addresses.

Large Scale Analysis of DoH Deployment on the Internet

Authors
García, S.; Bogado Garcia, J.; Hynek, K.; Vekshin, D.; Čejka, T.; Wasicek, A.
Year
2022
Published in
Computer Security - ESORICS 2022. Cham: Springer International Publishing, 2022. p. 145-165. ISSN 0302-9743. ISBN 978-3-031-17142-0.
Type
Conference paper
Abstract
DNS over HTTPS (DoH) is one of the standards to protect the security and privacy of users. The choice of DoH provider has controversial consequences, from monopolisation of surveillance to lost visibility by network administrators and security providers. More importantly, it is a novel security business. Software products and organisations depend on users choosing well-known and trusted DoH resolvers. However, there is no comprehensive study on the number of DoH resolvers on the Internet, their growth, and the trustworthiness of the organisations behind them. This paper studies the deployment of DoH resolvers by (i) scanning the whole Internet for DoH resolvers in 2021 and 2022; (ii) creating lists of DoH resolvers that are well known to the community; (iii) characterising what those resolvers are; and (iv) comparing the growth and differences. Results show that (i) the number of DoH resolvers increased 4.8 times in the period 2021-2022, (ii) the number of organisations providing DoH services has doubled, and (iii) the number of DoH resolvers in 2022 is 28 times larger than the number of DoH resolvers well known to the community. Moreover, 94% of the public DoH resolvers on the Internet are unknown to the community, 77% use certificates from free services, and 57% belong to unknown organisations or personal servers. We conclude that the number of DoH resolvers is growing at a fast rate, that at least 30% of them are not completely trustworthy, and that users should be very careful when choosing a DoH resolver.

Summary of DNS Over HTTPS Abuse

Authors
Hynek, K.; Vekshin, D.; Luxemburk, J.; Čejka, T.; Wasicek, A.
Year
2022
Published in
IEEE Access. 2022, 10(2022), 54668-54680. ISSN 2169-3536.
Type
Article
Abstract
The Internet Engineering Task Force adopted the DNS over HTTPS protocol in 2018 to remediate privacy issues regarding the plain text transmission of the DNS protocol. According to our observations and the analysis described in this paper, protecting DNS queries using HTTPS entails security threats. This paper surveys DoH-related research works and analyzes malicious and unwanted activities that leverage DNS over HTTPS and can currently be observed in the wild. Additionally, we describe three real-world abuse scenarios observed in the web environment that reveal how service providers intentionally use DNS over HTTPS to violate policies. Last but not least, we identify several research challenges that we consider important for future security research.

Tunneling through DNS over TLS providers

Authors
Melcher, L.; Hynek, K.; Čejka, T.
Year
2022
Published in
Proceedings of 2022 18th International Conference on Network and Service Management (CNSM). New York: IEEE, 2022. p. 359-363. ISSN 2165-9605. ISBN 978-3-903176-51-5.
Type
Invited or awarded conference paper
Abstract
DNS over TLS (DoT) is one of the approaches for private DNS resolution, which has already gained support from open resolvers. Moreover, DoT is used by default in Android operating systems. This study investigates the possibility of creating DNS covert channels using DoT, which is a security threat that benefits from the increased privacy of encrypted communication. We evaluated the performance and usability of DoT tunnels created via commonly used resolvers. Our results show that the performance characteristics of DoT tunnels differ vastly depending on the DoT resolver used; however, the creation of a DoT tunnel is possible, reaching speeds up to 232 Kbps. Moreover, we successfully transferred data via DoT servers claiming Anti-Virus protection and family-friendly content.

Detection of HTTPS Brute-Force Attacks with Packet-Level Feature Set

Authors
Luxemburk, J.; Hynek, K.; Čejka, T.
Year
2021
Published in
11th Annual Computing and Communication Workshop and Conference (CCWC2021). Piscataway (New Jersey): IEEE, 2021. p. 0115-0123. ISBN 978-0-7381-4394-1.
Type
Conference paper
Abstract
This paper presents a novel approach to detect brute-force attacks against web services in high-speed networks. The prevalence of brute-force attacks is so high that service providers, such as ISPs or web-hosting providers, cannot depend on their customers' host-based defenses. Moreover, the rising usage of encryption makes it more difficult to detect attacks on the network level. In our research, we created a dataset, which consists of 1.8 million extended IP flows from a backbone network combined with IP flows generated with three popular open-source brute-forcing tools. We identified a distinctive packet-level feature set and trained a machine-learning classifier with a false positive rate of 10^-4 and a true positive rate (the ratio of discovered attacks) of 0.938. The achieved results surpass the state-of-the-art solutions and show that the developed HTTPS brute-force detection algorithm is viable for production deployment.

Novel HTTPS classifier driven by packet bursts, flows, and machine learning

Authors
Tropková, Z.; Hynek, K.; Čejka, T.
Year
2021
Published in
Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 345-349. ISSN 2165-963X. ISBN 978-3-903176-36-2.
Type
Conference paper
Abstract
Encryption of network traffic has recently started to cover the remaining readable information, which is heavily used by current monitoring systems; thus, it is time to focus on novel methods of encrypted traffic analysis and classification. The aim of this paper is to define a new network traffic characteristic called Sequence of packet Burst Length and Time (SBLT), which was inspired by existing approaches and definitions. Contrary to other works, SBLT is feasible even for high-speed backbone networks as a part of IP flow data. The advantage of SBLT features is shown using a machine learning classification model for HTTPS traffic types as an example. This paper presents the definition of SBLT, proposes a new annotated public dataset of HTTPS traffic with 5 categories, and evaluates the developed classifier, which reaches an accuracy of over 99 %. This classifier can help analysts deal with a huge amount of encrypted traffic and maintain situational awareness.

Towards Evaluating Quality of Datasets for Network Traffic Domain

Authors
Soukup, D.; Tisovčík, P.; Hynek, K.; Čejka, T.
Year
2021
Published in
Proceedings of the 2021 17th International Conference on Network and Service Management. New York: IEEE, 2021. p. 264-268. ISSN 2165-963X. ISBN 978-3-903176-36-2.
Type
Conference paper
Abstract
This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. Naturally, there is a long history of research targeted at data quality; however, it is focused mainly on data consistency, validity, precision, and other metrics, which are insufficient for network traffic use cases. The rise of machine learning usage in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and to decide on the usability of the already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of network traffic datasets, and finally, describe a framework for dataset analysis.

Behavior Anomaly Detection in IoT Networks

Year
2020
Published in
Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2019). Cham: Springer International Publishing, 2020. p. 465-473. Lecture Notes on Data Engineering and Communications Technologies. vol. 49. ISSN 2367-4520. ISBN 978-3-030-43192-1.
Type
Book chapter
Abstract
Data encryption makes deep packet inspection less suitable nowadays, and the need for analyzing encrypted traffic is growing. Machine learning brings new options to recognize a type of communication despite the heterogeneity of encrypted IoT traffic right at the network edge. We propose the design of a scalable architecture and a method for behavior anomaly detection in IoT networks. The combination of two existing semi-supervised techniques that we used ensures higher reliability of anomaly detection and improves on the results achieved by a single method. We describe the classification and anomaly detection experiments we conducted, which were made possible by existing datasets and our own training datasets. The satisfactory results presented provide a basis for further work and allow us to elaborate on this idea.

DoH detection: Discovering hidden DNS

Authors
Hynek, K.; Čejka, T.; Vekshin, D.
Year
2020
Published in
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 14-16. ISBN 978-80-01-06772-7.
Type
Conference paper
Abstract
The necessity of securing users’ privacy on the internet has given rise to a new protocol called DNS over HTTPS (DoH). It aims to replace traditional DNS for domain name translation with encryption as a benefit. Unfortunately, the laudable attempt to increase the privacy of users also brings some security threats. Readable information from DNS is one of the most essential data sources in computer security, especially for security forensic analysis. The DNS queries in the network can reveal malicious activity such as the presence of malware, botnet communication, and data exfiltration. Thus, network administrators might want to block encrypted DoH in their network; however, the currently available approaches are based on lists of IP addresses of well-known DoH providers/resolvers. This way of detection can be easily bypassed by using a private or not generally known DoH resolver. Since the presence of DoH communication might also indicate some malicious activity or at least a policy violation, we decided to find a possible way to detect DoH based on the traffic behavior. This research aims to recognize DoH from extended IP flow data by machine learning, regardless of IP addresses.

DoH Insight: Detecting DNS over HTTPS by Machine Learning

Authors
Vekshin, D.; Hynek, K.; Čejka, T.
Year
2020
Published in
ARES '20: Proceedings of the 15th International Conference on Availability, Reliability and Security. New York: ACM, 2020. p. 1-8. ISBN 978-1-4503-8833-7.
Type
Conference paper
Abstract
Over the past few years, a new protocol, DNS over HTTPS (DoH), has been created to improve users' privacy on the internet. DoH can be used instead of traditional DNS for domain name translation with encryption as a benefit. This new feature also brings some threats because various security tools depend on readable information from DNS to identify, e.g., malware, botnet communication, and data exfiltration. Therefore, this paper focuses on the possibilities of encrypted traffic analysis, especially on the accurate recognition of DoH. The aim is to evaluate what information (if any) can be gained from HTTPS extended IP flow data using machine learning. We evaluated five popular ML methods to find the best DoH classifiers. The experiments show that the accuracy of DoH recognition is over 99.9 %. Additionally, it is also possible to identify the application that was used for DoH communication, since we have discovered (using created datasets) significant differences in the behavior of Firefox, Chrome, and cloudflared. Our trained classifier can distinguish between DoH clients with 99.9 % accuracy.

Evaluating Bad Hosts Using Adaptive Blacklist Filter

Authors
Year
2020
Published in
Proceedings of the 9th Mediterranean Conference on Embedded Computing - MECO'2020. Institute of Electrical and Electronics Engineers, Inc., 2020. p. 306-310. ISSN 2637-9511. ISBN 978-1-7281-6949-1.
Type
Conference paper
Abstract
Publicly available blacklists are popular tools to capture and spread information about misbehaving entities on the Internet. In some cases, their straight-forward utilization leads to many false positives. In this work, we propose a system that combines blacklists with network flow data while introducing automated evaluation techniques to avoid reporting unreliable alerts. The core of the system is formed by an Adaptive Filter together with an Evaluator module. The assessment of the system was performed on data obtained from a national backbone network. The results show the contribution of such a system to the reduction of unreliable alerts.

Pipelined ALU for effective external memory access in FPGA

Authors
Beneš, T.; Kekely, M.; Hynek, K.; Čejka, T.
Year
2020
Published in
Proceedings of the 23rd Euromicro Conference on Digital Systems Design. Los Alamitos, CA: IEEE Computer Soc., 2020. p. 97-100. ISBN 978-1-7281-9535-3.
Type
Conference paper
Abstract
External memories in digital designs are associated with high response times. The most common approach to mitigate latency is adding a caching mechanism into the memory subsystem. This solution might be sufficient in a CPU architecture, where we can reschedule operations when a cache miss occurs. However, FPGA architectures are usually accelerators with simple functionality, where it is not possible to postpone work. A cache miss often leads to a stall of the whole pipeline or even to data loss. The architecture we present in this paper reduces this problem by aggregating arithmetic operations into the memory subsystem itself. Our architecture reaches a speed of 200 Mp/s (operations carried out). It is designed to be used in systems with link speeds of 100 Gb/s. It outperforms other implementations by a factor of at least 3. The additional benefit of our architecture is reducing the number of memory transactions by a factor of two on real-world datasets.

Privacy Illusion: Beware of Unpadded DoH

Year
2020
Published in
2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). Montreal: IEEE, 2020. p. 621-628. ISSN 2644-3163. ISBN 978-1-7281-8416-6.
Type
Invited or awarded conference paper
Abstract
DNS over HTTPS (DoH) has been created with ambitions to improve the privacy of users on the internet. Domain names that are being resolved by DoH are transferred via an encrypted channel, which ensures that nobody should be able to read the content. However, even though the communication is encrypted, we show that it still leaks some private information, which can be misused. Therefore, this paper studies the behavior of the DoH protocol implementation in the Firefox and Chrome web browsers, and the level of detail that can be revealed by observing and analyzing packet-level information. The aim of this paper is to evaluate and highlight discovered privacy weaknesses hidden in DoH. Using a trained machine learning classifier, it is possible to infer individual domain names only from the captured encrypted DoH connection. The resulting trained classifier can infer a domain name from encrypted DNS traffic with surprisingly high accuracy: up to 90% on HTTP 1.1, and up to 70% on the HTTP 2 protocol.

QoD: Ideas about Evaluating Quality of Datasets

Year
2020
Published in
Proceedings of the 8th Prague Embedded Systems Workshop. Praha: Czech Technical University in Prague, 2020. p. 8-9. ISBN 978-80-01-06772-7.
Type
Conference paper
Abstract
The importance of computer networks is rising every year, because we are connecting more and more devices and applications, and our daily routines depend on connectivity. On the other hand, this creates great potential for attackers, who can hide their activities in a complex network environment and steal valuable data. Without a solid dataset, the evaluation score misrepresents the real score in a production environment; therefore, proper datasets play an essential role in the research and development of any ML-based classifier or detector. The main motivation for this paper is to find a way to evaluate the quality of any dataset in order to estimate whether it is good enough for ML experiments. To the best of our knowledge, there are only a few studies focused on the quality evaluation of datasets with network traffic. For the experiments, we selected datasets for the DNS over HTTPS (DoH) detection and URL classification problems, which are already being studied. All metrics are calculated at the dataset level. The impact of these metrics is evaluated on a Random Forest (RF) model. We show the results we have discovered in our datasets and ML detection modules. Finally, we discuss possible next steps in this research.

Refined detection of SSH brute-force attackers using machine learning

Authors
Year
2020
Published in
ICT Systems Security and Privacy Protection. Cham: Springer, 2020. p. 49-63. IFIP Advances in Information and Communication Technology. vol. 580. ISSN 1868-4238. ISBN 978-3-030-58200-5.
Type
Conference paper
Abstract
This paper presents a novel approach to detect SSH brute-force (BF) attacks in high-speed networks. Contrary to host-based approaches, we focus on network traffic analysis to identify attackers. Recent papers describe how to detect BF attacks using pure NetFlow data. However, our evaluation shows significant false-positive (FP) results of the current solution. To overcome the issue of the high FP rate, we propose a machine learning (ML) approach to detection using specially extended IP flows. The contributions of this paper are a new dataset from a real environment, an experimentally selected ML method that performs with high accuracy and a low FP rate, and an architecture of the detection system. The dataset for training was created using extensive evaluation of captured real traffic, manually prepared legitimate SSH traffic with characteristics similar to BF attacks, and, finally, a packet trace with SSH logs from real production servers.

An Example of PCB Reverse Engineering - Reconstruction of Digilent JTAG SMT3 Schematic

Authors
Bartík, M.; Beneš, T.; Hynek, K.
Year
2019
Published in
The 7th IEEE Workshop on Advances in Information, Electronic and Electrical Engineering. Piscataway (New Jersey): IEEE, 2019. ISBN 978-1-7281-6730-5.
Type
Conference paper
Abstract
This paper presents a successful reverse engineering process of the Digilent JTAG-SMT3-NC module, revealing the identity of all key components. The reconstruction required a deep knowledge of the PCB (Printed Circuit Board) design and manufacturing process and knowledge of the (elementary) function principles and behavior of the examined device. We were able to reveal 80% of the schematic via analysis of publicly available resources such as original high-resolution images and BOM (Bill of Material) fragments. The remaining 20% was obtained using non-invasive test equipment such as a multimeter and a microscope. The reconstructed schematic has been verified by designing our own PCB implementing the original SMT3 function.

Future approaches to monitoring in high-speed backbone networks

Authors
Year
2019
Published in
Proceedings of the 7th Prague Embedded Systems Workshop. Praha: ČVUT FIT, Katedra číslicového návrhu, 2019. p. 27-28. ISBN 978-80-01-06607-2.
Type
Conference paper
Abstract
Network monitoring features have always been a challenge in high-speed networks. Some of them, like detailed traffic analysis and packet inspection, are not suited or simply not feasible even on modern hardware. The challenges are becoming even greater with the rise of encrypted traffic. This leaves a large opportunity for threat actors to take advantage of. Therefore, it is necessary to develop a new generation of monitoring tools that can deal with the current issues for security purposes. This research aims to improve traffic analysis techniques to handle encrypted traffic, and also to adapt hardware-accelerated monitoring components for processing.

Ultra High Resolution Jitter Measurement Method for Ethernet Based Networks

Authors
Hynek, K.; Beneš, T.; Bartík, M.; Kubalík, P.
Year
2019
Published in
The 9th IEEE Annual Computing and Communication Workshop and Conference (CCWC). Piscataway: IEEE, 2019. p. 847-851. ISBN 9781728105543.
Type
Conference paper
Abstract
This document presents a new approach to network jitter measurement and analysis in asynchronous data networks such as Ethernet. The developed monitoring device is capable of analyzing an incoming stream at a speed of 1 Gb/s with a resolution of up to 8 ns. The system architecture supports network speeds up to 100 Gb/s. The presented architecture can provide several statistical functions, such as measuring network jitter using the Interarrival Histograms method, which provides both the mean value and the peak-to-peak value. The architecture was implemented and tested on a Xilinx Kintex UltraScale FPGA chip using the Avnet AES-KU040-DB-G development board.