TCI: A system for distributed network monitoring, troubleshooting and dataset creation
Authors
Soukup, D.; Pešek, J.; Hejcman, L.; Beneš, D.; Čejka, T.
Year
2024
Published in
NOMS 2024-2024 IEEE Network Operations and Management Symposium. Seoul: IEEE CLEO/Pacific Rim, 2024. ISSN 2374-9709. ISBN 979-8-3503-2793-9.
Type
Conference proceedings paper
Department
Abstract
Network traffic monitoring is a complex task that requires a combination of multiple tools and teams. Detected events often must be validated and confirmed, or ongoing detection needs additional detailed data from full packets. All these activities must be automated with respect for data privacy. This is why we propose Traffic Capture Infrastructure (TCI), a single system for network traffic capture, investigation, and dataset creation, even in high-speed provider networks. Our system supports extensive user management features to ensure dataset privacy, system integrity, and unified control over many network probes. This paper presents the architecture, main functions, recommendations, and lessons learnt from full packet monitoring in today's networks. Lastly, we demonstrate the value of this system with several publications that used it to create their underlying datasets and to investigate network traffic.
NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification
Authors
Year
2024
Published in
Computer Networks. 2024, vol. 240, p. 1-22. ISSN 1389-1286.
Type
Article
Department
Abstract
Network traffic monitoring based on IP flows is a standard approach that can be deployed in various network infrastructures, even large ISP networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are commonly extended with additional features that enable network traffic analysis with high accuracy. These flow extensions are, however, often too large or hard to compute, which allows only offline analysis or limits their deployment to smaller networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed) flow, based on analysing the time series of packet sizes. By thoroughly testing 25 different network traffic classification tasks, we show the broad applicability and high usability of NetTiSA flow. For practical deployment, we also consider the sizes of flows extended by NetTiSA features and evaluate the performance impact of their computation in the flow exporter. The novel features proved to be computationally inexpensive and showed excellent discriminatory performance. Machine learning classifiers trained with the proposed features mostly outperformed the state-of-the-art methods. NetTiSA thus bridges the gap and brings universal, small-sized, and computationally inexpensive features for traffic classification that scale to extensive monitoring infrastructures, bringing machine learning traffic classification even to 100 Gbps backbone lines.
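As a rough illustration of the idea of deriving compact features from the time series of packet sizes, the following Python sketch summarises one flow with a handful of statistics. The feature names, the function `packet_size_features`, and the sample flow are assumptions made for this example only; the abstract does not list the actual NetTiSA feature set, and this is not the authors' implementation.

```python
from statistics import fmean, pstdev


def packet_size_features(packets):
    """Summarise one flow given as (timestamp_seconds, size_bytes) tuples.

    Illustrative statistics only; not the NetTiSA feature definition.
    """
    if not packets:
        raise ValueError("flow has no packets")
    sizes = [size for _, size in packets]
    times = [ts for ts, _ in packets]
    return {
        "packets": len(sizes),
        "bytes": sum(sizes),
        "mean_size": fmean(sizes),
        "std_size": pstdev(sizes),  # population standard deviation
        "min_size": min(sizes),
        "max_size": max(sizes),
        "duration": max(times) - min(times),
    }


# Hypothetical flow: two small packets around two full-sized ones.
flow = [(0.00, 60), (0.01, 1500), (0.03, 1500), (0.10, 60)]
features = packet_size_features(flow)
```

Each statistic here needs only a single pass over the packets, which is the kind of property that keeps such features cheap to compute and small to export.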
Look at my Network: An Insight into the ISP Backbone Traffic
Authors
Beneš, T.; Pešek, J.; Čejka, T.
Year
2023
Published in
2023 19th International Conference on Network and Service Management (CNSM). New York: IEEE, 2023. International Conference on Network and Service Management. vol. 19. ISSN 2165-9605. ISBN 978-3-903176-59-1.
Type
Conference proceedings paper
Department
Abstract
High-speed ISP networks pose several challenges that prevent the creation of long-term datasets giving insight into the traffic. Currently, there are no publicly available long-term datasets capturing the entirety of high-speed ISP networks. Such networks are traditionally monitored using IP flows, which provide enough high-level information about the situation in the network and support various use cases, such as the detection of outages or security threats. Even with this type of aggregation, long-term datasets are impractical due to their size. The other problem is that flow monitoring comes with significant aggregation, and common traffic statistics are brief, lack useful details, and require further processing. This paper addresses these problems and presents a new long-term aggregated dataset, a detailed analysis of public network traffic measured on the ISP backbone, and a monitoring architecture composed of open-source tools capable of using an existing flow exporter infrastructure. Such insight into traffic helps to design and develop hardware optimizations, tune the performance of monitoring systems, and adapt security detection algorithms.
Augmenting Monitoring Infrastructure For Dynamic Software-Defined Networks
Authors
Year
2023
Published in
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). New Jersey: IEEE, 2023. ISBN 978-953-290-128-3.
Type
Conference proceedings paper
Abstract
Software-Defined Networking (SDN) and virtual environments raise new challenges for network monitoring tools. The dynamic and flexible nature of these network technologies requires adapting the monitoring infrastructure to overcome challenges in the analysis and interpretability of the monitored network traffic. This paper describes a concept of automatic on-demand deployment of monitoring probes and correlation of network data with infrastructure state and configuration over time. Such an approach to monitoring SDN virtual networks is usable in several use cases, such as IoT networks and anomaly detection. It increases visibility into complex and dynamic networks. Additionally, it can help with the creation of well-annotated datasets that are essential for any further research.
Active Learning Framework For Long-term Network Traffic Classification
Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2023
Published in
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 893-899. ISBN 979-8-3503-3286-5.
Type
Invited or awarded conference proceedings paper
Abstract
Recent network traffic classification methods benefit from machine learning (ML) technology. However, the use of ML brings many challenges, such as a lack of high-quality annotated datasets, data drift and other effects causing datasets and ML models to age, and high volumes of network traffic. This paper presents the benefits of augmenting traditional ML training and deployment workflows and adapting the Active Learning (AL) concept to network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously and automatically evolves a dataset and ML model. Moreover, ALF includes monitoring, dataset quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.
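For readers unfamiliar with the AL loop mentioned in the abstract, a minimal pool-based sketch with uncertainty sampling might look as follows. It is not ALF's API: the function names, the nearest-centroid stand-in classifier, the one-dimensional feature (imagined here as a mean packet size), and the `oracle` labelling function are all hypothetical choices made to keep the example self-contained.

```python
def train_centroids(labeled):
    """Return the mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def margin(centroids, value):
    """Gap between the two nearest centroid distances; small means uncertain.

    Assumes at least two classes are present in the labeled data.
    """
    distances = sorted(abs(value - c) for c in centroids.values())
    return distances[1] - distances[0]


def predict(centroids, value):
    """Assign the class whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def active_learning_loop(labeled, pool, oracle, rounds, batch_size):
    """Retrain, query the most uncertain pool samples, and grow the dataset."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        centroids = train_centroids(labeled)
        pool.sort(key=lambda v: margin(centroids, v))  # most uncertain first
        queried, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((v, oracle(v)) for v in queried)  # annotation step
    return train_centroids(labeled), labeled


# Hypothetical annotator and data: flows at or above 500 bytes are "large".
oracle = lambda v: "large" if v >= 500 else "small"
seed = [(60, "small"), (1500, "large")]
pool = [100, 400, 700, 750, 900, 1200, 1400]
model, dataset = active_learning_loop(seed, pool, oracle, rounds=2, batch_size=2)
```

In this toy run the loop spends its labelling budget on samples near the current decision boundary rather than on samples the model already classifies with a wide margin, which is the core motivation for active learning when annotation is expensive.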
Vision of Active Learning Framework Approach to Network Traffic Analysis Research
Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2022
Published in
Proceedings of the 10th Prague Embedded Systems Workshop. Praha: CTU. Faculty of Information Technology, 2022. p. 68-72. ISBN 978-80-01-07015-4.
Type
Conference proceedings paper
Abstract
Current research in the network security domain intensively uses machine learning (ML) and artificial intelligence to automate processes and reveal hidden patterns in data. These technologies, however, require large amounts of training data, ideally of high quality. Additionally, network infrastructures continuously evolve, and thus network traffic changes dynamically over time as well. There is an urgent need to adapt machine learning models, update datasets with the latest samples of annotated network traffic, and retrain the models regularly to sustain adequate performance. Active Learning Framework (ALF) directly targets these demands and aims to provide a modular platform for scientific experiments and practical deployment, as well as to support research on dataset quality. This paper describes the ALF software in particular and proposes possible use cases in research and practice.