CESNET and FIT CTU release the largest dataset yet for threat detection and network traffic prediction

A new dataset from a real academic environment contains over 800,000 time series. It enables advanced cybersecurity research and AI model testing.

A research team from the Administration and Security Tools Department of the CESNET association and the Faculty of Information Technology at the Czech Technical University in Prague (FIT CTU) has released the largest dataset of its kind to date. The new dataset contains over 800,000 time series capturing anonymized network traffic from a real academic network, ranging from personal computers, servers, and routers to the network activity of entire institutions. This makes it the most realistic and comprehensive publicly available dataset for research in network traffic prediction, anomaly detection, and AI-based network management.

We encounter anomaly detection in everyday life more often than we realize: a suspicious payment from another country, an unusual amount flagged by a banking system, deviations in health data tracked by a smartwatch, or sudden changes in online shopping behavior that may indicate account misuse. All of these cases come down to detecting anomalies, that is, situations that deviate from normal behavior and may indicate risk. The same principle applies in cybersecurity, where anomalies in network traffic often signal threats, errors, or critical changes in device behavior.

In network management and security, anomaly detection plays a key role. Modern attacks on infrastructure, such as distributed denial-of-service (DDoS) attacks, malware spread, or exploitation of compromised devices, often hide in regular traffic and evade traditional detection rules.

"By detecting anomalies, it is possible to identify previously unknown threats that manifest as changes in device network behavior," explains Josef Koumar, the main author of the dataset. "Anomalies can also indicate network configuration errors, device overloads, or other operational issues," he adds.

Timely and accurate detection of deviations is therefore crucial for the resilience and reliability of digital infrastructure.
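
To make the idea of a deviation from normal behavior concrete, the following minimal Python sketch flags points in a traffic time series that lie far from the recent average. It is an illustration only, not the method used by the authors or by any CESNET tool: the function name, the window of 12 points, the threshold of 3 standard deviations, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=12, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical bytes-per-interval values for one device, with a burst at index 20.
traffic = [100, 104, 98, 101, 97, 103, 99, 102, 100, 96, 105, 101,
           99, 103, 98, 100, 102, 97, 101, 100, 950, 102, 99, 101]

print(rolling_zscore_anomalies(traffic))  # prints [20]
```

Only the burst is reported. The points right after it are not flagged, because the burst inflates the standard deviation of the following windows. This is one reason simple rules of this kind struggle on real traffic, and why realistic data matters for evaluating detectors.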

The research team from CESNET and FIT CTU – Josef Koumar, Karel Hynek, Tomáš Čejka, and Pavel Šiška – published the dataset in Scientific Data, a peer-reviewed journal from Nature Portfolio. It contains more than 800,000 time series generated by aggregating real, anonymized network traffic from devices, networks, and institutions on the backbone links of the national academic network CESNET.
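
As a rough sketch of what aggregating traffic into time series can look like, the Python example below sums flow records into fixed time bins per device. The record layout (timestamp in seconds, device identifier, byte count), the 600-second interval, and the function name are assumptions made for illustration; the actual pipeline, metrics, and aggregation intervals of the dataset are described in the authors' publication.

```python
from collections import defaultdict


def aggregate_flows(flows, interval=600):
    """Aggregate (timestamp, device, n_bytes) records into per-device
    time series of (bin_start, n_flows, n_bytes) tuples."""
    bins = defaultdict(lambda: [0, 0])  # (device, bin_start) -> [n_flows, n_bytes]
    for timestamp, device, n_bytes in flows:
        # Align the timestamp to the start of its interval.
        bin_start = timestamp - timestamp % interval
        entry = bins[(device, bin_start)]
        entry[0] += 1
        entry[1] += n_bytes

    series = defaultdict(list)
    # Sorting the keys orders each device's bins chronologically.
    for device, bin_start in sorted(bins):
        n_flows, total_bytes = bins[(device, bin_start)]
        series[device].append((bin_start, n_flows, total_bytes))
    return dict(series)


# Hypothetical flow records with anonymized device identifiers.
flows = [
    (0, "dev-a", 500),
    (120, "dev-a", 1500),
    (610, "dev-a", 200),
    (30, "dev-b", 900),
    (1250, "dev-b", 100),
]

result = aggregate_flows(flows)
print(result["dev-a"])  # prints [(0, 2, 2000), (600, 1, 200)]
print(result["dev-b"])  # prints [(0, 1, 900), (1200, 1, 100)]
```

Intervals with no traffic are simply absent from the output here. A real pipeline would need to decide whether to fill such gaps with zeros or leave them missing.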

Unlike the artificially created laboratory datasets commonly used by the scientific community until now, this dataset captures extensive and diverse traffic from real computer networks. It is an unprecedented achievement that significantly advances research capabilities in cybersecurity and network management. It enables the development of highly accurate AI models for anomaly detection and, importantly, their comprehensive and robust testing under real conditions with diverse traffic. This greatly increases the reliability of detection results, for example when detecting DDoS attacks or the suspicious behavior of infected devices.

The contribution is further enhanced by the release of the open-source CESNET TS-Zoo library, which facilitates working with the dataset and allows easy sharing of methodologies through benchmarks. The combination of a realistic dataset and an open-source tool contributes to greater transparency of methods and better reproducibility of experiments, resulting in higher-quality, verifiable results across the research ecosystem.

"Our goal was to provide the community with a realistic dataset for developing and testing algorithms that can protect networks even when most traffic is encrypted. The dataset opens the way for better detection of unknown threats because it is based on a real and complex environment. This makes the results obtained from this dataset more reliable than those from existing datasets. We believe it will contribute to the development of safer and smarter infrastructure not only in academia," comments Josef Koumar, the main author of the dataset.

The person responsible for the content of this page: Bc. Veronika Dvořáková