A parallel algorithm for approximating the silhouette using a ball tree

Autoři
Šimeček, I.; Kozický, C.
Rok
2023
Publikováno
Journal of Parallel and Distributed Computing. 2023, 173 115-125. ISSN 0743-7315.
Typ
Článek
Anotace
Clustering is widely used in many scientific fields. The contribution of enumerating the value of the silhouette is twofold: firstly it can help choosing a suitable cluster count and secondly it can be used to evaluate the quality of a clustering. Enumerating the silhouette exactly is an extremely time-consuming task, especially in big data applications; it is therefore common to approximate its value. This article presents an efficient shared-memory parallel algorithm for approximating the silhouette, which uses a ball tree. The process of initialising the ball tree and enumerating the silhouette are fully parallelised using the OpenMP API. The results of our experiments show that the proposed parallel algorithm substantially increases the speed of computing the silhouette whilst retaining necessary precision for real-world applications.

Active Learning Framework For Long-term Network Traffic Classification

Autoři
Pešek, J.; Soukup, D.; Čejka, T.
Rok
2023
Publikováno
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 893-899. ISBN 979-8-3503-3286-5.
Typ
Stať ve sborníku vyzvaná či oceněná
Anotace
Recent network traffic classification methods benefit from machine learning (ML) technology. However, there are many challenges due to the use of ML, such as lack of high-quality annotated datasets, data drifts and other effects causing aging of datasets and ML models, high volumes of network traffic, etc. This paper presents the benefits of augmenting traditional workflows of ML training&deployment and adaption of the Active Learning (AL) concept on network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously evolves a dataset and ML model automatically. Moreover, ALF includes monitoring, datasets quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.

Augmenting Monitoring Infrastructure For Dynamic Software-Defined Networks

Autoři
Pešek, J.; Plný, R.; Koumar, J.; Jeřábek, K.; Čejka, T.
Rok
2023
Publikováno
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE Xplore, 2023. p. 1-4.
Typ
Stať ve sborníku
Anotace
Software-Defined Networking (SDN) and virtual environment raise new challenges for network monitoring tools. The dynamic and flexible nature of these network technologies requires adaptation of monitoring infrastructure to overcome challenges of analysis and interpretability of the monitored network traffic. This paper describes a concept of automatic on-demand deployment of monitoring probes and correlation of network data with infrastructure state and configuration in time. Such an approach to monitoring SDN virtual networks is usable in several use cases, such as IoT networks and anomaly detection. It increases visibility into complex and dynamic networks. Additionally, it can help with the creation of well-annotated datasets that are essential for any further research.

Backward Pattern Matching on Elastic-Degenerate Strings

Autoři
Procházka, P.; Cvacho, O.; Krčál, L.; Holub, J.
Rok
2023
Publikováno
SN Computer Science. 2023, 4(5), ISSN 2662-995X.
Typ
Článek
Anotace
Recently, the concept of Elastic Degenerate Strings (EDS) was introduced as a way of representing a sequenced population of the same species. Several on-line Elastic Degenerate String Matching (EDSM) algorithms were presented so far. Some of them provide practical implementation. We propose a new on-line EDSM algorithm BNDM-EDS. Our algorithm combines two traditional algorithms BNDM and the Shift-And that were adapted to the specifics needed by Elastic Degenerate Strings. BNDM-EDS is running in O (Nmd w m e) worst-case time. This implies O (Nm) time for small patterns, where m is the length of the searched pattern, N is the size of EDS, and w is the size of the computer word. The algorithm uses O (N + n) space, where n is the length of EDS. BNDM-EDS requires a simple preprocessing step with time and space O(m). Experimental results on real genomic data show superiority of BNDM-EDS over state-of-the-art algorithms.

Binary Codes that do not Preserve Primitivity

Autoři
Holub, Š.; Raška, M.; Starosta, Š.
Rok
2023
Publikováno
Journal of Automated Reasoning. 2023, 67(3), ISSN 0168-7433.
Typ
Článek
Anotace
A set of words X is not primitivity-preserving if there is a primitive list of length at least two of elements from X whose concatenation is imprimitive. Here, a word or list is primitive if it is not equal to a concatenation of several copies of a shorter word or list, and imprimitive otherwise. We formalize a full characterization of such two-element sets {x, y} in the proof assistant Isabelle/HOL. The formalization is based on an existing proof which we analyze and simplify using some innovative ideas. Part of the formalization, interesting on its own, is a description of the ways in which the square xx can appear inside a concatenation of words x and y if |y| = |x|. We also provide a formalized parametric solution of the related equation x(j)y(k) = z(l).

Bribery Can Get Harder in Structured Multiwinner Approval Election

Autoři
Kusek, B.; Bredereck, R.; Faliszewski, P.; Kaczmarczyk, A.; Knop, D.
Rok
2023
Publikováno
Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023. County of Richland: IFAAMAS, 2023. p. 1725-1733. ISBN 978-1-4503-9432-1.
Typ
Stať ve sborníku
Anotace
We study the complexity of constructive bribery in the context of structured multiwinner approval elections. Given such an election, we ask whether a certain candidate can join the winning committee by adding, deleting, or swapping approvals, where each such action comes at a cost and we are limited by a budget. We assume our elections to either have the candidate interval or the voter interval property, and we require the property to hold also after the bribery. While structured elections usually make manipulative attacks significantly easier, our work also shows examples of the opposite behavior. We conclude by presenting preliminary insights regarding the destructive variant of our problem.

CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines

Autoři
Luxemburk, J.; Hynek, K.; Čejka, T.; Lukačovič, A.; Šiška, P.
Rok
2023
Publikováno
Data in Brief. 2023, 2023(46), ISSN 2352-3409.
Typ
Článek
Anotace
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Combining Generators of Adversarial Malware Examples to Increase Evasion Rate

Autoři
Kozák, M.; Jureček, M.
Rok
2023
Publikováno
Proceedings of the 20th International Conference on Security and Cryptography. Porto: SciTePress - Science and Technology Publications, 2023. p. 778-786. ISSN 2184-7711. ISBN 978-989-758-666-8.
Typ
Stať ve sborníku
Anotace
Antivirus developers are increasingly embracing machine learning as a key component of malware defense. While machine learning achieves cutting-edge outcomes in many fields, it also has weaknesses that are exploited by several adversarial attack techniques. Many authors have presented both white-box and black-box generators of adversarial malware examples capable of bypassing malware detectors with varying success. We propose to combine contemporary generators in order to increase their potential. Combining different generators can create more sophisticated adversarial examples that are more likely to evade anti-malware tools. We demonstrated this technique on five well-known generators and recorded promising results. The best-performing combination of AMG-random and MAB-Malware generators achieved an average evasion rate of 15.9% against top-tier antivirus products. This represents an average improvement of more than 36% and 627% over using only the AMG-random and MAB-Malware generators, respectively. The generator that benefited the most from having another generator follow its procedure was the FGSM injection attack, which improved the evasion rate on average between 91.97% and 1,304.73%, depending on the second generator used. These results demonstrate that combining different generators can significantly improve their effectiveness against leading antivirus programs.

Constant factor approximation for tracking paths and fault tolerant feedback vertex set

Autoři
Blažej, V.; Choudhary, P.; Knop, D.; Křišťan, J.; Suchý, O.; Valla, T.
Rok
2023
Publikováno
Discrete Optimization. 2023, 47 ISSN 1572-5286.
Typ
Článek
Anotace
Consider a vertex-weighted graph G with a source s and a target t. TRACKING PATHS requires finding a minimum weight set of vertices (trackers) such that the sequence of trackers in each path from s to t is unique. In this work, we derive a factor 6-approximation algorithm for TRACKING PATHS in weighted graphs and a factor 4-approximation algorithm if the input is unweighted. This is the first constant factor approximation for this problem. While doing so, we also study approximation of the closely related r-FAULT TOLERANT FEEDBACK VERTEX SET problem. There, for a fixed integer r and a given vertex-weighted graph G, the task is to find a minimum weight set of vertices intersecting every cycle of G in at least r+1 vertices. We give a factor O(r) approximation algorithm for r-FAULT TOLERANT FEEDBACK VERTEX SET if r is a constant.

Cultural Diversity and Its Impact on Governance

Autoři
Evan, T.; Holý, V.
Rok
2023
Publikováno
International Journal of Intercultural Relations. 2023, ISSN 0147-1767.
Typ
Článek
Anotace
Hofstede’s six cultural dimensions make it possible to measure the culture of countries but are criticized for assuming the homogeneity of each country. In this paper, we propose two measures based on Hofstede’s cultural dimensions which take into account the heterogeneous structure of citizens with respect to their countries of origin. Using these improved measures, we study the influence of heterogeneous culture and cultural diversity on the quality of institutions measured by the six worldwide governance indicators. We use a linear regression model allowing for dependence in spatial and temporal dimensions as well as the high correlation between the governance indicators. Our results show that the effect of cultural diversity improves some of the governance indicators while worsening others depending on the individual Hofstede cultural dimension.

Design of a modern fast Fourier transform and cache effective bit-reversal algorithm

Autoři
Šimeček, I.; Simek, A.
Rok
2023
Publikováno
International Journal of Parallel, Emergent and Distributed Systems. 2023, 38(3), 229-248. ISSN 1744-5760.
Typ
Článek
Anotace
This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley-Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.

Domain Dependent Parameter Setting in SAT Solver Using Machine Learning Techniques

Autoři
Beskyd, F.; Surynek, P.
Rok
2023
Publikováno
Agents and Artificial Intelligence 14th International Conference, ICAART 2022 Virtual Event, February 3–5, 2022 Revised Selected Papers. Cham: Springer International Publishing AG, 2023. p. 169-200. ISSN 0302-9743. ISBN 978-3-031-22952-7.
Typ
Stať ve sborníku
Anotace
We address the problem of variable and truth-value choice in modern search-based Boolean satisfiability (SAT) solvers depending on the problem domain. The SAT problem is the task to determine truth-value assignment for variables of a given Boolean formula under which the formula evaluates to true. The SAT problem is often used as a canonical representation of combinatorial problems in many domains of computer science ranging from artificial intelligence to software engineering. Modern complete search-based SAT solvers represent a universal problem solving tool which often provide higher efficiency than ad-hoc direct solving approaches. Many efficient variable and truth-value selection heuristics were devised. Heuristics can usually be fine tuned by single or multiple numerical parameters prior to executing the search process over the concrete SAT instance. In this paper we present a machine learning approach that predicts the parameters of heuristic from the underlying structure of a graph derived from the input SAT instance. Using this approach we effectively fine tune the SAT solver for specific problem domain.

Establishing Herd Immunity is Hard Even in Simple Geometric Networks

Autoři
Dvořák, M.; Knop, D.; Schierreich, Š.
Rok
2023
Publikováno
Proceedings of the 18th Workshop on Algorithms and Models for the Web Graph. Cham: Springer, 2023. p. 68-82. Lecture Notes in Computer Science. vol. 13894. ISSN 0302-9743. ISBN 978-3-031-32295-2.
Typ
Stať ve sborníku
Anotace
We study the following model of disease spread in a social network. In the beginning, all individuals are either ``infected'' or ``healthy''. Next, in discrete rounds, the disease spreads in the network from infected to healthy individuals such that a healthy individual gets infected if and only if a sufficient number of its direct neighbours are already infected. We represent the social network as a graph. Inspired by the real-world restrictions in the current epidemic, especially by social and physical distancing requirements, we restrict ourselves to networks that can be represented as geometric intersection graphs. We show that finding a minimal vertex set of initially infected individuals to spread the disease in the whole network is computationally hard, already on unit disk graphs. Hence, to provide some algorithmic results, we focus ourselves on simpler geometric graph families, such as interval graphs and grid graphs.

Evaluation of passive OS fingerprinting methods using TCP/IP fields

Autoři
Hulák, M.; Bartoš, V.; Čejka, T.
Rok
2023
Publikováno
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE Xplore, 2023.
Typ
Stať ve sborníku
Anotace
An important part of network management is to keep knowledge about the connected devices. One of the tools that can provide such information in real-time is passive OS fingerprinting, in particular the method based on analyzing values of specific TCP/IP headers. The state-of-the-art approach is to use machine learning to create such OS classifier. In this paper, we focus on the evaluation of this approach from several perspectives. We took two existing public datasets and created a new one from our network and trained machine learning models to classify the 4 most common operation system families based on selected TCP/IP fields. We compare different models, discuss the need to round TTL values to avoid over-fitting, and test the transferability of models trained on data from different networks. Although TCP/IP-related characteristics of individual operating systems should be independent on where the device is located, our experiments show that a model trained in one network performs much worse in another one, making model creation and deployment more difficult in practice. A good solution may be to combine data from multiple networks. A model trained on a combination of all three datasets exhibited the best results on average across the datasets.

Evaluation of the Limit of Detection in Network Dataset Quality Assessment with PerQoDA

Autoři
Wasielewska, K.; Soukup, D.; Čejka, T.; Camacho, J.
Rok
2023
Publikováno
ECML PKDD 2022: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Cham: Springer, 2023. p. 170-185. ISSN 1865-0929. ISBN 978-3-031-23632-7.
Typ
Stať ve sborníku
Anotace
Machine learning is recognised as a relevant approach to detect attacks and other anomalies in network traffic. However, there are still no suitable network datasets that would enable effective detection. On the other hand, the preparation of a network dataset is not easy due to privacy reasons but also due to the lack of tools for assessing their quality. In a previous paper, we proposed a new method for data quality assessment based on permutation testing. This paper presents a parallel study on the limits of detection of such an approach. We focus on the problem of network flow classification and use well-known machine learning techniques. The experiments were performed using publicly available network datasets.

Evaluation of the Medium-sized Neural Network using Approximative Computations on Zynq FPGA

Rok
2023
Publikováno
Proceedings of 2023 12th Mediterranean Conference on Embedded Computing (MECO). Piscataway: IEEE, 2023. p. 1-4. ISSN 2637-9511. ISBN 979-8-3503-2291-0.
Typ
Stať ve sborníku
Anotace
Integrating artificial intelligence technologies into embedded systems requires efficient implementation of neural networks in hardware. The paper presents a Zynq 7020 FPGA implementation and evaluation of a middle-sized dense neural network based on approximate computation by linearly approximated functions. Three famous benchmarks were used for classification accuracy evaluation and hardware testing. We use our highly pipelined neural hardware architecture that takes weights from block RAMs to save logic resources and enables their update from the processing system. The architecture reaches excellent design scalability, allowing us to estimate the number of neurons implemented in programmable logic based on single-neuron resources. We reached nearly full chip utilization while preserving the high clock frequency for the FPGA used.

Fine-grained view on bribery for group identification

Autoři
Boehmer, N.; Bredereck, R.; Knop, D.; Luo, J.
Rok
2023
Publikováno
Autonomous Agents and Multi-Agent Systems. 2023, 37(1), ISSN 1387-2532.
Typ
Článek
Anotace
Given a set of agents qualifying or disqualifying each other, group identification is the task of identifying a socially qualified subgroup of agents. Social qualification depends on the specific rule used to aggregate individual qualifications . The classical bribery problem in this context asks how many agents need to change their qualifications in order to change the outcome in a certain way. Complementing previous results showing polynomial-time solvability or NP-hardness of bribery for various social rules in the constructive (aiming at making specific agents socially qualified) or destructive (aiming at making specific agents socially disqualified) setting, we provide a comprehensive picture of the parameterized computational complexity landscape. Conceptually, we also consider a more fine-grained concept of bribery cost, where we ask how many single qualifications need to be changed, nonunit prices for different bribery actions, and a more general bribery goal that combines the constructive and destructive setting.

Flexing infinite frameworks with applications to braced Penrose tilings

Autoři
Dewar, S.; Legerský, J.
Rok
2023
Publikováno
Discrete Applied Mathematics. 2023, 324 1-17. ISSN 0166-218X.
Typ
Článek
Anotace
A planar framework – a graph together with a map of its vertices to the plane – is flexible if it allows a continuous deformation preserving the distances between adjacent vertices. Extending a recent previous result, we prove that a connected graph with a countable vertex set can be realized as a flexible framework if and only if it has a so-called NAC-coloring. The tools developed to prove this result are then applied to frameworks where every 4-cycle is a parallelogram, and countably infinite graphs with n-fold rotational symmetry. With this, we determine a simple combinatorial characterization that determines whether the 1-skeleton of a Penrose rhombus tiling with a given set of braced rhombi will have a flexible motion, and also whether the motion will preserve 5-fold rotational symmetry.

Inexact tree pattern matching with 1-degree edit distance using finite automata

Rok
2023
Publikováno
Discrete Applied Mathematics. 2023, 330 78-97. ISSN 0166-218X.
Typ
Článek
Anotace
Given an input tree and a tree pattern, the inexact tree pattern matching problem is about finding all subtrees in the input tree that match the tree pattern with up to errors. To measure the number of errors between two labeled ordered trees, we use the 1-degree edit distance that uses operations node relabel, leaf insert, and leaf delete. The proposed solution is based on a finite automaton that reads a given input tree represented in linear, prefix bar, notation. First, we show its construction for a constrained variant of 1-degree edit distance where leaf insert/delete operations cannot be applied to the tree pattern recursively. Then, we extend this approach to both unit cost and non-unit cost 1-degree edit distance. The deterministic version of the proposed finite automaton finds all inexact occurrences of the tree pattern in time linear to the size of the input tree. However, since the size of such automaton can be exponential in the number of nodes of the tree pattern, it is practical only for small patterns. Therefore, we also consider the dynamic programming approach as a way of simulating the nondeterministic finite automaton. This approach comes with the space complexity and time complexity where is the number of nodes of the tree pattern, is the number of nodes of the input tree, and is the maximum number of errors allowed.

Maximizing Influence Spread through a Dynamic Social Network (Student Abstract)

Rok
2023
Publikováno
Proceedings of the 37th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2023. p. 16316-16317. ISSN 2159-5399.
Typ
Stať ve sborníku
Anotace
Modern social networks are dynamic in their nature; new connections are appearing and old connections are disappearing all the time. However, in our algorithmic and complexity studies, we usually model social networks as static graphs. In this paper, we propose a new paradigm for the study of the well-known Target Set Selection problem, which is a fundamental problem in viral marketing and the spread of opinion through social networks. In particular, we use temporal graphs to capture the dynamic nature of social networks. We show that the temporal interpretation is, unsurprisingly, NP-complete in general. Then, we study computational complexity of this problem for multiple restrictions of both the threshold function and the underlying graph structure and provide multiple hardness lower-bounds.

Filtr

Za obsah stránky zodpovídá: doc. Ing. Štěpán Starosta, Ph.D.