A parallel algorithm for approximating the silhouette using a ball tree
Authors
Year
2023
Published
Journal of Parallel and Distributed Computing. 2023, 173 115-125. ISSN 0743-7315.
Type
Article
Departments
Annotation
Clustering is widely used in many scientific fields. The contribution of enumerating the value of the silhouette is twofold: firstly it can help choosing a suitable cluster count and secondly it can be used to evaluate the quality of a clustering. Enumerating the silhouette exactly is an extremely time-consuming task, especially in big data applications; it is therefore common to approximate its value. This article presents an efficient shared-memory parallel algorithm for approximating the silhouette, which uses a ball tree. The process of initialising the ball tree and enumerating the silhouette are fully parallelised using the OpenMP API. The results of our experiments show that the proposed parallel algorithm substantially increases the speed of computing the silhouette whilst retaining necessary precision for real-world applications.
Active Learning Framework For Long-term Network Traffic Classification
Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2023
Published
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 0893-0899. ISBN 979-8-3503-3286-5.
Type
Invited/Awarded proceedings paper
Annotation
Recent network traffic classification methods benefit from machine learning (ML) technology. However, there are many challenges due to the use of ML, such as lack of high-quality annotated datasets, data drifts and other effects causing aging of datasets and ML models, high volumes of network traffic, etc. This paper presents the benefits of augmenting traditional workflows of ML training&deployment and adaption of the Active Learning (AL) concept on network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously evolves a dataset and ML model automatically. Moreover, ALF includes monitoring, datasets quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.
Adapting the Size of Artificial Neural Networks Using Dynamic Auto-Sizing
Authors
Cahlík, V.; Kordík, P.; Čepek, M.
Year
2023
Published
IEEE 17th International Conference on Computer Science and Information Technologies. Dortmund: IEEE, 2023. p. 592-596. ISBN 979-8-3503-3431-9.
Type
Proceedings paper
Departments
Annotation
We introduce dynamic auto-sizing, a novel approach to training artificial neural networks which allows the models to automatically adapt their size to the problem domain. The size of the models can be further controlled during the learning process by modifying the applied strength of regularization. The ability of dynamic auto-sizing models to expand or shrink their hidden layers is achieved by periodically growing and pruning entire units such as neurons or filters. For this purpose, we introduce weighted L1 regularization, a novel regularization method for inducing structured sparsity. Besides analyzing the behavior of dynamic auto-sizing, we evaluate predictive performance of models trained using the method and show that such models can provide a predictive advantage over traditional approaches.
CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines
Authors
Luxemburk, J.; Hynek, K.; Čejka, T.; Lukačovič, A.; Šiška, P.
Year
2023
Published
Data in Brief. 2023, 2023(46), ISSN 2352-3409.
Type
Article
Departments
Annotation
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.
Consistency check of automatic pipeline measurements of quasar redshifts with Bayesian convolutional networks
Authors
Year
2023
Published
Astronomical Data Analysis Software and Systems XXXII. San Francisco: Astronomical Society of the Pacific, 2023.
Type
Proceedings paper
Departments
Annotation
Spectroscopic redshifts of quasars are important inputs for constructing many cosmological models. Redshift measurement is generally considered to be a straightforward task performed by automatic pipelines based on template matching.
Due to the millions of spectra delivered by surveys of SDSS or LAMOST telescopes, it is impossible to verify all redshift measurements of automatic pipelines by a human visual inspection. However, the pipeline results are still taken as the "ground truth" for further statistical inferences.
Nevertheless, because of the similarity of patterns of quasar emission lines in different spectral ranges, an optimal match may be found for a completely different template position, causing severe errors in the measured redshift. For example, it may easily happen that a faint emission star with a noisy spectrum is identified as a high redshift quasar and vice versa.
We show such examples discovered by the consistency check of redshift measurements of the SDSS pipeline and redshift predictions of a regression Bayesian convolutional network. The network is trained on a large amount of human-inspected redshifts and predicts redshifts together with their predictive uncertainties. Therefore, it can also identify cases where predictions are uncertain and thus require human visual inspection.
Constant factor approximation for tracking paths and fault tolerant feedback vertex set
Authors
Year
2023
Published
Discrete Optimization. 2023, 47 ISSN 1572-5286.
Type
Article
Departments
Annotation
Consider a vertex-weighted graph G with a source s and a target t. TRACKING PATHS requires finding a minimum weight set of vertices (trackers) such that the sequence of trackers in each path from s to t is unique. In this work, we derive a factor 6-approximation algorithm for TRACKING PATHS in weighted graphs and a factor 4-approximation algorithm if the input is unweighted. This is the first constant factor approximation for this problem. While doing so, we also study approximation of the closely related r-FAULT TOLERANT FEEDBACK VERTEX SET problem. There, for a fixed integer r and a given vertex-weighted graph G, the task is to find a minimum weight set of vertices intersecting every cycle of G in at least r+1 vertices. We give a factor O(r) approximation algorithm for r-FAULT TOLERANT FEEDBACK VERTEX SET if r is a constant.
Design of a modern fast Fourier transform and cache effective bit-reversal algorithm
Authors
Šimeček, I.; Simek, A.
Year
2023
Published
International Journal of Parallel, Emergent and Distributed Systems. 2023, 38(3), 229-248. ISSN 1744-5760.
Type
Article
Departments
Annotation
This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley-Tukey versions with power-of-two radixes.
Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading.
Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation.
Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.
Domain Dependent Parameter Setting in SAT Solver Using Machine Learning Techniques
Authors
Year
2023
Published
Agents and Artificial Intelligence 14th International Conference, ICAART 2022 Virtual Event, February 3–5, 2022 Revised Selected Papers. Cham: Springer International Publishing AG, 2023. p. 169-200. ISSN 0302-9743. ISBN 978-3-031-22952-7.
Type
Proceedings paper
Departments
Annotation
We address the problem of variable and truth-value choice in modern search-based Boolean satisfiability (SAT) solvers depending on the problem
domain. The SAT problem is the task to determine truth-value assignment for
variables of a given Boolean formula under which the formula evaluates to true.
The SAT problem is often used as a canonical representation of combinatorial
problems in many domains of computer science ranging from artificial intelligence to software engineering. Modern complete search-based SAT solvers represent a universal problem solving tool which often provide higher efficiency
than ad-hoc direct solving approaches. Many efficient variable and truth-value
selection heuristics were devised. Heuristics can usually be fine tuned by single
or multiple numerical parameters prior to executing the search process over the
concrete SAT instance. In this paper we present a machine learning approach
that predicts the parameters of heuristic from the underlying structure of a graph
derived from the input SAT instance. Using this approach we effectively fine tune
the SAT solver for specific problem domain.
Evaluation of the Limit of Detection in Network Dataset Quality Assessment with PerQoDA
Authors
Wasielewska, K.; Soukup, D.; Čejka, T.; Camacho, J.
Year
2023
Published
ECML PKDD 2022: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Cham: Springer, 2023. p. 170-185. ISSN 1865-0929. ISBN 978-3-031-23632-7.
Type
Proceedings paper
Annotation
Machine learning is recognised as a relevant approach to detect attacks and other anomalies in network traffic. However, there are still no suitable network datasets that would enable effective detection. On the other hand, the preparation of a network dataset is not easy due to privacy reasons but also due to the lack of tools for assessing their quality. In a previous paper, we proposed a new method for data quality assessment based on permutation testing. This paper presents a parallel study on the limits of detection of such an approach. We focus on the problem of network flow classification and use well-known machine learning techniques. The experiments were performed using publicly available network datasets.
Fine-grained view on bribery for group identification
Authors
Boehmer, N.; Bredereck, R.; Knop, D.; Luo, J.
Year
2023
Published
Autonomous Agents and Multi-Agent Systems. 2023, 37(1), ISSN 1387-2532.
Type
Article
Departments
Annotation
Given a set of agents qualifying or disqualifying each other, group identification is the task of identifying a socially qualified subgroup of agents. Social qualification depends on the specific rule used to aggregate individual qualifications . The classical bribery problem in this context asks how many agents need to change their qualifications in order to change the outcome in a certain way. Complementing previous results showing polynomial-time solvability or NP-hardness of bribery for various social rules in the constructive (aiming at making specific agents socially qualified) or destructive (aiming at making specific agents socially disqualified) setting, we provide a comprehensive picture of the parameterized computational complexity landscape. Conceptually, we also consider a more fine-grained concept of bribery cost, where we ask how many single qualifications need to be changed, nonunit prices for different bribery actions, and a more general bribery goal that combines the constructive and destructive setting.
Flexing infinite frameworks with applications to braced Penrose tilings
Authors
Dewar, S.; Legerský, J.
Year
2023
Published
Discrete Applied Mathematics. 2023, 324 1-17. ISSN 0166-218X.
Type
Article
Departments
Annotation
A planar framework – a graph together with a map of its vertices to the plane – is flexible if it allows a continuous deformation preserving the distances between adjacent vertices. Extending a recent previous result, we prove that a connected graph with a countable vertex set can be realized as a flexible framework if and only if it has a so-called NAC-coloring. The tools developed to prove this result are then applied to frameworks where every 4-cycle is a parallelogram, and countably infinite graphs with n-fold rotational symmetry. With this, we determine a simple combinatorial characterization that determines whether the 1-skeleton of a Penrose rhombus tiling with a given set of braced rhombi will have a flexible motion, and also whether the motion will preserve 5-fold rotational symmetry.
Inexact tree pattern matching with 1-degree edit distance using finite automata
Authors
Year
2023
Published
Discrete Applied Mathematics. 2023, 330 78-97. ISSN 0166-218X.
Type
Article
Departments
Annotation
Given an input tree and a tree pattern, the inexact tree pattern matching problem is about finding all subtrees in the input tree that match the tree pattern with up to errors. To measure the number of errors between two labeled ordered trees, we use the 1-degree edit distance that uses operations node relabel, leaf insert, and leaf delete. The proposed solution is based on a finite automaton that reads a given input tree represented in linear, prefix bar, notation. First, we show its construction for a constrained variant of 1-degree edit distance where leaf insert/delete operations cannot be applied to the tree pattern recursively. Then, we extend this approach to both unit cost and non-unit cost 1-degree edit distance. The deterministic version of the proposed finite automaton finds all inexact occurrences of the tree pattern in time linear to the size of the input tree. However, since the size of such automaton can be exponential in the number of nodes of the tree pattern, it is practical only for small patterns. Therefore, we also consider the dynamic programming approach as a way of simulating the nondeterministic finite automaton. This approach comes with the space complexity and time complexity where is the number of nodes of the tree pattern, is the number of nodes of the input tree, and is the maximum number of errors allowed.
Meningioma microstructure assessed by diffusion MRI: An investigation of the source of mean diffusivity and fractional anisotropy by quantitative histology
Authors
Brabec, J.; Friedjungová, M.; Vašata, D.; Englund, E.; Bengzon, J.; Knutsson, L.; Szczepankiewicz, F.; van Westen, D.; Sundgren, P.C.; Nilsson, M.
Year
2023
Published
Neuroimage: Clinical. 2023, 37 ISSN 2213-1582.
Type
Article
Departments
Annotation
Mean diffusivity (MD) and fractional anisotropy (FA) from diffusion MRI (dMRI) have been associated with cell density and tissue anisotropy across tumors, but it is unknown whether these associations persist at the microscopic level.
To quantify the degree to which cell density and anisotropy, as determined from histology, account for the intra-tumor variability of MD and FA in meningioma tumors. Furthermore, to clarify whether other histological features account for additional intra-tumor variability of dMRI parameters.
We performed ex-vivo dMRI at 200 μm isotropic resolution and histological imaging of 16 excised meningioma tumor samples. Diffusion tensor imaging (DTI) was used to map MD and FA, as well as the in-plane FA (FAIP). Histology images were analyzed in terms of cell nuclei density (CD) and structure anisotropy (SA; obtained from structure tensor analysis) and were used separately in a regression analysis to predict MD and FAIP, respectively. A convolutional neural network (CNN) was also trained to predict the dMRI parameters from histology patches. The association between MRI and histology was analyzed in terms of out-of-sample (R2OS) on the intra-tumor level and within-sample R2 across tumors. Regions where the dMRI parameters were poorly predicted from histology were analyzed to identify features apart from CD and SA that could influence MD and FAIP, respectively.
Cell density assessed by histology poorly explained intra-tumor variability of MD at the mesoscopic level (200 μm), as median R2OS = 0.04 (interquartile range 0.01–0.26). Structure anisotropy explained more of the variation in FAIP (median R2OS = 0.31, 0.20–0.42). Samples with low R2OS for FAIP exhibited low variations throughout the samples and thus low explainable variability, however, this was not the case for MD. Across tumors, CD and SA were clearly associated with MD (R2 = 0.60) and FAIP (R2 = 0.81), respectively. In 37% of the samples (6 out of 16), cell density did not explain intra-tumor variability of MD when compared to the degree explained by the CNN. Tumor vascularization, psammoma bodies, microcysts, and tissue cohesivity were associated with bias in MD prediction based solely on CD. Our results support that FAIP is high in the presence of elongated and aligned cell structures, but low otherwise.
Cell density and structure anisotropy account for variability in MD and FAIP across tumors but cell density does not explain MD variations within the tumor, which means that low or high values of MD locally may not always reflect high or low tumor cell density. Features beyond cell density need to be considered when interpreting MD.
Minimum Eccentricity Shortest Path Problem with Respect to Structural Parameters
Authors
Kučera, M.; Suchý, O.
Year
2023
Published
Algorithmica. 2023, 85(3), 762-782. ISSN 0178-4617.
Type
Article
Departments
Annotation
The Minimum Eccentricity Shortest Path Problem consists in finding a shortest path with minimum eccentricity in a given undirected graph. The problem is known to be NP-complete and W[2]-hard with respect to the desired eccentricity. We present fpt-algorithms for the problem parameterized by the modular width, distance to cluster graph, the combination of treewidth with the desired eccentricity, and maximum leaf number.
New classes of quadratically integrable systems in magnetic fields: The generalized cylindrical and spherical cases
Authors
Kubů, O.; Marchesiello, A.; Šnobl, L.
Year
2023
Published
Annals of Physics. 2023, 451 ISSN 0003-4916.
Type
Article
Departments
Annotation
We study integrable and superintegrable systems with magnetic field possessing quadratic integrals of motion on the three-dimensional Euclidean space. In contrast with the case without vector potential, the corresponding integrals may no longer be connected to separation of variables in the Hamilton-Jacobi equation and can have more general leading order terms. We focus on two cases extending the physically relevant cylindrical -and spherical-type integrals. We find three new integrable sys-tems in the generalized cylindrical case but none in the spherical one. We conjecture that this is related to the presence, respec-tively absence, of maximal abelian Lie subalgebra of the three-dimensional Euclidean algebra generated by first order integrals in the limit of vanishing magnetic field. By investigating superin-tegrability, we find only one (minimally) superintegrable system among the integrable ones. It does not separate in any orthogonal coordinate system. This system provides a mathematical model of a helical undulator placed in an infinite solenoid. (c) 2023 Elsevier Inc. All rights reserved.
Offline evaluation of the serendipity in recommendation systems
Authors
Pastukhov, D.; Kuznetsov, S.; Vančura, V.; Kordík, P.
Year
2023
Published
IEEE 17th International Conference on Computer Science and Information Technologies. Dortmund: IEEE, 2023. p. 597-601. ISBN 979-8-3503-3431-9.
Type
Proceedings paper
Departments
Annotation
Offline optimization of recommender systems is a difficult task. Popular optimization criteria such as RMSE, Recall, and NDCG do not correlate much with online performance, especially when the recommendation algorithm is largely different from the one used to generate the offline data. An exciting direction of research to mitigate this problem is to use more robust optimization criteria. Serendipity is reported to be a promising proxy. However, more variants exist, and it is unclear whether they can be used as a single criterion to optimize. This paper analyzes how serendipity relates to other optimization criteria for three different recommendation algorithms. Based on our findings, we propose to modify the way serendipity is computed. We conduct experiments using three collaborative filtering algorithms: K-Nearest Neighbors, Matrix Factorization, and Embarrassingly Shallow Autoencoder (EASE). We also employ and evaluate the ensemble learning approach and analyze the importance of the individual components of serendipity.
On a faithful representation of Sturmian morphisms
Authors
Lepšová, J.; Pelantová, E.; Starosta, Š.
Year
2023
Published
European Journal of Combinatorics. 2023, 110 ISSN 0195-6698.
Type
Article
Departments
Annotation
The set of morphisms mapping any Sturmian sequence to a Sturmian sequence forms together with composition the so-called monoid of Sturm. For this monoid, we define a faithful representation by 3x3-matrices with integer entries. We find three convex cones in R^3 and show that a matrix R in Sl(Z,3) is a matrix representing a Sturmian morphism if the three cones are invariant under multiplication by R or R^{1}. This property offers a new tool to study Sturmian sequences. We provide alternative proofs of four known results on Sturmian sequences fixed by a primitive morphism and a new result concerning the square root of a Sturmian sequence.
On balanced sequences and their critical exponent
Authors
Dvořáková, L.; Dolce, F.; Pelantová, E.
Year
2023
Published
Theoretical Computer Science. 2023, 939 18-47. ISSN 0304-3975.
Type
Article
Departments
Annotation
We study aperiodic balanced sequences over finite alphabets.
A sequence v of this type is fully characterised by a Sturmian sequence u and two constant gap sequences y and y'.
We show that the language of v is eventually dendric and we focus on return words to its factors.
We deduce a method computing critical exponent and asymptotic critical exponent of balanced sequences provided the associated Sturmian sequence u has a quadratic slope.
The method is based on looking for the shortest return words to bispecial factors in v.
We illustrate our method on several examples, in particular we confirm a conjecture of Rampersad, Shallit and Vandomme that two specific sequences have the least critical exponent among all balanced sequences over 9- resp. 10-letter alphabets.
On the Use of Multiple Approximations in the Linear Cryptanalysis of Baby Rijndael
Authors
Year
2023
Published
Proceedings of the 9th International Conference on Information Systems Security and Privacy. Madeira: SciTePress, 2023. p. 174-179. ISSN 2184-4356. ISBN 978-989-758-624-8.
Type
Proceedings paper
Departments
Annotation
In this paper, we follow up on our previous research on the resistance of Baby Rijndael, a reduced AES variant, to linear cryptanalysis. We address the issue of relatively low accuracy of the recovery of the encryption key by exploiting multiple linear approximations at once to deduce the correct bit of the key. We try several different methods with varying degree of success, with the final technique increasing the average accuracy of the recovery of the bit of the key to over 82 % in the best case. However, even that technique is not capable of breaking the cipher with less effort than the brute force.
Overview of ParaCell package for indexing in powder diffraction
Authors
Šimeček, I.; Rohlíček, J.; Zaloga, A.
Year
2023
Published
Journal of Applied Crystallography. 2023, 56(1), 293-301. ISSN 1600-5767.
Type
Article
Departments
Annotation
This article describes the crystallography package ParaCell that integrates several indexing methods. All methods share the basic infrastructure of the program that uses graphical processing units (GPU) to evaluate the correctness of the unit cell. The individually implemented indexing methods in the program are presented along with a comparison with other indexing software. The success and time requirements were tested on several datasets, including orthorhombic, monoclinic and triclinic examples.