Recommendations with minimum exposure guarantees: A post-processing framework

Authors
Lopes, R.; da Silva Alves, R.; Ledent, A.; Santos, R.L.T.; Kloft, M.
Year
2024
Published
Expert Systems with Applications. 2024, 236 1-9. ISSN 1873-6793.
Type
Article
Annotation
Relevance-based ranking is a popular ingredient in recommenders, but it frequently struggles to meet fairness criteria because social and cultural norms may favor some item groups over others. For instance, some items might receive lower ratings due to some sort of bias (e.g. gender bias). A fair ranking should balance the exposure of items from advantaged and disadvantaged groups. To this end, we propose a novel post-processing framework to produce fair, exposure-aware recommendations. Our approach is based on an integer linear programming model maximizing the expected utility while satisfying a minimum exposure constraint. The model has fewer variables than previous work and thus can be deployed to larger datasets and allows the organization to define a minimum level of exposure for groups of items. We conduct an extensive empirical evaluation indicating that our new framework can increase the exposure of items from disadvantaged groups at a small cost of recommendation accuracy

A parallel algorithm for approximating the silhouette using a ball tree

Authors
Šimeček, I.; Kozický, C.
Year
2023
Published
Journal of Parallel and Distributed Computing. 2023, 173 115-125. ISSN 0743-7315.
Type
Article
Annotation
Clustering is widely used in many scientific fields. The contribution of enumerating the value of the silhouette is twofold: firstly it can help choosing a suitable cluster count and secondly it can be used to evaluate the quality of a clustering. Enumerating the silhouette exactly is an extremely time-consuming task, especially in big data applications; it is therefore common to approximate its value. This article presents an efficient shared-memory parallel algorithm for approximating the silhouette, which uses a ball tree. The process of initialising the ball tree and enumerating the silhouette are fully parallelised using the OpenMP API. The results of our experiments show that the proposed parallel algorithm substantially increases the speed of computing the silhouette whilst retaining necessary precision for real-world applications.

Acceleration of a parallel BDDC solver by using graphics processing units on subdomains

Authors
Šístek, J.; Oberhuber, T.
Year
2023
Published
International Journal of High Performance Computing Applications. 2023, 37(2), 151-164. ISSN 1094-3420.
Type
Article
Annotation
An approach to accelerating a parallel domain decomposition (DD) solver by graphics processing units (GPUs) is investigated. The solver is based on the Balancing Domain Decomposition Method by Constraints (BDDC), which is a nonoverlapping DD technique. Two kinds of local matrices are required by BDDC. First, dense matrices corresponding to local Schur complements of interior unknowns are constructed by the sparse direct solver. These are further used as part of the local saddle-point problems within BDDC. In the next step, the local matrices are copied to GPUs. Repeated multiplications of local vectors with the dense matrix of the Schur complement are performed for each subdomain. In addition, factorizations and backsubstitutions with the dense saddle-point subdomain matrices are also performed on GPUs. Detailed times of main components of the algorithm are measured on a benchmark Poisson problem. The method is also applied to an unsteady problem of incompressible flow, where the Krylov subspace iterations are performed repeatedly in each time step. The results demonstrate the potential of the approach to speed up realistic simulations up to 5 times with a preference towards large subdomains.

Active Learning Framework For Long-term Network Traffic Classification

Authors
Pešek, J.; Soukup, D.; Čejka, T.
Year
2023
Published
IEEE Annual Computing and Communication Workshop and Conference (CCWC). New Jersey: IEEE, 2023. p. 893-899. ISBN 979-8-3503-3286-5.
Type
Invited/Awarded proceedings paper
Annotation
Recent network traffic classification methods benefit from machine learning (ML) technology. However, there are many challenges due to the use of ML, such as lack of high-quality annotated datasets, data drifts and other effects causing aging of datasets and ML models, high volumes of network traffic, etc. This paper presents the benefits of augmenting traditional workflows of ML training&deployment and adaption of the Active Learning (AL) concept on network traffic analysis. The paper proposes a novel Active Learning Framework (ALF) to address this topic. ALF provides prepared software components that can be used to deploy an AL loop and maintain an ALF instance that continuously evolves a dataset and ML model automatically. Moreover, ALF includes monitoring, datasets quality evaluation, and optimization capabilities that enhance the current state of the art in the AL domain. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, where it was evaluated for more than eight months. Additional use cases were evaluated on publicly available datasets.

Augmenting Monitoring Infrastructure For Dynamic Software-Defined Networks

Authors
Pešek, J.; Plný, R.; Koumar, J.; Jeřábek, K.; Čejka, T.
Year
2023
Published
2023 8th International Conference on Smart and Sustainable Technologies (SpliTech). IEEE Xplore, 2023. p. 1-4.
Type
Proceedings paper
Annotation
Software-Defined Networking (SDN) and virtual environment raise new challenges for network monitoring tools. The dynamic and flexible nature of these network technologies requires adaptation of monitoring infrastructure to overcome challenges of analysis and interpretability of the monitored network traffic. This paper describes a concept of automatic on-demand deployment of monitoring probes and correlation of network data with infrastructure state and configuration in time. Such an approach to monitoring SDN virtual networks is usable in several use cases, such as IoT networks and anomaly detection. It increases visibility into complex and dynamic networks. Additionally, it can help with the creation of well-annotated datasets that are essential for any further research.

Automated age-at-death estimation from 3D surface scans of the facies auricularis of the pelvic bone

Authors
Štepanovský, M.; Buk, Z.; Koterova, A.P.; Bruzek, J.
Year
2023
Published
Forensic Science International. 2023, 349 ISSN 0379-0738.
Type
Article
Annotation
This work presents an automated data-mining model for age-at-death estimation based on 3D scans of the auricular surface of the pelvic bone. The study is based on a multi-population sample of 688 individuals (males and females) originating from one Asian and five European identified osteological collections. Our method requires no expert knowledge and achieves similar accuracy compared to traditional subjective methods. Apart from data acquisition, the whole procedure of pre-processing, feature extraction and age estimation is fully automated and implemented as a computer program. This program is a part of a freely available web-based software tool called CoxAGE3D. This software tool is available at https://cox-age3d.fit.cvut.cz/ Our age-at-death estimation method is suitable for use on individuals with known/un-known population affinity and provides moderate correlation between the estimated age and actual age (Pearson's correlation coefficient is 0.56), and a mean absolute error of 12.4 years.& COPY; 2023 Elsevier B.V. All rights reserved.

Backward Pattern Matching on Elastic-Degenerate Strings

Authors
Procházka, P.; Cvacho, O.; Krčál, L.; Holub, J.
Year
2023
Published
SN Computer Science. 2023, 4(5), ISSN 2662-995X.
Type
Article
Annotation
Recently, the concept of Elastic Degenerate Strings (EDS) was introduced as a way of representing a sequenced population of the same species. Several on-line Elastic Degenerate String Matching (EDSM) algorithms were presented so far. Some of them provide practical implementation. We propose a new on-line EDSM algorithm BNDM-EDS. Our algorithm combines two traditional algorithms BNDM and the Shift-And that were adapted to the specifics needed by Elastic Degenerate Strings. BNDM-EDS is running in O (Nmd w m e) worst-case time. This implies O (Nm) time for small patterns, where m is the length of the searched pattern, N is the size of EDS, and w is the size of the computer word. The algorithm uses O (N + n) space, where n is the length of EDS. BNDM-EDS requires a simple preprocessing step with time and space O(m). Experimental results on real genomic data show superiority of BNDM-EDS over state-of-the-art algorithms.

Binary Codes that do not Preserve Primitivity

Authors
Holub, Š.; Raška, M.; Starosta, Š.
Year
2023
Published
Journal of Automated Reasoning. 2023, 67(3), ISSN 0168-7433.
Type
Article
Annotation
A set of words X is not primitivity-preserving if there is a primitive list of length at least two of elements from X whose concatenation is imprimitive. Here, a word or list is primitive if it is not equal to a concatenation of several copies of a shorter word or list, and imprimitive otherwise. We formalize a full characterization of such two-element sets {x, y} in the proof assistant Isabelle/HOL. The formalization is based on an existing proof which we analyze and simplify using some innovative ideas. Part of the formalization, interesting on its own, is a description of the ways in which the square xx can appear inside a concatenation of words x and y if |y| = |x|. We also provide a formalized parametric solution of the related equation x(j)y(k) = z(l).

Bounds on Functionality and Symmetric Difference – Two Intriguing Graph Parameters

Authors
Dvořák, P.; Folwarczný, L.; Opler, M.; Pudlák, P.; Šámal, R.; Vu, T.A.
Year
2023
Published
Proceedings of the 49th International Workshop on Graph-Theoretic Concepts in Computer Science. Cham: Springer, 2023. p. 305-318. Lecture Notes in Computer Science. vol. 14093. ISSN 0302-9743. ISBN 978-3-031-43379-5.
Type
Proceedings paper
Annotation
[Alecu et al.: Graph functionality, JCTB2021] define functionality, a graph parameter that generalizes graph degeneracy. They research the relation of functionality to many other graph parameters (tree-width, clique-width, VC-dimension, etc.). Extending their research, we prove logarithmic lower bound for functionality of the random graph~$G(n,1/2)$. Previously known graphs have functionality logarithmic in number of vertices. We show that for every graph~$G$ on~$n$ vertices we have $\mathrm{fun}(G) \le O(\sqrt{n \log n})$ and we give a nearly matching $\Omega(\sqrt{n})$-lower bound provided by projective planes. Further, we study a related graph parameter \emph{symmetric difference}, the minimum of $|N(u) \Delta N(v)|$ over all pairs of vertices of the ``worst possible'' induced subgraph. It was observed by Alecu et al. that $\mathrm{fun}(G) \le \mathrm{sd}(G)+1$ for every graph~$G$. We compare $\mathrm{fun}$ and $\mathrm{sd}$ for the class~$\mathrm{INT}$ of interval graphs and $\mathrm{CA}$ of circular-arc graphs. We let $\mathrm{INT}_n$ denote the $n$-vertex interval graphs, similarly for $\mathrm{CA}_n$. Alecu et al. ask, whether $\mathrm{fun}(\mathrm{INT})$ is bounded. Dallard et al. answer this positively in a recent preprint. On the other hand, we show that $\Omega(\sqrt[4]{n}) \leq \sd(\mathrm{INT}_n) \leq O(\sqrt[3]{n})$. For the related class $\mathrm{CA}$ we show that $\mathrm{sd}(\mathrm{CA}_n) = \Theta(\sqrt{n})$. We propose a follow-up question: is $\mathrm{fun}(\mathrm{CA})$ bounded?

Bribery Can Get Harder in Structured Multiwinner Approval Election

Authors
Kusek, B.; Bredereck, R.; Faliszewski, P.; Kaczmarczyk, A.; Knop, D.
Year
2023
Published
Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023. County of Richland: IFAAMAS, 2023. p. 1725-1733. ISSN 1548-8403. ISBN 978-1-4503-9432-1.
Type
Proceedings paper
Annotation
We study the complexity of constructive bribery in the context of structured multiwinner approval elections. Given such an election, we ask whether a certain candidate can join the winning committee by adding, deleting, or swapping approvals, where each such action comes at a cost and we are limited by a budget. We assume our elections to either have the candidate interval or the voter interval property, and we require the property to hold also after the bribery. While structured elections usually make manipulative attacks significantly easier, our work also shows examples of the opposite behavior. We conclude by presenting preliminary insights regarding the destructive variant of our problem.

Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders

Authors
Kasalický, P.; da Silva Alves, R.; Kordík, P.
Year
2023
Published
Proceedings of EvalRS: A Rounded Evaluation Of Recommender Systems 2023. CEUR-WS.org, 2023. ISSN 1613-0073.
Type
Proceedings paper
Annotation
The evaluation of recommendation systems is a complex task. The offline and online evaluation metrics for recommender systems are ambiguous in their true objectives. The majority of recently published papers benchmark their methods using ill-posed offline evaluation methodology that often fails to predict true online performance. Because of this, the impact that academic research has on the industry is reduced. The aim of our research is to investigate and compare the online performance of offline evaluation metrics. We show that penalizing popular items and considering the time of transactions during the evaluation significantly improves our ability to choose the best recommendation model for a live recommender system. Our results, averaged over five large-size real-world live data procured from recommenders, aim to help the academic community to understand better offline evaluation and optimization criteria that are more relevant for real applications of recommender systems.

Can variants, reinfection, symptoms and test types affect COVID-19 diagnostic performance? A large-scale retrospective study of AG-RDTs during circulation of Delta and Omicron variants, Czechia, December 2021 to February 2022

Authors
Kliegr, T.; Jarkovský, J.; Jiřincová, H.; Kuchař, J.; Karel, T.; Chudán, D.; Vojíř, S.; Zavřel, M.; Šanca, O.; Tachezy, R.
Year
2023
Published
Eurosurveillance. 2023, 28(38), 1-14. ISSN 1560-7917.
Type
Article
Annotation
Background The sensitivity and specificity of selected antigen detection rapid diagnostic tests (AG-RDTs) for SARS-CoV-2 were determined in the unvaccinated population when the Delta variant was circulating. Viral loads, dynamics, symptoms and tissue tropism differ between Omicron and Delta. Aim We aimed to compare AG-RDT sensitivity and specificity in selected subgroups during Omicron vs Delta circulation. Methods We retrospectively paired AG-RDT results with PCRs registered in Czechia’s Information System for Infectious Diseases from 1 to 25 December 2021 (Delta, n = 20,121) and 20 January to 24 February 2022 (Omicron, n = 47,104). Results When confirmatory PCR was conducted on the same day as AG-RDT as a proxy for antigen testing close to peak viral load, the average sensitivity for Delta was 80.4% and for Omicron 81.4% (p < 0.05). Sensitivity in vaccinated individuals was lower for Omicron (OR = 0.94; 95% confidence interval (CI): 0.87–1.03), particularly in reinfections (OR = 0.83; 95% CI: 0.75–0.92). Saliva AG-RDT sensitivity was below average for both Delta (74.4%) and Omicron (78.4%). Tests on the European Union Category A list had higher sensitivity than tests in Category B. The highest sensitivity for Omicron (88.5%) was recorded for patients with loss of smell or taste, however, these symptoms were almost 10-fold less common than for Delta. The sensitivity of AG-RDTs performed on initially asymptomatic individuals done 1, 2 or 3 days before a positive PCR test was consistently lower for Omicron compared with Delta. Conclusion Sensitivity for Omicron was lower in subgroups that may become more common if SARS-CoV-2 becomes an endemic virus.

CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines

Authors
Luxemburk, J.; Hynek, K.; Čejka, T.; Lukačovič, A.; Šiška, P.
Year
2023
Published
Data in Brief. 2023, 2023(46), ISSN 2352-3409.
Type
Article
Annotation
The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

Combining Generators of Adversarial Malware Examples to Increase Evasion Rate

Authors
Kozák, M.; Jureček, M.
Year
2023
Published
Proceedings of the 20th International Conference on Security and Cryptography. Porto: SciTePress - Science and Technology Publications, 2023. p. 778-786. ISSN 2184-7711. ISBN 978-989-758-666-8.
Type
Proceedings paper
Annotation
Antivirus developers are increasingly embracing machine learning as a key component of malware defense. While machine learning achieves cutting-edge outcomes in many fields, it also has weaknesses that are exploited by several adversarial attack techniques. Many authors have presented both white-box and black-box generators of adversarial malware examples capable of bypassing malware detectors with varying success. We propose to combine contemporary generators in order to increase their potential. Combining different generators can create more sophisticated adversarial examples that are more likely to evade anti-malware tools. We demonstrated this technique on five well-known generators and recorded promising results. The best-performing combination of AMG-random and MAB-Malware generators achieved an average evasion rate of 15.9% against top-tier antivirus products. This represents an average improvement of more than 36% and 627% over using only the AMG-random and MAB-Malware generators, respectively. The generator that benefited the most from having another generator follow its procedure was the FGSM injection attack, which improved the evasion rate on average between 91.97% and 1,304.73%, depending on the second generator used. These results demonstrate that combining different generators can significantly improve their effectiveness against leading antivirus programs.

Comprehensive Concept of DevOps Teaching at FIT CTU

Authors
Year
2023
Published
Informatika 2023 - Sborník příspěvků. Jihlava: Vysoká škola polytechnická Jihlava, 2023.
Type
Proceedings paper
Annotation
DevOps is a relatively new discipline not covered well by higher education. Companies resort to internal training to compensate for the lack of graduates. It is a spectrum of processes and technologies from agile development, testing and security, build and release management, to automation of deployment and application monitoring. At FIT CTU, the departments of Software Engineering and Computer Systems and experts from practice created a bachelor's course with prerequisites of Git and Operating Systems, whose students practically try out the basics of all technologies on internal cloud resources and a continuous integration pipeline comprehensively from compilation to deployment.

Constant factor approximation for tracking paths and fault tolerant feedback vertex set

Authors
Blažej, V.; Choudhary, P.; Knop, D.; Křišťan, J.; Suchý, O.; Valla, T.
Year
2023
Published
Discrete Optimization. 2023, 47 ISSN 1572-5286.
Type
Article
Annotation
Consider a vertex-weighted graph G with a source s and a target t. TRACKING PATHS requires finding a minimum weight set of vertices (trackers) such that the sequence of trackers in each path from s to t is unique. In this work, we derive a factor 6-approximation algorithm for TRACKING PATHS in weighted graphs and a factor 4-approximation algorithm if the input is unweighted. This is the first constant factor approximation for this problem. While doing so, we also study approximation of the closely related r-FAULT TOLERANT FEEDBACK VERTEX SET problem. There, for a fixed integer r and a given vertex-weighted graph G, the task is to find a minimum weight set of vertices intersecting every cycle of G in at least r+1 vertices. We give a factor O(r) approximation algorithm for r-FAULT TOLERANT FEEDBACK VERTEX SET if r is a constant.

Cultural Diversity and Its Impact on Governance

Authors
Evan, T.; Holý, V.
Year
2023
Published
Socio-Economic Planning Sciences. 2023, 89 1-13. ISSN 0038-0121.
Type
Article
Annotation
Hofstede’s six cultural dimensions make it possible to measure the culture of countries but are criticized for assuming the homogeneity of each country. In this paper, we propose two measures based on Hofstede’s cultural dimensions which take into account the heterogeneous structure of citizens with respect to their countries of origin. Using these improved measures, we study the influence of heterogeneous culture and cultural diversity on the quality of institutions measured by the six worldwide governance indicators. We use a linear regression model allowing for dependence in spatial and temporal dimensions as well as the high correlation between the governance indicators. Our results show that the effect of cultural diversity improves some of the governance indicators while worsening others depending on the individual Hofstede cultural dimension.

Design of a modern fast Fourier transform and cache effective bit-reversal algorithm

Authors
Šimeček, I.; Simek, A.
Year
2023
Published
International Journal of Parallel, Emergent and Distributed Systems. 2023, 38(3), 229-248. ISSN 1744-5760.
Type
Article
Annotation
This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley-Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.

Domain Dependent Parameter Setting in SAT Solver Using Machine Learning Techniques

Authors
Beskyd, F.; Surynek, P.
Year
2023
Published
Agents and Artificial Intelligence 14th International Conference, ICAART 2022 Virtual Event, February 3–5, 2022 Revised Selected Papers. Cham: Springer International Publishing AG, 2023. p. 169-200. ISSN 0302-9743. ISBN 978-3-031-22952-7.
Type
Proceedings paper
Annotation
We address the problem of variable and truth-value choice in modern search-based Boolean satisfiability (SAT) solvers depending on the problem domain. The SAT problem is the task to determine truth-value assignment for variables of a given Boolean formula under which the formula evaluates to true. The SAT problem is often used as a canonical representation of combinatorial problems in many domains of computer science ranging from artificial intelligence to software engineering. Modern complete search-based SAT solvers represent a universal problem solving tool which often provide higher efficiency than ad-hoc direct solving approaches. Many efficient variable and truth-value selection heuristics were devised. Heuristics can usually be fine tuned by single or multiple numerical parameters prior to executing the search process over the concrete SAT instance. In this paper we present a machine learning approach that predicts the parameters of heuristic from the underlying structure of a graph derived from the input SAT instance. Using this approach we effectively fine tune the SAT solver for specific problem domain.

Establishing Herd Immunity is Hard Even in Simple Geometric Networks

Authors
Dvořák, M.; Knop, D.; Schierreich, Š.
Year
2023
Published
Proceedings of the 18th Workshop on Algorithms and Models for the Web Graph. Cham: Springer, 2023. p. 68-82. Lecture Notes in Computer Science. vol. 13894. ISSN 0302-9743. ISBN 978-3-031-32295-2.
Type
Proceedings paper
Annotation
We study the following model of disease spread in a social network. In the beginning, all individuals are either ``infected'' or ``healthy''. Next, in discrete rounds, the disease spreads in the network from infected to healthy individuals such that a healthy individual gets infected if and only if a sufficient number of its direct neighbours are already infected. We represent the social network as a graph. Inspired by the real-world restrictions in the current epidemic, especially by social and physical distancing requirements, we restrict ourselves to networks that can be represented as geometric intersection graphs. We show that finding a minimal vertex set of initially infected individuals to spread the disease in the whole network is computationally hard, already on unit disk graphs. Hence, to provide some algorithmic results, we focus ourselves on simpler geometric graph families, such as interval graphs and grid graphs.
Next

Filter

The person responsible for the content of this page: doc. Ing. Štěpán Starosta, Ph.D.