Data Management Planning: How Requirements and Solutions are Beginning to Converge

Autoři
Pergl, R.; Jones, S.; Hooft, R.
Rok
2019
Publikováno
Data Intelligence. 2019, 2019(2), 208-219. ISSN 2096-7004.
Typ
Článek
Anotace
Effective stewardship of data is a critical precursor to making data FAIR. The goal of this paper is to bring an overview of current state of the art of data management and data stewardship planning solutions (DMP). We begin by arguing why data management is an important vehicle supporting adoption and implementation of the FAIR principles, we describe the background, context and historical development, as well as major driving forces, being research initiatives and funders. Then we provide an overview of the current leading DMP tools in the form of a table presenting the key characteristics. Next, we elaborate on emerging common standards for DMPs, especially the topic of machine-actionable DMPs. As sound DMP is not only a precursor of FAIR data stewardship, but also an integral part of it, we discuss its positioning in the emerging FAIR tools ecosystem. Capacity building and training activities are an important ingredient in the whole effort. Although

Efficient Algorithm for Representations of U(3) in U(N)

Autoři
Langr, D.; Tvrdík, P.; Dytrych, T.; Draayer, J.P.; Launey, K.D.
Rok
2019
Publikováno
Computer Physics Communications. 2019, 244 442-447. ISSN 0010-4655.
Typ
Článek
Anotace
An efficient algorithm for enumerating representations of U(3) that occur in a representation of the unitary group U(N) is introduced. The algorithm is applicable to U(N) representations associated with a system of identical fermions (protons, neutrons, electrons, etc.) distributed among the N=(eta+1)(eta+2)/2 degenerate eigenstates of the etath level of the three-dimensional harmonic oscillator. A C++ implementation of the algorithm is provided and its performance evaluated. The implementation can employ OpenMP threading for use in parallel applications.

Missing Features Reconstruction and Its Impact on Classification Accuracy

Rok
2019
Publikováno
Computational Science – ICCS 2019. Springer, Cham, 2019. p. 207-220. vol. 11538. ISBN 978-3-030-22744-9.
Typ
Stať ve sborníku
Anotace
In real-world applications, we can encounter situations when a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing which can be understood as a specific case of transfer learning. Our aim is to experimentally research the influence of various imputation methods on the performance of several classification models. The imputation impact is researched on a combination of traditional methods such as k-NN, linear regression, and MICE compared to modern imputation methods such as multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to using them for multiple features imputation. The experiments were performed on both real world and artificial datasets with continuous features where different numbers of features, varying from one feature to 50%, were missing. The results show that MICE and linear regression are generally good imputers regardless of the conditions. On the other hand, the performance of MLP and XGBT is strongly dataset dependent. Their performance is the best in some cases, but more often they perform worse than MICE or linear regression.

Side-Channel Attack on the A5/1 Stream Cipher

Rok
2019
Publikováno
Proceedings of the 22nd Euromicro Conference on Digital Systems Design. Los Alamitos, CA: IEEE Computer Soc., 2019. p. 633-638. ISBN 978-1-7281-2862-7.
Typ
Stať ve sborníku
Anotace
In this paper we present cryptanalysis of the A5/1 stream cipher used in GSM mobile phones. Our attack is based on power analysis where we assume that the power consumption while clocking 3 LFSRs is different than when clocking 2 LFSRs. We demonstrate a simple power analysis (SPA) attack and discuss existing differential power analysis (DPA). We present the attack for recovering secret key based on the information on clocking bits of LFSRs that was deduced from power analysis. The attack has a 100% success rate, requires minimal storage and it does not requires any single bit of a keystream. An average time complexity of our attack based on SPA is around 233 where the computation unit is a resolution of system of linear equations over the Z2. Recovering the secret key using information from the DPA has a constant complexity.

A Tight Lower Bound for Planar Steiner Orientation

Autoři
Suchý, O.; Chitnis, R.; Feldmann, A.
Rok
2019
Publikováno
Algorithmica. 2019, 81(8), 3200-3216. ISSN 0178-4617.
Typ
Článek
Anotace
In the Steiner Orientation problem, the input is a mixed graph G (it has both directed and undirected edges) and a set of k terminal pairs T. The question is whether we can orient the undirected edges in a way such that there is a directed s?t path for each terminal pair (s,t)T. Arkin and Hassin [DAM'02] showed that the Steiner Orientation problem is NP-complete. They also gave a polynomial time algorithm for the special case when k=2. From the viewpoint of exact algorithms, Cygan et al.[ESA'12, SIDMA'13] designed an XP algorithm running in nO(k) time for all k1. Pilipczuk and Wahlstrom [SODA'16, TOCT'18] showed that the Steiner Orientation problem is W[1]-hard parameterized by k. As a byproduct of their reduction, they were able to show that under the Exponential Time Hypothesis (ETH) of Impagliazzo, Paturi and Zane [JCSS'01] the Steiner Orientation problem does not admit an f(k)no(k/logk) algorithm for any computable functionf. In this paper, we give a short and easy proof that the nO(k) algorithm of Cygan etal. is asymptotically optimal, even if the input graph is planar. Formally, we show that the Planar Steiner Orientation problem is W[1]-hard parameterized by the number k of terminal pairs, and, under ETH, cannot be solved in f(k)no(k) time for any computable function f. Moreover, under a stronger hypothesis called Gap-ETH of Dinur [ECCC'16] and Manurangsi and Raghavendra [ICALP'17], we are able to show that there is no constant ?>0 such that Planar Steiner Orientation admits an -approximation in FPT time, i.e., no f(k)no(k) time algorithm can distinguish between the case when all k pairs are satisfiable versus the case when less than k pairs are satisfiable. To the best of our knowledge, this is the first FPT inapproximability result on planar graphs.

Characterization of circular D0L-systems

Rok
2019
Publikováno
Theoretical Computer Science. 2019, 790 131-137. ISSN 0304-3975.
Typ
Článek
Anotace
We give a characterization of circularity of a D0L-system. The characterizing condition is simple to verify and yields an efficient algorithm. To derive it, we prove that every non-circular D0L-system contains arbitrarily long repetitions. This result was already published in 1993 by Mignosi and Séébold, however their proof is only a sketch. We give a complete proof that, in addition, is valid for a slightly relaxed definition of circularity, called weak circularity.

Efficient algorithmic evaluation of correlation power analysis: Key distinguisher based on the correlation trace derivative

Rok
2019
Publikováno
Microprocessors and Microsystems. 2019, 2019(71), 1-8. ISSN 0141-9331.
Typ
Článek
Anotace
Correlation power analysis (CPA) is one of the most common side-channel attacks today, posing a threat to many modern ciphers, including AES. In the final step of this attack, the cipher key is usually extracted by the attacker by visually examining the correlation traces for each key guess. The naïve way to extract the correct key algorithmically is selecting the key guess with the maximum Pearson correlation coefficient. We propose another key distinguisher based on a significant change in the correlation trace rather than on the absolute value of the coefficient. Our approach performs better than the standard maximization, especially in the noisy environment, and it allows to significantly reduce the number of acquired power traces necessary to successfully mount an attack in noisy environment, and in some cases make the attack even feasible.

Contextual Equivalence fo probabilistic languages

Autoři
Culpepper, R.; Wand, M.; Cobb, A.; Giannakopoulos, T.
Rok
2018
Publikováno
Journal Proceedings of the ACM on Programming Languages,Volume 2, Issue ICFP. New York: ACM, 2018. ISSN 2475-1421.
Typ
Stať ve sborníku
Anotace
We present a complete reasoning principle for contextual equivalence in an untyped probabilistic language. The language includes continuous (real-valued) random variables, conditionals, and scoring. It also includes recursion, since the standard call-by-value fixpoint combinator is expressible. We demonstrate the usability of our characterization by proving several equivalence schemas, including familiar facts from lambda calculus as well as results specific to probabilistic programming. In particular, we use it to prove that reordering the random draws in a probabilistic program preserves contextual equivalence. This allows us to show, for example, that (let x = e1 in let y = e2 in e0) =ctx (let y = e2 in let x = e1 in e0) (provided x does not occur free in e2 and y does not occur free in e1) despite the fact that e1 and e2 may have sampling and scoring effects.

Linked Web APIs Dataset

Rok
2018
Publikováno
Semantic Web. 2018, 9(4), 381-391. ISSN 1570-0844.
Typ
Článek
Anotace
Web APIs enjoy a significant increase in popularity and usage in the last decade. They have become the core technology for exposing functionalities and data. Nevertheless, due to the lack of semantic Web API descriptions their discovery, sharing, integration, and assessment of their quality and consumption is limited. In this paper, we present the Linked Web APIs dataset, an RDF dataset with semantic descriptions about Web APIs. It provides semantic descriptions for 11,339 Web APIs, 7,415 mashups and 7,717 developer profiles, which make it the largest available dataset from the Web APIs domain. The dataset captures the provenance, temporal, technical, functional, and non-functional aspects. In addition, we describe the Linked Web APIs Ontology, a minimal model which builds on top of several well-known ontologies. The dataset has been interlinked and published according to the Linked Data principles. Finally, we describe several possible usage scenarios for the dataset and show its potential.

SOPanG: online text searching over a pan-genome

Autoři
Holub, J.; Cislak, A.; Grabowski, S.
Rok
2018
Publikováno
Bioinformatics. 2018, 34(24), 4290-4292. ISSN 1367-4803.
Typ
Článek
Anotace
Motivation: The many thousands of high-quality genomes available now-a-days imply a shift from single genome to pan-genomic analyses. A basic algorithmic building brick for such a scenario is online search over a collection of similar texts, a problem with surprisingly few solutions presented so far.

Solving Multi-Agent Path Finding on Strongly Biconnected Digraphs

Autoři
Surynek, P.; Botea, A.; Bonusi, D.
Rok
2018
Publikováno
Proceedings of the International Joint Conferences on Artifical Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2018. p. 5563-5567. ISSN 1045-0823. ISBN 978-0-9992411-2-7.
Typ
Stať ve sborníku
Anotace
We present and evaluate diBOX, an algorithm for multi-agent path finding on strongly biconnected directed graphs. diBOX runs in polynomial time, computes suboptimal solutions and is complete for instances on strongly biconnected digraphs with at least two unoccupied positions. A detailed empirical analysis shows a good scalability for diBOX.

Za obsah stránky zodpovídá: doc. Ing. Štěpán Starosta, Ph.D.