Ing. Michal Valenta, Ph.D.

Head of the Department of Software Engineering

Publications

Effective Data Redistribution Based on User Queries in a Distributed Graph Database

Authors
Svitáková, L.; Valenta, M.; Pokorný, J.
Year
2020
Published
Intelligent Information and Database Systems. Cham: Springer, 2020. p. 218-229. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-030-42057-4.
Type
Proceedings paper
Annotation
The problem of data distribution in NoSQL databases is particularly difficult in the case of graph databases since the data often represent a large, highly connected graph. We approach this task by monitoring user queries: we created a logging module whose output serves as input to a redistribution algorithm. The algorithm is based on the lightweight method of Adaptive Partitioning but incorporates our enhancements that overcome its present drawbacks (local optima, balancing, edge weights). The results of our experiments show a 70–80% reduction of communication between cluster nodes, which is comparable to other methods that are, however, more computationally demanding or suffer from other shortcomings.

Enhanced adaptive partitioning in a distributed graph database

Authors
Svitáková, L.; Pokorný, J.; Valenta, M.
Year
2020
Published
Journal of Information and Telecommunication. 2020, 5(1), 104-120. ISSN 2475-1847.
Type
Article
Annotation
Nowadays, open-source graph databases do not include an inherent mechanism for data relocation based on their usage. Often they do not even offer appropriate monitoring that could help to make such a decision. Information about data utilization could, however, serve as input to a decision-making process about more suitable data regrouping that could be much more efficient in terms of intra-network communication. Therefore, we created a module for the graph computational framework TinkerPop that logs traffic generated by user queries. These logged records serve as input for the Adaptive Partitioning algorithm, which we enhanced with better balancing, avoidance of local optima, and the notion of weighted graphs. This approach yields a 70–80% improvement in intra-network communication, which is comparable to other methods, namely Ja-be-Ja, which offers similar results but has higher computational demands.

Data Lineage Temporally Using a Graph Database

Authors
Pokorny, J.; Sykora, J.; Valenta, M.
Year
2019
Published
MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems. Anchorage, Alaska: IEEE, 2019. p. 285-291. ISBN 978-1-4503-6238-2.
Type
Proceedings paper
Annotation
The purpose of this paper is to analyze and implement incremental updates of data lineage storage in the software tool Manta Flow. The work is based on a study of the current data lineage storage in Manta Flow, research into existing solutions for incremental updates in version control systems, and research into incremental backups in databases. It continues with the analysis and design of a new solution for incremental updates in Manta Flow, followed by a prototype implementation and performance testing. The resulting prototype can be deployed into the existing Manta Flow product, reducing the time complexity of updates in data lineage storage by orders of magnitude.

Query Optimization on Distributed Graph Database

Authors
Svitáková, L.; Valenta, M.; Pokorný, J.
Year
2019
Published
DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. ISBN 978-80-553-3354-0.
Type
Proceedings paper
Annotation
Data distribution in graph databases is nowadays usually implemented as random placement of newly arriving data on the individual cluster nodes. Efficient use of a graph database, however, often requires regrouping these data so that communication between cluster nodes is as low as possible. We created a module for the TinkerPop framework that collects data about the queries executed over the graph database. These data serve as input to a redistribution algorithm, which redistributes the data while reducing the necessary communication between cluster nodes (by 70–80% in the experiment described in the paper) and with relatively low computational demands. We further want to include additional relevant information in the redistribution and to use this information for suitable placement of newly arriving data. In the contribution, we present our results and outline the areas of query optimization that we want to pursue further.

Graph Patterns Indexes: their Storage and Retrieval

Authors
Valenta, M.; Ramba, J.; Pokorný, J.
Year
2018
Published
iiWAS2018: Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services. New York: ACM, 2018. p. 221-225. ISBN 978-1-4503-6479-9.
Type
Proceedings paper
Annotation
We propose a method for indexing graph patterns within a graph database. A graph database consists of a labelled property graph. The index is organized as a hash table and stored in a database different from the one holding the database graph. The method makes it possible to create, use, and update indexes that speed up the matching of graph patterns. The prototype implementing the method was analyzed for the Neo4j graph database engine. Pattern indexes are stored in the embedded database MapDB. Three graph databases are used for experiments with pattern indexes. The paper provides a comparison between queries with and without indexes.

Indexing Patterns in Graph Databases

Authors
Troup, M.; Valenta, M.; Pokorný, J.
Year
2018
Published
Proceedings of the 7th International Conference on Data Science, Technology and Applications. Porto: SciTePress - Science and Technology Publications, 2018. p. 313-321. vol. 1. ISBN 978-989-758-318-6.
Type
Proceedings paper
Annotation
Nowadays, graphs have become very popular in domains like social media analytics, healthcare, natural sciences, BI, networking, graph-based bibliographic IR, etc. Graph databases (GDB) allow simple and rapid retrieval of complex graph structures that are difficult to model in traditional IS based on a relational DBMS. GDB are designed to exploit relationships in data, which means they can uncover patterns that are difficult to detect using traditional methods. We introduce a new method for indexing graph patterns within a GDB modelled as a labelled property graph. The index is organized in a tree structure and stored in the same database as the database graph. The method is analysed and implemented for the Neo4j GDB engine. It makes it possible to create, use, and update indexes that speed up the matching of graph patterns. The paper provides a comparison between queries with and without indexes.

Towards OntoUML for Software Engineering: Experimental Evaluation of Exclusivity Constraints in Relational Databases

Year
2018
Published
Model and Data Engineering. Springer, Cham, 2018. p. 58-73. vol. 1. ISSN 0302-9743. ISBN 978-3-030-00855-0.
Type
Proceedings paper
Annotation
The model-driven development approach to software engineering requires precise models defining as much of the system as possible. OntoUML is a conceptual modelling language based on the Unified Foundational Ontology, which provides constructs to create ontologically well-founded and precise conceptual models. In the approach we utilize, OntoUML is used to create conceptual models of software application data, and such a model is then transformed into its proper realization in a relational database. In these transformations, the implicit constraints defined by various OntoUML universal types and relations are realized by database views and triggers. In this paper, we specifically discuss the realization of phase partitions of Phase types from the OntoUML model by exclusive associations and provide an experimental evaluation of this approach.

Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework

Authors
Hricov, R.; Šenk, A.; Kroha, P.; Valenta, M.
Year
2017
Published
Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. Springer, Cham, 2017. p. 28-41. ISSN 1865-0929. ISBN 978-3-319-58274-0.
Type
Proceedings paper
Annotation
In this contribution, we present our approach to querying an XML document stored in a distributed system. The main goal of this paper is to describe how to use the Spark SQL framework to implement a subset of expressions from the XPath query language. Five different methods of our approach are introduced and compared, and in doing so we also demonstrate the current state of query optimization on the Spark SQL platform, which may be regarded as a further contribution of our paper. The subset of XPath expressions supported by the implemented methods contains all XPath axes except the attribute and namespace axes; predicates are not implemented in our prototype. We present our implemented system, data, measurements, tests, and results. The evaluated results support our belief that our method significantly decreases the data transfers in the distributed system that occur during query evaluation.

Experiences with data lineage metadata storing in relational and graph database

Authors
Quast, K.; Valenta, M.
Year
2017
Published
DATESO 2017. Praha: CTU. Czech Technical University Publishing House, 2017. p. 14-20. ISBN 978-80-01-06138-1.
Type
Proceedings paper
Annotation
We have spent the last few years researching possibilities for storing and managing metadata for data lineage. We tried using a relational database and a special category of NoSQL databases, the graph database. This paper describes our experiences with two competing products for data lineage and the possibilities for storing and refreshing data lineage metadata. We present research in progress, including benchmarks, and we are currently working on several challenges requested by customers, i.e. adding a temporal dimension and multiple hierarchical views, such as logical/physical.

Integrity constraints in graph databases

Authors
Valenta, M.; Kovačič, J.; Pokorný, J.
Year
2017
Published
8th International Conference on Ambient Systems, Networks and Technologies, ANT-2017 and the 7th International Conference on Sustainable Energy Information Technology, SEIT 2017. San Sebastian: Elsevier, 2017. p. 975-981. vol. 109. ISSN 1877-0509.
Type
Proceedings paper
Annotation
One thing that is still being developed for graph databases is integrity constraint (IC) support. One approach to proposing ICs is to consider a graph conceptual schema and a graph database schema. At least the inherent ICs coming from a graph conceptual schema should be considered as explicit ICs at the graph database level, i.e., using a DDL. In the paper, we focus on the graph database Neo4j and its possibilities for expressing a database schema and ICs. We extend these possibilities through new constructs in the Neo4j DDL, including their prototype implementation and experiments.

Graph databases as a storage for data-lineage metadata - experience and challenges

Authors
Quast, K.; Valenta, M.
Year
2016
Published
Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 187-192. ISBN 978-80-227-4619-9.
Type
Proceedings paper
Annotation
In a project dealing with so-called data lineage, we adopted a graph database instead of a relational one for the metadata store three years ago. Since then, the number of installations has grown, the data volume has increased, and customer requirements have changed. We addressed the temporal dimension of the store and also multiple possible views of the data hierarchy: besides the physical hierarchy, which is derived from the analysis of the respective data dictionaries, we added, for example, a logical/conceptual hierarchy. Graph databases are proving to be a promising solution for this type of task. In the contribution, we outline the requirements on a metadata store for data lineage, share our own experience with a practical implementation, and indicate further directions of development and the challenges associated with them.

Minimization of Data Transfers during MapReduce Computations in Distributed Wide-Column Stores

Authors
Šenk, A.; Hrstka, M.; Kroha, P.; Valenta, M.
Year
2016
Published
New Trends in Databases and Information Systems. Wien: Springer, 2016. pp. 261-274. vol. 637. ISSN 1865-0929. ISBN 978-3-319-44065-1.
Type
Proceedings paper
Annotation
In this contribution, we present our original approach to tuning distributed wide-column store databases based on data locality optimization. The main goal of the optimization is to reduce communication overhead in a distributed environment during Map-Reduce query evaluation. The optimization is realized by minimizing the total number of key-value pairs emitted from mappers. To achieve this goal, we combine several Map-Reduce optimization methods, adapt them to the wide-column store model, and utilize them to overcome architectural limitations. To prove our idea, we implemented the proposed solution in the HBase system, which represents this class of DBMS. We present our data, measurements, and tests. The evaluated results support our idea that this method can significantly decrease data transfers in the distributed system.

Benefits of easy extensible OO design - ECA module in CellStore project

Authors
Šenk, A.; Valenta, M.
Year
2012
Published
Objekty 2012. Praha: Vysoká škola manažerské informatiky a ekonomiky, a.s., 2012. pp. 4-15. ISBN 978-80-86847-63-4.
Type
Proceedings paper
Annotation
CellStore is an XML-native database management system written in the Smalltalk programming language. The main idea of this project is to focus on a clear object design with an extensible structure. This approach brings a big advantage: easy creation of new modules, which allows us to test and study new technologies and ideas. The ECA paradigm is well known from SQL databases, where it is used in a feature called triggers. As XML-native database management systems become a part of real-world information systems, their developers try to use more features known from the SQL world. Triggers are one of them. In this paper we describe CellStore's object-oriented design and its APIs. We use the experience gained from developing a new XML trigger module to demonstrate the possibilities and benefits that the CellStore SDK brings to developers and programmers.

Student surveys as a source of information

Authors
Součková, M.; Valenta, M.; Topinková, P.
Year
2012
Published
Proceedings of the 15th International Conference on Interactive Collaborative Learning and 41st International Conference on Engineering Pedagogy. Vienna: IEEE Industrial Electronic Society, 2012. ISBN 978-1-4673-2427-4.
Type
Proceedings paper
Annotation
The results of student surveys provide us with an interesting source of valuable information. The same data that is used for the evaluation of current courses and teaching quality can also be used to detect broader trends.

A Large Amount of Final Projects Effectively Processed with Minimal Software Requirements - Open Source and Platform Independent Solution: A Case Study

Authors
Year
2009
Published
Proceedings of the First International Conference on Computer Supported Education. Setúbal: INSTICC Press, 2009. pp. 236-241. ISBN 978-989-8111-82-1.
Type
Proceedings paper
Annotation
We present a simple way to set up and process a set of final projects. The method uses only standard open-source technologies (particularly XML and XSLT). It was applied to approximately 1000 final projects in the course Database system. Both students and teachers find this simple approach useful.

On Benchmarking Transaction Managers

Authors
Strnad, P.; Valenta, M.
Year
2009
Published
Database Systems for Advanced Applications DASFAA 2009 International Workshops: BenchmarX, MCIS, WDPP, PPDA, MBC, PhD, Brisbane, Australia, April 20 - 23, 2009. Berlin: Springer, 2009. pp. 79-92. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-04204-1.
Type
Proceedings paper
Annotation
We describe an approach to measuring the performance of a transaction manager. We design a very simple benchmark intended for evaluating this important component of a DB engine. Then we apply it to our own transaction manager implementation. We also describe the implementation of the transaction manager itself. It is built as a software layer over the eXist database engine. It is a standalone module that can be used to extend eXist with transactional processing when needed.