Effective Data Redistribution Based on User Queries in a Distributed Graph Database
Authors
Svitáková, L.; Valenta, M.; Pokorný, J.
Year
2020
Published
Intelligent Information and Database Systems. Cham: Springer, 2020. p. 218-229. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-030-42057-4.
Type
Proceedings paper
Departments
Annotation
The problem of data distribution in NoSQL databases is particularly difficult in the case of graph databases, since the data often represent a large, highly connected graph. We approach this task by monitoring user queries: we created a logging module whose output serves as the input to a redistribution algorithm. The algorithm is based on the lightweight method of Adaptive Partitioning but incorporates our enhancements, which overcome its present drawbacks (local optima, balancing, edge weights). Our experiments show a 70–80% reduction of communication between cluster nodes. This is comparable to the results of other methods, which, however, are more computationally demanding or suffer from other shortcomings.
Enhanced adaptive partitioning in a distributed graph database
Authors
Svitáková, L.; Pokorný, J.; Valenta, M.
Year
2020
Published
Journal of Information and Telecommunication. 2020, 5(1), 104-120. ISSN 2475-1847.
Type
Article
Departments
Annotation
Open-source graph databases currently do not include an inherent mechanism for relocating data based on how the data are used. Often they do not even offer monitoring that could support such a decision. Information about data utilization could, however, serve as an input to a decision-making process that regroups the data more suitably and thereby makes intra-network communication much more efficient. We therefore created a module for the graph computing framework TinkerPop that logs the traffic generated by user queries. The logged records serve as an input to the Adaptive Partitioning algorithm, which we enhanced with better balancing, avoidance of local optima, and support for weighted graphs. This approach yields a 70–80% improvement in intra-network communication, which is comparable to other methods, namely Ja-be-Ja, which offers similar results but has higher computational demands.
Data Lineage Temporally Using a Graph Database
Authors
Pokorny, J.; Sykora, J.; Valenta, M.
Year
2019
Published
MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems. Anchorage, Alaska: IEEE, 2019. p. 285-291. ISBN 978-1-4503-6238-2.
Type
Proceedings paper
Departments
Annotation
The purpose of this paper is to analyze and implement incremental updates of data lineage storage in the software tool Manta Flow. The work covers a study of the current data lineage storage in Manta Flow, a survey of existing solutions for incremental updates in version control systems and for incremental backups in databases, the analysis and design of a new solution for incremental updates in Manta Flow, and a subsequent prototype implementation with performance testing. The resulting prototype can be deployed into the existing Manta Flow product, reducing the time complexity of updates in data lineage storage by orders of magnitude.
Query Optimization on Distributed Graph Database
Authors
Svitáková, L.; Valenta, M.; Pokorný, J.
Year
2019
Published
DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. ISBN 978-80-553-3354-0.
Type
Proceedings paper
Departments
Annotation
Data distribution in graph databases is nowadays usually implemented as random placement of newly arriving data on the individual nodes of a cluster. Efficient use of a graph database, however, often requires regrouping these data so that communication between cluster nodes is as low as possible. We created a module for the TinkerPop framework that collects data about the queries executed over the graph database. These data serve as an input to a redistribution algorithm, which redistributes the data while reducing the required communication between cluster nodes (by 70–80% in the experiment described in the paper) and with relatively low computational demands. We further intend to include other relevant information in the redistribution and to use this information for suitable placement of newly arriving data. In the paper we present our results and introduce the areas we want to pursue further within query optimization.
Graph Patterns Indexes: their Storage and Retrieval
Authors
Valenta, M.; Ramba, J.; Pokorný, J.
Year
2018
Published
iiWAS2018: Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services. New York: ACM, 2018. p. 221-225. ISBN 978-1-4503-6479-9.
Type
Proceedings paper
Departments
Annotation
We propose a method for indexing graph patterns within a graph database modelled as a labelled property graph. The index is organized in a hash table and stored in a different database than the database graph. The method makes it possible to create, use, and update indexes that speed up the process of matching graph patterns. The prototype implementing the method was analyzed for the Neo4j graph database engine. Pattern indexes are stored in the embedded database MapDB. Three graph databases are used for experiments with pattern indexes. The paper provides a comparison between queries with and without indexes.
Indexing Patterns in Graph Databases
Authors
Troup, M.; Valenta, M.; Pokorný, J.
Year
2018
Published
Proceedings of the 7th International Conference on Data Science, Technology and Applications. Porto: SciTePress - Science and Technology Publications, 2018. p. 313-321. vol. 1. ISBN 978-989-758-318-6.
Type
Proceedings paper
Departments
Annotation
Graphs have become very popular in domains such as social media analytics, healthcare, natural sciences, BI, networking, and graph-based bibliographic IR. Graph databases (GDBs) allow simple and rapid retrieval of complex graph structures that are difficult to model in traditional IS based on a relational DBMS. GDBs are designed to exploit relationships in data, which means they can uncover patterns that are difficult to detect using traditional methods. We introduce a new method for indexing graph patterns within a GDB modelled as a labelled property graph. The index is organized in a tree structure and stored in the same database as the database graph. The method is analysed and implemented for the Neo4j GDB engine. It makes it possible to create, use, and update indexes that speed up the process of matching graph patterns. The paper provides a comparison between queries with and without indexes.
Towards OntoUML for Software Engineering: Experimental Evaluation of Exclusivity Constraints in Relational Databases
Authors
Year
2018
Published
Model and Data Engineering. Springer, Cham, 2018. p. 58-73. vol. 1. ISSN 0302-9743. ISBN 978-3-030-00855-0.
Type
Proceedings paper
Departments
Annotation
The model-driven development approach to software engineering requires precise models that define as much of the system as possible. OntoUML is a conceptual modelling language based on the Unified Foundational Ontology, which provides constructs for creating ontologically well-founded and precise conceptual models. In the approach we use, OntoUML serves to build a conceptual model of the software application data, and this model is then transformed into its realization in a relational database. In these transformations, the implicit constraints defined by the various OntoUML universal types and relations are realized by database views and triggers. In this paper, we specifically discuss the realization of phase partitions of Phase types from the OntoUML model by exclusive associations and provide an experimental evaluation of this approach.
Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework
Authors
Hricov, R.; Šenk, A.; Kroha, P.; Valenta, M.
Year
2017
Published
Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. Springer, Cham, 2017. p. 28-41. ISSN 1865-0929. ISBN 978-3-319-58274-0.
Type
Proceedings paper
Departments
Annotation
In this contribution, we present our approach to querying an XML document stored in a distributed system. The main goal of the paper is to describe how the Spark SQL framework can be used to implement a subset of expressions from the XPath query language. We introduce and compare five different methods within our approach, and in doing so we also demonstrate the current state of query optimization on the Spark SQL platform, which can be regarded as a further contribution of the paper. The supported subset of XPath expressions contains all XPath axes except the attribute and namespace axes; predicates are not implemented in our prototype. We present our implemented system, data, measurements, tests, and results. The evaluated results support our belief that our method significantly decreases the data transfers that occur in the distributed system during query evaluation.
Experiences with data lineage metadata storing in relational and graph database
Authors
Quast, K.; Valenta, M.
Year
2017
Published
DATESO 2017. Praha: CTU. Czech Technical University Publishing House, 2017. p. 14-20. ISBN 978-80-01-06138-1.
Type
Proceedings paper
Departments
Annotation
We have spent the last few years researching the possibilities of storing and managing metadata for data lineage. We tried using a relational database and a special category of NoSQL databases, the graph database. This paper describes our experience with two competing products for data lineage and with the options for storing and refreshing data lineage metadata. We present research in progress, including benchmarks, and describe several challenges we are currently working on at the request of customers, namely adding a temporal dimension and multiple hierarchical views, such as logical/physical.
Integrity constraints in graph databases
Authors
Valenta, M.; Kovačič, J.; Pokorný, J.
Year
2017
Published
8th International Conference on Ambient Systems, Networks and Technologies, ANT-2017 and the 7th International Conference on Sustainable Energy Information Technology, SEIT 2017. San Sebastian: Elsevier, 2017. p. 975-981. vol. 109. ISSN 1877-0509.
Type
Proceedings paper
Departments
Annotation
Support for integrity constraints (ICs) is still being developed for graph databases. One way to design ICs is to consider a graph conceptual schema and a graph database schema. At least the inherent ICs coming from a graph conceptual schema should be expressed as explicit ICs at the graph database level, i.e., using a DDL. In the paper, we focus on the graph database Neo4j and its possibilities for expressing a database schema and ICs. We extend these possibilities with new constructs in the Neo4j DDL, including their prototype implementation and experiments.
Graph databases as a storage for data-lineage metadata - experience and challenges
Authors
Quast, K.; Valenta, M.
Year
2016
Published
Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 187-192. ISBN 978-80-227-4619-9.
Type
Proceedings paper
Departments
Annotation
In a project concerned with data lineage, we started using a graph database instead of a relational one as the metadata store three years ago. Since then, the number of installations has grown, the data volume has increased, and customer requirements have changed. We addressed the temporal dimension of the store as well as multiple possible views of the data hierarchy, that is, adding, for example, a logical/conceptual hierarchy alongside the physical hierarchy, which is derived from analysis of the relevant data dictionaries. Graph databases appear to be a promising solution for this type of task. In the paper we describe the requirements on a metadata store for data lineage, share our own experience with a practical implementation, and outline further directions of development and the challenges associated with them.
Minimization of Data Transfers during MapReduce Computations in Distributed Wide-Column Stores
Authors
Šenk, A.; Hrstka, M.; Kroha, P.; Valenta, M.
Year
2016
Published
New Trends in Databases and Information Systems. Wien: Springer, 2016. pp. 261-274. vol. 637. ISSN 1865-0929. ISBN 978-3-319-44065-1.
Type
Proceedings paper
Departments
Annotation
In this contribution, we present our original approach to tuning distributed wide-column store databases, based on data locality optimization. The main goal of the optimization is to reduce communication overhead in a distributed environment during Map-Reduce query evaluation. The optimization is realized by minimizing the total number of key-value pairs emitted from mappers.
To achieve this goal, we combine several Map-Reduce optimization methods, adapt them to the wide-column store model, and use them to overcome architectural limitations. To prove our idea, we implemented the proposed solution in the HBase system, which represents this class of DBMS. We present our data, measurements, and tests. The evaluated results support our idea that this method can significantly decrease data transfers in the distributed system.
Benefits of easy extensible OO design - ECA module in CellStore project
Authors
Šenk, A.; Valenta, M.
Year
2012
Published
Objekty 2012. Praha: Vysoká škola manažerské informatiky a ekonomiky, a.s., 2012. pp. 4-15. ISBN 978-80-86847-63-4.
Type
Proceedings paper
Departments
Annotation
CellStore is an XML-native database management system written in the Smalltalk programming language. The main idea of the project is to focus on a clear object design with an extensible structure. This approach brings a big advantage: new modules are easy to create, which allows us to test and study new technologies and ideas.
The ECA paradigm is well known from SQL databases, where it is used in the feature called triggers. As XML-native database management systems become a part of real-world information systems, their developers try to adopt more features known from the SQL world. Triggers are one of them.
In this paper we describe CellStore's object-oriented design and its APIs. We use the experience gained from developing a new XML trigger module to demonstrate the possibilities and benefits that the CellStore SDK brings to developers and programmers.
Student surveys as a source of information
Authors
Součková, M.; Valenta, M.; Topinková, P.
Year
2012
Published
Proceedings of the 15th International Conference on Interactive Collaborative Learning and 41st International Conference on Engineering Pedagogy. Vienna: IEEE Industrial Electronic Society, 2012. ISBN 978-1-4673-2427-4.
Type
Proceedings paper
Departments
Annotation
The results of student surveys provide us with an interesting source of valuable information. The same data that is used for the evaluation of current courses and teaching quality can also be used to detect broader trends.
A Large Amount of Final Projects Effectively Processed with Minimal Software Requirements - Open Source and Platform Independent Solution: A Case Study.
Authors
Year
2009
Published
Proceedings of the First International Conference on Computer Supported Education. Setúbal: INSTICC Press, 2009. pp. 236-241. ISBN 978-989-8111-82-1.
Type
Proceedings paper
Departments
Annotation
We present a simple way to set up and process a set of final projects. The method uses only standard open-source technologies (particularly XML and XSLT). It was applied to approximately 1000 final projects in the course Database system. Both students and teachers find this simple approach useful.
On Benchmarking Transaction Managers
Authors
Strnad, P.; Valenta, M.
Year
2009
Published
Database Systems for Advanced Applications DASFAA 2009 International Workshops: BenchmarX, MCIS, WDPP, PPDA, MBC, PhD, Brisbane, Australia, April 20 - 23, 2009. Berlin: Springer, 2009. pp. 79-92. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-04204-1.
Type
Proceedings paper
Departments
Annotation
We describe an approach to measuring the performance of a transaction manager. We design a very simple benchmark intended for evaluating this important component of a DB engine and then apply it to our own transaction manager implementation. We also describe the implementation of the transaction manager itself. It is built as a software layer over the eXist database engine, as a standalone module that can be used to extend eXist with transactional processing when needed.