Ing. Michal Valenta, Ph.D.

Instantiation of OntoUML Models in Multi-labeled Graph Databases

Authors

Pokorný, J.; Rybola, Z.; Valenta, M.; Zikán, J.; Pergl, R.

Year

2024

Published

PROCEEDINGS OF THE 17th IADIS INTERNATIONAL CONFERENCE INFORMATION SYSTEMS 2024. Lisboa: IADIS Press, 2024. p. 78-86. ISBN 978-989-8704-56-6.

Type

Proceedings paper

DOI

10.33965/is2024_202401l010

Departments

Department of Software Engineering

Annotation

The importance of conceptual modeling in software engineering is constantly growing along with the complexity of business domains. To handle this increasing complexity, it is essential to use modeling notations that provide a high level of abstraction and expressiveness. For this reason, our research focuses on ontology-driven conceptual modeling and the OntoUML notation. In this paper, we present the main ideas behind our novel approach to OntoUML model instantiation using multi-labeled property graph databases, such as Neo4j. Our approach should not only help ensure data integrity in graph databases but also give domain experts a convenient way to verify the correctness and completeness of their conceptual models. Although approaches to instantiating OntoUML models in relational databases already exist, our research shows that multi-labeled property graph databases are better suited to this purpose, as their versatile data model corresponds more closely to the essential principles of OntoUML.

Effective Data Redistribution Based on User Queries in a Distributed Graph Database

Authors

Svitáková, L.; Valenta, M.; Pokorný, J.

Year

2020

Published

Intelligent Information and Database Systems. Cham: Springer, 2020. p. 218-229. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-030-42057-4.

Type

Proceedings paper

DOI

10.1007/978-3-030-42058-1_18

Departments

Department of Software Engineering

Annotation

The problem of data distribution in NoSQL databases is particularly difficult in the case of graph databases, since the data often represent a large, highly connected graph. We approach this task by monitoring user queries, for which we created a logging module. The logged information serves as input to a redistribution algorithm that is based on the lightweight method of Adaptive Partitioning but incorporates our enhancements, which overcome its present drawbacks (local optima, balancing, edge weights). The results of our experiments show a 70–80% reduction of communication between cluster nodes, which is comparable to other methods that, however, are more computationally demanding or suffer from other shortcomings.

Enhanced adaptive partitioning in a distributed graph database

Authors

Svitáková, L.; Pokorný, J.; Valenta, M.

Year

2020

Published

Journal of Information and Telecommunication. 2020, 5(1), 104-120. ISSN 2475-1847.

Type

Article

DOI

10.1080/24751839.2020.1829387

Departments

Department of Software Engineering

Annotation

Nowadays, open-source graph databases do not include an inherent mechanism for data relocation based on data usage. Often they do not even offer appropriate monitoring that could help to make such a decision. Information about data utilization could, however, serve as input to a decision-making process about a more suitable data regrouping that could be much more efficient in terms of intra-network communication. Therefore, we created a module for the graph computational framework TinkerPop that logs traffic generated by user queries. These logged records serve as input for the Adaptive Partitioning algorithm, which we enhanced with better balancing, avoidance of local optima, and the notion of weighted graphs. This approach yields a 70–80% improvement in intra-network communication, which is comparable to other methods, namely Ja-be-Ja, which offers similar results but has higher computational demands.

Data Lineage Temporally Using a Graph Database

Authors

Pokorny, J.; Sykora, J.; Valenta, M.

Year

2019

Published

MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems. Anchorage, Alaska: IEEE, 2019. p. 285-291. ISBN 978-1-4503-6238-2.

Type

Proceedings paper

DOI

10.1145/3297662.3365794

Departments

Department of Software Engineering

Annotation

The purpose of this paper is to analyze and implement incremental updates of data lineage storage in the software tool Manta Flow. The work is based on a study of the current data lineage storage in Manta Flow, research into existing solutions for incremental updates in version control systems, research into incremental backups in databases, the analysis and design of a new solution for incremental updates in Manta Flow, and a subsequent prototype implementation with performance testing. The resulting prototype can be deployed into the existing Manta Flow product, reducing the time complexity of updates in data lineage storage by orders of magnitude.

Query Optimization on Distributed Graph Database

Authors

Svitáková, L.; Valenta, M.; Pokorný, J.

Year

2019

Published

DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. ISBN 978-80-553-3354-0.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Data distribution in graph databases is today usually implemented as a random placement of newly arriving data on the individual cluster nodes. Efficient use of a graph database, however, often requires regrouping these data so that communication between cluster nodes is as low as possible. We created a module for the TinkerPop framework that collects data about the queries executed over the graph database. These data serve as input to a redistribution algorithm, which redistributes the data while reducing the required communication between cluster nodes (by 70–80% in the experiment described below) and with relatively low computational demands. We want to include further relevant information in the redistribution, and also to use this information for a suitable placement of newly arriving data. In this paper, we present our results and introduce the areas we want to pursue further within query optimization.

Graph Patterns Indexes: their Storage and Retrieval

Authors

Valenta, M.; Ramba, J.; Pokorný, J.

Year

2018

Published

iiWAS2018: Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services. New York: ACM, 2018. p. 221-225. ISBN 978-1-4503-6479-9.

Type

Proceedings paper

DOI

10.1145/3282373.3282374

Departments

Department of Software Engineering

Annotation

We propose a method for indexing graph patterns within a graph database. A graph database consists of a labelled property graph. The index is organized in a hash table and stored in a different database from the one holding the database graph. The method makes it possible to create, use, and update indexes that speed up the process of matching graph patterns. The prototype implementing the method was analyzed for the Neo4j graph database engine. Pattern indexes are stored in the embedded database MapDB. Three graph databases are used for experiments with pattern indexes. The paper provides a comparison between queries with and without indexes.

Indexing Patterns in Graph Databases

Authors

Troup, M.; Valenta, M.; Pokorný, J.

Year

2018

Published

Proceedings of the 7th International Conference on Data Science, Technology and Applications. Porto: SciTePress - Science and Technology Publications, 2018. p. 313-321. vol. 1. ISBN 978-989-758-318-6.

Type

Proceedings paper

DOI

10.5220/0006826903130321

Departments

Department of Software Engineering

Annotation

Nowadays, graphs have become very popular in domains like social media analytics, healthcare, natural sciences, BI, networking, graph-based bibliographic IR, etc. Graph databases (GDB) allow simple and rapid retrieval of complex graph structures that are difficult to model in traditional IS based on a relational DBMS. GDB are designed to exploit relationships in data, which means they can uncover patterns that are difficult to detect using traditional methods. We introduce a new method for indexing graph patterns within a GDB modelled as a labelled property graph. The index is organized in a tree structure and stored in the same database as the database graph. The method is analysed and implemented for the Neo4j GDB engine. It makes it possible to create, use, and update indexes that speed up the process of matching graph patterns. The paper provides a comparison between queries with and without indexes.

Towards OntoUML for Software Engineering: Experimental Evaluation of Exclusivity Constraints in Relational Databases

Authors

Rybola, Z.; Valenta, M.

Year

2018

Published

Model and Data Engineering. Springer, Cham, 2018. p. 58-73. vol. 1. ISSN 0302-9743. ISBN 978-3-030-00855-0.

Type

Proceedings paper

DOI

10.1007/978-3-030-00856-7_4

Departments

Department of Software Engineering

Annotation

The model-driven development approach to software engineering requires precise models defining as much of the system as possible. OntoUML is a conceptual modelling language based on the Unified Foundational Ontology, which provides constructs to create ontologically well-founded and precise conceptual models. In the approach we utilize, OntoUML is used to create a conceptual model of software application data, and this model is then transformed into its proper realization in a relational database. In these transformations, the implicit constraints defined by various OntoUML universal types and relations are realized by database views and triggers. In this paper, we specifically discuss the realization of phase partitions of Phase types from the OntoUML model by exclusive associations and provide an experimental evaluation of this approach.

Evaluation of XPath Queries Over XML Documents Using SparkSQL Framework

Authors

Hricov, R.; Šenk, A.; Kroha, P.; Valenta, M.

Year

2017

Published

Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. Springer, Cham, 2017. p. 28-41. ISSN 1865-0929. ISBN 978-3-319-58274-0.

Type

Proceedings paper

DOI

10.1007/978-3-319-58274-0_3

Departments

Department of Software Engineering

Annotation

In this contribution, we present our approach to querying an XML document that is stored in a distributed system. The main goal of this paper is to describe how to use the Spark SQL framework to implement a subset of expressions from the XPath query language. Five different methods of our approach are introduced and compared, and in doing so we also demonstrate the current state of query optimization on the Spark SQL platform, which may be taken as a further contribution of our paper. The supported subset of XPath expressions contains all XPath axes except the attribute and namespace axes; predicates are not implemented in our prototype. We present our implemented system, data, measurements, tests, and results. The evaluated results support our belief that our method significantly decreases the data transfers that occur in the distributed system during query evaluation.

Experiences with data lineage metadata storing in relational and graph database

Authors

Quast, K.; Valenta, M.

Year

2017

Published

DATESO 2017. Praha: CTU. Czech Technical University Publishing House, 2017. p. 14-20. ISBN 978-80-01-06138-1.

Type

Proceedings paper

Departments

Department of Theoretical Computer Science
Department of Software Engineering
Department of Computer Systems

Annotation

We have spent the last few years researching possibilities for storing and managing metadata for data lineage. We tried using a relational database and a special category of NoSQL databases, the graph database. This paper describes our experiences with two competing products for data lineage and the possibilities for storing and refreshing data lineage metadata. We present research in progress, including benchmarks, and we are currently working on several challenges requested by customers, i.e., adding a temporal dimension and multiple hierarchical views, such as logical/physical.

Integrity constraints in graph databases

Authors

Valenta, M.; Kovačič, J.; Pokorný, J.

Year

2017

Published

8th International Conference on Ambient Systems, Networks and Technologies, ANT-2017 and the 7th International Conference on Sustainable Energy Information Technology, SEIT 2017. San Sebastian: Elsevier, 2017. p. 975-981. vol. 109. ISSN 1877-0509.

Type

Proceedings paper

DOI

10.1016/j.procs.2017.05.456

Departments

Department of Software Engineering

Annotation

One thing that is still being developed for graph databases is integrity constraint (IC) support. One way to propose ICs is to consider a graph conceptual schema and a graph database schema. At least the inherent ICs coming from a graph conceptual schema should be considered as explicit ICs at the graph database level, i.e., expressed using a DDL. In the paper, we focus on the graph database Neo4j and its possibilities for expressing a database schema and ICs. We extend these possibilities with new constructs in the Neo4j DDL, including their prototype implementation and experiments.

Graph databases as a storage for data-lineage metadata - experience and challenges

Authors

Quast, K.; Valenta, M.

Year

2016

Published

Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 187-192. ISBN 978-80-227-4619-9.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

In a project dealing with so-called data lineage, we replaced a relational database with a graph database as the metadata store three years ago. Since then, the number of installations has grown, the volume of data has increased, and customer requirements have changed. We addressed the temporal dimension of the store and also multiple possible views of the data hierarchy, that is, adding, for example, a logical/conceptual hierarchy alongside the physical hierarchy, which is derived from an analysis of the relevant data dictionaries. Graph databases are proving to be a promising solution for this type of task. In this paper, we outline the requirements for a metadata store for data lineage, share our own experience with its practical implementation, and indicate further directions of development and the challenges associated with them.

Minimization of Data Transfers during MapReduce Computations in Distributed Wide-Column Stores

Authors

Šenk, A.; Hrstka, M.; Kroha, P.; Valenta, M.

Year

2016

Published

New Trends in Databases and Information Systems. Wien: Springer, 2016. pp. 261-274. vol. 637. ISSN 1865-0929. ISBN 978-3-319-44065-1.

Type

Proceedings paper

DOI

10.1007/978-3-319-44039-2_18

Departments

Department of Software Engineering

Annotation

In this contribution, we present our original approach to tuning distributed wide-column store databases based on data locality optimization. The main goal of the optimization is to reduce communication overhead in a distributed environment during Map-Reduce query evaluation. The optimization is realized by minimizing the total number of key-value pairs emitted from mappers. To achieve this goal, we combine several Map-Reduce optimization methods, adapt them to the wide-column store model, and utilize them to overcome architectural limitations. To prove our idea, we implemented the proposed solution in the HBase system, which represents this class of DBMS. We present our data, measurements, and tests. The evaluated results support our idea that this method can significantly decrease data transfers in the distributed system.

Distributed Evaluation of XPath Axes Queries over Large XML Documents Stored in MapReduce Clusters

Authors

Šenk, A.; Valenta, M.; Benn, W.

Year

2014

Published

Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on. Los Alamitos, CA: IEEE Computer Soc., 2014. pp. 253-257. ISSN 1529-4188. ISBN 978-1-4799-5722-4.

Type

Proceedings paper

DOI

10.1109/DEXA.2014.59

Departments

Department of Software Engineering

Annotation

The MR (MapReduce) framework, a programming model for parallel computation over data stored in a cluster of commodity computers, has established itself as one of the leading solutions for Big Data processing. This framework is also used as a query language in many database systems, because it can process data stored in various unstructured, semi-structured, and structured formats. Although the MR framework can also be used for XML data processing, it does not allow queries to be written in a declarative manner, as in XPath or XQuery. To overcome this problem, we propose a system that allows XML data to be queried with XPath while evaluating the queries in parallel using the MR framework. First, we introduce a persistent storage that maps XML data into a wide-column store. The proposed mapping enables efficient and distributed data processing. Secondly, we describe a query processor translating an XPath language subset to MR jobs. Finally, we present tests and their results showing the scalability of our system.

Benefits of easy extensible OO design - ECA module in CellStore project

Authors

Šenk, A.; Valenta, M.

Year

2012

Published

Objekty 2012. Praha: Vysoká škola manažerské informatiky a ekonomiky, a.s., 2012. pp. 4-15. ISBN 978-80-86847-63-4.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

CellStore is an XML-native database management system written in the Smalltalk programming language. The main idea of this project is to focus on a clear object design with an extensible structure. This approach brings a big advantage: easy creation of new modules, which allows us to test and study new technologies and ideas. The ECA paradigm is well known from SQL databases, where it is used in a feature called triggers. As XML-native database management systems become a part of real-world information systems, their developers try to adopt more features known from the SQL world. Triggers are one of them. In this paper, we describe CellStore's object-oriented design and its APIs. We use the experience gained from developing a new XML trigger module to demonstrate the possibilities and benefits that the CellStore SDK brings to developers and programmers.

Student surveys as a source of information

Authors

Součková, M.; Valenta, M.; Topinková, P.

Year

2012

Published

Proceedings of the 15th International Conference on Interactive Collaborative Learning and 41st International Conference on Engineering Pedagogy. Vienna: IEEE Industrial Electronic Society, 2012. ISBN 978-1-4673-2427-4.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

The results of student surveys provide us with an interesting source of valuable information. The same data that is used for the evaluation of current courses and teaching quality can also be used to detect broader trends.

A Large Amount of Final Projects Effectively Processed with Minimal Software Requirements - Open Source and Platform Independent Solution: A Case Study

Authors

Valenta, M.

Year

2009

Published

Proceedings of the First International Conference on Computer Supported Education. Setúbal: INSTICC Press, 2009. pp. 236-241. ISBN 978-989-8111-82-1.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

We present a simple way to set up and process a set of final projects. The method uses only standard open-source technologies (in particular XML and XSLT). It was applied to approximately 1000 final projects in the course Database system. Both students and teachers find this simple approach useful.

On Benchmarking Transaction Managers

Authors

Strnad, P.; Valenta, M.

Year

2009

Published

Database Systems for Advanced Applications DASFAA 2009 International Workshops: BenchmarX, MCIS, WDPP, PPDA, MBC, PhD, Brisbane, Australia, April 20 - 23, 2009. Berlin: Springer, 2009. pp. 79-92. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-04204-1.

Type

Proceedings paper

DOI

10.1007/978-3-642-04205-8_8

Departments

Department of Software Engineering

Annotation

We describe an idea for measuring the performance of a transaction manager. We design a very simple benchmark intended for evaluating this important component of a DB engine. Then we apply it to our own transaction manager implementation. We also describe the implementation of the transaction manager itself. It is built as a software layer over the eXist database engine. It is a standalone module that can be used to extend eXist functionality with transactional processing when needed.