prof. Dr. Ing. Petr Kroha, CSc.

Theses

Dissertation theses

Text extraction

Level
Topic of dissertation thesis
Topic description

Text extraction is used in many fields to shorten a text by omitting the parts that are not relevant to the user; text summarization is one example. The topic belongs to text mining, an area with many open problems. In our published work, we used techniques of linguistic analysis of sentences (part-of-speech tagging) and clustering according to subsets of sentences (chunking) to analyze the texts of functional requirements specifications of software products. We generated questions from the texts and analyzed the answers to improve the specification of functional requirements. The aim of the work is to apply the proven techniques from our publications to text extraction, compare them with the current results of published methods, and find new, better methods.

Master theses

Using Neo4j DB system to store and query linguistic pattern

Author
Vigneshwar Manoharan
Year
2022
Type
Master thesis
Supervisor
prof. Dr. Ing. Petr Kroha, CSc.
Reviewers
Ing. David Šenkýř
Summary
In this thesis, I present my implementation of storing linguistic patterns as an oriented graph in a Neo4j database and querying it to retrieve matching patterns. The implementation is intended for a text mining activity that grammatically inspects unstructured text, primarily using part-of-speech tagging and dependency parsing between the words of a sentence, to detect inaccuracies caused by ambiguity, incompleteness, and inconsistency. This process uses a pattern-based recognition method: it identifies the patterns in a text and matches them against the defined patterns to detect inaccuracies. Since the textual patterns of a sentence are represented as an oriented graph, they are stored in the Neo4j database, with words, parts of speech, and punctuation as nodes and the dependencies between nodes as relationships. The query (sentence pattern) is then matched against the predefined stored patterns to check which of them are subgraphs of the query. The results will be used in a later stage of the text mining process to detect and fix the inaccuracies in a text.
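The subgraph check described in the summary above can be illustrated with a minimal sketch. It is a plain-Python stand-in, not the thesis's Neo4j implementation: the example sentence, the tag and dependency labels, and the function name `find_subgraph` are all assumptions made for illustration, and the brute-force search is only practical for very small graphs.

```python
from itertools import permutations

# Hypothetical dependency graph for the sentence "The system sends reports".
# Nodes map an id to a part-of-speech tag; edges are (head, relation, dependent).
SENTENCE_NODES = {1: "DET", 2: "NOUN", 3: "VERB", 4: "NOUN"}
SENTENCE_EDGES = [(2, "det", 1), (3, "nsubj", 2), (3, "obj", 4)]

# Hypothetical stored pattern: a verb that has a noun as its subject.
PATTERN_NODES = {"v": "VERB", "s": "NOUN"}
PATTERN_EDGES = [("v", "nsubj", "s")]


def find_subgraph(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Return a mapping pattern id -> graph id if the pattern occurs as a
    subgraph of the graph (same tags, same labelled edges), else None."""
    pattern_ids = list(pattern_nodes)
    edge_set = set(graph_edges)
    # Try every injective assignment of pattern nodes to graph nodes.
    for candidate in permutations(graph_nodes, len(pattern_ids)):
        mapping = dict(zip(pattern_ids, candidate))
        # Node labels (part-of-speech tags) must agree.
        if any(pattern_nodes[p] != graph_nodes[mapping[p]] for p in pattern_ids):
            continue
        # Every pattern edge must exist in the graph with the same label.
        if all((mapping[a], rel, mapping[b]) in edge_set
               for a, rel, b in pattern_edges):
            return mapping
    return None


match = find_subgraph(PATTERN_NODES, PATTERN_EDGES, SENTENCE_NODES, SENTENCE_EDGES)
```

In a graph database the same question would be expressed as a graph query rather than an explicit search; the sketch only shows what "the stored pattern is a subgraph of the sentence pattern" means.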

Generating of UML entities from textual requirements specifications

Author
David Šenkýř
Year
2017
Type
Master thesis
Supervisor
prof. Dr. Ing. Petr Kroha, CSc.
Reviewers
Mgr. Ondřej Dvořák
Summary
The quality of requirements engineering plays an important role in the whole development life cycle of every software project, because the other phases depend on it. Writing requirements specifications in natural language is a common practice. Natural language is, unfortunately, prone to a number of inaccuracies such as ambiguity, inconsistency, and incompleteness. This thesis presents a CASE tool called TEMOS that can generate fragments of a UML class model from a textual requirements specification and also helps the user detect some inaccuracies in the text.

Evaluation of Data from The Viewpoint of Chaos Theory

Author
Miroslav Škoula
Year
2018
Type
Master thesis
Supervisor
prof. Dr. Ing. Petr Kroha, CSc.
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
Time series that describe the behavior of markets contain a mixture of trends and chaotic segments. The goal of this thesis is to find out whether a new indicator based on variables that measure chaos (e.g., the Hurst exponent) can be included in technical analysis, and how much profit it can bring.
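As an illustration of the kind of quantity mentioned above, the following sketch estimates the Hurst exponent with rescaled range (R/S) analysis, one common estimator. The summary does not say which estimator the thesis uses, so the method, the function names, and the window sizes are assumptions. Values near 0.5 suggest uncorrelated increments, while higher values suggest persistent, trending behavior.

```python
import math


def rescaled_range(chunk):
    """R/S statistic of one window: range of the cumulative deviations
    from the mean, divided by the standard deviation of the window."""
    mean = sum(chunk) / len(chunk)
    cumulative, lowest, highest, squares = 0.0, 0.0, 0.0, 0.0
    for value in chunk:
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
        squares += (value - mean) ** 2
    std = math.sqrt(squares / len(chunk))
    return (highest - lowest) / std if std > 0 else None


def hurst_exponent(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against
    log(window size), using window sizes that double each step."""
    n = len(series)
    points = []
    window = min_window
    while window <= n // 2:
        values = []
        # Split the series into non-overlapping windows of this size.
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            points.append((math.log(window),
                           math.log(sum(values) / len(values))))
        window *= 2
    if len(points) < 2:
        raise ValueError("series too short for the chosen window sizes")
    # Ordinary least-squares slope through the (log window, log R/S) points.
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator
```

For market data the function would typically be applied to a series of returns rather than raw prices. The estimate is known to be biased for short series, so it is best read as a rough indicator.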