prof. Dr. Ing. Petr Kroha, CSc.

+420224357972
petr.kroha@fit.cvut.cz
TH:A-953

Theses

Dissertation theses

Text extraction

Supervisor

prof. Dr. Ing. Petr Kroha, CSc.

Level

Topic of dissertation thesis

Topic description

Text extraction is used in many fields to shorten a text by omitting the parts that are not relevant to the user; text summarization is one example. The topic belongs to text mining, an area with many open problems. In our published work, we used linguistic analysis of sentences (part-of-speech tagging) and clustering of words into subsets of a sentence (chunking) to analyze the texts of functional requirements specifications of software products. We generated questions from the texts and analyzed the answers to improve the specification of functional requirements. The aim of the work is to apply the proven techniques from our publications to text extraction, compare them with the current results of published methods, and find new, better methods.
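As a rough illustration of the chunking step mentioned above, the following minimal sketch groups part-of-speech-tagged words into noun-phrase chunks. It is not the method from the publications: the tags are assigned by hand, the tag names (DET, ADJ, NOUN, ...) and the function name `noun_phrase_chunks` are assumptions made for this example, and a real pipeline would obtain the tags from a tagger.

```python
def noun_phrase_chunks(tagged):
    """Group (word, tag) pairs into chunks of the form DET? ADJ* NOUN+.

    The tag set is a hypothetical, simplified one chosen for illustration.
    """
    chunks = []
    i = 0
    n = len(tagged)
    while i < n:
        j = i
        # Optional determiner at the start of the chunk.
        if tagged[j][1] == "DET":
            j += 1
        # Any number of adjectives.
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        # One or more nouns are required to close the chunk.
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


# Hand-tagged requirement sentence (the tags are assumptions, not tagger output).
sentence = [
    ("The", "DET"), ("system", "NOUN"), ("shall", "AUX"), ("send", "VERB"),
    ("a", "DET"), ("monthly", "ADJ"), ("report", "NOUN"), ("to", "ADP"),
    ("the", "DET"), ("administrator", "NOUN"),
]
print(noun_phrase_chunks(sentence))
# ['The system', 'a monthly report', 'the administrator']
```

Chunks of this kind are candidates for the entities a requirements text talks about, which is why they are a natural starting point for deciding which parts of a text to keep or to ask about.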

Master theses

Using Neo4j DB system to store and query linguistic pattern

Author

Vigneshwar Manoharan

Year

2022

Type

Master thesis

Supervisor

prof. Dr. Ing. Petr Kroha, CSc.

Reviewers

Ing. David Šenkýř

Department

Department of Software Engineering

Summary

In this thesis, I present my implementation of storing linguistic patterns as an oriented graph in a Neo4j database and querying it to retrieve matching patterns. The implementation is intended for a text mining activity that grammatically inspects unstructured text, primarily through part-of-speech tagging and dependency parsing of the words in each sentence, to detect inaccuracies caused by ambiguity, incompleteness, and inconsistency. This process uses pattern-based recognition: it identifies the patterns in a text and then matches them against the defined patterns to detect inaccuracies. Since the textual pattern of a sentence is represented as an oriented graph, it is stored in the Neo4j database, with words, parts of speech, and punctuation as nodes and the dependencies between nodes as relationships. The query (sentence pattern) is then matched against the predefined stored patterns to check which of them are subgraphs of the query. The results will be used in a later stage of the text mining process to detect and fix the inaccuracies in the text.
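The subgraph check described in the summary can be sketched in memory, without a database. The example below is an illustrative stand-in, not the thesis implementation: it uses plain Python dictionaries instead of Neo4j, and the node labels, the relation names (written in the style of Universal Dependencies), and the function name `find_match` are all assumptions made for this sketch.

```python
def find_match(pattern_nodes, pattern_edges, sent_nodes, sent_edges):
    """Return a mapping pattern node -> sentence node if the pattern is a
    subgraph of the sentence graph, otherwise None.

    Nodes are dicts {node_id: label}; edges are (source, relation, target).
    """
    pattern_ids = list(pattern_nodes)
    sentence_edges = set(sent_edges)

    def extend(mapping, index):
        if index == len(pattern_ids):
            return dict(mapping)
        p = pattern_ids[index]
        for s, label in sent_nodes.items():
            # Labels must agree and each sentence node is used at most once.
            if label != pattern_nodes[p] or s in mapping.values():
                continue
            mapping[p] = s
            # Every pattern edge with both ends mapped must exist in the sentence.
            consistent = all(
                (mapping[a], rel, mapping[b]) in sentence_edges
                for a, rel, b in pattern_edges
                if a in mapping and b in mapping
            )
            if consistent:
                result = extend(mapping, index + 1)
                if result is not None:
                    return result
            del mapping[p]  # backtrack
        return None

    return extend({}, 0)


# Sentence graph for "The system sends reports" (hand-built, hypothetical labels).
sent_nodes = {1: "DET", 2: "NOUN", 3: "VERB", 4: "NOUN"}
sent_edges = [(2, "det", 1), (3, "nsubj", 2), (3, "obj", 4)]

# Stored pattern: a verb with an explicit (active) subject.
active_nodes = {"v": "VERB", "s": "NOUN"}
active_edges = [("v", "nsubj", "s")]

# Stored pattern: a verb with a passive subject, which this sentence lacks.
passive_edges = [("v", "nsubj:pass", "s")]

print(find_match(active_nodes, active_edges, sent_nodes, sent_edges))
# {'v': 3, 's': 2}
print(find_match(active_nodes, passive_edges, sent_nodes, sent_edges))
# None
```

A graph database is attractive for this task because the same check can be phrased as a declarative pattern query and the database then performs the search, whereas the backtracking above has to be written and tuned by hand.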

Thesis on DSpace

Generating of UML entities from textual requirements specifications

Author

David Šenkýř

Year

2017

Type

Master thesis

Supervisor

prof. Dr. Ing. Petr Kroha, CSc.

Reviewers

Mgr. Ondřej Dvořák

Department

Department of Software Engineering

Summary

The quality of requirements engineering plays an important role in the whole development life cycle of every software project, because the other phases depend on it. Writing requirements specifications in natural language is common practice. Unfortunately, natural language is prone to inaccuracies such as ambiguity, inconsistency, and incompleteness. This thesis presents a CASE tool called TEMOS that generates fragments of the UML class model from textual requirements specifications and also helps the user detect some inaccuracies in the text.

Thesis on DSpace

Evaluation of Data from The Viewpoint of Chaos Theory

Author

Miroslav Škoula

Year

2018

Type

Master thesis

Supervisor

prof. Dr. Ing. Petr Kroha, CSc.

Reviewers

Ing. Daniel Vašata, Ph.D.

Department

Department of Software Engineering

Summary

Time series that describe the behavior of markets contain a mixture of trends and chaotic segments. The goal of this thesis is to determine whether a new indicator based on variables that measure chaos (e.g., the Hurst exponent) can be included in technical analysis, and how much profit it can bring.
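For orientation, one common way to estimate the Hurst exponent is rescaled-range (R/S) analysis: values near 0.5 indicate an uncorrelated series, values above 0.5 a persistent (trending) one, and values below 0.5 an anti-persistent one. The sketch below is a minimal illustration under that assumption; the summary does not say which estimator or indicator the thesis uses, and the names `rescaled_range`, `hurst_exponent`, and `min_window` are hypothetical.

```python
import math
import random


def rescaled_range(chunk):
    """Return the R/S statistic of one window, or None if it is constant."""
    mean = sum(chunk) / len(chunk)
    cumulative, total = [], 0.0
    for x in chunk:
        total += x - mean
        cumulative.append(total)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / len(chunk))
    return spread / std if std > 0 else None


def hurst_exponent(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window)."""
    n = len(series)
    xs, ys = [], []
    window = min_window
    while window <= n // 2:
        values = []
        # Non-overlapping windows of the current size.
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(window))
            ys.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(xs) < 2:
        raise ValueError("series too short for the chosen min_window")
    # Ordinary least-squares slope of ys on xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic data only: uncorrelated Gaussian noise standing in for price changes.
random.seed(0)
changes = [random.gauss(0.0, 1.0) for _ in range(1024)]
print(round(hurst_exponent(changes), 2))
```

On short series the R/S estimate is known to be biased upward, so an indicator built on it would typically compare values across time or against a baseline rather than rely on the absolute number.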

Thesis on DSpace