doc. Ing. Jan Janoušek, Ph.D.

Head of the Department of Theoretical Computer Science

Theses

Dissertation theses

Arbology

Level
Topic of dissertation thesis
Topic description

Formal tree languages, formalisms for their generating and description, and models of computations for processing tree languages are one of the areas of the theory formal languages and automata. Practical related research topics can be effective algorithms for various problems of processing trees, such as pattern matching, indexing or computing repeats in trees. Various types of trees are considered, such as ordered and unordered trees, or ranked nad unranked trees. Classes of formal tree languages that are considered can be regular or context-free tree languages. Models of computation for processing trees can be various kinds of tree automata or pushdown automata reading linear notations of trees. The aim of the Ph.D. study is to develop the theory of formal tree languages and of effective algorithms for sequential and parallel processing of trees.

Bachelor theses

Implementing Index of Trees for Tree Patterns

Author
Lukáš Vozáb
Year
2016
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
This bachelor thesis deals with implementation of indexing and matching patterns in tree structures. This new method uses compact suffix automaton, which comes with significant drop in memory cost. Matching phase is able to provide a set of occurrences in time not dependent on total amount of data.

Implementing Faster LR Parser

Author
Jakub Doupal
Year
2016
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
This bachelor thesis deals with implementing faster LR parser for automata library, which is being developed at Department of Theoretical Computer Science at Faculty of Information Technology at CTU in Prague. This library offers experimentation and work with automata, tree structures, languages, and grammars and numerous algorithms concerning the aforementioned. The library is constantly being developed and after its completion, it will be ready to use as a study tool in courses such as BI-AAG, BI-PJP, or MI-SYP. While LR parser is used for work with context-free LR grammars, generalized LR parser can be used to work with ambiguous grammars as well. Because of the number of stack operations in standard LR parsing, there is demand for speeding the analysis up. This can be done by applying the faster LR analysis method, which reduces the number of stack operations in exchange for space. This can be taken advantage of by generalized LR analysis. The goal of this thesis is to study the automata library and relevant algorithms and to implement the faster LR analysis.

Simple Semantic Web Search Engine

Author
Václav Dajbych
Year
2012
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Petr Šlegr

Analysis of Dynamic SQL Code in Manta Project

Author
Artur Nasyrov
Year
2018
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Summary
The aim of this work is to analyze the typical cases of dynamic SQL usage in the data warehouses and to extend the Manta project with a set of strategies which allow dynamic SQL analysis in some cases. The objective of this work is finding the typical cases of dynamic SQL usage in the data warehouses, the design of improvement of the existing set of strategies and subsequent implementation and testing.

Visualisation of finite automata

Author
Petr Lessner
Year
2012
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Martin Poliak, Ph.D.

A formal specification of template language Latte and the implemention of a plugin for platform IntelliJ IDEA

Author
Jan Tvrdík
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Radomír Polách

Simulation of a nondeterministic pushdown automaton for tree indexing

Author
Jan Milík
Year
2012
Type
Bachelor thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.

Master theses

Nonlinear Tree Pattern Matching

Author
Prokop Boček
Year
2012
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.

Security Code Study in LLVM Compiler Infrastructure

Author
Sarmad Neamah Mohsen Al-Fatla
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Tomáš Zahradnický, Ph.D.

Indexing XML Documents

Author
Eliška Šestáková
Year
2015
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
The theory of text indexing is very well-researched, which does not hold for theories of indexing other data structures, such as trees for example. In this thesis we review existing techniques for indexing texts and trees and study state-of-the-art methods for indexing XML documents. We show that automata can be used effectively for the purpose of indexing XML documents. A new and simple method for indexing XML documents using deterministic finite automaton is introduced. The presented method supports a significant fragment of XPath queries which may use any combination of child (/) and descendant-or-self (//) axes. We also propose another two indexing techniques based on finite automata, aimed to assist in evaluating paths queries with either / or // axis only. Given a subject XML document D and its corresponding XML tree model T with n nodes, the tree is preprocessed and the index is constructed. The searching phase uses the index, reads an input query Q of size m and computes the list of positions of all occurrences of target nodes of Q in T. All the proposed automata performed the searching in time O(m) and do not depend on n. Although the automaton that supports all linear XPath queries where just // axis is used evaluates O(2^n) distinct queries, number of states of the deterministic automaton is O(h^k), where h is the height of T and k is the number of its leaf nodes. Moreover, we discuss that in case of indexing a common XML document the number of state in the deterministic finite automaton is at most O(h.2^k).

Error Recovery in Tools for Automatically Analysing Program Artefacts

Author
Jakub Valenta
Year
2013
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.

Generating Target Code Using an Index of AST

Author
Jaroslav Málek
Year
2013
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Ondřej Guth, Ph.D.

Searching occurrences of tree patterns in ordered trees with the use of indexes of the trees

Author
Jan Milík
Year
2016
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
We compare two existing schemes for indexing trees, one based on a nondeterministic factor automaton, the other on deterministic compact suffix automaton. A third scheme is presented using position heaps - a relatively new data structures. As a side product, algorithm for converting suffix trees to position heaps and a new data structure based on the position heap is briefly sketched out. The three schemes are implemented and their running times measured. For most inputs, the third, position heap based scheme is found to be the fastest with minimal trade-off in the form of a small number of false positives.

Indexing Trees for Tree Patterns with Hamming Distance

Author
Alžběta Zyková
Year
2016
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
This thesis deals with indexing the trees for the tree patterns with Hamming distance. Indexing a tree in prefix notation can be regarded as indexing a text, where the tree patterns correspond to specific subsequences of the text. Indexing texts for approximate matching can be solved by many methods. First, the thesis deals with existing methods for indexing trees. Further, the thesis contains a survey of existing methods of indexing a text for approximate matching. The thesis introduces a new method of indexing trees for tree patterns with Hamming distance. The method is based on an existing method of indexing a tree for tree patterns. Indexes of texts for approximate patterns are used, and these indexes have been modified for using for trees. Given a tree of size n, and tree pattern of size m. The resulting method can search occurences of the pattern in O( km + sum(occ(Pi))) with an index size O(n logk n).

Finding repeats in XML

Author
Petr Vaněk
Year
2012
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Martin Poliak, Ph.D.

Incremental analysis of PL/SQL scripts

Author
Jan Strnad
Year
2015
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Summary
This thesis deals with improvement of transformation mechanism in \acrlong{mt} system. Transformation enables automatic modification of \acrshort{plsql} code, represented using an \acrshort{ast}, such that it conforms to some rules. Typical example is conversion of one syntactical construct to another, semantically equal one. Conversion of DECODE function to CASE expression is a such example. As a consequence of the transformation, there is a loss of information in the AST representation. Original solution used to recover this information consisted of complete syntactical and semantics reanalysis of the transformed script. This thesis analyses, designs and implements incremental solution. Implemented solution is completely transparent and does not require any additional changes of existing transformations. The effect of incremental processing was experimentally measured on several typical transformations. Very significant time improvement was achieved, in some cases up to hundreds of times.

Checking the compatibility of data types in Oracle SQL

Author
Ondřej Pelech
Year
2015
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Radomír Polách
Summary
Programs in the Oracle SQL language can go wrong if they contain mistakes. In this thesis we present a method for discovering one particular kind of mistakes -- incompatibility of datatypes. We analyze the datatypes and built-in functions in the Oracle SQL dialect. Then we design a method for compatibility checking that works on the dataflow graph. We implement the checker in Java.