doc. Ing. Jan Janoušek, Ph.D.

Head of the Department of Theoretical Computer Science

Profile
Publications
Projects
Teaching
Theses

Theses

Topics of theses
Sample theses

Dissertation theses

Arbology

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Level

Topic of dissertation thesis

Topic description

Formal tree languages, formalisms for their generating and description, and models of computations for processing tree languages are one of the areas of the theory formal languages and automata. Practical related research topics can be effective algorithms for various problems of processing trees, such as pattern matching, indexing or computing repeats in trees. Various types of trees are considered, such as ordered and unordered trees, or ranked nad unranked trees. Classes of formal tree languages that are considered can be regular or context-free tree languages. Models of computation for processing trees can be various kinds of tree automata or pushdown automata reading linear notations of trees. The aim of the Ph.D. study is to develop the theory of formal tree languages and of effective algorithms for sequential and parallel processing of trees.

Bachelor theses

Simple Semantic Web Search Engine

Author

Václav Dajbych

Year

2012

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Petr Šlegr

Department

Department of Software Engineering

Simulation of a nondeterministic pushdown automaton for tree indexing

Author

Jan Milík

Year

2012

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Visualisation of finite automata

Author

Petr Lessner

Year

2012

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Martin Poliak, Ph.D.

Department

Department of Theoretical Computer Science

A formal specification of template language Latte and the implemention of a plugin for platform IntelliJ IDEA

Author

Jan Tvrdík

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Radomír Polách

Department

Department of Theoretical Computer Science

Implementing Faster LR Parser

Author

Jakub Doupal

Year

2016

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

This bachelor thesis deals with implementing faster LR parser for automata library, which is being developed at Department of Theoretical Computer Science at Faculty of Information Technology at CTU in Prague. This library offers experimentation and work with automata, tree structures, languages, and grammars and numerous algorithms concerning the aforementioned. The library is constantly being developed and after its completion, it will be ready to use as a study tool in courses such as BI-AAG, BI-PJP, or MI-SYP. While LR parser is used for work with context-free LR grammars, generalized LR parser can be used to work with ambiguous grammars as well. Because of the number of stack operations in standard LR parsing, there is demand for speeding the analysis up. This can be done by applying the faster LR analysis method, which reduces the number of stack operations in exchange for space. This can be taken advantage of by generalized LR analysis. The goal of this thesis is to study the automata library and relevant algorithms and to implement the faster LR analysis.

Thesis on DSpace

Implementing Index of Trees for Tree Patterns

Author

Lukáš Vozáb

Year

2016

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

This bachelor thesis deals with implementation of indexing and matching patterns in tree structures. This new method uses compact suffix automaton, which comes with significant drop in memory cost. Matching phase is able to provide a set of occurrences in time not dependent on total amount of data.

Thesis on DSpace

Analysis of Dynamic SQL Code in Manta Project

Author

Artur Nasyrov

Year

2018

Type

Bachelor thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The aim of this work is to analyze the typical cases of dynamic SQL usage in the data warehouses and to extend the Manta project with a set of strategies which allow dynamic SQL analysis in some cases. The objective of this work is finding the typical cases of dynamic SQL usage in the data warehouses, the design of improvement of the existing set of strategies and subsequent implementation and testing.

Thesis on DSpace

Master theses

Finding repeats in XML

Author

Petr Vaněk

Year

2012

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Martin Poliak, Ph.D.

Department

Department of Software Engineering

Nonlinear Tree Pattern Matching

Author

Prokop Boček

Year

2012

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Generating Target Code Using an Index of AST

Author

Jaroslav Málek

Year

2013

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Security Code Study in LLVM Compiler Infrastructure

Author

Sarmad Neamah Mohsen Al-Fatla

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Tomáš Zahradnický, Ph.D.

Department

Department of Computer Systems

Error Recovery in Tools for Automatically Analysing Program Artefacts

Author

Jakub Valenta

Year

2013

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Indexing XML Documents

Author

Eliška Šestáková

Year

2015

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The theory of text indexing is very well-researched, which does not hold for theories of indexing other data structures, such as trees for example. In this thesis we review existing techniques for indexing texts and trees and study state-of-the-art methods for indexing XML documents. We show that automata can be used effectively for the purpose of indexing XML documents. A new and simple method for indexing XML documents using deterministic finite automaton is introduced. The presented method supports a significant fragment of XPath queries which may use any combination of child (/) and descendant-or-self (//) axes. We also propose another two indexing techniques based on finite automata, aimed to assist in evaluating paths queries with either / or // axis only. Given a subject XML document D and its corresponding XML tree model T with n nodes, the tree is preprocessed and the index is constructed. The searching phase uses the index, reads an input query Q of size m and computes the list of positions of all occurrences of target nodes of Q in T. All the proposed automata performed the searching in time O(m) and do not depend on n. Although the automaton that supports all linear XPath queries where just // axis is used evaluates O(2^n) distinct queries, number of states of the deterministic automaton is O(h^k), where h is the height of T and k is the number of its leaf nodes. Moreover, we discuss that in case of indexing a common XML document the number of state in the deterministic finite automaton is at most O(h.2^k).

Thesis on DSpace

Incremental analysis of PL/SQL scripts

Author

Jan Strnad

Year

2015

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Summary

This thesis deals with improvement of transformation mechanism in \acrlong{mt} system. Transformation enables automatic modification of \acrshort{plsql} code, represented using an \acrshort{ast}, such that it conforms to some rules. Typical example is conversion of one syntactical construct to another, semantically equal one. Conversion of DECODE function to CASE expression is a such example. As a consequence of the transformation, there is a loss of information in the AST representation. Original solution used to recover this information consisted of complete syntactical and semantics reanalysis of the transformed script. This thesis analyses, designs and implements incremental solution. Implemented solution is completely transparent and does not require any additional changes of existing transformations. The effect of incremental processing was experimentally measured on several typical transformations. Very significant time improvement was achieved, in some cases up to hundreds of times.

Thesis on DSpace

Checking the compatibility of data types in Oracle SQL

Author

Ondřej Pelech

Year

2015

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Radomír Polách

Department

Department of Theoretical Computer Science

Summary

Programs in the Oracle SQL language can go wrong if they contain mistakes. In this thesis we present a method for discovering one particular kind of mistakes -- incompatibility of datatypes. We analyze the datatypes and built-in functions in the Oracle SQL dialect. Then we design a method for compatibility checking that works on the dataflow graph. We implement the checker in Java.

Thesis on DSpace

Searching occurrences of tree patterns in ordered trees with the use of indexes of the trees

Author

Jan Milík

Year

2016

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

We compare two existing schemes for indexing trees, one based on a nondeterministic factor automaton, the other on deterministic compact suffix automaton. A third scheme is presented using position heaps - a relatively new data structures. As a side product, algorithm for converting suffix trees to position heaps and a new data structure based on the position heap is briefly sketched out. The three schemes are implemented and their running times measured. For most inputs, the third, position heap based scheme is found to be the fastest with minimal trade-off in the form of a small number of false positives.

Thesis on DSpace

Indexing Trees for Tree Patterns with Hamming Distance

Author

Alžběta Zyková

Year

2016

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Radomír Polách

Department

Department of Theoretical Computer Science

Summary

This thesis deals with indexing the trees for the tree patterns with Hamming distance. Indexing a tree in prefix notation can be regarded as indexing a text, where the tree patterns correspond to specific subsequences of the text. Indexing texts for approximate matching can be solved by many methods. First, the thesis deals with existing methods for indexing trees. Further, the thesis contains a survey of existing methods of indexing a text for approximate matching. The thesis introduces a new method of indexing trees for tree patterns with Hamming distance. The method is based on an existing method of indexing a tree for tree patterns. Indexes of texts for approximate patterns are used, and these indexes have been modified for using for trees. Given a tree of size n, and tree pattern of size m. The resulting method can search occurences of the pattern in O( km + sum(occ(Pi))) with an index size O(n logk n).