Ing. Petr Procházka, Ph.D.

Projects

Natural Language and Formal Language Data Compression

Program
Student Grant Competition of CTU (Studentská grantová soutěž ČVUT)
Code
SGS10/306/OHK3/3T/18
Period
2010 - 2012
Description
The main goal of this project is the design and implementation of novel data compression methods. Contextual data compression methods are a class of lossless data compression methods. Lossless means that the compression process is fully reversible and the decompressed data is identical to the original data. These methods exploit similarities in the input data. Current research focuses mainly on word-based contextual data compression and preprocessing transformations. Word-based text compression adapts better to the input data. This approach takes advantage of the strictly defined structures of formal and natural languages and achieves a better compression ratio than equivalent character-based data compression methods. The goal of this project is to use these possibilities to design new natural language compression methods with a compression ratio comparable to or better than that of the best current data compression methods. This proje

Processing Tree Structures and Data Compression

Program
Student Grant Competition of CTU (Studentská grantová soutěž ČVUT)
Code
SGS13/097/OHK3/1T/18
Period
2013
Description
With the vast amount of data that needs to be archived, indexed and processed, special data structures are required. The tree is a typical data structure that is often used for storing data hierarchically. Specialized algorithms are needed for indexing tree data structures and for accessing, extracting and analyzing the data stored in them. The aim of this research is to design efficient yet easy-to-understand algorithms for tree pattern matching (both exact and approximate) and tree indexing, and to provide a toolkit implementation. Another goal of this project is the design and implementation of novel data compression methods in two areas: music score compression and natural language compression.