Ing. Miroslav Balík, Ph.D.

Projects

Natural Language and Formal Language Data Compression

Program
CTU Student Grant Competition (Studentská grantová soutěž ČVUT)
Code
SGS10/306/OHK3/3T/18
Period
2010 - 2012
Description
The main goal of this project is the design and implementation of novel data compression methods. Contextual data compression methods are a subset of lossless data compression methods. Lossless means that the compression process is fully reversible and the decompressed data is identical to the original data. These methods exploit similarities in the input data. Current research focuses on word-based contextual data compression and preprocessing transformations. Word-based text compression adapts better to the input data. This approach takes advantage of the strictly defined structures of formal and natural languages and achieves a better compression ratio than equivalent character-based data compression methods. The goal of this project is to use these possibilities to design new natural language compression methods with a compression ratio comparable to or better than that of the best current data compression methods. This proje