Ing. Petr Procházka, Ph.D.

Publikace

Backward Pattern Matching on Elastic-Degenerate Strings

Autoři
Procházka, P.; Cvacho, O.; Krčál, L.; Holub, J.
Rok
2023
Publikováno
SN Computer Science. 2023, 4(5), ISSN 2662-995X.
Typ
Článek
Anotace
Recently, the concept of Elastic Degenerate Strings (EDS) was introduced as a way of representing a sequenced population of the same species. Several on-line Elastic Degenerate String Matching (EDSM) algorithms were presented so far. Some of them provide practical implementation. We propose a new on-line EDSM algorithm BNDM-EDS. Our algorithm combines two traditional algorithms BNDM and the Shift-And that were adapted to the specifics needed by Elastic Degenerate Strings. BNDM-EDS is running in O (Nmd w m e) worst-case time. This implies O (Nm) time for small patterns, where m is the length of the searched pattern, N is the size of EDS, and w is the size of the computer word. The algorithm uses O (N + n) space, where n is the length of EDS. BNDM-EDS requires a simple preprocessing step with time and space O(m). Experimental results on real genomic data show superiority of BNDM-EDS over state-of-the-art algorithms.

Block-oriented Dense Compressor

Rok
2011
Publikováno
Proceedings Data Compression Conference. Los Alamitos, CA: IEEE Computer Soc., 2011. pp. 372. ISSN 1068-0314. ISBN 978-0-7695-4352-9.
Typ
Stať ve sborníku
Anotace
We address the problem of block-oriented natural language compression. The block-oriented compression is semi-adaptive in terms of one block but it is adaptive in terms of whole input. Our block-oriented compression method is based on Dense Code idea. It achieves very good compression ratio around 32 % on natural language text. We show that our compression method has some interesting properties which could be applied in digital libraries and other textual databases. The compression method allows direct searching on compressed text. Moreover the vocabulary can be used as a block index which makes some kinds of searching very fast. Another property is that the compressor can send single blocks with corresponding vocabulary which is considerate to limited bandwidth. In addition the compressed file can be continuously extended without need of previous decompression.