prof. Ing. Bořivoj Melichar, DrSc.

Publikace

On modification of Boyer-Moore-horspool's algorithm for tree pattern matching in linearised trees

Autoři
Rok
2020
Publikováno
Theoretical Computer Science. 2020, 830 60-90. ISSN 0304-3975.
Typ
Článek
Anotace
Tree pattern matching on ordered trees is an important problem in Computer Science. Ordered trees can be represented as strings with additional properties via various linearisations. We present a backward tree pattern matching algorithm for ordered trees for various linear representations of trees and tree patterns. The algorithm adaptations find all occurrences of a single given tree pattern which match an input tree regardless of the chosen linearisation. The algorithms preserve the properties and advantages of standard backward string pattern matching using Boyer-Moore-Horspool's bad character shift heuristics. The number of symbol comparisons in the backward tree pattern matching can be sublinear in the size of the input tree. As in the case of the string version of Boyer-Moore-Horspool's matching algorithm, the size of the bad character shift table used by the algorithm is linear in the size of the alphabet. We compare the algorithm adaptations with the algorithm using originally chosen linear representation and with the best performing previously existing algorithms based on (non-linearised) tree pattern matching using finite tree automata or stringpath matchers. We show that the presented backward tree pattern matching algorithms outperform the non-linearising ones for single pattern matching and they perform among themselves comparably. (C) 2020 Elsevier B.V. All rights reserved.

Constrained Approximate Subtree Matching by Finite Automata

Rok
2018
Publikováno
Proceedings of the Prague Stringology Conference 2018. Praha: Czech Technical University in Prague, 2018. p. 79-90. ISBN 978-80-01-06484-9.
Typ
Stať ve sborníku
Anotace
Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the constrained approximate subtree pattern matching problem. A systematic approach to the construction of such matcher by finite automaton, which reads input trees in prefix bar notation, is presented. Given a tree pattern and an input tree with m and n nodes, respectively, the nondeterministic finite automaton for the pattern is constructed and it is able to find all occurrences of the pattern to subtrees of the input tree with maximum given distance k. The distance between the pattern and subtrees of an input tree is measured by minimal number of restricted tree edit operations, called leaf nodes edit operations. The corresponding deterministic finite automaton finds all occurrences in time O(n) and has O(|A|^k m^(k+1)) states, where A is an alphabet containing all possible node labels. Note that the size is not exponential in the number of nodes of the tree pattern but only in the number of errors. In practice, the number of errors is expected to be a small constant that is much smaller than the size of the pattern. To achieve better space complexity, it is also shown how dynamic programming approach can be used to simulate the nondeterministic automaton. The space complexity of this approach is O(m), while the time complexity is O(mn).

Efficient determinization of visibly and height-deterministic pushdown automata

Rok
2016
Publikováno
Computer Languages, Systems and Structures. 2016, 2016(46), 91-105. ISSN 1477-8424.
Typ
Článek
Anotace
New algorithms for the determinization of nondeterministic visibly and nondeterministic real-time height-deterministic pushdown automata are presented. The algorithms improve the results of existing algorithms. They construct only accessible states and necessary pushdown symbols of the resulting deterministic pushdown automata.

A new algorithm for the determinisation of visibly pushdown automata

Rok
2015
Publikováno
Proceedings of the 2015 Federated Conference on Computer Science and Information Systems. New York: IEEE, 2015. pp. 915-922. ISSN 2300-5963. ISBN 978-83-60810-65-1.
Typ
Stať ve sborníku
Anotace
Visibly pushdown automata are pushdown automata whose pushdown operations are determined by the input symbol, where the input alphabet is partitioned into three parts for push, pop and local pushdown operations. It is well known that nondeterministic visibly pushdown automata can be determinised. In this paper a new algorithm for the determinisation of nondeterministic visibly pushdown automata is presented. The algorithm improves the existing methods and can result in significantly smaller deterministic pushdown automata. This is achieved in a way that only necessary and accessible states and pushdown symbols are computed and constructed during the determinisation.

Backward Linearised Tree Pattern Matching

Autoři
Rok
2015
Publikováno
Language and Automata Theory and Applications - 9th International Conference, {LATA} 2015, Nice, France, March 2-6, 2015, Proceedings. Berlin: Springer, 2015. pp. 599-610. LNCS. ISSN 0302-9743. ISBN 978-3-319-15578-4.
Typ
Stať ve sborníku
Anotace
We present a new backward tree pattern matching algorithm for ordered trees. The algorithm finds all occurrences of a single given tree pattern which match an input tree. It makes use of linearisations of both the given pattern and the input tree. The algorithm preserves the properties and advantages of standard backward string pattern matching approaches. The number of symbol comparisons in the backward tree pattern matching can be sublinear in the size of the input tree. As in the case of backward string pattern matching, the size of the bad character shift table used by the algorithm is linear in the size of the alphabet. We compare the new algorithm with best performing previously existing algorithms based on (non-linearised) tree pattern matching using finite tree automata or stringpath matchers and show that it outperforms these for single pattern matching.

A Full and Linear Index of a Tree for Tree Patterns

Rok
2014
Publikováno
DCFS 2014: Proceedings of the 16th International Workshop on Descriptional Complexity of Formal Systems. Berlin: Springer, 2014. pp. 198-209. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-09703-9.
Typ
Stať ve sborníku
Anotace
A new and simple method of indexing a tree for tree patterns is presented. A tree pattern is a tree whose leaves can be labelled by a special symbol S, which serves as a placeholder for any subtree. Given a subject tree T with n nodes, the tree is preprocessed and an index, which consists of a standard string compact suffix automaton and a subtree jump table, is constructed. The number of distinct tree patterns which match the tree is O(2n), and the size of the index is O(n). The searching phase uses the index, reads an input tree pattern P of size m and computes the list of positions of all occurrences of the pattern P in the tree T . For an input tree pattern P in linear prefix notation pref(P) = P1SP2S . . . SPk, k  1, the searching is performed in time O(m+k Pi=1 |occ(Pi)|)), where occ(Pi) is the set of all occurrences of Pi in pref(T ).

Tree template matching in unranked ordered trees

Autoři
Janoušek, J.; Melichar, B.; Žďárek, J.; Christou, M.; Flouri, T.; Iliopoulos, C.S.; Pissis, S.P.
Rok
2013
Publikováno
Journal of Discrete Algorithms. 2013, 20 51-60. ISSN 1570-8667.
Typ
Článek
Anotace
We consider the problem of tree template matching, a type of tree pattern matching, where the tree templates have some of their leaves denoted as “donʼt care”, and propose a solution based on the bottom-up technique.

Arbology: Trees and pushdown automata

Autoři
Rok
2012
Publikováno
Kybernetika. 2012, 48(3), 402-428. ISSN 0023-5954.
Typ
Článek
Anotace
We present a unified and systematic approach to basic principles of Arbology, a new algorithmic discipline focusing on algorithms on trees. Stringology, a highly developed algorithmic discipline in the area of string processing, can use finite automata as its basic model of computation. For various kinds of linear notations of ranked and unranked ordered trees it holds that subtrees of a tree in a linear notation are substrings of the tree in the linear notation. Arbology uses pushdown automata reading such linear notations of trees as its basic model of computation. Basic principles known from stringology are used for the construction of particular arbology algorithms, in which the underlying tree structure is processed with the use of the pushdown store. Arbology results are shown for the basic problems subtree matching and tree indexing for ranked and unranked ordered trees.

Computing all subtree repeats in ordered trees

Autoři
Janoušek, J.; Melichar, B.; Christou, M.; Crochemore, M.; Flouri, Tomas; Iliopoulos, C S; Pisis, S.
Rok
2012
Publikováno
Information Processing Letters. 2012, 112(24), 958-962. ISSN 0020-0190.
Typ
Článek
Anotace
We consider the problem of computing all subtree repeats in a given labeled ordered tree. We first transform the tree to a string representing its postfix notation, and then present an algorithm based on the bottom-up technique to solve it. The proposed algorithm consists of two phases: the preprocessing phase and the phase where all subtree repeats are computed. The linear time and space complexity of the proposed algorithm are important parts of its quality.

Indexing ordered trees for (nonlinear) tree pattern matching

Rok
2012
Publikováno
COMSIS - Computer Science and Information Systems. 2012, 9(3), 1125-1153. ISSN 1820-0214.
Typ
Článek
Anotace
A new kind of acyclic pushdown automata for an ordered tree is presented. The tree pattern pushdown automaton and the nonlinear tree pattern pushdown automaton represent a complete index of the tree for tree patterns and nonlinear tree patterns, respectively. Given a tree with n nodes, the numbers of distinct tree patterns and nonlinear tree patterns whichmatch the tree can be atmost 2n-1+n and at most (2+v)n-1+2, respectively, where v is the maximal number of nonlinear variables allowed in nonlinear tree patterns. The total sizes of nondeterministic versions of the two pushdown automata are O(n) and O(n2), respectively. We discuss the time complexities and show timings of our implementations using the bit-parallelismtechnique. The timings show that for a given tree the running time is linear to the size of the input pattern. The presented pushdown automata are input-driven and therefore can be also determinised.

Tree compression pushdown automaton

Autoři
Rok
2012
Publikováno
Kybernetika. 2012, 48(3), 429-452. ISSN 0023-5954.
Typ
Článek
Anotace
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with n nodes, the automaton has at most n+1 states, its transition function cardinality is at most 4n and there are 2n+1 pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of log n, the construction of the automaton takes linear time and space with respect to the length n of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.

Tree template matching in ranked ordered trees by pushdown automata

Autoři
Janoušek, J.; Melichar, B.; Flouri, T.; Iliopoulos, C S; Pissis, Solon
Rok
2012
Publikováno
Journal of Discrete Algorithms. 2012, 2012(17), 478-486. ISSN 1570-8667.
Typ
Článek
Anotace
We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the worst case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

A Note on a Tree-Based 2D Indexing

Rok
2011
Publikováno
Proceedings of the 15th International Conference on Implementation and Application of Automata. Heidelberg: Springer, 2011. pp. 300-309. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-18097-2.
Typ
Stať ve sborníku
Anotace
A transformation of 2D structures into the form of a tree, their linearisation and constructions of pushdown automata indexing this representation of a 2D text are presented.

A Note on a Tree-Based 2D Indexing

Rok
2011
Publikováno
Lecture Notes in Computer Science. 2011, 2011(6482), 300-309. ISSN 0302-9743.
Typ
Článek
Anotace
A transformation of 2D structures into the form of a tree, their linearisation and constructions of pushdown automata indexing this representation of a 2D text are presented.

Computing All Subtree Repeats in Ordered Ranked Trees

Autoři
Janoušek, J.; Melichar, B.; Flouri, T.; Crochemore, M.; Ilioupoulos, C.S.; Christou, M.; Pissis, S.
Rok
2011
Publikováno
Lecture Notes in Computer Science. 2011, 7024 338-343. ISSN 0302-9743.
Typ
Článek
Anotace
We consider the problem of finding all subtree repeats in a given ordered ranked tree. Specifically, we transform the given tree to a string representing its postfix notation, and then propose an algorithm based on the bottom-up technique. The proposed algorithm is divided into two phases: the preprocessing phase, and the phase where all subtree repeats are computed. The linear runtime of the algorithm, as well as the use of linear auxiliary space, are important aspects of its quality.

Finite Automata for Generalized Approach to Backward Pattern Matching

Autoři
Melichar, B.; Antoš, A.J.
Rok
2011
Publikováno
Proceedings of the 15th International Conference on Implementation and Application of Automata. Heidelberg: Springer, 2011. pp. 49-58. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-18097-2.
Typ
Stať ve sborníku
Anotace
We generalized the DAWG backward pattern matching approach to be able to solve a broad range of pattern matching problems. We use a definition of a class of problems. We describe a finite automaton for the basic pattern matching problem of finding an exact occurrence of one string in a text. We propose a mechanism to use simple operations over finite automata in a systematic approach to derive automata for solving problems from a defined class, such as approximate matching, regular expression matching, sequence matching, matching of set of patterns, etc. and their combinations. The benefit of this approach is the ability to quickly derive solutions for newly formulated pattern matching problems.

Indexing Trees by Pushdown Automata for Nonlinear Tree Pattern Matching

Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 871-878. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
A new kind of an acyclic pushdown automaton for an ordered tree is presented. The nonlinear tree pattern pushdown automaton represents a complete index of the tree for nonlinear tree patterns and accepts all nonlinear tree patterns which match the tree. Given a tree with n nodes, the number of such nonlinear tree patterns is O((2+v)n), where v is the number of variables in the patterns. We discuss time and space complexities of the nondeterministic nonlinear tree pattern pushdown automaton and a way of its implementation. The presented pushdown automaton is input-driven and therefore can be determinised.

Regular Tree Expressions and Deterministic Pushdown Automata

Rok
2011
Publikováno
Proceeding of the 7th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science. Brno: Brno University of Technology, 2011. pp. 70-77. ISBN 978-80-214-4305-1.
Typ
Stať ve sborníku
Anotace
Regular tree expressions are a formalism for describing regular tree languages, which can be accepted by finite tree automata. The class of regular tree languages is a proper subclass of tree languages whose linear notations can be accepted by deterministic string pushdown automata. In this paper we present a simple and straightforward way of transforming a regular tree expression to an equivalent height-deterministic pushdown automaton which reads input ranked and unranked ordered trees in postfix and postfix bar notation, respectively.

Subtree Oracle Pushdown Automata for Ranked and Unranked Ordered Trees

Autoři
Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 903-906. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
Oracle modification of subtree pushdown automata for ranked and unranked ordered trees is presented. Subtree pushdown automata [1] represent a complete index of a tree for subtrees. Subtree oracle pushdown automata, as inspired by string factor oracle automaton [2], have the number of states equal to n + 1, where n is the length of a corresponding linear notation of the tree. This makes the space complexity very low. By analogy with the string factor oracle automaton the subtree oracle automata can also accept some subtrees which are not present in the given subject tree. However, the number of such false positive matches is smaller than in the case of the string factor oracle automaton. The presented pushdown automata are input-driven and therefore they can be determinised.

Tree Indexing by Pushdown Automata and Repeats of Subtrees

Autoři
Janoušek, J.; Melichar, B.; Flouri, T.; Iliopoulos, C.S.; Pissis, S.
Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 899-902. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
We consider the problem of finding all subtree repeats in a given unranked ordered tree.We show a new, elegant, and simple method, which is based on the construction of a tree indexing structure called the subtree pushdown automaton. We propose a solution for computing all subtree repeats from the deterministic subtree pushdown automaton constructed over the subject tree. The method we present is directly analogous to the relationship between string deterministic suffix automata and factor repeats in a given string.

Tree Template Matching in Ranked Ordered Trees by Pushdown Automata

Autoři
Janoušek, J.; Melichar, B.; Flouri, T.; Iliopoulos, C.S.; Pissis, S.
Rok
2011
Publikováno
Lecture Notes in Computer Science. 2011, 6807 273-281. ISSN 0302-9743.
Typ
Článek
Anotace
We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the general case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

Tree-Based 2D Indexing

Rok
2011
Publikováno
International Journal of Foundations of Computer Science. 2011, 22(8), 1893-1907. ISSN 0129-0541.
Typ
Článek
Anotace
A new approach to the 2D pattern matching and specifically to 2D text indexing is proposed. A transformation of a 2D text into the form of a tree is presented. It preserves the context of each element of the 2D text. The tree can be linearised using the prefix notation into the form of a string (a linear text) and the pattern matching is performed in this text. Pushdown automata indexing the 2D text are constructed over the tree representation. They allow to search for 2D prefixes, 2D suffixes, and 2D factors of the 2D text in time proportional to the size of the representation of a 2D pattern. This result achieves the properties analogous to the results obtained in tree pattern matching and string indexing.

A Note on a Tree-based 2D Indexing

Rok
2010
Publikováno
Pre-proceedings of Conference on Implementation and Application of Automata 2010. Manitoba: University of Manitoba, 2010. pp. 269-277.
Typ
Stať ve sborníku
Anotace
A transformation of 2D structures into the form of a tree, their linearisation and constructions of pushdown automata indexing this representation of a 2D text are presented.

Aho-Corasick like multiple subtree matching by deterministic pushdown automata

Autoři
Rok
2010
Publikováno
Proceedings of the 2010 ACM Symposium on Applied Computing. New York: ACM Press, 2010. pp. 2157-2158. ISBN 978-1-60558-639-7.
Typ
Stať ve sborníku
Anotace
Aho-Corasick like multiple subtree matching by deterministic pushdown automata.

Arbology: Trees and Pushdown Automata

Autoři
Rok
2010
Publikováno
Language and Automata Theory and Applications. Berlin: Springer, 2010. pp. 32-49. LNCS. ISSN 0302-9743. ISBN 978-3-642-13088-5.
Typ
Stať ve sborníku vyzvaná či oceněná
Anotace
Abstract Trees are (data) structures used in many areas of human activity. Tree as the formal notion has been introduced in the theory of graphs. Nevertheless, trees have been used a long time before the foundation of the graph theory. An example is the notion of a genealogical tree. The area of family relationships was an origin of some terminology in the area of the tree theory (parent, child, sibling, ...) in addition to the terms originating from the area of the dendrology (root, branch, leaf, ...).

Searching in Tree Structures

Autoři
Melichar, B.; Flouri, T.
Rok
2010
Publikováno
Workshop 2010. Praha: České vysoké učení technické v Praze, 2010. pp. 96-97. CTU Reports. ISBN 978-80-01-04513-8.
Typ
Stať ve sborníku
Anotace
This project focuses on a new research area called Arbology. This area of research is directly similar and analogous to stringology, which focuses on string problems, such as the string pattern matching problem only this time the string patterns and the subject text are replaced by tree patterns and the subject tree. The vast majority of programming languages are based on context-free grammars, and thus the generated source codes are context-free languages which can be represented by tree structures. This gives us the opportunity to effectively analyze source codes at syntax level using algorithms based on tree searching. With the use of appropriate algorithms it is possible to compare source codes, optimize source code or transform one source code to another by finding the minimal number of edit operations necessary for the transformation. The main purpose of this project is to propose algorithms for exact pattern matching in tree structures.

Subtree Matching by Pushdown Automata

Autoři
Rok
2010
Publikováno
COMSIS - Computer Science and Information Systems. 2010, 7(2), 332-357. ISSN 1820-0214.
Typ
Článek
Anotace
Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages are based on. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix and postfix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and is then determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.

Using Finite Automata Approach for Searching Approximate Seeds of Strings

Rok
2010
Publikováno
Intelligent Automation and Computer Engineering. New York: Springer, 2010. p. 347-360. ISSN 1876-1100. ISBN 978-90-481-3516-5.
Typ
Kapitola v knize
Anotace
Seed is a type of a regularity of strings. A restricted approximate seed w of string T is a factor of T such that w covers a superstring of T under some distance rule. In this paper, the problem of searching of all restricted seeds with the smallest Hamming distance is studied and a polynomial time and space algorithm for solving the problem is presented. It searches for all restricted approximate seeds of a string with given limited approximation using Hamming distance and it computes the smallest distance for each found seed. The solution is based on a finite (suffix) automata approach that provides a straightforward way to design algorithms to many problems in stringology. Therefore, it is shown that the set of problems solvable using finite automata includes the one studied in this paper.

On regular tree languages and deterministic pushdown automata

Rok
2009
Publikováno
Acta Informatica. 2009, 46(7), 533-547. ISSN 0001-5903.
Typ
Článek
Anotace
The theory of formal string languages and of formal tree languages are both important parts of the theory of formal languages.Regular tree languages are recognized by finite tree automata.Trees in their postfix notation can be seen as strings.This paper presents a simple transformation from any given finite tree automaton recognizing a regular tree language to a deterministic pushdown automaton accepting the same tree language in postfix notation.The resulting deterministic pushdown automaton can be implemented easily by an existing parser generator because it is constructed for an LR(0) grammar, and its size directly corresponds to the size of the deterministic finite tree automaton. The class of regular tree languages in postfix notation is a proper subclass of deterministic context-free string languages. Moreover, the class of tree languages which are in their postfix notation deterministic context-free string languages is a proper superclass of the class of regular tree languages.

Subtree Matching by Deterministic Pushdown Automata

Autoři
Rok
2009
Publikováno
Proceedings of International Multiconference on Computer Science and Information Technology. New York: IEEE Computer Society Press, 2009. pp. 659-666. ISSN 1896-7094. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages are based on. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and then it is determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.