# doc. Ing. Jan Janoušek, Ph.D.

Head of the Department of Theoretical Computer Science

- +420224359881 +420224359881
- jan.janousek@fit.cvut.cz
- TH:A-1224

### Inexact tree pattern matching with 1-degree edit distance using finite automata

Authors

Year

2023

Published

Discrete Applied Mathematics. 2023, 330 78-97. ISSN 0166-218X.

Type

Article

Departments

Annotation

Given an input tree and a tree pattern, the inexact tree pattern matching problem is about finding all subtrees in the input tree that match the tree pattern with up to errors. To measure the number of errors between two labeled ordered trees, we use the 1-degree edit distance that uses operations node relabel, leaf insert, and leaf delete. The proposed solution is based on a finite automaton that reads a given input tree represented in linear, prefix bar, notation. First, we show its construction for a constrained variant of 1-degree edit distance where leaf insert/delete operations cannot be applied to the tree pattern recursively. Then, we extend this approach to both unit cost and non-unit cost 1-degree edit distance. The deterministic version of the proposed finite automaton finds all inexact occurrences of the tree pattern in time linear to the size of the input tree. However, since the size of such automaton can be exponential in the number of nodes of the tree pattern, it is practical only for small patterns. Therefore, we also consider the dynamic programming approach as a way of simulating the nondeterministic finite automaton. This approach comes with the space complexity and time complexity where is the number of nodes of the tree pattern, is the number of nodes of the input tree, and is the maximum number of errors allowed.

### On the Smallest Synchronizing Terms of Finite Tree Automata

Authors

Blažej, V.; Janoušek, J.; Plachý, Š.

Year

2023

Published

Implementation and Application of Automata. Springer, Cham, 2023. p. 79-90. ISSN 1611-3349. ISBN 978-3-031-40247-0.

Type

Proceedings paper

Departments

Annotation

This paper deals with properties of synchronizing terms for finite tree automata, which is a generalization of the synchronization principle of deterministic finite string automata (DFA) and such terms correspond to a connected subgraph, where a state in the root is always the same regardless of states of subtrees attached to it. We ask, what is the maximum height of the smallest synchronizing term of a deterministic bottom-up tree automaton (DFTA) with n states, which naturally leads to two types of synchronizing terms, called weak and strong, that depend on whether a variable, i.e., a placeholder for a subtree, must be present in at least one leaf or all of them. We prove that the maximum height in the case of weak synchronization has a theoretical upper bound sl(𝑛)+𝑛−1, where sl(𝑛) is the maximum length of the shortest synchronizing string of an n-state DFAs. For strong synchronization, we prove exponential bounds. We provide a theoretical upper bound of 2^𝑛−𝑛−1 for the height and two constructions of automata approaching it. One achieves the height of Θ(2^(𝑛−√𝑛)) with an alphabet of linear size, and the other achieves 2^(𝑛−1)−1 with an alphabet of quadratic size.

### Automata Approach to Inexact Tree Pattern Matching Using 1-degree Edit Distance

Authors

Year

2021

Published

Proceedings of the Prague Stringology Conference 2021. Praha: CESKE VYSOKE UCENI TECHNICKE V PRAZE, 2021. p. 1-15. ISBN 978-80-01-06869-4.

Type

Proceedings paper

Departments

Annotation

We compare labeled ordered trees based on unit cost 1-degree edit distance that uses operations vertex relabeling, leaf insertion, and leaf deletion. Given an input tree T and a tree pattern P, we find all subtrees in T that match P with up to k errors. We show that this problem can be solved by finite automaton when T and P are represented in linear, prefix bar, notation. First, we solve this problem by a pushdown automaton. Then, we show that it can be transformed into a nondeterministic finite automaton due to its restricted use of the pushdown store. We also show a simulation of the nondeterministic finite automaton by dynamic programming.

### Conversion of Finite Tree Automata to Regular Tree Expressions By State Elimination

Authors

Year

2020

Published

Proceedings of the Prague Stringology Conference 2020. Praha: Czech Technical University in Prague, 2020. p. 11-22. ISBN 978-80-01-06749-9.

Type

Proceedings paper

Departments

Annotation

Regular tree languages can be accepted and described by finite tree automata and regular tree expressions, respectively.
We describe a new algorithm that converts a finite tree automaton to an equivalent regular tree expression.
Our algorithm is analogous to the well-known state elimination method of the conversion of a string finite automaton to an equivalent string regular expression.
We define a generalised finite tree automaton, the transitions of which read the sets of trees described by regular tree expressions.
Our algorithm eliminates states of the generalised finite tree automaton, which is analogous to the elimination of states in converting the string finite automaton.

### Forward Linearised Tree Pattern Matching Using Tree Pattern Border Array

Authors

Year

2020

Published

Proceedings of the Prague Stringology Conference 2020. Praha: Czech Technical University in Prague, 2020. p. 61-73. ISBN 978-80-01-06749-9.

Type

Proceedings paper

Departments

Annotation

We define a tree pattern border array as a property of linearised trees analogous to border arrays from the string domain. We use it to define a new forward tree pattern matching algorithm for ordered trees, which finds all occurrences of a single given linearised tree pattern in a linearised input tree. As with the classical Morris-Pratt algorithm, the tree pattern border array is used to determine shift lengths in the matching phase of the tree pattern matching algorithm.
We compare the new algorithm with the best performing previously existing algorithms based on backward linearised tree pattern matching algorithms, (non-)linearised tree pattern matching algorithms using finite tree automata or stringpath matchers. We show that the presented algorithm outperforms these for single tree pattern matching.

### On modification of Boyer-Moore-horspool's algorithm for tree pattern matching in linearised trees

Authors

Trávníček, J.; Janoušek, J.; Melichar, B.; Cleophas, L.

Year

2020

Published

Theoretical Computer Science. 2020, 830 60-90. ISSN 0304-3975.

Type

Article

Departments

Annotation

Tree pattern matching on ordered trees is an important problem in Computer Science. Ordered trees can be represented as strings with additional properties via various linearisations. We present a backward tree pattern matching algorithm for ordered trees for various linear representations of trees and tree patterns. The algorithm adaptations find all occurrences of a single given tree pattern which match an input tree regardless of the chosen linearisation. The algorithms preserve the properties and advantages of standard backward string pattern matching using Boyer-Moore-Horspool's bad character shift heuristics. The number of symbol comparisons in the backward tree pattern matching can be sublinear in the size of the input tree. As in the case of the string version of Boyer-Moore-Horspool's matching algorithm, the size of the bad character shift table used by the algorithm is linear in the size of the alphabet. We compare the algorithm adaptations with the algorithm using originally chosen linear representation and with the best performing previously existing algorithms based on (non-linearised) tree pattern matching using finite tree automata or stringpath matchers. We show that the presented backward tree pattern matching algorithms outperform the non-linearising ones for single pattern matching and they perform among themselves comparably. (C) 2020 Elsevier B.V. All rights reserved.

### On Synchronizing Tree Automata and Their Work–Optimal Parallel Run, Usable for Parallel Tree Pattern Matching

Authors

Year

2020

Published

SOFSEM 2020: Theory and Practice of Computer Science. Cham: Springer, 2020. p. 576-586. Lecture Notes in Computer Science. vol. 12011. ISSN 0302-9743. ISBN 978-3-030-38918-5.

Type

Proceedings paper

Departments

Annotation

We present a way of synchronizing finite tree automata: We define a synchronizing
term and a k-local deterministic finite bottom–up tree automaton. Furthermore, we
present a work–optimal parallel algorithm for parallel run of the deterministic
k-local tree automaton in O(log n) time with ⌈n/ log n⌉ processors, for k ≤ log n,
or in O(k) time with ⌈n/ k⌉ processors, for k ≥ log n, where n is the number of nodes
of an input tree, on EREW PRAM. Finally, we prove that the deterministic finite
bottom–up tree automaton that is used as a standard tree pattern matcher is
k-local with respect to the height of a tree pattern.

### Automata Approach to XML Data Indexing

Type

Article

Departments

Annotation

The internal structure of XML documents can be viewed as a tree. Trees are among the fundamental and well-studied data structures in computer science. They express a hierarchical structure and are widely used in many applications. This paper focuses on the problem of processing tree data structures; particularly, it studies the XML index problem. Although there exist many state-of-the-art methods, the XML index problem still belongs to the active research areas. However, existing methods usually lack clear references to a systematic approach to the standard theory of formal languages and automata. Therefore, we present some new methods solving the XML index problem using the automata theory. These methods are simple and allow one to efficiently process a small subset of XPath. Thus, having an XML data structure, our methods can be used efficiently as auxiliary data structures that enable answering a particular set of queries, e.g., XPath queries using any combination of the child and descendant-or-self axes. Given an XML tree model with n nodes, the searching phase uses the index, reads an input query of size m, finds the answer in time O(m) and does not depend on the size of the original XML document.

### Constrained Approximate Subtree Matching by Finite Automata

Authors

Šestáková, E.; Melichar, B.; Janoušek, J.

Year

2018

Published

Proceedings of the Prague Stringology Conference 2018. Praha: Czech Technical University in Prague, 2018. p. 79-90. ISBN 978-80-01-06484-9.

Type

Proceedings paper

Departments

Annotation

Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the constrained approximate subtree pattern matching problem. A systematic approach to the construction of such matcher by finite automaton, which reads input trees in prefix bar notation, is presented. Given a tree pattern and an input tree with m and n nodes, respectively, the nondeterministic finite automaton for the pattern is constructed and it is able to find all occurrences of the pattern to subtrees of the input tree with maximum given distance k. The distance between the pattern and subtrees of an input tree is measured by minimal number of restricted tree edit operations, called leaf nodes edit operations. The corresponding deterministic finite automaton finds all occurrences in time O(n) and has O(|A|^k m^(k+1)) states, where A is an alphabet containing all possible node labels. Note that the size is not exponential in the number of nodes of the tree pattern but only in the number of errors. In practice, the number of errors is expected to be a small constant that is much smaller than the size of the pattern. To achieve better space complexity, it is also shown how dynamic programming approach can be used to simulate the nondeterministic automaton. The space complexity of this approach is O(m), while the time complexity is O(mn).

### Construction of a Pushdown Automaton Accepting a Postfix Notation of a Tree Language Given by a Regular Tree Expression

Authors

Year

2018

Published

7th Symposium on Languages, Applications and Technologies (SLATE 2018). Saarbrücken: Dagstuhl Publishing,, 2018. p. 6:1-6:12. ISSN 2190-6807. ISBN 978-3-95977-072-9.

Type

Proceedings paper

Departments

Annotation

Regular tree expressions are a formalism for describing regular tree languages, which can be accepted by a finite tree automaton as a standard model of computation. It was proved that the class of regular tree languages is a proper subclass of tree languages those linear notations can be accepted by deterministic string pushdown automata. In this paper, we present a new algorithm for transforming regular tree expressions to equivalent real-time height-deterministic pushdown automata that accept the trees in their postfix notation.

### Indexing XML Documents Using Tree Paths Automaton

Authors

Year

2017

Published

6th Symposium on Languages, Applications and Technologies. Saarbrücken: Dagstuhl Publishing,, 2017. p. 10:1-10:14. ISSN 1868-8969. ISBN 978-3-95977-056-9.

Type

Proceedings paper

Departments

Annotation

An XML document can be viewed as a tree in a natural way. Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the XML index problem. In this paper, we attempt to support a significant fragment of XPath queries which may use any combination of child (i.e., /) and descendant-or-self (i.e., //) axis. A systematic approach to the construction of such XML index, which is a finite automaton called Tree Paths Automaton, is presented. Given an XML tree model T, the tree is first of all preprocessed by means of its linear fragments called string paths. Since only path queries are considered, the branching structure of the XML tree model can be omitted. For individual string paths, smaller Tree Paths Automata are built, and they are afterwards combined to form the index. The searching phase uses the index, reads an input query Q of size m, and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on the size of the XML document. Although the number of queries is clearly exponential in the number of nodes of the XML tree model, the size of the index seems to be, according to our experimental results, usually only about 2.5 times larger than the size of the original document.

### Efficient determinization of visibly and height-deterministic pushdown automata

Authors

Polách, R.; Trávníček, J.; Janoušek, J.; Melichar, B.

Year

2016

Published

Computer Languages, Systems and Structures. 2016, 2016(46), 91-105. ISSN 1477-8424.

Type

Article

Departments

Annotation

New algorithms for the determinization of nondeterministic visibly and nondeterministic real-time height-deterministic pushdown automata are presented. The algorithms improve the results of existing algorithms. They construct only accessible states and necessary pushdown symbols of the resulting deterministic pushdown automata.

### A new algorithm for the determinisation of visibly pushdown automata

Authors

Polách, R.; Trávníček, J.; Janoušek, J.; Melichar, B.

Year

2015

Published

Proceedings of the 2015 Federated Conference on Computer Science and Information Systems. New York: IEEE, 2015. pp. 915-922. ISSN 2300-5963. ISBN 978-83-60810-65-1.

Type

Proceedings paper

Departments

Annotation

Visibly pushdown automata are pushdown automata whose pushdown operations are determined by the input symbol, where the input alphabet is partitioned into three parts for push, pop and local pushdown operations. It is well known that nondeterministic visibly pushdown automata can be determinised. In this paper a new algorithm for the determinisation of nondeterministic visibly pushdown automata is presented. The algorithm improves the existing methods and can result in significantly smaller deterministic pushdown automata. This is achieved in a way that only necessary and accessible states and pushdown symbols are computed and constructed during the determinisation.

### Backward Linearised Tree Pattern Matching

Authors

Trávníček, J.; Janoušek, J.; Melichar, B.; Cleophas, L.

Year

2015

Published

Language and Automata Theory and Applications - 9th International Conference, {LATA} 2015, Nice, France, March 2-6, 2015, Proceedings. Berlin: Springer, 2015. pp. 599-610. LNCS. ISSN 0302-9743. ISBN 978-3-319-15578-4.

Type

Proceedings paper

Departments

Annotation

We present a new backward tree pattern matching algorithm
for ordered trees. The algorithm finds all occurrences of a single given
tree pattern which match an input tree. It makes use of linearisations of
both the given pattern and the input tree. The algorithm preserves the
properties and advantages of standard backward string pattern matching
approaches. The number of symbol comparisons in the backward tree
pattern matching can be sublinear in the size of the input tree. As in the
case of backward string pattern matching, the size of the bad character
shift table used by the algorithm is linear in the size of the alphabet.
We compare the new algorithm with best performing previously existing
algorithms based on (non-linearised) tree pattern matching using finite
tree automata or stringpath matchers and show that it outperforms these
for single pattern matching.

### On Distributed Concern Delivery in User Interface Design

Authors

Černý, T.; Macík, M.; Donahoo, M.J.; Janoušek, J.

Year

2015

Published

COMSIS - Computer Science and Information Systems. 2015, 2015 655-681. ISSN 1820-0214.

Type

Article

Departments

Annotation

Increasing demands on user interface (UI) usability, adaptability, and dynamic behavior drives ever-growing development and maintenance complexity. Traditional UI design techniques result in complex descriptions for data presentations with significant information restatement. In addition, multiple concerns in UI development leads to descriptions that exhibit concern tangling, which results in high fragment replication. Concern-separating approaches address these issues; however, they fail to maintain the separation of concerns for execution tasks like rendering or UI delivery to clients. During the rendering process at the server side, the separation collapses into entangled concerns that are provided to clients. Such client-side entanglement may seem inconsequential since the clients are simply displaying what is sent to them; however, such entanglement compromises client performance as it results in problems such as replication, fragment granularity ill-suited for effective caching, etc. This paper considers advantages brought by concern-separation from both perspectives. It proposes extension to the aspect-oriented UI design with distributed concern delivery (DCD) for client-server applications. Such an extension lessens the serverside involvement in UI assembly and reduces the fragment replication in provided UI descriptions. The server provides clients with individual UI concerns, and they become partially responsible for the UI assembly. This change increases client-side concern reuse and extends caching opportunities, reducing the volume of transmitted information between client and server to improve UI responsiveness and performance. The underlying aspect-oriented UI design automates the server-side derivation of concerns related to data presentations adapted to runtime context, security, conditions, etc. Evaluation of the approach is considered in a case study applying DCD to an existing, production web application.

### Tree String Path Subsequences Automaton and Its Use for Indexing XML Documents

Authors

Year

2015

Published

Languages, Applications and Technologies. Cham: Springer International Publishing AG, 2015. pp. 171-181. ISSN 1865-0929. ISBN 978-3-319-27652-6.

Type

Proceedings paper

Departments

Annotation

The theory of indexing texts is well-researched, which does not hold for indexing other data structures, such as trees for example. In this paper a simple method of indexing a tree for subsequences of string paths in the tree by finite automaton is presented. The use of the index is shown on indexing XML documents for XPath descendant-or-self axis inspired queries. Given a subject tree T with n nodes, the tree is preprocessed and an index, which is a directed acyclic subsequence graph for a set of strings, is constructed. The searching phase uses the index, reads an input string path subsequence Q inspired by the specific XPath query of size m and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on n. Although the number of distinct valid queries is O(2n), the size of the index is O(hk), where h is the height of the tree T and k is the number of its leaves. Moreover, we discuss that in the case of indexing a common XML document the size of the index is even smaller O(h⋅2k).

### A Full and Linear Index of a Tree for Tree Patterns

Authors

Janoušek, J.; Melichar, B.; Polách, R.; Poliak, M.; Trávníček, J.

Year

2014

Published

DCFS 2014: Proceedings of the 16th International Workshop on Descriptional Complexity of Formal Systems. Berlin: Springer, 2014. pp. 198-209. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-09703-9.

Type

Proceedings paper

Departments

Annotation

A new and simple method of indexing a tree for tree patterns is presented. A tree pattern is a tree whose leaves can be labelled by a special symbol S, which serves as a placeholder for any subtree. Given a subject tree T with n nodes, the tree is preprocessed and an index, which consists of a standard string compact suffix automaton and a subtree jump table, is constructed. The number of distinct tree patterns which match the tree is O(2n), and the size of the index is O(n). The searching phase uses the index, reads an input tree pattern P of size m and computes the list of positions of all occurrences of the pattern P in the tree T . For an input tree pattern P in linear prefix notation pref(P) = P1SP2S . . . SPk, k 1, the searching is performed in time O(m+k Pi=1 |occ(Pi)|)), where occ(Pi) is the set of all occurrences of Pi in
pref(T ).

### Efficient Description and Cache Performance in Aspect-Oriented User Interface Design

Authors

Černý, T.; Macík, M.; Donahoo, M.; Janoušek, J.

Year

2014

Published

Proceedings of the 2014 Federated Conference on Computer Science and Information Systems. Katowice: Polish Information Processing Society, 2014. pp. 1667-1676. Annals of Computer Science and Information Systems. ISSN 2300-5963. ISBN 978-83-60810-58-3.

Type

Proceedings paper

Departments

Annotation

Increasing demands on web user interface (UI) usability, adaptability, and dynamic behavior drives ever growing development and maintenance complexity. Conventional design approaches scale poorly with such rising complexity, resulting in rapidly increasing costs. Much of the complexity centers around data presentation and processing. Recent work greatly reduces such data complexity through the application of Aspect-Oriented UI (AOUI) design, which separates various UI concerns; however, rendering in conventional and even AOUI approaches fails to maintain this separation, often resulting in high reptitions of concern fragments due to tangling. Even worse, mixing of dynamic and immutable components greatly limits caching efficacy as each have differing lifetimes. We extend AOUI design to push down concern separation to rendering, which reduces description size, through repetition reduction, and enables separate caching of individual concerns. Our results show considerable size reduction of UI descriptions for data presentations, faster load times and extended caching capabilities.

### Target code selection by tilling ast with the use of tree pattern pushdown automaton

Authors

Janoušek, J.; Málek, Jaromír

Year

2014

Published

3rd Symposium on Languages, Applications and Technologies, SLATE 2014. Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2014. pp. 159-165. ISSN 2190-6807. ISBN 978-3-939897-68-2.

Type

Proceedings paper

Departments

Annotation

A new and simple method for target code selection by tilling an abstract syntax tree is presented. As it is usual, tree patterns corresponding to target machine instructions are matched in the abstract syntax tree. Matching tree patterns is performed with the use of tree pattern pushdown automaton, which accepts all tree patterns matching the abstract syntax tree in the linear postfix bar notation and represents a full index of the abstract syntax tree for tree patterns. The use of the index allows to match patterns quickly, in time depending on the size of patterns and not depending on the size of the tree. The selection of a particular target instruction corresponds to a modification of the abstract syntax tree and also a corresponding incremental modification of the index is performed. A reference to a fully functional prototype is provided.

### Tree template matching in unranked ordered trees

Authors

Christou, M.; Flouri, T.; Iliopoulos, C.S.; Janoušek, J.; Melichar, B.; Pissis, S.P.; Žďárek, J.

Year

2013

Published

Journal of Discrete Algorithms. 2013, 20 51-60. ISSN 1570-8667.

Type

Article

Departments

Annotation

We consider the problem of tree template matching, a type of tree pattern matching, where the tree templates have some of their leaves denoted as “donʼt care”, and propose a solution based on the bottom-up technique.

### Arbology: Trees and pushdown automata

Authors

Melichar, B.; Janoušek, J.; Flouri, T.

Year

2012

Published

Kybernetika. 2012, 48(3), 402-428. ISSN 0023-5954.

Type

Article

Departments

Annotation

We present a unified and systematic approach to basic principles of Arbology, a new algorithmic discipline focusing on algorithms on trees. Stringology, a highly developed algorithmic discipline in the area of string processing, can use finite automata as its basic model of computation. For various kinds of linear notations of ranked and unranked ordered trees it holds that subtrees of a tree in a linear notation are substrings of the tree in the linear notation. Arbology uses pushdown automata reading such linear notations of trees as its basic model of computation. Basic principles known from stringology are used for the construction of particular arbology algorithms, in which the underlying tree structure is processed with the use of the pushdown store. Arbology results are shown for the basic problems subtree matching and tree indexing for ranked and unranked ordered trees.

### Computing all subtree repeats in ordered trees

Authors

Christou, M.; Crochemore, M.; Flouri, Tomas; Iliopoulos, C S; Janoušek, J.; Melichar, B.; Pisis, S.

Year

2012

Published

Information Processing Letters. 2012, 112(24), 958-962. ISSN 0020-0190.

Type

Article

Departments

Annotation

We consider the problem of computing all subtree repeats in a given labeled
ordered tree. We first transform the tree to a string representing its postfix
notation, and then present an algorithm based on the bottom-up technique to
solve it. The proposed algorithm consists of two phases: the preprocessing phase
and the phase where all subtree repeats are computed. The linear time and space
complexity of the proposed algorithm are important parts of its quality.

### Indexing ordered trees for (nonlinear) tree pattern matching

Authors

Trávníček, J.; Janoušek, J.; Melichar, B.

Year

2012

Published

COMSIS - Computer Science and Information Systems. 2012, 9(3), 1125-1153. ISSN 1820-0214.

Type

Article

Departments

Annotation

A new kind of acyclic pushdown automata for an ordered tree is
presented. The tree pattern pushdown automaton and the nonlinear tree
pattern pushdown automaton represent a complete index of the tree for
tree patterns and nonlinear tree patterns, respectively. Given a tree with
n nodes, the numbers of distinct tree patterns and nonlinear tree patterns
whichmatch the tree can be atmost 2n-1+n and at most (2+v)n-1+2, respectively,
where v is the maximal number of nonlinear variables allowed
in nonlinear tree patterns. The total sizes of nondeterministic versions of
the two pushdown automata are O(n) and O(n2), respectively. We discuss
the time complexities and show timings of our implementations using
the bit-parallelismtechnique. The timings show that for a given tree the
running time is linear to the size of the input pattern. The presented pushdown
automata are input-driven and therefore can be also determinised.

### Tree compression pushdown automaton

Authors

Janoušek, J.; Melichar, B.; Poliak, M.

Year

2012

Published

Kybernetika. 2012, 48(3), 429-452. ISSN 0023-5954.

Type

Article

Departments

Annotation

A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with n nodes, the automaton has at most n+1 states, its transition function cardinality is at most 4n and there are 2n+1 pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of log n, the construction of the automaton takes linear time and space with respect to the length n of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.

### Tree template matching in ranked ordered trees by pushdown automata

Authors

Flouri, T.; Iliopoulos, C S; Janoušek, J.; Melichar, B.; Pissis, Solon

Year

2012

Published

Journal of Discrete Algorithms. 2012, 2012(17), 478-486. ISSN 1570-8667.

Type

Article

Departments

Annotation

We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the worst case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

### Computing All Subtree Repeats in Ordered Ranked Trees

Authors

Janoušek, J.; Melichar, B.; Flouri, T.; Crochemore, M.; Ilioupoulos, C.S.; Christou, M.; Pissis, S.

Year

2011

Published

Lecture Notes in Computer Science. 2011, 7024 338-343. ISSN 0302-9743.

Type

Article

Departments

Annotation

We consider the problem of finding all subtree repeats in a given ordered ranked tree. Specifically, we transform the given tree to a string representing its postfix notation, and then propose an algorithm based on the bottom-up technique. The proposed algorithm is divided into two phases: the preprocessing phase, and the phase where all subtree repeats are computed. The linear runtime of the algorithm, as well as the use of linear auxiliary space, are important aspects of its quality.

### Indexing Trees by Pushdown Automata for Nonlinear Tree Pattern Matching

Authors

Trávníček, J.; Janoušek, J.; Melichar, B.

Year

2011

Published

FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 871-878. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.

Type

Proceedings paper

Departments

Annotation

A new kind of an acyclic pushdown automaton for
an ordered tree is presented. The nonlinear tree pattern pushdown
automaton represents a complete index of the tree for nonlinear
tree patterns and accepts all nonlinear tree patterns which match
the tree. Given a tree with n nodes, the number of such nonlinear
tree patterns is O((2+v)n), where v is the number of variables
in the patterns. We discuss time and space complexities of the
nondeterministic nonlinear tree pattern pushdown automaton
and a way of its implementation. The presented pushdown
automaton is input-driven and therefore can be determinised.

### Regular Tree Expressions and Deterministic Pushdown Automata

Authors

Janoušek, J.; Melichar, B.; Polách, R.

Year

2011

Published

Proceeding of the 7th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science. Brno: Brno University of Technology, 2011. pp. 70-77. ISBN 978-80-214-4305-1.

Type

Proceedings paper

Departments

Annotation

Regular tree expressions are a formalism for
describing regular tree languages, which can be accepted by finite tree automata.
The class of regular tree languages is a proper subclass of tree languages
whose linear notations can be accepted by deterministic string pushdown automata.
In this paper we present a simple and straightforward way of transforming a regular tree expression to an equivalent height-deterministic pushdown automaton which reads input ranked
and unranked ordered trees in postfix and postfix bar notation, respectively.

### Subtree Oracle Pushdown Automata for Ranked and Unranked Ordered Trees

Authors

Plicka, M.; Janoušek, J.; Melichar, B.

Year

2011

Published

FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 903-906. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.

Type

Proceedings paper

Departments

Annotation

Oracle modification of subtree pushdown automata
for ranked and unranked ordered trees is presented. Subtree
pushdown automata [1] represent a complete index of a tree
for subtrees. Subtree oracle pushdown automata, as inspired by
string factor oracle automaton [2], have the number of states
equal to n + 1, where n is the length of a corresponding linear
notation of the tree. This makes the space complexity very low.
By analogy with the string factor oracle automaton the subtree
oracle automata can also accept some subtrees which are not
present in the given subject tree. However, the number of such
false positive matches is smaller than in the case of the string
factor oracle automaton. The presented pushdown automata are
input-driven and therefore they can be determinised.

### Tree Indexing by Pushdown Automata and Repeats of Subtrees

Authors

Flouri, T.; Janoušek, J.; Melichar, B.; Iliopoulos, C.S.; Pissis, S.

Year

2011

Published

FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 899-902. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.

Type

Proceedings paper

Departments

Annotation

We consider the problem of finding all subtree
repeats in a given unranked ordered tree.We show a new, elegant,
and simple method, which is based on the construction of a tree
indexing structure called the subtree pushdown automaton. We
propose a solution for computing all subtree repeats from the
deterministic subtree pushdown automaton constructed over the
subject tree. The method we present is directly analogous to the
relationship between string deterministic suffix automata and
factor repeats in a given string.

### Tree Template Matching in Ranked Ordered Trees by Pushdown Automata

Authors

Flouri, T.; Janoušek, J.; Melichar, B.; Iliopoulos, C.S.; Pissis, S.

Year

2011

Published

Lecture Notes in Computer Science. 2011, 6807 273-281. ISSN 0302-9743.

Type

Article

Departments

Annotation

We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the general case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

### Aho-Corasick like multiple subtree matching by deterministic pushdown automata

Authors

Flouri, T.; Janoušek, J.; Melichar, B.

Year

2010

Published

Proceedings of the 2010 ACM Symposium on Applied Computing. New York: ACM Press, 2010. pp. 2157-2158. ISBN 978-1-60558-639-7.

Type

Proceedings paper

Departments

Annotation

Aho-Corasick like multiple subtree matching by deterministic pushdown automata.

### Subtree Matching by Pushdown Automata

Authors

Flouri, T.; Janoušek, J.; Melichar, B.

Year

2010

Published

COMSIS - Computer Science and Information Systems. 2010, 7(2), 332-357. ISSN 1820-0214.

Type

Article

Departments

Annotation

Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages are based on. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix and postfix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and is then determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.

### On regular tree languages and deterministic pushdown automata

Authors

Janoušek, J.; Melichar, B.

Year

2009

Published

Acta Informatica. 2009, 46(7), 533-547. ISSN 0001-5903.

Type

Article

Departments

Annotation

The theory of formal string languages and of formal tree languages are both important parts of the theory of formal languages.Regular tree languages are recognized by finite tree automata.Trees in their postfix notation can be seen as strings.This paper presents a simple transformation from any given finite tree automaton recognizing a regular tree language to a deterministic pushdown automaton accepting the same tree language in postfix notation.The resulting deterministic pushdown automaton can be implemented easily by an existing parser generator because it is constructed for an LR(0) grammar, and its size directly corresponds to the size of the deterministic finite tree automaton. The class of regular tree languages in postfix notation is a proper subclass of deterministic context-free string languages. Moreover, the class of tree languages which are in their postfix notation deterministic context-free string languages is a proper superclass of the class of regular tree languages.

### String Suffix Automata and Subtree Pushdown Automata

Authors

Year

2009

Published

Proceedings of the Prague Stringology Conference 2009. Praha: Czech Technical University in Prague, 2009. p. 160-172. ISBN 978-80-01-04403-2.

Type

Proceedings paper

Departments

Annotation

String suffix automata accept all suffixes of a given string and belong to the
fundamental stringology principles. Extending their transitions by specific
pushdown operations results in new subtree pushdown automata, which accept all
subtrees of a given subject tree in prefix notation and are analogous to the
suffix automata in their properties. The deterministic subtree pushdown
automaton accepts an input subtree in time linear to the number of nodes of the
subtree and its total size is linear to the number of nodes of the given subject
tree.

### Subtree Matching by Deterministic Pushdown Automata

Authors

Flouri, T.; Janoušek, J.; Melichar, B.

Year

2009

Published

Proceedings of International Multiconference on Computer Science and Information Technology. New York: IEEE Computer Society Press, 2009. pp. 659-666. ISSN 1896-7094. ISBN 978-83-60810-22-4.

Type

Proceedings paper

Departments

Annotation

Subtree matching is an important problem in Computer Science on which a number
of tasks, such as mechanical theorem proving, term-rewriting, symbolic
computation and nonprocedural programming languages are based on.
A systematic approach to the construction of subtree pattern matchers by
deterministic pushdown automata, which read subject trees in prefix notation,
is presented. The method is analogous to the construction of string pattern
matchers: for a given pattern, a nondeterministic pushdown automaton is
created and then it is determinised. In addition, it is shown that the size
of the resulting deterministic pushdown automata directly corresponds to the
size of the existing string pattern matchers based on finite automata.