Master theses

Application of Reinforcement Learning to Creating Adversarial Malware Samples

Author
Matouš Kozák
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Machine learning is becoming increasingly popular as a go-to approach for many tasks due to its world-class results. As a result, antivirus developers are starting to incorporate machine learning models into their products. While these models improve malware detection capabilities, they also carry the disadvantage of being susceptible to adversarial attacks. Although this sensitivity has been demonstrated for many models in white-box settings, a black-box attack is more applicable in practice for the domain of malware detection. Therefore, we present a black-box scenario in which the attacker only has the predicted label at his disposal and forgoes any other information about the target classifier. Using reinforcement learning algorithms, we implemented an attack against the GBDT classifier trained on the EMBER dataset. We trained several RL agents on a dataset of Windows malware with an emphasis on preserving the original functionality of the malicious samples. We achieved an evasion rate of 58.92% against the targeted classifier using the PPO algorithm. In addition to targeting this detector, we studied how the adversarial attack can be transferred to other models. The agent previously trained against the GBDT classifier scored an evasion rate of 28.91% against MalConv, a model based solely on machine learning. We also tested the generated adversarial examples against top AV programs and achieved an evasion rate ranging from 10.24% to 25.7%. These results prove that not only machine learning-based models are vulnerable to adversarial attacks and that better safeguards need to be taken to protect our systems.

Algorithms for Combinatorics on Words

Author
Martin Rejmon
Year
2021
Type
Master thesis
Supervisor
doc. Ing. Štěpán Starosta, Ph.D.
Reviewers
doc. Ing. Jan Janoušek, Ph.D.
Summary
In this thesis, several algorithms for problems from the mathematical field combinatorics on words are thoroughly explained. The algorithms are also implemented in the free and open-source mathematics software system SageMath with the goal of their eventual integration into the same system. The algorithms include: classifying mortal and bounded letters of a morphism, deciding whether a morphism is injective, finding a simplification of a morphism, finding all infinite repetitions in a D0L system, and finding all factors (up to a certain length) of words of the language of a PD0L system. Furthermore, the problem of deciding whether a morphism of a D0L system is injective on the set of factors of words of the language of the system is discussed. The decidability of this problem is an open question, which is not answered in this thesis, but it is shown why it is nontrivial and where some approaches to solving this problem fail.

A Modular and Extensible Tool for Software Fault Localization

Author
Petr Nevyhoštěný
Year
2020
Type
Master thesis
Supervisor
Ing. Petr Máj
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Localization of faults has been recognized as one of the most tedious and time-consuming tasks in software development. Yet it is still performed mostly manually. Software fault localization (SFL) is a research field that studies the development of automated techniques helping developers with this activity. Despite the amount of effort put into the research, applications of proposed methods do not fully meet the practical needs. To that end, this thesis describes \emph{Aardwolf}, a new fault localization tool and framework whose main goal is to ease the usage of SFL techniques in real-world projects. This is achieved by highly modular design allowing convenient extensibility, and by applying recommendations that can be found in published user studies. To demonstrate Aardwolf's capabilities, three different SFL techniques and frontends for C and Python programming languages were implemented. Integration into two larger software projects was described to present its applicability. Preliminary results show that the effectiveness is comparable with the literature. The focus on user experience and tool's extensibility is however a major improvement over the current state in the field.

Backreferences in practical regular expressions

Author
Martin Hron
Year
2020
Type
Master thesis
Supervisor
Ing. Ondřej Guth, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
Backreferences are an extension of regular expressions commonly supported in modern tools. Regular expressions with backreference have an increased expressive power but their matching problem is NP-complete. This work researches existing approaches for regular expression matching with focus on backreferences, and also provides an overview of theoretical work on the topic. As a part of this thesis, a matching tool based on computational model called memory automata was implemented. Matching patterns with number of backreferences to different groups limited by a constant using memory automata has polynomial time complexity. An additional recently published technique based on memory automata was also implemented, which provides polynomial complexity even for a subset of patterns with unbounded number of backreferences restricted by a certain property. As a part of this thesis, an alternative algorithm to compute this property was proposed and implemented. The memory automaton model was also extended to support counting constraints, and other common extensions were implemented. Experimental evaluation showed that the implemented tool is much more resistant to catastrophic backtracking when compared to existing implementations with support for backreferences. No tested algorithmic complexity attack triggered significant slowdown.

Parameterized Algorithms for Steiner Trees

Author
Peter Mitura
Year
2019
Type
Master thesis
Supervisor
RNDr. Ondřej Suchý, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Given a weighted graph and a subset of its vertices called terminals, the Steiner Tree problem is to find its connected subgraph of the minimum weight, that contains all of the given terminals. This problem is widely known to be NP-hard, and has a wide range of applications in integrated circuit design, networking, transportation and more. In our thesis, we analyze Dreyfus-Wagner, Erickson-Monma-Veinott, and Nederlof algorithms for this problem parameterized by the number of vertices, and the rank-based dynamic programming algorithm for the problem parameterized by treewidth. For all of these algorithms, we provide an optimized implementation, and compare it with other solutions by submitting our results to the PACE 2018 challenge.

Taking and Breaking Games

Author
Šimon Lomič
Year
2019
Type
Master thesis
Supervisor
RNDr. Tomáš Valla, Ph.D.
Summary
In this thesis, we examine several open problems in taking and breaking games under the impartial and partizan setting. We prove the existence of an impartial subtraction game with aperiodic nim-sequence and periodic outcome sequence. We also analyze the equivalence classes of subtraction games and provide a solution to one of these classes. We introduce a new taking and breaking game and partially solve it. Then we solve several partizan subtraction games and partizan pure breaking games and describe a general technique for testing arithmetic periodicity of these games. Moreover, we design some game solving algorithms for impartial taking and breaking games. We prove PSPACE-hardness for a generalized subtraction game under the impartial setting and EXPTIME-hardness under the partizan setting.

Hitting paths in graphs

Author
Radovan Červený
Year
2018
Type
Master thesis
Supervisor
RNDr. Ondřej Suchý, Ph.D.
Reviewers
RNDr. Tomáš Valla, Ph.D.
Summary
The problem of d-Path Vertex Cover, d-PVC lies in determining a subset F of vertices of a given graph G=(V,E) such that G \ F does not contain a path on d vertices. The paths we aim to cover need not to be induced. It is known that the d-PVC problem is NP-complete for any d >= 2. 5-PVC is known to be solvable in O(5^k*n^O(1)) time when parameterized by the size of the solution k. In this thesis we present an iterative compression algorithm that solves the 5-PVC problem in O(4^k*n^O(1)) time.

Subgraph Isomorphism Algorithm Based on Color Coding

Author
Josef Malík
Year
2016
Type
Master thesis
Supervisor
RNDr. Ondřej Suchý, Ph.D.
Reviewers
RNDr. Tomáš Valla, Ph.D.
Summary
This thesis describes a solution to the subgraph isomorphism problem using the color coding technique. The subgraph isomorphism problem, its variants, and its applications are shown. The problem of constructing a nice tree decomposition of optimal width for a given graph is addressed, as such structure is required for the corresponding approach. As a main part of the thesis, several modifications and optimizations of the original color coding algorithm are proposed. Practical result of this work is a module implemented in C designated to solve large instances of the subgraph isomorphism problem along with the enumeration of the results.

Indexing XML Documents

Author
Eliška Šestáková
Year
2015
Type
Master thesis
Supervisor
doc. Ing. Jan Janoušek, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.
Summary
The theory of text indexing is very well-researched, which does not hold for theories of indexing other data structures, such as trees for example. In this thesis we review existing techniques for indexing texts and trees and study state-of-the-art methods for indexing XML documents. We show that automata can be used effectively for the purpose of indexing XML documents. A new and simple method for indexing XML documents using deterministic finite automaton is introduced. The presented method supports a significant fragment of XPath queries which may use any combination of child (/) and descendant-or-self (//) axes. We also propose another two indexing techniques based on finite automata, aimed to assist in evaluating paths queries with either / or // axis only. Given a subject XML document D and its corresponding XML tree model T with n nodes, the tree is preprocessed and the index is constructed. The searching phase uses the index, reads an input query Q of size m and computes the list of positions of all occurrences of target nodes of Q in T. All the proposed automata performed the searching in time O(m) and do not depend on n. Although the automaton that supports all linear XPath queries where just // axis is used evaluates O(2^n) distinct queries, number of states of the deterministic automaton is O(h^k), where h is the height of T and k is the number of its leaf nodes. Moreover, we discuss that in case of indexing a common XML document the number of state in the deterministic finite automaton is at most O(h.2^k).

Incremental Clustering-Based Compression

Author
Luboš Krčál
Year
2014
Type
Master thesis
Supervisor
prof. Ing. Jan Holub, Ph.D.
Reviewers
Ing. Jan Trávníček, Ph.D.

Pattern matching and indexing of unordered trees

Author
Vojtěch Sobota
Year
2013
Type
Master thesis
Supervisor
prof. Ing. Bořivoj Melichar, DrSc.
Reviewers
doc. Ing. Jan Janoušek, Ph.D.

The person responsible for the content of this page: Ing. Zdeněk Muzikář, CSc.