Computer Science

Application of Reinforcement Learning to Creating Adversarial Malware Samples

Author

Matouš Kozák

Year

2023

Type

Master thesis

Supervisor

Mgr. Martin Jureček, Ph.D.

Reviewers

doc. Ing. Ivan Šimeček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

Machine learning is becoming increasingly popular as a go-to approach for many tasks due to its world-class results. As a result, antivirus developers are starting to incorporate machine learning models into their products. While these models improve malware detection capabilities, they also carry the disadvantage of being susceptible to adversarial attacks. Although this sensitivity has been demonstrated for many models in white-box settings, a black-box attack is more applicable in practice for the domain of malware detection. Therefore, we present a black-box scenario in which the attacker only has the predicted label at his disposal and forgoes any other information about the target classifier. Using reinforcement learning algorithms, we implemented an attack against the GBDT classifier trained on the EMBER dataset. We trained several RL agents on a dataset of Windows malware with an emphasis on preserving the original functionality of the malicious samples. We achieved an evasion rate of 58.92% against the targeted classifier using the PPO algorithm. In addition to targeting this detector, we studied how the adversarial attack can be transferred to other models. The agent previously trained against the GBDT classifier scored an evasion rate of 28.91% against MalConv, a model based solely on machine learning. We also tested the generated adversarial examples against top AV programs and achieved an evasion rate ranging from 10.24% to 25.7%. These results prove that not only machine learning-based models are vulnerable to adversarial attacks and that better safeguards need to be taken to protect our systems.

Thesis on DSpace

Algorithms for Combinatorics on Words

Author

Martin Rejmon

Year

2021

Type

Master thesis

Supervisor

doc. Ing. Štěpán Starosta, Ph.D.

Reviewers

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Summary

In this thesis, several algorithms for problems from the mathematical field combinatorics on words are thoroughly explained. The algorithms are also implemented in the free and open-source mathematics software system SageMath with the goal of their eventual integration into the same system. The algorithms include: classifying mortal and bounded letters of a morphism, deciding whether a morphism is injective, finding a simplification of a morphism, finding all infinite repetitions in a D0L system, and finding all factors (up to a certain length) of words of the language of a PD0L system. Furthermore, the problem of deciding whether a morphism of a D0L system is injective on the set of factors of words of the language of the system is discussed. The decidability of this problem is an open question, which is not answered in this thesis, but it is shown why it is nontrivial and where some approaches to solving this problem fail.

Thesis on DSpace

A Modular and Extensible Tool for Software Fault Localization

Author

Petr Nevyhoštěný

Year

2020

Type

Master thesis

Supervisor

Ing. Petr Máj

Reviewers

doc. Ing. Ivan Šimeček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

Localization of faults has been recognized as one of the most tedious and time-consuming tasks in software development. Yet it is still performed mostly manually. Software fault localization (SFL) is a research field that studies the development of automated techniques helping developers with this activity. Despite the amount of effort put into the research, applications of proposed methods do not fully meet the practical needs. To that end, this thesis describes \emph{Aardwolf}, a new fault localization tool and framework whose main goal is to ease the usage of SFL techniques in real-world projects. This is achieved by highly modular design allowing convenient extensibility, and by applying recommendations that can be found in published user studies. To demonstrate Aardwolf's capabilities, three different SFL techniques and frontends for C and Python programming languages were implemented. Integration into two larger software projects was described to present its applicability. Preliminary results show that the effectiveness is comparable with the literature. The focus on user experience and tool's extensibility is however a major improvement over the current state in the field.

Thesis on DSpace

Backreferences in practical regular expressions

Author

Martin Hron

Year

2020

Type

Master thesis

Supervisor

Ing. Ondřej Guth, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

Backreferences are an extension of regular expressions commonly supported in modern tools. Regular expressions with backreference have an increased expressive power but their matching problem is NP-complete. This work researches existing approaches for regular expression matching with focus on backreferences, and also provides an overview of theoretical work on the topic. As a part of this thesis, a matching tool based on computational model called memory automata was implemented. Matching patterns with number of backreferences to different groups limited by a constant using memory automata has polynomial time complexity. An additional recently published technique based on memory automata was also implemented, which provides polynomial complexity even for a subset of patterns with unbounded number of backreferences restricted by a certain property. As a part of this thesis, an alternative algorithm to compute this property was proposed and implemented. The memory automaton model was also extended to support counting constraints, and other common extensions were implemented. Experimental evaluation showed that the implemented tool is much more resistant to catastrophic backtracking when compared to existing implementations with support for backreferences. No tested algorithmic complexity attack triggered significant slowdown.

Thesis on DSpace

Parameterized Algorithms for Steiner Trees

Author

Peter Mitura

Year

2019

Type

Master thesis

Supervisor

RNDr. Ondřej Suchý, Ph.D.

Reviewers

doc. Ing. Ivan Šimeček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

Given a weighted graph and a subset of its vertices called terminals, the Steiner Tree problem is to find its connected subgraph of the minimum weight, that contains all of the given terminals. This problem is widely known to be NP-hard, and has a wide range of applications in integrated circuit design, networking, transportation and more. In our thesis, we analyze Dreyfus-Wagner, Erickson-Monma-Veinott, and Nederlof algorithms for this problem parameterized by the number of vertices, and the rank-based dynamic programming algorithm for the problem parameterized by treewidth. For all of these algorithms, we provide an optimized implementation, and compare it with other solutions by submitting our results to the PACE 2018 challenge.

Thesis on DSpace

Taking and Breaking Games

Author

Šimon Lomič

Year

2019

Type

Master thesis

Supervisor

RNDr. Tomáš Valla, Ph.D.

Department

Department of Theoretical Computer Science

Summary

In this thesis, we examine several open problems in taking and breaking games under the impartial and partizan setting. We prove the existence of an impartial subtraction game with aperiodic nim-sequence and periodic outcome sequence. We also analyze the equivalence classes of subtraction games and provide a solution to one of these classes. We introduce a new taking and breaking game and partially solve it. Then we solve several partizan subtraction games and partizan pure breaking games and describe a general technique for testing arithmetic periodicity of these games. Moreover, we design some game solving algorithms for impartial taking and breaking games. We prove PSPACE-hardness for a generalized subtraction game under the impartial setting and EXPTIME-hardness under the partizan setting.

Thesis on DSpace

Hitting paths in graphs

Author

Radovan Červený

Year

2018

Type

Master thesis

Supervisor

RNDr. Ondřej Suchý, Ph.D.

Reviewers

RNDr. Tomáš Valla, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The problem of d-Path Vertex Cover, d-PVC lies in determining a subset F of vertices of a given graph G=(V,E) such that G \ F does not contain a path on d vertices. The paths we aim to cover need not to be induced. It is known that the d-PVC problem is NP-complete for any d >= 2. 5-PVC is known to be solvable in O(5^k*n^O(1)) time when parameterized by the size of the solution k. In this thesis we present an iterative compression algorithm that solves the 5-PVC problem in O(4^k*n^O(1)) time.

Thesis on DSpace

Subgraph Isomorphism Algorithm Based on Color Coding

Author

Josef Malík

Year

2016

Type

Master thesis

Supervisor

RNDr. Ondřej Suchý, Ph.D.

Reviewers

RNDr. Tomáš Valla, Ph.D.

Department

Department of Theoretical Computer Science

Summary

This thesis describes a solution to the subgraph isomorphism problem using the color coding technique. The subgraph isomorphism problem, its variants, and its applications are shown. The problem of constructing a nice tree decomposition of optimal width for a given graph is addressed, as such structure is required for the corresponding approach. As a main part of the thesis, several modifications and optimizations of the original color coding algorithm are proposed. Practical result of this work is a module implemented in C designated to solve large instances of the subgraph isomorphism problem along with the enumeration of the results.

Thesis on DSpace

Indexing XML Documents

Author

Eliška Šestáková

Year

2015

Type

Master thesis

Supervisor

doc. Ing. Jan Janoušek, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Summary

The theory of text indexing is very well-researched, which does not hold for theories of indexing other data structures, such as trees for example. In this thesis we review existing techniques for indexing texts and trees and study state-of-the-art methods for indexing XML documents. We show that automata can be used effectively for the purpose of indexing XML documents. A new and simple method for indexing XML documents using deterministic finite automaton is introduced. The presented method supports a significant fragment of XPath queries which may use any combination of child (/) and descendant-or-self (//) axes. We also propose another two indexing techniques based on finite automata, aimed to assist in evaluating paths queries with either / or // axis only. Given a subject XML document D and its corresponding XML tree model T with n nodes, the tree is preprocessed and the index is constructed. The searching phase uses the index, reads an input query Q of size m and computes the list of positions of all occurrences of target nodes of Q in T. All the proposed automata performed the searching in time O(m) and do not depend on n. Although the automaton that supports all linear XPath queries where just // axis is used evaluates O(2^n) distinct queries, number of states of the deterministic automaton is O(h^k), where h is the height of T and k is the number of its leaf nodes. Moreover, we discuss that in case of indexing a common XML document the number of state in the deterministic finite automaton is at most O(h.2^k).

Thesis on DSpace

Incremental Clustering-Based Compression

Author

Luboš Krčál

Year

2014

Type

Master thesis

Supervisor

prof. Ing. Jan Holub, Ph.D.

Reviewers

Ing. Jan Trávníček, Ph.D.

Department

Department of Theoretical Computer Science

Pattern matching and indexing of unordered trees

Author

Vojtěch Sobota

Year

2013

Type

Master thesis

Supervisor

prof. Ing. Bořivoj Melichar, DrSc.

Reviewers

doc. Ing. Jan Janoušek, Ph.D.

Department

Department of Theoretical Computer Science

Theses

Master theses

Application of Reinforcement Learning to Creating Adversarial Malware Samples

Algorithms for Combinatorics on Words

A Modular and Extensible Tool for Software Fault Localization

Backreferences in practical regular expressions

Parameterized Algorithms for Steiner Trees

Taking and Breaking Games

Hitting paths in graphs

Subgraph Isomorphism Algorithm Based on Color Coding

Indexing XML Documents

Incremental Clustering-Based Compression

Pattern matching and indexing of unordered trees