Mgr. Martin Jureček, Ph.D.

Theses

Bachelor theses

Interpretability of machine learning-based results of malware detection using a set of rules

Author
Jan Dolejš
Year
2021
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
Machine learning methods have been quite successful in a variety of applications. Antivirus companies use them for quick and reliable malware detection, providing their users with a safer environment from ceaseless daily threats. However, machine learning methods such as deep neural networks are often considered black boxes as the reasoning behind their decisions may often be unclear. Their interpretability is important and helps understand potential errorful decisions. This thesis studies rule-learning algorithms and explores their potential to interpret the outcomes of machine learning algorithms. Two publicly available datasets with Portable Executable file attributes and tailor-made implementations of rule-learning algorithms were used throughout the work. Results showed that algorithm RIPPER is mostly successful at this task; it achieved high accuracies while maintaining compact sets of rules, making rule-learning algorithms a useful alternative to signature-based methods.

Correlation Attack on A5/1

Author
Martin Holec
Year
2015
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Description of A5/1 cipher, basics of cryptographic attacks and analysis of correlation attack. This thesis is helpful for everyone who is intrested in stream ciphers, basic ideas of cryptographic attacks and correlation attack on A5/1 in particular.

Static detection of malicious PE files

Author
Jakub Ács
Year
2018
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
Since the classical used approaches for malicious software (malware) detection are failing to provide sufficient level of protection, it is becoming clear that these will have to be substituted or at least enhanced by new, inovative methods in the future. This thesis focuses on utilizing machine learning techniques for malware detection. Using static features extracted from the PE files like imported functions, we were able to train various machine learning models for malware detection. The best performing model reached almost 95\% accuracy. This model can be used for instance, for preliminary detection of malicious PE files. Another purpose of the thesis can be found in the following research, it could serve as another input for future automatic malware analysis studies.

Hard Mathematical Problems in Cryptography

Author
Marek Holík
Year
2022
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
RNDr. Jiřina Scholtzová, Ph.D.
Summary
This thesis describes protocols and ciphers which base their security on integer factorization problem, quadratic residuosity problem and discrete logarithm problem. Random instances of mathematical problems are generated based on the described protocols and ciphers for comparison of Magma, SageMath and MATLAB implementations. Efficiency is evaluated in terms of average time needed and success rate for solving instances of a fixed length. The time for solving one instance is limited to one hour.

Real Time Attack of A5/1

Author
Tomáš Hradský
Year
2015
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
This bachelor thesis is focused on analysis of cipher A5/1 and on ways of exploiting its weaknesses. A5/1 is a stream cipher which is used to provide over-the-air privacy for mobile telecommunication. We focus mainly on time-memory tradeoff attacks which allow the attack to be conducted within tens of seconds on the expense of a vast precomputation phase. The result of the practial part of this thesis is the possibility to uncover the secret key from captured data.

Static malware detection using recurrent neural networks

Author
Matouš Kozák
Year
2020
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
An ever-growing number of malicious attacks on our IT infrastructure calls for new and better methods of protection. In this thesis, we focus on the use of recurrent neural networks as an agile and accurate way of detecting malware. We only used features extracted from files in the PE file format to represent the suspicious programs which we used to train various types of recurrent neural networks. In this work, we present unique neural network architecture combining dense and stacked LSTM layers to classify PE files. We worked with our dataset of 30,154 files collected from available resources with which we achieved an accuracy of 98.41%, while only 0.5% of benign samples were misclassified as malware on our balanced dataset. All this was accomplished with only 250 epochs of training. These results prove that machine-learning algorithms, especially LSTM networks, can be used as a quick and reliable tool for malware detection.

Cryptanalysis of RSA based on factorization

Author
Petr Horák
Year
2020
Type
Bachelor thesis
Supervisor
Mgr. Martin Jureček
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
This bachelor's thesis deals with RSA cryptanalysis based on integer factorization. Described are some of the factorization methods. Emphasis is laid on their computational complexities and comparison in real-life scenarios of use. Another topic in this work is a discussion of security issues of RSA which can occur by incautious use of this cryptographic method. Selected algorithms are implemented in Magma language for speed testing on the sample data.

Master theses

Reducing Overdefined Systems of Polynomial Equations Derived from Small Scale Variants of the AES

Author
Jana Berušková
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
This master thesis deals with the possibility of increasing efficiency of the calculation of the secret key of the AES cipher (or its small scale derivatives) by different types of reduction of overdefined system of polynomial equations, which models the cipher. The reduction is done by selecting and xoring such polynomials, to simplify the overall calculation. One possibility is, for example, xoring polynomials with the same most-significant monomial or finding the most similar polynomials using clustering.

Unsupersived Instance Selection for Malware Detection

Author
Mehmet Efe Zorlutuna
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Summary
This work proposes a new unsupervised instance selection algorithm for malware detection and evaluates its effectiveness. The proposed algorithm selects informative instances from unlabelled data to train a machine learning model, which is expected to improve its accuracy and efficiency. Experiments show a performance comparison of the algorithm and an existing unsupervised instance selection algorithm.

Algebraic Cryptanalysis of Small-Scale Variants of Stream Cipher E0

Author
Jan Dolejš
Year
2024
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
This work introduces and demonstrates innovative progress in the algebraic analysis of the small-scale variants of the stream cipher E0 from the Bluetooth standard. We design the small-scale variants and represent them using a set of polynomial equations. Our work reveals a possible linear relation between the number of keystream bits and the size of the small-scale E0 variants, improving the performance of the used solvers. Our best run finds the initial configuration in $178.5$ seconds for the 22-bit E0 version. Using local sensitivity hashing, we improved the computational time of the SAT solver from 453.1 seconds to 85.3 seconds for the 19-bit E0 version.

Algebraic Cryptanalysis of Small Scale Variants of the Grain-128AEAD cipher

Author
Daniel Minarovič
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
Ing. Josef Kokeš, Ph.D.
Summary
This master thesis deals with algebraic cryptanalysis which converts the Grain-128AEADv2 cipher, hereinafter referred to as Grain, into a system of polynomial equations and then solves it using Groebner bases. Algebraic cryptanalysis will be applied to small scale variants of Grain cipher. For a more efficient calculation of the solution, we will use the Locality Sensitive Hashing method for simplifying the system of polynomial equations. As part of the experiments, we were able to break, for example, a 16-bit small scale variant of the cipher with the number of rounds of 64, as well as the 24-bit variant with number of rounds of 8. The generation time of the equations for the mentioned 24-bit variant took approximately 8 minutes, and the calculation of the solution took 113 seconds. We also applied the Locality Sensitive Hashing algorithm and thereby accelerated the calculation of Groebner bases and solutions. For example, for the 24-bit version with the number of rounds of 8, we were able to reduce the calculation of the solution from 113 seconds to 1,88 seconds.

Application of Reinforcement Learning to Creating Adversarial Malware Samples

Author
Matouš Kozák
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Machine learning is becoming increasingly popular as a go-to approach for many tasks due to its world-class results. As a result, antivirus developers are starting to incorporate machine learning models into their products. While these models improve malware detection capabilities, they also carry the disadvantage of being susceptible to adversarial attacks. Although this sensitivity has been demonstrated for many models in white-box settings, a black-box attack is more applicable in practice for the domain of malware detection. Therefore, we present a black-box scenario in which the attacker only has the predicted label at his disposal and forgoes any other information about the target classifier. Using reinforcement learning algorithms, we implemented an attack against the GBDT classifier trained on the EMBER dataset. We trained several RL agents on a dataset of Windows malware with an emphasis on preserving the original functionality of the malicious samples. We achieved an evasion rate of 58.92% against the targeted classifier using the PPO algorithm. In addition to targeting this detector, we studied how the adversarial attack can be transferred to other models. The agent previously trained against the GBDT classifier scored an evasion rate of 28.91% against MalConv, a model based solely on machine learning. We also tested the generated adversarial examples against top AV programs and achieved an evasion rate ranging from 10.24% to 25.7%. These results prove that not only machine learning-based models are vulnerable to adversarial attacks and that better safeguards need to be taken to protect our systems.

Algebraic Cryptanalysis of LFSR-based Stream Ciphers

Author
Jiří Soukup
Year
2022
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
Keystream produced by LFSR-based stream ciphers can be described by a system of polynomial equations. Such ciphers can be cracked by solving the system. Using the F4 algorithm we can effectively solve a system of polynomial equations. Knowing the parameters of an LFSR-based stream cipher we can further speed up the computation by applying the guess-and-determine method. In this thesis, we describe how the F4 algorithm works. And mainly we try to find effective ways of applying the guess-and-determine method to reduce the computation time of the F4 algorithm when used to solve a system of polynomial equations that represent an LFSR-based stream cipher.

Algebraic Cryptanalysis of Small Scale Variants of the AES

Author
Marek Bielik
Year
2021
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
This work proposes and demonstrates new advances in algebraic cryptanalysis of small scale derivatives of the Advanced Encryption Standard (AES). We model the AES as a system of polynomial equations over GF(2), which involves only the variables of the initial key, and we subsequently attempt to solve such a system. We show, for example, that one of the attacks can recover the secret key for one round of the AES-128 under one minute on a contemporary CPU. This attack requires only two known plaintexts and their corresponding ciphertexts.

Simulation of malware detection model

Author
Libor Šlechta
Year
2020
Type
Master thesis
Supervisor
Mgr. Martin Jureček
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
The number of harmful programs is still rising, and attackers invent new techniques to avoid detection every day. This thesis focuses on automatic malware classification by using machine learning algorithms. The main difference from other work in this field of study is experiment that mimics behavior of a detection model. Four machine learning algorithms were used for this experiment. Neural networks, K-nearest neighbors, Decision trees and Naive Bayes classifier. The best result was achieved by using a variant of neural network. Multilayer perceptron network simulated the model with accuracy of 98,68 %.

Semi-supervised learning for malware detection

Author
Michal Buchovecký
Year
2019
Type
Master thesis
Supervisor
Mgr. Martin Jureček
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
Nowadays, the use of machine learning for malware detection is not very popular. One of the reasons is that labelling of malware and benign files necessary for machine learning is very expensive process. This thesis is focused on malware detection by semi - supervised learning. Semi-supervised learning is a machine learning technique that makes use of labelled as well as unlabelled samples for training. Information obtained from executable files in PE format was used for training. In the thesis it is showed that it is possible to reach better accuracy using semi - supervised learning, compared to purely supervised approach.

Generating Malware Family Signatures from Behavioral Graphs using Unsupervised Learning

Author
Tomáš Zvara
Year
2022
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
prof. Ing. Róbert Lórencz, CSc.
Summary
The behavioral shield is a component of Avast AV responsible for monitoring the system and identifying suspicious behavior of running processes. The behavior is captured in the form of behavioral graphs. There is ongoing internal research that studies the options to use novel deep learning models, i.e., graph neural networks, to allow high-scale learning on these graphs. This thesis aims to study three different graph embeddings, which were produced by the existing graph neural network models, and verify whether the embedded representations allow distinguishing the malicious behavior of various malware strains. The structure of embedded spaces is analyzed using well-known clustering methods, namely k-means, DBSCAN, and agglomerative clustering. The results of the clustering process are evaluated by intrinsic and extrinsic measures. The hypothesis is that the formed clusters should represent individual malware families and thus can be used to create a behavioral signature to detect them. However, performed experiments show that the applied clustering methods produce low-quality clusters that do not allow separating the selected malware strains. There are two factors that cause the low performance. The first one is the poor expressibility of the behavioral graphs with respect to the individual malware strains. The second one is the low quality of the provided labels.

A Comparison of Adversarial Learning Techniques for Malware Detection

Author
Pavla Louthánová
Year
2023
Type
Master thesis
Supervisor
Mgr. Martin Jureček, Ph.D.
Reviewers
Ing. Matouš Kozák
Summary
Malware is one of the most significant security threats today. Early detection is important for effective malware protection. Machine learning has proven to be a useful tool for automated malware detection. However, research has shown that machine learning models are vulnerable to adversarial attacks. This thesis discusses adversarial learning techniques in malware detection. The aim is to apply some existing methods for generating adversarial malware samples, test their effectiveness against selected malware detectors, and compare the evasion rate achieved and their practical applicability. The thesis begins with an introduction to adversarial machine learning, followed by a description of the portable executable file format and a review of publications that focus on generating adversarial malware samples. The techniques used to generate malware samples for experimental evaluation are then presented. Finally, the experiments performed are described, including observation of the time required to generate samples, changes in sample size after using the generator, testing effectiveness against antivirus programs, combining the use of multiple generators to generate samples, and evaluation of the results. Five generators were selected for the experiments: Partial DOS, Full DOS, GAMMA padding, GAMMA section-injection and Gym-malware. The results showed that applying optimised modifications to previously detected malware can lead to incorrect classification of the file as benign. It was also found that generated malware samples can be successfully used against detection models other than those used to generate them, and that using combinations of generators can create new samples that evade detection. Experiments show that the Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential. This generator achieved an average sample generation time of 5.73 seconds and the highest evasion rate of 67%. When used in combination with itself, the evasion rate improved to 78%.