Aktuální informace FIT ke koronaviru najdete zde.

Mgr. Martin Jureček, Ph.D.

Publikace

Application of Distance Metric Learning to Automated Malware Detection

Rok
2021
Publikováno
IEEE Access. 2021, 2021(9), 96151-96165. ISSN 2169-3536.
Typ
Článek
Anotace
Distance metric learning aims to find the most appropriate distance metric parameters to improve similarity-based models such as k -Nearest Neighbors or k -Means. In this paper, we apply distance metric learning to the problem of malware detection. We focus on two tasks: (1) to classify malware and benign files with a minimal error rate, (2) to detect as much malware as possible while maintaining a low false positive rate. We propose a malware detection system using Particle Swarm Optimization that finds the feature weights to optimize the similarity measure. We compare the performance of the approach with three state-of-the-art distance metric learning techniques. We find that metrics trained in this way lead to significant improvements in the k -Nearest Neighbors classification. We conducted and evaluated experiments with more than 150,000 Windows-based malware and benign samples. Features consisted of metadata contained in the headers of executable files in the portable executable file format. Our experimental results show that our malware detection system based on distance metric learning achieves a 1.09 % error rate at 0.74 % false positive rate (FPR) and outperforms all machine learning algorithms considered in the experiment. Considering the second task related to keeping minimal FPR, we achieved a 1.15 % error rate at only 0.13 % FPR.

Improving Classification of Malware Families using Learning a Distance Metric

Rok
2021
Publikováno
Proceedings of the 7th International Conference on Information Systems Security and Privacy. Madeira: SciTePress, 2021. p. 643-652. ISSN 2184-4356. ISBN 978-989-758-491-6.
Typ
Stať ve sborníku
Anotace
The objective of malware family classification is to assign a tested sample to the correct malware family. This paper concerns the application of selected state-of-the-art distance metric learning techniques to malware families classification. The goal of distance metric learning algorithms is to find the most appropriate distance metric parameters concerning some optimization criteria. The distance metric learning algorithms considered in our research learn from metadata, mostly contained in the headers of executable files in the PE file format. Several experiments have been conducted on the dataset with 14,000 samples consisting of six prevalent malware families and benign files. The experimental results showed that the average precision and recall of the k-Nearest Neighbors algorithm using the distance learned on training data were improved significantly comparing when the non-learned distance was used. The k-Nearest Neighbors classifier using the Mahalanobis distance metric learned by the Metric Learning for Kernel Regression method achieved average precision and recall, both of 97.04% compared to Random Forest with a 96.44% of average precision and 96.41% of average recall, which achieved the best classification results among the state-of-the-art ML algorithms considered in our experiments.

Representation of PE Files using LSTM Networks

Autoři
Jureček, M.; Kozák, M.
Rok
2021
Publikováno
Proceedings of the 7th International Conference on Information Systems Security and Privacy. Madeira: SciTePress, 2021. p. 516-525. ISSN 2184-4356. ISBN 978-989-758-491-6.
Typ
Stať ve sborníku
Anotace
An ever-growing number of malicious attacks on IT infrastructures calls for new and efficient methods of protection. In this paper, we focus on malware detection using the Long Short-Term Memory (LSTM) as a preprocessing tool to increase the classification accuracy of machine learning algorithms. To represent the malicious and benign programs, we used features extracted from files in the PE file format. We created a large dataset on which we performed common feature preparation and feature selection techniques. With the help of various LSTM and Bidirectional LSTM (BLSTM) network architectures, we further transformed the collected features and trained other supervised ML algorithms on both transformed and vanilla datasets. Transformation by deep (4 hidden layers) versions of LSTM and BLSTM networks performed well and decreased the error rate of several state-of-the-art machine learning algorithms significantly. For each machine learning algorithm considered in our experiments, the LSTM-based transformation of the feature space results in decreasing the corresponding error rate by more than 58.60 %, in comparison when the feature space was not transformed using LSTM network.

Distance Metric Learning using Particle Swarm Optimization to Improve Static Malware Detection

Rok
2020
Publikováno
Proceedings of the 6th International Conference on Information Systems Security and Privacy. Madeira: SciTePress, 2020. p. 725-732. ISSN 2184-4356. ISBN 978-989-758-399-5.
Typ
Stať ve sborníku
Anotace
Distance metric learning is concerned with finding appropriate parameters of distance function with respect to a particular task. In this work, we present a malware detection system based on static analysis. We use k-nearest neighbors (KNN) classifier with weighted heterogeneous distance function that can handle nominal and numeric features extracted from portable executable file format. Our proposed approach attempts to specify the weights of the features using particle swarm optimization algorithm. The experimental results indicate that KNN with the weighted distance function improves classification accuracy significantly.

Automatická detekcia škodlivého kódu

Autoři
Rok
2019
Publikováno
Sborník příspěvků PAD 2019 – elektronická verze. Praha: AMCA spol. s r.o., 2019. p. 35-38. ISBN 978-80-88214-20-5.
Typ
Stať ve sborníku
Anotace
Problém automatickej detekcie malware predstavuje v súčasnosti pre antivírové spoločnosti veľké výzvy. Každý deň sa vygeneruje čoraz väčšie množstvo nových neznámych vzoriek škodlivého kódu, ktoré je potrebné včas detekovať. Pretože manuálna analýza vzoriek z dôvodu ohromného množstva nie je možná, je nutné využiť automatickú detekciu malware. Algoritmy strojového učenia sa ukazujú byť vhodným nástrojom, ktorý je do určitej miery schopný adaptovať sa a detekovať aj nové, doteraz neznáme vzorky.

Side-Channel Attack on the A5/1 Stream Cipher

Rok
2019
Publikováno
Proceedings of the 22nd Euromicro Conference on Digital Systems Design. Los Alamitos, CA: IEEE Computer Soc., 2019. p. 633-638. ISBN 978-1-7281-2862-7.
Typ
Stať ve sborníku
Anotace
In this paper we present cryptanalysis of the A5/1 stream cipher used in GSM mobile phones. Our attack is based on power analysis where we assume that the power consumption while clocking 3 LFSRs is different than when clocking 2 LFSRs. We demonstrate a simple power analysis (SPA) attack and discuss existing differential power analysis (DPA). We present the attack for recovering secret key based on the information on clocking bits of LFSRs that was deduced from power analysis. The attack has a 100% success rate, requires minimal storage and it does not requires any single bit of a keystream. An average time complexity of our attack based on SPA is around 233 where the computation unit is a resolution of system of linear equations over the Z2. Recovering the secret key using information from the DPA has a constant complexity.

Malware Detection Using a Heterogeneous Distance Function

Rok
2018
Publikováno
Computing and Informatics. 2018, 37(3), 759-780. ISSN 1335-9150.
Typ
Článek
Anotace
Classication of automatically generated malware is an active research area. The amount of new malware is growing exponentially and since manual in- vestigation is not possible, automated malware classication is necessary. In this paper, we present a static malware detection system for the detection of unknown malicious programs which is based on combination of the weighted k-nearest neigh- bors classier and the statistical scoring technique from. We have extracted the most relevant features from portable executable (PE) le format using gain ratio and have designed a heterogeneous distance function that can handle both linear and nominal features. Our proposed detection method was evaluated on a dataset with tens of thousands of malicious and benign samples and the experimental re- sults show that the accuracy of our classier is 98.80%. In addition, preliminary results indicate that the proposed similarity metric on our feature space could be used for clustering malware into families.