Ing. Eliška Krátká

Theses

Bachelor theses

Machine Learning-Based Phishing Detection

Author: Jan Koníř
Year: 2025
Type: Bachelor thesis
Supervisor: Ing. Eliška Krátká
Reviewer: Ing. Ivana Trummová
Summary
This thesis explores the use of machine learning for the static detection of phishing attacks, focusing on phishing emails as the attack vector. Because publicly available, up-to-date datasets are lacking, a custom dataset was created containing labeled samples of real-world phishing attempts and legitimate emails. A modular Python preprocessing pipeline was developed to allow various combinations of text preprocessing and feature extraction methods. These include tokenization, stemming, and stopword removal, as well as lexical, statistical, and semantic feature extraction techniques. An SVM classifier was trained on this dataset, and its performance was evaluated in a series of structured experiments. Finally, the thesis discusses the experimental findings and suggests directions for future work in machine learning-based phishing detection.