Knowledge Engineering (version for Czech students)

Theses

Master theses

Explainability in deep learning-based medical image analysis

Author
Martin Lank
Year
2024
Type
Master thesis
Supervisor
Ing. Magda Friedjungová, Ph.D.
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
In this work, we apply the Grad-CAM++, LayerCAM and SmoothGrad explainability methods to the proposed EfficientNetV2-based convolutional neural network fine-tuned on microscopic histology images. The network predicts mean diffusivity (MD) and fractional anisotropy (FA) obtained from diffusion tensor imaging. The aim of the work was to reveal which histology features tend to increase MD and FA. The proposed network achieved an R² above 98.5% on the training, validation and test sets, surpassing the network proposed in the preceding work by tens of percentage points in R². Nevertheless, the explainability methods applied to microscopy images were less valuable than anticipated. They indicate some influence of nuclei; however, the details of the relationship remain undiscovered.

Machine Learning Techniques for Laser-Plasma Acceleration Optimization

Author
Matěj Jech
Year
2024
Type
Master thesis
Supervisor
Mgr. Alexander Kovalenko, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
The thesis deals with the analysis of data from the laser-plasma particle accelerator in collaboration with the scientific institution ELI Beamlines. Within the scope of the work, a data preprocessing pipeline was designed and a generative model simulating the course of physics experiments was developed. The model is conditioned on a vector of experimental parameters and generates images showing the energy spectrum of the accelerated electron beam. The developed model can serve as a partial substitute for real experiments, which are costly in terms of time and money, and as a simulation of real experiments for various optimization methods. This thesis defines the process of training and testing candidate models with three different architectures, varying four hyperparameters. The resulting model can generate data at a rate of 1.8 images per second and has been evaluated with a number of metrics, including the expert opinion of scientists, as a trustworthy tool for simulating the electron acceleration process.

Detection and removal of watermarks from image data

Author
Tomáš Halama
Year
2023
Type
Master thesis
Supervisor
Ing. Miroslav Čepek, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
Digital image watermarking is a widely used technique for protecting intellectual property or authenticating digital media, but it can negatively impact image quality and usability. This motivates the need for removing watermarks from images, and deep learning presents a potential solution. This thesis develops a deep learning method for watermark removal, including a survey of existing techniques and the proposal of a novel architecture. The method's performance is evaluated in terms of watermark detection accuracy and image reconstruction quality.

Short-Term Precipitation Forecasting from Satellite Data Using Machine Learning

Author
Jiří Pihrt
Year
2023
Type
Master thesis
Supervisor
Mgr. Petr Šimánek
Reviewers
doc. Ing. Kamil Dedecius, Ph.D.
Summary
Geostationary meteorological satellites are a source of global and frequent weather observations, but they do not directly observe precipitation. We research existing methods for inferring and forecasting rainfall from satellite data. The aim of this thesis is to predict high-resolution precipitation radar observations up to 8 hours ahead from larger-context but lower-resolution multi-spectral geostationary satellite images. We develop a novel deep learning model for this task, built on the U-Net and PhyDNet neural networks. We name it WeatherFusionNet, as it fuses three different ways of processing the satellite data: predicting future satellite images, estimating precipitation in the input sequence, and using the input sequence directly. To train and test it on real data, we participate in the NeurIPS Weather4cast 2022 competition, which provides spatially and temporally aligned satellite imagery and target precipitation radar data. WeatherFusionNet achieved first place in the Core challenge of the competition. We further experiment with several different models, try including static data in the input, and compare our model with a direct radar-to-radar model.

Constraint Programming in Scheduling for Garage

Author
Petr Švec
Year
2023
Type
Master thesis
Supervisor
prof. RNDr. Pavel Surynek, Ph.D.
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
This thesis deals with the implementation of a system for scheduling workshop work in a car repair shop. The thesis analyses the customer requirements and, based on them, defines a model that uses constraint programming. Based on the proposed model, we implemented a solver using the Choco library. We validated the solver on synthetic and practice-motivated data and measured various properties of the generated solutions for the test instances. The focus was on general-purpose use, with maximum parameterization to suit the needs of individual clients and integrations.

Scalable Gaussian processes for surrogate modelling in Bayesian optimization

Author
Iveta Šárfyová
Year
2023
Type
Master thesis
Supervisor
Ing. Jiří Vošmik
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
Bayesian optimisation is a global optimisation method suitable for finding extrema of expensive-to-evaluate black-box objective functions. Gaussian Processes are frequently used as models for approximating such functions. However, their cubic time complexity in the number of training points limits them to applications in small-data regimes. This thesis provides an overview of state-of-the-art scalable Gaussian Processes for regression. The experiments performed within this work deal with regression and Bayesian optimisation tasks, both using several selected Gaussian Process models. Evaluation is done using multiple metrics, some of which are particularly appropriate for probabilistic models. Our results suggest that the same few models consistently outperform the others in both tasks.
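To show where the cubic cost comes from, here is a minimal pure-Python sketch of exact GP regression, assuming a squared-exponential kernel on scalar inputs; the function names are hypothetical and not taken from the thesis. The Cholesky factorization of the n×n kernel matrix is the O(n³) step that scalable GP approximations try to avoid.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T. This is the O(n^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_transposed(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior_mean(X, y, X_test, noise=1e-6):
    """Exact GP posterior mean k_*^T (K + noise * I)^{-1} y."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_transposed(L, solve_lower(L, y))
    return [sum(rbf(x, X[i]) * alpha[i] for i in range(n)) for x in X_test]
```

With a small noise term, the posterior mean nearly interpolates the training targets, e.g. `gp_posterior_mean([0.0, 1.0, 2.0, 3.0], [math.sin(v) for v in [0.0, 1.0, 2.0, 3.0]], [1.5])`.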

Anomaly detection on the CERN data centre monitoring data

Author
Antonín Dvořák
Year
2022
Type
Master thesis
Supervisor
Mgr. Alexander Kovalenko, Ph.D.
Reviewers
doc. Ing. Kamil Dedecius, Ph.D.
Summary
One of the many tasks of CERN cloud service operators is to make sure that the desired computational power is delivered to all users of the scientific community. This is accomplished by carefully configuring threshold-based alarms on top of the infrastructure's performance time-series metrics. To maximize the efficiency of the cloud infrastructure and reduce the monitoring effort for service operators, we have developed a fully automated Anomaly Detection System that applies unsupervised machine learning methods to time-series metrics. Using ensemble methods, we combine traditional (Isolation Forest) and deep learning (Gated Recurrent Unit / Long Short-Term Memory autoencoder) approaches. This work presents a description of the CERN monitoring infrastructure, the problem formulation, the design of the Anomaly Detection Pipeline, a description of the models used, the creation of the dataset, and the performance of the implemented models compared with that of the Current Alarming System.

Lazy Compilation in Classical Planning

Author
Zuzana Fílová
Year
2022
Type
Master thesis
Supervisor
prof. RNDr. Pavel Surynek, Ph.D.
Reviewers
Ing. Daniel Vašata, Ph.D.
Summary
This diploma thesis focuses on lazy compilation in classical planning. The theoretical part summarizes the basics of classical planning. Key concepts of the classical representation of planning problems are defined and basic planning algorithms are presented, in particular search in the planning state space and techniques using the planning graph. The section ends with a discussion of compiling the planning problem into the propositional satisfiability problem (SAT). Based on this knowledge, a new method for lazy compilation of planning problems into SAT is proposed. Unlike classical compilation, this method creates and modifies the propositional formula gradually. In the practical part of the work, a planner was implemented with two compilation variants: the proposed lazy compilation and classical compilation. The planner was tested on planning problems from the International Planning Competition (IPC). The experiments evaluated the success of the planner based on lazy compilation and compared its results with those of the planner using classical compilation. A total of 79 problems of varying difficulty from four domains were used, of which the lazy planner solved 63 faster than the classical planner. The experiments highlighted both the advantages and the possible disadvantages of lazy compilation, and their results indicate that lazy compilation has the potential to improve planner performance.

Power line vegetation management using UAV images

Author
Radek Ježek
Year
2022
Type
Master thesis
Supervisor
Ing. Lukáš Brchl
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Summary
Electric utility companies spend large amounts of money and effort every year to ensure the safe and uninterrupted operation of the electric power infrastructure. The most common source of outages is vegetation damaging power lines, for example, fallen trees. For this reason, companies perform regular inspections and maintenance of power line corridors, especially in forests and densely vegetated areas, which creates high demand for inexpensive and highly automated methods of surveying power line corridors. This work aims to create a robust algorithm for automatically detecting vegetation encroachment in the power line corridor using an Unmanned Aerial Vehicle (UAV), photogrammetry, and computer vision. The study covers the workflow for power line corridor inspection, from comprehensive guidelines for data acquisition, through 3D reconstruction of the power line, to vegetation encroachment detection and visualization of the results.

The Study of Linear Self-Attention Mechanism in Transformer

Author
Uladzislau Yorsh
Year
2022
Type
Master thesis
Supervisor
Mgr. Alexander Kovalenko, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
Because the attention mechanism in the Transformer architecture has quadratic complexity in the sequence length, processing long sequences is computationally demanding. The goal of this research is to explore the possibilities of linear attention in Transformer-like architectures and to implement new methods.
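One well-known way to make attention linear, sketched here for illustration and not necessarily one of the methods developed in the thesis, replaces the softmax with a positive feature map φ so that the matrix products can be reassociated. The minimal pure-Python sketch below assumes the φ(x) = elu(x) + 1 map of Katharopoulos et al. (2020); the function names are hypothetical. Both functions compute the same kernelized attention, but the first builds all n×n query-key weights while the second summarizes the keys and values once in O(n).

```python
import math

def elu_plus_one(v):
    """Positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]

def quadratic_attention(Q, K, V):
    """O(n^2): weight every value by phi(q_i) . phi(k_j), then normalize."""
    phi_k = [elu_plus_one(k) for k in K]
    out = []
    for q in Q:
        phi_q = elu_plus_one(q)
        weights = [sum(a * b for a, b in zip(phi_q, pk)) for pk in phi_k]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """O(n): precompute S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) once."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):
        pk = elu_plus_one(k)
        for a in range(d):
            z[a] += pk[a]
            for c in range(dv):
                S[a][c] += pk[a] * v[c]
    out = []
    for q in Q:
        phi_q = elu_plus_one(q)
        denom = sum(phi_q[a] * z[a] for a in range(d))
        out.append([sum(phi_q[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out
```

The equivalence follows from associativity: the sum over j of (φ(q)·φ(k_j)) v_j equals φ(q)ᵀ(Σ_j φ(k_j) v_jᵀ), so the per-query cost no longer depends on the sequence length.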

Improving deep learning precipitation nowcasting by using prior knowledge

Author
Matej Choma
Year
2022
Type
Master thesis
Supervisor
Mgr. Petr Šimánek
Reviewers
Mgr. Petr Novák, Ph.D.
Summary
Deep learning methods dominate short-term, high-resolution precipitation nowcasting in terms of prediction error. However, their operational usability is limited by the difficulty of explaining the dynamics behind the predictions, and by the predictions being smoothed out and missing high-frequency features as a result of optimizing mean-error loss functions. This thesis summarizes our progress in addressing these issues. First, we present Intensity Classification Loss to improve the prediction of severe rainfall. The model is trained to predict, as a secondary output, the probability of precipitation with an intensity above 40 dBZ, which is compared with a binary ground truth. Experiments have shown that this approach helps predict severe rainfall but does not predict precipitation with intensities higher than the selected threshold. Second, we experiment with hand-engineering the advection-diffusion differential equation into a PhyCell to introduce a more accurate physical prior into a PhyDNet model, which disentangles physical and residual dynamics. The results indicate that while PhyCell can learn the intended dynamics, the training of PhyDNet remains driven by loss optimization, resulting in a model with the same prediction capabilities.
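The secondary-output idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: pixels are flattened into plain lists, the comparison with the binary ground truth is assumed to be a binary cross-entropy, and the main forecast is assumed to use a mean squared error with a hypothetical weight. Only the 40 dBZ threshold comes from the summary.

```python
import math

THRESHOLD_DBZ = 40.0  # severe-rainfall threshold stated in the summary

def intensity_classification_loss(prob, observed_dbz, eps=1e-7):
    """Mean binary cross-entropy between predicted exceedance probabilities
    and the per-pixel binary target 'observed reflectivity > 40 dBZ'."""
    total = 0.0
    for p, obs in zip(prob, observed_dbz):
        target = 1.0 if obs > THRESHOLD_DBZ else 0.0
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(prob)

def combined_loss(pred_dbz, prob, observed_dbz, weight=0.5):
    """Hypothetical total loss: mean squared error of the main forecast plus
    a weighted classification term on the secondary output."""
    mse = sum((a - b) ** 2 for a, b in zip(pred_dbz, observed_dbz)) / len(pred_dbz)
    return mse + weight * intensity_classification_loss(prob, observed_dbz)
```

Because the classification term only asks whether a pixel exceeds the threshold, it rewards flagging severe cells but carries no signal about how far above 40 dBZ they are, which is consistent with the limitation reported above.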

Bayesian filtering of state-space models with unknown covariance matrices

Author
Tomáš Vlk
Year
2021
Type
Master thesis
Supervisor
doc. Ing. Kamil Dedecius, Ph.D.
Reviewers
Ing. Ondřej Tichý, Ph.D.
Summary
This thesis explores the problem of distributed Bayesian sequential estimation of state-space models with unknown process and measurement noise covariance matrices. This is a frequent problem in real-world scenarios, where information about the noise covariance matrices of specific sensors may not be available. The solution proposed in this thesis is built on the variational Bayesian paradigm, which is used to estimate the states as well as the unknown measurement noise covariance matrix. To improve performance, measurements and posterior estimates are shared between adjacent nodes in the network. The thesis also shows a way to optimize the process noise covariance matrix.

Anomaly detection using Extended Isolation Forest

Author
Adam Valenta
Year
2020
Type
Master thesis
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
The thesis deals with anomaly detection algorithms, with a focus on the Extended Isolation Forest algorithm. Extended Isolation Forest generalizes its predecessor, the Isolation Forest. The original Isolation Forest algorithm introduced a new approach to anomaly detection, but it suffers from a bias caused by its tree branching. The extension removes this bias by adjusting the branching, and the original algorithm becomes a special case. Extended Isolation Forest is implemented in the H2O-3 open-source machine learning platform. The implementation was required to run on a distributed computing system using a Map/Reduce library.
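The branching change can be sketched in a few lines. In the Extended Isolation Forest as published by Hariri et al., a node is split by a hyperplane with a random normal vector rather than by a single coordinate, and an extension level controls how many coordinates of that normal are non-zero; level 0 reproduces the axis-parallel cuts of the original Isolation Forest. The pure-Python sketch below illustrates only this split rule; it is not the H2O-3 implementation, and the function names are hypothetical.

```python
import random

def random_split(points, extension_level, rng: random.Random):
    """Draw one Extended Isolation Forest branching cut for a tree node.

    A random normal vector n and intercept p define the hyperplane
    (x - p) . n = 0. Only extension_level + 1 coordinates of n stay
    non-zero, so extension_level = 0 gives an axis-parallel cut as in
    the original Isolation Forest.
    """
    dim = len(points[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Zero out dim - extension_level - 1 randomly chosen coordinates.
    for i in rng.sample(range(dim), dim - extension_level - 1):
        normal[i] = 0.0
    # Intercept drawn uniformly inside the bounding box of the node's data.
    lows = [min(p[i] for p in points) for i in range(dim)]
    highs = [max(p[i] for p in points) for i in range(dim)]
    intercept = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
    return normal, intercept

def goes_left(x, normal, intercept):
    """Branch test: which side of the hyperplane the point falls on."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0
```

An isolation tree applies `random_split` recursively until points are isolated; anomalies tend to be isolated after fewer cuts, so shorter average path lengths signal anomalies in both variants.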

Neural Networks Based Domain Adaptation in Spectroscopic Sky Surveys

Author
Ondřej Podsztavek
Year
2020
Type
Master thesis
Supervisor
RNDr. Petr Škoda, CSc.
Reviewers
Ing. Kamil Dedecius, Ph.D.
Summary
We present an analysis of the impact of neural-network-based domain adaptation in astronomical spectroscopy. Domain adaptation addresses the problem of applying prior knowledge to new data of interest. To study it, we selected the problem of quasar identification in the Large Sky Area Multi-Object Fiber Spectroscopic Telescope survey using labelled data from the Sloan Digital Sky Survey. We chose to experiment with four neural models for domain adaptation: Deep Domain Confusion, Deep Correlation Alignment, Domain-Adversarial Network and Deep Reconstruction-Classification Network. However, our experiments reveal that these models cannot improve classification performance compared with a convolutional neural network that does not consider domain adaptation. Using dimensionality reduction, statistics of the selected methods and misclassifications, we show that the domain adaptation methods are not robust enough to be applied to complex and dirty astronomical data.

Recommendations Model Based on Recurrent Neural Networks

Author
Ladislav Martínek
Year
2020
Type
Master thesis
Supervisor
Ing. Tomáš Řehořek, Ph.D.
Reviewers
Ing. Mgr. Ladislava Smítková Janků, Ph.D.
Summary
This diploma thesis deals with recommendation systems. The aim is to use recurrent neural networks (LSTM, GRU) to predict subsequent interactions from sequential user-behavior data. Matrix factorization adapted for datasets with implicit feedback is used to create representations of items (embeddings). An algorithm for creating recurrent models using these embeddings is designed and implemented in this thesis. Furthermore, an evaluation method respecting the sequential nature of the data is proposed; it uses recall and catalog coverage metrics. Experiments are performed systematically to determine how the results depend on the examined methods and hyperparameters. The measurements were performed on three datasets. On the largest dataset, I achieved more than double the recall of other recommendation techniques, represented by collaborative filtering, a reminder model and a popularity model. The findings, possible improvements through hyperparameter tuning, and other possible means of improving the model are discussed at the end of the work.
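The two metrics can be sketched as follows, assuming one common sequential setup: each test case pairs the recommendation list produced from a user's history with the item the user actually interacted with next. Exact definitions vary between papers, so the function names and details here are illustrative assumptions rather than the thesis's evaluation code.

```python
def recall_at_k(recommendations, next_items, k):
    """Share of held-out next interactions found in the top-k list
    recommended from the preceding part of each user's sequence."""
    hits = sum(1 for recs, item in zip(recommendations, next_items)
               if item in recs[:k])
    return hits / len(next_items)

def catalog_coverage(recommendations, catalog_size, k):
    """Fraction of the item catalog that appears in at least one top-k list."""
    shown = set()
    for recs in recommendations:
        shown.update(recs[:k])
    return len(shown) / catalog_size
```

Reporting both guards against a trivial popularity model: it can reach reasonable recall while recommending only a handful of items, which shows up as low catalog coverage.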

Sequential Bayesian Poisson regression

Author
Radomír Žemlička
Year
2020
Type
Master thesis
Supervisor
Ing. Kamil Dedecius, Ph.D.
Summary
Poisson regression is a popular generalized linear model for discrete count variables. This thesis focuses on the problem of its sequential estimation under potentially slowly time-varying regression coefficients. A convenient normal approximation is used to do so in the Bayesian setting. A calibration technique to enhance the estimation quality is also discussed. Finally, a use case of the proposed approach in the signal processing domain is suggested, in particular its application in diffusion networks to perform distributed collaborative estimation.
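A minimal sketch of one way to obtain such a sequential normal approximation, assuming a Gaussian prior on the coefficients, a random-walk model (variance `q` per step) for their slow drift, and a single Newton (Laplace-type) step from the prior mean for each new count. This is a common construction, not necessarily the approximation or calibration derived in the thesis, and the function names are hypothetical.

```python
import math
import random

def poisson_update(m, P, x, y, q=1e-4):
    """One step of approximate sequential Bayesian Poisson regression.

    Model: y ~ Poisson(exp(x . theta)), prior theta ~ N(m, P).
    1) Random-walk prediction: P += q * I lets theta drift slowly.
    2) Normal approximation: one Newton step from the prior mean. The
       Poisson Hessian is rank one, so Sherman-Morrison gives the new
       covariance (P^-1 + rate * x x^T)^-1 without a matrix inverse.
    """
    d = len(m)
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(d)] for i in range(d)]
    rate = math.exp(sum(mi * xi for mi, xi in zip(m, x)))
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    xPx = sum(xi * pxi for xi, pxi in zip(x, Px))
    scale = rate / (1.0 + rate * xPx)
    P_new = [[P[i][j] - scale * Px[i] * Px[j] for j in range(d)] for i in range(d)]
    gain = [sum(P_new[i][j] * x[j] for j in range(d)) for i in range(d)]
    m_new = [mi + gi * (y - rate) for mi, gi in zip(m, gain)]
    return m_new, P_new

def sample_poisson(lam, rng: random.Random):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k
```

Feeding simulated counts through `poisson_update` one at a time shrinks the covariance as evidence accumulates, while a positive `q` keeps it from collapsing to zero so the estimate can track coefficients that change over time.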

Deep Latent Factor Models for Recommender Systems

Author
Radek Bartyzal
Year
2019
Type
Master thesis
Supervisor
Ing. Tomáš Řehořek, Ph.D.
Reviewers
MSc. Juan Pablo Maldonado Lopez, Ph.D.
Summary
Recommendation systems help users discover relevant items. Latent factor models are one of the types of models used to generate recommendations. We survey state-of-the-art neural-network-based latent factor models and implement four of them. We also design and implement a novel deep latent factor model architecture called Hybrid cSDAE, which can process both rating and attribute information. We comprehensively evaluate the implemented models on standard datasets.

Detection of material defects on foamed insulating panels

Author
Tomáš Duda
Year
2018
Type
Master thesis
Supervisor
doc. RNDr. Ing. Marcel Jiřina, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
This master's thesis deals with the automatic detection of material defects on foamed insulating panels using image processing methods. The foamed glass production process and the current approach to output quality control by a human worker are described. A description of the installed hardware for image data acquisition is provided. Related systems for automatic material inspection are reviewed, and various methods for image texture description are analyzed. A conceptual design of the foamed glass panel inspection system is presented. The acquired panel images are described and an annotation application is developed. A suitable image preprocessing algorithm is proposed, as well as methods for detecting different kinds of foamed glass defects. The final design of the detection algorithm is supported by measuring the accuracy of several methods. The proposed algorithm is implemented and the final accuracy of the inspection system is measured. The results are discussed and possible future improvements are proposed. The developed system was successfully deployed in a production environment.

Approximate Pattern Matching In Sparse Multidimensional Arrays Using Machine Learning Based Methods

Author
Anna Kučerová
Year
2017
Type
Master thesis
Supervisor
Ing. Luboš Krčál
Reviewers
prof. Ing. Jan Holub, Ph.D.
Summary
The main goal of this work is to propose a solution to approximate pattern matching using machine-learning-based methods. This is done with the help of Locality Sensitive Hashing (LSH) and existing algorithms: LSH is used to search for the positions of potential matches, and these candidates are then verified as in existing algorithms. Previous work focused primarily on low-dimensional pattern matching. The outcome of this work is an algorithm, together with time measurements and a comparison with existing solutions. Some of the compared algorithms had previously only been designed theoretically and had not been implemented until now. The solution also uses the binary format of a commercial array database.

Criminality prediction

Author
Veronika Maurerová
Year
2017
Type
Master thesis
Supervisor
doc. Ing. Pavel Kordík, Ph.D.
Summary
The emphasis on work efficiency and the growing interest in data processing, machine learning and artificial intelligence have made predictive analysis a part of police activities, especially in crime prevention. For example, police patrols are scheduled based on predictive analysis of the highest-risk areas in a city. This thesis focuses on supervised learning methods and their capability to find hidden patterns in real historical crime data. The objective is to predict future crime with a certain probability using algorithms based on decision trees and neural networks.

Scalability of Predictive Modeling Algorithms

Author
Tomáš Frýda
Year
2017
Type
Master thesis
Supervisor
doc. Ing. Pavel Kordík, Ph.D.
Reviewers
Ing. Karel Klouda, Ph.D.
Summary
This thesis has two main goals: (1) to parallelize FAKE GAME by integrating it into H2O, an open-source machine learning framework, and (2) to evaluate the anytime properties of machine learning algorithms and the influence of hyper-parameter optimization on them. To meet these objectives, I integrated FAKE GAME into H2O and, to evaluate anytime properties, implemented a new tool called Benchmarker. The evaluation of anytime properties shows that for some problems FAKE GAME models outperform state-of-the-art models from H2O in both accuracy and performance. Moreover, hyper-parameter optimization showed little success when optimizing H2O machine learning algorithms. I hypothesise that the negligible performance improvement, and for some optimized models even lower performance than with the default configuration, is caused by the automatic hyper-parameter tuning that H2O performs by default for some hyper-parameters.

Neural networks with memory

Author
Ondřej Kužela
Year
2016
Type
Master thesis
Supervisor
doc. RNDr. Ing. Marcel Jiřina, Ph.D.
Reviewers
Ing. Josef Pavlíček, Ph.D.
Summary
Neural networks with memory are a family of neural networks that, in addition to the classic memory for long-term dependencies in the form of weights, contain another form of memory. This memory serves to retain mid-term dependencies (sometimes also called long-short-term dependencies) and can be of two types, internal or external. In this thesis I offer a summarizing overview of the family of neural networks with memory. Based on an analysis of the existing models, I also propose a new model, Recurrent Neural Modules with External Memory. This model offers a new approach to using external memory within neural networks: it assigns external memory to individual parts of the network and thus deploys multiple external memories within one network. The performance of the newly proposed model was evaluated on the Air Travel Information System (ATIS) dataset.

Automatic text summarization

Author
Šimon Hlaváč
Year
2015
Type
Master thesis
Supervisor
doc. RNDr. Ing. Marcel Jiřina, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
This work presents the basic methods used in automatic text summarization, as well as genetic algorithms. Furthermore, a system for automatic summarization based on graph structures and Markov chains was designed, implemented and tested. This study also discusses learning suitable importance weights for the individual methods used in summarization, both with a naive approach and with genetic algorithms, which were also implemented and tested. The system also supports parallel processing and caching to speed up processing.
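Graph- and Markov-chain-based summarizers commonly rank sentences by the stationary distribution of a random walk over a sentence-similarity graph, as in TextRank. The pure-Python sketch below illustrates that general idea under explicit assumptions: word-overlap edge weights, PageRank-style damping, and hypothetical function names. It is not the system built in the thesis, and it omits the learned importance weights.

```python
import re

def sentence_scores(sentences, damping=0.85, iterations=100):
    """Stationary distribution of a damped random walk (a Markov chain)
    over a sentence graph whose edge weights are shared-word counts."""
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [[len(words[i] & words[j]) if i != j else 0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences sharing no words with others spread their mass uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(scores[i] * sim[i][j] / out_weight[i]
                           for i in range(n) if out_weight[i] > 0)
            new_scores.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        scores = new_scores
    return scores

def summarize(sentences, k=2):
    """Return the k highest-ranked sentences in their original order."""
    scores = sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many other sentences collect more probability mass and are selected. Other ranking signals could be mixed into the final score with importance weights, which is the part the thesis tunes with a naive approach and with genetic algorithms.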

Text signals relevance improvement for full text search

Author
Jan Hnízdil
Year
2015
Type
Master thesis
Supervisor
Ing. Jan Šedivý, CSc.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
Although web search became a standard and often preferred way of finding information many years ago, the task of finding documents relevant to a given user query still has many weaknesses that need to be addressed. This thesis tries to find new text relevance signals to improve full-text search and user satisfaction using datasets provided by Seznam.cz. First, major learning-to-rank (LTR) algorithms, evaluation metrics and commonly used text signals known from the literature are analyzed and evaluated. Second, a system for testing and evaluating new signals was designed and implemented. Finally, a number of experiments with the new text signals were conducted, and the results were compared with anonymized baseline signals provided by Seznam.cz.

Data mining of high school alumni performance at the university

Author
Eliška Hrubá
Year
2014
Type
Master thesis
Supervisor
doc. Ing. Pavel Kordík, Ph.D.
Reviewers
Ing. Stanislav Kuznetsov, Ph.D.

Supporting the Diagnosis of Borreliosis by Machine Learning Methods

Author
Jan Motl
Year
2013
Type
Master thesis
Supervisor
doc. Ing. Pavel Kordík, Ph.D.
Reviewers
Ing. Tomáš Bartoň, Ph.D.

The person responsible for the content of this page: Ing. Zdeněk Muzikář, CSc.