Each year, we announce new topics that reflect the latest trends and advancements in the field. The topics are general and every student shall agree on a specific area of interest and research with their supervisor. Dissertation theses must be submitted 6 years at the latest after the day of enrollment in studies.

## Babuška Robert, prof. Dr. Ing.

### Deep reinforcement learning for visual robot navigation

Visual navigation is essential for many applications in robotics, from manipulation, through mobile robotics to automated driving. Deep reinforcement learning (DRL) provides an elegant map-free approach integrating image processing, localization, and planning in one module, which can be trained and therefore optimized for a given environment. We investigate multiple research problems: from environment rendering to real-time robot control.

## Čejka Tomáš, Ing., Ph.D.

### Encrypted network traffic analysis and security threats detection in high-speed networks

The aim of this topic will be a research of algorithms for classification of encrypted network traffic, detection of security threats in high-speed computer networks, and the creation of automatically annotated data sets. Encrypted traffic currently represents a big challenge for standard monitoring tools, which use mainly unencrypted information extracted from packets. Thus it is an important topic for the scientific community in network security and the professional public. Most network traffic is already encrypted, so it is necessary to explore new sources of information about what is happening on the network. This information is essential for both network operators and security analysts. The aim of this general topic of the dissertation theses is research that uses mainly statistical properties of traffic, which can be calculated in real-time at a speed of above 100Gb/s (using hardware acceleration).

The goal is to research classification and detection algorithms based on machine learning for network processing of IP flows enriched with new statistics and the experimental evaluation of these developed algorithms on the long-term high-speed operation.

The dissertability of the topic is based on the fact that it is a solution to very non-trivial problems, such as processing and filtering large volumes of data, network traffic modeling, finding deviations, identifying attackers, and proper management of mitigation of the detected security incidents. In the field of encrypted traffic analysis, research results from the global scientific community are beginning to emerge, but a sufficiently feasible solution has not yet been published. The basis will be research into the possibilities of using statistical methods, probabilistic models, and artificial intelligence algorithms.

Due to the current speeds of network transmissions and requirements for online monitoring, it is necessary to design and implement algorithms using decomposition into hardware and software and using appropriate hardware acceleration technologies (e.g., FPGA).

## Fišer Petr, doc. Ing., Ph.D.

### Approximate Logic Circuits Testing

So called “approximate” logic circuits are one of contemporary mainstreams in logic design. Here the logic circuit needs not compute the desired function exactly, some error is tolerated. The main purpose of designing such circuits is a significantly reduced area and power consumption. Approximate circuits find application in image/audio processing and recognition, neural networks, data-mining, etc.

Testing of these circuits now becomes a new challenging task. Test generation for approximate circuits offers more degrees of freedom: the test needs not be complete, not all faults need to be tested in order to comply with the approximation demands. As a result, the test can be shorter.

The aim of the research will be to design novel Automated Test Patterns Generation (ATPG) algorithms for approximate circuits.

### Artificial Intelligence in Logic Synthesis

Algorithms used to design digital circuits (EDA algorithms) are usually of a greedy nature. Local decisions are made randomly, or based on topological structure, or by the algorithm designer’s experience. These decisions need not be well chosen, resulting in inferior result quality. To resolve this problem, AI strategies can be incorporated into EDA algorithms. Here the AI learns in the EDA process and then helps in the subsequent processing.

The aim of the research is to analyze possibilities of application of AI in logic synthesis algorithms and devise new AI-guided logic synthesis algorithms.

### Improvements in the Logic Synthesis and Optimization Process Control

Contemporary logic synthesis and optimization tools (both commercial and academic) mostly emphasize computational speed, at expense of result quality. Our recent research has shown that these tools tend to get stuck in deep local optima, and therefore they often produce very inferior results (in terms of area and/or delay). One of the reasons for it is a deterministic nature of the algorithms. Randomization of the algorithms has been found to be a viable, but just partial solution to this problem [1], [2]. The second, and more important reason of failure is the lack of a global algorithm control. Most of present logic synthesis and optimization algorithms are of an iterative nature, whereas their individual parts (elementary operations) are executed essentially ad-hoc and speculatively. Implementation of efficient top-level algorithm control means should significantly improve performance of logic synthesis and optimization.

The aim of the research is to investigate the behavior of individual elementary synthesis steps (e.g., in the ABC synthesis tool [3]), determine their dependence and orthogonality, and devise an improved overall algorithm control.

- [1] P. Fišer and J. Schmidt, “Improving the Iterative Power of Resynthesis,” in Proc. of 15th IEEE Symposium on Design and Diagnostics of Electronic Systems (DDECS), Tallinn (Estonia), April 18-20, 2012, pp. 30-33.
- [2] P. Fišer and J. Schmidt, “On Using Permutation of Variables to Improve the Iterative Power of Resynthesis,” in Proc. of 10th Int. Workshop on Boolean Problems (IWSBP), Freiberg (Germany), September 19-21, 2012, pp. 107-114.
- [3] Berkeley Logic Synthesis and Verification Group, “ABC: A System for Sequential Synthesis and Verification” [Online]. Available: http://www.eecs.berkeley.edu/alanmi/abc/.

### Randomized Iterative Algorithms in Logic Synthesis

Contemporary logic synthesis and optimization tools (both commercial and academic) mostly emphasize computational speed, at expense of result quality. Our recent research has shown that these tools tend to get stuck in deep local optima, and therefore they often produce very inferior results (in terms of area and/or delay). Randomized iterative algorithms seem to efficiently solve this problem [1], [2] – they offer a trade-off between the run-time and result quality.

Moreover, present studies have shown that most of logic synthesis and optimization tools are very sensitive to randomness accidently introduced “from outside”, by the designer itself [3], [4]. Synthesis then produces results significantly differing in quality, when only slight changes in the source circuit description are made. Such a behavior is highly unwanted. Thus, it is required to analyze this behavior, determine its reasons, and to suggest more efficient algorithms.

The aim of the research is to analyze selected logic synthesis and optimization algorithms [5], identify the reasons of the above-mentioned behavior, and identify points, where randomness can be introduced. The influence of randomness will be then analyzed and algorithms exploiting the randomness in a positive way will be devised [3], [4]. Next, new algorithms minimizing the sensitivity on the external randomness will be developed.

- [1] P. Fišer and J. Schmidt, “Improving the Iterative Power of Resynthesis,” in Proc. of 15th IEEE Symposium on Design and Diagnostics of Electronic Systems (DDECS), Tallinn (Estonia), April 18-20, 2012, pp. 30-33.
- [2] P. Fišer and J. Schmidt, “On Using Permutation of Variables to Improve the Iterative Power of Resynthesis,” in Proc. of 10th Int. Workshop on Boolean Problems (IWSBP), Freiberg (Germany), September 19-21, 2012, pp. 107-114.
- [3] A. Puggelli, T. Welp, A. Kuehlmann, and A. Sangiovanni-Vincentelli, “Are Logic Synthesis Tools Robust?,” in Proc. of the 48th ACM/EDAC/IEEE Design Automation Conference (DAC), 5-9 June 2011, pp. 633-638.
- [4] P. Fišer, J. Schmidt, and J. Balcárek, “On Robustness of EDA Tools,” in Proc. of 17th Euromicro Conference on Digital Systems Design (DSD), Verona (Italy), August 27-29, 2014, pp. 427-434.
- [5] Berkeley Logic Synthesis and Verification Group, “ABC: A System for Sequential Synthesis and Verification” [Online]. Available: http://www.eecs.berkeley.edu/alanmi/abc/.

### Test Compression for ASIC Circuits

With the increasing complexity of presently manufactured chips, increasingly more data must be delivered to individual chip cores to test them. Compression of this data becomes now inevitable, because of the high cost of the ATE memory and test time expenses. The ratio of expenses spent by chip testing and its development is increasing. Therefore, test compression is and will increasingly be a hot topic. It is very challenging to contribute to the research in this area and to try to overcome established industrial tools in performance.

Different test compression mechanisms have been proposed, some of them are used in industry [1], [2]. Most of them rely on a combination of pseudo-random testing (and possibly BIST), which can be implemented on-chip as whole and does not need any data to be transmitted, and deterministic test. The deterministic test is algorithmically compressed, stored in the ATE memory, and decompressed on-chip.

The aim of the research is to design test compression/decompression methods based on advanced design-for-testability (DFT) architectures [3]. This will comprise of a design of possibly novel decompression architectures, algorithms for test compression for these architectures, and design of the overall HW architecture, where test decompression, BIST and various DFT features will be implemented.

This research will (can) follow up a successfully completed Ph.D. thesis [4], [5].

- [1] J. Rajski et al. “Embedded Deterministic Test”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 5, pp. 776-792, 2004.
- [2] Y. Huang, S. Milewski, J. Rajski, J. Tyszer and C. Wang, “Low Cost Hypercompression of Test Data,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 10, pp. 2964-2975, 2020.
- [3] R. Dorsch, H. Wunderlich, “Reusing Scan Chains for Test Pattern Decompression”, Journal of Electronic Testing: Theory and Applications, vol. 18, no. 2, pp. 231-240, 2002.
- [4] J. Balcárek, P. Fišer, and J. Schmidt, “Techniques for SAT-based Constrained Test Pattern Generation,” in Microprocessors and Microsystems, Elsevier, Vol. 37, Issue 2, March 2013, pp. 185-195. [5] J. Balcárek, „Implicit Representations in Testing and Dependability of Digital Circuits“, Ph.D. Thesis, CTU in Prague, 2016.
- [5] J. Balcárek, „Implicit Representations in Testing and Dependability of Digital Circuits“, Ph.D. Thesis, CTU in Prague, 2016.

## Haindl Michal, prof. Ing., DrSc.

### Analysis of visual materials properties

The aim of this work is to analyze the perception of surface materials under variable illumination and observation conditions. The work will have the task of finding suitable static and dynamic visual stimuli and psychophysical verification of their relative importance for human perception and recognition of different materials.

### Automatic shape estimation from video

The work is focused on the research of methods for recognizing and modeling the shape of bodies from video for virtual reality applications. Propose and implement a suitable method of automatic estimation of a 3D model from measured data using a video camera. Verify the method on selected models of sculptures and buildings.

### Light field estimation from visual scene

The thesis objective is to study environmental light field modelling methods from measured visual scenes applied to model BTF illumination in virtual reality application. Propose and implement an appropriate method for realistic illumination model estimated from real measured visual scenes. Verify this method on interior and exterior virtual reality models with variable illumination.

### Material appearance measuring on real objects

The thesis objective is to propose acquisition methods, capable to estimate advanced visual material representations (BTF, SVBRDF) directly from a real 3D object in natural lighting environment. Propose and implement an appropriate method for automatic visual surface material inference from hand held video camera sequences. Verify this method on the selected surface materials.

### Modeling of spatial spread of viral infections

Design and implementation of a suitable statistical model that will allow modeling and prediction of the spread of viral infection between neigboring geographical areas. The model will also allow the evaluation of some selected epidemiological measures and their impact on the rate of spread of viral infection. The developed Markov model will be verified on COVID-19 coronavirus infection data.

### Modelling of visual surface material properties

The appearance of real surface materials significantly changes with variations in lighting and viewing parameters. Therefore, today's most advanced texture representations (BTF) require modeling reflectivity over a wide range of lighting parameters and camera locations using complex mathematical models. The aim of the work will be to develop and verify a new BTF model based on the theory of Markov random fields, which will improve the current state of physically correct modeling of material surfaces.

### Multispectral textural features

Design of suitable multispectral texture features for analysis of surface materials of visual scenes with variable observation conditions. The features will be derived from the multidimensional statistical models, their invariant modifications will be proposed, compared with the current best published alternative textural features, and applied to BTF data.

### Skin Cancer Detection and Treatment Progress Monitoring

Malignant melanoma, as the most dangerous form of skin cancer, has been a rapidly growing threat over the past decades. The effective treatment requires its early diagnoses and surgical excision. The thesis objective is to find discriminative image features and a robust classifier which will allow to recognize skin cancer on colour skin images and to evaluate a treatment progress between several mutitemporal skin images.

### Unsupervised dynamic image segmentation

The objective of this work is to critically evaluate the current state of unsupervised segmentation of image data and to develop an algorithm for unsupervised segmentation of dynamic color / multispectral / BTF images into individual homogeneous areas. The method will be based on the underlying multidimensional Markovian models and verified on the PTSB benchmark.

### Visual quality measures

Verification of visual data modelling quality is a difficult and still unresolved problem due to the lack of existing mathematical criteria capable of approximating the human eye's perception. The work will aim at suggestion of an appropriate mathematical criterion which could quantify and rank simulation results according to their similarity to the measured visual template. The criterion will also be applicable to visual textures.

## Hanzálek Zdeněk, prof. Dr. Ing.

### Data-driven Scheduling Algorithms

Execution of many systems, such as time-triggered real-time embedded systems (TTRES) in automated cars, follows a predefined static schedule found by a complex scheduling algorithm. However, the traditional TT paradigm is too rigid and fails to account for the evolution of the TTRES configuration during its life cycle. Through our experience, we believe that the current scheduling algorithms for TTRES fail: (i) to accommodate changes while devising a gradual evolution of the TT schedule, and (ii) to enhance the search strategy with the information extracted from related scheduling problems and instances. Algorithms addressing shortcomings (i) and (ii) are driven by the data whose character and quality impact the parameters, constraints, and criteria of the scheduling problem at hand. We believe that these two issues represent significant untapped potential in scheduling algorithms. We aim to unlock this potential by developing algorithms enhanced with machine learning that are scalable and able to handle uncertain parameters.

### Scheduling algorithms for energy-efficient production

The considered basic problem is RCPSP (Resource-Constrained Project Scheduling Problem). It is characterized by a set of activities to be scheduled. The order of these activities is partly defined by precedence constraints. Each activity requires some amount of renewable resources (e.g., machines, employees) while every resource has a planned capacity and some energy (i.e., a consumable resource with limits given by 15-minute contracts). The goal is to find a sequence of activities assigned to resources (i.e., a schedule) and capacity of resources with respect to two criteria. One objective is to minimize the earliness/tardiness penalty (costs incurred by completing some of the jobs before or after their due dates). The second objective is to minimize costs connected with the increasing capacity of resources (e.g. hiring more employees or working overtime). Then the solution of the problem might consider a linear combination or a Pareto front considering these two objectives.

We may further extend the problem while considering alternatives or modes of processes to be performed. Eventually, we might investigate the use of Machine Learning to speed up our algorithms, for example, to accelerate the computation of the objective function in the algorithm performing frequent swaps of tasks as it proved to be useful in our prior work dealing with a nurse rostering problem.

- Čapek, R. - Šůcha, P. - Hanzálek, Z.: Production Scheduling with Alternative Process Plans, European Journal of Operational Research, Volume 217, Issue 2, March 2012, Pages 300–311.
- Módos, I. - Šůcha, P. - Hanzálek, Z.: Algorithms for robust production scheduling with energy consumption limits, Computers & Industrial Engineering, Volume 112, October 2017, Pages 391-408.
- Václavík, R. - Šůcha, P. - Hanzálek, Z.: Roster evaluation based on classifiers for the nurse rostering problem, Journal of Heuristics, October 2016, Volume 22, Issue 5, Pages 667–697

### Scheduling algorithms for large-scale applications

As the number of jobs and resources grows rapidly in many applications, calculating a Time-Triggered (TT) schedule is becoming computationally difficult. Even for very basic models, these problems are known to be NP-hard. Therefore, for large-scale problems, the first challenge is to obtain a reasonable solution in a reasonable time. The most common approach is to model the scheduling problem using some formalism, usually SMT or ILP, and to find a feasible solution using a general-purpose solver. The downside of this approach is its limited performance. To achieve sufficient scalability, a tailored algorithm must be developed.

We are interested to develop efficient and scalable algorithms that will be able to compute schedules (e.g. Time-Triggered schedules in IEEE 802.1Qbv Time-Sensitive Networks) with tens of thousands of jobs (i.e., messages) spanning hundreds of resources (i.e., communication channels). Besides, the algorithms will be expected to store some valuable information during the search, referred to as a context, which will then be exploited to speed up the process of evolving the schedule in response to changing requirements.

We may further extend our algorithms to handle jointly TT and Event-Triggered (ET) traffic classes, thereby improving the system timing properties. Eventually, we might investigate the use of Machine Learning to speed up our algorithms. Most recently we proposed a deep neural network combined with Lawler's decomposition for the single machine total tardiness problem. The results show that our data-driven algorithm outperformed the former state-of-the-art heuristics, and we see a great potential to extend it to other problems with the known decomposition rule.

- Vlk, M. - Brejchova, K. - Hanzalek, Z. - Tang, S.: Large-Scale Periodic Scheduling in Time-Sensitive Networks, Computers & Operations Research, Volume 137, January 2022.
- Minaeva, A. - Hanzalek, Z.: Survey on Periodic Scheduling for Time-Triggered Hard Real-Time Systems, ACM Computing Surveys, Volume 54, Issue 1, March 2021, Article 23, Pages 23:1-23:32.
- Bouška, M., Novák, A., Šůcha, P., Módos, I., Hanzálek, Z.: Data-driven Algorithm for Scheduling with Total Tardiness. In: Proceedings of the 9th International Conference on Operations Research and Enterprise Systems, ICORES 2020.

## Holeňa Martin, prof. Ing. RNDr., CSc.

### Advanced methods of evolutionary black-box optimization

Optimization tasks encountered in real-world applications more and more frequently optimize objectives that are not mathematically calculated functions, but results of computer simulations or experimental measurements. That kind of optimization, called black-box optimization, presents two great challenges: 1. it is possible to get only the values of such a black-box objective, not its gradient or higher derivatives, 2. the evaluation of the objective typically costs much time and/or money. As far as the first challenge is concerned, evolutionary algorithms have proven very successful for optimization using only the values of the objective during the last decades. However, they typically require a large number of evaluations, which conflicts with the second challenge. That conflict incited an intense reseaech into black-box evolutionary optimization during the last decade, bringing a broad spectrum of topics suitable for PhD theses.

### Explainability of graph neural networks

Graph data are used to keep knowledge about many important areas of the present world, such as computer networks, social networks, chemical molecules, communication systems, industrial processes or text documents. However, data-based methods for data analysis and modelling, such as classification, regression and clustering, have been developed primarily for vector data, and graph data need to be for using those methods first represented in some vector space. Most successful at such representation are artificial neural networks. Due to the need to learn graph representation, a specific kind of neural networks appeared, called graph neural networks. However, also graph neural networks have the propertzy of an overwhelming majority of neural networks that the rtransformation of network inputs to their outputs is a black-box mapping, which for a given network input does not enable to explain its output. In connetion with traditional neural networks, primarily multiayer perceptrons and radial basis function networks, attention has been paid, already since the 1990s, to methods enabling to describe the dependence of network output on its input by means of logical implications and equivalences, or to explain in some other way the value of its output for a given input. In the case of graph neural networks, however, research into explainability methods is only starting. Contribute to it should also the proposed thesis.

### Making use of active learning in optimization

Evolutionary algorithms are, in the last 20 years, one of the most successful methods for solving non-traditional optimization problems, such as search for the most suitable documents containing required information, discovery of the most interesting knowledge in available data, or other kinds of optimization tasks in which the values of the objective function can be obtained only empirically. Because evolutionary algorithms employ only function values of the objective function, they approach its optimum much more slowly than optimization methods for smooth functions, which make use of information about the objective function gradients as well, possibly also about its second derivatives. This property of evolutionary algorithms is particularly disadvantageous in the context of costly and time-consuming empirical way of obtaining values of the objective function. However, evolutionary algorithms can be substantially speeded up if they employ the empirical objective function only sometimes when evaluating objective function values, whereas they mostly evaluate only a sufficiently accurate regression model of that function. It is the accuracy of the model that determines how successful surrogate of the original empirical function it will be. Therefore, the model is made more accurate after obtaining each new generation of points in which the empirical function has been evaluated, through re-learning including those points. However, it is possible to go even further and to consider, when choosing points for empirical evaluation, besides the value of the empirical function also how they contribute, during model re-learning, to making it more accurate. Such an approach is termed active learning. However, using active learning to accelerate evolutionary algorithms is only at a very beginning, and should be supported also by the proposed thesis.

## Holub Jan, prof. Ing., Ph.D.

### Data Indexing for Bioinformatics

Cheap technology for genome sequencing started a huge expansion of data to be stored and processed. Such a huge volume of data is needed for personalized medicine, for pangenome research, or for finding biomarkers for various diseases. Traditional indices allowing efficient search fail this time due to exceeding the available memory. They also do not exploit the important property of high similarity of human genome sequences.

In the last few years, new developments in stringology brought various compressed indices based on the Burrows-Wheeler transform. The goal of the project is to find more efficient methods for storing and indexing data for various tasks in Bioinformatics.

### Natural Language Text Compression

Claude Shannon made an experiment in 1951 concluding that the entropy of English text is 0.6-1.3 bits per character. The model of such experiment includes knowledge of English grammar, English vocabulary, and the most common English phrases.

The goal of the dissertation is to increase the efficiency of compression of natural language texts using the relaxation of the above model. Several phases of natural language analysis will be used both to understand the text and to compress the text more efficiently.

### XML Data Compression

The XML data format was introduced many years ago and it is widely used as a defacto standard for data representation and exchange over the WWW now. There are many XML compression techniques developed. Some do not allow queries without decompression of whole XML document, some do. However, the latest developments in stringology brought us fast and compressed data structures to store data based on the Burrows-Wheeler transform. The goal of the dissertation is to find more efficient methods for storing and querying data in XML format. Those methods should keep easy and fast access to the stored data while improving memory complexity.

## Hyniová Kateřina, doc. Ing., CSc.

### A Comparative Study on Various Control Strategies for One-quarter-car Suspension Model Based on Numerical Simulations

The aim of the DP is to propose a method for the analysis of vibrations caused by road disturbances and to use it to compare the effectiveness of different car suspension control strategies. The strategies will be compared and evaluated using software-in-the-loop simulation on Matlab platform. The results of simulations of the quarter model suspension for different types of road irregularities will be compared . Car suspension strategy is a compromise between two conflicting criteria: car holding on the road and passenger comfort. These criteria will be taken into account in the study. The comparison will be made for both the active as well as the semi-active suspension group. The work will also include state-of-the-art.

## Janeček Jan, doc. Ing., CSc.

### Fog/Edge midleware for IoT applications

The methods of modern IoT must support the cooperation of end elements with current P2P methods of distributed applications and databases based on powerful computing systems with virtual and cloud architecture and high-speed Internet.

The IoT operating elements themselves (sensors, controllers and elements combining both of these functions) must respect a number of limitations (sensor energy limits, the effect of mobility on the communication limits of wireless communication, and so on).

The goal of the dissertation is the design and analysis of algorithmic methods that facilitate efficient and reliable cooperation among IoT elements and cooperatin of IoT elements with central cloud system infrastructure.

## Janota Mikoláš, Mgr., Ph.D.

### Machine Learning for Satisfiability Modulo Theories

Satisfiability modulo theories (SMT) aim at providing push-button technology for solving hard problems coming from practice and that can be formalized in mathematical logic. The sources of applications are for example software testing or verification. However, modern SMT solvers do not learn any new information from solving one problem that could be carried to another problem. In contrast, modern machine learning (ML) techniques enable improving algorithms over time. The topic of this thesis is to build on top of existing SMT algorithms and extend them with ML techniques to qualitatively improve state-of-the-art solvers.

## Janoušek Jan, doc. Ing., Ph.D.

### Arbology

Formal tree languages, formalisms for their generating and description, and models of computations for processing tree languages are one of the areas of the theory formal languages and automata. Practical related research topics can be effective algorithms for various problems of processing trees, such as pattern matching, indexing or computing repeats in trees. Various types of trees are considered, such as ordered and unordered trees, or ranked nad unranked trees. Classes of formal tree languages that are considered can be regular or context-free tree languages. Models of computation for processing trees can be various kinds of tree automata or pushdown automata reading linear notations of trees. The aim of the Ph.D. study is to develop the theory of formal tree languages and of effective algorithms for sequential and parallel processing of trees.

## Kalina Jan, RNDr., Ph.D.

### Reliability of deep learning

Recent research on reliability in the context of deep learning has shown that there is no unique understanding of reliability assessment for training of deep networks, i.e. there is no unity as to which steps should be performed to accompany the training. Statistical approaches to the newly arising field of diagnostics for deep learning start from the simplest ideas such as verification of the trained network on out-of-distribution data [5]. More powerful approaches include verifying the influence of adversarial examples in the data [1], evaluation of uncertainty within deep learning algorithms [3], or computational methods for error propagation throughout the network [2]. One of rare theoretical approaches standing on probabilistic reasoning is the hypothesis test of reliability for a multi-class classifier [4]. The thesis will propose novel tools for assessing reliability of trained deep networks. Theoretical investigations will evaluate the influence of possible errors (uncertainty, measurement errors, or outlying measurements) on the results of a trained deep network. Another aim is to derive diagnostic tools for checking the probability assumptions for deep networks. The tools used here may include bootstrapping and nonparametric combinations of tests.

- [1] Alshemali B., Kalita J. (2020). Improving the reliability of deep neural networks in NLP: A review. Knowledge-based systems 191, 105210.
- [2] Bosio A., Bernardi P., Ruospo A., Sanchez E. (2019). A reliability analysis of a deep neural network. IEEE Latin American Test Symposium LATS 2019, 1-6.
- [3] Caldeira J., Nord B. (2021). Deeply uncertain: Comparing methods of uncertainty quantification in deep learning algorithms. Machine Learning: Science and Technology 2, 015002.
- [4] Gweon H. (2022). A power-controlled reliability assessment for multi-class probabilistic classifiers. Advances in Data Analysis and Classification. Online first.
- [5] Martensson G., Ferreira D., Granberg T., Cavallin L., Oppedal K. et al. (2020). The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study. Medical Image Analysis 66, 101714.

### Robust linear diskriminant analysis

Linear discriminant analysis (LDA) represent a well known statistical classification method. Its various regularized versions are available for high-dimensional data with the number of variables exceeding (perhaps largely) the number of data samples. Recent proposal include regularized robust versions of LDA that are resistant to the presence of outlying values in the data. These consider robust estimates of the mean and of the covariance matrix. The aim of the work will be to propose and investigated novel versions of robust regularized LDA and to perform extensive simulations to study their performance. These will be based on new robust estimates of the covariance matrix: One idea is to exploit the novel MRWCD estimator (minimum regularized weighted covariance determinant) [1]. Another idea is to exploit the estimator based on numerical linear algebra manipulations, particularly on the minimum product of a small number of the largest eigenvalues of the covariance matrix [2].

- [1] Kalina J., Tichavský J. (2022): The minimum weighted covariance determinant estimator for high-dimensional data. Advances in Data Analysis and Classification 16, 977-999.
- [2] Kalina J. (2020): The minimum weighted covariance determinant estimator revisited. Communications in Statistics Simulation and Computation 51, 3888-3900.

## Knápek Jaroslav, prof. Ing., CSc.

### Modelling the knowledge base of economic processes in OntoUml

**Specialist supervisor:** Ing. Petra Pavlíčková, Ph.D.

In the business economy, a lot of data is currently generated and then stored in enterprise information systems (ERP). The data thus obtained and stored are a suitable basis for decision-making by managers at all levels of business management. To support managerial decision-making, it is desirable to formalise expert knowledge on the financial health of the enterprise and the management of the main economic processes in the enterprise in a way that makes them more efficient and increases the profit of the enterprise. Therefore, the framework topic of the dissertation focuses on finding a suitable method of representing the knowledge base in the area of managing the economic processes of the enterprise, in particular using modelling tools used in medical knowledge modelling (GLIF, knowledge ontology). The main direction of the research will be oriented towards modelling the knowledge base of economic processes primarily using OntoUML methodologies.

## Kordík Pavel, doc. Ing., Ph.D.

### Algorithms and architectures of recommender systems

In the recommendation systems, we are currently focusing research on a few open problems that have deep theoretical underpinnings, but whose solutions also have very concrete practical applications. We are exploring the utilization of deep neural networks to reduce the cold start problem of recommender systems, the design of transformers to predict shopping baskets. In the field of general machine learning, we focus on reinforced learning to optimize longer-term metrics such as user satisfaction, on transfer learning methods to incorporate new referral databases, and on using AutoML to optimize the architecture and hyperparameters of recommender systems.

## Křikava Filip, doc. Ing., Ph.D.

### Fast and robust data-analysis pipelines

Data analysis is typically performed by composing a series of discrete tools and libraries into a data analysis pipeline. These pipelines are at the core of data-driven science that has been central to most disciplines and today see an explosion in the widespread use of computational methods and available data. As the number of tools and size of data keep growing, we face problems with the scalability of the pipelines and the trustworthiness of their results.

The goal of this work is to research ways to make data analysis pipelines scalable (accommodate growing data and computational needs) and trustworthy (facilitate auditing of the analysis result). The research will go along two axes. The first will focus on extending the R programming language with transparent horizontal and vertical scaling. The second will study a combination of static and dynamic program analysis techniques to gain insight into the nature and severity of programming errors in the code of data-analysis pipelines, and propose algorithms for their detection and possible automated repair.

## Kroha Petr, prof. Dr. Ing., CSc.

### Text extraction

Text extraction is used in many fields to reduce text by omitting its parts that are not relevant for the user. This includes, for example, summarizing text. The topic comes from the field of text mining that is an area full of open problems. In our published work, we used the techniques of linguistic analysis of sentences (part-of-speech tagging) and clustering according to subsets of sentences (chunking) to analyze the texts of functional requirements specification of software products. We generated questions from the texts to improve the specification of functional requirements by analyzing the answers. The aim of the work is to use proven techniques from our publications in the field of text extraction, compare them with current results of published methods and find better new methods.

## Kubátová Hana, prof. Ing., CSc.

### Anomaly detection and mitigation in computer and IoT networks

**Specialist supervisor:** Ing. Tomáš Čejka, Ph.D.

The aim of this work will be research and development of algorithms for detection, identification, and mitigation of security threats and anomalies in computer networks, especially Internet of Things (IoT). From the perspective of network security, it is necessary to consider the IoT area as a threat not only for IoT devices but also for other devices and services in the Internet, since the IoT network can easily become a source of trouble (as it was observed in recent history). It is necessary to monitor IoT network traffic, derive meta information about the traffic based on events and device behavior, and use such meta information for identification and mitigation of threats, choosing a suitable mitigation strategy.

The goal of the dissertation thesis will be to find suitable ways to create long-term models of communication, which represent negative and positive events, and to use such models for anomaly detection, identification and mitigation of sources of troubles with low latency. Dissertability of this topic is based on non-trivial challenges such as processing and filtering huge volume of network traffic and creation of models of network traffic, discovering anomalies, identification of sources of trouble and choosing correct strategy of mitigation of malicious traffic. Contrary to classic IP networks, the IoT communication lays mainly in specific physical layers. This brings new potential attack vectors that are not detectable using ordinary IP traffic analysis. Therefore, it is needed to find new approaches to IoT traffic monitoring. The base of the work will be a research in the area of statistical methods, probabilistic models, and usage of algorithms of artificial intelligence.

Due to the modern networks bandwidth and on-line monitoring requirements, it is necessary to design and develop algorithms using a decomposition among hardware and software components and to use suitable technologies for hardware acceleration (e.g., FPGA).

### Dependability models and reliability parameters’ computation with respect to realistic properties of modeled systems

**Specialist supervisor:** Ing. Martin Kohlík, Ph.D.

Currently used dependability models are often based on simplified processes leading to unrealistic estimations of dependability and reliability parameters of the modeled systems [1] or rough pessimistic estimations [2]. The aim of the research should be a methodology of the dependability model design allowing fast and accurate calculations of the dependability parameters of the system. The methodology should take over-time changes (e.g. aging, maintenance, repairs) of the system, the structure (blocks and their fail-safes) of the system, and the ways of acquiring dependability parameter data from real applications into account [3].

- [1] Electronic Reliability Design Handbook - MIL-HDBK-338B. US Department of Defense, 1998.
- [2] M. Kohlík, "Hierarchical Dependability Models Based on Markov Chains", Dissertation thesis, Czech Technical University in Prague, 2015.
- [3] Daňhel, M.: "Prediction and Analysis of Mission Critical Systems Dependability", Dissertation thesis, Czech Technical University in Prague, 2018.

### Design methodology of dependable fault-tolerant and attack resistant systems

The research of methods and processes in the design of systems with pre-defined reliability parameters using programmable hardware sources (FPGA, processors). The reaserch of the impact of a redundancy at different levels (space, time, software, hardware) to the attack resistance of the whole system. The research of automatization methods of the design processes including the dependability models construction and dependability parameters computations. Partial results and proposed methods will be evaluated by real-life application and benchmarks.

### Formalization and automatization of digital system design methods

The research area will be based on formal methods and models (Petri Nets, Markov chains, UML diagrams) to use them for simplificationa and automatization of digital design process. The connection of verification methods and dependability modeling at all design periods to obtain an optimized structure according different parameters is assumed. The aim should be the combination of different models and detailed study of their relations and possible automatic conversions. Partial results and proposed methods will be evaluated by real-life applications and benchmarks. An integral part of the topic is the study of possible new models used in industry and / or research.

### New architectures with guaranteed level of dependability parameters for reconfigurable circuits

**Specialist supervisor:** Ing. Pavel Kubalík, Ph.D.

Main aims of this research are:

- Design of new architectures based on on-line error control codes available for implementation in reconfigurable hardware (FPGAs). The fulfillment of required reliable parameters is essential, together with a low area overhead, appropriate operational frequency, and low power consumption for mission-critical systems (intelligent cars, robots, etc.) where FPGA are more and more used due to their properties (and price).

- Proposal of appropriate methods to automatically select the best type of fault-tolerance and safety with respect to a particular application, its requirements and the necessary constrains, including design speed.

- Utilization of existing models and modifying them to solve this problem at the system level, and linking them to the hierarchical dependability models, created at the department.

The target platform will be FPGA circuits allowing fault recovery of a part design or completely change of an implemented function. The research will be mainly focused on the utilization of new error control codes for an optimized architecture enabled error detection and correction suitable for FPGA. The standard known fault-tolerant structures such as TMR or a duplex will be taken into account, too.

All proposed architectures and the partial results will be tested on standard known benchmarks and on dedicated sets of own circuits. The evaluation criterion will be mainly focused on realistic calculations of dependable parameters, together with low area occupation requirements.

### Research of Dependability Improvement at ISA Level

The proposed research has to verify possibilities, how to achieve (and guarantee) predefined system design constraints (size, power-consumption, dependability characteristics, fault-tolerance and attack-resistance levels via trade-off between hardware and software parts and types.

It is supposed to use Codasip system and RISC-V processor (open-source hardware instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles, designed (not only) for embedded applications, where low-power and performance are emphasized. Codasip can provide their processors with high-level tools that enable to configure the RISC-V cores, to automatically generate toolchains and synthesizable, human-readable Verilog RTL. Codasip system (https://codasip.com/) offers a verification environment, too.

The research will contain proper evaluation of obtained architecture by improved dependability models. The possible methods will be ISA improvement (e.g. adding the cryptographic instructions, specialized block, using of more application-specific processors). Final experiments will be performed by simulations and FPGA implementations.

## Kůrková Věra, RNDr., DrSc.

### Robustness of learning of deep and shallow networks

**Specialist supervisor:** RNDr. Petra Vidnerová, Ph.D.

Deep networks became state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classiffication ones. Several recent studies revealed that some deep networks can be easily fooled by changing an image in a way imperceptible to humans which can cause a deep network to label the image as something else entirely. Thus it is important to investigate robustness of learning of networks with respect to adversarial patterns.

The goal of the dissertation is to design multiobjective algorithms generating imperceptible perturbations of images leading to their misclassifications. Evolutionary algorithms for generation of adversarial patterns will be developed that maximize the error of the model while being as similar to original training patterns as possible. The role of depth of networks and locality properties of computational units in misclassiffication will be analyzed. Methods of making neural networks more robust to such adversarial patterns will be investigated.

## Langr Daniel, Ing., Ph.D.

### Algorithms and data structures for modelling of atomic nuclei

Symmetry-adapted no-core shell model is a state-of-the-art method for ab-initio (from first principles) modeling of atomic nuclei. This modeling is computationally very demanding for computer resources. Even with the use of the most powerful contemporary supercomputers, we are able to compute models of only relatively light nuclei. The symmetry-based no-core shell model allows to reduce the computational and memory complexity without the loss of the quality of obtained nuclear wave function approximations. The goal of this dissertation topic is to perform research in parallel algorithms and corresponding data structures scalable from shared memory multicore CPU systems to highly-parallel distributed memory supercomputers for computing these models. It is a part of an ongoing long-term research activity. A primary goal is the research of tradeoffs between computational and space complexity of highly parallel algorithms for solving eigenproblem of large sparse matrices. On-the-fly calculation of matrix element (without their explicit storage in memory) and optimization of the eigensolver convergence rate are of special interest.

## Lhotská Lenka, doc. Ing., CSc.

### Anomaly detection in long time series

A distant observation (or "outlier") is one that differs significantly from other elements in the sample where it occurs. The term "anomaly" refers to different problems in different areas. It is very important to detect anomalies as soon as possible to avoid major losses or problems, whether it is a machine failure, a tumor in the human body, etc.

In most areas, the collected / measured data has the character of streamed time series. Due to their inherent properties, such as periodicity, trends, seasonality and irregularity, accurate anomaly detection is a big problem. In addition, in most real scenarios, it is virtually impossible to annotate huge amounts of data. Therefore, unsupervised learning methods such as clustering are often used. However, they do not take into account the time parameter, which is an inseparable context in time series. Therefore, they do not allow to find anomalies that occur in cycles.

The aim of the work will be to design, implement and test a method of unsupervised learning, which will take into account the context, seasonality and trends in the detection of anomalies. The method should be adaptable for different scenarios and cases and able to process data from different areas.

### Contextual information in machine learning

Big data raises a number of questions, the most important of which are the recognition and proper use of different types of dependencies, as well as the context, that usually carries useful information. A typical example is the spatial arrangement of data sources, such as sensors for collecting environmental data in different environments. This context, together with the measured time series, represents a very important source of information for output prediction.

In machine learning tasks a key role is played by feature selection. In many cases, feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e. variables that reveal to be relevant only in some specific circumstances. The basic problem to be solved is the identification of the input variables whose relevance or irrelevance for predicting the output only holds in specific circumstances, where these circumstances are assumed to be encoded by a specific context variable. This context variable can be for example a standard input variable, in which case, the goal of contextual analyses is to better understand how this variable interacts with the other inputs for predicting the output. The context can also be an external variable that does not belong to the original inputs but that may nevertheless affect their relevance with respect to the output.

The aim of the work will be to design, implement and test a method of supervised learning, which will use contextual information for more efficient output prediction. The method should be adaptable for different scenarios and cases and able to process data from different areas.

## Lórencz Róbert, prof. Ing., CSc.

### Cryptocurrency algorithms

Cryptocurrencies are a new phenomenon that is based on decentralization and allows us to make anonymous payments. Today, there are over a thousand cryptocurrencies that are based on different concepts such as proof-of-work or proof-of-stake. The goal will be to design a new cryptocurrency that would meet both security and market requirements such as scalability, sufficient transaction processing speed, low latency, and would be environmentally friendly.

### Dedicated Hardware for Modular Arithmetic

The aim is the design and implementation of dedicated hardware architectures for computing modular arithmetic operations. The results are applicable in elliptic curve cryptography, as well as in other systems that utilize modular arithmetic.

### Malware detection

Malicious code or malware is one of the biggest security threats today. A huge amount of new malicious code is generated every day, and since it is not possible to analyze each sample separately, it is necessary to develop automatic mechanisms to detect it. Machine learning algorithms turn out to be a useful tool for automatic detection of malware. With them, zero-day malware can also be detected, but in contrast to standard procedures such as signature-based detection, they achieve higher false positive (FP) ratio. The aim of the dissertation will be to develop an automatic malware detection system that achieves a solid classification accuracy and has a minimum FP.

### Mixed-radix conversion (MRC) algorithm for converting results from a system of linear congruences into a system of linear equations

The solution of an integer system of linear equations (SLE) without rounding errors can be done by dividing the solving process into systems of linear congruences (SLC), and then converting the results into a set of solutions of the original SLE. The so-called MRC algorithm is used for this conversion, which has the complexity O(nm2), where n is the matrix dimension and m is the number of SLK (modules) used.

The aim of this work is to find a more efficient way of using the MRC algorithm that benefits from the knowledge of mutual data dependency of the SLE solution. It is also possible to design a parallelization of the newly designed algorithm. The result is an MRC-based method with less than O(nm2) complexity for solving the conversion process of SLC results to SLE results.

### Modeling behavior of semiconductor components due to ionizing radiation

The behavior of various semiconductor circuits is also dependent, among other factors, on the environment in which they operate. Desirable information for users of various HW devices is the reliability of these devices on age, and to some extent the associated resistance of the semiconductor components to ionizing radiation.

The topic of the dissertation is mathematical modeling of the behavior of HW semiconductor components at various technological levels, depending on irradiation with ionizing/particulate radiation. The aim of this work is to create a model of HW device behavior including aging factors and material degradation due to radiation. The results will be useful for determining the reliability/error-free lifetime of circuitry exposed to radiation or long-term use.

### Post-quantum cryptography

The study of suitable post-quantum cryptosystems has long been in the interest of cryptologists. The reason for this is the thriving field of quantum computer technology, which could endanger the security of asymmetric cryptosystems by using suitable factorization algorithms.

The topic of the dissertation is the study and analysis of existing and new methods of post-quantum cryptographic algorithms. The goal is to create an asymmetric cryptosystem that is resistant against quantum-based attacks and is simple to implement and secure.

One of the candidates for post-quantum cryptosystems suitable for analysis and eventual improvement is the McEliece asymmetric encryption algorithm based on binary Goppa codes. This algorithm complies with the security requirements for asymmetric cryptosystems of today, but there is a problem with its large spatial complexity. Trying to reduce the size of the keys in this algorithm can be a good initial challenge for further research.

### Research of the behavior of physically unclonable functions (PUFs) and true random number generators (TRNGs)

A quality TRNG is essential for current hardware components of cryptographic. Reliable key generators based on PUF are also required. Such key generation is very much in demand, because the key generated in this way remains the "secret" of the cryptosystem hardware itself.

The topic of the dissertation is the study of the proposed PUF and TRNG in terms of their long-term stable response. The aim of this work is to explore existing and propose new PUF and TRNG solutions that are suitable for long-term generation of high-quality output by TRNG and which also guarantee stable key generation based on PUF responses. The work includes the study and understanding of the behavior of these components at the statistical level and also at the physical/technological level.

## Miškovský Vojtěch, Ing., Ph.D.

### Side-Channel Analysis, Attacks and Countermeasures

Side-channel analysis is an important topic of computer security as it enables an attacker to reveal sensitive information even if it is protected by cryptographically strong algorithms. It has been more than twenty years since first side-channel attacks were proposed, nevertheless, this topic still offers many research challenges. Attacks against the forthcoming postquantum cryptographic algorithms or so far less researched algorithms, e.g., ARX-based ones, are very actual topics. Another one is utilization of artificial intelligence in SCA. Non-interference and gadget composition are current hot topics in area of SCA countermeasures as so as is high-level synthesis. The subject of the proposed dissertation thesis aims to these challenges and related research areas.

## Novotný Martin, Dr.-Ing.

### Cryptographic/Cryptanalytical Architectures in Embedded Systems and Reconfigurable Hardware

Research in methods of implementation and acceleration of cryptologic operations and schemes in embedded systems and in reconfigurable hardware, namely in field-programmable gate arrays.

## Pajdla Tomáš, doc. Ing., Ph.D.

### Learning to solve multiple view geometry

Multiple view geometry is an important field in computer vision and robotics, which provides understanding to the foundations of the subject. The development of this field has, however, not yet been greatly influenced by the recent advances in machine learning and geometrical machine learning in particular. We aim at using machine learning to address long-standing problems in multiple view geometry that have not been solved by traditional techniques. For instance, current methods for computing camera geometry from image matches can cope efficiently with only relatively simple problems in two-view geometry. Thus, there are still no really efficient solvers even for three-view geometry. The classical approach to solver design leads in this case to very complicated computation procedures due to trying to solve exact algebraic problems. Recent advances in numerical algebraic geometry start providing a practical alternative to the elimination-based approaches. In particular, we believe that these approaches can be greatly improved in efficiency and robustness by learning optimal strategies of solving, efficient problem analysis, and fitting the problem to the data. Our plan is to develop a new approach to solving hard problems in multiple-view geometry that would use machine learning to tune solving techniques to the data and thus making them tractable and efficient.

- [MB] M. Bronstein et alt. “Geometric deep learning: going beyond Euclidean data”. IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18-42, 2017.
- [NS] D. Nister and F. Schaffalitzky. “Four Points in Two or Three Calibrated Views: Theory and Practice”. International Journal of Computer Vision (IJCV), vol. 67, pp. 211-231, 2006.
- [Fal] R Fabbri et al. “TRPLP – Trifocal Relative Pose From Lines at Points”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12070-12080, 2020.
- [PLMPl] T Duff et al. “PLMP - Point-Line Minimal Problems in Complete Multi-View Visibility”. IEEE International Conference on Computer Vision (ICCV), pp. 1675-1684, 2019.
- [PL1P] T Duff et al. “PL1P - Point-line Minimal Problems under Partial Visibility in Three Views”. European Conference on Computer Vision (ECCV), pp. 175-192, 2020.
- [BGrob] V Larsson et al. "Beyond Grobner Bases: Basis Selection for Minimal Solvers". IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3945-3954, 2018.
- [Pfeifer] D Peifer et al. “Learning selection strategies in Buchberger's algorithm”. International Conference on Machine Learning (ICML), pp. 7575-7585, 2020.
- [Bernal] E. Bernal et al. “Machine learning the real discriminant locus”. ArXiv preprint, June 2020.
- [DSAC] E Brachmann et al. "DSAC – Differentiable RANSAC for Camera Localization", CVPR 2017.
- [Ranftl] R Ranftl, V Koltun. Deep Fundamental Matrix Estimation. ECCV 2018.
- [Rolinek] M Vlastelica et al. DIFFERENTIATION OF BLACKBOX COMBINATORIAL SOLVERS. ICLR 2020.

## Pokorný Jaroslav, prof. RNDr., CSc.

### Conceptual models of graph databases

**Specialist supervisor:** Ing. Michal Valenta, Ph.D.

In recent years, graph databases have been intensively developed and used in various areas (data lineage, artificial intelligence, optimization of supply chains, data analysis, ...). Representing data using both directed and undirected graphs emphasizes the interconnectedness of data and allows it to be viewed from different perspectives. With the use of graph databases, there is a growing need for research and development of conceptual modeling methods aimed at the subsequent implementation of schemes in graph databases. The dissertation will deal with the research of conceptual modeling methods and the implementation of conceptual models with great expressiveness (for example, OntoUML) in graph databases.

## Ratschan Stefan, doc. Dipl.-Ing. Dr. techn.

### Constraint Solving

We develop software, theory, and algorithms that can solve mathematical constraints consisting of non-linear equalities and inequalities, conjunctions, disjunctions, and logical quantifiers. We apply the constraint solving technology in several areas (e.g., verification of complex systems, computation of Lyapunov functions), and offer a large range of thesis topics in the area.

**More information:**

### Executable Specifications

One of the big drawbacks of classical methods for specifying software requirements is that it is difficult to check the consistency and adequacy of a given specification. The software engineering community has come up with various remedies to this situation (e.g., model based software engineering, agile software development, executable algebraic specification) that usually aim at allowing the user to see the actual behavior of a given specification as early as possible in the development process, but that makes compromises in terms of expressivity or precision. In the thesis, the student will design an executable formal specification language that fulfills two seemingly contradicting objectives:

• High modeling capabilities in terms of expressivity and precision

• Efficient execution

The key to achieving both objectives at the same time will be the design of annotations for the specification language that provide information necessary of efficient execution. Increasing user efforts in writing annotations will result in increasing efficiency of execution of the given specification.

### Verification of Complex Systems

The Pendolino train from Prague to Ostrava had severe problems in its initial phase: from time to time it stopped in the middle of nowhere due to a software bug. Similar bugs have been the cause of more severe incidents, for example the crash of the Ariane 5 rocket in 1996. We are doing research that will help to avoid such problems in the future, and offer a large range of thesis topics in the area.

**More information:**

## Schmidt Jan, doc. Ing., Ph.D.

### Algorithm acceleration in Electronic Design Automation

Contemporary industrial design and verification of digital systems has strong demands on time-to-market and hence design time. Often, we hear requirements such as “1 percent better result for 1 percent longer computation”. In this situation, accelerating the computation seems desirable. Yet, current algorithms have been designed with sequential processing in mind. They usually are complex, multi-layer iterative heuristics. To accelerate such algorithms, it is necessary either find parallelism e.g., in branch-and-bound heuristics, or replace some parts of the algorithm with better parallelizable algorithms. Further, we need to investigate which architectures and what granularity suit the designed algorithm best. It is necessary to keep work-optimality in mind, as it projects itself directly into energy efficiency.

### Application-specific contraints and optimization in approximate computing

Approximate computing is a collection of design methods for systems that fulfill their purposes but have dramatically lower demands for implementation resources, such as power or area. There are two metrics for the approximation in use. The first one is arithmetic difference and is used whenever a set of binary values is interpreted as a number. The second one is the Hamming distance, that is, the number of binary values differing from original specification. There is a number of methods for both of the metrics in development. However, the true meaning of the approximation metric should follow the intended purpose of the system. In the case of image representation, arithmetic difference usually suffices. In other applications, besides varying metrics, also hard constraints appear, that is, constraints that the approximation may not violate. Most of contemporary methods cannot accept such constraints and metrics. The aim of the work is to develop methods able to accept application-specific constraints and metrics, and, eventually, document the price that must be paid for such a versatility.

## Šimeček Ivan, doc. Ing., Ph.D.

### New parallel algorithms for indexing of data from powder diffraction

**Specialist supervisor:** Ing. Jan Rohlíček, Ph.D.

X-ray structure analysis of data from powder diffraction is an important analytic method for crystal structure determination of samples that do not offer a suitable monocrystals. The indexation process is one of the critical bottlenecks for application of this method. Without the determination of the crystal lattice parameters, we cannot estimate the crystallic structure. Existing methods for indexation have problems with low-quality data as well as with the indexation of phase mixtures. The goal of this research is to develop more efficient parallel algorithms than the current ones for solving these problems.

### Sparse matrix and tensor formats suitable for massively parallel algorithms in numerical linear algebra

**Specialist supervisor:** Ing. Daniel Langr, Ph.D.

The used sparse matrix storage format has great impact on performance and scalability of the algorithms in numerical linear algebra.

The ideal format ensures minimal memory storage requirements, maximum utilization of floating point vector units, maximum utilization of cache memories, and enables load balanced parallelization of the algorithms on massively parallel systems.

Several sparse matrix formats have been proposed in recent years, but these specialized and efficient formats also have some drawbacks. They suffer from a significant transformation overhead, are designed only for a limited set of matrix operations, or do not support fast adding or removing nonzero elements.

The dissertation goal is to conduct research on new sparse-matrix formats for massively parallel architectures to address these trade-offs and develop optimization heuristics for using these formats for a sparse matrix in numerical linear algebra operations.

## Šivic Josef, Dr. Ing.

### Learning visuomotor skills for robotic manipulation

Humans can solve everyday manipulation tasks remarkably efficiently and safely. With only a few interactions they learn to use tools without knowing a priori their exact physical properties or the properties of the environment to solve tasks such as hammering a nail, shoveling snow, raking leaves, or drilling holes into different materials. Currently, there is no artificial system with a similar level of visuomotor capabilities.

The objective of this thesis is to develop machine learning models grounded in the physical and geometrical structure of the world that enable learning safe visuomotor skills for robotic manipulation in new unseen environments with only a minimal amount of supervision, for example, coming from observing people performing the same task.

**More information:**

- Z Li, J Sedlar, J Carpentier, I Laptev, N Mansard, J Sivic, Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
- Y Labbe, S Zagoruyko, I Kalevatykh, I Laptev, J Carpentier, M Aubry, J Sivic, Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning, IEEE Robotics and Automation Letters (2020).
- Y Labbe, J Carpentier, M Aubry, J Sivic, CosyPose: Consistent multi-view multi-object 6D pose estimation, European Conference on Computer Vision (ECCV) (2020).

### Machine Learning for Analysis of Molecular Dynamics Simulations

Molecular dynamics (MD) simulations allow analyzing the physical movements of biomolecules. The generated data are sequences of frames (in 100 000s) captured at a predefined time step. Each frame consists of positions of all the atoms of a protein (in 10 000s), which are simulated using a molecular mechanics force field. The analysis of such a massive amount of data is often challenging especially for molecules with conformational heterogeneity, such as the disordered Abeta peptide relevant for Alzheimer's disease (AD). Abeta peptide is the hallmark of the disease and adopts diverse conformations. Understanding the dynamic properties of the Abeta protein is a key to determine the effects of drug candidates for potential AD treatment. The objective of this thesis is to build on recent advances in large-scale weakly and self-supervised learning for video sequences and develop new methods for automatic analysis of molecular dynamics simulations and more generally protein engineering. This topic will be co-advised with Dr. Stanislav Mazurenko and Prof. Jiri Damborsky (Loschmidt Laboratories, Masaryk University).

**More information:**

- S Mazurenko, Z Prokop, J Damborsky. Machine learning in enzyme engineering, In ACS Catalysis, 10 (2), 1210-1223, 2020.
- A Miech, D Zhukov, JB Alayrac, M Tapaswi, I Laptev, J Sivic, Howto100m: Learning a text-video embedding by watching hundred million narrated video clips, In Proceedings of the IEEE/CVF International Conference on Computer Vision and Pattern Recognition, 2019.
- J-B Alayrac, P Bojanowski, N Agrawal, J Sivic, I Laptev, S Lacoste-Julien, Learning from Narrated Instruction Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
- A Mardt, L Pasquali, H Wu, F Noe, VAMPnets for deep learning of molecular kinetics. In Nature Communications, 9, 5, 2018.

### Weakly supervised learning for visual recognition

Building machines that can automatically understand complex visual inputs is one of the central problems in artificial intelligence with applications in autonomous robotics, automatic manufacturing or healthcare. The problem is difficult due to the large variability of the visual world. The recent successes are, in large part, due to a combination of learnable visual representations based on convolutional neural networks, supervised machine learning techniques and large-scale Internet image collections. The next fundamental challenge lies in developing visual representations that do not require full supervision in the form of inputs and target outputs, but are instead learnable from only weak supervision that is noisy and only partially annotated data. This thesis will address this challenge.

**More information:**

- Alayrac, J.-B., Bojanowski, P., Agrawal, N., Laptev, I., Sivic, J. and Lacoste-Julien, S., Learning from narrated instruction videos IEEE Transactions on Pattern Analysis and Machine Intelligence, 40 (9), 2194-2208 (2018)

## Skopal Tomáš, prof. RNDr., Ph.D.

### Similarity Search in Big Data

The complexity of next-generation retrieval systems originates from the requirement to organize massive and ever growing volumes of heterogeneous data and meta-data, together with the need to provide distributed management prevalently based on similarity matching. The problem starts with data acquisition of weakly structured or completely unstructured data, such as images and video, which necessarily need innovative techniques for information extraction and classification to increase their findability. In principle, we consider the object findability and the actual search process as two fundamental and synergic aspects of the retrieval. Both of them pose effectiveness and efficiency challenges which need innovative theories and technologies, and must be studied together to converge to qualitatively new retrieval tools of the future. The dissertation topic is foundational in nature as it addresses the theoretical limits of similarity retrieval in context of the Big Data problem. Fundamental to the thesis is the development of scalable solutions.

## Skrbek Miroslav, Ing., Ph.D.

### Synthesis of artificial intelligence and machine learning models to programmable hardware

Artificial intelligence and machine learning are increasingly used in real applications and thus significantly penetrates the field of embedded and especially cyberphysical systems. A typical example is the image analysis subsystem for self-driving cars or intelligent sensors. Unlike the big data area, these systems have limited computing power and have high variability in hardware architecture, and in addition, these systems are subject to additional requirements, especially for real-time processing, security and reliability, explainability, power consumption and chip size. From these perspectives, the resulting implementation must be optimized to minimize hardware resources while maintaining functionality and reliability to meet economic goals. Automated conversion of machine learning models such as deep neural networks into ASIC hardware and especially programmable hardware (FPGA) is currently a very current topic.

The subject of the proposed topic is the research of algorithms, procedures, methodologies and tools for the synthesis of artificial intelligence models and machine learning into programmable hardware. Current research is in the field of approximate calculations, hardware optimization using limited accuracy targeted according to the values of individual parameters, including other aspects of mapping to the target platform (consumption, chip size, timing). The subject of research will be not only a simple mapping of the learned model to hardware, but also the optimization of the AI model with respect to the implementation in hardware, which requires a closer connection of AI tools with tools for high-level and logical synthesis. The topic can be extended to the synthesis of networks with spiking models of neurons and the integration of learning algorithms into hardware, which is currently an unsolved or little solved problem.

- [1] Shubham Raif, et al. Logic Synthesis Meets Machine Learning: Trading Exactness for Generalization. arXiv:2012.02530v2 [cs.LG] 15 Dec 2020.
- [2] Chia-Chih Chi and Jie-Hong R. Jiang. Logic Synthesis of Binarized Neural Networks for Efficient Circuit Implementation. In IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD ’18), November 5–8, 2018, San Diego, CA, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3240765.3240822
- [3] Yifan Qian, et al.: Approximate Logic Synthesis in the Loop for Designing Low-Power Neural Network Accelerator. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) | 978-1-7281-9201-7/20/$31.00 ©2021 IEEE | DOI: 10.1109/ISCAS51556.2021.9401451

## Smotlacha Vladimír, RNDr. Ing., Ph.D.

### Stable frequency and time transfer in optical network

Although optical networks are primarily used for data transmissions, they currently provide a suitable and perspective platform for other applications, including the transmission of a highly stable frequency and accurate time from atomic clocks to distances of up to thousands of kilometers. The advantage of the optical line is that it allows simultaneous bidirectional transmission in the same fiber and thus the interfering effects can be compensated for, as they apply equally in both directions. The used methods of time and frequency transmission work on different network layers, the most accurate of them on the link and physical layer. The student will have to get acquainted in detail with existing methods, measure and evaluate their parameters. The research will be focused on the design of new protocols, methods of encoding transmitted information and carrier signal modulation with the goal to improve accuracy and reliability of the transmitted time and frequency. This is a complex topic that is interdisciplinary and includes computer networks and programming of FPGA circuits with a possible overlap in the design of electronic circuits and possibly opto-electronic circuits.

## Starosta Štěpán, doc. Ing., Ph.D.

### Combinatorial properties of infinite words generated by morphisms

Infinite words generated by morphisms include fixed points of morphisms and words stemming from S-adic systems [1] and L-systems [2]. The objective of the thesis is the research of combinatorial properties of these structures, for instance, their factor/palindromic/abelian complexity and properties that are preserved in the case of the generating morphisms being recognizable. The thesis may furthermore focus on design of effecient algorithms or/and formalization in the generic prove assistant Isabelle/HOL [3].

- [1] Berthé, V., Delecroix, V.: Beyond substitutive dynamical systems: S-adic expansions, RIMS Lecture note ‘Kokyuroku Bessatu’ B46, pp. 81–123 (2014)
- [2] Rozenberg, G. Salomaa, A.: The Book of L, Springer Berlin Heidelberg, 1986
- [3] https://isabelle.in.tum.de/

## Šůcha Přemysl, doc. Ing., Ph.D.

### Scheduling Tests in Medical Laboratories: Reduction of Turn-Around Time

There is no doubt that a timely diagnosis is critical to avoid significant harm and death of patients suffering from such diseases as acute ischemic stroke, heart attack, or serious cases of COVID-19 infection. Therefore, medical laboratories must be able to quickly analyze the sample and provide the result. The key performance indicator measuring how fast a laboratory can analyze a sample is called turn-around time (TAT). A factor influencing TAT is how the samples are sequenced in the laboratory workflow. This topic aims to study scheduling problems related to the samples' processing. Our analysis in medical laboratories and review of the literature has shown that this problem is not sufficiently studied while the scheduling of samples' processing can significantly reduce the TAT of high priority samples.

The goal of this thesis is to propose new mathematical models for the scheduling of samples in laboratories, and design efficient scheduling algorithms where the solution space exploration is accelerated by machine learning methods.

- [1] ATLAS Collaboration, "ATLAS Software and Computing HL-LHC Roadmap", https://cds.cern.ch/record/280291

## Surynek Pavel, prof. RNDr., Ph.D.

### Automated Planning for Robotic Agents

In automated planning, the task is to find a sequence of actions that after execution generate a specific goal [1,2]. Planning takes place at the abstract level where the planning world in which the task takes place is described using sets of logical atoms. The actions change the planning world using set operations assuming certain preconditions are satisfied. Unlike general planning where actions can in principle change the planning world arbitrarily, planning for robotic agents assumes than actions are performed by one or multiple robotic agents while the topology of the planning world plays a role. The robotic agents are assumed to be localized within the planning world. That is, the limited reach of robots’ effectors, the need to move to the place of action, and occupation of space by other agents [3,4] need to be taken into account. The open question is the design of centralized algorithms for efficient and robust planning considering both a single-robot case and multiple cooperating robotic agents.

- [1] Malik Ghallab, Dana S. Nau, Paolo Traverso: Automated Planning and Acting. Cambridge University Press 2016, ISBN 978-1-107-03727-4
- [2] Malik Ghallab, Dana S. Nau, Paolo Traverso: Automated planning - theory and practice. Elsevier 2004, ISBN 978-1-55860-856-6, pp. I-XXVIII, 1-635
- [3] Ron Alterovitz, Sven Koenig, Maxim Likhachev: Robot Planning in the Real World: Research Challenges and Opportunities. AI Magazine 37(2): 76-84 (2016
- [4] Yawen Deng, Yiwen Hua, Nils Napp, Kirstin Petersen: A Compiler for Scalable Construction by the TERMES Robot Collective. Robotics Auton. Syst. 121 (2019)

### Lazy Compilation for Classical Planning

In automated planning, the task is to find a sequence of actions that, after performing, meet a certain goal [1, 2]. This topic specifically focuses on compilation of the planning task into another formalism, such as constraint satisfaction (CSP) [3] or propositional satisfiability (SAT) [4]. Especially promising seems to be the lazy compilation, when the task is encoded into the target formalism incompletely. The encoding is subsequently refined in cooperation with the solver by generating counterexamples. The lazy compilation technique has been successfully used in the specific domain of multi-agent path finding [5]. However, the generalization of lazy compilation for classical planning is still an open question.

- [1] Malik Ghallab, Dana S. Nau, Paolo Traverso: Automated Planning and Acting. Cambridge University Press 2016, ISBN 978-1-107-03727-4
- [2] Malik Ghallab, Dana S. Nau, Paolo Traverso: Automated planning - theory and practice. Elsevier 2004, ISBN 978-1-55860-856-6, pp. I-XXVIII, 1-635
- [3] Rina Dechter: Constraint processing. Elsevier Morgan Kaufmann 2003, ISBN 978-1-55860-890-0
- [4] Armin Biere, Marijn Heule, Hans van Maaren, Toby Walsh: Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications 185, IOS Press 2009, ISBN 978-1-58603-929-5
- [5] Pavel Surynek: Unifying Search-based and Compilation-based Approaches to Multi-agent Path Finding through Satisfiability Modulo Theories. Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019), pp. 1916-1922, Macau, China, International Joint Conferences on Artificial Intelligence, 2019, ISBN (Online): 978-0-9992411-4-1

### Multi-Robot Motion Planning and Execution

Motion planning in robotics is one of the key steps to execute an abstract plan represented as a sequence of actions through the physical acting of the robot in the environment [1,2]. The motion plan maps to each continuous moment the spatial setting of the individual parts and effectors of the robot, the so-called configuration. Motion planning techniques typically work with a continuous multi-dimensional space of all possible robot configurations, where finding a motion plan corresponds to finding a continuous trajectory in the configuration space [1,3]. The research challenge is represented by motion planning for a multi-robot case with a potential for spatial collision between robots. The size of the joint configuration space increases with the increasing number of robots, and hence finding non-colliding trajectories is more difficult [4]. An interesting question is also the movement of real robots and the response to possible failure of one or more robots.

- [1] M. LaValle, S.: Planning Algorithms. Cambridge University Press, 2006
- [2] Malik Ghallab, Dana S. Nau, Paolo Traverso: Automated Planning and Acting. Cambridge University Press 2016, ISBN 978-1-107-03727-4
- [3] Choset, H., Lynch, K. M., Hutchinson, S., Kantor, G., Burgard, W., Kavraki, L. E., Thrun, S.: Principles of Robot Motion: Theory, Algorithms, and Implementations. MIT Press, 2005
- [4] Duong Le, Erion Plaku: Cooperative, Dynamics-based, and Abstraction-Guided Multi-robot Motion Planning. J. Artif. Intell. Res. 63: 361-390 (2018)

### Solving Problems in Artificial Intelligence by Reduction to Satisfiability

Research in this topic will focus on finding solutions to selected problems in artificial intelligence by reduction to satisfiability [1]. Problems representing suitable candidates for this approach include domain-dependent planning (such path-finding, robot planning) [2], scheduling, circuit design, or combinatorial tasks. The concept of satisfiability is understood here in the broader sense, that is, not only as the basic propositional satisfiability (SAT problem), but also as constraint satisfaction [3] or a combination of satisfiability in various logic theories (SAT-modulo theory - SMT) [4].

- [1] Armin Biere, Marijn Heule, Hans van Maaren, Toby Walsh: Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications 185, IOS Press 2009.
- [2] Pavel Surynek: Time-expanded graph-based propositional encodings for makespan-optimal solving of cooperative path finding problems. Ann. Math. Artif. Intell. 81(3-4): pp. 329-375, 2017.
- [3] Francesca Rossi, Peter van Beek, Toby Walsh: Handbook of Constraint Programming. Foundations of Artificial Intelligence 2, Elsevier 2006.
- [4] Daniel Kroening, Ofer Strichman: Decision Procedures - An Algorithmic Point of View, Second Edition. Texts in Theoretical Computer Science. An EATCS Series, Springer 2016.

### Techniques and Algorithms for Multi-agent Path Finding

The aim of the research is to design new concepts and related solving algorithms for Multi-Agent Path Finding [1]. The basic variant of the MAPF problem takes place on the undirected graph with agents placed in vertices. We are searching paths for individual agents so that each agent can reach its goal from the specified starting position by following its path and does not collide with any other agent. Contemporary solving techniques for MAPF include a variety of methods ranging from sub-optimal polynomial algorithms [2] through optimal algorithms based on state space search [3] to transformations into different formalisms such as SAT [4]. The open questions offer, in particular, generalized variants of MAPF, where we consider more complex avoidance, maintenance of formations, connectivity maintenance or other global properties (properties that includes all agents), generalized cost functions, or multiple independent (adversarial) teams of agents.

- [1] David Silver, “Cooperative pathfinding,” Proceedings of AIIDE, pp. 117–122, 2005.
- [2] Boris de Wilde, Adriaan ter Mors, Cees Witteveen: Push and Rotate: a Complete Multi-agent Pathfinding Algorithm. J. Artif. Intell. Res. 51: pp. 443-492, 2014.
- [3] Guni Sharon, Roni Stern, Meir Goldenberg, Ariel Felner: The increasing cost tree search for optimal multi-agent pathfinding. Artif. Intell. 195, pp. 470-495, 2013.
- [4] Pavel Surynek, Ariel Felner, Roni Stern, Eli Boyarski: Efficient SAT Approach to Multi-Agent Path Finding Under the Sum of Costs Objective. Proceedings of ECAI, pp. 810-818, 2016.

## Tvrdík Pavel, prof. Ing., CSc.

### Distributed IDS

**Specialist supervisor:** Ing. Jan Fesl, Ph.D.

Classic Intrusion Detection Systems (IDS) have been playing an important role in the security of computer networks or data centers for many years. In the case of networks with ultra-high data flow, the classic IDS concept can no longer be used due to limited local computing power. In order to eliminate this limitation, it appears promising to use a solution based on distributed IDSs (DIDS). The task of the dissertation will be to study algorithms and methods that will be used both for load distribution between network traffic probes and for the analysis of network traffic within a specific DIDS. To increase efficiency or refining the performance of the prediction of a specific network activity using DIDS collaborative machine learning appears perspective since it fully supports the DIDS concept.

### Formal models of DDOS detection/elimination

**Specialist supervisor:** Ing. Jan Fesl, Ph.D.

Distributed DDOS attacks (classic or slow) pose a major threat in today's Internet as well as in software-defined networks and clouds. An effective defense against such attacks is their fast and reliable detection, which is, however, non-trivial given the number of parameters that need to be taken into account. The topic of the dissertation is the research of mathematical models enabling the design of algorithms for the detection of both types of DDOS attacks. Currently, models based on a combination of machine learning and traditional analytical and statistical methods, which enable fast data preprocessing and thus can significantly reduce the time for making decisions, which is absolutely essential in the case of DDOS attacks, seem to be promising.

### Formal verification of network configuration models

The configuration of efficiently functioning computer networks arranged in complex topologies is nowadays considered a non-trivial problem. It often happens that a change in the configuration of a particular element within a computer network affects its other parts (e.g., in terms of availability, latency, quality of service, etc.). However, at the time of activation of the configuration, its effect may not be immediately apparent and it is therefore advisable to first verify the possible effects of the configuration. The task of the dissertation is the development of algorithms and methods for the formal verification of the state of network elements, which can point out a potential problem before the actual activation of the configuration.

### Parallel and GPU accelerated algorithms for reconstruction of recorded data into physical objects

**Specialist supervisor:** doc. Dr. André Sopczak

The software infrastructure of the ATLAS experiment at CERN, the multithreaded framework AthenaMT, is the key technology enabling the processing of data recorded from proton-proton collisions at the Large Hadron Collider (LHC) in CERN in its current version, called Run-3. The next major upgrade of the LHC, called High-Luminosity LHC (HL-LHC), will increase the collision rate by about a factor 3 and will require to collect and process up to ten times more data than today. This will impose new challenges for the ATLAS software and hardware to be developed over the next few years. The ATLAS software and computing HL-LHC roadmap [1], released on 3 March 2022 , describes a number of specific research and development tasks to be taken towards this ultimate ambitious goal at all level of the architecture stack. The current performance of the AthenaMT software used in the ATLAS experiment is based on hardware infrastructure consisting of CERN server computers and the World-wide LHC Computing Grid (WLCG) that currently encompasses nearly 1 million CPU cores. The HL-LHC data processing will require deployment of more massively parallel hardware. Therefore, the research of manycore multitheaded and GPU algorithms in various domains of HL-LHC data processing needs to be conducted.

The goal of this dissertation is the research of massively parallel algorithms and data structures needed for reconstruction of the data recorded by the various sub-detectors into physical objects like photons, electrons, muons, taus, and jets. The reconstruction should determine the energy and directions of physical objects or other physics parameters. The reconstruction algorithms and corresponding data structures are complex namely due the requirements to integrate fast and precisely the outputs of multiple detectors into single combined events and due to the necessity to account for changing detector conditions. Online reconstruction for trigger processing can potentially benefit from deploying GPUs in the trigger farm. A partial goal of the dissertation is to develop techniques for efficient offloading of algorithmic code onto heterogeneous set of different accelerators using the same code base.

## Urban Josef, Mgr., Ph.D.

### Collaborative Machine Learning for Interactive Theorem Proving

**Specialist supervisor:** Lasse Blaauwbroek, MSc.

Machine learning for interactive theorem proving is a topic with growing interest. Many individual learning projects exist which have seen considerable success in recent years. However, a lot still remains to be done to make machine learning viable in the wild. One roadblock is that most projects exist within their individual silo and cannot be directly compared to others. This is an issue that is already being solved in the wider domain of machine learning through initiatives such as the OpenML platform [1, 2] that allows one to create datasets and corresponding learning tasks and upload them to the platform. Once a dataset and task are uploaded, anyone can create and upload models that attempt to solve the task. The platform will then automatically evaluate and compare all submitted models, creating rich interactive visualizations that highlight the relative strengths and weaknesses of the models. Collaborative platforms bring considerable advantages, the most important one is the ability to easily compare different learning models through fair and consistent benchmarks that represent real-world conditions. Currently, every machine learning project develops their own evaluation technique, many of which do not represent real conditions. Additionally, human-computer research has shown that the "gamification" of difficult problems considerably helps with competitive engagement of a community [3]. Unfortunately, the data, evaluation methods and models associated to theorem proving are too complex to fit within the generic framework of initiatives like OpenML. As such, we will develop a dedicated platform for interactive theorem proving. Previous attempts at developing a central platform for theorem proving [4, 5, 14] have not gained much traction due to a lack of an interactive component. They provide a single, static dataset and code to perform an evaluation but lack the ability to automatically, fairly and interactively compare different models of the community. This work is on the intersection of multiple topics including machine learning, theorem proving, human-computer interaction, web technologies and engineering. Developing a successful platform involves multiple research angles and technical components.

**More information:**

### Language to logic translation

**Specialist supervisor:** Dr. Adam Pease

Train state-of-the-art language models to translate natural language statements such as "The cup is on the table" to formal logic statements. Those will be altimately used by today's automated theorem provers (ATPs) such as Vampire and E prover to derive conclusions from the statements. In this research we will use the SUMO (https://www.ontologyportal.org/) language and formal ontology. For example, the previous statement in SUMO will look as follows:

(exists (?X ?Y)

(and

(instance ?X Table)

(instance ?Y Cup)

(orientation ?Y ?X On)))

The language translation will be complemented by semantic tools such as SUMO's syntax and semantic checker. Data augmentation methods will be developed to increase the size of the training corpus for the NLP methods. This includes for example generating synthetic data by using SUMO's logic to language translator, direct translation to different natural languages, and transitive data generation via NLP-based methods such as Google Translate between different natural languages. The ultimate goal is to automatically enrich the SUMO knowledge base by a large amount of common-sense facts expressed in natural language. This will in turn allow automated reasoning about a growing number of everyday topics.

### Modern AI methods in Automated Theorem Proving

**Specialist supervisor:** RNDr. Martin Suda, Ph.D.

Automated theorem proving aims to fulfil the dream of automating mathematically precise deductive reasoning, with applications in formal verification of safety critical systems or the formalisation of math. Research on practical automated theorem provers combines development of logical calculi with their efficient implementation and empirical evaluation on relevant problem sets. Equally important role is played by effective heuristics which help to guide the search for the proof. Recent advances in AI, most notably machine learning and neural networks, allow for automated discovery of such heuristics, either from past prover experiences or in a reinforcement learning setup. The aim of the thesis will be to advance the state of the art in this area and to develop new methods for the automatic discovery of proof search heuristics.

## Vitvar Tomáš, doc. Ing., Ph.D.

### Operations Data Science: Machine learning on massive operational data from systems infrastructures

System infrastructures including operating systems, networking, web servers, load balancers, application servers, etc. produce massive amounts of operations data during systems runtime. Operations data science aims to improve various machine learning methods to help understand patterns, correlations or similarities from the past systems’ behavior in order to improve operations practice, for example, systems maintenance, changes in systems design or root cause analysis of incidents. This includes predicting systems behaviors when unplanned or unexpected events occur such as connectivity lost, failures of servers or spikes in systems workloads. This research will improve supervised and unsupervised learning methods tailored for problems from operational data analytics and cover a wide spectrum of underlying machine learning tasks including classification, clustering or regression.

## Zemánek Petr, doc. RNDr. Ing., CSc.

### Automatic tuning of operating system kernel

The kernels of monolithic operating systems are currently probably the most complex centralized programs that undergo a process of continuous development and tuning of both functionality and performance. They have an architecture of hundreds of modules. A number of tools are currently available to diagnose the functionality and performance of these kernels, but the level of usability of the outputs of these tools for effective kernel debugging is highly dependent on the human factor, the maturity and experience of the development team.

The aim of the dissertation is to research methods of automatic detection of the causes of performance problems of the operating system kernel and kernel scalability for multiprocessor computer systems. Investigate which known techniques of artificial intelligence and machine learning can be used to automate these methods.