Ing. Karel Klouda, Ph.D.

Theses

Bachelor theses

Application for data analysis of study results

Author
Martin Konečný
Year
2015
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jitka Hrabáková, Ph.D.

Automatic categorization of job ads

Author
Patricie Petriľáková
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Rodrigo Augusto da Silva Alves, Ph.D.
Summary
This thesis presents the development of a classification model for Information Technology job advertisements on the website up2staff.com. The objective is to create a reliable classification system that reduces the time and costs associated with manual categorization of job ads. The process involves analyzing and preprocessing a dataset of job ads, researching appropriate algorithms, and experimenting with combinations of feature engineering techniques and supervised machine learning classification algorithms. The model makes its final decision by weighting the decisions of two classifiers, one trained on the content of the job ads and the other on their titles. Both classifiers achieve their highest F1-score with the Support Vector Machines algorithm applied to TF-IDF features. The classification model achieves an F1-score of 0.909.
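The weighted decision described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `combine_predictions`, the weight of 0.7, the category names, and the per-category scores are all hypothetical, and the scores are assumed to be comparable (for example, normalized to sum to one).

```python
def combine_predictions(content_scores, title_scores, content_weight=0.7):
    """Combine per-category scores from two classifiers by a weighted sum.

    content_scores / title_scores: dict mapping category -> score.
    content_weight: hypothetical weight of the content classifier;
    the title classifier receives 1 - content_weight.
    """
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must lie in [0, 1]")
    title_weight = 1.0 - content_weight
    categories = set(content_scores) | set(title_scores)
    combined = {
        category: content_weight * content_scores.get(category, 0.0)
        + title_weight * title_scores.get(category, 0.0)
        for category in categories
    }
    # Sort first so that ties are broken alphabetically and deterministically.
    best = max(sorted(combined), key=combined.get)
    return best, combined


# Made-up scores for one job ad (illustrative only).
content = {"backend": 0.6, "frontend": 0.3, "devops": 0.1}
title = {"backend": 0.2, "frontend": 0.7, "devops": 0.1}

label_content_heavy, scores_content_heavy = combine_predictions(content, title, 0.7)
label_title_heavy, scores_title_heavy = combine_predictions(content, title, 0.3)
```

With these made-up numbers the chosen category depends on the weight, which is why the weight would have to be tuned on validation data in practice.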

Czech e-Library - Poetry of the 19th and 20th Century

Author
Jaromír Dalecký
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.

Web application for presentation of research institutions ranking data

Author
Pavel Švagr
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Summary
The subject of this bachelor thesis is the processing of an open data set derived from ratings of research results published by the Research, Development and Innovation Council. The thesis analyzes the files, implements a parsing module, and reveals inconsistencies and errors. It then focuses on the analysis, design and implementation of a web application which enables searching in the ratings and displays an overview of the scientific activity of research organizations, their units and authors.

Word sense representation for the Czech language

Author
Vojtěch Paukner
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová
Summary
The thesis surveys traditional and state-of-the-art methods of natural language processing. Particular importance is placed on languages with rich morphology. The state-of-the-art methods are then applied in various ways on the Czech language in order to differentiate between distinct word senses based on their usage in a sentence. Evaluation of these experiments is an important part of the thesis.

Management system for digitalized literary works

Author
Martin Melichar
Year
2018
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Jan Starý, Ph.D.
Summary
This thesis describes the development of an application for the conversion of literary works into electronic form. The literature review compares technologies for web application development and text formats for storing electronic works. Furthermore, the supplied input data and the way they are imported are described. The practical part builds on the evaluation of this review and describes the process of the application development. The primary contribution of this thesis is to facilitate the conversion of literary works into electronic form for the UČL AV ČR employees.

Administration system for written tests and exams

Author
Kryštof Slavík
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.
Summary
This bachelor thesis focuses on the design and implementation of an extension for a web database of mathematical problems. The aim of this extension is the automated generation of written tests for students of mathematical courses at FIT CTU in Prague. The application allows teachers of the courses to easily create multiple variants of tests without the need for manual assignment of mathematical problems. The thesis introduces an algorithm which is able to automatically create the required number of tests based on the specified parameters using the database. The system allows comfortable specification of these parameters. The created tests can be exported in a printable format. This thesis describes in detail the analysis of the required functionality of the application and its design. It focuses on the implementation, which uses Ruby on Rails technology, and it describes usage of the system in practice. The source code of the application is available on the attached DVD.

Named entity recognition for poetic texts

Author
Ondřej Černý
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
The result of this work is a program that uses Natural Language Processing (NLP) techniques to identify named entities in the Corpus of Czech Verse (CCV). It is part of a cooperation with the Institute of Czech Literature (ICL). Since CCV is not even partially labeled for entity recognition, we first create a set of rules, and using those, we select entities from the poems. These entities are later categorized into different entity categories using data from Wikipedia. After that, these categorized entities are used as training data for a BiLSTM-CRF neural network that is trained and fine-tuned for NER on the CCV. The resulting model can find and distinguish entities of Place, Person, Mystic Person, and Other. Since the text in the CCV is not labeled for NER, we cannot know the exact accuracy of the final BiLSTM-CRF model. If we considered the data used to train this model to be 100% accurate, the final model would achieve an accuracy of 0.99904 and an F1 score of 0.9532.

Online system to support writing pages on wikipedia.org

Author
Václav Makeš
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Summary
The thesis focuses on detecting erroneous and missing data in the Internet encyclopedia Wikipedia and on proposing corrections. The result is an automated system that downloads, stores and analyzes the articles of the Czech edition of Wikipedia. For the analysis, three methods are proposed to identify articles in need of improvements and additions. The work demonstrates the possibility of proposing improvements to the electronic encyclopedia.

Probabilistic algorithms for computing the LTS estimate

Author
Martin Jenč
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Summary
The least trimmed squares method is a robust version of the method of least squares, which is an essential tool of regression analysis used to find an estimate of coefficients in the linear regression model. Computing the least trimmed squares estimate is known to be NP-hard, hence only suboptimal probabilistic algorithms are usually used in practice. Besides describing those algorithms, we propose a few ways of combining them to obtain better performance.
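As an illustration of what such a probabilistic algorithm can look like, the following is a minimal sketch in the spirit of random restarts with concentration steps (the idea behind FAST-LTS). The summary does not say which algorithms the thesis covers, so this is not presented as its method. The sketch is restricted to a line y = a + b*x, and all function names, the parameter defaults, and the toy data are assumptions made for the example.

```python
import random


def fit_ols(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return mean_y, 0.0  # degenerate case: all x values are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trimmed_objective(points, coef, h):
    """LTS objective: sum of the h smallest squared residuals."""
    a, b = coef
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def concentration_step(points, coef, h):
    """Refit by OLS on the h points with the smallest squared residuals."""
    a, b = coef
    closest = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
    return fit_ols(closest)


def lts_random_restarts(points, h, restarts=50, max_steps=20, seed=0):
    """Heuristic LTS estimate; returns (coef, objective). Not guaranteed optimal."""
    rng = random.Random(seed)
    best_coef, best_value = None, float("inf")
    for _ in range(restarts):
        # Start from a line through two randomly chosen observations.
        coef = fit_ols(rng.sample(points, 2))
        value = trimmed_objective(points, coef, h)
        for _ in range(max_steps):
            new_coef = concentration_step(points, coef, h)
            new_value = trimmed_objective(points, new_coef, h)
            if new_value >= value - 1e-12:
                break  # no further improvement
            coef, value = new_coef, new_value
        if value < best_value:
            best_coef, best_value = coef, value
    return best_coef, best_value


# Toy data: ten points on y = 1 + 2x plus three gross outliers.
data = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
data += [(10.0, 100.0), (11.0, -50.0), (12.0, 80.0)]

(intercept, slope), objective = lts_random_restarts(data, h=10)
```

On this toy data the trimmed fit recovers the line through the ten clean points, whereas an ordinary least squares fit over all thirteen points is pulled away by the outliers.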

NHL match results prediction

Author
Filip Kojan
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová
Summary
The goal of this thesis is to explore data sources about players and matches in the NHL and modern statistical methods used for evaluating the quality of teams and players, and to examine the possibilities of using this information for predicting the results of NHL matches. Various machine learning classification models are used and their predictive ability is compared. The results of the predictions are compared to bookmaker predictions.

Corpus of comments below news articles

Author
Jakub Bartel
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.

Predicting selected basketball match events

Author
Radim Křesťan
Year
2023
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Summary
This thesis focuses on live predictions in basketball, specifically in the NBA. The thesis briefly describes the domain and includes an analysis of experiments that have been conducted in the past. It also describes the process and the possibilities of data mining. In the practical part of this thesis, several models were used, including but not limited to linear regression and random forests. The most successful method was linear regression, which had the lowest error in the majority of predictions. Player stats at the end of the game were predicted from known mid-game data.

Football player value prediction

Author
Jan Garček
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Mgr. Pavla Vozárová, Ph.D., M.A.
Summary
The aim of this thesis is to explore freely available data about football players. It explains the differences between transfer value and market value and seeks attributes that have a major influence on a player's transfer value. The thesis visualizes these attributes with a special focus on seasons and nationality. Moreover, it evaluates results from other similar projects, and various regression models for predicting transfer value are experimentally applied to the collected data. Additionally, the results of the individual models are compared and the most accurate model is determined. The main purpose of this work is to make a prediction model for transfer value available to the general public for free.

Automatic poetic metre detection

Author
Kristýna Klesnilová
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
This work is devoted to automatic metrical analysis of Czech syllabotonic verse metrically tagged inside a large poetic corpus - the Corpus of Czech Verse. First, it reimplements the existing data-driven approach used by a program called KVĚTA. Later, it models the problem as a sequence tagging task and solves it using machine learning. The BiLSTM-CRF model is used, representing the current state of the art for many sequence tagging tasks. Many different input configurations are tested. In all experiments, the inputted syllables or word tokens are represented by Word2Vec word embeddings trained on training data. The results are evaluated by computing three different accuracies of the predictions: syllable-level accuracy, line-level accuracy, and poem-level accuracy. It is shown that using BiLSTM-CRF represents a great success. With the best input configurations, it produces better results than the KVĚTA reimplementation, with predictions achieving 99.61% syllable accuracy, 98.86% line accuracy, and 90.40% poem accuracy. The most interesting finding is that the best results are obtained by inputting sequences representing whole poems instead of individual poem lines.
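The three evaluation levels mentioned in the summary can be sketched as follows. The definitions used here are an assumption, since the summary does not spell them out: a line counts as correct only if all of its syllable tags match, and a poem counts as correct only if all of its lines are correct. The function name `metre_accuracies` and the tag alphabet ("S" and "W") are hypothetical.

```python
def metre_accuracies(gold_poems, predicted_poems):
    """Return (syllable, line, poem) accuracy.

    Each poem is a list of lines; each line is a list of per-syllable tags.
    Gold and predicted data are assumed to have identical shapes.
    """
    if len(gold_poems) != len(predicted_poems):
        raise ValueError("different number of poems")
    syllables = syllables_ok = lines = lines_ok = poems_ok = 0
    for gold_poem, pred_poem in zip(gold_poems, predicted_poems):
        if len(gold_poem) != len(pred_poem):
            raise ValueError("different number of lines in a poem")
        poem_correct = True
        for gold_line, pred_line in zip(gold_poem, pred_poem):
            if len(gold_line) != len(pred_line):
                raise ValueError("different number of syllables in a line")
            matches = sum(g == p for g, p in zip(gold_line, pred_line))
            syllables += len(gold_line)
            syllables_ok += matches
            line_correct = matches == len(gold_line)
            lines += 1
            lines_ok += line_correct
            poem_correct = poem_correct and line_correct
        poems_ok += poem_correct
    return syllables_ok / syllables, lines_ok / lines, poems_ok / len(gold_poems)


# Toy example: two poems, one wrong syllable in the second line of the first.
gold = [[list("SWSW"), list("SWSW")], [list("WSWS")]]
pred = [[list("SWSW"), list("SWWW")], [list("WSWS")]]

syllable_acc, line_acc, poem_acc = metre_accuracies(gold, pred)
```

A single wrong syllable lowers all three numbers, but it costs the most at the poem level, which is consistent with poem accuracy being the lowest of the three figures reported above.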

Expected goals in ice hockey

Author
Michal Seibert
Year
2022
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Summary
The subject of this thesis is to find and examine data sources about actions taken in NHL matches and then to use these data to build models for predicting expected goals. Several classification models are used for prediction. The models and their success rates are then compared with each other and with existing models. They are also used to gather additional information about players’ and teams’ performance.

Detecting problems in outdoor cypher games

Author
Barbora Eliášová
Year
2019
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Tomáš Kalvoda, Ph.D.
Summary
The thesis deals with the analysis of data created during puzzlehunts organized by Cryptomania. It describes large cipher games and the ciphers used in them. Furthermore, it covers different approaches to cipher classification. It presents the data analysis made by Tomáš Kuča and its benefit to players of cipher games. His webpage, statek.seslost.cz, classifies data from large cipher games. The thesis defines the terms difficulty, complexity and time intensity. The data analysis itself examines the Cryptomania puzzlehunts Avraham Hrashalom, Fantom Brna and Ztracené židovské město. The duration of cipher solving was combined with the number of hints taken. This information was used to create a value that defines the difficulty of a cipher. Every team was given a rating as well. Cluster analysis uses this information to identify groups of similar teams.

Automated detection of text translations

Author
Jan Peřina
Year
2021
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Mgr. Petr Novák, Ph.D.
Summary
This bachelor thesis explores the possibilities of detecting translated portions of a text, together with ways of searching for the origin of such text on the internet. In this thesis, an experiment with a chosen method for machine translation detection is reproduced. The method was then improved by using a different text similarity metric and lemmatisation. The applicability of this method to human-produced translations was tested. Several ways of transforming the texts detected in this way into search engine queries, in order to find their sources on the internet effectively, are also presented.

Predicting selected basketball match events

Author
Ondřej Schejbal
Year
2020
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
RNDr. Petr Olšák
Summary
In this bachelor's thesis, a model was created that predicts the total number of points scored over the further course of an NBA basketball match. Predictions are based on data from previous games and on statistics already published during the ongoing match. To obtain the data, existing sources were studied and then successfully used to build a sufficient dataset for training the prediction model. Previous theses focused on a similar topic were also reviewed. Based on the gathered data, a linear regression prediction model was chosen, and additional attributes intended to improve the model's predictions were added to the data. The model was trained successfully, and its results on the test set appeared favourable. However, the quality of the results could only be fully assessed by testing the model on matches currently being played. Unfortunately, this was not possible due to the COVID-19 pandemic, which was ongoing while this bachelor's thesis was being written.

System supporting research in combinatorics on words

Author
Radek Jireš
Year
2013
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.

Application for visualisation of linear algebra notions and methods

Author
Martin Chvátal
Year
2016
Type
Bachelor thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Petr Špaček, Ph.D.
Summary
The topic of this thesis is the creation of educational software for linear algebra. It allows a teacher to supplement a lecture with examples of how the currently studied topic can be used in informatics. A set of programming tasks is prepared for students to help them practice what they have learned.

Master theses

Estimating webpage content in secure communication

Author
Marek Mařík
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Karel Hynek, Ph.D.
Summary
This master thesis examines whether it is possible to determine from network traffic which websites a user visited even though the communication is encrypted, and whether the content of a web page can be at least approximately determined from encrypted network traffic. All of this is based on the characteristics of network flows, i.e. without decrypting the traffic. As part of this work, a dataset generator was designed and implemented; it creates datasets containing captured network flows for visits to individual websites. Two datasets were created using this generator. A diverse set of features was designed. Based on the feature vectors, experiments were performed with multiple different models to identify websites and estimate their content. Furthermore, novelty detection models were created to detect unknown web pages. Experiments show that, based on encrypted traffic, websites can be identified relatively accurately and some attributes of their content can be estimated as well.

A Tool for Digitalizing Handwritten Chess Notation Sheets

Author
Jana Maříková
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jiří Kašpar
Summary
This thesis aims to create a tool that automates the conversion of a photo of a chess score sheet into a digitalized form with the help of OCR and machine learning techniques. The score sheet is a paper document on which players write down their own and their opponents' moves. First, the chess terminology and existing solutions are introduced. Then a general OCR system is described, and, finally, the implementation of the system and its evaluation are given.

Automatic detection of topics in poetic texts

Author
Martin Bendík
Year
2023
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Magda Friedjungová, Ph.D.
Summary
This thesis studies the detection of topics in the Corpus of Czech Verse, which contains tens of thousands of poems from the 19th and early 20th centuries. It uses machine learning methods to efficiently process the large amount of data. The output of these algorithms is a set of detected topics and the classification of individual poems into these topics. This can help in further analysis of the artworks, summarizing and exploring what each poem addresses. This thesis presents current research in the area of detecting topics in poetic texts in different languages and using different technologies. The thesis also includes the development of several models that are used to assign topics to individual poems. Unsupervised, supervised and semi-supervised algorithms have been used for this purpose. We evaluate all the created models in detail, visualize them, point out their strengths and weaknesses, specific features and last but not least compare the models with each other. Since the Corpus of Czech Verse does not contain annotations of poem topics, for the purpose of supervised learning, an annotated dataset was created, which consists of a subset of poems from the original dataset.

Online doctor reservation system

Author
Martin Jelínek
Year
2014
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Štěpán Starosta, Ph.D.

Extracting structured data from textual car selling advertisement data

Author
Filip Kojan
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Pavel Kordík, Ph.D.
Summary
The aim of this work is to explore, design and test methods for extracting structured data from unstructured texts of car ads. It also examines methods for preprocessing text into a format suitable for use in machine learning models and applies these methods in combination with various machine learning models. The most successful models will be compared and the results they have achieved will be evaluated.

Web portal for testing algorithms computing least trimmed squares estimate

Author
Jan Švehla
Year
2013
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Jitka Hrabáková, Ph.D.

Internet Traffic Classification

Author
Jana Mašková
Year
2020
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
Ing. Simona Buchovecká
Summary
This thesis delves into the topic of machine learning for the classification of internet traffic and the detection of harmful traffic. All steps of machine learning are considered, such as data collection and data preprocessing. Suitable classification algorithms and anomaly detection algorithms were chosen to accomplish the main task of the thesis. With regard to the classification of internet traffic, a high success rate was achieved for all selected datasets using supervised algorithms based on decision trees. For harmful traffic detection, only two of the seven datasets achieved a satisfactory score with the anomaly detection algorithms used.

Algorithms for verifying properties of D0L systems

Author
Anežka Štěpánková
Year
2021
Type
Master thesis
Supervisor
Ing. Karel Klouda, Ph.D.
Reviewers
doc. Ing. Jan Janoušek, Ph.D.
Summary
The aim of this work is to present combinatorics on words and the theory of D0L-systems. Further, it aims to study and understand algorithms for determining selected properties of D0L-systems, namely: pushy, injectivity, repetitivity and circularity. Finally, it aims to implement these selected algorithms in Python, use them to determine these properties for binary morphisms, and evaluate the results by creating an overview of the properties of the tested binary morphisms.