Pierre Donat-Bouillud, Ph.D.

Theses

Bachelor theses

Symbolic execution for R

Author
Petr Šťastný
Year
2024
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Sebastián Krynski, MSc.
Summary
Symbolic execution is a technique that enables one to test programs and prove nontrivial properties about a program. Making a new symbolic execution engine is challenging. Instead, a technique is used where an interpreter of the target language is symbolically executed itself, simplifying the symbolic execution of its programs. In this work, I show how to apply this to the R language. The resulting symbolic execution engine can be used to, for example, test library correctness or generate type annotations.

Automated grading and leaderboard for the NI-APT course

Author
Oleh Pantus
Year
2025
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Stanislav Kuznetsov, Ph.D.
Summary
Automated grading systems for programming courses face challenges in security and test rigour, particularly for advanced topics like program testing tools. This thesis introduces Fittest, a new automated grading system designed to address these challenges. It employs a secure hybrid architecture, where student code runs on GitLab CI but critical grading logic occurs on a protected backend server. Fittest supports dynamic C program generation for evaluating student-developed code coverage tools and fuzzers, and integrates with university authentication. The result is a secure and pedagogically sound assessment platform for the NI-APT course.

Semantic diff for R notebooks

Author
Volodymyr Plita
Year
2025
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Petr Máj, Ph.D.
Summary
This thesis addresses the critical challenge of reproducibility in computational research conducted through R notebooks. While R notebooks have revolutionized scientific workflows by integrating code, narrative, and results, their outputs often vary across different execution environments due to dependencies on library versions, operating systems, and time-dependent operations. Traditional comparison tools like GNU Diff fail to capture meaningful differences in dynamic outputs such as plots and statistical tables. This research develops and evaluates a semantic diff tool specifically designed for R notebooks that can detect differences beyond textual comparison. Through comprehensive analysis of the R programming language and notebook ecosystem, a tool was implemented that computes similarity metrics (including DSSIM and relative difference) for outputs suspected to be semantically equivalent. Evaluation across varied computational environments demonstrated the tool's ability to identify meaningful differences that conventional text-based comparison would miss, particularly in complex outputs like data visualizations and statistical tables. Despite its effectiveness, opportunities remain for improving computational efficiency, particularly in output generation speed and optimization of the Top-Down computation phase. This work contributes to reproducibility science by enabling more sophisticated comparison of notebook outputs, advancing transparency and reliability in computational research.

Mutation testing for the R language

Author
Assanali Amandykov
Year
2025
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Zdeněk Rybola, Ph.D.
Summary
Mutation testing injects small, systematic faults (mutants) into code to evaluate and strengthen test suites. Although Rthe lingua franca of statistical computingrelies on testthat for testing, existing tools like mutatr exhaustively generate and run mutants serially, leading to long feedback cycles. This thesis introduces a mutation-testing tool for R that supports selective mutation strategies and parallel execution, cutting runtimes by up to 4× while preserving fault-detection effectiveness. Empirical evaluation on popular R packages demonstrates that our approach delivers rapid, configurable mutation testing and meaningfully improves software reliability.

Compilation of a typed R-like language to WebAssembly

Author
Bilguudei Baljinnyam
Year
2026
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Petr Máj, Ph.D.
Summary
R is a widely-used dynamic language for statistical computing, but its dynamic nature prevents efficient ahead-of-time compilation. This thesis presents the design and implementation of a compiler for a statically-typed subset of R, called Typed R, that targets WebAssembly. The defined language retains R's core characteristics, including first-class vector operations and lexical scoping, while introducing static typing to enable compilation. The compiler leverages the WebAssembly Garbage Collection proposal, which simplifies memory management and enables efficient representation of high-level language constructs. Source programs are transformed through parsing, type checking, and intermediate representation lowering to generate WebAssembly bytecode, supported by an easily extendable runtime system written in Typed R. Evaluation on representative statistical programs demonstrates the viability of compiling R-like languages to WebAssembly using modern proposals.

A web editor for reproducible R notebooks

Author
Anastasiia Chumak
Year
2026
Type
Bachelor thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Martin Urbanec
Summary
This bachelor's thesis focuses on the development of a web-based editor for creating and managing reproducible R notebooks. It addresses the challenge that standard R Markdown files capture only code, not the complete computational environment needed for reliable re-execution. The thesis covers the analysis, design, implementation and testing of a platform that integrates static analysis for detecting reproducibility issues, automated dependency capture using the r4r tool, and semantic diff verification with rdiff. The system enables researchers to create notebooks in a web browser, automatically captures all dependencies including R packages and system libraries, and verifies that generated reproducibility packages produce identical results across different execution environments.

Master theses

Nested loops and path explosion in symbolic execution

Author
Vojtěch Rozhoň
Year
2024
Type
Master thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
prof. Christoph Kirsch
Summary
The thesis evaluates techniques for tackling path explosion in symbolic execution. There is a particular focus on the interaction of the techniques with nested loops. Path pruning, path subsumption, state merging, and loop summarization techniques are discussed using examples. Symbolic execution and these techniques are implemented to analyze programs written in an educational language called microc, a language inspired by C. The thesis describes experiments performed to compare the techniques, especially to find out how well the techniques deal with many loops or deep nested loops. A random microc program generator is developed to experiment with the symbolic executor.

Interpretation and Differential Fuzzing of the Vyper Language

Author
Miroslav Škrabal
Year
2025
Type
Master thesis
Supervisor
Pierre Donat-Bouillud, Ph.D.
Reviewers
Ing. Josef Gattermayer, Ph.D.
Summary
Vyper is a programming language for writing smart contracts on EVM-based blockchains. Smart contracts can hold up to billions of dollars; thus, security is the highest priority. Historically, the Vyper compiler contained multiple bugs that affected the production contracts and eventually led to a major hack. This thesis covers the topic of compiler testing with a focus on testing the Vyper compiler. It implements an AST interpreter for Vyper and uses it to differentially fuzz the compiler. The approach is tested with feasibility experiments where two old Vyper bugshigh and medium severityare ported to the current version, and the fuzzer is targeted to find them. The experiments showed that the fuzzer can quickly detect them and thus can be used to improve the correctness of the compiler.