Ing. Daniel Langr, Ph.D.

Theses

Bachelor theses

Graphical User Interface for Visualization of Very Large Sparse Matrices

Author
Ivo Kolář
Year
2014
Type
Bachelor thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.

Master theses

Parallel Sorting in C++11

Author
Klára Horáčková
Year
2019
Type
Master thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
A new parallel variant of the quicksort algorithm called C++11Sort is presen- ted in this thesis. C++11Sort is implemented based on the C++11 threads and the respective synchronization techniques, i.e. without the use of the non- standard C++ extensions and external libraries. In the search section, existing implementations of the parallel versions of the quicksort algorithm are descri- bed. These implementations are afterwards experimentally compared to the C++11Sort algorithm. C++11Sort achieves excellent results in those expe- riments. When sorting two billions of integers, it is faster than the fastest existing implementation by 28 %.

Efficient concurrent memoization system

Author
Viacheslav Kroilov
Year
2020
Type
Master thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
Ing. Jiří Kašpar
Summary
An automatic memoization system - also known as a software cache - stores a limited number of recently accessed elements and speeds up consequent accesses to them. Least-Recently-Used (LRU) is a popular replacement policy for hardware and software caches. However, its concurrent implementation suffers from high contention due to the list reordering performed both on lookup and insertion. A novel LRU-inspired concurrent software cache, called DeferredLRU, is presented in this thesis. The main goal of the design was to make it scalable and suitable for many-core systems. These properties were achieved by using a different approach to tracking item access order. This approach substantially decreases the number of list reinsertions --- the main factor of the contention in a regular LRU cache. DeferredLRU throughput and hit-rate are sensitive to the meta-parameter setting. By fine-tuning meta parameters for specific inputs, it was possible to achieve higher hit-rate than of a regular LRU cache for every tested input. The relative difference was up to 7.8%. DeferredLRU performance was compared to existing alternatives, including corresponding caches from Intel TBB and Facebook HHVM projects. Tested caches were evaluated with up to 32 threads (on a 16 HW cores CPU). In 32 threads evaluation, DeferredLRU was faster in all 16 tests. When accesses were distributed among multiple smaller caches for better parallelism (this approach is called binning), DeferredLRU was faster in 11 of 16 tests and was close to best-performing caches in 5 other tests. DeferredLRU with binning was up to 28.8 times faster on 32 threads compared to single-threaded performance.

Performance analysis of the LSU3shell program

Author
Martin Kočička
Year
2019
Type
Master thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
Ing. Jiří Kašpar
Summary
Ab initio approaches to nuclear structure exploration are at the forefront of current nuclear physics research. LSU3shell is an implementation of the ab initio method called symmetry-adapted no-core shell model (SA-NCSM) optimized for distributed HPC systems. The goal of this thesis was the analysis of the LSU3shell program with focus primarily on dynamic memory allocation, research of methods that can be used to improve performance and memory usage, and their application on LSU3shell. The focus on dynamic memory allocation proved to be the right one, leading to many possible optimizations. I was able to reduce the run time by 41 % on average, thus potentially saving up to 1.4 million core-hours of our total BlueWaters allocation.

Low-Latency Heap Profiling

Author
Jan Tománek
Year
2021
Type
Master thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
Frequent heap allocations can have a significant impact on the overall performance of a program. The problem becomes even more critical in multithreaded applications, where too many interactions with heap can grow into the main bottleneck and effectively diminish scalability. Profiling such applications with existing heap profilers is almost impossible due to the enormous runtime overhead they add. This thesis describes the typical memory layout of a Linux process, briefly shows the internals of the GNU C memory allocator, analyzes available heap profiling tools, and guides the reader through the process of constructing a~custom heap profiler. Further, the thesis proposes a concept of low-latency heap profiling. Based on that, a new profiler Heaprof has been developed and compared with other heap profiling tools. Experimental results revealed that Heaprof was the only tool capable of profiling without further compromising the program's scalability. For 8-threaded benchmarks, Heaprof achieved up to 8 times profiling speedup.

Survey of Sparse Matrix Storage Formats for Parallel Systems

Author
Radek Bittara
Year
2015
Type
Master thesis
Supervisor
Ing. Daniel Langr, Ph.D.
Reviewers
doc. Ing. Ivan Šimeček, Ph.D.
Summary
The goal of this thesis is to provide a overview of published sparse matrix storage formats in computer memory. This thesis contains a description of the principles and potential use in a sparse matrix-vector multiplication for modern parallel systems.