X-ray structure analysis of data from powder diffraction is an important analytic method for crystal structure determination of samples that do not offer a suitable monocrystals. The indexation process is one of the critical bottlenecks for application of this method. Without the determination of the crystal lattice parameters, we cannot estimate the crystallic structure. Existing methods for indexation have problems with low-quality data as well as with the indexation of phase mixtures. The goal of this research is to develop more efficient parallel algorithms than the current ones for solving these problems.

Sparse matrix and tensor formats suitable for massively parallel algorithms in numerical linear algebra

The used sparse matrix storage format has great impact on performance and scalability of the algorithms in numerical linear algebra.

The ideal format ensures minimal memory storage requirements, maximum utilization of floating point vector units, maximum utilization of cache memories, and enables load balanced parallelization of the algorithms on massively parallel systems.

Several sparse matrix formats have been proposed in recent years, but these specialized and efficient formats also have some drawbacks. They suffer from a significant transformation overhead, are designed only for a limited set of matrix operations, or do not support fast adding or removing nonzero elements.

The dissertation goal is to conduct research on new sparse-matrix formats for massively parallel architectures to address these trade-offs and develop optimization heuristics for using these formats for a sparse matrix in numerical linear algebra operations.

Bachelor theses

Implementation and comparison of different types of search trees

This thesis is about data structure search tree, especially about different types of search trees. This thesis describes chosen search tree types, their advantages, disadvantages, traits and examples of their usage.
Analytical part contains description and analysis of search tree and different search tree types, practical part follows up with implementation of types researched in analytical part in C++ language, generation of data for testing and comparison of implemented types for generated data on university computer cluster STAR.
Created implementations achieve similar and in some cases better results than implementations imported from C++ libraries. Comparison also shows use cases of different search tree types notwithstanding only on asymptotic complexities.

This paper describes parallel LU decomposition algorithms of sparse matrices. It uses Crout's, Cholesky and QR method. These methods are implemented for serveral types of sparse matrices.
Implemented methods are tested for their time and memory performance.

Edit distance can be computed with the well-known dynamic programming algorithm in O(n^2) time, where n is the length of the input strings. The Four-Russians algorithm improves this complexity by a factor of (log(n))^2 by using a lookup table. In this thesis, the algorithm is thoroughly examined and important implementation details are discussed, with special consideration given to parallelizing the algorithm and reducing the size of the lookup table. An implementation in C++ is provided and its performance is compared with a popular edit distance library in several experiments. The results indicate that the algorithm is a viable option in practice, but not optimal

This bachelor thesis deals with the Chinese postman problem, especially with undirected and directed versions. The Chinese postman problem is an optimization problem from graph theory. In the introduction, the problem is presented along with the definition of terms used in the following sections. Both versions of the problem are divided into simple steps. For each of these steps some algorithms are presented. In the implementation part of this thesis, those algorithms are implemented and merged into a single solution to the Chinese postman problem. Implemented algorithms are parallelized using the OpenMP library. Algorithms are then tested and compared to each other.

Numerical databases speed up computations by memoizing pairs of an argument and the result, computed by a function with the argument. The canonical numerical database is based on the weighted search tree - a combination of the AVL tree and the binary heap. The application of alternative data structures, namely the hash table and the splay tree, is discussed in this thesis. In addition, a new data structure - CNDC - is introduced. It is similar to the weighted search tree, but all operations are declared as thread-safe.
Data structures, mentioned above, are implemented in the C++ programming language as a programming library, called numdb. The performance of each data structure is measured, and the results are compared and discussed.

The bachelor thesis deals with algorithms for multiplication of polynomials, especially Karatsuba and Schönhage-Strassen algorithm. The aim of the thesis is implementation algorithms in C++, optimization by source code transformation and parallelisation by OpenMP library. Algorithms are tested and measured on the Star computing server.

This article contains comparison of FFT algorithms and HFFT. Work focuses on iterative FFT aproach of Cooley-Tukey algorithm on radix-2, radix-4, radix-8 and their comparison with split-radix algorithm as well as
FFT algorithm on real numbers. Additionally involve new HFFT algortihm which can calculate two-dimensional FFT of hexagonally sampled data with one-dimensional FFTs.
Algorithms are implemented in C++ programming language and compiled on GCC while taking advantage of it's automatic optimalizations.
At last problems of multi-threading are dealt with in OpenMP, optimalizations include usage of vector instruction SIMD and data structures are discused.

Implementation of K-means clustering algorithm on multithreaded platform using OpenMP and
on GPU using CUDA technology. Comparison of time of execution of CUDA implementation
with multithreaded and sequential implementations on CPU.

This bachelor thesis deals with chosen algorithms for multiplication of polynomials. These algorithms are described in the theoretical part of the thesis with explanation of their principle of operation. The practical part is focused on implementation of these algorithms followed by their optimization. For implemented algorithms an analysis of the accuracy of results is executed. Optimized algorithms are then tested and measured on a computer server. In the end a comparision with already existing implementations and libraries is performed.

In this thesis we study nonlinear conjugate gradient methods for unconstrained optimization. We outline the possibilities and limits of existing methods for unconstrained optimization. Theoretical properties of the conjugate gradient methods are compared with other basic algorithms. The thesis reviews different variants of nonlinear conjugate gradient methods.
The conjugate gradient update parameter plays an important role in the convergence properties of nonlinear conjugate gradient method. Several formulas for the conjugate gradient update parameter exist and were proven to have plausible convergence properties. No generally optimal choice exists. In practice, the performance with different choices of the formula can vary significantly on different problems.
We propose a heuristic method that automatically adjusts the value of the parameter. Numerical results show that the performance of the heuristic method is often close to the performance of the best choice of the formula for a given problem.

Solving the System of Linear Equations via Gauss-Jordan method on GPU with CUDA.
Comparison of execution time of implementation on GPU with implementation on CPU.
Comparison of Gauss-Jordan implementation on GPU with cuSOLVER library implementation.

This thesis deals with sparse linear systems and their solutions using direct methods. It describes a selected set of methods for solving these systems and heuristics for performance optimization. Furthermore it contains a survey of a selected solvers. The result of this thesis is a compact application with a graphical user interface that offers several methods and heuristics and aims to allow the user to choose an efficient solution for the largest set of sparse linear systems.

This thesis is focused on the Conflict resolution algorithm, which is used to solve systems of linear inequalities. The purpose of this thesis is to find an effective implementation of the algorithm using common optimization techniques and to compare the performance of the algorithm for sparse and dense representations of linear systems. The thesis contains the description of the algorithm, the description of the optimization process and the results of performance measurements.

The goal of this thesis is to explore efficiency and consequences of solving the systems of linear equations with imprecise right hand side. These are represented by random variables and the results are compared to usage of interval arithmetic. We implement the solvers using Gaussian elimination method and also Gauss-Seidel iteration method. The aim is to compare the performance of these methods and to explore their viability in these representations. While information about real value is increased in the end by both approaches using random variables, the time and difficulty of computation is also increased. The best results within computing speed and precision are reported by Gassian elimination method with pivoting. Gauss-Seidel method generates similarly accurate results, but it is generally slower.

This thesis deals with analysis, design and implementation of a web application for multiplication of the mixed polynomials. The application uses trivial algorithm, Strassen's algorithm and algorithm for multiplication of sparse mixed polynomials in the Compressed Row Storage format. The work contains theoretical description of these algorithms. The result is a web application, which can multiply 2 bivariate polynomials using the selected algorithm and draw plots of speeds of different algorithms.

This thesis is focused on optimization of code representing celestial bodies movements which interacts by gravitational forces. In this case data representing Solar system was used for the simulation. Unoptimized algorithm is basicaly Newton's law of universal gravitation written in the C++ programming language which was then optimized by using different GCC compiler options, making small changes in the code, and also more extensive alterations such as cycle unrolling or the way how data was stored in the memory. Last but not least the algorithm was adapted for multicore systems for parallel computing by the OpenMP technology. It is possible to compile several versions of differently optimized or adjusted programs from attached source codes starting with a basic version and ending with the fastest 24 threads parallel version. By running this program with data representing Solar System in given time, it is possible to get a new system state after certain time given beforhand and vizualize the system by the Gnuplot tool in the form of a 3D graph.

This thesis deals with efficient algorithms used for polynomial multiplication, particular with algorithms Karatsuba, Toom-Cook and Fast Fourier transform. Among other things this thesis focuses on parallelization of these algorithms in programming language C++ using OpenMP library. There is also testing of real effectiveness and numerical stability of implemented algorithms.

The objective of this thesis is to conduct a study and implement the numerical database system using a weighted binary tree data structure. The implementation in the language C++ will be compared to the original implementation in Fortran programming language. The portion of this thesis is dedicated to the analysis of the current solution, its parameters, and discussion of implementing the parallel variant of the algorithm.

This thesis explores the possibility of using a sound card as an oscilloscope.
It includes comparison of existing solutions, analysis of limititations resulting from using a sound card for this purpose and development of our own solution.
In the conclusion we assess the usability and usefulness of the developed application.

This thesis deals with implementation of Conflict resolution algorithm, which solves systems of linear inequalities. Algorithm is implemented in C++17 and is parallelized using OpenMP.

This thesis focuses on sorting algorithm Timsort. The goal is to paralelize the
algorithm in a way to make it more efficient than its sequential version and
compare its speed with other algorithms.

This thesis describes the analysis of graph editor written in Java programming language using the MVC architecture,
which allows you to create graphs and verify their properties It also describes the development of the extensions of the graph editor
with further properties and algorithms eg whether a graph contains a closed Eulerian trail, whether the graph is Hamiltonian, or finding the minimal spanning tree.

The aim of this thesis is to explore the sequential optimizations of the Timsort algorithm and its parallelization using OpenMP. Newly developed algorithms are tested and compared with the base version of Timsort and other selected sorting algorithms. This testing is performed on the school server STAR, designed for objective testing of parallel algorithms. All implementations are in the C++ language.

The purpose of this work is the design of a library for multiplication of polynomials. The aim of this work is the mutual comparison of chosen algorithms among themselves and also comparision with the multiplication of sparse polynomials. The scope of the comparison is especially speed of algorithms, and potential discussion on the causes of speed. All the algorithms used for this work are analyzed and their advantages and disadvantages are discussed. Part of this work is also implementation of designed library. During implementation, there is an emphasis on reducing overhead costs, or other time-savings that are being discussed. The work also aims to find the limit to which it is advantageous to use the structure for sparse polynomials (for acceleration), both theoretically and practically. At the end educational GUI program is implemented which serves to detect selected statistics from algorithms runs.

This thesis contains an overview of FFT algorithms. Bluestein's, Rader's, the Cooley-Tukey and prime-factor algorithms are studied in detail. Potential optimizations of these algorithms are described extensively. A C++ implementation which can compute an FFT by combining algorithms is described in detail. The implemented program is optimized, optimizations include usage of SIMD instructions and parallelization via OpenMP. The resulted program is compared to libraries FFTW and Intel MKL.

This bachelor's thesis deals with the sparse matrix storage formats COO, CSR, CSC and algorithms for matrix multiplication in these formats. The goal of this thesis is the implementation of said formats and algorithms in the programming language C++ and testing their performance on the faculty cluster STAR.

The result of the thesis is a feasibility study of the efficiency of heat transfer calculation in the buildings of Rekom,~v.~d. Specifically, the proposal of measures to optimize the existing solution in MS Excel or the proposal of a new solution in the chosen platform. The work includes an economic and operational evaluation of the current and proposed solution. The thesis initially deals with an introduction to the problem of heat distribution calculation in buildings and an evaluation of the current implementation. Based on the information obtained, it excludes the possibility of developing the existing solution. It then selects a group of programming languages suitable for the new implementation of the calculation, comparing them with respect to speed and their capabilities. The implementation of the new solution in C++ was chosen as the most advantageous. From the cost-benefit analysis it was found that the investment in the new system pays off within the first year after deployment. The result of this work allows Rekom,~v.~d. to increase the efficiency of the company's processes without the risk of a wasted investment.

This thesis deals with efficient algorithms for finding convex hull in 3D space. In the thesis, the problem is defined and algorithms to solve it are described. These algorithms are implemented, optimized and parallelized. Also, these algorithms are compared against each other and against an existing solution, the QHull library and the CGAL library.

The goal of this thesis is to implement Doolittle algorithm for LU decomposition of sparse matrices and to become acquainted with heuristics for reducing fill-in, concretely with algorithms Multiple Minimum Degree, Reverse Cuthill-McKee and Markowitz strategy. The thesis also goes in their implementation and comparasion of their time and space complexity.

This bachelor thesis deals with sorting algorithms. Specifically, it is MergeSort and TimSort algorithms. These algorithms are implemented in C++, and by code transformation methods, efficient use of cache and parallelizations are adjusted to make their time and memory efficient. Furthermore, the author creates a hybrid algorithm using a combination of previous algorithms. Algorithms are tested on a STAR computing server that allows objective measurement of the performance and efficiency of algorithms. The result of this thesis is a set of sorting algorithms, graphical representation of their performance and comparison with already existing realizations of parallel stable sorting.

This bachelor thesis dealt with the analysis and implementation of the Karatsuba, Toom-3, Toom-6,5 and Toom-8,5 fast multiplication algorithms. These algorithms were then modified to increase their computing performance.
Literary research was conducted in order to study the principles of these fast multiplication algorithms, and to analyse some possible solutions for increasing their performance.
In the practical part of this thesis, I implemented these algorithms. Next, I modified the implemented solutions to properly make use of available system resources. I mainly used the parallel interface Open MP.
Subsequently, I measured the computing power of all the individual implemented algorithms. I analyzed the speed-up that was achieved by using the parallel interface Open MP. I also tested the computing power of some math libraries, namely GNU Multi-Precision Library (GMP), and compared their performance to the algorithms implemented in this thesis.
The main findings of this thesis are that properly implemented fast multiplications algorithms modified to make good use of system resources, especially multi-core processors, provide a dramatic speed-up in computing power.
In the attachment, you can find the implemented fast multiplication algorithms, including their modified versions.

Algorithms for Solution of the Chinese Postman Problem

This bachelor thesis deals with selected algorithms for solving the Problem of Chinese Postman (DCPP variant), i.e. one of the optimization problems in the field of graph theory. The task is to find the shortest route of the postman on his way through all the streets (those are the edges of the graph), considering that he must eventually return to the starting point. The work solves the version with oriented weighted edges. The thesis aims to use various partial algorithms for individual phases of solving the whole problem.
The implementation uses the parallelization and compares it with the sequential version.
More efficient of possible sub-algorithms are used to test the complete implementation.
Testing of the existing implementation proved that my solution is significantly more efficient (up to 10-times higher speedup).

This thesis deals with analysis of available algorithms for calculating exact value of Ludolph's number - pi. This thesis also describes geometric meaning, properties and brief history of Ludolph's number. Few libraries for calculations with extended precision numbers are analyzed and the best suited one is chosen for implementation of described algorithms. All algorithms for calculating exact value of Ludolph's number are implemented and test of their convergence is performed. Test of normality of Ludolph's number is performed at the end.

This bachelor thesis deals with the implementation of MergeSort and RadixSort algorithms. These algorithms are implemented in C ++. The implementations are optimized to be time-efficient.
In the thesis, these algorithms are first analyzed, and optimizations are designed to increase the efficiency of algorithms and parallelization methods using OpenMP that lead to the best distribution of the load on the individual threads.
The result of the thesis is implementation of the above-mentioned algorithms, graphical representation of their performance and comparison with already existing implementations.
The implementation of the RadixSort algorithm achieves better times when sorting integers than implementing the MergeSort algorithm contained in the standard C++ library.

Nonnegative matrix factorization is a method of matrix factorization, that can approximate data by a low-rank representation. This representation can be exploited for reducing the size of an image file while keeping most of the visual quality. An initialization of the decomposition algorithm is needed to produce the low-rank approximation. In this work, we propose the NMFNet neural model to accomplish the task of this initialization. The model is then compared to other initialization techniques used in practice. Random initialization, NNDSVD initialization and K-means clustering were chosen for this comparison. Our model was shown to compare favorably to these methods.

This thesis deals with effective algorithms for solving the
problem of finding convex hull. It introduces necessary theoretical concepts
and describes several known algorithms, which are implemented, optimized and
paralelized. In order to perform measurements, input point set generator is
created, and algorithms are compared using generated data.
A C++ library is made from created implementation, and compared to
existing solutions.

This thesis focuses on problem Knapsack 0-1 and approaches for searching exact solution. Choosed algorithms based on method branch and bounds, on method dynamic programming and brute force method are implemented in C++ language. After that author focuses on paralelization of that algorithms and implementation of paralelizations by OpenMP. Finally for authors implementations and for competitive implementations are measured and compared computational times on choosed data sets.

The thesis describes solution of system of linear equation in interval arithmetic. For solving systems of linear equation are used Gauss elimination, Jacobi method, Gauss-Seidel method and Conjugate gradient method. The thesis compares influence of these methods on resulting width of intervals. Also, it compares influence of pivotization with different heuristic and it analyses resulting width for preconditioning. Methods are implemented by using PROFIL/BIAS and Boost library, which are compare in terms of performance.

The theoretical part of this thesis describes an abstract data structure called priority queue, it's supported operations and a number of possible implementations which are described in detail from the theoretical point of view. After that, examples of priority queue applications are listed. This thesis also mentions several possible approaches to parallelization of this structure.
In the practical part, most of the described priority queue realizations have been implemented in C++. Their practical performance has been measured on the STAR school cluster in several test cases. This thesis shows results of these measurements and analyses them. The implementations are compared on several commented graphs.
These results help to choose a suitable priority queue variant for a specific purpose in order to save computing time, especially for large input data. The measurements have shown that simpler or asymptotically slower data structures can be faster in practice.

This bachelor's thesis deals with the design, implementation and testing of a calculator of a few largest eigenvalues of large sparse nonsymmetric real matrices. It hence deals with the description of the restarted Arnoldi method. The solver part implements the implicitly restarted Arnoldi method. The calculator is implemented in the C++ programming language. The implemented calculator provides for its solution a command line interface along with a simple graphical user interface which uses the framework Qt.

This thesis aims to explore solving systems of linear equations with imprecise right hand sides. Those imprecisions are represented by random variables and the results are compared with the results obtained by using interval arithmetic. The solutions are obtained by using the Gaussian elimination method. The goal of this thesis is to determine whether the results obtained using interval arithmetic can be refined. However, while that is true in the end, it transpires that any increase in precision leads to a damning increase in computational complexity.

In this thesis we measure the practical efficiency of selected implementations of priority queue for different input data and various ratios of performed operations.
The solution is implemented in the \Cpp programming language and the performance is measured on the school computing cluster named STAR with Intel Core i7-950 processor, 24 GB DDR3 RAM and two graphic cards GeForce GTX 590 and GeForce GTX 470.
The created solutions are tested using some typical scenarios, measured values are discussed and compared with the theoretical bounds of complexity.
Due to the measured values one can optimize critical parts of the algorithms that use the examined data structure in order to save computing resources and consumed energy.

This bachelor's thesis deals with extending the ParaCell package with a new computational method adapted to optimize the quality of indexing by correcting the zeroshift error before indexing the diffraction data. The method was then tested on a series of data from each crystal system and achieved high success on the test data but at a possible cost of high computational complexity. Furthermore, the work deals with the optimization of the MGLS indexing method and based on the results is created a more efficient variant.

This work describes formats of sparse matrices XY, YX, CRS and quadtree; algorithms of multiplication of sparse matrix with sparse matrix and describes algorithms used in parallel computation. The thesis includes implementation in C++, tests with real matrices and comparisons of results with calculated expectations. This work also contains comparisons of implemented algorithms with available solutions, namely Intel Math Kernel Library and the Eigen library. The result of this work shows the potential of aforementioned algorithms in parallel computation.

This thesis contains a description of in-place and out-of-place Mergesort algorithms. It describes their implementation in the C++ language followed by their optimization and parallelization using the OpenMP technology. Furthermore, a hybrid algorithm based on these algorithms is created with the aim of an effective usage of a provided buffer of various lengths. Lastly, the performance of this hybrid algorithm is compared with the performance of several selected existing implementations of stable sorting.

Thesis deals with Ludolphian number, algorithms for calculating its value and its normality. Ludolphian number and chosen approximation algorithms are described in the theoretical part. These algorithms are implemented with GNU MPFR library in the practical part. The speed of convergence of those algorithms is also measured. At the end, the normality of approximations of Pi is tested. The occurrence of single digits and strings of length 2 in base 10 and 16 is compared for this purpose.
The biggest speed of convergence was measured by Chudnovsky algorithm and Gauss-Legendre algorithm. By results achieved from normality testing it is safe to say, that tested approximations of Pi are normal numbers.

The work deals with three algorithms for solving linear system of equations. Matrix indicating the system of equations must meet a few prerequisites before selected algorithms handle the problem. Selected algorithms are Jacobi method, Gauss-Seidel method and SOR method. Algorithms are being implemented sequentially, parallel and then I compare their convergence speed with each other.

A new version of the parallel in-place Quicksort algorithm MPQsort for array sorting is presen-
ted in this thesis, using OpenMP for parallelization. Current implementations use only one pivot
for element partitioning. On the other hand, MPQsort implements parallel multi-way partitio-
ning and so is the first algorithm of its kind. Sequential multi-way partitionings are discussed
in the first part of the thesis, followed by parallel two-way partitioning. Based on the gathered
information is designed and implemented parallel multi-way partitioning. Implementation was
followed by an experimental evaluation of its efficiency and comparison with other implementati-
ons. MPQsort achieves good results in experiments and among the other considered algorithms
ranked second in terms of sorting randomly generated numbers. Conversely, it sometimes achieves
the best results for other types of data arrangements.

The objective of this thesis was to design an online cloud rendering service able to employ mostly idle school computers. The explored areas of this work include requirements gathering, evaluation of computing potential of selected school computers and insights into methods of sharing these computers with their standard users. The result of this work is a prototype implementation called FITRender which allows rendering of submitted scenes through a decoupled cooperation of multiple software elements. Finally a number of observations and suggestions is made towards the project to help it reach its goal of using the school's idle computational power to serve its students.

The diploma thesis describes advanced algorithms of system of linear interval equation. Methods used to solve these systems are Gaussian elimination, Modified Gaussian elimination and HBR method. The thesis compares these algorithms in terms of accuracy and in terms of efectivity. It also deals with possibility of parallelization these algorithms with the OpenMP library.

This thesis focuses on possibilities of GPGPU usage in a chess engine. Research of existing top chess engines is included. Part of this thesis is an analysis of typical two player game algorithms and a discussion of possibilities of usage of those algorithms for a GPU chess engine. The main focus of this thesis is design, implementation and testing of a GPU engine proof-of-concept.

Non-negative matrix factorization has been successfully applied in various scenarios involving analysis of large chunks of data and finding patterns in them for later use. It's used to perform things such as face recognition, source separation or image compression among others.
The purpose of this thesis is to research possible uses of non-negative matrix factorization in the problem of lossy audio compression. A reference audio encoder and decoder using NMF will be implemented and various experiments using this encoder will be conducted. The results will be measured and compared to existing audio compressing solutions.

This thesis is dedicated to the field of the convex hull problem and algorithms for computing the convex hull of points. The main task of this thesis was to design a new version of the Quickhull algorithm that utilizes "crawlers" during the preprocessing phase and compare its performance to the performance of the Quickhull algorithm, the Concurrent Hull algorithm, and other implementations of solvers of the convex hull problem. To accomplish this goal, we studied the convex hull problem and the state-of-the-art algorithms designed for solving the convex hull problem. Subsequently, we designed and implemented the Quickhull algorithm, the Concurrent Hull algorithm, and the new Quickhull with Crawlers algorithm for computation on the CPU using the OpenMP API, and for computation on the GPU using the CUDA API and the Thrust library. To evaluate the quality of our implementations, we measured their performance on datasets outputted by generators of input points we designed and implemented. We compared the performance of our implementations with each other, as well as with the already existing Qhull library. In conclusion to this thesis, we proposed ideas for further development of our implementations as well as ideas for future research in the field of the convex hull problem.

This thesis describes the process of creating a new LLVM compiler system
backend for the DLX architecture. It goes through all the necessary parts
of creating a new compiler backend such as instruction selection or register
allocation and describes them in terms of LLVM. It looks into how optimiza-
tions work in the LLVM system and implement several optimizations suitable
for the DLX architecture such as instruction scheduling. The result of this
thesis is a new working LLVM backend for the DLX architecture with several
optimizations in place. This backend can be used to compile several high-level
languages to the DLX assembly code.

This work explores the possible improvements to a popular text rendering technique widely used in 3D applications and video games. It proposes a universal and efficient method of constructing a multi-channel distance field for vector-based shapes, such as font glyphs, and describes its usage in rendering with improved fidelity.

This thesis deals with a virtualization of graphics cards in cloud platform Openstack. The aim of this thesis is also to implement a component of Openstack which dynamically allocates GPU cards to virtual machines for a limited time.

This master's thesis deals with implementation of the basis of sparse matrix library. It also contains implementation and optimalisations of massively parallel matrix multiplication on GPU in the CUDA technology.
This work is also intended to give basic level understanding of the matrix multiplication, sparse matrices and efficient implementation of algorithms under the limitations of the CUDA technology.

The work focuses on parallel sparse matrix multiplication and includes analysis and comparison of performance between a custom implemented algorithm in OpenMP and existing algorithms. The existing algorithms and their approaches and optimizations, which enable them to achieve their respective times, have been described. These ideas were utilized in the custom implementation. For comparison, a set of measurements was performed, testing various matrices from the SuiteSparse matrix collection. The obtained results were then analyzed and compared to determine the relative efficiency of the implemented algorithm compared to the existing algorithms.

This thesis attempts to enable factorization using Pollard's Rho and Lesntras Elliptic curve factorization algorithms on the GPU. It goes through initial sequential CPU implementation, its adoption to a multi-threaded solution using OpenMP, and GPU-based CUDA and Apple Metal API implementations. A new multi-platform arbitrary-precision integer arithmetic library was created for Metal and CUDA to support the end goal of arbitrary precision factorization on the GPU. The thesis evaluates the performance differences across the implemented solutions and the differences between CPU, CUDA, and Metal variants. It also provides a comparison with existing noteworthy solutions.

This thesis deals with the refactorization and expansion of an existing application called ParaCell, used for indexation of powder diffraction records. The thesis introduces the problem domain, along with the analysis of the current application. The main improvements are the design and implementation of a graphical user interface, reworking existing CUDA code into alternative technologies (Vulkan, OpenCL) and a flexible integration of the CMake build system. The final application is documented and undergoes automatic and user testing.

This thesis deals with implementation of secure lock-down system allowing user to display only browser with defined authorized domains using a regular Linux distribution. The first part contains an analysis of existing solutions. The second part analyzes suitable Linux distributions and selection of tools for system security. The third part describes the implementation of the system, including automated scripts for setting up the environment. The fourth part contains testing of the environment and automated scripts. The last part compares this solution with existing ones.

This thesis focuses on investigating approaches to combining sparse matrix-vector multiplication and transposed sparse matrix-vector multiplication into a single operation called joint direct and transposed sparse matrix-vector multiplication (SpMMTV). It also focuses on parallelising this joint operation. A parallel SpMMTV operation can be used to speed up the biconjugate gradient method, which is an iterative algorithm for solving large sparse systems of linear equations. Existing sparse matrix storage formats and existing approaches to parallel SpMMTV are examined in this thesis. Parallel SpMMTV implementations for CPUs on shared memory systems are developed and throughly described. Optimisations of these implementations are discussed and some of them are implemented. The resulting performance of developed SpMMTV implementations is compared with an implementation based on the Intel Math Kernel Library.

In this thesis I deal with factorization algorithm that uses elliptic curves, also known as Lenstra's algorithm. I analyze this algorithm using two elliptic curve models in Weierstrass and Edwards form using projective coordination system. In the first part I define mathematical concepts that are necessary for understanding described algorithm together with description of some known ellptic curve forms. In the next part I describe RSA cryptosystem that is based on factorization problem. Then I focus on Lenstra's algorithm, which I describe in more detail. At the end I desribe implementation and paralelization with some improvements of algorithm and show results of measurements.

The aim of this thesis is to investigate and analyze newest HEVC video compression standard and compare it with alternatives like H.264 and VP9. Major part of thesis is focused on programming video quality evaluation program, which is accelerated on GPU using CUDA and OpenCL.
In the first part, we discuss main features of HEVC. This part leads to correct understanding of weaknesses and strengths of HEVC.
The thesis then identifies best way of usage GPU in video comparison. Programmed quality evaluation application uses FFMPEG library for video encoding/decoding.
Last part of thesis is focused on testing on multiple GPUs based on collected data from previous analysis.
In conclusion, the thesis argues about efficiency of used technologies in quality evaluation task.

This thesis deals with visualisation of crystal structure in 3D and coding of
files that provide the structure. The whole project should extend JANA2006
program and improve crystal structure encoding of JANA2006 input files. This
paper provides chronological overview of statistical and dictionary methods of
data compresion and gives brief summary of the methods. Furthermore, this
ix
thesis provides information about the enviroments for developing graphics applications
and the technologies for GUI programming. In addition, this paper
provides information about proposed crystal structure encoding and about the
implementaion of the program for crystal structure visualisation in 3D.

This thesis focuses on detection of tennis ball trajectory, tennis court, players and events (hit, land etc.) in broadcast TV video of tennis match. Work assumes start and end of tennis game marked manually and is limited on singles matches. The solution is evaluated in terms of computational time and correctness of the detected events. The solution works on the tested hardware configuration at a rate of 0.68 frames per second. For the event categories of hit and serve differentiated by player and impact differentiated by half of the court, a value of over 85 % for measure F_1 is achieved with maximum distance 5 frames between prediction and true event.

The thesis focuses on an objective evaluation of video quality. It describes
methods for comparison of different version of the same video. I have chosen
several different ways to compare the quality of video and subsequently implemented
these. Using these implemented methods I have also gauged various
codecs with the aim of their relative comparison

This master's thesis analyzes the potential of image compression utilizing non-negative matrix factorization, a mathematical technique which factorizes a matrix containing non-negative elements into two factors. As this technique is only an approximation and not an exact factorization, information loss occurs, rendering the technique a lossy compression technique. Three compression schemes are proposed, based on different color models, and tested, with the best performing one compared to JPEG. The results show that while non-negative matrix factorization does not achieve better compression ratio, the information loss can be lower and the image quality therefore higher. Further improvements of the algorithm which could improve the compression ratio are proposed.

This thesis contains a theoretical overview of DFT, FFT and the Cooley-Tukey algorithm. It contains an analysis of methods of paralelization, properties of parallel architectures and a deeper description of the CUDA development platform used for generel computations on Nvidia graphics cards. Then it describes development of implementation of FFT using the CUDA platform and describes each seperate optimization used. Lastly it contains measurements of comuptaional performence and compares implemented optimizations. It also compares the resulting implementation with existing libraries for DFT.

The thesis describes the design, implementation and measurement of the speed of parallel algorithms for solving systems of linear equations and inequalities. The algorithms are implemented in the programming language C/C++ using OpenMP library to parallelize. Part of the thesis deals with known algorithms that these systems solve.

The main topic of this thesis is the processing of data obtained through powder diffraction. In particular, this thesis focuses on the ITO method. This thesis contains basic crystallographic terminology and that of a related method called powder diffraction. Furthermore, it contains a description of the ITO method, its implementation in the ParaCell program and test results of said implementation.

Main goal of this thesis is to asses CUDA technology for large integer arithmetic. A library similar to GMP was developed, supporting addition, subtraction, multiplication of large integers and also bitwise shift, AND, OR and XOR operations. Multiplication was implemented using fast Fourier transform algorithm. Library was compared to GMP library, mainly from performance and precision perspective.

This thesis concerns about sorting algorithms quicksort, mergesort and radixsort. In first part is described their sequential variant. In next step is implementation paralelized by OpenMP technology. Implementation of this issue is in the next stage altered for effective run under nVidia CUDA. In next chapter I have described my implementation of these algorithms. In the end of this thesis measurement and comparison with other similar implementation is done.

This thesis is focused on the quadratic sieve factorization algorithm and some of its modifications. The theoretical part of this thesis serves as a detailed description of these algorithms. The practical part of the thesis describes their implementation in C++ programming language and parallelization using openMP and MPI libraries. The implemented methods are tested on the faculty server STAR and results are compared together.

The aim of this thesis is to research possibilities of distributing sparse matrix-vector multiplication among multiple processes using MPI and CSR5 storage format.
The result of this research is a C++ library \texttt{dim}, which provides the building blocks for distributed
SpMV using CSR5.
The potential speedup of distributed SpMV is then benchmarked on a conjugate gradient algorithm implementation against
a single-process CSR5 based implementation as well as PETSc based multi-process implementation.

This thesis addresses movie subtitles matching using audio fingerprints. It
consists of analysis of existing subtitle formats and problems that can be encountered
when synchronizing them with video and description of existing
solutions to these problems. The thesis proposes a subtitle format which, instead
of timecode, uses audio fingerprints primarily. To create and compare
these fingerprints, Waveprint technology is used. This thesis also deals with accelerating
the fingerprint creation on the GPU using CUDA. Source codes for
a simple video player, capable of displaying subtitles in the proposed subtitle
format are included. The thesis also contains analysis how to use this format
when displaying subtitles with cinema movie projection simultaneously.