GPU Laboratory (GPULab)

The GPULab is equipped with GPU-accelerated, high-performance computing servers (“GPU” stands for graphics processing unit).

The laboratory is primarily intended for:

The computing servers can also be used for preliminary measurements in the field of high-performance computing.

Equipment

GPU servers

The most important equipment includes the following GPU servers:

  • node “gpu-01”
    • Tesla K40c
    • GeForce GTX 780 Ti
    • GeForce GTX 750
  • node “gpu-02”
    • GeForce RTX 2080 Ti
  • node “livsgpu01”
    • Tesla P100
    • Tesla P100

Publications

Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs

Authors
Šimeček, I.; Langr, D.; Kotenkov, I.
Year
2018
Published
Parallel Processing and Applied Mathematics, Part I. Cham: Springer International Publishing AG, 2018, pp. 47-56. Lecture Notes in Computer Science, vol. 10777. ISSN 0302-9743. ISBN 978-3-319-78023-8.
Type
Proceedings paper
Annotation
One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
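
As a rough illustration of the joint operation discussed in the annotation, the sketch below computes both y = A*x and z = A^T*w in a single pass over a CSR matrix, reusing each stored nonzero once. The data structure and function names are purely illustrative, and the multilayer load-balancing scheme proposed in the paper is not reproduced here.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct CsrMatrix {                      // standard CSR storage
        std::size_t rows, cols;
        std::vector<std::size_t> row_ptr;   // size rows + 1
        std::vector<std::size_t> col_idx;   // size nnz
        std::vector<double> val;            // size nnz
    };

    // Computes y = A*x and z = A^T*w in one sweep over the nonzeros.
    void joint_spmv(const CsrMatrix& A,
                    const std::vector<double>& x, const std::vector<double>& w,
                    std::vector<double>& y, std::vector<double>& z)
    {
        y.assign(A.rows, 0.0);
        z.assign(A.cols, 0.0);
        for (std::size_t i = 0; i < A.rows; ++i) {
            double yi = 0.0;
            for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
                const std::size_t j = A.col_idx[k];
                const double a = A.val[k];
                yi += a * x[j];             // direct product contribution
                z[j] += a * w[i];           // transposed product contribution
            }
            y[i] = yi;
        }
    }

    int main()
    {
        // The 2x3 matrix [[1,0,2],[0,3,0]] in CSR form.
        CsrMatrix A{2, 3, {0, 2, 3}, {0, 2, 1}, {1.0, 2.0, 3.0}};
        std::vector<double> x{1, 1, 1}, w{1, 1}, y, z;
        joint_spmv(A, x, w, y, z);
        std::cout << y[0] << " " << y[1] << " | "
                  << z[0] << " " << z[1] << " " << z[2] << "\n";   // 3 3 | 1 3 2
    }

A multithreaded version additionally has to deal with concurrent updates to z[j] while keeping the threads evenly loaded, which is the kind of problem the paper's multilayer approach targets.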

Parallel solver of large systems of linear inequalities using Fourier-Motzkin elimination

Authors
Year
2016
Published
Computing and Informatics. 2016, 35(6), 1307-1337. ISSN 1335-9150.
Type
Article
Annotation
Fourier-Motzkin elimination is a computationally expensive but powerful method for solving a system of linear inequalities. Such systems arise, e.g., in execution-order analysis for loop nests or in integer linear programming. This paper focuses on the analysis, design, and implementation of a parallel distributed-memory solver for large systems of linear inequalities using the Fourier-Motzkin elimination algorithm. We also measure the speedup of the parallel solver and show that the implementation achieves good scalability.
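
For readers unfamiliar with the method, the following sketch shows a single, sequential Fourier-Motzkin elimination step over inequalities of the form a.x <= b. It is only an illustration of the principle, not the paper's distributed-memory solver, and all identifiers are made up for the example.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Inequality {
        std::vector<double> a;   // coefficients of a.x <= b
        double b;                // right-hand side
    };

    // Eliminates variable k: every inequality with a[k] > 0 is combined with
    // every inequality with a[k] < 0; inequalities with a[k] == 0 are kept.
    // The quadratic growth of the output is what makes the method expensive.
    std::vector<Inequality> eliminate(const std::vector<Inequality>& sys, std::size_t k)
    {
        std::vector<Inequality> pos, neg, out;
        for (const auto& q : sys) {
            if (q.a[k] > 0)      pos.push_back(q);
            else if (q.a[k] < 0) neg.push_back(q);
            else                 out.push_back(q);
        }
        for (const auto& p : pos)
            for (const auto& n : neg) {
                const double cp = -n.a[k], cn = p.a[k];   // positive scaling factors
                Inequality r{std::vector<double>(p.a.size(), 0.0), cp * p.b + cn * n.b};
                for (std::size_t j = 0; j < p.a.size(); ++j)
                    r.a[j] = cp * p.a[j] + cn * n.a[j];   // the k-th entry cancels to 0
                out.push_back(r);
            }
        return out;
    }

    int main()
    {
        // x0 + x1 <= 4 and -x0 + x1 <= 0; eliminating x0 yields 2*x1 <= 4.
        std::vector<Inequality> sys{{{1, 1}, 4}, {{-1, 1}, 0}};
        auto out = eliminate(sys, 0);
        std::cout << out[0].a[1] << "*x1 <= " << out[0].b << "\n";   // prints 2*x1 <= 4
    }

Each pairing of a "positive" and a "negative" inequality is independent of the others, which is what makes the elimination step attractive for parallel and distributed execution.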

All publications

Efficient parallel evaluation of block properties of sparse matrices

Year
2016
Published
Proceedings of the 2016 Federated Conference on Computer Science and Information Systems. New York: Institute of Electrical and Electronics Engineers, 2016. pp. 709-716. ISBN 978-83-60810-90-3.
Type
Proceedings paper
Annotation
Many storage formats for sparse matrices have been developed. The majority of these formats can be parametrized, so an algorithm for finding optimal parameters is crucial. For overall efficiency, it is important to reduce the execution time of this preprocessing. In this paper, we propose a new algorithm for determining the number of nonzero blocks of a given size in a sparse matrix. The proposed algorithm requires a relatively small amount of auxiliary memory. Our approach is based on Morton reordering and bitwise manipulation. We also present a parallel (multithreaded) version and evaluate its performance and space complexity.
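
To make the block-counting idea concrete, here is a minimal sequential sketch: each nonzero (row, col) is mapped to its block, the block coordinates are bit-interleaved into a Morton code, and the number of distinct codes equals the number of nonzero blocks. This only illustrates the principle; the paper's memory-efficient multithreaded algorithm is more involved, and all identifiers below are invented for the example.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Interleaves the bits of x and y ("bitwise manipulation" step).
    static std::uint64_t morton(std::uint32_t x, std::uint32_t y)
    {
        std::uint64_t code = 0;
        for (int b = 0; b < 32; ++b)
            code |= ((std::uint64_t)((x >> b) & 1u) << (2 * b)) |
                    ((std::uint64_t)((y >> b) & 1u) << (2 * b + 1));
        return code;
    }

    // Counts nonzero blocks of size 2^s x 2^s given the coordinates of nonzeros.
    std::size_t count_nonzero_blocks(const std::vector<std::uint32_t>& rows,
                                     const std::vector<std::uint32_t>& cols,
                                     unsigned s)
    {
        std::vector<std::uint64_t> codes(rows.size());
        for (std::size_t i = 0; i < rows.size(); ++i)
            codes[i] = morton(rows[i] >> s, cols[i] >> s);   // code of the enclosing block
        std::sort(codes.begin(), codes.end());
        return std::unique(codes.begin(), codes.end()) - codes.begin();
    }

    int main()
    {
        // Nonzeros at (0,0), (1,1), (5,6), (7,7): two distinct 4x4 blocks.
        std::vector<std::uint32_t> r{0, 1, 5, 7}, c{0, 1, 6, 7};
        std::cout << count_nonzero_blocks(r, c, 2) << "\n";   // prints 2
    }

The Morton (Z-order) code packs both block coordinates into a single integer, so classifying a nonzero reduces to shifts and bit masks.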

Utilization of GPU Acceleration in Le Bail Fitting Method

Authors
Šimeček, I.; Mařík, O.; Jelínek, M.
Year
2015
Published
Romanian Journal of Information Science and Technology (ROMJIST). 2015, 18(2), 182-196. ISSN 1453-8245.
Type
Article
Annotation
The Le Bail fitting method is a procedure used in applied crystallography. It can be employed in several phases of crystal structure determination, and since it is only one step in a more complex process, it needs to be as fast as possible. This article begins with a short explanation of the crystallography terms needed to understand Le Bail fitting, then describes the method itself and the basic principles on which it is based. The parallelization method is then explained, starting with the general approach and followed by the specifics of GPU-accelerated computing, including a short section on optimization. Finally, the achieved results are presented along with a comparison to the sequential implementation and to alternative parallelization approaches.

GPU solver for systems of linear equations with infinite precision

Authors
Year
2016
Published
17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Los Alamitos: IEEE Computer Society, 2016, pp. 121-124. ISBN 978-1-5090-0461-4.
Type
Proceedings paper
Annotation
In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the solver provides an exact solution without any rounding error. Such errors usually stem from the limited precision of floating-point values in their native computer representation. In simplified terms, the solver uses modular arithmetic to transform the original SLE into dozens of integer SLEs that are solved in parallel on the GPU. In the final step, the partial results are combined into the final solution. The use of the GPU plays a key role in terms of performance, because the whole process is computationally very intensive. The GPU solver provides roughly one order of magnitude higher performance than a multithreaded CPU implementation.
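
As a small illustration of the combination step mentioned in the annotation, the sketch below uses the Chinese remainder theorem to reconstruct an integer from its residues modulo several small primes; in the paper this role is played by the per-prime solutions of the integer SLEs computed on the GPU, which are omitted here. The primes, values, and function names are chosen purely for the example.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Modular exponentiation; used to get inverses modulo a prime via Fermat's
    // little theorem (a^(p-2) mod p). Moduli are kept small to avoid overflow.
    static std::uint64_t pow_mod(std::uint64_t a, std::uint64_t e, std::uint64_t m)
    {
        std::uint64_t r = 1 % m;
        a %= m;
        while (e) {
            if (e & 1) r = r * a % m;
            a = a * a % m;
            e >>= 1;
        }
        return r;
    }

    // Garner-style incremental CRT: reconstructs the unique x < p1*p2*...*pk with
    // x mod p_i == residues[i]; the p_i must be distinct primes.
    static std::uint64_t crt(const std::vector<std::uint64_t>& residues,
                             const std::vector<std::uint64_t>& primes)
    {
        std::uint64_t x = residues[0];
        std::uint64_t modulus = primes[0];
        for (std::size_t i = 1; i < primes.size(); ++i) {
            const std::uint64_t p = primes[i];
            std::uint64_t diff = (residues[i] + p - x % p) % p;
            // t = diff * modulus^{-1} (mod p), so that x + modulus*t matches residue i.
            std::uint64_t t = diff * pow_mod(modulus % p, p - 2, p) % p;
            x += modulus * t;
            modulus *= p;
        }
        return x;
    }

    int main()
    {
        // A "solution component" 123456789 seen only through its residues.
        std::vector<std::uint64_t> primes{10007, 10009, 10037};
        std::vector<std::uint64_t> res;
        for (auto p : primes) res.push_back(123456789ULL % p);
        std::cout << crt(res, primes) << "\n";   // prints 123456789
    }

Working modulo word-sized primes keeps every intermediate operation exact, which is what removes the floating-point rounding error from the computation.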

Contact person

Where to find us

GPU Laboratory
Department of Computer Systems
Faculty of Information Technology
Czech Technical University in Prague

Room TH:A-1313 (Building A, 13th floor)
Thákurova 7
Prague 6 – Dejvice
160 00

The person responsible for the content of this page: doc. Ing. Štěpán Starosta, Ph.D.