GPU Laboratory (GPULab)

The GPULab is equipped with GPU-accelerated, high-performance computing servers (“GPU” stands for graphics processing unit).

The laboratory is primarily intended for:

The computing servers can also be used for preliminary measurements in the field of high-performance computing.

Equipment

GPU servers

The most important equipment includes the following GPU servers:

  • node “gpu-01”
    • Tesla K40c
    • GeForce GTX 780 Ti
    • GeForce GTX 750
  • node “gpu-02”
    • GeForce RTX 2080 Ti
  • node “livsgpu01”
    • Tesla P100
    • Tesla P100

Publications

Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs

Authors
Šimeček, I.; Langr, D.; Kotenkov, I.
Year
2018
Published
Parallel Processing and Applied Mathematics, Part I. Cham: Springer International Publishing AG, 2018, pp. 47-56. Lecture Notes in Computer Science, vol. 10777. ISSN 0302-9743. ISBN 978-3-319-78023-8.
Type
Proceedings paper
Annotation
One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
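
As a rough illustration of the joint operation discussed in the annotation, the sketch below computes both y = A*x and z = A^T*w in a single pass over a CSR matrix, reusing each stored nonzero once. The data structure and function names are purely illustrative, and the multilayer load-balancing scheme proposed in the paper is not reproduced here.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct CsrMatrix {                      // standard CSR storage
        std::size_t rows, cols;
        std::vector<std::size_t> row_ptr;   // size rows + 1
        std::vector<std::size_t> col_idx;   // size nnz
        std::vector<double> val;            // size nnz
    };

    // Computes y = A*x and z = A^T*w in one sweep over the nonzeros.
    void joint_spmv(const CsrMatrix& A,
                    const std::vector<double>& x, const std::vector<double>& w,
                    std::vector<double>& y, std::vector<double>& z)
    {
        y.assign(A.rows, 0.0);
        z.assign(A.cols, 0.0);
        for (std::size_t i = 0; i < A.rows; ++i) {
            double yi = 0.0;
            for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
                const std::size_t j = A.col_idx[k];
                const double a = A.val[k];
                yi += a * x[j];             // direct product contribution
                z[j] += a * w[i];           // transposed product contribution
            }
            y[i] = yi;
        }
    }

    int main()
    {
        // The 2x3 matrix [[1,0,2],[0,3,0]] in CSR form.
        CsrMatrix A{2, 3, {0, 2, 3}, {0, 2, 1}, {1.0, 2.0, 3.0}};
        std::vector<double> x{1, 1, 1}, w{1, 1}, y, z;
        joint_spmv(A, x, w, y, z);
        std::cout << y[0] << " " << y[1] << " | "
                  << z[0] << " " << z[1] << " " << z[2] << "\n";   // 3 3 | 1 3 2
    }

A multithreaded version additionally has to deal with concurrent updates to z[j] while keeping the threads evenly loaded, which is the kind of problem the paper's multilayer approach targets.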

Parallel solver of large systems of linear inequalities using Fourier-Motzkin elimination

Authors
Year
2016
Published
Computing and Informatics. 2016, 35(6), 1307-1337. ISSN 1335-9150.
Type
Article
Annotation
Fourier-Motzkin elimination is a computationally expensive but powerful method for solving a system of linear inequalities. Such systems arise, e.g., in execution-order analysis for loop nests or in integer linear programming. This paper focuses on the analysis, design, and implementation of a parallel distributed-memory solver for large systems of linear inequalities using the Fourier-Motzkin elimination algorithm. We also measure the speedup of the parallel solver and show that the implementation achieves good scalability.
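
For readers unfamiliar with the method, the following sketch shows a single, sequential Fourier-Motzkin elimination step over inequalities of the form a.x <= b. It is only an illustration of the principle, not the paper's distributed-memory solver, and all identifiers are made up for the example.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Inequality {
        std::vector<double> a;   // coefficients of a.x <= b
        double b;                // right-hand side
    };

    // Eliminates variable k: every inequality with a[k] > 0 is combined with
    // every inequality with a[k] < 0; inequalities with a[k] == 0 are kept.
    // The quadratic growth of the output is what makes the method expensive.
    std::vector<Inequality> eliminate(const std::vector<Inequality>& sys, std::size_t k)
    {
        std::vector<Inequality> pos, neg, out;
        for (const auto& q : sys) {
            if (q.a[k] > 0)      pos.push_back(q);
            else if (q.a[k] < 0) neg.push_back(q);
            else                 out.push_back(q);
        }
        for (const auto& p : pos)
            for (const auto& n : neg) {
                const double cp = -n.a[k], cn = p.a[k];   // positive scaling factors
                Inequality r{std::vector<double>(p.a.size(), 0.0), cp * p.b + cn * n.b};
                for (std::size_t j = 0; j < p.a.size(); ++j)
                    r.a[j] = cp * p.a[j] + cn * n.a[j];   // the k-th entry cancels to 0
                out.push_back(r);
            }
        return out;
    }

    int main()
    {
        // x0 + x1 <= 4 and -x0 + x1 <= 0; eliminating x0 yields 2*x1 <= 4.
        std::vector<Inequality> sys{{{1, 1}, 4}, {{-1, 1}, 0}};
        auto out = eliminate(sys, 0);
        std::cout << out[0].a[1] << "*x1 <= " << out[0].b << "\n";   // prints 2*x1 <= 4
    }

Each pairing of a "positive" and a "negative" inequality is independent of the others, which is what makes the elimination step attractive for parallel and distributed execution.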

All publications

Efficient parallel evaluation of block properties of sparse matrices

Year
2016
Published
Proceedings of the 2016 Federated Conference on Computer Science and Information Systems. New York: Institute of Electrical and Electronics Engineers, 2016. pp. 709-716. ISBN 978-83-60810-90-3.
Type
Proceedings paper
Annotation
Many storage formats for sparse matrices have been developed. The majority of these formats can be parametrized, so an algorithm for finding optimal parameters is crucial. For overall efficiency, it is important to reduce the execution time of this preprocessing. In this paper, we propose a new algorithm for determining the number of nonzero blocks of a given size in a sparse matrix. The proposed algorithm requires a relatively small amount of auxiliary memory. Our approach is based on Morton reordering and bitwise manipulation. We also present a parallel (multithreaded) version and evaluate its performance and space complexity.
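
To make the block-counting idea concrete, here is a minimal sequential sketch: each nonzero (row, col) is mapped to its block, the block coordinates are bit-interleaved into a Morton code, and the number of distinct codes equals the number of nonzero blocks. This only illustrates the principle; the paper's memory-efficient multithreaded algorithm is more involved, and all identifiers below are invented for the example.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Interleaves the bits of x and y ("bitwise manipulation" step).
    static std::uint64_t morton(std::uint32_t x, std::uint32_t y)
    {
        std::uint64_t code = 0;
        for (int b = 0; b < 32; ++b)
            code |= ((std::uint64_t)((x >> b) & 1u) << (2 * b)) |
                    ((std::uint64_t)((y >> b) & 1u) << (2 * b + 1));
        return code;
    }

    // Counts nonzero blocks of size 2^s x 2^s given the coordinates of nonzeros.
    std::size_t count_nonzero_blocks(const std::vector<std::uint32_t>& rows,
                                     const std::vector<std::uint32_t>& cols,
                                     unsigned s)
    {
        std::vector<std::uint64_t> codes(rows.size());
        for (std::size_t i = 0; i < rows.size(); ++i)
            codes[i] = morton(rows[i] >> s, cols[i] >> s);   // code of the enclosing block
        std::sort(codes.begin(), codes.end());
        return std::unique(codes.begin(), codes.end()) - codes.begin();
    }

    int main()
    {
        // Nonzeros at (0,0), (1,1), (5,6), (7,7): two distinct 4x4 blocks.
        std::vector<std::uint32_t> r{0, 1, 5, 7}, c{0, 1, 6, 7};
        std::cout << count_nonzero_blocks(r, c, 2) << "\n";   // prints 2
    }

The Morton (Z-order) code packs both block coordinates into a single integer, so classifying a nonzero reduces to shifts and bit masks.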

Utilization of GPU Acceleration in Le Bail Fitting Method

Authors
Šimeček, I.; Mařík, O.; Jelínek, M.
Year
2015
Published
Romanian Journal of Information Science and Technology (ROMJIST). 2015, 18(2), 182-196. ISSN 1453-8245.
Type
Article
Annotation
The Le Bail fitting method is a procedure used in applied crystallography. It can be employed in several phases of crystal structure determination, and since it is only one step in a more complex process, it needs to be as fast as possible. This article begins with a short explanation of the crystallography terms needed to understand Le Bail fitting, then describes the method itself and the basic principles on which it is based. The parallelization method is then explained, starting with the general approach and followed by the specifics of GPU-accelerated computing, including a short section on optimization. Finally, the achieved results are presented along with a comparison to the sequential implementation and to alternative parallelization approaches.

GPU solver for systems of linear equations with infinite precision

Authors
Year
2016
Published
17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Los Alamitos: IEEE Computer Society, 2016, pp. 121-124. ISBN 978-1-5090-0461-4.
Type
Proceedings paper
Annotation
In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the solver provides an exact solution without any rounding error. Such errors usually stem from the limited precision of floating-point values in their native computer representation. In simplified terms, the solver uses modular arithmetic to transform the original SLE into dozens of integer SLEs that are solved in parallel on the GPU. In the final step, the partial results are combined into the final solution. The use of the GPU plays a key role in terms of performance, because the whole process is computationally very intensive. The GPU solver provides roughly one order of magnitude higher performance than a multithreaded CPU implementation.
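
As a small illustration of the combination step mentioned in the annotation, the sketch below uses the Chinese remainder theorem to reconstruct an integer from its residues modulo several small primes; in the paper this role is played by the per-prime solutions of the integer SLEs computed on the GPU, which are omitted here. The primes, values, and function names are chosen purely for the example.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Modular exponentiation; used to get inverses modulo a prime via Fermat's
    // little theorem (a^(p-2) mod p). Moduli are kept small to avoid overflow.
    static std::uint64_t pow_mod(std::uint64_t a, std::uint64_t e, std::uint64_t m)
    {
        std::uint64_t r = 1 % m;
        a %= m;
        while (e) {
            if (e & 1) r = r * a % m;
            a = a * a % m;
            e >>= 1;
        }
        return r;
    }

    // Garner-style incremental CRT: reconstructs the unique x < p1*p2*...*pk with
    // x mod p_i == residues[i]; the p_i must be distinct primes.
    static std::uint64_t crt(const std::vector<std::uint64_t>& residues,
                             const std::vector<std::uint64_t>& primes)
    {
        std::uint64_t x = residues[0];
        std::uint64_t modulus = primes[0];
        for (std::size_t i = 1; i < primes.size(); ++i) {
            const std::uint64_t p = primes[i];
            std::uint64_t diff = (residues[i] + p - x % p) % p;
            // t = diff * modulus^{-1} (mod p), so that x + modulus*t matches residue i.
            std::uint64_t t = diff * pow_mod(modulus % p, p - 2, p) % p;
            x += modulus * t;
            modulus *= p;
        }
        return x;
    }

    int main()
    {
        // A "solution component" 123456789 seen only through its residues.
        std::vector<std::uint64_t> primes{10007, 10009, 10037};
        std::vector<std::uint64_t> res;
        for (auto p : primes) res.push_back(123456789ULL % p);
        std::cout << crt(res, primes) << "\n";   // prints 123456789
    }

Working modulo word-sized primes keeps every intermediate operation exact, which is what removes the floating-point rounding error from the computation.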

Contact person

Where to find us

GPU Laboratory
Department of Computer Systems
Faculty of Information Technology
Czech Technical University in Prague

Room TH:A-1313 (Building A, 13th floor)
Thákurova 7
Prague 6 – Dejvice
160 00

The person responsible for the content of this page: doc. Ing. Štěpán Starosta, Ph.D.