prof. Ing. Pavel Tvrdík, CSc.

Vedoucí Katedry počítačových systémů

Publikace

Hierarchical Semi-Sparse Cubes - parallel framework for storing multi-modal big data in HDF5

Autoři
Nádvorník, J.; Škoda, P.; Tvrdík, P.
Rok
2023
Publikováno
IEEE Access. 2023, 119876-119897. ISSN 2169-3536.
Typ
Článek
Anotace
Since Moore‘s law applies also to data detectors, the volume of data collected in astronomy doubles approximately every year. A prime example is the upcoming Square Kilometer Array (SKA) instrument that will produce approximately 8.5 Exabytes over the first 15 years of service, starting in the year 2027. Storage capacities for these data have grown as well, and primary analytical tools have also kept up. However, the tools for combining big data from several such instruments still lag behind. Having the ability to easily combine big data is crucial for inferring new knowledge about the universe from the correlations and not only finding interesting information in these huge datasets but also their combinations. In this article, we present a revised version of the Hierarchical Semi-Sparse Cube (HiSS-Cube) framework. It aims to provide highly parallel processing of combined multi-modal multi-dimensional big data. The main contributions of this study are as follows: 1) Highly parallel construction of a database built on top of the HDF5 framework. This database supports parallel queries. 2) Design of a database index on top of HDF5 that can be easily constructed in parallel. 3) Support of efficient multi-modal big data combinations. We tested the scalability and efficiency on big astronomical spectroscopic and photometric data obtained from the Sloan Digital Sky Survey. The performance of HiSS-Cube is bounded by the I/O bandwidth and I/O operations per second of the underlying parallel file system, and it scales linearly with the number of I/O nodes.

Prototype of Interactive Visualisation Tool for Bayesian Active Deep Learning

Autoři
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Rok
2023
Publikováno
Astronomy Data Analysis Software and Systems XXXI. San Francisco: Astronomical Society of the Pacific, 2023.
Typ
Stať ve sborníku
Anotace
n the era of big data in astronomy, we need to develop methods to analyse the data. One such method is Bayesian active deep learning (synergy of Bayesian convolutional neural networks and active learning). To improve the method’s performance, we have developed a prototype of an interactive visualisation tool for a selection of an informative (contains data with high predictive uncertainty, is diverse, but not redundant) data subsample for labelling by a human expert. The tool takes as input a sample of data with the highest predictive uncertainty. These data are projected to 2-D with a dimensionality reduction technique. We visualise the projected data in an interactive scatter plot and allow a human expert to label a selected subsample of data. With this tool, she or he can select a correct subsample with all the previously mentioned characteristics. This should lower the total amount of data labelled because the Bayesian model’s performance will improve faster than when the data are selected automatically.

Enhancing Reactive Ad Hoc Routing Protocols with Trust

Rok
2022
Publikováno
Future Internet. 2022, 14(1), ISSN 1999-5903.
Typ
Článek
Anotace
In wireless ad hoc networks, security and communication challenges are frequently addressed by deploying a trust mechanism. A number of approaches for evaluating trust of ad hoc network nodes have been proposed, including the one that uses neural networks. We proposed to use packet delivery ratios as input to the neural network. In this article, we present a new method, called TARA (Trust-Aware Reactive Ad Hoc routing), to incorporate node trusts into reactive ad hoc routing protocols. The novelty of the TARA method is that it does not require changes to the routing protocol itself. Instead, it influences the routing choice from outside by delaying the route request messages of untrusted nodes. The performance of the method was evaluated on the use case of sensor nodes sending data to a sink node. The experiments showed that the method improves the packet delivery ratio in the network by about 70%. Performance analysis of the TARA method provided recommendations for its application in a particular ad hoc network.

Spectroscopic redshift determination with Bayesian convolutional networks

Autoři
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Rok
2022
Publikováno
Astronomy and Computing. 2022, 40 ISSN 2213-1337.
Typ
Článek
Anotace
Astronomy is facing large amounts of data, so astronomers have to rely on automated methods to analyse them. However, automated methods might produce incorrect values. Therefore, we need to develop different automated methods and perform a consistency check to identify them. If there is a lot of labelled data, convolutional neural networks are a powerful method for any task. We illustrate the consistency check on spectroscopic redshift determination with a method based on a Bayesian convolutional neural network inspired by VGG networks. The method provides predictive uncertainties that enable us to (1.) determine unusual or problematic spectra for visual inspection; (2.) do thresholding that allows us to balance between the error of redshift predictions and coverage. We used the 12th Sloan Digital Sky Survey quasar superset as the training set for the method. We evaluated its generalisation capability on about three-quarters of a million spectra from the 16th quasar superset of the same survey. On the 16th quasar superset, the method performs better in terms of the root-mean-squared error than the most used template fitting method. Using redshift predictions of the proposed method, we identified spectra with incorrectly determined redshifts that are unrecognised quasars or were misclassified as them.

HiSS-Cube: A scalable framework for Hierarchical Semi-Sparse Cubes preserving uncertainties

Autoři
Nádvorník, J.; Škoda, P.; Tvrdík, P.
Rok
2021
Publikováno
Astronomy and Computing. 2021, 36 ISSN 2213-1337.
Typ
Článek
Anotace
A wide variety of approaches are available for big data cube visualization and analysis. However, few exploit the power of array databases and none preserve the scientific uncertainties in measurements when constructing lower resolutions. In machine learning applications, we often need to rapidly search data for regions of interest and then focus on these areas, but without having to retrain the model every time we change the resolution. However, the reliable verification of these areas also requires details of the accuracy of the measured values. In this study, we developed a new software infrastructure called Hierarchical Semi-Sparse Cube (HiSS-Cube) based on Hierarchical Data Format version 5. HiSS-Cube enables visualization and machine learning using combined heterogeneous data and it was designed to be scalable for big data. HiSS-Cube allows data from multiple domains (imaging, spectral, and timeseries data) to be combined and the construction of a multi-resolution semi-sparse data cube that preserves the uncertainties of scientific measurement at all resolutions. The functionality of HiSSCube was verified based on a subset of the Sloan Digital Sky Survey Stripe 82 survey. We compared the times and volumes for visualizations and machine learning data exported to HiSS-Cube and the original format (FITS). Using these data, we demonstrated that HiSS-Cube is faster by several orders of magnitude. HiSS-Cube supports export to the VOTable format and it is compatible with common Virtual Observatory tools. The source code for our prototype HiSS-Cube is available from GitHub and the data are available from Zenodo.

Transfer Learning in Large Spectroscopic Surveys

Autoři
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Rok
2021
Publikováno
Astronomical Data Analysis Software and Systems XXX. San Francisco: Astronomical Society of the Pacific, 2021. p. 235-238. Astronomical Society of the Pacific Conference Series. vol. 532. ISBN 978-1-58381-951-7.
Typ
Stať ve sborníku
Anotace
Transfer learning is a machine learning method that can reuse knowledge across spectroscopic archives with different distributions of observations. We applied transfer learning based on a convolutional neural network to spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope and Sloan Digital Sky Survey archives. Taking advantage of known quasars in LAMOST DR5 version 3, we wanted to discover yet unseen quasars in SDSS DR14. Our transfer learning approach reaches 99.6% precision and 98.9% recall. We found examples of quasars previously classified as stars.

Active deep learning method for the discovery of objects of interest in large spectroscopic surveys

Autoři
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Rok
2020
Publikováno
Astronomy & Astrophysics. 2020, 643 ISSN 1432-0746.
Typ
Článek
Anotace
Context. Current archives of the LAMOST telescope contain millions of pipeline-processed spectra that have probably never been seen by human eyes. Most of the rare objects with interesting physical properties, however, can only be identified by visual analysis of their characteristic spectral features. A proper combination of interactive visualisation with modern machine learning techniques opens new ways to discover such objects. Aims. We apply active learning classification methods supported by deep convolutional neural networks to automatically identify complex emission-line shapes in multi-million spectra archives. Methods. We used the pool-based uncertainty sampling active learning method driven by a custom-designed deep convolutional neural network with 12 layers. The architecture of the network was inspired by VGGNet, AlexNet, and ZFNet, but it was adapted for operating on one-dimensional feature vectors. The unlabelled pool set is represented by 4.1 million spectra from the LAMOST data release 2 survey. The initial training of the network was performed on a labelled set of about 13 000 spectra obtained in the 400 Å wide region around Hα by the 2 m Perek telescope of the Ondˇrejov observatory, which mostly contains spectra of Be and related early-type stars. The differences between the Ondˇrejov intermediate-resolution and the LAMOST low-resolution spectrographs were compensated for by Gaussian blurring and wavelength conversion. Results. After several iterations, the network was able to successfully identify emission-line stars with an error smaller than 6.5%. Using the technology of the Virtual Observatory to visualise the results, we discovered 1 013 spectra of 948 new candidates of emission-line objects in addition to 664 spectra of 549 objects that are listed in SIMBAD and 2 644 spectra of 2 291 objects identified in an earlier paper of a Chinese group led by Wen Hou. The most interesting objects with unusual spectral properties are discussed in detail.

Space-Efficient k-d Tree-Based Storage Format for Sparse Tensors

Autoři
Rok
2020
Publikováno
Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing. New York: Association for Computing Machinery, 2020. p. 29-33. HPDC ’20. ISBN 978-1-4503-7052-3.
Typ
Stať ve sborníku
Anotace
Computations with tensors are widespread in many scientific areas. Usually, the used tensors are very large but sparse, i.e., the vast majority of their elements are zero. The space complexity of sparse tensor storage formats varies significantly. For overall efficiency, it is important to reduce the execution time and additional space requirements of the initial preprocessing (i.e., converting tensors from common storage formats to the given internal format). The main contributions of this paper are threefold. Firstly, we present a new storage format for sparse tensors, which we call the succinct k-d tree-based tensor (SKTB) format. We compare the space complexity of common tensor storage formats and of the SKTB format and demonstrate the viability of using a tree as a data structurefor sparse tensors. Secondly, we present a parallel space-efficient algorithm for converting tensors to the SKTB format. Thirdly, we demonstrate the computational efficiency of the proposed format in sparse tensor-vector multiplication.

VO-supported Active Deep Learning as a New Methodology for the Discovery of Objects of Interest in Big Surveys

Autoři
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Rok
2020
Publikováno
Astronomical Data Analysis Software and Systems XXIX. San Francisco: Astronomical Society of the Pacific, 2020. p. 163-166. Astronomical Society of the Pacific Conference Series. vol. 527. ISBN 978-1-58381-941-8.
Typ
Stať ve sborníku
Anotace
Deep neural networks have been proved a very successful method of supervised learning in several research fields. To perform well, they require a massive amount of labelled data, which is challenging to get from most astronomical surveys. To overcome this limitation, we have developed a novel active deep learning method. It is based on an iterative training of a deep network followed by relabelling of a small sample according to a qualified decision of an oracle (usually a human expert). To maximise the scientific return, the oracle brings to the decision the domain knowledge not limited only to the data learned by the network. By combining some external resources to extract the key information by an expert in a field, much more relevant labels are assigned. Setup of an active deep learning platform thus requires incorporation of a Virtual Observatory (VO) client infrastructure as an integral part of a machine learning experiment, which is quite different from current practices. As proof of concept, we demonstrate the efficiency of our method for discovery of new emission-line stars in a multimillion spectra archive of the LAMOST DR2 survey.

Efficient Algorithm for Representations of U(3) in U(N)

Autoři
Langr, D.; Dytrych, T.; Draayer, J.P.; Launey, K.D.; Tvrdík, P.
Rok
2019
Publikováno
Computer Physics Communications. 2019, 244 442-447. ISSN 0010-4655.
Typ
Článek
Anotace
An efficient algorithm for enumerating representations of U(3) that occur in a representation of the unitary group U(N) is introduced. The algorithm is applicable to U(N) representations associated with a system of identical fermions (protons, neutrons, electrons, etc.) distributed among the N=(eta+1)(eta+2)/2 degenerate eigenstates of the etath level of the three-dimensional harmonic oscillator. A C++ implementation of the algorithm is provided and its performance evaluated. The implementation can employ OpenMP threading for use in parallel applications.

Application of neural networks for decision making and evaluation of trust in ad-hoc networks

Rok
2017
Publikováno
13th International Wireless Communications and Mobile Computing Conference(IWCMC). IEEE, 2017. p. 371-377. ISSN 2376-6492. ISBN 978-1-5090-4372-9.
Typ
Stať ve sborníku
Anotace
In this paper, we demonstrate that neural networks (NNs) are capable of trust estimation and evaluation in ad-hoc networks. The concept of trust in distributed systems arose from the notion of social trust. By the trust problem, we understand the problem of measuring the confidence in the fact that individual nodes behave correctly. We model trust in ad-hoc networks using the packet delivery ratio (PDR) metric. We have developed a method to apply NNs for solving the trust problem in ad-hoc networks. We have conducted a series of simulation experiments and measured the quality of our new method. The results show in average 98% accuracy of the classification and 94% of the regression problem. An important contribution of our research is a verification of the hypothesis that synthetic generation of ad-hoc network traffic in a simulator is sufficient for training of a NN that is then capable to accurately estimate trust in an ad-hoc network.

Blockchain-Based Multi-Level Scoring System for P2P Clusters

Rok
2017
Publikováno
2017 46th International Conference on Parallel Processing Workshops (ICPPW). Piscataway (New Jersey): IEEE, 2017. p. 301-308. ISSN 1530-2016. ISBN 9781538610442.
Typ
Stať ve sborníku
Anotace
Peer-to-peer (P2P) design offers many benefits over a single-master or a multi-master architecture in terms of scalability, reliability and independence. The Clondike project is being converted from a grid computing system into a universal non-dedicated P2P cluster where every participating node can benefit from its membership in the cluster. But different design requires different types of algorithms in order to guarantee the same functionality. One of the challenges in P2P design is a fair scheduling and a general protection of the whole cluster against abusive or malfunctioning nodes. Algorithms used in a single-master or multi-master clusters do not work anymore. We have designed a multi-level system based on one of the main principles of cryptocurrencies to achieve a distributed master-less reputation rating across the cluster. This paper discusses proposed cluster multi-level reputation system and its parameters. The system is evaluated on a Clondike cluster on an experimental measurement with abusive nodes involved.

Using Bootstraping Principles of Contemporary P2P File-Sharing Protocols in Large-Scale Grid Computing Systems

Rok
2017
Publikováno
2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Piscataway: IEEE, 2017. p. 214-218. ISSN 1066-6192. ISBN 978-1-5090-6058-0.
Typ
Stať ve sborníku
Anotace
As contemporary distributed systems reach their scalability limits, their architects search ways to push their boundaries further. One of the approaches is to convert a given distributed system into a peer-to-peer (P2P) structure. This approach causes more overhead compared to a single-master or multi-master architecture, but on larger scales (if properly designed) does not reach any boundaries. The Clondike project started as a grid computing system, but now it aims to create a universal non-dedicated P2P cluster where every participating node can benefit from its membership in the cluster. The P2P approach eliminates the single-point-of-failure problems and high availability is guaranteed by design. So far in the Clondike architecture, a bootstrap protocol was based on broadcasts, which became a limiting factor for the scalability of the whole cluster. We have compared existing P2P communication protocols used in a variety of P2P systems and contemporary P2P file-sharing protocols in order to find a suitable solution for Clondike. We have chosen Kademlia protocol and proven its logaritmic scalability in the Clondike environment. This paper presents results of our analysis and implementation and a series of measurement results comparing the old and new communication protocols.

AQsort: Scalable Multi-Array In-Place Sorting with OpenMP

Rok
2016
Publikováno
Scalable Computing: Practice and Experience. 2016, 17(4), 369-391. ISSN 1895-1767.
Typ
Článek
Anotace
A new multi-threaded variant of the quicksort algorithm called AQsort and its C++/OpenMP implementation are presented. AQsort operates in place and was primarily designed for high-performance computing (HPC) runtime environments. It can work with multiple arrays at once; such a functionality is frequently required in HPC and cannot be accomplished with standard C pointer-based or C++ iterator-based approach. An extensive study is provided that evaluates AQsort experimentally and compares its performance with modern multi-threaded implementations of in-place and out-of-place sorting algorithms based on OpenMP, Cilk Plus, and Intel TBB. The measurements were conducted on several leading-edge HPC architectures, namely Cray XE6 nodes with AMD Bulldozer CPUs, Cray XC40 nodes with Intel Hasswell CPUs, IBM BlueGene/Q nodes, and Intel Xeon Phi coprocessors. The obtained results show that AQsort provides good scalability and sorting performance generally comparable to its competitors. In particular cases, the performance of AQsort may be slightly lower, which is the price for its universality and ability to work with substantially larger amounts of data.

Evaluation Criteria for Sparse Matrix Storage Formats

Rok
2016
Publikováno
IEEE Transactions on Parallel and Distributed Systems. 2016, 27(2), 428-440. ISSN 1045-9219.
Typ
Článek
Anotace
When authors present new storage formats for sparse matrices, they usually focus mainly on a single evaluation criterion, which is the performance of sparse matrix-vector multiplication (SpMV) in FLOPS. Though such an evaluation is essential, it does not allow to directly compare the presented format with its competitors. Moreover, in case that matrices are within an HPC application constructed in different formats, this criterion alone is not sufficient for the key decision whether or not to convert them into the presented format for the SpMV-based application phase. We establish ten evaluation criteria for sparse matrix storage formats, discuss their advantages and disadvantages, and provide general suggestions for format authors/evaluators to make their work more valuable for the HPC community.

A new code transformation technique for nested loops

Rok
2014
Publikováno
COMSIS - Computer Science and Information Systems. 2014, 11(4), 1381-1416. ISSN 1820-0214.
Typ
Článek
Anotace
For good performance of every computer program, good cache utilization is crucial. In numerical linear algebra libraries, good cache utilization is achieved by explicit loop restructuring (mainly loop blocking), but it requires a complicated memory pattern behavior analysis. In this paper, we describe a new source code transformation called dynamic loop reversal that can increase temporal and spatial locality. We also describe a formal method for predicting cache behavior and evaluate results of the model accuracy by the measurements on a cache monitor. The comparisons of the numbers of measured cache misses and the numbers of cache misses estimated by the model indicate that the model is relatively accurate and can be used in practice.

Algorithm 947: Paraperm: Parallel Generation of Random Permutations with MPI

Autoři
Langr, D.; Tvrdík, P.; Dytrych, T.D.; Draayer, J.P.
Rok
2014
Publikováno
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE. 2014, 41(1), 1-26. ISSN 0098-3500.
Typ
Článek
Anotace
An algorithm for parallel generation of a random permutation of a large set of distinct integers is presented. This algorithm is designed for massively parallel systems with distributed memory architectures and the MPI-based runtime environments. Scalability of the algorithm is analyzed according to the memory and communication requirements. An implementation of the algorithm in a form of a software library based on the C++ programming language and the MPI application programming interface is further provided. Finally, performed experiments are described and their results discussed. The biggest of these experiments resulted in a generation of a random permutation of 2^41 integers in slightly more than four minutes using 131072 CPU cores.

Parallel Data Acquisition for Visualization of Very Large Sparse Matrices

Autoři
Rok
2014
Publikováno
15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Los Alamitos: IEEE Computer Society, 2014. pp. 336-343. ISBN 978-1-4799-3035-7.
Typ
Stať ve sborníku
Anotace
The problem of visualization of very large sparse matrices emerging on massively parallel computer systems is identified and a new method along with an accompanying algorithm for parallel acquisition of visualization data for such matrices are presented. The proposed method is based on downsampling a matrix into blocks for which the desired visualization data are saved into a file. This file is then supposed to be downloaded and processed into a final image on a personal computer. Experimental results for the evaluation of the performance and scalability of the proposed algorithm are further provided and discussed.

Space Efficient Formats for Structure of Sparse Matrices Based on Tree Structures

Rok
2014
Publikováno
15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Los Alamitos: IEEE Computer Society, 2014. pp. 344-351. ISBN 978-1-4799-3035-7.
Typ
Stať ve sborníku
Anotace
Very large sparse matrices are often processed on massively parallel computer systems with distributed memory architectures consisting of tens or hundreds of thousands of processor cores. The problem occurs when we want or need to load/store these matrices from/to a distributed file system. This paper deals with the design of new formats for storing very large sparse matrices suitable for parallel I/O systems. The first one is based on arithmetic coding and the second one is based on binary tree format. We compare the space complexity of common storage formats and our new formats and prove that the latter are considerably more space efficient.

Tree-based Space Efficient Formats for Storing the Structure of Sparse Matrices

Rok
2014
Publikováno
Scalable Computing: Practice and Experience. 2014, 15(1), 1-20. ISSN 1895-1767.
Typ
Článek
Anotace
Sparse storage formats describe a way how sparse matrices are stored in a computer memory. Extensive research has been conducted about these formats in the context of performance optimization of the sparse matrix-vector multiplication algorithms, but memory efficient formats for storing sparse matrices are still under development, since the commonly used storage formats (like COO or CSR) are not sufficient. In this paper, we propose and evaluate new storage formats for sparse matrices that minimize the space complexity of information about matrix structure. The first one is based on arithmetic coding and the second one is based on binary tree format. We compare the space complexity of common storage formats and our new formats and prove that the latter are considerably more space efficient.

Dynamic loop reversal - The new code transformation technique

Rok
2013
Publikováno
Proceedings of the 2013 Federated Conference on Computer Science and Information Systems. Los Alamitos: IEEE Computer Society, 2013. pp. 1587-1594. ISSN 2325-0348. ISBN 978-1-4673-4471-5.
Typ
Stať ve sborníku
Anotace
In this paper, we describe a new source code transformation called dynamic loop reversal that can increase temporal and spatial locality. We also describe a formal method for predicting the cache behaviour and evaluation results of the accuracy of the model by measurements on a cache monitor. The comparisons of the numbers of measured cache misses and the numbers of cache misses estimated by the model indicate that model is relatively accurate and can be used in practice.

Storing sparse matrices to files in the adaptive-blocking hierarchical storage format

Rok
2013
Publikováno
Proceedings of the 2013 Federated Conference on Computer Science and Information Systems. Los Alamitos: IEEE Computer Society, 2013. pp. 479-486. ISSN 2325-0348. ISBN 978-1-4673-4471-5.
Typ
Stať ve sborníku
Anotace
When there is a need to store a sparse matrix into a file system, is it worth to convert it first into some space-efficient storage format? This paper tries to answer such question for the adaptive-blocking hierarchical storage format (ABHSF), provided that the matrix is present in memory either in the coordinate (COO) or in the compressed sparse row (CSR) storage format. The conversion algorithms from COO and CSR to ABHSF are introduced and the results of performed experiments are then presented and discussed.

Adaptive-Blocking Hierarchical Storage Format for Sparse Matrices

Autoři
Langr, D.; Šimeček, I.; Tvrdík, P.; Dytrych, T.; Draayer, J.P.
Rok
2012
Publikováno
Federated Conference on Computer Science and Information Systems (FedCSIS 2012). New York: IEEE, 2012. pp. 545-551. ISBN 978-1-4673-0708-6.
Typ
Stať ve sborníku
Anotace
Hierarchical storage formats (HSFs) can significantly reduce the space complexity of sparse matrices. They vary in storage schemes that they use for blocks and for block matrices. However, the current HSFs prescribe a fixed storage scheme for all blocks, which is not always space-optimal.We show that, generally, different storage schemes are space-optimal for different blocks. We further propose a new HSF that is based on this approach and compare its space complexity with current HSFs for benchmark matrices arising from different application areas.

Different Approaches to Distributed Compilation

Rok
2012
Publikováno
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International. Los Alamitos, CA: IEEE Computer Soc., 2012. pp. 1128-1134. ISBN 978-1-4673-0974-5.
Typ
Stať ve sborníku
Anotace
Source code compiling is a non-trivial task that requires many computing resources. As a software project grows, its build time increases and debugging on a single computer becomes more and more time consuming task. An obvious solution would be a dedicated cluster acting as a build farm, where developers can send their requests. But in most cases, this solution has a very low utilization of available computing resources which makes it very ineffective. Therefore, we have focused on non-dedicated clusters to perform distributed compilation, where we could use users' computers as nodes of a build farm. We compare two different approaches: distcc, which is an open-source program to distribute compilation of C/C++ code between several computers on a network and Clondike, which is a universal peer-to-peer cluster that is being developed at the Czech Technical University in Prague. A very complex task able to test deeply both systems is a compilation of a Linux Kernel with many config options. We have run this task on a cluster with up to 20 computers and have measured computing times and CPU loads. In this paper, we will present the results of this experiment that indicate the scalability and utilization of given resources in both systems. We also discuss the penalty of a generic solution over a task-specific one.

Fake Run-Time Selection of Template Arguments in C++

Autoři
Langr, D.; Tvrdík, P.; Dytrych, T.; Draayer, J.P.
Rok
2012
Publikováno
Objects, Models, Components, Patterns. Berlin: Springer, 2012. pp. 140-154. ISSN 0302-9743. ISBN 978-3-642-30560-3.
Typ
Stať ve sborníku
Anotace
C++ does not support run-time resolution of template type arguments. To circumvent this restriction, we can instantiate a template for all possible combinations of type arguments at compile time and then select the proper instance at run time by evaluation of some provided conditions. However, for templates with multiple type parameters such a solution may easily result in a branching code bloat. We present a template metaprogramming algorithm called for_id that allows the user to select the proper template instance at run time with theoretical minimum sustained complexity of the branching code.

Minimal Quadtree Format for Compression of Sparse Matrices Storage

Rok
2012
Publikováno
Proceedings of 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2012). Los Alamitos: IEEE Computer Society, 2012. pp. 359-364. SYNASC'12. ISBN 978-0-7695-4934-7.
Typ
Stať ve sborníku
Anotace
Computations with sparse matrices are widespread in scientific projects. Commonly used storage formats (such as COO or CSR) are not suitable for I/O file operations with sparse matrices due to their high space complexities. Memory-efficient formats are still under development. In this paper, we present a new storage format called the Minimal quadtree (MQ) as well as algorithms for converting matrices from common storage formats to the MQ format. We compare the space complexity of common storage formats and of the MQ format and prove that the idea of using the quadtree as the data structure for sparse matrices is viable.

Porting Clondike to Heterogeneous Platforms

Rok
2012
Publikováno
2012 2nd IEEE International Conference On Parallel, Distributed and Grid Computing. Waknaghat: IEEE, 2012. p. 380-384. ISBN 978-1-4673-2922-4.
Typ
Stať ve sborníku
Anotace
Clustering plays an important role in today's computer networks. We can achieve a higher efficiency of the whole network infrastructure by simply using idle computing power of ordinary workstations. The Clondike project aims to create a universal non-dedicated peer-to-peer cluster where every participating node can benefit from its membership in the cluster. The peer-to-peer approach does not contain any single point of failure, so a high availability is guaranteed by design. So far, we have tested Clondike only in our laboratory with homogeneous computer network architecture, all the computer nodes were the same. In this paper, we report on experiments with moving the Clondike cluster closer to a real environment: with porting Clondike to a real office network with heterogeneous computers. We have run a distributed compilation of the Linux Kernel on both platforms to verify our results and to assess the weaknesses of the current solution and to identify further development needs.

Space-efficient sparse matrix storage formats for massively parallel systems

Rok
2012
Publikováno
HPCC 2012: IEEE 14th International Conference on High Performance Computing and Communications. Los Alamitos: IEEE Computer Society, 2012. pp. 54-60. ISBN 978-0-7695-4749-7.
Typ
Stať ve sborníku
Anotace
In this paper, we propose and evaluate new storage formats for sparse matrices that minimize the space complexity of information about matrix structure. The motivation of our work are applications with very large sparse matrices that due to their size must be processed on massively parallel computer systems consisting of tens or hundreds of thousands of processor cores and that must be stored in a distributed file system using parallel I/O. The parallel I/O is typically the main performance bottleneck and reading or writing such matrices from/to distributed file system can take significant amount of time. We try to reduce this time by reducing the amount of data to be processed.

Multi-Core Code in a Cluster. A Meaningful Option?

Autoři
Šťava, M.; Tvrdík, P.
Rok
2010
Publikováno
Advances in Grid and Pervasive Computing. Berlin: Springer-Verlag, 2010. pp. 15-26. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-13066-3.
Typ
Stať ve sborníku
Anotace
In this paper we investigate whether parallelization of an application code for multi-core machines can bring any benefit for clustering systems, especially those based on opportunistic usage of idle resources. Previous research has shown that transformation of shared memory applications into clustered applications is complicated. At the moment, there is no practical solution available. Therefore, we instead focus on message passing applications as possible candidates for parallelization. We demonstrate a low effort approach that allows programmers to transform a multi-core Erlang code into a code that can run in a cluster environment. We provide scalability measurements of the solution in small clusters of commodity computers and identify weak points of the solution.

Overlapping Non-Dedicated Clusters Architecture

Autoři
Šťava, M.; Tvrdík, P.
Rok
2009
Publikováno
2009 International Conference on Computer Engineering and Technology. Los Alamitos: IEEE Computer Society, 2009. p. 3-10. ISBN 978-1-4244-3334-6.
Typ
Stať ve sborníku
Anotace
Non-dedicated computer clusters promise more efficient resource utilization than conventional dedicated clusters. Existing non-dedicated clustering solutions either expect trust among participating users, or they do not take into account a possibility of running multiple independent clusters on a same set of computers.In this paper, we argue how an ability to run multiple independent clusters without requiring trust among participating users can be capitalized to increase user experience and thus attract more users to participate in the cluster. A generic extension of non-dedicated clusters that satisfies these requirements is defined and a feasibility of one particular extension is demonstrated on our implementation.

Security System for Overlapping Non-dedicated Clusters

Autoři
Šťava, M.; Tvrdík, P.
Rok
2009
Publikováno
2009 IEEE International Symposium on Parallel and Distributed Processing with Applications. Los Alamitos: IEEE Computer Society, 2009. pp. 272-281. ISBN 978-0-7695-3747-4.
Typ
Stať ve sborníku
Anotace
The overlapping non dedicated clusters (ONDC) architecture is trying to mix advantages of both techniques. Clusters build in ONDC style can be deployed both on small scale local networks, but as well in large scale over the Internet deployments. In this paper we analyze the security requirements of ONDC and describe our solution. The solution was implemented for the Clondike clustering system, but the same approach can be used for any other ONDC system. The unique feature of the proposed system is configurable mechanism for authorization distributions and storage that enables users to trade-off a local information availability with local storage requirements.

Towards auction algorithms for large dense assignment problems

Autoři
Buš, L.; Tvrdík, P.
Rok
2009
Publikováno
Computational Optimization and Applications. 2009, 43(3), 411-436. ISSN 0926-6003.
Typ
Článek
Anotace
In this paper, we focus on the problem of solving large-scale instances of the linear sum assignment problem by auction algorithms. We introduce a modified auction algorithm, called look-back auction algorithm, which extends the forward auction algorithm by the ability of reusing information from previous bids. We show that it is able to reuse information from the previous bids with high efficiency for all tested types of input instances. We discuss then the design and implementation of a suite of sequential and distributed memory auction algorithms on a Linux cluster with the evaluation on several types of input instances of the linear sum assignment problem. Our results show that the look-back auction algorithm solves sequentially nearly all types of dense instances faster than other evaluated algorithms and it is more stable than the forward-reverse auction algorithm for sparse instances. Our distributed memory auction algorithms are fully memory scalable.