Ing. Jiří Nádvorník

Publications

HiSS-Cube: A scalable framework for Hierarchical Semi-Sparse Cubes preserving uncertainties

Authors
Nádvorník, J.; Škoda, P.; Tvrdík, P.
Year
2021
Published in
Astronomy and Computing. 2021, 36. ISSN 2213-1337.
Type
Article
Abstract
A wide variety of approaches are available for big data cube visualization and analysis. However, few exploit the power of array databases and none preserve the scientific uncertainties in measurements when constructing lower resolutions. In machine learning applications, we often need to rapidly search data for regions of interest and then focus on these areas, but without having to retrain the model every time we change the resolution. However, the reliable verification of these areas also requires details of the accuracy of the measured values. In this study, we developed a new software infrastructure called Hierarchical Semi-Sparse Cube (HiSS-Cube) based on Hierarchical Data Format version 5. HiSS-Cube enables visualization and machine learning using combined heterogeneous data and it was designed to be scalable for big data. HiSS-Cube allows data from multiple domains (imaging, spectral, and time-series data) to be combined and the construction of a multi-resolution semi-sparse data cube that preserves the uncertainties of scientific measurement at all resolutions. The functionality of HiSS-Cube was verified based on a subset of the Sloan Digital Sky Survey Stripe 82 survey. We compared the times and volumes for visualizations and machine learning data exported to HiSS-Cube and the original format (FITS). Using these data, we demonstrated that HiSS-Cube is faster by several orders of magnitude. HiSS-Cube supports export to the VOTable format and it is compatible with common Virtual Observatory tools. The source code for our prototype HiSS-Cube is available from GitHub and the data are available from Zenodo.