Ing. Ondřej Podsztavek

Publications

Prototype of Interactive Visualisation Tool for Bayesian Active Deep Learning

Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2023
Published in
Astronomical Data Analysis Software and Systems XXXI. San Francisco: Astronomical Society of the Pacific, 2023.
Type
Proceedings paper
Abstract
In the era of big data in astronomy, we need to develop methods to analyse the data. One such method is Bayesian active deep learning (synergy of Bayesian convolutional neural networks and active learning). To improve the method’s performance, we have developed a prototype of an interactive visualisation tool for a selection of an informative (contains data with high predictive uncertainty, is diverse, but not redundant) data subsample for labelling by a human expert. The tool takes as input a sample of data with the highest predictive uncertainty. These data are projected to 2-D with a dimensionality reduction technique. We visualise the projected data in an interactive scatter plot and allow a human expert to label a selected subsample of data. With this tool, she or he can select a correct subsample with all the previously mentioned characteristics. This should lower the total amount of data labelled because the Bayesian model’s performance will improve faster than when the data are selected automatically.

Spectroscopic redshift determination with Bayesian convolutional networks

Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2022
Published in
Astronomy and Computing. 2022, 40 ISSN 2213-1337.
Type
Article
Abstract
Astronomy is facing large amounts of data, so astronomers have to rely on automated methods to analyse them. However, automated methods might produce incorrect values. Therefore, we need to develop different automated methods and perform a consistency check to identify them. If there is a lot of labelled data, convolutional neural networks are a powerful method for any task. We illustrate the consistency check on spectroscopic redshift determination with a method based on a Bayesian convolutional neural network inspired by VGG networks. The method provides predictive uncertainties that enable us to (1.) determine unusual or problematic spectra for visual inspection; (2.) do thresholding that allows us to balance between the error of redshift predictions and coverage. We used the 12th Sloan Digital Sky Survey quasar superset as the training set for the method. We evaluated its generalisation capability on about three-quarters of a million spectra from the 16th quasar superset of the same survey. On the 16th quasar superset, the method performs better in terms of the root-mean-squared error than the most used template fitting method. Using redshift predictions of the proposed method, we identified spectra with incorrectly determined redshifts that are unrecognised quasars or were misclassified as them.

Transfer Learning in Large Spectroscopic Surveys

Authors
Podsztavek, O.; Škoda, P.; Tvrdík, P.
Year
2021
Published in
Astronomical Data Analysis Software and Systems XXX. San Francisco: Astronomical Society of the Pacific, 2021. p. 235-238. Astronomical Society of the Pacific Conference Series. vol. 532. ISBN 978-1-58381-951-7.
Type
Proceedings paper
Abstract
Transfer learning is a machine learning method that can reuse knowledge across spectroscopic archives with different distributions of observations. We applied transfer learning based on a convolutional neural network to spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope and Sloan Digital Sky Survey archives. Taking advantage of known quasars in LAMOST DR5 version 3, we wanted to discover yet unseen quasars in SDSS DR14. Our transfer learning approach reaches 99.6% precision and 98.9% recall. We found examples of quasars previously classified as stars.

Active deep learning method for the discovery of objects of interest in large spectroscopic surveys

Authors
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Year
2020
Published in
Astronomy & Astrophysics. 2020, 643 ISSN 1432-0746.
Type
Article
Abstract
Context. Current archives of the LAMOST telescope contain millions of pipeline-processed spectra that have probably never been seen by human eyes. Most of the rare objects with interesting physical properties, however, can only be identified by visual analysis of their characteristic spectral features. A proper combination of interactive visualisation with modern machine learning techniques opens new ways to discover such objects. Aims. We apply active learning classification methods supported by deep convolutional neural networks to automatically identify complex emission-line shapes in multi-million spectra archives. Methods. We used the pool-based uncertainty sampling active learning method driven by a custom-designed deep convolutional neural network with 12 layers. The architecture of the network was inspired by VGGNet, AlexNet, and ZFNet, but it was adapted for operating on one-dimensional feature vectors. The unlabelled pool set is represented by 4.1 million spectra from the LAMOST data release 2 survey. The initial training of the network was performed on a labelled set of about 13 000 spectra obtained in the 400 Å wide region around Hα by the 2 m Perek telescope of the Ondřejov observatory, which mostly contains spectra of Be and related early-type stars. The differences between the Ondřejov intermediate-resolution and the LAMOST low-resolution spectrographs were compensated for by Gaussian blurring and wavelength conversion. Results. After several iterations, the network was able to successfully identify emission-line stars with an error smaller than 6.5%. Using the technology of the Virtual Observatory to visualise the results, we discovered 1 013 spectra of 948 new candidates of emission-line objects in addition to 664 spectra of 549 objects that are listed in SIMBAD and 2 644 spectra of 2 291 objects identified in an earlier paper of a Chinese group led by Wen Hou. The most interesting objects with unusual spectral properties are discussed in detail.

VO-supported Active Deep Learning as a New Methodology for the Discovery of Objects of Interest in Big Surveys

Authors
Škoda, P.; Podsztavek, O.; Tvrdík, P.
Year
2020
Published in
Astronomical Data Analysis Software and Systems XXIX. San Francisco: Astronomical Society of the Pacific, 2020. p. 163-166. Astronomical Society of the Pacific Conference Series. vol. 527. ISBN 978-1-58381-941-8.
Type
Proceedings paper
Abstract
Deep neural networks have been proved a very successful method of supervised learning in several research fields. To perform well, they require a massive amount of labelled data, which is challenging to get from most astronomical surveys. To overcome this limitation, we have developed a novel active deep learning method. It is based on an iterative training of a deep network followed by relabelling of a small sample according to a qualified decision of an oracle (usually a human expert). To maximise the scientific return, the oracle brings to the decision the domain knowledge not limited only to the data learned by the network. By combining some external resources to extract the key information by an expert in a field, much more relevant labels are assigned. Setup of an active deep learning platform thus requires incorporation of a Virtual Observatory (VO) client infrastructure as an integral part of a machine learning experiment, which is quite different from current practices. As proof of concept, we demonstrate the efficiency of our method for discovery of new emission-line stars in a multimillion spectra archive of the LAMOST DR2 survey.

Detekce anomálií v otevřených datech o znečištění ovzduší polétavým prachem [Anomaly Detection in Open Data on Particulate Matter Air Pollution]

Authors
Podsztavek, O.; Kuchař, J.
Year
2019
Published in
DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 66-71. ISBN 978-80-553-3354-0.
Type
Proceedings paper
Abstract
The public-lighting sensor network at Karlínské náměstí in Prague provides measurements of PM10 particulate matter air pollution as open data. In this work, we detect anomalies in these data using machine learning algorithms for time series prediction combined with thresholding. We want the machine learning algorithm to learn the regularities in the data so that, when something unexpected happens, thresholding reveals it. We experimented with linear regression and an LSTM recurrent neural network and compared them by mean squared error. Linear regression predicting from the last two measurements turned out to achieve better results. We detected anomalies from the differences between predicted and actual values. We computed the anomaly detection threshold from the histogram of the differences between predictions and actually measured values. Testing showed that the proposed method can reveal some anomalies in PM10 particulate matter measurements, but it misses many anomalies (for example, gradually developing ones).

Comparing Offline and Online Evaluation Results of Recommender Systems

Authors
Kordík, P.; Řehořek, T.; Bíža, O.; Bartyzal, R.; Podsztavek, O.; Povalyev, I.P.
Year
2018
Published in
REVEAL RecSys 2018 workshop proceedings. New York: ACM, 2018.
Type
Proceedings paper
Abstract
Recommender systems are usually trained and evaluated on historical data. Offline evaluation is, however, tricky and offline performance can be an inaccurate predictor of the online performance measured in production due to several reasons. In this paper, we experiment with two offline evaluation strategies and show that even a reasonable and popular strategy can produce results that are not just biased, but also in direct conflict with the true performance obtained in the online evaluation. We investigate offline policy evaluation techniques adapted from reinforcement learning and explain why such techniques fail to produce an unbiased estimate of the online performance in the “watch next” scenario of a large-scale movie recommender system. Finally, we introduce a new evaluation technique based on Jaccard Index and show that it correlates with the online performance.