Ing. Daniel Vašata, Ph.D.

Chair of the Academic Senate

Publications

Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network

Year
2020
Published in
Computational Science - ICCS 2020. Cham: Springer, 2020. p. 225-239. vol. 12140. ISSN 1611-3349. ISBN 978-3-030-50423-6.
Type
Conference paper
Abstract
Missing data is one of the most common preprocessing problems. In this paper, we experimentally research the use of generative and non-generative models for feature reconstruction. Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were researched as representatives of generative models, while the denoising autoencoder (DAE) represented non-generative models. Performance of the models is compared to the traditional methods k-nearest neighbors (k-NN) and Multiple Imputation by Chained Equations (MICE). Moreover, we introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to 30%. Experiments were performed on real-world and artificial datasets with continuous features where different percentages of features, varying from 10% to 50%, were missing. Evaluation of algorithms was done by measuring the accuracy of the classification model previously trained on the uncorrupted dataset. The results show that GAIN and especially WGAIN are the best imputers regardless of the conditions. In general, they outperform or are comparable to MICE, k-NN, DAE, and VAEAC.

Missing Features Reconstruction and Its Impact on Classification Accuracy

Year
2019
Published in
Computational Science – ICCS 2019. Springer, Cham, 2019. p. 207-220. vol. 11538. ISBN 978-3-030-22744-9.
Type
Conference paper
Abstract
In real-world applications, we can encounter situations when a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on the level of entire features. Both situations have a negative impact on the usability of the model on such a dataset. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally research the influence of various imputation methods on the performance of several classification models. The imputation impact is researched on a combination of traditional methods such as k-NN, linear regression, and MICE compared to modern imputation methods such as multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to using them for multiple features imputation. The experiments were performed on both real-world and artificial datasets with continuous features where different numbers of features, varying from one feature to 50%, were missing. The results show that MICE and linear regression are generally good imputers regardless of the conditions. On the other hand, the performance of MLP and XGBT is strongly dataset dependent. Their performance is the best in some cases, but more often they perform worse than MICE or linear regression.

Cantor spectra of magnetic chain graphs

Authors
Vašata, D.; Exner, P.
Year
2017
Published in
Journal of Physics A: Mathematical and Theoretical. 2017, 50(16), ISSN 1751-8113.
Type
Article
Abstract
We demonstrate that a one-dimensional magnetic system can exhibit a Cantor-type spectrum using an example of a chain graph with δ coupling at the vertices exposed to a magnetic field perpendicular to the graph plane and varying along the chain. If the field grows linearly with an irrational slope, measured in terms of the flux through the loops of the chain, we demonstrate the character of the spectrum relating it to the almost Mathieu operator.

Generalizations of the centroid with an application in stochastic geometry

Authors
Year
2017
Published in
Mathematische Nachrichten. 2017, 290(2-3), 452-473. ISSN 0025-584X.
Type
Article
Abstract
The centroid of a subset of ℝ^d with positive volume is a well-known characteristic. An interesting task is to generalize its definition to at least some sets of zero volume. In the presented paper we propose two possible ways to do that. The first is based on the Hausdorff measure of an appropriate dimension. The second is given by the limit of centroids of ε-neighbourhoods of the particular set when ε goes to 0. For both generalizations we discuss their existence and basic properties. Then we focus on sufficient conditions for the existence of the second generalization and on conditions when both generalizations coincide. It turns out that they can be formulated with the help of the Minkowski content, rectifiability, and self-similarity. Since the centroid is often used in stochastic geometry as a centre function for certain particle processes, we present properties that are needed for both generalizations to be valid centre functions. Finally, we also show their continuity on compact convex m-sets with respect to the Hausdorff metric topology.

On long-range dependence of random measures

Authors
Year
2016
Published in
Advances in Applied Probability. 2016, 48(4), 1235-1255. ISSN 0001-8678.
Type
Article
Abstract
This paper deals with long-range dependence of random measures on ℝ^d. By examples, it is demonstrated that one must be careful in order to define it consistently. Therefore, we define long-range dependence by a rather specific second-order condition and provide an equivalent formulation involving the asymptotic behaviour of the Bartlett spectrum near the origin. Then it is shown that the defining condition may be formulated less strictly when the additional isotropy assumption holds. Finally, we present an example of a long-range dependent random measure based on the 0-level excursion set of a Gaussian random field for which the corresponding spectral density and its asymptotics are explicitly derived.