Using Paraphrasers to Detect Duplicities in Ontologies
Authors
Korel, L.; Behr, A.S.; Kockmann, N.; Holeňa, M.
Year
2023
Published in
Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - (Volume 2). Madeira: SciTePress, 2023. p. 40-49. KEOD. vol. 2. ISSN 2184-3228. ISBN 978-989-758-671-2.
Type
Proceedings paper
Department
Abstract
This paper contains a machine-learning-based approach to detect duplicities in ontologies. Ontologies are formal specifications of shared conceptualizations of application domains. Merging and enhancing ontologies may cause the introduction of duplicities into them. The approach to duplicities proposed in this work presents a solution that does not need manual corrections by domain experts. Source texts consist of short textual descriptions from considered ontologies, which have been extracted and automatically paraphrased to receive pairs of sentences with the same or a very close meaning. The sentences in the received dataset have been embedded into Euclidean vector space. The classification task was to determine whether a given pair of sentence embeddings is semantically equivalent or different. The results have been tested using test sets generated by paraphrases as well as on a small real-world ontology. We also compared solutions by the most similar existing approach, based on GloVe and WordNet, with solutions by our approach. According to all considered metrics, our approach yielded better results than the compared approach. From the results of both experiments, the most suitable for the detection of duplicities in ontologies is the combination of BERT with support vector machines. Finally, we performed an ablation study to validate whether all paraphrasers used to create the training set for the classification were essential.
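The pair-classification idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors stand in for BERT sentence embeddings, and a cosine-similarity threshold (the hypothetical `is_duplicate` rule with an assumed threshold of 0.9) replaces the support vector machine named in the abstract.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_duplicate(u, v, threshold=0.9):
    """Hypothetical decision rule: flag a pair as duplicate when similarity is high."""
    return cosine_similarity(u, v) >= threshold

# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
reactor = [0.9, 0.1, 0.2]
reactor_paraphrase = [0.85, 0.15, 0.25]
catalyst = [0.1, 0.9, 0.3]

print(is_duplicate(reactor, reactor_paraphrase))  # similar descriptions
print(is_duplicate(reactor, catalyst))            # unrelated descriptions
```

A trained classifier over pairs of embeddings, as described in the abstract, can learn a more flexible decision boundary than a single fixed threshold.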
Neural-Network-Based Estimation of Normal Distributions in Black-Box Optimization
Authors
Tumpach, J.; Koza, J.; Holeňa, M.
Year
2022
Published in
30th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning Bruges, Belgium October 05 - 07. Louvain la Neuve: Ciaco - i6doc.com, 2022.
Type
Proceedings paper
Department
Abstract
The paper presents a novel application of artificial neural networks (ANNs) in the context of surrogate models for black-box optimization, i.e. optimization of objective functions that are accessed through empirical evaluation. In active learning of surrogate models, a very important role is played by learning multidimensional normal distributions, for which Gaussian processes (GPs) have traditionally been used. The research reported in this paper, on the other hand, evaluated the applicability of two ANN-based methods to this end: combining GPs with ANNs and learning normal distributions with evidential ANNs. After sketching the methods, the paper compares them on a large collection of data from surrogate-assisted black-box optimization. It shows that combining GPs using linear covariance functions with ANNs yields lower errors than the investigated methods of evidential learning.
Using Artificial Neural Networks to Determine Ontologies Most Relevant to Scientific Texts
Authors
Korel, L.; Behr, A.S.; Holeňa, M.; Kockmann, N.
Year
2022
Published in
Proceedings of the 22nd Conference Information Technologies – Applications and Theory (ITAT 2022). CEUR-WS.org, 2022. p. 44-54. CEUR Workshop Proceedings. vol. 3226. ISSN 1613-0073.
Type
Proceedings paper
Department
Abstract
This paper provides insight into how to find ontologies most relevant to scientific texts using artificial neural networks. The basic idea of the presented approach is to select a representative paragraph from a source text file, embed it into a vector space by a pre-trained fine-tuned transformer, and classify the embedded vector according to its relevance to a target ontology. We have considered different classifiers to categorize the output from the transformer, in particular random forest, support vector machine, multilayer perceptron, k-nearest neighbors, and Gaussian process classifiers. Their suitability has been evaluated in a use case with ontologies and scientific texts concerning catalysis research. In this task, the random forest yielded the worst results and the support vector machine classifier the best.
Unsupervised Construction of Task-Specific Datasets for Object Re-identification
Authors
Pulc, P.; Holeňa, M.
Year
2021
Published in
ICCTA 2021 Conference Proceedings. New York: Association for Computing Machinery, 2021. p. 66-72. ISBN 978-1-4503-9052-1.
Type
Invited or awarded proceedings paper
Department
Abstract
In the last decade, we have seen a significant rise of deep neural networks in image processing tasks and many other research areas. However, while various neural architectures have successfully solved numerous tasks, they constantly demand more and more processing time and training data. Moreover, the current trend of using existing pre-trained architectures just as backbones and attaching new processing branches on top not only increases this demand but also diminishes the explainability of the whole model.
Our research focuses on combinations of explainable building blocks for image processing tasks, such as object tracking. We propose a combination of Mask R-CNN, a state-of-the-art object detection and segmentation neural network, with our previously published method of sparse feature tracking. Such a combination allows us to track objects by connecting detected masks using the proposed sparse feature tracklets. However, this method cannot recover from complete object occlusions and has to be assisted by object re-identification.
To this end, this paper uses our feature tracking method for a slightly different task: an unsupervised extraction of object representations that we can directly use to fine-tune an object re-identification algorithm. As we have to use object masks already in the object tracking, our approach utilises this additional information as an alpha channel of the object representations, which further increases the precision of the re-identification. An additional benefit is that our fine-tuning method can be employed even in a fully online scenario.
Active Learning for LSTM-autoencoder-based Anomaly Detection in Electrocardiogram Readings
Authors
Šabata, T.; Holeňa, M.
Year
2020
Published in
Proceedings of the Workshop on Interactive Adaptive Learning co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020). Aachen: CEUR Workshop Proceedings, 2020. p. 72-77. ISSN 1613-0073.
Type
Proceedings paper
Department
Abstract
Recently, the amount of generated time series data has been increasing rapidly in many areas such as healthcare, security, meteorology and others. However, it is very rare that those time series are annotated. For this reason, unsupervised machine learning techniques such as anomaly detection are often used with such data. There exist many unsupervised algorithms for anomaly detection, ranging from simple statistical techniques such as moving average or ARIMA to complex deep learning algorithms such as the LSTM-autoencoder. For an overview of recent algorithms, we refer the reader to the literature.
Difficulties with the unsupervised approach are defining an anomaly score that correctly represents how anomalous the time series is, and setting a threshold for that score to distinguish between normal and anomalous data. Supervised anomaly detection, on the other hand, needs an expensive involvement of a human expert. An additional problem with supervised anomaly detection is usually a very low ratio of anomalies, yielding highly imbalanced data.
In this extended abstract, we propose an active learning extension for an anomaly detector based on an LSTM-autoencoder. It performs active learning using various classification algorithms and addresses data imbalance with oversampling and undersampling techniques. We are currently testing it on the ECG5000 dataset from the UCR time series classification archive.
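The oversampling step mentioned in this abstract can be sketched as follows. This is only an illustration of plain random oversampling under assumed toy data; the abstract does not state which oversampling variant the authors use, and the function name `random_oversample` is hypothetical.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing number of samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Toy data: reconstruction errors with labels (0 = normal, 1 = anomaly).
errors = [0.1, 0.2, 0.15, 0.12, 0.18, 0.9]
labels = [0, 0, 0, 0, 0, 1]
balanced_errors, balanced_labels = random_oversample(errors, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Undersampling works in the opposite direction, discarding samples of the majority class instead of repeating minority ones.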
Classification Methods for Internet Applications
Authors
Holeňa, M.; Pulc, P.; Kopp, M.
Year
2020
Published in
Cham: Springer, 2020. Studies in Big Data. vol. 69. ISSN 2197-6503. ISBN 978-3-030-36961-3.
Type
Book
Department
Abstract
This book explores internet applications in which a crucial role is played by classification, such as spam filtering, recommender systems, malware detection, intrusion detection and sentiment analysis. It explains how such classification problems can be solved using various statistical and machine learning methods, including K nearest neighbours, Bayesian classifiers, the logit method, discriminant analysis, several kinds of artificial neural networks, support vector machines, classification trees and other kinds of rule-based methods, as well as random forests and other kinds of classifier ensembles. The book covers a wide range of available classification methods and their variants, not only those that have already been used in the considered kinds of applications, but also those that have the potential to be used in them in the future. The book is a valuable resource for post-graduate students and professionals alike.
Two Semi-supervised Approaches to Malware Detection with Neural Networks
Authors
Koza, J.; Krčál, M.; Holeňa, M.
Year
2020
Published in
Proceedings of the 20th Conference Information Technologies - Applications and Theory (ITAT 2020). Aachen: CEUR Workshop Proceedings, 2020. p. 176-185. ISSN 1613-0073.
Type
Proceedings paper
Department
Abstract
Semi-supervised learning is characterized by using the additional information from the unlabeled data. In this paper, we compare two semi-supervised algorithms for deep neural networks on a large real-world malware dataset. Specifically, we evaluate the performance of a rather straightforward method called Pseudo-labeling, which treats the predicted classes of unlabeled samples classified with high confidence as if they were the actual labels. The second approach is based on the idea of increasing the consistency of the network’s prediction under altered circumstances. We implemented such an algorithm, called Π-model, which compares outputs with different data augmentation and different dropout settings. As a baseline, we also provide results of the same deep network, trained in the fully supervised mode using only the labeled data. We analyze the prediction accuracy of the algorithms in relation to the size of the labeled part of the training dataset.
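The selection step of pseudo-labeling can be sketched in a few lines. The class probabilities below are made-up network outputs, and the 0.95 confidence threshold and the function name `select_pseudo_labels` are assumptions for illustration, not values taken from the paper.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Return (index, predicted_class) for unlabeled samples predicted with high confidence."""
    selected = []
    for index, class_probs in enumerate(probabilities):
        best_class = max(range(len(class_probs)), key=lambda c: class_probs[c])
        if class_probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected

# Hypothetical softmax outputs for four unlabeled samples (class 0 = benign, class 1 = malware).
unlabeled_probs = [
    [0.99, 0.01],
    [0.60, 0.40],
    [0.03, 0.97],
    [0.50, 0.50],
]
pseudo_labeled = select_pseudo_labels(unlabeled_probs)
print(pseudo_labeled)  # prints [(0, 0), (2, 1)]
```

The selected samples would then be added to the training set with their predicted classes as labels, while the low-confidence samples stay unlabeled.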
Comparing rule mining approaches for classification with reasoning
Authors
Kopp, M.; Bajer, L.; Jílek, M.; Holeňa, M.
Year
2018
Published in
Proceedings of the 18th Conference Information Technologies - Applications and Theory (ITAT 2018). Aachen: CEUR Workshop Proceedings, 2018. p. 52-58. vol. 2203. ISSN 1613-0073. ISBN 9781727267198.
Type
Proceedings paper
Department
Abstract
Classification serves an important role in domains such as network security or health care. Although these domains require understanding of the classifier’s decision, there are only a few classification methods trying to justify or explain their results. Classification rules and decision trees are generally considered comprehensible. Therefore, this study compares the classification performance and comprehensibility of a random forest classifier with classification rules extracted by Frequent Item Set Mining, Logical Item Set Mining and by the Explainer algorithm, which was previously proposed by the authors.
Hierarchical Motion Tracking Using Matching of Sparse Features
Authors
Pulc, P.; Holeňa, M.
Year
2018
Published in
Proceedings of the 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). Los Alamitos: IEEE Computer Society, 2018. p. 449-456. ISBN 978-1-5386-9385-8.
Type
Proceedings paper
Department
Abstract
Fundamental approaches in motion tracking are based on registration of pixel patches from one frame to another. To ensure invariance to some changes in the image and improve the speed of discovering a match, a pyramidal approach is used to steer the process faster to optima. However, registration of the patches in high resolution is still computationally expensive.
Because we require the algorithm to process Ultra HD video content in real time on commonly available hardware, especially on mid-tier graphics processing units, approaches using matching of pixel patches are not feasible.
In this paper, we present and evaluate an approach inspired by motion tracking on an image pyramid. However, instead of comparing pixel patches one to another, we utilise binary image descriptors that are much shorter and inherently use a Hamming distance for their direct comparison.
Evaluation of our implementation, which is available on GitHub, was carried out on the Multiple Object Tracking challenge dataset.
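The comparison of binary descriptors by Hamming distance, mentioned in this abstract, can be sketched as below. The 8-bit descriptors and the helper `best_match` are illustrative assumptions, not the authors' GitHub implementation.

```python
def hamming_distance(a, b):
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")

def best_match(query, candidates):
    """Index and distance of the candidate descriptor closest to the query."""
    distances = [hamming_distance(query, c) for c in candidates]
    index = min(range(len(candidates)), key=lambda i: distances[i])
    return index, distances[index]

# Toy 8-bit descriptors; real binary descriptors are considerably longer.
previous_frame_descriptor = 0b10110010
current_frame_descriptors = [0b01001101, 0b10110110, 0b11110000]
print(best_match(previous_frame_descriptor, current_frame_descriptors))  # prints (1, 1)
```

Because the distance reduces to an XOR followed by a bit count, it is much cheaper than comparing pixel patches.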
Semisupervised segmentation of UHD video
Authors
Keruľ-Kmec, O.; Pulc, P.; Holeňa, M.
Year
2018
Published in
Proceedings of the 18th Conference Information Technologies - Applications and Theory (ITAT 2018). Aachen: CEUR Workshop Proceedings, 2018. p. 100-107. vol. 2203. ISSN 1613-0073. ISBN 9781727267198.
Type
Proceedings paper
Department
Abstract
One of the key preprocessing tasks in information retrieval from video is the segmentation of the scene, primarily its segmentation into foreground objects and the background. This is actually a classification task, but with the specific property that it is very time consuming and costly to obtain human-labelled training data for classifier training. This suggests using semisupervised classifiers to this end. The presented work in progress reports the investigation of semisupervised classification methods based on cluster regularization and on fuzzy c-means in connection with the foreground / background segmentation task. To classify as many video frames as possible using only a single human-labelled frame, the semisupervised classification is combined with a frequently used keypoint detector based on a combination of a corner detection method with a visual descriptor method. The paper experimentally compares both methods, and for the first of them, also classifiers with different delays between the human-labelled video frame and classifier training.
Sentiment analysis from utterances
Authors
Kožusznik, J.; Pulc, P.; Holeňa, M.
Year
2018
Published in
Proceedings of the 18th Conference Information Technologies - Applications and Theory (ITAT 2018). Aachen: CEUR Workshop Proceedings, 2018. p. 92-99. vol. 2203. ISSN 1613-0073. ISBN 9781727267198.
Type
Proceedings paper
Department
Abstract
The recognition of emotional states in speech is starting to play an increasingly important role. However, it is a complicated process, which heavily relies on the extraction and selection of utterance features related to the emotional state of the speaker. In the reported research, MPEG-7 low level audio descriptors [10] serve as features for the recognition of emotional categories. To this end, a methodology combining MPEG-7 with several important kinds of classifiers is elaborated.
Breaking CAPTCHAs with Convolutional Neural Networks
Authors
Kopp, M.; Nikl, M.; Holeňa, M.
Year
2017
Published in
ITAT 2017: Information Technologies – Applications and Theory. Aachen: CEUR Workshop Proceedings, 2017. p. 93-99. vol. 1885. ISSN 1613-0073.
Type
Proceedings paper
Department
Abstract
This paper studies reverse Turing tests to distinguish humans and computers, called CAPTCHA. Contrary to classical Turing tests, in this case the judge is not a human but a computer. The main purpose of such tests is securing user logins against dictionary or brute-force password guessing, avoiding automated usage of various services, preventing bots from spamming on forums and many others.
Typical approaches to solving text-based CAPTCHA automatically are based on a scheme-specific pipeline containing hand-designed pre-processing, denoising, segmentation, post-processing and optical character recognition. Only the last part, optical character recognition, is usually based on some machine learning algorithm. We present an approach using neural networks and a simple clustering algorithm that consists of only two steps, character localisation and recognition. We tested our approach on 11 different schemes selected to present very diverse security features. We experimentally show that using convolutional neural networks is superior to multi-layered perceptrons.
K-best Viterbi Semi-supervized Active Learning in Sequence Labelling
Authors
Šabata, T.; Borovička, T.; Holeňa, M.
Year
2017
Published in
CEUR workshop proceedings. 2017, 2017 144-152. ISSN 1613-0073.
Type
Article
Department
Abstract
In application domains where there exists a large amount of unlabelled data but obtaining labels is expensive, active learning is a useful way to select which data should be labelled. In addition to its traditional successful use in classification and regression tasks, active learning has also been applied to sequence labelling. According to the standard active learning approach, sequences for which the labelling would be the most informative should be labelled. However, labelling the entire sequence may be inefficient because for some of its parts, the labels can be predicted using a model. Labelling such parts brings only a little new information. Therefore, in this paper, we investigate a sequence labelling approach in which, in the sequence selected for labelling, the labels of most tokens are predicted by a model and only tokens that the model cannot predict with sufficient confidence are labelled. Those tokens are identified using the k-best Viterbi algorithm.
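A minimal sketch of the k-best Viterbi idea follows, using a toy two-state hidden Markov model. The model parameters are invented, and the rule in `uncertain_positions` (flag positions where the k best paths disagree) is one plausible way to pick uncertain tokens, assumed here for illustration; the abstract does not specify the authors' exact criterion.

```python
import heapq

def k_best_viterbi(observations, states, start_p, trans_p, emit_p, k):
    """Return the k most probable state sequences as (probability, path) pairs."""
    # beams[s] holds up to k best (probability, path) pairs that end in state s.
    beams = {s: [(start_p[s] * emit_p[s][observations[0]], [s])] for s in states}
    for obs in observations[1:]:
        new_beams = {}
        for s in states:
            candidates = [
                (prob * trans_p[prev][s] * emit_p[s][obs], path + [s])
                for prev in states
                for prob, path in beams[prev]
            ]
            new_beams[s] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        beams = new_beams
    final = [c for s in states for c in beams[s]]
    return heapq.nlargest(k, final, key=lambda c: c[0])

def uncertain_positions(paths):
    """Positions where the k best paths disagree; candidates for manual labelling."""
    return [t for t in range(len(paths[0])) if len({p[t] for p in paths}) > 1]

# Toy model with hypothetical parameters.
states = ["X", "Y"]
start_p = {"X": 0.6, "Y": 0.4}
trans_p = {"X": {"X": 0.7, "Y": 0.3}, "Y": {"X": 0.4, "Y": 0.6}}
emit_p = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.2, "b": 0.8}}

best = k_best_viterbi(["a", "b", "a"], states, start_p, trans_p, emit_p, k=2)
best_paths = [path for _, path in best]
print(best_paths)                       # prints [['X', 'Y', 'X'], ['X', 'X', 'X']]
print(uncertain_positions(best_paths))  # prints [1]
```

In this toy run, only the middle token would be sent to the annotator; the other two labels agree across the best paths and would be taken from the model.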
Towards Real-time Motion Estimation in High-Definition Video Based on Points of Interest
Authors
Pulc, P.; Holeňa, M.
Year
2017
Published in
Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. Katowice: Polish Information Processing Society, 2017. p. 67-70. Annals of Computer Science and Information Systems. vol. 11. ISSN 2300-5963. ISBN 978-83-946253-7-5.
Type
Proceedings paper
Department
Abstract
Currently used motion estimation is usually based on a computation of optical flow from individual images or short sequences. As these methods do not require an extraction of the visual description in points of interest, correspondence can be deduced only by the position of such points.
In this paper, we propose an alternative motion estimation method solely using a binary visual descriptor. By tuning the internal parameters, we achieve either a detection of longer time series or a higher number of shorter series in a shorter time. As our method uses the visual descriptors, their values can be directly used in more complex visual detection tasks.
Application of Meta-learning Principles in Multimedia Indexing
Authors
Pulc, P.; Holeňa, M.
Year
2016
Published in
DATESO 2016: Databases, Texts, Specifications, and Objects. Ostrava: Vysoká škola báňská - Technická univerzita Ostrava. Archiv VŠB-TUO, 2016. p. 1-11. ISBN 978-80-248-4031-4.
Type
Proceedings paper
Abstract
Databases of video content traditionally rely on annotations and meta-data imported by a person, usually the uploader. This is supposedly due to the lack of a universal approach to automated multimedia content annotation. As it may be hard or impossible to find a single classifier, or even a network of classifiers, for all encountered combinations of different modalities, the current interest of our research is to use meta-learning for multiple stages of the multimedia content classification. With this, we hope to correctly handle all modalities involved, including their overlaps. Subsequently, the extracted classes will be used to build the index and later used for searching and discovery in the multimedia.
How to Mimic Humans, Guide for Computers
Authors
Kopp, M.; Pištora, M.; Holeňa, M.
Year
2016
Published in
ITAT 2016: Information Technologies - Applications and Theory: Conference on Theory and Practice of Information Technologies. Luxemburg: CreateSpace Independent Publishing Platform, 2016. p. 110-117. ISBN 978-1-5370-1674-0.
Type
Proceedings paper
Department
Abstract
This paper studies reverse Turing tests to tell humans and computers apart. Contrary to classical Turing tests, the judge is not a human but a computer. These tests are often called Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHA). The main purpose of such tests is avoiding automated usage of various services, preventing bots from spamming on forums, securing user logins against dictionary or brute-force password guessing and many others. Over the years, a variety of tests has appeared. In this paper, we focused on the two most classical and widespread schemes, which are text-based and audio-based CAPTCHA, and on their use in the Czech internet environment. The goal of this paper is to point out flaws and weak spots of often used solutions and the consequent security risks. To this end, we pipelined several relatively easy algorithms, like the flood fill algorithm and k-nearest neighbours, to overcome CAPTCHA challenges at several web pages, including state administration.
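The flood fill step named in this abstract can be sketched as a connected-region labelling of a binary image, which is one common way to separate characters before recognising them. The tiny image and the function name `flood_fill_components` are assumptions for illustration; the paper's actual pipeline is not reproduced here.

```python
def flood_fill_components(image):
    """Label 4-connected regions of foreground pixels (1s) in a binary image."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for row in range(height):
        for col in range(width):
            if image[row][col] != 1 or labels[row][col] != 0:
                continue
            # Start a new region and fill it with an explicit stack.
            count += 1
            labels[row][col] = count
            stack = [(row, col)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and image[nr][nc] == 1 and labels[nr][nc] == 0):
                        labels[nr][nc] = count
                        stack.append((nr, nc))
    return count, labels

# Toy binarised CAPTCHA fragment with three separate blobs.
captcha = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
num_regions, region_labels = flood_fill_components(captcha)
print(num_regions)  # prints 3
```

Each labelled region could then be cropped and passed to a classifier such as k-nearest neighbours.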
Image Processing in Collaborative Open Narrative Systems
Authors
Pulc, P.; Rosenzveig, E.; Holeňa, M.
Year
2016
Published in
ITAT 2016: Information Technologies - Applications and Theory: Conference on Theory and Practice of Information Technologies. Luxemburg: CreateSpace Independent Publishing Platform, 2016. p. 155-162. ISSN 1613-0073.
Type
Proceedings paper
Department
Abstract
The open narrative approach enables the creators of multimedia content to create multi-stranded, navigable narrative environments. The viewer is able to navigate such a space depending on the author’s predetermined constraints, or even browse the open narrative structure arbitrarily based on their interests. This philosophy is used with great advantage in the collaborative open narrative system NARRA. The platform makes it possible for documentary makers, journalists, activists or other artists to link their own audiovisual material to clips of other authors and finally create a navigable space of individual multimedia pieces.
To help authors focus on building the narratives themselves, a set of automated tools has been proposed. The most obvious ones, such as speech-to-text, are already incorporated in the system. However, other, more complicated authoring tools, primarily focused on creating metadata for the media objects, are yet to be developed. The most complex of them involve an object description in media (with unrestricted motion, action or other features) and detection of near-duplicates of video content, which is the focus of our current interest.
In our approach, we are trying to use motion-based features and register them across the whole clip. Using the GridCut algorithm to segment the image, we then try to select only those parts of the motion picture that are of interest to us for further processing. For the selection of suitable description methods, we are developing a meta-learning approach. This will supposedly enable automatic annotation based not only on clip similarity per se, but rather on detected objects present in the shot.
Modeling and Clustering the Behavior of Animals Using Hidden Markov Models
Authors
Šabata, T.; Borovička, T.; Holeňa, M.
Year
2016
Published in
CEUR workshop proceedings. 2016, 2016(1649), 172-178. ISSN 1613-0073.
Type
Article
Department
Abstract
The objectives of this article are to model the behavior of individual animals and to cluster the resulting models in order to group animals with similar behavior patterns. Hidden Markov models are considered suitable for clustering purposes. Their clustering is well studied, however, only if the observable variables can be assumed to be Gaussian mixtures, which is not valid in our case. Therefore, we use the Kullback-Leibler divergence to cluster hidden Markov models with observable variables that have an arbitrary distribution. Hierarchical and spectral clustering are applied. To evaluate the modeling approach, an experiment was performed and an accuracy of 83.86% was reached in predicting behavioral sequences of individual animals. Results of clustering were evaluated by means of statistical descriptors of the animals and by a domain expert; both methods confirm that the results of clustering are meaningful.
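How a Kullback-Leibler divergence can serve as a dissimilarity for clustering is sketched below on plain discrete distributions. This is a simplification: the divergence between hidden Markov models is generally approximated rather than computed directly, and the abstract does not say how the authors do it. The behaviour frequencies and the symmetrisation in `symmetric_kl` are assumptions for illustration.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrised divergence, usable as a dissimilarity for clustering."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical frequencies of three behaviours (e.g. resting, feeding, moving).
animal_a = [0.7, 0.2, 0.1]
animal_b = [0.6, 0.3, 0.1]
animal_c = [0.1, 0.2, 0.7]

print(symmetric_kl(animal_a, animal_b) < symmetric_kl(animal_a, animal_c))  # prints True
```

A matrix of such pairwise dissimilarities is the kind of input that hierarchical or spectral clustering can work with.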
Modeling and Clustering the Behavior of Animals Using Hidden Markov Models.
Authors
Šabata, T.; Borovička, T.; Holeňa, M.
Year
2016
Published in
Proceedings ITAT 2016: Information Technologies - Applications and Theory. Luxemburg: CreateSpace Independent Publishing Platform, 2016. p. 172-178. ISBN 978-1-5370-1674-0.
Type
Proceedings paper
Department
Testing Gaussian Process Surrogates on CEC’2013 Multi-Modal Benchmark.
Authors
Orekhov, N.; Bajer, L.; Holeňa, M.
Year
2016
Published in
Proceedings ITAT 2016: Information Technologies - Applications and Theory. Luxemburg: CreateSpace Independent Publishing Platform, 2016. p. 138-146. ISBN 978-1-5370-1674-0.
Type
Proceedings paper
Department
Evaluation of Association Rules Extracted during Anomaly Explanation.
Authors
Kopp, M.; Holeňa, M.
Year
2015
Published in
ITAT 2015 conference proceedings. Aachen: CEUR Workshop Proceedings, 2015. pp. 143-149. ISSN 1613-0073. ISBN 978-1-5151-2065-0.
Type
Proceedings paper
Department
Abstract
Discovering anomalies within data is nowadays very important, because it helps to uncover interesting events. Consequently, a considerable number of anomaly detection algorithms have been proposed in the last few years. Only a few papers about anomaly detection at least mentioned why some samples were labelled as anomalous. Therefore, we proposed a method that extracts rules explaining the anomaly from an ensemble of specifically trained decision trees, called sapling random forest.
Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given as conjunctions of atomic conditions, which can be viewed as antecedents of association rules. In this work, we focus on the selection, post-processing and evaluation of those rules. The main goal is to present a small number of the most important rules. To achieve this, we use quality measures such as lift and confidence boost. The resulting sets of rules are experimentally and empirically evaluated on two artificial datasets and one real-world dataset.
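Rule quality measures of this kind can be illustrated with a short sketch that computes support, confidence and lift for one rule. The condition names and samples are invented, the helper `rule_quality` is hypothetical, and confidence boost, which the abstract also mentions, is not covered.

```python
def rule_quality(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one transaction.
    """
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante
    lift = confidence / (n_cons / n)
    return support, confidence, lift

# Each sample is the set of atomic conditions it satisfies, plus its detector label.
samples = [
    {"bytes>1000", "port=22", "anomaly"},
    {"bytes>1000", "port=22", "anomaly"},
    {"bytes>1000", "port=80"},
    {"port=80"},
    {"port=22"},
]
support, confidence, lift = rule_quality(samples, {"bytes>1000", "port=22"}, {"anomaly"})
print(support, confidence, lift)
```

A lift above 1 indicates that samples matching the antecedent are labelled anomalous more often than samples overall, which is what makes a rule worth presenting.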
Investigation of Gaussian Processes in the Context of Black-Box Evolutionary Optimization
Authors
Kudinov, A.; Bajer, L.; Pitra, Z.; Holeňa, M.
Year
2015
Published in
ITAT 2015 conference proceedings. Aachen: CEUR Workshop Proceedings, 2015, pp. 159-166. ISSN 1613-0073. ISBN 978-1-5151-2065-0.
Type
Proceedings paper
Department
Interpreting and clustering outliers with sapling random forests
Authors
Kopp, M.; Pevný, T.; Holeňa, M.
Year
2014
Published in
Proceedings of the 14th conference ITAT 2014 – Workshops and Posters. Praha: Institute of Computer Science AS CR, 2014. pp. 61-67. ISBN 978-80-87136-19-5.
Type
Proceedings paper
Department
Abstract
The main objective of outlier detection is finding samples considerably deviating from the majority. Such outliers, often referred to as anomalies, are nowadays more and more important, because they help to uncover interesting events within data. Consequently, a considerable number of statistical and data mining techniques to identify anomalies have been proposed in the last few years, but only a few works at least mentioned why some sample was labelled as an anomaly. Therefore, we propose a method based on specifically trained decision trees, called sapling random forest.
Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given as a subset of features in which the sample is most deviating, or as conjunctions of atomic conditions, which can be viewed as antecedents of logical rules easily understandable by humans. To simplify the investigation of suspicious samples even more, we propose two methods of clustering anomalies into groups. Such clusters can be investigated at once, saving time and human effort. The feasibility of our approach is demonstrated on several synthetic datasets and one real-world dataset.
Improving the Model Guided Sampling Optimization by Model Search and Slice Sampling
Authors
Bajer, L.; Holeňa, M.; Charypar, V.
Year
2013
Published in
ITAT 2013: Information Technologies - Applications and Theory Workshops, Posters, and Tutorials. 2013, pp. 86-91. ISBN 978-1-4909-5208-6.
Type
Proceedings paper
Department
Using machine learning methods in a personalized reputation system.
Authors
Pejla, J.; Holeňa, M.
Year
2013
Published in
ITAT 2013: Information Technologies - Applications and Theory Workshops, Posters, and Tutorials. 2013, pp. 104-110. ISBN 978-1-4909-5208-6.
Type
Proceedings paper
Department
Computing the correlation between catalyst composition and its performance in the catalysed process.
Authors
Holeňa, M.; Steinfeldt, N.; Baerns, M.; Štefka, D.
Year
2012
Published in
Computers and Chemical Engineering. 2012, 55-67. ISSN 0098-1354.
Type
Article
Department
Conformal sets in neural network regression.
Authors
Demut, R.; Holeňa, M.
Year
2012
Published in
Proceedings of Conference on Theory and Practice of information Technologies. Košice: Univerzita P. J. Šafárika, 2012. p. 17-24. ISBN 978-80-971144-0-4.
Type
Proceedings paper
Department
Evolutionary optimization with active learning of surrogate models and fixed evaluation batch size.
Authors
Charypar, V.; Holeňa, M.
Year
2012
Published in
Proceedings of Conference on Theory and Practice of information Technologies. Košice: Univerzita P. J. Šafárika, 2012. p. 33-40. ISBN 978-80-971144-0-4.
Type
Proceedings paper
Department
Assessing the Suitability of Surrogate Models in Evolutionary Optimization
Authors
Demut, R.; Holeňa, M.
Year
2011
Published in
Information Technologies - Applications and Theory. 2011, pp. 31-38. ISBN 978-80-89557-02-8.
Type
Proceedings paper
Department
Dynamic Classifier Aggregation Using Fuzzy t-conorm Integral
Authors
Štefka, D.; Holeňa, M.
Year
2011
Published in
Proceedings of the 7th International Conference on Signal Image Technology & Internet Based Systems. Los Alamitos: IEEE Computer Society, 2011. p. 126-133. ISBN 978-1-4673-0431-3.
Type
Proceedings paper
Department
Assessing the Usability of Predictions of Different Regression Models.
Authors
Šťastný, J.; Holeňa, M.
Year
2010
Published in
Informačné Technológie - Aplikácie a Teória. Seňa: PONT s.r.o., 2010, pp. 93-98. ISBN 978-80-970179-3-4.
Type
Proceedings paper
Department
Dynamic Classifier Aggregation using Fuzzy Integral with Interaction-Sensitive Fuzzy Measure
Authors
Štefka, D.; Holeňa, M.
Year
2010
Published in
Proceedings of the 2010 10th International Conference on Intelligent Systems Design and Applications. 2010. pp. 225-230. ISBN 978-1-4244-8135-4.
Classifier Aggregation Using Local Classification Confidence.
Authors
Štefka, D.; Holeňa, M.
Year
2009
Published in
ICAART 2009. Setúbal: INSTICC Press, 2009, pp. 173-178. ISBN 978-989-8111-66-1.
Type
Proceedings paper
Department
Dynamic Classifier Systems and their Applications to Random Forest Ensembles
Authors
Štefka, D.; Holeňa, M.
Year
2009
Published in
Adaptive and Natural Computing Algorithms. Heidelberg: Springer, 2009, pp. 458-468. LNCS. ISSN 0302-9743. ISBN 978-3-642-04920-0.
Type
Proceedings paper
Department
Fuzzy Logic and Piecewise-Linear Regression.
Authors
Fröhlich, J.; Holeňa, M.
Year
2008
Published in
ITAT 2008 - Information Technologies - Applications and Theory. Košice: Univerzita P.J.Šafárika, 2008, pp. 35-38. ISBN 978-80-969184-8-5.
Type
Proceedings paper
Department
Classification of EEG Data using Fuzzy k-NN Ensembles
Authors
Štefka, D.; Holeňa, M.
Year
2007
Published in
ITAT 2007. Conference on Theory and Practice of Information Technologies. 2007, pp. 91-94. ISBN 978-80-969184-6-1.
Type
Proceedings paper
Department
The Use of Fuzzy t-conorm Integral for Combining Classifiers.
Authors
Štefka, D.; Holeňa, M.
Year
2007
Published in
Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Berlin: Springer, 2007, pp. 755-766. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-540-75255-4.
Type
Proceedings paper
Department
Use of Mamdani-Assilian Fuzzy Controller for Combining Classifiers
Authors
Štefka, D.; Holeňa, M.
Year
2007
Published in
Sborník semináře MIS 2007. Praha: Matematicko Fyzikální Fakulta, UK, 2007, pp. 88-97. ISBN 978-80-7378-033-3.
Type
Proceedings paper
Department
Using Fuzzy k-NN Ensembles in EEG Data Classification
Authors
Štefka, D.; Holeňa, M.
Year
2007
Published in
Neuroinformatic Databases and Mining of Knowledge of them (Third book on Micro-sleeps). Praha: ČVUT v Praze, Fakulta dopravní, Ústav řidicí techniky a telematiky, 2007. ISBN 978-80-87136-01-0.
Type
Book chapter
Department
Using Fuzzy k-NN Ensembles in EEG Data Classification.
Authors
Štefka, D.; Holeňa, M.
Year
2007
Published in
Neuroinformatic Databases and Mining of Knowledge of Them. Prague: Czech Technical University, 2007. ISBN 978-80-87136-01-0.
Type
Proceedings paper
Department
The Specificity of Neural Networks in Extracting Rules from Data
Authors
Year
2006
Published in
Applied Artificial Intelligence. London: World Scientific, 2006, ISBN 981-256-690-2.
Type
Proceedings paper
Department