Ing. Jaroslav Kuchař, Ph.D.

Anomaly Detection in Log Streams based on Time-Contextual Models

Authors

Fedotov, D.; Kuchař, J.; Vitvar, T.

Year

2025

Published

Web Information Systems Engineering – WISE 2024. Springer Nature Singapore Pte Ltd., 2025. p. 19-29. 1. ISSN 0302-9743. ISBN 978-981-96-0575-0.

Type

Proceedings paper

DOI

10.1007/978-981-96-0576-7_2

Departments

Department of Software Engineering

Annotation

Organisations today heavily rely on complex software systems integrated through multiple layers of middleware. This complexity leads to substantial generation of operational data in structured and semi-structured formats, which is recorded in log files. The workload of the system fluctuates according to specific periods of the day, which impacts the amount and quality of data generated in log files. In this paper, we propose a new log anomaly detection approach that leverages a collection of smaller models designed to capture workload fluctuations over specific time intervals. We demonstrate its effectiveness in detecting anomalies within log streams. Our evaluation uses log data from servers in a production environment, handling a complex back-end system that processes hundreds of requests per second. We show that our method outperforms traditional and widely used anomaly detection methods in data streams in the context of dynamic and time-sensitive workload scenarios.

Can variants, reinfection, symptoms and test types affect COVID-19 diagnostic performance? A large-scale retrospective study of AG-RDTs during circulation of Delta and Omicron variants, Czec

Authors

Kliegr, T.; Jarkovský, J.; Jiřincová, H.; Kuchař, J.; Karel, T.; Chudán, D.; Vojíř, S.; Zavřel, M.; Šanca, O.; Tachezy, R.

Year

2023

Published

Eurosurveillance. 2023, 28(38), 1-14. ISSN 1560-7917.

Type

Article

DOI

10.2807/1560-7917.ES.2023.28.38.2200938

Departments

Department of Software Engineering

Annotation

Background: The sensitivity and specificity of selected antigen detection rapid diagnostic tests (AG-RDTs) for SARS-CoV-2 were determined in the unvaccinated population when the Delta variant was circulating. Viral loads, dynamics, symptoms and tissue tropism differ between Omicron and Delta. Aim: We aimed to compare AG-RDT sensitivity and specificity in selected subgroups during Omicron vs Delta circulation. Methods: We retrospectively paired AG-RDT results with PCRs registered in Czechia’s Information System for Infectious Diseases from 1 to 25 December 2021 (Delta, n = 20,121) and 20 January to 24 February 2022 (Omicron, n = 47,104). Results: When confirmatory PCR was conducted on the same day as AG-RDT as a proxy for antigen testing close to peak viral load, the average sensitivity for Delta was 80.4% and for Omicron 81.4% (p < 0.05). Sensitivity in vaccinated individuals was lower for Omicron (OR = 0.94; 95% confidence interval (CI): 0.87–1.03), particularly in reinfections (OR = 0.83; 95% CI: 0.75–0.92). Saliva AG-RDT sensitivity was below average for both Delta (74.4%) and Omicron (78.4%). Tests on the European Union Category A list had higher sensitivity than tests in Category B. The highest sensitivity for Omicron (88.5%) was recorded for patients with loss of smell or taste; however, these symptoms were almost 10-fold less common than for Delta. The sensitivity of AG-RDTs performed on initially asymptomatic individuals 1, 2 or 3 days before a positive PCR test was consistently lower for Omicron compared with Delta. Conclusion: Sensitivity for Omicron was lower in subgroups that may become more common if SARS-CoV-2 becomes an endemic virus.

Time-Aware Log Anomaly Detection Based on Growing Self-organizing Map

Authors

Fedotov, D.; Kuchař, J.; Vitvar, T.

Year

2023

Published

Service-Oriented Computing. Springer, Cham, 2023. p. 169-177. ISSN 0302-9743. ISBN 978-3-031-48420-9.

Type

Proceedings paper

DOI

10.1007/978-3-031-48421-6_12

Departments

Department of Software Engineering

Annotation

A software system generates extensive log data, reflecting its workload and potential failures during operation. Log anomaly detection algorithms use this data to identify deviations in system behavior, especially when errors occur. Workload patterns can vary with time, depending on factors like the time of day or day of the week, affecting log entry volumes. Thus, it’s essential for log anomaly detection to consider temporal information that captures workload variations. This paper introduces a novel log anomaly detection method that incorporates such time information and demonstrates how smaller models enhance anomaly detection precision. We evaluate this method on a high-throughput production workload of a software system, showcasing its superior performance over conventional log anomaly detection methods.

Role of population and test characteristics in antigen-based SARS-CoV-2 diagnosis, Czechia, August to November 2021

Authors

Kliegr, T.; Jarkovský, J.; Jiřincová, H.; Kuchař, J.; Karel, T.; Tachezy, R.

Year

2022

Published

Eurosurveillance. 2022, 27(33), 1-15. ISSN 1560-7917.

Type

Article

DOI

10.2807/1560-7917.ES.2022.27.33.2200070

Departments

Department of Software Engineering

Annotation

Background: Analyses of diagnostic performance of SARS-CoV-2 antigen rapid diagnostic tests (AG-RDTs) based on long-term data, population subgroups and many AG-RDT types are scarce. Aim: We aimed to analyse sensitivity and specificity of AG-RDTs for subgroups based on age, incidence, sample type, reason for test, symptoms, vaccination status and the AG-RDT’s presence on approved lists. Methods: We included AG-RDT results registered in Czechia’s Information System for Infectious Diseases between August and November 2021. Subpopulations were analysed based on 346,000 test results for which a confirmatory PCR test was recorded ≤ 3 days after the AG-RDT; 38 AG-RDTs with more than 100 PCR-positive and 300 PCR-negative samples were individually evaluated. Results: Average sensitivity and specificity were 72.4% and 96.7%, respectively. We recorded lower sensitivity for age groups 0–12 (65.5%) and 13–18 years (65.3%). The sensitivity level rose with increasing SARS-CoV-2 incidence from 66.0% to 76.7%. Nasopharyngeal samples had the highest sensitivity and saliva the lowest. Sensitivity for preventive reasons was 63.6% vs 86.1% when testing for suspected infection. Sensitivity was 84.8% when one or more symptoms were reported compared with 57.1% for no symptoms. Vaccination was associated with a 4.2% higher sensitivity. Significantly higher sensitivity levels pertained to AG-RDTs on the World Health Organization Emergency Use List (WHO EUL), European Union Common List and the list of the United Kingdom’s Department of Health and Social Care. Conclusion: AG-RDTs from approved lists should be considered, especially in situations associated with lower viral load. Results are limited to the SARS-CoV-2 Delta variant.

Anomaly detection in particulate matter pollution open data

Authors

Podsztavek, O.; Kuchař, J.

Year

2019

Published

DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 66-71. ISBN 978-80-553-3354-0.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

This paper focuses on anomaly detection in open data on particulate matter pollution. Several street lights in Prague (Karlín) provide measurements from various sensors, including PM10. We use two machine learning algorithms for the anomaly detection: linear regression and an LSTM neural network.

Associative Classification in R: arc, arulesCBA, and rCBA

Authors

Hahsler, M.; Johnson, I.; Kliegr, T.; Kuchař, J.

Year

2019

Published

The R Journal. 2019, 11(2), 254-267. ISSN 2073-4859.

Type

Article

DOI

10.32614/RJ-2019-048

Departments

Department of Software Engineering

Annotation

Several methods for creating classifiers based on rules discovered via association rule mining have been proposed in the literature. These classifiers are called associative classifiers and the best-known algorithm is Classification Based on Associations (CBA). Interestingly, only very few implementations are available and, until recently, no implementation was available for R. Now, three packages provide CBA. This paper introduces associative classification, the CBA algorithm, and how it can be used in R. A comparison of the three packages is provided to give the potential user an idea about the advantages of each of the implementations. We also show how the packages are related to the existing infrastructure for association rule mining already available in R.

Content-aware Collaborative Filtering in Point-of-Interest Recommendation Systems

Authors

Samigullina, G.; Kuchař, J.

Year

2019

Published

DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 20-25. ISBN 978-80-553-3354-0.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

With the vast number of users and location-based social networks now available, the problem of POI recommendation has been widely studied and has received significant research attention in recent years. While previous work on POI recommendation mostly focused on investigating spatial, temporal, and social influence, the use of additional content information has not been studied directly. In this paper, we propose a content-aware matrix factorization method that incorporates information on POI attributes and categories. We propose two variants of the algorithm that can work with explicit and implicit feedback. Experimental results show that the proposed method improves the quality of recommendation and outperforms most state-of-the-art collaborative filtering algorithms.

Tuning Hyperparameters of Classification Based on Associations (CBA)

Authors

Kliegr, T.; Kuchař, J.

Year

2019

Published

Proceedings of the 19th Conference Information Technologies - Applications and Theory (ITAT 2019). Aachen: CEUR Workshop Proceedings, 2019. p. 9-16. vol. 2473. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Classification models composed of crisp rules provide excellent explainability. The limitation of many conventional rule learning algorithms is the separate-and-conquer strategy, which may be slow on large data. Association Rule Classifiers (ARC) are an alternative approach that can be very fast on massive datasets but is highly sensitive to the correct choice of metaparameters. Most existing ARC algorithms use default thresholds of 50% for minimum confidence and 1% for minimum support, which can result in excessively long rule generation or underperforming models. Due to the high costs that can be associated with evaluating a single combination, it is impractical to use standard metaparameter optimization approaches. In this paper, we introduce two variant threshold tuning algorithms specifically designed for ARC. Evaluation on 22 standard UCI datasets shows promising results in terms of model size and accuracy in comparison with the default thresholds. The implementation of the proposed algorithms is made available in the R packages rCBA and arc, which are available in the CRAN repository.

Data Mining from Open Fiscal Data

Authors

Chudán, D.; Svátek, V.; Kuchař, J.; Vojíř, S.

Year

2018

Published

Acta Informatica Pragensia. 2018, 7(1), 58-73. ISSN 1805-4951.

Type

Article

DOI

10.18267/j.aip.114

Departments

Department of Software Engineering

Annotation

Data mining methods are increasingly popular, even in domains where there is traditionally limited support by analytical tools and where the analyst's manual work still prevails. Using these methods in the fiscal domain enables deeper analysis and can bring new findings. The deployment of data mining methods is one part of the OpenBudgets.eu project, which focuses on transparency and accountability in public funds management. This overview article summarizes selected experiences of the project's authors from the development, implementation and application of selected data mining methods for mining fiscal data. These methods are integrated into the central platform of the project, which is available to advanced and common users interested in fiscal data analysis. The pilot analysis showed that the problem of data mining in this domain is the large number of rules found, together with their heterogeneous origin.

EasyMiner.eu: Web Framework for Interpretable Machine Learning based on Rules and Frequent Itemsets

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2018

Published

Knowledge-Based Systems. 2018, 150, 111-115. ISSN 0950-7051.

Type

Article

DOI

10.1016/j.knosys.2018.03.006

Departments

Department of Software Engineering

Annotation

EasyMiner (http://www.easyminer.eu) is a web-based machine learning system for interpretable machine learning based on frequent itemsets. The system currently offers association rule learning (apriori, FP-Growth) and classification (CBA). For association rule learning and classification, EasyMiner offers a visual interface designed for interactivity, allowing the user to define a constraining pattern for the mining task. The CBA algorithm can also be used for pruning of the rule set, thus addressing the common problem of “too many rules” on the output, and the implementation supports automatic tuning of confidence and support thresholds. The development version additionally supports anomaly detection (FPI and its variations) and linked data mining (AMIE+). EasyMiner is dockerized, and some of its components are available as open source R packages.

Framework for Distributed Computing on the Web

Authors

Šiller, J.; Kuchař, J.

Year

2018

Published

Proceedings of the 18th Conference Information Technologies - Applications and Theory (ITAT 2018). Aachen: CEUR Workshop Proceedings, 2018. p. 161-167. vol. 2203. ISSN 1613-0073. ISBN 9781727267198.

Type

Proceedings paper

Departments

Department of Software Engineering
Faculty of Information Technology

Annotation

This work is a brief summary of a master's thesis that focuses on the design and implementation of a framework that uses the computers of website visitors as computing nodes through web browsers. It contains an analysis of the Web environment, a summary of previous approaches and projects, and the design and implementation of the framework. The work describes the handling of computing node failure, the reaction to a slow computing node, possibilities for controlling the load of the framework on a website visitor’s computer, strategies for work distribution and the security of the framework. At the end of the work, the experiment results and proposed improvements are listed.

Image search in augmented reality

Authors

Chmelař, P.; Kuchař, J.

Year

2018

Published

Data a znalosti & WIKT. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. p. 111-114. 1. ISBN 978-80-214-5679-2.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Applications for mobile devices have become part of our daily life. One of the recent trends in the development of mobile apps is virtual or augmented reality. The goal of this project is to design, implement and evaluate a system that allows searching in augmented reality. The central part of the system is a server that can identify similarities of an input image with a predefined set of images in a database. The system also includes a library for iOS, a REST API for integrations and a web application for management of image databases.

Spotlighting Anomalies using Frequent Patterns

Authors

Kuchař, J.; Svátek, V.

Year

2018

Published

KDD 2017 Workshop on Anomaly Detection in Finance. Proceedings of Machine Learning Research, 2018. p. 33-42. vol. 71. ISSN 1938-7228.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Approaches for the anomaly detection task based on frequent pattern mining follow the paradigm that the more frequent patterns a data instance contains, the less likely it is to be an anomaly. This concept can be used in the financial industry to reveal contextual anomalies. The main contribution of this paper is an approach that includes a novel formula for computation of anomaly scores. We evaluated the proposed approach on baseline datasets and present a use case on a real world financial dataset. We also propose a way to explain the anomaly to the users. Implementations of the evaluated algorithms and experiments are available online in R.

EasyMiner – Short History of Research and Current Development

Authors

Kliegr, T.; Kuchař, J.; Vojíř, S.; Zeman, V.

Year

2017

Published

ITAT 2017: Information Technologies – Applications and Theory. Aachen: CEUR Workshop Proceedings, 2017. p. 235-239. vol. 1885. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

EasyMiner (easyminer.eu) is an academic data mining project providing data mining of association rules, building of classification models based on association rules and outlier detection based on frequent pattern mining. It differs from other data mining systems by adapting the “web search” paradigm. It is web-based, providing both a REST API and a user interface, and puts emphasis on interactivity, simplicity of user interface and immediate response. This paper will give an overview of research related to the EasyMiner project.

InBeat: JavaScript recommender system supporting sensor input and linked data

Authors

Kuchař, J.; Kliegr, T.

Year

2017

Published

Knowledge-Based Systems. 2017, 135, 40-43. ISSN 0950-7051.

Type

Article

DOI

10.1016/j.knosys.2017.07.026

Departments

Department of Software Engineering

Annotation

Interest Beat (inbeat.eu) is an open source recommender framework that fulfills some of the demands raised by emerging applications that infer ratings from sensor input or use the linked open data cloud for feature expansion. As a recommender algorithm, InBeat uses association rules, which make it possible to explain why a specific recommendation was made. Due to its modular architecture, other algorithms can be easily plugged in. InBeat has a pure JavaScript version, which allows processing to be confined to a client-side device. There is a performance-optimized server-side bundle, which successfully participated in two recent recommender competitions involving large volumes of streaming data. InBeat works on a number of platforms and is also available for Docker.

News Recommender System based on Association Rules @ CLEF NewsREEL 2017

Authors

Golian, C.; Kuchař, J.

Year

2017

Published

Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum. Aachen: CEUR Workshop Proceedings, 2017. vol. 1866. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering
Faculty of Information Technology

Annotation

Digital editions of newspapers cause information overflow, and users have problems choosing what they want to read. Systems which recommend news articles are suitable to solve such problems. Nevertheless, they face challenges unknown to systems recommending books or movies, such as the frequency with which new content is produced. The CLEF NewsREEL challenge makes it possible to compare and evaluate news recommendation systems in an online and an offline task, focused on recommending articles to real users and on tuning of algorithms, respectively. This paper deals with an approach based on association rules acting as a classifier. In our approach we experimented with settings that allow reducing the number of rules used for the classification and increasing the performance that is crucial for real recommendations. We evaluated our approach in both tasks of the CLEF NewsREEL 2017 challenge.

Outlier (Anomaly) Detection Modelling in PMML

Authors

Kuchař, J.; Ashenfelter, A.; Kliegr, T.

Year

2017

Published

RuleML+RR 2017 - Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters. Aachen: CEUR Workshop Proceedings, 2017. vol. 1875. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

PMML is an industry-standard XML-based open format for representing statistical and data mining models. Since PMML does not yet support outlier (anomaly) detection, in this paper we propose a new outlier detection model to foster interoperability in this emerging field. Our proposal is included in the PMML RoadMap for PMML 4.4. We demonstrate the proposed format on one supervised and two unsupervised outlier detection approaches: association rule-based classifier CBA, frequent-pattern based method FPOF and isolation forests.

Recommending News Articles using Rule-based Classifier

Authors

Golian, C.; Kuchař, J.

Year

2017

Published

Data a znalosti 2017. Plzeň: Západočeská univerzita v Plzni, 2017. p. 51-55. ISBN 978-80-261-0720-0.

Type

Proceedings paper

Departments

Department of Software Engineering
Faculty of Information Technology

Annotation

In this paper we summarize our experiments with a rule-based classifier as a recommender within the CLEF NewsREEL 2017 challenge. Systems that recommend news articles are suitable to solve information overflow in digital editions of newspapers, when users have problems choosing what they want to read. They face challenges unknown to systems recommending books or movies, such as the frequency with which new content is produced. This paper deals with an approach based on association rules acting as a classifier. In our approach we experimented with settings that allow reducing the number of rules used for the classification and increasing the performance that is crucial for real recommendations.

Using EasyMiner API for Financial Data Analysis in the OpenBudgets.eu Project

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2017

Published

RuleML+RR 2017 - Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters. Aachen: CEUR Workshop Proceedings, 2017. vol. 1875. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

This paper presents a use case for the data mining system EasyMiner in the European project OpenBudgets.eu, which is concerned with publication and analysis of financial data of municipalities. EasyMiner is a web-based data mining system. This paper focuses on its new outlier detection functionality, which relies on frequent pattern mining. In addition, the system supports association rule discovery and building of rule-based classification models. The system exposes a REST API and can thus be easily integrated into third party applications.

Using EasyMiner API in the OpenBudgets.eu Project

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2017

Published

Data a znalosti 2017. Plzeň: Západočeská univerzita v Plzni, 2017. p. 56-60. ISBN 978-80-261-0720-0.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

With the increasing popularity of data mining, there is a growing effort to integrate data mining algorithms and systems into user-friendly applications and information systems. This paper introduces a new version of the web-based data mining system EasyMiner and its integration into a software solution developed within the European project OpenBudgets.eu. This project is aimed at publication and analysis of financial data of municipalities. The current version of EasyMiner supports mining of association rules, building of classification models and, newly, outlier detection. Its functionality is available not only via a graphical user interface, but also via a REST API. The API can also be easily used from third party applications.

Analysis of Czech news articles

Authors

Filipová, M.; Kuchař, J.

Year

2016

Published

Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 97-101. ISBN 978-80-227-4619-9.

Type

Proceedings paper

Departments

Department of Software Engineering
Faculty of Information Technology

Annotation

Nowadays, when the amount of information on the internet continues to grow, automatic processing and analysis of data has become very important. Online news services are one of the domains in which a significant amount of diverse as well as similar information exists. The goal of this work is to create a tool for analysis of Czech news articles. The first part is a crawler which allows downloading articles for analysis from news servers. In the second part, relevant content of articles and their other attributes are extracted from downloaded HTML pages. The third part is a text analysis for which modules for extraction of named entities and for sentiment analysis of Czech texts have been created. We performed experiments on four of the most visited Czech news portals. The results show that the presented approach is suitable for analysis of news articles.

Association Rules Mining for Big Data in Cloud

Authors

Zeman, V.; Vojíř, S.; Kuchař, J.; Kliegr, T.

Year

2016

Published

Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 259-263. ISBN 978-80-227-4619-9.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

EasyMiner is a web service for association rule mining. A new version of this tool uses Apache Hadoop and Apache Spark for big data processing in the MetaCloud of the CESNET association. The application consists of several services for dataset uploading into HDFS, preprocessing, association rule discovery and classification based on associations. All services communicate with each other through REST APIs and form a complex software system that works as a service in the cloud.

Exploiting Temporal Dimension in Tensor-Based Link Prediction

Authors

Kuchař, J.; Dojčinovski, M.; Vitvar, T.

Year

2016

Published

Web Information Systems and Technologies. Cham: Springer International Publishing, 2016. pp. 211-231. Lecture Notes in Business Information Processing. ISSN 1865-1348. ISBN 978-3-319-30995-8.

Type

Invited/Awarded proceedings paper

DOI

10.1007/978-3-319-30996-5_11

Departments

Department of Software Engineering

Annotation

In recent years, there has been significant interest in link prediction, an important task for graph-based data structures. Although there exist many approaches based on graph theory and factorizations, there is still a lack of methods that can work with multiple types of links and temporal information. The creation time of a link is an important aspect: it reflects the age and credibility of the information. In this paper, we introduce a method that predicts missing links in RDF datasets. We model multiple relations of RDF as a tensor that also incorporates the creation time of links as a key component. We evaluate the proposed approach on real world datasets: an RDF representation of the ProgrammableWeb directory and a subset of DBpedia focused on movies. The results show that the proposed method outperforms other link prediction approaches.

Augmenting a Feature Set of Movies Using Linked Open Data

Authors

Kuchař, J.

Year

2015

Published

Rule Challenge and Doctoral Consortium @ RuleML 2015. Aachen: CEUR Workshop Proceedings, 2015. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Augmenting a feature set using mappings to the Web of data is an up-and-coming way to enrich data in the original dataset. Those enrichments are valuable especially for the recent preference learning algorithms and recommender systems. In this paper, we describe the process of mapping and augmenting the movie ratings dataset MovieTweetings from the perspective of RecSysRules 2015 Challenge. The ad-hoc queries to DBpedia are used as an underlying concept. To the best of our knowledge, there is no existing mapping dataset of movies for MovieTweetings. We also provide a brief discussion about the benefits of the augmented feature set for an elementary rule-based representation of the user preferences.

Benchmark of Rule-Based Classifiers in the News Recommendation Task

Authors

Kliegr, T.; Kuchař, J.

Year

2015

Published

Experimental IR Meets Multilinguality, Multimodality, and Interaction - 6th International Conference of the CLEF Association. Berlin: Springer-Verlag, 2015. p. 130-141. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-24026-8.

Type

Proceedings paper

DOI

10.1007/978-3-319-24027-5_11

Departments

Department of Software Engineering

Annotation

In this paper, we present experiments evaluating Association Rule Classification algorithms on on-line and off-line recommender tasks of the CLEF NewsReel 2014 Challenge. The second focus of the experimental evaluation is to investigate possible performance optimizations of the Classification Based on Associations algorithm. Our findings indicate that pruning steps in CBA reduce the number of association rules substantially while not affecting accuracy. Using only part of the data employed for the rule learning phase in the pruning phase may also reduce training time while not affecting accuracy significantly.

EasyMiner/R: Web Interface for Rule Learning and Classification in R

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2015

Published

Rule Challenge and Doctoral Consortium @ RuleML 2015. Aachen: CEUR Workshop Proceedings, 2015. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

EasyMiner is a web-based visual interface for association rule learning. This paper presents a preview of the next release, which uses the R environment as the data processing backend. EasyMiner/R uses the arules package to learn rules. It uses the Classification Based on Associations (CBA) algorithm as a classifier and to perform rule pruning. Experimental results show that EasyMiner with the R-based backend is able to handle larger datasets than the previous version.

Time-aware Link Prediction in RDF Graphs

Authors

Kuchař, J.; Dojčinovski, M.; Vitvar, T.

Year

2015

Published

WEBIST 2015 - Proceedings of the 11th International Conference on Web Information Systems and Technologies. Madeira: SciTePress, 2015. ISBN 978-989-758-106-9.

Type

Proceedings paper

DOI

10.5220/0005428403900401

Departments

Department of Software Engineering

Annotation

When a link is not explicitly present in an RDF dataset, it does not mean that the link could not exist in reality. Link prediction methods try to overcome this problem by finding new links in the dataset with the support of background knowledge about the links that already exist in the dataset. In dynamic environments that change often and evolve over time, link prediction methods should also take into account the temporal aspects of data. In this paper, we present a novel time-aware link prediction method. We model RDF data as a tensor and take into account the time when the RDF data was created. We use an ageing function to model the retention of information over time: it lowers the significance of older information and promotes more recent information. Our evaluation shows that the proposed method improves the quality of predictions when compared with methods that do not consider the time information.

Bag-of-Entities text representation for client-side (video) recommender systems

Authors

Kuchař, J.; Kliegr, T.

Year

2014

Published

RecSysTV 2014. 2014.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Client-side execution of a recommender system requires enrichment of the content delivered to the user with a list of potentially related content. A possible bottleneck for client-side recommendation is the data volume entailed by transferring the feature set describing each content item to the client, and the computational resources needed to process this feature set. This paper investigates whether representing the textual content (e.g. of videos) with a Bag of Entities (BoE) vector generated by a wikifier can yield a classifier with the same accuracy at a smaller size than the standard bag-of-words (BoW) approach. Experimental evaluation performed on the Reuters-21578 text categorization collection shows that there is a small improvement for small term vector sizes.

InBeat: Recommender System as a Service

Authors

Kuchař, J.; Kliegr, T.

Year

2014

Published

CLEF2014 Working Notes. Tilburg: CEUR Workshop Proceedings, 2014. p. 837-844. CLEF. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

Interest Beat (inbeat.eu) is a service for recommendation of content. InBeat was designed with emphasis on versatility, scalability and extensibility. The core contains the General Analytics INterceptor module, which collects and aggregates user interactions, the Preference Learning module and the Recommender System module. In this paper, we describe the general architecture of InBeat, with emphasis on the high-performance architecture that was used in the CLEF-NEWSREEL: News Recommendation Evaluation Lab.

KINterestTV - Towards Non-invasive Measure of User Interest While Watching TV

Authors

Leroy, J.; Rocca, F.; Mancas, M.; Madhkour, R.B.; Grisard, F.; Kliegr, T.; Kuchař, J.; Vit, J.; Pirner, I.; Zimmermann, P.

Year

2014

Published

Innovative and Creative Developments in Multimodal Interaction Systems. Berlin: Springer, 2014. pp. 179-199. IFIP Advances in Information and Communication Technology. ISSN 1868-4238. ISBN 978-3-642-55142-0.

Type

Proceedings paper

DOI

10.1007/978-3-642-55143-7_8

Departments

Department of Software Engineering

Annotation

Is it possible to determine a user's interest in media content only by observing the user's behavior? The aim of this project is to develop an application that can detect whether or not a user is viewing content on the TV, and use this information to build the user profile and make it evolve dynamically. Our approach is based on the use of a 3D sensor to study the movements of a user's head in order to make an implicit analysis of the user's behavior. This behavior is synchronized with the TV content (media fragments) and other user interactions (clicks, gestural interaction) to further infer the viewer's interest. Our approach is tested in an experiment simulating the attention changes of a user in a scenario involving second screen (tablet) interaction, a behavior that has become common for spectators and a typical source of attention switches.

Learning Business Rules with Association Rule Classifiers

Authors

Kliegr, T.; Kuchař, J.; Sottara, D.; Vojíř, S.

Year

2014

Published

Rules on the Web. From Theory to Applications. Cham: Springer International Publishing AG, 2014. p. 236-250. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-09869-2.

Type

Proceedings paper

DOI

10.1007/978-3-319-09870-8_18

Departments

Department of Software Engineering

Annotation

The main obstacles to a straightforward use of association rules as candidate business rules are the excessive number of rules discovered even on small datasets, and the fact that contradicting rules are generated. This paper shows that Association Rule Classification (ARC) algorithms, such as CBA, solve both of these problems, and provides a practical guide on using discovered rules in the Drools BRMS and on setting the ARC parameters. Experiments performed with modified CBA on several UCI datasets indicate that data coverage rule pruning keeps the number of rules manageable, while not adversely impacting the accuracy. The best results in terms of overall accuracy are obtained using minimum support and confidence thresholds. Disjunction between attribute values seems to provide a desirable balance between accuracy and rule count, while negated literals have not been found beneficial.
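The support and confidence thresholds mentioned in this annotation apply to two standard rule measures, which the sketch below computes on a tiny hand-made dataset. The rows, the attribute names, and the helper functions `support` and `confidence` are hypothetical and only illustrate the measures; this is not CBA itself and not the modified CBA evaluated in the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by antecedent support."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires on this data
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical rows, each encoded as a set of attribute=value items.
rows = [
    {"age=young", "income=low", "class=reject"},
    {"age=young", "income=high", "class=accept"},
    {"age=old", "income=high", "class=accept"},
    {"age=old", "income=low", "class=reject"},
    {"age=old", "income=high", "class=accept"},
]

# Candidate rule: income=high -> class=accept
rule_support = support(rows, {"income=high", "class=accept"})
rule_confidence = confidence(rows, {"income=high"}, {"class=accept"})
print(rule_support, rule_confidence)
```

A rule miner keeps only the rules whose support and confidence reach the chosen minimum thresholds, which is why those two settings directly control how many candidate rules reach the pruning step.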

Orwellian Eye: Video Recommendation with Microsoft Kinect

Authors

Kliegr, T.; Kuchař, J.

Year

2014

Published

ECAI 2014. Amsterdam: IOS Press, 2014. pp. 1227-1228. Frontiers in Artificial Intelligence and Applications. ISSN 0922-6389. ISBN 978-1-61499-418-3.

Type

Proceedings paper

DOI

10.3233/978-1-61499-419-0-1227

Departments

Department of Software Engineering

Annotation

This paper demonstrates Interest Beat (InBeat.eu) as a recommender system for online videos, which determines user interest in the content based on gaze tracking with Microsoft Kinect in addition to explicit user feedback. The content of the videos is represented using a semantic wikifier. The user profile is constructed from preference rules, which are discovered with an association rule learner.

Recommendation of multimedia content with usage of Microsoft Kinect

Authors

Kuchař, J.; Kliegr, T.

Year

2014

Published

Proceedings of the 13th Annual Conference Znalosti 2014. Praha: VŠE, 2014. pp. 84-87. ISBN 978-80-245-2054-4.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

This paper presents an online recommender system: InBeat.eu. The system focuses on collecting implicit and explicit feedback from users. We demonstrate the system on semantically annotated multimedia content in a TV news scenario. As its main input, it uses a Microsoft Kinect sensor to obtain the head pose of a viewer and evaluate the viewer's interest. The collected interactions of a user are aggregated and processed by an association rule mining algorithm. The system represents the user profile using association rules. Extracted rules are used to recommend new videos to viewers.

When TV meets the Web: towards personalised digital media

Authors

Tsatsou, D.; Mancas, M.; Kuchař, J.; Nixon, L.; Vacura, M.; Leroy, J.; Rocca, F.; Mezaris, V.

Year

2014

Published

Semantic Multimedia Analysis and Processing. Boca Raton: CRC Press, 2014. p. 221-256. Digital Imaging and Computer Vision. ISBN 978-1-4665-7549-3.

Type

Book chapter

Departments

Department of Software Engineering

Annotation

The rise of new paradigms in the field of television and digital media distribution (e.g. Smart TV, IPTV, Social TV) has opened a new digital world of data communication opportunities, but at the same time it has exacerbated the information overload problem for media consumers and providers. Therefore, the need for personalized content delivery has extended from the traditional web to the networked media domain. This chapter presents comprehensive research in the field of capturing and representing user preferences and context, and an overview of relevant digital media-specific personalized recommendation techniques. Subsequently, it describes the vision and first personalization approach adopted within the LinkedTV EU project for profiling and contextualizing users and providing targeted information and content in a linked media environment.

GAIN: web service for user tracking and preference learning - a smart TV use case

Authors

Kuchař, J.; Kliegr, T.

Year

2013

Published

RecSys '13 Proceedings of the 7th ACM conference on Recommender systems. New York: ACM, 2013. pp. 467-468. ISBN 978-1-4503-2409-0.

Type

Proceedings paper

DOI

10.1145/2507157.2508217

Departments

Department of Software Engineering

Annotation

GAIN (inbeat.eu) is a web application and service for capturing and preprocessing user interactions with semantically described content. GAIN outputs a set of instances in tabular form suitable for further processing with generic machine-learning algorithms. GAIN is demoed as a component of a "SMART-TV" recommender system. Content is automatically described with DBpedia types using a Named Entity Recognition (NER) system. Interest is determined based on explicit user actions and on the user's attention, computed by 3D head pose estimation. Preference rules are learnt with an association rule mining algorithm. These can be deployed, for example, to a business rules system acting as a recommender.

GAIN: Analysis of Implicit Feedback on Semantically Annotated Content

Authors

Kuchař, J.; Kliegr, T.

Year

2012

Published

WIKT 2012: 7th Workshop on Intelligent and Knowledge Oriented Technologies. Slovenská technická univerzita v Bratislave, 2012. pp. 75-78. ISBN 978-80-227-3812-5.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

The trend in application development is to provide a personalized interface. The availability of the user preference level associated with user actions is the key to the personalization process. This paper describes a "work-in-progress" framework for deriving user preference from actions performed on semantically annotated objects, be it web pages or TV news. The preference level is computed using supervised learning with genetic programming from implicit feedback, which might be time on page for the web domain, or the user engagement level for the TV domain. We provide a tool called GAIN (General Analytics INterceptor) covering the whole approach at wa.vse.cz.

Personalised Graph-Based Selection of Web APIs

Authors

Dojčinovski, M.; Kuchař, J.; Vitvar, T.; Zaremba, M.

Year

2012

Published

The Semantic Web -- ISWC 2012. Heidelberg: Springer-Verlag, GmbH, 2012. p. 34-48. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-35175-4.

Type

Proceedings paper

DOI

10.1007/978-3-642-35176-1_3

Departments

Department of Software Engineering

Annotation

Modelling and understanding various contexts of users is important to enable personalised selection of Web APIs in directories such as Programmable Web. Currently, relationships between users and Web APIs are not clearly understood and utilized by existing selection approaches. In this paper, we present a semantic model of a Web API directory graph that captures relationships among entities such as Web APIs, mashups, developers, and categories. We describe a novel configurable graph-based method for selection of Web APIs with personalised and temporal aspects. The method gives users more control over their preferences and recommended Web APIs, while allowing them to exploit information about their social links and preferences. We evaluate the method on a real-world dataset from ProgrammableWeb.com, and show that it provides more contextualised results than currently available popularity-based rankings.

Learning Semantic Web Usage Profiles by Using Genetic Algorithms

Authors

Kuchař, J.; Jelínek, I.

Year

2011

Published

International Journal on Information Technologies and Security. 2011, 3(4), 3-20. ISSN 1313-8251.

Type

Article

Departments

Department of Software Engineering

Annotation

A web usage profile is very important in recommender systems. Even more interesting is a semantically enriched profile, which can describe visitor intents using ontologies and express more information and relations about the visitor's character. Our research is based on processing a semantically enriched clickstream and applying a scoring algorithm based on symbolic regression. The semantic enrichment uses Linked Data principles. The scoring assigns each pageview a value that represents the visitor's interests. Scoring involves all known attributes of each pageview, including semantic annotation. The score of each pageview is used to establish a visitor profile. The established profile can be in the form of ontologies. In this paper, we propose to integrate the scoring algorithm into semantic web usage mining and to publish the visitor profile in an RDF/OWL representation. We suggest merging the profiles from different web sites and integrating additional related information from publicly available reso
