Ing. Jaroslav Kuchař, Ph.D.

Anomaly Detection in Log Streams based on Time-Contextual Models

Authors

Fedotov, D.; Kuchař, J.; Vitvar, T.

Year

2025

Published in

Web Information Systems Engineering – WISE 2024. Springer Nature Singapore Pte Ltd., 2025. p. 19-29. 1. ISSN 0302-9743. ISBN 978-981-96-0575-0.

Type

Proceedings paper

DOI

10.1007/978-981-96-0576-7_2

Department

Department of Software Engineering

Abstract

Organisations today heavily rely on complex software systems integrated through multiple layers of middleware. This complexity leads to substantial generation of operational data of structured and semi-structured formats which is recorded in log files. The workload of the system fluctuates according to specific periods of the day which impacts the amount and quality of data generated in log files. In this paper, we propose a new log anomaly detection approach that leverages a collection of smaller models designed to capture workload fluctuations over specific time intervals. We demonstrate its effectiveness in detecting anomalies within log streams. Our evaluation uses log data from servers in a production environment, handling a complex back-end system that processes hundreds of requests per second. We show that our method outperforms traditional and widely used anomaly detection methods in data streams in the context of dynamic and time-sensitive workload scenarios.
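
The idea of keeping a separate small model per time interval can be illustrated with a deliberately simplified sketch. It assumes one mean and standard deviation baseline per hour of the day and a z-score test; the class name `HourlyBaselines`, the threshold and the toy numbers are hypothetical and are not the models used in the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


class HourlyBaselines:
    """One tiny baseline (mean, std) per hour of day; a stand-in for per-interval models."""

    def __init__(self, z_threshold=3.0):
        self.z_threshold = z_threshold  # assumed cut-off, not taken from the paper
        self._stats = {}

    def fit(self, observations):
        """observations: iterable of (hour_of_day, log_lines_per_minute)."""
        buckets = defaultdict(list)
        for hour, count in observations:
            buckets[hour].append(count)
        self._stats = {h: (mean(v), pstdev(v)) for h, v in buckets.items()}
        return self

    def is_anomalous(self, hour, count):
        """Compare a new observation only with the baseline of its own hour."""
        if hour not in self._stats:
            raise ValueError(f"no baseline for hour {hour}")
        mu, sigma = self._stats[hour]
        if sigma == 0:
            return count != mu
        return abs(count - mu) / sigma > self.z_threshold


# Hypothetical training data: quiet at 03:00, busy at 14:00.
history = [(3, 10), (3, 12), (3, 11), (3, 9),
           (14, 1000), (14, 1100), (14, 950), (14, 1050)]
model = HourlyBaselines().fit(history)

night_alarm = model.is_anomalous(3, 1000)   # daytime volume at night
day_alarm = model.is_anomalous(14, 1000)    # the same volume during the day
```

In this toy setting the same log volume is flagged at night but accepted during the day, which is the kind of time-dependent behaviour a single global model would miss.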

Can variants, reinfection, symptoms and test types affect COVID-19 diagnostic performance? A large-scale retrospective study of AG-RDTs during circulation of Delta and Omicron variants, Czec

Authors

Kliegr, T.; Jarkovský, J.; Jiřincová, H.; Kuchař, J.; Karel, T.; Chudán, D.; Vojíř, S.; Zavřel, M.; Šanca, O.; Tachezy, R.

Year

2023

Published in

Eurosurveillance. 2023, 28(38), 1-14. ISSN 1560-7917.

Type

Article

DOI

10.2807/1560-7917.ES.2023.28.38.2200938

Department

Department of Software Engineering

Abstract

Background: The sensitivity and specificity of selected antigen detection rapid diagnostic tests (AG-RDTs) for SARS-CoV-2 were determined in the unvaccinated population when the Delta variant was circulating. Viral loads, dynamics, symptoms and tissue tropism differ between Omicron and Delta. Aim: We aimed to compare AG-RDT sensitivity and specificity in selected subgroups during Omicron vs Delta circulation. Methods: We retrospectively paired AG-RDT results with PCRs registered in Czechia’s Information System for Infectious Diseases from 1 to 25 December 2021 (Delta, n = 20,121) and 20 January to 24 February 2022 (Omicron, n = 47,104). Results: When confirmatory PCR was conducted on the same day as AG-RDT as a proxy for antigen testing close to peak viral load, the average sensitivity for Delta was 80.4% and for Omicron 81.4% (p < 0.05). Sensitivity in vaccinated individuals was lower for Omicron (OR = 0.94; 95% confidence interval (CI): 0.87–1.03), particularly in reinfections (OR = 0.83; 95% CI: 0.75–0.92). Saliva AG-RDT sensitivity was below average for both Delta (74.4%) and Omicron (78.4%). Tests on the European Union Category A list had higher sensitivity than tests in Category B. The highest sensitivity for Omicron (88.5%) was recorded for patients with loss of smell or taste; however, these symptoms were almost 10-fold less common than for Delta. The sensitivity of AG-RDTs performed on initially asymptomatic individuals done 1, 2 or 3 days before a positive PCR test was consistently lower for Omicron compared with Delta. Conclusion: Sensitivity for Omicron was lower in subgroups that may become more common if SARS-CoV-2 becomes an endemic virus.

Time-Aware Log Anomaly Detection Based on Growing Self-organizing Map

Authors

Fedotov, D.; Kuchař, J.; Vitvar, T.

Year

2023

Published in

Service-Oriented Computing. Springer, Cham, 2023. p. 169-177. ISSN 0302-9743. ISBN 978-3-031-48420-9.

Type

Proceedings paper

DOI

10.1007/978-3-031-48421-6_12

Department

Department of Software Engineering

Abstract

A software system generates extensive log data, reflecting its workload and potential failures during operation. Log anomaly detection algorithms use this data to identify deviations in system behavior, especially when errors occur. Workload patterns can vary with time, depending on factors like the time of day or day of the week, affecting log entry volumes. Thus, it’s essential for log anomaly detection to consider temporal information that captures workload variations. This paper introduces a novel log anomaly detection method that incorporates such time information and demonstrates how smaller models enhance anomaly detection precision. We evaluate this method on a high-throughput production workload of a software system, showcasing its superior performance over conventional log anomaly detection methods.

Role of population and test characteristics in antigen-based SARS-CoV-2 diagnosis, Czechia, August to November 2021

Authors

Kliegr, T.; Jarkovský, J.; Jiřincová, H.; Kuchař, J.; Karel, T.; Tachezy, R.

Year

2022

Published in

Eurosurveillance. 2022, 27(33), 1-15. ISSN 1560-7917.

Type

Article

DOI

10.2807/1560-7917.ES.2022.27.33.2200070

Department

Department of Software Engineering

Abstract

Background: Analyses of diagnostic performance of SARS-CoV-2 antigen rapid diagnostic tests (AG-RDTs) based on long-term data, population subgroups and many AG-RDT types are scarce. Aim: We aimed to analyse sensitivity and specificity of AG-RDTs for subgroups based on age, incidence, sample type, reason for test, symptoms, vaccination status and the AG-RDT’s presence on approved lists. Methods: We included AG-RDT results registered in Czechia’s Information System for Infectious Diseases between August and November 2021. Subpopulations were analysed based on 346,000 test results for which a confirmatory PCR test was recorded ≤ 3 days after the AG-RDT; 38 AG-RDTs with more than 100 PCR-positive and 300 PCR-negative samples were individually evaluated. Results: Average sensitivity and specificity were 72.4% and 96.7%, respectively. We recorded lower sensitivity for age groups 0–12 (65.5%) and 13–18 years (65.3%). The sensitivity level rose with increasing SARS-CoV-2 incidence from 66.0% to 76.7%. Nasopharyngeal samples had the highest sensitivity and saliva the lowest. Sensitivity for preventive reasons was 63.6% vs 86.1% when testing for suspected infection. Sensitivity was 84.8% when one or more symptoms were reported compared with 57.1% for no symptoms. Vaccination was associated with a 4.2% higher sensitivity. Significantly higher sensitivity levels pertained to AG-RDTs on the World Health Organization Emergency Use List (WHO EUL), European Union Common List and the list of the United Kingdom’s Department of Health and Social Care. Conclusion: AG-RDTs from approved lists should be considered, especially in situations associated with lower viral load. Results are limited to the SARS-CoV-2 Delta variant.
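
For readers unfamiliar with the two metrics, the following minimal sketch shows how sensitivity and specificity follow from paired antigen and PCR results when PCR is treated as the reference standard. The function name and the toy data are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for (antigen_positive, pcr_positive) pairs.

    PCR is treated as the reference standard.
    """
    tp = fp = tn = fn = 0
    for antigen_positive, pcr_positive in pairs:
        if pcr_positive:
            if antigen_positive:
                tp += 1  # true positive
            else:
                fn += 1  # false negative
        else:
            if antigen_positive:
                fp += 1  # false positive
            else:
                tn += 1  # true negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one PCR-positive and one PCR-negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 10 PCR-positive and 20 PCR-negative samples.
toy_pairs = (
    [(True, True)] * 8 + [(False, True)] * 2
    + [(False, False)] * 19 + [(True, False)] * 1
)
sens, spec = sensitivity_specificity(toy_pairs)
```

Subgroup analyses such as those in the abstract amount to applying the same computation to a filtered subset of the pairs.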

Associative Classification in R: arc, arulesCBA, and rCBA

Authors

Hahsler, M.; Johnson, I.; Kliegr, T.; Kuchař, J.

Year

2019

Published in

The R Journal. 2019, 11(2), 254-267. ISSN 2073-4859.

Type

Article

DOI

10.32614/RJ-2019-048

Department

Department of Software Engineering

Abstract

Several methods for creating classifiers based on rules discovered via association rule mining have been proposed in the literature. These classifiers are called associative classifiers and the best-known algorithm is Classification Based on Associations (CBA). Interestingly, only very few implementations are available and, until recently, no implementation was available for R. Now, three packages provide CBA. This paper introduces associative classification, the CBA algorithm, and how it can be used in R. A comparison of the three packages is provided to give the potential user an idea about the advantages of each of the implementations. We also show how the packages are related to the existing infrastructure for association rule mining already available in R.

Content-aware Collaborative Filtering in Point-of-Interest Recommendation Systems

Authors

Samigullina, G.; Kuchař, J.

Year

2019

Published in

DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 20-25. ISBN 978-80-553-3354-0.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

With the availability of a vast number of users and location-based social networks, the problem of POI recommendation has been widely studied and has received significant research attention in recent years. While previous work on POI recommendation has mostly focused on investigating spatial, temporal, and social influence, the use of additional content information has not been directly studied. In this paper, we propose a content-aware matrix factorization method that incorporates POI attributes and category information. We propose two variants of the algorithm that can work with explicit and implicit feedback. Experimental results show that the proposed method improves the quality of recommendation and outperforms most state-of-the-art collaborative filtering algorithms.

Detekce anomálií v otevřených datech o znečištění ovzduší polétavým prachem [Anomaly Detection in Open Data on Particulate Matter Air Pollution]

Authors

Podsztavek, O.; Kuchař, J.

Year

2019

Published in

DATA A ZNALOSTI & WIKT 2019. Košice: Technická univerzita v Košiciach, 2019. p. 66-71. ISBN 978-80-553-3354-0.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

The sensor network built into the public lighting on Karlínské náměstí in Prague provides measurements of PM10 particulate matter air pollution as open data. In this work, we detect anomalies in these data using machine learning algorithms for time series prediction combined with thresholding. The aim is for the machine learning algorithm to learn the regularities in the data, so that anything unexpected is revealed by thresholding. We experimented with linear regression and an LSTM recurrent neural network, which we compared using the mean squared error. Linear regression predicting from the last two measurements turned out to achieve better results. We detected anomalies from the differences between predicted and actual values. We computed the anomaly detection threshold from the histogram of the differences between the predictions and the actually measured values. Testing showed that the proposed method can reveal some anomalies in the PM10 measurements, but it fails to detect many others (for example, gradually rising ones).
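
The prediction-plus-thresholding scheme described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the series is synthetic, the function names are hypothetical, and the threshold is taken as the 99th percentile of training residuals as a stand-in for the histogram-based threshold of the paper.

```python
import numpy as np


def lagged_design(series: np.ndarray) -> np.ndarray:
    """Rows [x[t-1], x[t-2], 1] for t = 2 .. len(series) - 1."""
    return np.column_stack([series[1:-1], series[:-2], np.ones(len(series) - 2)])


def fit_two_lag_regression(series: np.ndarray) -> np.ndarray:
    """Least-squares fit of x[t] ~ a * x[t-1] + b * x[t-2] + c."""
    coef, *_ = np.linalg.lstsq(lagged_design(series), series[2:], rcond=None)
    return coef


def abs_residuals(series: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Absolute difference between measured and predicted values."""
    return np.abs(series[2:] - lagged_design(series) @ coef)


def detect_anomalies(series: np.ndarray, coef: np.ndarray, threshold: float) -> np.ndarray:
    """Indices in `series` whose prediction error exceeds the threshold."""
    return np.flatnonzero(abs_residuals(series, coef) > threshold) + 2


# Synthetic stand-in for PM10 measurements (not real sensor data).
rng = np.random.default_rng(0)
t = np.arange(300)
clean = 20.0 + 5.0 * np.sin(t / 10.0) + rng.normal(0.0, 0.5, size=t.size)
train, test = clean[:200], clean[200:].copy()
test[50] += 15.0  # injected spike

coef = fit_two_lag_regression(train)
threshold = float(np.quantile(abs_residuals(train, coef), 0.99))  # assumed rule
anomalies = detect_anomalies(test, coef, threshold)
```

A sudden spike produces a large one-step prediction error and is caught, whereas a slowly drifting anomaly keeps each one-step error small, which is consistent with the limitation the abstract reports.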

Tuning Hyperparameters of Classification Based on Associations (CBA)

Authors

Kliegr, T.; Kuchař, J.

Year

2019

Published in

Proceedings of the 19th Conference Information Technologies - Applications and Theory (ITAT 2019). Aachen: CEUR Workshop Proceedings, 2019. p. 9-16. vol. 2473. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

Classification models composed of crisp rules provide excellent explainability. A limitation of many conventional rule learning algorithms is the separate-and-conquer strategy, which may be slow on large data. Association Rule Classifiers (ARC) are an alternative approach that can be very fast on massive datasets but is highly sensitive to the choice of metaparameters. Most existing ARC algorithms use default thresholds of 50% for minimum confidence and 1% for minimum support, which can result in excessively long rule generation or underperforming models. Due to the high cost that can be associated with evaluating a single combination, it is impractical to use standard metaparameter optimization approaches. In this paper, we introduce two variant threshold tuning algorithms specifically designed for ARC. Evaluation on 22 standard UCI datasets shows promising results in terms of model size and accuracy in comparison with the default thresholds. The implementation of the proposed algorithms is made available in the R packages rCBA and arc, which are available in the CRAN repository.

Dolování z otevřených dat o rozpočtech a výdajích [Mining Open Data on Budgets and Spending]

Authors

Chudán, D.; Svátek, V.; Kuchař, J.; Vojíř, S.

Year

2018

Published in

Acta Informatica Pragensia. 2018, 7(1), 58-73. ISSN 1805-4951.

Type

Article

DOI

10.18267/j.aip.114

Department

Department of Software Engineering

Abstract

Data mining methods are being applied to an ever greater extent, including in domains that traditionally lack strong support from analytical tools and where manual work by an analyst prevails. Using these methods on fiscal data enables deeper analysis and may bring new findings. The deployment of advanced data mining methods is one part of the OpenBudgets.eu project, which focuses on transparency and accountability in the handling of public funds. This survey article summarizes some of the authors' experience from this project, gained during the development, implementation and application of selected methods for mining fiscal data, in particular anomaly detection and association rule mining. These methods are integrated into the project's central platform, which is available to both advanced and ordinary users interested in analysing fiscal data. Pilot analyses showed that the problems of data mining analysis in this domain are the large volume of discovered rules and the diverse origins from which they arise.

EasyMiner.eu: Web Framework for Interpretable Machine Learning based on Rules and Frequent Itemsets

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2018

Published in

Knowledge-Based Systems. 2018, 150, 111-115. ISSN 0950-7051.

Type

Article

DOI

10.1016/j.knosys.2018.03.006

Department

Department of Software Engineering

Abstract

EasyMiner (http://www.easyminer.eu) is a web-based machine learning system for interpretable machine learning based on frequent itemsets. The system currently offers association rule learning (apriori, FP-Growth) and classification (CBA). For association rule learning and classification, EasyMiner offers a visual interface designed for interactivity, allowing the user to define a constraining pattern for the mining task. The CBA algorithm can also be used for pruning of the rule set, thus addressing the common problem of “too many rules” on the output, and the implementation supports automatic tuning of confidence and support thresholds. The development version additionally supports anomaly detection (FPI and its variations) and linked data mining (AMIE+). EasyMiner is dockerized, and some of its components are available as open source R packages.

Framework for Distributed Computing on the Web

Authors

Šiller, J.; Kuchař, J.

Year

2018

Published in

Proceedings of the 18th Conference Information Technologies - Applications and Theory (ITAT 2018). Aachen: CEUR Workshop Proceedings, 2018. p. 161-167. vol. 2203. ISSN 1613-0073. ISBN 9781727267198.

Type

Proceedings paper

Department

Department of Software Engineering
Faculty of Information Technology

Abstract

This work is a brief summary of a master's thesis that focuses on the design and implementation of a framework that uses the computers of website visitors as computing nodes through web browsers. It contains an analysis of the Web environment, a summary of previous approaches and projects, and the design and implementation of the framework. The work describes the handling of computing node failures, the reaction to slow computing nodes, options for controlling the load that the framework places on a website visitor’s computer, strategies for work distribution, and the security of the framework. Finally, the experimental results and proposed improvements are listed.

Spotlighting Anomalies using Frequent Patterns

Authors

Kuchař, J.; Svátek, V.

Year

2018

Published in

KDD 2017 Workshop on Anomaly Detection in Finance. Proceedings of Machine Learning Research, 2018. p. 33-42. vol. 71. ISSN 1938-7228.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

Approaches to anomaly detection based on frequent pattern mining follow the paradigm that the more frequent patterns an instance contains, the less likely it is to be an anomaly. This concept can be used in the financial industry to reveal contextual anomalies. The main contribution of this paper is an approach that includes a novel formula for computing anomaly scores. We evaluated the proposed approach on baseline datasets and present a use case on a real-world financial dataset. We also propose a way to explain the anomaly to users. Implementations of the evaluated algorithms and experiments are available online in R.
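
The general paradigm (not the novel formula proposed in the paper) can be illustrated with the classic Frequent Pattern Outlier Factor (FPOF) baseline: each instance is scored by the summed support of the frequent patterns it contains, divided by the total number of frequent patterns. The sketch below is an assumption-laden toy version with brute-force pattern enumeration; the function names, item encoding and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_patterns(rows: List[FrozenSet[str]], min_support: float,
                      max_len: int = 2) -> Dict[FrozenSet[str], float]:
    """Brute-force enumeration of itemsets with support >= min_support."""
    counts: Dict[FrozenSet[str], int] = {}
    for row in rows:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(row), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(rows)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def fpof_scores(rows: List[FrozenSet[str]],
                patterns: Dict[FrozenSet[str], float]) -> List[float]:
    """FPOF per row; lower values indicate more anomalous instances."""
    if not patterns:
        raise ValueError("no frequent patterns found; lower min_support")
    total = len(patterns)
    return [sum(s for p, s in patterns.items() if p <= row) / total for row in rows]


# Hypothetical toy transactions encoded as attribute=value items.
rows = [frozenset({"type=invoice", "dept=A"})] * 4 + [frozenset({"type=refund", "dept=B"})]
patterns = frequent_patterns(rows, min_support=0.5)
scores = fpof_scores(rows, patterns)
most_anomalous = scores.index(min(scores))
```

The last row contains none of the frequent patterns and therefore receives the lowest score. A real implementation would use an Apriori or FP-Growth miner instead of brute-force enumeration.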

Vyhledávání obrázků v rozšířené realitě [Image Search in Augmented Reality]

Authors

Chmelař, P.; Kuchař, J.

Year

2018

Published in

Data a znalosti & WIKT. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. p. 111-114. 1. ISBN 978-80-214-5679-2.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

Mobile phone applications have become an everyday part of our lives. In most cases, however, their scope is still strictly limited by the display of the device. One of the current trends in mobile application development is extending an application beyond the boundaries of the device itself, into virtual or augmented reality. The goal of this project is to create a system that enriches mobile applications with search in an augmented reality environment. The main component of the system is a server application providing a service that evaluates matches between a received image and a defined image database. To ease the implementation of search in augmented reality, a library for the iOS platform is available. The image databases can be managed through a REST API or a simple web interface.

EasyMiner – Short History of Research and Current Development

Authors

Kliegr, T.; Kuchař, J.; Vojíř, S.; Zeman, V.

Year

2017

Published in

ITAT 2017: Information Technologies – Applications and Theory. Aachen: CEUR Workshop Proceedings, 2017. p. 235-239. vol. 1885. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

EasyMiner (easyminer.eu) is an academic data mining project providing data mining of association rules, building of classification models based on association rules and outlier detection based on frequent pattern mining. It differs from other data mining systems by adapting the “web search” paradigm. It is web-based, providing both a REST API and a user interface, and puts emphasis on interactivity, simplicity of user interface and immediate response. This paper will give an overview of research related to the EasyMiner project.

InBeat: JavaScript recommender system supporting sensor input and linked data

Authors

Kuchař, J.; Kliegr, T.

Year

2017

Published in

Knowledge-Based Systems. 2017, 135, 40-43. ISSN 0950-7051.

Type

Article

DOI

10.1016/j.knosys.2017.07.026

Department

Department of Software Engineering

Abstract

Interest Beat (inbeat.eu) is an open source recommender framework that fulfills some of the demands raised by emerging applications that infer ratings from sensor input or use the linked open data cloud for feature expansion. As a recommender algorithm, InBeat uses association rules, which make it possible to explain why a specific recommendation was made. Due to its modular architecture, other algorithms can be easily plugged in. InBeat has a pure JavaScript version, which allows processing to be confined to a client-side device. There is also a performance-optimized server-side bundle, which successfully participated in two recent recommender competitions involving large volumes of streaming data. InBeat works on a number of platforms and is also available for Docker.

News Recommender System based on Association Rules @ CLEF NewsREEL 2017

Authors

Golian, C.; Kuchař, J.

Year

2017

Published in

Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum. Aachen: CEUR Workshop Proceedings, 2017. vol. 1866. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering
Faculty of Information Technology

Abstract

Digital editions of newspapers cause information overflow, and users have problems choosing what they want to read. Systems that recommend news articles are suited to solving such problems. Nevertheless, they face challenges unknown to systems recommending books or movies, such as the frequency at which new content is produced. The CLEF NewsREEL challenge makes it possible to compare and evaluate news recommendation systems in an online task, focused on recommending articles to real users, and an offline task, focused on tuning algorithms. This paper deals with an approach based on association rules acting as a classifier. In our approach, we experimented with settings that reduce the number of rules used for classification and increase the performance that is crucial for real recommendations. We evaluated our approach in both tasks of the CLEF NewsREEL 2017 challenge.

Outlier (Anomaly) Detection Modelling in PMML

Authors

Kuchař, J.; Ashenfelter, A.; Kliegr, T.

Year

2017

Published in

RuleML+RR 2017 - Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters. Aachen: CEUR Workshop Proceedings, 2017. vol. 1875. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

PMML is an industry-standard XML-based open format for representing statistical and data mining models. Since PMML does not yet support outlier (anomaly) detection, in this paper we propose a new outlier detection model to foster interoperability in this emerging field. Our proposal is included in the PMML RoadMap for PMML 4.4. We demonstrate the proposed format on one supervised and two unsupervised outlier detection approaches: association rule-based classifier CBA, frequent-pattern based method FPOF and isolation forests.

Recommending News Articles using Rule-based Classifier

Authors

Golian, C.; Kuchař, J.

Year

2017

Published in

Data a znalosti 2017. Plzeň: Západočeská univerzita v Plzni, 2017. p. 51-55. ISBN 978-80-261-0720-0.

Type

Proceedings paper

Department

Department of Software Engineering
Faculty of Information Technology

Abstract

In this paper, we summarize our experiments with a rule-based classifier as a recommender within the CLEF NewsREEL 2017 challenge. Systems that recommend news articles are suited to solving the information overflow in digital editions of newspapers, where users have problems choosing what they want to read. They face challenges unknown to systems recommending books or movies, such as the frequency at which new content is produced. This paper deals with an approach based on association rules acting as a classifier. In our approach, we experimented with settings that reduce the number of rules used for classification and increase the performance that is crucial for real recommendations.

Using EasyMiner API for Financial Data Analysis in the OpenBudgets.eu Project

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2017

Published in

RuleML+RR 2017 - Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters. Aachen: CEUR Workshop Proceedings, 2017. vol. 1875. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

This paper presents a use case for the data mining system EasyMiner in the European project OpenBudgets.eu, which is concerned with the publication and analysis of financial data of municipalities. EasyMiner is a web-based data mining system. This paper focuses on its new outlier detection functionality, which relies on frequent pattern mining. In addition, the system supports association rule discovery and the building of rule-based classification models. The system exposes a REST API and can thus be easily integrated into third-party applications.

Využití EasyMiner API v projektu OpenBudgets.eu [Using the EasyMiner API in the OpenBudgets.eu Project]

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2017

Published in

Data a znalosti 2017. Plzeň: Západočeská univerzita v Plzni, 2017. p. 56-60. ISBN 978-80-261-0720-0.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

With the growing popularity of data mining, there is also a growing demand for integrating data mining algorithms and systems into more complex, more user-friendly applications. This paper presents a new version of the EasyMiner system, integrated into the software solution developed within the European project OpenBudgets.eu, which focuses on publishing and analysing the financial data of local governments. EasyMiner is a web-based data mining system that supports association rule mining, the building of classification models and, new in the current version, outlier detection. This functionality is available not only through a graphical user interface but also through a comprehensive REST API.

Analýza článků z českých zpravodajských serverů [Analysis of Articles from Czech News Servers]

Authors

Filipová, M.; Kuchař, J.

Year

2016

Published in

Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 97-101. ISBN 978-80-227-4619-9.

Type

Proceedings paper

Department

Department of Software Engineering
Faculty of Information Technology

Abstract

Today, as the amount of information on the internet keeps growing, the automatic processing and classification of data has become a very popular field of information technology. One of its application areas is online news. The goal of this project is a tool covering the whole process of basic analysis of articles from Czech news servers. The project focuses primarily on extracting relevant data and analysing it. Its first part, however, also includes a crawler that downloads articles for analysis from news websites. In the second part, the relevant article content and other attributes are automatically extracted from the downloaded HTML pages. The third part is a text analysis that uses existing methods and tools and focuses on named entity extraction and sentiment analysis of Czech text. The resulting structured data can be queried from various perspectives, enabling various kinds of experiments.

Exploiting Temporal Dimension in Tensor-Based Link Prediction

Authors

Kuchař, J.; Dojčinovski, M.; Vitvar, T.

Year

2016

Published in

Web Information Systems and Technologies. Cham: Springer International Publishing, 2016. pp. 211-231. Lecture Notes in Business Information Processing. ISSN 1865-1348. ISBN 978-3-319-30995-8.

Type

Invited or awarded proceedings paper

DOI

10.1007/978-3-319-30996-5_11

Department

Department of Software Engineering

Abstract

In recent years, there has been significant interest in link prediction, an important task for graph-based data structures. Although many approaches based on graph theory and factorization exist, there is still a lack of methods that can work with multiple types of links and temporal information. The creation time of a link is an important aspect: it reflects the age and credibility of the information. In this paper, we introduce a method that predicts missing links in RDF datasets. We model multiple relations of RDF as a tensor that also incorporates the creation time of links as a key component. We evaluate the proposed approach on real-world datasets: an RDF representation of the ProgrammableWeb directory and a subset of DBpedia focused on movies. The results show that the proposed method outperforms other link prediction approaches.

Využití cloudu pro dolování asociačních pravidel z velkých dat přes webové rozhraní [Using the Cloud for Mining Association Rules from Big Data via a Web Interface]

Authors

Zeman, V.; Vojíř, S.; Kuchař, J.; Kliegr, T.

Year

2016

Published in

Proceedings in Informatics and Information Technologies - (WIKT & DaZ 2016) 11th Workshop on Intelligent and Knowledge Oriented Technologies 35th Conference on Data and Knowledge. Bratislava: Vydavatel'stvo STU, 2016. pp. 259-263. ISBN 978-80-227-4619-9.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

The EasyMiner web application is an academic tool for knowledge discovery from small and medium-sized data in the form of association rules. The new version of this system uses Apache Hadoop and Apache Spark to process large data sources on the MetaCentrum computing cluster of the CESNET association. The application consists of several microservices that handle uploading large data to the HDFS distributed storage, transforming the data in the cluster into a normalized form, and mining knowledge from datasets in the form of association rules with Apache Spark, using the computing resources of the cluster. These microservices can be accessed through a REST interface, and together they form data mining software that operates as a web service (SaaS).

Augmenting a Feature Set of Movies Using Linked Open Data

Authors

Kuchař, J.

Year

2015

Published in

Rule Challenge and Doctoral Consortium @ RuleML 2015. Aachen: CEUR Workshop Proceedings, 2015. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

Augmenting a feature set using mappings to the Web of data is an up-and-coming way to enrich data in the original dataset. Those enrichments are valuable especially for the recent preference learning algorithms and recommender systems. In this paper, we describe the process of mapping and augmenting the movie ratings dataset MovieTweetings from the perspective of RecSysRules 2015 Challenge. The ad-hoc queries to DBpedia are used as an underlying concept. To the best of our knowledge, there is no existing mapping dataset of movies for MovieTweetings. We also provide a brief discussion about the benefits of the augmented feature set for an elementary rule-based representation of the user preferences.

Benchmark of Rule-Based Classifiers in the News Recommendation Task

Authors

Kliegr, T.; Kuchař, J.

Year

2015

Published in

Experimental IR Meets Multilinguality, Multimodality, and Interaction - 6th International Conference of the CLEF Association. Berlin: Springer-Verlag, 2015. p. 130-141. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-24026-8.

Type

Proceedings paper

DOI

10.1007/978-3-319-24027-5_11

Department

Department of Software Engineering

Abstract

In this paper, we present experiments evaluating Association Rule Classification algorithms on on-line and off-line recommender tasks of the CLEF NewsReel 2014 Challenge. The second focus of the experimental evaluation is to investigate possible performance optimizations of the Classification Based on Associations algorithm. Our findings indicate that pruning steps in CBA reduce the number of association rules substantially while not affecting accuracy. Using only part of the data employed for the rule learning phase in the pruning phase may also reduce training time while not affecting accuracy significantly.

EasyMiner/R: Web Interface for Rule Learning and Classification in R

Authors

Vojíř, S.; Zeman, V.; Kuchař, J.; Kliegr, T.

Year

2015

Published in

Rule Challenge and Doctoral Consortium @ RuleML 2015. Aachen: CEUR Workshop Proceedings, 2015. ISSN 1613-0073.

Type

Proceedings paper

Department

Department of Software Engineering

Abstract

EasyMiner is a web-based visual interface for association rule learning. This paper presents a preview of the next release, which uses the R environment as the data processing backend. EasyMiner/R uses the arules package to learn rules. It uses the Classification Based on Associations (CBA) algorithm as a classifier and to perform rule pruning. Experimental results show that EasyMiner with the R-based backend is able to handle larger datasets than the previous version.

Time-aware Link Prediction in RDF Graphs

Authors

Kuchař, J.; Dojčinovski, M.; Vitvar, T.

Year

2015

Published in

WEBIST 2015 - Proceedings of the 11th International Conference on Web Information Systems and Technologies. Madeira: SciTePress, 2015. ISBN 978-989-758-106-9.

Type

Proceedings paper

DOI

10.5220/0005428403900401

Department

Department of Software Engineering

Abstract

When a link is not explicitly present in an RDF dataset, it does not mean that the link could not exist in reality. Link prediction methods try to overcome this problem by finding new links in the dataset with the support of background knowledge about the links that already exist in it. In dynamic environments that change often and evolve over time, link prediction methods should also take into account the temporal aspects of data. In this paper, we present a novel time-aware link prediction method. We model RDF data as a tensor and take into account the time when the RDF data was created. We use an ageing function to model the retention of information over time: it lowers the significance of older information and promotes more recent information. Our evaluation shows that the proposed method improves the quality of predictions compared with methods that do not consider time information.
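
As a rough illustration of how an ageing function can be combined with a tensor representation of RDF triples, the sketch below weights each (subject, predicate, object) entry by the age of the link. The exponential half-life form, the parameter `half_life_days`, the tensor layout and the toy triples are all assumptions made for illustration; the abstract does not specify the function used in the paper.

```python
import numpy as np


def ageing_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Assumed ageing function: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def build_weighted_tensor(triples, n_entities, n_relations, now_day,
                          half_life_days=365.0):
    """Tensor of shape (entities, entities, relations) with age-based weights.

    triples: iterable of (subject_id, predicate_id, object_id, created_day).
    """
    tensor = np.zeros((n_entities, n_entities, n_relations))
    for s, p, o, created_day in triples:
        age = max(0.0, now_day - created_day)
        tensor[s, o, p] = ageing_weight(age, half_life_days)
    return tensor


# Hypothetical triples: one link created today, one created a year ago.
triples = [(0, 0, 1, 1000), (1, 0, 2, 635)]
tensor = build_weighted_tensor(triples, n_entities=3, n_relations=1, now_day=1000)
```

A factorization of such a weighted tensor would then give more influence to recent links than to old ones; the factorization step itself is omitted here.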

Bag-of-Entities text representation for client-side (video) recommender systems

Autoři

Kuchař, J.; Kliegr, T.

Rok

2014

Publikováno

RecSysTV 2014. 2014.

Typ

Stať ve sborníku

Pracoviště

Katedra softwarového inženýrství

Anotace

Client-side execution of a recommender system requires enrichment of the content delivered to the user with a list of potentially related content. A possible bottleneck for client-side recommendation is the data volume entailed by transferring the feature set describing each content item to the client, and the computational resources needed to process this feature set. This paper investigates whether representing textual content (e.g. of videos) with a Bag-of-Entities (BoE) vector generated by a wikifier can yield a classifier with the same accuracy at a smaller size than the standard bag-of-words (BoW) approach. An experimental evaluation performed on the Reuters-21578 text categorization collection shows a small improvement for small term vector sizes.

Doporučování multimediálního obsahu s využitím senzoru Microsoft Kinect

Autoři

Kuchař, J.; Kliegr, T.

Rok

2014

Publikováno

Proceedings of the 13th Annual Conference Znalosti 2014. Praha: VŠE, 2014. pp. 84-87. ISBN 978-80-245-2054-4.

Typ

Stať ve sborníku

Pracoviště

Katedra softwarového inženýrství

Anotace

This paper presents the online recommender InBeat.eu. The system collects explicit and implicit feedback from users, which is used as an indicator of interest. The demonstration focuses on user interaction with multimedia content, specifically videos and a television news scenario. Videos are automatically semantically annotated using a named entity recognition tool. An important part of the system is its connection to the Microsoft Kinect sensor, which makes it possible to analyse head orientation in order to assess whether a given video is actually being watched. Association rules representing the preferences of the given user are derived from the collected data. These rules are subsequently used for recommendation.

InBeat: Recommender System as a Service

Autoři

Kuchař, J.; Kliegr, T.

Rok

2014

Publikováno

CLEF2014 Working Notes. Tilburg: CEUR Workshop Proceedings, 2014. p. 837-844. CLEF. ISSN 1613-0073.

Typ

Stať ve sborníku

Pracoviště

Katedra softwarového inženýrství

Anotace

Interest Beat (inbeat.eu) is a service for recommendation of content. InBeat was designed with emphasis on versatility, scalability and extensibility. The core contains the General Analytics INterceptor module, which collects and aggregates user interactions, the Preference Learning module and the Recommender System module. In this paper, we describe InBeat's general architecture, with emphasis on the high-performance architecture that was used in the CLEF-NEWSREEL: News Recommendation Evaluation Lab.

KINterestTV - Towards Non-invasive Measure of User Interest While Watching TV

Autoři

Leroy, J.; Rocca, F.; Mancas, M.; Madhkour, R.B.; Grisard, F.; Kliegr, T.; Kuchař, J.; Vit, J.; Pirner, I.; Zimmermann, P.

Rok

2014

Publikováno

Innovative and Creative Developments in Multimodal Interaction Systems. Berlin: Springer, 2014. pp. 179-199. IFIP Advances in Information and Communication Technology. ISSN 1868-4238. ISBN 978-3-642-55142-0.

Typ

Stať ve sborníku

DOI

10.1007/978-3-642-55143-7_8

Pracoviště

Katedra softwarového inženýrství

Anotace

Is it possible to determine a user's interest in media only by observing the user's behavior? The aim of this project is to develop an application that can detect whether or not a user is viewing content on the TV and use this information to build the user profile and make it evolve dynamically. Our approach is based on the use of a 3D sensor to study the movements of a user's head in order to analyse the user's behavior implicitly. This behavior is synchronized with the TV content (media fragments) and other user interactions (clicks, gestural interaction) to further infer the viewer's interest. Our approach is tested in an experiment simulating the attention changes of a user in a scenario involving second screen (tablet) interaction, a behavior that has become common for spectators and a typical source of attention switches.

Learning Business Rules with Association Rule Classifiers

Autoři

Kliegr, T.; Kuchař, J.; Sottara, D.; Vojíř, S.

Rok

2014

Publikováno

Rules on the Web. From Theory to Applications. Cham: Springer International Publishing AG, 2014. p. 236-250. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-09869-2.

Typ

Stať ve sborníku

DOI

10.1007/978-3-319-09870-8_18

Pracoviště

Katedra softwarového inženýrství

Anotace

The main obstacles to the straightforward use of association rules as candidate business rules are the excessive number of rules discovered even on small datasets and the fact that contradictory rules are generated. This paper shows that Association Rule Classification (ARC) algorithms, such as CBA, solve both of these problems, and provides a practical guide on using discovered rules in the Drools BRMS and on setting the ARC parameters. Experiments performed with modified CBA on several UCI datasets indicate that data coverage rule pruning keeps the number of rules manageable without adversely impacting accuracy. The best results in terms of overall accuracy are obtained using minimum support and confidence thresholds. Disjunctions between attribute values seem to provide a desirable balance between accuracy and rule count, while negated literals have not been found beneficial.

Orwellian Eye: Video Recommendation with Microsoft Kinect

Autoři

Kliegr, T.; Kuchař, J.

Rok

2014

Publikováno

ECAI 2014. Amsterdam: IOS Press, 2014. pp. 1227-1228. Frontiers in Artificial Intelligence and Applications. ISSN 0922-6389. ISBN 978-1-61499-418-3.

Typ

Stať ve sborníku

DOI

10.3233/978-1-61499-419-0-1227

Pracoviště

Katedra softwarového inženýrství

Anotace

This paper demonstrates Interest Beat (InBeat.eu) as a recommender system for online videos, which determines user interest in the content based on gaze tracking with Microsoft Kinect in addition to explicit user feedback. The content of the videos is represented using a semantic wikifier. The user profile is constructed from preference rules, which are discovered with an association rule learner.

When TV meets the Web: towards personalised digital media

Autoři

Tsatsou, D.; Mancas, M.; Kuchař, J.; Nixon, L.; Vacura, M.; Leroy, J.; Rocca, F.; Mezaris, V.

Rok

2014

Publikováno

Semantic Multimedia Analysis and Processing. Boca Raton: CRC Press, 2014. p. 221-256. Digital Imaging and Computer Vision. ISBN 978-1-4665-7549-3.

Typ

Kapitola v knize

Pracoviště

Katedra softwarového inženýrství

Anotace

The rise of new paradigms in the field of television and digital media distribution (e.g. Smart TV, IPTV, Social TV) has opened a new digital world of data communication opportunities but at the same time exacerbated the information overload problem for media consumers and providers. Therefore, the need for personalized content delivery has extended from the traditional web to the networked media domain. This chapter presents comprehensive research in the field of capturing and representing user preferences and context, together with an overview of relevant digital media-specific personalized recommendation techniques. Subsequently, it describes the vision and first personalization approach adopted within the LinkedTV EU project for profiling and contextualizing users and providing targeted information and content in a linked media environment.

GAIN: web service for user tracking and preference learning - a smart TV use case

Autoři

Kuchař, J.; Kliegr, T.

Rok

2013

Publikováno

RecSys '13 Proceedings of the 7th ACM conference on Recommender systems. New York: ACM, 2013. pp. 467-468. ISBN 978-1-4503-2409-0.

Typ

Stať ve sborníku

DOI

10.1145/2507157.2508217

Pracoviště

Katedra softwarového inženýrství

Anotace

GAIN (inbeat.eu) is a web application and service for capturing and preprocessing user interactions with semantically described content. GAIN outputs a set of instances in tabular form suitable for further processing with generic machine-learning algorithms. GAIN is demoed as a component of a "SMART-TV" recommender system. Content is automatically described with DBpedia types using a Named Entity Recognition (NER) system. Interest is determined based on explicit user actions and the user's attention computed by 3D head pose estimation. Preference rules are learnt with an association rule mining algorithm. These can be deployed, for example, to a business rules system acting as a recommender.

GAIN: Analysis of Implicit Feedback on Semantically Annotated Content

Autoři

Kuchař, J.; Kliegr, T.

Rok

2012

Publikováno

WIKT 2012: 7th Workshop on Intelligent and Knowledge Oriented Technologies. Slovenská technická univerzita v Bratislave, 2012. pp. 75-78. ISBN 978-80-227-3812-5.

Typ

Stať ve sborníku

Pracoviště

Katedra softwarového inženýrství

Anotace

The trend in application development is to provide a personalized interface. The availability of the user preference level associated with user actions is the key to the personalization process. This paper describes a "work-in-progress" framework for deriving user preference from actions performed on semantically annotated objects, be it web pages or TV news. The preference level is computed using supervised learning with genetic programming from implicit feedback, which might be time on page for the web domain or the user engagement level for the TV domain. We provide a tool called GAIN (General Analytics INterceptor) covering the whole approach at wa.vse.cz.

Personalised Graph-Based Selection of Web APIs

Autoři

Dojčinovski, M.; Kuchař, J.; Vitvar, T.; Zaremba, M.

Rok

2012

Publikováno

The Semantic Web -- ISWC 2012. Heidelberg: Springer-Verlag, GmbH, 2012. p. 34-48. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-642-35175-4.

Typ

Stať ve sborníku

DOI

10.1007/978-3-642-35176-1_3

Pracoviště

Katedra softwarového inženýrství

Anotace

Modelling and understanding various contexts of users is important to enable personalised selection of Web APIs in directories such as ProgrammableWeb. Currently, relationships between users and Web APIs are not clearly understood and utilized by existing selection approaches. In this paper, we present a semantic model of a Web API directory graph that captures relationships between entities such as Web APIs, mashups, developers, and categories. We describe a novel configurable graph-based method for selection of Web APIs with personalised and temporal aspects. The method gives users more control over their preferences and recommended Web APIs while allowing them to exploit information about their social links and preferences. We evaluate the method on a real-world dataset from ProgrammableWeb.com and show that it provides more contextualised results than currently available popularity-based rankings.

Learning Semantic Web Usage Profiles by Using Genetic Algorithms

Autoři

Kuchař, J.; Jelínek, I.

Rok

2011

Publikováno

International Journal on Information Technologies and Security. 2011, 3(4), 3-20. ISSN 1313-8251.

Typ

Článek

Pracoviště

Katedra softwarového inženýrství

Anotace

A web usage profile is very important in recommender systems. More interesting is the semantically enriched profile, which can describe visitor intents by ontologies and express more information about, and relations of, the visitor's character. Our research is based on processing a semantically enriched clickstream and applying a scoring algorithm based on symbolic regression. The semantic enrichment uses Linked Data principles. The scoring assigns to each pageview a value that represents the visitor's interests. Scoring involves all known attributes of each pageview, including semantic annotation. The score of each pageview is used to establish a visitor profile. The established profile can be in the form of ontologies. In this paper, we propose to integrate the scoring algorithm into semantic web usage mining and to publish the visitor profile in an RDF/OWL representation. We suggest merging the profiles from different web sites and integrating additional related information from publicly available reso
