doc. Ing. Filip Křikava, Ph.D.

Reducing Feedback Pollution

Authors

Krynski, S.; Štěpánek, M.; Říha, F.; Křikava, F.; Vitek, J.

Year

2024

Published

VMIL '24: Proceedings of the 16th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages. New York: Association for Computing Machinery, 2024. p. 65-74. 16. ISBN 979-8-4007-1213-5.

Type

Proceedings paper

DOI

10.1145/3689490.3690404

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

Just-in-time compilers enhance the performance of future invocations of a function by generating code tailored to past behavior. To achieve this, compilers use a data structure, often referred to as a feedback vector, to record information about each function’s invocations. However, over time, feedback vectors tend to become less precise, leading to lower-quality code – a phenomenon known as feedback vector pollution. This paper examines feedback vector pollution within the context of a compiler for the R language. We provide data, discuss an approach to reduce pollution in practice, and implement a proof-of-concept implementation of this approach. The preliminary results of the implementation indicate ∼30% decrease in polluted compilations and ∼37% decrease in function pollution throughout our corpus.

signatr: A Data-Driven Fuzzing Tool for R

Authors

Turcotte, A.; Donat-Bouillud, P.; Křikava, F.; Vitek, J.

Year

2022

Published

SLE 2022: Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering. New York: Association for Computing Machinery, 2022. p. 216-221. 15. vol. 1. ISBN 978-1-4503-9919-7.

Type

Proceedings paper

DOI

10.1145/3567512.3567530

Departments

Programming Research Lab

Annotation

The fast-and-loose, permissive semantics of dynamic programming languages limit the power of static analyses. For that reason, soundness is often traded for precision through dynamic program analysis. Dynamic analysis is only as good as the available runnable code, and relying solely on test suites is fraught as they do not cover the full gamut of possible behaviors. Fuzzing is an approach for automatically exercising code, and could be used to obtain more runnable code. However, the shape of user-defined data in dynamic languages is difficult to intuit, limiting a fuzzer's reach. We propose a feedback-driven blackbox fuzzing approach which draws inputs from a database of values recorded from existing code. We implement this approach in a tool called signatr for the R language. We present the insights of its design and implementation, and assess signatr's ability to uncover new behaviors by fuzzing 4,829 R functions from 100 R packages, revealing 1,195,184 new signatures.

What we eval in the shadows: A large-scale study of eval in R programs

Authors

Goel, A.; Donat-Bouillud, P.; Křikava, F.; Kirsch, C.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.

Type

Article

DOI

10.1145/3485502

Departments

Programming Research Lab

Annotation

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.

Designing types for R, empirically

Authors

Turcotte, A.; Goel, A.; Křikava, F.; Vitek, J.

Year

2020

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-25. ISSN 2475-1421.

Type

Article

DOI

10.1145/3428249

Departments

Programming Research Lab

Annotation

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work are the two questions: what expressive power is needed to accurately type R code? and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insights in the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Authors

Křikava, F.; Vitek, J.; Miller, H.

Year

2019

Published

Volume 3, Issue OOPSLA October 2019. New York: ACM, 2019. p. 1-28. ISSN 2475-1421.

Type

Proceedings paper

DOI

10.1145/3360589

Departments

Programming Research Lab

Annotation

The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Tests from traces: Automated unit test extraction for R

Authors

Křikava, F.; Vitek, J.

Year

2018

Published

ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM PRESS, 2018. p. 232-241. ISBN 978-1-4503-5699-2.

Type

Proceedings paper

DOI

10.1145/3213846.3213863

Departments

Programming Research Lab

Annotation

Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decrease the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from. © 2018 Association for Computing Machinery.

Contracts-based Control Integration into Software Systems

Authors

Křikava, F.; Rouvoy, R.; Collet, P.; Seintourier, L.

Year

2017

Published

Software Engineering for Self-Adaptive Systems III - Assurances. Berlin: Springer, 2017. p. 251-281. ISSN 0302-9743. ISBN 978-3-319-74182-6.

Type

Book chapter

DOI

10.1007/978-3-319-74183-3_9

Departments

Department of Software Engineering

Annotation

Among the different techniques that are used to design self-adaptive software systems, control theory allows one to design an adaptation policy whose properties, such as stability and accuracy, can be formally guaranteed under certain assumptions. However, in the case of software systems, the integration of these controllers to build complete feedback control loops is manual. More importantly it requires an extensive handcrafting of non-trivial implementation code. This may lead to inconsistencies and instabilities as no systematic and automated assurance can be obtained on the fact that the initial assumptions for the designed controller still hold in the resulting system. In this chapter, we rely on the principles of design-by-contract to ensure the correction and robustness of a self-adaptive software system built using feedback control loops. Our solution raises the level of abstraction upon which the loops are specified by allowing one to define and automatically verify system-level properties organized in contracts. They cover behavioral, structural and temporal architectural constraints as well as explicit interaction. These contracts are complemented by a first-class support for systematic fault handling. As a result, assumptions about the system operation conditions become more explicit and verifiable in a systematic way.

Hadoop-Benchmark: Rapid Prototyping and Evaluation of Self-Adaptive Behaviors in Hadoop Clusters

Authors

Zhang, B.; Křikava, F.; Rouvoy, R.; Senturier, L.

Year

2017

Published

2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). USA: IEEE Computer Society, 2017. p. 175-181. ISBN 978-1-5386-1550-8.

Type

Proceedings paper

DOI

10.1109/SEAMS.2017.15

Departments

Programming Research Lab

Annotation

Optimizing Hadoop executions has attracted a lot of research contributions in particular in the domain of self- adaptive software systems. However, these research efforts are often hindered by the complexity of Hadoop operation and the difficulty to reproduce experimental evaluations that makes it hard to compare different approaches to one another. To address this limitation, we propose a research acceleration platform for rapid prototyping and evaluation of self-adaptive behavior in Hadoop clusters. Essentially, it provides automated approach to provision reproducible Hadoop environments and execute acknowledged benchmarks. It is based on the state- of-the-art container technology that supports both distributed configurations as well as standalone single-host setups. We demonstrate the approach on a complete implementation of a concrete Hadoop self-adaptive case study.

Self-Balancing Job Parallelism and Throughput in Hadoop

Authors

Zhang, Bo; Křikava, F.; Rouvoy, R.; Seinturier, L.

Year

2016

Published

Distributed Applications and Interoperable Systems. Cham: Springer International Publishing, 2016. p. 129-143. ISSN 0302-9743. ISBN 978-3-319-39576-0.

Type

Proceedings paper

DOI

10.1007/978-3-319-39577-7_11

Departments

Programming Research Lab

Annotation

In Hadoop cluster, the performance and the resource consumption of MapReduce jobs do not only depend on the characteristics of these applications and workloads, but also on the appropriate setting of Hadoop configuration parameters. However, when the job workloads are not known a priori or they evolve over time, a static configuration may quickly lead to a waste of computing resources and consequently to a performance degradation. In this paper, we therefore propose an on-line approach that dynamically reconfigures Hadoop at runtime. Concretely, we focus on balancing the job parallelism and throughput by adjusting Hadoop capacity scheduler memory configuration. Our evaluation shows that the approach outperforms vanilla Hadoop deployments by up to 40% and the best statically profiled configurations by up to 13 %.

Infrastructure as Runtime Models: Towards Model-Driven Resource Management

Authors

Křikava, F.; Rouvoy, R.; Seinturier, L.

Year

2015

Published

Model Driven Engineering Languages and Systems (MODELS), 2015 ACM/IEEE 18th International Conference on. New York: ACM, 2015. p. 100-105. ISBN 978-1-4673-6908-4.

Type

Proceedings paper

DOI

10.1109/MODELS.2015.7338240

Departments

Department of Software Engineering

Annotation

The importance of continuous delivery and the emergence of tools allowing to treat infrastructure configurations programmatically have revolutionized the way computing resources and software systems are managed. However, these tools keep lacking an explicit model representation of underlying resources making it difficult to introspect, verify or reconfigure the system in response to external events. In this paper, we outline a novel approach that treats system infrastructure as explicit runtime models. A key benefit of using such models@run.time representation is that it provides a uniform semantic foundation for resources monitoring and reconfiguration. Adopting models at runtime allows one to integrate different aspects of system management, such as resource monitoring and subsequent verification into an unified view which would otherwise have to be done manually and require to use different tools. It also simplifies the development of various self-adaptation strategies without requiring the engineers and researchers to cope with low-level system complexities.

Software Engineering for Smart Cyber-Physical Systems – Towards a Research Agenda

Authors

Křikava, F.; Bures, T.; Weyns, D.; Berger, C.; Biffl, S.; Daun, M.; Gabor, T.; Garlan, D.; Gerostathopoulos, I.; Julien, C.; Mordinyi, R.; Pronios, N.

Year

2015

Published

ACM SIGSOFT Software Engineering Notes. 2015, 40(6), 28-32. ISSN 0163-5948.

Type

Article

DOI

10.1145/2830719.2830736

Departments

Department of Software Engineering

Annotation

Cyber-Physical Systems (CPS) are large interconnected softwareintensive systems that influence, by sensing and actuating, the physical world. Examples are traffic management and power grids. One of the trends we observe is the need to endow such systems with the "smart" capabilities, typically in the form of selfawareness and self-adaptation, along with the traditional qualities of safety and dependability. These requirements combined with specifics of the domain of smart CPS -- such as large scale, the role of end-users, uncertainty, and open-endedness -- render traditional software engineering (SE) techniques not directly applicable; making systematic SE of smart CPS a challenging task. This paper reports on the results of the First International Workshop on Software Engineering of Smart Cyber-Physical Systems (SEsCPS 2015), where participants discussed characteristics, challenges and opportunities of SE for smart CPS, with the aim to outline an agenda for future research in this important area.

Solving the TTC'15 Train Benchmark Case Study with SIGMA

Authors

Křikava, F.

Year

2015

Published

Proceedings of the 8th Transformation Tool Contest, a part of the Software Technologies: Applications and Foundations (STAF 2015) federation of conferences. Aachen: CEUR Workshop Proceedings, 2015. p. 167-175. ISSN 1613-0073.

Type

Proceedings paper

Departments

Department of Software Engineering

Annotation

This paper describes a solution for the Transformation Tool Contest 2015 (TTC'15) Train Benchmark case study using SIGMA, a family of Scala internal Domain-Specific Languages (DSLs) that provides an expressive and efficient API for model consistency checking and model transformations.

doc. Ing. Filip Křikava, Ph.D.

Publications

Reducing Feedback Pollution

signatr: A Data-Driven Fuzzing Tool for R

What we eval in the shadows: A large-scale study of eval in R programs

Designing types for R, empirically

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Tests from traces: Automated unit test extraction for R

Contracts-based Control Integration into Software Systems

Hadoop-Benchmark: Rapid Prototyping and Evaluation of Self-Adaptive Behaviors in Hadoop Clusters

Self-Balancing Job Parallelism and Throughput in Hadoop

Infrastructure as Runtime Models: Towards Model-Driven Resource Management

Software Engineering for Smart Cyber-Physical Systems – Towards a Research Agenda

Solving the TTC'15 Train Benchmark Case Study with SIGMA