doc. Ing. Filip Křikava, Ph.D.

Publikace

signatr: A Data-Driven Fuzzing Tool for R

Autoři
Rok
2022
Publikováno
SLE 2022: Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering. New York: Association for Computing Machinery, 2022. p. 216-221. 15. vol. 1. ISBN 978-1-4503-9919-7.
Typ
Stať ve sborníku
Anotace
The fast-and-loose, permissive semantics of dynamic programming languages limit the power of static analyses. For that reason, soundness is often traded for precision through dynamic program analysis. Dynamic analysis is only as good as the available runnable code, and relying solely on test suites is fraught as they do not cover the full gamut of possible behaviors. Fuzzing is an approach for automatically exercising code, and could be used to obtain more runnable code. However, the shape of user-defined data in dynamic languages is difficult to intuit, limiting a fuzzer's reach. We propose a feedback-driven blackbox fuzzing approach which draws inputs from a database of values recorded from existing code. We implement this approach in a tool called signatr for the R language. We present the insights of its design and implementation, and assess signatr's ability to uncover new behaviors by fuzzing 4,829 R functions from 100 R packages, revealing 1,195,184 new signatures.

What we eval in the shadows: A large-scale study of eval in R programs

Rok
2021
Publikováno
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.
Typ
Článek
Anotace
Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.

Designing types for R, empirically

Autoři
Turcotte, A.; Goel, A.; Křikava, F.; Vitek, J.
Rok
2020
Publikováno
Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-25. ISSN 2475-1421.
Typ
Článek
Anotace
The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work are the two questions: what expressive power is needed to accurately type R code? and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insights in the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Autoři
Křikava, F.; Vitek, J.; Miller, H.
Rok
2019
Publikováno
Volume 3, Issue OOPSLA October 2019. New York: ACM, 2019. p. 1-28. ISSN 2475-1421.
Typ
Stať ve sborníku
Anotace
The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Tests from traces: Automated unit test extraction for R

Rok
2018
Publikováno
ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM PRESS, 2018. p. 232-241. ISBN 978-1-4503-5699-2.
Typ
Stať ve sborníku
Anotace
Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decrease the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from. © 2018 Association for Computing Machinery.

Contracts-based Control Integration into Software Systems

Autoři
Křikava, F.; Rouvoy, R.; Collet, P.; Seintourier, L.
Rok
2017
Publikováno
Software Engineering for Self-Adaptive Systems III - Assurances. Berlin: Springer, 2017. p. 251-281. ISSN 0302-9743. ISBN 978-3-319-74182-6.
Typ
Kapitola v knize
Anotace
Among the different techniques that are used to design self-adaptive software systems, control theory allows one to design an adaptation policy whose properties, such as stability and accuracy, can be formally guaranteed under certain assumptions. However, in the case of software systems, the integration of these controllers to build complete feedback control loops is manual. More importantly it requires an extensive handcrafting of non-trivial implementation code. This may lead to inconsistencies and instabilities as no systematic and automated assurance can be obtained on the fact that the initial assumptions for the designed controller still hold in the resulting system. In this chapter, we rely on the principles of design-by-contract to ensure the correction and robustness of a self-adaptive software system built using feedback control loops. Our solution raises the level of abstraction upon which the loops are specified by allowing one to define and automatically verify system-level properties organized in contracts. They cover behavioral, structural and temporal architectural constraints as well as explicit interaction. These contracts are complemented by a first-class support for systematic fault handling. As a result, assumptions about the system operation conditions become more explicit and verifiable in a systematic way.

Hadoop-Benchmark: Rapid Prototyping and Evaluation of Self-Adaptive Behaviors in Hadoop Clusters

Autoři
Zhang, B.; Křikava, F.; Rouvoy, R.; Senturier, L.
Rok
2017
Publikováno
2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). USA: IEEE Computer Society, 2017. p. 175-181. ISBN 978-1-5386-1550-8.
Typ
Stať ve sborníku
Anotace
Optimizing Hadoop executions has attracted a lot of research contributions in particular in the domain of self- adaptive software systems. However, these research efforts are often hindered by the complexity of Hadoop operation and the difficulty to reproduce experimental evaluations that makes it hard to compare different approaches to one another. To address this limitation, we propose a research acceleration platform for rapid prototyping and evaluation of self-adaptive behavior in Hadoop clusters. Essentially, it provides automated approach to provision reproducible Hadoop environments and execute acknowledged benchmarks. It is based on the state- of-the-art container technology that supports both distributed configurations as well as standalone single-host setups. We demonstrate the approach on a complete implementation of a concrete Hadoop self-adaptive case study.

Self-Balancing Job Parallelism and Throughput in Hadoop

Autoři
Zhang, Bo; Křikava, F.; Rouvoy, R.; Seinturier, L.
Rok
2016
Publikováno
Distributed Applications and Interoperable Systems. Cham: Springer International Publishing, 2016. p. 129-143. ISSN 0302-9743. ISBN 978-3-319-39576-0.
Typ
Stať ve sborníku
Anotace
In Hadoop cluster, the performance and the resource consumption of MapReduce jobs do not only depend on the characteristics of these applications and workloads, but also on the appropriate setting of Hadoop configuration parameters. However, when the job workloads are not known a priori or they evolve over time, a static configuration may quickly lead to a waste of computing resources and consequently to a performance degradation. In this paper, we therefore propose an on-line approach that dynamically reconfigures Hadoop at runtime. Concretely, we focus on balancing the job parallelism and throughput by adjusting Hadoop capacity scheduler memory configuration. Our evaluation shows that the approach outperforms vanilla Hadoop deployments by up to 40% and the best statically profiled configurations by up to 13 %.

Infrastructure as Runtime Models: Towards Model-Driven Resource Management

Autoři
Křikava, F.; Rouvoy, R.; Seinturier, L.
Rok
2015
Publikováno
Model Driven Engineering Languages and Systems (MODELS), 2015 ACM/IEEE 18th International Conference on. New York: ACM, 2015. p. 100-105. ISBN 978-1-4673-6908-4.
Typ
Stať ve sborníku
Anotace
The importance of continuous delivery and the emergence of tools allowing to treat infrastructure configurations programmatically have revolutionized the way computing resources and software systems are managed. However, these tools keep lacking an explicit model representation of underlying resources making it difficult to introspect, verify or reconfigure the system in response to external events. In this paper, we outline a novel approach that treats system infrastructure as explicit runtime models. A key benefit of using such models@run.time representation is that it provides a uniform semantic foundation for resources monitoring and reconfiguration. Adopting models at runtime allows one to integrate different aspects of system management, such as resource monitoring and subsequent verification into an unified view which would otherwise have to be done manually and require to use different tools. It also simplifies the development of various self-adaptation strategies without requiring the engineers and researchers to cope with low-level system complexities.

Software Engineering for Smart Cyber-Physical Systems – Towards a Research Agenda

Autoři
Křikava, F.; Bures, T.; Weyns, D.; Berger, C.; Biffl, S.; Daun, M.; Gabor, T.; Garlan, D.; Gerostathopoulos, I.; Julien, C.; Mordinyi, R.; Pronios, N.
Rok
2015
Publikováno
ACM SIGSOFT Software Engineering Notes. 2015, 40(6), 28-32. ISSN 0163-5948.
Typ
Článek
Anotace
Cyber-Physical Systems (CPS) are large interconnected softwareintensive systems that influence, by sensing and actuating, the physical world. Examples are traffic management and power grids. One of the trends we observe is the need to endow such systems with the "smart" capabilities, typically in the form of selfawareness and self-adaptation, along with the traditional qualities of safety and dependability. These requirements combined with specifics of the domain of smart CPS -- such as large scale, the role of end-users, uncertainty, and open-endedness -- render traditional software engineering (SE) techniques not directly applicable; making systematic SE of smart CPS a challenging task. This paper reports on the results of the First International Workshop on Software Engineering of Smart Cyber-Physical Systems (SEsCPS 2015), where participants discussed characteristics, challenges and opportunities of SE for smart CPS, with the aim to outline an agenda for future research in this important area.

Solving the TTC'15 Train Benchmark Case Study with SIGMA

Autoři
Rok
2015
Publikováno
Proceedings of the 8th Transformation Tool Contest, a part of the Software Technologies: Applications and Foundations (STAF 2015) federation of conferences. Aachen: CEUR Workshop Proceedings, 2015. p. 167-175. ISSN 1613-0073.
Typ
Stať ve sborníku
Anotace
This paper describes a solution for the Transformation Tool Contest 2015 (TTC'15) Train Benchmark case study using SIGMA, a family of Scala internal Domain-Specific Languages (DSLs) that provides an expressive and efficient API for model consistency checking and model transformations.