Pierre Donat-Bouillud, Ph.D.

pierre.donat.bouillud@fit.cvut.cz
TH:A-1254

Publications

signatr: A Data-Driven Fuzzing Tool for R

Authors

Turcotte, A.; Donat-Bouillud, P.; Křikava, F.; Vitek, J.

Year

2022

Published

SLE 2022: Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering. New York: Association for Computing Machinery, 2022. p. 216-221. 15. vol. 1. ISBN 978-1-4503-9919-7.

Type

Proceedings paper

DOI

10.1145/3567512.3567530

Departments

Programming Research Lab

Annotation

The fast-and-loose, permissive semantics of dynamic programming languages limit the power of static analyses. For that reason, soundness is often traded for precision through dynamic program analysis. Dynamic analysis is only as good as the available runnable code, and relying solely on test suites is fraught as they do not cover the full gamut of possible behaviors. Fuzzing is an approach for automatically exercising code, and could be used to obtain more runnable code. However, the shape of user-defined data in dynamic languages is difficult to intuit, limiting a fuzzer's reach. We propose a feedback-driven blackbox fuzzing approach which draws inputs from a database of values recorded from existing code. We implement this approach in a tool called signatr for the R language. We present the insights of its design and implementation, and assess signatr's ability to uncover new behaviors by fuzzing 4,829 R functions from 100 R packages, revealing 1,195,184 new signatures.

What we eval in the shadows: A large-scale study of eval in R programs

Authors

Goel, A.; Donat-Bouillud, P.; Křikava, F.; Kirsch, C.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.

Type

Article

DOI

10.1145/3485502

Departments

Programming Research Lab

Annotation

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.