Prof. Christoph Kirsch

Head of the Programming Research Laboratory

Publications

JEDI: These aren't the JSON documents you're looking for...

Authors
Hütter, T.; Augsten, N.; Kirsch, C.; Carey, M.J.; Li, C.
Year
2022
Published
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data. New York: Association for Computing Machinery, 2022. p. 1584-1597. ISSN 0730-8078. ISBN 978-1-4503-9249-5.
Type
Proceedings paper
Annotation
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold τ, retrieve all documents that are within distance τ of the query document. Different from other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document, which poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. We develop QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. To boost the performance of JSON similarity queries, we introduce an index called JSIM and an effective upper bound based on tree sorting. Our upper bound algorithm runs in O(nτ) time and O(n + τ log n) space, which substantially improves the previous best bound of O(n²) time and O(n log n) space (where n is the tree size). Our experimental evaluation shows that our solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
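The tree model described above, with unordered object children and ordered array children, can be sketched in a few lines. This is an illustrative sketch only, not the paper's actual data structure; the tuple layout and labels are assumptions made for exposition.

```python
import json

def json_tree(value):
    """Map a JSON value to a (label, ordered, children) tuple.

    Sketch of the idea behind the paper's JSON tree model (assumed
    representation, not the authors' implementation): object keys form
    an unordered sibling collection, array elements an ordered one.
    """
    if isinstance(value, dict):
        # Object: sibling order is irrelevant -> unordered children.
        return ("object", False,
                [("key:" + k, True, [json_tree(v)]) for k, v in value.items()])
    if isinstance(value, list):
        # Array: sibling order matters -> ordered children.
        return ("array", True, [json_tree(v) for v in value])
    # Literals (strings, numbers, booleans, null) become labeled leaves.
    return ("val:" + json.dumps(value), True, [])

doc = {"name": "JEDI", "tags": ["json", "edit-distance"]}
tree = json_tree(doc)
```

An edit-based distance such as JEDI then compares two such trees, with the ordered/unordered flag deciding whether sibling subtrees may be matched in any permutation.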

ASE: A Value Set Decision Procedure for Symbolic Execution

Authors
Abyaneh, A.S.; Kirsch, C.
Year
2021
Published
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 2021. p. 203-214. ISBN 978-1-6654-0337-5.
Type
Proceedings paper
Annotation
A symbolic execution engine regularly queries a Satisfiability Modulo Theories (SMT) solver to determine the reachability of code during execution. Unfortunately, the SMT solver is often the bottleneck of symbolic execution. Inspired by abstract interpretation, we propose an abstract symbolic execution (ASE) engine that aims to query the SMT solver less often by trying to compute reachability faster through an increasingly weaker abstraction. For this purpose, we have designed and implemented a value set decision procedure based on strided value interval (SVI) sets for efficiently determining precise or under-approximating value sets for variables. Our ASE engine begins reasoning with respect to the SVI abstraction and falls back to the theory of bit-vectors implemented in SMT solvers only when needed; it efficiently detects when the current abstraction becomes incomplete and then moves on to the next. We have designed and implemented a prototype of our engine for a subset of 64-bit RISC-V. Our experimental evaluation shows that our prototype often improves symbolic execution time by significantly reducing the number of SMT queries while, whenever an abstraction does not work, the overhead of having tried it remains low.
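A strided value interval denotes the set {lo, lo + s, lo + 2s, ..., hi}, which makes membership testing cheap compared with a full SMT query. The following is a minimal sketch under that common definition of strided intervals; it illustrates the flavor of the abstraction, not the paper's actual decision procedure.

```python
class StridedInterval:
    """Sketch of a strided value interval: {lo, lo+stride, ..., hi}.

    Assumed textbook definition for exposition; the paper's SVI sets
    and decision procedure are more involved (e.g. bit-vector wrap-around).
    """

    def __init__(self, lo, hi, stride):
        assert stride > 0 and (hi - lo) % stride == 0
        self.lo, self.hi, self.stride = lo, hi, stride

    def __contains__(self, v):
        # O(1) membership: range check plus stride alignment,
        # versus an SMT solver call on a bit-vector formula.
        return self.lo <= v <= self.hi and (v - self.lo) % self.stride == 0

    def values(self):
        # Enumerate the concrete value set (feasible for small intervals).
        return list(range(self.lo, self.hi + 1, self.stride))

si = StridedInterval(4, 16, 4)  # denotes {4, 8, 12, 16}
```

An abstract engine in this style would answer reachability questions against such sets first and only hand the query to the SMT solver when the abstraction cannot represent the value set precisely.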

What we eval in the shadows: A large-scale study of eval in R programs

Year
2021
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA). ISSN 2475-1421.
Type
Article
Annotation
Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reason to believe that reflective feature usage is language- and application-domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive, and arguably more dangerous, than what was previously reported for JavaScript.
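The study targets R, but the way eval defeats static analysis is the same in any dynamic language. As a minimal analogy (in Python rather than R, purely for exposition; the function names are invented for the example):

```python
# Illustration of why eval hinders static analysis: the callee is
# assembled from a runtime string, so no static analyzer or compiler
# can resolve which function actually runs.

def dispatch(op_name, x, y):
    # The call target exists only as text until this line executes.
    return eval(op_name + "(x, y)")

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

result = dispatch("add", 2, 3)  # resolved only at runtime
```

Because `dispatch` can invoke any function whose name arrives as data, call-graph construction, dead-code elimination, and inlining all become unsound or impossible, which is the optimization barrier the abstract describes.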