prof. Jan Vitek, MSc., Ph.D.

Reducing Feedback Pollution

Authors

Krynski, S.; Štěpánek, M.; Říha, F.; Křikava, F.; Vitek, J.

Year

2024

Published

VMIL '24: Proceedings of the 16th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages. New York: Association for Computing Machinery, 2024. p. 65-74. 16. ISBN 979-8-4007-1213-5.

Type

Proceedings paper

DOI

10.1145/3689490.3690404

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

Just-in-time compilers enhance the performance of future invocations of a function by generating code tailored to past behavior. To achieve this, compilers use a data structure, often referred to as a feedback vector, to record information about each function’s invocations. However, over time, feedback vectors tend to become less precise, leading to lower-quality code – a phenomenon known as feedback vector pollution. This paper examines feedback vector pollution within the context of a compiler for the R language. We provide data, discuss an approach to reduce pollution in practice, and implement a proof-of-concept implementation of this approach. The preliminary results of the implementation indicate ∼30% decrease in polluted compilations and ∼37% decrease in function pollution throughout our corpus.

Reusing Just-in-Time Compiled Code

Authors

Mehta, M.K.; Krynski, S.; Gualandi, H.; Thakur, M.; Vitek, J.

Year

2023

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2023, 7(OOPSLA2), 1176-1197. ISSN 2475-1421.

Type

Article

DOI

10.1145/3622839

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

Most code is executed more than once. If not entire programs then libraries remain unchanged from one run to the next. Just-in-time compilers expend considerable effort gathering insights about code they compiled many times, and often end up generating the same binary over and over again. We explore how to reuse compiled code across runs of different programs to reduce warm-up costs of dynamic languages. We propose to use speculative contextual dispatch to select versions of functions from an off-line curated code repository. That repository is a persistent database of previously compiled functions indexed by the context under which they were compiled. The repository is curated to remove redundant code and to optimize dispatch. We assess practicality by extending Ř, a compiler for the R language, and evaluating its performance. Our results suggest that the approach improves warmup times while preserving peak performance.

Deoptless: Speculation with Dispatched On-Stack Replacement and Specialized Continuations

Authors

Flückiger, O.; Ječmen, J.; Krynski, S.; Vitek, J.

Year

2022

Published

PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: Association for Computing Machinery, 2022. p. 749-761. ISBN 978-1-4503-9265-5.

Type

Proceedings paper

DOI

10.1145/3519939.3523729

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

Just-in-time compilation provides significant performance improvements for programs written in dynamic languages. These benefits come from the ability of the compiler to speculate about likely cases and generate optimized code for these. Unavoidably, speculations sometimes fail and the optimizations must be reverted. In some pathological cases, this can leave the program stuck with suboptimal code. In this paper we propose deoptless, a technique that replaces deoptimization points with dispatched specialized continuations. The goal of deoptless is to take a step towards providing users with a more transparent performance model in which mysterious slowdowns are less frequent and grave.

signatr: A Data-Driven Fuzzing Tool for R

Authors

Turcotte, A.; Donat-Bouillud, P.; Křikava, F.; Vitek, J.

Year

2022

Published

SLE 2022: Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering. New York: Association for Computing Machinery, 2022. p. 216-221. 15. vol. 1. ISBN 978-1-4503-9919-7.

Type

Proceedings paper

DOI

10.1145/3567512.3567530

Departments

Programming Research Lab

Annotation

The fast-and-loose, permissive semantics of dynamic programming languages limit the power of static analyses. For that reason, soundness is often traded for precision through dynamic program analysis. Dynamic analysis is only as good as the available runnable code, and relying solely on test suites is fraught as they do not cover the full gamut of possible behaviors. Fuzzing is an approach for automatically exercising code, and could be used to obtain more runnable code. However, the shape of user-defined data in dynamic languages is difficult to intuit, limiting a fuzzer's reach. We propose a feedback-driven blackbox fuzzing approach which draws inputs from a database of values recorded from existing code. We implement this approach in a tool called signatr for the R language. We present the insights of its design and implementation, and assess signatr's ability to uncover new behaviors by fuzzing 4,829 R functions from 100 R packages, revealing 1,195,184 new signatures.

CodeDJ: Reproducible queries over large-scale software repositories

Authors

Máj, P.; Siek, K.; Kovalenko, A.; Vitek, J.

Year

2021

Published

Leibniz International Proceedings in Informatics (LIPIcs). Saarbrücken: Dagstuhl Publishing,, 2021. p. 1-24. ISSN 1868-8969. ISBN 978-3-95977-190-0.

Type

Proceedings paper

DOI

10.4230/LIPIcs.ECOOP.2021.6

Departments

Department of Theoretical Computer Science
Department of Applied Mathematics
Programming Research Lab

Annotation

Analyzing massive code bases is a staple of modern software engineering research – a welcome side-effect of the advent of large-scale software repositories such as GitHub. Selecting which projects one should analyze is a labor-intensive process, and a process that can lead to biased results if the selection is not representative of the population of interest. One issue faced by researchers is that the interface exposed by software repositories only allows the most basic of queries. CodeDJ is an infrastructure for querying repositories composed of a persistent datastore, constantly updated with data acquired from GitHub, and an in-memory database with a Rust query interface. CodeDJ supports reproducibility, historical queries are answered deterministically using past states of the datastore; thus researchers can reproduce published results. To illustrate the benefits of CodeDJ, we identify biases in the data of a published study and, by repeating the analysis with new data, we demonstrate that the study’s conclusions were sensitive to the choice of projects.

First-class environments in R

Authors

Goel, A.; Vitek, J.

Year

2021

Published

DLS 2021: Proceedings of the 17th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2021. p. 12-22. ISBN 9781450391054.

Type

Proceedings paper

DOI

10.1145/3486602.3486768

Departments

Programming Research Lab

Annotation

The R programming language is widely used for statistical computing. To enable interactive data exploration and rapid prototyping, R encourages a dynamic programming style. This programming style is supported by features such as first-class environments. Amongst widely used languages, R has the richest interface for programmatically manipulating environments. With the flexibility afforded by reflective operations on first-class environments, come significant challenges for reasoning and optimizing user-defined code. This paper documents the reflective interface used to operate over first-class environment. We explain the rationale behind its design and conduct a large-scale study of how the interface is used in popular libraries. © 2021 Owner/Author.

Formally verified speculation and deoptimization in a JIT compiler

Authors

Barrière, A.; Blazy, S.; Flückiger, O.; Pichardie, D.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(POPL), 1-26. ISSN 2475-1421.

Type

Article

DOI

10.1145/3434327

Departments

Programming Research Lab

Annotation

Just-in-time compilers for dynamic languages routinely generate code under assumptions that may be invalidated at run-time, this allows for specialization of program code to the common case in order to avoid unnecessary overheads due to uncommon cases. This form of software speculation requires support for deoptimization when some of the assumptions fail to hold. This paper presents a model just-in-time compiler with an intermediate representation that explicits the synchronization points used for deoptimization and the assumptions made by the compiler's speculation. We also present several common compiler optimizations that can leverage speculation to generate improved code. The optimizations are proved correct with the help of a proof assistant. While our work stops short of proving native code generation, we demonstrate how one could use the verified optimization to obtain significant speed ups in an end-to-end setting. © 2021 Owner/Author.

Promises Are Made To Be Broken: Migrating R to Strict Semantics

Authors

Goel, A.; Ječmen, J.; Krynski, S.; Flückiger, O.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), 1-20. ISSN 2475-1421.

Type

Article

DOI

10.1145/3485478

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

Function calls in the R language do not evaluate their arguments, these are passed to the callee as suspended computations and evaluated if needed. After 25 years of experience with the language, there are very few cases where programmers leverage delayed evaluation intentionally and laziness comes at a price in performance and complexity. This paper explores how to evolve the semantics of a lazy language towards strictness-by-default and laziness-on-demand. To provide a migration path, it is necessary to provide tooling for developers to migrate libraries without introducing errors. This paper reports on a dynamic analysis that infers strictness signatures for functions to capture both intentional and accidental laziness. Over 99% of the inferred signatures were correct when tested against clients of the libraries.

Type stability in Julia: Avoiding performance pathologies in JIT compilation

Authors

Pelenitsyn, A.; Belyakova, J.; Chung, B.; Tate, R.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5 1-26. ISSN 2475-1421.

Type

Article

DOI

10.1145/3485527

Departments

Programming Research Lab

Annotation

As a scientific programming language, Julia strives for performance but also provides high-level productivity features. To avoid performance pathologies, Julia users are expected to adhere to a coding discipline that enables so-called type stability. Informally, a function is type stable if the type of the output depends only on the types of the inputs, not their values. This paper provides a formal definition of type stability as well as a stronger property of type groundedness, shows that groundedness enables compiler optimizations, and proves the compiler correct. We also perform a corpus analysis to uncover how these type-related properties manifest in practice.

What we eval in the shadows: A large-scale study of eval in R programs

Authors

Goel, A.; Donat-Bouillud, P.; Křikava, F.; Kirsch, C.; Vitek, J.

Year

2021

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.

Type

Article

DOI

10.1145/3485502

Departments

Programming Research Lab

Annotation

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.

Contextual Dispatch for Function Specialization

Authors

Flückiger, O.; Chari, G.; Yee, M.-H.; Ječmen, J.; Vitek, J.

Year

2020

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-24. ISSN 2475-1421.

Type

Article

DOI

10.1145/3428288

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

In order to generate efficient code, dynamic language compilers often need information, such as dynamic types, not readily available in the program source. Leveraging a mixture of static and dynamic information, these compilers speculate on the missing information. Within one compilation unit, they specialize the generated code to the previously observed behaviors, betting that past is prologue. When speculation fails, the execution must jump back to unoptimized code. In this paper, we propose an approach to further the specialization, by disentangling classes of behaviors into separate optimization units. With contextual dispatch, functions are versioned and each version is compiled under different assumptions. When a function is invoked, the implementation dispatches to a version optimized under assumptions matching the dynamic context of the call. As a proof-of-concept, we describe a compiler for the R language which uses this approach. Our implementation is, on average, 1.7× faster than the GNU R reference implementation. We evaluate contextual dispatch on a set of benchmarks and measure additional speedup, on top of traditional speculation with deoptimization techniques. In this setting contextual dispatch improves the performance of 18 out of 46 programs in our benchmark suite.

Designing types for R, empirically

Authors

Turcotte, A.; Goel, A.; Křikava, F.; Vitek, J.

Year

2020

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-25. ISSN 2475-1421.

Type

Article

DOI

10.1145/3428249

Departments

Programming Research Lab

Annotation

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work are the two questions: what expressive power is needed to accurately type R code? and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insights in the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Sampling optimized code for type feedback

Authors

Flückiger, O.; Wälchli, A.; Krynski, S.; Vitek, J.

Year

2020

Published

DSL_Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2020. p. 99-111. ISBN 978-1-4503-8175-8.

Type

Proceedings paper

DOI

10.1145/3426422.3426984

Departments

Programming Research Lab

Annotation

To efficiently execute dynamically typed languages, many language implementations have adopted a two-tier architecture. The first tier aims for low-latency startup times and collects dynamic profiles, such as the dynamic types of variables. The second tier provides high-throughput using an optimizing compiler that specializes code to the recorded type information. If the program behavior changes to the point that not previously seen types occur in specialized code, that specialized code becomes invalid, it is deoptimized, and control is transferred back to the first tier execution engine which will start specializing anew. However, if the program behavior becomes more specific, for instance, if a polymorphic variable becomes monomorphic, nothing changes. Once the program is running optimized code, there are no means to notice that an opportunity for optimization has been missed. We propose to employ a sampling-based profiler to monitor native code without any instrumentation. The absence of instrumentation means that when the profiler is not active, no overhead is incurred. We present an implementation is in the context of the A just-in-time, optimizing compiler for the R language. Based on the sampled profiles, we are able to detect when the native code produced by A is specialized for stale type feedback and recompile it to more type-specific code. We show that sampling adds an overhead of less than 3 © 2020 ACM.

World age in Julia: Optimizing method dispatch in the presence of eval

Authors

Belyakova, J.; Chung, B.; Gelinas, J.; Nash, J.; Vitek, J.

Year

2020

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-26. ISSN 2475-1421.

Type

Article

DOI

10.1145/3428275

Departments

Programming Research Lab

Annotation

Dynamic programming languages face semantic and performance challenges in the presence of features, such as eval, that can inject new code into a running program. The Julia programming language introduces the novel concept of world age to insulate optimized code from one of the most disruptive side-effects of eval: changes to the definition of an existing function. This paper provides the first formal semantics of world age in a core calculus named juliette, and shows how world age enables compiler optimizations, such as inlining, in the presence of eval. While Julia also provides programmers with the means to bypass world age, we found that this mechanism is not used extensively: a static analysis of over 4,000 registered Julia packages shows that only 4-9% of packages bypass world age. This suggests that Julia's semantics aligns with programmer expectations. © 2020 Owner/Author.

How to Evaluate the Performance of Gradual Type Systems

Authors

Vitek, J.; Greeman, B.; Takikawa, A.; New, M.S.; Feltey, D.; Findler, R.B.; Felleisen, M.

Year

2019

Published

JOURNAL OF FUNCTIONAL PROGRAMMING. 2019, 29 ISSN 0956-7968.

Type

Article

DOI

10.1017/S0956796818000217

Departments

Programming Research Lab

Annotation

A sound gradual type system ensures that untyped components of a program can never break the guarantees of statically typed components. This assurance relies on runtime checks, which in turn impose performance overhead in proportion to the frequency and nature of interaction between typed and untyped components. The literature on gradual typing lacks rigorous descriptions of methods for measuring the performance of gradual type systems. This gap has consequences for the implementors of gradual type systems and developers who use such systems. Without systematic evaluation of mixed-typed programs, implementors cannot precisely determine how improvements to a gradual type system affect performance. Developers cannot predict whether adding types to part of a program will significantly degrade (or improve) its performance. This paper presents the first method for evaluating the performance of sound gradual type systems. The method quantifies both the absolute performance of a gradual type system and the relative performance of two implementations of the same gradual type system. To validate the method, the paper reports on its application to twenty programs and three implementations of Typed Racket.

Can android run on time? Extending and measuring the android platform's timeliness

Authors

Yan, Y.; Gokul, G.; Dantu, K.; Ko, S.Y.; Vitek, J.; ZIAREK, L.

Year

2019

Published

ACM Transactions on Embedded Computing Systems. 2019, 17(6), ISSN 1539-9087.

Type

Article

DOI

10.1145/3289257

Departments

Programming Research Lab

Annotation

Time predictability is dificult to achieve in the complex, layered execution environments that are common inmodern embedded devices such as smartphones. We explore adopting the Android programming model for arange of embedded applications that extends beyond mobile devices, under the constraint that changes towidely used libraries should be minimized. The challenges we explore include: the interplay between real-timeactivities and the rest of the system, how to express the timeliness requirements of components, and how wellthose requirements can be met on stock embedded platforms. We detail the design and implementation of ourmodifications to the Android framework along with a real-time VM and OS, and provide experimental datavalidating feasibility over five applications.

Feature-specific profiling

Authors

Vitek, J.; Andersen, L.; St-Amour, V.; Felleisen, M.

Year

2019

Published

ACM Transactions on Programming Languages and Systems. 2019, 41(1), ISSN 0164-0925.

Type

Article

DOI

10.1145/3275519

Departments

Programming Research Lab

Annotation

While high-level languages come with significant readability and maintainability benefits, their performance remains difficult to predict. For example, programmers may unknowingly use language features inappropriately, which cause their programs to run slower than expected. To address this issue, we introduce feature-specific profiling, a technique that reports performance costs in terms of linguistic constructs. Feature-specific profilers help programmers find expensive uses of specific features of their language. We describe the architecture of a profiler that implements our approach, explain prototypes of the profiler for two languages with different characteristics and implementation strategies, and provide empirical evidence for the approach's general usefulness as a performance debugging tool. © 2018 Copyright held by the owner/author(s).

On Julia's Efficient Algorithm for Subtyping Union Types and Covariant Tuples (Artifact).

Authors

Vitek, J.; Nardelli, F.; Chung, B.

Year

2019

Published

33rd European Conference on Object-Oriented Programming (ECOOP 2019). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2019. p. 1-24. ISSN 1868-8969. ISBN 978-3-95977-111-5.

Type

Proceedings paper

DOI

10.4230/LIPIcs.ECOOP.2019.24

Departments

Programming Research Lab

Annotation

The Julia programming language supports multiple dispatch and provides a rich type annotation language to specify method applicability. When multiple methods are applicable for a given call, Julia relies on subtyping between method signatures to pick the correct method to invoke. Julia’s subtyping algorithm is surprisingly complex, and determining whether it is correct remains an open question. In this paper, we focus on one piece of this problem: the interaction between union types and covariant tuples. Previous work normalized unions inside tuples to disjunctive normal form. However, this strategy has two drawbacks: complex type signatures induce space explosion, and interference between normalization and other features of Julia’s type system. In this paper, we describe the algorithm that Julia uses to compute subtyping between tuples and unions – an algorithm that is immune to space explosion and plays well with other features of the language. We prove this algorithm correct and complete against a semantic-subtyping denotational model in Coq

On the Design, Implementation, and Use of Laziness in R

Authors

Vitek, J.; Goel, A.

Year

2019

Published

Volume 3, Issue OOPSLA. New York: ACM, 2019. p. 1-27. ISSN 2475-1421.

Type

Proceedings paper

DOI

10.1145/3360579

Departments

Programming Research Lab

Annotation

The R programming language has been lazy for over twenty-five years. This paper presents a review of the design and implementation of call-by-need in R, and a data-driven study of how generations of programmers have put laziness to use in their code. We analyze 16,707 packages and observe the creation of 270.9 B promises. Our data suggests that there is little supporting evidence to assert that programmers use laziness to avoid unnecessary computation or to operate over infinite data structures. For the most part R code appears to have been written without reliance on, and in many cases even knowledge of, delayed argument evaluation. The only significant exception is a small number of packages which leverage call-by-need for meta-programming.

On the Impact of Programming Languages on Code Quality: A Reproduction Study

Authors

Vitek, J.; Máj, P.; Hollenbeck, C.; Berger, E.; Vitek, O.

Year

2019

Published

ACM Transactions on Programming Languages and Systems,Vol. 41, No. 4. 2019, 41(4), 1-24. ISSN 0164-0925.

Type

Article

DOI

10.1145/3340571

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

In a 2014 article, Ray, Posnett, Devanbu, and Filkov claimed to have uncovered a statistically significant association between 11 programming languages and software defects in 729 projects hosted on GitHub. Specifically, their work answered four research questions relating to software defects and programming languages. With data and code provided by the authors, the present article first attempts to conduct an experimental repetition of the original study. The repetition is only partially successful, due to missing code and issues with the classification of languages. The second part of this work focuses on their main claim, the association between bugs and languages, and performs a complete, independent reanalysis of the data and of the statistical modeling steps undertaken by Ray et al. in 2014. This reanalysis uncovers a number of serious flaws that reduce the number of languages with an association with defects down from 11 to only 4. Moreover, the practical effect size is exceedingly small. These results thus undermine the conclusions of the original study. Correcting the record is important, as many subsequent works have cited the 2014 article and have asserted, without evidence, a causal link between the choice of programming language for a given task and the number of software defects. Causation is not supported by the data at hand; and, in our opinion, even after fixing the methodological flaws we uncovered, too many unaccounted sources of bias remain to hope for a meaningful comparison of bug rates across languages.

R melts brains: an IR for first-class environments and lazy effectful arguments

Authors

Flückiger, O.; Chari, G.; Ječmen, J.; Yee, M.; Hain, J.; Vitek, J.

Year

2019

Published

Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages. ACM New York, 2019. p. 55-66. ISBN 978-1-4503-6996-1.

Type

Proceedings paper

DOI

10.1145/3359619.3359744

Departments

Department of Theoretical Computer Science
Programming Research Lab

Annotation

The R programming language combines a number of features considered hard to analyze and implement efficiently: dynamic typing, reflection, lazy evaluation, vectorized primitive types, first-class closures, and extensive use of native code. Additionally, variable scopes are reified at runtime as first-class environments. The combination of these features renders most static program analysis techniques impractical, and thus, compiler optimizations based on them ineffective. We present our work on PIR, an intermediate representation with explicit support for first-class environments and effectful lazy evaluation. We describe two dataflow analyses on PIR: the first enables reasoning about variables and their environments, and the second infers where arguments are evaluated. Leveraging their results, we show how to elide environment creation and inline functions.

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Authors

Křikava, F.; Vitek, J.; Miller, H.

Year

2019

Published

Volume 3, Issue OOPSLA October 2019. New York: ACM, 2019. p. 1-28. ISSN 2475-1421.

Type

Proceedings paper

DOI

10.1145/3360589

Departments

Programming Research Lab

Annotation

The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Towards a type system for R

Authors

Turcotte, A.; Vitek, J.

Year

2019

Published

ICOOOLPS '19: Proceedings of the 14th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems. New York: Association for Computing Machinery, 2019. p. 1-5. ISBN 978-1-4503-6862-9.

Type

Proceedings paper

DOI

10.1145/3340670.3342426

Departments

Programming Research Lab

Annotation

R is a fascinating language: It is dynamically typed, vectorized, both lazy and side-effecting, and it fosters an interactive style of programming. This unique combination of features makes it easy to use, but prone to errors and strange behaviour. R is the tool of choice for many data analysts, and our aim is to empower them with a language that is not simply easy to use, but easy to use well, so as to increase their confidence in the data analyses they undertake. To that end, we are developing a type system for R that is simple enough to be attractive to programmers while being expressive enough to capture existing programming paradigms. In this paper, we outline past, present, and future work as we build up to a type system for R.

Correctness of a concurrent object collector for actor languages

Authors

Franco, J.; Clebsch, S.; Drossopoulou, S.; Vitek, J.

Year

2018

Published

Lecture Notes in Computer Science book series. Düsseldorf: Springer-VDI-Verlag, 2018. p. 885-911. ISBN 978-3-319-89883-4.

Type

Proceedings paper

DOI

10.1007/978-3-319-89884-1_31

Departments

Programming Research Lab

Annotation

ORCA is a garbage collection protocol for actor-based programs. Multiple actors may mutate the heap while the collector is running without any dedicated synchronisation. ORCA is applicable to any actor language whose type system prevents data races and which supports causal message delivery. We present a model of ORCA which is parametric to the host language and its type system. We describe the interplay between the host language and the collector. We give invariants preserved by ORCA, and prove its soundness and completeness. © The Author(s) 2018.

Correctness of speculative optimizations with dynamic deoptimization

Authors

Vitek, J.; Fluckiger, O.; Scherer, G.; Yee, M.; Goel, A.; Ahmed, A.

Year

2018

Published

Proceedings of the ACM on Programming Languages (PACMPL). 2018, 2 ISSN 2475-1421.

Type

Article

DOI

10.1145/3158137

Departments

Programming Research Lab

Annotation

High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.

Julia Subtyping: A Rational Reconstruction

Authors

Vitek, J.; Chung, B.; Belyakova, Y.; Pelenitsyn, A.; Bezanson, J.; Nardelli, F.Z.

Year

2018

Published

Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.

Type

Proceedings paper

DOI

10.1145/3276483

Departments

Programming Research Lab

Annotation

Programming languages that support multiple dispatch rely on an expressive notion of subtyping to specify method applicability. In these languages, type annotations on method declarations are used to select, out of a potentially large set of methods, the one that is most appropriate for a particular tuple of arguments. Julia is a language for scientific computing built around multiple dispatch and an expressive subtyping relation. This paper provides the first formal definition of Julia's subtype relation and motivates its design. We validate our specification empirically with an implementation of our definition that we compare against the existing Julia implementation on a collection of real-world programs. Our subtype implementation differs on 122 subtype tests out of 6014476. The first 120 differences are due to a bug in Julia that was fixed once reported; the remaining 2 are under discussion.

Julia: Dynamism and Performance Reconciled by Design

Authors

Vitek, J.; Bezanson, J.; Chung, B.; Karpinski, S.; Shah, V.B.; Zoubritski, L.

Year

2018

Published

Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.

Type

Proceedings paper

DOI

10.1145/3276490

Departments

Programming Research Lab

Annotation

Julia is a programming language for the scientific community that combines features of productivity languages, such as Python or MATLAB, with characteristics of performance-oriented languages, such as C++ or Fortran. Julia's productivity features include: dynamic typing, automatic memory management, rich type annotations, and multiple dispatch. At the same time, Julia allows programmers to control memory layout and leverages a specializing just-in-time compiler to eliminate much of the overhead of those features. This paper details the design choices made by the creators of Julia and reflects on the implications of those choices for performance and usability.

KafKa: Gradual typing for objects

Authors

Chung, B.; Li, P.; Nardelli, F.Z.; Vitek, J.

Year

2018

Published

Leibniz International Proceedings in Informatics, LIPIcs. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2018. ISBN 978-3-95977-079-8.

Type

Proceedings paper

DOI

10.4230/LIPIcs.ECOOP.2018.12

Departments

Programming Research Lab

Annotation

A wide range of gradual type systems have been proposed, providing many languages with the ability to mix typed and untyped code. However, hiding under language details, these gradual type systems embody fundamentally different ideas of what it means to be well-typed. In this paper, we show that four of the most common gradual type systems provide distinct guarantees, and we give a formal framework for comparing gradual type systems for object-oriented languages. First, we show that the different gradual type systems are practically distinguishable via a three-part litmus test. We present a formal framework for defining and comparing gradual type systems. Within this framework, different gradual type systems become translations between a common source and target language, allowing for direct comparison of semantics and guarantees. © Benjamin Chung, Paley Li, Francesco Zappa Nardelli, and Jan Vitek.

Monotonic gradual typing in a common calculus

Authors

Chung, B.; Vitek, J.

Year

2018

Published

Proceeding ISSTA '18 Companion Proceedings for the ISSTA/ECOOP 2018 Workshops. New York: ACM, 2018. p. 17-23. ISBN 978-1-4503-5939-9.

Type

Proceedings paper

DOI

10.1145/3236454.3236475

Departments

Programming Research Lab

Annotation

Gradual typing refers to the notion that programs can be incrementally decorated with type annotations. Languages that support this approach to software development allow for programs being in various states of "typedness" on a scale ranging from entirely untyped to fully statically typed. Points in the middle of this typed-untyped scale create interactions between typed and untyped code, which is where gradual type systems differ. Each gradual type system comes with tradeoffs. Some systems provide strong guarantees at the expense of vastly degraded performance; others do not impact the running time of programs, but they do little to prevent type errors. This paper looks at an intriguing point in the landscape of these systems: the monotonic semantics. The monotonic semantics is designed to reduce the overhead of typed field access through a different enforcement mechanism compared to other gradual type systems. In our previous paper, [1], we described four semantics for gradual typing. This paper uses the framework of that companion paper to present and explore a formulation for the monotonic semantics. In comparison to the others, the monotonic semantics is designed to reduce the overhead of typed field access. We translate a common gradually typed source language to a common statically typed target language according to the monotonic semantics, allowing easy comparison to other gradual type systems in our framework.

Self-contained development environments

Authors

Chari, G.; Vitek, J.; Pimás, J.; Flückiger, O.

Year

2018

Published

DLS 2018 Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2018. p. 76-87. ISBN 9781450360302.

Type

Proceedings paper

DOI

10.1145/3276945.3276948

Departments

Programming Research Lab

Annotation

Operating systems are traditionally implemented in lowlevel, performance-oriented programming languages. These languages typically rely on minimal runtime support and provide unfettered access to the underlying hardware. Tradition has benefits: developers control the resources that the operating system manages and few performance bottlenecks cannot be overcome with clever feats of programming. On the other hand, this makes operating systems harder to understand and maintain. Furthermore, those languages have few built-in barriers against bugs. This paper is an experiment in side-stepping operating systems, and pushing functionality into the runtime of high-level programming languages. The question we try to answer is how much support is needed to run an application written in, say, Smalltalk or Python on bare metal, that is, with no underlying operating system. We present a framework named NopSys that allows this, and we validate it with the implementation of CogNos a Smalltalk virtual machine running on bare x86 hardware. Experimental results suggest that this approach is promising.

Tests from traces: Automated unit test extraction for R

Authors

Křikava, F.; Vitek, J.

Year

2018

Published

ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM PRESS, 2018. p. 232-241. ISBN 978-1-4503-5699-2.

Type

Proceedings paper

DOI

10.1145/3213846.3213863

Departments

Programming Research Lab

Annotation

Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decrease the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from. © 2018 Association for Computing Machinery.

Making Android Run on Time

Authors

Yan, Y; Dantu, K; Ko, S Y; Ziarek, L; Vitek, J.

Year

2017

Published

Real-Time and Embedded technology and Applications Symposium (RTAS 2017). San Francisco: American Institute of Physics and Magnetic Society of the IEEE, 2017.

Type

Proceedings paper

Departments

Programming Research Lab

Parallelizing Julia with a Non-invasive DSL

Authors

Anderson, T A; Liu, H; Kuper, L; Totoni, E; Vitek, J.; Shpeisman, T

Year

2017

Published

31st European Conference on Object-Oriented Programming (ECOOP 2017). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2017. p. 41-429. ISSN 1868-8969. ISBN 9783959770354.

Type

Proceedings paper

DOI

10.4230/LIPIcs.ECOOP.2017.4

Departments

Programming Research Lab

Annotation

Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some road blocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator's programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.

Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology

Authors

Zakowski, Y; Cachera, D; Demange, D; Petri, G; Pichardie, D; Jagannathan, S; Vitek, J.

Year

2017

Published

International Conference on Interactive Theorem Prooving (ITP). Berlin: Springer-Verlag, 2017.

Type

Proceedings paper

Departments

Programming Research Lab

prof. Jan Vitek, MSc., Ph.D.

Publications

Reducing Feedback Pollution

Reusing Just-in-Time Compiled Code

Deoptless: Speculation with Dispatched On-Stack Replacement and Specialized Continuations

signatr: A Data-Driven Fuzzing Tool for R

CodeDJ: Reproducible queries over large-scale software repositories

First-class environments in R

Formally verified speculation and deoptimization in a JIT compiler

Promises Are Made To Be Broken: Migrating R to Strict Semantics

Type stability in Julia: Avoiding performance pathologies in JIT compilation

What we eval in the shadows: A large-scale study of eval in R programs

Contextual Dispatch for Function Specialization

Designing types for R, empirically

Sampling optimized code for type feedback

World age in Julia: Optimizing method dispatch in the presence of eval

How to Evaluate the Performance of Gradual Type Systems

Can android run on time? Extending and measuring the android platform's timeliness

Feature-specific profiling

On Julia's Efficient Algorithm for Subtyping Union Types and Covariant Tuples (Artifact).

On the Design, Implementation, and Use of Laziness in R

On the Impact of Programming Languages on Code Quality: A Reproduction Study

R melts brains: an IR for first-class environments and lazy effectful arguments

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Towards a type system for R

Correctness of a concurrent object collector for actor languages

Correctness of speculative optimizations with dynamic deoptimization

Julia Subtyping: A Rational Reconstruction

Julia: Dynamism and Performance Reconciled by Design

KafKa: Gradual typing for objects

Monotonic gradual typing in a common calculus

Self-contained development environments

Tests from traces: Automated unit test extraction for R

Making Android Run on Time

Parallelizing Julia with a Non-invasive DSL

Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology