Rigorous Engineering of Data Analysis Pipelines (RiGiD)

Program
Grant Projects of Excellence in Basic Research EXPRO
Provider
Czech Science Foundation
Code
GX23-07580X
Period
2023 - 2027
Description
The RiGiD project lays the groundwork for this research programme and aims to develop a methodology for rigorous engineering of data analysis pipelines that can be adopted in practice. Our approach is pragmatic. Rather than chasing functional correctness, we hope to substantially reduce the incidence of errors in the wild. The research is structured in three overlapping chapters. First, we identify the problem by carrying out user studies and large-scale program analysis of a corpus of over 100,000 data science pipelines. The outcome will be a catalog of error patterns as well as a labeled dataset to be shared with other researchers. The technical advances will focus on combining dynamic and static program analysis to approximate the behavior of partial programs and programs written in highly dynamic languages. The second part of our effort proposes a methodology and tooling for developing data science code with reduced error rates. The technical contributions of this part of the project focus on lightweight specification techniques and, in particular, the development of a novel gradual typing system that deals with common programming idioms found in our corpus. This includes various forms of object orientation, data frames, and rich value specifications. These specifications are complemented with an automated test generation technique that combines test and input synthesis with fuzzing and test minimization. Finally, the execution environment is extended to support automatic reproducibility and result audits through data lineage. The third and last part of the work evaluates the proposal by conducting user studies and developing tools for automating deployment. The contribution will be a qualitative and quantitative assessment of the RiGiD methodology and tooling. The technical contribution will be tools that leverage program analysis to infer approximate specifications to assist deployment and adoption. Our tools target R, a language for data analytics with 2 milli

Connect and align ELIXIR Nodes to deliver sustainable FAIR life-science data management services

Program
Horizon 2020
Provider
European Commission
Code
871075
Period
2021 - 2024
Description
The diversity, complexity and volume, as well as privacy and regulatory considerations, necessitate a collaborative and federated approach to life-science data. For scientists to find and share data across Europe and world-wide, ELIXIR needs to continuously develop and connect its services. The international ecosystem provided by ELIXIR – with 220 institutes in 23 Nodes, connecting hundreds of bioinformatics services – is globally unique and a competitive advantage for European research. Through our national Nodes, ELIXIR has the geographical spread, service portfolio and expertise to fulfil our ambition that every European project uses FAIR data based on common standards, tools and services. The initial operational phase of ELIXIR, supported by the H2020 ELIXIR-EXCELERATE project, focussed on the coordination and delivery of bioinformatics services from national Nodes. This laid the foundation for a coordinated European infrastructure. ELIXIR-CONVERGE will build on these achievements to deliver another critical component: the provisioning, across Europe, of distributed local support for data management based on a toolkit for researchers that enables lifecycle management for their research data according to international standards. ELIXIR-CONVERGE will develop the national operations of such a distributed research infrastructure to drive good data management, reproducibility and reuse in a heterogeneous funding landscape. Over 36 months and with partners from our 23 Nodes, ELIXIR-CONVERGE takes the next step to realise a European data federation where interconnected national operations, strategically managed via national research infrastructure roadmaps, allow users to extract knowledge from life science’s large, diverse and distributed datasets. By connecting ELIXIR Nodes to provide FAIR data management as a service, ELIXIR-CONVERGE will build national capacity and create a blueprint for operating sustainable Nodes in distributed research infrastructures.

New Frontiers in Computational Social Choice

Program
Standard projects
Provider
Czech Science Foundation
Code
GA22-19557S
Period
2022 - 2024
Description
Fixed-parameter tractability and approximation algorithms are nowadays standard tools for the design of algorithms for hard problems in the area of Computational Social Choice. Surprisingly, kernelization, a prominent technique in FPT algorithmics, is not used as often to tackle social choice problems. Kernelization is a formalism of safe data reduction which we believe has its place in all research disciplines dealing with large and complex datasets. The most recent approach is the so-called lossy kernelization, which on the one hand cooperates with approximation algorithms (unlike kernelization, which can only be pipelined with exact algorithms) and on the other hand allows circumventing hardness results (in exchange for introducing a possible loss in the quality of the solution). This project aims to fill this gap in the use of these tools in computational social choice. The suggested line of research continues our current studies in this area, and the newly proposed directions will need novel algorithmic approaches as they focus on the boundaries of tractability. We have identified several interesting questions where there is potential to apply the above. Our aim is to significantly deepen our understanding of these computationally hard problems by providing polynomial-sized (lossy) kernels or showing that the problems are resistant to this line of attack.

logicMOVE: Logic Reasoning in Motion Planning for Multiple Robotic Agents

Program
Standard projects
Provider
Czech Science Foundation
Code
GA22-31346S
Period
2022 - 2024
Description
Motion planning for multiple robotic agents (MR-MoP) is the task of finding non-colliding sequences of simple movements for individual robotic agents so that each agent achieves its individual goal. An important characteristic of the task is the large number of relatively simple robotic agents that can physically interact with each other in various ways. The task is based on the well-known multi-agent path finding (MAPF) problem, but places more emphasis on the real properties of the environment in which the robotic agents operate; namely, the continuity of space and time is assumed. Considering the continuity of the environment directly in abstract models can lead to more precise and more efficient plans. The project assumes algorithmic contributions to motion planning for a multi-agent system on all important layers of common planning abstractions, i.e. from the level of (discrete) classical planning, through (continuous) motion planning, to the execution of plans with physical robots. The new algorithms will be based on the principles of logical reasoning, in particular lazy compilation approaches.

The person responsible for the content of this page: doc. Ing. Štěpán Starosta, Ph.D.