We can manage huge amounts of unused research data

Research in genetics, biomedicine, archaeology or ecology leaves behind a vast amount of data that is difficult to reuse. This valuable data cannot simply be left lying around: in medicine, for example, we could lose information vital to research into incurable diseases. Data management planning is critical to solving this problem, yet it is challenging and scientists dread it. A team from FIT CTU has developed an effective solution in the form of the Data Stewardship Wizard (DSW). This tool helps researchers plan how best to use existing data, describe the resulting data correctly, store it and make it available for scientific purposes to universities and other research organizations in the EU and worldwide. The European Commission also recommends the tool.

Research worldwide churns out petabytes (10^15 bytes) of data every year. Enormous wealth lies in this scientific data, which is often created as a by-product of research. Correct processing, security and proper documentation of the data are crucial for its use and, thus, for science as a whole. In medicine, for example, a major problem is that institutions themselves often do not know what data on a given problem already exists, where to look for it and what exactly it represents. This is one of today’s fundamental societal challenges, emerging with the development of informatics and increasing investment in digitization.

Experts from FIT CTU are working to make data more accessible and usable for scientists through projects focused on the goals of the FAIR initiative. The name FAIR is an acronym for the initiative’s aim of making data Findable, Accessible, Interoperable and Reusable. One of the team’s achievements is the DSW data management planning tool.

“We developed the DSW tool at FIT CTU in cooperation with Dutch colleagues within the ELIXIR infrastructure. It is groundbreaking because it facilitates data management planning for scientists, which all public funders now require. Without proper data management planning, no scientist would start a challenging experiment, but it is often done as an ‘annoying duty’. One reason for this is the difficulty of creating a good plan. This tool provides an easy and efficient way to create good data management plans, guides the researcher, helps to use what is available for research and maximizes the effect of research data, bringing value to researchers, institutions and society as a whole,” said Robert Pergl, head of the FIT CTU Centre for Conceptual Modelling and Implementation (CCMi).

DSW is currently most widely used in the natural sciences, but it is also ready for use in other scientific fields. It serves not only for planning itself but also for teaching data management.

“The FAIR principles significantly improve data reusability, especially concerning linking data to each other, for example, linking data from clinical drug trials to genetics research. Efficient use of data at a global level is also crucial for effectively dealing with epidemiological situations, as demonstrated by the COVID-19 pandemic. In this context, we participated with the DSW project in the digitization and FAIRification of patient data that can be used, entirely anonymized, in further research,” Robert Pergl commented on the broad applicability of DSW for current societal challenges.

One of the goals of the FAIR initiative is better machine processability of data, which is essential for artificial intelligence (AI) research. If AI has access to more well-described and interpretable datasets, it will understand them better and use them more effectively to improve its results. The exactness and auditability of AI outputs are crucial in moving from “creative AI” to “exact AI”.

DSW was created within the international infrastructure for data in the life sciences called ELIXIR, in cooperation between the Dutch node (ELIXIR-NL) and the Czech node (ELIXIR-CZ), namely, FIT CTU and the Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences. The project is funded from European and national sources, especially from the large infrastructure support projects of the Ministry of Education, Youth and Sports.

This year sees the start of the follow-up 2023–2026 LM project, within which the development and deployment of DSW will continue. The ambition is to progressively integrate DSW with other tools to help not only with planning but also with implementing the plans, further fulfilling the authors’ vision of contributing to effective and efficient data management.