Software in science is ubiquitous yet overlooked, researchers say

Software is omnipresent in science, and yet it is widely overlooked. At a time when scientists (and many others) are talking about code, algorithms and artificial intelligence, software figures in the discourse as a mere semantic nuance. Yet many facets of software, such as questions about user licenses or file formats, are not covered by the definition of code or algorithm.

Now fourteen scientists, the majority of whom work or have worked at the Käte Hamburger Kolleg: Cultures of Research (c:o/re) at RWTH Aachen University, have published an article on the lack of attention paid to software. The work appears in Nature Computational Science.

c:o/re is an International Center for Advanced Studies in the history, philosophy, and sociology of science and technology. As an interdisciplinary research training group, it brings ten international fellows from the humanities and social sciences, the natural and technical sciences, and art and art history to Aachen every year. The center is headed by philosophy professor Gabriele Gramelsberger (Chair of Philosophy of Science and Philosophy of Technology) and sociology professor Stefan Böschen (Chair of Technology and Society). It analyzes transformations in science and technology and addresses questions such as the reproducibility of research and open access to software.

In their contribution, the authors call for bringing together perspectives on software from different fields of applied science (for example, the computer-based sciences as well as the humanities and social sciences) and from different informatics backgrounds (development, use, support, etc.) in order to uncover the different meanings that software can have. Case studies from various scientific fields, including studies of older software developments, are intended to improve the understanding of software.

A simple example: Microsoft Excel autocorrect

An example from bioinformatics: in the "supplementary materials" of bioinformatics publications, the preferred format for long gene lists is, surprisingly, the Microsoft .xls format. However, Excel automatically converts MARCH1, the designation for the gene "Membrane Associated Ring-CH-type finger 1," into a date, which corrupts the listed data. A publication from 2021 points out that the problem was recognized (and published) as early as 2004 but never went away. One-fifth of the publications dealing with gene lists contain these errors.

Researchers could use tabular plain text (.csv files) instead, but they do not, because they are used to spreadsheets. Spreadsheets, however, are not designed for processing large data sets in this way. Another reason is the dependence on widespread Microsoft software that characterizes many scientific practices. It took 20 years for researchers to finally rename the genes in question. Only recently did Microsoft Excel, a thirty-year-old software package, make it possible to switch off the automatic conversion of strings into dates.
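
The plain-text route can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the sample data and the date pattern are hypothetical, and the pattern assumes that a converted symbol shows up in a form such as "1-Mar". The sketch reads a gene list from CSV text, where every value stays a string, and flags entries that look as if a spreadsheet had already turned them into dates.

```python
import csv
import io
import re

# Hypothetical pattern for symbols that a spreadsheet has already turned into
# dates, e.g. "1-Mar" or "2-Sep". Illustrative only, not an exhaustive list.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)


def read_gene_symbols(text):
    """Return the first column of CSV text as plain strings, skipping the header."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    return [row[0] for row in rows if row]


def flag_date_like(symbols):
    """Return the entries that match the date-like pattern above."""
    return [s for s in symbols if DATE_LIKE.match(s)]


# Made-up sample: two intact symbols and one that was already converted.
sample = "gene\nMARCH1\nTP53\n1-Mar\n"
symbols = read_gene_symbols(sample)
print(symbols)                  # ['MARCH1', 'TP53', '1-Mar']
print(flag_date_like(symbols))  # ['1-Mar']
```

The csv module performs no type guessing, so MARCH1 is returned exactly as written. A check like this can only detect a converted entry; it cannot recover the original symbol.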

Research on practices and transformations in the fields of science and technology

The authors of the article examine software in scientific research from the perspectives of the computer-aided sciences, history, philosophy of science, semiotics, science and technology studies (STS), and media studies. They work at various universities around the world.

The majority of them were fellows at the Käte Hamburger Center for Advanced Study in the Humanities: Cultures of Research (c:o/re), where the idea for the joint publication emerged from the workshop "Engineering Practices: New Horizons in the Social Study of Science and Software."