A researcher’s view of data handling for life science

Given the current mess of data handling in life science (or bioscience, as it is also called), which I described in a previous article, what should be done? Let us begin with a few words from one of the gurus:

You have to start with the customer experience and work backwards to the technology.

Steve Jobs, quoted here.

We should start by defining what the needs are. What does the scientist, the research group, want in terms of data storage and handling? What do they need in order to pursue successful life science? What other goals for data storage in Swedish science are there? How can we promote approaches to data handling that facilitate Open Science?

This text is not a final text or treatise. It’s a snapshot of my thinking on the subject. Serious policy and design specifications must of course be crafted through debate and input from various experts. There is now an initiative at the Science for Life Laboratory, where I work, to discuss these issues. I have written this as a starting point for the discussion, in the hope that it may be useful for SciLifeLab and for others.
Mötesplats Open Access 2016: Open Science needs infrastructure

The two-day conference Mötesplats Open Access 2016 (MOA2016) was held on 26-27 April 2016 in Stockholm, Sweden. It was arranged by Kungliga Biblioteket together with Stockholm University. My conclusion is that the conference revealed severe deficiencies in the policies and infrastructure required for Open Science, even though the idea of Open Access is fairly well established in Sweden.

I will not review the entire conference. The presentations are available here and the PowerPoint slides are available here. Instead, I will discuss the main unsolved issues that I think the conference brought into focus.
What is important, and what is not, for bioscience data handling

There is an ongoing discussion between the main bureaucratic players of Swedish science regarding the issue of data storage and data handling for the biosciences in Sweden. The question they discuss is “who should pay for what, and how should the money be channelled?”

It is exactly the wrong question.

Since all of these actors (except for non-governmental organisations, e.g. KAW) are financed by taxpayers’ money, how they decide to finance things is a technical budget issue. There seems to be a strange idea floating around that if one fiddles with the financing paths, the whole problem of data storage will become more manageable. This is, in my mind, to miss the point entirely.
The mess in bioscience data handling

Science is a social activity relying on knowledge sharing, reproducibility, reanalysis and extension of previous work. The movement towards Open Access publication and Open Science sharing of data and analysis protocols can be seen as a natural development of these ideals. Large data sets are essential to many scientific investigations and are sometimes the product of an investigation. The biosciences have fairly recently started producing large data sets. There are several well-funded international efforts maintaining focused bioscience data sets, such as genomes at Ensembl, protein sequence data at UniProt, and many others.

Bioscience researchers are performing more Big Data experiments, but the various infrastructures available at the group, department, university and national levels are unable to cope. The situation for individual research groups is basically a mess. Various ad hoc solutions are being implemented, ultimately leading to a patchwork of systems that is becoming increasingly difficult for anyone to navigate. This also makes proper implementation of Open Science extremely hard, if not impossible.
Swedish research: a lack of risk-taking

Two seminars the other day dealt with science and politics. In a somewhat unexpected way, they put the focus on an important problem with Swedish research: its lack of risk-taking.

The first seminar (24 October, Sveriges Ingenjörer) was about how research should be evaluated. State Secretary Peter Honeth presented the government’s view on the matter. One point raised in the discussion was that Swedish research does not take sufficient risks. To far too great an extent, researchers devote themselves to themes and fields that are already recognised as interesting. New ground is rarely broken.

The other seminar (23 October, Aula Magna, Stockholm University) happened to illustrate this. It dealt with Open Access, i.e. the trend that more and more scientific articles are published under so-called Open Access (free availability), so that one does not need expensive subscriptions to get at them. The seminar included a panel discussion with representatives of the higher education institutions in Stockholm. Everyone agreed that Open Access was important, and that one must follow what is happening.

But nobody seemed to entertain the thought that there was something they themselves could do to influence the development. Or that one can find opportunities to use the development to one’s own advantage. The attitude was thus fundamentally passive. They seemed to be stuck in the mindset that we poor little things in Sweden cannot influence what the giants do.

If we are to get more risk-taking in research, we have to get rid of that attitude. When an old system, such as the way scientific articles are published, breaks up and is about to be replaced by a new one, there is every opportunity even for small actors to contribute in a significant way. All it takes is having the ideas and daring to commit.