What is important, and what is not, for bioscience data handling

There is an ongoing discussion between the main bureaucratic players of Swedish science regarding data storage and data handling for the biosciences in Sweden. The question they discuss is “who should pay for what, and how should the money be channelled?”

It is exactly the wrong question.

Since all of these actors (except for non-governmental organisations, e.g. KAW) are financed by taxpayers’ money, how they decide to finance things is a technical budget issue. There seems to be a strange idea floating around that if one fiddles with the financing paths, the whole problem of data storage will become more manageable. This, to my mind, misses the point entirely.

Instead, we should focus on the primary issues for science:

  • How is science going to be done in the best way? Data handling and storage must be geared to the researcher’s needs.
  • What makes life easy for the researcher? What should the basic design of the system be?
  • How do we design the system so that it fits with Open Science (at least sometime in the future)?
  • How do we give the scientists the correct incentives to store important data, scrap temporary and unimportant data, and to make data searchable?
  • How do we make data available to the world, in keeping with Open Science principles?
  • How do we ensure that researchers keep only data that is being worked on in expensive and performant media, while moving currently-not-so-relevant data to slow, cheap media?

The big players involved are: Vetenskapsrådet (VR, the Swedish Research Council), the Swedish National Infrastructure for Computing (SNIC), the universities, the various national research infrastructures (e.g. the National Genomics Infrastructure, NGI, where I work) and also the non-governmental funding bodies such as the Knut and Alice Wallenberg Foundation (KAW). They are currently discussing these issues, but apparently only from the bureaucratic perspective of “who should pay”. Fine, they need to do that.

But who discusses what the researchers need? I have so far seen very little of that essential question. And I know that there are unmet needs. We at NGI have received many desperate questions about how to handle the big data sets we produce. We also get questions on how to make certain data sets generally available to the public, data sets that do not fit into the generally accepted international databases. I have first-hand experience with researchers who buy storage at Dropbox and other commercial sites. I have no idea if this is in keeping with university policies. A policy is only as good as reality allows it to be. If a university does not provide data storage for its researchers, they will get it elsewhere. I have seen attempts at promoting figshare for Stockholm University, and also, if I remember correctly, box.com, but these are not well known among scientists, their status is unclear, and the strategy is even more opaque.

We need to think about how to fulfil the legal requirement to store data for 10 years. There is the legal aspect, of course, but mainly there is the fundamental issue that the taxpayers have a legitimate reason to demand that data produced using their money becomes accessible to them. Today, the responsibility for this lies with the universities, which delegate it to the research groups, which proceed to do what they find practical.

Of course, this leads, in the best scenario, to many different fragmented solutions. In practice, many groups will be hard pressed to dig out data sets 10 years old. Computers do not last 10 years, and what’s on old hard drives tends not to be brought along to the new computers, especially since the graduate students and postdocs who generated the data are no longer around.

Let’s focus on the main problem here: the researcher’s problem. And let the bureaucracy find solutions that are appropriate to solve that problem. Do not let the bureaucracy concentrate on finding solutions that are convenient for it. Only by pure luck would such a solution be optimal for the scientists.

3 responses to “What is important, and what is not, for bioscience data handling”

  1. Erik Lindahl

    While you certainly have many good points, I also find some of them a bit naive 🙂 I would argue that research in general is defined by “unmet needs” in the sense that the vast majority of researchers in Sweden and the world (no matter what field they are active in) don’t get all the research funding, personnel and equipment they ask for. That’s why we have research councils and other agencies that try to prioritize the most important and highest-quality research. Storage was trivial when it was small and cheap (although at that point we didn’t care…), but as it is becoming a major cost I don’t see how it’s realistic to hope that everybody should always get everything they need. One way to achieve that would of course be to take the money e.g. from neutron research, medicine, physics, economics, chemistry, climate or engineering. While you & I might like that, I bet each and every one of those fields could come up with equally good arguments why VR should reduce the funding to our field instead. At the end of the day, there is no magical kettle at the end of the rainbow with extra funding. Many of the agencies you mention above can certainly help achieve the technical solutions, but the difficult part is that we need to start compromising: we need to spend more money on storage & computing, but that will mean spending less money on other parts of our field.

    1. Well, naive, yes maybe. But sometimes one needs naivety to cut through the crap. Especially when the more fundamental issues have not been addressed.

      It is certainly not my point that more money in general should be channeled to bioscience from other fields. I think we agree that more money is needed for storage in the future of bioscience, but it will have to be allocated from bioscience funds. But this makes it even more important to make sure that it ends up in systems that cost-effectively help the scientists. I am very concerned that what is short-term convenient for VR, SNIC, the universities, NGI and others will be severely suboptimal for the researchers, especially considering the requirements of Open Science.

  2. Pingback: I have published an opinion piece on Open Science in the Swedish newspaper Dagens Nyheter – Civilisation
