A researcher’s view of data handling for life science

Given the current mess of data handling in life science (or bioscience, as it is also called), which I described in a previous article, what should be done? Let us begin with a few words from one of the gurus:

You have to start with the customer experience and work backwards to the technology.

Steve Jobs, quoted here.

We should start by defining what the needs are. What does the scientist, the research group, want in terms of data storage and handling? What do they need in order to pursue successful life science? What other goals for data storage in Swedish science are there? How can we promote approaches to data handling that facilitate Open Science?

This text is not a final text or treatise. It’s a snapshot of my thinking on the subject. Serious policy and design specifications must of course be crafted through debate and input from various experts. There is now an initiative at the Science for Life Laboratory, where I work, to discuss these issues. I have written this as a starting point for the discussion, in the hope that it may be useful for SciLifeLab and for others.

Mötesplats Open Access 2016: Open Science needs infrastructure

The two-day conference Mötesplats Open Access 2016 (MOA2016) was held 26-27 April 2016 in Stockholm, Sweden. It was arranged by Kungliga Biblioteket together with Stockholm University. My conclusion is that the conference revealed severe deficiencies in the policies and infrastructure required for Open Science, even though the idea of Open Access is fairly well established in Sweden.

I will not review the entire conference. The presentations are available here and the PowerPoint slides are available here. Instead, I will discuss the main unsolved issues that I think the conference brought into focus.

Open Science: challenges for life science

I have created a page, Open Science, with links to interesting discussions and information about Open Science, especially issues related to life science (bioscience). I hope to add more links to the page as I find them. I may also discuss some of them in this forum. Please feel free to send me tips.

The latest addition to the page is a very good recent article in The Economist about the emergence of bioRxiv, the pre-print server for life science. Publication of research in life science is currently under stress for many reasons. As the article in The Economist discusses, the delay between submission and publication of an article in life science may cause real damage. The peer review system is showing serious signs of dysfunction. Too many publications present results that cannot be replicated.

The Open Science movement is gathering momentum, in part as a response to these issues. Many challenges lie ahead, and we are in for a very interesting time. Science, especially life science, must find new ways of doing things. Data storage and data publication are among these challenges. The web is obviously already the medium of choice for scientific publication, and we need to leverage its advantages. I intend to write more about these issues in the near future.

The mess in bioscience data handling

Science is a social activity relying on knowledge sharing, reproducibility, reanalysis and extension of previous work. The movement towards Open Access publication and Open Science sharing of data and analysis protocols can be seen as a natural development of these ideals. Large data sets are essential to many scientific investigations and are sometimes the product of an investigation. The biosciences have fairly recently started producing large data sets. There are several well-funded international efforts maintaining focused bioscience data sets, such as genomes at Ensembl, protein sequence data at UniProt, and many others.

Bioscience researchers are performing more Big Data experiments, but the various infrastructures available at the group, department, university and national levels are unable to cope. The situation for individual research groups is basically a mess. Various ad hoc solutions are being implemented, ultimately leading to a patchwork of systems that is becoming increasingly difficult for anyone to navigate. This also makes proper implementation of Open Science extremely hard, if not impossible.