Science, data and the record

Microsoft Research in Cambridge, UK, hosted a very interesting event on the future of science: Science 2020.

The resulting document, Towards 2020 Science, sets out the challenges and opportunities arising from the increasing synthesis of computing and the sciences. It seeks to identify the requirements necessary to accelerate scientific advances – particularly those driven by computational sciences and the ‘new kinds’ of science the synthesis of computing and the sciences is creating. Already this synthesis has led to new fields and advances spanning genomics and proteomics, earth sciences and climatology, nanomaterials, chemistry and physics. [2020 Science]

The website has a range of materials, and a parallel issue of Nature has also appeared on the topic.

From the report:

4 We highlight that an immediate and important challenge is that of end-to-end scientific data management, from data acquisition and data integration, to data treatment, provenance, and persistence. …

5 Our findings have significant implications for scientific publishing, where we believe that even near-term developments in the computing infrastructure for science which links data, knowledge and scientists will lead to a transformation of the scientific communication paradigm. [Summary. Towards 2020 science. p.8 pdf]

There is also a ‘reader’s guide’ which introduces the main features of the report. It is available as a Word file. Interestingly, it raises a question that I have asked in recent talks about the ‘scholarly record’.

With an increased reliance on highly distributed and highly derived data, there is a largely unsolved problem of preserving the scientific record. There are frequent complaints that by placing data on the web (rather than conventional publications or a centralised database), essential information is lost. How do we record how a dataset was derived? How do we preserve the history of a dataset that changes all the time? How do we find the origin of data that has been repeatedly copied between data sources? Such issues have to be resolved to offer a convincing infrastructure for scientific data management. [Towards 2020 Science — A Reader’s Guide, March 2006 – word file]

As the scientific record resides not only in the published results but also in the data and applications, questions about authenticity, provenance and context come to the fore, as do questions about the integrity of citation.

Lorcan Dempsey dot Net