Libraries and e-science


Emerging data-intensive e-science presents many support challenges for institutions, disciplines and national bodies to work through. The role of the academic library in this multiscale world is also an open question. Two recent reports discuss e-science (or ‘cyberinfrastructure’ or ‘e-research’) in general terms and repay reading.
Liz Lyon, the Director of UKOLN, and also a principal in the Digital Curation Centre, has focused on this area for several years now and has produced an interesting synthesising report for the JISC: Open science at web-scale: optimising participation and predictive potential: consultative report [Summary; Full report PDF]. An important theme of the report is ‘data informatics’, defined in this way: “library and information science methodologies which have been applied to research data”.
The report is organized around six ‘consultation challenges’. The first is ‘scale, complexity and predictive potential’. Here is the summary:

Data-intensive science powered by contemporary computational hardware, software and research techniques, enables scientists to perform experiments and calculations at different orders of magnitude of scale and volume: research that was completed in a year can now be repeated in a weekend. Sustained growth in data modelling, complex simulations and visualisations, facilitate interpretation and analysis by humans and machines, leading to the development of predictive science scenarios in a wider range of disciplines. Examples of data intensive science at these extremes of scale, which enable forecasting and predictive assertions, have been described.

Assessments of the accuracy and robustness of predictions are linked to uncertainty quantification, the accuracy of the underlying model, and the integrity of the data. Key questions address community awareness and understanding of the potential implications and impact of (open) data-intensive science at new extremes of scale and complexity, and the service requirements for associated data curation and preservation. [Open science at web-scale: Optimising participation and predictive potential – summary]

To give some flavor of the concerns, here are the other five challenges: continuum of openness; citizen science; credentials, incentives and rewards; institutional readiness and response; data informatics capacity and capability. A brief chapter is devoted to each.
The author is positive about the role of libraries and librarians, particularly in the data informatics section. That said, given the absence of routine service and organizational responses, the library role is still expressed in very general terms, and what it might mean in practice is naturally less well developed.
The other publication is a collection of essays assembled in honor of Jim Gray:

In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized. [The fourth paradigm]

For Gray the first three paradigms are experimental, theoretical, and computational.

We said, “Look, computational science is a third leg.” Originally, there was just experimental science, and then there was theoretical science, with Kepler’s Laws, Newton’s Laws of Motion, Maxwell’s equations, and so on. Then, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are “looking” through large-scale, complex instruments which relay data to datacenters, and only then do they look at the information on their computers.

The world of science has changed, and there is no question about this. The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration [1]. [Jim Gray on escience – PDF.]

The collection is divided into these sections: Earth and environment; Health and wellbeing; Scientific infrastructure; Scholarly communications. There are also opening and concluding sections. The contributions are readable, taking the form of short essays rather than research papers. There is a contribution by Cliff Lynch on the changing scholarly record, by Timo Hannay on the impact of the network on the structure of science, and by Herbert Van de Sompel and Carl Lagoze on the enhancement of the scholarly record with actionable structure.
There is no specific contribution on libraries, and it is interesting to note that most of the occasional mentions of libraries point toward network-level digital libraries.
It is important for libraries to understand these changes. The reshaping impact of the network on learning and research behaviors is a more important factor for libraries than the direct impact of the network on library processes themselves.






Responsible for Membership and Research at OCLC, serving libraries around the world.