Configuration of data curation infrastructure


Further to my remarks the other day about Canadian research data curation, I was interested to read the following from Chris Rusbridge of the Digital Curation Centre.

It is clear that a national research data infrastructure is needed, but there are problems with all of the approaches taken so far to address this. Subject data centres provide subject domain curation expertise, but there are scalability issues across the domain spectrum: it appears unlikely that research funders will extend their funding to a much larger set of these data centres (indeed the AHDS experience might suggest a concern to cut back). Institutional data repositories are being explored, but while disclosing institutional data outputs might provide sustainability incentives, and such data repositories might be managed at a storage level by developments from existing institutional library/archive and IT support services, it is difficult to see how domain expertise can be brought to bear from so many domains across so many disciplines. Meanwhile, various of the studies done by UKOLN/DCC with Southampton University suggest the value of laboratory or project repositories in assisting with curation in a more localised context. [A national research data infrastructure?]

Chris wonders whether one approach would be to unbundle the process, localising data storage with the institution, department, laboratory or other organizational unit, and specialising data curation activities in a national apparatus which assembles relevant domain or subject expertise. Such expertise, he suggests, involves “aspects of appraisal, selection, retention, transformation, combination, description, annotation, quality etc”.
I assume, although Chris does not explicitly state it, that in this model the unit responsible for data storage assures continuity of access, but that the storage itself might be physically sourced in various ways (from a supercomputer center, for example).
This is an interesting arrangement, because it reverses what one might think to be a natural division of labor, where expertise is localised and more generic infrastructure externalised to specialist providers.
Chris suggests that national provision may not extend across the full range of subjects, but also notes that local curation expertise does not scale well given the range of institutional disciplinary interests.
I don’t pretend to know how this type of infrastructure will eventually be provided, and at this stage it is reasonable to be working through various options. The discussion in the recent interim report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access, of which Chris is a member, has some useful background material in this context.
What I found especially interesting about Chris’s post, though, was the assumption that approaches would have to be multi-scalar in organizational and resourcing terms. (I realise that in the passage above Chris is talking about how well expertise can scale, rather than about the organizational scale I have in mind here.)
We are increasingly aware that the individual institution is not the best level for many activities. Discovery, for example, is increasingly web-scale in practice, as search engines and other large aggregations support search and discovery of materials. In some cases, national-scale activity is plausible, where there is a public funding context for it. Chris’s own organization, for example, is part of a national apparatus of support for research and learning, provided by JISC and other organizations. This can give discussions in the UK a more strongly national flavor than they might have in, say, the US. In a related context, HathiTrust is an example of infrastructure collaboratively sourced within a consortial arrangement.
Of course, multi-scalar planning can get complicated …

