Herbert Van de Sompel and colleagues at LANL have been writing about the aDORe architecture for a while (see [pdf] for example). They have now released software to implement an aDORe archive.
The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a protocol-based manner. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file named an XMLtape. The creation of indexes for both the identifier and the creation datetime of the XML-based representation of the Digital Objects, facilitates OAI-PMH-based access. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects in a concatenated manner. An index for the identifier of the datastream facilitates OpenURL-based access. The interconnection between an XMLtape and its associated ARC file(s) is provided by conveying the identifiers of these ARC files as administrative information in the XMLtape, and by including OpenURL references to constituent datastreams of a Digital Object in the XML-based representation of that Digital Object stored in the XMLtape. [aDORe Archive – Overview]
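To make the XMLtape idea a little more concrete, here is a toy sketch of concatenating XML records into one well-formed file while keeping an identifier and datetime index alongside it. It is not the aDORe file format or code: the `<tape>` wrapper element, the function names, and the in-memory dictionary index are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def write_tape(records):
    """Concatenate (identifier, datestamp, xml_fragment) tuples into one XML document.

    Returns the tape as bytes plus an index mapping each identifier to
    (byte offset, byte length, datestamp). The <tape> wrapper is hypothetical.
    """
    buf = io.BytesIO()
    index = {}
    buf.write(b"<tape>\n")
    for identifier, datestamp, fragment in records:
        data = fragment.encode("utf-8")
        offset = buf.tell()  # where this record starts in the tape
        buf.write(data)
        buf.write(b"\n")
        index[identifier] = (offset, len(data), datestamp)
    buf.write(b"</tape>\n")
    return buf.getvalue(), index


def read_record(tape, index, identifier):
    """Fetch one record by identifier using the byte-offset index, without parsing the tape."""
    offset, length, _ = index[identifier]
    return tape[offset:offset + length].decode("utf-8")


def identifiers_since(index, from_datestamp):
    """List identifiers whose datestamp is on or after from_datestamp.

    Assumes all datestamps are UTC ISO 8601 strings in the same format,
    so plain string comparison orders them correctly.
    """
    return sorted(i for i, (_, _, d) in index.items() if d >= from_datestamp)


def count_records(tape):
    """Parse the whole tape to confirm it is well-formed and count its records."""
    return len(ET.fromstring(tape))


tape, index = write_tape([
    ("info:example/1", "2005-01-01T00:00:00Z", "<record id='1'/>"),
    ("info:example/2", "2005-06-01T00:00:00Z", "<record id='2'/>"),
])
```

The identifier lookup is the kind of access a "get this record" request needs, and the datestamp filter is the kind a date-ranged harvest needs, which is presumably why those are the two things being indexed.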
I am pleased to see some of our Open Source Software as part of the included third-party libraries (in particular for OpenURL and OAI support – see here, and here and here).
We still have no agreed protocol-based ‘interface layer’ for repositories. This has two parts: what are the core services one would like to support, and what approach is best for each service. At present, each repository has its own way of interacting with users and user applications. The issue is complicated because, increasingly, we want such interface layers to work across domains. Think, for example, of a campus environment which has an institutional repository and a learning object repository, or various departmental repositories (of drawings, of slides, of data-sets, whatever). We want consistent ways of interacting with at least some of these. Think of a simple scenario. Many people create documents within Microsoft applications. It would be nice to be able to build a simple application on top of, say, the Research Pane (or whatever succeeds it in upcoming versions of Windows) which communicates with one or more repositories, so that deposit, retrieval and so on become part of the user’s routine desktop workflow.
We have candidate protocols for common interfaces:
get: OpenURL
put: SRU/W update
harvest: OAI-PMH
search: SRU/W
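As a rough illustration of what a thin client over three of these might look like, the sketch below builds request URLs for get, harvest and search. The base URLs and function names are made up; the parameter names are the standard ones as I understand them (OpenURL 1.0 key/encoded-value form, OAI-PMH `ListRecords`, SRU `searchRetrieve`). Put is left out, since a record update would carry an XML payload rather than a simple query string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def openurl_get(base, identifier):
    """get: an OpenURL 1.0 (Z39.88-2004) request for the referent with this identifier."""
    return base + "?" + urlencode({"url_ver": "Z39.88-2004", "rft_id": identifier})


def oai_harvest(base, metadata_prefix="oai_dc", from_date=None):
    """harvest: an OAI-PMH ListRecords request, optionally limited by a from date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base + "?" + urlencode(params)


def sru_search(base, cql_query, maximum_records=10):
    """search: an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)


def query_params(url):
    """Decode a request URL back into its parameters, for inspection."""
    return parse_qs(urlsplit(url).query)


# Hypothetical endpoints on a single repository host.
get_url = openurl_get("http://repo.example.org/openurl", "info:example/1")
harvest_url = oai_harvest("http://repo.example.org/oai", from_date="2005-01-01")
search_url = sru_search("http://repo.example.org/sru", "dc.title = repository")
```

A desktop add-in of the kind described above would need little more than this to talk to any repository that exposes the same three endpoints, which is the attraction of agreeing on them.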
An interesting question then becomes how persuasive these approaches are within our community, and maybe more importantly, outside it, to others who are building repositories and applications which interact with them.