Articles on structured data: matching, mining and mixing


The current issue of Library Resources and Technical Services (not on the web) has two interesting articles that touch on the complications of processing inconsistent data.
Creating organization name authority within an ERM system
Kristen Blake and Jackquie Samples
LRTS 53(2) April 2009 p 94-107
This article looks at the consistency of organization names within electronic resource management (ERM) systems. It opens with a general discussion of structured data in ERM systems and then focuses specifically on organization names. Some OCLC initiatives are discussed as part of the environmental analysis, and several people managing ERM functions in libraries are interviewed. Work on this issue at NCSU is then described. The authors conclude that such work provides local benefits, and they point to the advantages of greater community discussion and agreement.
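To make the matching problem concrete, here is a minimal Python sketch of how variant forms of an organization name might be grouped under a shared match key. It is not the method described in the article: the function names, the suffix list and the sample names are all invented for illustration, and a real authority file would need far more careful rules.

```python
import re
from collections import defaultdict

# Hypothetical list of corporate suffixes to ignore when matching.
SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc",
            "co", "corp", "corporation", "company"}

def match_key(name):
    """Reduce a name to a crude key: lowercase, no punctuation, no suffixes."""
    s = name.lower().replace("&", " and ")
    s = re.sub(r"[^\w\s]", "", s)  # drop punctuation, so "Inc." becomes "inc"
    words = [w for w in s.split() if w not in SUFFIXES and w != "the"]
    return " ".join(words)

def group_variants(names):
    """Group the original name strings by their match key."""
    groups = defaultdict(list)
    for n in names:
        groups[match_key(n)].append(n)
    return dict(groups)

# Invented sample data: five strings that describe two organizations.
names = [
    "Example Publishing Co.",
    "Example Publishing Company",
    "The Example Publishing Co., Inc.",
    "Sample & Sons Ltd",
    "Sample and Sons Limited",
]

for key, variants in group_variants(names).items():
    print(key, "<-", variants)
```

Even this toy version shows the trade-off: every rule that merges true variants (dropping "Co.", say) can also merge organizations that really are distinct, which is one reason agreed practice upstream is cheaper than cleanup downstream.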
Automated metadata harvesting: low-barrier MARC record generation from OAI-PMH repository stores using MarcEdit
Terry Reese
LRTS 53(2) April 2009 p 121-134
This article outlines an environment in which a library wants metadata for digital resources from multiple sources, often available via OAI-PMH as organizations expose data about their digital collections. A particular requirement is explored: converting these multiple streams into MARC for internal library use. This involves crosswalking between MARC and metadata from different creation regimes. A particular issue here is the absence of widely deployed content standards, which means that inconsistent approaches to data creation have to be managed. OCLC services are also considered as part of the environmental analysis. The article then considers how MarcEdit, a tool created by the author, can be used to address the requirements described.
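As a rough illustration of what such a crosswalk involves, the following Python sketch maps one unqualified Dublin Core record, of the kind carried in an OAI-PMH oai_dc payload, to MARC-style tag and subfield pairs. It is not MarcEdit's implementation: the mapping table is an assumption loosely modelled on the Library of Congress Dublin Core to MARC crosswalk, the sample record is invented, and the output is a simple list of tuples rather than real MARC.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative mapping: Dublin Core element -> (MARC tag, subfield code).
# Assumed for this sketch; a production crosswalk would be more detailed.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
    "rights": ("540", "a"),
}

# Invented sample record with typical inconsistencies:
# stray whitespace, an empty element, and an element with no mapping.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>  Campus views,   1910 </dc:title>
  <dc:creator>Unknown photographer</dc:creator>
  <dc:subject>Universities</dc:subject>
  <dc:subject></dc:subject>
  <dc:date>1910</dc:date>
  <dc:identifier>http://example.org/item/1</dc:identifier>
  <dc:coverage>Ohio</dc:coverage>
</oai_dc:dc>"""

def crosswalk(xml_text):
    """Return (fields, unmapped): MARC-style tuples and skipped element names."""
    root = ET.fromstring(xml_text)
    fields, unmapped = [], []
    for el in root:
        if not el.tag.startswith("{" + DC_NS + "}"):
            continue  # ignore anything outside the Dublin Core namespace
        name = el.tag.split("}", 1)[1]
        value = " ".join((el.text or "").split())  # collapse whitespace
        if not value:
            continue  # skip empty elements
        if name in DC_TO_MARC:
            tag, code = DC_TO_MARC[name]
            fields.append((tag, code, value))
        else:
            unmapped.append(name)  # report rather than silently drop
    fields.sort(key=lambda f: f[0])  # order by MARC tag
    return fields, unmapped

fields, unmapped = crosswalk(SAMPLE)
for tag, code, value in fields:
    print(f"{tag} ${code} {value}")
print("unmapped:", unmapped)
```

The table lookup is the easy part. Without shared content standards, the value inside dc:date or dc:creator can take almost any form, so most of the real effort goes into cleaning values rather than moving them between fields.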
Each of these articles discusses a topic that is the subject of active OCLC interest and development attention. What prompted me to connect them, though, was less this than their discussion of where and why consistency of data is important, and of the complications for processing that arise when it is absent.
Library standards-making has often seemed to emerge from social consensus-making and a kind of anticipatory refinement intended to cover a wide range of cases. As libraries exercise data in a variety of applications, as more data flows between systems, and as matching, mining and mixing become important in supporting services, more evidence becomes available about where structure and consistency matter and what trade-offs are involved (between upstream flexibility in data creation practices and downstream processing and use, for example). This evidence should improve the ways in which we think about standardisation and best practices.






Responsible for Membership and Research at OCLC, serving libraries around the world.