Metadata ...


Günter has a nice entry on metadata and explores correspondences across the GLAM sectors – libraries, archives and museums. He notes a characteristic content type in each domain: bibliographic, archival, and material culture, respectively. He then compares the metadata stack for each type of material, using a useful typology: data structure (e.g. MARC), data content (e.g. AACR2), data format (e.g. ISO 2709) and data exchange (e.g. OAI). Check it out for a fuller enumeration of acronyms. Of course, one can add other acronyms along various dimensions …
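Günter's typology can be made concrete for the bibliographic case. The layer names and examples below are the ones from his entry as summarized above; rendering them as a Python mapping is purely illustrative:

```python
# Günter's four-layer metadata typology, illustrated for the
# bibliographic (library) domain. The layer names and examples
# are those discussed above; the dict is just a sketch.
BIBLIOGRAPHIC_STACK = {
    "data structure": "MARC",      # how fields are organized
    "data content":   "AACR2",     # rules for what goes in the fields
    "data format":    "ISO 2709",  # serialization for storage/transfer
    "data exchange":  "OAI",       # protocol for moving records around
}

for layer, example in BIBLIOGRAPHIC_STACK.items():
    print(f"{layer}: {example}")
```

The same four layers could be populated differently for the archival or material-culture stacks.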
Reading the entry prompted several thoughts, largely from a library perspective:

  • Conceptual models. The library community has FRBR; the museum community has the CIDOC Conceptual Reference Model. Each attempts to identify and define the concepts important to a domain and, importantly, the relationships between them: they aim to provide a model of the world of interest, which in turn provides a basis for the design of metadata approaches. Of course, although I say ‘world’, some things in that world are not included. FRBR, for example, identifies some of the concepts and relationships of interest, and not others. Other models have been developed in more specific areas. A couple that are influenced by FRBR are Michael Heaney’s work on collections, and, more recently, Andy Powell and Julie Allinson’s work on the model underlying the E-prints application profile.

    This work uses a combination of FRBR and the DCMI Abstract Model to create a description set for an eprint that is much richer than the traditional flat descriptions normally associated with Dublin Core. The intention is to capture some of the relationships between works, expressions, manifestations, copies and agents. [eFoundations: DC-2006 Special session – ePrints Application Profile]

    INDECS and the work built on it occupy a similar space in the rights world.

  • Abstract model. The Dublin Core Abstract Model is a data model, whose purpose “is to provide a reference model against which particular DC encoding guidelines can be compared, in order to facilitate better mappings and translations between different syntaxes”. More broadly, its supporters see it as having application beyond DC, potentially providing a consistent framework for how one groups properties about resources. In a way, it shifts emphasis from particular fixed ‘data structures’ in the typology above towards constructs like application profiles.
  • The data structures mentioned by Günter, and other data structures, will typically designate some elements whose values are taken from controlled lists or vocabularies. We are used to thinking about controlled vocabularies for people (e.g. authority files), places (e.g. gazetteers) and things (e.g. subject schemes like LCSH, MeSH, and so on). This is clearly an area of strong shared interest for libraries, archives and museums, even if approaches have diverged. There are other controlled lists. For example, Thom talks about MARC relator terms and codes, where the redundancy he discusses would seem to limit the usefulness of the controlled approach. This is a pity, as relationships between entities are probably among the most useful things that we can record about them, especially as we try to improve navigation, clustering and retrieval in large bibliographic systems. We have lists for languages, countries, and so on. Onix has codelists; indeed, its approach is to ‘control’ a large part of the data. An advantage of control is predictability, which simplifies design and processing. A more permissive or discretionary approach may appear attractive to some, but ultimately may make data less useful and applications harder to build.
  • In the library community, the ISO 2709/MARC/AACR stack is in widespread use but is not universal.
  • The data structure (MARC), the data content rules (AACR/RDA), and the conceptual model (FRBR) are managed through different structures and on different schedules. One might argue that while they are conceptually distinct, in practice they are closely linked and mutually interdependent.
  • At the data structure level, a library may have some interest in MARC, various flavors of Dublin Core, MODS, EAD, and potentially IEEE LOM and Onix. Given the variety of levels at which this data can diverge, issues of transformation are complex.
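The FRBR group-1 entities mentioned above – works, expressions, manifestations and copies – can be sketched in code. This is a minimal illustration of the entity hierarchy, not an implementation of the model; the class names follow FRBR, but the attributes and the sample values (titles, barcodes) are hypothetical:

```python
from dataclasses import dataclass, field

# Minimal sketch of the FRBR Group 1 entities. Each level is
# related to the one above it: an Expression realizes a Work,
# a Manifestation embodies an Expression, an Item exemplifies
# a Manifestation.

@dataclass
class Item:                 # a single exemplar, e.g. one library's copy
    barcode: str

@dataclass
class Manifestation:        # a physical embodiment, e.g. an edition
    publisher: str
    items: list = field(default_factory=list)

@dataclass
class Expression:           # a realization of a Work, e.g. a translation
    language: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:                 # a distinct intellectual creation
    title: str
    expressions: list = field(default_factory=list)

# One Work, realized by one Expression, embodied in one
# Manifestation, exemplified by one Item (all values invented):
work = Work(title="Hamlet")
expr = Expression(language="en")
manif = Manifestation(publisher="hypothetical-publisher")
copy1 = Item(barcode="hypothetical-0001")

work.expressions.append(expr)
expr.manifestations.append(manif)
manif.items.append(copy1)
```

Richer description sets of the kind the E-prints application profile aims at would also record agents and the relationships among all of these entities, rather than the flat single-level descriptions traditionally associated with Dublin Core.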

One could go on. Does this all seem a little too complex in our fast-moving world?
I hope that the Advisory Committee on the Future of Bibliographic Control, established by the Library of Congress, considers some of these issues. (Disclosure: I am an at-large representative on the Committee.)
Note: I have benefited from some discussion with colleagues on these matters and am certainly interested in more general views about the ‘future of bibliographic control’.