Data flows in the book world

One of the recommendations of the Library of Congress Working Group on the Future of Bibliographic Control was that ways should be found of harnessing publisher data upstream of the cataloging process. The rationale was that this would make data about materials available earlier and reduce overall creation effort.
OCLC recently organized an invitational symposium which had this issue as a central topic. The report is an interesting set of notes from the different perspectives of the multiple players involved. It discusses current practices and incentives to do things differently.
In a follow-on activity to the LC report, R2 Consulting are mapping the flow of MARC records in North America. The symposium notes say: “This list of distributors is much larger than originally anticipated and consists of a very diverse group of entities.”
And, as I discussed the other day, the Research Information Network has published a report about UK practices, Creating catalogues: bibliographic records in a networked world [Splash page; pdf], which also recommends greater re-use of records across the publishing and library worlds.
So, there certainly seems to be a convergence of interest here. Indeed, the potential benefits of such sharing have been a topic of discussion for many years. For example, at the OCLC Symposium, Brian Green, Executive Director of the International ISBN Agency, and I reminisced about UK initiatives to which we had been party almost, gulp, twenty years ago, which tried to create the conditions for an ‘all-through’ system of bibliographic record exchange between the various players in the book world.
Now, clearly quite a lot has happened and, as R2 reported above, data flows through many parties. Publisher data does flow into CIP, and into various organizations which support libraries. Amazon has done much to underline the importance to publishers of having book metadata to support a variety of operations. That said, the renewed emphasis on publisher-library data flow, certainly from the library side, suggests that much more might be done.
Why has more not happened to promote the flow of metadata through the system, from publishers to libraries? Three things occur …
First, there is the mechanical issue of data exchange. ONIX has now emerged as a shared approach to disseminating publisher data. However, the remarks about ONIX in the report of the OCLC Symposium make interesting reading. NetLibrary reports that 10% of publishers supply data in ONIX, representing 50% of the supplied content. NLM also reported that 10% of publishers supply ONIX, but that these account for 80% of materials catalogued at NLM. There were also lots of comments about the consistency of ONIX data. Even so, one would expect improved technical apparatus to support data flow, not to create the need for it.
This prompts the second question: what incentives exist and are they aligned across the system? Historically, metadata may have been created for different purposes. Publishers had an interest in the supply chain, and libraries an interest in inventory control. There may be a shared interest in discovery, but it has been approached differently in each area. One library interest stems from a recognition that more descriptive material (table of contents, summary, etc.) is in fact very useful to users of their catalogs and other systems, even though libraries have not historically made it a part of their catalog data. There may also be an interest in getting basic descriptive data earlier, to allow more time to be spent on other parts of record creation. What incentives exist for publishers to make data available to libraries? Amazon, and other agents in the supply chain, provide an incentive to make appropriate metadata available to support discovery and sales. Data is supplied for CIP purposes. Are there additional incentives? One may be to have enriched metadata flow back to publishers. Are there incentives here which are strong enough for a framework to emerge within which there is greater flow?
And third, related to this, and probably most important, is that the incentives on either side have not been strong enough to encourage organizations to develop services in this area which would make the flow a reality.
Of course, the reason that OCLC hosted the Symposium mentioned above is that it is now looking at whether it is sensible to begin providing such services. It is doing this in its ‘next generation cataloging’ program.

OCLC has launched a pilot project to explore upstream metadata capture and enhancement using publisher and vendor ONIX metadata. Pilot partners from the publishing, vendor and library communities are assisting us in this effort. We hope the pilot will result in ongoing processes for the early addition of new title metadata to WorldCat and enhanced quality and consistency in upstream title metadata used by multiple channels. [Next generation cataloging]

Update: In response to a query, see here for more information about how OCLC can work with publishers and here for how OCLC works with book vendors to deliver cataloging data.
Postscript: The conversation with Brian Green prompted me to look up various pieces I wrote at the time which reflected some of the discussion we remembered. (I note that while I have difficulty opening Word files from that time, the RTF file is still readable.)

Publishers and libraries: an all-through system for bibliographic data?
International Cataloguing and Bibliographic Control. 20 (3), July/September, 1991, 37-41.
RTF: https://www.ukoln.ac.uk/services/papers/ukoln/dempsey-1991-01/ubcim.rtf
Users’ requirements of bibliographic records: publishers, booksellers, librarians.
ASLIB Proceedings, 42 (2), February 1990, 61-69. [Worldcat.org]
Bibliographic records: use of data elements in the book world. Bath: Bath University Library, 1989. ISBN 0861970853 [Worldcat.org]

lorcan dempsey dot net