Mass digitization

In preparation for a recent trip to France, I read, in English, Jean-Noël Jeanneney’s critique of Google’s digitization of library books, Google and the myth of universal knowledge: a view from Europe (‘available’ here, on, er, Google Book Search). Whatever one’s response to his argument, it is a reminder of how sparse the public policy discussion in libraries has been about the framework and direction of current digitization efforts. That holds whether one welcomes these efforts as pragmatic initiatives to broaden access or is concerned about private control of the intellectual record. A couple of related items have come over my horizon in the last few days.
Here is David Bearman discussing the book in the current D-Lib Magazine:

In the glare of publicity surrounding Google Book Search and other mass digitization projects focused on print culture, we should not lose sight of the small proportion of culture that publication represents, the problems of ceding its control to a private firm, Google’s unfortunately incendiary approach to intellectual property, the poor quality of the digital capture we have seen to date, the limits of search and presentation as performed in this one service and the restriction that Google applies to other potential value-added uses, or the significant problem of cultural bias exacerbated by Google’s advertising business model. Ian Wilson calls our attention to five principles enumerated by national librarians of la francophonie meeting in Paris on February 28, 2006: free access to publicly owned resources; non-exclusive agreements with content providers; capture of preservation standard images with assurances for long-term accessibility; protection of the integrity of original source materials; and provision of multi-lingual, multi-cultural access [17]. Jean-Noël Jeanneney has done us all a service by reminding us to look under the hood and hold Google, and those providing content to it, accountable. In the two years since Google first announced its ambitions, I think the D-Lib community has largely given Google the benefit of the doubt; now that some results are visible and the implications are more clear, I think it’s time to publicly endorse open access to rights-cleared, high quality, scanned page images and reconsider the appropriate roles for academic and public institutions participating in commercial analogue heritage conversion efforts that don’t contribute to this end. [David Bearman. Jean-Noël Jeanneney’s Critique of Google: Private Sector Book Digitization and Digital Library Policy]

Here is Tim O’Reilly:

Three things ought to happen to speed up the development of the book search ecosystem:

Book search engines ought to search publishers’ content repositories, rather than trying to create their own repository for works that are already in electronic format. Search engines should be switchboards, not repositories.
Publishers need to stop pretending that “opt in” will capture more than a tiny fraction of the available works. (I estimated that only 4% of books ever published are being commercially exploited.)
Book search engines that are scanning out of print works in order to create a search index ought to open their archives to their competitors’ crawlers, so readers can enjoy a single integrated book search experience. (Don’t fight the internet!)

[Tim O’Reilly. Book search should work like web search]

lorcan dempsey dot net