Metasearch, Google and the rest


How quickly things can change! Last year there were discussions about the Google-busting potential of metasearch. How naive. This year there are discussions about the metasearch-busting potential of Google Scholar. Let us wait and see.
Clearly there are various issues with metasearch: the variety of data and interfaces that have to be managed means that it will always be a difficult process. It is also difficult to build out services on top of a federated resource. (I write briefly about ‘portals’ here, and about library search here.)
But to think about the question in terms of metasearch and Google obscures a potentially more interesting longer-term question. This is a question about consolidation: at what level does it make most sense for resources to be aggregated for more effective use?
Think of two poles: the fractured resource available to a library user, and Google.
Libraries struggle because they manage a resource which is fragmented and ‘off-web’. It is fragmented by user interface, by title, by subject division, by vocabulary. It is a resource very much organised by publisher interest rather than by user need, and the user may struggle to know which databases are of potential value. By off-web, I mean that a resource hides its content behind its user interface and is not available to open web approaches. Increasingly, to be on-web is to be discoverable through Google or other open web services.
These factors mean that library resources exercise a weak gravitational pull. They impose high transaction costs on a potential user. They also make it difficult to build services on top of an integrated resource that would make it more interesting to users than a collection of databases.
A couple of recent examples emphasised for me the issues that fragmentation raises. First, consider the following statement from the Koninklijke Bibliotheek (KB) report cited below:

It is recommended to index all metadata in a single index, and use as few different databases as possible for storage. There are hardly any databases or collections for which the use of a specific database package is justified. When there is a choice between indexing distributed databases in a central index or performing federated searching in distributed databases, it is best to choose the central indexing. There are several reasons for this, but it should be sufficient to compare Google as a central index with a theoretical Google that would distribute every user search to all websites all over the world. A combination with federated searching remains needed for databases that do not allow harvesting into a central index or for focussing a search into a specific area. [Renewing the Information Infrastructure of the Koninklijke Bibliotheek]
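The KB's comparison can be made concrete with a toy sketch. Everything in it is hypothetical: three in-memory lists stand in for separate databases, a call counter stands in for a round trip to a remote system, and simple word matching stands in for real retrieval and ranking. It is not how any particular metasearch product or the KB's own system works; it only shows where the cost falls in each model.

```python
# Hypothetical toy data: three "databases", each a list of record titles.
SOURCES = {
    "db_a": ["Metasearch in libraries", "Federated search protocols"],
    "db_b": ["Google Scholar and citation", "Library portals"],
    "db_c": ["Harvesting metadata", "Central indexing"],
}


def federated_search(term, sources):
    """Send the query to every source at search time and merge the results."""
    hits, calls = [], 0
    for name, records in sources.items():
        calls += 1  # stands in for one remote round trip per source, per query
        hits.extend((name, r) for r in records if term.lower() in r.lower().split())
    return hits, calls


def build_central_index(sources):
    """Harvest every source once, ahead of time, into a single inverted index."""
    index = {}
    for name, records in sources.items():
        for r in records:
            # set() so a record is indexed once per distinct word
            for word in set(r.lower().split()):
                index.setdefault(word, []).append((name, r))
    return index


def central_search(term, index):
    """Answer the query from the one local index: a single lookup."""
    return list(index.get(term.lower(), [])), 1


index = build_central_index(SOURCES)
print(federated_search("search", SOURCES))
print(central_search("search", index))
```

Both approaches return the same hits here. The difference is that the federated version pays one call per source on every query, while the central index pays its harvesting cost once, up front, and leaves a single consolidated structure on which further services can be built.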

Second, I recently visited the Research Library at Los Alamos National Laboratory, where they have a tradition of loading data locally where possible. [pdf – scroll down to page 6.] This is partly because of some particularities of their environment, but also because it is much easier to build services on top of a consolidated resource than on top of a federated one. And the LANL Research Library has indeed created a very impressive set of recommender and other personalised services for its users, much richer than those of most other libraries. These services add significant value to the underlying collection of data, in large part because the library holds the data in-house in consolidated form.
The other pole is the centralised index of Google, with an array of much-discussed advantages and a stated aim of consolidating all interesting data.
So, metasearch is one response to fragmentation, albeit one of limited effectiveness. Another approach is to consolidate data resources into larger reservoirs. This has the advantage of reducing the burden of integration and enhancing the ability to create value-added services. But how, and at what level, could this be done? What are the sensible and possible consolidations between the universal Google and the current debilitating fragmentation?
We have some existing consolidations: WorldCat for library materials, books especially; CrossRef for journal articles; ArtStor aspires to provide the benefits of consolidation for art images. I expect that over the next while we will see some more.


