Federated search that doesn't work very well

I have written quite a bit in these pages about metasearch (see list of related entries below). In some recent presentations I have also suggested that metasearch is not a long-term solution to library or user needs. This has sometimes prompted questions or consternation, as libraries are investing considerably in metasearch approaches. It is a difficult question, as the motivation for metasearch is clear, and alternatives are not readily available. One would like to organize resources by user interest, rather than by database boundaries influenced by historic library practice or vendor offerings. One would also like to mitigate the negative impact of fragmentation and interface differences on use.

In this context I was interested to read the post by Terry Reese describing their experiences at Oregon State University. He talks about the limitations of the federated search model. Here is part of what he says:

Why broken? Well, within the federated search model, there are just to many unknown variables that the system literately has no control. Start with query normalization. Targets will interpret user queries differently across the board meaning that a query at on target will search very different from a query at another. This leads to some difficulty as any tool created then must either code specifically for each target or provide a generalized search that normalizes to the most common target format. Likewise, as seen with our ILS, sometimes targets fail. And when they fail, there is really nothing that the system can do but report the error to the user. [Terry’s Worklog]
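To make the two problems in that passage concrete, here is a minimal sketch in Python of a federated search loop. It is not LibraryFind code. The `Target` class, the three targets and their query syntaxes are all hypothetical, invented for illustration. Each target needs its own query translation, and when a target fails the only option is to record the error and report it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Target:
    """A hypothetical remote search target (names and syntaxes are made up)."""
    name: str
    translate: Callable[[str], str]      # rewrites the user query into this target's syntax
    search: Callable[[str], list]        # sends the translated query; may raise on failure


def federated_search(query, targets):
    """Send one user query to every target, translating it for each one.

    Returns (results, errors): hits keyed by target name, and an error
    message for every target that failed.
    """
    results, errors = {}, {}
    for target in targets:
        try:
            results[target.name] = target.search(target.translate(query))
        except Exception as exc:
            # Nothing more can be done here than report the failure to the user.
            errors[target.name] = str(exc)
    return results, errors


def failing_ils(translated_query):
    # Stand-in for a target that is down.
    raise TimeoutError("ILS did not respond")


targets = [
    Target("catalog", lambda q: f'title="{q}"', lambda q: [f"catalog hit for {q}"]),
    Target("articles", lambda q: " AND ".join(q.split()), lambda q: [f"article hit for {q}"]),
    Target("ils", lambda q: q, failing_ils),
]

results, errors = federated_search("open access", targets)
print(results)
print(errors)
```

The same two words reach the catalog as a phrase search on title and reach the article database as a Boolean AND, so the two result sets are not really answers to the same question. The failed target simply shows up in `errors`.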

One might add that it is difficult to add value on top of a distributed resource like this, for example by manipulating results or doing collaborative filtering. He goes on to suggest how to address the problem.

So how do we fix the model? This is why LibraryFind is as much a harvester/indexer as it is a federated search tool. Our underlying belief while developing this program is that we want to harvest and index as much data as possible — so the tool is setup that way so that as vendors become more comfortable allowing their data to be harvested — we can take advantage of it. Of course, this can bring its own set of issues to the forefront — but I would gladly deal with database/indexing related issues over the current federated searching issues. [Terry’s Worklog]
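As a contrast with the federated model, the harvest-and-index approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how LibraryFind works. The `LocalIndex` class, the source names and the sample records are hypothetical, and a real system would use a proper search engine instead of this toy inverted index.

```python
from collections import defaultdict


class LocalIndex:
    """A toy inverted index over records harvested from several sources."""

    def __init__(self):
        self.records = {}                    # record key -> title
        self.postings = defaultdict(set)     # term -> set of record keys

    def harvest(self, source, records):
        """Copy (id, title) pairs from a source into the local index."""
        for rec_id, title in records:
            key = f"{source}:{rec_id}"
            self.records[key] = title
            for term in title.lower().split():
                self.postings[term].add(key)

    def search(self, query):
        """Return keys of records containing every query term, sorted."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = LocalIndex()
index.harvest("catalog", [("1", "Open access publishing"), ("2", "Library metadata")])
index.harvest("articles", [("a9", "Open access and metadata")])

print(index.search("open access"))
print(index.search("metadata"))
```

Because all records sit in one index, a query is interpreted once and in one way, and a source being offline at search time does not matter. The cost moves to harvesting and keeping the index current, which is the trade-off described in the quotation above.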

I agree that we will see more pooling of data in this way. Of course, this might happen at various levels – locally or by some third party. If it is done by a third party, this raises interesting questions about what interfaces (machine and human) should be provided to give libraries the flexibility to partition, manipulate and present the resource as they think best.

Related entries:

Generative AI and libraries: seven contexts

The technology career ladder

Presentation: Two Metadata Directions