Several colleagues met with Norbert Lossau of Bielefeld University this morning. Norbert is on a tour of US universities and Jay arranged for him to visit OCLC. Bielefeld is working with FAST — the text search company — to create library gateway services.
One of the interesting things they are exploring with the FAST developers is a federated index architecture, where indexes are integrated from geographically distributed servers.
This reminded me of CIP — Common Indexing Protocol — an approach developed within the Whois++ community for sharing index information, or centroids.
Ralph was looking at centroids a while ago, although we have not yet pursued this:
A centroid can be thought of as a simple inverted index mechanism that can be shared amongst servers in a network environment in order to provide hints as to the location of data in a large, loosely coupled distributed database. A centroid is used by a server or user client to provide it with hints as to which other servers might contain information that is relevant to a user’s search. These hints are known as “forward knowledge”.  [Centroids-based Collection Analysis [OCLC – Projects]]
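To make the idea concrete, here is a minimal sketch of how a server might boil its records down to a centroid: for each attribute, the set of unique tokens that appear anywhere in its records. The record fields and the `build_centroid` helper are my own illustrative assumptions, not the Whois++ or CIP wire format.

```python
from collections import defaultdict


def build_centroid(records):
    """Collapse records into a mapping of attribute -> set of unique tokens.

    Hypothetical helper: tokenisation is plain lowercase whitespace
    splitting, chosen only to keep the sketch short.
    """
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            for token in value.lower().split():
                centroid[attribute].add(token)
    return dict(centroid)


# Made-up sample records for one server.
records = [
    {"title": "Distributed indexing", "author": "Smith"},
    {"title": "Indexing protocols", "author": "Jones"},
]

centroid = build_centroid(records)
print(centroid)
```

The centroid says only that a token occurs somewhere on the server, not in which record or how often, which is what makes it small enough to pass around as a hint.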
It did occur to me that it would be interesting to explore sharing centroids over OAI.
Google does not turn up very much about CIP. It was never widely deployed and seems to have faded away. Interestingly, Ralph’s reference above points to a copy of a project paper on the Internet Archive; the original is no longer online. The project in question, ROADS, used Whois++ to federate Internet subject gateways. A central index server was used to route queries to relevant gateways.
The Imesh Toolkit project has an overview of CIP:
The role of the Common Indexing Protocol or CIP is to pass information about the contents of a record between servers and so facilitate access by clients to the data they seek at a later point. This process of referring or replicating queries is known as query routing and is designed to reduce server overload. The latter is a frequent consequence in a system which merely broadcasts searches across a distributed network without a mechanism to direct searching in any fashion.  [IMesh Toolkit – Technology Review – Protocols]
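As a rough illustration of query routing, the sketch below has an index server hold a token set for each gateway and forward a query only to gateways whose tokens cover every query term, rather than broadcasting to all of them. The gateway names, the flat token sets, and the `route_query` function are hypothetical; real CIP index objects are richer than this.

```python
def route_query(query, centroids):
    """Return the names of gateways whose token set contains every query term.

    Hypothetical routing rule: a gateway is a candidate only if all
    lowercase query terms appear in its shared token set.
    """
    terms = set(query.lower().split())
    return sorted(name for name, tokens in centroids.items() if terms <= tokens)


# Made-up token sets held by a central index server.
centroids = {
    "gateway-a": {"medicine", "nursing", "health"},
    "gateway-b": {"engineering", "materials"},
    "gateway-c": {"health", "policy", "economics"},
}

targets = route_query("health policy", centroids)
print(targets)
```

A match is only a hint: the tokens may sit in different records on that gateway, so the routed query can still come back empty. What the index server saves is the cost of asking gateways that cannot possibly answer.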