[NOTE: this blog post formed the basis of a subsequent article:
Libraries and the Long Tail: Some Thoughts about Libraries in a Network Age
D-Lib Magazine, Vol. 12, No. 4, April 2006.]
[Warning: long, long, long]
Those discussions of the long tail that I have seen or heard in the library community strike me as somewhat partial. I am not sure that we have absorbed the real relevance of the argument. Much of that discussion is about how libraries in aggregate contain deep and rich collections. However, the real issue is how well supply and demand are articulated in a network environment. And when we think about it in this systemwide way the picture is less reassuring.
Think of two figures. The first is that ILLs account for 1.7% of overall circulations. This goes up to 4.7% if we just look at academic libraries. What this suggests is that we are not doing a very good job of aggregating supply (making it easy to find and obtain materials of interest wherever they are). The flow of materials from one library to another is very low when compared to the overall flow of materials within libraries.
The second is about circulation. We have done some work looking at circulation data in two research libraries across several years. In each case, about 20% of books (we limited the investigation to English books) accounted for about 90% of circulations. What does this say about the aggregation of demand? Materials are not being united with users who might be interested in them. ‘Just-in-case’ collection development policies, at individual institutions, do not lead to optimal systemwide allocation of resources.
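To make the 20%/90% observation concrete, here is a minimal sketch of how such a concentration figure might be computed from per-title circulation counts. The function name and the sample counts are made up for illustration; they are not the data or the method used in the study mentioned above.

```python
def share_of_circulation(circ_counts, top_fraction):
    """Fraction of all circulations accounted for by the most-circulated
    `top_fraction` of titles (hypothetical helper, for illustration)."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(circ_counts, reverse=True)  # busiest titles first
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Made-up circulation counts for ten titles (not real library data).
counts = [45, 45, 3, 2, 2, 1, 1, 1, 0, 0]
print(share_of_circulation(counts, 0.2))  # share taken by the top 20% of titles
```

Run against real circulation files, the same calculation would show how steeply use falls off down the tail of a collection.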
The long tail
First a recap of the long tail argument, which since the publication of the original Wired article has been everywhere.
The argument is about how the Internet changes markets. In the ‘physical world’, the costs of distribution, retail and consumption mean that an item has to generate enough sales to justify its use of scarce shelf, theatre or spectrum space. This leads to a limit on what is available through physical outlets and a corresponding limit on the selection potential of users. At the same time, the demand for a particular product or service is limited by the size of the population to which the physical location is accessible. This scarcity drives behaviors, about which we may have made mistaken assumptions:
For too long we’ve been suffering the tyranny of lowest-common-denominator fare, subjected to brain-dead summer blockbusters and manufactured pop. Why? Economics. Many of our assumptions about popular taste are actually artifacts of poor supply-and-demand matching – a market response to inefficient distribution. [Wired 12.10: The Long Tail]
These inefficiencies are mitigated in a network environment. And, accordingly, so the argument goes, we observe different behaviors with network services:
Unlimited selection is revealing truths about what consumers want and how they want to get it in service after service, from DVDs at Netflix to music videos on Yahoo! Launch to songs in the iTunes Music Store and Rhapsody. People are going deep into the catalog, down the long, long list of available titles, far past what’s available at Blockbuster Video, Tower Records, and Barnes & Noble. And the more they find, the more they like. As they wander further from the beaten path, they discover their taste is not as mainstream as they thought (or as they had been led to believe by marketing, a lack of alternatives, and a hit-driven culture). [Wired 12.10: The Long Tail]
So, Netflix, for example, aggregates supply as discussed here. It makes the long tail available for inspection. However, importantly, it also aggregates demand: a larger pool of potential users is available to inspect any particular item, increasing the chances that it will be borrowed by somebody.
Anderson provided some interesting numbers to show the impact of this phenomenon in his original article and these have been updated on his website. He notes that the major Internet platforms (Amazon, eBay, Google, …) are making major businesses by aggregating the long tail. Google for example services the long tail of advertising – those for whom the bar was too high in earlier times of scarce column inches or broadcast minutes. And by aggregating demand, delivering a large volume of users, they increase the chances of the advert being seen by somebody to whom it is relevant.
Of course, merely being on the web is only a part of the issue. What the web allows is consolidation. Anderson’s examples are massive, consolidated web presences. This consolidation has two aspects, as suggested a moment ago: aggregation of supply and aggregation of demand. Each is important.
Four things come to mind about the aggregation of supply. The first is transaction costs, the costs incurred – whether in attention, money, expertise or some other resource – in achieving one’s goal. High transaction costs inhibit use: they increase the friction in the system; low transaction costs encourage use: they increase the liquidity of the system. An important part of the aggregation of supply is the reduction of transaction costs. So, iTunes for example, has low transaction costs. The burden of discovering tracks of interest, of transacting for their use, of downloading them, is low. They are immediately available. Netflix has higher transaction costs given the delays caused in the mail system, but still works to provide as frictionless a workflow as possible for the user. We can think of two aspects of transaction costs: search costs and fulfilment costs. How difficult is it to discover something, and once it is found, how difficult is it to acquire a service or an object.
The second is the availability of consolidated ‘intentional data’: data about choices made in the use of the system. Netflix, Amazon, Rhapsody: they and others refine their service based on what they know of their users’ intentions, mined directly from the aggregated clickstream. This allows them to develop a service which can reflexively develop based on usage, and which can be tailored around particular behaviors and preferences. Furthermore, additional services can be built by leveraging this resource, recommender services for example. These services potentially reduce transaction costs, because they use aggregate data about behaviors to better target their offering.
The third is about inventory. These large web presences consolidate inventory: they are not encumbered with the costs of massively redundant, just-in-case inventory, scattered through multiple physical locations. This consolidation may happen in virtue of the digital nature of the collections, as with iTunes. Or, where physical inventory is involved, as with Amazon, they can consolidate in strategic locations, or with particular suppliers, as inventory need not be tied to physical store-fronts. They realise their store through the management and presentation of data. And, of course, consolidation of inventory reduces transaction costs.
And finally, the fourth is about navigating the consolidated resource. Google introduced a major innovation with its ranking approach. Amazon is interested in rich interconnection through reviews, wishlists, reader selected lists, the various phrases (capitalized and statistically improbable), and so on. In each case, simple aggregation is not good enough: we are moving towards smart aggregation.
And what about the aggregation of demand? Google, iTunes, Amazon, eBay: the gravitational pull of these resources on the open web means that they have a wide audience. This increases the chances that resources they disclose will rendezvous with interested consumers.
Libraries and the long tail
So, now let’s turn back to libraries, and focus on these two issues: the aggregation of supply, and the aggregation of demand. For convenience of discussion (this is a blog entry, after all), I focus on books, drawing in other resources occasionally. I hope readers can see how the discussion can be extended to cover other parts of collections.
The aggregation of supply in libraries
Libraries have been subject to the same physical constraints as, say, bookstores, albeit within a different service context. The library collection is not limited to the current or the popular: the library has some responsibility to the historical record, to the full range of what has been made available as well as to what is now available. That responsibility may vary by library type, and be variably exercised. The library has exercised that responsibility in two ways: by assembling a local collection and by participating in systems of extra- and inter-library provision. These systems may be organized in different ways; the resource-sharing consortium is a common pattern, and a library may belong to several.
The library collection is driven by local perception of need, within available resource, and collection development activities exist to balance resource and need. A large research library and a busy public library system will have different profiles, but are both influenced by physical constraint. In the material world the transaction costs of access to a distributed library collection are high, so those that could afford it sought to amass large local collections, to aggregate supply locally. See the large just-in-case research library collections, where supply exceeds demand. And, indeed, we are still measuring research library strengths by number of volumes. A busy public library may move towards the bookstore model. I was at a presentation recently about a busy public library system in an affluent suburban area. They turned over 15% of their stock per annum: they want stock to circulate and to keep it fresh for a demanding audience; just as in a bookstore, titles had to justify their occupation of limited shelfspace.
Think about the issues I mentioned above: transaction costs, intentional data, inventory and navigation.
A library user has had a range of discovery tools and services which provide access to a fuller range of scholarly and learning materials. This in turn is supported by a well-developed apparatus of deposit libraries, resource sharing systems, union catalogs, cooperative collection development, document supply, and other collaborative and commercial services. This ‘apparatus’ may be imperfectly and intermittently articulated, but is a significant achievement nonetheless. What an individual library may not be able to supply should be available within the overall system in which libraries participate.
However, this availability is bought at the expense of some complexity. This in turn means that the transaction costs of using the system are high enough that some needs go unrecognized or unmet. A library user may not be familiar with available tools or may not be aware that other materials are available. Thus, historically, one can say that while library services explicitly aim to ‘aggregate the long tail’ within an overall apparatus of provision, inefficiencies of articulation mean users are variably served. To make this more concrete think about the D2D chain: discover, locate, request, deliver. Here lack of integration increases transaction costs. This is integration within each process (there are many discovery options, for example) and between processes (the processes are not always connected in well-seamed ways).
- Discovery. The discovery experience is a fragmented one. A user has a range of discovery tools available and may not always know which is the most suitable. This is especially the case with the journal literature, to which the deployment of metasearch approaches is a partial response. Even for books, users may have to navigate a catalog patchwork to find something they are interested in. What might one do? One approach is consolidation: fewer larger pools of metadata to support discovery would help. Another is ‘syndication’, moving that metadata to where it might more readily rendezvous with the reader. I use syndication as a general term to include such ideas as letting metadata flow into citation managers, search engines and other resources, and to expose it in services which other applications may build on. The latter is familiar to us from Amazon, which can make its data and services available in other interfaces through its APIs.
- Location. Having identified an item of interest, a user needs to find a service that will supply it. This may be as simple as noting a call number and walking to a shelf. Or it may involve a resolution service with the return of several options. Or it may involve a further discovery experience in a library resource, if the item was originally found outside the library. This latter case is especially interesting, as library users have many more discovery options outside the library than within it. What is needed is a way of connecting the discovery experience to a library service. Here COinS provide a potential approach, coupled with various browser tools.
- Request. Another transaction here, which may involve some steps. Simple, as in placing a hold. More complex if a form has to be filled out. And so on. Increasingly, libraries may want to route requests in several directions: allowing a user to buy from Amazon, initiate an ILL request, initiate a document supply request, place a hold.
- Deliver. Again, several potential options, which involve more or less difficulty depending on how presented and the disposition of supplier and user. This ties interestingly to the inventory question I come back to below.
You get the idea: at each stage, there are potentially many processes that need to be connected, and they potentially need to be connected to each other in different combinations. The better connected, the lower the transaction costs.
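As a rough illustration of the COinS idea mentioned under Location: a COinS is an empty HTML span whose title attribute carries an OpenURL ContextObject, which a browser tool can pick up and hand to a library resolver. The sketch below builds one for a book. The function name and the sample values are hypothetical, and only a handful of the possible metadata keys are shown.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, author, isbn):
    """Build a COinS <span> describing a book (illustrative helper only)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),                    # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # key/value book format
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.isbn", isbn),
    ]
    context_object = urlencode(fields)  # percent-encode keys and values
    # The encoded string sits in an HTML attribute, so '&' must become '&amp;'.
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(context_object, quote=True)
    )


# Placeholder values, not a real bibliographic record.
print(coins_span("Example Title", "Doe, Jane", "0000000000"))
```

A page that embeds such a span anywhere on the open web gives a browser extension enough information to offer a link into the reader's own library services.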
Data about use and usage is used to adapt and improve systems. In the library community we have not made very much of this data. Examples of intentional data are holdings data (choices made by libraries), circulation and ILL data (choices made by users), and database usage data (choices made by users). One of the issues that we have is that this data will work better in aggregate: we need better ways of consolidating it to improve service. And then it can be used to refine services, and to create new services as discussed above. Libraries are increasingly interested in using such data to improve services.
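One way to picture a service built on intentional data is a simple 'borrowed together' recommender over aggregated circulation records. What follows is a minimal sketch under stated assumptions: the input shape (a mapping from an anonymized borrower id to the set of items borrowed), the function names and the sample loans are all invented for illustration, and a real service would need far more care.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_borrowing(loans):
    """Count how often each pair of items was borrowed by the same person.

    `loans` maps a borrower id to a set of item ids (assumed shape).
    """
    co = defaultdict(Counter)
    for items in loans.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, n=3):
    """Return up to n items most often borrowed alongside `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]


# Made-up loans: four borrowers, three items.
loans = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"A", "B"},
}
co = build_co_borrowing(loans)
print(recommend(co, "A"))
```

The counts only become useful with volume, which is the point made above: this kind of data works better in aggregate than within a single library.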
The historic library model has been physical distribution of materials to multiple locations so that they can be close to the point of need. In the network environment, of course, this model changes. Resources do not need to be distributed in advance of need; they can be held in consolidated stores, which, even with replication, do not require the physical buildings we now have. As we move forward, and as more materials are available electronically, we will see more interest in managing the print collection in a less costly way. We can see some of this discussion starting in relation to the mass digitization projects and the heightened interest in off-site storage solutions. In each case, there is a growing interest in being able to make investment choices which maximize impact – based, for example, on a better understanding of what is rare or common within the system as a whole, on what levels of use are made of materials, and so on. In fact, again looking forward some time, it would be good to have management support systems in place which make recommendations for moving to storage or digitization based on patterns of use, distribution across libraries, and an agreed policy framework.
There are two medium-term questions that are of great interest here. First, what future patterns of storage and delivery are optimum within a system (where a system may be a state, a consortium, a country)? Think of arranging a system of repositories so that they are adjacent to good transport links for example, collectively contracting with a delivery provider, and having better system support for populating the repositories, and monitoring traffic between the repository and libraries. Second, think of preservation. Currently, we worry about the unknown long-term costs of digital preservation. However, what about the long-term costs of print preservation? I contend that for many libraries they will become unsustainable.
If the use of large just-in-case collections declines, if the use of digital resources continues to rise, if mass digitization projects continue, then it becomes increasingly hard to justify the massive expense of maintaining redundant collections. Long-term we may see a shift of cost from print to digital, but this can only be done if the costs of managing print can be reduced, which in turn means some consolidation of print collections.
Finally, navigation. Library aggregations have not been very helpful in this regard. Recently, we have realized that this is an issue. The interest in faceted browse, FRBR, recommendation, ranking by holdings or other data, and so on is testament to a realization that there need to be better ways to exploit large bibliographic resources.
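Two of the ideas mentioned here, FRBR-style grouping and ranking by holdings, can be sketched together: collapse edition-level records into works, then order the works by how many libraries hold them. The record fields (`work_id`, `title`, `holdings`) and the sample data are assumptions for illustration, not the schema of any actual union catalog.

```python
from collections import defaultdict


def rank_works_by_holdings(records):
    """Group edition records by work and rank works by total holdings.

    Each record is a dict with assumed keys: work_id, title, holdings.
    Returns a list of (title, total_holdings), most widely held first.
    """
    totals = defaultdict(int)
    titles = {}
    for rec in records:
        totals[rec["work_id"]] += rec["holdings"]
        titles.setdefault(rec["work_id"], rec["title"])  # first title seen
    ranked = sorted(totals, key=lambda w: (-totals[w], titles[w]))
    return [(titles[w], totals[w]) for w in ranked]


# Made-up search results: two editions of one work, one edition of another.
results = [
    {"work_id": "w1", "title": "Work One", "holdings": 120},
    {"work_id": "w2", "title": "Work Two", "holdings": 300},
    {"work_id": "w1", "title": "Work One (2nd ed.)", "holdings": 250},
]
print(rank_works_by_holdings(results))
```

Holdings counts are a form of intentional data, so a ranking like this reuses choices libraries have already made to help a user through a large result set.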
Aggregating demand in libraries
The level of use of a resource depends on the size of the population to which it is accessible. One aspect of the long tail argument is that the aggregation of demand – extending the population to which a resource is accessible – means that resources have a better chance of finding interested users. In other words, use will extend down the long tail. So, as discussed above, Netflix finds viewers for movies which might not move in a physical outlet because it aggregates demand across a larger population than a single physical store can. This provides an interesting perspective from which to view Google Scholar and Google Book Search, in particular their interaction with libraries. Take Google Book Search: what Google is doing here is potentially aggregating demand for books: it will be interesting to see what influence this has on their use. Presumably a case has been made that there is potential interest in the full scope of those collections, or, in other words, in moving down the long bibliographic tail. They are also aggregating demand for books and journals through Google Scholar. And, of course, they recognize that so as not to frustrate users they need to aggregate supply behind the discovery experience. Hence they are working with resolver data to complete the locate/request/deliver chain for journal materials. And they are working with OCLC to connect the Google Scholar discovery experience to Find in a Library for fulfilment. What OCLC is doing is making metadata about those books available to the major search engines and routing users back to library services, to complete the D2D for books. More work needs to be done here.
Logistics and libraries
So, briefly, consequences for libraries?
Libraries have rich deep collections, and the aggregate library system is a major achievement. However, in our current network environment, libraries compete for scarce attention. This suggests that if the ‘library long tail’ is to be effectively prospected then the ‘cost’ of discovering and using library collections and services needs to be as low as possible. We have further work to do.
This is a logistics issue. Logistics is about articulating supply and demand across a network of potentially many parties. This is what libraries do, and some of the recent innovation in libraries has been precisely to automate supply chains (think resolution services, for example).
Some ways of improving aggregation of supply:
- Unified discovery experiences: fragmentation is costly.
- Project library discovery experience into other environments: search engines, browser tools, RSS aggregators, …
- Better integrate D2D, both within operation (for example, combine request options – Amazon, place hold, ILL, …) and between operations. The aim should be to be able to place a ‘get it’ button anywhere, and guide the user through simple choices.
- Medium term exploration of how ‘inventory’ and ‘distribution’ are managed across a system (whether a system is a library, a consortium, a state, a country).
- Better ‘intelligence’ within the network. This involves better representing the entities within the network. This touches on the growing interest in ‘registries’ – registries of services (a registry of deep OPAC links, or OpenURL resolvers, or Z39.50 targets are examples here), registries of collections (a registry of database descriptions is an example), registries of institutions (see the very fine National Library of Australia Libraries Gateway for example), registries of policies (increasingly important, as libraries will organize within policy frameworks), and so on. In this context, it is interesting to reflect that the distinctive value of union catalogs is the holdings data: a union catalog is a registry of ‘information object’ data related to holding institutions. Collectively, the registry data discussed here will drive the applications that support ‘library logistics’.
- Transaction support. In an environment of multiple transactions between libraries it is useful to have a way of tracking and reconciling between libraries. OCLC’s Fee Management service is an example of something that does this for some classes of transaction.
Aggregation of demand
- If more users are exposed to library collections they will be used more. Of course, in some contexts demand from external users has been one reason for not more widely exposing collection information. However, the dynamics of the network have changed use. The major Internet search presences are often the first and last resort of research, and fragmentation of library resources reduces their gravitational pull. Libraries are having to compete for the attention of their own users. They need to be in user environments, and the open web is now very much part of those environments. This leads to consideration of the discovery strategies mentioned above: more consolidation and better projection into user environments, including search engines.
Libraries do indeed collectively manage a long tail of research, learning and cultural materials. However, we need to do more work to make sure that that long tail is directly available to improve the work and lives of our users.
Discussion with Brian Lavoie made this entry better. Making it shorter would also have helped!
Note: the ILL figure cited above is from our marketing colleagues, based on an analysis of available data.