‘Knowledge organization’ seems a slightly quaint term now, but we don’t have a better one in general use. Take the catalogue. This has been a knowledge organization tool. When an item is added, the goal is that it is related to the network of knowledge that is represented in the catalogue. In theory, this is achieved through ‘adjacency’ and cross reference, notably with reference to authors, subjects and works. In practice this has worked variably well.
In parallel with bibliographic data, the library community, notably national libraries, have developed ‘authorities’ for authors and subjects to facilitate this structure. From our current vantage point, I think we can see three stages in the development of these tools.
1. Label. In the first, subject and name authorities provide lists from which values for relevant fields are chosen. Effectively, they constrain the range and format of subject or name data, providing an agreed text label for a concept or name. Examples are LCSH, Dewey, and the Library of Congress Name Authority File. These provide some structuring devices for local catalogues, but those systems do not exploit the full structure of the authority systems from which the values are taken. Think of what is done, or not done, with classification for example. The classification system may not be used to provide interesting navigation options in the local system, and more than likely is not connected back to the fuller structure of the source scheme. That said, having a consistent label is an advantage, and facilitates matching within and between systems.
2. Data. The second stage is that these authority systems are being considered as resources in themselves, and not just as sources of controlled values for bibliographic description. So, we are seeing the Library of Congress, for example, making LCSH and the Name Authority File available as linked data. OCLC is working with a group of national libraries to synthesize name authority files and make them available as an integrated resource in the VIAF service. FAST has recently been made available in this way. The Digital Author Identifier, a national Dutch system for identifying researchers, is interesting in this context. In this arrangement, there is collaboration between the apparatus for uniquely identifying researchers and the national authority file.
3. Network. In a third stage, as these network-level resources become more richly linkable and as local environments exploit that linking ability, it becomes possible to do more. This type of linking has only just begun though, and it will be interesting to see how it develops. In this context, a URI is added to the label, making it actionable and globally unique. As an example, think again of the catalogue. The structuring devices we employ are about structuring relationships *within* the catalogue. This would be turned inside out if we not only imported values, but also linked those labels to the external resources they come from. In this way, the item represented could be re-placed in the broad network of knowledge established by the authority file from which it comes.
Of course, alongside this, local systems may also link to, or draw data from, other navigational, contextual, identifying or structuring resources such as DBpedia, MusicBrainz or Geonames. These and other reference points are likely to be important webscale identity and knowledge organization services. In a sense, more generally, this has already happened, as people orient themselves by links to Wikipedia, MusicBrainz, IMDB and other network level resources.
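To make the move from label to network a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any catalogue or authority service actually works. The `Heading` class, the `link_heading` function, the `AUTHORITY` lookup table and the example.org URI are all hypothetical stand-ins. A real authority file published as linked data would supply its own identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Heading:
    """A controlled heading: a text label, optionally paired with a URI."""
    label: str
    uri: Optional[str] = None  # None models the label-only stage

    def is_actionable(self) -> bool:
        # A heading can be followed into the wider network only if it has a URI.
        return self.uri is not None


# Hypothetical stand-in for an authority file published as linked data.
# The URI is a placeholder, not a real identifier.
AUTHORITY: Dict[str, str] = {
    "Woolf, Virginia, 1882-1941": "https://example.org/authorities/names/n0001",
}


def link_heading(heading: Heading, authority: Dict[str, str]) -> Heading:
    """Return the heading with a URI attached when its label matches exactly."""
    uri = authority.get(heading.label)
    return Heading(heading.label, uri) if uri else heading


label_only = Heading("Woolf, Virginia, 1882-1941")
linked = link_heading(label_only, AUTHORITY)
print(label_only.is_actionable(), linked.is_actionable())
```

The exact-match lookup is the point of the first stage: a consistent label is what makes the later linking possible at all. Once the URI is attached, the local record no longer has to carry everything known about the name. It can point outward instead.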
As in other areas of our activity, we need to think about how activities whose natural level was once local are now moving up to the network level. And once they are at the network level, they have to live alongside other approaches.
If this were to become more common, there are some implications …
From records to entities … we ship data around in ‘records’, bundles about individual items, and our systems are structured around managing these records. We do not tend to manage data about other things of interest to us to the same extent: authors, places, people, concepts, works, and so on, the types of things we have in authority files. What would happen if we more clearly described an item by linking it to these files? More generally, we can see stronger interest emerging in some of these other entities, personal names especially. Think for example of how Amazon has created people pages or the growing interest in researcher identification. Or of places, as geolocation services take hold. Freebase is creating an ‘entity graph’ giving IDs to millions of entities (people, places and things).
Much of the library linked data discussion has been about making that local record-based data available in different ways. As interesting is the discussion about what key resources libraries will want to link to, and how they might be sustained. An important question for national libraries and others who manage some of the schemes mentioned above is how to move into this third phase. What would this mean for library systems or for library data of this type? What resources are important? How should they be sustained? To make this concrete, are the name authority files maintained by national libraries fit for purpose in a network world? Does it make sense to limit their scope to authors identified in a particular library workflow, cataloging, and exclude other authors (of articles, for example)? Does it make sense to limit their creation to a restricted group of specialist librarians? And so on …
Finally, as knowledge organization moves to the network level, how do library resources relate to others? Can other services leverage the accumulated investment of the library community, or does it fade? The organized relationship between the Deutsche Nationalbibliothek and Wikipedia in Germany is an interesting example here, where the German Wikipedia explicitly takes advantage of the structuring work done by the DNB. Wikipedia itself is very interesting in this regard, as it has effectively become an ‘addressable knowledgebase’. If I want to tell you about a new concept or movement, or refer you to a place, or mention a person, I can send you a Wikipedia link. What would be required for Wikipedia to take advantage of ‘knowledge organization’ approaches developed in the library community?