Calais - entity identification in the cloud

Lorcan 1 min read

Calais is a web service provided by Thomson Reuters, currently at no charge.

Calais enhances your content with rich semantic metadata. Using your content as a starting point you can utilize Calais to automatically add metadata such as entities (people, places, organizations, etc.), facts (John Doe works for Acme Corporation as the CEO), and events (a natural disaster of type landslide happened on date x). That metadata is generated in industry-standard formats that ease integration with whatever commercial, open source or proprietary content management system you are using. [Content Manager | OpenCalais]

I came across it in use at the Powerhouse Museum, and have no experience of it beyond looking at a few entries in the catalogue there.

We have been experimenting with Reuters’ OpenCalais web service since it launched in January. Now we have made a basic implementation of it applied to records in our collection database, initially as a way of generating extra structured metadata for our objects. We can extract proper names, places (by continent, country, region, state and city), company names, technologies and specialist terms, from object records all without requiring cataloguers to catalogue in this way. Having this data extracted makes it much easier for us to connect objects by manufacturers, people, and places within our own collection as well as to external resources.

[fresh + new(er) » Blog Archive » OPAC2.0 – OpenCalais meets our museum collection / auto-tagging and semantic parsing of collection data]
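
The idea of connecting objects through shared manufacturers, people, and places can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not call the Calais service, the entity lists are invented stand-ins for whatever an extraction service might return, the object IDs `X0001` and `X0002` are hypothetical, and the function names are my own rather than anything from the Powerhouse implementation.

```python
from collections import defaultdict

# Hypothetical records: object ID -> extracted entities as (type, name) pairs.
# These stand in for the output of an entity-extraction service;
# they are NOT real OpenCalais results.
records = {
    "P3561": [("City", "Dublin"), ("Country", "Ireland"),
              ("Organization", "Commissioners of National Education")],
    "X0001": [("City", "Dublin"), ("Company", "Example Linen Co.")],
    "X0002": [("Country", "Australia"), ("Company", "Example Linen Co.")],
}


def build_entity_index(records):
    """Map each (type, name) entity to the set of object IDs that mention it."""
    index = defaultdict(set)
    for object_id, entities in records.items():
        for entity in entities:
            index[entity].add(object_id)
    return index


def related_objects(object_id, records, index):
    """Return other objects sharing an entity with object_id, with the shared entities."""
    related = defaultdict(set)
    for entity in records[object_id]:
        for other_id in index[entity]:
            if other_id != object_id:
                related[other_id].add(entity)
    return dict(related)


index = build_entity_index(records)
print(related_objects("P3561", records, index))
```

Because the index is keyed on the entity rather than on the record, the same structure could in principle be used to link out to external resources that are keyed on the same names.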

Here is an example for the object ‘P3561 Text book, ‘Simple Directions in Needlework and Cutting Out, Intended for the Use of the National Female Schools of Ireland’, Commissioners of National Education, Dublin, Ireland, 1858’.

I suggested a while ago that four types of metadata about things were of interest in our systems. Interestingly, an item result page from the Powerhouse Museum now displays examples of all four: professionally created, contributed (tags), intentional (in this case there is a ‘similar searches’ feature), and programmatically promoted (the data generated by the Calais service).

Lorcan Dempsey dot Net

The social, cultural and technological contexts of libraries, services and networks
