Calais - entity identification in the cloud


Calais is a web service provided by Thomson Reuters, currently at no charge.

Calais enhances your content with rich semantic metadata. Using your content as a starting point you can utilize Calais to automatically add metadata such as entities (people, places, organizations, etc.), facts (John Doe works for Acme Corporation as the CEO), and events (a natural disaster of type landslide happened on date x). That metadata is generated in industry-standard formats that ease integration with whatever commercial, open source or proprietary content management system you are using. [Content Manager | OpenCalais]

I came across it in use at the Powerhouse Museum, and have no experience of it beyond looking at a few entries in the catalogue there.

We have been experimenting with Reuters’ OpenCalais web service since it launched in January. Now we have made a basic implementation of it applied to records in our collection database, initially as a way of generating extra structured metadata for our objects. We can extract proper names, places (by continent, country, region, state and city), company names, technologies and specialist terms, from object records all without requiring cataloguers to catalogue in this way. Having this data extracted makes it much easier for us to connect objects by manufacturers, people, and places within our own collection as well as to external resources.

[fresh + new(er) » Blog Archive » OPAC2.0 – OpenCalais meets our museum collection / auto-tagging and semantic parsing of collection data]
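The last point in that passage, connecting objects through the entities extracted from them, can be illustrated with a small sketch. This is not the Powerhouse implementation and it does not call the Calais service. The record IDs other than P3561, the entity values, and the function names are all made up for illustration, on the assumption that each record ends up with a set of (type, name) pairs.

```python
from collections import defaultdict

# Hypothetical data: object ID -> entities an extraction service might return.
# These values are illustrative only, not real Calais output.
extracted = {
    "P3561": {
        ("Organization", "Commissioners of National Education"),
        ("City", "Dublin"),
        ("Country", "Ireland"),
    },
    "X0001": {("City", "Dublin"), ("Country", "Ireland")},
    "X0002": {("Country", "Australia")},
}


def build_entity_index(records):
    """Invert the mapping: entity -> set of object IDs that mention it."""
    index = defaultdict(set)
    for object_id, entities in records.items():
        for entity in entities:
            index[entity].add(object_id)
    return index


def related_objects(object_id, records, index):
    """Return other objects sharing at least one entity, with the shared entities."""
    related = defaultdict(set)
    for entity in records[object_id]:
        for other in index[entity]:
            if other != object_id:
                related[other].add(entity)
    return dict(related)


index = build_entity_index(extracted)
links = related_objects("P3561", extracted, index)
for other, shared in links.items():
    print(other, sorted(shared))
```

The inverted index is the piece that makes "connect objects by manufacturers, people, and places" cheap: once entities are structured data rather than free text, finding related records is a lookup rather than a text search.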

Here is an example for the object ‘P3561 Text book, ‘Simple Directions in Needlework and Cutting Out, Intended for the Use of the National Female Schools of Ireland’, Commissioners of National Education, Dublin, Ireland, 1858’.

I suggested a while ago that four types of metadata about things were of interest in our systems. Interestingly, an item result page from the Powerhouse Museum now displays examples of all four: professionally created, contributed (tags), intentional (in this case there is a ‘similar searches’ feature), and programmatically promoted (the data generated by the Calais service).