While talking about disclosure, I have had it on my list for some time to note Google’s Sitemap Protocol, and in particular the way OAI-PMH can be used with it.
The Sitemap Protocol allows you to inform search engines about URLs on your websites that are available for crawling. In its simplest form, a Sitemap that uses the Sitemap Protocol is an XML file that lists URLs for a site. The protocol was written to be highly scalable so it can accommodate sites of any size. It also enables webmasters to include additional information about each URL (when it was last updated; how often it changes; how important it is in relation to other URLs in the site) so that search engines can more intelligently crawl the site. [Google Webmaster Tools]
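To make the description above concrete, here is a minimal sketch of generating such an XML file in Python. It assumes the sitemaps.org 0.9 namespace and the element names `loc`, `lastmod`, `changefreq` and `priority` for the per-URL information mentioned in the quote. The `build_sitemap` helper and the example.com entries are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; older Google-specific
# schema versions used a different namespace URI.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap XML string for a list of URL entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional hints for the crawler.
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical entries for illustration only.
example_entries = [
    {
        "loc": "https://www.example.com/",
        "lastmod": "2007-01-15",
        "changefreq": "weekly",
        "priority": "1.0",
    },
    {"loc": "https://www.example.com/collections/maps"},
]

sitemap_xml = build_sitemap(example_entries)
print(sitemap_xml)
```

Only `loc` is required for each URL; the other three elements are the optional hints the quoted passage refers to.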
And the use of OAI-PMH:
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): This is an application-independent interoperability framework based on metadata harvesting. Generally, you would use this format only if you already have a site that uses this protocol. You can’t use this format for Mobile Sitemaps. If you use this format for your site, simply add the baseURL of your OAI repository (for instance, https://www.example.com/oaiserver). [Webmaster Help Center – What other formats can I use for a Sitemap?]
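As a rough illustration of what a harvester does with that baseURL, the sketch below builds the kind of request URLs OAI-PMH defines: the baseURL plus a `verb` argument and any verb-specific arguments. The `oai_request_url` helper is hypothetical, the baseURL is the placeholder from the quote above, and nothing here is meant to describe how Google itself harvests a repository.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a baseURL, a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder baseURL taken from the quoted help text.
BASE_URL = "https://www.example.com/oaiserver"

# "Identify" asks the repository to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# "ListRecords" with the oai_dc prefix asks for records as unqualified Dublin Core.
list_records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_records_url)
```

Requesting the second URL from a real repository returns an XML response listing records, which is what makes the baseURL usable as a stand-in for a list of pages.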
I was reminded of this by Terry Reese’s post about ContentDM, OAI and Google, where he ‘discloses’ the contents of the repository to Google using ContentDM’s OAI harvesting feed for the sitemap.