Text retrieval primer from Oracle

Introductory overview of information retrieval evaluation from Oracle. Interesting to see discussion of precision, recall, TREC conferences, etc., in this context.

Text retrieval engines, popularly known as search engines, return a list of documents (the hitlist) for a query. Typically there are some good documents in the list and some bad ones. The quality of a search engine is measured in terms of the proportion of good hits in the list, the positions of good hits relative to bad ones, and the proportion of good documents missing from the list. Ideally, a search engine must return all the good documents and only the good documents. Such an engine has very good quality and is said to have high precision, recall, and utility. Real search engines are only able to return some of the good documents in the collection along with some bad ones. [Text Retrieval Quality: A Primer]
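The three measures in the excerpt can be sketched in a few lines of Python. This is only an illustration, not code from the Oracle primer. The function names, the document IDs, and the choice of average precision as the position-sensitive measure are all assumptions made here; the primer's own "utility" measure is not reproduced. The sketch also assumes the hitlist contains no duplicate documents.

```python
def precision(hitlist, relevant):
    """Proportion of returned documents that are good."""
    if not hitlist:
        return 0.0
    return sum(1 for doc in hitlist if doc in relevant) / len(hitlist)


def recall(hitlist, relevant):
    """Proportion of good documents that were returned.

    1 - recall is the proportion of good documents missing from the list.
    """
    if not relevant:
        return 0.0
    return len(set(hitlist) & relevant) / len(relevant)


def average_precision(hitlist, relevant):
    """Position-sensitive score: rewards good hits ranked above bad ones.

    Assumed stand-in for "positions of good hits relative to bad ones";
    good documents that never appear in the hitlist contribute zero.
    """
    if not relevant:
        return 0.0
    good_so_far = 0
    total = 0.0
    for rank, doc in enumerate(hitlist, start=1):
        if doc in relevant:
            good_so_far += 1
            total += good_so_far / rank  # precision at this rank
    return total / len(relevant)


# Hypothetical query: four documents returned, three good ones in the collection.
hitlist = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d5"}

p = precision(hitlist, relevant)
r = recall(hitlist, relevant)
ap = average_precision(hitlist, relevant)
print(f"precision={p:.3f} recall={r:.3f} average_precision={ap:.3f}")
```

In this made-up example half of the hitlist is good (precision 0.5), one good document out of three is missing (recall 2/3), and the average precision is pulled down both by the bad hit ranked second and by the missing document.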

