lecture9-queryexpansion-handout-6-per



Introduction to Information Retrieval

Only about 4% of query sessions from a user used the relevance feedback option
  - Expressed as a "More like this" link next to each result
But about 70% of users only looked at the first page of results and didn't pursue things further
  - So 4% is about 1/8 of people extending search (roughly 30% went past the first page, and 4/30 ≈ 0.13)
Relevance feedback improved results about 2/3 of the time

Pseudo-relevance feedback automates the manual part of true relevance feedback.
Pseudo-relevance algorithm:
  - Retrieve a ranked list of hits for the user's query
  - Assume that the top k documents are relevant.
  - Do relevance feedback (e.g., Rocchio)
Works very well on average
But can go horribly wrong for some queries.
Several iterations can cause query drift. Why?

Query Expansion (Sec. 9.2.2)
  - In relevance feedback, users give additional input (relevant/non-relevant) on documents, which is used to reweight terms in the documents
  - In query expansion, users give additional input (good/bad search term) on words or phrases

Query assist
Would you expect such a feature to increase the query volume at a search engine?

How do we augment the user query? (Sec. 9.2.2)
  - Manual thesaurus
      - E.g. MedLine: physician, syn: doc, doctor, MD, medico
      - Can be query rather than just synonyms
  - Global Analysis: (static; of all documents in collection)
      - Automatically derived thesaurus (co-occurrence statistics)
      - Refinements based on query log mining
          - Common on the web
  - Local Analysis: (dynamic)
      - Analysis of documents in result set

Example of manual thesaurus (Sec. 9.2.2)

Thesaurus-based query expansion (Sec. 9.2.2)
  - For each term, t, in a query, expand the query with synonyms and related words of t from the thesaurus
      - feline → feline cat
  - May weight added terms less than original query terms.
  - Generally increases recall
  - Widely used in many science/engineering fields
  - May significantly decrease precision, particularly with ambiguous terms.
      - "interest rate" → "interest rate fascinate evaluate"
  - There is a high cost of manually producing a thesaurus ...
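
The thesaurus-based expansion step above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular system: the toy THESAURUS dictionary, the function name expand_query, and the 0.5 weight for added terms are all assumptions made for the example; the thesaurus entries are taken from the two examples in the handout.

```python
# Toy thesaurus (hypothetical): entries copied from the handout's examples.
THESAURUS = {
    "feline": ["cat"],
    "interest": ["fascinate"],
    "rate": ["evaluate"],
}


def expand_query(query, thesaurus, added_weight=0.5):
    """Return {term: weight} for the expanded query.

    Original query terms get weight 1.0; terms added from the thesaurus
    get the lower `added_weight` (0.5 is an arbitrary illustrative value).
    """
    terms = query.lower().split()
    weights = {}
    for term in terms:
        weights[term] = 1.0
    for term in terms:
        for related in thesaurus.get(term, []):
            # setdefault never lowers the weight of a word that is
            # already an original query term.
            weights.setdefault(related, added_weight)
    return weights


if __name__ == "__main__":
    print(expand_query("feline", THESAURUS))
    print(expand_query("interest rate", THESAURUS))
```

Expanding "interest rate" term by term adds "fascinate" and "evaluate", which is the precision problem the slide warns about: each word is looked up in isolation, so the sense of the phrase is lost. Down-weighting the added terms limits the damage but does not remove it.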
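
The pseudo-relevance algorithm listed earlier in this handout (retrieve a ranked list, assume the top k documents are relevant, do Rocchio feedback) can be sketched as below. This is a rough sketch under stated assumptions: documents are raw term-count vectors, ranking is a plain dot product standing in for a real retrieval function, and the Rocchio update uses only the positive part (alpha and beta), since pseudo-relevance feedback has no judged non-relevant documents. The function names and the values alpha=1.0, beta=0.75, k=2 are illustrative choices, not taken from the handout.

```python
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (no stemming, stop words, or idf)."""
    return Counter(text.lower().split())


def score(query_vec, doc_vec):
    """Dot product between a weighted query and a document vector."""
    return sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())


def rank(query_vec, doc_vecs):
    """Document indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: score(query_vec, doc_vecs[i]),
                  reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """One round of pseudo-relevance feedback.

    1. Retrieve a ranked list for the query.
    2. Assume the top k documents are relevant.
    3. Positive-only Rocchio: q' = alpha * q + beta * centroid(top k).
    Returns the new query vector and the new ranking.
    """
    query_vec = dict(vectorize(query))
    doc_vecs = [vectorize(d) for d in docs]
    top_k = rank(query_vec, doc_vecs)[:k]

    new_query = {t: alpha * w for t, w in query_vec.items()}
    for i in top_k:
        for term, tf in doc_vecs[i].items():
            new_query[term] = new_query.get(term, 0.0) + beta * tf / len(top_k)
    return new_query, rank(new_query, doc_vecs)


if __name__ == "__main__":
    docs = [
        "jaguar cat habitat",
        "jaguar cat prey",
        "jaguar car dealer",
        "big cat prey",
    ]
    new_query, ranking = pseudo_relevance_feedback("jaguar cat", docs)
    print(new_query)
    print(ranking)
```

In this toy collection the terms "habitat" and "prey" enter the query from the two top-ranked documents, which lifts the document about big cats above the one about car dealers. The same mechanism is what makes drift possible: if the assumed-relevant top k are off topic, their terms are added just as readily, and repeating the loop compounds the error.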