class06-eval


Today: Evaluation
  Why evaluation?
  Evaluating a search engine
  Unranked and ranked evaluation
  Evaluation benchmarks

IR is an experimental science
  Formulate a research question: the hypothesis
  Design an experiment to answer the question
  Perform the experiment
  Compare with a baseline control
  Does the experiment answer the question?
  Are the results significant? Or is it just luck?
  Report the results!

Research questions
  Does stemming improve retrieval performance?
    Experiment: build a stemmed index and compare against an unstemmed baseline
  Does expanding the query with synonyms improve retrieval performance?
    Experiment: expand queries with synonyms and compare against baseline unexpanded queries
  Does keyword highlighting help users evaluate document relevance?
    Experiment: build two different interfaces, one with highlighting, one without; run a user study
  Is letting users weight search terms a good idea?
    Experiment: build two different interfaces, one with term weighting, one without; run a user study

The importance of evaluation
  The ability to measure differences underlies experimental science
    How well do our systems work?
    Is A better than B? Really? Under what conditions?
  Evaluation drives what to research
    Identify techniques that work and those that don't

What we look for in evaluations ...
  Insightful
  Affordable
  Repeatable
  Explainable

Evaluating a Search Engine

Measures for a search engine
  How fast does it index
    Number of documents/hour
    (Average document size)
  How fast does it search
    Latency as a function of index size
  Expressiveness of query language
    Ability to express complex information needs
    Speed on complex queries
  All of the preceding criteria are measurable: we can quantify speed/size; we can make expressiveness precise
  The key measure: user happiness
    What is this?
    Speed of response/size of index are factors
    But blindingly fast, useless answers won't make a user happy
  Need a way of quantifying user happiness

Measuring user happiness
  Issue: who is the user we are trying to make happy?
    Depends on the setting
  Web engine: user finds what they want and returns to the engine
    Can measure rate of return users
  eCommerce site: user finds what they want and makes a purchase
    Is it the end-user, or the eCommerce site, whose happiness we measure?
    Measure time to purchase, or fraction of searchers who become buyers?
  Enterprise (company/govt/academic): care about user productivity
    How much time do my users save when looking for information?...
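
The "Are the results significant? Or is it just luck?" step of the experimental loop can be sketched in code. The slides do not name a particular test, so the paired randomization (sign-flip) test below is only one reasonable choice, swapped in for illustration. The function name, the per-query scores, and the trial count are all hypothetical; the scores are made-up numbers, not results from any experiment.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired sign-flip test on per-query score differences.

    scores_a, scores_b: per-query effectiveness scores for system A and
    baseline B, aligned so that index i refers to the same query.
    Returns an estimated p-value for the null hypothesis that A and B
    are interchangeable on each query.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    if not scores_a:
        raise ValueError("need at least one query")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary,
        # so flip each one with probability 1/2 and recompute the mean.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1

    # Add-one smoothing so the estimate is never exactly zero.
    return (extreme + 1) / (trials + 1)


# Hypothetical per-query scores: stemmed index (A) vs. unstemmed baseline (B).
stemmed = [0.42, 0.55, 0.31, 0.60, 0.48, 0.39, 0.52, 0.45, 0.58, 0.36]
unstemmed = [0.35, 0.47, 0.25, 0.49, 0.40, 0.30, 0.41, 0.38, 0.47, 0.29]

p_value = paired_randomization_test(stemmed, unstemmed)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the two systems is unlikely to be luck on this query set; it says nothing about whether the difference is large enough to matter to users.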