Section8.0 - Module 8 Introduction When searching for a...

Module 8 Introduction

When searching for a good recipe that uses squash and ginger, you will find many pages on the Web that might interest you. You might try the query recipe squash ginger to identify those pages, and many pages that include those words could be what you want. However, many other pages use the same words in a different context. For example, one webpage contains various answers to the question “What things do you remember from your schooldays?” and among the answers are the phrases “recipe for disaster,” “used to squash all 8 of us,” and “a boy with ginger hair.” On the other hand, some cookbook pages may never mention the word “recipe,” even though what is written is obviously a recipe. How can a search engine separate the pages you want from those that just happen to use many of the same words?

The goals of this fourth unit examining search engines are: to explain how search engines order the set of pages to be returned for a user’s query; to expose the principles behind evaluating a search engine’s effectiveness in ranking webpages; to explain how additional keywords in a search affect the quality of the results; and to expose two approaches to iterative searching.

More specifically, by the end of the module, you will be able to: calculate precision and recall for a returned set of pages, each marked as relevant or not relevant; calculate precision at 1, precision at 10, and so forth for a returned sequence of pages, each marked as relevant or not relevant; explain the fundamental ideas behind PageRank; and contrast relevance feedback and query refinement.
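The squash-and-ginger problem from the start of this introduction can be made concrete with a minimal sketch of purely word-based matching. The function names and the cookbook text are hypothetical, invented for illustration; the schooldays text is assembled around the three phrases quoted above. Real search engines do far more than this.

```python
import re

def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_all(query, page_text):
    """True if every query word appears somewhere in the page."""
    return words(query) <= words(page_text)

query = "recipe squash ginger"

# Built around the phrases quoted in the introduction.
schooldays_page = (
    "It was a recipe for disaster. The bench used to squash all 8 of us. "
    "I remember a boy with ginger hair."
)

# Hypothetical cookbook text that never uses the word "recipe".
cookbook_page = "Roast the squash until soft, then stir in grated ginger."

print(matches_all(query, schooldays_page))  # True: matches, yet not about cooking
print(matches_all(query, cookbook_page))    # False: a recipe, yet "recipe" is missing
```

Word matching alone returns the page you do not want and misses the one you do, which is why ranking and evaluation need more than a count of shared words.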
CS 100 Module 8
© 2009, University of Waterloo

8.0 Ranking

Among the billions of pages on the Web, which are the ones that should be returned to you when you pose a query? If hundreds or thousands of webpages contain most of the words from your query, which ones should be returned first and which should be further down in the list of results?

Again, the content presented here is supplemented by material from Web Dragons. In particular, you must read Chapter 4, pp. 110-143, concentrating on pp. 113-118.
8.1 Measuring Retrieval Effectiveness

8.1.1 Relevance

If you want to find the height of the CN Tower, you might search the Web for a page that contains that information. Perhaps you’ll find an article from Wikipedia, Guinness World Records, a Toronto tourism site, or a local newspaper. You will not be satisfied with an article that gives the height of Shanghai’s Oriental Pearl Tower and merely mentions that the CN Tower is not as big, without giving its actual height. The first four articles are said to be relevant to your query, whereas the other one is said to be irrelevant, even though it includes some matching terms.
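Once each returned page has been marked as relevant or not relevant, the measures named in the module goals can be computed from those marks. The sketch below uses the standard textbook definitions of precision, recall, and precision at k, which may differ in detail from how the rest of this module presents them. The function names, the ordering of the five CN Tower articles, and the total of 8 relevant pages on the Web are all hypothetical, chosen only to make the arithmetic visible.

```python
def precision(labels):
    """Fraction of the returned pages that are relevant."""
    return sum(labels) / len(labels) if labels else 0.0

def recall(labels, total_relevant):
    """Fraction of all relevant pages that were actually returned."""
    return sum(labels) / total_relevant if total_relevant else 0.0

def precision_at_k(labels, k):
    """Fraction of the first k positions that hold a relevant page.

    Divides by k even if fewer than k pages were returned.
    """
    return sum(labels[:k]) / k

# The five CN Tower articles in an assumed (hypothetical) result order:
# True = relevant, False = irrelevant (the Oriental Pearl Tower article).
returned = [True, True, False, True, True]

print(precision(returned))          # 0.8  (4 of the 5 returned pages are relevant)
print(recall(returned, 8))          # 0.5  (assumes 8 relevant pages exist in total)
print(precision_at_k(returned, 1))  # 1.0
print(precision_at_k(returned, 3))  # 2 of the first 3 are relevant
```

Precision depends only on what was returned, while recall also needs the number of relevant pages that exist, which is rarely known for the Web as a whole.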