Section 5.0 - Module 5 Introduction

Module 5 Introduction

There is so much data on the Web that we cannot hope to find what we want without the help of a search engine, such as Google, Yahoo, or Bing. How do these engines provide us with the webpages that we want to see?

The goals of this first unit examining search engines are:
- to understand the structure of a Web index;
- to know what changes are applied to words on a webpage so that they are converted into index terms;
- to learn how queries are converted into search terms;
- to appreciate the need for large computer clusters to serve Web searches in practice.

More specifically, by the end of the module, you will be able to:
- describe the contents of postings lists;
- explain indexing concepts, including case folding, stop words, and stemming;
- explain the vocabulary problems arising from word segmentation, synonymy, polysemy, and variant spellings;
- describe the main components of a simple Web search engine and how a user's search flows through the engine, ending with the results page being returned to the user.

The online materials are supplemented by the first part of Chapter 4 in Web Dragons. You are responsible for material from both sources.
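The objectives above mention index terms, case folding, stop words, and postings lists. As a rough preview only, here is a small Python sketch that turns three made-up pages into index terms and records, for each term, a postings list of the pages containing it. The stop-word list, the regular-expression word segmentation, the sample pages, and the functions `index_terms`, `build_index`, and `search` are hypothetical illustrations. They are not how Google works or how Web Dragons defines these ideas, and stemming is deliberately left out.

```python
import re

# Hypothetical toy stop-word list (an assumption for this sketch only).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

def index_terms(text: str) -> list[str]:
    """Turn raw page text into index terms (no stemming in this sketch)."""
    # Crude word segmentation: each run of letters/digits becomes a token.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Case folding: "Waterloo" and "WATERLOO" both become "waterloo".
    folded = [token.lower() for token in tokens]
    # Stop-word removal: drop very common words.
    return [term for term in folded if term not in STOP_WORDS]

def build_index(pages: dict[int, str]) -> dict[str, list[int]]:
    """Build a postings list (sorted page IDs) for every index term."""
    postings: dict[str, set[int]] = {}
    for page_id, text in pages.items():
        for term in index_terms(text):
            postings.setdefault(term, set()).add(page_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Return pages containing every query term.

    The query goes through the same conversion as page text, so its
    terms match the index terms.
    """
    terms = index_terms(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect postings lists: keep pages that contain all terms so far.
        result &= set(index.get(term, []))
    return sorted(result)

pages = {
    1: "The University of Waterloo is in Ontario.",
    2: "Waterloo: the battle of 1815.",
    3: "Search engines index the Web.",
}
index = build_index(pages)
print(index["waterloo"])                 # [1, 2]
print(search(index, "WATERLOO battle"))  # [2]
```

Because the query is processed the same way as page text, "WATERLOO" matches pages indexed under "waterloo". The stop word "the" never reaches the index at all.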

CS 100 Module 5 (© 2009, University of Waterloo)

5.0 Searching and Indexing

Today there are more than one trillion webpages[1], containing an incredible amount of information on every topic imaginable. Furthermore, the number of webpages is growing by billions of pages each day. As with the libraries of old, it is impossible to find what you are looking for by browsing from page to page. Even if a computer could do the browsing for you and could load 1000 pages per second, it would take over 31 years to check every page one after the other (10^12 pages ÷ 1000 pages per second = 10^9 seconds, and a year has about 3.16 × 10^7 seconds).

As you have experienced, however, Google and other search engines can return answers to your requests almost instantaneously. How do they do it? Before answering that question, let's look a bit closer at the types of searches, or queries, that Google provides. (Similar facilities are provided by many other engines, but we'll look at this one to make our discussion more concrete.)

Practice Exercises

1) Open a browser window and enter Google's web address. Notice the search box for entering your query. Notice also the two search buttons labeled Google Search and I'm Feeling Lucky. At the top of the window, notice the links labeled Web, Images, Videos, Maps, News, and so on.

2) In the query box, enter the word Waterloo and then click Google Search. How long did it take to respond? How many webpages does Google report as containing that word? Notice that Google also returns references to several pages that it believes are most likely to match what you are searching for. For each one, it provides the page title, a snippet from the webpage showing your search word, the URI for the page, and links labeled cached and similar. Click on a webpage title (or URI) to see the full webpage found by Google. Does this page match what you had in mind when you

This note was uploaded on 02/04/2011 for the course CS 100 taught by Professor Bb during the Spring '11 term at University of Warsaw.
