chi00 - Bringing Order to the Web: Automatically...

Bringing Order to the Web: Automatically Categorizing Search Results

Hao Chen
School of Information Management & Systems
University of California
Berkeley, CA 94720 USA
hchen@sims.berkeley.edu

Susan Dumais
Microsoft Research
One Microsoft Way
Redmond, WA 99802 USA
sdumais@microsoft.com

ABSTRACT

We developed a user interface that organizes Web search results into hierarchical categories. Text classification algorithms were used to automatically classify arbitrary search results into an existing category structure on-the-fly. A user study compared our new category interface with the typical ranked list interface of search results. The study showed that the category interface is superior both in objective and subjective measures. Subjects liked the category interface much better than the list interface, and they were 50% faster at finding information that was organized into categories. Organizing search results allows users to focus on items in categories of interest rather than having to browse through all the results sequentially.

Keywords

User Interface, World Wide Web, Search, User Study, Text Categorization, Classification, Support Vector Machine

INTRODUCTION

With the exponential growth of the Internet, it has become more and more difficult to find information. Web search services such as AltaVista, InfoSeek, and MSNWebSearch were introduced to help people find information on the web. Most of these systems return a ranked list of web pages in response to a user’s search request. Web pages on different topics or different aspects of the same topic are mixed together in the returned list. The user has to sift through a long list to locate pages of interest.

Since the 19th century, librarians have used classification systems like Dewey and Library of Congress classification to organize vast amounts of information. More recently, Web directories such as Yahoo! and LookSmart have been used to classify Web pages.
The manual nature of the directory compiling process makes it impossible to have as broad coverage as the search engines, or to apply the same structure to intranet or local files without additional manual effort.

To combine the advantage of structured topic information in directories and broad coverage in search engines, we built a system that takes the web pages returned by a search engine and classifies them into a known hierarchical structure such as LookSmart’s Web directory [24]. The system consists of two main components: 1) a text classifier that categorizes web pages on-the-fly, and 2) a user interface that presents the web pages within the category structure and allows the user to manipulate the structured view (Figure 1).

Figure 1: Presenting web pages within category structure

RELATED WORK

Generating structure

Three general techniques have been used to organize documents into topical contexts. The first one uses structural information (meta data) associated with each document. The DynaCat system by Pratt [15] used meta data from the UMLS medical thesaurus to organize search results. Two prototypes developed by Allen [1] used meta