Sec. 21.1.1: When indexing a document D, include (with some


links seen, where do we crawl next?

Assumption 2: The text in the anchor of the hyperlink describes the target page (textual context).

Introduction to Information Retrieval

Assumption 1: reputed sites
Assumption 2: annotation of target

Anchor Text (Sec. 21.1.1)

- When indexing a document D, include (with some weight) anchor text from links pointing to D.
- For "ibm", how to distinguish between:
  - IBM's home page (mostly graphical)
  - IBM's copyright page (high term freq. for "ibm")
  - Rival's spam page (arbitrarily high term freq.)

ibm.com: A million pieces of anchor text with "ibm" send a strong signal.

Indexing anchor text (Sec. 21.1.1)

WWW Worm - McBryan [Mcbr94]

[Figure labels: ibm; "Armonk, NY-based computer giant IBM announced today"; www.ibm.com; IBM home page; "Joe's computer hardware links Sun H..."]
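The indexing rule on the slide (add anchor text from incoming links to the target document, with some weight) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the whitespace tokenizer, the example URLs and the anchor weight of 2.0 are hypothetical choices, since the slides say only "with some weight".

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_term_weights(pages, links, anchor_weight=2.0):
    """Return {url: Counter(term -> weight)}.

    pages: {url: body_text}
    links: iterable of (source_url, target_url, anchor_text)
    anchor_weight: assumed weight per anchor-text occurrence (hypothetical value).
    """
    weights = defaultdict(Counter)

    # Body text counts once per occurrence.
    for url, body in pages.items():
        for term in tokenize(body):
            weights[url][term] += 1.0

    # Anchor text is credited to the TARGET of the link, not its source.
    for _source, target, anchor in links:
        for term in tokenize(anchor):
            weights[target][term] += anchor_weight

    return weights


# Hypothetical data: a mostly graphical home page and a copyright page
# that repeats "ibm" in its body.
pages = {
    "www.ibm.com": "welcome",
    "example.org/copyright": "ibm ibm ibm copyright",
}
links = [
    ("site-a", "www.ibm.com", "IBM home page"),
    ("site-b", "www.ibm.com", "ibm"),
    ("site-c", "www.ibm.com", "IBM"),
]

weights = build_term_weights(pages, links)
print(weights["www.ibm.com"]["ibm"])            # anchor text only
print(weights["example.org/copyright"]["ibm"])  # body text only
```

With these made-up inputs, the home page gets all of its weight for "ibm" from anchor text, even though the term never appears in its body. This is the effect the slide describes: many independent anchors pointing to a page act as a strong signal.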

This document was uploaded on 02/26/2014.
