class11 - Today Link Analysis Anchor text PageRank Next...

Today: Link Analysis
Anchor text
PageRank

Anchor text
The Web as a Directed Graph

Assumption 1: A hyperlink is a quality signal. A hyperlink between pages denotes that the author perceived relevance.
Assumption 2: The anchor text describes the target page.

We use "anchor text" somewhat loosely here: the text surrounding the hyperlink.
Example: “You can find cheap cars <a href=http://...>here</a>.”

[document text only] vs. [document text + anchor text]

Searching on [document text + anchor text] is often more effective than searching on [document text only].
Example: Query IBM
Matches IBM’s copyright page
Matches many spam pages
Matches IBM Wikipedia article
May not match IBM home page! (if IBM home page is mostly graphical)

Searching on anchor text is better for the query IBM.
Represent each page by all the anchor text pointing to it.
In this representation, the page with the most occurrences of IBM is www.ibm.com.
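The anchor-text representation described here can be sketched in a few lines of Python. This is only a minimal illustration, not the lecture's implementation: the link triples in `LINKS`, the `example.*` hosts, and the helper names `anchor_index` and `best_match` are all hypothetical, and whitespace tokenization is an assumed simplification.

```python
from collections import Counter, defaultdict

# Hypothetical toy link data: (source page, target page, anchor text).
LINKS = [
    ("news.example.com", "www.ibm.com", "IBM"),
    ("blog.example.org", "www.ibm.com", "IBM home page"),
    ("shop.example.net", "www.ibm.com", "ibm"),
    ("spam.example.biz", "spam.example.biz/ibm", "cheap IBM parts"),
]


def anchor_index(links):
    """Map each target page to a Counter of lowercased anchor-text terms."""
    index = defaultdict(Counter)
    for _source, target, anchor in links:
        index[target].update(anchor.lower().split())
    return index


def best_match(index, term):
    """Return the page whose incoming anchor text mentions `term` most often."""
    term = term.lower()
    return max(index, key=lambda page: index[page][term])


index = anchor_index(LINKS)
print(best_match(index, "IBM"))  # prints www.ibm.com for this toy data
```

Note that the page's own text is never consulted, which is why a mostly graphical home page can still be found.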
[Figure: Anchor text containing IBM pointing to www.ibm.com]

Indexing anchor text

Thus: Anchor text is often a better description of a page’s content than the page itself.
Anchor text can be weighted more highly than document text.
Indexing anchor text can have unexpected side effects: Google bombs.
A Google bomb is a search with “bad” results due to maliciously manipulated anchor text.
Google introduced a new weighting function in January 2007 that fixed many Google bombs.
Any “live” Google bombs?
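One way to picture weighting anchor text more highly is a score that sums term counts from the page's own text and from incoming anchor text, with a larger weight on the latter. The sketch below assumes the comparison is against the page's own text; the function name `weighted_score`, the weights 1.0 and 2.0, and the sample strings are hypothetical and are not the weighting function any search engine actually uses.

```python
def weighted_score(term, doc_text, anchor_texts, w_doc=1.0, w_anchor=2.0):
    """Score a page for `term`, counting anchor-text hits more than body hits.

    The weights are illustrative assumptions, not values from the lecture.
    """
    term = term.lower()
    tf_doc = doc_text.lower().split().count(term)
    tf_anchor = sum(a.lower().split().count(term) for a in anchor_texts)
    return w_doc * tf_doc + w_anchor * tf_anchor


# A mostly graphical home page: no mention in its own text, two in anchors.
home = weighted_score("IBM", "welcome", ["IBM", "IBM home page"])
# A text-heavy page that nobody links to with the term in the anchor.
copyright_page = weighted_score("IBM", "copyright ibm corporation ibm", [])
print(home, copyright_page)
```

The same mechanism explains Google bombs: whoever controls many anchors pointing at a page controls a heavily weighted part of its score.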
PageRank

Origins of PageRank: Citation Analysis

Citation analysis: analysis of citations in the scientific literature
Example citation: “Miller (2001) has shown that physical activity alters the metabolism of estrogens.”
Two ways of measuring similarity of two scientific articles:
Cocitation similarity: The two articles are cited by the same articles.
Bibliographic coupling similarity: The two articles cite the same articles.
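The two similarity notions differ only in the direction of the citation edges they inspect. The sketch below makes that concrete on a hypothetical five-article citation graph (`CITES`); using a raw count of shared articles as the similarity value is an assumption made for illustration, since the slides do not fix a formula.

```python
# Hypothetical citation graph: article -> set of articles it cites.
CITES = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": set(),
    "D": set(),
    "E": set(),
}


def bibliographic_coupling(cites, x, y):
    """Number of articles that both x and y cite (shared outgoing citations)."""
    return len(cites[x] & cites[y])


def cocitation(cites, x, y):
    """Number of articles that cite both x and y (shared incoming citations)."""
    return sum(1 for refs in cites.values() if x in refs and y in refs)


print(bibliographic_coupling(CITES, "A", "B"))  # A and B both cite C and D
print(cocitation(CITES, "C", "D"))              # C and D are both cited by A and B
```

Bibliographic coupling is fixed once an article is published, while cocitation can keep growing as new articles cite the pair.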
Aside: “Similar pages” on Google: cocitation similarity

Citation frequency can be used to measure the impact of an article.
Each article gets one vote.
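Citation frequency with one vote per citing article amounts to counting incoming citations. A minimal sketch, assuming a hypothetical citation graph `CITES` (article mapped to the set of articles it cites) and a hypothetical helper `citation_counts`:

```python
from collections import Counter

# Hypothetical citation graph: article -> set of articles it cites.
CITES = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": set(),
    "D": set(),
    "E": set(),
}


def citation_counts(cites):
    """Count incoming citations; every citing article casts one equal vote."""
    counts = Counter({article: 0 for article in cites})
    for refs in cites.values():
        counts.update(refs)  # one vote for each article in the reference set
    return counts


counts = citation_counts(CITES)
print(counts["C"], counts["E"], counts["A"])
```

Because every vote counts equally, a citation from an obscure article weighs as much as one from a highly cited article.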