170-google - Hyperlink Analysis on the Web Monika Henzinger...

Hyperlink Analysis on the Web
Monika Henzinger
monika@google.com
Outline
• Random Walks
• Classic Information Retrieval (IR) vs Web IR
• Hyperlink Analysis:
  – PageRank
  – HITS
Random Walks
Random Walk = discrete-time stochastic process over a graph G=(V,E) with a transition probability matrix P
– The random walk is at one node at any time, making node transitions at time steps t = 1, 2, … with $P_{ij}$ being the probability of going to node j when at node i
– The initial node is chosen according to some probability distribution $q^{(0)}$ over the nodes V
Random Walks (cont.)
• $q^{(t)}$ = row vector whose i-th component is the probability that the chain is in node i at time t
• $q^{(t+1)} = q^{(t)} P \;\Rightarrow\; q^{(t)} = q^{(0)} P^t$
• A stationary distribution is a probability distribution q such that $q = qP$ (steady-state behavior)
• Example:
  – If $P_{ij} = 1/\mathrm{degree}(i)$ when (i,j) is in G and 0 otherwise, then $q_i = \mathrm{degree}(i)/2m$, where m is the number of edges of the undirected graph G
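The degree example can be checked numerically. The following is a minimal sketch, assuming a small hypothetical undirected graph (a triangle with one extra node attached); the graph, the helper `step`, and the iteration count are illustrative choices, not part of the slides. It repeatedly applies $q^{(t+1)} = q^{(t)} P$ and compares the result with $\mathrm{degree}(i)/2m$.

```python
# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
m = len(edges)

# Adjacency lists and degrees.
neighbors = {i: [] for i in range(n)}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)
degree = [len(neighbors[i]) for i in range(n)]

# Transition matrix: P[i][j] = 1/degree(i) if (i, j) is an edge, else 0.
P = [[(1.0 / degree[i] if j in neighbors[i] else 0.0) for j in range(n)]
     for i in range(n)]


def step(q, P):
    """Return the row vector q P, i.e. the distribution one time step later."""
    size = len(q)
    return [sum(q[i] * P[i][j] for i in range(size)) for j in range(size)]


# q(0): start at node 0 with probability 1.
q = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):  # 200 steps is an arbitrary but ample choice for this graph
    q = step(q, P)

# Claimed stationary distribution: degree(i) / 2m.
expected = [degree[i] / (2 * m) for i in range(n)]
print(q)
print(expected)
```

For this particular graph the two printed vectors should agree to many decimal places. The graph was chosen to be connected and to contain an odd cycle so that the iteration settles down; on a bipartite graph the same loop would oscillate instead of converging.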
Random Walks (cont.)
• Theorem: Under certain conditions:
  – There exists a unique stationary distribution q with $q_i > 0$ for all i
  – Let N(i,t) be the number of times the random walk visits node i in t steps. Then, the fraction of steps the walk spends at i equals $q_i$, i.e.
    $$\lim_{t \to \infty} \frac{N(i,t)}{t} = q_i$$
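The visit-frequency statement can be illustrated by simulation. This is a rough sketch under stated assumptions: the graph is a hypothetical connected undirected graph with an odd cycle (assumed here to satisfy the theorem's conditions), the walk moves to a uniformly random neighbor, and the function name `visit_fractions`, the step count, and the seed are made up for the example.

```python
import random

# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

neighbors = {i: [] for i in range(n)}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)


def visit_fractions(neighbors, start, steps, seed):
    """Simulate a random walk and return N(i, t) / t for every node i."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = [0] * len(neighbors)
    node = start
    for _ in range(steps):
        node = rng.choice(neighbors[node])  # move to a uniformly random neighbor
        counts[node] += 1
    return [c / steps for c in counts]


fractions = visit_fractions(neighbors, start=0, steps=200_000, seed=0)

# Stationary distribution of this walk: degree(i) / 2m.
stationary = [len(neighbors[i]) / (2 * len(edges)) for i in range(n)]
print(fractions)
print(stationary)
```

With a finite number of steps the fractions only approximate the stationary values; the gap should shrink as `steps` grows.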
Information Retrieval
Input: Document collection
• Goal: Retrieve documents or text with information content that is relevant to the user’s information need
Two aspects:
Classic information retrieval
Ranking is a function of query term frequency within the document (tf) and across all documents (idf).
This works because of the following assumptions in classical IR:
– Queries are long and well specified: “What is the impact of the Falklands war on Anglo-Argentinean relations”
– Documents (e.g., newspaper articles) are coherent, well authored, and are usually about one topic
– The vocabulary is small and relatively well understood
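To make the tf/idf ranking concrete, here is a minimal sketch. The slide does not give a formula, so the weighting used below (raw term count times log(N/df)) is an assumption, one common textbook variant among many; the toy documents and the function name `tf_idf_scores` are hypothetical.

```python
import math
from collections import Counter

# Toy document collection (made up for illustration).
docs = {
    "d1": "falklands war impact on relations between britain and argentina",
    "d2": "football results from argentina",
    "d3": "weather report for britain",
}


def tf_idf_scores(query, docs):
    """Score each document as the sum over query terms of tf * log(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # df[term] = number of documents that contain the term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere, so it contributes nothing
            score += tf[term] * math.log(n_docs / df[term])
        scores[name] = score
    return scores


scores = tf_idf_scores("falklands war argentina", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Rare terms such as "falklands" get a larger idf weight than a term that appears in most documents, which is why the scheme depends on long, well-specified queries and a well-understood vocabulary.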
Web information retrieval
None of these assumptions hold:
– Queries are short: 2.35 terms on average
– Huge variety in documents: language, quality, duplication
– Huge vocabulary: hundreds of millions of terms
– Deliberate misinformation
Ranking is a function of the query terms and of the hyperlink structure
12/6/2002
Hyperlink analysis
• Idea: Mine structure of the web graph
  – Each web page is a node
  – Each hyperlink is a directed edge
• Related work:
  – Classic IR work (citations = links) a.k.a. “Bibliometrics” [K’63, G’72, S’73,…]
  – Socio-metrics [K’53, MMSM’86,…]
  – Many Web related papers use this approach [PPR’96, AMM’97, S’97, CK’97, K’98, BP’98,…]
Google’s approach
Assumption: A link from page A to page B is a recommendation of page B by the author of A (we say B is a successor of A)
⇒ Quality of a page is related to its in-degree
Recursion: Quality of a page is related to
– its in-degree, and to
– the quality of pages linking to it
⇒
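The difference between the in-degree assumption and the recursive one can be sketched on a toy web graph. Everything below is an illustrative assumption rather than the formula from the slides: the four-page graph is made up, and the recursion is read in one simple way, where each page splits its current quality evenly among its successors. The sketch also assumes every page has at least one outgoing link.

```python
# Hypothetical web graph: page -> list of successors (pages it links to).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

# First reading: quality is related to in-degree (number of incoming links).
in_degree = {page: 0 for page in links}
for source, targets in links.items():
    for target in targets:
        in_degree[target] += 1


def recursive_quality(links, iterations=100):
    """Iterate: a page's quality is the sum of shares passed on by its predecessors.

    Assumes every page has at least one successor, so no share is lost
    and there is no division by zero.
    """
    pages = list(links)
    quality = {p: 1.0 / len(pages) for p in pages}  # start from uniform quality
    for _ in range(iterations):
        new_quality = {p: 0.0 for p in pages}
        for source, targets in links.items():
            share = quality[source] / len(targets)  # split evenly among successors
            for target in targets:
                new_quality[target] += share
        quality = new_quality
    return quality


quality = recursive_quality(links)
print(in_degree)
print(quality)
```

In this toy graph A and B have the same in-degree, but the recursive score separates them: A's single link comes from the heavily linked page C, while B's comes from A, which passes on only half of its quality. On graphs with pages that have no outgoing links, or with purely cyclic structure, this naive iteration would need extra handling.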