Any surfer looking at page i will:
– if c_i = 0, choose one of the n Web pages at random;
– if c_i ≠ 0, flip a coin whose P(heads) = p (the coin is assumed to be independent of the surfing), and
  • if it’s heads, select one of the out links at random;
  • if it’s tails, select one of the n Web pages at random.
• One-step transition probabilities are:

\[
p_{ij} =
\begin{cases}
\dfrac{1}{n} & \text{if } c_i = 0 \\[2ex]
p \cdot \dfrac{1}{c_i} + (1 - p) \cdot \dfrac{1}{n} & \text{if } c_i \neq 0 \text{ and link } i \to j \text{ exists} \\[2ex]
(1 - p) \cdot \dfrac{1}{n} & \text{if } c_i \neq 0 \text{ and link } i \to j \text{ does not exist}
\end{cases}
\]
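The three cases above can be turned into a small transition matrix. The sketch below is only an illustration under stated assumptions: c_i is taken to be the number of out links of page i, the input format `links` (a list of sets of link targets), the function name `transition_matrix`, the 4-page toy Web, and the value p = 0.85 are all hypothetical choices, not part of the slides.

```python
def transition_matrix(links, p):
    """Build P with P[i][j] = p_ij for the modified surfing model.

    links[i] is the set of pages that page i links to (hypothetical
    input format); c_i is assumed to be len(links[i]); p is P(heads).
    """
    n = len(links)
    P = []
    for i in range(n):
        out = links[i]
        c_i = len(out)
        if c_i == 0:
            # No out links: jump to any of the n pages uniformly.
            row = [1.0 / n] * n
        else:
            # Heads (prob. p): follow one of the c_i out links.
            # Tails (prob. 1 - p): jump to any of the n pages.
            row = [
                p / c_i + (1.0 - p) / n if j in out else (1.0 - p) / n
                for j in range(n)
            ]
        P.append(row)
    return P


# Toy 4-page Web (assumed for illustration): page 3 has no out links.
links = [{1, 2}, {2}, {0}, set()]
P = transition_matrix(links, p=0.85)

# Each row is a probability distribution over the next page.
row_sums = [sum(row) for row in P]
```

Every row sums to 1 because the c_i link entries contribute p in total and the uniform jump contributes 1 − p in total. With p < 1, every entry is at least (1 − p)/n, so no transition probability is zero.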
Ilya Pollak

Modified model of Web surfing

\[
p_{ij} =
\begin{cases}
\dfrac{1}{n} & \text{if } c_i = 0 \\[2ex]
p \cdot \dfrac{1}{c_i} + (1 - p) \cdot \dfrac{1}{n} & \text{if } c_i \neq 0 \text{ and link } i \to j \text{ exists} \\[2ex]
(1 - p) \cdot \dfrac{1}{n} & \text{if } c_i \neq 0 \text{ and link } i \to j \text{ does not exist}
\end{cases}
\]

• Assuming that p < 1, the resulting Markov chain graph is fully connected, with p_ij ≠ 0 for all Web pages i and j.
• Therefore, the entire graph forms a single recurrent class, with no periodic states.
• Define PageRank(i) as the steady-state probability for the surfer to be at page i after a large number of steps under this model.
• Then PageRank(i) exists and does not depend on the starting point.
• Retrieve pages based on word frequency and prominence, and perhaps other criteria, and sort by PageRank.

Comments
• Google’s original algorithm used word frequency, visual prominence (e.g., font size), and anchor text (text surrounding the link to page j in page i), in addition to PageRank.
• Google’s current page ranking algorithm has hundreds of other ingredients, which are kept secret and change over time, so as to both improve the algorithm and prevent people from taking advantage.
• Other concurrently developed algorithms for ranking websites were based on the idea that experts’ links to page i should count for more than non-experts’ links. “Experts” are identified by counting how many highly ranked search results they link to. This is the basis for the hubs and authorities (or HITS) algorithm of Jon Kleinberg and the SALSA algorithm of Lempel and Moran.

Information retrieval
• Web search is an example of information retrieval.
• Before the Web, information retrieval meant searching databases of newspaper articles, scientific papers, patents, legal abstracts, medical records, etc.
• An interesting application of text-based search to video is SnapStream, which is based on closed captions.
  – Used by government entities and the entertainment industry (e.g., the Daily Show).
• PageRank is akin to determining impact factors of scientific publications: being cited helps, especially being cited by important publications.
• Non-text-based search is more difficult but has wide applications:
  – forensics (fingerprint matching, footprint matching, face matching);
  – health care (matching an X-ray image against a database of lung cancer images, to aid in determining the diagnosis and treatment).
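
The definition of PageRank(i) as a steady-state probability that does not depend on the starting point can be checked numerically on a small example. The following is a minimal sketch, not the method used by any search engine: it repeatedly applies the one-step transition probabilities to a starting distribution. The helper names, the 4-page toy Web, p = 0.85, and the fixed count of 200 steps are all assumptions made for illustration, and c_i is assumed to be the number of out links of page i.

```python
def transition_matrix(links, p):
    """P[i][j] = p_ij for the modified model; links[i] is a set of targets."""
    n = len(links)
    return [
        [1.0 / n] * n if not out else
        [p / len(out) + (1.0 - p) / n if j in out else (1.0 - p) / n
         for j in range(n)]
        for out in links
    ]


def step(dist, P):
    """Advance one step: new_dist[j] = sum over i of dist[i] * p_ij."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(P, start, steps=200):
    """Approximate the steady-state distribution by repeated stepping."""
    dist = list(start)
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Toy 4-page Web (assumed for illustration): page 3 has no out links.
links = [{1, 2}, {2}, {0}, set()]
P = transition_matrix(links, p=0.85)

# Two different starting points: surfer begins at page 0 or at page 3.
rank_a = steady_state(P, [1.0, 0.0, 0.0, 0.0])
rank_b = steady_state(P, [0.0, 0.0, 0.0, 1.0])

# Pages sorted from highest to lowest approximate PageRank.
order = sorted(range(len(P)), key=lambda i: rank_a[i], reverse=True)
```

In this toy example both starting points lead to the same distribution up to floating-point error, which is what the single-recurrent-class, aperiodic argument predicts. For a Web-scale graph one would not store a dense n × n matrix; this sketch only illustrates the definition.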
This note was uploaded on 09/11/2013 for the course ECE 302 taught by Professor Gelfand during the Fall '08 term at Purdue.