CS345
Data Mining
Link Analysis Algorithms
Page Rank
Anand Rajaraman, Jeffrey D. Ullman
Link Analysis Algorithms
Page Rank
Hubs and Authorities
Topic-Specific Page Rank
Spam Detection Algorithms
Other interesting topics we won’t cover
Detecting duplicates and mirrors
Mining for communities
Classification
Spectral clustering
Ranking web pages
Web pages are not equally “important”
www.joe-schmoe.com vs. www.stanford.edu
Inlinks as votes
www.stanford.edu has 23,400 inlinks
www.joe-schmoe.com has 1 inlink
Are all inlinks equal?
Recursive question!
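As a hedged illustration of the plain vote count, before any weighting by source importance, this Python sketch tallies inlinks from a small, made-up edge list. The link data, the `.example` source pages and the function name `count_inlinks` are all hypothetical.

```python
from collections import Counter

# Hypothetical (source, target) links; made up for illustration, not crawl data.
links = [
    ("page1.example", "www.stanford.edu"),
    ("page2.example", "www.stanford.edu"),
    ("page3.example", "www.stanford.edu"),
    ("page4.example", "www.joe-schmoe.com"),
]

def count_inlinks(edges):
    """Treat every link as one equal vote for its target page."""
    return Counter(target for _source, target in edges)

votes = count_inlinks(links)
print(votes["www.stanford.edu"])    # 3
print(votes["www.joe-schmoe.com"])  # 1
```

Counting this way gives every inlink the same weight, which is exactly the assumption the recursive question above challenges.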
Simple recursive formulation
Each link’s vote is proportional to the importance of its source page
If page P with importance x has n outlinks, each link gets x/n votes
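A minimal Python sketch of this rule, assuming a page's outlinks are given as a list of target names; the function name `link_votes` is hypothetical, and `Fraction` is used only to keep the x/n shares exact.

```python
from fractions import Fraction

def link_votes(importance, outlinks):
    """Split a page's importance x evenly over its n outlinks (x/n each).

    Returns a dict mapping each target page to the votes it receives from
    this page. A target listed twice receives two shares.
    """
    n = len(outlinks)
    if n == 0:
        # No outlinks: this sketch simply returns no votes.
        return {}
    share = Fraction(importance) / n
    votes = {}
    for target in outlinks:
        votes[target] = votes.get(target, 0) + share
    return votes

print(link_votes(Fraction(3, 5), ["p", "q", "r"]))
# {'p': Fraction(1, 5), 'q': Fraction(1, 5), 'r': Fraction(1, 5)}
```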
Simple “flow” model
The web in 1839
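[Figure: three pages, Yahoo (y), Amazon (a) and M’soft (m); Yahoo links to itself and to Amazon (y/2 each), Amazon links to Yahoo and to M’soft (a/2 each), M’soft links to Amazon (m)]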
y = y/2 + a/2
a = y/2 + m
m = a/2
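As a sketch, assuming the edge directions implied by the equations above (the names `OUTLINKS` and `flow` are hypothetical), the flow equations can be generated mechanically from each page's outlink list and checked against a candidate score vector with exact `Fraction` arithmetic:

```python
from fractions import Fraction

# Outlinks read off the flow equations (y = Yahoo, a = Amazon, m = M'soft).
OUTLINKS = {
    "y": ["y", "a"],
    "a": ["y", "m"],
    "m": ["a"],
}

def flow(scores):
    """Return the right-hand side of the flow equations for `scores`.

    Each page sends scores[page] / len(outlinks) along every outlink,
    and a page's new value is the total it receives.
    """
    incoming = {page: Fraction(0) for page in OUTLINKS}
    for page, targets in OUTLINKS.items():
        share = Fraction(scores[page]) / len(targets)
        for target in targets:
            incoming[target] += share
    return incoming

# A vector solves the flow equations exactly when flow(v) == v.
v = {"y": Fraction(2), "a": Fraction(2), "m": Fraction(1)}
print(flow(v) == v)  # True
```

The vector (2, 2, 1) passes, and so does any positive multiple of it, so this check alone cannot pin down a single answer.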
Solving the flow equations
3 equations, 3 unknowns, no constants
No unique solution
All solutions equivalent modulo scale factor
Additional constraint forces uniqueness
y+a+m = 1
Flow equations give y = a and m = a/2, so 5a/2 = 1
y = 2/5, a = 2/5, m = 1/5
Gaussian elimination method works for small examples, but we need a better method for large graphs
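For a small graph like this one, the elimination step can be sketched in Python with exact `Fraction` arithmetic. The sketch assumes every page has at least one outlink and that the solution is unique up to scale (true for this three-page example). It writes the flow equations as (I - M)x = 0, where M sends 1/n of a page's score along each of its n outlinks, replaces one of those equations (redundant under these assumptions) with y + a + m = 1, and applies the Gauss-Jordan variant of Gaussian elimination. The names `OUTLINKS`, `flow_system` and `solve` are hypothetical.

```python
from fractions import Fraction

# Outlinks read off the flow equations (y = Yahoo, a = Amazon, m = M'soft).
OUTLINKS = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def flow_system(outlinks, order):
    """Build the flow equations as (I - M) x = 0, then swap the last
    equation for the constraint that all scores sum to 1.

    Assumes every page in `order` has at least one outlink.
    """
    index = {page: i for i, page in enumerate(order)}
    n = len(order)
    # Start from the identity matrix I.
    A = [[Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    for page, targets in outlinks.items():
        for target in targets:
            # `page` sends 1/len(targets) of its score to `target`.
            A[index[target]][index[page]] -= Fraction(1, len(targets))
    A[-1] = [Fraction(1)] * n  # sum of all scores = 1
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return A, b

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions.

    Assumes A is square and nonsingular (next() raises StopIteration otherwise).
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    rows = [[Fraction(v) for v in A[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    # Only diagonal entries remain, so each unknown is one division away.
    return [rows[i][n] / rows[i][i] for i in range(n)]

print(solve(*flow_system(OUTLINKS, ["y", "a", "m"])))
# [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
```

Dense elimination like this takes O(n³) arithmetic operations and O(n²) memory for n pages, which is why it does not scale to a web-sized graph.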

