SpamHits

CS345 Data Mining
Link Analysis 3: Hubs and Authorities, Spam Detection
Anand Rajaraman, Jeffrey D. Ullman
Problem formulation (1998)
- Suppose we are given a collection of documents on some broad topic
  - e.g., stanford, evolution, iraq
  - perhaps obtained through a text search
- Can we organize these documents in some manner?
  - PageRank offers one solution
  - HITS (Hypertext-Induced Topic Selection) is another, proposed at approximately the same time
HITS Model
Interesting documents fall into two classes:
1. Authorities are pages containing useful information
   - course home pages
   - home pages of auto manufacturers
2. Hubs are pages that link to authorities
   - course bulletin
   - list of US auto manufacturers
Idealized view
[Figure: hubs linking to authorities]
Mutually recursive definition
- A good hub links to many good authorities
- A good authority is linked from many good hubs
- Model using two scores for each node
  - Hub score and Authority score
  - Represented as vectors h and a
Transition Matrix A
- HITS uses a matrix A[i, j] = 1 if page i links to page j, 0 if not
- A^T, the transpose of A, is similar to the PageRank matrix M, but A^T has 1's where M has fractions
Example
[Figure: link graph over Yahoo, Amazon, and M'soft]

        y  a  m
    y   1  1  1
A = a   1  0  1
    m   0  1  0
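As a minimal sketch, the example matrix can be rebuilt from an explicit link list. The `links` dictionary and the page order (y, a, m) are assumptions read off the matrix entries, and `pagerank_matrix` assumes the usual PageRank convention in which each page splits its vote equally among its out-links; neither is spelled out in these slides.

```python
# Pages in the order used by the example: Yahoo, Amazon, M'soft.
pages = ["y", "a", "m"]

# Hypothetical link list read off the example matrix: source -> targets.
links = {"y": ["y", "a", "m"], "a": ["y", "m"], "m": ["a"]}


def adjacency(pages, links):
    """Return A with A[i][j] = 1 if page i links to page j, else 0."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    A = [[0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            A[index[src]][index[dst]] = 1
    return A


def transpose(X):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


def pagerank_matrix(A):
    """Assumed convention: M[j][i] = 1/outdeg(i) if page i links to page j."""
    n = len(A)
    out = [sum(row) for row in A]
    return [[A[i][j] / out[i] if out[i] else 0.0 for i in range(n)]
            for j in range(n)]


A = adjacency(pages, links)
AT = transpose(A)
M = pagerank_matrix(A)

for name, row_t, row_m in zip(pages, AT, M):
    print(name, row_t, [round(x, 3) for x in row_m])
```

Under these assumptions, `AT` and `M` have nonzero entries in exactly the same positions; `M` simply divides each column of `AT` by the out-degree of the corresponding page.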
Hub and Authority Equations
- The hub score of page P is proportional to the sum of the authority scores of the pages it links to
  - h = λ A a
  - Constant λ is a scale factor
- The authority score of page P is proportional to the sum of the hub scores of the pages it is linked from
  - a = μ A^T h
  - Constant μ is a scale factor
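Written out per page, the two matrix equations read as follows. This is only a component-wise restatement; the indices i and j are introduced here for illustration, with A_{ij} = 1 when page i links to page j.

```latex
h_i = \lambda \sum_{j} A_{ij}\, a_j
\qquad\text{and}\qquad
a_j = \mu \sum_{i} A_{ij}\, h_i = \mu \sum_{i} (A^{T})_{ji}\, h_i
```

The second sum runs over the pages i that link to page j, which is why the transpose of A appears in the authority equation.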
Iterative algorithm
- Initialize h, a to all 1's
- h = A a
- Scale h so that its max entry is 1.0
- a = A^T h
- Scale a so that its max entry is 1.0
- Continue until h, a converge
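The steps above can be sketched in plain Python. The function name `hits`, the tolerance `tol`, and the cap `max_iter` are assumptions made for this illustration; the slides only say to continue until h and a converge.

```python
def matvec(X, v):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def scale_max(v):
    """Scale v so that its largest entry is 1.0 (scores are non-negative)."""
    m = max(v)
    return [x / m for x in v] if m else list(v)


def hits(A, tol=1e-9, max_iter=1000):
    """Alternate h = A a and a = A^T h, rescaling after each step."""
    n = len(A)
    AT = [list(col) for col in zip(*A)]
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(max_iter):
        new_h = scale_max(matvec(A, a))
        new_a = scale_max(matvec(AT, new_h))
        # Largest change in any hub or authority score this round.
        change = max(abs(x - y) for x, y in zip(new_h + new_a, h + a))
        h, a = new_h, new_a
        if change < tol:  # assumed stopping rule
            break
    return h, a


# Yahoo / Amazon / M'soft example, pages in the order y, a, m.
A = [[1, 1, 1], [1, 0, 1], [0, 1, 0]]
h, a = hits(A)
print([round(x, 3) for x in h])  # [1.0, 0.732, 0.268]
print([round(x, 3) for x in a])  # [1.0, 0.732, 1.0]
```

Scaling by the maximum entry keeps the scores bounded without changing their relative order, so the scale factors λ and μ never need to be known explicitly.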
Example

    1 1 1          1 1 0
A = 1 0 1    A^T = 1 0 1
    0 1 0          1 1 0

a(yahoo)  = 1   1     1     1     . . .   1
a(amazon) = 1   1     4/5   0.75  . . .   0.732
a(m'soft) = 1   1     1     1     . . .   1

h(yahoo)  = 1   1     1     1     . . .   1.000
h(amazon) = 1   2/3   0.71  0.73  . . .   0.732
h(m'soft) = 1   1/3   0.29  0.27  . . .   0.268
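Assuming the update order of the iterative algorithm (h first, then a, both starting from all 1's), the first round can be worked by hand. It reproduces the 2/3, 1/3, and 4/5 entries of the example.

```latex
h = A a =
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}
\;\xrightarrow{\text{scale}}\;
\begin{pmatrix} 1 \\ 2/3 \\ 1/3 \end{pmatrix}

a = A^{T} h =
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 2/3 \\ 1/3 \end{pmatrix}
= \begin{pmatrix} 5/3 \\ 4/3 \\ 5/3 \end{pmatrix}
\;\xrightarrow{\text{scale}}\;
\begin{pmatrix} 1 \\ 4/5 \\ 1 \end{pmatrix}
```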
Existence and Uniqueness
h = λ A a
a = μ A^T h
h = λ μ A A^T h
a = λ μ A^T A a
Under reasonable assumptions about A, the dual iterative algorithm converges to vectors h* and a* such that:
- h* is the principal eigenvector of the matrix A A^T
- a* is the principal eigenvector of the matrix A^T A
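The eigenvector claim can be checked numerically on the Yahoo / Amazon / M'soft example. The sketch below uses plain power iteration on A A^T and A^T A, scaled so the largest entry is 1; the helper names and the fixed iteration count are assumptions for illustration, and the "reasonable assumptions about A" are not tested here.

```python
A = [[1, 1, 1], [1, 0, 1], [0, 1, 0]]
AT = [list(col) for col in zip(*A)]


def matmul(X, Y):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def matvec(X, v):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def principal_eigenvector(S, iters=200):
    """Power iteration from all 1's, rescaled to max entry 1.0 each step."""
    v = [1.0] * len(S)
    for _ in range(iters):
        w = matvec(S, v)
        m = max(w)
        v = [x / m for x in w]
    return v


h_star = principal_eigenvector(matmul(A, AT))
a_star = principal_eigenvector(matmul(AT, A))

# For an eigenvector scaled to max entry 1, S v = c v with c = max(S v).
Sh = matvec(matmul(A, AT), h_star)
c = max(Sh)
residual = max(abs(x - c * y) for x, y in zip(Sh, h_star))

print([round(x, 3) for x in h_star])  # [1.0, 0.732, 0.268]
print([round(x, 3) for x in a_star])  # [1.0, 0.732, 1.0]
```

Here `c` approximates the eigenvalue of A A^T that corresponds to h*, and `residual` should be close to zero if h* really is an eigenvector.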
Bipartite cores
[Figure: hubs and authorities forming a most densely connected core (primary core) and a less densely connected core (secondary core)]
Secondary cores
- A single topic can have many bipartite cores