Spider traps

A group of pages is a spider trap if there are no links from within the group to the outside of the group. The random surfer gets trapped, and eventually spider traps absorb all importance.

2/7/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

Power iteration: set r_i = 1, update r_i = \sum_j M_{ij} \cdot r_j, and iterate.

Example, with rows and columns ordered Y!, A, MS (MS links only to itself):

\[
M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix}
\]

\[
\begin{pmatrix} y \\ a \\ m \end{pmatrix} =
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 1 \\ 1/2 \\ 3/2 \end{pmatrix},\;
\begin{pmatrix} 3/4 \\ 1/2 \\ 7/4 \end{pmatrix},\;
\begin{pmatrix} 5/8 \\ 3/8 \\ 2 \end{pmatrix},\;
\ldots,\;
\begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}
\]

The Google solution for spider traps

At each time step, the random surfer has two options:
With probability β, follow a link at random.
With probability 1-β, jump to some page uniformly at random.
Common values for β are in the range 0.8 to 0.9.
The surfer will teleport out of a spider trap within a few time steps.

[Figure: the Yahoo / Amazon / M'soft graph with β = 0.8. Each of Yahoo's two out-links carries weight 0.8 * 1/2, and each teleport edge out of Yahoo carries weight 0.2 * 1/3.]

For the Yahoo column alone:

\[
0.8 \begin{pmatrix} 1/2 \\ 1/2 \\ 0 \end{pmatrix} + 0.2 \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}
\]

For the whole matrix:

\[
0.8 \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix}
+ 0.2 \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}
= \begin{pmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{pmatrix}
\]

Power iteration with this matrix:

\[
\begin{pmatrix} y \\ a \\ m \end{pmatrix} =
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 1.00 \\ 0.60 \\ 1.40 \end{pmatrix},\;
\begin{pmatrix} 0.84 \\ 0.60 \\ 1.56 \end{pmatrix},\;
\begin{pmatrix} 0.776 \\ 0.536 \\ 1.688 \end{pmatrix},\;
\ldots,\;
\begin{pmatrix} 7/11 \\ 5/11 \\ 21/11 \end{pmatrix}
\]

Dead ends

Some pages are "dead ends" (have no out-links). Such pages cause importance to leak out.
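
The spider-trap example and its teleport fix above can be reproduced with a short power-iteration sketch. The Python below is a minimal illustration, not the course's code: the function names `step`, `with_teleport`, and `iterate` are hypothetical, β = 0.8 is taken from the worked example, and exact fractions are used only so the results can be compared directly with the numbers on the slides.

```python
from fractions import Fraction as F

# Column-stochastic link matrix from the example; rows and columns are
# ordered (Yahoo, Amazon, M'soft). M[i][j] is the probability of moving
# from page j to page i. M'soft links only to itself: a spider trap.
M = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 2), F(0),    F(0)],
    [F(0),    F(1, 2), F(1)],
]


def step(A, r):
    """One power-iteration step: r_i <- sum_j A[i][j] * r[j]."""
    return [sum(a_ij * r_j for a_ij, r_j in zip(row, r)) for row in A]


def with_teleport(M, beta):
    """Return beta * M + (1 - beta) * (1/n) * (all-ones matrix)."""
    n = len(M)
    jump = (1 - beta) / n  # probability of teleporting to one given page
    return [[beta * m_ij + jump for m_ij in row] for row in M]


def iterate(A, r, steps):
    """Apply `steps` power-iteration steps starting from r."""
    for _ in range(steps):
        r = step(A, r)
    return r


r0 = [F(1), F(1), F(1)]          # start with r_i = 1, as on the slide
A = with_teleport(M, F(4, 5))    # beta = 0.8 (value assumed from the example)

trap_3 = iterate(M, r0, 3)       # no teleport: importance drifts to M'soft
tele_3 = iterate(A, r0, 3)       # with teleport: third iterate
tele_limit = iterate(A, r0, 100) # close to the stationary vector

print("no teleport, 3 steps:", [str(x) for x in trap_3])
print("teleport, 3 steps:   ", [float(x) for x in tele_3])
print("teleport, 100 steps: ", [round(float(x), 4) for x in tele_limit])
```

Without teleporting, the third iterate is (5/8, 3/8, 2) and M'soft keeps growing toward 3. With teleporting, the iterates settle near (7/11, 5/11, 21/11), so the trap still gets the largest share but no longer absorbs everything.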
