universities could serve as the trusted set. This technique avoids sharing the tax in the PageRank
calculation with the large numbers of supporting pages in spam farms and thus preferentially reduces
Spam Mass: To identify spam farms, we
4. Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen, Combating link spam with trustrank, Proc. 30th Intl.
Conf. on Very Large Databases, pp. 576–587, 2004.
5. T.H. Haveliwala, Efficient computation of PageRank, Stanford Univ. Dept. of Computer Science
HITS equations in the way they do for PageRank, so no taxation scheme is necessary.
5.7 References for Chapter 5
The PageRank algorithm was first expressed in . The experiments on the structure of the Web, which
we used to ju
can obtain through a MapReduce formulation. Finally, we discuss briefly how to find frequent itemsets in
a data stream.
202 CHAPTER 6. FREQUENT ITEMSETS
6.1 The Market-Basket Model
The market-basket model of data is used to describe a common form of manym
! Exercise 6.1.8: Prove that in the data of Exercise 6.1.4 there are no interesting association rules; i.e., the
interest of every association rule is 0.
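As a reminder of what is being measured, the interest of an association rule I → j is the confidence of the rule minus the fraction of all baskets that contain j. The following is a minimal sketch of that computation, not a solution to the exercise. The function names and the toy baskets are hypothetical and chosen only for illustration; they are not the data of Exercise 6.1.4.

```python
from fractions import Fraction


def support(baskets, itemset):
    """Count the baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= set(basket))


def confidence(baskets, lhs, j):
    """Fraction of the baskets containing `lhs` that also contain `j`."""
    lhs_support = support(baskets, lhs)
    if lhs_support == 0:
        raise ValueError("confidence is undefined when no basket contains lhs")
    return Fraction(support(baskets, set(lhs) | {j}), lhs_support)


def interest(baskets, lhs, j):
    """Confidence of lhs -> j minus the fraction of baskets containing j."""
    return confidence(baskets, lhs, j) - Fraction(support(baskets, {j}), len(baskets))


# Hypothetical toy baskets (an assumption for illustration only).
toy_baskets = [
    {"milk", "bread"},
    {"milk", "bread", "beer"},
    {"milk"},
    {"beer"},
]

# Hypothetical baskets in which "a" and "b" occur independently:
# every combination of the two items appears exactly once.
uncorrelated_baskets = [set(), {"a"}, {"b"}, {"a", "b"}]

print(interest(toy_baskets, {"milk"}, "bread"))      # 2/3 - 1/2 = 1/6
print(interest(uncorrelated_baskets, {"a"}, "b"))    # 1/2 - 1/2 = 0
```

Exact fractions are used instead of floating-point numbers so that an interest of exactly 0 can be tested by equality. In the second toy data set the two items are uncorrelated, so the rule has interest 0, which is the kind of behavior the exercise asks you to prove for its own data.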
6.2 Market Baskets and the A-Priori Algorithm
! Exercise 6.1.4: This question involves data from which nothing interesting can be learned about
frequent itemsets, because there are no sets of items that are correlated. Suppose the items are
numbered 1 to 10, and each basket is constructed by includin
Exercise 6.1.1: Suppose there are 100 items, numbered 1 to 100, and also 100 baskets, also numbered 1
to 100. Item i is in basket b if and only if i divides b with no remainder. Thus, item 1 is in all the baskets,
item 2 is in all fifty of the even-numbered