cs345-streams3-2

# cs345-streams3-2 - Still More Stream-Mining Frequent...

## Still More Stream-Mining

Topics: frequent itemsets; elephants and troops; exponentially decaying windows.

## Counting Items

- Problem: given a stream of baskets, which items appear more than *s* times in the window?
- Possible solution: think of the stream of baskets as one binary stream per item.
  - 1 = item present in the basket; 0 = not present.
  - Use the DGIM method (Datar, Gionis, Indyk, Motwani) to estimate the count of 1's in each item's stream.
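A minimal Python sketch of the per-item idea. It assumes a fixed universe of items known in advance and the usual DGIM bucket rules: bucket sizes are powers of 2, at most two buckets of any size are kept, and the estimate counts only (integer) half of the oldest bucket. `DGIM` and `item_counts` are hypothetical names chosen for illustration.

```python
class DGIM:
    """Sketch of DGIM: approximate count of 1s among the last `window` bits."""

    def __init__(self, window):
        self.window = window
        self.time = 0
        # Buckets as (timestamp of the bucket's most recent 1, size), newest first.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # At most one bucket can leave the window per step: the oldest one.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever three buckets share a size, merge the two oldest of them.
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) < 3:
                break
            newer, oldest = same[-2], same[-1]
            # The merged bucket keeps the more recent timestamp.
            self.buckets[newer] = (self.buckets[newer][0], 2 * size)
            del self.buckets[oldest]
            size *= 2

    def count(self):
        """Sum of all bucket sizes, counting only integer half of the oldest."""
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        return total - self.buckets[-1][1] // 2


def item_counts(baskets, items, window):
    """Treat each item as its own binary stream with its own DGIM sketch."""
    sketches = {item: DGIM(window) for item in items}
    for basket in baskets:
        for item, sketch in sketches.items():
            sketch.add(1 if item in basket else 0)  # 1 = item present
    return {item: sketch.count() for item, sketch in sketches.items()}
```

With these rules each estimate is within 50% of the true count of 1's in the window. The price is one sketch per item, which is manageable for single items but not for itemsets.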
## Extensions

- In principle, you could count frequent pairs, or even larger itemsets, the same way.
  - One stream per itemset.
- Drawbacks:
  1. The counts are only approximate.
  2. The number of itemsets is way too big.
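To put a rough number on "way too big": with $n$ distinct items, one stream per pair means $\binom{n}{2}$ streams, and one stream per $k$-itemset means $\binom{n}{k}$. Assuming, purely for illustration, $n = 10^5$ items:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{10^5\,(10^5 - 1)}{2} = 4{,}999{,}950{,}000 \approx 5 \times 10^9
```

That is billions of DGIM sketches for pairs alone, before considering triples or larger sets.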

## Approaches

1. "Elephants and troops": a heuristic way to converge on unusually strongly connected itemsets.
2. Exponentially decaying windows: a heuristic for selecting likely frequent itemsets.
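One common formulation of an exponentially decaying window, sketched here for single items rather than itemsets. Each item's score is the sum of $(1 - c)^{\text{age}}$ over its past occurrences, for a small constant $c$. Scores are updated incrementally, and any score that decays below a threshold is dropped. The function name `decaying_scores`, the choice of `c`, and the threshold of 1/2 are assumptions for illustration; the itemset version that the slide alludes to is not shown.

```python
def decaying_scores(stream, c, threshold=0.5):
    """Score of x = sum over past arrivals of x of (1 - c) ** age."""
    scores = {}
    for item in stream:
        # 1. Age every existing score by the factor (1 - c).
        for key in scores:
            scores[key] *= 1.0 - c
        # 2. The arriving item contributes weight 1 at age 0.
        scores[item] = scores.get(item, 0.0) + 1.0
        # 3. Forget items whose score has decayed below the threshold.
        scores = {k: v for k, v in scores.items() if v >= threshold}
    return scores
```

For example, `decaying_scores(["a", "b", "a"], c=0.5)` returns `{"a": 1.25, "b": 0.5}`: "a" gets $0.5^2 + 1$ and "b" gets $0.5^1$. In practice $c$ would be much smaller, so that the window effectively spans about $1/c$ recent elements.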
## Elephants and Troops

- When Sergey Brin wasn't worrying about Google, he tried the following experiment.
- Goal: find unusually correlated sets of words.
  - "High correlation" means the frequency of occurrence of the set is much greater than (>>) the product of the frequencies of its members.
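A small Python sketch of this correlation measure. It assumes that each document is a set of words and that "frequency" means the fraction of documents containing the word (or every word of the set). `correlation_ratio` is a hypothetical helper. A ratio near 1 is what independent words would give; a ratio far above 1 is what the slide calls high correlation.

```python
from math import prod


def correlation_ratio(docs, word_set):
    """Frequency of the whole set divided by the product of member frequencies."""
    n = len(docs)
    words = frozenset(word_set)

    def freq(ws):
        # Fraction of documents that contain every word in ws.
        return sum(1 for d in docs if ws <= d) / n

    independent = prod(freq(frozenset([w])) for w in words)
    if independent == 0:
        return 0.0  # some member never occurs, so the set never occurs either
    return freq(words) / independent
```

For example, in four documents `{"a","b"}, {"a","b"}, {"c"}, {"d"}`, the pair {a, b} has frequency 0.5, while its members have frequency 0.5 each. The ratio is therefore 0.5 / 0.25 = 2.0.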

## Experimental Setup

- The data was an early Google crawl of the Stanford Web.
