cs345-streams3-2

cs345-streams3-2 - Still More StreamMining Frequent...

1 Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows

2 Counting Items
Problem: given a stream, which items appear more than s times in the window?
Possible solution: think of the stream of baskets as one binary stream per item.
1 = item present; 0 = not present.
Use DGIM to estimate counts of 1’s for all items.
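The per-item idea above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the names `DGIM`, `add_one`, `estimate`, and `frequent_items` are hypothetical, and the sketch assumes each basket is an iterable of hashable items and that time advances by one per basket. Only the 1's are fed to each counter; the 0's are implied by the shared timestamp.

```python
class DGIM:
    """Approximate count of 1's among the last `window` bits of one binary stream."""

    def __init__(self, window):
        self.window = window
        self.buckets = []  # newest first; each bucket is (time of its latest 1, size)

    def _expire(self, now):
        # Drop buckets whose most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= now - self.window:
            self.buckets.pop()

    def add_one(self, now):
        """Record a 1 arriving at time `now` (0's need no call)."""
        self._expire(now)
        self.buckets.insert(0, (now, 1))
        i = 0
        # Whenever three buckets share a size, merge the two oldest of them.
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            end_time = self.buckets[i + 1][0]   # keep the newer end time
            size = 2 * self.buckets[i + 1][1]
            self.buckets[i + 1:i + 3] = [(end_time, size)]
            i += 1

    def estimate(self, now):
        """All buckets in full, except that only half of the oldest is counted."""
        self._expire(now)
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2


def frequent_items(baskets, window, s):
    """Items whose estimated count in the last `window` baskets exceeds s."""
    counters = {}
    now = 0
    for now, basket in enumerate(baskets, start=1):
        for item in set(basket):
            if item not in counters:
                counters[item] = DGIM(window)
            counters[item].add_one(now)
    return {item for item, c in counters.items() if c.estimate(now) > s}
```

Because the count for each item is only an estimate (off by at most half of the true count in this sketch), an item whose true count is near s can land on either side of the threshold.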
3 Extensions
In principle, you could count frequent pairs or even larger sets the same way.
One stream per itemset.
Drawbacks:
1. Only approximate.
2. Number of itemsets is way too big.

4 Approaches
1. “Elephants and troops”: a heuristic way to converge on unusually strongly connected itemsets.
2. Exponentially decaying windows: a heuristic for selecting likely frequent itemsets.
5 Elephants and Troops
When Sergey Brin wasn’t worrying about Google, he tried the following experiment.
Goal: find unusually correlated sets of words.
“High correlation” = frequency of occurrence of the set >> product of the frequencies of its members.
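To make the “high correlation” test concrete, the ratio of the two sides can be computed directly. The sketch below is only an illustration: the function name `correlation_ratio` is hypothetical, documents are assumed to be collections of words, and the slide does not say how large the ratio must be to count as “>>”, so any threshold is left to the caller.

```python
from math import prod


def correlation_ratio(documents, words):
    """Observed frequency of the word set divided by the product of member frequencies."""
    words = set(words)
    docs = [set(d) for d in documents]
    n = len(docs)
    # Fraction of documents that contain every word in the set.
    joint = sum(words <= d for d in docs) / n
    # Frequency the set would have if its words occurred independently.
    expected = prod(sum(w in d for d in docs) / n for w in words)
    return joint / expected if expected else 0.0
```

A ratio near 1 means the words co-occur about as often as independence would predict; a ratio far above 1 is what the slide calls high correlation.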

6 Experimental Setup
The data was an early Google crawl of the Stanford Web.
Each night, the data would be streamed