cs345-streams3-2

cs345-streams3-2 - 1 Still More Stream-Mining Frequent...

This preview shows pages 1–7.


1 Still More Stream-Mining
  Frequent Itemsets
  Elephants and Troops
  Exponentially Decaying Windows

2 Counting Items
  - Problem: given a stream, which items appear more than s times in the window?
  - Possible solution: think of the stream of baskets as one binary stream per item.
    - 1 = item present; 0 = not present.
    - Use DGIM to estimate counts of 1s for all items.

3 Extensions
  - In principle, you could count frequent pairs or even larger sets the same way.
    - One stream per itemset.
  - Drawbacks:
    1. Only approximate.
    2. Number of itemsets is way too big.

4 Approaches
  1. Elephants and troops: a heuristic way to converge on unusually strongly connected itemsets.
  2. Exponentially decaying windows: a heuristic for selecting likely frequent itemsets.

5 Elephants and Troops
  - When Sergey Brin wasn't worrying about Google, he tried the following experiment.
  - Goal: find unusually correlated sets of words.
    - High correlation = frequency of occurrence of set >> product of frequencies of members.

6 Experimental Setup
  - The data was an early Google crawl of the Stanford Web. ...
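
The "high correlation" criterion on slide 5 compares how often a set occurs with the product of its members' frequencies. The following is a minimal sketch of that comparison over a small batch of baskets. It is not the streaming procedure used in the experiment. The function name `correlation_ratio`, the toy baskets, and the choice to express ">>" as a ratio are all assumptions made for illustration; the slides do not state a threshold.

```python
def correlation_ratio(baskets, itemset):
    """Observed frequency of `itemset` divided by the product of its
    members' individual frequencies (hypothetical helper, not from the slides).

    A ratio near 1 is what independent items would give; a ratio far
    above 1 is one way to read "frequency of set >> product".
    """
    itemset = frozenset(itemset)
    n = len(baskets)
    if n == 0 or not itemset:
        raise ValueError("need at least one basket and a non-empty itemset")

    # Fraction of baskets that contain every member of the itemset.
    set_freq = sum(1 for b in baskets if itemset <= b) / n

    # Product of the fractions of baskets containing each member alone.
    expected = 1.0
    for item in itemset:
        expected *= sum(1 for b in baskets if item in b) / n

    if expected == 0:
        return 0.0  # some member never occurs, so the set never occurs either
    return set_freq / expected


# Toy data (assumed): two rare words that always co-occur,
# and two common words that co-occur about as often as chance predicts.
baskets = [
    {"elephant", "troop"},
    {"elephant", "troop"},
    {"the", "of"},
    {"the"},
    {"of"},
    {"the", "of"},
    {"the"},
    {"of"},
]

rare_pair = correlation_ratio(baskets, {"elephant", "troop"})
common_pair = correlation_ratio(baskets, {"the", "of"})
```

In this toy data the rare pair scores well above the common pair, even though both pairs appear in the same number of baskets, which is the distinction the slide's definition is meant to capture.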

This document was uploaded on 03/04/2012.

Page 1 / 17
