cs345-streams3

cs345-streams3 - 1 Still More Stream-Mining Frequent...

1 Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows

2 Counting Items
◆ Problem: given a stream, which items appear more than s times in the window?
◆ Possible solution: think of the stream of baskets as one binary stream per item.
  ◗ 1 = item present; 0 = not present.
  ◗ Use DGIM to estimate counts of 1’s for all items.

3 Extensions
◆ In principle, you could count frequent pairs or even larger sets the same way.
  ◗ One stream per itemset.
◆ Drawbacks:
  1. Only approximate.
  2. Number of itemsets is way too big.

4 Approaches
1. “Elephants and troops”: a heuristic way to converge on unusually strongly connected itemsets.
2. Exponentially decaying windows: a heuristic for selecting likely frequent itemsets.

5 Elephants and Troops
◆ When Sergey Brin wasn’t worrying about Google, he tried the following experiment.
◆ Goal: find unusually correlated sets of words.
  ◗ “High Correlation” = frequency of occurrence of set >> product of frequency of members.

6 Experimental Setup
◆ The data was an early Google crawl of the Stanford Web....
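As a rough illustration of slides 2 and 5, the sketch below turns a small stream of baskets into one binary stream per item, counts the 1's in the most recent window, and computes the ratio behind the "high correlation" test. Several things here are assumptions and not part of the slides: the function names (item_bit_streams, window_count, correlation_ratio) and the toy baskets are hypothetical, and the window count is exact, standing in for the approximate DGIM estimate. The slides give no numeric cutoff for ">>", so the sketch only returns the raw ratio.

```python
def item_bit_streams(baskets):
    """Turn a stream of baskets into one binary stream per item.

    Bit t of an item's stream is 1 if the item is in basket t, else 0.
    """
    items = sorted(set().union(*baskets)) if baskets else []
    return {item: [1 if item in basket else 0 for basket in baskets]
            for item in items}


def window_count(bits, window):
    """Exact count of 1's among the last `window` bits.

    Assumption: this exact sum is a stand-in for DGIM, which would
    only estimate the count while using far less memory.
    """
    recent = bits[-window:] if window > 0 else []
    return sum(recent)


def correlation_ratio(baskets, itemset):
    """Frequency of the itemset divided by the product of the
    frequencies of its members (a ratio well above 1 suggests
    high correlation; the cutoff is left to the caller)."""
    n = len(baskets)
    set_freq = sum(1 for b in baskets if itemset <= b) / n
    product = 1.0
    for item in itemset:
        product *= sum(1 for b in baskets if item in b) / n
    return set_freq / product if product > 0 else 0.0


# Hypothetical toy stream of six baskets.
baskets = [
    {"a", "b"},
    {"a", "b"},
    {"c"},
    {"c", "d"},
    {"a", "b", "d"},
    {"d"},
]

streams = item_bit_streams(baskets)
s, window = 2, 4  # threshold and window length, chosen for illustration

# Items appearing more than s times in the last `window` baskets.
frequent = [item for item, bits in streams.items()
            if window_count(bits, window) > s]

print(streams["a"])
print(frequent)
print(correlation_ratio(baskets, {"a", "b"}))
```

Keeping one stream per item is manageable, but the same dictionary keyed by itemsets would grow far too large, which is the second drawback listed on slide 3.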

This document was uploaded on 01/25/2012.

