…t; 0 = not present
Use DGIM to estimate counts of 1's for all items
3/2/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets 32

In principle, you could count frequent pairs or even larger sets the same way
One stream per itemset
Drawbacks:
  Only approximate
  Number of itemsets is way too big

Exponentially decaying windows: A heuristic
for selecting likely frequent itemsets
What are "currently" most popular movies?
Instead of computing the raw count in the last N elements,
compute a smooth aggregation over the whole stream
If the stream is $a_1, a_2, \ldots$ and we are taking the sum of the stream, take the answer at time $t$ to be:
  $\sum_{i=1}^{t} a_i e^{-c(t-i)}$  (or, $\sum_{i=1}^{t} a_i (1-c)^{t-i}$)
$c$ is a constant, presumably tiny, like $10^{-6}$ or $10^{-9}$

If each $a_i$ is an "item" we can compute the characteristic function of each possible item $x$ as an E.D.W.
That is: $\sum_{i=1}^{t} \delta_i e^{-c(t-i)}$, where $\delta_i = 1$ if $a_i = x$, and 0 otherwise
Call this sum the "weight" of item $x$
...
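
The decaying sum above does not have to be recomputed from scratch at every time step. The following is a minimal sketch, assuming the $(1-c)^{t-i}$ form of the weight; the function names, the toy stream, and the value of `c` are illustrative choices, not part of the slides. It compares the direct formula with a one-pass update that decays every weight by $(1-c)$ and then adds 1 for the arriving item.

```python
def decayed_weight_direct(stream, x, c):
    """Weight of item x at time t = len(stream), using the (1 - c)^(t - i) form."""
    t = len(stream)
    return sum((1 - c) ** (t - i)
               for i, a in enumerate(stream, start=1) if a == x)


def decayed_weights_incremental(stream, c):
    """Same weights in one pass: decay all weights, then add 1 for the new item."""
    weights = {}
    for a in stream:
        for item in weights:
            weights[item] *= (1 - c)           # every older occurrence fades
        weights[a] = weights.get(a, 0.0) + 1.0  # newest occurrence has weight 1
    return weights


if __name__ == "__main__":
    stream = ["A", "B", "A", "C", "A", "B"]  # hypothetical movie IDs
    c = 0.1  # far larger than 1e-6; chosen only so the decay is visible
    incremental = decayed_weights_incremental(stream, c)
    for x in sorted(incremental):
        print(x, incremental[x], decayed_weight_direct(stream, x, c))
```

The incremental version touches every stored weight on each arrival, which is only practical if the number of stored weights is kept small.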
1/c

Suppose we want to find those items of weight at least ½
Important property: Sum over all weights is $1/(1 - e^{-c})$, or very close to $1/[1 - (1 - c)] = 1/c$
Thus: At most 2/c items have weight at least ½.

Count (some) itemsets in an E.D.W.
When a basket B comes in:
1. Multiply all counts by (1 - c);
2. For uncounted items in B, create new count.
3. Add 1 to count of any item in B and to any counted itemset contained in B.
4. Drop counts < ½.
5. Initiate new counts (next slide).
* Informal proposal of Art Owen

Start a count for an itemset S ⊆ B if every proper subset of S had a count prior to arrival of basket B
Example: Start counting {i, j} iff both i and j were counted prior to seeing B
Example: Start counting {i, j, k} iff {i, j}, {i, k}, and {j, k} were all counted prior to seeing B

Counts for single items < (2/c) times the
average number of items in a basket
Counts for larger itemsets = ??. But we are conservative about starting counts of large sets.
If we counted every set we saw, one basket of 20 items would initiate 1M counts.
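
The five basket-processing steps and the rule for starting new counts can be sketched together as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the slides do not say what value a newly started itemset count receives, so it is assumed to be 1 here; the empty set is ignored when checking proper subsets; and the names `process_basket` and `proper_subsets`, the toy baskets, and `c = 0.1` are hypothetical.

```python
from itertools import combinations


def proper_subsets(s):
    """Yield every non-empty proper subset of the frozenset s."""
    items = list(s)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


def process_basket(counts, basket, c):
    """Update decayed itemset counts (dict: frozenset -> float) for one basket."""
    basket = frozenset(basket)
    counted_before = set(counts)  # itemsets counted prior to this basket

    # Step 1: multiply all counts by (1 - c).
    for s in counts:
        counts[s] *= (1 - c)
    # Step 2: create a count for each uncounted item in the basket.
    for item in basket:
        counts.setdefault(frozenset([item]), 0.0)
    # Step 3: add 1 to every counted item or itemset contained in the basket.
    for s in counts:
        if s <= basket:
            counts[s] += 1.0
    # Step 4: drop counts below 1/2.
    for s in [s for s, v in counts.items() if v < 0.5]:
        del counts[s]
    # Step 5: start a count for S if every proper subset was counted before.
    eligible = [i for i in basket if frozenset([i]) in counted_before]
    for k in range(2, len(eligible) + 1):
        seen_counted = False
        for combo in combinations(eligible, k):
            s = frozenset(combo)
            if s in counted_before:
                seen_counted = True
            elif all(p in counted_before for p in proper_subsets(s)):
                counts[s] = 1.0  # assumed starting value
        if not seen_counted:
            break  # no size-(k+1) set can have all its subsets counted
    return counts


if __name__ == "__main__":
    counts = {}
    for b in (["milk", "bread"], ["milk", "bread"], ["milk", "bread", "beer"]):
        process_basket(counts, b, c=0.1)
    for s, v in counts.items():
        print(sorted(s), v)
```

In this toy run the pair {milk, bread} starts being counted only at the second basket, once both items already have counts, and no pair containing beer is started at the third basket because beer was not counted before it arrived. Without such a rule, a single basket of 20 items would have 2^20 = 1,048,576 subsets, which is the "1M counts" figure above.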
This document was uploaded on 02/26/2014 for the course CS 246 at Stanford.