cs345-streams2-2

# cs345-streams2-2 - 1 More Stream-Mining Counting How Many...

## 1. More Stream-Mining

- Counting How Many Elements
- Computing Moments

## 2. Counting Distinct Elements

- Problem: a data stream consists of elements chosen from a set of size n. Maintain a count of the number of distinct elements seen so far.
- Obvious approach: maintain the set of elements seen.

## 3. Applications

- How many different words are found among the Web pages being crawled at a site?
  - Unusually low or high numbers could indicate artificial pages (spam?).
- How many different Web pages does each customer request in a week?

## 4. Using Small Storage

- Real Problem: what if we do not have space to store the complete set?
- Estimate the count in an unbiased way.
- Accept that the count may be in error, but limit the probability that the error is large.

## 5. Flajolet-Martin* Approach

- Pick a hash function h that maps each of the n elements to log₂ n bits, uniformly.
  - Important that the hash function be (almost) a random permutation of the elements.
- For each stream element a, let r(a) be the number of trailing 0s in h(a)....
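
As a rough illustration of the ideas in the slide text above, the following Python sketch contrasts the obvious approach (store every element seen) with the trailing-zeros quantity r(a) used by the Flajolet-Martin approach. Several pieces are assumptions and do not come from the slides. The hash `h` is built from SHA-256 and truncated to a fixed 32 bits for simplicity. The names `exact_distinct` and `fm_estimate` are hypothetical. The final estimate 2^R, where R is the largest r(a) seen, is the commonly cited Flajolet-Martin estimate, which the visible preview cuts off before stating.

```python
import hashlib


def h(element: str, bits: int = 32) -> int:
    """Assumed hash: the low `bits` bits of SHA-256 of the element."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def trailing_zeros(x: int, bits: int = 32) -> int:
    """r(a): number of trailing 0 bits in x (an all-zero value counts as `bits`)."""
    if x == 0:
        return bits
    # x & -x isolates the lowest set bit; its position is the trailing-zero count.
    return (x & -x).bit_length() - 1


def exact_distinct(stream) -> int:
    """Obvious approach: keep the full set of elements seen."""
    return len(set(stream))


def fm_estimate(stream, bits: int = 32) -> int:
    """Single-hash sketch: track R = max r(a) and return 2**R (assumed estimator)."""
    max_r = 0
    for a in stream:
        max_r = max(max_r, trailing_zeros(h(a, bits), bits))
    return 2 ** max_r


if __name__ == "__main__":
    words = ["spam", "eggs", "spam", "ham", "eggs", "spam"]
    print("exact:", exact_distinct(words))
    print("estimate:", fm_estimate(words))
```

Only the single integer `max_r` is kept per hash function, so repeated elements never change the result. With one hash function the estimate is always a power of two and can be far from the true count; practical variants combine many hash functions to reduce the error.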

## This document was uploaded on 03/04/2012.
