Lecture 37 Monday April 20 dataStream_Notes

# Lecture 37 Monday April 20 dataStream_Notes - Data Streams...


April 20, 2009

## Data Streams

Let's look at data streams. A data stream consists of elements a_1, a_2, …, a_n, where n is the length of the stream and each element a_i is drawn from the alphabet {1, 2, …, m}. The streams we want to think about are very long, consisting of maybe trillions of elements. We want to find the number d of distinct elements in the stream. Computing d exactly is too expensive for a very long stream: storing the count alone takes about log(n) bits, and keeping track of which elements have already appeared takes far more. So we settle for an approximation that uses less space but is provably within a certain range of the exact answer.

## First Approximation Algorithm

1. Hash the elements of the stream with a hash function h : {1, 2, …, m} → {1, 2, …, t}.
2. Test whether there exists an element a_i such that h(a_i) = 1. If so, assume we have seen approximately t or more distinct elements.

Now we would like to prove that with high probability this algorithm yields the correct answer. What is the probability that something gets mapped to 1?
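The two steps of the algorithm can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. The names `make_hash`, `saw_about_t_distinct`, `hit_probability`, and `empirical_hit_rate` are hypothetical, and the sketch assumes that h behaves like a uniformly random function from {1, …, m} to {1, …, t}, which the visible part of the notes does not state. Under that assumption, a single element is mapped to 1 with probability 1/t, so at least one of d distinct elements is mapped to 1 with probability 1 − (1 − 1/t)^d.

```python
import random


def make_hash(m, t, seed=0):
    """Hypothetical stand-in for h: {1..m} -> {1..t}.

    Lazily samples a uniformly random function. The lookup table grows
    with the number of distinct inputs, so this is for simulation only;
    a real streaming algorithm would use a compact hash family instead.
    """
    rng = random.Random(seed)
    table = {}

    def h(x):
        if not 1 <= x <= m:
            raise ValueError("element outside the alphabet {1, ..., m}")
        if x not in table:
            table[x] = rng.randint(1, t)  # uniform over 1..t inclusive
        return table[x]

    return h


def saw_about_t_distinct(stream, h):
    """Step 2: report whether some element of the stream hashes to 1.

    Only one bit of state is needed while scanning the stream.
    """
    return any(h(a) == 1 for a in stream)


def hit_probability(d, t):
    """Chance that at least one of d distinct elements maps to 1,
    assuming h is uniformly random (an assumption of this sketch)."""
    return 1.0 - (1.0 - 1.0 / t) ** d


def empirical_hit_rate(d, m, t, trials=2000):
    """Estimate the same probability by drawing a fresh hash per trial."""
    stream = list(range(1, d + 1)) * 2  # d distinct elements, with repeats
    hits = sum(
        saw_about_t_distinct(stream, make_hash(m, t, seed=trial))
        for trial in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    d, m, t = 8, 100, 8
    print("analytic :", hit_probability(d, t))
    print("empirical:", empirical_hit_rate(d, m, t))
```

Repeated elements do not change the outcome, because h always sends the same element to the same value. Only the number of distinct elements d affects the chance of seeing a 1, which is what makes the test informative about d.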


## This document was uploaded on 01/22/2010.
