Solutions to Problem Set 1
October 7, 2009

Exercise 1: (20 points) Some years ago, the Greek video club chain Seven ran the following offer for its customers: every time a customer rented a DVD, he was given a random coupon with the title of an Academy Award (Oscar) winning movie written on it. The first person to gather coupons with all the unique winning movie titles won a 10-day vacation on a Caribbean island. If at that time there were 75 unique such titles, and these titles were uniformly assigned to coupons, find the expected number of DVDs one had to rent in order to gather all of them. (No-credit question: when was the competition held?)

Solution: This is an instance of the Coupon Collector problem, whose solution can be found at http://en.wikipedia.org/wiki/Coupon_collector (the expected number of draws needed to collect all $n$ equally likely coupons is $nH_n = n\sum_{k=1}^{n} 1/k$). For $n = 75$ this gives $75\,H_{75} \approx 367.6$ rentals.

Exercise 2: (20 points) Assume an array of $n$ numbers $X = \{x_1, \ldots, x_n\}$. Write the pseudocode of an algorithm that computes the mean and the variance of the numbers by making a single pass over the data and using $O(1)$ storage space. Recall that, given $X$, the mean of $X$ is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

and the variance of $X$ is

$$\mathrm{var}(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Solution: You can rewrite the equation of the variance as follows:

$$
\begin{aligned}
\mathrm{var}(X) &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2
  = \frac{1}{n}\sum_{i=1}^{n}\left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar{x}\,\frac{1}{n}\sum_{i=1}^{n}x_i + \frac{1}{n}\sum_{i=1}^{n}\bar{x}^2
  = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)^2
  = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \frac{1}{n^2}\left(\sum_{i=1}^{n}x_i\right)^2.
\end{aligned}
$$

So a single-pass algorithm needs to go through all data points $1, \ldots, n$ and keep two arrays $S[i] := \sum_{j=1}^{i} x_j$ and $SS[i] := \sum_{j=1}^{i} x_j^2$. Once these two values are computed (in a single pass), computing the variance is trivial using the above expansion and the values $S[n]$ and $SS[n]$: $\mathrm{var}(X) = SS[n]/n - (S[n]/n)^2$. The mean is simply $\bar{x} = S[n]/n$. Note that you do not need to keep the array of the sums and sum of squares of the ...
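A minimal Python sketch of the single-pass procedure for Exercise 2, assuming the data arrive as any iterable of floats. Instead of the arrays S[i] and SS[i], it keeps only their latest values (plus a counter), so the extra storage is O(1). The function name `mean_and_variance` is hypothetical, and it returns the population variance (dividing by n), matching the definition in the exercise.

```python
from typing import Iterable, Tuple


def mean_and_variance(data: Iterable[float]) -> Tuple[float, float]:
    """Return (mean, population variance) of `data` in one pass.

    Hypothetical helper name, not from the original solution. Only the
    latest values of S[i] and SS[i] are kept, so extra storage is O(1).
    """
    n = 0       # number of values seen so far
    s = 0.0     # S: running sum of x_j
    ss = 0.0    # SS: running sum of x_j ** 2
    for x in data:  # each value is read exactly once
        n += 1
        s += x
        ss += x * x
    if n == 0:
        raise ValueError("mean and variance are undefined for empty input")
    mean = s / n                   # mean = S[n] / n
    variance = ss / n - mean ** 2  # var(X) = SS[n]/n - (S[n]/n)^2
    return mean, variance


# Works with a generator, so the data never has to be stored.
print(mean_and_variance(x for x in range(1, 6)))
```

Here `range(1, 6)` yields 1 to 5, whose mean is 3 and population variance is 2, so the call prints `(3.0, 2.0)`. One design caveat: the final subtraction can lose floating-point precision when the mean is large compared with the spread of the data; Welford's online update is a common numerically stable alternative with the same single-pass, O(1)-space profile.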
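For Exercise 1, the standard coupon-collector expectation $nH_n$ (with $H_n$ the $n$-th harmonic number) can be checked numerically. This is a sketch, not part of the original solution: the hypothetical helper `expected_rentals` evaluates $nH_n$ exactly with `fractions.Fraction`, and the hypothetical helper `simulate_rentals` rents DVDs at random until all titles have appeared, assuming each rental yields one coupon whose title is drawn uniformly and independently, as the exercise states.

```python
import random
from fractions import Fraction


def expected_rentals(n: int) -> Fraction:
    """Exact coupon-collector expectation n * H_n (hypothetical helper name)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def simulate_rentals(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation (hypothetical helper name).

    Assumes every rental yields one coupon whose title is drawn uniformly and
    independently from the n titles, as stated in the exercise.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0
    for _ in range(trials):
        seen = set()
        rentals = 0
        while len(seen) < n:  # keep renting until every title has appeared
            seen.add(rng.randrange(n))
            rentals += 1
        total += rentals
    return total / trials


if __name__ == "__main__":
    n = 75  # number of distinct titles in the exercise
    print(f"exact:     {float(expected_rentals(n)):.2f}")
    print(f"simulated: {simulate_rentals(n, 1000):.2f}")
```

The exact line prints 367.60; the simulated average should land near it, with the gap shrinking as `trials` grows.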
 Fall '09
