s340text-KStestOnly

# Exact magnitude if we were interested in further



[...] were interested in further improving our process, since a number closer to 0.10 may result in the need for additional sampling/experimentation.

## 3.2 Kolmogorov-Smirnov Test

If we want to compare a theoretical distribution $F$ to the empirical CDF of the data, $\hat{F}$, we use this test. If $F$ and $\hat{F}$ "appear" close, then we assume our data follow $F$. For this to work we need a working knowledge of several CDFs.

Step 1. We determine how far $F$ and $\hat{F}$ are from each other. We call this distance, $d$, the K-S distance.

Step 2. We ask if $d$ is "large". We compare $d$ to the distances we'd get by comparing $F$ to the empirical CDFs of data sets drawn from $F$ itself. If $d$ is relatively small, we would expect that our data did have distribution $F$. If our p-value is small, we reject $F$ as the distribution.

### 3.2.1 Building an Empirical CDF

First we assume that every observation has weight $\frac{1}{n}$, where $n$ is the number of observations. Then the empirical CDF is defined as:

$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$

This is the proportion of observations at values less than or equal to $x$. Take the example of a random set of data: $\{1, 1, 2, 5\}$. This has the following ECDF.

[If the picture doesn't work, it's file: ksexample1ecdf]

$\hat{F}$ can be built for discrete and continuous data. We assume for continuous data that $P(x_i = a, x_j = a) = 0$ for $i \ne j$.

Put simply, to create an empirical CDF:

1. Order the data.
2. Assume each point is equiprobable.
3. Make a step function $\frac{I_x}{n}$, where $I_x$ is the number of data points less than or equal to $x$.

### 3.2.2 The K-S Distance

At every step of our ECDF we calculate two distances:

$$\left| \lim_{a \nearrow x} \hat{F}(a) - F(x) \right| \quad \text{and} \quad \left| \lim_{a \searrow x} \hat{F}(a) - F(x) \right|$$

Since $\hat{F}$ is a right-continuous step function, the first expression measures from the bottom of the step at $x$ (the "down" distance) and the second from the top of the step (the "up" distance). The K-S distance is $d = \max_{i,j}\{d_{i,j}\}$, the largest of these distances over all observations $i$ and both directions $j \in \{U, D\}$.

Example. We have received the data set $\{1, 4, 9, 16\}$, and we suspect that it follows an exponential distribution.

Step 1. Build the empirical CDF.
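The three-step recipe from Section 3.2.1 can be sketched in Python as follows. This is a minimal illustration, not code from the course notes; the function name `ecdf` is hypothetical.

```python
from bisect import bisect_right

def ecdf(data):
    """Return a function x -> proportion of observations <= x."""
    ordered = sorted(data)   # 1. order the data
    n = len(ordered)         # 2. every point gets weight 1/n

    def f_hat(x):
        # 3. step function: (number of points <= x) / n
        return bisect_right(ordered, x) / n

    return f_hat

# ECDF of the example data set {1, 4, 9, 16}
f_hat = ecdf([1, 4, 9, 16])
print([f_hat(x) for x in (0, 1, 4, 9, 16)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Repeated values are handled by the same rule: for the earlier data set $\{1, 1, 2, 5\}$, `ecdf([1, 1, 2, 5])(1)` returns 0.5 because two of the four points are less than or equal to 1.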
[Figure 3.1: Theoretical vs. empirical CDFs (horizontal axis $X$, vertical axis $F(x)$)]

Since we believe that the data follow an exponential distribution, we can estimate its parameter:

$$\hat{\lambda} = \frac{1}{\bar{x}} = \frac{2}{15}$$

Now we can compare the theoretical and empirical CDF probabilities:

$$
\begin{aligned}
F(1) &= 1 - e^{-2/15}, & \hat{F}(1) &= \tfrac{1}{4} \\
F(4) &= 1 - e^{-8/15}, & \hat{F}(4) &= \tfrac{1}{2} \\
F(9) &= 1 - e^{-18/15}, & \hat{F}(9) &= \tfrac{3}{4} \\
F(16) &= 1 - e^{-32/15}, & \hat{F}(16) &= 1
\end{aligned}
$$

Let $d_{i,U}$ represent the "up" distance and $d_{i,D}$ the "down" distance for observation $i$. Then, for this set of data, we have the following:

| $i$ | $d_{i,U}$ | $d_{i,D}$ |
|-----|-----------|-----------|
| 1   | 0.1252    | 0.1248    |
| 2   | 0.0866    | 0.1634    |
| 3   | 0.0512    | 0.1988    |
| 4   | 0.1184    | 0.1316    |

Thus, our K-S distance is $d = 0.1988$.

Step 2. Create $n$ data sets derived from $F$. At this point, we create $n$ ECDFs from data drawn from $F$ and denote them $\hat{F}_i$ ($i = 1, \ldots, n$). Then, we calculate the K-S distance between $F$ and each $\hat{F}_i$.

[Figure 3.2: Theoretical vs. empirical CDFs with K-S distances $d_{1,U}, d_{1,D}, \ldots, d_{4,U}, d_{4,D}$ marked]

Finally, we consider the fr...
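The two steps of the worked example can be sketched in Python as below. This is an illustration under stated assumptions, not the notes' own code: the names `exp_cdf`, `ks_distance`, and `simulated_p_value` are hypothetical, and because the text is cut off before it says how the simulated distances are used, the sketch assumes the p-value is the fraction of simulated K-S distances at least as large as the observed one. It also simulates from the fitted rate as if it were known, which is a simplification.

```python
import math
import random

def exp_cdf(x, rate):
    """Exponential CDF: F(x) = 1 - exp(-rate * x) for x > 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_distance(data, cdf):
    """Largest 'up' or 'down' gap between the ECDF of data and cdf."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        up = abs((i + 1) / n - cdf(x))   # measured from the top of the step
        down = abs(i / n - cdf(x))       # measured from the bottom of the step
        d = max(d, up, down)
    return d

def simulated_p_value(data, rate, n_sims=2000, seed=0):
    """ASSUMPTION: p-value = share of simulated distances >= observed."""
    rng = random.Random(seed)
    cdf = lambda x: exp_cdf(x, rate)
    d_obs = ks_distance(data, cdf)
    hits = 0
    for _ in range(n_sims):
        # Draw a data set of the same size from the hypothesised F
        sample = [rng.expovariate(rate) for _ in data]
        if ks_distance(sample, cdf) >= d_obs:
            hits += 1
    return d_obs, hits / n_sims

data = [1, 4, 9, 16]
rate = len(data) / sum(data)             # 1 / sample mean = 2/15
d_obs, p_value = simulated_p_value(data, rate)
print(round(d_obs, 4))                   # 0.1988
print(p_value)
```

The largest gap occurs at the bottom of the step at $x = 9$, where $1 - e^{-18/15} - \tfrac{1}{2}$ evaluates to 0.1988 to four decimals. A large simulated p-value would mean distances this size are common when the data really come from $F$, so $F$ would not be rejected.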

## This note was uploaded on 09/27/2013 for the course STATS 340 taught by Professor Riley during the Winter '12 term at Waterloo.
