ere interested in further improving our process, since a number closer to 0.10 may result in the need for additional sampling/experimentation.

3.2 Kolmogorov-Smirnov Test

If we want to compare a theoretical distribution $F$ to the empirical CDF of the data, $\hat{F}$, then we use this test. If $F$ and $\hat{F}$ “appear” close, then we will assume our data follow $F$. For this to work we will need a working knowledge of several CDFs.
Step 1 We determine how far $F$ and $\hat{F}$ are from each other. We call this distance, $d$, the K-S Distance.
Step 2 We ask if $d$ is “large”. We compare $d$ to the distances we'd get by comparing $F$ to the empirical CDFs of data sets generated from $F$. If $d$ is relatively small, then we would expect that our data did have distribution $F$. If our p-value is small, we reject $F$ as the distribution.

3.2.1 Building an Empirical CDF

First we assume that every observation has weight $1/n$, where $n$ is the number of observations. Then the empirical CDF is defined as:
$$\hat{F}(x) = \hat{P}(X \le x)$$

This translates to the number of observations at values less than or equal to $x$, each counted with weight $1/n$. Take the example of a random set of data: $\{1, 1, 2, 5\}$. This has the following ECDF.

[Figure: ECDF of the example data; file: ksexample1ecdf]

$\hat{F}$ can be built for discrete and continuous data. We assume for the continuous data that $P(x_i = a, x_j = a) = 0$ for $i \ne j$.
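As a rough illustration of this definition, the following Python sketch builds the ECDF for the example data $\{1, 1, 2, 5\}$. The function name `make_ecdf` is hypothetical and chosen only for this example; it simply counts the observations less than or equal to $x$ and divides by $n$.

```python
from bisect import bisect_right


def make_ecdf(data):
    """Return a step function x -> (number of observations <= x) / n."""
    ordered = sorted(data)  # order the data once
    n = len(ordered)        # every observation carries weight 1/n

    def ecdf(x):
        # bisect_right returns how many ordered values are <= x
        return bisect_right(ordered, x) / n

    return ecdf


ecdf = make_ecdf([1, 1, 2, 5])
for x in (0, 1, 2, 3, 5):
    print(x, ecdf(x))
```

The repeated value 1 produces a single jump of height 2/4 at $x = 1$, which is why ties need no special handling in this sketch.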
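Put simply, to create an Empirical CDF:
1. Order the data.
2. Assume each point is equiprobable.
3. Make a step function $\hat{F}(x) = I_x / n$, where $I_x$ is the number of data points less than or equal to $x$.

3.2.2 The K-S Distance

At every step of our ECDF we will calculate two distances:

$$\left| \lim_{a \nearrow x} \hat{F}(a) - F(x) \right| \quad \text{and} \quad \left| \lim_{a \searrow x} \hat{F}(a) - F(x) \right|.$$

These are the “down” and “up” distances: the first measures from $F$ down to the bottom of the step, the second from $F$ up to the top of the step. The K-S distance $d$ is $\max\{d_{i,j}\}$, taken over every step $i$ and both directions $j$ (up and down).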
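For distinct observations sorted as $x_{(1)} < \dots < x_{(n)}$, the ECDF jumps from $(i-1)/n$ to $i/n$ at $x_{(i)}$, so the two one-sided distances at each step can be written in closed form. The order-statistic notation $x_{(i)}$ is an assumption introduced here for illustration, not notation taken from the notes.

```latex
\begin{aligned}
d_{i,D} &= \left| F\!\left(x_{(i)}\right) - \frac{i-1}{n} \right|
  && \text{(“down”: bottom of the step at } x_{(i)}\text{)} \\
d_{i,U} &= \left| \frac{i}{n} - F\!\left(x_{(i)}\right) \right|
  && \text{(“up”: top of the step at } x_{(i)}\text{)} \\
d &= \max_{1 \le i \le n} \max\left( d_{i,U},\, d_{i,D} \right)
\end{aligned}
```

When $F(x_{(i)})$ lies between the bottom and the top of the step, the two distances at that step add up to $1/n$, which gives a quick check on hand calculations.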
Example. We have received the data set $\{1, 4, 9, 16\}$, and we suspect that it follows an exponential distribution.

Step 1 Build the empirical cdf

[Figure 3.1: Theoretical vs. Empirical cdfs. Vertical axis: Empirical CDF; horizontal axis: X]
Since we believe that this follows an exponential distribution, we can estimate its parameter:

$$\hat{\lambda} = \frac{1}{\bar{x}} = \frac{2}{15}$$
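Now, we can look at the comparison between the theoretical and empirical cdfs:

$$
\begin{aligned}
F(1) &= 1 - e^{-2/15}, & \hat{F}(1) &= \tfrac{1}{4} \\
F(4) &= 1 - e^{-8/15}, & \hat{F}(4) &= \tfrac{1}{2} \\
F(9) &= 1 - e^{-18/15}, & \hat{F}(9) &= \tfrac{3}{4} \\
F(16) &= 1 - e^{-32/15}, & \hat{F}(16) &= 1
\end{aligned}
$$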
Let $d_{i,U}$ represent the “up” distance and $d_{i,D}$ the “down” distance for observation $i$. Then, for this set of data, we have the following:

$d_{1,U} = 0.1252$, $d_{2,U} = 0.0866$, $d_{3,U} = 0.0512$, $d_{4,U} = 0.1184$
$d_{1,D} = 0.1248$, $d_{2,D} = 0.1634$, $d_{3,D} = 0.1988$, $d_{4,D} = 0.1316$

Thus, our K-S Distance is $d = d_{3,D} = 0.1988$.
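The arithmetic in this example can be checked with a short Python sketch. It assumes the exponential CDF $F(x) = 1 - e^{-\lambda x}$ with the rate estimated as one over the sample mean; the names `exp_cdf`, `up`, `down` and `ks_distance` are hypothetical and used only for this illustration.

```python
import math

data = [1, 4, 9, 16]
n = len(data)

# Estimate the exponential rate as 1 / sample mean (mean is 7.5 here).
rate = 1 / (sum(data) / n)


def exp_cdf(x, rate):
    """Theoretical exponential CDF, F(x) = 1 - exp(-rate * x)."""
    return 1 - math.exp(-rate * x)


ordered = sorted(data)

# "Up" distance: top of the ECDF step, (i + 1) / n, versus F at that point.
up = [abs((i + 1) / n - exp_cdf(x, rate)) for i, x in enumerate(ordered)]

# "Down" distance: F at that point versus the bottom of the step, i / n.
down = [abs(exp_cdf(x, rate) - i / n) for i, x in enumerate(ordered)]

# The K-S distance is the largest of all up and down distances.
ks_distance = max(up + down)

print("up:  ", [round(v, 4) for v in up])
print("down:", [round(v, 4) for v in down])
print("d:   ", round(ks_distance, 4))
```

The sketch assumes distinct observations, as in this data set; with ties, the step heights would need to be taken from the ECDF itself rather than from the index.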
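Step 2 Create $n$ data sets derived from $F$

At this point, we will create $n$ data sets from $F$ (here $n$ counts simulated data sets, not observations), build the ECDF of each, and denote them $\hat{F}_i$ ($i = 1, \ldots, n$). Then, we calculate the K-S Distance between $F$ and each $\hat{F}_i$.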
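One way this simulation step could look in Python is sketched below, using only the standard library. Several things here are assumptions rather than the notes' own method: the helper names (`ks_distance`, `simulate_ks_distances`), the choice of 2000 simulated data sets, the fixed seed, and in particular the final line, which treats the p-value as the fraction of simulated distances at least as large as the observed one.

```python
import math
import random


def exp_cdf(x, rate):
    """Theoretical exponential CDF, F(x) = 1 - exp(-rate * x)."""
    return 1 - math.exp(-rate * x)


def ks_distance(sample, cdf):
    """Largest up or down distance between the sample's ECDF and cdf."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        fx = cdf(x)
        # top of the step is (i + 1) / n, bottom of the step is i / n
        d = max(d, abs((i + 1) / n - fx), abs(fx - i / n))
    return d


def simulate_ks_distances(rate, sample_size, n_sets, seed=0):
    """Draw n_sets samples from Exp(rate); return each K-S distance to F."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    def cdf(x):
        return exp_cdf(x, rate)

    return [
        ks_distance([rng.expovariate(rate) for _ in range(sample_size)], cdf)
        for _ in range(n_sets)
    ]


rate = 2 / 15
observed = ks_distance([1, 4, 9, 16], lambda x: exp_cdf(x, rate))
simulated = simulate_ks_distances(rate, sample_size=4, n_sets=2000)

# Assumed p-value: share of simulated distances at least as large as observed.
p_value = sum(d >= observed for d in simulated) / len(simulated)
print(round(observed, 4), p_value)
```

Each simulated data set has the same size as the observed one (four points) so that the simulated distances are comparable to the observed distance.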
[Figure 3.2: Theoretical vs. Empirical cdfs with K-S Distances. Vertical axis: Empirical CDF; horizontal axis: X; annotated distances include $d_{2,U}$, $d_{3,D}$, $d_{4,U}$ and $d_{4,D}$]

Finally, we consider the fr...
This note was uploaded on 09/27/2013 for the course STATS 340 taught by Professor Riley during the Winter '12 term at Waterloo.