s340text-KStestOnly

# 32 Theoretical vs. Empirical CDFs with KS Distances

…frequency of $D_i > d$. The more $i$'s that satisfy this, the more acceptable our $\hat{F}$ will be.

The following are examples of how to perform a KS test in R. The first method uses R's built-in `ks.test` function to calculate a p-value.

```r
# Set up the data of interest:
# a sorted random sample of 10 values from an exponential with rate 1
x <- sort(rexp(10, rate = 1))
plot(ecdf(x))                          # plot the empirical CDF
lines(x, pexp(x, rate = 1 / mean(x)))  # overlay the fitted exponential CDF (rate = MLE 1/mean(x))
# KS test against EXP(1/mean(x)). Because the rate is estimated from x,
# the p-value reported here is only approximate.
ks.test(x, "pexp", 1 / mean(x))
```

The second method calculates the p-value by simulation. As an exercise, write your own code for this KS test.

```r
#****************************************************
# KSEXP performs a KS test for an exponential fit and returns the p-value.
#   data: the observed sample
#   N:    number of datasets to simulate
#   d:    the KS distance calculated from the raw data
KSEXP <- function(data, N, d) {
  data <- sort(data)
  # How many data values there are
  n <- length(data)
  # The MLE for the rate of an exponential
  rate <- 1 / mean(data)
  # ks holds the N simulated KS distances
  ks <- numeric(N)
  # Create N datasets from EXP(rate); for each one, calculate a KS distance
  for (i in 1:N) {
    # rdata is random data that we know is exponential
    rdata <- sort(rexp(n, rate = rate))
    # The rate we would estimate from this exponential data
    rrate <- 1 / mean(rdata)
    # KS distance: largest gap between the fitted CDF and the empirical CDF,
    # checked just after (i/n) and just before ((i-1)/n) each jump
    F <- 1 - exp(-rrate * rdata)
    F_hat1 <- (1:n) / n
    F_hat2 <- (0:(n - 1)) / n
    ks[i] <- max(abs(F - F_hat1), abs(F - F_hat2))
  } # end for
  # Proportion of simulated KS distances larger than the one
  # calculated from the raw data
  Pvalue <- mean(ks > d)
  return(Pvalue)
} # end fn KSEXP
```

To calculate the p-value $P(D \ge d)$, we want the percentage of times $D_i$ is bigger than $d$, that is, the percentage of time that the generated data, which we know are exponential, fit the model worse than our observed data do:

$$
P(D \ge d) \approx \frac{\text{number of times } D_i \text{ is bigger than } d}{N} = \frac{1}{N}\sum_{i=1}^{N} I(D_i \ge d),
$$

where $N$ is the number of generated datasets and $I(\cdot)$ is the indicator function. If the p-value is small, then $d$ is too large to be explained by chance and we reject the hypothesis that the data are exponential. We use our usual 5% rule: reject when the p-value is below 0.05.
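The preview does not show how the observed distance `d` passed to `KSEXP` is computed. Below is a minimal sketch using a hypothetical helper, `ks_distance_exp` (not part of the original notes), that applies the same distance formula as `KSEXP` to the raw data, with the rate set to the MLE $1/\bar{x}$. As a sanity check it is compared with the statistic reported by `ks.test`; the seed 340 is arbitrary.

```r
# Hypothetical helper (not from the original notes): the observed KS distance d
# between the empirical CDF of `data` and an exponential CDF whose rate is the
# MLE 1/mean(data).
ks_distance_exp <- function(data) {
  data <- sort(data)
  n <- length(data)
  rate <- 1 / mean(data)                 # MLE of the exponential rate
  F <- pexp(data, rate = rate)           # fitted CDF at each order statistic
  F_hat1 <- (1:n) / n                    # empirical CDF just after each jump
  F_hat2 <- (0:(n - 1)) / n              # empirical CDF just before each jump
  max(abs(F - F_hat1), abs(F - F_hat2))  # largest vertical gap
}

set.seed(340)                            # arbitrary seed for reproducibility
x <- rexp(10, rate = 1)
d <- ks_distance_exp(x)
# The same distance as computed by ks.test(), for comparison
d_builtin <- unname(ks.test(x, "pexp", 1 / mean(x))$statistic)
print(c(d = d, ks.test = d_builtin))
```

The result can then be passed to the simulation function, for example `KSEXP(x, N = 1000, d = ks_distance_exp(x))`. Using one formula for both the observed and the simulated distances keeps the comparison like-for-like.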
Because the p-value is estimated from randomly generated datasets, the answer will vary every time the test is run. To reduce this variation, we generate a large number of datasets. Some variation will remain, but the more datasets we generate, the smaller it will be.
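One way to quantify how the variation shrinks, assuming the $N$ simulated distances are independent: the number of $D_i$ exceeding $d$ is then Binomial$(N, p)$, where $p$ is the true p-value, so the simulated p-value has standard error $\sqrt{p(1-p)/N}$. The sketch below (the function name `mc_se` is hypothetical) evaluates this at $p = 0.05$, the rejection cutoff, for a few values of $N$.

```r
# Monte Carlo standard error of a simulated p-value.
# Assumption: the N simulated KS distances are independent, so the count of
# D_i > d is Binomial(N, p) and the estimate has SE sqrt(p * (1 - p) / N).
mc_se <- function(p, N) sqrt(p * (1 - p) / N)

N_values <- c(100, 1000, 10000)
# SE when the true p-value sits exactly at the 5% cutoff
se <- mc_se(p = 0.05, N = N_values)
print(round(se, 4))
```

The standard errors come out to roughly 0.022, 0.0069, and 0.0022: each tenfold increase in $N$ shrinks the standard error by a factor of $\sqrt{10} \approx 3.16$. With $N = 1000$, a simulated p-value whose true value is 0.05 would typically land within about ±0.014 (two standard errors) of it.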

## This note was uploaded on 09/27/2013 for the course STATS 340 taught by Professor Riley during the Winter '12 term at Waterloo.
