Week08-Lecture (20101014) - Simulation Modeling and...

Simulation Modeling and Analysis (ORIE 4580/5580/5581)
Week 8: Wrap-up Parameter Estimation (10/14/10)

Announcements and Agenda

• For ALL students: there will be a prelim at 7:30pm on Tuesday, October 19, in G01 Uris Hall.
• A sample prelim is available on Blackboard (see the "Exam" folder in Course Documents). Please work through the problems in the sample prelim!
• Review session: Tuesday (10/19) during lecture; I'll go over the sample prelim questions.
• Office hours next week: Monday (10/18) from 1-6.
• The exam is open-book and open-notes (no laptop, iPad, iPhone, iTouch, etc.). Please bring a calculator!

Goodness-of-Fit Tests

• Motivation: suppose we have fitted a distribution to the data. How can we provide a more "quantitative" assessment of the goodness of fit?
• Histograms, bar plots, and q-q plots provide only "visual" confirmation; we need to perform a formal "statistical" test.
• All of these tests operate on the following principle: is it reasonable, statistically speaking, to assume that the data at hand come from the specified distribution?
    H0: The data come from the hypothesized distribution
    H1: The data do not come from the hypothesized distribution
• Two types of tests that we will cover:
    • Chi-square goodness-of-fit test
    • Kolmogorov-Smirnov goodness-of-fit test

Chi-Square Test

• Main idea: compare the histogram of the data with the frequencies that would be expected if the data came from the hypothesized distribution.
• Can be used for both discrete and continuous data.
• STEP 1: Divide the data into a collection of, say, k bins [b0, b1), [b1, b2), ..., [bk-1, bk).
• The interval [b0, bk) should cover the range of the hypothesized distribution. We can choose bk = ∞, and the bins do NOT have to be the same size.
• STEP 2: For each bin [bi-1, bi), compute

    Oi = observed frequency in bin i = # of data points in the interval [bi-1, bi)

    Ei = expected frequency in bin i:
        Ei = n ∫_{bi-1}^{bi} f(u) du = n (F(bi) − F(bi-1))   (continuous distribution)
        Ei = n Σ_{j ∈ [bi-1, bi)} p(j)   (discrete distribution)

Chi-Square Test (cont.)

• STEP 3: Compute the test statistic

    D² = Σ_{i=1}^{k} (Oi − Ei)² / Ei

• A large value of D² indicates a poor fit; a small value of D² indicates a good fit.
• We will reject the null hypothesis H0 (that the data come from the hypothesized distribution) if D² is too large.
• How large is too large? We need to specify the significance level α, and we need to know the distribution of the test statistic D² under the null hypothesis H0.
• Under H0, D² has approximately a chi-square distribution with k − s − 1 degrees of freedom, where s = # of parameters estimated from the data.
• Test: reject H0 if D² ≥ χ²_{k−s−1, 1−α}, the (1−α)-quantile of the chi-square distribution with k − s − 1 degrees of freedom.
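The three steps can be sketched in a few lines of Python. This is only a sketch: the bin edges, observed counts, and fitted rate are taken from the worked interarrival-time example that follows, and the 5% critical value is hard-coded from a chi-square table rather than computed.

```python
import math

# Chi-square G.O.F. sketch (STEPs 1-3) for the slides' interarrival-time
# example: n = 173 observations, fitted exponential rate lambda = 1.93
# (estimated from the data, so s = 1).
lam = 1.93
n = 173
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, math.inf]   # STEP 1: k = 7 bins
observed = [56, 37, 25, 16, 12, 9, 19]                  # O_i from the data

def F(x):
    """C.d.f. of the hypothesized exponential distribution."""
    return 1.0 - math.exp(-lam * x)

# STEP 2: expected frequency in bin i is E_i = n * (F(b_i) - F(b_{i-1})).
expected = [n * (F(b) - F(a)) for a, b in zip(edges, edges[1:])]

# STEP 3: the test statistic D^2.
D2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# k - s - 1 = 7 - 1 - 1 = 5 degrees of freedom; the critical value
# chi^2_{5, 0.95} = 11.070 is taken from a chi-square table.
critical = 11.070
print(f"D^2 = {D2:.4f}; reject H0 at alpha = 5%? {D2 >= critical}")
```

Because the last bin extends to ∞, the expected frequencies telescope to exactly n, which is a useful sanity check on the binning.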
Example: Chi-Square Test

• Recall our example involving interarrival times: we hypothesize that the data come from an exponential distribution.
• 173 observations over 90 minutes; estimated parameter: λ = 1.93.

    Bin          Observed   Expected   (O − E)²/E
    [0.0, 0.2)      56       55.799      0.0007
    [0.2, 0.4)      37       37.905      0.0216
    [0.4, 0.6)      25       25.750      0.0218
    [0.6, 0.8)      16       17.492      0.1273
    [0.8, 1.0)      12       11.883      0.0012
    [1.0, 1.2)       9        8.072      0.1067
    [1.2, ∞)        19       17.100      0.2112

• D² = 0.4905, with k = 7 and s = 1. At α = 5%, the critical value is χ²_{5, 0.95} = 11.070, so we fail to reject the null hypothesis.

Comments on the Chi-Square Test

• Can be used for both discrete and continuous distributions.
• The range of a continuous probability distribution can be divided into any number of bins with any probabilities.
• It is desirable (but not necessary) to divide the continuous range into bins of equal probability (that is, Ei = Ej for all i and j). In our previous example, we did not use equal-probability bins.
• There is no fixed rule for deciding the number of bins:
    • Too many bins: the expected frequency in each bin is too small.
    • Too few bins: the test has little power to distinguish between H0 and H1.
    • The bin sizes should be chosen so that the expected frequency in each bin is at least 5.
• p-value: this corresponds to Pr{X > D²}, where X is a chi-square random variable with k − s − 1 degrees of freedom and D² is our test statistic.
    • This number is often reported by statistics software.
    • If the p-value ≥ α, we fail to reject the null hypothesis H0.
    • If the p-value < α, we reject the null hypothesis at significance level α.

Kolmogorov-Smirnov (KS) Test

• The KS test compares the empirical c.d.f. of the data with the c.d.f. of the hypothesized distribution (the chi-square test compares the p.d.f.).
• The KS test is more powerful than the chi-square test and does not require grouping the data into bins, but it can only be used when the hypothesized distribution is continuous.
• Suppose that our data are X1, ..., Xn and our hypothesized distribution is F(⋅). The empirical c.d.f. based on the data is defined, for all x, by

    F̂(x) = fraction of data points less than or equal to x = (1/n) Σ_{i=1}^{n} 1[Xi ≤ x]

• Test statistic:

    D = max_x |F̂(x) − F(x)|

• We reject the null hypothesis if the value of the test statistic is too large, that is, if D > Dn,α (n = # of samples, α = significance level).

How to Compute the KS Test Statistic

[Figure: the empirical c.d.f. (a step function) plotted against the hypothesized c.d.f.]

• The empirical c.d.f. is a step function. To compute the KS test statistic, it is sufficient to evaluate the difference between the empirical and hypothesized distributions at the "jump" points!

Example and Caveats

• Suppose we compute the test statistic and find D = 0.143 with n = 30. For a 10% significance level, D30,0.10 = 0.22. We thus fail to reject the null hypothesis at the 10% significance level.
• Caveat about the KS test: when we use the data to estimate the parameters, we need to make adjustments to our test statistic. This topic is beyond the scope of this course.
• Caveats about goodness-of-fit tests in general: some people object to the use of goodness-of-fit tests for the following reasons:
    • Little data: all tests will have trouble rejecting any distribution.
    • Enormous data: the hypothesized theoretical family of distributions may not be broad enough.
    • Need to be careful!

Fitting a Nonstationary Poisson Process

• Poisson processes (stationary and nonstationary) provide a good model of arrival processes.
• Question 1: How do we estimate the arrival rate λ of a stationary Poisson process?
    • Easy: given the arrival time data T1, T2, ..., we can compute the interarrival times A1 = T1, A2 = T2 − T1, ....
    • The interarrival times are i.i.d. random variables with an exponential distribution with parameter λ.
    • Apply MLE to the interarrival-time data A1, A2, ....
• Question 2: How do we estimate the arrival rate function λ(⋅) for a nonstationary Poisson process?
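(Aside: the "jump points" observation above makes the KS statistic cheap to compute. A minimal sketch follows; the sample values below are hypothetical, not from the lecture, and the hypothesized distribution is taken to be Exponential(1) purely for illustration.)

```python
import math

def ks_statistic(data, F):
    """D = max_x |Fhat(x) - F(x)|, evaluated only at the jump points.

    At the i-th smallest observation the empirical c.d.f. jumps from
    (i - 1)/n up to i/n, so both sides of each jump must be checked.
    """
    xs = sorted(data)
    n = len(xs)
    return max(
        max(i / n - F(x), F(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

# Hypothetical sample of n = 5 observations (illustration only),
# tested against a hypothesized Exponential(1) c.d.f.
sample = [0.1, 0.3, 0.5, 1.1, 2.4]
D = ks_statistic(sample, lambda x: 1.0 - math.exp(-x))
print(f"D = {D:.4f}")
```

As in the slide's example, the resulting D would then be compared against the tabulated critical value Dn,α for the chosen significance level.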
• For a general rate function, this is really hard.

Estimating the Rate Function of a Nonstationary Poisson Process

• Assumption: the rate function λ(⋅) is piecewise constant with known break points!

[Figure: a piecewise-constant arrival rate function plotted against time.]

Example

• Suppose we want to estimate the shape of the arrival rate function λ(⋅) during the time interval from 6:00am to 8:00pm, and assume that we have collected data over a period of M days.
• STEP 1: Divide the time interval from 6:00am to 8:00pm into subintervals over which the arrival rate function is known to be approximately constant.
    • Example: 6am-8am, 8am-10am, 10am-noon, noon-2pm, 2pm-4pm, 4pm-6pm, and 6pm-8pm.
    • Let (t0, t1], (t1, t2], ..., (tk-1, tk] denote the subintervals we pick.
• STEP 2: For each subinterval (ti-1, ti], let mi be defined by

    mi = (total # of arrivals over M days during subinterval (ti-1, ti]) / M

• STEP 3: Our estimate of the arrival rate in subinterval (ti-1, ti] is

    λ̂i = mi / (ti − ti-1)
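The three steps above can be sketched directly. The arrival times below are hypothetical (measured in hours after 6:00am and pooled over M = 2 observation days); only the break points follow the slide's example.

```python
def estimate_rates(arrivals, breakpoints, M):
    """lambda_hat_i = m_i / (t_i - t_{i-1}), where m_i is the average
    number of arrivals per day falling in subinterval (t_{i-1}, t_i]."""
    rates = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        count = sum(1 for t in arrivals if lo < t <= hi)  # pooled over M days
        m_i = count / M                                   # STEP 2
        rates.append(m_i / (hi - lo))                     # STEP 3
    return rates

# STEP 1: subintervals 6-8am, 8-10am, ..., 6-8pm (hours after 6:00am).
breaks = [0, 2, 4, 6, 8, 10, 12, 14]

# Hypothetical arrival times pooled over M = 2 days (illustration only).
arrivals = [0.5, 1.2, 1.9, 2.5, 3.0, 3.1, 5.5, 7.2, 9.9, 13.5]
print(estimate_rates(arrivals, breaks, M=2))
```

Each entry of the result is an arrivals-per-hour estimate for one subinterval; with real data, M would be the number of days actually observed.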
This note was uploaded on 10/26/2010 for the course OR&IE 5580 at Cornell University (Engineering School).
