Simulation Modeling and Analysis (ORIE 4580/5580/5581)
Week 8: Wrap-up Parameter Estimation (10/14/10)

Announcement and Agenda
• For ALL Students: There will be a prelim at 7:30pm on Tuesday, October 19 at G01 Uris Hall.
  • Sample prelim is available on Blackboard (see the “Exam” Folder in Course Documents)
  • Please work through the problems in the sample prelim!
  • Review Session: Tuesday (10/19) during lecture. I’ll go over the sample prelim questions.
  • Office Hours Next Week: Monday (10/18) from 16
  • Open-book and open-notes (no laptop, iPad, iPhone, iTouch, etc.)
  • Please bring a calculator!

Goodness-of-Fit Tests
• Motivation: Suppose we have fitted a distribution to the data. How can we provide a more “quantitative” assessment of the goodness of fit?
  • Histograms, bar plots, and Q-Q plots only provide “visual” confirmation
  • Need to perform a formal “statistical” test.
• All of these tests operate on the following principle: Is it reasonable, statistically speaking, to assume that the data at hand come from the specified distribution?
  • $H_0$: The data come from the hypothesized distribution
  • $H_1$: The data do not come from the hypothesized distribution
• Two types of tests that we will cover:
  • Chi-square goodness-of-fit test
  • Kolmogorov-Smirnov goodness-of-fit test

Chi-Square Test
• Main Idea: Compare the histogram of the data with the frequencies that would be expected if the data come from the hypothesized distribution
  • Can be used for both discrete and continuous data
• STEP 1: Divide the data into a collection of, say $k$, bins $[b_0, b_1), [b_1, b_2), \ldots, [b_{k-1}, b_k)$.
  • The interval $[b_0, b_k)$ should cover the range of the hypothesized distribution
  • We can choose $b_k = \infty$, and bins do NOT have to be the same size
  • $O_i$ = observed frequency in bin $i$ = # of data points in interval $[b_{i-1}, b_i)$
• STEP 2: For each bin $[b_{i-1}, b_i)$, compute $E_i$ = expected frequency in bin $i$, where $n$ is the number of data points:

$$E_i = n \int_{b_{i-1}}^{b_i} f(u)\,du = n\,\bigl(F(b_i) - F(b_{i-1})\bigr) \quad \text{(Continuous Distribution)}$$

$$E_i = n \sum_{j\,:\,b_{i-1} \le j < b_i} p(j) \quad \text{(Discrete Distribution)}$$
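Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function names, the sample data, and the choice of an exponential c.d.f. with rate 1 are assumptions made for the example, not part of the lecture.

```python
import math
from bisect import bisect_right


def observed_frequencies(data, edges):
    """O_i: count the data points falling in each bin [edges[i-1], edges[i])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1  # index of the half-open bin containing x
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def expected_frequencies(n, cdf, edges):
    """E_i = n * (F(b_i) - F(b_{i-1})) for a continuous hypothesized c.d.f. F."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]


def exponential_cdf(rate):
    """Return F(x) = 1 - exp(-rate * x) for x > 0 (hypothetical helper)."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


# Hypothetical data and bins; the last bin extends to infinity (b_k = inf).
data = [0.1, 0.5, 0.9, 2.0]
edges = [0.0, math.log(2.0), math.inf]

print(observed_frequencies(data, edges))
print(expected_frequencies(len(data), exponential_cdf(1.0), edges))
```

With rate 1 the median of the exponential distribution is ln 2, so the two bins above have equal probability and the expected frequencies are equal.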
Chi-Square Test (cont.)
• STEP 3: Compute the test statistic

$$D^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

  • A large value of $D^2$ indicates a poor fit
  • A small value of $D^2$ indicates a good fit.
• We will reject the null hypothesis $H_0$ (that the data come from the hypothesized distribution) if $D^2$ is too large.
• How large is too large? Need to specify the significance level $\alpha$ and need to know the distribution of the test statistic $D^2$ under the null hypothesis $H_0$
  • Under the null hypothesis $H_0$, $D^2$ has approximately a chi-square distribution with $k - s - 1$ degrees of freedom
  • $s$ = # of parameters estimated from the data
• Test: Reject $H_0$ if $D^2 \ge \chi^2_{k-s-1,\,1-\alpha}$, the $(1-\alpha)$-quantile of the chi-square distribution with $k - s - 1$ degrees of freedom.

[Figure: chi-square density with area $1-\alpha$ to the left of $\chi^2_{k-s-1,\,1-\alpha}$ and area $\alpha$ to the right]
Example: Chi-Square Test
• Recall our example involving interarrival times: we hypothesize that the data come from an exponential distribution
  • 173 observations over 90 minutes
  • Estimated parameter: $\lambda = 1.93$

Bin          Observed   Expected   (O−E)²/E
[0.0, 0.2)   56         55.799     0.0007
[0.2, 0.4)   37         37.905     0.0216
[0.4, 0.6)   25         25.750     0.0218
[0.6, 0.8)   16         17.492     0.1273
[0.8, 1.0)   12         11.883     0.0012
[1.0, 1.2)    9          8.072     0.1067
[1.2, ∞)     19         17.100     0.2112

• $D^2 = 0.4905$
• $k = 7$ and $s = 1$
• $\alpha = 5\%$
• $\chi^2_{5,\,1-\alpha} = 11.070$
• Since $D^2 < 11.070$, we fail to reject the null hypothesis
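The arithmetic in this example can be checked with a short script. The observed and expected frequencies and the critical value 11.070 are copied from the table above; the function and variable names are illustrative assumptions.

```python
# Observed and expected frequencies copied from the example table.
observed = [56, 37, 25, 16, 12, 9, 19]
expected = [55.799, 37.905, 25.750, 17.492, 11.883, 8.072, 17.100]


def chi_square_statistic(observed, expected):
    """D^2 = sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


d2 = chi_square_statistic(observed, expected)

k = len(observed)       # number of bins
s = 1                   # one parameter (lambda) estimated from the data
dof = k - s - 1         # degrees of freedom
critical_value = 11.070 # 0.95-quantile for 5 degrees of freedom, from the slide

reject = d2 >= critical_value
print(f"D^2 = {d2:.4f}, dof = {dof}, reject H0: {reject}")
```

Summing the unrounded terms gives a value that agrees with the slide's 0.4905 to about three decimal places; the small difference comes from rounding each term before adding.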
Comments on Chi-Square Test
• Can be used for both discrete and continuous distributions
• The range of a continuous probability distribution can be divided up into any number of bins with any probability
  • It is desirable (but not necessary) to divide the continuous range into bins with equal probability (that is, $E_i = E_j$ for all $i$ and $j$)
  • In our previous example, we did not use equal-probability bins
• No fixed rule for deciding the number of bins
  • Too many: Expected frequency in each bin is too small
  • Too few: Test has little power to distinguish between $H_0$ and $H_1$
  • Size of bin should be such that the expected frequency of each bin is at least 5
• p-Value: This corresponds to $\Pr\{X > D^2\}$, where $X$ is a chi-square random variable with $k - s - 1$ degrees of freedom and $D^2$ is our test statistic
  • This number is often reported by statistics software
  • If the p-value $\ge \alpha$, we fail to reject the null hypothesis $H_0$
  • If the p-value $< \alpha$, we reject the null hypothesis at significance level $\alpha$
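As a rough illustration of the p-value, the sketch below evaluates $\Pr\{X > D^2\}$ using only the Python standard library. The helper name `chi_square_sf` is an assumption; it relies on the standard recurrence for the regularized upper incomplete gamma function and is meant for integer degrees of freedom only. Statistics software would normally report this number directly.

```python
import math


def chi_square_sf(x, dof):
    """Pr{X > x} for a chi-square random variable X with integer dof >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) (even dof) or Q(1/2, y) = erfc(sqrt(y)) (odd dof).
    """
    y = x / 2.0
    half = dof / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < half:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Values from the interarrival-time example: D^2 = 0.4905 with k - s - 1 = 5.
alpha = 0.05
p_value = chi_square_sf(0.4905, 5)
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

For that example the p-value is far above 5%, which matches the conclusion reached by comparing $D^2$ with the critical value.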
Kolmogorov-Smirnov (KS) Test
• The KS test compares the empirical c.d.f. of the data with the c.d.f. of the hypothesized distribution
  • The chi-square test compares the p.d.f.
• The KS test is more powerful than the chi-square test and does not require grouping of data into bins.
  • It can only be used when the hypothesized distribution is continuous
• Suppose that our data are $X_1, \ldots, X_n$ and our hypothesized distribution is $F(\cdot)$
• The empirical c.d.f. based on the data is defined by: for all $x$,

$$\hat{F}(x) = \text{fraction of data points less than or equal to } x = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[X_i \le x]$$

• Test Statistic:

$$D = \max_x \left| \hat{F}(x) - F(x) \right|$$

• We reject the null hypothesis if the value of the test statistic is too large, that is, $D > D_{n,\alpha}$ ($n$ = # of samples and $\alpha$ = significance level).

How to compute the KS Test Statistic?
[Figure: empirical c.d.f. and hypothesized c.d.f. plotted against x]
• The empirical c.d.f. is a step function. To compute the KS test statistic, it is sufficient to evaluate the difference between the empirical and hypothesized distributions at the “jump” points!
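A minimal sketch of this computation follows. The function name and the sample data are assumptions made for illustration. Because the empirical c.d.f. jumps from $(i-1)/n$ to $i/n$ at the $i$-th smallest data point, the sketch checks the gap on both sides of each jump.

```python
def ks_statistic(data, cdf):
    """D = max_x |F_hat(x) - F(x)|, evaluated at the jump points of F_hat."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_hat equals (i - 1)/n just before x and i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical data tested against the Uniform(0, 1) c.d.f., F(x) = x.
sample = [0.1, 0.4, 0.7]
d = ks_statistic(sample, lambda x: x)
print(f"D = {d:.4f}")
```

The resulting $D$ would then be compared with the tabulated critical value $D_{n,\alpha}$ for the given sample size and significance level.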
Example and Caveats
• Suppose we compute the test statistic and find $D = 0.143$ with $n = 30$. For a 10% significance level, $D_{30,\,0.10} = 0.22$. We thus fail to reject the null hypothesis at the 10% significance level.
• Caveats about KS test: When we use the data to estimate the parameters, we need to make adjustments to our test statistic.
  • This topic is beyond the scope of this course.
• Caveats about goodness-of-fit tests in general: Some people object to the use of goodness-of-fit tests for the following reasons:
  • Little Data: All tests will have trouble rejecting any distribution
  • Enormous Data: The hypothesized theoretical family of distributions may not be broad enough
• Need to be careful!
Fitting Nonstationary Poisson Process
• Poisson processes (stationary and nonstationary) provide a good model of arrival processes
• Question 1: How to estimate the arrival rate $\lambda$ of a stationary Poisson process?
  • Easy: Given the arrival time data $T_1, T_2, \ldots$, we can compute the interarrival times $A_1 = T_1$, $A_2 = T_2 - T_1$, ...
  • The interarrival times are i.i.d. random variables with an exponential distribution with parameter $\lambda$
  • Apply MLE to the interarrival time data $A_1, A_2, \ldots$
• Question 2: How to estimate the arrival rate function $\lambda(\cdot)$ for a nonstationary Poisson process?
  • For a general rate function, this is really hard.
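For Question 1, the MLE of the rate of an exponential distribution is the number of observations divided by their sum, $\hat{\lambda} = n / \sum_{i=1}^{n} A_i$. The sketch below applies that formula to hypothetical arrival times; the function names and the data are assumptions for illustration.

```python
def interarrival_times(arrival_times):
    """A_1 = T_1, A_2 = T_2 - T_1, ... for arrival times sorted in increasing order."""
    gaps = []
    previous = 0.0
    for t in arrival_times:
        gaps.append(t - previous)
        previous = t
    return gaps


def exponential_rate_mle(interarrivals):
    """MLE of lambda for i.i.d. exponential data: n divided by the sum of the data."""
    return len(interarrivals) / sum(interarrivals)


# Hypothetical arrival times (in minutes, measured from time 0).
arrivals = [0.5, 1.0, 2.0, 4.0]
gaps = interarrival_times(arrivals)
rate = exponential_rate_mle(gaps)
print(gaps, rate)
```

Since the interarrival times sum to the last arrival time, the estimate is simply the number of arrivals divided by the time of the last arrival.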
Estimating Rate Function for Nonstationary Poisson
• Assumption: The rate function $\lambda(\cdot)$ is piecewise constant with known break points!
[Figure: arrival rate plotted against time]

Example
• Suppose we want to estimate the shape of the arrival rate function $\lambda(\cdot)$ during the time interval 6:00am to 8:00pm
  • Assume that we have collected data over a period of $M$ days
• STEP 1: Divide the time interval from 6:00am to 8:00pm into subintervals, where the arrival rate function over each subinterval is known to be approximately constant
  • Example: 6am-8am, 8am-10am, 10am-noon, noon-2pm, 2pm-4pm, 4pm-6pm, and 6pm-8pm
  • Let $(t_0, t_1], (t_1, t_2], \ldots, (t_{k-1}, t_k]$ denote the subintervals we pick
• STEP 2: For each subinterval $(t_{i-1}, t_i]$, let $m_i$ be defined by:

$$m_i = \frac{\text{Total \# of arrivals over } M \text{ days during subinterval } (t_{i-1}, t_i]}{M}$$

• STEP 3: Our estimate of the arrival rate in subinterval $(t_{i-1}, t_i]$ is

$$\hat{\lambda}_i = \frac{m_i}{t_i - t_{i-1}}$$
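The three steps can be sketched as follows. The function name, the data layout (one list of arrival times per day, in hours since midnight), and the sample numbers are all assumptions made for illustration.

```python
from bisect import bisect_left


def piecewise_rate_estimates(arrivals_by_day, breakpoints):
    """Estimate lambda_i on each subinterval (t_{i-1}, t_i].

    arrivals_by_day: one list of arrival times per observed day (M days).
    breakpoints: [t_0, t_1, ..., t_k] in increasing order.
    """
    num_days = len(arrivals_by_day)
    k = len(breakpoints) - 1
    totals = [0] * k
    for day in arrivals_by_day:
        for t in day:
            i = bisect_left(breakpoints, t) - 1  # subinterval (t_{i-1}, t_i] containing t
            if 0 <= i < k:
                totals[i] += 1
    rates = []
    for i in range(k):
        m_i = totals[i] / num_days               # STEP 2: average arrivals per day
        width = breakpoints[i + 1] - breakpoints[i]
        rates.append(m_i / width)                # STEP 3: arrivals per unit time
    return rates


# Hypothetical data: two days, subintervals (6, 8] and (8, 10] in hours.
days = [[6.5, 7.0, 9.0], [7.5, 8.0, 9.5]]
rates = piecewise_rate_estimates(days, [6.0, 8.0, 10.0])
print(rates)
```

Here four arrivals fall in (6, 8] and two in (8, 10] over two days, so the estimated rates are 1.0 and 0.5 arrivals per hour.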