Probability and Statistics with Reliability, Queuing and Computer Science Applications
Second edition, by K.S. Trivedi
Publisher: John Wiley & Sons
Chapter 10: Statistical Inference

Dept. of Electrical & Computer Engineering, Duke University
Email: kst@ee.duke.edu    URL: www.ee.duke.edu/~kst
Copyright © 2003 by K.S. Trivedi

Statistical Inference: Introduction
- For any probabilistic model, parameters of one or more distributions need to be estimated from measured data. For example:
  - For an M/M/1 queue, the parameters to be estimated are λ, the job arrival rate, and µ, the service rate.
  - For the WFS availability model (Example 8.24), the parameters to be estimated are the failure rates of the workstation and the file server, and the repair rates of the workstation and the file server.
  - For the two-component availability model with imperfect coverage (Example 8.22), the coverage probability c (in addition to the failure and repair rates) needs to be estimated.

Statistical Inference: Introduction (contd.)
- Estimates are based on the outcomes of an experiment.
- The set of all possible outcomes of an experiment is called the population, but often only a subset of the population is available.
- Methods of statistical inference help in estimating the characteristics of the entire population from a suitably selected subset of the population (called a sample). As the sample size increases, the estimate becomes more representative of the entire population.
- Statistical inference involves two tasks:
  - Estimation (calculating parameter values and confidence intervals)
  - Hypothesis testing (accepting or rejecting assumptions about a parameter or about the form of the population distribution)

Samples
- Since the outcome of the experiment is random, it makes sense to specify a population by its distribution F(x).
- Now suppose we collect n experimental outcomes x1, x2, ..., xn.
- This collection is a subset (or sample) of the set of all possible outcomes (the population).
- Each xi is an observation from the population X, i.e., the value of a random variable Xi whose distribution is identical to that of X.

Estimates
- Estimates are quantities calculated from the observed sample to represent the values of desired parameters of the population distribution F(x). Examples (see the sketch after this group of slides):
  - Sample mean: $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$
  - Sample variance: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1)$

Random Sample
- Definition of random sample

Statistic
- Definition of statistic
- Examples of statistics:
  - Sample mean (as a random variable; hence capitalized)
  - Sample variance (as a random variable; hence capitalized)

Estimator
- Definition of estimator

Desired Properties of an Estimator
- Unbiased: on the average, the estimator should give the true value.
- Efficient: it should have the smallest possible variance.
- Consistent: it should converge, in probability, to the true value.

Unbiased Estimators
- Example 10.1: The sample mean is an unbiased estimator of the population mean, whenever the latter exists.
- Example 10.2
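To make the estimate definitions concrete, here is a minimal Python sketch (assuming NumPy is available; the exponential population with mean 10 is an illustrative choice, not from the slides). It computes the sample mean and the unbiased sample variance, then checks the unbiasedness of the sample mean (Example 10.1) by repeated sampling:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_mean(x):
    # x_bar = (x_1 + ... + x_n) / n
    return sum(x) / len(x)

def sample_variance(x):
    # s^2 = sum (x_i - x_bar)^2 / (n - 1); the n-1 divisor makes it unbiased
    xb = sample_mean(x)
    return sum((xi - xb) ** 2 for xi in x) / (len(x) - 1)

# Hypothetical population: exponential with true mean 10.0
true_mean = 10.0
n = 50

# One sample gives one point estimate ...
x = rng.exponential(true_mean, size=n)
print("sample mean:", sample_mean(x), "sample variance:", sample_variance(x))

# ... while averaging the estimator over many samples approaches the
# true value, illustrating unbiasedness of the sample mean.
means = [sample_mean(rng.exponential(true_mean, size=n)) for _ in range(10_000)]
print("average of 10000 sample means:", np.mean(means))  # close to 10.0
```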
Efficiency
- Definition of efficiency
- Example 10.4: The sample mean is the most efficient linear estimator of the population mean.

Consistency
- Definition of consistency, where n is the size of the sample

Methods of Parameter Estimation
- Method of moments
- Method of maximum likelihood

The Method of Moments
- Suppose one or more parameters of the distribution of X are to be estimated.
- Define the kth sample moment of the RV X as $M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$. Equating each sample moment with the corresponding population moment $E[X^k]$ yields an equation.
- As many equations as there are parameters are obtained and solved simultaneously to get the desired estimates.
- Such estimators are usually consistent, but they may be biased and inefficient.

Example 10.5: Main Memory Requirement
- Let X, the amount of main memory needed, have a given density function with one parameter to be estimated.
- Equating the first sample moment $\bar{x}$ to the first population moment E[X] and solving yields the estimate of the parameter.

Maximum Likelihood Estimation
- Suppose the distribution of X has k parameters $\theta_1, \ldots, \theta_k$ and a pdf $f(x; \theta_1, \ldots, \theta_k)$.
- If we have a random sample $X_1, \ldots, X_n$ with observed values $x_1, \ldots, x_n$, the joint pdf is $\prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_k)$.
- Fixing the observed values and n, the joint pdf can be considered a function of $\theta_1, \ldots, \theta_k$. This function is known as the likelihood function.

Maximum Likelihood Estimation (contd.)
- The MLEs of $\theta_1, \ldots, \theta_k$ are those parameter values that maximize the likelihood function. Thus the MLEs are the parameter values for which the observed sample is most likely to occur (since the joint pdf is maximized).
- Often, dealing with the log-likelihood function (the log of the likelihood function) is easier, so the log-likelihood is maximized instead.
- MLE estimators are usually consistent and also most efficient in an asymptotic sense.

Maximum Likelihood Estimation (contd.)
- Example: Consider the transmission of n messages through a channel with success probability p. Transmission of a single message is modeled by the Bernoulli pmf $f(x; p) = p^x (1-p)^{1-x}$, $x \in \{0, 1\}$.
- The likelihood function is the joint pmf $L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}$. The value of p that maximizes L(p) is the maximum likelihood estimate of p.

Example 10.7
- Let the number of calls per hour, X, be Poisson distributed with parameter λ.
- The likelihood function is then $L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$.
- Taking logs, $\ln L(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - \sum_{i=1}^{n}\ln(x_i!)$.
- Setting the derivative w.r.t. λ equal to zero gives $\hat{\lambda} = \bar{x}$, the sample mean (see the sketch after this group of slides).

Maximum Likelihood Estimation: Software Reliability Models
- Consider the Goel-Okumoto model for software reliability.
- Software failures display the behavior of a nonhomogeneous Poisson process (NHPP).
- Let N(t) denote the cumulative number of faults detected by time t and m(t) be its expectation (also called the mean value function); then $P[N(t) = k] = \frac{[m(t)]^k e^{-m(t)}}{k!}$.

Maximum Likelihood Estimation: Software Reliability Models (contd.)
- The mean value function is $m(t) = a(1 - e^{-bt})$, where a is the expected number of faults that would be detected given infinite testing time and b is the failure occurrence rate per fault.
- The instantaneous failure intensity is $\lambda(t) = m'(t) = ab\,e^{-bt}$.

Maximum Likelihood Estimation: Software Reliability Models (contd.)
- Let Si denote the time of occurrence of the ith failure.
- The pdf of Si at si, given the previous observations, yields the joint density (the likelihood function) of S1, S2, ..., Sn: $L(a,b) = \left(\prod_{i=1}^{n} ab\,e^{-bs_i}\right) e^{-a(1-e^{-bs_n})}$.
- The log-likelihood function in this (Goel-Okumoto) case is $\ln L(a,b) = n\ln(ab) - b\sum_{i=1}^{n} s_i - a(1 - e^{-bs_n})$.
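A small Python sketch of Example 10.7 (the hourly call counts below are made-up illustrative data): it maximizes the Poisson log-likelihood numerically and confirms that the maximizer coincides with the sample mean, as derived on the slide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical observed calls per hour over 12 hours
x = np.array([3, 5, 4, 6, 2, 4, 5, 3, 4, 7, 3, 4])

def neg_log_likelihood(lam):
    # ln L(lambda) = -n*lambda + (sum x_i) ln(lambda) - sum ln(x_i!)
    # The last term is constant in lambda, so it is dropped here.
    return -(-len(x) * lam + x.sum() * np.log(lam))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20.0), method="bounded")
print("numerical MLE:", res.x)     # ~ 4.1667
print("sample mean  :", x.mean())  # 4.1667 -- they agree, as derived
```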
Maximum Likelihood Estimation: Software Reliability Models (contd.)
- Maximizing the log-likelihood function w.r.t. a and b gives the pair of equations
  $\frac{n}{a} - (1 - e^{-bs_n}) = 0, \qquad \frac{n}{b} - \sum_{i=1}^{n} s_i - a\,s_n e^{-bs_n} = 0.$
- Solving these two nonlinear equations numerically, we obtain the estimates of a and b.

Maximum Likelihood Estimation: Truncated Data
- Example 10.9: Consider a life test of n components without replacement, truncated after r failures.
- The components follow an exponential failure law with mean θ (rate 1/θ).
- Let $t_1 \le t_2 \le \cdots \le t_r$ denote the observed times to failure. The remaining n - r components do not fail by the end of the test, so their lifetimes are only known to exceed $t_r$.
- The likelihood function combines the densities of the r observed failure times with the survival probabilities of the n - r unfailed components.

Maximum Likelihood Estimation: Truncated Data (contd.)
- Define the accumulated life on test as $\tau = \sum_{i=1}^{r} t_i + (n-r)\,t_r.$

Maximum Likelihood Estimation: Truncated Data (contd.)
- Differentiating the log-likelihood w.r.t. θ and setting the derivative equal to 0 (maximizing w.r.t. θ) gives the maximum likelihood estimator of mean life: $\hat{\theta} = \tau/r.$
- Thus the estimator of mean life is the accumulated life on test, τ, divided by the number of observed failures (see the sketch after this group of slides).

Maximum Likelihood Estimation: Truncated Data (contd.)
- Common mistakes when dealing with truncated data:
  - Ignoring the unobserved components altogether
  - Using $t_r$ as the observation for each unobserved component

Maximum Likelihood Estimation: Truncated Data (contd.)
- When the unobserved components are completely ignored, the estimator becomes $\hat{\theta}_1 = \frac{1}{r}\sum_{i=1}^{r} t_i$.
- When $t_r$ is used as the observation for each unobserved component, the estimator becomes $\hat{\theta}_2 = \frac{1}{n}\left[\sum_{i=1}^{r} t_i + (n-r)t_r\right] = \tau/n$.
- It is easily seen that both underestimate the mean life: $\hat{\theta}_1 \le \hat{\theta}$ and $\hat{\theta}_2 \le \hat{\theta}$.

MLE with Weibull Data: Truncated Data
- Example 10.10: Consider a life test of n components, truncated after the first r failures (without replacement).
- The lifetimes of the components follow a Weibull distribution.
- Let $t_1 \le \cdots \le t_r$ denote the observed times to failure; the remaining n - r components are not observed to fail by the end of the test.

Maximum Likelihood Estimation: Truncated Data (contd.)
- The likelihood function is written as in the exponential case, with Weibull densities and survival probabilities.
- Maximizing the log-likelihood by differentiating w.r.t. λ and α, respectively, and equating to zero gives two equations.
- Since no closed-form solutions for $\hat{\lambda}$ and $\hat{\alpha}$ exist, the two equations are rearranged and solved iteratively for $\hat{\lambda}$ and $\hat{\alpha}$.

Need for a Confidence Interval
- Each time we take an n-sample, the resulting point estimate of the parameter of interest is, in general, different; moreover, the estimate will rarely, if ever, coincide with the true value. So how can we say whether an estimate is good?
- Note that in computing the MLE we maximized the joint probability of the observations, but we did not take into account the spread of the density of the estimator.
- The smaller the variance of the estimator, the better the estimator; but how small is small enough? How can we attach some measure of repeatability to the estimate?

Need for a Confidence Interval (contd.)
- An estimator is itself a random variable following a sampling distribution; hence it is important to know its fluctuation.
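A minimal Python sketch of Example 10.9's estimator (the failure times and test size below are made-up illustrative values, not from the book), contrasting the correct MLE with the two common mistakes listed above:

```python
import numpy as np

def mle_mean_life(failure_times, n):
    """MLE of exponential mean life from a test of n components,
    truncated after r observed failures (Example 10.9 style)."""
    t = np.sort(np.asarray(failure_times))
    r = len(t)
    tau = t.sum() + (n - r) * t[-1]  # accumulated life on test
    return tau / r                   # theta_hat = tau / r

# Hypothetical data: 10 components on test, stopped after r = 4 failures
times = [120.0, 310.0, 450.0, 700.0]  # hours
n = 10
t = np.sort(times)
r = len(t)

theta_hat = mle_mean_life(times, n)
theta_ignore = t.sum() / r                             # mistake 1: drop survivors
theta_censor_at_tr = (t.sum() + (n - r) * t[-1]) / n   # mistake 2: tau / n

print(f"MLE (tau/r)     : {theta_hat:.1f} h")          # (1580 + 6*700)/4 = 1445.0
print(f"ignore survivors: {theta_ignore:.1f} h")       # 395.0, biased low
print(f"tau / n         : {theta_censor_at_tr:.1f} h") # 578.0, biased low
```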
Need for a Confidence Interval (contd.)
- Suppose we can produce an interval, say A(θ), purported to contain θ with probability γ.
- Note that each specific value of the estimate either lies in the confidence interval or it does not.
- But if we sample a large number of times, we can be sure that the fraction of times the estimate lies within the interval is γ.
- Interestingly enough, A(θ) is a random interval in that it changes with the estimate each time; so the repeatability is only partial.

Confidence Interval
- A confidence interval is defined in such a way that we are reasonably confident it contains the true value of the unknown parameter.
- The width of the confidence interval indicates the amount of variability in the estimated value.

Confidence Intervals: Chebyshev Inequality
- Chebyshev's inequality suggests a way to get a bound on the confidence interval, assuming that the variance of the estimator is known.
- We can get better results if we know the nature of the distribution.

Example 10.13
- $(\bar{x} - \epsilon,\ \bar{x} + \epsilon)$ is a confidence interval for the population mean µ with a confidence coefficient greater than $1 - \sigma^2/(n\epsilon^2)$.
- We can get an exact confidence interval if we know the nature of the population distribution; we consider sampling from the normal, exponential, and Bernoulli distributions, among others.

Exact Confidence Interval
- Steps for obtaining an exact confidence interval for a parameter θ based on a random sample X1, X2, ..., Xn:
  1. Find a random variable that is a function of X1, ..., Xn: W = W(X1, X2, ..., Xn; θ).
  2. Find numbers a and b such that P(a < W < b) = γ.
  3. After observing the values xi of Xi, find the range of values θ can take so that a < w(x1, ..., xn; θ) < b.
  4. This range is the confidence interval of θ.

Sampling from the Normal Distribution
- Suppose a sample is taken from a normal population; then the sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$, where n is the sample size.
- To find a confidence interval for the population mean, we find numbers a and b such that $P(a < \bar{X} - \mu < b) = \gamma$, then invert.

Example 10.14
- Letting a = -b, we obtain the interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$; the required z values can be read from a table.
- The width of the confidence interval is $2\,z_{1-\alpha/2}\,\sigma/\sqrt{n}$, from which the number of samples required to produce a confidence interval of given width can be determined (see the sketch after this group of slides).

Student t Distribution
- When the sample size is small, we use the Student t distribution to obtain the confidence interval.
- If $\bar{X}$ is the sample mean and $S^2$ the sample variance of a random sample of size n from a normal distribution with mean µ and variance σ², then the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a Student t distribution with (n - 1) degrees of freedom.

Sampling from the Exponential Distribution
- X is EXP(λ), and Xi is EXP(λ) for each i.
- We wish to obtain a confidence interval for either λ or the corresponding mean θ = 1/λ.

Sampling from the Exponential Distribution: Truncated Data
- The accumulated life on test τ is r-stage Erlang with parameter λ. Hence 2λτ is r-stage Erlang with parameter 1/2, i.e., it has a $\chi^2_{2r}$ distribution.
- Thus the 100(1-α)% confidence interval for θ is $\frac{2\tau}{\chi^2_{2r;\,\alpha/2}} < \theta < \frac{2\tau}{\chi^2_{2r;\,1-\alpha/2}}$, where the subscript after the semicolon denotes the right-tail probability.
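A small Python sketch of the normal-population interval from Example 10.14 (the numeric values are illustrative assumptions, not the book's): it computes $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$ and inverts the width formula to find the sample size needed for a target width.

```python
import math
from scipy.stats import norm

def normal_ci(xbar, sigma, n, alpha=0.05):
    # Exact CI for the mean of a normal population with known sigma:
    # xbar +/- z_{1-alpha/2} * sigma / sqrt(n)
    z = norm.ppf(1 - alpha / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def samples_for_width(sigma, width, alpha=0.05):
    # Width is 2 * z * sigma / sqrt(n); solve for n and round up.
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

# Illustrative values: xbar = 100, sigma = 15, n = 36
lo, hi = normal_ci(100.0, 15.0, 36)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")  # (95.10, 104.90)
print(samples_for_width(15.0, 5.0))     # n needed for width 5 -> 139
```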
Example 10.20
- Consider job interarrival times to be exponentially distributed, with 50 jobs arriving within 100 minutes.
- The point estimate of the job arrival rate is then 0.5 jobs/min.
- Using the chi-square quantiles, the 90% confidence interval for the job arrival rate λ is (0.39, 0.62).

Reliability Estimation of Software: Definition and Features
- The software is to be deployed for operational use.
- Assume no bugs are fixed in the field, hence no reliability growth.
- The estimation method is the same as for hardware: we estimate parameters from observed data.
- This is sometimes called steady-state failure rate estimation in the software context (i.e., after reliability growth stops).
- The remaining faults are sometimes called Heisenbugs.

Reliability Estimation for the Exponential Distribution (1)
- Given a random sample of n observations, the maximum-likelihood estimate of the mean time to failure is $\hat{\theta} = \bar{x}$, where $\bar{x}$ is the sample mean.
- The 100(1-α)% confidence interval of the MTTF is $\frac{2n\bar{x}}{\chi^2_{2n;\,\alpha/2}} < \text{MTTF} < \frac{2n\bar{x}}{\chi^2_{2n;\,1-\alpha/2}}$.

Reliability Estimation for the Exponential Distribution (2)
- Assume 50 failures were observed in a software system, and their sample mean (the point estimate of the MTTF) was 490 hours.
- Noting that $\chi^2_{100;\,0.95} = 77.93$ and $\chi^2_{100;\,0.05} = 124.34$, the 90% confidence interval for the MTTF is 394.1 < MTTF < 628.87 hours (see the sketch after this group of slides).

Reliability Estimation for the Exponential Distribution (3)
- Let $\theta_L$ and $\theta_U$ be the lower and upper limits of the 100γ% confidence interval for θ.
- Since the exponential reliability $R(t) = e^{-t/\theta}$ is a monotonic function of θ, the 100γ% lower and upper confidence limits for the reliability at time t are $R_L(t) = e^{-t/\theta_L}$ and $R_U(t) = e^{-t/\theta_U}$.
- Therefore the 100γ% confidence interval for the reliability at time t is $\left(e^{-t/\theta_L},\ e^{-t/\theta_U}\right)$.

Sampling from the Bernoulli Distribution
- Suppose the random variable denoting the experimental observation, Xi, is Bernoulli distributed with parameter p.
- The success probability estimate is $\hat{p} = S_n/n$, where the statistic $S_n = \sum_{i=1}^{n} X_i$ is binomially distributed.
- The 100(1-α)% confidence interval is derived from the binomial distribution of Sn.

Sampling from the Bernoulli Distribution (cont'd)
- Either tabular techniques or Mathematica may be used to obtain the confidence interval.
- Sn may be approximated by the normal distribution when np ≥ 5 and nq ≥ 5.
- When p is close to 0 or 1, the Poisson approximation may be used.

Confidence Intervals of Coverage Probability
- We derive coverage probabilities and their confidence intervals from fault injection experiments.
- Let n be the number of injected faults, k the number of detected errors, and c the error-detection coverage probability.
- Represent the outcome of the ith fault injection experiment by a Bernoulli random variable Xi; its observed value is 1 if the fault was detected and 0 otherwise.
- The statistic $S_n = \sum_{i=1}^{n} X_i$ is binomially distributed with parameters n and c.

Confidence Intervals of Coverage Probability (contd.)
- An unbiased estimate of the coverage probability is $\hat{c} = S_n/n = k/n$.
- Confidence intervals for the coverage probability can be obtained using the binomial formula or tables of the binomial distribution. When the sample size is large and c is not close to 0 or 1, we can use the normal approximation; the Student t distribution gives a more accurate estimate of the confidence interval.
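The slide's MTTF interval can be checked with a few lines of Python. Note that SciPy's `chi2.ppf` takes the left-tail probability, whereas the slide's subscript is the right-tail probability, so the book's $\chi^2_{100;\,0.05}$ corresponds to `chi2.ppf(0.95, 100)`:

```python
from scipy.stats import chi2

def mttf_ci(n, xbar, gamma=0.90):
    """Exact CI for exponential MTTF from n observed failures with
    sample mean xbar, using 2*n*xbar/theta ~ chi-square(2n)."""
    alpha = 1 - gamma
    hi_q = chi2.ppf(1 - alpha / 2, 2 * n)  # ~ 124.34 for 2n = 100
    lo_q = chi2.ppf(alpha / 2, 2 * n)      # ~  77.93 for 2n = 100
    total = 2 * n * xbar                   # twice the total time on test
    return total / hi_q, total / lo_q

lo, hi = mttf_ci(n=50, xbar=490.0)
print(f"90% CI for MTTF: ({lo:.1f}, {hi:.1f}) hours")
# ~ (394.1, 628.8), matching the slide's (394.1, 628.87) up to rounding
```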
Confidence Intervals of Coverage Probability (contd.)
- When c is expected to take low (c < 0.1) or high (c > 0.9) values, one-sided confidence intervals are calculated with the aid of the Poisson approximation and the chi-square distribution.
- For c < 0.1, the one-sided confidence interval is obtained directly; for c > 0.9, the one-sided confidence interval is derived for q = 1 - c, using a similar approach.

Confidence Intervals of Coverage Probability (contd.)
- Example: To estimate the coverage c of a fault-tolerant system, 200 random faults were injected. The recovery mechanism detected 178 of these faults. We calculate the exact 95% confidence interval for c.
- Let Xi denote the result of an individual fault detection. The statistic $S_n = \sum_{i=1}^{n} X_i$ is binomially distributed.
- The point estimate of c is $\hat{c} = 178/200 = 0.89$.

Confidence Intervals of Coverage Probability (contd.)
- Since sn = k = 178, inverting the binomial tail probabilities gives the interval of c satisfying the confidence requirement: (0.833, 0.929) (see the sketch after this group of slides).

Confidence Intervals of Coverage Probability (contd.)
- Using the normal approximation to the binomial distribution, Sn is approximately normal with mean nc and variance nc(1 - c).
- The resulting 95% confidence interval of c is (0.847, 0.933), which is somewhat wider than the exact one based on the binomial, as should be expected.

Confidence Intervals of Coverage Probability (contd.)
- Confidence interval calculation using the Poisson approximation to the binomial distribution: let p = 1 - c (c is close to 1) be the probability of unsuccessful fault detection. The one-sided 95% confidence interval of p follows from the Poisson (chi-square) bound. Note: here k is the number of undetected faults.
- Thus the 95% one-sided confidence interval for c is (0.843, 1).

Estimation Related to Markov Chains
- Consider a homogeneous discrete-time Markov chain with a finite number of states {1, 2, ..., m}.
- Let Nij denote the number of transitions from state i to state j, and let $N_i = \sum_j N_{ij}$ be the total number of transitions out of state i. The transition probability from state i to state j can be estimated as $\hat{p}_{ij} = n_{ij}/n_i$.

Example 10.25
- Given the observed transition counts nij, the transition probabilities are estimated as above.

Example 10.25 (contd.)
- Nij can be regarded as binomially distributed, so that Nij is B(k; ni, pij). Confidence intervals for pij can thus be derived by the previously described methods.
- If ni is sufficiently large and pij is not close to 0 or 1, an approximate 100γ% confidence interval for pij is given by the normal approximation $\hat{p}_{ij} \pm z_{(1+\gamma)/2}\sqrt{\hat{p}_{ij}(1-\hat{p}_{ij})/n_i}$.

Availability Estimation of Repairable Systems
- CTMC model of a simple repairable system
- Point estimation of availability
- Confidence interval estimation
- Guaranteed "five nines" availability

Simple Repairable System
- Two-state CTMC: state 0 (Up) and state 1 (Down), with failure rate λ (Up to Down) and repair rate µ (Down to Up). Over time, the system state alternates between up and down periods.

Estimation of Availability
- Steady-state availability: $A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{\mu}{\lambda + \mu}$
- Point estimate: $\hat{A} = \frac{\text{Total Up Time}}{\text{Total Up Time} + \text{Total Down Time}}$

Derivation of Confidence Interval for System Availability
- Time to failure Ti ~ EXP(λ); total up time $S_n = \sum T_i$ ~ Erlang(λ, n); $2\lambda S_n \sim \chi^2(2n)$.
- MLE estimator of the failure rate: $\hat{\Lambda} = n/S_n$.
- Similarly, time to repair Ri ~ EXP(µ); total down time $Y_n = \sum R_i$ ~ Erlang(µ, n); $2\mu Y_n \sim \chi^2(2n)$.
- MLE estimator of the repair rate: $\hat{M} = n/Y_n$.
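A Python sketch of the 178-out-of-200 coverage example. The exact interval below uses the Clopper-Pearson (beta-quantile) construction, which is one standard way to invert the binomial tails; it may differ slightly from the slide's tabulated interval (0.833, 0.929). The normal approximation reproduces the slide's (0.847, 0.933).

```python
import math
from scipy.stats import beta, norm

def coverage_ci(k, n, alpha=0.05):
    # Exact binomial CI via Clopper-Pearson beta quantiles
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    # Normal approximation: c_hat +/- z * sqrt(c_hat (1 - c_hat) / n)
    c_hat = k / n
    z = norm.ppf(1 - alpha / 2)
    half = z * math.sqrt(c_hat * (1 - c_hat) / n)
    return (lo, hi), (c_hat - half, c_hat + half)

exact, approx = coverage_ci(178, 200)
print("point estimate:", 178 / 200)                       # 0.89
print("exact (Clopper-Pearson): (%.3f, %.3f)" % exact)    # ~ (0.838, 0.930)
print("normal approximation   : (%.3f, %.3f)" % approx)   # ~ (0.847, 0.933)
```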
Estimation of Availability
- For a two-state system, the MLE estimate of A and its confidence interval follow from the chi-square results on the previous slide, as derived next.

Derivation of Confidence Interval for System Availability (cont.)
- Let the system utility be ρ = λ/µ.
- MLE of ρ: $\hat{R} = \frac{\hat{\Lambda}}{\hat{M}} = \frac{n/S_n}{n/Y_n} = \frac{Y_n}{S_n}$.
- Then $\frac{\hat{R}}{\rho} = \frac{\mu\,Y_n}{\lambda\,S_n} = \frac{2\mu Y_n}{2\lambda S_n} \sim F(2n, 2n)$, a ratio of two independent chi-square variables.
- From this, a 100(1-α)% one-sided confidence interval of ρ is obtained.

Derivation of Confidence Interval for System Availability (cont.)
- MLE estimate of availability A: $\hat{A} = \frac{1}{1+\hat{R}} = \frac{1}{1 + Y_n/S_n} = \frac{S_n}{S_n + Y_n}$.
- Lower one-sided confidence limit of A: $A_L = \frac{1}{1 + (y_n/s_n)\,f_{2n,2n;1-\alpha}} = \frac{1}{1 + (1/\hat{A} - 1)\,f_{2n,2n;1-\alpha}}$ (see the sketch after this group of slides).

Confidence Interval
- [Figure: F(20, 20) density, showing the point estimate, the 5% one-sided confidence limit, and the 10% two-sided confidence limits.]

Example
- For n = 1, S1 = 999, Y1 = 1:
  - Point estimation of availability uses an F distribution with (2, 2) degrees of freedom.
  - The 95% confidence interval for A is (0.9624, 1); the one-sided 95% confidence interval for A is (0.9813, 1).

Example
- For n = 10, S10 = 9990, Y10 = 10:
  - The estimation of availability uses an F distribution with (20, 20) degrees of freedom.
  - The 95% confidence interval for A is (0.9975, 0.9996); the one-sided 95% confidence interval for A is (0.9979, 1).

How to Guarantee the Five-Nines Objective
- In addition to a small enough down time, there must be enough samples (failure events) to check whether AL > 0.99999.

  Down Time Per Year | Estimated Availability | Samples Required
  1.0 min            | 0.999998097            | 3
  2.0 min            | 0.999996195            | 7
  3.0 min            | 0.999994292            | 18
  4.0 min            | 0.999992389            | 74
  5.0 min            | 0.999990487            | 2171

Estimating Parameters for an M/M/1 Queue
- The maximum likelihood estimate of the arrival rate λ, with its confidence interval, follows from the exponential interarrival times.
- The maximum likelihood estimate of the service rate µ, with its confidence interval, follows from the exponential service times.
- The maximum likelihood estimate of the server utilization ρ = λ/µ, with its confidence interval, follows from the F distribution as in the availability case.

Estimation with Dependent Samples
- So far the measurements obtained were all assumed to be independent, but often they exhibit dependencies; e.g., response times of consecutive requests to a file server are highly correlated.
- Suppose the observed quantities are values of dependent random variables.
- The sample mean is defined as before, but the variance of the sample mean is no longer σ²/n.

Estimation with Dependent Samples (contd.)
- As n approaches infinity, the normalized statistic approaches the standard normal distribution.
- Therefore an approximate 100(1-α)% confidence interval for µ can be constructed, but it requires estimating the variance term.

Estimation with Dependent Samples (contd.)
- Use independent replications to avoid the need to estimate σ² from correlated data.
- Replicate the experiment m times, with each experiment containing n observations; let $x_i^{(j)}$ denote the value of the ith observation in the jth experiment, and compute the per-experiment sample means.
- From the individual sample means, obtain the point estimate of the population mean as their average.

Estimation with Dependent Samples (contd.)
- The common variance of the replicate means is estimated from the m sample means; the resulting statistic is approximately t-distributed with (m - 1) degrees of freedom.
- Therefore the 100(1-α)% confidence interval is $\bar{x} \pm t_{m-1;\,\alpha/2}\,s_m/\sqrt{m}$, where $s_m^2$ is the sample variance of the m replicate means.

Example 10.29
- Consider 16 independent experiments, each measuring 20 response times. With the given measurements, the replicate means yield the point estimate and the t-based confidence interval.
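A Python sketch of the availability lower confidence bound, checked against the n = 10 slide example. SciPy's `f.ppf` takes the left-tail probability, so $f_{2n,2n;1-\alpha}$ with right-tail probability α is `f.ppf(1 - alpha, 2n, 2n)`:

```python
from scipy.stats import f

def availability_lower_bound(total_up, total_down, n, alpha=0.05):
    """Lower one-sided 100(1-alpha)% confidence limit on steady-state
    availability from n up/down cycles, using (Yn/Sn)/rho ~ F(2n, 2n)."""
    a_hat = total_up / (total_up + total_down)  # MLE of A
    f_quant = f.ppf(1 - alpha, 2 * n, 2 * n)    # F(2n, 2n) quantile
    return a_hat, 1.0 / (1.0 + (1.0 / a_hat - 1.0) * f_quant)

# Slide example: n = 10 cycles, S10 = 9990, Y10 = 10
a_hat, a_low = availability_lower_bound(9990.0, 10.0, 10)
print(f"A_hat = {a_hat:.4f}, 95% lower bound A_L = {a_low:.4f}")
# A_hat = 0.9990, A_L ~ 0.9979, matching the one-sided interval (0.9979, 1)
```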
Hypothesis Testing
- Statistical tests are procedures that enable us to decide whether to reject or accept hypotheses based on the information contained in a sample.
- The null hypothesis, H0, is a claim that we are interested in rejecting or refuting. The contradictory hypothesis is called the alternative hypothesis (H1).
- The n-space of observations is divided into two regions, R(H0) and R(H1), called the acceptance region and the rejection region, respectively.
- Type I error (false alarm): the null hypothesis is true but the sample lies in the rejection region. Its probability is denoted by α (also called the level of significance).
- Type II error: the null hypothesis is false but the sample lies in the acceptance region. Its probability is denoted by β. Note that (1 - β) is called the power of the test.

Tests on the Population Mean

Normal Distribution
- Test a hypothesis about the population mean µ based on a random sample of size n with known variance.
- The required statistic with a standard normal distribution is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.
- Specify the Type I error probability as α; then the acceptance region is $|z| \le z_{1-\alpha/2}$.

Normal Distribution (cont'd)
- If the alternative hypothesis is one-sided of the form µ > µ0, we adopt an asymmetric acceptance region.
- If the alternative hypothesis is µ < µ0, we adopt the corresponding one-sided rejection region.

Relation Between Type I and Type II Errors
- In practice, we usually want to fix the Type I error probability and then devise a test whose Type II error probability is as small as possible.
- Consider H0: µ = µ0 and H1: µ = µ1. Suppose the critical region is of the form $\bar{x} > C$; then α and β are as illustrated in the figure, and C can be determined from the specified α.
- If the allowable Type II error probability is also specified, then the minimum acceptable sample size can also be determined.

Hypotheses Concerning Two Means: Case 1
- Test the null hypothesis $H_0: \mu_1 = \mu_2$ given random samples of sizes n1 and n2 with known variances σ1² and σ2².
- The statistic $Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ has a standard normal distribution under H0, from which the test follows.

Case 2
- Consider the case when σ1² and σ2² are not known; assume σ1² = σ2².
- Using the two sample variances from the two populations, the common population variance is estimated by the pooled estimator $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$.
- If the null hypothesis holds, then the statistic $T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{1/n_1 + 1/n_2}}$ has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.

Case 3
- The test procedures for Case 1 and Case 2 are valid only for normal distributions.
- We now consider a distribution-free (non-parametric) test for the same hypothesis: the rank-sum test.
- Suppose the random sample sizes are n1 and n2. Combine the samples, arrange them in order of increasing magnitude, and assign the ordered values the ranks 1, 2, 3, ....
- Let Ri denote the rank of the ith observation of the second sample; the test statistic W is the sum of these ranks.

Case 3 (cont'd)
- Counting the rank combinations that yield a given sum w gives the exact distribution of W under H0, from which the significance level α is determined.
- For large sample sizes, the statistic W under the null hypothesis is approximately normal with mean $n_2(n_1+n_2+1)/2$ and variance $n_1 n_2 (n_1+n_2+1)/12$ (both two-sample tests are illustrated in the sketch after this group of slides).

Hypotheses Concerning Variances
- Consider the problem of testing the null hypothesis that a population variance equals some fixed value σ0².
- Assuming sampling from a normal population, the statistic $\frac{(n-1)S^2}{\sigma_0^2}$ is chi-square distributed with (n - 1) degrees of freedom; the critical regions for testing the null hypothesis follow from the chi-square quantiles.
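A Python sketch contrasting Case 2 and Case 3 on made-up response-time data (the samples are illustrative assumptions). `ttest_ind` with `equal_var=True` is the pooled-variance t test; `mannwhitneyu` is the Mann-Whitney form of the rank-sum test:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(7)

# Hypothetical response times (ms) from two server configurations
x = rng.normal(loc=100.0, scale=15.0, size=25)
y = rng.normal(loc=112.0, scale=15.0, size=30)

# Case 2: pooled-variance two-sample t test (assumes normality, equal variances)
t_stat, t_p = ttest_ind(x, y, equal_var=True)
print(f"t = {t_stat:.3f}, p = {t_p:.4f}")

# Case 3: distribution-free rank-sum test (no normality assumption)
u_stat, u_p = mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {u_p:.4f}")

# Reject H0: mu1 == mu2 at the 5% level of significance if p < 0.05
for name, p in [("t test", t_p), ("rank-sum", u_p)]:
    print(name, "rejects H0" if p < 0.05 else "does not reject H0")
```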
Goodness-of-Fit Tests: Discrete Random Variables
- Assume X is a discrete random variable with pmf $p_i = P(X = i)$. We wish to test the null hypothesis that X possesses a certain specific pmf, i.e., $H_0: p_i = p_{i,0}$ for all i.
- Let n be the number of observations and Ni the observed number of times that the measured value of X equals i; then $\sum_i N_i = n$.

Discrete Random Variables (cont'd)
- Under the null hypothesis, the statistic $Q = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i}$ is approximately chi-square distributed with (k - 1) degrees of freedom.
- When there are unknown parameters in the population distribution, they must first be estimated from the collected sample of size n; the test statistic can then be used, with the degrees of freedom reduced by the number of estimated parameters.

Example 10.41
- Consider the observed frequencies of the number of errors discovered in a system program.

Example 10.41 (cont'd)
- An estimate of the (Poisson) rate parameter is obtained from the sample mean, and the test statistic is then computed.
- The statistic is chi-square distributed with 4 degrees of freedom. Since the computed value exceeds the critical value, we reject the null hypothesis at a 5% level of significance.

Continuous Random Variables
- Suppose X is a continuous random variable and we wish to test the hypothesis $H_0: F_X = F_0$. The Kolmogorov-Smirnov test is adopted.
- The given random sample is first arranged in increasing order of magnitude, and the empirical distribution function $\hat{F}_n(x)$ is defined as the fraction of observations not exceeding x.
- The value of the Kolmogorov-Smirnov statistic is $D_n = \sup_x |\hat{F}_n(x) - F_0(x)|$.

Example 10.42
- Suppose we have 10 Weibull-distributed random deviates with shape parameter 2; $\hat{F}_n$ and $F_0$ are plotted in the figure.
- The observed value of the Dn statistic is 0.0822, which is not in the rejection region, so we accept the null hypothesis at the 5% level of significance (see the sketch after this group of slides).

Confidence Interval
- A confidence band with coefficient γ is obtained by using the Dn statistic: the band is $\hat{F}_n(x) \pm d_{n;\gamma}$, clipped to [0, 1].
- If some parametric family of distribution functions is used, the test statistic must be modified accordingly.

Graphical Method to Estimate the Confidence Interval
- Transform the data in such a way that the plotted points fall on an approximately straight line when the hypothesized distribution applies.
- Exponential distribution: the CDF is $F(x) = 1 - e^{-\lambda x}$. Rewriting, $\ln\frac{1}{1 - F(x)} = \lambda x$, so plotting $\ln[1/(1 - \hat{F}(x_i))]$ versus $x_i$ should yield an approximately straight line with slope λ; the empirical CDF supplies the $\hat{F}$ values.
- Weibull distribution: the CDF is $F(x) = 1 - e^{-\lambda x^{\alpha}}$. Rewriting, $\ln\ln\frac{1}{1 - F(x)} = \alpha\ln x + \ln\lambda$, so the transformed plot should yield an approximately straight line with slope α.
- If the hypothesized distribution applies, the plotted data should approximately fall on a straight line.

Graphical Method to Estimate the Confidence Interval (cont'd)
- [Figure: two probability plots of the same data, one under the exponential distribution assumption and one under the Weibull distribution assumption.] Comparing the two graphs, we conclude that the Weibull distribution assumption is appropriate.
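A Python sketch in the spirit of Example 10.42. The deviates are freshly generated here, so the Dn value will differ from the slide's 0.0822; the test is against the fully specified Weibull CDF with shape parameter 2 (scale 1 assumed):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)

# 10 standard Weibull(shape=2) deviates, as in Example 10.42
sample = rng.weibull(2.0, size=10)

# Dn = sup_x |F_n_hat(x) - F0(x)| against the hypothesized Weibull CDF
d_n, p_value = kstest(sample, "weibull_min", args=(2.0,))
print(f"Dn = {d_n:.4f}, p = {p_value:.4f}")

# Accept H0 at the 5% level of significance if Dn is below the
# critical value (equivalently, if the p-value exceeds 0.05)
print("accept H0" if p_value > 0.05 else "reject H0")
```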