Probability and Statistics with Reliability, Queuing and Computer Science Applications
Second edition
by K.S. Trivedi
Publisher: John Wiley & Sons

Chapter 10: Statistical Inference
Dept. of Electrical & Computer Engineering, Duke University
Email: kst@ee.duke.edu
URL: www.ee.duke.edu/~kst
Copyright © 2003 by K.S. Trivedi

Statistical Inference: Introduction

- For any probabilistic model, parameters of one or more distributions need to be estimated from measured data.
- For example:
  - For an M/M/1 queue, the parameters to be estimated are λ, the job arrival rate, and μ, the service rate.
  - For the WFS availability model (Example 8.24), the parameters to be estimated are the failure rates and the repair rates of the workstation and the file server.
  - For the two-component availability model with imperfect coverage (Example 8.22), the coverage probability c (in addition to the failure and repair rates) needs to be estimated.
Statistical Inference: Introduction (contd.)

- Estimates are based on the outcomes of an experiment.
- The set of all possible outcomes of an experiment is called the population, but often only a subset of the population is available.
- Methods of statistical inference help in estimating the characteristics of the entire population from a suitably selected subset of the population (called a sample).
- As the sample size increases, the estimate becomes more representative of the entire population.
- Statistical inference involves the tasks of:
  - Estimation (calculating parameter values and confidence intervals)
  - Hypothesis testing (accepting or rejecting assumptions about a parameter or about the form of the population distribution)
Samples

- Since the outcome of the experiment is random, it makes sense to specify a population by its distribution F(x).
- Now suppose we collect n experimental outcomes x1, x2, ..., xn.
- This collection is a subset (or sample) of the set of all possible outcomes (the population).
- Each xi is an observation from the population X, i.e., the value of a random variable Xi whose distribution is identical to that of X.
Estimates

- Estimates are quantities calculated from the observed sample to represent the values of desired parameters of the population distribution F(x).
- Examples of estimates:
  - Sample mean: x̄ = (x1 + x2 + ... + xn)/n
  - Sample variance: s² = Σᵢ (xi − x̄)² / (n − 1)
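The two estimates above can be computed directly; a minimal Python sketch (the data values are illustrative, not from the text):

```python
# Sample mean and (n-1)-divisor sample variance, matching the slide's formulas.
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32/7, about 4.571
```

The (n − 1) divisor is what makes s² an unbiased estimator of the population variance, as discussed in the following slides.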
Random Sample

- Definition: A random sample of size n is a collection of independent, identically distributed random variables X1, X2, ..., Xn, each having the same distribution F(x) as the population X.
Statistic

- Definition: A statistic is a function of the random sample X1, X2, ..., Xn that does not depend on any unknown parameter.
- Examples of statistics:
  - Sample mean X̄ = (X1 + X2 + ... + Xn)/n (as a random variable; hence capitalized)
  - Sample variance S² = Σᵢ (Xi − X̄)²/(n − 1) (as a random variable; hence capitalized)
Estimator

- Definition: An estimator of a parameter θ is a statistic Θ̂ = Θ̂(X1, ..., Xn) used to estimate θ; a specific observed value of the estimator is an estimate.
Desired Properties of an Estimator

- Unbiased: on the average, the estimator should give the true value, i.e., E[Θ̂] = θ.
- Efficient: it should have the smallest possible variance.
- Consistent: it should converge, in probability, to the true value as the sample size grows.
Unbiased Estimators: Example 10.1

- The sample mean X̄ is an unbiased estimator of the population mean μ, whenever the latter exists: E[X̄] = μ.
Unbiased Estimators: Example 10.2

Efficiency

- Definition: Of two unbiased estimators of the same parameter, the one with the smaller variance is said to be more efficient.
- Example 10.4: The sample mean X̄ is the most efficient linear estimator of the population mean.

Consistency

- Definition: An estimator is consistent if it converges in probability to the true value of the parameter as n, the size of the sample, approaches infinity.
Methods of Parameter Estimation

- Method of moments
- Method of maximum likelihood
The Method of Moments

- Suppose one or more parameters of the distribution of X are to be estimated.
- Define the kth sample moment of the RV X as Mk = (1/n) Σᵢ Xiᵏ.
- Equating each sample moment with the corresponding population moment E[Xᵏ] yields an equation.
- As many equations as the number of parameters are obtained and solved simultaneously to get the desired estimates.
- Such estimators are usually consistent, but they may be biased and inefficient.
Example 10.5: Main Memory Needed is X

- Let X have a density function with one parameter to be estimated (as given in the text).
- The first sample moment is computed and equated to the first population moment, and the resulting equation is solved for the parameter estimate.
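Since the slide's own density is only given in the text, the mechanics of the method can be sketched on a hypothetical one-parameter example, the exponential density f(x) = λe^(−λx), where E[X] = 1/λ:

```python
# Method of moments for an exponential density f(x) = lam*exp(-lam*x)
# (a hypothetical one-parameter example, not the density of Example 10.5).
# Equate the first sample moment to the population moment E[X] = 1/lam.
def mom_exponential_rate(xs):
    m1 = sum(xs) / len(xs)   # first sample moment
    return 1.0 / m1          # solve m1 = 1/lam for lam

data = [0.8, 1.1, 2.5, 0.4, 1.2]   # illustrative observations
print(mom_exponential_rate(data))  # 1/1.2, about 0.833
```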
Maximum Likelihood Estimation

- Suppose the distribution of X has k parameters θ1, ..., θk and a pdf f(x; θ1, ..., θk).
- If we have a random sample X1, ..., Xn with observed values x1, ..., xn, the joint pdf of the sample is L = Πᵢ f(xi; θ1, ..., θk).
- Fixing the xi's and n, the joint pdf can be considered a function of θ1, ..., θk. This function is known as the likelihood function.
Maximum Likelihood Estimation (contd.)

- The MLE estimates of θ1, ..., θk are those values of the parameters which maximize the likelihood function.
- Thus the MLE estimates are those parameter values for which the observed sample is most likely to occur (since the pdf is maximized).
- Often, dealing with the log-likelihood function (the logarithm of the likelihood function) is easier, so the log-likelihood is maximized instead.
- MLE estimators are usually consistent and also most efficient in an asymptotic sense.
Maximum Likelihood Estimation (contd.)

- Example: Consider transmission of n messages through a channel with success probability p. Transmission of a single message is modeled by the Bernoulli pmf pˣ(1 − p)¹⁻ˣ, x ∈ {0, 1}.
- The likelihood function is the joint pmf L(p) = Πᵢ p^(xi) (1 − p)^(1−xi).
- The value of p that maximizes L(p) is the maximum likelihood estimate of p.
Example 10.7

- Let the number of calls per hour, X, be Poisson distributed with parameter λ.
- The likelihood function is then L(λ) = Πᵢ e^(−λ) λ^(xi) / xi!.
- Taking logs, ln L(λ) = −nλ + (Σᵢ xi) ln λ − Σᵢ ln(xi!).
- Setting the derivative w.r.t. λ equal to zero yields λ̂ = x̄, the sample mean.
Maximum Likelihood Estimation: Software Reliability Models

- Consider the Goel–Okumoto model for software reliability.
- Software failures display the behavior of a nonhomogeneous Poisson process (NHPP).
- Let N(t) denote the cumulative number of faults detected by time t, and let m(t) = E[N(t)] be its expectation (also called the mean value function).
Maximum Likelihood Estimation: Software Reliability Models (contd.)

- The mean value function m(t) is described by m(t) = a(1 − e^(−bt)), where a is the expected number of faults that would be detected given infinite testing time and b is the failure occurrence rate per fault.
- The instantaneous failure intensity is λ(t) = dm(t)/dt = a b e^(−bt).
Maximum Likelihood Estimation: Software Reliability Models (contd.)

- Let Si denote the time of occurrence of the ith failure.
- The pdf of Si at si, given the previous observations, follows from the NHPP assumption.
- The joint density of S1, S2, ..., Sn — the likelihood function — is L(a, b) = [Πᵢ λ(si)] e^(−m(sn)).
- The log-likelihood function in this (Goel–Okumoto model) case is ln L(a, b) = n ln a + n ln b − b Σᵢ si − a(1 − e^(−b sn)).
Maximum Likelihood Estimation: Software Reliability Models (contd.)

- Maximizing the log-likelihood function w.r.t. a and b yields the two equations n/a = 1 − e^(−b sn) and n/b = Σᵢ si + a sn e^(−b sn).
- Solving these two nonlinear equations numerically, we get the estimates of a and b.
Maximum Likelihood Estimation: Truncated Data

- Example 10.9: Consider a life test of n components, without replacement, truncated after r failures.
- The components follow an exponential failure law with parameter λ = 1/θ.
- Let T1 ≤ T2 ≤ ... ≤ Tr denote the times to failure of the observed failures; let Tr+1, ..., Tn be the times to failure of the remaining components (not observed by the end of the test).
- The likelihood function can be written as the joint density of the observed failure times together with the survival probabilities of the unobserved components.
Maximum Likelihood Estimation: Truncated Data (contd.)

- Dividing by the product of the Δti's and taking the limit as each Δti → 0, we obtain the likelihood as a function of θ.
- Define the accumulated life on test as τ = Σᵢ₌₁ʳ ti + (n − r) tr.
Maximum Likelihood Estimation: Truncated Data (contd.)

- Differentiating the log-likelihood w.r.t. θ and setting it equal to 0 (maximizing w.r.t. θ), we get the maximum likelihood estimator (MLE) of mean life: θ̂ = τ/r.
- Thus the estimator of mean life is the accumulated life on test, τ, divided by the number of observed failures, r.
Maximum Likelihood Estimation: Truncated Data (contd.)

- Common mistakes while dealing with truncated data:
  - Ignoring the observations for the surviving components altogether
  - Using tr (the truncation time) as the observation for each surviving component
- When the observations for the surviving components are completely ignored, the estimator is θ̂1 = (Σᵢ₌₁ʳ ti)/r.
- When tr is used as the observation for each surviving component, the estimator is θ̂2 = [Σᵢ₌₁ʳ ti + (n − r) tr]/n = τ/n.
- It can easily be seen that θ̂1 ≤ θ̂2 ≤ θ̂.
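The correct censored-data MLE and the two flawed estimators can be compared on a small sample; a minimal sketch with illustrative times:

```python
# MLE of mean life under a test of n components truncated after r failures,
# plus the two mistaken estimators described above. Times are illustrative.
n, r = 10, 4
t = [20.0, 35.0, 50.0, 60.0]          # the r observed failure times, ordered

tau = sum(t) + (n - r) * t[-1]        # accumulated life on test
theta_mle = tau / r                   # correct MLE: tau / r
theta_ignore = sum(t) / r             # mistake 1: drop the survivors
theta_trunc = tau / n                 # mistake 2: count t_r as each survivor's lifetime

print(theta_ignore, theta_trunc, theta_mle)   # 41.25 52.5 131.25
```

Both mistakes bias the estimate downward, consistent with the ordering θ̂1 ≤ θ̂2 ≤ θ̂ above.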
MLE with Weibull Data: Truncated Data

- Example 10.10: Consider a life test of n components, truncated after the first r failures (without replacement).
- The lifetimes of the components follow a Weibull distribution.
- Let T1 ≤ T2 ≤ ... ≤ Tr denote the times to failure of the observed failures; let Tr+1, ..., Tn be the times to failure of the remaining components (not observed by the end of the test).
Maximum Likelihood Estimation: Truncated Data (contd.)

- The likelihood function is defined as before; dividing by the product of the Δti's and taking the limit as each Δti → 0 yields the likelihood in terms of the Weibull parameters λ and α.
Maximum Likelihood Estimation: Truncated Data (contd.)

- Maximizing the log-likelihood by differentiating w.r.t. λ and α, respectively, and equating to zero yields two equations.
- Since no closed-form solutions for λ̂ and α̂ exist, the two equations are rearranged and solved iteratively for λ̂ and α̂.
Need for a Confidence Interval

- Each time we take an n-sample and produce a point estimate of the parameter of interest, the estimate is in general different; moreover, the estimate will rarely, if ever, coincide with the true value.
- So how can we say whether the estimate is good?
- Note that we maximized the joint probability of the observations while computing the MLE estimate, but we did not take into account the spread of the density of the estimator.
- The smaller the variance of the estimator, the better the estimator; but how small is small enough?
- How can we attach some measure of repeatability or respectability to the estimate?
Need for a Confidence Interval (contd.)

- An estimator is itself a random variable following a sampling distribution; hence it is important to know its fluctuation.
- Suppose we can produce an interval, say A(θ̂), purported to contain θ with probability γ.
- Note that each specific value of the estimate either lies in the confidence interval or it does not.
- But if we sample a large number of times, we can be sure that the fraction of times the interval contains the true value approaches γ.
- Interestingly enough, A(θ̂) is a random interval in that it changes with the estimate each time! So repeatability is only partial.
Confidence Interval

- A confidence interval is defined in such a way that we are reasonably confident it contains the true value of the unknown parameter.
- The width of the confidence interval suggests the amount of variability in the estimated value.
Confidence Intervals: Chebyshev Inequality

- Chebyshev's inequality suggests a way to get a bound on the confidence interval, assuming that the variance of the estimator is known: P(|Θ̂ − θ| ≥ ε) ≤ Var[Θ̂]/ε².
- We can get better (tighter) results if we know the nature of the distribution.
Example 10.13

- Applying Chebyshev's inequality to the sample mean gives P(|X̄ − μ| ≥ ε) ≤ σ²/(nε²).
- So (x̄ − ε, x̄ + ε) is a confidence interval for the population mean μ with a confidence coefficient that is greater than 1 − σ²/(nε²).
- We can get an exact confidence interval if we know the nature of the distribution of the population. We consider sampling from:
  - The normal distribution
  - The exponential distribution
  - The Bernoulli distribution
  - Etc.
Exact Confidence Interval

Steps in obtaining an exact confidence interval for parameter θ based on a random sample X1, X2, ..., Xn:

- Find a random variable that is a function of X1, ..., Xn and θ: W = W(X1, X2, ..., Xn; θ).
- Find numbers a and b such that P(a < W < b) = γ.
- After observing the values xi of Xi, find the range of values θ can take so that a < w(x1, ..., xn; θ) < b.
- This range is the 100γ% confidence interval of θ.
Sampling from the Normal Distribution

- Suppose a sample is taken from a normal population with mean μ and known variance σ²; then the sample mean X̄ is N(μ, σ²/n), where n is the sample size.
- To find a 100(1 − α)% confidence interval for the population mean, we find numbers a and b such that P(a < (X̄ − μ)/(σ/√n) < b) = 1 − α; the interval for μ then follows by inverting this inequality.
Example 10.14

- Letting a = −b, we have b = z₁₋α/₂, the (1 − α/2)-quantile of the standard normal distribution; these values can be read from a table.
- The width of the confidence interval is 2 z₁₋α/₂ σ/√n.
- The number of samples required to produce a confidence interval of a given width w is n = (2 z₁₋α/₂ σ/w)².
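The normal-sampling interval can be computed with the standard library's normal quantile function; a minimal sketch with illustrative numbers:

```python
from statistics import NormalDist
import math

# 100(1-alpha)% CI for a normal mean with known sigma:
#   xbar +/- z_{1-alpha/2} * sigma / sqrt(n)
def normal_ci(xbar, sigma, n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    h = z * sigma / math.sqrt(n)
    return xbar - h, xbar + h

lo, hi = normal_ci(xbar=50.0, sigma=4.0, n=25, alpha=0.05)
print(round(lo, 3), round(hi, 3))  # 48.432 51.568
```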
Student t Distribution

- When the sample size is small and the population variance is unknown, we use the Student t distribution to construct the confidence interval.
- If X̄ is the sample mean and s² the sample variance of a random sample of size n from a normal distribution with mean μ and variance σ², then the random variable T = (X̄ − μ)/(s/√n) has a Student t distribution with (n − 1) degrees of freedom.
Sampling from the Exponential Distribution

- X is EXP(λ); Xi is EXP(λ) for each i.
- We wish to obtain a confidence interval for either λ or the corresponding mean θ = 1/λ.
Sampling from the Exponential Distribution: Truncated Data

- The accumulated life on test τ = Σᵢ₌₁ʳ Ti + (n − r) Tr is r-stage Erlang with parameter λ.
- Hence, 2λτ is r-stage Erlang with parameter 1/2 (i.e., it has a chi-square distribution with 2r degrees of freedom).
- Thus the 100(1 − α)% confidence interval for θ is 2τ/χ²₍₂ᵣ; α/₂₎ < θ < 2τ/χ²₍₂ᵣ; ₁₋α/₂₎, where χ²₍ₖ; α₎ denotes the upper-tail α point of the chi-square distribution with k degrees of freedom.
Example 10.20

- Consider job interarrival times to be exponentially distributed; 50 jobs arrive within 100 minutes.
- The point estimate of the job arrival rate is then 0.5 jobs/min.
- Noting that 2λτ has a chi-square distribution with 100 degrees of freedom, we find that the 90% confidence interval for the job arrival rate λ is (0.39, 0.62).
Reliability Estimation of Software: Definition and Features

- The software is to be deployed for operational use.
- Assume no bugs are fixed in the field, and hence no reliability growth.
- The estimation method is the same as for hardware.
- We are interested in estimating parameters from observed data.
- This is sometimes called steady-state failure rate estimation in the software context (that is, after reliability growth stops).
- The remaining faults are sometimes called Heisenbugs.
Reliability Estimation for the Exponential Distribution (1)

- Given a random sample of n observations, the maximum-likelihood estimate of the mean time to failure is θ̂ = x̄, where x̄ is the sample mean.
- The 100(1 − α)% confidence interval of the MTTF is 2nx̄/χ²₍₂ₙ; α/₂₎ < θ < 2nx̄/χ²₍₂ₙ; ₁₋α/₂₎.
Reliability Estimation for the Exponential Distribution (2)

- Assume that 50 failures were observed in a software system, and their sample mean — the point estimate of the MTTF — was 490 hours.
- Noting that χ²₍₁₀₀; ₀.₉₅₎ = 77.93 and χ²₍₁₀₀; ₀.₀₅₎ = 124.34, the 90% confidence interval for the MTTF is 394.1 < MTTF < 628.87 hours.
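The interval above can be reproduced without chi-square tables by using the Wilson–Hilferty approximation to chi-square quantiles (an approximation, not the book's table values, but it agrees here to about two decimal places):

```python
from statistics import NormalDist

# Wilson-Hilferty approximation to the upper-tail alpha point of chi-square
# with k degrees of freedom: chi2_{k;alpha} ~= k*(1 - 2/(9k) + z*sqrt(2/(9k)))^3.
def chi2_upper(k, alpha):
    z = NormalDist().inv_cdf(1 - alpha)
    return k * (1 - 2 / (9 * k) + z * (2 / (9 * k)) ** 0.5) ** 3

r, xbar = 50, 490.0                 # 50 failures, sample mean 490 hours
tau2 = 2 * r * xbar                 # 2*tau = 2*n*xbar = 49000
lo = tau2 / chi2_upper(2 * r, 0.05)   # uses chi2_{100;0.05}, about 124.34
hi = tau2 / chi2_upper(2 * r, 0.95)   # uses chi2_{100;0.95}, about 77.93
print(round(lo, 1), round(hi, 1))     # close to the slide's (394.1, 628.87)
```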
Reliability Estimation for the Exponential Distribution (3)

- Let θL and θU be the lower and upper limits of the 100γ% confidence interval for θ.
- Since the exponential reliability R(t) = e^(−t/θ) is a monotonic function of θ, the 100γ% lower and upper confidence limits for the reliability at time t are RL(t) = e^(−t/θL) and RU(t) = e^(−t/θU).
- Therefore, the 100γ% confidence interval for the reliability at time t is (e^(−t/θL), e^(−t/θU)).
Sampling from the Bernoulli Distribution

- Suppose the random variable denoting the experimental observation, Xi, is Bernoulli distributed with parameter p.
- The success probability is p = P(Xi = 1).
- The statistic Sn = Σᵢ Xi is binomially distributed.
- The 100(1 − α)% confidence interval is derived from the binomial distribution of Sn.
Sampling from the Bernoulli Distribution (cont'd)

- Either tabular techniques or Mathematica may be used to obtain the confidence interval.
- Sn may be approximated by the normal distribution when np ≥ 5 and nq ≥ 5.
- When p is close to 0 or 1, the Poisson approximation may be used.
Confidence Intervals of Coverage Probability

- Coverage probabilities and their confidence intervals are derived from fault injection experiments.
- Let n be the number of injected faults, k the number of detected errors, and c the error-detection coverage probability.
- Represent the outcome of the ith fault injection experiment by a Bernoulli random variable Xi; its observed value is 1 if the injected fault is detected and 0 otherwise.
- The statistic Sn = Σᵢ Xi is binomially distributed, with parameters n and c determining its distribution function.
Confidence Intervals of Coverage Probability (contd.)

- An unbiased estimate of the coverage probability is ĉ = k/n.
- Confidence intervals of the coverage probability can be obtained by using the binomial formula or tables of the binomial distribution.
- When the sample size is large and c is not close to 0 or 1, we can use the normal approximation.
- The Student t distribution gives a more accurate estimate of the confidence interval.
Confidence Intervals of Coverage Probability (contd.)

- When c is expected to take low (c < 0.1) or high (c > 0.9) values, one-sided confidence intervals are calculated with the aid of the Poisson approximation and the chi-square distribution with 2(k + 1) degrees of freedom.
- For c < 0.1, the one-sided confidence interval is obtained directly from this chi-square quantile.
- For c > 0.9, the one-sided confidence interval is derived for q = 1 − c, using a similar approach.
Confidence Intervals of Coverage Probability (contd.)

- Example: In order to estimate the coverage c of a fault-tolerant system, 200 random faults were inserted. The recovery mechanism detected 178 of these faults.
- We calculate the exact 95% confidence interval for c.
- Let Xi denote the result of an individual fault detection. The statistic Sn = Σᵢ Xi is binomially distributed with parameters n = 200 and c.
- The point estimate of c is ĉ = 178/200 = 0.89.
Confidence Intervals of Coverage Probability (contd.)

- The exact confidence limits are the values of c for which the observed count sn = k = 178 just falls within the corresponding binomial tail probabilities.
- The resulting interval of c is (0.833, 0.929).
Confidence Intervals of Coverage Probability (contd.)

- Using the normal approximation to the binomial distribution, Sn is approximately normal with mean nc and variance nc(1 − c).
- The 95% confidence interval of c is then obtained from ĉ ± z₀.₉₇₅ √(ĉ(1 − ĉ)/n).
- Thus the 95% confidence interval of c is (0.847, 0.933), in reasonable agreement with, though not identical to, the exact one based on the binomial, as should be expected of an approximation.
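The normal-approximation interval for this example can be computed directly; a minimal sketch:

```python
from statistics import NormalDist

# Normal-approximation 95% CI for the coverage probability of the example:
# n = 200 injected faults, k = 178 detected.
n, k = 200, 178
c_hat = k / n                               # point estimate 0.89
z = NormalDist().inv_cdf(0.975)             # about 1.96
h = z * (c_hat * (1 - c_hat) / n) ** 0.5    # half-width
print(round(c_hat, 2), round(c_hat - h, 3), round(c_hat + h, 3))  # 0.89 0.847 0.933
```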
Confidence Intervals of Coverage Probability (contd.)

- Confidence interval calculation using the Poisson approximation to the binomial distribution:
- Let p = 1 − c (c is close to 1) be the probability of unsuccessful fault detection.
- The one-sided 95% confidence interval of p is given by the chi-square quantile with 2(k + 1) degrees of freedom, where k here is the number of undetected faults.
- Thus the 95% confidence interval for c is (0.843, 1).
Estimation Related to Markov Chains

- Consider a homogeneous discrete-time Markov chain with a finite number of states {1, 2, ..., m}.
- Let Nij denote the number of transitions from state i to state j, and let Ni = Σⱼ Nij be the total number of transitions out of state i.
- The transition probability from state i to state j can be estimated as p̂ij = nij/ni.
Example 10.25

- Suppose the values of the transition counts nij are given (a matrix of observed counts, as in the text); then the transition probabilities are estimated as p̂ij = nij/ni.
Example 10.25 (contd.)

- Nij can be thought of as binomially distributed, i.e., Nij is B(k; ni, pij). The confidence intervals for pij can thus be derived by the previously described methods.
- If ni is sufficiently large and pij is not close to 0 or 1, an approximate 100γ% confidence interval for pij is given by the normal approximation p̂ij ± z₍₁₊γ₎/₂ √(p̂ij(1 − p̂ij)/ni).
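The estimation step is a row-wise normalization of the count matrix; a minimal sketch (the count matrix is illustrative, not the book's Example 10.25 data):

```python
# Estimate DTMC transition probabilities p_ij = n_ij / n_i from observed
# transition counts. The counts below are illustrative.
counts = [
    [10, 30, 60],
    [15,  5, 20],
    [40, 40, 20],
]

def estimate_P(counts):
    P = []
    for row in counts:
        ni = sum(row)                        # total transitions out of state i
        P.append([nij / ni for nij in row])  # p_ij = n_ij / n_i
    return P

P = estimate_P(counts)
print(P[0])  # [0.1, 0.3, 0.6]
```

Each estimated row sums to 1, as a stochastic matrix requires.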
Availability Estimation of Repairable Systems

- CTMC model of a simple repairable system
- Point estimation of availability
- Confidence interval estimation
- Guaranteed "five nines" availability
Simple Repairable System

- Figure: a two-state CTMC with states Up (0) and Down (1), failure rate λ (Up → Down) and repair rate μ (Down → Up), together with a sample path of the system alternating between Up and Down periods over time.
Estimation of Availability

- Steady-state availability: A = MTTF/(MTTF + MTTR) = μ/(λ + μ)
- Point estimate: Â = Total Up Time / (Total Up Time + Total Down Time)
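The point estimate above can be computed directly from measured up and down periods; a minimal sketch with illustrative data:

```python
# Point estimate of steady-state availability from measured up/down periods.
# The period data below are illustrative.
up_times = [999.0, 1200.0, 801.0]     # observed times to failure
down_times = [1.0, 2.0, 1.0]          # observed times to repair

sn, yn = sum(up_times), sum(down_times)
A_hat = sn / (sn + yn)                # Total Up Time / (Up + Down)
print(round(A_hat, 5))                # 0.99867
```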
Derivation of Confidence Interval for System Availability

- Time to failure: Ti ~ EXP(λ).
- Total up time: Sn = Σ Ti ~ Erlang(λ, n); hence 2λSn ~ χ²(2n).
- MLE estimator of the failure rate: Λ̂ = n/Sn.
- Similarly, time to repair: Ri ~ EXP(μ).
- Total down time: Yn = Σ Ri ~ Erlang(μ, n); hence 2μYn ~ χ²(2n).
- MLE estimator of the repair rate: M̂ = n/Yn.
Estimation of Availability (contd.)

- For a two-state system, the MLE estimate of A is Â = M̂/(Λ̂ + M̂) = Sn/(Sn + Yn).
- The confidence interval is derived from the F distribution of the ratio of the two chi-square variables above.
Derivation of Confidence Interval for System Availability (cont.)

- Let the system utility be ρ = λ/μ.
- MLE of ρ: R̂ = Λ̂/M̂ = (n/Sn)/(n/Yn) = Yn/Sn.
- Then R̂/ρ = (μYn)/(λSn) = (2μYn)/(2λSn) ~ F(2n, 2n), being the ratio of two independent chi-square variables each divided by its degrees of freedom.
- This yields a 100(1 − α)% one-sided confidence interval for ρ.
Derivation of Confidence Interval for System Availability (cont.)

- MLE estimate of availability: Â = 1/(1 + R̂) = 1/(1 + Yn/Sn) = Sn/(Sn + Yn).
- Lower one-sided confidence limit of A:
  AL = 1/(1 + (yn/sn) f₂ₙ,₂ₙ;₁₋α) = 1/(1 + (1/Â − 1) f₂ₙ,₂ₙ;₁₋α)
- Figure: the point estimate, the 5% one-sided confidence limit, and the 10% two-sided confidence interval illustrated on the F(20, 20) distribution.
Example

- For n = 1, S1 = 999, Y1 = 1:
- The point estimate of availability is Â = 999/1000 = 0.999; the interval estimation uses an F distribution with (2, 2) degrees of freedom.
- The 95% confidence interval for A is (0.9624, 1).
- The upper one-sided confidence interval for A is (0.9813, 1).
Example

- For n = 10, S10 = 9990, Y10 = 10:
- The point estimate of availability is again 0.999; the interval estimation now uses an F distribution with (20, 20) degrees of freedom.
- The 95% confidence interval for A is (0.9975, 0.9996).
- The upper one-sided confidence interval for A is (0.9979, 1).
How to Guarantee the Five Nines Objective

- In addition to a low enough down time, the number of samples (failure events) should be large enough to verify that AL > 0.99999.

Down Time Per Year | Estimated Availability | Samples Required
1.0 min | 0.999998097 | 3
2.0 min | 0.999996195 | 7
3.0 min | 0.999994292 | 18
4.0 min | 0.999992389 | 74
5.0 min | 0.999990487 | 2171
Estimating Parameters for the M/M/1 Queue

- The maximum likelihood estimate of the arrival rate λ is obtained from the observed interarrival times, with a chi-square-based confidence interval as in the exponential sampling case.
- The maximum likelihood estimate of the service rate μ is obtained from the observed service times, with a corresponding chi-square-based confidence interval.
- The maximum likelihood estimate of the server utilization ρ = λ/μ has an F-distribution-based confidence interval, as in the availability derivation above.
Estimation with Dependent Samples

- So far, the measurements obtained were all assumed to be independent, but often they exhibit dependencies. E.g., the response times of consecutive requests to a file server are highly correlated.
- Suppose the observed quantities are values of dependent random variables X1, ..., Xn.
- The sample mean is X̄ = (1/n) Σᵢ Xi.
- The variance of X̄ now includes covariance terms (it is no longer σ²/n).
Estimation with Dependent Samples (contd.)

- As n approaches infinity, the variance of X̄ approaches a limit that accounts for the correlations between observations.
- The standardized statistic (X̄ − μ)/√(Var[X̄]) approaches the standard normal distribution as n approaches infinity.
- Therefore, an approximate 100(1 − α)% confidence interval for μ is obtained as in the independent case, using this corrected variance.
Estimation with Dependent Samples (contd.)

- Use independent replications to avoid the need to estimate σ² and the covariances directly.
- Replicate the experiment m times, with each experiment containing n observations.
- Use xi(j) to denote the value of the ith observation in the jth experiment; compute each replication's sample mean and variance.
- From the individual sample means, we obtain the point estimate of the population mean as their average.
Estimation with Dependent Samples (contd.)

- The common variance of the replication means is estimated from the m sample means.
- The resulting statistic is approximately t-distributed with (m − 1) degrees of freedom.
- Therefore, a 100(1 − α)% confidence interval for μ follows from the t distribution.
Example 10.29

- Consider 16 independent experiments conducted, with each experiment measuring 20 response times.
- From the 16 replication means, the point estimate and a t-based confidence interval for the mean response time are computed.
Hypothesis Testing

- Statistical tests: procedures that enable us to decide whether to reject or accept hypotheses based on the information contained in a sample.
- The null hypothesis, H0, is a claim that we are interested in rejecting or refuting. The contradictory hypothesis is called the alternative hypothesis (H1).
- The n-space of observations is divided into two regions, R(H0) and R(H1), called the acceptance region and the rejection region, respectively.
- Type I error (false alarm): the null hypothesis is true but the sample lies in the rejection region. Its probability is denoted by α (also called the level of significance).
- Type II error: the null hypothesis is false but the sample lies in the acceptance region. Its probability is denoted by β. Note that (1 − β) is called the power of the test.
y Normal distribution n n
n Test a hypothesis of population mean µ based on a
random sample size n with known variance
The required statistic with standard normal
distribution is
Also let
Specify the type I error as
Then the acceptance region is Copyright © 2003 by K.S. Trivedi
Normal Distribution (cont'd)

- If the alternative hypothesis is one-sided, of the form μ > μ0, we adopt an asymmetric acceptance region z ≤ z₁₋α.
- If the alternative hypothesis is μ < μ0, we adopt the rejection region z < −z₁₋α.
Relation Between Type I and Type II Errors

- In practice, we usually want to fix the type I error probability and then devise a test with as small a type II error probability as possible.
- Consider H0: μ = μ0 and H1: μ = μ1 (with μ1 > μ0).
- Suppose the critical region is x̄ > C. Then the α and β values are as illustrated in the figure, and C can be determined from α as C = μ0 + z₁₋α σ/√n.
- If the allowable type II error probability is also specified, then the minimum acceptable sample size can also be determined: n = [(z₁₋α + z₁₋β) σ/(μ1 − μ0)]².
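The sample-size determination can be sketched numerically; this uses the standard one-sided-test formula stated above (the slide's own equation is not shown, so treat the exact form as the usual textbook result), with illustrative numbers:

```python
from statistics import NormalDist
import math

# Minimum sample size so that a one-sided test of H0: mu = mu0 vs H1: mu = mu1
# has type I error alpha and type II error at most beta (known sigma):
#   n = ((z_{1-alpha} + z_{1-beta}) * sigma / (mu1 - mu0))^2
def min_sample_size(mu0, mu1, sigma, alpha, beta):
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(1 - beta)
    return math.ceil((z * sigma / (mu1 - mu0)) ** 2)

print(min_sample_size(mu0=0.0, mu1=0.5, sigma=1.0, alpha=0.05, beta=0.10))  # 35
```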
Hypotheses Concerning Two Means – Case 1

- Test the null hypothesis H0: μ1 = μ2.
- Random samples of sizes n1 and n2 are taken from the two populations, with known variances σ1² and σ2².
- The statistic Z = (X̄1 − X̄2)/√(σ1²/n1 + σ2²/n2) has a standard normal distribution under H0; therefore, we accept or reject as in the one-sample case.
Case 2

- Consider the case when σ1² and σ2² are not known.
- Assume σ1² = σ2².
- Using the two samples from the two populations, the common population variance is estimated by the pooled variance sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2).
- If the null hypothesis μ1 = μ2 holds, then the statistic T = (X̄1 − X̄2)/(sp √(1/n1 + 1/n2)) has a t distribution with n1 + n2 − 2 degrees of freedom.
Case 3

- Test procedures for Case 1 and Case 2 are valid only for normal distributions.
- We consider a distribution-free, or nonparametric, test for the hypothesis that the two populations have the same distribution.

Rank-Sum Test

- Suppose the random sample sizes are n1 and n2.
- Combine the samples and arrange them in order of increasing magnitude, and assign to the ordered values the ranks 1, 2, 3, ...
- Let Ri denote the rank of the ith observation from the second sample; the test statistic W is the sum of these ranks.
Case 3 (cont'd)

- Enumerate the set of all combinations of ranks that sum to w; the probability of each rank combination under the null hypothesis determines the distribution of W.
- The significance level α is then determined from this distribution.
- For the large-sample case, the statistic under the null hypothesis has approximately a normal distribution, with mean and variance determined by n1 and n2.
Hypotheses Concerning Variances

- Consider the problem of testing the null hypothesis that a population variance σ² equals some fixed value σ0².
- Assume sampling from a normal population; then the statistic (n − 1)S²/σ0² is chi-square distributed with n − 1 degrees of freedom.
- The critical regions for testing the null hypothesis follow from the chi-square quantiles.
Goodness-of-Fit Tests: Discrete Random Variables

- Assume X is a discrete random variable with pmf given by pi = P(X = i). We wish to test the null hypothesis that X possesses a certain specific pmf given by pi0.
- The problem then is to test the hypothesis H0: pi = pi0 for all i.
- Let n be the number of observations, and let Ni be the observed number of times that the measured value of X takes the value i. Then Σᵢ Ni = n.
Discrete Random Variables (cont'd)

- Under the null hypothesis, the statistic Q = Σᵢ₌₁ᵏ (Ni − n pi0)²/(n pi0) is approximately chi-square distributed with (k − 1) degrees of freedom.
- When there are unknown parameters in the population distribution, they must first be estimated from the collected sample of size n (reducing the degrees of freedom accordingly); then the test statistic can be used.
Example 10.41

- Consider the number of errors discovered in a system program, with the observed error counts tabulated as in the text.
Example 10.41 (cont'd)

- An estimate of the rate parameter is obtained from the sample; the test statistic Q is then computed using the fitted probabilities.
Example 10.41 (cont'd)

- The statistic is chi-square distributed with 4 degrees of freedom.
- Since the computed statistic exceeds the 5% critical value χ²₍₄; ₀.₀₅₎, we reject the null hypothesis at a 5% level of significance.
Continuous Random Variables

- Suppose X is a continuous random variable and we wish to test the hypothesis H0: F(x) = F0(x).
- The Kolmogorov–Smirnov test is adopted:
  - The given random sample is first arranged in increasing order of magnitude.
  - The empirical distribution function F̂n(x) is defined as the fraction of observations that are ≤ x.
  - The value of the Kolmogorov–Smirnov statistic is Dn = supₓ |F̂n(x) − F0(x)|.
Example 10.42

- Suppose we have 10 Weibull-distributed random deviates with shape parameter 2.
- The empirical distribution F̂n(x) and the hypothesized F0(x) are plotted in the figure.
- The observed value of the Dn statistic is 0.0822, which is not in the rejection region, so we accept the null hypothesis at the 5% level of significance.
Confidence Interval

- A confidence band with coefficient 1 − α is obtained using the Dn statistic, via its critical value dn;α.
- The confidence band is then F̂n(x) ± dn;α.
- If some parametric family of functions F(x; θ) is used, the parameters are estimated first, and then the test statistic is computed with the fitted distribution.
Graphical Method

- Transform the data in such a way that approximately straight lines result when the data are plotted.
- Exponential distribution:
  - The CDF is F(t) = 1 − e^(−λt).
  - Rewriting the equation, we get ln[1/(1 − F(t))] = λt, a straight line in t with slope λ.
  - The empirical CDF F̂(t) is defined from the ordered sample, as before.
- Weibull distribution:
  - The CDF is F(t) = 1 − e^(−λt^α).
  - Rewriting the equation, we get ln ln[1/(1 − F(t))] = ln λ + α ln t, a straight line in ln t.
- If the assumed distribution applies, the plotted data should approximately fall on a straight line.
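The exponential linearization can be sketched numerically; here, exact CDF values for λ = 0.5 are fed in (rather than an empirical CDF, an assumption made to keep the example deterministic), so the fitted slope recovers λ:

```python
import math

# Linearizing check for the exponential assumption: if F(t) = 1 - exp(-lam*t),
# then y = ln(1/(1 - F(t))) = lam*t, so (t, y) points lie on a line of slope lam.
lam = 0.5
ts = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.log(1.0 / (1.0 - (1.0 - math.exp(-lam * t)))) for t in ts]

# least-squares slope through the origin: sum(t*y) / sum(t*t)
slope = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
print(round(slope, 6))  # 0.5
```

In practice, the ys would come from the empirical CDF of the ordered sample, and a markedly nonlinear plot would argue against the exponential assumption.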
Graphical Method (cont'd)

- Figure: two probability plots of the same data, one under the exponential distribution assumption and one under the Weibull distribution assumption.
- Comparing the two graphs, the Weibull plot is much closer to a straight line, so we can conclude that the Weibull distribution assumption is appropriate.