Application of Mathematical Statistics (6)
Iwona Nowakowska

STATISTICAL INFERENCE

ESTIMATION

Estimation is the calculated approximation of an unknown value, even if the input data may be incomplete or uncertain.

- Point estimation involves the use of sample data to calculate a single value, a "point" (known as a statistic), which is to serve as a "best estimate" of an unknown population parameter.
- Interval estimation is the use of sample data to calculate an "interval" of all possible (or probable) values of an unknown population parameter.

Notation:
- $\theta$: unknown population parameter
- $\hat{\theta}$: estimator (for example $\hat{\theta} = \bar{X}$, $\hat{\theta} = S$, $\hat{\theta} = \hat{p}$)
- estimate value: the number obtained when the estimator is evaluated on a particular sample

An estimator is a function of the observations in the sample, that is, a "statistic" whose values are used to estimate the unknown value of the parameter $\theta$.

An estimator is a statistic, and a statistic is a random variable. So it can never be said with certainty that an estimate value is close to the true value of a parameter of the distribution.
We use sample statistics to estimate population parameters.

A sample statistic is a descriptive measure, a sample characteristic, such as a sample mean $\bar{x}$, a sample standard deviation $s$, a sample proportion $\hat{p}$, and so on. The value of the sample statistic is used to estimate the value of the corresponding population parameter.

For example, the sample mean $\bar{x}$ is used to estimate the population mean $\mu$, and the sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$.

Notations:

| parameter | sample | population |
|---|---|---|
| the mean | $\bar{x}$ | $\mu$ |
| the standard deviation | $s$ | $\sigma$ |
| the variance | $s^2$ | $\sigma^2$ |
| the proportion | $\hat{p}$ | $p$ |

The properties of a good estimator:
- unbiased
- efficient
- consistent
- sufficient

Point Estimation

$\theta$: unknown parameter; $\hat{\theta}$: estimator; the estimate value is the value of $\hat{\theta}$ calculated from the sample.

| Point Estimator | Point Estimate |
|---|---|
| statistic | number |
| formula | calculated value |

For one sample, we use the sample mean $\bar{x}$ as the point estimator of the population mean $\mu$.
We use the sample standard deviation $s$ as the point estimator of the population standard deviation $\sigma$.
We use the sample proportion $\hat{p}$ as the point estimator of the population proportion $p$.
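The three point estimators can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the standard deviation uses the divisor $n$ (the convention the worked examples in these notes appear to follow), and the "success" condition for the proportion (a duration of at least 8 minutes) is an assumption made up for the example. The data are the ten banking-operation durations used in a later case problem.

```python
from math import sqrt

def sample_mean(xs):
    """Point estimate of the population mean."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Point estimate of the population standard deviation (divisor n, an assumption)."""
    m = sample_mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sample_proportion(xs, has_property):
    """Point estimate of the population proportion of items with a given property."""
    return sum(1 for x in xs if has_property(x)) / len(xs)

data = [5, 6, 7, 5, 8, 7, 8, 9, 6, 9]  # durations in minutes

x_bar = sample_mean(data)
s = sample_std(data)
p_hat = sample_proportion(data, lambda x: x >= 8)  # hypothetical property

print(f"mean = {x_bar:.2f}, std = {s:.2f}, proportion = {p_hat:.2f}")
```

Each function is the "formula" (the estimator); the numbers printed for this particular sample are the estimates.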
The standard error in point estimation is the standard deviation of the point estimator $\hat{\theta}$.

The relative error in point estimation is the standard error divided by the value of the estimated parameter obtained from the sample.

- $SE$: standard error of estimation
- $RE$: relative error of estimation

$$RE = \frac{SE}{\text{estimated value}}$$

where the estimated value is the value of the estimated parameter obtained from the sample (for example $\bar{x}$, $s$, $\hat{p}$).
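As a rough illustration of these two definitions for the sample mean, the sketch below assumes the rule used later in these notes for an unknown population standard deviation: $s/\sqrt{n-1}$ for a small sample ($n < 30$) and $s/\sqrt{n}$ otherwise. The function names are hypothetical, and the inputs ($s = \sqrt{2}$, $n = 10$, $\bar{x} = 7$) are the values from the banking-operations example.

```python
from math import sqrt

def standard_error_of_mean(s, n):
    """Standard error of the sample mean when sigma is unknown.

    Assumes s was computed with divisor n, so small samples use sqrt(n - 1).
    """
    if n < 30:
        return s / sqrt(n - 1)
    return s / sqrt(n)

def relative_error(standard_error, estimated_value):
    """Standard error divided by the estimated value of the parameter."""
    return standard_error / estimated_value

se = standard_error_of_mean(sqrt(2), 10)
re = relative_error(se, 7.0)

print(f"SE = {se:.3f} min, RE = {100 * re:.1f}%")
```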
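Point Estimation: standard error and relative error of the point estimators

Population mean $\mu$, point estimator $\bar{x}$:

$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} \quad\text{or}\quad SE(\bar{x}) = \frac{s}{\sqrt{n-1}} \ \text{ if } n \text{ is small } (n < 30) \quad\text{or}\quad SE(\bar{x}) = \frac{s}{\sqrt{n}} \ \text{ if } n \text{ is big } (n \ge 30)$$

$$RE = \frac{SE(\bar{x})}{\bar{x}}$$

Population proportion $p$, point estimator $\hat{p}$:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}, \qquad RE = \frac{SE(\hat{p})}{\hat{p}}$$

Difference of two population means $\mu_1 - \mu_2$, point estimator $\bar{x}_1 - \bar{x}_2$:

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad\text{or}\quad \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad\text{or}\quad \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

is the "pooled variance", and

$$RE = \frac{SE(\bar{x}_1 - \bar{x}_2)}{\bar{x}_1 - \bar{x}_2}$$

Difference of two population proportions $p_1 - p_2$, point estimator $\hat{p}_1 - \hat{p}_2$:

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}, \qquad RE = \frac{SE(\hat{p}_1 - \hat{p}_2)}{\hat{p}_1 - \hat{p}_2}$$

Interval Estimation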
Interval estimation is the use of sample data to calculate an interval of possible values of an unknown population parameter.
Neyman (1937) introduced interval estimation in the form of confidence intervals.
The most prevalent form of interval estimation is the confidence interval.

A confidence interval:
- consists of numbers,
- gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

The common notation for the parameter in question is $\theta$; $\hat{\theta}$ denotes its estimator.

Again, we use sample statistics to estimate population parameters. The sample mean $\bar{x}$ is used to estimate the population mean $\mu$, the sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$, and the sample proportion $\hat{p}$ is used to estimate the population proportion $p$.

An interval estimate is defined by two numbers, between which a population parameter is said to lie: $a(\hat{\theta})$ and $b(\hat{\theta})$.

For example: the interval $(a, b)$ is the interval estimate of the population mean $\mu$. It indicates that the population mean is greater than $a$ but less than $b$.
The ends of the confidence interval always depend on the estimator $\hat{\theta}$, so we write $a(\hat{\theta})$, $b(\hat{\theta})$.

We use a confidence interval to express the precision and uncertainty associated with a particular sampling method.
A confidence interval consists of:
- a confidence level,
- a statistic,
- a critical value.

The confidence level:
- describes the uncertainty of a sampling method,
- describes how strongly we believe that a particular sampling method will produce a confidence interval that includes the true population parameter,
- measures the reliability of an estimate.

Confidence level $1 - \alpha$: the probability that the unknown value of the population parameter belongs to a confidence interval.
Theoretically, the confidence level $1 - \alpha$ can be any value between 0 and 1, but usually it is one of a few standard values such as $0{,}90$, $0{,}95$, $0{,}98$ or $0{,}99$.

$(1 - \alpha) \cdot 100\%$ is the confidence level in percent: $0{,}95 \cdot 100\% = 95\%$, a 95% confidence level ("trust", "credibility").

A 95% confidence level means that 95% of the intervals constructed in this way from repeated samples contain the true population parameter.
A 90% confidence level means that 90% of the intervals contain the population parameter, and so on.

If $1 - \alpha = 0{,}95$, then $\alpha = 0{,}05$.
- $\alpha$: measure of the risk of making an error
- $\alpha \cdot 100\%$: risk in percent, for example $0{,}05 \cdot 100\% = 5\%$ risk of error

We are much more likely to accept survey findings if the confidence level is high (say, 95%) than if it is low (say, 60%).

In each case of estimation we must determine the critical value $z_\alpha$.
The critical value is the number $z_\alpha$ such that the area under the standard normal curve between $-z_\alpha$ and $z_\alpha$ is equal to the confidence level $1 - \alpha$.

The area under the normal curve from $-z_\alpha$ to $z_\alpha$ is the probability that the standardized normal variable $Z$ lies in the interval $(-z_\alpha, z_\alpha)$.
This means:

$$P(-z_\alpha < Z < z_\alpha) = 1 - \alpha$$

The margin of error
Absolute error of estimation

- $a$, $b$: the ends of the confidence interval
- $a$: the lower cutpoint
- $b$: the upper cutpoint
- $L$: width (length) of the interval, $L = b - a$
- $ME$: margin of error (maximum error of estimate), $ME = \frac{L}{2} = \frac{b - a}{2}$

The margin of error is defined as half of the width of the confidence interval. A larger sample size produces a smaller margin of error.

Relative error of estimation

- $ME$: absolute error of estimation (margin of error)
- $RE$: relative error of estimation

$$RE = \frac{ME}{\text{estimated value}}, \qquad RE\% = RE \cdot 100\%$$

where the estimated value is the value of the estimated parameter obtained from the sample (for example $\bar{x}$, $s$, $\hat{p}$).

- If $RE\%$ is less than or equal to …%, then our estimation is very good.
- If $RE\%$ is between …% and …%, then our estimation is quite good.
- If $RE\%$ is greater than …%, then our estimation is not very good, and we should be careful with our conclusions concerning the estimation.

Confidence intervals for the mean of the population

Model 1:
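Assumptions:
1) $X : N(\mu, \sigma)$
2) $\sigma$: the standard deviation of the population is known

Confidence interval for $\mu$ is:

$$\left(\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right)$$

We write:

$$P\left(\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha, \qquad a = \bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}}, \quad b = \bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}$$

where:
- $n$: sample size
- $\bar{x}$: sample mean
- $\sigma$: standard deviation of the population
- $z_\alpha$: critical value for the confidence level $1 - \alpha$

To determine the critical value $z_\alpha$ we use the standard statistical table for $N(0, 1)$ using the formula:

$$\Phi(z_\alpha) = 1 - \frac{\alpha}{2}$$

where $\Phi$ is the cumulative distribution function of the variable $Z : N(0, 1)$.

Width of the interval:

$$L = b - a = \left(\bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right) = 2 \cdot z_\alpha \cdot \frac{\sigma}{\sqrt{n}}$$

Margin of error:

$$ME = \frac{L}{2} = z_\alpha \cdot \frac{\sigma}{\sqrt{n}}, \qquad a = \bar{x} - ME, \quad b = \bar{x} + ME$$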
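The Model 1 calculation can be sketched as follows. This is a minimal illustration, not the method used in the lecture: instead of reading $z_\alpha$ from a printed table it solves $\Phi(z_\alpha) = 1 - \alpha/2$ with `statistics.NormalDist().inv_cdf` from the Python standard library, and the input numbers (mean 50, $\sigma = 6$, $n = 9$, confidence 0.90) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Solve Phi(z) = 1 - alpha/2 for the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_mean_known_sigma(x_bar, sigma, n, confidence):
    """Model 1: normal population, known sigma. Returns (a, b, margin of error)."""
    me = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - me, x_bar + me, me

# Hypothetical inputs, chosen only to show the call.
a, b, me = ci_mean_known_sigma(x_bar=50.0, sigma=6.0, n=9, confidence=0.90)
print(f"interval = ({a:.2f}; {b:.2f}), margin of error = {me:.2f}")
```

Because the critical value is computed rather than looked up, results can differ in the last digit from hand calculations that use a rounded table value such as 1,64 or 1,96.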
Minimum sample size for the margin of error requirements

Determining the sample size

We can choose a sample size large enough to provide a desired margin of error $ME$.

$$ME = z_\alpha \cdot \frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{z_\alpha^2 \cdot \sigma^2}{ME^2}$$

This is the minimum sample size for the margin of error requirements. If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.

Model 2:
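Assumptions:
1) $X : N(\mu, \sigma)$
2) $\sigma$: the standard deviation of the population is unknown
3) $n$: the sample size is small ($n < 30$)

Confidence interval for $\mu$ is:

$$\left(\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right)$$

We write:

$$P\left(\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha, \qquad a = \bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}}, \quad b = \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}$$

where:
- $n$: sample size
- $\bar{x}$: sample mean
- $s$: sample standard deviation, computed with the divisor $n$, $s = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$, which is why $\sqrt{n-1}$ appears in the denominator
- $t_\alpha$: critical value for the confidence level $1 - \alpha$

To determine the critical value $t_\alpha$ we use the table of the Student's $t$-distribution for the measure of the risk $\alpha$ and for $(n - 1)$ degrees of freedom.

Width of the interval:

$$L = b - a = \left(\bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right) - \left(\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right) = 2 \cdot t_\alpha \cdot \frac{s}{\sqrt{n-1}}$$

Margin of error:

$$ME = \frac{L}{2} = t_\alpha \cdot \frac{s}{\sqrt{n-1}}, \qquad a = \bar{x} - ME, \quad b = \bar{x} + ME$$

Minimum sample size for the margin of error requirements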
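Determining the sample size

We can choose a sample size large enough to provide a desired margin of error $ME$.

$$ME = t_\alpha \cdot \frac{s}{\sqrt{n-1}} \quad\Rightarrow\quad n = \frac{t_\alpha^2 \cdot s^2}{ME^2} + 1$$

This is the minimum sample size for the margin of error requirements.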
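The sample-size formulas can be sketched in Python as below. The function names are hypothetical. The first call uses the numbers of the banking-operations example treated later in these notes ($t_\alpha = 3{,}250$ for 9 degrees of freedom at the 99% level, $s = \sqrt{2}$, $ME = 0{,}7$); the second call uses made-up inputs for the large-sample formula $n = z_\alpha^2 s^2 / ME^2$. The result is always rounded up.

```python
from math import ceil, sqrt

def min_sample_size_small(t_crit, s, me):
    """Model 2 (small sample, s with divisor n): n = t^2 * s^2 / ME^2 + 1, rounded up."""
    return ceil(t_crit ** 2 * s ** 2 / me ** 2 + 1)

def min_sample_size_normal(z_crit, sd, me):
    """Models 1 and 3: n = z^2 * sd^2 / ME^2, rounded up."""
    return ceil(z_crit ** 2 * sd ** 2 / me ** 2)

n_bank = min_sample_size_small(t_crit=3.250, s=sqrt(2), me=0.7)
n_demo = min_sample_size_normal(z_crit=1.96, sd=1.8, me=0.25)  # hypothetical inputs

print(n_bank, n_demo)
```

One caveat of this sketch: the critical value $t_\alpha$ itself depends on $n$ through the degrees of freedom, so reusing the value for the original sample size is an approximation.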
If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.

Model 3:
Assumptions:
1) $X : N(\mu, \sigma)$ or any other distribution with mean $\mu$ and standard deviation $\sigma$
2) $\sigma$: the standard deviation of the population is unknown
3) $n$: the sample size is big ($n \ge 30$)

Confidence interval for $\mu$ is:

$$\left(\bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}}\right)$$

We write:

$$P\left(\bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha, \qquad a = \bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}}, \quad b = \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}}$$

where:
- $n$: sample size
- $\bar{x}$: sample mean
- $s$: sample standard deviation
- $z_\alpha$: critical value for the confidence level $1 - \alpha$

To determine the critical value $z_\alpha$ we use the standard statistical table for $N(0, 1)$ using the formula:

$$\Phi(z_\alpha) = 1 - \frac{\alpha}{2}$$

where $\Phi$ is the cumulative distribution function of the variable $Z : N(0, 1)$.

Width of the interval:

$$L = b - a = 2 \cdot z_\alpha \cdot \frac{s}{\sqrt{n}}$$

Margin of error:

$$ME = \frac{L}{2} = z_\alpha \cdot \frac{s}{\sqrt{n}}, \qquad a = \bar{x} - ME, \quad b = \bar{x} + ME$$

Minimum sample size for the margin of error requirements

Determining the sample size

We can choose a sample size large enough to provide a desired margin of error $ME$.

$$ME = z_\alpha \cdot \frac{s}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{z_\alpha^2 \cdot s^2}{ME^2}$$

This is the minimum sample size for the margin of error requirements. If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.

Main idea of the confidence interval:
sample statistic ± margin of error
sample statistic ± multiplier ∙ standard error
sample statistic ± critical value ∙ standard error

$$\theta \in (\hat{\theta} - ME,\ \hat{\theta} + ME)$$

Case Problem:
In order to estimate the average monthly expenditure of KU students on sweets and coffee, we selected at random a sample of nine students and obtained the average value $\bar{x}$ = … PLN.
Assuming that the spending on sweets and coffee has a normal distribution $X : N(\mu, \sigma)$ with known $\sigma$, estimate the average amount of the expenses for all KU students, taking into account the confidence level $1 - \alpha = 0{,}90$.

Solution:

data:
- $X$: monthly expenditure of students on sweets and coffee (random variable)
- $n = 9$: sample size
- $\bar{x}$ = … PLN: sample mean
- $X : N(\mu, \sigma)$, $\sigma$ = … PLN
- $\mu$: the mean of the monthly expenditure of students on sweets and coffee
- $1 - \alpha = 0{,}90$: confidence level

We choose Model 1, because $\sigma$ is given:

$$P\left(\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

Critical value $z_\alpha$ for the confidence level $1 - \alpha = 0{,}90$: $1 - \alpha = 0{,}90 \Rightarrow \alpha = 0{,}10$.
To determine the critical value $z_\alpha$ we use the standard statistical table for $Z : N(0, 1)$ using the formula:

$$\Phi(z_\alpha) = 1 - \frac{\alpha}{2} = 1 - \frac{0{,}10}{2} = 1 - 0{,}05 = 0{,}95$$

We look for this area inside the table of the cumulative distribution function $\Phi$. The table does not contain $0{,}95$ exactly:
- $\Phi(1{,}64) = 0{,}9495$ is $0{,}0005$ less, which gives $z_\alpha = 1{,}6 + 0{,}04 = 1{,}64$,
- $\Phi(1{,}65) = 0{,}9505$ is $0{,}0005$ more, which gives $z_\alpha = 1{,}6 + 0{,}05 = 1{,}65$.

Let us take one of these values as $z_\alpha$. With $\sqrt{n} = \sqrt{9} = 3$:

$$a = \bar{x} - z_\alpha \cdot \frac{\sigma}{3}, \qquad b = \bar{x} + z_\alpha \cdot \frac{\sigma}{3}, \qquad \mu \in (a;\ b) \text{ PLN}$$

Margin of error:

$$ME = z_\alpha \cdot \frac{\sigma}{\sqrt{n}}, \qquad a = \bar{x} - ME, \quad b = \bar{x} + ME$$

The confidence interval $(a;\ b)$ PLN covers the mean of the monthly expenditure of all students on sweets and coffee with probability $0{,}90$ (with credibility 90%, with 10% risk of error).
The margin of error of this estimate is $ME$ PLN.

Case Problem:
Estimate the average seniority of workers employed in the electronics industry in Poland.
A sample of 100 people was selected from the population of all employees and revealed the following data:

| Seniority (years) | Number of employees |
|---|---|
| 0–2 | 4 |
| 2–4 | 10 |
| 4–6 | 55 |
| 6–8 | 25 |
| 8–10 | 6 |

At the confidence level $1 - \alpha = 0{,}98$ find the confidence interval for the mean seniority for all employees.

Solution:

data:
- $X$: seniority of workers employed in the electronics industry
- $n = 100$: sample size
- $X : \,???(\mu, \sigma)$: the distribution is unknown
- $\mu$: the mean of the seniority of workers
- $1 - \alpha = 0{,}98$: confidence level

We choose Model 3:

$$P\left(\bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

We must first evaluate the sample parameters $\bar{x}$ and $s$ from the class midpoints $\dot{x}_i$ and the class counts $n_i$:

$$\bar{x} = \frac{\sum \dot{x}_i \cdot n_i}{n}, \qquad s = \sqrt{\frac{\sum (\dot{x}_i - \bar{x})^2 \cdot n_i}{n}}$$

Auxiliary calculations (with $\bar{x} = 5{,}4$):

| Class | $n_i$ | $\dot{x}_i$ | $\dot{x}_i \cdot n_i$ | $\dot{x}_i - \bar{x}$ | $(\dot{x}_i - \bar{x})^2$ | $(\dot{x}_i - \bar{x})^2 \cdot n_i$ |
|---|---|---|---|---|---|---|
| 0–2 | 4 | 1 | 4 | -4,4 | 19,36 | 77,44 |
| 2–4 | 10 | 3 | 30 | -2,4 | 5,76 | 57,60 |
| 4–6 | 55 | 5 | 275 | -0,4 | 0,16 | 8,80 |
| 6–8 | 25 | 7 | 175 | 1,6 | 2,56 | 64,00 |
| 8–10 | 6 | 9 | 54 | 3,6 | 12,96 | 77,76 |
| ∑ | 100 | | 538 | | | 285,60 |

$$\bar{x} = \frac{538}{100} = 5{,}38 \approx 5{,}4 \text{ years}$$

$$s = \sqrt{\frac{285{,}60}{100}} = \sqrt{2{,}856} = 1{,}69 \text{ years}$$

To determine the critical value $z_\alpha$ we use the standard statistical table for $Z : N(0, 1)$ using the formula $\Phi(z_\alpha) = 1 - \frac{\alpha}{2}$:

$$1 - \alpha = 0{,}98 \Rightarrow \alpha = 0{,}02, \qquad \Phi(z_\alpha) = 1 - \frac{0{,}02}{2} = 1 - 0{,}01 = 0{,}99$$

We look for this area inside the table. The closest entry is $\Phi(2{,}33) = 0{,}9901$ (it is $0{,}0001$ more), so $z_\alpha = 2{,}3 + 0{,}03 = 2{,}33$.

$$a = \bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}} = 5{,}4 - 2{,}33 \cdot \frac{1{,}69}{\sqrt{100}} = 5{,}4 - 0{,}39 = 5{,}01$$

$$b = \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}} = 5{,}4 + 2{,}33 \cdot \frac{1{,}69}{\sqrt{100}} = 5{,}4 + 0{,}39 = 5{,}79$$

Margin of error:

$$ME = z_\alpha \cdot \frac{s}{\sqrt{n}} = 2{,}33 \cdot \frac{1{,}69}{10} = 0{,}39, \qquad a = \bar{x} - ME = 5{,}01, \quad b = \bar{x} + ME = 5{,}79$$

For $1 - \alpha = 0{,}98$: $a = 5{,}01$, $b = 5{,}79$, $\mu \in (5{,}01;\ 5{,}79)$ years.

The confidence interval for the mean seniority is $5{,}4 \pm 0{,}39$ years with 98% credibility.

Relative error of estimation, with the estimated value $\bar{x} = 5{,}4$ years:

$$RE = \frac{ME}{\bar{x}} = \frac{0{,}39}{5{,}4} = 0{,}07, \qquad RE\% = 7\%$$

The confidence interval $(5{,}01;\ 5{,}79)$ years covers the mean of the seniority of all workers with probability $0{,}98$ (with credibility 98%).
Margin error of this estimate is 0,39 years.
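The grouped-data calculation of this case problem can be cross-checked with a short Python sketch. It is an illustration under stated assumptions: the critical value is computed with `statistics.NormalDist().inv_cdf` instead of being read from a table, and the mean is not rounded to 5,4 before the deviations are taken, so the printed figures differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

# Class midpoints and counts from the seniority table of this case problem.
midpoints = [1, 3, 5, 7, 9]
counts = [4, 10, 55, 25, 6]

n = sum(counts)
x_bar = sum(m * c for m, c in zip(midpoints, counts)) / n
s = sqrt(sum(c * (m - x_bar) ** 2 for m, c in zip(midpoints, counts)) / n)

confidence = 0.98
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # solves Phi(z) = 0.99
me = z * s / sqrt(n)                                # Model 3 margin of error
a, b = x_bar - me, x_bar + me
re_percent = 100 * me / x_bar

print(f"mean = {x_bar:.2f}, s = {s:.2f}, z = {z:.2f}")
print(f"interval = ({a:.2f}; {b:.2f}), ME = {me:.2f}, RE = {re_percent:.0f}%")
```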
Relative error of this estimate is 7%.

Case Problem:

- $X$: the duration of the banking operations of the given type (in minutes)
- $n$: number of banking operations

Data set: $X$: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9

It is known that the duration of the banking operations is a normal random variable. We will estimate the mean duration of all such operations (of the given type) with credibility 95%.

data:
- $X$: duration of the banking operations (a random variable)
- $n = 10$: sample size (the sample is small)
- $X$: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (minutes): observations
- $X : N(\mu, \sigma)$
- $\mu$: the mean duration of all banking operations of the given type
- 95% credibility: $1 - \alpha = 0{,}95$

We choose Model 2:

$$P\left(\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha = 0{,}95$$

The sample mean $\bar{x}$, the sample standard deviation $s$ and the critical value $t_\alpha$ must be determined.

Sample parameters:

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{1}{10}(5 + 6 + \dots + 9) = \frac{70}{10} = 7 \text{ min.}$$

$$s = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} = \sqrt{\frac{1}{10}\left[(5-7)^2 + (6-7)^2 + (7-7)^2 + \dots + (9-7)^2\right]} = \sqrt{\frac{1}{10}\left[4 + 1 + 0 + \dots + 4\right]} = \sqrt{\frac{20}{10}} = \sqrt{2} \approx 1{,}41 \text{ min.}$$

Critical value $t_\alpha$ for the confidence level $1 - \alpha = 0{,}95$: $\alpha = 0{,}05$. From the Student's $t$-distribution table for $\alpha = 0{,}05$ and $n - 1 = 9$ degrees of freedom, $t_\alpha \approx 2{,}262$.

$$a = \bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 7 - 2{,}262 \cdot \frac{1{,}41}{\sqrt{9}} = 7 - 2{,}262 \cdot 0{,}47 = 7 - 1{,}06 = 5{,}94$$

$$b = \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 7 + 2{,}262 \cdot \frac{1{,}41}{\sqrt{9}} = 7 + 2{,}262 \cdot 0{,}47 = 7 + 1{,}06 = 8{,}06$$

For $1 - \alpha = 0{,}95$: $a = 5{,}94$, $b = 8{,}06$, $\mu \in (5{,}94;\ 8{,}06)$ min.

The numerical confidence interval $(5{,}94;\ 8{,}06)$ minutes covers the mean duration of all banking operations of the given type with probability $0{,}95$ and with margin of error $1{,}06$ minutes.

Case Problem: continuation – part 2

- $X$: the duration of the banking operations of the given type (in minutes), a random variable
- $X$: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (min.): observations
- $X : N(\mu, \sigma)$
- $n = 10$: sample size (small)
- $\mu$: the mean of the duration of all banking operations
Now we change the confidence level: $1 - \alpha = 0{,}99$. The critical value $t_\alpha$ for the confidence level $1 - \alpha = 0{,}99$ will be different.

$1 - \alpha = 0{,}99 \Rightarrow \alpha = 0{,}01$. From the Student's $t$-distribution table for $\alpha = 0{,}01$ and $n - 1 = 9$ degrees of freedom, $t_\alpha = 3{,}250$.

$$a = \bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 7 - 3{,}250 \cdot \frac{1{,}41}{\sqrt{9}} = 7 - 1{,}53 = 5{,}47$$

$$b = \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 7 + 3{,}250 \cdot \frac{1{,}41}{\sqrt{9}} = 7 + 1{,}53 = 8{,}53$$

For $1 - \alpha = 0{,}99$: $a = 5{,}47$, $b = 8{,}53$, $\mu \in (5{,}47;\ 8{,}53)$ min.

The numerical confidence interval $(5{,}47;\ 8{,}53)$ minutes covers the mean of the duration of all banking operations with probability $0{,}99$, with maximum error $ME = 1{,}53$ min.

Comparison:

| $1 - \alpha$ | $a$ | $b$ | $ME = \frac{b - a}{2}$ |
|---|---|---|---|
| 0,95 | 5,94 | 8,06 | 1,06 |
| 0,99 | 5,47 | 8,53 | 1,53 |

The question: which estimation is better?
We should check the margin of error value. We will do it once again.

In Model 2 the margin of error is

$$ME = \frac{b - a}{2} = t_\alpha \cdot \frac{s}{\sqrt{n-1}}$$

1) first estimation: $1 - \alpha = 0{,}95$, $t_\alpha = 2{,}262$

$$ME = 2{,}262 \cdot \frac{1{,}41}{\sqrt{9}} = 2{,}262 \cdot 0{,}47 = 1{,}06 \text{ min.}$$

2) second estimation: $1 - \alpha = 0{,}99$, $t_\alpha = 3{,}250$

$$ME = 3{,}250 \cdot \frac{1{,}41}{\sqrt{9}} = 3{,}250 \cdot 0{,}47 = 1{,}53 \text{ min.}$$

Conclusion:
The margin of error is bigger in the case of the second estimation, so the first estimation is more precise (although it is made at a lower confidence level).
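The two estimations can be reproduced with a small Python sketch. The function name is hypothetical, and because the standard library has no Student's $t$ quantile function, the critical values for 9 degrees of freedom are entered by hand from a standard $t$-table (2,262 and 3,250). Intermediate values are not rounded, so the margins come out as about 1,07 and 1,53 rather than the hand-rounded 1,06 and 1,53.

```python
from math import sqrt

# Two-sided critical values of Student's t for 9 degrees of freedom,
# copied from a standard t-table (assumption: entered by hand).
T_CRIT_DF9 = {0.95: 2.262, 0.99: 3.250}

def ci_mean_small_sample(xs, t_crit):
    """Model 2: normal population, unknown sigma, small sample, s with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    me = t_crit * s / sqrt(n - 1)
    return x_bar - me, x_bar + me, me

data = [5, 6, 7, 5, 8, 7, 8, 9, 6, 9]  # durations in minutes
results = {level: ci_mean_small_sample(data, t) for level, t in T_CRIT_DF9.items()}

for level, (a, b, me) in results.items():
    print(f"1 - alpha = {level}: ({a:.2f}; {b:.2f}), ME = {me:.2f} min")
```

Raising the confidence level from 0.95 to 0.99 widens the interval: more confidence is paid for with less precision.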
We determine the relative error:
- $ME$: absolute error of estimation (margin of error)
- $RE$: relative error of estimation, $RE = \frac{ME}{\text{estimated value}}$

1) first estimation: $1 - \alpha = 0{,}95$, $ME = 1{,}06$ min., estimated value $\bar{x} = 7$ min.

$$RE = \frac{ME}{\bar{x}} = \frac{1{,}06}{7} = 0{,}15, \qquad RE\% = 15\%$$

2) second estimation: $1 - \alpha = 0{,}99$, $ME = 1{,}53$ min., estimated value $\bar{x} = 7$ min.

$$RE = \frac{ME}{\bar{x}} = \frac{1{,}53}{7} = 0{,}22, \qquad RE\% = 22\%$$

$$15\% < 22\%$$

In each case the relative error is above the limit for a good estimation, which means that our estimations are not very good (but the first is better).
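As a quick check of the two percentages, a minimal sketch (the function name is hypothetical; the inputs are the rounded margins of error and the sample mean from the calculation above):

```python
def relative_error_percent(margin_of_error, estimated_value):
    """Relative error of estimation expressed in percent."""
    return 100 * margin_of_error / estimated_value

re_95 = relative_error_percent(1.06, 7)  # first estimation, 1 - alpha = 0.95
re_99 = relative_error_percent(1.53, 7)  # second estimation, 1 - alpha = 0.99

print(f"RE(95%) = {re_95:.1f}%, RE(99%) = {re_99:.1f}%")
```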
Example: continuation – part 3

- $X$: the duration of the banking operations of the given type (in minutes), a random variable
- $X$: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (min.): observations
- $X : N(\mu, \sigma)$
- $n = 10$: sample size (small)
- $\mu$: the mean of the duration of all banking operations

Now the question is:
By how much do we have to increase the sample size (how many additional experiments do we have to conduct) to estimate the average duration of banking operations with credibility 99% and with margin of error 0,7 minutes?

$$1 - \alpha = 0{,}99 \Rightarrow \alpha = 0{,}01, \qquad t_\alpha = 3{,}250, \qquad ME = 0{,}7$$

We must follow the formula for the sample size necessary to satisfy the margin of error requirements:

$$n = \frac{t_\alpha^2 \cdot s^2}{ME^2} + 1$$

With $s^2 = 2$ (since $s = \sqrt{2} \approx 1{,}41$):

$$n = \frac{(3{,}250)^2 \cdot 2}{(0{,}7)^2} + 1 = \frac{10{,}5625 \cdot 2}{0{,}49} + 1 = \frac{21{,}125}{0{,}49} + 1 = 43{,}11 + 1 = 44{,}11 \approx 45$$

We must always round this value up (it is better to have more information than less).

So we must carry out 35 more experiments (we had 10 experiments at the beginning, 45 – 10 = 35) to obtain a margin of error not bigger than 0,7 min. with credibility 99%.

Case Problem:
Julia enjoys jogging. She has been jogging over a period of several years, during which time her physical condition has remained constantly good. Usually she jogs 2 miles/day.
During the past year Julia has sometimes recorded the time required to run 2 miles. She has a sample of 90 of these times. For these 90 times the sample mean was 15,6 minutes with the standard deviation 1,8 minutes.
Let $\mu$ be the mean jogging time for the entire distribution of Julia's 2-mile running times (taken over the past year).
Find a 95% confidence interval for $\mu$.

Solution:

data:
- $X$: jogging time for the entire distribution of Julia's 2-mile running times
- $\mu$: the mean of jogging time
- $X : \,???(\mu, \sigma)$: the distribution is unknown
- $n = 90$, $\bar{x} = 15{,}6$ min., $s = 1{,}8$ min.
- $1 - \alpha = 0{,}95$: confidence level

We choose Model 3:

$$P\left(\bar{x} - z_\alpha \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_\alpha \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

We must look...
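A minimal sketch of the Model 3 calculation for these data, assuming the critical value is obtained from `statistics.NormalDist().inv_cdf` rather than from a printed table (the function name is hypothetical, and this is not the worked solution from the lecture):

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_large_sample(x_bar, s, n, confidence):
    """Model 3: unknown sigma, large sample (n >= 30). Returns (a, b, margin of error)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # solves Phi(z) = 1 - alpha/2
    me = z * s / sqrt(n)
    return x_bar - me, x_bar + me, me

# Julia's jogging times: n = 90, mean 15.6 min, standard deviation 1.8 min.
a, b, me = ci_mean_large_sample(x_bar=15.6, s=1.8, n=90, confidence=0.95)
print(f"interval = ({a:.2f}; {b:.2f}) min, margin of error = {me:.2f} min")
```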
