
# Appl. of Math. Stat. 2019 (6).pdf - Application of...


Application of Mathematical Statistics (6), Iwona Nowakowska

## Statistical inference

**Estimation** is the calculated approximation of an unknown value, even if the input data may be incomplete or uncertain.

- **Point estimation** uses sample data to calculate a single value, a "point" (known as a statistic), which serves as a "best estimate" of an unknown population parameter.
- **Interval estimation** uses sample data to calculate an "interval" of all possible (or probable) values of an unknown population parameter.

Notation:

- $\theta$: unknown population parameter (for example the mean, the standard deviation, or the proportion)
- $\hat{\theta}$: estimate value
- $\hat{\Theta}$: estimator

An **estimator** is a function of the observations in the sample: a "statistic" whose values are used to estimate the unknown value of the parameter $\theta$. An estimator is a statistic, and a statistic is a random variable. So it can never be said with certainty that an estimate value is close to the true value of a parameter of the distribution.

We use sample statistics to estimate population parameters. A **sample statistic** is a descriptive measure, a sample characteristic such as a sample mean $\bar{x}$, a sample standard deviation $s$, a sample proportion $\hat{p}$, and so on. The value of the sample statistic is used to estimate the value of the corresponding population parameter. For example, the sample mean $\bar{x}$ is used to estimate the population mean $\mu$, and the sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$.

Notations:

| Parameter | Sample | Population |
|---|---|---|
| the mean | $\bar{x}$ | $\mu$ |
| the standard deviation | $s$ | $\sigma$ |
| the variance | $s^2$ | $\sigma^2$ |
| the proportion | $\hat{p}$ | $p$ |

The properties of a good estimator:

- unbiased
- efficient
- consistent
- sufficient

## Point estimation

Here $\theta$ is the unknown parameter, $\hat{\theta}$ the estimate value, and $\hat{\Theta}$ the estimator.

| Point estimator | Point estimate |
|---|---|
| statistic | number |
| formula | calculated value |

For one sample, we use the sample mean $\bar{x}$ as the point estimator of the population mean $\mu$. We use the sample standard deviation $s$ as the point estimator of the population standard deviation $\sigma$.
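We use the sample proportion $\hat{p}$ as the point estimator of the population proportion $p$.

The **standard error** in point estimation is the standard deviation of the point estimator. The **relative error** in point estimation is the standard error divided by the value of the estimated parameter obtained from the sample:

$$V(\hat{\Theta}) = \frac{D(\hat{\Theta})}{\text{e.v.}}$$

where $D(\hat{\Theta})$ is the standard error of estimation, $V(\hat{\Theta})$ is the relative error of estimation, and e.v. is the estimated value, that is, the value of the estimated parameter obtained from the sample (for example $\bar{x}$, $s$, $\hat{p}$).

| Parameter | Point estimator | Standard error | Relative error |
|---|---|---|---|
| $\mu$ | $\bar{x}$ | $D(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$, or $\dfrac{s}{\sqrt{n-1}} = \dfrac{\hat{s}}{\sqrt{n}}$ if $n$ is small ($n < 30$), or $\dfrac{s}{\sqrt{n}}$ if $n$ is big ($n \ge 30$) | $V(\bar{x}) = \dfrac{D(\bar{x})}{\bar{x}}$ |
| $p$ | $\hat{p}$ | $D(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ | $V(\hat{p}) = \dfrac{D(\hat{p})}{\hat{p}}$ |
| $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $D(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$, or $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, or $\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ | $V(\bar{x}_1 - \bar{x}_2) = \dfrac{D(\bar{x}_1 - \bar{x}_2)}{\bar{x}_1 - \bar{x}_2}$ |
| $p_1 - p_2$ | $\hat{p}_1 - \hat{p}_2$ | $D(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ | $V(\hat{p}_1 - \hat{p}_2) = \dfrac{D(\hat{p}_1 - \hat{p}_2)}{\hat{p}_1 - \hat{p}_2}$ |

where

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{s} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2},$$

and the "pooled variance" is

$$s_p^2 = \frac{(n_1 - 1)\,\hat{s}_1^2 + (n_2 - 1)\,\hat{s}_2^2}{n_1 + n_2 - 2}.$$

## Interval estimation

Interval estimation is the use of sample data to calculate an interval of possible values of an unknown population parameter. Neyman (1937) introduced the first interval estimation. The most prevalent form of interval estimation is the **confidence interval**.

A confidence interval consists of numbers. It gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

The common notation for the parameter in question is $\theta$: $\theta$ is the parameter, $\hat{\theta}$ the estimate value, and $\hat{\Theta}$ the estimator.

Again, we use sample statistics to estimate population parameters. The sample mean $\bar{x}$ is used to estimate the population mean $\mu$, the sample standard deviation $s$ is used to estimate the population standard deviation $\sigma$, and the sample proportion $\hat{p}$ is used to estimate the population proportion $p$.

A confidence interval is defined by two numbers, between which a population parameter is said to lie.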
The two ends of the interval are written $a(\hat{\Theta})$ and $b(\hat{\Theta})$.

For example, the interval $(a, b)$ is the interval estimate of the population mean $\mu$. It indicates that the population mean is greater than $a$ but less than $b$. The ends of the confidence interval always depend on the estimator $\hat{\Theta}$, which is why we write $a(\hat{\Theta})$ and $b(\hat{\Theta})$.

We use a confidence interval to express the precision and uncertainty associated with a particular sampling method. A confidence interval consists of:

- a confidence level
- a statistic
- a critical value

The **confidence level**:

- describes the uncertainty of a sampling method,
- describes how strongly we believe that a particular sampling method will produce a confidence interval that includes the true population parameter,
- measures the reliability of an estimate.

The confidence level $1 - \alpha$ is the probability that the unknown value of the population parameter belongs to the confidence interval. Theoretically, the confidence level $1 - \alpha$ can be any value between 0 and 1, but usually it is one of a few conventional values such as 0.90, 0.95, 0.98, or 0.99.

$(1 - \alpha) \cdot 100\%$ is the confidence level in percent, for example $0.95 \cdot 100\% = 95\%$, a 95% confidence level ("trust", "credibility").

A 95% confidence level means that, if the sampling were repeated many times, about 95% of the resulting intervals would contain the true population parameter. A 90% confidence level means that about 90% of the intervals would contain the population parameter, and so on.

If $1 - \alpha = 0.95$, then $\alpha = 0.05$. Here $\alpha$ is the measure of the risk of making an error, and $\alpha \cdot 100\%$ is the risk in percent: $0.05 \cdot 100\% = 5\%$ risk of error.

We are much more likely to accept survey findings if the confidence level is high (say, 95%) than if it is low (say, 60%).

In each case of estimation we must determine the critical value $z_{\alpha}$.
The critical value $z_{\alpha}$ is the number such that the area under the standard normal curve falling between $-z_{\alpha}$ and $z_{\alpha}$ is equal to the confidence level $1 - \alpha$.

The area under the normal curve from $-z_{\alpha}$ to $z_{\alpha}$ is the probability that the standardized normal variable $Z$ lies in the interval $(-z_{\alpha}, z_{\alpha})$. This means:

$$P(-z_{\alpha} < Z < z_{\alpha}) = 1 - \alpha$$

### The margin of error

Absolute error of estimation:

- $a$, $b$: the ends of the confidence interval ($a$ is the lower cutpoint, $b$ is the upper cutpoint)
- $l = b - a$: width (length) of the interval
- $d = \dfrac{l}{2} = \dfrac{b - a}{2}$: margin of error (maximum error of estimate)

The margin of error is defined as half of the width of the confidence interval. A larger sample size produces a smaller margin of error.

Relative error of estimation:

- $d$: absolute error of estimation, the margin of error (ME)
- $d^{*}$: relative error of estimation (RE)

$$d^{*} = \frac{d}{\text{e.v.}}, \qquad d^{*}\% = d^{*} \cdot 100\%$$

where e.v. is the estimated value, that is, the value of the estimated parameter obtained from the sample (for example $\bar{x}$, $s$, $\hat{p}$).

- If $d^{*}\%$ (RE) is less than or equal to …%, then our estimation is very good.
- If $d^{*}\%$ (RE) is between …% and …%, then the estimation is quite good.
- If $d^{*}\%$ (RE) is greater than …%, then our estimation is not very good, and we should be careful with our conclusions concerning the estimation.

### Confidence intervals for the mean of the population

#### Model 1

Assumptions:

1. $X \sim N(\mu, \sigma)$
2. $\sigma$, the standard deviation of the population, is known

The confidence interval for $\mu$ is:

$$P\left(\bar{x} - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

with ends

$$a = \bar{x} - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}, \qquad b = \bar{x} + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}$$

where:

- $n$: sample size
- $\bar{x}$: sample mean
- $\sigma$: standard deviation of the population
- $z_{\alpha}$: critical value for the confidence level $1 - \alpha$

To determine the critical value $z_{\alpha}$ we use the standard statistical table for $N(0, 1)$ and the formula

$$\Phi(z_{\alpha}) = 1 - \frac{\alpha}{2}$$

where $\Phi$ is the cumulative distribution function of the variable $Z \sim N(0, 1)$.

Width of the interval:

$$l = b - a = \left(\bar{x} + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}\right) = 2 \cdot z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}$$

Margin of error:

$$d = \frac{l}{2} = z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}, \qquad a = \bar{x} - d, \qquad b = \bar{x} + d$$

**Minimum sample size for the margin of error requirements.** We can choose a sample size large enough to provide a desired margin of error $d$:

$$d = z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} \quad \Rightarrow \quad n = \frac{z_{\alpha}^2 \, \sigma^2}{d^2}$$

If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.

#### Model 2

Assumptions:

1. $X \sim N(\mu, \sigma)$
2. $\sigma$, the standard deviation of the population, is unknown
3. the sample size $n$ is small ($n < 30$)

The confidence interval for $\mu$ is:

$$P\left(\bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha$$

with ends

$$a = \bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}, \qquad b = \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}$$

where:

- $n$: sample size
- $\bar{x}$: sample mean
- $s$: sample standard deviation
- $t_{\alpha}$: critical value for the confidence level $1 - \alpha$

To determine the critical value $t_{\alpha}$ we use the table of the Student's $t$-distribution for the measure of the risk $\alpha$ and for $(n - 1)$ degrees of freedom.

Width of the interval:

$$l = b - a = 2 \cdot t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}$$

Margin of error:

$$d = \frac{l}{2} = t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}, \qquad a = \bar{x} - d, \qquad b = \bar{x} + d$$

**Minimum sample size for the margin of error requirements.** We can choose a sample size large enough to provide a desired margin of error $d$:

$$d = t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} \quad \Rightarrow \quad n = \frac{t_{\alpha}^2 \, s^2}{d^2} + 1$$

If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.
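The Model 1 formulas above can be sketched in Python. This is a minimal illustration, not part of the lecture: the function names and the numeric inputs are hypothetical, and the critical value is computed with the standard library's `statistics.NormalDist` instead of being read from a printed table.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z with Phi(z) = 1 - alpha / 2."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def model1_interval(mean, sigma, n, confidence):
    """Model 1 (sigma known): return (a, b, d) with d = z * sigma / sqrt(n)."""
    d = z_critical(confidence) * sigma / sqrt(n)
    return mean - d, mean + d, d


def model1_min_sample_size(sigma, d, confidence):
    """Smallest n with margin of error at most d: n = z^2 sigma^2 / d^2, rounded up."""
    z = z_critical(confidence)
    return ceil((z * sigma / d) ** 2)


# Hypothetical inputs, chosen only for illustration (not taken from the lecture).
a, b, d = model1_interval(mean=50.0, sigma=6.0, n=36, confidence=0.95)
n_min = model1_min_sample_size(sigma=6.0, d=1.0, confidence=0.95)
print(f"interval: ({a:.2f}, {b:.2f}), margin of error: {d:.2f}, minimum n: {n_min}")
```

For Model 2 the same structure applies with $t_{\alpha}$ in place of $z_{\alpha}$ and $\sqrt{n-1}$ in place of $\sqrt{n}$; the standard library has no Student's $t$ quantile function, so $t_{\alpha}$ would have to come from a table or a third-party package.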
#### Model 3

Assumptions:

1. $X \sim N(\mu, \sigma)$, or any other distribution with mean $\mu$ and standard deviation $\sigma$
2. $\sigma$, the standard deviation of the population, is unknown
3. the sample size $n$ is big ($n \ge 30$)

The confidence interval for $\mu$ is:

$$P\left(\bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

with ends

$$a = \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}}, \qquad b = \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$

where:

- $n$: sample size
- $\bar{x}$: sample mean
- $s$: sample standard deviation
- $z_{\alpha}$: critical value for the confidence level $1 - \alpha$

To determine the critical value $z_{\alpha}$ we use the standard statistical table for $N(0, 1)$ and the formula

$$\Phi(z_{\alpha}) = 1 - \frac{\alpha}{2}$$

where $\Phi$ is the cumulative distribution function of the variable $Z \sim N(0, 1)$.

Width of the interval:

$$l = b - a = 2 \cdot z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$

Margin of error:

$$d = \frac{l}{2} = z_{\alpha} \cdot \frac{s}{\sqrt{n}}, \qquad a = \bar{x} - d, \qquad b = \bar{x} + d$$

**Minimum sample size for the margin of error requirements.** We can choose a sample size large enough to provide a desired margin of error:

$$d = z_{\alpha} \cdot \frac{s}{\sqrt{n}} \quad \Rightarrow \quad n = \frac{z_{\alpha}^2 \, s^2}{d^2}$$

If a desired margin of error is selected prior to sampling, the formula above can be used to determine the sample size necessary to satisfy the margin of error requirements.

### Main idea of the confidence interval

- sample statistic ± margin of error
- sample statistic ± multiplier · standard error
- sample statistic ± critical value · standard error
- $\bar{x} \pm d$, that is, $\mu \in (\bar{x} - d, \ \bar{x} + d)$

### Case problem

In order to estimate the average monthly expenditure of KU students on sweets and coffee, we selected at random a sample of nine students and obtained the average value $\bar{x} = $ … PLN. Assuming that the spending on sweets and coffee has a normal distribution $X \sim N(\mu, \ldots)$, estimate the average amount of the expenses for all KU students at the confidence level ….
**Solution.**

Data:

- $X$: monthly expenditure of students on sweets and coffee, a random variable
- $n = 9$: sample size
- $\bar{x} = $ … PLN: sample mean
- $X \sim N(\mu, \ldots)$, $\sigma = $ … PLN
- $\mu$: the mean of the monthly expenditure of students on sweets and coffee
- $1 - \alpha = $ …: confidence level

We choose Model 1, because $\sigma$ is given:

$$P\left(\bar{x} - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

The critical value $z_{\alpha}$ for the confidence level $1 - \alpha = $ … is still to be found. To determine it we use the standard statistical table for $Z \sim N(0, 1)$ and the formula $\Phi(z_{\alpha}) = 1 - \frac{\alpha}{2}$, where $\Phi$ is the cumulative distribution function of the normal variable $Z \sim N(0, 1)$. We look for the area $1 - \frac{\alpha}{2} = $ … inside the table. The exact value is not in the table: one neighbouring entry is 0.0005 less than the required area and the next one is 0.0005 more, which gives two candidate values of $z_{\alpha}$. Let us take $z_{\alpha} = $ ….

The ends of the interval are

$$a = \bar{x} - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} = \ldots, \qquad b = \bar{x} + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} = \ldots$$

so $\mu \in (\ldots ; \ldots)$ PLN.

Margin of error:

$$d = z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}} = \ldots, \qquad a = \bar{x} - d, \qquad b = \bar{x} + d$$

The confidence interval (… ; …) PLN covers the mean of the monthly expenditure of all students on sweets and coffee with probability … (with credibility …%, with …% risk of error). The margin of error of this estimate is … PLN.

### Case problem

Estimate the average seniority of workers employed in the electronics industry in Poland. A sample of 100 people was selected from the population of all employees and revealed the following data:

| Seniority (years) | 0–2 | 2–4 | 4–6 | 6–8 | 8–10 |
|---|---|---|---|---|---|
| Number of employees | 4 | 10 | 55 | 25 | 6 |

At the confidence level $1 - \alpha = 0.98$, find the confidence interval for the mean seniority of all employees.

**Solution.**

Data:

- $X$: seniority of workers employed in the electronics industry
- $n = 100$: sample size
- $X \sim \, ???(\mu, \sigma)$: the distribution is unknown
- $\mu$: the mean of the seniority of workers
- $1 - \alpha = 0.98$: confidence level

We choose Model 3:

$$P\left(\bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

We must first evaluate the sample parameters $\bar{x}$ and $s$ from the grouped data, using the class midpoints $\dot{x}_i$ and the class counts $n_i$:

$$\bar{x} = \frac{1}{n}\sum \dot{x}_i \cdot n_i, \qquad s = \sqrt{\frac{1}{n}\sum (\dot{x}_i - \bar{x})^2 \cdot n_i}$$

Auxiliary calculations (with $\bar{x} \approx 5.4$):

| Class | $n_i$ | $\dot{x}_i$ | $\dot{x}_i \cdot n_i$ | $\dot{x}_i - \bar{x}$ | $(\dot{x}_i - \bar{x})^2$ | $(\dot{x}_i - \bar{x})^2 \cdot n_i$ |
|---|---|---|---|---|---|---|
| 0–2 | 4 | 1 | 4 | -4.4 | 19.36 | 77.44 |
| 2–4 | 10 | 3 | 30 | -2.4 | 5.76 | 57.60 |
| 4–6 | 55 | 5 | 275 | -0.4 | 0.16 | 8.80 |
| 6–8 | 25 | 7 | 175 | 1.6 | 2.56 | 64.00 |
| 8–10 | 6 | 9 | 54 | 3.6 | 12.96 | 77.76 |
| $\sum$ | 100 | | 538 | | | 285.60 |

$$\bar{x} = \frac{538}{100} = 5.38 \approx 5.4, \qquad s = \sqrt{\frac{285.60}{100}} = \sqrt{2.856} = 1.69$$

To determine the critical value $z_{\alpha}$ we use the standard statistical table for $Z \sim N(0, 1)$ and the formula $\Phi(z_{\alpha}) = 1 - \frac{\alpha}{2}$:

$$1 - \alpha = 0.98 \ \Rightarrow \ \alpha = 0.02, \qquad \Phi(z_{\alpha}) = 1 - \frac{0.02}{2} = 1 - 0.01 = 0.99$$

We look for this area inside the table. The closest entry is 0.9901 (0.0001 more than required), in row 2.3 and column 0.03, so $z_{\alpha} = 2.3 + 0.03 = 2.33$.

$$a = \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}} = 5.4 - 2.33 \cdot \frac{1.69}{\sqrt{100}} = 5.4 - 0.39 = 5.01$$

$$b = \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}} = 5.4 + 2.33 \cdot \frac{1.69}{\sqrt{100}} = 5.4 + 0.39 = 5.79$$

Margin of error:

$$d = z_{\alpha} \cdot \frac{s}{\sqrt{n}} = 2.33 \cdot 0.169 = 0.39, \qquad a = \bar{x} - d = 5.01, \qquad b = \bar{x} + d = 5.79$$

For $1 - \alpha = 0.98$: $a = 5.01$, $b = 5.79$, so $\mu \in (5.01 ; 5.79)$ years. The confidence interval for the mean seniority is $5.4 \pm 0.39$ years with 98% credibility.

Relative error of estimation, with $d = 0.39$ years and e.v. $= \bar{x} = 5.4$ years:

$$d^{*} = \frac{d}{\text{e.v.}} = \frac{0.39}{5.4} = 0.07, \qquad d^{*}\% = 7\%$$

The confidence interval (5.01 ; 5.79) years covers the mean of the seniority of all workers with probability 0.98 (with credibility 98%). The margin of error of this estimate is 0.39 years. The relative error of this estimate is 7%.

### Case problem

- $X$: the duration of banking operations of the given type (in minutes)
- $n$: number of banking operations

Data set: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9

It is known that the duration of the banking operations is a normal random variable. We will estimate the mean duration of all such operations (of the given type) with credibility 95%.
Data:

- $X$: duration of the banking operations, a random variable
- $n = 10$: sample size (the sample is small)
- $x_i$: 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (minutes), the observations
- $X \sim N(\mu, \sigma)$
- $\mu$: the mean duration of all banking operations of the given type
- credibility 95%, so $1 - \alpha = 0.95$

We choose Model 2:

$$P\left(\bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha = 0.95$$

The sample mean $\bar{x}$, the sample standard deviation $s$, and the critical value $t_{\alpha}$ are still to be found.

Sample parameters:

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum x_i = \frac{70}{10} = 7 \text{ min}$$

$$s = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} = \sqrt{\frac{1}{10}\left[(5-7)^2 + (6-7)^2 + (7-7)^2 + \dots + (9-7)^2\right]} = \sqrt{\frac{1}{10}\left[4 + 1 + 0 + \dots + 4\right]} = \sqrt{\frac{20}{10}} = \sqrt{2} \approx 1.41$$

Critical value for the confidence level $1 - \alpha = 0.95$: $\alpha = 0.05$, and the Student's $t$-distribution table for $\alpha = 0.05$ and $n - 1 = 9$ degrees of freedom gives $t_{\alpha} \approx 2.262$.

$$a = \bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 7 - 2.262 \cdot \frac{1.41}{\sqrt{9}} = 7 - 2.262 \cdot 0.47 = 7 - 1.06 = 5.94$$

$$b = \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 7 + 2.262 \cdot \frac{1.41}{\sqrt{9}} = 7 + 2.262 \cdot 0.47 = 7 + 1.06 = 8.06$$

For $1 - \alpha = 0.95$: $a = 5.94$, $b = 8.06$, so $\mu \in (5.94 ; 8.06)$ min.

The numerical confidence interval (5.94 ; 8.06) minutes covers the mean duration of all banking operations of this type with probability 0.95 and with margin of error 1.06 minutes.

### Case problem: continuation, part 2

The data are the same as before: $X$ is the duration of the banking operations of the given type (in minutes), a random variable with $X \sim N(\mu, \sigma)$; the sample is small, $n = 10$, with observations 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (min.); $\mu$ is the mean duration of all banking operations.

Now we change the confidence level to $1 - \alpha = 0.99$. The critical value will be different: $\alpha = 0.01$, and the Student's $t$-distribution table for $\alpha = 0.01$ and $n - 1 = 9$ degrees of freedom gives $t_{\alpha} = 3.250$.

$$a = \bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 7 - 3.250 \cdot \frac{1.41}{\sqrt{9}} = 7 - 1.53 = 5.47$$

$$b = \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 7 + 3.250 \cdot \frac{1.41}{\sqrt{9}} = 7 + 1.53 = 8.53$$

For $1 - \alpha = 0.99$: $a = 5.47$, $b = 8.53$, so $\mu \in (5.47 ; 8.53)$ min.

The numerical confidence interval (5.47 ; 8.53) minutes covers the mean duration of all banking operations with probability 0.99, with maximum error $d = 1.53$ min.
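The two banking-operation intervals above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the lecture's own code: the function name `model2_interval` is hypothetical, and the critical values 2.262 and 3.250 are typed in from a Student's $t$ table for 9 degrees of freedom, because the standard library has no $t$ quantile function.

```python
from math import sqrt

# Observed durations (minutes) from the case problem.
times = [5, 6, 7, 5, 8, 7, 8, 9, 6, 9]
n = len(times)
mean = sum(times) / n
# Sample standard deviation with divisor n, as in the notes.
s = sqrt(sum((x - mean) ** 2 for x in times) / n)

# Table values of t for 9 degrees of freedom (assumed, typed in by hand).
T_TABLE_DF9 = {0.95: 2.262, 0.99: 3.250}


def model2_interval(mean, s, n, t_crit):
    """Model 2 (sigma unknown, small n): d = t * s / sqrt(n - 1)."""
    d = t_crit * s / sqrt(n - 1)
    return mean - d, mean + d, d


for level, t_crit in T_TABLE_DF9.items():
    a, b, d = model2_interval(mean, s, n, t_crit)
    print(f"1 - alpha = {level:.2f}: ({a:.2f}, {b:.2f}), d = {d:.2f} min")
```

The code keeps full precision in every step, while the hand calculation rounds $s$ to 1.41 and $s/\sqrt{9}$ to 0.47, so the last printed digit can differ slightly from the values in the worked solution.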
Comparison:

| $1 - \alpha$ | $a$ | $b$ |
|---|---|---|
| 0.95 | 5.94 | 8.06 |
| 0.99 | 5.47 | 8.53 |

The question: which estimation is better? We should check the value of the margin of error, so we compute it once again.

For Model 2,

$$P\left(\bar{x} - t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} < \mu < \bar{x} + t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha$$

the width of the interval is $l = b - a = 2 \cdot t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}$, so the margin of error is

$$d = \frac{l}{2} = t_{\alpha} \cdot \frac{s}{\sqrt{n-1}}$$

1. First estimation: $1 - \alpha = 0.95$, $t_{\alpha} = 2.262$

   $$d = t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 2.262 \cdot \frac{1.41}{\sqrt{9}} = 2.262 \cdot 0.47 = 1.06 \text{ min}$$

2. Second estimation: $1 - \alpha = 0.99$, $t_{\alpha} = 3.250$

   $$d = t_{\alpha} \cdot \frac{s}{\sqrt{n-1}} = 3.250 \cdot \frac{1.41}{\sqrt{9}} = 3.250 \cdot 0.47 = 1.53 \text{ min}$$

| $1 - \alpha$ | $a$ | $b$ | $d$ |
|---|---|---|---|
| 0.95 | 5.94 | 8.06 | 1.06 min |
| 0.99 | 5.47 | 8.53 | 1.53 min |

Conclusion: the margin of error is bigger in the second estimation, so the first estimation is better in the sense of being more precise. The price of that precision is the lower confidence level.

We now determine the relative error, where $d$ is the absolute error of estimation (the margin of error) and $d^{*}$ is the relative error of estimation:

$$d^{*} = \frac{d}{\text{e.v.}}$$

1. First estimation: $1 - \alpha = 0.95$, $t_{\alpha} = 2.262$, $d = 1.06$, $\bar{x} = 7$ min

   $$d^{*} = \frac{d}{\text{e.v.}} = \frac{1.06}{7} = 0.15, \qquad d^{*}\% = 15\%$$

2. Second estimation: $1 - \alpha = 0.99$, $t_{\alpha} = 3.250$, $d = 1.53$, $\bar{x} = 7$ min

   $$d^{*} = \frac{d}{\text{e.v.}} = \frac{1.53}{7} = 0.22, \qquad d^{*}\% = 22\%$$

So $15\% < 22\%$. In each case the relative error is greater than …%, which means that our estimations are not very good (but the first is better).

### Case problem: continuation, part 3

The data are the same as before: $X$ is the duration of the banking operations of the given type (in minutes), a random variable with $X \sim N(\mu, \sigma)$; the sample is small, $n = 10$, with observations 5, 6, 7, 5, 8, 7, 8, 9, 6, 9 (min.); $\mu$ is the mean duration of all banking operations.

Now the question is: by how much do we have to increase the sample size (how many additional experiments do we have to conduct) to estimate the average duration of banking operations with credibility 99% and with a margin of error of 0.7 minutes?
$1 - \alpha = 0.99 \Rightarrow t_{\alpha} = 3.250$, and $d = 0.7$.

We must use the Model 2 formula for the sample size necessary to satisfy the margin of error requirements:

$$n = \frac{t_{\alpha}^2 \, s^2}{d^2} + 1 = \frac{(3.250)^2 \cdot 2}{(0.7)^2} + 1 = \frac{10.5625 \cdot 2}{0.49} + 1 = \frac{21.125}{0.49} + 1 = 43.11 + 1 = 44.11$$

We must always round this value up (it is always better to have more information than less), so $n = 45$.

So we must conduct 35 more experiments (we had 10 experiments at the beginning, and $45 - 10 = 35$) to obtain a margin of error not bigger than 0.7 min. with credibility 99%.

### Case problem

Julia enjoys jogging. She has been jogging over a period of several years, during which time her physical condition has remained constantly good. Usually she jogs 2 miles a day. During the past year Julia has sometimes recorded the time required to run 2 miles. She has a sample of 90 of these times. For these 90 times the sample mean was 15.6 minutes, with a standard deviation of 1.8 minutes. Let $\mu$ be the mean jogging time for the entire distribution of Julia's 2-mile running times (taken over the past year). Find a 95% confidence interval for $\mu$.

**Solution.**

Data:

- $X$: jogging time for the entire distribution of Julia's 2-mile running times
- $\mu$: the mean of the jogging time
- $X \sim \, ???(\mu, \sigma)$: the distribution is unknown
- $n = 90$
- $\bar{x} = 15.6$ min.
- $s = 1.8$ min.
- $1 - \alpha = 0.95$: confidence level

We choose Model 3:

$$P\left(\bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

We must look...
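The text breaks off before the Model 3 interval is evaluated. As a hedged sketch only, and not the author's own solution, the Model 3 formula can be applied to the data stated in the problem. The variable names are hypothetical, and the critical value is computed with `statistics.NormalDist` rather than looked up in a table, so it may differ in the third decimal place from a table value.

```python
from math import sqrt
from statistics import NormalDist

# Data stated in the problem.
n = 90             # sample size
mean = 15.6        # sample mean, minutes
s = 1.8            # sample standard deviation, minutes
confidence = 0.95  # confidence level 1 - alpha

# Critical value z with Phi(z) = 1 - alpha / 2.
alpha = 1.0 - confidence
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Model 3: margin of error d = z * s / sqrt(n), interval mean +/- d.
d = z * s / sqrt(n)
a, b = mean - d, mean + d
print(f"z = {z:.3f}, d = {d:.3f} min, interval = ({a:.3f}, {b:.3f}) min")
```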

• Fall '13