Chapter 14: Estimation of Mean, Variance and Proportion
Up to the last chapter, we have been dealing with descriptive statistics, i.e. studying data without generalizing the information. From this chapter on, we shall be largely concerned with inductive statistics, which generalizes information obtained from a sample to make inferences about a much larger population.

A. Concept of Statistical Inference

Usually, the population is so large that we cannot afford to study every member. However, we wish to know certain important quantities of it, say μ and σ². These quantities describing the population are called parameters.

So we take a random sample $x_1, x_2, \ldots, x_n$. The sample elements are fully known. From these we calculate a certain quantity, say the sample mean:

$$\bar{x} = \frac{\sum x_i}{n}$$

A quantity computed from the sample elements is called a statistic. We then make an inference about the parameter under investigation from this statistic. Say $\bar{x} = 15$. Then we estimate μ to be 15: $\hat{\mu} = 15$.

This involves generalization of sample information to the population. It is known as statistical inference.

There are three main topics in statistical inference:
1. Point estimation
2. Interval estimation
3. Tests of hypotheses

We shall study point estimates for μ, σ², σ and p in this chapter.

B. Unbiased Estimate for μ

If $x_1, x_2, \ldots, x_n$ represent a random sample taken from a population, then

$$\hat{\mu} = \bar{x} = \frac{\sum x_i}{n} \qquad \ldots(1)$$

Here, $\bar{x}$ is the sample mean. Its value is not fixed; it may vary from sample to sample. Its value is certainly not identical to μ in general, since μ is a fixed quantity, although unknown to us.

We use $\hat{\mu}$ to stand for an estimator or estimate of μ. ($\hat{\mu}$ is read as "μ hat".)

Example 1:
To estimate the mean weight (μ) of the seven million residents of Hong Kong, a sample of size 8 is taken. The results (in kg) are as follows:
71.2, 60.0, 55.3, 65.4, 32.7, 78.6, 68.8, 59.6
Estimate μ.

Solution: From the 8 data values we easily obtain

$$\hat{\mu} = \bar{x} = \frac{491.6}{8} = 61.45$$
The estimate $\hat{\mu} = \bar{x} = 61.45$, computed according to (1), uses all the information in the sample fairly. There is no inbuilt tendency to overestimate or underestimate μ. It is called an unbiased estimate for μ.

To appreciate this, suppose we adopt a rule of dropping the largest value and using the average of the remaining data as an estimate for μ. Then, for the above data,

$$\hat{\mu} = \frac{71.2 + 60.0 + 55.3 + 65.4 + 32.7 + 68.8 + 59.6}{7} = 59$$

This is a downward biased estimate for μ. In the long run, this $\hat{\mu}$ underestimates μ, thus it is NOT fair.

C. Unbiased Estimate for σ²

Let a sample $x_1, x_2, \ldots, x_n$ be taken from a population. To estimate σ², we use

$$\hat{\sigma}^2 = s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \qquad \ldots(2)$$

instead of

$$\hat{\sigma}^2 = d^2 = \frac{\sum (x_i - \bar{x})^2}{n} \qquad \ldots(3)$$

This is because d² gives a downward biased estimate for σ² while s² gives an unbiased estimate for σ². The rigorous proof of this fact is NOT our main concern here, but it is useful to look at some intuitive reasons.

RULE: If $E(\hat{\theta}) = \theta$, then $\hat{\theta}$ is an UNBIASED estimator of θ.

Example: $E(\bar{X}) = \mu \Rightarrow \hat{\mu} = \bar{X}$. The sample mean ($\bar{X}$) is an UNBIASED estimator of the population mean (μ).

Proof: With

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$

we have

$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n}\{E(X_1) + E(X_2) + \cdots + E(X_n)\} = \frac{1}{n}\{\mu + \mu + \cdots + \mu\} = \frac{1}{n}\cdot n\mu = \mu$$
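The unbiasedness of the sample mean, and the downward bias of the drop-the-largest rule described earlier, can also be checked by simulation. The sketch below is only an illustration: the Normal population with μ = 60 and σ = 14, the number of repetitions and the function names are all assumptions made for the demonstration, not part of the notes.

```python
import random
import statistics

def drop_largest_mean(sample):
    """Average of the sample after discarding its largest value."""
    trimmed = sorted(sample)[:-1]
    return sum(trimmed) / len(trimmed)

def average_estimates(mu=60.0, sigma=14.0, n=8, repeats=20000, seed=1):
    """Long-run averages of both estimators over many simulated samples.

    mu, sigma and the Normal population are hypothetical choices.
    """
    rng = random.Random(seed)
    plain, trimmed = [], []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        plain.append(statistics.fmean(sample))     # estimator (1)
        trimmed.append(drop_largest_mean(sample))  # drop-the-largest rule
    return statistics.fmean(plain), statistics.fmean(trimmed)

avg_mean, avg_trimmed = average_estimates()
print(f"average of sample means:       {avg_mean:.2f}")
print(f"average of drop-largest means: {avg_trimmed:.2f}")
```

Under these assumptions the first average should settle close to the chosen μ, while the second should stay noticeably below it, which is what "biased downward in the long run" means.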
More explanations

$$Z = \frac{x_i - \mu}{\sigma}, \qquad Z^2 = \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2_1$$

$$\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2_n$$

$$\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}$$

Rule: If Z ~ N(0,1) is a standard Normal, then Z² has a CHI-SQUARED distribution with 1 degree of freedom.
Rule: $\chi^2_1 + \chi^2_1 + \cdots + \chi^2_1 = \chi^2_n$ (n of these, independent).
Rule: If you replace a parameter by an estimate, then you lose 1 degree of freedom.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \sim \frac{\sigma^2 \chi^2_{n-1}}{n-1}$$

$$E(s^2) = E\left\{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\right\} = E\left(\frac{\sigma^2 \chi^2_{n-1}}{n-1}\right) = \frac{\sigma^2}{n-1} E(\chi^2_{n-1}) = \frac{\sigma^2}{n-1}\cdot(n-1) = \sigma^2$$

Rule: The mean of a chi-squared distribution is equal to its degrees of freedom: $E(\chi^2_k) = k$.

Conclusion: E(s²) = σ². Therefore the sample variance, s², is an UNBIASED estimator of the population variance, σ².

For the divisor n, on the other hand,

$$d^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \sim \frac{\sigma^2 \chi^2_{n-1}}{n}$$

$$E(d^2) = E\left\{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}\right\} = E\left(\frac{\sigma^2 \chi^2_{n-1}}{n}\right) = \frac{\sigma^2}{n} E(\chi^2_{n-1}) = \frac{\sigma^2}{n}\cdot(n-1) \neq \sigma^2$$

Conclusion: E(d²) is NOT EQUAL to σ². Therefore d² is NOT an UNBIASED estimator of the population variance, σ². It is BIASED!
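The two conclusions about the divisors n − 1 and n can be illustrated numerically. The following is a minimal sketch, assuming a Normal population with a hypothetical σ = 3 (so σ² = 9) and samples of size n = 5; the function name and the simulation settings are choices made for this example only.

```python
import random
import statistics

def average_variance_estimates(sigma=3.0, n=5, repeats=40000, seed=2):
    """Long-run averages of s^2 (divisor n - 1) and d^2 (divisor n).

    sigma and the Normal population are hypothetical choices.
    """
    rng = random.Random(seed)
    s2_values, d2_values = [], []
    for _ in range(repeats):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2_values.append(statistics.variance(sample))   # divides by n - 1
        d2_values.append(statistics.pvariance(sample))  # divides by n
    return statistics.fmean(s2_values), statistics.fmean(d2_values)

avg_s2, avg_d2 = average_variance_estimates()
print(f"average s^2: {avg_s2:.2f}   (sigma^2 = 9)")
print(f"average d^2: {avg_d2:.2f}   (sigma^2 * (n - 1) / n = 7.2)")
```

With these settings the average of s² should land near 9, while the average of d² should land near 9 × 4/5 = 7.2, in line with the factor (n − 1)/n obtained in the derivation.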
Example 2:
For the following data
71.2, 60.0, 55.3, 65.4, 32.7, 78.6, 68.8, 59.6
estimate σ².

Solution: $\bar{x} = 61.45$

$$\hat{\sigma}^2 = s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(71.2 - 61.45)^2 + \cdots + (59.6 - 61.45)^2}{8-1} = 189.8171$$

D. Estimation of σ

While $\hat{\sigma}^2 = s^2$ is an unbiased estimate for σ², we usually use $\hat{\sigma} = s$ to estimate σ. Thus, for the above data,

$$\hat{\sigma} = \sqrt{189.8171} = 13.7774$$
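The same numbers can be reproduced in software. The sketch below uses Python's standard `statistics` module, where `variance` divides by n − 1 as in (2) and `pvariance` divides by n as in (3); the variable names are chosen here for illustration.

```python
import math
import statistics

# Sample of 8 weights (kg) from the example above
weights = [71.2, 60.0, 55.3, 65.4, 32.7, 78.6, 68.8, 59.6]

x_bar = statistics.fmean(weights)     # sample mean
s2 = statistics.variance(weights)     # divisor n - 1, as in (2)
d2 = statistics.pvariance(weights)    # divisor n, as in (3)
s = math.sqrt(s2)                     # estimate of sigma

print(f"x_bar = {x_bar:.2f}")   # 61.45
print(f"s^2   = {s2:.4f}")      # 189.8171
print(f"d^2   = {d2:.4f}")      # 166.0900
print(f"s     = {s:.4f}")       # 13.7774
```

For the same data d² is smaller than s², which is consistent with d² being the downward biased estimate.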
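In the 50FH calculator, there is a key for

$$\hat{\sigma} = s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

This is named "xσn−1". It makes the calculation very convenient.

Recall from Chapter 6:

Data Set 1: $N_1 = 4$, $\mu_1 = 15$, $\sigma_1 = 4$
Data Set 2: $N_2 = 5$, $\mu_2 = 24$, $\sigma_2 = 3$

Combine data sets 1 and 2 and calculate μ and σ.

Solution: Each data set is replaced by the two values μ ± σ, each with frequency N/2, which keeps its mean and standard deviation unchanged.

$x_i$                                  $f_i$
$\mu_1 - \sigma_1 = 11$                $N_1/2 = 2$
$\mu_1 + \sigma_1 = 19$                $N_1/2 = 2$
$\mu_2 - \sigma_2 = 21$                $N_2/2 = 2.5$
$\mu_2 + \sigma_2 = 27$                $N_2/2 = 2.5$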
Example 3:
Amy and Ben separately took a sample from the same population. Each submitted their own result:

Data Set A: $n_1 = 7$, $\bar{x}_1 = 50$, $s_1 = 2$
Data Set B: $n_2 = 8$, $\bar{x}_2 = 49$, $s_2 = 3$

Now mix the two sets of data to estimate
(a) the population mean, μ
(b) the population standard deviation, σ.

Solution:

$x_i$                                  $f_i$
$\bar{x}_1 - s_1 = 50 - 2 = 48$        $(n_1 - 1)/2 = 3$
$\bar{x}_1 = 50$                       $1$
$\bar{x}_1 + s_1 = 50 + 2 = 52$        $(n_1 - 1)/2 = 3$
$\bar{x}_2 - s_2 = 49 - 3 = 46$        $(n_2 - 1)/2 = 3.5$
$\bar{x}_2 = 49$                       $1$
$\bar{x}_2 + s_2 = 49 + 3 = 52$        $(n_2 - 1)/2 = 3.5$

$$\hat{\mu} = \bar{x} = 49.4667, \qquad \hat{\sigma} = s = 2.5458$$
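The frequency-table method of Example 3 can be sketched in code as follows. Each sample is replaced by three weighted values (mean − s, mean, mean + s) with frequencies (n − 1)/2, 1 and (n − 1)/2, which reproduces its size, mean and standard deviation; the helper names `summary_points` and `estimate_from_table` are hypothetical and introduced only for this illustration.

```python
import math

def summary_points(n, mean, s):
    """Replace a sample by three weighted values with the same n, mean and s."""
    half = (n - 1) / 2
    return [(mean - s, half), (mean, 1.0), (mean + s, half)]

def estimate_from_table(table):
    """Mean and (n - 1)-divisor standard deviation of a frequency table."""
    n = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / n
    ss = sum(f * (x - mean) ** 2 for x, f in table)  # sum of squared deviations
    return mean, math.sqrt(ss / (n - 1))

# Data Set A (n = 7, mean 50, s = 2) mixed with Data Set B (n = 8, mean 49, s = 3)
table = summary_points(7, 50, 2) + summary_points(8, 49, 3)
mu_hat, sigma_hat = estimate_from_table(table)
print(f"mu_hat    = {mu_hat:.4f}")     # 49.4667
print(f"sigma_hat = {sigma_hat:.4f}")  # 2.5458
```

The fractional frequencies are harmless here because only the weighted sums enter the two formulas.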
n2 − 1 2 1 1 1 E. Use of Random Numbers to Draw a Sample E. Use of Random Numbers to Draw a Sample (NOT ON EXAM) Example 4 (a) Draw a simple random sample of size 6 from the following list of 50 words: “There are two reasons why the scope of statistics and the need to study statistics have grown enormously in the last few decades. One reason is the increasingly quantitative approach in all the sciences, as well as in business and in many other activities which directly affect our lives.” (b) Hence, estimate the mean (μ) and the s.d.(σ) of the word lengths of this list. of Solution for Example 4 Solution for Example 4 First, code the members (words) of the population (the list) as 01, 02,……, 50. For instance, “there” is coded as “01”, “are” as “02”, ……, “lives” as “50”. Then use “RAN#” key to obtain random numbers. Press “SHIFT RAN#” first, and then press “EXE” repeatedly. We get: 0.428, 0.670, 0.093, 0.812, 0.165, 0.733, 0.421, 0.276, 0.484. Since the population members are coded with twodigit numbers we shall forget about the decimal point and the last digit of the above 9 selected random numbers. They become: 42, 67, 09, 81, 16, 73, 42, 27, 48. Any number exceeding 50 has to be discarded since Any number exceeding 50 has to be discarded since the population has only 50 members. Repeated members are not discriminated against its multiple occurrences since we are taking samples with replacement. Hence members corresponding to these coded numbers, 42, 09, 16, 42, 27, 48 are drawn. These members (words) are: in, statistics, have, in, the, affect (b) The wordlengths of these 6 members are 2, 10, 4, 2, 3, 6 Hence and ˆ µ = x = 4.5 ˆ σ = s = 3.0822 F. Estimation of p F. Estimation of In repeated, independent, experiments with two outcomes, S and F, we are interested in the probability of S, i.e. p = P(S), per trial. To estimate it, the method is quite simple. Just perform the experiment n times. 
If x is the number of S’s, then and unbiased estimate for p is x ˆ p= n Example 5: Example 5
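That x/n is unbiased for p can be illustrated by repeating the whole experiment many times and averaging the resulting estimates. The sketch below is a simulation under assumed values: the true probability 0.3, the n = 8 trials per experiment and the helper name `estimate_p` are hypothetical choices, since in practice p is unknown.

```python
import random
import statistics

def estimate_p(n, p_true, rng):
    """Run n independent trials and return p_hat = x / n."""
    x = sum(1 for _ in range(n) if rng.random() < p_true)  # number of S's
    return x / n

rng = random.Random(3)
p_true = 0.3  # hypothetical true probability of S, normally unknown
estimates = [estimate_p(8, p_true, rng) for _ in range(20000)]
avg_p_hat = statistics.fmean(estimates)
print(f"average of p_hat over repeated experiments: {avg_p_hat:.3f}")
```

Any single estimate from 8 trials can be far from p, but the long-run average should sit close to the assumed value, which is the sense in which the estimate is unbiased.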
Example 5:
An unbalanced coin is tossed 8 times. Heads come up 6 times. Estimate p, the probability of heads per trial.

Solution: n = 8, x = 6,

$$\hat{p} = \frac{x}{n} = \frac{6}{8} = 0.75$$

G. Use of Random Numbers to Estimate π (NOT ON EXAM)

In the history of mathematics, a large number of famous mathematicians spent a great deal of time finding the value of π. Here is an unexpectedly simple but uncommon way of estimating π.

In a square ABCD of side 1, draw a quarter of a circle with centre at A and radius 1. Then the area of square ABCD is 1, and the area of the quarter circle ABD is π/4. Hence the probability p that a point Q chosen at random in the square lies within the quarter circle is

$$p = \frac{\text{Area}(ABD)}{\text{Area}(ABCD)} = \frac{\pi/4}{1} = \frac{\pi}{4} \qquad \ldots(4)$$

This leads to the following simulation method:
(i) A trial consists of picking two random numbers, called x and y, from the 50FH calculator. Say, x = 0.247, y = 0.803.
(ii) Calculate x² + y² (= 0.705818). If x² + y² ≤ 1, call the trial a success, as then AQ ≤ 1, implying that Q = (x, y) lies within the circle.
(iii) Repeat (i) and (ii) until, say, n = 1000 trials have been performed.
(iv) Count the number of successes, say, r = 781.
(v) Then p, the probability that Q lies within the quarter circle ABD, is estimated by (4):

$$\hat{p} = \frac{r}{n} = \frac{781}{1000} = 0.781 \left(= \frac{\hat{\pi}}{4}\right)$$

$$\therefore \hat{\pi} = 4 \times 0.781 = 3.124$$

[Note: π = 3.14159265358979323846264338...]

Exercise (NOT ON EXAM)
Perform your own estimate of π by trying the above simulation 500 times.
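For readers without the calculator, steps (i) to (v) can be sketched in Python, with the standard `random` module standing in for the RAN# key. The function name `estimate_pi` and the seed are assumptions made for this illustration; a different seed gives a different estimate.

```python
import random

def estimate_pi(n_trials, seed=None):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        x, y = rng.random(), rng.random()  # step (i): a random point Q in the unit square
        if x * x + y * y <= 1.0:           # step (ii): Q lies within the quarter circle
            successes += 1
    p_hat = successes / n_trials           # step (v): estimate of p = pi / 4
    return 4.0 * p_hat

print(estimate_pi(500, seed=2011))
```

With only 500 trials the estimate typically agrees with π to about one decimal place; increasing `n_trials` narrows the spread, but slowly, since the error shrinks roughly like one over the square root of the number of trials.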
This note was uploaded on 04/22/2011 for the course STAT 301 taught by Professor Gabrille during the Spring '11 term at HKU.