Lecture4-1 - Sampling and Sampling Distributions I

EM 521 Applied Statistics
Sampling and Sampling Distributions
AS&W – Chapter 7

Sampling and Sampling Distributions

• Simple Random Sampling
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of $\bar{x}$

Statistical Inference

The purpose of statistical inference is to obtain information about a population from information contained in a sample.

A population is the set of all the elements of interest.

A sample is a subset of the population.

The sample results provide only estimates of the values of the population characteristics.

With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.

A parameter is a numerical characteristic of a population.

Simple Random Sampling: Finite Population

Finite populations are often defined by lists such as:

• Organization membership roster
• Credit card account numbers
• Inventory product numbers

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Replacing each sampled element before selecting subsequent elements is called sampling with replacement.

Sampling without replacement is the procedure used most often.

In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
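The two selection procedures described above can be sketched in Python. This is a minimal illustration, not part of the lecture: the population size of 900 and the sample size of 30 are borrowed from the St. Andrew’s example used later in the deck, and the seed and variable names are arbitrary assumptions.

```python
import random

N = 900   # assumed finite population size (elements numbered 1..N)
n = 30    # assumed sample size

rng = random.Random(521)       # arbitrary seed so the sketch is repeatable
population = range(1, N + 1)   # the "list" that defines the finite population

# Sampling without replacement: no element can be selected twice.
sample_without = rng.sample(population, n)

# Sampling with replacement: every draw is made from the full population,
# so the same element may appear more than once.
sample_with = rng.choices(population, k=n)

print(sorted(sample_without))
print(sorted(sample_with))
```

`random.sample` gives every possible subset of size n the same chance of being selected, which matches the definition of a simple random sample from a finite population.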
Simple Random Sampling: Infinite Population

Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.

A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied.

• Each element selected comes from the same population.
• Each element is selected independently.

In the case of infinite populations, it is impossible to obtain a list of all elements in the population.

The random number selection procedure cannot be used for infinite populations.

Point Estimation

In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

• We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
• $s$ is the point estimator of the population standard deviation $\sigma$.
• $\bar{p}$ is the point estimator of the population proportion $p$.

Sampling Error

When the expected value of a point estimator is equal to the population parameter, the point estimator is said to be unbiased.

The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.

Sampling error is the result of using a subset of the population (the sample), and not the entire population.

Statistical methods can be used to make probability statements about the size of the sampling error.

The sampling errors are:

• $|\bar{x} - \mu|$ for the sample mean
• $|s - \sigma|$ for the sample standard deviation
• $|\bar{p} - p|$ for the sample proportion

Example: St. Andrew’s

St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information, including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.

The director of admissions would like to know the following information:

• the average SAT score for the 900 applicants, and
• the proportion of applicants that want to live on campus.

We will now look at three alternatives for obtaining the desired information:

• Conducting a census of the entire 900 applicants
• Selecting a sample of 30 applicants, using a random number table
• Selecting a sample of 30 applicants, using Excel

Conducting a Census

If the relevant data for the entire 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas we covered before.

We will assume for the moment that conducting a census is practical in this example.

Population mean SAT score:

$\mu = \frac{\sum x_i}{900} = 990$

Population standard deviation for SAT score:

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

Population proportion wanting on-campus housing:

$p = \frac{648}{900} = .72$

Simple Random Sampling

Now suppose that the necessary data on the current year’s applicants were not yet entered in the college’s database.

Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.

She decides a sample of 30 applicants will be used.

The applicants were numbered, from 1 to 900, as their applications arrived.

Simple Random Sampling: Using a Random Number Table

Taking a Sample of 30 Applicants

• Because the finite population has 900 elements, we will need 3-digit random numbers to randomly select applicants numbered from 1 to 900.
• We will use the last three digits of the 5-digit random numbers in the third column of the textbook’s random number table, and continue into the fourth column as needed.
• The numbers we draw will be the numbers of the applicants we will sample unless the random number is greater than 900 or the random number has already been used.
• We will continue to draw random numbers until we have selected 30 applicants for our sample.
• (We will go through all of column 3 and part of column 4 of the random number table, encountering in the process five numbers greater than 900 and one duplicate, 835.)

Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
836                      No. 836
. . . and so on

Sample Data

No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
30    498             Emily Morse      1010        No

Simple Random Sampling: Using a Computer

Taking a Sample of 30 Applicants

• Computers can be used to generate random numbers for selecting random samples.
• For example, Excel’s function =RANDBETWEEN(1,900) can be used to generate random numbers between 1 and 900.
• Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.

Point Estimation

$\bar{x}$ as point estimator of $\mu$:

$\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$

$s$ as point estimator of $\sigma$:

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$

$\bar{p}$ as point estimator of $p$:

$\bar{p} = \frac{20}{30} \approx .67$ (reported on the slide as .68)

Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
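The point estimates above can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the function name `point_estimates` is hypothetical, only six of the 30 sampled applicants are listed in the sample data table, and the full-sample figures are recomputed from the totals quoted in the lecture rather than from raw data.

```python
import math
import statistics

def point_estimates(sat_scores, wants_housing):
    """Return (sample mean, sample std. deviation, sample proportion).

    Hypothetical helper for illustration; statistics.stdev uses the
    n - 1 divisor, matching the formula for s.
    """
    x_bar = statistics.mean(sat_scores)
    s = statistics.stdev(sat_scores)
    p_bar = sum(wants_housing) / len(wants_housing)
    return x_bar, s, p_bar

# Only the six applicants shown in the sample data table (not the full sample).
shown_scores = [1025, 950, 1090, 1120, 930, 1010]
shown_housing = [True, True, False, True, False, False]
x_bar6, s6, p_bar6 = point_estimates(shown_scores, shown_housing)

# Full-sample estimates rebuilt from the totals quoted in the lecture.
n = 30
x_bar = 29_910 / n                  # sum of the 30 SAT scores divided by n
s = math.sqrt(163_996 / (n - 1))    # sum of squared deviations divided by n - 1
p_bar = 20 / n                      # 20 of 30 applicants want on-campus housing

print(x_bar, round(s, 1), round(p_bar, 2))
```

A different random sample would feed different values into the same formulas, which is why the point estimates vary from sample to sample.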
Summary of Point Estimates Obtained from a Simple Random Sample

Population Parameter                                 Parameter Value   Point Estimator                                       Point Estimate
$\mu$ = Population mean SAT score                    990               $\bar{x}$ = Sample mean SAT score                     997
$\sigma$ = Population std. deviation for SAT score   80                $s$ = Sample std. deviation for SAT score             75.2
$p$ = Population proportion wanting campus housing   .72               $\bar{p}$ = Sample proportion wanting campus housing  .68

Sampling Distribution of $\bar{x}$

A random variable is defined as a numerical description of the outcome of an experiment.

Consider the process of selecting a simple random sample as an experiment.

Then the sample mean is a random variable, since it is a numerical description of the outcome of the sample selection experiment.

Just like other random variables, the sample mean has an expected value, a standard deviation, and a probability distribution.

The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.

Expected value of $\bar{x}$:

$E(\bar{x}) = \mu$

where $\mu$ = the population mean.

Standard deviation of $\bar{x}$:

Finite population: $\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right)$

Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• A finite population is treated as being infinite if n/N < .05.
• $\sqrt{(N - n)/(N - 1)}$ is the finite correction factor.
• $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.

Form of the Sampling Distribution of $\bar{x}$

If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal distribution.

When the simple random sample is small (n < 30), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal distribution.
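The two standard error formulas and the n/N < .05 rule can be sketched as a single Python function. This is a minimal sketch, not the lecture's own code: the function name `standard_error` and its `threshold` parameter are assumptions introduced for illustration, and the inputs (σ = 80, n = 30, N = 900) are taken from the St. Andrew’s example.

```python
import math

def standard_error(sigma, n, N=None, threshold=0.05):
    """Standard error of the sample mean.

    Hypothetical helper: applies the finite correction factor
    sqrt((N - n) / (N - 1)) only when a population size N is given
    and n / N is at least `threshold`; otherwise the population is
    treated as infinite.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= threshold:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# n / N = 30 / 900 is below .05, so the infinite-population formula is used.
se_30 = standard_error(80, 30, N=900)

# Forcing the correction (threshold 0) shows how little it changes here.
se_30_fpc = standard_error(80, 30, N=900, threshold=0.0)

print(round(se_30, 1), round(se_30_fpc, 1))
```

Setting the threshold as a parameter keeps the textbook rule visible while still letting the corrected value be compared against the uncorrected one.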
Sampling Distribution of $\bar{x}$ for SAT Scores

$E(\bar{x}) = 990$

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$

[Figure: sampling distribution of $\bar{x}$, centered at $E(\bar{x}) = 990$ with $\sigma_{\bar{x}} = 14.6$]

What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/- 10 of the actual population mean $\mu$?

In other words, what is the probability that $\bar{x}$ will be between 980 and 1000?

Step 1: Calculate the z-value at the upper endpoint of the interval.

$z = (1000 - 990)/14.6 = .68$

Step 2: Find the area under the curve to the left of the upper endpoint.

$P(z < .68) = .7517$

Cumulative Probabilities for the Standard Normal Distribution

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
.     .       .       .       .       .       .       .       .       .       .
.5    .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
.6    .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
.7    .7580   .7611   .7642   .7673   .7704   .7734   .7764   .7794   .7823   .7852
.8    .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
.9    .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389
.     .       .       .       .       .       .       .       .       .       .

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; Area = .7517; axis values 990 and 1000]

Step 3: Calculate the z-value at the lower endpoint of the interval.

$z = (980 - 990)/14.6 = -.68$

Step 4: Find the area under the curve to the left of the lower endpoint.

$P(z < -.68) = P(z > .68) = 1 - P(z < .68) = 1 - .7517 = .2483$

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; Area = .2483; axis values 980 and 990]

Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.

$P(-.68 < z < .68) = P(z < .68) - P(z < -.68) = .7517 - .2483 = .5034$

The probability that the sample mean SAT score will be between 980 and 1000 is:

$P(980 < \bar{x} < 1000) = .5034$

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; Area = .5034; axis values 980, 990 and 1000]

Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$

Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.

$E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 990.

Whenever the sample size is increased, the standard error of the mean is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased to:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$

[Figure: sampling distributions of $\bar{x}$ centered at $E(\bar{x}) = 990$; with n = 100, $\sigma_{\bar{x}} = 8$; with n = 30, $\sigma_{\bar{x}} = 14.6$]

Recall that when n = 30, $P(980 < \bar{x} < 1000) = .5034$.

We follow the same steps to solve for $P(980 < \bar{x} < 1000)$ when n = 100 as we showed earlier when n = 30.

Now, with n = 100, $P(980 < \bar{x} < 1000) = .7888$.

Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 8$; Area = .7888; axis values 980, 990 and 1000]

...
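The five-step calculation can be checked with Python's standard library. This is a sketch under stated assumptions: the function name `prob_within` and its `round_z` option are hypothetical, and the code follows the slides in using the infinite-population standard error for both sample sizes, even though n/N = 100/900 is above the .05 threshold, so a finite correction would shrink the n = 100 standard error slightly.

```python
import math
from statistics import NormalDist

def prob_within(sigma, n, margin, round_z=None):
    """P(|x_bar - mu| < margin) under a normal sampling distribution.

    Hypothetical helper: `round_z` optionally rounds z to that many
    decimals to mimic a lookup in a printed z-table.
    """
    se = sigma / math.sqrt(n)        # standard error of the mean
    z = margin / se                  # z-value at the upper endpoint
    if round_z is not None:
        z = round(z, round_z)
    std_normal = NormalDist()        # mean 0, standard deviation 1
    # Area to the left of +z minus area to the left of -z.
    return std_normal.cdf(z) - std_normal.cdf(-z)

p_30_table = prob_within(80, 30, 10, round_z=2)   # z rounded as in the table lookup
p_30_exact = prob_within(80, 30, 10)              # z left unrounded
p_100 = prob_within(80, 100, 10)

print(round(p_30_table, 4), round(p_30_exact, 4), round(p_100, 4))
```

With z rounded to two decimals, the n = 30 result should land close to the lecture's .5034, and the n = 100 result close to .7888; small differences come from rounding in the printed table.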