Bstat6 - Lessons in Business Statistics Prepared By P.K....



Lessons in Business Statistics
Prepared By P.K. Viswanathan

Chapter 6: Basics of Sampling and Sampling Distribution

Introduction

The aim of sampling is to throw light on the population (universe) parameter that is of interest to the investigator. A well-thought-out, representative random sample gives meaningful insights into the population parameters most of the time. This is the very foundation of statistical inference. This chapter covers sampling methodology and the associated sampling distributions.

1) What is Sampling and Why Do You Need it?

Sampling is a method of selecting units of analysis, such as households, people, consumers, or companies, from a population (universe) of interest to a manager. By analyzing the data collected from the sample, you draw inferences about the population parameters. In other words, sampling is employed to throw light on the population parameter.

In Chapter 1, you were introduced to the terms “parameter” and “statistic”. A parameter is a population characteristic of interest. A statistic is an estimate, computed from sample data, that is used to draw inferences about the parameter.

Suppose a company is interested in launching a new product and wants to get some idea of the demand potential. There are two ways of doing this.

Under the first approach, the company could ask all potential buyers in the country whether they will actually buy the product and, if so, how much they would buy. This approach is called a census (also known as complete enumeration). It has two major disadvantages:
1) It is time-consuming.
2) It is very expensive.

The second approach, which uses a sampling procedure, has two major advantages:
1) It is significantly less expensive.
2) It takes much less time.
Under the second approach, the company could take a sample of the potential buyers, ask them how many units of the product they would buy, and then estimate the likely demand for the product in the market as a whole.

There are also situations in which the measurement destroys the unit being examined. In such cases sampling is the only answer.

2) Types of Sampling

Non-Probability Sampling
   Convenience Sampling
   Expert Opinion Sampling
   Quota Sampling

Probability Sampling
   Simple Random Sampling
   Stratified Random Sampling
   Systematic Sampling
   Cluster Sampling

Probability Sampling

Probability sampling is a method of sampling that ensures that every unit in the population has a known, non-zero chance of being selected. Please note that every potential sample need not have the same chance of selection. Practitioners have used various forms of random selection, the most popular being a random number table. Today, computers have replaced the random number table, and software generates the random numbers very quickly.

Some Key Terms in Sampling

N = number of units in the population
n = number of units in the sample (sample size)
$\binom{N}{n}$ = number of possible selections of n units from N units (without-replacement scheme):

$\binom{N}{n} = \frac{N(N-1)(N-2)\cdots(N-n+1)}{1 \cdot 2 \cdot 3 \cdots n}$

f = n/N = sampling fraction

The sampling frame is a complete list of the units of analysis of interest, from which the samples are selected.

Simple Random Sampling

Simple random sampling is the foundation of probability sampling. It is a special case of probability sampling in which every unit in the population has the same chance of being selected. If you have to select n units out of N units, every possible selection of n units must have the same probability.

Can you say how many ways there are to pick n units out of N units? Of course you can. It is equal to $\binom{N}{n}$. Refer to the key terms in sampling listed earlier.
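The key terms can be made concrete with a small Python sketch. The frame of 10 numbered units and the sample size of 3 are illustrative assumptions, not values from the chapter. `math.comb` counts the possible selections, and `random.sample` draws without replacement.

```python
import math
import random

# Hypothetical sampling frame: 10 units labelled 1..10 (illustrative only).
N = 10
n = 3
frame = list(range(1, N + 1))

# Number of possible selections of n units from N, without replacement.
num_samples = math.comb(N, n)

# Sampling fraction f = n / N.
f = n / N

# Draw one simple random sample without replacement.
rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(frame, n)

print(num_samples)  # 120
print(f)            # 0.3
print(sorted(sample))
```

With only 10 units there are already 120 possible samples of size 3. The count grows very quickly as N and n increase.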
Simple random sampling guarantees that every possible selection of n units from N units has the same probability, $1/\binom{N}{n}$. We are assuming here that the units are selected without replacement.

Stratified Random Sampling

Stratified random sampling involves dividing the population into a number of groups called strata, in such a manner that the units within a stratum are homogeneous and the units in different strata are heterogeneous. Having divided the population into strata, you then select a simple random sample of appropriate size from each stratum.

Systematic Sampling

In systematic random sampling, the units are drawn from the population at clearly defined regular intervals. It is one of the easiest procedures to follow. The steps involved in constructing a systematic sampling scheme are as follows:
1) Compute K = N/n and take the integer value. K is called the sampling interval.
2) Select a random number between 1 and K.
3) Starting with this number, select every Kth unit until all n units are selected.

Cluster Sampling

1) Divide the population into a number of clusters based on geographic boundaries.
2) Select a random sample of clusters from this population of clusters.
3) Either measure all units within the randomly chosen clusters or do further simple random sampling in each cluster.

Strictly speaking, the procedure is called cluster sampling only when you measure all the units in the selected clusters. If you do further sampling within each cluster, by simple random sampling or stratified random sampling, the procedure becomes multi-stage sampling.

Non-Probability Sampling

The fundamental difference between non-probability sampling and probability sampling is that, in a non-probability sampling procedure, the selection of the sample units does not ensure a known chance of selection for each unit. In other words, the units are selected without using the principle of probability.
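Returning briefly to the probability schemes, the three systematic sampling steps (compute K, pick a random start between 1 and K, take every Kth unit) can be sketched in Python. The function name `systematic_sample` and the frame of 100 numbered units are hypothetical, chosen only for illustration.

```python
import random

def systematic_sample(frame, n, rng):
    """Select n units from frame at a fixed interval K after a random start."""
    N = len(frame)
    if n <= 0 or n > N:
        raise ValueError("n must be between 1 and the frame size")
    K = N // n                 # sampling interval: integer part of N/n
    start = rng.randint(1, K)  # random start between 1 and K, inclusive
    # 1-based positions: start, start + K, start + 2K, ...
    positions = [start + i * K for i in range(n)]
    return [frame[p - 1] for p in positions]

frame = list(range(1, 101))    # hypothetical frame of 100 numbered units
sample = systematic_sample(frame, 10, random.Random(7))
print(sample)
```

Because the start is at most K, the last position is at most nK, which never exceeds N. The selection therefore always stays inside the frame.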
Even though non-probability sampling has advantages such as reduced cost, speed, and convenience of implementation, it lacks accuracy because of selection bias. Another drawback is that results from the sample cannot be validly generalized to the population. Inferential statistics requires probability sampling for valid conclusions. Non-probability sampling is suitable for pilot studies and exploratory research.

Convenience Sampling

Using college and university students in studies of attitudes towards co-education is basically a matter of convenience. In consumer panel studies, you may use the clients who are available to you as respondents giving their opinions on products and services. In many research projects, you simply look for volunteers to participate. This is how convenience sampling is done. For heaven’s sake, don’t generalize results based on convenience sampling!

Expert Opinion Sampling

Expert opinion sampling involves gathering a set of people who have knowledge and expertise in key areas that are crucial to decision making. In qualitative methods of demand projection for a new product, you use the expert opinion method to arrive at a reasonable forecast. The advantage of this sampling is that it supports your decisions in situations where virtually no data are available. The major disadvantage is that even experts can have prejudices, likes, and dislikes that might distort the results.

Quota Sampling

In simple terms, quota sampling is stratified sampling without the probability principle being applied to the selection of the sample units. Suppose that in an opinion study you want both men and women to participate. You know that in the population category of interest, 65% are men and 35% are women. If your sample size is fixed at 200, you will have a quota of 130 men and 70 women.
It doesn’t matter how you get them as long as you have met the quota. There are some socio-economic studies where quota sampling is the only way out because of practical considerations. You can produce descriptive statistics, graphs, charts, and summary tables, and stop there. Any conclusions drawn from a quota sample will be highly tentative. None of the statistical inference techniques should be applied when you have followed quota sampling or, for that matter, any non-probability sampling procedure.

3) Sampling Distribution: A Conceptual Framework

The probability distribution of all the possible values a sample statistic can take is called the sampling distribution of the statistic. The key term here is “sample statistic”. The sample mean and the sample proportion based on a random sample are examples of sample statistics. Please note that we are not interested in the probability distribution of a set of numbers. We are interested in the probability distribution of a statistic, which can assume different values in an experiment that involves repeatedly drawing random samples of the same size from a population and computing the statistic afresh each time.

Cautionary Note: Many students feel that the expression “sampling distribution” automatically implies the sampling distribution of the mean. This is not correct. You can have a sampling distribution of the median as well. Please notice that the sampling distribution of the mean is not the same as the sampling distribution of the median. They are different distributions.

4) Concept of Standard Error

What is the standard deviation of the sample statistic called? Can you guess? It is called the standard error of the statistic. In other words, the standard deviation of the sampling distribution of a statistic is called the standard error of the estimate.
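A small simulation shows that the mean and the median each have their own sampling distribution, and that the standard deviation of each estimates that statistic's standard error. The exponential population, the sample size of 30, and the 5000 repetitions are assumptions made only for this sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for repeatability
n = 30                  # size of each sample (assumed)
reps = 5000             # number of repeated samples (assumed)

means, medians = [], []
for _ in range(reps):
    # Assumed skewed population: exponential with mean 1 and median ln 2 (about 0.69).
    sample = [rng.expovariate(1.0) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Each list approximates the sampling distribution of one statistic.
# Its standard deviation estimates that statistic's standard error.
print("centre of sample means:   ", round(statistics.mean(means), 3))
print("centre of sample medians: ", round(statistics.mean(medians), 3))
print("standard error of mean:   ", round(statistics.stdev(means), 3))
print("standard error of median: ", round(statistics.stdev(medians), 3))
```

The two collections centre on different values because the mean and median of a skewed population differ. This is one way to see that the two sampling distributions are not the same.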
Please note that any sample statistic is an estimate of the population parameter. The standard deviation of the distribution of the sample means is called the standard error of the mean. Likewise, the standard deviation of the distribution of the sample proportions is called the standard error of the proportion.

In the context of sampling, the standard error is popularly known as the sampling error. We said earlier that a sample statistic is an estimate. The sampling error throws light on the precision of our estimate. The logical question is how to compute the sampling error. Can you say how? Of course you can, if you think intuitively. The larger the sample standard deviation, the larger the standard error. The larger the standard error, the larger the sampling error.

5) Sampling Distribution of Mean: Normal Population

If you have gone through the initial discussion of the sampling distribution of the mean, along with the example given, the sampling distribution of the mean from a normal population is a logical extension of the same principle. The samples are randomly drawn from a normal population with a certain mean and standard deviation. The original population is distributed normally. The sampling distribution of the mean then has a clear structure, stated here with its terms and notation.

If $X_1, X_2, X_3, \ldots, X_n$ are n independent random observations drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{X}$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
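The standard error of the mean for a normal population, $\sigma/\sqrt{n}$, can be checked numerically. This is only a sketch. The population mean of 50, the standard deviation of 10, the sample sizes, and the helper name `simulated_standard_error` are all assumptions chosen for illustration.

```python
import math
import random
import statistics

MU, SIGMA = 50.0, 10.0  # assumed population mean and standard deviation
REPS = 4000             # assumed number of repeated samples per sample size

def simulated_standard_error(n, rng):
    """Standard deviation of REPS sample means, each computed from n normal draws."""
    means = [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
             for _ in range(REPS)]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed for repeatability
results = {}
for n in (4, 16, 64):
    theory = SIGMA / math.sqrt(n)          # sigma / sqrt(n)
    simulated = simulated_standard_error(n, rng)
    results[n] = (theory, simulated)
    print(n, round(theory, 3), round(simulated, 3))
```

Quadrupling the sample size halves the theoretical standard error. The simulated values should track that pattern closely.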
6) Sampling Distribution of Mean: Non-Normal Population

Central Limit Theorem

The distinguishing feature of the central limit theorem is that, irrespective of the shape of the distribution of the original population, the sampling distribution of the mean approaches a normal distribution as the sample size increases and becomes large. How large is large? A rule of thumb based on experience says that a sample size of 30 or more is considered large. It works reasonably well in a large number of problems. Please remember that n is the sample size for each mean computed, not the number of samples. Also note that, theoretically speaking, you compute the mean for an infinite number of samples. In practice this means a sufficiently large number of samples.

Mini Case

A company is seriously considering buying a type of machine that can save significant labor hours. The labor hours saved by the machine follow a normal distribution with mean = 2200 and a certain standard deviation. It is known that there is a fifty-fifty chance that the labor hours saved by the machine are either greater than 2400 or less than 2000. The price of the machine is Rs. 860000. The incremental cost of a labor hour incurred in the company is currently Rs. 400. The company has also performed 36 trial runs to find out how the machine is faring. The company would like to make a preliminary assessment before buying the machine. It requires your help to answer the following fill-in-the-blank questions.

Mini Case: Questions

1) The standard deviation of the population distribution of the labor hours saved by the machine is --------------
2) The standard error of the sample mean of labor hours saved is --------------
3) The probability that the sample mean of labor hours saved will exceed the break-even labor hours is --------------

Mini Case: Solution

Facts of the Case: The cost of the machine is Rs. 860000.
The incremental cost of a labor hour = Rs. 400. Therefore the break-even labor hours = 860000/400 = 2150. If the machine can save more than 2150 labor hours, it can be considered for buying.

The case also says that there is a 50-50 chance that the labor hours saved by the machine are either greater than 2400 or less than 2000. What does this mean? It means that P(X > 2400) + P(X < 2000) = 0.50, where X is the random variable denoting the labor hours saved by the machine. Can you draw the correct diagram for the problem? Then you can easily answer the questions of the case.

Since 2000 and 2400 are equally distant from the mean of 2200 and the normal distribution is symmetric, the two tail probabilities are equal. Hence the cumulative probability up to 2000 = 0.25. It is easy to solve this problem using z, because you can get the value of z corresponding to a cumulative probability using NORMSINV in Excel. Here $z = (x - \mu)/\sigma = (2000 - 2200)/\sigma = -200/\sigma$. For what value of $-200/\sigma$ is the cumulative probability equal to 0.25?

The steps are as follows. First click Paste Function in Excel and select NORMSINV under the “Statistical” category. Click OK, enter the probability value 0.25, and click OK again. The answer appears under the formula result. It is -0.6745 when rounded to four decimal places. That means $-200/\sigma = -0.6745$. Therefore $\sigma = 200/0.6745 = 296.5159$.

1) This is the standard deviation of the original population distribution of labor hours saved. It is approximately equal to 297 hours.

2) The standard error of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 297/6 = 49.5$.

3) The probability of the sample mean of labor hours exceeding the break-even labor hours is the same as $P(\bar{X} > 2150)$. Using NORMDIST we can get the answer directly. Please note that the standard deviation of the sample mean, that is 49.5, is to be used for answering this question.
In NORMDIST, enter the value 2150 for X, mean = 2200, standard deviation = 49.5, and 1 for cumulative. Click OK. Excel returns the cumulative probability up to 2150, which is 0.1562. Therefore the probability of getting more than 2150 = 1 - 0.1562 = 0.8438. That is, there is about an 84% chance that the mean labor hours saved by the machine will exceed the break-even labor hours when a sample of 36 trials is taken. ...
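The mini case can also be worked through without Excel. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for NORMSINV and NORMDIST. The variable names are illustrative. It keeps the standard deviation unrounded, so the final probability can differ in the third or fourth decimal place from the figure obtained with the rounded value of 297.

```python
from math import sqrt
from statistics import NormalDist

price = 860000       # price of the machine in Rs. (from the case)
cost_per_hour = 400  # incremental cost of a labor hour in Rs. (from the case)
mu = 2200            # mean labor hours saved (from the case)
n = 36               # number of trial runs (from the case)

# Break-even labor hours.
break_even = price / cost_per_hour

# P(X < 2000) = 0.25, so (2000 - mu) / sigma equals the z value
# whose cumulative probability is 0.25 (about -0.6745).
z = NormalDist().inv_cdf(0.25)
sigma = (2000 - mu) / z

# Standard error of the sample mean: sigma / sqrt(n).
std_error = sigma / sqrt(n)

# Probability that the sample mean exceeds the break-even labor hours.
p_exceed = 1 - NormalDist(mu, std_error).cdf(break_even)

print(break_even)
print(round(z, 4), round(sigma, 1), round(std_error, 1), round(p_exceed, 3))
```

The result agrees with the chapter's conclusion of roughly an 84% chance that the sample mean exceeds the break-even hours.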

This note was uploaded on 02/24/2012 for the course BUSINESS 281 taught by Professor Gray during the Spring '12 term at Florida State College.
