Lessons in Business Statistics
Prepared By
P.K. Viswanathan

Chapter 6: Basics of Sampling and Sampling Distribution

Introduction
The aim of sampling is to throw light on the population (universe) parameter that is of interest to the investigator. A well-thought-out, representative random sample usually gives meaningful insights into the population parameters. This is the very foundation of statistical inference. This chapter covers sampling methodology and the associated sampling distributions.

1) What is Sampling and Why Do You Need it?
Sampling is a method of selecting units of analysis, such as households, people, consumers, or companies, from a population (universe) of interest to a manager. By analyzing the data collected from the sample, you draw inferences about the population parameters. In other words, sampling is employed to throw light on the population parameter. Chapter 1 introduced the definition and meaning of the terms “parameter” and “statistic”. A statistic is an estimate, based on sample data, used to draw inferences about a population characteristic of interest called the parameter.

Suppose a company is interested in launching a new product and wants to get some idea of the demand potential. There are two ways of doing this:

1) It could ask all potential buyers in the country whether they will actually buy it and, if so, how much they would buy.
2) It could take a sample of the potential buyers, ask them how many units of the product they would buy, and then estimate the likely demand for the product in the market as a whole.

The first approach is called a census (also known as complete enumeration). It has two major disadvantages: it is time-consuming, and it is very expensive.

The second approach, which uses a sampling procedure, has two major advantages: it is significantly less expensive, and it takes much less time.

Also, there are situations in which measuring a unit destroys it. In such situations sampling is the only answer.

2) Types of Sampling
Types of Sampling:
Probability Sampling: Simple Random Sampling, Stratified Random Sampling, Systematic Sampling, Cluster Sampling
Non-Probability Sampling: Convenience Sampling, Expert Opinion, Quota Sampling

Probability Sampling
Probability sampling is a method of sampling that ensures that every unit in the population has a known, non-zero chance of being selected. Please note that every potential sample need not have the same chance of selection. Practitioners have used various forms of random selection, the most popular being a random number table. Today, computers have replaced the random number table, and software generates the random numbers very quickly.

Some Key Terms in Sampling
N = Number of units in the Population
n = Number of units in the Sample (Sample Size)
$\binom{N}{n}$ = Number of possible selections of n units from N units (without replacement scheme)
$\binom{N}{n} = \frac{N(N-1)(N-2)\cdots(N-n+1)}{1\cdot 2\cdot 3\cdots n}$
f = n/N = Sampling fraction
Sampling Frame is a complete list of the units of analysis of interest from which the samples are selected.

Simple Random Sampling
Simple Random Sampling is the foundation of Probability Sampling. It is a special case of probability sampling in which every unit in the population has the same chance of being selected. If you have to select n units out of N units, every possible selection of n units must have the same probability. Can you say how many ways there are to pick n units out of N units? Of course you can. It is equal to $\binom{N}{n}$. Refer to the table of key terms in sampling on the previous slide. Simple random sampling guarantees that every possible selection of n units from N units has the same probability, $1/\binom{N}{n}$.
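The counting argument above can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The function names `count_possible_samples` and `simple_random_sample`, and the tiny frame of 10 numbered units, are hypothetical and chosen only for illustration.

```python
import math
import random


def count_possible_samples(N: int, n: int) -> int:
    """Number of distinct selections of n units from N units (without replacement)."""
    return math.comb(N, n)


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every selection of n units is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)


# Hypothetical sampling frame: 10 units numbered 1..10, sample size 3.
N, n = 10, 3
frame = list(range(1, N + 1))

total_selections = count_possible_samples(N, n)   # 10*9*8 / (1*2*3) = 120
selection_probability = 1 / total_selections      # chance of any one particular sample
sampling_fraction = n / N                         # f = n/N

sample = simple_random_sample(frame, n, seed=1)
print(total_selections, selection_probability, sampling_fraction, sample)
```

`random.Random.sample` draws without replacement, which matches the scheme assumed in this section.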
We are assuming here that the units are selected without replacement.

Stratified Random Sampling
Stratified Random Sampling involves dividing the population into a number of groups called strata in such a manner that the units within a stratum are homogeneous and the units in different strata are heterogeneous. Having divided the population into strata, select a simple random sample of appropriate size from each stratum.

Systematic Sampling
In systematic random sampling, the units are drawn from the population at clearly defined regular intervals. It is one of the easiest procedures to follow. The steps involved in constructing a systematic sampling scheme are as below:
1) Compute K = N/n and take the integer value. K is called the sampling interval.
2) Select a random number between 1 and K.
3) Starting with this number, select every Kth unit until all the n units are selected.

Cluster Sampling
1) Divide the population into a number of clusters based on geographic boundaries.
2) Select a random sample of clusters from this population of clusters.
3) Either measure all units within the randomly chosen clusters or do further simple random sampling in each cluster.
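The systematic sampling steps and the cluster sampling steps listed above can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the author's implementation. The function names, the frame of 100 numbered units, and the region names are all hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic sample of n units from an ordered sampling frame."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size n must be between 1 and N")
    rng = random.Random(seed)
    k = N // n                     # sampling interval K = integer part of N/n
    start = rng.randint(1, k)      # random start between 1 and K (1-based position)
    # Take the unit at the start position, then every K-th unit, n units in all.
    return [frame[start - 1 + i * k] for i in range(n)]


def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Pick clusters at random, then measure all units or subsample within each.

    clusters maps a cluster name to its list of units. With units_per_cluster=None
    every unit in a chosen cluster is kept; otherwise a simple random sample of
    that size is drawn inside each chosen cluster (a second sampling stage).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        units = clusters[name]
        if units_per_cluster is None:
            result[name] = list(units)
        else:
            result[name] = rng.sample(units, units_per_cluster)
    return result


# Hypothetical frame of 100 numbered units, sample of 10: K = 10.
frame = list(range(1, 101))
picked = systematic_sample(frame, 10, seed=42)

# Hypothetical geographic clusters of household IDs.
regions = {
    "North": [1, 2, 3, 4],
    "South": [5, 6, 7],
    "East": [8, 9, 10, 11, 12],
    "West": [13, 14],
}
one_stage = cluster_sample(regions, n_clusters=2, seed=7)
two_stage = cluster_sample(regions, n_clusters=2, units_per_cluster=2, seed=7)
print(picked, one_stage, two_stage)
```

Because the start is at most K and the interval is K, the last selected position never exceeds nK, which is at most N, so the selection stays inside the frame.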
Strictly speaking, when you measure all the units in the selected clusters, the procedure is called cluster sampling. If you do further sampling within each cluster by adopting simple random sampling or stratified random sampling, the procedure becomes multi-stage sampling.

Non-Probability Sampling
The fundamental difference between non-probability sampling and probability sampling is that in a non-probability sampling procedure, the selection of the sample units does not ensure a known chance of selection for each unit. In other words, the units are selected without using the principle of probability.

Even though non-probability sampling has advantages such as reduced cost, speed, and convenience in implementation, it lacks accuracy because of selection bias. Another drawback is that results from the sample cannot be validly generalized to the population. Inferential statistics requires probability sampling for valid conclusions. Non-probability sampling is suitable for pilot studies and exploratory research.

Convenience Sampling
Using college and university students in studies
involving attitudes towards co-education is basically a
matter of convenience. In consumer panel studies you
may use clients who are available to you as your
respondents for giving their opinion on products and
services. In many research projects, you simply look
for volunteers to participate. This is how the
convenience sampling is done.
For heaven's sake, don't generalize results based on convenience sampling!

Expert Opinion Sampling
Expert Opinion Sampling involves gathering a set of
people who have the knowledge and expertise in
certain key areas that are crucial to decision making.
In qualitative methods of demand projection for a new
product, you use the expert opinion method to arrive
at a reasonable forecast. The advantage of this
sampling is that it acts as a support mechanism for
some of your decisions in situations where virtually
no data are available. The major disadvantage is that
even the experts can have prejudices, likes, and
dislikes that might distort the results.

Quota Sampling
In simple terms, quota sampling is stratified sampling without the probability principle being applied to the selection of the sample units. Suppose that in an opinion study you want both men and women to participate. You know that in the population category of interest, 65% are men and 35% are women. If your sample size is fixed at 200, you will have a quota of 130 men and 70 women. It doesn't matter how you get them as long as you have met the quota. There are some socio-economic studies where quota sampling is the only way out because of practical considerations. You can produce descriptive statistics, graphs, charts, and summary tables and stop there. Any conclusions drawn from a quota sample will be highly tentative. None of the statistical inference techniques should be applied when you have followed quota sampling or, for that matter, any non-probability sampling procedure.

3) Sampling Distribution-A Conceptual Framework
The probability distribution of all the possible values a sample statistic can take is called the sampling distribution of the statistic. The key term here is “sample statistic”. The sample mean and the sample proportion based on a random sample are examples of sample statistics. Please note that we are not interested in the probability distribution of a set of raw numbers. We are interested in the probability distribution of a statistic, which can assume different values in an experiment that involves repeatedly drawing random samples of the same size from a population and computing the statistic afresh every time.
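The repeated-sampling experiment described above can be imitated on a computer. The sketch below is an illustration only. The population of the integers 1 to 1000, the sample size of 30, the 2000 repetitions, and the function name `sampling_distribution` are all assumptions made for the example, and a finite number of repetitions only approximates the theoretical sampling distribution.

```python
import random
import statistics


def sampling_distribution(population, n, statistic, reps, seed=0):
    """Draw `reps` random samples of size n and compute the statistic afresh each time."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))

# The same experiment run for two different statistics.
means = sampling_distribution(population, n=30, statistic=statistics.mean, reps=2000)
medians = sampling_distribution(population, n=30, statistic=statistics.median, reps=2000)

print("population mean:", statistics.mean(population))
print("mean and spread of the sample means:", statistics.mean(means), statistics.stdev(means))
print("mean and spread of the sample medians:", statistics.mean(medians), statistics.stdev(medians))
```

Each list holds one value of the statistic per sample, so a histogram of `means` or `medians` is an empirical picture of that statistic's sampling distribution.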
Cautionary Note: Many students feel that the expression “sampling distribution” automatically implies the sampling distribution of the mean. This is not correct. You can have a sampling distribution of the median as well. Please notice that the sampling distribution of the mean is not the same as the sampling distribution of the median. They are different distributions.

4) Concept of Standard Error
What is the standard deviation of the sample statistic called? Can you guess? It is called the Standard Error of the Statistic. In other words, the standard deviation of the sampling distribution of a statistic is called the Standard Error of the Estimate. Please note that any sample statistic is an estimate of a population parameter. The standard deviation of the distribution of the sample means is called the standard error of the mean. Likewise, the standard deviation of the distribution of the sample proportions is called the standard error of the proportion.
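As a rough illustration of the two standard errors just named, the sketch below uses the usual textbook estimates: the sample standard deviation divided by the square root of n for the mean, and the square root of p(1 - p)/n for a proportion. These formulas are not stated on this slide, and the data values and function names are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_of_proportion(p_hat, n):
    """Estimated standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample of 8 measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se_mean = standard_error_of_mean(sample)

# Hypothetical survey: 50% of 100 respondents say yes.
se_prop = standard_error_of_proportion(0.5, 100)

print(se_mean, se_prop)
```

Both estimates shrink as n grows, which is why larger samples give more precise estimates.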
In the context of sampling, the standard error is popularly known as the sampling error. We said earlier that a sample statistic is an estimate. The sampling error throws light on the precision of our estimate. The logical question is how to compute the sampling error. Can you say how? Of course you can, if you think intuitively. The larger the sample standard deviation, the larger the standard error. The larger the standard error, the larger the sampling error.

5) Sampling Distribution of Mean-Normal Population
If you have gone through the initial discussion on the concepts of the sampling distribution of the mean along with the example given, the sampling distribution of the mean from a normal population is a logical extension of the same principle. The samples are randomly drawn from a normal population with a certain mean and standard deviation. You have a clear structure for the sampling distribution of the mean, with terms and notation given in the next slide.
If $X_1, X_2, X_3, \ldots, X_n$ are n independent random observations drawn from a Normal Population with Mean = μ and Standard Deviation = σ, then the sampling distribution of $\bar{X}$ follows a Normal Distribution with Mean = μ and Standard Deviation = $\sigma/\sqrt{n}$.

6) Sampling Distribution of Mean-Non-Normal Population
Central Limit Theorem
The distinguishing and unique feature of the central limit theorem is that, irrespective of the shape of the distribution of the original population, the sampling distribution of the mean approaches a normal distribution as the sample size becomes large. How large is large? A rule of thumb based on experience says a sample size of 30 or more is considered large. It works reasonably well in a large number of problems. Please remember that n is the size of the sample from which each mean is computed, not the number of samples. Also note that, theoretically speaking, you compute the mean for an infinite number of samples. In practice, this means a sufficiently large number of samples.

Mini Case
A company is seriously considering buying a type of machine that can save significant labor hours. The labor hours saved by the machine follow a normal distribution with mean = 2200 and a certain standard deviation. It is known that there is a fifty-fifty chance that the labor hours saved by the machine are either greater than 2400 or less than 2000. The price of the machine is Rs. 860000. The incremental cost of a labor hour incurred in the company is currently Rs. 400. The company has also performed 36 trial runs to find out how the machine is faring. The company would like to make a preliminary assessment before buying the machine. It requires your help to answer the following fill-in-the-blank questions:

Mini Case-Questions
1) The standard deviation of the population distribution of the labor hours saved by the machine is --------------
2) The standard error of the sample mean of labor hours saved is --------------
3) The probability that the sample mean of labor hours saved will exceed the break-even labor hours is --------------

Mini Case-Solution
Facts of the Case:
The cost of the machine is Rs. 860000. The incremental cost of a labor hour = Rs. 400. Therefore the break-even point in labor hours = 860000/400 = 2150. If the machine can save more than 2150 labor hours, it can be considered for buying. The case also says that there is a 50-50 chance that the labor hours saved by the machine are either greater than 2400 or less than 2000. What does this mean? It means that P(X > 2400) + P(X < 2000) = 0.50, where X is the random variable denoting the labor hours saved. Can you draw the correct diagram for the problem? Then you can easily answer the questions of the case.
The cumulative probability up to 2000 = 0.25. This follows from the given information: 2000 and 2400 are equidistant from the mean of 2200, so by the symmetry of the normal distribution each tail has probability 0.50/2 = 0.25. It is easy to solve this problem using z because you can get the value of z corresponding to a cumulative probability using NORMSINV of Excel.

Here $z = (x - \mu)/\sigma = (2000 - 2200)/\sigma = -200/\sigma$. For what value of $-200/\sigma$ is the cumulative probability = 0.25?

The steps are given below one by one. First click Paste Function in Excel. In the Paste Function dialog, select “Statistical” on the left side, highlight NORMSINV, and click OK. Enter the probability value 0.25 and then click OK. The answer appears under the formula result.

The answer is -0.6745 if we round it off to 4 decimal places. That means $-200/\sigma = -0.6745$. Therefore $\sigma = 200/0.6745 = 296.5159$.
1) This is the standard deviation of the original population distribution of labor hours saved. It is approximately equal to 297 hours.
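For readers without Excel, the same step can be reproduced with Python's standard library. This is a sketch offered as an equivalent of the NORMSINV step, not part of the original solution. `NormalDist().inv_cdf` returns the standard normal quantile, and the variable names are illustrative.

```python
from statistics import NormalDist

mean_hours = 2200      # population mean of labor hours saved (from the case)
lower_cutoff = 2000    # value with cumulative probability 0.25
tail_probability = 0.25

# Standard normal quantile for 0.25, the counterpart of NORMSINV(0.25) in Excel.
z = NormalDist().inv_cdf(tail_probability)

# z = (x - mean) / sigma, so sigma = (x - mean) / z.
sigma = (lower_cutoff - mean_hours) / z

print(round(z, 4), round(sigma, 2))
```

The result differs from 296.5159 only in the later decimals, because the slide divides by the rounded value 0.6745. Both round to 297 hours.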
2) The standard error of the sample mean = $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 297/\sqrt{36} = 297/6 = 49.5$
3) The probability of the sample mean of labor hours exceeding the break-even point labor hours is the same as $P(\bar{X} > 2150)$. Using NORMDIST we can get the answer directly. Please note that the standard deviation of the sample mean, that is 49.5, is to be used for answering this question.
Enter the value 2150 for X, mean = 2200, standard deviation = 49.5, and 1 for cumulative. Click OK. Excel returns the cumulative probability up to 2150 = 0.1562. Therefore the probability of getting more than 2150 = 1 - 0.1562 = 0.8438. That is, there is about an 84% chance that the mean labor hours saved by the machine will be more than the break-even labor hours when a sample of 36 trials is taken.
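Answers 2 and 3 can be cross-checked in Python as well. This is a minimal sketch that mirrors the NORMDIST step with `statistics.NormalDist`. It takes the rounded standard deviation of 297 hours from answer 1 as given, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 297                    # population standard deviation from answer 1 (rounded)
n_trials = 36                  # number of trial runs
mean_hours = 2200              # population mean of labor hours saved
break_even = 860000 / 400      # machine price / cost per labor hour = 2150

# Answer 2: standard error of the sample mean.
standard_error = sigma / math.sqrt(n_trials)

# Answer 3: the sample mean is normal with the same mean and the standard error as its spread.
mean_distribution = NormalDist(mu=mean_hours, sigma=standard_error)
p_below = mean_distribution.cdf(break_even)   # counterpart of NORMDIST(2150, 2200, 49.5, 1)
p_above = 1 - p_below

print(standard_error, round(p_below, 4), round(p_above, 4))
```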
