Chapter 4
Measures of Variation
09/13/15

Contents
Introduction
Variability and Decision Making
Absolute and Relative Variation
Measures of Variation
Mean Deviation
Standard Deviation
MS Excel and Measures of Variation
Relationship Between Arithmetic Mean and Standard Deviation
Relationship Between Different Measures of Variation
Lorenz Curve

Variability and Decision Making
Variability is a universal phenomenon.
People vary according to their height, weight,
intelligence etc.
There are differences in income and assets of
the individuals in an economy.
The businesses differ on account of their
sales.
The quantity of detergent powder in different
one-kg pouches is not likely to be exactly
the same.

Variability and Decision Making: Importance of Variation
The extent of variation may be the main underlying criterion in deciding whether or not to buy a given security.
A key input in determining call option and put option values is the variability in the price of the share involved in the option.
Variability thus plays a key role, given that options worth millions of rupees are traded annually.

Variability and Decision Making (Contd.)
Variability has an important role to play
in quality control in industrial
production.
Variability is the key to deciding whether a production process is under control or not.

Absolute and Relative Variation
The variability or dispersion may be measured in absolute or relative terms.
Absolute variation refers to the quantum of
variability in a given set of data and it is
expressed in the same units as the original
data are in.
For example, the absolute variation in the
salaries of the workers and their ages will
be expressed in terms of rupees and years
respectively.

Absolute and Relative Variation (Contd.)
Absolute variation does not allow comparison of two or more sets of data in terms of their variability, especially when they involve different variables and, therefore, different units of measurement.
A comparison can be made only when we get rid of the units of the data involved.
This is done by computing measures of relative variation, called co-efficients of dispersion, which are expressed as pure numbers in the form of proportions or percentages.

Absolute and Relative Variation (Contd.)
For example, if we wish to compare the variability in the wages of skilled workers with that in the wages of unskilled workers, misleading results can be obtained if the comparison is made in terms of a measure of absolute variation.
This is because, although the wages of both groups are expressed in the same units, namely rupees, the groups are evidently dissimilar.

Absolute and Relative Variation (Contd.)
Therefore, whenever two or more groups are to be compared for
variation, some measure of relative
variation should be used.

Measures of Variation
Although several measures are available for studying variation, the choice of an appropriate measure depends upon the type of distribution and the precision required. The different measures are:
A. Measures involving limits
   a) Range
   b) Partial ranges, comprising
      i. Inter-quartile range
      ii. Inter-percentile ranges
   c) Quartile deviation

Measures of Variation (Contd.)
B. Measures involving deviations
   a) Mean Deviation
   b) Standard Deviation
Besides, variation is studied graphically for some special applications:
   a) Lorenz Curve

Range
Range is defined as the difference
between the largest and the smallest
values in the data set.
This is the most commonly understood and used measure of variation.
Symbolically, it can be shown as
R = L – S
where R is the range, L is the largest value and S is the smallest value in the data set.
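
As a small illustration of R = L – S, the sketch below computes the range in Python. The function name `value_range` and the wage figures are hypothetical, chosen only for this example.

```python
def value_range(values):
    """Return the range R = L - S of a non-empty collection of numbers."""
    largest = max(values)    # L, the largest value
    smallest = min(values)   # S, the smallest value
    return largest - smallest


# Hypothetical daily wages in rupees
wages = [250, 310, 275, 420, 390]
print(value_range(wages))  # 420 - 250 = 170
```

Only the two extreme values enter the result, so changing any of the middle observations leaves the range unchanged.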

Range (Contd.)
In the case of a discrete frequency distribution, the range is equal to the difference between the largest and smallest X values.
In the case of a grouped frequency distribution, it is equal to the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.

Range (Contd.)
Important points
i. In a grouped frequency distribution, the range is only an approximation since the actual largest and smallest values are not available.
ii. If a given frequency distribution is open-ended, then the range cannot be determined.

Co-efficient of Range
The range is a measure of absolute
variation. Its value is dependent on the
units of measurement of data.
Moreover, it is expressed in the same
units as the original data.
For comparison of two or more
distributions we calculate co-efficient
of range (CR), a measure of relative
variation.

Co-efficient of Range (Contd.)
Symbolically, CR = (L – S) / (L + S), where L is the largest and S is the smallest value in the data set.

Limitation and Uses of Range
Although the range is easy to calculate, it is difficult to interpret properly without regard to the number of items in the data.
The larger the sample size, the greater, in general, the range.
The range shows very large sampling fluctuations.
The range completely ignores the variability in the items between the two extremes.

Limitation and Uses of Range (Contd.)
The range is a poor measure of dispersion
in cases where the number of items in the
sample is large. However, in small
samples, it is useful in certain cases.
The range has a good deal of importance
in industrial quality control where a large
number of small samples are studied to
determine if the quality of the products
being produced by a production process is
under control or not.

Limitation and Uses of Range (Contd.)
There are many other situations where the range is used.
For example, the stock exchange
quotations and prices in the
commodity markets found in the
financial columns of newspapers are in
terms of the high and low.
Agricultural productivity in an area, rainfall, employment, etc. are also usually quoted in terms of range.

Partial Ranges
Range may not serve as a suitable indicator of variability in a set of data, particularly when some extreme values are present.
In those cases, it is advisable to use partial ranges to gauge the extent of variability.
The partial ranges do not consider the complete data and ignore part of the data on each end.
The commonly used partial ranges are the inter-quartile range and the inter-percentile range.

Inter-quartile Range
The distance between the lower and the upper quartiles of a distribution is termed the inter-quartile range, IQR.
It gives the range of the middle 50
percent of the values, ignoring 25
percent values on the lower end and
an equal number of values on the
upper end of the data.
Symbolically, IQR = Q3 – Q1

Inter-percentile Range
While the inter-quartile range covers the middle 50 percent of the data, the inter-percentile range allows a greater part of the data to be considered in gauging variability.
Although we can obtain the range of values
between any pair of percentiles located
equidistant from the middle of the distribution,
the most commonly used is the one based on
the 10th and the 90th percentiles, leaving 10
percent of the observations on each end of the
distribution.

Inter-percentile Range (Contd.)
The inter-percentile range (IPR) is defined as IPR = P90 – P10, where P90 is the 90th percentile and P10 is the 10th percentile of the distribution.
The inter-quartile range is itself a special case of the inter-percentile range: IQR is in fact the inter-percentile (25 – 75) range.

Quartile Deviation
The quartile deviation, QD, is numerically equal to one-half of the inter-quartile range.
It is also called the semi inter-quartile range.
Symbolically, QD = (Q3 – Q1) / 2
where Q3 and Q1 are, respectively, the third and the first quartile of a distribution.
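
The inter-quartile range and the quartile deviation can be sketched in Python as follows. The function name `quartile_spread` and the data are hypothetical. The quartiles come from the standard library's `statistics.quantiles` with its default "exclusive" method, which is only one of several quartile conventions, so other textbooks or tools may give slightly different Q1 and Q3 values for the same data.

```python
from statistics import quantiles


def quartile_spread(values):
    """Return (Q1, Q3, IQR, QD) for a list of at least two numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1    # inter-quartile range, IQR = Q3 - Q1
    qd = iqr / 2     # quartile deviation (semi inter-quartile range)
    return q1, q3, iqr, qd


# Hypothetical observations
data = [1, 2, 3, 4, 5, 6, 7]
q1, q3, iqr, qd = quartile_spread(data)
print(q1, q3, iqr, qd)
```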

Co-efficient of Quartile Deviation
The quartile deviation is an absolute measure of variation.
For comparison of distributions, the co-efficient of quartile deviation, CQD, is calculated.
The CQD is obtained by dividing the quartile deviation by one-half of the sum of the upper and lower quartiles: CQD = [(Q3 – Q1) / 2] / [(Q3 + Q1) / 2] = (Q3 – Q1) / (Q3 + Q1).

Co-efficient of Quartile Deviation (Contd.)
The co-efficient of quartile deviation is also called the quartile co-efficient of dispersion.
Quartile deviation, along with its co-efficient, is the most suitable measure to study and compare variability when open-ended distributions are given.

Mean Deviation
Range, inter-percentile range, and quartile deviation consider only part of the given data.
Mean deviation and standard deviation, on
the other hand, are the measures which
are based on all observations of the data.
Mean deviation, MD, refers to the average
of absolute deviations of observations from
their mean (or median or mode).

Mean Deviation (Contd.)
It is represented by δ (small delta) with a subscript, either X̄ or Me or Mo, depending on whether the deviations are taken from the mean, median or mode.
Since the calculation of mean deviation involves absolute deviations, it is also called mean absolute deviation (MAD).
Calculation of MD differs in case of individual series and grouped frequency distribution.

Calculation of Mean Deviation
In case of Individual Series: In such a case, the following steps are involved.
i. Obtain the average.
ii. Take the deviation of each of the observations from the average and consider the absolute values of these.
iii. Obtain the total of the absolute deviations and divide by n, the number of observations.
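
The three steps for an individual series can be sketched in Python as below. The function name `mean_deviation` and the data are hypothetical; the `average` argument lets the deviations be taken from the mean or from the median.

```python
from statistics import mean, median


def mean_deviation(values, average=mean):
    """Mean of absolute deviations from the chosen average (mean by default)."""
    centre = average(values)                       # step i: obtain the average
    abs_devs = [abs(x - centre) for x in values]   # step ii: absolute deviations
    return sum(abs_devs) / len(values)             # step iii: total divided by n


data = [2, 4, 6, 8, 10]                # hypothetical observations
print(mean_deviation(data))            # deviations taken from the mean
print(mean_deviation(data, median))    # deviations taken from the median
```

For these symmetric data the mean and the median coincide, so both calls give the same value; for skewed data the two results generally differ.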

Calculation of Mean Deviation (Contd.)
In case of Frequency Distribution:
i. Calculate the average value from which deviations are to be taken.
ii. Measure absolute deviations of X values (mid-points of the class intervals in case of grouped frequency distributions) from the average. Thus, if the median is the average used, compute |X – Me|.
iii. Multiply each of the deviations by the corresponding frequency and obtain the summation of these products. It is Σf|X – Me| when the median is used.
iv. Divide the summation obtained in step (iii) by Σf. Symbolically, δMe = Σf|X – Me| / Σf.
It may be noted that in case of open-ended distributions, we cannot calculate mean deviation because mid-points of the open-ended classes are not defined.
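
For a frequency distribution, the same idea weights each absolute deviation by its frequency. The sketch below is a minimal illustration with hypothetical classes and function names; the average (`centre`) is passed in, so it can be a mean, a median or a mode computed separately.

```python
def grouped_mean(midpoints, freqs):
    """Arithmetic mean of a frequency distribution."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_mean_deviation(midpoints, freqs, centre):
    """Sum of f * |X - centre| divided by the total frequency."""
    weighted = sum(f * abs(x - centre) for x, f in zip(midpoints, freqs))
    return weighted / sum(freqs)


# Hypothetical classes 0-10, 10-20, 20-30, represented by their mid-points
midpoints = [5, 15, 25]
freqs = [2, 6, 2]
centre = grouped_mean(midpoints, freqs)                   # 150 / 10 = 15.0
print(grouped_mean_deviation(midpoints, freqs, centre))   # (20 + 0 + 20) / 10 = 4.0
```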

Co-efficient of Mean Deviation (CMD)
The co-efficient of mean deviation is a measure of relative variation in a given set of data.
It is obtained by dividing the mean deviation by the value of the average used to measure deviations.
Symbolically, CMD = MD / (average from which the deviations are measured)

Standard Deviation
The measures of variation discussed so far suffer from some limitation or the other.
The standard deviation has no such limitations
and it also enjoys some well-defined mathematical
properties.
Moreover, it is extensively used in statistical
analysis.
Standard deviation is defined as the positive
square root of the average of squared deviations
taken from arithmetic mean.
It is also labeled as root-mean-squared-deviations.

Standard Deviation (Contd.)
Calculation of S.D.
Individual Series:
   Deviations from arithmetic mean
   Deviations from assumed mean
   Calculation without measuring deviations
Frequency Distribution:
   When deviations are taken from arithmetic mean
   When deviations are taken from assumed mean
   When step deviations are taken
   When deviations are not taken

Calculation of SD in Individual Series
Calculation using deviations from the arithmetic mean involves the following steps:
i. Calculate the arithmetic mean, X̄.
ii. Measure deviations of each of the given X values from X̄, and square them to get (X – X̄)².
iii. Find the average of the squared deviations, Σ(X – X̄)² / n.
iv. Finally, extract the positive square root of the average obtained in (iii) to get the standard deviation, σ.
Symbolically, σ = √[Σ(X – X̄)² / n]

Calculation of SD in Individual Series (Contd.)
The following formula is employed to obtain the standard deviation using deviations from an assumed mean, A:
σ = √[Σd² / n – (Σd / n)²]
where d = X – A,
Σd² indicates the summation of squares of deviations from A,
Σd indicates the summation of deviations from A, and
n is the number of observations.
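
The two routes for an individual series, deviations from the arithmetic mean and deviations from an assumed mean A, can be checked against each other with a short Python sketch. The function names and data are hypothetical; both functions divide by n, as in the definition of σ used here.

```python
from math import sqrt


def sd_from_mean(values):
    """sigma = sqrt(sum((X - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_from_assumed_mean(values, a):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), with d = X - A."""
    n = len(values)
    d = [x - a for x in values]
    return sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)


data = [2, 4, 6, 8, 10]                  # hypothetical observations
print(sd_from_mean(data))                # sqrt(40 / 5) = sqrt(8)
print(sd_from_assumed_mean(data, a=5))   # sqrt(45/5 - 1^2) = sqrt(8)
```

Any choice of A gives the same result; a value close to the true mean merely keeps the deviations small and the arithmetic convenient.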

Calculation of SD in Individual Series (Contd.)
Standard deviation can also be calculated without measuring deviations, using the following formula:
σ = √[ΣX² / n – (ΣX / n)²]

Calculation of SD in Frequency Distributions
The following steps are taken to calculate standard deviation when deviations are taken from the arithmetic mean.
i. Calculate the mean, X̄.
ii. Measure deviations of X values (mid-points in case of grouped frequency distributions) from X̄; square them and multiply by the corresponding frequencies, and add the products to get Σf(X – X̄)².
iii. Divide Σf(X – X̄)² by Σf. This gives the variance.
iv. Extract the positive square root of the variance to get the standard deviation.
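
These four steps translate directly into a few lines of Python. The sketch below uses hypothetical class mid-points and frequencies, and the function name `grouped_variance_and_sd` is an illustrative choice.

```python
from math import sqrt


def grouped_variance_and_sd(midpoints, freqs):
    """Return (variance, standard deviation) of a frequency distribution."""
    total = sum(freqs)                                             # sum of f
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total    # step i
    weighted_sq = sum(f * (x - mean) ** 2
                      for x, f in zip(midpoints, freqs))           # step ii
    variance = weighted_sq / total                                 # step iii
    return variance, sqrt(variance)                                # step iv


# Hypothetical classes 0-10, 10-20, 20-30, represented by their mid-points
variance, sd = grouped_variance_and_sd([5, 15, 25], [2, 6, 2])
print(variance, sd)   # variance = (200 + 0 + 200) / 10 = 40.0
```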

Calculation of SD in Frequency Distributions (Contd.)
The formula can be shown as σ = √[Σf(X – X̄)² / Σf]

Calculation of SD in Frequency Distributions (Contd.)
The following steps are involved in calculating standard deviation when deviations are taken from an assumed mean.
i. Choose an assumed mean, A.
ii. Measure deviations of X values from A and label the deviations as d. Multiply d and d² by the corresponding frequencies and add these to get Σfd and Σfd².
iii. Apply the formula: σ = √[Σfd² / Σf – (Σfd / Σf)²]

Calculation of SD in Frequency Distributions (Contd.)
When step deviations are taken, the formula employed is
σ = √[Σfd′² / Σf – (Σfd′ / Σf)²] × h
where d′ = (X – A) / h is the step deviation and h is the common factor by which the deviations are divided (usually the class width).
The formula to calculate SD when deviations are not taken is
σ = √[ΣfX² / Σf – (ΣfX / Σf)²]
where ΣfX² is the summation of the products of the squares of X values and the corresponding frequencies.
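
The shortcut forms can be verified numerically. The sketch below assumes the standard step-deviation form, with d′ = (X – A)/h and the result scaled back by h, and the standard form without deviations, √[ΣfX²/Σf – (ΣfX/Σf)²]. The function names, the symbols `a` and `h`, and the data are assumptions made for illustration.

```python
from math import sqrt


def sd_step_deviation(midpoints, freqs, a, h):
    """SD using step deviations d' = (X - a) / h, scaled back by h."""
    total = sum(freqs)
    steps = [(x - a) / h for x in midpoints]
    mean_d = sum(f * d for d, f in zip(steps, freqs)) / total
    mean_d2 = sum(f * d * d for d, f in zip(steps, freqs)) / total
    return h * sqrt(mean_d2 - mean_d ** 2)


def sd_without_deviations(midpoints, freqs):
    """SD from the sums of fX and fX^2, without taking deviations."""
    total = sum(freqs)
    mean_x = sum(f * x for x, f in zip(midpoints, freqs)) / total
    mean_x2 = sum(f * x * x for x, f in zip(midpoints, freqs)) / total
    return sqrt(mean_x2 - mean_x ** 2)


midpoints, freqs = [5, 15, 25], [2, 6, 2]   # hypothetical distribution
print(sd_step_deviation(midpoints, freqs, a=15, h=10))
print(sd_without_deviations(midpoints, freqs))
```

Both calls should agree with the value obtained from deviations about the arithmetic mean (here the square root of 40), since the formulas are algebraically equivalent.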

Properties of Standard Deviation
The standard deviation possesses some well-defined mathematical properties which make it a useful statistical tool.
The properties are discussed below.
i. If a constant K is added to, or deducted from, each value in a given series, the value of standard deviation remains unaffected. Symbolically, σ(X ± K) = σ(X).

Properties of Standard Deviation (Contd.)
ii. If every value in a series is multiplied (divided) by a constant, the standard deviation is multiplied (divided) by the absolute value of the constant. Symbolically, σ(KX) = |K| σ(X) and σ(X / K) = σ(X) / |K|.

Properties of Standard Deviation (Contd.)
iii. The combined SD of two or more sets of data can be obtained from their individual means and standard deviation values.
For two distributions with n1 and n2 observations, X̄1 and X̄2 mean values, and σ1 and σ2 standard deviations respectively, we have the following:
σ = √{[n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2)}

Properties of Standard Deviation (Contd.)
where σ is the combined standard deviation, d1 = X̄1 – X̄, d2 = X̄2 – X̄, and X̄ is the combined mean obtained as X̄ = (n1X̄1 + n2X̄2) / (n1 + n2).
The rule can be extended to any number of data sets.
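
The combined standard deviation of two groups can be sketched in Python as below, assuming the usual form σ = √{[n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2)}, where d1 and d2 are the differences between each group mean and the combined mean. The function name and the two groups are hypothetical; the result is compared with the population standard deviation (`statistics.pstdev`) of the pooled data.

```python
from math import sqrt
from statistics import mean, pstdev


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    total = n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)
    return sqrt(total / (n1 + n2))


group1 = [2, 4, 6]    # hypothetical data
group2 = [8, 10]
from_summary = combined_sd(len(group1), mean(group1), pstdev(group1),
                           len(group2), mean(group2), pstdev(group2))
from_pooled = pstdev(group1 + group2)
print(from_summary, from_pooled)   # the two values should agree
```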

Properties of Standard Deviation (Contd.)
For the cases involving two distributions, the following alternative formula can be employed:
σ = √[(n1σ1² + n2σ2²) / (n1 + n2) + n1n2(X̄1 – X̄2)² / (n1 + n2)²]

Variance
The variance of a set of data refers to the average of the squared deviations measured from the arithmetic mean.
It is, therefore, equal to the squared standard deviation, σ². We have, σ² = Σ(X – X̄)² / n.

Properties of Variance
i. If each value in a given series is increased (decreased) by a constant, K, the variance remains unaffected.
ii. If each value in a given series is multiplied (divided) by a constant, K, the variance of the new series would be equal to the variance of the given series multiplied (divided) by K². Symbolically, Var(KX) = K² Var(X) and Var(X / K) = Var(X) / K².

Properties of Variance (Contd.)
iii. The combined variance of two or more series can be obtained given their sizes, means and variances.
For two series, σ² = [n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2)
In this, d1 = X̄1 – X̄ and d2 = X̄2 – X̄, where X̄ is the combined mean.

Co-efficient of Variation
The standard deviation (and variance) cannot be used to compare variability between two groups of data of different types, since both are measures of absolute variation.
For example, a comparison of variation in wages and production cannot be made in terms of their standard deviations since they involve different units. Even if they involve the same units, it is not advisable to use a measure of absolute variation.
A valid comparison calls for using the co-efficient of variation, which is a measure of relative variation.

Co-efficient of Variation (Contd.)
The co-efficient of variation expresses the standard deviation as a percentage of the arithmetic mean. Thus, CV = (σ / X̄) × 100.

MS Excel and Measures of Variation
MS Excel provides several functions related to variation which can be usefully employed.
In the context of the measures of variation discussed in this chapter, the following functions may be used when a set of individual values is given. The functions are:
I. AVEDEV: It returns the mean of the absolute deviations of the given set of values from their arithmetic mean. It thus yields the mean deviation about the mean for the given data.
II. STDEV: This gives the standard deviation of the given values under the assumption that the values represent sample data.
III. STDEVP: This also gives the standard deviation of the given values. It does so under the assumption that the values are a set of population data.
IV. VAR: It yields the variance of the given values, assuming they are data from a sample.
V. VARP: Like STDEVP, it considers the values to be a set of population data. It gives the variance of the given set of data.
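
For readers working outside Excel, the sample versus population distinction can be illustrated with Python's standard `statistics` module. The pairing with the Excel functions in the comments is an informal comparison made for this sketch, not part of the source material, and the data are hypothetical.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical observations

# Mean of absolute deviations about the mean (what AVEDEV computes)
avedev = sum(abs(x - mean(data)) for x in data) / len(data)

sample_sd = stdev(data)            # sample SD, divides by n - 1 (cf. STDEV)
population_sd = pstdev(data)       # population SD, divides by n (cf. STDEVP)
sample_var = variance(data)        # sample variance (cf. VAR)
population_var = pvariance(data)   # population variance (cf. VARP)

print(avedev, sample_sd, population_sd, sample_var, population_var)
```

The sample versions divide by n – 1 rather than n, so they are always at least as large as the population versions for the same data. The σ formulas in this chapter divide by n and therefore correspond to the population versions.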

Relationship between Arithmetic Mean and Standard Deviation
The arithmetic mean is a measure of central tendency while the standard deviation is a measure of variation or the spread of the values in a set of data.
There are two general rules which establish a relationship between these measures in a given set of data. One of these is called Chebyshev’s theorem or Chebyshev’s inequality, while the other is known as the empirical rule.

Chebyshev’s Theorem
It is a mathematical theorem which states that in any distribution, at least 1 – 1/k² of the observations will lie within k standard deviations of the mean, for any k greater than 1.

Empirical Rule
In addition to Chebyshev’s theorem, there is another statement about the relationship between mean and standard deviation. This is called the empirical rule.
It is based on the assumption that the underlying population is bell-shaped, tapering off smoothly at both ends.
Such a distribution is called a normal distribution.
[Figure: bell-shaped curve of a normal distribution]

Empirical Rule (Contd.)
For a normal distribution, 68.27% of the values are included within one standard deviation below the mean and one standard deviation above it.
Similarly, µ ± 2σ covers 95.45% of the values while µ ± 3σ covers 99.73% of the values of the distribution.
It is significant to note that while this relationship holds exactly only for normal distributions, it can be used as an approximation for distributions that are not strictly normal.
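
The percentages of the empirical rule, and the weaker guarantee given by Chebyshev's theorem, can be reproduced with a short Python sketch. It relies on the fact that, for a normal distribution, the share of values within k standard deviations of the mean equals erf(k/√2); the function names are hypothetical.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


def chebyshev_bound(k):
    """Chebyshev's lower bound 1 - 1/k^2 (informative only for k > 1)."""
    return 1 - 1 / k ** 2


for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4), round(chebyshev_bound(k), 4))
```

For k = 2 the normal coverage is about 0.9545 while Chebyshev's theorem guarantees only 0.75; the theorem is weaker because it holds for any distribution, not just a bell-shaped one.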

Empirical Rule (Contd.)
Consequently, for most frequency distributions which are symmetrical or nearly symmetrical and are large in size, the following rules may be used.
Nearly 68 percent of the observations are included within one standard deviation of the mean (X̄) on either side of it. That is to say, X̄ ± σ covers nearly 68 percent of the data values.
Nearly 95 percent of the observations are included within two standard deviations of the mean on either side of it. Thus, X̄ ± 2σ covers about 95 percent of the values of data.
Nearly all values in the data are included within three standard deviations of the mean value. Accordingly, X̄ ± 3σ covers almost all observations in the distribution.

Z-scores
The relationship between mean and standard deviation also allows one to determine the relative position of an X-value.
This is done by what is called the z-score.
The z-score, also called z-transformation, of X is defined as z = (X – X̄) / σ.

Z-scores (Contd.)
Evidently, a z-score determines the relative position of any value in terms of the number of standard deviations above or below the mean.
A positive value of Z-score indicates that the...
