Lessons in Business Statistics
Prepared By P.K. Viswanathan

Chapter 3: Measures of Central Tendency and Dispersion

Introduction
Raw data are the raw material that must be converted into a finished product (information). From a voluminous database of raw data, it is impossible to see any pattern unless the data are converted into information by data reduction. The reduction can be achieved by summary measures, which are concise and yet give a reasonably accurate view of the original data. This chapter covers the important summary measures of central tendency and dispersion (variation).

1) What is Central Tendency?
Whenever you measure things of the same kind, a fairly large number of the measurements will tend to cluster around a middle value. The question that arises is: "Is it possible to define one typical, representative average such that the remaining items in the data set cluster around, or tend to be close to, this value?" Such a value is called a measure of "Central Tendency". Other terms used synonymously are "Measures of Location" and "Statistical Averages".

2) Measures of Central Tendency
Quantitative specialists, statisticians, and information analysts rely heavily on summary measures when a large mass of data has to be analyzed to help decision makers. As a manager, you need these summary measures of central tendency to draw meaningful conclusions in your functional area of operation. The most widely used measures of central tendency are the Arithmetic Mean, Median, and Mode.

Arithmetic Mean
The arithmetic mean (called the mean) is the most common measure of central tendency used by managers in their sphere of activities. It is defined as the sum of all observations in a data set divided by the total number of observations. For example, consider a data set containing the following observations: 4, 3, 6, 5, 3, 3. The arithmetic mean = (4+3+6+5+3+3)/6 = 4. In symbolic form, the mean is given by

$\bar{X} = \frac{\sum X}{n}$

where
$\bar{X}$ = Arithmetic Mean
$\sum X$ = Sum of all X values in the data set
$n$ = Total number of observations (Sample Size)

Arithmetic Mean for Raw Data
Example
The inner diameters of a particular grade of tire, based on 5 sample measurements, are as follows (figures in millimeters):
565, 570, 572, 568, 585
Applying the formula $\bar{X} = \frac{\sum X}{n}$, we get mean = (565+570+572+568+585)/5 = 572.
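The calculation can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not something defined in these notes.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Tire inner diameters in millimeters, taken from the example.
diameters = [565, 570, 572, 568, 585]
print(arithmetic_mean(diameters))            # 572.0

# The six-value data set used when the mean was first defined.
print(arithmetic_mean([4, 3, 6, 5, 3, 3]))   # 4.0
```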
Caution: The arithmetic mean is affected by extreme values and by fluctuations in sampling. It is not the best average to use when the data set contains extreme values (very high or very low values).

Median
The median is the middle-most observation when you arrange the data in ascending or descending order of magnitude. That is, the data are ranked and the middle value is picked. The median is such that 50% of the observations are above it and 50% are below it.
The median is a very useful measure for ranked data in the context of consumer preferences and ratings. It is not affected by extreme values, but it is affected by the number of observations.
$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}$ value of ranked data
where $n$ = Number of observations in the sample
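As a rough illustration of the position rule, the sketch below ranks the data and picks the middle value. It reuses the tire diameters and the six-value data set from the arithmetic mean section. Averaging the two middle values when the count is even is the usual convention, and the function name `median` is just an illustrative choice.

```python
def median(values):
    """Return the middle value of the ranked data."""
    if not values:
        raise ValueError("need at least one observation")
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th value, which is index n//2 when counting from 0.
        return ranked[mid]
    # Even n: average the two middle values.
    return (ranked[mid - 1] + ranked[mid]) / 2


# Odd sample size: tire diameters (mm). The mean was 572, pulled up by 585.
print(median([565, 570, 572, 568, 585]))   # 570

# Even sample size: ranked data are 3, 3, 3, 4, 5, 6.
print(median([4, 3, 6, 5, 3, 3]))          # 3.5
```

For the tire data the median (570) sits below the mean (572) because the single large reading of 585 affects the mean but not the median.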
Note: If the sample size is odd, the median is the (n+1)/2 th value in the ranked data. If the sample size is even, the median falls between the two middle values; take the average of these two values.

Median for Raw Data
Example: Odd Sample Size
Marks obtained by 7 students in a Computer Science exam are given below. Compute the median.
45 40 60 80 90 65 55
Arranging the data after ranking gives
90 80 65 60 55 45 40
Median = (n+1)/2 th value in this set = (7+1)/2 th observation = 4th observation = 60.
Hence Median = 60 for this problem.

Median for Raw Data
Example: Even Sample Size
The diameter of a shaft in millimeters in a manufacturing unit is given below for 10 samples. Calculate the median value.
2.50 2.45 2.55 2.60 2.46 2.43 2.56 2.58 2.66 2.65
Arranging the data in ascending order, you will get
2.43 2.45 2.46 2.50 2.55 2.56 2.58 2.60 2.65 2.66
The median falls between the 5th and 6th observations, that is, between 2.55 and 2.56. Hence median = (2.55+2.56)/2 = 2.555.

Mode
The mode is the value that occurs most often; it has the maximum frequency of occurrence. The mode is not affected by extreme values.
The mode is a very useful measure when, for example, you want to keep the most popular shirt, in terms of collar size, in inventory during the festival season. The median and mean will not be helpful in this type of situation. Another example where the mode is the only answer is in determining the most typical shoe size to be kept in stock in a shoe shop.
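Counting occurrences is all that is needed to find the mode. The sketch below is a minimal illustration using Python's standard `collections.Counter`. The function name `modes` is hypothetical, and it returns every value tied for the highest count, since a data set may have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())        # the largest frequency
    return sorted(v for v, c in counts.items() if c == top)


# The six-value data set from the arithmetic mean section: 3 occurs three times.
print(modes([4, 3, 6, 5, 3, 3]))   # [3]
```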
Caution: In a few real-life problems, there will be more than one mode (bimodal or multimodal data). In these cases the mode cannot be uniquely determined.

Mode for Raw Data
Example
The lives, in hours, of 10 flashlight batteries are as follows. Find the mode.
340, 350, 340, 340, 320, 340, 330, 330, 340, 350
340 occurs five times. Hence, mode = 340.

Mean for Grouped Data
The formula for the mean is given by

$\bar{X} = \frac{\sum fX}{n}$

where
$\bar{X}$ = Mean
$\sum fX$ = Sum of cross products of the frequency in each class with the midpoint X of that class
$n$ = Total number of observations (Total frequency) = $\sum f$

Mean for Grouped Data
Example
Find the arithmetic mean for the following continuous frequency distribution:

Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2

Solution for the Example
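One way to work this example is a short Python sketch that multiplies each class midpoint by its frequency and divides by the total frequency. The representation of classes as (lower, upper) pairs and the name `grouped_mean` are illustrative assumptions.

```python
# Class limits and frequencies from the example.
classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
frequencies = [1, 4, 8, 7, 3, 2]


def grouped_mean(classes, frequencies):
    """Mean of a frequency distribution: sum(f * X) / sum(f), X = class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)                                          # total frequency
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # cross products
    return sum_fx / n


print(grouped_mean(classes, frequencies))   # 3.02
```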
Class      X       f       fX
0-1       0.5      1      0.5
1-2       1.5      4      6.0
2-3       2.5      8     20.0
3-4       3.5      7     24.5
4-5       4.5      3     13.5
5-6       5.5      2     11.0
Totals            25     75.5
Mean                     3.02

Applying the formula, $\bar{X} = \frac{\sum fX}{n}$ = 75.5/25 = 3.02

Median for Grouped Data
The formula for the median is given by

$\text{Median} = L + \frac{(n/2) - m}{f} \times c$

where
L = Lower limit of the median class
n = Total number of observations = $\sum f$
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class

Median for Grouped Data
Example
Find the median for the following continuous frequency distribution:

Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2

Solution for the Example
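The steps can be sketched in Python: accumulate the frequencies until the running total reaches n/2, which identifies the median class, then apply the formula. The function name `grouped_median` and the (lower, upper) class representation are illustrative assumptions.

```python
# Class limits and frequencies from the example.
classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
frequencies = [1, 4, 8, 7, 3, 2]


def grouped_median(classes, frequencies):
    """Median = L + ((n/2 - m) / f) * c for the class containing the n/2-th item."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    m = 0  # cumulative frequency preceding the current class
    for (lower, upper), f in zip(classes, frequencies):
        if m + f >= half:
            c = upper - lower              # class interval of the median class
            return lower + (half - m) / f * c
        m += f
    raise ValueError("median class not found")


print(grouped_median(classes, frequencies))   # 2.9375
```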
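Class    Frequency    Cumulative Frequency
0-1          1               1
1-2          4               5
2-3          8              13
3-4          7              20
4-5          3              23
5-6          2              25
Total       25

The median class is 2-3, the first class whose cumulative frequency (13) reaches n/2 = 12.5. Substituting the relevant values in the formula $\text{Median} = L + \frac{(n/2) - m}{f} \times c$, we have

$\text{Median} = 2 + \frac{(25/2) - 5}{8} \times 1 = 2.9375$

Mode for Grouped Data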
$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$

where
L = Lower limit of the modal class
$d_1 = f_1 - f_0$
$d_2 = f_1 - f_2$
$f_1$ = Frequency of the modal class
$f_0$ = Frequency of the class preceding the modal class
$f_2$ = Frequency of the class succeeding the modal class
c = Class interval of the modal class

Mode for Grouped Data
Example
Find the mode for the following continuous frequency distribution:

Class        0-1   1-2   2-3   3-4   4-5   5-6
Frequency      1     4     8     7     3     2

Solution for the Example
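A minimal Python sketch of the same calculation follows. It assumes a single modal class (the first class with the largest frequency) and treats a missing neighbour at either end of the table as frequency 0. Both are assumptions of this sketch rather than rules stated in these notes, and the name `grouped_mode` is illustrative.

```python
# Class limits and frequencies from the example.
classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
frequencies = [1, 4, 8, 7, 3, 2]


def grouped_mode(classes, frequencies):
    """Mode = L + d1 / (d1 + d2) * c for the class with the largest frequency."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])  # modal class
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # preceding class
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # succeeding class
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("mode is not uniquely determined")
    lower, upper = classes[i]
    return lower + d1 / (d1 + d2) * (upper - lower)


print(round(grouped_mode(classes, frequencies), 4))   # 2.8
```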
Class    Frequency
0-1          1
1-2          4
2-3          8
3-4          7
4-5          3
5-6          2
Total       25

$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$
L = 2
$d_1 = f_1 - f_0$ = 8 − 4 = 4
$d_2 = f_1 - f_2$ = 8 − 7 = 1
c = 1
Hence $\text{Mode} = 2 + \frac{4}{5} \times 1 = 2.8$

Comparison of Mean, Median, Mode
Mean | Median | Mode
Defined as the arithmetic average of all observations in the data set. | Defined as the middle value in the data set arranged in ascending or descending order. | Defined as the most frequently occurring value in the distribution; it has the largest frequency.
Requires measurement on all observations. | Does not require measurement on all observations. | Does not require measurement on all observations.
Uniquely and comprehensively defined. | Cannot be determined under all conditions. | Not uniquely defined for multimodal situations.
Affected by extreme values. | Not affected by extreme values. | Not affected by extreme values.
Can be treated algebraically. That is, means of several groups can be combined. | Cannot be treated algebraically. That is, medians of several groups cannot be combined. | Cannot be treated algebraically. That is, modes of several groups cannot be combined.

3) Measures of Dispersion
In simple terms, measures of dispersion indicate how large the spread of the distribution is around the central tendency. They answer the question: "What is the magnitude of departure from the average value for different groups having identical averages?" It is important to study central tendency along with dispersion to throw light on the shape of the curve, and to gauge whether there is distortion from the bell-shaped, symmetrical normal distribution curve that forms the foundation upon which statistical inference is built.

Range
Range is the simplest of all measures of dispersion. It is calculated as the difference between the maximum and minimum values in the data set.

$\text{Range} = X_{\text{Maximum}} - X_{\text{Minimum}}$

Example for Computing Range
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the range.
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9
$\text{Range} = X_{\text{Maximum}} - X_{\text{Minimum}}$ = 18 − 9 = 9

Limitation of Range
Caution: Range is a good measure of spread only when a data set shows a stable pattern of variation without extreme values. If either component of the range, the maximum or the minimum value, is an extreme value, then the range should not be used.

Interquartile Range
The range depends entirely on the maximum and minimum values in the data set and is highly misleading when one of them is an extreme value. To overcome this deficiency, you can resort to the interquartile range. It is computed as the range after eliminating the highest 25% and the lowest 25% of the observations in a data set arranged in ascending order. Thus this measure is not sensitive to extreme values.
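To see the two measures side by side, the sketch below computes the range and the trimmed range for the ten mutual fund returns used in the range example. How many observations count as "25%" when the sample size is not a multiple of four is not specified in these notes; rounding down with `n // 4` is an assumption of this sketch, and the function names are illustrative. Percentile-based definitions of the interquartile range used by statistical software may give slightly different answers.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Range of the data left after trimming the lowest and highest 25%."""
    ranked = sorted(values)
    k = len(ranked) // 4                    # assumption: 25% of n, rounded down
    middle = ranked[k:len(ranked) - k]      # roughly the middle 50% of the data
    return middle[-1] - middle[0]


# Percentage returns of the 10 mutual funds from the range example.
returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(value_range(returns))           # 9  (18 - 9)
print(interquartile_range(returns))   # 3  (14 - 11, after dropping two values at each end)
```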
Interquartile range = Range computed on the middle 50% of the observations

Interquartile Range: Example
The following data represent the percentage return on investment per annum for 9 mutual funds. Calculate the interquartile range.
Data Set: 12, 14, 11, 18, 10.5, 12, 14, 11, 9
Arranging in ascending order, the data set becomes
9, 10.5, 11, 11, 12, 12, 14, 14, 18
Ignore the first two (9, 10.5) and the last two (14, 18) observations in this data set. The remaining observations, 11, 11, 12, 12, and 14, make up roughly the middle 50% of the data. The range of this remaining set is the interquartile range.
Interquartile range = 14 − 11 = 3.

Mean Absolute Deviation (MAD)
Mean Absolute Deviation (MAD) is defined as the average of the deviations measured from the arithmetic mean, in which all deviations are treated as positive, ignoring the actual sign. Unlike the range, MAD is based on all observations. Hence it reflects the dispersion of every item in the distribution. In symbolic form, it is defined by the following formula.

$\text{MAD} = \frac{\sum |X - \bar{X}|}{n}$

where
$\sum |X - \bar{X}|$ = Sum of all deviations from the arithmetic mean after ignoring sign
$\bar{X}$ = Arithmetic Mean
n = Number of observations in the sample (sample size)
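The formula translates directly into code. The sketch below is a minimal illustration applied to the ten mutual fund returns from the range example; the function name `mean_absolute_deviation` is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Percentage returns of the 10 mutual funds from the range example.
returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(round(mean_absolute_deviation(returns), 3))   # 1.832
```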
Caution: Mean Absolute Deviation (MAD) has two weaknesses. 1) It cannot be combined for several groups. 2) Ignoring the sign has serious implications for a business manager attempting to measure the spread of the distribution in a scientific manner.

Example for MAD
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the MAD. (Note that this is the same data set used for computing the range.)
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

$\bar{X} = \frac{\sum X}{n}$ = (12+14+11+18+10.5+11.3+12+14+11+9)/10 = 12.28

$\sum |X - \bar{X}|$ = |12 − 12.28| + |14 − 12.28| + |11 − 12.28| + |18 − 12.28| + |10.5 − 12.28| + |11.3 − 12.28| + |12 − 12.28| + |14 − 12.28| + |11 − 12.28| + |9 − 12.28| = 18.32

$\text{MAD} = \frac{\sum |X - \bar{X}|}{n}$ = 18.32/10 = 1.832

Standard Deviation
Standard deviation forms the basis for the discussion of inferential statistics. It is a classic measure of dispersion and has many advantages over the other measures of variation. It is based on all observations. It can be treated algebraically, which implies that you can combine the standard deviations of many groups. It plays a vital role in testing hypotheses and forming confidence intervals.
To define standard deviation, you first need to define another term, variance. In simple terms, standard deviation is the square root of variance.

Important Terms with Notations
Sample Variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample Standard Deviation: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

where
$\bar{X} = \frac{\sum X}{n}$ (Sample Mean) and $\mu = \frac{\sum X}{N}$ (Population Mean)
n = Number of observations in the sample (Sample Size)
N = Number of observations in the population (Population Size)

Key Remarks
1. $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ is an unbiased estimator of $\sigma^2$.
2. $\bar{X} = \frac{\sum X}{n}$ is an unbiased estimator of $\mu$.
3. The divisor n − 1 is always used while calculating the sample variance, to ensure the property of being unbiased.
4. Standard deviation is always the square root of variance.

Example for Standard Deviation
The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the sample standard deviation.
12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Solution for the Example
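The figures for this example can also be reproduced with a short Python sketch that applies the sample formulas with the n − 1 divisor. The function names `sample_variance` and `sample_std` are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Percentage returns of the 10 mutual funds in the example.
returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(round(sum(returns) / len(returns), 2))   # 12.28
print(round(sample_variance(returns), 2))      # 6.33
print(round(sample_std(returns), 2))           # 2.52
```

Python's standard `statistics` module provides `variance` and `stdev`, which use the same n − 1 divisor and can serve as a cross-check.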
From the Microsoft Excel spreadsheet on the previous slide, it is easy to see that
Mean = $\bar{X} = \frac{\sum X}{n}$ = 12.28 (in column A, row 14, 12.28 is seen).
Sample Variance = $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ = 6.33 (in column D, row 14, 6.33 is seen).
Sample Standard Deviation = $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ = 2.52 (in column D, row 15, 2.52 is seen).

Standard Deviation for Grouped Data
The standard deviation for sample data, based on a frequency distribution, is given by

$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$

which is used to estimate the population standard deviation $\sigma$. Here $\bar{X} = \frac{\sum fX}{n}$, n is the sample size = $\sum f$, and X = midpoint of each class.

Standard Deviation for Grouped Data: Example
Frequency Distribution of Return on Investment of Mutual Funds

Return on Investment    Number of Mutual Funds
5-10                    10
10-15                   12
15-20                   16
20-25                   14
25-30                    8
Total                   60

Solution for the Example
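A minimal Python sketch of the grouped calculation is given here: it uses class midpoints for X, weights each squared deviation by the class frequency, and divides by n − 1. The (lower, upper) class representation and the name `grouped_mean_and_std` are illustrative assumptions.

```python
import math

# Return-on-investment classes and number of mutual funds from the example.
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
frequencies = [10, 12, 16, 14, 8]


def grouped_mean_and_std(classes, frequencies):
    """Return (mean, sample standard deviation) of a frequency distribution."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)                                   # sample size = sum of f
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    squared = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    return mean, math.sqrt(squared / (n - 1))


mean, std = grouped_mean_and_std(classes, frequencies)
print(round(mean, 3))   # 17.333
print(round(std, 2))    # 6.44
```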
From the Microsoft Excel spreadsheet on the previous slide, it is easy to see that
Mean = $\bar{X} = \frac{\sum fX}{n}$ = 1040/60 = 17.333 (cell F10),
Standard Deviation = $S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{2448.33}{59}}$ = 6.44 (cell H12).

Coefficient of Variation (Relative Dispersion)
The Coefficient of Variation (CV) is defined as the ratio of the standard deviation to the mean. In symbolic form,

$\text{CV} = \frac{S}{\bar{X}}$ for the sample data and $\text{CV} = \frac{\sigma}{\mu}$ for the population data.

CV is the measure to use when you want to compare the relative spread across groups or segments. It also measures the extent of spread in a distribution as a percentage of the mean. The larger the CV, the greater the percentage spread. As a manager, you would like to have a small CV so that your assessment of a situation is robust and the percentage risk is minimized.

Coefficient of Variation
Example
Consider two sales persons working in the same territory. Their sales performance in selling PCs is given below. Comment on the results.

                                 Sales Person 1    Sales Person 2
Mean Sales (one-year average)    50 units          75 units
Standard Deviation                5 units          25 units

Interpretation for the Example
The CV is 5/50 = 0.10 or 10% for Sales Person 1 and 25/75 = 0.33 or 33% for Sales Person 2. It seems Sales Person 1 performs better than Sales Person 2, with less relative dispersion or scattering. Sales Person 2 has a very high departure, or standard deviation, from his average sales achievement. The moral of the story is "don't get carried away by absolute numbers". Look at the scatter. Even though Sales Person 2 has achieved a higher average, his performance is not consistent and seems erratic. ...
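The comparison of the two sales persons can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is illustrative; it simply divides the standard deviation by the mean.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean (relative dispersion)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean


# Figures from the example: (standard deviation, mean sales) in units.
cv_person_1 = coefficient_of_variation(5, 50)
cv_person_2 = coefficient_of_variation(25, 75)

print(f"Sales Person 1: {cv_person_1:.0%}")   # Sales Person 1: 10%
print(f"Sales Person 2: {cv_person_2:.0%}")   # Sales Person 2: 33%
```

The smaller CV for Sales Person 1 reflects the more consistent performance, even though the average sales are lower.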
This note was uploaded on 02/24/2012 for the course BUSINESS 281 taught by Professor Gray during the Spring '12 term at Florida State College.