Bstat3 - Lessons in Business Statistics Prepared By P.K....


Lessons in Business Statistics
Prepared By P.K. Viswanathan

Chapter 3: Measures of Central Tendency and Dispersion

Introduction

Raw data are the raw materials that have to be converted into finished products (information). From a voluminous database containing raw data, it is impossible to see any pattern unless the data are converted into information by data reduction. The reduction can be achieved by summary measures, which are concise and yet give a reasonably accurate view of the original data. This chapter covers the important summary measures of central tendency and dispersion (variation).

1) What is Central Tendency?

Whenever you measure things of the same kind, a fairly large number of such measurements will tend to cluster around the middle value. The question that arises is: "Is it possible to define one typical representative average in such a manner that the remaining items in the data set will have a tendency to be closer to this value?" Such a value is called a measure of "Central Tendency". The other terms that are used synonymously are "Measures of Location" or "Statistical Averages".

2) Measures of Central Tendency

Quantitative specialists, statisticians, and information analysts rely heavily on summary measures when a large mass of data has to be analyzed to help decision-makers. As a manager, you need these summary measures of central tendency to draw meaningful conclusions in your functional area of operation. The most widely used measures of central tendency are the Arithmetic Mean, Median, and Mode.

Arithmetic Mean

The arithmetic mean (called the mean) is the most common measure of central tendency used by managers in their sphere of activities. It is defined as the sum of all observations in a data set divided by the total number of observations. For example, consider a data set containing the following observations: 4, 3, 6, 5, 3, 3. The arithmetic mean = (4+3+6+5+3+3)/6 = 24/6 = 4.
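The definition above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is an illustrative choice and does not come from the lesson.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The six observations from the example in the text.
observations = [4, 3, 6, 5, 3, 3]
print(arithmetic_mean(observations))  # 4.0
```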
In symbolic form, the mean is given by

\[ \bar{X} = \frac{\sum X}{n} \]

where
$\bar{X}$ = arithmetic mean
$\sum X$ = sum of all X values in the data set
$n$ = total number of observations (sample size)

Arithmetic Mean for Raw Data: Example

The inner diameter of a particular grade of tire, based on 5 sample measurements, is as follows (figures in millimeters):

565, 570, 572, 568, 585

Applying the formula $\bar{X} = \sum X / n$, we get mean = (565+570+572+568+585)/5 = 2860/5 = 572.

Caution: The arithmetic mean is affected by extreme values or fluctuations in sampling. It is not the best average to use when the data set contains extreme values (very high or very low values).

Median

The median is the middlemost observation when you arrange the data in ascending or descending order of magnitude. That is, the data are ranked and the middle value is picked. The median is such that 50% of the observations are above it and 50% of the observations are below it. The median is a very useful measure for ranked data in the context of consumer preferences and ratings. It is not affected by extreme values, but it is affected by the number of observations.

\[ \text{Median} = \left(\frac{n+1}{2}\right)\text{th value of the ranked data} \]

where $n$ = number of observations in the sample.

Note: If the sample size is an odd number, the median is the (n+1)/2 th value in the ranked data. If the sample size is even, the median lies between the two middle values, and you take the average of these two middle values.

Median for Raw Data: Example with Odd Sample Size

Marks obtained by 7 students in a Computer Science exam are given below. Compute the median.

45, 40, 60, 80, 90, 65, 55

Arranging the data after ranking gives

90, 80, 65, 60, 55, 45, 40

Median = (n+1)/2 th value in this set = (7+1)/2 th observation = 4th observation = 60. Hence Median = 60 for this problem.

Median for Raw Data: Example with Even Sample Size

The diameter of a shaft in millimeters in a manufacturing unit is given below for 10 samples. Calculate the median value.
2.50, 2.66, 2.45, 2.65, 2.55, 2.60, 2.46, 2.43, 2.56, 2.58

Arranging the data in ascending order, you will get

2.43, 2.45, 2.46, 2.50, 2.55, 2.56, 2.58, 2.60, 2.65, 2.66

The median falls between the 5th and 6th observations, that is, between 2.55 and 2.56. Hence median = (2.55+2.56)/2 = 2.555.

Mode

The mode is the value that occurs most often. It has the maximum frequency of occurrence. The mode is not affected by extreme values. It is a very useful measure when you want to keep in inventory the most popular shirt, in terms of collar size, during the festival season. The median and mean will not be helpful in this type of situation. Another example where the mode is the only answer is in determining the most typical shoe size to be kept in stock in a shop selling shoes.

Caution: In a few real-life problems there will be more than one mode (bimodal and multi-modal data). In these cases the mode cannot be uniquely determined.

Mode for Raw Data: Example

The lives, in hours, of 10 flashlight batteries are as follows. Find the mode.

340, 350, 340, 340, 320, 340, 330, 330, 340, 350

The value 340 occurs five times. Hence, mode = 340.
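The ranking rule for the median and the frequency count for the mode can be sketched in Python as follows. The function names `median` and `modes` are illustrative assumptions, and `modes` returns a list so that the bimodal and multi-modal cases mentioned in the caution are visible rather than hidden.

```python
from collections import Counter


def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                     # the (n+1)/2 th value, 1-based
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


marks = [45, 40, 60, 80, 90, 65, 55]
shaft = [2.50, 2.66, 2.45, 2.65, 2.55, 2.60, 2.46, 2.43, 2.56, 2.58]
batteries = [340, 350, 340, 340, 320, 340, 330, 330, 340, 350]

print(median(marks))            # 60
print(round(median(shaft), 3))  # 2.555
print(modes(batteries))         # [340]
```

Ranking in ascending or descending order gives the same median, so the sketch always sorts ascending.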
Mean for Grouped Data

The formula for the mean is given by

\[ \bar{X} = \frac{\sum fX}{n} \]

where
$\bar{X}$ = mean
$\sum fX$ = sum of the cross products of the frequency in each class with the midpoint X of each class
$n$ = total number of observations (total frequency) = $\sum f$

Mean for Grouped Data: Example

Find the arithmetic mean for the following continuous frequency distribution:

Class      Frequency
0-1        1
1-2        4
2-3        8
3-4        7
4-5        3
5-6        2

Solution for the Example

Class      Midpoint X   Frequency f   fX
0-1        0.5          1             0.5
1-2        1.5          4             6.0
2-3        2.5          8             20.0
3-4        3.5          7             24.5
4-5        4.5          3             13.5
5-6        5.5          2             11.0
Totals                  25            75.5

Applying the formula, $\bar{X} = \sum fX / n$ = 75.5/25 = 3.02.

Median for Grouped Data

The formula for the median is given by

\[ \text{Median} = L + \frac{(n/2) - m}{f} \times c \]

where
$L$ = lower limit of the median class
$n$ = total number of observations = $\sum f$
$m$ = cumulative frequency preceding the median class
$f$ = frequency of the median class
$c$ = class interval of the median class

Median for Grouped Data: Example

Find the median for the following continuous frequency distribution:

Class      Frequency
0-1        1
1-2        4
2-3        8
3-4        7
4-5        3
5-6        2

Solution for the Example

Class      Frequency    Cumulative Frequency
0-1        1            1
1-2        4            5
2-3        8            13
3-4        7            20
4-5        3            23
5-6        2            25
Total      25

Since n/2 = 12.5, the median class is 2-3, the first class whose cumulative frequency (13) reaches 12.5. Hence L = 2, m = 5, f = 8, and c = 1. Substituting the relevant values in the formula, we have

\[ \text{Median} = 2 + \frac{(25/2) - 5}{8} \times 1 = 2.9375 \]

Mode for Grouped Data

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c \]

where
$L$ = lower limit of the modal class
$d_1 = f_1 - f_0$
$d_2 = f_1 - f_2$
$f_1$ = frequency of the modal class
$f_0$ = frequency preceding the modal class
$f_2$ = frequency succeeding the modal class
$c$ = class interval of the modal class

Mode for Grouped Data: Example

Find the mode for the following continuous frequency distribution:

Class      Frequency
0-1        1
1-2        4
2-3        8
3-4        7
4-5        3
5-6        2
Total      25

Solution for the Example

The modal class is 2-3, which has the largest frequency (8). Hence L = 2, $d_1 = f_1 - f_0$ = 8 - 4 = 4, $d_2 = f_1 - f_2$ = 8 - 7 = 1, and c = 1.

\[ \text{Mode} = 2 + \frac{4}{4 + 1} \times 1 = 2.8 \]

Comparison of Mean, Median, Mode

Mean: Defined as the arithmetic average of all observations in the data set.
Median: Defined as the middle value in the data set arranged in ascending or descending order.
Mode: Defined as the most frequently occurring value in the distribution; it has the largest frequency.

Mean: Requires measurement on all observations.
Median: Does not require measurement on all observations.
Mode: Does not require measurement on all observations.

Mean: Uniquely and comprehensively defined.
Median: Cannot be determined under all conditions.
Mode: Not uniquely defined for multi-modal situations.

Mean: Affected by extreme values.
Median: Not affected by extreme values.
Mode: Not affected by extreme values.

Mean: Can be treated algebraically. That is, means of several groups can be combined.
Median: Cannot be treated algebraically. That is, medians of several groups cannot be combined.
Mode: Cannot be treated algebraically. That is, modes of several groups cannot be combined.

3) Measures of Dispersion

In simple terms, measures of dispersion indicate how large the spread of the distribution is around the central tendency. They answer the question: "What is the magnitude of departure from the average value for different groups having identical averages?" It is important to study central tendency along with dispersion in order to throw light on the shape of the curve, that is, to gauge whether there is distortion from the bell-shaped, symmetrical normal distribution curve that forms the foundation on which statistical inference is built.

Range

Range is the simplest of all measures of dispersion. It is calculated as the difference between the maximum and minimum values in the data set.

\[ \text{Range} = X_{\text{Maximum}} - X_{\text{Minimum}} \]

Example for Computing Range

The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate the range.

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$ = 18 - 9 = 9

Limitation of Range

Caution: Range is a good measure of spread in the distribution only when a data set shows a stable pattern of variation without extreme values.
If one of the components of range, namely the maximum value or the minimum value, becomes an extreme value, then range should not be used.

Interquartile Range

Range is entirely dependent on the maximum and minimum values in the data set and is highly misleading when one of them is an extreme value. To overcome this deficiency, you can resort to the interquartile range. It is computed as the range after eliminating the highest and lowest 25% of observations in a data set that is arranged in ascending order. Thus this measure is not sensitive to extreme values.

Interquartile range = Range computed on the middle 50% of the observations

Interquartile Range: Example

The following data represent the percentage return on investment per annum for 9 mutual funds. Calculate the interquartile range.

Data set: 12, 14, 11, 18, 10.5, 12, 14, 11, 9

Arranging in ascending order, the data set becomes

9, 10.5, 11, 11, 12, 12, 14, 14, 18

Ignore the first two (9, 10.5) and the last two (14, 18) observations in this data set. The remaining observations make up roughly the middle 50% of the data. They are 11, 11, 12, 12, and 14. The range of these remaining values is the interquartile range.

Interquartile range = 14 - 11 = 3.

Mean Absolute Deviation (MAD)

Mean Absolute Deviation (MAD) is defined as the average of the deviations measured from the arithmetic mean, in which all deviations are treated as positive, ignoring the actual sign. Unlike range, MAD is based on all observations. Hence it reflects the dispersion of every item in the distribution. In symbolic form, it is defined by the following formula.

\[ \text{MAD} = \frac{\sum |X - \bar{X}|}{n} \]

where
$\sum |X - \bar{X}|$ = sum of all deviations from the arithmetic mean after ignoring sign
$\bar{X}$ = arithmetic mean
$n$ = number of observations in the sample (sample size)

Caution: Mean Absolute Deviation (MAD) has two weaknesses.
1) It cannot be combined for several groups.
2) Ignoring the sign has serious implications for a business manager attempting to measure the spread of the distribution in a scientific manner.
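The three measures described so far (range, interquartile range, and MAD) can be sketched in Python as follows. The function names are illustrative. The trimming rule `len(ranked) // 4`, which rounds 25% of n down, is an assumption chosen because it matches the 9-fund example (two observations dropped at each end); quartile-based definitions of the interquartile range used in statistical software can give a different figure.

```python
def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)


def trimmed_interquartile_range(values):
    """Range of the data after dropping the lowest and highest 25% of observations.

    Assumption: 25% of n is rounded down, as in the 9-fund example.
    """
    ranked = sorted(values)
    k = len(ranked) // 4
    middle = ranked[k:len(ranked) - k]
    return middle[-1] - middle[0]


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


ten_funds = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]   # range example
nine_funds = [12, 14, 11, 18, 10.5, 12, 14, 11, 9]        # interquartile range example

print(value_range(ten_funds))                         # 9
print(trimmed_interquartile_range(nine_funds))        # 3
print(round(mean_absolute_deviation(ten_funds), 3))   # 1.832
```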
Example for MAD

The following data represent the percentage return on investment per annum for 10 mutual funds. Calculate MAD. (Please note that this is the same example used for computing range.)

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

$\bar{X} = \sum X / n$ = (12+14+11+18+10.5+11.3+12+14+11+9)/10 = 12.28

$\sum |X - \bar{X}|$ = |12 - 12.28| + |14 - 12.28| + |11 - 12.28| + |18 - 12.28| + |10.5 - 12.28| + |11.3 - 12.28| + |12 - 12.28| + |14 - 12.28| + |11 - 12.28| + |9 - 12.28| = 18.32

MAD = $\sum |X - \bar{X}| / n$ = 18.32/10 = 1.832

Standard Deviation

Standard deviation forms the basis for the discussion on inferential statistics. It is a classic measure of dispersion and has many advantages over the other measures of variation. It is based on all observations. It is capable of being treated algebraically, which implies that you can combine the standard deviations of many groups. It plays a vital role in testing hypotheses and forming confidence intervals. To define standard deviation, you need to define another term called variance. In simple terms, standard deviation is the square root of variance.

Important Terms with Notations

Sample variance:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

Sample standard deviation:
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

Population variance:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

where
$\bar{X} = \sum X / n$ (sample mean) and $\mu = \sum X / N$ (population mean)
$n$ = number of observations in the sample (sample size)
$N$ = number of observations in the population (population size)

Key Remarks

1. $S^2 = \sum (X - \bar{X})^2 / (n - 1)$ is an unbiased estimator of $\sigma^2 = \sum (X - \mu)^2 / N$.
2. $\bar{X} = \sum X / n$ is an unbiased estimator of $\mu = \sum X / N$.
3. The divisor n - 1 is always used while calculating the sample variance, to ensure the property of being unbiased.
4. Standard deviation is always the square root of variance.

Example for Standard Deviation

The following data represent the percentage return on investment per annum for 10 mutual funds.
Calculate the sample standard deviation.

12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Solution for the Example

From the Microsoft Excel spreadsheet in the previous slide, it is easy to see that:

Mean = $\bar{X} = \sum X / n$ = 12.28 (in column A, row 14, 12.28 is seen).

Sample variance = $S^2 = \sum (X - \bar{X})^2 / (n - 1)$ = 6.33 (in column D, row 14, 6.33 is seen).

Sample standard deviation = $S = \sqrt{\sum (X - \bar{X})^2 / (n - 1)}$ = 2.52 (in column D, row 15, 2.52 is seen).

Standard Deviation for Grouped Data

The standard deviation for sample data, based on a frequency distribution, is given by

\[ S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} \]

which is used to estimate the population standard deviation $\sigma$. Here $\bar{X} = \sum fX / n$, $n$ is the sample size ($n = \sum f$), and $X$ is the midpoint of each class.

Standard Deviation for Grouped Data: Example

Frequency Distribution of Return on Investment of Mutual Funds

Return on Investment    Number of Mutual Funds
5-10                    10
10-15                   12
15-20                   16
20-25                   14
25-30                   8
Total                   60

Solution for the Example

From the Microsoft Excel spreadsheet in the previous slide, it is easy to see that:

Mean = $\bar{X} = \sum fX / n$ = 1040/60 = 17.333 (cell F10).

Standard deviation = $S = \sqrt{\sum f (X - \bar{X})^2 / (n - 1)} = \sqrt{2448.33 / 59}$ = 6.44 (cell H12).

Coefficient of Variation (Relative Dispersion)

The Coefficient of Variation (CV) is defined as the ratio of the standard deviation to the mean. In symbolic form,

\[ CV = \frac{S}{\bar{X}} \text{ for the sample data, and } CV = \frac{\sigma}{\mu} \text{ for the population data.} \]

CV is the measure to use when you want to see the relative spread across groups or segments. It also measures the extent of spread in a distribution as a percentage of the mean. The larger the CV, the greater the percentage spread. As a manager, you would like to have a small CV so that your assessment of a situation is robust and the percentage risk is minimized.

Coefficient of Variation: Example

Consider two salespersons working in the same territory. Their sales performance in the context of selling PCs is given below. Comment on the results.
                                  Sales Person 1    Sales Person 2
Mean Sales (one-year average)     50 units          75 units
Standard Deviation                5 units           25 units

Interpretation for the Example

The CV is 5/50 = 0.10 or 10% for Sales Person 1, and 25/75 = 0.33 or 33% for Sales Person 2. It seems Sales Person 1 performs better than Sales Person 2, with less relative dispersion or scattering. Sales Person 2 has a very high departure, or standard deviation, from his average sales achievement. The moral of the story is "don't get carried away by the absolute number". Look at the scatter. Even though Sales Person 2 has achieved a higher average, his performance is not consistent and seems erratic. ...
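The standard deviation and coefficient of variation figures in this chapter can be reproduced with a short Python sketch. The function names and the `(lower, upper, frequency)` layout for the classes are illustrative assumptions, not part of the lesson; the formulas are the sample formulas with divisor n - 1 given in the text.

```python
import math


def sample_std(values):
    """Sample standard deviation: square root of sum((X - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


def grouped_mean_and_std(classes):
    """Mean and sample standard deviation from (lower, upper, frequency) classes.

    Each class is represented by its midpoint, as in the grouped-data formulas.
    """
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    squares = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return mean, math.sqrt(squares / (n - 1))


def coefficient_of_variation(std, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]
print(round(sample_std(returns), 2))                 # 2.52

funds = [(5, 10, 10), (10, 15, 12), (15, 20, 16), (20, 25, 14), (25, 30, 8)]
mean, std = grouped_mean_and_std(funds)
print(round(mean, 3), round(std, 2))                 # 17.333 6.44

print(coefficient_of_variation(5, 50))               # 0.1  (Sales Person 1)
print(round(coefficient_of_variation(25, 75), 2))    # 0.33 (Sales Person 2)
```

Comparing the two printed CV values gives the same reading as the interpretation above: the salesperson with the higher average also has the larger relative spread.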

This note was uploaded on 02/24/2012 for the course BUSINESS 281 taught by Professor Gray during the Spring '12 term at Florida State College.
