Measures of Variation - Chapter 4


Chapter 4: Measures of Variation (09/13/15)

Contents
  Introduction
  Variability and Decision Making
  Absolute and Relative Variation
  Measures of Variation
  Mean Deviation
  Standard Deviation
  MS Excel and Measures of Variation
  Relationship Between Arithmetic Mean and Standard Deviation
  Relationship Between Different Measures of Variation
  Lorenz Curve

Variability and Decision Making
Variability is a universal phenomenon. People vary in height, weight, intelligence and so on. Individuals in an economy differ in income and assets, and businesses differ in their sales. Even the quantity of detergent powder in different one-kg pouches is unlikely to be exactly the same.

Importance of variation
The extent of variation may be the main criterion in deciding whether or not to buy a given security. A key input in determining call option and put option values is the variability in the price of the share involved in the option. Given that business in options worth millions of rupees is done annually, variability clearly plays a key role.

Variability also plays an important role in quality control in industrial production: it is the key to deciding whether or not a production process is under control.

Absolute and Relative Variation
Variability, or dispersion, may be measured in absolute or relative terms. Absolute variation refers to the amount of variability in a given set of data and is expressed in the same units as the original data. For example, the absolute variation in the salaries of workers is expressed in rupees, and that in their ages in years.

Absolute variation does not allow a comparison of the variability of two or more sets of data, especially when they involve different variables and, therefore, different units of measurement. A comparison can be made only when the units of the data are eliminated. This is done by computing measures of relative variation, called co-efficients of dispersion, which are pure numbers expressed as proportions or percentages.

For example, if we wish to compare the variability in the wages of skilled workers with that in the wages of unskilled workers, a measure of absolute variation can give misleading results. Although the wages of both groups are expressed in the same units, namely rupees, the groups are evidently dissimilar.

Therefore, whenever two or more groups are to be compared for variation, a measure of relative variation should be used.

Measures of Variation
Several measures are available for studying variation; the choice of an appropriate measure depends upon the type of distribution and the precision required. The different measures are:

A. Measures involving limits
   a) Range
   b) Partial ranges, comprising
      i. Inter-quartile range
      ii. Inter-percentile ranges
   c) Quartile deviation
B. Measures involving deviations
   a) Mean deviation
   b) Standard deviation

Besides, variation is studied graphically for some special applications.

Range
The range is defined as the difference between the largest and the smallest values in the data set. It is the most commonly understood and used measure of variation. Symbolically,

$R = L - S$

where R is the range, L is the largest value and S is the smallest value in the data set.

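The definition of the range for an individual series can be sketched in a few lines of Python. This is only an illustration: the function name `value_range` and the pouch weights are hypothetical and do not come from the chapter.

```python
def value_range(values):
    """Return R = L - S: the largest value minus the smallest value."""
    values = list(values)
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    largest = max(values)   # L in the formula
    smallest = min(values)  # S in the formula
    return largest - smallest


# Hypothetical weights, in grams, of five one-kg detergent pouches.
pouch_weights = [998, 1003, 1001, 995, 1004]
print(value_range(pouch_weights))  # 1004 - 995 = 9 (grams)
```

The result carries the units of the data (grams here), which is what makes the range a measure of absolute variation.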
In the case of a discrete frequency distribution, the range is the difference between the largest and the smallest X values. In the case of a grouped frequency distribution, it is the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.

Important points
i. In a grouped frequency distribution, the range is only an approximation, since the actual largest and smallest values are not available.
ii. If a given frequency distribution is open-ended, the range cannot be determined.

Co-efficient of Range
The range is a measure of absolute variation. Its value depends on the units of measurement of the data, and it is expressed in the same units as the original data. To compare two or more distributions we calculate the co-efficient of range (CR), a measure of relative variation. Symbolically,

$\mathrm{CR} = \dfrac{L - S}{L + S}$

Limitation and Uses of Range
Although the range is easy to calculate, it is difficult to interpret properly without regard to the number of items in the data: in general, the larger the sample size, the greater the range. The range shows very large sampling fluctuations, and it completely ignores the variability in the items between the two extremes.

The range is a poor measure of dispersion when the number of items in the sample is large. In small samples, however, it is useful in certain cases. It is of considerable importance in industrial quality control, where a large number of small samples are studied to determine whether or not the quality of the products of a production process is under control.

There are many other situations in which the range is used.
For example, stock exchange quotations and commodity market prices in the financial columns of newspapers are given in terms of the high and the low. Agricultural productivity in an area, rainfall, employment and so on are also usually quoted in terms of the range.

Partial Ranges
The range may not be a suitable indicator of variability in a set of data, particularly when some extreme values are present. In such cases, partial ranges are advisable for gauging the extent of variability. Partial ranges do not consider the complete data; they ignore part of the data at each end. The commonly used partial ranges are the inter-quartile range and the inter-percentile range.

Inter-quartile Range
The distance between the lower and the upper quartiles of a distribution is termed the inter-quartile range, IQR. It gives the range of the middle 50 percent of the values, ignoring 25 percent of the values at the lower end and an equal number of values at the upper end of the data. Symbolically,

$\mathrm{IQR} = Q_3 - Q_1$

Inter-percentile Range
While the inter-quartile range covers the middle 50 percent of the data, the inter-percentile range allows a greater part of the data to be considered in assessing variability. Although the range of values between any pair of percentiles equidistant from the middle of the distribution can be obtained, the most commonly used one is based on the 10th and the 90th percentiles, leaving out 10 percent of the observations at each end of the distribution.

The inter-percentile range (IPR) is defined as

$\mathrm{IPR} = P_{90} - P_{10}$

where $P_{90}$ is the 90th percentile and $P_{10}$ is the 10th percentile of the distribution. It is easy to see that the inter-quartile range is a special case of the inter-percentile range: the IQR is in fact the inter-percentile (25–75) range.

Quartile Deviation
The quartile deviation, QD, is numerically equal to one-half of the inter-quartile range.
It is also called the semi inter-quartile range. Symbolically,

$\mathrm{QD} = \dfrac{Q_3 - Q_1}{2}$

where $Q_3$ and $Q_1$ are, respectively, the third and the first quartiles of a distribution.

Co-efficient of Quartile Deviation
The quartile deviation is an absolute measure of variation. For comparison of distributions, the co-efficient of quartile deviation, CQD, is calculated. The CQD is obtained by dividing the quartile deviation by one-half of the sum of the upper and lower quartiles:

$\mathrm{CQD} = \dfrac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

The co-efficient of quartile deviation is also called the quartile co-efficient of dispersion. The quartile deviation, along with its co-efficient, is the most suitable measure for studying and comparing variability when open-ended distributions are given.

Mean Deviation
The range, inter-percentile range and quartile deviation consider only part of the given data. Mean deviation and standard deviation, on the other hand, are based on all observations of the data. Mean deviation, MD, is the average of the absolute deviations of the observations from their mean (or median or mode).

It is represented by δ (small delta) with a subscript, $\bar{X}$, Me or Mo, depending on whether the deviations are taken from the mean, median or mode. Since the calculation of mean deviation involves absolute deviations, it is also called the mean absolute deviation (MAD). The calculation of MD differs between an individual series and a frequency distribution.

Calculation of Mean Deviation
In the case of an individual series, the following steps are involved:
i. Obtain the average.
ii. Take the deviation of each observation from the average and consider the absolute values of these deviations.
iii. Obtain the total of the absolute deviations and divide it by n, the number of observations.

In the case of a frequency distribution:
i. Calculate the average from which deviations are to be taken.
ii. Measure the absolute deviations of the X values (mid-points of the class intervals in the case of grouped frequency distributions) from the average. Thus, if the median is the average used, compute $|X - \mathrm{Me}|$.
iii. Multiply each of the absolute deviations by the corresponding frequency and obtain the sum of these products. It is $\sum f\,|X - \mathrm{Me}|$ when the median is used.
iv. Divide the sum obtained in step (iii) by $\sum f$. Symbolically,

$\delta_{\mathrm{Me}} = \dfrac{\sum f\,|X - \mathrm{Me}|}{\sum f}$

It may be noted that in the case of open-ended distributions, the mean deviation cannot be calculated because the mid-points of the open-ended classes are not defined.

Co-efficient of Mean Deviation (CMD)
The co-efficient of mean deviation is a measure of relative variation in a given set of data. It is obtained by dividing the mean deviation by the value of the average used to measure the deviations. Symbolically,

$\mathrm{CMD} = \dfrac{\text{mean deviation}}{\text{average from which the deviations are measured}}$

for example $\delta_{\mathrm{Me}} / \mathrm{Me}$ when the deviations are taken from the median.

Standard Deviation
The measures of variation discussed so far each suffer from some limitation or other. The standard deviation does not share these limitations and also enjoys some well-defined mathematical properties. Moreover, it is extensively used in statistical analysis. The standard deviation is defined as the positive square root of the average of the squared deviations taken from the arithmetic mean. It is also called the root-mean-square deviation.

Calculation of S.D.
Individual series:
   Deviations from arithmetic mean
   Deviations from assumed mean
   Calculation without measuring deviations
Frequency distribution:
   When deviations are taken from arithmetic mean
   When deviations are taken from assumed mean
   When step deviations are taken
   When deviations are not taken

Calculation of SD in Individual Series
Calculation using deviations from the arithmetic mean involves the following steps:
i. Calculate the arithmetic mean, $\bar{X}$.
ii. Measure the deviation of each of the given X values from $\bar{X}$, and square them to get $(X - \bar{X})^2$.
iii. Find the average of the squared deviations, $\sum (X - \bar{X})^2 / n$.
iv. Finally, extract the positive square root of the average obtained in (iii) to get the standard deviation, σ. Symbolically,

$\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$

When deviations are taken from an assumed mean, A, the following formula is employed to obtain the standard deviation:

$\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$

where $d = X - A$, $\sum d^2$ is the sum of the squares of the deviations from A, $\sum d$ is the sum of the deviations from A, and n is the number of observations.

The standard deviation can also be calculated without measuring deviations, using the following formula:

$\sigma = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2}$

Calculation of SD in Frequency Distributions
The following steps are taken to calculate the standard deviation when deviations are taken from the arithmetic mean:
i. Calculate the mean, $\bar{X}$.
ii. Measure the deviations of the X values (mid-points in the case of grouped frequency distributions) from $\bar{X}$; square them, multiply by the corresponding frequencies and add the products to get $\sum f (X - \bar{X})^2$.
iii. Divide by $\sum f$. This gives the variance.
iv. Extract the positive square root of the variance to get the standard deviation.

The formula can be shown as

$\sigma = \sqrt{\dfrac{\sum f (X - \bar{X})^2}{\sum f}}$

The following steps are involved in calculating the standard deviation when deviations are taken from an assumed mean:
i. Choose an assumed mean, A.
ii. Measure the deviations of the X values from A and label them d.
iii. Multiply d and $d^2$ by the corresponding frequencies and add the products to get $\sum fd$ and $\sum fd^2$.
iv. Apply the formula:

$\sigma = \sqrt{\dfrac{\sum f d^2}{\sum f} - \left(\dfrac{\sum f d}{\sum f}\right)^2}$

When step deviations are taken, the formula employed is

$\sigma = \sqrt{\dfrac{\sum f d'^2}{\sum f} - \left(\dfrac{\sum f d'}{\sum f}\right)^2} \times c$

where $d' = (X - A)/c$ is the step deviation and c is the common factor (usually the class width) by which the deviations are divided.

The formula to calculate the SD when deviations are not taken is

$\sigma = \sqrt{\dfrac{\sum f X^2}{\sum f} - \left(\dfrac{\sum f X}{\sum f}\right)^2}$

where $\sum f X^2$ is the sum of the products of the squares of the X values and the corresponding frequencies.
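The different routes to the standard deviation described above should all give the same value for the same data. The following Python sketch checks this numerically under stated assumptions: the function names, the sample data and the choice of assumed mean A = 4 are hypothetical, and every formula uses the population form (division by n or by Σf) given in this chapter.

```python
import math


def sd_from_mean(xs):
    """SD of an individual series using deviations from the arithmetic mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def sd_from_assumed_mean(xs, a):
    """SD of an individual series using deviations d = X - A from an assumed mean A."""
    n = len(xs)
    d = [x - a for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


def sd_without_deviations(xs):
    """SD of an individual series computed from the sum of X and the sum of X squared."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


def sd_frequency(xs, fs):
    """SD of a frequency distribution with values (or mid-points) xs and frequencies fs."""
    total_f = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total_f
    variance = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / total_f
    return math.sqrt(variance)


# Hypothetical individual series and the same data written as a frequency distribution.
series = [2, 4, 4, 4, 5, 5, 7, 9]
values, freqs = [2, 4, 5, 7, 9], [1, 3, 2, 1, 1]

print(sd_from_mean(series))             # 2.0
print(sd_from_assumed_mean(series, 4))  # 2.0
print(sd_without_deviations(series))    # 2.0
print(sd_frequency(values, freqs))      # 2.0
```

The assumed-mean and no-deviation forms are algebraic rearrangements of the first formula; they were mainly a convenience for hand calculation and can lose precision in floating-point arithmetic when the values are large relative to their spread.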
Properties of Standard Deviation
The standard deviation possesses some well-defined mathematical properties which make it a useful statistical tool. The properties are discussed below.

i. If a constant K is added to, or deducted from, each value in a given series, the standard deviation remains unaffected. Symbolically,

$\sigma_{X \pm K} = \sigma_X$

ii. If every value in a series is multiplied (divided) by a constant, the standard deviation is multiplied (divided) by the absolute value of the constant. Symbolically,

$\sigma_{KX} = |K|\,\sigma_X$ and $\sigma_{X/K} = \sigma_X / |K|$

iii. The combined SD of two or more sets of data can be obtained from their individual means and standard deviation values. For two distributions with $n_1$ and $n_2$ observations, mean values $\bar{X}_1$ and $\bar{X}_2$, and standard deviations $\sigma_1$ and $\sigma_2$ respectively, we have the following:

$\sigma = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

where σ is the combined standard deviation, $d_1 = \bar{X}_1 - \bar{X}$, $d_2 = \bar{X}_2 - \bar{X}$, and $\bar{X}$ is the combined mean obtained as

$\bar{X} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

The rule can be extended to any number of data sets.

For cases involving two distributions, the following alternative formula can be employed:

$\sigma = \sqrt{\dfrac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2} + \dfrac{n_1 n_2 (\bar{X}_1 - \bar{X}_2)^2}{(n_1 + n_2)^2}}$

Variance
The variance of a set of data is the average of the squared deviations measured from the arithmetic mean. It is, therefore, equal to the square of the standard deviation, $\sigma^2$. We have,

$\sigma^2 = \dfrac{\sum (X - \bar{X})^2}{n}$

Properties of Variance
i. If each value in a given series is increased (decreased) by a constant, K, the variance remains unaffected.
ii. If each value in a given series is multiplied (divided) by a constant, K, the variance of the new series is equal to the variance of the given series multiplied (divided) by $K^2$. Symbolically,

$\sigma^2_{KX} = K^2 \sigma^2_X$ and $\sigma^2_{X/K} = \sigma^2_X / K^2$

iii. The combined variance of two or more series can be obtained given their sizes, means and variances.
For two series,

$\sigma^2 = \dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$

In this, $d_1 = \bar{X}_1 - \bar{X}$ and $d_2 = \bar{X}_2 - \bar{X}$, where $\bar{X}$ is the combined mean.

Co-efficient of Variation
The standard deviation (and the variance) cannot be used to compare variability between two groups of data of different types, because both are measures of absolute variation. For example, variation in wages and variation in production cannot be compared in terms of their standard deviations, since they involve different units. Even if the units are the same, it is not advisable to use a measure of absolute variation. A valid comparison calls for the co-efficient of variation, which is a measure of relative variation.

The co-efficient of variation expresses the standard deviation as a percentage of the arithmetic mean. Thus,

$\mathrm{CV} = \dfrac{\sigma}{\bar{X}} \times 100$

MS Excel and Measures of Variation
MS Excel provides several functions related to variation. For the measures of variation discussed in this chapter, the following functions may be used when a set of individual values is given:

I. AVEDEV: returns the mean of the absolute deviations of the given values from their arithmetic mean. It thus yields the mean deviation about the mean for the given data.
II. STDEV: gives the standard deviation of the given values under the assumption that they are sample data.
III. STDEVP: also gives the standard deviation of the given values, but under the assumption that they are population data.
IV. VAR: yields the variance of the given values, assuming they are data from a sample.
V. VARP: like STDEVP, treats the values as population data and gives the variance of the given set of data.
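For readers working outside Excel, the same quantities can be obtained with Python's standard `statistics` module. The sketch below is illustrative only. The pairing of each Python call with an Excel function rests on the sample versus population distinction described above (divisor n − 1 versus n), and the helper names `avedev` and `coefficient_of_variation`, like the data, are hypothetical.

```python
import statistics


def avedev(values):
    """Mean of the absolute deviations from the arithmetic mean (compare AVEDEV)."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the arithmetic mean."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical individual values

print(avedev(data))                    # mean deviation about the mean: 1.5
print(statistics.stdev(data))          # sample SD, divisor n - 1 (compare STDEV)
print(statistics.pstdev(data))         # population SD, divisor n (compare STDEVP): 2.0
print(statistics.variance(data))       # sample variance (compare VAR)
print(statistics.pvariance(data))      # population variance (compare VARP)
print(coefficient_of_variation(data))  # 40.0
```

The sample versions are always somewhat larger than the population versions for the same data, so it matters which pair is used when reproducing the formulas of this chapter, all of which divide by n.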
Relationship between Arithmetic Mean and Standard Deviation
The arithmetic mean is a measure of central tendency, while the standard deviation is a measure of variation, or the spread of the values in a set of data. Two general rules establish a relationship between these measures in a given set of data. One is called Chebyshev's theorem, or Chebyshev's inequality; the other is known as the empirical rule.

Chebyshev's Theorem
This is a mathematical theorem which states that in any distribution, at least $1 - 1/k^2$ of the observations lie within k standard deviations of the mean, for any k greater than 1.

Empirical Rule
In addition to Chebyshev's theorem, there is another statement about the relationship between mean and standard deviation, called the empirical rule. It is based on the assumption that the underlying population has a bell-shaped distribution which tapers off smoothly at both ends. Such a distribution is called a normal distribution and looks like the one shown in the following figure.

[Figure: normal distribution curve]

For a normal distribution, 68.27% of the values lie within one standard deviation below the mean and one standard deviation above it. Similarly, µ ± 2σ covers 95.45% of the values, while µ ± 3σ covers 99.73% of the values of the distribution. Although this relationship holds exactly for normal distributions only, it can be used as an approximation even for distributions that are not strictly normal.

Consequently, for most frequency distributions which are symmetrical or nearly symmetrical and are large in size, the following rules may be used:

Nearly 68 percent of the observations lie within one standard deviation of the mean ($\bar{X}$) on either side of it. That is to say, $\bar{X} \pm \sigma$ covers nearly 68 percent of the data values.
Nearly 95 percent of the observations lie within two standard deviations of the mean on either side of it. Thus, $\bar{X} \pm 2\sigma$ covers about 95 percent of the data values.

Nearly all values in the data lie within three standard deviations of the mean. Accordingly, $\bar{X} \pm 3\sigma$ covers almost all observations in the distribution.

Z-scores
The relationship between mean and standard deviation also allows one to determine the relative position of an X value. This is done by means of the z-score. The z-score, also called the z-transformation, of X is defined as

$z = \dfrac{X - \bar{X}}{\sigma}$

Evidently, a z-score gives the relative position of any value in terms of the number of standard deviations above or below the mean. A positive value of Z-score indicates that the...
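As a rough illustration of the z-score definition and of Chebyshev's theorem, the following Python sketch converts a small data set to z-scores and compares the observed share of values within k standard deviations with the Chebyshev lower bound. The function names and the data are hypothetical, and the population standard deviation (divisor n) is assumed, in line with the formulas of this chapter.

```python
import statistics


def z_scores(values):
    """Return (X - mean) / sigma for every value; assumes sigma is not zero."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mean) / sigma for x in values]


def share_within(values, k):
    """Proportion of observations lying within k standard deviations of the mean."""
    zs = z_scores(values)
    return sum(1 for z in zs if abs(z) <= k) / len(zs)


def chebyshev_bound(k):
    """Minimum proportion within k standard deviations, for k > 1."""
    return 1 - 1 / k ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5 and sigma 2

print(z_scores(data))         # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(share_within(data, 2))  # 1.0
print(chebyshev_bound(2))     # 0.75
```

Here every observation lies within two standard deviations of the mean, which is consistent with the bound of at least 75 percent. The bound is a guarantee for any distribution, so observed shares are often well above it.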