03 STAT freq dist-LONG SLIDES



FREQUENCY DISTRIBUTIONS
Dr. V.R. Bencivenga
Economics 329, Economic Statistics

FREQUENCY DISTRIBUTIONS AND HISTOGRAMS

Outline:
- Frequency distributions and frequency histograms
- Grouped mean and grouped variance
- Computing percentiles from grouped data
- Ogive
- Shapes of distributions

Objectives:
- Summarizing and describing distributions
- Computing descriptive statistics from grouped data
- Graphical representations of distributions
- Language for describing shapes

FREQUENCY DISTRIBUTIONS AND FREQUENCY HISTOGRAMS

Frequency distributions

A frequency distribution summarizes a large number of observations on a continuous (or approximately continuous) variable. The basic idea is to define intervals of values (“classes” or “class intervals”), and then tabulate the number or fraction of observations falling in each interval.

Example: Voter turnout

Voter turnout      Number of states   Proportion of states
35 to under 40            1                 .020
40 to under 45            7                 .137
45 to under 50            6                 .118
50 to under 55           12                 .235
55 to under 60           14                 .275
60 to under 65            6                 .118
65 to under 70            4                 .078
70 to under 75            1                 .020
Total                    51                1.00

Choosing class intervals:
- Identify a convenient interval covering all observations.
- Divide this interval into class intervals.
- Define class intervals so that each observation falls in exactly one class.

How many classes?
- “Square root” guideline: the number of classes approximately equals $\sqrt{n}$.
- Another popular guideline: 15-20 classes for large data sets.

Example: Voter turnout
- Start with a multiple of 5 just below the minimum observation, and end with a multiple of 5 just above the maximum observation.
- Use 35 to under 40, 40 to under 45, 45 to under 50, etc., so it is unambiguous into which class an observation on a boundary will fall.
- $\sqrt{51} \approx 7$ class intervals. Convenience suggests 8.

Class intervals may be of equal or unequal length.
Equal class intervals are easier to work with, and are preferred. In some applications, unequal class intervals are a better choice.

Example: Per capita income (2005 per capita income, 208 countries, World Bank)

Two alternative methods of converting data into a common currency:
- Atlas method: exchange rates
- Purchasing power parity method (PPP): price indexes including traded and nontraded goods

GNP per capita, 2005, Atlas method and Purchasing Power Parity (PPP) method

Atlas methodology (US dollars)
Rank  Country            US dollars
  1   Luxembourg         65,630
  2   Norway             59,590
  3   Switzerland        54,930
  4   Bermuda            .. a
  5   Denmark            47,390
  6   Iceland            46,320
  7   United States      43,740
  8   Liechtenstein      .. a
  9   Sweden             41,060
 10   Ireland            40,150
 11   Japan              38,980
 12   United Kingdom     37,600
 13   Finland            37,460
 14   Channel Islands    .. a
 15   Austria            36,980
 16   Netherlands        36,620
 17   Belgium            35,700
 18   France             34,810
 19   Germany            34,580
 20   Canada             32,600
 21   Australia          32,220
 22   Isle of Man        27,770 a

PPP methodology (international dollars)
Rank  Country            International dollars
  1   Luxembourg         65,340
  2   Bermuda            .. a
  3   United States      41,950
  4   Norway             40,420
  5   Liechtenstein      .. a
  6   Switzerland        37,080
  7   Channel Islands    .. a
  8   Iceland            34,760
  9   Ireland            34,720
 10   Hong Kong, China   34,670
 11   Denmark            33,570
 12   Austria            33,140
 13   United Kingdom     32,690
 14   Belgium            32,640
 15   Netherlands        32,480
 16   Canada             32,220
 17   Qatar              .. a
 18   Sweden             31,420
 19   Japan              31,410
 20   Finland            31,170
 22   Australia          30,610
 23   France             30,540

26 28 29 31 32 33 34 38 41 44 46 48 49 50 53 55 56 57 58 60 62 63 65 66 Italy Hong Kong, China Singapore New Zealand Kuwait Spain UAE Greece Israel Cyprus Slovenia Portugal Korea, Rep. Bahrain Malta Saudi Arabia Antigua Czech Republic Trinidad-Tobago Hungary Oman Estonia Seychelles St.
Kitts and Nevis 30,010 27,670 27,490 25,960 24,040 a 25,360 23,770 a 19,670 18,620 16,510 a 17,350 16,170 15,830 14,370 a 13,590 11,770 10,920 10,710 10,440 10,030 9,070 a 9,100 8,290 8,210 25 27 29 33 34 36 37 40 41 42 44 46 47 49 50 52 56 57 58 59 60 61 62 64 Singapore Germany Italy Spain UAE Kuwait Israel Cyprus Greece New Zealand Slovenia Korea, Rep. Bahrain Czech Republic Portugal Malta Hungary Seychelles Slovak Republic Oman Estonia Saudi Arabia Lithuania Argentina 8 29,780 29,210 28,840 25,820 24,090 a, c 24,010 a, c 25,280 22,230 a 23,620 23,030 22,160 21,850 21,290 20,140 19,730 18,960 16,940 15,940 15,760 14,680 a, c 15,420 14,740 c 14,220 13,920 67 68 70 71 72 73 74 75 76 77 78 80 81 82 82 84 85 86 87 Croatia Slovak Republic Palau Mexico Poland Lithuania Latvia Lebanon Chile Libya Mauritius Botswana Gabon Malaysia South Africa Venezuela, RB St. Lucia Turkey Panama 8,060 7,950 7,630 7,310 7,110 7,050 6,760 6,180 5,870 5,530 5,260 5,180 5,010 4,960 4,960 4,810 4,800 4,710 4,630 66 67 68 69 70 71 73 75 76 78 79 80 81 82 83 85 86 87 88 Poland Latvia Trinidad-Tobago Croatia St. Kitts and Nevis Mauritius South Africa Antigua and Barbuda Chile Russian Federation Malaysia Botswana Mexico Uruguay Costa Rica Romania Bulgaria Thailand Turkey 9 13,490 13,480 13,170 12,750 12,500 12,450 12,120 c 11,700 11,470 10,640 10,320 10,250 10,030 9,810 9,680 c 8,940 8,630 8,440 8,420 88 89 90 91 92 93 94 95 96 97 98 99 100 100 102 103 103 105 106 107 108 109 110 112 Costa Rica Argentina Russian Federation Uruguay Grenada Romania Dominica St. Vincent Belize Brazil Bulgaria Jamaica Fiji Serbia-Montenegro Namibia Kazakhstan Marshall Islands Tunisia Macedonia, FYR Iran, Islamic Rep. 
Belarus Thailand Algeria Ecuador 4,590 4,470 4,460 4,360 3,920 3,830 3,790 3,590 3,500 3,460 3,450 3,400 3,280 3,280 d 2,990 2,930 2,930 2,890 2,830 2,770 2,760 2,750 2,730 2,630 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 105 106 107 108 109 110 111 112 114 10 Brazil Equatorial Guinea Iran, Islamic Rep. Tonga Namibia Tunisia Belarus Bosnia-Herzegovina Kazakhstan Colombia Panama Grenada Dominican Republic Macedonia, FYR Algeria Belize Ukraine China Samoa St. Vincent Venezuela, RB Cape Verde St. Lucia Fiji 8,230 7,580 a, c 8,050 8,040 c 7,910 c 7,900 7,890 7,790 c 7,730 7,420 c 7,310 7,260 7,150 c 7,080 6,770 c 6,740 6,720 6,600 6,480 6,460 6,440 6,000 c 5,980 5,960 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 134 135 Peru Albania Suriname Jordan El Salvador Bosnia-Herzegovina Guatemala Maldives Dominican Republic Micronesia, Fed. Sts. Colombia Swaziland Tonga Samoa Cape Verde China Morocco Vanuatu Ukraine Armenia Kiribati Syria 2,610 2,580 2,540 2,500 2,450 2,440 2,400 2,390 2,370 2,300 2,290 2,280 2,190 2,090 1,870 1,740 1,730 1,600 1,520 1,470 1,390 1,380 115 117 118 119 121 122 123 125 126 127 129 130 132 133 134 135 136 137 138 139 140 141 11 Gabon Peru Lebanon Dominica Albania Philippines Jordan Swaziland El Salvador Armenia Paraguay Azerbaijan Sri Lanka Egypt, Arab Rep. Guatemala Morocco Guyana Jamaica Ecuador Syria Indonesia Nicaragua 5,890 5,830 5,740 5,560 5,420 5,300 5,280 5,190 5,120 c 5,060 4,970 c 4,890 4,520 4,440 4,410 c 4,360 4,230 c 4,110 4,070 3,740 3,720 3,650 c 136 136 138 139 139 139 142 143 144 145 146 147 147 147 150 151 152 153 155 156 158 159 160 161 Angola Georgia Philippines Indonesia Paraguay West Bank-Gaza Egypt, Arab Rep. Azerbaijan Honduras Sri Lanka Djibouti Bolivia Cameroon Guyana Lesotho Congo, Rep. 
Nicaragua Moldova Bhutan Côte d'Ivoire Timor-Leste India Senegal Mongolia 1,350 1,350 1,300 1,280 1,280 1,120 a 1,250 1,240 1,190 1,160 1,020 1,010 1,010 1,010 960 950 910 880 870 840 750 720 710 690 144 145 146 147 148 149 151 154 155 155 158 159 159 161 162 163 163 163 166 167 167 169 169 171 12 India Lesotho Georgia Vanuatu Vietnam Honduras Bolivia Cambodia Ghana Papua New Guinea Pakistan Djibouti Guinea Angola Mongolia Cameroon Mauritania Moldova Bangladesh Lao PDR Uzbekistan Comoros Sudan Zimbabwe 3,460 c 3,410 c 3,270 3,170 c 3,010 2,900 c 2,740 2,490 c 2,370 c 2,370 c 2,350 2,240 c 2,240 2,210 c 2,190 2,150 2,150 c 2,150 2,090 2,020 2,020 2,000 c 2,000 1,940 161 163 164 164 166 167 168 169 169 171 172 172 174 175 176 176 178 178 180 180 182 183 183 185 Pakistan Papua New Guinea Comoros Sudan Vietnam Yemen, Rep. Solomon Islands Mauritania Nigeria Kenya Benin Uzbekistan Zambia Bangladesh Ghana Haiti Kyrgyz Republic Lao PDR Burkina Faso Chad São Tomé-Principe Cambodia Mali Guinea 690 660 640 640 620 600 590 560 560 530 510 510 490 470 450 450 440 440 400 400 390 380 380 370 172 173 174 175 176 177 178 179 180 181 183 184 185 186 187 188 189 191 192 193 193 195 196 197 13 Gambia, The Solomon Islands Kyrgyz Republic Haiti Senegal Togo Nepal Uganda Côte d'Ivoire Chad Rwanda Mozambique Tajikistan Burkina Faso Kenya Central African Rep Benin Nigeria Eritrea Ethiopia Mali Zambia Yemen, Rep. Madagascar 1,920 c 1,880 c 1,870 1,840 c 1,770 1,550 1,530 1,500 c 1,490 1,470 c 1,320 1,270 c 1,260 1,220 c 1,170 1,140 c 1,110 1,040 1,010 c 1,000 c 1,000 950 920 880 186 Central African Rep 186 Togo 188 Tanzania 188 Zimbabwe 190 Tajikistan 191 Mozambique 192 Gambia, The 192 Madagascar 194 Uganda 195 Nepal 196 Niger 197 Rwanda 199 Eritrea 199 Sierra Leone 201 Guinea-Bissau 202 Ethiopia 202 Malawi 206 Liberia 207 Congo, Dem. Rep. 208 Burundi World 350 350 340 340 330 310 290 290 280 270 240 230 220 220 180 160 160 130 120 100 6,987 200 201 202 203 204 206 207 208 Congo, Rep. 
Niger Sierra Leone Tanzania Congo, Dem. Rep. Guinea-Bissau Malawi Burundi World 810 800 c 780 730 g 720 c 700 c 650 640 c 9,420

Summary by income group and region

Group                      Atlas (US dollars)   PPP (international dollars)
World                            6,987                  9,420
Low income                         580                  2,486
Middle income                    2,640                  7,195
Lower middle income              1,918                  6,313
Upper middle income              5,625                 10,924
Low & middle income              1,746                  5,151
East Asia & Pacific              1,627                  5,914
Europe & Central Asia            4,113                  9,142
LA & Caribbean                   4,008                  8,111
Middle East & N Africa           2,241                  6,076
South Asia                         684                  3,142
Sub-Saharan Africa                 745                  1,981
High income                     35,131                 32,524
EMU                             31,914                 28,958

Income categories: Low income, $875 or less. Lower middle income, $876 to $3,465. Upper middle income, $3,466 to $10,725. High income, $10,726 or more.

Notes: Figures in italics are for 2004 or 2003. “..” Not available.
a. 2005 data not available; ranking is approximate.
c. Estimate is based on regression; other PPP figures are extrapolated from the latest ICP benchmark estimates.
d. Excludes data for Kosovo.

Source: World Development Indicators database, World Bank, 1 July 2006. Source table reports GNI = GDP. Source table gives estimated income categories for remaining countries, omitted here. Source table contains additional footnotes, deleted here. See source for details.
http://siteresource.worldbank.org/ICPINT/Resources/Atlas_2005.pdf

Suppose we try to construct a frequency distribution using class intervals of equal length.
- The “square root” guideline suggests 14 or 15 class intervals, implying class intervals of $4000 to $5000.
- The range of per capita income is $100 (Burundi) to over $65,000 (Luxembourg).
- Let’s consider using 13 class intervals of $5000. (16 class intervals of $4000 would be another possibility.)

What does this choice of class intervals imply for the frequency distribution?
- Portugal is approximately the 75th percentile country, with per capita income under $20,000. More than 3/4 of countries will be in the bottom 4 class intervals.
- Brazil ($3460 or $8230, depending on the conversion method) and Tunisia ($2890 or $7900) are approximately median countries. More than 1/2 of countries will be in the bottom 2 class intervals.
- Bolivia ($1010 or $2740) and India ($720 or $3400) are approximately the 25th percentile countries. More than 1/4 of countries will be in the bottom class interval!

In fact, the bottom of the distribution contains even more countries than this!
- Atlas method: More than 1/2 of countries would be in the bottom class interval (ranks 82 to 208, Malaysia to Burundi).
- PPP method: More than 1/3 of countries would be in the bottom class interval (ranks 129 to 208, Paraguay to Burundi).

Class intervals at the top of the distribution have very few observations.
- PPP method: Luxembourg would be in the highest class interval by itself, then three class intervals would be empty, then the US would be in the 40,000 to < 45,000 interval by itself. (Bermuda is in there somewhere!)
- Atlas method: Only 10 countries would be in the top 5 class intervals.

Even if we used class intervals of $1000 (and 65 class intervals is too many!), we’d lose too much detail at the bottom end of the distribution.

The appropriate choice is unequal class intervals: small intervals at the bottom of the distribution, and larger ones for the relatively small number of high-income countries.
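The contrast between equal and unequal class intervals can be made concrete with a short sketch. It tabulates a handful of Atlas-method figures quoted in the slides, once with $5000 intervals and once with unequal intervals. The function name `tabulate`, the choice of countries, the open-ended top class, and the use of the World Bank income-category cutoffs ($876, $3,466, $10,726) as class limits are all assumptions made for illustration; the slides do not prescribe particular unequal intervals.

```python
from bisect import bisect_right

# 2005 Atlas-method per capita income (US dollars) for a few countries
# quoted in the slides. Illustrative subset only, not the 208-country table.
income = {
    "Luxembourg": 65630, "Norway": 59590, "Switzerland": 54930,
    "United States": 43740, "Brazil": 3460, "Tunisia": 2890,
    "Bolivia": 1010, "India": 720, "Burundi": 100,
}


def tabulate(values, lower_limits):
    """Return class frequencies n_k.

    Class k covers lower_limits[k] <= x < lower_limits[k + 1];
    the last class is open-ended (an assumption of this sketch).
    """
    counts = [0] * len(lower_limits)
    for x in values:
        counts[bisect_right(lower_limits, x) - 1] += 1
    return counts


# 13 equal class intervals of $5000; the top class (60000 and over) is
# treated as open-ended so that Luxembourg falls inside it.
equal_limits = list(range(0, 65000, 5000))

# Unequal intervals borrowed from the World Bank income categories.
unequal_limits = [0, 876, 3466, 10726]

equal_counts = tabulate(income.values(), equal_limits)
unequal_counts = tabulate(income.values(), unequal_limits)

print("equal:  ", equal_counts)
print("unequal:", unequal_counts)
```

With equal intervals, all five lower-income countries in the subset land in the single bottom class; the unequal intervals separate them, which is the loss of detail the slides describe.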
Three ways of tabulating the observations:

Notation: $n$ = number of observations; $K$ classes, indexed by $k = 1, \dots, K$.

(1) Frequency distribution
Number of observations falling into each class interval.
Class frequencies: $n_k$, $k = 1, \dots, K$, with $\sum_{k=1}^{K} n_k = n$.

(2) Relative frequency distribution
Fraction of observations falling into each class interval.
Relative frequencies: $f_k = \frac{n_k}{n}$, $k = 1, \dots, K$, with $\sum_{k=1}^{K} f_k = 1$.

(3) Cumulative relative frequency distribution
Fraction of observations falling in that class or a lower class.
Cumulative relative frequencies: $F_k = \sum_{h=1}^{k} f_h = f_1 + f_2 + \dots + f_k$, $k = 1, \dots, K$.

Frequency histograms

To construct a frequency histogram:
- Mark the class intervals on the horizontal axis.
- Draw a rectangle on each class interval, whose height is the class frequency.

In this example, we can “re-label” the vertical axis with the relative frequencies, and the relative frequency histogram will have the same profile as the frequency histogram:

[Figure: frequency histogram of voter turnout. Horizontal axis: class limits 35, 40, ..., 75. Vertical axis: class frequencies (1, 7, 6, 12, 14, 6, 4, 1).]

[Figure: relative frequency histogram of voter turnout. Horizontal axis: class limits 35, 40, ..., 75. Vertical axis: relative frequencies (.020, .137, .118, .235, .275, .118, .078, .020).]

Example: Age structure of the US population

Age structure of the US population, percentages

Age           1960    1988    2005
< 5           11.3     7.5     6.8
5 to < 10     10.4     7.3     6.6
10 to < 15     9.4     6.7     7.0
15 to < 20     7.4     7.4     7.1
20 to < 25     6.2     7.9     7.1
25 to < 30     6.1     8.9     6.8
30 to < 35     6.6     8.9     6.8
35 to < 40     6.9     7.8     7.1
40 to < 45     6.5     6.6     7.7
45 to < 50     6.0     5.3     7.6
50 to < 55     5.3     4.5     6.8
55 to < 60     4.7     4.4     5.9
60 to < 65     4.0     4.4     4.4
65 to < 75     6.1     7.3     6.3
75 and over    3.1     5.1     6.1

Population was 180.671 million in 1960, 246.329 million in 1988, 296.507 million in 2005.
Source: Statistical Abstract of the United States

Relative frequency distribution and cumulative relative frequency distribution for a more aggregated set of class intervals:

                      1988                        2005
Age           relative    cumulative      relative    cumulative
              frequency   rel. frequency  frequency   rel. frequency
< 15            21.5         21.5           20.4         20.4
15 to < 25      15.3         36.8           14.2         34.2
25 to < 55      42.0         78.8           42.8         77.4
55 to < 65       8.8         87.6           10.3         87.7
65 and over     12.4        100.0           12.4        100.0

Histogram for the 1988 distribution (assuming the upper limit of the highest class interval is 100 years). What do you think of this histogram?

[Figure: histogram of the 1988 distribution with rectangle heights equal to the relative frequencies 21.5, 15.3, 42, 8.8, and 12.4. Horizontal axis: class intervals with limits 0, 15, 25, 55, 65, 100. Vertical axis: percentage of observations.]

The areas of the rectangles are not proportional to the relative frequencies! Why not? Because the class intervals are unequal! We must correct the heights so that area represents relative frequency.

How is the height of each rectangle determined?
- Area = height × width
- Width is stated as a multiple of the narrowest class interval
- Solve for height

[Figure: corrected histogram of the 1988 distribution. Heights: 21.5/1.5 = 14.33, 15.3, 42/3 = 14, 8.8, and 12.4/3.5 = 3.54. Horizontal axis: class intervals with limits 0, 15, 25, 55, 65, 100. Vertical axis: percentage of observations.]

Summary:
- If class intervals are equal, the frequency histogram and relative frequency histogram have the same profile; we can just re-label the vertical axis to go between them.
- If class intervals are unequal, we must use a relative frequency histogram, correcting the heights so that area = relative frequency.
- The total area of a relative frequency histogram always equals one (or 100%).

GROUPED MEAN AND GROUPED VARIANCE

Sometimes we do not have the raw data, but only the frequency or relative frequency distribution (“grouped data”). How can we compute measures of central tendency, dispersion, and percentiles from grouped data?
The representation of a continuous variable’s probability distribution closely resembles a relative frequency distribution. Intuition for how to compute the mean, variance, and percentiles from grouped data will carry over! Begin with the “grouped mean” and “grouped variance.”

Grouped mean:

We want to calculate
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
We don’t have $X_i$, $i = 1, \dots, n$, so we can’t calculate it! What do we have?
1. class intervals ($K$ of them)
2. class frequency of each class interval ($n_k$, $k = 1, \dots, K$, where $\sum_{k=1}^{K} n_k = n$)

How can we use this information? We need to assume where in each class interval the observations lie. What is a logical assumption?

The midpoint of the kth class interval is called the class mark, denoted $X_k$. Assume every observation in the kth class interval lies on the class mark.

What’s wrong with the following statistic?
$$\frac{1}{K} \sum_{k=1}^{K} X_k \qquad \text{(average of the class marks, or midpoints)}$$

Example: n = 6

Class interval   Class mark    Data set #1   Data set #2
0 to < 20        X_1 = 10      n_1 = 2       n_1 = 1
20 to < 40       X_2 = 30      n_2 = 2       n_2 = 2
40 to 60         X_3 = 50      n_3 = 2       n_3 = 3

$$\frac{1}{3}(10 + 30 + 50) = 30 \quad \text{for both data sets}$$

How should we change this statistic so it better estimates the mean of the raw data?
$$\frac{1}{n} \sum_{k=1}^{K} n_k X_k$$

For the same example:
$$\text{Data set \#1:} \quad \bar{X} = \frac{1}{6}\,[\,2(10) + 2(30) + 2(50)\,] = 30$$
$$\text{Data set \#2:} \quad \bar{X} = \frac{1}{6}\,[\,1(10) + 2(30) + 3(50)\,] = 36\tfrac{2}{3}$$

Note the grouped mean is a weighted average of the class marks. Move $\frac{1}{n}$ inside the summation sign and use $\frac{n_k}{n} = f_k$:
$$\bar{X} = \frac{1}{n} \sum_{k=1}^{K} n_k X_k = \sum_{k=1}^{K} \frac{n_k}{n} X_k = \sum_{k=1}^{K} f_k X_k$$

Definition: The grouped mean is $\bar{X} = \frac{1}{n} \sum_{k=1}^{K} n_k X_k$, or equivalently, $\bar{X} = \sum_{k=1}^{K} f_k X_k$.

Grouped variance:

How can we estimate
$$S_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \;?$$
Continue to assume every observation in the kth class interval lies on the class mark! Build up the estimate term by term:
$$X_k - \bar{X}$$
$$(X_k - \bar{X})^2$$
$$n_k (X_k - \bar{X})^2$$
$$\sum_{k=1}^{K} n_k (X_k - \bar{X})^2$$
$$S_X^2 = \frac{1}{n-1} \sum_{k=1}^{K} n_k (X_k - \bar{X})^2$$

Example: n = 6

Class interval   Class mark    Data set #1   Data set #2
0 to < 20        X_1 = 10      n_1 = 2       n_1 = 1
20 to < 40       X_2 = 30      n_2 = 2       n_2 = 2
40 to 60         X_3 = 50      n_3 = 2       n_3 = 3

$$\text{Data set \#1:} \quad S_X^2 = \frac{1}{5}\,[\,2(10 - 30)^2 + 2(30 - 30)^2 + 2(50 - 30)^2\,] = 320$$
$$\text{Data set \#2:} \quad S_X^2 = \frac{1}{5}\,[\,1(10 - 36\tfrac{2}{3})^2 + 2(30 - 36\tfrac{2}{3})^2 + 3(50 - 36\tfrac{2}{3})^2\,] = 266.67$$

The grouped variance approximately equals a weighted average of the squared deviations of the class marks from the grouped mean. Move $\frac{1}{n-1}$ inside the summation; if $n$ is large, $\frac{n_k}{n-1} \approx \frac{n_k}{n} = f_k$:
$$S_X^2 = \frac{1}{n-1} \sum_{k=1}^{K} n_k (X_k - \bar{X})^2 = \sum_{k=1}^{K} \frac{n_k}{n-1} (X_k - \bar{X})^2 \approx \sum_{k=1}^{K} f_k (X_k - \bar{X})^2$$

Definition: The grouped variance is $S_X^2 = \frac{1}{n-1} \sum_{k=1}^{K} n_k (X_k - \bar{X})^2$, or (approximately) $S_X^2 \approx \sum_{k=1}^{K} f_k (X_k - \bar{X})^2$.

Summary:
- When computing the mean and variance from grouped data, assume every observation in the kth class interval lies on the class mark.
- The grouped mean is a weighted average of the class marks, and the grouped variance is (approximately) a weighted average of the squared deviations of the class marks from the grouped mean, with relative frequencies as weights.

COMPUTING PERCENTILES FROM GROUPED DATA

Percentiles

Example:

Class interval   Relative frequency
0 to < 5               .08
5 to < 10              .12
10 to < 15             .20
15 to < 20             .25
20 to < 25             .35

[Figure: relative frequency histogram of this distribution. Horizontal axis: class limits 0, 5, 10, 15, 20, 25. Vertical axis: relative frequency, 0 to .4.]

What is the 20th percentile? What is the 40th percentile? What is the median?
$$\text{median} = 15 + \frac{.50 - .40}{.25}\,(20 - 15) = 17$$

The first three classes contain .08 + .12 + .20 = .40 of the observations, so the median lies in the fourth class interval, a further 10% of the observations along.

Divide the fourth class interval into two parts:
1) “lower part” with 10% of the data (to add to the data below this class)
2) “upper part” (the rest)

How far from 15 toward 20 do we have to go, to capture 10% of the data?
10% of the data is what fraction of 25% of the data?
$$.10 = x\,(.25) \quad\Rightarrow\quad x = \frac{.10}{.25} = .4$$
.4 of the way from 15 toward 20 is .4(20 - 15) = 2. Median = 15 + 2 = 17.

Definition: The pth percentile is the value such that p% of the observations lie below it, and (100 - p)% lie above it. The median is the 50th percentile!

In calculating the median and other percentiles, what assumption have we made about where the observations lie, within the class interval?

For the purpose of computing percentiles from grouped data, we assume the observations in each class are evenly or uniformly distributed along the class interval.

Example (continued): Age structure of the US population

Using the age-structure table above (five-year class intervals), the cumulative relative frequency just below the class containing the median is 44.7% below age 25 in 1960, 45.7% below age 30 in 1988, and 48.2% below age 35 in 2005.
$$\text{median}_{1960} = 25 + \frac{.5 - .447}{.061}\,(30 - 25) = 29.34$$
$$\text{median}_{1988} = 30 + \frac{.5 - .457}{.089}\,(35 - 30) = 32.42$$
$$\text{median}_{2005} = 35 + \frac{.5 - .482}{.071}\,(40 - 35) = 36.27$$

Here are the more aggregated relative frequency distributions of the 1988 and 2005 data.

                      1988                        2005
Age           relative    cumulative      relative    cumulative
              frequency   rel. frequency  frequency   rel. frequency
< 15            21.5         21.5           20.4         20.4
15 to < 25      15.3         36.8           14.2         34.2
25 to < 55      42.0         78.8           42.8         77.4
55 to < 65       8.8         87.6           10.3         87.7
65 and over     12.4        100.0           12.4        100.0

Let’s recompute median age. Should we expect these medians to differ from those we just calculated?
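The interpolation used for the medians from the five-year age classes can be sketched in a few lines. The function name `grouped_percentile` and its argument layout are hypothetical, chosen for illustration; the sketch relies on the assumption stated in the slides that observations are spread uniformly within each class, and it closes the open-ended top age class at 100 years, as the slides do for the histogram.

```python
def grouped_percentile(limits, rel_freq, p):
    """Interpolate the p-th percentile (p in percent) from grouped data.

    limits   -- class limits; one more entry than rel_freq
    rel_freq -- relative frequency of each class, in percent
    Assumes observations are uniformly distributed within each class.
    """
    cumulative = 0.0
    for k, f in enumerate(rel_freq):
        if f > 0 and cumulative + f >= p:
            lower, upper = limits[k], limits[k + 1]
            # Fraction of this class needed to reach p percent in total.
            fraction = (p - cumulative) / f
            return lower + fraction * (upper - lower)
        cumulative += f
    return limits[-1]


# Small example from the slides: classes 0-5, 5-10, ..., 20-25.
simple_limits = [0, 5, 10, 15, 20, 25]
simple_freq = [8, 12, 20, 25, 35]
simple_median = grouped_percentile(simple_limits, simple_freq, 50)

# Age structure of the US population, five-year classes (percentages).
# The top class (75 and over) is closed at 100 years, an assumption.
age_limits = list(range(0, 70, 5)) + [75, 100]
age_freq = {
    1960: [11.3, 10.4, 9.4, 7.4, 6.2, 6.1, 6.6, 6.9, 6.5, 6.0,
           5.3, 4.7, 4.0, 6.1, 3.1],
    1988: [7.5, 7.3, 6.7, 7.4, 7.9, 8.9, 8.9, 7.8, 6.6, 5.3,
           4.5, 4.4, 4.4, 7.3, 5.1],
    2005: [6.8, 6.6, 7.0, 7.1, 7.1, 6.8, 6.8, 7.1, 7.7, 7.6,
           6.8, 5.9, 4.4, 6.3, 6.1],
}
age_medians = {year: grouped_percentile(age_limits, freq, 50)
               for year, freq in age_freq.items()}

print(simple_median)
for year, median in age_medians.items():
    print(year, round(median, 2))
```

The same function gives other percentiles by changing `p`, for example `grouped_percentile(simple_limits, simple_freq, 20)` for the 20th percentile of the small example.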
Medians from the aggregated distributions above:
$$\text{median}_{1988} = 25 + \frac{.5 - .368}{.42}\,(55 - 25) = 34.43$$
$$\text{median}_{2005} = 25 + \frac{.5 - .342}{.428}\,(55 - 25) = 36.07$$

Quartiles for 1988, from the same aggregated distribution:
$$\text{1st quartile}_{1988} = 15 + \frac{.25 - .215}{.153}\,(25 - 15) = 17.29$$
$$\text{3rd quartile}_{1988} = 25 + \frac{.75 - .368}{.42}\,(55 - 25) = 52.29$$
Interquartile range (1988) = 52.29 - 17.29 = 35

OGIVE: A GRAPHICAL REPRESENTATION OF THE CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION

An ogive is a graphical representation of the cumulative relative frequency distribution.

To graph an ogive:
- At the upper limit of each class interval, put a dot whose vertical coordinate is the cumulative relative frequency of that class.
- Put a dot on the horizontal axis at the lower limit of the lowest class interval.
- Connect the dots with line segments.

Example (continued): Age structure of the US population, aggregated distribution

Let’s graph the ogive for the age distribution of the U.S. population in 1988 (assume the upper limit on age is 100 years).
[Figure: ogive for the 1988 age distribution, connecting the points (0, 0), (15, 21.5), (25, 36.8), (55, 78.8), (65, 87.6), and (100, 100). Horizontal axis: upper limit of class interval. Vertical axis: cumulative percentage of observations.]

[Figure: the same ogive, with the 50% level traced across to the curve and down to the horizontal axis at median = 34.43.]

We can read the median off the ogive. What is the intuition for why we obtain the same value as we calculated above?

SHAPES OF DISTRIBUTIONS

- Shape suggests which measure of central tendency is most appropriate.
- Shape suggests which mathematical model is best for a variable.
- More generally, shape often helps us relate the data to economic theory.

Here our goal is to lay out language for describing the shape of a distribution.

Modes:

A mode is a peak. For grouped data, a mode is the midpoint of a class interval with the largest class frequency (if the frequency distribution has class intervals of equal length).

Definition: A unimodal distribution has one mode. A bimodal distribution has two peaks.

Symmetry:

Definition: A distribution is symmetric if the right half is a mirror image of the left half.

[Figure: two examples of symmetric distributions.]

Skewness:

Definition: Skewness is a quantitative measure of how far a distribution departs from symmetry. A distribution is positively skewed if its right “tail” is longer than its left tail, and it is negatively skewed if its left tail is longer than its right tail.

[Figure: a positively skewed distribution and a negatively skewed distribution.]

For symmetric distributions, mean = median, and mean = median = mode if and only if the distribution is unimodal!
[Figure: a symmetric unimodal distribution, labeled mean = median = mode, and a symmetric bimodal distribution, labeled mean = median ≠ modes.]

For a unimodal, positively skewed distribution, mean > median > mode.
For a unimodal, negatively skewed distribution, mean < median < mode.

[Figure: a skewed distribution with the mode, median, and mean marked.]

Definition: Pearson’s coefficient of skewness
$$\text{skew} = \frac{3(\bar{X} - \text{median})}{S_X}$$
where -3 ≤ skew ≤ +3.

Summary:

We will make heavy use of
- Relative frequency distributions/histograms, and cumulative relative frequency distributions/histograms
- Percentiles of a distribution
- Concepts of symmetry and skewness. The fact that mean = median for a symmetric distribution will be incredibly useful!
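As a closing illustration, the grouped mean, grouped variance, and Pearson's coefficient of skewness defined in these slides can be combined in one short sketch. The function names are hypothetical, and the data are data set #2 from the grouped-mean example (class marks 10, 30, 50 with frequencies 1, 2, 3). The median of 40 is not stated in the slides for this data set; it is an inference from the uniform-within-class assumption, since exactly half of the observations fall below the upper limit of the second class.

```python
from math import sqrt


def grouped_mean(marks, counts):
    """Weighted average of the class marks, with class frequencies as weights."""
    n = sum(counts)
    return sum(nk * xk for xk, nk in zip(marks, counts)) / n


def grouped_variance(marks, counts):
    """Sum of n_k * (X_k - mean)^2, divided by n - 1."""
    n = sum(counts)
    mean = grouped_mean(marks, counts)
    return sum(nk * (xk - mean) ** 2 for xk, nk in zip(marks, counts)) / (n - 1)


def pearson_skew(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / S."""
    return 3 * (mean - median) / std_dev


# Data set #2 from the slides: classes 0-20, 20-40, 40-60.
marks = [10, 30, 50]
counts = [1, 2, 3]

mean = grouped_mean(marks, counts)
variance = grouped_variance(marks, counts)

# Assumed median: cumulative relative frequency reaches exactly .5 at 40.
median = 40.0
skew = pearson_skew(mean, median, sqrt(variance))

print(round(mean, 2), round(variance, 2), round(skew, 3))
```

The coefficient comes out negative, which agrees with the ordering mean < median < mode that the slides give for a unimodal, negatively skewed distribution: here the mean is 36 2/3, the median is 40, and the modal class mark is 50.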

This note was uploaded on 02/26/2012 for the course ECONOMICS 329 taught by Professor Bencivenga during the Spring '12 term at University of Texas.
