FREQUENCY DISTRIBUTIONS
Dr. V.R. Bencivenga
Economics 329, Economic Statistics

Outline:
Frequency distributions and frequency histograms
Grouped mean and grouped variance
Computing percentiles from grouped data
Ogive
Shapes of distributions

Objectives:
Summarizing and describing distributions
Computing descriptive statistics from grouped data
Graphical representations of distributions
Language for describing shapes

FREQUENCY DISTRIBUTIONS AND FREQUENCY HISTOGRAMS

Frequency distributions
A frequency distribution summarizes a large number of observations on a continuous (or
approximately continuous) variable.
The basic idea is to define intervals of values (“classes” or “class intervals”), and then
tabulate the number or fraction of observations falling in each interval.

Example: Voter turnout
Voter turnout (%)   Number of states   Proportion of states
35 to under 40                     1                   .020
40 to under 45                     7                   .137
45 to under 50                     6                   .118
50 to under 55                    12                   .235
55 to under 60                    14                   .275
60 to under 65                     6                   .118
65 to under 70                     4                   .078
70 to under 75                     1                   .020
Total                             51                   1.00

Choosing class intervals:
Identify a convenient interval covering all observations. Divide this interval into class
intervals.
Define class intervals so that each observation falls in exactly one class.
How many classes?
“Square root” guideline: number of classes approximately equals √n
Another popular guideline: 15 to 20 classes for large data sets

Example (voter turnout):
Start with a multiple of 5 just below the minimum observation, and end with a multiple
of 5 just above the maximum observation.
Use 35 to under 40, 40 to under 45, 45 to under 50, etc. (each class includes its lower limit
but not its upper limit), so it is unambiguous into which class an observation on a boundary will fall.
√51 ≈ 7, so about 7 class intervals. Convenience suggests 8.

Class intervals may be of equal or unequal length. Equal class intervals are easier to work
with, and are preferred. In some applications, unequal class intervals are a better choice.

Example: Per capita income (2005 per capita income, 208 countries, World Bank)
Two alternative methods of converting data into a common currency
− Atlas method: exchange rates
− Purchasing power parity method (PPP): price indexes including traded and nontraded goods

GNP per capita, 2005, Atlas method and Purchasing Power Parity (PPP) method

Atlas methodology (US dollars)
Rank     US$       Economy
   1  65,630       Luxembourg
   2  59,590       Norway
   3  54,930       Switzerland
   4      .. a     Bermuda
   5  47,390       Denmark
   6  46,320       Iceland
   7  43,740       United States
   8      .. a     Liechtenstein
   9  41,060       Sweden
  10  40,150       Ireland
  11  38,980       Japan
  12  37,600       United Kingdom
  13  37,460       Finland
  14      .. a     Channel Islands
  15  36,980       Austria
  16  36,620       Netherlands
  17  35,700       Belgium
  18  34,810       France
  19  34,580       Germany
  20  32,600       Canada
  21  32,220       Australia
  22  27,770 a     Isle of Man
  26  30,010       Italy
  28  27,670       Hong Kong, China
  29  27,490       Singapore
  31  25,960       New Zealand
  32  24,040 a     Kuwait
  33  25,360       Spain
  34  23,770 a     UAE
  38  19,670       Greece
  41  18,620       Israel
  44  16,510 a     Cyprus
  46  17,350       Slovenia
  48  16,170       Portugal
  49  15,830       Korea, Rep.
  50  14,370 a     Bahrain
  53  13,590       Malta
  55  11,770       Saudi Arabia
  56  10,920       Antigua and Barbuda
  57  10,710       Czech Republic
  58  10,440       Trinidad and Tobago
  60  10,030       Hungary
  62   9,070 a     Oman
  63   9,100       Estonia
  65   8,290       Seychelles
  66   8,210       St. Kitts and Nevis
  67   8,060       Croatia
  68   7,950       Slovak Republic
  70   7,630       Palau
  71   7,310       Mexico
  72   7,110       Poland
  73   7,050       Lithuania
  74   6,760       Latvia
  75   6,180       Lebanon
  76   5,870       Chile
  77   5,530       Libya
  78   5,260       Mauritius
  80   5,180       Botswana
  81   5,010       Gabon
  82   4,960       Malaysia
  82   4,960       South Africa
  84   4,810       Venezuela, RB
  85   4,800       St. Lucia
  86   4,710       Turkey
  87   4,630       Panama
  88   4,590       Costa Rica
  89   4,470       Argentina
  90   4,460       Russian Federation
  91   4,360       Uruguay
  92   3,920       Grenada
  93   3,830       Romania
  94   3,790       Dominica
  95   3,590       St. Vincent
  96   3,500       Belize
  97   3,460       Brazil
  98   3,450       Bulgaria
  99   3,400       Jamaica
 100   3,280       Fiji
 100   3,280 d     Serbia and Montenegro
 102   2,990       Namibia
 103   2,930       Kazakhstan
 103   2,930       Marshall Islands
 105   2,890       Tunisia
 106   2,830       Macedonia, FYR
 107   2,770       Iran, Islamic Rep.
 108   2,760       Belarus
 109   2,750       Thailand
 110   2,730       Algeria
 112   2,630       Ecuador
 113   2,610       Peru
 114   2,580       Albania
 115   2,540       Suriname
 116   2,500       Jordan
 117   2,450       El Salvador
 118   2,440       Bosnia and Herzegovina
 119   2,400       Guatemala
 120   2,390       Maldives
 121   2,370       Dominican Republic
 122   2,300       Micronesia, Fed. Sts.
 123   2,290       Colombia
 124   2,280       Swaziland
 125   2,190       Tonga
 126   2,090       Samoa
 127   1,870       Cape Verde
 128   1,740       China
 129   1,730       Morocco
 130   1,600       Vanuatu
 131   1,520       Ukraine
 132   1,470       Armenia
 134   1,390       Kiribati
 135   1,380       Syria
 136   1,350       Angola
 136   1,350       Georgia
 138   1,300       Philippines
 139   1,280       Indonesia
 139   1,280       Paraguay
 139   1,120 a     West Bank and Gaza
 142   1,250       Egypt, Arab Rep.
 143   1,240       Azerbaijan
 144   1,190       Honduras
 145   1,160       Sri Lanka
 146   1,020       Djibouti
 147   1,010       Bolivia
 147   1,010       Cameroon
 147   1,010       Guyana
 150     960       Lesotho
 151     950       Congo, Rep.
 152     910       Nicaragua
 153     880       Moldova
 155     870       Bhutan
 156     840       Côte d'Ivoire
 158     750       Timor-Leste
 159     720       India
 160     710       Senegal
 161     690       Mongolia
 161     690       Pakistan
 163     660       Papua New Guinea
 164     640       Comoros
 164     640       Sudan
 166     620       Vietnam
 167     600       Yemen, Rep.
 168     590       Solomon Islands
 169     560       Mauritania
 169     560       Nigeria
 171     530       Kenya
 172     510       Benin
 172     510       Uzbekistan
 174     490       Zambia
 175     470       Bangladesh
 176     450       Ghana
 176     450       Haiti
 178     440       Kyrgyz Republic
 178     440       Lao PDR
 180     400       Burkina Faso
 180     400       Chad
 182     390       São Tomé and Principe
 183     380       Cambodia
 183     380       Mali
 185     370       Guinea
 186     350       Central African Rep
 186     350       Togo
 188     340       Tanzania
 188     340       Zimbabwe
 190     330       Tajikistan
 191     310       Mozambique
 192     290       Gambia, The
 192     290       Madagascar
 194     280       Uganda
 195     270       Nepal
 196     240       Niger
 197     230       Rwanda
 199     220       Eritrea
 199     220       Sierra Leone
 201     180       Guinea-Bissau
 202     160       Ethiopia
 202     160       Malawi
 206     130       Liberia
 207     120       Congo, Dem. Rep.
 208     100       Burundi

PPP methodology (international dollars)
Rank  Intl $       Economy
   1  65,340       Luxembourg
   2      .. a     Bermuda
   3  41,950       United States
   4  40,420       Norway
   5      .. a     Liechtenstein
   6  37,080       Switzerland
   7      .. a     Channel Islands
   8  34,760       Iceland
   9  34,720       Ireland
  10  34,670       Hong Kong, China
  11  33,570       Denmark
  12  33,140       Austria
  13  32,690       United Kingdom
  14  32,640       Belgium
  15  32,480       Netherlands
  16  32,220       Canada
  17      .. a     Qatar
  18  31,420       Sweden
  19  31,410       Japan
  20  31,170       Finland
  22  30,610       Australia
  23  30,540       France
  25  29,780       Singapore
  27  29,210       Germany
  29  28,840       Italy
  33  25,820       Spain
  34  24,090 a, c  UAE
  36  24,010 a, c  Kuwait
  37  25,280       Israel
  40  22,230 a     Cyprus
  41  23,620       Greece
  42  23,030       New Zealand
  44  22,160       Slovenia
  46  21,850       Korea, Rep.
  47  21,290       Bahrain
  49  20,140       Czech Republic
  50  19,730       Portugal
  52  18,960       Malta
  56  16,940       Hungary
  57  15,940       Seychelles
  58  15,760       Slovak Republic
  59  14,680 a, c  Oman
  60  15,420       Estonia
  61  14,740 c     Saudi Arabia
  62  14,220       Lithuania
  64  13,920       Argentina
  66  13,490       Poland
  67  13,480       Latvia
  68  13,170       Trinidad and Tobago
  69  12,750       Croatia
  70  12,500       St. Kitts and Nevis
  71  12,450       Mauritius
  73  12,120 c     South Africa
  75  11,700       Antigua and Barbuda
  76  11,470       Chile
  78  10,640       Russian Federation
  79  10,320       Malaysia
  80  10,250       Botswana
  81  10,030       Mexico
  82   9,810       Uruguay
  83   9,680 c     Costa Rica
  85   8,940       Romania
  86   8,630       Bulgaria
  87   8,440       Thailand
  88   8,420       Turkey
  89   8,230       Brazil
  90   7,580 a, c  Equatorial Guinea
  91   8,050       Iran, Islamic Rep.
  92   8,040 c     Tonga
  93   7,910 c     Namibia
  94   7,900       Tunisia
  95   7,890       Belarus
  96   7,790 c     Bosnia and Herzegovina
  97   7,730       Kazakhstan
  98   7,420 c     Colombia
  99   7,310       Panama
 100   7,260       Grenada
 101   7,150 c     Dominican Republic
 102   7,080       Macedonia, FYR
 103   6,770 c     Algeria
 105   6,740       Belize
 106   6,720       Ukraine
 107   6,600       China
 108   6,480       Samoa
 109   6,460       St. Vincent
 110   6,440       Venezuela, RB
 111   6,000 c     Cape Verde
 112   5,980       St. Lucia
 114   5,960       Fiji
 115   5,890       Gabon
 117   5,830       Peru
 118   5,740       Lebanon
 119   5,560       Dominica
 121   5,420       Albania
 122   5,300       Philippines
 123   5,280       Jordan
 125   5,190       Swaziland
 126   5,120 c     El Salvador
 127   5,060       Armenia
 129   4,970 c     Paraguay
 130   4,890       Azerbaijan
 132   4,520       Sri Lanka
 133   4,440       Egypt, Arab Rep.
 134   4,410 c     Guatemala
 135   4,360       Morocco
 136   4,230 c     Guyana
 137   4,110       Jamaica
 138   4,070       Ecuador
 139   3,740       Syria
 140   3,720       Indonesia
 141   3,650 c     Nicaragua
 144   3,460 c     India
 145   3,410 c     Lesotho
 146   3,270       Georgia
 147   3,170 c     Vanuatu
 148   3,010       Vietnam
 149   2,900 c     Honduras
 151   2,740       Bolivia
 154   2,490 c     Cambodia
 155   2,370 c     Ghana
 155   2,370 c     Papua New Guinea
 158   2,350       Pakistan
 159   2,240 c     Djibouti
 159   2,240       Guinea
 161   2,210 c     Angola
 162   2,190       Mongolia
 163   2,150       Cameroon
 163   2,150 c     Mauritania
 163   2,150       Moldova
 166   2,090       Bangladesh
 167   2,020       Lao PDR
 167   2,020       Uzbekistan
 169   2,000 c     Comoros
 169   2,000       Sudan
 171   1,940       Zimbabwe
 172   1,920 c     Gambia, The
 173   1,880 c     Solomon Islands
 174   1,870       Kyrgyz Republic
 175   1,840 c     Haiti
 176   1,770       Senegal
 177   1,550       Togo
 178   1,530       Nepal
 179   1,500 c     Uganda
 180   1,490       Côte d'Ivoire
 181   1,470 c     Chad
 183   1,320       Rwanda
 184   1,270 c     Mozambique
 185   1,260       Tajikistan
 186   1,220 c     Burkina Faso
 187   1,170       Kenya
 188   1,140 c     Central African Rep
 189   1,110       Benin
 191   1,040       Nigeria
 192   1,010 c     Eritrea
 193   1,000 c     Ethiopia
 193   1,000       Mali
 195     950       Zambia
 196     920       Yemen, Rep.
 197     880       Madagascar
 200     810       Congo, Rep.
 201     800 c     Niger
 202     780       Sierra Leone
 203     730 g     Tanzania
 204     720 c     Congo, Dem. Rep.
 206     700 c     Guinea-Bissau
 207     650       Malawi
 208     640 c     Burundi

World and group aggregates (Atlas in US dollars; PPP in international dollars)
  Atlas      PPP  Group
  6,987    9,420  World
    580    2,486  Low income
  2,640    7,195  Middle income
  1,918    6,313  Lower middle income
  5,625   10,924  Upper middle income
  1,746    5,151  Low & middle income
  1,627    5,914  East Asia & Pacific
  4,113    9,142  Europe & Central Asia
  4,008    8,111  Latin America & Caribbean
  2,241    6,076  Middle East & N. Africa
    684    3,142  South Asia
    745    1,981  Sub-Saharan Africa
 35,131   32,524  High income
 31,914   28,958  EMU
Income categories: Low income, $875 or less. Lower middle income, $876 to $3,465. Upper middle income,
$3,466 to $10,725. High income, $10,726 or more.
Notes:
Figures in italics are for 2004 or 2003. “..” Not available. a. 2005 data not available; ranking is approximate. c. Estimate
is based on regression; other PPP figures are extrapolated from the latest ICP benchmark estimates. d. Excludes data for
Kosovo.
Source: World Development Indicators database, World Bank, 1 July 2006. Source table reports GNI (= GNP). Source
table gives estimated income categories for remaining countries, omitted here. Source table contains additional
footnotes, deleted here. See source for details. http://siteresource.worldbank.org/ICPINT/Resources/Atlas_2005.pdf

Suppose we try to construct a frequency distribution using class intervals of equal length.
The “square root” guideline suggests 14 or 15 class intervals (√208 ≈ 14.4), implying class intervals of
$4000 to $5000.
Range of per capita income is $100 (Burundi) to over $65,000 (Luxembourg).
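A quick numerical check of the square-root guideline, as a minimal Python sketch. The helper names `sqrt_guideline` and `equal_width` are illustrative, not from the source, and the range uses the Atlas extremes from the table (Burundi, $100; Luxembourg, $65,630).

```python
import math


def sqrt_guideline(n):
    """Square-root guideline: use roughly sqrt(n) class intervals."""
    return math.sqrt(n)


def equal_width(low, high, k):
    """Width of each of k equal class intervals covering [low, high]."""
    return (high - low) / k


# Voter turnout: 51 states -> about 7 classes.
print(f"sqrt(51)  = {sqrt_guideline(51):.2f}")
# Per capita income: 208 countries -> about 14 or 15 classes.
print(f"sqrt(208) = {sqrt_guideline(208):.2f}")

# Implied widths for the Atlas range ($100 to $65,630).
for k in (14, 15):
    print(f"{k} classes -> width of about ${equal_width(100, 65_630, k):,.0f}")
```

This prints √51 ≈ 7.14 and √208 ≈ 14.42, and widths of about $4,681 (14 classes) and $4,369 (15 classes), both inside the $4000 to $5000 range quoted above.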
Let’s consider using 13 class intervals of $5000. (16 class intervals of $4000 would be
another possibility.)

What does this choice of class intervals imply for the frequency distribution?
Portugal is approximately the 75th percentile country, with per capita income under
$20,000. More than 3/4 of countries will be in the bottom 4 class intervals.
Brazil ($3460 or $8230, depending on the conversion method) and Tunisia ($2890 or
$7900) are approximately median countries. More than 1/2 of countries will be in the
bottom 2 class intervals.
Bolivia ($1010 or $2740) and India ($720 or $3460) are approximately the 25th
percentile countries. More than 1/4 of countries will be in the bottom class interval!

In fact, the bottom of the distribution contains even more countries than this!
Atlas method: More than 1/2 of countries would be in the bottom class interval (ranks 82 to
208, Malaysia to Burundi).
PPP method: More than 1/3 of countries would be in the bottom class interval (ranks 129 to
208, Paraguay to Burundi).
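The shares quoted in the two bullets above follow from the ranks alone. A minimal sketch, assuming the table uses standard competition ranking (tied economies share a rank and the next rank is skipped), so that an economy ranked r has r − 1 economies above it; `share_from_rank` is a hypothetical helper.

```python
N_ECONOMIES = 208


def share_from_rank(rank, n=N_ECONOMIES):
    """Fraction of economies ranked `rank` or lower (rank 1 = highest income).

    With tied ranks, rank - 1 economies rank strictly above, so
    n - rank + 1 economies are at or below this rank.
    """
    return (n - rank + 1) / n


# First economy under $5,000 in each ranking (from the table above).
atlas_bottom = share_from_rank(82)   # Malaysia, Atlas method
ppp_bottom = share_from_rank(129)    # Paraguay, PPP method

print(f"Atlas: {atlas_bottom:.3f} (more than 1/2? {atlas_bottom > 1/2})")
print(f"PPP:   {ppp_bottom:.3f} (more than 1/3? {ppp_bottom > 1/3})")
```

It reports about 0.611 (127 of 208) for the Atlas method and about 0.385 (80 of 208) for the PPP method.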
Class intervals at the top of the distribution have very few observations.
PPP method: Luxembourg would be in the highest class interval by itself, then three
class intervals would be empty, then the United States and Norway would be the only countries
with data in the 40,000 to < 45,000 interval. (Bermuda is in there somewhere!)
Atlas method: Only 10 countries would be in the top 5 class intervals.
Even if we used class intervals of $1000 (and 65 class intervals is too many!), we’d lose
too much detail at the bottom end of the distribution.
The appropriate choice is unequal class intervals: small intervals at the bottom of the
distribution, and larger ones for the relatively small number of high-income countries.

Three ways of tabulating the observations

Notation: n = number of observations; K classes, indexed by k = 1, …, K.

(1) Frequency distribution: number of observations falling into each class interval.
    Class frequencies: $n_k$, k = 1, …, K, where $\sum_{k=1}^{K} n_k = n$.
(2) Relative frequency distribution: fraction of observations falling into each class interval.
    Relative frequencies: $f_k = \frac{n_k}{n}$, k = 1, …, K, where $\sum_{k=1}^{K} f_k = 1$.
(3) Cumulative relative frequency distribution: fraction of observations falling in that class or a lower class.
    Cumulative relative frequencies: $F_k = \sum_{h=1}^{k} f_h = f_1 + f_2 + \cdots + f_k$, k = 1, …, K.

Frequency histograms
To construct a frequency histogram:
Mark the class intervals on the horizontal axis.
Draw a rectangle on each class interval, whose height is the class frequency.

In this example, we can “relabel” the vertical axis with the relative frequencies, and the
relative frequency histogram will have the same profile as the frequency histogram:

[Figure: Frequency histogram of voter turnout. Horizontal axis: class intervals from 35 to 75; vertical axis: number of states (1, 4, 6, 7, 12, 14).]
[Figure: Relative frequency histogram of voter turnout. Same class intervals; vertical axis: proportion of states (.020, .078, .118, .137, .235, .275).]

Example: Age structure of the US population
Age structure of the US population, percentages

Age             1960    1988    2005
< 5             11.3     7.5     6.8
5 to < 10       10.4     7.3     6.6
10 to < 15       9.4     6.7     7.0
15 to < 20       7.4     7.4     7.1
20 to < 25       6.2     7.9     7.1
25 to < 30       6.1     8.9     6.8
30 to < 35       6.6     8.9     6.8
35 to < 40       6.9     7.8     7.1
40 to < 45       6.5     6.6     7.7
45 to < 50       6.0     5.3     7.6
50 to < 55       5.3     4.5     6.8
55 to < 60       4.7     4.4     5.9
60 to < 65       4.0     4.4     4.4
65 to < 75       6.1     7.3     6.3
75 and over      3.1     5.1     6.1

Population was 180.671 million in 1960, 246.329 million in 1988, and 296.507 million in 2005.
Source: Statistical Abstract of the United States.

Relative frequency distribution and cumulative relative frequency distributions for a more
aggregated set of class intervals (rel. = relative frequency, %; cum. = cumulative relative frequency, %):

Age           rel. 1988  cum. 1988  rel. 2005  cum. 2005
< 15               21.5       21.5       20.4       20.4
15 to < 25         15.3       36.8       14.2       34.6
25 to < 55         42.0       78.8       42.8       77.4
55 to < 65          8.8       87.6       10.3       87.7
65 and over        12.4      100.0       12.4      100.0

Histogram for the 1988 distribution (assuming the upper limit of the highest class interval is
100 years). What do you think of this histogram?
[Figure: Uncorrected histogram of the 1988 age distribution. Horizontal axis: class intervals with boundaries 0, 15, 25, 55, 65, and 100; vertical axis: percentage of observations; bar heights 21.5, 15.3, 42, 8.8, and 12.4.]

The areas of the rectangles are not proportional to the relative frequencies! Why not?
Because the class intervals are unequal! We must correct the heights so that area represents relative frequency. How is the height
of each rectangle determined?
Area = height × width
Width is stated as a multiple of the narrowest class interval (here 10 years, so the widths are 1.5, 1, 3, 1, and 3.5)
Solve for height: height = area / width

[Figure: Corrected histogram of the 1988 age distribution. Vertical axis: percentage of observations. Heights: 21.5/1.5 = 14.33 (under 15), 15.3 (15 to < 25), 42/3 = 14 (25 to < 55), 8.8 (55 to < 65), 12.4/3.5 = 3.54 (65 to 100).]
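The height correction and the cumulative relative frequencies can be reproduced in a few lines of Python. This is an illustrative sketch with hypothetical function names; it measures widths in multiples of the narrowest class interval, as the steps above do, and, as in the text, assumes the top age class ends at 100 years. The voter-turnout data are included to show that with equal class intervals the corrected heights are just the relative frequencies.

```python
def bar_heights(edges, rel_freq):
    """Heights that make each rectangle's area equal its relative frequency.

    edges: class boundaries a_0 < a_1 < ... < a_K; class k is [a_{k-1}, a_k).
    Widths are measured as multiples of the narrowest class interval.
    """
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    unit = min(widths)
    return [f / (w / unit) for f, w in zip(rel_freq, widths)]


def total_area(edges, heights):
    """Sum of height * width, with width in units of the narrowest interval."""
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    unit = min(widths)
    return sum(h * (w / unit) for h, w in zip(heights, widths))


def cumulative(rel_freq):
    """Cumulative relative frequencies F_k = f_1 + ... + f_k."""
    running, out = 0.0, []
    for f in rel_freq:
        running += f
        out.append(running)
    return out


# Equal widths (voter turnout): heights equal the relative frequencies.
turnout_edges = list(range(35, 80, 5))          # 35, 40, ..., 75
turnout_counts = [1, 7, 6, 12, 14, 6, 4, 1]
turnout_f = [c / sum(turnout_counts) for c in turnout_counts]
print(bar_heights(turnout_edges, turnout_f) == turnout_f)   # True

# Unequal widths (US age structure, percentages; top class assumed to end at 100).
age_edges = [0, 15, 25, 55, 65, 100]
pct_1988 = [21.5, 15.3, 42.0, 8.8, 12.4]
pct_2005 = [20.4, 14.2, 42.8, 10.3, 12.4]
h_1988 = bar_heights(age_edges, pct_1988)
print([round(h, 2) for h in h_1988])                # [14.33, 15.3, 14.0, 8.8, 3.54]
print(round(total_area(age_edges, h_1988), 6))      # 100.0
print([round(F, 1) for F in cumulative(pct_1988)])  # [21.5, 36.8, 78.8, 87.6, 100.0]
print([round(F, 1) for F in cumulative(pct_2005)])  # [20.4, 34.6, 77.4, 87.7, 100.1]
```

The 1988 heights match the corrected histogram, the total area is 100 (percent), and the 2005 cumulative column comes out to 34.6 at the second class (20.4 + 14.2). The final 2005 entry is 100.1 only because the published percentages are rounded.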
Summary:
If class intervals are equal, the frequency histogram and relative frequency histogram
have the same profile—we can just relabel the vertical axis to go between them.
If class intervals are unequal, we must use a relative frequency histogram, correcting
the heights so that area = relative frequency.
The total area of a relative frequency histogram always equals one (or 100%).

GROUPED MEAN AND GROUPED VARIANCE

Sometimes we do not have the raw data, but only the frequency or relative frequency
distribution (“grouped data”).
How can we compute measures of central tendency, dispersion, and percentiles from
grouped data?
The representation of a continuous variable’s probability
distribution closely resembles a relative frequency distribution.
Intuition for how to compute the mean, variance, and
percentiles from grouped data will carry over! Begin with the “grouped mean” and “grouped variance.”

Grouped mean:
We want to calculate $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
We don’t have $X_i$, i = 1, …, n, so we can’t calculate it! What do we have?
1. class intervals (K of them)
2. class frequency of each class interval ($n_k$, k = 1, …, K, where $\sum_{k=1}^{K} n_k = n$)

How can we use this information?
We need to assume where in each class interval the observations lie. What is a logical
assumption?
The midpoint of the kth class interval is called the class mark, denoted $X_k$.
Assume every observation in the kth class interval lies on the class mark.

What’s wrong with the following statistic?
$$\frac{1}{K}\sum_{k=1}^{K} X_k \qquad \text{(average of the class marks, or midpoints)}$$

Example: n = 6

Class interval   Class mark     Data set #1    Data set #2
0 to < 20        $X_1 = 10$     $n_1 = 2$      $n_1 = 1$
20 to < 40       $X_2 = 30$     $n_2 = 2$      $n_2 = 2$
40 to 60         $X_3 = 50$     $n_3 = 2$      $n_3 = 3$

$$\frac{1}{3}(10 + 30 + 50) = 30$$
This is 30 for both data sets, even though data set #2 has more of its observations in the upper classes: the statistic ignores the class frequencies.

How should we change this statistic so it better estimates the mean of the raw data?
$$\bar{X} = \frac{1}{n}\sum_{k=1}^{K} n_k X_k$$

Example: n = 6 (same class intervals and data sets as above)
Data set #1: $\bar{X} = \frac{1}{6}\left[2(10) + 2(30) + 2(50)\right] = 30$
Data set #2: $\bar{X} = \frac{1}{6}\left[1(10) + 2(30) + 3(50)\right] = 36\tfrac{2}{3}$

Note the grouped mean is a weighted average of the class marks. Move $\frac{1}{n}$ inside the summation sign, and write $f_k = n_k/n$:
$$\bar{X} = \frac{1}{n}\sum_{k=1}^{K} n_k X_k = \sum_{k=1}^{K} \frac{n_k}{n} X_k = \sum_{k=1}^{K} f_k X_k$$

Definition: The grouped mean is $\bar{X} = \frac{1}{n}\sum_{k=1}^{K} n_k X_k$, or equivalently, $\bar{X} = \sum_{k=1}^{K} f_k X_k$.

Grouped variance:
How can we estimate $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$?
Continue to assume every observation in the kth class interval lies on the class mark!
For an observation in the kth class interval, $X_i - \bar{X}$ becomes $X_k - \bar{X}$, so $(X_i - \bar{X})^2$ becomes $(X_k - \bar{X})^2$.
The $n_k$ observations in the kth class interval contribute $n_k (X_k - \bar{X})^2$, so the sum over all observations becomes $\sum_{k=1}^{K} n_k (X_k - \bar{X})^2$:
$$S_X^2 = \frac{1}{n-1}\sum_{k=1}^{K} n_k (X_k - \bar{X})^2$$

Example: n = 6 (same class intervals, class marks, and data sets as in the grouped-mean example)
Data set #1: $S_X^2 = \frac{1}{5}\left[2(10-30)^2 + 2(30-30)^2 + 2(50-30)^2\right] = 320$
Data set #2: $S_X^2 = \frac{1}{5}\left[1(10-36\tfrac{2}{3})^2 + 2(30-36\tfrac{2}{3})^2 + 3(50-36\tfrac{2}{3})^2\right] = 266.67$

The grouped variance approximately equals a weighted average of the squared deviations
of the class marks from the grouped mean. Move $\frac{1}{n-1}$ inside the summation:
$$S_X^2 = \frac{1}{n-1}\sum_{k=1}^{K} n_k (X_k - \bar{X})^2 = \sum_{k=1}^{K} \frac{n_k}{n-1} (X_k - \bar{X})^2$$
If n is large, $\frac{n_k}{n-1} \approx \frac{n_k}{n} = f_k$, so $S_X^2 \approx \sum_{k=1}^{K} f_k (X_k - \bar{X})^2$.

Definition: The grouped variance is $S_X^2 = \frac{1}{n-1}\sum_{k=1}^{K} n_k (X_k - \bar{X})^2$, or (approximately) $S_X^2 \approx \sum_{k=1}^{K} f_k (X_k - \bar{X})^2$.
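The two definitions can be checked against the n = 6 example with a short Python sketch. Function names are illustrative; like the text, the code places every observation in class k on its class mark $X_k$, and `grouped_variance_approx` uses $f_k$ in place of $n_k/(n-1)$, the large-n approximation.

```python
def grouped_mean(marks, counts):
    """Grouped mean: (1/n) * sum_k n_k * X_k, i.e. sum_k f_k * X_k."""
    n = sum(counts)
    return sum(c * x for c, x in zip(counts, marks)) / n


def grouped_variance(marks, counts):
    """Grouped variance: (1/(n-1)) * sum_k n_k * (X_k - mean)**2."""
    n = sum(counts)
    m = grouped_mean(marks, counts)
    return sum(c * (x - m) ** 2 for c, x in zip(counts, marks)) / (n - 1)


def grouped_variance_approx(marks, counts):
    """Large-n approximation: sum_k f_k * (X_k - mean)**2, with f_k = n_k / n."""
    n = sum(counts)
    m = grouped_mean(marks, counts)
    return sum((c / n) * (x - m) ** 2 for c, x in zip(counts, marks))


marks = [10, 30, 50]  # class marks of 0 to < 20, 20 to < 40, 40 to 60
print("average of class marks:", sum(marks) / len(marks))  # 30.0 for both data sets

for label, counts in (("data set #1", [2, 2, 2]), ("data set #2", [1, 2, 3])):
    print(label,
          round(grouped_mean(marks, counts), 2),
          round(grouped_variance(marks, counts), 2),
          round(grouped_variance_approx(marks, counts), 2))
# data set #1 30.0 320.0 266.67
# data set #2 36.67 266.67 222.22
```

The unweighted average of the class marks is 30 for both data sets, which is exactly the problem the weighting fixes. For n = 6 the large-n approximation is rough (222.22 versus 266.67 for data set #2).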
When computing the mean and variance from grouped data, assume every observation in
the kth class interval lies on the class mark.
The grouped mean is a weighted average of the class marks, and the grouped variance is
(approximately) a weighted average of the squared deviations of the class marks from the
grouped mean … with relative frequencies as weights.

COMPUTING PERCENTILES FROM GROUPED DATA

Percentiles
Example:
Class interval   Relative frequency
0 to < 5         .08
5 to < 10        .12
10 to < 15       .20
15 to < 20       .25
...
 Spring '12