Appendix 2
Elementary Statistical Methods

A2.1. Introduction

Economists, and other scientists as well, are often interested in understanding
the relationship between two or more variables. For instance, an agricultural scientist might want to know how variations in annual rainfall affect crop output, a social worker might wonder whether school dropout rates have anything to do with crime rates in cities, and an economist might want to test the hunch that higher income levels, or perhaps a diminished population of storks, tend to make for smaller-sized families. An important statistical technique that allows the exploration of possible interrelationships between variables is called regression analysis. This book contains several instances of such analysis.

Let us suppose that we want to investigate the relationship between two variables x and y. For instance, x might be annual rainfall measured in inches and y annual crop output, say, metric tons of wheat. Our first task is to
collect the data: we will need a number of joint observations of (x, y) values; usually, the more the merrier. Observations may be collected at various levels of detail: countries, regions, groups, individuals, and so on. In the rainfall example we might have observations from several regions or states of one or more countries, and several observations (at different points of time) for each region.

Observations collected at the same point in time but across different units (regions, countries, individuals) form a cross-sectional data set. Observations collected for the same unit but over different points in time form a time series. Mixed observations (both across units and across time) form a panel.

The general rule, of course, is that more data are preferred to less, but
the problem is that detailed and appropriate data are often unavailable. Understandably enough, this problem is more acute for developing countries. For instance, we would love to test the Kuznets inverted-U hypothesis (see Chapter 7) with a long time series for a given country, but this kind of detail is available only for a few countries. Hence, we need to be aware of the pitfalls of limited data, and must attempt to correct for these limitations in the best way possible. In a sense, this is what statistical analysis is all about.

For example, in trying to estimate the effect of rainfall on crop output,
cross-section data on rainfall and output alone probably will be inappropriate because there can be important (unobserved) differences that might obscure the "pure" effect of rainfall on agricultural productivity, or, worse still, the measured effect might be systematically biased because we have neglected to include some other variable that may be systematically correlated with rainfall and have its own effect on crop productivity. The following exercise provides an example.

(1) Regions with low rainfall may have invested in irrigation. If the irrigation data are not included in our analysis, explain why the measured effect of rainfall will be systematically biased downward.

In other situations, a pure time series may be problematic. Suppose that
we are interested in knowing how household income affects family size. Again, if all the data we have pertain to household income and family size, our estimates might be confounded by changes in other variables over time: the spread of education, the availability of better birth-control methods, and so on. Some of these variables may be completely uncorrelated with income changes, but others may be correlated and might bias our estimates.

(2) Suppose that income per se has no effect on fertility choices, but education does. If we lack data on education, show that observations on income and fertility may suggest a positive relationship between the two, when in fact there is none (ceteris paribus).

Thus much of regression analysis is concerned with the careful estimation of bilateral relationships, while making all attempts to control for other variables that may also affect that relationship.

A2.2. Summary statistics

Before embarking on a detailed discussion of the relationship between variables x and y, let's identify some summary features of these variables. Suppose we have n pairs of observations: represent them as (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).

The mean

The average of these observations is often important, and this is typically measured by the arithmetic mean. It is the sum of all observations of the relevant
variable divided by the total number of observations (we have n observations of each variable in the foregoing general description). The arithmetic means x̄ and ȳ of x and y, respectively, are mathematically represented as

(A2.1)   \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}

(A2.2)   \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}

The summation symbol (Σ) is a shorthand description of the summation, or adding-up, operation. The notation x_i denotes the ith observation of variable x.

The variance

The mean is not the only relevant summary of the observations of a variable.
We would also like to know whether the different observations lie more or less close to the mean (i.e., whether they are bunched closely to one another) or far from it (i.e., whether they are widely dispersed). One way to do this is to somehow add up all the differences of the observations from the mean. Note that all differences count, whether positive or negative. There is a commonly accepted measure of dispersion in statistics, which is the variance (or its close cousins, the standard deviation and the coefficient of variation).

The variance puts positive and negative differences from the mean on the same footing by squaring these differences: thus all negative signs vanish. Squaring has another property as well: it attaches proportionately greater weight to larger deviations from the mean: a difference of 2 counts as 4 as far as the variance is concerned, whereas a difference of 5 counts as 25. Mathematically, the variance is given by the formula

(A2.3)   V = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,

which is interpreted as the average value of all (squared) deviations from the mean.¹ The variance is often presented in the following equivalent form of a standard deviation, which makes the units comparable to those in which the variable originally was measured:

(A2.4)   \sigma = \sqrt{V}.

¹ There is a slight distinction between the variance and the sample variance that we ignore here, but see subsequent text.

Notice that it is important to take the average of the squared differences from
the mean, and not just their sum. This is because even if individual differences are small (so that there is little "dispersion" in the data actually), the aggregate of such differences can be large simply because we have a large number of observations, and we don't want that. This kind of reasoning also suggests that the variance (or standard deviation) should be expressed as a ratio of the mean: if not, an innocent change in the units of measurement can affect the measure of dispersion. This gives rise to the coefficient of variation:

(A2.5)   C = \frac{\sigma}{\bar{x}}.

Correlation

So far we have discussed summary statistics about a single variable. However, our main goal is to understand whether two (or more) variables move
together: whether they covary. To understand the notion of covariance, consider the familiar example of two farmers who produce the same crop in two different parts of a country. The output of either farmer can take on only two values: H (high) and L (low). H occurs with probability p, where p is the chance of having adequate rainfall and is the same across the two farmers.

Now carry out the thought experiment of moving the farmers closer and closer together, initially starting out at two well-separated locations in the country in which they live. Because the initial locations are far apart, the probability of good rainfall in one location is "independent" of outcomes in the other location. Put another way, the knowledge that one farmer has suffered L tells us nothing about what might have happened to the other farmer. As we move the two farmers closer and closer together, their fortunes become more closely linked: if one farmer produces H, you can guess with greater and greater degrees of certainty that the other farmer has produced H as well. At the very end of this thought experiment, when the two farmers are neighbors, their outputs will covary perfectly (if rainfall is the only source of uncertainty, which we've assumed).

Note three things about this example. First, if we just focus on any one
of the farmers, nothing changes. The probability that he produces H is p all along. The behavior of the individual random variables (output in this case) tells us nothing about how they might be correlated. In this sense, notions such as the mean and the variance do not tell us anything about joint movements of the variables x and y.

Second, the fact that two variables covary (as they do in the example here when the farmers live close to each other) tells us nothing about the direction of causation from one of the variables to the other, or indeed if there is a causal link between the two at all. In our example, an H for one farmer does not in any way cause an H for the other, even if the two outputs are perfectly correlated. It's just that there is some third variable (in this case, the state of the monsoon) that is a common driver for the two outputs. Therefore our notions of causality must, in some sense, be formed by common-sense observations of which variable is likely to be exogenous and which is likely to be endogenous. For instance, if we took as our two variables (i) the state of the monsoon and (ii) the crop output of a single farmer, then a positive correlation between these two variables is more likely to be indicative of causality: it is highly unlikely that the output of a single farmer will influence the state of the weather.

Third, two variables may covary negatively as well as positively.

(3) Consider the chances of a student scoring the highest grade in mathematics in her class. Suppose the chances are given by the probability p. If two equally able students are drawn at random and their chances are examined, then show that their chances are independent if the two are in different classes, whereas they covary negatively if they are in the same class. Note that the negative covariance is perfect if each class has only two students.

A measure of observed correlation between two variables x and y is
given by the covariance. If we have a sample of n pairs of observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), then the covariance² is given by

(A2.6)   \mathrm{cov}_{xy} \equiv \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).

Note how this captures comovements. If, when y_i exceeds its mean, x_i exceeds its mean as well, then the covariance will be positive, but if the fact that y_i exceeds its mean has no bearing (on average) on the behavior of x_i, the covariance will be zero.³ Similarly, if x_i tends to fall short of its mean when y_i exceeds ȳ, the covariance will be negative.
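The summary statistics defined so far — formulas (A2.1) through (A2.6) — are easy to compute directly. The sketch below (in Python, on a small hypothetical rainfall-and-output sample) follows the text's 1/n "population" versions rather than the sample versions mentioned in the footnotes:

```python
# Hypothetical sample: rainfall (inches) and wheat output (metric tons)
x = [10.0, 20.0, 30.0, 40.0]
y = [5.0, 9.0, 13.0, 17.0]
n = len(x)

x_bar = sum(x) / n                                   # mean, (A2.1)
y_bar = sum(y) / n                                   # mean, (A2.2)
var_x = sum((xi - x_bar) ** 2 for xi in x) / n       # variance, (A2.3)
sd_x = var_x ** 0.5                                  # standard deviation, (A2.4)
cv_x = sd_x / x_bar                                  # coefficient of variation, (A2.5)
cov_xy = sum((xi - x_bar) * (yi - y_bar)
             for xi, yi in zip(x, y)) / n            # covariance, (A2.6)

print(x_bar, var_x, sd_x, cv_x, cov_xy)
```

Note that the covariance here is positive, as it should be: in this made-up sample, higher rainfall goes with higher output.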
The covariance has the same problem as the variance, in that the number obtained is not free of units of measurement. To remedy this we express the covariance as a fraction of the product of standard deviations of the two variables. This yields the coefficient of correlation, which we denote by

(A2.7)   R = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y}.

² Just as in the case of the sample variance, there is a slight distinction between the sample covariance and the covariance that we ignore here.

³ To see this a bit more formally, note that if y_i has no bearing on the behavior of x_i in the sample, this just means that the distribution of x values around the mean x̄ will look the same whether we look at the whole sample or whether we look at the subsample restricted to one particular observation of y. Restricting ourselves to the latter, we see that \sum_i (x_i - \bar{x}) \simeq 0 in each subsample of pairs (x_i, y_i) such that y_i equals some fixed value ŷ. Adding over all such subsamples (by changing the value of ŷ), we see that the covariance must be zero.

Clearly R is also positive (or negative) when the two variables comove positively (or negatively). Sometimes R² is reported instead of R when we do not wish to focus on the direction of the association, but only its strength (squaring removes the negative sign).

The reason for dividing by the product of the standard deviations is only
in part to obtain a units-free number (dividing, for instance, by the product of the means would have achieved this goal as well). The particular normalization we choose has the virtue of placing R between −1 and +1. These extremes signify maximal correlation, whereas 0 signals lack of correlation between the two random variables. Although we omit the proof of this assertion, it can be found in any standard statistics textbook (see, for instance, Hoel [1984, pp. 385–386]).

It is important, however, to note that R or R² is not just a measure of
association between two random variables, but a measure of a very special sort of association: one that is linear. Indeed, R² takes on its maximum value of 1 when the relationship between x and y can be expressed in the form y_i = A + bx_i for all i, for some constants A and b. However, the true relationship between x and y may not be linear (though it may be very strong). For instance, a greater consumption of calories leads to an increase in work capacity over some range (see Chapter 8), but after a point the relationship between calorie consumption and work capacity turns negative (as obesity sets in). Thus the true relationship involves a zone of positive association and another zone of negative association. At the same time, if you have a large range of calorie-capacity observations and mindlessly calculate the correlation coefficient between these two variables, it may not be very high, simply because the correlation coefficient cancels out these two conflicting zones of association. The lack of a high correlation does not mean that there is no relationship at all: it just means that we are applying our concepts in the wrong way. The choice of a specification of the true underlying relationship is part of the economist's art, and although statistical methods can be indicative of the direction in which to go, an underlying theory is essential. We will consider more of this in the next section.

A2.3. Regression

Introduction

Suppose that we are interested in the form of the relationship between variables x and y, and not just in the existence of a correlation. Suppose we have …
I I pposrte prediction. A regressron usin a ll bl
data might throw light on the (relative) validity of these theories“g V8 a e The second way in which a regression can be helpful is in forecasting If with a fair degree of certainty, Variable x can also be a government policy
parameter (like taxes), so that if a change in its value is seen to be forth— (4) Generate an imaginary set of calorie—capacity observations using the
relationship y m A + bx — er2 (here it stands for calories consumed and 1/
stands for work capacity) What signs would you use for the constants A, b,
and c? For given constants, where do the zones of positive and negative asso
ciations lie? Now use this relationship to generate a set of observations and
calculate the correlation coefficient between x and y What would happen
if you restricted your observations to only those from the zone of positive association? Often, a careful look at the data with the naked eye will tell us more than all
kinds of statistical measures. A scatter plot allows us to do just this. First we decide (on the basis of our experience and/or theory) which variable is to be "causal" and which is to be the variable that's affected by the movements of the causal variable. Following convention, we let x stand for the causal or independent variable and y stand for the dependent variable (the nomenclature itself tells us that we, as econometricians, suspect a particular direction of causation and have used this already to classify the variables).

Next, we construct a diagram in which the independent variable is put on the horizontal axis and the dependent variable occupies the vertical axis. On this diagram we record our sample observations. The resulting plot of observations is called a scatter diagram or scatter plot. Our first (and critically important) statistical technique is: stare hard at the scatter plot.

As an illustration, Figure A2.1 reproduces Figure 2.7 from Chapter 2. The independent variable is per capita income and the dependent variable is life expectancy. The observation pairs are from different countries: the data form a cross section.
To facilitate our visual examination, the figure draws in the means of each of the two variables in the form of a pair of cross lines. Note how most of the data lie in the first and third quadrants created by the cross lines. This suggests that when per capita income exceeds its mean value, life expectancy tends to exceed its mean value as well, which is just a way of noting that the coefficient of correlation is likely to be positive.

Figure A2.1 Scatter diagram of observations of per capita income and life expectancy in different countries. Source: World Development Report (World Bank [1995]) and Human Development Report (United Nations Development Programme [1995]).
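The cross-lines reasoning is easy to verify numerically. A small sketch with made-up (income, life expectancy) pairs counts how many observations fall in the first and third quadrants created by the two means; points there contribute positively to the covariance, and hence to R:

```python
# Hypothetical (per capita income, life expectancy) pairs, one per country
data = [(500, 48), (1200, 55), (2500, 62), (4000, 68), (6500, 73), (8000, 76)]
n = len(data)

inc_mean = sum(i for i, _ in data) / n
life_mean = sum(l for _, l in data) / n

# Quadrants I and III of the "cross lines": both deviations share a sign,
# so the product (i - inc_mean) * (l - life_mean) is positive
same_side = sum(1 for i, l in data
                if (i - inc_mean) * (l - life_mean) > 0)
print(same_side, "of", n, "points lie in quadrants I and III")
```

With these invented numbers every point lies in quadrants I or III, so the correlation coefficient would be strongly positive; real cross-country data are, of course, noisier.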
relationship. Is a straight line the best ﬁt? In the example studied here, this
is unlikely to be the case. The reason is that life expectancy is difficult to
push beyond 80 or so (for medical reasons), whereas the jumps from 50 to
60 and from 60 to 70 can be made more rapidly. Ihis suggests that the true
relationship is more a curve than a straight line, with the curve ﬂattening out
as we move into higher ranges of per capita income This sort of relationship
seems to be broadly supported by the scatter diagram. Ihe mathematical
form of the regression should be constructed with this in mind Finally come two conceptually important issues. Remember that our goal
is to understand if x has a strong impact on 1/ However, what does the word “strong” mean exactly? Figure A2 2 illustrates the problem In both panels
of this diagram we have relationships that are most likely linear. In the first
panel, we have a scatter plot between x' and y where the fit is remarkably
good: the plots closely hug some straight line, but at the same time the slope
of the line is flat In the second panel the scatter is much more pronounced,
but the slepe of the “best ﬁtted line” seems to be high. (You will appreciate
the difference better if you look at the scale in which the vertical axes are
drawn in the two panels: the scale is much more compressed in the second
panel ) Thus “strong” has two meanings in this context A relationship may be
estimable in a precise fashion: in the first panel, even though the effect of x on y is not large, the data tell us that this statement can be made quite precisely. The second meaning is that the effect of x on y is large. As Figure A2.2 shows, this statement is quite compatible with the observation that a precise estimate of the relationship itself is not to be had.

Note that the correlation coefficient captures the notion of "strength" in its first sense (at least when the underlying relationship is linear). It does not matter what the slope is: if the data fit perfectly on a straight line, the correlation coefficient will always equal unity.

(5) Suppose that observations on (x, y) pairs are generated directly from the equation y = A + bx, where A and b are constants. Assuming that b ≠ 0 and that there are at least two observations, show that R² = 1 irrespective of the value of b, the slope.

The basics of regression

Suppose we feel that the relationship between x and y can be well described
by a straight line. Thus we suggest a (linear) equation of the form

(A2.8)   y = A + bx,

where A and b are (as yet unspecified) constants. This equation describes a possible relationship: it says that y assumes a value of A when x = 0, and its value increases (or decreases) by an amount b for each additional unit increase (or decrease) in the value of x.

Figure A2.2 Notions of a "strong" relationship.

In graphical terms, (A2.8) describes a straight line in the (x, y) plane. Of course, if we vary the values of A and b, we will alter both the position and the slope of this straight line. Look at and compare the two panels in Figure A2.3. In both panels, the numerous dotted points represent the same scatter diagram: a plot of several pairs of joint observations of x and y.

Figure A2.3 Fitting a line to a scatter.

On the other hand, the two straight lines in the two panels represent two different attempts to give a stylized picture of the relationship between x
and y. In the left-hand panel, the actual data points are all more or less close to the line drawn. In the right-hand panel, however, many of the data points are quite far off. Obviously, the straight line in the left-hand panel is a better "representation" of the data than its counterpart in the right-hand panel. In other words, it "fits" the data better. Given a set of observations, therefore, our task is to find the straight line that is the "best fit" to the data; it amounts to the same thing as finding the "best" values of A and b in equation (A2.8). However, an infinite number of straight lines can be drawn on a plane, and it is impossible to judge the relative merits of all of them merely through visual inspection (as we did for the relatively easy task of selecting among the lines in Figure A2.3!). What precise criterion is to be used to find the proper numerical values of A and b?

Notice that for a line to be a "good fit," we require that the actual data points be not very far away from the line. For every observation x_i of the variable x, the corresponding value of y as obtained from the stylized relationship summarized in the given line is (A + bx_i). However, the actual, or observed, value of y when x = x_i is y_i. Hence, if we use the line y = A + bx as a description (or a forecasting device), then for the ith observation, we have an "error" equal to (y_i − A − bx_i). We would like to choose a line so as to keep such errors as low as possible, on the average. Because large errors of opposite signs may cancel each other out, it is appropriate to look at the squares of the various error terms. It is therefore standard statistical practice to choose the values of A and b in such a way as to make the sum of the squared errors as low as possible. This method of "fitting" a straight line to given data is known as the ordinary least squares (OLS) method. The (linear) equation thus obtained is called the (linear) regression equation.

An outline of the OLS procedure follows. We have collected n pairs of observations on x and y. A typical pair is denoted by (x_i, y_i). If we fit the line y = A + bx to the observations, the value y_i will differ from the predicted
value A + bx_i by a margin that is the error in the fit at that observation pair. Under this interpretation, we may think of

(A2.9)   y_i = A + bx_i + \varepsilon_i,

where \varepsilon_i represents all kinds of random disturbances that influence y_i other than x_i. The coefficients A and b are the unknown parameters that we would like to estimate. Given any estimates \hat{A} and \hat{b} of A and b, we see that the predicted value of y_i is

\hat{y}_i = \hat{A} + \hat{b} x_i,

whereas the prediction error of y_i is just

e_i = y_i - \hat{y}_i.

The sum of squared errors (SSE, sometimes also called residual sum of squares) is therefore given by

SSE = \sum_{i=1}^{n} e_i^2.

The OLS estimates of A and b are defined as the particular values \hat{A} and \hat{b} of A and b that minimize the SSE for the sample data. We omit the details of the derivation, but note that the OLS estimates are given by the formulas

(A2.10)   \hat{b} = \frac{(1/n)\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(1/n)\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\mathrm{cov}_{xy}}{\sigma_x^2}

and

(A2.11)   \hat{A} = \bar{y} - \hat{b}\bar{x}.

(6) If you know a bit of calculus, it is very easy to derive \hat{A} and \hat{b} on your own. Set things up as follows: using equation (A2.9), note that the minimization of SSE is equivalent to the problem

minimize over (A, b):   \sum_{i=1}^{n} (A + bx_i - y_i)^2.

Now take derivatives of this expression with respect to A and b and set the derivatives equal to zero (these are the first-order conditions). Solve the linear equations in the first-order conditions to get (A2.10) and (A2.11).
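Formulas (A2.10) and (A2.11) can be applied mechanically once the summary statistics are in hand. A minimal sketch on a hypothetical five-observation sample:

```python
# Hypothetical sample of five (x, y) observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n

b_hat = cov_xy / var_x          # regression coefficient, (A2.10)
a_hat = y_bar - b_hat * x_bar   # intercept, (A2.11)

# Sum of squared errors at the OLS estimates
sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
print(round(a_hat, 3), round(b_hat, 3), round(sse, 4))
```

Perturbing either estimate away from these values can only raise the SSE, which is exactly what the first-order conditions in exercise (6) guarantee.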
about the strength of the influence of x on y; a high value of l3 implies that a small change in x can bring about a large change in y, a low value of l; implies just the opposite. It isn’t hard to interpret the particular formula that describes 5 The nu
merator is just the covariance between x and y. The denominator is the vari—
ance of r. The regression coefﬁcient is the fraction of “covariation” relative
to the extent to which i itself varies If there is a lot of covariance even as r
vanes very little, we could say that x has a large inﬂuence on y 4 All said and done, however, even the best fitted straight line may not be
a good fit. For one thing, the OLS procedure always gives you some answer even when there is no relationship to write home about: you could regress Coca-Cola consumption in China on the number of red shirts sold in Denmark, and you'd still get an answer from the OLS estimates. More seriously, a systematic relationship may indeed exist but it may not be a linear one (as in our examples on nutrition and work capacity or per capita income and life expectancy). The estimated values of \hat{A} and \hat{b} tell us nothing about whether the overall fit is a good one. In this context, our previous discussion regarding the two notions of "strength" in a relationship is very much relevant. Finally, it is possible (and is almost always the case) that there are other explanatory variables that have been left out of the regression. The more such variables we find, the more likely it is that the fit of the regression will be improved (more on this later). This last case is the most benign and can still be informative.

The overall explanatory power of the linear regression is best summarized by our familiar friend, the correlation coefficient. As we already noted, it is quite possible for the correlation coefficient to be low even if the estimated value of \hat{b} is large (and vice versa). Here is another way to see the same thing: use equations (A2.7) and (A2.10) to see that

R = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y} = \frac{\mathrm{cov}_{xy}}{\sigma_x^2} \cdot \frac{\sigma_x}{\sigma_y} = \hat{b}\,\frac{\sigma_x}{\sigma_y},

so that even if \hat{b} is large, R (or R²) might be low if the variance of y is very
large relative to that of x. What does this mean? It means that there is a large proportion of variation in y itself that cannot be properly explained by looking only at variations in x. As we noted previously, this could happen for one (or a combination) of several reasons. At the same time, it is also possible that \hat{b} is low, but the correlation coefficient of the regression is high: this happens if the overall variation in y is very low relative to the variation of x. The preceding equation brings out these possibilities clearly.

When we look at a fitted regression equation, therefore, the first thing we should ask is the value of R (or R²). A low value may lead us to have little faith in the regression in the first place, although it is possible that regressions with low R² can still provide us with useful information, provided that the parameter b is estimated "precisely enough" (see subsequent text). This is especially the case in situations where the correlation coefficient is low not because there has been some fundamental error of specification but because there are simply too many explanatory variables for any one of them to carry a lot of power.

Multivariate regression

The last remark in the previous paragraph, as well as exercises (1) and (2), motivate multivariate regression. It is most often the case that movements of some dependent variable can never be adequately explained by one independent variable alone. Several variables need to be included on the right-hand side of the regression equation simply to cut down the quantity of unexplained randomness on the part of the dependent variable. However, there is another, more immediate need for including multiple independent variables: some of these variables may be systematically correlated with one or more of the independent variables that we've already used and with the dependent variable as well, so that the exclusion of such variables attributes a compound effect to the independent variable already included.
To use the example of convergence from the theory of economic growth, note that the Solow model predicts that countries with lower levels of per capita income will grow faster. However, this statement is only true ceteris paribus, and in most cases the ceteris will not be paribus. Thus higher per capita incomes may permit the accumulation of larger stocks of human capital, which by themselves generate faster rates of growth. A regression that includes only per capita income as an independent variable will generate a coefficient on per capita income that includes both the "direct" effect of per capita income (the Solow prediction) and the "indirect" effect via human capital. In this example, the two effects work in opposite directions, so the net effect may (apparently) show that per capita income has no effect on growth. Putting human capital as an additional variable into the regression equation helps to separate the two effects (see Chapters 3 and 4 for more detailed discussion).

The general scenario is easily stated. Let y be the dependent variable and let x_1, x_2, ..., x_k denote k independent variables. Then our task is to estimate a linear equation of the form

(A2.12)   y_i = A + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki},

where the constants (A, b_1, ..., b_k) are to be determined.
The rest of the story is a natural extension of the OLS method used in the case of a single independent variable. It is harder now to form an intuitive picture of what is going on (because scatter plots work only with some difficulty for two independent variables and cannot be drawn for three or more independent variables), but the main idea is the same: we look for the "best"
hyperplane that fits the multidimensional scatter of observations. Just as before, we may define the predicted value ŷ_i, for any collection (A, b_1, ..., b_k) and any observation i, as

(A2.13)    ŷ_i ≡ A + b_1 x_{1i} + b_2 x_{2i} + ... + b_k x_{ki},

and the prediction error e_i as

(A2.14)    e_i ≡ y_i − ŷ_i.
Now we carry out exactly the same exercise as before: choose (A, b_1, ..., b_k) to minimize the sum of the squares of the prediction errors, Σ_i e_i². These minimizations yield natural extensions of the formulas (A2.10) and (A2.11).
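The minimization reduces to solving a small system of linear equations (the "normal equations"). A minimal sketch, using made-up noise-free data so the true coefficients A = 2, b_1 = 3, b_2 = −1 are recovered exactly:

```python
# Sketch: multiple-regression OLS by solving the normal equations
# (X'X) beta = X'y with Gaussian elimination. Hypothetical exact data:
# y = 2 + 3*x1 - 1*x2, so OLS should recover A = 2, b1 = 3, b2 = -1.
rows = [  # (x1, x2) pairs
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
]
X = [[1.0, x1, x2] for x1, x2 in rows]       # leading 1 -> intercept A
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in rows]

k = len(X[0])
# Build X'X and X'y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]

# Gaussian elimination with partial pivoting, then back-substitution.
for col in range(k):
    piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
    xtx[col], xtx[piv] = xtx[piv], xtx[col]
    xty[col], xty[piv] = xty[piv], xty[col]
    for r in range(col + 1, k):
        f = xtx[r][col] / xtx[col][col]
        for c in range(col, k):
            xtx[r][c] -= f * xtx[col][c]
        xty[r] -= f * xty[col]
beta = [0.0] * k
for r in range(k - 1, -1, -1):
    beta[r] = (xty[r] - sum(xtx[r][c] * beta[c] for c in range(r + 1, k))) / xtx[r][r]

print(beta)  # approximately [2.0, 3.0, -1.0]
```

With real (noisy) data the same code applies unchanged; the recovered coefficients are then the least-squares estimates rather than the exact truth.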
Make sure that you understand just what these estimated coefficients mean. For instance, the coefficient b_1 tells us the effect on y of a change in x_1 when all other values of (x_2, ..., x_k) are held constant. This does not mean that the change in x_1 will have no effect on the other values of x_2, ..., x_k, but the fact remains that b_1 is a measure of the "pure" direct effect of x_1, freed of the "contaminating" influences of the other independent variables. The "correct" regression equation, therefore, should tell us the nature of the influence of x_1 on y, when the influence of "other factors" has been accounted for.
It remains to specify an analog of the correlation coefficient in a multiple regression exercise. This is some measure of how the dependent variable is correlated with the entire set of independent variables. There is an easy way to do this which nicely generalizes the single-variable case: simply take as a measure the correlation coefficient between y and the predicted values ŷ that arise from the regression. After all, the predicted values are a measure of the joint explanatory power of all the independent variables, taken together.⁴

⁴ Actually, a slight variant, called the adjusted correlation coefficient, is used. The adjustment is employed to allow for the fact that the inclusion of any additional independent variable can never lower the correlation coefficient and sometimes may increase it without really contributing to explanatory power. Thus a correction is applied to the correlation coefficient, the size of the correction depending (among other things) on the number of independent or explanatory variables included in the regression. It is possible for the adjusted R² to decline when more independent variables are added.

(7) Verify that the correlation coefficient proposed for the multiple regression indeed generalizes the case of a single independent variable, by proving that the correlation coefficient between two variables x and y is the same as the correlation coefficient between A + bx and y for any constants (A, b) with b > 0.
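A quick numerical sketch of exercise (7), with made-up data: for a single regressor with a positive slope, the predicted values ŷ are a positive linear transformation of x, so corr(y, ŷ) (the "multiple R") coincides with the familiar corr(x, y).

```python
# Sketch: the multiple correlation coefficient as corr(y, y_hat).
# With one regressor and a positive slope, corr(y, y_hat) equals
# corr(x, y). Data are made up for illustration.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    # Sample correlation coefficient.
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# OLS fit y = A + b*x, then the predicted values.
b = sum((a - mean(x)) * (c - mean(y)) for a, c in zip(x, y)) / \
    sum((a - mean(x)) ** 2 for a in x)
A = mean(y) - b * mean(x)
y_hat = [A + b * xi for xi in x]

print(corr(x, y))      # correlation between x and y
print(corr(y, y_hat))  # the same number: the "multiple R" of this regression
```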
There are two special cases of a multivariate regression that deserve some attention.

Nonlinear regressions. A multivariate regression can be used to handle situations in which the true underlying relationship is perceived to be nonlinear, either because of commonsense considerations or more sophisticated theoretical reasoning. Examples include the relationship between per capita income and life expectancy, discussed earlier in this appendix and in Chapter 2, and the Kuznets inverted-U hypothesis studied in Chapter 7.
A first step to deal with this situation is to include both x and x² as independent variables on the right-hand side of the equation. Thus, even if there really is a single independent variable, the model behaves as if there were two: the variable and its square.
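Concretely, regressing y on x and x² uses the same least-squares machinery as any two-regressor problem. The sketch below (made-up inverted-U data, y = 4x − x²) recovers a negative coefficient on the squared term, which is what such a hump shape looks like in a quadratic specification.

```python
# Sketch: capturing an inverted-U shape by regressing y on x and x**2.
# Made-up exact data: y = 4x - x**2 (rises, then falls).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [v * v for v in x]
y = [4.0 * v - v * v for v in x]

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    # Sum of products of deviations from the means.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Closed-form OLS for two regressors (x and x**2) plus an intercept.
s11, s22, s12 = cross(x, x), cross(x2, x2), cross(x, x2)
s1y, s2y = cross(x, y), cross(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det  # coefficient on x
b2 = (s2y * s11 - s1y * s12) / det  # coefficient on x**2
A = mean(y) - b1 * mean(x) - b2 * mean(x2)

print(A, b1, b2)  # approximately 0.0, 4.0, -1.0: negative b2 = inverted U
```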
What is the advantage of including the squared term? It allows different zones of positive and negative association between x and y. However, because the squared term only permits a quadratic equation to be estimated, this method cannot handle more than one switch in the direction of association. But the general method easily suggests itself: include further higher powers of x if you wish to handle more complicated switches in behavior. Nonetheless, few theoretical models in economics generate such complicated behavior, unless they also happen to generate negative results of the sort "anything can happen."

(8) What kind of specification would you use to generate a regression equation for the scatter plot in Figure A2.1? Eyeball the plot and describe what values you would expect the different coefficients to have.

Some forms of nonlinearity can be converted into a linear estimation equation with very simple mathematical manipulation. For instance, suppose
that we are interested in estimating the coefficients of the Cobb–Douglas production function

(A2.15)    Y = A K^α L^β

by using data on output (Y), capital (K), and labor (L). Clearly, running a linear regression on these variables will get us nowhere, because the functional form that we are trying to estimate is inherently nonlinear. However, taking logarithms on both sides of (A2.15) will help here:

(A2.16)    ln Y = ln A + α ln K + β ln L.

Equation (A2.16) is a linear form that can be estimated using OLS. Convert the given data to logarithmic form to estimate the coefficients ln A, α, and β. For an application of this method, see Chapter 3.

Dummy variables. Often, an additional variable takes the form of a dummy, that is, a variable that takes on only binary values, typically represented by the numbers 0 and 1. For instance, we might wish to test the hypothesis that females earn less than males in similar jobs. We would then need to estimate a wage equation that includes several independent variables, such as age and education. Of special importance would be the dummy variable that takes the value 0 if the worker is female and the value 1 if the worker is male. The discrimination hypothesis then states that, controlling for other variables such as education and age, the coefficient on this dummy variable should be positive. The coefficient on the dummy can be interpreted as the additional income that a worker receives simply by virtue of being male. There may be other
effects as well; for instance, it is possible that a male may also receive more education, and that education has its own independent effect on wage earnings. This, however, will not be picked up by the dummy variable provided that education is also included in the regression equation, and indeed it should not be.
Whether the benefits of being male are chiefly manifested through factors such as better access to education, and not directly through the labor market, is something that we would like to analyze explicitly, and we don't want to lump all these effects under one general heading.

A standard way to include a dummy is through the additive form

(A2.17)    y = A + bx + cD + error term,

where x is a vector of independent variables, (A, b, c) are constants to be determined (say, by OLS), and D is a dummy variable that takes the value 1 in the case of a certain occurrence and 0 otherwise. For instance, D might be a country dummy that takes the value 1 if the country is Latin American and 0 otherwise (see the study of the inverted-U hypothesis in Chapter 7).
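A numerical sketch of the additive form (A2.17), using made-up noise-free wage data in which being male adds a constant premium of 3 at every education level; the two-regressor least-squares formulas recover the premium as the coefficient on the dummy.

```python
# Sketch of equation (A2.17): wage = A + b*educ + c*D + error,
# with made-up, noise-free data (A = 10, b = 2, male premium c = 3).
educ = [10.0, 12.0, 14.0, 16.0, 10.0, 12.0, 14.0, 16.0]
D = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]   # 1 = male, 0 = female
wage = [10.0 + 2.0 * e + 3.0 * d for e, d in zip(educ, D)]

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    # Sum of products of deviations from the means.
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Closed-form OLS with two regressors: educ and the dummy D.
s11, s22, s12 = cross(educ, educ), cross(D, D), cross(educ, D)
s1y, s2y = cross(educ, wage), cross(D, wage)
det = s11 * s22 - s12 ** 2
b = (s1y * s22 - s2y * s12) / det  # effect of education
c = (s2y * s11 - s1y * s12) / det  # male premium, holding educ fixed
A = mean(wage) - b * mean(educ) - c * mean(D)

print(A, b, c)  # approximately 10.0, 2.0, 3.0
```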
You might ask, what is the advantage of including a dummy variable when we can simply take the data apart for each of the classifications that the dummy is supposed to represent? For instance, if the data are in the form of a panel containing inequality data for some countries that are Latin American and others that are not, then why not simply create two subsets of data from this panel and run separate regressions? Indeed, we could do this, but the point is that the dummy variable approach imposes much more structure on the problem, structure often driven by theoretical considerations. Return to the wage discrimination example. We might have theoretical reasons to suppose that changes in education or age have the same effects on male and female wages at the margin, whereas the gender effect simply raises the wages for men (relative to women) by the same amount (or by the same percentage)
at every level of education or age. This is tantamount to the assertion that the gender effect resides only in the intercept term A and not in the regression coefficient (or the vector of regression coefficients) represented by b.⁵ Take another look at equation (A2.17): it effectively specifies the intercept term as A + cD and retains the same value of b whether the dummy takes the value 0 or 1.

Thus the advantage of the dummy variable approach is that it allows
us to tweak only those parameters that we consider theoretically affected by the dummy. This allows us to pool the data for greater statistical power. Moreover, we can (if we wish) allow the dummy to affect some of the regression coefficients by simply interacting the dummy with the relevant variables.
For instance, suppose we believe that the wage discrimination effect grows smaller with age. Then our specification might read as follows:

wage = A + b_1 educ + b_2 age + b_3 (D × age) + cD + error,

where the variables are self-explanatory (perhaps expressed in logarithmic terms) and D is the gender dummy. Note that the dummy has been entered in two places: first as the familiar additive shift, and second to explore the idea that greater age reduces the size of the shift.

(9) What sign would you expect b_3 to have? How would you explore the conjecture that gender bias is unaffected by age, but is more pronounced for higher education levels?

Additive dummies are often referred to as fixed effects because they capture some shift of the regression equation that is presumably intrinsic to the characteristic captured by the dummy. Thus regressions that incorporate country, village, or time fixed effects are simply those that include the corresponding dummy variables in their specification.

Bias and significance

Apart from questions of overall fit, there is the issue of whether the estimated
coefficient can be trusted: is it far off the mark from "the truth"?

⁵ In the wage discrimination example, this assertion pertains to the case in which the wage is supposed to be shifted upward by a constant absolute amount because of the gender term. If the shift is proportional, we need a different specification; if all effects are multiplicative, we can take the logarithmic route discussed for nonlinear estimation [see equations (A2.15) and (A2.16)].

I have put the phrase "the truth" in quotes because it needs some explanation. Think of a large (potentially infinite) set of (x, y) observations that we might have access to: what we have in our hands is really a subset, or a sample, of these observations. Now there is some "true" relationship between the variables x and y, but this does not mean that our particular sample allows us to divine what this true relationship is. Our sample allows us to construct estimates Â and b̂ of the true relationship that we believe to be "out there," but these estimates generally would be different if we used another sample from this large "mother set" of observations. To continue the rainfall example, as our mother set of observations, think of all the rainfall that has ever occurred in history (and all that will occur) and the accompanying levels of crop output. We do not have access to this entire set: just as
an example, perhaps we have data on rainfall and crop output for alternate years between 1970 and 1997. We use this information to estimate the "true" relationship that is hidden in the mother set of observations. However, another sample of observations (say, the other alternate years) may give us a somewhat different estimate.

This point of view teaches us that our estimates are themselves random
variables in some broader sense. One aim of statistics is to develop a notion of how precise or significant our estimates are; that is, how confident can we be that our estimated value b̂ is close to the true b? A somewhat different twist on this problem is obtained by rephrasing the question a bit: using the estimated value b̂ and the other data of the problem, how sure can we be that the true value of b is significantly different from 0, or from 1, or lies within or outside some prespecified range of values? With this twist the estimators act only as stepping stones to the true nature of things (which we can never be completely sure of, because we lack all the data). We can only say things about the true relationship with some probability: the value of the exercise then centers on how close that probability is to 1.

Bias. Following up on the previous discussion, we may think of an OLS estimator as a function of the sample observations {(x_1, y_1), ..., (x_n, y_n)}, where you can think of each x observation as multidimensional if you like. Let's give this list of observations a name: call it z. Thus z comes from some mother set of observations, and different draws (or, equivalently, different data collection exercises) give rise to different z's, all from the same mother set. Thus an OLS estimator (say, of the regression coefficient b) can be thought of as a function that yields, for every z, an estimate b̂(z) ≡ b̂ of the regression
coefficient. Now we can think of the average, or the expected value, of b̂(z) as z ranges over all conceivable samples. Is this average the "true" value b? If it is, we say that the estimator is unbiased. That's just a way of saying that we can expect our estimates, on average, to be clustered around the true value that we are after. An attractive feature of the OLS estimators is that they are indeed unbiased in this sense.

Here are further details to support this observation. Let us restrict ourselves (for simplicity) to the case of a single independent variable. Suppose that, in our minds, the truth is given by the model

(A2.18)    y_i = A + b x_i + ε_i,

where the x_i's may or may not be random variables (it doesn't matter for what I'm going to talk about) and the "noise terms" ε_i all come as independent draws from a single distribution that is itself independent of the x values. Because ε is pure noise, we take it that the mean of this distribution is 0. The parameters A and b are what we are after.
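Before working through the algebra, the unbiasedness claim can be checked by simulation: fix the x's, draw fresh noise terms for model (A2.18) many times, re-estimate b each time, and average. (A minimal sketch; the particular values A = 1, b = 2 and the normal noise are assumptions of the simulation, not of the text.)

```python
# Sketch: checking unbiasedness of the OLS slope by simulation.
# Model (A2.18): y_i = A + b*x_i + eps_i, with assumed A = 1, b = 2.
import random

random.seed(0)
A_true, b_true = 1.0, 2.0
x = [float(i) for i in range(10)]           # fixed x observations
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

estimates = []
for _ in range(2000):                       # 2000 independent samples z
    y = [A_true + b_true * xi + random.gauss(0.0, 1.0) for xi in x]
    ybar = sum(y) / len(y)
    b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    estimates.append(b_hat)

avg = sum(estimates) / len(estimates)
print(avg)  # close to the true b = 2, though each single estimate varies
```

Each individual b̂ misses the truth, but their average over many samples sits right on it: that is unbiasedness.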
Recall that our estimate b̂ of b is given by the formula

b̂ = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²,

where x̄ and ȳ, you will recall, are the sample means of the x and y observations, respectively. It follows that

b̂ = Σ_{i=1}^{n} (x_i − x̄) y_i / Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} λ_i y_i,

where the second equality follows from the fact that Σ_{i=1}^{n} (x_i − x̄) = 0, and the third equality comes from letting λ_i ≡ (x_i − x̄)/Σ_{j=1}^{n} (x_j − x̄)². Consequently,

(A2.19)    b̂ = Σ_{i=1}^{n} λ_i (A + b x_i + ε_i) = b + Σ_{i=1}^{n} λ_i ε_i,

where in deriving this equation, we have used the observations that Σ_{i=1}^{n} λ_i = 0 and Σ_{i=1}^{n} λ_i x_i = Σ_{i=1}^{n} λ_i (x_i − x̄) = 1.

Now, for given observations of the x variables, I am going to take expectations over the noise terms ε_i. From equation (A2.19), this tells us that

E(b̂) = b + Σ_{i=1}^{n} λ_i E(ε_i) = b,

because the noise terms all have mean 0. This proves that the OLS estimate
of b is unbiased. A similar argument applies to the estimate of the intercept term A.

Significance. The lack of bias in the OLS estimates of A and b tells us that, on average, we are not making a systematic error in our estimation of the true values of A and b. However, this does not mean that in any particular exercise we are at the true value of these coefficients, or even close. Figure A2.4 illustrates this concept by reiterating that any estimate is a random variable and, in general, has a distribution of possible values around its mean. What we showed in the previous section is that the mean is indeed the true value that we are seeking, but as the figure shows, there will be some dispersion around the mean. All we see is the estimated value b̂, but because we do not know where the distribution in Figure A2.4 is centered, we do not know whether this estimated value is bang on the truth or is far away from (or greater or less than) it. We need some probabilistic assessment of this.

What follows is a little more technical, so I will start by giving you a
simple, intuitive idea of how we go about the process. Suppose that we are interested in knowing whether the true value of b is positive. (For instance, we may want to know whether rainfall truly influences crop output, whether education influences wages, or whether the stork population in a country influences the number of babies born there.) Thus we regress y on x and form an OLS estimate b̂ of b.

Now suppose, for the sake of argument, that the true b is really 0. Even
so, it rarely is the case that the estimated value b̂ comes out to be exactly 0, because relationships that need to be estimated by statistical analysis of this sort are rarely exact: there are always tiny unknown outside influences, ever so minute and intractable disturbances, or simply measurement errors in variables, that tend to make the relationship between two variables somewhat fuzzy and blurred.

[Figure A2.4. Dispersion of the OLS estimator around the true value of b: the density function of the estimator b̂, plotted over its possible values.]

Thus actual farm output, although strongly dependent on rainfall, also may vary slightly in response to other influences that in
truth do not have any serious effect. For these reasons, we cannot be sure that a positive estimate guarantees that the true coefficient is really positive.

The art and science of drawing statistical inferences lies in discovering true and strong "structural relationships" in data that have been contaminated by the influences of these minor factors, which statisticians (partly to vent their frustration, perhaps!) refer to as "noise."

To summarize, then, the estimated coefficient of a particular "explanatory variable" may be nonzero even if there is no real influence of that variable on y. Alternatively, it may be that in truth the relationship is a positive one, but does a positive estimate really clinch the issue?

Statisticians go about this task by specifying a provisional or null hypothesis. For instance, they may start by hypothesizing that the "true" value of
the regression coefficient in question is nonpositive, and then try to calculate how likely it is that the estimated value from any sample can turn out to be what it does turn out to be merely due to the effect of noise. If this probability is seen to be very low, then there exists strong reason to "reject" the null hypothesis that was assumed to start with; that is, the hypothesis that the "true" coefficient is not positive. In the case of such rejection, the opposite conclusion has to be embraced: the independent variable in question indeed has a nonnegligible positive influence on the variable y. In this latter case, the estimated coefficient is said to be significantly different from zero, or simply "statistically significant."
Note that there are two tasks to be carried out here. First, our calculation of the aforementioned probability surely depends on our belief about the strength and nature of the noise in the data at hand. We need to estimate this: we do so by looking at the size of the "residuals," that is, the deviations of the predicted values of y (predicted from the OLS regression equation) from the actual values. This gives us some idea of the dispersion, or variance, in the distribution of b̂. If this distribution is very closely clustered around the truth, and if our estimate is also positive, it is very likely that the truth is positive as well. On the other hand, if this dispersion is very high, then we may not be very sure (unless the estimate itself is very large and positive).
Thus it is important to combine both the estimated value of the coefficient and the estimated strength of the noise to form a test of the null hypothesis. Under some statistical assumptions, this combination leads to what is called a t statistic, and we decide whether the coefficient is "significantly positive," "significantly negative," or "significantly different from zero" by embracing the opposite postulate as the null hypothesis and examining whether the value of the t statistic provides us with sufficient grounds to reject this hypothesis.

For instance, under the null hypothesis of a 0 "true" coefficient, higher
values of the t statistic are more unlikely. Hence, the higher the computed value of t, the less plausible the null hypothesis will be. Usually, a cutoff point is decided beforehand, and if the value of t turns out to be any higher than that, the null hypothesis is rejected and the coefficient in question is pronounced to be statistically significant. Hence, in any technical report on a regression run by a researcher, it is standard practice to report the respective t values in brackets, right below the estimated coefficients, so that readers may judge the statistical significance of those coefficients.⁶

The second element of subjectivity lies in deciding beforehand how unlikely is "too unlikely" for a null hypothesis to be rejected. It is common practice to work with a 5% probability; that is, the null hypothesis is rejected if, under its assumption, there is less than a 5% probability that the t value takes the value it actually does in the given sample. Such a test is then called a test with a "5% level of significance." Tests with 1 and 10% levels of
significance are also not uncommon.

The remaining sections go into these matters in more detail.

Standard errors. Following the foregoing discussion, our first task is to determine how dispersed the distribution of the OLS estimator is. This isn't
difficult, at least up to a point. We can use equation (A2.19) to calculate the variance of b̂ for a fixed set of x observations. First rewrite (A2.19) as b̂ − b = Σ_{i=1}^{n} λ_i ε_i, and then note that

(A2.20)    variance of b̂ = E(b̂ − b)² = σ² Σ_{i=1}^{n} λ_i² = σ² / Σ_{i=1}^{n} (x_i − x̄)²,

where σ² is the common variance of the (independent) error terms. It is similarly easy to show that

(A2.21)    variance of Â = σ² Σ_{i=1}^{n} x_i² / [n Σ_{i=1}^{n} (x_i − x̄)²].

This is good information, but it's incomplete. In particular, we don't know
what the unknown variance σ² is, and so we have to approximate it somehow. One way to do this is to use the observed variation in the estimated error terms e_i, which you'll recall are just the differences between the observed y_i's and the predicted ŷ_i's. It turns out that an unbiased predictor of the variance σ² is given by the estimator σ̂²:

(A2.22)    σ̂² = [1/(n − 2)] Σ_{i=1}^{n} e_i².

An intuitive reason (somewhat unsatisfactory) for dividing by n − 2, rather than by n, is that two parameters must be estimated to compute the error terms: A and b. For comparison, when computing the sample variance of a variable, it is typical to divide by n − 1, because one parameter (the sample mean) must be estimated to compute the sample variance. This changes the degrees of freedom in the sums of squares that define σ̂². Mathematically, it is easy to check by taking expectations that this particular division (by n − 2) does indeed give us an unbiased estimate.

⁶ Observe that the general philosophy reflected here is one of "not guilty, unless proven otherwise." For instance, in testing whether a coefficient is significantly different from 0, the initial presumption is that the variable in question has no effect on the explained variable (y), but that presumption is later dropped if it seems too unlikely in the light of the available evidence.

We now substitute equation (A2.22) in equations (A2.20) and (A2.21) to come up with estimates of the dispersion in b̂ and Â. These are known as
the standard errors (SE) of the estimates:

(A2.23)    SE(b̂) = √[σ̂² / Σ_{i=1}^{n} (x_i − x̄)²],

(A2.24)    SE(Â) = √[σ̂² Σ_{i=1}^{n} x_i² / (n Σ_{i=1}^{n} (x_i − x̄)²)].

Sometimes, regression results report these standard errors in parentheses below each of the estimated coefficients. If these errors are small relative to the size of the coefficients, we can have more faith in the qualitative predictions yielded by our estimates. For instance, if the estimated b̂ is large and positive, while at the same time the standard error of b̂ is small, it is very likely that the true value of b is also positive. With some more assumptions we can go further, as the next section shows.

The t distribution. There is a very special distribution called the t distribution. It looks a bit like a normal distribution (though it is flatter). It has mean 0, and its variance is determined by a parameter called its degrees of freedom. The t distribution is well known in statistical software programs and in older statistics texts that actually tabulate its properties. For instance, with 20 degrees of freedom, it is well known that the probability that a random variable t will exceed the value 2.086 is precisely 0.025, that it will exceed 1.725 is 0.05, and that its absolute value will exceed 1.725 is therefore 0.10. The t distribution plays a beautiful and critical role in the testing of hypotheses.
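Those tabulated values can be reproduced without a statistical package. The t density with ν degrees of freedom is f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2); integrating its tail numerically (a simple trapezoidal sketch, with the integration bounds chosen as an assumption of convenience) recovers the quoted probabilities.

```python
# Sketch: verifying P(t > 2.086) ~ 0.025 for 20 degrees of freedom by
# numerically integrating the t density (trapezoidal rule on [c, 60];
# the tail beyond 60 is negligible at 20 degrees of freedom).
from math import gamma, sqrt, pi

def tail_prob(c, df, upper=60.0, steps=100000):
    # Normalizing constant of the t density, then trapezoidal integration.
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    f = lambda t: const * (1.0 + t * t / df) ** (-(df + 1) / 2)
    h = (upper - c) / steps
    total = 0.5 * (f(c) + f(upper))
    for i in range(1, steps):
        total += f(c + i * h)
    return total * h

print(tail_prob(2.086, 20))  # ~0.025
print(tail_prob(1.725, 20))  # ~0.05
```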
To see how this works, we need one assumption and one theorem.

Assumption. The errors ε_i of our regression model follow the well-known normal distribution.

There are theoretical arguments that justify this assumption, but it is an assumption all the same.

Theorem. If the errors are normally distributed, then the random variable

(b̂ − b) / SE(b̂)

must follow a t distribution with n − 2 degrees of freedom, where n is the sample size.
For instance, we may be investigating whether school dropout rates have anything to do with crime rates in cities Assume we already have regressed crime rates on dropout rates and have obtained an estimate of b We want to know whether b is “significantly” different from 0 Alternatively, we can it is in fact true or we might fail to reject it when it is false. As researchers, we wish to limit these possibilities to a minimum
and this will determine the “power” of the test I it is customary to limit false rejections: we want to be conservative in
that we do not want to reject a hypothesis when it is actually true. Thus
when We do reject the hypothesis, we want to be very confident that the hypothesis is indeed false Ihe flip side of this approach is that a failure to reject does not mean too
much: partrcular,1t does not mean that we have “accepted” the hypothesis.
indeed, it IS often possible that both a hypothesis and its converse may stand (statistically) unrejected. The proportion of samples in which false rejection occurs is called the level of significance of the te st, usually denoted a. It is common practice to work with a 5% probability, that is, the null hypothesis is rejected if, under its assumption, there is less than a 5% probability that we have rejected the hypothesis when it was indeed true. Such a test is then called a test with “5% level of significance.”
also not uncommon. Tests with '1 and 10% levels of significance are 802 Appendix 2. Elementary Statistical Methods Now we will see how the t distribution plays a role in all this. For ex—
ample, you are investigating whether school dropout rates have anything to
do with crime rates in cities Say you have thirty pairs of observations. You
have used OLS to estimate b If the errors are normal, you know from the foregoing theorem that the variable i— 1)
sad) has a t distribution with n — 2 degrees of freedom. Because your sample size
is thirty, you have 28 degrees of freedom. First, form the null hypothesis. Let us say that we choose as the null hy
pothesis h m 0: dropout rates have no influence on crime rates Second, form the alternative hypothesis. This is the hypothesis that dropout
rates have a positive effect on crime rates. We also could have chosen as
our alternative the weaker alternative that they have some effect, positive or
negative ’ lhird, choose your level of significance Say you are willing to take a
one—in—twenty chance that you reject the null when in fact it is true, that is,
a = 0.05 Fourth, look up the value t* of the t distribution such that the probability
that t exceeds it is no greater than 0.05. P01 28 degrees of freedom, this is
the value I 701. The area to the right of this is called the critical region for the
test. Generally speaking, it is the region in which the null hypothesis will be rejected in favor of the alternative.7
Finally, calculate what is called the test statistic: the value of the ratio h/(SE(l3)) (Ihe numerator is 5 because we are working under the null hy
pothesis that h = 0.) Suppose that this statistic is 2. Because the theorem tells
us that this test statistic follows a t distribution with 28 degrees of freedom,
we see that the chances that this statistic could have acquired the high value of2
when the hypothesized value of h is O is lower than 0 05. We would then reject the
null hypothesis that school dropout rates have no influence on crime rates
in the city Here is a quick summary of the general method. (1) Specify your null hypothesis, alternative hypothesis, and the signifi
cance level a. (2) Find the critical region using the t distribution with the appropriate
degrees of freedom (3) Using the sample data, compute the value of the test statistic. 7 the critical region depends on what our alternative hypothesis is. In our case we use the alternative that dropout rates have a positive effect On crime, so the critical region will be to the right
of the threshold 1.701 (this will become clearer in the final step of the exercise) A23 Regression 803 (4) Check wheth ' ' ' '
region. er or not the calculated test statistic falls in the critical (5) Reject the null hypothesis if the test statistic falls in the critical region
Do not reject the null 1f the test statistic does not fall in the critical regign ‘ Conﬁdence intervals. Another way to check signiﬁcance is to offer the read a conﬁdence interval for your estimate; that is, provide an interval of valueEI
around your estimate with the following interpretation: the true value of 5
Will he in this constructed interval for more than a predetermined ercent~
age of samples This predetermined percentage (chosen by the r‘eseariher) is analogous to the level of significance in hypothesis testing and is known a
the conﬁdence level Indeed, the confidence level is often denoted b 1 ~ as
where a is the associated level of significance. lhus conﬁdence legels (ex:
pressed as percentages) are usually taken to be 90, 95, or 99%. Note well that our estimator ii is a random variable, so the confidence in—
terval is random as well: it will var y from sample to sample. In contrast the
true value of h is some fixed (but unknown) number. Ihe probabilistic slate—
ment in the previous paragraph thus refers to the chances that this randornl
varying interval will contain the true value within it, and not to the chance:
of some randomly varying parameter lying within some ﬁxed interval More formally, a confidence interval for a parameter lr, given the estima
tor lr, is a range I = [h — B, h + [l] computed from the sample data so that
I contains the true value b in a high enough percentage of samples (where "high enough” is given by the pr'echosen confide 1
a l ‘ I
choose 3 so that nce eve) In other words, (A2 25) probability (:5 — b] < B) = 1 w or, where I [ denotes absolute value and 'l — cu denotes the confidence level _ l‘o find B we need to have an idea of how the random variable 5 — h is
distributed Ihis is where the t distribution makes a reappearance. Sim l
divrde both sides of the inequality in (A225) by the standard error of 5 life:
the left—hand side is a random variable t that has a t distribution with n — 2
degrees of freedom. Thus (A2 25) is equivalent to the requirement A (A226) probability (m < B, ) = 1 _ a:
sea) Using tables of the t distribution, we can find the critical value t* (a n —2)
that makes this inequality true. In other words, set I 3 = t*a,rr—2 X 804 Appendix 2 Elementary Statistical Methods Where P(t < t* ) E 1— a This leads to the much more familiar expression a, n—2 for a confidence interval for the parameter l7:
I: [23 — t; H x 813(5), 13 + QM x seems In summary, we construct a confidence interval for the parameter b in
our linear regression model by using the following four steps: (1) Choose your conﬁdence level 1 — a: 90, 95, or even 99%
(2) Look up the value t“ ",2 using tables of the t distribution or let your a; computer do it for yOu‘ For example, if your sample size is 120, the values
t2.“ are 1 289, 1.654, and 2.358 for the 90, 95, and 99% conﬁdence levels, respectively
(3) Compute the estimate 5’ and its estimated standard error SEUS)
(4) Calculate S = t"‘m’n_2 x 813(5) and finally 3i mm
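The whole routine — slope estimate, standard error via (A2.22)–(A2.23), t statistic, and interval — fits in a few lines. (A sketch on made-up data with n = 5; the critical values 2.353 and 3.182 for 3 degrees of freedom are standard t-table values, stated here as given constants rather than computed.)

```python
# Sketch: slope estimate, standard error, t statistic, and a confidence
# interval for a single-regressor OLS fit, on made-up data with n = 5.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx                  # OLS slope estimate
A = ybar - b * xbar            # OLS intercept estimate

# Residuals and the unbiased variance estimator (A2.22).
e = [yi - (A + b * xi) for xi, yi in zip(x, y)]
sigma2 = sum(ei ** 2 for ei in e) / (n - 2)
se_b = sqrt(sigma2 / sxx)      # standard error of the slope, (A2.23)

t_stat = b / se_b              # test statistic under the null b = 0
t_crit_one_sided = 2.353       # table value for alpha = 0.05, 3 df
t_crit_two_sided = 3.182       # table value for alpha/2 = 0.025, 3 df

print(b, se_b, t_stat)            # 0.6, ~0.283, ~2.12
print(t_stat > t_crit_one_sided)  # False: cannot reject b <= 0 at 5%
B = t_crit_two_sided * se_b
print((b - B, b + B))             # confidence interval around b-hat
```

Note how a clearly positive estimate (b̂ = 0.6) still fails to clear the critical value here: with only five observations, the standard error is too large for the evidence to be decisive.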