Appendix 2
Elementary Statistical Methods

A2.1 Introduction

Economists, and other scientists as well, are often interested in understanding the relationship between two or more variables. For instance, an agricultural scientist might want to know how variations in annual rainfall affect crop output, a social worker might wonder whether school dropout rates have anything to do with crime rates in cities, and an economist might want to test the hunch that higher income levels, or perhaps a diminished population of storks, tend to make for smaller-sized families. An important statistical technique that allows the exploration of possible interrelationships between variables is called regression analysis. This book contains several instances of such analysis.

Let us suppose that we want to investigate the relationship between two variables x and y. For instance, x might be annual rainfall measured in inches and y annual crop output, say, metric tons of wheat. Our first task is to collect the data: we will need a number of joint observations of (x, y) values; usually, the more the merrier. Observations may be collected at various levels of detail: countries, regions, groups, individuals, and so on. In the rainfall example we might have observations from several regions or states of one or more countries, and several observations (at different points of time) for each region. Observations collected at the same point in time but across different units (regions, countries, individuals) form a cross-sectional data set. Observations collected for the same unit but over different points in time form a time series. Mixed observations (both across units and across time) form a panel.

The general rule, of course, is that more data are preferred to less, but the problem is that detailed and appropriate data are often unavailable. Understandably enough, this problem is more acute for developing countries. For instance, we would love to test the Kuznets inverted-U hypothesis (see Chapter 7) with a long time series for a given country, but this kind of detail is available only for a few countries. Hence, we need to be aware of the pitfalls of limited data, and must attempt to correct for these limitations in the best way possible. In a sense, this is what statistical analysis is all about.

For example, in trying to estimate the effect of rainfall on crop output, cross-section data on rainfall and output alone probably will be inappropriate, because there can be important (unobserved) differences that might obscure the "pure" effect of rainfall on agricultural productivity, or, worse still, the measured effect might be systematically biased because we have neglected to include some other variable that may be systematically correlated with rainfall and have its own effect on crop productivity. The following exercise provides an example.

(1) Regions with low rainfall may have invested in irrigation. If the irrigation data are not included in our analysis, explain why the measured effect of rainfall will be systematically biased downward.

In other situations, a pure time series may be problematic. Suppose that we are interested in knowing how household income affects family size.
Again, if all the data we have pertain to household income and family size, our estimates might be confounded by changes in other variables over time: the spread of education, the availability of better birth-control methods, and so on. Some of these variables may be completely uncorrelated with income changes, but others may be correlated and might bias our estimates.

(2) Suppose that income per se has no effect on fertility choices, but education does. If we lack data on education, show that observations on income and fertility may suggest a positive relationship between the two, when in fact there is none (ceteris paribus).

Thus much of regression analysis is concerned with the careful estimation of bilateral relationships, while making all attempts to control for other variables that may also affect that relationship.

A2.2 Summary statistics

Before embarking on a detailed discussion of the relationship between variables x and y, let's identify some summary features of these variables. Suppose we have n pairs of observations: represent them as (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).

The mean

The average of these observations is often important, and it is typically measured by the arithmetic mean: the sum of all observations of the relevant variable divided by the total number of observations (we have n observations of each variable in the foregoing general description). The arithmetic means x̄ and ȳ of x and y, respectively, are mathematically represented as

(A2.1)   \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}

and

(A2.2)   \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}.

The summation symbol (Σ) is a shorthand description of the summation, or adding up, operation. The notation x_i denotes the ith observation of variable x.

The variance

The mean is not the only relevant summary of the observations of a variable. We would also like to know whether the different observations lie more or less close to the mean (i.e., whether they are bunched closely to one another) or far from it (i.e., whether they are widely dispersed). One way to do this is to somehow add up all the differences of the observations from the mean. Note that all differences count, whether positive or negative. The commonly accepted measure of dispersion in statistics is the variance (or its close cousins, the standard deviation and the coefficient of variation). The variance puts positive and negative differences from the mean on the same footing by squaring these differences: thus all negative signs vanish. Squaring has another property as well: it attaches proportionately greater weight to larger deviations from the mean. A difference of 2 counts as 4 as far as the variance is concerned, whereas a difference of 5 counts as 25. Mathematically, the variance is given by the formula

(A2.3)   V = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,

which is interpreted as the average value of all (squared) deviations from the mean. [Footnote 1: There is a slight distinction between the variance and the sample variance that we ignore here, but see subsequent text.]

The variance is often presented in the equivalent form of a standard deviation, which makes the units comparable to those in which the variable originally was measured:

(A2.4)   \sigma = \sqrt{V}.

Notice that it is important to take the average of the squared differences from the mean, and not just their sum.
This is because even if individual differences are small (so that there is actually little "dispersion" in the data), the aggregate of such differences can be large simply because we have a large number of observations, and we don't want that. The same kind of reasoning suggests that the variance (or standard deviation) should be expressed as a ratio of the mean: if not, an innocent change in the units of measurement can affect the measure of dispersion. This gives rise to the coefficient of variation:

(A2.5)   c = \frac{\sigma}{\bar{x}}.

Correlation

So far we have discussed summary statistics about a single variable. However, our main goal is to understand whether two (or more) variables move together: whether they covary. To understand the notion of covariance, consider the familiar example of two farmers who produce the same crop in two different parts of a country. The output of either farmer can take on only two values: H (high) and L (low). H occurs with probability p, where p is the chance of having adequate rainfall and is the same across the two farmers.

Now carry out the thought experiment of moving the farmers closer and closer together, starting from two well-separated locations in the country in which they live. Because the initial locations are far apart, the probability of good rainfall in one location is "independent" of outcomes in the other location. Put another way, the knowledge that one farmer has suffered L tells us nothing about what might have happened to the other farmer. As we move the two farmers closer and closer together, their fortunes become more closely linked: if one farmer produces H, you can guess with greater and greater degrees of certainty that the other farmer has produced H as well. At the very end of this thought experiment, when the two farmers are neighbors, their outputs will covary perfectly (if rainfall is the only source of uncertainty, which we've assumed).

Note three things about this example. First, if we just focus on any one of the farmers, nothing changes: the probability that he produces H is p all along. The behavior of the individual random variables (output in this case) tells us nothing about how they might be correlated. In this sense, notions such as the mean and the variance do not tell us anything about joint movements of the variables x and y.

Second, the fact that two variables covary (as they do in the example here when the farmers live close to each other) tells us nothing about the direction of causation from one variable to the other, or indeed whether there is a causal link between the two at all. In our example, an H for one farmer does not in any way cause an H for the other, even if the two outputs are perfectly correlated. It's just that there is some third variable (in this case, the state of the monsoon) that is a common driver for the two outputs. Therefore our notions of causality must, in some sense, be formed by common-sense observations of which variable is likely to be exogenous and which is likely to be endogenous.
For instance, if we took as our two variables (i) the state of the monsoon and (ii) the crop output of a single farmer, then a positive correlation between these two variables is more likely to be indicative of causality: it is highly unlikely that the output of a single farmer will influence the state of the weather.

Third, two variables may covary negatively as well as positively.

(3) Consider the chances of a student scoring the highest grade in mathematics in her class. Suppose the chances are given by the probability p. If two equally able students are drawn at random and their chances are examined, then show that their chances are independent if the two are in different classes, whereas they covary negatively if they are in the same class. Note that the negative covariance is perfect if each class has only two students.

A measure of observed correlation between two variables x and y is given by the covariance. If we have a sample of n pairs of observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), then the covariance is given by

(A2.6)   \mathrm{cov}_{xy} \equiv \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).

[Footnote 2: Just as in the case of the sample variance, there is a slight distinction between the sample covariance and the covariance that we ignore here.]

Note how this captures comovements. If, when y_i exceeds its mean, x_i exceeds its mean as well, then the covariance will be positive; but if the fact that y_i exceeds its mean has no bearing (on average) on the behavior of x_i, the covariance will be zero. Similarly, if x_i tends to fall short of its mean when y_i exceeds ȳ, the covariance will be negative.

[Footnote 3: To see this a bit more formally, note that if y_i has no bearing on the behavior of x_i in the sample, this just means that the distribution of x values around the mean x̄ will look the same whether we look at the whole sample or at the subsample restricted to one particular observation of y. Restricting ourselves to the latter, we see that Σ_i (x_i − x̄)(ŷ − ȳ) = 0 on each subsample of pairs (x_i, y_i) such that y_i equals some fixed value ŷ. Adding over all such subsamples (by changing the value of ŷ), we see that the covariance must be zero.]

The covariance has the same problem as the variance, in that the number obtained is not free of units of measurement. To remedy this we express the covariance as a fraction of the product of the standard deviations of the two variables. This yields the coefficient of correlation, which we denote by R:

(A2.7)   R = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y}.

Clearly R is also positive (or negative) when the two variables comove positively (or negatively). Sometimes R² is reported instead of R when we do not wish to focus on the direction of the association, but only on its strength (squaring removes the negative sign).

The reason for dividing by the product of the standard deviations is only in part to obtain a units-free number (dividing, for instance, by the product of the means would have achieved this goal as well). The particular normalization we choose has the virtue of placing R between −1 and +1. These extremes signify maximal correlation, whereas 0 signals lack of correlation between the two random variables. Although we omit the proof of this assertion, it can be found in any standard statistics textbook (see, for instance, Hoel [1984, pp. 385-386]).
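To make formulas (A2.1) through (A2.7) concrete, here is a minimal numerical sketch (not part of the original text). The rainfall and output figures are invented purely for illustration, and the divisions are by n, as in the formulas above, rather than by n - 1.

    import numpy as np

    # Hypothetical observations: x = annual rainfall (inches), y = wheat output (metric tons)
    x = np.array([21.0, 34.0, 28.0, 40.0, 25.0, 31.0])
    y = np.array([1.8, 3.1, 2.4, 3.6, 2.0, 2.9])

    x_bar, y_bar = x.mean(), y.mean()              # (A2.1), (A2.2)
    var_x = ((x - x_bar) ** 2).mean()              # (A2.3), dividing by n
    sigma_x = np.sqrt(var_x)                       # (A2.4)
    cv_x = sigma_x / x_bar                         # (A2.5)
    cov_xy = ((x - x_bar) * (y - y_bar)).mean()    # (A2.6)
    sigma_y = y.std()                              # standard deviation of y (also divides by n)
    R = cov_xy / (sigma_x * sigma_y)               # (A2.7), lies between -1 and +1

    print(x_bar, sigma_x, cv_x, cov_xy, R)

Statistical packages often divide by n - 1 instead (the sample variance); this is the distinction flagged in the footnotes above.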
It is important, however, to note that R or R² is not just a measure of association between two random variables, but a measure of a very special sort of association: one that is linear. Indeed, R² takes on its maximum value of 1 when the relationship between x and y can be expressed in the form y_i = A + bx_i for all i, for some constants A and b. However, the true relationship between x and y may not be linear (though it may be very strong). For instance, a greater consumption of calories leads to an increase in work capacity over some range (see Chapter 8), but after a point the relationship between calorie consumption and work capacity turns negative (as obesity sets in). Thus the true relationship involves a zone of positive association and another zone of negative association. At the same time, if you have a large range of calorie-capacity observations and mindlessly calculate the correlation coefficient between these two variables, it may not be very high, simply because the correlation coefficient cancels out these two conflicting zones of association. The lack of a high correlation does not mean that there is no relationship at all: it just means that we are applying our concepts in the wrong way. The choice of a specification of the true underlying relationship is part of the economist's art, and although statistical methods can be indicative of the direction in which to go, an underlying theory is essential. We will consider more of this in the next section.

(4) Generate an imaginary set of calorie-capacity observations using the relationship y = A + bx − cx² (here x stands for calories consumed and y stands for work capacity). What signs would you use for the constants A, b, and c? For given constants, where do the zones of positive and negative association lie? Now use this relationship to generate a set of observations and calculate the correlation coefficient between x and y. What would happen if you restricted your observations to only those from the zone of positive association?

A2.3 Regression

Introduction

Suppose that we are interested in the form of the relationship between variables x and y, and not just in the existence of a correlation. Suppose we have [...] this theory would have the opposite prediction. A regression using available data might throw light on the (relative) validity of these theories.

The second way in which a regression can be helpful is in forecasting. If [...] with a fair degree of certainty. Variable x can also be a government policy parameter (like taxes), so that if a change in its value is seen to be forthcoming [...]

Often, a careful look at the data with the naked eye will tell us more than all kinds of statistical measures. A scatter plot allows us to do just this. First we decide (on the basis of our experience and/or theory) which variable is to be "causal" and which is to be the variable that is affected by the movements of the causal variable. Following convention, we let x stand for the causal or independent variable and y stand for the dependent variable (the nomenclature itself tells us that we, as econometricians, suspect a particular direction of causation and have used this already to classify the variables).
Next, we construct a diagram in which the independent variable is put on the horizontal axis and the dependent variable occupies the vertical axis. On this diagram we record our sample observations. The resulting plot of observations is called a scatter diagram or scatter plot. Our first (and critically important) statistical technique is: stare hard at the scatter plot.

As an illustration, Figure A2.1 reproduces Figure 2.7 from Chapter 2. The independent variable is per capita income and the dependent variable is life expectancy. The observation pairs are from different countries: the data form a cross section.

To facilitate our visual examination, the figure draws in the means of each of the two variables in the form of a pair of cross lines. Note how most of the data lie in the first and third quadrants created by the cross lines. This suggests that when per capita income exceeds its mean value, life expectancy tends to exceed its mean value as well, which is just a way of noting that the coefficient of correlation is likely to be positive.

[Figure A2.1: Scatter diagram of observations of per capita income and life expectancy in different countries. Source: World Development Report (World Bank [1995]) and Human Development Report (United Nations Development Programme [1995]).]

After this preliminary step, it is a good idea to get a sense of the overall relationship. Is a straight line the best fit? In the example studied here, this is unlikely to be the case. The reason is that life expectancy is difficult to push beyond 80 or so (for medical reasons), whereas the jumps from 50 to 60 and from 60 to 70 can be made more rapidly. This suggests that the true relationship is more a curve than a straight line, with the curve flattening out as we move into higher ranges of per capita income. This sort of relationship seems to be broadly supported by the scatter diagram. The mathematical form of the regression should be constructed with this in mind.

Finally come two conceptually important issues. Remember that our goal is to understand whether x has a strong impact on y. However, what does the word "strong" mean exactly? Figure A2.2 illustrates the problem. In both panels of this diagram we have relationships that are most likely linear. In the first panel, we have a scatter plot between x and y where the fit is remarkably good: the plots closely hug some straight line, but at the same time the slope of the line is flat. In the second panel the scatter is much more pronounced, but the slope of the "best fitted line" seems to be high. (You will appreciate the difference better if you look at the scale in which the vertical axes are drawn in the two panels: the scale is much more compressed in the second panel.)

[Figure A2.2: Notions of a "strong" relationship.]

Thus "strong" has two meanings in this context. A relationship may be estimable in a precise fashion: in the first panel, even though the effect of x on y is not large, the data tell us that this statement can be made quite precisely. The second meaning is that the effect of x on y is large. As Figure A2.2 shows, this statement is quite compatible with the observation that a precise estimate of the relationship itself is not to be had.
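The two senses of "strong" can also be seen numerically. In the sketch below (not from the original text), the first invented data set has a flat but very tight relationship and the second a steep but noisy one; the correlation coefficient is higher in the first case even though the effect of x on y is much larger in the second.

    import numpy as np

    rng = np.random.default_rng(4)
    x = np.linspace(0, 50, 100)

    y_flat = 0.1 * x + rng.normal(0, 0.05, x.size)    # small slope, very little noise
    y_steep = 2.0 * x + rng.normal(0, 30.0, x.size)   # large slope, lots of noise

    R_flat = np.corrcoef(x, y_flat)[0, 1]
    R_steep = np.corrcoef(x, y_steep)[0, 1]

    print(R_flat, R_steep)   # R_flat is close to 1; R_steep is noticeably lower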
Note that the correlation coefficient captures the notion of "strength" in its first sense (at least when the underlying relationship is linear). It does not matter what the slope is: if the data fit perfectly on a straight line, the correlation coefficient will always equal unity.

(5) Suppose that observations on (x, y) pairs are generated directly from the equation y = A + bx, where A and b are constants. Assuming that b ≠ 0 and that there are at least two observations, show that R² = 1 irrespective of the value of b, the slope.

The basics of regression

Suppose we feel that the relationship between x and y can be well described by a straight line. Thus we suggest a (linear) equation of the form

(A2.8)   y = A + bx,

where A and b are (as yet unspecified) constants. This equation describes a possible relationship: it says that y assumes a value of A when x = 0, and its value increases (or decreases) by an amount b for each additional unit increase (or decrease) in the value of x. In graphical terms, (A2.8) describes a straight line in the (x, y) plane. Of course, if we vary the values of A and b, we will alter both the position and the slope of this straight line.

Look at and compare the two panels in Figure A2.3. In both panels, the numerous dotted points represent the same scatter diagram: a plot of several pairs of joint observations of x and y. On the other hand, the two straight lines in the two panels represent two different attempts to give a stylized picture of the relationship between x and y.

[Figure A2.3: Fitting a line to a scatter: different attempts.]

In the left-hand panel, the actual data points are all more or less close to the line drawn. In the right-hand panel, however, many of the data points are quite far off. Obviously, the straight line in the left-hand panel is a better representation of the data than its counterpart in the right-hand panel. In other words, it "fits" the data better. Given a set of observations, therefore, our task is to find the straight line that is the "best fit" to the data; it amounts to the same thing as finding the "best" values of A and b in equation (A2.8). However, an infinite number of straight lines can be drawn on a plane, and it is impossible to judge the relative merits of all of them merely through visual inspection (as we did for the relatively easy task of selecting among the lines in Figure A2.3!). What precise criterion is to be used to find the proper numerical values of A and b?

Notice that for a line to be a "good fit," we require that the actual data points be not very far away from the line. For every observation x_i of the variable x, the corresponding value of y as obtained from the stylized relationship summarized in the given line is (A + bx_i). However, the actual, or observed, value of y when x = x_i is y_i. Hence, if we use the line y = A + bx as a description (or a forecasting device), then for the ith observation we have an "error" equal to y_i − (A + bx_i).
We would like to choose a line so as to keep such errors as low as possible, on average. Because large errors of opposite signs may cancel each other out, it is appropriate to look at the squares of the various error terms. It is therefore standard statistical practice to choose the values of A and b in such a way as to make the sum of the squared errors as low as possible. This method of "fitting" a straight line to given data is known as the ordinary least squares (OLS) method. The (linear) equation thus obtained is called the (linear) regression equation.

An outline of the OLS procedure follows. We have collected n pairs of observations on x and y. A typical pair is denoted by (x_i, y_i). If we fit the line y = A + bx to the observations, the value y_i will differ from the predicted value A + bx_i by a margin that is the error in the fit at that observation pair. Under this interpretation, we may think of

y_i = A + b x_i + \varepsilon_i,

where ε_i represents all kinds of random disturbances that influence y_i other than x_i. The coefficients A and b are the unknown parameters that we would like to estimate. Given any estimates Â and b̂ of A and b, the predicted value of y_i is ŷ_i = Â + b̂x_i, whereas the prediction error of y_i is just e_i = y_i − ŷ_i. The sum of squared errors (SSE, sometimes also called the residual sum of squares) is therefore given by

(A2.9)   \mathrm{SSE} = \sum_{i=1}^{n} e_i^2.

The OLS estimates of A and b are defined as the particular values Â and b̂ of A and b that minimize the SSE for the sample data. We omit the details of the derivation, but note that the OLS estimates are given by the formulas

(A2.10)   \hat{b} = \frac{(1/n)\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(1/n)\sum_{i=1}^{n} (x_i - \bar{x})^2}

and

(A2.11)   \hat{A} = \bar{y} - \hat{b}\,\bar{x}.

(6) If you know a bit of calculus it is very easy to derive Â and b̂ on your own. Set things up as follows: using equation (A2.9), note that the minimization of SSE is equivalent to the problem

\min_{A,\,b} \; \sum_{i=1}^{n} (A + b x_i - y_i)^2.

Now take derivatives of this expression with respect to A and b and set the derivatives equal to zero (these are the first-order conditions). Solve the linear equations in the first-order conditions to get (A2.10) and (A2.11).

The optimally chosen value b̂ is called the regression coefficient. It tells us about the strength of the influence of x on y: a high value of b̂ implies that a small change in x can bring about a large change in y; a low value of b̂ implies just the opposite.

It isn't hard to interpret the particular formula that describes b̂. The numerator is just the covariance between x and y. The denominator is the variance of x. The regression coefficient is thus the amount of "covariation" relative to the extent to which x itself varies. If there is a lot of covariance even as x varies very little, we could say that x has a large influence on y.

All said and done, however, even the best fitted straight line may not be a good fit. For one thing, the OLS procedure always gives you some answer even when there is no relationship to write home about: you could regress Coca-Cola consumption in China on the number of red shirts sold in Denmark, and you'd still get an answer from the OLS estimates. More seriously, a systematic relationship may indeed exist but it may not be a linear one (as in our examples on nutrition and work capacity or per capita income and life expectancy).
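Before turning to goodness of fit, here is a minimal sketch of formulas (A2.9) through (A2.11) in code (not part of the original text), applied to the same invented rainfall and output numbers used earlier.

    import numpy as np

    x = np.array([21.0, 34.0, 28.0, 40.0, 25.0, 31.0])   # rainfall (inches), made up
    y = np.array([1.8, 3.1, 2.4, 3.6, 2.0, 2.9])          # output (metric tons), made up

    # OLS estimates from (A2.10) and (A2.11)
    b_hat = ((x - x.mean()) * (y - y.mean())).mean() / ((x - x.mean()) ** 2).mean()
    A_hat = y.mean() - b_hat * x.mean()

    # Prediction errors and the sum of squared errors (A2.9)
    e = y - (A_hat + b_hat * x)
    SSE = (e ** 2).sum()

    print(A_hat, b_hat, SSE)

Any other choice of A and b on these data would produce a larger SSE; that is precisely the sense in which the OLS line is the "best fit."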
The estimated values of Â and b̂ tell us nothing about whether the overall fit is a good one. In this context, our previous discussion regarding the two notions of "strength" in a relationship is very much relevant. Finally, it is possible (and is almost always the case) that there are other explanatory variables that have been left out of the regression. The more such variables we find, the more likely it is that the fit of the regression will be improved (more on this later). This last case is the most benign and can still be informative.

The overall explanatory power of the linear regression is best summarized by our familiar friend, the correlation coefficient. As we already noted, it is quite possible for the correlation coefficient to be low even if the estimated value of b is large (and vice versa). Here is another way to see the same thing: use equations (A2.7) and (A2.10) to see that

R = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y} = \frac{\mathrm{cov}_{xy}}{\sigma_x^2} \cdot \frac{\sigma_x}{\sigma_y} = \hat{b}\,\frac{\sigma_x}{\sigma_y},

so that even if b̂ is large, R (or R²) might be low if the variance of y is very large relative to that of x. What does this mean? It means that there is a large proportion of variation in y itself that cannot be properly explained by looking only at variations in x. As we noted previously, this could happen for one (or a combination) of several reasons. At the same time, it is also possible that b̂ is low, but the correlation coefficient of the regression is high: this happens if the overall variation in y is very low relative to the variation of x. The preceding equation brings out these possibilities clearly.

When we look at a fitted regression equation, therefore, the first thing we should ask is the value of R (or R²). A low value may lead us to have little faith in the regression in the first place, although it is possible that regressions with low R² provide us with useful information, provided the parameter b is estimated "precisely enough" (see subsequent text).
mm the them Of econ ' r of p61 Cgimme i1Siolow model predicts that countries with loweronfllgaitlfitlh cerens Pflrzbris aiodnle W111 grow fastE‘I- HOWEVEI} this Statement is 0111 fall; Per capifa inclo 111 most cases the ceteris will not be paribus Thusjlni hl capital {Wm-Ch ripesthmay permit the accumulation of larger stoéks of hufn BI that inéludes only emse'lve‘s generate faster rates of growth. A re ressian a coefficient on Piffdgiglplla mcomtfi as an indePendent variable Wm ggener 5:: _ _ a income at include b H ' ca rta .. S Ofl‘the d' ” cagital 11:03: (the Solow prediction) and the “indirect” ifgcteif'eclrd per net effe'ct ma 5138::1111318, the two effects work in opposite direction;a 3111:3216}? growth Putt; hfippareI-lfly) Show that Per capita income has noleffect of: e<1uation helpsgto 55:53:? :riran additional valiable into the “931955103 . e detafled discussion)“ 0 Effects (599 Chaptas 3 and 4 for more The en r - . and let xgl Elalszbdeznlls :15le fStatEd Lat y be the independent Valiable . I'A'“! 0 cc one .313 - estimate a linear equation of the formpenderlt variables. Then Om taSk is to (A212 _ ) T"i_Alfiblxl+I3’27‘2-i“ ‘+bkxkr where the constants (A, b1, . ., bk) are to be determined 1 A2 3. Regression the OLS method used in the now to form an intuitive with some difficulty or three or more in- “best” The rest of the story is a natural extension of case of a single independent variable. It is harder picture of what is going on (because scatter plots work with two independent variables and cannot be drawn f the same: we look for the dependent variables), but the main idea is hyperplane that fits the multidimensional scatter of observations. Inst as be— fore, We may define the predicted value 1}, for any collection (A, b1, , bk) and any observation i as (A213) y“, E A + ax} + r;sz + + bkxi.‘ and the prediction error e,- as (AZJM) er 3 Vi ‘ iii Now We carry out exactly the same exercise as before: choose (A, in, . ., bk) to minimize the sum of the squares of the prediction errors 8% These yield natural extensions of the formulas (A210) and (A211) Make sure that you understand just what these estimated coeffrcrents tells us the effect on y of a change in x1 mean For instance, coefficient in when all other values of (x2, . that the change in x1 will haVe no effect but the fact remains that may, in some situations, , xk) are held constant. This does not mean on the other'values of x2. They b1 is a measure of the “pure” direct effect of xi, freed of the “contaminating” influences of the other gression equation, therefore, should independent variables. The “correct” re tell us the nature of the influence of x on y, when the influence of “other factors” has been accounted for. f the correlation coefficient in a multiple It remains to specify an analog 0 regression exercise. This is some measure of how the dependent variable is correlated with the entire set of independent variables. There is an easy way ‘ 'able case: sirnv to do this which nicely genera e as a measure the correlation coefficient between 1/ and the predicted ply tal< values 1} that arise from the regression. 
(7) Verify that the correlation coefficient proposed for the multiple regression indeed generalizes the case of a single independent variable by proving that the correlation coefficient between two variables x and y is the same as the correlation coefficient between A + bx and y for any constants (A, b) with b ≠ 0.

There are two special cases of a multivariate regression that deserve some attention.

Nonlinear regressions. A multivariate regression can be used to handle situations in which the true underlying relationship is perceived to be nonlinear, either because of commonsense considerations or more sophisticated theoretical reasoning. Examples include the relationship between per capita income and life expectancy, discussed earlier in this appendix and in Chapter 2, and the Kuznets inverted-U hypothesis studied in Chapter 7. A first step to deal with this situation is to include both x and x² as independent variables on the right-hand side of the equation. Thus, even if there really is a single independent variable, the model behaves as if there were two: the variable and its square.

What is the advantage of including the squared term? It allows different zones of positive and negative association between x and y. However, because the squared term only permits a quadratic equation to be estimated, this method cannot handle more than one switch in the direction of association. But the general method easily suggests itself: include further higher powers of x if you wish to handle more complicated switches in behavior. Nonetheless, few theoretical models in economics generate such complicated behavior, unless they also happen to generate negative results of the sort "anything can happen."

(8) What kind of specification would you use to generate a regression equation for the scatter plot in Figure A2.1? Eyeball the plot and describe what values you would expect the different coefficients to have.

Some forms of nonlinearity can be converted into a linear estimation equation with very simple mathematical manipulation. For instance, suppose that we are interested in estimating the coefficients of the Cobb-Douglas production function

(A2.15)   Y = A K^{\alpha} L^{\beta}

by using data on output (Y), capital (K), and labor (L). Clearly, running a linear regression on these variables will get us nowhere, because the functional form that we are trying to estimate is inherently nonlinear. However, taking logarithms on both sides of (A2.15) will help here:

(A2.16)   \ln Y = \ln A + \alpha \ln K + \beta \ln L.

Equation (A2.16) is a linear form that can be estimated using OLS. Convert the given data to logarithmic form to estimate the coefficients ln A, α, and β. For an application of this method, see Chapter 3.
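A sketch of the logarithmic route (not from the original text): generate data from a Cobb-Douglas form as in (A2.15) and recover ln A, α, and β by running OLS on the log-linear equation (A2.16). The parameter values and the multiplicative noise are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100
    K = rng.uniform(1, 100, n)
    L = rng.uniform(1, 100, n)
    A_true, alpha, beta = 3.0, 0.3, 0.7
    Y = A_true * K ** alpha * L ** beta * np.exp(rng.normal(0, 0.05, n))  # multiplicative noise

    # Estimate ln Y = ln A + alpha ln K + beta ln L, as in (A2.16)
    X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
    coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
    lnA_hat, alpha_hat, beta_hat = coef

    print(np.exp(lnA_hat), alpha_hat, beta_hat)   # should land near 3.0, 0.3, 0.7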
Dummy variables. Often, an additional variable takes the form of a dummy; that is, a variable that only takes on binary values, typically represented by the numbers 0 and 1. For instance, we might wish to test the hypothesis that females earn less than males in similar jobs. We would then need to estimate a wage equation that includes several independent variables, such as age and education. Of special importance would be the dummy variable that takes the value 0 if the worker is female and the value 1 if the worker is male. The discrimination hypothesis then states that, controlling for other variables such as education and age, the coefficient on this dummy variable should be positive. The coefficient on the dummy can be interpreted as the additional income that a worker receives simply by virtue of being male. There may be other effects as well: for instance, it is possible that a male may also receive more education and that education has its independent effect on wage earnings. But this will not be picked up by the dummy variable provided that education is also included in the regression equation, and indeed it should not be: whether the benefits of being male are chiefly manifested through factors such as better access to education, and not directly through the labor market, is something that we would like to analyze explicitly, and we don't want to lump all these effects under one general heading.

A standard way to include a dummy is through the additive form

(A2.17)   y = A + bx + cD + \text{error term},

where x is a vector of independent variables, (A, b, c) are constants to be determined (say by OLS), and D is a dummy variable that takes the value 1 in the case of a certain occurrence and 0 otherwise. For instance, D might be a country dummy that takes the value 1 if the country is Latin American and 0 otherwise (see the study of the inverted-U hypothesis in Chapter 7).

You might ask, what is the advantage of including a dummy variable when we can simply take the data apart for each of the classifications that the dummy is supposed to represent? For instance, if the data are in the form of a panel containing inequality data for some countries that are Latin American and others that are not, then why not simply create two subsets of data from this panel and run separate regressions? Indeed, we could do this, but the point is that the dummy variable approach imposes much more structure on the problem: structure often driven by theoretical considerations. Return to the wage discrimination example. We might have theoretical reasons to suppose that changes in education or age have the same effects on male and female wages at the margin, whereas the gender effect simply raises the wages
lhus the advantage of the dummy variable approach is that it allows us to tweak only those parameters that we consider theoretically affected by the dummy This allows us to pool the data for greater statistical power Moreover, we can (if we wish) allow the dummy to affect some of the regres- sion coefficients by simply interacting the dummy with the relevant variables. For instance, suppose we believe that the wage discrimination effect grows smaller with age. Ihen our specification might read as follows: wage “4 A + b1 educ + lr2 age + b3D age + CD + error, where the variables are self-explanatory (perhaps expressed in logarithmic terms) and D is the gender dummy Note that the dummy has been entered in two places: first as a familiar additive shift and second to explore the idea that greater age reduces the size of the shift. (9) What sign would you expect 53 to have? How would you explore the conjecture that gender bias is unaffected by age, but is more pronounced for higher education levels? Additive dummies are often referred to as fixed efifects because they cap— ture some shift of the regression equation that is presumably intrinsic to the characteristic captured by the dummy Ihus regressions that incorporate country, village, or time fixed effects are simply those that include the corre- sponding dummy variables in their specification. Bias and significance Apart from questions of over all fit, there is the issue of whether the estimated coefficient is can be trusted: is it far off the mark from "the truth”? I have put the phrase "the truth” in quotes because it needs some expla- nation I’hink of a large (potentially infinite) set of (x, y) observations that we might have access to: what we have in our hands is really a subset or a sample of these observations Now there is some "true" relationship between 5 in the wage discrimination example, this assertion pertains to the case in which the Wage iS supposed to be shifted upward by a constant absolute amount because of the gender term If the shift is proportional, we need a different specification if all effects are multiplicative, we can take the logarithmic route discussed for nonlinear estimation {see equations (A2 15} and (A2 16)] A23 Regression 795 the variables x and 1}, but this does not mean that our particular sample al- lows us to divine what this true relationship is Our sample allows us to construct estimates fl and ii of the true relationship that we believe to be “out there,” but these estimates generally would be different if we used an- other sample from this large “mother set” of observations. To continue the rainfall example, as our mother set of observations, think of all the rainfall that has ever occurred in history (and all that will occur) and the accompa— nying levels of crop output. We do not have access to this entire set: just as an example, perhaps we have all years of rainfall and crop output for alter- nate years between 1970 and 1997. We use this information to estimate the “true” relationship that is hidden in the mother set of observations How— ever, another sample of observations (say the other alternate years) may give us a somewhat different estimate. This point of view teaches us that our estimates are themselves random variables in some broader sense. One aim of statistics is to develop a notion of how precise or significant our estimates are; that is, how confident can we be that our estimated value of ii is close to the true l)? 
A somewhat different twist on this problem is obtained by rephrasing the question a bit: using the estimated value 5 and the other data of the problem, how sure can we be that the true value of b is significantly different from 0, or from 1, or lies within or outside some prespecified range of values? With this twist the estimators only act as stepping stones to the true nature of things (which we can never be completely sure of because we lack all the data). We can only say things about the true relationship with some probability: the value of the exercise then centers on how close the probability is to l Bias. Following up on the previous discussion, we may think of an 01.5 esti- mator as a function of the sample observations {( x1, yl), ., (xn, 1%)}, where you can think of each in observation as multidimensional if you like. let’s give this list of observations a name: call it 2. Thus 2 comes from some mother set of observations, and different draws (or equivalently, different data collection exercises) give rise to different z’s, all from the same mother set Thus an 01.8 estimator (say of the regression coefficient in) can be thought of as a function that yields, for every 2:, an estimate [3(2) 5 ii of the regression coefficients . Now we can think of the average or the expected value of 5(2) as z ranges over all conceivable samples Is this average the "true" value b? If it is, we say that the estimator is unbiased. Ihat’s just a way of saying that we can expect our estimate, on average, to be clustered around the true value that we are after. An attractive feature of the OLS estimators is that they are indeed unbiased in this sense Here are further details to support this observation. Let us restrict our— selves (for simplicity) to the case of a single independent variable. Suppose 796 Appendix 2 Elementary Statistical Methods that in our minds, the truth is given by the model (A218) y, = A + in, + 5,, where the x'ls may or may not be random variables (it doesn’t matter for what I'm going to talk about) and the “noise terms” 6, all come as independent draws from a single distribution that itself is independent of the ,x values. Because 5 is pure noise, we take it that the mean of this distribution is 0, The parameters A and b are what we are after Recall that our estimate it of b is given by the formula 5: Zi=1(xi _ nyr _ Zisdxr — 332 where 2? and 1], you will recall, are the sample means of the x and y obser- vations, respectively It follows that Elixi- —' 9?? ijlcc, — n2 } where the second equality follows from the fact that Elite, -- i) = 0 and the third equality comes from letting A, E (r, — :E)/(Zf:1(x, m if). Conse- quently, (A219) E=Zl,(a+bx,+e,)=b+Z/\,a, i=1 i=1 where in deriving this equation, we have used the observations that 2:1 1‘: 0 and 2L1 llin = Z?=1Ai(xi — f) =1 Now, for given observations of the .x variables, I am going to take expec- tations over the noise terms 612. From equation (A219), this tells us that 5(5) = b + Z heal.) = b, i=1 because the noise terms all have mean 0., This proves that the OLS estimate of b is unbiased. A similar argument applies to the estimate of the intercept term A, A2 ,3. 
Significance. The lack of bias in the OLS estimates of Â and b̂ tells us that, on average, we are not making a systematic error in our estimation of the true values of A and b. However, this does not mean that in any particular exercise we are at the true value of these coefficients, or even close. Figure A2.4 illustrates this concept by reiterating that any estimate is a random variable and, in general, has a distribution of possible values around its mean. What we showed in the previous section is that the mean is indeed the true value that we are seeking, but as the figure shows, there will be some dispersion around the mean. All we see is the estimated value b̂, but because we do not know where the distribution in Figure A2.4 is centered, we do not know whether this estimated value is bang on the truth or far away from it (whether greater or less). We need some probabilistic assessment of this.

[Figure A2.4: Dispersion of the OLS estimator around the true value of b.]

What follows is a little more technical, so I will start by giving you a simple, intuitive idea of how we go about the process. Suppose that we are interested in knowing whether the true value of b is positive. (For instance, we may want to know whether rainfall truly influences crop output, whether education influences wages, or whether the stork population in a country influences the number of babies born there.) Thus we regress y on x and form an OLS estimate b̂ of b.

Now suppose, for the sake of argument, that the true b is really 0. Even so, it rarely is the case that the estimated value b̂ comes out to be exactly 0, because relationships that need to be estimated by statistical analysis of this sort are rarely exact: there are always tiny unknown outside influences, ever so minute and intractable disturbances, or simply measurement errors in variables, that tend to make the relationship between two variables somewhat fuzzy and blurred. Thus, actual farm output, although strongly dependent on rainfall, also may vary slightly in response to other influences that in truth do not have any serious effect. For these reasons, we cannot be sure that a positive estimate guarantees that the true coefficient is really positive.

The art and science of drawing statistical inferences lies in discovering true and strong "structural relationships" in data that have been contaminated by the influences of these minor factors, which statisticians (partly to vent their frustration, perhaps!) refer to as "noise." To summarize, then, the estimated coefficient of a particular "explanatory variable" may be nonzero even if there is no real influence of that variable on y. Alternatively, it may be that in truth the relationship is a positive one, but does a positive estimate really clinch the issue?

Statisticians go about this task by specifying a provisional or null hypothesis. For instance, they may start by hypothesizing that the "true" value of the regression coefficient in question is nonpositive and then try to calculate how likely it is that the estimated value from any sample could turn out to be what it does turn out to be merely due to the effect of noise. If this probability is seen to be very low, then there exists strong reason to "reject" the null hypothesis that was assumed to start with, that is, the hypothesis that the "true" coefficient is not positive.
In the case of such rejection, the opposite conclusion has to be embraced: that the independent variable in question indeed has a nonnegligible positive influence on the variable y. In this latter case, the estimated coefficient is said to be significantly different from zero, or simply "statistically significant."

Note that there are two tasks to be carried out here. First, our calculation of the aforementioned probability surely depends on our belief about the strength and nature of noise in the data at hand. We need to estimate this: we do so by looking at the size of the "residuals," that is, the deviations of the predicted values of y (predicted from the OLS regression equation) from the actual values. This gives us some idea of the dispersion or variance in the distribution of b̂. If this distribution is very closely clustered around the truth and if our estimate is also positive, it is very likely that the truth is positive as well. On the other hand, if this dispersion is very high, then we may not be very sure (unless the estimate itself is very large and positive). Thus it is important to combine both the estimated value of the coefficient and the estimated strength of the noise to form a test of the null hypothesis.

Under some statistical assumptions, this combination leads to what is called a t statistic, and we decide whether the coefficient is "significantly positive," "significantly negative," or "significantly different from zero" by embracing the opposite postulate as the null hypothesis and examining whether the value of the t statistic provides us with sufficient grounds to reject this hypothesis. For instance, under the null hypothesis of a 0 "true" coefficient, higher values of the t statistic are more unlikely. Hence, the higher the computed value of t, the less plausible the null hypothesis will be. Usually, a cutoff point is decided beforehand, and if the value of t turns out to be any higher than that, the null hypothesis is rejected and the coefficient in question is pronounced to be statistically significant. Hence, in any technical report on a regression run by a researcher, it is standard practice to report the respective t values in brackets, right below the estimated coefficients, so that readers may judge the statistical significance of those coefficients. [Footnote 6: Observe that the general philosophy reflected here is one of "not guilty, unless proven otherwise." For instance, in testing whether a coefficient is significantly different from 0, the initial presumption is that the variable in question has no effect on the explained variable y, but that presumption is later dropped if it seems too unlikely in the light of the available evidence.]

The second element of subjectivity lies in deciding beforehand how unlikely is "too unlikely" for a null hypothesis to be rejected. It is common practice to work with a 5% probability; that is, the null hypothesis is rejected if, under its assumption, there is less than a 5% probability that the t value takes the value it actually does in the given sample. Such a test is then called a test with "5% level of significance." Tests with 1 and 10% levels of significance are also not uncommon. The remaining sections go into these matters in more detail.

Standard errors. Following the foregoing discussion, our first task is to determine how dispersed the distribution of the OLS estimator is. This isn't difficult, at least up to a point. We can use equation (A2.19) to calculate the variance of b̂ for a fixed set of x observations. First rewrite (A2.19) as \hat{b} - b = \sum_{i=1}^{n} \lambda_i \varepsilon_i, and then note that

(A2.20)   \mathrm{variance\ of\ } \hat{b} = E(\hat{b} - b)^2 = \sigma^2 \sum_{i=1}^{n} \lambda_i^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},

where σ² is the common variance of the (independent) error terms. It is similarly easy to show that

(A2.21)   \mathrm{variance\ of\ } \hat{A} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}.

This is good information, but it's incomplete. In particular, we don't know what the unknown variance σ² is, and so we have to approximate it somehow. One way to do this is to use the observed variation in the estimated error terms e_i, which you'll recall are just the differences between the observed y_i's
In particular, we don’t know what the unknown variance 0'2 is, and so we have to approximate it some— how. One way to do this is to use the observed variation in the estimated err01 terms e,, which you’ll recall are just the differences between the observed y,s (A2 21) variance of fl = 5 Observe that the general philosophy reflected here is one of "not guilty, unless proven other- wise.” For instance, in testing if a coefficient is significantly different from 0, the initial presumption is that the variable in question has no effect on the explained variable (t), but that presumption is later dropped it it seems too unlikely in the light of the available evidence 800 Appendix 2. Elementary Statistical Methods and the predicted 1215. It turns out that an unbiased predictor of the variance (72 is given by the estimator 52: (A222) ‘7 — n—2‘ i=1 An intuitive reason (somewhat unsatisfactory) for dividing by n — 2 rather than by n is that two parameters must be estimated to compute the error terms: A and b For instance, when computing the sample variance of a variable, it is typical to divide by n — 1 because one parameter (the sample mean) must be estimated to compute the sample variance. This changes the degrees of freedom in the sums of squares that define 6'2 Mathematically, it is easy to check by taking expectations that this particular division (by n — 2) does indeed give us an unbiased estimate. We now substitute equation (A2 22) in equations (A220) and (A221) to A come up with estimates of the dispersion in b and A. These are known as the standard errors (SE) of the estimates: . a2 ' (A223) 83(5) = (A224) SE (A) = Sometimes, regression results report these standard errors in parentheses below each of the estimated coefficients. If these errors are small relative to the size of the coefficients, we can have more faith in the qualitative predic- tions yielded by our estimates. For instance, if the estimated 13 is large and positive, while at the same time the standard error of b is small, it is very likely that the true value of b is also positive. With some more assumptions we can go further, as the next section shows. The t distribution. Ihere is a very special distribution called the t distribu- tion It looks a bit like a normal distribution (though it is flatter). It has mean 0, and its variance is determined by a parameter called its degrees of free— dom Ihe t distribution is well known in statistical software programs and in older statistics texts that actually tabulate its proper ties For instance, with 20 degrees of freedom, it is well known that the probability that a random vari— able t will exceed the value 2 086 is precisely 0 025, that it will exceed 1.725 is 0.05, and that its absolute value will exceed 1725 is therefore 0 10 Ihe t distribution plays a beautiful and critical role in the testing of hypotheses. To see how this works, we need one assumption and one theorem. A2 3 Regression Assumption. I he errors 6, distribution. 801; of our regression model follow the well—known normali There are theoretical arguments that justify this assumption, but it is assumption all the same I heorem. If the errors are normally distributed, then the random variable must follow a t distribution size 23 — b sea?) with n — 2 degrees of‘fr'eedom, where n is the sample Hypothesis testing. The preceding theorem allows us to test various hy- potheses, such as whether a regressron coefficient is “significantly” positive, “significantly” negative, or just plain “significantly” different from O. . 
Hypothesis testing. The preceding theorem allows us to test various hypotheses, such as whether a regression coefficient is "significantly" positive, "significantly" negative, or just plain "significantly" different from 0.⁵

⁵ Observe that the general philosophy reflected here is one of "not guilty, unless proven otherwise." For instance, in testing whether a coefficient is significantly different from 0, the initial presumption is that the variable in question has no effect on the explained variable (y), but that presumption is later dropped if it seems too unlikely in the light of the available evidence.

For instance, we may be investigating whether school dropout rates have anything to do with crime rates in cities. Assume we already have regressed crime rates on dropout rates and have obtained an estimate b̂. We want to know whether b̂ is "significantly" different from 0; alternatively, we can ask whether it is significantly positive or significantly negative.

In testing any such hypothesis, two kinds of mistakes are possible: we might reject the hypothesis when it is in fact true, or we might fail to reject it when it is false. As researchers, we wish to limit these possibilities to a minimum, and this will determine the "power" of the test. It is customary to limit false rejections: we want to be conservative in that we do not want to reject a hypothesis when it is actually true. Thus when we do reject the hypothesis, we want to be very confident that the hypothesis is indeed false. The flip side of this approach is that a failure to reject does not mean too much: in particular, it does not mean that we have "accepted" the hypothesis. Indeed, it is often possible that both a hypothesis and its converse may stand (statistically) unrejected.

The proportion of samples in which false rejection occurs is called the level of significance of the test, usually denoted α. It is common practice to work with a 5% probability; that is, the null hypothesis is rejected if, under its assumption, there is less than a 5% probability that we have rejected the hypothesis when it was indeed true. Such a test is then called a test with "5% level of significance." Tests with 1 and 10% levels of significance are also not uncommon.

Now we will see how the t distribution plays a role in all this. For example, you are investigating whether school dropout rates have anything to do with crime rates in cities. Say you have thirty pairs of observations. You have used OLS to estimate b̂. If the errors are normal, you know from the foregoing theorem that the variable (b̂ − b)/SE(b̂) has a t distribution with n − 2 degrees of freedom. Because your sample size is thirty, you have 28 degrees of freedom.

First, form the null hypothesis. Let us say that we choose as the null hypothesis b = 0: dropout rates have no influence on crime rates.

Second, form the alternative hypothesis. This is the hypothesis that dropout rates have a positive effect on crime rates. We also could have chosen the weaker alternative that they have some effect, positive or negative.

Third, choose your level of significance. Say you are willing to take a one-in-twenty chance that you reject the null when in fact it is true; that is, α = 0.05.

Fourth, look up the value t* of the t distribution such that the probability that t exceeds it is no greater than 0.05. For 28 degrees of freedom, this is the value 1.701. The area to the right of this value is called the critical region for the test. Generally speaking, it is the region in which the null hypothesis will be rejected in favor of the alternative.⁷

Finally, calculate what is called the test statistic: the value of the ratio b̂/SE(b̂). (The numerator is b̂ because we are working under the null hypothesis that b = 0.) Suppose that this statistic is 2. Because the theorem tells us that this test statistic follows a t distribution with 28 degrees of freedom, we see that the chance that this statistic could have acquired the high value of 2 when the hypothesized value of b is 0 is lower than 0.05. We would then reject the null hypothesis that school dropout rates have no influence on crime rates in the city.
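The five steps of this worked example translate into a few lines of code. This is a sketch under hypothetical numbers (the estimate of 0.8 and standard error of 0.4 are invented so that the test statistic equals 2, as in the text); the one-sided critical value is taken from scipy.stats.t rather than from a printed table.

```python
from scipy import stats

# Hypothetical output from regressing crime rates on dropout rates.
b_hat, se_b = 0.8, 0.4      # invented numbers; the test statistic will be 2
n = 30
df = n - 2                  # 28 degrees of freedom
alpha = 0.05                # 5% level of significance

# One-sided alternative: dropout rates have a positive effect on crime.
t_star = stats.t.ppf(1 - alpha, df)   # critical value, about 1.701
t_stat = b_hat / se_b                 # numerator is b_hat because the null is b = 0

if t_stat > t_star:
    print(f"t = {t_stat:.2f} > {t_star:.3f}: reject the null at the 5% level")
else:
    print(f"t = {t_stat:.2f} <= {t_star:.3f}: do not reject the null")
```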
Here is a quick summary of the general method.

(1) Specify your null hypothesis, alternative hypothesis, and the significance level α.

(2) Find the critical region using the t distribution with the appropriate degrees of freedom.

(3) Using the sample data, compute the value of the test statistic.

(4) Check whether or not the calculated test statistic falls in the critical region.

(5) Reject the null hypothesis if the test statistic falls in the critical region. Do not reject the null if the test statistic does not fall in the critical region.

⁷ The critical region depends on what our alternative hypothesis is. In our case we use the alternative that dropout rates have a positive effect on crime, so the critical region will be to the right of the threshold 1.701 (this will become clearer in the final step of the exercise).

Confidence intervals. Another way to check significance is to offer the reader a confidence interval for your estimate; that is, provide an interval of values around your estimate with the following interpretation: the true value of b will lie in this constructed interval for more than a predetermined percentage of samples. This predetermined percentage (chosen by the researcher) is analogous to the level of significance in hypothesis testing and is known as the confidence level. Indeed, the confidence level is often denoted by 1 − α, where α is the associated level of significance. Thus confidence levels (expressed as percentages) are usually taken to be 90, 95, or 99%.

Note well that our estimator b̂ is a random variable, so the confidence interval is random as well: it will vary from sample to sample. In contrast, the true value of b is some fixed (but unknown) number. The probabilistic statement in the previous paragraph thus refers to the chances that this randomly varying interval will contain the true value within it, and not to the chances of some randomly varying parameter lying within some fixed interval.

More formally, a confidence interval for the parameter b, given the estimator b̂, is a range I = [b̂ − B, b̂ + B] computed from the sample data so that I contains the true value b in a high enough percentage of samples (where "high enough" is given by the prechosen confidence level). In other words, choose B so that

(A2.25)   \Pr\left(|\hat{b} - b| < B\right) = 1 - \alpha,

where |·| denotes absolute value and 1 − α denotes the confidence level.

To find B, we need to have an idea of how the random variable b̂ − b is distributed. This is where the t distribution makes a reappearance. Simply divide both sides of the inequality in (A2.25) by the standard error of b̂; then the left-hand side is a random variable that has a t distribution with n − 2 degrees of freedom. Thus (A2.25) is equivalent to the requirement

(A2.26)   \Pr\left(\frac{|\hat{b} - b|}{\mathrm{SE}(\hat{b})} < \frac{B}{\mathrm{SE}(\hat{b})}\right) = 1 - \alpha.

Using tables of the t distribution, we can find the critical value t*_{α, n−2} that makes this inequality true.
In other words, set

B = t*_{α, n−2} × SE(b̂),

where Pr(|t| < t*_{α, n−2}) = 1 − α. This leads to the much more familiar expression for a confidence interval for the parameter b:

I = [b̂ − t*_{α, n−2} × SE(b̂), b̂ + t*_{α, n−2} × SE(b̂)].

In summary, we construct a confidence interval for the parameter b in our linear regression model by using the following four steps:

(1) Choose your confidence level 1 − α: 90, 95, or even 99%.

(2) Look up the value t*_{α, n−2} using tables of the t distribution, or let your computer do it for you. For example, if your sample size is 120, the values t*_{α, 118} are approximately 1.658, 1.980, and 2.618 for the 90, 95, and 99% confidence levels, respectively.

(3) Compute the estimate b̂ and its estimated standard error SE(b̂).

(4) Calculate B = t*_{α, n−2} × SE(b̂), and report the interval [b̂ − B, b̂ + B].
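These four steps also translate directly into code. The sketch below again uses invented estimates, with scipy supplying the critical value; because the interval is two-sided, the quantile is taken at 1 − α/2 so that P(|t| < t*) = 1 − α.

```python
from scipy import stats

# Hypothetical regression output.
b_hat, se_b = 0.8, 0.4
n = 120
df = n - 2                       # 118 degrees of freedom
conf = 0.95                      # confidence level 1 - alpha
alpha = 1 - conf

# Two-sided critical value t* with P(|t| < t*) = 1 - alpha.
t_star = stats.t.ppf(1 - alpha / 2, df)

B = t_star * se_b
print(f"{conf:.0%} confidence interval for b: [{b_hat - B:.3f}, {b_hat + B:.3f}]")
```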