A Simple Introduction to Econometrics

By Mark R. Killingsworth
Department of Economics, Rutgers University

Econometrics is the measurement of economic relationships using statistical techniques such as regression
analysis. It can be used to answer questions such as the following:

• If taxes are cut by $10 billion, by how much will consumption spending change?
• If a firm raises the price of a product by five percent, by how much will its sales change?
• Are women or blacks paid less than white men with the same skills and job preferences? If so, by
  how much?

These questions are all concerned with the size of the effect that one factor has on another. Common
sense and/or economic theory can sometimes help us make an informed guess about the direction of such
effects. For example, other things being equal, one would expect that a rise in price would lead to a
decline in the quantity purchased by consumers. However, neither common sense nor economic theory
can tell us how large such an effect will be. And, in some cases, we can’t be sure that there is any effect
at all.[1] Finally, although most propositions in economics are predictions about what will happen when
one thing changes with “other things being equal,” it isn’t exactly obvious how one would go about
verifying such propositions when many things are happening at the same time.[2]

Hence the value of econometrics: it is a way to measure the “other things being equal” effect on one
factor of a change in some other factor; and it is also a way to determine whether there’s any effect at all.

This discussion is concerned with several aspects of econometrics: what regression analysis results mean,
and how you can use them to derive quantitative measures of economic relationships; and the statistical
precision of quantitative measures derived from such analyses (and the related issue of statistical
significance).

1. A Simple Example

To start, consider how one would analyze data on price and quantity purchased. In particular, how would
one quantify the extent to which, other things being equal, a change in price leads to a change in quantity
demanded?

Suppose that we have obtained data on both (i) the price P paid for meat per pound, and (ii) the number of
pounds of meat Q purchased by different households over a one-month period. To get some idea of the
relation between price and quantity, we could simply graph the P-Q combinations for the different
households, as shown in Figure 1. Each household is represented by a diamond denoting P and Q. (For
example, Household A paid a price of $6 per pound and purchased about 6 pounds of meat during the
month.) Obviously, the relationship here isn't exact, but it does seem to be the case that when the price P
is high, a smaller quantity Q is purchased.

[1] For example, if the demand for a product is completely inelastic (that is, if the income and substitution
effects of a change in price on quantity demanded are equal in size and exactly offset each other), then a
higher price won’t have any net effect at all on the demand for the product.

[2] For example, we may find that when the price of a product is changing, consumer incomes are changing
too. That makes it hard to be sure whether demand changed because of the change in price, the change in
income or a change in some other factor.

What regression analysis does is to summarize the relation between P and Q by drawing a regression line
through the scatter of data points. This is shown in Figure 2. Note that, roughly speaking, the regression
line in Figure 2 goes through the middle of the scatter of points denoting individual households, and that
it does indeed summarize the overall or average relation between price and quantity purchased. The slope
or “tilt” of the line is negative, implying that, on average, quantity purchased falls as price rises.

This relationship can be expressed in the form of an equation. To see how, note first that the regression
line runs into the vertical axis at P = 20; in other words, if the price (P) were $20 per pound, quantity
purchased (Q) would be zero. Note also from the regression line that, on average, as quantity (Q)
increases by one pound per month, price (P) falls by about $2.00 per pound. Thus, for these data, the
regression line implies the following demand equation or formula:

(1) Price per pound = 20 - (2 x Quantity purchased per month)

or, using the symbols P and Q for price and quantity purchased, respectively:

(2) P = 20 - 2Q

In these equations, 20 is called the “intercept” or “constant term” in the demand equation, and -2 is called
the “slope” (or “coefficient”) of the “quantity purchased” (Q) variable.[3] Economists usually graph the
demand curve with P on the vertical axis and Q on the horizontal axis. Note also, however, that we can
rearrange the terms in equation (2) to get

(3) P - 20 = -2Q

or, equivalently,

(4) Q = 10 - 0.5P

Equation (4) simply restates the same information contained in equation (2): equation (4) says, for
example, that each increase of $1 per pound in P results in a drop in Q of about 0.5 pounds per month, on
average.[4]

Figure 2 shows a regression analysis of the relation between just two factors: price and quantity.
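As a quick numerical check that equations (2) and (4) carry the same information, here is a minimal Python sketch. The function names are hypothetical and chosen only for illustration; the numbers come straight from the two equations.

```python
def price_from_quantity(q):
    """Equation (2): P = 20 - 2Q."""
    return 20.0 - 2.0 * q


def quantity_from_price(p):
    """Equation (4): Q = 10 - 0.5P."""
    return 10.0 - 0.5 * p


if __name__ == "__main__":
    for p in (9.0, 10.0, 20.0):
        q = quantity_from_price(p)
        # Plugging Q back into equation (2) should recover the price we started from.
        assert abs(price_from_quantity(q) - p) < 1e-9
        print(f"P = {p:5.2f} per pound  ->  Q = {q:4.2f} pounds per month")
```

Raising P from 9 to 10 lowers Q from 5.5 to 5, which is the 0.5-pound drop per $1 that equation (4) describes.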
However, the technique can readily be extended to multiple factors. This is particularly important in
economics, because in most cases economic theory suggests that many different factors may have an
effect on a given outcome. (For example, the theory of consumer demand suggests that the quantity
purchased of a given good will depend on the price of that good, on prices of other goods, on consumer
income, and on various noneconomic factors such as advertising, tastes and preferences, etc.)

To see how multiple factors can be considered in a regression analysis, consider Figure 3, which is a
scatter diagram showing price-quantity combinations for meat demand for both “high-income” and
“low-income” households. In Figure 3, just as in Figures 1 and 2, the relation between price and quantity (for
either type of household) is negative. However, Figure 3 shows something else that Figures 1 and 2 did
not: as indicated in Figure 3, quantity demanded is higher at any given price for high-income households
than for low-income households. (In other words, according to the evidence in Figure 3, higher income
shifts the demand curve for meat up and to the right, implying that meat is a so-called “normal good.”)

[3] Note that the slope of any line is equal to “rise” divided by “run,” where “rise” is the change in the vertical
dimension and “run” is the change in the horizontal dimension as we move along the line. In this case, for each
“run” (change in Q) of +1, we find that there is a “rise” of -2 (in other words, a decrease) in P as we move along the
line. So the slope = rise/run = -2/+1 = -2.

[4] If you don’t see this, try plugging in different numbers for P and using equation (4) to calculate the amount of Q
demanded at each P. For example, for P = 9, Q = 10 - 4.5 = 5.5; for P = 10, Q = 10 - 5 = 5; and so on. Thus, as P
rises by 1, Q falls by 0.5.

These relationships can be summarized by the regression lines shown in Figure 4. Note that now there
are two regression lines, one for each type of household. For either type of household, quantity demanded
is higher at lower prices, on average: the lines have a downward “tilt.” The “tilt” (or slope) of the
regression lines measures the independent relation between price and quantity, with income staying the
same: that is, as we move along the same regression line, for a given type of household (either high- or
low-income), we’re changing P and seeing how Q varies with income held constant. Note that in Figure
4, as in Figure 2, the slope of each demand line is -2.

Similarly, at any given value of P, the level of Q is greater for high-income households (on average) than
it is for low-income households. Equivalently, the difference (measured vertically) between the two
regression lines measures the difference (on average, and other things being equal) in the maximum prices
that low- and high-income households would be willing to pay for the same quantity.

Just as in the simpler analysis of Figure 2, the results of the more complex analysis of Figure 4 can be
summarized in the form of an equation. To see how, note several things from Figure 4:

• on average, demand of high-income households (Q) will be zero when the price per pound (P) is $30,
  vs. a figure of $18 for low-income households; and
• on average, for either kind of household, it would be necessary to reduce P by $2.00 per pound in
  order to cause an increase in Q of one pound per month (the slope of each line is -2).

Thus, for these data, the formula or equation for demand is

(5) P = 18 - 2Q + 12 if household is high-income
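One way to read the “if household is high-income” term in equation (5) is as a variable that equals 1 for high-income households and 0 otherwise. The Python sketch below illustrates that reading; the function name and the 0/1 coding are assumptions made for illustration, not something the text spells out.

```python
def max_price(quantity, high_income):
    """Equation (5): P = 18 - 2Q + 12 if the household is high-income."""
    # Indicator variable (assumed coding): 1 for high-income, 0 for low-income.
    indicator = 1 if high_income else 0
    return 18.0 - 2.0 * quantity + 12.0 * indicator


if __name__ == "__main__":
    # At Q = 0 the formula returns the two intercepts read off Figure 4.
    print("low-income intercept: ", max_price(0, high_income=False))
    print("high-income intercept:", max_price(0, high_income=True))
    # At any common quantity the two lines differ by the same vertical amount.
    print("gap at Q = 3:", max_price(3, True) - max_price(3, False))
```

The two intercepts (18 and 30) and the constant gap of 12 match the description of the two regression lines in Figure 4.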
Equivalently, we can rearrange the terms in (5) to obtain

(6) Q = 9 - 0.5P + 6 if household is high-income

Note that there are now two “slopes” or “coefficients,” and that each of them measures the
“other-things-being-equal” effect of the variable with which it is associated; in other words, the coefficient for a
given variable measures the effect of a change in that variable with the other variable(s) remaining
unchanged. For example, in (6), the coefficient on the price or P variable is -0.5. This means that, other
things being equal (including whether a household is a high- or a low-income household), a rise in P of $1
per pound will be associated, on average, with a reduction in Q of 0.5 pounds per month. Likewise, the
coefficient in equation (6) on the “if household is high-income” variable measures the effect on quantity
demanded (Q) of a change in household income, with price remaining the same. In other words, the coefficient
of 6 on the “if household is high-income” variable in equation (6) means that, if price were the same, a
high-income household would buy about 6 more pounds of meat per month, on average, than a low-income
household.

The analysis shown in Figure 4 is concerned with the relation between just three factors: prices, quantity
purchased and income. Moreover, “income” is measured in a very simple way (i.e., households are just
called “high-income” or “low-income,” but there is no attempt to take into account just how much income
each household has). However, the kind of analysis shown in Figure 4 can easily be generalized to take
many different factors into account. As additional factors are taken into account, it becomes more
difficult to show them in graphical form. Fortunately, it is still relatively simple to express them in
equation form, and thus to interpret their implications.

For example, consider the following “multivariate” regression (which is just a fancy name for a regression
that has many variables!):

(7) Quantity of meat purchased/month = 11 - (0.171 x Price of meat per pound)
                                          + (0.021 x Household income, in $ per month)
                                          + (0.114 x Number of persons in household)
                                          - (0.013 x Price of potatoes per pound)
                                          + (0.281 x Price of fish per pound)

This analysis considers the quantity of meat purchased per month (on average) by a household in relation
to the price of meat, household income, the number of persons in the household, the price of potatoes and
the price of fish. (Can you think of additional factors that might affect meat purchases?)

In equation (7), the coefficient for each variable measures the effect of a one-unit change in that variable
on meat purchases, on average and other things being equal. For example, the coefficient on the “price of
meat” variable in equation (7) is -0.171. This means that, on average and provided other things
(household income, number of persons in household, price of potatoes, price of fish) remain the same, a $1
rise in the price of meat per pound will result in a reduction in meat purchases of about 0.17 pounds per
month. Likewise, the coefficient on the “household income” variable in equation (7) is 0.021. So
equation (7) implies that, on average and provided other things (which in this case are the price of meat,
number of persons in household, price of potatoes and price of fish) remain the same, a $1 rise in
household income will result in an increase in purchases of meat of about 0.02 pounds per month. Each
of the other coefficients in equation (7) may be interpreted in the same way: as the effect, “on average”
and “other things being equal,” of a one-unit change in the variable with which it is associated.[5]

Finally, equation (7) can also be used to compute the average effect on meat purchases of changes in
several variables occurring at the same time. For example, suppose household income rises by $100 per
month and price falls $1 per pound. According to equation (7), the effect of the change in household
income on meat purchases would be a change of +0.021 x +100 = +2.1 pounds per month; the effect of the
drop in price would be -0.171 x -1 = +0.171 pounds per month; so, on average (and provided nothing
changes except income and the price of meat), the total effect on meat purchases will be a change of
(+2.1) + (+0.171) = +2.271, or an increase of 2.271 pounds per month, on average.

2. Statistical Precision and Statistical Significance

Econometric analyses (e.g., regression analysis) do more than provide coefficients, i.e., quantitative
measures of associations between an outcome (the “dependent variable”) such as price and one or more
factors that may be associated with it (the “independent variables”). They also provide measures of the
statistical precision of those coefficients. The concept of statistical precision is intimately connected with
two other concepts: the standard error and the statistical significance of the coefficients.

In discussing these concepts, an analogy to polling may be helpful. When they discuss results of opinion
polls, pollsters often say that their poll has a “margin of error” of plus or minus so many percentage
points.[6] The margin of error provides information about how precise the poll really is. For example,
suppose that Poll A has a margin of error of plus or minus two percentage points whereas Poll B has a
margin of error of plus or minus ten percentage points. Obviously, Poll A will provide a much more
precise estimate of how the population feels about a given issue.

[5] For example, the coefficient on “number of persons in household” in equation (7) is 0.114. Thus, the equation
implies that, with all other factors remaining the same, having an extra person in the household would lead to
purchases of an extra 0.114 pound of meat per month, on average.

[6] The margin of error of a poll depends on essentially two things: it will be lower if opinion doesn’t vary a lot in
the population itself, and/or if a large number of people have been included in the poll. This accords with common
sense: a poll can give a very precise reading on how the population feels if most people have the same opinion; and
a poll will be less reliable (i.e., less precise) if only a relatively small number of persons have been polled. Note that
the precision of a poll result is not the same thing as the accuracy of a poll result: for example, if a poll is selected by
sampling from registered voters, it may not give a fair (i.e., unbiased) reading of the opinion of the population as a
whole (because some people in the population aren’t registered to vote); but if a very large number of registered
voters are included in the sample and are surveyed, the poll may give a very precise estimate (one with only a small
“margin of error”) of how registered voters feel.

Similarly, in evaluating econometric results like those for equation (7), it is important to keep in mind that
the regression coefficients, like poll results, are not absolutely precise but, rather, should be regarded as
being subject to a “margin of error.” For example, according to equation (7), an increase of $1 per pound
in the price of meat is estimated to reduce the quantity of meat purchases per month by 0.171 pounds, on
average and other things (income, number of persons in household, etc.) being equal. But, by analogy
with opinion polls, it would be more appropriate to say that this reduction would be 0.171 pounds, plus or
minus some “margin of error.”

What “margin of error” should one use in interpreting the results of a regression analysis like equation
(7)? This proceeds in several steps. First, one uses statistical formulas and the same data used in
computing the regression line itself to compute something called a standard error for each coefficient.
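As a rough illustration of this first step, the Python sketch below applies the standard textbook least-squares formulas for a regression with a single explanatory variable. The data are invented for illustration (they are not the data behind Figure 2), and the function name simple_ols is hypothetical. Regressions with several variables, like equation (7), use matrix versions of the same formulas.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, se_a, se_b)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                     # slope of the regression line
    a = mean_y - b * mean_x           # intercept of the regression line
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)   # estimated residual variance
    se_b = math.sqrt(s2 / sxx)                      # standard error of the slope
    se_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))  # standard error of the intercept
    return a, b, se_a, se_b


if __name__ == "__main__":
    # Invented observations scattered around the line P = 20 - 2Q.
    quantity = [2, 3, 4, 5, 6, 7]
    price = [16.5, 13.5, 12.4, 9.6, 8.5, 5.5]
    a, b, se_a, se_b = simple_ols(quantity, price)
    print(f"intercept = {a:.3f} (standard error {se_a:.3f})")
    print(f"slope     = {b:.3f} (standard error {se_b:.3f})")
```

The more tightly the points cluster around the line, the smaller the residual variance and hence the smaller both standard errors.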
Then, divide the standard error into the absolute value of the coefficient itself, to obtain something called
a t-statistic for that coefficient. Thus, the relation between these concepts is as follows:

(8) t-statistic = (absolute value of regression coefficient) / (standard error for that regression coefficient)

In general, the larger a regression coefficient is relative to its standard error, the more “precise” that
coefficient is, and the more confident we can be that changes in the relevant independent variable (e.g.,
price) will be associated with changes in the dependent variable (e.g., quantity).[7] Equivalently, if the
coefficient for a given independent variable has a high t-statistic (i.e., if the coefficient relative to its
standard error is large), then (i) the effect of that independent variable is measured relatively precisely,
and (ii) we can be confident that there really is a relation between that independent variable and the
dependent variable. In this case, the effect of the independent variable is called “statistically significant.”

Just how high does the t-statistic have to be before we can say that the independent variable has a
“statistically significant” effect, or, equivalently, before we can say that the effect of the independent
variable is quite precisely measured? A convenient rule of thumb to use here is that we can say that the
independent variable’s effect is reasonably precisely measured (and its effect is “statistically significant”)
if the t-statistic for that variable is about 2.0 or more in absolute value. Of course, the higher
the t-statistic, the more precise (and the more “statistically significant”) the effect is. For example, a
t-statistic of 1.8 (in absolute value) for a particular independent variable provides some evidence that this
variable is indeed related to the dependent variable; but a t-statistic of 2.8 would provide even stronger
evidence; and if the t-statistic were 3.8 (or even greater), the evidence would be stronger still.

To fix ideas, consider Table 1. This shows some hypothetical regression results for both (i) the
demand for meat and (ii) the demand for fish, displayed in a format commonly used by economists.
Column (1) refers to the demand for meat, and column (2) refers to the demand for fish. Note that
column (1) simply restates equation (7), and that the results in column (2) could likewise be written out in
the form of an equation instead of in the format used in Table 1. (That is, the equation for the demand for
fish would be Q = 4.193 + (0.123 x Price of meat) + (0.048 x Household income) + etc.)

[7] Note that the logic here is very similar to the logic underlying the margin of error for opinion polls: if a given
result (regression coefficient or poll result) is large relative to its standard error, the result is quite precise. For
example, suppose that an opinion poll says that 45% of voters favor Candidate A whereas 51% favor Candidate B.
If the margin of error for the poll is 2% (two percentage points), we can be rather confident that Candidate B is
leading Candidate A. However, if the margin of error for the poll is 6% (six percentage points), our poll doesn’t
allow us to be very confident about whether Candidate B is actually ahead.

First consider column (1) in Table 1. In the demand equation for meat, the “price of meat” variable has a
coefficient of -0.171 and a standard error of 0.042. Thus, the coefficient is large relative to its standard
error (it’s more than four times larger than its standard error, in fact), so we can be quite confident that
there is an other-things-equal relation between the price of meat and the quantity of meat purchased. The
t-statistic for the price of meat variable is therefore 0.171/0.042 = 4.071, which is much larger than 2.0 or
even 3.0. So in the demand-for-meat equation, the price of meat variable is highly “statistically
significant.” In contrast, the “price of potatoes” variable in the same equation has a t-statistic of
0.013/0.012 = 1.083, so this variable is not “statistically significant” under the rule of thumb mentioned
earlier. The other entries for column (1), and the similar entries in the demand-for-fish equation in
column (2), can be interpreted in the same way. Note that the demands for both kinds of commodities
rise with household income (and that both “household income” coefficients have t-statistics of 2.0 or
more); thus, both commodities are “normal goods,” and these effects are both statistically significant.
Similarly, both demand curves are downward-sloping: a rise in the price of meat will reduce the demand
for meat, on average and other things being equal; and a rise in the price of fish will reduce the demand
for fish (note that the “price of fish” variable in column (2) has a negative coefficient). Also, note that in
both cases a rise in the price of one of these goods will increase the quantity demanded of the other good:
a $1 increase in the price of fish will increase the demand for meat (by 0.281 of a pound per month);
similarly, a $1 increase in the price of meat will increase the demand for fish (by 0.123 of a pound per
month). Thus, these goods are “substitutes” rather than complements.

Note that Table 1 shows the regression coefficients and their standard errors. Displaying the results in
this form is common, but somewhat inconvenient: one must divide each coefficient by its standard error
(as was done above) in order to get the t-statistic for that coefficient. For this reason, economists
sometimes display the t-statistics rather than the standard errors, as shown in Table 2. It’s a good idea to
check whether the numbers in parentheses immediately underneath the coefficient are standard errors or
t-statistics, since this has a lot to do with how one interprets the results!

Figure 1: Household Meat Purchases vs. Price
[Scatter plot. Vertical axis: price ($ per pound). Horizontal axis: quantity bought (pounds per month). Household A is labeled.]

Figure 2: Meat Purchase/Price Regression Line
[Vertical axis: price ($ per pound). Horizontal axis: quantity bought (pounds per month). Annotations: intercept = 20; “run” = +2.5; slope = rise/run = -5/2.5 = -2.]

Figure 3: Household Meat Purchases vs. Price by Level of Household Income
[Scatter plot with separate markers for low-income and high-income households. Vertical axis: price ($ per pound). Horizontal axis: quantity bought (pounds per month).]

Figure 4: Meat Purchase/Price Regression Lines by Level of Household Income
[Regression lines for high-income and low-income households. Vertical axis: price ($ per pound). Horizontal axis: quantity bought (pounds per month). Annotations: intercept = 30 (high income); high/low income difference = 12; “run” = +2.5; slope = rise/run = -5/2.5 = -2.]

Table 1: Hypothetical regression results for demand for meat and fish (standard errors in parentheses)

                                        dependent variable:
independent variable                    (1) meat      (2) fish
price of meat, per pound                 -0.171         0.123
                                         (0.042)       (0.060)
household income, in $/month              0.021         0.048
                                         (0.010)       (0.011)
number of persons in household            0.114         0.080
                                         (0.042)       (0.112)
price of potatoes, in $/pound            -0.013        -0.089
                                         (0.012)       (0.021)
price of fish, in $/pound                 0.281        -0.234
                                         (0.112)       (0.063)
intercept                                11.009         4.193
                                         (3.212)       (1.567)

Table 2: Hypothetical regression results for demand for meat and fish (t-statistics in parentheses)

                                        dependent variable:
independent variable                    (1) meat      (2) fish
price of meat, per pound                 -0.171         0.123
                                         (4.071)       (2.050)
household income, in $/month              0.021         0.048
                                         (2.100)       (4.364)
number of persons in household            0.114         0.080
                                         (2.714)       (0.714)
price of potatoes, in $/pound            -0.013        -0.089
                                         (1.083)       (4.238)
price of fish, in $/pound                 0.281        -0.234
                                         (2.609)       (3.714)
intercept                                11.009         4.193
                                         (3.427)       (2.676)
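
The step from Table 1 to Table 2 can be reproduced with a few lines of Python. The sketch below takes four of the column (1) entries from Table 1 and applies equation (8) together with the rule of thumb of about 2.0. The names MEAT_COLUMN, t_statistic and is_significant are hypothetical, chosen only for this illustration.

```python
# Selected rows of column (1) in Table 1: (coefficient, standard error).
MEAT_COLUMN = {
    "price of meat": (-0.171, 0.042),
    "household income": (0.021, 0.010),
    "number of persons in household": (0.114, 0.042),
    "price of potatoes": (-0.013, 0.012),
}


def t_statistic(coefficient, standard_error):
    """Equation (8): absolute value of the coefficient divided by its standard error."""
    return abs(coefficient) / standard_error


def is_significant(coefficient, standard_error, threshold=2.0):
    """Rule of thumb from the text: a t-statistic of about 2.0 or more."""
    return t_statistic(coefficient, standard_error) >= threshold


if __name__ == "__main__":
    for name, (coef, se) in MEAT_COLUMN.items():
        t = t_statistic(coef, se)
        verdict = "significant" if is_significant(coef, se) else "not significant"
        print(f"{name:32s} coefficient = {coef:7.3f}  t = {t:.3f}  ({verdict})")
```

Under the rule of thumb, the price of meat, household income and household size pass the threshold in the meat equation, while the price of potatoes does not.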