ECF 230: Introduction to Econometrics
Chapter # 1: Introduction By: Damodar N. Gujarati Ms Mwafulirwa Jane
School of Business: University of Lusaka
2018

WHAT IS ECONOMETRICS?
• Literally, econometrics means “economic measurement”. However, the scope of
econometrics is much broader, as can be seen from the following
definitions:
• “Consists of the application of mathematical statistics to economic
data to lend empirical support to the models constructed by
mathematical economics and to obtain numerical results” (Gerhard
1968).
• “Econometrics may be defined as the social science in which the tools
of economic theory, mathematics, and statistical inference are
applied to the analysis of economic phenomena” (Goldberger 1964).
• “Econometrics is concerned with the empirical determination of
economic laws” (Theil 1971).

METHODOLOGY OF ECONOMETRICS
• Broadly speaking, traditional econometric methodology proceeds
along the following lines:
1. Statement of theory or hypothesis.
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Collecting the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes.
• To illustrate the preceding steps, let us consider the well-known
Keynesian theory of consumption.

1. Statement of Theory or Hypothesis
• Keynes states that on average, consumers increase their consumption as their
income increases, but not by as much as the increase in their income (0 < MPC < 1).

2. Specification of the Mathematical Model of Consumption (single-equation model)
• Y = α + βX, 0 < β < 1 (I.3.1)
where
Y = consumption expenditure (dependent variable)
X = income (independent, or explanatory, variable)
α = the intercept/constant
β = the slope coefficient/gradient
• The slope coefficient β measures the marginal propensity to consume (MPC).
The specific form of this mathematical model is a linear model.

THE MEANING OF THE TERM LINEAR
• Linearity in the Variables
• The first meaning of linearity is that the conditional expectation of Y is a linear
function of Xi; the regression curve in this case is a straight line. By this definition,
• E(Y | Xi) = β1 + β2Xi² is not a linear function, because X appears with a power of 2.
• Linearity in the Parameters
• The second interpretation of linearity is that the conditional expectation of Y,
E(Y | Xi), is a linear function of the parameters, the β’s; it may or may not be
linear in the variable X.
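To make the second meaning concrete, the following minimal sketch fits a model of the form E(Y | Xi) = β1 + β2Xi², which is not linear in X but is linear in the parameters. The data values and the helper name ols_line are hypothetical and chosen only for illustration; Y is generated exactly from β1 = 3 and β2 = 2 with no error term, so least squares on the transformed regressor Z = X² should recover those values.

```python
def ols_line(z, y):
    """Return (intercept, slope) of the least-squares line of y on z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = s_zy / s_zz
    return y_bar - slope * z_bar, slope

# Hypothetical values, generated exactly from Y = 3 + 2 * X**2 (no error term).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0 + 2.0 * xi ** 2 for xi in x]

# Y is not linear in X, but it is linear in the regressor Z = X**2,
# so the ordinary two-variable formulas still apply.
z = [xi ** 2 for xi in x]
b1, b2 = ols_line(z, y)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The same trick would not work for a model in which a parameter itself is squared, since no transformation of X alone makes such a model linear in the β’s.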
• For example,
• E(Y | Xi) = β1 + β2Xi²
• is a linear (in the parameter) regression model. All the models shown in Figure
2.3 are thus linear regression models, that is, models linear in the parameters.
• Now consider the model:
• E(Y | Xi) = β1 + (β2)²Xi
• The preceding model is an example of a nonlinear (in the parameter) regression
model, because β2 is raised to the second power.
• From now on, the term “linear” regression will always mean a regression that is
linear in the parameters, the β’s (that is, the parameters are raised to the first
power only); it may or may not be linear in the explanatory variables.
• Geometrically, the MPC shows us by how much the
consumption expenditure will change when income increases by
one unit.

3. Specification of the Econometric model of Consumption
• The relationships between economic variables are generally inexact. In
addition to income, other variables affect consumption expenditure. For
example, size of family, individual preferences, environment, family religion,
etc., are likely to exert some influence on consumption.
• To allow for the inexact relationships between economic variables, (I.3.1) is
modified as follows:
• Y = α + βX + u (I.3.2)
where u, known as the disturbance, or error, term, is a random (stochastic)
variable that has well-defined probabilistic properties. The disturbance term
u may well represent all those factors that affect consumption but are not
taken into account explicitly.
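As a rough illustration of (I.3.2), the sketch below generates a few observations of Y as an exact part α + βX plus a random disturbance u. The values of α and β are borrowed from the estimates reported in these notes and are treated here as if they were the true parameters, which is an assumption; the income values, the normal distribution for u, and its standard deviation of 50 are likewise hypothetical.

```python
import random

ALPHA = -1208.0   # intercept (assumed true value, borrowed from the notes' estimate)
BETA = 0.67       # slope, i.e. the MPC (assumed true value)
SIGMA = 50.0      # assumed standard deviation of the disturbance u (hypothetical)

rng = random.Random(42)  # fixed seed so the run is repeatable

incomes = [5000.0, 5500.0, 6000.0, 6500.0, 7000.0]  # hypothetical X values
rows = []
for x in incomes:
    systematic = ALPHA + BETA * x   # exact part, as in (I.3.1)
    u = rng.gauss(0.0, SIGMA)       # random disturbance
    y = systematic + u              # observed consumption, as in (I.3.2)
    rows.append((x, systematic, u, y))

for x, systematic, u, y in rows:
    print(f"X={x:7.1f}  alpha+beta*X={systematic:8.2f}  u={u:7.2f}  Y={y:8.2f}")
```

Each printed Y differs from the exact value α + βX by its own draw of u, which is what makes the relationship inexact.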
Therefore, the econometric model comprises the mathematical model
and the random component.

THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM
• The disturbance term ui is a surrogate for all those variables that
are omitted from the model but that collectively affect Y. Why don’t
we introduce them into the model explicitly? The reasons are many:
• 1. Vagueness of theory: The theory, if any, determining the behavior
of Y may be, and often is, incomplete. We might be ignorant or
unsure about the other variables affecting Y.
• 2. Unavailability of data: Lack of quantitative information about
these variables, e.g., information on family wealth generally is not
available.
• 3. Core variables versus peripheral variables: Assume that besides
income X1, the number of children per family X2, sex X3, religion X4,
education X5, and geographical region X6 also affect consumption
expenditure. But the joint influence of all or some of these variables
may be so small that it does not pay to introduce them into the
model explicitly. One hopes that their combined effect can be
treated as a random variable ui.
• 4. Intrinsic randomness in human behavior: Even if we succeed in
introducing all the relevant variables into the model, there is bound to
be some “intrinsic” randomness in individual Y’s that cannot be
explained no matter how hard we try. The disturbances, the u’s, may
very well reflect this intrinsic randomness.
• 5. Poor proxy variables: For example, Friedman regards permanent
consumption (Yp) as a function of permanent income (Xp). But since
these variables are not directly observable, in practice we use
proxy variables, such as current consumption (Y) and current income
(X). This introduces the problem of errors of measurement, and u may in this case
also represent those errors.
• 6. Principle of parsimony: we would like to keep our regression model
as simple as possible. If we can explain the behavior of Y
“substantially” with two or three explanatory variables and if our theory
is not strong enough to suggest what other variables might be included,
why introduce more variables? Let ui represent all other variables.
• 7. Wrong functional form: Often we do not know the form of
the functional relationship between the regressand
(dependent variable) and the regressors. Is consumption expenditure
a linear (in variable) function of income or a nonlinear
(in variable) function? If it is the former,
• Yi = β1 + β2Xi + ui is the proper functional relationship
between Y and X, but if it is the latter,
• Yi = β1 + β2Xi + β3Xi² + ui may be the correct functional form.
• In two-variable models the functional form of the relationship
can often be judged from the scattergram. But in a multiple
regression model, it is not easy to determine the appropriate
functional form, for graphically we cannot visualize
scattergrams in multiple dimensions.

4. Obtaining Data
• To obtain the numerical values of α and β, we need data. Look at Table
I.1, which relates to the personal consumption expenditure (PCE) and the
gross domestic product (GDP). The data are in “real” terms. The data are plotted in Figure I.3.

Basic Data Types used in Econometrics
There are 3 types of data which econometricians might use for
analysis:
1. Time series data
• A time series is a set of observations on the values that a variable takes at
different times. Such data may be collected at regular time intervals,
such as daily (e.g., stock prices, weather reports), weekly (e.g.,
money supply figures), monthly (e.g., the unemployment rate, the
Consumer Price Index (CPI)), quarterly (e.g., GDP), and annually
(e.g., government budgets).
2. Cross-sectional data
• Cross-sectional data are data on one or more variables collected at the same point in
time, such as the census of population conducted by the Central
Statistics Organisation (CSO) every 10 years (the latest being in
2010), the basic food basket by JCTR, and opinion polls by MUVI
TV and the Post Newspaper.

3. Panel data, a combination of 1. and 2.
• A panel uses data with both time series and cross-section elements.
The table below is an example of panel data.

Hypothetical Egg Production in Zambia: 2010-2011
Y1 = eggs produced in 2010 (millions)
Y2 = eggs produced in 2011 (millions)
X1 = price per tray (ZMK) in 2010
X2 = price per tray (ZMK) in 2011

Province      Y1     Y2     X1     X2
LUSAKA        100    150    20     22
COPPERBELT    200    280    19     21
CENTRAL       60     62     18     20
NORTHERN      50     53     18.5   19.5
SOUTHERN      44     48     18     21
MUCHINGA      10     28     19     20
N.WESTERN     65     90     19.5   20.5
WESTERN       30     35     20     21
EASTERN       55     65     18     22
LUAPULA       60     78     19     21

For each year we have 10 cross-sectional observations (the provinces) and
for each province we have two time series observations on output and
prices of eggs, a total of 20 (combined) observations. Note that a panel is a
special case of pooled data. Pooled data combine cross sections and time
series. In pooled data the cross sections may differ from period to period, but a
panel uses the same cross sections over the period considered.

5. Estimation of the Econometric Model
• Regression analysis is the main tool used to obtain the estimates. Using this
technique and the data given in Table I.1, we obtain the following estimates
of α and β, namely, −1208 and 0.67. Thus, the estimated consumption
function is:
• Yˆi = −1208 + 0.67Xi (I.3.3)
• The slope coefficient (i.e., the MPC) is about 0.67: an increase in real
income of 1 dollar led, on average, to an increase of about 67 cents in real
consumption.
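How such estimates are computed can be sketched with the standard least-squares formulas for a two-variable model. Table I.1 is not reproduced in these notes, so the five income and consumption values below are hypothetical and will not give −1208 and 0.67; the function name ols_two_variable is also just an illustrative choice.

```python
def ols_two_variable(x, y):
    """Least-squares estimates (alpha_hat, beta_hat) for Y = alpha + beta*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    # Intercept: the fitted line passes through the point of sample means.
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical sample, NOT the data of Table I.1.
income = [80.0, 100.0, 120.0, 140.0, 160.0]
consumption = [65.0, 77.0, 94.0, 103.0, 116.0]

alpha_hat, beta_hat = ols_two_variable(income, consumption)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat (MPC) = {beta_hat:.2f}")
```

For this made-up sample the estimated MPC is 0.64 and the intercept is 14.20; the slope is read in the same way as above, as the average change in consumption per unit change in income.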
The estimated equation (I.3.3) can be shown on a
graph as a regression line.

6. Hypothesis Testing
• Hypothesis testing aims to find out whether the estimates obtained in Eq. (I.3.3) are in
accord with the expectations of the theory that is being tested. Keynes
expected the MPC to be positive. Thus we could state our hypotheses as
H₀: β ≤ 0
H₁: β > 0
Based on the estimation done in the previous step, the MPC is 0.67. Hypothesis
testing entails determining whether 0.67 is statistically greater than zero, even though
this seems obvious at face value. Alternatively, the hypotheses of interest could be
H₀: β ≥ 1
H₁: β < 1
which is also in line with Keynes's theory of consumption. In this case, we
would want to establish whether 0.67 is statistically less than 1.
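One common way to carry out such one-sided tests is a t test on the slope, which these notes do not spell out; the sketch below is therefore an illustration under stated assumptions rather than the procedure used for (I.3.3). The standard error of the 0.67 estimate is not reported here, so the example uses a hypothetical five-observation sample, and the critical value 2.353 is the tabulated one-sided 5% point of the t distribution with 3 degrees of freedom.

```python
import math

def slope_and_se(x, y):
    """Return the OLS slope and its standard error for a two-variable model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # residual variance with n - 2 degrees of freedom
    return slope, math.sqrt(sigma2_hat / s_xx)

# Hypothetical sample, NOT the data of Table I.1.
income = [80.0, 100.0, 120.0, 140.0, 160.0]
consumption = [65.0, 77.0, 94.0, 103.0, 116.0]
mpc, se = slope_and_se(income, consumption)

T_CRIT = 2.353  # one-sided 5% critical value of t with n - 2 = 3 degrees of freedom

t_zero = (mpc - 0.0) / se  # for H0: beta <= 0 against H1: beta > 0
t_one = (mpc - 1.0) / se   # for H0: beta >= 1 against H1: beta < 1

reject_nonpositive = t_zero > T_CRIT     # evidence that the MPC is positive
reject_at_least_one = t_one < -T_CRIT    # evidence that the MPC is below 1
print(f"MPC = {mpc:.2f}, se = {se:.4f}")
print(f"t (beta = 0) = {t_zero:.2f}, reject H0: {reject_nonpositive}")
print(f"t (beta = 1) = {t_one:.2f}, reject H0: {reject_at_least_one}")
```

In this made-up sample both null hypotheses are rejected, which is the pattern the Keynesian theory predicts.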
Such confirmation or refutation of economic theories on the basis of sample
evidence is based on a branch of statistical theory known as statistical
inference (hypothesis testing).

7. Forecasting or Prediction
• To illustrate, suppose we want to predict the mean consumption
expenditure for 2017. If the GDP value for 2017 was 7269.8 billion dollars,
predicted consumption would be:
Yˆ2017 = −1208 + 0.67(7269.8) = 3662.766 (I.3.4)
• The actual value of the consumption expenditure reported in 2017 was
4913.5 billion dollars. The estimated model (I.3.3) thus under-predicted the
actual consumption expenditure by about 1250.73 billion dollars. We could
say the forecast error is about 1250.73 billion dollars, which is represented
by the random or stochastic term.
• Suppose that, as a result of a proposed policy change, investment
expenditure increases. What will be the effect on the economy? As
macroeconomic theory shows, the change in income following a dollar’s
worth of change in investment expenditure is given by the income
multiplier M, which is defined as:
• M = 1/(1 − MPC) (I.3.5)
• With an MPC of 0.67, the multiplier is about M = 3.03. That is, an increase (decrease) of a dollar
in investment will eventually lead to more than a threefold increase
(decrease) in income; note that it takes time for the multiplier to work.
The critical value in this computation is the MPC. Thus, a quantitative estimate
of the MPC provides valuable information for policy purposes. Knowing the MPC,
one can predict the future course of income and consumption expenditure.

8. Use of the Model for Control or Policy Purposes
• Suppose we have the estimated consumption function given in (I.3.3).
Suppose further the government believes that consumer expenditure of
about 4900 will keep the unemployment rate at its current level of about
4.2%. What level of income will guarantee the target amount of
consumption expenditure?
If the regression results given in (I.3.3) seem reasonable, simple arithmetic
will show that:
4900 = −1208 + 0.67X (I.3.6)
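The arithmetic behind (I.3.4), (I.3.5), and (I.3.6) can be checked with a few lines of code. The sketch below takes the coefficients and the 2017 figures directly from these notes; the function and variable names are illustrative choices only.

```python
ALPHA_HAT = -1208.0  # estimated intercept from (I.3.3)
MPC = 0.67           # estimated slope from (I.3.3)

def predict_consumption(income):
    """Predicted consumption from the estimated function (I.3.3)."""
    return ALPHA_HAT + MPC * income

def income_for_target(consumption_target):
    """Invert (I.3.3): the income at which predicted consumption hits the target."""
    return (consumption_target - ALPHA_HAT) / MPC

forecast = predict_consumption(7269.8)       # forecast in (I.3.4)
forecast_error = 4913.5 - forecast           # actual minus predicted consumption
multiplier = 1.0 / (1.0 - MPC)               # income multiplier in (I.3.5)
required_income = income_for_target(4900.0)  # solves (I.3.6) for X

print(f"forecast        = {forecast:.3f}")
print(f"forecast error  = {forecast_error:.2f}")
print(f"multiplier      = {multiplier:.2f}")
print(f"required income = {required_income:.2f}")
```

Because every result depends on the single estimate of the MPC, a small change in that estimate shifts the forecast, the multiplier, and the income target together.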
Solving for X gives X = 9116.42, approximately. That is, an income level of about
9116.42 billion dollars, given an MPC of about 0.67, will produce an
expenditure of about 4900 billion dollars. Thus the government needs to
ensure an income level of about 9116.42 billion dollars if it is to keep
consumption at 4900 billion dollars for a
given MPC. As these calculations suggest, an estimated model may be used
for control, or policy, purposes. By an appropriate fiscal and monetary policy
mix, the government can manipulate the control variable X to produce the
desired level of the target variable Y.

Summary
• Figure I.4 summarizes the anatomy of classical econometric modeling.

A Note on the Measurement Scales of Variables
The variables that we will generally encounter fall into the
following four broad categories:
1. Nominal Scale
• Nominal scales assign numbers as labels to identify objects
or classes of objects. The assigned numbers carry no
additional meaning except as identifiers. For example, the
use of ID codes A, N and P to represent aggressive, normal,
and passive drivers is a nominal scale variable. Note that the
order has no meaning here, and the difference between
identifiers is meaningless. In practice it is often useful to
assign numbers instead of letters to represent nominal scale
variables, but the numbers should not be treated as ordinal,
interval, or ratio scale variables. Further, variables such as
gender (male, female) and marital status (married, unmarried,
divorced, separated) simply denote categories.

2. Ordinal Scale
•Something measured on an "ordinal" scale does have an
evaluative connotation. One value is greater or larger or better than
the other. Product A is preferred over product B, and therefore A
receives a value of 1 and B receives a value of 2.
•Another example might be rating your job satisfaction on a scale
from 1 to 10, with 10 representing complete satisfaction. With
ordinal scales, we only know that 2 is better than 1 or 10 is better
than 9; we do not know by how much. It may vary. The distance
between 1 and 2 may be shorter than between 9 and 10.
•Other examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists
but the distances between the categories cannot be quantified.
•Students of economics will recall the indifference curves between
two goods, each higher indifference curve indicating higher level
of utility, but one cannot quantify by how much one indifference
curve is higher than the others.

3. Interval Scale
• Interval scales build upon ordinal scale variables. In an interval
scale, numbers are assigned to objects such that the differences
(but not ratios) between the numbers can be meaningfully
interpreted. Temperature (in Celsius or Fahrenheit) represents an
interval scale variable, since the difference between
measurements is the same anywhere along the scale, and is
consistent across measurements. Ratios of interval scale variables
have limited meaning because there is not an absolute zero for
interval scale variables. The temperature scale in Kelvin, in
contrast, is a ratio scale variable because its zero value is
absolute zero, i.e. nothing can be measured at a lower temperature
than 0 degrees Kelvin.
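A small numerical check may help show why differences are meaningful on an interval scale while ratios are not. The two temperatures below are arbitrary illustrative values; the only outside fact used is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

a_c, b_c = 20.0, 10.0  # two arbitrary temperatures in Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

diff_c = a_c - b_c   # difference on the Celsius (interval) scale
diff_k = a_k - b_k   # the same difference on the Kelvin (ratio) scale
ratio_c = a_c / b_c  # "twice as hot" only as an artifact of the Celsius zero
ratio_k = a_k / b_k  # physically meaningful ratio

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The difference is 10 degrees on both scales, but the ratio is 2 in Celsius and only about 1.035 in Kelvin, so the statement that 20 °C is twice as hot as 10 °C has no physical meaning.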
• Time is an example of a variable measured on the interval scale. The
distance between 1 and 2 is equal to the distance between 9 and
10. Temperature in Celsius or Fahrenheit is another good example:
there is exactly the same difference between 100 degrees and 90 degrees as
there is between 42 degrees and 32 degrees.

4. Ratio Scale
•Ratio scales have all the attributes of interval scale variables
and one additional attribute: ratio scales include an absolute
“zero” point. For example, traffic density (measured in
vehicles per kilometer) represents a ratio scale. The density
of a link is defined as zero when there are no vehicles in a
link. Other ratio scale variables include number of vehicles in
a queue, height of a person, distance traveled, accident rate,
etc.
• Temperature measured in Kelvin is an example. There is no
value possible below 0 degrees Kelvin; it is absolute zero.
Weight is another example: 0 lbs. is a meaningful absence of
weight. Your bank account balance is another. Although you
can have a negative or positive account balance, there is a
definite and non-arbitrary meaning of an account balance of
0.

REGRESSION ANALYSIS: A HYPOTHETICAL EXAMPLE
• Regression analysis is largely concerned with estimating and/or predicting
the (population) mean value of the dependent variable (regressand) on
the basis of the known or fixed values of the explanatory variable(s)
(regressors).
• Look at Table 2.1, which refers to a total population of 60 families and their
weekly income (X) and weekly consumption expenditure (Y). The 60
families are divided into 10 income groups.
• There is considerable variation in weekly consumption expenditure in
each income group. But the general picture that one gets is that, despite
the variability of weekly consumption expenditure within each income
bracket, on the average, weekly consumption expenditure increases as
income increases.
• (I.3.2) is an example of a linear regression model, i.e., it hypothesizes that Y
is linearly related to X, but that the relationship between the two is not exact;
it is subject to individual variation. The econometric model of (I.3.2) can be
depicted as shown in Figure I.2.
• The dark circled points in Figure 2.1 show the conditional
mean values of Y for a given X value. If we join these
conditional mean values, we obtain what is known as the
population regression line (PRL), or more generally, the
population regression curve. More simply, it is the regression
of Y on X. The adjective “population” comes from the fact
that we are dealing in this example with the entire population
of 60 families. Of course, in reality a population may have
many more families.

THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
• From the preceding discussion and Figures 2.1 and 2.2, it is clear
that each conditional mean E(Y | Xi) is a function of Xi. Symbolically,
• E(Y | Xi) = f(Xi) (2.2.1)
• Equation (2.2.1) is known as the conditional expectation function
(CEF) or population regression function (PRF) or population
regression (PR) for short.
• The functional form of the PRF is an empirical question. For
example, we may assume that the PRF E(Y | Xi) is a linear function
of Xi, say, of the type
• E(Y | Xi) = β1 + β2Xi (2.2.2)

STOCHASTIC SPECIFICATION OF PRF
• We can express the deviation of an individual Yi around its expected value as follows:
• ui = Yi − E(Y | Xi), or equivalently Yi = E(Y | Xi) + ui (2.4.1)
• Technically, ui is known as the stochastic disturbance or
stochastic error term.
• How do we interpret (2.4.1)?
• It says that Yi is the sum of two components:
– (1) E(Y | Xi), which is known as the systematic, or deterministic,
component, and
– (2) ui, which is the random, or nonsystematic, component.
• Now if we take the expected value of (2.4.1) on both sides, conditional on Xi, we obtain
• E(Yi | Xi) = E[E(Y | Xi)] + E(ui | Xi)
             = E(Y | Xi) + E(ui | Xi) (2.4.4)
• where the second line uses the fact that the expected value of a constant is that
constant itself: once Xi is given, E(Y | Xi) is a constant.
• Since E(Yi | Xi) is the same thing as E(Y | Xi), Eq. (2.4.4) implies that
• E(ui | Xi) = 0 (2.4.5)
• Thus, the assumption that the regression line passes through the
conditional means of Y implies that the conditional mean values of ui
(conditional upon the given X’s) are zero.
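The zero-conditional-mean result can be verified numerically on a small population. Table 2.1 is not reproduced in these notes, so the three income groups and the family expenditures below are hypothetical stand-ins; the sketch computes each conditional mean, forms ui = Yi − E(Y | Xi) for every family, and averages the ui within each group.

```python
# Hypothetical population: weekly income X -> weekly consumption Y of each family.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

# Conditional mean E(Y | X) for each income level.
cond_mean = {x: sum(ys) / len(ys) for x, ys in population.items()}

# Disturbance of each family: u = Y - E(Y | X).
disturbances = {
    x: [y - cond_mean[x] for y in ys] for x, ys in population.items()
}

# Average disturbance within each income group, i.e. E(u | X).
mean_u = {x: sum(us) / len(us) for x, us in disturbances.items()}

for x in population:
    print(f"X={x}: E(Y|X)={cond_mean[x]:.1f}, E(u|X)={mean_u[x]:.1f}")
```

Within every income group the disturbances average to zero by construction, because they are measured as deviations from that group's own mean.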
• It is clear that
• E(Y | Xi) = β1 + β2Xi
(2.2.2)
• and
• Yi = β1 + β2Xi + ui (2.4.2) Better
• are equivalent forms if E(ui | Xi) = 0.
• We can develop the concept of the sample regression
function (SRF) to represent the sample regression line. The
sample counterpart of (2.2.2) may be written as
• Yˆi = βˆ1 + βˆ2Xi
(2.6.1)
• where Yˆ is read as “Y-hat” or “Y-cap”
• Yˆi = estimator of E(Y | Xi)
• βˆ1 = estimator of β1
• βˆ2 = estimator of β2
• Note that an estimator, also known as a (sample) statistic, is
simply a rule or formula or method that tells how to estimate
th...
