D.S.G. POLLOCK: TOPICS IN ECONOMETRICS 2011
1. EXPECTATIONS AND CONDITIONAL EXPECTATIONS
The joint density function of x and y is
$$f(x, y) = f(x|y)f(y) = f(y|x)f(x), \quad (1)$$
where
$$f(x) = \int_y f(x, y)\,dy \quad \text{and} \quad f(y) = \int_x f(x, y)\,dx \quad (2)$$
are the marginal distributions of x and y respectively and where
$$f(x|y) = \frac{f(y, x)}{f(y)} \quad \text{and} \quad f(y|x) = \frac{f(y, x)}{f(x)} \quad (3)$$
are the conditional distributions of x given y and of y given x.
The unconditional expectation of $y \sim f(y)$ is
$$E(y) = \int_y y f(y)\,dy. \quad (4)$$
The conditional expectation of y given x is
$$E(y|x) = \int_y y f(y|x)\,dy = \int_y y\,\frac{f(y, x)}{f(x)}\,dy. \quad (5)$$
The expectation of the conditional expectation is an unconditional expectation:
$$E\{E(y|x)\} = \int_x\left\{\int_y y\,\frac{f(y, x)}{f(x)}\,dy\right\}f(x)\,dx = \int_x\int_y y f(y, x)\,dy\,dx = \int_y y\left\{\int_x f(y, x)\,dx\right\}dy = \int_y y f(y)\,dy = E(y). \quad (6)$$
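The chain of equalities in (6) can be checked on a small discrete example, with sums standing in for the integrals. The joint distribution below is hypothetical:

```python
import numpy as np

# A hypothetical discrete joint distribution f(x, y) on a 2 x 3 grid,
# standing in for the integrals of equation (6).
f = np.array([[0.10, 0.20, 0.10],   # rows: values of x
              [0.25, 0.15, 0.20]])  # cols: values of y
y_vals = np.array([-1.0, 0.0, 2.0])

f_x = f.sum(axis=1)                    # marginal f(x), as in (2)
E_y = (y_vals * f.sum(axis=0)).sum()   # unconditional E(y), as in (4)

# Conditional expectations E(y|x) = sum_y y f(y,x)/f(x), as in (5)
E_y_given_x = (f * y_vals).sum(axis=1) / f_x

# Law of iterated expectations, (6): E{E(y|x)} = E(y)
lie = (E_y_given_x * f_x).sum()
print(abs(lie - E_y))  # effectively zero
```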
The conditional expectation of y given x is the minimum-mean-squared-error predictor of y.

Proof. Let $\hat{y} = E(y|x)$ and let $\pi = \pi(x)$ be any other predictor. Then
$$E\{(y - \pi)^2\} = E\big[\{(y - \hat{y}) + (\hat{y} - \pi)\}^2\big] = E\{(y - \hat{y})^2\} + 2E\{(y - \hat{y})(\hat{y} - \pi)\} + E\{(\hat{y} - \pi)^2\}. \quad (7)$$
In the second term, there is
$$E\{(y - \hat{y})(\hat{y} - \pi)\} = \int_x\int_y (y - \hat{y})(\hat{y} - \pi)f(x, y)\,dy\,dx = \int_x\left\{\int_y (y - \hat{y})f(y|x)\,dy\right\}(\hat{y} - \pi)f(x)\,dx = 0, \quad (8)$$
since the inner integral is $E(y|x) - \hat{y} = 0$.

Therefore $E\{(y - \pi)^2\} = E\{(y - \hat{y})^2\} + E\{(\hat{y} - \pi)^2\} \geq E\{(y - \hat{y})^2\}$, and the
assertion is proved.
The error in predicting y is uncorrelated with x. The proof of this depends on showing that $E(\hat{y}x) = E(yx)$, where $\hat{y} = E(y|x)$:
$$E(\hat{y}x) = \int_x x E(y|x)f(x)\,dx = \int_x x\left\{\int_y y\,\frac{f(y, x)}{f(x)}\,dy\right\}f(x)\,dx = \int_x\int_y xy f(y, x)\,dy\,dx = E(xy). \quad (9)$$
The result can be expressed as $E\{(y - \hat{y})x\} = 0$.
This result can be used in deriving expressions for the parameters α and β of a linear regression of the form
$$E(y|x) = \alpha + \beta x, \quad (10)$$
from which an unconditional expectation is derived in the form of
$$E(y) = \alpha + \beta E(x). \quad (11)$$
The orthogonality of the prediction error implies that
$$0 = E\{(y - \hat{y})x\} = E\{(y - \alpha - \beta x)x\} = E(xy) - \alpha E(x) - \beta E(x^2). \quad (12)$$
In order to eliminate αE(x) from this expression, equation (11) is multiplied by E(x) and rearranged to give
$$\alpha E(x) = E(x)E(y) - \beta\{E(x)\}^2. \quad (13)$$
This is substituted into (12) to give
$$E(xy) - E(x)E(y) = \beta\big[E(x^2) - \{E(x)\}^2\big], \quad (14)$$
whence
$$\beta = \frac{E(xy) - E(x)E(y)}{E(x^2) - \{E(x)\}^2} = \frac{C(x, y)}{V(x)}. \quad (15)$$
The expression
$$\alpha = E(y) - \beta E(x) \quad (16)$$
comes directly from (11).

Observe that, by substituting (16) into (10), the following prediction-error equation for the conditional expectation is derived:
$$E(y|x) = E(y) + \beta\{x - E(x)\}. \quad (17)$$
Thus, the conditional expectation of y given x is obtained by adjusting the
unconditional expectation by some proportion of the error in predicting x by
its expected value.
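Equations (15) and (16) translate directly into sample moments. A minimal sketch, in which the simulated values α = 1 and β = 2 are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # assumed values: alpha = 1, beta = 2

# Moment-based formulae (15) and (16)
beta_hat = ((x * y).mean() - x.mean() * y.mean()) / ((x ** 2).mean() - x.mean() ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(round(float(alpha_hat), 2), round(float(beta_hat), 2))  # close to 1 and 2
```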
2. THE PARTITIONED REGRESSION MODEL
Consider taking a regression equation in the form of
$$y = [X_1\ \ X_2]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \quad (1)$$
Here $[X_1, X_2] = X$ and $[\beta_1', \beta_2']' = \beta$ are obtained by partitioning the matrix X and vector β of the equation $y = X\beta + \varepsilon$ in a conformable manner. The normal equations $X'X\beta = X'y$ can be partitioned likewise. Writing the equations without the surrounding matrix braces gives
$$X_1'X_1\beta_1 + X_1'X_2\beta_2 = X_1'y, \quad (2)$$
$$X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y. \quad (3)$$
From (2), we get the equation $X_1'X_1\beta_1 = X_1'(y - X_2\beta_2)$, which gives an expression for the leading subvector of $\hat\beta$:
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2). \quad (4)$$
To obtain an expression for $\hat\beta_2$, we must eliminate $\beta_1$ from equation (3). For this purpose, we multiply equation (2) by $X_2'X_1(X_1'X_1)^{-1}$ to give
$$X_2'X_1\beta_1 + X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = X_2'X_1(X_1'X_1)^{-1}X_1'y. \quad (5)$$
When the latter is taken from equation (3), we get
$$\big\{X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\big\}\beta_2 = X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y. \quad (6)$$
On defining
$$P_1 = X_1(X_1'X_1)^{-1}X_1', \quad (7)$$
we can rewrite (6) as
$$X_2'(I - P_1)X_2\beta_2 = X_2'(I - P_1)y, \quad (8)$$
whence
$$\hat\beta_2 = \big\{X_2'(I - P_1)X_2\big\}^{-1}X_2'(I - P_1)y. \quad (9)$$
Now let us investigate the effect that conditions of orthogonality amongst
the regressors have upon the ordinary least-squares estimates of the regression parameters. Consider a partitioned regression model, which can be written as
$$y = [X_1, X_2]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. \quad (10)$$
It can be assumed that the variables in this equation are in deviation form. Imagine that the columns of $X_1$ are orthogonal to the columns of $X_2$ such that $X_1'X_2 = 0$. This is the same as assuming that the empirical correlation between variables in $X_1$ and variables in $X_2$ is zero.

The effect upon the ordinary least-squares estimator can be seen by examining the partitioned form of the formula $\hat\beta = (X'X)^{-1}X'y$. Here we have
$$X'X = \begin{bmatrix}X_1'\\ X_2'\end{bmatrix}[X_1\ \ X_2] = \begin{bmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{bmatrix} = \begin{bmatrix}X_1'X_1 & 0\\ 0 & X_2'X_2\end{bmatrix}, \quad (11)$$
where the final equality follows from the condition of orthogonality. The inverse of the partitioned form of $X'X$ in the case of $X_1'X_2 = 0$ is
$$(X'X)^{-1} = \begin{bmatrix}X_1'X_1 & 0\\ 0 & X_2'X_2\end{bmatrix}^{-1} = \begin{bmatrix}(X_1'X_1)^{-1} & 0\\ 0 & (X_2'X_2)^{-1}\end{bmatrix}. \quad (12)$$
We also have
$$X'y = \begin{bmatrix}X_1'\\ X_2'\end{bmatrix}y = \begin{bmatrix}X_1'y\\ X_2'y\end{bmatrix}. \quad (13)$$
On combining these elements, we find that
$$\begin{bmatrix}\hat\beta_1\\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}(X_1'X_1)^{-1} & 0\\ 0 & (X_2'X_2)^{-1}\end{bmatrix}\begin{bmatrix}X_1'y\\ X_2'y\end{bmatrix} = \begin{bmatrix}(X_1'X_1)^{-1}X_1'y\\ (X_2'X_2)^{-1}X_2'y\end{bmatrix}. \quad (14)$$
In this special case, the coefficients of the regression of y on $X = [X_1, X_2]$ can be obtained from the separate regressions of y on $X_1$ and of y on $X_2$.

It should be understood that this result does not hold true in general. The general formulae for $\hat\beta_1$ and $\hat\beta_2$ are those which we have given already under (4) and (9):
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2), \qquad \hat\beta_2 = \big\{X_2'(I - P_1)X_2\big\}^{-1}X_2'(I - P_1)y, \qquad P_1 = X_1(X_1'X_1)^{-1}X_1'. \quad (15)$$
It can be confirmed easily that these formulae do specialise to those under (14) in the case of $X_1'X_2 = 0$.
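The agreement between the general formulae under (15) and the full OLS solution can be checked numerically. A sketch with randomly generated regressors (the names X1, X2 and the dimensions are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k1, k2 = 60, 2, 3
X1 = rng.standard_normal((T, k1))
X2 = rng.standard_normal((T, k2))
X = np.hstack([X1, X2])
y = rng.standard_normal(T)

# Full OLS: solve the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# The formulae of (15), with P1 = X1 (X1'X1)^{-1} X1'
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(T) - P1
b2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)
b1 = np.linalg.solve(X1.T @ X1, X1.T @ (y - X2 @ b2))

print(np.allclose(np.concatenate([b1, b2]), b))  # True
```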
The purpose of including $X_2$ in the regression equation when, in fact, interest is confined to the parameters of $\beta_1$ is to avoid falsely attributing the explanatory power of the variables of $X_2$ to those of $X_1$.

Let us investigate the effects of erroneously excluding $X_2$ from the regression. In that case, the estimate will be
$$\tilde\beta_1 = (X_1'X_1)^{-1}X_1'y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon. \quad (16)$$
On applying the expectations operator to these equations, we find that
$$E(\tilde\beta_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2, \quad (17)$$
since $E\{(X_1'X_1)^{-1}X_1'\varepsilon\} = (X_1'X_1)^{-1}X_1'E(\varepsilon) = 0$. Thus, in general, we have $E(\tilde\beta_1) \neq \beta_1$, which is to say that $\tilde\beta_1$ is a biased estimator. The only circumstances in which the estimator will be unbiased are when either $X_1'X_2 = 0$ or $\beta_2 = 0$. In other circumstances, the estimator will suffer from a problem which is commonly described as omitted-variables bias.
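The bias formula of (17) can be illustrated by simulation. The design below, with X2 correlated with X1 and assumed parameter values, is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
X1 = rng.standard_normal((T, 2))
# X2 is correlated with X1, so the bias term is nonzero
X2 = X1 @ np.array([[0.5], [0.3]]) + rng.standard_normal((T, 1))
beta1, beta2 = np.array([1.0, -1.0]), np.array([2.0])

# The bias term of equation (17), computed for the fixed regressors
bias = np.linalg.solve(X1.T @ X1, X1.T @ (X2 @ beta2))

# Average the short-regression estimator over many draws of epsilon
reps = 2000
est = np.zeros(2)
for _ in range(reps):
    y = X1 @ beta1 + X2 @ beta2 + rng.standard_normal(T)
    est += np.linalg.solve(X1.T @ X1, X1.T @ y)
est /= reps

print(np.max(np.abs(est - beta1 - bias)) < 0.05)  # the simulation matches (17)
```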
We need to ask whether it matters that the estimated regression parameters are biased. The answer depends upon the use to which we wish to put the
estimated regression equation. The issue is whether the equation is to be used
simply for predicting the values of the dependent variable y or whether it is to
be used for some kind of structural analysis.
If the regression equation purports to describe a structural or a behavioral
relationship within the economy, and if some of the explanatory variables on
the RHS are destined to become the instruments of an economic policy, then
it is important to have unbiased estimators of the associated parameters. For
these parameters indicate the leverage of the policy instruments. Examples of
such instruments are provided by interest rates, tax rates, exchange rates and
the like.
On the other hand, if the estimated regression equation is to be viewed solely as a predictive device, that is to say, if it is simply an estimate of the function $E(y|x_1, \ldots, x_k)$ which specifies the conditional expectation of y given the values of $x_1, \ldots, x_k$, then, provided that the underlying statistical mechanism which has generated these variables is preserved, the question of the unbiasedness of the regression estimates does not arise.
3. HYPOTHESES CONCERNING SUBSETS OF THE REGRESSION COEFFICIENTS
Consider a set of linear restrictions on the vector β of a classical linear regression model $N(y; X\beta, \sigma^2 I)$ which take the form of
$$R\beta = r, \quad (1)$$
where R is a matrix of order $j \times k$ and of rank j, which is to say that the j restrictions are independent of each other and are fewer in number than the parameters within β. We know that the ordinary least-squares estimator of β is a normally distributed vector $\hat\beta \sim N\{\beta, \sigma^2(X'X)^{-1}\}$. It follows that
$$R\hat\beta \sim N\{R\beta = r,\ \sigma^2 R(X'X)^{-1}R'\}; \quad (2)$$
and, from this, we can immediately infer that
$$\frac{(R\hat\beta - r)'\{R(X'X)^{-1}R'\}^{-1}(R\hat\beta - r)}{\sigma^2} \sim \chi^2(j). \quad (3)$$
We have already established the result that
$$\frac{(T - k)\hat\sigma^2}{\sigma^2} = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{\sigma^2} \sim \chi^2(T - k) \quad (4)$$
is a chi-square variate which is statistically independent of the chi-square variate
$$\frac{(\hat\beta - \beta)'X'X(\hat\beta - \beta)}{\sigma^2} \sim \chi^2(k) \quad (5)$$
derived from the estimator of the regression parameters. The variate of (4) must also be independent of the chi-square of (3); and it is straightforward to deduce that
$$F = \frac{(R\hat\beta - r)'\{R(X'X)^{-1}R'\}^{-1}(R\hat\beta - r)/j}{(y - X\hat\beta)'(y - X\hat\beta)/(T - k)} = \frac{(R\hat\beta - r)'\{R(X'X)^{-1}R'\}^{-1}(R\hat\beta - r)}{\hat\sigma^2 j} \sim F(j, T - k), \quad (6)$$
which is to say that the ratio of the two independent chi-square variates, divided by their respective degrees of freedom, is an F statistic. This statistic, which embodies only known and observable quantities, can be used in testing the validity of the hypothesised restrictions $R\beta = r$.
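The statistic of (6) can be computed directly with matrix operations. In the sketch below, the data and the restrictions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 50, 3
X = rng.standard_normal((T, k))
beta = np.array([1.0, 2.0, 3.0])
y = X @ beta + rng.standard_normal(T)

# Hypothetical restrictions R beta = r: beta_1 + beta_2 = 3 and beta_3 = 3
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.array([3.0, 3.0])
j = R.shape[0]

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
e = y - X @ b
s2 = e @ e / (T - k)                     # unbiased estimate of sigma^2

# Equation (6): F = (Rb - r)'{R(X'X)^{-1}R'}^{-1}(Rb - r) / (s2 * j)
mid = R @ np.linalg.inv(X.T @ X) @ R.T
F = (R @ b - r) @ np.linalg.solve(mid, R @ b - r) / (s2 * j)
print(F >= 0.0)  # a valid F(j, T - k) draw under the hypothesis
```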
A specialisation of the statistic under (6) can also be used in testing an hypothesis concerning a subset of the elements of the vector β. Let $\beta = [\beta_1', \beta_2']'$. Then the condition that the subvector $\beta_1$ assumes the value of $\beta_1^*$ can be expressed via the equation
$$[I_{k_1}, 0]\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix} = \beta_1^*. \quad (7)$$
This can be construed as a case of the equation $R\beta = r$, where $R = [I_{k_1}, 0]$ and $r = \beta_1^*$.

In order to discover the specialised form of the requisite test statistic, let us consider the following partitioned form of an inverse matrix:
$$(X'X)^{-1} = \begin{bmatrix}X_1'X_1 & X_1'X_2\\ X_2'X_1 & X_2'X_2\end{bmatrix}^{-1} = \begin{bmatrix}\{X_1'(I - P_2)X_1\}^{-1} & -\{X_1'(I - P_2)X_1\}^{-1}X_1'X_2(X_2'X_2)^{-1}\\ -\{X_2'(I - P_1)X_2\}^{-1}X_2'X_1(X_1'X_1)^{-1} & \{X_2'(I - P_1)X_2\}^{-1}\end{bmatrix}. \quad (8)$$
Then, with $R = [I, 0]$, we find that
$$R(X'X)^{-1}R' = \{X_1'(I - P_2)X_1\}^{-1}. \quad (9)$$
It follows in a straightforward manner that the specialised form of the F statistic of (6) is
$$F = \frac{(\hat\beta_1 - \beta_1^*)'X_1'(I - P_2)X_1(\hat\beta_1 - \beta_1^*)/k_1}{(y - X\hat\beta)'(y - X\hat\beta)/(T - k)} = \frac{(\hat\beta_1 - \beta_1^*)'X_1'(I - P_2)X_1(\hat\beta_1 - \beta_1^*)}{\hat\sigma^2 k_1} \sim F(k_1, T - k). \quad (10)$$
By specialising the expression under (10), a statistic may be derived for testing the hypothesis that $\beta_i = \beta_i^*$, concerning a single element:
$$F = \frac{(\hat\beta_i - \beta_i^*)^2}{\hat\sigma^2 w_{ii}}. \quad (11)$$
Here, $w_{ii}$ stands for the ith diagonal element of $(X'X)^{-1}$. If the hypothesis is true, then this statistic will have an $F(1, T - k)$ distribution.
However, the usual way of testing such an hypothesis is to use
$$t = \frac{\hat\beta_i - \beta_i^*}{\sqrt{\hat\sigma^2 w_{ii}}} \quad (12)$$
in conjunction with the tables of the $t(T - k)$ distribution. The t statistic shows the direction in which the estimate of $\beta_i$ deviates from the hypothesised value as well as the size of the deviation.
4. REGRESSIONS ON TRIGONOMETRICAL FUNCTIONS:
DISCRETE-TIME FOURIER ANALYSIS
An example of orthogonal regressors is a Fourier analysis, where the explanatory variables are sampled from a set of trigonometric functions with angular
velocities, called Fourier frequencies, that are evenly distributed in an interval
from zero to π radians per sample period.
If the sample is indexed by t = 0, 1, . . . , T − 1, then the Fourier frequencies
are ωj = 2πj/T ; j = 0, 1, . . . , [T /2], where [T /2] denotes the integer quotient of
the division of T by 2.
The object of a Fourier analysis is to express the elements of the sample as a weighted sum of sine and cosine functions as follows:
$$y_t = \alpha_0 + \sum_{j=1}^{[T/2]}\{\alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t)\}; \quad t = 0, 1, \ldots, T - 1. \quad (1)$$
The vectors of the generic trigonometric regressors may be denoted by
$$c_j = [c_{0j}, c_{1j}, \ldots, c_{T-1,j}]' \quad \text{and} \quad s_j = [s_{0j}, s_{1j}, \ldots, s_{T-1,j}]', \quad (3)$$
where $c_{tj} = \cos(\omega_j t)$ and $s_{tj} = \sin(\omega_j t)$. The vectors of the ordinates of functions of different frequencies are mutually orthogonal. Therefore, the following orthogonality conditions hold:
$$c_i'c_j = s_i's_j = 0 \ \text{if}\ i \neq j, \quad \text{and} \quad c_i's_j = 0 \ \text{for all}\ i, j. \quad (4)$$
In addition, there are some sums of squares which can be taken into account in computing the coefficients of the Fourier decomposition:
$$c_0'c_0 = \iota'\iota = T, \qquad s_0's_0 = 0, \qquad c_j'c_j = s_j's_j = T/2 \ \text{for}\ j = 1, \ldots, [(T - 1)/2]. \quad (5)$$
When $T = 2n$, there is $\omega_n = \pi$ and there is also
$$s_n's_n = 0 \quad \text{and} \quad c_n'c_n = T. \quad (6)$$
The "regression" formulae for the Fourier coefficients can now be given. First, there is
$$\alpha_0 = (\iota'\iota)^{-1}\iota'y = \frac{1}{T}\sum_t y_t = \bar{y}. \quad (7)$$
Then, for $j = 1, \ldots, [(T - 1)/2]$, there are
$$\alpha_j = (c_j'c_j)^{-1}c_j'y = \frac{2}{T}\sum_t y_t\cos(\omega_j t) \quad (8)$$
and
$$\beta_j = (s_j's_j)^{-1}s_j'y = \frac{2}{T}\sum_t y_t\sin(\omega_j t). \quad (9)$$
If $T = 2n$ is even, then there is no coefficient $\beta_n$ and there is
$$\alpha_n = (c_n'c_n)^{-1}c_n'y = \frac{1}{T}\sum_t (-1)^t y_t. \quad (10)$$
By pursuing the analogy of multiple regression, it can be seen, in view of
the orthogonality relationships, that there is a complete decomposition of the sum of squares of the elements of the vector y:
$$y'y = \alpha_0^2\,\iota'\iota + \sum_{j=1}^{[T/2]}\big\{\alpha_j^2\,c_j'c_j + \beta_j^2\,s_j's_j\big\}. \quad (11)$$
Now consider writing $\alpha_0^2\,\iota'\iota = \bar{y}^2\iota'\iota = \bar{y}'\bar{y}$, where $\bar{y} = [\bar{y}, \bar{y}, \ldots, \bar{y}]'$ is a vector whose repeated element is the sample mean $\bar{y}$. It follows that $y'y - \alpha_0^2\,\iota'\iota = y'y - \bar{y}'\bar{y} = (y - \bar{y})'(y - \bar{y})$. Then, in the case where $T = 2n$ is even, the equation can be written as
$$(y - \bar{y})'(y - \bar{y}) = \frac{T}{2}\sum_{j=1}^{n-1}\big(\alpha_j^2 + \beta_j^2\big) + T\alpha_n^2 = \frac{T}{2}\sum_{j=1}^{n}\rho_j^2, \quad (12)$$
where $\rho_j^2 = \alpha_j^2 + \beta_j^2$ for $j = 1, \ldots, n - 1$ and $\rho_n^2 = 2\alpha_n^2$. A similar expression exists when T is odd, with the exceptions that $\alpha_n$ is missing and that the summation runs to $(T - 1)/2$.

It follows that the variance of the sample can be expressed as
$$\frac{1}{T}\sum_{t=0}^{T-1}(y_t - \bar{y})^2 = \frac{1}{2}\sum_{j=1}^{n}\big(\alpha_j^2 + \beta_j^2\big). \quad (13)$$
The proportion of the variance that is attributable to the component at frequency $\omega_j$ is $(\alpha_j^2 + \beta_j^2)/2 = \rho_j^2/2$, where $\rho_j$ is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the sample size T, and, if there are no regular harmonic components in the underlying process, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases.
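The coefficient formulae (7) to (10) and the decomposition (12) can be verified on a short sample. A sketch with T = 16, where the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 16                     # T = 2n is even, so omega_n = pi is included
n = T // 2
t = np.arange(T)
y = rng.standard_normal(T)

# Coefficients via the "regression" formulae (7)-(10)
a0 = y.mean()
alpha = np.array([(2.0 / T) * (y * np.cos(2 * np.pi * j * t / T)).sum()
                  for j in range(1, n)])
beta = np.array([(2.0 / T) * (y * np.sin(2 * np.pi * j * t / T)).sum()
                 for j in range(1, n)])
a_n = (1.0 / T) * ((-1.0) ** t * y).sum()       # equation (10)

# Decomposition (12): (y - ybar)'(y - ybar) = (T/2) sum(alpha^2 + beta^2) + T a_n^2
lhs = ((y - a0) ** 2).sum()
rhs = (T / 2.0) * (alpha ** 2 + beta ** 2).sum() + T * a_n ** 2
print(abs(lhs - rhs))  # effectively zero

# The sample is recovered exactly by the synthesis of equation (1)
y_hat = a0 + sum(alpha[j - 1] * np.cos(2 * np.pi * j * t / T) +
                 beta[j - 1] * np.sin(2 * np.pi * j * t / T)
                 for j in range(1, n)) + a_n * np.cos(np.pi * t)
print(np.allclose(y_hat, y))  # True
```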
If there is a regular component, then we can expect the variance attributable to it to converge to a finite value as the sample size increases.
In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (13) by a factor of T. The graph of the function $I(\omega_j) = (T/2)(\alpha_j^2 + \beta_j^2)$ is known as the periodogram.
We should note that the trigonometric functions that underlie the periodogram are defined over an infinite index set $\{t = 0, \pm 1, \pm 2, \ldots\}$ comprising all positive and negative integers. As the index t progresses through this sequence, equation (1) will generate successive replications of the data set that is defined on the points $t = 0, 1, \ldots, T - 1$. The result is described as the periodic extension of the data.

Since the trigonometric functions are both periodic and strictly bounded, they cannot be used to synthesise a perpetual trend. Therefore, before a periodogram analysis can be applied effectively, it is necessary to ensure that the data sequence is free of trend. A failure to detrend the data will lead to successive disjunctions in its periodic extension at the points where the beginning of one replication of the data set joins the end of the previous replication. The result will be a so-called saw-tooth function.

The typical periodogram of a saw-tooth function has the form of a rectangular hyperbola, which descends from a high point at the fundamental frequency of $\omega_1 = 2\pi/T$ and which reaches a low point in the vicinity of the limiting frequency of π. Within such a periodogram, the features of primary interest, which relate to the fluctuations that are superimposed upon the trend, may be virtually imperceptible; see Figure 5, for example.

[Figure 1. The plot of 132 monthly observations on the U.S. money supply, beginning in January 1960. A quadratic function has been interpolated through the data.]

[Figure 2. The periodogram of the residuals of the logarithmic money-supply data.]

The Periodogram and the Autocovariance Function
A stationary stochastic process can be characterised, equivalently, by its autocovariance function or its partial autocovariance function. It can also be characterised by its spectral density function, which is the Fourier transform of the autocovariances $\{\gamma_\tau;\ \tau = 0, \pm 1, \pm 2, \ldots\}$:
$$f(\omega) = \sum_{\tau=-\infty}^{\infty}\gamma_\tau\cos(\omega\tau) = \gamma_0 + 2\sum_{\tau=1}^{\infty}\gamma_\tau\cos(\omega\tau). \quad (14)$$
Here, $\omega \in [0, \pi]$ is an angular velocity, or frequency value, in radians per period.

The empirical counterpart of the spectral density function is the periodogram $I(\omega_j)$, which may be defined as
$$\frac{1}{2}I(\omega_j) = \sum_{\tau=1-T}^{T-1}c_\tau\cos(\omega_j\tau) = c_0 + 2\sum_{\tau=1}^{T-1}c_\tau\cos(\omega_j\tau), \quad (15)$$
where $\{c_\tau;\ \tau = 0, \pm 1, \ldots, \pm(T - 1)\}$, with
$$c_\tau = \frac{1}{T}\sum_{t=\tau}^{T-1}(y_t - \bar{y})(y_{t-\tau} - \bar{y}), \quad (16)$$
are the empirical autocovariances, and where $\omega_j = 2\pi j/T;\ j = 0, 1, \ldots, [T/2]$ are the Fourier frequencies.

We need to show that this definition of the periodogram is equivalent to the previous definition, which was based on the following frequency decomposition of the sample variance:
$$\frac{1}{T}\sum_{t=0}^{T-1}(y_t - \bar{y})^2 = \frac{1}{2}\sum_{j=0}^{[T/2]}\big(\alpha_j^2 + \beta_j^2\big), \quad (17)$$
where
$$\alpha_j = \frac{2}{T}\sum_t y_t\cos(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar{y})\cos(\omega_j t),$$
$$\beta_j = \frac{2}{T}\sum_t y_t\sin(\omega_j t) = \frac{2}{T}\sum_t (y_t - \bar{y})\sin(\omega_j t).$$
Substituting these into the term $T(\alpha_j^2 + \beta_j^2)/2$ gives the periodogram
$$I(\omega_j) = \frac{2}{T}\left[\left\{\sum_{t=0}^{T-1}\cos(\omega_j t)(y_t - \bar{y})\right\}^2 + \left\{\sum_{t=0}^{T-1}\sin(\omega_j t)(y_t - \bar{y})\right\}^2\right].$$
The quadratic terms may be expanded to give
$$I(\omega_j) = \frac{2}{T}\left[\sum_t\sum_s\cos(\omega_j t)\cos(\omega_j s)(y_t - \bar{y})(y_s - \bar{y}) + \sum_t\sum_s\sin(\omega_j t)\sin(\omega_j s)(y_t - \bar{y})(y_s - \bar{y})\right].$$
Since $\cos(A)\cos(B) + \sin(A)\sin(B) = \cos(A - B)$, this can be written as
$$I(\omega_j) = \frac{2}{T}\sum_t\sum_s\cos(\omega_j[t - s])(y_t - \bar{y})(y_s - \bar{y}).$$
On defining $\tau = t - s$ and writing $c_\tau = \frac{1}{T}\sum_t(y_t - \bar{y})(y_{t-\tau} - \bar{y})$, we can reduce the latter expression to
$$I(\omega_j) = 2\sum_{\tau=1-T}^{T-1}\cos(\omega_j\tau)c_\tau,$$
which is a Fourier transform of the empirical autocovariances.
5. FILTERING MACROECONOMIC DATA
Wiener–Kolmogorov Filtering of Stationary Sequences
The classical theory of linear filtering was formulated independently by Norbert Wiener (1941) and Andrei Nikolaevich Kolmogorov (1941) during the Second World War. They were both considering the problem of how to target radar-assisted anti-aircraft guns on incoming enemy aircraft.

The theory has found widespread application in analog and digital signal processing and in telecommunications in general. Also, it has provided a basic technique for the enhancement of recorded music.

The classical theory assumes that the data sequences are generated by stationary stochastic processes and that these are of sufficient length to justify the assumption that they constitute doubly-infinite sequences.

For econometrics, the theory must be adapted to cater to short trended sequences. Then, Wiener–Kolmogorov filters can be used to extract trends from economic data sequences and for generating seasonally-adjusted data.
Consider a vector y with a signal component ξ and a noise component η:
$$y = \xi + \eta. \quad (1)$$
These components are assumed to be independently normally distributed with zero means and with positive-definite dispersion matrices. Then,
$$E(\xi) = 0, \quad D(\xi) = \Omega_\xi, \qquad E(\eta) = 0, \quad D(\eta) = \Omega_\eta, \quad (2)$$
and $C(\xi, \eta) = 0$. A consequence of the independence of ξ and η is that
$$D(y) = \Omega_\xi + \Omega_\eta \quad \text{and} \quad C(\xi, y) = D(\xi) = \Omega_\xi. \quad (3)$$
The signal component is estimated by a linear transformation $x = \Psi_x y$ of the data vector that suppresses the noise component. Usually, the signal comprises low-frequency elements and the noise comprises elements of higher frequencies.
The Minimum Mean-Squared-Error Estimator

The principle of linear minimum-mean-squared-error estimation indicates that the error $\xi - x$ in representing ξ by x should be uncorrelated with the data in y:
$$0 = C(\xi - x, y) = C(\xi, y) - C(x, y) = C(\xi, y) - \Psi_x C(y, y) = \Omega_\xi - \Psi_x(\Omega_\xi + \Omega_\eta). \quad (4)$$
This indicates that the estimate is
$$x = \Psi_x y = \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}y. \quad (5)$$
The corresponding estimate of the noise component η is
$$h = \Psi_h y = \Omega_\eta(\Omega_\xi + \Omega_\eta)^{-1}y = \{I - \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}\}y. \quad (6)$$
It will be observed that $\Psi_x + \Psi_h = I$ and, therefore, that $x + h = y$.
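Equations (5) and (6) are a pair of complementary matrix operations. A minimal numerical sketch, in which arbitrary positive-definite matrices stand in for Ωξ and Ωη:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 8

# Assumed positive-definite dispersion matrices for signal and noise
A = rng.standard_normal((T, T)); omega_xi = A @ A.T + T * np.eye(T)
B = rng.standard_normal((T, T)); omega_eta = B @ B.T + np.eye(T)

y = rng.standard_normal(T)  # a stand-in data vector

# Equations (5) and (6): the signal and noise estimates
x = omega_xi @ np.linalg.solve(omega_xi + omega_eta, y)
h = omega_eta @ np.linalg.solve(omega_xi + omega_eta, y)

print(np.allclose(x + h, y))  # True: the two estimates add up to the data
```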
Conditional Expectations

In deriving the estimator, we might have used the formula for conditional expectations. In the case of two linearly related scalar random variables ξ and y, the conditional expectation of ξ given y is
$$E(\xi|y) = E(\xi) + \frac{C(\xi, y)}{V(y)}\{y - E(y)\}. \quad (7)$$
In the case of two vector quantities, this becomes
$$E(\xi|y) = E(\xi) + C(\xi, y)D^{-1}(y)\{y - E(y)\}. \quad (8)$$
By setting $C(\xi, y) = \Omega_\xi$ and $D(y) = \Omega_\xi + \Omega_\eta$, as in (3), and by setting $E(\xi) = E(y) = 0$, we get the expression that is to be found under (5):
$$x = \Omega_\xi(\Omega_\xi + \Omega_\eta)^{-1}y.$$

The Difference Operator and Polynomial Regression
The lag operator L, which is commonly defined in respect of a doubly-infinite sequence $x(t) = \{x_t;\ t = 0, \pm 1, \pm 2, \ldots\}$, has the effect that $Lx(t) = x(t - 1)$. The (backwards) difference operator $\nabla = 1 - L$ has the effect that $\nabla x(t) = x(t) - x(t - 1)$. It serves to reduce a constant function to zero and to reduce a linear function to a constant. The second-order or twofold difference operator
$$\nabla^2 = 1 - 2L + L^2$$
is effective in reducing a linear function to zero.

A difference operator $\nabla^d$ of order d is commonly employed in the context of an ARIMA(p, d, q) model to reduce the data to stationarity. Then, the differenced data can be modelled by an ARMA(p, q) process. In such circumstances, the difference operator takes the form of a matrix transformation.
[Figure 3. The squared gain of the difference operator, which has a zero at zero frequency, and the squared gain of the summation operator, which is unbounded at zero frequency.]

The Matrix Difference Operator
The matrix analogue of the second-order difference operator in the case of T = 5, for example, is given by
$$\nabla_5^2 = \begin{bmatrix}Q_*'\\ Q'\end{bmatrix} = \begin{bmatrix}1 & 0 & 0 & 0 & 0\\ -2 & 1 & 0 & 0 & 0\\ 1 & -2 & 1 & 0 & 0\\ 0 & 1 & -2 & 1 & 0\\ 0 & 0 & 1 & -2 & 1\end{bmatrix}. \quad (9)$$
The first two rows, which do not produce true differences, are liable to be discarded.
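The matrix of (9) is easy to build, and discarding its first two rows leaves the operator Q' that reduces any linear function to zero. A sketch:

```python
import numpy as np

T = 5
# Second-order difference matrix for T = 5, as in equation (9)
D2 = np.eye(T) - 2 * np.eye(T, k=-1) + np.eye(T, k=-2)

# Discard the first two rows, which do not produce true differences:
# Q' is of order (T - 2) x T
Qp = D2[2:, :]

t = np.arange(T, dtype=float)
trend = 3.0 + 2.0 * t            # an arbitrary linear function
print(Qp @ trend)                # prints [0. 0. 0.]
```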
The difference operator nullifies data elements at zero frequency and it severely attenuates those at the adjacent frequencies. This is a disadvantage when the low-frequency elements are of primary interest. Another way of detrending the data is to fit a polynomial trend by least-squares regression and to take the residual sequence as the detrended data.

Polynomial Regression

Using the matrix Q defined above, we can represent the vector of the ordinates of a linear trend line interpolated through the data sequence as
$$x = y - Q(Q'Q)^{-1}Q'y. \quad (10)$$
The vector of the residuals is
$$e = Q(Q'Q)^{-1}Q'y. \quad (11)$$
Observe that this vector contains exactly the same information as the
differenced vector $g = Q'y$. However, whereas the low-frequency structure of the data is invisible in the periodogram of the latter, it is entirely visible in the periodogram of the residuals.

[Figure 4. The quarterly series of the logarithms of consumption in the U.K., for the years 1955 to 1994, together with a linear trend interpolated by least-squares regression.]

[Figure 5. The periodogram of the trended logarithmic data.]

[Figure 6. The periodogram of the differenced logarithmic consumption data.]

[Figure 7. The periodogram of the residual sequence obtained from the linear detrending of the logarithmic consumption data.]
Filters for Short Trended Sequences

Applying Q' to the equation y = ξ + η, representing the trended data, gives
$$Q'y = Q'\xi + Q'\eta = \delta + \kappa = g. \quad (12)$$
The vectors of the expectations and the dispersion matrices of the differenced vectors are
$$E(\delta) = 0, \quad D(\delta) = \Omega_\delta = Q'D(\xi)Q; \qquad E(\kappa) = 0, \quad D(\kappa) = \Omega_\kappa = Q'D(\eta)Q. \quad (13)$$
The difficulty of estimating the trended vector ξ = y − η directly is that some starting values or initial conditions are required in order to define the value at time t = 0. However, since η is from a stationary mean-zero process, it requires only zero-valued initial conditions. Therefore, the starting-value problem can be circumvented by concentrating on the estimation of η.

The conditional expectation of η, given the differenced data $g = Q'y$, is provided by the formula
$$h = E(\eta|g) = E(\eta) + C(\eta, g)D^{-1}(g)\{g - E(g)\} = C(\eta, g)D^{-1}(g)g, \quad (14)$$
where the second equality follows in view of the zero-valued expectations. Within this expression, there are
$$D(g) = \Omega_\delta + Q'\Omega_\eta Q \quad \text{and} \quad C(\eta, g) = \Omega_\eta Q. \quad (15)$$

[Figure 8. The gain of the Hodrick–Prescott lowpass filter with a smoothing parameter set to 100, 1,600 and 14,400.]

Putting these details into (14) gives the following estimate of η:
$$h = \Omega_\eta Q(\Omega_\delta + Q'\Omega_\eta Q)^{-1}Q'y. \quad (16)$$
Putting this into the equation x = y − h gives
$$x = y - \Omega_\eta Q(\Omega_\delta + Q'\Omega_\eta Q)^{-1}Q'y. \quad (17)$$

The Leser (H–P) Filter
We now consider two specific cases of the Wiener–Kolmogorov filter. First, there is the Leser or Hodrick–Prescott (H–P) filter. This can be derived from a model that supposes that the signal is generated by an integrated (second-order) random walk and that the noise is from a white-noise process.

The random-walk process is reduced to a white-noise process δ(t) by taking twofold differences. Thus, $(1 - L)^2\xi(t) = \delta(t)$, and the corresponding equation for the sample is $Q'\xi = \delta$. Accordingly, the filter is derived by setting
$$D(\eta) = \Omega_\eta = \sigma_\eta^2 I, \qquad D(\delta) = \Omega_\delta = \sigma_\delta^2 I \qquad \text{and} \qquad \lambda = \frac{\sigma_\eta^2}{\sigma_\delta^2} \quad (18)$$
within (17) to give
$$x = y - Q(\lambda^{-1}I + Q'Q)^{-1}Q'y. \quad (19)$$
Here, λ is the so-called smoothing parameter. It will be observed that, as λ → ∞, the vector x tends to that of a linear function interpolated into the data by least-squares regression, which is represented by equation (10):
$$x = y - Q(Q'Q)^{-1}Q'y.$$
x = y − Q(Q Q)−1 Q y . 17 D.S.G. POLLOCK: TOPICS IN ECONOMETRICS 2011 1
0.75
0.5
0.25
0
[Figure 9. The gain of the lowpass Butterworth filters of orders n = 6 and n = 12 with a nominal cutoff point of 2π/3 radians.]

Figure 8 depicts the frequency response of the lowpass H–P filter for various
values of the smoothing parameter λ. The innermost proﬁle corresponds to the
highest value of the parameter, and it represents a ﬁlter that transmits only
the data elements of lowest frequency.
For all values of λ, the response of the H–P ﬁlter shows a gradual transition
from the pass band, which corresponds to the frequencies that are transmitted
by the ﬁlter, to the stop band, which corresponds to the frequencies that are
impeded.
Often, there is a requirement for a more rapid transition as well as a need to control the location in frequency where the transition occurs. These needs
can be served by the Butterworth ﬁlter, which is more amenable to adjustment.
The Butterworth Filter

The Butterworth filter can be derived from an heuristic model in which the signal and the noise are generated by processes that are described, respectively, by the equations
$$(1 - L)^2\xi(t) = (1 + L)^n\zeta(t) \quad \text{and} \quad (1 - L)^2\eta(t) = (1 - L)^n\varepsilon(t),$$
where ζ(t) and ε(t) are mutually independent white-noise processes.

The filter that is appropriate to short trended sequences can be represented by the equation
$$x = y - \lambda\Sigma Q(M + \lambda Q'\Sigma Q)^{-1}Q'y. \quad (20)$$
Here, the matrices are
$$\Sigma = \{2I_T - (L_T + L_T')\}^{n-2} \quad \text{and} \quad M = \{2I_T + (L_T + L_T')\}^n, \quad (21)$$
where $L_T$ is a matrix of order T with units on the first subdiagonal; and it can be verified that
$$Q'\Sigma Q = \{2I_T - (L_T + L_T')\}^n. \quad (22)$$
Figure 9 shows the frequency response of the Butterworth filter for various values of n and for a specific cutoff frequency, which is determined by the parameter λ. The greater the value of n, the more rapid is the transition from pass band to stop band.
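Identity (22) can be checked numerically. With matrices of finite order, we read the right-hand side as the central (T − 2) × (T − 2) block of {2I_T − (L_T + L_T')}^n; under that interpretation (ours), the identity is exact:

```python
import numpy as np
from numpy.linalg import matrix_power

T, n = 9, 4
K = 2 * np.eye(T) - (np.eye(T, k=-1) + np.eye(T, k=1))   # 2I - (L + L')

D2 = np.eye(T) - 2 * np.eye(T, k=-1) + np.eye(T, k=-2)
Qp = D2[2:, :]                       # Q', the (T - 2) x T twofold-difference operator

Sigma = matrix_power(K, n - 2)       # Sigma = {2I - (L + L')}^(n-2), as in (21)

lhs = Qp @ Sigma @ Qp.T              # Q' Sigma Q
rhs = matrix_power(K, n)[1:-1, 1:-1] # central block of {2I - (L + L')}^n
print(np.allclose(lhs, rhs))  # True
```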
LAGGED DEPENDENT VARIABLES AND
ERROR CORRECTION MECHANISMS
A regression equation can also be set in motion by including lagged values of the dependent variable on the RHS. With one lagged value, we get
$$y(t) = \phi y(t - 1) + \beta x(t) + \varepsilon(t). \quad (1)$$
In terms of the lag operator, this is
$$(1 - \phi L)y(t) = \beta x(t) + \varepsilon(t), \quad (2)$$
of which the rational form is
$$y(t) = \frac{\beta}{1 - \phi L}x(t) + \frac{1}{1 - \phi L}\varepsilon(t). \quad (3)$$
The advantage of equation (1) is that it is amenable to estimation by
ordinary least-squares regression. Although the estimates will be biased in
ﬁnite samples, they will be consistent, if the model is correctly speciﬁed.
The disadvantage is the restrictive assumption that the systematic and
disturbance parts have the same dynamics.
Partial Adjustment and Adaptive Expectations

A simple partial-adjustment model has the form
$$y(t) = \lambda\gamma x(t) + (1 - \lambda)y(t - 1) + \varepsilon(t). \quad (4)$$
If y(t) is current consumption and x(t) is disposable income, then $\gamma x(t) = y^*(t)$ is "desired" consumption. If habits of consumption persist, then current consumption will be a weighted combination of the previous consumption and the present desired consumption.

The weights of the combination depend on the partial-adjustment parameter λ ∈ (0, 1]. If λ = 1, then the consumers adjust their consumption instantaneously to the desired value. As λ → 0, their consumption habits become increasingly persistent.

When the notation λγ = β and (1 − λ) = φ is adopted, equation (4) becomes identical to equation (1), which is the regression model with a lagged dependent variable. Observe also that γ = β/(1 − φ) is a "long-term multiplier" in the relationship between income and consumption. It can also be described as the steady-state gain of the transfer function from x(t) to y(t), which is depicted in equation (3).
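A sketch of the estimation that the text describes: equation (1) is simulated with assumed values φ = 0.8 and β = 0.5, fitted by OLS, and the implied long-term multiplier γ = β/(1 − φ) = 2.5 is recovered:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 5000                       # a long sample, since OLS is only consistent here
phi, beta = 0.8, 0.5           # assumed true values; gamma = beta/(1 - phi) = 2.5

x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + beta * x[t] + rng.standard_normal()

# OLS of y(t) on y(t-1) and x(t), as in equation (1)
Z = np.column_stack([y[:-1], x[1:]])
phi_hat, beta_hat = np.linalg.lstsq(Z, y[1:], rcond=None)[0]

gamma_hat = beta_hat / (1.0 - phi_hat)   # estimated long-term multiplier
print(abs(gamma_hat - 2.5) < 0.3)        # close to the true value
```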
Error-Correction Forms and Nonstationary Signals
The usual linear regression procedures presuppose that the relevant data
moments will converge asymptotically to ﬁxed limits as the sample size increases. This cannot happen if the data are trended, in which case, the standard
techniques of statistical inference will not be applicable.
A common approach is to subject the data to as many diﬀerencing operations as may be required to achieve stationarity. However, diﬀerencing tends to
remove some of the essential information regarding the behaviour of economic
agents. Moreover, it is often discovered that the regression model loses much of its explanatory power when the differences of the data are used instead.

In such circumstances, one might use the so-called error-correction model. The model depicts a mechanism whereby two trended economic variables maintain an enduring long-term proportionality with each other.

The data sequences comprised by the model are stationary, either individually or in an appropriate combination; and this enables us to apply the standard procedures of statistical inference that are appropriate to models comprising data from stationary processes.
Consider taking y(t − 1) from both sides of the equation of (1), which represents the first-order dynamic model. This gives
$$\nabla y(t) = y(t) - y(t - 1) = (\phi - 1)y(t - 1) + \beta x(t) + \varepsilon(t) = (1 - \phi)\left\{\frac{\beta}{1 - \phi}x(t) - y(t - 1)\right\} + \varepsilon(t) = \lambda\{\gamma x(t) - y(t - 1)\} + \varepsilon(t), \quad (5)$$
where λ = 1 − φ and where γ is the gain of the transfer function from x(t) to y(t) defined under (3). This is the so-called error-correction form of the equation; and it indicates that the change in y(t) is a function of the extent to which the proportions of the series x(t) and y(t − 1) differ from those which would prevail in the steady state.

The error-correction form provides the basis for estimating the parameters of the model when the signal series x(t) is trended or nonstationary. A pair of nonstationary series that maintain a long-run proportionality are said to be cointegrated. It is easy to obtain an accurate estimate of γ, which is the coefficient of proportionality, simply by running a regression of y(t − 1) on x(t).
Once a value for γ is available, the remaining parameter λ may be estimated by regressing ∇y(t) upon the composite variable {γx(t) − y(t − 1)}. However, if the error-correction model is an unrestricted reparametrisation of an original model in levels, then its parameters can be estimated by ordinary least-squares regression. The same estimates can also be inferred from the least-squares estimates of the parameters of the original model in levels.
To see how to derive an error-correction form for a more general autoregressive distributed-lag model, consider the second-order model:
$$y(t) = \phi_1 y(t - 1) + \phi_2 y(t - 2) + \beta_0 x(t) + \beta_1 x(t - 1) + \varepsilon(t). \quad (6)$$
The part $\phi_1 y(t - 1) + \phi_2 y(t - 2)$, comprising the lagged dependent variables, can be reparametrised as follows:
$$[\phi_1 \;\; \phi_2]\begin{bmatrix}1 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}1 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}y(t - 1)\\ y(t - 2)\end{bmatrix} = [\theta \;\; \rho]\begin{bmatrix}y(t - 1)\\ \nabla y(t - 1)\end{bmatrix}. \quad (7)$$
Here, the matrix that postmultiplies the row vector of the parameters is the inverse of the matrix that premultiplies the column vector of the variables.

The sum $\beta_0 x(t) + \beta_1 x(t - 1)$ can be reparametrised to become
$$[\beta_0 \;\; \beta_1]\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}0 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}x(t)\\ x(t - 1)\end{bmatrix} = [\kappa \;\; \delta]\begin{bmatrix}x(t - 1)\\ \nabla x(t)\end{bmatrix}. \quad (8)$$
It follows that equation (6) can be recast in the form of
$$y(t) = \theta y(t - 1) + \rho\nabla y(t - 1) + \kappa x(t - 1) + \delta\nabla x(t) + \varepsilon(t). \quad (9)$$
Taking y(t − 1) from both sides of this equation and rearranging it gives
$$\nabla y(t) = (1 - \theta)\left\{\frac{\kappa}{1 - \theta}x(t - 1) - y(t - 1)\right\} + \rho\nabla y(t - 1) + \delta\nabla x(t) + \varepsilon(t) = \lambda\{\gamma x(t - 1) - y(t - 1)\} + \rho\nabla y(t - 1) + \delta\nabla x(t) + \varepsilon(t). \quad (10)$$
This is an elaboration of equation (5); and it includes the differenced sequences ∇y(t − 1) and ∇x(t). These are deemed to be stationary, as is the composite error sequence $\gamma x(t - 1) - y(t - 1)$.
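The reparametrisations of (7) and (8) imply θ = φ1 + φ2, ρ = −φ2, κ = β0 + β1 and δ = β0. A quick check, on arbitrary values, that (9) reproduces the terms of (6):

```python
import numpy as np

rng = np.random.default_rng(8)
phi1, phi2 = 0.5, -0.3
beta0, beta1 = 1.2, 0.4

# The reparametrised coefficients implied by (7) and (8)
theta, rho = phi1 + phi2, -phi2
kappa, delta = beta0 + beta1, beta0

# Arbitrary values for y(t-1), y(t-2), x(t), x(t-1)
y1, y2 = rng.standard_normal(2)
x0, x1 = rng.standard_normal(2)

lhs = phi1 * y1 + phi2 * y2 + beta0 * x0 + beta1 * x1        # terms of (6)
rhs = theta * y1 + rho * (y1 - y2) + kappa * x1 + delta * (x0 - x1)  # terms of (9)
print(abs(lhs - rhs))  # effectively zero
```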
Observe that, in contrast to equation (5), the error-correction term of (10) comprises the lagged value x(t − 1) in place of x(t). Had the reparametrising transformation that has been employed in equation (7) also been used in (8), then the consequence would have been to generate an error-correction term of the form γx(t) − y(t − 1). It should also be observed that the parameter associated with x(t − 1) in (10), which is
$$\gamma = \frac{\kappa}{1 - \theta} = \frac{\beta_0 + \beta_1}{1 - \phi_1 - \phi_2}, \quad (11)$$
is the steady-state gain of the transfer function from x(t) to y(t).

Additional lagged differences can be added to the equation (10); and this is tantamount to increasing the number of lags of the dependent variable y(t) and the number of lags of the input variable x(t) within equation (6).