
D.S.G. POLLOCK: TOPICS IN ECONOMETRICS 2011

1. EXPECTATIONS AND CONDITIONAL EXPECTATIONS

The joint density function of x and y is

    f(x, y) = f(x|y)f(y) = f(y|x)f(x),    (1)

where

    f(x) = ∫_y f(x, y)dy   and   f(y) = ∫_x f(x, y)dx    (2)

are the marginal distributions of x and y respectively, and where

    f(x|y) = f(y, x)/f(y)   and   f(y|x) = f(y, x)/f(x)    (3)

are the conditional distributions of x given y and of y given x.

The unconditional expectation of y ∼ f(y) is

    E(y) = ∫_y y f(y)dy.    (4)

The conditional expectation of y given x is

    E(y|x) = ∫_y y f(y|x)dy = ∫_y y {f(y, x)/f(x)}dy.    (5)

The expectation of the conditional expectation is an unconditional expectation:

    E{E(y|x)} = ∫_x {∫_y y [f(y, x)/f(x)]dy} f(x)dx
              = ∫_x ∫_y y f(y, x)dydx    (6)
              = ∫_y y {∫_x f(y, x)dx}dy = ∫_y y f(y)dy = E(y).

The conditional expectation of y given x is the minimum-mean-squared-error prediction of y.

Proof. Let ŷ = E(y|x) and let π = π(x) be any other estimator. Then

    E{(y − π)²} = E{[(y − ŷ) + (ŷ − π)]²}
                = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.    (7)

In the second term, there is

    E{(y − ŷ)(ŷ − π)} = ∫_x ∫_y (y − ŷ)(ŷ − π)f(x, y)dydx
                      = ∫_x {∫_y (y − ŷ)f(y|x)dy}(ŷ − π)f(x)dx = 0.    (8)

Therefore, E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥ E{(y − ŷ)²}, and the assertion is proved.

The error in predicting y is uncorrelated with x. The proof of this depends on showing that E(ŷx) = E(yx), where ŷ = E(y|x):

    E(ŷx) = ∫_x xE(y|x)f(x)dx
          = ∫_x x{∫_y y [f(y, x)/f(x)]dy}f(x)dx    (9)
          = ∫_x ∫_y xy f(y, x)dydx = E(xy).

The result can be expressed as E{(y − ŷ)x} = 0.

This result can be used in deriving expressions for the parameters α and β of a linear regression of the form

    E(y|x) = α + βx,    (10)

from which an unconditional expectation is derived in the form of

    E(y) = α + βE(x).    (11)

The orthogonality of the prediction error implies that

    0 = E{(y − ŷ)x} = E{(y − α − βx)x}
      = E(xy) − αE(x) − βE(x²).    (12)

In order to eliminate αE(x) from this expression, equation (11) is multiplied by E(x) and rearranged to give

    αE(x) = E(x)E(y) − β{E(x)}².    (13)

This is substituted into (12) to give

    E(xy) − E(x)E(y) = β[E(x²) − {E(x)}²],    (14)

whence

    β = [E(xy) − E(x)E(y)] / [E(x²) − {E(x)}²] = C(x, y)/V(x).    (15)

The expression

    α = E(y) − βE(x)    (16)

comes directly from (11). Observe that, by substituting (16) into (10), the following prediction-error equation for the conditional expectation is derived:

    E(y|x) = E(y) + β{x − E(x)}.    (17)

Thus, the conditional expectation of y given x is obtained by adjusting the unconditional expectation by some proportion of the error in predicting x by its expected value.
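The moment formulae (15) and (16) are easy to check numerically. The following is a minimal Python/numpy sketch, which is not part of the original notes; the intercept 2.0 and the slope 1.5 of the data-generating process are invented for the illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)
    y = 2.0 + 1.5 * x + rng.normal(size=100_000)   # E(y|x) = 2 + 1.5x by construction

    beta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # beta = C(x,y)/V(x), equation (15)
    alpha = y.mean() - beta * x.mean()             # alpha = E(y) - beta E(x), equation (16)
    print(alpha, beta)                             # approximately 2.0 and 1.5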
2. THE PARTITIONED REGRESSION MODEL

Consider taking a regression equation in the form of

    y = [X1  X2][β1; β2] + ε = X1β1 + X2β2 + ε.    (1)

Here, [X1, X2] = X and [β1′, β2′]′ = β are obtained by partitioning the matrix X and the vector β of the equation y = Xβ + ε in a conformable manner. The normal equations X′Xβ̂ = X′y can be partitioned likewise. Writing the equations without the surrounding matrix braces gives

    X1′X1β̂1 + X1′X2β̂2 = X1′y,    (2)
    X2′X1β̂1 + X2′X2β̂2 = X2′y.    (3)

From (2), we get the equation X1′X1β̂1 = X1′(y − X2β̂2), which gives an expression for the leading subvector of β̂:

    β̂1 = (X1′X1)⁻¹X1′(y − X2β̂2).    (4)

To obtain an expression for β̂2, we must eliminate β̂1 from equation (3). For this purpose, we multiply equation (2) by X2′X1(X1′X1)⁻¹ to give

    X2′X1β̂1 + X2′X1(X1′X1)⁻¹X1′X2β̂2 = X2′X1(X1′X1)⁻¹X1′y.    (5)

When the latter is taken from equation (3), we get

    {X2′X2 − X2′X1(X1′X1)⁻¹X1′X2}β̂2 = X2′y − X2′X1(X1′X1)⁻¹X1′y.    (6)

On defining

    P1 = X1(X1′X1)⁻¹X1′,    (7)

we can rewrite (6) as

    X2′(I − P1)X2β̂2 = X2′(I − P1)y,    (8)

whence

    β̂2 = {X2′(I − P1)X2}⁻¹X2′(I − P1)y.    (9)

Now let us investigate the effect that conditions of orthogonality amongst the regressors have upon the ordinary least-squares estimates of the regression parameters. Consider a partitioned regression model, which can be written as

    y = [X1, X2][β1; β2] + ε = X1β1 + X2β2 + ε.    (10)

It can be assumed that the variables in this equation are in deviation form. Imagine that the columns of X1 are orthogonal to the columns of X2, such that X1′X2 = 0. This is the same as assuming that the empirical correlation between the variables in X1 and the variables in X2 is zero.

The effect upon the ordinary least-squares estimator can be seen by examining the partitioned form of the formula β̂ = (X′X)⁻¹X′y. Here we have

    X′X = [X1′; X2′][X1  X2] = [X1′X1  X1′X2; X2′X1  X2′X2] = [X1′X1  0; 0  X2′X2],    (11)

where the final equality follows from the condition of orthogonality. The inverse of the partitioned form of X′X in the case of X1′X2 = 0 is

    (X′X)⁻¹ = [X1′X1  0; 0  X2′X2]⁻¹ = [(X1′X1)⁻¹  0; 0  (X2′X2)⁻¹].    (12)

We also have

    X′y = [X1′; X2′]y = [X1′y; X2′y].    (13)

On combining these elements, we find that

    [β̂1; β̂2] = [(X1′X1)⁻¹  0; 0  (X2′X2)⁻¹][X1′y; X2′y] = [(X1′X1)⁻¹X1′y; (X2′X2)⁻¹X2′y].    (14)

In this special case, the coefficients of the regression of y on X = [X1, X2] can be obtained from the separate regressions of y on X1 and of y on X2.

It should be understood that this result does not hold true in general. The general formulae for β̂1 and β̂2 are those which we have given already under (4) and (9):

    β̂1 = (X1′X1)⁻¹X1′(y − X2β̂2),
    β̂2 = {X2′(I − P1)X2}⁻¹X2′(I − P1)y,   with P1 = X1(X1′X1)⁻¹X1′.    (15)

It can be confirmed easily that these formulae do specialise to those under (14) in the case of X1′X2 = 0.
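The formulae (4), (9) and (14) can be verified against a direct solution of the normal equations. This is a minimal numpy sketch, which is not from the notes; the dimensions and the coefficient values are invented.

    import numpy as np

    rng = np.random.default_rng(1)
    T = 200
    X1, X2 = rng.normal(size=(T, 2)), rng.normal(size=(T, 3))
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 0.5, 2.0]) + rng.normal(size=T)

    X = np.hstack([X1, X2])
    b = np.linalg.solve(X.T @ X, X.T @ y)        # direct OLS solution of X'X b = X'y

    P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # P1 = X1 (X1'X1)^(-1) X1', equation (7)
    M1 = np.eye(T) - P1                          # I - P1
    b2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)    # equation (9)
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ (y - X2 @ b2))  # equation (4)
    assert np.allclose(b, np.concatenate([b1, b2]))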
The purpose of including X2 in the regression equation when, in fact, interest is confined to the parameters of β1 is to avoid falsely attributing the explanatory power of the variables of X2 to those of X1.

Let us investigate the effects of erroneously excluding X2 from the regression. In that case, the estimate will be

    β̃1 = (X1′X1)⁻¹X1′y
        = (X1′X1)⁻¹X1′(X1β1 + X2β2 + ε)    (16)
        = β1 + (X1′X1)⁻¹X1′X2β2 + (X1′X1)⁻¹X1′ε.

On applying the expectations operator to these equations, we find that

    E(β̃1) = β1 + (X1′X1)⁻¹X1′X2β2,    (17)

since E{(X1′X1)⁻¹X1′ε} = (X1′X1)⁻¹X1′E(ε) = 0. Thus, in general, we have E(β̃1) ≠ β1, which is to say that β̃1 is a biased estimator. The only circumstances in which the estimator will be unbiased are when either X1′X2 = 0 or β2 = 0. In other circumstances, the estimator will suffer from a problem which is commonly described as omitted-variables bias.

We need to ask whether it matters that the estimated regression parameters are biased. The answer depends upon the use to which we wish to put the estimated regression equation. The issue is whether the equation is to be used simply for predicting the values of the dependent variable y or whether it is to be used for some kind of structural analysis.

If the regression equation purports to describe a structural or a behavioural relationship within the economy, and if some of the explanatory variables on the RHS are destined to become the instruments of an economic policy, then it is important to have unbiased estimators of the associated parameters, for these parameters indicate the leverage of the policy instruments. Examples of such instruments are provided by interest rates, tax rates, exchange rates and the like.

On the other hand, if the estimated regression equation is to be viewed solely as a predictive device, that is to say, if it is simply an estimate of the function E(y|x1, . . . , xk) which specifies the conditional expectation of y given the values of x1, . . . , xk, then, provided that the underlying statistical mechanism which has generated these variables is preserved, the question of the unbiasedness of the regression estimates does not arise.
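The bias formula (17) is easily demonstrated by simulation. This sketch, which is not part of the notes, holds the regressors fixed across replications; the degree of correlation and the parameter values are invented.

    import numpy as np

    rng = np.random.default_rng(2)
    T, reps = 100, 5_000
    x1 = rng.normal(size=T)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=T)   # x1'x2 != 0, so a bias is expected
    beta1, beta2 = 1.0, 2.0

    est = [(x1 @ (beta1 * x1 + beta2 * x2 + rng.normal(size=T))) / (x1 @ x1)
           for _ in range(reps)]                     # OLS estimates with x2 omitted
    print(np.mean(est))                              # the simulated mean of the estimates
    print(beta1 + (x1 @ x2) / (x1 @ x1) * beta2)     # the theoretical mean, equation (17)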
3. HYPOTHESES CONCERNING SUBSETS OF THE REGRESSION COEFFICIENTS

Consider a set of linear restrictions on the vector β of a classical linear regression model N(y; Xβ, σ²I), which take the form of

    Rβ = r,    (1)

where R is a matrix of order j × k and of rank j, which is to say that the j restrictions are independent of each other and are fewer in number than the parameters within β. We know that the ordinary least-squares estimator of β is a normally distributed vector β̂ ∼ N{β, σ²(X′X)⁻¹}. It follows that

    Rβ̂ ∼ N{Rβ = r, σ²R(X′X)⁻¹R′};    (2)

and, from this, we can immediately infer that

    (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/σ² ∼ χ²(j).    (3)

We have already established the result that

    (T − k)σ̂²/σ² = (y − Xβ̂)′(y − Xβ̂)/σ² ∼ χ²(T − k)    (4)

is a chi-square variate which is statistically independent of the chi-square variate

    (β̂ − β)′X′X(β̂ − β)/σ² ∼ χ²(k),    (5)

derived from the estimator of the regression parameters. The variate of (4) must also be independent of the chi-square of (3); and it is straightforward to deduce that

    F = [(Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/j] ÷ [(y − Xβ̂)′(y − Xβ̂)/(T − k)]
      = (Rβ̂ − r)′{R(X′X)⁻¹R′}⁻¹(Rβ̂ − r)/(σ̂²j) ∼ F(j, T − k),    (6)

which is to say that the ratio of the two independent chi-square variates, divided by their respective degrees of freedom, is an F statistic. This statistic, which embodies only known and observable quantities, can be used in testing the validity of the hypothesised restrictions Rβ = r.

A specialisation of the statistic under (6) can also be used in testing an hypothesis concerning a subset of the elements of the vector β. Let β = [β1′, β2′]′. Then, the condition that the subvector β1 assumes the value of β1* can be expressed via the equation

    [I_{k1}, 0][β1; β2] = β1*.    (7)

This can be construed as a case of the equation Rβ = r, where R = [I_{k1}, 0] and r = β1*. In order to discover the specialised form of the requisite test statistic, let us consider the following partitioned form of an inverse matrix:

    (X′X)⁻¹ = [X1′X1  X1′X2; X2′X1  X2′X2]⁻¹
            = [{X1′(I − P2)X1}⁻¹   −{X1′(I − P2)X1}⁻¹X1′X2(X2′X2)⁻¹;
               −{X2′(I − P1)X2}⁻¹X2′X1(X1′X1)⁻¹   {X2′(I − P1)X2}⁻¹],    (8)

where P1 = X1(X1′X1)⁻¹X1′ and P2 = X2(X2′X2)⁻¹X2′. Then, with R = [I, 0], we find that

    R(X′X)⁻¹R′ = {X1′(I − P2)X1}⁻¹.    (9)

It follows in a straightforward manner that the specialised form of the F statistic of (6) is

    F = [(β̂1 − β1*)′{X1′(I − P2)X1}(β̂1 − β1*)/k1] ÷ [(y − Xβ̂)′(y − Xβ̂)/(T − k)]
      = (β̂1 − β1*)′{X1′(I − P2)X1}(β̂1 − β1*)/(σ̂²k1) ∼ F(k1, T − k).    (10)

By specialising the expression under (10), a statistic may be derived for testing the hypothesis that βi = βi*, concerning a single element:

    F = (β̂i − βi*)²/(σ̂²wii).    (11)

Here, wii stands for the ith diagonal element of (X′X)⁻¹. If the hypothesis is true, then this will have an F(1, T − k) distribution. However, the usual way of testing such an hypothesis is to use

    t = (β̂i − βi*)/√(σ̂²wii)    (12)

in conjunction with the tables of the t(T − k) distribution. The t statistic shows the direction in which the estimate of βi deviates from the hypothesised value as well as the size of the deviation.
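The statistic of (6) can be computed directly from its ingredients. This is a numpy/scipy sketch, not from the notes, in which scipy supplies only the reference distribution; the design matrix, the true parameters and the hypothesis β2 = β3 = 0 are invented for the demonstration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    T, k = 120, 4
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
    y = X @ np.array([1.0, 0.0, 0.0, 2.0]) + rng.normal(size=T)

    b = np.linalg.solve(X.T @ X, X.T @ y)          # the OLS estimate of beta
    s2 = (y - X @ b) @ (y - X @ b) / (T - k)       # sigma-hat squared, as in (4)

    R = np.array([[0., 1., 0., 0.], [0., 0., 1., 0.]])   # R beta = r, with r = 0
    r = np.zeros(2)
    j = R.shape[0]
    d = R @ b - r
    F = d @ np.linalg.solve(R @ np.linalg.inv(X.T @ X) @ R.T, d) / (j * s2)
    print(F, stats.f.sf(F, j, T - k))              # the statistic and its p-value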
4. REGRESSIONS ON TRIGONOMETRICAL FUNCTIONS: DISCRETE-TIME FOURIER ANALYSIS

An example of orthogonal regressors is a Fourier analysis, where the explanatory variables are sampled from a set of trigonometric functions with angular velocities, called Fourier frequencies, that are evenly distributed in an interval from zero to π radians per sample period. If the sample is indexed by t = 0, 1, . . . , T − 1, then the Fourier frequencies are

    ωj = 2πj/T;   j = 0, 1, . . . , [T/2],    (2)

where [T/2] denotes the integer quotient of the division of T by 2. The object of a Fourier analysis is to express the elements of the sample as a weighted sum of sine and cosine functions as follows:

    yt = α0 + Σ_{j=1}^{[T/2]} {αj cos(ωj t) + βj sin(ωj t)};   t = 0, 1, . . . , T − 1.    (1)

The vectors of the generic trigonometric regressors may be denoted by

    cj = [c0j, c1j, . . . , cT−1,j]′   and   sj = [s0j, s1j, . . . , sT−1,j]′,    (3)

where ctj = cos(ωj t) and stj = sin(ωj t). The vectors of the ordinates of functions of different frequencies are mutually orthogonal. Therefore, the following orthogonality conditions hold:

    ci′cj = si′sj = 0  if i ≠ j,   and   ci′sj = 0  for all i, j.    (4)

In addition, there are some sums of squares which can be taken into account in computing the coefficients of the Fourier decomposition:

    c0′c0 = ι′ι = T,   s0′s0 = 0,
    cj′cj = sj′sj = T/2   for j = 1, . . . , [(T − 1)/2].    (5)

When T = 2n, there is ωn = π, and there is also

    sn′sn = 0   and   cn′cn = T.    (6)

The "regression" formulae for the Fourier coefficients can now be given. First, there is

    α0 = (ι′ι)⁻¹ι′y = (1/T)Σ_t yt = ȳ.    (7)

Then, for j = 1, . . . , [(T − 1)/2], there are

    αj = (cj′cj)⁻¹cj′y = (2/T)Σ_t yt cos(ωj t)    (8)

and

    βj = (sj′sj)⁻¹sj′y = (2/T)Σ_t yt sin(ωj t).    (9)

If T = 2n is even, then there is no coefficient βn, and there is

    αn = (cn′cn)⁻¹cn′y = (1/T)Σ_t (−1)^t yt.    (10)

By pursuing the analogy of multiple regression, it can be seen, in view of the orthogonality relationships, that there is a complete decomposition of the sum of squares of the elements of the vector y:

    y′y = α0²ι′ι + Σ_{j=1}^{[T/2]} {αj²cj′cj + βj²sj′sj}.    (11)

Now consider writing α0²ι′ι = ȳ²ι′ι = ȳ′ȳ, where ȳ = [ȳ, ȳ, . . . , ȳ]′ is a vector whose repeated element is the sample mean ȳ. It follows that y′y − α0²ι′ι = y′y − ȳ′ȳ = (y − ȳ)′(y − ȳ). Then, in the case where T = 2n is even, the equation can be written as

    (y − ȳ)′(y − ȳ) = (T/2)Σ_{j=1}^{n−1} (αj² + βj²) + Tαn² = (T/2)Σ_{j=1}^{n} ρj²,    (12)

where ρj² = αj² + βj² for j = 1, . . . , n − 1 and ρn² = 2αn². A similar expression exists when T is odd, with the exceptions that αn is missing and that the summation runs to (T − 1)/2. It follows that the variance of the sample can be expressed as

    (1/T)Σ_{t=0}^{T−1} (yt − ȳ)² = (1/2)Σ_{j=1}^{n} (αj² + βj²).    (13)

The proportion of the variance that is attributable to the component at frequency ωj is (αj² + βj²)/2 = ρj²/2, where ρj is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the sample size T and, if there are no regular harmonic components in the underlying process, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases. If there is a regular component, then we can expect the variance attributable to it to converge to a finite value as the sample size increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (13) by a factor of T. The graph of the function I(ωj) = (T/2)(αj² + βj²) is known as the periodogram.

We should note that the trigonometric functions that underlie the periodogram are defined over an infinite index set {t = 0, ±1, ±2, . . .} comprising all positive and negative integers. As the index t progresses through this sequence, equation (1) will generate successive replications of the data set that is defined on the points t = 0, 1, . . . , T − 1. The result is described as the periodic extension of the data.

Since the trigonometric functions are both periodic and strictly bounded, they cannot be used to synthesise a perpetual trend. Therefore, before a periodogram analysis can be applied effectively, it is necessary to ensure that the data sequence is free of trend. A failure to detrend the data will lead to successive disjunctions in its periodic extension at the points where the beginning of one replication of the data set joins the end of the previous replication. The result will be a so-called saw-tooth function.

The typical periodogram of a saw-tooth function has the form of a rectangular hyperbola, which descends from a high point at the fundamental frequency of ω1 = 2π/T and which reaches a low point in the vicinity of the limiting frequency of π. Within such a periodogram, the features of primary interest, which relate to the fluctuations that are superimposed upon the trend, may be virtually imperceptible; see Figure 5, for example.

Figure 1. The plot of 132 monthly observations on the U.S. money supply, beginning in January 1960. A quadratic function has been interpolated through the data.

Figure 2. The periodogram of the residuals of the logarithmic money-supply data.
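The coefficients of (8) and (9) and the periodogram ordinates I(ωj) = (T/2)(αj² + βj²) can be computed by a direct transcription of the formulae. This numpy sketch is not part of the notes; the sample, with a single harmonic component at j = 8, is invented.

    import numpy as np

    rng = np.random.default_rng(4)
    T = 128
    t = np.arange(T)
    y = np.sin(2 * np.pi * 8 * t / T) + 0.5 * rng.normal(size=T)  # component at j = 8

    yc = y - y.mean()
    j = np.arange(1, T // 2)                   # the Fourier frequencies, excluding 0 and pi
    w = 2 * np.pi * j / T
    alpha = (2 / T) * np.cos(np.outer(w, t)) @ yc   # alpha_j, equation (8)
    beta = (2 / T) * np.sin(np.outer(w, t)) @ yc    # beta_j, equation (9)
    I = (T / 2) * (alpha**2 + beta**2)              # the periodogram ordinates
    print(j[np.argmax(I)])                          # the peak should fall at j = 8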
The Periodogram and the Autocovariance Function

A stationary stochastic process can be characterised, equivalently, by its autocovariance function or by its partial autocovariance function. It can also be characterised by its spectral density function, which is the Fourier transform of the autocovariances {γτ; τ = 0, ±1, ±2, . . .}:

    f(ω) = Σ_{τ=−∞}^{∞} γτ cos(ωτ) = γ0 + 2Σ_{τ=1}^{∞} γτ cos(ωτ).    (14)

Here, ω ∈ [0, π] is an angular velocity, or frequency value, in radians per period. The empirical counterpart of the spectral density function is the periodogram I(ωj), which may be defined by

    (1/2)I(ωj) = Σ_{τ=1−T}^{T−1} cτ cos(ωj τ) = c0 + 2Σ_{τ=1}^{T−1} cτ cos(ωj τ),    (15)

where

    cτ = (1/T)Σ_{t=τ}^{T−1} (yt − ȳ)(yt−τ − ȳ),   with τ = 0, ±1, . . . , ±(T − 1),    (16)

are the empirical autocovariances, and where ωj = 2πj/T; j = 0, 1, . . . , [T/2], are the Fourier frequencies.

We need to show that this definition of the periodogram is equivalent to the previous definition, which was based on the following frequency decomposition of the sample variance:

    (1/T)Σ_{t=0}^{T−1} (yt − ȳ)² = (1/2)Σ_{j=0}^{[T/2]} (αj² + βj²),    (17)

where

    αj = (2/T)Σ_t yt cos(ωj t) = (2/T)Σ_t (yt − ȳ)cos(ωj t),
    βj = (2/T)Σ_t yt sin(ωj t) = (2/T)Σ_t (yt − ȳ)sin(ωj t).

Substituting these into the term T(αj² + βj²)/2 gives the periodogram

    I(ωj) = (2/T)[{Σ_{t=0}^{T−1} cos(ωj t)(yt − ȳ)}² + {Σ_{t=0}^{T−1} sin(ωj t)(yt − ȳ)}²].

The quadratic terms may be expanded to give

    I(ωj) = (2/T)[Σ_t Σ_s cos(ωj t)cos(ωj s)(yt − ȳ)(ys − ȳ)
                 + Σ_t Σ_s sin(ωj t)sin(ωj s)(yt − ȳ)(ys − ȳ)].

Since cos(A)cos(B) + sin(A)sin(B) = cos(A − B), this can be written as

    I(ωj) = (2/T)Σ_t Σ_s cos(ωj[t − s])(yt − ȳ)(ys − ȳ).

On defining τ = t − s and writing cτ = (1/T)Σ_t (yt − ȳ)(yt−τ − ȳ), we can reduce the latter expression to

    I(ωj) = 2Σ_{τ=1−T}^{T−1} cos(ωj τ)cτ,

which is a Fourier transform of the empirical autocovariances.

5. FILTERING MACROECONOMIC DATA

Wiener–Kolmogorov Filtering of Stationary Sequences

The classical theory of linear filtering was formulated independently by Norbert Wiener (1941) and Andrei Nikolaevich Kolmogorov (1941) during the Second World War. They were both considering the problem of how to target radar-assisted anti-aircraft guns on incoming enemy aircraft. The theory has found widespread application in analog and digital signal processing and in telecommunications in general. Also, it has provided a basic technique for the enhancement of recorded music.

The classical theory assumes that the data sequences are generated by stationary stochastic processes and that these are of sufficient length to justify the assumption that they constitute doubly-infinite sequences. For econometrics, the theory must be adapted to cater to short trended sequences. Then, Wiener–Kolmogorov filters can be used to extract trends from economic data sequences and to generate seasonally adjusted data.

Consider a vector y with a signal component ξ and a noise component η:

    y = ξ + η.    (1)

These components are assumed to be independently normally distributed with zero means and with positive-definite dispersion matrices. Then,

    E(ξ) = 0, D(ξ) = Ωξ,   E(η) = 0, D(η) = Ωη,    (2)

and C(ξ, η) = 0. A consequence of the independence of ξ and η is that

    D(y) = Ωξ + Ωη   and   C(ξ, y) = D(ξ) = Ωξ.    (3)

The signal component is estimated by a linear transformation x = Ψξ y of the data vector that suppresses the noise component. Usually, the signal comprises low-frequency elements and the noise comprises elements of higher frequencies.

The Minimum Mean-Squared Error Estimator

The principle of linear minimum mean-squared error estimation indicates that the error ξ − x in representing ξ by x should be uncorrelated with the data in y:

    0 = C(ξ − x, y) = C(ξ, y) − C(x, y)
      = C(ξ, y) − Ψξ C(y, y)    (4)
      = Ωξ − Ψξ(Ωξ + Ωη).

This indicates that the estimate is

    x = Ψξ y = Ωξ(Ωξ + Ωη)⁻¹y.    (5)

The corresponding estimate of the noise component η is

    h = Ψη y = Ωη(Ωξ + Ωη)⁻¹y = {I − Ωξ(Ωξ + Ωη)⁻¹}y.    (6)

It will be observed that Ψξ + Ψη = I and, therefore, that x + h = y.
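The estimator (5) can be tried on synthetic data. In the following toy sketch, which is not part of the notes, the signal dispersion is given an AR(1)-type form with elements 0.95^|i−j| and the noise is white; both choices are arbitrary assumptions made for the demonstration.

    import numpy as np

    rng = np.random.default_rng(5)
    T = 200
    lag = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Omega_xi = 0.95 ** lag                     # a smooth, low-frequency signal dispersion
    Omega_eta = 0.5 * np.eye(T)                # white noise

    xi = np.linalg.cholesky(Omega_xi + 1e-9 * np.eye(T)) @ rng.normal(size=T)
    y = xi + np.sqrt(0.5) * rng.normal(size=T)

    x = Omega_xi @ np.linalg.solve(Omega_xi + Omega_eta, y)   # equation (5)
    print(np.mean((x - xi)**2), np.mean((y - xi)**2))         # the filter reduces the MSE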
Conditional Expectations

In deriving the estimator, we might have used the formula for conditional expectations. In the case of two linearly related scalar random variables ξ and y, the conditional expectation of ξ given y is

    E(ξ|y) = E(ξ) + {C(ξ, y)/V(y)}{y − E(y)}.    (7)

In the case of two vector quantities, this becomes

    E(ξ|y) = E(ξ) + C(ξ, y)D⁻¹(y){y − E(y)}.    (8)

By setting C(ξ, y) = Ωξ and D(y) = Ωξ + Ωη, as in (3), and by setting E(ξ) = E(y) = 0, we get the expression that is to be found under (5): x = Ωξ(Ωξ + Ωη)⁻¹y.

The Difference Operator and Polynomial Regression

The lag operator L, which is commonly defined in respect of a doubly-infinite sequence x(t) = {xt; t = 0, ±1, ±2, . . .}, has the effect that Lx(t) = x(t − 1). The (backwards) difference operator ∇ = 1 − L has the effect that ∇x(t) = x(t) − x(t − 1). It serves to reduce a constant function to zero and to reduce a linear function to a constant. The second-order or twofold difference operator ∇² = 1 − 2L + L² is effective in reducing a linear function to zero.

A difference operator ∇^d of order d is commonly employed in the context of an ARIMA(p, d, q) model to reduce the data to stationarity. Then, the differenced data can be modelled by an ARMA(p, q) process. In such circumstances, the difference operator takes the form of a matrix transformation.

Figure 3. The squared gain of the difference operator, which has a zero at zero frequency, and the squared gain of the summation operator, which is unbounded at zero frequency.

The Matrix Difference Operator

The matrix analogue of the second-order difference operator in the case of T = 5, for example, is given by

    ∇²_5 = [Q∗′; Q′] = [ 1  0  0  0  0;
                        −2  1  0  0  0;
                         1 −2  1  0  0;
                         0  1 −2  1  0;
                         0  0  1 −2  1 ],    (9)

where Q∗′ comprises the first two rows and Q′ comprises the remaining three rows. The first two rows, which do not produce true differences, are liable to be discarded.

The difference operator nullifies data elements at zero frequency, and it severely attenuates those at the adjacent frequencies. This is a disadvantage when the low-frequency elements are of primary interest. Another way of detrending the data is to fit a polynomial trend by least-squares regression and to take the residual sequence as the detrended data.

Polynomial Regression

Using the matrix Q defined above, we can represent the vector of the ordinates of a linear trend line, interpolated through the data sequence, as

    x = y − Q(Q′Q)⁻¹Q′y.    (10)

The vector of the residuals is

    e = Q(Q′Q)⁻¹Q′y.    (11)

Observe that this vector contains exactly the same information as the differenced vector g = Q′y. However, whereas the low-frequency structure of the data is invisible in the periodogram of the latter, it is entirely visible in the periodogram of the residuals.

Figure 4. The quarterly series of the logarithms of consumption in the U.K., for the years 1955 to 1994, together with a linear trend interpolated by least-squares regression.

Figure 5. The periodogram of the trended logarithmic data.

Figure 6. The periodogram of the differenced logarithmic consumption data.

Figure 7. The periodogram of the residual sequence obtained from the linear detrending of the logarithmic consumption data.
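The trend (10) and the residuals (11) can be computed with an explicit Q′ matrix. This numpy sketch is not from the notes; the trended test sequence is invented.

    import numpy as np

    rng = np.random.default_rng(6)
    T = 160
    t = np.arange(T)
    y = 0.05 * t + np.sin(2 * np.pi * t / 40) + 0.3 * rng.normal(size=T)

    Qp = np.diff(np.eye(T), n=2, axis=0)       # Q': rows form y(t) - 2y(t-1) + y(t-2)
    Q = Qp.T
    e = Q @ np.linalg.solve(Q.T @ Q, Q.T @ y)  # the residuals, equation (11)
    x = y - e                                  # the fitted linear trend, equation (10)
    print(np.polyfit(t, x, 3).round(6))        # x is linear, so the cubic and quadratic
                                               # coefficients should vanish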
Filters for Short Trended Sequences

Applying Q′ to the equation y = ξ + η, which represents the trended data, gives

    Q′y = Q′ξ + Q′η = δ + κ = g.    (12)

The expectations and the dispersion matrices of the differenced vectors are

    E(δ) = 0,   D(δ) = Ωδ = Q′D(ξ)Q,
    E(κ) = 0,   D(κ) = Ωκ = Q′D(η)Q.    (13)

The difficulty of estimating the trended vector ξ = y − η directly is that some starting values or initial conditions are required in order to define its value at time t = 0. However, since η is from a stationary mean-zero process, it requires only zero-valued initial conditions. Therefore, the starting-value problem can be circumvented by concentrating on the estimation of η.

The conditional expectation of η, given the differenced data g = Q′y, is provided by the formula

    h = E(η|g) = E(η) + C(η, g)D⁻¹(g){g − E(g)} = C(η, g)D⁻¹(g)g,    (14)

where the second equality follows in view of the zero-valued expectations. Within this expression, there are

    D(g) = Ωδ + Q′ΩηQ   and   C(η, g) = ΩηQ.    (15)

Putting these details into (14) gives the following estimate of η:

    h = ΩηQ(Ωδ + Q′ΩηQ)⁻¹Q′y.    (16)

Putting this into the equation x = y − h gives

    x = y − ΩηQ(Ωδ + Q′ΩηQ)⁻¹Q′y.    (17)

The Leser (H–P) Filter

We now consider two specific cases of the Wiener–Kolmogorov filter. First, there is the Leser or Hodrick–Prescott (H–P) filter. This can be derived from a model that supposes that the signal is generated by an integrated (second-order) random walk and that the noise is from a white-noise process. The random walk is reduced to a white-noise process δ(t) by taking twofold differences. Thus, (1 − L)²ξ(t) = δ(t), and the corresponding equation for the sample is Q′ξ = δ. Accordingly, the filter is derived by setting

    D(η) = Ωη = ση²I,   D(δ) = Ωδ = σδ²I   and   λ = ση²/σδ²    (18)

within (17), to give

    x = y − Q(λ⁻¹I + Q′Q)⁻¹Q′y.    (19)

Here, λ is the so-called smoothing parameter. It will be observed that, as λ → ∞, the vector x tends to that of a linear function interpolated into the data by least-squares regression, which is represented by equation (10): x = y − Q(Q′Q)⁻¹Q′y.

Figure 8. The gain of the Hodrick–Prescott lowpass filter with a smoothing parameter set to 100, 1,600 and 14,400.

Figure 8 depicts the frequency response of the lowpass H–P filter for various values of the smoothing parameter λ. The innermost profile corresponds to the highest value of the parameter, and it represents a filter that transmits only the data elements of lowest frequency.

For all values of λ, the response of the H–P filter shows a gradual transition from the pass band, which corresponds to the frequencies that are transmitted by the filter, to the stop band, which corresponds to the frequencies that are impeded. Often, there is a requirement for a more rapid transition, as well as a need to control the location in frequency where the transition occurs. These needs can be served by the Butterworth filter, which is more amenable to adjustment.
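Equation (19) translates directly into code. The following numpy sketch, which is not part of the notes, reuses the second-difference matrix Q′ of the previous example; the test series is invented, and the quoted smoothing parameters follow Figure 8.

    import numpy as np

    def hp_trend(y, lam):
        """Return x = y - Q (lam^(-1) I + Q'Q)^(-1) Q'y, as in equation (19)."""
        T = len(y)
        Qp = np.diff(np.eye(T), n=2, axis=0)   # Q', of order (T-2) x T
        A = np.eye(T - 2) / lam + Qp @ Qp.T    # lam^(-1) I + Q'Q
        return y - Qp.T @ np.linalg.solve(A, Qp @ y)

    rng = np.random.default_rng(7)
    t = np.arange(160)
    y = 0.02 * t + np.sin(2 * np.pi * t / 40) + 0.2 * rng.normal(size=160)
    for lam in (100.0, 1600.0, 14400.0):       # the values used in Figure 8
        print(lam, hp_trend(y, lam)[:3].round(3))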
The Butterworth Filter

The Butterworth filter can be derived from an heuristic model in which the signal and the noise are generated by processes that are described, respectively, by the equations

    (1 − L)²ξ(t) = (1 + L)^n ζ(t)   and   (1 − L)²η(t) = (1 − L)^n ε(t),

where ζ(t) and ε(t) are mutually independent white-noise processes. The filter that is appropriate to short trended sequences can be represented by the equation

    x = y − λΣQ(M + λQ′ΣQ)⁻¹Q′y.    (20)

Here, the matrices are

    Σ = {2I_T − (L_T + L_T′)}^{n−2}   and   M = {2I_T + (L_T + L_T′)}^n,    (21)

where L_T is a matrix of order T with units on the first subdiagonal; and it can be verified that

    Q′ΣQ = {2I_T − (L_T + L_T′)}^n.    (22)

Figure 9. The gain of the lowpass Butterworth filters of orders n = 6 and n = 12 with a nominal cut-off point of 2π/3 radians.

Figure 9 shows the frequency response of the Butterworth filter for various values of n and for a specific cut-off frequency, which is determined by the parameter λ. The greater the value of n, the more rapid is the transition from pass band to stop band.

6. LAGGED DEPENDENT VARIABLES AND ERROR CORRECTION MECHANISMS

A regression equation can also be set in motion by including lagged values of the dependent variable on the RHS. With one lagged value, we get

    y(t) = φy(t − 1) + βx(t) + ε(t).    (1)

In terms of the lag operator, this is

    (1 − φL)y(t) = βx(t) + ε(t),    (2)

of which the rational form is

    y(t) = {β/(1 − φL)}x(t) + {1/(1 − φL)}ε(t).    (3)

The advantage of equation (1) is that it is amenable to estimation by ordinary least-squares regression. Although the estimates will be biased in finite samples, they will be consistent, if the model is correctly specified. The disadvantage is the restrictive assumption that the systematic and disturbance parts have the same dynamics.

Partial Adjustment and Adaptive Expectations

A simple partial-adjustment model has the form

    y(t) = λγx(t) + (1 − λ)y(t − 1) + ε(t).    (4)

If y(t) is current consumption and x(t) is disposable income, then γx(t) = y∗(t) is "desired" consumption. If habits of consumption persist, then current consumption will be a weighted combination of the previous consumption and the present desired consumption. The weights of the combination depend on the partial-adjustment parameter λ ∈ (0, 1]. If λ = 1, then the consumers adjust their consumption instantaneously to the desired value. As λ → 0, their consumption habits become increasingly persistent.

When the notation λγ = β and (1 − λ) = φ is adopted, equation (4) becomes identical to equation (1), which is the regression model with a lagged dependent variable. Observe also that γ = β/(1 − φ) is a "long-term multiplier" in the relationship between income and consumption. It can also be described as the steady-state gain of the transfer function from x(t) to y(t), which is depicted in equation (3).
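The consistency of the OLS estimates of equation (1) and the recovery of the long-term multiplier γ = β/(1 − φ) can be illustrated by simulation. This sketch is not part of the notes; the values λ = 0.25 and γ = 2 and the form of the income process are invented.

    import numpy as np

    rng = np.random.default_rng(8)
    T, lam, gamma = 2_000, 0.25, 2.0
    phi, beta = 1 - lam, lam * gamma           # phi = 1 - lambda, beta = lambda * gamma
    x = 1.0 + 0.01 * rng.normal(size=T).cumsum()   # a slowly drifting income series
    y = np.zeros(T)
    for s in range(1, T):
        y[s] = phi * y[s - 1] + beta * x[s] + 0.1 * rng.normal()

    Z = np.column_stack([y[:-1], x[1:]])       # the regressors y(t-1) and x(t)
    phi_hat, beta_hat = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
    print(beta_hat / (1 - phi_hat))            # an estimate of gamma, roughly 2.0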
Error-Correction Forms, and Nonstationary Signals

The usual linear regression procedures presuppose that the relevant data moments will converge asymptotically to fixed limits as the sample size increases. This cannot happen if the data are trended, in which case the standard techniques of statistical inference will not be applicable. A common approach is to subject the data to as many differencing operations as may be required to achieve stationarity. However, differencing tends to remove some of the essential information regarding the behaviour of economic agents. Moreover, it is often discovered that the regression model loses much of its explanatory power when the differences of the data are used instead.

In such circumstances, one might use the so-called error-correction model. The model depicts a mechanism whereby two trended economic variables maintain an enduring long-term proportionality with each other. The data sequences comprised by the model are stationary, either individually or in an appropriate combination; and this enables us to apply the standard procedures of statistical inference that are appropriate to models comprising data from stationary processes.

Consider taking y(t − 1) from both sides of equation (1), which represents the first-order dynamic model. This gives

    ∇y(t) = y(t) − y(t − 1) = (φ − 1)y(t − 1) + βx(t) + ε(t)
          = (1 − φ){[β/(1 − φ)]x(t) − y(t − 1)} + ε(t)    (5)
          = λ{γx(t) − y(t − 1)} + ε(t),

where λ = 1 − φ and where γ is the gain of the transfer function from x(t) to y(t) defined under (3). This is the so-called error-correction form of the equation; and it indicates that the change in y(t) is a function of the extent to which the proportions of the series x(t) and y(t − 1) differ from those which would prevail in the steady state.

The error-correction form provides the basis for estimating the parameters of the model when the signal series x(t) is trended or nonstationary. A pair of nonstationary series that maintain a long-run proportionality are said to be cointegrated. It is easy to obtain an accurate estimate of γ, which is the coefficient of proportionality, simply by running a regression of y(t − 1) on x(t). Once a value for γ is available, the remaining parameter λ may be estimated by regressing ∇y(t) upon the composite variable {γx(t) − y(t − 1)}.

However, if the error-correction model is an unrestricted reparametrisation of an original model in levels, then its parameters can be estimated by ordinary least-squares regression. The same estimates can also be inferred from the least-squares estimates of the parameters of the original model in levels.

To see how to derive an error-correction form for a more general autoregressive distributed-lag model, consider the second-order model:

    y(t) = φ1y(t − 1) + φ2y(t − 2) + β0x(t) + β1x(t − 1) + ε(t).    (6)

The part φ1y(t − 1) + φ2y(t − 2), comprising the lagged dependent variables, can be reparameterised as follows:

    [φ1  φ2][1  0; 1  −1][1  0; 1  −1][y(t − 1); y(t − 2)] = [θ  ρ][y(t − 1); ∇y(t − 1)].    (7)

Here, the matrix that postmultiplies the row vector of the parameters is the inverse of the matrix that premultiplies the column vector of the variables. The sum β0x(t) + β1x(t − 1) can be reparametrised to become

    [β0  β1][1  1; 1  0][0  1; 1  −1][x(t); x(t − 1)] = [κ  δ][x(t − 1); ∇x(t)].    (8)

It follows that equation (6) can be recast in the form of

    y(t) = θy(t − 1) + ρ∇y(t − 1) + κx(t − 1) + δ∇x(t) + ε(t).    (9)

Taking y(t − 1) from both sides of this equation and rearranging it gives

    ∇y(t) = (1 − θ){[κ/(1 − θ)]x(t − 1) − y(t − 1)} + ρ∇y(t − 1) + δ∇x(t) + ε(t)
          = λ{γx(t − 1) − y(t − 1)} + ρ∇y(t − 1) + δ∇x(t) + ε(t).    (10)

This is an elaboration of equation (5); and it includes the differenced sequences ∇y(t − 1) and ∇x(t). These are deemed to be stationary, as is the composite error sequence γx(t − 1) − y(t − 1). Observe that, in contrast to equation (5), the error-correction term of (10) comprises the lagged value x(t − 1) in place of x(t).
Had the reparametrising transformation that has been employed in equation (7) also been used in (8), then the consequence would have been to generate an error-correction term of the form γx(t) − y(t − 1). It should also be observed that the parameter associated with x(t − 1) in (10), which is

    γ = κ/(1 − θ) = (β0 + β1)/(1 − φ1 − φ2),    (11)

is the steady-state gain of the transfer function from x(t) to y(t). Additional lagged differences can be added to equation (10); and this is tantamount to increasing the number of lags of the dependent variable y(t) and the number of lags of the input variable x(t) within equation (6).
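The two-step estimation that is described above for the first-order case can be sketched as follows. The cointegrated data-generating process, with γ = 1.5 and λ = 0.3, is an invented example, and the code is not part of the notes.

    import numpy as np

    rng = np.random.default_rng(9)
    T, lam, gamma = 500, 0.3, 1.5
    x = np.cumsum(rng.normal(size=T))          # a nonstationary (random-walk) input
    y = np.zeros(T)
    for s in range(1, T):
        y[s] = y[s - 1] + lam * (gamma * x[s] - y[s - 1]) + 0.2 * rng.normal()

    g_hat = (x[1:] @ y[:-1]) / (x[1:] @ x[1:])   # regress y(t-1) on x(t) to obtain gamma
    z = g_hat * x[1:] - y[:-1]                   # the composite error-correction variable
    l_hat = (z @ np.diff(y)) / (z @ z)           # regress the differences on it for lambda
    print(g_hat, l_hat)                          # roughly 1.5 and 0.3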
