EC3062 ECONOMETRICS
ELEMENTARY REGRESSION ANALYSIS

We shall consider three methods for estimating statistical parameters: the method of moments, the method of least squares and the principle of maximum likelihood. In the case of the regression model, the three methods generate identical estimating equations, but the assumptions differ.

Conditional Expectations

If y ∼ f(y), then, in the absence of further information, the minimum-mean-square-error (m.m.s.e.) predictor of y is its expected value E(y) = ∫ y f(y) dy.

Proof. If π is the value of a prediction, then the mean-square error is

(1)    M = ∫ (y − π)² f(y) dy = E{(y − π)²} = E(y²) − 2πE(y) + π²;

and, by calculus, it can be shown that M is minimised by taking π = E(y): setting dM/dπ = −2E(y) + 2π to zero gives π = E(y).

If x is related to y, then the m.m.s.e. prediction of y is the conditional expectation

(2)    E(y|x) = ∫ y [f(x, y)/f(x)] dy.

Proof. Let ŷ = E(y|x) and let π = π(x) be any other predictor. Then

(5)    E{(y − π)²} = E{[(y − ŷ) + (ŷ − π)]²}
              = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.

In the second term, there is

(6)    E{(y − ŷ)(ŷ − π)} = ∫ₓ ∫ᵧ (y − ŷ)(ŷ − π) f(x, y) dy dx
              = ∫ₓ { ∫ᵧ (y − ŷ) f(y|x) dy } (ŷ − π) f(x) dx = 0,

since the inner integral vanishes by the definition ŷ = E(y|x). Therefore, E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥ E{(y − ŷ)²}, and the assertion is proved.

The definition of the conditional expectation implies that

    E(xy) = ∫ₓ ∫ᵧ x y f(x, y) dy dx = ∫ₓ x { ∫ᵧ y f(y|x) dy } f(x) dx = E(xŷ).

When E(xy) = E(xŷ) is rewritten as E{x(y − ŷ)} = 0, it may be described as an orthogonality condition. This indicates that the prediction error y − ŷ is uncorrelated with x. If it were correlated with x, then we should not be using the information of x efficiently in forming ŷ.
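As a quick numerical illustration (not part of the original notes), the following Python sketch checks that the mean-square error E{(y − π)²} is minimised at π = E(y), with a simulated sample standing in for the integral:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=100_000)  # stand-in for draws from f(y)

# Evaluate M(pi) = E{(y - pi)^2} over a grid of candidate predictions.
grid = np.linspace(0.0, 6.0, 601)
mse = np.array([np.mean((y - pi) ** 2) for pi in grid])
best_pi = grid[np.argmin(mse)]

# The minimiser coincides, up to grid resolution, with the sample mean of y.
```

Any other choice of π raises the mean-square error by exactly (π − E(y))², which is the decomposition used in the proof above.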
Linear Regression

Assume that x and y have a joint normal distribution, which implies that there is a linear regression relationship:

(9)    E(y|x) = α + βx.

The object is to express α and β in terms of the expectations E(x), E(y), the variances V(x), V(y) and the covariance C(x, y).

First, multiply (9) by f(x) and integrate with respect to x to give

(10)    E(y) = α + βE(x),

whence the equation for the intercept is

(11)    α = E(y) − βE(x).

Equation (10) shows that the regression line passes through the point of expected values E(x, y) = {E(x), E(y)}. By putting (11) into E(y|x) = α + βx from (9), we find that

(12)    E(y|x) = E(y) + β{x − E(x)}.

Now multiply (9) by x and f(x) and integrate with respect to x to give

(13)    E(xy) = αE(x) + βE(x²).

Multiplying (10) by E(x) gives

(14)    E(x)E(y) = αE(x) + β{E(x)}²,

whence, on taking (14) from (13), we get

(15)    E(xy) − E(x)E(y) = β[E(x²) − {E(x)}²],

which implies that

(16)    β = [E(xy) − E(x)E(y)] / [E(x²) − {E(x)}²]
         = E[{x − E(x)}{y − E(y)}] / E[{x − E(x)}²] = C(x, y)/V(x).

Thus, we have expressed α and β in terms of the moments E(x), E(y), V(x) and C(x, y) of the joint distribution of x and y.

Estimation by the Method of Moments

Let (x₁, y₁), (x₂, y₂), . . . , (x_T, y_T) be a sample of T observations. Then, we can calculate the following empirical or sample moments, with the sums running over t = 1, . . . , T:

(21)    x̄ = (1/T)Σ x_t,    ȳ = (1/T)Σ y_t,
        s_x² = (1/T)Σ(x_t − x̄)² = (1/T)Σ x_t² − x̄²,
        s_xy = (1/T)Σ(x_t − x̄)(y_t − ȳ) = (1/T)Σ x_t y_t − x̄ȳ.

To estimate α and β, we replace the population moments in the formulae of (11) and (16) by the sample moments. Then, the estimates are

(22)    α̂ = ȳ − β̂x̄,    β̂ = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)².
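The moment estimates of (22) translate directly into code. The following Python sketch (an illustration only, with invented parameter values) computes the sample moments of (21) and the estimates of (22) on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
x = rng.normal(10.0, 2.0, size=T)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, size=T)  # true alpha = 1.5, beta = 0.8

# Sample moments as in (21): note that the divisor is T, not T - 1.
xbar, ybar = x.mean(), y.mean()
s_xx = np.mean((x - xbar) ** 2)           # equals mean(x^2) - xbar^2
s_xy = np.mean((x - xbar) * (y - ybar))   # equals mean(x*y) - xbar*ybar

# Moment estimates as in (22).
beta_hat = s_xy / s_xx
alpha_hat = ybar - beta_hat * xbar
```

With a sample of this size, the estimates land close to the true values used in the simulation.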
Convergence

We can expect the sample moments to converge to the true moments of the bivariate distribution, thereby causing the estimates of the parameters to converge likewise to the true values.

(23)    A sequence of numbers {aₙ} is said to converge to a limit a if, for any arbitrarily small real number ε, there exists a corresponding integer N such that |aₙ − a| < ε for all n ≥ N.

This is not appropriate to a stochastic sequence, such as a sequence of estimates; for it is always possible for aₙ to break the bounds of a ± ε when n > N. The following is a more appropriate definition:

(24)    A sequence of random variables {aₙ} is said to converge weakly in probability to a limit a if, for any ε > 0, lim P(|aₙ − a| > ε) = 0 as n → ∞ or, equivalently, lim P(|aₙ − a| ≤ ε) = 1.

With the increasing size of the sample, it becomes virtually certain that aₙ will 'fall within an epsilon of a'. We describe a as the probability limit of aₙ and we write plim(aₙ) = a.

This definition does not presuppose that aₙ has a finite variance or even a finite mean. However, if aₙ does have finite moments, then we may talk of mean-square convergence:

(25)    A sequence of random variables {aₙ} is said to converge in mean square to a limit a if lim(n → ∞) E{(aₙ − a)²} = 0.

We should note that

(26)    E{(aₙ − a)²} = E([aₙ − E(aₙ)] − [a − E(aₙ)])²
              = V(aₙ) + {a − E(aₙ)}².

Thus, the mean-square error of an estimator aₙ is the sum of its variance and the square of its bias. If aₙ is to converge in mean square to a, then both of these quantities must vanish. Convergence in mean square implies convergence in probability.

When an estimator converges in probability to the parameter which it purports to represent, then we say that it is a consistent estimator.

Figure 1.
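Weak convergence in probability can be illustrated by Monte Carlo: for the sample mean of n standard normal variates, the probability P(|aₙ − a| > ε) shrinks as n grows. A Python sketch, with illustrative values only:

```python
import numpy as np

rng = np.random.default_rng(2)
a, eps, reps = 0.0, 0.1, 2000   # true limit, tolerance, Monte Carlo replications

def prob_outside(n):
    # Estimate P(|a_n - a| > eps), where a_n is the mean of n N(0, 1) draws.
    means = rng.normal(a, 1.0, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(means - a) > eps))

p_small, p_large = prob_outside(50), prob_outside(2000)
# p_large is far smaller than p_small: the mean escapes a +/- eps ever less often.
```

Because the sample mean also has a finite variance that shrinks as 1/n and zero bias, it converges in mean square as well, which is the stronger of the two modes.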
Pearson's data comprising 1078 measurements of the heights of fathers (the abscissae) and of their sons (the ordinates), together with the two regression lines. The correlation coefficient is 0.5013.

The Bivariate Normal Distribution

Most of the results in the theory of regression can be obtained by examining the functional form of the bivariate normal distribution. Let x and y be the two variables. Let us denote their means by E(x) = μ_x, E(y) = μ_y, their variances by V(x) = σ_x², V(y) = σ_y² and their covariance by C(x, y) = ρσ_xσ_y. Here, the correlation coefficient

(30)    ρ = C(x, y) / √{V(x)V(y)}

provides a measure of the relatedness of these variables.

The bivariate distribution is specified by

(31)    f(x, y) = [1 / {2πσ_xσ_y√(1 − ρ²)}] exp Q(x, y),

where

(32)    Q = [−1 / 2(1 − ρ²)] { [(x − μ_x)/σ_x]² − 2ρ[(x − μ_x)/σ_x][(y − μ_y)/σ_y] + [(y − μ_y)/σ_y]² }.

The quadratic function can also be written as

(33)    Q = [−1 / 2(1 − ρ²)] { [(y − μ_y)/σ_y − ρ(x − μ_x)/σ_x]² + (1 − ρ²)[(x − μ_x)/σ_x]² }.

Thus, we have

(34)    f(x, y) = f(y|x)f(x),

where

(35)    f(x) = [1 / σ_x√(2π)] exp{−(x − μ_x)² / 2σ_x²}

and

(36)    f(y|x) = [1 / σ_y√{2π(1 − ρ²)}] exp{−(y − μ_{y|x})² / 2σ_y²(1 − ρ²)},

with

(37)    μ_{y|x} = μ_y + (ρσ_y/σ_x)(x − μ_x).

Least-Squares Regression Analysis

The regression equation E(y|x) = α + βx can be written as

(39)    y = α + xβ + ε,

where ε = y − E(y|x) is a random variable, with E(ε) = 0 and V(ε) = σ², that is independent of x. Given observations (x₁, y₁), . . . , (x_T, y_T), the estimates are the values that minimise the sum of squares of the distances, measured parallel to the y-axis, of the data points from an interpolated regression line:

(40)    S = Σ_{t=1}^{T} ε_t² = Σ_{t=1}^{T} (y_t − α − x_tβ)².

Differentiating S with respect to α and setting the result to zero gives

(41)    −2Σ(y_t − α − βx_t) = 0,    or    ȳ − α − βx̄ = 0.

This generates the following estimating equation for α:

(42)    α(β) = ȳ − βx̄.
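The conditional-mean formula (37) can be checked by simulation: draw from a bivariate normal and compare the average of y over a thin slice around a chosen value of x with μ_{y|x}. In the Python sketch below the numbers loosely echo the father-son heights of Figure 1 but are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, mu_y, s_x, s_y, rho = 68.0, 69.0, 2.7, 2.7, 0.5   # invented illustrative values
cov = [[s_x**2, rho * s_x * s_y], [rho * s_x * s_y, s_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000).T

# Empirical conditional mean of y given x near x0: average y over a thin slice.
x0 = 71.0
slice_mean = y[np.abs(x - x0) < 0.1].mean()

# Theoretical conditional mean from equation (37).
theory = mu_y + rho * (s_y / s_x) * (x0 - mu_x)
```

Because ρ < 1, the conditional mean of the son's height lies closer to μ_y than x₀ does to μ_x, which is the regression-to-the-mean effect visible in Figure 1.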
By differentiating S with respect to β and setting the result to zero, we get

(43)    −2Σ x_t(y_t − α − βx_t) = 0.

On substituting for α from (42) and eliminating the factor −2, this becomes

(44)    Σ x_t y_t − Σ x_t(ȳ − βx̄) − βΣ x_t² = 0,

whence we get

(45)    β̂ = [Σ x_t y_t − T x̄ȳ] / [Σ x_t² − T x̄²] = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)².

This is identical to the estimate under (22) derived via the method of moments. Putting β̂ into the equation α(β) = ȳ − βx̄ of (42) gives the estimate of α found under (22).

The method of least squares does not automatically provide an estimate of σ² = E(ε_t²). To obtain an estimate, we may apply the method of moments to the regression residuals e_t = y_t − α̂ − β̂x_t to give

(46)    σ̃² = (1/T)Σ e_t².

In fact, this is a biased estimator, with

(47)    E(σ̃²) = [(T − 2)/T]σ²;

so it is common to adopt the unbiased estimator

(48)    σ̂² = Σ e_t² / (T − 2).

Properties of the Least-Squares Estimator

The disturbance term ε is assumed to be a random variable with

(49)    E(ε_t) = 0  and  V(ε_t) = σ²  for all t.

We might assume that x is a random variable uncorrelated with ε, such that C(x, ε) = 0. However, if we are prepared to regard the x_t as predetermined values which have no effect on the ε_t, then we can say that

(50)    E(x_tε_t) = x_tE(ε_t) = 0  for all t.

In place of an assumption attributing a finite variance to x, we may assert that

(51)    lim(T → ∞) (1/T)Σ_{t=1}^{T} x_t² = m_xx < ∞.

For the random sequence {x_tε_t}, we assert that

(52)    plim(T → ∞) (1/T)Σ_{t=1}^{T} x_tε_t = 0.

To see the effect of these assumptions, let us substitute the expression

(53)    y_t − ȳ = β(x_t − x̄) + ε_t − ε̄

in the expression for β̂ found under (45). By rearranging the result, we have

(54)    β̂ = β + Σ(x_t − x̄)ε_t / Σ(x_t − x̄)².

The numerator of the second term on the RHS is obtained with the help of the identity

(55)    Σ(x_t − x̄)(ε_t − ε̄) = Σ(x_tε_t − x̄ε_t − x_tε̄ + x̄ε̄) = Σ(x_t − x̄)ε_t.
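The bias result (47) and the correction (48) can be verified by simulation, averaging σ̃² and σ̂² over many replications. A Python sketch with invented parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
T, sigma2, reps = 20, 4.0, 4000
x = rng.uniform(0.0, 10.0, size=T)   # regressors held fixed across replications

biased, unbiased = [], []
for _ in range(reps):
    y = 2.0 + 0.5 * x + rng.normal(0.0, np.sqrt(sigma2), size=T)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # (45)
    a = y.mean() - b * x.mean()                                                # (42)
    e = y - a - b * x                                                          # residuals
    biased.append(np.sum(e**2) / T)          # (46): divisor T, biased
    unbiased.append(np.sum(e**2) / (T - 2))  # (48): divisor T - 2, unbiased

m_biased, m_unbiased = np.mean(biased), np.mean(unbiased)
# In expectation, m_biased is sigma2 * (T - 2) / T = 3.6 and m_unbiased is 4.0.
```

The divisor T − 2 reflects the two degrees of freedom absorbed in estimating α and β; with a small T, as here, the downward bias of σ̃² is quite noticeable.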
From the assumption under (50), it follows that

(56)    E{(x_t − x̄)ε_t} = (x_t − x̄)E(ε_t) = 0  for all t.

Therefore,

(57)    E(β̂) = β + Σ(x_t − x̄)E(ε_t) / Σ(x_t − x̄)² = β,

and β̂ is seen to be an unbiased estimator of β.

The consistency of the estimator follows, likewise, from the assumptions under (51) and (52). Thus,

(58)    plim(β̂) = β + plim{T⁻¹Σ(x_t − x̄)ε_t} / plim{T⁻¹Σ(x_t − x̄)²} = β,

and β̂ is seen to be a consistent estimator of β.

The consistency of β̂ depends crucially upon the assumption that the disturbance term is independent of, or uncorrelated with, the explanatory variable or regressor x.

Example. A simple model of the economy is postulated that comprises two equations in income y, consumption c and investment i:

(59)    y = c + i,
(60)    c = α + βy + ε.

Also, s = y − c or s = i, where s is savings. The disturbance ε is assumed to be independent of investment i. Substituting (60) into (59) gives

(61)    y = (α + i + ε)/(1 − β),

from which

(62)    y_t − ȳ = [(i_t − ī) + (ε_t − ε̄)]/(1 − β).

The least-squares estimator of the parameter β, the marginal propensity to consume, is

(63)    β̂ = β + Σ(y_t − ȳ)ε_t / Σ(y_t − ȳ)².

Since y is dependent on ε, according to (61), β̂ cannot be a consistent estimator of β.

Figure 2. If the only source of variation in y is the variation in i, then the observations on y and c will delineate the consumption function c = α + βy.

Figure 3. If the only source of variation in y is the disturbance to c, then the observations on y and c will lie along the 45° line c = y − i.

To determine the probability limit of the estimator, we must assess the separate probability limits of the numerator and the denominator of the term on the RHS of (63).
The following results are available:

(64)    lim (1/T)Σ(i_t − ī)² = m_ii = V(i),
        plim (1/T)Σ(y_t − ȳ)² = (m_ii + σ²)/(1 − β)² = V(y),
        plim (1/T)Σ(y_t − ȳ)ε_t = σ²/(1 − β) = C(y, ε).

These results indicate that

(65)    plim β̂ = β + σ²(1 − β)/(m_ii + σ²) = (βm_ii + σ²)/(m_ii + σ²).

The Method of Maximum Likelihood

The disturbances ε_t; t = 1, . . . , T in the regression model are assumed to be independently and identically distributed with a normal density:

(66)    N(ε_t; 0, σ²) = [1/√(2πσ²)] exp{−ε_t²/2σ²}.

Since they are assumed to be independently distributed, their joint probability density function (p.d.f.) is

(67)    Π_{t=1}^{T} N(ε_t; 0, σ²) = (2πσ²)^{−T/2} exp{−(1/2σ²)Σ_{t=1}^{T} ε_t²}.

If we regard the elements x₁, . . . , x_T as a given set of numbers, then it follows that the conditional p.d.f. of the sample y₁, . . . , y_T is

(68)    f(y₁, . . . , y_T | x₁, . . . , x_T) = (2πσ²)^{−T/2} exp{−(1/2σ²)Σ_{t=1}^{T}(y_t − α − βx_t)²}.

The maximum-likelihood estimates of α, β and σ² are the values that maximise the probability measure that is attributed to the sample y₁, . . . , y_T. The log-likelihood function, which is maximised by these values, is

(69)    log L = −(T/2)log(2π) − (T/2)log(σ²) − (1/2σ²)Σ_{t=1}^{T}(y_t − α − βx_t)².

Given the value of σ², this is maximised by the values α̂ and β̂ under (42) and (45) respectively, which minimise the error sum of squares. The estimate of σ² comes from the following first-order condition:

(70)    ∂log L/∂σ² = −T/2σ² + (1/2σ⁴)Σ_{t=1}^{T}(y_t − α − βx_t)² = 0.

Multiplying throughout by 2σ⁴/T and rearranging the result gives

(71)    σ²(α̂, β̂) = (1/T)Σ_{t=1}^{T}(y_t − α̂ − β̂x_t)² = (1/T)Σ e_t².
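The inconsistency result (65) can be checked by simulating the model of (59) and (60) and averaging the least-squares estimates over many replications. A Python sketch, with parameter values invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, sigma2 = 2.0, 0.75, 1.0   # invented parameter values
T, reps = 5000, 200
i = rng.normal(5.0, 1.0, size=T)       # investment, independent of the disturbances
m_ii = np.mean((i - i.mean()) ** 2)

estimates = []
for _ in range(reps):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=T)
    y = (alpha + i + eps) / (1.0 - beta)   # reduced form (61)
    c = alpha + beta * y + eps             # consumption function (60)
    b = np.sum((y - y.mean()) * (c - c.mean())) / np.sum((y - y.mean()) ** 2)
    estimates.append(b)

avg = float(np.mean(estimates))
plim_pred = (beta * m_ii + sigma2) / (m_ii + sigma2)   # equation (65)
# avg exceeds the true beta and sits close to plim_pred, even with T = 5000.
```

Increasing T does not remove the discrepancy: the estimator converges to the plim in (65), not to β, because y and ε are correlated through the reduced form.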