$$
\bar{m}_n(\theta) = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} m_{i1}(\theta) \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} m_{iL}(\theta) \end{pmatrix}, \tag{25}
$$

where $n$ denotes the sample size. Assume that these sample moments converge to their population counterparts, i.e., $\bar{m}_n \xrightarrow{p} E(m_i(\theta)) = 0$. Directly relating the sample moments to their population counterparts (as we did for the MoM case) produces the moment equations. Even if these equations are independent (and $L < n$, as assumed), this system will not have a unique solution. (Henceforth, let's drop the $n$ subscript for ease of exposition.)

## 4.1 Relationship to other estimators

We can recognize that all other estimators we have seen so far, such as OLS, IV, and ML, can be interpreted as GMM estimators. For OLS, for example, we have the following moment conditions:

$$
E\bigl(m_i(\beta)\bigr) = E\begin{pmatrix} x_{i1}(y_i - x_i'\beta) \\ x_{i2}(y_i - x_i'\beta) \\ \vdots \\ x_{iK}(y_i - x_i'\beta) \end{pmatrix} = E(x_i \varepsilon_i) = 0. \tag{26}
$$

Notice that in this example $L = K$ and the system is exactly identified. The corresponding sample moments of (26) are

$$
\bar{m}(\beta) = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} m_{i1}(\beta) \\ \frac{1}{n}\sum_{i=1}^{n} m_{i2}(\beta) \\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n} m_{iL}(\beta) \end{pmatrix} = \frac{1}{n}\sum_{i=1}^{n} x_i(y_i - x_i'\beta). \tag{27}
$$

The moment equations are $\bar{m}(\beta) = 0$. The IV case is similar to (26) and (27), where we replace $x_{il}$ with $z_{il}$, and generally $L > K$. For the ML estimator, the moment conditions are given by the score identity, which states that the expectation of the first derivative of the log-likelihood function for any observation should be zero, i.e.,

$$
E\bigl(m_i(\theta)\bigr) = E\left(\frac{\partial l_i(\theta)}{\partial \theta}\right) = 0. \tag{28}
$$

Then, the sample moments are given by

$$
\bar{m}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial l_i(\theta)}{\partial \theta}, \tag{29}
$$
which yields the moment equations $\bar{m}(\theta) = 0$. Note that we again have an exactly identified system with $L = K$. The division by $n$ in (29) is just for compatibility with the general GMM approach and won't change the ML solution.

## 4.2 Criterion Function

The GMM approach builds on the notion of defining a scalar-valued criterion function $q\bigl(\bar{m}(\theta)\bigr)$ that incorporates all available moments. Minimizing the criterion function with respect to $\theta$ yields a unique GMM solution if the identification conditions are met. The question we need to answer is which criterion function we should use. One can, for example, use the sum of squares, i.e.,

$$
q\bigl(\bar{m}(\theta)\bigr) = \bar{m}(\theta)'\,\bar{m}(\theta) = \sum_{l=1}^{L} \bar{m}_l^2. \tag{30}
$$

This criterion function gives each moment equal weight, which can be shown to be inefficient. Intuitively, we'd like to give moments with relatively large asymptotic variance a relatively smaller weight, to boost the efficiency of the resulting estimator. This requires some form of weighted least squares, i.e.,

$$
q\bigl(\bar{m}(\theta)\bigr) = \bar{m}(\theta)'\,W_n\,\bar{m}(\theta), \tag{31}
$$

where $W_n$ can be any positive definite matrix that may depend on the data, but not on $\theta$. We also assume that $W_n$ has a well-behaved large-sample counterpart, i.e., $W_n \xrightarrow{p} W$, a positive definite matrix. If $W_n$ satisfies these conditions, the GMM estimator $\hat{\theta}_{gmm}$ from the minimization of (31) will be consistent.

The appropriate choice of the weight matrix matters for efficiency. The optimal choice of $W_n$ is the matrix that converges in probability to the inverse of the asymptotic variance of $\sqrt{n}\,\bar{m}(\theta)$. Again, let's denote this asymptotic variance by $\Phi$. The asymptotic distribution of the GMM estimator is given as

$$
\hat{\theta}_{gmm} \stackrel{A}{\sim} N\Bigl(\theta_0,\ \widehat{\mathrm{Var}}\bigl(\hat{\theta}_{gmm}\bigr)\Bigr), \tag{32}
$$

where

$$
\widehat{\mathrm{Var}}\bigl(\hat{\theta}_{gmm}\bigr) = \frac{1}{n}\,(\Gamma' W \Gamma)^{-1}\,\bigl(\Gamma' W \Phi W \Gamma\bigr)\,(\Gamma' W \Gamma)^{-1} \tag{33}
$$

and $\Gamma$
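To make the exactly identified OLS case of Section 4.1 concrete, the sketch below simulates data and shows that solving the sample moment equations $\bar{m}(\beta) = 0$ from (27) is exactly the OLS estimator $(X'X)^{-1}X'y$. All data, dimensions, and variable names here are illustrative assumptions, not taken from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3  # sample size and number of regressors (illustrative values)

# Simulated design matrix with an intercept, and outcome y = X beta + eps
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

def sample_moments(beta):
    """Sample moments m(beta) = (1/n) sum_i x_i (y_i - x_i' beta), as in (27)."""
    return X.T @ (y - X @ beta) / n

# Setting m(beta) = 0 gives K linear equations in K unknowns (L = K, exactly
# identified): (1/n) X'y - (1/n) X'X beta = 0  =>  beta = (X'X)^{-1} X'y,
# which is precisely the OLS estimator.
beta_mom = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_mom)
print(np.max(np.abs(sample_moments(beta_mom))))  # numerically zero
```

Because the system is exactly identified, the moment equations hold exactly at the solution and no weighting matrix is needed.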
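For the overidentified case ($L > K$), the criterion (31) must be minimized rather than solved exactly. A minimal two-step sketch for a linear IV model, where the minimizer of (31) has the closed form $\hat{\beta} = (X'ZWZ'X)^{-1}X'ZWZ'y$: step one uses the identity weight (equal weighting, as in (30)); step two re-weights by the inverse of an estimate of $\Phi$, the asymptotic variance of the moments. The data-generating process and all variable names are hypothetical, chosen only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Illustrative setup: one endogenous regressor, L = 2 instruments (L > K = 1)
z = rng.normal(size=(n, 2))                                   # instruments
u = rng.normal(size=n)                                        # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)   # endogenous via u
y = 2.0 * x + u                                               # true coefficient: 2.0
X, Z = x[:, None], z

def m_bar(beta):
    """Sample moments (1/n) sum_i z_i (y_i - x_i' beta)."""
    return Z.T @ (y - X @ beta) / n

def q(beta, W):
    """GMM criterion (31): m(beta)' W m(beta)."""
    m = m_bar(beta)
    return m @ W @ m

def gmm(W):
    """Closed-form minimizer of (31) for linear moments."""
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, b)

# Step 1: identity weight matrix (consistent but inefficient)
b1 = gmm(np.eye(2))

# Step 2: estimate Phi = Var(z_i e_i) from step-1 residuals, then use W = Phi^{-1}
e = y - X @ b1
Phi_hat = (Z * e[:, None] ** 2).T @ Z / n
b2 = gmm(np.linalg.inv(Phi_hat))

print(b1, b2)  # both near the true coefficient 2.0
```

Note that with $L > K$ the moments cannot all be set to zero at once, so $q(\hat{\beta}, W) > 0$ in general; the weighting only determines how the remaining discrepancy is traded off across moments, which is exactly the efficiency argument behind the optimal $W$.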