As a practical matter, when computing the maximum likelihood estimate it is often easier to work with the log-likelihood, $\ell(\theta) := \log p(y \mid \theta)$. Because the logarithm is monotonically increasing, it does not affect the argmax:

$$\hat{\theta}_{\mathrm{ML}} \in \arg\max_{\theta} \ell(\theta). \tag{4}$$

The ML estimator is very popular and has been in use since the time of Laplace. It has a number of nice properties, one of which is that it is a consistent estimator. Let's explain what that means.

Definition 1 (Convergence in Probability). A sequence of random variables $X_1, X_2, \ldots$ is said to converge in probability to a random variable $X$ if, for all $\epsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| \geq \epsilon) = 0.$$

We denote this convergence as $X_n \xrightarrow{P} X$.

Definition 2 (Consistent estimators). Suppose the data $y_1, \ldots, y_m$ were generated by a probability distribution $p(y \mid \theta_0)$. An estimator $\hat{\theta}$ is consistent if it converges in probability to the true value: $\hat{\theta} \xrightarrow{P} \theta_0$ as $m \to \infty$.

We said that maximum likelihood is consistent. This means that if the distribution that generated the data belongs to the family defined by our likeliho...
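
The two ideas above (taking logs does not move the argmax, and a consistent estimator concentrates around the true parameter as the sample size grows) can be checked numerically. The following is a minimal Python sketch, not part of the original notes. It assumes a Bernoulli model with a hypothetical true parameter of 0.3; the function names, the grid search, the tolerance 0.05, and the number of Monte Carlo trials are all illustrative choices. It relies on the standard fact that the ML estimate for i.i.d. Bernoulli data is the sample mean.

```python
import math
import random


def log_likelihood(theta, ys):
    """Bernoulli log-likelihood log p(y | theta) for i.i.d. 0/1 data."""
    k = sum(ys)   # number of ones
    m = len(ys)   # sample size
    return k * math.log(theta) + (m - k) * math.log(1.0 - theta)


def likelihood(theta, ys):
    """Bernoulli likelihood p(y | theta), computed from the log-likelihood."""
    return math.exp(log_likelihood(theta, ys))


def grid_argmax(f, grid):
    """Return the grid point at which f is largest (first one on ties)."""
    return max(grid, key=f)


def ml_estimate(ys):
    """Closed-form Bernoulli ML estimate: the sample mean."""
    return sum(ys) / len(ys)


def exceed_frequency(theta0, m, eps, trials, rng):
    """Monte Carlo estimate of P(|theta_hat - theta0| >= eps) at sample size m."""
    hits = 0
    for _ in range(trials):
        ys = [1 if rng.random() < theta0 else 0 for _ in range(m)]
        if abs(ml_estimate(ys) - theta0) >= eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Assumed toy data set: 6 ones out of 20 observations.
    ys = [1] * 6 + [0] * 14
    grid = [i / 100 for i in range(1, 100)]  # theta in (0, 1), step 0.01

    # The likelihood and the log-likelihood are maximized at the same theta.
    print("argmax of likelihood:    ", grid_argmax(lambda t: likelihood(t, ys), grid))
    print("argmax of log-likelihood:", grid_argmax(lambda t: log_likelihood(t, ys), grid))
    print("closed-form ML estimate: ", ml_estimate(ys))

    # Consistency: the estimated probability of missing theta0 by at least
    # eps should shrink as the sample size m grows.
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for m in (10, 100, 1000):
        freq = exceed_frequency(theta0=0.3, m=m, eps=0.05, trials=200, rng=rng)
        print(f"m = {m:5d}: estimated P(|error| >= 0.05) = {freq:.3f}")
```

In practice the log-likelihood is also preferred for numerical reasons: for large samples the raw likelihood is a product of many numbers below one and can underflow to zero in floating point, whereas the sum of logs stays well scaled. The sketch uses a small sample for the argmax comparison for exactly that reason.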
