Stat 5102 Lecture Slides, Deck 3

Charles J. Geyer
School of Statistics
University of Minnesota

1

Likelihood Inference

We have learned one very general method of estimation: the method of moments. Now we learn another: the method of maximum likelihood.

2

Likelihood

Suppose we have a parametric statistical model specified by a PMF or PDF.

Our convention of using boldface to distinguish between scalar data $x$ and vector data $\mathbf{x}$, and between a scalar parameter $\theta$ and a vector parameter $\boldsymbol{\theta}$, becomes a nuisance here. To begin our discussion we write the PMF or PDF as $f_\theta(x)$. But it makes no difference in likelihood inference if the data $x$ is a vector. Nor does it make a difference in the fundamental definitions if the parameter $\theta$ is a vector.

You may consider $x$ and $\theta$ to be scalars, but much of what we say until further notice works equally well if either $x$ or $\theta$ is a vector or both are.

3

Likelihood

The PMF or PDF, considered as a function of the unknown parameter or parameters rather than of the data, is called the likelihood function

\[ L(\theta) = f_\theta(x) \]

Although $L(\theta)$ also depends on the data $x$, we suppress this in the notation. If the data are considered random, then $L(\theta)$ is a random variable, and the function $L$ is a random function. If the data are considered nonrandom, as when the observed value of the data is plugged in, then $L(\theta)$ is a number, and $L$ is an ordinary mathematical function.

Since the data $X$ or $x$ do not appear in the notation $L(\theta)$, we cannot distinguish these cases notationally and must do so by context.

4

Likelihood (cont.)

For every purpose for which likelihood is used in statistics (it is the key to both likelihood inference and Bayesian inference), it does not matter if multiplicative terms not containing unknown parameters are dropped from the likelihood function.

If $L(\theta)$ is a likelihood function for a given problem, then so is

\[ L^*(\theta) = L(\theta) \, h(x) \]

where $h$ is any strictly positive real-valued function.
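As a rough illustration of the point about multiplicative terms, the sketch below treats a binomial PMF as a function of the success probability with the data held fixed. The binomial model, the data values, the grid, and all function names are assumptions made for this example and do not come from the slides. The binomial coefficient does not involve the parameter, so the likelihood with and without it should differ only by a constant factor and should peak at the same parameter value.

```python
import math

def binomial_likelihood(p, x, n):
    """Binomial PMF f_p(x), viewed as a function of p with the data x held fixed."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_likelihood_kernel(p, x, n):
    """The same likelihood with the factor comb(n, x), which is free of p, dropped."""
    return p**x * (1 - p)**(n - x)

# Hypothetical data: x = 7 successes in n = 20 trials.
x, n = 7, 20

# Grid of parameter values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

# The ratio of the two versions should be the same constant for every p.
ratios = [
    binomial_likelihood(p, x, n) / binomial_likelihood_kernel(p, x, n)
    for p in grid
]

# Both versions should be maximized at the same grid point.
argmax_full = max(grid, key=lambda p: binomial_likelihood(p, x, n))
argmax_kernel = max(grid, key=lambda p: binomial_likelihood_kernel(p, x, n))

print("smallest ratio:", min(ratios))
print("largest ratio: ", max(ratios))
print("maximizer (full, kernel):", argmax_full, argmax_kernel)
```

In the notation above, dropping the coefficient corresponds to choosing $h(x)$ equal to the reciprocal of the binomial coefficient, which is strictly positive.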
5

Log Likelihood

In frequentist inference, the log likelihood function, which is the logarithm of the likelihood function, is more useful. If $L$ is the likelihood function, we write

\[ l(\theta) = \log L(\theta) \]

for the log likelihood.

When discussing asymptotics, we often add a subscript denoting sample size, so the likelihood becomes $L_n(\theta)$ and the log likelihood becomes $l_n(\theta)$.

Note: we have yet another capital and lower case convention: capital $L$ for likelihood and lower case $l$ for log likelihood.

6

Log Likelihood (cont.)

As we said before (slide 5), we may drop multiplicative terms not containing unknown parameters from the likelihood function. If

\[ L(\theta) = h(x) \, g(x, \theta) \]

we may drop the term $h(x)$. Since

\[ l(\theta) = \log h(x) + \log g(x, \theta) \]

this means we may drop additive terms not containing unknown parameters from the log likelihood function....
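The additive version of the same idea can be checked numerically. The following is a minimal sketch, assuming an independent Poisson sample with mean `mu`. The model, the data, and the function names are hypothetical choices for illustration, not part of the slides. The term $-\sum_i \log(x_i!)$ plays the role of $\log h(x)$: it does not involve `mu`, so the full and reduced log likelihoods should differ by the same constant at every parameter value.

```python
import math

def poisson_log_likelihood(mu, data):
    """Full log likelihood l_n(mu): sum of log Poisson PMFs over the sample."""
    # math.lgamma(x + 1) equals log(x!) for a nonnegative integer x.
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in data)

def poisson_log_likelihood_reduced(mu, data):
    """l_n(mu) with the additive term -sum(log(x_i!)), which is free of mu, dropped."""
    n = len(data)
    return sum(data) * math.log(mu) - n * mu

# Hypothetical sample of counts, n = 6.
data = [2, 0, 3, 1, 4, 2]

# The term that was dropped; it depends on the data only.
dropped = -sum(math.lgamma(x + 1) for x in data)

# Difference between the two versions at several parameter values.
mus = (0.5, 1.0, 2.0, 5.0)
differences = [
    poisson_log_likelihood(mu, data) - poisson_log_likelihood_reduced(mu, data)
    for mu in mus
]

print("dropped term:", dropped)
for mu, d in zip(mus, differences):
    print(f"mu = {mu}: full minus reduced = {d}")
```

Because the difference does not change with `mu`, either version gives the same comparisons between parameter values.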
This note was uploaded on 10/28/2010 for the course STAT 5102 taught by Professor Staff during the Spring '03 term at Minnesota.
