Stat 5102 Lecture Slides, Deck 5
Charles J. Geyer
School of Statistics
University of Minnesota

Linear Models

We now return to frequentist statistics for the rest of the course.

The next subject is linear models, parts of which are variously called regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA), with regression being subdivided into simple linear regression and multiple regression.

Although users have a very fractured view of the subject (many think regression and ANOVA have nothing to do with each other), a unified view is much simpler and more powerful.

Linear Models (cont.)

In linear models we have data on $n$ individuals. For each individual we observe one variable, called the response, which is treated as random, and also observe other variables, called predictors or covariates, which are treated as fixed. If the predictors are actually random, then we condition on them.

Collect the response variables into a random vector $\mathbf{Y}$ of length $n$.

In linear models we assume the components of $\mathbf{Y}$ are normally distributed and independent and have the same variance $\sigma^2$. We do not assume they are identically distributed. Their means can be different.

Linear Models (cont.)

$$E(\mathbf{Y}) = \boldsymbol{\mu} \qquad (*)$$

$$\operatorname{var}(\mathbf{Y}) = \sigma^2 \mathbf{I} \qquad (**)$$

where $\mathbf{I}$ is the $n \times n$ identity matrix. Hence

$$\mathbf{Y} \sim N(\boldsymbol{\mu}, \sigma^2 \mathbf{I}) \qquad (***)$$

Recall that we are conditioning on the covariates, hence the expectation $(*)$ is actually a conditional expectation, conditioning on any covariates that are random, although we have not indicated that in the notation. Similarly, the variance in $(**)$ is a conditional variance, and the distribution in $(***)$ is a conditional distribution.

Linear Models (cont.)

One more assumption gives "linear models" its name

$$\boldsymbol{\mu} = \mathbf{M} \boldsymbol{\beta}$$

where $\mathbf{M}$ is a nonrandom matrix, which may depend on the covariates, and $\boldsymbol{\beta}$ is a vector of dimension $p$ of unknown parameters.

The matrix $\mathbf{M}$ is called the model matrix or the design matrix.
We will always use the former, since the latter doesn't make much sense except for a designed experiment.

Each row of $\mathbf{M}$ corresponds to one individual. The $i$th row determines the mean for the $i$th individual

$$E(Y_i) = m_{i1} \beta_1 + m_{i2} \beta_2 + \cdots + m_{ip} \beta_p$$

and $m_{i1}, \ldots, m_{ip}$ depend only on the covariate information for this individual.

Linear Models (cont.)

The joint PDF of the data is

$$
\begin{aligned}
f(\mathbf{y})
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} (y_i - \mu_i)^2 \right) \\
&= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu_i)^2 \right) \\
&= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{M}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{M}\boldsymbol{\beta}) \right)
\end{aligned}
$$

Hence the log likelihood is

$$l(\boldsymbol{\beta}, \sigma^2) = -\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{M}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{M}\boldsymbol{\beta})$$

The Method of Least Squares

The maximum likelihood estimate for $\boldsymbol{\beta}$ maximizes the log likelihood, which is the same as minimizing the quadratic function

$$\boldsymbol{\beta} \mapsto (\mathbf{y} - \mathbf{M}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{M}\boldsymbol{\beta})$$

Hence this method of estimation is also called the "method of least squares".

Historically, the method of least squares was invented about 1800 and the method of maximum likelihood was invented about 1920. So the older name still attaches to the method....
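The link between the log likelihood and the quadratic function can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses simple linear regression, so each row of the model matrix is [1, x_i], and the data values are made up for illustration. The function names are hypothetical. The estimate is found by solving the normal equations M^T M beta = M^T y, a standard fact about least squares that is not derived in the text above.

```python
import math


def model_matrix(x):
    """Model matrix for simple linear regression: row i is [1, x_i]."""
    return [[1.0, xi] for xi in x]


def residual_sum_of_squares(M, y, beta):
    """The quadratic function (y - M beta)^T (y - M beta)."""
    total = 0.0
    for row, yi in zip(M, y):
        mu_i = sum(m * b for m, b in zip(row, beta))  # E(Y_i) = sum_j m_ij beta_j
        total += (yi - mu_i) ** 2
    return total


def log_likelihood(M, y, beta, sigma2):
    """Log likelihood, omitting the additive constant -(n/2) log(2 pi)."""
    n = len(y)
    rss = residual_sum_of_squares(M, y, beta)
    return -0.5 * n * math.log(sigma2) - rss / (2.0 * sigma2)


def least_squares_two_columns(M, y):
    """Solve the normal equations M^T M beta = M^T y when M has two columns."""
    a = sum(r[0] * r[0] for r in M)
    b = sum(r[0] * r[1] for r in M)
    d = sum(r[1] * r[1] for r in M)
    u = sum(r[0] * yi for r, yi in zip(M, y))
    v = sum(r[1] * yi for r, yi in zip(M, y))
    det = a * d - b * b  # nonzero as long as the x values are not all equal
    return [(d * u - b * v) / det, (a * v - b * u) / det]


# Made-up illustrative data (an assumption, not from the slides).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

M = model_matrix(x)
beta_hat = least_squares_two_columns(M, y)

# Any other beta has a larger residual sum of squares and, for any fixed
# sigma^2, a smaller log likelihood.
beta_other = [beta_hat[0] + 0.1, beta_hat[1] - 0.05]
sigma2 = 1.0
print(beta_hat)
print(residual_sum_of_squares(M, y, beta_hat), residual_sum_of_squares(M, y, beta_other))
print(log_likelihood(M, y, beta_hat, sigma2), log_likelihood(M, y, beta_other, sigma2))
```

The value chosen for sigma2 does not affect which beta wins the comparison, because beta enters the log likelihood only through the quadratic term. The closed-form 2 x 2 solve keeps the sketch dependency-free; with more columns one would normally use a linear algebra library instead.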
 Spring '03
 Staff
