Extending Linear Regression: Weighted Least Squares, Heteroskedasticity, Local Polynomial Regression

36-350, Data Mining
23 October 2009

Contents

1 Weighted Least Squares
2 Heteroskedasticity
  2.1 Weighted Least Squares as a Solution to Heteroskedasticity
3 Local Linear Regression
4 Exercises

1 Weighted Least Squares

Instead of minimizing the residual sum of squares,

\[ RSS(\beta) = \sum_{i=1}^{n} (y_i - \vec{x}_i \cdot \beta)^2 \tag{1} \]

we could minimize the weighted sum of squares,

\[ WSS(\beta, \vec{w}) = \sum_{i=1}^{n} w_i (y_i - \vec{x}_i \cdot \beta)^2 \tag{2} \]

This includes ordinary least squares as the special case where all the weights $w_i = 1$. We can solve it by the same kind of algebra we used to solve the ordinary linear least squares problem. But why would we want to solve it? For three reasons.

1. Focusing accuracy. We may care more about predicting the response for certain values of the input (ones we expect to see often again, ones where mistakes are especially costly or embarrassing or painful, etc.) than for others. If we give the points $\vec{x}_i$ near that region big weights $w_i$, and points elsewhere smaller weights, the regression will be pulled towards matching the data in that region.

2. Discounting imprecision. Ordinary least squares is the maximum likelihood estimate when the $\epsilon$ in $Y = \vec{X} \cdot \beta + \epsilon$ is IID Gaussian white noise. This means that the variance of $\epsilon$ has to be constant, and we measure the regression curve with the same precision everywhere. This situation, of constant noise variance, is called homoskedasticity. Often, however, the magnitude of the noise is not constant, and the data are heteroskedastic. When we have heteroskedasticity, even if each noise term is still Gaussian, ordinary least squares is no longer the maximum likelihood estimate, and so no longer efficient. If, however, we know the noise variance $\sigma_i^2$ at each measurement $i$, and set $w_i = 1/\sigma_i^2$, we get the heteroskedastic MLE, and recover efficiency.

   To say the same thing slightly differently, there's just no way that we can estimate the regression function as accurately where the noise is large as we can where the noise is small. Trying to give equal attention to all parts of the input space is a waste of time; we should be more concerned about fitting well where the noise is small, and expect to fit poorly where the noise is big.

3. Doing something else. There are a number of other optimization problems which can be transformed into, or approximated by, weighted least squares. The most important of these arises from generalized linear models, where the mean response is some nonlinear function of a linear predictor. (Logistic regression is an example.)

In the first case, we decide on the weights to reflect our priorities. In the third case, the weights come from the optimization problem we'd really rather be solving. What about the second case, of heteroskedasticity?...
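To make the weighted problem concrete, here is a minimal sketch in Python with NumPy. It is not code from these notes. It relies on the standard result that the minimizer of the weighted sum of squares in equation (2) satisfies the weighted normal equations $(X^T W X)\beta = X^T W y$, with $W$ the diagonal matrix of the weights. The function name `weighted_least_squares`, the simulated data, and the particular noise standard deviation `sigma` are all assumptions made for illustration.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i . beta)^2 over beta.

    Solves the weighted normal equations (X^T W X) beta = X^T W y,
    where W = diag(w). Hypothetical helper, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w  # broadcasting scales observation i by w_i, giving X^T W
    return np.linalg.solve(XtW @ X, XtW @ y)


# Simulated heteroskedastic data (assumed setup, not from the notes):
# the noise standard deviation grows with |x| and is treated as known.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])  # intercept column plus one input
beta_true = np.array([1.0, 2.0])
sigma = 0.2 + 1.5 * np.abs(x)
y = X @ beta_true + sigma * rng.normal(size=n)

# All weights equal to 1: ordinary least squares as a special case.
beta_ols = weighted_least_squares(X, y, np.ones(n))

# Weights w_i = 1 / sigma_i^2: the heteroskedastic MLE under Gaussian noise.
beta_wls = weighted_least_squares(X, y, 1.0 / sigma**2)

print("OLS estimate:", beta_ols)
print("WLS estimate:", beta_wls)
```

Passing all-ones weights reproduces ordinary least squares, which matches the special case noted under equation (2). In practice the noise variances are rarely known exactly, so the `sigma` used here stands in for whatever variance information is actually available.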
 Spring '12
 Staff
 Linear Regression, Regression Analysis, Ordinary least squares
