[Truncated regression output (variable e401k) omitted]

VER. 10/23/2012. © P. KOLM

Some Final Comments about GLS/WLS
• Feasible GLS is in general not BLUE (unlike GLS) but is consistent and asymptotically more efficient than OLS
• Remember we are using GLS/WLS just for inference/efficiency – OLS is still unbiased and consistent
• OLS and GLS estimates will still differ due to sampling error, but if they are very different then it is likely that some other Gauss-Markov assumption does not hold

Multiple Regression Analysis: LAD, Ridge and Lasso Regression

Motivation

Discussion points:
• How sensitive is linear regression to outliers?
• How sensitive is linear regression to multicollinearity?
• How much confidence do we have in a linear regression model that has been calibrated using noisy data?
How do we deal with these issues?
• Many techniques and approaches exist, for example outlier detection, censoring and data filtering
• Which one is best depends on the situation
• Several approaches are based on “changing” the way in which the regression is calculated
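To make the first discussion point concrete, here is a small illustration (not from the slides) of how a single outlier can move an OLS fit. The helper name ols_fit is hypothetical; it implements the standard closed-form formulas for simple regression.

```python
# Illustration (not from the slides): one outlier can move an OLS fit a lot.
# We fit y = b0 + b1*x by the usual closed-form simple-regression formulas,
# first on clean data lying exactly on y = 1 + 2x, then with one corrupted point.

def ols_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]      # points exactly on the line y = 1 + 2x

b0, b1 = ols_fit(xs, ys)              # recovers (1.0, 2.0)

ys_bad = ys[:]
ys_bad[-1] = 50.0                     # a single gross outlier at x = 4
b0_bad, b1_bad = ols_fit(xs, ys_bad)  # slope pulled far away from 2

print(b0, b1)          # 1.0 2.0
print(b0_bad, b1_bad)  # -7.2 10.2
```

One corrupted observation out of five moves the slope from 2 to about 10, which is the sensitivity that LAD regression (below) is designed to reduce.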
In the next few slides we discuss the main ideas behind the LAD, Ridge and Lasso regression techniques

Least Absolute Deviation (LAD)

The least absolute deviation (LAD) estimator determines the coefficients in the linear model

y = β0 + β1x1 + … + βk xk + u

by minimizing the sum of the absolute values of the residuals, that is

min_{β̂0, β̂1, …, β̂k} ∑_{i=1}^{n} | yi − β̂0 − β̂1 xi1 − … − β̂k xik |

• Minimizing the sum of absolute values of the residuals makes the regression less sensitive to outliers (as compared to classical linear regression, which minimizes the sum of squared residuals)
• LAD estimates are not available in closed form and have to be solved for numerically

Ridge Regression

Ridge regression determines the coefficients in the linear model

y = β0 + β1x1 + … + βk xk + u
by minimizing the sum of squared residuals plus a “penalty term” on the squared regression coefficients (excluding β̂0):

min_{β̂0, β̂1, …, β̂k} ∑_{i=1}^{n} ( yi − β̂0 − β̂1 xi1 − … − β̂k xik )² + λ ∑_{j=1}^{k} β̂j²

The “ridge parameter”, λ, can be chosen in a number of different ways
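As an aside (not from the slides): for centered predictors the ridge objective has the closed-form solution b̂ = (X′X + λI)⁻¹ X′y, and excluding the intercept from the penalty can be handled by centering the data first. A minimal pure-Python sketch for two predictors, with the hypothetical helper name ridge_2pred:

```python
# Sketch (not from the slides) of the closed-form ridge solution for two
# predictors. The estimates solve (X'X + lam*I) b = X'y on centered data;
# the unpenalized intercept is recovered at the end from the sample means.

def ridge_2pred(x1, x2, y, lam):
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]              # centered predictors and response
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of X'X + lam*I and of X'y for the centered 2-column design
    a11 = sum(v * v for v in c1) + lam
    a22 = sum(v * v for v in c2) + lam
    a12 = sum(u * v for u, v in zip(c1, c2))
    g1 = sum(u * v for u, v in zip(c1, cy))
    g2 = sum(u * v for u, v in zip(c2, cy))
    det = a11 * a22 - a12 * a12            # solve the 2x2 system by Cramer's rule
    b1 = (a22 * g1 - a12 * g2) / det
    b2 = (a11 * g2 - a12 * g1) / det
    b0 = my - b1 * m1 - b2 * m2            # intercept from the means
    return b0, b1, b2

x1 = [0.0, 1.0, 2.0, 3.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # exact model y = 1 + 2*x1 + 3*x2

print(ridge_2pred(x1, x2, y, 0.0))  # lam = 0 reproduces OLS: (1.0, 2.0, 3.0)
print(ridge_2pred(x1, x2, y, 1.0))  # lam = 1 shrinks both slopes toward zero
```

With λ = 0 the estimate is exactly OLS; increasing λ shrinks the slope coefficients toward zero, illustrating the bullet points that follow.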
• Ridge estimates can be calculated in closed form
• Ridge regression is known to shrink the coefficients of correlated predictors towards each other, allowing them to “borrow strength from each other”

The Lasso

The Lasso determines the coefficients in the linear model

y = β0 + β1x1 + … + βk xk + u
by minimizing the sum of squared residuals plus a “penalty term” on the absolute values of the regression coefficients (excluding β̂0):

min_{β̂0, β̂1, …, β̂k} ∑_{i=1}^{n} ( yi − β̂0 − β̂1 xi1 − … − β̂k xik )² + λ ∑_{j=1}^{k} |β̂j|

The “Lasso parameter”, λ...
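The slides do not spell out how the Lasso is computed; a common approach (an assumption here, not named in the slides) is coordinate descent with soft-thresholding. A minimal pure-Python sketch on centered data, with hypothetical helper names lasso_cd and soft_threshold:

```python
# Sketch (not from the slides): coordinate descent for the Lasso objective
#   min_b  sum_i (y_i - sum_j b_j x_ij)^2 + lam * sum_j |b_j|
# on centered data (so no intercept term appears in the penalty or the fit).

def soft_threshold(a, t):
    """Soft-thresholding operator: shrink a toward zero by t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """X is a list of rows; returns the coefficient list b."""
    n, k = len(X), len(X[0])
    b = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # rho_j = correlation of feature j with the partial residual
            # (the residual computed with feature j's contribution removed)
            rho = sum(X[i][j] * (y[i] - sum(X[i][m] * b[m] for m in range(k) if m != j))
                      for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(rho, lam / 2.0) / z
    return b

# Centered toy data: y = 2*x1 + 0.5*x2 exactly
X = [[-1.5, 0.5], [-0.5, -0.5], [0.5, 0.5], [1.5, -0.5]]
y = [2.0 * r[0] + 0.5 * r[1] for r in X]

print(lasso_cd(X, y, 0.0))  # lam = 0: approaches the OLS solution [2.0, 0.5]
print(lasso_cd(X, y, 2.0))  # larger lam: the weak coefficient is set exactly to 0
```

Unlike ridge, which only shrinks coefficients, a large enough Lasso penalty drives some coefficients exactly to zero, which is why the Lasso is often used for variable selection.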
This document was uploaded on 02/17/2014 for the course COURANT G63.2751.0 at NYU.