IAML: Regularization and Ridge Regression
Nigel Goddard
School of Informatics
Semester 1

Regularization
- Regularization is a general approach to adding a “complexity parameter” to a learning algorithm. It requires that the model parameters be continuous (i.e., regression is OK, decision trees are not).
- If we penalize polynomials that have large values for their coefficients, we get less wiggly solutions:
  $$\tilde{E}(\mathbf{w}) = \|\mathbf{y} - \Phi\mathbf{w}\|^2 + \lambda\|\mathbf{w}\|^2$$
- The solution is $\hat{\mathbf{w}} = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{y}$.
- This is known as ridge regression.
- Rather than using a discrete control parameter like M (model order), we can use a continuous parameter λ.
- Caution: don't shrink the bias term! (This is the weight that corresponds to the all-ones feature.)
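
The closed-form solution can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: the function name `ridge_fit`, the parameter `bias_col`, and the toy sine data are hypothetical. It assumes column `bias_col` of Φ is the all-ones feature and honours the caution above by putting a zero on that diagonal entry of the penalty, so only the remaining weights are shrunk. It solves the linear system rather than forming the inverse explicitly, which is cheaper and numerically safer.

```python
import numpy as np

def ridge_fit(Phi, y, lam, bias_col=0):
    """Minimise |y - Phi w|^2 + lam * |w|^2, leaving the bias weight unpenalised.

    Assumes column `bias_col` of Phi is the all-ones (bias) feature;
    pass bias_col=None to penalise every weight.
    """
    n_features = Phi.shape[1]
    # Penalty matrix lam * I, with a zero on the bias entry so it is not shrunk.
    penalty = lam * np.eye(n_features)
    if bias_col is not None:
        penalty[bias_col, bias_col] = 0.0
    # Solve (Phi^T Phi + penalty) w = Phi^T y instead of inverting the matrix.
    A = Phi.T @ Phi + penalty
    b = Phi.T @ y
    return np.linalg.solve(A, b)

# Usage on a hypothetical toy problem: cubic polynomial features of noisy data.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)
Phi = np.vander(x, N=4, increasing=True)  # columns: 1, x, x^2, x^3

w_small = ridge_fit(Phi, y, lam=0.01)  # weak penalty: close to least squares
w_large = ridge_fit(Phi, y, lam=10.0)  # strong penalty: smaller, smoother weights
```

Setting `lam=0` recovers ordinary least squares, while a very large λ drives the non-bias weights towards zero and leaves the bias close to the mean of `y`, which is why the bias is excluded from the penalty.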
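
Regularized Loss Function
- The overall cost function is the sum of two parabolic bowls: the squared error $\|\mathbf{y} - \Phi\mathbf{w}\|^2$, whose minimum is the least-squares solution, and the penalty $\lambda\|\mathbf{w}\|^2$, whose minimum is the origin. The sum is also a parabolic bowl.
- The combined minimum lies between the minimum of the squared error and the origin: as λ increases, it moves from the least-squares solution towards the origin. It follows the straight line between the two exactly when $\Phi^T\Phi$ is a multiple of the identity; in general the path curves.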
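
To see how the combined minimum moves as λ grows, the sketch below traces the ridge solution over a few values of λ. It is an illustrative assumption-laden example: the helper `ridge_path`, the random data, and the chosen λ values are hypothetical, and, like the two-weight bowl picture, it uses no bias term. With an orthonormal design ($\Phi^T\Phi = I$) each solution is $\mathbf{w}_{\text{LS}}/(1+\lambda)$, a point on the straight line to the origin; with correlated features the path still starts at the least-squares minimum and shrinks towards the origin, but need not be straight.

```python
import numpy as np

def ridge_path(Phi, y, lambdas):
    """Minimiser of |y - Phi w|^2 + lam * |w|^2 for each lam (no bias term)."""
    d = Phi.shape[1]
    G = Phi.T @ Phi
    b = Phi.T @ y
    return np.array([np.linalg.solve(G + lam * np.eye(d), b) for lam in lambdas])

rng = np.random.default_rng(1)
y = rng.standard_normal(30)
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]

# Orthonormal design: Phi^T Phi = I, so each solution equals w_ols / (1 + lam),
# i.e. it lies on the straight line from the least-squares minimum to the origin.
Q, _ = np.linalg.qr(rng.standard_normal((30, 2)))
path_orth = ridge_path(Q, y, lambdas)
w_ols = path_orth[0]  # lam = 0 gives ordinary least squares

# Correlated design: the path runs from the least-squares minimum (lam = 0)
# towards the origin, but in general it bends instead of following a line.
X = rng.standard_normal((30, 2))
X[:, 1] = X[:, 0] + 0.3 * X[:, 1]  # make the two features strongly correlated
path_corr = ridge_path(X, y, lambdas)

for lam, w in zip(lambdas, path_corr):
    print(f"lam={lam:6.1f}  w={w}  |w|={np.linalg.norm(w):.4f}")
```

The printed norms never increase with λ, which matches the picture of the penalty bowl pulling the combined minimum towards the origin.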

