# Pattern Recognition and Machine Learning, Chapter 3: Linear Models for Regression

## Linear Basis Function Models (1)

Example: polynomial curve fitting.

## Linear Basis Function Models (2)

Generally,

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}),$$

where the $\phi_j(\mathbf{x})$ are known as basis functions. Typically $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias. In the simplest case, we use linear basis functions: $\phi_d(\mathbf{x}) = x_d$.
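To make this concrete, here is a minimal NumPy sketch (not from the slides; the inputs and degree are illustrative) that builds the design matrix for a polynomial basis, with the constant column acting as the bias:

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Build the N x (degree+1) design matrix with phi_j(x) = x**j.

    Column 0 is all ones (phi_0(x) = 1), so w_0 acts as the bias.
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

# Example: three inputs, cubic basis -> a 3 x 4 matrix.
Phi = polynomial_design_matrix([0.0, 1.0, 2.0], degree=3)
```

Each row of `Phi` holds the basis functions evaluated at one input, which is exactly the layout the least-squares formulas below expect.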

## Linear Basis Function Models (3)

Polynomial basis functions:

$$\phi_j(x) = x^j.$$

These are global: a small change in $x$ affects all basis functions.

## Linear Basis Function Models (4)

Gaussian basis functions:

$$\phi_j(x) = \exp\!\left(-\frac{(x - \mu_j)^2}{2s^2}\right).$$

These are local: a small change in $x$ only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).
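A minimal sketch of the Gaussian basis (the centres and width here are illustrative, not values from the slides):

```python
import numpy as np

def gaussian_basis(x, mus, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) for each centre mu_j."""
    x = np.asarray(x, dtype=float)[:, None]       # shape (N, 1)
    mus = np.asarray(mus, dtype=float)[None, :]   # shape (1, M)
    return np.exp(-(x - mus) ** 2 / (2.0 * s ** 2))

# Two centres, three inputs -> a 3 x 2 design matrix (no bias column here).
Phi = gaussian_basis([0.0, 0.5, 1.0], mus=[0.0, 1.0], s=0.5)
```

Each basis function peaks at its own centre $\mu_j$ and decays away from it, which is what makes the basis local.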

## Linear Basis Function Models (5)

Sigmoidal basis functions:

$$\phi_j(x) = \sigma\!\left(\frac{x - \mu_j}{s}\right), \quad \text{where} \quad \sigma(a) = \frac{1}{1 + \exp(-a)}.$$

These too are local: a small change in $x$ only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).

## Maximum Likelihood and Least Squares (1)

Assume observations from a deterministic function with added Gaussian noise:

$$t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1}),$$

which is the same as saying

$$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\!\left(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}\right).$$

Given observed inputs $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and targets $\mathbf{t} = [t_1, \ldots, t_N]^{\mathrm T}$, we obtain the likelihood function

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \mid \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}_n), \beta^{-1}\right).$$
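A matching sketch of the sigmoidal basis (again with illustrative centres and scale):

```python
import numpy as np

def sigmoid_basis(x, mus, s):
    """phi_j(x) = sigma((x - mu_j) / s), with sigma(a) = 1 / (1 + exp(-a))."""
    x = np.asarray(x, dtype=float)[:, None]       # shape (N, 1)
    mus = np.asarray(mus, dtype=float)[None, :]   # shape (1, M)
    return 1.0 / (1.0 + np.exp(-(x - mus) / s))

# Each basis function crosses 0.5 at its own centre mu_j;
# a small s gives a steep slope.
Phi = sigmoid_basis([0.0, 1.0], mus=[0.0, 1.0], s=0.1)
```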

## Maximum Likelihood and Least Squares (2)

Taking the logarithm, we get

$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}),$$

where

$$E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}_n)\right)^2$$

is the sum-of-squares error.

## Maximum Likelihood and Least Squares (3)

Computing the gradient and setting it to zero yields

$$\nabla \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}_n)\right)\boldsymbol\phi(\mathbf{x}_n)^{\mathrm T} = 0.$$

Solving for $\mathbf{w}$, we get

$$\mathbf{w}_{\mathrm{ML}} = \left(\boldsymbol\Phi^{\mathrm T}\boldsymbol\Phi\right)^{-1}\boldsymbol\Phi^{\mathrm T}\mathbf{t} = \boldsymbol\Phi^{\dagger}\mathbf{t},$$

where $\boldsymbol\Phi$ is the $N \times M$ design matrix with entries $\Phi_{nj} = \phi_j(\mathbf{x}_n)$, and $\boldsymbol\Phi^{\dagger} \equiv (\boldsymbol\Phi^{\mathrm T}\boldsymbol\Phi)^{-1}\boldsymbol\Phi^{\mathrm T}$ is the Moore-Penrose pseudo-inverse.
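The closed-form solution can be sketched with NumPy's pseudo-inverse on synthetic data (the linear target and noise level below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from t = 1 + 2x plus small Gaussian noise.
x = np.linspace(0.0, 1.0, 50)
t = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(50)

# Design matrix with phi_0(x) = 1 (bias) and phi_1(x) = x.
Phi = np.column_stack([np.ones_like(x), x])

# w_ML = pinv(Phi) @ t, using the Moore-Penrose pseudo-inverse.
w_ml = np.linalg.pinv(Phi) @ t
```

With this little noise, `w_ml` lands close to the generating coefficients (1, 2).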

## Geometry of Least Squares

Consider the $N$-dimensional space whose axes are the target values $t_n$. The $M$ columns $\boldsymbol\varphi_j$ of $\boldsymbol\Phi$ span an $M$-dimensional subspace $S$. $\mathbf{w}_{\mathrm{ML}}$ minimizes the distance between $\mathbf{t}$ and the subspace $S$; the resulting prediction vector $\mathbf{y} = \boldsymbol\Phi\mathbf{w}_{\mathrm{ML}}$ is the orthogonal projection of $\mathbf{t}$ onto $S$.

## Sequential Learning

Data items are considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent:

$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta\left(t_n - \mathbf{w}^{(\tau)\mathrm T}\boldsymbol\phi(\mathbf{x}_n)\right)\boldsymbol\phi(\mathbf{x}_n).$$

This is known as the least-mean-squares (LMS) algorithm. Issue: how to choose the learning rate $\eta$?
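The LMS update can be sketched as follows (synthetic data; the learning rate and epoch count are illustrative choices, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from t = 1 + 2x plus small Gaussian noise.
x = rng.uniform(0.0, 1.0, 200)
t = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(200)

w = np.zeros(2)
eta = 0.1  # learning rate: too large diverges, too small converges slowly
for epoch in range(50):
    for xn, tn in zip(x, t):
        phi = np.array([1.0, xn])              # basis vector (bias, x)
        w = w + eta * (tn - w @ phi) * phi     # LMS update for one item
```

Each update touches only one data item, so the method works in a streaming setting; after enough passes `w` approaches the batch least-squares solution.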

## Regularized Least Squares (1)

Consider the error function

$$E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}),$$

a data term plus a regularization term; $\lambda$ is called the regularization coefficient. With the sum-of-squares error function and a quadratic regularizer, we get

$$\frac{1}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}_n)\right)^2 + \frac{\lambda}{2}\mathbf{w}^{\mathrm T}\mathbf{w},$$

which is minimized by

$$\mathbf{w} = \left(\lambda\mathbf{I} + \boldsymbol\Phi^{\mathrm T}\boldsymbol\Phi\right)^{-1}\boldsymbol\Phi^{\mathrm T}\mathbf{t}.$$

## Regularized Least Squares (2)

With a more general regularizer, we have

$$\frac{1}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathrm T}\boldsymbol\phi(\mathbf{x}_n)\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{M}|w_j|^q.$$

The case $q = 1$ is known as the lasso; $q = 2$ recovers the quadratic regularizer.
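The regularized solution stays in closed form; a minimal sketch (synthetic data, illustrative λ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from t = 1 + 2x plus small Gaussian noise.
x = np.linspace(0.0, 1.0, 50)
t = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(50)
Phi = np.column_stack([np.ones_like(x), x])

lam = 1e-3                   # regularization coefficient lambda
M = Phi.shape[1]

# w = (lambda * I + Phi^T Phi)^{-1} Phi^T t
w_ridge = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
```

Adding $\lambda\mathbf{I}$ also makes the matrix being inverted better conditioned than $\boldsymbol\Phi^{\mathrm T}\boldsymbol\Phi$ alone, which is a practical side benefit of the quadratic regularizer.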

## Regularized Least Squares (3)

The lasso tends to generate sparser solutions than a quadratic regularizer: for sufficiently large $\lambda$, some coefficients $w_j$ are driven exactly to zero.
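To illustrate the sparsity effect, here is a sketch that minimizes the lasso objective by iterative soft-thresholding (ISTA). The algorithm, data, and λ are assumptions chosen for illustration, not material from the slides:

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def lasso_ista(Phi, t, lam, n_iter=2000):
    """Minimize 0.5 * ||t - Phi w||^2 + lam * ||w||_1 by ISTA.

    A sketch, not a tuned solver: gradient step on the data term,
    then soft-thresholding for the L1 term.
    """
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w + step * Phi.T @ (t - Phi @ w), step * lam)
    return w

rng = np.random.default_rng(3)
Phi = rng.standard_normal((100, 5))
true_w = np.array([2.0, 0.0, 0.0, -1.5, 0.0])   # sparse ground truth
t = Phi @ true_w + 0.01 * rng.standard_normal(100)

w_lasso = lasso_ista(Phi, t, lam=1.0)
```

The soft-threshold step sets small coefficients exactly to zero, which is precisely why the lasso yields sparse solutions while the quadratic regularizer only shrinks coefficients.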