PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 3: LINEAR MODELS FOR REGRESSION
Linear Basis Function Models (1)
Example: Polynomial Curve Fitting
Linear Basis Function Models (2)
Generally,
\[ y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}), \]
where the φ_j(x) are known as basis functions. Typically φ_0(x) = 1, so that w_0 acts as a bias. In the simplest case, we use linear basis functions: φ_d(x) = x_d.
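As a concrete illustration, here is a minimal NumPy sketch of this model; the function names (design_matrix, predict) and the particular choice of basis functions are illustrative, not taken from the slides.

    import numpy as np

    def design_matrix(X, basis_fns):
        """Build the N x M design matrix Phi with Phi[n, j] = phi_j(x_n)."""
        return np.column_stack([phi(X) for phi in basis_fns])

    def predict(X, w, basis_fns):
        """Evaluate y(x, w) = w^T phi(x) for every input in X."""
        return design_matrix(X, basis_fns) @ w

    # Example with M = 3 basis functions, phi_0(x) = 1 acting as the bias.
    basis_fns = [lambda x: np.ones_like(x),   # phi_0: bias
                 lambda x: x,                 # phi_1: identity
                 lambda x: x**2]              # phi_2: quadratic
    X = np.linspace(0.0, 1.0, 5)
    w = np.array([0.5, -1.0, 2.0])
    print(predict(X, w, basis_fns))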
Linear Basis Function Models (3)
Polynomial basis functions:
\[ \phi_j(x) = x^j. \]
These are global; a small change in x affects all basis functions.
Linear Basis Function Models (4)
Gaussian basis functions:
\[ \phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right). \]
These are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width).
Linear Basis Function Models (5)
Sigmoidal basis functions:
\[ \phi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right), \quad \text{where} \quad \sigma(a) = \frac{1}{1 + \exp(-a)}. \]
These too are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (slope).
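For reference, a short sketch of the three basis families from the preceding slides (polynomial, Gaussian, sigmoidal), assuming NumPy; the function names are my own.

    import numpy as np

    def poly_basis(x, j):
        """Polynomial basis: phi_j(x) = x^j (global)."""
        return x**j

    def gauss_basis(x, mu, s):
        """Gaussian basis: phi(x) = exp(-(x - mu)^2 / (2 s^2)) (local)."""
        return np.exp(-((x - mu)**2) / (2.0 * s**2))

    def sigmoid_basis(x, mu, s):
        """Sigmoidal basis: phi(x) = sigma((x - mu) / s), sigma(a) = 1 / (1 + exp(-a)) (local)."""
        return 1.0 / (1.0 + np.exp(-(x - mu) / s))

    x = np.linspace(-1.0, 1.0, 5)
    print(poly_basis(x, 2))
    print(gauss_basis(x, mu=0.0, s=0.2))
    print(sigmoid_basis(x, mu=0.0, s=0.2))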
Maximum Likelihood and Least Squares (1)
Assume observations from a deterministic function with added Gaussian noise:
\[ t = y(\mathbf{x}, \mathbf{w}) + \epsilon, \quad \text{where} \quad p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1}), \]
which is the same as saying
\[ p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\bigl( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \bigr). \]
Given observed inputs, X = {x_1, …, x_N}, and targets, t = (t_1, …, t_N)^T, we obtain the likelihood function
\[ p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\bigl( t_n \mid \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \bigr), \]
where y(x, w) = w^T φ(x).
Maximum Likelihood and Least Squares (2)
Taking the logarithm, we get
\[ \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}\bigl( t_n \mid \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \bigr) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w}), \]
where
\[ E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\}^2 \]
is the sum-of-squares error.
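To make the correspondence concrete, a small sketch (assuming a design matrix Phi, a target vector t and a noise precision beta; the helper names sum_of_squares_error and log_likelihood are mine) that evaluates E_D(w) and the log likelihood above.

    import numpy as np

    def sum_of_squares_error(w, Phi, t):
        """E_D(w) = 0.5 * sum_n (t_n - w^T phi(x_n))^2."""
        r = t - Phi @ w
        return 0.5 * r @ r

    def log_likelihood(w, beta, Phi, t):
        """ln p(t | w, beta) = N/2 ln(beta) - N/2 ln(2 pi) - beta * E_D(w)."""
        N = len(t)
        return 0.5 * N * np.log(beta) - 0.5 * N * np.log(2.0 * np.pi) \
               - beta * sum_of_squares_error(w, Phi, t)

Note that for fixed beta, maximizing the log likelihood with respect to w is equivalent to minimizing E_D(w).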
Maximum Likelihood and Least Squares (3)
Computing the gradient and setting it to zero yields
\[ \nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\} \boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}} = 0. \]
Solving for w, we get
\[ \mathbf{w}_{\mathrm{ML}} = \bigl( \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \bigr)^{-1} \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t} = \boldsymbol{\Phi}^{\dagger} \mathbf{t}, \]
where Φ† ≡ (Φ^T Φ)^{-1} Φ^T is the Moore-Penrose pseudo-inverse and Φ is the N × M design matrix with elements Φ_nj = φ_j(x_n).
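A minimal sketch of this closed-form solution; fit_ml is an illustrative name, and np.linalg.lstsq is used because it computes the same least-squares solution more stably than forming (Φ^T Φ)^{-1} explicitly.

    import numpy as np

    def fit_ml(Phi, t):
        """Maximum-likelihood weights: the least-squares solution w_ML = pinv(Phi) @ t."""
        w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
        return w_ml

    def fit_ml_normal_equations(Phi, t):
        """Direct (less numerically stable) form: w_ML = (Phi^T Phi)^{-1} Phi^T t."""
        return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

    # Toy usage: fit a cubic polynomial basis to noisy samples of sin(2 pi x).
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=30)
    Phi = np.column_stack([x**j for j in range(4)])
    t = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=30)
    print(fit_ml(Phi, t))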
Geometry of Least Squares
Consider the N-dimensional space in which t = (t_1, …, t_N)^T is a vector, and let S be the M-dimensional subspace spanned by the columns φ_j = (φ_j(x_1), …, φ_j(x_N))^T of the design matrix Φ. Then y = Φ w_ML is the orthogonal projection of t onto S; that is, w_ML minimizes the distance between t and y.
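This geometric picture is easy to verify numerically: the residual t − Φ w_ML must be orthogonal to every column of Φ. A small sketch, assuming NumPy (the variable names are mine):

    import numpy as np

    rng = np.random.default_rng(1)
    Phi = rng.normal(size=(20, 3))   # N = 20, M = 3
    t = rng.normal(size=20)

    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    y = Phi @ w_ml                   # orthogonal projection of t onto S

    # The residual is orthogonal to the subspace S spanned by the columns of Phi.
    print(np.allclose(Phi.T @ (t - y), 0.0, atol=1e-9))   # True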
Sequential Learning
Data items are considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent:
\[ \mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n = \mathbf{w}^{(\tau)} + \eta \bigl( t_n - \mathbf{w}^{(\tau)\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr) \boldsymbol{\phi}(\mathbf{x}_n). \]
This is known as the least-mean-squares (LMS) algorithm. Issue: how to choose the learning rate η?
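A minimal sketch of the LMS update, assuming NumPy; the step size eta must be chosen by hand, which is exactly the issue noted above, and the function name lms is mine.

    import numpy as np

    def lms(Phi, t, eta=0.05, n_epochs=50):
        """Least-mean-squares: sequential gradient descent on E_n = 0.5 * (t_n - w^T phi_n)^2."""
        N, M = Phi.shape
        w = np.zeros(M)
        for _ in range(n_epochs):
            for n in range(N):
                error = t[n] - w @ Phi[n]
                w = w + eta * error * Phi[n]   # w^(tau+1) = w^(tau) + eta * (t_n - w^T phi_n) * phi_n
        return w

If eta is too large the updates diverge; if it is too small, convergence is slow.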
Regularized Least Squares (1)
Consider the error function
\[ E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}), \]
i.e. a data term plus a regularization term, where λ is called the regularization coefficient. With the sum-of-squares error function and a quadratic regularizer, we get
\[ \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\}^2 + \frac{\lambda}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w}, \]
which is minimized by
\[ \mathbf{w} = \bigl( \lambda \mathbf{I} + \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \bigr)^{-1} \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t}. \]
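A minimal sketch of this closed-form solution (fit_ridge is an illustrative name). Note that written this way the bias weight w_0 is regularized along with the other weights.

    import numpy as np

    def fit_ridge(Phi, t, lam):
        """Regularized least squares: w = (lam * I + Phi^T Phi)^{-1} Phi^T t."""
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

    # lam -> 0 recovers the unregularized maximum-likelihood solution;
    # larger lam shrinks the weights towards zero.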
Regularized Least Squares (2)
With a more general regularizer, we have
\[ \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q, \]
where q = 1 gives the lasso and q = 2 recovers the quadratic regularizer.
Regularized Least Squares (3)
The lasso tends to generate sparser solutions than a quadratic regularizer: for a sufficiently large λ, some of the coefficients w_j are driven exactly to zero.
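The slides do not give an algorithm for the lasso case, so the sketch below uses proximal gradient descent (ISTA, a standard method that is not from the slides) to minimize E_D(w) + (λ/2)‖w‖₁, and compares the number of exact zeros with the quadratic (q = 2) closed-form solution. All function and variable names are my own.

    import numpy as np

    def soft_threshold(z, kappa):
        """Soft-thresholding: the proximal operator of kappa * ||.||_1."""
        return np.sign(z) * np.maximum(np.abs(z) - kappa, 0.0)

    def fit_lasso_ista(Phi, t, lam, n_iters=2000):
        """Minimize 0.5 * ||t - Phi w||^2 + (lam / 2) * ||w||_1 by proximal gradient (ISTA)."""
        step = 1.0 / np.linalg.norm(Phi, ord=2) ** 2   # 1 / Lipschitz constant of the gradient
        w = np.zeros(Phi.shape[1])
        for _ in range(n_iters):
            grad = Phi.T @ (Phi @ w - t)
            w = soft_threshold(w - step * grad, step * lam / 2.0)
        return w

    # Sparse ground truth: only two of ten weights are non-zero.
    rng = np.random.default_rng(2)
    Phi = rng.normal(size=(100, 10))
    w_true = np.zeros(10)
    w_true[[0, 3]] = [2.0, -1.5]
    t = Phi @ w_true + rng.normal(scale=0.1, size=100)

    lam = 5.0
    w_lasso = fit_lasso_ista(Phi, t, lam)
    w_ridge = np.linalg.solve(lam * np.eye(10) + Phi.T @ Phi, Phi.T @ t)
    print("exact zeros, lasso:    ", int(np.sum(np.abs(w_lasso) < 1e-8)))
    print("exact zeros, quadratic:", int(np.sum(np.abs(w_ridge) < 1e-8)))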