# CSE 6740 Lecture 17: What Loss Function Should I Use? II (Estimation Theory)

CSE 6740 Lecture 17: What Loss Function Should I Use? II (Estimation Theory)

Alexander Gray, Georgia Institute of Technology

CSE 6740 Lecture 17 – p. 1/3

## Today

1. Robustness ("How safe/stable is my loss function?")
2. Comparing Estimators ("How can I say one loss function is superior to another?")

## Robustness

We often choose a loss function according to mathematical or computational convenience. Otherwise, robustness is mostly what decides.

## Robustness

In the (approximate) words of [Huber, 1981], any statistical procedure should possess the following desirable features:

- It has reasonably good efficiency under the assumed model.
- It is robust in the sense that small deviations from the model assumptions should impair the performance only slightly.
- Somewhat larger deviations from the model should not cause a catastrophe.

## MLE vs. L2E

Let's revisit $L_2$ estimation (L2E), which we used for KDE. If $f$ is the true density and $\hat f_\theta$ is an estimate with parameters $\theta$, the $L_2$ error or $L_2$ distance is

$$
\begin{aligned}
L_2(\theta) &= \int \left(\hat f_\theta(x) - f(x)\right)^2 dx \qquad (1) \\
&= \int \hat f_\theta^2(x)\,dx - 2\int \hat f_\theta(x)\,f(x)\,dx + \int f^2(x)\,dx.
\end{aligned}
$$

Note that the third term does not depend on $\theta$, so it can be ignored for the purpose of comparing different estimators.
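
The expansion in (1) can be checked numerically. The sketch below is only an illustration under assumed settings that are not from the lecture: a one-dimensional standard normal plays the role of the true density $f$, a few hand-picked Gaussian candidates play the role of $\hat f_\theta$, and the helper names (`normal_pdf`, `integrate`, `l2_terms`) are hypothetical. It confirms that the three terms sum to the full $L_2$ distance and that dropping the third term leaves the ranking of candidates unchanged.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, lo=-12.0, hi=12.0, n=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

def true_f(x):
    # Assumed "true" density for this illustration: N(0, 1).
    return normal_pdf(x, 0.0, 1.0)

def l2_terms(mu, sigma):
    """Return (full L2 distance, term 1, term 2, term 3) from equation (1)."""
    fhat = lambda x: normal_pdf(x, mu, sigma)
    full = integrate(lambda x: (fhat(x) - true_f(x)) ** 2)
    t1 = integrate(lambda x: fhat(x) ** 2)
    t2 = -2.0 * integrate(lambda x: fhat(x) * true_f(x))
    t3 = integrate(lambda x: true_f(x) ** 2)  # does not depend on (mu, sigma)
    return full, t1, t2, t3

# Hypothetical candidate parameter settings (mu, sigma).
candidates = [(0.0, 1.0), (0.5, 1.0), (0.0, 2.0), (1.5, 0.7)]
results = {c: l2_terms(*c) for c in candidates}

# Ranking by the full distance and by the first two terms only.
rank_full = sorted(candidates, key=lambda c: results[c][0])
rank_partial = sorted(candidates, key=lambda c: results[c][1] + results[c][2])

for c in candidates:
    full, t1, t2, t3 = results[c]
    print(c, round(full, 6), round(t1 + t2 + t3, 6))
print(rank_full == rank_partial)
```

In practice $f$ is unknown, so the second term cannot be integrated directly; this check only makes sense because the true density was fixed by assumption.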

## MLE vs. L2E

Given a dataset $x_1, \dots, x_N$, we wish to find the parameters which minimize the $L_2$ risk. Dropping the constant third term of (1) and estimating the cross term $\int \hat f_\theta(x) f(x)\,dx = E_f[\hat f_\theta(X)]$ by a sample average gives

$$
E[L_2(\theta)] = \int \hat f_\theta^2(x)\,dx - \frac{2}{N}\sum_{i=1}^{N} \hat f_\theta(x_i). \qquad (2)
$$

The term $\int \hat f_\theta^2(x)\,dx$ can be thought of as a kind of built-in regularization term, which acts to penalize spikes or overly large densities (due to, say, overlapped components in a mixture), and the second term as a goodness-of-fit term.
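
As a rough illustration of criterion (2), the sketch below fits the mean of a single one-dimensional Gaussian with known $\sigma = 1$, for which the regularization term has the closed form $\int \hat f_\theta^2(x)\,dx = 1/(2\sigma\sqrt{\pi})$. Everything specific here is an assumption made for the example rather than something taken from the lecture: the synthetic data (evenly spaced normal quantiles plus one outlier at 1000), the grid search, and the helper name `l2e_criterion`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def l2e_criterion(data, mu, sigma):
    """Equation (2) for a single Gaussian N(mu, sigma^2)."""
    reg = 1.0 / (2.0 * sigma * math.sqrt(math.pi))        # integral of fhat^2
    fit = sum(normal_pdf(x, mu, sigma) for x in data) / len(data)
    return reg - 2.0 * fit

# Deterministic stand-in for N(0, 1) data: 200 evenly spaced quantiles.
n_inliers = 200
std_normal = NormalDist()
inliers = [std_normal.inv_cdf((i + 0.5) / n_inliers) for i in range(n_inliers)]
data = inliers + [1000.0]                                 # one extreme outlier

# For a Gaussian with known variance, the MLE of the mean is the sample mean.
mu_mle = sum(data) / len(data)

# L2E by brute-force grid search over mu in [-2, 8], with sigma fixed at 1.
grid = [-2.0 + 0.05 * k for k in range(201)]
mu_l2e = min(grid, key=lambda mu: l2e_criterion(data, mu, 1.0))

print("MLE mean:", round(mu_mle, 3))
print("L2E mean:", round(mu_l2e, 3))
```

The single outlier drags the sample mean far from the bulk of the data, while its contribution to the goodness-of-fit term in (2) is essentially zero, so the L2E estimate stays near the inliers. A real implementation would optimize over $\sigma$ as well and use a proper optimizer instead of a grid.
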
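## MLE vs. L2E

Let's do L2E for a mixture of Gaussians

$$
\hat f_\theta(x) = \sum_{k=1}^{K} \omega_k\, \phi(x \mid \mu_k, \Sigma_k). \qquad (3)
$$

The L2E regularization term for a mixture of Gaussians is

$$
\int \hat f_\theta^2(x)\,dx = \sum_{k=1}^{K} \sum_{j=1}^{K} \omega_k\, \omega_j\, \phi(\mu_j \mid \mu_k, \Sigma_k + \Sigma_j). \qquad (4)
$$

The expression $\phi(\mu_j \mid \mu_k, \Sigma_k + \Sigma_j)$ comes from the identity

$$
\phi(x \mid \mu_k, \Sigma_k)\, \phi(x \mid \mu_j, \Sigma_j) = \phi(\mu_j \mid \mu_k, \Sigma_k + \Sigma_j)\, \phi(x \mid \mu_{k,j}, \Sigma_{k,j}),
$$

where $\Sigma_{k,j} = (\Sigma_k^{-1} + \Sigma_j^{-1})^{-1}$ and $\mu_{k,j} = \Sigma_{k,j}(\Sigma_k^{-1}\mu_k + \Sigma_j^{-1}\mu_j)$. The first factor on the right does not depend on $x$ and the second integrates to 1, so integrating the product over $x$ leaves only $\phi(\mu_j \mid \mu_k, \Sigma_k + \Sigma_j)$. This and other properties of Gaussians make the integral tractable.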
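
A minimal one-dimensional check of the closed form (4), where each $\Sigma_k$ reduces to a scalar variance. The mixture parameters and the helper names (`mixture_sq_integral_closed_form`, `mixture_sq_integral_numeric`) are made up for the illustration; the numeric value comes from a plain trapezoid rule, not from anything in the lecture.

```python
import math

def normal_pdf_var(x, mu, var):
    """Density of N(mu, var) at x, parameterized by the variance."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Equation (3) in one dimension."""
    return sum(w * normal_pdf_var(x, m, v)
               for w, m, v in zip(weights, means, variances))

def mixture_sq_integral_closed_form(weights, means, variances):
    """Equation (4): double sum over all pairs of components."""
    total = 0.0
    for wk, mk, vk in zip(weights, means, variances):
        for wj, mj, vj in zip(weights, means, variances):
            total += wk * wj * normal_pdf_var(mj, mk, vk + vj)
    return total

def mixture_sq_integral_numeric(weights, means, variances,
                                lo=-15.0, hi=15.0, n=30000):
    """Trapezoid-rule approximation of the integral of fhat^2."""
    h = (hi - lo) / n
    g = lambda x: mixture_pdf(x, weights, means, variances) ** 2
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h

# Hypothetical three-component mixture.
weights = [0.3, 0.5, 0.2]
means = [-2.0, 0.5, 3.0]
variances = [0.25, 1.0, 0.64]

closed = mixture_sq_integral_closed_form(weights, means, variances)
numeric = mixture_sq_integral_numeric(weights, means, variances)
print(closed, numeric)
```

The closed form costs $O(K^2)$ density evaluations and needs no numerical integration, which is what makes the regularization term in (2) cheap to evaluate for Gaussian mixtures.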

## MLE vs. L2E

[Figure: MLE fits compared with L2E fits in four settings: One Extreme Outlier, 40% Uniform Noise, 40% Mixed Gaussian Noise, and Quasar Data (Stars as Noise).]

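## Robustness

Let $X \sim N(\mu, \sigma^2)$. The value which minimizes squared-error, or $L_2$ loss, $\arg\min_\theta E(X - \theta)^2$, is the mean of $X$:

$$
\frac{d}{d\theta} E(X - \theta)^2 = 0 \;\Rightarrow\; \theta = E X. \qquad (5)
$$

The value which minimizes absolute, or $L_1$ loss, $\arg\min_\theta E|X - \theta|$, is the median:

$$
\frac{d}{d\theta} E|X - \theta| = 0 \;\Rightarrow\; \theta = m, \qquad (6)
$$

where $m$ is the median of $X$, i.e. the point with $P(X \le m) = 1/2$.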
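
The two minimizers in (5) and (6) can be seen on a small sample by replacing the expectations with sample averages. The five data points and the grid of candidate values below are assumptions chosen for the illustration (the sample is deliberately not Gaussian, so that one outlier makes the contrast visible).

```python
import statistics

# Hypothetical sample: four ordinary points and one outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

def l2_loss(theta):
    """Empirical squared-error loss: sum of (x - theta)^2."""
    return sum((x - theta) ** 2 for x in data)

def l1_loss(theta):
    """Empirical absolute loss: sum of |x - theta|."""
    return sum(abs(x - theta) for x in data)

# Brute-force search over a grid of candidate values 0.0, 0.5, ..., 100.0.
grid = [0.5 * k for k in range(201)]
theta_l2 = min(grid, key=l2_loss)
theta_l1 = min(grid, key=l1_loss)

print("L2 minimizer:", theta_l2, "sample mean:", statistics.mean(data))
print("L1 minimizer:", theta_l1, "sample median:", statistics.median(data))
```

The $L_2$ minimizer coincides with the sample mean and is pulled far toward the outlier, while the $L_1$ minimizer coincides with the sample median and stays with the bulk of the data, which is the sense in which absolute loss is the more robust choice here.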
