Stat841f09 - Wiki Course Notes

# Mercer's theorem


## Mercer's Theorem in detail

Let $\phi$ be a mapping to a high-dimensional Hilbert space (http://en.wikipedia.org/wiki/Hilbert_space) $H$:

$$\phi : x \in \mathbb{R}^d \mapsto \phi(x) \in H$$

The transformed coordinates can be defined as

$$\phi_1(x), \phi_2(x), \dots$$

By Hilbert-Schmidt theory we can represent an inner product in Hilbert space as

$$\phi(x_i)^T \phi(x_j) = \sum_{r=1}^{\infty} a_r \, \phi_r(x_i) \, \phi_r(x_j) = K(x_i, x_j), \qquad a_r \ge 0,$$

where $K$ is symmetric. Mercer's theorem then gives necessary and sufficient conditions on $K$ for it to satisfy the above relation.

Mercer's Theorem (http://en.wikipedia.org/wiki/Mercer%27s_theorem)

Let $C$ be a compact subset of $\mathbb{R}^d$ and $K$ a function in $L^2(C \times C)$. If

$$\int_C \int_C K(u, v) \, g(u) \, g(v) \, du \, dv \ge 0 \qquad \text{for all } g \in L^2(C),$$

then

$$\sum_{r=1}^{\infty} a_r \, \phi_r(u) \, \phi_r(v)$$

converges absolutely and uniformly to a symmetric function $K(u, v)$.

References:

Vapnik, V., 1998. Statistical Learning Theory. John Wiley & Sons, p. 423.

Mercer, J., 1909. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A 209:415-446.

## Kernel Functions

There are various kernel functions, for example:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$

Polynomial kernel: $K(x_i, x_j) = (1 + x_i^T x_j)^p$

Gaussian kernel: $K(x_i, x_j) = \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right)$

If $X$ is a $d \times n$ matrix in the original space, and $\phi(X)$ is a $D \times n$ matrix in the Hilbert space (http://en.wikipedia.org/wiki/Hilbert_space) (good explanation video: part 1 (http://www.youtube.com/watch?v=V2pBdH7YzX0) part 2 (http://www.youtube.com/watch?v=YRY5xlk3TC0)), then $K = \phi(X)^T \phi(X)$ is an $n \times n$ matrix.

The inner product can also be interpreted as a correlation, which measures the similarity between data points. This gives us some insight into how to choose the kernel. The choice depends on prior knowledge of the problem and on how we believe the similarity of our data should be measured. In practice, the Gaussian (RBF) kernel is a common default and often works well. Besides the common kernel functions mentioned above, many novel kernels have been suggested for specific problem domains such as text classification and gene classification.

These kernel functions can be applied to many algorithms to derive a "kernel version" of each, for example kernel PCA and kernel LDA.

## SVM: non-separable case

We have seen how SVMs are able to f...
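Returning to the linear, polynomial, and Gaussian kernels named above, the following is a minimal sketch of how the $n \times n$ kernel matrix might be built and checked numerically. It is an illustration under stated assumptions, not the notes' own code: the function names (`linear_kernel`, `polynomial_kernel`, `gaussian_kernel`, `gram_matrix`, `is_symmetric_psd`) are hypothetical, the polynomial kernel is assumed to have the form $(1 + x^T y)^p$, the Gaussian kernel is assumed to use bandwidth $\sigma$, and data points are assumed to be stored as the columns of a $d \times n$ matrix. The positive semi-definiteness check is only a finite-sample analogue of Mercer's integral condition.

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x^T y
    return float(np.dot(x, y))

def polynomial_kernel(x, y, degree=2):
    # Assumed form: K(x, y) = (1 + x^T y)^degree
    return float((1.0 + np.dot(x, y)) ** degree)

def gaussian_kernel(x, y, sigma=1.0):
    # Assumed form: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    diff = x - y
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def gram_matrix(X, kernel):
    """X is d x n (one data point per column); returns the n x n matrix K."""
    n = X.shape[1]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[:, i], X[:, j])
    return K

def is_symmetric_psd(K, tol=1e-8):
    # Finite-sample analogue of Mercer's condition:
    # K must be symmetric with no eigenvalue below -tol.
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))  # d = 3 features, n = 5 points (synthetic data)

K_lin = gram_matrix(X, linear_kernel)
K_poly = gram_matrix(X, polynomial_kernel)
K_rbf = gram_matrix(X, gaussian_kernel)

# A hypothetical counter-example: the negative squared distance is symmetric
# but does not give a positive semi-definite matrix.
K_bad = gram_matrix(X, lambda x, y: -float(np.dot(x - y, x - y)))

for name, K in [("linear", K_lin), ("polynomial", K_poly),
                ("gaussian", K_rbf), ("neg. sq. distance", K_bad)]:
    print(name, K.shape, is_symmetric_psd(K))
```

For the linear kernel the feature map is the identity, so `K_lin` coincides with `X.T @ X`. The other two kernels never form $\phi(X)$ explicitly, which is the point of working with $K$ directly.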

## This document was uploaded on 03/07/2014.
