Manifold Regularization
9.520 Class 06, 27 February 2006
Andrea Caponnetto

About this class

Goal: To analyze the limits of learning from examples in high dimensional spaces. To introduce the semi-supervised setting and the use of unlabeled data to learn the intrinsic geometry of a problem. To define Riemannian manifolds, manifold Laplacians, and graph Laplacians. To introduce a new class of algorithms based on manifold regularization (LapRLS, LapSVM).

Unlabeled data

Why use unlabeled data?
- Labeling is often an expensive process.
- Semi-supervised learning is the natural setting for human learning.

Semi-supervised setting

We observe u i.i.d. samples {x_1, x_2, ..., x_u} drawn on X from the marginal distribution p(x); only n of them are endowed with labels {y_1, y_2, ..., y_n} drawn from the conditional distributions p(y|x). The extra u - n unlabeled samples give additional information about the marginal distribution p(x).

Curse of dimensionality and p(x)

Assume X is the D-dimensional hypercube [0,1]^D. The worst-case scenario corresponds to the uniform marginal distribution p(x). Two perspectives on the curse of dimensionality:
- As D increases, local techniques (e.g. nearest neighbors) rapidly become ineffective.
- Minimax results show that the rates of convergence of empirical estimators to optimal solutions of known smoothness depend critically on D.

Curse of dimensionality and kNN

It would seem that with a reasonably large set of training data, we could always approximate the conditional expectation by k-nearest-neighbor averaging: we should be able to find a fairly large set of observations close to any x in [0,1]^D and average them. This approach, and our intuition, breaks down in high dimensions.

Sparse sampling in high dimension

Suppose we send out a cubical neighborhood about one vertex to capture a fraction r of the observations.
Since this corresponds to a fraction r of the unit volume, the expected edge length is

  e_D(r) = r^(1/D).

Already in ten dimensions, e_10(0.01) = 0.63: to capture just 1% of the data, we must cover 63% of the range of each input variable. No more local neighborhoods!

Distance vs volume in high dimensions

[Figure: fraction of volume captured as a function of edge distance, for dimensions p = 1, 2, 3, 10.]

Curse of dimensionality and smoothness

Assume that the target function f (in the squared-loss case) belongs to the Sobolev space

  W_2^s([0,1]^D) = { f in L_2([0,1]^D) : ∫ ||ω||^(2s) |f̂(ω)|^2 dω < +∞ }.

Then it is possible to show the minimax lower bound

  sup_{f in W_2^s} E_S ( I[f_S] - I[f] ) ≥ C n^(-2s/(2s+D)).

- More smoothness s ⇒ faster rate of convergence.
- Higher dimension D ⇒ slower rate of convergence.

(A Distribution-Free Theory of Nonparametric Regression, Györfi et al.)

Intrinsic dimensionality

Raw format of natural data is often high dimensional, but in many cases...
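The sparse-sampling calculation above is easy to check numerically. The sketch below (function names are my own, not from the notes) evaluates the expected edge length e_D(r) = r^(1/D) for several dimensions, reproducing the e_10(0.01) = 0.63 figure quoted above:

```python
import math

def edge_length(r: float, D: int) -> float:
    """Expected edge length e_D(r) = r**(1/D) of a cubical
    neighborhood that captures a fraction r of points drawn
    uniformly from the hypercube [0, 1]**D."""
    return r ** (1.0 / D)

# To capture 1% of the data in 10 dimensions, the cube must
# span about 63% of the range of every coordinate:
print(f"e_10(0.01) = {edge_length(0.01, 10):.2f}")  # 0.63

# The neighborhood stops being "local" very quickly as D grows:
for D in (1, 2, 3, 10, 100):
    print(f"D = {D:3d}: e_D(0.01) = {edge_length(0.01, D):.3f}")
```

For D = 100 the edge length already exceeds 0.95, so a "neighborhood" holding 1% of the data spans nearly the entire range of each coordinate.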
This note was uploaded on 11/11/2011 for the course BIO 9.07 taught by Professor Ruth Rosenholtz during the Spring '04 term at MIT.