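CS142: Machine Learning
Spring 2017
Lecture 17
Instructor: Pedro Felzenszwalb
Scribes: Dan Xiang, Tyler Dae Devlin

Dimensionality Reduction

Let $x_1, \dots, x_n \in \mathbb{R}^D$. We tried to estimate a density $p(x)$ using histograms, but the number of bins grows on the order of $C^D$ for some constant $C$ (for example, $C$ bins along each of the $D$ axes). For parametric estimation, if we use a multivariate Gaussian model $\mathcal{N}(x \mid \mu, \Sigma)$, then we must estimate the parameters $\mu \in \mathbb{R}^D$ and $\Sigma \in \mathbb{R}^{D \times D}$. Thus, the number of parameters we need to estimate gets very large as $D$ grows.

Linear Projections

Let $\phi : \mathbb{R}^D \to \mathbb{R}^m$, where $m \ll D$.
1. We would like $\phi$ to "preserve information".
2. If $y_i = \phi(x_i)$, we want $y_i$ to be a good approximation of $x_i$, in the sense that $x_i$ can be approximately reconstructed from $y_i$.

Affine Subspaces

Let $u_1, \dots, u_M \in \mathbb{R}^D$ be orthonormal vectors, i.e. $\|u_i\| = 1$ and $u_i^T u_j = 0$ for $i \neq j$. An affine subspace $A$ is the set generated by linear combinations of the vectors $\{u_i\}_{i=1}^M$ plus an offset $u_0 \in \mathbb{R}^D$, i.e. the $M$-dimensional flat

$$A \doteq \{u_0 + a_1 u_1 + \cdots + a_M u_M : a_i \in \mathbb{R}\}.$$

When $M = D - 1$, $A$ is a hyperplane, and an equivalent definition is the set of points $x \in \mathbb{R}^D$ satisfying $x^T w = C$ for some constant $C$, where $w \neq 0$ is orthogonal to $u_1, \dots, u_M$ and $C = w^T u_0$. If $u_0$ is chosen as the point of $A$ closest to the origin (so that $u_0$ is orthogonal to every $u_i$) and $u_0 \neq 0$, one can take $w = u_0$, which gives the equation $x^T u_0 = C$ with $C = \|u_0\|^2$.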
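To make the growth rates in the Dimensionality Reduction paragraph concrete, the sketch below counts histogram bins and Gaussian parameters for a few values of $D$. It assumes a hypothetical choice of $C = 10$ bins per axis, and the helper names are illustrative. It also uses the fact that a covariance matrix is symmetric, so only $D(D+1)/2$ of the $D^2$ entries of $\Sigma$ are free.

```python
def histogram_bins(C: int, D: int) -> int:
    """Bins in a histogram that splits each of the D axes into C intervals."""
    return C ** D


def gaussian_params(D: int, symmetric: bool = True) -> int:
    """Free parameters of N(x | mu, Sigma) in D dimensions.

    mu contributes D numbers. Sigma is D x D, but because a covariance
    matrix is symmetric only D * (D + 1) // 2 of its entries are free.
    Pass symmetric=False to count all D * D entries instead.
    """
    cov = D * (D + 1) // 2 if symmetric else D * D
    return D + cov


# Hypothetical choice: C = 10 bins per axis.
for D in (1, 2, 10, 100):
    print(f"D={D:>3}  bins={histogram_bins(10, D):.3g}  gaussian={gaussian_params(D)}")
```

For $D = 100$ this gives $10^{100}$ bins but $100 + 5050 = 5150$ Gaussian parameters: both counts grow with $D$, exponentially for the histogram and quadratically for the Gaussian.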

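One natural way to realize the map $\phi$ from the Linear Projections paragraph, assuming (this is not stated above) that $\phi$ is the orthogonal projection onto an affine subspace $A$ with orthonormal $u_1, \dots, u_M$, is to take coordinates $y_j = u_j^T (x - u_0)$ and reconstruct $\hat{x} = u_0 + \sum_j y_j u_j$. The pure-Python sketch below does this; the function names `project` and `reconstruct` and the example plane are hypothetical. With a nonzero offset $u_0$ the map is affine rather than strictly linear.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def project(x: Vector, u0: Vector, basis: List[Vector]) -> Vector:
    """phi(x): coordinates of x - u0 along each orthonormal u_j (an M-vector)."""
    d = sub(x, u0)
    return [dot(u, d) for u in basis]


def reconstruct(y: Vector, u0: Vector, basis: List[Vector]) -> Vector:
    """Map coordinates back to the point u0 + y_1 u_1 + ... + y_M u_M of A."""
    x_hat = list(u0)
    for coeff, u in zip(y, basis):
        x_hat = [xi + coeff * ui for xi, ui in zip(x_hat, u)]
    return x_hat


# Hypothetical example in R^3 with M = 2: A is the plane z = 1.
u0 = [0.0, 0.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [2.0, -3.0, 5.0]
y = project(x, u0, basis)          # [2.0, -3.0]
x_hat = reconstruct(y, u0, basis)  # [2.0, -3.0, 1.0]
```

Here the residual $x - \hat{x} = (0, 0, 4)$ is orthogonal to both basis vectors, which is what makes $\hat{x}$ the closest point of $A$ to $x$.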

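The hyperplane characterization in the Affine Subspaces paragraph can be checked numerically. The sketch below assumes $D = 3$, $M = 2$, and an offset $u_0$ chosen orthogonal to the basis (the point of $A$ nearest the origin); the specific vectors and the helper names `is_orthonormal` and `point_in_A` are hypothetical. It samples points $u_0 + a_1 u_1 + a_2 u_2$ and confirms that each satisfies $x^T u_0 = \|u_0\|^2$. For $M < D - 1$ a single linear equation is not enough; an $M$-dimensional affine subspace needs $D - M$ independent equations.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def is_orthonormal(vectors: List[Vector], tol: float = 1e-12) -> bool:
    """Check ||u_i|| = 1 and u_i^T u_j = 0 for i != j, up to tol."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            target = 1.0 if i == j else 0.0
            if abs(dot(u, v) - target) > tol:
                return False
    return True


def point_in_A(coeffs: Vector, u0: Vector, basis: List[Vector]) -> Vector:
    """Return u0 + a_1 u_1 + ... + a_M u_M."""
    x = list(u0)
    for a, u in zip(coeffs, basis):
        x = [xi + a * ui for xi, ui in zip(x, u)]
    return x


# Hypothetical hyperplane in R^3 (D = 3, M = 2).
s = 2 ** -0.5
basis = [[s, -s, 0.0], [0.0, 0.0, 1.0]]
u0 = [1.0, 1.0, 0.0]  # orthogonal to both basis vectors
assert is_orthonormal(basis)
assert all(abs(dot(u0, u)) < 1e-12 for u in basis)

C = dot(u0, u0)  # C = ||u0||^2
rng = random.Random(0)
for _ in range(5):
    a = [rng.uniform(-10.0, 10.0) for _ in basis]
    x = point_in_A(a, u0, basis)
    # Every sampled point of A satisfies the single equation x^T u0 = C.
    assert abs(dot(x, u0) - C) < 1e-9
```

A point off the plane, such as $(0, 0, 5)$, gives $x^T u_0 = 0 \neq 2$, so the equation also rules out points outside $A$.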