CS229 Lecture notes
Andrew Ng

Part XI
Principal components analysis

In our discussion of factor analysis, we gave a way to model data x ∈ R^n as "approximately" lying in some k-dimensional subspace, where k ≪ n. Specifically, we imagined that each point x^(i) was created by first generating some z^(i) lying in the k-dimensional affine space {Λz + μ ; z ∈ R^k}, and then adding Ψ-covariance noise. Factor analysis is based on a probabilistic model, and parameter estimation used the iterative EM algorithm.

In this set of notes, we will develop a method, Principal Components Analysis (PCA), that also tries to identify the subspace in which the data approximately lies. However, PCA will do so more directly: it requires only an eigenvector calculation (easily done with the eig function in Matlab) and does not need to resort to EM.

Suppose we are given a dataset {x^(i); i = 1, ..., m} of attributes of m different types of automobiles, such as their maximum speed, turn radius, and so on. Let x^(i) ∈ R^n for each i (n ≪ m). But, unknown to us, two different attributes, some x_j and x_k, respectively give a car's maximum speed measured in miles per hour and the same maximum speed measured in kilometers per hour. These two attributes are therefore almost linearly dependent, up to only small differences introduced by rounding off to the nearest mph or kph. Thus, the data really lies approximately on an (n - 1)-dimensional subspace. How can we automatically detect, and perhaps remove, this redundancy?

For a less contrived example, consider a dataset resulting from a survey of pilots for radio-controlled helicopters, where x_1^(i) is a measure of the piloting skill of pilot i, and x_2^(i) captures how much he/she enjoys flying. Because RC helicopters are very difficult to fly, only the most committed students, ones that truly enjoy flying, become good pilots. So, the two attributes x_1 and x_2 are strongly correlated.
Indeed, we might posit that the data actually lies along some diagonal axis (the u_1 direction) capturing the intrinsic piloting "karma" of a person, with only a small amount of noise lying off this axis. (See figure.) How can we automatically compute this u_1 direction?
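The eigenvector calculation the notes allude to can be previewed concretely. Below is a minimal pure-Python sketch, using made-up data mimicking the pilot example (the dataset, noise levels, and use of power iteration rather than Matlab's eig are all assumptions for illustration): it estimates the empirical covariance of 2-D points that cluster along a diagonal, then finds the top eigenvector, which plays the role of u_1.

```python
import math
import random

random.seed(0)

# Hypothetical correlated data: skill (x1) and enjoyment (x2) both track a
# latent "karma" t, plus small independent noise off the diagonal axis.
latent = [random.uniform(0, 1) for _ in range(200)]
data = [(t + random.gauss(0, 0.1), t + random.gauss(0, 0.1)) for t in latent]

# Center the data at its mean.
m = len(data)
mu = [sum(p[j] for p in data) / m for j in range(2)]
centered = [(p[0] - mu[0], p[1] - mu[1]) for p in data]

# Empirical covariance matrix Sigma = (1/m) * sum_i x^(i) x^(i)^T.
S = [[sum(c[i] * c[j] for c in centered) / m for j in range(2)]
     for i in range(2)]

# Power iteration: repeatedly apply Sigma and renormalize. This converges
# to the eigenvector with the largest eigenvalue, i.e. the direction of
# maximum variance -- the principal direction u_1.
u = [1.0, 0.0]
for _ in range(100):
    v = [S[0][0] * u[0] + S[0][1] * u[1],
         S[1][0] * u[0] + S[1][1] * u[1]]
    norm = math.hypot(v[0], v[1])
    u = [v[0] / norm, v[1] / norm]

print(u)  # a unit vector close to the diagonal direction (1/sqrt(2), 1/sqrt(2))
```

Because the shared latent variable dominates the off-axis noise, the recovered u points close to the diagonal, exactly the "karma" axis described above. The notes will develop why the top covariance eigenvector is the right answer in the sections that follow.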