CS229 Lecture notes
Andrew Ng

Part XI
Principal components analysis

In our discussion of factor analysis, we gave a way to model data $x \in \mathbb{R}^n$ as "approximately" lying in some $k$-dimensional subspace, where $k \ll n$. Specifically, we imagined that each point $x^{(i)}$ was created by first generating some $z^{(i)}$ lying in the $k$-dimensional affine space $\{\Lambda z + \mu \,;\, z \in \mathbb{R}^k\}$, and then adding $\Psi$-covariance noise. Factor analysis is based on a probabilistic model, and parameter estimation used the iterative EM algorithm.

In this set of notes, we will develop a method, Principal Components Analysis (PCA), that also tries to identify the subspace in which the data approximately lies. However, PCA will do so more directly, and will require only an eigenvector calculation (easily done with the eig function in Matlab), and does not need to resort to EM.

Suppose we are given a dataset $\{x^{(i)}; i = 1, \ldots, m\}$ of attributes of $m$ different types of automobiles, such as their maximum speed, turn radius, and so on. Let $x^{(i)} \in \mathbb{R}^n$ for each $i$ ($n \ll m$). But unknown to us, two different attributes, some $x_i$ and $x_j$, respectively give a car's maximum speed measured in miles per hour, and the maximum speed measured in kilometers per hour. These two attributes are therefore almost linearly dependent, up to only small differences introduced by rounding off to the nearest mph or kph. Thus, the data really lies approximately on an $(n-1)$-dimensional subspace. How can we automatically detect, and perhaps remove, this redundancy?
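As a quick numerical check of this intuition, here is a minimal sketch (not part of the original notes) of how such redundancy shows up: when two attributes are nearly linearly dependent, the empirical covariance matrix of the data has an eigenvalue close to zero. The sketch uses numpy in place of Matlab's eig, and the toy car data and kph conversion are made up for illustration.

```python
import numpy as np

# Toy illustration (assumed data, not from the notes): m cars, n = 3 attributes.
# Attribute 0 is max speed in mph, attribute 1 is the same speed in kph
# (rounded to the nearest kph), attribute 2 is an unrelated turn radius.
rng = np.random.default_rng(0)
m = 500
mph = rng.uniform(80, 160, size=m)
kph = np.round(mph * 1.609344)          # almost linearly dependent on mph
turn_radius = rng.uniform(4, 8, size=m)
X = np.column_stack([mph, kph, turn_radius])

# Empirical covariance of the mean-centered data.
Xc = X - X.mean(axis=0)
Sigma = (Xc.T @ Xc) / m

# One eigenvalue is nearly zero compared to the others: the data lies close
# to a 2-dimensional, i.e. (n-1)-dimensional, subspace because mph and kph
# are redundant up to rounding.
eigvals = np.linalg.eigvalsh(Sigma)      # ascending order
print(eigvals)
```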

For a less contrived example, consider a dataset resulting from a survey of pilots for radio-controlled helicopters, where $x_1^{(i)}$ is a measure of the piloting skill of pilot $i$, and $x_2^{(i)}$ captures how much he/she enjoys flying. Because RC helicopters are very difficult to fly, only the most committed students, ones that truly enjoy flying, become good pilots. So, the two attributes $x_1$ and $x_2$ are strongly correlated. Indeed, we might posit that the data actually lies along some diagonal axis (the $u_1$ direction) capturing the intrinsic piloting "karma" of a person, with only a small amount of noise lying off this axis. (See figure.) How can we automatically compute this $u_1$ direction?
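The remainder of the notes derive how PCA answers this question. As a preview, and as a minimal sketch that is not part of the original notes, $u_1$ can be computed as the eigenvector with the largest eigenvalue of the empirical covariance matrix of the centered data. The toy "pilot" data below is assumed purely for illustration, and numpy's eigh stands in for Matlab's eig.

```python
import numpy as np

# Toy 2-D pilot data (assumed, not from the notes): x1 = piloting skill,
# x2 = enjoyment of flying, both driven by a shared latent "karma".
rng = np.random.default_rng(1)
m = 300
karma = rng.normal(size=m)
x1 = karma + 0.1 * rng.normal(size=m)    # skill
x2 = karma + 0.1 * rng.normal(size=m)    # enjoyment
X = np.column_stack([x1, x2])

# Center the data and form the empirical covariance matrix Sigma.
Xc = X - X.mean(axis=0)
Sigma = (Xc.T @ Xc) / m

# u_1 is the eigenvector of Sigma with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
u1 = eigvecs[:, -1]
print("u_1 =", u1)   # close to the diagonal direction [0.707, 0.707], up to sign
```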