probably not make sense to use all D of the principal components, as that would project the original D-dimensional data instances right back onto a D-dimensional subspace, meaning there would be no dimensionality reduction at all. For visualization on a flat 2-D surface such as a computer screen, choosing K to be 2 or 3 is appropriate. When the goal is instead to reduce the number of features, shrinking the amount of data and improving the running time of a machine learning training algorithm, the user of the PCA software must make a choice for K. On the one hand, a very small K is desirable because it reduces the amount of data; on the other hand, if too many dimensions are removed, the data may no longer capture important details. In the context of PCA, the notion of capturing details is quantified by the amount of total variance explained by the selected principal components. Recall that when the original data points are projected onto the axis defined by a principal component, the variance of the projected data can be computed. The amount of total variance explained is then the sum of these variances over the K selected principal components.
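The variance-explained criterion above can be sketched in NumPy. The key fact is that the variance of the data projected onto a principal component equals the corresponding eigenvalue of the covariance matrix, so the fraction of total variance explained by the top K components is the ratio of the top-K eigenvalue sum to the total eigenvalue sum. The function names and the 95% threshold below are illustrative choices, not part of the original text:

```python
import numpy as np

def explained_variance(X, K):
    """Fraction of total variance explained by the top K
    principal components of the data matrix X (N x D)."""
    Xc = X - X.mean(axis=0)                     # center the data
    cov = np.cov(Xc, rowvar=False)              # D x D covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]     # eigenvalues, descending
    # the variance of the projection onto each component is its eigenvalue
    return eigvals[:K].sum() / eigvals.sum()

def choose_k(X, threshold=0.95):
    """Smallest K whose components explain at least `threshold`
    of the total variance (threshold is an illustrative choice)."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, threshold) + 1)
```

With K equal to D, `explained_variance` returns exactly 1, reflecting the observation that keeping all D components performs no reduction at all; a practitioner would instead pick the smallest K that clears an acceptable variance threshold.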