probably not make sense to use all D of the principal components, as that would project the original D-dimensional data instances right back onto a D-dimensional space, meaning that there would be no dimensionality reduction at all.
For visualization on a flat 2-D surface such as a computer screen, selecting K to be 2 or 3 would be appropriate.
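As a concrete illustration (a minimal sketch, not from the text; the function name and the use of NumPy's SVD to obtain the principal components are assumptions), data can be projected onto the top K = 2 components for plotting:

    # Minimal sketch: project N data instances of dimension D onto K = 2
    # principal components for visualization. X is assumed to be a NumPy
    # array of shape (N, D); the function name is hypothetical.
    import numpy as np

    def project_for_visualization(X, K=2):
        X_centered = X - X.mean(axis=0)   # PCA assumes zero-mean data
        # Rows of Vt are the principal directions, ordered by variance.
        _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
        return X_centered @ Vt[:K].T      # (N, K) coordinates along top K axes

The resulting N x 2 array can then be passed to any 2-D scatter-plot routine.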
For the purpose of reducing the number of features in order to reduce the amount of data and improve the running time of a machine learning training algorithm, the user of PCA software must choose a value for K. On the one hand, a very small K is desirable because it reduces the amount of data; on the other hand, if too many dimensions are removed, the data may no longer capture important details.
In the context of PCA, the notion of capturing details is quantified by the amount of total variance explained by the selected principal components. Recall that when the original data points are projected onto an axis defined by a principal component, the variance of the projected data can be computed. The amount of total variance explained is then the sum of these variances over all K principal components.
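As a hedged sketch of this computation (assuming a data matrix X of shape (N, D); the 95% target and the function name are illustrative assumptions, not prescribed by the text), the variance explained can be read off the singular values, and K chosen as the smallest value meeting the target:

    # Minimal sketch: pick the smallest K whose components explain at
    # least `target` of the total variance. The 0.95 default is an
    # illustrative choice, not a recommendation from the text.
    import numpy as np

    def choose_K(X, target=0.95):
        X_centered = X - X.mean(axis=0)
        _, s, _ = np.linalg.svd(X_centered, full_matrices=False)
        variances = s ** 2                # variance along each axis, up to a 1/N factor
        explained = np.cumsum(variances) / variances.sum()
        # First index where the cumulative fraction reaches the target.
        return int(np.searchsorted(explained, target) + 1)

Because the constant 1/N factor cancels in the ratio, the squared singular values can be used directly to compute the fraction of total variance explained.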