# Principal Components Analysis: A Short Primer


by Chris Simpkins, [email protected]

## High-level Ideas

• A PCA projection represents a data set in terms of the orthonormal eigenvectors of the data set's covariance matrix.
• A covariance matrix captures how each pair of variables in a data set varies together.
• PCA uses the orthonormal eigenvectors of the covariance matrix as the basis for the transformed feature space. (Eigenvectors can be thought of as the "natural basis" for a given multi-dimensional data set.)
• The eigenvalue paired with each eigenvector measures the variance of the data along that direction, so the eigenvectors with the largest eigenvalues are the principal components that capture the most variation.
• A PCA projection seeks uncorrelated variables: in the transformed feature space, the covariance between any two distinct components is zero.
• Every data set has principal components, but PCA works best if the data are Gaussian-distributed. For high-dimensional data, the central limit theorem is often invoked to justify assuming Gaussian distributions.

## Covariance Matrices

The variance of a single variable x is given by:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}$$

where $\bar{X}$ is the mean of the $x_i$.

The covariance of two variables, x and y, is given by:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n}$$
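To make the variance and covariance definitions concrete, and to show how they feed into the eigenvector idea from the high-level summary, here is a minimal sketch in plain Python. It is not taken from the primer: the function names (`variance`, `covariance`, `pca_2d`, `project`) and the sample data are hypothetical, and it assumes only two variables so that the 2x2 covariance matrix can be eigen-decomposed in closed form. A general implementation would more likely rely on a linear algebra library.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(xs):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    """Population covariance of two equal-length samples (divides by n)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def pca_2d(xs, ys):
    """Eigen-decompose the 2x2 covariance matrix [[a, b], [b, c]] in closed form.

    Returns (eigenvalues, eigenvectors), ordered by decreasing eigenvalue.
    Each eigenvector is a unit-length (x, y) tuple.
    """
    a, b, c = variance(xs), covariance(xs, ys), variance(ys)
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)
    if b == 0:
        # The variables are already uncorrelated, so the coordinate axes
        # themselves are eigenvectors; put the higher-variance axis first.
        if a >= c:
            return eigenvalues, ((1.0, 0.0), (0.0, 1.0))
        return eigenvalues, ((0.0, 1.0), (1.0, 0.0))
    vectors = []
    for lam in eigenvalues:
        # Solve (a - lam) * vx + b * vy = 0, which gives v = (b, lam - a).
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return eigenvalues, tuple(vectors)


def project(xs, ys, vectors):
    """Center the data and express each point in the eigenvector basis."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [
        tuple((x - x_bar) * vx + (y - y_bar) * vy for vx, vy in vectors)
        for x, y in zip(xs, ys)
    ]


# Made-up sample data: two positively correlated variables.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

eigenvalues, vectors = pca_2d(xs, ys)
projected = project(xs, ys, vectors)

print("var(x) =", variance(xs))         # 2.0
print("cov(x, y) =", covariance(xs, ys))  # 1.8
print("eigenvalues =", eigenvalues)     # approximately (3.8, 0.2)
```

For this sample the covariance matrix is [[2, 1.8], [1.8, 2]], so the first principal component lies along the diagonal direction (1, 1)/√2 and carries most of the variance. After projection, the covariance between the two new coordinates is zero up to floating-point rounding, which is the "uncorrelated variables" property PCA seeks.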