An Introduction to Principal Component Analysis
with Examples in R

Thomas Phan
first.last @ acm.org

Technical Report*
September 1, 2016

* This document serves as a readable tutorial on PCA using only basic
concepts from statistics and linear algebra.

1 Introduction
Principal component analysis (PCA) is a series of mathematical steps for reducing the dimensionality of data. In practical terms, it can be used to reduce the number of features in a data set by a large factor (for example, from 1000s of features to 10s of features) if the features are correlated.
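As a brief preview of the R examples in Section 4, the following minimal sketch (the two-feature synthetic data here is purely illustrative; prcomp() is R's built-in PCA routine) shows how strongly correlated features compress: nearly all of the variance falls on a single principal component.

    # Two strongly correlated features compress to one component.
    set.seed(1)
    x1  <- rnorm(100)
    x2  <- 2 * x1 + rnorm(100, sd = 0.1)  # nearly a linear function of x1
    pca <- prcomp(cbind(x1, x2), scale. = TRUE)
    summary(pca)  # PC1 explains almost all of the variance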
This type of “feature compression” is often used for two purposes. First, if high-dimensional data is to be visualized by plotting it on a 2-D surface (such as a computer monitor or a piece of paper), then PCA can be used to reduce the data to 2-D or 3-D; in this context, PCA can be considered a complete, standalone unsupervised machine learning algorithm. Second, if a different machine learning training algorithm is taking too long to run, then PCA can be used to reduce the number of features, which in turn reduces the amount of training data and the time to train a model; here, PCA is used as a pre-processing step as part of a larger workflow. In this paper we discuss PCA largely for the first purpose of visualizing and exploring patterns in data.
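As an early illustration of this first purpose, the following minimal R sketch (using the built-in iris data set, which we revisit in Section 3; the plotting choices are illustrative) reduces four features to two and plots the result:

    # Project the four numeric Iris features onto the first two
    # principal components and plot them, colored by species.
    pca <- prcomp(iris[, 1:4], scale. = TRUE)
    plot(pca$x[, 1], pca$x[, 2], col = iris$Species,
         xlab = "PC1", ylab = "PC2",
         main = "Iris data on its first two principal components")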
It is important to note that PCA does not reduce features by selecting a subset of the original features (such as what is done with wrapper feature selection algorithms that perform feature-by-feature forward or backward search [6]). Instead, PCA creates new, uncorrelated features that are linear combinations of the original features. For a given data instance, its features are transformed via a dot product with a numeric vector to create a new feature; this vector is a principal component that serves as the direction of an axis onto which the data instance is projected. The new features are thus the projections of the original data into a new coordinate space defined by the principal components. To perform the actual dimensionality reduction, the user can follow a well-defined methodology to select the fewest new features that explain a desired amount of data variance.
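The following minimal R sketch illustrates both ideas on the built-in iris data (the 95% variance threshold is an illustrative choice, not a rule):

    pca <- prcomp(iris[, 1:4], scale. = TRUE)

    # Each principal component is a column of the rotation matrix; a new
    # feature is the dot product of a (scaled) instance with that column.
    scaled      <- scale(iris[, 1:4])
    new_feature <- sum(scaled[1, ] * pca$rotation[, 1])  # equals pca$x[1, 1]

    # Keep the fewest components that explain, say, 95% of the variance.
    explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
    k         <- which(explained >= 0.95)[1]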
This paper is organized in the following manner. In Section 2 we explain how PCA is applied to data sets and how it creates new features from existing features. Importantly, we offer practical tips for using PCA effectively with the R programming language in order to achieve good feature compression. In Section 3 we use PCA to explore three different data sets: Fisher’s Iris data, Kobe Bryant’s shots, and car class fuel economy. In Section 4 we show R code examples that run PCA on data sets, and in Section 5 we provide references for further reading.

