PRINCIPAL COMPONENT ANALYSIS (PCA)
So you've measured lots of features …
• Too many features bring problems: overfitting, poor interpretability, hard-to-visualize data, and computational cost
Can we combine raw features into new features that yield a simpler description of the same system?
Images: Zico Kolter

PRINCIPAL COMPONENT ANALYSIS (PCA)
Assume: data is normalized, $x_{ij} \leftarrow (x_{ij} - \mu_j)/\sigma_j$
• Zero mean, unit (= 1) variance
Hypothesis function: $h_\theta(x) = UWx$
• First multiply input by a low-rank matrix $W \in \mathbb{R}^{k \times n}$ ("compress" it), then map back into the initial space using $U \in \mathbb{R}^{n \times k}$
Loss function: squared distance (like k-means), $\ell(h_\theta(x), x) = \|UWx - x\|_2^2$
Optimization problem: $\min_{U,W} \sum_{i=1}^{m} \|UWx^{(i)} - x^{(i)}\|_2^2$
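A minimal NumPy sketch of this setup (the function names are ours, not from the slides; rows of X are examples, columns are features):

import numpy as np

def normalize(X):
    # Zero mean, unit variance per column (feature)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_loss(U, W, X):
    # Squared-distance loss: sum_i || U W x_i - x_i ||_2^2
    R = X @ W.T @ U.T - X   # row i is (U W x_i - x_i) transposed
    return np.sum(R ** 2)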

PRINCIPAL COMPONENT ANALYSIS (PCA)
Dimensionality reduction: main use of PCA for data science applications
If $h_\theta(x) = UWx \approx x$, then $z = Wx$ is a reduced (probably with some loss) representation of input features $x$
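For instance (a toy sketch with made-up shapes): compressing n = 100 features down to k = 2 numbers per example:

import numpy as np

n, k = 100, 2
W = np.random.randn(k, n)   # stand-in for a learned compression matrix
x = np.random.randn(n)      # one input example
z = W @ x                   # reduced representation of x
print(z.shape)              # (2,)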

PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA optimization problem is non-convex: $\min_{U,W} \sum_{i=1}^{m} \|UWx^{(i)} - x^{(i)}\|_2^2$
We can solve the problem exactly using the singular value decomposition (SVD):
• Factorize (or approximate) $M \approx U \Sigma V^T$, with $M \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$, $V^T \in \mathbb{R}^{r \times n}$
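A quick NumPy check of this factorization (a sketch; np.linalg.svd is one such SVD routine, and truncating to the top r singular values gives the best rank-r approximation in squared error):

import numpy as np

M = np.random.randn(6, 4)
U, s, Vt = np.linalg.svd(M, full_matrices=False)   # M = U diag(s) Vt exactly
r = 2
M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]        # best rank-r approximation
print(np.allclose(M, U @ np.diag(s) @ Vt))         # True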

PRINCIPAL COMPONENT ANALYSIS (PCA)
Solving PCA exactly using the SVD:
1. Normalize input data, pick #components k
2. Compute (exact) SVD of $X = U \Sigma V^T$
3. Return:
• $U = V_{:,1:k} \, \Sigma^{-1}_{1:k,1:k}$
• $W = \Sigma_{1:k,1:k} \, V^T_{:,1:k}$
Loss is $\sum_{i=k+1}^{n} \Sigma_{ii}^2$

PCA IN PYTHON
Can roll your own PCA easily (assuming a call to SVD via
SciPy or similar) …
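For instance, a sketch of the roll-your-own route, following the SVD recipe above (NumPy's SVD instead of SciPy's; the Σ scaling matches the reconstruction on the previous slide and should be treated as one convention among several):

import numpy as np

def pca_svd(X, k):
    # 1. Normalize input data (zero mean, unit variance per feature)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the (exact) SVD of X
    U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T
    # 3. U = V_{:,1:k} Sigma^{-1}_{1:k,1:k}, W = Sigma_{1:k,1:k} V^T_{:,1:k}
    U = V[:, :k] / s[:k]
    W = s[:k, None] * Vt[:k, :]
    loss = np.sum(s[k:] ** 2)   # sum of squared discarded singular values
    return U, W, loss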
… or just use Scikit-Learn:
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
# Fit PCA with 2 components (i.e., two final features)
pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_)
[ 0.99244... 0.00755...]
Looks like our data basically sit on a line
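Continuing the snippet above: to actually reduce the data, pick k = 1 and call fit_transform, which applies the learned compression:

pca1 = PCA(n_components=1)
Z = pca1.fit_transform(X)   # shape (6, 1): one new feature per example
print(Z.shape)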

HOW TO USE PCA & FRIENDS IN PRACTICE
Unsupervised learning methods are useful for exploratory data analysis (EDA)
• Cluster or reduce to a few dimensions and visualize!
Also useful as data prep before supervised learning!
1. Run PCA, get W matrix
2. Transform the data (reduces collinearity and dimension)
3. Train and test your favorite supervised classifier (see the sketch below)
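A sketch of steps 1–3 with scikit-learn (the data, labels, and n_components here are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = np.random.randn(100, 20)            # made-up data: 100 examples, 20 features
y = np.random.randint(0, 2, size=100)   # made-up binary labels

# PCA as data prep (steps 1-2), then a supervised classifier (step 3)
model = make_pipeline(PCA(n_components=5), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))                # training accuracy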
Or use k-means to set up radial basis functions (RBFs), as sketched below:
1. Get k centers
2. Create RBF features
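A sketch of the k-means → RBF recipe (the bandwidth gamma and cluster count are made-up choices):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(100, 2)   # made-up data
# 1. Get k centers
centers = KMeans(n_clusters=5, n_init=10).fit(X).cluster_centers_
# 2. Create RBF features: phi_j(x) = exp(-gamma * ||x - c_j||^2)
gamma = 1.0
sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-gamma * sq_dists)   # (100, 5) feature matrix for a downstream model
print(Phi.shape)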

RECOMMENDER SYSTEMS & COLLABORATIVE FILTERING

NETFLIX PRIZE
Recommender systems: predict a user's rating of an item
Netflix Prize: $1MM to the first team to beat Netflix's in-house engine by 10%
• Happened after about three years
• The winning model was never used by Netflix, for a variety of reasons:
• Out of date (DVDs vs streaming)
• Too complicated / not interpretable
          Twilight   Wall-E   Twilight II   TFotF
User 1       +1        -1         +1          ?
User 2       +1        -1          ?          ?
User 3       -1        +1         -1         +1
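The same table as a ratings matrix (a sketch; np.nan stands in for the "?" entries we want to predict):

import numpy as np

# Rows: Users 1-3; columns: Twilight, Wall-E, Twilight II, TFotF
R = np.array([[ 1., -1.,      1., np.nan],
              [ 1., -1.,  np.nan, np.nan],
              [-1.,  1.,     -1.,     1.]])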

RECOMMENDER SYSTEMS
Recommender systems feel like:
• Supervised learning (we know the user watched some movies, so these are like labels)
• Unsupervised learning (we want to find latent structure, e.g., genres of movies)
They fall somewhere in between, in "Information Filtering" or "Information Retrieval" …
• … but we can still just phrase the problem in terms of hypothesis classes, loss functions, and optimization problems

