PRINCIPAL COMPONENT ANALYSIS (PCA)
So you've measured lots of features …
• Problems: overfitting, interpretability issues, visualization, computation
Can we combine raw features into new features that yield a simpler description of the same system?
Images: Zico Kolter
PRINCIPAL COMPONENT ANALYSIS (PCA)
Assume: data is normalized
• Zero mean, unit (= 1) variance
Hypothesis function:
• First multiply the input by a low-rank matrix W ("compress" it), then map back into the initial space using U:  h(x) = U W x
Loss function: squared distance (like k-means):  ℓ(h(x), x) = ||U W x − x||²
Optimization problem:  minimize over U, W the sum over examples of ||U W x_i − x_i||²
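To make the objective concrete, here is a minimal numpy sketch of the setup above, using random (unoptimized) W and U just to evaluate the loss; the shapes and the random data are illustrative assumptions, not part of the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 examples, 5 raw features (toy data)

# Normalize: zero mean, unit variance per feature
X = (X - X.mean(axis=0)) / X.std(axis=0)

k = 2                                   # number of compressed features
W = rng.normal(size=(k, 5))             # "compress" matrix (not yet optimized)
U = rng.normal(size=(5, k))             # maps back into the initial space

# Squared-distance loss of the hypothesis h(x) = U W x, summed over examples
loss = np.sum((X @ W.T @ U.T - X) ** 2)
```

PCA is exactly the choice of U and W that minimizes this loss.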
PRINCIPAL COMPONENT ANALYSIS (PCA)
Dimensionality reduction: the main use of PCA in data science applications
If W is k × n with k ≪ n, then z = W x is a reduced (probably with some loss) representation of the input features x
PRINCIPAL COMPONENT ANALYSIS (PCA)
The PCA optimization problem is non-convex …
… but we can solve it exactly using the singular value decomposition (SVD):
• Factorize the matrix M = U Σ Vᵀ (also used to approximate):
    M (m × n) ≈ U (m × r) · Σ (r × r) · Vᵀ (r × n)
PRINCIPAL COMPONENT ANALYSIS (PCA)
Solving PCA exactly using the SVD:
1. Normalize the input data, pick the number of components k
2. Compute the (exact) SVD X = U Σ Vᵀ
3. Return:
• U = V_{:,1:k} Σ⁻¹_{1:k,1:k}
• W = Σ_{1:k,1:k} Vᵀ_{:,1:k}
The loss is Σ_{i=k+1}^{n} Σ_ii²
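The recipe above can be sketched in a few lines of numpy (scipy.linalg.svd works the same way). Note that the Σ and Σ⁻¹ factors cancel in the product U W, so the reconstruction just projects onto the top-k right singular vectors; the function name and toy data are assumptions.

```python
import numpy as np

def pca_svd(X, k):
    # 1. Normalize input data: zero mean, unit variance per feature
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Compute the (exact) SVD: Xn = U Sigma V^T
    _, S, Vt = np.linalg.svd(Xn, full_matrices=False)
    # 3. Top-k right singular vectors define the compression z = W x
    W = Vt[:k, :]
    # Loss: sum of squared discarded singular values, i = k+1 .. n
    loss = np.sum(S[k:] ** 2)
    return W, loss

X = np.array([[-1., -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
W, loss = pca_svd(X, 1)
```

On this toy data the loss is tiny: one component captures almost everything.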
PCA IN PYTHON
Can roll your own PCA easily (assuming a call to SVD via SciPy or similar) …
… or just use Scikit-Learn:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    # Fit PCA with 2 components (i.e., two final features)
    pca = PCA(n_components=2)
    pca.fit(X)
    print(pca.explained_variance_ratio_)
    # [0.99244... 0.00755...]

Looks like our data basically sit on a line
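Since almost all of the variance lies in the first component, we could compress to a single feature and reconstruct with little loss. This follow-on (with the n_components=1 choice) is an illustrative assumption, not part of the original slide:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1., -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

pca = PCA(n_components=1)          # keep only the dominant direction
Z = pca.fit_transform(X)           # shape (6, 1): the compressed features
X_hat = pca.inverse_transform(Z)   # map back; close to X, since the data
                                   # basically sit on a line
```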
HOW TO USE PCA & FRIENDS IN PRACTICE
Unsupervised learning methods are useful for exploratory data analysis (EDA)
• Cluster or reduce to a few dimensions and visualize!
Also useful as data prep before supervised learning:
1. Run PCA, get the W matrix
2. Transform the data with W (reduces collinearity and dimension)
3. Train and test your favorite supervised classifier
Or use k-means to set up radial basis functions (RBFs):
1. Get k cluster centers
2. Create RBF features from the distances to those centers
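The PCA-as-data-prep steps above can be sketched as a scikit-learn pipeline; the dataset, classifier choice, and split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1-2: PCA learns W and transforms; step 3: train a classifier on z = Wx
clf = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

The pipeline also guarantees the same PCA transform (fit on training data only) is applied at test time.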
RECOMMENDER SYSTEMS & COLLABORATIVE FILTERING
NETFLIX PRIZE
Recommender systems: predict a user's rating of an item
Netflix Prize: $1MM to the first team to beat Netflix's in-house engine by 10%
• Happened after about three years
• The winning model was never used by Netflix, for a variety of reasons:
  • Out of date (DVDs vs. streaming)
  • Too complicated / not interpretable

              Twilight   Wall-E   Twilight II   TFotF
    User 1       +1        -1         +1          ?
    User 2       +1        -1          ?          ?
    User 3       -1        +1         -1         +1
RECOMMENDER SYSTEMS
Recommender systems feel like:
• Supervised learning (we know the user watched some movies, so these act like labels)
• Unsupervised learning (we want to find latent structure, e.g., genres of movies)
They fall somewhere in between, in "information filtering" or "information retrieval" …
• … but we can still phrase the problem in terms of hypothesis classes, loss functions, and optimization problems
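As a hedged sketch of that framing, the ratings table can be phrased as an optimization problem: hypothesis = low-rank user/item factors, loss = squared error on the observed entries, optimizer = plain gradient descent. All hyperparameters (rank, learning rate, iterations) are illustrative assumptions.

```python
import numpy as np

# Ratings matrix from the Netflix-style table; 0 marks "unrated" (the ?s)
R = np.array([[ 1, -1,  1,  0],
              [ 1, -1,  0,  0],
              [-1,  1, -1,  1]], dtype=float)
mask = R != 0                          # loss only counts observed entries

rng = np.random.default_rng(0)
k = 2                                  # latent dimension ("genres")
U = rng.normal(scale=0.1, size=(3, k)) # user factors
V = rng.normal(scale=0.1, size=(4, k)) # item factors

lr = 0.1
for _ in range(2000):
    E = mask * (U @ V.T - R)           # error on observed entries only
    U_new = U - lr * (E @ V)           # gradient step on both factor sets,
    V = V - lr * (E.T @ U)             # computed from the same error E
    U = U_new

pred = U @ V.T                         # predictions, including the ? cells
```

The observed entries are recovered almost exactly, and the model also fills in the missing cells.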
Spring '17, John P. Dickerson