PRINCIPAL COMPONENT ANALYSIS (PCA)

So you've measured lots of features … and lots of raw features bring problems: overfitting, interpretability issues, visualization, computation. Can we combine raw features into new features that yield a simpler description of the same system?

Images: Zico Kolter
PRINCIPAL COMPONENT ANALYSIS (PCA)

Assume the data are normalized: zero mean, unit (= 1) variance.

Hypothesis function: first multiply the input by a low-rank matrix W ("compress" it), then map back into the initial space using U:

    h(x) = U W x,   W \in \mathbb{R}^{k \times n},   U \in \mathbb{R}^{n \times k}

Loss function: squared distance (like k-means):

    \ell(h(x), x) = \| U W x - x \|_2^2

Optimization problem:

    \min_{U, W} \sum_{i=1}^{m} \| U W x_i - x_i \|_2^2
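In NumPy this objective is just a couple of lines. A minimal sketch, assuming a data matrix X whose rows are normalized examples (the function name pca_loss is ours, not from the slides):

    import numpy as np

    def pca_loss(X, W, U):
        """Squared reconstruction error of the PCA hypothesis h(x) = UWx."""
        X_hat = X @ W.T @ U.T          # row i is (U W x_i)^T
        return np.sum((X_hat - X) ** 2)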
PRINCIPAL COMPONENT ANALYSIS (PCA)

Dimensionality reduction: the main use of PCA for data science applications. If k ≪ n, then z = W x is a reduced (probably with some loss) representation of the input features x.
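A quick shape check (the sizes n = 100 and k = 5 here are made up for illustration):

    import numpy as np

    n, k = 100, 5
    W = np.random.randn(k, n)   # in practice, learned by PCA rather than random
    x = np.random.randn(n)      # one (normalized) example with n raw features
    z = W @ x                   # reduced representation, shape (5,)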
PRINCIPAL COMPONENT ANALYSIS (PCA)

The PCA optimization problem is non-convex, but we can solve it exactly using the singular value decomposition (SVD): factorize the matrix M = U Σ V^T (also used to approximate). For a rank-r factorization of an m×n matrix M, U is m×r, Σ is r×r and diagonal, and V^T is r×n.
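A minimal NumPy sketch of both uses (the example matrix and the rank r = 2 are arbitrary):

    import numpy as np

    M = np.random.randn(5, 3)                         # example m x n matrix
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # exact: M = U @ diag(s) @ Vt
    r = 2
    M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]       # best rank-r approximation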
PRINCIPAL COMPONENT ANALYSIS (PCA)

Solving PCA exactly using the SVD (a code sketch follows):
1. Normalize the input data; pick the number of components k
2. Compute the (exact) SVD X = U Σ V^T
3. Return W = \Sigma^{-1}_{1:k,1:k} V^T_{:,1:k} ("compress") and U = V_{:,1:k} \Sigma_{1:k,1:k} ("decompress"), so that UW projects onto the top-k right singular vectors

The loss is

    \sum_{i=k+1}^{n} \Sigma_{ii}^2

i.e., the sum of the squared singular values left out of the rank-k factorization.
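A minimal NumPy sketch of this recipe (the function name pca_svd is ours; rows of X are assumed to be examples):

    import numpy as np

    def pca_svd(X, k):
        Xn = (X - X.mean(axis=0)) / X.std(axis=0)          # 1. zero mean, unit variance
        _, s, Vt = np.linalg.svd(Xn, full_matrices=False)  # 2. exact SVD of the data
        W = np.diag(1.0 / s[:k]) @ Vt[:k, :]               # 3. "compress" matrix
        U = Vt[:k, :].T @ np.diag(s[:k])                   #    "decompress" matrix
        loss = np.sum(s[k:] ** 2)                          # squared singular values left out
        return W, U, loss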
PCA IN PYTHON

You can roll your own PCA easily (assuming a call to SVD via SciPy or similar) … or just use Scikit-Learn:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

    # Fit PCA with 2 components (i.e., two final features)
    pca = PCA(n_components=2)
    pca.fit(X)
    print(pca.explained_variance_ratio_)
    # [ 0.99244...  0.00755...]

Looks like our data basically sit on a line.
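Since nearly all the variance lies along one direction, we could compress to a single feature with little loss. A short continuation of the example above (the variable names are ours):

    # Keep one component, then map back to the original 2-D space.
    pca1 = PCA(n_components=1)
    Z = pca1.fit_transform(X)           # shape (6, 1): one coordinate per point
    X_hat = pca1.inverse_transform(Z)   # approximate reconstruction of X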
HOW TO USE PCA & FRIENDS IN PRACTICE

Unsupervised learning methods are useful for exploratory data analysis (EDA): cluster, or reduce to a few dimensions and visualize!

They are also useful as data prep before supervised learning (see the sketch below):
1. Run PCA to get the W matrix
2. Transform the inputs to z = W x (reduces collinearity and dimension)
3. Train and test your favorite supervised classifier on the transformed data

Or use k-means to set up radial basis functions (RBFs):
1. Get k centers \mu_1, \dots, \mu_k
2. Create RBF features, e.g. f_j(x) = \exp(-\|x - \mu_j\|_2^2 / \sigma^2)
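A minimal end-to-end sketch of the PCA-then-classify recipe with Scikit-Learn (the toy dataset, component count, and choice of classifier are all arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy data: 200 examples, 20 raw features.
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    # Reduce to 5 PCA features, then train a classifier on them.
    model = make_pipeline(PCA(n_components=5), LogisticRegression())
    model.fit(X, y)
    print(model.score(X, y))   # accuracy on the (training) data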
RECOMMENDER SYSTEMS & COLLABORATIVE FILTERING
NETFLIX PRIZE

Recommender systems: predict a user's rating of an item.

The Netflix Prize offered $1MM to the first team to beat Netflix's in-house engine by 10%. That happened after about three years, but the winning model was never used by Netflix, for a variety of reasons:
- Out of date (DVDs vs. streaming)
- Too complicated / not interpretable

An example ratings matrix (+1 = liked, -1 = disliked, ? = unknown):

             Twilight   Wall-E   Twilight II   TFotF
    User 1      +1        -1         +1          ?
    User 2      +1        -1          ?          ?
    User 3      -1        +1         -1         +1
RECOMMENDER SYSTEMS

Recommender systems feel like:
- Supervised learning (we know the user watched some movies, so these are like labels)
- Unsupervised learning (we want to find latent structure, e.g., genres of movies)

They fall somewhere in between, in "information filtering" or "information retrieval" … but we can still just phrase the problem in terms of hypothesis classes, loss functions, and optimization problems, as the sketch below illustrates.
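For instance, a matrix-factorization hypothesis predicts rating(i, j) ≈ u_i · v_j for latent user and movie vectors, with squared loss over the observed entries only. A minimal sketch on the toy ratings matrix from the Netflix slide (the latent dimension, learning rate, and iteration count are arbitrary choices of ours):

    import numpy as np

    # Ratings from the Netflix slide; np.nan marks the "?" entries.
    R = np.array([[ 1, -1,      1, np.nan],
                  [ 1, -1, np.nan, np.nan],
                  [-1,  1,     -1,      1]], dtype=float)
    observed = ~np.isnan(R)

    k, lr = 2, 0.05
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
    V = rng.normal(scale=0.1, size=(R.shape[1], k))   # movie factors

    # Plain gradient descent on the squared error over observed entries.
    for _ in range(1000):
        E = np.where(observed, U @ V.T - R, 0.0)
        U, V = U - lr * E @ V, V - lr * E.T @ U

    print(np.round(U @ V.T, 2))   # predictions fill in the "?" cells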