Question

# Please explain how you would approach this question. Thanks a lot.

Consider the following $5 \times 2$ data matrix. Think of the rows as observations and the columns as feature values for each observation.

$$
X_0 = \begin{pmatrix} 0 & 0 \\ 0 & 2 \\ 2 & 0 \\ 2 & 2 \\ x & y \end{pmatrix} \in \mathbb{R}^{5 \times 2}, \qquad (x, y) \in [0, 2]^2
$$

Write $x_0^i$ for the $i$th column of $X_0$ and $\mu_i$ for its average, $i = 1, 2$. Here $x, y$ are scalars that can take any values between 0 and 2. In Q1, fix $(x, y) = (1, 1)$.

(a) (2 pt) Write out the centered version of $X_0$, calling it $X$. In other words, $x^1 = x_0^1 - \mu_1$ is the first column of $X$.

(b) (2 pt) Write out $X^t X$. Hint: remember that the $ij$-th entry of this matrix is the inner product of columns $x^i$ and $x^j$. This should make it easy to calculate. For example, the $11$ entry of this matrix (top left corner) is $(x^1)^t (x^1)$.

(c) (2 pt) First, show that the column vector $v_1 = (1, 0)^t$ is an eigenvector of $X^t X$ and write its corresponding eigenvalue, which you should call $\lambda_1$. Remember, $v$ is an eigenvector of a matrix $A$ with eigenvalue $\lambda$ if $Av = \lambda v$. Second, find a second eigenvector of $X^t X$ and its corresponding eigenvalue $\lambda_2$.

(d) (4 pt) Say the SVD of $X$ is $X = U D V^t$. Remember from class that $X^t X = V D^2 V^t$, where $D^2$ is the diagonal matrix in which each entry of $D$ is squared. First, use (c) to justify the fact that

$$
V = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
$$

Second, find an explicit expression for $D$. You should be specific in your explanations.

(e) (5 pt) Compute the PC-$k$ variance for the $k = 1, 2$ principal component directions, using your answers to (c) and (d). Look up the formula in the lecture slides if you have forgotten it.

(f) (7 pt) Go to the website http://setosa.io/ev/principal-component-analysis/ and look at the 2-dimensional PCA example. Move the points in the left-hand display to match the coordinates of $X_0$ with $(x, y) = (1, 1)$. When done correctly, the left display should show a square with dots at the four corners and one in the middle. The L shape in the display should have its red line horizontal and its green line vertical.
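One way to approach parts (a) through (e) is to do each step by hand and then check it numerically. The sketch below is a minimal, standard-library-only Python version for $(x, y) = (1, 1)$. The helper names (`center`, `gram`, `mat_vec`, `pc_variances`) are hypothetical, and the PC-$k$ variance convention (dividing $d_k^2$ by $n$ rather than $n - 1$) is an assumption to check against your lecture slides. If the computation shows $X^t X$ is a multiple of the identity, then every nonzero vector is an eigenvector, so the second eigenvector is not unique; $(0, -1)^t$ is picked here to match the second column of $V$ in (d).

```python
# Sketch: numerically check parts (a)-(e) for (x, y) = (1, 1), standard library only.
# Assumption: PC-k variance is d_k^2 / (n - ddof); ddof=0 is the 1/n convention,
# ddof=1 the 1/(n-1) convention. Check which one your lecture uses.
import math

def center(X0):
    """(a) Subtract each column mean mu_i from column i; return (X, mu)."""
    n, p = len(X0), len(X0[0])
    mu = [sum(row[j] for row in X0) / n for j in range(p)]
    return [[row[j] - mu[j] for j in range(p)] for row in X0], mu

def gram(X):
    """(b) X^t X: entry (i, j) is the inner product of columns i and j of X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def pc_variances(d, n, ddof=0):
    """(e) Hypothetical helper: PC-k variance as d_k^2 / (n - ddof)."""
    return [dk ** 2 / (n - ddof) for dk in d]

X0 = [[0, 0], [0, 2], [2, 0], [2, 2], [1, 1]]
X, mu = center(X0)
G = gram(X)

# (c) Check G v1 = lambda1 * v1 for v1 = (1, 0)^t.
v1 = [1, 0]
Gv1 = mat_vec(G, v1)
lam1 = Gv1[0]                 # since v1 = (1, 0), the first entry of G v1 is lambda1
assert Gv1 == [lam1 * c for c in v1]

# A second eigenvector orthogonal to v1, chosen to match the second column of V in (d).
v2 = [0, -1]
Gv2 = mat_vec(G, v2)
lam2 = -Gv2[1]                # since v2 = (0, -1), G v2 = (0, -lambda2)
assert Gv2 == [lam2 * c for c in v2]

# (d) X^t X = V D^2 V^t, so the singular values are d_k = sqrt(lambda_k).
d = [math.sqrt(lam1), math.sqrt(lam2)]

# (e) PC-k variances and the share of total variance each PC explains.
n = len(X0)
variances = pc_variances(d, n)
total = sum(dk ** 2 for dk in d)
proportions = [dk ** 2 / total for dk in d]

print("mu =", mu)
print("X =", X)
print("X^t X =", G)
print("lambda =", [lam1, lam2], "d =", d)
print("PC variances (1/n convention) =", variances)
print("proportions of total variance =", proportions)
```

The proportions $d_k^2 / \sum_j d_j^2$ do not depend on whether you divide by $n$ or $n - 1$, so they give a convention-free check on (e) and on the variance-explained statements in (f).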
Explanation of the right-hand display: the x-axis coordinates show the dot products of the first principal component vector with each of the centered observation vectors (rows of $X$), and the y-axis shows the dot products with the second principal component. Note that the L in the left-hand display is rotated in the right-hand display so that the red base of the L is horizontal, corresponding to the first PC direction, and the green left-hand side of the L is vertical, corresponding to the second PC direction. Therefore, comparing how spread out the points are along the x-axis with how spread out they are along the y-axis shows the variation along the PC1 and PC2 directions, respectively. The right-hand display should be a square with corners at $(1, 1)$, $(1, -1)$, etc., and center at zero.

Use your answers to (c) through (d) to select all correct statements about the right-hand display of the 2D example obtained from your representation of $X_0$ on the plot. The right-hand display...

- ( ) looks exactly like a plot of $X$.
- ( ) looks exactly like a plot of $X_0$.
- ( ) shows that the vertical direction (pointing up) of $X_0$ has more variation than the horizontal direction.
- ( ) shows that the horizontal direction (pointing right) of $X_0$ has more variation than the vertical direction.
- ( ) shows that the horizontal and vertical directions of $X_0$ have the same variation.
- ( ) shows that the projections of the observation vectors in $X$ onto the PC1 and PC2 directions are projections onto the original axes.
- ( ) shows that all of the total variance of the data is explained by the first principal component.
- ( ) shows that half of the total variance of the data is explained by the first principal component.
- ( ) shows that about a quarter of the total variance of the data is explained by the second principal component.
- ( ) shows that the matrix of principal component directions times $X$ gives a rotation of $X$, possibly including the identity, where nothing moves.
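To check these statements, the sketch below computes the PC scores that the right-hand display plots: the dot product of each centered row of $X$ with each PC direction, i.e. the rows of $XV$. It assumes the PC directions are the columns of $V$ from part (d); the website's own choice of directions (or their signs) could differ, in which case its plot would be a rotated or reflected version of this one. Comparing the printed scores with the rows of $X$, and the spread along each axis, is one way to work through the "looks like" and variance-explained options.

```python
# Sketch: reproduce the right-hand (PC-score) display for (x, y) = (1, 1).
# Assumption: PC directions are the columns of V from part (d), V = [[1, 0], [0, -1]].
X0 = [[0, 0], [0, 2], [2, 0], [2, 2], [1, 1]]
n = len(X0)
mu = [sum(r[j] for r in X0) / n for j in range(2)]
X = [[r[0] - mu[0], r[1] - mu[1]] for r in X0]     # centered data

V = [[1, 0], [0, -1]]                               # columns are PC1 and PC2
pcs = [[V[0][k], V[1][k]] for k in range(2)]        # extract column k as a vector

# Score of observation i on PC k = dot product of row i of X with PC k (rows of X V).
scores = [[sum(a * b for a, b in zip(row, pc)) for pc in pcs] for row in X]

# Spread along each PC axis: the sum of squared scores on PC k equals lambda_k = d_k^2.
ss = [sum(s[k] ** 2 for s in scores) for k in range(2)]
share = [v / sum(ss) for v in ss]

for obs, s in zip(X, scores):
    print(obs, "->", s)
print("sum of squared scores per PC:", ss, "share of total:", share)
```

Since the plot only shows points, comparing the scores with the rows of $X$ as sets of points (ignoring order) is what matters for the "looks exactly like" options.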
