Lect16 Micro-array analysis

# An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)


L16: Micro-array analysis (dimension reduction, unsupervised clustering)

PCA: motivating example

Consider the expression values of 2 genes over 6 samples. Clearly, the expression of $g_1$ is not informative, and it suffices to look at the $g_2$ values. Dimensionality can be reduced by discarding the gene $g_1$.

[Figure: expression values of the 6 samples, axes $g_1$ and $g_2$]

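One way to make "not informative" concrete is to compare how much each gene varies across the samples. The sketch below does this with made-up numbers: the values in `g1` and `g2` are hypothetical and are not taken from the slide's figure, and using variance as the criterion is an assumption of this illustration.

```python
from statistics import pvariance

# Hypothetical expression values for 2 genes over 6 samples (illustrative only).
g1 = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]   # nearly constant across samples
g2 = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]  # varies widely across samples

# Population variance of each gene across the samples.
variances = {"g1": pvariance(g1), "g2": pvariance(g2)}

# Keep the gene with the larger variance; the other one is a candidate to discard.
kept = max(variances, key=variances.get)
print(variances, "keep:", kept)
```

With these numbers almost all of the spread is along `g2`, so dropping `g1` loses very little.
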
PCA: Ex2

Consider the expression values of 2 genes over 6 samples. Clearly, the expression of the two genes is highly correlated. Projecting all the data points onto a single line could explain most of the variation in the data.

PCA

Suppose all of the data were to be reduced by projecting onto a single line $\beta$ through the mean $m$. How do we select the line $\beta$?

[Figure: line $\beta$ through the mean $m$]
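PCA cont’d

Let each point $x_k$ map to $x'_k = m + a_k \beta$. We want to minimize the error

$$\sum_k \|x_k - x'_k\|^2$$

Observation 1: Each point $x_k$ maps to $x'_k = m + \beta^T (x_k - m)\,\beta$, that is, $a_k = \beta^T (x_k - m)$.

[Figure: point $x_k$ and its image $x'_k$ on the line $\beta$ through $m$]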

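Proof of Observation 1

$$
\begin{aligned}
\min_{a_k} \|x_k - x'_k\|^2 &= \min_{a_k} \|x_k - m + m - x'_k\|^2 \\
&= \min_{a_k} \|x_k - m\|^2 + \|m - x'_k\|^2 - 2 (x'_k - m)^T (x_k - m) \\
&= \min_{a_k} \|x_k - m\|^2 + a_k^2 \beta^T \beta - 2 a_k \beta^T (x_k - m) \\
&= \min_{a_k} \|x_k - m\|^2 + a_k^2 - 2 a_k \beta^T (x_k - m)
\end{aligned}
$$

The last step uses the fact that $\beta$ is a unit vector, $\beta^T \beta = 1$.

Differentiating w.r.t. $a_k$:

$$2 a_k - 2 \beta^T (x_k - m) = 0 \;\Rightarrow\; a_k = \beta^T (x_k - m) \;\Rightarrow\; a_k^2 = a_k \beta^T (x_k - m)$$

Substituting back:

$$\|x_k - x'_k\|^2 = \|x_k - m\|^2 - \beta^T (x_k - m)(x_k - m)^T \beta$$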
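
The result of this proof can be checked numerically on a single point. The sketch below uses hypothetical values for `m`, `beta` and `x` (none of them come from the slides) and assumes `beta` has unit length, as the derivation does. It computes $a_k = \beta^T (x_k - m)$, compares the resulting squared error with the closed-form expression $\|x_k - m\|^2 - a_k^2$, and confirms that nudging $a_k$ in either direction only increases the error.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sq_error(x, m, beta, a):
    """Squared distance between x and its image m + a*beta on the line."""
    return sum((xi - (mi + a * bi)) ** 2 for xi, mi, bi in zip(x, m, beta))

# Hypothetical values, chosen only for illustration.
m = [1.0, 2.0]          # mean of the data
beta = [3 / 5, 4 / 5]   # unit vector giving the direction of the line
x = [4.0, 3.0]          # one data point x_k

d = [xi - mi for xi, mi in zip(x, m)]   # x_k - m
a_opt = dot(beta, d)                    # a_k = beta^T (x_k - m)

err_opt = sq_error(x, m, beta, a_opt)
closed_form = dot(d, d) - a_opt ** 2    # ||x_k - m||^2 - beta^T (x_k - m)(x_k - m)^T beta

print(a_opt, err_opt, closed_form)
print(sq_error(x, m, beta, a_opt - 0.1), sq_error(x, m, beta, a_opt + 0.1))
```
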
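Minimizing PCA Error

Summing the per-point error over all points gives

$$\sum_k \|x_k - x'_k\|^2 = C - \beta^T \Big( \sum_k (x_k - m)(x_k - m)^T \Big) \beta = C - \beta^T S \beta$$

where $C = \sum_k \|x_k - m\|^2$ does not depend on $\beta$, and $S = \sum_k (x_k - m)(x_k - m)^T$ is the scatter matrix of the data.

To minimize the error, we must maximize $\beta^T S \beta$ subject to $\beta^T \beta = 1$. At the maximum, $S \beta = \lambda \beta$ for some $\lambda$, so $\lambda = \beta^T S \beta$ is an eigenvalue of $S$ and $\beta$ the corresponding eigenvector. Therefore, we must choose the eigenvector corresponding to the largest eigenvalue.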
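
The claim that the largest eigenvalue bounds $\beta^T S \beta$ over all unit vectors can be checked by brute force in two dimensions. The sketch below is only an illustration: the `points` are hypothetical, and the closed-form eigenvalue formula is specific to symmetric 2x2 matrices (the eigenvector formula also assumes the off-diagonal entry `s12` is nonzero). It builds $S$, computes the top eigenpair, and scans unit directions in 1-degree steps.

```python
import math

# Hypothetical expression values: 6 samples, 2 highly correlated genes.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.9)]
n = len(points)
m = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Scatter matrix S = sum_k (x_k - m)(x_k - m)^T = [[s11, s12], [s12, s22]].
s11 = sum((p[0] - m[0]) ** 2 for p in points)
s22 = sum((p[1] - m[1]) ** 2 for p in points)
s12 = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)

def quad(b):
    """Return beta^T S beta for a 2-vector b."""
    return s11 * b[0] ** 2 + 2 * s12 * b[0] * b[1] + s22 * b[1] ** 2

# Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
lam_max = (s11 + s22) / 2 + math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)

# Matching eigenvector (valid because s12 != 0 here), scaled to unit length.
v = (s12, lam_max - s11)
norm = math.hypot(v[0], v[1])
beta = (v[0] / norm, v[1] / norm)

# Brute-force scan: no unit direction should beat the top eigenvector.
best_scan = max(
    quad((math.cos(math.radians(t)), math.sin(math.radians(t))))
    for t in range(180)
)

print(lam_max, quad(beta), best_scan)
```

The scan value approaches `lam_max` from below, and `quad(beta)` matches `lam_max` up to rounding.
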

PCA

The single best dimension is given by the eigenvector corresponding to the largest eigenvalue of $S$.

The best $k$ dimensions are given by the eigenvectors $\{\beta_1, \beta_2, \ldots, \beta_k\}$ corresponding to the $k$ largest eigenvalues.
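
A minimal sketch of extracting the top $k$ eigenvectors follows. It uses power iteration with deflation, which is one simple choice made for this illustration and is not a method named in the lecture; in practice a library eigensolver would usually be used. The data in `points` and the helper names (`scatter_matrix`, `power_iteration`, `top_k_components`) are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def scatter_matrix(points):
    """Return the mean m and S = sum_k (x_k - m)(x_k - m)^T."""
    n, d = len(points), len(points[0])
    m = [sum(p[j] for p in points) / n for j in range(d)]
    S = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - m[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                S[i][j] += c[i] * c[j]
    return m, S

def power_iteration(A, iters=1000):
    """Approximate the dominant eigenpair of a symmetric matrix A."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = [dot(row, v) for row in A]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    lam = dot(v, [dot(row, v) for row in A])
    return lam, v

def top_k_components(S, k):
    """Return [(eigenvalue, eigenvector), ...] for the k largest eigenvalues."""
    A = [row[:] for row in S]  # work on a copy so S is left untouched
    components = []
    for _ in range(k):
        lam, v = power_iteration(A)
        components.append((lam, v))
        # Deflation: subtract the component just found, A <- A - lam * v v^T.
        for i in range(len(A)):
            for j in range(len(A)):
                A[i][j] -= lam * v[i] * v[j]
    return components

# Hypothetical expression values: 6 samples (rows) over 3 genes (columns).
points = [
    [1.0, 1.2, 0.5],
    [2.0, 1.9, -0.4],
    [3.0, 3.1, 0.6],
    [4.0, 4.2, -0.5],
    [5.0, 4.8, 0.4],
    [6.0, 6.1, -0.6],
]

m, S = scatter_matrix(points)
components = top_k_components(S, 2)

# Coordinates of each sample in the reduced 2-dimensional space: beta_i^T (x - m).
reduced = [
    [dot(b, [p[j] - m[j] for j in range(len(m))]) for _, b in components]
    for p in points
]

for lam, b in components:
    print(round(lam, 4), [round(x, 4) for x in b])
```

Power iteration converges quickly here because the eigenvalues of this toy scatter matrix are well separated; it can be slow when two eigenvalues are close.
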