Lect16 Micro-array analysis

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

L16: Micro-array analysis
Topics: dimension reduction, unsupervised clustering
PCA: motivating example
Consider the expression values of 2 genes, g1 and g2, over 6 samples. The expression of g1 barely changes across the samples, so it is not informative, and it suffices to look at the g2 values. Dimensionality can be reduced by discarding the gene g1.
[Figure: the 6 samples plotted against axes g1 and g2]
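
As a rough illustration of "not informative", the sketch below compares the variance of each gene across samples and keeps the one that varies more. The expression values are made up for this example (the slide's data are not available), and using variance as the criterion is an assumption that matches the PCA view developed in the rest of the lecture.

```python
from statistics import pvariance

# Hypothetical expression values for 2 genes over 6 samples
# (illustrative only, not taken from the lecture figure).
g1 = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]    # nearly constant across samples
g2 = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]   # varies widely across samples

# Population variance of each gene across the 6 samples.
variances = {"g1": pvariance(g1), "g2": pvariance(g2)}

# Keep the gene with the largest variance; the other can be discarded.
most_informative = max(variances, key=variances.get)
print(variances, most_informative)
```

Here g1 has almost no spread, so dropping it loses very little information about how the samples differ.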

PCA: Ex2
Consider the expression values of 2 genes over 6 samples. Here the expression of the two genes is highly correlated. Projecting all the points onto a single line captures most of the variation in the data.
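
To make "highly correlated" concrete, here is a minimal sketch that computes the Pearson correlation of two genes over 6 samples. The helper name `pearson` and the data values are hypothetical, chosen only so that the two genes rise together.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical expression values (illustrative only).
g1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
g2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

r = pearson(g1, g2)
print(r)
```

A correlation close to 1 means the points lie near a line, which is the situation in which a one-dimensional projection loses little.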

PCA
Suppose all of the data were to be reduced by projecting onto a single line through the mean m with direction β. How do we select β?
[Figure: a line with direction β passing through the mean m]
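
PCA cont'd
Let each point $x_k$ map to $x'_k = m + a_k \beta$, where $\beta$ is a unit vector ($\beta^T \beta = 1$). We want to minimize the error

$$\sum_k \| x_k - x'_k \|^2$$

Observation 1: Each point $x_k$ maps to $x'_k = m + \beta^T (x_k - m)\, \beta$, that is, $a_k = \beta^T (x_k - m)$.
[Figure: a point $x_k$ and its projection $x'_k$ on the line through $m$ with direction $\beta$]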
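
The mapping in Observation 1 can be sketched in a few lines of plain Python. The function name `project_onto_line` and the three sample points are hypothetical; the direction is normalized inside the function because the formula assumes β has unit length.

```python
from math import sqrt

def project_onto_line(points, beta):
    """Project each point onto the line through the mean with direction beta.

    Returns the mean m, the coefficients a_k = beta^T (x_k - m),
    and the projected points x'_k = m + a_k * beta.
    """
    n, d = len(points), len(points[0])
    m = [sum(p[j] for p in points) / n for j in range(d)]
    norm = sqrt(sum(b * b for b in beta))
    beta = [b / norm for b in beta]          # enforce beta^T beta = 1
    coeffs, projected = [], []
    for x in points:
        a = sum(b * (xj - mj) for b, xj, mj in zip(beta, x, m))
        coeffs.append(a)
        projected.append([mj + a * b for mj, b in zip(m, beta)])
    return m, coeffs, projected

# Hypothetical points that already lie on the line y = x.
points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
m, coeffs, projected = project_onto_line(points, [1.0, 1.0])
print(m, coeffs, projected)
```

Because these points already lie along the chosen direction, each one projects back onto itself and the error is zero.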

Proof of Observation 1

$$
\begin{aligned}
\min_{a_k} \| x_k - x'_k \|^2
&= \min_{a_k} \| x_k - m + m - x'_k \|^2 \\
&= \min_{a_k} \| x_k - m \|^2 + \| m - x'_k \|^2 - 2 (x'_k - m)^T (x_k - m) \\
&= \min_{a_k} \| x_k - m \|^2 + a_k^2 \beta^T \beta - 2 a_k \beta^T (x_k - m) \\
&= \min_{a_k} \| x_k - m \|^2 + a_k^2 - 2 a_k \beta^T (x_k - m)
\end{aligned}
$$

Differentiating w.r.t. $a_k$:

$$
2 a_k - 2 \beta^T (x_k - m) = 0
\;\Rightarrow\;
a_k = \beta^T (x_k - m)
\;\Rightarrow\;
a_k^2 = a_k \beta^T (x_k - m)
$$

Substituting back:

$$
\| x_k - x'_k \|^2 = \| x_k - m \|^2 - \beta^T (x_k - m)(x_k - m)^T \beta
$$
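
A quick numeric check of this result, under made-up values for m, β and x_k: the error at a_k = β^T(x_k − m) should equal ‖x_k − m‖² − a_k², and moving a_k in either direction should only increase the error. All names and numbers below are illustrative.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical values; beta is a unit vector (0.6^2 + 0.8^2 = 1).
m = [0.0, 0.0]
beta = [0.6, 0.8]
x = [2.0, 1.0]

def error(a):
    """Squared error when x is mapped to m + a * beta."""
    x_proj = [mi + a * bi for mi, bi in zip(m, beta)]
    return sq_dist(x, x_proj)

# Optimal coefficient from the derivation: a = beta^T (x - m).
a_star = sum(bi * (xi - mi) for bi, xi, mi in zip(beta, x, m))

err_star = error(a_star)
closed_form = sq_dist(x, m) - a_star ** 2   # ||x - m||^2 - (beta^T (x - m))^2
print(a_star, err_star, closed_form)
```

The two error values agree up to floating-point rounding, and `error(a_star + 0.1)` and `error(a_star - 0.1)` are both larger than `err_star`.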
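
Minimizing PCA Error
Summing the per-point error over all points:

$$
\sum_k \| x_k - x'_k \|^2 = C - \beta^T \Big( \sum_k (x_k - m)(x_k - m)^T \Big) \beta = C - \beta^T S \beta
$$

where $C = \sum_k \| x_k - m \|^2$ does not depend on $\beta$, and $S = \sum_k (x_k - m)(x_k - m)^T$ is the scatter matrix.
To minimize the error, we must maximize $\beta^T S \beta$ subject to $\beta^T \beta = 1$. Setting the gradient of the Lagrangian $\beta^T S \beta - \lambda (\beta^T \beta - 1)$ to zero gives $S \beta = \lambda \beta$, so $\beta$ must be an eigenvector of $S$ with eigenvalue $\lambda$, and then $\beta^T S \beta = \lambda$. Therefore, we must choose the eigenvector corresponding to the largest eigenvalue.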
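
The sketch below builds the scatter matrix S for a small made-up data set and estimates its top eigenvector. The lecture does not say how the eigenvector is computed; power iteration is used here only because it fits in a few lines of plain Python. The function names, the data, the fixed iteration count, and the starting vector (assumed not orthogonal to the top eigenvector) are all assumptions of this example.

```python
from math import sqrt

def scatter_matrix(points):
    """Return the mean m and S = sum_k (x_k - m)(x_k - m)^T."""
    n, d = len(points), len(points[0])
    m = [sum(p[j] for p in points) / n for j in range(d)]
    S = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - m[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                S[i][j] += c[i] * c[j]
    return m, S

def top_eigenpair(S, iterations=200):
    """Estimate the largest eigenvalue and its unit eigenvector by power iteration."""
    d = len(S)
    beta = [1.0 / sqrt(d)] * d   # starting guess (assumed not orthogonal to the answer)
    for _ in range(iterations):
        w = [sum(S[i][j] * beta[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(v * v for v in w))
        beta = [v / norm for v in w]
    lam = sum(beta[i] * S[i][j] * beta[j] for i in range(d) for j in range(d))
    return lam, beta

# Hypothetical expression values of 2 correlated genes over 6 samples.
points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1], [6.0, 6.0]]
m, S = scatter_matrix(points)
lam, beta = top_eigenpair(S)
print(lam, beta)
```

For any other unit vector v, the value v^T S v is no larger than `lam`, which is why this direction gives the smallest projection error.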

PCA
The single best dimension is given by the eigenvector corresponding to the largest eigenvalue of S.
The best k dimensions are given by the eigenvectors {β_1, β_2, …, β_k} corresponding to the k largest eigenvalues.
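
One simple way to obtain the k leading eigenvectors, sketched here under stated assumptions, is power iteration with deflation: find the top eigenvector, subtract its contribution λ β βᵀ from the matrix, and repeat. This is a swapped-in technique for illustration, not something the lecture specifies. The function names and the test matrix are hypothetical, and the sketch assumes a symmetric matrix with at least k distinct nonzero eigenvalues and a starting vector that is not orthogonal to any of the wanted eigenvectors.

```python
from math import sqrt

def matvec(S, v):
    """Matrix-vector product for a square matrix stored as a list of rows."""
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(S))]

def top_k_eigenpairs(S, k, iterations=500):
    """Return [(eigenvalue, unit eigenvector), ...] for the k largest eigenvalues."""
    d = len(S)
    S = [row[:] for row in S]                    # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]    # arbitrary starting vector
        for _ in range(iterations):
            w = matvec(S, v)
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(S, v)))
        pairs.append((lam, v))
        # Deflation: remove the component just found, S <- S - lam * v v^T.
        for i in range(d):
            for j in range(d):
                S[i][j] -= lam * v[i] * v[j]
    return pairs

# Hypothetical symmetric matrix standing in for a scatter matrix.
S = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]
(lam1, b1), (lam2, b2) = top_k_eigenpairs(S, 2)
print(lam1, b1)
print(lam2, b2)
```

The two returned directions are orthogonal, so projecting the centered data onto them gives the k = 2 coordinates of each point. In practice a linear algebra library's symmetric eigensolver would normally be used instead of this hand-written loop.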