class_09_05 - Statistical Data Mining ORIE 474 Fall 2007...

Statistical Data Mining, ORIE 474, Fall 2007
Tatiyana Apanasovich
09/05/07
Principal Components Analysis
3.6 PCA: Overview

What is Dimensionality Reduction?
- Simplifying complex data
- Can be used as a data mining "tool"
- Useful for both "data modeling" and "data analysis"

Linear dimensionality reduction methods:
- Principal Component Analysis (PCA)
What is Dimensionality Reduction?

Suppose you have M objects, each with N measurements. Find the best K-dimensional parameterization.
Goal: find a "compact parameterization" or "latent variable" representation.

Underlying assumptions of DimRedux:
- The measurements over-specify the data, N > K: the number of measurements exceeds the number of "true" degrees of freedom in the system.
- The measurements capture all of the significant variability.
Uses for DimRedux
- Build a "compact" model of the data:
  - compression for storage, transmission, and retrieval
  - parameters for indexing, exploring, and organizing
  - generating "plausible" new data
- Answer fundamental questions about the data:
  - What is its underlying dimensionality? How many degrees of freedom are exhibited? How many "latent variables"?
  - How independent are my measurements?
  - Is there a projection of my data set where important relationships stand out?
PCA: What problem does it solve?
- Minimizes the "least-squares" (Euclidean) error between the original data and its reconstruction:
  $\min \sum_{i=1}^{n} (x_i - \tilde{x}_i)^2$, where $\tilde{x}_i$ is the prediction of the K-dimensional PCA model.
  The K-dimensional model provided by PCA has the smallest Euclidean error of any K-parameter linear model.
- Projects the data so that the variance is maximized.
- Finds an optimal orthogonal basis set for describing the given data.
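The least-squares property above can be checked numerically. The following is a minimal numpy sketch (not from the slides; the data here is synthetic): PCA directions are taken as eigenvectors of the sample covariance, and the squared reconstruction error equals the variance left in the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 objects, 5 measurements each

# Center the data and eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
K = 2
U = eigvecs[:, ::-1][:, :K]             # top-K principal directions

# Project onto the K-dimensional sub-space and reconstruct; PCA
# minimizes this squared error over all rank-K linear models.
X_hat = Xc @ U @ U.T
err = np.sum((Xc - X_hat) ** 2)
```

The error `err` is exactly $(n-1)$ times the sum of the discarded eigenvalues, which is the "variance is maximized" statement seen from the other side.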
Simple 3D example

I generated 500 points in 3D space. I can rotate the picture to get a better view (2D projection) of the variability of the points: a "bad" view vs. a "better" view.
Simple 3D example: PCA gives the best projection.
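A sketch of this example, with a hypothetical stand-in for the slide's data set: 500 points that vary mostly within a tilted plane in 3D, where the PCA "best view" is the projection onto the top two principal directions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: points concentrated near a random tilted plane,
# plus small out-of-plane noise.
plane = np.linalg.qr(rng.normal(size=(3, 2)))[0]        # orthonormal 3x2
coords = rng.normal(size=(500, 2)) * [3.0, 1.5]
X = coords @ plane.T + 0.1 * rng.normal(size=(500, 3))

# The "best view": project onto the top two right singular vectors
# of the centered data (the principal directions).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
view2d = Xc @ Vt[:2].T                                   # 500 x 2
```

Plotting `view2d` would reproduce the "better view" from the slide: the first coordinate carries the most variance, the second the next most.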
One approach to dealing with high-dimensional data is to reduce its dimensionality: project the high-dimensional data onto a lower-dimensional sub-space using linear transformations.
Linear transformations are simple to compute and tractable. Classical linear approaches:
- Principal Component Analysis (PCA)
- Fisher Discriminant Analysis (FDA)

Both project the data by $Y = U^{t} X$, with dimensions $(K \times 1) = (K \times N)\,(N \times 1)$ and $K \ll N$.
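The projection $Y = U^{t} X$ is a single matrix-vector product; a minimal sketch with made-up sizes (N = 10, K = 3):

```python
import numpy as np

N, K = 10, 3                                   # K << N
rng = np.random.default_rng(2)
# U holds K orthonormal basis vectors as columns (N x K), so that
# U^t is the K x N projection matrix from the slide.
U = np.linalg.qr(rng.normal(size=(N, K)))[0]
x = rng.normal(size=(N, 1))                    # N x 1 data vector

y = U.T @ x                                    # (K x N)(N x 1) -> K x 1
```

PCA and FDA differ only in how the columns of U are chosen, not in this projection step.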
Find a basis in a low-dimensional sub-space. Approximate vectors by projecting them into that sub-space:

(1) Original space representation:
$x = a_1 v_1 + a_2 v_2 + \dots + a_N v_N$, where $v_1, v_2, \dots, v_N$ is a basis in the original N-dimensional space.

(2) Lower-dimensional sub-space representation:
$\hat{x} = b_1 u_1 + b_2 u_2 + \dots + b_K u_K$, where $u_1, u_2, \dots, u_K$ is a basis in the K-dimensional sub-space (K < N).

Note: if K = N, then $\hat{x} = x$.
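A numerical sketch of these two representations, assuming an orthonormal basis (so the coefficients are just inner products); the sizes N = 6, K = 2 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 6, 2
V = np.linalg.qr(rng.normal(size=(N, N)))[0]  # orthonormal basis v_1..v_N
x = rng.normal(size=N)

a = V.T @ x                    # coefficients a_i = <x, v_i>
x_hat = V[:, :K] @ a[:K]       # keep only the first K basis vectors
x_full = V @ a                 # K = N case: the representation is exact
```

With K < N, `x_hat` is only an approximation of `x`; with K = N, `x_full` recovers `x` exactly, matching the note on the slide.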
Information loss

Dimensionality reduction implies information loss!
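The loss can be quantified as the squared reconstruction error as a function of the sub-space dimension K. A small sketch on synthetic data (the 300-by-8 data set is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 8))
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared reconstruction error for each sub-space dimension K; it equals
# the energy in the discarded singular values, so it shrinks
# monotonically and reaches zero when K = N.
errors = [float(np.sum((Xc - Xc @ Vt[:K].T @ Vt[:K]) ** 2))
          for K in range(1, 9)]
```

Plotting `errors` against K gives the usual "scree"-style trade-off between compactness and fidelity.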
