Comments, Thoughts, etc. Concerning Principal Component Analysis (PCA)
1)
Exploratory in nature
a.
implies that assumptions can be somewhat relaxed
i.
normality is not critical if the only interest is dimension reduction
2)
PCA strongly depends on the correlation structure of the variables used in the analysis
a.
requires linear relationships among variables
b.
requires sufficient sample sizes to accurately represent the correlation structure
i.
some texts recommend at least 200 observations
ii.
if the data are highly variable (high variance), then more observations are
needed to represent the correlation structure
iii.
if variables are highly correlated, then fewer observations are needed
than when correlation is weak
iv.
if few variables are important in the PCA, then fewer observations are needed
than when many variables are important
3)
Missing values
a.
PCA, as implemented in most software, drops an observation if any of its variables has a missing value (listwise deletion)
b.
when the pattern of missingness is not completely at random, then the results
of the PCA can be compromised
c.
possible fixes:
i.
remove variables that have a high number of missing values
ii.
use imputation techniques to predict the likely values of the missing
observations
1.
e.g. suppose X1 and X2 are highly correlated and X2 tends to be
filled in but X1 has some missing values
a.
regress X1 on X2 using the observed data to get predicted
values for the missing data. Replace the missing values
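
The regression fix in the X1/X2 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name regression_impute and the toy data are hypothetical (not from the notes), NumPy is assumed to be available, and a simple straight-line least-squares fit stands in for whatever regression the analyst would actually choose. The sketch also counts the complete cases, i.e. the rows that would survive listwise deletion.

```python
import numpy as np

def regression_impute(x1, x2):
    """Fill missing (NaN) entries of x1 with predictions from a
    least-squares line of x1 on x2, fitted on the observed pairs.
    Hypothetical helper for illustration only."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    # Rows where both variables are observed are used to fit the line.
    observed = ~np.isnan(x1) & ~np.isnan(x2)
    slope, intercept = np.polyfit(x2[observed], x1[observed], deg=1)

    # Predict X1 only where X1 is missing and X2 is available.
    missing = np.isnan(x1) & ~np.isnan(x2)
    filled = x1.copy()
    filled[missing] = intercept + slope * x2[missing]
    return filled

# Toy data (made up): X2 is fully observed, X1 has two missing values.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x1 = np.array([2.0, 4.0, np.nan, 8.0, np.nan])

# Rows that would be kept under listwise deletion.
complete = ~np.isnan(np.column_stack([x1, x2])).any(axis=1)
n_complete = int(complete.sum())

x1_filled = regression_impute(x1, x2)
```

Note that filling in fitted values places the imputed points exactly on the regression line, which tends to overstate the correlation between X1 and X2; since PCA depends on the correlation structure, this is worth keeping in mind when many values are imputed.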
 Spring '08
 Staff
 Correlation
