SAS - Preparing Data for Multivariate Analysis

# SAS - Preparing Data for Multivariate Analysis - Chapter 7...

## Chapter 7: Preparing Data for Multivariate Analysis

### Section 7.1: Screening, Cleaning, and Preparing Data

### Objectives

- Understand many of the most common data problems for multivariate analysis and the consequences of these problems.
- Screen for restricted range, small groups, and outliers.
- Clean and prepare data files for multivariate analysis using SAS.

### Data Reality…

[Figure: graphic labeled "Data"]

### Things to Do Before You Begin

- Data files accurate?
- Outliers?
- Restricted ranges in continuous variables?
- Unequal cell sizes in categorical variables?
- Distributions?
- Collinearity/singularity in variables?
- Homogeneous covariance matrices?
- Extent and nature of missing data?

This is just a sampling!

### Data Preparation Is Key to Success

You should reasonably expect to spend more time cleaning, verifying, screening, and imputing your data than analyzing it.

Data analysis is

- 90% perspiration
- 10% analysis
- 100% FUN with SAS!
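Several items on the "Things to Do Before You Begin" checklist map onto standard SAS procedures. The following is a minimal sketch, not code from the slides: the data set `work.survey` and the variables `age`, `income`, `educ_years`, `score`, and `group_a` are hypothetical names chosen for illustration, and the choice of procedures is an assumption about one reasonable way to run these checks.

```sas
/* Hypothetical data set and variable names, used only for illustration. */

/* Extent of missing data: nonmissing and missing counts per variable. */
proc means data=work.survey n nmiss;
  var age income educ_years score;
run;

/* Distributions: normality tests plus histograms with a fitted normal curve. */
proc univariate data=work.survey normal;
  var age income educ_years score;
  histogram age income educ_years score / normal;
run;

/* Collinearity: pairwise correlations, then VIF and tolerance from a regression. */
proc corr data=work.survey;
  var age income educ_years;
run;

proc reg data=work.survey;
  model score = age income educ_years / vif tol;
run;
quit;

/* Homogeneity of within-group covariance matrices: POOL=TEST requests the test. */
proc discrim data=work.survey pool=test;
  class group_a;
  var age income educ_years;
run;
```

Each step only reports diagnostics; none of them modifies `work.survey`, so the steps can be run in any order before deciding how to clean the data.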

### Problem: Accuracy of Data Files

- Look at summary statistics to verify N, scale, and so on.
- Check ranges of variables for incorrectly keyed numeric values.
- Use frequency tables for incorrectly keyed categorical variables.
- Check data for duplicates.
  - PROC FREQ; PROC SORT nodupkey;
- Recode items if needed.
  - DATA step, PROC SQL

Summary statistics: PROC MEANS min max N mean median;

### Problem: Outliers and Influential Points

[Figure: scatter plot of data points]

### Outlier Detection Tools

- Leverage, DFFITS (PROC REG)
- Z-scores (PROC STDIZE)
- Schematic box plots (PROC BOXPLOT)

### Outlier Detection Tools

Specifically for multivariate outliers:

- Two- and three-way scatter plots
- Principal components

### Problem: Restriction of Range

[Figure: axis labeled "Years of Education"]

### Near-Zero Group Sizes

[Figure: cell counts of 40, 42, 35, and 2 across groups A1, A2 and B1, B2]
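The accuracy checks and outlier detection tools named above can be strung together into one screening pass. This is a hedged sketch rather than the course's own code: `work.survey`, its key `id`, the numeric variables `age`, `income`, `educ_years`, `score`, and the categorical variables `group_a` and `group_b` are all hypothetical, and the cutoff of 3 standard deviations is an assumed rule of thumb, not a value taken from the slides.

```sas
/* Hypothetical data set WORK.SURVEY; all variable names are illustrative. */

/* 1. Summary statistics: verify N, scale, and the range of each variable. */
proc means data=work.survey n min max mean median;
  var age income educ_years score;
run;

/* 2. Frequency tables for miskeyed categories, plus a crosstab
      to reveal near-zero cell sizes. */
proc freq data=work.survey;
  tables group_a group_b / missing;
  tables group_a*group_b / norow nocol nopercent;
run;

/* 3. Duplicates: keep one row per ID and write the dropped rows
      to a separate data set for inspection. */
proc sort data=work.survey out=work.survey_nodup
          dupout=work.survey_dups nodupkey;
  by id;
run;

/* 4. Z-scores: METHOD=STD replaces each variable with (x - mean) / std. */
proc stdize data=work.survey_nodup out=work.survey_z method=std;
  var age income educ_years;
run;

/* Keep rows with any |z| > 3 (assumed cutoff). */
data work.z_flags;
  set work.survey_z;
  if abs(age) > 3 or abs(income) > 3 or abs(educ_years) > 3;
run;

/* 5. Leverage and DFFITS saved per observation from a regression. */
proc reg data=work.survey_nodup;
  model score = age income educ_years;
  output out=work.survey_infl h=leverage dffits=dffits;
run;
quit;

/* 6. Schematic box plots by group; PROC BOXPLOT expects the data
      sorted by the group variable. */
proc sort data=work.survey_nodup out=work.survey_sorted;
  by group_a;
run;

proc boxplot data=work.survey_sorted;
  plot income*group_a / boxstyle=schematic;
run;
```

Writing duplicates to `work.survey_dups` instead of silently discarding them makes it possible to check whether the repeated keys are true duplicates or keying errors before anything is deleted.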
### Sandwich Nutrition Example

[Figure: variables shown are Calories, Total Fat, Carbohydrates, Sodium, Weight, Fiber, Category, and Protein]

### Outlier Analysis and Data Screening Using SAS/IML Workshop 2.1

This demonstration performs multivariate data screening using interactive graphical techniques.

### Outliers: What to Do?

There are several ways of handling outlying data points, the usefulness of which varies by discipline.

- Use winsorized or trimmed statistics.
- Analyze data with and without outliers.
  - If outliers make little difference, leave them in.
- Delete significant (p < .001) outliers.
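The first two options for handling outliers can be illustrated in SAS as follows. This is only a sketch under stated assumptions: the data set `work.sandwich`, the variable names `calories`, `total_fat`, and `sodium`, and the 0/1 indicator `outlier_flag` are hypothetical, the 5% trimming proportion is arbitrary, and the regression is merely a stand-in for whatever analysis is actually planned.

```sas
/* Hypothetical data set WORK.SANDWICH; OUTLIER_FLAG is an assumed 0/1
   indicator created during an earlier screening step. */

/* Winsorized and trimmed means, using 5% of observations in each tail. */
proc univariate data=work.sandwich winsorized=0.05 trimmed=0.05;
  var calories;
run;

/* Analysis with all observations. */
proc reg data=work.sandwich;
  model calories = total_fat sodium;
run;
quit;

/* Same analysis with the flagged observations excluded. */
proc reg data=work.sandwich;
  where outlier_flag = 0;
  model calories = total_fat sodium;
run;
quit;
```

Comparing the two sets of regression estimates shows whether the flagged points change the conclusions; if they make little difference, the slides' advice is to leave them in.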


## This note was uploaded on 12/29/2011 for the course EXST 7037 taught by Professor Geaghan,j during the Fall '08 term at LSU.
