Handout05 - Lecture 5 1. Data checking: correct formats,...

Lecture 5

1. Data checking: correct formats, unusual values
2. Scatterplot matrix
3. Editing data in the data step
4. Simple scatter plot: Plot, Insight, SGplot, Gplot

Data Checking

1. Check that SAS read each variable as the correct type:
   • Numeric data: NUM
   • Character data: CHAR
   • Date-time data: converted to numeric
2. Check that SAS read the correct number of observations.

Proc Contents answers both these questions.

   Proc Contents data = PH6470.child_iq;
The CONTENTS Procedure

   Data Set Name    PH6470.CHILD_IQ                  Observations         434
   Member Type      DATA                             Variables            6
   Engine           V9                               Indexes              0
   Created          Thu, Sep 10, 2009 11:55:16 AM    Observation Length   48
   Filename         C:\Documents and Settings\Administrator\Desktop\SAS Class\child_iq.sas7bdat
   Release Created  9.0201M0
   Host Created     XP_PRO
   [other stuff]

   Alphabetic List of Variables and Attributes

   #   Variable      Type   Len   Label
   1   ID            Num    8     ID
   2   child_IQ      Num    8     child IQ
   6   male          Num    8     male
   3   mom_HS_grad   Num    8     mom HS grad
   5   mom_IQ        Num    8     mom IQ
   4   mom_age       Num    8     mom age

   Data N;
     input ID gender $ birthdate MMDDYY10. ;
     format birthdate MMDDYY10.;
   cards;
   4833 F 5/16/1978
   4834 F 7/4/1980
   4855 M 12/14/1988
   ;
   Proc Contents data=N;

The CONTENTS Procedure

   Data Set Name    WORK.N    Observations   3
   Member Type      DATA      Variables      3
   .....

   Alphabetic List of Variables and Attributes

   #   Variable    Type   Len   Format
   1   ID          Num    8
   3   birthdate   Num    8     MMDDYY10.
   2   gender      Char   8
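The listing above shows birthdate as Num with a MMDDYY10. format, which is what "converted to numeric" means in practice: SAS stores a date as a count of days since January 1, 1960 and uses the format only for display. A minimal sketch of one way to see the stored value, assuming the data set N from the example above has already been created (N_check and birth_raw are hypothetical names introduced only for this illustration):

```sas
/* Copy the date into a new variable that has no format attached. */
data N_check;              /* hypothetical data set name */
  set N;
  birth_raw = birthdate;   /* same stored value, displayed as a plain number */
run;

/* birthdate prints as mm/dd/yyyy; birth_raw prints as days since 01JAN1960. */
proc print data=N_check;
run;
```

If a date column shows up as Char in Proc Contents instead, it was read as text and date arithmetic on it will not work until it is re-read with a date informat.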
3. What is the pattern of missing data?

   Proc Means nmiss n data=pubh.OGTT_hw2;

   NMISS = count of missing values, N = count of non-missing values

   The MEANS Procedure

                  N
   Variable    Miss       N
   ------------------------
   min000         0    1019
   min030         3    1016
   min060         2    1017
   min090         4    1015
   min120         0    1019
   id             0    1019
   ------------------------

4. Find unusual observations: are there outliers or incorrect values? Use Insight for quick scatterplots, Proc Univariate to identify extreme observations, and Proc Freq for a list of distinct values.

5. Should some variables be transformed? For positive variables, when (maximum value) / (minimum value) > 10, take logs.
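Items 4 and 5 can be sketched in code as follows. This is an illustration only, reusing the data sets named in this handout and assuming the librefs pubh and ph6470 are already assigned. The choice of min120 as the variable to inspect is an assumption, as is the output data set name ogtt_log; whether min120 actually meets the ratio rule is not stated in the handout.

```sas
/* Item 4: extreme observations.                                        */
/* Proc Univariate prints an "Extreme Observations" table by default;   */
/* the ID statement labels those rows with the id variable.             */
proc univariate data=pubh.OGTT_hw2;
  var min120;
  id id;
run;

/* Item 4: distinct values. Proc Freq lists each value with its count,  */
/* which is most readable for variables with only a few levels.         */
proc freq data=ph6470.child_iq;
  tables male mom_HS_grad;
run;

/* Item 5: look at minimum and maximum to judge the max/min ratio.      */
proc means data=pubh.OGTT_hw2 min max;
  var min120;
run;

/* If the ratio exceeds 10, add a log-transformed copy of the variable. */
data ogtt_log;                                   /* hypothetical name */
  set pubh.OGTT_hw2;
  if min120 > 0 then log_min120 = log(min120);   /* natural log; stays missing otherwise */
run;
```

The guard on min120 > 0 keeps the rule restricted to positive values, as item 5 requires; non-positive or missing values leave log_min120 missing rather than producing an invalid-argument note in the log.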
Scatterplot matrix can help with both questions. Three ways to do this:

   Proc SGscatter data=ph6470.child_iq;      /* SG = statistical graphic */
     matrix child_iq mom_iq mom_age / group = male;

   Proc Insight data=ph6470.child_iq;        /* interactive, but no grouping variable */