THE UNIVERSITY OF HONG KONG
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

STAT1303 DATA MANAGEMENT

May 14, 2008    Time: 2:30 p.m. - 4:30 p.m.

Candidates taking examinations that permit the use of calculators may use any calculator which
fulfils the following criteria: (a) it should be self-contained, silent, battery-operated and pocket-sized and (b) it should have numeral-display facilities only and should be used only for the
purposes of calculation.

It is the candidate’s responsibility to ensure that the calculator operates satisfactorily and the
candidate must record the name and type of the calculator on the front page of the examination
scripts. Lists of permitted/prohibited calculators will not be made available to candidates for
reference, and the onus will be on the candidate to ensure that the calculator used will not be in
violation of the criteria listed above.

Answer ALL FIVE questions. Marks are shown in square brackets. An abridged version of
SAS syntax is provided in ANNEX 3.

1. A survey was conducted to study the prevalence of depression among people in the US adult population. The questionnaire is given in ANNEX 1. The data of the survey will be saved into a
SAS data set DEPRESSION.

(a) Design the codebook for the questionnaire. Note that the variable names specified in the
codebook will be used in the SAS data set.

(b) State whether the variables defined in the codebook are interval, ratio, ordinal or nominal.
Suggest two descriptive statistics for each of the variables.

(c) Write SAS programs to produce the appropriate graphs for the following.
    (i) Produce a line plot to investigate how the gender is related to whether the respondent
        ever feels discouraged about how things were going in his/her life during the sad
        episodes.
    (ii) Produce a bar chart to investigate if the mean number of depression problems as
        listed in question D4 is 0.5 for male and 1 for female, respectively.
    (iii) Produce a scatter plot to investigate how the age is related to the longest period of
        days the respondent ever had when he/she lost interest in most things he/she enjoys.

[Total: 22 marks]

S&AS: STAT1303 Data Management

2. A study was conducted for the elderly health in Hong Kong. The data set is HEALTH and the
variables are

Variable  Description           Code
Q1        Gender                0=Male; 1=Female
OM1       Stamina               0 to 10
OM2       Cognition             0 to 10
OM3       Behavioural symptoms  0 to 10
OM4       Shortness of breath   0=No; 1=Yes
OM5       Alcohol abuse         0=No abuse; 1=Drink daily; 2=Alcohol abuse

Referring to the SAS output in ANNEX 2, answer the following questions. For the statistical
inference, you should
• specify the null hypothesis,
• state the p-value of the statistical test,
• state the acceptance or rejection of the null hypothesis, and
• state the conclusion.
Write SAS program(s) for each question to generate the SAS output in the ANNEX to which
you are referring.

(a) Considering the Stamina and Cognition measures for male and female, find the number of
non-missing cases for the two variables for each sex and comment on the difference
between the central locations of the two variables between the sexes.

(b) Test whether the mean of the Behavioral Symptom measure is 1.9 for those who do not
have Shortness of Breath and is 2.2 for those who have Shortness of Breath at the 5% level
of significance.

(c) Comment on whether the distributions of the Cognition measure are normal in Male and
Female groups, respectively.

(d) Test at the 5% level of significance, the relation between gender and Alcohol abuse.
Comment on the relation if it is significant.

[Total: 22 marks]

3. A study was conducted and the data are stored in two text files ‘DEMO.DAT’ and
‘VISIT.DAT’ in the folder ‘C:\’. The data DEMO.DAT consists of the demographic variables
of the subjects in which one subject has one observation in the data file. The data VISIT.DAT
consists of the hospital visits of the subjects in which each subject may have more than one
visit. The data format of the text files is given as follows.

• DEMO.DAT

Variable  Type       Description               Code
ID        Numeric    Unique subject ID number
AGE       Numeric    Age of the subject
SEX       Character  Sex of the subject        ‘F’ for female
                                               ‘M’ for male
EDU       Character  Education level           ‘N’ for below primary
                                               ‘P’ for primary
                                               ‘S’ for secondary
                                               ‘T’ for tertiary or above
DIST      Character  District                  Maximum 50 characters

Remarks:
• The values of a subject are stored in one single line.
• The values are separated by space(s).
• Sample data
    1 20 F N HK East
    2 25 M P Kowloon West

• VISIT.DAT

Variable  Type     Remark
ID        Numeric  Subject ID number
DATE      Date     Date of visit.
                   The INFORMAT is DATE9.
TYPE      Numeric  Type of visit
                   1 = Normal visit
                   2 = Special visit
M1 to M7  Numeric  Seven medical measures

Remarks:
• The values of a single subject are stored in three lines.
• In the first line of a subject, the values are ID, DATE and TYPE and are separated by
  space.
• In the second line, depending on the value of TYPE, the values are either
    - M1, M2 and M3, if the value of TYPE is 1, or
    - M1, M3 and M4, if the value of TYPE is 2.
• In the third line, the values are M5, M6 and M7.
• Each of the values of M1 to M7 is stored in 2 columns without any delimiter.

For example, when the data in VISIT.DAT are the following

1 12jan2007 1
111213
151617
1 25jan2007 2
212223
252627

the values in the data set should be the following.

id  date       type  m1  m2  m3  m4  m5  m6  m7
1   12JAN2007  1     11  12  13  .   15  16  17
1   25JAN2007  2     21  .   22  23  25  26  27

Write a SAS program for each of the following questions. Create all SAS data sets in the folder ‘C:\’.

(a) Create SAS data sets DEMO for DEMO.DAT and VISIT for VISIT.DAT.

(b) The severity is defined by the sum of M1, M2 and M3 of a subject. If the sum is equal to
or below 10, it has severity level 1. If the sum is equal to or greater than 10 and below 20,
it has severity level 2. If the sum is equal to or greater than 20, it has severity level 3. Add
a variable SEVERITY for the severity and the SEVLVL for the severity level to the data
set VISIT.

(c) The baseline severity level is the severity level of a subject in his/her first visit. Create
three subsets, MALE1, FEMAL2 and MILD, of DEMO. MALE1 contains all male subjects whose baseline severity level is 1 or above, FEMAL2 contains all female subjects
whose baseline severity level is 2 or above and MILD contains all other subjects. The data sets should contain all the variables in DEMO and all the measures in the first visit in
VISIT.

(d) An overall measure is defined as the average of M5, M6 and M7 for each visit of each subject.
    (i) Add the variable OM for the overall measure to the data set VISIT.
    (ii) Define the improvement of each visit, except the first visit, of each subject as the
         ratio
         \[ \frac{(\text{OM in the current visit}) - (\text{OM in the previous visit})}{(\text{OM in the previous visit})} \]
         Add the variable OMINP for the improvement to the data set VISIT.
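
As a worked illustration of the ratio, take the two sample VISIT.DAT records for subject 1 given earlier in this question, where (M5, M6, M7) is (15, 16, 17) on 12JAN2007 and (25, 26, 27) on 25JAN2007. Assuming the overall measure is the plain arithmetic mean of the three values, and writing OM_1 and OM_2 (labels introduced here only for illustration) for the first and second visit:

```latex
\[
\mathrm{OM}_1 = \frac{15 + 16 + 17}{3} = 16, \qquad
\mathrm{OM}_2 = \frac{25 + 26 + 27}{3} = 26, \qquad
\mathrm{OMINP}_2 = \frac{\mathrm{OM}_2 - \mathrm{OM}_1}{\mathrm{OM}_1} = \frac{26 - 16}{16} = 0.625 .
\]
```

The first visit has no previous visit, so no improvement is defined for it.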
    (iii) Define the average improvement for each subject as the average value of all OMINP
         among the subject’s visits. Create a new data set IMPROVE containing the subject ID and the variable IMP which is the average improvement. There should only be
         one observation for each subject.

[Total: 19 marks]

4. The variables of a questionnaire are defined as follows.

Question  Variable description                                  Valid values     Code for missing
1         ID number                                             Numerals only
2         Gender                                                M = male
                                                                F = female
3         Age in year                                           18 to 100        999
                                                                999 = No answer
4         Primary language                                      0 = English      9
                                                                1 = Chinese
                                                                2 = Others
                                                                9 = No answer
5         Other language                                        20 characters
6         The case was born in HK                               0 = No           9
                                                                1 = Yes
                                                                9 = No answer
7         Did your heart pound or race?                         For Q7 to Q10    For Q7 to Q10
8         Were you short of breath?                             0 = No           9
9         Did you have nausea or discomfort in your stomach?    1 = Yes
10        Did you feel dizzy or faint?                          9 = No answer
11        About how many of these sudden attacks of fear or     0 to 900         999
          panic have you had in your entire lifetime?           999 = No answer
12        What is your exact age when the attack occurred?      0 to 100         999
                                                                999 = No answer
13        Did you have one of these attacks at any time in      0 = No           9
          the past 12 months?                                   1 = Yes
                                                                9 = No answer
14        How many weeks in the past 12 months did you have at  0-52             999
          least one attack?                                     999 = No answer

Remarks
• When Q4 is 2, enter the language in Q5.
• When all Q7 to Q10 are 0, end the questionnaire.
• When Q11 is 0, end the questionnaire. When Q11 is 999, skip Q12.
• When Q13 is 0 or 9, end the questionnaire.

The raw data file PANIC.DAT consists of a number of cases in list format separated by comma.
Sample observations are given for reference.

502,M,48,1, ,0,0,0,l,0,999,4,0,.
.,M,31,0, ,1,0,0,0,0,56,4,0,.
698,F,44,2,France,1,0,1,0,l,36,75,0,.
542,f,120,., 1,1,0,1,0,13,9,0,.
217,F,45,1, 777,M,27,2, .,0,0,13,9,0,.
8,M,l6,l, O,1,43,71,1,5
646,M,42,l, l,1,l,27,24,0,.
360,F,45,9, O,.,O,27,24,0,.
823,F,999,l, O 0,0,27,24,0,.
514, 1,0,0,1,20,10,1,999
79,M,29,1, a,F,49,l, 79,1,.,l, 273,F,99,0

(a) Use the data step to identify the observations, from the raw data file, with missing and/or
invalid ID.

(b) Identify and use PROC PRINT to print out the observations with duplicate ID.

(c) Use PROC FORMAT to define formats for the four value types: valid values (excluding
missing codes), missing codes, missing values and invalid values.

(d) Using the data step and the formats defined in (c), identify and print the observations, from
the raw data file, with missing values or invalid data type. Observations with codes for
missing should not be identified. There is no need to consider the contingence of the
questions stated in the remarks.

(e) Using the data step and the formats defined in (c), identify and print the observations, from
the raw data file, with invalid value range or code.

(f) Create a SAS data set from the raw data file and hence, identify the outlying observations
defined as those differing from the mean by at least three standard deviations. The
observations with invalid value should not be identified. The mean and the standard
deviation should be calculated based on the observations with valid value only. The
standardized value of the variables of concern should also be printed for the identified
outliers. The standardized value of a variable X is defined as
\[ \frac{X - \text{mean}}{\text{SD}} \]

(g) Based on the SAS data set created in (f), identify the observations that violate the remarks.
There is no need to identify observations with missing values as identified in (d) even if
they violate the remarks.

[Total: 24 marks]

5. Consider a data set REG consisting of two variables, X and Y.

data REG;
    infile 'c:\reg.dat';
    input x y;
run;

A model is applied to study the relation between X and Y given as
\[ y_i = \alpha + \beta x_i + \varepsilon_i \]
for a sample of n observations, \((x_i, y_i)\), i = 1, ..., n. Note that the knowledge of the model is
NOT required for this question. The estimates for \(\alpha\) and \(\beta\) are given as
\[ \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \quad\text{and}\quad \hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \]
where
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i . \]

Write SAS programs using only Data Step and PROC SORT without other SAS PROC’s for the
following questions. You may assume that there is no missing value in REG.

(a) Create a data set BETA containing the values of \(\hat{\alpha}\) and \(\hat{\beta}\) for the data set REG.

(b) With the values of \(\hat{\alpha}\) and \(\hat{\beta}\), a fitted value for each observation \((x_i, y_i)\) is defined as
\[ \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \quad i = 1, \ldots, n. \]
Create a data set FITTED containing the original values \(x_i\) and \(y_i\) and the fitted value \(\hat{y}_i\) for
i = 1, ..., n.

(c) The error sum of squares (SSE), total sum of squares (SST) and the coefficient of
determination (R2) are, respectively, defined as
\[ \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad\text{and}\quad R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} . \]
Create a data set SS containing the values of SSE, SST and R2.

[Total: 13 marks]

**************

ANNEX 1
Question 1 - questionnaire

Personal section

SC1. How old are you?
__ YEARS OLD
DON’T KNOW
REFUSED

SC1.1. INTERVIEWER QUERY
Respondent IS A MALE
Respondent IS A FEMALE

SC2. Are you currently married, separated, divorced, widowed, or never married?
MARRIED          GO TO SC3
SEPARATED
DIVORCED
WIDOWED
NEVER MARRIED
DON’T KNOW
REFUSED

SC2A. Are you currently living with someone in a marriage-like relationship?
YES
NO
DON’T KNOW
REFUSED

SC3. How would you rate your overall physical health - excellent, very good, good, fair, or poor?
EXCELLENT
VERY GOOD
GOOD
FAIR
POOR
DON’T KNOW
REFUSED

DM1. What is the highest grade of school or year of college you completed?
Below primary school
Primary school
Secondary school (F1-F3)
High school (F4 - F7/TI)
Tertiary education (non-degree)
University graduate
DON’T KNOW
REFUSED

Depression section

D1. Earlier in the interview, you mentioned having periods that lasted several days or longer when you felt sad,
empty, or depressed most of the day. During episodes of this sort, did you ever feel discouraged about how
things were going in your life?
YES
NO
DON’T KNOW
REFUSED

D1a. During the episodes of being sad, empty, or depressed, did you ever lose interest in most things like
work, hobbies, and other things you usually enjoy?
YES
NO
DON’T KNOW
REFUSED

D2. Earlier in the interview you mentioned having periods that lasted several days or longer when you felt
discouraged about how things were going in your life. During episodes of this sort, did you ever lose interest in
most things like work, hobbies, and other things you usually enjoy?
YES
NO
DON’T KNOW
REFUSED

D3. Did you ever have a period of being (sad/or/discouraged/or/uninterested in things) that lasted most of the day,
nearly every day, for two weeks or longer?
YES          GO TO D4
NO
DON’T KNOW
REFUSED

D3a. How long was the longest period of days you ever had when you were
(sad/or/discouraged/or/uninterested) most of the day?
__ DAYS
DON’T KNOW
REFUSED

D4. In answering the next questions, think about the period of (several days/two weeks) or longer during that
episode when your (sadness/and/discouragement/and/loss of interest) and other problems were most severe and
frequent. During that period, which of the following problems did you have most of the day nearly every day:
(check all applied)

D4a. Did you feel sad, empty, or depressed most of the day nearly
every day during that period of (several days/ two weeks)?

D4b. During that period of (several days/ two weeks), did you feel discouraged about how things were going in your life most of the day nearly every day?

D4c. During that period of (several days/ two weeks), did you lose interest in almost all things like work and hobbies and things you
like to do for fun?

D4d. Did you feel like nothing was fun even when good things were
happening?

ANNEX 2
Question 2 - SAS output

The MEANS Procedure

N
Gender Obs Variable Label N Mean Std Dev Skewness
Male 514 om1 Stamina 505 4.2428246 1.0707152 0.1355560
om2 Cognition 508 2.7846501 0.8114557 0.0240414
Female 359 om1 Stamina 351 3.9828213 1.0125062 0.1748180
om2 Cognition 354 3.1703210 0.1655430 0.7112749
N Lower 10% Upper 10%
Gender Obs Variable Label Kurtosis CL for Mean CL for Mean
Male 514 om1 Stamina 0.2387424 4.2368343 4.2488149
om2 Cognition 0.1224122 2.7801237 2.7891765
Female 359 om1 Stamina 0.3335443 3.9760252 3.9896174
om2 Cognition -0.2477158 3.1692146 3.1714275
The UNIVARIATE Procedure
Variable: om3 (Behavorial symptoms)
Moments
N 615 Sum Weights 615
Mean 1.936938 Sum Observations 1191.21687
Std Deviation 0.61423956 Variance 0.37729024
Skewness 0.04826048 Kurtosis 0.1546144
Uncorrected SS 2538.96944 Corrected SS 231.656208
Coeff Variation 31.7118856 Std Error Mean 0.02476853
Basic Statistical Measures
Location Variability
Mean 1.936938 Std Deviation 0.61424
Median 1.939686 Variance 0.37729
Mode . Range 4.00425
Interquartile Range 0.82002
Tests for Location: Mu0=1.9
Test           -Statistic-    -----p Value------
Student's t    t  1.491328    Pr > |t|    0.1364
Sign           M      14.5    Pr >= |M|   0.2539
Signed Rank    S      6252    Pr >= |S|   0.1563
The UNIVARIATE Procedure
Variable: om3 (Behavorial symptoms)
Moments
N 253 Sum Weights 253
Mean 2.28805591 Sum Observations 578.878145
Std Deviation 0.58480506 Variance 0.34199696
Skewness 0.1606922 Kurtosis 0.0857781
Uncorrected SS 1410.68879 Corrected SS 86.1832345
Coeff Variation 25.5590373 Std Error Mean 0.03676638
Basic Statistical Measures
Location Variability
Mean 2.288056 Std Deviation 0.58481
Median 2.263917 Variance 0.34200
Mode . Range 3.56527
Interquartile Range 0.80848
Tests for Location: Mu0=2.2
Test           -Statistic-    -----p Value------
Student's t    t  2.395012    Pr > |t|    0.0174
Sign           M       8.5    Pr >= |M|   0.3145
Signed Rank    S    2424.5    Pr >= |S|   0.0372
The UNIVARIATE Procedure
Variable: om2 (Cognition)
q1 = Male
Moments
N 508 Sum Weights 508
Mean 2.78465009 Sum Observations 1414.60225
Std Deviation 0.81145569 Variance 0.65846033
Skewness 0.0240414 Kurtosis -0.1224122
Uncorrected SS 4273.01166 Corrected SS 333.839388
Coeff Variation 29.1403106 Std Error Mean 0.03600252
Basic Statistical Measures
Location Variability
Mean 2.784650 Std Deviation 0.81146
Median 2.800379 Variance 0.65846
Mode Range 4.51011
Interquartile Range 1.05776
Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  77.34597    Pr > |t|    <.0001
Sign           M       254    Pr >= |M|   <.0001
Signed Rank    S     64643    Pr >= |S|   <.0001
Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W     0.998244    Pr < W      0.8910
Kolmogorov-Smirnov    D     0.022351    Pr > D     >0.1500
Cramer-von Mises      W-Sq  0.023755    Pr > W-Sq  >0.2500
Anderson-Darling      A-Sq  0.158718    Pr > A-Sq  >0.2500
The UNIVARIATE Procedure
Variable: om2 (Cognition)
q1 = Female
Moments
N 354 Sum Weights 354
Mean 3.17032105 Sum Observations 1122.29365
Std Deviation 0.16554302 Variance 0.02740449
Skewness 0.71127487 Kurtosis 0.2477158
Uncorrected SS 3567.70497 Corrected SS 9.67378574
Coeff Variation 5.22164852 Std Error Mean 0.00879851
Basic Statistical Measures
Location Variability
Mean 3.170321 Std Deviation 0.16554
Median 3.139803 Variance 0.02740
Mode Range 0.75594
Interquartile Range 0.20954
Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  360.3247    Pr > |t|    <.0001
Sign           M       177    Pr >= |M|   <.0001
Signed Rank    S   31417.5    Pr >= |S|   <.0001
Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W     0.945325    Pr < W     <0.0001
Kolmogorov-Smirnov    D     0.084757    Pr > D     <0.0100
Cramer-von Mises      W-Sq  1.006363    Pr > W-Sq  <0.0050
Anderson-Darling      A-Sq  6.247588    Pr > A-Sq  <0.0050

The FREQ Procedure
Table of q1 by om5

q1(Gender)     om5(Alcohol abuse)

Frequency
Percent
Row Pct
Col Pct      No abuse   Drink daily   Alcohol abuse     Total
Male                                                      514
                                                        58.88
Female                                                    359
                                                        41.12
Total             739            78              56       873
                84.65          8.93            6.41    100.00
Statistics for Table of q1 by om5
Statistic                      DF     Value     Prob
Chi-Square                      2   20.2688   <.0001
Likelihood Ratio Chi-Square     2   23.4683   <.0001
Mantel-Haenszel Chi-Square      1   15.3411   <.0001
Phi Coefficient                      0.1524
Contingency Coefficient              0.1506
Cramer's V                           0.1524
Sample Size = 873

ANNEX 3 : An abridged version of SAS Syntax

A. DATA STEP

LIBNAME libref 'SAS-data-library' ;
DATA data-set-1 <(data-set-options)> . . . ;
    INPUT variable(s) <format> . . . ;
    LENGTH variable-1 <$>length . . . ;
    INFORMAT variable-1 <informat> . . . ;
    LABEL variable-1='label-1' . . . ;
    FORMAT variable-1 <format> . . . ;
    CARDS | DATALINES ;
    data
RUN;

LIBNAME libref 'SAS-data-library' ;
DATA data-set-1 <(data-set-options)> . . . ;
    INFILE filename ;
    INPUT variable(s) <format> ;
    LENGTH variable-1 <$>length . . . ;
    INFORMAT variable-1 <informat> . . . ;
    LABEL variable-1='label-1' . . . ;
    FORMAT variable-1 <format> . . . ;
RUN;

LIBNAME libref 'SAS-data-library' ;
DATA data-set-1 <(data-set-options)> . . . ;
    MERGE | SET data-set-1 <(data-set-options)>
                <data-set-2 <(data-set-options)>> . . . ;
    UPDATE data-set-1 <(data-set-options)>
           data-set-2 <(data-set-options)> ;
    BY <DESCENDING> variable-1 . . . ;
    DROP variable(s) ;
    KEEP variable(s) ;
    variable=expression ;
    ARRAY array-name (subscript) <array-elements> ;
    DELETE ;
    FILE fileref . . . ;
    OUTPUT data-set-1 . . . ;
    PUT 'character-string' variable-1 . . . ;
    RENAME old-name-1=new-name-1 . . . ;
    RETAIN variable(s) ;
    STOP ;
    WHERE where-expression ;
    IF expression ;
    IF expression THEN statement ; <ELSE statement ; >
    DO ;
        . . . more statements . . .
    END ;
    DO index-variable=start TO stop ;
        . . . more SAS statements . . .
    END ;
RUN;

1. Data set options in DATA step and other SAS PROCs:
    DROP=, FIRSTOBS=, IN=, KEEP=, OBS=, RENAME=, WHERE=

B. The following statements are common to all SAS PROCs

1. FORMAT statement:
    FORMAT variable-1 <format> . . . ;
2. LABEL statement:
    LABEL variable-1='label-1' . . . ;
3. WHERE statement:
    WHERE where-expression ;

C. APPEND

PROC APPEND BASE=data-set-name <DATA=data-set-name> <FORCE> ;
RUN;

D. CONTENTS

PROC CONTENTS <DATA=data-set-name> <VARNUM> ;
RUN;

E. CORR

PROC CORR DATA = data-set-name <options1> ;
    VAR variable(s) ;
RUN;

1. options in PROC CORR:
    COV, NOSIMPLE, NOPROB, PEARSON, OUT= data-set-name

F. EXPORT

PROC EXPORT DATA=data-set-name
    OUTFILE="filename" | OUTTABLE="tablename"
    <DBMS=identifier> <REPLACE> ;
    <data-source-statements ;>
RUN;

G. FORMAT

PROC FORMAT <options1> ;
    INVALUE <$>name value-or-range-1=informat-value-1
        <value-or-range-n=informat-value-n> ;
    VALUE <$>name value-or-range-1=format-value-1
        <value-or-range-n=format-value-n> ;
RUN;

1. options in PROC FORMAT:
    CNTLIN=, CNTLOUT=, LIBRARY=

H. FREQ

PROC FREQ <DATA=data-set-name <data set options>> <options1> ;
    TABLES variable1 variable2 variable2*variable1 </options2> ;
    WEIGHT variable ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
RUN;

1. options in PROC FREQ:
    FORMCHAR(1,2,7)=formchar-string, PAGE, NOPRINT
2. options in TABLE statement:
    NOCOL, NOROW, NOPERCENT, NOFREQ, NOCUM, NOPRINT,
    TESTP=(p1 p2 . . .), EXPECTED, CHISQ, FISHER | EXACT, MEASURES,
    MISSING, MISSPRINT, OUT= data-set-name <data set options>

I. GCHART

PROC GCHART DATA = data-set-name ;
    HBAR | HBAR3D | VBAR | VBAR3D chart-variable(s) </ option(s)1> ;
    PIE | PIE3D | DONUT chart-variable(s) </ option(s)2> ;
    BY grouping-variable(s) ;
RUN;

1. options in HBAR | HBAR3D | VBAR | VBAR3D statement:
    LEGEND, GROUP=, SUBGROUP=, MIDPOINTS=,
    SUMVAR=, TYPE=, NOSTATS
2. options in PIE | PIE3D | DONUT statement:
    LEGEND, SLICE=, VALUE=, PERCENT=, GROUP=, SUBGROUP=,
    ACROSS=, DOWN=, MIDPOINTS=, SUMVAR=, TYPE=

J. GPLOT

PROC GPLOT DATA = data-set-name ;
    PLOT vertical*horizontal </ options> ;
    PLOT vertical*horizontal = symbol-variable </ options> ;
    PLOT vertical*horizontal = class-variable </ options> ;
    BY grouping-variable(s) ;
RUN;

1. options in PLOT statement:
    CAXIS | CA = axis-color, CTEXT | C = text-color, GRID, HREF=value-list,
    VREF=value-list, OVERLAY, LEGEND

K. IMPORT

PROC IMPORT DATAFILE="filename" | TABLE="tablename"
    OUT=data-set-name <DBMS=identifier> <REPLACE> ;
RUN;

L. MEANS

PROC MEANS <DATA=data-set-name <data set options>>
    <options1> statistic-keyword2 ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
    CLASS grouping-variable(s) ;
    VAR variable(s) ;
    FREQ variable ;
    ID variable(s) ;
    OUTPUT OUT= data-set-name <data set options> statistic-keyword3 <(variable(s))> = <name(s)> ;
RUN;

1. options in PROC MEANS:
    ALPHA=, MISSING, NONOBS, NOPRINT, NWAY
2. statistic-keyword in PROC MEANS:
    CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE
    SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
    MEDIAN P1 P5 P10 Q1 Q3 P90 P95 P99 QRANGE PROBT T
3. statistic-keyword in OUTPUT statement:
    CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE
    SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
    MEDIAN P1 P5 P10 Q1 Q3 P90 P95 P99 QRANGE PROBT T

M. PRINT

PROC PRINT <DATA=data-set-name <data set options>> <options1> ;
    VAR variable(s) ;
    BY <DESCENDING> variable(s) ;
    ID variable(s) ;
    SUM variable(s) ;
RUN;

1. options in PROC PRINT:
    NOOBS, LABEL

N. REPORT

PROC REPORT <DATA=data-set-name <data set options>>
    <options1> ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
    FREQ variable ;
    COLUMN column-specification(s) ;
    DEFINE variable / <usage options2> ;
    BREAK location3 variable </ option(s)4> ;
    RBREAK location3 </ option(s)4> ;
RUN;

1. options in PROC REPORT:
    MISSING, FORMCHAR(1,2,7)=formchar-string, NOWINDOWS
2. options in DEFINE statement:
    ACROSS, ANALYSIS, DISPLAY, GROUP, ORDER
3. location in BREAK and RBREAK statements can be either BEFORE or
    AFTER.
4. options in BREAK and RBREAK statements:
    OL, PAGE, SKIP, SUMMARIZE, UL

O. SORT

PROC SORT <DATA=data-set-name <data set options>>
    <OUT=data-set-name <data set options>> ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
RUN;

P. SQL

PROC SQL ;
    CREATE TABLE table-name AS query-expression
        <ORDER BY order-by-item <,order-by-item>...> ;
    SELECT <DISTINCT> object-item <,object-item>...
        <INTO :macro-variable-specification <, :macro-variable-specification>...>
        FROM from-list
        <WHERE sql-expression>
        <GROUP BY group-by-item <,group-by-item>...>
        <HAVING sql-expression>
        <ORDER BY order-by-item <,order-by-item>...> ;
QUIT;

Q. TABULATE

PROC TABULATE <DATA= data-set-name <data set options>>
    <options1> ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
    CLASS grouping-variable(s) ;
    VAR analysis-variable(s) ;
    FREQ variable ;
    TABLE <<page-expression,> row-expression,> column-expression
        </ table-option(s)2> ;
    KEYLABEL keyword-1='label-1' <keyword-n='label-n'> ;
RUN;

1. options in PROC TABULATE:
    MISSING, FORMCHAR(1,2,7)=formchar-string
2. options in TABLE statement:
    BOX=, MISSTEXT=, RTSPACE=

R. UNIVARIATE

PROC UNIVARIATE <DATA= data-set-name <data set options>>
    <options1> ;
    BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
    CLASS grouping-variable(s) ;
    VAR variable(s) ;
    FREQ variable ;
    ID variable(s) ;
    HISTOGRAM variable(s) / normal ;
    QQPLOT variable(s) / normal (mu=est sigma=est) ;
    OUTPUT OUT = data-set-name statistic-keyword2 <(variable(s))> = <name(s)> ;
RUN;

1. options in PROC UNIVARIATE:
    ALL, ALPHA=value, CIBASIC<TYPE=LOWER|UPPER|TWOSIDE>,
    MU0=value(s), NORMAL, ROBUSTSCALE, FREQ, NOPRINT, PLOTS,
    NEXTROBS=n, NEXTRVAL=
2. statistic-keyword in OUTPUT statement:
    CSS CV KURTOSIS MAX MEAN N MIN MODE RANGE NMISS NOBS
    STDMEAN SKEWNESS STD USS SUM SUMWGT VAR
    MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 QRANGE
    GINI MAD QN SN STD_GINI STD_MAD STD_QN STD_QRANGE STD_SN ...
This document was uploaded on 03/18/2012.