1303_0708sem2

1303_0708sem2 - THE UNIVERSITY OF HONG KONG DEPARTMENT OF...


THE UNIVERSITY OF HONG KONG
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

STAT1303 DATA MANAGEMENT

May 14, 2008                                   Time: 2:30 p.m. - 4:30 p.m.

Candidates taking examinations that permit the use of calculators may use any calculator which fulfils the following criteria: (a) it should be self-contained, silent, battery-operated and pocket-sized and (b) it should have numeral-display facilities only and should be used only for the purposes of calculation.

It is the candidate’s responsibility to ensure that the calculator operates satisfactorily and the candidate must record the name and type of the calculator on the front page of the examination scripts. Lists of permitted/prohibited calculators will not be made available to candidates for reference, and the onus will be on the candidate to ensure that the calculator used will not be in violation of the criteria listed above.

Answer ALL FIVE questions. Marks are shown in square brackets. An abridged version of SAS syntax is provided in ANNEX 3.

1. A survey was conducted to study the prevalence of depression among people in the US adult population. The questionnaire is given in ANNEX 1. The data of the survey will be saved into a SAS data set DEPRESSION.

   (a) Design the codebook for the questionnaire. Note that the variable names specified in the codebook will be used in the SAS data set.

   (b) State whether the variables defined in the codebook are interval, ratio, ordinal or nominal. Suggest two descriptive statistics for each of the variables.

   (c) Write SAS programs to produce the appropriate graphs for the following.

       (i) Produce a line plot to investigate how the gender is related to whether the respondent ever feels discouraged about how things were going in his/her life during the sad episodes.

       (ii) Produce a bar chart to investigate if the mean number of depression problems as listed in question D4 is 0.5 for male and 1 for female, respectively.
       (iii) Produce a scatter plot to investigate how the age is related to the longest period of days the respondent ever had when he/she lost interest in most things he/she enjoys.

                                                          [Total: 22 marks]

S&AS: STAT1303 Data Management

2. A study was conducted for the elderly health in Hong Kong. The data set is HEALTH and the variables are

   Variable   Description            Code
   Q1         Gender                 0=Male; 1=Female
   OM1        Stamina                0 to 10
   OM2        Cognition              0 to 10
   OM3        Behavioural symptoms   0 to 10
   OM4        Shortness of breath    0=No; 1=Yes
   OM5        Alcohol abuse          0=No abuse; 1=Drink daily; 2=Alcohol abuse

   Referring to the SAS output in ANNEX 2, answer the following questions. For the statistical inference, you should
   - specify the null hypothesis,
   - state the p-value of the statistical test,
   - state the acceptance or rejection of the null hypothesis, and
   - state the conclusion.

   Write SAS program(s) for each question to generate the SAS output in the ANNEX to which you are referring.

   (a) Considering the Stamina and Cognition measures for male and female, find the number of non-missing cases for the two variables for each sex and comment on the difference between the central locations of the two variables between the sexes.

   (b) Test whether the mean of the Behavioural Symptom measure is 1.9 for those who do not have Shortness of Breath and is 2.2 for those who have Shortness of Breath at the 5% level of significance.

   (c) Comment on whether the distributions of the Cognition measure are normal in Male and Female groups, respectively.

   (d) Test at the 5% level of significance, the relation between gender and Alcohol abuse. Comment on the relation if it is significant.

                                                          [Total: 22 marks]

3. A study was conducted and the data are stored in two text files ‘DEMO.DAT’ and ‘VISIT.DAT’ in the folder ‘C:\’. The data DEMO.DAT consists of the demographic variables of the subjects in which one subject has one observation in the data file.
The data VISIT.DAT consists of the hospital visits of the subjects in which each subject may have more than one visit. The data format of the text files is given as follows.

   - DEMO.DAT

     Variable   Type        Description                Code
     ID         Numeric     Unique subject ID number
     AGE        Numeric     Age of the subject
     SEX        Character   Sex of the subject         ‘F’ for female
                                                       ‘M’ for male
     EDU        Character   Education level            ‘N’ for below primary
                                                       ‘P’ for primary
                                                       ‘S’ for secondary
                                                       ‘T’ for tertiary or above
     DIST       Character   District                   Maximum 50 characters

     Remarks:
     - The values of a subject are stored in one single line.
     - The values are separated by space(s).
     - Sample data
         1 20 F N HK East
         2 25 M P Kowloon West

   - VISIT.DAT

     Variable   Type      Remark
     ID         Numeric   Subject ID number
     DATE       Date      Date of visit. The INFORMAT is DATE9.
     TYPE       Numeric   Type of visit
                          1 = Normal visit
                          2 = Special visit
     M1 to M7   Numeric   Seven medical measures

     Remarks:
     - The values of a single subject are stored in three lines.
     - In the first line of a subject, the values are ID, DATE and TYPE and are separated by space.
     - In the second line, depending on the value of TYPE, the values are either
         - M1, M2 and M3, if the value of TYPE is 1, or
         - M1, M3 and M4, if the value of TYPE is 2.
     - In the third line, the values are M5, M6 and M7.
     - Each of the values of M1 to M7 is stored in 2 columns without any delimiter.

     For example, when the data in VISIT.DAT are the following

         1 12jan2007 1
         111213
         151617
         1 25jan2007 2
         212223
         252627

     the values in the data set should be the following.

         id   date        type   m1   m2   m3   m4   m5   m6   m7
         1    12JAN2007   1      11   12   13   .    15   16   17
         1    25JAN2007   2      21   .    22   23   25   26   27

   Write a SAS program for each of the following questions. Create all SAS data sets in the folder ‘C:\’.

   (a) Create SAS data sets DEMO for DEMO.DAT and VISIT for VISIT.DAT.

   (b) The severity is defined by the sum of M1, M2 and M3 of a subject. If the sum is equal to or below 10, it has severity level 1.
       If the sum is equal to or greater than 10 and below 20, it has severity level 2. If the sum is equal to or greater than 20, it has severity level 3. Add a variable SEVERITY for the severity and the SEVLVL for the severity level to the data set VISIT.

   (c) The baseline severity level is the severity level of a subject in his/her first visit. Create three subsets, MALE1, FEMAL2 and MILD, of DEMO. MALE1 contains all male subjects whose baseline severity level is 1 or above, FEMAL2 contains all female subjects whose baseline severity level is 2 or above and MILD contains all other subjects. The data sets should contain all the variables in DEMO and all the measures in the first visit in VISIT.

   (d) An overall measure is defined as the average of M5, M6 and M7 for each visit of each subject.

       (i) Add the variable OM for the overall measure to the data set VISIT.

       (ii) Define the improvement of each visit, except the first visit, of each subject as the ratio

            $$\frac{(\text{OM in the current visit}) - (\text{OM in the previous visit})}{(\text{OM in the previous visit})}$$

            Add the variable OMINP for the improvement to the data set VISIT.

       (iii) Define the average improvement for each subject as the average value of all OMINP among the subject’s visits. Create a new data set IMPROVE containing the subject ID and the variable IMP which is the average improvement. There should only be one observation for each subject.

                                                          [Total: 19 marks]

4. The variables of a questionnaire are defined as follows.

   Question   Variable description                       Valid values        Code for missing
   1          ID number                                  Numerals only
   2          Gender                                     M = male
                                                         F = female
   3          Age in year                                18 to 100           999
                                                         999 = No answer
   4          Primary language                           0 = English         9
                                                         1 = Chinese
                                                         2 = Others
                                                         9 = No answer
   5          Other language                             20 characters
   6          The case was born in HK                    0 = No              9
                                                         1 = Yes
                                                         9 = No answer
   7          Did your heart pound or race?              For Q7 to Q10:      For Q7 to Q10:
   8          Were you short of breath?                  0 = No              9
   9          Did you have nausea or discomfort          1 = Yes
              in your stomach?                           9 = No answer
   10         Did you feel dizzy or faint?
   11         About how many of these sudden attacks     0 to 900            999
              of fear or panic have you had in your      999 = No answer
              entire lifetime?
   12         What is your exact age when the attack     0 to 100            999
              occurred?                                  999 = No answer
   13         Did you have one of these attacks at       0 = No              9
              any time in the past 12 months?            1 = Yes
                                                         9 = No answer
   14         How many weeks in the past 12 months       0-52                999
              did you have at least one attack?          999 = No answer

   Remarks
   - When Q4 is 2, enter the language in Q5.
   - When all Q7 to Q10 are 0, end the questionnaire.
   - When Q11 is 0, end the questionnaire.
   - When Q11 is 999, skip Q12.
   - When Q13 is 0 or 9, end the questionnaire.

   The raw data file PANIC.DAT consists of a number of cases in list format separated by comma. Sample observations are given for reference.

       502,M,48,1, ,0,0,0,1,0,999,4,0,.
       .,M,31,0, ,1,0,0,0,0,56,4,0,.
       698,F,44,2,France,1,0,1,0,1,36,75,0,.
       542,f,120,., 1,1,0,1,0,13,9,0,.
       217,F,45,1,
       777,M,27,2, .,0,0,13,9,0,.
       8,M,16,1, 0,1,43,71,1,5
       646,M,42,1, 1,1,1,27,24,0,.
       360,F,45,9, 0,.,0,27,24,0,.
       823,F,999,1, 0 0,0,27,24,0,.
       514, 1,0,0,1,20,10,1,999
       79,M,29,1,
       a,F,49,1,
       79,1,.,1,
       273,F,99,0

   (a) Use the data step to identify the observations, from the raw data file, with missing and/or invalid ID.

   (b) Identify and use PROC PRINT to print out the observations with duplicate ID.

   (c) Use PROC FORMAT to define formats for the four value types: valid values (excluding missing codes), missing codes, missing values and invalid values.

   (d) Using the data step and the formats defined in (c), identify and print the observations, from the raw data file, with missing values or invalid data type. Observations with codes for missing should not be identified. There is no need to consider the contingence of the questions stated in the remarks.

   (e) Using the data step and the formats defined in (c), identify and print the observations, from the raw data file, with invalid value range or code.
   (f) Create a SAS data set from the raw data file and hence, identify the outlying observations defined as those differing from the mean by at least three standard deviations. The observations with invalid value should not be identified. The mean and the standard deviation should be calculated based on the observations with valid value only. The standardized value of the variables of concern should also be printed for the identified outliers. The standardized value of a variable X is defined as

       $$\frac{X - \text{mean}}{\text{SD}}$$

   (g) Based on the SAS data set created in (f), identify the observations that violate the remarks. There is no need to identify observations with missing values as identified in (d) even if they violate the remarks.

                                                          [Total: 24 marks]

5. Consider a data set REG consisting of two variables, X and Y.

       data REG;
         infile 'c:\reg.dat';
         input x y;
       run;

   A model is applied to study the relation between X and Y given as

       $$y_i = \alpha + \beta x_i + \varepsilon_i$$

   for a sample of n observations, $(x_i, y_i)$, i = 1, ..., n. Note that the knowledge of the model is NOT required for this question. The estimates for $\alpha$ and $\beta$ are given as

       $$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \quad \text{and} \quad \hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

   where

       $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

   Write SAS programs using only Data Step and PROC SORT without other SAS PROC’s for the following questions. You may assume that there is no missing value in REG.

   (a) Create a data set BETA containing the values of $\hat{\alpha}$ and $\hat{\beta}$ for the data set REG.

   (b) With the values of $\hat{\alpha}$ and $\hat{\beta}$, a fitted value for each observation $(x_i, y_i)$ is defined as

       $$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \quad i = 1, \ldots, n.$$

       Create a data set FITTED containing the original values $x_i$ and $y_i$ and the fitted value $\hat{y}_i$ for i = 1, ..., n.

   (c) The error sum of squares (SSE), total sum of squares (SST) and the coefficient of determination ($R^2$) are, respectively, defined as

       $$\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{and} \quad R^2 = 1 - \frac{\text{SSE}}{\text{SST}}.$$

       Create a data set SS containing the values of SSE, SST and $R^2$.
                                                          [Total: 13 marks]

                      ************** **************

ANNEX 1

Question 1 - questionnaire

Personal section

SC1. How old are you?
       __ YEARS OLD
       DON’T KNOW
       REFUSED

SC1.1. INTERVIEWER QUERY
       Respondent IS A MALE
       Respondent IS A FEMALE

SC2. Are you currently married, separated, divorced, widowed, or never married?
       MARRIED              GO TO SC3
       SEPARATED
       DIVORCED
       WIDOWED
       NEVER MARRIED
       DON’T KNOW
       REFUSED

SC2A. Are you currently living with someone in a marriage-like relationship?
       YES
       NO
       DON’T KNOW
       REFUSED

SC3. How would you rate your overall physical health - excellent, very good, good, fair, or poor?
       EXCELLENT
       VERY GOOD
       GOOD
       FAIR
       POOR
       DON’T KNOW
       REFUSED

DM1. What is the highest grade of school or year of college you completed?
       Below primary school
       Primary school
       Secondary school (F1-F3)
       High school (F4 - F7/TI)
       Tertiary education (non-degree)
       University graduate
       DON’T KNOW
       REFUSED

Depression section

D1. Earlier in the interview, you mentioned having periods that lasted several days or longer when you felt sad, empty, or depressed most of the day. During episodes of this sort, did you ever feel discouraged about how things were going in your life?
       YES
       NO
       DON’T KNOW
       REFUSED

D1a. During the episodes of being sad, empty, or depressed, did you ever lose interest in most things like work, hobbies, and other things you usually enjoy?
       YES
       NO
       DON’T KNOW
       REFUSED

D2. Earlier in the interview you mentioned having periods that lasted several days or longer when you felt discouraged about how things were going in your life. During episodes of this sort, did you ever lose interest in most things like work, hobbies, and other things you usually enjoy?
       YES
       NO
       DON’T KNOW
       REFUSED

D3. Did you ever have a period of being (sad/or/discouraged/or/uninterested in things) that lasted most of the day, nearly every day, for two weeks or longer?
       YES                  GO TO D4
       NO
       DON’T KNOW
       REFUSED

D3a. How long was the longest period of days you ever had when you were (sad/or/discouraged/or/uninterested) most of the day?
       DAYS
       DON’T KNOW
       REFUSED

D4. In answering the next questions, think about the period of (several days/two weeks) or longer during that episode when your (sadness/and/discouragement/and/loss of interest) and other problems were most severe and frequent. During that period, which of the following problems did you have most of the day nearly every day: (check all applied)

       D4a. Did you feel sad, empty, or depressed most of the day nearly every day during that period of (several days/ two weeks)?
       D4b. During that period of (several days/ two weeks), did you feel discouraged about how things were going in your life most of the day nearly every day?
       D4c. During that period of (several days/ two weeks), did you lose interest in almost all things like work and hobbies and things you like to do for fun?
       D4d. Did you feel like nothing was fun even when good things were happening?

ANNEX 2

Question 2 - SAS output

The MEANS Procedure

            N
Gender    Obs   Variable   Label          N        Mean      Std Dev     Skewness
Male      514   om1        Stamina      505   4.2428246   1.0707152    0.1355560
                om2        Cognition    508   2.7846501   0.8114557   -0.0240414
Female    359   om1        Stamina      351   3.9828213   1.0125062    0.1748180
                om2        Cognition    354   3.1703210   0.1655430    0.7112749

            N                                          Lower 10%     Upper 10%
Gender    Obs   Variable   Label         Kurtosis    CL for Mean   CL for Mean
Male      514   om1        Stamina      0.2387424      4.2368343     4.2488149
                om2        Cognition   -0.1224122      2.7801237     2.7891765
Female    359   om1        Stamina      0.3335443      3.9760252     3.9896174
                om2        Cognition   -0.2477158      3.1692146     3.1714275

The UNIVARIATE Procedure
Variable: om3 (Behavorial symptoms)

                              Moments
N                         615      Sum Weights               615
Mean                 1.936938      Sum Observations   1191.21687
Std Deviation      0.61423956      Variance           0.37729024
Skewness           0.04826048      Kurtosis            0.1546144
Uncorrected SS     2538.96944      Corrected SS       231.656208
Coeff Variation    31.7118856      Std Error Mean     0.02476853

                   Basic Statistical Measures
      Location                     Variability
Mean      1.936938      Std Deviation            0.61424
Median    1.939686      Variance                 0.37729
Mode      .             Range                    4.00425
                        Interquartile Range      0.82002

                Tests for Location: Mu0=1.9
Test             -Statistic-       -----p Value------
Student's t      t   1.491328      Pr > |t|     0.1364
Sign             M       14.5      Pr >= |M|    0.2539
Signed Rank      S       6252      Pr >= |S|    0.1563

The UNIVARIATE Procedure
Variable: om3 (Behavorial symptoms)

                              Moments
N                         253      Sum Weights               253
Mean               2.28805591      Sum Observations   578.878145
Std Deviation      0.58480506      Variance           0.34199696
Skewness            0.1606922      Kurtosis           -0.0857781
Uncorrected SS     1410.68879      Corrected SS       86.1832345
Coeff Variation    25.5590373      Std Error Mean     0.03676638

                   Basic Statistical Measures
      Location                     Variability
Mean      2.288056      Std Deviation            0.58481
Median    2.263917      Variance                 0.34200
Mode      .             Range                    3.56527
                        Interquartile Range      0.80848

                Tests for Location: Mu0=2.2
Test             -Statistic-       -----p Value------
Student's t      t   2.395012      Pr > |t|     0.0174
Sign             M        8.5      Pr >= |M|    0.3145
Signed Rank      S     2424.5      Pr >= |S|    0.0372

The UNIVARIATE Procedure
Variable: om2 (Cognition)
q1 = Male

                              Moments
N                         508      Sum Weights               508
Mean               2.78465009      Sum Observations   1414.60225
Std Deviation      0.81145569      Variance           0.65846033
Skewness           -0.0240414      Kurtosis           -0.1224122
Uncorrected SS     4273.01166      Corrected SS       333.839388
Coeff Variation    29.1403106      Std Error Mean     0.03600252

                   Basic Statistical Measures
      Location                     Variability
Mean      2.784650      Std Deviation            0.81146
Median    2.800379      Variance                 0.65846
Mode                    Range                    4.51011
                        Interquartile Range      1.05776

                Tests for Location: Mu0=0
Test             -Statistic-       -----p Value------
Student's t      t   77.34597      Pr > |t|     <.0001
Sign             M        254      Pr >= |M|    <.0001
Signed Rank      S      64643      Pr >= |S|    <.0001

                     Tests for Normality
Test                    --Statistic---      -----p Value------
Shapiro-Wilk            W       0.998244    Pr < W       0.8910
Kolmogorov-Smirnov      D       0.022351    Pr > D      >0.1500
Cramer-von Mises        W-Sq    0.023755    Pr > W-Sq   >0.2500
Anderson-Darling        A-Sq    0.158718    Pr > A-Sq   >0.2500

The UNIVARIATE Procedure
Variable: om2 (Cognition)
q1 = Female

                              Moments
N                         354      Sum Weights               354
Mean               3.17032105      Sum Observations   1122.29365
Std Deviation      0.16554302      Variance           0.02740449
Skewness           0.71127487      Kurtosis           -0.2477158
Uncorrected SS     3567.70497      Corrected SS       9.67378574
Coeff Variation    5.22164852      Std Error Mean     0.00879851

                   Basic Statistical Measures
      Location                     Variability
Mean      3.170321      Std Deviation            0.16554
Median    3.139803      Variance                 0.02740
Mode                    Range                    0.75594
                        Interquartile Range      0.20954

                Tests for Location: Mu0=0
Test             -Statistic-       -----p Value------
Student's t      t   360.3247      Pr > |t|     <.0001
Sign             M        177      Pr >= |M|    <.0001
Signed Rank      S    31417.5      Pr >= |S|    <.0001

                     Tests for Normality
Test                    --Statistic---      -----p Value------
Shapiro-Wilk            W       0.945325    Pr < W      <0.0001
Kolmogorov-Smirnov      D       0.084757    Pr > D      <0.0100
Cramer-von Mises        W-Sq    1.006363    Pr > W-Sq   <0.0050
Anderson-Darling        A-Sq    6.247588    Pr > A-Sq   <0.0050

The FREQ Procedure

Table of q1 by om5

q1(Gender)     om5(Alcohol abuse)

Frequency
Percent
Row Pct
Col Pct        No abuse    Drink daily    Alcohol abuse     Total
Male                                                          514
                                                            58.88
Female                                                        359
                                                            41.12
Total               739             78               56       873
                  84.65           8.93             6.41    100.00

Statistics for Table of q1 by om5

Statistic                        DF       Value      Prob
Chi-Square                        2     20.2688    <.0001
Likelihood Ratio Chi-Square       2     23.4683    <.0001
Mantel-Haenszel Chi-Square        1     15.3411    <.0001
Phi Coefficient                          0.1524
Contingency Coefficient                  0.1506
Cramer's V                               0.1524

Sample Size = 873

ANNEX 3: An abridged version of SAS Syntax

A. DATA STEP

LIBNAME libref 'SAS-data-library' ;
DATA dataset-1 <(data-set-options)> . . . ;
  INPUT variable(s) <format> . . . ;
  LENGTH variable-1 <$>length . . . ;
  INFORMAT variable-1 <informat> . . . ;
  LABEL variable-1='label-1' . . . ;
  FORMAT variable-1 <format> . . . ;
  CARDS | DATALINES ;
  data
RUN;

LIBNAME libref 'SAS-data-library' ;
DATA dataset-1 <(data-set-options)> . . . ;
  INFILE filename ;
  INPUT variable(s) <format> ;
  LENGTH variable-1 <$>length . . . ;
  INFORMAT variable-1 <informat> . . . ;
  LABEL variable-1='label-1' . . . ;
  FORMAT variable-1 <format> . . . ;
RUN;

LIBNAME libref 'SAS-data-library' ;
DATA dataset-1 <(data-set-options)> . . . ;
  MERGE | SET dataset-1 <(data-set-options)> <dataset-2 <(data-set-options)> > ... ;
  UPDATE dataset-1 <(data-set-options)> dataset-2 <(data-set-options)> ;
  BY <DESCENDING> variable-1 . . . ;
  DROP variable(s) ;
  KEEP variable(s) ;
  variable=expression ;
  ARRAY array-name (subscript) <array-elements> ;
  DELETE ;
  FILE fileref . . . ;
  OUTPUT dataset-1 . . . ;
  PUT 'character-string' variable-1 . . . ;
  RENAME old-name-1=new-name-1 . . . ;
  RETAIN variable(s) ;
  STOP ;
  WHERE where-expression ;
  IF expression ;
  IF expression THEN statement ; <ELSE statement ; >
  DO; . . . more statements . . . END;
  DO index-variable=start TO stop ; . . . more SAS statements . . . END;
RUN;

1. data set options in DATA step and other SAS PROCS: DROP=, FIRSTOBS=, IN=, KEEP=, OBS=, RENAME=, WHERE=

B. The following statements are common to all SAS PROCS

1. FORMAT statement: FORMAT variable-1 <format> . . . ;
2. LABEL statement: LABEL variable-1='label-1' . . . ;
3. WHERE statement: WHERE where-expression ;

C. APPEND

PROC APPEND BASE=datasetname <DATA=datasetname> <FORCE> ;
RUN;

D. CONTENTS

PROC CONTENTS <DATA=datasetname> <VARNUM> ;
RUN;

E. CORR

PROC CORR DATA=datasetname <options1> ;
  VAR variable(s) ;
RUN;

1. options in PROC CORR: COV, NOSIMPLE, NOPROB, PEARSON, OUT=datasetname

F. EXPORT

PROC EXPORT DATA=datasetname OUTFILE="filename" | OUTTABLE="tablename" <DBMS=identifier> <REPLACE> ;
  <data-source-statements ;>
RUN;

G. FORMAT

PROC FORMAT <options1> ;
  INVALUE <$>name value-or-range-1=informat-value-1 < value-or-range-n=informat-value-n> ;
  VALUE <$>name value-or-range-1=format-value-1 < value-or-range-n=format-value-n> ;
RUN;

1. options in PROC FORMAT: CNTLIN=, CNTLOUT=, LIBRARY=

H. FREQ

PROC FREQ <DATA=datasetname <data set options>> <options1> ;
  TABLES variable1 variable2 variable2*variable1 </options2> ;
  WEIGHT variable ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
RUN;

1. options in PROC FREQ: FORMCHAR(1,2,7)=formchar-string, PAGE, NOPRINT
2. options in TABLE statement: NOCOL, NOROW, NOPERCENT, NOFREQ, NOCUM, NOPRINT, TESTP=(p1 p2 . . .), EXPECTED, CHISQ, FISHER|EXACT, MEASURES, MISSING, MISSPRINT, OUT=datasetname <data set options>

I. GCHART

PROC GCHART DATA=datasetname ;
  HBAR | HBAR3D | VBAR | VBAR3D chart-variable(s) </ option(s)1 > ;
  PIE | PIE3D | DONUT chart-variable(s) </ option(s)2 > ;
  BY grouping-variable(s) ;
RUN;

1. options in HBAR | HBAR3D | VBAR | VBAR3D statement: LEGEND, GROUP=, SUBGROUP=, MIDPOINTS=, SUMVAR=, TYPE=, NOSTATS
2. options in PIE | PIE3D | DONUT statement: LEGEND, SLICE=, VALUE=, PERCENT=, GROUP=, SUBGROUP=, ACROSS=, DOWN=, MIDPOINTS=, SUMVAR=, TYPE=

J. GPLOT

PROC GPLOT DATA=datasetname ;
  PLOT vertical*horizontal < / options > ;
  PLOT vertical*horizontal = symbol-variable < / options > ;
  PLOT vertical*horizontal = class-variable < / options > ;
  BY grouping-variable(s) ;
RUN;

1. options in PLOT statement: CAXIS|CA = axis-color, CTEXT|C = text-color, GRID, HREF=value-list, VREF=value-list, OVERLAY, LEGEND

K. IMPORT

PROC IMPORT DATAFILE="filename" | TABLE="tablename" OUT=datasetname <DBMS=identifier> <REPLACE> ;
RUN;

L. MEANS

PROC MEANS <DATA=datasetname <data set options>> <options1> statistic-keyword2 ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
  CLASS grouping-variable(s) ;
  VAR variable(s) ;
  FREQ variable ;
  ID variable(s) ;
  OUTPUT OUT=datasetname <data set options> statistic-keyword3 <(variable(s))> = <name(s)> ;
RUN;

1. options in PROC MEANS: ALPHA=, MISSING, NONOBS, NOPRINT, NWAY
2. statistic-keyword in PROC MEANS: CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR MEDIAN P1 P5 P10 Q1 Q3 P90 P95 P99 QRANGE PROBT T
3. statistic-keyword in OUTPUT statement: CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR MEDIAN P1 P5 P10 Q1 Q3 P90 P95 P99 QRANGE PROBT T

M. PRINT

PROC PRINT <DATA=datasetname <data set options>> <options1> ;
  VAR variable(s) ;
  BY <DESCENDING> variable(s) ;
  ID variable(s) ;
  SUM variable(s) ;
RUN;

1. options in PROC PRINT: NOOBS, LABEL

N. REPORT

PROC REPORT <DATA=datasetname <data set options>> <options1> ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
  FREQ variable ;
  COLUMN column-specification(s) ;
  DEFINE variable / <usage options2 > ;
  BREAK location3 variable </option(s)4 > ;
  RBREAK location3 </option(s)4 > ;
RUN;

1. options in PROC REPORT: MISSING, FORMCHAR(1,2,7)=formchar-string, NOWINDOWS
2. options in DEFINE statement: ACROSS, ANALYSIS, DISPLAY, GROUP, ORDER
3. location in BREAK and RBREAK statements can be either BEFORE or AFTER.
4. options in BREAK and RBREAK statements: OL, PAGE, SKIP, SUMMARIZE, UL

O. SORT

PROC SORT <DATA=datasetname <data set options>> <OUT=datasetname <data set options>> ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
RUN;

P. SQL

PROC SQL ;
  CREATE TABLE table-name AS query-expression
    <ORDER BY order-by-item <,order-by-item>...> ;
  SELECT <DISTINCT> object-item <,object-item>...
    <INTO :macro-variable-specification <, :macro-variable-specification>...>
    FROM from-list
    <WHERE sql-expression>
    <GROUP BY group-by-item <,group-by-item>...>
    <HAVING sql-expression>
    <ORDER BY order-by-item <,order-by-item>...> ;
QUIT;

Q. TABULATE

PROC TABULATE <DATA=datasetname <data set options>> <options1> ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
  CLASS grouping-variable(s) ;
  VAR analysis-variable(s) ;
  FREQ variable ;
  TABLE <<page-expression,> row-expression,> column-expression </ table-option(s)2 > ;
  KEYLABEL keyword-1='label-1' <keyword-n='label-n'> ;
RUN;

1. options in PROC TABULATE: MISSING, FORMCHAR(1,2,7)=formchar-string
2. options in TABLE statement: BOX=, MISSTEXT=, RTSPACE=

R. UNIVARIATE

PROC UNIVARIATE <DATA=datasetname <data set options>> <options1> ;
  BY <DESCENDING> variable-1 < <DESCENDING> variable-n> ;
  CLASS grouping-variable(s) ;
  VAR variable(s) ;
  FREQ variable ;
  ID variable(s) ;
  HISTOGRAM variable(s) / normal ;
  QQPLOT variable(s) / normal (mu=est sigma=est) ;
  OUTPUT OUT=datasetname statistic-keyword2 <(variable(s))> = <name(s)> ;
RUN;

1. options in PROC UNIVARIATE: ALL, ALPHA=value, CIBASIC<TYPE=LOWER|UPPER|TWOSIDE>, MU0=value(s), NORMAL, ROBUSTSCALE, FREQ, NOPRINT, PLOTS, NEXTROBS=n, NEXTRVAL=n
2. statistic-keyword in OUTPUT statement: CSS CV KURTOSIS MAX MEAN N MIN MODE RANGE NMISS NOBS STDMEAN SKEWNESS STD USS SUM SUMWGT VAR MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 QRANGE GINI MAD QN SN STD_GINI STD_MAD STD_QN STD_QRANGE STD_SN ...
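
As a numerical check on the quantities defined in Question 5 (the estimates of the intercept and slope, the fitted values, SSE, SST and the coefficient of determination), the following worked example evaluates them by hand for a small sample. The three data points are an assumption made purely for illustration; they do not come from the examination paper or from reg.dat, and the calculation is only a sketch of how the standard least-squares definitions combine.

```latex
% Hypothetical sample (not from the paper): (x_i, y_i) = (1,2), (2,3), (3,5), n = 3
\begin{align*}
\bar{x} &= \tfrac{1}{3}(1 + 2 + 3) = 2, \qquad
\bar{y} = \tfrac{1}{3}(2 + 3 + 5) = \tfrac{10}{3} \\[4pt]
\hat{\beta} &= \frac{\sum_{i=1}^{3} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{3} (x_i - \bar{x})^2}
  = \frac{(-1)\left(-\tfrac{4}{3}\right) + (0)\left(-\tfrac{1}{3}\right) + (1)\left(\tfrac{5}{3}\right)}{1 + 0 + 1}
  = \frac{3}{2} \\[4pt]
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x} = \tfrac{10}{3} - \tfrac{3}{2}\cdot 2 = \tfrac{1}{3} \\[4pt]
\hat{y}_i &= \hat{\alpha} + \hat{\beta} x_i : \quad
  \hat{y}_1 = \tfrac{11}{6}, \quad \hat{y}_2 = \tfrac{10}{3}, \quad \hat{y}_3 = \tfrac{29}{6} \\[4pt]
\text{SSE} &= \sum_{i=1}^{3} (y_i - \hat{y}_i)^2
  = \left(\tfrac{1}{6}\right)^2 + \left(-\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{6}\right)^2 = \tfrac{1}{6} \\[4pt]
\text{SST} &= \sum_{i=1}^{3} (y_i - \bar{y})^2
  = \tfrac{16}{9} + \tfrac{1}{9} + \tfrac{25}{9} = \tfrac{14}{3} \\[4pt]
R^2 &= 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{1/6}{14/3} = \frac{27}{28} \approx 0.964
\end{align*}
```

A program written for Question 5 could be run on the same three observations and its BETA, FITTED and SS data sets compared with these hand-computed values.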

This document was uploaded on 03/18/2012.
