THE UNIVERSITY OF HONG KONG
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

STAT1303 DATA MANAGEMENT (SEMESTER 1 2009/2010)
Class Test

November 5, 2009                              Time: 4:00 p.m. - 5:00 p.m.

Answer ALL THREE questions. Marks are shown in square brackets.

1. A data file "PERFORM.DAT" is provided, and some of its observations are shown below. The file contains 8 variables whose values are stored in different column positions, in the order ID, DOE, NAME, DEPT, SECTION, GROUP, PERFORM and SALARY. Create a SAS data set PERFORM in the folder "D:\TEMP".

            1         2         3         4         5     ← column index
   12345678901234567890123456789012345678901234567890     ← (not in the data file)
   --------------------------------------------------
   00112dec1999John Chan MARKETA13.45$12,300               ← observations in
   01101jan1997Mary Lai  SALES A22.76$4,300                ← the data file

   The output of PROC PRINT for the data set is given below:

   Obs    ID    DOE          Name         Dept      Section    group    perform    salary
     1     1    12DEC1999    John Chan    MARKET    A              1       3.45     12300
     2    11    01JAN1997    Mary Lai     SALES     A              2       2.76      4300
   ...

   The variables ID, GROUP, PERFORM and SALARY are numeric variables. The last value (SALARY) may end at different column positions.

   (a) Write a SAS program to create the data set. When column input is used, specify both the starting and ending column positions.   [10 marks]

   (b) Write a SAS program to create the data set. When column input is used, specify only the starting column positions, not the ending ones.   [10 marks]

                                                                 [Total: 20 marks]

2. A study was conducted to investigate the heights of plants grown with two types of fertilizer. The SAS data set FERTILIZER consists of two variables, HEIGHT and FERTILIZER, where FERTILIZER = 1 or 2 for the two types of fertilizer. Part of the SAS output is given as follows:

   For fertilizer 1:

                          The UNIVARIATE Procedure
                              Variable: height

                                  Moments

      N                        10    Sum Weights               10
      Mean                  51.91    Sum Observations       519.1
      Std Deviation    3.37027859    Variance          11.3587778
      Skewness         0.60003925    Kurtosis          -0.2954342
      Uncorrected SS     27048.71    Corrected SS         102.229
      Coeff Variation  6.49254207    Std Error Mean    1.06577567

                         Basic Statistical Measures

          Location          Variability

      Mean     51.91000     Std Deviation         3.37028
      Median   51.70000     Variance             11.35878
      Mode            .     Range                10.50000
                            Interquartile Range   5.50000

                            Tests for Normality

      Test                  --Statistic---    -----p Value------

      Shapiro-Wilk          W     0.951249    Pr < W      0.6833
      Kolmogorov-Smirnov    D     0.124543    Pr > D     >0.1500
      Cramer-von Mises      W-Sq  0.028609    Pr > W-Sq  >0.2500
      Anderson-Darling      A-Sq  0.214151    Pr > A-Sq  >0.2500

   For fertilizer 2:

                          The UNIVARIATE Procedure
                              Variable: height

                                  Moments

      N                         8    Sum Weights                8
      Mean                  56.55    Sum Observations       452.4
      Std Deviation    3.14415558    Variance          9.88571429
      Skewness          0.1506603    Kurtosis          -1.0830816
      Uncorrected SS     25652.42    Corrected SS            69.2
      Coeff Variation  5.55995681    Std Error Mean    1.11162686

                         Basic Statistical Measures

          Location          Variability

      Mean     56.55000     Std Deviation         3.14416
      Median   56.55000     Variance              9.88571
      Mode            .     Range                 9.00000
                            Interquartile Range   4.90000

                            Tests for Normality

      Test                  --Statistic---    -----p Value------

      Shapiro-Wilk          W     0.970532    Pr < W      0.9021
      Kolmogorov-Smirnov    D     0.11873     Pr > D     >0.1500
      Cramer-von Mises      W-Sq  0.020152    Pr > W-Sq  >0.2500
      Anderson-Darling      A-Sq  0.149336    Pr > A-Sq  >0.2500

   (a) Write SAS program(s) to generate the above SAS output.   [5 marks]

   (b) What are the mean, standard deviation, skewness and kurtosis of the heights for each of the two types of fertilizer? Comment on the skewness and kurtosis.   [10 marks]

   (c) Write a SAS program to save the mean and standard deviation of HEIGHT for each of the two types of fertilizer to a SAS data set SUMMARY. Do not include the overall mean and SD.   [10 marks]
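The entries in each Moments table are linked by simple formulas. Seeing how they follow from N, Sum Observations and Uncorrected SS can help when reading off the values asked for in part (b). The sketch below recomputes the following quantities for both fertilizers:

- the mean;
- the corrected SS;
- the variance and standard deviation;
- the standard error of the mean;
- the coefficient of variation.

It is written in Python purely as an arithmetic illustration, not as a SAS answer to any part of the question. The function name moments_from_sums is hypothetical. The sketch assumes unweighted observations and the default variance divisor n - 1 (VARDEF=DF in SAS). Skewness and kurtosis are not recomputed, because they need the third and fourth power sums, which the output does not show.

```python
import math


def moments_from_sums(n, total, uss):
    """Hypothetical helper: rebuild several PROC UNIVARIATE 'Moments' entries
    from N, Sum Observations and Uncorrected SS.
    Assumes unweighted data and the divisor n - 1 (VARDEF=DF)."""
    mean = total / n                  # Mean = Sum Observations / N
    css = uss - total ** 2 / n        # Corrected SS = USS - (sum)^2 / N
    var = css / (n - 1)               # Variance uses divisor n - 1
    sd = math.sqrt(var)               # Std Deviation
    se = sd / math.sqrt(n)            # Std Error Mean = SD / sqrt(N)
    cv = 100 * sd / mean              # Coeff Variation, in percent
    return {"mean": mean, "css": css, "variance": var,
            "sd": sd, "se": se, "cv": cv}


# N, Sum Observations and Uncorrected SS copied from the output above.
fert1 = moments_from_sums(10, 519.1, 27048.71)
fert2 = moments_from_sums(8, 452.4, 25652.42)

for label, m in (("Fertilizer 1", fert1), ("Fertilizer 2", fert2)):
    print(label, {k: round(v, 6) for k, v in m.items()})
```

Up to floating-point rounding, the recomputed values agree with the printed Moments entries to the precision shown. For example, the corrected SS is 102.229 for fertilizer 1 and 69.2 for fertilizer 2.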

