lab8 - STAT 350 – Spring 2009 Lab 8 SOLUTION Correlation...


STAT 350 – Spring 2009
Lab 8 SOLUTION
Correlation and Scatter Plots

For this lab we will use two data sets (Fitness and Children). The data sets are available in the accompanying Excel Workbook. Unless otherwise stated below, the calculations, analyses, and graphs for this lab are to be done in SAS. All SAS code (contents of the editor window) and all SAS output should be given as an appendix.

Fitness

The first data set is on the worksheet titled “Fitness”. This data set gives measurements made on men involved in a physical fitness course at N.C. State University (taken from the SAS documentation). The variables are:
• Age (in years)
• Weight (in kilograms)
• Oxygen Intake Rate (in ml per kg body weight per minute)
• Time to Run 1.5 miles (in minutes)
• Heart Rate (pulse) while Resting
• Heart Rate (pulse) while Running (taken at the same time Oxygen rate was measured)
• Maximum heart rate recorded while running

Children

The second data set is on the worksheet titled “Children”. The data set is from Lewis and Taylor, 1967 (taken from the SAS documentation). The variables are:
• gender ("f" for females, "m" for males)
• Age (in months)
• Height (in inches)
• Weight (in pounds)

Requirements for Excel plots:
• There should be no border to the figure or plot area.
• There should be no gridlines.
• The plot area should be white.
• The points should be clear to see (in size and color).
• The scales of the x and y axes should be such that the data spans most of the plot area.
• The axes should have informative labels and the scale numbers should be clearly readable. Units (if known) should be given.
• The proportion of the graph (height versus width of figure) should be such that the relationship of the two variables is as obvious as possible.
• The title for each plot should be the problem number (e.g., Problem #1.b.ii).

Requirements for plots generated with SAS PROC GPLOT (in addition to the requirements given for each problem below):
• Axes should be properly labeled (including units, when possible).
• Labels for the y-axis should be rotated so that the bottoms of the letters face the y-axis.
• The title for each plot should be the problem number (e.g., Problem #2.a).

1. For the following problems, use the Fitness data set.

a. Make a nice table giving the Pearson Correlation Coefficients (to 3 decimal places) for all pairs of variables in this data set. Use SAS to obtain the correlations, but do not paste the SAS output here.

                         Age        Weight Oxygen Intake      Run Time Resting Pulse Running Pulse    Max. Pulse
Age                    1.000        -0.234        -0.305         0.189        -0.164        -0.338        -0.433
Weight                               1.000        -0.163         0.144         0.044         0.182         0.249
Oxygen Intake                                      1.000        -0.862        -0.399        -0.398        -0.237
Run Time                                                         1.000         0.450         0.314         0.226
Resting Pulse                                                                  1.000         0.352         0.305
Running Pulse                                                                                1.000         0.930
Max. Pulse                                                                                                 1.000

It is not necessary to show the ones on the diagonal.

b. Oxygen Intake vs. Run Time
i. Give the Pearson Correlation Coefficient for these two variables.
   -0.862
ii. Classify whether this correlation is strong/moderate/weak and positive/negative.
   This is a strong negative correlation.
iii. Using Excel, make a scatter plot of oxygen intake (on the y axis) against running time (on the x axis). Make the plot points black circles with clear centers.

c. Run Time vs. Age
i. Give the Pearson Correlation Coefficient for these two variables.
   +0.189
ii. Classify whether this correlation is strong/moderate/weak and positive/negative.
   This is a weak positive correlation.
iii. Using SAS PROC GPLOT, make a scatter plot of running time (on the y axis) against age (on the x axis). Make the plot points circles. Add an interpolation line with smoothing parameter = 50.
   [Scatter plot: y axis ticks 8 to 15; x axis ticks 38 to 57, labeled "Age (in years)"]
iv. Using SAS PROC GPLOT, make a scatter plot of running time (on the y axis) against age (on the x axis). Make the plot points triangles. Add an interpolation line with smoothing parameter = 80.
   [Scatter plot: y axis ticks 8 to 15; x axis ticks 38 to 57, labeled "Age (in years)"]

2. For the following problems, use the Children data set.

a. Using SAS PROC GPLOT, make a scatter plot of height (on the y axis) versus age (on the x axis). Use red circles for the females and black triangles for the males. Include regression lines (one for each gender).
   [Scatter plot: y axis ticks 50 to 80; x axis ticks 130 to 250, labeled "Age (in months)"; legend "sex": f, m]

b. Find the correlation between age (in months) and height (in inches) for males.
   0.74656

c. Find the correlation between age (in months) and height (in inches) for females.
   0.54385

d. Find the correlation between age (in months) and height (in inches) (genders combined).
   0.64886

e. In SAS, make two new variables (we did similar conversions in SAS back in Lab #2):
• age_years - the age of each child in years (do not round to integers). For example, a child who is 145 months old is 145/12 = 12.083333 years old.
• height_cm - the height of each child in centimeters. For example, a child who is 48 inches tall is 48×2.54 = 121.92 cm tall.
Find the correlation between age (in years) and height (in cm) (genders combined). Compare this answer to your answer in (d).
   0.64886
   This is the same as the correlation in (d). Recall that correlation is independent of scale: changing the units of either variable does not change it.

Appendix

Do not forget to include your SAS code (contents of the Editor window) and all SAS output (contents of the Output window, NOT the Log window) as an appendix. Note: you may truncate the data set parts of the code to save paper.

/* Problem #1 */
data fitness;
   input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
   datalines;
44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185
44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172
38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176   38 81.87 60.055  8.63 48 170 186
44 73.03 50.541 10.13 45 168 168   45 87.66 37.388 14.03 56 186 192
45 66.45 44.754 11.12 51 176 176   47 79.15 47.273 10.60 47 162 164
54 83.12 51.855 10.33 50 166 170   49 81.42 49.156  8.95 44 180 185
51 69.63 40.836 10.95 57 168 172   51 77.91 46.672 10.00 48 162 168
48 91.63 46.774 10.25 48 162 164   49 73.37 50.388 10.08 67 168 168
57 73.37 39.407 12.63 58 174 176   54 79.38 46.080 11.17 62 156 165
52 76.32 45.441  9.63 48 164 166   50 70.87 54.625  8.92 48 146 155
51 67.25 45.118 11.08 48 172 172   54 91.63 39.203 12.88 44 168 172
51 73.71 45.790 10.47 59 186 188   57 59.08 50.545  9.93 49 148 155
49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
run;

/* 1.a */
proc corr data=fitness pearson;
   var Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse;
run;

/* 1.c.iii */
proc sort data=fitness;
   by age;
run;
symbol v=circle i=sm50;
title 'Problem #1.c.iii';
axis1 label = ('Age (in years)');
axis2 label = (angle=90 'Time to Run 1.5 Miles (in minutes)');
proc gplot data=fitness;
   plot RunTime*Age / haxis=axis1 vaxis=axis2;
run;

/* 1.c.iv */
symbol v=triangle i=sm80;
title 'Problem #1.c.iv';
proc gplot data=fitness;
   plot RunTime*Age / haxis=axis1 vaxis=axis2;
run;

/* Problem #2 */
data children;
   input sex $ age height weight @@;
   age_years = age/12;        /* this is for part e */
   height_cm = height*2.54;   /* this is for part e */
   datalines;
f 143 56.3  85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0  92.0
f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5  69.0
f 160 62.0  94.5 f 140 53.8  68.5 f 139 61.5 104.0 f 178 61.5 103.5
f 157 64.5 123.5 f 149 58.3  93.0 f 143 51.3  50.5 f 145 58.8  89.0
f 191 65.3 107.0 f 150 59.5  78.5 f 147 61.3 115.0 f 180 63.3 114.0
f 141 61.8  85.0 f 140 53.5  81.0 f 164 58.0  83.5 f 176 61.3 112.0
f 185 63.3 101.0 f 166 61.5 103.5 f 175 60.8  93.5 f 180 59.0 112.0
f 210 65.5 140.0 f 146 56.3  83.5 f 170 64.3  90.0 f 162 58.0  84.0
f 149 64.3 110.5 f 139 57.5  96.0 f 186 57.8  95.0 f 197 61.5 121.0
f 169 62.3  99.5 f 177 61.8 142.5 f 185 65.3 118.0 f 182 58.3 104.5
f 173 62.8 102.5 f 166 59.3  89.5 f 168 61.5  95.0 f 169 62.0  98.5
f 150 61.3  94.0 f 184 62.3 108.0 f 139 52.8  63.5 f 147 59.8  84.5
f 144 59.5  93.5 f 177 61.3 112.0 f 178 63.5 148.5 f 197 64.8 112.0
f 146 60.0 109.0 f 145 59.0  91.5 f 147 55.8  75.0 f 145 57.8  84.0
f 155 61.3 107.0 f 167 62.3  92.5 f 183 64.3 109.5 f 143 55.5  84.0
f 183 64.5 102.5 f 185 60.0 106.0 f 148 56.3  77.0 f 147 58.3 111.5
f 154 60.0 114.0 f 156 54.5  75.0 f 144 55.8  73.5 f 154 62.8  93.5
f 152 60.5 105.0 f 191 63.3 113.5 f 190 66.8 140.0 f 140 60.0  77.0
f 148 60.5  84.5 f 189 64.3 113.5 f 143 58.3  77.5 f 178 66.5 117.5
f 164 65.3  98.0 f 157 60.5 112.0 f 147 59.5 101.0 f 148 59.0  95.0
f 177 61.3  81.0 f 171 61.5  91.0 f 172 64.8 142.0 f 190 56.8  98.5
f 183 66.5 112.0 f 143 61.5 116.5 f 179 63.0  98.5 f 186 57.0  83.5
f 182 65.5 133.0 f 182 62.0  91.5 f 142 56.0  72.5 f 165 61.3 106.5
f 165 55.5  67.0 f 154 61.0 122.5 f 150 54.5  74.0 f 155 66.0 144.5
f 163 56.5  84.0 f 141 56.0  72.5 f 147 51.5  64.0 f 210 62.0 116.0
f 171 63.0  84.0 f 167 61.0  93.5 f 182 64.0 111.5 f 144 61.0  92.0
f 193 59.8 115.0 f 141 61.3  85.0 f 164 63.3 108.0 f 186 63.5 108.0
f 169 61.5  85.0 f 175 60.3  86.0 f 180 61.3 110.5 m 165 64.8  98.0
m 157 60.5 105.0 m 144 57.3  76.5 m 150 59.5  84.0 m 150 60.8 128.0
m 139 60.5  87.0 m 189 67.0 128.0 m 183 64.8 111.0 m 147 50.5  79.0
m 146 57.5  90.0 m 160 60.5  84.0 m 156 61.8 112.0 m 173 61.3  93.0
m 151 66.3 117.0 m 141 53.3  84.0 m 150 59.0  99.5 m 164 57.8  95.0
m 153 60.0  84.0 m 206 68.3 134.0 m 250 67.5 171.5 m 176 63.8  98.5
m 176 65.0 118.5 m 140 59.5  94.5 m 185 66.0 105.0 m 180 61.8 104.0
m 146 57.3  83.0 m 183 66.0 105.5 m 140 56.5  84.0 m 151 58.3  86.0
m 151 61.0  81.0 m 144 62.8  94.0 m 160 59.3  78.5 m 178 67.3 119.5
m 193 66.3 133.0 m 162 64.5 119.0 m 164 60.5  95.0 m 186 66.0 112.0
m 143 57.5  75.0 m 175 64.0  92.0 m 175 68.0 112.0 m 175 63.5  98.5
m 173 69.0 112.5 m 170 63.8 112.5 m 174 66.0 108.0 m 164 63.5 108.0
m 144 59.5  88.0 m 156 66.3 106.0 m 149 57.0  92.0 m 144 60.0 117.5
m 147 57.0  84.0 m 188 67.3 112.0 m 169 62.0 100.0 m 172 65.0 112.0
m 150 59.5  84.0 m 193 67.8 127.5 m 157 58.0  80.5 m 168 60.0  93.5
m 140 58.5  86.5 m 156 58.3  92.5 m 156 61.5 108.5 m 158 65.0 121.0
m 184 66.5 112.0 m 156 68.5 114.0 m 144 57.0  84.0 m 176 61.5  81.0
m 168 66.5 111.5 m 149 52.5  81.0 m 142 55.0  70.0 m 188 71.0 140.0
m 203 66.5 117.0 m 142 58.8  84.0 m 189 66.3 112.0 m 188 65.8 150.5
m 200 71.0 147.0 m 152 59.5 105.0 m 174 69.8 119.5 m 166 62.5  84.0
m 145 56.5  91.0 m 143 57.5 101.0 m 163 65.3 117.5 m 166 67.3 121.0
m 182 67.0 133.0 m 173 66.0 112.0 m 155 61.8  91.5 m 162 60.0 105.0
m 177 63.0 111.0 m 177 60.5 112.0 m 175 65.5 114.0 m 166 62.0  91.0
m 150 59.0  98.0 m 150 61.8 118.0 m 188 63.3 115.5 m 163 66.0 112.0
m 171 61.8 112.0 m 162 63.0  91.0 m 141 57.5  85.0 m 174 63.0 112.0
m 142 56.0  87.5 m 148 60.5 118.0 m 140 56.8  83.5 m 160 64.0 116.0
m 144 60.0  89.0 m 206 69.5 171.5 m 159 63.3 112.0 m 149 56.3  72.0
m 193 72.0 150.0 m 194 65.3 134.5 m 152 60.8  97.0 m 146 55.0  71.5
m 139 55.0  73.5 m 186 66.5 112.0 m 161 56.8  75.0 m 153 64.8 128.0
m 196 64.5  98.0 m 164 58.0  84.0 m 159 62.8  99.0 m 178 63.8 112.0
m 153 57.8  79.5 m 155 57.3  80.5 m 178 63.5 102.5 m 142 55.0  76.0
m 164 66.5 112.0 m 189 65.0 114.0 m 164 61.5 140.0 m 167 62.0 107.5
m 151 59.3  87.0
;

/* #2.a. */
symbol1 v=circle i=rl c=red;
symbol2 v=triangle i=rl c=black;
title 'Problem #2.a';
axis1 label = ('Age (in months)');
axis2 label = (angle=90 'Height (in inches)');
proc gplot data=children;
   plot height*age = sex / haxis=axis1 vaxis=axis2;
run;
title; /* this clears the title option; otherwise, "Problem #2.a" would be
          a title for all of the following output */

/* #2b and #2c */
proc sort data=children;
   by sex;
run;
proc corr data=children pearson;
   by sex;
   var age height;
run;

/* #2d */
proc corr data=children pearson;
   var age height;
run;

/* #2e */
proc corr data=children pearson;
   var age_years height_cm;
run;


Problem #2.a                                                                          1

The CORR Procedure

7 Variables: Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
Age          31    47.67742     5.21144        1478    38.00000    57.00000
Weight       31    77.44452     8.32857        2401    59.08000    91.63000
Oxygen       31    47.37581     5.32723        1469    37.38800    60.05500
RunTime      31    10.58613     1.38741   328.17000     8.17000    14.03000
RestPulse    31    53.45161     7.61944        1657    40.00000    70.00000
RunPulse     31   169.64516    10.25199        5259   146.00000   186.00000
MaxPulse     31   173.77419     9.16410        5387   155.00000   192.00000

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

                  Age     Weight     Oxygen    RunTime  RestPulse   RunPulse   MaxPulse

Age           1.00000   -0.23354   -0.30459    0.18875   -0.16410   -0.33787   -0.43292
                          0.2061     0.0957     0.3092     0.3777     0.0630     0.0150

Weight       -0.23354    1.00000   -0.16275    0.14351    0.04397    0.18152    0.24938
               0.2061                0.3817     0.4412     0.8143     0.3284     0.1761

Oxygen       -0.30459   -0.16275    1.00000   -0.86219   -0.39936   -0.39797   -0.23674
               0.0957     0.3817                <.0001     0.0260     0.0266     0.1997

RunTime       0.18875    0.14351   -0.86219    1.00000    0.45038    0.31365    0.22610
               0.3092     0.4412     <.0001                0.0110     0.0858     0.2213

RestPulse    -0.16410    0.04397   -0.39936    0.45038    1.00000    0.35246    0.30512
               0.3777     0.8143     0.0260     0.0110                0.0518     0.0951

RunPulse     -0.33787    0.18152   -0.39797    0.31365    0.35246    1.00000    0.92975
               0.0630     0.3284     0.0266     0.0858     0.0518                <.0001

MaxPulse     -0.43292    0.24938   -0.23674    0.22610    0.30512    0.92975    1.00000
               0.0150     0.1761     0.1997     0.2213     0.0951     <.0001


                                                                                      2

--------------------------------------------- sex=f ---------------------------------------------

The CORR Procedure

2 Variables: age height

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
age         111   164.40541    18.12752       18249   139.00000   210.00000
height      111    60.52613     3.35829        6718    51.30000    66.80000

Pearson Correlation Coefficients, N = 111
Prob > |r| under H0: Rho=0

                  age     height

age           1.00000    0.54385
                          <.0001

height        0.54385    1.00000
               <.0001


                                                                                      3

--------------------------------------------- sex=m ---------------------------------------------

The CORR Procedure

2 Variables: age height

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
age         126   164.45238    18.75680       20721   139.00000   250.00000
height      126    62.10317     4.27669        7825    50.50000    72.00000

Pearson Correlation Coefficients, N = 126
Prob > |r| under H0: Rho=0

                  age     height

age           1.00000    0.74656
                          <.0001

height        0.74656    1.00000
               <.0001


                                                                                      4

The CORR Procedure

2 Variables: age height

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
age         237   164.43038    18.42577       38970   139.00000   250.00000
height      237    61.36456     3.94540       14543    50.50000    72.00000

Pearson Correlation Coefficients, N = 237
Prob > |r| under H0: Rho=0

                  age     height

age           1.00000    0.64886
                          <.0001

height        0.64886    1.00000
               <.0001


                                                                                      5

The CORR Procedure

2 Variables: age_years height_cm

Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
age_years   237    13.70253     1.53548        3248    11.58333    20.83333
height_cm   237   155.86597    10.02132       36940   128.27000   182.88000

Pearson Correlation Coefficients, N = 237
Prob > |r| under H0: Rho=0

            age_years  height_cm

age_years     1.00000    0.64886
                          <.0001

height_cm     0.64886    1.00000
               <.0001
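
Supplementary note on Problem 2.e. The last two CORR outputs report the same coefficient, 0.64886, for (age, height) and for (age_years, height_cm). The short derivation below is a sketch of why that must happen. It assumes the standard sample definition of the Pearson coefficient, which the lab does not write out, and the constants a, b, c, d and the variables u, v are names introduced here only for illustration.

\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

Let $u_i = a x_i + b$ and $v_i = c y_i + d$ with $a \neq 0$ and $c \neq 0$. Then $\bar{u} = a\bar{x} + b$ and $\bar{v} = c\bar{y} + d$, so the deviations are simply rescaled: $u_i - \bar{u} = a (x_i - \bar{x})$ and $v_i - \bar{v} = c (y_i - \bar{y})$. Substituting into the definition gives

\[
r_{uv} = \frac{a c \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {|a|\,|c|\,\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
       = \frac{a c}{|a c|}\, r_{xy}.
\]

In part (e) the conversions are $a = 1/12$ (months to years), $c = 2.54$ (inches to cm), and $b = d = 0$. Both factors are positive, so $ac/|ac| = 1$ and $r_{uv} = r_{xy} = 0.64886$. A negative factor on exactly one variable would flip the sign of the coefficient but leave its magnitude unchanged.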

This note was uploaded on 02/16/2010 for the course STAT 350 taught by Professor Staff during the Spring '08 term at Purdue University-West Lafayette.
