Chapter 16
Regression Analysis: Model Building
Case Problem 1: PGA Tour
Descriptive statistics and the sample correlation coefficients for the data follow:
Variable           Mean    StDev  Minimum       Q1   Median       Q3   Maximum
Earnings        1632143  1325944   626736   870121  1184458  1963451  10628023
Scoring Avg.     70.885    0.513   69.110   70.550   70.940   71.145    72.320
Yards/Drive      289.87     8.72   258.70   284.35   289.60   295.10    316.10
Driving Acc.     62.802    4.850   49.300   59.850   62.400   66.050    75.900
Greens In Reg.   65.687    2.502   59.600   64.050   65.700   67.300    71.800
Putting Avg.     1.7745   0.0228   1.7100   1.7620   1.7760   1.7900    1.8370
Save Pct.        50.009    6.232   32.500   46.300   49.600   54.600    63.000
                    Earnings  Scoring Avg.   Yards/Drive  Driving Acc.
Scoring Avg.          -0.633
                       0.000

Yards/Drive            0.325         0.175
                       0.000         0.050

Driving Acc.           0.125         0.124         0.669
                       0.166         0.168         0.000

Greens In Reg.         0.367        -0.604         0.248         0.327
                       0.000         0.000         0.005         0.000

Putting Avg.           0.258         0.417         0.033         0.134
                       0.004         0.000         0.717         0.136

Save Pct.              0.161         0.128         0.201         0.058
                       0.073         0.155         0.025         0.524

                Greens In Reg.  Putting Avg.
Putting Avg.             0.223
                         0.012

Save Pct.                0.186         0.173
                         0.037         0.054

Cell Contents: Pearson correlation
               P-Value
We see that for the top 125 players the mean earnings are $1,632,143, the mean scoring average is 70.89, the
mean yards per drive is 289.9, and so on. The sample correlation coefficient between earnings and the
average score is -.633; thus, lower scores are associated with higher earnings. In analyzing the data in an
attempt to predict the average score, earnings would not be considered an independent variable; it is simply
another output measure that has been used to rank the data.
The sample correlation coefficients show that the independent variable most highly correlated with the
average score is the percentage of time a player is able to hit the green in regulation (r = -.604). Thus, the
best single-variable model uses Greens In Reg. to predict Scoring Avg. The corresponding Minitab regression
output is shown below:
The regression equation is
Scoring Avg. = 79.0 - 0.124 Greens In Reg.
Predictor            Coef  SE Coef      T      P
Constant          79.0294   0.9694  81.52  0.000
Greens In Reg.   -0.12398  0.01475  -8.41  0.000

S = 0.410863   R-Sq = 36.5%   R-Sq(adj) = 36.0%

Analysis of Variance

Source           DF      SS      MS      F      P
Regression        1  11.931  11.931  70.68  0.000
Residual Error  123  20.763   0.169
Total           124  32.694
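The summary statistics in this output can be cross-checked against the analysis of variance table. The short Python sketch below is not part of the Minitab output; it copies the printed sums of squares and coefficients by hand (the variable names are our own) and recomputes R-Sq, R-Sq(adj), S, F, and the t ratio for the slope from the standard formulas.

```python
import math

# Values copied by hand from the Minitab output above (rounded as printed).
ssr, sse, sst = 11.931, 20.763, 32.694   # regression, error, total sums of squares
df_reg, df_err, df_tot = 1, 123, 124     # degrees of freedom (n = 125 players)
b0, b1, se_b1 = 79.0294, -0.12398, 0.01475

mse = sse / df_err                       # mean square error
r_sq = ssr / sst                         # coefficient of determination
r_sq_adj = 1 - mse / (sst / df_tot)      # adjusted for degrees of freedom
s = math.sqrt(mse)                       # standard error of the estimate
f_stat = (ssr / df_reg) / mse            # F = MSR / MSE
t_slope = b1 / se_b1                     # t ratio for Greens In Reg.

# Fitted value at the sample mean of Greens In Reg. (65.687). A least-squares
# line passes through the point of means, so this should be close to 70.885.
fit_at_mean = b0 + b1 * 65.687

print(f"R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S = {s:.4f}  F = {f_stat:.2f}  t = {t_slope:.2f}")
print(f"Fitted score at mean Greens In Reg. = {fit_at_mean:.3f}")
```

Any difference from the printed output should be confined to the last digit, because the inputs are themselves rounded.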
The best single-variable equation explains only about 36.5% of the variation in the average score. To
investigate which other independent variables might be useful in predicting the average score, we used
Minitab's best subsets procedure.
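For readers without Minitab, the idea behind a best subsets search can be sketched in Python with NumPy. This is a minimal brute-force illustration, not Minitab's implementation: it fits every subset of predictors by least squares and keeps the two highest R-Sq models of each size. The function names (sse_of_fit, best_subsets), the predictor labels x1 to x5, and the data are all hypothetical; the data are randomly generated and are not the PGA Tour data.

```python
from itertools import combinations

import numpy as np


def sse_of_fit(X, y):
    """Fit y on the columns of X plus an intercept; return the error sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, names, keep=2):
    """Return the `keep` highest R-Sq models for each number of predictors."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = sse_of_fit(X, y) / (n - k - 1)   # MSE of the model with all k predictors
    table = []
    for p in range(1, k + 1):
        fits = []
        for cols in combinations(range(k), p):
            sse = sse_of_fit(X[:, list(cols)], y)
            fits.append({
                "vars": [names[c] for c in cols],
                "r_sq": 100 * (1 - sse / sst),
                "r_sq_adj": 100 * (1 - (sse / (n - p - 1)) / (sst / (n - 1))),
                "cp": sse / mse_full - (n - 2 * (p + 1)),   # Mallows Cp
                "s": (sse / (n - p - 1)) ** 0.5,
            })
        fits.sort(key=lambda f: f["r_sq"], reverse=True)
        table.extend(fits[:keep])
    return table


# Synthetic stand-in data (NOT the PGA Tour data): y depends on x3 and x4 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(125, 5))
y = 70 - 0.5 * X[:, 2] + 0.4 * X[:, 3] + rng.normal(scale=0.3, size=125)
names = ["x1", "x2", "x3", "x4", "x5"]

rows = best_subsets(X, y, names)
for r in rows:
    print(f"{len(r['vars'])}  {r['r_sq']:5.1f}  {r['r_sq_adj']:5.1f}  "
          f"{r['cp']:7.1f}  {r['s']:.5f}  {' '.join(r['vars'])}")
```

With five predictors there are only 31 subsets, so exhaustive enumeration is cheap; for many predictors a smarter search would be needed. Note that Cp for the full model always equals the number of predictors plus one.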
Response is Scoring Avg.

                                             Yards/  Driving  Greens   Putting  Save
Vars  R-Sq  R-Sq(adj)  Mallows Cp        S   Drive   Acc.     In Reg.  Avg.     Pct.
   1  36.5       36.0       144.5  0.41086                    X
   1  17.3       16.7       224.6  0.46871                             X
   2  68.5       68.0        12.7  0.29058                    X        X
   2  42.5       41.5       121.5  0.39261                    X                 X
   3  71.3       70.6         2.8  0.27825                    X        X        X
   3  68.7       67.9        14.1  0.29102           X        X        X
   4  71.5       70.6         4.1  0.27852           X        X        X        X
   4  71.5       70.6         4.1  0.27856   X                X        X        X
   5  71.5       70.3         6.0  0.27963   X       X        X        X        X
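The Mallows Cp and R-Sq(adj) columns of the best subsets output can be recovered from the S column alone, which gives a useful consistency check. The Python sketch below is an illustration under the usual definitions, Cp = SSE_p / MSE_full - (n - 2(p + 1)) and R-Sq(adj) = 1 - S^2 / (SST / (n - 1)), with n = 125 and SST = 32.694 taken from the earlier analysis of variance table. The names printed, recompute, and checks are our own.

```python
# Printed values from the best subsets output: (predictors p, S, Cp, R-Sq(adj)).
printed = [
    (1, 0.41086, 144.5, 36.0),
    (1, 0.46871, 224.6, 16.7),
    (2, 0.29058, 12.7, 68.0),
    (2, 0.39261, 121.5, 41.5),
    (3, 0.27825, 2.8, 70.6),
    (3, 0.29102, 14.1, 67.9),
    (4, 0.27852, 4.1, 70.6),
    (4, 0.27856, 4.1, 70.6),
    (5, 0.27963, 6.0, 70.3),
]
n = 125                          # players in the sample
sst = 32.694                     # total sum of squares from the ANOVA table
mse_full = printed[-1][1] ** 2   # S squared for the five-variable model


def recompute(p, s):
    """Recompute Mallows Cp and R-Sq(adj) for a p-predictor model from its S."""
    sse = s ** 2 * (n - p - 1)                    # error sum of squares
    cp = sse / mse_full - (n - 2 * (p + 1))
    r_sq_adj = 100 * (1 - s ** 2 / (sst / (n - 1)))
    return cp, r_sq_adj


checks = [recompute(p, s) for p, s, _, _ in printed]
for (p, s, cp_print, adj_print), (cp, adj) in zip(printed, checks):
    print(f"p={p}  Cp: {cp:6.2f} (printed {cp_print})  "
          f"R-Sq(adj): {adj:5.2f} (printed {adj_print})")
```

Because S is printed to only five decimals, the recomputed values should match the table to within about 0.1 rather than exactly. A model whose Cp is close to p + 1 shows little evidence of bias from the omitted predictors.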
The best subsets output indicates that three independent variables (Greens In Reg., Putting Avg., and
Save Pct.) can be used to develop an estimated regression equation with R-Sq(adj) = 70.6%, and that adding
a fourth or fifth variable does not raise R-Sq(adj). The Minitab regression output for this model follows:
The regression equation is
Scoring Avg. = 59.6 - 0.156 Greens In Reg. + 12.5 Putting Avg.