hw5

STA 6208 – HW #5 – Due 10/30/09

LPGA 2008 – Regression Analysis

The dataset lpga1.dat contains statistics for the 2008 Ladies Professional Golf Association season, with the following variables:

Golfer
X1 = Number of rounds
X2 = Average distance for drives (yards)
X3 = Percent of fairways hit
X4 = Percent of time on green in regulation
X5 = Average number of putts per round
X6 = Average number of sand traps hit per round
X7 = Percent of time making par when in sand
Y = Prize winnings per round ($)

1) Download the dataset lpga1.dat.
2) Obtain the best models with p' = 2,…,8 in terms of R^2, Adj-R^2, C_p, and SBC (BIC in R).
3) Plot each of these versus p'.
4) Which model do you select?
5) Run the stepwise regression:

   If using SAS: with significance levels to stay and enter (sls=.15, sle=.15). What model is selected? Print out the results of this analysis.
   If using R: based on the minimum BIC criterion.
6) RPD: 7.1, 7.2, 7.3, 7.4, 7.13

Use your best model from the lpga1.dat dataset (part 4) on lpga2.dat to validate the model. Use the model set up in Example 7.9 to:

1. Obtain predicted values (P) for the lpga2 dataset, based on the regression from the lpga1 dataset.
2. Obtain the prediction error P - Y for each of the golfers, as well as the mean and sd of these errors.
3. Conduct the t-test of H0: Bias is 0 at the α = 0.05 significance level.
4. Obtain the Mean Squared Error of Prediction (MSEP).
5. What proportion of MSEP is due to bias in the predicted values?...
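
The arithmetic behind validation steps 2 to 5 can be sketched as follows. This is only an illustration, not the setup of Example 7.9: it assumes the common decomposition MSEP = (n-1)s^2/n + (mean error)^2, where s is the sd of the prediction errors, and it should be checked against the text. The assignment expects SAS or R; Python is used here purely as a sketch, and the function name `validation_summary` and the numbers in the usage lines are hypothetical, not LPGA data.

```python
import math
import statistics


def validation_summary(predicted, observed):
    """Summarize prediction errors (P - Y) on a validation sample.

    Hypothetical helper: assumes MSEP = sum((P - Y)^2) / n, which equals
    (n - 1) * s^2 / n + mean_error^2.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two observations")

    # Step 2: prediction errors, their mean (estimated bias) and sd
    errors = [p - y for p, y in zip(predicted, observed)]
    mean_error = statistics.mean(errors)
    sd_error = statistics.stdev(errors)  # sample sd, n - 1 denominator

    # Step 3: one-sample t statistic for H0: bias = 0 (n - 1 df);
    # compare |t| with the tabled critical value at alpha = 0.05
    if sd_error > 0:
        t_stat = mean_error / (sd_error / math.sqrt(n))
    else:
        t_stat = float("nan")  # undefined when all errors are identical

    # Step 4: mean squared error of prediction
    msep = sum(e * e for e in errors) / n

    # Step 5: share of MSEP attributable to squared bias
    bias_share = mean_error ** 2 / msep if msep > 0 else 0.0

    return {
        "n": n,
        "mean_error": mean_error,
        "sd_error": sd_error,
        "t_stat": t_stat,
        "df": n - 1,
        "msep": msep,
        "bias_share": bias_share,
    }


# Made-up numbers for illustration only (not from lpga2.dat)
summary = validation_summary([10.0, 12.0, 9.0, 15.0], [8.0, 13.0, 9.0, 12.0])
for name, value in summary.items():
    print(name, value)
```

With real data, `predicted` would hold the lpga2 predictions from the model fitted to lpga1, and `observed` the lpga2 winnings per round.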