Multiple Linear Regression (Last)
Data Mining
Prof. Dawn Woodard
School of ORIE, Cornell University

Outline
1. Announcements
2. Multiple Linear Regression

Announcements
Questions?

C_p Statistic
Obtain the cheese data from Blackboard.
Read the data into R:
    cheese <- read.table("C:/temp/cheese.txt", skip = 8,
                         col.names = c("ID", "taste", "acetic", "h2s", "lactic"))
p = ? N = ?
How many models to consider? What are they?
Fit the full linear model; summarize the results:
    summary(cheeseLM)
Look at the residuals (Y_i - Ŷ_i):
    cheeseLM$residuals
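The slide calls summary(cheeseLM) without showing how cheeseLM is created. A minimal sketch of that step follows, assuming the full model regresses taste on acetic, h2s, and lactic. So that it runs without cheese.txt, it builds a synthetic stand-in data frame with the slide's column names. Those numbers are made up for illustration and are not the course data. With the real file, skip the stand-in and use the read.table call from the slide.

```r
# Synthetic stand-in so the sketch runs without cheese.txt.
# These values are invented for illustration; they are NOT the real cheese data.
set.seed(1)
n <- 25  # arbitrary size, not the N of the real data set
cheese <- data.frame(
  ID     = seq_len(n),
  acetic = rnorm(n, mean = 5.5, sd = 0.6),
  h2s    = rnorm(n, mean = 6.0, sd = 2.0),
  lactic = rnorm(n, mean = 1.4, sd = 0.3)
)
cheese$taste <- 3 * cheese$h2s + 15 * cheese$lactic + rnorm(n, sd = 10)

# Full linear model (assumed form): taste regressed on all three predictors.
cheeseLM <- lm(taste ~ acetic + h2s + lactic, data = cheese)
summary(cheeseLM)

# Residuals are observed minus fitted values, Y_i - Yhat_i.
res <- cheeseLM$residuals
manual_res <- cheese$taste - unname(fitted(cheeseLM))
all.equal(unname(res), manual_res)
```

The last line checks that cheeseLM$residuals agrees with Y_i - Ŷ_i computed by hand.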

C_p Statistic
Calculate the RSS for the full model. What is the code?
Fit the following models, and for each find the RSS:
1. H2S and Lactic
2. Acetic and H2S
3. Just H2S
We really should consider all the models, but to save time I have picked just the promising ones.

C_p Statistic
Write down the RSS for each model:
    Model           RSS
    Full model
    H2S & Lactic
    Acetic & H2S
    Just H2S
Calculate the C_p statistic for each model:
    Model           C_p
    Full model
    H2S & Lactic
    Acetic & H2S
    Just H2S
What model do we choose?

C_p Statistic
What is the value of C_p for the full model?

Model Choice Criteria
There are several criteria for model choice that do not use a test data set,
e.g., C_p, Akaike's Information Criterion (AIC), and the Deviance Information Criterion (DIC).
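The C_p exercise above can be sketched in R as follows. The slides do not state which form of C_p the course uses, so this assumes the classical Mallows form, C_p = RSS/sigma2_hat - N + 2d. Here d is the number of fitted coefficients (including the intercept) and sigma2_hat is the residual variance estimate from the full model. Other textbooks scale C_p differently, but with the same sigma2_hat the ranking of models is unchanged. The data frame is again a synthetic stand-in with invented values, not the real cheese data, and names such as formulas, fits, and sigma2_hat are illustrative.

```r
# Synthetic stand-in (invented values, NOT the real cheese data).
set.seed(1)
n <- 25  # arbitrary size, not the N of the real data set
cheese <- data.frame(
  acetic = rnorm(n, mean = 5.5, sd = 0.6),
  h2s    = rnorm(n, mean = 6.0, sd = 2.0),
  lactic = rnorm(n, mean = 1.4, sd = 0.3)
)
cheese$taste <- 3 * cheese$h2s + 15 * cheese$lactic + rnorm(n, sd = 10)

# The four candidate models named in the exercise.
formulas <- list(
  "Full model"   = taste ~ acetic + h2s + lactic,
  "H2S & Lactic" = taste ~ h2s + lactic,
  "Acetic & H2S" = taste ~ acetic + h2s,
  "Just H2S"     = taste ~ h2s
)
fits <- lapply(formulas, function(f) lm(f, data = cheese))

# RSS and number of coefficients (intercept included) for each model.
rss <- sapply(fits, function(m) sum(residuals(m)^2))
d   <- sapply(fits, function(m) length(coef(m)))
N   <- nrow(cheese)

# Error variance estimated from the full model.
sigma2_hat <- rss[["Full model"]] / (N - d[["Full model"]])

# Mallows' C_p under the assumed definition; smaller is better.
cp <- rss / sigma2_hat - N + 2 * d

# AIC from base R, shown alongside for comparison.
data.frame(RSS = rss, d = d, Cp = cp, AIC = sapply(fits, AIC))
```

Under this definition the full model's C_p always equals its own d, because RSS_full/sigma2_hat = N - d. So the full-model row only serves as a reference point, and the comparison among the sub-models is what matters.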
