ST512 Lab 4


ST512 Assignment 4
Due Date: September 28

Assignment Goals:
1. Calculate leverages in R.
2. Explore the effects of collinearity.
3. Practice using indicator variables.

This lab is based on the article "Using cigarette data for an introduction to multiple regression," by L. McIntyre (J. Stat. Ed. vol 2 no. 1, 1994). Quoting the article,

The Federal Trade Commission annually rates varieties of domestic cigarettes according to their tar, nicotine, and carbon monoxide content. The United States Surgeon General considers each of these substances hazardous to a smoker's health. Past studies have shown that increases in the tar and nicotine content of a cigarette are accompanied by an increase in the carbon monoxide emitted from the cigarette smoke. The dataset presented here contains measurements of weight and tar, nicotine, and carbon monoxide (CO) content for 25 brands of cigarettes. The data were taken from Mendenhall and Sincich (1992). The original source of the data is the Federal Trade Commission. These data also exemplify the truism that smoking is a leading cause of statistics.

Read these data into R with a command similar to

> cig<- read.table("J:/.eos/courses/st/st512/www/lec/001/data/cigarettes.txt",head=T)

This data set (like all data sets used in labs) is stored in a public directory that can be accessed by pointing your web browser to .

The variables contained in this data set are:
Brand name
Tar content (mg)
Nicotine content (mg)
Weight (g)
Carbon monoxide content (mg)

Exploratory analysis

The ultimate goal of this lab is to build a regression model that predicts the carbon monoxide (CO) content of cigarettes as a function of their tar content, nicotine content, and weight. When encountering any data set for the first time, it is usually helpful to try to get a feel for the data by plotting it.
Try

> pairs(cig)

or (to get rid of the meaningless row and column of plots with Brand)

> pairs(cig[,2:5])

(The R syntax here is that cig[,2:5] is a data frame consisting only of columns 2-5 of the original data frame, cig.) Examine the data and answer question 1.

Fit a regression model with CO as the response and tar, nicotine, and weight as the predictors:

> fm1<-lm(co~tar+nicotine+weight,data=cig)

Calculate the leverages of each data point with the commands

> lev<-lm.influence(fm1)$hat

(The R syntax here is that lm.influence is a function that calculates a variety of...
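
As a rough illustration of what the leverage command returns, the sketch below fits the same kind of model to simulated data and compares lm.influence(...)$hat with the diagonal of the hat matrix H = X (X'X)^{-1} X', which is the standard definition of leverage. The data frame cig_sim and every number in it are made up for this example and are not the FTC measurements; only the model formula and the lm.influence call follow the lab.

```r
# Simulated stand-in for the cigarette data. These values are invented
# for illustration only and are NOT the FTC measurements.
set.seed(512)
n <- 25
tar <- runif(n, 1, 30)
nicotine <- 0.1 + 0.06 * tar + rnorm(n, sd = 0.1)  # built to be collinear with tar
weight <- rnorm(n, mean = 1, sd = 0.1)
co <- 3 + 0.8 * tar + rnorm(n, sd = 1.5)
cig_sim <- data.frame(tar, nicotine, weight, co)

# Same model form as in the lab, fitted to the simulated data
fm <- lm(co ~ tar + nicotine + weight, data = cig_sim)

# Leverages as in the lab: the "hat" component of lm.influence()
lev <- lm.influence(fm)$hat

# The same quantities computed directly from the hat matrix
X <- model.matrix(fm)                     # design matrix, including the intercept column
H <- X %*% solve(crossprod(X)) %*% t(X)   # H = X (X'X)^{-1} X'
lev_manual <- diag(H)

# The two computations should agree up to rounding error
print(all.equal(unname(lev), unname(lev_manual)))

# Leverages always sum to the number of estimated coefficients (4 here)
print(sum(lev))
```

With the real data, the same check would use fm1 in place of fm. Points with large entries of lev have predictor values that are unusual relative to the rest of the data.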