ST512 Assignment 4
Due Date: September 28

Assignment Goals:
1. Calculate leverages in R.
2. Explore the effects of collinearity.
3. Practice using indicator variables.

This lab is based on the article "Using cigarette data for an introduction to multiple regression" by L. McIntyre (J. Stat. Ed., vol. 2, no. 1, 1994). Quoting the article:

"The Federal Trade Commission annually rates varieties of domestic cigarettes according to their tar, nicotine, and carbon monoxide content. The United States Surgeon General considers each of these substances hazardous to a smoker's health. Past studies have shown that increases in the tar and nicotine content of a cigarette are accompanied by an increase in the carbon monoxide emitted from the cigarette smoke. The dataset presented here contains measurements of weight and tar, nicotine, and carbon monoxide (CO) content for 25 brands of cigarettes. The data were taken from Mendenhall and Sincich (1992). The original source of the data is the Federal Trade Commission. These data also exemplify the truism that smoking is a leading cause of statistics."

Read these data into R with a command similar to:

> cig <- read.table("J:/.eos/courses/st/st512/www/lec/001/data/cigarettes.txt", header = TRUE)

This data set, like all data sets used in labs, is stored in a public directory, http://courses.ncsu.edu/st512/lec/001/data/, which you can browse with a web browser.

The variables contained in this data set are:
- Brand name
- Tar content (mg)
- Nicotine content (mg)
- Weight (g)
- Carbon monoxide content (mg)

Exploratory analysis

The ultimate goal of this lab is to build a regression model that predicts the carbon monoxide (CO) content of cigarettes from their tar content, nicotine content, and weight. When you encounter any data set for the first time, it is usually helpful to get a feel for the data by plotting it.
Try

> pairs(cig)

or, to omit the uninformative row and column of plots involving Brand,

> pairs(cig[, 2:5])

(The R syntax here is that cig[, 2:5] is a data frame consisting only of columns 2-5 of the original data frame, cig.)

Examine the data and answer question 1.

Fit a regression model with CO as the response and tar, nicotine, and weight as the predictors:

> fm1 <- lm(co ~ tar + nicotine + weight, data = cig)

Calculate the leverage of each data point with the command:

> lev <- lm.influence(fm1)$hat

(The R syntax here is that lm.influence is a function that calculates a variety of...
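The sketch below shows what the leverages returned by lm.influence(fm1)$hat are. It fits a model with the same form to simulated data; the numbers are made up and are not the FTC measurements, and the names sim, fm_sim and lev_manual are hypothetical. It then checks that the leverages equal the diagonal of the hat matrix H = X(X'X)^{-1}X', where X is the design matrix including the intercept column. It uses only base R (lm, model.matrix, lm.influence, hatvalues).

```r
# Sketch: leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'.
# The data below are SIMULATED stand-ins for the cigarette data (hypothetical values).
set.seed(512)
n <- 25
sim <- data.frame(
  tar      = runif(n, 1, 30),
  nicotine = runif(n, 0.1, 2),
  weight   = runif(n, 0.8, 1.2)
)
sim$co <- 1 + 0.8 * sim$tar + rnorm(n)

# Same model form as fm1 in the lab
fm_sim <- lm(co ~ tar + nicotine + weight, data = sim)

# Leverages as computed in the lab
lev <- lm.influence(fm_sim)$hat

# The same quantities computed directly from the design matrix
X <- model.matrix(fm_sim)                 # n x p, including the intercept column
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
lev_manual <- diag(H)

all.equal(unname(lev), unname(lev_manual))          # TRUE: same leverages
all.equal(unname(hatvalues(fm_sim)), unname(lev))   # hatvalues() is an equivalent accessor
sum(lev)                                            # trace(H) = p = 4 (intercept + 3 slopes)
```

Because the leverages sum to the number of model parameters, the average leverage is p/n (here 4/25). This gives a useful reference point when you decide which cigarette brands have unusually high leverage.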
- Spring '07