Homework #6:
P.825-833 (13.13, 13.14, 13.16, 13.34, 13.35), and the additional problems below.
Due Friday, March 20.
Some Notes on the Homework:
The data for problem 13.34 and additional problems 1, 2, 3, & 4 are available on the course webpage.
For p
Comments - Homework #6
1. How categorical variables enter models: In problem 13.13, an industry type variable
was defined as 0, 1, 2, or 3 depending on which of four industry types an observation
was, and you were asked to explain why this was not a good
Homework #7:
P.828-864 (13.6(a,b only), 13.24, 13.63, 13.64), and the additional problems
below. Due Friday, March 27.
Some Notes on the Homework:
The data for Problem 13.11 (aphid.txt) and additional problems 1 & 2 are on the course webpage.
Organize
Solutions - Homework #1
1. Problem 11.62
(a) The relationship between price and income is essentially linear, although the house
with the highest income deviates slightly from this linear pattern and the house
with the 2nd highest income deviates significan
Solutions - Homework #6
1. Problem 13.13
(a) If the industry variable is defined simply as 0, 1, 2, or 3 depending on the type of
industry, this imposes a linear ordering on the 4 industry categories that is totally
artificial in two respects. First, there
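A minimal sketch of the point made in part (a), using simulated data. The names `industry`, `group_means`, and `y`, and all of the numbers, are hypothetical and are not the textbook's data. Coding the industry as a number forces a single slope across the categories, while `factor()` lets each category have its own mean.

```r
# Hypothetical illustration; the data are simulated, not from Problem 13.13
set.seed(1)
industry <- rep(0:3, each = 10)              # numeric codes 0, 1, 2, 3
group_means <- c(10, 25, 12, 18)             # made-up means with no linear ordering
y <- group_means[industry + 1] + rnorm(40)   # simulated response

fit_num <- lm(y ~ industry)                  # treats the codes as a quantitative x
fit_fac <- lm(y ~ factor(industry))          # uses 3 indicator (dummy) variables

length(coef(fit_num))                        # intercept plus one common slope
length(coef(fit_fac))                        # intercept plus 3 indicator coefficients
```

Because the numeric-code model is a special case of the indicator-variable model, the residual sum of squares of `fit_fac` can never exceed that of `fit_num`.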
Homework #5:
P.738-754 (12.32, 12.49), & the 5 additional problems below. Due Monday, March 9.
Some Notes on the Homework
The data for additional problems 3, 4, & 5 are available on the course webpage.
For any test of significance performed, you should a
Homework #1:
P.648-655 (11.62, 11.63 (modified - see below), 11.64, 11.77, 11.78), and the
additional problems below. Due Friday, February 6.
Some Notes on the Homework:
The data for 11.62 and additional problems 1, 2, 3, & 4 are available on the course
Analysis of Variance (ANOVA) (8.1-8.2)
Much of statistical inference centers around the ability to distinguish between two or more
groups in terms of some underlying response variable y. For example, we may want to
estimate the difference in mean cortisone
Looking for LOF with Residual Plots
In previous lectures, we have discussed in vague terms three main steps in regression model
development. These three steps were given as:
1. Variable Selection - identifying all potentially important explanatory variabl
Nonlinear Regression Models (13.3)
Recall that a parametric model is said to be linear if it is a linear function of the model
parameters. Any model which is linear can be written in the form of a general linear model
(GLM) as:
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \epsilon, f
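As a hedged illustration of "linear in the model parameters": a quadratic in x is still a linear model, because it is linear in the coefficients, so `lm()` can fit it. The sketch below reuses the biomass and solar radiation values that appear in a later snippet of these notes. Choosing a quadratic for these data is an assumption made purely for illustration, not a model taken from the handout, and `fit_quad` is a hypothetical name.

```r
# Values copied from the biomass / solar radiation vectors in these notes
biom   <- c(16.6, 49.1, 121.7, 219.6, 375.5, 570.8, 648.2, 755.6)
solrad <- c(29.7, 68.4, 120.7, 217.2, 313.5, 419.1, 535.9, 641.5)

# Curved in x, but linear in the three coefficients, so this is still a linear model
fit_quad <- lm(biom ~ solrad + I(solrad^2))
coef(fit_quad)                               # intercept, linear, and quadratic terms
```

The `I()` wrapper is needed so that `^` is read as arithmetic squaring rather than as a formula operator.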
# Reads in the Pima Indian diabetes data
# =
library(faraway) # Loads faraway library
data(pima) # Loads pima data
# Scatterplot of diabetes incidence vs. glucose concentration
# =
plot(pima$glucose,pima$test, # Plots test outcome vs. glucose
xlab="Gluco
# Performs Kruskal-Wallis test of failure time data
# =
time <- c(105,3,90,217,22,76,43,1,37,14, # Vector of failure times
183,144,219,76,39)
location <- as.factor(rep(1:3,each=5)) # Vector of location levels
kruskal.test(time~location) # Conducts Kruskal
The Factorial Design (14.3-14.5)
This handout introduces the Factorial Design, the purpose of such a design, model form,
analysis, and interpretations which can be made. To this point, we have studied experimental designs such as the completely randomized
Kruskal-Wallis Test (8.6)
As mentioned in the homework, the Kruskal-Wallis test is a nonparametric alternative to
the 1-way ANOVA procedure for comparing distributions for multiple populations.
It is used primarily when the normality or variance homogene
nic <- read.csv("Data/nicotine.txt",header=T) # Reads in nicotine data
par(mfrow=c(1,2)) # Creates 1x2 graphics window
boxplot(leafsize~condition,xlab="Condition",data= # Boxplot for leaf sizes by
nic,ylab="Leaf Size",cex.lab=1.5,cex.axis=1.5) # conditi
Examining the ANOVA Variance Homogeneity Assumption (7.4, 8.4)
In ANOVA problems, variance homogeneity refers to having equal population variances for
each of the t populations or treatments considered. Two of the more common ways of testing
this homogene
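The handout's own two tests are cut off above, so the following is only a sketch of one option available in base R, Bartlett's test, together with the per-group sample variances. It reuses the failure time data from the Kruskal-Wallis script in these notes. Treating Bartlett's test as relevant here is an assumption, not something the handout confirms, and `group_var` and `bt` are hypothetical names.

```r
# Failure time data copied from the Kruskal-Wallis script in these notes
time <- c(105, 3, 90, 217, 22, 76, 43, 1, 37, 14,
          183, 144, 219, 76, 39)
location <- as.factor(rep(1:3, each = 5))    # 3 locations, 5 observations each

group_var <- tapply(time, location, var)     # sample variance within each location
group_var

# Bartlett's test of equal variances (an assumed choice; it is sensitive to non-normality)
bt <- bartlett.test(time ~ location)
bt$p.value
```

A small p-value would suggest that the population variances differ across locations. With only five observations per group, the test has little power, so the sample variances themselves are worth inspecting as well.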
# Scatterplot & Residual plot for Extinction data
# =
extinct <- read.csv("Data/extinct.txt", # Reads in the extinction
header=T) # data
logtime <- log(extinct$exttime) # Vector of log extinction times
logpairs <- log(extinct$numpairs) # Vector of log nu
Model Diagnostics
The term "model diagnostics" refers to methodology for examining whether or not there are
problems in a model. We will discuss problems of two basic types: those involving the model
assumptions, and those involving specific observations (outl
biom <- c(16.6,49.1,121.7,219.6,375.5,570.8, # Vector of biomass values
648.2,755.6)
solrad <- c(29.7,68.4,120.7,217.2,313.5, # Vector of solar radiation values
419.1,535.9,641.5)
plot(solrad,biom,xlab="Solar Radiation (x)", # Plots biomass vs. solar r
Inference in Multiple Regression w/Example (12.3, 12.4)
The purpose of this handout is to consider modeling some data with a multiple linear regression model and to discuss some of the types of inferences one can make with regard to the
model parameters,