# This last lab will discuss the tests that arise in the context of
# ANOVA and regression, namely the F-test and the t-test for the
# regression coefficients, as well as the Confidence Interval (CI) and the
# Prediction Interval (PI) applied to the predicted value.
# 1) Let's see how we do the F-test introduced for doing 1-way
# (or 1-factor) ANOVA in Ch 9. Recall that the main question is whether
# k means are all equal. I.e.,
# H0: mu1 = mu2 = ... = muk,
# H1: At least two of the mu's are different.
# Here we will reproduce Table 9.1, on page 411.
# Note that the data in 9_1_dat.txt are entered in a form that is consistent
# with what I was saying about ANOVA and regression being similar, i.e.,
# 1st column is x and 2nd column is y.
dat <- read.table("http://www.stat.washington.edu/marzban/390/9_1_dat.txt",
                  header=TRUE)
aov.1 <- aov(Vibration ~ as.factor(Brand), data=dat)
summary(aov.1)
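To see exactly what numbers summary(aov.1) is reporting, here is a sketch of the one-way ANOVA F statistic computed by hand. It uses a small made-up data set (NOT the textbook's Table 9.1 data, so the numbers are hypothetical), which lets it run on its own even if the course URL is unavailable.

```r
# Hand computation of the one-way ANOVA F statistic on made-up data.
set.seed(1)
toy <- data.frame(Brand = factor(rep(1:3, each = 5)),
                  Vibration = c(rnorm(5, 13), rnorm(5, 16), rnorm(5, 13.5)))
k <- nlevels(toy$Brand)                      # number of groups
n <- nrow(toy)                               # total sample size
grand     <- mean(toy$Vibration)             # grand mean
grp.means <- tapply(toy$Vibration, toy$Brand, mean)
grp.n     <- tapply(toy$Vibration, toy$Brand, length)
SSTr <- sum(grp.n * (grp.means - grand)^2)               # between-group SS
SSE  <- sum((toy$Vibration - grp.means[toy$Brand])^2)    # within-group SS
F.stat <- (SSTr / (k - 1)) / (SSE / (n - k))             # F = MSTr / MSE
p.val  <- pf(F.stat, k - 1, n - k, lower.tail = FALSE)
F.stat
p.val
```

These two numbers match the "F value" and "Pr(>F)" entries in the table that summary(aov(...)) prints for this toy data set.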
# Make sure you compare the output here
# with what we got on page 13 of lecture 28.
# You can skip this commented block, but note that similar results can be
# obtained from general linear models (glm), which constitute a generalization
# of linear regression.
#
glm.1 <- glm(Vibration ~ as.factor(Brand), data=dat)
#
aov.2 <- anova(glm.1)
#
aov.2
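The "ANOVA and regression are similar" point can also be made with plain lm(). The following sketch, on a small made-up data set (hypothetical numbers, not the textbook data), shows that the lm() intercept is the baseline group's mean and each dummy coefficient is the difference between a group mean and the baseline mean, while anova() on the fit reproduces the F table.

```r
# ANOVA viewed as regression on dummy variables, on made-up data.
toy <- data.frame(Brand = factor(rep(1:3, each = 4)),
                  Vibration = c(12, 14, 13, 12, 17, 16, 18, 17,
                                13, 14, 13, 14))
fit <- lm(Vibration ~ Brand, data = toy)
grp.means <- tapply(toy$Vibration, toy$Brand, mean)
coef(fit)[1]       # intercept = mean of the baseline group (Brand 1)
coef(fit)[-1]      # dummy coefs = group means minus the baseline mean
anova(fit)         # same F table as aov() / anova(glm())
```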
# Given the really small p-value (0.00018), we reject the null in favor of the
# alternative. I.e., at least 2 of the means are statistically different
# from the rest. Which two? Section 9.3 shows how to identify the ones
# that are statistically equivalent, but we skipped it. Visually,
# you can look at the following boxplots. This plot is a better
# version of what the book calls the "effects plot" on page 416.
# It's better because it shows not just the mean, but the 5-number
# summary at each level of x.
boxplot(Vibration ~ Brand, data=dat)
# This allows for a visual comparison of the distribution of the
# 5 populations. The p-value told us that at least 2 of the means are
# different. It's evident, for example, that the population means of
# brand 2 and 5 are probably different.
# The following performs Tukey's method (section 9.3) for identifying the
# different means. Although we skipped it, the results are easy to interpret.
# It gives CIs and p-values for pairwise tests of population means.
# Recall, if the CI does NOT include zero, then we "conclude" that the
# two means being tested are different.
library(stats)
tuk.1 <- TukeyHSD(aov.1, conf.level=0.99)
tuk.1
# Study this output! The lower bound (lwr) and upper bound (upr) are
# given for the difference in the mean of two pops. These values are
# affected by conf.level in TukeyHSD(). Then, look at the p-values in
# the last column; they test the H1 that the two means are different.
# At alpha=0.01, it's evident that the means of brand 1 and 2 are different.
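The "CI excludes zero" check can also be done in code rather than by eye. The sketch below runs Tukey's method on a small made-up data set (hypothetical numbers, not the textbook data, so it runs stand-alone) and flags the pairs whose 99% CIs exclude zero; with the lab's fit you would pass aov.1 instead.

```r
# Flag which pairwise Tukey CIs exclude zero, on made-up data.
set.seed(2)
toy <- data.frame(Brand = factor(rep(1:3, each = 6)),
                  Vibration = c(rnorm(6, 13, 1), rnorm(6, 17, 1),
                                rnorm(6, 13.2, 1)))
tk <- TukeyHSD(aov(Vibration ~ Brand, data = toy), conf.level = 0.99)$Brand
# A pair of means is declared different when its CI excludes 0,
# i.e. when lwr and upr have the same sign.
different <- tk[, "lwr"] * tk[, "upr"] > 0
tk[different, , drop = FALSE]
```

For a CI-based decision at conf.level = 0.99, "CI excludes zero" agrees with "p adj < 0.01" in the last column of the Tukey output.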