# This last lab will discuss the two tests that arise in the context of
# ANOVA and regression, namely the F-test, the t.test for the
# regression coefficients, and the Confidence Interval (CI) and the
# Prediction Interval (PI) applied to the predicted value.
# 1) Let's see how we do the F-test introduced for doing 1-way
# (or 1-factor) ANOVA in Ch 9.
Recall that the main question is if
# k means are all equal. I.e.,
# H0: mu1=mu2=.
..=muk,
# H1: At least two of the mu's are different.
# Here we will reproduce Table 9.1, on page 411.
# Note that the data in 9_1_dat.txt are entered in a form that is consistent
# with what I was saying about ANOVA and regression being similar, i.e.,
# 1st column is x and 2nd column is y.
dat <-
read.table("http://www.stat.washington.edu/marzban/390/9_1_dat.txt",header=TRUE)
aov.1 <- aov(Vibration ~ as.factor(Brand), data=dat)
summary(aov.1)
# Make sure you compare the output here
# with what we got on page 13 of lecture 28.
# You can skip this commented block, but note that similar results can be
# obtained from general linear models (glm), which constitute a generalization
# of linear regression.
#
glm.1 <- glm(Vibration ~ as.factor(Brand), data=dat)
#
aov.2 <- anova(glm.1)
#
aov.2
# Given the really small p-value (.00018), we reject the null in favor of the
# alternative. I.e., at least 2 of the means are statistically different
# from the rest. Which two? Section 9.3 shows how to identify the ones
# that are statistically equivalent, but we skipped it. Visually,
# you can look at the following boxplots. This plot is a better
# version of what the book calls the "effects plot" on page 416.
# It's better because it shows not just the mean, but the 5-number
# summary at each level of x.
boxplot(Vibration ~ Brand, data=dat)
# This allows for a visual comparison of the distribution of the
# 5 populations. The p-value told us that at least 2 of the means are
# different. It's evident, for example, that the population means of
# brand 2 and 5 are probably different.
# The following performs Tukey's method (section 9.3) for identifying the
# different means. Although we skipped it, the results are easy to interpret.
# It gives CIs and p-values for pairwise tests of population means.
# Recall, if the CI does NOT include zero, then we "conclude" that the
# two means being tested are different.
library(stats)
tuk.1 <- TukeyHSD(aov.1, conf.level=0.99);
tuk.1
# Study this output! The lower-bound (lwr) and upper-bound (upr) are
# given for the difference in the mean of two pops. These values are