An Introduction to R
Notes on R: A Programming Environment for Data Analysis and Graphics
Version 3.2.2 (2015-08-14)
W. N. Venables, D. M. Smith
and the R Core Team
This manual
Copyright ©
is for R, version

2
Simple Linear Regression
In this chapter we develop the basic theory for a simple linear regression, in which there is one y variable and one x variable. The least squares
estimates of the intercept and slope are derived, as well as their sampling
distr
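A minimal R sketch of the least squares estimates mentioned above, using simulated data. The values and the variable names `x`, `y`, `b0`, `b1` are illustrative assumptions, not the book's example. The slope is $\hat\beta_1 = S_{xy}/S_{xx}$ and the intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The slope's standard error follows from $\operatorname{Var}(\hat\beta_1) = \sigma^2/S_{xx}$, with $\sigma^2$ estimated on $n-2$ degrees of freedom. The results are checked against `lm()`.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20)
n <- length(y)

# Closed-form least squares estimates
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- sxy / sxx                    # slope estimate
b0 <- mean(y) - b1 * mean(x)       # intercept estimate

# Residual variance on n - 2 degrees of freedom, and the slope's
# standard error from Var(b1) = sigma^2 / Sxx
s2 <- sum((y - b0 - b1 * x)^2) / (n - 2)
se_b1 <- sqrt(s2 / sxx)

# Compare with R's built-in fit
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1, se_b1 = se_b1))
print(coef(summary(fit)))
```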

8
Analysis of Designed Experiments
8.1
Introduction
In this chapter we discuss experiments whose main aim is to study and
compare the effects of treatments (diets, varieties, doses) by measuring
response (yield, weight gain) on plots or units (points, subje
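As a sketch of the simplest such comparison, a one-way analysis of variance for three treatments can be run in R with `aov()`. The data below are simulated and the treatment labels are hypothetical. The F statistic is also computed by hand from the between- and within-treatment sums of squares to show what `aov()` reports.

```r
# Hypothetical responses for three treatments, six units each (simulated)
set.seed(2)
treatment <- factor(rep(c("A", "B", "C"), each = 6))
true_means <- c(A = 20, B = 22, C = 21)
response <- rnorm(18, mean = true_means[as.character(treatment)], sd = 2)

# One-way analysis of variance via aov()
fit <- aov(response ~ treatment)
print(summary(fit))

# The same F statistic from between- and within-treatment sums of squares
k <- nlevels(treatment)
n <- length(response)
group_means <- tapply(response, treatment, mean)
group_sizes <- tapply(response, treatment, length)
ss_between <- sum(group_sizes * (group_means - mean(response))^2)
ss_within <- sum((response - group_means[as.character(treatment)])^2)
F_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
print(F_manual)
```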

7
Miscellaneous Topics in Regression
This chapter presents four topics which fill in a few gaps in the previous chapters, but which also introduce some more advanced concepts
which would need a separate course to be properly developed. Section 7.1
covers we

6
Two Case Studies
This chapter presents two extended case studies that utilize multiple
regression in contexts rather more involved than in the examples considered
so far in this book. The first of these continues the discussion of Chapter
1, about air pol

5
Diagnostics for Model Selection
An important part of any practical regression analysis is the selection of
a suitable model for the data. This chapter is concerned with three aspects
of model selection. First, we review the various techniques that are a

4
Diagnostics for Influential
Observations
A major trend of the last thirty years, as statistical analysis has become more automated, has been the development of diagnostic measures
which can indicate when the assumptions of the analysis are invalid. This
c
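Several of the standard diagnostic measures are available directly in base R. The sketch below uses simulated data with one deliberately planted unusual point. It computes leverages (`hatvalues`), externally studentized residuals (`rstudent`) and Cook's distances (`cooks.distance`). The cutoffs used to flag points are common rules of thumb, assumed here for illustration, and texts differ on them.

```r
# Simulated data (hypothetical): 29 well-behaved points plus one
# high-leverage point that departs from the linear trend
set.seed(3)
x <- c(rnorm(29), 4)
y <- c(1 + 2 * x[1:29] + rnorm(29), 1)
fit <- lm(y ~ x)

h <- hatvalues(fit)        # leverages h_ii
r <- rstudent(fit)         # externally studentized residuals
d <- cooks.distance(fit)   # Cook's distances

n <- length(y)
p <- length(coef(fit))
# Rule-of-thumb cutoffs (an assumption; conventions vary between texts)
flag <- which(h > 2 * p / n | abs(r) > 2 | d > 4 / n)
print(round(cbind(h = h, r = r, d = d)[flag, , drop = FALSE], 3))
```

The leverages always sum to $p$, the number of regression coefficients, which is why $2p/n$ (twice the average leverage) is a natural reference value.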

3
Multiple Regression
Chapter 2 has introduced us to many of the fundamental concepts of
regression through the simple linear regression model, in which there is a
single y variable which we want to predict, and a single x variable which
we use as a predi

Homework 5 (due Tuesday 12/6)
Reading: S & Y book: Chap 7 & 8; Faraway book: Chap 6 & 9
(A) Faraway book: Chap 9, Exercise 4; see HW5-faraway.pdf for the description.
(B) S & Y book: Chap 7, Exercises 1 & 3
(C) S & Y book: Chap 8, Exercise 1

STOR 664 Homework 5
Solution
Part A. Exercises (Faraway book)
Ch.9 Ex.4
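# Set up the data and compute training and test RMSEs for each model.
library(faraway)
data(fat)
# Hold out every 10th observation (rows 10, 20, up to 250) as the test set
index <- seq(10, 250, by = 10)
# Drop columns 1, 3 and 8 from both the training and test sets
train <- fat[-index, -c(1,3,8)]
test <- fat[index, -c(1,3,8)]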
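The RMSE comparison described in the comment above can be sketched as follows. To keep the example self-contained, it uses simulated data in place of `fat`. The helper `rmse()` and the two candidate models are hypothetical. With the real data, one would fit each model to `train` and evaluate it on `test` in the same way.

```r
# Hypothetical helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Simulated stand-in for the data (illustration only)
set.seed(4)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- 1 + d$x1 - 0.5 * d$x2 + rnorm(100)

# Hold out every 10th row as the test set
index <- seq(10, 100, by = 10)
train <- d[-index, ]
test <- d[index, ]

# Two candidate models, fitted to the training set only
models <- list(full = lm(y ~ ., data = train),
               small = lm(y ~ x1, data = train))

# Training RMSE from fitted values; test RMSE from predictions on held-out rows
res <- sapply(models, function(m) c(
  train = rmse(train$y, fitted(m)),
  test  = rmse(test$y, predict(m, newdata = test))
))
print(round(res, 3))
```

The larger model always has the lower training RMSE because the models are nested, so only the test RMSE gives a fair comparison.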

STOR 664 Midterm Exam Fall 2012
Name:
ID#:
I pledge that I have neither given nor received unauthorized aid on this exam.
Signature:
(1) Consider a one-way ANOVA model. There are 3 small classes with sizes $n_1 = 10$, $n_2 = 12$ and $n_3 = 15$ stu

Midterm Review
STOR 664, Fall 2012
10/11/2012
664-Fall-2012
midterm review
Midterm Exam
Tuesday 10/16/2012, 12:30-1:45 pm, in class
closed book and notes, a single formula sheet
(double-sided) and scratch papers allowed, calculator
needed, stat. tables to

#
# Amherst example
# The first four lines read in the data, define n to be
# the number of observations ("temp" is one of the variables)
# and then perform a regression. You should modify these
# four lines for your own application.
amh<-read.table(fil

norm.test <- function(y, x, nsim=1000)
{
# y: response; x: covariates
# Program to perform goodness of fit tests for regression data
# The next lines compute the Looney-Gulledge statistic c1, the
# Kolmogorov-Smirnov statistic d1, the Cramer-von Mis
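The function above is cut off, so its exact definitions are not shown. The sketch below is a hypothetical stand-in (`norm_test_sketch` is not the original `norm.test`) that illustrates the general idea of simulation-based p-values for normality statistics of regression residuals. It assumes the Looney-Gulledge statistic is the correlation between the ordered residuals and normal scores, with Blom's plotting positions assumed here. It also computes the Kolmogorov-Smirnov distance of the residuals scaled by $s$. The Cramer-von Mises statistic is omitted.

```r
# Hypothetical sketch, not the original norm.test():
# simulation p-values for two normality statistics of regression residuals.
norm_test_sketch <- function(y, x, nsim = 1000) {
  x <- as.matrix(x)
  n <- length(y)
  scores <- qnorm((1:n - 3/8) / (n + 1/4))       # Blom normal scores (assumed)

  stats <- function(yy) {
    e <- residuals(lm(yy ~ x))
    z <- e / sqrt(sum(e^2) / (n - ncol(x) - 1))  # residuals scaled by s
    c(c1 = cor(sort(e), scores),                 # correlation statistic
      d1 = unname(ks.test(z, "pnorm")$statistic))  # Kolmogorov-Smirnov distance
  }

  obs <- stats(y)
  # Under normal errors both statistics are free of beta and sigma,
  # so null samples can be simulated with standard normal responses.
  sims <- replicate(nsim, stats(rnorm(n)))
  c(c1 = obs[["c1"]], p_c1 = mean(sims["c1", ] <= obs[["c1"]]),  # small c1 = non-normal
    d1 = obs[["d1"]], p_d1 = mean(sims["d1", ] >= obs[["d1"]]))  # large d1 = non-normal
}

# Usage on simulated data with normal errors
set.seed(5)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50)
out <- norm_test_sketch(y, x, nsim = 200)
print(round(out, 3))
```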

CHAPTER 2
SIMPLE LINEAR REGRESSION
Examples:
1. Amherst, MA, annual mean temperatures,
1836-1997
2. Summer mean temperatures in Mount Airy
(NC) and Charleston (SC), 1948-1996
Scatterplots: outliers? influential values?
independent v. dependent variables
Me

STOR 664: FINAL EXAM
DECEMBER 14 2009
This is an open book exam. Course text, personal notes and calculator are allowed. You have
3 hours to complete the exam. Answers should preferably be written in blue books.
SHOW ALL WORKING: You are not expected to p

HOMEWORK 5 SOLUTIONS
Problem 5.13
(a) Model 1 v. 2: the F statistic is
$$F = \frac{(70.11949 - 0.49895)/5}{0.49895/35} \approx 977,$$
which is obviously significant (with such a large F statistic, there is no need to look it up
in a table). Conclusion: Yes, arrest numbers do vary by type
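The nested-model F statistic used in (a), $F = \dfrac{(\mathrm{SSE}_1 - \mathrm{SSE}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{SSE}_2/\mathrm{df}_2}$, can be checked in R against `anova()`. The sketch below uses simulated data with a hypothetical six-level factor `type`, so the numerator has 5 degrees of freedom. It does not reproduce the homework data.

```r
# Simulated data (hypothetical): a response measured under six types
set.seed(6)
type <- factor(rep(paste0("T", 1:6), each = 7))
y <- rnorm(42, mean = as.numeric(type))

m1 <- lm(y ~ 1)      # Model 1: common mean
m2 <- lm(y ~ type)   # Model 2: a separate mean for each type

# F = [(SSE1 - SSE2) / (df1 - df2)] / (SSE2 / df2)
sse1 <- deviance(m1); df1 <- df.residual(m1)
sse2 <- deviance(m2); df2 <- df.residual(m2)
F_stat <- ((sse1 - sse2) / (df1 - df2)) / (sse2 / df2)
p_value <- pf(F_stat, df1 - df2, df2, lower.tail = FALSE)
print(c(F = F_stat, df_num = df1 - df2, df_den = df2, p = p_value))

# R's nested-model comparison reports the same F statistic
print(anova(m1, m2))
```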

HOMEWORK 6 SOLUTIONS
Problem 7.2
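(a) We have
$$X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}, \qquad X^TX = \begin{pmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{pmatrix},$$
so the normal equations $X^TX\hat\beta = X^Ty$, with $X^Ty = \left(\sum y_i,\ \sum y_i x_i,\ \sum y_i x_i^2\right)^T$, lead to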
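For a quadratic model, the normal equations can be formed and solved numerically. This R sketch builds $X$ with rows $(1, x_i, x_i^2)$ from simulated data, solves $X^TX\hat\beta = X^Ty$ with `solve()`, and checks the result against `lm()`. The $x$ values are assumed symmetric about 0, so $\sum x_i$ and $\sum x_i^3$ vanish (up to rounding) and $X^TX$ has zero entries. This is an assumption of the example, not necessarily of the problem.

```r
# Simulated data (hypothetical) for a quadratic regression
set.seed(7)
x <- seq(-2, 2, length.out = 25)
y <- 1 + 0.5 * x - 0.8 * x^2 + rnorm(25, sd = 0.3)

X <- cbind(1, x, x^2)               # design matrix with rows (1, x_i, x_i^2)
XtX <- crossprod(X)                 # X^T X
Xty <- crossprod(X, y)              # X^T y
beta_hat <- drop(solve(XtX, Xty))   # solve the normal equations

print(round(XtX, 10))   # odd-power sums are zero here because x is symmetric about 0
print(beta_hat)
print(coef(lm(y ~ x + I(x^2))))
```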

STOR 664: MIDTERM EXAM
OCTOBER 19 2009
This is an open book exam. Course text, personal notes and calculator are allowed. You have
75 minutes to complete the exam. Answers should preferably be written in blue books.
SHOW ALL WORKING: You are not expected

[Figure: Buchanan votes (vertical axis) against Bush votes (horizontal axis) for Florida counties, with Palm Beach labelled]
Objective
Buchanan's vote of 3407 in Palm Beach County appears to be a
gross outlier compared with his votes in other Florida counties.
Crude analyse

Final Review
STOR 664, Fall 2012
12/4/2012
664-Fall-2012
final review
Final Exam
Tuesday 12/11/2012, 12:00-3:00 pm, in class
closed book and notes, two formula sheets (double-sided)
and scratch papers allowed, calculator needed, stat. tables
to be provided

STOR 664: APPLIED STATISTICS
FINAL EXAM
4:00pm-7:00pm, Dec. 13, 2010
Time allowed: 3 hours.
This is a closed-book, closed-note exam, except for two pages (8.5 x 11 inches) of double-sided
notes. You are expected to use your own calculator.
The exam is expected to

CHAPTER 8
ANALYSIS OF DESIGNED
EXPERIMENTS
Discuss experiments whose main aim is to study
and compare the effects of treatments (diets,
varieties, doses) by measuring response (yield,
weight gain) on plots or units (points, subjects, patients). In general t

STATISTICS 174: APPLIED STATISTICS
MIDTERM EXAM
OCTOBER 15, 2003
Time allowed: 75 minutes.
This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator. Answers should preferably be written in a blu

CHAPTER 6: TWO CASE STUDIES
AIR POLLUTION AND DAILY
MORTALITY IN BIRMINGHAM,
ALABAMA
This analysis is intended to illustrate regression analysis in the context of an issue of much
public interest. We are not claiming that all of
the checks and tests given

CHAPTER 5
DIAGNOSTICS FOR MODEL
SELECTION
5.1 Variable Selection
Given $P$ possible regressors, choose a subset of $p \le P$ of them, with $p$ initially undetermined.
Simplest solution: fit all $2^P$ possible models
and compare their error sums of squares (SSEs).
By computing best mo
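The all-subsets approach can be sketched in a few lines of R. The data are simulated with $P = 4$ hypothetical regressors. Every one of the $2^P = 16$ models is fitted, its SSE is recorded, and the smallest-SSE model of each size $p$ is reported. Because SSE never increases when a regressor is added, comparing SSEs is only meaningful among models of the same size.

```r
# Simulated data (hypothetical) with P = 4 candidate regressors
set.seed(8)
n <- 50
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 + X[, "x1"] - X[, "x3"] + rnorm(n)
P <- ncol(X)

# Every subset of the P regressors, one logical row per model (2^P rows)
S <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), P)))

# Fit each model and record its error sum of squares
sse <- apply(S, 1, function(keep) {
  fit <- if (any(keep)) lm(y ~ X[, keep, drop = FALSE]) else lm(y ~ 1)
  deviance(fit)
})
size <- rowSums(S)   # p = number of regressors in each model

# The smallest-SSE model of each size p = 0, 1, ..., P
best <- sapply(0:P, function(p) {
  i <- which(size == p)
  i[which.min(sse[i])]
})
for (i in best) {
  cat("p =", size[i], "| regressors:", colnames(X)[S[i, ]],
      "| SSE =", round(sse[i], 2), "\n")
}
```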