When to use regression analysis
Goodness-of-fit: how well does the model fit the data?
Estimation example
This example concerns the number of species of tortoise on the various
Galapagos Islands. There are 30 case
Lecture One
Before you start
Statistics starts with a problem, continues
with the collection of data, proceeds with
the data analysis and finishes with
conclusions.
It is a common mistake of inexperienced
statisticians to plunge into a complex
analysis wi
Parameter Estimation
Here is a data set concerning the number of
species of tortoise on the various Galapagos
Islands. There are 30 cases (Islands) and 7
variables in the data set. We start by
reading the data into R.
> gala <- read.table("gala.data") # r
Diagnostics 2
Residual Plots
We again use the savings data as an example:
> savings <- read.table("saving.txt") # read the data into R
> g <- lm(sav ~ p15 + p75 + inc + gro, data=savings) # fit the model with sav as the response and the rest variable
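A minimal sketch of the kind of residual plots this lab works toward. Because saving.txt is not reproduced here, the sketch assumes that R's built-in LifeCycleSavings data frame (columns sr, pop15, pop75, dpi, ddpi) is an acceptable stand-in for the savings data; the object names fit, res and fv are illustrative only.

```r
# Assumption: LifeCycleSavings (shipped with R) stands in for saving.txt.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

res <- residuals(fit)  # raw residuals
fv  <- fitted(fit)     # fitted values

# Residuals against fitted values: look for curvature or a funnel shape.
plot(fv, res, xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)

# Absolute residuals against fitted values: a rough check for
# non-constant variance.
plot(fv, abs(res), xlab = "Fitted", ylab = "|Residuals|")
```

If the points scatter evenly around the zero line with no visible pattern, the constant-variance and linearity assumptions are at least not contradicted by these plots.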
Diagnostics 1
Residual
We'll use the savings data (with country names) as an example here.
First fit the model and make an index plot of the residuals:
> saving.x <- read.table("saving.txt",header=T) # read the data into R
> p15 <- saving.x[,1];
> p75 <-
Identifiability
Now, consider the savings data we analyzed in the previous lab:
> saving.x <- read.table("saving.txt",header=T) # read the data into R
> p15 <- saving.x[,1];
> p75 <- saving.x[,2];
> inc <- saving.x[,3];
> gro <- saving.x[,4];
> sav <- saving.
Generalized least squares
We'll use a built-in R dataset, Longley's regression data, where
the response is the number of people employed, yearly from 1947 to 1962, and
the predictors are
o GNP implicit price deflator (1954=100),
o GNP,
o unemployed,
o a
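A rough sketch of generalized least squares on the built-in longley data, written in base R. Several things here are assumptions rather than the lab's own choices: the predictors (GNP and Population), the AR(1) form of the error correlation, and the use of the lag-1 correlation of the OLS residuals as a plug-in estimate of rho.

```r
# Assumptions: predictors GNP and Population, AR(1) errors, and rho
# estimated from the OLS residuals. None of these is confirmed by the lab.
data(longley)

X <- model.matrix(~ GNP + Population, data = longley)
y <- longley$Employed

# Step 1: ordinary least squares, to get residuals.
ols <- lm(Employed ~ GNP + Population, data = longley)
r <- residuals(ols)
n <- length(r)

# Step 2: crude estimate of the AR(1) parameter (lag-1 correlation).
rho <- cor(r[-1], r[-n])

# Step 3: AR(1) correlation matrix, Sigma[i, j] = rho^|i - j|.
Sigma <- rho^abs(outer(1:n, 1:n, "-"))
Sigma_inv <- solve(Sigma)

# Step 4: GLS estimate, (X' Sigma^-1 X)^-1 X' Sigma^-1 y.
beta_gls <- solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% y)
```

Comparing beta_gls with coef(ols) shows how much the estimates move once serial correlation in the yearly errors is taken into account.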
Confidence Interval and Region
Now, consider the savings data we analyzed in the previous lab:
> saving.x <- read.table("saving.x", header=T) # read the data into R
> p15 <- saving.x[,1]; p75 <- saving.x[,2]; inc <- saving.x[,3]; gro <- saving.x[,4]; sav <- s
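A short sketch of individual confidence intervals for the regression coefficients (the joint confidence region is not covered here). It assumes R's built-in LifeCycleSavings data frame as a stand-in for the savings file, so the column names sr, pop15, pop75, dpi and ddpi differ from the lab's sav, p15, p75, inc and gro.

```r
# Assumption: LifeCycleSavings (shipped with R) stands in for the savings file.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# 95% confidence intervals for all coefficients.
ci <- confint(fit, level = 0.95)

# The same interval for pop75 built by hand: estimate +/- t * standard error.
tab   <- coef(summary(fit))
est   <- tab["pop75", "Estimate"]
se    <- tab["pop75", "Std. Error"]
tcrit <- qt(0.975, df.residual(fit))  # t quantile on the residual df
manual <- c(est - tcrit * se, est + tcrit * se)
```

The row ci["pop75", ] and the vector manual should agree, which makes explicit that confint() uses the t distribution with the residual degrees of freedom.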
Hypothesis Test
We will illustrate with a dataset called "saving.txt".
Savings Rates for Countries
SUMMARY:
The savings data set comes from unpublished data of Arlie
Sterling. It is a matrix with 50 rows representing countries and 5
columns representing
Introduction to R
> 2+3 # R can be used as a simple calculator
> exp(1) # All the usual calculator functions are available
> pnorm(1.645) # the normal probability function
The assignment operator is "<-"
> x <- 2 # assign the value 2 to x
> y <- 3
# y is
Homework 4 (due Dec 18th)
1. Read Chapter 12
2. Analyze the Chicago data for south and north separately, including model fitting,
residual plots, and identification of influential points. Compare the results for north and
south and write down your conclusions.
3. Infa
Homework 3 (due Dec 4th)
1. Read Chapter 10
2. Air Pollution and Mortality problem (dataset: smsa.txt):
Researchers at General Motors collected data on 60 U.S. Standard Metropolitan
Statistical Areas (SMSAs) in a study of whether or not air pollution cont