Stat 704: Data Analysis I, Fall 2011
Generalized linear models
Generalize regular regression to non-normal data $\{(Y_i, x_i)\}_{i=1}^n$,
most often Bernoulli or Poisson $Y_i$.
The general theory of GLMs has been developed for outcomes in
the exponential famil
One-sample normal hypothesis testing,
paired t-test, two-sample normal inference,
normal probability plots
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I
Hypothesis testing
We may perform a t-test to
Nonparametric tests
Nonparametric one and two-sample tests
If data do not come from a normal population (and if the sample
is not large), we cannot use
Chapter 1
Functional versus stochastic relations
Model: a mathematical approximation of the relationship among
real quantities (equation & assumptions a
Chapters 1 and 2
Toluca data (p. 19)
Toluca makes replacement parts for refrigerators.
We consider one particular part, manufactured in varying lot
siz
Chapter 2
2.7 Analysis of variance approach to regression (pp. 63–72)
If $x$ is useless, i.e. $\beta_1 = 0$, then $E(Y_i) = \beta_0$. In this case $\beta_0$ is
estimated by $\bar{Y}$.
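A minimal numerical sketch of this point, assuming made-up responses: under the reduced model with no slope, the least-squares criterion is the sum of squared deviations from a single constant, and the sample mean minimizes it. The function name `sse_intercept_only` and the data are illustrative only and do not come from the slides.

```python
def sse_intercept_only(y, b0):
    """Sum of squared errors for the reduced model E(Y_i) = b0 (no slope)."""
    return sum((yi - b0) ** 2 for yi in y)


# Hypothetical responses, chosen only for illustration.
y = [3.0, 5.0, 4.0, 8.0]
ybar = sum(y) / len(y)

# SSE at the sample mean; this quantity is usually called the total sum of squares.
sse_at_mean = sse_intercept_only(y, ybar)

# Moving the constant away from the sample mean in either direction increases the SSE.
sse_nearby = [sse_intercept_only(y, ybar + d) for d in (-1.0, -0.1, 0.1, 1.0)]
mean_is_best = all(s > sse_at_mean for s in sse_nearby)

print(ybar, sse_at_mean, mean_is_best)
```

The check over a few nearby values is only a spot check, not a proof; the general argument follows from setting the derivative of the criterion with respect to the constant to zero.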
Chapter 5
Chapter 3: Diagnostics
Section 3.1: Outlying x-values can be found via a boxplot (or a
scatterplot!). Useful for assessing extrapolation. More
a
Chapter 6 Multiple Regression
6.1 Multiple regression models
We now add more predictors, linearly, to the model. For example,
let's add one more to the si
Sections 3.9 and 6.8: Transformations
Transformations of variables (Section 3.9 & p. 236)
Some violations of our model assumptions may be fixed by
transfo
Sections 7.1, 7.2, 7.4, & 7.6
Chapter 7 example: Body fat
$n = 20$ healthy females 25–34 years old.
$x_1$ = triceps skinfold thickness (mm)
x2 = thigh circumf
Chapter 8
8.1 Polynomial regression
Used when the relationship between Y and the predictor(s) is
curvilinear.
Example: we might add a quadratic term to
Chapter 9 Model Selection and Validation
Salary example in proc glm
Model salary ($1000) as a function of age in years, years post-high
school education (
Lecture 23: Poisson Regression
Stat 704: Data Analysis I, Fall 2010
Tim Hanson, Ph.D.
University of South Carolina
Chapter 14
14.13 Poisson regression
Poisson regression
* Regular regression data
Stat 704: Multicollinearity and Variance Inflation Factors
Multicollinearity occurs when several of the predictors under consideration $x_1, x_2, \ldots, x_k$ are
highly correlated with other predictors. Problems arising when this happens include:
1. Adding/re
Stat 704 Data Analysis I
Probability Review
Course information
Logistics: LeConte College 210A, Tuesday & Thursday
3:30-4:45pm.
Instructor: Tim Hanson, LeConte 219C, phone 777-38
Stat 704: Midterm Exam, Tuesday October 18
1. Increased arterial blood pressure in the lungs can lead to heart failure in patients with
chronic obstructive pulmonary disease (COPD). Determining arterial lung pressure is
invasive, difficult, and can hurt the
Chapter 10: More diagnostics
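$\mathrm{PRESS}_p$ criterion
$$\mathrm{PRESS}_p = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{i(i)}\right)^2 = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2,$$
where $\hat{Y}_{i(i)}$ is the fitted value at $x_i$ with the ($x_i$,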
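The two forms of the PRESS criterion can be compared numerically. The sketch below assumes simple linear regression with made-up data, and uses the standard simple-regression leverage $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$. The function names are illustrative and not from the slides. One function refits the model n times with case i deleted; the other uses only the full-data residuals and leverages.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def press_leave_one_out(x, y):
    """PRESS by brute force: refit without case i, then predict case i."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def press_hat_values(x, y):
    """PRESS from one full-data fit: sum of (e_i / (1 - h_ii))^2."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = fit_slr(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)               # ordinary residual
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage in simple regression
        total += (e / (1.0 - h)) ** 2
    return total


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

press_refit = press_leave_one_out(x, y)
press_short = press_hat_values(x, y)
print(press_refit, press_short)
```

The two numbers should agree up to floating-point rounding, which is why the leverage form is usually preferred: it needs one fit rather than n.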
Lecture 24: Generalized Additive Models
Stat 704: Data Analysis I, Fall 2010
Generalized additive models
Additive predictors
Generalized additive mo
Stat 704, Fall 2011: Homework 1
Due Sept. 1
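1. Let
$Y_{11}, Y_{12}, \ldots, Y_{1n_1} \overset{iid}{\sim} N(\mu_1, \sigma_1^2)$,
independent of
$Y_{21}, Y_{22}, \ldots, Y_{2n_2} \overset{iid}{\sim} N(\mu_2, \sigma_2^2)$.
Let $\bar{Y}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} Y_{1i}$ and $\bar{Y}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} Y_{2i}$ be the sample means from the two populations.
(a) F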
Stat 704, Fall 2011: Homework 2
Due Tuesday Sept. 13
1. Regression through the origin: We will consider a special case of simple linear
regression where the intercept is assumed to be zero from the outset (this is often
assumed in calibration of certain me
Stat 704, Fall 2011: Homework 3
Due Tuesday Sept. 20
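Regression through the origin, again: Let
$Y_i = \beta x_i + \epsilon_i$,
where $E(\epsilon_i) = 0$ and $\mathrm{var}(\epsilon_i) = \sigma^2$.
(a) Write the model as $Y = X\beta + \epsilon$, defining each matrix/vector.
(b) Show that $\hat{\beta} = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = (X'X)^{-1} X'$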
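A small numerical check can accompany the algebra (it does not replace the requested derivation). The sketch below assumes made-up data and treats the design matrix as a single column of the x values, so that X'X is a 1-by-1 matrix. The helper names `transpose` and `matmul` are illustrative.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

X = [[xi] for xi in x]   # n-by-1 design matrix: no intercept column
Y = [[yi] for yi in y]   # n-by-1 response vector

XtX = matmul(transpose(X), X)   # 1-by-1: sum of x_i^2
XtY = matmul(transpose(X), Y)   # 1-by-1: sum of x_i * Y_i

# Inverting a 1-by-1 matrix is just division.
beta_matrix = XtY[0][0] / XtX[0][0]

# Scalar form of the same estimator.
beta_sums = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print(beta_matrix, beta_sums)
```

Because X has a single column, the matrix expression collapses to a ratio of two sums, which is the identity the exercise asks to establish in general.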
Stat 704, Fall 2011: Homework 4
Due Thursday Oct. 6
Carry out all hypothesis tests at the 5% significance level.
1. Consider the brand preference data of Problem 6.5.
(a) Obtain and report the scatterplot matrix; what does it tell you about the relationship
Stat 704, Fall 2011: Homework 5
Due Tuesday Oct. 25
1. Brand preference: 8.11 (a) and (b).
2. Commercial properties: 8.8 (a, you do not need to center $x_1$) and (c).
3. Assessed valuations: 8.24 (a), (b), and (c).
4. Kidney function: 9.15 (a), (b), and (c)
Stat 704, Fall 2011: Homework 6
Due Thursday Nov. 3
1. Brand preference: 10.5 (a) and (b); 10.9 (a, use $\alpha = 0.05$), (b), (c), (d), (e), (f,
instead examine regression effects with and without case 14), (g).
2. Commercial properties: 10.8 (a) and (b); 10.12 (a,
Stat 704: Homework 7, due Tuesday, Nov. 15
1. Weighted least squares: 11.6 (a,c,d,e,f).
2. Weighted least squares : 11.7(a,b,c,d,e,f). For 11.7(b) the SAS code might look something like this:
proc model data=d; parms b0 b1; y=b0+x1*b1; fit y / breusch=(1
Stat 704, Fall 2011: Homework 8, due Tuesday, Nov. 22
I posted some sample SAS code that does most of what you need for the following
problems.
Flu shots : 14.14(a,b,c), 14.20(b), 14.22(a,b), 14.28(b,c), 14.32(a,b), 14.36(a). For
14.28(a,b), use a first-or
Sections 2.11 and 5.8
Gesell data
Let X be the age in months at which a child speaks his/her first word
and let Y be the Gesell adaptive score, a measure of a c
Lecture 18: Weighted least squares & ridge regression
Stat 704: Data Analysis I, Fall 2010
Chapter 11
11.1 Unequal variance remedial measure: Weighted l
Lecture 19: Robust & Quantile regression
Stat 704: Data Analysis I, Fall 2010
Chapter 11
11.3 Influential cases remedial measure: Robust regression
11.3:
Chapter 6 Multiple Regression
6.7 CI for mean response and PI for new response
Let's construct a CI for the mean response corresponding to a set
of value