STAT5044: Regression and ANOVA
Homework #1: solution
Problem# 1. Refer to regression model. Assume that X = 0 is within the scope of
the model. What is the implication for the regression function if $\beta_0 = 0$ so that the
model is $Y_i = \beta_1 X_i + \varepsilon_i$? How would the
[Figure 1: The fitted line using the shipment route-number of ampules data (horizontal axis: x, vertical axis: y)]
STAT5044: Regression and ANOVA
The Solution of Homework #2
Inyoung Kim
Problem# 1. The shipment route (X) and the number of ampu
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Weighted Least Squares
Weighted Least Squares
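Only problem: nonconstant variance
Linear model with heterogeneity: $Y_i = \beta_0 + \beta_1 x_i + \sigma_i \varepsilon_i$
$\frac{Y_i}{\sigma_i} = \beta_0 \frac{1}{\sigma_i} + \beta_1 \frac{x_i}{\sigma_i} + \varepsilon_i$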
We want to minimize this form.
$\sum_i \big( y_i/\sigma_i - \beta_0$
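The weighted least squares idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard deviations (called `sigma` here) are known; the function name `wls_fit` and the toy data are hypothetical and do not come from the slides. Dividing the model through by each observation's standard deviation is equivalent to ordinary least squares with weights equal to the inverse variances.

```python
def wls_fit(x, y, sigma):
    """Weighted least squares for y_i = b0 + b1 * x_i + sigma_i * e_i.

    Minimizes sum_i ((y_i - b0 - b1 * x_i) / sigma_i)^2, i.e. least squares
    with weights w_i = 1 / sigma_i^2 (sigma_i assumed known).
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical noise-free data on the line y = 2 + 3x, with unequal sigma_i
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [2.0 + 3.0 * xi for xi in x]
b0, b1 = wls_fit(x, y, sigma)
print(b0, b1)
```

Because the toy data lie exactly on a line, any choice of positive weights recovers the same intercept and slope; with noisy data, observations with small `sigma` pull the fit harder.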
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. How to make an inference without normality assumption
Inference without normality in one population
Example 1: you want to estimate the median of a population but don't know the functional form of the densi
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Box-Cox transformation
2. Prediction Interval using Transformed model
Choosing a transformation
Log transformation: common for skewed data with heterogeneity and nonlinearity
Square root transformation
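As a small illustration of how the log and square root transformations fit into the Box-Cox family named in the outline, here is a sketch in Python. The function name `box_cox` is hypothetical; the formula is the usual one-parameter Box-Cox transform for a positive response, with the log as the limiting case at lambda = 0.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a positive value y.

    lam = 0 gives log(y); lam = 0.5 is a shifted and scaled square root.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


log_case = box_cox(math.e, 0)    # log transformation
sqrt_case = box_cox(4.0, 0.5)    # (sqrt(4) - 1) / 0.5
print(log_case, sqrt_case)
```

The shift and scaling by lambda keep the family continuous in lambda, which is what lets a single parameter range over log, square root, and no transformation (lambda = 1).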
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. How to check assumptions
Assumption
Linearity: scatter plot, residual plot
Randomness: runs test, Durbin-Watson test when the data can be arranged in time order.
Constant variance: scatter
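For the Durbin-Watson check mentioned above, the statistic itself is simple to compute from residuals arranged in time order. The sketch below is illustrative only: the function name `durbin_watson` and the residual vectors are hypothetical, and no critical values are computed.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals in time order
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # strong negative autocorrelation
persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])      # strong positive autocorrelation
print(alternating, persistent)
```

Values near 2 are consistent with uncorrelated errors; values toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation.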
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Prediction
Prediction
Two meanings
Predict the conditional mean of Y given $x_{new}$
Point estimate of the conditional mean: $\hat{\beta}_0 + \hat{\beta}_1 x_{new}$.
Predict a new observation Y given $x_{new}$
$Y = \beta_0 + \beta_1 x_{new} + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$
w
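The difference between the two meanings of prediction shows up in the standard errors. The Python sketch below fits a simple linear regression by least squares and computes both standard errors at a new point; the function name, the toy data, and the choice of `x_new = 3.5` are hypothetical. It uses the standard results that the variance for the conditional mean is MSE (1/n + (x_new - xbar)^2 / Sxx), and that predicting a new observation adds one extra MSE for the new error term.

```python
import math


def prediction_standard_errors(x, y, x_new):
    """Point estimate and standard errors at x_new for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # Mean squared error with n - 2 degrees of freedom
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    leverage = 1.0 / n + (x_new - xbar) ** 2 / sxx
    se_mean = math.sqrt(mse * leverage)          # for the conditional mean
    se_pred = math.sqrt(mse * (1.0 + leverage))  # for a new observation
    return b0 + b1 * x_new, se_mean, se_pred, mse


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, se_mean, se_pred, mse = prediction_standard_errors(x, y, 3.5)
print(y_hat, se_mean, se_pred)
```

Both intervals are centered at the same point estimate; the interval for a new observation is always wider because it also has to cover the new error term.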
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Testing
2. Confidence interval
3. ANOVA table
Testing procedure
Decide what question we want to test:
Null hypothesis and alternative hypothesis
Test statistic
Decision rule
Make conclusion.
Hypothesis
H0
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Matrix Expression
2. Linear and quadratic forms
3. Properties of quadratic form
4. Properties of estimates
5. Distributional properties
6. Distributional properties
Matrix Expression
If we have p variables x
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Regression
2. Properties of Least Squares Estimators: Gauss-Markov theorem
Regression
A way to model the relationship between dependent ( ) variable Y and independent ( ) variable X.
Regression
A way to
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Introduction to regression and anova
Statistics
Descriptive statistics: summarization of populations, e.g., mean, variance, 5 number summary
Statistical Inference: estimation and testing, e.g., maximum li
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Polynomial Regression
Polynomial Regression (nonlinearity)
Use a Taylor series to approximate the function by a polynomial
mth order polynomial,
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_m x_i^m + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$.
The number of para
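An mth order polynomial regression is still linear in its coefficients, so it can be fit by ordinary least squares on the columns 1, x, ..., x^m. The Python sketch below does this with the normal equations and a small Gaussian elimination; the function name `polyfit_ls` and the toy data are hypothetical, and in practice a numerically stabler routine (for example one based on a QR decomposition) would usually be preferred.

```python
def polyfit_ls(x, y, m):
    """Least squares fit of an m-th order polynomial; returns m + 1 coefficients."""
    p = m + 1
    # Design matrix with columns 1, x, x^2, ..., x^m
    X = [[xi ** k for k in range(p)] for xi in x]
    # Augmented normal equations [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / A[i][i]
    return b


# Hypothetical noise-free data from y = 1 + 2x + 3x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]
coefs = polyfit_ls(x, y, 2)
print(coefs)
```

The returned list has m + 1 entries, one per regression coefficient, so raising the order adds one coefficient to estimate each time.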
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Multiple Linear Regression
Basic Idea
An extra sum of squares measures the marginal reduction in the error sum
of squares when one or several predictor variables are added to
the regression model, given that oth
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. More on $R^2$
$R^2$
$R^2$: the coefficient of multiple determination
$R^2$ is not an estimate of a population quantity unless the data is
multivariate normal
$R^2$ can be dramatically changed by how the x's are sel
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Inference for GLMs
Deviance and Goodness of Fit
The saturated GLM has a separate parameter for each observation. It gives a perfect fit. This sounds good, but it is not a helpful model. It does
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Fitting GLMs
Fitting GLMs
We study how to find the maximum likelihood estimator of GLM parameters. The likelihood equations are usually nonlinear in $\beta$.
We describe a general-purpose iterative
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Logistic regression for Binary data
2. Poisson regression for Count data
GLM
Let Y denote a binary response variable. Each observation has one of two outcomes, denoted by 0 or 1, binomial f
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Generalized Linear Model
GLM
Consider the simple linear regression model $E(Y) = \beta_0 + \beta_1 x$, where Y is normally distributed. Denoting $E(Y) = \mu$, we can write
$\mu = \beta_0 + \beta_1 x$
For the logistic regre
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Test of goodness-of-fit
2. Test of independence
3. Test of homogeneity
Test of goodness-of-fit: Test whether the data come from a multinomial (or binomial) distribution
Test of independence: T
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Categorical data analysis
2. Three measures of relationship between categorical variables
3. Testing independence in a two-way contingency table
Describing Contingency Tables
Introduce tables t
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. One regression model using data from two sources
2. Segmented regression
Data from two sources
Example
Female salary Yi = 0 + 1 xi + i , j=1,.,n
Male salary Yj = 0 + 1 xj + j , j=n+1,n+2,.,n+m
and cfw_i
STAT5044: Regression and Anova
Inyoung Kim
Outline
1. Collinearity
Collinearity
A near-linear relationship (high correlation coefficient) among covariates
Does not reduce bias much (because it can be explained roughly
by a linear combination of other covaria
STAT5044: Lab11
Inyoung Kim
Outline
1. How to fit GLM using R
Example for logistic regression
The logistic model we start with relates the probability of
developing Kyphosis to the three predictor variables, Age, Number,
and Start. We fit the model using g
STAT5044: Lab4
Inyoung Kim
Outline
1. How to analyze categorical data using R
Example
Clinical Trial for Rosiglitazone (Avandia) for the treatment of
diabetes
Data
                   Avandia   Control   Total
Heart Attack            27        41      68
No Heart Attack       1429      2854    4283
Total                 1456      2895
4
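The lab itself works in R. As a language-neutral check of the arithmetic, the Python sketch below takes the four cell counts from the table above and computes the sample odds ratio and the Pearson chi-square statistic for independence in a 2x2 table (no continuity correction). The variable names are hypothetical, and this is only one of several analyses the lab may go on to use. With one degree of freedom, the upper-tail chi-square probability can be written with the complementary error function, so no extra libraries are needed.

```python
import math

# Cell counts from the table: rows = outcome, columns = (Avandia, Control)
a, b = 27, 41        # heart attack
c, d = 1429, 2854    # no heart attack

n = a + b + c + d

# Sample odds ratio of heart attack, Avandia versus control
odds_ratio = (a * d) / (b * c)

# Pearson chi-square statistic for a 2x2 table (no continuity correction)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Upper-tail probability for chi-square with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(odds_ratio, chi2, p_value)
```

The shortcut formula for `chi2` is algebraically the same as summing (observed - expected)^2 / expected over the four cells, with expected counts computed from the row and column totals.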
STAT5044: lab 9
Inyoung Kim
Outline
1. How to fit the segmented regression in R
Example
The data is an old economic dataset on 50 different countries.
These data are averages from 1960 to 1970 (to remove business
cycle or short-term fluctuations)
STAT5044: lab 2
Inyoung Kim
Outline
1. How to estimate the regression line and make inference
Example
A substance used in biological and medical research is shipped
by airfreight to users in cartons of 1,000 ampules.
The data, involving 10 shipments, were
STAT5044: Regression and ANOVA, Fall 2010
Final Exam on Dec 11
Your Name:
Please make sure to specify all of your notations in each problem
GOOD LUCK!
Problem# 1.
A hospital administrator wished to study the relation between patient satisfaction (Y ) an