Assignment 1
Due Date: 10/9/2014
Data Description and Background
In 1929, Edwin Hubble investigated the relationship between the distance of a galaxy from the earth
and the velocity with which it appears to be receding. Galaxies appear to be moving away f

Assignment 1
Due Date: TBD
Data Description and Background
In 1929, Edwin Hubble investigated the relationship between the distance of a galaxy from the earth
and the velocity with which it appears to be receding. Galaxies appear to be moving away from us

Assignment 3
Due Date: 11/25/2014
Data Description and Background
Variation in gasoline mileage among makes and models of automobiles is influenced substantially by
the size of the vehicle and its engine. The analysis dataset contains data on various auto

Assignment 2
Due Date: 10/23/2014
Data Description and Background
An amateur brewer wishes to better understand how the temperature that the beer ferments at (in
degrees Fahrenheit) affects the alcohol content of the beer upon completion of brewing. Fortu

Final Problems
1) Let $\hat{\beta}$ and $\hat{\sigma}^2$ denote the least squares estimates of the coefficients and residual variance,
respectively, for the multiple linear model. Show that the quantity $a^t \beta$ lies in the below
confidence interval for all $a$
[interval formula lost in extraction; surviving fragments: "d 1", "a t", "a X X a dF"]

Assignment 4
Due Date: 12/18/2014
Data Description and Background
The RMS Titanic was a passenger liner built in 1912 and at the time had the distinction of being the
largest sea-going vessel in the world. However, its legacy was forever defined the morning

Multicollinearity
STAT 563 Spring 2007
Recap
When the predictors are correlated or express near-linear dependencies, we face the problem of multicollinearity
The primary sources of multicollinearity
The data collection method employed
Constraints on the

Animal Data Analysis Code
# First load the data into R (Remember to change the directory) #
animal_data = read.table("Animal_data.txt", header = TRUE)
# Now let's get a sense of what we have #
dim(animal_data)
names(animal_data)
animal_data
#
is
as
#
We w

Initial Data Exploration
STAT 563 Spring 2007 Mani Lakshminarayanan
Inheritance of Height
Original data collected by E.S. Pearson during 1893-1898
n=1375 heights of mothers under the age of 65 and one of their adult daughters over the age of 18
Q

STAT 563 - Assignment 1
Due Date: September 24th 2015
Data Description and Background
In 1929, Edwin Hubble investigated the relationship between the distance of a galaxy from the earth
and the velocity with which it appears to be receding. Galaxies appea

Final Problems
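1) Let $(x_{i1}, x_{i2}, \ldots, x_{id}, y_i)$, $i = 1, \ldots, n$ be an i.i.d. multivariate sample and assume the multiple linear
model holds, that is
$Y_i \mid x_i = x_i^t \beta + \varepsilon_i \mid x_i$
with $E(\varepsilon_i \mid x_i) = 0$ and $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2$. Assume that the conditional distribution of the error
terms follow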

Final Report
Due Date: 12/17/2015
Data Description and Background
A university medical center urology group was interested in the association between prostate
specific antigen (PSA) and a number of prognostic clinical measurements in men with advanced
pro

Assignment 4
Due Date: 12/3/2015
Data Description and Background
The RMS Titanic was a passenger liner built in 1912 and at the time had the distinction of being the
largest sea-going vessel in the world. However, its legacy was forever defined the mornin

STAT 563: Regression Methods
Simple Linear Regression
What is a Regression Model?
Essentially a mathematical formula relating one variable, referred to as a response variable, to one or more other variables, called predictor variables
Often used i

Multiple Regression
STAT 563 Spring 2007
Design Matrix
Define the n x p matrix
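$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$
And the column vectors $X_j = [X_{1j}, \ldots, X_{nj}]$
Model now can be written as $Y = X\beta + \varepsilon$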
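As a minimal sketch of how the design matrix is used, the R code below builds X with a leading column of ones and solves the normal equations for the least squares estimate. The data are simulated and purely hypothetical (the names n, x1, x2, y, beta_hat and fit are illustrative and do not come from the slides); lm() is included only as a cross-check.

```r
# Hypothetical simulated data; none of these values come from the course material.
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# Design matrix: a leading column of ones, then one column per predictor.
X <- cbind(1, x1, x2)

# Least squares estimate from the normal equations (X'X) b = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() fits the same model (via a QR decomposition), so the estimates should agree.
fit <- lm(y ~ x1 + x2)
print(cbind(normal_equations = as.numeric(beta_hat), lm = as.numeric(coef(fit))))
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; for real data, lm() is usually the safer choice numerically.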
Model
Where
y1 0

Model Adequacy Checking (Chap 4 & 6)
STAT 563 Spring 2007
Model Assumptions
Recall the five assumptions
Response and the predictors are approximately linearly related
The error term has zero mean
The error term has constant variance $\sigma^2$
The errors

Transformations and Weighting
STAT 563 Spring 2007
Model Assumptions
Common violations are:
Expression for the expected value of Y is not correct
The variance is not constant over the range of the data
The data are not normally distributed
One

Indicator Variables
STAT 563 Spring 2007
General Concept
Generally multiple regression accommodates only quantitative variables
Dummy (or indicator) variables are useful in incorporating qualitative (or categorical) variables in the model
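To illustrate the idea, here is a small R sketch of indicator coding for a two-level qualitative predictor. The data frame d and its columns (x, group, y, is_B) are hypothetical and made up for illustration; the sketch assumes R's default treatment contrasts, under which the first factor level ("A") is the reference.

```r
# Hypothetical data: one quantitative predictor and one two-level qualitative predictor.
d <- data.frame(
  x     = c(1.2, 3.4, 2.2, 5.1, 4.3, 2.9),
  group = factor(c("A", "B", "A", "B", "A", "B")),
  y     = c(2.0, 6.1, 3.1, 8.2, 5.2, 5.4)
)

# Manual indicator: 1 when group is "B", 0 otherwise ("A" is the reference level).
d$is_B <- as.numeric(d$group == "B")

# model.matrix() shows the coding R applies to a factor under default treatment contrasts.
mm <- model.matrix(~ x + group, data = d)
print(mm)

# Both fits describe the same model, so the coefficients should match.
fit_manual <- lm(y ~ x + is_B, data = d)
fit_factor <- lm(y ~ x + group, data = d)
```

In this coding the coefficient on the indicator is the shift in intercept for group "B" relative to group "A", with the slope on x shared by both groups.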
Simples

Variable Selection and Model Building
STAT 563 Spring 2007
Selection of final equation
Two opposing views
To make the equation useful for prediction purposes, we want to include as many predictors (original, transformed etc) as possible
Because o

Validation of Regression Models
STAT 563 Spring 2007
Adequacy vs Validation
Model Adequacy requires
Residual analysis
Testing for lack of fit
Searching for influential observations
Other internal analysis
Validation is directed toward determi

Nonlinear Regression
STAT 563 Spring 2007
Regression Model
Recall that we can write the normal theory regression model as
$y = f(x, \theta) + \varepsilon$
Where x is an n-vector of input variables, $\theta$ is a k-vector of parameters, and the errors $\varepsilon$ are independent N(0, $\sigma^2$

Logistic Regression
STAT 563 Spring 2007
General Linear Models
Family of Regression Models
Outcome variable determines the choice of the model
Binomial Distribution
Example
Assume 5% of the population has Coronary Heart Disease (CHD). If we pi