STA4203/5207 - Applied Linear Regression
Adrian Barbu
August 30, 2012
1 Estimation
We are given some data and want to find a relationship between one or more
variables of interest $y$ and some observed variables $x = (x_1, \ldots, x_p)$. $y$ is called the
depe
STA4203/5207 - Applied Linear Regression
Adrian Barbu
September 14, 2015
3 Diagnostics
For our regressions, we made several assumptions. If they are violated, problems can arise, for example:
Error: We assumed that $\epsilon \sim N(0, \sigma^2 I)$, i.e. the errors are i.i.d. from $N(0, \sigma^2)$.
Model: We
Homework 6, due October 11th, 5:15pm
October 4, 2012
1. Using the divusa data:
a) Fit a regression model with divorce as the response and unemployed, femlab,
marriage, birth and military as predictors. Compute the condition indexes and interpret their mea
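The condition indexes in part a) are computed from the eigenvalues $\lambda_1 \ge \dots \ge \lambda_p$ of $X^T X$ as $\sqrt{\lambda_1/\lambda_j}$. Below is a minimal Python/NumPy sketch. It assumes `X` holds only the predictor columns (no intercept) and uses them unscaled, which is one common convention. Some texts center or scale the columns first, and that changes the values. The function name `condition_indexes` and the toy matrix are hypothetical, not the divusa data.

```python
import numpy as np

def condition_indexes(X):
    """Return sqrt(lambda_1 / lambda_j) for the eigenvalues of X^T X, largest first.

    Assumption: X contains only the predictor columns (no intercept column)
    and is used without centering or scaling.
    """
    X = np.asarray(X, dtype=float)
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues = np.linalg.eigvalsh(X.T @ X)[::-1]
    return np.sqrt(eigenvalues[0] / eigenvalues)

# Toy example (not the divusa data): X^T X = diag(1, 4)
X = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(condition_indexes(X))  # [1. 2.]
```

A large index means some linear combination of the predictors is nearly constant, which is the signature of collinearity.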
STA4203/5207 - Applied Linear Regression
Adrian Barbu
August 31, 2015
2 Inference
The estimated parameters are subject to uncertainty.
Least squares estimation assumes that the errors are i.i.d. (independent and identically distributed) with mean 0 and variance $\sigma^2$
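Under these assumptions, $\sigma^2$ is usually estimated by $\hat\sigma^2 = \mathrm{RSS}/(n-p)$. The standard error of $\hat\beta_j$ is then $\hat\sigma\sqrt{[(X^TX)^{-1}]_{jj}}$. Here $p$ counts the columns of $X$, including the intercept. Below is a minimal NumPy sketch on made-up toy data. The function name `ls_inference` is hypothetical, and the explicit inverse is kept only for readability.

```python
import numpy as np

def ls_inference(X, y):
    """Least squares estimates, the sigma^2 estimate, and standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                      # p includes the intercept column
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - p)   # RSS / (n - p)
    std_errors = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    return beta_hat, sigma2_hat, std_errors

# Toy data: intercept column plus one predictor
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
beta_hat, sigma2_hat, se = ls_inference(X, y)
print(beta_hat, sigma2_hat, se)  # beta_hat ≈ [1.3, 0.8], sigma2_hat ≈ 0.9
```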
STA4203/5207 - Applied Linear Regression
Adrian Barbu
September 11, 2012
3 Diagnostics
For our regressions, we made several assumptions. If they are violated, problems can arise, for example:
Error: We assumed that $\epsilon \sim N(0, \sigma^2 I)$, i.e. the errors are i.i.d. from $N(0, \sigma^2)$.
Model: We
STA4203/5207 - Applied Linear Regression
Adrian Barbu
August 31, 2015
1 Estimation
We are given some data and want to find a relationship between one or more
variables of interest $y$ and some observed variables $x = (x_1, \ldots, x_p)$. $y$ is called the
depe
Homework 3, due Sept 20th, 5:15pm
September 18, 2012
1. For the prostate data, fit a model with lpsa as the response and the other
variables as predictors.
a) Which variables are statistically significant at the 5% level? (1 point)
b1) Predict the lpsa for
STA4203/5207 - Applied Linear Regression
Adrian Barbu
September 4, 2012
2 Inference
The estimated parameters are subject to uncertainty.
Least squares estimation assumes that the errors are i.i.d. (independent and identically distributed) with mean 0 and variance
Accounting Intensive Review 2016
Pre-qualifying Exam LEVEL 1
Financial Accounting and Reporting
Name:
Date:
Instruction: Shade the letter of the correct answer. Use pencil in shading. NO ERASURE. NO CHEATING.
1. To be relevant, information should have
STA4203/5207 - Applied Linear Regression
Adrian Barbu
August 29, 2016
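0.1 Short Review on Matrices
A matrix $A \in M_{m,n}$ is a table with $m$ rows and $n$ columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \in M_{m,n}$$
$A$ can also be written as $A = (a_{ij})_{i,j}$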
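In code, an $m \times n$ table like this is a 2-D array. The small NumPy illustration below uses arbitrary example values. NumPy indexes from 0, so the entry $a_{ij}$ is `A[i-1, j-1]`.

```python
import numpy as np

# A 2 x 3 matrix, i.e. A in M_{2,3}: m = 2 rows, n = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

m, n = A.shape
print(m, n)        # 2 3
print(A[0, 1])     # a_12 = 2 (NumPy indices start at 0)
print(A.T.shape)   # the transpose is in M_{3,2}: (3, 2)
```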
Armando Diaz
Raul Rodriguez
Daphne Solis
Applied Regression Methods: Homework #2
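1.a)
$$E(\beta) = e^T e = (Y - X\beta)^T (Y - X\beta)$$
$$= Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X\beta$$
$$= Y^T Y - 2Y^T X\beta + \beta^T X^T X\beta, \quad \text{where } \beta^T X^T Y = Y^T X\beta$$
$$\frac{\partial E}{\partial \beta} = 0 - 2Y^T X + 2\beta^T X^T X$$
$$= -2Y^T X + 2\beta^T X^T X$$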
1.b)
Y T X + T X T
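The gradient $-2Y^T X + 2\beta^T X^T X$ is zero exactly when its transpose $-2X^T Y + 2X^T X\beta$ is zero. Setting it to zero gives the normal equations $X^T X\hat\beta = X^T Y$. The NumPy sketch below uses simulated data; the sample size, seed, and coefficients are arbitrary assumptions. It solves the normal equations and checks numerically that the gradient vanishes at $\hat\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Design matrix with an intercept column and two simulated predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Solve the normal equations X^T X beta = X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Gradient of E(beta) = (Y - X beta)^T (Y - X beta), as a column vector
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat
print(np.max(np.abs(gradient)))  # close to 0, up to rounding error

# Same answer as NumPy's least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```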
STA4203/5207 - Applied Linear Regression
Adrian Barbu
October 28, 2015
7 Variable Selection
We want to do variable selection because:
1. We want to explain the data using the simplest models possible. The principle
of Occam's Razor states that among severa
STA4203/5207 - Applied Linear Regression
Adrian Barbu
October 12, 2015
4 Problems with the Predictors
4.1 Errors in the Predictors
What happens if the predictors themselves are measured with some error?
For example, consider the problem of determining the
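One way to explore this question is by simulation. The NumPy sketch below uses assumed settings chosen only for the demo: a true slope of 2, and a true predictor and measurement error that are both standard normal. Regressing $y$ on the error-free predictor recovers the slope. Regressing on the noisy measurement gives a slope shrunk toward zero, here by the factor $\sigma_x^2/(\sigma_x^2+\sigma_\delta^2) = 1/2$.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the simple least squares fit of y on x (with intercept)."""
    x_centered = x - x.mean()
    return (x_centered @ (y - y.mean())) / (x_centered @ x_centered)

rng = np.random.default_rng(1)
n = 100_000
beta = 2.0
x_true = rng.normal(0.0, 1.0, n)                 # predictor without error, variance 1
y = 1.0 + beta * x_true + rng.normal(0.0, 1.0, n)
x_observed = x_true + rng.normal(0.0, 1.0, n)    # measured with error, variance 1

print(ols_slope(x_true, y))      # close to 2
print(ols_slope(x_observed, y))  # close to 2 * 1 / (1 + 1) = 1
```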
STA4203/5207 - Applied Linear Regression
Adrian Barbu
November 9, 2015
8 Shrinkage Methods
8.1 Principal Components
Recall the regression equation:
$$y = X\beta + \epsilon$$
We saw that the predictors can be correlated. We would like them to
be orthogonal (
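One standard way to obtain orthogonal predictors is to rotate the centered design onto its principal components. The NumPy sketch below uses the singular value decomposition on simulated correlated predictors. The data and the function name `principal_components` are illustrative assumptions. The columns of the score matrix $Z = X_c V$ are mutually orthogonal, so $Z^T Z$ is diagonal.

```python
import numpy as np

def principal_components(X):
    """Return the principal component scores Z = Xc V and the loadings V."""
    Xc = X - X.mean(axis=0)            # center each predictor
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt.T
    return Xc @ V, V

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1
X = np.column_stack([x1, x2])

Z, V = principal_components(X)
print(np.corrcoef(X, rowvar=False)[0, 1])  # close to 1
print(Z.T @ Z)  # off-diagonal entries are zero up to rounding
```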
Linhui Yao
Homework 7
a) Variance is constant.
b) Yes.
c) Yes.
d) La Vallee, Val de Ruz
e) No outliers.
f) V. De Geneve
g) Nonlinearity and no outliers.
h) Pr < DW is 0.0113 (correlated errors).
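The Durbin-Watson statistic behind answer h) is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, computed from the residuals in time order. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. Below is a minimal pure-Python sketch. It computes only the statistic, not the p-value reported as Pr < DW.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0
```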
Midterm Project, due October 18th, 5:15pm
STA 4203/5207
October 11, 2012
This project is not straightforward. You will have to work out how to write the code
to achieve the given tasks. You might find proc score and proc univariate
useful for solving d), e)
Homework 12, due November 29th, 5:15pm
November 20, 2012
1. Consider the seatpos dataset, with hipcenter as the dependent variable and
the other variables as predictors. Standardize all the predictors. Divide the dataset into
seatpos0 and seatpos1, contain
Homework 11, due November 20th, 5:15pm
STA 4203/5207
1. Consider the seatpos dataset, with hipcenter as the dependent variable and
the other variables as predictors. Standardize all the predictors.
a) Draw a scatter plot of seated vs ht. (1 point)
b) Perf
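Standardizing a predictor means subtracting its mean and dividing by its standard deviation, so that every predictor has mean 0 and standard deviation 1. Below is a stdlib-only Python sketch on made-up numbers, not the seatpos data. It assumes the sample standard deviation (divisor $n-1$), which some software handles differently.

```python
from statistics import mean, stdev

def standardize(values):
    """Return (x - mean) / sd, using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z = standardize([160.0, 170.0, 180.0])
print(z)  # [-1.0, 0.0, 1.0]
```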
STA4203/5207 - Applied Linear Regression
Adrian Barbu
August 28, 2012
0.2 Loading the data
We will use datasets in the form of tab-delimited (.txt) or comma-separated (.csv) files. Loading can be done using the menus or with a few lines of code.
Load tab-delimi
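As an illustration in Python (not the course's own tool), both file formats can be read with the standard `csv` module. The only difference between them is the delimiter. The in-memory text and column names below are made up so that the sketch runs without a file. The function `load_table` is hypothetical and assumes every column is numeric. For a real file, replace `io.StringIO(...)` with `open(path, newline="")`.

```python
import csv
import io

def load_table(handle, delimiter):
    """Read a delimited text table whose first row holds column names."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [{name: float(value) for name, value in row.items()} for row in reader]

tab_text = "y\tx1\tx2\n1.5\t2.0\t3.0\n2.5\t4.0\t1.0\n"
csv_text = "y,x1,x2\n1.5,2.0,3.0\n2.5,4.0,1.0\n"

rows_txt = load_table(io.StringIO(tab_text), delimiter="\t")  # .txt, tab-delimited
rows_csv = load_table(io.StringIO(csv_text), delimiter=",")   # .csv, comma-separated
print(rows_txt == rows_csv)  # True
print(rows_txt[0])           # {'y': 1.5, 'x1': 2.0, 'x2': 3.0}
```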
STA4203/5207 - Applied Linear Regression
Adrian Barbu
September 27, 2012
4 Problems with the Predictors
4.1 Errors in the Predictors
What happens if the predictors themselves are measured with some error?
For example, consider the problem of determining t
STA4203/5207 - Applied Linear Regression
Adrian Barbu
October 4, 2012
5 Problems with the Error
The assumptions about the error were that the error is independent and identically
distributed (i.i.d.) from case to case. We saw in the Diagnostics section th
STA4203/5207 - Applied Linear Regression
Adrian Barbu
October 16, 2012
6 Transformation
We can transform the response and the predictors to improve the linear fit and to correct
violations of the model assumptions, such as constant variance.
6.1 Transforming th
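The NumPy sketch below illustrates why transforming the response can help. It simulates data with multiplicative errors, $y = e^{1 + 0.5x}\,e^{\epsilon}$ with $\epsilon \sim N(0, 0.2^2)$, an assumed model chosen only for the demo. On the original scale, the residual spread of a straight-line fit grows with $x$. After taking $\log y$, the model becomes linear with constant error variance, and least squares recovers the slope 0.5.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0.0, 5.0, n)
y = np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.2, n))  # multiplicative error

X = np.column_stack([np.ones(n), x])
beta_log, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(beta_log)  # close to [1.0, 0.5]

def residual_spread(response):
    """Residual SD for small x (x < 1) and large x (x > 4) after a straight-line fit."""
    b, *_ = np.linalg.lstsq(X, response, rcond=None)
    res = response - X @ b
    return res[x < 1].std(), res[x > 4].std()

print(residual_spread(y))          # spread grows with x
print(residual_spread(np.log(y)))  # roughly equal, about 0.2
```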
STA4203/5207 - Applied Linear Regression
Adrian Barbu
October 25, 2012
7 Variable Selection
We want to do variable selection because:
1. We want to explain the data using the simplest models possible. The principle
of Occam's Razor states that among severa
STA4203/5207 - Applied Linear Regression
Adrian Barbu
December 4, 2012
8 Shrinkage Methods
8.1 Principal Components
Recall the regression equation:
$$y = X\beta + \epsilon$$
We saw that the predictors can be correlated. We would like them to
be orthogonal (
Final Project, due December 6th, 5:15pm
November 29, 2012
This project uses the bodyfat dataset with fat as the response and the remaining
variables except id, siri and density as predictors.
1. Perform variable selection on the entire dataset using the A
Homework 1, due September 6th, 5:15pm
August 30, 2012
1. We saw in class that the linear model can be written in matrix form:
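$$Y = X\beta + \epsilon$$
Since $\sum_{i=1}^{n} \epsilon_i^2 = \epsilon^T \epsilon$, least squares estimation means finding the $\beta$ that minimizes
$$E(\beta) = \epsilon^T \epsilon = (Y - X\beta)^T (Y - X\beta)$$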
From here
a) Use
Homework 2, due Sept 13th, 5:15pm
STA 4203/5207
September 10, 2012
1. Generate a table containing 500 rows $(x_i, u_i, v_i, e_i)$ such that
$u_i, x_i$ are independent uniform samples from the interval $[0, 1]$, i.e. $x_i, u_i \sim U([0, 1])$.
$v_i$ and $e_i$ are independent norm
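Below is a stdlib-only Python sketch of generating such a table. The description of $v_i$ and $e_i$ is cut off above, so standard normal draws $N(0, 1)$ are used here purely as a placeholder assumption. The seed and the function name `make_table` are also hypothetical.

```python
import random

def make_table(n_rows=500, seed=0):
    """Rows (x_i, u_i, v_i, e_i): x, u ~ U([0, 1]); v, e normal (placeholder N(0, 1))."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 1.0),   # x_i
             rng.uniform(0.0, 1.0),   # u_i
             rng.gauss(0.0, 1.0),     # v_i (assumed N(0, 1))
             rng.gauss(0.0, 1.0))     # e_i (assumed N(0, 1))
            for _ in range(n_rows)]

table = make_table()
print(len(table))  # 500
```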
Homework 4, due Sept 27th, 5:15pm
STA 4203/5207
September 20, 2012
1. Using the prostate data, fit a model with lpsa as the response and the other
variables as predictors.
a) Plot the residuals versus predicted values. Do the residuals seem to have constant
Homework 5, due October 4th, 5:15pm
September 27, 2012
1. Using the uswages data, fit a model with log(wage) as the response and educ
and exper as predictors.
a) Find the cutoff value for the outliers. (1 point)
b) Based on the cutoff, find the outliers and r