HSTD 334/Stat 224
Applied Regression Analysis
MIDTERM EXAM 4 November 2014
NAME: ________ Student ID (if any): ________
INSTRUCTIONS: You have 1 hour 20 minutes (full class period) to work on this exam.
Questions are of varying difficulty (so don't overthink the easie
STAT 224 / HSTD 324
Problem Set 2
Fall 2014
Possible points 64.
1. [15 pts total] Exercise 3.3
Table 3.10 shows the scores in the final examination F and the scores in the two preliminary
examinations P1 and P2 for 22 students in a statistics course. The da
STAT 224 / HSTD 324
Problem Set 5
Fall 2014
Possible points 45
1. [15 pts total] Exercises 6.7
Oil Production Data: The data in Table 6.19 are the annual world crude oil production in
millions of barrels for the period 1880-1988. The data are taken from M
STAT 224 / HSTD 324
Problem Set 4
Fall 2014
Possible points 78.
1. [20 pts total] Modified from Exercises 5.4
Perform a thorough analysis of the Education Expenditures data in Tables 5.12, 5.13, and
5.14 using the ideas presented in Section 5.7. You are exp
STAT 224 / HSTD 324
Problem Set 3
Fall 2014
Possible points 67
1. [12 pts, 3 pts for each part] Exercise 4.1(a)
Check to see whether or not the standard regression assumptions are valid for the following
data set:
The Milk Production data described in Sec
STAT 224 / HSTD 324
Solutions to Problem Set 5
Fall 2014
Possible points 45.
1. [15 pts total] Exercises 6.7
. use http://www.ats.ucla.edu/stat/stata/examples/chp/p179, clear
[Figure: scatter plot; axis label "Barrels", ticks from 0 to 20000]
(a) [2 pt] Construct a scatter plot of the oil p
STAT 224 / HSTD 324
Solutions to Problem Set 2
Fall 2014
Possible points 64.
1. [15 pts total] Exercise 3.3
. use http://www.ats.ucla.edu/stat/stata/examples/chp/p076, clear
(a) The scatter plots indicate positive associations between preliminary exams and
STAT 224 / HSTD 324
Solutions to Problem Set 1
Fall 2014
Possible points 90.
1. [6 pts, 2 pts each] Exercise 2.2
(a) Disagree. Cov(Y, X) can take any value between −∞ and +∞, but the correlation measure
cor(Y, X) must be between -1 and 1.
(b) Disagree. If Cov(
STAT 224 / HSTD 324
Problem Set 1
Fall 2014
Possible points 90.
1. [6 pts, 2 pts each] Modified Exercise 2.2
Explain why you would or would not agree with each of the following statements:
(a) Cov(Y, X) and Cor(Y, X) can take values between −∞ and +∞.
(b) If Co
PBHS 32400 / STAT 22400 Autumn
Transformation of Variables
What do we do when the regression assumptions are violated? We
will consider the situations of non-linearity, non-normality and
heteroscedasticity, and examine available remedies for each.
Trans
PBHS 32400 / STAT 22400
Multiple Linear Regression
We started with the simplest statistical model that makes some
physical sense and fits the assumptions we impose. From there,
we might naturally build more elaborate models.
Specifically, if there are o
PBHS 32400 / STAT 22400
Regression Diagnostics I
Up to this point we have looked at the basics of linear regression.
We learned how to:
1. fit simple and multiple linear regression models
2. interpret the coefficients
3. test hypotheses about the models
PBHS 32400/STAT 22400
Categorical Predictor Variables
Not all potential predictors in a regression model need to be
values measured on a continuous numeric scale. In fact, in
addition to numeric predictors we have looked at variables that
could be descri
1)
a) Cov(Y, X) can take values between positive and negative infinity because the scale of the
covariance value is dependent on the units of measurement used. A covariance measured
in nanometers will have a wildly different value than one measured in kil
PBHS 32400 / STAT 22400 Autumn
Adjusting for Non-constant Variance (Heteroscedastic Errors)
We looked at some transformations that deal with situations
where the response variable is not normally distributed but rather
comes from a distribution where the
PBHS 32400 / STAT 22400
Elementary Inference: a review
Concepts:
Sample and population
Sampling as an experiment
Sample statistics (summaries of data) as random variables
Sampling distributions
Hypotheses, test statistics, and hypothesis testing
Some
STAT 224 / PBHS 324
Problem Set 1
Due Oct 7th 6pm, 2016.
Fall 2016
Possible points 88.
1. [6 pts, 2 pts each] Modified Exercise 2.2
Explain why you would or would not agree with each of the following statements:
(a) Cov(Y, X) and Cor(Y, X) can take values
STAT 224 / HSTD 324
Problem Set 3
Fall 2016
Due Oct 22nd 6pm. Possible points 67.
1. [12 pts, 3 pts for each part] Exercise 4.1(a)
Check to see whether or not the standard regression assumptions are valid for the following
data set:
The Milk Production da
STAT 224 / HSTD 324
Problem Set 2
Fall 2016
Due Oct 15th 6pm. Possible points 65.
1. [15 pts total] Exercise 3.3
Table 3.10 shows the scores in the final examination F and the scores in the two preliminary
examinations P1 and P2 for 22 students in a stati
PBHS 32400 / STAT 22400
History of Linear Regression
Early Ideas and Methodology
Early 1800s, Legendre, Laplace, Gauss (1822) fully established key
properties of method of least squares to fit lines to observations.
Used in various fields - astronomy, g
* Example: create dummy variables (HW4 Q2)
* load dataset
use http://www.ats.ucla.edu/stat/stata/examples/chp/p148, clear
* create dummy (indicator) variables for the fertilizers
* there are three ways to create them; you can use any one of the following
PBHS 32400 / STAT 22400
Multicollinearity in Multiple Regression
What is multicollinearity? Example from Table 9.1 and 9.2 of
C&H: Equal Educational Opportunity (EEO) data:
Measurements were taken in 1965 for 70 random schools. The
level of student achie
HW2 STATA Help
To perform correlation of variables
cor y var1 var2
To perform regression and F test to test significance of variable
regress y var1 var2
test var1
To test whether the coefficients of two variables are jointly different from zero
test var1 var2
To generate a
PBHS 32400 / STAT 22400
Regression Models for a Probability of Response Outcome
Until now we have talked about regression models where the
response variable Y was continuous and (approximately) normally
distributed. We now consider the case where Y is a bin
PBHS 32400 / STAT 22400
Variable (Model) Selection
Thus far we have mostly worked with example problems where
predictor variables were identified in advance. All or most of these
had some value towards constructing the linear model. Often in
modeling, we
A Generalized Approach for Many Model Types
Noting and taking advantage of commonalities among linear
models for different response variable types, Nelder and
Wedderburn and later McCullagh (UChicago) and Nelder
developed Generalized Linear Models
This ap
STAT 224 / PBHS 324
Problem Set 4
Fall 2016
Due Nov 5th 6pm. Possible points 71.
1. [20 pts total] Modified from Exercises 5.4
Perform a thorough analysis of the Education Expenditures data in Tables 5.12, 5.13, and
5.14. In this data, Y is the per capita