36-402/608
Homework #2 Solutions
1/28
1. Serial correlation simulation (25 points)
Examine and load the function ARcorSim() from the file ARcorSim.R. Use this
function along with summary() to calculate the power for nsim=1000, n=25, and
1 = 0 over the set o
Homework Assignment 8: Fair's Affairs
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
library(AER)
data(Affairs)
1. Answer:
(a) When dealing with a counting variable Y with a known (not estimated) upper limit m, we can try to model it as having a bino
Homework Assignment 1: What's That Got to
Do with the Price of Condos in California?
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
The easiest way to load the data is with read.table, but you have to tell
R that the first line names the variables:
>
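As a minimal sketch of the point about the header line (the file name and column names here are hypothetical, not the assignment's data), read.table treats the first line as variable names only when header=TRUE is given:

```r
# Hypothetical two-column file, written to a temporary location so the
# example is self-contained; this is NOT the assignment's data set.
tmp <- tempfile(fileext = ".dat")
writeLines(c("price rooms", "100 3", "150 4"), tmp)

# header = TRUE tells R that the first line names the variables
dat <- read.table(tmp, header = TRUE)
names(dat)

# With the default here (header = FALSE) the names line is read as data,
# so the columns get generic names and there is one extra row
raw <- read.table(tmp)
names(raw)
```

Here the first line has as many fields as the data lines, so R does not infer a header on its own and the argument has to be supplied.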
23:15 Wednesday 27th February, 2013
Chapter 12
Logistic Regression
12.1
Modeling Conditional Probabilities
So far, we either looked at estimating the conditional expectations of continuous
variables (as in regression), or at estimating distributions. Ther
36-402/608
Homework #7 Solutions
3/4
1. Monkey memory (40 points)
This problem uses the monkey memory data of Sleuth Chapter 16, case 1 from
case1601.csv. See page 463 for a description.
(a) Read in the data, produce a plot similar to the one on page 463,
36-402/608
Homework #8
due 10:30AM 3/18
(Optional)
1. Pig fat (50 points)
(a) Look at Sleuth problem 17.08 and the answer on page 528. Load the data
from ex1708.csv. Verify that you cannot run step(lm(fat ~ ., data=fat),
direction="backward") (even after co
36-402/608
Homework #8 Solutions
3/18
1. Pig fat (50 points)
(a) Look at Sleuth problem 17.08 and the answer on page 528. Load the data
from ex1708.csv. Verify that you cannot run step(lm(fat ~ ., data=fat),
direction="backward") (even after correcting for
36-402/608
Homework #9
due 10:30AM 3/25
1. Violins and Brains (50 points)
You must use SAS for this problem!
Using the data in ex0730.csv, do Sleuth problem 30 on page 204. Create an indicator
variable for player vs. non-player. Include appropriate EDA fo
36-402/608
Homework 9 Solutions: SAS
March 25
Problem 1 (50 points)
Your code (30 points) should always include a title. The infile statement
includes DSD to handle comma-separated-values and firstobs=2 to skip the
header line in the file. The player=1 an
36-402/608
Homework #10
due 10:30AM 4/1
1. Fixing Breakout 17 (60 points)
You must use SAS for this problem!
Modify the code in wallaby.sas to load the wallaby data and to create a new outcome
in the form of the log of the grams. Follow the model selectio
36-402/608
Homework #10 Solutions
4/1
1. Fixing Breakout 17 (60 points)
You must use SAS for this problem!
Modify the code in wallaby.sas to load the wallaby data and to create a new outcome
in the form of the log of the grams. Follow the model selection
36-402/608
Homework #11
due 10:30AM 4/8
1. Dyads (60 points)
You must use SAS for this problem! Use the DDFM=SATTERTH option.
This problem is a study of income in married couples in Massachusetts. Use this
code to load the data in dyads.dat.
DATA dyads;
I
13:58 Friday 15th February, 2013
Chapter 9
Additive Models
9.1
Partial Residuals and Back-fitting for Linear Models
The general form of a linear regression model is
$$E\left[Y \mid \vec{X} = \vec{x}\right] = \beta_0 + \vec{\beta} \cdot \vec{x} = \sum_{j=0}^{p} \beta_j x_j \tag{9.1}$$
where for $j \in 1:p$, the $x_j$ are the compone
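The summation form in (9.1) starts at j = 0, which only matches the intercept-plus-dot-product form if the intercept is paired with a constant component. A minimal numerical sketch, assuming x_0 = 1 (the coefficient and predictor values below are made up for illustration):

```r
# Made-up values for illustration only
beta0 <- 2            # intercept
beta  <- c(0.5, -1)   # slopes for j = 1, ..., p (here p = 2)
x     <- c(3, 4)      # components of the predictor vector

# Intercept-plus-dot-product form
dot_form <- beta0 + sum(beta * x)

# Summation from j = 0, with the assumed constant component x_0 = 1
sum_form <- sum(c(beta0, beta) * c(1, x))

all.equal(dot_form, sum_form)
```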
36-402/608
Homework #12
due 10:30AM 4/22
1. Lymphoma and radiation (34 points)
Read problem 19.14 on page 574. Using ex1914.csv, load the data into R using this
code:
lymph = read.csv("ex1914.csv")
lymphA = array(t(cbind(lymph$survive,lymph$died)),
dim=c(2
36-402/608
Homework #12 Solutions
4/22
1. Lymphoma and radiation (34 points)
Read problem 19.14 on page 574. Using ex1914.csv, load the data into R using this
code:
lymph = read.csv("ex1914.csv")
lymphA = array(t(cbind(lymph$survive,lymph$died)),
dim=c(2,2
High-dimensional regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Back to linear regression
1.1
Shortcomings
Suppose that we are given outcome measurements $y_1, \ldots, y_n \in \mathbb{R}$, and corresponding predictor
measurements $x_1, \ldots, x_n$
10:25 Wednesday 30th January, 2013
Chapter 6
The Bootstrap
We are now several chapters into a statistics class and have said basically nothing
about uncertainty. This should seem odd, and may even be disturbing if you are very
attached to your p-values an
12:10 Tuesday 12th February, 2013
Chapter 8
Splines
8.1
Smoothing by Directly Penalizing Curve Flexibility
Let's go back to the problem of smoothing one-dimensional data. We have data points
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, and we want to find a goo
09:26 Thursday 28th March, 2013
Chapter 17
Principal Components Analysis
Principal components analysis (PCA) is one of a family of techniques for taking
high-dimensional data, and using the dependencies between the variables to represent
it in a more trac
11:53 Thursday 24th January, 2013
Chapter 4
Using Nonparametric
Smoothing in Regression
Having spent long enough running down linear regression, and thought through
evaluating predictive models, it is time to turn to constructive alternatives, which are
(
36-402/608
Homework #7
due 10:30AM 3/4
1. Monkey memory (40 points)
This problem uses the monkey memory data of Sleuth Chapter 16, case 1 from
case1601.csv. See page 463 for a description.
(a) Read in the data, produce a plot similar to the one on page 46
36-402/608
Homework #6 Solutions
2/25
1. Global warming (50 points)
This problem uses the global warming data of Sleuth Chapter 15, case 2 from
case1502.csv. See page 438 for a description.
(a) Read in the data, and fit a simple regression model of temperat
36-402/608
Homework #6
due 10:30AM 2/25
1. Global warming (50 points)
This problem uses the global warming data of Sleuth Chapter 15, case 2 from
case1502.csv. See page 438 for a description.
(a) Read in the data, and fit a simple regression model of temper
Homework Assignment 5: Bootstrapping Will
Continue Until Morale Improves
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
1. Answer:
library(MASS)
cats.lm1 <- lm(Hwt ~ 0+Bwt,data=cats) # "0+" sets intercept to zero
summary(cats.lm1)
# Quick view of r
Homework 4: An Insufficiently Random Walk
Down Wall Street
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
1. Answer: Following the notes for lecture 7,
# You can download the data directly from the web like this
sp <- read.csv("SPhistory.short.csv")
spdat
Homework Assignment 2: The Advantages of
Backwardness
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
This problem set was based on the preliminary analysis in the paper
E. Maasoumi, J. S. Racine and T. Stengos, Growth and convergence: a profile of di
Midterm Exam 2: Mystery Multivariate Data
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General note: The data came from a ten-dimensional Gaussian. Each
variable had an expected value of 100 and a standard deviation of 15. The
correlation matrix
Midterm Exam 1: Urban Scaling, Continued
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General set-up:
gmp = read.csv(file = "gmp-2006.csv")
Your data file was derived from this data file, plus or minus 4% noise for each
observation.
1. Answer: The ba