13:58 Friday 15th February, 2013
Chapter 9
Additive Models
9.1 Partial Residuals and Backfitting for Linear Models
The general form of a linear regression model is
$$
\mathbb{E}\left[ Y \mid \vec{X} = \vec{x} \right] = \beta_0 + \vec{\beta} \cdot \vec{x} = \sum_{j=0}^{p} \beta_j x_j \qquad (9.1)
$$
where for $j \in 1:p$, the $x_j$ are the compone
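As a rough illustration of Eq. (9.1), the following R sketch fits a linear model to simulated data and evaluates the sum over j = 0, ..., p by hand, taking x_0 = 1 so that the intercept fits into the sum. The data, coefficient values, and variable names are assumptions made up for this example.

```r
# Simulated data; the sample size and coefficients are illustrative assumptions.
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 0.1)

# Ordinary least squares fit of the linear model
fit <- lm(y ~ x1 + x2)
beta <- coef(fit)            # estimates of beta_0, beta_1, beta_2

# Design matrix; its first column is the constant x_0 = 1
X <- model.matrix(fit)

# Evaluate sum_{j=0}^p beta_j x_j for every observation and
# compare with the fitted values that lm() reports
by_hand <- as.vector(X %*% beta)
max_diff <- max(abs(by_hand - fitted(fit)))
```

If the two computations agree, `max_diff` should be zero up to floating-point error.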
36-402/608
Homework #10
due 10:30AM 4/1
1. Fixing Breakout 17 (60 points)
You must use SAS for this problem!
Modify the code in wallaby.sas to load the wallaby data and to create a new outcome
in the form of the log of the grams. Follow the model selectio
36-402/608
Homework 9 Solutions: SAS
March 25
Problem 1 (50 points)
Your code (30 points) should always include a title. The infile statement
includes DSD to handle comma-separated-values and firstobs=2 to skip the
header line in the file. The player=1 an
36-402/608
Homework #9
due 10:30AM 3/25
1. Violins and Brains (50 points)
You must use SAS for this problem!
Using the data in ex0730.csv, do Sleuth problem 30 on page 204. Create an indicator
variable for player vs. non-player. Include appropriate EDA fo
36-402/608
Homework #8 Solutions
3/18
1. Pig fat (50 points)
(a) Look at Sleuth problem 17.08 and the answer on page 528. Load the data
from ex1708.csv. Verify that you cannot run step(lm(fat ~ ., data=fat),
direction="backward") (even after correcting for
36-402/608
Homework #8
due 10:30AM 3/18
(Optional)
1. Pig fat (50 points)
(a) Look at Sleuth problem 17.08 and the answer on page 528. Load the data
from ex1708.csv. Verify that you cannot run step(lm(fat ~ ., data=fat),
direction="backward") (even after co
36-402/608
Homework #7 Solutions
3/4
1. Monkey memory (40 points)
This problem uses the monkey memory data of Sleuth Chapter 16, case 1 from
case1601.csv. See page 463 for a description.
(a) Read in the data, produce a plot similar to the one on page 463,
36-402/608
Homework #7
due 10:30AM 3/4
1. Monkey memory (40 points)
This problem uses the monkey memory data of Sleuth Chapter 16, case 1 from
case1601.csv. See page 463 for a description.
(a) Read in the data, produce a plot similar to the one on page 46
36-402/608
Homework #6 Solutions
2/25
1. Global warming (50 points)
This problem uses the global warming data of Sleuth Chapter 15, case 2 from
case1502.csv. See page 438 for a description.
(a) Read in the data, and fit a simple regression model of temperat
36-402/608
Homework #10 Solutions
4/1
1. Fixing Breakout 17 (60 points)
You must use SAS for this problem!
Modify the code in wallaby.sas to load the wallaby data and to create a new outcome
in the form of the log of the grams. Follow the model selection
36-402/608
Homework #11
due 10:30AM 4/8
1. Dyads (60 points)
You must use SAS for this problem! Use the DDFM=SATTERTH option.
This problem is a study of income in married couples in Massachusetts. Use this
code to load the data in dyads.dat.
DATA dyads;
I
High-dimensional regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1 Back to linear regression
1.1 Shortcomings
Suppose that we are given outcome measurements $y_1, \ldots, y_n \in \mathbb{R}$, and corresponding predictor
measurements $x_1, \ldots, x_n$
23:15 Wednesday 27th February, 2013
Chapter 12
Logistic Regression
12.1 Modeling Conditional Probabilities
So far, we either looked at estimating the conditional expectations of continuous
variables (as in regression), or at estimating distributions. Ther
11:53 Thursday 24th January, 2013
Chapter 4
Using Nonparametric
Smoothing in Regression
Having spent long enough running down linear regression, and thought through
evaluating predictive models, it is time to turn to constructive alternatives, which are
(
09:26 Thursday 28th March, 2013
Chapter 17
Principal Components Analysis
Principal components analysis (PCA) is one of a family of techniques for taking
high-dimensional data, and using the dependencies between the variables to represent
it in a more trac
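A minimal sketch of the idea in R, using the base function `prcomp` on simulated data. The three-variable data set driven by a single hidden factor is an assumption invented for this illustration, not an example from the notes.

```r
# Three observed variables that all depend on one hidden factor z
# (the factor structure and noise level are illustrative assumptions).
set.seed(3)
n <- 300
z <- rnorm(n)
X <- cbind(z + rnorm(n, sd = 0.1),
           2 * z + rnorm(n, sd = 0.1),
           -z + rnorm(n, sd = 0.1))

# Principal components of the centered data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Share of total variance carried by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first component: a one-dimensional summary of the data
scores <- pca$x[, 1]
```

Because the variables are strongly dependent here, the first component should carry nearly all of the variance, so `scores` stands in for the three original columns with little loss.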
12:10 Tuesday 12th February, 2013
Chapter 8
Splines
8.1 Smoothing by Directly Penalizing Curve Flexibility
Let's go back to the problem of smoothing one-dimensional data. We have data points
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, and we want to find a goo
10:25 Wednesday 30th January, 2013
Chapter 6
The Bootstrap
We are now several chapters into a statistics class and have said basically nothing
about uncertainty. This should seem odd, and may even be disturbing if you are very
attached to your p-values an
36-402/608
Homework #12 Solutions
4/22
1. Lymphoma and radiation (34 points)
Read problem 19.14 on page 574. Using ex1914.csv, load the data into R using this
code:
lymph = read.csv("ex1914.csv")
lymphA = array(t(cbind(lymph$survive,lymph$died)),
dim=c(2,2
36-402/608
Homework #12
due 10:30AM 4/22
1. Lymphoma and radiation (34 points)
Read problem 19.14 on page 574. Using ex1914.csv, load the data into R using this
code:
lymph = read.csv("ex1914.csv")
lymphA = array(t(cbind(lymph$survive,lymph$died)),
dim=c(2
36-402/608
Homework #6
due 10:30AM 2/25
1. Global warming (50 points)
This problem uses the global warming data of Sleuth Chapter 15, case 2 from
case1502.csv. See page 438 for a description.
(a) Read in the data, and fit a simple regression model of temper
36-402/608
Homework #5 Solutions
2/18
1. Moon phases and behavior (20 points, 5 each)
[Figure: Month (black) / sqrtAcc (red)]
There have been many studies of moon phases and behavior. The data for this
problem represent the daily accident rate for
36-402/608
Homework #5
due 10:30AM 2/18
1. Moon phases and behavior (20 points, 5 each)
[Figure: Month (black) / sqrtAcc (red)]
There have been many studies of moon phases and behavior. The data for this
problem represent the daily accident rate fo
Homework Assignment 2: The Advantages of
Backwardness
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
This problem set was based on the preliminary analysis in the paper
E. Maasoumi, J. S. Racine and T. Stengos, Growth and convergence: a profile of di
Homework Assignment 1: What's That Got to
Do with the Price of Condos in California?
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
The easiest way to load the data is with read.table, but you have to tell
R that the first line names the variables:
>
Midterm Exam 2: Mystery Multivariate Data
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General note: The data came from a ten-dimensional Gaussian. Each
variable had an expected value of 100 and a standard deviation of 15. The
correlation matrix
Midterm Exam 1: Urban Scaling, Continued
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General set-up:
gmp = read.csv(file = "gmp-2006.csv")
Your data file was derived from this data file, plus or minus 4% noise for each
observation.
1. Answer: The ba
Homework Assignment 10: Estimating with
DAGs
36-402, Advanced Data Analysis, Spring 2011
Solutions
1. (a) Answer:
Variable
cancer
cellular damage
tar
teeth
dental care
smoking
asbestos
occupation
(b) Answer:
Variable
cancer
cellular
tar
teeth
dental
smoki
Homework Assignment 9: Patterns of Exchange
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
1. (a) Answer:
# Load fx.csv as "fx"
fx = read.csv("http://www.stat.cmu.edu/~cshalizi/402/hw/09/fx.csv",
header = T, row.names = 1)
# verify that the matrix i
Homework Assignment 8: Fair's Affairs
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
library(AER)
data(Affairs)
1. Answer:
(a) When dealing with a counting variable Y with a known (not estimated) upper limit m, we can try to model it as having a bino
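One way to fit a binomial model to a count with a known upper limit m in R is a `glm` with a two-column response of successes and failures. The sketch below uses simulated data rather than the Affairs data; the value of m, the single covariate, and the true coefficients are all assumptions chosen for illustration.

```r
# Simulated counts bounded above by a known limit m
# (m, the covariate, and the true coefficients are illustrative assumptions).
set.seed(2)
n <- 500
m <- 5
x <- rnorm(n)
p <- plogis(-1 + 0.8 * x)          # success probability on the logistic scale
y <- rbinom(n, size = m, prob = p)

# Binomial GLM: the response is (successes, failures) out of m trials
fit <- glm(cbind(y, m - y) ~ x, family = binomial)
est <- coef(fit)
```

With this many observations, `est` should land close to the coefficients used in the simulation, which gives a quick sanity check on the model specification.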