13:58 Friday 15th February, 2013
Chapter 9
Additive Models
9.1
Partial Residuals and Back-fitting for Linear Models
The general form of a linear regression model is
$$\mathbb{E}\left[Y \mid \vec{X} = \vec{x}\right] = \beta_0 + \vec{\beta} \cdot \vec{x} = \sum_{j=0}^{p} \beta_j x_j \tag{9.1}$$
where for j ∈ 1:p, the xj are the components
36-402/608
Homework #10
due 10:30AM 4/1
1. Fixing Breakout 17 (60 points)
You must use SAS for this problem!
Modify the code in wallaby.sas to load the wallaby data and to create a new outcome
in the form of the log of the grams. Follow the model selectio
Using the data in ex0730.csv, do Sleuth problem 30 on page 204. Create an indicator
(a) Look at Sleuth problem 17.08 and the answer on page 528. Load the data
(a) Read in the data, produce a plot similar to the one on page 463,
(a) Read in the data, and fit a simple regression model of temperature
High-dimensional regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Back to linear regression
1.1
Shortcomings
Suppose that we are given outcome measurements y1, . . . , yn ∈ R, and corresponding predictor
measurements x1, . . . xn
23:15 Wednesday 27th February, 2013
Chapter 12
Logistic Regression
12.1
Modeling Conditional Probabilities
So far, we either looked at estimating the conditional expectations of continuous
variables (as in regression), or at estimating distributions. There
11:53 Thursday 24th January, 2013
Chapter 4
Using Nonparametric
Smoothing in Regression
Having spent long enough running down linear regression, and thought through
evaluating predictive models, it is time to turn to constructive alternatives, which are
(
09:26 Thursday 28th March, 2013
Chapter 17
Principal Components Analysis
Principal components analysis (PCA) is one of a family of techniques for taking
high-dimensional data, and using the dependencies between the variables to represent
it in a more tractable
12:10 Tuesday 12th February, 2013
Chapter 8
Splines
8.1
Smoothing by Directly Penalizing Curve Flexibility
Let's go back to the problem of smoothing one-dimensional data. We have data points
(x1, y1), (x2, y2), . . . (xn, yn), and we want to find a good
10:25 Wednesday 30th January, 2013
Chapter 6
The Bootstrap
We are now several chapters into a statistics class and have said basically nothing
about uncertainty. This should seem odd, and may even be disturbing if you are very
attached to your p-values and
Homework Assignment 2: The Advantages of
Backwardness
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
This problem set was based on the preliminary analysis in the paper
E. Maasoumi, J. S. Racine and T. Stengos, Growth and convergence: a profile of distribution
Homework Assignment 1: What's That Got to
Do with the Price of Condos in California?
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
The easiest way to load the data is with read.table, but you have to tell
R that the first line names the variables:
>
Midterm Exam 2: Mystery Multivariate Data
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General note: The data came from a ten-dimensional Gaussian. Each
variable had an expected value of 100 and a standard deviation of 15. The
correlation matrix
Midterm Exam 1: Urban Scaling, Continued
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
General set-up:
gmp = read.csv(file = "gmp-2006.csv")
Your data file was derived from this data file, plus or minus 4% noise for each
observation.
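As a rough illustration of what "plus or minus 4% noise" could mean, here is a minimal sketch. It assumes the noise is multiplicative and uniform, which the set-up does not state. The toy data frame, its column names, and the helper `perturb` are all hypothetical and are not taken from gmp-2006.csv or from the exam.

```r
# Toy stand-in for the real data; the column names and values are hypothetical.
set.seed(1)
toy <- data.frame(city = c("A", "B", "C"),
                  value = c(100, 250, 4000))

# Assumed noise model (not confirmed by the source): multiply each numeric
# entry by an independent factor drawn uniformly from [1 - frac, 1 + frac].
perturb <- function(df, frac = 0.04) {
  numeric.cols <- sapply(df, is.numeric)
  df[numeric.cols] <- lapply(df[numeric.cols], function(x) {
    x * runif(length(x), min = 1 - frac, max = 1 + frac)
  })
  df
}

noisy <- perturb(toy)

# Relative change of each perturbed value; each is at most 0.04 in magnitude.
rel.change <- (noisy$value - toy$value) / toy$value
```

Under this assumption, estimates computed from a perturbed copy should be close to, but not identical to, those computed from the original file.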
1. Answer: The basic
Homework Assignment 10: Estimating with
DAGs
36-402, Advanced Data Analysis, Spring 2011
Solutions
1. (a) Answer:
Variable
cancer
cellular damage
tar
teeth
dental care
smoking
asbestos
occupation
(b) Answer:
Variable
cancer
cellular
tar
teeth
dental
smoki
Homework Assignment 9: Patterns of Exchange
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
1. (a) Answer:
# Load fx.csv as "fx"
fx = read.csv("http://www.stat.cmu.edu/~cshalizi/402/hw/09/fx.csv",
header = T, row.names = 1)
# verify that the matrix is
Homework Assignment 8: Fair's Affairs
36-402, Advanced Data Analysis, Spring 2011
SOLUTIONS
library(AER)
data(Affairs)
1. Answer:
(a) When dealing with a counting variable Y with a known (not estimated) upper limit m, we can try to model it as having a bino