MS&E 226 Problem Set 2
"Small" Data
Due: October 13, 2016, 5:00 PM (electronic submission on Gradescope)

Problem 1. (Intercepts and centering) Suppose you are given observations $Y_i$, $i = 1, \ldots, n$, and covariates $X_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$. Suppose we center each covariate by removing its sample mean: $\tilde{X}_{ij} = X_{ij} - \bar{X}_j$, where $\bar{X}_j = \frac{1}{n} \sum_{i=1}^n X_{ij}$. In addition, center the observations: $\tilde{Y}_i = Y_i - \bar{Y}$, where $\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i$. Now suppose we fit a linear model $\tilde{Y}_i \approx \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j \tilde{X}_{ij}$. Show that in the resulting ordinary least squares solution, $\hat{\beta}_0 = 0$.

Problem 2. (Linear transformations) Suppose you are given data $Y_i$, $i = 1, \ldots, n$, and covariates $X_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$. Assume that $p < n$ and the $n \times (p+1)$ design matrix $\mathbf{X}$ has full rank $p+1$, with first column equal to all $1$'s. You fit a linear regression model $Y \approx \mathbf{X} \hat{\beta}$ by ordinary least squares and obtain the coefficients $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$. Given an invertible $(p+1) \times (p+1)$ matrix $A$, define $\tilde{\mathbf{X}} = \mathbf{X} A$. Fit a linear model $Y \approx \tilde{\mathbf{X}} \tilde{\beta}$. What are the coefficients $\tilde{\beta}$ produced by ordinary least squares?
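(As a numerical sanity check on Problem 1, not a proof, one can center a small simulated dataset in R and inspect the fitted intercept; the variable names and simulated coefficients below are illustrative, not part of the assignment.)

```r
# Sanity check for Problem 1: after centering Y and each column of X,
# the OLS intercept should be zero (up to floating point).
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- 2 + X %*% c(1, -1, 0.5) + rnorm(n)
Xc <- scale(X, center = TRUE, scale = FALSE)  # subtract each column's sample mean
Yc <- Y - mean(Y)                             # subtract the sample mean of Y
fit <- lm(Yc ~ 1 + Xc)
coef(fit)[1]  # intercept: numerically zero
```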
Problem 3. (Collinearity) In this problem you will use a simulated dataset in R to get a feel for collinearity in regression models. First, create a vector X consisting of n = 1000 i.i.d. samples from a N(0, 1) distribution:

> X = rnorm(1000, 0, 1)

This will be our "synthetic" covariate vector for this example. (Note: rnorm(n, a, b) generates n i.i.d. normal random variables with mean a and standard deviation b.) Next, create 1000 "synthetic" observed outcomes Y, as follows:

> Y = 1 + X + rnorm(1000, 0, 0.5)

Note that Y is approximately linear in X, but with noise added.

a) Run the regression lm(formula = Y ~ 1 + X). What is the coefficient on X? What is $R^2$?

b) Now repeat the following steps 10 times:

(i) Create a vector Z as:

> Z = X + rnorm(1000, 0, 0.05)

(ii) Run the regression lm(formula = Y ~ 1 + X + Z). Record the coefficients on X and Z, as well as the resulting $R^2$.

What do you notice about the coefficients on X and Z in your simulations? Compare your results, as well as the $R^2$ obtained, to part (a). (For further exploration, you might also find it interesting to try using a noise standard deviation larger or smaller than 0.05 in the expression for Z above.)

c) Now we study out-of-sample predictions
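(The ten repetitions in part (b) can be scripted rather than typed by hand; one possible sketch follows. The loop structure, seed, and the `results` data frame are our own choices, not part of the assignment.)

```r
# Sketch of part (b): repeat the Z-regression 10 times, recording the
# coefficients on X and Z and the R^2 from each fit.
set.seed(226)
X <- rnorm(1000, 0, 1)
Y <- 1 + X + rnorm(1000, 0, 0.5)
results <- data.frame(coef_X = numeric(10), coef_Z = numeric(10), r2 = numeric(10))
for (k in 1:10) {
  Z <- X + rnorm(1000, 0, 0.05)  # Z is nearly collinear with X
  fit <- lm(Y ~ 1 + X + Z)
  results[k, ] <- c(coef(fit)["X"], coef(fit)["Z"], summary(fit)$r.squared)
}
results
```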
