The linear least squares or linear regression problem

The linear least squares (or linear regression) problem is to compute

$$\min_b\ (y - Xb)'(y - Xb).$$

Thus we want to find the linear combination of the columns of $X$ which is closest to $y$ in the least squares sense. The solution is $b = X^+ y$, although this solution may not be unique.
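As a minimal sketch of this computation in R (the data X and y below are made up, and MASS::ginv is one way to obtain the Moore-Penrose inverse $X^+$; none of this is prescribed by the slides):

library(MASS)                   # for ginv(), the Moore-Penrose inverse

set.seed(1)
X <- matrix(rnorm(20), 10, 2)   # 10 observations, 2 predictors (made-up data)
y <- rnorm(10)

b <- ginv(X) %*% y              # b = X^+ y, the minimum-norm least squares solution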
Use $X = QR$ in the least squares problem to find

$$(y - X\beta)'(y - X\beta) = y'(I - QQ')y + (Q'y - R\beta)'(Q'y - R\beta).$$

Thus we merely have to solve the triangular system $Q'y = R\beta$, and the minimum we find is $y'(I - QQ')y$.
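A hedged base-R sketch of the same computation, assuming $X$ has full column rank and reusing the X and y from the previous sketch:

dec  <- qr(X)                              # thin QR decomposition, X = QR
Q    <- qr.Q(dec)                          # n x p matrix with orthonormal columns
R    <- qr.R(dec)                          # p x p upper triangular factor
beta <- backsolve(R, crossprod(Q, y))      # solve the triangular system R beta = Q'y

rss  <- sum((y - Q %*% crossprod(Q, y))^2) # the minimum, y'(I - QQ')y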
Linear Least Squares in R

lm(), lsfit(), qr()
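With the made-up X and y from the sketches above (and with the intercept suppressed so that all three routines fit the same model), the three interfaces can be compared as follows; this is only an illustrative sketch:

fit.lm <- coef(lm(y ~ X - 1))                          # model-formula interface
fit.ls <- lsfit(X, y, intercept = FALSE)$coefficients  # classic least squares fit
fit.qr <- qr.coef(qr(X), y)                            # coefficients straight from the QR decomposition

All three should return the same coefficients (up to naming).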
Ridge Regression

In OLS regression the problem of "bouncing betas" can be quite serious. Several methods have been proposed over the years to regularize the regression coefficients. The oldest one is Ridge Regression, in which we minimize

$$\sigma(B) = \|Y - XB\|^2 + \kappa \|B\|^2.$$

For a zero penalty this is OLS. If $\kappa$ increases we shrink $B$ towards the origin. This can be discussed in terms of the famous bias-variance tradeoff.
The solution is

$$B(\kappa) = (X'X + \kappa I)^{-1} X'Y,$$

which, as a function of $\kappa$, defines the ridge trace. Observe that, even if $X$ is singular,

$$\lim_{\kappa \downarrow 0} B(\kappa) = X^+ Y.$$
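A small sketch of the ridge solution and its trace in R, reusing the X and y from before; the grid of $\kappa$ values is an arbitrary choice:

ridge <- function(X, y, kappa) {
  p <- ncol(X)
  solve(crossprod(X) + kappa * diag(p), crossprod(X, y))  # (X'X + kappa I)^{-1} X'y
}

kappas  <- seq(0, 10, length.out = 50)
B.trace <- sapply(kappas, function(k) ridge(X, y, k))     # one column of coefficients per kappa
matplot(kappas, t(B.trace), type = "l",
        xlab = "kappa", ylab = "coefficients")            # the ridge trace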
Eigenvalue problems

Let's first discuss the power method. For this method

$$y^{(k)} = Ax^{(k)}, \qquad x^{(k+1)} = \frac{y^{(k)}}{\|y^{(k)}\|},$$

and we find that $x^{(k)}$ converges as $k \to \infty$ to the normalized eigenvector corresponding to the largest eigenvalue.
library(car)   # provides ellipse() for drawing the unit circle

powIter <- function(a, x) {
  i <- 1
  repeat {
    pdf(paste("evec", as.character(i), ".pdf", sep = ""))  # one PDF per iteration
    z <- 3 * matrix(c(-1, -1, 1, 1, -1, 1, -1, 1), 4, 2)
    plot(z, type = "n")
    ellipse(c(0, 0), diag(2), 1)
    x <- x / sqrt(sum(x^2))                 # normalize the current iterate
    text(t(x), "x")
    lines(t(matrix(c(0, 0, x), 2, 2)))      # segment from the origin to x
    y <- as.vector(a %*% x)
    u <- y / sqrt(sum(y^2))                 # normalized image Ax
    text(t(y), "Ax")
    lines(t(matrix(c(0, 0, y), 2, 2)))      # segment from the origin to Ax
    dev.off()
    if (max(abs(x - u)) < 1e-6) return(u)   # converged: return the eigenvector estimate
    x <- u
    i <- i + 1
  }
}
Low Rank Approximation

Suppose $X$ is an $n \times m$ matrix. We want to approximate $X$ by a product $AB'$, with $A$ an $n \times r$ matrix and $B$ an $m \times r$ matrix. Or, equivalently, we want to approximate $X$ by a matrix of rank at most $r$. We usually choose $r \ll \min(m, n)$, so we have substantial data reduction. The loss function is

$$\sigma(A, B) = \|X - AB'\|^2.$$
One can also interpret this as creating a number of new columns (variables) $A$ such that the original variables in $X$ are closely approximated by linear combinations of these new variables, which are traditionally called the principal components. The stationary equations are

$$X'A = BA'A, \qquad XB = AB'B.$$
This leads immediately to an alternating least squares algorithm. Start with some $A^{(0)}$. Then alternate the steps

$$B^{(k)} = X'A^{(k)}\left((A^{(k)})'A^{(k)}\right)^{-1}, \qquad A^{(k+1)} = XB^{(k)}\left((B^{(k)})'B^{(k)}\right)^{-1}.$$

Both steps decrease the loss function and thus lead to a convergent algorithm. Variations are possible, for example by using QR to transform $A^{(k)}$ to orthonormality.
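A hedged R sketch of this alternating scheme; the random start, the tolerance, and the convergence check on the loss are illustrative choices, not part of the slides:

lowRankALS <- function(X, r, eps = 1e-8, maxit = 1000) {
  A    <- matrix(rnorm(nrow(X) * r), nrow(X), r)    # random start A^(0)
  loss <- Inf
  for (k in 1:maxit) {
    B <- crossprod(X, A) %*% solve(crossprod(A))    # B = X'A (A'A)^{-1}
    A <- X %*% B %*% solve(crossprod(B))            # A = XB (B'B)^{-1}
    f <- sum((X - tcrossprod(A, B))^2)              # sigma(A, B) = ||X - AB'||^2
    if (loss - f < eps) break                       # stop when the loss no longer decreases
    loss <- f
  }
  list(A = A, B = B, loss = f)
}

set.seed(2)
X   <- matrix(rnorm(60), 10, 6)                     # made-up 10 x 6 data matrix
fit <- lowRankALS(X, r = 2)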
Such variations are possible because of a basic non-uniqueness of the approximation of $X$ by $AB'$. Suppose $T$ is non-singular. Define $\tilde A = AT$ and $\tilde B = B(T^{-1})'$. Then we clearly have $\tilde A\tilde B' = AB'$. Thus we require, without loss of generality, that $A'A = I$, or that $B'B = I$, or that both $A'A$ and $B'B$ are diagonal. Especially if we approximate in fairly high dimensionality, this leads to a considerable amount of non-uniqueness.
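A small check of this invariance in R, using the fit from the ALS sketch above; Tmat plays the role of the non-singular $T$ (here with $r = 2$):

Tmat <- matrix(rnorm(4), 2, 2)             # arbitrary (almost surely non-singular) 2 x 2 matrix
A.t  <- fit$A %*% Tmat
B.t  <- fit$B %*% t(solve(Tmat))
all.equal(tcrossprod(fit$A, fit$B),
          tcrossprod(A.t, B.t))             # TRUE: the product AB' is unchanged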