# 5.5 High-Dimensional Data Fitting

If we add an extra component, equal to 1, to each data vector $x_j$, so that now $x_j = [1, x_{j1}, \ldots, x_{jd}]^T$ for $j = 1, \ldots, N$, then we can write (5.64) as

$$f(x) = a^T x, \qquad a = [a_0, a_1, \ldots, a_d]^T, \tag{5.65}$$

and the dimension $d$ is increased by one. Then we are seeking a vector $a \in \mathbb{R}^d$ that minimizes

$$E(a) = \sum_{j=1}^{N} \left[ f_j - a^T x_j \right]^2. \tag{5.66}$$

Putting the data $x_j^T$ as rows of an $N \times d$ ($N \ge d$) matrix $X$, and the $f_j$ as the components of a (column) vector $f$, i.e.

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix} \quad \text{and} \quad f = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_N \end{bmatrix}, \tag{5.67}$$
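The augmentation step above can be sketched in a few lines of NumPy. The data values here are illustrative, not from the text; the point is only how the column of ones is prepended so that each row of $X$ has the form $[1, x_{j1}, \ldots, x_{jd}]$.

```python
import numpy as np

# Hypothetical data set: N = 4 points in d = 2 dimensions,
# with observed values f_j (values are made up for illustration).
data = np.array([[0.0, 1.0],
                 [1.0, 0.5],
                 [2.0, 2.0],
                 [3.0, 3.5]])
f = np.array([1.0, 2.0, 2.5, 4.0])

# Prepend the extra component equal to 1 to each data vector x_j,
# so each row of X is [1, x_j1, ..., x_jd], as in (5.65)-(5.67).
X = np.hstack([np.ones((data.shape[0], 1)), data])
print(X.shape)  # (4, 3): N rows, d + 1 columns
```

With this convention the constant term $a_0$ is absorbed into the coefficient vector $a$, so the model is simply $f(x) = a^T x$.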
we can write (5.66) as

$$E(a) = (f - Xa)^T (f - Xa) = \| f - Xa \|^2. \tag{5.68}$$

The normal equations are given by the condition $\nabla_a E(a) = 0$. Since $\nabla_a E(a) = -2 X^T f + 2 X^T X a$, we get the linear system of equations

$$X^T X a = X^T f. \tag{5.69}$$

Every solution of the least squares problem is necessarily a solution of the normal equations. We will prove that the converse is also true and that the solutions have a geometric characterization. Let $W$ be the linear subspace of $\mathbb{R}^N$ spanned by the columns of $X$. Then the least squares problem is equivalent to minimizing $\| f - w \|^2$ among all vectors $w$ in $W$. There is always at least one solution, which can be obtained by projecting $f$ onto $W$, as Fig. 5.2 illustrates.

First, note that a vector $r \in \mathbb{R}^N$ is orthogonal to $W$ if and only if it is orthogonal to each column of $X$, i.e. $X^T r = 0$. Hence, if $a \in \mathbb{R}^d$ is a solution of the normal equations (5.69), then the residual $f - Xa$ is orthogonal to $W$ because

$$X^T (f - Xa) = X^T f - X^T X a = 0. \tag{5.70}$$

Let $a^*$ be a solution of the normal equations, let $r = f - X a^*$, and for arbitrary $a \in \mathbb{R}^d$, let $s = Xa - X a^*$. Then we have

$$\| f - Xa \|^2 = \| f - X a^* - (Xa - X a^*) \|^2 = \| r - s \|^2. \tag{5.71}$$

But $r$ and $s$ are orthogonal, since $s \in W$. Therefore,

$$\| r - s \|^2 = \| r \|^2 + \| s \|^2 \ge \| r \|^2, \tag{5.72}$$

and so we have proved that

$$\| f - Xa \|^2 \ge \| f - X a^* \|^2 \tag{5.73}$$

for arbitrary $a \in \mathbb{R}^d$, i.e. $a^*$ minimizes $\| f - Xa \|^2$.

If the columns of $X$ are linearly independent, i.e. if for every $a \neq 0$ we have $Xa \neq 0$, then the $d \times d$ matrix $X^T X$ is positive definite and hence nonsingular. Therefore, in this case, there is a unique solution to the least squares problem $\min_a \| f - Xa \|^2$, given by

$$a^* = (X^T X)^{-1} X^T f. \tag{5.74}$$
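The two facts just proved can be checked numerically: solving the normal equations (5.69) gives a residual orthogonal to the columns of $X$, and the result agrees with a general-purpose least squares solver. This is a minimal sketch with randomly generated data (not from the text); a random $N \times d$ matrix with $N > d$ has linearly independent columns with probability 1, so the solution is unique.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.standard_normal((N, d))  # full column rank almost surely
f = rng.standard_normal(N)

# Solve the normal equations X^T X a = X^T f, as in (5.69).
a_star = np.linalg.solve(X.T @ X, X.T @ f)

# The residual r = f - X a* is orthogonal to the column space of X (5.70):
r = f - X @ a_star
print(np.linalg.norm(X.T @ r))  # tiny (zero up to rounding error)

# Cross-check against NumPy's least squares solver.
a_lstsq, *_ = np.linalg.lstsq(X, f, rcond=None)
print(np.allclose(a_star, a_lstsq))  # True
```

Note that forming $X^T X$ explicitly, as done here for illustration, squares the condition number of the problem; the text addresses this issue next.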
*Figure 5.2: Geometric interpretation of the solution $Xa$ of the least squares problem as the orthogonal projection of $f$ on the approximating linear subspace $W$.*

The $d \times N$ matrix

$$X^{\dagger} = (X^T X)^{-1} X^T \tag{5.75}$$

is called the pseudoinverse of the $N \times d$ matrix $X$. Note that if $X$ were square and nonsingular, $X^{\dagger}$ would coincide with the inverse, $X^{-1}$.

As we have done in the other least squares problems we have seen so far, rather than working with the normal equations, whose matrix $X^T X$ may be very sensitive to perturbations in the data, we use an orthogonal basis for the approximating subspace ($W$ in this case) to find a solution. While in principle this can be done by applying the Gram-Schmidt process to the columns of $X$, this is a numerically unstable procedure; when two columns are nearly linearly dependent, errors introduced by the finite precision representation
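The orthogonal-basis approach can be sketched with a QR factorization. NumPy's `np.linalg.qr` uses Householder reflections rather than Gram-Schmidt, avoiding the instability mentioned above; the data here are random and purely illustrative. Writing $X = QR$ with $Q^T Q = I$, the normal equations reduce to the triangular system $R a = Q^T f$, and the result matches both the normal-equations solution and the pseudoinverse formula (5.75).

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = rng.standard_normal((N, d))
f = rng.standard_normal(N)

# Orthogonalize the columns of X: X = Q R, with Q (N x d) having
# orthonormal columns and R (d x d) upper triangular.
Q, R = np.linalg.qr(X)

# X^T X a = X^T f reduces to R a = Q^T f, a triangular system.
a_qr = np.linalg.solve(R, Q.T @ f)

# On this well-conditioned example it agrees with the normal equations
# and with the pseudoinverse solution a* = X^† f from (5.74)-(5.75).
a_ne = np.linalg.solve(X.T @ X, X.T @ f)
print(np.allclose(a_qr, a_ne))                   # True
print(np.allclose(np.linalg.pinv(X) @ f, a_qr))  # True
```

The QR route never forms $X^T X$, which is why it remains accurate when the columns of $X$ are nearly linearly dependent.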