If we add an extra component, equal to 1, to each data vector $x_j$ so that now $x_j = [1, x_{j1}, \ldots, x_{jd}]^T$, for $j = 1, \ldots, N$, then we can write (5.64) as
$$
f(x) = a^T x, \qquad a = [a_0, a_1, \ldots, a_d]^T, \tag{5.65}
$$
and the dimension $d$ is increased by one. We then seek a vector $a \in \mathbb{R}^d$ that minimizes
$$
E(a) = \sum_{j=1}^{N} \left[ f_j - a^T x_j \right]^2. \tag{5.66}
$$
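To make the setup concrete, here is a minimal NumPy sketch of the augmentation step; the data values and the names `data`, `f`, and `X` are made up for illustration and are not taken from the text.

```python
import numpy as np

# Hypothetical data (made up for illustration): N = 5 samples, d = 2 features.
data = np.array([[0.0, 1.0],
                 [1.0, 0.5],
                 [2.0, 1.5],
                 [3.0, 2.0],
                 [4.0, 2.5]])            # row j holds the original x_j in R^2
f = np.array([1.0, 1.8, 3.1, 4.2, 4.9])  # observed values f_j

N = data.shape[0]

# Prepend the extra component equal to 1, so that row j of X is
# x_j = [1, x_{j1}, ..., x_{jd}]^T and the coefficient a_0 is absorbed into a.
X = np.hstack([np.ones((N, 1)), data])

print(X.shape)   # (5, 3): the dimension d has increased by one
```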
Putting the data vectors $x_j$ as the rows of an $N \times d$ ($N \ge d$) matrix $X$ and the values $f_j$ as the components of a (column) vector $f$, i.e.
$$
X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}
\quad \text{and} \quad
f = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_N \end{bmatrix}, \tag{5.67}
$$
we can write (5.66) as
$$
E(a) = (f - Xa)^T (f - Xa) = \| f - Xa \|^2. \tag{5.68}
$$
The normal equations are given by the condition $\nabla_a E(a) = 0$. Since $\nabla_a E(a) = -2 X^T f + 2 X^T X a$, we get the linear system of equations
$$
X^T X a = X^T f. \tag{5.69}
$$
Every solution of the least squares problem is necessarily a solution of the normal equations. We will prove that the converse is also true and that the solutions have a geometric characterization. Let $W$ be the linear subspace of $\mathbb{R}^N$ spanned by the columns of $X$. Then the least squares problem is equivalent to minimizing $\| f - w \|^2$ among all vectors $w$ in $W$. There is always at least one solution, which can be obtained by projecting $f$ onto $W$, as Fig. 5.2 illustrates. First, note that if $a \in \mathbb{R}^d$ is a solution of the normal equations (5.69), then the residual $f - Xa$ is orthogonal to $W$, because
$$
X^T (f - Xa) = X^T f - X^T X a = 0 \tag{5.70}
$$
and a vector $r \in \mathbb{R}^N$ is orthogonal to $W$ if it is orthogonal to each column of $X$, i.e. $X^T r = 0$. Let $a^*$ be a solution of the normal equations, let $r = f - X a^*$, and for arbitrary $a \in \mathbb{R}^d$, let $s = X a - X a^*$. Then we have
$$
\| f - Xa \|^2 = \| f - X a^* - (X a - X a^*) \|^2 = \| r - s \|^2. \tag{5.71}
$$
But $r$ and $s$ are orthogonal, since $s = X(a - a^*)$ lies in $W$ while $r$ is orthogonal to $W$. Therefore,
$$
\| r - s \|^2 = \| r \|^2 + \| s \|^2 \ge \| r \|^2, \tag{5.72}
$$
and so we have proved that
$$
\| f - Xa \|^2 \ge \| f - X a^* \|^2 \tag{5.73}
$$
for arbitrary $a \in \mathbb{R}^d$, i.e. $a^*$ minimizes $\| f - Xa \|^2$. If the columns of $X$ are linearly independent, i.e. if $Xa \neq 0$ for every $a \neq 0$, then the $d \times d$ matrix $X^T X$ is positive definite and hence nonsingular. Therefore, in this case, there is a unique solution to the least squares problem $\min_a \| f - Xa \|^2$, given by
$$
a^* = (X^T X)^{-1} X^T f. \tag{5.74}
$$
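As a quick numerical check of (5.69) and (5.74), here is a small sketch with synthetic data and variable names of my own choosing (not from the text), comparing the normal-equations solution with NumPy's built-in least squares routine.

```python
import numpy as np

# Synthetic data (made up for illustration).
rng = np.random.default_rng(0)
N, d = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d - 1))])
a_true = np.array([0.5, 2.0, -1.0])
f = X @ a_true + 0.1 * rng.normal(size=N)

# Solve the normal equations X^T X a = X^T f, eq. (5.69).
# Solving the linear system is preferable to forming the explicit
# inverse in (5.74); in exact arithmetic both give the same a*.
a_star = np.linalg.solve(X.T @ X, X.T @ f)

# Cross-check against NumPy's least squares solver.
a_lstsq, *_ = np.linalg.lstsq(X, f, rcond=None)
print(np.allclose(a_star, a_lstsq))   # True (up to rounding)
```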
Figure 5.2: Geometric interpretation of the solution $Xa$ of the least squares problem as the orthogonal projection of $f$ on the approximating linear subspace $W$.

The $d \times N$ matrix
$$
X^\dagger = (X^T X)^{-1} X^T \tag{5.75}
$$
is called the pseudoinverse of the $N \times d$ matrix $X$. Note that if $X$ were square and nonsingular, $X^\dagger$ would coincide with the inverse $X^{-1}$.

As we have done in the other least squares problems we have seen so far, rather than working with the normal equations, whose matrix $X^T X$ may be very sensitive to perturbations in the data, we use an orthogonal basis for the approximating subspace ($W$ in this case) to find a solution. While in principle this could be done by applying the Gram-Schmidt process to the columns of $X$, that is a numerically unstable procedure: when two columns are nearly linearly dependent, errors introduced by the finite precision representation can lead to a significant loss of orthogonality in the computed vectors.
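The following is only a sketch of the orthogonal-basis idea under my own assumptions: the Vandermonde-type example data and the use of NumPy's Householder-based `qr` are choices made here for illustration, not necessarily the procedure the notes go on to develop. It shows both the conditioning problem with $X^T X$ and how a QR factorization avoids forming it.

```python
import numpy as np

# Hypothetical ill-conditioned example: a monomial (Vandermonde-type)
# design matrix, whose columns are close to being linearly dependent.
N, degree = 100, 6
t = np.linspace(0.0, 1.0, N)
X = np.vander(t, degree + 1, increasing=True)   # columns 1, t, t^2, ..., t^6
f = np.cos(4.0 * t)                             # values to fit

# Forming X^T X squares the conditioning: cond(X^T X) ~ cond(X)^2.
print(np.linalg.cond(X), np.linalg.cond(X.T @ X))

# Orthogonal-basis alternative: X = QR with Q^T Q = I, so minimizing
# ||f - Xa|| reduces to the triangular system R a = Q^T f.  NumPy's qr
# uses Householder reflections, which do not suffer the loss of
# orthogonality of classical Gram-Schmidt.
Q, R = np.linalg.qr(X)        # reduced QR: Q is N x (degree+1), R is square
a_qr = np.linalg.solve(R, Q.T @ f)

# Agrees (up to rounding) with the pseudoinverse formula (5.75).
a_pinv = np.linalg.pinv(X) @ f
print(np.allclose(a_qr, a_pinv))
```

Solving $R a = Q^T f$ works with a conditioning proportional to $\mathrm{cond}(X)$ rather than $\mathrm{cond}(X)^2$, which is the practical reason for preferring an orthogonal factorization over the normal equations.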
