Section 5.4 Orthogonal Systems
27. For each matrix A in Exercise 26, find a basis {v_i} for Range(A) [= Col(A)] and a basis {w_i} for Null(A^T). Then verify that the v_i are orthogonal to the w_i, as required by Theorem 7, part (ii).

28. For each of the following matrices A, express the 1's vector 1 as a unique sum, 1 = x_1 + x_2, of a vector x_1 in Row(A) and a vector x_2 in Null(A).

$$\text{(a) }\begin{bmatrix} 1 & -2\\ -2 & 4\end{bmatrix} \qquad \text{(b) }\begin{bmatrix} 2 & 1\\ 1 & 2\end{bmatrix} \qquad \text{(c) }\begin{bmatrix} 1 & 2 & 0\\ 0 & 1 & 2\end{bmatrix} \qquad \text{(d) }\begin{bmatrix} 1 & 1 & 0\\ 2 & 1 & 2\\ 1 & 0 & 1\end{bmatrix}$$

29. Find a solution to Ax = 1, for A the matrix in Exercise 28, part (c), in which x is in Row(A).

30. Use Theorem 8 to prove that if v_1, v_2, . . . , v_k are a linearly independent set of vectors in the row space of a matrix A, then w_i = Av_i are a linearly independent set of vectors in the range of A. Thus, if {v_i} are a basis for Row(A), then {Av_i} are a basis for Col(A).

Orthogonal Systems

In Section 5.3 we saw that the calculation of the pseudoinverse of a matrix A simplified greatly if the columns of A were orthogonal. In this section we examine sets of orthogonal vectors further. If a set of vectors, such as the columns of a matrix, is not orthogonal, we give a procedure to transform it into an equivalent set of orthogonal vectors. Finally, we generalize the idea of an orthogonal set of vectors to build vector spaces for continuous functions generated by an orthogonal set of functions.

The idea of projecting one vector onto another vector is used over and over again in this section. Such projections provide simple solutions to systems of equations Ax = b for which the columns of A are orthogonal. The underlying computational property that makes it easy to work with orthogonal columns is: if a and b are orthogonal, their scalar product a · b = 0. Scalar products are the building blocks for much of matrix algebra (e.g., each entry in the product of two matrices is a scalar product). Thus computations with orthogonal vectors create a lot of 0's and hence yield simple results.

The inverse A^{-1} of a matrix A with orthogonal columns a_i^C is easy to describe. It is essentially the same as the pseudoinverse: A^{-1} is formed by dividing each column a_i^C by a_i^C · a_i^C, the sum of the squares of its entries, and forming the transpose of the resulting matrix. Thus, if s_i = a_i^C · a_i^C, then

$$A^{-1} = \begin{bmatrix} \dfrac{\mathbf{a}_1^C}{s_1} & \dfrac{\mathbf{a}_2^C}{s_2} & \cdots & \dfrac{\mathbf{a}_n^C}{s_n}\end{bmatrix}^T \qquad (1)$$

We verify (1) by noting that entry (i, j) in A^{-1}A will be 0 if i ≠ j, because a_i^C · a_j^C = 0 (the columns are orthogonal). Entry (i, i) equals (a_i^C/s_i) · a_i^C = a_i^C · a_i^C/(a_i^C · a_i^C) = 1.

Example 1. Inverse of a Matrix with Orthogonal Columns

(i) Consider the matrix A = $\begin{bmatrix} 3 & -4\\ 4 & 3\end{bmatrix}$, whose columns are orthogonal. The sum of the squares of the entries in each column of A is 3² + 4² = 25. If we divide each column by 25 and take the transpose, we obtain

$$A^{-1} = \begin{bmatrix} \tfrac{3}{25} & \tfrac{4}{25}\\ -\tfrac{4}{25} & \tfrac{3}{25}\end{bmatrix}$$

The reader should check that this matrix is exactly what one would get by computing this 2-by-2 inverse using elimination.

(ii) Consider the orthogonal-column matrix

$$A = \begin{bmatrix} 2 & 1 & 0\\ 1 & -1 & 1\\ 1 & -1 & -1\end{bmatrix}$$

Its inverse, by (1), is

$$A^{-1} = \begin{bmatrix} \tfrac13 & \tfrac16 & \tfrac16\\ \tfrac13 & -\tfrac13 & -\tfrac13\\ 0 & \tfrac12 & -\tfrac12\end{bmatrix}$$

Again the reader should check that A^{-1}A = I.
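Formula (1) is easy to check numerically. The following is a minimal sketch (using NumPy; the helper name `inverse_by_columns` is my own, not the book's) that builds the inverse of the 3-by-3 matrix of Example 1(ii) by scaling its columns and transposing, then compares the result with a library inverse.

```python
import numpy as np

def inverse_by_columns(A):
    """Inverse of a matrix with orthogonal columns, via formula (1):
    divide each column by the sum of the squares of its entries, then transpose."""
    s = (A * A).sum(axis=0)          # s_i = a_i . a_i for each column
    return (A / s).T                 # scale the columns, then transpose

A = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 1.0],
              [1.0, -1.0, -1.0]])    # columns are mutually orthogonal

A_inv = inverse_by_columns(A)
print(A_inv)
print(np.allclose(A_inv @ A, np.eye(3)))      # True: A^{-1} A = I
print(np.allclose(A_inv, np.linalg.inv(A)))   # agrees with the usual inverse
```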
Let us use (1) to obtain a formula for the ith component x_i in the solution x to Ax = b. Given the inverse A^{-1}, we can find x as x = A^{-1}b. The ith component in A^{-1}b is the scalar product of the ith row of A^{-1} with b. By (1), the ith row of A^{-1} is a_i^C/(a_i^C · a_i^C), and thus

$$x_i = \frac{\mathbf{a}_i^C \cdot \mathbf{b}}{\mathbf{a}_i^C \cdot \mathbf{a}_i^C} \qquad (2)$$

This is our old friend, the length of the projection of b onto column a_i^C (see Theorem 2 of Section 5.3).

A set of orthogonal vectors of unit length (whose norm is 1) is called orthonormal. The preceding formulas for x_i and A^{-1} become even nicer if the columns of A are orthonormal. In this case, a_i^C · a_i^C = 1. Then the denominator in (2) is 1, so the projection formula is simply x_i = a_i^C · b. To obtain A^{-1}, we divide each column of A by 1 and form the transpose: that is, A^{-1} = A^T. Summarizing this discussion, we have

Theorem 1
(i) If A is an n-by-n matrix whose columns are orthogonal, then A^{-1} is obtained by dividing the ith column of A by the sum of the squares of its entries and transposing the resulting matrix [see (1)]. The ith component x_i in the solution of Ax = b is the length of the projection of b onto a_i^C: x_i = (a_i^C · b)/(a_i^C · a_i^C).
(ii) If the columns of A are orthonormal, then the inverse A^{-1} is A^T, and the length of the projection is just x_i = a_i^C · b.

Suppose that we have a basis of n orthogonal vectors q_i for n-space. If Q has the q_i as its columns, the solution x = b* of Qx = b will be a vector b* of lengths of the projections of b onto each q_i:

$$Q\mathbf{b}^* = b_1^*\mathbf{q}_1 + b_2^*\mathbf{q}_2 + \cdots + b_n^*\mathbf{q}_n = \mathbf{b} \qquad (3)$$

Here the term b_1^* q_1 is just the projection of b onto q_1. So (3) simply says

Corollary. Any n-vector b can be expressed as the sum of the projections of b onto a set of n orthogonal vectors q_i.

Example 2. Conversion of Coordinates from One Basis to Another

Consider the orthonormal basis q_1 = [.8, .6], q_2 = [-.6, .8] for 2-space. To express the vector b = [1, 2] in terms of q_1, q_2 coordinates, we need to solve the system Qb* = b, where Q = [q_1 q_2]:

$$\begin{bmatrix} .8 & -.6\\ .6 & .8\end{bmatrix}\begin{bmatrix} b_1^*\\ b_2^*\end{bmatrix} = \begin{bmatrix} 1\\ 2\end{bmatrix} \qquad\text{or}\qquad b_1^*\begin{bmatrix} .8\\ .6\end{bmatrix} + b_2^*\begin{bmatrix} -.6\\ .8\end{bmatrix} = \begin{bmatrix} 1\\ 2\end{bmatrix}$$

By Theorem 1,

b_1^* = q_1 · b = .8 × 1 + .6 × 2 = 2,  b_2^* = q_2 · b = -.6 × 1 + .8 × 2 = 1

where b_1^* q_1 = 2[.8, .6] is the projection of b onto q_1, and b_2^* q_2 = [-.6, .8] is the projection of b onto q_2. Thus b = [1, 2] is the vector expressed in e_1-e_2 coordinates, while b* = [2, 1] is the same vector expressed in q_1-q_2 coordinates. A geometric picture of this conversion is given in Figure 5.6, where b is depicted as the sum of its projections onto q_1 and onto q_2.

Figure 5.6 The vector b = [1, 2] resolved into its projections b_1^* q_1 and b_2^* q_2 along the q_1- and q_2-axes.

Theorem 1 is a carbon copy of Theorem 5 of Section 5.3 about pseudoinverses when columns are orthogonal. As with the inverse, if A's columns are orthonormal, the pseudoinverse A^+ of A will simply be A^T. The following example gives a familiar illustration of this result and shows why orthogonal columns make inverses and pseudoinverses so similar.

Example 3. Pseudoinverse of a Matrix with Orthonormal Columns

Let I_2 be the first two columns of the 3-by-3 identity matrix.

$$I_2 = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}
\qquad\text{Then}\qquad
I_2^+ = I_2^T = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$$

For any vector b = [b_1, b_2, b_3], the least-squares solution x = b* to I_2 x = b is

$$\mathbf{b}^* = I_2^+\mathbf{b} = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix} = \begin{bmatrix} b_1\\ b_2\end{bmatrix}$$

This result confirms our intuitive notion that [b_1, b_2, 0] is the closest point in the x-y plane to the point [b_1, b_2, b_3].
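Example 2 can be replayed in a few lines of Python: with orthonormal columns, the new coordinates are just dot products (projections), and Q^T plays the role of Q^{-1}. A small sketch (the variable names are mine, not the book's):

```python
import numpy as np

q1 = np.array([0.8, 0.6])
q2 = np.array([-0.6, 0.8])           # orthonormal basis for 2-space
Q = np.column_stack([q1, q2])

b = np.array([1.0, 2.0])

# Coordinates in the q1-q2 basis are projections, as in Theorem 1(ii): b*_i = q_i . b
b_star = np.array([q1 @ b, q2 @ b])
print(b_star)                                        # [2. 1.]
print(np.allclose(b_star, Q.T @ b))                  # same as the change of basis b -> Q^T b
print(np.allclose(b_star[0]*q1 + b_star[1]*q2, b))   # sum of projections rebuilds b
```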
Optional

There is another interesting geometric fact about orthonormal columns (see the Exercises for the two-dimensional case).

Theorem 2. When Q has orthonormal columns, solving Qb* = b, which gives b* = Q^T b, is equivalent to performing the orthonormal change of basis b → b* = Q^T b.

Such a basis change is simply a rotation of the coordinate axes, a reflection through a plane, or a combination of both. The entries in Q can be expressed in terms of the sines and cosines of the angles of this rotation. For example, the rotation of axes in the plane by θ° is a linear transformation R of 2-space:

R:  x' = x cos θ° + y sin θ°,   y' = -x sin θ° + y cos θ°,   or   u' = Au

where

$$A = \begin{bmatrix} \cos\theta^\circ & \sin\theta^\circ\\ -\sin\theta^\circ & \cos\theta^\circ\end{bmatrix}$$

It is easy to check that A has orthonormal columns. It follows that the distance between a pair of vectors and the angle that they form do not change with an orthonormal change of basis. (Note: End of optional material.)

Orthogonal columns have another important advantage besides easy formulas. A highly nonorthogonal set of columns, that is, columns that are almost parallel, can result in unstable computations.

Example 4. Nonorthogonal Columns

Consider the following system of equations:

1x_1 + .75x_2 = 5
1x_1 + 1x_2 = 7        (4)

Let us call the two column vectors in the coefficient matrix of (4) u = [1, 1] and v = [.75, 1]. The cosine of their angle is, by Theorem 6 of Section 5.3,

$$\cos\theta(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \frac{1.75}{\sqrt{2}\cdot 1.25} = .99 \qquad (5)$$

The angle with cosine .99 is 8°. Thus u and v are almost parallel (almost the same vector). Representing any 2-vector b as a linear combination of two vectors that are almost the same is tricky, that is, unstable. For example, to solve (4) we must find weights x_1, x_2 such that

$$x_1\begin{bmatrix}1\\1\end{bmatrix} + x_2\begin{bmatrix}.75\\1\end{bmatrix} = \begin{bmatrix}5\\7\end{bmatrix}$$

The system (4) is the canoe-with-sail system from Section 1.1. We already know that calculations with A, the coefficient matrix in (4), are very unstable. In Section 3.5 we computed the condition number of A to be c(A) ≈ 16. Recall that the condition number c(A) = ‖A‖ ‖A^{-1}‖ measures how much a relative error in the entries of A (or in b) can affect the relative error in x = [x_1, x_2]; in this case, a 5% error in b could cause an error 16 [= c(A)] times greater in x, a 16 × 5% = 80% error.

We solved (4) in Section 1.1 and obtained x_1 = -1, x_2 = 8. If we had solved for b' = [7, 5], we would have obtained the answer x_1 = 13, x_2 = -8 (see Figure 5.7 for a picture of this result). Or for b'' = [6, 6], x_1 = 6, x_2 = 0.

Figure 5.7 The vector b' = [7, 5] expressed as 13[1, 1] - 8[.75, 1].

Reading the results of Example 4 in reverse, we see that when errors arise in solving an ill-conditioned system of equations Ax = b (in which A has a large condition number), the source of the trouble is that some column vector (or a linear combination of columns) forms a small angle with another column vector; this means that the columns are almost linearly dependent. If the columns were close to mutually orthogonal, the system Ax = b would be well-conditioned.

Principle. Let A be an n-by-n matrix with rank(A) = n, so that the system of equations Ax = b has a unique solution. The solution to Ax = b will be more or less stable according to how close to, or far from, orthogonal the column vectors of A are.

Suppose that the columns of the n-by-n matrix A are linearly independent but not orthogonal. We shall show how to find a new n-by-n matrix A* of orthonormal columns (orthogonal and of unit length) that are linear combinations of the columns of A. Our procedure can be applied to any basis a_1, a_2, . . . , a_m of an m-dimensional space V and will yield a new basis of m orthonormal vectors q_i for V (unit-length vectors make calculations especially simple).
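Before turning to that procedure, the instability of Example 4 is easy to reproduce numerically: the two columns of the coefficient matrix are separated by only about 8 degrees, and nearby right-hand sides give wildly different solutions. A short check (a sketch only; it simply reuses the numbers from Example 4):

```python
import numpy as np

A = np.array([[1.0, 0.75],
              [1.0, 1.00]])
u, v = A[:, 0], A[:, 1]

cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_angle)))   # about 8 degrees: nearly parallel columns
print(np.linalg.cond(A, p=1))             # condition number in the sum norm, equal to 16 here

for b in ([5, 7], [7, 5], [6, 6]):        # nearby right-hand sides
    print(b, np.linalg.solve(A, b))       # solutions swing from [-1, 8] to [13, -8] to [6, 0]
```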
The procedure is inductive in the sense that the first k vectors q_i will be an orthonormal basis for the space V_k generated by the first k vectors a_i. The method is called Gram-Schmidt orthogonalization.

For k = 1, q_1 should be a multiple of a_1. To make q_1 have norm 1, we set q_1 = a_1/|a_1|. Next we must construct from a_2 a second unit vector q_2 orthogonal to q_1. We divide a_2 into two "parts": the part of a_2 parallel to q_1 and the part of a_2 orthogonal (perpendicular) to q_1 (see Figure 5.8). The component of a_2 in q_1's direction is simply the projection of a_2 onto q_1. This projection is sq_1, where the length s of the projection is

$$s = \frac{\mathbf{a}_2\cdot\mathbf{q}_1}{\mathbf{q}_1\cdot\mathbf{q}_1} = \mathbf{a}_2\cdot\mathbf{q}_1 \qquad (7)$$

since q_1 · q_1 = 1. The rest of a_2, the vector a_2 - sq_1, is orthogonal to the projection sq_1, and hence orthogonal to q_1. So a_2 - sq_1 is the orthogonal vector we want for q_2. To have unit norm, we set

q_2 = (a_2 - sq_1)/|a_2 - sq_1|

Let us show how the procedure works thus far.

Figure 5.8 Gram-Schmidt orthogonalization.

Example 5. Gram-Schmidt Orthogonalization in Two Dimensions

Suppose that a_1 = [3, 4] and a_2 = [2, 1] (see Figure 5.8). We set

q_1 = a_1/|a_1| = [3, 4]/5 = [3/5, 4/5]

We project a_2 onto q_1 to get the part of a_2 parallel to q_1. From (7), the length of the projection is

s = a_2 · q_1 = [2, 1] · [3/5, 4/5] = 10/5 = 2

and the projection is sq_1 = 2[3/5, 4/5] = [6/5, 8/5]. Next we determine the other part of a_2, the part orthogonal to sq_1:

a_2 - sq_1 = [2, 1] - [6/5, 8/5] = [4/5, -3/5]

Since |[4/5, -3/5]| = 1, then

q_2 = (a_2 - sq_1)/|a_2 - sq_1| = [4/5, -3/5]

We extend the previous construction by finding the projections of a_3 onto q_1 and q_2. Then the vector a_3 - s_1q_1 - s_2q_2, which is orthogonal to q_1 and q_2, should be q_3; as before, we divide a_3 - s_1q_1 - s_2q_2 by its norm to make q_3 unit length. We continue this process to find q_4, q_5, and so on.

Example 6. Gram-Schmidt Orthogonalization of a 3-by-3 Matrix

Let us perform orthogonalization on the matrix A whose ith column we denote by a_i.

$$A = \begin{bmatrix} 0 & 3 & 2\\ 3 & 5 & 5\\ 4 & 0 & 5\end{bmatrix} \qquad (8)$$

First q_1 = a_1/|a_1| = [0, 3, 4]/5 = [0, 3/5, 4/5]. The length of the projection of a_2 onto q_1 is

s = a_2 · q_1 = 3·0 + 5·(3/5) + 0·(4/5) = 3        (9a)

So the projection of a_2 onto q_1 is sq_1 = 3[0, 3/5, 4/5] = [0, 9/5, 12/5]. Next we compute

a_2 - sq_1 = [3, 5, 0] - [0, 9/5, 12/5] = [3, 16/5, -12/5]

where |a_2 - sq_1| = √(9 + 256/25 + 144/25) = 5. Then

q_2 = (a_2 - sq_1)/|a_2 - sq_1| = [3/5, 16/25, -12/25]

We compute the lengths of the projections of a_3 onto q_1 and q_2:

s_1 = a_3 · q_1 = 2·0 + 5·(3/5) + 5·(4/5) = 7,   s_2 = a_3 · q_2 = 2·(3/5) + 5·(16/25) + 5·(-12/25) = 2        (9b)

Then

q_3 = a_3 - s_1q_1 - s_2q_2 = [2, 5, 5] - 7[0, 3/5, 4/5] - 2[3/5, 16/25, -12/25] = [4/5, -12/25, 9/25]

which already has norm 1. The matrix of these new orthogonal column vectors is

$$Q = \begin{bmatrix} 0 & \tfrac35 & \tfrac45\\ \tfrac35 & \tfrac{16}{25} & -\tfrac{12}{25}\\ \tfrac45 & -\tfrac{12}{25} & \tfrac{9}{25}\end{bmatrix} \qquad (10)$$

In keeping with the principle above, the accuracy of this procedure depends on how close to, or far from, orthogonality the columns a_i are. If a linear combination of some a_i forms a small angle with another vector a_k (this means that the matrix A has a large condition number), then the resulting q_i will have errors, making them not exactly orthogonal. However, more stable methods are available using advanced techniques, such as Householder transformations.

Suppose that the columns of A are not linearly independent. If, say, a_3 is a linear combination of a_1 and a_2, then in the Gram-Schmidt procedure the error vector a_3 - s_1q_1 - s_2q_2 with respect to q_1 and q_2 will be 0. In this case we skip a_3 and use a_4 - s_1q_1 - s_2q_2 to define q_3. The number of vectors q_i formed will be the dimension of the column space of A, that is, rank(A).
The effect of the orthogonalization process can be represented by an upper triangular matrix R, so that one obtains the matrix factorization

Theorem 3. Any m-by-n matrix A can be factored in the form

$$A = QR \qquad (11)$$

where Q is the m-by-rank(A) matrix with orthonormal columns q_i obtained by Gram-Schmidt orthogonalization, and R is an upper triangular matrix of size rank(A)-by-n (described below). For i < j, entry r_ij of R is a_j · q_i, the length of the projection of a_j onto q_i. The diagonal entries in R are the sizes, before normalization, of the new columns: r_11 = |a_1|, r_22 = |a_2 - sq_1|, r_33 = |a_3 - s_1q_1 - s_2q_2|, and so on.

Example 7. QR Decomposition

Give the QR decomposition for the matrix A in Example 6.

$$A = \begin{bmatrix} 0 & 3 & 2\\ 3 & 5 & 5\\ 4 & 0 & 5\end{bmatrix}$$

The orthonormal matrix Q is given in (10). We form R from the information about the sizes of the new columns and the projections, as described in the preceding paragraph. Here r_12 = s = 3 in (9a), and r_13 = s_1 = 7, r_23 = s_2 = 2 in (9b). Then

$$QR = \begin{bmatrix} 0 & \tfrac35 & \tfrac45\\ \tfrac35 & \tfrac{16}{25} & -\tfrac{12}{25}\\ \tfrac45 & -\tfrac{12}{25} & \tfrac{9}{25}\end{bmatrix}\begin{bmatrix} 5 & 3 & 7\\ 0 & 5 & 2\\ 0 & 0 & 1\end{bmatrix}$$

Let us compute the second column of QR, multiplying Q by r_2^C, the second column of R, and show that the result is a_2, the second column of A.

$$Q\mathbf{r}_2^C = 3\mathbf{q}_1 + 5\mathbf{q}_2 + 0\mathbf{q}_3 = 3\begin{bmatrix} 0\\ \tfrac35\\ \tfrac45\end{bmatrix} + 5\begin{bmatrix} \tfrac35\\ \tfrac{16}{25}\\ -\tfrac{12}{25}\end{bmatrix} + 0\begin{bmatrix} \tfrac45\\ -\tfrac{12}{25}\\ \tfrac{9}{25}\end{bmatrix} = \begin{bmatrix} 3\\ 5\\ 0\end{bmatrix} = \mathbf{a}_2 \qquad (12)$$

Columns of Q are obtained from linear combinations of the columns of A. Reversing this procedure yields the columns of A as linear combinations of the columns of Q. This reversal is what is accomplished by the matrix product QR. Consider the computation in (12). In terms of the columns q_i of Q, (12) is 3q_1 + 5q_2 + 0q_3 = a_2 or, in terms of R,

$$r_{12}\mathbf{q}_1 + r_{22}\mathbf{q}_2 = \mathbf{a}_2 \qquad (13)$$

(a_2 equals its projection onto q_1 plus its projection onto q_2). Next consider the formula for q_2:

$$\mathbf{q}_2 = \frac{\mathbf{a}_2 - s\mathbf{q}_1}{|\mathbf{a}_2 - s\mathbf{q}_1|} = \frac{\mathbf{a}_2 - r_{12}\mathbf{q}_1}{r_{22}} \qquad (14)$$

since r_22 = |a_2 - sq_1| and r_12 = s. Solving for a_2 in (14), we obtain (13): r_12 q_1 + r_22 q_2 = a_2.

The same analysis shows that the jth column in the product QR is just a reversal of the orthogonalization steps for finding q_j. The matrix R is upper triangular because column a_i is only involved in building columns q_i, q_{i+1}, . . . , q_n of Q. The QR decomposition is the column counterpart to the LU decomposition, given in Section 3.2, in which the row combinations of Gaussian elimination are reversed to obtain the matrix A from its row-reduced matrix U.

The QR decomposition is used frequently in numerical procedures. We use it to find eigenvalues in the appendix to Section 5.5. We will sketch one of its most frequent uses, finding the inverse or pseudoinverse of an ill-conditioned matrix. If A is an n-by-n matrix with linearly independent columns, the decomposition A = QR yields

$$A^{-1} = (QR)^{-1} = R^{-1}Q^{-1} = R^{-1}Q^T \qquad (15)$$

The fact that Q^{-1} = Q^T when Q has orthonormal columns was part of Theorem 1. Given the QR decomposition of A, (15) says that to get A^{-1}, we only need to determine R^{-1}. Since R is an upper triangular matrix, its inverse is obtained quickly by back substitution (see Exercise 12 of Section 3.5). When A is very ill-conditioned, one should compute A^{-1} via (15): first determining the QR decomposition of A, using advanced (more stable) variations of the Gram-Schmidt procedure; then determining R^{-1}; and thus obtaining A^{-1} = R^{-1}Q^T.
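Theorem 3 and formula (15) can be checked together in a few lines of Python. The sketch below is a plain classical Gram-Schmidt pass (not the more stable variants the text alludes to); it records the projection lengths as the entries of R, confirms that QR reproduces the matrix of Examples 6 and 7, and then inverts A as R^{-1}Q^T, inverting the triangular R by back substitution. The function names are mine, not the book's, and the code assumes linearly independent columns.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: Q has orthonormal columns, R is upper triangular with
    r_ij = a_j . q_i for i < j and r_jj = size of the new column before normalization."""
    A = A.astype(float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]        # length of the projection of a_j onto q_i
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)            # assumes this is nonzero (independent columns)
        Q[:, j] = v / R[j, j]
    return Q, R

def solve_upper_triangular(R, b):
    """Back substitution for R x = b with R upper triangular."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[0., 3., 2.],
              [3., 5., 5.],
              [4., 0., 5.]])
Q, R = gram_schmidt_qr(A)
print(R)                                       # [[5, 3, 7], [0, 5, 2], [0, 0, 1]], as in Example 7
print(np.allclose(Q @ R, A))                   # the factorization reproduces A

# Formula (15): A^{-1} = R^{-1} Q^T, building R^{-1} one column at a time by back substitution.
R_inv = np.column_stack([solve_upper_triangular(R, e) for e in np.eye(3)])
print(np.allclose(R_inv @ Q.T, np.linalg.inv(A)))   # True
```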
Equation (15) extends to pseudoinverses. That is, if A is an m-by-n matrix with linearly independent columns and m > n, then its pseudoinverse A^+ can be computed as

$$A^+ = R^{-1}Q^T \qquad (16)$$

See the Exercises for instructions on how to verify (16) and examples of its use. This formula for the pseudoinverse is the standard way pseudoinverses are computed in practice. Even if one determines Q and R using the basic Gram-Schmidt procedure given above, the resulting A^+ from (16) will be substantially more accurate than computing A^+ using the standard formula A^+ = (A^TA)^{-1}A^T, because the matrix A^TA tends to be ill-conditioned. For example, in the least-squares polynomial-fitting problem in Example 5 of Section 5.3, the condition number of X^TX for the 37-by-3 matrix X was around 2000!

Principle. Because of conditioning problems, the pseudoinverse A^+ of a matrix A should be computed by the formula A^+ = R^{-1}Q^T, where Q and R are the matrices in the QR decomposition of A.

We now introduce a very different use of orthogonality. Our goal is to make a vector space for the set of all continuous functions. To make matters a little easier, let us focus on functions that can be expressed as a polynomial or an infinite series in powers of x, such as x³ + 3x² - 4x + 1, or e^x, or sin x. Recall that the defining property of a vector space V is that if u and v are in V, then ru + sv is also in V, for any scalars r, s. Clearly, linear combinations of polynomials (or infinite series) are again polynomials (or infinite series), so these functions form a vector space.

For a vector space of functions to be useful, we need a coordinate system, that is, a basis of independent functions u_i(x) (functions that are not linearly dependent on each other) so that any function f(x) can be expressed as a linear combination of these basis functions:

$$f(x) = f_1u_1(x) + f_2u_2(x) + \cdots \qquad (17)$$

This basis will need to be infinite, and the linear combinations of basis functions may also be infinite. The best basis would use orthogonal, or even better, orthonormal functions. To make an orthogonal basis, we first need to extend the definition of a scalar, or inner, product c · d of vectors to an inner product of functions. The inner product of two functions f(x) and g(x) on the interval [a, b] is defined as

$$f(x)\cdot g(x) = \int_a^b f(x)g(x)\,dx \qquad (18)$$

This definition is a natural generalization of the standard inner product c · d in that both c · d and f(x) · g(x) form sums of term-by-term products of the respective entities, but in (18) we have a continuous sum, an integral.

With an inner product defined, most of the theory and formulas developed for vector spaces can be applied to our space of functions. The inner product tells us when two vectors c, d are orthogonal (if c · d = 0), and allows us to compute the coordinates c_i of c in an orthonormal basis u_i: c_i = c · u_i (these coordinates are just the projections of c onto the u_i). We can now do the same calculations for functions with (18). The functional equivalent of the euclidean norm is defined by

$$|f(x)|^2 = f(x)\cdot f(x) = \int_a^b f(x)^2\,dx \qquad (19)$$

The counterpart of the sum norm |c|_s = Σ|c_i| for vectors is |f(x)|_s = ∫ |f(x)| dx.

An orthonormal basis for our functions on the interval [a, b] will be a set of functions {u_i(x)} which are orthogonal, by (18): ∫ u_i(x)u_j(x) dx = 0 for all i ≠ j, and whose norms are 1, by (19): ∫ u_i(x)² dx = 1.
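The functional inner product (18) and norm (19) can be explored numerically. Here is a minimal sketch (it assumes SciPy's `quad` quadrature routine; any numerical integrator would do), checking a few of the inner products that appear later in this section:

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, a, b):
    """Functional inner product (18): the integral of f(x) g(x) over [a, b]."""
    value, _ = quad(lambda x: f(x) * g(x), a, b)
    return value

print(inner(lambda x: x, lambda x: x**2, 0.0, 1.0))   # x . x^2 on [0, 1] = 1/4
print(inner(lambda x: x, lambda x: x, 0.0, 1.0))      # |x|^2 on [0, 1] = 1/3, the square of norm (19)
print(inner(np.sin, np.cos, 0.0, 2*np.pi))            # ~0: sin x and cos x are orthogonal on [0, 2*pi]
```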
Given such an orthonormal basis {u_i(x)}, the coordinates f_i of a function f(x) in terms of the u_i(x) are computed by the projection formula f_i = f(x) · u_i(x) used for n-dimensional orthonormal bases:

$$f(x) = [f(x)\cdot u_1(x)]\,u_1(x) + [f(x)\cdot u_2(x)]\,u_2(x) + \cdots \qquad (20)$$

How do we find such an orthonormal basis? The first obvious choice is the set of powers of x: 1, x, x², x³, . . . . These are linearly independent; that is, x^n cannot be expressed as a linear combination of smaller powers of x. Unfortunately, there is no interval on which 1, x, and x² are mutually orthogonal. On [-1, 1], 1 · x = ∫ x dx = 0 and x · x² = ∫ x³ dx = 0, but 1 · x² = ∫ x² dx = 2/3.

There are many sets of orthogonal functions that have been developed over the years. We shall mention two: Legendre polynomials and Fourier trigonometric functions. The Gram-Schmidt orthogonalization procedure provides a way to build an orthonormal basis out of a basis of linearly independent vectors. The calculations in this procedure use inner products, and hence the procedure can be applied to the powers of x (which are linearly independent but, as we just said, far from orthogonal) to find an orthogonal set of polynomials.

When the interval is [-1, 1], the polynomials obtained by orthogonalization are called Legendre polynomials L_k(x). Actually, we shall not worry about making their norms equal to 1. As noted above, the functions x⁰ = 1 and x are orthogonal on [-1, 1]. So L_0(x) = 1 and L_1(x) = x. Also, x² is orthogonal to x but not to 1 on [-1, 1]. We must subtract off the projection of x² onto 1:

$$L_2(x) = x^2 - \frac{1\cdot x^2}{1\cdot 1}\,1 = x^2 - \frac{\int_{-1}^{1}x^2\,dx}{\int_{-1}^{1}1\,dx} = x^2 - \frac{2/3}{2} = x^2 - \frac13 \qquad (21)$$

A similar orthogonalization computation shows that L_3(x) = x³ - (3/5)x.

Example 8. Approximating e^x by Legendre Polynomials

Let us use the first four Legendre polynomials L_0(x) = 1, L_1(x) = x, L_2(x) = x² - 1/3, L_3(x) = x³ - 3x/5 to approximate e^x on the interval [-1, 1]. We want the first four terms in (20):

$$e^x \approx w_0L_0 + w_1L_1(x) + w_2L_2(x) + w_3L_3(x) = w_0 + w_1x + w_2\!\left(x^2 - \tfrac13\right) + w_3\!\left(x^3 - \tfrac{3x}{5}\right) \qquad (22)$$

where w_i = e^x · L_i(x) / (L_i(x) · L_i(x)) = ∫ e^x L_i(x) dx / ∫ L_i(x)² dx. For example,

$$w_2 = \frac{\int_{-1}^{1} e^x\left(x^2 - \tfrac13\right)dx}{\int_{-1}^{1}\left(x^2 - \tfrac13\right)^2 dx}$$

With a little calculus, we compute the w_i to be (approximately)

w_0 = 2.35/2 = 1.18,  w_1 = .736/.667 = 1.10,  w_2 = .096/.178 = .53,  w_3 = .008/.046 = .18

Then (22) becomes

$$e^x \approx 1.18 + 1.10x + .53\!\left(x^2 - \tfrac13\right) + .18\!\left(x^3 - \tfrac{3x}{5}\right) \qquad (23)$$

If we collect like powers of x together on the right side, (23) simplifies to

$$e^x \approx 1 + x + .53x^2 + .18x^3 \qquad (24)$$

Comparing this approximation against the true values of e^x at the points -1, -.5, 0, .5, 1 shows a pretty good fit. In particular, it is a better fit on [-1, 1] than simply using the first terms of the power series for e^x, namely 1 + x + x²/2 + x³/6. The approximation gets more accurate as more Legendre polynomials are used.
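The weights in Example 8 come straight from the projection formula w_i = (e^x · L_i)/(L_i · L_i), so numerical integration reproduces them and the fit in (24). A sketch (assuming SciPy's `quad` for the integrals; the rounded values differ from the text's in the last digit because the text rounds intermediate numbers):

```python
import numpy as np
from scipy.integrate import quad

L = [lambda x: 1.0,
     lambda x: x,
     lambda x: x**2 - 1.0/3.0,
     lambda x: x**3 - 3.0*x/5.0]           # the first four Legendre polynomials

def inner(f, g):
    return quad(lambda x: f(x) * g(x), -1.0, 1.0)[0]

w = [inner(np.exp, Li) / inner(Li, Li) for Li in L]
print(np.round(w, 2))                       # roughly [1.18, 1.10, 0.54, 0.18]

approx = lambda x: sum(wi * Li(x) for wi, Li in zip(w, L))
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, round(approx(x), 3), round(np.exp(x), 3))   # a pretty good fit on [-1, 1]
```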
Over the interval [0, 2π] the trigonometric functions (1/√π) sin kx and (1/√π) cos kx, for k = 1, 2, . . . , plus the constant function 1/√(2π), are an orthonormal basis. To verify that they are orthogonal requires showing that

$$\frac{1}{\sqrt{\pi}}\sin jx \cdot \frac{1}{\sqrt{\pi}}\cos kx = \frac{1}{\pi}\int_0^{2\pi}\sin jx\,\cos kx\,dx = 0 \quad\text{for all } j,\,k$$

$$\frac{1}{\sqrt{\pi}}\sin jx \cdot \frac{1}{\sqrt{\pi}}\sin kx = \frac{1}{\pi}\int_0^{2\pi}\sin jx\,\sin kx\,dx = 0 \quad\text{for all } j \neq k$$

$$\frac{1}{\sqrt{\pi}}\cos jx \cdot \frac{1}{\sqrt{\pi}}\cos kx = \frac{1}{\pi}\int_0^{2\pi}\cos jx\,\cos kx\,dx = 0 \quad\text{for all } j \neq k$$

plus showing that these trigonometric functions are orthogonal to a constant function. To verify that these trigonometric functions have unit length requires showing

$$\frac{1}{\sqrt{\pi}}\sin kx \cdot \frac{1}{\sqrt{\pi}}\sin kx = \frac{1}{\pi}\int_0^{2\pi}\sin^2 kx\,dx = 1, \qquad \frac{1}{\sqrt{\pi}}\cos kx \cdot \frac{1}{\sqrt{\pi}}\cos kx = \frac{1}{\pi}\int_0^{2\pi}\cos^2 kx\,dx = 1 \quad\text{for all } k$$

When u_{2k-1}(x) = (1/√π) sin kx and u_{2k}(x) = (1/√π) cos kx, k = 1, 2, . . . , and u_0(x) = 1/√(2π) in (20), this representation of f(x) is called a Fourier series, and the coefficients f(x) · u_i(x) in (20) are called Fourier coefficients. Using Fourier series, we see that any piecewise continuous function can be expressed as a linear combination of sine and cosine waves. One important physical interpretation of this fact is that any complex electrical signal can be expressed as a sum of simple sinusoidal signals.

Example 9. Fourier Series Representation of a Jump Function

Let us determine the Fourier series representation of the discontinuous function f(x) = 1 for 0 < x ≤ π and f(x) = 0 for π < x ≤ 2π. The Fourier coefficients f(x) · u_i(x) in (20) are

$$f(x)\cdot u_{2k-1}(x) = f(x)\cdot\frac{1}{\sqrt{\pi}}\sin kx = \frac{1}{\sqrt{\pi}}\int_0^{\pi}\sin kx\,dx = \frac{1}{k\sqrt{\pi}}\bigl[-\cos kx\bigr]_0^{\pi} = \begin{cases}\dfrac{2}{k\sqrt{\pi}} & k\text{ odd}\\[4pt] 0 & k\text{ even}\end{cases} \qquad (25)$$

$$f(x)\cdot u_{2k}(x) = f(x)\cdot\frac{1}{\sqrt{\pi}}\cos kx = \frac{1}{\sqrt{\pi}}\int_0^{\pi}\cos kx\,dx = \frac{1}{k\sqrt{\pi}}\bigl[\sin kx\bigr]_0^{\pi} = 0$$

Further, we calculate f(x) · (1/√(2π)) = π/√(2π) = √(π/2), so the constant term of the Fourier series for this f(x) is (f(x) · u_0(x))u_0(x) = √(π/2) · (1/√(2π)) = 1/2. By (25), only the odd sine terms occur. Letting an odd k be written as 2n - 1, we obtain the Fourier series

$$f(x) = \frac12 + \sum_{n=1}^{\infty}\frac{2}{(2n-1)\pi}\sin\bigl[(2n-1)x\bigr] \qquad (26)$$

Figure 5.9 shows the approximation to f(x) obtained when the first three sine terms in (26) are used (dashed line) and when the first eight sine terms are used. The fit is impressive.

Figure 5.9 Dashed lines use the first three trigonometric terms in the Fourier series for f(x). Solid lines use the first eight terms.
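The partial sums plotted in Figure 5.9 can be generated directly from (26). The sketch below evaluates the series with a chosen number of sine terms (three and eight, as in the figure); plotting is omitted to keep it short, and the helper name is my own.

```python
import numpy as np

def fourier_partial_sum(x, n_terms):
    """Partial sum of (26): 1/2 + sum of 2/((2n-1)pi) * sin((2n-1)x)."""
    total = 0.5 * np.ones_like(x)
    for n in range(1, n_terms + 1):
        k = 2 * n - 1                                # only odd frequencies appear
        total += (2.0 / (k * np.pi)) * np.sin(k * x)
    return total

x = np.linspace(0, 2 * np.pi, 9)
f = np.where((x > 0) & (x <= np.pi), 1.0, 0.0)       # the jump function of Example 9
print(np.round(fourier_partial_sum(x, 3), 2))        # three sine terms (dashed curve in Fig. 5.9)
print(np.round(fourier_partial_sum(x, 8), 2))        # eight sine terms: noticeably closer to f
print(f)
```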
Representing a function in terms of an orthonormal set of functions as in (20) has a virtually unlimited number of applications in the physical sciences and elsewhere. If one can solve a physical problem for the orthonormal basis functions, then one can typically obtain a solution for any function as a linear combination of the solutions for the basis functions. This is true for most differential equations associated with electrical circuits, vibrating bodies, and so on. Statisticians use Fourier series to analyze time-series patterns (see Example 3 of Section 1.5). The study of Fourier series is one of the major fields of mathematics.

We complete our discussion of vector spaces of functions by showing how badly conditioned the powers of x are as a basis for representing functions. Remember that the powers of x, x^i, i = 0, 1, . . . , are linearly independent. The problem is that they are far from orthogonal. Let us consider how we might approximate an arbitrary function f(x) as a linear combination of, say, the powers of x up to x⁵:

$$f(x) \approx w_0 + w_1x + w_2x^2 + w_3x^3 + w_4x^4 + w_5x^5 \qquad (27)$$

using the continuous version of least-squares theory. If f(x) and the powers of x were vectors, not functions, then (27) would have the familiar matrix form f = Aw, and the approximate solution w would be given by w = A^+f, where A^+ = (A^TA)^{-1}A^T. Let us generalize f = Aw to functions by letting the columns of a matrix be functions.

We define the functional "matrix" A(x):

A(x) = [1, x, x², x³, x⁴, x⁵]

Now (27) becomes

$$f(x) \approx A(x)\mathbf{w} \qquad (28)$$

To find the approximate solution to (28), we need to compute the functional version of the pseudoinverse A(x)^+,

$$A(x)^+ = (A(x)^TA(x))^{-1}A(x)^T$$

and then find the vector w of coefficients in (27):

$$\mathbf{w} = A(x)^+f(x) = (A(x)^TA(x))^{-1}(A(x)^Tf(x)) \qquad (29)$$

The matrix A(x)^T has x^i as its ith "row," so the matrix product A(x)^TA(x) involves computing the inner product of each "row" of A(x)^T with each "column" of A(x): entry (i, j) in A(x)^TA(x) is x^i · x^j (= ∫ x^i x^j dx). Similarly, the matrix-"vector" product A(x)^Tf(x) is the vector of inner products x^i · f(x). The computations are simplest if we use the interval [0, 1]. Then entry (i, j) of A(x)^TA(x) is

$$x^i\cdot x^j = \int_0^1 x^{i+j}\,dx = \left[\frac{x^{i+j+1}}{i+j+1}\right]_0^1 = \frac{1}{i+j+1} \qquad (30)$$

For example, entry (1, 2) is ∫ x·x² dx = ∫ x³ dx = 1/4. Note that we consider the constant function 1 (= x⁰) to be the zeroth row of A(x)^T. Computing all the inner products for A(x)^TA(x) yields

$$A(x)^TA(x) = \begin{bmatrix} 1 & \frac12 & \frac13 & \frac14 & \frac15 & \frac16\\[2pt] \frac12 & \frac13 & \frac14 & \frac15 & \frac16 & \frac17\\[2pt] \frac13 & \frac14 & \frac15 & \frac16 & \frac17 & \frac18\\[2pt] \frac14 & \frac15 & \frac16 & \frac17 & \frac18 & \frac19\\[2pt] \frac15 & \frac16 & \frac17 & \frac18 & \frac19 & \frac1{10}\\[2pt] \frac16 & \frac17 & \frac18 & \frac19 & \frac1{10} & \frac1{11}\end{bmatrix} \qquad (31)$$

This matrix is very ill-conditioned, since the columns are all similar to each other. When the fractions in (31) are expressed to six decimal places, such as 1/3 = .333333, the inverse given by the author's microcomputer was (with entries rounded to integer values)

Fractions expressed to six decimal places:

(A(x)^TA(x))^{-1} =
[    17      -116       -47      1,180    -1,986       958 ]
[  -116       342     7,584    -34,881    49,482   -22,548 ]
[   -47     7,584   -76,499    242,494  -301,846   129,004 ]
[ 1,180   -34,881   242,494    644,439   723,636  -289,134 ]
[-1,986    49,482  -301,846    723,636  -747,725   278,975 ]
[   958   -22,548   129,004   -289,134   278,975   -97,180 ]        (32)

The (absolute) sum of the fifth column in (32) is about 2,000,000. The first column in (31) sums to about 2.5. So the condition number of A(x)^TA(x), in the sum norm, is about 2,000,000 × 2.5 = 5,000,000. Now that is an ill-conditioned matrix! We rounded fractions to six significant digits, but our condition number tells us that without a seventh significant digit, our numbers in (32) could be off by 500% [a relative error of .000001 in A(x)^TA(x) could yield answers off by a factor of 5 in pseudoinverse calculations]. Thus the numbers in (32) are worthless.

Suppose that we enter the matrix in (31) again, now expressing fractions to seven decimal places. The new inverse computation yields

Fractions expressed to seven decimal places:

(A(x)^TA(x))^{-1} =
[    51     -1,051      6,160     -1,475     15,419     -5,845 ]
[-1,051     26,385   -165,765    410,749   -438,029    168,208 ]
[ 6,160   -165,765  1,079,198 -2,731,939  2,955,103 -1,146,281 ]
[-1,475    410,749 -2,731,939  7,017,359 -7,671,190  2,999,546 ]
[15,419   -438,029  2,955,103 -7,671,190  8,454,598 -3,327,362 ]
[-5,845    168,208 -1,146,281  2,999,546 -3,327,362  1,316,523 ]        (33)

We have a totally different matrix. Most of the entries in (33) are about 10 times larger than the corresponding entries in (32). The sum of the fifth column in (33) is about 23,000,000. If we use (33), the condition number of A(x)^TA(x) is around 56,000,000. Our entries in (33) were rounded to seven significant digits, but the condition number says that eight significant digits were needed. Again our numbers are worthless. To compute the inverse accurately would require double-precision computation. It is only fair to note that the ill-conditioned matrix (31) is famously bad. It is called a 6-by-6 Hilbert matrix [a Hilbert matrix has 1/(i + j + 1) in entry (i, j)].

Suppose that we used the numbers in (32) for (A(x)^TA(x))^{-1} in computing the pseudoinverse. Let us proceed to calculate A(x)^+ and then compute the coefficients in an approximation of a function by a fifth-degree polynomial. Let us choose f(x) = e^x. Then A(x)^Te^x is the vector of inner products x^i · e^x = ∫ x^i e^x dx, i = 0, 1, . . . , 5. Some calculus yields

A(x)^Te^x = [2.718, 1, .718, .563, .465, .396]

(expressed to three significant digits). Now inserting our values for (A(x)^TA(x))^{-1} and A(x)^Te^x into (29), we obtain

w = (A(x)^TA(x))^{-1}(A(x)^Te^x) = [17, -87, -219, 1,611, 2,449, 1,135]        (34)

Thus our fifth-degree polynomial approximation of e^x on the interval [0, 1] is

$$e^x \approx 17 - 87x - 219x^2 + 1{,}611x^3 + 2{,}449x^4 + 1{,}135x^5 \qquad (35)$$

Setting x = 1 in (35), we have e¹ ≈ 17 - 87 - 219 + 1,611 + 2,449 + 1,135 = 4,906, which is pretty bad. Since our computed values in (A(x)^TA(x))^{-1} are meaningless, such a bad approximation of e^x was to be expected. Compare (35) with the Legendre polynomial approximation in Example 8.

Section 5.4 Exercises

Summary of Exercises
Exercises 1-11 involve inverses, pseudoinverses, and projections for matrices with orthogonal columns. Exercises 12-21 involve Gram-Schmidt orthogonalization and the QR decomposition. Exercises 22-30 present problems about functional inner products and functional approximation.

1. Compute the inverses of these matrices with orthogonal columns. Solve Ax = 1, where A is the matrix in part (b).

$$\text{(a) }\begin{bmatrix} .6 & .8\\ -.8 & .6\end{bmatrix} \qquad \text{(b) }\begin{bmatrix} 2 & 2 & 1\\ -2 & 1 & 2\\ 1 & -2 & 2\end{bmatrix}$$

2. Compute the inverses of these matrices with orthogonal columns. Solve Ax = 1, where A is the matrix in part (a).

$$\text{(a) }\begin{bmatrix} -1 & 4 & -1\\ 2 & 1 & -2\\ 1 & 2 & 3\end{bmatrix} \qquad \text{(b) }\begin{bmatrix} 2 & -3 & 6\\ -6 & 2 & 3\\ 3 & 6 & 2\end{bmatrix} \qquad \text{(c) }\begin{bmatrix} .5 & -.5 & 1\\ -.5 & .5 & 1\\ 1 & .5 & 0\end{bmatrix}$$

3. Show that if A is an n-by-n upper triangular matrix with orthonormal columns, A is the identity matrix I.

4. Compute the length k of the projection of b onto a and give the projection vector ka.
(a) a = [0, 1, 0], b = [3, 2, 4]
(b) a = [1, -1, 2], b = [2, 3, 1]
(c) a = [· · ·], b = [4, 1, 3]
(d) a = [2, -1, 3], b = [-2, 5, 3]

5. Express the vector [2, 1, 2] as a linear combination of the following orthogonal bases for three-dimensional space.
(a) [1, -1, 2], [2, 2, 0], [-1, 1, 1]
(b) [· · ·], [· · ·], [· · ·]
(c) [3, 1.5, 1], [1, -3, 1.5], [-1.5, 1, 3]

6. Compute the pseudoinverse of A = [· · ·]. Find the least-squares solution to Ax = 1.

7. Compute the pseudoinverse of

$$A = \begin{bmatrix} 3 & 4\\ 1 & -2\\ 2 & -5\end{bmatrix}$$

Find the least-squares solution to Ax = 1.

8. Consider the regression model z = qx + ry + s for the following data, where the x-value is a scaled score (with average value 0) of high school grades, the y-value is a scaled score of SAT scores, and the z-value is a score of college grades. Determine q, r, and s. Note that the x, y, and 1 vectors are mutually orthogonal.

9. Verify that Theorem 2 is true in two dimensions, namely, that a change from the standard {e_1, e_2} basis to some other orthonormal basis {q_1, q_2} corresponds to a rotation (around the origin) and possibly a reflection.
Note that since q_1, q_2 have unit length, they are completely determined by knowing the (counterclockwise) angles θ_1, θ_2 they make with the positive e_1 axis; also, since q_1, q_2 are orthogonal, |θ_1 - θ_2| = 90°.

10. (a) Show that an orthonormal change of basis preserves lengths (in the euclidean norm). Hint: Verify that (Qv) · (Qv) = v · v (where Q has orthonormal columns) by using the identity (Ab) · (Cd) = b^T(A^TC)d.
(b) Show that an orthonormal change of basis preserves angles. Hint: Show that the cosine formula for the angle is unchanged, by the method in part (a).

11. Compute the angle between the following pairs of nonorthogonal vectors. Which are close to orthogonal?
(a) [3, 2], [-3, 4]
(b) [1, 2, 5], [2, 5, 3]
(c) [1, -3, 2], [-2, 4, · · ·]

12. Find the QR decomposition of the following matrices.

$$\text{(a) }\begin{bmatrix} 3 & -1\\ 4 & 1\end{bmatrix} \qquad \text{(b) }\begin{bmatrix} 2 & 1\\ 1 & 1\\ -2 & 3\end{bmatrix} \qquad \text{(c) }\begin{bmatrix} 1 & -1 & 2\\ 2 & -1 & 1\\ 2 & -2 & 2\end{bmatrix} \qquad \text{(d) }\begin{bmatrix} 2 & 3 & 1\\ 1 & 1 & 1\\ 2 & 1 & 2\end{bmatrix}$$

13. Use Gram-Schmidt orthogonalization to find an orthonormal basis that generates the same vector space as the following bases:
(a) [1, 1], [2, -1]
(b) [2, 1, 2], [4, 1, 1], · · ·
(c) [3, 1, 1], [1, 2, 1], [1, 1, 2]

14. (a) Compute the inverse of the matrix in Exercise 12, part (c), by first finding the QR decomposition of the matrix and then using (15) to get the inverse. (See Exercise 12 of Section 3.5 for instructions on computing R^{-1}.) What is its condition number?
(b) Check your answer by computing the inverse by the regular elimination-by-pivoting method.

15. (a) Find the pseudoinverse A^+ of the matrix A in Exercise 12, part (b), by using the QR decomposition of A and computing A^+ as A^+ = R^{-1}Q^T.
(b) Check your answer by finding the pseudoinverse from the formula A^+ = (A^TA)^{-1}A^T. Note that this is a very poorly conditioned matrix; compute the condition number of A^TA.

16. Use (16) to find the pseudoinverse in solving the refinery problem in Example 3 of Section 5.3.

17. Use (16) to find the pseudoinverse in the following regression problems using the model y = qx + r.
(a) (x, y) points: (0, 1), (2, 1), (4, 4)
(b) (x, y) points: (3, 2), (4, 5), (5, 5), (6, 5)
(c) (x, y) points: (-2, 1), (0, 1), (2, 4)

18. Use (16) to find the pseudoinverse in the least-squares polynomial-fitting problem in Example 5 of Section 5.3.

19. Verify (16): A^+ = R^{-1}Q^T, by substituting QR for A (and R^TQ^T for A^T) in the pseudoinverse formula A^+ = (A^TA)^{-1}A^T and simplifying (remember that R is invertible; we assume that the columns of A are linearly independent).

20. Show that if the columns of the m-by-n matrix A are linearly independent, the n-by-n matrix R of the QR decomposition must be invertible. Hint: Show that the main diagonal entries of R are nonzero, and then see Exercise 12 of Section 3.5 for instructions on computing the inverse of R.

21. Show that any set H of k orthonormal n-vectors can be extended to an orthonormal basis for n-dimensional space. Hint: Form an n-by-(k + n) matrix whose first k columns come from H and whose remaining n columns form the identity matrix; now apply Gram-Schmidt orthogonalization to this matrix.

22. Over the interval [0, 1], compute the following inner products: x · x, x · x³, x³ · x³.

23. Verify that the fourth Legendre polynomial is x³ - (3/5)x.

24. Verify the values found for the weights w_0, w_1, w_2, and w_3 in Example 8. Note: You must use integration by parts or a table of integrals.
25. Approximate the following functions f(x) as a linear combination of the first four Legendre polynomials over the interval [-1, 1]: L_0(x) = 1, L_1(x) = x, L_2(x) = x² - 1/3, L_3(x) = x³ - 3x/5.
(a) f(x) = x⁴
(b) f(x) = |x|
(c) f(x) = -1 for x < 0, = 1 for x ≥ 0

26. Approximate x³ + 2x - 1 as a linear combination of the first four Legendre polynomials over the interval [-1, 1]: L_0(x) = 1, L_1(x) = x, L_2(x) = x² - 1/3, L_3(x) = x³ - 3x/5. Your "approximation" should equal x³ + 2x - 1, since this polynomial is a linear combination of the functions 1, x, x², and x³, from which the Legendre polynomials were derived by orthogonalization.

27. (a) Find the Legendre polynomial of degree 4.
(b) Find the Legendre polynomial of degree 5.

28. (a) Using the interval [0, 1] instead of [-1, 1], find three orthogonal polynomials of the form K_0(x) = a, K_1(x) = bx + c, and K_2(x) = dx² + ex + f.
(b) Find a least-squares approximation of x" on the interval [0, 1] using your three polynomials in part (a).