Sec. 5.4 Orthogonal Systems
[1] 1 1 0
(a) 2 (b) [2 (c) 2 —1
3 1 1
2 0 1 0 2 —1
(d) 1 1 0 (e) 1 —1 3
V 0 —2 1 2 1 0
~1 1 2 -1 4 1

For each matrix A in Exercise 26, find a basis $\{v_i\}$ for Range(A) [= Col(A)] and a basis $\{w_i\}$ for Null($A^T$). Then verify that the $v_i$ are orthogonal to the $w_i$, as required by Theorem 7, part (ii).

For each of the following matrices A, express the 1's vector 1 as a unique sum, $1 = x_1 + x_2$, of a vector $x_1$ in Row(A) and a vector $x_2$ in
Null(A). 1—2b21 120 110
(a) —2 4 () 12 (0012 (d)212
1 0 1

Find a solution to Ax = 1, for A the matrix in Exercise 28, part (c), in which x is in Row(A).

Use Theorem 8 to prove that if $v_1, v_2, \ldots, v_k$ are a linearly independent set of vectors in the row space of a matrix A, then the $w_i = Av_i$ are a linearly independent set of vectors in the range of A. Thus, if $\{v_i\}$ are a basis for Row(A), then $\{Av_i\}$ are a basis for Col(A).

Orthogonal Systems

In Section 5.3 we saw that the calculation of the pseudoinverse of a matrix
A simpliﬁed greatly if the columns of A were orthogonal. In this section we
examine sets of orthogonal vectors further. If a set of vectors, such as the
columns of a matrix, are not orthogonal, we give a procedure to transform
them into an equivalent set of orthogonal vectors. Finally, we generalize the
idea of an orthogonal set of vectors to build vector spaces for continuous
functions generated by an orthogonal set of functions. The idea of projecting
one vector onto another vector is used over and over again in this section.
Such projections provide simple solutions to systems of equations Ax = b
for which the columns of A are orthogonal.

The underlying computational property that makes it easy to work with orthogonal columns is that if a and b are orthogonal, their scalar product is $a \cdot b = 0$. Scalar products are the building blocks for much of matrix algebra (e.g., each entry in the product of two matrices is a scalar product). Thus computations with orthogonal vectors create a lot of 0's and hence yield simple results.

Ch. 5 Theory of Systems of Linear Equations and Eigenvalue/Eigenvector Problems

The inverse $A^{-1}$ of a matrix A with orthogonal columns $a_i^C$ is easy to describe. It is essentially the same as the pseudoinverse: $A^{-1}$ is formed by dividing each column $a_i^C$ by $a_i^C \cdot a_i^C$, the sum of the squares of its entries, and forming the transpose of the resulting matrix. Thus, if $s_i = a_i^C \cdot a_i^C$, then

$$A^{-1} = \begin{bmatrix} \dfrac{a_1^C}{s_1} & \dfrac{a_2^C}{s_2} & \cdots & \dfrac{a_n^C}{s_n} \end{bmatrix}^T \qquad (1)$$

We verify (1) by noting that entry (i, j) in $A^{-1}A$ will be 0 if $i \ne j$ because $a_i^C \cdot a_j^C = 0$ (the columns are orthogonal). Entry (i, i) equals $(a_i^C/s_i) \cdot a_i^C = a_i^C \cdot a_i^C/(a_i^C \cdot a_i^C) = 1$.

Example 1. Inverse of Matrix with
Orthogonal Columns

(i) Consider the matrix

$$A = \begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}$$

whose columns are orthogonal. The sum of the squares of the entries in each column of A is $3^2 + 4^2 = 25$. If we divide each column by 25 and take the transpose, we obtain

$$A^{-1} = \begin{bmatrix} \tfrac{3}{25} & \tfrac{4}{25} \\ -\tfrac{4}{25} & \tfrac{3}{25} \end{bmatrix}$$

The reader should check that this matrix is exactly what one would get by computing this 2-by-2 inverse using elimination.

(ii) Consider the orthogonal-column matrix

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{bmatrix}$$

Its inverse, by (1), is

$$A^{-1} = \begin{bmatrix} \tfrac13 & \tfrac16 & \tfrac16 \\ \tfrac13 & -\tfrac13 & -\tfrac13 \\ 0 & \tfrac12 & -\tfrac12 \end{bmatrix}$$

Again the reader should check that $A^{-1}A = I$.

Let us use (1) to obtain a formula for the ith component $x_i$ in the solution x to Ax = b. Given the inverse $A^{-1}$, we can find x as $x = A^{-1}b$. The ith component in $A^{-1}b$ is the scalar product of the ith row of $A^{-1}$ with b. By (1), the ith row of $A^{-1}$ is $a_i^C/(a_i^C \cdot a_i^C)$ and thus

$$x_i = \frac{a_i^C \cdot b}{a_i^C \cdot a_i^C} \qquad (2)$$

This is our old friend, the length of the projection of b onto column $a_i^C$ (see Theorem 2 of Section 5.3).

A set of orthogonal vectors of unit length (whose norm is 1) are called
orthonormal. The preceding formulas for $x_i$ and $A^{-1}$ become even nicer if the columns of A are orthonormal. In this case, $a_i^C \cdot a_i^C = 1$. Then the denominator in (2) is 1, so now the projection formula is $x_i = a_i^C \cdot b$. To obtain $A^{-1}$, we divide each column of A by 1 and form the transpose: that is, $A^{-1} = A^T$. Summarizing this discussion, we have

Theorem 1
(i) If A is an n-by-n matrix whose columns are orthogonal, then $A^{-1}$ is obtained by dividing the ith column of A by the sum of the squares of its entries and transposing the resulting matrix [see (1)]. The ith component $x_i$ in the solution of Ax = b is the length of the projection of b on $a_i^C$: $x_i = (a_i^C \cdot b)/(a_i^C \cdot a_i^C)$.
(ii) If the columns of A are orthonormal, then the inverse $A^{-1}$ is $A^T$ and the length of the projection is just $x_i = a_i^C \cdot b$.

Suppose that we have a basis of n orthogonal vectors $q_i$ for n-space.
If Q has the $q_i$ as its columns, the solution $x = b^*$ of Qx = b will be a vector $b^*$ of lengths of the projections of b onto each $q_i$:

$$Qb^* = b_1^*q_1 + b_2^*q_2 + \cdots + b_n^*q_n = b \qquad (3)$$

Here the term $b_i^*q_i$ is just the projection of b onto $q_i$. So (3) simply says

Corollary. Any n-vector b can be expressed as the sum of the projections of b onto a set of n orthogonal vectors $q_i$.

Example 2. Conversion of Coordinates from One Basis to Another

Consider the orthonormal basis $q_1 = [.8, .6]$, $q_2 = [-.6, .8]$ for 2-space. To express the vector b = [1, 2] in terms of $q_1$, $q_2$ coordinates, we need to solve the system

$$\begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix} \begin{bmatrix} b_1^* \\ b_2^* \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{or} \quad b_1^* \begin{bmatrix} .8 \\ .6 \end{bmatrix} + b_2^* \begin{bmatrix} -.6 \\ .8 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

or $Qb^* = b$, where $Q = [q_1 \ q_2]$. By Theorem 1,

$$b_1^* = q_1 \cdot b = .8 \times 1 + .6 \times 2 = 2, \qquad b_2^* = q_2 \cdot b = -.6 \times 1 + .8 \times 2 = 1$$

where $b_1^*q_1 = 2[.8, .6]$ is the projection of b on $q_1$, and $b_2^*q_2 = [-.6, .8]$ is the projection of b on $q_2$. Thus b = [1, 2] is expressed as an $e_1$-$e_2$ coordinate vector, while $b^* = [2, 1]$ is the same vector expressed in $q_1$-$q_2$ coordinates. A geometric picture of this conversion is given in Figure 5.6, where the vector [2, 1] is depicted as the sum of its projection onto $q_1$ and onto $q_2$.

[Figure 5.6: b = [1, 2] with its projections $b_1^* = 2$ and $b_2^* = 1$ onto the [.8, .6] axis and the [-.6, .8] axis.]

Theorem 1 is a carbon copy of Theorem 5 of Section 5.3 about pseudoinverses when columns are orthogonal. As with the inverse, if A's columns are orthonormal, the pseudoinverse $A^+$ of A will simply be $A^T$. The following example gives a familiar illustration of this result and shows why orthogonal columns make inverses and pseudoinverses so similar.

Example 3. Pseudoinverse of Matrix with Orthonormal Columns

Let $I_2$ be the first two columns of the 3-by-3 identity matrix.

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$

Then

$$I_2^+ = I_2^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

For any vector $b = [b_1, b_2, b_3]$, the least-squares solution $x = b^*$ to $I_2x = b$ is

$$b^* = I_2^+b = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

This result confirms our intuitive notion that $[b_1, b_2, 0]$ is the closest point in the x-y plane to the point $[b_1, b_2, b_3]$.

Optional
There is another interesting geometric fact about orthonormal columns (see the Exercises for the two-dimensional case).

Theorem 2. When Q has orthonormal columns, then solving Qx = b for $b^* = Q^Tb$ is equivalent to performing the orthonormal change of basis $b \to b^* = Q^Tb$. Such a basis change is simply a rotation of the coordinate axes, a reflection through a plane, or a combination of both.

The entries in Q can be expressed in terms of the sines and cosines of the angles of this rotation. For example, the rotation of axes in the plane by $\theta°$ is a linear transformation R of 2-space:

$$R: \quad \begin{aligned} x' &= x \cos \theta° + y \sin \theta° \\ y' &= -x \sin \theta° + y \cos \theta° \end{aligned} \quad \text{or} \quad u' = Au$$

where

$$A = \begin{bmatrix} \cos \theta° & \sin \theta° \\ -\sin \theta° & \cos \theta° \end{bmatrix}$$

It is easy to check that A has orthonormal columns.
It follows that the distance between a pair of vectors and the angle that they form do not change with an orthonormal change of basis. (Note: End of optional material.)
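The claim that a rotation of axes leaves distances and angles unchanged can be checked numerically. The following is a minimal Python sketch and not part of the original text: the helper names (`rotation_matrix`, `dot`, `mat_vec`), the 30-degree angle, and the two test vectors are illustrative assumptions.

```python
import math

def rotation_matrix(theta_deg):
    """Matrix A of the axis rotation u' = A u by theta_deg degrees."""
    t = math.radians(theta_deg)
    return [[math.cos(t), math.sin(t)],
            [-math.sin(t), math.cos(t)]]

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    """Matrix-vector product: each entry is a row of m dotted with v."""
    return [dot(row, v) for row in m]

A = rotation_matrix(30.0)            # arbitrary illustrative angle
col1 = [A[0][0], A[1][0]]
col2 = [A[0][1], A[1][1]]

# Orthonormal columns: unit length and mutually orthogonal.
print(dot(col1, col1), dot(col2, col2), dot(col1, col2))

u, v = [1.0, 2.0], [3.0, -1.0]       # arbitrary test vectors
u_new, v_new = mat_vec(A, u), mat_vec(A, v)

# Distance and scalar product (hence angle) agree up to rounding.
print(math.dist(u, v), math.dist(u_new, v_new))
print(dot(u, v), dot(u_new, v_new))
```

Because lengths and scalar products are both preserved, the cosine of the angle between the two vectors is preserved as well.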
Orthogonal columns have another important advantage besides easy formulas. A highly nonorthogonal set of columns, that is, columns that are almost parallel, can result in unstable computations.

Example 4. Nonorthogonal Columns

Consider the following system of equations:

$$\begin{aligned} 1x_1 + .75x_2 &= 5 \\ 1x_1 + 1x_2 &= 7 \end{aligned} \qquad (4)$$

Let us call the two column vectors in the coefficient matrix of (4) u = [1, 1] and v = [.75, 1]. The cosine of their angle is, by Theorem 6 of Section 5.3,

$$\cos \theta(u, v) = \frac{u \cdot v}{|u|\,|v|} = \frac{1.75}{\sqrt{2}\,(1.25)} = .99 \qquad (5)$$

The angle with cosine of .99 is 8°. Thus u and v are almost parallel (almost the same vector). Representing any 2-vector b as a linear combination of two vectors that are almost the same is tricky, that is, unstable. For example, to solve (4) we must find weights $x_1$, $x_2$ such that

$$x_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} .75 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}$$

The system (4) is the canoe-with-sail system from Section 1.1. We already know that calculations with A, the coefficient matrix in (4), are very unstable. In Section 3.5 we computed the condition number of A to be c(A) = 16. Recall that the condition number $c(A) = \|A\| \cdot \|A^{-1}\|$ measures how much a relative error in the entries of A (or in b) could affect the relative error in $x = [x_1, x_2]$; in this case, a 5% error in b could cause an error 16 [= c(A)] times greater in x, a 16 × 5% = 80% error.

We solved (4) in Section 1.1 and obtained $x_1 = -1$, $x_2 = 8$. If we had solved for b' = [7, 5], we would have obtained the answer $x_1 = 13$, $x_2 = -8$ (see Figure 5.7 for a picture of this result). Or for b'' = [6, 6], $x_1 = 6$, $x_2 = 0$.

[Figure 5.7: b' = [7, 5] written as 13[1, 1] - 8[.75, 1].]

Reading the results of Example 4 in reverse, we see that when errors
arise in solving an ill—conditioned system of equations Ax = b (in which A
has a large condition number), the problem should be that some column
vector (or a linear combination of them) forms a small angle with another
column vector—this means that the columns are almost linearly dependent.
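The numbers quoted for the system (4) are small enough to recompute directly. The sketch below is an illustration rather than the book's own procedure: the function names are hypothetical, the inverse uses the 2-by-2 adjugate formula, and the matrix norm is taken to be the sum norm (largest absolute column sum).

```python
import math

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def cos_angle(u, v):
    """Cosine of the angle between u and v: u.v / (|u||v|)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def inverse_2x2(m):
    """Inverse of a 2-by-2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sum_norm(m):
    """Matrix sum norm: largest absolute column sum."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))

def solve(m, b):
    """Solve m x = b via the explicit inverse (adequate for a 2-by-2 demo)."""
    return [dot(row, b) for row in inverse_2x2(m)]

A = [[1.0, 0.75],
     [1.0, 1.0]]
u, v = [1.0, 1.0], [0.75, 1.0]       # the two columns of A

cosine = cos_angle(u, v)
angle_deg = math.degrees(math.acos(cosine))
cond = sum_norm(A) * sum_norm(inverse_2x2(A))
print(round(cosine, 2), round(angle_deg, 1), cond)

# Three nearby right-hand sides give very different solutions.
for b in ([5.0, 7.0], [7.0, 5.0], [6.0, 6.0]):
    print(b, solve(A, b))
```

Swapping or averaging the two entries of the right-hand side moves the solution from [-1, 8] to [13, -8] or [6, 0], which is the instability the small angle between the columns predicts.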
If the columns were close to mutually orthogonal, the system Ax = b would be well-conditioned.

Principle. Let A be an n-by-n matrix with rank(A) = n so that the system of equations Ax = b has a unique solution. The solution to Ax = b will be more or less stable according to how close or far from orthogonal the column vectors of A are.

Suppose that the columns of the n-by-n matrix A are linearly independent but not orthogonal. We shall show how to find a new n-by-n matrix $A^*$ of orthonormal columns (orthogonal and unit length) that are linear combinations of the columns of A.

Our procedure can be applied to any basis $a_1, a_2, \ldots, a_m$ of an m-dimensional space V and will yield a new basis of m orthonormal vectors $q_i$ for V (unit-length vectors make calculations especially simple). The procedure is inductive in the sense that the first k $q_i$ will be an orthonormal basis for the space $V_k$ generated by the first k $a_i$. The method is called Gram-Schmidt orthogonalization.

For k = 1, $q_1$ should be a multiple of $a_1$. To make $q_1$ have norm 1, we set $q_1 = a_1/|a_1|$. Next we must construct from $a_2$ a second unit vector $q_2$ orthogonal to $q_1$. We divide $a_2$ into two "parts": the part of $a_2$ parallel to $q_1$ and the part of $a_2$ orthogonal (perpendicular) to $q_1$ (see Figure 5.8). The component of $a_2$ in $q_1$'s direction is simply the projection of $a_2$ onto $q_1$. This projection is $sq_1$, where the length s of the projection is

$$s = \frac{a_2 \cdot q_1}{q_1 \cdot q_1} = a_2 \cdot q_1 \qquad (7)$$

since $q_1 \cdot q_1 = 1$. The rest of $a_2$, the vector $a_2 - sq_1$, is orthogonal to the projection $sq_1$, and hence orthogonal to $q_1$. So $a_2 - sq_1$ is the orthogonal vector we want for $q_2$. To have unit norm, we set $q_2 = (a_2 - sq_1)/|a_2 - sq_1|$.

Let us show how the procedure works thus far.

[Figure 5.8: Gram-Schmidt orthogonalization.]

Example 5. Gram-Schmidt Orthogonalization
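in Two Dimensions

Suppose that $a_1 = [3, 4]$ and $a_2 = [2, 1]$ (see Figure 5.8). We set

$$q_1 = \frac{a_1}{|a_1|} = \frac{[3, 4]}{5} = \left[\tfrac35, \tfrac45\right]$$

We project $a_2$ onto $q_1$ to get the part of $a_2$ parallel to $q_1$. From (7), the length of the projection is

$$s = a_2 \cdot q_1 = [2, 1] \cdot \left[\tfrac35, \tfrac45\right] = \tfrac{10}{5} = 2$$

and the projection is $sq_1 = 2\left[\tfrac35, \tfrac45\right] = \left[\tfrac65, \tfrac85\right]$. Next we determine the other part of $a_2$, the part orthogonal to $sq_1$:

$$a_2 - sq_1 = [2, 1] - \left[\tfrac65, \tfrac85\right] = \left[\tfrac45, -\tfrac35\right]$$

Since $\left|\left[\tfrac45, -\tfrac35\right]\right| = 1$, then

$$q_2 = \frac{a_2 - sq_1}{|a_2 - sq_1|} = \frac{\left[\tfrac45, -\tfrac35\right]}{1} = \left[\tfrac45, -\tfrac35\right]$$

We extend the previous construction by finding the projections of $a_3$ onto $q_1$ and $q_2$. Then the vector $a_3 - s_1q_1 - s_2q_2$, which is orthogonal to $q_1$ and $q_2$, should be $q_3$; as before, we divide $a_3 - s_1q_1 - s_2q_2$ by its norm to make $q_3$ unit length. We continue this process to find $q_4$, $q_5$, and so on.

Example 6. Gram-Schmidt Orthogonalization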
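of 3-by-3 Matrix

Let us perform orthogonalization on the matrix A whose ith column we denote by $a_i$.

$$A = \begin{bmatrix} 0 & 3 & 2 \\ 3 & 5 & 5 \\ 4 & 0 & 5 \end{bmatrix} \qquad (8)$$

First $q_1 = a_1/|a_1| = [0, 3, 4]/5 = \left[0, \tfrac35, \tfrac45\right]$.
The length of the projection of $a_2$ onto $q_1$ is

$$s = a_2 \cdot q_1 = 3 \cdot 0 + 5 \cdot \tfrac35 + 0 \cdot \tfrac45 = 3 \qquad (9a)$$

So the projection of $a_2$ onto $q_1$ is $sq_1 = 3\left[0, \tfrac35, \tfrac45\right] = \left[0, \tfrac95, \tfrac{12}{5}\right]$. Next we compute

$$a_2 - sq_1 = [3, 5, 0] - \left[0, \tfrac95, \tfrac{12}{5}\right] = \left[3, \tfrac{16}{5}, -\tfrac{12}{5}\right]$$

where $|a_2 - sq_1| = \sqrt{9 + 256/25 + 144/25} = 5$. Then

$$q_2 = \frac{a_2 - sq_1}{|a_2 - sq_1|} = \left[\tfrac35, \tfrac{16}{25}, -\tfrac{12}{25}\right]$$

We compute the length of the projections of $a_3$ onto $q_1$ and $q_2$:

$$s_1 = a_3 \cdot q_1 = 7, \qquad s_2 = a_3 \cdot q_2 = 2 \qquad (9b)$$

Then $a_3 - s_1q_1 - s_2q_2 = [2, 5, 5] - \left[0, \tfrac{21}{5}, \tfrac{28}{5}\right] - \left[\tfrac65, \tfrac{32}{25}, -\tfrac{24}{25}\right] = \left[\tfrac45, -\tfrac{12}{25}, \tfrac{9}{25}\right]$, which already has norm 1, so

$$q_3 = \frac{a_3 - s_1q_1 - s_2q_2}{|a_3 - s_1q_1 - s_2q_2|} = \left[\tfrac45, -\tfrac{12}{25}, \tfrac{9}{25}\right]$$

The matrix of these new orthogonal column vectors is

$$Q = \begin{bmatrix} 0 & \tfrac35 & \tfrac45 \\ \tfrac35 & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac45 & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \qquad (10)$$

In keeping with the principle above, the accuracy of this procedure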
depends on how close to or far from orthogonality the columns $a_i$ are. If a linear combination of some $a_i$ forms a small angle with another vector $a_k$ (this means the matrix A has a large condition number), then the resulting $q_i$ will have errors, making them not exactly orthogonal. However, more stable methods are available using advanced techniques, such as Householder transformations.

Suppose that the columns of A are not linearly independent. If, say, $a_3$ is a linear combination of $a_1$ and $a_2$, then in the Gram-Schmidt procedure the error vector $a_3 - s_1q_1 - s_2q_2$ with respect to $q_1$ and $q_2$ will be 0. In this case we skip $a_3$ and use $a_4 - s_1q_1 - s_2q_2$ to define $q_3$. The number of vectors $q_i$ formed will be the dimension of the column space of A, that is, rank(A).

The effect of the orthogonalization process can be represented by an upper triangular matrix R so that one obtains the matrix factorization A = QR.

Theorem 3. Any m-by-n matrix A can be factored in the form

$$A = QR \qquad (11)$$

where Q is the m-by-rank(A) matrix with orthonormal columns $q_i$ obtained by Gram-Schmidt orthogonalization, and R is an upper triangular matrix of size rank(A)-by-n (described below).

For i < j, entry $r_{ij}$ of R is $a_j \cdot q_i$, the length of the projection of $a_j$ onto $q_i$. The diagonal entries in R are the sizes, before normalization, of the new columns:
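$r_{11} = |a_1|$, $r_{22} = |a_2 - sq_1|$, $r_{33} = |a_3 - s_1q_1 - s_2q_2|$, and so on.

Example 7. QR Decomposition

Give the QR decomposition for the matrix A in Example 6.

$$A = \begin{bmatrix} 0 & 3 & 2 \\ 3 & 5 & 5 \\ 4 & 0 & 5 \end{bmatrix}$$

The orthonormal matrix Q is given in (10). We form R from the information about the sizes of new columns and the projections as described in the preceding paragraph. Here $r_{12} = s = 3$ in (9a), and $r_{13} = s_1 = 7$, $r_{23} = s_2 = 2$ in (9b). Then

$$QR = \begin{bmatrix} 0 & \tfrac35 & \tfrac45 \\ \tfrac35 & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac45 & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \begin{bmatrix} 5 & 3 & 7 \\ 0 & 5 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

Let us compute the second column of QR, multiplying Q by $r_2^C$, the second column of R, and show that the result is $a_2$, the second column of A.

$$Qr_2^C = \begin{bmatrix} 0 & \tfrac35 & \tfrac45 \\ \tfrac35 & \tfrac{16}{25} & -\tfrac{12}{25} \\ \tfrac45 & -\tfrac{12}{25} & \tfrac{9}{25} \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ 0 \end{bmatrix} = 3 \begin{bmatrix} 0 \\ \tfrac35 \\ \tfrac45 \end{bmatrix} + 5 \begin{bmatrix} \tfrac35 \\ \tfrac{16}{25} \\ -\tfrac{12}{25} \end{bmatrix} + 0 \begin{bmatrix} \tfrac45 \\ -\tfrac{12}{25} \\ \tfrac{9}{25} \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 0 \end{bmatrix} = a_2 \qquad (12)$$

Columns of Q are obtained from linear combinations of the columns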
of A. Reversing this procedure yields the columns of A as linear combinations of the columns of Q. This reversal is what is accomplished by the matrix product QR. Consider the computation in (12). In terms of the columns $q_i$ of Q, (12) is

$$3q_1 + 5q_2 + 0q_3 = a_2$$

or, in terms of R,

$$r_{12}q_1 + r_{22}q_2 = a_2 \qquad (13)$$

($a_2$ equals its projection onto $q_1$ plus its projection onto $q_2$). Next consider the formula for $q_2$:

$$q_2 = \frac{a_2 - sq_1}{|a_2 - sq_1|} = \frac{a_2 - r_{12}q_1}{r_{22}} \qquad (14)$$

since $r_{22} = |a_2 - sq_1|$ and $r_{12} = s$. Solving for $a_2$ in (14), we obtain (13):

$$q_2 = \frac{a_2 - r_{12}q_1}{r_{22}} \quad \Rightarrow \quad r_{12}q_1 + r_{22}q_2 = a_2$$

The same analysis shows that the jth column in the product QR is just a reversal of the orthogonalization steps for finding $q_j$.

The matrix R is upper triangular because column $a_i$ is only involved in building columns $q_i, q_{i+1}, \ldots, q_n$ of Q. The QR decomposition is the column counterpart to the LU decomposition, given in Section 3.2, in which the row combinations of Gaussian elimination are reversed to obtain the matrix A from its row-reduced matrix U.

The QR decomposition is used frequently in numerical procedures.
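The Gram-Schmidt steps and the bookkeeping that fills in R can be written out in a few lines. What follows is a minimal sketch of the basic (classical) procedure described in this section, not one of the more stable variants mentioned above; the function names and the tolerance `tol` used to decide that a leftover vector is zero are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_qr(a_cols, tol=1e-12):
    """Classical Gram-Schmidt on the columns a_cols of A.

    Returns (q_cols, R): the orthonormal columns of Q and the upper
    triangular R (a list of rows) with A = QR. A column whose leftover
    part has norm at most tol is treated as dependent and yields no new q.
    """
    q_cols = []
    r_cols = []                       # r_cols[j] = nonzero part of column j of R
    for a in a_cols:
        lengths = [dot(a, q) for q in q_cols]    # r_ij = a_j . q_i for i < j
        rest = list(a)
        for s, q in zip(lengths, q_cols):        # subtract each projection s*q
            rest = [x - s * y for x, y in zip(rest, q)]
        size = math.sqrt(dot(rest, rest))
        if size > tol:
            q_cols.append([x / size for x in rest])
            lengths.append(size)      # diagonal entry: size before normalizing
        r_cols.append(lengths)
    rank = len(q_cols)
    R = [[col[i] if i < len(col) else 0.0 for col in r_cols]
         for i in range(rank)]
    return q_cols, R

def rebuild_columns(q_cols, R):
    """Column j of QR is the combination sum_i r_ij * q_i."""
    n_rows = len(q_cols[0])
    return [[sum(R[i][j] * q_cols[i][k] for i in range(len(q_cols)))
             for k in range(n_rows)]
            for j in range(len(R[0]))]

# Columns of the matrix A used in Examples 6 and 7.
a_cols = [[0.0, 3.0, 4.0], [3.0, 5.0, 0.0], [2.0, 5.0, 5.0]]
q_cols, R = gram_schmidt_qr(a_cols)
for row in R:
    print([round(x, 6) for x in row])
print([[round(x, 6) for x in col] for col in rebuild_columns(q_cols, R)])
```

For this matrix the rows of R come out as [5, 3, 7], [0, 5, 2], [0, 0, 1] up to rounding, and recombining the columns of Q with those weights returns the original columns of A.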
We use it to find eigenvalues in the appendix to Section 5.5.

We will sketch one of its most frequent uses, finding the inverse or pseudoinverse of an ill-conditioned matrix. If A is an n-by-n matrix with linearly independent columns, the decomposition A = QR yields

$$A^{-1} = (QR)^{-1} = R^{-1}Q^{-1} = R^{-1}Q^T \qquad (15)$$

The fact that $Q^{-1} = Q^T$ when Q has orthonormal columns was part of Theorem 1. Given the QR decomposition of A, (15) says that to get $A^{-1}$, we only need to determine $R^{-1}$. Since R is an upper triangular matrix, its inverse is obtained quickly by back substitution (see Exercise 12 of Section 3.5). When A is very ill-conditioned, one should compute $A^{-1}$ via (15): first, determining the QR decomposition of A, using advanced (more stable) variations of the Gram-Schmidt procedure; then determining $R^{-1}$; and thus obtaining $A^{-1} = R^{-1}Q^T$.

Equation (15) extends to pseudoinverses. That is, if A is an m-by-n matrix with linearly independent columns and m > n, then its pseudoinverse $A^+$ can be computed as

$$A^+ = R^{-1}Q^T \qquad (16)$$

See the Exercises for instructions on how to verify (16) and examples of its use. This formula for the pseudoinverse is the standard way pseudoinverses are computed in practice. Even if one determines Q and R using the basic Gram-Schmidt procedure given above, the resulting $A^+$ from (16) will be substantially more accurate than computing $A^+$ using the standard formula $A^+ = (A^TA)^{-1}A^T$, because the matrix $A^TA$ tends to be ill-conditioned. For example, in the least-squares polynomial-fitting problem in Example 5 of Section 5.4, the condition number of the 3-by-3 matrix $X^TX$ was around 2000!

Principle. Because of conditioning problems, the pseudoinverse $A^+$ of a matrix A should be computed by the formula $A^+ = R^{-1}Q^T$, where Q and R are the matrices in the QR decomposition of A.

We now introduce a very different use of orthogonality. Our goal is
to make a vector space for the set of all continuous functions. To make matters a little easier, let us focus on functions that can be expressed as a polynomial or infinite series in powers of x, such as $x^3 + 3x^2 - 4x + 1$ or $e^x$ or sin x.

Recall that the defining property of a vector space V is that if u and v are in V, then ru + sv is also in V, for any scalars r, s. Clearly, linear combinations of polynomials (or infinite series) are again polynomials (or infinite series), so these functions form a vector space.

For a vector space of functions to be useful, we need a coordinate system, that is, a basis of independent functions $u_i(x)$ (functions that are not linearly dependent on each other) so that any function f(x) can be expressed as a linear combination of these basis functions.

$$f(x) = f_1u_1(x) + f_2u_2(x) + \cdots \qquad (17)$$

This basis will need to be infinite and the linear combinations of basis functions may also be infinite. The best basis would use orthogonal, or even better, orthonormal functions.

To make an orthogonal basis, we first need to extend the definition of a scalar, or inner, product $c \cdot d$ of vectors to an inner product of functions. The inner product of two functions f(x) and g(x) on the interval [a, b] is defined as

$$f(x) \cdot g(x) = \int_a^b f(x)g(x)\,dx \qquad (18)$$

This definition is a natural generalization of the standard inner product $c \cdot d$ in that both $c \cdot d$ and $f(x) \cdot g(x)$ form sums of term-by-term products of the respective entities, but in (18) we have a continuous sum, an integral.

With an inner product defined, most of the theory and formulas defined
for vector spaces can be applied to our space of functions. The inner product tells us when two vectors c, d are orthogonal (if $c \cdot d = 0$), and allows us to compute coordinates $c_i^*$ of c in an orthonormal basis $u_i$: $c_i^* = c \cdot u_i$ (these coordinates are just the projections of c onto the $u_i$). We can now do the same calculations for functions with (18).

The functional equivalent of the euclidean norm is defined by

$$|f(x)|^2 = f(x) \cdot f(x) = \int_a^b f(x)^2\,dx \qquad (19)$$

The counterpart of the sum norm $|c|_s = \sum |c_i|$ for vectors is $|f(x)|_s = \int_a^b |f(x)|\,dx$.

An orthonormal basis for our functions on the interval [a, b] will be a set of functions $\{u_i(x)\}$ which are orthogonal [by (18), $\int u_i(x)u_j(x)\,dx = 0$ for all $i \ne j$] and whose norms are 1 [by (19), $\int u_i(x)^2\,dx = 1$]. Given such an orthonormal basis $\{u_i(x)\}$, the coordinates $f_i$ of a function f(x) in terms of the $u_i(x)$ are computed by the projection formula $f_i = f(x) \cdot u_i(x)$ used for n-dimensional orthonormal bases:

$$f(x) = [f(x) \cdot u_1(x)]u_1(x) + [f(x) \cdot u_2(x)]u_2(x) + \cdots \qquad (20)$$

How do we find such an orthonormal basis? The first obvious choice is the set of powers of x: $1, x, x^2, x^3, \ldots$. These are linearly independent; that is, $x^n$ cannot be expressed as a linear combination of smaller powers of x. Unfortunately, there is no interval on which 1, x, and $x^2$ are mutually orthogonal. On [-1, 1], $1 \cdot x = \int x\,dx = 0$ and $x \cdot x^2 = \int x^3\,dx = 0$, but $1 \cdot x^2 = \int x^2\,dx = \tfrac23$.

There are many sets of orthogonal functions that have been developed
over the years. We shall mention two, Legendre polynomials and Fourier trigonometric functions.

The Gram-Schmidt orthogonalization procedure provides a way to build an orthonormal basis out of a basis of linearly independent vectors. The calculations in this procedure use inner products, and hence this procedure can be applied to the powers of x (which are linearly independent but, as we just said, far from orthogonal) to find an orthonormal set of polynomials.

When the interval is [-1, 1], the polynomials obtained by orthogonalization are called Legendre polynomials $L_k(x)$. Actually, we shall not worry about making their norms equal to 1. As noted above, the functions $x^0 = 1$ and x are orthogonal on [-1, 1]. So $L_0(x) = 1$ and $L_1(x) = x$. Also, $x^2$ is orthogonal to x but not to 1 on [-1, 1]. We must subtract off the projection of $x^2$ onto 1:

$$L_2(x) = x^2 - \frac{1 \cdot x^2}{1 \cdot 1}\,1 = x^2 - \frac{\int x^2\,dx}{\int 1\,dx} = x^2 - \frac{2/3}{2} = x^2 - \frac13 \qquad (21)$$

A similar orthogonalization computation shows that $L_3(x) = x^3 - \tfrac35x$.

Example 8. Approximating $e^x$ by
Legendre Polynomials

Let us use the first four Legendre polynomials $L_0(x) = 1$, $L_1(x) = x$, $L_2(x) = x^2 - \tfrac13$, $L_3(x) = x^3 - 3x/5$ to approximate $e^x$ on the interval [-1, 1]. We want the first four terms in (20):

$$\begin{aligned} e^x &\approx w_0L_0(x) + w_1L_1(x) + w_2L_2(x) + w_3L_3(x) \\ &= w_0 + w_1x + w_2\left(x^2 - \tfrac13\right) + w_3\left(x^3 - \tfrac{3x}{5}\right) \end{aligned} \qquad (22)$$

where $w_i = e^x \cdot L_i(x) / L_i(x) \cdot L_i(x) = \int e^xL_i(x)\,dx / \int L_i(x)^2\,dx$. For example,

$$w_2 = \frac{\int_{-1}^{1} e^x\left(x^2 - \tfrac13\right)dx}{\int_{-1}^{1} \left(x^2 - \tfrac13\right)^2 dx}$$

With a little calculus, we compute the $w_i$ to be (approximately)

$$w_0 = \frac{2.35}{2} = 1.18, \quad w_1 = \frac{.736}{.667} = 1.10, \quad w_2 = \frac{.096}{.178} = .53, \quad w_3 = \frac{.008}{.046} = .18$$

Then (22) becomes

$$e^x \approx 1.18 + 1.10x + .53\left(x^2 - \tfrac13\right) + .18\left(x^3 - \tfrac{3x}{5}\right) \qquad (23)$$

If we collect like powers of x together on the right side, (23) simplifies to

$$e^x \approx 1 + x + .53x^2 + .18x^3 \qquad (24)$$

Comparing our approximation against the real values of $e^x$ at the points -1, -.5, 0, .5, 1, we find

x                      -1      -.5     0       .5      1
approximation (24)     .350    .610    1.000   1.655   2.710
e^x                    .368    .607    1.000   1.649   2.718

A pretty good fit. In particular, it is a better fit on [-1, 1] than simply using the first terms of the power series for $e^x$, namely, $1 + x + x^2/2 + x^3/6$. The approximation gets more accurate as more Legendre polynomials are used.

Over the interval $[0, 2\pi]$ the trigonometric functions $(1/\sqrt{\pi}) \sin kx$
and $(1/\sqrt{\pi}) \cos kx$, for k = 1, 2, . . . , plus the constant function $1/\sqrt{2\pi}$ are an orthonormal basis. To verify that they are orthogonal requires showing that

$$\frac{1}{\sqrt{\pi}} \sin jx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \sin jx \cos kx\,dx = 0 \quad \text{for all } j, k$$

$$\frac{1}{\sqrt{\pi}} \sin jx \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\pi} \int_0^{2\pi} \sin jx \sin kx\,dx = 0 \quad \text{for all } j \ne k$$

$$\frac{1}{\sqrt{\pi}} \cos jx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \cos jx \cos kx\,dx = 0 \quad \text{for all } j \ne k$$

plus showing these trigonometric functions are orthogonal to a constant function. To verify that these trigonometric functions have unit length requires showing

$$\frac{1}{\sqrt{\pi}} \sin kx \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\pi} \int_0^{2\pi} \sin^2 kx\,dx = 1 \quad \text{for all } k$$

$$\frac{1}{\sqrt{\pi}} \cos kx \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\pi} \int_0^{2\pi} \cos^2 kx\,dx = 1 \quad \text{for all } k$$

When $u_{2k-1}(x) = (1/\sqrt{\pi}) \sin kx$ and $u_{2k}(x) = (1/\sqrt{\pi}) \cos kx$, k = 1, 2, . . . , and $u_0(x) = 1/\sqrt{2\pi}$ in (20), this representation of f(x) is called a Fourier series, and the coefficients $f(x) \cdot u_i(x)$ in (20) are called Fourier coefficients. Using Fourier series, we see that any piecewise continuous function can be expressed as a linear combination of sine and cosine waves. One important physical interpretation of this fact is that any complex electrical signal can be expressed as a sum of simple sinusoidal signals.

Example 9. Fourier Series Representation
of a Jump Function

Let us determine the Fourier series representation of the discontinuous function: f(x) = 1 for $0 < x \le \pi$ and f(x) = 0 for $\pi < x \le 2\pi$. The Fourier coefficients $f(x) \cdot u_i(x)$ in (20) are

$$f(x) \cdot u_{2k-1}(x) = f(x) \cdot \frac{1}{\sqrt{\pi}} \sin kx = \frac{1}{\sqrt{\pi}} \int_0^{\pi} \sin kx\,dx = \frac{1}{k\sqrt{\pi}} \left[-\cos kx\right]_0^{\pi} = \begin{cases} \dfrac{2}{k\sqrt{\pi}} & k \text{ odd} \\ 0 & k \text{ even} \end{cases}$$

$$f(x) \cdot u_{2k}(x) = f(x) \cdot \frac{1}{\sqrt{\pi}} \cos kx = \frac{1}{\sqrt{\pi}} \int_0^{\pi} \cos kx\,dx = \frac{1}{k\sqrt{\pi}} \left[\sin kx\right]_0^{\pi} = 0 \qquad (25)$$

Further, we calculate $f(x) \cdot 1/\sqrt{2\pi} = \sqrt{\pi/2}$, so the constant term of the Fourier series for this f(x) is $(f(x) \cdot u_0(x))u_0(x) = \sqrt{\pi/2} \cdot 1/\sqrt{2\pi} = \tfrac12$. By (25), only the odd sine terms occur. Letting an odd k be written as 2n - 1, we obtain the Fourier series

$$f(x) = \frac12 + \sum_{n=1}^{\infty} \frac{2}{(2n-1)\pi} \sin[(2n-1)x] \qquad (26)$$

Figure 5.9 shows the approximation to f(x) obtained when the first three sine terms in (26) are used (dashed line) and when the first eight sine terms are used. The fit is impressive.

[Figure 5.9: Dashed lines use first three trigonometric terms in Fourier series for f(x). Solid lines use first eight terms.]

Representing a function in terms of an orthonormal set of functions as
in (20) has a virtually unlimited number of applications in the physical
sciences and elsewhere. If one can solve a physical problem for the orthonormal basis functions, then one can typically obtain a solution for any function as a linear combination of the solutions for the basis functions. This is true for most differential equations associated with electrical circuits, vibrating bodies, and so on. Statisticians use Fourier series to analyze time-series patterns (see Example 3 of Section 1.5). The study of Fourier series is one of the major fields of mathematics.
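The Fourier coefficients of the jump function in Example 9 can be approximated by evaluating the inner product (18) numerically and compared with the closed form in (25). The sketch below is only an illustration: the midpoint rule, the number of subintervals, and all function names are assumptions, not part of the original text.

```python
import math

def jump(x):
    """The jump function of Example 9: 1 on (0, pi], 0 on (pi, 2*pi]."""
    return 1.0 if 0.0 < x <= math.pi else 0.0

def inner(f, g, a, b, n=20000):
    """Midpoint-rule approximation of the inner product (18) on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(n))

def sine_basis(k):
    """Orthonormal basis function (1/sqrt(pi)) sin kx."""
    return lambda x: math.sin(k * x) / math.sqrt(math.pi)

def fourier_partial_sum(x, n_sine_terms):
    """1/2 plus the first n_sine_terms odd sine terms of the series (26)."""
    total = 0.5
    for n in range(1, n_sine_terms + 1):
        k = 2 * n - 1
        total += 2.0 / (k * math.pi) * math.sin(k * x)
    return total

# Numerical coefficient versus the closed form 2/(k*sqrt(pi)) for odd k.
for k in (1, 2, 3):
    numeric = inner(jump, sine_basis(k), 0.0, 2.0 * math.pi)
    exact = 2.0 / (k * math.sqrt(math.pi)) if k % 2 == 1 else 0.0
    print(k, round(numeric, 6), round(exact, 6))

# Partial sums at the middle of each plateau, with 3 and 8 sine terms.
for terms in (3, 8):
    print(terms,
          round(fourier_partial_sum(math.pi / 2, terms), 4),
          round(fourier_partial_sum(3 * math.pi / 2, terms), 4))
```

With eight sine terms the partial sum is already within a few hundredths of 1 at the middle of the first plateau and of 0 at the middle of the second.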
We complete our discussion of vector spaces of functions by showing how badly conditioned the powers of x are as a basis for representing functions. Remember that the powers of x, $x^i$, i = 0, 1, . . . , are linearly independent. The problem is that they are far from orthogonal.

Let us consider how we might approximate an arbitrary function f(x) as a linear combination of, say, the powers of x up to $x^5$:

$$f(x) \approx w_0 + w_1x + w_2x^2 + w_3x^3 + w_4x^4 + w_5x^5 \qquad (27)$$

using the continuous version of least-squares theory. If f(x) and the powers of x were vectors, not functions, then (27) would have the familiar matrix form $f \approx Aw$ and the approximate solution w would be given by $w = A^+f$, where $A^+ = (A^TA)^{-1}A^T$.

Let us generalize f = Aw to functions by letting the columns of a matrix be functions. We define the functional "matrix" A(x):

$$A(x) = [1, x, x^2, x^3, x^4, x^5]$$

Now (27) becomes

$$f(x) = A(x)w \qquad (28)$$

To find the approximate solution to (28), we need to compute the functional version of the pseudoinverse $A(x)^+$:

$$A(x)^+ = (A(x)^TA(x))^{-1}A(x)^T$$

and then find the vector w of coefficients in (27):

$$w = A(x)^+f(x) = (A(x)^TA(x))^{-1}(A(x)^Tf(x)) \qquad (29)$$

The matrix $A(x)^T$ has $x^i$ as its ith "row", so the matrix product $A(x)^TA(x)$ involves computing the inner product of each "row" of $A(x)^T$ with each "column" of A(x): entry (i, j) in $A(x)^TA(x)$ is $x^i \cdot x^j$ ($= \int x^ix^j\,dx$). Similarly, the matrix-"vector" product $A(x)^Tf(x)$ is the vector of inner products $x^i \cdot f(x)$. The computations are simplest if we use the interval [0, 1].
Then entry (i, j) of $A(x)^TA(x)$ is

$$x^i \cdot x^j = \int_0^1 x^{i+j}\,dx = \left[\frac{x^{i+j+1}}{i+j+1}\right]_0^1 = \frac{1}{i+j+1} \qquad (30)$$

For example, entry (1, 2) is $\int x\,x^2\,dx = \int x^3\,dx = \tfrac14$. Note that we consider the constant function 1 ($= x^0$) to be the zeroth row of $A(x)^T$.
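Formula (30) is easy to tabulate. A minimal sketch using Python's `fractions` module keeps every entry as an exact rational number, so no rounding enters at all; the function names and the Gauss-Jordan inverse are illustrative assumptions rather than anything prescribed by the text.

```python
from fractions import Fraction

def gram_matrix_of_powers(n):
    """Entry (i, j), for i, j = 0..n-1, is x^i . x^j on [0, 1], by (30)."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def exact_inverse(m):
    """Gauss-Jordan elimination in exact rational arithmetic.

    Assumes m is square and nonsingular.
    """
    n = len(m)
    # Augment m with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

G = gram_matrix_of_powers(6)      # powers 1, x, ..., x^5
G_inv = exact_inverse(G)

print(G[1][2])                    # entry (1, 2): the inner product of x and x^2
for row in G_inv:
    print([int(x) for x in row])  # the exact inverse has integer entries
```

The exact inverse gives a rounding-free reference against which any floating-point computation with this matrix can be compared.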
Computing all the inner products for $A(x)^TA(x)$ yields

$$A(x)^TA(x) = \begin{bmatrix} 1 & \tfrac12 & \tfrac13 & \tfrac14 & \tfrac15 & \tfrac16 \\ \tfrac12 & \tfrac13 & \tfrac14 & \tfrac15 & \tfrac16 & \tfrac17 \\ \tfrac13 & \tfrac14 & \tfrac15 & \tfrac16 & \tfrac17 & \tfrac18 \\ \tfrac14 & \tfrac15 & \tfrac16 & \tfrac17 & \tfrac18 & \tfrac19 \\ \tfrac15 & \tfrac16 & \tfrac17 & \tfrac18 & \tfrac19 & \tfrac{1}{10} \\ \tfrac16 & \tfrac17 & \tfrac18 & \tfrac19 & \tfrac{1}{10} & \tfrac{1}{11} \end{bmatrix} \qquad (31)$$

This matrix is very ill-conditioned since the columns are all similar to each other. When the fractions in (31) are expressed to six decimal places, such as $\tfrac13 = .333333$, the inverse given by the author's microcomputer was (with entries rounded to integer values)

Fractions expressed to six decimal places

$$(A(x)^TA(x))^{-1} = \begin{bmatrix} 17 & -116 & -47 & 1{,}180 & -1{,}986 & 958 \\ -116 & 342 & 7{,}584 & -34{,}881 & 49{,}482 & -22{,}548 \\ -47 & 7{,}584 & -76{,}499 & 242{,}494 & -301{,}846 & 129{,}004 \\ 1{,}180 & -34{,}881 & 242{,}494 & 644{,}439 & 723{,}636 & -289{,}134 \\ -1{,}986 & 49{,}482 & -301{,}846 & 723{,}636 & -747{,}725 & 278{,}975 \\ 958 & -22{,}548 & 129{,}004 & -289{,}134 & 278{,}975 & -97{,}180 \end{bmatrix} \qquad (32)$$

The (absolute) sum of the fifth column in (32) is about 2,000,000. The first column in (31) sums to about 2.5. So the condition number of $A(x)^TA(x)$, in the sum norm, is about 2,000,000 × 2.5 = 5,000,000. Now that is an ill-conditioned matrix!

We rounded fractions to six significant digits, but our condition number tells us that without a seventh significant digit, our numbers in (32) could be off by 500% error [a relative error of .000001 in $A(x)^TA(x)$ could yield answers off by a factor of 5 in pseudoinverse calculations]. Thus the numbers in (32) are worthless.

Suppose that we enter the matrix in (31) again, now expressing fractions to seven decimal places. The new inverse computation yields
Fractions expressed to seven decimal places

$$(A(x)^TA(x))^{-1} = \begin{bmatrix} 51 & -1{,}051 & 6{,}160 & -1{,}475 & 15{,}419 & 5{,}845 \\ -1{,}051 & 26{,}385 & -165{,}765 & 410{,}749 & -438{,}029 & 168{,}208 \\ 6{,}160 & -165{,}765 & 1{,}079{,}198 & -2{,}731{,}939 & 2{,}955{,}103 & -1{,}146{,}281 \\ -1{,}475 & 410{,}749 & -2{,}731{,}939 & 7{,}017{,}359 & -7{,}671{,}190 & 2{,}999{,}546 \\ 15{,}419 & -438{,}029 & 2{,}955{,}103 & -7{,}671{,}190 & 8{,}454{,}598 & -3{,}327{,}362 \\ -5{,}845 & 168{,}208 & -1{,}146{,}281 & 2{,}999{,}546 & -3{,}327{,}362 & 1{,}316{,}523 \end{bmatrix} \qquad (33)$$

We have a totally different matrix. Most of the entries in (33) are about 10 times larger than corresponding entries in (32). The sum of the fifth column in (33) is about 23,000,000. If we use (33), the condition number of $A(x)^TA(x)$ is around 56,000,000. Our entries in (33) were rounded to seven significant digits, but the condition number says eight significant digits were needed. Again our numbers are worthless. To compute the inverse accurately would require double-precision computation.

It is only fair to note that the ill-conditioned matrix (31) is famously bad. It is called a 6-by-6 Hilbert matrix [a Hilbert matrix has $1/(i + j + 1)$
in entry (1', j)]. Suppose that we used the numbers in (32) for (A(x)TA(x))’1 in com
puting the pseudoinverse. Let us proceed to calculate A(x)+ and then com
pute the coefﬁcients in an approximation for a function by a ﬁfth—degree
polynomial. Let us choose f(x) = 6". Then (A(x)Te") is the vector of
inner products xi  6i = f xie" dx, i = 0, 1, . . . , 5. Some calculus yields
A(x)Tex = [2.718, 1, .718, .563, .465, .396] (expressed to three signiﬁcant
digits). Now inserting our values for (A(x)TA(x))#1 and ATe‘ into (27), we
obtain

\[
w = (A(x)^T A(x))^{-1} (A(x)^T e^x) =
\begin{bmatrix}
17 & -116 & -47 & 1{,}180 & -1{,}986 & 958 \\
-116 & 342 & 7{,}584 & -34{,}881 & 49{,}482 & -22{,}548 \\
-47 & 7{,}584 & -76{,}499 & 242{,}494 & -301{,}846 & 129{,}004 \\
1{,}180 & -34{,}881 & 242{,}494 & -644{,}439 & 723{,}636 & -289{,}134 \\
-1{,}986 & 49{,}482 & -301{,}846 & 723{,}636 & -747{,}725 & 278{,}975 \\
958 & -22{,}548 & 129{,}004 & -289{,}134 & 278{,}975 & -97{,}180
\end{bmatrix}
\begin{bmatrix} 2.718 \\ 1 \\ .718 \\ .563 \\ .465 \\ .396 \end{bmatrix}
=
\begin{bmatrix} 17 \\ -87 \\ -219 \\ 1{,}611 \\ 2{,}449 \\ 1{,}135 \end{bmatrix}
\tag{34}
\]

Thus our fifth-degree polynomial approximation of $e^x$ on the interval [0, 1] is

\[
e^x \approx 17 - 87x - 219x^2 + 1611x^3 + 2449x^4 + 1135x^5
\tag{35}
\]

Setting $x = 1$ in (35), we have $e^1 \approx 17 - 86 - 219 + 1611 + 2449 + 1135 = 4907$, pretty bad. Since our computed values in $(A(x)^T A(x))^{-1}$ are
meaningless, such a bad approximation of $e^x$ was to be expected.

Compare (35) with the Legendre polynomial approximation in Example 8.

Section 5.4 Exercises

Summary of Exercises

Exercises 1-11 involve inverses, pseudoinverses, and projections for matrices with orthogonal columns. Exercises 12-21 involve Gram-Schmidt orthogonalization and the QR decomposition. Exercises 22-30 present problems about functional inner products and functional approximation.

1. Compute the inverses of these matrices with orthogonal columns.
   (a) $\begin{bmatrix} .6 & .8 \\ -.8 & .6 \end{bmatrix}$
   (b) $\begin{bmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{bmatrix}$
   Solve $Ax = \mathbf{1}$, where A is the matrix in part (b).
2. Compute the inverses of these matrices with orthogonal columns.
   (a) $\begin{bmatrix} -1 & 4 & -1 \\ 2 & 1 & -2 \\ 1 & 2 & 3 \end{bmatrix}$
   (b) $\begin{bmatrix} 2 & -3 & 6 \\ -6 & 2 & 3 \\ 3 & 6 & 2 \end{bmatrix}$
   (c) $\begin{bmatrix} .5 & -.5 & 1 \\ -.5 & .5 & 1 \\ 1 & .5 & 0 \end{bmatrix}$
   Solve $Ax = \mathbf{1}$, where A is the matrix in part (a).
3. Show that if A is an n-by-n upper triangular matrix with orthonormal columns, A is the identity matrix I.
4. Compute the length k of the projection of b onto a and give the projection vector ka.
   (a) a = [0, 1, 0], b = [3, 2, 4]
   (b) a = [1, -1, 2], b = [2, 3, 1]
   (c) a = [entries illegible in source], b = [4, 1, 3]
   (d) a = [2, -1, 3], b = [-2, 5, 3]
5. Express the vector [2, 1, 2] as a linear combination of the following orthogonal bases for three-dimensional space.
   (a) [1, -1, 2], [2, 2, 0], [-1, 1, 1]
   (b) [entries illegible in source]
   (c) [3, 1.5, 1], [1, -3, 1.5], [-1.5, 1, 3]
6. Compute the pseudoinverse of A = [matrix entries illegible in source]. Find the least-squares solution to $Ax = \mathbf{1}$.
7. Compute the pseudoinverse of
   $A = \begin{bmatrix} 3 & 4 \\ 1 & -2 \\ 2 & -5 \end{bmatrix}$
   Find the least-squares solution to $Ax = \mathbf{1}$.
8. Consider the regression model $z = qx + ry + s$ for the following data, where the x-value is a scaled score (to have average value of 0) of high school grades, the y-value is a scaled score of SAT scores, and the z-value is a score of college grades. Determine q, r, and s. Note that the $x$, $y$, and $\mathbf{1}$ vectors are mutually
orthogonal.
9. Verify that Theorem 2 is true in two dimensions, namely, that a change from the standard $\{e_1, e_2\}$ basis to some other orthonormal basis $\{q_1, q_2\}$ corresponds to a rotation (around the origin) and possibly a reflection. Note that since $q_1$, $q_2$ have unit length, they are completely determined by knowing the (counterclockwise) angles $\theta_1$, $\theta_2$ they make with the positive $e_1$ axis; also since $q_1$, $q_2$ are orthogonal, $|\theta_1 - \theta_2| = 90°$.
10. (a) Show that an orthonormal change of basis preserves lengths (in euclidean norm).
    Hint: Verify that $(Qv) \cdot (Qv) = v \cdot v$ (where Q has orthonormal columns) by using the identity $(Ab) \cdot (Cd) = b^T (A^T C) d$.
    (b) Show that an orthonormal change of basis preserves angles.
    Hint: Show that the cosine formula for the angle is unchanged by the method in part (a).
11. Compute the angle between the following pairs of nonorthogonal vectors. Which are close to orthogonal?
    (a) [3, 2], [-3, 4]
    (b) [1, 2, 5], [2, 5, 3]
    (c) [1, -3, 2], [-2, 4, ...]
12. Find the QR decomposition of the following matrices.
    (a) $\begin{bmatrix} 3 & -1 \\ 4 & 1 \end{bmatrix}$
    (b) $\begin{bmatrix} 2 & 1 \\ 1 & 1 \\ 2 & 3 \end{bmatrix}$
    (c) $\begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 2 & -2 & 2 \end{bmatrix}$
    (d) $\begin{bmatrix} 2 & 3 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 2 \end{bmatrix}$
13. Use the Gram-Schmidt orthogonalization to find an orthonormal basis that generates the same vector space as the following bases:
    (a) [1, 1], [2, 1]
    (b) [2, 1, 2], [4, 1, 1]
    (c) [3, 1, 1], [1, 2, 1], [1, 1, 2]
14. (a) Compute the inverse of the matrix in Exercise 12, part (c) by first finding the QR decomposition of the matrix and then using (15) to get the inverse. (See Exercise 12 of Section 3.5 for instructions on computing $R^{-1}$.) What is its condition number?
    (b) Check your answer by computing the inverse by the regular elimination by pivoting method.
15. (a) Find the pseudoinverse $A^+$ of the matrix A in Exercise 12, part (b) by using the QR decomposition of A and computing $A^+$ as $A^+ = R^{-1} Q^T$.
    (b) Check your answer by finding the pseudoinverse from the formula $A^+ = (A^T A)^{-1} A^T$. Note that this is a very poorly conditioned matrix; compute the condition number of $(A^T A)$.
16. Use (16) to find the pseudoinverse in solving the refinery problem in Example 3 of Section 5.3.
17. Use (16) to find the pseudoinverse in the following regression problems using the model $\hat{y} = qx + r$.
    (a) (x, y) points: (0, 1), (2, 1), (4, 4)
    (b) (x, y) points: (3, 2), (4, 5), (5, 5), (6, 5)
    (c) (x, y) points: (-2, 1), (0, 1), (2, 4)
18. Use (16) to find the pseudoinverse in the least-squares polynomial-fitting
problem in Example 5 of Section 5.3.
19. Verify (16): $A^+ = R^{-1} Q^T$, by substituting QR for A (and $R^T Q^T$ for $A^T$) in the pseudoinverse formula $A^+ = (A^T A)^{-1} A^T$ and simplifying (remember that R is invertible; we assume that the columns of A are linearly independent).
20. Show that if the columns of the m-by-n matrix A are linearly independent, the n-by-n matrix R of the QR decomposition must be invertible.
    Hint: Show that the main diagonal entries of R are nonzero and then see Exercise 12 of Section 3.5 for instructions on computing the inverse of R.
21. Show that any set H of k orthonormal n-vectors can be extended to an orthonormal basis for n-dimensional space.
    Hint: Form an n-by-(k + n) matrix whose first k columns come from H and whose remaining n columns form the identity matrix; now apply the Gram-Schmidt orthogonalization to this matrix.
22. Over the interval [0, 1], compute the following inner products: $x \cdot x$, $x \cdot x^3$, $x^3 \cdot x^3$.
23. Verify that the fourth Legendre polynomial is $x^3 - \frac{3}{5}x$.
24. Verify the values found for the weights $w_1$, $w_2$, $w_3$, and $w_4$ in Example 8.
    Note: You must use integration by parts, or a table of integrals.
25. Approximate the following functions f(x) as a linear combination of the first four Legendre polynomials over the interval [-1, 1]: $L_0(x) = 1$, $L_1(x) = x$, $L_2(x) = x^2 - \frac{1}{3}$, $L_3(x) = x^3 - 3x/5$.
    (a) $f(x) = x^4$
    (b) $f(x) = |x|$
    (c) $f(x) = -1$ for $x < 0$, $f(x) = 1$ for $x \ge 0$
26. Approximate $x^3 + 2x - 1$ as a linear combination of the first four Legendre polynomials over the interval [-1, 1]: $L_0(x) = 1$, $L_1(x) = x$, $L_2(x) = x^2 - \frac{1}{3}$, $L_3(x) = x^3 - 3x/5$. Your "approximation" should equal $x^3 + 2x - 1$, since this polynomial is a linear combination of the functions 1, $x$, $x^2$, and $x^3$, from which the Legendre polynomials were derived by orthogonalization.
27. (a) Find the Legendre polynomial of degree 4.
    (b) Find the Legendre polynomial of degree 5.
28. (a) Using the interval [0, 1], instead of [-1, 1], find three orthogonal polynomials of the form $K_0(x) = a$, $K_1(x) = bx + c$, and $K_2(x) = dx^2 + ex + f$.
    (b) Find a least-squares approximation of x^[exponent illegible in source] on the interval [0, 1] using your three polynomials in part (a).
...
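
The ill-conditioning of the 6-by-6 Hilbert matrix discussed with (32) and (33) can also be examined with no rounding at all, by working in exact rational arithmetic. The following Python sketch is an illustration added here, not one of the text's own computations. The helper names `hilbert`, `invert`, `sum_norm`, and `round_sig` are hypothetical, and we assume that the sum norm of a matrix means its largest absolute column sum, in line with the column sums quoted for (31) and (32). The sketch builds the matrix with entries 1/(i + j + 1), inverts it exactly by Gauss-Jordan elimination, and forms the condition number as the product of the two sum norms. It then repeats the inversion on a copy whose entries are rounded to six significant digits, so the two inverses can be compared.

```python
from fractions import Fraction


def hilbert(n):
    """n-by-n matrix with 1/(i + j + 1) in entry (i, j), indices starting at 0."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


def invert(matrix):
    """Exact inverse of a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # In exact arithmetic any nonzero entry can serve as the pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [entry / p for entry in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sum_norm(matrix):
    """Sum norm, taken here (an assumption) to be the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in matrix) for j in range(len(matrix[0])))


def round_sig(x, digits):
    """Round a positive Fraction to the given number of significant digits."""
    scale = Fraction(1)
    while x * scale < 10 ** (digits - 1):
        scale *= 10
    while x * scale >= 10 ** digits:
        scale /= 10
    return Fraction(round(x * scale)) / scale


H = hilbert(6)
H_inv = invert(H)
cond = sum_norm(H) * sum_norm(H_inv)
print("exact (1,1) entry of the inverse:", H_inv[0][0])   # 36
print("exact condition number (sum norm):", cond)         # 29070279

# The same matrix with every entry rounded to six significant digits.
H_rounded = [[round_sig(entry, 6) for entry in row] for row in H]
H_rounded_inv = invert(H_rounded)
print("(1,1) entry after rounding the data:", float(H_rounded_inv[0][0]))
```

Under these assumptions the exact condition number, 29,070,279, lies between the estimates of 5,000,000 and 56,000,000 obtained from (32) and (33), which is one more sign that neither computed inverse can be trusted digit for digit. Printing other entries of `H_rounded_inv` next to those of `H_inv` shows how far six-digit data alone can move the inverse, before any rounding in the elimination itself.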
This note was uploaded on 02/24/2010 for the course AMS 210 taught by Professor Fried during the Fall '08 term at SUNY Stony Brook.