Unformatted text preview: EE448/528 Version 1.0 John Stensby Chapter 6 r r AX = b: The Minimum Norm Solution and the LeastSquareError Problem
Like the previous chapter, this chapter deals with the linear algebraic equation problem AX = b. However, in this chapter, we impose additional structure in order to obtain specialized, but important, results. First, we consider Problem #1: b is in the range of A. For AX = b, either a unique solution exists or an infinite number of solutions exist. We want to find the solution with minimum norm. The minimum norm solution always exists, and it is unique. Problem #1 is called the minimum norm problem. Next, we consider Problem #2: b is not in the range of A so that r r r r r AX = b has no solution. Of all the vectors X that minimize AX − b 2 , we want to find the one with minimum norm. Problem #2 is called the minimum norm, leastsquareerror problem. Its solution always exists and it is unique. It should be obvious that Problems #1 and #2 are special cases of a more general problem. That is, given AX = b (with no conditions placed on b), we want to find the X which r r r r simultaneously minimizes both AX − b 2 and X 2 . Such an X always exists, it is always unique, and it is linearly related to b. Symbolically, we write r r r r r r r r r r r X = A+ b , r r (61) where A+ denotes a linear operator called the pseudo inverse of A (yes, if A1 exists, then A+ = A1 and X = A1b). For Problem #1, X = A+ b will be a solution of AX = b, and it will be the “shortest length” (minimum norm) solution. For Problem #2, X = A+ b simultaneously minimizes both r r r r r AX − b 2 and X 2 even though it does not satisfy AX = b (which has no solution for Problem #2 where b ∉ R(A) ). Problem #1: The Minimum Norm Problem In this section we consider the case where b ∈ R(A). The system r r r r r r r r r r CH6.DOC Page 61 EE448/528 Version 1.0 John Stensby AX = b r r (62) has a unique solution or it has and infinite number of solutions as described by (57). If (62) has a unique solution, then this solution is the minimum norm solution, by default. If (62) has an infinite number of solutions, then we must find the solution with the smallest norm. In either case, the minimum norm solution is unique, and it is characterized as being orthogonal to K(A), as shown in what follows. Each solution X of (62) can be uniquely decomposed into r r r X = X K⊥ ⊕ X K , r (63) where r X K⊥ ∈ K( A )⊥ = R ( A ∗ ) r X K ∈ K( A ) . (64) However r r r r r AX = A ( X K⊥ ⊕ X K ) = AX K⊥ = b , r r r so X K⊥ ∈ K( A )⊥ is the only part of X that is significant in generating b. Theorem 6.1 r In the decomposition (6.3), there is only one vector X K⊥ ∈ K( A )⊥ that is common to all r r r r r solutions of AX = b. Equivalently, there is a unique vector X K⊥ ∈ K( A )⊥ for which A X K⊥ = b. Proof: To show this, assume there are two, and arrive at a contradiction. We assume the r r r r r r existence of X K⊥ ∈ K( A )⊥ and YK⊥ ∈ K( A )⊥ such that A X K⊥ = b and A YK⊥ = b. Then, simple (65) CH6.DOC Page 62 EE448/528 Version 1.0 John Stensby r r r r r subtraction leads to the conclusion that A( X K⊥  YK⊥ ) = 0, or the difference X K⊥  YK⊥ ∈ K(A). r r This contradiction (as we know, X K⊥  YK⊥ ∈ K(A)⊥ ) leads to the conclusion that there is only r r r one X K⊥ ∈ K( A )⊥ that is common to all solutions of AX = b (each solution has a decomposition r of the form (6.3) where a common X K⊥ is used).♥ Theorem 6.2 r r r The unique solution X K⊥ ∈ K( A )⊥ is the minimum norm solution of AX = b. That is, r X K⊥ r < X ,
2 2 (66) r r r r r where X is any other (i.e., X ≠ X K⊥ ) solution of AX = b.
Proof: r r r r r Let X, X ≠ X K⊥ , be any solution of AX = b. As shown by (63), we can write the decomposition
r r r X = X K⊥ ⊕ X K , (67) where XK ∈ K(A). Clearly, we have r 2 r r 2 r r r r X = X K⊥ ⊕ X K = X K⊥ ⊕ X K , X K⊥ ⊕ X K
2 2 r r r r r = X K⊥ , X K⊥ + X K , X K r r 2 2 = X K⊥ + X K , 2 2 ( due to orthogonality of vectors) (68) r r so that X K⊥ 2 < X 2 as claimed.♥ When viewed geometrically, Theorem 6.2 is obvious. As shown by Figure 6.1, the solution set for AX = b is a linear variety, a simple translation of K(A). Figure 6.1 shows an r r CH6.DOC Page 63 EE448/528 Version 1.0
r John Stensby So lu tio ns of Co nt ain in g A r X = r X
r XK V ar iet y Li ne ar r X K⊥ r X is non  optimum r X K⊥ ∈ K(A)⊥ is optimum (minimum norm) Figure 61: The solution set is a linear variety. The minimum norm solution is orthogonal to K(A). r r arbitrary solution X, and it shows the “optimum”, minimum norm, solution X K⊥ . Clearly, the solution becomes “optimum” when it is exactly orthogonal to K(A) (i.e., it is in K(A)⊥). We have solved Problem #1. The Pseudo Inverse A+ K[ A] r r r r X = X K ⊥ + X K where X K ∈ K( A ) b AX = b always exists. And, this solution is in R(A*) = K(A)⊥. Hence we have a mapping from b ∈ R(A) back to X ∈ R(A*) (at this point, it might be a good idea to study Fig. 5.1 once again). It is not difficult to show that this mapping is linear, onetoone and onto. We denote this mapping by r r As shown by Theorems 6.1 and 6.2, when b ∈ R(A), a unique minimum norm solution to r r r A+ : R(A) → R(A*), (69) and we write X = A+b, for b ∈ R(A) and X ∈ R(A*). Finally, our unique minimum norm solution r r r r r to the AX = b, b ∈ R(A), problem (i.e., "Problem #1") is denoted symbolically by X K⊥ = A+b.
CH6.DOC Page 64 r r r r EE448/528 Version 1.0 John Stensby As a subspace of U, R(A*) is a vector space in its own right. Every X in vector space R(A*) ⊂ U gets mapped by matrix A to something in vector space R(A) ⊂ V. By restricting the domain of A to be just R(A*) (instead of the whole U), we have a mapping with domain R(A*) and codomain R(A). Symbolically, we denote this mapping by r A Y R [ A∗ : R(A ∗ ) → R(A) , (610) and we call it the restriction of A to R(A*). The mapping (610) is linear, onetoone, and onto (even though A : U → V may not be onetoone or onto). More importantly, the inverse of mapping (610) is the mapping (69). As is characteristic of an operator and its inverse, we have A Y R [ A∗ ] r r r A+ b = b for all b ∈ R(A) , r r r ∗ X = X for all X ∈ R(A ) (611) A A + Y R [ A∗ ] as expected. Finally, the relationship between (69) and (610) is illustrated by Fig. 62. As defined so far, the domain of A+ is only R(A) ⊂ V. This is adequate for our discussion of "Problem #1"type problems where b ∈ R(A). However, for our discussion of "Problem #2" (and the more general optimization problem, discussed in the paragraph containing (61), where b ∈ V), we must extend the domain of A+ to be all of V = R(A) ⊕ K(A*). A+ is already defined on r r A R(A*) R(A*) = K(A)⊥ R(A) = K(A*)⊥ A+
Figure 62: Restricted to R(A*) = K(A)⊥, A is onetoone and onto, and it has the inverse A+.
CH6.DOC Page 65 EE448/528 Version 1.0 John Stensby R(A); we need to define the operator on K(A*). We do this in a very simple manner. We define A+b = 0, b ∈ K(A*). r r r (612) With the extension offered by (612), we have the entire vector space V as the domain of A+. The operations performed by A and A+ are illustrated by Fig. 6.3. When defined in this manner, A+ is called the MoorePenrose pseudo inverse of A (or simply the pseudo inverse). A number of important observations can be made from inspection of Fig. 63. Note that (1) AA+ X = 0 for X ∈ K(A*) = R(A)⊥, (2) AA+ X = X for X ∈ K(A*)⊥ = R(A), (3) A+ AX = 0 for X ∈ K(A) = R(A*)⊥, (4) A+ AX = X for X ∈ R(A*) = K(A)⊥, and (5) R(A*) = R(A+ ) and K(A*) = K(A+ ) (however, A* ≠ A+ ). Note that (1) and (2) of (613) imply that AA+ is the orthogonal projection of V onto R(A). Also,
Su bs pa c Co es m are pl O em rt en hog ts on al al on og rth O nts re e s a lem ce p pa om bs C Su r r r r r r r r r (613) r r r Space U A
R(A*) Space V R(A*) = K(A)⊥
A+ R(A) = K(A*)⊥ K(A) = R(A*)⊥ When restricted to R(A*), A is onetoone from R(A*) onto R(A). A+ is onetoone from R(A) onto R(A*) . r A maps K(A) to 0 r A+ maps K(A*) to 0 K(A*) = R(A)⊥ A r 0 A+ r 0 Figure 63: How A and A+ map the various parts of U and V. CH6.DOC Page 66 EE448/528 Version 1.0 John Stensby (3) and (4) of (613) imply that A+ A is the orthogonal projection of U onto R(A*) = R(A+ ). Finally, when A1 exists we have A+ = A1. Original Moore and Penrose Definitions of the Pseudo Inverse In 1935, Moore defined A+ as the unique n×m matrix for which
U A+ A V 1) AA+ is the orthogonal projection of V onto R(A) = K(A*)⊥, and 2) A+ A is the orthogonal projection of U onto R(A*) = K(A)⊥. (614) In 1955, Penrose gave a purely algebraic definition of the pseudo inverse. He said that A+ is the unique n×m matrix satisfying 3) AA+ A = A, 4) A+ AA+ = A+ , 5) AA+ and A+ A are Hermitian. (615) Of course Moore's and Penrose's definitions are equivalent, and they agree with the geometric description we gave in the previous subsection. While not very inspiring, Penrose's algebraic definition makes it simple to check if a given matrix is A+ . Once we compute a candidate for A+, we can verify our results by checking if our candidate satisfies Penrose's conditions. In addition, Penrose's definition leads to some simple and useful results as shown by the next few theorems. Theorem 6.3 (A+)+ = A++ = A Proof: The result follows by direct substitution into (615). Theorem 6.4 (A*)+ = (A+)* or A*+ = A+* .
CH6.DOC (616) (617)
Page 67 EE448/528 Version 1.0 John Stensby That is, the order of the + and * is not important! Proof: We claim that the pseudo inverse of A* is A + * . Let's use (615) to check this out! 3′) A*A+*A* = A*(AA+)* = ((AA+)A)* = (AA+A)* = A* so that 3) holds! 4′) A+*A*A+* = A+*(A+A)* = ((A+A)A+)* = (A+AA+)* = A+* so that 4) holds! 5′) A+*A* = (AA+)* = ((AA+)*)* = (A+*A* )* so A+*A* is Hermitian! So is A*A+* !! Theorem 6.5 Suppose A is Hermitian. Then A+ is Hermitian. Proof: (A+)* = (A*)+ = A+. Given matrices A and B, one might ask if (AB)+ = B+A+ in general. The answer is NO! It is not difficult to find some matrices that illustrate this. Hence, a well known and venerable property of inverses (i.e., that (AB)1 = B1A1) does not carry over to psuedo inverses. Computing the Pseudo Inverse A+ There are three cases that need to be discussed when one tries to find A+. The first case is when rank(A) < min(m,n) so that A does not have full rank. The second case is when m×n matrix A has full row rank (rank(A) = m). The third case is when A has full column rank (rank(A) = n). Case #1: r = Rank(A) < Min(m,n)
? In this case, K(A) contains nonzero vectors, and the system AX = b has an infinite number of solutions. For this case, a simple "pluginthevariables" formula for A+ does not exist. However, the pseudo inverse can be calculated by using the basic ideas illustrated by Fig. 6.3. r r Ymr be a basis of K(A*). From (611) and Fig. 6.3, we see r r A+ AX1 L AX r r r r r Y1 L Ym − r = A+ AX1 L A+ AX r r r = X1 L X r r r A+ Y1 L A+ Ym − r (618) r Suppose m×n matrix A has rank r. Let X1, X2, ... , Xr be a basis of R(A*) and Y1, Y2, ... , r r r r r r r 0 L 0 . Now, note that AX1, ..., AXr is a basis for R(A). Since Y1, Y2, ... , Ymr is a basis of K(A*) = r r r r r CH6.DOC Page 68 EE448/528 Version 1.0 John Stensby r r R(A)⊥, we see that m×m matrix AX1 L AX r we have r r A + = [X1 L X r r r r r 0 L 0] AX1 L AX r 1 24 4 3
m − r zero columns r r Y1 L Ym − r is nonsingular. From (618) r r −1 Y1 L Ym − r (619) While not simple, Equation (619) is "useable" in many cases, especially when m and n are small integers. Example % Pseudo Inverse Example % Enter 4x3 matrix A. A = [1 1 2; 0 2 2; 1 0 1; 1 0 1] A = 1 1 2 0 2 2 1 0 1 1 0 1 % Have MatLab Calculate an Orthonormal Basis of R(A*) X = orth(A') X = 0.3052 0.7573 0.5033 0.6430 0.8084 0.1144 % Have MatLab Calculate an Orthonormal Basis of K(A*) Y = null(A') Y = 0.7559 0.3780 0.3780 0.3780 0 0.0000 0.7071 0.7071 % Use (619) of class notes to calculate the pseudo inverse pseudo = [ X [0 0 0]' [0 0 0]' ]*inv([A*X Y]) pseudo = 0.1429 0.2381 0.2619 0.2619 0.0000 0.3333 0.1667 0.1667 0.1429 0.0952 0.0952 0.0952 % Same as that produced by MatLab's pinv function? pinv(A) ans = 0.1429 0.2381 0.2619 0.2619 0.0000 0.3333 0.1667 0.1667 0.1429 0.0952 0.0952 0.0952 % YES! YES! CH6.DOC Page 69 EE448/528 Version 1.0 John Stensby Case #2: r = Rank(A) = m (A has full row rank) For this case, we have m ≤ n, the columns of A may or may not be dependent, and AX = b may or may not have an infinite number of solutions. For this case, n×m matrix A* has full column rank and (619) becomes r r A + = A ∗ AA∗ e j −1 , (620) a simple formula. Case #3: Rank(A) = n (A has full column rank) For this case, we have n ≤ m, the columns of A are independent. For this case, the n×n matrix A*A is nonsingular (why?). On the left, we multiply the equation AX = b by A* to obtain A*AX = A*b, and this leads to r r X K⊥ = ( A∗A )−1 A∗ b r r r r (621) When A has full column rank, the pseudo inverse is A+ = (A*A)1A*, (622) a simple formula. Example 61 A = LM1 N0 0 1 1 0 OP Q r 2 b= 1 LM OP NQ Note that rank(A) = 2 and nullity(A) = 1. By inspection, the general solution is 2 1 r X = 1 +α 0 0 −1 LM OP MM PP NQ LM MM N OP PP Q
Page 610 CH6.DOC EE448/528 Version 1.0 John Stensby Now, find the minimum norm solution. We do this two ways. Since this problem is so simple, lets compute r 2 2 2 2 X 2 = (2 + α) + 1 + α . The minimum value of this is found by computing d r 2 X 2 = 2( 2 + α ) + 2α = 0 dα which yields α = 1. Hence, the minimum norm solution is 2 1 1 r X K⊥ = 1 − 1 0 = 1 , 0 −1 1 LM OP MM PP NQ LM MM N OP PP Q LM OP MM PP NQ (623) a vector that is orthogonal to K(A) as required (please check orthogonality). This same result can be computed by using (620) since, as outlined above, special case #2 applies here. First, we compute AA ∗ = LM2 0OP , ( AA∗ )−1 = LM1 / 2 0OP . N0 1 Q N 0 1Q Then, we use these results with (620) to obtain 1 0 r r r 1/ 2 0 X K⊥ = A + b = A ∗ ( AA∗ )−1 b = 0 1 0 1 1 0 LM MM N OP L PP MN Q OP LM2OP = LM1 0 Q N1Q MMN1 0 1 1 1 = 1 , 1 0 1 OP L O PP MN PQ Q LM OP MM PP NQ CH6.DOC Page 611 EE448/528 the same results as (623)! Version 1.0 John Stensby MatLab’s Backslash ( \ ) and Pinv Functions A unique solution of AX = b exists if b ∈ R(A) and nullity(A) = 0. It can be found by using MatLab's backslash ( \ ) function. The syntax is X = A\b; MatLab calls their backslash matrix left division. r r r Let r = rank(A) and nullity(A) = n  r ≥ 1. For if b ∈ R(A), the complete solution can be described in terms of n  r independent parameters, as shown by (57). These independent parameters can be chosen to yield a solution X that contains at least n  r zero components. Such a solution is generated by the MatLab syntax X = A\b. So, if a unique solution exists, MatLab's backslash operator finds it; otherwise, the backslash operator finds the solution that contains the r maximum number of zeros (in general, this maximumnumberofzeros solution will not be X K⊥ , the minimum norm solution). r MatLab can also find the optimum, minimum norm, solution X K⊥ . The MatLab syntax for this is X = pinv(A)*b. If a unique solution to AX = b exists, then you can find it using this syntax (but, for this case, the backslash operator is a better method). If AX = b has an infinite number of solutions, then X = pinv(A)*b finds the optimum, minimum norm solution. MatLab calculates the pseudo inverse by using the singular value decomposition of A, a method that we will discuss in Chapter 8. Example 62 (Continuation of Example 61) We apply MatLab to the problem given by Example 61. The MatLab diary file follows. % use the same A and b as was used in Example 61 A = [1 0 1;0 1 0]; b = [2;1]; % find the solution with the maximum number of zero components X = A\b X = 2 1 0 norm(X)
CH6.DOC Page 612 r r r r r r EE448/528 Version 1.0 John Stensby ans = 2.2361 % % find the optimum, minimum norm, solution Y = pinv(A)*b Y = 1.0000 1.0000 1.0000 % solution is the same as that found in example 61! norm(Y) ans = 1.7321 % the "minimum norm" solution has a smaller norm % then the "maximumnumberofzeros" solution. Problem #2: The Minimum Norm, LeastSquareError Problem (Or: What Can We Do With Inconsistent Linear Equations?) There are many practical problems that result, in one way or another, in an inconsistent system AX = b of equations (i.e., b ∉ R(A)). A solution does not exist in this case. However, we can "leastsquares fit" a vector X to an inconsistent system. That is, we can consider the leastr r r squares error problem of calculating an X that minimizes the error AX − b 2 (equivalent to r r 2 minimizing the square error AX − b 2 ). As should be obvious (or, with just a little thought!), such an X is not always unique. Hence, we find the minimum norm vector that minimizes the r r r r error AX − b 2 (often, the error vector AX  b is called the residual). As it turns out, this minimum norm solution to the leastsquares error problem is unique, and it is denoted as X = A+ b , where A+ is the pseudo inverse of A. Furthermore, with MatLab's pinv function, we can solve this minimumnorm, leastsquareerror problem (as it is called in the literature). However, before we jump too far into this problem, we must discuss orthogonal projections and projection operators. The Orthogonal Projection of One Vector on Another Let's start out with a simple, one dimensional, projection problem. Suppose we have two vectors, X1 and X2, and we want to find the "orthogonal projection", denoted here as Xp, of vector X2 onto vector X1. As depicted by Figure 64, the "orthogonal projection" is colinear with X1. And, the error vector X2  Xp is orthogonal to X1. r r r r r r r r r r r r r r r r CH6.DOC Page 613 EE448/528 Version 1.0 r X2 John Stensby r r X2 − X p r 0 r Xp r X1 r r Figure 64: Orthogonal projection of X 2 onto X1 We can find a formula for Xp. Since the error must be orthogonal to X1, we can write r r r r r r r r r r 0 = X 2 − X p , X1 = X 2 − αX1, X1 = X 2 , X1 − α X1, X1 r r (624) which leads to r r X 2 , X1 α= r 2 X1 2 (625) and r r r r X2 , X1 r r X1 X p = r 2 X1 = X 2 , r X1 2 X1 2 r X1 r . X1 2 (626) Hence, Xp is the component of X2 in the X1 direction; X2  Xp is the component of X2 in a direction that is perpendicular to X1. In the next section, these ideas are generalized to the projection of a vector on an arbitrary subspace. Orthogonal Projections Let W be a subspace of ndimensional vector space U (the dimensionality of W is not important in this discussion). An n×n matrix P is said to represent an orthogonal projection operator on W (more simply, P is said to be an orthogonal projection on W) if r r r r r r r CH6.DOC Page 614 EE448/528 a) R(P) = W b) P2 = P (i.e., P is idempotent) c) P* = P (i.e., P is Hermitian) Version 1.0 John Stensby (627) We can obtain some simple results from consideration of definition (627). First, from a) we have PX ∈ W for all X ∈ U. Secondly, from a) and b) we have PX = X for all X ∈ W. To see this, note that for X ∈ W = R(P) we have P(PX  X) = P2X  PX = PX  PX = 0 so that PX X ∈ K(P). r r r r r r r r r r r r r r r However, PX  X ∈ R(P) = W. r r The only vector that is in K(P) and R(P) simultaneously is 0. Hence, PX = X for each and every X ∈ W. The third observation from (627) is that a), b) and c) tell us that (I  P)X ∈ W⊥ for each X ∈ U. To see this, consider any Y ∈ W = R(P). Then, Y = PX2 for some X2 ∈ U. Now, take any X1 ∈ U and write r r r r r r r r ( I − P )X1, Y = (I − P )X1, PX 2 = P∗ ( I − P )X1, X 2 = P( I − P )X1, X2 r r r r r r = PX1 − P 2X1, X 2 = PX1 − PX1, X 2 =0 Hence, for all X1 ∈ U, we have (I  P)X1 orthogonal to W. Hence, X is decomposed into a part PX ∈ W and a part (I  P)X ∈ W⊥. Figure 6.5 depicts this important decomposition. The fourth observation from (627) is . (628) r r r r r r r r r r r r r r r r X ∈ W⊥ ⇒ PX = 0. r r r (629) CH6.DOC Page 615 EE448/528 Version 1.0 John Stensby r X r (I  P)X ∈ W ⊥ r 0 r PX Subspace W Figure 65: Orthogonal Projection of X onto a subspace W. X is r r decomposed into PX ∈ W and (I  P)X ∈ W⊥. This is obvious; it follows from the facts that r PX ∈ W r r r ( I − P )X = X − PX ∈ W ⊥ , (630) r r and if X ∈ W⊥, the only way (630) can be true is to have PX = 0 for all X ∈ W⊥. The fifth observation is that P plays a role in the unique direct sum decomposition of any X ∈ U. Recall r r r r r that U = W⊕W⊥, and any X ∈ U has the unique decomposition X = X W ⊕ X W ⊥ , where XW ∈ r r r r r W and X W ⊥ ∈ W ⊥ . Note that PX = XW and (I  P)X = X W ⊥ . Hence, if P is the orthogonal projection operator on W, then (I  P) is the orthogonal projection operator on W⊥. PX is the Optimal Approximation of X Of all vectors in W, PX is the "optimal" approximation of X ∈ U. Consider any Xδ ∈ W and any X ∈ U, and note that Z ≡ PX + Xδ ∈ W (think of Xδ ∈ W as being a perturbation of PX r r r r r r r ∈ W). We show that X − PX 2 ≤ X − {PX + Xδ } 2 for all X ∈ U and Xδ ∈ W. Consider r r r 2 r r r r r r X − {PX + Xδ } = ( X − PX ) − X δ , ( X − PX ) − X δ
2 r r r r r r r r r r r r r r r r r r r r r r r r r r = X − PX, X − PX − X δ , X − PX − X − PX, X δ + Xδ , Xδ . (631) CH6.DOC Page 616 EE448/528 Version 1.0 John Stensby r r r r r r However, the cross terms Xδ , X − PX and X − PX, Xδ are zero so that r r r 2 r r 2 r 2 X − {PX + Xδ } = X − PX + Xδ .
2 2 2 (632) From this, we conclude that r r 2 r r r 2 r r 2 r 2 X − PX ≤ X − {PX + Xδ } = X − PX + Xδ
2 2 2 2 (633) for all Xδ ∈ W and X ∈ U. Of all the vectors in subspace W, PX ∈ W comes “closest” (in the 2norm sense) to X ∈ U. Projection Operator on W is Unique The orthogonal projection on a subspace is unique. Assume, for the moment, that there are n×n matrices P1 and P2 with the properties given by (627). Use these to write r r r r 2 r∗ r ∗ r ∗ ∗ ( P1 − P2 )X 2 = X ( P1 − P2 ) ( P1 − P2 )X = P1X ( P1 − P2 )X − P2X ( P1 − P2 )X r r r r c h c h r r ∗ = P1X {P1 − I} − {P2 − I} X − c hb
r g c hb r r ∗ P2X {P1 − I} − {P2 − I} X g (634) However both P1X and P2X are orthogonal to both (P1  I)X and (P2  I)X. This leads to the conclusion that r ( P1 − P2 )X 2 = 0 r r r (635) for all X ∈ U, a result that requires P1 = P2. Hence, given subspace W, the orthogonal projection operator on W is unique, as originally claimed. r CH6.DOC Page 617 EE448/528 Version 1.0 John Stensby How to “Make” an Orthogonal Projection Operator Let v1, v2, ... , vk be n×1 orthonormal vectors that span kdimensional subspace W of ndimensional vector space U. Use these vectors as columns in the n×k matrix r r r Q ≡ v1 v 2 L v k . r r r (636) We claim that n×n matrix P = Q Q∗ (637) is the unique orthogonal projection onto subspace W. To show this, note that for any vector X ∈ U we have r LM v1∗ OP r∗ r Mv2 P r r v k M P X = v1 MM M PP r MNv∗k PQ r r LM v1∗ X OP r∗ r r Mv2 X P vk M MM M PPP r r MNv∗k XPQ r r r r r PX = Q Q∗ X = v1 v 2 L r v2 L (638) Now, PX is a linear combination of the orthonormal basis v1, v2, ... , vk. As X ranges over all of U, the vector PX ranges over all of W (since Pvj = vj, 1 ≤ j ≤ k, and v1, ... , vk span W). As a result, R(P) = W; this is the first requirement of (627). The second requirement of (627) is met by realizing that r r r r r r r r r r P2 = (QQ*)(QQ*) = Q I Q* = P, (639) since Q*Q is an k×k identity matrix I. Finally, the third requirement of (627) is obvious, and P = CH6.DOC Page 618 EE448/528 Version 1.0 John Stensby QQ* is the unique orthogonal projection on the subspace W. Special Case: One Dimensional Subspace W Consider the one dimensional case W = span(X1), where X1 has unit length. Then (637) r r∗ r r gives us n×n matrix P = X1 X1 . Let X2 ∉ W. Then the orthogonal projection of X2 onto W is r r r r∗ r r r∗r r∗r r r r r X p = PX 2 = ( X1 X1 )X 2 = X1 ( X1 X 2 ) = ( X1 X 2 )X1 = X2 , X1 X1 , r r (640) r∗r r where we have used the fact that X1 X 2 is a scalar. In light of the fact that X1 has unit length, (640) is the same as (626). Now that we are knowledgeable about projection operators, we can handle the important AX = b, b ∉ R(A), problem. The MinimumNorm, LeastSquaresError Problem At last, we are in a position to solve yet another colossal problem in linear algebraic equation theory. Namely, what to do with the AX = b problem when b ∉ R(A) so that no exact solution(s) exist. This type of problem occurs in applications where there are more equations (constraints) then there are unknowns to be solved for. In this case, the problem is said to be overdetermined. Such a problem is characterized by a matrix equation AX = b, where A is m×n with m > n, and b is m×1. Clearly, the rows of A are dependent. And, b may have been obtained by making imprecise measurements, so that rank(A) ≠ rank(A ¦ b), and no solution exists. r r r r r r r r r r r Thus, not knowing what is important in a model (i.e., which constraints are the most important and which can be thrown out), and an inability to precisely measure input b, can lead to an inconsistent set of equations. A "fix" for this problem is to throw out constraints (individual equations) until the modified system has a solution. However, as discussed above, it is not always clear how to accomplish this when all of the m constraints appear to be equally valid (ignorance keeps us from obtain a "better" model). Instead of throwing out constraints, it is sometimes better to do the best with what you
CH6.DOC Page 619 r EE448/528 Version 1.0 John Stensby have. Often, the best approach is to find an X that satisfies r r AX − b r r = min AZ − b . r
Z∈U 2 r 2 (641) That is, we try to get AX as close as possible to b. The problem of finding X that minimizes r r AX − b 2 is known as the least squares problem (since the 2norm is used). A solution to the least squares problem always exists. However, it is not unique, in general. To see this, let X minimize the norm of the residual; that is, let X satisfy (641) so that it is a solution to the least squares problem. Then, add to X any vector in K(A) to get yet another vector that minimizes the norm of the residual. It is common to add structure to the least squares problem in order to force a unique r r r r solution. We choose the smallest norm X that minimizes AX − b 2 . That is, we want the X that r r r 1) minimizes AX − b 2 , and 2) minimizes X 2 . This problem is called the minimumnorm, leastsquares problem. For all coordinate vectors b in V, it is guaranteed to have a unique solution (when b ∈ R(A), we have a "Problem #1"type problem  see the discussion on page 61). And, as shown below, from b in V back to the coordinate vector X in U, the mapping is the pseudo inverse of A. As before, we write X = A+ b for each b in V. The pseudo inverse solves the general optimization problem that is discussed in the paragraph containing Equation (61). r r r r r To verify that X = A+ b is the smallest norm X that minimizes AX − b 2 , let’s go about the process of minimizing the norm of the residual AX  b. First, recall from Fig. 51 that U = R(A*)⊕K(A) and V = R(A)⊕K(A*). Hence, X ∈ U and b ∈ V have the unique decompositions r r r r r r r r r r r r r r r r r r r r X = X R [ A∗ ] ⊕ X K[ A ] , RX r SX r T R [ A∗ ] K[ A ] ∈ R ( A∗ ) ∈ K[A ] U  V  W U  V  W
(642) r r r b = b R [ A ] ⊕ b K[ A ∗ ] , R  S  T r bR[A ] ∈ R ( A ) r . b K[ A ∗ ] ∈ K( A ∗ ) CH6.DOC Page 620 EE448/528 Now, we compute Version 1.0 John Stensby r r r r2 r r 2 AX − b = A{X R [ A∗ ] ⊕ X K[ A ]} {b R [ A ] ⊕ b K[ A ∗ ]}
2 2 r r r 2 = {AX R [ A∗ ] − b R [ A ]} − b K[ A ∗ ]
2 (643) r r r 2 2 = AX R [ A∗ ] − b R [ A ] + b K[ A ∗ ] .
2 2 r r The second line of this result follows from the fact AX K[ A ] = 0 . The third line follows from the r r r orthogonality of {AX R [ A∗ ] − b R [ A ]} and b ker[A ∗ ] . In minimizing (643), we have no control over r r r r b K[ A ∗ ] . However, how to choose X should be obvious. Select X = X R [ A∗ ] ∈ R(A*) = K(A)⊥ r r r which satisfies A X R [ A∗ ] = b R[ A ] . By doing this, we will minimize (643) and X 2 simultaneously. However, this optimum X is given by r r r r r X R [ A∗ ] = A+ b R [ A ] = A+ b R [ A ] ⊕ b K[ A ∗ ] = A+ b . r (644) From this, we conclude that the pseudo inverse solves the general optimization problem introduced by the paragraph containing Equation (61). Finally, when you use X = A+b, the norm r of the residual becomes b K[ A ∗ ] 2 ; folks, it just doesn’t get any smaller than this! Let’s give a geometric interpretation of our result. Given b ∈ V, the orthogonal r r projection of b on the range of A is b R[ A ] . Hence, when solving the general minimum norm, r r r leastsquare error problem, we first project b on R(A) to obtain b R[ A ] . Then we solve AX = r b R[ A ] for its minimum norm solution, a "Problem #1"type problem. Folks, this stuff is worth repeating and illustrating with a graphic! Again, to solve the general problem outlined on page 61, we r r 1) Orthogonally project b onto R(A) to get b R[ A ] , r r r 2) Solve AX = b R[ A ] for X ∈ R(A*) = K(A)⊥ (a “Problem #1” exercise). r r r CH6.DOC Page 621 EE448/528 Version 1.0
r John Stensby
b r r b − b R[A ] V ar ie ty Li ne ar r b R [A ] Co nt ain in g r b r X (a) K( A) r r Figure 66: a) b R[A ] is the orthogonal projection of b onto R(A). b) Find the minimum norm r r solution to AX = b R[A ] (this is a "Problem #1"type problem). This twostep procedure produces an optimum X that minimizes simultaneously! It is illustrated by Fig. 66. Application of the Pseudo Inverse: Least Squares Curve Fitting We want to pass a straight line through a set of n data points. We want to do this in such a manner that minimizes the squared error between the line and the data. Suppose we have the data points (xk, yk), 1 ≤ k ≤ n. As illustrated by Figure 67, we desire to fit the straight line r So lu tio ns of A r X = r X′ r X ′ is non  optimum r X is optimum (b) r r AX − b 2 and r X2 y = a0x + a1 (645) to the data. We want to find the coefficients a0 and a1 that minimize CH6.DOC Page 622 EE448/528
yaxis Version 1.0 John Stensby x y=
x x x x + a 1x a0
x xaxis x denotes supplied data point Figure 67: Leastsquares fit a line to a data set. Error 2 =
k =1 ∑ [ yk − ( a1xk + a0 )]2 . n (646) This problem can be formulated as an overdetermined linear system, and it can be solved by using the pseudo inverse operator. On the straight line, define ~k, 1 ≤ k ≤ n, to be the ordinate y values that correspond to the xk, 1 ≤ k ≤ n. That is, define ~k = a1xk + a0, 1 ≤ k ≤ n, as the line y points. Now, we adopt the vector notation r T y y y YL ≡ ~1 ~2 L ~n r T Yd ≡ y1 y2 L y n , (647) where YL denotes “line values”, and Yd denotes “ydata”. Now, denote the “xdata” n×2 matrix as r r LM1 1 Xd = M MMM N1 x1 x2 . M xn OP PP PQ (648) Finally, the “line equation” is
CH6.DOC Page 623 EE448/528 Xd r LMa0 OP = YL , N a1 Q Version 1.0 John Stensby (649) which defines a straight line with slope a1. Our goal is to select [a0 a1]T to minimize r a0 r 2 r r 2 − Yd . YL − Yd 2 = Xd a1
2 LM OP N Q (650) Unless all of the data lay on the line, the system r a0 r = Yd Xd a1 LM OP N Q (651) is inconsistent, so we want a “least squares fit” to minimize (650). From our previous work, we know the answer is r LMa0 OP = Xd+ Yd , N a1 Q (652) where Xd+ is the pseudo inverse of Xd. Using (622), we write r LMa0 OP = dX T X i−1XdT Yd , N a1 Q d d (653) T Xd is n×2 with rank 2. Hence X d X d is a 2×2 positive definite symmetric (and nonsingular) matrix. Example Consider the 5 points
CH6.DOC Page 624 EE448/528 k 1 2 3 4 5 xk 0 1 2 3 5 yk 0 1.4 2.2 3.5 4.4 Version 1.0 John Stensby LM1 MM1 Xd = 1 MM1 MN1 0 1 2 , 3 5 OP PP PP PQ LM 0 OP . MM14 PP r Yd = 2.2 MM35PP . MN4.4PQ MatLab yields the pseudo inverse X d+ = LM .5270 N−.1486 .3784 .2297 .0811 −.2162 −.0811 −.0135 .0541 .1892 OP Q so that the coefficients a0 and a1 are r LMa0 OP = Xd+ Yd = LM.3676OP N a1 Q N.8784Q The following MatLab program plots the points and the straight line approximation. % EX7_3.M Leastsquares curve fit with a line using % \ operator and polyfit. The results are displayed % and plotted. x=[0 1 2 3 5]; % Define the data points y=[0 1.4 2.2 3.5 4.4]; A1=[1 1 1 1 1 ]'; % Least squares matrix A=[A1 x']; Als=A'*A;
CH6.DOC Page 625 EE448/528 Version 1.0 John Stensby bls=A'*y'; % Compute least squares fit Xlsq1=Als\bls; Xlsq2=polyfit(x,y,1); f1=polyval(Xlsq2,x); error=yf1; disp(' x y f1 yf1') table=[x' y' f1' error']; disp(table) fprintf('Strike a key for the plot\n') pause % Plot clf plot(x,y,'xr',x,f1,'') axis([1 6 1 6]) title('Least Squares line Figure 7.3') xlabel('x') ylabel('y') The output of the program follows. x 0 1.0000 2.0000 3.0000 5.0000 y 0 1.4000 2.2000 3.5000 4.4000 f1 yf1 Least Squares line Figure 7.3 6 5 4 3 y 2 1 0 1 1 0 1 2 x 3 4 5 0.3676 0.3676 1.2459 2.1243 3.0027 0.1541 0.0757 0.4973 4.7595 0.3595 Strike a key for the plot NPort Resistive Networks: Application of Pseudo Inverse Consider the nport electrical network.
+ V1 + V2  We treat the nport as a black box that is
+ Vn  I1 I2 In ... ... Black Box nPort Resistive Network Figure 68: nport network described by terminal behavior.
CH6.DOC Page 626 EE448/528 Version 1.0 John Stensby characterized by its terminal behavior. Suppose there are only resistors and linearly dependent sources in the black box. Then we can characterize the network using complexvalued voltages and currents (i.e., “phasors”) by a set of equations of the form V1 = z11I1 + z12I2 + L + z1n I n V2 = z21I1 + z22I2 + L + z2n I n . M M M M (654) Vn = z n1I1 + z n 2I2 + L + z nn I n This is equivalent to writing V= ZI where V = [V1 V2 ... Vn]T, I = [I1 I2 ... In]T and r r r r LM z11 z21 Z=M MM M Nzn1 z12 L z1n z22 L z2n M L M z n 2 L z nn OP PP PQ (655) is the n×n impedance matrix. The zik, 1 ≤ i,k ≤ n, are known as open circuited impedance parameters (they are realvalued since we are dealing with resistive networks). The network is said to be reciprocal if input port i can be interchanged with output port j and the relationship between port voltages and currents remain unchanged. That is, the network is reciprocal if Z is Hermitian (Z* = ZT since the matrix is realvalued). Also, for a reciprocal resistive nport, it is possible to show that Hermitian Z is positive semidefinite. Suppose we connect two reciprocal nports in parallel to form a new reciprocal nport. Given impedance matrices Z1 and Z2 for the two nports, we want to find the impedance matrix Zequ of the parallel combination. Before doing this, we must give some preliminary material. Theorem 66 Given n×n impedance matrices Z1 and Z2 of two resistive, reciprocal n port networks. Then CH6.DOC Page 627 EE448/528 R ( Z1 ) + R ( Z2 ) = R ( Z1 + Z2 ) . Version 1.0 John Stensby (656) Proof: As discussed above, we know that Z1 and Z2 are Hermitian and nonnegative definite. We prove this theorem by showing 1) R(Z1 + Z2) ⊂ R(Z1) + R(Z2), and 2) R(Z1) ⊂ R(Z1 + Z2) and R(Z2) ⊂ R(Z1 + Z2) so that R(Z1) + R(Z2) ⊂ R(Z1 + Z2). We show #1. Consider Y ∈ R(Z1 + Z2). Then there exists an X such that Y = (Z1 + Z2)X = Z1X + Z2X. But Z1X ∈ R(Z1) and Z2X ∈ R(Z2). Hence, Y ∈ R(Z1) + R(Z2), and this implies that R(Z1 + Z2) ⊂ R(Z1) + R(Z2). We show #2. Let Y ∈ K(Z1+Z2) so that (Z1+Z2)Y = 0. Then r r r r r r r r 0 = Y,( Z1 + Z2 )Y = ( Z1 + Z2 )Y, Y = Z1Y, Y + Z2Y, Y . r r r r r r r r r r r r (657) Now, Z1 and Z2 are nonnegative definite so that both inner products on the righthand side of (657) must be nonnegative. Hence, we must have r r r r Z1Y, Y = Z2Y, Y = 0 . (658) Since Z1 and Z2 are Hermitian, Equation (658) can be true if and only if Z1Y = 0 and Z2Y = 0; that is, Y ∈ K(Z1) and Y ∈ K(Z2). Hence, we have shown that K(Z1 + Z2 ) ⊂ K(Z1 ), K(Z1 + Z2 ) ⊂ K(Z2 ) (659) R(Z1 ) ⊂ R(Z1 + Z2 ), R(Z2 ) ⊂ R(Z1 + Z2 ) r r r r r r Finally, the second set relationship in (659) implies R(Z1) + R(Z2) ⊂ R(Z1+Z2), and the theorem is proved.♥ CH6.DOC Page 628 EE448/528 Parallel Sum of Matrices Version 1.0 John Stensby Let N1 and N2 be reciprocal nport resistive networks that are described by n×n matrices Z1 and Z2, respectively. The parallel sum of Z1 and Z2 is denoted as Z1:Z2, and it is defined as Z1:Z2 = Z1 (Z1 + Z2)+ Z2 , (660) an n×n matrix. Theorem 67 Let Z1 and Z2 be impedance matrices of reciprocal nport resistive networks. Then, we have Z1:Z2 = Z1 (Z1 + Z2)+ Z2 = Z2 (Z1 + Z2)+ Z1 = Z2:Z1 (661) Proof: By inspection, the matrix (Z1 + Z2)(Z1 + Z2)+(Z1 + Z2) is symmetric. Multiply out this matrix to obtain (Z1 + Z2)(Z1 + Z2)+(Z1 + Z2) = [Z1(Z1 + Z2)+Z1 + Z2(Z1 + Z2)+Z2] + [Z1(Z1 + Z2)+Z2 + Z2(Z1 + Z2)+Z1] . (662) The lefthandside of this result is symmetric. Hence, the righthandside must be symmetric as well. This requires that both Z1(Z1 + Z2)+Z2 and Z2(Z1 + Z2)+Z1 be symmetric, so that Z1 Z1 + Z2 b g+ Z2 = dZ1( Z1 + Z2 )+ Z2 iT = ZTe( Z1 + Z2 )+ jT Z1T = ZTe( Z1 + Z2 )T j+ Z1T 2 2
(663) = Z2 Z1 + Z2 b g+ Z1 as claimed. Hence, we have Z1:Z2 = Z2:Z1 and the parallel sum of symmetric matrices is a CH6.DOC Page 629 EE448/528 symmetric matrix. Version 1.0 John Stensby Theorem 68: Parallel nPort Resistive Networks Suppose that N1 and N2 are two reciprocal nport resistive networks that are described by impedance matrix Z1 and Z2, respectively. Then the parallel connection of N1 and N2 is described by the impedance matrix Zequ = Z1 : Z2 = Z1(Z1 + Z2)+Z2 , (664) the parallel sum of matrices Z1 and Z2. Proof: Let I 1, I 2 and I = I 1 + I 2 be the current vectors entering networks N1, N2 and the parallel combination of N1 and N2, respectively. Likewise, let V be the voltage vector that is supplied to the parallel connection of N1 and N2. To prove this theorem, we must show that r r r r r r V=Z1(Z1 + Z2)+Z2 I = (Z1 : Z2)I . r r r (665) Since V is supplied to the parallel combination, we have r V = Z1 I 1 = Z2 I 2 . r r r (666) Use this last equation to write r r r r r Z1I = Z1( I1 + I2 ) = V + Z1I2 r r r r r Z2 I = Z2 ( I1 + I2 ) = V + Z2 I1 . (667) Now, multiply these last two equations by (Z1 + Z2)+ to obtain CH6.DOC Page 630 EE448/528 Version 1.0 John Stensby r r r ( Z1 + Z2 )+ Z1I = ( Z1 + Z2 )+ V + ( Z1 + Z2 )+ Z1I2 r r r ( Z1 + Z2 ) Z2 I = ( Z1 + Z2 )+ V + ( Z1 + Z2 )+ Z2 I1
+ . (668) On the left, multiply the first of these by Z2 and the second by Z1 to obtain (using the parallel sum notation) r r r ( Z2: Z1 ) I = Z2 ( Z1 + Z2 )+ V + ( Z2: Z1 ) I2 r r r ( Z1: Z2 ) I = Z1( Z1 + Z2 )+ V + ( Z1: Z2 ) I1 . (669) However, we know that Z1:Z2 = Z2:Z1, I = I 1 + I 2, and Z1+Z2 is symmetric. Hence, when we add the two equation (669) we get r r ( Z1: Z2 ) I = ( Z1 + Z2 )( Z1 + Z2 )+ V r = PV where , (670) r r r P ≡ (Z1 + Z2)(Z1 + Z2)+ (671) is an orthogonal projection operator onto the range of Z1 + Z2. Since V = Z1 I 1 = Z2 I 2, we know that V ∈ R(Z1) and V ∈ R(Z2). However, by Theorem 66, we know that R(Z1) ⊂ R(Z1 + Z2) and R(Z2) ⊂ R(Z1 + Z2). Hence, we have V ∈ R(Z1 + Z2) so that PV = V, where P is the orthogonal projection operator (671). Hence, from (670), we conclude that r r r ( Z1: Z2 ) I = P V = V , r r r r r r r r (672) so the parallel sum Z1:Z2 is the impedance matrix for the parallel combination of N1 and N2.♥ CH6.DOC Page 631 EE448/528 Example Version 1.0 John Stensby •
2 3 R1 R2 1 • • • Consider the 3port networks shown to the right (1 to 1', 2 to 2' and 3 to 3' are the ports; voltages are sensed positive, and currents are defined as entering, the unprimed terminals 1, 2 and 3). For 1 ≤ i, j ≤ 3, the entries in the impedance matrix are V zij ≡ i , Ij • • 1' 2' 3' where Vi is the voltage across the ith port (voltage is signed positive at the unprimed port terminal), and Ij is the current entering the jth port (current defined as entering unprimed port terminal). Elementary circuit theory can be used to obtain LMR1 Z= 0 MMR N1 0 R2 R2 R1 R2 R1 + R 2 OP PP Q as the impedance matrix for this 3port. Connect two of these 3port networks in parallel to form the network depicted to the right. The impedance matrices are
3 • • • • 3 1Ω 1 • • • • • •
Page 632 2 1' 3' 1 LM1 Z1 = 0 MM1 N 1 MML0 Z2 = MN1
CH6.DOC 0 1 2 2 2 3 0 1 1 OP PP Q , 1O 1P P 2P Q • • • 2' 3 2Ω •
1' 1 2 • • 2' 3' 1Ω • • 2 1Ω 1' 2' 3' EE448/528 Version 1.0 John Stensby both of which are singular. By Theorem 68, the impedance matrix for the parallel combination is Zequ = Z1: Z2 = Z1( Z1 + Z2 )+ Z2 LM1 = M0 MN1 0 1 2 2 2 3 OP F LM1 0 PP GG MM1 Q HN 0 1 1 0 1 2 2 + 0 1 1 2 3 1 1 2 PPO PQ LM MM N OPI + LMML1 PPJJ MMMM0 QK NN1 0 1 1 1 1 2 OPOP LM1 / 2 PPPP = MM1 02 QQ N / 0 1/ 2 . 2/3 2/3 2/3 7/6 OP PP Q CH6.DOC Page 633 ...
View
Full Document
 Fall '08
 Chandrasekara
 Linear Algebra, John Stensby

Click to edit the document details