EE448/528 Version 1.0 John Stensby

Chapter 4: Matrix Norms
The analysis of matrix-based algorithms often requires use of matrix norms. These algorithms need a way to quantify the "size" of a matrix or the "distance" between two matrices. For example, suppose an algorithm only works well with full-rank, n×n matrices, and it produces inaccurate results when supplied with a nearly rank-deficient matrix. Obviously, the concept of ε-rank (also known as numerical rank), defined by
$$\operatorname{rank}(A,\varepsilon) = \min_{\|A-B\| \le \varepsilon} \operatorname{rank}(B), \qquad (4\text{-}1)$$

is of interest here. All matrices B that are "within" ε of A are examined when computing the ε-rank of A.
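A standard fact (not proved in these notes) is that, in the 2-norm, the ε-rank equals the number of singular values of A that exceed ε. For a diagonal matrix the singular values are simply the magnitudes of the diagonal entries, so the idea can be sketched in a few lines of Python (an illustration, not part of the original notes):

```python
def eps_rank_diagonal(diag, eps):
    """epsilon-rank (2-norm) of diag(d1, ..., dn): count entries larger than eps."""
    return sum(1 for d in diag if abs(d) > eps)

# diag(5, 0.01, 3) is "within" eps = 0.1 of the rank-2 matrix diag(5, 0, 3),
# so its 0.1-rank is 2 even though its exact rank is 3.
```

For a general matrix, computing the ε-rank requires the full set of singular values, which MatLab provides via svd(A).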
We define a matrix norm in terms of a given vector norm; in our work, we use only the p vector norm, denoted as $\|X\|_p$. Let A be an m×n matrix, and define

$$\|A\|_p = \sup_{X \ne 0} \frac{\|AX\|_p}{\|X\|_p}, \qquad (4\text{-}2)$$

where "sup" stands for supremum, also known as least upper bound. Note that we use the same $\|\cdot\|_p$ notation for both vector and matrix norms. However, the meaning should be clear from context.
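The supremum in (4-2) can be explored numerically: evaluating the ratio $\|AX\|_p/\|X\|_p$ at many random vectors X gives a lower bound on $\|A\|_p$. A small pure-Python sketch for p = 1 (an illustration, not part of the original notes):

```python
import random

def vec_norm_1(x):
    """Vector 1-norm: sum of component magnitudes."""
    return sum(abs(v) for v in x)

def mat_vec(A, x):
    """Matrix-vector product for A stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def induced_norm_1_estimate(A, trials=5000, seed=1):
    """Sample ||Ax||_1 / ||x||_1; the best sample is a lower bound on ||A||_1."""
    rng = random.Random(seed)
    n = len(A[0])
    best = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        nx = vec_norm_1(x)
        if nx > 0.0:
            best = max(best, vec_norm_1(mat_vec(A, x)) / nx)
    return best

A = [[1.0, -2.0], [3.0, 4.0]]
# The exact value (shown later in this chapter to be the maximum column sum)
# is 6; the sampled estimate approaches 6 from below.
```

Random sampling never exceeds the true norm, which is why the derivations later in the chapter look for a specific maximizing vector instead.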
Since the matrix norm is defined in terms of the vector norm, we say that the matrix norm is subordinate to the vector norm. Also, we say that the matrix norm is induced by the vector norm.
Now, since $\|X\|_p$ is a scalar, we have

$$\|A\|_p = \sup_{X \ne 0} \frac{\|AX\|_p}{\|X\|_p} = \sup_{X \ne 0} \left\| A \frac{X}{\|X\|_p} \right\|_p. \qquad (4\text{-}3)$$
In (4-3), note that $X/\|X\|_p$ has unit length; for this reason, we can express the norm of A in terms of a supremum over all vectors of unit length,

$$\|A\|_p = \sup_{\|X\|_p = 1} \|AX\|_p. \qquad (4\text{-}4)$$

That is, $\|A\|_p$ is the supremum of $\|AX\|_p$ on the unit ball $\|X\|_p = 1$.
Careful consideration of (4-2) reveals that

$$\|AX\|_p \le \|A\|_p \|X\|_p \qquad (4\text{-}5)$$

for all X. However, $\|AX\|_p$ is a continuous function of X, and the unit ball $\|X\|_p = 1$ is closed and bounded (in real analysis texts, such a set is said to be compact). Now, on a closed and bounded set, a continuous function always achieves its maximum and minimum values. Hence, in the definition of the matrix norm, we can replace the "sup" with "max" and write
$$\|A\|_p = \max_{X \ne 0} \frac{\|AX\|_p}{\|X\|_p} = \max_{\|X\|_p = 1} \|AX\|_p. \qquad (4\text{-}6)$$

When computing the norm of A, the definition is used as a starting point. The process has two steps.
1) Find a "candidate" for the norm, call it K for now, that satisfies $\|AX\|_p \le K\|X\|_p$ for all X.

2) Find at least one nonzero $X_0$ for which $\|AX_0\|_p = K\|X_0\|_p$.

Then, you have your norm: set $\|A\|_p = K$.

MatLab's Matrix Norm Functions
From an application standpoint, the 1-norm, 2-norm and the ∞-norm are among the most important; MatLab computes these matrix norms. In MatLab, the 1-norm, 2-norm and ∞-norm are invoked by the statements norm(A,1), norm(A,2), and norm(A,inf), respectively.
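For readers following along without MatLab, the 1-norm and ∞-norm are simple enough to compute directly; a pure-Python sketch (an illustration, not part of the original notes) using the column-sum and row-sum formulas derived below:

```python
def mat_norm_1(A):
    """Matrix 1-norm: maximum absolute column sum (what norm(A,1) returns)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_norm_inf(A):
    """Matrix infinity-norm: maximum absolute row sum (what norm(A,inf) returns)."""
    return max(sum(abs(a) for a in row) for row in A)

A = [[1.0, 1.0], [2.0, 1.0]]   # the example matrix used later in this chapter
# mat_norm_1(A)  -> 3.0  (first column: |1| + |2|)
# mat_norm_inf(A) -> 3.0  (second row:  |2| + |1|)
```

The 2-norm is the one norm in this trio that cannot be read off the entries directly; it requires an eigenvalue computation, as developed later in the chapter.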
The 2-norm is the default in MatLab: the statement norm(A) is interpreted as norm(A,2). Since the 2-norm is used in the majority of applications, we will adopt it as our default. In what follows, an "undesignated" norm $\|A\|$ is to be interpreted as the 2-norm $\|A\|_2$.

The Matrix 1-Norm
Recall that the vector 1-norm is given by

$$\|X\|_1 = \sum_{i=1}^{n} |x_i|. \qquad (4\text{-}7)$$

Subordinate to the vector 1-norm is the matrix 1-norm

$$\|A\|_1 = \max_{j} \left( \sum_{i} |a_{ij}| \right). \qquad (4\text{-}8)$$

That is, the matrix 1-norm is the maximum of the column sums. To see this, let m×n matrix A
be represented in the column format

$$A = [A_1 \;\; A_2 \;\; \cdots \;\; A_n]. \qquad (4\text{-}9)$$

Then we can write

$$AX = [A_1 \;\; A_2 \;\; \cdots \;\; A_n]X = \sum_{k=1}^{n} A_k x_k, \qquad (4\text{-}10)$$

where $x_k$, 1 ≤ k ≤ n, are the components of arbitrary vector X. The triangle inequality and
standard analysis applied to the norm of (4-10) yields

$$\|AX\|_1 = \left\| \sum_{k=1}^{n} A_k x_k \right\|_1 \le \sum_{k=1}^{n} |x_k| \, \|A_k\|_1 \le \left( \max_{j} \|A_j\|_1 \right) \sum_{k=1}^{n} |x_k| = \left( \max_{j} \|A_j\|_1 \right) \|X\|_1. \qquad (4\text{-}11)$$
With the development of (4-11), we have completed step #1 for computing the matrix 1-norm. That is, we have found a constant
$$K = \max_{j} \|A_j\|_1$$

such that $\|AX\|_1 \le K\|X\|_1$ for all X. Step #2 requires us to find at least one vector for which we have equality in (4-11). But this is easy; to maximize $\|AX\|_1$, it is natural to select the $X_0$ that puts "all of the allowable weight" on the component that will "pull out" the maximum column sum.
That is, the "optimum" vector is

$$X_0 = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0]^T \quad (\text{1 in the } j\text{th position}),$$

where j is the index that satisfies

$$\|A_j\|_1 = \max_{k} \|A_k\|_1$$

(the sum of the magnitudes in the jth column is equal to, or larger than, the sum of the magnitudes in any column). When $X_0$ is used, we have equality in (4-11), and we have completed step #2, so (4-8) is the matrix 1-norm.

The Matrix ∞-Norm
Recall that the vector ∞-norm is given by

$$\|X\|_\infty = \max_{k} |x_k|, \qquad (4\text{-}12)$$

the vector's largest component. Subordinate to the vector ∞-norm is the matrix ∞-norm

$$\|A\|_\infty = \max_{i} \left( \sum_{j} |a_{ij}| \right). \qquad (4\text{-}13)$$

That is, the matrix ∞-norm is the maximum of the row sums. To see this, let A be an arbitrary
m×n matrix and compute

$$\|AX\|_\infty = \left\| \begin{bmatrix} \sum_k a_{1k} x_k \\ \sum_k a_{2k} x_k \\ \vdots \\ \sum_k a_{mk} x_k \end{bmatrix} \right\|_\infty = \max_i \left| \sum_k a_{ik} x_k \right| \le \max_i \left( \sum_k |a_{ik} x_k| \right) \le \max_i \left( \sum_k |a_{ik}| \right) \max_k |x_k| \qquad (4\text{-}14)$$

$$= \max_i \left( \sum_k |a_{ik}| \right) \|X\|_\infty.$$

We have completed the first step. We have found a constant

$$K = \max_i \left( \sum_k |a_{ik}| \right) \qquad (4\text{-}15)$$

for which $\|AX\|_\infty \le K\|X\|_\infty$ for all X. Step #2 requires that we find a nonzero $X_0$ for which equality holds in (4-14) and (4-15). A close examination of these formulas leads to the conclusion that equality prevails if $X_0$ is
defined to have the components

$$x_k = \begin{cases} \dfrac{\overline{a_{ik}}}{\,|a_{ik}|\,}, & a_{ik} \ne 0, \\[4pt] 1, & a_{ik} = 0, \end{cases} \qquad 1 \le k \le n, \qquad (4\text{-}16)$$

(the overbar denotes complex conjugate) where i is the index for the maximum row sum. That is, in (4-16), use the index i for which

$$\sum_k |a_{ik}| \ge \sum_k |a_{jk}|, \quad j \ne i. \qquad (4\text{-}17)$$

Hence, (4-13) is the matrix ∞-norm as claimed.

The Matrix 2-Norm
Recall that the vector 2-norm is given by

$$\|X\|_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2} = \sqrt{\langle X, X \rangle}. \qquad (4\text{-}18)$$

Subordinate to the vector 2-norm is the matrix 2-norm

$$\|A\|_2 = \sqrt{\text{largest eigenvalue of } A^*A}. \qquad (4\text{-}19)$$

Due to this connection with eigenvalues, the matrix 2-norm is called the spectral norm.
To see (4-19) for an arbitrary m×n matrix A, note that A*A is n×n and Hermitian. By Theorem 4.2.1 (see Appendix 4.1), the eigenvalues of A*A are real-valued. Also, A*A is at least positive semi-definite since X*(A*A)X = (AX)*(AX) ≥ 0 for all X. Hence, the eigenvalues of A*A are both real-valued and non-negative; denote them as

$$\sigma_1^2 \ge \sigma_2^2 \ge \sigma_3^2 \ge \cdots \ge \sigma_n^2 \ge 0. \qquad (4\text{-}20)$$

Note that these eigenvalues are arranged according to size with $\sigma_1^2$ being the largest. The square roots $\sigma_k$ of these eigenvalues are known as the singular values of matrix A.
Corresponding to these eigenvalues are n orthonormal (hence, independent) eigenvectors $U_1, U_2, \ldots, U_n$ with

$$(A^*A)U_k = \sigma_k^2 U_k, \quad 1 \le k \le n. \qquad (4\text{-}21)$$

The n eigenvectors form the columns of a unitary n×n matrix U that diagonalizes matrix A*A under similarity (matrix U*(A*A)U is diagonal with eigenvalues (4-20) on the diagonal).

Since the n eigenvectors $U_1, U_2, \ldots, U_n$ are independent, they can be used as a basis, and vector X can be expressed as

$$X = \sum_{k=1}^{n} c_k U_k, \qquad (4\text{-}22)$$

where the $c_k$ are X-dependent constants. Multiply X, in the form of (4-22), by A*A to obtain

$$A^*AX = A^*A \sum_{k=1}^{n} c_k U_k = \sum_{k=1}^{n} c_k \sigma_k^2 U_k, \qquad (4\text{-}23)$$

which leads to
$$\|AX\|_2^2 = (AX)^*AX = X^*(A^*AX) = \left( \sum_{j=1}^{n} c_j^* U_j^* \right)\left( \sum_{k=1}^{n} c_k \sigma_k^2 U_k \right) = \sum_{k=1}^{n} |c_k|^2 \sigma_k^2 \le \sigma_1^2 \left( \sum_{k=1}^{n} |c_k|^2 \right) = \sigma_1^2 \|X\|_2^2 \qquad (4\text{-}24)$$

for arbitrary X. Hence, we have completed step #1: we found a constant $K = \sigma_1$ such that
$\|AX\|_2 \le K\|X\|_2$ for all X. Step #2 requires us to find at least one vector $X_0$ for which equality holds; that is, we must find an $X_0$ with the property that $\|AX_0\|_2 = K\|X_0\|_2$. But, it is obvious that $X_0 = U_1$, the unit-length eigenvector associated with eigenvalue $\sigma_1^2$, will work. Hence, the matrix 2-norm is given by $\|A\|_2 = \sqrt{\sigma_1^2}$, the square root of the largest eigenvalue of A*A.

The 2-norm is the default in MatLab. Also, it is the default here. From now on, unless specified otherwise, the 2-norm is assumed: $\|A\|$ means $\|A\|_2$.
N2 1Q CH4.DOC Page 48 EE448/528 Version 1.0 John Stensby r
r
Find A 2 , and find the unitlength vector X0 that maximizes AX . First, compute the product
2
53
.
A ∗A = A TA =
32 LM OP
NQ The eigenvalues of this matrix are σ12 = 6.8541 and σ22 = .1459; note that ATA is positive definite
2
symmetric. The 2norm of A is A 2 = σ1 = 2.618 , the square root of the largest eigenvalue of ATA. r r The corresponding eigenvectors are U1 = [.8507 .5257] and U2 = [.5257 .8507]. They r r are columns in an orthogonal matrix U = [U1 ¦ U2]; note that UTU = I or UT = U1. Furthermore,
matrix U diagonalizes ATA U T LMσ12
( A A )U =
NM 0
T OP L
QP MN 6.8541
0
0
=
.1459
0
σ2
2 r
The unitlength vector that maximizes AX
r
max AX
r
X =1 2 OP
Q
r 2 r is X0 = U1, and 2
= σ 1 = 6.8541 pNorm of Matrix Product
For m×n matrix A we know that

$$\|AX\|_p \le \|A\|_p \|X\|_p \qquad (4\text{-}25)$$

for all X. Now, consider m×n matrix A and n×q matrix B. The product AB is m×q. For q-dimensional X, we have

$$\|ABX\|_p = \|A(BX)\|_p \le \|A\|_p \|BX\|_p \le \|A\|_p \|B\|_p \|X\|_p \qquad (4\text{-}26)$$

by applying (4-25) twice in a row. Hence, we have

$$\frac{\|ABX\|_p}{\|X\|_p} \le \|A\|_p \|B\|_p \qquad (4\text{-}27)$$

for all nonzero X. As a result, it follows that

$$\|AB\|_p \le \|A\|_p \|B\|_p, \qquad (4\text{-}28)$$

a useful, simple result.

2-Norm Bound
Let A be an m×n matrix with elements $a_{ij}$, 1 ≤ i ≤ m, 1 ≤ j ≤ n. Let

$$\max_{i,j} |a_{ij}|$$

denote the magnitude of the largest element in A. Let $(i_0, j_0)$ be the indices of the largest element, so that

$$|a_{i_0 j_0}| \ge |a_{ij}| \qquad (4\text{-}29)$$

for all i, j.

Theorem 4.1 (2-norm bound):
$$\max_{i,j} |a_{ij}| \;\le\; \|A\|_2 \;\le\; \sqrt{mn}\, \max_{i,j} |a_{ij}| \qquad (4\text{-}30)$$

Proof: Note that

$$\|A\|_2 = \max_{\|X\|_2=1} \|AX\|_2, \qquad \|AX\|_2^2 = \left| \sum_k a_{1k} x_k \right|^2 + \left| \sum_k a_{2k} x_k \right|^2 + \cdots + \left| \sum_k a_{mk} x_k \right|^2. \qquad (4\text{-}31)$$

Define $X_0$ as the vector with 1 in the $j_0$ position and zero elsewhere (see (4-29) for the definition of $i_0$ and $j_0$). Since

$$\|AX_0\|_2 \le \max_{\|X\|_2=1} \|AX\|_2 = \|A\|_2, \qquad (4\text{-}32)$$

we have

$$|a_{i_0 j_0}| \le \sqrt{\sum_k |a_{k j_0}|^2} \le \max_{\|X\|_2=1} \|AX\|_2 = \|A\|_2. \qquad (4\text{-}33)$$

Hence, we have

$$\max_{i,j} |a_{ij}| \le \|A\|_2 \qquad (4\text{-}34)$$

as claimed. Now, we show the remainder of the 2-norm bound. Observe that
$$\|A\|_2^2 = \max_{\|X\|_2=1} \left[ \left| \sum_k a_{1k} x_k \right|^2 + \left| \sum_k a_{2k} x_k \right|^2 + \cdots + \left| \sum_k a_{mk} x_k \right|^2 \right]$$

$$\le \max_{\|X\|_2=1} \left[ \left( \sum_k |a_{1k} x_k| \right)^2 + \left( \sum_k |a_{2k} x_k| \right)^2 + \cdots + \left( \sum_k |a_{mk} x_k| \right)^2 \right]$$

$$\le \sum_k |a_{1k}|^2 + \sum_k |a_{2k}|^2 + \cdots + \sum_k |a_{mk}|^2 \le mn \max_{i,j} |a_{ij}|^2 \qquad (4\text{-}35)$$

(the third step uses the Cauchy–Schwarz inequality together with $\|X\|_2 = 1$, and the last step bounds each of the mn elements by the largest one). When combined with (4-34), we have

$$\max_{i,j} |a_{ij}| \le \|A\|_2 \le \sqrt{mn}\, \max_{i,j} |a_{ij}|.$$

Triangle Inequality for the p-Norm
Recall the triangle inequality for real numbers: $|\alpha + \beta| \le |\alpha| + |\beta|$. A similar result is valid for the matrix p-norm.

Theorem 4.2 (Triangle Inequality for the matrix p-norm)

Let A and B be m×n matrices. Then

$$\|A + B\|_p \le \|A\|_p + \|B\|_p. \qquad (4\text{-}36)$$

Application of Matrix Norm: Inverse of Matrices "Close" to a Nonsingular Matrix
Let A be an n×n nonsingular matrix. Can we state conditions on the "size" (i.e., norm) of n×n matrix E which guarantee that A + E is nonsingular? Before developing these conditions, we derive a closely related result, the geometric series for matrices. Recall the geometric series

$$\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k, \quad |x| < 1, \qquad (4\text{-}37)$$

for real variables. We are interested in the "matrix version" of this result.

Theorem 4.3 (Geometric Series for Matrices)
Let F be any n×n matrix with $\|F\|_2 < 1$. Let I denote the n×n identity matrix. Then the difference I − F is nonsingular and

$$(I - F)^{-1} = \sum_{k=0}^{\infty} F^k \qquad (4\text{-}38)$$

with

$$\|(I - F)^{-1}\|_2 \le \frac{1}{1 - \|F\|_2}. \qquad (4\text{-}39)$$

Proof: First, we show that I − F is nonsingular. Suppose that this is not true; suppose that I − F is singular. Then there exists at least one nonzero X such that (I − F)X = 0, so that $\|X\|_2 = \|FX\|_2$. But since $\|FX\|_2 \le \|F\|_2 \|X\|_2$, we must have $\|X\|_2 = \|FX\|_2 \le \|F\|_2 \|X\|_2$, which requires that $\|F\|_2 \ge 1$, a contradiction. Hence, the n×n matrix I − F must be nonsingular.

To obtain a series for $(I - F)^{-1}$, consider the obvious identity

$$\left( \sum_{k=0}^{N} F^k \right)(I - F) = I - F^{N+1}. \qquad (4\text{-}40)$$

However, $\|F^k\|_2 \le \|F\|_2^k$, and since $\|F\|_2 < 1$, we have $F^k \to 0$ as $k \to \infty$. As a result of this,

$$\lim_{N \to \infty} \left( \sum_{k=0}^{N} F^k \right)(I - F) = I, \qquad (4\text{-}41)$$

so that $(I - F)^{-1} = \sum_{k=0}^{\infty} F^k$ as claimed. Finally,

$$\|(I - F)^{-1}\|_2 = \left\| \sum_{k=0}^{\infty} F^k \right\|_2 \le \sum_{k=0}^{\infty} \|F^k\|_2 \le \sum_{k=0}^{\infty} \|F\|_2^k = \frac{1}{1 - \|F\|_2} \qquad (4\text{-}42)$$

(the last sum is an "ordinary" scalar geometric series), as claimed. ♥
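Theorem 4.3 is easy to check numerically: truncate the series at N terms and compare with the exact inverse. A pure-Python sketch for a 2×2 example (an illustration, not part of the original notes):

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def geometric_inverse(F, terms=60):
    """Approximate (I - F)^(-1) by I + F + F^2 + ... (valid when ||F|| < 1)."""
    n = len(F)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    power = [row[:] for row in total]                                       # F^0 = I
    for _ in range(terms):
        power = mat_mul(power, F)                                           # F^k
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

F = [[0.2, 0.1], [0.0, 0.3]]           # ||F||_2 < 1, so the series converges
approx = geometric_inverse(F)
# Exact inverse of I - F = [[0.8, -0.1], [0.0, 0.7]] by the 2x2 inverse formula:
det = 0.8 * 0.7 - (-0.1) * 0.0
exact = [[0.7 / det, 0.1 / det], [0.0 / det, 0.8 / det]]
# approx agrees with exact to many decimal places after 60 terms.
```

The truncation error shrinks like $\|F\|_2^{N+1}$, which is why only a modest number of terms is needed here.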
In many applications, Theorem 4.3 is used in the form

$$(I - \varepsilon F)^{-1} = \sum_{k=0}^{\infty} F^k \varepsilon^k \quad \text{for } |\varepsilon| < 1/\|F\|_2, \qquad (4\text{-}43)$$

where ε is considered to be a "small" parameter. Next, we generalize Theorem 4.3 and obtain a result that will be used in our study of how small errors in A and b influence the solution of the linear algebraic problem AX = b.

Theorem 4.4
Assume that A is n×n and nonsingular. Let E be an arbitrary n×n matrix. If

$$\rho = \|A^{-1}E\|_2 < 1, \qquad (4\text{-}44)$$

then A + E is nonsingular and

$$\|(A + E)^{-1} - A^{-1}\|_2 \le \frac{\|E\|_2 \|A^{-1}\|_2^2}{1 - \rho}. \qquad (4\text{-}45)$$

Proof: Since A is nonsingular, we have

$$A + E = A(I - F), \quad \text{where } F \equiv -A^{-1}E. \qquad (4\text{-}46)$$

Since $\rho = \|F\|_2 = \|A^{-1}E\|_2 < 1$, it follows from Theorem 4.3 that I − F is nonsingular and

$$\|(I - F)^{-1}\|_2 \le \frac{1}{1 - \|F\|_2} = \frac{1}{1 - \rho}. \qquad (4\text{-}47)$$

From (4-46) we have $(A + E)^{-1} = (I - F)^{-1}A^{-1}$; with the aid of (4-47), this can be used to write

$$\|(A + E)^{-1}\|_2 \le \frac{\|A^{-1}\|_2}{1 - \rho}. \qquad (4\text{-}48)$$

Now, multiply both sides of

$$(A + E)^{-1} - A^{-1} = -A^{-1}E(A + E)^{-1} \qquad (4\text{-}49)$$

by A + E to see this matrix identity. Finally, take the norm of (4-49) to obtain

$$\|(A + E)^{-1} - A^{-1}\|_2 = \|A^{-1}E(A + E)^{-1}\|_2 \le \|A^{-1}\|_2 \|E\|_2 \|(A + E)^{-1}\|_2. \qquad (4\text{-}50)$$

Now use (4-48) in this last result to obtain

$$\|(A + E)^{-1} - A^{-1}\|_2 \le \|A^{-1}\|_2 \|E\|_2 \frac{\|A^{-1}\|_2}{1 - \rho} = \frac{\|A^{-1}\|_2^2 \|E\|_2}{1 - \rho} \qquad (4\text{-}51)$$

as claimed. ♥

We will use Theorem 4.4 when studying the sensitivity of the linear equation AX = b. That is, we want to relate changes in solution X to small changes (errors) in both A and b.
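Theorem 4.4 can be sanity-checked numerically. Using diagonal matrices keeps everything elementary, since the 2-norm of a diagonal matrix is the largest diagonal magnitude and inverses are entrywise reciprocals (a pure-Python illustration, not part of the original notes; the matrices are hypothetical):

```python
def dnorm(d):
    """2-norm of a diagonal matrix given as its list of diagonal entries."""
    return max(abs(x) for x in d)

a = [2.0, 4.0]            # A = diag(2, 4), nonsingular
e = [0.1, -0.2]           # perturbation E = diag(0.1, -0.2)

a_inv = [1.0 / x for x in a]
rho = dnorm([ai * ei for ai, ei in zip(a_inv, e)])            # ||A^-1 E||_2 = 0.05 < 1
lhs = dnorm([1.0 / (x + y) - 1.0 / x for x, y in zip(a, e)])  # ||(A+E)^-1 - A^-1||_2
rhs = dnorm(e) * dnorm(a_inv) ** 2 / (1.0 - rho)              # the bound (4-45)
# lhs is about 0.0238, safely below the bound rhs of about 0.0526.
```

Since ρ < 1 here, Theorem 4.4 guarantees A + E is nonsingular and that lhs ≤ rhs, which the computation confirms.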