CHAPTER 3

Determinants

Mathematics is the gate and key to the sciences. — Roger Bacon

In this chapter, we introduce a basic tool in applied mathematics, namely the determinant of a square matrix. The determinant is a number, associated with an n × n matrix A, whose value characterizes when the linear system Ax = b has a unique solution (or, equivalently, when A−1 exists). Determinants enjoy a wide range of applications, including coordinate geometry and function theory.

Sections 3.1–3.3 give a detailed introduction to determinants, their properties, and their applications. Alternatively, Section 3.4, "Summary of Determinants," can provide a nonrigorous and much more abbreviated introduction to the fundamental results required in the remainder of the text. We will see in later chapters that determinants are invaluable in the theory of eigenvalues and eigenvectors of a matrix, as well as in solution techniques for linear systems of differential equations.

3.1 The Definition of the Determinant

We will give a criterion shortly (Theorem 3.2.4) for the invertibility of a square matrix A in terms of the determinant of A, written det(A), which is a number determined directly from the elements of A. This criterion will provide a first extension of the Invertible Matrix Theorem introduced in Section 2.8.

To motivate the definition of the determinant of an n × n matrix A, we begin with the special cases n = 1, n = 2, and n = 3.

Case 1: n = 1. According to Theorem 2.6.5, the 1 × 1 matrix A = [a11] is invertible if and only if rank(A) = 1, which holds if and only if the 1 × 1 determinant, det(A), defined by

    det(A) = a11

is nonzero.

Case 2: n = 2. According to Theorem 2.6.5, the 2 × 2 matrix

    A = [ a11  a12 ]
        [ a21  a22 ]

is invertible if and only if rank(A) = 2, which holds if and only if the row-echelon form of A has two nonzero rows. Provided that a11 ≠ 0, we can reduce A to row-echelon form by dividing the first row by a11 and then adding −a21 times the first row to the second row:

    [ a11  a12 ]    [ 1    a12/a11            ]
    [ a21  a22 ] ~  [ 0    a22 − a12 a21/a11  ]

For A to be invertible, it is necessary that a22 − a12 a21/a11 ≠ 0, or equivalently that a11 a22 − a12 a21 ≠ 0. Thus, for A to be invertible, it is necessary that the 2 × 2 determinant, det(A), defined by

    det(A) = a11 a22 − a12 a21                                        (3.1.1)

be nonzero. We will see in the next section that this condition is also sufficient for the 2 × 2 matrix A to be invertible.

Case 3: n = 3. According to Theorem 2.6.5, the 3 × 3 matrix

    A = [ a11  a12  a13 ]
        [ a21  a22  a23 ]
        [ a31  a32  a33 ]

is invertible if and only if rank(A) = 3, which holds if and only if the row-echelon form of A has three nonzero rows. Reducing A to row-echelon form as in Case 2, we find that it is necessary that the 3 × 3 determinant defined by

    det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31    (3.1.2)

be nonzero. Again, in the next section we will prove that this condition on det(A) is also sufficient for the 3 × 3 matrix A to be invertible.

To generalize the foregoing formulas for the determinant of an n × n matrix A, we take a closer look at their structure. Each determinant above consists of a sum of n! products, where each product contains precisely one element from each row and each column of A. Furthermore, each possible choice of one element from each row and each column of A does in fact occur as a term of the summation. Finally, each term is assigned a plus or a minus sign.
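Before moving on, it may help to see formulas (3.1.1) and (3.1.2) in executable form. The following short Python sketch evaluates both formulas directly; the function names det2 and det3 are purely illustrative, and the sample matrices are the ones whose determinants (14 and −51) are computed by hand in Example 3.1.10 below.

    def det2(A):
        """Formula (3.1.1): a11*a22 - a12*a21 for a 2 x 2 matrix given as nested lists."""
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]

    def det3(A):
        """Formula (3.1.2) for a 3 x 3 matrix given as nested lists."""
        return (A[0][0] * A[1][1] * A[2][2]
                + A[0][1] * A[1][2] * A[2][0]
                + A[0][2] * A[1][0] * A[2][1]
                - A[0][0] * A[1][2] * A[2][1]
                - A[0][1] * A[1][0] * A[2][2]
                - A[0][2] * A[1][1] * A[2][0])

    print(det2([[3, -2], [1, 4]]))                      # 14
    print(det3([[1, 2, -3], [4, -1, 2], [0, 3, 1]]))    # -51

Note that each product in det3 uses exactly one entry from every row and every column of the matrix, which is the structural observation made above and generalized next.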
Based on these observations, the appropriate way in which to define det(A) for an n × n matrix would seem to be to add up all possible products consisting of one element from each row and each column of A, with some condition on which products are taken with a plus sign and which with a minus sign. To describe this condition, we digress to discuss permutations. i i i i i i i “main” 2007/2/16 page 191 i 3.1 The Definition of the Determinant 191 Permutations Consider the first n positive integers 1, 2, 3, . . . , n. Any arrangement of these integers in a specific order, say, (p1 , p2 , . . . , pn ), is called a permutation. Example 3.1.1 There are precisely six distinct permutations of the integers 1, 2 and 3: (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1). More generally, we have the following result: Theorem 3.1.2 There are precisely n! distinct permutations of the integers 1, 2, . . . , n. The proof of this result is left as an exercise. The elements in the permutation (1, 2, . . . , n) are said to be in their natural increasing order. We now introduce a number that describes how far a given permutation is from its natural order. For i = j , the pair of elements pi and pj in the permutation (p1 , p2 , . . . , pn ) are said to be inverted if they are out of their natural order; that is, if pi > pj with i < j . If this is the case, we say that (pi , pj ) is an inversion. For example, in the permutation (4, 2, 3, 1), the pairs (4, 2), (4, 3), (4, 1), (2, 1), and (3, 1) are all out of their natural order. Consequently, there are a total of five inversions in this permutation. In general we let N(p1 , p2 , . . . , pn ) denote the total number of inversions in the permutation (p1 , p2 , . . . , pn ). Example 3.1.3 Find the number of inversions in the permutations (1, 3, 2, 4, 5) and (2, 4, 5, 3, 1). Solution: The only pair of elements in the permutation (1, 3, 2, 4, 5) that is out of natural order is (3, 2), so N(1, 3, 2, 4, 5) = 1. The permutation (2, 4, 5, 3, 1) has the following pairs of elements out of natural order: (2, 1), (4, 3), (4, 1), (5, 3), (5, 1), and (3, 1). Thus, N(2, 4, 5, 3, 1) = 6. It can be shown that the number of inversions gives the minimum number of adjacent interchanges of elements in the permutation that are required to restore the permutation to its natural increasing order. This justifies the claim that the number of inversions describes how far from natural order a given permutation is. For example, N(3, 2, 1) = 3, and the permutation (3, 2, 1) can be restored to its natural order by the following sequence of adjacent interchanges: (3, 2, 1) → (3, 1, 2) → (1, 3, 2) → (1, 2, 3). The number of inversions enables us to distinguish two different types of permutations as follows. DEFINITION 3.1.4 1. If N(p1 , p2 , . . . , pn ) is an even integer (or zero), we say (p1 , p2 , . . . , pn ) is an even permutation. We also say that (p1 , p2 , . . . , pn ) has even parity. 2. If N(p1 , p2 , . . . , pn ) is an odd integer, we say (p1 , p2 , . . . , pn ) is an odd permutation. We also say that (p1 , p2 , . . . , pn ) has odd parity. i i i i i i i “main” 2007/2/16 page 192 i 192 CHAPTER 3 Determinants Example 3.1.5 The permutation (4, 1, 3, 2) has even parity, since we have N(4, 1, 3, 2) = 4, whereas (3, 2, 1, 4) is an odd permutation since N(3, 2, 1, 4) = 3. We associate a plus or a minus sign with a permutation, depending on whether it has even or odd parity, respectively. The sign associated with the permutation (p1 , p2 , . . . 
, pn ) can be specified by the indicator σ (p1 , p2 , . . . , pn ), defined in terms of the number of inversions as follows: σ (p1 , p2 , . . . , pn ) = +1 −1 if (p1 , p2 , . . . , pn ) has even parity, if (p1 , p2 , . . . , pn ) has odd parity. Hence, σ (p1 , p2 , . . . , pn ) = (−1)N (p1 ,p2 ,...,pn ) . Example 3.1.6 It follows from Example 3.1.3 that σ (1, 3, 2, 4, 5) = (−1)1 = −1, whereas σ (2, 4, 5, 3, 1) = (−1)6 = 1. The proofs of some of our later results will depend upon the next theorem. Theorem 3.1.7 If any two elements in a permutation are interchanged, then the parity of the resulting permutation is opposite to that of the original permutation. Proof We first show that interchanging two adjacent terms in a permutation changes its parity. Consider an arbitrary permutation (p1 , . . . , pk , pk +1 , . . . , pn ), and suppose we interchange the adjacent elements pk and pk +1 . Then • If pk > pk +1 , then N(p1 , p2 , . . . , pk +1 , pk , . . . , pn ) = N(p1 , p2 , . . . , pk , pk +1 , . . . , pn ) − 1, • If pk < pk +1 , then N(p1 , p2 , . . . , pk +1 , pk , . . . , pn ) = N(p1 , p2 , . . . , pk , pk +1 , . . . , pn ) + 1, so that the parity is changed in both cases. Now suppose we interchange the elements pi and pk in the permutation (p1 , p2 , . . . , pi , . . . , pk , . . . , pn ). Note that k − i > 0. We can accomplish this by successively interchanging adjacent elements. In moving pk to the i th position, we perform k − i interchanges involving adjacent terms, and the resulting permutation is (p1 , p2 , . . . , pk , pi , . . . , pk −1 , pk +1 , . . . , pn ). Next we move pi to the k th position. A moment’s thought shows that this requires (k − i) − 1 interchanges of adjacent terms. Thus, the total number of adjacent interchanges involved in interchanging the elements pi and pk is 2(k − i) − 1, which is always i i i i i i i “main” 2007/2/16 page 193 i 3.1 The Definition of the Determinant 193 an odd integer. Since each adjacent interchange changes the parity, the permutation resulting from an odd number of adjacent interchanges has opposite parity to the original permutation. At this point, we are ready to see how permutations can facilitate the definition of the determinant. From the expression (3.1.2) for the 3 × 3 determinant, we see that the row indices of each term have been arranged in their natural increasing order and that the column indices are each a permutation (p1 , p2 , p3 ) of 1, 2, 3. Further, the sign attached to each term coincides with the sign of the permutation of the corresponding column indices; that is, σ (p1 , p2 , p3 ). These observations motivate the following general definition of the determinant of an n × n matrix: DEFINITION 3.1.8 Let A = [aij ] be an n × n matrix. The determinant of A, denoted det(A), is defined as follows: det (A) = σ (p1 , p2 , . . . , pn )a1p1 a2p2 a3p3 · · · anpn , (3.1.3) where the summation is over the n! distinct permutations (p1 , p2 , . . . , pn ) of the integers 1, 2, 3, . . . , n. The determinant of an n × n matrix is said to have order n. We sometimes denote det(A) by a11 a12 a21 a22 .. .. .. an1 an2 . . . a1 n . . . a2 n . .. .. .. . . . ann Thus, for example, from (3.1.1), we have a11 a12 = a11 a22 − a12 a21 . a21 a22 Example 3.1.9 Use Definition 3.1.8 to derive the expression for the determinant of order 3. Solution: When n = 3, (3.1.3) reduces to det (A) = σ (p1 , p2 , p3 )a1p1 a2p2 a3p3 , where the summation is over the 3! = 6 permutations of 1, 2, 3. 
It follows that the six terms in this summation are a11 a22 a33 , a11 a23 a32 , a12 a21 a33 , a12 a23 a31 , a13 a21 a32 , a13 a22 a31 , so that det (A) = σ (1, 2, 3)a11 a22 a33 + σ (1, 3, 2)a11 a23 a32 + σ (2, 1, 3)a12 a21 a33 + σ (2, 3, 1)a12 a23 a31 + σ (3, 1, 2)a13 a21 a32 + σ (3, 2, 1)a13 a22 a31 . To obtain the values of each σ (p1 , p2 , p3 ), we determine the parity for each permutation (p1 , p2 , p3 ). We find that σ (1, 2, 3) = +1, σ (2, 3, 1) = +1, σ (1, 3, 2) = −1, σ (3, 1, 2) = +1, σ (2, 1, 3) = −1, σ (3, 2, 1) = −1. i i i i i i i “main” 2007/2/16 page 194 i 194 CHAPTER 3 Determinants Hence, a11 a12 a13 a11 a12 a21 a22 a23 a21 a22 a31 a32 a33 a31 a11 a12 a13 det (A) = a21 a22 a23 a31 a32 a33 a32 Figure 3.1.1: A schematic for obtaining the determinant of a 3 × 3 matrix A = [aij ]. Example 3.1.10 = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31 . A simple schematic for obtaining the terms in the determinant of order 3 is given in Figure 3.1.1. By taking the product of the elements joined by each arrow and attaching the indicated sign to the result, we obtain the six terms in the determinant of the 3 × 3 matrix A = [aij ]. Note that this technique for obtaining the terms in a determinant does not generalize to determinants of n × n matrices with n > 3. Evaluate (a) | − 3|. 3 −2 . 14 (b) (c) 1 2 −3 4 −1 2 . 031 Solution: (a) | − 3| = −3. In the case of a 1 × 1 matrix, the reader is cautioned not to confuse the vertical bars notation for the determinant with absolute value bars. (b) 3 −2 = (3)(4) − (−2)(1) = 14. 14 (c) In this case, the schematic in Figure 3.1.1 is 1 2 −3 4 −1 2 031 12 4 −1 03 so that 1 2 −3 4 −1 2 = (1)(−1)(1) + (2)(2)(0) 031 + (−3)(4)(3) − (0)(−1)(−3) − (3)(2)(1) − (1)(4)(2) = −51. We now to turn to some geometric applications of the determinant. Geometric Interpretation of the Determinants of Orders Two and Three If a and b are two vectors in space, we recall that their dot product is the scalar a · b = ||a|| ||b|| cos θ, (3.1.4) where θ is the angle between a and b, and ||a|| and ||b|| denote the lengths of a and b, respectively. On the other hand, the cross product of a and b is the vector a × b = ||a|| ||b|| sin θ n, (3.1.5) i i i i i i i “main” 2007/2/16 page 195 i 3.1 The Definition of the Determinant 195 where n denotes a unit vector1 that is perpendicular to the plane of a and b and chosen in such a way that {a, b, n} is a right-handed set of vectors. If i, j, k denote the unit vectors pointing along the positive x -, y - and z-axes, respectively, of a rectangular Cartesian coordinate system and a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, then Equation (3.1.5) can be expressed in component form as a × b = (a2 b3 − a3 b2 )i + (a3 b1 − a1 b3 )j + (a1 b2 − a2 b1 )k. (3.1.6) This can be remembered most easily in the compact form ijk a × b = a1 a2 a3 , b1 b2 b3 whose validity is readily checked by using the schematic in Figure 3.1.1. We will use the equations above to establish the following theorem. Theorem 3.1.11 1. The area of a parallelogram with sides determined by the vectors a = a1 i + a2 j and b = b1 i + b2 j is Area = | det (A)|, a1 a2 . b1 b2 where A = 2. The volume of a parallelepiped determined by the vectors a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, c = c1 i + c2 j + c3 k is Volume = | det (A)|, a1 a2 a3 where A = b1 b2 b3 . c1 c2 c3 Before presenting the proof of this theorem, we make some remarks and give two examples. Remarks 1. 
The vertical bars appearing in the formulas in Theorem 3.1.11 denote the absolute value of the number det (A). 2. We see from the expression for the volume of a parallelepiped that the condition for three vectors to lie in the same plane (i.e., the parallelepiped has zero volume) is that det(A) = 0. This will be a useful result in the next chapter. Example 3.1.12 Find the area of the parallelogram containing the points (0, 0), (1, 2), (3, 4) and (4, 6). Solution: The sides of the parallelogram are determined by the vectors a = i + 2j and b = 3i + 4j. According to part 1 of Theorem 3.1.11, the area of the parallelogram is det 12 34 = |(1)(4) − (2)(3)| = | − 2| = 2. 1 A unit vector is a vector of length 1. i i i i i i i “main” 2007/2/16 page 196 i 196 CHAPTER 3 Determinants Example 3.1.13 Determine whether or not the vectors a = i + 2j + 3k, b = 4i + 5j + 6k, and c = −5i + (−7)j + (−9)k lie in a single plane in 3-space. Solution: By Remark 2 above, it suffices to determine whether or not the volume of the parallelepiped determined by the three vectors is zero or not. To do this, we use part 2 of Theorem 3.1.11: 123 Volume = det 4 5 6 −5 −7 −9 (1)(5)(−9) + (2)(6)(−5) + (3)(4)(−7) = 0, −(−5)(5)(3) − (−7)(6)(1) − (−9)(4)(2) = which shows that the three vectors do lie in a single plane. Now we turn to the Proof of Theorem 3.1.11: 1. The area of the parallelogram is area = (length of base) × (perpendicular height). From Figure 3.1.2, this can be written as Area = ||a||h = ||a|| ||b|| | sin θ | = ||a × b||. (3.1.7) y b a h x Figure 3.1.2: Determining the area of a parallelogram. Since the k components of a and b, a3 and b3 , are both zero (since the vectors lie in the xy -plane), substitution from Equation (3.1.6) yields Area = ||(a1 b2 − a2 b1 )k|| = |a1 b2 − a2 b1 | = | det (A)|. 2. The volume of the parallelepiped is Volume = (area of base) × (perpendicular height). The base is determined by the vectors b and c (see Figure 3.1.3), and its area can be written as ||b × c||, in similar fashion to what was done in (3.1.7). From Figure 3.1.3 and Equation (3.1.4), we therefore have i i i i i i i “main” 2007/2/16 page 197 i 3.1 The Definition of the Determinant 197 Volume = ||b × c||h = ||b × c|| ||a||| cos ψ | = ||b × c|| |a · n|, where n is a unit vector that is perpendicular to the plane containing b and c. We can now use Equations (3.1.5) and (3.1.6) to obtain Volume = ||b × c|| ||a|| | cos ψ | = |a · (b × c)| = (a1 i + a2 j + a3 k) · (b2 c3 − b3 c2 )i + (b3 c1 − b1 c3 )j + (b1 c2 − b2 c1 )k = |a1 (b2 c3 − b3 c2 ) + a2 (b3 c1 − b1 c3 ) + a3 (b1 c2 − b2 c1 )| = | det (A)|, as required. z a h c u b c y x Figure 3.1.3: Determining the volume of a parallelepiped. Exercises for 3.1 Key Terms True-False Review Permutation, Inversion, Parity, Determinant, Order, Dot product, Cross product. For Questions 1–8, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. Skills • Be able to compute determinants by using Definition 3.1.8. • Be able to list permutations of 1, 2, . . . , n. • Be able to find the number of inversions of a given permutation and thus determine its parity. 1. If A is a 2 × 2 lower triangular matrix, then det(A) is the product of the elements on the main diagonal of A. 2. If A is a 3 × 3 upper triangular matrix, then det (A) is the product of the elements on the main diagonal of A. 
• Be able to compute the area of a parallelogram with sides determined by vectors in R2 . 3. The volume of the parallelepiped whose sides are determined by the vectors a, b, and c is given by det(A), where A = [a, b, c]. • Be able to compute the volume of a parallelogram with sides determined by vectors in R3 . 4. There are the same number of permutations of {1, 2, 3, 4} of even parity as there are of odd parity. i i i i i i i “main” 2007/2/16 page 198 i 198 CHAPTER 3 Determinants 5. If A and B are 2 × 2 matrices, then det(A + B) = det(A)+ det(B). 6. The determinant of a matrix whose elements are all positive must be positive. 7. A matrix containing a row of zeros must have zero determinant. 8. Three vectors v1 , v2 , and v3 in R3 are coplanar if and only if the determinant of the 3 × 3 matrix [v1 , v2 , v3 ] is zero. Problems 002 13. A = 0 −4 1 . −1 5 −7 123 4 0 5 6 7 14. A = 0 0 8 9 . 0 0 0 10 0020 5 0 0 0 15. A = 0 0 0 3 . 0200 For Problems 16–21, evaluate the given determinant. For Problems 1–6, determine the parity of the given permutation. 1. (2, 1, 3, 4). 3. (1, 4, 3, 5, 2). 4. (5, 4, 3, 2, 1). 18. 5. (1, 5, 2, 4, 3). 6. (2, 4, 6, 1, 3, 5). 7. Use the definition of a determinant to derive the general expression for the determinant of A if A= a11 a12 . a21 a22 For Problems 8–15, evaluate the determinant of the given matrix. 8. A = 1 −1 . 23 9. A = 2 −1 . 6 −3 10. A = −4 10 . −1 8 1 11. A = 2 0 2 12. A = 4 9 π π2 . 16. √ 2 2π 2 3 −1 17. 1 4 1 . 31 6 2. (1, 3, 2, 4). −1 0 3 6 . 2 −1 15 2 3 . 51 32 6 2 1 −1 . −1 1 4 236 19. 0 1 2 . 150 √ 2 e −1 √π e 20. 67 1/30 2001 . π π2 π3 e 2 t e 3t e − 4 t 21. 2e2t 3e3t −4e−4t . 4e2t 9e3t 16e−4t In Problems 22–23, we explore a relationship between determinants and solutions to a differential equation. The 3 × 3 matrix consisting of solutions to a differential equation and their derivatives is called the Wronskian and, as we will see in later chapters, plays a pivotal role in the theory of differential equations. 22. Verify that y1 (x) = cos 2x , y2 (x) = sin 2x , and y3 (x) = ex are solutions to the differential equation y − y + 4y − 4y = 0 , y1 y2 y3 and show that y1 y2 y3 is nonzero on any interval. y1 y2 y3 i i i i i i i “main” 2007/2/16 page 199 i 3.1 23. (a) Verify that y1 (x) = ex , y2 (x) = cosh x , and y3 (x) = sinh x are solutions to the differential equation y − y − y + y = 0, y1 y2 y3 and show that y1 y2 y3 is identically zero. y1 y2 y3 (b) Determine nonzero constants d1 , d2 , and d3 such that d1 y1 + d2 y2 + d3 y3 = 0. 24. (a) Write all 24 distinct permutations of the integers 1, 2, 3, 4. (b) Determine the parity of each permutation in part (a). (c) Use parts (a) and (b) to derive the expression for a determinant of order 4. For Problems 25–27, use the previous problem to compute the determinant of A. 1 −1 0 1 3 0 2 5 25. A = 2 1 0 3 . 9 −1 2 1 11 0 1 3 1 −2 3 26. A = 2 3 1 2 . −2 3 5 −2 0123 2 0 3 4 27. A = 3 4 0 5 . 4560 28. Use Problem 27 to find the determinant of A, where 01230 2 0 3 4 0 A = 3 4 0 5 0. 4 5 6 0 0 00007 a11 a12 and c is a constant, verify that a21 a22 det (cA) = c2 det (A). 29. (a) If A = The Definition of the Determinant 199 (b) Use the definition of a determinant to prove that if A is an n × n matrix and c is a constant, then det (cA) = cn det (A). For Problems 30–33, determine whether the given expression is a term in the determinant of order 5. If it is, determine whether the permutation of the column indices has even or odd parity and hence find whether the term has a plus or a minus sign attached to it. 30. 
a11 a25 a33 a42 a54 . 31. a11 a23 a34 a43 a52 . 32. a13 a25 a31 a44 a42 . 33. a11 a32 a24 a43 a55 . For Problems 34–37, determine the values of the indices p and q such that the following are terms in a determinant of order 4. In each case, determine the number of inversions in the permutation of the column indices and hence find the appropriate sign that should be attached to each term. 34. a13 ap4 a32 a2q . 35. a21 a3q ap2 a43 . 36. a3q ap4 a13 a42 . 37. apq a34 a13 a42 . 38. The alternating symbol ij k is defined by 1, if (ij k) is an even permutation of 1, 2, 3, = −1, if (ij k) is an odd permutation of 1, 2, 3, ij k 0, otherwise. (a) Write all nonzero 1 ≤ k ≤ 3. ij k , for 1 ≤ i ≤ 3, 1 ≤ j ≤ 3, (b) If A = [aij ] is a 3 × 3 matrix, verify that 3 3 3 det (A) = ij k a1i a2j a3k . i =1 j =1 k =1 39. If A is the general n × n matrix, determine the sign attached to the term a1n a2 n−1 a3 n−2 · · · an1 , which arises in det(A). i i i i i i i “main” 2007/2/16 page 200 i 200 CHAPTER 3 Determinants 40. Use some form of technology to evaluate the determinants in Problems 16–21. 41. Let A be an arbitrary 4 × 4 matrix. By experimenting with various elementary row operations, conjecture how elementary row operations applied to A affect the value of det(A). 3.2 42. Verify that y1 (x) = e−2x cos 3x , y2 (x) = sin 3x , and y3 (x) = e−4x are solutions to the differential equation e −2 x y + 8y + 29y + 52y = 0, y1 y2 y3 and show that y1 y2 y3 is nonzero on any interval. y1 y2 y3 Properties of Determinants For large values of n, evaluating a determinant of order n using the definition given in the previous section is not very practical, since the number of terms is n! (for example, a determinant of order 10 contains 3,628,800 terms). In the next two sections, we develop better techniques for evaluating determinants. The following theorem suggests one way to proceed. Theorem 3.2.1 If A is an n × n upper or lower triangular matrix, then n det (A) = a11 a22 a33 · · · ann = aii . i =1 Proof We use the definition of the determinant to prove the result in the upper triangular case. From Equation (3.1.3), det (A) = σ (p1 , p2 , . . . , pn )a1p1 a2p2 a3p3 . . . anpn . (3.2.1) If A is upper triangular, then aij = 0 whenever i > j , and therefore the only nonzero terms in the preceding summation are those with pi ≥ i for all i . Since all the pi must be distinct, the only possibility is (by applying pi ≥ i to i = n, n − 1, . . . , 2, 1 in turn) pi = i, i = 1, 2, . . . , n, and so Equation (3.2.1) reduces to the single term det (A) = σ (1, 2, . . . , n)a11 a22 · · · ann . Since σ (1, 2, . . . , n) = 1, it follows that det (A) = a11 a22 · · · ann . The proof in the lower triangular case is left as an exercise (Problem 47). Example 3.2.2 According to the previous theorem, 2 5 −1 3 0 −1 0 4 = (2)(−1)(7)(5) = −70. 0 0 78 0 0 05 i i i i i i i “main” 2007/2/16 page 201 i 3.2 Properties of Determinants 201 Theorem 3.2.1 shows that it is easy to compute the determinant of an upper or lower triangular matrix. Recall from Chapter 2 that any matrix can be reduced to row-echelon form by a sequence of elementary row operations. In the case of an n × n matrix, any row-echelon form will be upper triangular. Theorem 3.2.1 suggests, therefore, that we should consider how elementary row operations performed on a matrix A alter the value of det (A). Elementary Row Operations and Determinants Let A be an n × n matrix. P1. If B is the matrix obtained by permuting two rows of A, then det (B) = − det (A). P2. 
If B is the matrix obtained by multiplying one row of A by any scalar k (this statement is true even if k = 0), then det(B) = k det(A).

P3. If B is the matrix obtained by adding a multiple of any row of A to a different row of A, then det(B) = det(A).

The proofs of these properties are given at the end of this section.

Remark  The main use of P2 is that it enables us to factor a common multiple of the entries of a particular row out of the determinant. For example, if

    A = [ −1   4 ]          B = [ −5  20 ]
        [  3  −2 ]   and        [  3  −2 ],

where B is obtained from A by multiplying the first row of A by 5, then we have

    det(B) = 5 det(A) = 5[(−1)(−2) − (3)(4)] = 5(−10) = −50.

We now illustrate how the foregoing properties P1–P3, together with Theorem 3.2.1, can be used to evaluate a determinant. The basic idea is the same as that for Gaussian elimination. We use elementary row operations to reduce the determinant to upper triangular form and then use Theorem 3.2.1 to evaluate the resulting determinant.

Warning: When using the properties P1–P3 to simplify a determinant, one must remember to take account of any change that arises in the value of the determinant from the operations that have been performed on it.

Example 3.2.3  Evaluate

    | 2  −1   3   7 |
    | 1  −2   4   3 |
    | 3   4   2  −1 |
    | 2  −2   8  −4 |

Solution: Writing each determinant compactly by listing its rows, we have

    det[[2, −1, 3, 7], [1, −2, 4, 3], [3, 4, 2, −1], [2, −2, 8, −4]]
      =  2 det[[2, −1, 3, 7], [1, −2, 4, 3], [3, 4, 2, −1], [1, −1, 4, −2]]        (1)
      = −2 det[[1, −2, 4, 3], [2, −1, 3, 7], [3, 4, 2, −1], [1, −1, 4, −2]]        (2)
      = −2 det[[1, −2, 4, 3], [0, 3, −5, 1], [0, 10, −10, −10], [0, 1, 0, −5]]     (3)
      =  2 det[[1, −2, 4, 3], [0, 1, 0, −5], [0, 10, −10, −10], [0, 3, −5, 1]]     (4)
      = 20 det[[1, −2, 4, 3], [0, 1, 0, −5], [0, 1, −1, −1], [0, 3, −5, 1]]        (5)
      = 20 det[[1, −2, 4, 3], [0, 1, 0, −5], [0, 0, −1, 4], [0, 0, −5, 16]]        (6)
      = 20 det[[1, −2, 4, 3], [0, 1, 0, −5], [0, 0, −1, 4], [0, 0, 0, −4]]         (7)
      = 20 · (1)(1)(−1)(−4) = 80.

    1. M4(1/2)   2. P12   3. A12(−2), A13(−3), A14(−1)   4. P24   5. M3(1/10)   6. A23(−1), A24(−3)   7. A34(−5)

Theoretical Results for n × n Matrices and n × n Linear Systems

In Section 2.8, we established several conditions on an n × n matrix A that are equivalent to saying that A is invertible. At this point, we are ready to give one additional characterization of invertible matrices in terms of determinants.

Theorem 3.2.4  Let A be an n × n matrix with real elements. The following conditions on A are equivalent.

(a) A is invertible.

(g) det(A) ≠ 0.

Proof  Let A∗ denote the reduced row-echelon form of A. Recall from Chapter 2 that A is invertible if and only if A∗ = In. Since A∗ is obtained from A by performing a sequence of elementary row operations, properties P1–P3 of determinants imply that det(A) is just a nonzero multiple of det(A∗). If A is invertible, then det(A∗) = det(In) = 1, so that det(A) is nonzero. Conversely, if det(A) ≠ 0, then det(A∗) ≠ 0. This implies that A∗ = In, and hence A is invertible.

According to Theorem 2.5.9 in the previous chapter, any linear system Ax = b has either no solution, exactly one solution, or infinitely many solutions. Recall from the Invertible Matrix Theorem that the linear system Ax = b has a unique solution for every b in Rn if and only if A is invertible. Thus, for an n × n linear system, Theorem 3.2.4 tells us that, for each b in Rn, the system Ax = b has a unique solution x if and only if det(A) ≠ 0.

Next, we consider the homogeneous n × n linear system Ax = 0.

Corollary 3.2.5  The homogeneous n × n linear system Ax = 0 has an infinite number of solutions if and only if det(A) = 0, and has only the trivial solution if and only if det(A) ≠ 0.
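Before turning to the proof, it is worth noting how directly Theorem 3.2.4 and this corollary can be applied: computing the single number det(A) settles whether Ax = 0 has nontrivial solutions. The short Python sketch below illustrates this; NumPy is simply our choice of tool here (any system that computes determinants would do), and the two matrices are those of Examples 3.2.6 and 3.2.7 below.

    import numpy as np

    A1 = np.array([[1, -1, 3], [2, 4, -2], [3, 5, 7]])      # det = 52 (Example 3.2.6)
    A2 = np.array([[1, 0, 1], [0, 1, 0], [-3, 0, -3]])      # det = 0  (Example 3.2.7)

    for A in (A1, A2):
        d = np.linalg.det(A)
        if abs(d) > 1e-12:
            # det(A) != 0: A is invertible, so Ax = 0 has only the trivial solution.
            print(round(d, 6), "-> only the trivial solution")
        else:
            # det(A) = 0: A is not invertible, so Ax = 0 has infinitely many solutions.
            print(round(d, 6), "-> infinitely many solutions")

The tolerance 1e-12 is needed only because floating-point evaluation of a determinant that is exactly zero may return a very small nonzero number.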
i i i i i i i “main” 2007/2/16 page 203 i 3.2 Properties of Determinants 203 Proof The system Ax = 0 clearly has the trivial solution x = 0 under any circumstances. By our remarks above, this must be the unique solution if and only if det(A) = 0. The only other possibility, which occurs if and only if det(A) = 0, is that the system has infinitely many solutions. Remark The preceding corollary is very important, since we are often interested only in determining the solution properties of a homogeneous linear system and not actually in finding the solutions themselves. We will refer back to this corollary on many occasions throughout the remainder of the text. Example 3.2.6 Verify that the matrix 1 −1 3 A = 2 4 −2 357 is invertible. What can be concluded about the solution to Ax = 0? Solution: It is easily shown that det(A) = 52 = 0. Consequently, A is invertible. It follows from Corollary 3.2.5 that the homogeneous system Ax = 0 has only the trivial solution (0, 0, 0). Example 3.2.7 Verify that the matrix 10 1 A = 0 1 0 −3 0 −3 is not invertible and determine a set of real solutions to the system Ax = 0. Solution: By the row operation A13 (3), we see that A is row equivalent to the upper triangular matrix 101 B = 0 1 0. 000 By Theorem 3.2.1, det(B) = 0, and hence B and A are not invertible. We illustrate Corollary 3.2.5 by finding an infinite number of solutions (x1 , x2 , x3 ) to Ax = 0. Working with the upper triangular matrix B , we may set x3 = t , a free parameter. The second row of the matrix system requires that x2 = 0 and the first row requires that x1 + x3 = 0, so x1 = −x3 = −t . Hence, the set of solutions is {(−t, 0, t) : t ∈ R}. Further Properties of Determinants In addition to elementary row operations, the following properties can also be useful in evaluating determinants. Let A and B be n × n matrices. P4. det(AT ) = det(A). i i i i i i i “main” 2007/2/16 page 204 i 204 CHAPTER 3 Determinants P5. Let a1 , a2 , . . . , an denote the row vectors of A. If the i th row vector of A is the sum of two row vectors, say ai = bi + ci , then det(A) = det(B)+ det(C), where a1 a1 . . . . . . ai −1 ai −1 B = bi and C = ci . ai +1 ai +1 . . . . . . an an The corresponding property is also true for columns. P6. If A has a row (or column) of zeros, then det(A) = 0. P7. If two rows (or columns) of A are the same, then det(A) = 0. P8. det(AB) = det(A)det(B). The proofs of these properties are given at the end of the section. The main importance of P4 is the implication that any results regarding determinants that hold for the rows of a matrix also hold for the columns of a matrix. In particular, the properties P1–P3 regarding the effects that elementary row operations have on the determinant can be translated to corresponding statements on the effects that “elementary column operations” have on the determinant. We will use the notations CPij , CMi (k), and CAij (k) to denote the three types of elementary column operations. Example 3.2.8 Use only column operations to evaluate 3 6 9 15 Solution: 3 6 9 15 6 −1 2 10 3 4 . 20 5 4 34 3 8 We have 6 −1 2 10 3 4 1 = 3 · 22 20 5 4 34 3 8 1 2 3 5 3 −1 1 5 32 2 = 12 10 5 2 17 3 4 1 00 0 2 −1 5 0 3 = 12 3 1 8 −1 5 2 8 −1 1 00 0 2 −1 0 0 3 1 13 −1 5 2 18 −1 1 0 00 2 −1 0 0 = 12 = 12(−5) = −60, 3 1 13 0 5 5 2 18 13 4 where we have once more used Theorem 3.2.1. 1 1 1. CM1 ( 1 ), CM2 ( 2 ), CM4 ( 2 ) 3 3. CA23 (5) 2. CA12 (−3), CA13 (1), CA14 (−1) 1 4. 
CA34 ( 13 ) i i i i i i i “main” 2007/2/16 page 205 i 3.2 Properties of Determinants 205 The property that often gives the most difficulty is P5. We explicitly illustrate its use with an example. Example 3.2.9 Use property P5 to express a1 + b1 c1 + d1 a2 + b2 c2 + d2 as a sum of four determinants. Solution: Applying P5 to row 1 yields: a1 + b1 c1 + d1 a1 c1 b1 d1 = + . a2 + b2 c2 + d2 a2 + b2 c2 + d2 a2 + b2 c2 + d2 Now we apply P5 to row 2 of both of the determinants on the right-hand side to obtain a1 + b1 c1 + d1 ac ac bd bd = 1 1 + 1 1 + 1 1 + 1 1. a2 + b2 c2 + d2 a2 c2 b2 d2 a2 c2 b2 d2 Notice that we could also have applied P5 to the columns of the given determinant. Warning In view of P5, it may be tempting to believe that if A, B , and C are n × n matrices such that A = B + C , then det (A) = det (B) + det (C). This is not true! Examples abound to show the failure of this equation. For instance, if we take B = I2 and C = −I2 , then det (A) = det (02 ) = 0, while det (B) = det (C) = 1. Thus, det (B) + det (C) = 1 + 1 = 2 = 0. Next, we supply some examples of the last two properties, P7 and P8. Example 3.2.10 Evaluate 1 2 −3 −2 4 6 (a) −3 −6 9 2 11 −6 1 2 . 3 4 2 − 4x −4 2 (b) 5 + 3x 3 −3 . 1 − 2x −2 1 Solution: (a) We have 1 2 −3 −2 4 6 −3 −6 9 2 11 −6 1 1 2 11 21 −2 4 −2 2 = −3 = 0, 3 −3 −6 −3 3 4 2 11 2 4 since the first and third columns of the latter matrix are identical (see P7). 1 1. CM3 (− ) 3 i i i i i i i “main” 2007/2/16 page 206 i 206 CHAPTER 3 Determinants (b) Applying P5 to the first column, we have 2 − 4x −4 2 2 −4 2 −4x −4 2 5 + 3x 3 −3 = 5 3 −3 + 3x 3 −3 1 − 2x −2 1 1 −2 1 −2x −2 1 1 −2 1 −4 −4 2 = 2 5 3 −3 + x 3 3 −3 = 0 + 0 = 0, 1 −2 1 −2 −2 1 since the first and third rows of the first matrix agree, and the first and second columns of the second matrix agree. Example 3.2.11 If A= sin φ cos φ − cos φ sin φ and B= cos θ − sin θ sin θ cos θ , show that det (AB) = 1. Solution: Using P8, we have det (AB) = det (A) det (B) = (sin2 φ + cos2 φ)(cos2 θ + sin2 θ) = 1 · 1 = 1. Example 3.2.12 Find all x satisfying x2 x 1 1 1 1 = 0. 4 21 Solution: If we expanded this determinant according to Definition 3.1.8 (or using the schematic in Figure 3.1.1), then we would have a quadratic equation in x . Thus, there are at most two distinct values of x that satisfy the equation. By inspection, the determinant vanishes when x = 1 (since the first two rows of the matrix coincide in this case), and it vanishes when x = 2 (since the first and third rows of the matrix coincide in this case). Consequently, the two values of x satisfying the given equation are x = 1 and x = 2. Proofs of the Properties of Determinants We now prove the properties P1–P8. Proof of P1: Let B be the matrix obtained by interchanging row r with row s in A. Then the elements of B are related to those of A as follows: aij if i = r, s, bij = asj if i = r , arj if i = s . Thus, from Definition 3.1.8, det (B) = = σ (p1 , p2 , · · · , pr , · · · , ps , · · · , pn )b1p1 b2p2 · · · brpr · · · bsps · · · bnpn σ (p1 , p2 , · · · , pr , · · · , ps , · · · , pn )a1p1 a2p2 · · · aspr · · · arps · · · anpn . i i i i i i i “main” 2007/2/16 page 207 i 3.2 Properties of Determinants 207 Interchanging pr and ps in σ (p1 , p2 , . . . , pr , . . . , ps , . . . 
, pn ) and recalling from Theorem 3.1.7 that such an interchange has the effect of changing the parity of the permutation, we obtain σ (p1 , p2 , · · · , ps , · · · , pr , · · · , pn )a1p1 a2p2 · · · arps · · · aspr · · · anpn , det (B) = − where we have also rearranged the terms so that the row indices are in their natural increasing order. The sum on the right-hand side of this equation is just det(A), so that det (B) = − det (A). Proof of P2: Let B be the matrix obtained by multiplying the i th row of A through by any scalar k . Then bij = kaij for each j . Then det (B) = = σ (p1 , p2 , · · · , pn )b1p1 b2p2 · · · bnpn σ (p1 , p2 , · · · , pn )a1p1 a2p2 · · · (kaipi ) · · · anpn = k det (A). We prove properties P5 and P7 next, since they simplify the proof of P3. Proof of P5: The elements of A are akj = akj , bij + cij , if k = i , if k = i . Thus, from Definition 3.1.8, det (A) = σ (p1 , p2 , · · · , pn )a1p1 a2p2 · · · anpn = σ (p1 , p2 , · · · , pn )a1p1 a2p2 · · · ai −1pi −1 (bipi + cipi )ai +1pi +1 · · · anpn = σ (p1 , p2 , · · · , pn )a1p1 a2p2 · · · ai −1pi −1 bipi ai +1pi +1 · · · anpn + σ (p1 , p2 , · · · , pn )a1p1 a2p2 · · · ai −1pi −1 cipi ai +1pi +1 · · · anpn = det (B) + det (C). Proof of P7: Suppose rows i and j in A are the same. Then if we interchange these rows, the matrix, and hence its determinant, are unaltered. However, according to P1, the determinant of the resulting matrix is − det (A). Therefore, det (A) = − det (A), which implies that det (A) = 0. Proof of P3: Let A = [a1 , a2 , . . . , an ]T , and let B be the matrix obtained from A when k times row j of A is added to row i of A. Then B = [a1 , a2 , . . . , ai + k aj , . . . , an ]T i i i i i i i “main” 2007/2/16 page 208 i 208 CHAPTER 3 Determinants so that, using P5, det (B) = det ([a1 , a2 , . . . , ai + k aj , . . . , an ]T ) = det ([a1 , a2 , . . . , an ]T ) + det ([a1 , a2 , . . . , k aj , . . . , an ]T ). By P2, we can factor out k from row i of the second determinant on the right-hand side. If we do this, it follows that row i and row j of the resulting determinant are the same, and so, from P7, the value of the second determinant is zero. Thus, det (B) = det ([a1 , a2 , . . . , an ]T ) = det (A), as required. Proof of P4: Using Definition 3.1.8, we have det (AT ) = σ (p1 , p2 , . . . , pn )ap1 1 ap2 2 ap3 3 · · · apn n . (3.2.2) Since (p1 , p2 , . . . , pn ) is a permutation of 1, 2, . . . , n, it follows that, by rearranging terms, ap1 1 ap2 2 ap3 3 · · · apn n = a1q1 a2q2 a3q3 · · · anqn , (3.2.3) for appropriate values of q1 , q2 , . . . , qn . Furthermore, N(p1 , . . . , pn ) = # of interchanges in changing (1, 2, . . . , n) to (p1 , p2 , . . . , pn ) = # of interchanges in changing (p1 , p2 , . . . , pn ) to (1, 2, . . . , n) and by (3.2.3), this number is = # of interchanges in changing (1, 2, . . . , n) to (q1 , q2 , . . . , qn ) = N(q1 , . . . , qn ). Thus, σ (p1 , p2 , . . . , pn ) = σ (q1 , q2 , . . . , qn ). (3.2.4) Substituting Equations (3.2.3) and (3.2.4) into Equation (3.2.2), we have det (AT ) = σ (q1 , q2 , . . . , qn )a1q1 a2q2 a3q3 · · · anqn = det (A). Proof of P6: Since each term σ (p1 , p2 , . . . , pn )a1p1 a2p2 · · · anpn in the formula for det(A) contains a factor from the row (or column) of zeros, each such term is zero. Thus, det(A) = 0. Proof of P8: Let E denote an elementary matrix. We leave it as an exercise (Problem 51) to verify that −1, if E permutes rows, det (E) = +1, if E adds a multiple of one row to another row, k, if E scales a row by k . 
It then follows from properties P1–P3 that in each case det (EA) = det (E) det (A). (3.2.5) Now consider a general product AB . We need to distinguish two cases. i i i i i i i “main” 2007/2/16 page 209 i 3.2 Properties of Determinants 209 Case 1: If A is not invertible, then from Corollary 2.6.12, so is AB . Consequently, applying Theorem 3.2.4, det (AB) = 0 = det (A) det (B). Case 2: If A is invertible, then from Section 2.7, we know that it can be expressed as the product of elementary matrices, say, A = E1 E2 · · · Er . Hence, repeatedly applying (3.2.5) gives det (AB) = det (E1 E2 · · · Er B) = det (E1 ) det (E2 · · · Er B) = det (E1 ) det (E2 ) · · · det (Er ) det (B) = det (E1 E2 · · · Er ) det (B) = det (A) det (B). Exercises for 3.2 Skills 6. If A and B are n × n matrices, then det(AB) = det(BA). • Be able to compute the determinant of an upper or lower triangular matrix “at a glance” (Theorem 3.2.1). Problems • Know the effects that elementary row operations have on the determinant of a matrix. For Problems 1–12, reduce the given determinant to upper triangular form and then evaluate. • Likewise, be comfortable with the effects that column operations have on the determinant of a matrix. • Be able to use the determinant to decide if a matrix is invertible (Theorem 3.2.4). • Know how the determinant is affected by matrix multiplication and by matrix transpose. True-False Review For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. If each element of an n × n matrix is doubled, then the determinant of the matrix also doubles. 2. Multiplying a row of an n × n matrix through by a scalar c has the same effect on the determinant as multiplying a column of the matrix through by c. 3. If A is an n × n matrix, then det(A5 ) = (det A)5 . 4. If A is a real n × n matrix, then det(A2 ) cannot be negative. x2 x 5. The matrix y2 y x = 0 or y = 0. is not invertible if and only if 1 23 1. 2 6 4 . 3 −5 2 2. 2 −1 4 3 21 . −2 1 4 21 3 3. −1 2 6 . 4 1 12 0 1 −2 4. −1 0 3 . 2 −3 0 37 1 5. 5 9 −6 . 21 3 1 −1 2 4 3 124 6. . −1 1 3 2 2 142 2 26 7. 2 1 32 104 56 40 1 4 26 −13 . 2 7 1 5 i i i i i i i “main” 2007/2/16 page 210 i 210 CHAPTER 3 0 1 −1 −1 0 1 8. 1 −1 0 −1 −1 −1 2 3 9. 4 5 1 0 1 2 3 1 4 5 Determinants 1 1 . 1 0 5 2 . 3 3 20. Determine all values of the constant k for which the given system has a unique solution x1 + kx2 = b1 , kx1 + 4x2 = b2 . 21. Determine all values of the constant k for which the given system has an infinite number of solutions. x1 + 2x2 + kx3 = 0, 2x1 − kx2 + x3 = 0, 3x1 + 6x2 + x3 = 0. 2 −1 3 4 7 12 3 10. . −2 4 8 6 6 −6 18 −24 7 −1 3 4 14 2 4 6 11. . 21 1 3 4 −7 4 5 8 3 1 12. 4 3 8 7 12 3 1 −1 0 1 8 −1 6 6 . 7 09 4 16 −1 8 12 22. Determine all values of k for which the given system has an infinite number of solutions. x1 + 2x2 + x3 = kx1 , 2x1 + x2 + x3 = kx2 , x1 + x2 + 2x3 = kx3 . 23. Determine all values of k for which the given system has a unique solution. x1 + kx2 = 2, kx1 + x2 + x3 = 1, x1 + x2 + x3 = 1. For Problems 13–19, use Theorem 3.2.4 to determine whether the given matrix is invertible or not. 13. 14. 15. 16. 17. 18. 19. 21 . 32 −1 1 . 1 −1 2 6 −1 3 5 1 . 20 1 −1 2 3 5 −2 1 . 8 −2 5 1 0 2 −1 3 −2 1 4 2 1 6 2 . 1 −3 4 0 11 1 1 −1 1 −1 1 1 1 −1 −1 . −1 1 1 −1 1 2 −3 5 −1 2 −3 6 2 3 −1 4 . 1 −2 3 −6 1 −1 2 A = 3 1 4, 0 13 24. 
If find det(A), and use properties of determinants to find det(AT ) and det(−2A). 25. If A= 1 −1 23 and B= 12 , −2 4 evaluate det (AB) and verify P8. 26. If A= cosh x sinh x sinh x cosh x and B = cosh y sinh y , sinh y cosh y evaluate det(AB). For Problems 27–29, use properties of determinants to show that det(A) = 0 for the given matrix A. 32 1 27. A = 6 4 −1 . 96 2 i i i i i i i “main” 2007/2/16 page 211 i 3.2 1 −3 1 28. A = 2 −1 7 . 3 1 13 1 + 3a 1 3 29. A = 1 + 2a 1 2 . 2 20 (b) Does your answer to (a) change if we instead consider the volume of the parallelepiped determined by the column vectors of the matrix A? Why or why not? ab cd and assume det(A) = 1. Find det(B). 30. B = 3c 3d . 4a 4b 31. B = −2a −2c . 3a + b 3c + d 32. B = −b −a . d − 4b c − 4a abc A = d e f ghi and assume det(A) = −6. Find det(B). −4d −4e −4f 33. B = g + 5a h + 5b i + 5c . a b c d e f 34. B = −3a −3b −3c . g − 4d h − 4e i − 4f 2a 2d 2g 35. B = b − c e − f h − i . c−a f −d i−g For Problems 36–40, let A and B be 4 × 4 matrices such that det(A) = 5 and det(B) = 3. Compute the determinant of the given matrix. (c) For what value(s) of k , if any, is A invertible? 42. Without expanding the determinant, determine all values of x for which det(A) = 0 if 1 −1 x A = 2 1 x2 . 4 −1 x 3 43. Use only properties P5, P1, and P2 to show that αx − βy βx − αy αβ = (x 2 + y 2 ) . βx + αy αx + βy βα 44. Use only properties P5, P1, and P2 to find the value of αβγ such that a1 + βb1 b1 + γ c1 c1 + αa1 a2 + βb2 b2 + γ c2 c2 + αa2 = 0 a3 + βb3 b3 + γ c3 c3 + αa3 for all values of ai , bi , ci . 45. Use only properties P3 and P7 to prove property P6. 46. An n × n matrix A that satisfies AT = A−1 is called an orthogonal matrix. Show that if A is an orthogonal matrix, then det(A) = ±1. 47. (a) Use the definition of a determinant to prove that if A is an n × n lower triangular matrix, then n det (A) = a11 a22 a33 · · · ann = aii . i =1 36. AB T . 37. A2 B 5 . (b) Evaluate the following determinant by first reducing it to lower triangular form and then using the result from (a): 38. (A−1 B 2 )3 . 39. ((2B)−1 (AB)T ). 40. (5A)(2B). 41. Let 211 (a) In terms of k , find the volume of the parallelepiped determined by the row vectors of the matrix A. For Problems 30–32, let A = For Problems 33–35, let Properties of Determinants 124 A = 3 1 6. k32 2 −1 3 5 1 221 . 3 014 1 201 48. Use determinants to prove that if A is invertible and B and C are matrices with AB = AC , then B = C . i i i i i i i “main” 2007/2/16 page 212 i 212 CHAPTER 3 Determinants 49. If A and S are n × n matrices with S invertible, show that det (S −1 AS) = det (A). [Hint: Since S −1 S = In , how are det (S −1 ) and det (S) related?] 50. If det(A3 ) = 0, is it possible for A to be invertible? Justify your answer. 51. Let E be an elementary matrix. Verify the formula for det(E) given in the text at the beginning of the proof of P8. (b) Verify property P2 of determinants in the case when row 1 of A is divided by k . (c) Verify property P3 of determinants in the case when k times row 2 is added to row 1. 57. 58. 52. Show that xy1 x1 y1 1 = 0 x2 y2 1 represents the equation of the straight line through the distinct points (x1 , y1 ) and (x2 , y2 ). 59. 54. If A is an n × n skew-symmetric matrix and n is odd, prove that det(A) = 0. 55. Let A = [a1 , a2 , . . . , an ] be an n × n matrix, and let b = c1 a1 + c2 a2 + · · · + cn an , where c1 , c2 , . . . , cn are constants. 
If Bk denotes the matrix obtained from A by replacing the k th column vector by b, prove that det (Bk ) = ck det (A), 56. k = 1, 2, . . . , n. Let A be the general 4 × 4 matrix. (a) Verify property P1 of determinants in the case when the first two rows of A are permuted. 3.3 Determine all values of a for which 1234a 2 1 2 3 4 3 2 1 2 3 4 3 2 1 2 a4321 is invertible. 53. Without expanding the determinant, show that 1 x x2 1 y y 2 = (y − z)(z − x)(x − y). 1 z z2 For a randomly generated 5 × 5 matrix, verify that det(AT ) = det(A). If 14 1 A = 3 2 1, 3 4 −1 determine all values of the constant k for which the linear system (A − kI3 )x = 0 has an infinite number of solutions, and find the corresponding solutions. 60. Use the determinant to show that 1234 2 1 2 3 A= 3 2 1 2 4321 is invertible, and use A−1 to solve Ax = b if b = [3, 7, 1, −4]T . Cofactor Expansions We now obtain an alternative method for evaluating determinants. The basic idea is that we can reduce a determinant of order n to a sum of determinants of order n−1. Continuing in this manner, it is possible to express any determinant as a sum of determinants of order 2. This method is the one most frequently used to evaluate a determinant by hand, although the procedure introduced in the previous section whereby we use elementary row operations to reduce the matrix to upper triangular form involves less work in general. When A is invertible, the technique we derive leads to formulas for both A−1 and the unique solution to Ax = b. We first require two preliminary definitions. DEFINITION 3.3.1 Let A be an n × n matrix. The minor, Mij , of the element aij , is the determinant of the matrix obtained by deleting the i th row vector and j th column vector of A. i i i i i i i “main” 2007/2/16 page 213 i 3.3 Cofactor Expansions 213 Remark Notice that if A is an n × n matrix, then Mij is a determinant of order n − 1. By convention, if n = 1, we define the “empty” determinant M11 to be 1. Example 3.3.2 If a11 a12 a13 A = a21 a22 a23 , a31 a32 a33 then, for example, M23 = Example 3.3.3 a11 a12 a31 a32 and M31 = a12 a13 . a22 a23 Determine the minors M11 , M23 , and M31 for 21 3 A = −1 4 −2 . 31 5 Solution: Using Definition 3.3.1, we have M11 = 4 −2 = 22, 15 M23 = 21 = −1, 31 M31 = 13 = −14. 4 −2 DEFINITION 3.3.4 Let A be an n × n matrix. The cofactor, Cij , of the element aij , is defined by Cij = (−1)i +j Mij , where Mij is the minor of aij . From Definition 3.3.4, we see that the cofactor of aij and the minor of aij are the same if i + j is even, and they differ by a minus sign if i + j is odd. The appropriate sign in the cofactor Cij is easy to remember, since it alternates in the following manner: + − + . . . Example 3.3.5 − + − . . . + − + . . . − + − . . . + ··· − ··· + ··· . . . . Determine the cofactors C11 , C23 , and C31 for the matrix in Example 3.3.3. Solution: We have already obtained the minors M11 , M23 , and M31 in Example 3.3.3, so it follows that C11 = +M11 = 22, C23 = −M23 = 1, C31 = +M31 = −14. i i i i i i i “main” 2007/2/16 page 214 i 214 CHAPTER 3 Determinants Example 3.3.6 If A = a11 a12 , verify that det (A) = a11 C11 + a12 C12 . a21 a22 Solution: In this case, C11 = + det [a22 ] = a22 , C12 = − det [a12 ] = −a12 , so that a11 C11 + a12 C12 = a11 a22 + a12 (−a21 ) = det (A). The preceding example is a special case of the following important theorem. Theorem 3.3.7 (Cofactor Expansion Theorem) Let A be an n × n matrix. 
If we multiply the elements in any row (or column) of A by their cofactors, then the sum of the resulting products is det (A). Thus, 1. If we expand along row i , n det (A) = ai 1 Ci 1 + ai 2 Ci 2 + · · · + ain Cin = aik Cik . k =1 2. If we expand along column j , n det (A) = a1j C1j + a2j C2j + · · · + anj Cnj = akj Ckj . k =1 The expressions for det(A) appearing in this theorem are known as cofactor expansions. Notice that a cofactor expansion can be formed along any row or column of A. Regardless of the chosen row or column, the cofactor expansion will always yield the determinant of A. However, sometimes the calculation is simpler if the row or column of expansion is wisely chosen. We will illustrate this in the examples below. The proof of the Cofactor Expansion Theorem will be presented after some examples. Example 3.3.8 Use the Cofactor Expansion Theorem along (a) row 1, (b) column 3 to find 2 34 1 −1 1 . 6 30 i i i i i i i “main” 2007/2/16 page 215 i 3.3 Cofactor Expansions 215 Solution: (a) We have 2 34 −1 1 11 1 −1 1 −1 1 = 2 −3 +4 = −6 + 18 + 36 = 48. 30 60 63 6 30 (b) We have 2 34 1 −1 23 1 −1 1 = 4 −1 + 0 = 36 + 12 + 0 = 48. 63 63 6 30 Notice that (b) was easier than (a) in the previous example, because of the zero in column 3. Whenever one uses the cofactor expansion method to evaluate a determinant, it is usually best to select a row or column containing as many zeros as possible in order to minimize the amount of computation required. Example 3.3.9 Evaluate 0 5 7 6 Solution: we have 0 5 7 6 3 −1 0 0 82 . 2 54 1 70 In this case, it is easiest to use either row 1 or column 4. Choosing row 1, 3 −1 0 582 502 0 82 = −3 7 5 4 + (−1) 7 2 4 2 54 670 610 1 70 = −3 [2 (49 − 30) − 4 (35 − 48) + 0] − [5 (0 − 4) − 0 + 2 (7 − 12)] = −240. In evaluating the determinants of order 3 on the right side of the first equality, we have used cofactor expansion along column 3 and row 1, respectively. For additional practice, the reader may wish to verify our result here by cofactor expansion along a different row or column. Now we turn to the Proof of the Cofactor Expansion Theorem: It follows from the definition of the determinant that det(A) can be written in the form ˆ ˆ ˆ det (A) = ai 1 Ci 1 + a12 Ci 2 + · · · + ain Cin (3.3.1) ˆ where the coefficients Cij contain no elements from row i or column j . We must show that ˆ Cij = Cij where Cij is the cofactor of aij . Consider first a11 . From Definition 3.1.8, the terms of det(A) that contain a11 are given by a11 σ (1, p2 , p3 , . . . , pn )a2p2 a3p3 · · · anpn , i i i i i i i “main” 2007/2/16 page 216 i 216 CHAPTER 3 Determinants where the summation is over the (n − 1)! distinct permutations of 2, 3, . . . , n. Thus, ˆ C11 = σ (1, p2 , p3 , . . . , pn )a2p2 a3p3 · · · anpn . However, this summation is just the minor M11 , and since C11 = M11 , we have shown the coefficient of a11 in det(A) is indeed the cofactor C11 . Now consider the element aij . By successively interchanging adjacent rows and columns of A, we can move aij into the (1, 1) position without altering the relative positions of the other rows and columns of A. We let A denote the resulting matrix. Obtaining A from A requires i − 1 row interchanges and j − 1 column interchanges. Therefore, the total number of interchanges required to obtain A from A is i + j − 2. Consequently, det (A) = (−1)i +j −2 det (A ) = (−1)i +j det (A ). Now for the key point. The coefficient of aij in det(A) must be (−1)i +j times the coefficient of aij in det(A ). 
But, aij occurs in the (1, 1) position of A , and so, as we have previously shown, its coefficient in det(A ) is M11 . Since the relative positions of the remaining rows in A have not altered, it follows that M11 = Mij , and therefore the coefficient of aij in det(A ) is Mij . Consequently, the coefficient of aij in det(A) is (−1)i +j Mij = Cij . Applying this result to the elements ai 1 , ai 2 , . . . , ain and comparing with (3.3.1) yields ˆ Cij = Cij , j = 1, 2, . . . , n, which establishes the theorem for expansion along a row. The result for expansion along a column follows directly, since det(AT ) = det(A). We now have two computational methods for evaluating determinants: the use of elementary row operations given in the previous section to reduce the matrix in question to upper triangular form, and the Cofactor Expansion Theorem. In evaluating a given determinant by hand, it is usually most efficient (and least error prone) to use a combination of the two techniques. More specifically, we use elementary row operations to set all except one element in a row or column equal to zero and then use the Cofactor Expansion Theorem on that row or column. We illustrate with an example. Example 3.3.10 Evaluate 2 1 −1 1 Solution: 2 1 −1 1 18 41 21 3 −1 6 3 . 4 2 We have 18 41 21 3 −1 6 0 −7 6 0 −7 6 0 −7 60 311 4 1 32 3 4 = = − 6 2 7 = − −1 −12 0 = 90. 4 0627 −1 −2 −1 −1 −2 −1 2 0 −1 −2 −1 1. A21 (−2), A23 (1), A24 (−1) 2. Cofactor expansion along column 1 3. A32 (7) 4. Cofactor expansion along column 3 i i i i i i i “main” 2007/2/16 page 217 i 3.3 Example 3.3.11 Cofactor Expansions 217 Determine all values of k for which the system 10x1 + kx2 − x3 = 0, kx1 + x2 − x3 = 0, 2x1 + x2 − 3x3 = 0, has nontrivial solutions. Solution: We will apply Corollary 3.2.5. The determinant of the matrix of coefficients of the system is 10 k −1 10 k −1 k − 10 1 − k 1 2 det (A) = k 1 −1 = k − 10 1 − k 0 = − −28 1 − 3k 2 1 −3 −28 1 − 3k 0 = − [(k − 10)(1 − 3k) − (−28)(1 − k)] = 3k 2 − 3k − 18 = 3(k 2 − k − 6) = 3(k − 3)(k + 2). 1. A12 (−1), A13 (−3) 2. Cofactor expansion along column 3. From Corollary 3.2.5, the system has nontrivial solutions if and only if det(A) = 0; that is, if and only if k = 3 or k = −2. The Adjoint Method for A−1 We next establish two corollaries to the Cofactor Expansion Theorem that, in the case of an invertible matrix A, lead to a method for expressing the elements of A−1 in terms of determinants. Corollary 3.3.12 If the elements in the i th row (or column) of an n × n matrix A are multiplied by the cofactors of a different row (or column), then the sum of the resulting products is zero. That is, 1. If we use the elements of row i and the cofactors of row j , n aik Cj k = 0, i = j. (3.3.2) k =1 2. If we use the elements of column i and the cofactors of column j , n aki Ckj = 0, i = j. (3.3.3) k =1 Proof We prove (3.3.2). Let B be the matrix obtained from A by adding row i to row j (i = j ) in the matrix A. By P3, det(B) = det(A). Cofactor expansion of B along row j gives n det (A) = det (B) = n (aj k + aik )Cj k = k =1 n aj k Cj k + k =1 aik Cj k . k =1 i i i i i i i “main” 2007/2/16 page 218 i 218 CHAPTER 3 Determinants That is, n det (A) = det (A) + aik Cj k , k =1 since by the Cofactor Expansion Theorem the first summation on the right-hand side is simply det (A). It follows immediately that n aik Cj k = 0, i = j. k =1 Equation (3.3.3) can be proved similarly (Problem 47). The Cofactor Expansion Theorem and the above corollary can be combined into the following corollary. 
Corollary 3.3.13 Let A be an n × n matrix. If δij is the Kronecker delta symbol (see Definition 2.2.19), then n n aik Cj k = δij det (A), k =1 aki Ckj = δij det (A). (3.3.4) k =1 The formulas in (3.3.4) should be reminiscent of the index form of the matrix product. Combining this with the fact that the Kronecker delta gives the elements of the identity matrix, we might suspect that (3.3.4) is telling us something about the inverse of A. Before establishing that this suspicion is indeed correct, we need a definition. DEFINITION 3.3.14 If every element in an n × n matrix A is replaced by its cofactor, the resulting matrix is called the matrix of cofactors and is denoted MC . The transpose of the matrix of T cofactors, MC , is called the adjoint of A and is denoted adj(A). Thus, the elements of adj(A) are adj(A)ij = Cj i . Example 3.3.15 Determine adj(A) if Solution: 2 0 −3 A = −1 5 4 . 3 −2 0 We first determine the cofactors of A: C11 = 8, Thus, C12 = 12, C13 = −13, C21 = 6, C22 = 9, C31 = 15, C32 = −5, C33 = 10. C23 = 4, 8 12 −13 4, MC = 6 9 15 −5 10 i i i i i i i “main” 2007/2/16 page 219 i 3.3 so that Cofactor Expansions 219 8 6 15 T adj(A) = MC = 12 9 −5 . −13 4 10 We can now prove the next theorem. Theorem 3.3.16 (The Adjoint Method for Computing A−1 ) If det (A) = 0, then A−1 = 1 adj(A). det (A) 1 adj(A). Then we must establish that AB = In = BA. But, det (A) using the index form of the matrix product, Proof Let B = n (AB)ij = n aik bkj = k =1 aik · k =1 1 1 · adj(A)kj = det (A) det (A) n aik Cj k = δij , k =1 where we have used Equation (3.3.4) in the last step. Consequently, AB = In . We leave it as an exercise (Problem 53) to verify that BA = In also. Example 3.3.17 For the matrix in Example 3.3.15, det (A) = 55, so that A− 1 8 6 15 1 12 9 −5 . = 55 −13 4 10 For square matrices of relatively small size, the adjoint method for computing A−1 is often easier than using elementary row operations to reduce A to upper triangular form. In Chapter 7, we will find that the solution of a system of differential equations can be expressed naturally in terms of matrix functions. Certain problems will require us to find the inverse of such matrix functions. For 2 × 2 systems, the adjoint method is very quick. Example 3.3.18 Find A−1 if A = Solution: e 2 t e −t . 3e2t 6e−t In this case, det (A) = (e2t )(6e−t ) − (3e2t )(e−t ) = 3et , and adj(A) = 6e−t −e−t −3e2t e 2t , 2 e −2 t − 1 e −2 t 3 1t −et 3e . so that A−1 = i i i i i i i “main” 2007/2/16 page 220 i 220 CHAPTER 3 Determinants Cramer’s Rule We now derive a technique that enables us, in the case when det(A) = 0, to express the unique solution of an n × n linear system Ax = b directly in terms of determinants. Let Bk denote the matrix obtained by replacing the k th column vector of A with b. Thus, a11 a12 . . . b1 . . . a1n a21 a22 . . . b2 . . . a2n Bk = . . . . . . . . . .. . . an1 an2 . . . bn . . . ann The key point to notice is that the cofactors of the elements in the k th column of Bk coincide with the corresponding cofactors of A. Thus, expanding det(Bk ) along the k th column using the Cofactor Expansion Theorem yields n det (Bk ) = b1 C1k + b2 C2k + · · · + bn Cnk = bi Cik , k = 1, 2, . . . , n, (3.3.5) i =1 where the Cij are the cofactors of A. We can now prove Cramer’s rule. Theorem 3.3.19 (Cramer’s Rule) If det (A) = 0, the unique solution to the n × n system Ax = b is (x1 , x2 , . . . , xn ), where det (Bk ) , det (A) xk = k = 1, 2, . . . , n. 
(3.3.6) Proof If det(A) = 0, then the system Ax = b has the unique solution x = A−1 b, (3.3.7) where, from Theorem 3.3.16, we can write A−1 = If we let 1 adj(A). det (A) x1 x2 x= . . . (3.3.8) and xn b1 b2 b= . . . bn and recall that adj(A)ij = Cj i , then substitution from (3.3.8) into (3.3.7) and use of the index form of the matrix product yields n xk = i =1 (A−1 )ki bi = n i =1 1 adj(A)ki bi det (A) i i i i i i i “main” 2007/2/16 page 221 i 3.3 = 1 det (A) Cofactor Expansions 221 n k = 1, 2, . . . , n. Cik bi , i =1 Using (3.3.5), we can write this as xk = det (Bk ) , det (A) k = 1, 2, . . . , n as required. Remark In general, Cramer’s rule requires more work than the Gaussian elimination method, and it is restricted to n × n systems whose coefficient matrix is invertible. However, it is a powerful theoretical tool, since it gives us a formula for the solution of an n × n system, provided det(A) = 0. Example 3.3.20 Solve 3x1 + 2x2 − x3 = 4, x1 + x2 − 5x3 = −3, −2x1 − x2 + 4x3 = 0. Solution: The following determinants are easily evaluated: det (A) = det (B2 ) = 3 2 −1 1 1 −5 = 8, −2 −1 4 4 2 −1 det (B1 ) = −3 1 −5 = 17, 0 −1 4 3 4 −1 1 −3 −5 = −6, −2 0 4 det (B3 ) = Inserting these results into (3.3.6) yields x1 = 3 that the solution to the system is ( 17 , − 4 , 7 ). 8 8 17 8 , x2 324 1 1 −3 = 7. −2 −1 0 3 = − 6 = − 4 , and x3 = 7 , so 8 8 Exercises for 3.3 Key Terms Minor, Cofactor, Cofactor expansion, Matrix of cofactors, Adjoint, Cramer’s rule. Skills • Be able to compute the minors and cofactors of a matrix. • Understand the difference between Mij and Cij . • Be able to compute the determinant of a matrix via cofactor expansion. • Be able to compute the matrix of cofactors and the adjoint of a matrix. • Be able to use the adjoint of an invertible matrix A to compute A−1 . • Be able to use Cramer’s rule to solve a linear system of equations. True-False Review For Questions 1–7, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. The (2, 3)-minor of a matrix is the same as the (2, 3)cofactor of the matrix. 2. We have A · adj(A) = det (A) · In for all n × n matrices A. i i i i i i i “main” 2007/2/16 page 222 i 222 CHAPTER 3 Determinants 3. Cofactor expansion of a matrix along any row or column will yield the same result, although the individual terms in the expansion along different rows or columns can vary. 4. If A is an n × n matrix and c is a scalar, then adj(cA) = c · adj(A). 5. If A and B are 2 × 2 matrices, then adj(A + B) = adj(A) + adj(B). 6. If A and B are 2 × 2 matrices, then adj(AB) = adj(A) · adj(B). 7. For every n, adj(In ) = In . Problems For Problems 1–3, determine all minors and cofactors of the given matrix. 1 2. A = 3 2 2 3. A = 0 4 −1 2 −1 4 . 15 10 3 −1 0 . 15 4. If 0 2 −3 9. −2 0 5 , row 3. 3 −5 0 1 −2 3 0 4 0 7 −2 10. , column 4. 0134 1 5 −2 0 For Problems 11–19, evaluate the given determinant using the techniques of this section. 1 0 −2 11. 3 1 −1 . 72 5 −1 2 3 0 14. 2 −1 3 2 −1 3 13. 5 2 1 . 3 −3 7 14. 1 3 A= 7 5 3 −1 2 4 1 2 , 1 4 6 0 12 determine the minors M12 , M31 , M23 , M42 , and the corresponding cofactors. For Problems 5–10, use the Cofactor Expansion Theorem to evaluate the given determinant along the specified row or column. 5. 31 4 8. 7 1 2 , column 1. 2 3 −5 12. 1 −3 . 24 1. A = 2 1 −4 7. 7 1 3 , row 2. 1 5 −2 1 −2 , row 1. 13 −1 2 3 6. 
1 4 −2 , column 3. 31 4 0 −2 1 2 0 −3 . −1 3 0 1 0 15. −1 0 0 −1 0 1 0 −1 . 0 −1 0 101 2 −1 3 1 1 4 −2 3 16. . 0 2 −1 0 1 3 −2 4 352 6 2 3 5 −5 17. . 7 5 −3 −16 9 −6 27 −12 2 −7 4 3 5 5 −3 7 18. . 6 2 63 4 2 −4 5 i i i i i i i “main” 2007/2/16 page 223 i 3.3 2 0 19. 0 1 3 0 −1 3 0 3 0 12 1 3 04. 0 1 −1 0 0 2 05 20. If 0xyz −x 0 1 −1 A= −y −1 0 1 , −z 1 −1 0 show that det(A) = (x + y + z)2 . 21. (a) Consider the 3 × 3 Vandermonde determinant V (r1 , r2 , r3 ) defined by 111 V (r1 , r2 , r3 ) = r1 r2 r3 . 222 r1 r2 r3 Show that V (r1 , r2 , r3 ) = (r2 − r1 )(r3 − r1 )(r3 − r2 ). (b) More generally, show that the n × n Vandermonde determinant 1 r1 2 V (r1 , r2 , . . . , rn ) = r1 . . . 1 ... r2 . . . 2 r2 . . . . . . 1 rn 2 rn . . . n n n r1 −1 r2 −1 . . . rn −1 has value V (r1 , r2 , . . . , rn ) = (rm − ri ). 1≤i<m≤n For Problems 22–31, find (a) det(A), (b) the matrix of cofactors MC , (c) adj(A), and, if possible, (d) A−1 . 22. A = 31 . 45 23. A = −1 −2 . 41 52 24. A = . −15 −6 2 −3 0 25. A = 2 1 5 . 0 −1 2 Cofactor Expansions 223 −2 3 −1 26. A = 2 1 5 . 02 3 1 −1 2 27. A = 3 −1 4 . 5 17 0 12 28. A = −1 −1 3 . 1 −2 1 2 −3 5 29. A = 1 2 1 . 0 7 −1 11 1 1 −1 1 −1 1 30. A = 1 1 −1 −1 . −1 1 1 −1 103 5 −2 1 1 3 31. A = 3 9 0 2 . 2 0 3 −1 1 −2x 2x 2 32. Let A = 2x 1 − 2x 2 −2x . 2x 1 2x 2 (a) Show that det(A) = (1 + 2x 2 )3 . (b) Use the adjoint method to find A−1 . In Problems 33–35, find the specified element in the inverse of the given matrix. Do not use elementary row operations. 111 33. A = 1 2 2 ; (3, 2)-element. 123 2 0 −1 34. A = 2 1 1 ; (3, 1)-element. 3 −1 0 1 0 10 2 −1 1 3 35. A = 0 1 −1 2 ; (2, 3)-element. −1 1 2 0 In Problems 36–38, find A−1 . 36. A = 3et e2t . 2e t 2e 2t 37. A = et sin 2t −e−t cos 2t . et cos 2t e−t sin 2t i i i i i i i “main” 2007/2/16 page 224 i 224 CHAPTER 3 Determinants et tet e−2t 38. A = et 2tet e−2t . et tet 2e−2t 46. Find all solutions to the system (b + c)x1 + a(x2 + x3 ) = a, (c + a)x1 + b(x3 + x1 ) = b, (a + b)x1 + c(x1 + x2 ) = c, 123 A = 3 4 5, 456 39. If compute the matrix product A · adj(A). What can you conclude about det(A)? For Problems 40–43, use Cramer’s rule to solve the given linear system. where a, b, c are constants. Make sure you consider all cases (that is, those when there is a unique solution, an infinite number of solutions, and no solutions). 47. Prove Equation (3.3.3). 48. Let A be a randomly generated invertible 4 × 4 matrix. Verify the Cofactor Expansion Theorem for expansion along row 1. 40. 2x1 − 3x2 = 2, x1 + 2x2 = 4. 49. 41. 3x1 − 2x2 + x3 = 4, x1 + x2 − x3 = 2, x1 + x3 = 1. Let A be a randomly generated 4 × 4 matrix. Verify Equation (3.3.3) when i = 2 and j = 4. 50. x1 − 3x2 + x3 = 0, x1 + 4x2 − x3 = 0, 2x1 + x2 − 3x3 = 0. Let A be a randomly generated 5 × 5 matrix. Determine adj(A) and compute A · adj(A). Use your result to determine det(A). 51. 42. 43. x1 − 2x2 + 3x3 − x4 + x3 2x1 x1 + x2 − x4 x2 − 2x3 + x4 = = = = 1, 2, 0, 3. 1.21x1 + 3.42x2 + 2.15x3 = 3.25, 5.41x1 + 2.32x2 + 7.15x3 = 4.61, 21.63x1 + 3.51x2 + 9.22x3 = 9.93. Round answers to two decimal places. 44. Use Cramer’s rule to determine x1 and x2 if et x1 + e−2t x2 = 3 sin t, et x1 − 2e−2t x2 = 4 cos t. 52. 45. Determine the value of x2 such that x1 2x1 x1 3x1 + + + + 4x2 9x2 5x2 14x2 − − + + 3.4 2x3 3x3 x3 7x3 + − − − Solve the system of equations x4 2x4 x4 2x4 = = = = 2, 5, 3, 6. Use Cramer’s rule to solve the system Ax = b if 12344 68 2 1 2 3 4 −72 A = 3 2 1 2 3, and b = −87 . 4 3 2 1 2 79 44321 43 53. Verify that BA = In in the proof of Theorem 3.3.16. 
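Before moving on to the summary in the next section, it may be helpful to see the cofactor machinery of this section carried out by machine. The Python sketch below is an editorial illustration rather than part of the text: it implements a naive cofactor-expansion determinant, the matrix of cofactors, adj(A), and Cramer's rule (the names det_cofactor, adjoint, and cramer are illustrative choices, not library functions), and it is checked against Example 3.3.15 (where det(A) = 55) and Example 3.3.20 (where the solution is (17/8, −3/4, 7/8)).

```python
from fractions import Fraction

def minor(A, i, j):
    # Matrix obtained by deleting row i and column j (0-based indices).
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    # Naive cofactor expansion along the first row (formula (3.4.1) with i = 1).
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j)) for j in range(n))

def cofactor_matrix(A):
    n = len(A)
    return [[(-1) ** (i + j) * det_cofactor(minor(A, i, j)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    # adj(A) is the transpose of the matrix of cofactors.
    return [list(col) for col in zip(*cofactor_matrix(A))]

def cramer(A, b):
    # Cramer's rule: x_k = det(B_k) / det(A), where B_k is A with column k replaced by b.
    d = det_cofactor(A)
    xs = []
    for k in range(len(A)):
        Bk = [row[:k] + [b[i]] + row[k+1:] for i, row in enumerate(A)]
        xs.append(Fraction(det_cofactor(Bk), d))
    return xs

A = [[2, 0, -3], [-1, 5, 4], [3, -2, 0]]      # matrix of Example 3.3.15
print(det_cofactor(A))                         # 55
print(adjoint(A))                              # [[8, 6, 15], [12, 9, -5], [-13, 4, 10]]

A2 = [[3, 2, -1], [1, 1, -5], [-2, -1, 4]]     # coefficient matrix of Example 3.3.20
print(cramer(A2, [4, -3, 0]))                  # [Fraction(17, 8), Fraction(-3, 4), Fraction(7, 8)]
```

The output reproduces adj(A) from Example 3.3.15 and the solution (17/8, −3/4, 7/8) from Example 3.3.20, which gives a quick consistency check on hand computations.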
3.4 Summary of Determinants

The primary aim of this section is to serve as a stand-alone introduction to determinants for readers who desire only a cursory review of the major facts pertaining to determinants. It may also be used as a review of the results derived in Sections 3.1–3.3.

Formulas for the Determinant

The determinant of an n × n matrix A, denoted det(A), is a scalar whose value can be obtained in the following manner.

1. If A = [a11], then det(A) = a11.

2. If A = [a11 a12; a21 a22], then det(A) = a11 a22 − a12 a21.

3. For n > 2, the determinant of A can be computed using either of the following formulas:

   det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin,        (3.4.1)
   det(A) = a1j C1j + a2j C2j + · · · + anj Cnj,        (3.4.2)

where Cij = (−1)^(i+j) Mij, and Mij is the determinant of the matrix obtained by deleting the i th row and j th column of A. The formulas (3.4.1) and (3.4.2) are referred to as cofactor expansion along the i th row and cofactor expansion along the j th column, respectively. The determinants Mij and Cij are called the minors and cofactors of A, respectively. We also denote det(A) by

   | a11 a12 . . . a1n |
   | a21 a22 . . . a2n |
   |  .    .         .  |
   | an1 an2 . . . ann |

As an example, consider the general 3 × 3 matrix

       [ a11 a12 a13 ]
   A = [ a21 a22 a23 ] .
       [ a31 a32 a33 ]

Using cofactor expansion along row 1, we have

   det(A) = a11 C11 + a12 C12 + a13 C13.        (3.4.3)

We next compute the required cofactors:

   C11 = +M11 = | a22 a23; a32 a33 | = a22 a33 − a23 a32,
   C12 = −M12 = −| a21 a23; a31 a33 | = −(a21 a33 − a23 a31),
   C13 = +M13 = | a21 a22; a31 a32 | = a21 a32 − a22 a31.

Inserting these expressions for the cofactors into Equation (3.4.3) yields

   det(A) = a11(a22 a33 − a23 a32) − a12(a21 a33 − a23 a31) + a13(a21 a32 − a22 a31),

which can be written as

   det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31.

Figure 3.4.1: A schematic for obtaining the determinant of a 3 × 3 matrix A = [aij]. (The schematic repeats the first two columns of A to the right of the array; the three products taken along diagonals running down to the right carry plus signs, and the three products along diagonals running up to the right carry minus signs.)

Although we chose to use cofactor expansion along the first row to obtain the preceding formula, according to (3.4.1) and (3.4.2), the same result would have been obtained if we had chosen to expand along any row or column of A. A simple schematic for obtaining the terms in the determinant of a 3 × 3 matrix is given in Figure 3.4.1. By taking the product of the elements joined by each arrow and attaching the indicated sign to the result, we obtain the six terms in the determinant of the 3 × 3 matrix A = [aij]. Note that this technique for obtaining the terms in a 3 × 3 determinant does not generalize to determinants of larger matrices.

Example 3.4.1  Evaluate

   | 2 −1 1 |
   | 3  4 2 | .
   | 7  5 8 |

Solution: In this case, the schematic given in Figure 3.4.1 is

   2 −1 1 | 2 −1
   3  4 2 | 3  4
   7  5 8 | 7  5

so that

   | 2 −1 1 |
   | 3  4 2 | = (2)(4)(8) + (−1)(2)(7) + (1)(3)(5) − (7)(4)(1) − (5)(2)(2) − (8)(3)(−1) = 41.
   | 7  5 8 |

Properties of Determinants

Let A and B be n × n matrices. The determinant has the following properties:

P1. If B is obtained by permuting two rows (or columns) of A, then det(B) = −det(A).

P2. If B is obtained by multiplying any row (or column) of A by a scalar k, then det(B) = k det(A).

P3. If B is obtained by adding a multiple of any row (or column) of A to another row (or column) of A, then det(B) = det(A).

P4. det(A^T) = det(A).

P5. Let a1, a2, . . . , an denote the row vectors of A.
If the i th row vector of A is the sum of two row vectors, say ai = bi + ci , then det (A) = det (B) + det (C), where B = [a1 , a2 , . . . , ai −1 , bi , ai +1 , . . . , an ]T and C = [a1 , a2 , . . . , ai −1 , ci , ai +1 , . . . , an ]T . The corresponding property for columns is also true. P6. If A has a row (or column) of zeros, then det(A) = 0. P7. If two rows (or columns) of A are the same, then det(A) = 0. P8. det(AB) = det(A)det(B). i i i i i i i “main” 2007/2/16 page 227 i 3.4 Summary of Determinants 227 The first three properties tell us how elementary row operations and elementary column operations performed on a matrix A alter the value of det(A). They can be very helpful in reducing the amount of work required to evaluate a determinant, since we can use elementary row operations to put several zeros in a row or column of A and then use cofactor expansion along that row or column. We illustrate with an example. Example 3.4.2 Evaluate 2 −1 5 −2 1 32 1 −2 2 . 1 −2 1 3 11 Solution: Before performing a cofactor expansion, we first use elementary row operations to simplify the determinant: 21 32 0 3 −1 6 −1 1 −2 2 1 −1 1 −2 2 5 1 −2 1 ∼ 0 6 −12 11 −2 3 1 1 01 5 −3 According to P3, the determinants of the two matrices above are the same. To evaluate the determinant of the matrix on the right, we use cofactor expansion along the first column. 0 3 −1 6 3 −1 6 −1 1 −2 2 = −(−1) 6 −12 11 0 6 −12 11 1 5 −3 01 5 −3 To evaluate the determinant of the 3 × 3 matrix on the right, we can use the schematic given in Figure 3.4.1, or, we can continue to use elementary row operations to introduce zeros into the matrix: 3 −1 6 0 −16 15 −16 15 2 6 −12 11 = 0 −42 29 = = 166. −42 29 1 5 −3 1 5 −3 Here, we have reduced the 3 × 3 determinant to a 2 × 2 determinant by using cofactor expansion along the first column of the 3 × 3 matrix. 1. A21 (2), A23 (5), A24 (−2) 2. A31 (−3), A32 (−6) Basic Theoretical Results The determinant is a useful theoretical tool in linear algebra. We list next the major results that will be needed in the remainder of the text. 1. The volume of the parallelepiped determined by the vectors a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, c = c1 i + c2 j + c3 k is Volume = | det (A)|, a1 a2 a3 where A = b1 b2 b3 . c1 c2 c3 i i i i i i i “main” 2007/2/16 page 228 i 228 CHAPTER 3 Determinants 2. An n × n matrix is invertible if and only if det(A) = 0. 3. An n × n linear system Ax = b has a unique solution if and only if det(A) = 0. 4. An n × n homogeneous linear system Ax = 0 has an infinite number of solutions if and only if det(A) = 0. We see, for example, that according to (2), the matrices in Examples 3.4.1 and 3.4.2 are both invertible. If A is an n × n matrix with det(A) = 0, then the following two methods can be derived for obtaining the inverse of A and for finding the unique solution to the linear system Ax = b, respectively. 1. Adjoint Method for A−1 : If A is invertible, then A−1 = 1 adj(A), det (A) where adj(A) denotes the transpose of the matrix obtained by replacing each element in A by its cofactor. 2. Cramer’s Rule: If det(A) = 0, then the unique solution to Ax = b is x = (x1 , x2 , . . . , xn ), where xk = Example 3.4.3 det (Bk ) , det (A) k = 1, 2, . . . , n, and Bk denotes the matrix obtained when the k th column vector of A is replaced by b. 2 −1 1 Use the adjoint method to find A−1 if A = 3 4 2 . 7 58 Solution: We have already shown in Example 3.4.1 that det(A) = 41, so that A is invertible. 
Replacing each element in A with its cofactor yields the matrix of cofactors 22 −10 −13 9 −17 , MC = 13 −6 −1 11 so that 22 13 −6 T 9 −1 . adj(A) = MC = −10 −13 −17 11 Consequently, A−1 = 22 41 1 adj(A) = − 10 41 det (A) 13 41 9 41 − 13 − 17 41 41 Example 3.4.4 6 − 41 1 − 41 . 11 41 Use Cramer’s rule to solve the linear system 2x1 − x2 + x3 = 2, 3x1 + 4x2 + 2x3 = 5, 7x1 + 5x2 + 8x3 = 3. i i i i i i i “main” 2007/2/16 page 229 i 3.4 Solution: Summary of Determinants 229 The matrix of coefficients is 2 −1 1 A = 3 4 2. 7 58 We have already shown in Example 3.4.1 that det(A) = 41. Consequently, Cramer’s rule can indeed be applied. In this problem, we have 2 −1 1 det (B1 ) = 5 4 2 = 91, 3 58 221 det (B2 ) = 3 5 2 = 22, 738 2 −1 2 det (B3 ) = 3 4 5 = −78. 7 53 It therefore follows from Cramer’s rule that x1 = det (B1 ) 91 = , det (A) 41 x2 = det (B2 ) 22 = , det (A) 41 x3 = det (B3 ) 78 =− . det (A) 41 Exercises for 3.4 Skills • Be able to compute the determinant of an n × n matrix. • Know the effects that elementary row operations and elementary column operations have on the determinant of a matrix. • Be able to use the determinant to decide if a matrix is invertible. • Know how the determinant is affected by matrix multiplication and by matrix transpose. • Be able to compute the adjoint of a matrix and use it to find A−1 for an invertible matrix A. Problems For Problems 1–7, evaluate the given determinant. 1. 5 −1 . 37 35 7 2. −1 2 4 . 6 3 −2 3. 514 613 . 14 2 7 2.3 1.5 7.9 4. 4.2 3.3 5.1 . 6.8 3.6 5.7 abc 5. b c a . cab 3 5 −1 2 2 1 52 6. . 3 2 57 1 −1 2 1 7 12 2 −2 4 7. 3 −1 5 18 9 27 3 6 . 4 54 i i i i i i i “main” 2007/2/16 page 230 i 230 CHAPTER 3 Determinants For Problems 8–12, find det(A). If A is invertible, use the adjoint method to find A−1 . 9. 10. 11. 12. 1 A = 2 3 3 A = 2 3 2 A = 4 6 5 3 A= 1 5 4 13 5 15. A = 2 −1 5 , b = 7 . 2 31 2 23 3 1 . 12 53 6 3 16. A = 2 4 −7 , b = −1 . 25 9 4 47 6 1 . 14 −1 57 −3 2 . 9 11 −1 2 −1 4 −1 2 9 −3 3.1 3.5 7.1 3.6 17. A = 2.2 5.2 6.3 , b = 2.5 . 1.4 8.1 0.9 9.3 18. If A is an invertible n × n matrix, prove that 1 5 . 1 2 det (A−1 ) = For Problems 13–17, use Cramer’s rule to determine the unique solution to the system Ax = b for the given matrix and vector. 13. A = 35 ,b = 62 e −t . 3e−t ,b = 35 . 27 8. A = cos t sin t sin t − cos t 14. A = 19. Let A and B be 3 × 3 matrices with det(A) = 3 and det(B) = −4. Determine det (2A), det (A−1 ), det (B 5 ), 4 . 9 3.5 1 . det (A) det (B −1 AB). det (AT B), Chapter Review This chapter has laid out a basic introduction to the theory of determinants. Determinants and Elementary Row Operations For a square matrix A, one approach for computing the determinant of A, det(A), is to use elementary row operations to reduce A to row-echelon form. The effects of the various types of elementary row operations on det(A) are as follows: • Pij : permuting two rows of A alters the determinant by a factor of −1. • Mi (k): multiplying the i th row of A by k multiplies the determinant of the matrix by a factor of k . • Aij (k): adding a multiple of one row of A to another has no effect whatsoever on det(A). A crucial fact in this approach is the following: Theorem 3.5.1 If A is an n × n upper (or lower) triangular matrix, its determinant is det (A) = a11 a22 · · · ann . Therefore, since the row-echelon form of A is upper triangular, we can compute det(A) by using Theorem 3.5.1 and by keeping track of the elementary row operations involved in the row-reduction process. 
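The row-reduction strategy just described—apply elementary row operations while recording their effect on the determinant, then read off the product of the diagonal entries of the resulting triangular matrix (Theorem 3.5.1)—can also be phrased as a short algorithm. The Python sketch below is an editorial illustration of that strategy, not something from the text; it swaps in the first nonzero pivot it finds (flipping the sign of the determinant for each interchange, as in P1) and is checked against Example 3.4.1, where det(A) = 41.

```python
def det_by_row_reduction(A):
    # Reduce A to upper triangular form using row interchanges (P1) and
    # additions of multiples of one row to another (P3), then take the
    # product of the diagonal entries (Theorem 3.5.1).
    M = [row[:] for row in A]          # work on a copy
    n = len(M)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0                   # a column of zeros below the diagonal => det(A) = 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign               # each interchange flips the sign (P1)
        for r in range(col + 1, n):
            m = M[r][col] / M[col][col]
            M[r] = [M[r][c] - m * M[col][c] for c in range(n)]  # P3: no effect on det
    det = sign
    for i in range(n):
        det *= M[i][i]
    return det

A = [[2, -1, 1], [3, 4, 2], [7, 5, 8]]   # matrix of Example 3.4.1
print(det_by_row_reduction(A))           # 41.0, up to floating-point rounding
```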
i i i i i i i “main” 2007/2/16 page 231 i 3.5 Chapter Review 231 Cofactor Expansion Another way to compute det(A) is via the Cofactor Expansion Theorem: For n ≥ 2, the determinant of A can be computed using either of the following formulas: det (A) = ai 1 Ci 1 + ai 2 Ci 2 + · · · + ain Cin , det (A) = a1j C1j + a2j C2j + · · · + anj Cnj , (3.5.1) (3.5.2) where Cij = (−1)i +j Mij , and Mij is the determinant of the matrix obtained by deleting the i th row and j th column of A. The formulas (3.5.1) and (3.5.2) are referred to as cofactor expansion along the i th row and cofactor expansion along the j th column, respectively. The determinants Mij and Cij are called the minors and cofactors of A, respectively. Adjoint Method and Cramer’s Rule If A is an n × n matrix with det(A) = 0, then the following two methods can be derived for obtaining the inverse of A and for finding the unique solution to the linear system Ax = b, respectively. 1. Adjoint Method for A−1 : If A is invertible, then A−1 = 1 adj(A), det (A) where adj(A) denotes the transpose of the matrix obtained by replacing each element in A by its cofactor. 2. Cramer’s Rule: If det(A) = 0, then the unique solution to Ax = b is x = (x1 , x2 , . . . , xn ), where xk = det (Bk ) , det (A) k = 1, 2, . . . , n, and Bk denotes the matrix obtained when the k th column vector of A is replaced by b. Additional Problems For Problems 1–6, evaluate the determinant of the given matrix A by using (a) the definition, (b) elementary row operations to reduce A to an upper triangular matrix, and (c) the Cofactor Expansion Theorem. 1. A = −7 −2 . 1 −5 2. A = 66 . −2 1 −1 4 1 3. A = 0 2 2 . 2 2 −3 2 3 −5 4. A = −4 0 2 . 6 −3 3 3 −1 −2 1 0 0 1 4 5. A = 0 2 1 −1 . 0 0 0 −4 0 0 0 −2 0 0 −5 1 6. A = 0 1 −4 1 . −3 −3 −3 −3 i i i i i i i “main” 2007/2/16 page 232 i 232 CHAPTER 3 Determinants For Problems 7–10, suppose that abc A = d e f , and det (A) = 4. ghi Compute the determinant of each matrix below. g h i 7. −4a −4b −4c . 2d 2e 2f a − 5d b − 5e c − 5f . 3h 3i 8. 3g −d + 3g −e + 3h −f + 3i 3b 3e 3h 9. c − 2a f − 2d i − 2g . −a −d −g a−d b−e c−f 2h 2i . 10. 3 2g −d −e −f For Problems 11–14, suppose that A and B are 4 × 4 invertible matrices. If det (A) = −2 and det (B) = 3, compute each determinant below. 11. det (AB). 12. det (B 2 A−1 ). 13. det (((A−1 B)T )(2B −1 )). 14. det ((−A)3 (2B 2 )). 15. Let 21 1 05 1 2 −1 A= , B = 5 −2 , C = 3 −1 4 . 21 4 47 2 −2 6 Determine, if possible, det (A), det (C T ), det (B T AT ), det (B), det (AB), det (BAC), det (C), det (BA), det (ACB). 16. Let A= 12 , 34 and B= 54 . 11 Use the adjoint method to find B −1 and then determine (A−1 B T )−1 . For Problems 17–21, use the adjoint method to determine A−1 for the given matrix A. 2 −1 1 17. A = 0 5 −1 . 113 0 −3 2 2 0 1 1 1 18. A = 1 2 3 −4 . 1 00 5 0001 0 1 3 −3 19. A = −2 −3 −5 2 . 4 −4 4 6 5 8 16 8 . 20. A = 4 1 −4 −4 −11 266 21. A = 2 7 6 . 277 22. Add one row to the matrix A= 4 −1 0 5 14 so as to create a 3 × 3 matrix B with det(B) = 10. 23. True or False: Given any real number r and any 3 × 3 matrix A whose entries are all nonzero, it is always possible to change at most one entry of A to get a matrix B with det(B) = r . 124 24. Let A = 3 1 6 . k32 (a) Find all value(s) of k for which the matrix A fails to be invertible. (b) In terms of k , determine the volume of the parallelepiped determined by the row vectors of the matrix A. Is that the same as the volume of the parallelepiped determined by the column vectors of the matrix A? 
Explain how you know this without any calculation. 25. Repeat the preceding problem for the matrix k+1 2 1 A = 0 3 k . 1 11 26. Repeat the preceding problem for the matrix 2 k − 3 k2 A = 2 1 4 . 1k 0 i i i i i i i “main” 2007/2/16 page 233 i 3.5 Chapter Review 29. −3x1 + x2 = 3, x1 + 2x2 = 1. 28. A real n × n matrix A is called orthogonal if AAT = AT A = In . If A is an orthogonal matrix, prove that det(A) = ±1. 30. 2x1 − x2 + x3 = 2, 4x1 + 5x2 + 3x3 = 0, 4x1 − 3x2 + 3x3 = 2. For Problems 29– 31, use Cramer’s rule to solve the given linear system. 31. 233 3x1 + x2 + 2x3 = −1, 2x1 − x2 + x3 = −1, 5x2 + 5x3 = −5. 27. Let A and B be n × n matrices such that AB = −BA. Use determinants to prove that if n is odd, then A and B cannot both be invertible. Project: Volume of a Tetrahedron In this project, we use determinants and vectors to derive the formula for the volume of a tetrahedron with vertices A = (x1 , y1 , z1 ), B = (x2 , y2 , z2 ), C = (x3 , y3 , z3 ), and D = (x4 , y4 , z4 ). Let h denote the distance from A to the plane determined by B, C , and D . From geometry, the volume of the tetrahedron is given by Volume = 1 h(area of triangle BCD). 3 (3.5.3) (a) Express the area of triangle BCD in terms of a cross product of vectors. (b) Use trigonometry to express h in terms of the distance from A to B and the angle − → between the vector AB and the segment connecting A to the base BCD at a right angle. (c) Combining (a) and (b) with the volume of the tetrahedron given above, express the volume of the tetrahedron in terms of dot products and cross products of vectors. (d) Following the proof of part 2 of Theorem 3.1.11, express the volume of the tetrahedron in terms of a determinant with entries in terms of the xi , yi , and zi for 1 ≤ i ≤ 4. (e) Show that the expression in part (d) is the same as x1 y1 1 x2 y2 Volume = 6 x3 y3 x4 y4 z1 z2 z3 z4 1 1 . 1 1 (3.5.4) (f) For each set of four points below, determine the volume of the tetrahedron with those points as vertices by using (3.5.3) and by using (3.5.4). Both formulas should yield the same answer. (i) (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1). (ii) (−1, 1, 2), (0, 3, 3), (1, −1, 2), (0, 0, 1). i i i i i i i “main” 2007/2/16 page 234 i CHAPTER 4 Vector Spaces To criticize mathematics for its abstraction is to miss the point entirely. Abstraction is what makes mathematics work. — Ian Stewart The main aim of this text is to study linear mathematics. In Chapter 2 we studied systems of linear equations, and the theory underlying the solution of a system of linear equations can be considered as a special case of a general mathematical framework for linear problems. To illustrate this framework, we discuss an example. Consider the homogeneous linear system Ax = 0, where 1 −1 2 A = 2 −2 4 . 3 −3 6 It is straightforward to show that this system has solution set S = {(r − 2s, r, s) : r, s ∈ R}. Geometrically we can interpret each solution as defining the coordinates of a point in space or, equivalently, as the geometric vector with components v = (r − 2s, r, s). Using the standard operations of vector addition and multiplication of a vector by a real number, it follows that v can be written in the form v = r(1, 1, 0) + s(−2, 0, 1). We see that every solution to the given linear problem can be expressed as a linear combination of the two basic solutions (see Figure 4.0.1): v1 = (1, 1, 0) and v2 = (−2, 0, 1). 
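The claim that every vector of the form r v1 + s v2 satisfies Ax = 0 is easy to spot-check. The following short Python sketch (illustrative only, not part of the text) forms r v1 + s v2 for several randomly chosen values of r and s and verifies that multiplying by the coefficient matrix gives the zero vector.

```python
import random

A = [[1, -1, 2],
     [2, -2, 4],
     [3, -3, 6]]
v1 = (1, 1, 0)
v2 = (-2, 0, 1)

def matvec(A, x):
    # Standard matrix-vector product.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

for _ in range(5):
    r, s = random.randint(-10, 10), random.randint(-10, 10)
    v = [r * a + s * b for a, b in zip(v1, v2)]   # v = r*v1 + s*v2 = (r - 2s, r, s)
    assert matvec(A, v) == [0, 0, 0]

print("r*v1 + s*v2 solves Ax = 0 for every sampled r, s")
```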
234 i i i i i i i “main” 2007/2/16 page 235 i 4.1 Vectors in Rn 235 x3 v2 v ( 2, 0, 1) rv1 + sv2 x2 x1 v1 (1, 1, 0) Figure 4.0.1: Two basic solutions to Ax = 0 and an example of an arbitrary solution to the system. We will observe a similar phenomenon in Chapter 6, when we establish that every solution to the homogeneous second-order linear differential equation y + a1 y + a2 y = 0 can be written in the form y(x) = c1 y1 (x) + c2 y2 (x), where y1 (x) and y2 (x) are two nonproportional solutions to the differential equation on the interval of interest. In each of these problems, we have a set of “vectors” V (in the first problem the vectors are ordered triples of numbers, whereas in the second, they are functions that are at least twice differentiable on an interval I ) and a linear vector equation. Further, in both cases, all solutions to the given equation can be expressed as a linear combination of two particular solutions. In the next two chapters we develop this way of formulating linear problems in terms of an abstract set of vectors, V , and a linear vector equation with solutions in V . We will find that many problems fit into this framework and that the solutions to these problems can be expressed as linear combinations of a certain number (not necessarily two) of basic solutions. The importance of this result cannot be overemphasized. It reduces the search for all solutions to a given problem to that of finding a finite number of solutions. As specific applications, we will derive the theory underlying linear differential equations and linear systems of differential equations as special cases of the general framework. Before proceeding further, we give a word of encouragement to the more applicationoriented reader. It will probably seem at times that the ideas we are introducing are rather esoteric and that the formalism is pure mathematical abstraction. However, in addition to its inherent mathematical beauty, the formalism incorporates ideas that pervade many areas of applied mathematics, particularly engineering mathematics and mathematical physics, where the problems under investigation are very often linear in nature. Indeed, the linear algebra introduced in the next two chapters should be considered an extremely important addition to one’s mathematical repertoire, certainly on a par with the ideas of elementary calculus. 4.1 Vectors in Rn In this section, we use some familiar ideas about geometric vectors to motivate the more general and abstract idea of a vector space, which will be introduced in the next section. We begin by recalling that a geometric vector can be considered mathematically as a directed line segment (or arrow) that has both a magnitude (length) and a direction attached to it. In calculus courses, we define vector addition according to the parallelogram law (see Figure 4.1.1); namely, the sum of the vectors x and y is the diagonal of i i i i i i i “main” 2007/2/16 page 236 i 236 CHAPTER 4 Vector Spaces the parallelogram formed by x and y. We denote the sum by x + y. It can then be shown geometrically that for all vectors x, y, z, x+y = y+x y (4.1.1) x + (y + z) = (x + y) + z. x (4.1.2) and y x These are the statements that the vector addition operation is commutative and associaFigure 4.1.1: Parallelogram law tive. The zero vector, denoted 0, is defined as the vector satisfying of vector addition. x + 0 = x, (4.1.3) for all vectors x. We consider the zero vector as having zero magnitude and arbitrary direction. 
Geometrically, we picture the zero vector as corresponding to a point in space. Let −x denote the vector that has the same magnitude as x, but the opposite direction. Then according to the parallelogram law of addition, x + (−x) = 0. kx, k 0 x kx, k Figure 4.1.2: Scalar multiplication of x by k . 0 (4.1.4) The vector −x is called the additive inverse of x. Properties (4.1.1)–(4.1.4) are the fundamental properties of vector addition. The basic algebra of vectors is completed when we also define the operation of multiplication of a vector by a real number. Geometrically, if x is a vector and k is a real number, then k x is defined to be the vector whose magnitude is |k | times the magnitude of x and whose direction is the same as x if k > 0, and opposite to x if k < 0. (See Figure 4.1.2.) If k = 0, then k x = 0. This scalar multiplication operation has several important properties that we now list. Once more, each of these can be established geometrically using only the foregoing definitions of vector addition and scalar multiplication. For all vectors x and y, and all real numbers r, s and t , 1x (st)x r(x + y) (s + t)x = = = = x, s(t x), r x + r y, s x + t x. (4.1.5) (4.1.6) (4.1.7) (4.1.8) It is important to realize that, in the foregoing development, we have not defined a “multiplication of vectors.” In Chapter 3 we discussed the idea of a dot product and cross product of two vectors in space (see Equations (3.1.4) and (3.1.5)), but for the purposes of discussing abstract vector spaces we will essentially ignore the dot product and cross product. We will revisit the dot product in Section 4.11, when we develop inner product spaces. We will see in the next section how the concept of a vector space arises as a direct generalization of the ideas associated with geometric vectors. Before performing this abstraction, we want to recall some further features of geometric vectors and give one specific and important extension. We begin by considering vectors in the plane. Recall that R2 denotes the set of all ordered pairs of real numbers; thus, R2 = {(x, y) : x ∈ R, y ∈ R}. The elements of this set are called vectors in R2 , and we use the usual vector notation to denote these elements. Geometrically we identify the vector v = (x, y) in R2 with i i i i i i i “main” 2007/2/16 page 237 i Vectors in Rn 4.1 237 y (x, y) (0, y) v x (x, 0) Figure 4.1.3: Identifying vectors in R2 with geometric vectors in the plane. the geometric vector v directed from the origin of a Cartesian coordinate system to the point with coordinates (x, y). This identification is illustrated in Figure 4.1.3. The numbers x and y are called the components of the geometric vector v. The geometric vector addition and scalar multiplication operations are consistent with the addition and scalar multiplication operations defined in Chapter 2 via the correspondence with row (or column) vectors for R2 : If v = (x1 , y1 ) and w = (x2 , y2 ), and k is an arbitrary real number, then v + w = (x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 ), k v = k(x1 , y1 ) = (kx1 , ky1 ). (4.1.9) (4.1.10) These are the algebraic statements of the parallelogram law of vector addition and the scalar multiplication law, respectively. (See Figure 4.1.4.) Using the parallelogram law of vector addition and Equations (4.1.9) and (4.1.10), it follows that any vector v = (x, y) can be written as v = x i + y j = x(1, 0) + y(0, 1), where i = (1, 0) and j = (0, 1) are the unit vectors pointing along the positive x - and y -coordinate axes, respectively. 
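Because the operations (4.1.9) and (4.1.10) act componentwise, they are easy to experiment with in code. The short Python fragment below is an illustrative aside, not part of the text: it implements the two operations for vectors in R2, confirms the decomposition v = x i + y j for a sample vector, and spot-checks the commutative and associative laws (4.1.1) and (4.1.2) on randomly chosen integer vectors.

```python
import random

def add(v, w):
    # Componentwise addition, Equation (4.1.9).
    return tuple(vi + wi for vi, wi in zip(v, w))

def scale(k, v):
    # Componentwise scalar multiplication, Equation (4.1.10).
    return tuple(k * vi for vi in v)

i_hat, j_hat = (1, 0), (0, 1)

v = (3, -2)
assert add(scale(v[0], i_hat), scale(v[1], j_hat)) == v   # v = x*i + y*j

# Spot-check commutativity (4.1.1) and associativity (4.1.2) on random integer vectors.
for _ in range(100):
    x, y, z = [(random.randint(-9, 9), random.randint(-9, 9)) for _ in range(3)]
    assert add(x, y) == add(y, x)
    assert add(x, add(y, z)) == add(add(x, y), z)

print("componentwise checks passed")
```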
y (x1 x2, y1 y2) (x2, y2) w v w (x1, y1) v kv (kx1, ky1) x Figure 4.1.4: Vector addition and scalar multiplication in R2 . The properties (4.1.1)–(4.1.8) are now easily verified for vectors in R2 . In particular, the zero vector in R2 is the vector 0 = (0, 0). Furthermore, Equation (4.1.9) implies that (x, y) + (−x, −y) = (0, 0) = 0, so that the additive inverse of the general vector v = (x, y) is −v = (−x, −y). It is straightforward to extend these ideas to vectors in 3-space. We recall that R3 = {(x, y, z) : x ∈ R, y ∈ R, z ∈ R}. As illustrated in Figure 4.1.5, each vector v = (x, y, z) in R3 can be identified with the geometric vector v that joins the origin of a Cartesian coordinate system to the point with coordinates (x, y, z). We call x , y , and z the components of v. i i i i i i i “main” 2007/2/16 page 238 i 238 CHAPTER 4 Vector Spaces z (0, 0, z) (x, y, z) v (0, y, 0) y (x, 0, 0) (x, y, 0) x Figure 4.1.5: Identifying vectors in R3 with geometric vectors in space. Recall that if v = (x1 , y1 , z1 ), w = (x2 , y2 , z2 ), and k is an arbitrary real number, then addition and scalar multiplication were given in Chapter 2 by v + w = (x1 , y1 , z1 ) + (x2 , y2 , z2 ) = (x1 + x2 , y1 + y2 , z1 + z2 ), k v = k(x1 , y1 , z1 ) = (kx1 , ky1 , kz1 ). (4.1.11) (4.1.12) Once more, these are, respectively, the component forms of the laws of vector addition and scalar multiplication for geometric vectors. It follows that an arbitrary vector v = (x, y, z) can be written as v = x i + y j + zk = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1), where i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) denote the unit vectors which point along the positive x -, y -, and z-coordinate axes, respectively. We leave it as an exercise to check that the properties (4.1.1)–(4.1.8) are satisfied by vectors in R3 , where 0 = (0, 0, 0), and the additive inverse of v = (x, y, z) is −v = (−x, −y, −z). We now come to our first major abstraction. Whereas the sets R2 and R3 and their associated algebraic operations arise naturally from our experience with Cartesian geometry, the motivation behind the algebraic operations in Rn for larger values of n does not come from geometry. Rather, we can view the addition and scalar multiplication operations in Rn for n > 3 as the natural extension of the component forms of addition and scalar multiplication in R2 and R3 in (4.1.9)–(4.1.12). Therefore, in Rn we have that if v = (x1 , x2 , . . . , xn ), w = (y1 , y2 , . . . , yn ), and k is an arbitrary real number, then v + w = (x1 + y1 , x2 + y2 , . . . , xn + yn ), k v = (kx1 , kx2 , . . . , kxn ). (4.1.13) (4.1.14) Again, these definitions are direct generalizations of the algebraic operations defined in R2 and R3 , but there is no geometric analogy when n > 3. It is easily established that these operations satisfy properties (4.1.1)–(4.1.8), where the zero vector in Rn is 0 = (0, 0, . . . , 0), and the additive inverse of the vector v = (x1 , x2 , . . . , xn ) is −v = (−x1 , −x2 , . . . , −xn ). The verification of this is left as an exercise. i i i i i i i “main” 2007/2/16 page 239 i 4.1 Example 4.1.1 Vectors in Rn 239 If v = (1.2, 3.5, 2, 0) and w = (12.23, 19.65, 23.22, 9.76), then v + w = (1.2, 3.5, 2, 0) + (12.23, 19.65, 23.22, 9.76) = (13.43, 23.15, 25.22, 9.76) and 2.35v = (2.82, 8.225, 4.7, 0). Exercises for 4.1 Key Terms Vectors in Rn , Vector addition, Scalar multiplication, Zero vector, Additive inverse, Components of a vector. 
Skills • Be able to perform vector addition and scalar multiplication for vectors in Rn given in component form. • Understand the geometric perspective on vector addition and scalar multiplication in the cases of R2 and R3 . • Be able to formally verify the axioms (4.1.1)–(4.1.8) for vectors in Rn . True-False Review For Questions 1–12, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. The vector (x, y) in R2 is the same as the vector (x, y, 0) in R3 . 2. Each vector (x, y, z) in R3 has exactly one additive inverse. 6. If s and t are scalars and x and y are vectors in Rn , then (s + t)(x + y) = s x + t y. 7. For every vector x in Rn , the vector 0x is the zero vector of Rn . 8. The parallelogram whose sides are determined by vectors x and y in R2 have diagonals determined by the vectors x + y and x − y. 9. If x is a vector in the first quadrant of R2 , then any scalar multiple k x of x is still a vector in the first quadrant of R2 . √ 10. The vector 5i − 6j + 2k in R3 is the same as √ (5, −6, 2). 11. Three vectors x, y, and z in R3 always determine a 3-dimensional solid region in R3 . 12. If x and y are vectors in R2 whose components are even integers and k is a scalar, then x + y and k x are also vectors in R2 whose components are even integers. Problems 1. If x = (3, 1), y = (−1, 2), determine the vectors v1 = 2x, v2 = 3y, v3 = 2x + 3y. Sketch the corresponding points in the xy -plane and the equivalent geometric vectors. 3. The solution set to a linear system of 4 equations and 6 unknowns consists of a collection of vectors in R6 . 2. If x = (−1, −4) and y = (−5, 1), determine the vectors v1 = 3x, v2 = −4y, v3 = 3x + (−4)y. Sketch the corresponding points in the xy -plane and the equivalent geometric vectors. 4. For every vector (x1 , x2 , . . . , xn ) in Rn , the vector (−1) · (x1 , x2 , . . . , xn ) is an additive inverse. 3. If x = (3, −1, 2, 5), y = (−1, 2, 9, −2), determine v = 5x + (−7)y and its additive inverse. 5. A vector whose components are all positive is called a “positive vector.” 4. If x = (1, 2, 3, 4, 5) and z = (−1, 0, −4, 1, 2), find y in R5 such that 2x + (−3)y = −z. i i i i i i i “main” 2007/2/16 page 240 i 240 CHAPTER 4 Vector Spaces 5. Verify the commutative law of addition for vectors in R4 . 6. Verify the associative law of addition for vectors in R4 . 8. Show with examples that if x is a vector in the first quadrant of R2 (i.e., both coordinates of x are positive) and y is a vector in the third quadrant of R2 (i.e., both coordinates of y are negative), then the sum x + y could occur in any of the four quadrants. 7. Verify properties (4.1.5)–(4.1.8) for vectors in R3 . 4.2 Definition of a Vector Space In the previous section, we showed how the set Rn of all ordered n-tuples of real numbers, together with the addition and scalar multiplication operations defined on it, has the same algebraic properties as the familiar algebra of geometric vectors. We now push this abstraction one step further and introduce the idea of a vector space. 
Such an abstraction will enable us to develop a mathematical framework for studying a broad class of linear problems, such as systems of linear equations, linear differential equations, and systems of linear differential equations, which have far-reaching applications in all areas of applied mathematics, science, and engineering. Let V be a nonempty set. For our purposes, it is useful to call the elements of V vectors and use the usual vector notation u, v, . . . , to denote these elements. For example, if V is the set of all 2 × 2 matrices, then the vectors in V are 2 × 2 matrices, whereas if V is the set of all positive integers, then the vectors in V are positive integers. We will be interested only in the case when the set V has an addition operation and a scalar multiplication operation defined on its elements in the following senses: Vector Addition: A rule for combining any two vectors in V . We will use the usual + sign to denote an addition operation, and the result of adding the vectors u and v will be denoted u + v. Real (or Complex) Scalar Multiplication: A rule for combining each vector in V with any real (or complex) number. We will use the usual notation k v to denote the result of scalar multiplying the vector v by the real (or complex) number k . To combine the two types of scalar multiplication, we let F denote the set of scalars for which the operation is defined. Thus, for us, F is either the set of all real numbers or the set of all complex numbers. For example, if V is the set of all 2 × 2 matrices with complex elements and F denotes the set of all complex numbers, then the usual operation of matrix addition is an addition operation on V , and the usual method of multiplying a matrix by a scalar is a scalar multiplication operation on V . Notice that the result of applying either of these operations is always another vector (2 × 2 matrix) in V . As a further example, let V be the set of positive integers, and let F be the set of all real numbers. Then the usual operations of addition and multiplication within the real numbers define addition and scalar multiplication operations on V . Note in this case, however, that the scalar multiplication operation, in general, will not yield another vector in V , since when we multiply a positive integer by a real number, the result is not, in general, a positive integer. We are now in a position to give a precise definition of a vector space. i i i i i i i “main” 2007/2/16 page 241 i 4.2 Definition of a Vector Space 241 DEFINITION 4.2.1 Let V be a nonempty set (whose elements are called vectors) on which are defined an addition operation and a scalar multiplication operation with scalars in F . We call V a vector space over F , provided the following ten conditions are satisfied: A1. Closure under addition: For each pair of vectors u and v in V , the sum u + v is also in V . We say that V is closed under addition. A2. Closure under scalar multiplication: For each vector v in V and each scalar k in F , the scalar multiple k v is also in V . We say that V is closed under scalar multiplication. A3. Commutativity of addition: For all u, v ∈ V , we have u + v = v + u. A4. Associativity of addition: For all u, v, w ∈ V , we have (u + v) + w = u + (v + w). A5. Existence of a zero vector in V : In V there is a vector, denoted 0, satisfying v + 0 = v, for all v ∈ V . A6. Existence of additive inverses in V : For each vector v In V , there is a vector, denoted −v, in V such that v + (−v) = 0. A7. Unit property: For all v ∈ V , 1v = v. A8. 
Associativity of scalar multiplication: For all v ∈ V and all scalars r, s ∈ F , (rs)v = r(s v). A9. Distributive property of scalar multiplication over vector addition: For all u, v ∈ V and all scalars r ∈ F , r(u + v) = r u + r v. A10. Distributive property of scalar multiplication over scalar addition: For all v ∈ V and all scalars r, s ∈ F , (r + s)v = r v + s v. Remarks 1. A key point to note is that in order to define a vector space, we must start with all of the following: (a) A nonempty set of vectors V . (b) A set of scalars F (either R or C). i i i i i i i “main” 2007/2/16 page 242 i 242 CHAPTER 4 Vector Spaces (c) An addition operation defined on V . (d) A scalar multiplication operation defined on V . Then we must check that the axioms A1–A10 are satisfied. 2. Terminology: A vector space over the real numbers will be referred to as a real vector space, whereas a vector space over the complex numbers will be called a complex vector space. 3. As indicated in Definition 4.2.1, we will use boldface to denote vectors in a general vector space. In handwriting, it is strongly advised that vectors be denoted either → as v or as v . This will avoid any confusion between vectors in V and scalars in F . ∼ 4. When we deal with a familiar vector space, we will use the usual notation for vectors in the space. For example, as seen below, the set Rn of ordered n-tuples is a vector space, and we will denote vectors here in the form (x1 , x2 , . . . , xn ), as in the previous section. As another illustration, it is shown below that the set of all real-valued functions defined on an interval is a vector space, and we will denote the vectors in this vector space by f, g, . . . . Examples of Vector Spaces 1. The set of all real numbers, together with the usual operations of addition and multiplication, is a real vector space. 2. The set of all complex numbers is a complex vector space when we use the usual operations of addition and multiplication by a complex number. It is also possible to restrict the set of scalars to R, in which case the set of complex numbers becomes a real vector space. 3. The set Rn , together with the operations of addition and scalar multiplication defined in (4.1.13) and (4.1.14), is a real vector space. As we saw in the previous section, the zero vector in Rn is the n-tuple of zeros (0, 0, . . . , 0), and the additive inverse of the vector v = (x1 , x2 , . . . , xn ) is −v = (−x1 , −x2 , . . . , −xn ). Strictly speaking, for each of the examples above it is necessary to verify all of the axioms A1–A10 of a vector space. However, in these examples, the axioms hold immediately as well-known properties of real and complex numbers and n-tuples. Example 4.2.2 Let V be the set of all 2 × 2 matrices with real elements. Show that V , together with the usual operations of matrix addition and multiplication of a matrix by a real number, is a real vector space. Solution: We must verify the axioms A1–A10. If A and B are in V (that is, A and B are 2 × 2 matrices with real entries), then A + B and kA are in V for all real numbers k . Consequently, V is closed under addition and scalar multiplication, and therefore Axioms A1 and A2 of the vector space definition hold. A3. Given two 2 × 2 matrices A= a1 a2 a3 a4 and B= b1 b2 , b3 b4 we have A+B = = a1 a2 a3 a4 + b1 b2 b3 b4 b1 + a1 b2 + a2 b3 + a3 b4 + a4 = = a1 + b1 a2 + b2 a3 + b3 a4 + b4 b1 b2 b3 b4 + a1 a2 a3 a4 = B + A. i i i i i i i “main” 2007/2/16 page 243 i 4.2 Definition of a Vector Space 243 A4. 
Given three 2 × 2 matrices A= a1 a2 , a3 a4 b1 b2 , b3 b4 B= c1 c2 , c3 c4 C= we have a1 a2 a3 a4 (A + B) + C = b1 b2 b3 b4 + c1 c2 c3 c4 + = a1 + b1 a2 + b2 a3 + b3 a4 + b4 = (a1 + b1 ) + c1 (a2 + b2 ) + c2 (a3 + b3 ) + c3 (a4 + b4 ) + c4 = a1 + (b1 + c1 ) a2 + (b2 + c2 ) a3 + (b3 + c3 ) a4 + (b4 + c4 ) = a1 a2 a3 a4 + = a1 a2 a3 a4 + b1 b2 b3 b4 + A+ 00 00 = A. + c1 c2 c3 c4 b1 + c1 b2 + c2 b3 + c3 b4 + c4 c1 c2 c3 c4 = A + (B + C). A5. If A is any matrix in V , then Thus, 02 is the zero vector in V . ab cd A6. The additive inverse of A = is −A = −a −b , since −c −d a + (−a) b + (−b) c + (−c) d + (−d) A + (−A) = 00 00 = = 02 . A7. If A is any matrix in V , then 1A = A, thus verifying the unit property. A8. Given a matrix A = (rs)A = ab cd and scalars r and s , we have (rs)a (rs)b (rs)c (rs)d = r (sa) r(sb) r(sc) r(sd) =r s a sb sc sd = r(sA), as required. A9. Given matrices A = r(A + B) = r =r = a1 a2 a3 a4 a1 a2 a3 a4 + b1 b2 b3 b4 and B = and a scalar r , we have b1 b2 b3 b4 a1 + b1 a2 + b2 a3 + b3 a4 + b4 = r a1 + rb1 ra2 + rb2 ra3 + rb3 ra4 + rb4 r (a1 + b1 ) r(a2 + b2 ) r(a3 + b3 ) r(a4 + b4 ) = r a1 ra2 ra3 ra4 + r b1 rb2 rb3 rb4 = rA + rB. i i i i i i i “main” 2007/2/16 page 244 i 244 CHAPTER 4 Vector Spaces A10. Given A, r , and s as in A8 above, we have (r + s)A = = (r + s)a (r + s)b (r + s)c (r + s)d = r a rb rc rd = rA + sA, + s a sb sc sd r a + sa rb + sb rc + sc rd + sd as required. Thus V , together with the given operations, is a real vector space. Remark In a manner similar to the previous example, it is easily established that the set of all m × n matrices with real entries is a real vector space when we use the usual operations of addition of matrices and multiplication of matrices by a real number. We will denote the vector space of all m × n matrices with real elements by Mm×n (R), and we denote the vector space of all n × n matrices with real elements by Mn (R). Example 4.2.3 Let V be the set of all real-valued functions defined on an interval I . Define addition and scalar multiplication in V as follows. If f and g are in V and k is any real number, then f + g and kf are defined by (f + g)(x) = f (x) + g(x) (kf )(x) = kf (x) for all x ∈ I, for all x ∈ I. Show that V , together with the given operations of addition and scalar multiplication, is a real vector space. Solution: It follows from the given definitions of addition and scalar multiplication that if f and g are in V , and k is any real number, then f + g and kf are both real-valued functions on I and are therefore in V . Consequently, the closure axioms A1 and A2 hold. We now check the remaining axioms. A3. Let f and g be arbitrary functions in V . From the definition of function addition, we have (f + g)(x) = f (x) + g(x) = g(x) + f (x) = (g + f )(x), y for all x ∈ I. (The middle step here follows from the fact that f (x) and g(x) are real numbers associated with evaluating f and g at the input x , and real number addition commutes.) Consequently, f + g = g + f (since the values of f + g and g + f agree for every x ∈ I ), and so addition in V is commutative. y f (x) y I y A4. Let f, g, h ∈ V . Then for all x ∈ I , we have O(x) x —f (x) Figure 4.2.1: In the vector space of all functions defined on an interval I , the additive inverse of a function f is obtained by reflecting the graph of f about the x -axis. The zero vector is the zero function O(x). [(f + g) + h](x) = (f + g)(x) + h(x) = [f (x) + g(x)] + h(x) = f (x) + [g(x) + h(x)] = f (x) + (g + h)(x) = [f + (g + h)](x). 
Consequently, (f +g)+h = f +(g +h), so that addition in V is indeed associative. A5. If we define the zero function, O , by O(x) = 0, for all x ∈ I , then (f + O)(x) = f (x) + O(x) = f (x) + 0 = f (x), for all f ∈ V and all x ∈ I , which implies that f + O = f . Hence, O is the zero vector in V . (See Figure 4.2.1.) i i i i i i i “main” 2007/2/16 page 245 i 4.2 Definition of a Vector Space 245 A6. If f ∈ V , then −f is defined by (−f )(x) = −f (x) for all x ∈ I , since [f + (−f )](x) = f (x) + (−f )(x) = f (x) − f (x) = 0 for all x ∈ I . This implies that f + (−f ) = O . A7. Let f ∈ V . Then, by definition of the scalar multiplication operation, for all x ∈ I , we have (1f )(x) = 1f (x) = f (x). Consequently, 1f = f . A8. Let f ∈ V , and let r, s ∈ R. Then, for all x ∈ I , [(rs)f ](x) = (rs)f (x) = r [sf (x)] = r [(sf )(x)]. Hence, the functions (rs)f and r(sf ) agree on every x ∈ I , and hence (rs)f = r(sf ), as required. A9. Let f, g ∈ V and let r ∈ R. Then, for all x ∈ I , [r(f + g)] (x) = r [(f + g)(x)] = r [f (x) + g(x)] = rf (x) + rg(x) = (rf )(x) + (rg)(x) = (rf + rg)(x). Hence, r(f + g) = rf + rg . A10. Let f ∈ V , and let r, s ∈ R. Then for all x ∈ I , [(r +s)f ](x) = (r +s)f (x) = rf (x)+sf (x) = (rf )(x)+(sf )(x) = (rf +sf )(x), which proves that (r + s)f = rf + sf . Since all parts of Definition 4.2.1 are satisfied, it follows that V , together with the given operations of addition and scalar multiplication, is a real vector space. Remark As the previous two examples indicate, a full verification of the vector space definition can be somewhat tedious and lengthy, although it is usually straightforward. Be careful to not leave out any important steps in such a verification. The Vector Space Cn We now introduce the most important complex vector space. Let Cn denote the set of all ordered n-tuples of complex numbers. Thus, Cn = {(z1 , z2 , . . . , zn ) : z1 , z2 , . . . , zn ∈ C}. We refer to the elements of Cn as vectors in Cn . A typical vector in Cn is (z1 , z2 , . . . , zn ), where each zk is a complex number. Example 4.2.4 The following are examples of vectors in C2 and C4 , respectively: u = (2.1 − 3i, −1.5 + 3.9i), v = (5 + 7i, 2 − i, 3 + 4i, −9 − 17i). In order to obtain a vector space, we must define appropriate operations of “vector addition” and “multiplication by a scalar” on the set of vectors in question. In the case of Cn , we are motivated by the corresponding operations in Rn and thus define the addition i i i i i i i “main” 2007/2/16 page 246 i 246 CHAPTER 4 Vector Spaces and scalar multiplication operations componentwise. Thus, if u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) are vectors in Cn and k is an arbitrary complex number, then u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ), k u = (ku1 , ku2 , . . . , kun ). Example 4.2.5 If u = (1 − 3i, 2 + 4i), v = (−2 + 4i, 5 − 6i), and k = 2 + i , find u + k v. Solution: We have u + k v = (1 − 3i, 2 + 4i) + (2 + i)(−2 + 4i, 5 − 6i) = (1 − 3i, 2 + 4i) + (−8 + 6i, 16 − 7i) = (−7 + 3i, 18 − 3i). It is straightforward to show that Cn , together with the given operations of addition and scalar multiplication, is a complex vector space. 
Further Properties of Vector Spaces The main reason for formalizing the definition of an abstract vector space is that any results that we can prove based solely on the definition will then apply to all vector spaces we care to examine; that is, we do not have to prove separate results for geometric vectors, m × n matrices, vectors in Rn or Cn , or real-valued functions, and so on. The next theorem lists some results that can be proved using the vector space axioms. Theorem 4.2.6 Let V be a vector space over F . 1. The zero vector is unique. 2. 0u = 0 for all u ∈ V . 3. k 0 = 0 for all scalars k ∈ F . 4. The additive inverse of each element in V is unique. 5. For all u ∈ V , −u = (−1)u. 6. If k is a scalar and u ∈ V such that k u = 0, then either k = 0 or u = 0. Proof 1. Suppose there were two zero vectors in V , denoted 01 and 02 . Then, for any v ∈ V , we would have v + 01 = v (4.2.1) v + 02 = v. (4.2.2) and We must prove that 01 = 02 . But, applying (4.2.1) with v = 02 , we have 02 = 02 + 01 = 01 + 02 = 01 (Axiom A3) (from (4.2.2) with v = 01 ). Consequently, 01 = 02 , so the zero vector is unique in a vector space. i i i i i i i “main” 2007/2/16 page 247 i 4.2 247 Definition of a Vector Space 2. Let u be an arbitrary element in a vector space V . Since 0 = 0 + 0, we have 0u = (0 + 0)u = 0u + 0u, by Axiom A10. Now Axiom A6 implies that the vector −(0u) exists, and adding it to both sides of the previous equation yields 0u + [−(0u)] = (0u + 0u) + [−(0u)]. Thus, since addition in a vector space is associative (Axiom A4), 0u + [−(0u)] = 0u + (0u + [−(0u)]). Applying Axiom A6 on both sides and then using Axiom A5, this becomes 0 = 0u + 0 = 0u , and this completes the verification of (2). 3. Using the fact that 0 = 0 + 0 (by Axiom A5), the proof here proceeds along the same lines as the proof of result 2. We leave the verification to the reader as an exercise (Problem 21 ). 4. Let u ∈ V be an arbitrary vector, and suppose that there were two additive inverses, say v and w, for u. According to Axiom A6, this implies that u+v =0 (4.2.3) u + w = 0. (4.2.4) and We wish to show that v = w. Now, Axiom A6 implies that a vector −v exists, so adding it on the right to both sides of (4.2.3) yields (u + v) + (−v) = 0 + (−v) = −v. Applying Axioms A4 and A6 on the left side, we simplify this to u = −v. Substituting this into (4.2.4) yields −v + w = 0. Adding v to the left of both sides and applying Axioms A4 and A6 once more yields v = w, as desired. 5. To verify that −u = (−1)u for all u ∈ V , we note that 0 = 0u = (1 + (−1))u = 1u + (−1)u = u + (−1)u, where we have used property 2 and Axioms A10 and A7. The equation above proves that (−1)u is an additive inverse of u, and by the uniqueness of additive inverses that we just proved, we conclude that (−1)u = −u, as desired. Finally, we leave the proof of result 6 in Theorem 4.2.6 as an exercise (Problem 22). i i i i i i i “main” 2007/2/16 page 248 i 248 CHAPTER 4 Vector Spaces Remark The proof of Theorem 4.2.6 involved a number of tedious and seemingly obvious steps. It is important to remember, however, that in an abstract vector space we are not allowed to rely on past experience in deriving results for the first time. For instance, the statement “0 + 0 = 0” may seem intuitively clear, but in our newly developed mathematical structure, we must appeal specifically to the rules A1–A10 given for a vector space. Hence, the statement “0 + 0 = 0” should be viewed as a consequence of Axiom A5 and nothing else. 
Once we have proved these basic results, of course, then we are free to use them in any vector space context where they are needed. This is the whole advantage to working in the general vector space setting. We end this section with a list of the most important vector spaces that will be required throughout the remainder of the text. In each case the addition and scalar multiplication operations are the usual ones associated with the set of vectors. • Rn , the (real) vector space of all ordered n-tuples of real numbers. • Cn , the (complex) vector space of all ordered n-tuples of complex numbers. • Mm×n (R), the (real) vector space of all m × n matrices with real elements. • Mn (R), the (real) vector space of all n × n matrices with real elements. • C k (I ), the vector space of all real-valued functions that are continuous and have (at least) k continuous derivatives on I . We will show that this set of vectors is a (real) vector space in the next section. • Pn , the (real) vector space of all real-valued polynomials of degree ≤ n with real coefficients. That is, Pn = {a0 + a1 x + a2 x 2 + · · · + an x n : a0 , a1 , . . . , an ∈ R}. We leave the verification that Pn is a (real) vector space as an exercise (Problem 23). Exercises for 4.2 Key Terms Vector space (real or complex), Closure under addition, Closure under scalar multiplication, Commutativity of addition, Associativity of addition, Existence of zero vector, Existence of additive inverses, Unit property, Associativity of scalar multiplication, Distributive properties, Examples: Rn , Cn , Mn (R), C k (I ), Pn . Skills • Be able to define a vector space. Specifically, be able to identify and list the ten axioms A1–A10 governing the vector space operations. • Know each of the standard examples of vector spaces given at the end of the section, and know how to perform the vector operations in these vector spaces. • Be able to check whether or not each of the axioms A1– A10 holds for specific examples V . This includes, if possible, closure of V under vector addition and scalar multiplication, as well as identification of the zero vector and the additive inverse of each vector in the set V. • Be able to prove basic properties that hold generally for vector spaces V (see Theorem 4.2.6). i i i i i i i “main” 2007/2/16 page 249 i 4.2 Definition of a Vector Space 249 True-False Review (a) Is the zero vector from M2 (R) in S ? For Questions 1–8, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. (b) Give an explicit example illustrating that S is not closed under matrix addition. (c) Is S closed under scalar multiplication? Justify your answer. 2. If v is a vector in a vector space V , and r and s are scalars such that r v = s v, then r = s . 7. Let N = {1, 2, . . . } denote the set of all positive integers. Give three reasons why N, together with the usual operations of addition and scalar multiplication, is not a real vector space. 3. The set Z of integers, together with the usual operations of addition and scalar multiplication, forms a vector space. 8. We have defined the set R2 = {(x, y) : x, y ∈ R}, together with the addition and scalar multiplication operations as follows: 1. The zero vector in a vector space V is unique. 4. If x and y are vectors in a vector space V , then the additive inverse of x + y is (−x) + (−y). 5. 
The additive inverse of a vector v in a vector space V is unique. 6. The set {0}, with the usual operations of addition and scalar multiplication, forms a vector space. (x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , y1 + y2 ), k(x1 , y1 ) = (kx1 , ky1 ). Give a complete verification that each of the vector space axioms is satisfied. 7. The set {0, 1}, with the usual operations of addition and scalar multiplication, forms a vector space. 9. Determine the zero vector in the vector space M2×3 (R), and the additive inverse of a general element. (Note that the vector space axioms A1–A4 and A7–A10 follow directly from matrix algebra.) 8. The set of positive real numbers, with the usual operations of addition and scalar multiplication, forms a vector space. 10. Generalize the previous exercise to find the zero vector and the additive inverse of a general element of Mm×n (R). 11. Let P denote the set of all polynomials whose degree is exactly 2. Is P a vector space? Justify your answer. Problems For Problems 1–5, determine whether the given set of vectors is closed under addition and closed under scalar multiplication. In each case, take the set of scalars to be the set of all real numbers. 1. The set of all rational numbers. 2. The set of all upper triangular n × n matrices with real elements. 3. The set of all solutions to the differential equation y +9y = 4x 2 . (Do not solve the differential equation.) 4. The set of all solutions to the differential equation y + 9y = 0. (Do not solve the differential equation.) 5. The set of all solutions to the homogeneous linear system Ax = 0. 12. On R+ , the set of positive real numbers, define the operations of addition and scalar multiplication as follows: x + y = xy, c · x = xc. Note that the multiplication and exponentiation appearing on the right side of these formulas refer to the ordinary operations on real numbers. Determine whether R+ , together with these algebraic operations, is a vector space. 13. On R2 , define the operation of addition and multiplication by a real number as follows: (x1 , y1 ) + (x2 , y2 ) = (x1 − x2 , y1 − y2 ), k(x1 , y1 ) = (−kx1 , −ky1 ). 6. Let S = {A ∈ M2 (R) : det(A) = 0}. Which of the axioms for a vector space are satisfied by R2 with these algebraic operations? i i i i i i i “main” 2007/2/16 page 250 i 250 CHAPTER 4 Vector Spaces 14. On R2 , define the operation of addition by Determine which of the axioms for a vector space are satisfied by M2 (R) with the operations ⊕ and ·. (x1 , y1 ) + (x2 , y2 ) = (x1 x2 , y1 y2 ). Do axioms A5 and A6 in the definition of a vector space hold? Justify your answer. 15. On M2 (R), define the operation of addition by A + B = AB, and use the usual scalar multiplication operation. Determine which axioms for a vector space are satisfied by M2 (R) with the above operations. 16. On M2 (R), define the operations of addition and multiplication by a real number (⊕ and · , respectively) as follows: A ⊕ B = −(A + B), k · A = −kA, where the operations on the right-hand sides of these equations are the usual ones associated with M2 (R). 4.3 For Problems 17–18, verify that the given set of objects together with the usual operations of addition and scalar multiplication is a complex vector space. 17. C2 . 18. M2 (C), the set of all 2 × 2 matrices with complex entries. 19. Is C3 a real vector space? Explain. 20. Is R3 a complex vector space? Explain. 21. Prove part 3 of Theorem 4.2.6. 22. Prove part 6 of Theorem 4.2.6. 23. Prove that Pn is a vector space. 
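Several of the problems above (for instance, Problems 12–16) ask which of the axioms A1–A10 survive when nonstandard operations are imposed on a familiar set. As a purely exploratory aid, the following sketch (our own, assuming NumPy; the helper name spot_check is not from the text) tests a few axioms at randomly sampled points. Such a numerical check can only falsify an axiom; it never substitutes for a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def spot_check(add, scale, sample, trials=100, tol=1e-9):
    """Randomly test commutativity (A3), associativity of addition (A4),
    and the distributive law k(u + v) = ku + kv for candidate operations.
    A reported failure is decisive; 'no counterexample found' proves nothing."""
    for _ in range(trials):
        u, v, w = sample(), sample(), sample()
        k = rng.standard_normal()
        if not np.allclose(add(u, v), add(v, u), atol=tol):
            return "commutativity (A3) fails"
        if not np.allclose(add(add(u, v), w), add(u, add(v, w)), atol=tol):
            return "associativity of addition (A4) fails"
        if not np.allclose(scale(k, add(u, v)), add(scale(k, u), scale(k, v)), atol=tol):
            return "distributive law fails"
    return "no counterexample found"

# The usual operations on R^2 pass the spot check, as expected.
print(spot_check(lambda u, v: u + v,
                 lambda k, u: k * u,
                 lambda: rng.standard_normal(2)))
```

Plugging in the candidate operations from Problem 13 or Problem 16, say, gives quick evidence about which axioms are worth trying to prove or disprove by hand.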
Subspaces Let us try to make contact between the abstract vector space idea and the solution of an applied problem. Vector spaces generally arise as the sets containing the unknowns in a given problem. For example, if we are solving a differential equation, then the basic unknown is a function, and therefore any solution to the differential equation will be an element of the vector space V of all functions defined on an appropriate interval. Consequently, the solution set of a differential equation is a subset of V . Similarly, consider the system of linear equations Ax = b, where A is an m × n matrix with real elements. The basic unknown in this system, x, is a column n-vector, or equivalently a vector in Rn . Consequently, the solution set to the system is a subset of the vector space Rn . As these examples illustrate, the solution set of an applied problem is generally a subset of vectors from an appropriate vector space (schematically represented in Figure 4.3.1). The question we will need to answer in the future is whether this subset of vectors is a vector space in its own right. The following definition introduces the terminology we will use: Vector space of unknowns V S Solution set of applied problem: Is S a vector space? Figure 4.3.1: The solution set S of an applied problem is a subset of the vector space V of unknowns in the problem. i i i i i i i “main” 2007/2/16 page 251 i 4.3 Subspaces 251 DEFINITION 4.3.1 Let S be a nonempty subset of a vector space V . If S is itself a vector space under the same operations of addition and scalar multiplication as used in V , then we say that S is a subspace of V . In establishing that a given subset S of vectors from a vector space V is a subspace of V , it would appear as though we must check that each axiom in the vector space definition is satisfied when we restrict our attention to vectors lying only in S . The first and most important theorem of the section tells us that all we need do, in fact, is check the closure axioms A1 and A2. If these are satisfied, then the remaining axioms necessarily hold in S . This is a very useful theorem that will be applied on several occasions throughout the remainder of the text. Theorem 4.3.2 Let S be a nonempty subset of a vector space V . Then S is a subspace of V if and only if S is closed under the operations of addition and scalar multiplication in V . Proof If S is a subspace of V , then it is a vector space, and hence, it is certainly closed under addition and scalar multiplication. Conversely, assume that S is closed under addition and scalar multiplication. We must prove that Axioms A3–A10 of Definition 4.2.1 hold when we restrict to vectors in S . Consider first the axioms A3, A4, and A7–A10. These are properties of the addition and scalar multiplication operations, hence since we use the same operations in S as in V , these axioms are all inherited from V by the subset S . Finally, we establish A5 and A6: Choose any vector1 u in S . Since S is closed under scalar multiplication, both 0u and (−1)u are in S . But by Theorem 4.2.6, 0u = 0 and (−1)u = −u, hence 0 and −u are both in S . Therefore, A5 and A6 are satisfied. The idea behind Theorem 4.3.2 is that once we have a vector space V in place, then any nonempty subset S , equipped with the same addition and scalar multiplication operations, will inherit all of the axioms that involve those operations. The only possible concern we have for S is whether or not it satisfies the closure axioms A1 and A2. 
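In symbols, then, Theorem 4.3.2 reduces the subspace question for a nonempty subset S of V to the two closure checks

\[
\mathbf{u}, \mathbf{v} \in S \;\Longrightarrow\; \mathbf{u} + \mathbf{v} \in S
\qquad \text{and} \qquad
k \in F,\; \mathbf{v} \in S \;\Longrightarrow\; k\mathbf{v} \in S.
\]

If either implication fails for even one choice of vectors or scalar, then S is not a subspace of V.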
Of course, we presumably had to carry out the full verification of A1–A10 for the vector space V in the first place, before gaining the shortcut of Theorem 4.3.2 for the subset S . In determining whether a subset S of a vector space V is a subspace of V , we must keep clear in our minds what the given vector space is and what conditions on the vectors in V restrict them to lie in the subset S . This is most easily done by expressing S in set notation as follows: S = {v ∈ V : conditions on v}. We illustrate with an example. Example 4.3.3 Verify that the set of all real solutions to the following linear system is a subspace of R3 : x1 + 2x2 − x3 = 0, 2x1 + 5x2 − 4x3 = 0. Solution: The reduced row-echelon form of the augmented matrix of the system is 10 30 , 0 1 −2 0 1 This is possible since S is assumed to be nonempty. i i i i i i i “main” 2007/2/16 page 252 i 252 CHAPTER 4 Vector Spaces so that the solution set of the system is S = {x ∈ R3 : x = (−3r, 2r, r), r ∈ R}, which is a nonempty subset of R3 . We now use Theorem 4.3.2 to verify that S is a subspace of R3 : If x = (−3r, 2r, r) and y = (−3s, 2s, s) are any two vectors in S , then x + y = (−3r, 2r, r) + (−3s, 2s, s) = (−3(r + s), 2(r + s), r + s) = (−3t, 2t, t), where t = r + s . Thus, x + y meets the required form for elements of S , and consequently, if we add two vectors in S , the result is another vector in S . Similarly, if we multiply an arbitrary vector x = (−3r, 2r, r) in S by a real number k , the resulting vector is k x = k(−3r, 2r, r) = (−3kr, 2kr, kr) = (−3w, 2w, w), where w = kr . Hence, k x again has the proper form for membership in the subset S , and so S is closed under scalar multiplication. By Theorem 4.3.2, S is a subspace of R3 . Note, of course, that our application of Theorem 4.3.2 hinges on our prior knowledge that R3 is a vector space. Geometrically, the vectors in S lie along the line of intersection of the planes with the given equations. This is the line through the origin in the direction of the vector v = (−3, 2, 1). (See Figure 4.3.2.) z x ( 3r, 2r, r) r ( 3, 2, 1) y x Figure 4.3.2: The solution set to the homogeneous system of linear equations in Example 4.3.3 is a subspace of R3 . Example 4.3.4 Verify that S = {x ∈ R2 : x = (r, −3r + 1), r ∈ R} is not a subspace of R2 . Solution: One approach here, according to Theorem 4.3.2, is to demonstrate the failure of closure under addition or scalar multiplication. For example, if we start with two vectors in S , say x = (r, −3r + 1) and y = (s, −3s + 1), then x + y = (r, −3r + 1) + (s, −3s + 1) = (r + s, −3(r + s) + 2) = (w, −3w + 2), where w = r + s . We see that x + y does not have the required form for membership in S . Hence, S is not closed under addition and therefore fails to be a subspace of R2 . Alternatively, we can show similarly that S is not closed under scalar multiplication. Observant readers may have noticed another reason that S cannot form a subspace. Geometrically, the points in S correspond to those points that lie on the line with Cartesian equation y = −3x + 1. Since this line does not pass through the origin, S does not contain the zero vector 0 = (0, 0), and therefore we know S cannot be a subspace. i i i i i i i “main” 2007/2/16 page 253 i 4.3 Remark Subspaces 253 In general, we have the following important observation. If a subset S of a vector space V fails to contain the zero vector 0, then it cannot form a subspace. 
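The reason is immediate from what we have already proved: a subspace S is nonempty and closed under scalar multiplication, so choosing any u in S and the scalar k = 0 gives 0 = 0u ∈ S, by Theorem 4.2.6. Hence every subspace must contain the zero vector.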
This observation can often be made more quickly than deciding whether or not S is closed under addition and closed under scalar multiplication. However, we caution that if the zero vector does belong to S , then the observation is inconclusive and further investigation is required to determine whether or not S forms a subspace of V . Example 4.3.5 Let S denote the set of all real symmetric n × n matrices. Verify that S is a subspace of Mn (R). Solution: The subset of interest is S = {A ∈ Mn (R) : AT = A}. Note that S is nonempty, since, for example, it contains the zero matrix 0n . We now verify closure of S under addition and scalar multiplication. Let A and B be in S . Then AT = A and B T = B. Using these conditions and the properties of the transpose yields (A + B)T = AT + B T = A + B and (kA)T = kAT = kA for all real values of k . Consequently A + B and kA are both symmetric matrices, so they are elements of S . Hence S is closed under both addition and scalar multiplication and so is indeed a subspace of Mn (R). Remark Notice in Example 4.3.5 that it was not necessary to actually write out the matrices A and B in terms of their elements [aij ] and [bij ], respectively. This shows the advantage of using simple abstract notation to describe the elements of the subset S in some situations. Example 4.3.6 Let V be the vector space of all real-valued functions defined on an interval [a, b], and let S denote the set of all functions in V that satisfy f (a) = 0. Verify that S is a subspace of V . Solution: We have S = {f ∈ V : f (a) = 0}, which is nonempty since it contains, for example, the zero function O(x) = 0 for all x in [a, b]. Assume that f and g are in S , so that f (a) = 0 and g(a) = 0. We now check for closure of S under addition and scalar multiplication. We have (f + g)(a) = f (a) + g(a) = 0 + 0 = 0, i i i i i i i “main” 2007/2/16 page 254 i 254 CHAPTER 4 Vector Spaces which implies that f + g ∈ S . Hence, S is closed under addition. Further, if k is any real number, (kf )(a) = kf (a) = k 0 = 0, so that S is also closed under scalar multiplication. Theorem 4.3.2 therefore implies that S is a subspace of V . Some representative functions from S are sketched in Figure 4.3.3. In the next theorem, we establish that the subset {0} of a vector space V is in fact a subspace of V . We call this subspace the trivial subspace of V . Theorem 4.3.7 Let V be a vector space with zero vector 0. Then S = {0} is a subspace of V . Proof Note that S is nonempty. Further, the closure of S under addition and scalar multiplication follow, respectively, from 0+0=0 and k 0 = 0, where the second statement follows from Theorem 4.2.6. We now use Theorem 4.3.2 to establish an important result pertaining to homogeneous systems of linear equations that has already been illustrated in Example 4.3.3. Theorem 4.3.8 Let A be an m × n matrix. The solution set of the homogeneous system of linear equations Ax = 0 is a subspace of Cn . Proof Let S denote the solution set of the homogeneous linear system. Then we can write S = {x ∈ Cn : Ax = 0}, y a subset of Cn . Since a homogeneous system always admits the trivial solution x = 0, we know that S is nonempty. If x1 and x2 are in S , then f (x) Ax1 = 0 a b x and Ax2 = 0. Using properties of the matrix product, we have A(x1 + x2 ) = Ax1 + Ax2 = 0 + 0 = 0, so that x1 + x2 also solves the system and therefore is in S . Furthermore, if k is any Figure 4.3.3: Representative functions in the subspace S given complex scalar, then in Example 4.3.6. 
Each function in A(k x) = kAx = k 0 = 0, S satisfies f (a) = 0. so that k x is also a solution of the system and therefore is in S . Since S is closed under both addition and scalar multiplication, it follows from Theorem 4.3.2 that S is a subspace of Cn . The preceding theorem has established that the solution set to any homogeneous linear system of equations is a vector space. Owing to the importance of this vector space, it is given a special name. DEFINITION 4.3.9 Let A be an m × n matrix. The solution set to the corresponding homogeneous linear system Ax = 0 is called the null space of A and is denoted nullspace(A). Thus, nullspace(A) = {x ∈ Cn : Ax = 0}. i i i i i i i “main” 2007/2/16 page 255 i 4.3 255 Subspaces Remarks 1. If the matrix A has real elements, then we will consider only the corresponding real solutions to Ax = 0. Consequently, in this case, nullspace(A) = {x ∈ Rn : Ax = 0}, a subspace of Rn . 2. The previous theorem does not hold for the solution set of a nonhomogeneous linear system Ax = b, for b = 0, since x = 0 is not in the solution set of the system. Next we introduce the vector space of primary importance in the study of linear differential equations. This vector space arises as a subspace of the vector space of all functions that are defined on an interval I . Example 4.3.10 Let V denote the vector space of all functions that are defined on an interval I , and let C k (I ) denote the set of all functions that are continuous and have (at least) k continuous derivatives on the interval I , for a fixed non-negative integer k . Show that C k (I ) is a subspace of V . Solution: In this case C k (I ) = {f ∈ V : f, f , f , . . . , f (k) exist and are continuous on I }. This set is nonempty, as the zero function O(x) = 0 for all x ∈ I is an element of C k (I ). Moreover, it follows from the properties of derivatives that if we add two functions in C k (I ), the result is a function in C k (I ). Similarly, if we multiply a function in C k (I ) by a scalar, then the result is a function in C k (I ). Thus, Theorem 4.3.2 implies that C k (I ) is a subspace of V . Our final result in this section ties together the ideas introduced here with the theory of differential equations. Theorem 4.3.11 The set of all solutions to the homogeneous linear differential equation y + a1 (x)y + a2 (x)y = 0 (4.3.1) on an interval I is a vector space. Proof Let S denote the set of all solutions to the given differential equation. Then S is a nonempty subset of C 2 (I ), since the identically zero function y = 0 is a solution to the differential equation. We establish that S is in fact a subspace of2 C k (I ). Let y1 and y2 be in S , and let k be a scalar. Then we have the following: y1 + a1 (x)y1 + a2 (x)y1 = 0 and y2 + a1 (x)y2 + a2 (x)y2 = 0. (4.3.2) Now, if y(x) = y1 (x) + y2 (x), then y + a1 y + a2 y = (y1 + y2 ) + a1 (x)(y1 + y2 ) + a2 (x)(y1 + y2 ) = [y1 + a1 (x)y1 + a2 (x)y1 ] + [y2 + a1 (x)y2 + a2 (x)y2 ] = 0 + 0 = 0, 2 It is important at this point that we have already established Example 4.3.10, so that S is a subset of a set that is indeed a vector space. i i i i i i i “main” 2007/2/16 page 256 i 256 CHAPTER 4 Vector Spaces where we have used (4.3.2). Consequently, y(x) = y1 (x) + y2 (x) is a solution to the differential equation (4.3.1). Moreover, if y(x) = ky1 (x), then y + a1 y + a2 y = (ky1 ) + a1 (x)(ky1 ) + a2 (x)(ky1 ) = k [y1 + a1 (x)y1 + a2 (x)y1 ] = 0, where we have once more used (4.3.2). This establishes that y(x) = ky1 (x) is a solution to Equation (4.3.1). 
Therefore, S is closed under both addition and scalar multiplication. Consequently, the set of all solutions to Equation (4.3.1) is a subspace of C 2 (I ). We will refer to the set of all solutions to a differential equation of the form (4.3.1) as the solution space of the differential equation. A key theoretical result that we will establish in Chapter 6 regarding the homogeneous linear differential equation (4.3.1) is that every solution to the differential equation has the form y(x) = c1 y1 (x) + c2 y2 (x), where y1 , y2 are any two nonproportional solutions. The power of this result is impressive: It reduces the search for all solutions to Equation (4.3.1) to the search for just two nonproportional solutions. In vector space terms, the result can be restated as follows: Every vector in the solution space to the differential equation (4.3.1) can be written as a linear combination of any two nonproportional solutions y1 and y2 . We say that the solution space is spanned by y1 and y2 . Moreover, two nonproportional solutions are referred to as linearly independent. For example, we saw in Example 1.2.16 that the set of all solutions to the differential equation y + ω2 y = 0 is spanned by y1 (x) = cos ωx , and y2 (x) = sin ωx, and y1 and y2 are linearly independent. We now begin our investigation as to whether this type of idea will work more generally when the solution set to a problem is a vector space. For example, what about the solution set to a homogeneous linear system Ax = 0? We might suspect that if there are k free variables defining the vectors in nullspace(A), then every solution to Ax = 0 can be expressed as a linear combination of k basic solutions. We will establish that this is indeed the case in Section 4.9. The two key concepts we need to generalize are (1) spanning a general vector space with a set of vectors, and (2) linear independence in a general vector space. These will be addressed in turn in the next two sections. Exercises for 4.3 Key Terms Subspace, Trivial subspace, Null space of a matrix A. Skills • Be able to check whether or not a subset S of a vector space V is a subspace of V . • Be able to compute the null space of an m × n matrix A. you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. The null space of an m × n matrix A with real elements is a subspace of Rm . 2. The solution set of any linear system of m equations in n variables forms a subspace of Cn . True-False Review 3. The points in R2 that lie on the line y = mx + b form a subspace of R2 if and only if b = 0. For Questions 1–8, decide if the given statement is true or false, and give a brief justification for your answer. If true, 4. If m < n, then Rm is a subspace of Rn . i i i i i i i “main” 2007/2/16 page 257 i 4.3 5. A nonempty set S of a vector space V that is closed under scalar multiplication contains the zero vector of V. 6. If V = R is a vector space under the usual operations of addition and scalar multiplication, then the subset R+ of positive real numbers, together with the operations defined in Problem 12 of Section 4.2, forms a subspace of V . R3 7. If V = and S consists of all points on the xy -plane, the xz-plane, and the yz-plane, then S is a subspace of V . 8. If V is a vector space, then two different subspaces of V can contain no common vectors other than 0. Problems 1. Let S = {x ∈ R2 : x = (2k, −3k), k ∈ R}. (a) Establish that S is a subspace of R2 . 
(b) Make a sketch depicting the subspace S in the Cartesian plane. 2. Let S = {x ∈ R3 : x = (r − 2s, 3r + s, s), r, s ∈ R}. (a) Establish that S is a subspace of R3 . (b) Show that the vectors in S lie on the plane with equation 3x − y + 7z = 0. For Problems 3–19, express S in set notation and determine whether it is a subspace of the given vector space V . 3. V = R2 , and S is the set of all vectors (x, y) in V satisfying 3x + 2y = 0. 6. V = Rn , and S is the set of all solutions to the nonhomogeneous linear system Ax = b, where A is a fixed m × n matrix and b (= 0) is a fixed vector. 7. V = R2 , and S consists of all vectors (x, y) satisfying x 2 − y 2 = 0. 8. V = M2 (R), and S is the subset of all 2 × 2 matrices with det(A) = 1. 9. V = Mn (R), and S is the subset of all n × n lower triangular matrices. 257 10. V = Mn (R), and S is the subset of all n × n invertible matrices. 11. V = M2 (R), and S is the subset of all 2 × 2 symmetric matrices. 12. V = M2 (R), and S is the subset of all 2 × 2 skewsymmetric matrices. 13. V is the vector space of all real-valued functions defined on the interval [a, b], and S is the subset of V consisting of all functions satisfying f (a) = f (b). 14. V is the vector space of all real-valued functions defined on the interval [a, b], and S is the subset of V consisting of all functions satisfying f (a) = 1. 15. V is the vector space of all real-valued functions defined on the interval (−∞, ∞), and S is the subset of V consisting of all functions satisfying f (−x) = f (x) for all x ∈ (−∞, ∞). 16. V = P2 , and S is the subset of P2 consisting of all polynomials of the form p(x) = ax 2 + b. 17. V = P2 , and S is the subset of P2 consisting of all polynomials of the form p(x) = ax 2 + 1. 18. V = C 2 (I ), and S is the subset of V consisting of those functions satisfying the differential equation y + 2y − y = 0 on I . 19. V = C 2 (I ), and S is the subset of V consisting of those functions satisfying the differential equation 4. V = R4 , and S is the set of all vectors of the form (x1 , 0, x3 , 2). 5. V = R3 , and S is the set of all vectors (x, y, z) in V satisfying x + y + z = 1. Subspaces y + 2y − y = 1 on I . For Problems 20–22, determine the null space of the given matrix A. 1 −2 1 20. A = 4 −7 −2 . −1 3 4 1 3 −2 1 21. A = 3 10 −4 6 . 2 5 −6 −1 1 i −2 22. A = 3 4i −5 . −1 −3i i i i i i i i i “main” 2007/2/16 page 258 i 258 CHAPTER 4 Vector Spaces 23. Show that the set of all solutions to the nonhomogeneous differential equation and let S1 + S2 = {v ∈ V : v = x + y for some x ∈ S1 and y ∈ S2 } . y + a1 y + a2 y = F (x), where F (x) is nonzero on an interval I , is not a subspace of C 2 (I ). 24. Let S1 and S2 be subspaces of a vector space V . Let (b) Show that S1 ∩ S2 is a subspace of V . S1 ∪ S2 = {v ∈ V : v ∈ S1 or v ∈ S2 }, (c) Show that S1 + S2 is a subspace of V . S1 ∩ S2 = {v ∈ V : v ∈ S1 and v ∈ S2 }, 4.4 (a) Show that, in general, S1 ∪ S2 is not a subspace of V . Spanning Sets The only algebraic operations that are defined in a vector space V are those of addition and scalar multiplication. Consequently, the most general way in which we can combine the vectors v1 , v2 , . . . , vk in V is c1 v1 + c2 v2 + · · · + ck vk , (4.4.1) where c1 , c2 , . . . , ck are scalars. An expression of the form (4.4.1) is called a linear combination of v1 , v2 , . . . , vk . Since V is closed under addition and scalar multiplication, it follows that the foregoing linear combination is itself a vector in V . 
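For instance (a small illustration of (4.4.1), not one of the text's numbered examples), taking v1 = (1, 0, 1) and v2 = (0, 1, 1) in R3 with scalars c1 = 2 and c2 = −3 gives the linear combination

\[
2\mathbf{v}_1 - 3\mathbf{v}_2 = 2(1,0,1) + (-3)(0,1,1) = (2,-3,-1),
\]

which is indeed again a vector in R3.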
One of the questions we wish to answer is whether every vector in a vector space can be obtained by taking linear combinations of a finite set of vectors. The following terminology is used in the case when the answer to this question is affirmative: DEFINITION 4.4.1 If every vector in a vector space V can be written as a linear combination of v1 , v2 , . . . , vk , we say that V is spanned or generated by v1 , v2 , . . . , vk and call the set of vectors {v1 , v2 , . . . , vk } a spanning set for V . In this case, we also say that {v1 , v2 , . . . , vk } spans V . This spanning idea was introduced in the preceding section within the framework of differential equations. In addition, we are all used to representing geometric vectors in R3 in terms of their components as (see Section 4.1) v = a i + bj + ck, where i, j, and k denote the unit vectors pointing along the positive x -, y -, and z-axes, respectively, of a rectangular Cartesian coordinate system. Using the above terminology, we say that v has been expressed as a linear combination of the vectors i, j, and k, and that the vector space of all geometric vectors is spanned by i, j, and k. We now consider several examples to illustrate the spanning concept in different vector spaces. Example 4.4.2 Show that R2 is spanned by the vectors v1 = (1, 1) and v2 = (2, −1). i i i i i i i “main” 2007/2/16 page 259 i 4.4 259 Spanning Sets Solution: We must establish that for every v = (x1 , x2 ) in R2 , there exist constants c1 and c2 such that v = c1 v1 + c2 v2 . y (4.4.2) That is, in component form, (4/3, 4/3) (1, 1) (x1 , x2 ) = c1 (1, 1) + c2 (2, −1). (2, 1) v1 Equating corresponding components in this equation yields the following linear system: v x (2/3, 1/3) v2 c1 + 2c2 = x1 , c1 − c2 = x2 . (2, 1) In this system, we view x1 and x2 as fixed, while the variables we must solve for are c1 and c2 . The determinant of the matrix of coefficients of this system is Figure 4.4.1: The vector v = (2, 1) expressed as a linear combination of v1 = (1, 1) and v2 = (2, −1). 12 = −3. 1 −1 Since this is nonzero regardless of the values of x1 and x2 , the matrix of coefficients is invertible, and hence for all (x1 , x2 ) ∈ R2 , the system has a (unique) solution according to Theorem 2.6.4. Thus, Equation (4.4.2) can be satisfied for every vector v ∈ R2 , so the given vectors do span R2 . Indeed, solving the linear system yields c1 = 1 (x1 + 2x2 ), 3 y c2v2 v2 Hence, v2 c2 c v (x1 , x2 ) = 1 (x1 + 2x2 )v1 + 1 (x1 − x2 )v2 . 3 3 v1 1 v1 c2 = 1 (x1 − x2 ). 3 For example, if v = (2, 1), then c1 = x illustrated in Figure 4.4.1. c1v1 4 3 and c2 = 1 , so that v = 4 v1 + 1 v2 . This is 3 3 3 Figure 4.4.2: Any two noncollinear vectors in R2 span R2 . More generally, any two nonzero and noncolinear vectors v1 and v2 in R2 span R2 , since, as illustrated geometrically in Figure 4.4.2, every vector in R2 can be written as a linear combination of v1 and v2 . Example 4.4.3 Determine whether the vectors v1 = (1, −1, 4), v2 = (−2, 1, 3), and v3 = (4, −3, 5) span R3 . Solution: Let v = (x1 , x2 , x3 ) be an arbitrary vector in R3 . We must determine whether there are real numbers c1 , c2 , c3 such that v = c1 v1 + c2 v2 + c3 v3 (4.4.3) or, in component form, (x1 , x2 , x3 ) = c1 (1, −1, 4) + c2 (−2, 1, 3) + c3 (4, −3, 5). Equating corresponding components on either side of this vector equation yields c1 − 2c2 + 4c3 = x1 , −c1 + c2 − 3c3 = x2 , 4c1 + 3c2 + 5c3 = x3 . 
i i i i i i i “main” 2007/2/16 page 260 i 260 CHAPTER 4 Vector Spaces Reducing the augmented matrix of this system to row-echelon form, we obtain 1 −2 4 x1 0 1 −1 . −x1 − x2 0 0 0 7x1 + 11x2 + x3 It follows that the system is consistent if and only if x1 , x2 , x3 satisfy 7x1 + 11x2 + x3 = 0. (4.4.4) Consequently, Equation (4.4.3) holds only for those vectors v = (x1 , x2 , x3 ) in R3 whose components satisfy Equation (4.4.4). Hence, v1 , v2 , and v3 do not span R3 . Geometrically, Equation (4.4.4) is the equation of a plane through the origin in space, and so by taking linear combinations of the given vectors, we can obtain only those vectors which lie on this plane. We leave it as an exercise to verify that indeed the three given vectors lie in the plane with Equation (4.4.4). It is worth noting that this plane forms a subspace S of R3 , and that while V is not spanned by the vectors v1 , v2 , and v3 , S is. The reason that the vectors in the previous example did not span R3 was because they were coplanar. In general, any three noncoplanar vectors v1 , v2 , and v3 in R3 span R3 , since, as illustrated in Figure 4.4.3, every vector in R3 can be written as a linear combination of v1 , v2 , and v3 . In subsequent sections we will make this same observation from a more algebraic point of view. z v c1v1 c2v2 c3v3 v1 c3v3 c1v1 v3 c2v2 v2 y x Figure 4.4.3: Any three noncoplanar vectors in R3 span R3 . Notice in the previous example that the linear combination (4.4.3) can be written as the matrix equation Ac = v, where the columns of A are the given vectors v1 , v2 , and v3 : A = [v1 , v2 , v3 ]. Thus, the question of whether or not the vectors v1 , v2 , and v3 span R3 can be formulated as follows: Does the system Ac = v have a solution c for every v in R3 ? If so, then the column vectors of A span R3 , and if not, then the column vectors of A do not span R3 . This reformulation applies more generally to vectors in Rn , and we state it here for the record. Theorem 4.4.4 Let v1 , v2 , . . . , vk be vectors in Rn . Then {v1 , v2 , . . . , vk } spans Rn if and only if, for the matrix A = [v1 , v2 , . . . , vk ], the linear system Ac = v is consistent for every v in Rn . i i i i i i i “main” 2007/2/16 page 261 i 4.4 Spanning Sets 261 Proof Rewriting the system Ac = v as the linear combination c1 v1 + c2 v2 + · · · + ck vk = v, we see that the existence of a solution (c1 , c2 , . . . , ck ) to this vector equation for each v in Rn is equivalent to the statement that {v1 , v2 , . . . , vk } spans Rn . Next, we consider a couple of examples involving vector spaces other than Rn . Example 4.4.5 Verify that 10 , 00 A1 = A2 = 11 , 00 A3 = 11 , 10 A4 = 11 11 span M2 (R). Solution: An arbitrary vector in M2 (R) is of the form A= ab . cd If we write c1 A1 + c2 A2 + c3 A3 + c4 A4 = A, then equating the elements of the matrices on each side of the equation yields the system c1 + c2 + c3 + c4 c2 + c3 + c4 c3 + c4 c4 = a, = b, = c, = d. Solving this by back substitution gives c1 = a − b, c2 = b − c, c3 = c − d, c4 = d. Hence, we have A = (a − b)A1 + (b − c)A2 + (c − d)A3 + dA4 . Consequently every vector in M2 (R) can be written as a linear combination of A1 , A2 , A3 , and A4 , and therefore these matrices do indeed span M2 (R). Remark The most natural spanning set for M2 (R) is 10 01 00 00 , , , 00 00 10 01 , a fact that we leave to the reader as an exercise. Example 4.4.6 Determine a spanning set for P2 , the vector space of all polynomials of degree 2 or less. 
Solution: The general polynomial in P2 is p(x) = a0 + a1 x + a2 x 2 . If we let p0 (x) = 1, p1 (x) = x, p2 (x) = x 2 , then p(x) = a0 p0 (x) + a1 p1 (x) + a2 p2 (x). Thus, every vector in P2 is a linear combination of 1, x , and x 2 , and so a spanning set for P2 is {1, x, x 2 }. For practice, the reader might show that {x 2 , x + x 2 , 1 + x + x 2 } is another spanning set for P2 , by making the appropriate modifications to the calculations in this example. i i i i i i i “main” 2007/2/16 page 262 i 262 CHAPTER 4 Vector Spaces The Linear Span of a Set of Vectors Now let v1 , v2 , . . . , vk be vectors in a vector space V . Forming all possible linear combinations of v1 , v2 , . . . , vk generates a subset of V called the linear span of {v1 , v2 , . . . , vk }, denoted span{v1 , v2 , . . . , vk }. We have span{v1 , v2 , . . . , vk } = {v ∈ V : v = c1 v1 + c2 v2 + · · · + ck vk , c1 , c2 , . . . , ck ∈ F }. (4.4.5) For example, suppose V = C 2 (I ), and let y1 (x) = sin x and y2 (x) = cos x . Then span{y1 , y2 } = {y ∈ C 2 (I ) : y(x) = c1 cos x + c2 sin x, c1 , c2 ∈ R}. From Example 1.2.16, we recognize y1 and y2 as being nonproportional solutions to the differential equation y + y = 0. Consequently, in this example, the linear span of the given functions coincides with the set of all solutions to the differential equation y + y = 0 and therefore is a subspace of V . Our next theorem generalizes this to show that any linear span of vectors in any vector space forms a subspace. Theorem 4.4.7 Let v1 , v2 , . . . , vk be vectors in a vector space V . Then span{v1 , v2 , . . . , vk } is a subspace of V . Proof Let S = span{v1 , v2 , . . . , vk }. Then 0 ∈ S (corresponding to c1 = c2 = · · · = ck = 0 in (4.4.5)), so S is nonempty. We now verify closure of S under addition and scalar multiplication. If u and v are in S , then, from Equation (4.4.5), u = a1 v1 + a2 v2 + · · · + ak vk and v = b1 v1 + b2 v2 + · · · + bk vk , for some scalars ai , bi . Thus, u + v = (a1 v1 + a2 v2 + · · · + ak vk ) + (b1 v1 + b2 v2 + · · · + bk vk ) = (a1 + b1 )v1 + (a2 + b2 )v2 + · · · + (ak + bk )vk = c1 v1 + c2 v2 + · · · + ck vk , where ci = ai + bi for each i = 1, 2, . . . , k . Consequently, u + v has the proper form for membership in S according to (4.4.5), so S is closed under addition. Further, if r is any scalar, then r u = r(a1 v1 + a2 v2 + · · · + ak vk ) = (ra1 )v1 + (ra2 )v2 + · · · + (rak )vk = d1 v1 + d2 v2 + · · · + dk vk , where di = rai for each i = 1, 2, . . . , k . Consequently, r u ∈ S , and so S is also closed under scalar multiplication. Hence, S = span{v1 , v2 , . . . , vk } is a subspace of V . Remarks 1. We will also refer to span{v1 , v2 , . . . , vk } as the subspace of V spanned by v1 , v2 , . . . , vk . 2. As a special case, we will declare that span(∅) = {0}. i i i i i i i “main” 2007/2/16 page 263 i 4.4 Example 4.4.8 Spanning Sets 263 If V = R2 and v1 = (−1, 1), determine span{v1 }. Solution: We have span{v1 } = {v ∈ R2 : v = c1 v1 , c1 ∈ R} = {v ∈ R2 : v = c1 (−1, 1), c1 ∈ R} = {v ∈ R2 : v = (−c1 , c1 ), c1 ∈ R}. Geometrically, this is the line through the origin with parametric equations x = −c1 , y = c1 , so that the Cartesian equation of the line is y = −x . (See Figure 4.4.4.) y (—c1, c1) c1v1 ( 1, 1) The subspace of by the vector v1 2 spanned ( 1, 1) v1 x Figure 4.4.4: The subspace of R2 spanned by v1 = (−1, 1). Example 4.4.9 If V = R3 , v1 = (1, 0, 1), and v2 = (0, 1, 1), determine the subspace of R3 spanned by v1 and v2 . Does w = (1, 1, −1) lie in this subspace? 
Solution: We have span{v1 , v2 } = {v ∈ R3 : v = c1 v1 + c2 v2 , c1 , c2 ∈ R} = {v ∈ R3 : v = c1 (1, 0, 1) + c2 (0, 1, 1), c1 , c2 ∈ R} = {v ∈ R3 : v = (c1 , c2 , c1 + c2 ), c1 , c2 ∈ R}. Since the vector w = (1, 1, −1) is not of the form (c1 , c2 , c1 + c2 ), it does not lie in span{v1 , v2 }. Geometrically, span{v1 , v2 } is the plane through the origin determined by the two given vectors v1 and v2 . It has parametric equations x = c1 , y = c2 , z = c1 + c2 , which implies that its Cartesian equation is z = x + y . Thus, the fact that w is not in span{v1 , v2 } means that w does not lie in this plane. The subspace is depicted in Figure 4.4.5. z The subspace of 3 spanned by v1 (1, 0, 1), v2 (0, 1, 1) v1 v2 y x w (1, 1, 1) does not lie in span{v1, v2} Figure 4.4.5: The subspace of R3 spanned by v1 = (1, 0, 1) and v2 = (0, 1, 1) is the plane with Cartesian equation z = x + y . i i i i i i i “main” 2007/2/16 page 264 i 264 CHAPTER 4 Vector Spaces Example 4.4.10 Let A1 = 10 , 00 A2 = 01 , 10 A3 = 00 01 in M2 (R). Determine span{A1 , A2 , A3 }. Solution: By definition we have span{A1 , A2 , A3 } = {A ∈ M2 (R) : A = c1 A1 + c2 A2 + c3 A3 , c1 , c2 , c3 ∈ R} 10 01 00 = A ∈ M2 (R) : A = c1 + c2 + c3 00 10 01 c1 c2 = A ∈ M2 (R) : A = , c1 , c2 , c3 ∈ R . c2 c3 This is the set of all real 2 × 2 symmetric matrices. Example 4.4.11 Determine the subspace of P2 spanned by p1 (x) = 1 + 3x, p2 (x) = x + x 2 , and decide whether {p1 , p2 } is a spanning set for P2 . Solution: We have span{p1 , p2 } = {p ∈ P2 : p(x) = c1 p1 (x) + c2 p2 (x), c1 , c2 ∈ R} = {p ∈ P2 : p(x) = c1 (1 + 3x) + c2 (x + x 2 ), c1 , c2 ∈ R} = {p ∈ P2 : p(x) = c1 + (3c1 + c2 )x + c2 x 2 , c1 , c2 ∈ R}. Next, we will show that {p1 , p2 } is not a spanning set for P2 . To establish this, we need give only one example of a polynomial in P2 that is not in span{p1 , p2 }. There are many such choices here, but suppose we consider p(x) = 1 + x . If this polynomial were in span{p1 , p2 }, then we would have to be able to find values of c1 and c2 such that 1 + x = c1 + (3c1 + c2 )x + c2 x 2 . (4.4.6) Since there is no x 2 term on the left-hand side of this expression, we must set c2 = 0. But then (4.4.6) would reduce to 1 + x = c1 (1 + 3x). Equating the constant terms on each side of this forces c1 = 1, but then the coefficients of x do not match. Hence, such an equality is impossible. Consequently, there are no values of c1 and c2 such that the Equation (4.4.6) holds, and therefore, span{p1 , p2 } = P2 . Remark In the previous example, the reader may well wonder why we knew from the beginning to select p(x) = 1 + x as a vector that would be outside of span{p1 , p2 }. In truth, we only need to find a polynomial that does not have the form p(x) = c1 + (3c1 + c2 )x + c2 x 2 and in fact, “most” of the polynomials in P2 would have achieved the desired result here. i i i i i i i “main” 2007/2/16 page 265 i 4.4 Spanning Sets 265 Exercises for 4.4 Key Terms Linear combination, Linear span, Spanning set. 8. If S is a spanning set for a vector space V , then any proper subset S of S is not a spanning set for V . Skills 9. The vector space of 3 × 3 upper triangular matrices is spanned by the matrices Eij where 1 ≤ i ≤ j ≤ 3. • Be able to determine whether a given set of vectors S spans a vector space V , and be able to prove your answer mathematically. • Be able to determine the linear span of a set of vectors. For vectors in Rn , be able to give a geometric description of the linear span. 
• If S is a spanning set for a vector space V , be able to write any vector in V as a linear combination of the elements of S . • Be able to construct a spanning set for a vector space V . As a special case, be able to determine a spanning set for the null space of an m × n matrix. • Be able to determine whether a particular vector v in a vector space V lies in the linear span of a set S of vectors in V . True-False Review For Questions 1–12, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. The linear span of a set of vectors in a vector space V forms a subspace of V . 2. If some vector v in a vector space V is a linear combination of vectors in a set S , then S spans V . 3. If S is a spanning set for a vector space V and W is a subspace of V , then S is a spanning set for W . 4. If S is a spanning set for a vector space V , then every vector v in V must be uniquely expressible as a linear combination of the vectors in S . 5. A set S of vectors in a vector space V spans V if and only if the linear span of S is V . 6. The linear span of two vectors in R3 is a plane through the origin. 7. Every vector space V has a finite spanning set. 10. A spanning set for the vector space P2 must contain a polynomial of each degree 0, 1, and 2. 11. If m < n, then any spanning set for Rn must contain more vectors than any spanning set for Rm . 12. The vector space P of all polynomials with real coefficients cannot be spanned by a finite set S . Problems For Problems 1–3, determine whether the given set of vectors spans R2 . 1. {(1, −1), (2, −2), (2, 3)}. 2. {(2, 5), (0, 0)}. 3. {(6, −2), (−2, 2/3), (3, −1)}. Recall that three vectors v1 , v2 , v3 in R3 are coplanar if and only if det([v1 , v2 , v3 ]) = 0. For Problems 4–6, use this result to determine whether the given set of vectors spans R3 . 4. {(1, −1, 1), (2, 5, 3), (4, −2, 1)}. 5. {(1, −2, 1), (2, 3, 1), (0, 0, 0), (4, −1, 2)}. 6. {(2, −1, 4), (3, −3, 5), (1, 1, 3)}. 7. Show that the set of vectors {(1, 2, 3), (3, 4, 5), (4, 5, 6)} does not span R3 , but that it does span the subspace of R3 consisting of all vectors lying in the plane with equation x − 2y + z = 0. 8. Show that v1 = (2, −1), v2 = (3, 2) span R2 , and express the vector v = (5, −7) as a linear combination of v1 , v2 . 9. Show that v1 = (−1, 3, 2), v2 = (1, −2, 1), v3 = (2, 1, 1) span R3 , and express v = (x, y, z) as a linear combination of v1 , v2 , v3 . i i i i i i i “main” 2007/2/16 page 266 i 266 CHAPTER 4 Vector Spaces 10. Show that v1 = (1, 1), v2 = (−1, 2), v3 = (1, 4) span R2 . Do v1 , v2 alone span R2 also? For Problems 22–24, determine whether the given vector v lies in span{v1 , v2 }. 11. Let S be the subspace of R3 consisting of all vectors of the form v = (c1 , c2 , c2 − 2c1 ). Show that S is spanned by v1 = (1, 0, −2), v2 = (0, 1, 1). 22. v = (3, 3, 4), v1 = (1, −1, 2), v2 = (2, 1, 3) in R3 . 12. Let S be the subspace of R4 consisting of all vectors of the form v = (c1 , c2 , c2 − c1 , c1 − 2c2 ). Determine a set of vectors that spans S . 13. Let S be the subspace of R3 consisting of all solutions to the linear system x − 2y − z = 0. 23. v = (5, 3, −6), v1 = (−1, 1, 2), v2 = (3, 1, −4) in R3 . 24. v = (1, 1, −2), v1 = (3, 1, 2), v2 = (−2, −1, 1) in R3 . 25. 
If p1 (x) = x − 4 and p2 (x) = x 2 − x + 3, determine whether p(x) = 2x 2 − x + 2 lies in span{p1 , p2 }. 26. Consider the vectors Determine a set of vectors that spans S . For Problems 14–15, determine a spanning set for the null space of the given matrix A. 123 14. A = 3 4 5 . 567 123 5 15. A = 1 3 4 2 . 2 4 6 −1 16. Let S be the subspace of M2 (R) consisting of all symmetric 2 × 2 matrices with real elements. Show that S is spanned by the matrices A1 = 10 , 00 A2 = 00 , 01 A3 = 01 . 10 17. Let S be the subspace of M2 (R) consisting of all skewsymmetric 2 × 2 matrices with real elements. Determine a matrix that spans S . 18. Let S be the subset of M2 (R) consisting of all upper triangular 2 × 2 matrices. A1 = 20. v1 = (1, 2, −1), v2 = (−2, −4, 2). R3 spanned by the vectors 21. Let S be the subspace of v1 = (1, 1, −1), v2 = (2, 1, 3), v3 = (−2, −2, 2). Show that S also is spanned by v1 and v2 only. 01 , −2 1 A3 = 30 12 27. Consider the vectors A1 = 12 , −1 3 A2 = −2 1 1 −1 in M2 (R). Find span{A1 , A2 }, and determine whether or not 31 B= −2 4 lies in this subspace. 28. Let V = C ∞ (I ) and let S be the subspace of V spanned by the functions f (x) = cosh x, g(x) = sinh x. (a) Give an expression for a general vector in S . (b) Show that S is also spanned by the functions h(x) = ex , (b) Determine a set of 2 × 2 matrices that spans S . 19. v1 = (1, −1, 2), v2 = (2, −1, 3). A2 = in M2 (R). Determine span{A1 , A2 , A3 }. (a) Verify that S is a subspace of M2 (R). For Problems 19–20, determine span{v1 , v2 } for the given vectors in R3 , and describe it geometrically. 1 −1 , 20 j (x) = e−x . For Problems 29–32, give a geometric description of the subspace of R3 spanned by the given set of vectors. 29. {0}. 30. {v1 }, where v1 is any nonzero vector in R3 . 31. {v1 , v2 }, where v1 , v2 are nonzero and noncollinear vectors in R3 . i i i i i i i “main” 2007/2/16 page 267 i 4.5 32. {v1 , v2 }, where v1 , v2 are collinear vectors in R3 . 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S , then span(S) is a subset of span(S ). 4.5 Linear Dependence and Linear Independence 267 34. Prove that span{v1 , v2 , v3 } = span{v1 , v2 } if and only if v3 can be written as a linear combination of v1 and v2 . Linear Dependence and Linear Independence As indicated in the previous section, in analyzing a vector space we will be interested in determining a spanning set. The reader has perhaps already noticed that a vector space V can have many such spanning sets. Example 4.5.1 Observe that {(1, 0), (0, 1)}, {(1, 0), (1, 1)}, and {(1, 0), (0, 1), (1, 2)} are all spanning sets for R2 . As another illustration, two different spanning sets for V = M2 (R) were given in Example 4.4.5 and the remark that followed. Given the abundance of spanning sets available for a given vector space V , we are faced with a natural question: Is there a “best class of” spanning sets to use? The answer, to a large degree, is “yes”. For instance, in Example 4.5.1, the spanning set {(1, 0), (0, 1), (1, 2)} contains an “extra” vector, (1, 2), which seems to be unnecessary for spanning R2 , since {(1, 0), (0, 1)} is already a spanning set. In some sense, {(1, 0), (0, 1)} is a more efficient spanning set. It is what we call a minimal spanning set, since it contains the minimum number of vectors needed to span the vector space.3 But how will we know if we have found a minimal spanning set (assuming one exists)? 
Returning to the example above, we have seen that span{(1, 0), (0, 1)} = span{(1, 0), (0, 1), (1, 2)} = R2 . Observe that the vector (1, 2) is already a linear combination of (1, 0) and (0, 1), and therefore it does not add any new vectors to the linear span of {(1, 0), (0, 1)}. As a second example, consider the vectors v1 = (1, 1, 1), v2 = (3, −2, 1), and v3 = 4v1 + v2 = (7, 2, 5). It is easily verified that det([v1 , v2 , v3 ]) = 0. Consequently, the three vectors lie in a plane (see Figure 4.5.1) and therefore, since they are not collinear, the linear span of these three vectors is the whole of this plane. Furthermore, the same plane is generated if we consider the linear span of v1 and v2 alone. As in the previous example, the reason that v3 does not add any new vectors to the linear span of {v1 , v2 } is that it is already a linear combination of v1 and v2 . It is not possible, however, to generate all vectors in the plane by taking linear combinations of just one vector, as we could generate only a line lying in the plane in that case. Consequently, {v1 , v2 } is a minimal spanning set for the subspace of R3 consisting of all points lying on the plane. As a final example, recall from Example 1.2.16 that the solution space to the differential equation y +y =0 3 Since a single (nonzero) vector in R2 spans only the line through the origin along which it points, it cannot span all of R2 ; hence, the minimum number of vectors required to span R2 is 2. i i i i i i i “main” 2007/2/16 page 268 i 268 CHAPTER 4 Vector Spaces z (7, 2, 5) v3 4v1 v2 (3, 2, 1) v2 v1 (1, 1, 1) y (3, 2, 0) x (1, 1, 0) (7, 2, 0) Figure 4.5.1: v3 = 4v1 + v2 lies in the plane through the origin containing v1 and v2 , and so, span{v1 , v2 , v3 } = span{v1 , v2 }. can be written as span{y1 , y2 }, where y1 (x) = cos x and y2 (x) = sin x . However, if we let y3 (x) = 3 cos x − 2 sin x , for instance, then {y1 , y2 , y3 } is also a spanning set for the solution space of the differential equation, since span{y1 , y2 , y3 } = {c1 cos x + c2 sin x + c3 (3 cos x − 2 sin x) : c1 , c2 , c3 ∈ R} = {(c1 + 3c3 ) cos x + (c2 − 2c3 ) sin x : c1 , c2 , c3 ∈ R} = {d1 cos x + d2 sin x : d1 , d2 ∈ R} = span{y1 , y2 }. The reason that {y1 , y2 , y3 } is not a minimal spanning set for the solution space is that y3 is a linear combination of y1 and y2 , and therefore, as we have just shown, it does not add any new vectors to the linear span of {cos x, sin x }. More generally, it is not too difficult to extend the argument used in the preceding examples to establish the following general result. Theorem 4.5.2 Let {v1 , v2 , . . . , vk } be a set of at least two vectors in a vector space V . If one of the vectors in the set is a linear combination of the other vectors in the set, then that vector can be deleted from the given set of vectors and the linear span of the resulting set of vectors will be the same as the linear span of {v1 , v2 , . . . , vk }. Proof The proof of this result is left for the exercises (Problem 48). For instance, if v1 is a linear combination of v2 , v3 , . . . , vk , then Theorem 4.5.2 says that span{v1 , v2 , . . . , vk } = span{v2 , v3 , . . . , vk }. In this case, the set {v1 , v2 , . . . , vk } is not a minimal spanning set. To determine a minimal spanning set, the problem we face in view of Theorem 4.5.2 is that of determining when a vector in {v1 , v2 , . . . , vk } can be expressed as a linear combination of the remaining vectors in the set. 
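Before formalizing this, we note a computational aside (our own sketch, not part of the text): for vectors in Rn, whether a vector v is a linear combination of v1, . . . , vk can be tested by comparing ranks, since v lies in span{v1, . . . , vk} exactly when adjoining v as an extra column leaves the rank of the matrix [v1, . . . , vk] unchanged. A minimal NumPy sketch, checked against the vectors v1 = (1, 1, 1), v2 = (3, −2, 1), v3 = 4v1 + v2 = (7, 2, 5) used above:

```python
import numpy as np

def in_span(v, vectors, tol=1e-10):
    """Return True if v is a linear combination of the given vectors,
    i.e. if appending v as a column does not raise the matrix rank."""
    A = np.column_stack([np.asarray(u, dtype=float) for u in vectors])
    aug = np.column_stack([A, np.asarray(v, dtype=float)])
    return np.linalg.matrix_rank(aug, tol=tol) == np.linalg.matrix_rank(A, tol=tol)

v1, v2 = (1, 1, 1), (3, -2, 1)
v3 = (7, 2, 5)                       # v3 = 4*v1 + v2
print(in_span(v3, [v1, v2]))         # True:  v3 adds nothing new to span{v1, v2}
print(in_span((0, 0, 1), [v1, v2]))  # False: (0, 0, 1) lies off that plane
```

Numerically this only flags a dependency; the concepts introduced next explain what such a dependency means and how to exhibit it explicitly.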
The correct formulation for solving this problem requires the concepts of linear dependence and linear independence, which we are now ready to introduce. First we consider linear dependence. i i i i i i i “main” 2007/2/16 page 269 i 4.5 Linear Dependence and Linear Independence 269 DEFINITION 4.5.3 A finite nonempty set of vectors {v1 , v2 , . . . , vk } in a vector space V is said to be linearly dependent if there exist scalars c1 , c2 , . . . , ck , not all zero, such that c1 v1 + c2 v2 + · · · + ck vk = 0. Such a nontrivial linear combination of vectors is sometimes referred to as a linear dependency among the vectors v1 , v2 , . . . , vk . A set of vectors that is not linearly dependent is called linearly independent. This can be stated mathematically as follows: DEFINITION 4.5.4 A finite, nonempty set of vectors {v1 , v2 , . . . , vk } in a vector space V is said to be linearly independent if the only values of the scalars c1 , c2 , . . . , ck for which c1 v1 + c2 v2 + · · · + ck vk = 0 are c1 = c2 = · · · = ck = 0. Remarks 1. It follows immediately from the preceding two definitions that a nonempty set of vectors in a vector space V is linearly independent if and only if it is not linearly dependent. 2. If {v1 , v2 , . . . , vk } is a linearly independent set of vectors, we sometimes informally say that the vectors v1 , v2 , . . . , vk are themselves linearly independent. The same remark applies to the linearly dependent condition as well. Consider the simple case of a set containing a single vector v. If v = 0, then {v} is linearly dependent, since for any nonzero scalar c1 , c1 0 = 0. On the other hand, if v = 0, then the only value of the scalar c1 for which c1 v = 0 is c1 = 0. Consequently, {v} is linearly independent. We can therefore state the next theorem. Theorem 4.5.5 A set consisting of a single vector v in a vector space V is linearly dependent if and only if v = 0. Therefore, any set consisting of a single nonzero vector is linearly independent. We next establish that linear dependence of a set containing at least two vectors is equivalent to the property that we are interested in—namely, that at least one vector in the set can be expressed as a linear combination of the remaining vectors in the set. i i i i i i i “main” 2007/2/16 page 270 i 270 CHAPTER 4 Vector Spaces Theorem 4.5.6 Let {v1 , v2 , . . . , vk } be a set of at least two vectors in a vector space V . Then {v1 , v2 , . . . , vk } is linearly dependent if and only if at least one of the vectors in the set can be expressed as a linear combination of the others. Proof If {v1 , v2 , . . . , vk } is linearly dependent, then according to Definition 4.5.3, there exist scalars c1 , c2 , . . . , ck , not all zero, such that c1 v1 + c2 v2 + · · · + ck vk = 0. Suppose that ci = 0. Then we can express vi as a linear combination of the other vectors as follows: vi = − 1 (c1 v1 + c2 v2 + · · · + ci −1 vi −1 + ci +1 vi +1 + · · · + ck vk ). ci Conversely, suppose that one of the vectors, say, vj , can be expressed as a linear combination of the remaining vectors. That is, vj = c1 v1 + c2 v2 + · · · + cj −1 vj −1 + cj +1 vj +1 + · · · + ck vk . Adding (−1)vj to both sides of this equation yields c1 v1 + c2 v2 + · · · + cj −1 vj −1 − vj + cj +1 vj +1 + · · · + ck vk = 0. Since the coefficient of vj is −1 = 0, the set of vectors {v1 , v2 , . . . , vk } is linearly dependent. 
As far as the minimal-spanning-set idea is concerned, Theorems 4.5.6 and 4.5.2 tell us that a linearly dependent spanning set for a (nontrivial) vector space V cannot be a minimal spanning set. On the other hand, we will see in the next section that a linearly v3 independent spanning set for V must be a minimal spanning set for V . For the remainder v2 of this section, however, we focus more on the mechanics of determining whether a given v1 set of vectors is linearly independent or linearly dependent. Sometimes this can be done x by inspection. For example, Figure 4.5.2 illustrates that any set of three vectors in R2 is linearly dependent. Figure 4.5.2: The set of vectors As another example, let V be the vector space of all functions defined on an interval {v1 , v2 , v3 } is linearly dependent I . If 2 , since v is a linear in R 3 f1 (x) = 1, f2 (x) = 2 sin2 x, f3 (x) = −5 cos2 x, combination of v1 and v2 . y then {f1 , f2 , f3 } is linearly dependent in V , since the identity sin2 x + cos2 x = 1 implies that for all x ∈ I , 1 f1 (x) = 2 f2 (x) − 1 f3 (x). 5 We can therefore conclude from Theorem 4.5.2 that span{1, 2 sin2 x, −5 cos2 x } = span{2 sin2 x, −5 cos2 x }. In relatively simple examples, the following general results can be applied. They are a direct consequence of the definition of linearly dependent vectors and are left for the exercises (Problem 49). Proposition 4.5.7 Let V be a vector space. 1. Any set of two vectors in V is linearly dependent if and only if the vectors are proportional. i i i i i i i “main” 2007/2/16 page 271 i 4.5 Linear Dependence and Linear Independence 271 2. Any set of vectors in V containing the zero vector is linearly dependent. Remark We emphasize that the first result in Proposition 4.5.7 holds only for the case of two vectors. It cannot be applied to sets containing more than two vectors. Example 4.5.8 If v1 = (1, 2, −9) and v2 = (−2, −4, 18), then {v1 , v2 } is linearly dependent in R3 , since v2 = −2v1 . Geometrically, v1 and v2 lie on the same line. Example 4.5.9 If A1 = 21 , 34 A2 = 00 , 00 A3 = 25 , −3 2 then {A1 , A2 , A3 } is linearly dependent in M2 (R), since it contains the zero vector from M2 (R). For more complicated situations, we must resort to Definitions 4.5.3 and 4.5.4, although conceptually it is always helpful to keep in mind that the essence of the problem we are solving is to determine whether a vector in a given set can be expressed as a linear combination of the remaining vectors in the set. We now give some examples to illustrate the use of Definitions 4.5.3 and 4.5.4. Example 4.5.10 If v1 = (1, 2, −1) v2 = (2, −1, 1), and v3 = (8, 1, 1), show that {v1 , v2 , v3 } is linearly dependent in R3 , and determine the linear dependency relationship. Solution: We must first establish that there are values of the scalars c1 , c2 , c3 , not all zero, such that c1 v1 + c2 v2 + c3 v3 = 0. (4.5.1) Substituting for the given vectors yields c1 (1, 2, −1) + c2 (2, −1, 1) + c3 (8, 1, 1) = (0, 0, 0). That is, (c1 + 2c2 + 8c3 , 2c1 − c2 + c3 , −c1 + c2 + c3 ) = (0, 0, 0). Equating corresponding components on either side of this equation yields c1 + 2c2 + 8c3 = 0, 2c1 − c2 + c3 = 0, −c1 + c2 + c3 = 0. The reduced row-echelon form of the augmented matrix of this system is 1020 0 1 3 0. 0000 Consequently, the system has an infinite number of solutions for c1 , c2 , c3 , so the vectors are linearly dependent. In order to determine a specific linear dependency relationship, we proceed to find c1 , c2 , and c3 . 
Setting c3 = t , we have c2 = −3t and c1 = −2t . Taking t = 1 and i i i i i i i “main” 2007/2/16 page 272 i 272 CHAPTER 4 Vector Spaces substituting these values for c1 , c2 , c3 into (4.5.1), we obtain the linear dependency relationship −2v1 − 3v2 + v3 = 0, or equivalently, 3 1 v1 = − 2 v2 + 2 v3 , which can be easily verified using the given expressions for v1 , v2 , and v3 . It follows from Theorem 4.5.2 that span{v1 , v2 , v3 } = span{v2 , v3 }. Geometrically, we can conclude that v1 lies in the plane determined by the vectors v2 and v3 . Example 4.5.11 Determine whether the following matrices are linearly dependent or linearly independent in M2 (R): A1 = 1 −1 , 20 A2 = 21 , 03 A3 = 1 −1 . 21 Solution: The condition for determining whether these vectors are linearly dependent or linearly independent, c1 A1 + c2 A2 + c3 A3 = 02 , is equivalent in this case to c1 1 −1 21 1 −1 + c2 + c3 20 03 21 = 00 , 00 which is satisfied if and only if c1 + 2c2 + −c1 + c2 − 2c1 + 3c2 + c3 c3 2c3 c3 = = = = 0, 0, 0, 0. The reduced row-echelon form of the augmented matrix of this homogeneous system is 1000 0 1 0 0 0 0 1 0, 0000 which implies that the system has only the trivial solution c1 = c2 = c3 = 0. It follows from Definition 4.5.4 that {A1 , A2 , A3 } is linearly independent. As a corollary to Theorem 4.5.2, we establish the following result. Corollary 4.5.12 Any nontrivial, finite set of linearly dependent vectors in a vector space V contains a linearly independent subset that has the same linear span as the given set of vectors. i i i i i i i “main” 2007/2/16 page 273 i 4.5 Linear Dependence and Linear Independence 273 Proof Since the given set is linearly dependent, at least one of the vectors in the set is a linear combination of the remaining vectors, by Theorem 4.5.6. Thus, by Theorem 4.5.2, we can delete that vector from the set, and the resulting set of vectors will span the same subspace of V as the original set. If the resulting set is linearly independent, then we are done. If not, then we can repeat the procedure to eliminate another vector in the set. Continuing in this manner (with a finite number of iterations), we will obtain a linearly independent set that spans the same subspace of V as the subspace spanned by the original set of vectors. Remark Corollary 4.5.12 is actually true even if the set of vectors in question is infinite, but we shall not need to consider that case in this text. In the case of an infinite set of vectors, other techniques are required for the proof. Note that the linearly independent set obtained using the procedure given in the previous theorem is not unique, and therefore the question arises whether the number of vectors in any resulting linearly independent set is independent of the manner in which the procedure is applied. We will give an affirmative answer to this question in Section 4.6. Example 4.5.13 Let v1 = (1, 2, 3), v2 = (−1, 1, 4), v3 = (3, 3, 2), and v4 = (−2, −4, −6). Determine a linearly independent set of vectors that spans the same subspace of R3 as span{v1 , v2 , v3 , v4 }. Solution: Setting c1 v1 + c2 v2 + c3 v3 + c4 v4 = 0 requires that c1 (1, 2, 3) + c2 (−1, 1, 4) + c3 (3, 3, 2) + c4 (−2, −4, −6) = (0, 0, 0), leading to the linear system c1 − c2 + 3c3 − 2c4 = 0, 2c1 + c2 + 3c3 − 4c4 = 0, 3c1 + 4c2 + 2c3 − 6c4 = 0. The augmented matrix of this system is 1 −1 3 −2 0 2 1 3 −4 0 3 4 2 −6 0 and the reduced row-echelon form of the augmented matrix of this system is 1 0 2 −2 0 0 1 −1 0 0 . 
00 0 00 The system has two free variables, c3 = s and c4 = t , and so {v1 , v2 , v3 , v4 } is linearly dependent. Then c2 = s and c1 = 2t − 2s . So the general form of the solution is (2t − 2s, s, s, t) = s(−2, 1, 1, 0) + t (2, 0, 0, 1). Setting s = 1 and t = 0 yields the linear combination −2v1 + v2 + v3 = 0, (4.5.2) i i i i i i i “main” 2007/2/16 page 274 i 274 CHAPTER 4 Vector Spaces and setting s = 0 and t = 1 yields the linear combination 2v1 + v4 = 0. (4.5.3) We can solve (4.5.2) for v3 in terms of v1 and v2 , and we can solve (4.5.3) for v4 in terms of v1 . Hence, according to Theorem 4.5.2, we have span{v1 , v2 , v3 , v4 } = span{v1 , v2 }. By Proposition 4.5.7, v1 and v2 are linearly independent, so {v1 , v2 } is the linearly independent set we are seeking. Geometrically, the subspace of R3 spanned by v1 and v2 is a plane, and the vectors v3 and v4 lie in this plane. Linear Dependence and Linear Independence in Rn Let {v1 , v2 , . . . , vk } be a set of vectors in Rn , and let A denote the matrix that has v1 , v2 , . . . , vk as column vectors. Thus, A = [v1 , v2 , . . . , vk ]. (4.5.4) Since each of the given vectors is in Rn , it follows that A has n rows and is therefore an n × k matrix. The linear combination c1 v1 + c2 v2 + · · · + ck vk = 0 can be written in matrix form as (see Theorem 2.2.9) Ac = 0, (4.5.5) where A is given in Equation (4.5.4) and c = [c1 c2 . . . ck ]T . Consequently, we can state the following theorem and corollary: Theorem 4.5.14 Let v1 , v2 , . . . , vk be vectors in Rn and A = [v1 , v2 , . . . , vk ]. Then {v1 , v2 , . . . , vk } is linearly dependent if and only if the linear system Ac = 0 has a nontrivial solution. Corollary 4.5.15 Let v1 , v2 , . . . , vk be vectors in Rn and A = [v1 , v2 , . . . , vk ]. 1. If k > n, then {v1 , v2 , . . . , vk } is linearly dependent. 2. If k = n, then {v1 , v2 , . . . , vk } is linearly dependent if and only if det (A) = 0. Proof If k > n, the system (4.5.5) has an infinite number of solutions (see Corollary 2.5.11), hence the vectors are linearly dependent by Theorem 4.5.14. On the other hand, if k = n, the system (4.5.5) is n × n, and hence, from Corollary 3.2.5, it has an infinite number of solutions if and only if det(A) = 0. Example 4.5.16 Determine whether the given vectors are linearly dependent or linearly independent in R4 . 1. v1 = (1, 3, −1, 0), v2 = (2, 9, −1, 3), v3 = (4, 5, 6, 11), v4 = (1, −1, 2, 5), v5 = (3, −2, 6, 7). 2. v1 = (1, 4, 1, 7), v2 = (3, −5, 2, 3), v3 = (2, −1, 6, 9), v4 = (−2, 3, 1, 6). i i i i i i i “main” 2007/2/16 page 275 i 4.5 Linear Dependence and Linear Independence 275 Solution: 1. Since we have five vectors in R4 , Corollary 4.5.15 implies that {v1 , v2 , v3 , v4 , v5 } is necessarily linearly dependent. 2. In this case, we have four vectors in R4 , and therefore, we can use the determinant: 1 3 2 −2 4 −5 −1 3 det(A) = det[v1 , v2 , v3 , v4 ] = = −462. 1261 7396 Since the determinant is nonzero, it follows from Corollary 4.5.15 that the given set of vectors is linearly independent. Linear Independence of Functions We now consider the general problem of determining whether or not a given set of functions is linearly independent or linearly dependent. We begin by specializing the general Definition 4.5.4 to the case of a set of functions defined on an interval I . DEFINITION 4.5.17 The set of functions {f1 , f2 , . . . , fk } is linearly independent on an interval I if and only if the only values of the scalars c1 , c2 , . . . 
, ck such that c1 f1 (x) + c2 f2 (x) + · · · + ck fk (x) = 0, for all x ∈ I , (4.5.6) are c1 = c2 = · · · = ck = 0. The main point to notice is that the condition (4.5.6) must hold for all x in I . A key tool in deciding whether or not a collection of functions is linearly independent on an interval I is the Wronskian. As we will see in Chapter 6, we can draw particularly sharp conclusions from the Wronskian about the linear dependence or independence of a family of solutions to a linear homogeneous differential equation. DEFINITION 4.5.18 Let f1 , f2 , . . . , fk be functions in C k −1 (I ). The Wronskian of these functions is the order k determinant defined by W [f1 , f2 , . . . , fk ](x) = f1 (x) f1 (x) . . . (k −1) f1 f2 (x) f2 (x) . . . (k −1) (x) f2 ... ... fk (x) fk (x) . . . (k −1) (x) . . . fk . (x) Remark Notice that the Wronskian is a function defined on I . Also note that this function depends on the order of the functions in the Wronskian. For example, using properties of determinants, W [f2 , f1 , . . . , fk ](x) = −W [f1 , f2 , . . . , fk ](x). i i i i i i i “main” 2007/2/16 page 276 i 276 CHAPTER 4 Vector Spaces Example 4.5.19 If f1 (x) = sin x and f2 (x) = cos x on (−∞, ∞), then W [f1 , f2 ](x) = sin x cos x = (sin x)(− sin x) − (cos x)(cos x) cos x − sin x = −(sin2 x + cos2 x) = −1. Example 4.5.20 If f1 (x) = x , f2 (x) = x 2 , and f3 (x) = x 3 on (−∞, ∞), then x x2 x3 W [f1 , f2 , f3 ](x) = 1 2x 3x 2 = x(12x 2 − 6x 2 ) − (6x 3 − 2x 3 ) = 2x 3 . 0 2 6x We can now state and prove the main result about the Wronskian. Theorem 4.5.21 Let f1 , f2 , . . . , fk be functions in C k −1 (I ). If W [f1 , f2 , . . . , fk ] is nonzero at some point x0 in I , then {f1 , f2 , . . . , fk } is linearly independent on I . Proof To apply Definition 4.5.17, assume that c1 f1 (x) + c2 f2 (x) + · · · + ck fk (x) = 0, for all x in I . Then, differentiating k − 1 times yields the linear system c1 f1 (x) c1 f1 (x) (k −1) c1 f1 + c2 f2 (x) + c2 f2 (x) (k −1) (x) + c2 f2 + · · · + ck fk (x) + · · · + ck fk (x) (k −1) (x) + · · · + ck fk = 0, = 0, . . . (x) = 0, where the unknowns in the system are c1 , c2 , . . . , ck . We wish to show that c1 = c2 = · · · = ck = 0. The determinant of the matrix of coefficients of this system is just W [f1 , f2 , . . . , fk ](x). Consequently, if W [f1 , f2 , . . . , fk ](x0 ) = 0 for some x0 in I , then the determinant of the matrix of coefficients of the system is nonzero at that point, and therefore the only solution to the system is the trivial solution c1 = c2 = · · · = ck = 0. That is, the given set of functions is linearly independent on I . Remarks 1. Notice that it is only necessary for W [f1 , f2 , . . . , fk ](x) to be nonzero at one point in I for {f1 , f2 , . . . , fk } to be linearly independent on I . 2. Theorem 4.5.21 does not say that if W [f1 , f2 , . . . , fk ](x) = 0 for every x in I , then {f1 , f2 , . . . , fk } is linearly dependent on I . As we will see in the next example below, the Wronskian of a linearly independent set of functions on an interval I can be identically zero on I . Instead, the logical equivalent of the preceding theorem is: If {f1 , f2 , . . . , fk } is linearly dependent on I , then W [f1 , f2 , . . . , fk ](x) = 0 at every point of I . i i i i i i i “main” 2007/2/16 page 277 i 4.5 Linear Dependence and Linear Independence 277 If W [f1 , f2 , . . . , fk ](x) = 0 for all x in I , Theorem 4.5.21 gives no information as to the linear dependence or independence of {f1 , f2 , . . . , fk } on I . 
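Although the Wronskian is a determinant like any other and can be evaluated by hand, it is also easy to compute with a computer algebra system. The sketch below is an illustrative aside, assuming SymPy is available; the small `wronskian` helper is not a library routine but simply encodes Definition 4.5.18, and the two calls reproduce the results of Examples 4.5.19 and 4.5.20.

```python
from sympy import symbols, sin, cos, Matrix, diff, simplify

x = symbols('x')

def wronskian(funcs, x):
    """Order-k Wronskian determinant of the given functions of x (Definition 4.5.18)."""
    k = len(funcs)
    W = Matrix(k, k, lambda i, j: diff(funcs[j], x, i))
    return simplify(W.det())

print(wronskian([sin(x), cos(x)], x))   # -1      (Example 4.5.19)
print(wronskian([x, x**2, x**3], x))    # 2*x**3  (Example 4.5.20)
```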
Example 4.5.22 Determine whether the following functions are linearly dependent or linearly independent on I = (−∞, ∞). (a) f1 (x) = ex , f2 (x) = x 2 ex . (b) f1 (x) = x , f2 (x) = x + x 2 , f3 (x) = 2x − x 2 . (c) f1 (x) = x 2 , f2 (x) = 2x 2 , if x ≥ 0, −x 2 , if x < 0. Solution: (a) W [f1 , f2 ](x) = ex x 2 ex 2x 2 2 2x = 2xe2x . x ex (x 2 + 2x) = e (x + 2x) − x e e Since W [f1 , f2 ](x) = 0 (except at x = 0), the functions are linearly independent on (−∞, ∞). (b) x x + x 2 2x − x 2 W [f1 , f2 , f3 ](x) = 1 1 + 2x 2 − 2x 0 2 −2 = x [(−2)(1 + 2x) − 2(2 − 2x)] − (−2)(x + x 2 ) − 2(2x − x 2 ) = 0. Thus, no conclusion can be drawn from Theorem 4.5.21. However, a closer inspection of the functions reveals, for example, that f2 = 3f1 − f3 . Consequently, the functions are linearly dependent on (−∞, ∞). (c) If x ≥ 0, then W [f1 , f2 ](x) = x 2 2x 2 = 0, 2 x 4x W [f1 , f2 ](x) = x 2 −x 2 = 0. 2x −2x whereas if x < 0, then Thus, W [f1 , f2 ](x) = 0 for all x in (−∞, ∞), so no conclusion can be drawn from Theorem 4.5.21. Again we take a closer look at the given functions. They are sketched in Figure 4.5.3. In this case, we see that on the interval (−∞, 0), the functions are linearly dependent, since f1 + f2 = 0. i i i i i i i “main” 2007/2/16 page 278 i 278 CHAPTER 4 Vector Spaces y y y f1(x) x f2(x) 2x2 f2(x) 2 y f2(x) f1(x) on ( 2f1(x) on [0, ) f1(x) x x2 , 0) y f2(x) x2 Figure 4.5.3: Two functions that are linearly independent on (−∞, ∞), but whose Wronskian is identically zero on that interval. They are also linearly dependent on [0, ∞), since on this interval we have 2f1 − f2 = 0. The key point is to realize that there is no set of nonzero constants c1 , c2 for which c1 f1 + c2 f2 = 0 holds for all x in (−∞, ∞). Hence, the given functions are linearly independent on (−∞, ∞). This illustrates our second remark following Theorem 4.5.21, and it emphasizes the importance of the role played by the interval I when discussing linear dependence and linear independence of functions. A collection of functions may be linearly independent on an interval I1 , but linearly dependent on another interval I2 . It might appear at this stage that the usefulness of the Wronskian is questionable, since if W [f1 , f2 , . . . , fk ] vanishes on an interval I , then no conclusion can be drawn as to the linear dependence or linear independence of the functions f1 , f2 , . . . , fk on I . However, the real power of the Wronskian is in its application to solutions of linear differential equations of the form y (n) + a1 (x)y (n−1) + · · · + an−1 (x)y + an (x)y = 0. (4.5.7) In Chapter 6, we will establish that if we have n functions that are solutions of an equation of the form (4.5.7) on an interval I , then if the Wronskian of these functions is identically zero on I , the functions are indeed linearly dependent on I . Thus, the Wronskian does completely characterize the linear dependence or linear independence of solutions of such equations. This is a fundamental result in the theory of linear differential equations. Exercises for 4.5 Key Terms Linearly dependent set, Linear dependency, Linearly independent set, Minimal spanning set, Wronskian of a set of functions. Skills • Be able to determine whether a given finite set of vectors is linearly dependent or linearly independent. For sets of one or two vectors, you should be able to do this at a glance. If the set is linearly dependent, be able to determine a linear dependency relationship among the vectors. 
• Be able to take a linearly dependent set of vectors and remove vectors until it becomes a linearly independent set of vectors with the same span as the original set. i i i i i i i “main” 2007/2/16 page 279 i 4.5 Linear Dependence and Linear Independence 279 Problems • Be able to produce a linearly independent set of vectors that spans a given subspace of a vector space V . • Be able to conclude immediately that a set of k vectors in Rn is linearly dependent if k > n, and know what can be said in the case where k = n as well. • Know what information the Wronskian does (and does not) give about the linear dependence or linear independence of a set of functions on an interval I . For Problems 1–9, determine whether the given set of vectors is linearly independent or linearly dependent in Rn . In the case of linear dependence, find a dependency relationship. 1. {(1, −1), (1, 1)}. 2. {(2, −1), (3, 2), (0, 1)}. 3. {(1, −1, 0), (0, 1, −1), (1, 1, 1)}. 4. {(1, 2, 3), (1, −1, 2), (1, −4, 1)}. 5. {(−2, 4, −6), (3, −6, 9)}. True-False Review 6. {(1, −1, 2), (2, 1, 0)}. For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 7. {(−1, 1, 2), (0, 2, −1), (3, 1, 2), (−1, −1, 1)}. 1. Every vector space V possesses a unique minimal spanning set. 2. The set of column vectors of a 5 × 7 matrix A must be linearly dependent. 8. {(1, −1, 2, 3), (2, −1, 1, −1), (−1, 1, 1, 1)}. 9. {(2, −1, 0, 1), (1, 0, −1, 2), (0, 3, 1, 2), (−1, 1, 2, 1)}. 10. Let v1 = (1, 2, 3), v2 = (4, 5, 6), v3 = (7, 8, 9). Determine whether {v1 , v2 , v3 } is linearly independent in R3 . Describe span{v1 , v2 , v3 } geometrically. 3. The set of column vectors of a 7 × 5 matrix A must be linearly independent. 4. Any nonempty subset of a linearly independent set of vectors is linearly independent. 5. If the Wronskian of a set of functions is nonzero at some point x0 in an interval I , then the set of functions is linearly independent. 6. If it is possible to express one of the vectors in a set S as a linear combination of the others, then S is a linearly dependent set. 7. If a set of vectors S in a vector space V contains a linearly dependent subset, then S is itself a linearly dependent set. 8. A set of three vectors in a vector space V is linearly dependent if and only if all three vectors are proportional to one another. 11. Consider the vectors v1 = (2, −1, 5), v2 = (1, 3, −4), v3 = (−3, −9, 12) in R3 . (a) Show that {v1 , v2 , v3 } is linearly dependent. (b) Is v1 ∈ span{v2 , v3 }? Draw a picture illustrating your answer. 12. Determine all values of the constant k for which the vectors (1, 1, k), (0, 2, k), and (1, k, 6) are linearly dependent in R3 . For Problems 13–14, determine all values of the constant k for which the given set of vectors is linearly independent in R4 . 13. {(1, 0, 1, k), (−1, 0, k, 1), (2, 0, 1, 3)}. 14. {(1, 1, 0, −1), (1, k, 1, 1), (2, 1, k, 1), (−1, 1, 1, k)}. For Problems 15–17, determine whether the given set of vectors is linearly independent in M2 (R). 15. A1 = 9. If the Wronskian of a set of functions is identically zero at every point of an interval I , then the set of functions is linearly dependent. 11 , A2 = 01 16. A1 = 2 −1 , A2 = 34 2 −1 , A3 = 01 36 . 04 −1 2 . 13 i i i i i i i “main” 2007/2/16 page 280 i 280 CHAPTER 4 17. A1 = Vector Spaces 10 , A2 = 12 −1 1 , A3 = 21 21 . 
57 For Problems 18–19, determine whether the given set of vectors is linearly independent in P1 . 18. p1 (x) = 1 − x, 19. p1 (x) = 2 + 3x, p2 (x) = 1 + x . 20. Show that the vectors and f1 (x) = if x ≥ 0, if x < 0, x2, 3x 3 , f2 (x) = 7x 2 , I = (−∞, ∞). p2 (x) = 4 + 6x . p1 (x) = a + bx 33. p2 (x) = c + dx are linearly independent in P1 if and only if the constants a, b, c, d satisfy ad − bc = 0. 21. If f1 (x) = cos 2x, f2 (x) = sin2 x, f3 (x) = cos2 x , determine whether {f1 , f2 , f3 } is linearly dependent or linearly independent in C ∞ (−∞, ∞). For Problems 22–28, determine a linearly independent set of vectors that spans the same subspace of V as that spanned by the original set of vectors. For Problems 34–36, show that the Wronskian of the given functions is identically zero on (−∞, ∞). Determine whether the functions are linearly independent or linearly dependent on that interval. 34. f1 (x) = 1, f2 (x) = x, f3 (x) = 2x − 1. 35. f1 (x) = ex , f2 (x) = e−x , f3 (x) = cosh x . 36. f1 (x) = 2x 3 , 37. Consider the functions f1 (x) = x , 22. V = R3 , {(1, 2, 3), (−3, 4, 5), (1, − 4 , − 5 )}. 3 3 23. V = R3 , {(3, 1, 5), (0, 0, 0), (1, 2, −1), (−1, 2, 3)}. ifx ≥ 0, if x < 0. 5x 3 , −3x 3 , f2 (x) = f2 (x) = if x ≥ 0, if x < 0. x, −x, 24. V = R3 , {(1, 1, 1), (1, −1, 1), (1, −3, 1), (3, 1, 2)}. 25. V = R4 , {(1, 1, −1, 1), (2, −1, 3, 1), (1, 1, 2, 1), (2, −1, 2, 1)}. 26. V = M2 (R), 12 −1 2 32 , , 34 57 11 . 27. V = P1 , {2 − 5x, 3 + 7x, 4 − x }. 28. V = P2 , {2 + x 2 , 4 − 2x + 3x 2 , 1 + x }. For Problems 29–33, use the Wronskian to show that the given functions are linearly independent on the given interval I . 29. f1 (x) = 1, f2 (x) = x, f3 (x) = x 2 , I = (−∞, ∞). 30. f1 (x) = sin x, f2 (x) = cos x, f3 (x) = tan x, I = (−π/2, π/2). 31. f1 (x) = 1, f2 (x) = 3x, f3 (x) = x 2 − 1, I = (−∞, ∞). 32. f1 (x) = e2x , f2 (x) = e3x , f3 (x) = e−x , I = (−∞, ∞). (a) Show that f2 is not in C 1 (−∞, ∞). (b) Show that {f1 , f2 } is linearly dependent on the intervals (−∞, 0) and [0, ∞), while it is linearly independent on the interval (−∞, ∞). Justify your results by making a sketch showing both of the functions. 38. Determine whether the functions f1 (x) = x , f2 (x) = x, 1, if x = 0, if x = 0. are linearly dependent or linearly independent on I = (−∞, ∞). 39. Show that the functions f1 (x) = x − 1, 2(x − 1), if x ≥ 1, if x < 1, f2 (x) = 2x, f3 (x) = 3 form a linearly independent set on (−∞, ∞). Determine all intervals on which {f1 , f2 , f3 } is linearly dependent. i i i i i i i “main” 2007/2/16 page 281 i 4.6 40. (a) Show that {1, x, x 2 , x 3 } is linearly independent on every interval. (b) If fk (x) = x k for k = 0, 1, . . . , n, show that {f0 , f1 , . . . , fn } is linearly independent on every interval for all fixed n. 41. (a) Show that the functions f1 (x) = er1 x , f2 (x) = er2 x , f3 (x) = er3 x have Wronskian 111 W [f1 , f2 , f3 ](x) = e(r1 +r2 +r3 )x r1 r2 r3 222 r1 r2 r3 = e(r1 +r2 +r3 )x (r3 − r1 )(r3 − r2 )(r2 − r1 ), and hence determine the conditions on r1 , r2 , r3 such that {f1 , f2 , f3 } is linearly independent on every interval. (b) More generally, show that the set of functions {er1 x , er2 x , . . . , ern x } is linearly independent on every interval if and only if all of the ri are distinct. [Hint: Show that the Wronskian of the given functions is a multiple of the n × n Vandermonde determinant, and then use Problem 21 in Section 3.3.] 42. Let {v1 , v2 } be a linearly independent set in a vector space V , and let v = α v1 + v2 , w = v1 + α v2 , where α is a constant. 
Use Definition 4.5.4 to determine all values of α for which {v, w} is linearly independent. 43. If v1 and v2 are vectors in a vector space V , and u1 , u2 , u3 are each linear combinations of them, prove that {u1 , u2 , u3 } is linearly dependent. 44. Let v1 , v2 , . . . , vm be a set of linearly independent vectors in a vector space V and suppose that the vectors u1 , u2 , . . . , un are each linear combinations of them. It follows that we can write m uk = aik vi , k = 1, 2, . . . , n, i =1 for appropriate constants aik . 4.6 Bases and Dimension 281 (a) If n > m, prove that {u1 , u2 , . . . , un } is linearly dependent on V . (b) If n = m, prove that {u1 , u2 , . . . , un } is linearly independent in V if and only if det[aij ] = 0. (c) If n < m, prove that {u1 , u2 , . . . , un } is linearly independent in V if and only if rank(A) = n, where A = [aij ]. (d) Which result from this section do these results generalize? 45. Prove from the definition of “linearly independent” that if {v1 , v2 , . . . , vn } is linearly independent and if A is an invertible n × n matrix, then the set {Av1 , Av2 , . . . , Avn } is linearly independent. 46. Prove that if {v1 , v2 } is linearly independent and v3 is not in span{v1 , v2 }, then {v1 , v2 , v3 } is linearly independent. 47. Generalizing the previous exercise, prove that if {v1 , v2 , . . . , vk } is linearly independent and vk +1 is not in span{v1 , v2 , . . . , vk }, then {v1 , v2 , . . . , vk +1 } is linearly independent. 48. Prove Theorem 4.5.2. 49. Prove Proposition 4.5.7. 50. Prove that if {v1 , v2 , . . . , vk } spans a vector space V , then for every vector v in V , {v, v1 , v2 , . . . , vk } is linearly dependent. 51. Prove that if V = Pn and S = {p1 , p2 , . . . , pk } is a set of vectors in V each of a different degree, then S is linearly independent. [Hint: Assume without loss of generality that the polynomials are ordered in descending degree: deg(p1 ) > deg(p2 ) > · · · > deg(pk ). Assuming that c1 p1 + c2 p2 + · · · + ck pk = 0, first show that c1 is zero by examining the highest degree. Then repeat for lower degrees to show successively that c2 = 0, c3 = 0, and so on.] Bases and Dimension The results of the previous section show that if a minimal spanning set exists in a (nontrivial) vector space V , it cannot be linearly dependent. Therefore if we are looking for minimal spanning sets for V , we should focus our attention on spanning sets that are linearly independent. One of the results of this section establishes that every spanning set for V that is linearly independent is indeed a minimal spanning set. Such a set will be i i i i i i i “main” 2007/2/16 page 282 i 282 CHAPTER 4 Vector Spaces called a basis. This is one of the most important concepts in this text and a cornerstone of linear algebra. DEFINITION 4.6.1 A set of vectors {v1 , v2 , . . . , vk } in a vector space V is called a basis4 for V if (a) The vectors are linearly independent. (b) The vectors span V . Notice that if we have a finite spanning set for a vector space, then we can always, in principle, determine a basis for V by using the technique of Corollary 4.5.12. Furthermore, the computational aspects of determining a basis have been covered in the previous two sections, since all we are really doing is combining the two concepts of linear independence and linear span. Consequently, this section is somewhat more theoretically oriented than the preceding ones. 
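Before turning to that theory, one further computational remark: for column vectors in R^n, the pruning procedure of Corollary 4.5.12 can be carried out mechanically, because the pivot columns of the matrix whose columns are the given vectors form a linearly independent set with the same span. The sketch below is an aside (not from the text) assuming SymPy is available; applied to the vectors of Example 4.5.13, it recovers the basis {v1, v2} found there.

```python
from sympy import Matrix

# Vectors from Example 4.5.13, placed as the columns of A.
v1, v2, v3, v4 = [1, 2, 3], [-1, 1, 4], [3, 3, 2], [-2, -4, -6]
A = Matrix.hstack(Matrix(v1), Matrix(v2), Matrix(v3), Matrix(v4))

_, pivots = A.rref()
basis = [A.col(j) for j in pivots]
print(pivots)  # (0, 1): the pivot columns are v1 and v2
print(basis)   # a linearly independent set with the same span as {v1, v2, v3, v4}
```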
The reader is encouraged not to gloss over the theoretical aspects, as these really are fundamental results in linear algebra. There do exist vector spaces V for which it is impossible to find a finite set of linearly independent vectors that span V . The vector space C n (I ), n ≥ 1, is such an example (Example 4.6.19). Such vector spaces are called infinite-dimensional vector spaces. Our primary interest in this text, however, will be vector spaces that contain a finite spanning set of linearly independent vectors. These are known as finite-dimensional vector spaces, and we will encounter numerous examples of them throughout the remainder of this section. We begin with the vector space Rn . In R2 , the most natural basis, denoted {e1 , e2 }, consists of the two vectors e1 = (1, 0), e2 = (0, 1), (4.6.1) and in R3 , the most natural basis, denoted {e1 , e2 , e3 }, consists of the three vectors e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). (4.6.2) The verification that the sets (4.6.1) and (4.6.2) are indeed bases of R2 and R3 , respectively, is straightforward and left as an exercise.5 These bases are referred to as the standard basis on R2 and R3 , respectively. In the case of the standard basis for R3 given in (4.6.2), we recognize the vectors e1 , e2 , e3 as the familiar unit vectors i, j, k pointing along the positive x -, y -, and z-axes of the rectangular Cartesian coordinate system. More generally, consider the set of vectors {e1 , e2 , . . . , en } in Rn defined by e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), ..., en = (0, 0, . . . , 1). These vectors are linearly independent by Corollary 4.5.15, since det([e1 , e2 , . . . , en ]) = det(In ) = 1 = 0. Furthermore, the vectors span Rn , since an arbitrary vector v = (x1 , x2 , . . . , xn ) in Rn can be written as v = x1 (1, 0, . . . , 0) + x2 (0, 1, . . . , 0) + · · · + xn (0, 0, . . . , 1) = x1 e1 + x2 e2 + · · · + xn en . 4 The plural of basis is bases. 5 Alternatively, the verification is a special case of that given shortly for the general case of Rn . i i i i i i i “main” 2007/2/16 page 283 i 4.6 Bases and Dimension 283 Consequently, {e1 , e2 , . . . , en } is a basis for Rn . We refer to this basis as the standard basis for Rn . The general vector in Rn has n components, and the standard basis vectors arise as the n vectors that are obtained by sequentially setting one component to the value 1 and the other components to 0. In general, this is how we obtain standard bases in vector spaces whose vectors are determined by the specification of n independent constants. We illustrate with some examples. Example 4.6.2 Determine the standard basis for M2 (R). Solution: The general matrix in M2 (R) is ab . cd Consequently, there are four independent parameters that give rise to four special vectors in M2 (R). Sequentially setting one of these parameters to the value 1 and the others to 0 generates the following four matrices: A1 = 10 , 00 A2 = 01 , 00 A3 = 00 , 10 A4 = 00 . 01 We see that {A1 , A2 , A3 , A4 } is a spanning set for M2 (R). Furthermore, c1 A1 + c2 A2 + c3 A3 + c4 A4 = 02 holds if and only if c1 10 01 00 00 + c2 + c3 + c4 00 00 10 01 = 00 00 —that is, if and only if c1 = c2 = c3 = c4 = 0. Consequently, {A1 , A2 , A3 , A4 } is a linearly independent spanning set for M2 (R), hence it is a basis. This is the standard basis for M2 (R). Remark More generally, consider the vector space of all m × n matrices with real entries, Mm×n (R). 
If we let Eij denote the m × n matrix with value 1 in the (i, j )-position and zeros elsewhere, then we can show routinely that {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mm×n (R), and it is the standard basis. Example 4.6.3 Determine the standard basis for P2 . Solution: We have P2 = {a0 + a1 x + a2 x 2 : a0 , a1 , a2 ∈ R}, so that the vectors in P2 are determined by specifying values for the three parameters a0 , a1 , and a2 . Sequentially setting one of these parameters to the value 1 and the other two to the value 0 yields the following vectors in P2 : p0 (x) = 1, p1 (x) = x, p2 (x) = x 2 . i i i i i i i “main” 2007/2/16 page 284 i 284 CHAPTER 4 Vector Spaces We have shown in Example 4.4.6 that {p0 , p1 , p2 } is a spanning set for P2 . Furthermore, 1 x x2 W [p0 , p1 , p2 ](x) = 0 1 2x = 2 = 0, 00 2 which implies that {p0 , p1 , p2 } is linearly independent on any interval.6 Consequently, {p0 , p1 , p2 } is a basis for P2 . This is the standard basis for P2 . Remark More generally, the reader can check that the standard basis for the vector space of all polynomials of degree n or less, Pn , is {1, x, x 2 , . . . , x n }. Dimension of a Finite-Dimensional Vector Space The reader has probably realized that there can be many different bases for a given vector space V . In addition to the standard basis {e1 , e2 , e3 } on R3 , for example, it can be checked7 that {(1, 2, 3), (4, 5, 6), (7, 8, 8)} and {(1, 0, 0), (1, 1, 0), (1, 1, 1)} are also bases for R3 . And there are countless others as well. Despite the multitude of different bases available for a vector space V , they all share one common feature: the number of vectors in each basis for V is the same. This fact will be deduced as a corollary of our next theorem, a fundamental result in the theory of vector spaces. Theorem 4.6.4 If a finite-dimensional vector space has a basis consisting of m vectors, then any set of more than m vectors is linearly dependent. Proof Let {v1 , v2 , . . . , vm } be a basis for V , and consider an arbitrary set of vectors in V , say, {u1 , u2 , . . . , un }, with n > m. We wish to prove that {u1 , u2 , . . . , un } is necessarily linearly dependent. Since {v1 , v2 , . . . , vm } is a basis for V , it follows that each uj can be written as a linear combination of v1 , v2 , . . . , vm . Thus, there exist constants aij such that u1 = a11 v1 + a21 v2 + · · · + am1 vm , u2 = a12 v1 + a22 v2 + · · · + am2 vm , . . . un = a1n v1 + a2n v2 + · · · + amn vm . To prove that {u1 , u2 , . . . , un } is linearly dependent, we must show that there exist scalars c1 , c2 , . . . , cn , not all zero, such that c1 u1 + c2 u2 + · · · + cn un = 0. (4.6.3) Inserting the expressions for u1 , u2 , . . . , un into Equation (4.6.3) yields c1 (a11 v1 + a21 v2 + · · · + am1 vm ) + c2 (a12 v1 + a22 v2 + · · · + am2 vm ) + · · · + cn (a1n v1 + a2n v2 + · · · + amn vm ) = 0. 6 Alternatively, we can start with the equation c p (x) + c p (x) + c p (x) = 0 for all x in R and show 00 11 22 readily that c0 = c1 = c2 = 0. 7 The reader desiring extra practice at the computational aspects of verifying a basis is encouraged to pause here to check these examples. i i i i i i i “main” 2007/2/16 page 285 i 4.6 Bases and Dimension 285 Rearranging terms, we have (a11 c1 + a12 c2 + · · · + a1n cn )v1 + (a21 c1 + a22 c2 + · · · + a2n cn )v2 + · · · + (am1 c1 + am2 c2 + · · · + amn cn )vm = 0. Since {v1 , v2 , . . . , vm } is linearly independent, we can conclude that a11 c1 + a12 c2 + · · · + a1n cn = 0, a21 c1 + a22 c2 + · · · + a2n cn = 0, . . . 
am1 c1 + am2 c2 + · · · + amn cn = 0. This is an m × n homogeneous system of linear equations with m < n, and hence, from Corollary 2.5.11, it has nontrivial solutions for c1 , c2 , . . . , cn . It therefore follows from Equation (4.6.3) that {u1 , u2 , . . . , un } is linearly dependent. Corollary 4.6.5 All bases in a finite-dimensional vector space V contain the same number of vectors. Proof Suppose {v1 , v2 , . . . , vn } and {u1 , u2 , . . . , um } are two bases for V . From Theorem 4.6.4 we know that we cannot have m > n (otherwise {u1 , u2 , . . . , um } would be a linearly dependent set and hence could not be a basis for V ). Nor can we have n > m (otherwise {v1 , v2 , . . . , vn } would be a linearly dependent set and hence could not be a basis for V ). Thus, it follows that we must have m = n. We can now prove that any basis provides a minimal spanning set for V . Corollary 4.6.6 If a finite-dimensional vector space V has a basis consisting of n vectors, then any spanning set must contain at least n vectors. Proof If the spanning set contained fewer than n vectors, then there would be a subset of less than n linearly independent vectors that spanned V ; that is, there would be a basis consisting of less than n vectors. But this would contradict the previous corollary. The number of vectors in a basis for a finite-dimensional vector space is clearly a fundamental property of the vector space, and by Corollary 4.6.5 it is independent of the particular chosen basis. We call this number the dimension of the vector space. DEFINITION 4.6.7 The dimension of a finite-dimensional vector space V , written dim[V ], is the number of vectors in any basis for V . If V is the trivial vector space, V = {0}, then we define its dimension to be zero. Remark We say that the dimension of the world we live in is three for the very reason that the maximum number of independent directions that we can perceive is three. If a vector space has a basis containing n vectors, then from Theorem 4.6.4, the maximum number of vectors in any linearly independent set is n. Thus, we see that the terminology dimension used in an arbitrary vector space is a generalization of a familiar idea. Example 4.6.8 It follows from our examples earlier in this section that dim[R3 ] = 3, dim[M2 (R)] = 4, and dim[P2 ] = 3. i i i i i i i “main” 2007/2/16 page 286 i 286 CHAPTER 4 Vector Spaces More generally, the following dimensions should be remembered: dim[Rn ] = n, dim[Mm×n (R)] = mn, dim[Mn (R)] = n2 , dim[Pn ] = n + 1. These values have essentially been established previously in our discussion of standard bases. The standard basis for Rn is {e1 , e2 , . . . , en }, where ei is the n-tuple with value 1 in the i th position and value 0 elsewhere. Thus, this basis contains n vectors. The standard basis for Mm×n (R) is the set of matrices Eij (1 ≤ i ≤ m, 1 ≤ j ≤ n) with value 1 in the (i, j ) position and value 0 elsewhere. There are mn such matrices in this standard basis. The case of Mn (R) is just a special case of Mm×n (R) in which m = n. Finally, the standard basis for Pn is {1, x, x 2 , . . . , x n }, a set of n + 1 vectors. Next, let us return once more to Example 1.2.16 to cast its results in terms of the basis concept. Example 4.6.9 Determine a basis for the solution space to the differential equation y +y =0 on any interval I . Solution: Our results from Example 1.2.16 tell us that all solutions to the given differential equation are of the form y(x) = c1 cos x + c2 sin x. 
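As a quick check (an illustrative aside assuming SymPy, not part of the text), one can verify directly that cos x and sin x each satisfy y'' + y = 0 and that their Wronskian is nonzero:

```python
from sympy import symbols, sin, cos, diff, simplify

x = symbols('x')

# Each function solves y'' + y = 0 ...
for y in (cos(x), sin(x)):
    print(simplify(diff(y, x, 2) + y))   # prints 0 twice

# ... and their Wronskian is nonzero, so they are linearly independent.
W = cos(x) * diff(sin(x), x) - sin(x) * diff(cos(x), x)
print(simplify(W))                       # 1
```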
Consequently, {cos x, sin x } is a linearly independent spanning set for the solution space of the differential equation and therefore is a basis. More generally, we will show in Chapter 6 that all solutions to the differential equation y + a1 (x)y + a2 (x)y = 0 on the interval I have the form y(x) = c1 y1 (x) + c2 y2 (x), where {y1 , y2 } is any linearly independent set of solutions to the differential equation. Using the terminology introduced in this section, it will therefore follow that: The set of all solutions to y + a1 (x)y + a2 (x)y = 0 on an interval I is a vector space of dimension two. If a vector space has dimension n, then from Theorem 4.6.4, the maximum number of vectors in any linearly independent set is n. On the other hand, from Corollary 4.6.6, the minimum number of vectors that can span V is also n. Thus, a basis for V must be a linearly independent set of n vectors. Our next theorem establishes that any set of n linearly independent vectors is a basis for V . Theorem 4.6.10 If dim[V ] = n, then any set of n linearly independent vectors in V is a basis for V . i i i i i i i “main” 2007/2/16 page 287 i 4.6 Bases and Dimension 287 Proof Let v1 , v2 , . . . , vn be n linearly independent vectors in V . We need to show that they span V . To do this, let v be an arbitrary vector in V . From Theorem 4.6.4, the set of vectors {v, v1 , v2 , . . . , vn } is linearly dependent, and so there exist scalars c0 , c1 , . . . , cn , not all zero, such that c0 v + c1 v1 + · · · + cn vn = 0. (4.6.4) If c0 = 0, then the linear independence of {v1 , v2 , . . . , vn } and (4.6.4) would imply that c0 = c1 = · · · = cn = 0, a contradiction. Hence, c0 = 0, and so, from Equation (4.6.4), v=− 1 (c1 v1 + c2 v2 + · · · + cn vn ). c0 Thus v, and hence any vector in V , can be written as a linear combination of v1 , v2 , . . . , vn , and hence, {v1 , v2 , . . . , vn } spans V , in addition to being linearly independent. Hence it is a basis for V , as required. Theorem 4.6.10 is one of the most important results of the section. In Chapter 6, we will explicitly construct a basis for the solution space to the differential equation y (n) + a1 (x)y (n−1) + · · · + an−1 (x)y + an (x)y = 0 consisting of n vectors. That is, we will show that the solution space to this differential equation is n-dimensional. It will then follow immediately from Theorem 4.6.10 that every solution to this differential equation is of the form y(x) = c1 y1 (x) + c2 y2 (x) + · · · + cn yn (x), where {y1 , y2 , . . . , yn } is any linearly independent set of n solutions to the differential equation. Therefore, determining all solutions to the differential equation will be reduced to determining any linearly independent set of n solutions. A similar application of the theorem will be used to develop the theory for systems of differential equations in Chapter 7. More generally, Theorem 4.6.10 says that if we know in advance that the dimension of the vector space V is n, then n linearly independent vectors in V are already guaranteed to form a basis for V without the need to explicitly verify that these n vectors also span V . This represents a useful reduction in the work required to verify a basis. Here is an example: Example 4.6.11 Verify that {1 + x, 2 − 2x + x 2 , 1 + x 2 } is a basis for P2 . Solution: Since dim[P2 ] = 3, Theorem 4.6.10 will guarantee that the three given vectors are a basis, once we confirm only that they are linearly independent. 
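As a computational aside (not part of the text), linear independence here can also be checked by identifying each polynomial with its coefficient vector in R^3 relative to the standard basis {1, x, x^2} and applying Corollary 4.5.15. A sketch assuming SymPy is available:

```python
from sympy import Matrix

# Columns hold the coefficients of 1 + x, 2 - 2x + x^2, 1 + x^2
# relative to the standard basis {1, x, x^2} of P2.
A = Matrix([[1, 2, 1],
            [1, -2, 0],
            [0, 1, 1]])
print(A.det())   # -3 (nonzero), so the three polynomials are linearly independent
```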
The polynomials p1 (x) = 1 + x, p2 (x) = 2 − 2x + x 2 , p3 (x) = 1 + x 2 have Wronskian 1 + x 2 − 2x + x 2 1 + x 2 W [p1 , p2 , p3 ](x) = 1 −2 + 2x 2x = −6 = 0. 0 2 2 Since the Wronskian is nonzero, the given set of vectors is linearly independent on any interval. Consequently, {1 + x, 2 − 2x + x 2 , 1 + x 2 } is indeed a basis for P2 . i i i i i i i “main” 2007/2/16 page 288 i 288 CHAPTER 4 Vector Spaces There is a notable parallel result to Theorem 4.6.10 which can also cut down the work required to verify that a set of vectors in V is a basis for V , provided that we know the dimension of V in advance. Theorem 4.6.12 If dim[V ] = n, then any set of n vectors in V that spans V is a basis for V . Proof Let v1 , v2 , . . . , vn be n vectors in V that span V . To confirm that {v1 , v2 , . . . , vn } is a basis for V , we need only show that this is a linearly independent set of vectors. Suppose, to the contrary, that {v1 , v2 , . . . , vn } is a linearly dependent set. By Corollary 4.5.12, there is a linearly independent subset of {v1 , v2 , . . . , vn }, with fewer than n vectors, which also spans V . But this implies that V contains a basis with fewer than n vectors, a contradiction. Putting the results of Theorems 4.6.10 and 4.6.12 together, the following result is immediate. Corollary 4.6.13 If dim[V ] = n and S = {v1 , v2 , . . . , vn } is a set of n vectors in V , the following statements are equivalent: 1. S is a basis for V . 2. S is linearly independent. 3. S spans V . We emphasize once more the importance of this result. It means that if we have a set S of dim[V ] vectors in V , then to determine whether or not S is a basis for V , we need only check if S is linearly independent or if S spans V , not both. We next establish another corollary to Theorem 4.6.10. Corollary 4.6.14 Let S be a subspace of a finite-dimensional vector space V . If dim[V ] = n, then dim[S ] ≤ n. Furthermore, if dim[S ] = n, then S = V . Proof Suppose that dim[S ] > n. Then any basis for S would contain more than n linearly independent vectors, and therefore we would have a linearly independent set of more than n vectors in V . This would contradict Theorem 4.6.4. Thus, dim[S ] ≤ n. Now consider the case when dim[S ] = n = dim[V ]. In this case, any basis for S consists of n linearly independent vectors in S and hence n linearly independent vectors in V . Thus, by Theorem 4.6.10, these vectors also form a basis for V . Hence, every vector in V is spanned by the basis vectors for S , and hence, every vector in V lies in S . Thus, V = S . Example 4.6.15 Give a geometric description of the subspaces of R3 of dimensions 0, 1, 2, 3. Solution: Zero-dimensional subspace: This corresponds to the subspace {(0, 0, 0)}, and therefore it is represented geometrically by the origin of a Cartesian coordinate system. One-dimensional subspace: These are subspaces generated by a single (nonzero) basis vector. Consequently, they correspond geometrically to lines through the origin. Two-dimensional subspace: These are the subspaces generated by any two noncollinear vectors and correspond geometrically to planes through the origin. i i i i i i i “main” 2007/2/16 page 289 i 4.6 Bases and Dimension 289 Three-dimensional subspace: Since dim[R3 ] = 3, it follows from Corollary 4.6.14 that the only three-dimensional subspace of R3 is R3 itself. Example 4.6.16 Determine a basis for the subspace of R3 consisting of all solutions to the equation x1 + 2x2 − x3 = 0. Solution: We can solve this problem geometrically. 
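An algebraic alternative, sketched below as an aside under the assumption that SymPy is available, is to view the equation as a homogeneous system with coefficient matrix [1 2 −1] and read off a basis for its null space; the text itself proceeds geometrically.

```python
from sympy import Matrix

A = Matrix([[1, 2, -1]])   # the equation x1 + 2*x2 - x3 = 0 as a 1 x 3 system
for v in A.nullspace():
    print(v.T)             # Matrix([[-2, 1, 0]]) and Matrix([[1, 0, 1]])
# Any two noncollinear solution vectors, such as these, form a basis for the plane.
```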
The given equation is that of a plane through the origin and therefore is a two-dimensional subspace of R3 . In order to determine a basis for this subspace, we need only choose two linearly independent (i.e., noncollinear) vectors that lie in the plane. A simple choice of vectors is8 v1 = (1, 0, 1) and v2 = (2, −1, 0). Thus, a basis for the subspace is {(1, 0, 1), (2, −1, 0)}. Corollary 4.6.14 has shown that if S is a subspace of a finite-dimensional vector space V with dim[S ] = dim[V ], then S = V . Our next result establishes that, in general, a basis for a subspace of a finite-dimensional vector space V can be extended to a basis for V . This result will be required in the next section and also in Chapter 5. Theorem 4.6.17 Let S be a subspace of a finite-dimensional vector space V . Any basis for S is part of a basis for V . Proof Suppose dim[V ] = n and dim[S ] = k . By Corollary 4.6.14, k ≤ n. If k = n, then S = V , so that any basis for S is a basis for V . Suppose now that k < n, and let {v1 , v2 , . . . , vk } be a basis for S . These basis vectors are linearly independent, but they fail to span V (otherwise they would form a basis for V , contradicting k < n). Thus, there is at least one vector, say vk +1 , in V that is not in span{v1 , v2 , . . . , vk }. Hence, {v1 , v2 , . . . , vk , vk +1 } is linearly independent. If k + 1 = n, then we have a basis for V by Theorem 4.6.10, and we are done. Otherwise, we can repeat the procedure to obtain the linearly independent set {v1 , v2 , . . . , vk , vk +1 , vk +2 }. The process will terminate when we have a linearly independent set containing n vectors, including the original vectors v1 , v2 , . . . , vk in the basis for S . This proves the theorem. Remark a basis. Example 4.6.18 The process used in proving the previous theorem is referred to as extending Let S denote the subspace of M2 (R) consisting of all symmetric 2×2 matrices. Determine a basis for S , and find dim[S ]. Extend this basis for S to obtain a basis for M2 (R). Solution: We first express S in set notation as S = {A ∈ M2 (R) : AT = A}. In order to determine a basis for S , we need to obtain the element form of the matrices in S . We can write ab S= : a, b, c ∈ R . bc Since ab bc =a 10 01 00 +b +c , 00 10 01 it follows that S = span 10 01 00 , , 00 10 01 . 8 There are many others, of course. i i i i i i i “main” 2007/2/16 page 290 i 290 CHAPTER 4 Vector Spaces Furthermore, it is easily shown that the matrices in this spanning set are linearly independent. Consequently, a basis for S is 10 01 00 , , 00 10 01 , so that dim[S ] = 3. Since dim[M2 (R)] = 4, in order to extend the basis for S to a basis for M2 (R), we need to add one additional matrix from M2 (R) such that the resulting set is linearly independent. We must choose a nonsymmetric matrix, for any symmetric matrix can be expressed as a linear combination of the three basis vectors for S , and this would create a linear dependency among the matrices. A simple choice of nonsymmetric matrix (although this is certainly not the only choice) is 01 . 00 Adding this vector to the basis for S yields the linearly independent set 10 01 00 01 , , , 00 10 01 00 . (4.6.5) Since dim[M2 (R)] = 4, Theorem 4.6.10 implies that (4.6.5) is a basis for M2 (R). It is important to realize that not all vector spaces are finite dimensional. Some are infinite-dimensional. In an infinite-dimensional vector space, we can find an arbitrarily large number of linearly independent vectors. 
We now give an example of an infinitedimensional vector space that is of primary importance in the theory of differential equations, C n (I ). Example 4.6.19 Show that the vector space C n (I ) is an infinite-dimensional vector space. Solution: Consider the functions 1, x, x 2 , . . . , x k in C n (I ). Of course, each of these functions is in C k (I ) as well, and for each fixed k , the Wronskian of these functions is nonzero (the reader can check that the matrix involved in this calculation is upper triangular, with nonzero entries on the main diagonal). Hence, the functions are linearly independent on I by Theorem 4.5.21. Since we can choose k arbitrarily, it follows that there are an arbitrarily large number of linearly independent vectors in C n (I ), hence C n (I ) is infinite-dimensional. In this example we showed that C n (I ) is an infinite-dimensional vector space. Consequently, the use of our finite-dimensional vector space theory in the analysis of differential equations appears questionable. However, the key theoretical result that we will establish in Chapter 6 is that the solution set of certain linear differential equations is a finite-dimensional subspace of C n (I ), and therefore our basis results will be applicable to this solution set. Exercises for 4.6 Key Terms Basis, Standard basis, Infinite-dimensional, Finitedimensional, Dimension, Extension of a subspace basis. Skills • Be able to determine whether a given set of vectors forms a basis for a vector space V . • Be able to construct a basis for a given vector space V. • Be able to extend a basis for a subspace of V to V itself. • Be familiar with the standard bases on Rn , Mm×n (R), and Pn . i i i i i i i “main” 2007/2/16 page 291 i 4.6 Bases and Dimension 291 • Be able to give the dimension of a vector space V . 4. {(1, 1, −1, 2), (1, 0, 1, −1), (2, −1, 1, −1)}. • Be able to draw conclusions about the properties of a set of vectors in a vector space (i.e., spanning or linear independence) based solely on the size of the set. 5. {(1, 1, 0, 2), (2, 1, 3, −1), (−1, 1, 1, −2), (2, −1, 1, 2)}. • Understand the usefulness of Theorems 4.6.10 and 4.6.12. True-False Review For Questions 1–11, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. A basis for a vector space V is a set S of vectors that spans V . 2. If V and W are vector spaces of dimensions n and m, respectively, and if n > m, then W is a subspace of V . 3. A vector space V can have many different bases. 4. dim[Pn ] = dim[Rn ]. 5. If V is an n-dimensional vector space, then any set S of m vectors with m > n must span V . 6. Five vectors in P3 must be linearly dependent. 7. Two vectors in P3 must be linearly independent. 8. Ten vectors in M3 (R) must be linearly dependent. 9. If V is an n-dimensional vector space, then every set S with fewer than n vectors can be extended to a basis for V . 10. Every set of vectors that spans a finite-dimensional vector space V contains a subset which forms a basis for V . 11. The set of all 3 × 3 upper triangular matrices forms a three-dimensional subspace of M3 (R). Problems For Problems 1–5, determine whether the given set of vectors is a basis for Rn . 1. {(1, 1), (−1, 1)}. 2. {(1, 2, 1), (3, −1, 2), (1, 1, −1)}. 3. {(1, −1, 1), (2, 5, −2), (3, 11, −5)}. 6. 
Determine all values of the constant k for which the set of vectors {(0, −1, 0, k), (1, 0, 1, 0), (0, 1, 1, 0), (k, 0, 2, 1)} is a basis for R4 . 7. Determine a basis S for P3 , and hence, prove that dim[P3 ] = 4. Be sure to prove that S is a basis. 8. Determine a basis S for P3 whose elements all have the same degree. Be sure to prove that S is a basis. For Problems 9–12, find the dimension of the null space of the given matrix A. 13 . −2 −6 000 10. A = 0 0 0 . 010 1 −1 4 11. A = 2 3 −2 . 1 2 −2 1 −1 2 3 2 −1 3 4 12. A = 1 0 1 1 . 3 −1 4 5 9. A = 13. Let S be the subspace of R3 that consists of all solutions to the equation x − 3y + z = 0. Determine a basis for S , and hence, find dim[S ]. 14. Let S be the subspace of R3 consisting of all vectors of the form (r, r − 2s, 3s − 5r), where r and s are real numbers. Determine a basis for S , and hence, find dim[S ]. 15. Let S be the subspace of M2 (R) consisting of all 2 × 2 upper triangular matrices. Determine a basis for S , and hence, find dim[S ]. 16. Let S be the subspace of M2 (R) consisting of all 2 × 2 matrices with trace zero. Determine a basis for S , and hence, find dim[S ]. 17. Let S be the subspace of R3 spanned by the vectors v1 = (1, 0, 1), v2 = (0, 1, 1), v3 = (2, 0, 2). Determine a basis for S , and hence, find dim[S ]. i i i i i i i “main” 2007/2/16 page 292 i 292 CHAPTER 4 Vector Spaces 18. Let S be the vector space consisting of the set of all linear combinations of the functions f1 (x) = ex , f2 (x) = e−x , f3 (x) = sinh(x). Determine a basis for S , and hence, find dim[S ]. 26. Let A1 = 13 , −1 2 00 , 00 −1 4 , 11 5 −6 . −5 1 20. Let v1 = (1, 1) and v2 = (−1, 1). (a) Show that {v1 , v2 } spans A2 = 13 , −1 0 A3 = 19. Determine a basis for the subspace of M2 (R) spanned by −1 1 , 01 10 , 12 A4 = 0 −1 . 23 (a) Show that {A1 , A2 , A3 , A4 } is a basis for M2 (R). [The hint on the previous problems applies again.] (b) Express the vector R2 . 56 78 (b) Show that {v1 , v2 } is linearly independent. (c) Conclude from (a) or (b) that {v1 , v2 } is a basis for R2 . What theorem in this section allows you to draw this conclusion from either (a) or (b), without proving both? 21. Let v1 = (2, 1) and v2 = (3, 1). (a) Show that {v1 , v2 } spans R2 . (b) Show that {v1 , v2 } is linearly independent. (c) Conclude from (a) or (b) that {v1 , v2 } is a basis for R2 . What theorem in this section allows you to draw this conclusion from either (a) or (b), without proving both? 22. Let v1 = (0, 6, 3), v2 = (3, 0, 3), and v3 = (6, −3, 0). Show that {v1 , v2 , v3 } is a basis for R3 . [Hint: You need not show that the set is both linearly independent and a spanning set for P2 . Use a theorem from this section to shorten your work.] 23. Determine all values of the constant α for which {1 + αx 2 , 1 + x + x 2 , 2 + x } is a basis for P2 . 24. Let p1 (x) = 1 + x, p2 (x) = x(x − 1), p3 (x) = 1+2x 2 . Show that {p1 , p2 , p3 } is a basis for P2 . [Hint: You need not show that the set is both linearly independent and a spanning set for P2 . Use a theorem from this section to shorten your work.] 25. The Legendre polynomial of degree n, pn (x), is defined to be the polynomial solution of the differential equation (1 − x 2 )y − 2xy + n(n + 1)y = 0, which has been normalized so that pn (1) = 1. The first three Legendre polynomials are p0 (x) = 1, p1 (x) = 1 x , and p2 (x) = 2 (3x 2 − 1). Show that {p0 , p1 , p2 } is a basis for P2 . [The hint for the previous problem applies again.] as a linear combination of the basis in (a). 27. 
Let 1 1 −1 1 A = 2 −3 5 −6 , 5 0 2 −3 and let v1 = (−2, 7, 5, 0) and v2 = (3, −8, 0, 5). (a) Show that {v1 , v2 } is a basis for the null space of A. (b) Using the basis in part (a), write an expression for an arbitrary vector (x, y, z, w) in the null space of A. 28. Let V = M3 (R) and let S be the subset of all vectors in V such that the sum of the entries in each row and in each column is zero. (a) Find a basis and the dimension of S . (b) Extend the basis in (a) to a basis for V . 29. Let V = M3 (R) and let S be the subset of all vectors in V such that the sum of the entries in each row and in each column is the same. (a) Find a basis and the dimension of S . (b) Extend the basis in (a) to a basis for V . For Problems 30–31, Symn (R) and Skewn (R) denote the vector spaces consisting of all real n × n matrices that are symmetric and skew-symmetric, respectively. 30. Find a basis for Sym2 (R) and Skew2 (R), and show that dim[Sym2 (R)] + dim[Skew2 (R)] = dim[M2 (R)]. i i i i i i i “main” 2007/2/16 page 293 i 4.7 31. Determine the dimensions Skewn (R), and show that of Symn (R) and dim[Symn (R)] + dim[Skewn (R)] = dim[Mn (R)]. For Problems 32–34, a subspace S of a vector space V is given. Determine a basis for S and extend your basis for S to obtain a basis for V . R3 , 32. V = S is the subspace consisting of all points lying on the plane with Cartesian equation x + 4y − 3z = 0. 33. V = M2 (R), S is the subspace consisting of all matrices of the form ab . ba 4.7 293 Change of Basis 34. V = P2 , S is the subspace consisting of all polynomials of the form (2a1 + a2 )x 2 + (a1 + a2 )x + (3a1 − a2 ). 35. Let S be a basis for Pn−1 . Prove that S ∪ {x n } is a basis for Pn . 36. Generalize the previous problem as follows. Let S be a basis for Pn−1 , and let p be any polynomial of degree n. Prove that S ∪ {p} is a basis for Pn . 37. (a) What is the dimension of Cn as a real vector space? Determine a basis. (b) What is the dimension of Cn as a complex vector space? Determine a basis. Change of Basis Throughout this section, we restrict our attention to vector spaces that are finite-dimensional. If we have a (finite) basis for such a vector space V , then, since the vectors in a basis span V , any vector in V can be expressed as a linear combination of the basis vectors. The next theorem establishes that there is only one way in which we can do this. Theorem 4.7.1 If V is a vector space with basis {v1 , v2 , . . . , vn }, then every vector v ∈ V can be written uniquely as a linear combination of v1 , v2 , . . . , vn . Proof Since v1 , v2 , . . . , vn span V , every vector v ∈ V can be expressed as v = a1 v1 + a2 v2 + · · · + an vn , (4.7.1) for some scalars a1 , a2 , . . . , an . Suppose also that v = b1 v1 + b2 v2 + · · · + bn vn , (4.7.2) for some scalars b1 , b2 , . . . , bn . We will show that ai = bi for each i , which will prove the uniqueness assertion of this theorem. Subtracting Equation (4.7.2) from Equation (4.7.1) yields (a1 − b1 )v1 + (a2 − b2 )v2 + · · · + (an − bn )vn = 0. (4.7.3) But {v1 , v2 , . . . , vn } is linearly independent, and so Equation (4.7.3) implies that a1 − b1 = 0, a2 − b2 = 0, ..., an − bn = 0. That is, ai = bi for each i = 1, 2, . . . , n. Remark The converse of Theorem 4.7.1 is also true. That is, if every vector v in a vector space V can be written uniquely as a linear combination of the vectors in {v1 , v2 , . . . , vn }, then {v1 , v2 , . . . , vn } is a basis for V . The proof of this fact is left as an exercise (Problem 38). 
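Concretely, for column vectors the scalars in Theorem 4.7.1 are found by solving a linear system whose coefficient matrix has the basis vectors as its columns, and the theorem guarantees that this system has exactly one solution. The sketch below is an illustrative aside (assuming SymPy is available) using the basis {(1, 0, 0), (1, 1, 0), (1, 1, 1)} of R^3 mentioned earlier.

```python
from sympy import Matrix

# Columns are the basis vectors (1,0,0), (1,1,0), (1,1,1) of R^3.
B = Matrix([[1, 1, 1],
            [0, 1, 1],
            [0, 0, 1]])
v = Matrix([2, 3, 4])

c = B.LUsolve(v)   # the unique scalars guaranteed by Theorem 4.7.1
print(c.T)         # Matrix([[-1, -1, 4]]): v = -(1,0,0) - (1,1,0) + 4*(1,1,1)
```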
Up to this point, we have not paid particular attention to the order in which the vectors of a basis are listed. However, in the remainder of this section, this will become i i i i i i i “main” 2007/2/16 page 294 i 294 CHAPTER 4 Vector Spaces an important consideration. By an ordered basis for a vector space, we mean a basis in which we are keeping track of the order in which the basis vectors are listed. DEFINITION 4.7.2 If B = {v1 , v2 , . . . , vn } is an ordered basis for V and v is a vector in V , then the scalars c1 , c2 , . . . , cn in the unique n-tuple (c1 , c2 , . . . , cn ) such that v = c1 v1 + c2 v2 + · · · + cn vn are called the components of v relative to the ordered basis B = {v1 , v2 , . . . , vn }. We denote the column vector consisting of the components of v relative to the ordered basis B by [v]B , and we call [v]B the component vector of v relative to B . Determine the components of the vector v = (1, 7) relative to the ordered basis B = {(1, 2), (3, 1)}. Example 4.7.3 Solution: If we let v1 = (1, 2) and v2 = (3, 1), then since these vectors are not collinear, B = {v1 , v2 } is a basis for R2 . We must determine constants c1 , c2 such that y c1 v1 + c2 v2 = v. (4, 8) (1, 7) We write 4v1 v = 4v1 c1 (1, 2) + c2 (3, 1) = (1, 7). v2 This requires that c1 + 3c2 = 1 and 2c1 + c2 = 7. The solution to this system is (4, −1), which gives the components of v relative to the ordered basis B = {v1 , v2 }. (See Figure 4.7.1.) Thus, (1, 2) v1 (3, 1) v2 v = 4v1 − v2 . x Therefore, we have v2 Figure 4.7.1: The components of the vector v = (1, 7) relative to the basis {(1, 2), (3, 1)}. [v]B = 4 . −1 Remark In the preceding example, the component vector of v = (1, 7) relative to the ordered basis B = {(3, 1), (1, 2)} is [v]B = −1 . 4 Thus, even though the bases B and B contain the same vectors, the fact that the vectors are listed in different order affects the components of the vectors in the vector space. Example 4.7.4 In P2 , determine the component vector of p(x) = 5 + 7x − 3x 2 relative to the following: (a) The standard (ordered) basis B = {1, x, x 2 }. (b) The ordered basis C = {1 + x, 2 + 3x, 5 + x + x 2 }. i i i i i i i “main” 2007/2/16 page 295 i 4.7 Change of Basis 295 Solution: (a) The given polynomial is already written as a linear combination of the standard basis vectors. Consequently, the components of p(x) = 5 + 7x − 3x 2 relative to the standard basis B are 5, 7, and −3. We write 5 [p(x)]B = 7 . −3 (b) The components of p(x) = 5 + 7x − 3x 2 relative to the ordered basis C = {1 + x, 2 + 3x, 5 + x + x 2 } are c1 , c2 , and c3 , where c1 (1 + x) + c2 (2 + 3x) + c3 (5 + x + x 2 ) = 5 + 7x − 3x 2 . That is, (c1 + 2c2 + 5c3 ) + (c1 + 3c2 + c3 )x + c3 x 2 = 5 + 7x − 3x 2 . Hence, c1 , c2 , and c3 satisfy c1 + 2c2 + 5c3 = 5, c1 + 3c2 + c3 = 7, c3 = −3. The augmented matrix of this system has reduced row-echelon form 1 0 0 40 0 1 0 −10 , 0 0 1 −3 so that the system has solution (40, −10, −3), which gives the required components. Hence, we can write 5 + 7x − 3x 2 = 40(1 + x) − 10(2 + 3x) − 3(5 + x + x 2 ). Therefore, 40 [p(x)]C = −10 . −3 Change-of-Basis Matrix The preceding example naturally motivates the following question: If we are given two different ordered bases for an n-dimensional vector space V , say B = {v1 , v2 , . . . , vn } and C = {w1 , w2 , . . . , wn }, (4.7.4) and a vector v in V , how are [v]B and [v]C related? In practical terms, we may know the components of v relative to B and wish to know the components of v relative to a different ordered basis C . 
This question actually arises quite often, since different bases are advantageous in different circumstances, so it is useful to be able to convert i i i i i i i “main” 2007/2/16 page 296 i 296 CHAPTER 4 Vector Spaces components of a vector relative to one basis to components relative to another basis. The tool we need in order to do this efficiently is the change-of-basis matrix. Before we describe this matrix, we pause to record the linearity properties satisfied by the components of a vector. These properties will facilitate the discussion that follows. Lemma 4.7.5 Let V be a vector space with ordered basis B = {v1 , v2 , . . . , vn }, let x and y be vectors in V , and let c be a scalar. Then we have (a) [x + y]B = [x]B + [y]B . (b) [cx]B = c[x]B . Proof Write x = a1 v1 + a2 v2 + · · · + an vn and y = b1 v1 + b2 v2 + · · · + bn vn , so that x + y = (a1 + b1 )v1 + (a2 + b2 )v2 + · · · + (an + bn )vn . Hence, a1 b1 a1 + b1 a2 + b2 a2 b2 [x + y]B = = . + . = [x]B + [y]B , . . . . . . . an + bn an bn which establishes (a). The proof of (b) is left as an exercise (Problem 37). DEFINITION 4.7.6 Let V be an n-dimensional vector space with ordered bases B and C given in (4.7.4). We define the change-of-basis matrix from B to C by PC ←B = [v1 ]C , [v2 ]C , . . . , [vn ]C . (4.7.5) In words, we determine the components of each vector in the “old basis” B with respect the “new basis” C and write the component vectors in the columns of the change-of-basis matrix. Remark Of course, there is also a change-of-basis matrix from C to B , given by PB ←C = [w1 ]B , [w2 ]B , . . . , [wn ]B . We will see shortly that the matrices PB ←C and PC ←B are intimately related. Our first order of business at this point is to see why the matrix in (4.7.5) converts the components of a vector relative to B into components relative to C . Let v be a vector in V and write v = a1 v1 + a2 v2 + · · · + an vn . i i i i i i i “main” 2007/2/16 page 297 i 4.7 297 Change of Basis a1 a2 [v]B = . . . . Then an Hence, using Theorem 2.2.9 and Lemma 4.7.5, we have PC ←B [v]B = a1 [v1 ]C + a2 [v2 ]C +· · ·+ an [vn ]C = [a1 v1 + a2 v2 +· · ·+ an vn ]C = [v]C . This calculation shows that premultiplying the component vector of v relative to B by the change of basis matrix PC ←B yields the component vector of v relative to C : [v]C = PC ←B [v]B . Example 4.7.7 (4.7.6) Let V = R2 , B = {(1, 2), (3, 4)}, C = {(7, 3), (4, 2)}, and v = (1, 0). It is routine to verify that B and C are bases for V . (a) Determine [v]B and [v]C . (b) Find PC ←B and PB ←C . (c) Use (4.7.6) to compute [v]C , and compare your answer with (a). Solution: (a) Solving (1, 0) = a1 (1, 2) + a2 (3, 4), we find a1 = −2 and a2 = 1. Hence, −2 . 1 [v]B = Likewise, setting (1, 0) = b1 (7, 3) + b2 (4, 2), we find b1 = 1 and b2 = −1.5. Hence, 1 [v]C = . −1.5 (b) A short calculation shows that [(1, 2)]C = −3 5.5 and Thus, we have PC ←B = [(3, 4)]C = −5 . 9.5 −3 −5 . 5.5 9.5 Likewise, another short calculation shows that [(7, 3)]B = −9.5 5.5 Hence, PB ←C = and [(4, 2)]B = −5 . 3 −9.5 −5 . 5.5 3 i i i i i i i “main” 2007/2/16 page 298 i 298 CHAPTER 4 Vector Spaces (c) We compute as follows: PC ←B [v]B = −3 −5 5.5 9.5 −2 1 = 1 −1.5 = [v]C , as we found in part (a). The reader may have noticed a close resemblance between the two matrices PC ←B and PB ←C computed in part (b) of the preceding example. In fact, a brief calculation shows that PC ←B PB ←C = I2 = PB ←C PC ←B . The two change-of-basis matrices are inverses of each other. This turns out to be always true. 
To see why, consider again Equation (4.7.6). If we premultiply both sides of (4.7.6) by the matrix PB ←C , we get PB ←C [v]C = PB ←C PC ←B [v]B . (4.7.7) Rearranging the roles of B and C in (4.7.6), the left side of (4.7.7) is simply [v]B . Thus, PB ←C PC ←B [v]B = [v]B . Since this is true for any vector [v]B in Rn , this implies that PB ←C PC ←B = In , the n × n identity matrix. Likewise, a similar calculation shows that PC ←B PB ←C = In . Thus, we have proved that The matrices PC ←B and PB ←C are inverses of one another. Example 4.7.8 Let V = P2 , and let B = {1, 1 + x, 1 + x + x 2 }, and C = {2 + x + x 2 , x + x 2 , x }. It is routine to verify that B and C are bases for V . Find the change-of-basis matrix from B to C , and use it to calculate the change-of-basis matrix from C to B . Solution: We set 1 = a1 (2 + x + x 2 ) + a2 (x + x 2 ) + a3 x . With a quick calculation, we find that a1 = 0.5, a2 = −0.5, and a3 = 0. Next, we set 1 + x = b1 (2 + x + x 2 ) + b2 (x + x 2 ) + b3 x , and we find that b1 = 0.5, b2 = −0.5, and b3 = 1. Finally, we set 1 + x + x 2 = c1 (2 + x + x 2 ) + c2 (x + x 2 ) + c3 x , from which it follows that c1 = 0.5, c2 = 0.5, and c3 = 0. Hence, we have a1 b1 c1 0.5 0.5 0.5 PC ←B = a2 b2 c2 = −0.5 −0.5 0.5 . 0 10 a3 b3 c3 Thus, we have PB ←C = (PC ←B )−1 1 −1 −1 = 0 0 1. 110 i i i i i i i “main” 2007/2/16 page 299 i 4.7 Change of Basis 299 In much the same way that we showed above that the matrices PC ←B and PB ←C are inverses of one another, we can make the following observation. Theorem 4.7.9 Let V be a vector space with ordered bases A, B , and C . Then PC ←A = PC ←B PB ←A . (4.7.8) Proof Using (4.7.6), for every v ∈ V , we have PC ←B PB ←A [v]A = PC ←B [v]B = [v]C = PC ←A [v]A , so that premultiplication of [v]A by either matrix in (4.7.8) yields the same result. Hence, the matrices on either side of (4.7.8) are the same. We conclude this section by using Theorem 4.7.9 to show how an arbitrary changeof-basis matrix PC ←B in Rn can be expressed as a product of change-of-basis matrices involving the standard basis E = {e1 , e2 , . . . , en } of Rn . Let B = {v1 , v2 , . . . , vn } and C = {w1 , w2 , . . . , wn } be arbitrary ordered bases for Rn . Since [v]E = v for all column vectors v in Rn , the matrices PE ←B = [[v1 ]E , [v2 ]E , . . . , [vn ]E ] = [v1 , v2 , . . . , vn ] and PE ←C = [[w1 ]E , [w2 ]E , . . . , [wn ]E ] = [w1 , w2 , . . . , wn ] can be written down immediately. Using these matrices, together with Theorem 4.7.9, we can compute the arbitrary change-of-basis matrix PC ←B with ease: PC ←B = PC ←E PE ←B = (PE ←C )−1 PE ←B . Exercises for 4.7 Key Terms True-False Review Ordered basis, Components of a vector relative to an ordered basis, Change-of-basis matrix. For Questions 1–8, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. Skills • Be able to find the components of a vector relative to a given ordered basis for a vector space V . • Be able to compute the change-of-basis matrix for a vector space V from one ordered basis B to another ordered basis C . • Be able to use the change-of-basis matrix from B to C to determine the components of a vector relative to C from the components of the vector relative to B . • Be familiar with the relationship between the two change-of-basis matrices PC ←B and PB ←C . 1. 
Every vector in a finite-dimensional vector space V can be expressed uniquely as a linear combination of vectors comprising a basis for V . 2. The change-of-basis matrix PB ←C acts on the component vector of a vector v relative to the basis C and produces the component vector of v relative to the basis B . 3. A change-of-basis matrix is always a square matrix. 4. A change-of-basis matrix is always invertible. i i i i i i i “main” 2007/2/16 page 300 i 300 CHAPTER 4 Vector Spaces 5. For any vectors v and w in a finite-dimensional vector space V with basis B , we have [v−w]B = [v]B −[w]B . 6. If the bases B and C for a vector space V contain the same set of vectors, then [v]B = [v]C for every vector v in V . 7. If B and C are bases for a finite-dimensional vector space V , and v and w are in V such that [v]B = [w]C , then v = w. 8. The matrix PB ←B is the identity matrix for any basis B for V . Problems For Problems 1–13, determine the component vector of the given vector in the vector space V relative to the given ordered basis B . 1. V = R2 ; B = {(2, −2), (1, 4)}; v = (5, −10). 2. V = R2 ; B = {(−1, 3), (3, 2)}; v = (8, −2). 3. V = R3 ; B = {(1, 0, 1), (1, 1, −1), (2, 0, 1)}; v = (−9, 1, −8). 4. V = R3 ; B = {(1, −6, 3), (0, 5, −1), (3, −1, −1)}; v = (1, 7, 7). 5. V = R3 ; B = {(3, −1, −1), (1, −6, 3), (0, 5, −1)}; v = (1, 7, 7). 6. V = R3 ; B = {(−1, 0, 0), (0, 0, −3), (0, −2, 0)}; v = (5, 5, 5). 7. V = P2 ; B = {x 2 + x, 2 + 2x, 1}; p(x) = −4x 2 + 2x + 6. 8. V = P2 ; B = {5 − 3x, 1, 1 + 2x 2 }; p(x) = 15 − 18x − 30x 2 . 12. V = M2 (R); 2 −1 04 11 3 −1 B= , , , 35 −1 1 11 25 −10 16 A= . −15 −14 13. V = M2 (R); −1 1 13 10 0 −1 B= , , , 01 −1 0 12 23 56 A= . 78 ; ; 14. Let v1 = (0, 6, 3), v2 = (3, 0, 3), and v3 = (6, −3, 0). Determine the component vector of an arbitrary vector v = (x, y, z) relative to the ordered basis {v1 , v2 , v3 }. 15. Let p1 (x) = 1 + x , p2 (x) = x(x − 1), and p3 (x) = 1 + 2x 2 . Determine the component vector of an arbitrary polynomial p(x) = a0 + a1 x + a2 x 2 relative to the ordered basis {p1 , p2 , p3 }. For Problems 16–25, find the change-of-basis matrix PC ←B from the given ordered basis B to the given ordered basis C of the vector space V . 16. V = R2 ; B = {(9, 2), (4, −3)}; C = {(2, 1), (−3, 1)}. 17. V = R2 ; B = {(−5, −3), (4, 28)}; C = {(6, 2), (1, −1)}. 18. V = R3 ; B = {(2, −5, 0), (3, 0, 5), (8, −2, −9)}; C = {(1, −1, 1), (2, 0, 1), (0, 1, 3)}. 19. V = R3 ; B = {(−7, 4, 4), (4, 2, −1), (−7, 5, 0)}; C = {(1, 1, 0), (0, 1, 1), (3, −1, −1)}. 20. V = P1 ; B = {7 − 4x, 5x }; C = {1 − 2x, 2 + x }. 21. V = P2 ; B = {−4 + x − 6x 2 , 6 + 2x 2 , −6 − 2x + 4x 2 }; C = {1 − x + 3x 2 , 2, 3 + x 2 }. 9. V = P3 ; B = {1, 1 + x, 1 + x + x 2 , 1 + x + x 2 + x 3 }; p(x) = 4 − x + x 2 − 2x 3 . 22. V = P3 ; B = {−2+3x +4x 2 −x 3 , 3x +5x 2 +2x 3 , −5x 2 −5x 3 , 4 + 4x + 4x 2 }; C = {1 − x 3 , 1 + x, x + x 2 , x 2 + x 3 }. 10. V = P3 ; B = {x 3 + x 2 , x 3 − 1, x 3 + 1, x 3 + x }; p(x) = 8 + x + 6x 2 + 9x 3 . 23. V = P2 ; B = {2 + x 2 , −1 − 6x + 8x 2 , −7 − 3x − 9x 2 }; C = {1 + x, −x + x 2 , 1 + 2x 2 }. 11. V = M2 (R); 11 11 11 10 B= , , , 11 10 00 00 −3 −2 A= . −1 2 24. V = M2 (R); 10 0 −1 35 −2 −4 B= , , , −1 −2 30 00 00 11 11 11 10 C= , , , . 11 10 00 00 ; ; i i i i i i i “main” 2007/2/16 page 301 i 4.8 Row Space and Column Space 301 33. v = (−1, 2, 0); V , B , and C from Problem 19. 25. V = M2 (R); B = {E12 , E22 , E21 , E11 }; C = {E22 , E11 , E21 , E12 }. 
For Problems 26–31, find the change-of-basis matrix PB ←C from the given basis C to the given basis B of the vector space V . 34. p(x) = 6 − 4x ; V , B , and C from Problem 20. 35. p(x) = 5 − x + 3x 2 ; V , B , and C from Problem 21. −1 −1 ; V , B , and C from Problem 24. −4 5 26. V , B , and C from Problem 16. 36. A = 27. V , B , and C from Problem 17. 37. Prove part (b) of Lemma 4.7.5. 28. V , B , and C from Problem 18. 38. Prove that if every vector v in a vector space V can be written uniquely as a linear combination of the vectors in {v1 , v2 , . . . , vn }, then {v1 , v2 , . . . , vn } is a basis for V. 29. V , B , and C from Problem 20. 30. V , B , and C from Problem 22. 31. V , B , and C from Problem 25. For Problems 32–36, verify Equation (4.7.6) for the given vector. 32. v = (−5, 3); V , B , and C from Problem 16. 4.8 39. Show that if B is a basis for a finite-dimensional vector space V , and C is a basis obtained by reordering the vectors in B , then the matrices PC ←B and PB ←C each contain exactly one 1 in each row and column, and zeros elsewhere. Row Space and Column Space In this section, we consider two vector spaces that can be associated with any m × n matrix. For simplicity, we will assume that the matrices have real entries, although the results that we establish can easily be extended to matrices with complex entries. Row Space Let A = [aij ] be an m × n real matrix. The row vectors of this matrix are row n-vectors, and therefore they can be associated with vectors in Rn . The subspace of Rn spanned by these vectors is called the row space of A and denoted rowspace(A). For example, if A= 2 −1 3 , 5 9 −7 then rowspace(A) = span{(2, −1, 3), (5, 9, −7)}. For a general m × n matrix A, how can we obtain a basis for rowspace(A)? By its very definition, the row space of A is spanned by the row vectors of A, but these may not be linearly independent, hence the row vectors of A do not necessarily form a basis for rowspace(A). We wish to determine a systematic and efficient method for obtaining a basis for the row space. Perhaps not surprisingly, it involves the use of elementary row operations. If we perform elementary row operations on A, then we are merely taking linear combinations of vectors in rowspace(A), and we therefore might suspect that the row space of the resulting matrix coincides with the row space of A. This is the content of the following theorem. Theorem 4.8.1 If A and B are row-equivalent matrices, then rowspace(A) = rowspace(B). i i i i i i i “main” 2007/2/16 page 302 i 302 CHAPTER 4 Vector Spaces Proof We establish that the matrix that results from performing any of the three elementary row operations on a matrix A has the same row space as the row space of A. If we interchange two rows of A, then clearly we have not altered the row space, since we still have the same set of row vectors (listed in a different order). Now let a1 , a2 , . . . , am denote the row vectors of A. We combine the remaining two types of elementary row operations by considering the result of replacing ai by the vector r ai + s aj , where r (= 0) and s are real numbers. If s = 0, then this corresponds to scaling ai by a factor of r , whereas if r = 1 and s = 0, this corresponds to adding a multiple of row j to row i . 
If B denotes the resulting matrix, then rowspace(B) = {c1 a1 + c2 a2 + · · · + ci (r ai + s aj ) + · · · + cm am } = {c1 a1 + c2 a2 + · · · + (rci )ai + · · · + (cj + sci )aj + · · · + cm am } = {c1 a1 + c2 a2 + · · · + di ai + · · · + dj aj + · · · + cm am }, where di = rci and dj = cj + sci . Note that di and dj can take on arbitrary values, hence the vectors in rowspace(B) consist precisely of arbitrary linear combinations of a1 , a2 , . . . , am . That is, rowspace(B) = span{a1 , a2 , . . . , am } = rowspace(A). The previous theorem is the key to determining a basis for rowspace(A). The idea we use is to reduce A to row-echelon form. If d1 , d2 , . . . , dk denote the nonzero row vectors in this row-echelon form, then from the previous theorem, rowspace(A) = span{d1 , d2 , . . . , dk }. We now establish that {d1 , d2 , . . . , dk } is linearly independent. Consider c1 d1 + c2 d2 + · · · + ck dk = 0. (4.8.1) Owing to the positioning of the leading ones in a row-echelon matrix, each of the row vectors d1 , d2 , . . . , dk −1 will have a leading one in a position where each succeeding row vector in the row-echelon form has a zero. Hence, Equation (4.8.1) is satisfied only if c1 = c2 = · · · = ck −1 = 0, and therefore, it reduces to ck dk = 0. However, dk is a nonzero vector, and so we must have ck = 0. Consequently, all of the constants in Equation (4.8.1) must be zero, and therefore {d1 , d2 , . . . , dk } not only spans rowspace(A), but also is linearly independent. Hence, {d1 , d2 , . . . , dk } is a basis for rowspace(A). We have therefore established the next theorem. Theorem 4.8.2 The set of nonzero row vectors in any row-echelon form of an m × n matrix A is a basis for rowspace(A). As a consequence of the preceding theorem, we can conclude that all row-echelon forms of A have the same number of nonzero rows. For if this were not the case, then we could find two bases for rowspace(A) containing a different number of vectors, which would contradict Corollary 4.6.5. We can therefore consider Theorem 2.4.10 as a direct consequence of Theorem 4.8.2. i i i i i i i “main” 2007/2/16 page 303 i 4.8 Example 4.8.3 Row Space and Column Space 303 Determine a basis for the row space of 1 −1 1 3 2 2 −1 1 5 1 A= 3 −1 1 7 0 . 0 1 −1 −1 −3 Solution: We first reduce A to row-echelon form: 1 −1 1 3 2 1 −1 1 3 2 1 −1 −1 −3 2 0 1 −1 −1 −3 1 0 A∼ 0 2 −2 −2 −6 ∼ 0 0 0 0 0 . 0 1 −1 −1 −3 00000 1. A12 (−2), A13 (−3) 2. A23 (−2), A24 (−1) Consequently, a basis for rowspace(A) is {(1, −1, 1, 3, 2), (0, 1, −1, −1, −3)}, and therefore rowspace(A) is a two-dimensional subspace of R5 . Theorem 4.8.2 also gives an efficient method for determining a basis for the subspace of Rn spanned by a given set of vectors. If we let A be the matrix whose row vectors are the given vectors from Rn , then rowspace(A) coincides with the subspace of Rn spanned by those vectors. Consequently, the nonzero row vectors in any row-echelon form of A will be a basis for the subspace spanned by the given set of vectors. Example 4.8.4 Determine a basis for the subspace of R4 spanned by {(1, 2, 3, 4),(4, 5, 6, 7),(7, 8, 9, 10)}. Solution: We first let A denote the matrix that has the given vectors as row vectors. Thus, 123 4 A = 4 5 6 7 . 7 8 9 10 We now reduce A to row-echelon form: 12 3 4 12 3 4 1234 1 2 3 2 3 ∼ 0 1 2 3. A ∼ 0 −3 −6 −9 ∼ 0 1 0 −6 −12 −18 0 −6 −12 −18 0000 1. A12 (−4), A13 (−7) 2. M2 (− 1 ) 3 3. A23 (6) Consequently, a basis for the subspace of R4 spanned by the given vectors is {(1, 2, 3, 4), (0, 1, 2, 3)}. 
We see that the given vectors span a two-dimensional subspace of R4 . Column Space If A is an m × n matrix, the column vectors of A are column m-vectors and therefore can be associated with vectors in Rm . The subspace of Rm spanned by these vectors is called the column space of A and denoted colspace(A). i i i i i i i “main” 2007/2/16 page 304 i 304 CHAPTER 4 Vector Spaces Example 4.8.5 For the matrix A= 2 −1 3 , 5 9 −7 we have colspace(A) = span{(2, 5), (−1, 9), (3, −7)}. We now consider the problem of determining a basis for the column space of an m × n matrix A. Since the column vectors of A coincide with the row vectors of AT , it follows that colspace(A) = rowspace(AT ). Hence one way to obtain a basis for colspace(A) would be to reduce AT to row-echelon form, and then the nonzero row vectors in the resulting matrix would form a basis for colspace(A). There is, however, a better method for determining a basis for colspace(A) directly from any row-echelon form of A. The derivation of this technique is somewhat involved and will require full attention. We begin by determining the column space of an m × n reduced row-echelon matrix. In order to introduce the basic ideas, consider the particular reduced row-echelon matrix 12030 0 0 1 5 0 E= 0 0 0 0 1. 00000 In this case, we see that the first, third, and fifth column vectors, which are the column vectors containing the leading ones, coincide with the first three standard basis vectors in R4 (written as column vectors): 1 0 0 0 1 0 e1 = , e2 = , e3 = . 0 0 1 0 0 0 Consequently, these column vectors are linearly independent. Furthermore, the remaining column vectors in E (those that do not contain leading ones) are both linear combinations of e1 and e2 , columns that do contain leading ones. Therefore {e1 , e2 , e3 } is a linearly independent set of vectors that spans colspace(E), and so a basis for colspace(E) is {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)}. Clearly, the same arguments apply to any reduced row-echelon matrix E . Thus, if E contains k (necessarily ≤ n) leading ones, a basis for colspace(E) is {e1 , e2 , . . . , ek }. Now consider an arbitrary m × n matrix A, and let E denote the reduced row-echelon form of A. Recall from Chapter 2 that performing elementary row operations on a linear system does not alter its solution set. Hence, the two homogeneous systems of equations Ac = 0 and Ec = 0 (4.8.2) have the same solution sets. If we write A and E in column-vector form as A = [a1 , a2 , . . . , an ] and E = [d1 , d2 , . . . , dn ], respectively, then the two systems in (4.8.2) can be written as c1 a1 + c2 a2 + · · · + cn an = 0, c1 d1 + c2 d2 + · · · + cn dn = 0, respectively. Thus, the fact that these two systems have the same solution set means that a linear dependence relationship will hold between the column vectors of E if and only if i i i i i i i “main” 2007/2/16 page 305 i 4.8 Row Space and Column Space 305 precisely the same linear dependence relation holds between the corresponding column vectors of A. In particular, since our previous work shows that the column vectors in E that contain leading ones give a basis for colspace(E), they give a maximal linearly independent set in colspace(E). Therefore, the corresponding column vectors in A will also be a maximal linearly independent set in colspace(A). Consequently, this set of vectors from A will be a basis for colspace(A). 
We have therefore shown that the set of column vectors of A corresponding to those column vectors containing leading ones in the reduced row-echelon form of A is a basis for colspace(A). But do we have to reduce A to reduced row-echelon form? The answer is no. We need only reduce A to row-echelon form. The reason is that going further to reduce a matrix from row-echelon form to reduced row-echelon form does not alter the position or number of leading ones in a matrix, and therefore the column vectors containing leading ones in any row-echelon form of A will correspond to the column vectors containing leading ones in the reduced row-echelon form of A. Consequently, we have established the following result. Theorem 4.8.6 Let A be an m × n matrix. The set of column vectors of A corresponding to those column vectors containing leading ones in any row-echelon form of A is a basis for colspace(A). Example 4.8.7 Determine a basis for colspace(A) if 1 2 −1 −2 −1 2 4 −2 −3 −1 A= 5 10 −5 −3 −1 . −3 −6 3 2 1 Solution: We first reduce A to row-echelon form: 1 2 −1 −2 −1 1 2 −1 −2 −1 0 1 1 2 0 0 0 1 1 1 0 0 A∼ 0 0 0 7 4 ∼ 0 0 0 0 −3 0 0 0 −4 −2 00 0 0 2 1 2 −1 −2 −1 1 2 −1 −2 −1 0 1 1 4 0 0 0 1 1 3 0 0 ∼ 0 0 0 0 1 ∼ 0 0 0 0 1. 00 0 0 2 00 0 0 0 1. A12 (−2), A13 (−5), A14 (3) 2. A23 (−7), A24 (4) 3. M3 (− 1 ) 4. A34 (−2) 3 Since the first, fourth, and fifth column vectors in this row-echelon form of A contain the leading ones, it follows from Theorem 4.8.6 that the set of corresponding column vectors in A is a basis for colspace(A). Consequently, a basis for colspace(A) is {(1, 2, 5, −3), (−2, −3, −3, 2), (−1, −1, −1, 1)}. Hence, colspace(A) is a three-dimensional subspace of R4 . Notice from the row-echelon form of A that a basis for rowspace(A) is {(1, 2, −1, −2, −1), (0, 0, 0, 1, 1), (0, 0, 0, 0, 1)} so that rowspace(A) is a three-dimensional subspace of R5 . We now summarize the discussion of row space and column space. i i i i i i i “main” 2007/2/16 page 306 i 306 CHAPTER 4 Vector Spaces Summary: Let A be an m × n matrix. In order to determine a basis for rowspace(A) and a basis for colspace(A), we reduce A to row-echelon form. 1. The row vectors containing the leading ones in the row-echelon form give a basis for rowspace(A) (a subspace of Rn ). 2. The column vectors of A corresponding to the column vectors containing the leading ones in the row-echelon form give a basis for colspace(A) (a subspace of Rm ). Since the number of vectors in a basis for rowspace(A) or in a basis for colspace(A) is equal to the number of leading ones in any row-echelon form of A, it follows that dim[rowspace(A)] = dim[colspace(A)] = rank(A). However, we emphasize that rowspace(A) and colspace(A) are, in general, subspaces of different vector spaces. In Example 4.8.7, for instance, rowspace(A) is a subspace of R5 , while colspace(A) is a subspace of R4 . For an m × n matrix, rowspace(A) is a subspace of Rn , whereas colspace(A) is a subspace of Rm . Exercises for 4.8 Key Terms Row space, Column space. Skills • Be able to compute a basis for the row space of a matrix. • Be able to compute a basis for the column space of a matrix. True-False Review For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. If A is an m × n matrix such that rowspace(A) = colspace(A), then m = n. 2. 
A basis for the row space of a matrix A consists of the row vectors of any row-echelon form of A. 3. The nonzero column vectors of a row-echelon form of a matrix A form a basis for colspace(A). 4. The sets rowspace(A) and colspace(A) have the same dimension. 5. If A is an n × n invertible matrix, then rowspace(A) = Rn . 6. If A is an n × n invertible matrix, then colspace(A) = Rn . Problems For Problems 1–6, determine a basis for rowspace(A) and a basis for colspace(A). 1. A = 1 −2 . −3 6 2. A = 1 1 −3 2 . 3 4 −11 7 123 3. A = 5 6 7 . 9 10 11 031 4. A = 0 −6 −2 . 0 12 4 1 3 5. A = 1 5 2 6 2 10 −1 3 −3 5 . −1 −1 −5 7 1 −1 2 3 6. A = 1 1 −2 6 . 3 1 42 i i i i i i i “main” 2007/2/16 page 307 i 4.9 The Rank-Nullity Theorem 307 124 12. Let A = 5 11 21 . 3 7 13 For Problems 7–10, use the ideas in this section to determine a basis for the subspace of Rn spanned by the given set of vectors. 7. {(1, −1, 2), (5, −4, 1), (7, −5, −4)}. (a) Find a basis for rowspace(A) and colspace(A). 8. {(1, 3, 3), (1, 5, −1), (2, 7, 4), (1, 4, 1)}. (b) Show that rowspace(A) corresponds to the plane with Cartesian equation 2x + y − z = 0, whereas colspace(A) corresponds to the plane with Cartesian equation 2x − y + z = 0. 9. {(1, 1, −1, 2), (2, 1, 3, −4), (1, 2, −6, 10)}. 10. {(1, 4, 1, 3), (2, 8, 3, 5), (1, 4, 0, 4), (2, 8, 2, 6)}. 11. Let A= 13. Give examples to show how each type of elementary row operation applied to a matrix can change the column space of the matrix. −3 9 . 1 −3 Find a basis for rowspace(A) and colspace(A). Make a sketch to show each subspace in the xy -plane. 4.9 14. Give an example of a square matrix A whose row space and column space have no nonzero vectors in common. The Rank-Nullity Theorem In Section 4.3, we defined the null space of a real m × n matrix A to be the set of all real solutions to the associated homogeneous linear system Ax = 0. Thus, nullspace(A) = {x ∈ Rn : Ax = 0}. The dimension of nullspace(A) is referred to as the nullity of A and is denoted nullity(A). In order to find nullity(A), we need to determine a basis for nullspace(A). Recall that if rank(A) = r , then any row-echelon form of A contains r leading ones, which correspond to the bound variables in the linear system. Thus, there are n − r columns without leading ones, which correspond to free variables in the solution of the system Ax = 0. Hence, there are n − r free variables in the solution of the system Ax = 0. We might therefore suspect that nullity(A) = n − r . Our next theorem, often referred to as the Rank-Nullity Theorem, establishes that this is indeed the case. Theorem 4.9.1 (Rank-Nullity Theorem) For any m × n matrix A, rank(A) + nullity(A) = n. (4.9.1) Proof If rank(A) = n, then by the Invertible Matrix Theorem, the only solution to Ax = 0 is the trivial solution x = 0. Hence, in this case, nullspace(A) = {0}, so nullity(A) = 0 and Equation (4.9.1) holds. Now suppose rank(A) = r < n. In this case, there are n − r > 0 free variables in the solution to Ax = 0. Let t1 , t2 , . . . , tn−r denote these free variables (chosen as those variables not attached to a leading one in any row-echelon form of A), and let x1 , x2 , . . . , xn−r denote the solutions obtained by sequentially setting each free variable to 1 and the remaining free variables to zero. Note that {x1 , x2 , . . . , xn−r } is linearly independent. Moreover, every solution to Ax = 0 is a linear combination of x1 , x2 , . . . , xn−r : x = t1 x1 + t2 x2 + · · · + tn−r xn−r , which shows that {x1 , x2 , . . . , xn−r } spans nullspace(A). 
Thus, {x1 , x2 , . . . , xn−r } is a basis for nullspace(A), and nullity(A) = n − r . i i i i i i i “main” 2007/2/16 page 308 i 308 CHAPTER 4 Vector Spaces Example 4.9.2 If 1 1 23 A = 3 4 −1 2 , −1 −2 5 4 find a basis for nullspace(A) and verify Theorem 4.9.1. Solution: We must find all solutions to Ax = 0. Reducing the augmented matrix of this system yields 1 1 2 30 11 2 30 1 2 A# ∼ 0 1 −7 −7 0 ∼ 0 1 −7 −7 0 . 0 −1 7 7 0 00 0 00 1. A12 (−3), A13 (1) 2. A23 (1) Consequently, there are two free variables, x3 = t1 and x4 = t2 , so that x2 = 7t1 + 7t2 , x1 = −9t1 − 10t2 . Hence, nullspace(A) = {(−9t1 − 10t2 , 7t1 + 7t2 , t1 , t2 ) : t1 , t2 ∈ R} = {t1 (−9, 7, 1, 0) + t2 (−10, 7, 0, 1) : t1 , t2 ∈ R} = span{(−9, 7, 1, 0), (−10, 7, 0, 1)}. Since the two vectors in this spanning set are not proportional, they are linearly independent. Consequently, a basis for nullspace(A) is {(−9, 7, 1, 0), (−10, 7, 0, 1)}, so that nullity(A) = 2. In this problem, A is a 3 × 4 matrix, and so, in the Rank-Nullity Theorem, n = 4. Further, from the foregoing row-echelon form of the augmented matrix of the system Ax = 0, we see that rank(A) = 2. Hence, rank(A) + nullity(A) = 2 + 2 = 4 = n, and the Rank-Nullity Theorem is verified. Systems of Linear Equations We now examine the linear structure of the solution set to the linear system Ax = b in terms of the concepts introduced in the last few sections. First we consider the homogeneous case b = 0. Corollary 4.9.3 Let A be an m × n matrix, and consider the corresponding homogeneous linear system Ax = 0. 1. If rank(A) = n, then Ax = 0 has only the trivial solution, so nullspace(A) = {0}. 2. If rank(A) = r < n, then Ax = 0 has an infinite number of solutions, all of which can be obtained from x = c1 x1 + c2 x2 + · · · + cn−r xn−r , (4.9.2) where {x1 , x2 , . . . , xn−r } is any linearly independent set of n − r solutions to Ax = 0. i i i i i i i “main” 2007/2/16 page 309 i 4.9 The Rank-Nullity Theorem 309 Proof Note that part 1 is a restatement of previous results, or can be quickly deduced from the Rank-Nullity Theorem. Now for part 2, assume that rank(A) = r < n. By the Rank-Nullity Theorem, nullity(A) = n − r . Thus, from Theorem 4.6.10, if {x1 , x2 , . . . , xn−r } is any set of n − r linearly independent solutions to Ax = 0, it is a basis for nullspace(A), and so all vectors in nullspace(A) can be written as x = c1 x1 + c2 x2 + · · · + cn−r xn−r , for appropriate values of the constants c1 , c2 , . . . , cn−r . Remark Ax = 0. The expression (4.9.2) is referred to as the general solution to the system We now turn our attention to nonhomogeneous linear systems. We begin by formulating Theorem 2.5.9 in terms of colspace(A). Theorem 4.9.4 Let A be an m × n matrix and consider the linear system Ax = b. 1. If b is not in colspace(A), then the system is inconsistent. 2. If b ∈ colspace(A), then the system is consistent and has the following: (a) a unique solution if and only if dim[colspace(A)] = n. (b) an infinite number of solutions if and only if dim[colspace(A)] < n. Proof If we write A in terms of its column vectors as A = [a1 , a2 , . . . , an ], then the linear system Ax = b can be written as x1 a1 + x2 a2 + · · · + xn an = b. Consequently, the linear system is consistent if and only if the vector b is a linear combination of the column vectors of A. Thus, the system is consistent if and only if b ∈ colspace(A). This proves part 1. Parts 2(a) and 2(b) follow directly from Theorem 2.5.9, since rank(A) = dim[colspace(A)]. 
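Both the Rank-Nullity Theorem and the consistency criterion of Theorem 4.9.4 lend themselves to quick numerical checks. The sketch below, in Python with NumPy, is not part of the text; it uses the matrix of Example 4.9.2, and the right-hand side b appearing at the end is chosen purely for illustration.

import numpy as np

# Rank-Nullity check for the matrix of Example 4.9.2.
A = np.array([[ 1,  1,  2, 3],
              [ 3,  4, -1, 2],
              [-1, -2,  5, 4]])
rank_A = np.linalg.matrix_rank(A)
print(rank_A, A.shape[1] - rank_A)        # 2 2, and 2 + 2 = 4 = n

# The null-space basis vectors found in Example 4.9.2 do satisfy Ax = 0.
for x in ([-9, 7, 1, 0], [-10, 7, 0, 1]):
    print(A @ np.array(x))                # the zero vector both times

# Consistency test in the spirit of Theorem 4.9.4: b lies in colspace(A)
# exactly when rank([A | b]) = rank(A).
b = np.array([3, 10, -4])
print(np.linalg.matrix_rank(np.column_stack([A, b])))   # 2 = rank(A): consistent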
The set of all solutions to a nonhomogeneous linear system is not a vector space, since, for example, it does not contain the zero vector, but the linear structure of nullspace(A) can be used to determine the general form of the solution of a nonhomogeneous system. Theorem 4.9.5 Let A be an m × n matrix. If rank(A) = r < n and b ∈ colspace(A), then all solutions to Ax = b are of the form x = c1 x1 + c2 x2 + · · · + cn−r xn−r + xp , (4.9.3) where xp is any particular solution to Ax = b, and {x1 , x2 , . . . , xn−r } is a basis for nullspace(A). Proof Since xp is a solution to Ax = b, we have Axp = b. (4.9.4) Let x = u be an arbitrary solution to Ax = b. Then we also have Au = b. (4.9.5) i i i i i i i main” 2007/2/16 page 310 i 310 CHAPTER 4 Vector Spaces Subtracting (4.9.4) from (4.9.5) yields Au − Axp = 0, or equivalently, A(u − xp ) = 0. Consequently, the vector u − xp is in nullspace(A), and so there exist scalars c1 , c2 , . . . , cn−r such that u − xp = c1 x1 + c2 x2 + · · · + cn−r xn−r , since {x1 , x2 , . . . , xn−r } is a basis for nullspace(A). Hence, u = c1 x1 + c2 x2 + · · · + cn−r xn−r + xp , as required. Remark The expression given in Equation (4.9.3) is called the general solution to Ax = b. It has the structure x = xc + xp , where xc = c1 x1 + c2 x2 + · · · + cn−r xn−r is the general solution of the associated homogeneous system and xp is one particular solution of the nonhomogeneous system. In later chapters, we will see that this structure is also apparent in the solution of all linear differential equations and in all linear systems of differential equations. It is a result of the linearity inherent in the problem, rather than the specific problem that we are studying. The unifying concept, in addition to the vector space, is the idea of a linear transformation, which we will study in the next chapter. Example 4.9.6 Let 1 1 23 A = 3 4 −1 2 −1 −2 5 4 and 3 b = 10 . −4 Verify that xp = (1, 1, −1, 1) is a particular solution to Ax = b, and use Theorem 4.9.5 to determine the general solution to the system. Solution: For the given xp , we have 1 1 1 23 3 1 Axp = 3 4 −1 2 −1 = 10 = b. −1 −2 5 4 −4 1 Consequently, xp = (1, 1, −1, 1) is a particular solution to Ax = b. Further, from Example 4.9.2, a basis for nullspace(A) is {x1 , x2 }, where x1 = (−9, 7, 1, 0) and x2 = (−10, 7, 0, 1). Thus, the general solution to Ax = 0 is xc = c1 x1 + c2 x2 , i i i i i i i “main” 2007/2/16 page 311 i 4.9 The Rank-Nullity Theorem 311 and therefore, from Theorem 4.9.5, the general solution to Ax = b is x = c1 x1 + c2 x2 + xp = c1 (−9, 7, 1, 0) + c2 (−10, 7, 0, 1) + (1, 1, −1, 1), which can be written as x = (−9c1 − 10c2 + 1, 7c1 + 7c2 + 1, c1 − 1, c2 + 1). Exercises for 4.9 Skills • For a given matrix A, be able to determine the rank from the nullity, or the nullity from the rank. • Know the relationship between the rank of a matrix A and the consistency of a linear system Ax = b. • Know the relationship between the column space of a matrix A and the consistency of a linear system Ax = b. • Be able to formulate the solution set to a linear system Ax = b in terms of the solution set to the corresponding homogeneous linear equation. True-False Review For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. For an m × n matrix A, the nullity of A must be at least |m − n|. 2. 
If A is a 7 × 9 matrix with nullity(A) = 2, then rowspace(A) = R7 . 3. If A is a 9 × 7 matrix with nullity(A) = 0, then rowspace(A) = R7 . 4. The nullity of an n × n upper triangular matrix A is simply the number of zeros appearing on the main diagonal of A. 5. An n × n matrix A for which nullspace(A) = colspace(A) cannot be invertible. 6. For all m × n matrices A and B , nullity(A + B) = nullity(A)+ nullity(B). 7. For all n × n matrices A and B , nullity(AB) = nullity(A)· nullity(B). 8. For all n × n matrices A and B , nullity(AB) ≥ nullity(B). 9. If xp is a solution to the linear system Ax = b, then y + xp is also a solution for any y in nullspace(A). Problems For Problems 1–4, determine the null space of A and verify the Rank-Nullity Theorem. 1. A = 1 0 −6 −1 . 2 −1 . −4 2 1 1 −1 3. A = 3 4 4 . 11 0 1 4 −1 3 4. A = 2 9 −1 7 . 2 8 −2 6 2. A = For Problems 5–8, determine the nullity of A “by inspection” by appealing to the Rank-Nullity Theorem. Avoid computations. 2 −3 0 0 . 5. A = −4 6 22 −33 1 3 −3 2 5 −4 −12 12 −8 −20 . 6. A = 0 000 0 1 3 −3 2 6 010 0 1 0 7. A = 0 0 1 . 001 8. A = 0 0 0 −2 . i i i i i i i “main” 2007/2/16 page 312 i 312 CHAPTER 4 Vector Spaces For Problems 9–12, determine the solution set to Ax = b, and show that all solutions are of the form (4.9.3). 1 3 −1 4 9. A = 2 7 9 , b = 11 . 1 5 21 10 2 −1 1 4 5 10. A = 1 −1 2 3 , b = 6 . 1 −2 5 5 13 1 1 −2 −3 3 −1 −7 2 11. A = 1 1 1 , b = 0 . 2 2 −4 −6 1 1 −1 5 0 12. A = 0 2 −1 7 , b = 0 . 4 2 −3 13 0 14. Show that a 6 × 4 matrix A with nullity(A) = 0 must have rowspace(A) = R4 . Is colspace(A) = R4 ? 15. Prove that if rowspace(A) = nullspace(A), then A contains an even number of columns. 16. Show that a 5×7 matrix A must have 2 ≤ nullity(A) ≤ 7. Give an example of a 5 × 7 matrix A with nullity(A) = 2 and an example of a 5 × 7 matrix A with nullity(A) = 7. 17. Show that 3 × 8 matrix A must have 5 ≤ nullity(A) ≤ 8. Give an example of a 3 × 8 matrix A with nullity(A) = 5 and an example of a 3 × 8 matrix A with nullity(A) = 8. 18. Prove that if A and B are n × n matrices and A is invertible, then 13. Show that a 3 × 7 matrix A with nullity(A) = 4 must have colspace(A) = R3 . Is rowspace(A) = R3 ? 4.10 nullity(AB) = nullity(B). [Hint: B x = 0 if and only if AB x = 0.] The Invertible Matrix Theorem II In Section 2.8, we gave a list of characterizations of invertible matrices (Theorem 2.8.1). In view of the concepts introduced in this chapter, we are now in a position to add to the list that was begun there. Theorem 4.10.1 (Invertible Matrix Theorem) Let A be an n×n matrix with real elements. The following conditions on A are equivalent: (a) A is invertible. (h) nullity(A) = 0. (i) nullspace(A) = {0}. (j) The columns of A form a linearly independent set of vectors in Rn . (k) colspace(A) = Rn (that is, the columns of A span Rn ). (l) The columns of A form a basis for Rn . (m) The rows of A form a linearly independent set of vectors in Rn . (n) rowspace(A) = Rn (that is, the rows of A span Rn ). (o) The rows of A form a basis for Rn . (p) AT is invertible. Proof The equivalence of (a) and (h) follows at once from Theorem 2.8.1(d) and the Rank-Nullity Theorem (Theorem 4.9.1). The equivalence of (h) and (i) is immediately clear. The equivalence of (a) and (j) is immediate from Theorem 2.8.1(c) and Theorem 4.5.14. Since the dimension of colspace(A) is simply rank(A), the equivalence of (a) and (k) is immediate from Theorem 2.8.1(d). 
Next, from the definition of a basis, i i i i i i i “main” 2007/2/16 page 313 i 4.10 The Invertible Matrix Theorem II 313 we see that (j) and (k) are logically equivalent to (l). Moreover, since the row space and column space of A have the same dimension, (k) and (n) are equivalent. Since rowspace(A) = colspace(AT ), the equivalence of (k) and (n) proves that (a) and (p) are equivalent. Finally, the equivalence of (a) and (p) proves that (j) is equivalent to (m) and that (l) is equivalent to (o). Example 4.10.2 Do the rows of the matrix below span R4 ? −2 −2 1 3 3 3 0 −1 A= −1 −1 −2 −5 2211 Solution: We see by inspection that the columns of A are linearly dependent, since the first two columns are identical. Therefore, by the equivalence of (j) and (n) in the Invertible Matrix Theorem, the rows of A do not span R4 . Example 4.10.3 If A is an n × n matrix such that the linear system AT x = 0 has no nontrivial solution x, then nullspace(AT ) = {0}, and thus AT is invertible by the equivalence of (a) and (i) in the Invertible Matrix Theorem. Thus, by the same theorem, we can conclude that the columns of A form a linearly independent set. Despite the lengthy list of characterizations of invertible matrices that we have been able to develop so far, this list is still by no means complete. In the next chapter, we will use linear transformations and eigenvalues to provide further characterizations of invertible matrices. Exercises for 4.10 Skills • Be well familiar with all of the conditions (a)–(p) in the Invertible Matrix Theorem that characterize invertible matrices. True-False Review For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. 1. The set of all row vectors of an invertible matrix is linearly independent. 4. If A is an n × n matrix with det(A) = 0, then the columns of A must form a basis for Rn . 5. If A and B are row-equivalent n × n matrices such that rowspace(A) = Rn , then colspace(B) = Rn . 6. If E is an n × n elementary matrix and A is an n × n matrix with nullspace(A) = {0}, then det(EA) = 0. 7. If A and B are n × n invertible matrices, then nullity([A|B ]) = 0, where [A|B ] is the n × 2n matrix with the blocks A and B as shown. 8. A matrix of the form 0a0 b 0 c 0d0 2. An n × n matrix can have n linearly independent rows and n linearly dependent columns. 3. The set of all row vectors of an n × n matrix can be linearly dependent while the set of all columns is linearly independent. cannot be invertible. i i i i i i i “main” 2007/2/16 page 314 i 314 CHAPTER 4 Vector Spaces 9. A matrix of the form 0 c 0 g a 0 e 0 0 d 0 h 10. A matrix of the form abc d e f ghi b 0 f 0 such that ae − bd = 0 cannot be invertible. cannot be invertible. 4.11 Inner Product Spaces We now extend the familiar idea of a dot product for geometric vectors to an arbitrary vector space V . This enables us to associate a magnitude with each vector in V and also to define the angle between two vectors in V . The major reason that we want to do this is that, as we will see in the next section, it enables us to construct orthogonal bases in a vector space, and the use of such a basis often simplifies the representation of vectors. We begin with a brief review of the dot product. 
Let x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) be two arbitrary vectors in R3 , and consider the corresponding geometric vectors x = x1 i + x2 j + x3 k, y = y1 i + y2 j + y3 k. The dot product of x and y can be defined in terms of the components of these vectors as x · y = x1 y1 + x2 y2 + x3 y3 . (4.11.1) An equivalent geometric definition of the dot product is z (y1, y2, y3) (x1, x2, x3) x y y x Figure 4.11.1: Defining the dot product in R3 . x · y = ||x|| ||y|| cos θ, (4.11.2) where ||x||, ||y|| denote the lengths of x and y respectively, and 0 ≤ θ ≤ π is the angle between them. (See Figure 4.11.1.) Taking y = x in Equations (4.11.1) and (4.11.2) yields 2 2 2 ||x||2 = x · x = x1 + x2 + x3 , so that the length of a geometric vector is given in terms of the dot product by ||x|| = √ x·x = 2 2 2 x1 + x2 + x3 . Furthermore, from Equation (4.11.2), the angle between any two nonzero vectors x and y is cos θ = x·y , ||x|| ||y|| (4.11.3) which implies that x and y are orthogonal (perpendicular) if and only if x · y = 0. In a general vector space, we do not have a geometrical picture to guide us in defining the dot product, hence our definitions must be purely algebraic. We begin by considering the vector space Rn , since there is a natural way to extend Equation (4.11.1) in this case. Before proceeding, we note that from now on we will use the standard terms inner product and norm in place of dot product and length, respectively. i i i i i i i “main” 2007/2/16 page 315 i 4.11 Inner Product Spaces 315 DEFINITION 4.11.1 Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be vectors in Rn . We define the standard inner product in Rn , denoted x, y , by x, y = x1 y1 + x2 y2 + · · · + xn yn . The norm of x is ||x|| = Example 4.11.2 x, x = 2 2 2 x1 + x2 + · · · + xn . If x = (1, −1, 0, 2, 4) and y = (2, 1, 1, 3, 0) in R5 , then x, y = (1)(2) + (−1)(1) + (0)(1) + (2)(3) + (4)(0) = 7, √ ||x|| = 12 + (−1)2 + 02 + 22 + 42 = 22, √ ||y|| = 22 + 12 + 12 + 32 + 02 = 15. Basic Properties of the Standard Inner Product in Rn In the case of Rn , the definition of the standard inner product was a natural extension of the familiar dot product in R3 . To generalize this definition further to an arbitrary vector space, we isolate the most important properties of the standard inner product in Rn and use them as the defining criteria for a general notion of an inner product. Let us examine the inner product in Rn more closely. We view it as a mapping that associates with any two vectors x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) in Rn the real number x, y = x1 y1 + x2 y2 + · · · + xn yn . This mapping has the following properties: For all x, y, and z in Rn and all real numbers k , 1. x, x ≥ 0. Furthermore, x, x = 0 if and only if x = 0. 2. y, x = x, y . 3. k x, y = k x, y . 4. x + y, z = x, z + y, z . These properties are easily established using Definition 4.11.1. For example, to prove property 1, we proceed as follows. From Definition 4.11.1, 2 2 2 x, x = x1 + x2 + · · · + xn . Since this is a sum of squares of real numbers, it is necessarily nonnegative. Further, x, x = 0 if and only if x1 = x2 = · · · = xn = 0—that is, if and only if x = 0. Similarly, for property 2, we have y, x = y1 x1 + y2 x2 + · · · + yn xn = x1 y1 + x2 y2 + · · · + xn yn = x, y . We leave the verification of properties 3 and 4 for the reader. 
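As a quick numerical illustration, the inner product and norms of Example 4.11.2 can be computed directly from Definition 4.11.1. The sketch below, in Python with NumPy, is not part of the text.

import numpy as np

# Standard inner product and norms in R^5, as in Example 4.11.2.
x = np.array([1, -1, 0, 2, 4])
y = np.array([2,  1, 1, 3, 0])
print(np.dot(x, y))              # 7
print(np.sqrt(np.dot(x, x)))     # 4.6904... = sqrt(22)
print(np.sqrt(np.dot(y, y)))     # 3.8729... = sqrt(15)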
i i i i i i i “main” 2007/2/16 page 316 i 316 CHAPTER 4 Vector Spaces Definition of a Real Inner Product Space We now use properties 1–4 as the basic defining properties of an inner product in a real vector space. DEFINITION 4.11.3 Let V be a real vector space. A mapping that associates with each pair of vectors u and v in V a real number, denoted u, v , is called an inner product in V , provided it satisfies the following properties. For all u, v, and w in V , and all real numbers k , 1. u, u ≥ 0. Furthermore, u, u = 0 if and only if u = 0. 2. v, u = u, v . 3. k u, v = k u, v . 4. u + v, w = u, w + v, w . The norm of u is defined in terms of an inner product by ||u|| = u, u . A real vector space together with an inner product defined in it is called a real inner product space. Remarks √ 1. Observe that ||u|| = u, u takes a well-defined nonnegative real value, since property 1 of an inner product guarantees that the norm evaluates the square root of a nonnegative real number. 2. It follows from the discussion above that Rn together with the inner product defined in Definition 4.11.1 is an example of a real inner product space. One of the fundamental inner products arises in the vector space C 0 [a, b] of all real-valued functions that are continuous on the interval [a, b]. In this vector space, we define the mapping f, g by b f, g = f (x)g(x) dx, (4.11.4) a for all f and g in C 0 [a, b]. We establish that this mapping defines an inner product in C 0 [a, b] by verifying properties 1–4 of Definition 4.11.3. If f is in C 0 [a, b], then y y [f (x)]2 b f, f = [f (x)]2 dx. a a b Figure 4.11.2: f, f gives the area between the graph of y = [f (x)]2 and the x -axis, lying over the interval [a, b]. x Since the integrand, [f (x)]2 , is a nonnegative continuous function, it follows that f, f measures the area between the graph y = [f (x)]2 and the x -axis on the interval [a, b]. (See Figure 4.11.2.) Consequently, f, f ≥ 0. Furthermore, f, f = 0 if and only if there is zero area between the graph y = [f (x)]2 and the x -axis—that is, if and only if [f (x)]2 = 0 for all x in [a, b]. i i i i i i i “main” 2007/2/16 page 317 i 4.11 y f (x) a Inner Product Spaces 317 Hence, f, f = 0 if and only if f (x) = 0, for all x in [a, b], so f must be the zero function. (See Figure 4.11.3.) Consequently, property 1 of Definition 4.11.3 is satisfied. 0 0 for all x in [a,b] Now let f, g, and h be in C [a, b], and let k be an arbitrary real number. Then b g, f = x a b Figure 4.11.3: f, f = 0 if and only if f is the zero function. b g(x)f (x) dx = f (x)g(x) dx = f, g . a Hence, property 2 of Definition 4.11.3 is satisfied. For property 3, we have b kf, g = b (kf )(x)g(x) dx = a b kf (x)g(x) dx = k a f (x)g(x) dx = k f, g , a as needed. Finally, b f + g, h = a = b b (f + g)(x)h(x) dx = b f (x)h(x) dx + a [f (x) + g(x)]h(x) dx a g(x)h(x) dx = f, h + g , h , a so that property (4) of Definition 4.11.3 is satisfied. We can now conclude that Equation (4.11.4) does define an inner product in the vector space C 0 [a, b]. Example 4.11.4 Use Equation (4.11.4) to determine the inner product of the following functions in C 0 [0, 1]: f (x) = 8x, g(x) = x 2 − 1. Also find ||f || and ||g ||. Solution: From Equation (4.11.4), 1 f, g = 1 8x(x 2 − 1) dx = 2x 4 − 4x 2 0 = −2. 0 Moreover, we have 1 ||f || = 0 8 64x 2 dx = √ 3 and 1 ||g || = 1 (x 2 − 1)2 dx = 0 (x 4 − 2x 2 + 1) dx = 0 8 . 15 We have already seen that the norm concept generalizes the length of a geometric vector. 
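The integrals of Example 4.11.4 can also be checked numerically. The sketch below, in Python with NumPy, is not part of the text; it approximates the inner product (4.11.4) on [0, 1] with a midpoint Riemann sum.

import numpy as np

# Numerical check of <f, g>, ||f||, ||g|| for f(x) = 8x and g(x) = x^2 - 1 on [0, 1].
n  = 200000
dx = 1.0 / n
x  = (np.arange(n) + 0.5) * dx      # midpoints of the n subintervals
f  = 8 * x
g  = x**2 - 1
print(np.sum(f * g) * dx)           # approximately -2
print(np.sqrt(np.sum(f * f) * dx))  # approximately 4.6188 = 8/sqrt(3)
print(np.sqrt(np.sum(g * g) * dx))  # approximately 0.7303 = sqrt(8/15)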
Our next goal is to show how an inner product enables us to define the angle between two vectors in an abstract vector space. The key result is the Cauchy-Schwarz inequality established in the next theorem. Theorem 4.11.5 (Cauchy-Schwarz Inequality) Let u and v be arbitrary vectors in a real inner product space V . Then | u, v | ≤ ||u|| ||v||. (4.11.5) i i i i i i i “main” 2007/2/16 page 318 i 318 CHAPTER 4 Vector Spaces Proof Let k be an arbitrary real number. For the vector u + k v, we have 0 ≤ ||u + k v||2 = u + k v, u + k v . (4.11.6) But, using the properties of a real inner product, u + k v , u + k v = u, u + k v + k v , u + k v = u + k v, u + u + k v, k v = u, u + k v , u + u, k v + k v , k v = u, u + 2 k v , u + k v , k v = u, u + 2 k v , u + k k v , v = u, u + 2 k v , u + k 2 v , v = ||u||2 + 2k v, u + k 2 ||v||2 . Consequently, (4.11.6) implies that ||v||2 k 2 + 2 u, v k + ||u||2 ≥ 0. (4.11.7) The left-hand side of this inequality defines the quadratic expression P (k) = ||v||2 k 2 + 2 u, v k + ||u||2 . The discriminant of this quadratic is = 4( u, v )2 − 4||u||2 ||v||2 . If > 0, then P (k) has two real and distinct roots. This would imply that the graph of P crosses the k -axis and, therefore, P would assume negative values, contrary to (4.11.7). Consequently, we must have ≤ 0. That is, 4( u, v )2 − 4||u||2 ||v||2 ≤ 0, or equivalently, ( u, v )2 ≤ ||u||2 ||v||2 . Hence, | u, v | ≤ ||u|| ||v||. If u and v are arbitrary vectors in a real inner product space V , then u, v is a real number, and so (4.11.5) can be written in the equivalent form −||u|| ||v|| ≤ u, v ≤ ||u|| ||v||. Consequently, provided that u and v are nonzero vectors, we have −1 ≤ u, v ≤ 1. ||u|| ||v|| Thus, each pair of nonzero vectors in a real inner product space V determines a unique angle θ by cos θ = u, v , ||u|| ||v|| 0 ≤ θ ≤ π. (4.11.8) i i i i i i i “main” 2007/2/16 page 319 i 4.11 Inner Product Spaces 319 We call θ the angle between u and v. In the case when u and v are geometric vectors, the formula (4.11.8) coincides with Equation (4.11.3). Example 4.11.6 Determine the angle between the vectors u = (1, −1, 2, 3) and v = (−2, 1, 2, −2) in R4 . Solution: Using the standard inner product in R4 yields √ √ u, v = −5, ||u|| = 15, ||v|| = 13, so that the angle between u and v is given by √ 5 195 cos θ = − √ √ = − , 39 15 13 0 ≤ θ ≤ π. Hence, √ θ = arccos − Example 4.11.7 195 39 ≈ 1.937 radians ≈ 110◦ 58 . Use the inner product (4.11.4) to determine the angle between the functions f1 (x) = sin 2x and f2 (x) = cos 2x on the interval [−π, π ]. Solution: Using the inner product (4.11.4), we have f1 , f2 = π −π sin 2x cos 2x dx = 1 2 π π sin 4x dx = 1 (− cos 4x) 8 π −π = 0. Consequently, the angle between the two functions satisfies cos θ = 0, 0 ≤ θ ≤ π, which implies that θ = π/2. We say that the functions are orthogonal on the interval [−π, π ], relative to the inner product (4.11.4). In the next section we will have much more to say about orthogonality of vectors. Complex Inner Products9 The preceding discussion has been concerned with real vector spaces. In order to generalize the definition of an inner product to a complex vector space, we first consider the case of Cn . By analogy with Definition 4.11.1, one might think that the natural inner product in Cn would be obtained by summing the products of corresponding components of vectors in Cn in exactly the same manner as in the standard inner product for Rn . However, one reason for introducing an inner product is to obtain a concept of “length” of a vector. 
In order for a quantity to be considered a reasonable measure of length, we would want it to be a nonnegative real number that vanishes if and only if the vector itself is the zero vector (property 1 of a real inner product). But, if we apply the inner product in Rn given in Definition 4.11.1 to vectors in Cn , then, since the components of vectors in Cn are complex numbers, it follows that the resulting norm of a vector in 9 In the remainder of the text, the only complex inner product that we will require is the standard inner product in Cn , and this is needed only in Section 5.10. i i i i i i i “main” 2007/2/16 page 320 i 320 CHAPTER 4 Vector Spaces Cn would be a complex number also. Furthermore, applying the R2 inner product to, for example, the vector u = (1 − i, 1 + i), we obtain ||u||2 = (1 − i)2 + (1 + i)2 = 0, which means that a nonzero vector would have zero “length.” To rectify this situation, we must define an inner product in Cn more carefully. We take advantage of complex conjugation to do this, as the definition shows. DEFINITION 4.11.8 If u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) are vectors in Cn , we define the standard inner product in Cn by10 u, v = u1 v 1 + u2 v 2 + · · · + un v n . The norm of u is defined to be the real number ||u|| = u, u = |u1 |2 + |u2 |2 + · · · + |un |2 . The preceding inner product is a mapping that associates with the two vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) in Cn the scalar u, v = u1 v 1 + u2 v 2 + · · · + un v n . In general, u, v will be nonreal (i.e., it will have a nonzero imaginary part). The key point to notice is that the norm of u is always a real number, even though the separate components of u are complex numbers. Example 4.11.9 If u = (1 + 2i, 2 − 3i) and v = (2 − i, 3 + 4i), find u, v and ||u||. Solution: Using Definition 4.11.8, u, v = (1 + 2i)(2 + i) + (2 − 3i)(3 − 4i) = 5i − 6 − 17i = −6 − 12i, √ √ ||u|| = u, u = (1 + 2i)(1 − 2i) + (2 − 3i)(2 + 3i) = 5 + 13 = 3 2. The standard inner product in Cn satisfies properties (1), (3), and (4), but not property (2). We now derive the appropriate generalization of property (2) when using the standard inner product in Cn . Let u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) be vectors in Cn . Then, from Definition 4.11.8, v, u = v1 u1 + v2 u2 + · · · + vn un = u1 v 1 + u2 v 2 + · · · + un v n = u, v . Thus, v , u = u, v . We now use the properties satisfied by the standard inner product in Cn to define an inner product in an arbitrary (that is, real or complex) vector space. 10 Recall that if z = a + ib, then z = a − ib and |z|2 = zz = (a + ib)(a − ib) = a 2 + b2 . i i i i i i i “main” 2007/2/16 page 321 i 4.11 Inner Product Spaces 321 DEFINITION 4.11.10 Let V be a (real or complex) vector space. A mapping that associates with each pair of vectors u, v in V a scalar, denoted u, v , is called an inner product in V , provided it satisfies the following properties. For all u, v and w in V and all (real or complex) scalars k , 1. u, u ≥ 0. Furthermore, u, u = 0 if and only if u = 0. 2. v, u = u, v . 3. k u, v = k u, v . 4. u + v, w = u, w + v, w . The norm of u is defined in terms of the inner product by ||u|| = u, u . Remark Notice that the properties in the preceding definition reduce to those in Definition 4.11.3 in the case that V is a real vector space, since in such a case the complex conjugates are unnecessary. Thus, this definition is a consistent extension of Definition 4.11.3. 
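A short numerical check of Example 4.11.9, written in Python with NumPy and not part of the text, shows the role of the complex conjugate in Definition 4.11.8: the inner product itself may be nonreal, but the norm is always a nonnegative real number.

import numpy as np

# Standard inner product in C^2 (Definition 4.11.8) for Example 4.11.9.
u = np.array([1 + 2j, 2 - 3j])
v = np.array([2 - 1j, 3 + 4j])
print(np.sum(u * np.conj(v)))                 # (-6-12j)
print(np.sqrt(np.sum(u * np.conj(u)).real))   # 4.2426... = 3*sqrt(2)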
Example 4.11.11 Use properties 2 and 3 of Definition 4.11.10 to prove that in an inner product space u, k v = k u, v for all vectors u, v and all scalars k . Solution: From properties 2 and 3, we have u, k v = k v , u = k v , u = k v , u = k u, v . Notice that in the particular case of a real vector space, the foregoing result reduces to u, k v = k u, v , since in such a case the scalars are real numbers. Exercises for 4.11 Key Terms Inner product, Axioms of an inner product, Real (complex) inner product space, Norm, Angle, Cauchy-Schwarz inequality. Skills • Know the four inner product space axioms. • Be able to check whether or not a proposed inner product on a vector space V satisfies the inner product space axioms. • Be able to compute the inner product of two vectors in an inner product space. • Be able to find the norm of a vector in an inner product space. • Be able to find the angle between two vectors in an inner product space. True-False Review For Questions 1–7, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. i i i i i i i “main” 2007/2/16 page 322 i 322 CHAPTER 4 Vector Spaces 1. If v and w are linearly independent vectors in an inner product space V , then v, w = 0. 2. In any inner product space V , we have For Problems 6–7, use the inner product (4.11.9) to determine A, B , ||A||, and ||B ||. 6. A = 2 −1 ,B = 35 31 . −1 2 7. A = 32 ,B = −2 4 11 . −2 1 k v, k w = k v, w . 3. If v1 , w = v2 , w = 0 in an inner product space V , then c1 v1 + c2 v2 , w = 0. 4. In any inner product space V , x + y, x − y < 0 if and only if ||x|| < ||y||. 5. In any vector space V , there is at most one valid inner product , that can be defined on V . 6. The angle between the vectors v and w in an inner product space V is the same as the angle between the vectors −2v and −2w. 7. If p(x) = a0 +a1 x +a2 x 2 and q(x) = b0 +b1 x +b2 x 2 , then we can define an inner product on P2 via p, q = a0 b0 . Problems 1. Use the standard inner product in R4 to determine the angle between the vectors v = (1, 3, −1, 4) and w = (−1, 1, −2, 1). 2. If f (x) = sin x and g(x) = x on [0, π ], use the function inner product defined in the text to determine the angle between f and g . 3. If v = (2 + i, 3 − 2i, 4 + i) and w = (−1 + i, 1 − 3i, 3 − i), use the standard inner product in C3 to determine, v, w , ||v||, and ||w||. 4. Let A= a11 a12 , a21 a22 B= b11 b12 b21 b22 be vectors in M2 (R). Show that the mapping A, B = a11 b11 + a12 b12 + a21 b21 + a22 b22 (4.11.9) defines an inner product in M2 (R). 5. Referring to A and B in the previous problem, show that the mapping A, B = a11 b22 + a12 b21 + a21 b12 + a22 b11 does not define a valid inner product on M2 (R). 8. Let p1 (x) = a + bx and p2 (x) = c + dx be vectors in P1 . Determine a mapping p1 , p2 that defines an inner product on P1 . Consider the vector space R2 . Define the mapping , by v, w = 2v1 w1 + v1 w2 + v2 w1 + 2v2 w2 (4.11.10) for all vectors v = (v1 , v2 ) and w = (w1 , w2 ) in R2 . This mapping is required for Problems 9–12. 9. Verify that Equation (4.11.10) defines an inner product on R2 . For Problems 10–12, determine the inner product of the given vectors using (a) the inner product (4.11.10), (b) the standard inner product in R2 . 10. v = (1, 0), w = (−1, 2). 11. v = (2, −1), w = (3, 6). 12. v = (1, −2), w = (2, 1). 13. 
Consider the vector space R2 . Define the mapping , by v, w = v1 w1 − v2 w2 , (4.11.11) for all vectors v = (v1 , v2 ) and w = (w1 , w2 ). Verify that all of the properties in Definition 4.11.3 except (1) are satisfied by (4.11.11). The mapping (4.11.11) is called a pseudo-inner product in R2 and, when generalized to R4 , is of fundamental importance in Einstein’s special relativity theory. 14. Using Equation (4.11.11), determine all nonzero vectors satisfying v, v = 0. Such vectors are called null vectors. 15. Using Equation (4.11.11), determine all vectors satisfying v, v < 0. Such vectors are called timelike vectors. i i i i i i i “main” 2007/2/16 page 323 i 4.12 Orthogonal Sets of Vectors and the Gram-Schmidt Process [Hint: ||v + w||2 = v + w, v + w .] 16. Using Equation (4.11.11), determine all vectors satisfying v, v > 0. Such vectors are called spacelike vectors. (b) Two vectors v and w in an inner product space V are called orthogonal if v, w = 0. Use (a) to prove the general Pythagorean theorem: If v and w are orthogonal in an inner product space V , then 17. Make a sketch of R2 and indicate the position of the null, timelike, and spacelike vectors. 18. Consider the vector space Rn , and let v = (v1 , v2 , . . . , vn ) and w = (w1 , w2 , . . . , wn ) be vectors in Rn . Show that the mapping , defined by ||v + w||2 = ||v||2 + ||w||2 . v, w = k1 v1 w1 + k2 v2 w2 + · · · + kn vn wn (c) Prove that for all v, w in V , (i) ||v + w||2 − ||v − w||2 = 4 v, w . is a valid inner product on Rn if and only if the constants k1 , k2 , . . . , kn are all positive. 19. Prove from the inner product axioms that, in any inner product space V , v, 0 = 0 for all v in V . 323 (ii) ||v + w||2 + ||v − w||2 = 2(||v||2 + ||w||2 ). 21. Let V be a complex inner product space. Prove that for all v, w in V , 20. Let V be a real inner product space. ||v + w||2 = ||v||2 + 2Re( v, w ) + ||v||2 , (a) Prove that for all v, w ∈ V , ||v + w||2 = ||v||2 + 2 v, w + ||w||2 . 4.12 where Re denotes the real part of a complex number. Orthogonal Sets of Vectors and the Gram-Schmidt Process The discussion in the previous section has shown how an inner product can be used to define the angle between two nonzero vectors. In particular, if the inner product of two nonzero vectors is zero, then the angle between those two vectors is π/2 radians, and therefore it is natural to call such vectors orthogonal (perpendicular). The following definition extends the idea of orthogonality into an arbitrary inner product space. DEFINITION 4.12.1 Let V be an inner product space. 1. Two vectors u and v in V are said to be orthogonal if u, v = 0. 2. A set of nonzero vectors {v1 , v2 , . . . , vk } in V is called an orthogonal set of vectors if whenever i = j. vi , vj = 0, (That is, every vector is orthogonal to every other vector in the set.) 3. A vector v in V is called a unit vector if ||v|| = 1. 4. An orthogonal set of unit vectors is called an orthonormal set of vectors. Thus, {v1 , v2 , . . . , vk } in V is an orthonormal set if and only if (a) vi , vj = 0 whenever i = j . (b) vi , vi = 1 for all i = 1, 2, . . . , k . i i i i i i i “main” 2007/2/16 page 324 i 324 CHAPTER 4 Vector Spaces Remarks 1. The conditions in (4a) and (4b) can be written compactly in terms of the Kronecker delta symbol as vi , vj = δij , i, j = 1, 2, . . . , k. 2. Note that the inner products occurring in Definition 4.12.1 will depend upon which inner product space we are working in. 1 3. 
If v is any nonzero vector, then v is a unit vector, since the properties of an ||v|| inner product imply that 1 1 1 1 v, v = ||v||2 = 1. v, v= 2 ||v|| ||v|| ||v|| ||v||2 Using Remark 3 above, we can take an orthogonal set of vectors {v1 , v2 , . . . , vk } 1 and create a new set {u1 , u2 , . . . , uk }, where ui = vi is a unit vector for each i . ||vi || Using the properties of an inner product, it is easy to see that the new set {u1 , u2 , . . . , uk } is an orthonormal set (see Problem 31). The process of replacing the vi by the ui is called normalization. Example 4.12.2 Verify that {(−2, 1, 3, 0), (0, −3, 1, −6), (−2, −4, 0, 2)} is an orthogonal set of vectors in R4 , and use it to construct an orthonormal set of vectors in R4 . Solution: Then Let v1 = (−2, 1, 3, 0), v2 = (0, −3, 1, −6), and v3 = (−2, −4, 0, 2). v1 , v2 = 0, v1 , v3 = 0, v2 , v3 = 0, so that the given set of vectors is an orthogonal set. Dividing each vector in the set by its norm yields the following orthonormal set: 1 1 1 √ v1 , √ v2 , √ v3 . 46 26 14 Example 4.12.3 Verify that the functions f1 (x) = 1, f2 (x) = sin x , and f3 (x) = cos x are orthogonal in C 0 [−π, π ], and use them to construct an orthonormal set of functions in C 0 [−π, π ]. Solution: In this case, we have f1 , f2 = f2 , f3 = π −π π −π sin x dx = 0, sin x cos x dx = f1 , f3 = 12 sin x 2 π −π cos x dx = 0, π −π = 0, so that the functions are indeed orthogonal on [−π, π ]. Taking the norm of each function, we obtain ||f1 || = ||f2 || = ||f3 || = π −π π −π π −π 1 dx = √ 2π , sin2 x dx = cos2 x dx = π −π π −π √ 1 (1 − cos 2x) dx = π , 2 √ 1 (1 + cos 2x) dx = π . 2 i i i i i i i “main” 2007/2/16 page 325 i 4.12 Orthogonal Sets of Vectors and the Gram-Schmidt Process 325 Thus an orthonormal set of functions on [−π, π ] is 1 1 1 √ , √ sin x, √ cos x . π π 2π Orthogonal and Orthonormal Bases In the analysis of geometric vectors in elementary calculus courses, it is usual to use the standard basis {i, j, k}. Notice that this set of vectors is in fact an orthonormal set. The introduction of an inner product in a vector space opens up the possibility of using similar bases in a general finite-dimensional vector space. The next definition introduces the appropriate terminology. DEFINITION 4.12.4 A basis {v1 , v2 , . . . , vn } for a (finite-dimensional) inner product space is called an orthogonal basis if vi , vj = 0 whenever i = j, and it is called an orthonormal basis if vi , vj = δij , i, j = 1, 2, . . . , n. There are two natural questions at this point: (1) How can we obtain an orthogonal or orthonormal basis for an inner product space V ? (2) Why is it beneficial to work with an orthogonal or orthonormal basis of vectors? We address the second question first. In light of our work in previous sections of this chapter, the importance of our next theorem should be self-evident. Theorem 4.12.5 If {v1 , v2 , . . . , vk } is an orthogonal set of nonzero vectors in an inner product space V , then {v1 , v2 , . . . , vk } is linearly independent. Proof Assume that c1 v1 + c2 v2 + · · · + ck vk = 0. (4.12.1) We will show that c1 = c2 = · · · = ck = 0. Taking the inner product of each side of (4.12.1) with vi , we find that c1 v1 + c2 v2 + · · · + ck vk , vi = 0, vi = 0. Using the inner product properties on the left side, we have c1 v1 , vi + c2 v2 , vi + · · · + ck vk , vi = 0. Finally, using the fact that for all j = i , we have vj , vi = 0, we conclude that ci vi , vi = 0. Since vi = 0, it follows that ci = 0, and this holds for each i with 1 ≤ i ≤ k . 
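To make the normalization remark and the orthogonality checks above concrete, here is a minimal numerical sketch, assuming NumPy and the standard inner product in R4; the variable names are illustrative only. It verifies that the vectors of Example 4.12.2 are mutually orthogonal and then normalizes them into an orthonormal set.

```python
import numpy as np

# Vectors from Example 4.12.2, with the standard inner product on R^4.
v1 = np.array([-2.0, 1.0, 3.0, 0.0])
v2 = np.array([0.0, -3.0, 1.0, -6.0])
v3 = np.array([-2.0, -4.0, 0.0, 2.0])
vectors = [v1, v2, v3]

# Pairwise orthogonality: <v_i, v_j> = 0 whenever i != j.
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        assert np.isclose(np.dot(vectors[i], vectors[j]), 0.0)

# Normalization: u_i = v_i / ||v_i|| gives an orthonormal set (see Problem 31).
orthonormal = [v / np.linalg.norm(v) for v in vectors]
for u in orthonormal:
    assert np.isclose(np.dot(u, u), 1.0)

print(orthonormal[0])  # v1 / sqrt(14), since ||v1||^2 = 4 + 1 + 9 + 0 = 14
```

The same pattern, with the integral inner product approximated by a quadrature rule, would apply to function examples such as Example 4.12.3.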
Example 4.12.6 Let V = M2 (R), let W be the subspace of all 2 × 2 symmetric matrices, and let S= 2 −1 11 22 , , −1 0 12 2 −3 . i i i i i i i “main” 2007/2/16 page 326 i 326 CHAPTER 4 Vector Spaces Define an inner product on V via11 a11 a12 bb , 11 12 a21 a22 b21 b22 = a11 b11 + a12 b12 + a21 b21 + a22 b22 . Show that S is an orthogonal basis for W . Solution: According to Example 4.6.18, we already know that dim[W ] = 3. Using the given inner product, it can be directly shown that S is an orthogonal set, and hence, Theorem 4.12.5 implies that S is linearly independent. Therefore, by Theorem 4.6.10, S is a basis for W . Let V be a (finite-dimensional) inner product space, and suppose that we have an orthogonal basis {v1 , v2 , . . . , vn } for V . As we saw in Section 4.7, any vector v in V can be written uniquely in the form v = c1 v1 + c2 v2 + · · · + cn vn , (4.12.2) where the unique n-tuple (c1 , c2 , . . . , cn ) consists of the components of v relative to the given basis. It is easier to determine the components ci in the case of an orthogonal basis than it is for other bases, because we can simply form the inner product of both sides of (4.12.2) with vi as follows: v, vi = c1 v1 + c2 v2 + · · · + cn vn , vi = c1 v1 , vi + c2 v2 , vi + · · · + cn vn , vi = ci ||vi ||2 , where the last step follows from the orthogonality properties of the basis {v1 , v2 , . . . , vn }. Therefore, we have proved the following theorem. Theorem 4.12.7 Let V be a (finite-dimensional) inner product space with orthogonal basis {v1 , v2 , . . . , vn }. Then any vector v ∈ V may be expressed in terms of the basis as v= v, v1 ||v1 ||2 v1 + v, v2 ||v2 ||2 v , vn ||vn ||2 v2 + · · · + vn . Theorem 4.12.7 gives a simple formula for writing an arbitrary vector in an inner product space V as a linear combination of vectors in an orthogonal basis for V . Let us illustrate with an example. Example 4.12.8 Let V , W , and S be as in Example 4.12.6. Find the components of the vector v= 0 −1 −1 2 relative to S . Solution: From the formula given in Theorem 4.12.7, we have v= 2 6 2 2 −1 + −1 0 7 10 11 − 12 21 22 , 2 −3 11 This defines a valid inner product on V by Problem 4 in Section 4.11. i i i i i i i “main” 2007/2/16 page 327 i 4.12 Orthogonal Sets of Vectors and the Gram-Schmidt Process 327 so the components of v relative to S are 12 10 , ,− . 37 21 If the orthogonal basis {v1 , v2 , . . . , vn } for V is in fact orthonormal, then since ||vi || = 1 for each i , we immediately deduce the following corollary of Theorem 4.12.7. Corollary 4.12.9 Let V be a (finite-dimensional) inner product space with an orthonormal basis {v1 , v2 , . . . , vn }. Then any vector v ∈ V may be expressed in terms of the basis as v = v, v1 v1 + v, v2 v2 + · · · + v, vn vn . Remark Corollary 4.12.9 tells us that the components of a given vector v relative to the orthonormal basis {v1 , v2 , . . . , vn } are precisely the numbers v, vi , for 1 ≤ i ≤ n. Thus, by working with an orthonormal basis for a vector space, we have a simple method for getting the components of any vector in the vector space. Example 4.12.10 We can write an arbitrary vector in Rn , v = (a1 , a2 , . . . , an ), in terms of the standard basis {e1 , e2 , . . . , en } by noting that v, ei = ai . Thus, v = a1 e1 + a2 e2 + · · · + an en . Example 4.12.11 We can equip the vector space P1 of all polynomials of degree ≤ 1 with inner product p, q = 1 −1 p(x)q(x) dx, √ thus making P1 into an inner product space. 
Verify that the vectors p0 = 1/√2 and p1 = √(3/2) x form an orthonormal basis for P1, and use Corollary 4.12.9 to write the vector q = 1 + x as a linear combination of p0 and p1.

Solution: We have

⟨p0, p1⟩ = ∫_{−1}^{1} (1/√2) · √(3/2) x dx = 0,

||p0|| = √⟨p0, p0⟩ = ( ∫_{−1}^{1} 1/2 dx )^{1/2} = √1 = 1,

||p1|| = √⟨p1, p1⟩ = ( ∫_{−1}^{1} (3/2) x² dx )^{1/2} = ( (1/2) x³ |_{−1}^{1} )^{1/2} = √1 = 1.

Thus, {p0, p1} is an orthonormal (and hence linearly independent) set of vectors in P1. Since dim[P1] = 2, Theorem 4.6.10 shows that {p0, p1} is an (orthonormal) basis for P1.

Finally, we wish to write q = 1 + x as a linear combination of p0 and p1 by using Corollary 4.12.9. We leave it to the reader to verify that ⟨q, p0⟩ = √2 and ⟨q, p1⟩ = √(2/3). Thus, we have

1 + x = √2 p0 + √(2/3) p1 = √2 · (1/√2) + √(2/3) · √(3/2) x.

So the component vector of 1 + x relative to {p0, p1} is (√2, √(2/3))^T.

The Gram-Schmidt Process

Next, we return to the first question we raised earlier: How can we obtain an orthogonal or orthonormal basis for an inner product space V? The idea behind the process is to begin with any basis for V, say {x1, x2, . . . , xn}, and to successively replace these vectors with vectors v1, v2, . . . , vn that are orthogonal to one another, ensuring that, throughout the process, the span of the vectors remains unchanged. This is known as the Gram-Schmidt process. To describe it, we once more appeal to geometric vectors. If v and w are any two linearly independent (noncollinear) geometric vectors, then the orthogonal projection of w on v is the vector P(w, v) shown in Figure 4.12.1. We see from the figure that an orthogonal basis for the subspace (plane) of 3-space spanned by v and w is {v1, v2}, where

v1 = v and v2 = w − P(w, v).

[Figure 4.12.1: Obtaining an orthogonal basis for a two-dimensional subspace of R3; the figure shows v, w, the projection P(w, v) along v, and the orthogonal pair v1 = v, v2 = w − P(w, v).]

In order to generalize this result to an arbitrary inner product space, we need to derive an expression for P(w, v) in terms of the dot product. We see from Figure 4.12.1 that the norm of P(w, v) is

||P(w, v)|| = ||w|| cos θ,

where θ is the angle between v and w. Thus P(w, v) = (||w|| cos θ) v/||v||, which we can write as

P(w, v) = (||w|| ||v|| cos θ / ||v||²) v.   (4.12.3)

Recalling that the dot product of the vectors w and v is defined by w · v = ||w|| ||v|| cos θ, it follows from Equation (4.12.3) that

P(w, v) = ((w · v)/||v||²) v,

or equivalently, using the notation for the inner product introduced in the previous section,

P(w, v) = (⟨w, v⟩/||v||²) v.

Now let x1 and x2 be linearly independent vectors in an arbitrary inner product space V. We show next that the foregoing formula can also be applied in V to obtain an orthogonal basis {v1, v2} for the subspace of V spanned by {x1, x2}. Let

v1 = x1 and v2 = x2 − P(x2, v1) = x2 − (⟨x2, v1⟩/||v1||²) v1.   (4.12.4)

Note from (4.12.4) that v2 can be written as a linear combination of x1 and x2, and hence v2 ∈ span{x1, x2}. Since we also have x2 ∈ span{v1, v2}, it follows that span{v1, v2} = span{x1, x2}. Next we claim that v2 is orthogonal to v1. We have

⟨v2, v1⟩ = ⟨x2 − (⟨x2, v1⟩/||v1||²) v1, v1⟩ = ⟨x2, v1⟩ − (⟨x2, v1⟩/||v1||²) ⟨v1, v1⟩ = 0,

which verifies our claim. We have shown that {v1, v2} is an orthogonal set of vectors which spans the same subspace of V as x1 and x2.
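The projection formula just derived translates directly into a short computation. The following sketch (assuming NumPy and the standard dot-product inner product on Rn; the function name proj is our own) implements P(w, v) = ⟨w, v⟩ v / ||v||² and the two-vector construction of Equation (4.12.4). The input vectors are taken from Example 4.12.14 below, so the result can be compared with that example.

```python
import numpy as np

def proj(w, v):
    """Orthogonal projection P(w, v) = (<w, v> / ||v||^2) v under the dot product."""
    return (np.dot(w, v) / np.dot(v, v)) * v

# Linearly independent vectors x1, x2 (the first two vectors of Example 4.12.14).
x1 = np.array([1.0, 0.0, 1.0, 0.0])
x2 = np.array([1.0, 1.0, 1.0, 1.0])

# Equation (4.12.4): v1 = x1 and v2 = x2 - P(x2, v1).
v1 = x1
v2 = x2 - proj(x2, v1)

# v2 is orthogonal to v1, and span{v1, v2} = span{x1, x2}.
assert np.isclose(np.dot(v1, v2), 0.0)
print(v2)  # (0, 1, 0, 1), as in Example 4.12.14
```

Iterating this step, subtracting from each new vector its projections onto all previously constructed ones, is exactly the Gram-Schmidt procedure stated in Theorem 4.12.13 below.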
The calculations just presented can be generalized to prove the following useful result (see Problem 32). Lemma 4.12.12 Let {v1 , v2 , . . . , vk } be an orthogonal set of vectors in an inner product space V . If x ∈ V , then the vector x − P(x, v1 ) − P(x, v2 ) − · · · − P(x, vk ) is orthogonal to vi for each i . Now suppose we are given a linearly independent set of vectors {x1 , x2 , . . . , xm } in an inner product space V . Using Lemma 4.12.12, we can construct an orthogonal basis for the subspace of V spanned by these vectors. We begin with the vector v1 = x1 as above, and we define vi by subtracting off appropriate projections of xi on v1 , v2 , . . . , vi −1 . The resulting procedure is called the Gram-Schmidt orthogonalization procedure. The formal statement of the result is as follows. Theorem 4.12.13 (Gram-Schmidt Process) Let {x1 , x2 , . . . , xm } be a linearly independent set of vectors in an inner product space V . Then an orthogonal basis for the subspace of V spanned by these vectors is {v1 , v2 , . . . , vm }, where v1 = x1 x2 , v1 v1 ||v1 ||2 x3 , v1 x3 , v2 v3 = x3 − v1 − v2 ||v1 ||2 ||v2 ||2 . . . v2 = x2 − i −1 vi = xi − k =1 xi , vk vk ||vk ||2 . . . m−1 vm = xm − k =1 xm , vk vk . ||vk ||2 Proof Lemma 4.12.12 shows that {v1 , v2 , . . . , vm } is an orthogonal set of vectors. Thus, both {v1 , v2 , . . . , vm } and {x1 , x2 , . . . , xm } are linearly independent sets, and hence span{v1 , v2 , . . . , vm } and span{x1 , x2 , . . . , xm } i i i i i i i “main” 2007/2/16 page 330 i 330 CHAPTER 4 Vector Spaces are m-dimensional subspaces of V . (Why?) Moreover, from the formulas given in Theorem 4.12.13, we see that each xi ∈ span{v1 , v2 , . . . , vm }, and so span{x1 , x2 , . . . , xm } is a subset of span{v1 , v2 , . . . , vm }. Thus, by Corollary 4.6.14, span{v1 , v2 , . . . , vm } = span{x1 , x2 , . . . , xm }. We conclude that {v1 , v2 , . . . , vm } is a basis for the subspace of V spanned by x1 , x2 , . . . , xm . Example 4.12.14 Obtain an orthogonal basis for the subspace of R4 spanned by x1 = (1, 0, 1, 0), Solution: x2 = (1, 1, 1, 1), x3 = (−1, 2, 0, 1). Following the Gram-Schmidt process, we set v1 = x1 = (1, 0, 1, 0). Next, we have v2 = x2 − x2 , v1 2 v = (1, 1, 1, 1) − (1, 0, 1, 0) = (0, 1, 0, 1) 21 2 ||v1 || and x3 , v1 x3 , v2 v1 − v2 ||v1 ||2 ||v2 ||2 1 3 = (−1, 2, 0, 1) + (1, 0, 1, 0) − (0, 1, 0, 1) 2 2 111 1 = − , , ,− . 222 2 v3 = x3 − The orthogonal basis so obtained is 111 1 (1, 0, 1, 0), (0, 1, 0, 1), − , , , − 222 2 . Of course, once an orthogonal basis {v1 , v2 , . . . , vm } is obtained for a subspace vi of V , we can normalize this basis by setting ui = to obtain an orthonormal ||vi || basis {u1 , u2 , . . . , um }. For instance, an orthonormal basis for the subspace of R4 in the preceding example is 1 1 111 1 1 1 √ , 0, √ , 0 , 0, √ , 0, √ , − , , , − 222 2 2 2 2 2 Example 4.12.15 . Determine an orthogonal basis for the subspace of C 0 [−1, 1] spanned by the functions f1 (x) = x , f2 (x) = x 3 , f3 (x) = x 5 , using the same inner product introduced in the previous section. Solution: In this case, we let {g1 , g2 , g3 } denote the orthogonal basis, and we apply the Gram-Schmidt process. Thus, g1 (x) = x , and g2 (x) = f2 (x) − f2 , g1 g1 (x). ||g1 ||2 (4.12.5) i i i i i i i “main” 2007/2/16 page 331 i 4.12 Orthogonal Sets of Vectors and the Gram-Schmidt Process We have f2 , g1 = 1 −1 1 f2 (x)g1 (x) dx = 1 ||g1 ||2 = g1 , g1 = −1 −1 x 4 dx = 2 5 331 and x 2 dx = 2 . 
3 Substituting into Equation (4.12.5) yields g2 (x) = x 3 − 3 x = 1 x(5x 2 − 3). 5 5 We now compute g3 (x). According to the Gram-Schmidt process, g3 (x) = f3 (x) − f3 , g1 f3 , g2 g1 (x) − g2 (x). ||g1 ||2 ||g2 ||2 (4.12.6) We first evaluate the required inner products: f3 , g1 = f3 , g2 = ||g2 ||2 = = 1 −1 1 −1 1 f3 (x)g1 (x) dx = f3 (x)g2 (x) dx = [g2 (x)]2 dx = −1 11 6 25 −1 (25x 1 −1 1 5 x 6 dx = 2 , 7 1 −1 x 6 (5x 2 − 3) dx = 11 2 2 25 −1 x (5x 1 5 10 9 − 6 7 = 16 315 , − 3)2 dx − 30x 4 + 9x 2 ) dx = 8 175 . Substituting into Equation (4.12.6) yields g3 (x) = x 5 − 3 x − 2 x(5x 2 − 3) = 7 9 1 5 63 (63x − 70x 3 + 15x). Thus, an orthogonal basis for the subspace of C 0 [−1, 1] spanned by f1 , f2 , and f3 is 1 x , 1 x(5x 2 − 3), 63 x(63x 4 − 70x 2 + 15) . 5 Exercises for 4.12 Key Terms Orthogonal vectors, Orthogonal set, Unit vector, Orthonormal vectors, Orthonormal set, Normalization, Orthogonal basis, Orthonormal basis, Gram-Schmidt process, Orthogonal projection. Skills • Be able to determine whether a given set of vectors are orthogonal and/or orthonormal. • Be able to determine whether a given set of vectors forms an orthogonal and/or orthonormal basis for an inner product space. • Be able to replace an orthogonal set with an orthonormal set via normalization. • Be able to readily compute the components of a vector v in an inner product space V relative to an orthogonal (or orthonormal) basis for V . • Be able to compute the orthogonal projection of one vector w along another vector v: P(w, v). • Be able to carry out the Gram-Schmidt process to replace a basis for V with an orthogonal (or orthonormal) basis for V . i i i i i i i “main” 2007/2/16 page 332 i 332 CHAPTER 4 Vector Spaces True-False Review For Questions 1–7, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. For Problems 6–7, show that the given set of vectors is an orthogonal set in Cn , and hence obtain an orthonormal set of vectors in Cn in each case. 6. {(1 − i, 3 + 2i), (2 + 3i, 1 − i)}. 7. {(1 − i, 1 + i, i), (0, i, 1 − i), (−3 + 3i, 2 + 2i, 2i)}. 8. Consider the vectors v = (1 − i, 1 + 2i), w = (2 + i, z) in C2 . Determine the complex number z such that {v, w} is an orthogonal set of vectors, and hence obtain an orthonormal set of vectors in C2 . 1. Every orthonormal basis for an inner product space V is also an orthogonal basis for V . 2. Every linearly independent set of vectors in an inner product space V is orthogonal. π 0 3. With the inner product f, g = f (t)g(t) dt , the functions f (x) = cos x and g(x) = sin x are an orthogonal basis for span{cos x, sin x }. For Problems 9–10, show that the given functions in C 0 [−1, 1] are orthogonal, and use them to construct an orthonormal set of functions in C 0 [−1, 1]. 9. f1 (x) = 1, f2 (x) = sin πx, f3 (x) = cos πx . 4. The Gram-Schmidt process applied to the vectors {x1 , x2 , x3 } yields the same basis as the Gram-Schmidt process applied to the vectors {x3 , x2 , x1 }. 1 10. f1 (x) = 1, f2 (x) = x, f3 (x) = 2 (3x 2 − 1). These are the Legendre polynomials that arise as solutions of the Legendre differential equation 5. In expressing the vector v as a linear combination of the orthogonal basis {v1 , v2 , . . . , vn } for an inner product space V , the coefficient of vi is ci = v , vi . 
||vi ||2 (1 − x 2 )y − 2xy + n(n + 1)y = 0, when n = 0, 1, 2, respectively. For Problems 11–12, show that the given functions are orthonormal on [−1, 1]. 6. If u and v are orthogonal vectors and w is any vector, then P(P(w, v), u) = 0. 7. If w1 , w2 , and v are vectors in an inner product space V , then P(w1 + w2 , v) = P(w1 , v) + P(w2 , v). Problems For Problems 1–4, determine whether the given set of vectors is an orthogonal set in Rn . For those that are, determine a corresponding orthonormal set of vectors. 1. {(2, −1, 1), (1, 1, −1), (0, 1, 1)}. 2. {(1, 3, −1, 1), (−1, 1, 1, −1), (1, 0, 2, 1)} 3. {(1, 2, −1, 0), (1, 0, 1, 2), (−1, 1, 1, 0), (1, −1, −1, 0)}. 4. {(1, 2, −1, 0, 3), (1, 1, 0, 2, −1), (4, 2, −4, −5, −4)} 5. Let v1 = (1, 2, 3), v2 = (1, 1, −1). Determine all nonzero vectors w such that {v1 , v2 , w} is an orthogonal set. Hence obtain an orthonormal set of vectors in R3 . 11. f1 (x) = sin πx, f2 (x) = sin 2πx, f3 (x) = sin 3πx . [Hint: The trigonometric identity 1 sin a sin b = 2 [cos(a + b) − cos(a − b)] will be useful.] 12. f1 (x) = cos 3πx . cos πx, f2 (x) = cos 2πx, f3 (x) = 13. Let A1 = 11 , A2 = −1 2 A3 = −1 1 , and 21 −1 −3 . 02 Use the inner product A, B = a11 b11 + a12 b12 + a21 b21 + a22 b22 to find all matrices A4 = ab cd such that {A1 , A2 , A3 , A4 } is an orthogonal set of matrices in M2 (R). i i i i i i i “main” 2007/2/16 page 333 i 4.12 Orthogonal Sets of Vectors and the Gram-Schmidt Process For Problems 14–19, use the Gram-Schmidt process to determine an orthonormal basis for the subspace of Rn spanned by the given set of vectors. 14. {(1, −1, −1), (2, 1, −1)}. 333 On Pn , define the inner product p1 , p2 by p1 , p2 = a0 b0 + a1 b1 + · · · + an bn for all polynomials 15. {(2, 1, −2), (1, 3, −1)}. p1 (x) = a0 + a1 x + · · · + an x n , 16. {(−1, 1, 1, 1), (1, 2, 1, 2)}. p2 (x) = b0 + b1 x + · · · + bn x n . 17. {(1, 0, −1, 0), (1, 1, −1, 0), (−1, 1, 0, 1)} 18. {(1, 2, 0, 1), (2, 1, 1, 0), (1, 0, 2, 1)}. 19. {(1, 1, −1, 0), (−1, 0, 1, 1), (2, −1, 2, 1)}. 20. If 3 14 A = 1 −2 1 , 1 52 determine an orthogonal basis for rowspace(A). For Problems 21–22, determine an orthonormal basis for the subspace of C3 spanned by the given set of vectors. Make sure that you use the appropriate inner product in C3 . 21. {(1 − i, 0, i), (1, 1 + i, 0)}. 22. {(1 + i, i, 2 − i), (1 + 2i, 1 − i, i)}. For Problems 23–25, determine an orthogonal basis for the subspace of C 0 [a, b] spanned by the given vectors, for the given interval [a, b]. 23. f1 (x) = 1, f2 (x) = x, f3 (x) = x 2 , a = 0, b = 1. 24. f1 (x) = 1, f2 (x) = x 2 , f3 (x) = x 4 , a = −1, b = 1. f3 (x) = cos x, On M2 (R) define the inner product A, B by A, B = 5a11 b11 + 2a12 b12 + 3a21 b21 + 5a22 b22 for all matrices A = [aij ] and B = [bij ]. For Problems 26– 27, use this inner product in the Gram-Schmidt procedure to determine an orthogonal basis for the subspace of M2 (R) spanned by the given matrices. 26. A1 = 1 −1 , A2 = 21 27. A1 = 01 , A2 = 10 30. Let {u1 , u2 , v} be linearly independent vectors in an inner product space V , and suppose that u1 and u2 are orthogonal. Define the vector u3 in V by u3 = v + λu1 + µu2 , where λ, µ are scalars. Derive the values of λ and µ such that {u1 , u2 , u3 } is an orthogonal basis for the subspace of V spanned by {u1 , u2 , v}. 31. Prove that if {v1 , v2 , . . . , vk } is an orthogonal set of 1 vi vectors in an inner product space V and if ui = ||vi || for each i , then {u1 , u2 , . . . , uk } form an orthonormal set of vectors. 32. Prove Lemma 4.12.12. 
Let V be an inner product space, and let W be a subspace of V . Set W ⊥ = {v ∈ V : v, w = 0 for all w ∈ W }. The set W ⊥ is called the orthogonal complement of W in V . Problems 33–38 explore this concept in some detail. Deeper applications can be found in Project 1 at the end of this chapter. 33. Prove that W ⊥ is a subspace of V . 34. Let V = R3 and let W = span{(1, 1, −1)}. 2 −3 . 41 01 , A3 = 11 28. p1 (x) = 1 − 2x + 2x 2 , p2 (x) = 2 − x − x 2 . 29. p1 (x) = 1 + x 2 , p2 (x) = 2 − x + x 3 , p3 (x) = 2x 2 − x . 25. f1 (x) = 1, f2 (x) = sin x, a = −π/2, b = π/2. For Problems 28–29, use this inner product to determine an orthogonal basis for the subspace of Pn spanned by the given polynomials. Find W ⊥ . 11 . 10 Also identify the subspace of M2 (R) spanned by {A1 , A2 , A3 }. 35. Let V = R4 and let W = span{(0, 1, −1, 3), (1, 0, 0, 3)}. Find W ⊥ . i i i i i i i “main” 2007/2/16 page 334 i 334 CHAPTER 4 Vector Spaces 36. Let V = M2 (R) and let W be the subspace of 2 × 2 symmetric matrices. Compute W ⊥ . 37. Prove that W ∩ W ⊥ = 0. (That is, W and W ⊥ have no nonzero elements in common.) [Hint: You may assume that interchange of the infinite summation with the integral is permissible.] (c) Use a similar procedure to show that 38. Prove that if W1 is a subset of W2 , then (W2 )⊥ is a subset of (W1 )⊥ . bm = 39. The subject of Fourier series is concerned with the representation of a 2π -periodic function f as the following infinite linear combination of the set of functions {1, sin nx, cos nx }∞ 1 : n= 1 f (x) = 2 a0 + ∞ n=1 (an cos nx (a) Use appropriate trigonometric identities, or some form of technology, to verify that the set of functions {1, sin nx, cos nx }∞ 1 n= is orthogonal on the interval [−π, π ]. (b) By multiplying (4.12.7) by cos mx and integrating over the interval [−π, π ], show that 1 a0 = π and am = 1 π π −π f (x) dx π −π 4.13 π −π f (x) sin mx dx. It can be shown that if f is in C 1 (−π, π), then Equation (4.12.7) holds for each x ∈ (−π, π). The series appearing on the right-hand side of (4.12.7) is called the Fourier series of f , and the constants in the summation are called the Fourier coefficients for f . + bn sin nx). (4.12.7) In this problem, we investigate the possibility of performing such a representation. 1 π (d) Show that the Fourier coefficients for the function f (x) = x, −π < x ≤ π, f (x + 2π) = f (x), are an = 0, n = 0, 1, 2, . . . , 2 bn = − cos nπ, n n = 1, 2, . . . , and thereby determine the Fourier series of f . (e) Using some form of technology, sketch the approximations to f (x) = x on the interval (−π, π) obtained by considering the first three terms, first five terms, and first ten terms in the Fourier series for f . What do you conclude? f (x) cos mx dx. Chapter Review In this chapter we have derived some basic results in linear algebra regarding vector spaces. These results form the framework for much of linear mathematics. Following are listed some of the chapter highlights. The Definition of a Vector Space A vector space consists of four different components: 1. A set of vectors V . 2. A set of scalars F (either the set of real numbers R, or the set of complex numbers C). 3. A rule, +, for adding vectors in V . 4. A rule, · , for multiplying vectors in V by scalars in F . Then (V , +, ·) is a vector space over F if and only if axioms A1–A10 of Definition 4.2.1 are satisfied. 
If F is the set of all real numbers, then (V , +, ·) is called a real vector space, whereas if F is the set of all complex numbers, then (V , +, ·) is called a complex i i i i i i i “main” 2007/2/16 page 335 i 4.13 Chapter Review 335 vector space. Since it is usually quite clear what the addition and scalar multiplication operations are, we usually specify a vector space by giving only the set of vectors V . The major vector spaces we have dealt with are the following: Rn Cn Mn (R) C k (I ) Pn the (real) vector space of all ordered n-tuples of real numbers. the (complex) vector space of all ordered n-tuples of complex numbers. the (real) vector space of all n × n matrices with real elements. the vector space of all real-valued functions that are continuous and have (at least) k continuous derivatives on I . the vector space of all polynomials of degree ≤ n with real coefficients. Subspaces Usually the vector space V that underlies a given problem is known. It is often one that appears in the list above. However, the solution of a given problem in general involves only a subset of vectors from this vector space. The question that then arises is whether this subset of vectors is itself a vector space under the same operations of addition and scalar multiplication as in V . In order to answer this question, Theorem 4.3.2 tells us that a nonempty subset of a vector space V is a subspace of V if and only if the subset is closed under addition and closed under scalar multiplication. Spanning Sets A set of vectors {v1 , v2 , . . . , vk } in a vector space V is said to span V if every vector in V can be written as a linear combination of v1 , v2 , . . . , vk —that is, if for every v ∈ V , there exist scalars c1 , c2 , . . . , ck such that v = c1 v1 + c2 v2 + · · · + ck vk . Given a set of vectors {v1 , v2 , . . . , vk } in a vector space V , we can form the set of all vectors that can be written as a linear combination of v1 , v2 , . . . , vk . This collection of vectors is a subspace of V called the subspace spanned by {v1 , v2 , . . . , vk }, and denoted span{v1 , v2 , . . . , vk }. Thus, span{v1 , v2 , . . . , vk } = {v ∈ V : v = c1 v1 + c2 v2 + · · · + ck vk }. Linear Dependence and Linear Independence Let {v1 , v2 , . . . , vk } be a set of vectors in a vector space V , and consider the vector equation c1 v1 + c2 v2 + · · · + ck vk = 0. (4.13.1) Clearly this equation will hold if c1 = c2 = · · · = ck = 0. The question of interest is whether there are nonzero values of some or all of the scalars c1 , c2 , . . . , ck such that (4.13.1) holds. This leads to the following two ideas: There exist scalars c1 , c2 , . . . , ck , not all zero, such that (4.13.1) holds. Linear independence: The only values of the scalars c1 , c2 , . . . , ck such that (4.13.1) holds are c1 = c2 = · · · = ck = 0. Linear dependence: To determine whether a set of vectors is linearly dependent or linearly independent we usually have to use (4.13.1). However, if the vectors are from Rn , then we can use Corollary 4.5.15, whereas for vectors in C k −1 (I ) the Wronskian can be useful. i i i i i i i “main” 2007/2/16 page 336 i 336 CHAPTER 4 Vector Spaces Bases and Dimension A linearly independent set of vectors that spans a vector space V is called a basis for V . If {v1 , v2 , . . . , vk } is a basis for V , then any vector in V can be written uniquely as v = c1 v1 + c2 v2 + · · · + ck vk , for appropriate values of the scalars c1 , c2 , . . . , ck . 1. 
All bases in a finite-dimensional vector space V contain the same number of vectors, and this number is called the dimension of V , denoted dim[V ]. 2. We can view the dimension of a finite-dimensional vector space V in two different ways. First, it gives the minimum number of vectors that span V . Alternatively, we can regard dim[V ] as determining the maximum number of vectors that a linearly independent set in V can contain. 3. If dim[V ] = n, then any linearly independent set of n vectors in V is a basis for V . Alternatively, any set of n vectors that spans V is a basis for V . Inner Product Spaces An inner product is a mapping that associates, with any two vectors u and v in a vector space V , a scalar that we denote by u, v . This mapping must satisfy the properties given in Definition 4.11.10. The main reason for introducing the idea of an inner product is that it enables us to extend the familiar idea of orthogonality and length of vectors in R3 to a general vector space. Thus u and v are said to be orthogonal in an inner product space if and only if u , v = 0. The Gram-Schmidt Orthonormalization Process The Gram-Schmidt procedure is a process that takes a linearly independent set of vectors {x1 , x2 , . . . , xm } in an inner product space V and returns an orthogonal basis {v1 , v2 , . . . , vm } for span{x1 , x2 , . . . , xm }. Additional Problems For Problems 1–2, let r and s denote scalars and let v and w denote vectors in R5 . 1. Prove that (r + s)v = r v + s v. 2. Prove that r(v + w) = r v + r w. For Problems 3–13, determine whether the given set (together with the usual operations on that set) forms a vector space over R. In all cases, justify your answer carefully. 3. The set of polynomials of degree 5 or less whose coefficients are even integers. 4. The set of all polynomials of degree 5 or less whose coefficients of x 2 and x 3 are zero. 5. The set of solutions to the linear system − 2x2 + 5x3 = 7, 4x1 − 6x2 + 3x3 = 0. i i i i i i i “main” 2007/2/16 page 337 i 4.13 6. The set of solutions to the linear system Chapter Review 337 For Problems 19–24, decide (with justification) whether W is a subspace of V . 4x1 − 7x2 + 2x3 = 0, 5x1 − 2x2 + 9x3 = 0. 19. V = R2 , W = {(x, y) : x 2 − y = 0}. 7. The set of 2 × 2 real matrices whose entries are either all zero or all nonzero. 8. The set of 2 × 2 real matrices that commute with the matrix 12 . 02 9. The set of all functions f : [0, 1] → [0, 1] such that 1 3 f (0) = f ( 4 ) = f ( 0 ) = f ( 4 ) = f (1) = 0. 2 10. The set of all functions f : [0, 1] → [0, 1] such that f (x) ≤ x for all x in [0, 1]. 11. The set of n × n matrices A such that A2 is symmetric. 12. The set of all points (x, y) in R2 that are equidistant from (−1, 2) and (1, −2). 13. The set of all points (x, y, z) in R3 that are a distance 5 from the point (0, −3, 4). 14. Let 20. V = R2 , W = {(x, x 3 ) : x ∈ R}. 21. V = M2 (R), W = {2 × 2 orthogonal matrices}. [An n × n matrix A is orthogonal if it is invertible and A−1 = AT .] 22. V = C [a, b], W = {f ∈ V : f (a) = 2f (b)}. 23. V = C [a, b], W = {f ∈ V : b a f (x) dx = 0}. 24. V = M3×2 (R), ab W = c d : a + b = c + f and a − c = e − f − d . ef For Problems 25–32, decide (with justification) whether or not the given set S of vectors (a) spans V , and (b) is linearly independent. 25. V = R3 , S = {(5, −1, 2), (7, 1, 1)}. 26. V = R3 , S = {(6, −3, 2), (1, 1, 1), (1, −8, −1)}. V = {(a1 , a2 ) : a1 , a2 ∈ R, a2 > 0}. 27. V = R4 , S = {(6, −3, 2, 0),(1, 1, 1, 0),(1, −8, −1, 0)}. 
Define addition and scalar multiplication on V as follows: 28. V = R3 , S = {(10, −6, 5), (3, −3, 2), (0, 0, 0), (6, 4, −1), (7, 7, −2)}. (a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 b2 ), k(a1 , a2 ) = k (ka1 , a2 ), k ∈ R. Explicitly verify that V is a vector space over R. 15. Show that W = {(a, 2a ) : a ∈ R} is a subspace of the vector space V given in the preceding problem. 29. V = P3 , S = {2x − x 3 , 1 + x + x 2 , 3, x }. 30. V = P4 , S = {x 4 +x 2 +1,x 2 +x +1,x + 1,x 4 +2x +3}. 31. V = M2×3 (R), −1 0 0 321 −1 −2 −3 S= , , , 011 123 321 −11 6 −5 1 −2 −5 . 16. Show that {(1, 2), (3, 8)} is a linearly dependent set in the vector space V in Problem 14. 32. V = M2 (R), 12 34 −2 −1 S= , , , 21 43 −1 −2 17. Show that {(1, 4), (2, 1)} is a basis for the vector space V in Problem 14. −3 0 20 , 03 00 18. What is the dimension of the subspace of P2 given by W = span{2 + x 2 , 4 − 2x + 3x 2 , 1 + x }? . 33. Prove that if {v1 , v2 , v3 } is linearly independent and v4 is not in span{v1 , v2 , v3 }, then {v1 , v2 , v3 , v4 } is linearly independent. i i i i i i i “main” 2007/2/16 page 338 i 338 CHAPTER 4 Vector Spaces −3 −6 . −6 −12 34. Let A be an m × n matrix, let v ∈ colspace(A) and let w ∈ nullspace(AT ). Prove that v and w are orthogonal. 41. A = 35. Let W denote the set of all 3 × 3 skew-symmetric matrices. −1 42. A = 3 7 −4 0 43. A = 6 −2 3 1 44. A = 1 −2 (a) Show that W is a subspace of M3 (R). (b) Find a basis and the dimension of W . (c) Extend the basis you constructed in part (b) to a basis for M3 (R). 36. Let W denote the set of all 3 × 3 matrices whose rows and columns add up to zero. (a) Show that W is a subspace of M3 (R). (b) Find a basis and the dimension of W . (c) Extend the basis you constructed in part (b) to a basis for M3 (R). 37. Let (V , +V , ·V ) and (W, +W , ·W ) be vector spaces and define V ⊕ W = {(v, w) : v ∈ V and w ∈ W }. Prove that (a) V ⊕ W is a vector space, under componentwise operations. (b) Via the identification v → (v, 0), V is a subspace of V ⊕ W , and likewise for W . (c) If dim[V ] = n and dim[W ] = m, then dim[V ⊕ W ] = m + n. [Hint: Write a basis for V ⊕ W in terms of bases for V and W .] 38. Show that a basis for P3 need not contain a polynomial of each degree 0, 1, 2, 3. 39. Prove that if A is a matrix whose nullspace and column space are the same, then A must have an even number of columns. 40. Let 62 0 3 1 5 . 21 7 15 03 10 13 . 5 2 5 10 5520 0 2 2 1 . 1 1 −2 −2 0 −4 −2 −2 For Problems 45–46, find an orthonormal basis for the row space, column space, and null space of the given matrix A. 126 2 1 6 45. A = 0 1 2. 102 1 35 −1 −3 1 46. A = 0 2 3 . 1 5 2 1 58 For Problems 47–50, find an orthogonal basis for the span of the set S , where S is given in 47. Problem 25. 48. Problem 26. 49. Problem 29, using p · q = 1 0 p(t)q(t) dt . 50. Problem 32, using the inner product defined in Problem 4 of Section 4.11. b1 b2 B= . . . and C = c1 c2 . . . cn . bn Prove that if all entries b1 , b2 , . . . , bn and c1 , c2 , . . . , cn are nonzero, then the n × n matrix A = BC has nullity n − 1. For Problems 41–44, find a basis and the dimension for the row space, column space, and null space of the given matrix A. For Problems 51–54, determine the angle between the given vectors u and v using the standard inner product on Rn . 51. u = (2, 3) and v = (4, −1). 52. u = (−2, −1, 2, 4) and v = (−3, 5, 1, 1). 53. Repeat Problems 51–52 for the inner product on Rn given by u, v = 2u1 v1 + u2 v2 + u3 v3 + · · · + un vn . i i i i i i i “main” 2007/2/16 page 339 i 4.13 54. 
Let t0 , t1 , . . . , tn be real numbers. For p and q in Pn , define p · q = p(t0 )q(t0 ) + p(t1 )q(t1 ) + · · · + p(tn )q(tn ). (a) Prove that p · q defines a valid inner product on Pn . (b) Let t0 = −3, t1 = −1, t2 = 1, and t3 = 3. Let p0 (t) = 1, p1 (t) = t , and p2 (t) = t 2 . Find a polynomial q that is orthogonal to p0 and p1 , such that {p0 , p1 , q } is an orthogonal basis for span{p0 , p1 , p2 }. Chapter Review 339 55. Find the distance from the point (2, 3, 4) to the line in R3 passing through (0, 0, 0) and (6, −1, −4). 56. Let V be an inner product space with basis {v1 , v2 , . . . , vn }. If x and y are vectors in V such that x · vi = y · vi for each i = 1, 2, . . . , n, prove that x = y. 57. State as many conditions as you can on an n × n matrix A that are equivalent to its invertibility. Project I: Orthogonal Complement Let V be an inner product space and let W be a subspace of V . Part 1 Definition Let W ⊥ = {v ∈ V : v, w = 0 for all w ∈ W }. Show that W ⊥ is a subspace of V and that W ⊥ and W share only the zero vector: W ⊥ ∩ W = {0}. Part 2 Examples (a) Let V = M2 (R) with inner product a11 a12 bb , 11 12 a21 a22 b21 b22 = a11 b11 + a12 b12 + a21 b21 + a22 b22 . Find the orthogonal complement of the set W of 2 × 2 symmetric matrices. (b) Let A be an m × n matrix. Show that (rowspace(A))⊥ = nullspace(A) and (colspace(A))⊥ = nullspace(AT ). Use this to find the orthogonal complement of the row space and column space of the matrices below: (i) A = 3 1 −1 . 6 0 −4 −1 0 6 2 (ii) A = 3 −1 0 4 . 1 1 1 −1 (c) Find the orthogonal complement of (i) the line in R3 containing the points (0, 0, 0) and (2, −1, 3). (ii) the plane 2x + 3y − 4z = 0 in R3 . i i i i i i i “main” 2007/2/16 page 340 i 340 CHAPTER 4 Vector Spaces Part 3 Some Theoretical Results product space V . Let W be a subspace of a finite-dimensional inner (a) Show that every vector in V can be written uniquely in the form w + w⊥ , where w ∈ W and w⊥ ∈ W ⊥ . [Hint: By Gram-Schmidt, v can be projected onto the subspace W as, say, projW (v), and so v = projW (v) + w⊥ , where w⊥ ∈ W ⊥ . For the uniqueness, use the fact that W ∩ W ⊥ = {0}.] (b) Use part (a) to show that dim[V ] = dim[W ] + dim[W ⊥ ]. (c) Show that (W ⊥ )⊥ = W. Project II: Line-Fitting Data Points Suppose data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) in the xy -plane have been collected. Unless these data points are collinear, there will be no line that contains all of them. We wish to find a line, commonly known as a least-squares line, that approximates the data points as closely as possible. How do we go about finding such a line? The approach we take12 is to write the line as y = mx + b, where m and b are unknown constants. Part 1 Derivation of the Least-Squares Line (a) By substituting the data points (xi , yi ) for x and y in the equation y = mx + b, show that the matrix equation Ax = y is obtained, where x1 1 y1 x2 1 y2 m A = . . , x= , and y = . . b . . . .. . xn 1 yn Unless the data points are collinear, the system Ax = y obtained in part (a) has no solution for x. In other words, the vector y does not lie in the column space of A. The goal then becomes to find x0 such that the distance ||y − Ax0 || is as small as possible. This will happen precisely when y − Ax0 is perpendicular to the column space of A. In other words, for all x ∈ R2 , we must have (Ax) · (y − Ax0 ) = 0. (b) Using the fact that the dot product of vectors u and v can be written as a matrix multiplication, u · v = uT v , show that (Ax) · (y − Ax0 ) = x · (AT y − AT Ax0 ). 
(c) Conclude that A^T y = A^T A x0. Provided that A has linearly independent columns, the matrix A^T A is invertible (see Problem 34 in Section 4.13).

[Footnote 12: We can also obtain the least-squares line by using optimization techniques from multivariable calculus, but the goal here is to illustrate the use of linear systems and projections.]

(d) Show that the least-squares solution is x0 = (A^T A)^{-1} A^T y, and therefore

Ax0 = A (A^T A)^{-1} A^T y

is the point in the column space of A that is closest to y. Therefore, it is the projection of y onto the column space of A, and we write

Ax0 = A (A^T A)^{-1} A^T y = P y, where P = A (A^T A)^{-1} A^T   (4.13.2)

is called a projection matrix. If A is m × n, what are the dimensions of P?

(e) Referring to the projection matrix P in (4.13.2), show that P A = A and P² = P. Geometrically, why are these facts to be expected? Also show that P is a symmetric matrix.

Part 2 Some Applications

In parts (a)–(d) below, find the equation of the least-squares line to the given data points.

(a) (0, −2), (1, −1), (2, 1), (3, 2), (4, 2).

(b) (−1, 5), (1, 1), (2, 1), (3, −3).

(c) (−4, −1), (−3, 1), (−2, 3), (0, 7).

(d) (−3, 1), (−2, 0), (−1, 1), (0, −1), (2, −1).

In parts (e)–(f), by using the ideas in this project, find the distance from the point P to the given plane.

(e) P(0, 0, 0); 2x − y + 3z = 6.

(f) P(−1, 3, 5); −x + 3y + 3z = 8.

Part 3 A Further Generalization

Instead of fitting data points to a least-squares line, one could also attempt a parabolic approximation of the form ax² + bx + c. By following the outline in Part 1 above, try to determine a procedure for finding the best parabolic approximation to a set of data points. Then try out your procedure on the data points given in Part 2, (a)–(d).
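To close, here is a minimal numerical sketch of the computation outlined in Part 1, assuming NumPy; it is an illustration, not part of the project statement. It fits the least-squares line to the data of Part 2(a) by solving the normal equations A^T A x0 = A^T y, and it checks the projection-matrix identities from part (e).

```python
import numpy as np

# Data points from Part 2(a): (0,-2), (1,-1), (2,1), (3,2), (4,2).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([-2.0, -1.0, 1.0, 2.0, 2.0])

# Rows of A are (x_i, 1), so A x0 = y would hold exactly only for collinear data.
A = np.column_stack([xs, np.ones_like(xs)])

# Least-squares solution x0 = (m, b) from the normal equations A^T A x0 = A^T y.
m, b = np.linalg.solve(A.T @ A, A.T @ ys)
print("least-squares line: y =", m, "* x +", b)

# Projection matrix P = A (A^T A)^{-1} A^T from Equation (4.13.2);
# P y is the point of colspace(A) closest to y.
P = A @ np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(P @ A, A)   # P A = A
assert np.allclose(P @ P, P)   # P^2 = P
assert np.allclose(P, P.T)     # P is symmetric
```

In practice one would solve the least-squares problem with np.linalg.lstsq (or a QR factorization) rather than forming (A^T A)^{-1} explicitly; the explicit inverse appears here only to mirror Equation (4.13.2).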