LINEAR ALGEBRA

GABRIEL NAGY
Mathematics Department, Michigan State University, East Lansing, MI, 48824.

DECEMBER 8, 2009

Abstract. These are the lecture notes for the course MTH 415, Applied Linear Algebra, Winter Semester, Summer Session and Fall Semester 2009. These notes present a basic introduction to linear algebra with emphasis on applications. Chapter 1 introduces systems of linear equations and the Gauss-Jordan method to find solutions of these systems, which transforms the augmented matrix associated with a linear system into reduced echelon form, where the solutions of the linear system are simple to obtain. We end the Chapter with two applications of linear systems: first, to find approximate solutions to differential equations using the method of finite differences; second, to solve linear systems using floating-point numbers, as happens in a computer. Chapter 2 reviews matrix algebra, that is, we introduce the linear combination of matrices, the multiplication of appropriate matrices, and the inverse of a square matrix. We end the Chapter with the LU-factorization of a matrix. Chapter 3 reviews the determinant of a square matrix, the relation between a non-zero determinant and the existence of the inverse matrix, a formula for the inverse matrix using the matrix of cofactors, and the Cramer rule for the solution of a linear system with an invertible matrix of coefficients. The advanced part of the course really starts in Chapter 4 with the definition of vector spaces, subspaces, the linear dependence or independence of a set of vectors, and bases and dimensions of vector spaces. Both finite and infinite dimensional vector spaces are presented; however, finite dimensional vector spaces are the main interest in these notes. Chapter 5 presents linear transformations between vector spaces, the components of a linear transformation in a basis, and the formulas for the change of basis for both vector components and transformation components. Chapter 6 introduces a new structure on a vector space, called an inner product. The definition of an inner product is based on the properties of the dot product in R^n. We study the notion of orthogonal vectors, orthogonal projections, best approximations of a vector on a subspace, and the Gram-Schmidt orthonormalization procedure. The central application of these ideas is the method of least squares to find approximate solutions to inconsistent linear systems. One application is to find the best polynomial fit to a curve on a plane. Chapter 7 introduces the notion of a normed space, which is a vector space with a norm function that does not necessarily come from an inner product. We study the main properties of the p-norms on R^n or C^n, which are useful norms in functional analysis. We briefly discuss induced operator norms. The last Section is an application of matrix norms: it discusses the condition number of a matrix and how to use this information to determine whether a linear system is ill-conditioned. Finally, Chapter 8 introduces the notion of eigenvalue and eigenvector of a linear operator. We study diagonalizable operators, which are operators with diagonal matrix components in a basis of their eigenvectors. We also study functions of diagonalizable operators, with the exponential function as a main example, and we discuss how to apply these ideas to find solutions of linear systems of ordinary differential equations.

Date: December 8, 2009. gnagy@math.msu.edu.

Contents

Chapter 1. Linear systems
  1.1. Row picture
  1.2. Column picture
  1.3. Gauss-Jordan method
  1.4. Echelon forms
  1.5. Non-homogeneous equations
  1.6. Discretization of linear ODE
  1.7. Floating-point numbers

Chapter 2. Matrix algebra
  2.1. Linear transformations
  2.2. Linear combinations
  2.3. Matrix multiplication
  2.4. Inverse matrix
  2.5. Null and range spaces
  2.6. LU-factorization

Chapter 3. Determinants
  3.1. Definitions and properties
  3.2. Applications

Chapter 4. Vector spaces
  4.1. Spaces and subspaces
  4.2. Linear dependence
  4.3. Bases and dimension
  4.4. Vector components

Chapter 5. Linear transformations
  5.1. Linear transformations
  5.2. Transformation components
  5.3. Change of basis

Chapter 6. Inner product spaces
  6.1. The dot product
  6.2. Inner product
  6.3. Orthogonal vectors
  6.4. Orthogonal projections
  6.5. Best approximation
  6.6. Gram-Schmidt method
  6.7. Least squares

Chapter 7. Normed spaces
  7.1. The p-norm
  7.2. Operator norms
  7.3. Condition numbers

Chapter 8. Spectral decomposition
  8.1. Eigenvalues and eigenvectors
  8.2. Diagonalizable operators
  8.3. Differential equations
  8.4. Normal operators
  8.5. The spectral theorem

Chapter 9. Appendix
  9.1. Review exercises
References

Overview

Linear algebra is a collection of ideas involving systems of linear equations, vectors and vector spaces, and linear transformations between vector spaces.

Algebraic equations are called a system when there is more than one equation, and they are called linear when each unknown appears as a multiplicative factor with power zero or one. An example of a linear system of two equations in two unknowns is given in Eqs. (1.3)-(1.4) below. Systems of linear equations are the main subject of this Chapter.

Examples of vectors are oriented segments on a line, plane, or space. An oriented segment is an ordered pair of points in one of these sets; such an ordered pair can be drawn as an arrow that starts at the first point and ends at the second point. Fix a preferred point in the line, plane or space, called the origin point; then there exists a one-to-one correspondence between points in these sets and arrows that start at the origin point. The sets of oriented segments with common origin in a line, plane, and space are called R, R^2 and R^3, respectively. A sketch of vectors in these sets can be seen in Fig. 1. Two operations are defined on oriented segments with a common origin point: an oriented segment can be stretched or compressed, and two oriented segments with the same origin point can be added using the parallelogram law. An addition of several stretched or compressed vectors is called a linear combination. The set of all oriented segments with common origin point, together with this operation of linear combination, is the essential structure called a vector space. The word "space" in the term "vector space" originates precisely in these examples, which were associated with the physical space.

Figure 1. Examples of vectors in the line, the plane, and space, respectively.

Linear transformations are a particular type of functions between vector spaces that preserve the operation of linear combination. An example of a linear transformation is the 2 × 2 matrix A = [1 2; 3 4], together with the matrix-vector product, which specifies how this matrix transforms a vector on the plane into another vector on the plane. The result is thus a function A : R^2 → R^2.
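To make the last remark concrete, here is a minimal sketch in Python with NumPy (the choice of language and the function name transform are ours, not part of the original notes) of the matrix A = [1 2; 3 4] acting on a plane vector through the matrix-vector product:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # the 2 x 2 matrix from the Overview

def transform(v):
    # The matrix-vector product sends a plane vector v to the plane vector A v.
    return A @ v

v = np.array([1.0, -1.0])       # an arbitrary vector on the plane
print(transform(v))             # [-1. -1.], another vector on the plane

The matrix plays the role of a rule that assigns to every vector of the plane another vector of the plane, which is precisely the idea of a function R^2 → R^2.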
These notes aim to be an elementary introduction to linear algebra.

Notation and conventions. We use the notation F ∈ {R, C} to mean that F = R or F = C. Vectors will be denoted by boldface letters, like u and v. Two exceptions: column vectors in F^n are denoted in sanserif, like u and v, and matrices are denoted by capital sanserif letters, like A and B. Below is a list of mathematical symbols used in these notes:

R     Set of real numbers
Q     Set of rational numbers
Z     Set of integer numbers
N     Set of positive integers
{0}   Zero set
∅     Empty set
∪     Union of sets
∩     Intersection of sets
:=    Definition
⇒     Implies
∀     For all
∃     There exists
Proof     Beginning of a proof
Example   Beginning of an example

The end of a proof and the end of an example are each marked by a symbol at the end of the corresponding line.

Acknowledgments. I thank all my students for pointing out several misprints and for helping make these notes more readable. I am especially grateful to Zhuo Wang and Wenning Feng.

Chapter 1. Linear systems

1.1. Row picture

A central problem in linear algebra is to find solutions of a system of linear equations. A 2 × 2 linear system consists of two linear equations in two unknowns, that is, given the real numbers A11, A12, A21, A22, b1, and b2, find all the numbers x and y solutions of

A11 x + A12 y = b1,    (1.1)
A21 x + A22 y = b2.    (1.2)

These equations are called a system because there is more than one equation, and they are called linear because the unknowns, x and y, appear as multiplicative factors with power zero or one (for example, there is no term proportional to x^2 or to y^3). The row picture of a linear system is the method of finding solutions to the system as the intersection of the solutions of every single equation in the system. The individual equations are called row equations, or simply rows of the system.

Example 1.1.1: An example of a 2 × 2 linear system is the following: Find all the numbers x and y solutions of

2x − y = 0,     (1.3)
−x + 2y = 3.    (1.4)

Solution: The solution to each row of the system above is found geometrically in Fig. 2.

Figure 2. The solution of a 2 × 2 linear system in the row picture is the intersection of the two lines y = 2x (first row) and 2y = x + 3 (second row), which are the solutions of each row equation; the lines intersect at the point (1, 2).

Analytically, the solution can be found by substitution:

2x − y = 0  ⇒  y = 2x  ⇒  −x + 2(2x) = 3  ⇒  x = 1, y = 2.

An interesting property of the solutions to any 2 × 2 linear system is simple to prove using the row picture, and it is the following result.

Theorem 1.1. Given any 2 × 2 linear system, exactly one of the following statements holds:
(i) There exists a unique solution;
(ii) There exist infinitely many solutions;
(iii) There exists no solution.

It is interesting to remark what cannot happen: for example, there is no 2 × 2 linear system having only two solutions. Unlike the quadratic equation x^2 − 5x + 6 = 0, which has the two solutions x = 2 and x = 3, a 2 × 2 linear system has only one solution, or infinitely many solutions, or no solution at all. Examples of these three cases, respectively, are given in Fig. 3.

Figure 3. Examples of the cases given in Theorem 1.1, cases (i)-(iii).

Proof of Theorem 1.1: The set of solutions of each equation in a 2 × 2 linear system represents a line in R^2. Two lines in R^2 can intersect at a point, can be coincident, or can be parallel but not coincident.
These are the cases given in (i)-(iii). This establishes the Theorem. We now generalize the definition of a 2 × 2 linear system given in the Example 1.1.1 to m equations of n unknowns. Definition 1.2. An m × n linear system is a set of m 1 linear equations in n 1 unknowns is the following: Given the coefficients numbers Aij and the source numbers bi , with i = 1, · · · , m and j = 1, · · · n, find the real numbers xj solutions of A11 x1 + · · · + A1n xn = b1 . . . Am1 x1 + · · · + Amn xn = bm . Furthermore, an m × n linear system is called consistent iff it has a solution, and it is called inconsistent iff it has no solutions. Example 1.1.2: Consider the 2 × 3 and the 3 × 3 linear systems are given below, respectively, x1 + 2x2 + x3 = 1 1 −3x1 + x2 − x3 = 6 3 2x1 + x2 + x3 = 2 −x1 + 2x2 = 1 x1 − x2 + 2x3 = −2. (1.5) The row picture is appropriate to solve small systems of linear equations. However it becomes difficult to carry out in 3 × 3 and bigger linear systems. For example, find the numbers x1 , x2 , x3 solutions of the 3 × 3 linear system above. Substitute the second equation into the first, x1 = −1 + 2x2 ⇒ x3 = 2 − 2x1 − x2 = 2 + 2 − 4x2 − x2 ⇒ x3 = 4 − 5x2 ; then, substitute the second equation and x3 = 4 − 5x2 into the third equation, (−1 + 2x2 ) − x2 + 2(4 − 5x2 ) = −2 ⇒ x2 = 1, and then, substituting backwards, x1 = 1 and x3 = −1, so the solution is a single point in space given by (1, 1, −1). G. NAGY – LINEAR ALGEBRA December 8, 2009 7 The solution of each separate equation in the examples above represents a plane in R3 . A solution to the whole system is a point that belongs to the three planes. In the 3 × 3 example above there is a unique solution, the point (1, 1, −1), which means that the three planes intersect at a single point. In the general case, a 3 × 3 system can have a unique solution, infinitely many solutions or no solutions at all, depending on how the three planes in space intersect among them. The case with unique solution was represented in Fig. 4, while two possible situations corresponding to no solution are given in Fig. 5. Finally, two cases of 3 × 3 linear system having infinitely many solutions are pictured in Fig 6, where in the first case the solutions form a line, and in the second case the solution form a plane because the three planes coincide. Figure 4. Planes representing the solutions of each row equation in a 3 × 3 linear system having a unique solution. Figure 5. Two cases of planes representing the solutions of each row equation in 3 × 3 linear systems having no solutions. Figure 6. Two cases of planes representing the solutions of each row equation in 3 × 3 linear systems having infinity many solutions. Solutions of linear systems with more than three unknowns can not be represented in the three dimensional space, in addition the substitution method becomes more involved to 8 G. NAGY – LINEAR ALGEBRA december 8, 2009 solve. Hence, alternative ideas are needed to solve such systems. Later on we will introduce the Gauss-Jordan method, which is a procedure appropriate to solve large systems of linear equations in an efficient way. Further reading. For more details on the row picture see Section 1.1 in Lay’s book [2]. G. NAGY – LINEAR ALGEBRA December 8, 2009 9 Exercises. 1.1.1.- Use the substitution method to find the solutions to the 2 × 2 linear system 2x − y = 1, x + y = 5. 1.1.2.- Sketch the three lines solution of each row in the system x + 2y = 2 x−y =2 y = 1. Is this linear system consistent? 
1.1.3.- Sketch a graph representing the solutions of each row in the following nonlinear system, and decide whether it has solutions or not, x2 + y 2 = 4 x − y = 0. 1.1.4.- Graph on the plane the solution of each individual equation of the 3 × 2 linear system system 3x − y = 0 , x + 2y = 4, −x + y = −2, and determine whether the system is consistent or inconsistent. 1.1.5.- Show that the 3 × 3 linear system x + y + z = 2, x + 2y + 3z = 1, y + 2z = 0, is inconsistent, by finding a combination of the three equations that ads up to the equation 0 = 1. 1.1.6.- Find all values of the constant k such that there exists infinitely many solution to the 2 × 2 linear system kx + 2y = 0, x+ k y = 0. 2 10 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.2. Column picture Consider again the linear system in Eqs. (1.3)-(1.4) and introduce a change in the names of the unknowns, calling them x1 and x2 instead of x and y . The problem is to find the numbers x1 , and x2 solutions of 2x1 − x2 = 0, (1.6) −x1 + 2x2 = 3. (1.7) We know that the answer is x1 = 1, x2 = 2. The row picture consisted in solving each row separately. The main idea in the column picture is to interpret the 2 × 2 linear system as an addition of new objects, column vectors, in the following way, 2 −1 0 x+ x= . −1 1 22 3 (1.8) The new objects are called column vectors and they are denoted as follows, A1 = 2 , −1 A2 = −1 , 2 b= 0 . 3 We can represent these vectors in the plane, as it is shown in Fig. 7. x2 3 A2 b 2 2 −1 x1 −1 A1 Figure 7. Graphical representation of column vectors in the plane. The column vector interpretation of a 2 × 2 linear system determines an addition law of vectors and a multiplication law of a vector by a number. In the example above, we know that the solution is given by x1 = 1 and x2 = 2, therefore in the column picture interpretation the following equation must hold 2 −1 0 + 2= . −1 2 3 The study of this example suggests that the multiplication law of a vector by numbers and the addition law of two vectors can be defined by the following equations, respectively, −1 (−1)2 2= , 2 (2)2 2 −2 2−2 + = . −1 4 −1 + 4 The study of several examples of 2 × 2 linear systems in the column picture determines the following definition. G. NAGY – LINEAR ALGEBRA December 8, 2009 11 u1 v and v = 1 , and real numbers a and b, u2 v2 the linear combination of u and v is defined as follows, Definition 1.3. Given any 2-vectors u = u1 v au1 + bv1 + b 1 := . u2 v2 au2 + bv2 a A linear combination includes the particular cases of addition (a = b = 1), and multiplication of a vector by a number (b = 0), respectively given by, u1 v u + v1 + 1= 1 , u2 v2 u2 + v2 a u1 au1 = . u2 au2 The addition law in terms of components is represented graphically by the parallelogram law, as it can be seen in Fig. 8. The multiplication of a vector by a number a affects the length and direction of the vector. The product au stretches the vector u when a > 1 and it compresses u when 0 < a < 1. If a < 0 then it reverses the direction of u and it stretches when a < −1 and compresses when −1 < a < 0. Fig. 8 represents some of these possibilities. Notice that the difference of two vectors is a particular case of the parallelogram law, as it can be seen in Fig. 9. x2 aV V+W v2 a>1 V −V V (v+w) 2 aV 0<a<1 W w2 x1 w1 v1 a = −1 (v+w)1 Figure 8. The addition of vectors can be computed with the parallelogram law. The multiplication of a vector by a number stretches or compresses the vector, and changes it direction in the case that the number is negative. 
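The column picture of Eqs. (1.6)-(1.8) can also be checked numerically. The following minimal sketch, again in Python with NumPy (our choice of tool, not part of the notes), forms the linear combination of Definition 1.3 with the known solution x1 = 1, x2 = 2 and verifies that it reproduces the source vector b:

import numpy as np

A1 = np.array([2.0, -1.0])      # first coefficient column vector
A2 = np.array([-1.0, 2.0])      # second coefficient column vector
b  = np.array([0.0, 3.0])       # source column vector

x1, x2 = 1.0, 2.0               # the known solution of Eqs. (1.6)-(1.7)

# Linear combination of the column vectors, Definition 1.3 with a = x1, b = x2.
combination = x1 * A1 + x2 * A2
print(combination)              # [0. 3.]
print(np.allclose(combination, b))   # True: the columns add up to b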
V−W V+W V+(−W) V W V −W W Figure 9. The difference of two vectors is a particular case of the parallelogram law of addition of vectors. The column picture interpretation of a general 2 × 2 linear system given in Eqs. (1.1)(1.2) is the following: Introduce the coefficient and source column vectors A1 = A11 , A21 A2 = A12 , A22 b= b1 , b2 (1.9) 12 G. NAGY – LINEAR ALGEBRA december 8, 2009 and then find the coefficients x1 and x2 that change the length of the coefficient column vectors A1 and A2 such that they add up to the source column vector b, that is, A1 x1 + A2 x2 = b. For example, the column picture of the linear system in Eqs. (1.6)-(1.7) is given in Eq. (1.8). The solution of this system are the numbers x1 = 1 and x2 = 2, and this solution is represented in Fig. 10. x2 x2 2A 2 4 4 2A 2 b 2 A2 −2 2 x1 −1 −2 A1 x1 −1 Figure 10. Representation of the solution of a 2 × 2 linear system in the column picture. The existence and uniqueness of solutions in the case of 2 × 2 systems can be studied geometrically in the column picture as it was done in the row picture. In the latter case we have seen that all possible 2 × 2 systems fall into one of these three cases, unique solution, infinitely many solutions and no solutions at all. The proof was to study all possible ways two lines can intersect on the plane. The same existence and uniqueness statement can be proved in the column picture. In Fig. 11 we present these three cases in both row and column pictures. In the latter case the proof is to study all possible relative positions of the column vectors A1 , A2 , and b on the plane. y y y x x b x2 x x2 x2 b A2 A2 A1 A1 x1 A2 b A1 x1 x1 Figure 11. Examples of a solutions of general 2 × 2 linear systems having a unique, infinite many, and no solution, represented in the row picture and in the column picture. G. NAGY – LINEAR ALGEBRA December 8, 2009 13 We see in Fig. 11 that the first case corresponds to a system with unique solution. There is only one linear combination of the coefficient vectors A1 and A2 which adds up to b. The reason is that the coefficient vectors are not proportional to each other. The second case corresponds to the case of infinitely many solutions. The coefficient vectors are proportional to each other and the source vector b is also proportional to them. So, there are infinitely many linear combinations of the coefficient vectors that add up to the source vector. The last case corresponds to the case of no solutions. While the coefficient vectors are proportional to each other, the source vector is not proportional to them. So, there is no linear combination of the coefficient vectors that add up to the source vector. The ideas in the column picture can be generalized to m × n linear equations, which gives rise to the generalization to m-vectors of the definitions of linear combination presented above. u1 v1 . . , v = . and the real numbers a, b, the Definition 1.4. Given m-vectors u = . . . um vm linear combination of the vectors u and v is defined as follows u1 v1 au1 + bv1 . . . . a . + b . = . . . . um vm aum + bvm This definition can be generalized to an arbitrary number of vectors. Column vectors provide a new way to denote an m × n system of linear equations. Definition 1.5. An m × n linear system of m 1 linear equations in n 1 unknowns is the following: Given the coefficient m-vectors A1 , · · · , An and the source m-vector b, find the real numbers x1 , · · · , xn solution of the linear combination A1 x1 + · · · + An xn = b. 
For example, recall the 3 × 3 system given as the second system in Eq. (1.5). This system in the column picture is the following: Find numbers x1 , x2 and x3 such that 2 1 1 2 −1 x1 + 2 x2 + 0 x3 = 1 . (1.10) 1 −1 2 −2 These are the main ideas in the column picture. We will see later that linear algebra emerges from the column picture. The next section we give a method, due to Gauss, to solve in an efficient way m × n linear systems for large m and n. Further reading. There is a clear explanation of the column picture in Section 1.3 in Lay’s book [2]. See also Section 1.2 in Strang’s book [4] for a shorter summary of both the row and column pictures. 14 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 1.2.1.- Sketch a graph of the vectors »– »– »– 1 2 1 A1 = , A2 = , b= . 2 1 −1 Is the linear system A1 x1 + A2 x2 = b consistent? If the answer is “yes,” find the solution. 1.2.2.- Consider the vectors »– »– 4 −2 A1 = , A2 = , 2 −1 »– 2 b= . 0 (a) Graph the vectors A1 , A2 and b on the plane. (b) Is the linear system A1 x1 +A2 x2 = b consistent? »– 6 (c) Given the vector c = , is the 3 linear system A1 x1 + A2 x2 = c consistent? If the answer is “yes,” is the solution unique? 1.2.3.- Consider the vectors »– »– 4 −2 A1 = , A2 = , 2 −1 ˆ2˜ and given a real number h, set c = h . Find all values of h such that the system A1 x1 + A2 x2 = c is consistent. 1.2.4.- Show that the three vectors below lie on the same plane, by expressing the third vector as a linear combination of the first two, where 23 23 23 1 1 1 A1 = 415 , A2 = 425 , A3 = 435 . 0 1 2 Is the linear system A1 x1 + A2 x2 + A3 x3 = 0 consistent? If the answer is “yes,” is the solution unique? G. NAGY – LINEAR ALGEBRA December 8, 2009 15 1.3. Gauss-Jordan method The Gauss-Jordan method is a procedure to find solutions to m × n linear systems in an efficient way. Efficient here means to perform as few as possible algebraic steps to find the solution or to show that the solution does not exist. Before introducing this method, we need several definitions. Consider an m × n linear system A11 x1 + · · · + A1n xn = b1 . . . Am1 x1 + · · · + Amn xn = bm . Introduce the matrix of coefficients, the source vector, and the augmented matrix of a linear system, given respectively by the following expressions, n columns A11 · · · A1n b1 A11 · · · A1n b1 . . . m rows, b := . , . . . . . A := . A|b := . . . . . . . . Am1 · · · Amn bm Am1 · · · Amn bm We call A an m × n matrix, and the source vector b is a particular case of an m × 1 matrix. The augmented matrix of an m × n linear system is given by the coefficients and the source coefficients together, hence it is an m × (n +1) matrix. The symbol “:=” denotes “definition,” that is, the object to the left of the symbol := is defined by the expression on the right of that symbol. Example 1.3.1: Consider again the 2 × 2 linear system 2x1 − x2 = 0, (1.11) −x1 + 2x2 = 3. (1.12) The coefficient matrix is 2 × 2, the source vector is 2 × 1, and the augmented matrix is 2 × 3, given respectively by A= 2 −1 −1 , 2 b= 0 , 3 [A|b] = 2 −1 −1 2 0 . 3 (1.13) We also use the alternative notation A = [Aij ] to denote a matrix with components Aij , and b = [bi ] to denote a vector with components bi . Given a matrix A = [Aij ], the elements Aii are called diagonal elements. A matrix is upper triangular iff all coefficients below the diagonal elements vanish. 
Example 1.3.2: The diagonal elements in the 3 × 3, 2 × 3 and 3 × 2 matrices presented below are written explicitly, while the coefficients with ∗ denote non-diagonal elements: A11 ∗ ∗ A11 ∗ A11 ∗ ∗ ∗ ∗ A22 ∗ , A22 . , ∗ A22 ∗ ∗ ∗ A33 ∗ ∗ The following matrices are upper triangular: A11 ∗ ∗ A11 0 A22 ∗ , 0 0 0 A33 ∗ A22 ∗ , ∗ A11 0 0 ∗ A22 . 0 16 G. NAGY – LINEAR ALGEBRA december 8, 2009 The Gauss-Jordan method is a procedure performed on the augmented matrix of a linear system. It transforms the augmented matrix into a different matrix with two important properties: First, the linear system associated with the latter augmented matrix has the same solutions as the original linear system; second, the solutions on the latter linear system are simpler to find than in the original linear system. The augmented matrix is changed by doing any of the following three operations, called Gauss operations: (i) Adding to one row a multiple of the another; (ii) Interchanging two rows; (iii) Multiplying a row by a non-zero number. These operations are respectively represented by the symbols given in Fig. 12. a a Figure 12. A representation of the Gauss elimination operations (i), (ii) and (iii), respectively. The Gauss operations change the coefficients of the augmented matrix of a system but do not change its solution. Two systems of linear equations having the same solutions are called equivalent. The Gauss-Jordan method is an algorithm using Gauss operations such that given any m × n linear system there exists an equivalent system whose augmented matrix is simple in the sense that the solution can be found by inspection. Example 1.3.3: Find the solution of the 2 × 2 linear system with augmented matrix given in Eq. (1.13) using Gauss operations. Solution: 2 −1 −1 2 0 2 → 3 −2 −1 4 0 2 → 6 0 −1 3 0 → 6 2 −1 0 20 2 10 1 → → . 0 1 2 01 2 01 2 The Gauss operations have changed the augmented matrix of the original system as follows: 2 −1 −1 2 0 1 → 3 0 0 1 1 . 2 Since the Gauss operations do not change the solution of the associated linear systems, the augmented matrices above imply that the following two linear systems have the same solutions, 2x1 − x2 = 0, x1 = 1, ⇔ −x1 + 2x2 = 3. x2 = 2. On the last system the solution, x1 = 1, x2 = 2, is easy to read. A precise way to define the notion of an augmented matrix corresponding to a linear system with solutions “easy to read” is captured in the notions of echelon form of a matrix, and reduced echelon form of a matrix. We present these notions in detail in the next Section. In the rest of this Section we study a particular type of linear systems having the same number of equations and unknowns. We call them n × n linear systems, and also square G. NAGY – LINEAR ALGEBRA December 8, 2009 17 systems. An example of a square system is the 2 × 2 linear system in Eqs. (1.11)-(1.12). So, in the rest of this Section we introduce the Gauss operations and back substitution method to find solutions to square linear system. We finally compare this method with the Gauss-Jordan method also restricted to square linear systems. 1.3.1. Gauss operations and back substitution. Consider a square n × n linear system. The solutions to that system can be found using the Gauss operations and back substitution method. The method has two main parts: First, use Gauss operations to transform the augmented matrix of the system into an upper triangular form; second, use back substitution to compute the solution to the system. 
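Before the worked examples, here is a minimal sketch of the two-step procedure just described, written in Python with NumPy (our choice of tool; the function and variable names are ours). It performs forward elimination using Gauss operation (i) and then back substitution; for brevity it assumes that no row interchanges are needed, that is, that every pivot encountered is non-zero.

import numpy as np

def gauss_back_substitution(A, b):
    # Forward elimination: reduce the augmented matrix [A|b] to upper
    # triangular form by adding to one row a multiple of another.
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # assumes the pivot A[k, k] is non-zero
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution: solve for x_n first, then x_{n-1}, and so on.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

# The 3 x 3 system of Example 1.3.4 below; its solution is (1, 1, -1).
A = np.array([[2, 1, 1], [-1, 2, 0], [1, -1, 2]])
b = np.array([2, 1, -2])
print(gauss_back_substitution(A, b))       # [ 1.  1. -1.]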
Example 1.3.4: Use the Gauss method and back substitution to solve the 3 × 3 linear system 2x1 + x2 + x3 = 2 −x1 + 2x2 = 1 x1 − x2 + 2x3 = −2. Solution: We have already seen in Example 1.1.2 that the solution of this system is given by x1 = x2 = 1, x3 = −1. Let us find that solution using the Gauss method with back substitution. First transform the augmented matrix of the system into upper triangular form: 2 11 2 1 −1 2 −2 1 −1 2 −2 −1 20 1 → −1 20 1 → 0 1 2 −1 1 −1 2 −2 2 11 2 0 3 −3 6 1 −1 2 −2 1 −1 2 −2 0 1 2 −1 → 0 12 −1 . 0 0 −9 9 0 01 −1 We now write the linear system corresponding to the last augmented matrix, 2x1 − x2 + 2x3 = −2 x2 + 2x3 = −1 x3 = −1. We now use the back substitution to obtain the solution, that is, introduce x3 = −1 into the second equation, which gives us x2 = 1. The substitute x3 = −1 and x2 = 1 into the first equation, which gives us x1 = 1. 1.3.2. Gauss-Jordan method. Consider again a square n × n linear system. The GaussJordan method on square systems is a minor modification of the Gauss method and back substitution. Do not stop doing Gauss operations when the augmented matrix becomes upper triangular. Keep doing Gauss operations in order to make zeros above the diagonal. Then, back substitution will no be needed to find the solution of the linear system. It will be found just by inspection. Example 1.3.5: Use the Gauss-Jordan method to solve the same 3 × 3 linear system as in Example 1.3.4, that is, 2x1 + x2 + x3 = 2 −x1 + 2x2 = 1 x1 − x2 + 2x3 = −2. 18 G. NAGY – LINEAR ALGEBRA december 8, 2009 Solution: In Example 1.3.4 we performed Gauss operations on the augmented matrix of the system above until we obtained an upper triangular matrix, that is, 1 −1 2 −2 2 11 2 −1 −1 . 20 1 → 0 12 0 01 −1 −2 1 −1 2 The idea now is to continue with 1 −1 2 −2 10 0 → 0 1 12 −1 0 01 −1 00 Gauss operations, as follows: 4 −3 100 1 2 −1 → 0 1 0 1 1 −1 001 −1 ⇒ x1 = 1, x2 = 1, x = −1. 3 In the last step we do not need to do back substitution to compute the solution. It is obtained by inspection only. Further reading. Almost every linear algebra book describes the Gauss-Jordan method. See Section 1.2 in Lay’s book [2] for a summary of echelon forms and the Gauss-Jordan method. See Sections 1.2 and 1.3 in Meyer’s book [3] for more details on Gauss elimination operations and back substitution. Also see Section 1.3 in Strang’s book [4]. G. NAGY – LINEAR ALGEBRA December 8, 2009 19 Exercises. 1.3.1.- Use Gauss operations and back substitution to find the solution of the 3 × 3 linear system x1 + x2 + x3 = 1, x1 + 2x2 + 2x3 = 1, x1 + 2x2 + 3x3 = 1. 1.3.2.- Use Gauss operations and back substitution to find the solution of the 3 × 3 linear system 2x1 − 3x2 = 3, 4x1 − 5x2 + x3 = 7, 2x1 − x2 − 3x3 = 5. 1.3.3.- Find the solution of the following linear system with Gauss-Jordan’s method, 4x2 − 3x3 = 3, −x1 + 7x2 − 5x3 = 4, −x1 + 8x2 − 6x3 = 5. 1.3.4.- Find the solutions to the following two linear systems, which have the same matrix of coefficient A but different source vectors b1 and b2 , given respectively by, 4x1 − 8x2 + 5x3 = 1, 4x1 − 8x2 + 5x3 = 0, 4x1 − 7x2 + 4x3 = 0, 4x1 − 7x2 + 4x3 = 1, 3x1 − 4x2 + 2x3 = 0, 3x1 − 4x2 + 2x3 = 0. Solve these two systems at one time using the Gauss-Jordan method on an augmented matrix of the form [A|b1 |b2 ]. 20 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.4. Echelon forms 1.4.1. Echelon forms and reduced echelon forms. 
The Gauss-Jordan method is a procedure using Gauss operations that transforms the augmented matrix of any m × n linear system into the augmented matrix of an equivalent system whose solutions can be found just by inspection. A precise way to define the notion of a linear system with solutions that can be “found by inspection” is captured in the notions of echelon form and reduced echelon form of its augmented matrix. Definition 1.6. An m × n matrix is in echelon form iff the following conditions hold: (i) All zero rows are at the bottom of the matrix; (ii) The first non-zero coefficient on a row is always to the right of the first non-zero coefficient of the row above it. The pivot coefficient is the first non-zero coefficient on every non-zero row in a matrix in echelon form. Example 1.4.1: The 6 × 8, 3 × 5 and 3 × 3 matrices given below are in echelon form, where the ∗ means any non-zero number and pivots are displayed in boldface. ∗∗∗∗∗∗∗∗ 0 0 ∗ ∗ ∗ ∗ ∗ ∗ ∗∗∗∗∗ ∗∗∗ 0 0 0 ∗ ∗ ∗ ∗ ∗ 0 0 ∗ ∗ ∗ , 0 ∗ ∗ . 0 0 0 0 0 0 ∗ ∗ , 00000 00∗ 0 0 0 0 0 0 0 0 00000000 Example 1.4.2: The following matrices are in echelon form, with pivot coefficients displayed in boldface: 211 13 23 2 , , 0 3 4 . 01 0 4 −2 000 Definition 1.7. An m × n matrix is in reduced echelon form iff the matrix is in echelon form and the following two conditions hold: (i) The pivot coefficient is equal to 1; (ii) The pivot coefficient is the only non-zero coefficient in that column. We denote by EA the reduced echelon form of a matrix A. Example 1.4.3: The 6 × 8, 3 × 5 and 3 × 3 matrices given below are in echelon form, where the ∗ means any non-zero number and pivots are displayed in boldface. 1∗00∗∗0∗ 0 0 1 0 ∗ ∗ 0 ∗ 1∗0∗∗ 100 0 0 0 1 ∗ ∗ 0 ∗ 0 0 1 ∗ ∗ , 0 1 0 . 0 0 0 0 0 0 1 ∗ , 00000 001 0 0 0 0 0 0 0 0 00000000 G. NAGY – LINEAR ALGEBRA December 8, 2009 21 Example 1.4.4: And the following matrices are not only in echelon form but also in reduced echelon form; again, pivot coefficient are displayed in boldface: 100 10 104 , , 0 1 0 . 01 015 000 Summarizing, the Gauss-Jordan method consists in changing the augmented matrix of a linear system into reduced echelon form using Gauss operations. Then, is not difficult to decide whether the linear system has solutions or not. Example 1.4.5: Consider a 3 × 3 linear system with augmented matrix having a reduced echelon form given below. Then, the solution of this linear system is simple to obtain: x1 = −1 100 −1 0 1 0 ⇒ x2 = 3 3 x = 2. 001 2 3 Recall that we use the notation EA for the reduced echelon form of an m × n matrix A. The reduced echelon form of a matrix has an important property: It is unique. We state this property in Proposition 1.8 below and we prove it in the case of 2 × 2 matrices. Notice that the echelon form of a matrix is not unique. Only the reduced echelon form is unique. Also notice that given a matrix A, there are many different Gauss operations schemes that produce the reduced echelon form EA , as is sketched in Fig. 13. A EA Different Gauss operation schemes Figure 13. The reduced echelon form of a matrix can be obtained in many different ways. Example 1.4.6: We find the reduced echelon form of matrix A below using two different Gauss operation schemes: A= A= 2 1 4 3 2 1 4 3 10 12 → 7 13 10 13 → 7 24 5 1 → 7 0 7 1 → 10 0 2 −2 2 1 5 10 → 2 01 5 1 → −4 0 2 1 1 = EA , 2 5 10 → 2 01 1 = EA . 2 The matrix EA is independent of the sequences of Gauss operations used to find it. Proposition 1.8. The reduced echelon EA form of an m × n matrix A is unique. 22 G. 
NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 1.8: We only show the proof in the case of 2 × 2 matrices. Nevertheless, we start with the following observation that holds for any m × n matrix A: Since Gauss operations do not change the solutions of the homogeneous system [A|0], if a matrix A ˜ has two different reduced echelon forms, EA and EA , then the set of solutions of the systems ˜ [EA |0] and [EA |0] must coincide with the solutions of the system [A|0]. What we are going to show now is the following: All possible 2 × 2 reduced echelon form matrices E have different solutions to the homogeneous equation [E|0]. This property then establishes that every 2 × 2 matrix A has a unique reduced echelon form. Given any 2 × 2 matrix A, all possible reduced echelon forms are the following: EA = 1c , 00 c ∈ R, EA = 01 , 00 EA = 10 . 01 We claim that all these matrices determine different sets of solutions to the equation [EA |0]. In the first case, the set of solutions are lines given by x1 = −c x2 , c ∈ R. (1.14) In the second case, the set of solutions is a single line given by x2 = 0, x1 ∈ R. Notice that this line does not belong to the set given in Eq. (1.14). In the third case, the solution is a single point x1 = 0, x2 = 0. Since all these sets of solutions are different, and only one corresponds to the equation [A|0], then there is only one reduced echelon form EA for matrix A. This establishes the Proposition for 2 × 2 matrices. We assume now that the Proposition 1.8 is true for m × n matrices. Since the reduced echelon form of a matrix is unique, the number of pivots of EA is also unique. We give that number a particular name, since it will be important later on. Definition 1.9. The rank of an m × n matrix A, denoted as rank(A), is the number of pivots in its reduced echelon form EA . Given a consistent linear system with augmented matrix [A|b], the number n − rank(A) is called the number of free variables in the solutions of that system. Example 1.4.7: Consider a 3 × 3 linear system with augmented matrix having a reduced echelon form given below. The coefficient matrix has rank two, hence the solutions of the linear system contains one free variable. That means, there are infinitely many solutions to this system. Since the coefficient matrix is in reduced echelon form, the solutions of the linear system are simple to obtain, and they are given by x1 = −1 − 3x3 , 10 3 −1 ⇒ 0 1 −2 x2 = 3 + 2x3 , 3 x : free variable. 00 0 0 3 The number of non-pivots columns in EA is actually the number of variables not fixed by the linear system with augmented matrix [A|b]. These variables are not fixed in the sense that for every value of these free variables the rest of the variables are fixed by the linear system. This is the origin of the name “free variable”. In Example 1.4.7 the free variable is x3 , since for every value of x3 the other two variables x1 and x2 are fixed by the system. Notice that only the number of free variables is relevant, but not which particular variable G. NAGY – LINEAR ALGEBRA December 8, 2009 23 is the free one. In the following example we express the solutions in Example 1.4.7 in terms of x1 as a free variable. Example 1.4.8: The solutions given in Example 1.4.7 can be expressed in the following alternative way: x1 : free variable, 2 x2 = 3 + (−1 − x1 ), 3 1 x3 = (−1 − x1 ). 3 Example 1.4.9: We now present three examples of consistent linear systems. 
• The 2 × 2 linear system below is has a coefficient matrix with rank one, the system is consistent and has one free variable: 2x1 − x2 = 1 1 1 1 − x1 + x1 = − . 2 4 4 The proof of the statement above is the following: Gauss-Jordan operations transform the augmented matrix of the system as follows, 2 −1 2 −1 1 4 1 2 → −1 0 4 −1 0 1 , 0 so the system is consistent, has rank one, and has a free variable, and therefore, infinitely many solutions. • The 2 × 3 linear system below has a coefficient matrix with rank two, the linear system is consistent, hence the system has one free variable: x1 + 2x2 + 3x3 = 1, 3x1 + x2 + 2x3 = 2. • The 3 × 2 linear system below has a coefficient matrix with rank two, the system is consistent, hence the linear system has no free variables: x1 + 3x2 = 2, 2x1 + 2x2 = 0, 3x1 + x2 = −2. 1.4.2. Inconsistent systems. We now present a condition on the augmented matrix of a linear system that determines if the linear system is inconsistent. We first present an example. Example 1.4.10: Show that the following 2 × 2 linear system has no solutions, 2x1 − x2 = 0 1 1 1 − x1 + x1 = − . 2 4 4 (1.15) (1.16) 24 G. NAGY – LINEAR ALGEBRA december 8, 2009 Solution: One way to see that there is no solution is the following: Multiplying the second equation by −4 one obtains the equation 2x1 − x2 = 1, whose solutions form a parallel line to the line given in Eq. (1.15). Therefore, the system in Eqs. (1.15)-(1.16) has no solution. A second way to see that the system above has no solution is using Gauss operations. The system above has augmented matrix 2 −1 2 −1 1 4 0 2 → −1 0 4 −1 0 0 . 1 The echelon form above corresponds to the linear system 2x1 − x2 = 0 0 = 1. The solutions of this second system coincides with the solutions of the first system in Eqs. (1.15)-(1.16). Since the second system has no solution, then the system in Eqs. (1.15)(1.16) has no solution. This example is a particular cases of the following result. Theorem 1.10. An m × n linear system with augmented matrix [A|b] is inconsistent iff the reduced echelon form of its augmented matrix contains a row of the form [0, · · · , 0 | 1]; equivalently rank(A) < rank(A|b). Furthermore, a consistent system contains: (i) A unique solution iff it has no free variables; equivalently rank(A) = n. (ii) Infinitely many solutions iff it has at least one free variable; equivalently rank(A) < n. The idea of the proof is to study all possible reduced echelon forms EA of an arbitrary matrix A, and then to study all possible augmented matrices [EA |c]. One then concludes that there are three main cases, no solutions, unique solutions, or infinitely many solutions. Proof of Theorem 1.10: We only give the proof in the case of 3 × 3 linear systems. The reduced echelon form EA of a 3 × 3 matrix A determines a 3 × 4 augmented matrix [EA |c]. There are 14 possible forms for this matrix. We start with the case of three pivots: 100 ∗ 10∗ ∗ 1∗0 ∗ 010 ∗ 0 1 0 ∗ , 0 1 ∗ ∗ , 0 0 1 ∗ , 0 0 1 ∗ . 001 ∗ 000 1 000 1 000 1 In the first case we have a unique solution, and rank(A) = 3. The other three cases correspond to no solutions. We now continue with the case of two pivots, which contains six possibilities, with the first three given by 010 ∗ 1∗0 ∗ 10∗ ∗ 0 1 ∗ ∗ , ∗ , 0 0 1 ∗ , 0 0 1 000 0 000 0 000 0 which correspond to infinitely many solutions, with one free other three possibilities are given by 0 01∗ ∗ 1∗∗ ∗ 0 0 0 1 , 0 1 , 0 0 0 0 000 0 000 0 variable, rank(A) = 2. The 01 00 00 ∗ 1 ; 0 G. 
NAGY – LINEAR ALGEBRA December 8, 2009 25 which correspond to no solution. Finally, we have the case of one pivot. It contains four possibilities, the first three of them are given by 001 ∗ 01∗ ∗ 1∗∗ ∗ 0 0 0 0 , 0 0 0 0 ; 0 , 0 0 0 0 000 0 000 0 000 which correspond to infinitely many solutions, with two free variables and rank(A) = 1. The last possibility is the trivial case 000 1 0 0 0 0 , 000 0 which has no solutions. This establishes the Theorem for 3 × 3 linear systems. Example 1.4.11: Sow that the 2 × 2 linear system below is inconsistent: 2x1 − x2 = 1 1 1 1 − x1 + x1 = − . 2 4 4 Solution: The proof of the statement above is the following: Gauss-Jordan operations transform the augmented matrix of the system as follows, 2 −1 2 −1 0 2 → −1 0 4 1 4 −1 0 0 2 → −1 0 4 −1 0 0 . 1 Since there is a line of the form [0, 0|1], the system is inconsistent. We can also say that the coefficient matrix has rank one, but the definition of free variables does not apply, since the system is inconsistent. Example 1.4.12: Find all numbers h and k such that the system below has only one, many, or no solutions, x1 + hx2 = 1 x1 + 2x2 = k. Solution: Start finding the associated augmented matrix and reducing it into echelon form, 1 1 h 2 1 1 h → k 0 2−h 1 . k−1 Suppose h = 2, for example set h = 1, then 11 01 1 10 → k−1 01 2−k , k−1 so the system has a unique solution for all values of k . (The same conclusion holds if one sets h to any number different of 2.) Suppose now that h = 2, then, 12 00 If k = 1 then 12 00 1 0 ⇒ 1 . k−1 x1 = 1 − 2x2 , x2 : free variable. 26 G. NAGY – LINEAR ALGEBRA december 8, 2009 so there are infinitely many solutions. If k = 1, then 12 00 1 k−1=0 and the system is inconsistent. Summarizing, for h = 2 the system has a unique solution for every k . If h = 2 and k = 1 the system has infinitely many solutions, and if h = 2 and k = 1 the system has no solution. Further reading. See Sections 2.1, 2.2 and 2.3 in Meyer’s book [3] for a detailed discussion on echelon forms and rank, reduced echelon forms and inconsistent systems, respectively. Again, see Section 1.2 in Lay’s book [2] for a summary of echelon forms and the GaussJordan method. Section 1.3 in Strang’s book [4] also helps. G. NAGY – LINEAR ALGEBRA December 8, 2009 27 Exercises. 1.4.1.- Find the rank and the pivot columns of the matrix 2 3 1211 A = 42 4 2 2 5 . 3634 1.4.4.- Construct a 3 × 4 matrix A and 3vectors b, c, such that [A|b] is the augmented matrix of a consistent system and [A|c] is the augmented matrix of an inconsistent system. 1.4.2.- Find all the solutions x of the linear system Ax = 0, where the matrix A is given by 2 3 12 13 A = 42 1 − 1 0 5 10 01 1.4.5.- Let A be an m × n matrix having rank(A) = m. Explain why the system with augmented matrix [A|b] is consistent for every m-vector b. 1.4.3.- Find all the solutions x of the linear system Ax = b, where the matrix A and the vector b are given by 3 2 23 1 2 4 3 A = 42 −1 −75 , b = 4−45 . 3 2 0 1 1.4.6.- Consider the following system of linear equations, where k represents any real number, 2x1 + kx2 = 1, −4x1 + 2x2 = 4. (a) Find all possible values of the number k such that the system above is inconsistent. (b) Set k = 2. » – In this case, find the x1 solution x = . x2 28 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.5. Non-homogeneous equations The main subject of this Section is to introduce the definitions of homogeneous and nonhomogeneous linear systems and discuss few relations between their solutions. However, we start this Section introducing a new notation. 
We first define a vector form for the solution of a linear system and we then introduce a matrix-vector product in order to express a linear system in a compact notation. One advantage of this notation appears in this Section: It is simple to realize that solutions of a non-homogeneous linear system are translations of solutions of the associated homogeneous system. A much deeper insight obtained from the matrix-vector product will be discussed in the next Chapter, but we can summarize it as follows: A matrix can be thought as a function acting on the space of vectors. 1.5.1. Matrix-vector product. We start generalizing the vector notation to include the unknowns of a linear system. That is, consider the m × n linear system A11 x1 + · · · + A1n xn = b1 , . . . Am1 x1 + · · · + Amn xn = bm , and introduce the m × n coefficients matrix A, variable n-vector x and source m-vector b by A11 · · · A1n x1 b1 . . = A ,··· ,A , x = . , b = . . . . . A= . . . :1 :n . . Am1 · · · Amn xn bm We will also use the shortcut A = [Aij ], x = [xj ] and b = [bi ] for the component notation. The main idea now is to use the matrix and vectors above to express a linear system in a compact notation. The key to achieve this is the following definition. Definition 1.11. Given an m × n matrix A = [Aij ] and an n-vector x = [xj ], where i = 1, · · · , m an j = 1, · · · , n, the matrix-vector product of A with x is the m-vector A11 x1 + · · · + A1n xn . . Ax = . . Am1 x1 + · · · + Amn xn Notice that the matrix-vector product can also be written as a linear combination of the matrix column vectors A:j , more precisely as follows, Ax = A:1 x1 + · · · + A:n xn . Using this notation, the m × n linear system above can be written as follows, Ax = b. Therefore, an m × n linear system with augmented matrix [A|b] can now be expressed as using the matrix-vector product as Ax = b, where x is the variable n-vector and b is the usual source m-vector. Finally, an important matrix is the n × n identity matrix Iii = 1 In = [Iij ] with Iij = 0 i = j. The cases n = 2, 3 are given by I2 = 10 , 01 10 I3 = 0 1 00 0 0 . 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 29 Example 1.5.1: Use the matrix-vector product to express the linear system 2x1 − x2 = 3, −x1 + 2x2 = 0. Solution: The matrix of coefficient A, the variable vector x and the source vector b are respectively given by A= 2 −1 −1 , 2 x= x1 , x2 b= 3 . 0 The coefficient matrix can be written as A= 2 −1 −1 = A:1 , A:2 , 2 A:1 = 2 , −1 A:2 = −1 . 2 The linear system above can be written in the compact way Ax = b, that is, 2 −1 −1 2 x1 3 = . x2 0 We now verify that the notation above actually represents the original linear system: 3 2 = 0 −1 −1 2 x1 2 −1 2x1 −x2 2x1 − x2 = x+ x2 = + = , x2 −1 1 2 −x1 2x2 −x1 + 2x2 which indeed is the original linear system. Example 1.5.2: Use the matrix-vector product to express the 2 × 3 linear system 2x1 − 2x2 + 4x3 = 6, x1 + 3x2 + 2x3 = 10. Solution: The matrix of coefficients A, variable vector x and source vector b are given by x1 2 −2 4 6 A= , x = x2 , b = . 1 32 10 x3 Therefore, the linear system above can be written as Ax = b, that is, x 2 −2 4 1 6 x2 = . 1 32 10 x3 We now verify that this notation reproduces the linear system above: x 6 2 −2 4 1 2 −2 4 2x1 − 2x2 + 4x3 x2 = = x+ x2 + x= , 10 1 32 11 3 23 x1 + 3x2 + 2x3 x3 which indeed is the linear system above. Example 1.5.3: Use the matrix-vector product to express the 3 × 2 linear system x1 − x2 −x1 + x2 = 0, = 2, x1 + x2 = 0. 30 G. 
NAGY – LINEAR ALGEBRA december 8, 2009 Solution: The matrix-vector product representation of the linear system above is the following, 1 −1 0 x −1 1 1 = 2 . x2 1 1 0 We now verify that the notation above actually represents the original linear system: x1 − x2 −1 1 −1 1 0 x1 2 = −1 1 = −1 x1 + 1 x2 = −x1 + x2 , x2 1 x1 + x2 1 1 1 0 which is indeed the linear system above. 1.5.2. Homogeneous and non-homogeneous linear systems. We now classify all possible linear systems into two main classes, homogeneous and non-homogeneous, depending whether the source vector vanishes or not. This classification will be useful to express solutions of non-homogeneous systems in terms of solutions of the homogeneous system. Definition 1.12. The m × n linear system Ax = b is called homogeneous iff b = 0 and is called non-homogeneous iff b = 0. Notice that every m × n homogeneous linear system has at least one solution, given by x = 0. This solution is called the trivial solution of the homogeneous system. The following example shows that homogeneous linear systems can also have non-trivial solutions. Example 1.5.4: Find the solutions of the system Ax = 0, with coefficient matrix A= −2 1 4 . −2 Solution: The linear system above can be written in the usual way as follows, −2x1 + 4x2 = 0, (1.17) x1 − 2x2 = 0. (1.18) The solution of this system can be found using Gauss elimination operations, as follows x1 = 2x2 , −2 4 −2 4 1 −2 → → ⇒ 1 −2 00 0 0 x2 :free variable. Therefore, the coefficient matrix of this system has rank one, and so the solutions have one free variable. The set of all solutions of the linear system above is given by x= x1 2x2 2 = = x, x2 x2 12 x2 ∈ R. Therefore, the set of all solutions of this linear system can be identified with the set of points that belong to the line shown in Fig. 14. It is also possible that an homogeneous linear system has only the trivial solution, like this example shows. Example 1.5.5: Find the solutions of the system Ax = 0 with coefficient matrix A= 2 −1 −1 . 2 G. NAGY – LINEAR ALGEBRA December 8, 2009 31 x2 2 1 1 2 x1 Figure 14. We plot the solutions of the homogeneous linear system given in Eqs. (1.17)-(1.18). Solution: The linear system above can be written in the usual way as follows, 2x1 − x2 = 0, −x1 + 2x2 = 0. The solution of this system can be found using Gauss elimination operations, as follows x 1 = 0, 2 −1 1 −2 1 −2 10 → → → ⇒ −1 2 2 −1 0 3 01 x 2 = 0. Therefore, the coefficient matrix of this system has rank two, and so the solutions have no free variable. The solution is unique and is the trivial solution x = 0. Examples 1.5.4 and 1.5.5 are particular cases of the following statement: An m × n homogeneous linear system has non-trivial solutions iff the system has at least one free variable. Example 1.5.6: Find all solutions to the 2 × 3 homogeneous linear system Ax = 0 with coefficient matrix 2 −2 4 A= . 1 32 Solution: The linear system above can be written in the usual way as follows, 2x1 − 2x2 + 4x3 = 0, (1.19) x1 + 3x2 + 2x3 = 0. (1.20) We now compute the solutions to the homogeneous linear system above. We start with Gauss elimination operations x1 = −2x3 , 1 32 1 32 102 x2 = 0, → → ⇒ 2 −2 4 0 −8 0 010 x :free variable. 3 Therefore, the set of all solutions of the linear system above can be written in vector notation as follows, x1 −2x3 −2 x = x2 = 0 = 0 x3 , x3 ∈ R. x3 x3 1 In Fig. 15 we emphasize that the solution vector x belong to the space R3 , while the column vectors of the coefficient matrix of this same system belong to the space R2 . 32 G. 
NAGY – LINEAR ALGEBRA december 8, 2009 x3 y2 3 3 1 2 2 A1 x2 1 1 A3 A2 x 2 x1 −2 −1 1 4 y1 Figure 15. The picture on the left represents the solutions of the homogeneous linear system given in Eq. (1.19)-(1.20), which are 3-vectors, elements in R3 . The picture on the right represents the column vectors of the coefficient matrix in this system which are 2-vectors, elements in R2 . Example 1.5.7: Find all solutions to the linear system Ax = 0, with coefficient matrix A= 1 2 3 6 4 . 8 Solution: We only need to use the Gauss-Jordan method on the coefficient matrix 1 2 3 6 4 1 → 8 0 3 0 4 0 ⇒ x1 = −3x2 − 4x3 x2 , x3 free variables. In this case the solution can be expressed in vector notation as follows x1 −3x2 − 4x3 −3 −4 = 1 x2 + 0 x3 . x2 x = x2 = x3 x3 0 1 The solutions of the homogeneous linear system are all linear combinations of the vectors −3 −4 u1 = 1 u2 = 0 . 0 1 We now introduce a set of vectors which is useful to express solutions of homogeneous linear systems. Definition 1.13. Let U = {u1 , · · · , uk } ⊂ Rn be a finite set, with k 1. The span of U , denoted as Span(U ), is the the set of all possible linear combinations of elements in U . Using this definition we can express the solutions x of the system in Example 1.5.7 as −3 −4 x ∈ Span u1 = 1 , u2 = 0 . 0 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 33 In this case, the set of all solutions forms a plane in R3 , which contains the vectors u1 and u2 . In the case of Example 1.5.4 the solutions x belong to a line in R2 given by 2 1 x ∈ Span . Knowing the solutions of an homogeneous linear system gives important information about the solutions of an inhomogeneous linear system with the same coefficient matrix. The next result establishes this relation in a precise way. Theorem 1.14. If the m × n linear system Ax = b is consistent and the vector xp is one particular solution of this linear system, then any solution x to this system can be decomposed as x = xp + xh , where the vector xh is a solution of the homogeneous linear system Axh = 0. Proof of Theorem 1.14: We know that the vector xp is a solution of the linear system, that is, Axp = b. Suppose that there is any other solution x, that is, Ax = b. Then, their difference xh = x − xp satisfies the homogeneous equation Axh = 0, since, Axh = A(x − xp ) = Ax − Axp = b − b = 0. We have used the property that the matrix-vector product is distributive, that is, A(x − xp ) = Ax − Axp . This property follows from the definition of the matrix-vector product, and will be proved in the next Chapter. With this property we have established the Theorem. We say that the solution to a non-homogeneous linear system is written in vector form or in parametric form when it is expressed as in Theorem 1.14, that is, x = xp + xh , where the vector xh is a solution of the homogeneous linear system, and the vector xp is any solution of the non-homogeneous linear system. Example 1.5.8: Find all solutions of the 2 × 2 linear system below, and write them in parametric form, where the linear system is given by 2 1 6 x1 . = 3 x2 −4 −2 (1.21) Solution: We first find the solutions of this inhomogeneous linear system using Gauss elimination operations, 1 2 −2 −4 |3 1 → |6 0 −2 0 | | 3 0 x1 = 2x2 + 3, ⇒ x2 free variable. Therefore, the set of all solutions of the linear system above is given by x= x1 2x2 + 3 = x2 x2 ⇒ x= 2 3 x+ . 12 0 In this case we see that xp = 3 , 0 xh = 2 x. 
12 The vector xp is the particular solution to the non-homogeneous system given by x2 = 0, while it is not difficult to check that xh above is solution of the homogeneous equation Axh = 0. In Fig. 16 we represent these solutions on the plane. We can see that the solution of the non-homogeneous system is the translation by xp of the solutions of the homogeneous system. 34 G. NAGY – LINEAR ALGEBRA december 8, 2009 x 2 1 1 1 2 3 Figure 16. The line through the origin represents solutions to the homogeneous system associated with Eq. (1.21). The second line represents the solutions to the non-homogeneous system in Eq. (1.21), which is the translation by xp of the line passing by the origin. Further reading. See Sections 2.4 and 2.5 in Meyer’s book [3] for a detailed discussion on homogeneous and non-homogeneous linear systems, respectively. See Section 1.4 in Lay’s book [2] for a detailed discussion of the matrix-vector product, and Section 1.5 for detailed discussions on homogeneous and non-homogeneous linear systems. G. NAGY – LINEAR ALGEBRA December 8, 2009 35 Exercises. 1.5.1.- Find the general solution of the homogeneous linear system x1 + 2x2 + x3 + 2x4 = 0 1.5.5.- Suppose that the solution to a system of linear equation is given by x1 = 5 + 4x3 2x1 + 4x2 + x3 + 3x4 = 0 x2 = −2 − 7x3 3x1 + 6x2 + x3 + 4x4 = 0. x3 1.5.2.- Find all the solutions x of the linear system Ax = b, where A and b are given by 2 3 23 1 −2 −1 1 1 8 5 , b = 42 5 , A = 42 1 −1 1 1 and write these solutions in parametric form, that is, in terms of column vectors. 1.5.3.- Prove the following statement: If the vectors c and d are solutions of the homogeneous linear system Ax = 0, then c + d is also a solution. 1.5.4.- Find the general solution of the nonhomogeneous linear system x1 + 2x2 + x3 + 2x4 = 3 2x1 + 4x2 + x3 + 3x4 = 4 3x1 + 6x2 + x3 + 4x4 = 5. free. Use column vectors to describe this set as a line in R3 . 1.5.6.- Suppose that the solution to a system of linear equation is given by x1 = 3x4 x2 = 8 + 4x4 x3 = 2 − 5x4 x4 free. Use column vectors to describe this set as a line in R4 . 1.5.7.- Consider the following system of linear equations, where k represents any real number, 2 32 3 2 3 223 x1 0 44 8 125 4x2 5 = 4−45 . 62k x3 4 (a) Find all possible values of the number k such that the system above has a unique solution. (b) Find all possible values of the number k such that the system above has infinitely many solutions, and express those solutions in parametric form. 36 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.6. Discretization of linear ODE A differential equation is an equation where the unknown is a function, and the function itself together with its derivatives appear in the equation. Solutions of differential equations can be approximated by solutions of appropriate n × n linear systems in the limit that n approaches infinity. Computers are used to solve such a large linear system, whose solution is a vector in Rn , that is, an array of n numbers. These numbers are used to construct an approximation of the solution to the differential equation. In this Section we show how to approximate a simple differential equation by an n × n linear system. We are interested to find an approximate solution to the following problem, which is called a boundary value problem. Given a continuously differentiable function f : [0, 1] → R, find a function y : [0, 1] → R solution of the boundary value problem y (x) = f (x), y (0) = 0, y (1) = 0, (1.22) d2 y dx2 . 
where second derivatives are denoted as y = This is a simple problem that can be solved without approximation by computing two anti-derivatives of the function f . Such integrations introduce two constants, which are then determined by the boundary conditions y (0) = y (1) = 0. An approximate solution can be obtained in many different ways, we choose here the method of finite differences. 1.6.1. Method of finite differences. Fix a positive integer n ∈ N, and introduce the homogeneous grid {xi = i/n}n in the interval [0, 1] ⊂ R. Denote by h = 1/n the grid step i=0 size, and introduce the numbers fi = f (xi ). Finally, denote yi = y (xi ). The numbers xi and fi are known from the problem, and so are the y0 = y (0) = 0 and yn = y (1) = 0, while the yi for i = 1, · · · , n − 1 are the unknowns. We now use the original differential equation to construct an (n − 1) × (n − 1) linear system for these unknowns yi . There are many different ways to construct such linear system, since the derivative of a function can be approximated in many different ways: y (x + ∆x) − y (x) yi+1 − yi ∆x=h −→ d+ yi = , ∆x h yi − yi−1 y (x) − y (x − ∆x) ∆x=h −→ d- yi = , = lim ∆x→0 ∆x h yi+1 − yi−1 y (x + ∆x) − y (x − ∆x) ∆x=h −→ dc yi = = lim . ∆x→0 2∆x 2h While the expressions on the left hand side are all equal to y (x), the expressions on the left are all different. The latter differ by quantities that approach zero as ∆x gets smaller. There are many other approximations to y (x) than those given above. Something similar occurs when one constructs approximations for y (x). For example, one can do the following: y (x) = lim ∆x→0 d+ yi+1 − d+ yi , h yi+2 + yi − 2yi+1 = . h2 In this Section we choose a different approximation for y (x). We use a discrete derivative formula obtained from the Taylor series of y . We know that for any ∆x small enough the following expression holds for any three times continuously differentiable function y , namely, y (x) = lim ∆x→0 y (x + ∆x) − y (x) ∆x ∆x=h −→ d2 yi = + y (x) (∆x)2 + O (∆x)3 , 2 y (x) y (x − ∆x) = y (x) − y (x) ∆x + (∆x)2 + O (∆x)3 , 2 y (x + ∆x) = y (x) + y (x) ∆x + G. NAGY – LINEAR ALGEBRA December 8, 2009 37 where O (∆x)n denotes a function that approaches zero like (∆x)n as ∆x → 0. Adding and subtracting the two expressions above we obtain the formulas y (x + ∆x) − y (x − ∆x) = 2y (x) ∆x + O (∆x)3 , y (x + ∆x) + y (x − ∆x) = 2y (x) + y (x) (∆x)2 + O (∆x)3 . Choosing ∆x = h, we obtain the following formulas for the approximate derivatives yi+1 + yi−1 − 2yi yi+1 − yi−1 , d2 yi = . (1.23) dyi = 2h h2 We now can state the approximate problem we will solve: Given the constants {fi }n−1 , i=1 together with y0 = 0 and yn = 0, find the constants {yi }n−1 solutions of the linear system i=1 d2 yi = fi , i = 1, · · · , n − 1. (1.24) We first show that Eq. (1.24) is indeed a linear system for yi , since it is equivalent to the equations yi+1 + yi−1 − 2yi = h2 fi , i = 1, · · · , n − 1. For example, choose n = 6, so h = 1/6, and recalling that y0 = y6 = 0, then the system above together with its augmented matrix are given by the following expressions, respectively, −2y1 + y2 = f1 /36, −2 1 0 0 0 f1 /36 y1 − 2y2 + y3 = f2 /36, 1 −2 1 0 0 f2 /36 0 y2 − 2y3 + y4 = f3 /36, ⇔ 1 −2 1 0 f3 /36 . 0 0 1 −2 1 f4 /36 y3 − 2y4 + y5 = f4 /36, 0 0 0 1 −2 f5 /36 y4 − 2y5 = f5 /36, We then conclude that the solution y of the boundary value problem in Eq. (1.22) can be approximated by the solution yi of the 5 × 5 linear system above. 
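The construction above is easy to carry out on a computer. The following is a minimal sketch in Python with NumPy (the code is not part of the original notes): it assembles the (n − 1) × (n − 1) tridiagonal system of Eq. (1.24) for an arbitrary n and solves it; the function name solve_bvp and the sample source term f(x) = x are illustrative choices only.

    import numpy as np

    def solve_bvp(f, n):
        # Approximate y''(x) = f(x) on [0,1] with y(0) = y(1) = 0,
        # using the centered second difference of Eq. (1.23).
        h = 1.0 / n
        x = np.linspace(0.0, 1.0, n + 1)          # grid points x_i = i/n
        # (n-1) x (n-1) tridiagonal matrix: -2 on the diagonal, 1 next to it.
        A = (np.diag(-2.0 * np.ones(n - 1))
             + np.diag(np.ones(n - 2), 1)
             + np.diag(np.ones(n - 2), -1))
        b = h**2 * f(x[1:-1])                     # right-hand side h^2 f_i
        y = np.zeros(n + 1)                       # boundary values y_0 = y_n = 0
        y[1:-1] = np.linalg.solve(A, b)
        return x, y

    # With n = 6 this assembles exactly the 5 x 5 system displayed above.
    x, y = solve_bvp(lambda t: t, 6)
    print(np.round(y, 4))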
The same type of approximate solution can be found for all n ∈ N. How well does a function constructed with the set of points {yi }n approximate the i=0 function y : [0, 1] → R solution to the boundary value problem in Eq. (1.22) in the limit n → ∞? The answer to this question is a crucial subject of numerical analysis, but is not studied in this notes. Further reading. See Section 1.4 in Meyer’s book [3] for a detailed discussion on discretizations of two-point boundary values problems. 38 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 1.6.1.- Consider the boundary value problem for the function y given by y (x) = 25 x, y (0) = 0, y (1) = 0, 1.6.2.- Consider the boundary value problem for the function y given by y (x) + 2y (x) = 25 x, y (0) = 0, y (1) = 0, x ∈ [0, 1]. x ∈ [0, 1]. Divide the interval [0, 1] into five equal subintervals and use the finite difference method to find an approximate solution vector y = [yi ] to the boundary value problem above, where i = 0, · · · , 5. Use the discrete derivatives presented in Eq. (1.23). Divide the interval [0, 1] into five equal subintervals and use the finite difference method to find an approximate solution vector y = [yi ] to the boundary value problem above, where i = 0, · · · , 5. Use the discrete derivatives presented in Eq. (1.23). G. NAGY – LINEAR ALGEBRA December 8, 2009 39 1.7. Floating-point numbers Floating-point numbers are a finite subset of the rational numbers. Many different types of floating-point numbers exist, all of them are characterized by having a finite number of digits when written in a particular base. Digital computers use floating-point numbers to carry out almost every arithmetic operation. When an m × n algebraic linear system is solved using a computer, every Gauss operations is performed in a particular set of floating-point numbers. In this Section we study what type of approximations occur in this process. Definition 1.15. A non-zero rational number x is a floating-point number in base b ∈ N, of precision p ∈ N, with exponent n ∈ Z in the range −N n N ∈ N, iff there exist integers di , for i = 1, · · · , p, satisfying 0 di b − 1 and d1 = 0, such that the number x has the form x = ±0.d1 · · · dp × bn . (1.25) We call p the precision, b the base and N the exponent range of the floating point number x. We denote by Fp,b,N the set of all floating-point numbers of fixed precision p, base b and exponent range N . In this notes we always work in base b = 10. Computers usually work with base b = 2, but also with base b = 16, and they present their results with base b = 10. Example 1.7.1: The following numbers belong to the set F2,10,3 , 215 = 0.215 × 10−3 . 106 The set F3,10,3 is a finite subset of the rational numbers. The biggest number and the smallest number in absolute value are the following, respectively, 210 = 0.210 × 103 , 1 = 0.100 × 10, −0.02 = −0.200 × 10−1 , 0.999 × 103 = 999 0.100 × 10−3 = 0.0001. Any number bigger 999 or closer to 0 than 0.0001 does not belong to F3,10,3 . Here are other examples of numbers that do not belong to F3,10,3 , 1000 = 0.100 × 104 210.3 = 0.2103 × 103 , 1.001 = 0.1001 × 10, 0.000027 = 0.270 × 10−4 . In the first case the number is too big, we need an exponent n = 4 to specify it. In the second and third cases we need a precision p = 4 to specify those numbers. In the last case the number is too close to zero, we need an exponent n = −4 to specify it. Example 1.7.2: The set of floating-point numbers F1,10,1 is small enough to picture it on the real line. 
The set of all positive elements in F1,10,1 is shown on Fig. 17, and this is the union of the following three sets, {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09} = {0.i × 10−1 }9=1 ; i {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} = {0.i × 100 }9=1 ; i {1, 2, 3, 4, 5, 6, 7, 8, 9} = {0.i × 101 }9=1 . i One can see in this example that the elements on F1,10,1 are not homogeneously distributed on the interval [0, 10]. The irregular distribution of the floating-point numbers plays an important role when one computes the addition of a small number to a big number. Example 1.7.3: In Table 1 we show the main set of floating-point numbers used nowadays in computers. For example, the format called Binary64 represents all floating-point numbers 40 G. NAGY – LINEAR ALGEBRA december 8, 2009 0 1 0 0.1 2 3 4 5 6 7 8 9 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 17. All the positive elements in F1,10,1 . in the set F53,2,1024 . One of the bigger numbers in this set is 21023 . To have an idea how big is this number, let us rewrite it as 10x , that is, 21023 = 10x ⇔ 1023 ln(2) = x ln(10) ⇔ x = 307.95... So the biggest number in F53,2,1024 is close to 10308 . Format name Base b Digits Max. exp. p N Binary 32 2 24 128 Binary 64 2 53 1024 Binary 128 2 113 16384 Decima l64 10 16 385 Decimal 128 10 34 6145 Table 1. List of the parameters b, p and N that determine the floatingpoint sets Fp,b,N , which are most used in computers. The first column presents a standard name given in the scientific computing community to these floating-point formats. We say that a set A ⊂ R is closed under addition iff for every two elements in A the sum of these two numbers also belongs to A. The definition of a set A ⊂ R being closed under multiplication is similar. The sets of floating-point numbers Fp,b,N ⊂ R are not closed under addition or multiplication. This means that the sum of two numbers in Fp,b,N might not belong to Fp,b,N . And the multiplication of two numbers in Fp,b,N might not belong to Fp,b,N . Here are some examples. Example 1.7.4: Consider the set F2,10,2 . It is not difficult to see that F2,10,2 is not closed under multiplication, as the first line below shows. The rest of the example below shows that the sum of two numbers in F2,10,2 does not belong to that set. x = 10−3 = 0.10 × 10−2 ∈ F2,10,2 ⇒ x2 = 10−6 = 0.0001 × 10−2 ∈ F2,10,2 , / G. NAGY – LINEAR ALGEBRA December 8, 2009 x = 10−3 = 0.10 × 10−2 ∈ F2,10,2 , y = 1 = 0.10 × 101 ∈ F2,10,2 , ⇒ 41 x + y = 0.001 + 1 = 1.001 = 0.1001 × 10 ∈ F2,10,2 . / 1.7.1. The rounding-off function. Since the set Fp,b,N is not closed under addition or multiplication, not every arithmetic calculation involving real numbers can be performed in Fp.,b,N . A way to perform a sequence of arithmetic operations in Fp,b,N is first, to project the real numbers into the floating-point numbers, and then to perform the calculation. Since the result might not be in the set Fp,b,N , one must project again the result onto Fp,b,N . The action to project a real number into the floating-point set is called to round-off the real number. There are many different ways to do this. We now present a common round-off function. Definition 1.16. Given the floating-point number set Fp,b,N , let xN = 0.9 · · · 9 × bN be the biggest number in Fp,b,N . The rounding function f : R ∩ [−xN , xN ] → Fp,b,N is defined as follow: Given x ∈ R ∩ [−xN , xN ], with x = ±0.d1 · · · dp dp+1 · · · × bn and −N n N, holds ± 0.d1 · · · dp × bn if dp+1 < 5, f (x) = −p n ± (0.d1 · · · dp + b ) × b if dp+1 5. 
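Definition 1.16 is simple to emulate in code. Below is a minimal sketch in Python (not part of the original notes) for base b = 10, built on the standard decimal module; the helper name round_fp is an invented one, and the sketch ignores the exponent range N, so no overflow or underflow check is made.

    from decimal import Decimal, ROUND_HALF_UP

    def round_fp(x, p):
        # Keep p significant decimal digits; the magnitude is rounded up when
        # the digit d_{p+1} is 5 or more, as in Definition 1.16.
        # Zero is handled separately, since the Definition assumes x nonzero.
        if x == 0:
            return Decimal(0)
        d = Decimal(str(x))
        exp = d.adjusted() - (p - 1)    # exponent of the p-th significant digit
        return d.quantize(Decimal(1).scaleb(exp), rounding=ROUND_HALF_UP)

    # Round-offs in F_{3,10,3}; compare with Example 1.7.5 below.
    for x in [210.3, 210.37, 210.5, 210.51]:
        print(x, "->", round_fp(x, 3))

    # Rounding the operands first can change the result (p = 2):
    x, y = 21/2, 11/2
    print(round_fp(round_fp(x, 2) + round_fp(y, 2), 2), "vs", round_fp(x + y, 2))

The last line anticipates Proposition 1.17 below: rounding the operands before adding them need not give the same result as rounding the exact sum.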
Example 1.7.5: We now present few numbers not in F3,10,3 and their respective round-offs x = 0.2103 × 103 , f (x) = 0.210 × 103 , x = 0.21037 × 103 , f (x) = 0.210 × 103 , x = 0.2105 × 103 , f (x) = 0.211 × 103 , x = 0.21051 × 103 , f (x) = 0.211 × 103 . The rounding function has the following properties: Proposition 1.17. Given Fp,b,N there always exist x, y ∈ R such that f f (x) + f (y ) = f (x + y ), f f (x)f (y ) = f (xy ). We do not prove this Proposition, we only provide a particular example, in the case that the floating-point number is F2,10,2 . Example 1.7.6: The real numbers x = 21/2 and y = 11/2 add up to x + y = 16. Only one of them belongs to F2,10,2 , since x = 0.105 × 102 ∈ F2,10,2 / ⇒ f (x) = 0.11 × 102 , y = 0.55 × 10 ∈ F2,10,2 ⇒ f (y ) = 0.55 × 10 = y. We now verify that for these numbers holds that f f (x) + f (y ) = f (x + y ), since x + y = 0.16 × 102 ∈ F2,10,2 ⇒ f (x + y ) = 0.16 × 102 = x + y, f f (x) + f (y ) = f (0.11 × 102 + 0.55 × 10) = f (0.165 × 102 ) = 0.17 × 102 . Therefore, we conclude that 17 = f f (x) + f (y ) = f (x + y ) = 16. 42 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.7.2. Solving linear systems using floating-point numbers. Arithmetic operations are, in general, not possible in Fp.b.N , since this set is not closed under these operations. Rounding after each arithmetic operation is a procedure that determines numbers in Fp,b,N which are close to the result of the arithmetic operations. The difference between these two numbers is the error of the procedure, is the error of making calculations in a finite set of numbers. One could think that the Gauss-Jordan method could be carried out in the set Fp,b,N , if one round-off each intermediate calculation. This Subsection presents an example that this is not the case. The Gauss-Jordan method is, in general, not possible in any set Fp,b,N using rounding after each arithmetic operation. Example 1.7.7: Use the floating-point set F3,10,3 and Gauss operations to find the solution of the 2 × 2 linear system 5 x1 + x2 = 6, 9.43 x1 + 1.57 x2 = 11. Solution: Note that the solution is x1 = 1, x2 = 1. The calculation we do now shows that the Gauss-Jordan method is not possible in the set F3,10,3 using round-off functions; the Gauss-Jordan method does not work in this example. The fist step in the Gauss-Jordan method is to compute the augmented matrix of the system and perform a Gauss operation to make the coefficient in the position of a21 vanish, that is, 5 1 9.43 1.57 6 5 → 11 0 1 −0.316 6 . −0.316 This operation cannot be performed in the set F3,10,3 using rounding. In order to understand this, let us review what we did in the calculation above. We multiplied the first row by −9.43/5 and add that result to the second row. The result using real numbers is that the new coefficient obtained after this calculation is a21 = 0. If we do this calculation in the set ˜ F3,10,3 using rounding, we have to do the following calculation: 9.43 a21 = f 9.43 − f f (5) f ˜ , 5 that is, we round the quotient −9.43/5, then we multiply by 5, we round again, then we subtract that from 9.43, and we finally round the result: a21 = f 9.43 − f 5 f (1.886) ˜ = f 9.43 − f 5 (1.89) = f 9.43 − f (9.45) = f 9.43 − 9.45 = f (−0.02) = −0.02. Therefore, with this Gauss operation on F3,10,3 using rounding one obtains a21 = −0.02 = 0. ˜ The same type of calculation on the other coefficients a22 , ˜2 , produces the following new ˜ b augmented matrix 5 1 9.43 1.57 6 5 → 11 −0.02 1 −0.32 6 . 
−0.3 The Gauss-Jordan method cannot follow unless the coefficient a21 = 0. But this is not ˜ possible in our example. A usual procedure used in scientific computation is to modify G. NAGY – LINEAR ALGEBRA December 8, 2009 43 the Gauss-Jordan method. The modification introduces further approximation errors in the calculation. The modification in our example is the following: Replace the coefficient a21 = −0.02 by a21 = 0. The modified Gauss-Jordan method in our example is given by ˜ ˜ 5 6 → 0 11 5 1 9.43 1.57 1 −0.32 6 . −0.3 (1.26) What we have done here is not a rounding error. It is a modification of the Gauss-Jordan method to find an approximate solution of the linear system in the set F3,10,3 . The rest of the calculation to find the solution of the linear system is the following: 5 0 51 6 → −0.3 01 1 −0.32 6 ; 0.3 0.32 f since 0.3 = f (0.9375) = 0.938, 0.32 f we have that 5 1 0 6 1 f → 0.3 0.32 5 0 1 1 6 50 → 0.938 01 f (6 − 0.938) ; 0.938 since f ) (6 − 0.938) = f (5.062) = 5.06, we also have that 50 01 5.06 10 → 0.938 01 f 5.06 5 0.938 ; since f 5.06 = f (1.012) = 1.01, 5 we conclude that 1 0 0 1 1.01 0.938 ⇒ x1 = 0.101 × 10, . x2 = 0.938. We conclude that the solution in the set F3,10,3 differs from the exact solution x1 = 1, x2 = 1. The errors in the result are produced by rounding errors and by the modification of the Gauss-Jordan method discussed in Eq. (1.26). We finally comment that the round-off error becomes important when adding a small number to a big number, or when dividing by a small number. Here is an example of the former case. Example 1.7.8: Add together the numbers x = 103 and y = 4 in the set F3,10,4 . Solution: Since x = 0.100 × 104 and y = 0.400 × 10, both numbers belong to F3,10,4 and so f (x) = x, f (y= y . Therefore, their addition is the following, f (x + y ) = f (1000 + 4) = f (0.1004 × 103 ) = 1 × 103 = x. That is, f (x + y ) = x, and the information of y is completely lost. 44 G. NAGY – LINEAR ALGEBRA december 8, 2009 1.7.3. Techniques to help minimize rounding errors. We have seen that solving an algebraic linear system in a floating-point number set Fp,b,N introduces rounding errors in the solution. There are several techniques that help keep these errors from becoming an important part of the solution. We comment here on four of these techniques. We present these techniques without any proof. The first two techniques are implemented before one start to solve the linear system. They are called column scaling and row scaling. The column scaling consists in multiplying by a same scale factor a whole column in the linear system. One performs a column scaling when one column of a linear system has coefficients that are far bigger or far smaller than the rest of the matrix. The factor in the column scaling is chosen in order that all the coefficients in the linear system are similar in size. One can interpret the column scaling on the column i of a linear system as changing the physical units of the unknown xi . The row scaling consists in multiplying by the same scale factor a whole equation of the system. One performs a row scaling when one row of the linear system has coefficients that are far bigger or far smaller than the rest of the system. Like in the column scaling, one chooses the scaling factor in order that all the coefficients of the linear system are similar in size. The last two techniques refer to what sequence of Gauss operations is more efficient in reducing rounding errors. 
It is well-known that there are many different ways to solve a linear system using Gauss operations in the set of the real numbers R. For example, we now solve the linear system below using two different sequences of Gauss operations: 24 13 24 13 10 12 → 7 13 10 13 → 7 24 7 1 → 10 0 5 12 → 7 01 2 −2 5 10 → 2 01 5 12 → −4 01 1 , 2 5 10 → 2 01 1 . 2 The solution obtained is independent of the sequences of Gauss operations used to find them. This property does not hold in the floating-point number set Fp,b,N . Two different sequences of Gauss operations on the same augmented matrix might produce different approximate solutions when they are performed in Fp,b,N . The main idea behind the last two techniques we now present to solve linear systems using floating-point numbers is the following: To find the sequence of Gauss operations that minimize the rounding errors in the approximate solution. The last two techniques are called partial pivoting and complete pivoting. The partial pivoting is the row interchange in a matrix in order to have the biggest coefficient in that column as pivot. Here below we do a partial pivoting for the pivot on the first column: −2 10 2 1 10 4 1 1 103 102 → 1 103 102 , −2 10 4 1 10 2 1 that is, use a pivot for the first column the coefficient with 10 instead of the coefficient with 10−2 of the coefficient with 1. Now proceed in the usual way: 1 0.4 0.1 1 0.4 0.1 10 4 1 1 103 102 → 0 999.6 99.9 . 103 102 → 1 10−2 2 1 0 1.996 0.999 10−2 2 1 The next step is again to use as pivot the biggest coefficient in the second column. In this case it is the coefficient 999.6, so no further row interchanges are necessary. Repeat this procedure till the last column. The complete pivoting is the row and column interchange in a matrix in order to have as pivot the biggest coefficient in the lower-right block from the pivot position. Here below G. NAGY – LINEAR ALGEBRA December 8, 2009 we do a complete pivoting −2 10 2 1 103 10 4 for the pivot on the first column: 3 1 1 103 102 10 102 → 10−2 2 1 → 2 1 10 4 1 4 1 10−2 10 45 102 1 . 1 For the first pivot, the lower-right part of the matrix is the whole matrix. In the example above we used as pivot for the first column the coefficient 103 in position (2, 2). We needed to do a a row interchange and then a column interchange. Now proceed in the usual way: 3 1 0.001 0.1 1 10−3 10−1 10 1 102 2 10−2 1 → 0 0.008 0.8 . 1 → 2 10−2 0 9.996 0.6 4 10 1 4 10 1 The next step in complete pivoting is to choose the coefficient for the pivot position (2, 2). We have to look in the lower-right block form th pivot position, that is, in the block 0.008 9.996 0.8 . 0.6 The biggest coefficient in that block is 9.996, so we not need to do column interchanges, that is, 1 0.001 0.1 1 0 0.008 0.8 → 0 0 9.996 0.6 0 Repeat this procedure till the last column. have to do a row interchange but we do 0.001 9.996 0.008 0.1 0.6 . 0.8 46 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 1.7.1.- Consider the following system: 10 −3 x1 − x2 = 1, x1 + x2 = 0. (a) Solve this system in F3,10,6 with rounding, but without partial or complete pivoting. (b) Find the system that is exactly satisfied by your solution in (a), and note how close is this system to the original system. (c) Use partial pivoting to solve this system in F3,10,5 with rounding. (d) Find the system that is exactly satisfied by your solution in (c), and note how close is this system to the original system. 
(e) Solve this system in R without partial or complete pivoting, and compare this exact solution with the solutions in (a) and (c). (f) Round the exact solution up to three digits, an compare it with the results from (a) and (c). 1.7.2.- Consider the following system: x1 + x2 = 3, −10 x1 + 105 x2 = 105 . (a) Solve this system in F4,10,6 with partial pivoting but no scaling. (b) Solve this system in F4,10,6 with complete pivoting but no scaling. (c) Use partial pivoting to solve this system in F3,10,5 with rounding. (d) This time row scale the original system, and then solve it in F4,10,6 with partial pivoting. (e) Solve this system in R and compare this exact solution with the solutions in (a)-(d). 1.7.3.- Consider the linear system −3 x1 + x2 = −2, 10 x1 − 3 x2 = 7. Solve this system in F3,10,6 without partial pivoting, and then solve it again with partial pivoting. Compare your results with the exact solution. G. NAGY – LINEAR ALGEBRA December 8, 2009 47 Chapter 2. Matrix algebra 2.1. Linear transformations In Sect. 1.5 we introduced the matrix-vector product. We defined such product because it provided a convenient notation; we used the matrix-vector product to express a system of linear equations in a compact way. We see in this Section that the matrix-vector product has a deeper meaning. It is the key to interpret a matrix as a function acting on vectors. We will see that a matrix is a particular type of function, called linear function or linear transformation. In this Section we study several examples of such functions determined by matrices. From now on we consider both real-valued and complex-valued matrices and vectors. We use the notation F ∈ {R, C} to mean that F = R or F = C, and elements in F are called scalars. So a scalar is a real number or a complex number. We denote by Fn the set of all n-vectors x = [xi ] with components xi ∈ F, where i = 1, · · · , n. Finally we denote by Fm,n the set of all m × n matrices A = [Aij ] with components Aij ∈ F, where i = 1 · · · m and j = 1 , · · · , n. The matrix-vector product of a matrix A = [Aij ] ∈ Fm,n and a vector x = [xj ] ∈ Fn is defined as follows A11 . Ax = . . ··· Am1 ··· A1n x1 A11 x1 + · · · + A1n xn . . = . . . . . . . . Amn xn Am1 x1 + · · · + Amn xn If we express the m × n matrix A in terms of its column vectors, A = [A:1 , · · · , A:n ], the matrix-vector product has the form Ax = A:1 , · · · , A:n x1 . . = A:1 x1 + · · · + A:n xn . . xn This product was introduced to represent in a compact way a linear system with augmented matrix [A|b], which can be thus expressed as Ax = b. This matrix-vector product has an important property; it preserves the linear combination of vectors. This property is summarized in the following result. Proposition 2.1. Given any matrix A ∈ Fm,n , vectors x, y ∈ Fn and scalars a, b ∈ F, the matrix-vector product satisfies that A(ax + by) = a Ax + b Ay. This Proposition says that the matrix-vector product of a linear combination of vectors is the linear combination of the matrix-vector products. The expression above contains the particular cases a = b = 1 and b = 0, which are respectively given by A(x + y) = Ax + Ay, A(ax) = a Ax. 48 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 2.1: From the definition of the matrix-vector product we see that: ax1 + by1 . . A(ax + by) = [A:1 , · · · , A:n ] . axn + byn = A:1 (ax1 + by1 ) + · · · + A:n (axn + byn ) = a A:1 x1 + · · · + A:n xn + b A:1 y1 + · · · + A:n yn = a Ax + b Ay. This establishes the Proposition. 2.1.1. 
A matrix is a function. The matrix-vector product provides a new interpretation for a matrix: An m × n matrix A ∈ Fm,n defines a function A : Fn → Fm as follows, Fn x → Ax = y ∈ Fm , where we introduced the notation Fn mapped to.” x meaning that x belongs to Fn , and → meaning “is Example 2.1.1: The 2 × 3 matrix A ∈ R2,3 given by A= 2 1 −2 4 32 defines a function A : R3 → R2 as follows, x x1 2x1 − 2x2 + 4x3 2 −2 4 1 x2 = ∈ R2 . R3 x = x2 → Ax = x1 + 3x2 + 2x3 1 32 x3 x3 For example, 1 given x = 1 ∈ R3 , 1 then Ax = 2 1 1 4 −2 4 1= ∈ R2 . 6 32 1 1 0 defines a function A : R2 → R2 , which can 0 −1 be interpreted as a reflection along the horizontal line. Indeed, the action of the matrix A on an arbitrary element in x ∈ R2 is the following, Example 2.1.2: The 2 × 2 matrix A = Ax = 1 0 0 −1 x1 x1 = . x2 −x2 Here are particular cases, the first and the last ones are represented in Fig. 18, A 2 2 = , 1 −1 A −1 −1 = , −3 3 A 1 1 = . 0 0 (2.1) 01 defines a function A : R2 → R2 , which can 10 be interpreted as a reflection along the line x1 = x2 , see Fig. 19. Indeed, the action of the matrix A on an arbitrary element in x ∈ R2 is the following, Example 2.1.3: The 2 × 2 matrix A = Ax = 0 1 1 0 x1 x = 2. x2 x1 G. NAGY – LINEAR ALGEBRA December 8, 2009 x 49 2 Av u 1 Aw=w −1 x1 −1 Au v Figure 18. We sketch the action of matrix A in Example 2.1.2 on the vectors given in Eq. (2.1), which we called u, v, and w, respectively. Since A : R2 → R2 , we used the same plane to plot both u and Au. Here are particular cases, the first and the last ones are represented in Fig. 18, A 2 1 = , 1 2 A −3 −1 = , −1 −3 x A 2 2 = . 2 2 (2.2) x 1= x 2 2 Au Aw=w 1 u −1 x1 −1 v Av Figure 19. We sketch the action of matrix A in Example 2.1.3 on the vectors given in Eq. (2.2), which we called u, v, and w, respectively. Since A : R2 → R2 , we used the same plane to plot both u and Au. 0 −1 defines a function A : R2 → R2 , which 1 0 can be interpreted as a rotation by an angle π/2 counterclockwise, see Fig. 20. Indeed, the action of the matrix A on an arbitrary element in x ∈ R2 is the following, Example 2.1.4: The 2 × 2 matrix A = Ax = 0 1 −1 0 x1 −x2 = . x2 x1 Here are particular cases, the first and the last ones are represented in Fig. 18, A 2 −1 = , 1 2 A −3 −1 = , 1 −3 A 2 −2 = . 2 2 (2.3) 50 G. NAGY – LINEAR ALGEBRA december 8, 2009 x2 Au w Aw 1 u v −1 x 1 −1 Av Figure 20. We sketch the action of matrix A in Example 2.1.3 on the vectors given in Eq. (2.2), which we called u, v, and w, respectively. Since A : R2 → R2 , we used the same plane to plot both u and Au. Example 2.1.5: Given any real number θ, the 2 × 2 matrix A= cos(θ) − sin(θ) , sin(θ) cos(θ) defines a function A : R2 → R2 , which can be interpreted as a rotation by an angle θ counterclockwise. In order to verify that this matrix A rotates a vector x by an angle θ counterclockwise, let us first compute the action of the matrix A on an arbitrary element in x ∈ R2 , and let us call the result y ∈ R2 , that is, Ax = y. An explicit calculation shows, cos(θ) − sin(θ) sin(θ) cos(θ) x1 x cos(θ) − x2 sin(θ) y =1 = 1. x2 x1 sin(θ) + x2 cos(θ) y2 (2.4) The components y1 and y2 above are precisely the components of the vector that is the rotation of the vector x by an angle θ counterclockwise. The rest of this Example is to verify this assertion, that is, we will show the following: If y is the rotation by θ of the vector x, as shown in Fig. 21, then the components of vector y and x must be related by Eq. (2.4). 
Indeed, in that Figure it is simple to check that the following relation holds: y1 = cos(θ + φ) y , y2 = sin(θ + φ) y , where y is the magnitude of the vector y. Since a rotation does not change the magnitude of the vector, then y = x and so, y1 = cos(θ + φ) x , y2 = sin(θ + φ) x . Recalling now the formulas for the cosine and the sine of a sum of two angles, cos(θ + φ) = cos(θ) cos(φ) − sin(θ) sin(φ), sin(θ + φ) = sin(θ) cos(φ) + cos(θ) sin(φ), we obtain that y1 = cos(θ) cos(φ) x − sin(θ) sin(φ) x , y2 = sin(θ) cos(φ) x + cos(θ) sin(φ) x . G. NAGY – LINEAR ALGEBRA December 8, 2009 51 Recalling that x1 = cos(φ) x , x2 = sin(φ) x , we obtain the formula y1 = cos(θ) x1 − sin(θ) x2 , y2 = sin(θ) x1 + sin(θ) x2 . This are precisely the expression that defines matrix A. Therefore, the action of the matrix A above is a rotation by θ counterclockwise. x2 y x 0 0 x1 Figure 21. The vector y is the rotation by an angle θ counterclockwise of the vector x. 20 defines a function A : R2 → R2 , which can 02 be interpreted as a dilation, see Fig. 22. Indeed, the action of the matrix A on an arbitrary element in x ∈ R2 is the following, Ax = 2x. Example 2.1.6: The 2 × 2 matrix A = x2 1 2 x1 Figure 22. We sketch the action of matrix A in Example 2.1.6. Given any vector x with end point on the circle of radius one, the vector Ax = 2x is a vector parallel to x and with end point on the dashed circle of radius two. 52 G. NAGY – LINEAR ALGEBRA december 8, 2009 20 defines a function A : R2 → R2 , which can 01 be interpreted as a shear, see Fig. 23. Indeed, the action of the matrix A on an arbitrary element in x ∈ R2 is the following, Example 2.1.7: The 2 × 2 matrix A = Ax = 2 0 0 1 x1 2x1 = . x2 x2 x2 1 2 x1 Figure 23. We sketch the action of matrix A in Example 2.1.7. Given any vector x with end point on the circle of radius one, the vector Ax = 2x is a vector parallel to x and with end point on the dashed curve. 2.1.2. A matrix is a linear transformation. We have seen several examples of functions given by matrices. All these functions have a common property: They preserve linear combinations. That means, given any m × n matrix A ∈ Fm,n , the function A : Fn → Fm defined as A(x) = Ax satisfies that A(ax + by) = a A(x) + b A(y) n for all vectors x, y ∈ F and all scalars a, b ∈ F. This property is inherited from the matrixvector product. Any function satisfying this property will be called a linear function, or linear transformation. Definition 2.2. A function T : Fn → Fm is called a linear transformation iff for all vectors x, y ∈ Fn and for all scalars a, b ∈ F holds T (ax + by) = a T (x) + b T (y). The expression above contains the particular cases a = b = 1 and b = 0, which are respectively given by T (x + y) = T (x) + T (y), T (ax) = a T (x). Example 2.1.8: Show that the only linear transformations T : R → R are straight lines through the origin. Solution: Since y = T (x) is linear and x ∈ R, we have that y = T (x) = T (x × 1) = x T (1). If we denote T (1) = m, we then conclude that a linear transformation T : R → R must be y = mx, which is a straight line through the origin with slope m. G. NAGY – LINEAR ALGEBRA December 8, 2009 53 Example 2.1.9: Any m × n matrix A ∈ Fm,n defines a linear transformation T : Fn → Fm by the equation T (x) = Ax. In particular, all the functions defined in Examples 2.1.1-2.1.7 are linear transformations. Consider the most general 2 × 2 matrix, A11 A21 A= A12 ∈ F2,2 A22 and explicitly show that the function T (x) = Ax is a linear transformation. 
Solution: The explicit form of the function T : F2 → F2 given by T (x) = Ax is the following, x1 A11 A12 x1 A11 x1 + A12 x2 T = = x2 A21 A22 x2 A21 x1 + A22 x2 This function is linear, as it can be seen from the following explicit computation, x1 y +d 1 x2 y2 T (cx + dy) = T c =T cx1 + dy1 cx2 + dy2 = A11 (cx1 + dy1 ) + A12 (cx2 + dy2 ) A21 (cx1 + dy1 ) + A22 (cx2 + dy2 ) = A11 cx1 + A12 cx2 A dy + A12 dy2 + 11 1 A21 cx1 + A22 cx2 A21 dy1 + A22 dy2 A11 x1 + A12 x2 A11 y1 + A12 y2 +d A21 x1 + A22 x2 A21 y1 + A22 y2 =c = c T (x) + d T (y). This establishes that T is a linear transformation. Example 2.1.10: Find a function T : R2 → R2 that projects a vector onto the line x1 = x2 , see Fig. 24. Show that this function is linear. Finally, find a matrix A such that T (x) = Ax. Solution: From Fig. 24 one can see that a possible way to compute the projection of a vector x onto the line x1 = x2 is the following: Add to the vector x its reflection along the line x1 = x2 , and divide the result by two, that is, T x1 x2 = 1 2 x1 x +2 x2 x1 ⇒ T x1 x2 = (x1 + x2 ) 1 . 1 2 We have obtained the projection function T . We now show that this function is linear: Indeed T (ax + by) = T a =T x1 y +b 1 x2 y2 ax1 + by1 ax2 + by2 (ax1 + by1 + ax2 + by2 ) 1 1 2 (ax1 + ax2 ) 1 (by1 + by2 ) 1 = + 1 1 2 2 = a T (x) + b T (y). = 54 G. NAGY – LINEAR ALGEBRA december 8, 2009 This shows that T is linear. We now find a matrix A such that T (x) = Ax, as follows T x1 x2 (x1 + x2 ) 1 1 2 1 x1 + x2 = 2 x1 + x2 = 1 1 1 x1 ⇒ 2 1 1 x2 This matrix projects vectors onto the line x1 = x2 . = x2 A= 111 . 211 x 1= x2 x f(x) x1 Figure 24. The function T projects the vector x onto the line x1 = x2 . Further reading. See Sections 1.8 and 1.9 in Lay’s book [2]. Also Sections 3.3 and 3.4 in Meyer’s book [3]. G. NAGY – LINEAR ALGEBRA December 8, 2009 55 Exercises. 2.1.1.- Determine which of the following functions T : R2 → R2 is linear: (a) “»x –” » 3x – 1 2 T = . x2 2 + x1 (b) T “ »x – ” 1 x2 (c) T » = “ »x – ” 1 x2 – x1 + x2 . x1 − x2 » – x1 x2 = . 0 2.1.2.- Let T : R2 → R2 be the linear transformation given by T (x) = Ax, where » – 12 A= . 23 »– 5 . Find x ∈ R2 such that T (x) = 7 2.1.3.- Given the matrix and vector » – »– 1 −5 −7 −2 A= , b= , −3 7 5 −2 define the function T : R3 → R2 as T (x) = Ax, and then find all vectors x such that T (x) = b. 2.1.4.- Let T : R2 → R2 be a linear transformation such that “»1–” »3– “ »0 –” »1 – T = ,T = . 0 1 1 3 “ »x –” 1 Find the values of T for any vecx2 »– »– »– x1 1 0 tor = x1 + x2 ∈ R2 . x2 0 1 2.1.5.- Describe geometrically what is the action of T over a vector x ∈ R2 , where » –» – −1 0 x1 (a) T (x) = ; 0 −1 x2 » –» – 2 0 x1 (b) T (x) = ; 0 2 x2 » –» – 0 0 x1 ; (c) T (x) = 0 1 x2 –» – » 0 1 x1 . (d) T (x) = 1 0 x2 »– – »– 7 −2 x1 , ,u= ,v= 2.1.6.- Let x = −3 5 x2 2 2 and let T : R → R be the linear transformation T (x) = x1 v + x2 u. Find a matrix A such that T (x) = Ax for all x ∈ R2 . » 56 G. NAGY – LINEAR ALGEBRA december 8, 2009 2.2. Linear combinations One could say that the idea to introduce matrix operations originates from the interpretation of a matrix as a function. A matrix A ∈ Fm,n determines a function A : Fn → Fm . Such functions are generalizations of scalar valued functions of a single variable, f : R → R. It is well-known how to compute the linear combination of two functions f , g : R → R and, when possible, how to compute their composition and their inverses. 
Matrices determine a particular generalizations of scalar functions where the operations mentioned above on scalar functions can be defined on matrices. The result is called the linear combination of matrices, and when possible, the product of matrices and the inverse of a matrix. Since matrices are generalizations of scalar valued functions, there are few operations on matrices that reduce to the identity operation in the the case of scalar functions. Among such operations belong the transpose of a matrix and the trace of a matrix. 2.2.1. Linear combination of matrices. The addition of two matrices and the multiplication of a matrix by scalar are defined component by component. Definition 2.3. Let A = [Aij ] and B = [Bij ] be m × n matrices in Fm,n and a, b ∈ F be any scalars. The linear combination of A and B is also and m × n matrix in Fm,n , denoted as aA + bB, and given by aA + bB = [a Aij + b Bij ]. Using the notation aA + bB = [(aA + bB )ij ], that is, the components of the matrix aA + bB are denoted by (aA + bB )ij , then the definition above can be expressed in terms of the matrices components as follows (aA + bB )ij = a Aij + b Bij . This definition contains the particular cases a = b = 1 and b = 0, given by, respectively, (A + B )ij = Aij + Bij , Example 2.2.1: Consider the matrices 2 −1 A= −1 2 (aA)ij = a Aij . B= 3 2 0 . −1 (a) Find the matrix A + B and 3A. (b) Find a matrix C such that 2C + 6A = 4B. Solution: Part (a): The definition above gives, A+B= 2 −1 −1 3 + 2 2 0 5 .= −1 1 −1 , 1 3A = 6 −3 −3 . 6 Part (b): Matrix C is given by C= 1 4B − 6A 2 The definition above implies that C = 2B − 3A = 2 3 2 0 2 −3 −1 −1 −1 6 = 2 4 therefore, we conclude that C= 0 7 3 . −8 0 6 −3 −2 −3 −3 , 6 G. NAGY – LINEAR ALGEBRA December 8, 2009 57 We now summarize the main properties of the matrix linear combination. Proposition 2.4. For every matrices A, B ∈ Fm,n and every scalars a, b ∈ F, hold: (a) (ab)A = a(bA), (associativity); (b) a(A + B) = aA + aB, (distributivity); (c) (a + b)A = aA + bA, (distributivity); (d) 1 A = A, (1 is then identity). The definition of linear combination of matrices is defined as the linear combination of their components, which are real of complex numbers. Therefore, all properties in Proposition 2.4 on linear combinations of matrices are obtained from the analogous properties on the linear combination of real or complex numbers. Proof of Proposition 2.4: We use components notation. Property (a): (ab)A ij = (ab)Aij = a(bAij ) = a(bA)ij = a(bA) ij . Property (b): a(A + B ) ij = a(A + B )ij = a(Aij + Bij ) = aAij + aBij = (aA)ij + (aB )ij = aA + aB ij . Property (c): (a + b)A ij = (a + b)Aij = aAij + bAij = (aA)ij + (bA)ij = aA + bA ij . Property (d): (1A)ij = 1 Aij = Aij . 2.2.2. The transpose, adjoint, and trace of a matrix. Since matrices are generalizations of scalar-valued functions, one can define operations on matrices that, unlike linear combinations, have no analogs on scalar-valued functions. One of such operations is the transpose of a matrix, which is a new matrix with the rows and columns interchanged. Definition 2.5. The transpose of a matrix A = [Aij ] ∈ Fm,n is a matrix AT ∈ Fn,m with components AT kl = Alk . Example 2.2.2: Find the transpose of the 2 × 3 matrix A = 1 2 3 4 5 . 6 Solution: Matrix A has components Aij with i = 1, 2 and j = 1, 2, 3. Therefore, its transpose has components (AT )ji = Aij , that is, AT has three rows and two columns, 12 AT = 3 4 . 56 Example 2.2.3: Show that the transpose operation satisfies (AT )T = A. 
Solution: The proof is: (AT )T ij = (AT )ji = Aij . An example of this property is the following: In Example 2.2.2 we showed that 12 T 135 = 3 4 . 246 56 58 G. NAGY – LINEAR ALGEBRA december 8, 2009 Therefore, 1 2 3 4 5 6 T T 1 = 3 5 T 2 13 4 = 24 6 5 . 6 If a matrix has complex-valued coefficients, then the conjugate of a matrix can be defined as the conjugate of each component. Definition 2.6. The complex conjugate of a matrix A = [Aij ] ∈ Fm,n is A = Aij ∈ Fm,n . Example 2.2.4: A matrix A and its conjugate is given below, A= 1 2+i , −i 3 − 4i ⇔ A= 1 i 2−i . 3 + 4i Example 2.2.5: A matrix A has real coefficients iff A = A; It has purely imaginary coefficients iff A = −A. Here are examples of these two situations: A= 1 3 2 4 A= i 2i 3i 4i ⇒ A= 1 3 2 = A; 4 ⇒ A= −i −2i = −A. −3i −4i Definition 2.7. The adjoint of a matrix A ∈ Fm,n is the matrix A∗ = A T ∈ Fn,m . T Since A = (AT ), the order of the operations does not change the result, so we eliminate the parenthesis in the definition of A∗ . Example 2.2.6: A matrix A and its adjoint is given below, A= 1 2+i , −i 3 − 4i ⇔ A∗ = 1 i . 2 − i 3 + 4i The transpose, conjugate and adjoint operations are useful to specify certain classes of matrices with particular symmetries. Here we introduce few of these classes. Definition 2.8. An n × n matrix A is called: (a) symmetric iff holds A = AT ; (b) skew-symmetric iff holds A = −AT ; (c) Hermitian iff holds A = A∗ ; (d) skew-Hermitian iff holds A = −A∗ . Example 2.2.7: We present examples of each of the classes introduced in Def. 2.8. Part (a): Matrices A and B are symmetric. Notice that A is also Hermitian, while B is not Hermitian, 123 1 2 + 3i 3 7 4i = BT . A = 2 7 4 = AT , B = 2 + 3i 348 3 4i 8 G. NAGY – LINEAR ALGEBRA December 8, 2009 Part (b): Matrix C is skew-symmetric, 0 −2 3 0 −4 ⇒ C= 2 −3 4 0 0 CT = −2 3 2 0 −4 59 −3 4 = −C. 0 Notice that the diagonal elements in a skew-symmetric matrix must vanish, since Cij = −Cji in the case i = j means Cii = −Cii , that is, Cii = 0. Part (c): Matrix D is Hermitian but is not symmetric: 1 2−i 3 1 2+i 3 7 4 − i = D, 7 4 + i ⇒ DT = 2 + i D = 2 − i 3 4+i 8 3 4−i 8 however, 1 2+i 3 7 4 + i = D. D∗ = D = 2 − i 3 4−i 8 Notice that the diagonal elements in a Hermitian matrix must be real numbers, since the condition Aij = Aji in the case i = j implies Aii = Aii , that is, 2iIm(Aii ) = Aii − Aii = 0. T We can also verify what we said in part (a), matrix A is Hermitian since A∗ = A = AT = A. Part (d): The following matrix E is skew-Hermitian: i 2+i −3 i −2 + i 3 7i 4 + i ⇒ ET = 2 + i 7i −4 + i E = −2 + i 3 −4 + i 8i −3 4+i 8i T therefore, −i −2 − i 3 −7i −4 − i = −E. E∗ = E 2 − i −3 4−i −8i A skew-Hermitian matrix has purely imaginary elements in its diagonal, and the off diagonal elements have skew-symmetric real parts with symmetric imaginary parts. T The trace of a square matrix is a number, computed as the sum of all the diagonal elements of the matrix. Definition 2.9. The trace of a square matrix A = Aij ∈ Fn,n , denoted as tr (A) ∈ F, is the sum of its diagonal elements, that is, the scalar given by tr (A) = A11 + · · · + Ann . 123 Example 2.2.8: Find the trace of the matrix A = 4 5 6. 789 Solution: We only have to add up the diagonal elements: tr (A) = 1 + 5 + 9 ⇒ tr (A) = 15. The operations of computing the transpose or computing the trace of a matrix can be thought as function on the set of all m × n matrices. The transpose operation is in fact a function T : Fm,n → Fn,m given by T (A) = AT . 
In a similar way, the trace is a function tr : Fn,n → F, where tr (A) = A11 + · · · + Ann . One can verify that T and tr are linear 60 G. NAGY – LINEAR ALGEBRA december 8, 2009 functions. In the case of the transpose function this means that for all A, B ∈ Fm,n and all a, b ∈ F holds T (aA + bB) = a T (A) + b T (B). In the case of the transpose function this means that for all A, B ∈ Fn,n and all a, b ∈ F holds tr (aA + bB) = a tr (A) + b tr (B). G. NAGY – LINEAR ALGEBRA December 8, 2009 61 Exercises. 2.2.1.- Construct an example of a 3 × 3 matrix satisfying: (a) Is symmetric and skew-symmetric. (b) Is Hermitian and symmetric. (c) Is Hermitian but not symmetric. 2.2.2.- Find the numbers x, y , z solution of the equation » –» –T x+2 y+3 36 2 = 3 0 yz 2.2.3.- Given any square matrix A show that A+AT is a symmetric matrix, while A − AT is a skew-symmetric matrix. 2.2.4.- Prove that there is only one way to express a matrix A ∈ Fn,n as a sum of a symmetric matrix and a skewsymmetric matrix. 2.2.5.- Prove the following statements: (a) If A = [Aij ] is a skew-symmetric matrix, then holds Aii = 0. (b) If A = [Aij ] is a skew-Hermitian matrix, then the coefficients Aii are purely imaginary. (c) If A is a real and symmetric matrix, then B = iA is skew-Hermitian. 2.2.6.- Prove that for all A, B ∈ Fm,n and all a, b ∈ F holds (aA + bB)∗ = a A∗ + b B∗ . 2.2.7.- Prove that the transpose function T : Fm,n → Fn,m and trace function tr : Fn,n → F are linear functions. 62 G. NAGY – LINEAR ALGEBRA december 8, 2009 2.3. Matrix multiplication The operation of matrix multiplication originates in the composition of functions. We call this operation a multiplication since it reduces to the multiplication of real numbers in the case of 1 × 1 real matrices. Unlike the multiplication of real numbers, the product of general matrices is not commutative, that is, AB = BA in the general case. This property reflects the fact that the composition of two functions is a non-commutative operation. In this Subsection we first introduce the multiplication of two matrices using the matrix vector product. We then introduce the formula for the components of the product of two matrices. Finally we show that the composition of two matrices is their matrix product. Definition 2.10. The matrix multiplication of the m × n matrix A with the n × matrix B = B:1 , · · · , B: , denoted by AB, is the m × matrix given by AB = AB:1 , · · · , AB: . We have used the matrix-vector product to define the matrix product. Notice that the product is not defined for two arbitrary matrices, but the size of the matrices is important: A times m×n B n× defines AB m× The size of the matrices is important to define the matrix multiplication. The numbers of columns in the first matrix must match the numbers of rows in the second matrix. We assign a name to matrices satisfying this property. Definition 2.11. Matrices A and B are called conformable in the order A B iff the product AB is well defined. Example 2.3.1: Compute the product of the matrices A and B below, which are conformable in both orders AB and BA, where A= 2 −1 −1 , 2 B= 3 2 0 . −1 Solution: Following the definition above we compute the product in the order AB, namely, AB = AB:1 , AB:2 = A 3 0 ,A 2 −1 6−2 0+1 , −3 + 4 0−2 = = 4 1 1 . −2 Using the same definition we can compute the product in the opposite order, that is, BA = BA:1 , BA:2 = B −1 2 ,B 2 −1 = 6−0 −3 + 0 , 4+1 −2 − 2 = 6 5 −3 . −4 This is an example where we have that AB = BA. 
The following result gives a formula to compute the components of the product matrix in terms of the components of the individual matrices. Proposition 2.12. Consider the the m × n matrix A = [Aij ] and the n × matrix B = [Bjk ], where the indices take values as follows: i = 1, · · · , m, j = 1, · · · , n and k = 1, · · · , . The components of the product matrix AB are given by n (AB )ik = Aij Bjk . j =1 (2.5) G. NAGY – LINEAR ALGEBRA December 8, 2009 63 n We recall that the symbol j =1 in Eq. (2.5) means to add up all the terms having the index j starting from j = 1 until j = n, that is, n Aij Bjk = Ai1 B1k + Ai2 B2k + · · · + Ain Bnk . j =1 Proof of Proposition 2.12: The column k of the product AB is given by the column vector AB:k . This is a vector with m components, n j =1 A1j Bjk . . AB:k = , . n j =1 Amj Bjk therefore the i-th component of this vector is given by n (AB )ik = Aij Bjk . j =1 This establishes the Proposition. Example 2.3.2: We now use Eq. (2.5) to find the product of matrices A and B in Example 2.3.1. The component (AB )11 = 4 is obtained from the first row in matrix A and the first column in matrix B as follows: 2 −1 3 0 4 1 = , (2)(3) + (−1)(2) = 4; −1 2 2 −1 1 −2 And finally the component (AB )12 = −1 is obtained as follows: 2 −1 −1 2 3 2 0 4 = −1 1 1 , −2 (2)(0) + (−1)(1) = −1; The component (AB )21 = 1 is obtained as follows: 3 2 2 −1 −1 2 4 0 = 1 −1 1 , −2 (−1)(3) + (2)(2) = 1; And finally the component (AB )22 = −2 is obtained as follows: 2 −1 −1 2 3 2 0 4 = −1 1 1 , −2 (−1)(0) + (2)(−1) = −2. We have seen in Example 2.3.1 that the matrix product is not commutative, since in that example AB = BA. In that example the matrices were conformable in both orders AB and BA, although their products do not match. It can also be possible that two matrices A and B are conformable in the order AB but they are not conformable in the opposite order. That is, the matrix product is possible in one order but not in the other order. Example 2.3.3: Consider the matrices 43 A= 21 B= 1 4 2 5 3 6 These matrices are conformable in the order AB but not in the order BA. In the first case we obtain 43 123 16 23 30 AB = = . 21 456 6 9 12 The product BA is not possible. 64 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 2.3.4: Column vectors and row vectors are particular cases of matrices, more precisely, they are n × 1 and 1 × n matrices, respectively. We denote row vectors as transpose of a column vector. Using this notation and the matrix product, compute both products vT u and u vT , where the vectors are given by u= 2 , 3 5 . 1 v= Solution: In the first case we multiply the matrices 1 × 2 and 2 × 1, so the result is a 1 × 1 matrix, a real number, given by, vT u = 5 1 2 = 13. 3 In the second case we multiply the matrices 2 × 1 and 1 × 2, so the result is a 2 × 2 matrix, given by, u vT = 2 3 10 15 5 1= 2 . 3 It is well-known that the product of two numbers is zero, then one of them must be zero. This property is not true in the case of matrix multiplication, as can be seen in the following Example. Example 2.3.5: Compute the product AB where A= 1 −1 −1 1 B= 1 1 −1 . −1 Solution: It is simple to check that AB = 1 −1 −1 1 1 −1 0 = 1 −1 0 0 . 0 The product is the zero matrix, although A = 0 and B = 0. 2.3.1. Composition and matrix multiplication. The origin of the matrix product is the composition of linear functions, as can be seen in the following result. Proposition 2.13. 
Given the m × n matrix A : Fn → Fm and the n × matrix B : F → Fn , their composition is a function B A A ◦ B : F −→ Fn −→ Fm , which is an m × matrix given by the matrix product of A and B, that is, A ◦ B = AB. Proof of Proposition 2.13: The composition of the function A and B is defined for all x ∈ F as follows (A ◦ B)x = A(Bx). G. NAGY – LINEAR ALGEBRA December 8, 2009 65 Introduce the usual notation B = [B:1 , · · · , B: ]. Then, the composition A ◦ B can be reexpressed as follows, x1 . (A ◦ B)x = A B:1 , · · · , B: . . x = A B:1 x1 + · · · + B: x = A B:1 x1 + · · · + A B: x = AB:1 x1 + · · · + AB: x x1 . = AB:1 , · · · , AB: . . x = (AB)x. This establishes the Proposition. Example 2.3.6: Find the matrix T : R2 → R2 that produces a rotation by an angle θ1 counterclockwise and then another rotation by an angle θ2 counterclockwise. Solution: Let us denote by R(θ) the matrix that performs a rotation on the plane by an angle θ counterclockwise, that is, R(θ) = cos(θ) sin(θ) − sin(θ) . cos(θ) The matrix T is the the composition T = R(θ2 ) ◦ R(θ1 ). Since the matrix of a composition of these two rotations can be obtained computing the matrix product of them, then we obtain T = R(θ2 )R(θ1 ) = cos(θ2 ) − sin(θ2 ) sin(θ2 ) cos(θ2 ) cos(θ1 ) sin(θ1 ) − sin(θ1 ) . cos(θ1 ) The product above is given by T= cos(θ2 ) cos(θ1 ) − sin(θ2 ) sin(θ1 ) sin(θ1 ) cos(θ2 ) + sin(θ2 ) cos(θ1 ) − sin(θ1 ) cos(θ2 ) + sin(θ2 ) cos(θ1 ) − sin(θ2 ) sin(θ1 ) + cos(θ2 ) cos(θ1 ) . The formulas cos(θ1 + θ2 ) = cos(θ2 ) cos(θ1 ) − sin(θ2 ) sin(θ1 ) sin(θ1 + θ2 ) = sin(θ1 ) cos(θ2 ) + sin(θ2 ) cos(θ1 ), imply that cos(θ1 + θ2 ) − sin(θ1 + θ2 ) . sin(θ1 + θ2 ) cos(θ1 + θ2 ) Notice that we have obtained a result that it is intuitively clear: Two consecutive rotations on the plane is equivalent to a single rotation by an angle that is the sum of the individual rotations, that is, R(θ2 )R(θ1 ) = R(θ2 + θ1 ). In particular, notice that the matrix multiplication when restricted to the set of all rotation matrices on the plane is a commutative operation, that is, T= R(θ2 )R(θ1 ) = R(θ1 )R(θ2 ), θ1 , θ2 ∈ R. 66 G. NAGY – LINEAR ALGEBRA december 8, 2009 In Examples 2.3.7 and 2.3.8 below we show that the order of the functions in a composition change the resulting function. This is essentially the reason behind the non-commutativity of the matrix multiplication. Example 2.3.7: Find the matrix T : R2 → R2 that first performs a rotation by an angle π/2 counterclockwise and then performs a reflection along the x1 = x2 line on the plane. Solution: The matrix T is the composition the rotation R(π/2) with the reflection function A, given by 0 −1 01 R(π/2) = , A= . 1 0 10 The matrix T is given by T = AR(π/2) = 01 10 0 1 −1 1 = 0 0 0 . −1 Therefore, the function T is a reflection along the horizontal line x2 = 0. Example 2.3.8: Find the matrix S : R2 → R2 that first performs a reflection along the x1 = x2 line on the plane an then performs a rotation by an angle π/2 counterclockwise. Solution: The matrix S is the composition the reflection function A and then the rotation R(π/2), given in the previous Example 2.3.7. The matrix S is the given by S = R(π/2)A = 0 1 −1 0 01 −1 = 10 0 0 . 1 Therefore, the function S is a reflection along the vertical line x1 = 0. 2.3.2. Properties of the matrix multiplication. We summarize the main properties of the matrix product in the following result. Proposition 2.14. 
The following properties hold for all m × n matrix A, n × m matrix B, n × matrices C, D, and × k matrix E: (a) AB = BA in the general case, so the product is non-commutative; (b) A(C + D) = AC + AD, and (C + D)E = CE + DE; (c) A(CE) = (AC)E, associativity; (d) Im A = AIn = A; (e) (AC)T = CT AT , and (AC)∗ = C∗ A∗ . (f ) In the case m = n A and B are arbitrary n × n matrices. Then tr (AB) = tr (BA). Proof of Proposition 2.14: Part (a): When m = n the matrix AB is m × m while BA is n × n, so they cannot be equal. When m = n, the matrices in Example 2.3.4 and 2.3.5 show that this product is not commutative. Part (b): This property can be shown as follows: A(C + D) = A C:1 , · · · , C: + D:1 , · · · , D: = A (C:1 + D:1 ), · · · , (C: + D: ) = A(C:1 + D:1 ), · · · , A(C: + D: ) = (AC:1 + AD:1 ), · · · , (AC: + AD: ) = AC:1 , · · · , AC: = AC + AD. The other equation is proven in a similar way. + AD:1 , · · · , AD: G. NAGY – LINEAR ALGEBRA December 8, 2009 67 Part (c): This property is proven using the component expression for the matrix product: n A(CE ) ij n = Aik (CE )kj = k=1 Aik k=1 Ckl Elj ; l=1 however, the order of the sums can be interchanged, n n Aik k=1 Ckl Elj = l=1 Aik Ckl Elj ; l=1 k=1 So, from this last expression is not difficult to show that: n Aik Ckl Elj = l=1 k=1 (AC )il Elj = (AC )E ij ; l=1 We have just proved that A(CE ) ij = (AC )E ij . Part (d): We can use components again, recalling that the components of Im are given by (Im )ij = 0 if i = j and is (Im )ii = 1. Therefore, m (Im A)ij = (Im )ik Akj = (Im )iiAij = Aij . k=1 Analogously, n (AIn )ij = Aik (In )kj = Aij (In )jj = Aij . k=1 Part (e): Use components once more: n (AC )T ij n = (AC )ji = k=1 n (AT )kj (C T )ik = Ajk Cki = k=1 (C T )ik (AT )kj = C T AT ij . k=1 The second equation follows from the proof above and the property of the complex conjugate: (AC) = A C. Indeed (AC)∗ = (AC)T = CT AT = CT AT = C∗ A∗ . Part (f): Recall that the trace of a matrix A is given by n tr (A) = A11 + · · · + Ann = Aii . i=1 Then it is simple to see that n n tr (AB) = n i=1 k=1 n n Aik Bki = n Bik Aki = i=1 k=1 Bik Aki = tr (BA). k=1 i=1 This establishes the Proposition. We use the notation A2 = AA, A3 = A2 A, and An = An−1 A. Notice that the matrix product is not commutative, so the formula (a + b)2 = a2 + 2ab + b2 does not hold for matrices. Instead we have: (A + B)2 = (A + B)(A + B) = A2 + AB + BA + B2 . 68 G. NAGY – LINEAR ALGEBRA december 8, 2009 2.3.3. Block multiplication. The multiplication of two large matrices can be simplified in the case that each matrix can be subdivided in appropriate blocks. If these matrix blocks are conformable the multiplication of the original matrices reduces to the multiplication of the smaller matrix blocks. The next result is presents a simple case. Proposition 2.15. If A is an m × n matrix and B is an n × matrix having the following block decomposition, . . .A . m ×n 12 1 2 A11 . m1 × n1 . A = . . . . . . . . . . . . , A = . . . . . . . . . . . . . . . . . . . . . , . . .A . m ×n A21 . m 2 × n1 . 22 2 2 . . . . B11 . B12 n1 × 1 . n1 × 2 . . . . . . . . . . . . , . . . . . . . . . . . . . . . . . . . , B= B= . . .B B21 . n2 × 1 . n2 × 2 . 22 where m1 + m2 = m, n1 + n2 = n and 1 + 2 = , then the product AB has the form . . . A B +A B . m× 11 12 12 22 1 2 A11 B11 + A12 B21 . m1 × 1 . AB = . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . , AB = . . . . . . . . . . . . . . . . . . . . . . . . A B +A B . m× A B +A B . m× . 
21 11 22 21 21 12 22 22 2 1 2 2 The proof is a straightforward computation, so we omit it. This type of block decomposition is useful when several blocks are repeated inside the matrices A and B . This situation appears in the following example. Example 2.3.9: Use block multiplication to find the matrix AB , where 1210 1012 3 4 0 1 0 1 3 4 A= B= 1 0 0 0 , 0 0 1 2 . 0100 0034 Solution: These matrices have the following block structure: . . 1 2 . 1 0 . 1 0 . 1 2 . . . 3 4 . 0 1 0 1 . 3 4 . . A = . . . . . . . . . . . . . . , B = . . . . . . . . . . . . . . , . . 1 0 . 0 0 0 0 . 1 2 . . . . .00 .34 01. 00. so, introduce the matrices I= 1 0 0 , 1 C= 12 , 34 0= 00 , 00 then, the original matrices have the block form . . C . I . I . C . A = . . . . . . . . , B = . . . . . . . . . . . .0 .C I. 0. G. NAGY – LINEAR ALGEBRA December 8, 2009 69 Then, the matrix AB has the form . . C2 + C C . AB = . . . . . . . . . . . . . . . . I. C So, the only calculation we need to do is the matrix C2 + C, which is given by C2 + C = hence . . 8 12 1 2 . . 3 4 . 18 26 . AB = . . . . . . . . . . . . . . . . . 1 0 . 1 2 . . .34 01. 8 12 , 18 26 ⇔ 1 3 AB = 1 0 2 4 0 1 8 18 1 3 12 26 . 2 4 Example 2.3.10: Given arbitrary matrices A, that is n × k , and B, that is k × n, show that the (n + k ) × (n + n) matrix C below satisfies C2 = In+k , where . . . B Ik − BA C = . . . . . . . . . . . . . . . . . . . . . . . . AB − I 2A − ABA . n Solution: Notice that AB is an n × n matrix, while BA is an k × k matrix, so the definition of C implies that . . k×n k × k . C = . . . . . . . . . . . . . . . . . . . n×n n×k . Using block multiplication we obtain: . . . . (Ik − BA) . B (Ik − BA) . B C2 = . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . = . . . (AB − I ) . (AB − I ) (2A − ABA) . (2A − ABA) . n n . . (Ik − BA)(Ik − BA) + B(2A − ABA) . (Ik − BA)B + B(AB − In ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (2A − ABA)B + (AB − I )(AB − I ) (2A − ABA)(I − BA) + (AB − I )(2A − ABA) . k n n n Now, notice that the block (1, 1) above is: (Ik − BA)(Ik − BA) + B(2A − ABA) = Ik − BA − BA + BABA + 2BA − BABA = Ik . The block (1, 2) is: (Ik − BA)B + B(AB − In ) = B − BAB + BAB − B = 0. 70 G. NAGY – LINEAR ALGEBRA december 8, 2009 The block (2, 1) is: (2A − ABA)(Ik − BA) + (AB − In )(2A − ABA) = 2A − 2ABA − ABA + ABABA + 2ABA − ABABA − 2A + ABA = 0. Finally, the block (2, 2) is: (2A − ABA)B + (AB − In )(AB − In ) = 2AB − ABAB + ABAB − AB − AB + In = In . Therefore, we have shown that: . Ik . 0 . C2 = . . . . . . . . . = Ik+n . . .I 0. n 2.3.4. Matrix commutators. Matrix multiplication is in general not commutative. Given two n × n matrices A and B, their product AB is in general different from BA. We call the difference between these two matrices the commutator of A and B. Definition 2.16. Given any n × n matrices A and B, their commutator is the matrix denoted as [A, B] given by [A, B] = AB − BA. Furthermore, the square matrices A, B commute iff [A, B] = 0. The commutator of two operators defined on an inner product space is an important concept in quantum mechanics, since it measures how well the observables corresponding to these operators can be measured simultaneously. The uncertainty principle is a statement about these commutators. 
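As a concrete illustration (not taken from the notes), the commutator of Definition 2.16 can be evaluated directly. The sketch below, in Python with NumPy, uses the matrices of Example 2.3.1, which do not commute.

    import numpy as np

    def commutator(A, B):
        # [A, B] = AB - BA, as in Definition 2.16.
        return A @ B - B @ A

    A = np.array([[ 2, -1],
                  [-1,  2]])
    B = np.array([[ 3,  0],
                  [ 2, -1]])

    print(commutator(A, B))                     # [[-2  4]
                                                #  [-4  2]]  so A and B do not commute
    print(commutator(A, B) + commutator(B, A))  # the zero matrix, since [A,B] = -[B,A]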
Example 2.3.11: We have seen in Example 2.3.6 that two rotation matrices R(θ1) and R(θ2) commute for all θ1, θ2 ∈ R, that is,
[R(θ1), R(θ2)] = R(θ1)R(θ2) − R(θ2)R(θ1) = 0.

Proposition 2.17. For all matrices A, B, C ∈ F^{n,n} and all scalars a, b ∈ F the following hold:
(a) [A, B] = −[B, A] (antisymmetry);
(b) [aA, bB] = ab [A, B] (linearity);
(c) [A, B + C] = [A, B] + [A, C] (linearity in the right entry);
(d) [A + B, C] = [A, C] + [B, C] (linearity in the left entry);
(e) [A, BC] = [A, B]C + B[A, C] (derivation property in the right entry);
(f) [AB, C] = [A, C]B + A[B, C] (derivation property in the left entry);
(g) [[A, B], C] + [[C, A], B] + [[B, C], A] = 0 (Jacobi property).

Proof of Proposition 2.17: All properties follow from straightforward computations.
Part (a): [A, B] = AB − BA = −(BA − AB) = −[B, A].
Part (b): [aA, bB] = (aA)(bB) − (bB)(aA) = ab (AB − BA) = ab [A, B].
Part (c): [A, B + C] = A(B + C) − (B + C)A = AB + AC − BA − CA = [A, B] + [A, C].
Part (d) is similar.
Part (e): [A, BC] = A(BC) − (BC)A = ABC + (BAC − BAC) − BCA = [A, B]C + B[A, C].
Part (f) is similar.
Part (g): We write down each term and verify that they all add up to zero:
[[A, B], C] = (AB − BA)C − C(AB − BA);
[[C, A], B] = (CA − AC)B − B(CA − AC);
[[B, C], A] = (BC − CB)A − A(BC − CB).
Expanding the right-hand sides, each of the six products ABC, ACB, BAC, BCA, CAB, CBA appears exactly twice, with opposite signs, so the sum of the three terms vanishes. This establishes the Proposition.

We finally highlight that these properties imply that [A, A^m] = 0 holds for every square matrix A and all m ∈ N.

Further reading. Almost every book on linear algebra explains matrix multiplication. See Section 3.5 in Meyer's book [3], and Section 3.6 for block multiplication; see also Strang's book [4]. The definition of commutators can be found in Section 2.3.1 in Hassani's book [1]. Commutators play an important role in the Spectral Theorem, which we study in Chapter 8.

Exercises.

2.3.1.- Given the matrices
A = \begin{bmatrix} 1 & -2 & 3 \\ 0 & -5 & 4 \\ 4 & -3 & 8 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 \\ 0 & 4 \\ 3 & 7 \end{bmatrix},
and the vector C = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, compute the following products, if possible:
(a) AB, BA, CB and C^T B.
(b) A^2, B^2, C^T C and C C^T.

2.3.2.- Consider the matrices
A = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 3 & 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}.
(a) Compute [A, B].
(b) Find the product ABC.

2.3.3.- Find A^2 and A^3 for the matrix
A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

2.3.4.- Given a real number a, find the matrix A^n, where n is any positive integer and
A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}.

2.3.5.- Given any square matrices A, B, prove that
(A + B)^2 = A^2 + 2AB + B^2 ⇔ [A, B] = 0.

2.3.6.- Given a = 1/3, divide the matrix
A = \begin{bmatrix} 1 & 0 & 0 & a & a & a \\ 0 & 1 & 0 & a & a & a \\ 0 & 0 & 1 & a & a & a \\ 0 & 0 & 0 & a & a & a \\ 0 & 0 & 0 & a & a & a \\ 0 & 0 & 0 & a & a & a \end{bmatrix}
into appropriate blocks, and using block multiplication find the matrix A^300.

2.3.7.- Prove that for all matrices A, B ∈ F^{n,n} holds tr(AB) = tr(BA).

2.3.8.- Let A be an m × n matrix. Show that tr(A^T A) = 0 iff A = 0.

2.3.9.- Prove that: If A, B ∈ F^{n,n} are symmetric matrices and commute, then their product AB is also a symmetric matrix.

2.3.10.- Let A be an arbitrary n × n matrix. Use the trace function to show that there exists no n × n matrix X solution of the matrix equation [A, X] = I_n.

2.4. Inverse matrix

In this Section we introduce the concept of the inverse of a square matrix. Not every square matrix is invertible. In the case of 2 × 2 matrices we present a condition on the matrix coefficients that is equivalent to the invertibility of the matrix, and we also present a formula for the inverse matrix. Later on, in Section 3.2, we generalize this formula to n × n matrices.
The inverse of a matrix is useful to compute solutions to systems of linear equations. We start recalling that the matrix In ∈ Fn,n is called the identity matrix iff holds that Ix x = x for all x ∈ Fn . It is simple to see that the components of the identity matrix are given by Iii = 1 In = [Iij ] with Iij = 0 i = j. The cases n = 2, 3 are given by I2 = 10 , 01 10 I3 = 0 1 00 0 0 . 1 We use the following special notation for the column vectors of the identity matrix, In = [e1 , · · · , en ], that is, we denote I:i = ei for i = 1, · · · , n. For example, in the case of I3 we have that 100 1 0 0 I3 = [e1 , e2 , e3 ] = 0 1 0 ⇒ e1 = 0 , e2 = 1 , e3 = 0 . 001 0 0 1 We are now ready to introduce the notion of the inverse matrix. Definition 2.18. A matrix A ∈ Fn,n is called invertible iff there exists a matrix, denoted as A−1 , such that A−1 A = In , and A A−1 = In . Since the matrix product is non-commutative, the products A A−1 = In and A−1 A = In must be specified in the definition above. Notice that we do not need to assume that the inverse matrix belongs to Fn,n , since both products A A−1 and A−1 A are well-defined we conclude that the inverse matrix must be n × n. Example 2.4.1: Verify that the matrix and its inverse are given by A= 22 13 A−1 = 13 4 −1 −2 . 2 Solution: We have to compute the products, A A−1 = 221 3 1 3 4 −1 140 −2 = = I2 . 2 404 It is simple to check that the equation A−1 A = I2 also holds. Example 2.4.2: The only real numbers that are equal to is own inverses are a = 1 and a = −1. This is not true in the case of matrices. Verify that the matrix A below is its own inverse, that is, 1 1 A= = A−1 . 0 −1 74 G. NAGY – LINEAR ALGEBRA december 8, 2009 Solution: We have to compute the products, A A−1 = 1 0 1 −1 1 0 1 10 = = I2 . −1 01 It is simple to check that the equation A−1 A = I2 also holds. Example 2.4.3: Not every square matrix is invertible. Show that the following matrix has no inverse: 12 A= . 36 Solution: Suppose there exists the inverse matrix A−1 = ab cd Then, the following equation holds, A A−1 = I2 12 36 ⇔ ab 1 = cd 0 0 . 1 The last equation implies a + 2 c = 1, b + 2 d = 0, 3(a + 2c) = 0, 3(b + 2d) = 1. However, both systems are inconsistent, so the inverse matrix A−1 does not exist. In the case of 2 × 2 matrices there is a simple way to find out whether a matrix has inverse or not. If the 2 × 2 matrix is invertible, then there is a simple formula for the inverse matrix. This is summarized in the following result. Theorem 2.19. Given a 2 × 2 matrix A introduce the number ∆, called the determinant of A, as follows: ab A= , ∆ = ad − bc. cd The matrix A is invertible iff ∆ = 0. Furthermore, if A is invertible, its inverse is given by A−1 = 1 d −b . −c a ∆ (2.6) The number ∆ is called the determinant of A since it is the number that determines whether A is invertible or not, and soon we will see that it is the number that determines whether a system of linear equations has a unique solution or not. Also later on we will study generalizations to n × n of the Theorem above. That will require a generalization to n × n matrices of the determinant ∆ of a matrix. Example 2.4.4: Compute the inverse of matrix A = 22 , given in Example 2.4.1. 13 Solution: Following Theorem 2.19 we first compute ∆ = 6 − 4 = 4. Since ∆ = 0, then A−1 exists and it is given by 1 3 −2 . A−1 = 2 4 −1 G. NAGY – LINEAR ALGEBRA December 8, 2009 75 Example 2.4.5: Theorem 2.19 says that the matrix given in Example 2.4.3 is not invertible, since 12 A= ⇒ ∆ = 6 − (3)(2) = 0. 
36 Proof of Theorem 2.19: If the matrix A−1 exists, from the definition of inverse matrix it follows that A−1 must be 2 × 2. Suppose that the inverse of matrix A is given by A−1 = x1 x2 y1 . y2 We first show that A−1 exists iff ∆ = 0. The 2 × 2 matrix A−1 exists iff the equation A A−1 = I2 , which is equivalent to the systems ab cd x1 x2 y1 1 = y2 0 0 1 ⇔ ax1 + bx2 = 1, ay1 + by2 = 0, cx1 + dx2 = 0, cy1 + dy2 = 1. (2.7) Consider now the particular case a = 0 and c = 0, which imply that ∆ = 0. In this case the equations above reduce to: bx2 = 1, by2 = 0, dx2 = 0, dy2 = 1. These systems have no solution, since from the first equation on the left b = 0, and from the first equation on the right we obtain that y2 = 0. But this contradicts the second equation on the right, since dy2 is zero and so can never be equal to one. We then conclude there is no inverse matrix in this case. Assume now that at least one of the coefficients a or c is non-zero, and let us return to Eqs. (2.7). Both systems can be solved using the following augmented matrix ab cd 10 01 Now, perform the following Gauss operations: ac ac bc ad c 0 0 ac → a 0 bc ad − bc c −c 0 ac bc = a 0∆ c0 −c a At least one of the source coefficients in the second row above is non-zero. Therefore, the system above is consistent iff ∆ = 0. This establishes the first part of the Theorem. In order to prove the furthermore part, one can continue with the calculation above, and find the formula for the inverse matrix. This is a long calculation, since one has to study three different cases: the case a = 0 and c = 0, the case a = 0 and c = 0, and the case where both a and c are non-zero. It is faster to check that the expression in the Theorem is indeed the inverse of A. Since ∆ = 0, the matrix in Eq. (2.6) is well-defined. Then, the straightforward calculation A A−1 = 1 d −b ∆ −ab + ba ab1 = = I2 . a cd − dc ∆ c d ∆ −c ∆ It is not difficult to see that the second condition A−1 A = I2 is also satisfied. This establishes the Theorem. There are many different ways to characterize n × n invertible matrices. One possibility is to relate the existence of the inverse matrix to the solution of appropriate systems of linear equations. The following result summarizes few of these characterizations. 76 G. NAGY – LINEAR ALGEBRA december 8, 2009 Theorem 2.20. Given an n × n matrix A, the following statements are equivalent: (a) A−1 exists; (b) rank(A) = n; (c) EA = In ; (d) The homogeneous equation Ax = 0 has a unique solution x = 0; (e) The non-homogeneous equation Ax = b has a unique solution for all source b ∈ Rn . Proof of Theorem 2.20: It is clear that properties (b)-(d) are all equivalent. Here we only show that (a) is equivalent to (b). We start with (a) ⇒ (b): Since A−1 exists, the homogeneous equation Ax = 0 has a unique solution x = 0. (Proof: Assume there are two solutions x1 and x2 , then A(x2 − x1 ) = 0, and so (x2 − x1 ) = A−1 A(x2 − x1 ) = A−1 0 = 0, and so x2 = x1 .) But this implies there are no free variables in the solutions of Ax = 0, that is, EA = In , that is, rank(A) = n. We finish with (b) ⇒ (a): Since rank(A) = n, the non-homogeneous equation Ax = b has a unique solution x for every source vector b ∈ Rn . In particular, there exist unique vectors x1 , · · · , xn solutions of Ax1 = e1 , · · · Axn = en . where ei is the i-th column of the identity matrix In . This is equivalent to say that the matrix X = x1 , · · · , xn satisfies the equation AX = In . −1 This matrix X = A since X also satisfies the equation XA = In . 
(Proof: Consider the identities A − A = 0 ⇔ AXA − A = 0 ⇔ A(XA − In ) = 0; The last equation is equivalent to the n systems of equations Ayi = 0, where yi is the i-th column of the matrix (XA − In ); Since rank(A) = n, each of these systems has a unique solution yi = 0, that is, XA = In .) This establishes the Theorem. It is simple to see from Theorem 2.20 that an invertible matrix has a unique inverse. Corollary 2.21. An invertible matrix has a unique inverse. Proof of Corollary 2.21: Suppose that matrices X and Y are two inverses of an n × n matrix A. Then, AX = In and AY = In , hence A(X − Y) = 0. The latter are n systems of linear equations, one for each column vector in (X − Y). From Theorem 2.20 we know that rank(A) = n, so the only solution to these equations is the trivial solution, so each column vector vanishes, therefore X = Y. 2.4.1. Properties of invertible matrices. The three most basic properties are summarized below. Proposition 2.22. If A and B are n × n invertible matrices, then holds: (a) A−1 −1 = A; (b) AB −1 = B−1 A−1 ; (c) AT −1 = A−1 T . Proof of Proposition 2.22: Since an invertible matrix is unique, we only have to verify these equations in (a)-(c). −1 satisfying the equations Part (a): The inverse of A−1 is a matrix A−1 A−1 −1 A−1 = In , A−1 A−1 −1 = In . But matrix A satisfies precisely these equations, and recalling that the inverse of a matrix −1 is unique, then A = A−1 . G. NAGY – LINEAR ALGEBRA December 8, 2009 77 Part (b): The proof is similar to the previous one. We verify that matrix B−1 ( A−1 satisfies the equations that (AB)−1 must satisfy. Then, they must be the same, since the inverse matrix is unique. Notice that, (AB) B−1 A−1 = A BB−1 A−1 , B−1 A−1 (AB) = B−1 A−1 A B, = B−1 B, = A A−1 , = In ; = In . −1 We then conclude that AB = B−1 A−1 . T Part (c): Recall that (AB) = BT AT , therefore, A−1 A = In ⇔ A A−1 = In ⇔ T A A−1 T = IT n ⇔ T A−1 A = IT n ⇔ AT A−1 A−1 T T = In , AT = In . −1 . This establishes the Proposition. Therefore, A−1 = AT The properties presented above are useful to solve equations involving matrices. Example 2.4.6: Find a matrix C solution of the equation (BC)T − A = 0, where 1 2 1 −1 4 , A= 3 B= . 1 2 −1 −1 Solution: The matrix B is invertible, since ∆(B) = 3, so B is invertible and its inverse is given by 121 B−1 = . 3 −1 1 Therefore, matrix C is given by (BC)T = A BC = AT ⇔ ⇔ C = B−1 AT , that is, C= 121 3 −1 1 1 2 3 4 −2 −1 ⇒ C= 1 4 10 31 1 −5 . 1 Example 2.4.7: The (n + k ) × (n + k ) matrix C introduced in Example 2.3.10 satisfies the equation C2 = In+k . Therefore, matrix C is its own inverse, that is, C−1 = C. 2.4.2. Computing the inverse matrix. We now show how to use Gauss operations to find the inverse matrix in the case that such inverse exists. The main idea needed to compute the inverse of an n × n matrix is summarized in the following result. Proposition 2.23. Let A be an n × n matrix. If the n systems of linear equations Ax1 = e1 , ··· Axn = en , are all consistent, then the matrix A is invertible and its inverse is given by A−1 = x1 , · · · , xn . If at least one system in Eq. (2.8) is inconsistent, then matrix A is not invertible. (2.8) 78 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 2.23: We first show that the consistency of all systems in Eq. (2.8) implies that matrix A is invertible. Indeed, if all systems in Eq. (2.8) are consistent, then the system Ax = b is also consistent for all b ∈ Rn (Proof: The solution for the source b = b1 e1 + · · · + bn en is simply x = b1 x1 + · · · + bn xn .) 
Therefore, rank(A) n. Since A is an n × n matrix, we conclude that rank(A) = n. Then, Theorem 2.20 implies that matrix A is invertible. We now introduce the matrix X = x1 , · · · , xn and we show that A−1 = X. From the definition of X we see that AX = In . Since rank(A) = n, the same argument given in the proof of Theorem 2.20 shows that XA = In . Since the inverse of a matrix is unique, we conclude that A−1 = X. This establishes the Proposition. This Proposition provides a method to find the inverse of a matrix. One solves the n linear systems in Eq. (2.8). Since all these systems share the same coefficient matrix, matrix A, one can solve them all at the same time, introducing the augmented matrix A e1 , · · · , en = A In . In the proof of Proposition 2.23 we show that rank(A) = n. Therefore, its reduced echelon form is EA = In , so the Gauss method implies that A In = A e1 , · · · , en → EA x1 , · · · , xn = In A−1 . We can summarize the method to find the inverse of an n × n invertible matrix A using Gauss operations as follows: A In → In A−1 . Example 2.4.8: Find the inverse of the matrix 24 A= . 13 Solution: We have to solve the two systems of equations Ax1 = e1 and Ax2 = e2 , that is, 2 1 4 3 2 1 1 x1 = 0 x2 0 y1 . = 1 y2 4 3 We can solve both systems at the same time using the Gauss method on the following augmented matrix 24 13 10 12 → 01 13 1/2 0 12 → 01 01 1/2 −1/2 0 10 → 1 01 therefore, A−1 = 13 2 −1 Example 2.4.9: Find the inverse of the matrix 12 A = 2 5 37 Solution: We compute 12 2 5 37 −4 . 2 3 7 . 9 the inverse matrix as follows: 123 3 100 7 0 1 0 → 0 1 1 010 9 001 100 −2 1 0 → −3 0 1 3/2 −1/2 −2 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 10 0 1 00 1 1 −1 −2 0 10 1 0 → 0 1 00 −1 1 5 −2 −1 Therefore, 4 −3 1 −3 0 1 1 1 . −1 1 1 . −1 A−1 Example 2.4.10: Is matrix A = 0 0 1 79 4 −3 0 = −3 1 1 1 2 2 invertible? 4 Solution: The answer is no, since 12 24 10 1 → 01 0 2 0 10 −2 1 and the two systems of equations given by the augmented matrix above are inconsistent. Further reading. Almost any book in linear algebra introduces the inverse matrix. See Sections 3.7 and 3.9 in Meyer’s book [3]. 80 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 2.4.1.- When possible, find the inverse of the following matrices. (Check your answers using matrix multiplication.) (a) » – 12 A= ; 13 (b) 3 2 −1 −2 1 2 −65 ; A=4 3 1 1 −2 (c) 2 1 A = 44 7 2 5 8 3 3 65 . 9 2.4.2.- Find the values of the constant k such that the matrix A below is not invertible, 2 3 1 1 −1 k 5. A = 42 3 1k 3 2.4.3.- Consider the matrix 2 3 0 −1 0 0 1 5, A = 42 0 1 −1 (a) Find the inverse of matrix A. (b) Use the previous part to find a solution to the linear system 23 1 Ax = b, b = 4−15 . 3 2.4.4.- Show that for every invertible matrix A holds that [A, A−1 ] = 0. 2.4.5.- Find a matrix X such that the equation X = AX + B holds for 2 3 2 3 0 −1 0 12 0 −15 , B = 42 15 . A = 40 0 0 0 33 2.4.6.- If A is invertible and symmetric, then show that A−1 is also symmetric. 2.4.7.- Prove that: If the square matrix A satisfies A2 = 0, then the matrix (I − A) is invertible. 2.4.8.- Prove that: If the square matrix A satisfies A3 = 0, then the matrix (I − A) is invertible. 2.4.9.- Let A be a square matrix. Prove the following statements: (a) If A contains a zero column then A is not invertible; (b) If one column is multiple of another column in A, then matrix A is not invertible. (c) Use the trace function to prove the following statement: If A is an m×n matrix and B is an n × m matrix such that AB = Im and BA = In , then m = n. 
2.4.10.- Consider the invertible matrices A ∈ Fr,r , B ∈ Fs,s and the matrix C ∈ Fr,s . Prove that the inverse of 2 3 . A . C7 . 6 6. . . . . . . . .7 4 5 . .B 0. is given by 2 3 . . −A−1 C B−1 −1 . 6A 7 6. . . . . . . . . . . . . . . . . . . . . 7 . 4 5 . . −1 0 . B G. NAGY – LINEAR ALGEBRA December 8, 2009 81 2.5. Null and range spaces A matrix A ∈ Fm,n defines two functions, A : Fn → Fm and AT : Fm → Fn , which in turn determine two sets in Fn and two sets in Fm . In this Section we define these sets, we call them null and range spaces, and study the main relations among them. We start introducing the two spaces associated with the function A. Definition 2.24. Consider the matrix A ∈ Fm,n defining the linear function A : Fn → Fm . The set N (A) ⊂ Fn given by N (A) = x ∈ Fn : Ax = 0 is called the null space of the function A. The set R(A) ⊂ Fm given by R(A) = y ∈ Fm : ∃ x ∈ Fn with Ax = y is called the range space of the function A. In Fig. 25 we show a picture, usual in set theory, sketching the null and range spaces associated with a matrix A. One can see in these pictures that for a function A : Fn → Fm , the null space is a subset in Fn , while the range space is a subset in Fm . Recall that the homogeneous equation Ax = 0 always has the trivial solution x = 0. This property implies both that 0 ∈ Fn also belongs to N (A) and that 0 ∈ Fm also belongs to R(A). A A F n F F m n F 0 m 0 0 0 R(A) N(A) Figure 25. Sketch of the null space, N (A), and the range space, R(A), of a linear function determined by the m × n matrix A. The four spaces associated with a matrix A are N (A) and R(A) together with N (AT ) and R(AT ). Notice that these null and range spaces are subsets of different spaces. More precisely, a matrix A ∈ Fm.n defines the functions and subsets, A : Fn → Fm N (A), R(AT ) ⊂ Fn , AT : Fm → Fn , and while R(A), N (AT ) ⊂ Fm . Example 2.5.1: Find the N (A) and R(A) for the function A : R3 → R2 given by A= 1 2 2 4 3 . 1 Solution: We first find N (A), the null space of A. This is the set of elements x ∈ R3 that are solutions of the equation Ax = 0. We use the Gauss method to find all such solutions, x1 = −2x2 123 12 3 120 x3 = 0, → → ⇒ 241 0 0 −5 001 x : free variable. 2 82 G. NAGY – LINEAR ALGEBRA december 8, 2009 Therefore, the elements in N (A) are given by −2 −2x2 x2 = 1 x2 ⇒ N (A) x = 0 0 N (A) = Span −2 1 0 ⊂ R3 . We now find R(A), the range space of A. This is the set of elements y ∈ R2 that can be expressed as y = Ax for any x ∈ R3 . In our case this means x 1 2 3 1 1 2 3 x2 = y = Ax = x+ x+ x. 241 21 42 13 x3 What this last equation says, is that the elements of R(A) can be expressed as a linear combination of the column vectors of matrix A. Therefore, the set R(A) is indeed the set of all possible linear combinations of the column vectors of matrix A. We then conclude that R(A) = Span 1 2 3 , , 2 4 1 . Notice that 2 1 2 = 2 4 ⇒ Span 1 2 3 , , 2 4 1 = Span 1 3 , 2 1 . Therefore, we conclude that R(A) = Span 1 3 , 2 1 = R2 . In the Example 2.5.1 above we have seen that both sets N (A) and R(A) can be expressed as spans of appropriate vectors. It is not surprising to see that the same property holds for N (AT ) and R(AT ), as we observe in the following Example. Example 2.5.2: Find the N (AT ) and R(AT ) for matrix in Example 2.5.1, that is, 1 AT = 2 3 the function AT : R2 → R3 , where A is the 2 4 . 1 Solution: We start finding the N (AT ), that is, all vectors y ∈ R2 solutions of the homogeneous equation AT y = 0. 
Using the Gauss method we obtain, 12 1 2 10 0 0 2 4 → 0 0 → 0 1 ⇒ y = ⇒ N (AT ) = ⊂ R2 . 0 0 31 0 −5 00 We now find the R(AT ). This is the set of x ∈ R3 that can be expressed as x = AT y for any y ∈ R2 , that is, 12 1 2 y x = AT y = 2 4 1 = 2 y1 + 4 y2 . y2 31 3 1 As in the previous example, this last equation says that the elements of R(AT ) can be expressed as a linear combination of the column vectors of matrix AT . Therefore, the set G. NAGY – LINEAR ALGEBRA December 8, 2009 83 R(AT ) is indeed the set of all possible linear combinations of the column vectors of matrix AT . We then conclude that 1 2 R(AT ) = Span 2 , 4 ⊂ R3 . 3 1 In Fig. 26 we have sketched the sets N (A) and R(AT ), which are subsets of R3 . x3 T R(A ) N(A) −2 2 x2 1 x1 Figure 26. We sketch the set N (A), which is a line on the plane x3 = 0 passing through the origin, and the set R(AT ), which is a plane also passing through the origin, where the matrix A is given in Example 2.5.1 and is also used in Example 2.5.2. Notice that the line is perpendicular to the plane. This property reflects a general property of the spaces N (A) and R(AT ), which will be studied later on. Example 2.5.3: Find the N (A) and R(A) for the matrix 1 3 −1 A = 2 6 −2 . 3 9 −3 Solution: Any element x ∈ N (A) The Gauss method implies, 1 3 −1 1 2 6 −2 → 0 3 9 −3 0 must be solution of the homogeneous equation Ax = 0. 3 0 0 −1 0 0 ⇒ Therefore, every element x ∈ N (A) has the form −3x2 + x3 −3 1 = 1 x2 + 0 x3 ⇒ x2 x= x3 0 1 The R(A) is the set of 1 y = 2 3 x1 = −3x2 + x3 , x2 , x3 free variables. N (A) = Span −3 1 1 , 0 0 1 all y ∈ R3 such that y = Ax for some x ∈ R3 . Therefore, 3 −1 x1 1 3 −1 6 −2 x2 = x1 2 + x2 6 + x3 −2 , 9 −3 x3 3 9 −3 . 84 G. NAGY – LINEAR ALGEBRA december 8, 2009 and we conclude that R(A) = Span 1 3 −1 2 , 6 , −2 3 9 −3 . However, the expression above can be simplified noticing that 1 1 −1 3 −2 = − 2 , 6 = 3 2 , 3 −3 3 9 so a simpler answer for R(A) is the following: R(A) = Span 1 2 3 . The property of the null and range sets we have found for the matrix in Examples 2.5.1 and 2.5.2 also holds for every matrix. This will be an important property later on, so we give it a name. Definition 2.25. A subset of U ⊂ Fn is called closed under linear combination iff for all elements x, y ∈ U and all scalars a, b ∈ F holds that (ax + by) ∈ U . In words, a set U ⊂ Fn is closed under linear combination iff every linear combination of elements in U stays in U . In Chapter 4 we generalize the linear combination structure of Fn into a structure we call a vector space; we will see in that Chapter that sets in a vector space which are closed under linear combinations are smaller vector spaces inside the original vector space, and will be called subspaces. We now state as a general result the property we found both in Examples 2.5.1 and 2.5.2. Proposition 2.26. Given a matrix A ∈ Fm,n , both sets N (A) ⊂ Fn and R(A) ⊂ Fm are closed under linear combinations in Fn and Fm , respectively. Furthermore, denoting the matrix A = A:1 , · · · , A:n , we conclude that R(A) = Span A:1 , · · · , A:n . It is common in the literature to introduce the column space of an m × n matrix A = A:1 , · · · , A:n , denoted as Col(A), as the set of all linear combinations of the column vectors of A, that is, Col(A) = Span A:1 , · · · , A:n . The Proposition above then says that R(A) = Col(A). Proof of Proposition 2.26: The sets N (A) and R(A) are closed under linear combinations because the matrix-vector product is a linear operation. 
Consider two arbitrary elements x1 , x2 ∈ N (A), that is, Ax1 = 0 and Ax2 = 0. Then, for any a, b ∈ F holds A(ax1 + bx2 ) = a Ax1 + b Ax2 = 0 ⇒ (ax1 + bx2 ) ∈ N (A). Therefore, N (A) ⊂ Fn is closed under linear combinations. Analogously, consider two arbitrary elements y1 , y2 ∈ R(A), that is, there exist x1 , x2 ∈ Fn such that y1 = Ax1 and y2 = Ax2 . Then, for any a, b ∈ F holds (ay1 + by2 ) = a Ax1 + b Ax2 = A(ax1 + bx2 ) ⇒ (ay2 + by2 ) ∈ R(A). Therefore, R(A) ⊂ Fm is closed under linear combinations. The furthermore part is proved as follows. Denote A = A:1 , · · · , A:n , then any element y ∈ R(A) can be expressed as G. NAGY – LINEAR ALGEBRA December 8, 2009 y = Ax for some x = [xi ] ∈ Fn , that is, x1 . y = Ax = A:1 , · · · , A:n . = A:1 x1 + · · · + A:n xn ∈ Span . 85 A:1 , · · · , A:n . xn This implies that R(A) ⊂ Col(A). The opposite inclusion, Col(A) ⊂ R(A) is trivial, since any element of the form y = A:1 x1 + · · · + A:n xn ∈ Col(A) also belongs to R(A), since y = Ax, where x = [xi ]. This establishes the Proposition. 2.5.1. Gauss operations and the four spaces. In this section we study the relations between the four spaces associated with the matrices A and B in the case that matrix B can be obtained from matrix A by performing Gauss operations. We use the following row notation: A ←→ B to indicate that A can be transformed into matrix B by performing Gauss operations on the rows of A. Proposition 2.27. If matrices A, B ∈ Fm,n , then row (a) A ←→ B row (b) A ←→ B ⇔ ⇔ N (A) = N (B); R(AT ) = R(BT ). The proof of this Proposition is based in the following property of Gauss operations: If the m × n matrices A and B are related by Gauss operations on their rows, then there exists an m × m invertible matrix G such that GA = B. The proof of this property goes along the following ideas. Each one of the Gauss operations is associated with an invertible matrix, E, called an elementary Gauss matrix. Every elementary Gauss matrix is invertible, since every Gauss operation can always be reversed. The result of several Gauss operations on a matrix A is the product of the appropriate elementary Gauss matrices in the same order as the Gauss operations are performed. If the matrix B is obtained from matrix A by doing Gauss operations given by matrices Ei , for i = 1, · · · , k , in that order, we can express the result of the Gauss method as follows: Ek · · · E1 A = B, G = Ek · · · E1 ⇒ GA = B. Since each elementary Gauss matrix is invertible, then the matrix G is also invertible. The following example shows all 3 × 3 elementary Gauss matrices. Example 2.5.4: Find all the elementary Gauss matrices which operate on 3 × n matrices. Solution: In the case of 3 × n matrices, the elementary Gauss matrices are 3 × 3. We present these matrices for each one of the three main types of Gauss operations. Consider the matrices Ei for i = 1, 2, 3 are given by k00 100 100 E1 = 0 1 0 , E2 = 0 k 0 , E3 = 0 1 0 ; 001 001 00k then, the product Ei A represents the Gauss operation of multiplying by k the first, second and third row of A, respectively. Consider the matrices Ei for i = 4, 5, 6 given by 010 001 100 E4 = 1 0 0 , E5 = 0 1 0 , E6 = 0 0 1 ; 001 100 010 then, the product Ei A for i = 4, 5, 6 represents the Gauss operation of interchanging the first and second, the first and third, and the second and third rows of A, respectively. Finally, 86 G. 
NAGY – LINEAR ALGEBRA december 8, 2009 consider the matrices Ei for j = 7, · · · , 12 given by 100 100 1 E7 = k 1 0 , E8 = 0 1 0 , E9 = 0 001 k01 0 1k0 10k 1 E10 = 0 1 0 , E11 = 0 1 0 , E12 = 0 001 001 0 0 1 k 0 0 , 1 00 1 k ; 01 then, the product Ei A for i = 7, · · · , 12 represents the Gauss operation of multiplying by k one row of A and add the result to another row of A. Proof of Proposition 2.27: Recall the comment below the Proposition 2.27: If the m × n matrices A and B are related by Gauss operations on their rows, then there exits an m × m invertible matrix G such that GA = B. This observation is the key to show that N (A) = N (B), since given any element x ∈ N (A) Ax = 0 ⇔ GAx = 0, where the equivalence follows from G being invertible. Then it is simple to see that 0 = GAx = Bx ⇔ x ∈ N (B). Therefore, we have shown that N (A) = N (B). We now show the converse statement. If N (A) = N (B) this means that their reduced echelon forms are the same, that is, EA = EB . This means that there exist Gauss operations on the rows of A that transform it into matrix B. We now show that R(AT ) = R(BT ). Given any element x ∈ R(AT ) we know that there exists an element y ∈ Fm such that x = AT y = AT GT GT −1 y = (GA)T ˜ = BT ˜, y y T ˜ = GT y T −1 y. T We have shown that given any x ∈ R(A ), then x ∈ R(B ), that is, R(A ) ⊂ R(BT ). The opposite implication is proven in the same way: Given any x ∈ R(BT ) there exists ˜ ∈ Fm y such that −1 T T x = BT ˜ = BT GT y G ˜ = G−1 B y = AT y, y y = GT ˜. y We have shown that given any x ∈ R(BT ), then x ∈ R(AT ), that is, R(BT ) ⊂ R(AT ). Therefore, R(AT ) = R(BT ). We now show that the converse statement. Assume that R(AT ) = R(BT ). This means that every row in matrix A is a linear combination of the rows in matrix B. This also means that there exists Gauss operations on the rows of A such that transform A into B. This establishes the Proposition. Example 2.5.5: One application of the Proposition 2.27 is the following: If EA is the reduced echelon form of matrix A, then R(AT ) = R (EA )T . Consider the matrix A and its reduced echelon form EA given by A= 1 2 2 4 Their respective transposed matrices 1 AT = 2 3 3 , 1 EA = 1 0 2 0 0 . 1 are given by 2 10 4 , (EA )T = 2 0 . 1 01 G. NAGY – LINEAR ALGEBRA December 8, 2009 The result of the Proposition 2.27 is that R(AT ) = R (EA )T ⇔ Span 1 2 2 , 4 3 1 = Span 87 1 0 2 , 0 0 1 . We can verify that this result above is correct, since the column vectors in AT are linear combinations of the column vectors in (EA )T , as the following equations show, 0 1 2 0 1 1 4 = 2 2 + 0 . 2 = 2 + 3 0 1 0 1 1 0 3 The Example 2.5.5 above helps understand one important meaning of Gauss operations. The Gauss operations on the rows of a matrix A can be interpreted as linear combination of the column vectors of AT . This is the reason why the interchange change A ↔ B by doing Gauss operations on rows leaves the the range spaces of their transposes invariant, that is, R(AT ) = R(BT ). Example 2.5.6: Consider the matrices 115 A = 2 0 6 , 127 1 B = 4 0 −4 4 −8 6 . −4 5 Verify whether the following equations hold: R(AT ) = R(BT )? R(A) = R(B)? N (A) = N (B)? N (AT ) = N (BT )? Solution: We base our answer in Proposition 2.27 and an extra observation. First, let EA and EB be the reduced echelon forms of A and B, respectively, and let EAT and EBT be the reduced echelon forms of AT and BT respectively. The extra observation is the row following: EA = EB iff A ←→ B. 
This observation and Proposition 2.27 imply that EA = EB T is equivalent to R(A ) = R(BT ) and it is also equivalent to N (A) = N (B). We then find EA and EB , 115 1 1 5 103 A = 2 0 6 → 0 −2 −4 → 0 1 2 = EA , 127 0 1 2 000 1 −4 4 1 −4 4 1 −4 4 10 −1 8 −10 → 0 4 −5 → 0 1 −5/4 = EB . B = 4 −8 6 → 0 0 −4 5 0 −4 5 0 0 0 00 0 Since EA = EB , we conclude that R(AT ) = R(BT ). This result also says that N (A) = N (B). So for the first and third questions, the answer is no. row A similar argument also says that EAT = EBT iff AT ←→ BT . This observation and Proposition 2.27 imply that EAT = EBT is equivalent to R(A) = R(B) and it is also equivalent to N (AT ) = N (BT ). We then find EAT and EBT , 121 1 21 10 2 AT = 1 0 2 → 0 −2 1 → 0 1 −1/2 = EAT , 567 0 −4 2 00 0 1 4 0 1 4 0 14 0 10 2 8 −4 → 0 2 −1 → 0 1 −1/2 = EBT . BT = −4 −8 −4 → 0 4 6 5 0 −10 5 00 0 00 0 88 G. NAGY – LINEAR ALGEBRA december 8, 2009 Since EAT = EBT , we conclude that R(A) = R(B). This result also says that N (AT ) = N (BT ). So for the second and four questions, the answer is yes. We finish this Section with two results concerning the ranks of a matrix and its transpose. We delay the proof of the first result to Chapter 6, where we introduce the notion of inner product in a vector space. Using an inner product in Fn , the proof of Proposition 2.28 below is simple. This result says that transposition operation on a matrix does not change its rank. Proposition 2.28. Every matrix A ∈ Fm,n satisfies that rank(A) = rank(AT ). We will give a proof of this statement in Chapter 6. We now introduce a particular name for those matrices having the maximum possible rank. Definition 2.29. An m × n matrix A has full rank iff rank(A) = min(m, n). Our last result of this Section concerns full rank matrices and relates the rank of a matrix A to the size of both N (A) and N (AT ). Proposition 2.30. If matrix A ∈ Fm,n has full rank, then hold: (a) If m = n, then rank(A) = rank(AT ) = n = m ⇔ {0} = N (A) = N (AT ) ⊂ Fn ; (b) If m < n, then rank(A) = rank(AT ) = m < n T (c) If m > n, then rank(A) = rank(A ) = n < m ⇔ ⇔ {0} N (A) ⊂ Fn , {0} = N (AT ) ⊂ Fm ; {0} = N (A) ⊂ Fn , {0} N (AT ) ⊂ Fm . Proof of Proposition 2.30: We start recaling that the rank of a matrix A is the number of pivot columns in its reduced echelon form EA . If an m × n matrix A has rank(A) = n, this means two things: First, n m; and second, that every column in EA has a pivot. The latter property implies that there is no free variables in the solution of the equation Ax = 0, and so x = 0 is the unique solution. We conclude that N (A) = {0}. In order to study N (AT ) we need to consider the two possible cases n = m or n < m. If n = m, then the matrices A and AT are square, and the same argument about free variables applies to solutions of AT y = 0, so we conclude that N (AT ) = {0}. This proves (a). In the case that n < m, then there are free variables in the solution of the equation AT y = 0, therefore {0} N (AT ). This proves (c). If an m × n matrix A has rank(A) = m, recalling that rank(A) = rank(AT ), this means two things: First, m n; and second, that every column in EAT has a pivot. The latter property shows that AT is full rank, so the argument above shows that N (AT ) = {0}. Since the case n = m has already been studied, so we only need to consider the case m < n. In this case there are free variables in the solution to the equation Ax = 0, therefore {0} N (A). This proves (b), and we have established the Proposition. Further reading. 
Section 4.2 in Meyer’s book [3] follows closely the descriptions of the four spaces given here, although in more depth. G. NAGY – LINEAR ALGEBRA December 8, 2009 89 Exercises. 2.5.1.- Find R(A) and R(AT ) for 2 3 1223 A = 42 4 1 3 5 . 3614 2.5.2.- Find the N (A), R(AT ) for 2 1 2 A = 4− 2 − 4 1 2 R(A), N (AT ) and 1 0 2 1 4 4 3 5 −25 . 9 2.5.3.- Let A be a 3 × 3 matrix such that 232 3 1 o” “n 1 R(A) = Span 425 , 4−15 , 3 2 23 “n −2 o” N (A) = Span 4 1 5 . 0 (a) Show that the linear system Ax = b is consistent for 23 1 b = 485 . 5 (b) Show that the system Ax = b above has infinitely many solutions. 2.5.4.- Prove: (a) Ax = b is consistent iff b ∈ R(A). (b) The consistent system Ax = b has a unique solution iff N (A) = {0}. 2.5.5.- Prove: A matrix A ∈ Fn,n is invertible iff R(A) = Fn . 2.5.6.- Let A be an invertible matrix. Find the spaces N (A), R(A), N (AT ), and R(AT ). 2.5.7.- Consider the matrices 2 3 2 15 3 1 A = 42 1 − 3 5 , B = 40 13 1 0 0 1 0 3 −2 1 5. 0 Answer the questions below. If the answer is yes, give a proof; if the answer is no, give a counter-example. (a) Is R(A) = R(B)? (b) Is R(AT ) = R(BT )? (c) Is N (A) = N (B)? (d) Is N (AT ) = N (BT )? 90 G. NAGY – LINEAR ALGEBRA december 8, 2009 2.6. LU-factorization A factorization of a number means to decompose that number as a product of appropriate factors. For example, the prime factorization of an integer number means to decompose that integer as a product of prime numbers, like 70 = (2)(5)(7). In a similar way, a factorization of a matrix means to decompose that matrix, using matrix multiplication, as a product of appropriate factors. In this Section we introduce a particular type of factorization, called LU-factorization, which stands for lower triangular-upper triangular factorization. A given matrix A is expressed as a product A = LU, where L is lower triangular and U is upper triangular. This type of factorization can be useful to solve systems of linear equations having A as the coefficient matrix. The LU-factorization of A reduces the number of algebraic operations needed to solve the linear system, saving computer time when solving these systems using a computer. However, not every matrix has this type of factorization. We provide sufficient conditions on A that guarantee its LU-factorization. We start with few basic definitions. Definition 2.31. An m × n matrix is called upper triangular iff all elements below the diagonal vanish, and it is called lower triangular iff all elements above the diagonal vanish. Example 2.6.1: The matrices U1 and lower triangular, 234 234 U1 = 0 5 6 , U2 = 0 5 6 001 001 U2 below are upper triangular, while L1 and L2 are 3 2 , 5 2 L1 = 3 5 0 4 6 0 0 , 7 20 L2 = 3 4 56 0 0 7 0 0 . 0 Definition 2.32. An m × n matrix A has an LU-factorization iff there exists a lower triangular m × m matrix L and an upper triangular m × n matrix U such that A = LU. In the particular case that A is a square matrix both L and U are square matrices. Example 2.6.2: The matrix 11 A = 1 2 12 where A below has 1 10 2 = 1 1 3 11 1 L = 1 1 0 1 1 an LU-factorization, since 0 111 0 0 1 1 ⇒ A = LU, 1 001 0 0 , 1 11 U = 0 1 00 1 1 . 1 Example 2.6.3: The matrix A below has an LU-factorization, since 2 4 −1 1 00 2 4 −1 3 = −2 1 0 0 3 1 ⇒ A = −4 −5 2 −5 −4 1 −3 1 00 0 where 1 L = −2 1 00 1 0 , −3 1 24 U = 0 3 00 −1 1 . 0 A = LU, G. NAGY – LINEAR ALGEBRA December 8, 2009 91 We now provide sufficient conditions on a matrix that imply that such matrix has an LU-factorization. Theorem 2.33. 
Let A be an m × n matrix that can be transformed into echelon form without row exchanges and without row rescaling. Then A has an LU-factorization, A = LU, where the matrix U is the m × n echelon form mentioned above, and L = [L_{ij}] is a lower triangular m × m matrix satisfying two conditions: first, the diagonal coefficients are L_{ii} = 1; second, each coefficient L_{ij} below the diagonal is the multiple of row j that is subtracted from row i in order to annihilate the (i, j) position during the Gauss elimination method.

We first show a few examples in order to understand the statement in Theorem 2.33, and later on we present a sketch of the proof for the simple cases of 2 × 2 and 3 × 3 matrices.

Example 2.6.4: Find the LU-factorization of the matrix A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.

Solution: Theorem 2.33 says that U is an echelon form of A obtained without row exchanges and without row rescaling, while L has the form L = \begin{bmatrix} 1 & 0 \\ L_{21} & 1 \end{bmatrix}, that is, we only need to find the coefficient L_{21}. In this simple example we can summarize the whole procedure in the following diagram:
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \xrightarrow{\text{row}(2) - 3\,\text{row}(1)} \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} = U, \qquad L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}.
This is what we have done: an echelon form of A is obtained with only one Gauss operation, subtracting from the second row the first row multiplied by 3. The resulting echelon form of A is already the matrix U. And L_{21} = 3, since we multiplied the first row by 3 and subtracted it from the second row. So we have both U and L, and the result is:
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} = LU.

Example 2.6.5: Find the LU-factorization of the matrix A = \begin{bmatrix} 2 & 4 & -1 \\ -4 & -5 & 3 \\ 2 & -5 & -4 \end{bmatrix}.

Solution: We recall that we have to find the coefficients below the diagonal in the matrix
L = \begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}.
From the first Gauss operation we obtain L_{21} as follows:
\begin{bmatrix} 2 & 4 & -1 \\ -4 & -5 & 3 \\ 2 & -5 & -4 \end{bmatrix} \xrightarrow{\text{row}(2) - (-2)\,\text{row}(1)} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 2 & -5 & -4 \end{bmatrix} \quad\Rightarrow\quad L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}.
From the second Gauss operation we obtain L_{31} as follows:
\begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 2 & -5 & -4 \end{bmatrix} \xrightarrow{\text{row}(3) - 1\,\text{row}(1)} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 0 & -9 & -3 \end{bmatrix} \quad\Rightarrow\quad L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & L_{32} & 1 \end{bmatrix}.
From the third Gauss operation we obtain L_{32} as follows:
\begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 0 & -9 & -3 \end{bmatrix} \xrightarrow{\text{row}(3) - (-3)\,\text{row}(2)} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad\Rightarrow\quad L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -3 & 1 \end{bmatrix}.
We then conclude
L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -3 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 0 \end{bmatrix},
and we have the decomposition
\begin{bmatrix} 2 & 4 & -1 \\ -4 & -5 & 3 \\ 2 & -5 & -4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

In the following Example we show a matrix A that does not have an LU-factorization. The reason is that a vanishing diagonal coefficient appears during the procedure to find U. Since row interchanges are not allowed, there is no LU-factorization in this case.

Example 2.6.6: Show that the matrix A = \begin{bmatrix} 2 & 4 & -1 \\ -4 & -8 & 3 \\ 2 & -5 & -4 \end{bmatrix} has no LU-factorization.

Solution: From the first Gauss operation we obtain L_{21} as follows:
\begin{bmatrix} 2 & 4 & -1 \\ -4 & -8 & 3 \\ 2 & -5 & -4 \end{bmatrix} \xrightarrow{\text{row}(2) - (-2)\,\text{row}(1)} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 0 & 1 \\ 2 & -5 & -4 \end{bmatrix} \quad\Rightarrow\quad L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}.
From the second Gauss operation we obtain L_{31} as follows:
\begin{bmatrix} 2 & 4 & -1 \\ 0 & 0 & 1 \\ 2 & -5 & -4 \end{bmatrix} \xrightarrow{\text{row}(3) - 1\,\text{row}(1)} \begin{bmatrix} 2 & 4 & -1 \\ 0 & 0 & 1 \\ 0 & -9 & -3 \end{bmatrix} \quad\Rightarrow\quad L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & L_{32} & 1 \end{bmatrix}.
However, we cannot continue the Gauss method to find an echelon form for A without interchanging the second and third rows. Therefore, the matrix A has no LU-factorization.

In Example 2.6.5 we also had a vanishing diagonal coefficient in the matrix U, namely U_{33} = 0. However, in that case A did have an LU-factorization, since the vanishing coefficient appeared in the last row of U, and no further Gauss operations were needed.
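As a complement to these hand computations, here is a minimal numerical sketch (Python with NumPy, not part of the notes) of the elimination procedure described in Theorem 2.33 for square matrices. The function name lu_no_pivoting is just a label chosen here; the multipliers stored in L are exactly the coefficients L_{ij} of the theorem, and the sketch stops when a row exchange would be needed, as in Example 2.6.6.

```python
import numpy as np

def lu_no_pivoting(A):
    """LU-factorization of a square matrix without row exchanges.

    Returns (L, U) with A = L U, L unit lower triangular and U an upper
    triangular echelon form of A.  Raises ValueError when a zero pivot
    with nonzero entries below it appears, i.e. when a row exchange
    would be required and Theorem 2.33 does not apply.
    """
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):
        if U[j, j] == 0.0:
            if np.any(U[j + 1:, j] != 0.0):
                raise ValueError("zero pivot: no LU-factorization without row exchanges")
            continue  # nothing to annihilate in this column
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]      # multiplier L_ij of Theorem 2.33
            U[i, :] -= L[i, j] * U[j, :]     # row(i) - L_ij row(j)
    return L, U

# Matrix of Example 2.6.5.
A = [[2, 4, -1], [-4, -5, 3], [2, -5, -4]]
L, U = lu_no_pivoting(A)
print(L)                                  # [[1, 0, 0], [-2, 1, 0], [1, -3, 1]]
print(U)                                  # [[2, 4, -1], [0, 3, 1], [0, 0, 0]]
print(np.allclose(L @ U, np.array(A)))    # True
# The matrix of Example 2.6.6 makes the function raise ValueError instead.
```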
Proof of Theorem 2.33: We only give a proof in the case that matrix A is 2 × 2 or 3 × 3. This would give an idea how to construct a proof in the general case. This generalization does not involve new ideas, only a more sophisticated notation. Assume that matrix A is 2 × 2, that is A= A11 A21 A12 . A22 Matrix A is assumed to satisfy the following property: A can be transformed into echelon form by the only Gauss operation of multiplying a row and add that result to another row. If the coefficient A21 = 0, then matrix A is upper triangular already and it has the trivial LU-factorization A = I2 A. If the coefficient A21 = 0, then from Example 2.6.6 we know that G. NAGY – LINEAR ALGEBRA December 8, 2009 93 the assumption on A implies that the coefficient A11 = 0. Then, we can perform the Gauss operation EA = U, that is 1 0 A11 A12 A11 A12 = A22 A11 − A21 A12 = U. EA = A21 A21 A22 1 0 − A11 A11 Matrix U is the upper triangular, and denoting U22 = A22 A11 − A21 A12 , A11 L21 = A21 , A11 we obtain that U= A11 0 A12 , U22 E= 1 −L21 0 1 ⇒ E−1 = 1 L21 0 . 1 We conclude that matrix A has an LU-factorization A= 1 L21 0 1 Assume now that matrix A is 3 × 3, that A11 A = A21 A31 A11 0 is A12 A22 A32 A12 . U22 A13 A23 . A33 Once again, matrix A is assumed to satisfy the following property: A can be transformed into echelon form by the only Gauss operation of multiplying a row and add that result to another row. If any of the coefficients A21 or A13 is non-zero, then the assumption on A implies that the coefficient A11 = 0. Then, we can perform the Gauss operation E2 E1 A = B, that is 1 00 1 00 A11 A12 A13 A11 A12 A13 0 1 0 −L21 1 0 = A21 A22 A23 = 0 B22 B23 , −L31 0 1 0 01 A31 A32 A33 0 B32 B33 where 1 E2 = 0 −L31 00 1 0 01 1 E1 = −L21 0 00 1 0 , 01 we used the notation A21 A31 , L31 = , A11 A11 and the Bij coefficients can be easily computed. Now, if the coefficient B32 = 0, then the assumption on A implies that the coefficient B22 = 0. We then assume that B22 = 0 and we proceed one more step. We can perform the Gauss operation E3 B = U, that is 1 0 0 A11 A12 A13 A11 A12 A13 1 0 0 B22 B23 = 0 B22 B23 = U, E3 B = 0 0 −L31 1 0 B32 B33 0 0 U33 L21 = B32 and the coefficient U33 can be easily computed. This B22 product can be expressed as follows, where we used the notation L31 = E3 E2 E1 A = U ⇒ A = E−1 E−1 E−1 U. 1 2 3 94 It is not difficult to see 1 E−1 = L21 1 0 G. NAGY – LINEAR ALGEBRA december 8, 2009 that 00 1 0 , 01 E− 1 2 1 = 0 L31 which implies that 00 1 0 , 01 E−1 3 1 = 0 0 0 1 L32 0 0 , 1 1 0 0 1 0 = L . E−1 E−1 E−1 = L21 1 2 3 L31 L32 1 We have then shown that matrix A has the LU-factorization A = L U, where A11 A12 A13 1 0 0 1 0 , B22 B23 . U= 0 L = L21 L31 L32 1 0 0 U33 It is not difficult to see that in the case where the coefficient B32 = 0, then the expression A = E−1 E−1 B is already the LU-factorization of matrix A. This establishes the Theorem 1 2 for 2 × 2 and 3 × 3 matrices. One important motivation for finding the LU-factorization of matrix is that it saves computer time when one solves linear systems with such coefficient matrix. Suppose that the m × n matrix A admits the LU-factorization A = LU. Suppose that b ∈ Fm . Then, any solution of the linear system Ax = b can be found as follows: First, find the vector y ∈ Fm solution of Ly = b; second, find the vector x ∈ Fn solution of Ux = y. The first step can be done using forward substitution, since L is lower triangular; the second step can be done using back substitution, since U is upper triangular. 
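The two triangular solves are easy to express in code. Below is a minimal sketch (Python with NumPy, not from the notes) of forward and back substitution, applied to the factors A = LU of Example 2.6.2; with b = (1, 2, 3) it reproduces the computation carried out by hand in Example 2.6.7 below. The function names are just labels chosen here.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b, with L lower triangular and nonzero diagonal."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y, with U upper triangular and nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Factors of Example 2.6.2:  A = L U.
L = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]])
U = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])
b = np.array([1., 2., 3.])

y = forward_substitution(L, b)   # y = [1, 1, 1]
x = back_substitution(U, y)      # x = [0, 0, 1], the solution of A x = b
print(y, x)
```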
Summarizing: Ly = b, Ax = b ⇔ LU x = b ⇔ Ux = y. Example 2.6.7: Use the LU-factorization of matrix A below, to find the solution x to the system Ax = b, where 111 1 A = 1 2 2 , b = 2 . 123 3 Solution: In Example 2.6.2 we have 10 L = 1 1 11 shown that A = LU with 0 111 0 , U = 0 1 1 . 1 001 We first find y solution of the system For the first system we have 100 1 1 1 0 2 ⇒ 111 3 Ly = b,then, having y, we find x solution of Ux = y. y1 = 1, y1 + y2 = 2, y + y + y = 3 , 1 2 3 ⇒ 1 y = 1 . 1 This system is solved for y using forward substitution. For the second system we have x1 + x2 + x3 = 1, 111 1 0 0 1 1 x2 + x3 = 1, 1 ⇒ ⇒ x = 0 . 1 001 1 x3 = 1, This system is solved for x using back substitution. G. NAGY – LINEAR ALGEBRA December 8, 2009 95 Further reading. There exists a vast literature on matrix factorization. See Section 3.10 in Meyer’s book [3] for few types of generalizations of the LU-factorizations, for example, admitting row interchanges in the process to obtain the factorization, or the LDU-factorization. 96 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 2.6.1.- Find the LU-factorization of matrix » – 5 2 A= . −15 −3 2.6.2.- Find the LU-factorization of matrix » – 213 A= . 467 2.6.5.- Find the LU-factorization of a tridiagonal matrix 3 2 2 −1 0 0 6−1 2 −1 07 7. T=6 4 0 −1 2 − 15 0 0 −1 1 2.6.3.- Find the LU-factorization of matrix 2 3 2 12 5 55 . A = 44 6 −3 5 2.6.6.- Use the LU-factorization to find the solutions to the system Ax = b, where 2 3 23 22 2 12 7 5 , b = 4245 . A = 44 7 6 18 22 12 2.6.4.- Determine if the matrix below has an LU-factorization, 2 3 12 4 17 63 6 −12 3 7 7. A=6 42 3 − 3 25 0 2 −2 6 2.6.7.- Find the values of the number c such that matrix A below has no LUfactorization, 2 3 c20 A = 41 c 15 . 01c G. NAGY – LINEAR ALGEBRA December 8, 2009 97 Chapter 3. Determinants 3.1. Definitions and properties A determinant is a scalar associated to a square matrix which can be used to determine whether the matrix is invertible or not, and this property is the origin of its name. The determinant has a clear geometrical meaning in the case of 2 × 2 and 3 × 3 matrices. In the former case the absolute value of the determinant is the area of a parallelogram formed with the matrix column vectors; in the latter case the absolute value of the determinant is the volume of the parallelepiped formed with the matrix column vectors. The determinant for n × n matrices is introduced as a suitable generalization of these properties. In this Section we present the determinant for 2 × 2 and 3 × 3 matrices and we study their main properties. We then present the definition of determinant for n × n matrices and we mention without proof its main properties. 3.1.1. Determinant of 2 × 2 matrices. We start introducing the determinant as a scalarvalued function on 2 × 2 matrices. Definition 3.1. The determinant of a 2 × 2 matrix A = A11 A21 A12 is the value of the A22 function det : F2,2 → F given by det(A) = A11 A22 − A12 A21 . Depending on the context we will use any of the following notations det(A) = |A| = ∆ = A11 A21 A12 , A22 for the determinant. For example, A11 A21 A12 = A11 A22 − A12 A21 . A22 Example 3.1.1: The value of the determinant can be any real number, as the following three cases show: 12 = 4 − 6 = − 2, 34 2 3 1 = 8 − 3 = 5, 4 1 2 2 = 4 − 4 = 0. 4 We have seen in Theorem 2.19 in Section 2.4 that a 2 × 2 matrix is invertible iff its determinant is non-zero. We now see that there is an interesting geometrical interpretation of this property. Proposition 3.2. 
Given a matrix A = [A1 , A2 ] ∈ R2,2 , the absolute value of its determinant, | det(A)|, is the area of the parallelogram formed by the vectors A1 , A2 . Proof of Proposition 3.2: Denote the matrix A as follows A= ab cd ⇒ A1 = a , c A2 = b . d The case where all the coefficients in matrix A are positive is shown in Fig. 27. 98 G. NAGY – LINEAR ALGEBRA december 8, 2009 b a c d A1 d y − c − + a A2 a b Figure 27. The geometrical meaning of the determinant of a 2 × 2 matrix is that its absolute value is the area of the parallelogram formed by the matrix column vectors. We can see in Fig. 27 that the area of the parallelogram formed by the vectors A1 and A2 is related to the area of the rectangle with sides b and c. More precisely, the area of the parallelogram is equal to the area of the rectangle minus the area of the two triangles marked with a “−” sign in Fig. 27 plus the area of the triangle marked with “+” sign. Denoting the area of the parallelogram by Ap , we obtain the equation ac yd (a − y )(c − d) − + 2 2 2 ac yd ac ad yc yd = bc − − + − − + 2 2 2 2 2 2 ad yc −. = bc − 2 2 Similar triangles implies that a y = ⇒ yc = ad. d c Introducing this relation in the equation above it we obtain Ap = bc − ad ad − ⇒ Ap = ad − bc = | det(A)|. 2 2 We consider only this case in our proof. The remaining cases can be studied in a similar way. This establishes the Theorem. Ap = bc − Example 3.1.2: Find the area of the parallelogram formed by a = 1 2 and b = . 2 1 12 , then the area A of the parallelo21 gram formed by the vectors a, b is A = | det(A)|, that is, Solution: If we consider the matrix A = [a, b] = det(A) = 12 = 1 − 4 = −3 21 ⇒ A = 3. G. NAGY – LINEAR ALGEBRA December 8, 2009 99 The determinant function on 2 × 2 matrices satisfies the following properties. Proposition 3.3. For all vectors a1 , a2 , b ∈ F2 and scalars k ∈ F, the determinant function det : F2,2 → F satisfies: (a) (b) (c) (d) det det det det [a1 , a2 ] = − det [a2 , a1 ] ; [k a1 , a2 ] = det [a1 , k a2 ] = k det [a1 , a2 ] ; [a1 , k a1 ] = 0 ; [(a1 + b), a2 ] = det [a1 , a2 ] + det [b, a2 ] . Proof of Proposition 3.3: Introduce the notation a1 = a , c a2 = b , d a1 , a2 = ⇒ ab . cd Part (a): ab = ad − bc cd ba det [a2 , a1 ] = = bc − ad, dc det [a1 , a2 ] = ⇒ det [a1 , a2 ] = − det [a2 , a1 ] . Part (b): det [k a1 , a2 ] = ka kc b ab = k (ad − bc) = k = k det [a1 , a2 ] , d cd det [a1 , k a2 ] = a kb ab = k (ad − bc) = k = k det [a1 , a2 ] . c kd cd Part (c): det [a1 , k a1 ] = det [k a1 , a1 ] = − det [a1 , k a1 ] ⇒ det [a1 , k a1 ] = 0. b1 . Then, b2 Part (d): Introduce the notation b = det [(a1 + b), a2 ] = (a + b1 ) b (c + b2 ) d = (a + b1 )d − (c + b2 )b = (ad − cb) + (b1 d − b2 b) = ab b +1 cd b2 b d = det [a1 , a2 ] + det [b, a2 ] . This establishes the Proposition. The determinant function also satisfies the following extra properties. Proposition 3.4. Let A, B ∈ F2,2 . Then it holds: (a) Matrix A is invertible iff det(A) = 0; (b) det(A) = det(AT ); (c) det(AB) = det(A) det(B). (d) If matrix A is invertible, then det(A−1 ) = 1 . det(A) 100 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 3.4: Part (a) have been proven in Theorem 2.19. ab Part (b): Denote A = . Then holds cd det(A) = Part (c): Denoting A = AB = A11 A21 ab ac = ad − bc = = det AT . cd bd A12 B11 and B = A22 B21 B12 we obtain, B22 (A11 B11 + A12 B21 ) (A11 B12 + A12 B22 ) . 
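Before turning to the proof, the identities in Proposition 3.4 are easy to spot-check numerically. The short sketch below (Python with NumPy, not part of the notes) tests parts (b), (c) and (d) on a few random 2 × 2 matrices; such a check is of course no substitute for the proof that follows.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

for _ in range(100):
    A = rng.integers(-5, 6, size=(2, 2)).astype(float)
    B = rng.integers(-5, 6, size=(2, 2)).astype(float)
    dA, dB = np.linalg.det(A), np.linalg.det(B)

    assert np.isclose(dA, np.linalg.det(A.T))           # (b) det(A) = det(A^T)
    assert np.isclose(np.linalg.det(A @ B), dA * dB)    # (c) det(AB) = det(A) det(B)
    if not np.isclose(dA, 0.0):                         # (d) det(A^{-1}) = 1/det(A)
        assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / dA)

print("all checks passed")
```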
(A21 B11 + A22 B21 ) (A21 B12 + A22 B22 ) Therefore, det(AB) = (A11 B11 + A12 B21 )(A21 B12 + A22 B22 ) − (A11 B12 + A12 B22 )(A21 B11 + A22 B21 ) = A11 B11 A21 B12 + A11 B11 A22 B22 + A12 B21 A21 B12 + A12 B21 A22 B22 − A21 B11 A11 B12 − A21 B11 A12 B22 − A22 B21 A11 B12 − A22 B21 A12 B22 = A11 B11 A22 B22 + A12 B21 A21 B12 − A21 B11 A12 B22 − A22 B21 A11 B12 = (A11 A22 − A21 A12 )B11 B22 − (A11 A22 − A12 A21 )B12 B21 = (A11 A22 − A21 A12 ) (B11 B22 − B12 B21 ) = det(A) det(B). Part (d): Since A(A−1 ) = I2 , we obtain 1 = det(I2 ) = det A(A−1 ) = det(A) det A−1 ⇒ det A−1 = 1 . det(A) This establishes the Proposition. We finally mention that det(A) = det(AT ) implies that Proposition 3.3 can be generalized from properties involving columns of the matrix into properties involving rows of the matrix. Introduce the following notation that generalizes column vectors to include row vectors: A= A11 A21 A12 A = A:1 , A:2 = 1: , A22 A2: where A:1 = A11 , A21 A:2 = A12 , A22 A1: = A11 A12 , A2: = A21 A22 . The first two vectors above are the usual column vectors of matrix A, and the last two are its row vectors. When working with both, column vectors and row vectors, we use the notation above; when working only with column vectors, we drop the colon and we denote them, for example, as A:1 = A1 . Using this notation, it is simple to verify the following result. Proposition 3.5. For all vectors aT , aT , bT ∈ F2 and scalars k ∈ F, the determinant 1: 2: 1: function det : F2,2 → F satisfies: a1: a2: (a) det = − det ; a2: a1: k a1: a1: a1: (b) det = det = k det ; a2: k a2: a2: a1: (c) det = 0; k a1: G. NAGY – LINEAR ALGEBRA December 8, 2009 (d) det (a1: + b1: ) a2: = det a1: a2: + det b1: a2: 101 . The proof is left as an exercise. 3.1.2. Determinant of 3 × 3 matrices. The determinant for 3 × 3 matrices is defined recursively in terms of 2 × 2 determinants. Definition 3.6. The determinant on 3 × 3 matrices is the function det : F3,3 → F, A11 det(A) = A21 A31 A12 A22 A32 A13 A A23 = A11 22 A32 A33 A A23 − A12 21 A31 A33 A A23 + A13 21 A31 A33 A22 . A32 The determinant of a 3 × 3 matrix is computed using the determinants of 2 × 2 matrices, A A det(A) = A11 det ˚11 − A12 det ˚12 + A13 det ˚13 , A where ˚ = A22 A11 A32 A23 , A33 ˚ = A21 A12 A31 A23 , A33 ˚ = A21 A13 A31 A21 ; A31 that is, the 2 × 2 matrix ˚ij is obtained from the 3 × 3 matrix A by removing the row i and A the column j . Example 3.1.3: Find the determinant of the matrix 1 3 −1 1 . A = 2 1 32 1 Solution: We use the definition above, that is, 1 2 3 3 1 2 −1 11 2 1 =1 −3 21 3 1 1 21 + (−1) 1 32 = (1 − 2) − 3(2 − 3) − (4 − 3) =1+3−1 = 1. As in the 2 × 2 case, the determinant of a real-valued 3 × 3 matrix can be any real number, positive, negative of zero. The absolute value of the determinant has the geometrical meaning, see Fig. 28. Theorem 3.7. Let A = [A1 , A2 , A3 ] be a 3 × 3 matrix. Then, the absolute value of the determinant, | det(A)|, is the volume of the parallelepiped formed by the vectors A1 , A2 , A3 . We do not give a prove of this statement. See the references at the end of the Section. Example 3.1.4: Show that the determinant of an upper triangular or a lower triangular matrix is the product of its diagonal elements. 102 G. NAGY – LINEAR ALGEBRA december 8, 2009 x3 A1 A3 x1 x2 A2 Figure 28. The geometrical meaning of the determinant of a 3 × 3 matrix is that its absolute value is the volume of the parallelepiped formed by the matrix column vectors. 
Solution: From the definition of determinant of a 3 × 3 matrix we obtain, A11 0 0 A12 A22 0 A13 A A23 0 A23 = A11 22 − A12 0 A33 0 A33 = A11 A22 A33 ; A11 A21 A31 0 A22 A32 0 A 0 A 0 = A11 22 − 0 21 A32 A33 A31 A33 = A11 A22 A33 . A23 0 + A13 A33 0 A22 0 0 A + 0 21 A33 A31 A22 A32 The determinant function on 3 × 3 satisfies a generalization of the properties proven for 2 × 2 matrices in Propositions 3.3. Proposition 3.8. For all vectors a1 , a2 , a3 , b ∈ F3 and scalars k ∈ F, the determinant function det : F3,3 → F satisfies: (a) det [a1 , a2 , a3 ] = − det [a2 , a1 , a3 ] = − det [a1 , a3 , a2 ] ; (b) det [k a1 , a2 , a3 ] = k det [a1 , a2 , a3 ] ; (c) det [a1 , k a1 , a3 ] = 0; (d) det [(a1 + b), a2 , a3 ] = det [a1 , a2 , a3 ] + det [b, a2 , a3 ] . The proof of this Proposition is left as an exercise. The property (a) implies that all the remaining properties (b)-(d) also hold for all the column vectors. That is, from properties (a) and (b) one also shows det [k a1 , a2 , a3 ] = det [a1 , k a2 , a3 ] = det [a1 , a2 , k a3 ] = k det [a1 , a2 , a3 ] ; from properties (a) and (c) one also shows det [a1 , a2 , k a1 ] = 0, det [a1 , a2 , k a2 ] = 0; and from properties (a) and (d) one also shows det [a1 , (a2 + b), a3 ] = det [a1 , a2 , a3 ] + det [a1 , b, a3 ] , det [a1 , a2 , (a3 + b)] = det [a1 , a2 , a3 ] + det [a1 , a2 , b] . G. NAGY – LINEAR ALGEBRA December 8, 2009 103 The following result is a generalization of Proposition 3.4. Proposition 3.9. Let A, B ∈ F3,3 . Then it holds: (a) Matrix A is invertible iff det(A) = 0; (b) det(A) = det(AT ); (c) det(AB) = det(A) det(B). 1 (d) If matrix A is invertible, then det(A−1 ) = . det(A) The proof of this Proposition is left as an exercise. Like in the case of matrices 2 × 2, Proposition 3.8 can be generalized from properties involving column vector to properties involving row vectors. Proposition 3.10. For all vectors aT , aT , aT , bT ∈ F3 and scalars k ∈ F, the determinant 1: 2: 3: 1: function det : F3,3 → F satisfies: a1: a2: a3: a1: (a) det a2: = − det a1: ; = − det a2: ; = − det a3: ; a a a1: a2: 3: 3: a1: k a1: (b) det a2: = k det a2: ; a3: a3: a1: (c) det k a1: = 0; a 3: (a1: + b1: ) a1: b1: = det a2: + det a2: . a2: (d) det a3: a3: a3: The proof is left as an exercise. The property (a) implies that all the remaining properties (b)-(d) also hold for all the row vectors. That is, from properties (a) and (b) one also shows k a1: a1: a1: a1: det a2: = det k a2: = det a2: = k det a2: ; a3: a3: k a3: a3: from properties (a) and (c) one also shows a1: det a2: = 0; k a1: from properties (a) and (d) one also shows a1: det (a2: + b2: ) = det a3: a1: = det a2: det (a3: + b3: ) a1: det k a2: = 0; a2: a1: a2: + det a3: a1: a2: + det a3: a1: b2: , a3: a1: a2: . b3: 3.1.3. Determinant of n × n matrices. The notion of determinant can be generalized to n × n matrices with n 1. One way to find an appropriate generalization is to realize that the determinant of 2 × 2 and 3 × 3 matrices are related to the notion of areas and volumes, respectively. One then studies the properties of areas an volumes, like the ones given in Proposition 3.3 and 3.8. It can be shown that generalizations of these properties 104 G. NAGY – LINEAR ALGEBRA december 8, 2009 determine a unique function det : Fn,n → F. In this notes we only present the final result, the definition of determinant for n × n matrices. The reader is referred to the literature for a constructive proof of the function determinant. Definition 3.11. 
Let A = [Aij ] ∈ Fn,n , for n det : Fn,n → F is given by 1 and i, j = 1, · · · , n. The function det(A) = A11 C11 + · · · + A1n C1n , where Cij , called cofactor i, j of matrix A, is the scalar given by Cij = (−1)(i+j ) det ˚ij , A where the matrices ˚ij ∈ F(n−1),(n−1) , called minors matrices of A, are obtained from A matrix A by eliminating the row i and the column j , which are denoted in boldface in the expression below A11 · · · A1j · · · A1n . . . . . . . . . ˚ = A Aij i1 · · · Aij · · · Ain . . . . . . . . . . An1 · · · Anj · · · Ann Example 3.1.5: Use the formula in Def. 3.11 to compute the determinant of a 2 × 2 matrix. Solution: Consider the matrix A = ˚ =A , A11 22 A11 A21 ˚ =A , A12 21 A12 . The four minor matrices are given by A22 ˚ =A , A21 12 ˚ =A . A22 11 This means, generalizing the notion of determinant to numbers by det(a) = a, that the cofactors are C11 = A22 , C12 = −A21 , C21 = −A12 , C22 = A11 . So, we obtain that det(A) = A11 C11 + A12 C12 = A11 A22 − A12 A21 . Notice that we do not need to compute all four cofactors to compute the determinant of the matrix; just two of them are enough. Example 3.1.6: Use the formula in Def. 3.11 to compute the determinant of a 3 × 3 matrix. Solution: Consider the matrix A11 A = A21 A31 A12 A22 A32 A13 A23 . A33 The nine minor matrices are given by ˚ = A22 A11 A32 A23 A33 ˚ = A21 A12 A31 A23 A33 ˚ = A21 A13 A31 A22 A32 ˚ = A12 A21 A32 A13 A33 ˚ = A11 A22 A31 A13 A33 ˚ = A11 A23 A31 A12 A32 ˚ = A12 A31 A22 A13 A23 ˚ = A11 A32 A21 A13 A23 ˚ = A11 A23 A21 A12 . A22 G. NAGY – LINEAR ALGEBRA December 8, 2009 105 Then, Def. 3.11 agrees with Def. 3.6, since det(A) = A11 C11 + A12 C12 + A13 C13 = A11 A22 A32 A23 A − A12 21 A33 A31 A23 A + A13 21 A33 A31 A22 . A32 We now state several results that summarize the main properties of the determinant of n×n matrices. We state the results without proof. The frost result says that the determinant of a matrix can be computed using expansions along any row or any column in the matrix. Proposition 3.12. The determinant of an n × n matrix A = [Aij ] can be computed expanding along any row or any column of matrix A, that is, det(A) = Ai1 Ci1 + · · · + Ain Cin , = A1j C1j + · · · + Anj Cnj , i = 1, · · · , n, j = 1, · · · , n. Example 3.1.7: Show all possible expansions of the determinant of a 3 × 3 matrix A = [Aij ]. Solution: The expansions along each of the three rows are the following: det(A) = A11 C11 + A12 C12 + A13 C13 = A21 C21 + A22 C22 + A23 C23 = A31 C31 + A32 C32 + A33 C33 ; The expansions along each of the three columns are the following: det(A) = A11 C11 + A21 C21 + A31 C31 = A12 C12 + A22 C22 + A32 C32 = A13 C13 + A23 C23 + A33 C33 . Example 3.1.8: Use an expansion by the matrix 1 A = 2 3 third column to compute the determinant of 3 −1 1 1 . 2 1 Solution: The expansion by the third column is the following, 13 21 32 −1 21 1 1 = (−1) − (1) 32 2 1 3 1 + (1) 3 2 3 1 = −(4 − 3) − (2 − 9) + (1 − 6) = −1 + 7 − 5 = 1. This results agrees with Example 3.1.3. The generalization of Proposition 3.8 to n × n matrices is the following. Proposition 3.13. For all vectors ai , aj ,b ∈ Fn , with i, j = 1, · · · , n, and for all scalars k ∈ F, the determinant function det : Fn,n → F satisfies: 106 G. 
NAGY – LINEAR ALGEBRA december 8, 2009 (a) (b) (c) (d) det det det det [a1 , · · · [a1 , · · · [a1 , · · · [a1 , · · · , ai , · · · , aj , · · · , an ] = − det [a1 , · · · , aj , · · · , ai , · · · , an ] ; , k ai , · · · , an ] = k det [a1 , · · · , ai , · · · , an ] ; , k ai , · · · , ai , · · · , an ] = 0; , (ai + b), · · · , an ] = det [a1 , · · · , ai , · · · , an ] + det [b, · · · , b, · · · , an ] . The generalization of Proposition 3.9 to n × n matrices is the following. Proposition 3.14. Let A, B ∈ Fn,n . Then it holds: (a) Matrix A is invertible iff det(A) = 0; (b) det(A) = det(AT ); (c) det(AB) = det(A) det(B). 1 (d) If matrix A is invertible, then det(A−1 ) = . det(A) The proof of this Proposition is left as an exercise. Like in the case of matrices 2 × 2, Proposition 3.8 can be generalized from properties involving column vector to properties involving row vectors. Proposition 3.15. For all vectors aT: , aT: , bT: ∈ Fn and scalars k ∈ F, the determinant i j i function det : Fn,n → F satisfies: a1: . . . a1: . . . ai: . =− . . aj : .; . . aj : . . . ai: . . . an: an: a1: . . . a1: . . . k ai: = k ai: ; . . . . . . an: an: a1: . . . k ai: . = 0; . . ai: . . . an: a1: . . . a1: . . . a1: . . . (ai: + bi: ) = ai: + bi: . . . . . . . . . . an: an: an: G. NAGY – LINEAR ALGEBRA December 8, 2009 107 Exercises. 3.1.1.- Find the determinant of matrices 2 3 2 3 211 211 A = 4 6 2 15 , B = 4 6 0 15 . −2 2 1 −2 0 1 3.1.2.- Find the volume of the parallelepiped formed by the vectors 23 23 23 3 0 1 x1 = 4 0 5 , x2 = 425 , x3 = 405 . −4 0 1 3.1.3.- Find the determinants of the upper and lower triangular matrices 2 3 2 3 123 100 U = 40 4 5 5 , L = 4 2 3 0 5 . 006 456 3.1.4.- Find the determinants of the matrix 2 3 001 A = 40 2 3 5 . 456 3.1.5.- Give an example to show that in general det(A + B) = det(A) + det(B). 3.1.6.- If A ∈ Fn,n express det(2A), det(−A), det(A2 ) in terms of det(A) and n. 3.1.7.- Given matrices A, P ∈ Fn,n , with P invertible, let B = P−1 A P. Prove that det(B) = det(A). 3.1.8.- Prove that for all matrix A ∈ Fn,n holds that det(A∗ ) = det(A). 3.1.9.- Prove that for all A ∈ Fn,n holds that det(A∗ A) 0. 3.1.10.- Prove that for all A ∈ Fn,n and all k ∈ F holds that det(k A) = kn det(A). 3.1.11.- Prove that a skew-symmetric matrix A ∈ Fn,n , with n odd, must satisfy that det(A) = 0. 3.1.12.- Let A ∈ Fn,n be a matrix satisfying that AT A = In . Prove that det(A) = ±1. 108 G. NAGY – LINEAR ALGEBRA december 8, 2009 3.2. Applications There are several uses of determinants. They determine whether a matrix A ∈ Fn,n is invertible or not, since a matrix A is invertible iff det(A) = 0. The absolute value of the determinant of a 2 × 2 matrix is the area of the parallelogram formed by the matrix column vectors; while the absolute value of the determinant of a 3 × 3 matrix is the volume of the parallelepiped formed by the matrix column vectors. The determinant also provides a formula for the inverse of a matrix. The existence of this formula does not change the fact that one can always compute the inverse matrix using Gauss operations. This formula for the inverse matrix is important though, since it explicitly shows how the inverse matrix depends on the coefficients of the original matrix. It thus shows how the inverse changes when the original matrix changes. Theorem 3.16. Given a matrix A = [Aij ] ∈ Fn,n , let C = [Cij ] be its cofactor matrix, where the cofactors are given by Cij = (−1)(i+j ) det(˚ij ), and matrix ˚ij ∈ F(n−1),(n−1) is A A the (i, j ) minor of matrix A. 
If det(A) = 0, then the inverse of matrix A is given by A−1 = 1 CT , det(A) A−1 that is, ij = 1 Cji . det(A) Proof of Theorem 3.16: Since det(A) = 0 we know that matrix A is invertible. We only need to verify that the expression for A−1 given in Theorem 3.16 is correct. That is, we need to show that CT A = det(A) In = A CT . We start with the product (CT A)ij = C1i A1j + · · · + Cni Anj . Notice that this component (CT A)ij is precisely the the expansion along the column i of the determinant of the following matrix: The matrix constructed from matrix A by placing the column vector Aj : in both columns i and j . In the particular case that i < j , this component (CT A)ij has the form A11 . . . A1j . . . ··· A1 j . . . ··· A1n . . . Ai1 . . . ··· Aij . . . ··· Aij . . . ··· Ain ., . . Aj 1 . . . ··· Ajj . . . ··· Ajj . . . ··· Ajn . . . An1 C1i A1j + · · · + Cni Anj = ··· ··· Anj ··· Anj ··· Ann i < j, where in boldface we have highlighted the column i which is occupied by the column vector Aj : . It is simple to see that the determinant on the right hand side vanishes, since for i < j the matrix has the columns i and j repeated. A similar situation happens for i > j . So we conclude that the off-diagonal elements of (CT A) vanish. The diagonal elements are given by (CT A)ii = C1i A1i + · · · + Cni Ani = det(A), the determinant of A expanded along the i-th column. This shows that CT A = det(A) In . A similar analysis shows that A CT = det(A) In . This establishes the Theorem. Example 3.2.1: Use the formula in Theorem 3.16 to find the inverse of matrix 1 3 −1 1 . A = 2 1 32 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 109 Solution: We already know from Example 3.1.3 that det(A) = 1. We now need to compute all the cofactors: 11 21 = −1; C11 = 3 −1 2 1 = −5; 3 1 = 4; 2 3 = 1; 1 −1 3 1 = 4; C23 = − C22 = 13 21 = −5. C33 = C32 = − Therefore, the cofactor matrix is given by −1 C = −5 4 1 4 −3 13 32 = 7; 1 −1 2 1 = −3; −1 1 1 2 C13 = = 1; C21 = − C31 = 21 31 C12 = − 1 7 , −5 and the formula A−1 = CT / det(A) together with det(A) = 1 imply that −1 −5 4 4 −3 . A−1 = 1 1 7 −5 The formula in Theorem 3.16 is useful to compute individual components of the inverse of a matrix. Example 3.2.2: Find the coefficients A−1 12 1 A = 1 1 and A−1 32 of the matrix 11 −1 2 . 13 Solution: We first need to compute −1 2 12 1 − (1) + (1) 13 13 1 det(A) = (1) −1 = −5 − 1 + 2 1 ⇒ det(A) = −4. The formula in Theorem 3.16 implies that A−1 12 A−1 = 32 1 C21 , −4 = 1 C23 , −4 C21 = (−1) 1 1 C23 = (−1) 1 = −2, 3 ⇒ A−1 11 = 0, 11 ⇒ A−1 12 32 = 1 . 2 = 0. 110 G. NAGY – LINEAR ALGEBRA december 8, 2009 3.2.1. Cramer’s rule. Given an invertible matrix A ∈ Fn,n , a system of linear equations Ax = b has a unique solution x for every source vector b ∈ Fn . This solution can be written in terms of the inverse matrix A−1 as x = A−1 b. The formula for the inverse matrix given in Theorem 3.16 provides an explicit expression for the solution x. The result is known as Cramer’s rule, and it is summarized below. Theorem 3.17. Let A = [A1 , · · · , An ] ∈ Fn,n be an invertible matrix. Then, the system of linear equations Ax = b has a unique solution x = [xi ] for every b ∈ Fn given by xi = det Ai (b) , det(A) with matrix Ai (b) = [A1 , · · · , b, · · · , An ] where vector b is placed in the i-th column. Example 3.2.3: Use Cramer’s rule to find the solution x of the linear system Ax = b, where A= 3 −5 −2 , 6 b= 7 . −5 Solution: We first need to compute the determinant of A, that is, det(A) = 18 − 10 ⇒ det(A) = 8. 
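Theorem 3.17 translates directly into a short program. The sketch below is only an illustration and assumes the sympy library (the helper name cramer is introduced here); it reproduces the solution that is completed in the remainder of this example.

    # Cramer's rule (Theorem 3.17): x_i = det(A_i(b)) / det(A), where A_i(b)
    # is matrix A with its i-th column replaced by the source vector b.
    from sympy import Matrix

    def cramer(A, b):
        d = A.det()
        assert d != 0, "Cramer's rule needs an invertible coefficient matrix"
        n = A.shape[1]
        x = []
        for i in range(n):
            # A_i(b): matrix A with column i replaced by b.
            Ai = Matrix.hstack(*[b if j == i else A[:, j] for j in range(n)])
            x.append(Ai.det() / d)
        return Matrix(x)

    A = Matrix([[ 3, -2],
                [-5,  6]])
    b = Matrix([7, -5])
    print(cramer(A, b))      # Matrix([[4], [5/2]]), as found below
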
Then, we need to find the matrices A1 (b) and A2 (b), given by A1 (b) = [b, A2 ] = 7 −5 −2 , 6 A2 (b) = [A1 , b] = 3 −5 7 . −5 We finally compute their determinants, det(A1 (b)) = 42 − 10 = 32, det(A2 (b)) = −15 + 35 = 20. So the solution is, x= x1 x2 x1 = 32 = 4, 8 x2 = 20 5 =, 8 2 ⇒ x= 4 . 5/2 Proof of Theorem 3.17: Since matrix A is invertible, the solution to the linear system is x = A−1 b. Using the formula for the inverse matrix given in Theorem 3.16, the solution x = [xi ] can be written as x= 1 CT b det(A) ⇒ xi = 1 det(A) n Cji bj . j =1 Notice that the sum in the last equation above is precisely the expansion along the column i of the determinant of the matrix Aj (b), that is, det(Ai (b)) = A11 . . . An1 ··· ··· b1 . . . bn ··· ··· A1n . = b C + ··· + b C = . 1 1i n ni . Ann n bj Cji , j =1 where the vector b replaces the vector Ai in the column i of matrix A. So, we conclude that xi = This establishes the Theorem. det(Ai (b)) . det(A) G. NAGY – LINEAR ALGEBRA December 8, 2009 111 3.2.2. Determinants computed with Gauss operations. Gauss elimination operations can be used to compute the determinant of a matrix. If a Gauss operation on matrix A produces matrix B, then det(A) is related to det(B) in a precise way. This relation is summarized in the following result. Proposition 3.18. Let A, B ∈ Fm,n be matrices related by a Gauss operation, that is, A → B by a Gauss operation. Then, the following statements hold: (a) If matrix B is the result of adding to a row in A a multiple of another row in A, then det(A) = det(B); (b) If matrix B is the result of interchanging two rows in A, then det(A) = − det(B); (c) If matrix B is the result of the multiplication of one row in A by the scalar k ∈ F, then 1 det(A) = det(B). k Proof of Proposition 3.18: We use the row vector notation for matrices A and B, A1: B1: . . A = . , B = . . . . An: Bn: Part (a): Matrix B results from multiplying row j of A by k and adding that to row i, A1: A1: A1: A1: A1: . . . . . . . . . . . . . . . Ai: + k Aj : Ai: + k Aj : Ai: Aj : Ai: . . . . . . . B= = . + k . = . = det(A). ⇒ det(B) = . . . . . Aj : Aj : Aj : Aj : Aj : . . . . . . . . . . . . . . . An: An: An: An: An: Part (b): Matrix B is the result of the interchange of rows i and j in matrix A, that is, A1: A1: A1: A1: . . . . . . . . . . . . Ai: Aj : Aj : Ai: . . . . A = . → B = . ⇒ det(B) = . = − . = − det(A). . . . . Aj : Ai: Ai: Aj : . . . . . . . . . . . . An: An: An: An: Part (c): Matrix B is the result of the multiplication of row i A1: A1: A1: . . . . . . . . . A = Ai: → B = k Ai: ⇒ det(B) = k Ai: = k . . . . . . . . . An: An: An: This establishes the Proposition. of A by k ∈ F, that is, A1: . . . Ai: = k det(A). . . . An: 112 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 3.2.4: Use Gauss operations to transform matrix A below into upper-triangular form, and use that calculation to find det(A), where, 1 3 −1 1 . A = 2 1 32 1 Solution: We perform Gauss operations to transform A into upper-triangular form: 1 det(A) = 2 3 3 1 2 1 −1 1 =0 0 1 3 −5 −7 1 −1 3 = (−5) 0 0 4 13 3 −1 1 −3/5 = (−5) 0 1 00 −7 4 −1 −3/5 . −1/5 We only need to reduce matrix A into upper-triangular form, not into reduced echelon form, since the determinant of an upper-triangular matrix is simple enough to find, just the product of its diagonal elements. In our case we find that 13 det(A) = (−5) 0 1 00 −1 1 −3/5 = (−5)(1)(1) − 5 −1/5 ⇒ det(A) = 1. G. NAGY – LINEAR ALGEBRA December 8, 2009 113 Exercises. 3.2.1.- Use determinants to where 2 21 A=4 6 2 −2 2 compute A−1 , 3 1 15 . 
1 3.2.2.- Find the coefficients (A−1 )12 and (A−1 )32 , where 2 3 157 A = 42 1 0 5 . 413 3.2.3.- Use Gauss operations to reduce matrices A and B below to upper triangular form and evaluate their determinant, where 3 3 2 2 123 1 35 4 25 . A = 42 4 1 5 , B = 4− 1 144 3 −2 4 3.2.4.- Use Gauss operations to prove the formula ˛ ˛ ˛1 a a 2 ˛ ˛ ˛ ˛1 b b2 ˛ = (b − a)(c − a)(c − b). ˛ ˛ ˛1 c c 2 ˛ 3.2.5.- Use the det(A) to find the values of the constant k such that the system Ax = b has a unique solution for every source vector b, where 2 3 1k 0 A = 40 1 −15 . k0 1 3.2.6.- Use Cramer’s rule to find the solution to the linear system a x1 + b x 2 = 1 c x 1 + d x2 = 0 . where a, b, c, d ∈ R. 3.2.7.- Use Cramer’s rule to find the solution to the linear system 32 3 2 3 2 x1 1 111 41 1 05 4x2 5 = 425 . x3 3 011 114 G. NAGY – LINEAR ALGEBRA december 8, 2009 Chapter 4. Vector spaces 4.1. Spaces and subspaces In this Section we introduce the notion of a vector space as a set of elements where the linear combination operation can be defined. The actual elements that constitute the vector space are left unspecified, only the relation among them is determined. An example of a vector space is the set Fn of n-vectors with the operation of linear combination studied in Chapter 1. Another example is the set Fm,n of all m × n matrices with the operation of linear combination studied in Chapter 2. We now define a vector space and comment its main properties. A subspace is introduced later on as a smaller vector space inside the original vector space. We end this Section with the concept of span of a set of vectors, which is a way to construct a subspace from a subset in a vector space. Definition 4.1. A set V is a vector space over the scalar field F ∈ {R, C} iff there are two operations defined on V , called vector addition and scalar multiplication with the following properties: For all u, v, w ∈ V the vector addition satisfies (A1) u + v ∈ V , (closure of V ); (A2) u + v = v + u, (commutativity); (A3) (u + v) + w = u + (v + w), (associativity); (A4) ∃ 0 ∈ V : 0 + u = u ∀ u ∈ V , (existence of a neutral element); (A5) ∀ u ∈ V ∃ (−u) ∈ V : u + (−u) = 0, (existence of an opposite element); furthermore, for all a, b ∈ F the scalar multiplication satisfies (M1) au ∈ V , (closure of V ); (M2) 1 u = u, (neutral element); (M3) a(bu) = (ab)u, (associativity); (M4) a(u + v) = au + av, (distributivity); (M5) (a + b)u = au + bu, (distributivity). The definition of a vector space does not specify the elements of the set V , it only determines the properties of the vector addition and scalar multiplication operations. We use the convention that elements in a vector space, vectors, are represented in boldface. Nevertheless, we allow several exceptions, the first two of them are given in Examples 4.1.1 and 4.1.2. We now present several examples of vector spaces. Example 4.1.1: The set Fn of n-vectors u = [ui ] with components ui ∈ F and the operations of column vector addition and scalar multiplication given by [ui ] + [vi ] = [ui + vi ], a[ui ] = [aui ]. is a vector space. This space of column vectors was introduced in Chapter 1. Elements in these vector spaces are not represented in boldface, instead we keep the previous sanserif font, u ∈ Fn . The reason for this notation will be clear in Sect. 4.4. Example 4.1.2: The set Fm,n of m × n matrices A = [Aij ] with matrix coefficients Aij ∈ F and the operations of matrix addition and scalar multiplication given by Aij + Bij = Aij + Bij , a Aij = aAij , is a vector space. 
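For F^n with the componentwise operations of Example 4.1.1 all ten axioms are straightforward to verify. The following sketch (an illustration only, with the numpy library assumed) spot-checks a few of them on randomly chosen vectors and scalars.

    # Numerical spot check of some vector space axioms (Definition 4.1) for
    # R^n with the componentwise operations of Example 4.1.1. Assumes numpy.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    u, v, w = rng.normal(size=(3, n))      # three random vectors in R^n
    a, b = rng.normal(size=2)              # two random scalars

    checks = {
        "(A2) u + v = v + u":              np.allclose(u + v, v + u),
        "(A3) (u + v) + w = u + (v + w)":  np.allclose((u + v) + w, u + (v + w)),
        "(M3) a(bu) = (ab)u":              np.allclose(a * (b * u), (a * b) * u),
        "(M4) a(u + v) = au + av":         np.allclose(a * (u + v), a * u + a * v),
        "(M5) (a + b)u = au + bu":         np.allclose((a + b) * u, a * u + b * u),
    }
    for axiom, holds in checks.items():
        print(axiom, holds)                # all True (up to round-off)
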
These operations were introduced in Chapter 2. As in the previous example, elements in these vector spaces are not represented in boldface, instead we keep the previous capital sanserif font, A ∈ Fm,n . The reason for this notation will be clear in Sect. 4.4. G. NAGY – LINEAR ALGEBRA December 8, 2009 115 Example 4.1.3: Let Pn (U ) be the set of all polynomials having degree n U ⊂ F, that is, 0 and domain Pn (U ) = p(x) = a0 + a1 x + · · · + an xn , with a0 , · · · , an ∈ F and x ∈ U ⊂ F . The set Pn (U ) together with the addition of polynomials (p + q)(x) = p(x) + q(x) and the scalar multiplication (ap)(x) = a p(x) is a vector space. In the case U = R we use the notation Pn = Pn (R). Example 4.1.4: Let C k [a, b], F be the set of scalar valued functions with domain [a, b] ⊂ R with the k -th derivative being a continuous function, that is, C k [a, b], F = f : [a, b] → F such that f (k) is continuous . k The set C [a, b], F together with the addition of functions (f + g)(x) = f (x) + g(x) and the scalar multiplication (af )(x) = a f (x) is a vector space. The particular case C k (R, R) is denoted simply as C k . The set of real valued continuous function is then C 0 . Example 4.1.5: Let be the set of absolute convergent series, that is, ∞ = a= an : an ∈ F and |an | exists . n=0 The set with the addition of series a + b = ca = c an is a vector space. (an + bn ) and the scalar multiplication The properties (A1)-(M5) given in the definition of vector space are not redundant. For example, notice that these properties do not include the condition that the neutral element 0 is unique, since it follows from the definition. Proposition 4.2. The 0 element in a vector space is unique. Proof Proposition 4.2: Suppose that there exist two neutral elements 01 and 02 in the vector space V , that is, 01 + u = u and 02 + u = u for all u∈V Taking u = 02 in the first equation above, and u = 01 in the second equation above we obtain that 01 + 02 = 02 , 02 + 01 = 01 . These equations above simply that the two neutral elements must be the same, since 02 = 01 + 02 = 02 + 01 = 01 ; where in the second equation we used that the addition operation is commutative. This establishes the Proposition. Proposition 4.3. 0 u = 0 for all element u in a vector space V . Proof Proposition 4.3: u = 1 u = (1 + 0)u = 1 u + 0 u = u + 0 u = 0 u + u ⇒ 0 u = u. Hence, the element 0 u is a neutral element. Since Proposition 4.2 says that the neutral element is unique, we conclude that 0 u = 0. This establishes the Proposition. 116 G. NAGY – LINEAR ALGEBRA december 8, 2009 Also notice that the property (A5) in the definition of vector space says that the opposite element exists, but it does not say whether it is unique. The opposite element is actually unique. Proposition 4.4. The opposite element −u in a vector space is unique. Proof Proposition 4.4: Suppose there are two opposite elements −u1 and −u2 to the element u ∈ V , that is, u + (−u1 ) = 0, u + (−u2 ) = 0. Therefore, (−u1 ) = 0 + (−u1 ) = u + (−u2 ) + (−u1 ) = (−u2 ) + u + (−u1 ) = (−u2 ) + 0 = 0 + (−u2 ) = (−u2 ) ⇒ (−u1 ) = (−u2 ). This establishes the Proposition. Finally, notice that the element (−u) opposite to u is actually the element (−1)u. Proposition 4.5. (−1) u = (−u). Proof Proposition 4.4: 0 = 0 u = (1 − 1) u = 1 u + (−1) u = u + (−1) u. Hence (−1) u is an opposite element of u. Since Proposition 4.4 says that the opposite element is unique, we conclude that (−1) u = (−u). This establishes the Proposition. 4.1.1. Subspaces. 
We now introduce the notion of a subspace of a vector space, which is essentially a smaller vector space inside the original vector space. Subspaces are important in physics, since physical processes frequently take place not inside the whole space but in a particular subspace. For instance, planetary motion does not take plane in the whole space but it is confined to a plane. Definition 4.6. The subset W ⊂ V of a vector space V over F is called a subspace of V iff for all u, v ∈ W and all a, b ∈ F holds that au + bv ∈ W . A subspace is a particular type of set in a vector space. Is a set where all possible linear combinations of two elements in the set results in another element in the same set. In other words, elements outside the set cannot be reached by linear combinations of elements within the set. For this reason a subspace is called a closed set under linear combinations. The following statement is usually helpful Proposition 4.7. If W ⊂ V is a subspace of a vector space V , then 0 ∈ W . Proof of Proposition 4.7: Since W is closed under linear combinations, given any element u ∈ W , the trivial linear combination 0 u = 0 ∈ W . This establishes the Proposition. This fact helps proving that a set W is not a subspace: Simply verify that 0 ∈ W . / However, if actually 0 ∈ W , this fact alone does not prove that W is a subspace. One must show that W is closed under linear combinations. Example 4.1.6: Show that the set W ⊂ R3 given by W = u = [ui ] ∈ R3 : u3 = 0 is a subspace of R3 : G. NAGY – LINEAR ALGEBRA December 8, 2009 117 Solution: Given two arbitrary elements u, v ∈ W we must show that au + bv ∈ W for all a, b ∈ R. Since u, v ∈ W we know that v1 u1 u = u2 , v = v2 . 0 0 Therefore au1 + bv1 au + bv = au2 + bv2 ∈ W, 0 3 hence W is a subspace of R . In Fig. 29 we see the plane u3 = 0. It is a subspace, since not only 0 ∈ W , but also any linear combination of vectors on the plane results in a vector on the plane. R u3 3 u2 u1 W = { u 3 =0 } Figure 29. The horizontal plane u3 = 0 is a subspace of R3 . Example 4.1.7: Show that the set W = u = [ui ] ∈ R2 : u2 = 1 is not a subspace of R2 . Solution: The set W is not a subspace, since 0 ∈ W . This is enough to show that W is / not a subspace. Also notice that the addition of two vectors in the set is a vector outside the set, as can be seen by the following calculation, u= u1 ∈ W, 1 v= v1 ∈W 1 ⇒ u+v = u1 + v1 ∈ W. / 2 An example of this calculation is given in Fig. 30. u2 R 2 1 W = { u2 = 1 } u1 Figure 30. The horizontal line u2 = 1 is not a subspace of R2 . 118 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 4.1.8: Determine which one of the sets given in Fig. 31 is a subspace of R2 . R 2 x2 R U x2 2 R 2 x2 W V x1 x1 x1 Figure 31. Three subsets, U , V , and W , of R2 . Only the set U is a subspace. Solution: The set U is a vector space, since any linear combination of vectors parallel to the line is again a vector parallel to the line. The sets V and W are not subspaces, since given a vector u in these spaces, a the vector au does not belong to these sets for a number a ∈ R big enough. This argument is sketched in Fig. 32. R 2 x2 R U x2 2 R 2 x2 au W au au V u u u x1 x1 x1 Figure 32. Three subsets, U , V , and W , of R2 . Only the set U is a subspace. 4.1.2. The span of a set of vectors. If a set is not a subspace there is a way to increase it into a subspace. Define a new set including all possible linear combinations of elements in the old set. Definition 4.8. 
Given a finite set S = {u1 , · · · , un } in a vector space V over F, the span of the set S , denoted as Span(S ), is the set Span(S ) = {u ∈ V : u = c1 u1 + · · · + cn un , c1 , · · · , cn ∈ F}. The following result remarks that the span of a set is a subspace. Proposition 4.9. Given a finite set S in a vector space V , Span(S ) is a subspace of V . Proof of Proposition 4.9: Since Span(S ) contains all possible linear combinations of the elements in S , then Span(S ) is closed under linear combinations. This establishes the Proposition. Example 4.1.9: The subspace Span {v} , that is, the set of all possible linear combinations of the vector v, is formed by all vectors of the form cv, and these vectors belong to a line containing v. The subspace Span {v, w} , that is, the set of all linear combinations of two G. NAGY – LINEAR ALGEBRA December 8, 2009 R 2 R x2 119 3 x3 Span{v,w} Span{v} v v w x1 x2 x1 Figure 33. Examples of the span of a set of a single vector, and the span of a linearly independent set of two vectors. vectors v, w, is the plane containing both vectors v and w. See Fig. 33 for the case of the vector spaces R2 and R3 , respectively. 4.1.3. Intersection and addition of subspaces. We now show that the intersection of two subspaces is again a subspace. However, the union of two subspaces is not, in general, a subspace. The smaller subspace containing the union of two subspaces is precisely the span of the union. We then define the addition of two subspaces as the span of the union of two subspaces. Proposition 4.10. If W1 and W2 are subspaces of a vector space V , then W1 ∩ W2 ⊂ V is also a subspace of V . Proof of Proposition 4.10: Let u and v be any two elements in W1 ∩ W2 . This means that u, v ∈ W1 , which is a subspace, so any linear combination (au + bv) ∈ W1 . Since u, v belong to W1 ∩ W2 they also belong to W2 , which is a subspace, so any linear combination (au + bv) ∈ W2 . Therefore, any linear combination (au + bv) ∈ W1 ∩ W2 . This establishes the Proposition. Example 4.1.10: The sketch in Fig. 34 shows the intersection of two subspaces in R3 , a plane and a line. In this case the intersection is the former line, so the intersection is a subspace. While the intersection of two subspaces is always a subspace, their union is, in general, not a subspace, unless one subspace is contained into the other. Example 4.1.11: Consider the vector space V = R2 , and the subspaces W1 and W2 given by the lines sketched in Fig. 35. Their union is the set formed by these two lines. This set is not a subspace, since the addition of the vectors u1 ∈ W1 with u2 ∈ W2 does not belongs to W1 ∪ W2 , as is it shown in Fig. 35. Although the union of two subspaces is not always a subspace, it is possible to enlarge the union into a subspace. The idea is to incorporate all possible additions of vectors in the two original subspaces, and the result is called the addition of the two subspaces. Here is the precise definition. 120 G. NAGY – LINEAR ALGEBRA december 8, 2009 R 3 x3 W2 W 1 x2 x1 Figure 34. Intersection of two subspaces, W1 and W2 in R3 . Since the line W2 is included into the plane W1 , we have that W1 ∩ W2 = W2 . x2 R2 W2 v u1 u2 x1 W1 Figure 35. The unions of the subspaces W1 and W2 is the set formed by these two lines. This is not a subspace, since the addition of u1 ∈ W1 and u2 ∈ W2 is the vector v ∈ W1 ∪ W2 . / Definition 4.11. 
Given two subspaces W1 , W2 of a vector space V , the addition of W1 and W2 , denoted as W1 + W2 , is the set given by W1 + W2 = u ∈ V : u = w1 + w2 with w1 ∈ W1 , w2 ∈ W2 . The following result remarks that the addition of subspaces is again a subspace. Proposition 4.12. If W1 and W2 are subspaces of a vector space V , then the addition W1 + W2 is also a subspace of V . Proof of Proposition 4.12: Suppose that x ∈ W1 + W2 and y ∈ W1 + W2 . We must show that any linear combination ax + by also belongs to W1 + W2 . This is the case, by the following argument. Since x ∈ W1 + W2 , there exist x1 ∈ W1 and x2 ∈ W2 such that x = x1 + x2 . Analogously, since y ∈ W1 + W2 , there exist y1 ∈ W1 and y2 ∈ W2 such that y = y1 + y2 . Now any linear combination of x and y satisfies ax + by = a(x1 + x2 ) + b(y1 + y2 ) = (ax1 + by1 ) + (ax2 + by2 ) G. NAGY – LINEAR ALGEBRA December 8, 2009 121 Since W1 and W2 are subspaces, (ax1 + by2 ) ∈ W1 , and (ax2 + by2 ) ∈ W2 . Therefore, the equation above says that (ax + by) ∈ W1 + W2 . This establishes the Proposition. Example 4.1.12: The sketch in Fig. 36 shows the union and the addition of two subspaces in R3 , each subspace given by a line through the origin. While the union is not a subspace, their addition is the plane containing both lines, which is a subspace. Given any non-zero vector w1 ∈ W1 and any other non-zero vector w2 ∈ W2 , one can verify that the sum of two subspaces is the span of w1 , w2 , that is, W1 + W2 = Span w1 ∪ w2 . x3 3 R W + W2 1 W2 x2 x1 W1 Figure 36. Union and addition of the subspaces W1 and W2 in R3 . The union is not a subspace, while the addition is a subspace of R3 . 4.1.4. Internal direct sums. This is a particular case of the addition of subspaces. It is called internal direct sum in order to differentiate it from another type of direct sum found in the literature. The latter, also called external direct sum, is a sum of different vector spaces, and it is a way to construct new vector spaces from old ones. We do not discuss this type of direct sums here. From now on, direct sum in these notes means the internal direct sum of subspaces inside a vector space. Definition 4.13. Given a vector space V , we say that V is the (internal) direct sum of the subspaces W1 , W2 ⊂ V , denoted as V = W1 ⊕ W2 , iff every vector v ∈ V can be written in a unique way, except for order, as a sum of vectors from W1 and W2 . That is, ˜ ˜ for every v ∈ V exist w1 ∈ W1 and w2 ∈ W2 such that v = w1 + w2 , and if v = w1 + w2 ˜ ˜ ˜ ˜ with w1 ∈ W1 and w2 ∈ W2 , then w1 = w1 and w2 = w2 . If V = W1 ⊕ W2 , we say that W1 and W2 are direct summands of V , and that W1 is the direct complement of W2 in V . There is an useful characterization of internal direct sums. Proposition 4.14. A vector space V is the direct sum of subspaces W1 and W2 iff holds both V = W1 + W2 and W1 ∩ W2 = {0}. Proof of Proposition 4.14: (⇒) If V = W1 ⊕ W2 , then it implies that V = W1 + W2 . Suppose that v ∈ W1 ∩ W2 , then on the one hand, there exists w1 ∈ W1 such that v = w1 + 0; on the other hand, there is w2 ∈ W2 such that v = 0 + w2 . Therefore, w1 = 0 and w2 = 0, so W1 ∩ W2 = {0}. 122 G. NAGY – LINEAR ALGEBRA december 8, 2009 (⇐) Since V = W1 + W2 , for every v ∈ V there exist w1 ∈ W1 and w2 ∈ W2 such ˜ ˜ that v = w1 + w2 . Suppose there exists other vectors w1 ∈ W1 and w2 ∈ W2 such that ˜ ˜ v = w1 + w2 . Then, ˜ ˜ 0 = (w1 − w1 ) + (w2 − w2 ) ⇔ ˜ ˜ (w1 − w1 ) = −(w2 − w2 ), ˜ ˜ Therefore (w1 − w1 ) ∈ W2 and so (w1 − w1 ) ∈ W1 ∩ W2 . 
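When subspaces of R^n are described by spanning vectors, the conditions in Proposition 4.14 can be tested numerically with ranks: the columns of the block matrix [B1 B2] span W1 + W2, and the sum is direct exactly when rank [B1 B2] = rank B1 + rank B2, which is equivalent to W1 ∩ W2 = {0}. The sketch below is an illustration only, assumes numpy, and uses configurations in the spirit of Examples 4.1.10 and 4.1.12.

    # Rank test for a direct sum of subspaces of R^n given by spanning
    # vectors (the columns of B1 and B2). Assumes numpy; the criterion
    # rank([B1 B2]) = rank(B1) + rank(B2) holds iff the intersection is {0}.
    import numpy as np

    def is_direct_sum(B1, B2):
        combined = np.hstack([B1, B2])     # its columns span W1 + W2
        return np.linalg.matrix_rank(combined) == (
            np.linalg.matrix_rank(B1) + np.linalg.matrix_rank(B2))

    # Two distinct lines through the origin in R^3: the sum is direct.
    line1 = np.array([[1.0], [0.0], [0.0]])
    line2 = np.array([[0.0], [1.0], [0.0]])
    print(is_direct_sum(line1, line2))     # True

    # A plane and a line contained in it: the intersection is the line.
    plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    line3 = np.array([[1.0], [1.0], [0.0]])
    print(is_direct_sum(plane, line3))     # False
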
Since W1 ∩ W2 = {0}, we then ˜ ˜ conclude that w1 = w1 , which also says w2 = w2 . Then V = W1 ⊕ W2 . This establishes the Proposition. Example 4.1.13: Denote by Sym and SkewSym the sets of all symmetric and all skewsymmetric n × n matrices. Show that Fn,n = Sym ⊕ SkewSym. Solution: Given any matrix A ∈ Fn,n , holds 1 1 1 A = A + AT − AT = A + AT + A − AT , 2 2 2 where B = A + AT /2 ∈ Sym and C = A − AT /2 ∈ SkewSym. Furthermore, if there exist ˜ ˜ ˜˜ ˜ ˜ B ∈ Sym and C ∈ SkewSym such that A = B + C, then B − B = C − C and these matrices are both symmetric and skew-symmetric, which implies that they are the zero matrix. So we have shown that Fn,n = Sym ⊕ SkewSym. G. NAGY – LINEAR ALGEBRA December 8, 2009 123 Exercises. 4.1.1.- Determine subsets of Rn , subspaces: (a) {x ∈ Rn : (b) {x ∈ Rn : (c) {x ∈ Rn : (d) {x ∈ Rn : (e) {x ∈ Rn : (f) {x ∈ Rn : which of the following with n 1, are in fact xi 0 i = 1, · · · , n}; x1 = 0}; x1 x2 = 0 n 2}; x 1 + · · · + x n = 0 }; x 1 + · · · + x n = 1 }; Ax = b, A = 0, b = 0}. 4.1.2.- Determine which of the following subsets of Fn,n , with n 1, are in fact subspaces: (a) {A ∈ Fn,n : A = AT }; (b) {A ∈ Fn,n : A invertible}; (c) {A ∈ Fn,n : A not invertible}; (d) {A ∈ Fn,n : A upper-triang.}; (e) {A ∈ Fn,n : A2 = A}; (f) {A ∈ Fn,n : tr (A) = 0}. (g) Given a matrix X ∈ Fn,n , define {A ∈ Fn,n : [A, X] = 0}. 4.1.3.- Find W1 + W2 ⊂ R3 , where W1 is a plane passing through the origin in R3 and W2 is a line passing through the origin in R3 not contained in W1 . 4.1.4.- Sketch a picture of the subspaces spanned by the following vectors: 23232 3 3 −2 o n1 (a) 425 , 465 , 4−45 ; 3 9 −6 232323 1 2o n0 (b) 425 , 415 , 435 ; 0 0 0 232323 1 1o n1 (c) 405 , 415 , 415 . 0 0 1 4.1.5.- Given two finite subsets S1 , S2 in a vector space V , show that Span(S1 ∪ S2 ) = Span(S1 ) + Span(S2 ). 4.1.6.- Let W1 ⊂ R3 be the subspace 23 23 1 o” “n 1 W1 = Span 425 , 405 . 3 1 Find a subspace W2 ⊂ R3 such that R 3 = W1 ⊕ W2 . 124 G. NAGY – LINEAR ALGEBRA december 8, 2009 4.2. Linear dependence In this Section we present the notion of a linearly dependent set of vectors. If one of the vectors in the set is a linear combination of the other vectors in the set, then the set is called linearly dependent. If this is not the case, the set is called linearly independent. This notion plays a crucial role to define a basis of a vector space in Sect. 4.3. Bases are very useful in part because every vector in the vector space can be decomposed in a unique way as a linear combination of the basis elements. Bases also provide a precise way to measure the size of a vector space. Definition 4.15. A finite set of vectors v1 , · · · , vk in a vector space is called linearly dependent iff there exists a corresponding set of scalars {c1 , · · · , ck }, not all of them zero, such that, c1 v1 + · · · + ck vk = 0. (4.1) On the other hand, the set v1 , · · · , vk is called linearly independent iff Eq. (4.1) implies that all the scalars c1 = · · · = ck = 0. The wording in this definition is carefully chosen to cover the case of the empty set. The result is that the empty set is linearly independent. It might seems strange, but this result fits well with the rest of the theory. On the other hand, the set {0} is linearly dependent, since c1 0 = 0 for any c1 = 0. Moreover, any set containing the zero vector is also linearly dependent. Linear dependence or independence are properties of a set of vectors. There is no meaning to a vector to be linearly dependent, or independent. 
And there is no meaning of a set of linearly dependent vectors, as well as a set of linearly independent vectors. What is meaningful is to talk of a linearly dependent or independent set of vectors. Example 4.2.1: Show that the set S ⊂ R2 below is linearly dependent, S= Solution: It is clear that 2 1 0 =2 +3 3 0 1 1 0 2 , , 0 1 3 ⇒ 2 . 1 0 2 0 +3 − = . 0 1 3 0 Since c1 = 2, c2 = 3, and c3 = −1 are non-zero, the set S is linearly dependent. It will be convenient to have the concept of a linearly dependent or independent set containing infinitely many vectors. Definition 4.16. An infinite set of vectors S = v1 , v2 , · · · in a vector space V is called linearly independent iff every finite subset of S is linearly independent. Otherwise, the infinite set S is called linearly dependent. Example 4.2.2: Consider the vector space V = C ∞ [− , ], R , that is, the space of infinitely differentiable real-valued functions defined on the domain [− , ] ⊂ R with the usual operation of linear combination of functions. This vector space contains linearly independent sets with infinitely many vectors. One example is the infinite sets S1 below, which is linearly independent, S1 = 1, x, x2 , · · · , xn , · · · . Another example is the infinite set S2 , which is also linearly independent, nπx ∞ nπx , sin . S2 = 1, cos n=1 G. NAGY – LINEAR ALGEBRA December 8, 2009 125 As we have seen in the Example 4.2.1 above, in a linearly dependent set there is always at least one vector that is a linear combination of the other vectors in the set. This is simple to see from the Definition 4.15. Since not all the coefficients ci are zero in a linearly dependent set, let us suppose that cj = 0; then from the Eq. (4.1) we obtain vj = − 1 c1 v1 + · · · + cj −1 vj −1 + cj +1 vj +1 + · · · + ck vk , cj that is, vj is a linear combination of the other vectors in the set. Proposition 4.17. The set v1 , · · · , vk is linearly dependent with the vector vk being a linear combination of the the remaining k − 1 vectors iff Span v1 , · · · , vk = Span v1 , · · · , vk−1 . This Proposition captures the idea behind the notion of a linearly dependent set: A finite set is linearly dependent iff there exists a smaller set with the same span. In this sense the vector vk in the Proposition above is redundant with respect to linear combinations. Proof of Proposition 4.17: Let Sk = v1 , · · · , vk and Sk−1 = v1 , · · · , vk−1 . On the one hand, if vk is a linear combination of the other vectors in S , then for every x ∈ Span(Sk ) can be expressed as an element in Span(Sk−1 ) simply by replacing vk in terms ˜ of the vectors in S . This shows that Span(Sk ) ⊂ Span(Sk−1 ). The other inclusion is trivial, so Span(Sk ) = Span(Sk−1 ). On the other hand, if Span(Sk ) = Span(Sk−1 ), this means that vk is a linear combination of the elements in Sk−1 . Therefore, the set Sk is linearly dependent. This establishes the Proposition. Example 4.2.3: Consider the set S ⊂ R3 given by −2 4 −2 −4 S = 2 , −6 , −3 , 1 . −3 8 2 −3 ˜ ˜ Find a set S ⊂ S having the smallest number of vectors such that Span(S ) = Span(S ). Solution: We have to find all the redundant vectors in S with respect to linear combina˜ tions. In other words, with have to find a linearly independent subset of S ⊂ S such that ˜ Span(S ) = Span(S ). The calculation we must do is to find the non-zero coefficients ci in the solution of the equation c 0 −2 4 −2 −4 −2 4 −2 −4 1 c 0 = c1 2 + c2 −6 + c3 −3 + c4 1 = 2 −6 −3 1 2 . 
c3 0 −3 8 2 −3 −3 8 2 −3 c4 Hence, we must find the reduced echelon form of the coefficient matrix above, that is, 2 −2 A=4 2 −3 4 −6 8 −2 −3 2 3 2 −4 1 1 5 → 40 −3 0 −2 −2 2 1 −5 5 3 2 2 1 −35 → 40 3 0 This means that the solution for the coefficients is 3 5 c1 = −6c3 − 5c4 , c2 = − c3 − c4 , 2 2 −2 2 0 1 5 0 3 2 2 1 3 5 → 40 0 0 0 1 0 c3 , c4 free variables. 6 5 2 0 5 3 35 2 0 = EA . 126 G. NAGY – LINEAR ALGEBRA december 8, 2009 Choosing: c4 = 0, c3 = 2 ⇒ c1 = −12, c2 = −5 ⇒ c4 = 2, c3 = 0 ⇒ c1 = −10, c2 = −3 ⇒ −2 4 −2 0 −12 2 − 5 −6 + 2 −3 = 0 , −3 8 2 0 −2 4 −4 0 −10 2 − 3 −6 + 2 1 = 0 . −3 8 −3 0 We can interpret this result thinking that the third and fourth vectors in matrix A are linear combination of the first two vectors. Therefore, a linearly independent subset of S having its same span is given by −2 4 ˜ S = 2 , −6 . −3 8 ˜ is in matrix EA , the reduced echelon form of Notice that all the information to find S matrix A, 1065 −2 4 −2 −4 5 1 → 0 1 2 3 = EA . A = 2 −6 −3 2 −3 8 2 −3 0000 The columns with pivots in EA determine the column vectors in A that form a linearly independent set. The non-pivot columns in EA determine the column vectors in A that are linear combination of the vectors in the linearly independent set. The factors of these linear combinations are precisely the component of the non-pivot vectors in EA . For example, the last column vector in EA has components 5 and 3/2, and these are precisely the coefficients in the linear combination: −4 −2 4 3 1 = 5 2 + −6 . 2 −3 −3 8 In Example 4.2.3 we answered a question about the linear independence of a set S = v1 , · · · , vn ⊂ Fn by studying the properties of a matrix having these vectors a column vectors, that is, A = v1 , · · · , vn . It turns out that this is a good idea and the following result summarizes few useful relations. Proposition 4.18. Given Let A ∈ Fm,n , the following statements are equivalent: (a) The column vectors of matrix A form a linearly independent set; (b) N (A) = {0}; (c) rank(A) = n Furthermore, in the particular case that A is an n × n matrix, the column vectors of A form a linearly independent set iff A is invertible. Proof of Proposition 4.18: Let us denote by S = v1 , · · · , vn ⊂ Fm a set of vectors in a vector space, and introduce the matrix A = v1 , · · · , vn . The set S is linearly independent iff only solution c ∈ Rn to the equation Ac = 0 is the trivial solution c = 0. This is equivalent to say that N (A) = {0}. This is equivalent to say that EA has n pivot columns, which is equivalent to say that rank(A) = n. The furthermore part is straightforward, since an n × n matrix A is invertible iff rank(A) = n. This establishes the Proposition. Further reading. See Section 4.3 in Meyer’s book [3]. G. NAGY – LINEAR ALGEBRA December 8, 2009 127 Exercises. 4.2.1.- Determine which of the following sets is linearly independent. For those who are linearly dependent, express one vector as a linear combination of the other 2 3 2 in the3 vectors 3 2 set. 2 1o n1 (a) 425 , 415 , 455 ; 3 0 9 23232323 0 0 1o n1 (b) 425 , 445 , 405 , 415 ; 3 5 6 1 232323 3 1 2o n (c) 425 , 405 , 415 . 1 0 0 2 3 2110 4.2.2.- Let A = 44 2 1 25. 6323 (a) Find a maximal linearly independent set of the columns of A. (b) Find the number of linearly independent sets that can be constructed using any number of column vectors of A. 4.2.3.- Show that any set containing the zero vector must be linearly dependent. 
4.2.4.- Given a vector space V , prove the following: If the set {v , w } ⊂ V is linearly independent, then so is ˘ ¯ (v + w), (v − w) . 4.2.5.- Determine whether the set – n »1 2 – » 2 1 – » 4 −1 o , , ⊂ R2,2 21 11 −1 1 is linearly independent of dependent. 4.2.6.- Show that the following set in P2 is linearly dependent, {1, x, x2 , 1 + x + x2 }. 4.2.7.- Determine whether S ⊂ P2 is a linearly independent set, where ˘ ¯ S = 1 + x + x2 , 2x − 3x2 , 2 + x . 128 G. NAGY – LINEAR ALGEBRA december 8, 2009 4.3. Bases and dimension In this Section we introduce a notion that quantifies the size of a vector space. We first separate two main cases, the vector spaces we call finite dimensional from those called infinite dimensional. In the case of finite dimensional vector spaces we introduce the notion of a basis. This is a particular type of set in the vector space that is small enough to be a linearly independent set and big enough to span the whole vector space. A basis of a finite dimensional vector space is not unique. In the case of infinite dimensional vector spaces we do not introduce here a concept of basis. More structure is needed in the vector space to be able to determine whether or not an infinite sum of vectors converges. We will not discuss these issues here. 4.3.1. Bases. A particular type of finite sets in a vector space, small enough to be linearly independent and big enough to span the whole vector space, is called a basis of that vector space. Vector spaces having a finite set with these properties are essentially small, and they are called finite dimensional. When there is no finite set that spans the whole vector space, we call that space infinite dimensional. We repeat these ideas more precisely in the following statement. Definition 4.19. A finite set S ⊂ V is called a finite basis of a vector space V iff S is linearly independent and Span(S ) = V . The existence of a finite basis is the property that defines the size of the vector space. Definition 4.20. A vector space V is finite dimensional iff V has a finite basis or V is one of the following two extreme cases: V = ∅ or V = {0}. Otherwise, the vector space V is called infinite dimensional. In these notes we will often call a finite basis just simply as a basis, without remarking that they contain a finite number of elements. We only study this type of basis, and we do not introduce the concept of an infinite basis. Why don’t we define the notion of an infinite basis, since we have already defined the notion of an infinite linearly independent set? Because we do not have any way to define what is the span of an infinite set of vectors. In a vector space, without any further structure, there is no way to know whether an infinite sum converges or not. The notion of convergence needs further structure in a vector space, for example it needs a notion of distance between vectors. So, only in certain vector spaces with a notion of distance it is possible to introduce an infinite basis. We will discuss this subject in the next Chapter. Example 4.3.1: We now present several examples. 1 0 (a) Let V = R2 , then the set S2 = e1 = , e2 = is a basis of F2 . Notice that 0 1 ei = I:i , that is, ei is the i-th column of the identity matrix I2 . This basis S2 is called the standard basis of R2 . (b) A vector space can have infinitely many bases. For example, a second basis for R2 is 1 −1 the set U = u1 = , u2 = . It is not difficult to verify that this set is a basis 1 1 of R2 , since u is linearly independent, and Span(U ) = R2 . 
(c) Let V = Fn , then the set Sn = e1 = I:1 , · · · , en = I:n is a basis of Rn , where I:i is the i-th column of the identity matrix In . This set Sn is called the standard basis of Fn . (d) A basis for the vector space F2,2 of all 2 × 2 matrices is the set S2,2 given by S2,2 = E11 = 1 0 0 01 00 0 , E12 = , E21 = , E22 = 0 00 10 0 0 1 ; G. NAGY – LINEAR ALGEBRA December 8, 2009 129 This set is linearly independent and Span(S2,2 ) = F2,2 , since any element A ∈ F2,2 can be decomposed as follows, A= A11 A21 A12 10 0 = A11 + A12 A22 00 0 1 00 0 + A21 + A22 0 10 0 0 . 1 (e) A basis for the vector space Fm,n of all m × n matrices is the following: Sm,n = E11 , E12 , · · · , Emn , where each m × n matrix Eij is a matrix with all coefficients zero except the coefficient (i, j ) which is equal to one (see previous example). The set Sm,n is linearly independent, and Span(Sm,n ) = Fm,n , since m n A = [Aij ] = Aij Eij . i=1 j =1 (f) Let V = Pn , the set of all polynomials with domain R and degree less or equal n. Any element p ∈ Pn can be expressed as follows, p(x) = a0 + a1 x + a2 x2 + · · · + an xn , that is equivalent to say that the set S = p0 = 1, p1 = x, p2 = x2 , · · · , pn = xn satisfies Pn = Span(S ). The set S is also linearly independent, since q(x) = c0 + c1 x + c2 x2 + · · · + cn xn = 0 ⇒ c0 = · · · = cn = 0. The proof of the latter statement is simple: Compute the n-th derivative of q above, and obtain the equation n! cn = 0, so cn = 0. Add this information into the (n − 1)-th derivative of q and we conclude that cn−1 = 0. Continue in this way, and you will prove that all the coefficient c’s vanish. Therefore, S is a basis of Pn , ands it is also called the standard basis of Pn . 1 0 1 Example 4.3.2: Show that the set U = 0 , 1 , 0 is a basis for R3 . 1 1 −1 Solution: We must show that U is a linearly independent set and Span(U ) = R3 . Both properties follow from the fact that matrix U below, whose columns are the elements in U , is invertible, 10 1 1 −1 1 1 0 ⇒ U− 1 = 0 2 0 . U = 0 1 2 1 1 −1 1 1 −1 Let us show that U is a basis of R3 : Since matrix U is invertible, this implies that its reduced echelon form EU = I3 , so its column vectors form a linearly independent set. The existence of U−1 implies that the system of equations Ux = y has a solution x = U−1 y for every y ∈ R3 , that is, y ∈ Col(U) = Span(U ) for all y R3 . This means that Span(U ) = R3 . Hence, the set U is a basis of R3 . The following definitions will be useful to establish important properties of a basis. 130 G. NAGY – LINEAR ALGEBRA december 8, 2009 Definition 4.21. Let V be a vector space and Sn ⊂ V be a subset with n elements. The set Sn is a maximal linearly independent set iff Sn is linearly independent and every other ˜ set Sm with m > n elements is linearly dependent. The set Sn is a minimal spanning set ˜ ˜ iff Span(S ) = V and every other set Sm with m < n elements satisfies Span(S ) V . A maximal linearly independent set S is the biggest set in a vector space that is linearly independent. A set cannot be linearly independent if it is too big, since the bigger the set the more probable that one element in the set is a linear combination of the other elements in the set. A minimal spanning set is the smallest set in a vector space that spans the whole space. A spanning set, that is, a set whose span is the whole space, cannot be too small, since the smaller the set the more probable that an element in the vector space is outside the span of the set. 
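The calculation in Example 4.3.2 can be reproduced with a computer algebra system: the columns of U form a basis of R^3 exactly when U is invertible, that is, when det(U) is nonzero, or equivalently rank(U) = 3. The sketch below is an illustration only and assumes the sympy library.

    # Cross-check of Example 4.3.2 with sympy: the columns of U form a basis
    # of R^3 iff U is invertible (nonzero determinant, equivalently rank 3).
    from sympy import Matrix

    U = Matrix([[1, 0,  1],
                [0, 1,  0],
                [1, 1, -1]])

    print(U.det())            # -2, nonzero, so the columns form a basis
    print(U.rank())           # 3
    print(U.inv())            # the inverse matrix quoted in Example 4.3.2

    # Any two of the three columns span only a plane (rank 2 < 3), in line
    # with the notion of a minimal spanning set in Definition 4.21.
    print(U[:, 0:2].rank())   # 2
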
The following result provides a useful characterization of a basis: A basis is a set in the vector space that is both maximal linearly independent and minimal spanning. In this sense, a basis is a set with the right size, small enough to be linearly independent and big enough to span the whole vector space. Proposition 4.22. Let V be a vector space. The following statements are equivalent: (a) U is a basis of V ; (b) U is a minimal spanning set in V . (c) U is a maximal linearly independent set in V ; 1 0 1 Example 4.3.3: We showed in Example 4.3.2 above that the set U = 0 , 1 , 0 1 1 −1 is a basis for R3 . Since this basis has three elements, Proposition 4.22 says that any other spanning set in R3 cannot have less than three vectors, and any other linearly independent set in R3 cannot have more that three vectors. For example, any subset of U containing two elements cannot span R3 ; the linear combination of two vectors in U span a plane in R3 . Another example, any set of four vectors in R3 must be linearly dependent. Proof of Proposition 4.22: We first show part (b). (⇒) Assume that U is a basis of V . If the set U is not a minimal spanning set of V , that ˜ ˜ ˜ ˜ means there exists U = u1 , · · · , un−1 such that Span(U ) = V . So, every vector in U can ˜ be expressed as a linear combination of vectors in U . Hence, there exists a set of coefficients Cij such that n−1 ˜ Cij ui , uj = j = 1, · · · , n. i=1 The reason to order the coefficients Cij is this form is that they form a matrix C = [Cij ] which is (n − 1) × n. This matrix C defines a function C : Rn → Rn−1 , and since rank(C) (n − 1) < n, this matrix satisfies that N (C) is nontrivial as a subset of Rn . So there exists a nonzero column vector in Rn with components z = [zj ] ∈ Rn , not all components zero, such that z ∈ N (C), that is, n Cij zj = 0, i = 1, · · · , (n − 1). j =1 What we have found is that the linear combination n z1 u1 + · · · + zn un = n zj uj = j =1 n−1 j =1 n−1 n ˜ Cij ui = zj i=1 ˜ Cij zj ui = 0, i=1 j =1 G. NAGY – LINEAR ALGEBRA December 8, 2009 131 with at least one of the coefficients zj non-zero. This means that the set U is not linearly independent. But this contradicts that U is a basis. Therefore, the set U is a minimal spanning set of V . (⇐) Assume that U is a minimal spanning set of V . If U is not a basis, that means U is not a linearly independent set. At least one element in U must be a linear combination of the others. Let us arrange the order of the basis vectors such that the vector un is a linear ˜ combination of the other vectors in U . Then, the set U = u, · · · , un−1 must still span V , ˜ that is Span(U ) = V . But this contradicts the assumption that U is a minimal spanning set of V . We now show part (c). (⇐) Assume that U is a maximal linearly independent set in V . If U is not a basis, that means Span(U ) V , so there exists un+1 ∈ V such that un+1 ∈ Span(U ). Hence, the set / ˜ = u, · · · , un+1 is a linearly independent set. However, this contradicts the assumption U that U is a maximal linearly independent set. We conclude that U is a basis of V . (⇒) Assume that U is a basis of V . If the set U is not a maximal linearly independent ˜ ˜ ˜ set in V , then there exists a maximal linearly independent set U = u1 , · · · , uk , with ˜ ˜ k > n. By the argument given just above, U is a basis of V . By part (b) the set U must be a minimal spanning set of V . However, this is not true, since U is smaller and spans V . Therefore, U must be a maximal linearly independent set in V . 
This establishes the Proposition. 4.3.2. Dimension. The characterization of a basis given in Proposition 4.22 above implies that the number of elements in a basis is always the same as in any other basis. Theorem 4.23. The number of elements in any basis of a finite dimensional vector space is the same as in any other basis. Proof of Theorem (4.23: Let Vn and Vm be two bases of a vector space V with n and m elements, respectively. If m > n, the property that Vm is a minimal spanning set implies that Span(Vn ) Span(Vm ) = V . The former inclusion contradicts that Vn is a basis. Therefore, n = m. (A similar proof can be constructed with the maximal linearly independence property of a basis.) This establishes the Theorem. The number of elements in a basis of a finite dimensional vector space is a characteristic of the vector space, so we give that characteristic a name. Definition 4.24. The dimension of a finite dimensional vector space V with a finite basis is the number of elements in a basis of V . The extreme cases of V = ∅ and V = {0} are defined as zero dimensional. We use the notation dim V to represent the dimension of the vector space V . For example, dim{0} = 0 and dim ∅ = 0. Example 4.3.4: We now present several examples. (a) The set Sn = e1 = I:1 , · · · , en = I:n is a basis for Fn , so dim Fn = n. (b) A basis for the vector space F2,2 of all 2 × 2 matrices is the set S2,2 is given by S2,2 = E11 = 1 0 0 01 00 0 , E12 = , E21 = , E22 = 0 00 10 0 0 1 so we conclude that dim F2,2 = 4. (c) A basis for the vector space Fm,n of all m × n matrices is the following: Sm,n = E11 , E12 , · · · , Emn , , 132 G. NAGY – LINEAR ALGEBRA december 8, 2009 where we recall that each m × n matrix Eij is a matrix with all coefficients zero except the coefficient (i, j ) which is equal to one. Since the basis Sm,n contains mn elements, we conclude that dim Fm,n = mn. (d) A basis for the vector space Pn of all polynomial with degree less or equal n is the set S given by S = p0 = 1, p1 = x, p2 = x2 , · · · , pn = xn . This set has n + 1 elements, so dim Pn = n + 1. Given a vector space V , any subspace W ⊂ V is also a vector space, so the definition of basis also holds for W . Since W ⊂ V , we conclude that dim W dim V . Example 4.3.5: Consider the case V = R3 . It is simple to see in Fig. 37 that dim U = 1 and dim W = 2, where the subspaces U and W are spanned by one vector and by two non-collinear vectors, respectively. V= R U 3 0 W dim ( U ) = 1 dim ( W ) = 2 Figure 37. Sketch of two subspaces U and W , in the vector space R3 , of dimension one and two, respectively. Example 4.3.6: Find a basis for N (A) and R(A), where matrix A ∈ R3,4 is given by −2 4 −2 −4 1 . A = 2 −6 −3 (4.2) −3 8 2 −3 Solution: Since A ∈ R3,4 , then A : R4 → R3 , which implies that N (A) ⊂ R4 while R(A) ⊂ R3 . A basis for N (A) is found as follows: Find all solution of Ax = 0 and express these solutions as the span of a linearly independent set of vectors. We first find EA , x1 = −6x3 − 5x4 , 1065 −2 4 −2 −4 5 3 1 → 0 1 5 3 = EA ⇒ A = 2 −6 −3 x = − x3 − x4 , 2 2 2 2 2 −3 8 2 −3 0000 x3 , x4 free variables. Therefore, every element in N (A) can be expressed as follows, −6x3 − 5x4 −5 −6 − 5 x3 − 3 x4 − 5 − 3 2 = 2 x3 + 2 x4 , ⇒ N (A) = Span x= 2 0 1 x3 0 1 x4 −6 −5 − 5 − 3 2 , 2 1 0 0 1 . G. NAGY – LINEAR ALGEBRA December 8, 2009 133 Since the vectors in the span above form a linearly independent set, we conclude that a basis for N (A) is the set N given by −6 −5 − 5 − 3 N = 2 , 2 . 1 0 0 1 We now find a basis for R(A). 
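This basis of N(A) can be cross-checked with a computer algebra system; sympy's nullspace routine builds one basis vector per free variable, exactly as done above, and the rank of A gives the dimension of R(A) found next. The sketch below assumes the sympy library.

    # Cross-check of Example 4.3.6 with sympy: a basis of N(A) and the
    # dimension of R(A) for the 3 x 4 matrix A of Eq. (4.2).
    from sympy import Matrix

    A = Matrix([[-2,  4, -2, -4],
                [ 2, -6, -3,  1],
                [-3,  8,  2, -3]])

    for v in A.nullspace():    # one basis vector per free variable
        print(v.T)             # Matrix([[-6, -5/2, 1, 0]]), Matrix([[-5, -3/2, 0, 1]])

    print(A.rank())            # 2, so dim R(A) = 2 (two pivot columns)
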
We know that R(A) = Col(A), that is, the span of the column vectors of matrix A. We only need to find a linearly independent subset of column vectors of A. This information is given in EA , since the pivot columns in EA indicate the columns in A which form a linearly independent set. In our case, the pivot columns in EA are the first and second columns, so we conclude that a basis for R(A) is the set R given by −2 4 R = 2 , −6 . −3 8 We know that a basis of a vector space is not unique, and the following result says that actually any linearly independent set can be extended into a basis of a vector space. Proposition 4.25. Let V = v1 , · · · , vn be a basis of a vector space V , and let Sk = u1 , · · · , uk be a linearly independent set in V , with k < n. Then, there always exists an extension of the set Sk of the form S = u1 , · · · , uk , vi1 , · · · , vin−k that is a basis of V . The statement above says that a linearly independent set Sk can be extended into a basis S of a a vector space V simply incorporating appropriate vectors from any basis of V . If a basis V of V has n vectors, and the set Sk has k < n vectors, then one can always select n − k vectors from the basis V to enlarge the set Sk into a basis of V . Proof of Proposition 4.25: Introduce the set Sk+n sk+n = u1 , · · · , uk , v1 , · · · , vn . We know that Span(Sk+n ) = V since V ⊂ Sk+n . We also know that Sk+n is linearly dependent, since the maximal linearly independent set contains n elements and Sk+n contains n + k > n elements. The idea is to eliminate the vi such that Sk ∪ {vi } is linearly dependent. Since the maximal linearly independent set contains n elements and the Sk is linearly independent, there are k elements in V that will be eliminated. The resulting set is S , which is a basis of V containing Sk . Example 4.3.7: Given the 3 × 4 matrix A defined in Eq. (4.2) in Example 4.3.6 above, extend the basis of N (A) ⊂ R4 into a basis of R4 . Solution: We know from Example 4.3.6 that a basis for the N (A) is the set N given by −6 −5 − 5 − 3 N = 2 , 2 . 1 0 0 1 134 G. NAGY – LINEAR ALGEBRA december 8, 2009 Following the idea in the proof of Proposition vectors among the columns of the matrix −6 −5 − 5 − 3 2 M= 2 1 0 0 1 4.25, we look for a linear independent set of 1 0 0 0 000 1 0 0 . 0 1 0 001 That is, matrix M include the basis vectors of N (A) and the four vectors ei of the standard basis of R4 . It is important to place the basis vectors of N (A) in the first columns of M. In this way, the Gauss method will select these first vectors as part of the linearly independent set of vectors. Find now the reduced echelon form matrix EM , 100010 0 1 0 0 0 1 EM = 0 0 1 0 6 5 . 000153 Therefore, the first four vectors in M are form a linearly independent set, so a basis of R4 that includes N is given by −6 −5 1 0 − 5 − 3 0 1 V = 2 , 2 , , . 1 0 0 0 0 0 0 1 Recall that the sum and the intersection of two subspaces is again a subspace in a given vector space. The following result relates the dimension of a sum of subspaces with the dimension of the individual subspaces and the dimension of their intersection. Proposition 4.26. If W1 , W2 ⊂ V are subspaces of a vector space V , then dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ). Proof of Proposition 4.26: We find the dimension of W1 + W2 finding a basis of this sum. The key idea is to start with a basis of W1 ∩ W2 . Let B0 = z1 , · · · , zl be a basis for W1 ∩ W2 . 
Enlarge that basis into basis B1 for W1 and B2 for W2 as follows, B1 = z1 , · · · , zl , x1 , · · · , xn , B2 = z1 , · · · , zl , y1 , · · · , ym . We use the notation l = dim(W1 ∩ W2 ), l + n = dim W1 and l + m = dim W2 . We now propose as basis for W1 + W2 the set B = z1 , · · · , zl , x1 , · · · , xn , y1 , · · · , ym . By construction this set satisfies that Span(B) = W1 + W2 . We only need to show that B is linearly independent. Assume that the set B is linearly dependent. This means that there is non-zero constants ai , bj and ck solutions of the equation n m ai xi + i=1 This implies that the vector W2 , since n i=1 j =1 ck zk = 0. (4.3) k=1 ai xi , which by definition belongs to W1 , also belongs to n m ai xi = − i=1 l bj yj + l ck zk ∈ W2 . bj yj + j =1 k=1 G. NAGY – LINEAR ALGEBRA December 8, 2009 135 n Therefore, i=1 ai xi belongs to W1 ∩ W2 , and so is a linear combination of the elements of B0 , that is, there exists scalars dk such that l n ai xi = i=1 dk zk . i=1 Since B1 is a basis of W1 , this implies that all the coefficients ai and dk vanish. Introduce this information into Eq. (4.3) and we conclude that m l bj yj + j =1 ck zk = 0. k=1 Analogously, the set B2 is a basis, so all the coefficients bj and ck must vanish. This implies that the set B is linearly independent, hence a basis of W1 + W2 . Therefore, the dimension of the sum is given by dim(W1 + W2 ) = n + m + k = (n + k ) + (m + k ) − k = dim W1 + dim W2 − dim(W1 ∩ W2 ). This establishes the Proposition. The following corollary is immediate. Corollary 4.27. If a vector space can be decomposed as V = W1 ⊕ W2 , then dim(W1 ⊕ W2 ) = dim W1 + dim W2 . The proof is left for the reader. 136 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 4.3.1.- Find a basis for each of the spaces N (A), R(A), N (AT ), R(AT ), where 2 3 1223 A = 42 4 1 3 5 . 3614 4.3.2.- Find the dimension of the space spanned by 2 3232 32323 3 1 2 1 1 n6 2 7 607 6 8 7 617 637o 6 7,6 7,6 7,6 7,6 7 . 4−15 405 4−45 415 405 1 6 2 8 3 4.3.3.- Find the dimension of the following spaces: (a) The space Pn of polynomials of degree less or equal n. (b) The space Fm,n of m × n matrices. (c) The space of real symmetric n × n matrices. (d) The space of real skew-symmetric n × n matrices. 4.3.4.- Find an example to show that the following statement is false: If {v1 , v2 } is a basis of R2 and W ⊂ R2 is a subspace, then there exists a basis of W containing at least one of the basis vectors v1 , v2 . 4.3.5.- Given the matrix A and vector v, 23 −8 2 3 6 17 12205 67 A = 42 4 3 1 8 5 , v = 6 3 7 , 67 4 35 36155 0 verify that v ∈ N (A), and then find a basis of N (A) containing v. 4.3.6.- Determine whether or not the set 232 3 1o n2 B = 435 , 4 1 5 −1 2 is a basis for the subspace 232323 5 3 o” “n 1 Span 425 , 485 , 445 ⊂ R3 . 3 7 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 137 4.4. Vector components An important property of a basis is that every vector in a finite dimensional vector space can be expressed in a unique way as a linear combination of the basis vectors. Proposition 4.28. Let V be an n-dimensional vector space over the scalar field F, and let U = u1 , · · · , un be a basis of V . Then, every vector v ∈ V determines a unique set of scalars v1 , · · · , vn ⊂ F such that v = v1 u1 + · · · + vn un . (4.4) And every set of scalars {v1 , · · · , vn } ⊂ F determines a unique vector v ∈ V by Eq. (4.4). The scalars v1 , · · · , vn determined in Eq. (4.4) are called the components of vector v in the basis U . 
Proof of Proposition 4.28: The set U is a basis, hence Span(U ) = V and U is linearly independent. The first property implies that for every v ∈ V there exist scalars v1 , · · · , vn such that v is a linear combination of the basis vectors, that is, v = v1 u1 + · · · + vn un . The second property of a basis implies that the linear combination above is unique. Indeed, if there exists another linear combination v = ν1 u1 + · · · + νn un , then 0 = v − v = (v1 − ν1 )u1 + · · · + (vn − νn )un . Since U is linearly independent, this implies that each coefficient above vanishes, so v1 = ν1 , · · · , vn = νn . The converse statement can be shown as follows: A set of scalars {v1 , · · · , vn } determines ˜ a vector v ∈ V by Eq. (4.4). This vector is unique, since any other vector v ∈ V determined by the right hand side of Eq. (4.4) must satisfy that ˜ v − v = 0u1 + · · · + 0un . ˜ Since U is a basis, v − v = 0. This establishes the Proposition. A basis of a vector space is a linearly independent spanning set without extra conditions, for example without a given ordering. However, in many circumstances it is convenient to consider bases with basis vectors given in a specific order. Definition 4.29. An ordered basis of an n-dimensional vector space V is a sequence (v1 , · · · , vn ) of vectors such that the set {v1 , · · · , vn } is a basis of V . Example 4.4.1: The following four ordered basis of R3 are all different, 0 0 1 0 0 1 0 1 0 1 0 0 0 , 1 , 0 , 1 , 0 , 0 , 1 , 0 , 0 , 0 , 1 , 0 , 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 however, they determine the same basis S3 = 0 , 1 , 0 . 0 0 1 Using Proposition 4.28 is not difficult to see that there exists a correspondence between vectors in a vector space and their components in an ordered basis. 138 G. NAGY – LINEAR ALGEBRA december 8, 2009 Definition 4.30. Let V be an n-dimensional vector space over F and U = (u1 , · · · , un ) be an ordered basis of V . The coordinate map is the function [ ]u : V → Fn , defined by [v]u = vu , where v1 . vu = . ⇔ v = v1 u1 + · · · + vn un . . vn Therefore, we use the notation [v]u = vu ∈ Fn for the components of a vector v ∈ V in an ordered basis U . We remark that the coordinate map is defined only after an ordered basis is fixed in V . Different ordered bases on V determine different coordinate maps between V and Fn . When the situation under study involves only one ordered basis, we suppress the basis subindex. The coordinate map will be denoted by [ ] : V → Fn and the vector components by v = [v]. In the particular case that V = Fn and the basis is the standard basis Sn , then the coordinate map [ ]s is the identity map, so v = [v]s = v. In this case we follow the convention established in the first Chapters, that is, we denote vectors in Fn by v instead of v. When the situation under study involves more than one ordered basis we keep the sub-indices in the coordinate map, like [ ]u , and in the vector components, like vu , to keep track of the basis attached to these expressions. Example 4.4.2: Let V be the set of points on the plane with a preferred origin. Let S = e1 , e2 be an ordered basis, pictured in Fig. 38. (a) Find the components vs = [v]s ∈ R2 of the vector v = e1 + 3e2 in the ordered basis S . (b) Find the components vu = [v]u ∈ R2 of the same vector v given in part (a) but now in the ordered basis U = u1 = e1 + e2 , u2 = −e1 + e2 . 
Solution: Part (a) is straightforward to compute, since the definition of component of a vector says that the numbers multiplying the basis vectors in the equation v = e1 + 3e2 are the components of the vector, that is, v = e1 + 3e2 ⇔ vs = 1 . 3 Part (b) is more involved. We are looking for numbers v1 and v2 such that ˜ ˜ v = v1 u1 + v2 u2 ˜ ˜ ⇔ vu = v1 ˜ . v2 ˜ (4.5) From the definition of the basis U we know the components of the basis vectors in U in terms of the standard basis, that is, u1 = e1 + e2 ⇔ u1s = 1 , 1 u2 = −e1 + e2 ⇔ u2s = −1 . 1 In other words, we can write the ordered basis U as the column vectors of the matrix Us = U s = [u1 ]s , [u2 ]s given by Us = u1s , u2s = 1 1 −1 . 1 Expressing Eq. (4.5) in the standard basis means e1 + 3e2 = v = v1 (e1 + e2 ) + v2 (−e1 + e2 ) ˜ ˜ ⇔ 1 1 −1 = v1 ˜ + v2 ˜ . 3 1 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 139 The last equation on the right is a matrix equation for the unknowns vu = 1 1 −1 1 v1 ˜ 1 = v2 ˜ 3 ⇔ v1 ˜ , v2 ˜ Us vu = vs . We find the solution using the Gauss method, 1 1 −1 1 1 10 → 3 01 2 1 ⇒ vu = 2 1 ⇔ v = 2u1 + u2 . A sketch of what has been computed is in Fig. 38. In this Figure is clear that the vector v is fixed, and we have only expressed this fixed vector in as a linear combination of two different bases. It is clear in this Fig. 38 that one has to stretch the vector u1 by two and add the result to the vector u2 to obtain v. x2 v u2 e2 u1 e1 x1 Figure 38. The vector v = e1 + 3e2 expressed in terms of the basis U = u1 = e1 + e2 , u2 = −e1 + e2 is given by v = 2u1 + u2 . Example 4.4.3: Consider the vector space P2 of all polynomials of degree less or equal two, and let us consider the case of F = R. An ordered basis is S = p0 = 1, p1 = x, p2 = x2 . The coordinate map is [ ]s : P2 → R3 defined as follows, [p]s = ps , where a ps = b ⇔ p(x) = a + bx + cx2 . c The column vector ps represents the components of the vector p in the ordered basis S . The equation above defines a correspondence between every element in P2 and every element in R3 . The coordinate map depend on the choice of the ordered basis. For example, choosing ˜ the ordered basis S = p0 = x2 , p1 = x, p2 = 1 , the corresponding coordinate map is 3 [ ]s : P2 → R defined by [p]s = ps , where ˜ ˜ ˜ c ps = b ⇔ p(x) = a + bx + cx2 . ˜ a 140 G. NAGY – LINEAR ALGEBRA december 8, 2009 The coordinate maps above generalize to the spaces Pn and Rn+1 for all n ∈ N. Given the ordered basis S = p0 = 1, p1 = x, · · · , pn = xn , the corresponding coordinate map [ ]s : Pn → Rn+1 defined by [p]s = ps , where a0 . ps = . ⇔ p(x) = a0 + · · · + an xn . . an Example 4.4.4: Consider V = P2 with ordered basis S = p0 = 1, p1 = x, p2 = x2 . (a) Find rs = [r]s , the components of r(x) = 3 + 2x + 4x2 in the ordered basis S . (b) Find rq = [r]q , the components of the same polynomial r given in part (a) but now in the ordered basis Q = q0 = 1, q1 = 1 + x, q2 = 1 + x + x2 , . Solution: Part (a) is straightforward to compute, since r(x) = 3 + 2x + 4x2 implies that 3 r(x) = 3 p0 (x) + 2 p1 (x) + 4p2 (x) ⇔ rs = 2 . 4 Part (b) is more involved, as in Example 4.4.2. We look for numbers r1 , r2 , r3 such that ˜˜˜ r0 ˜ ˜ r(x) = r0 q0 (x) + r1 q1 (x) + r2 q2 (x) ⇔ rq = r1 . ˜ ˜ ˜ (4.6) r2 ˜ From the definition of the basis Q we know the components of the basis vectors in Q in terms of the S basis, that is, 1 q0 (x) = p0 (x) ⇔ q0s = 0 , 0 1 q1 (x) = p0 (x) + p1 (x) ⇔ q1s = 1 0 1 q2 (x) = p0 (x) + p1 (x) + p2 (x) ⇔ q2s = 1 . 
1 Now we can write the ordered basis Q in terms of the column vectors of the matrix Qs = Q s = q0s , q1s , q2s , as follows, 111 Qs = 0 1 1 . 001 Expressing Eq. (4.6) in the standard basis means 3 1 1 1 2 = r0 0 + r1 1 + r2 1 . ˜ ˜ ˜ 4 0 0 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 141 The last equation on the right is a matrix equation for the unknowns r0 , r1 , and r2 ˜˜ ˜ 111 r0 ˜ 3 0 1 1 r1 = 2 ⇔ Qs rq = rs . ˜ r2 ˜ 001 4 We find the solution using the Gauss method, 111 3 100 0 1 1 2 → 0 1 0 001 4 001 hence the solution is 1 rq = −2 4 1 −2 , 4 ⇔ r(x) = q0 (x) − 2 q1 (x) + 4 q2 (x) . We can verify that this is the solution, since r(x) = q0 (x) − 2 q1 (x) + 4 q2 (x) = 1 − 2 (1 + x) + 4 (1 + x + x2 ) = (1 − 2 + 4) + (−2 + 4)x + 4x2 = 3 + 2x + 4x2 = 3 p0 (x) + 2 p1 (x) + 4p2 (x). Example 4.4.5: Given any ordered basis U = u1 , u2 , u3 of a 3-dimensional vector space V , find Uu = U u ⊂ F3,3 , that is, find ui = [ui ]u for i = 1, 2, 3, the components of the basis vectors ui in its own basis U . Solution: The answer is simple: The definition of vector components in a basis says that 1 u1 = u1 + 0 u2 + 0 u3 ⇔ u1u = 0 = e1 , 0 0 u2 = 0 u1 + u2 + 0 u3 ⇔ u2u = 1 = e2 , 0 0 u3 = 0 u1 + 0 u2 + u3 ⇔ u3u = 0 = e3 . 1 In other words, using the coordinate map φu : V → F3 , we can always write any basis U as components in its own basis as follows, Uu = e1 , e2 , e3 = I3 . This example says that there is nothing special about the standard basis S = {e1 , · · · , en } of Fn , where ei = I:i is the i-th column of the identity matrix In . Given any n-dimensional vector space V over F with any ordered basis V , the components of the basis vectors expressed on its own basis is always the standard basis of Fn , that is, the result is always V v = In . 142 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. ` ´ 4.4.1.- Let S = e1 , e2 , e3 be the standard basis of R3 . Find the components of the vector v = e1 + e2 + 2e3 in the ordered basis U 23 23 23 1 0 1” “ u1s = 405 , u2s = 415 , u3s = 415 . 1 1 0 ` ´ 4.4.2.- Let S = e1 , e2 , e3 be the standard 3 basis of R . Find the components of the vector 23 8 v s = 47 5 4 in the ordered basis U given by 23 23 23 1 1 1” “ u1s = 415 , u2s = 425 , u3s = 425 . 1 2 3 4.4.3.- Consider the vector space V = P2 with the ordered basis S given by ` ´ S = p0 = 1, p1 = x, p2 = x2 . (a) Find the components of the polynomial r(x) = 2 + 3x − x2 in the ordered basis S . (b) Find the components of the same polynomial r given in part (a) but now in the ordered basis Q given by ` ´ q0 = 1, q1 = 1 − x, q2 = x + x2 , . 4.4.4.- Let S be the standard ordered basis of R2,2 , that is, S = (E11 , E12 , E21 , E22 ) ⊂ R2,2 , with » E11 = » E21 = 1 0 0 1 – 0 , 0 – 0 , 0 » E12 = » E22 = 0 0 0 0 – 1 , 0 – 0 . 1 (a) Show that the ordered set M below is a basis of R2,2 , where M = (M1 , M2 , M3 , M4 ) ⊂ R2,2 , with » 0 M1 = 1 » 1 M3 = 0 – 1 , 0 – 0 , 1 » 0 M2 = 1 » 1 M2 = 0 – −1 , 0 – 0 , −1 where the matrices above are written in the standard basis. (b) Consider the matrix A written in the standard basis S , – » 12 . A= 34 Find the components of the matrix A in the ordered basis M. G. NAGY – LINEAR ALGEBRA December 8, 2009 143 Chapter 5. Linear transformations 5.1. Linear transformations We now introduce the notion of a linear transformation between vector spaces. Vector spaces are defined not by the elements they are made of but by the relation among these elements. They are defined by the properties of the operation we called linear combinations of the vector space elements. 
Linear transformations are a very special type of functions that preserve the linear combinations on both vector spaces where they are defined. Linear transformations generalize the definition of a linear function introduced in Sect. 2.1 between the spaces Fn and Fm to any two vector spaces. Definition 5.1. Given the vector spaces V and W over F, the function T : V → W is called a linear transformation iff for all u, v ∈ V and all scalars a, b ∈ F holds T(au + bv) = a T(u) + b T(v). In the case that V = W a linear transformation T : V → V is called a linear operator. Taking a = b = 0 in the definition above we see that every linear transformation satisfies that T(0) = 0. As we said in Sect. 2.1, if a function does not satisfy this condition, then it cannot be a linear transformation. We now consider several examples of linear transformations. We do not show that the transformation we show below are indeed linear; the proofs are left to the reader. Example 5.1.1: We give several examples of linear transformations. (a) Every example of a matrix as a linear function given in Sect. 2.1 is a linear transformation. More precisely, if V = Fn and W = Fm , then any m × n matrix A defines a linear transformation A : Fn → Fm by A = A. Therefore, rotations, reflections, projections, dilations are linear transformations in the sense given in Def. 5.1 above. In particular, square matrices are now called linear operators. (b) The vector spaces V and W are part of the definition of the linear transformation. For example, a matrix A alone does not determine a linear transformation, since the vector spaces must also be specified. For example, and m × n matrix A defines two different linear transformations, the first one A : Rn → Rm , and the second one A : Cn → Cm . Although the action of these transformations is the same, the matrix-vector product A(x) = Ax, the linear transformations are different. We will see in Chapter 8 that in the case m = n these transformations may have different eigenvalues and eigenvectors. (c) Let V = P3 , W = P2 and let D : P3 → P2 be the differentiation transformation dp . dx That is, given a polynomial p ∈ P3 , the transformation D acting on p is a polynomial one degree less than p given by the derivative of p. For example, D(p) = p(x) = 2 + 3x + x2 − x3 ∈ P3 ⇒ D(p)(x) = 3 + 2x − 3x2 ∈ P2 . The transformation D is linear, since for all polynomials p, q ∈ P3 and scalars a, b ∈ F holds dq(x) d dp(x) +b = a D(p)(x) + b D(q)(x). D(ap + bq)(x) = a p(x) + b q(x) = a dx dx dx (d) Notice that the differentiation transformation introduced above can also be defined as a linear operator: Let V = W = P3 , and introduce D : P3 → P3 with the same action dp . The vector spaces used to define the transformation are as above, that is, D(p) = dx 144 G. NAGY – LINEAR ALGEBRA december 8, 2009 important. The transformation defined in this example D : P3 → P3 is different for the one defined above D : P3 → P2 . Although the action is the same, since the action of both transformations is to take the derivative of their arguments, the transformations are different. We comment on these issues later on. (e) Let V = P2 , W = P3 and let S : P2 → P3 be the integral transformation x S(p)(x) = p(t) dt. 0 For example, 32 13 x + x ∈ P3 . 2 3 The transformation S is linear, since for all polynomials p, q ∈ P2 and scalars a, b ∈ F holds p(x) = 2 + 3x + x2 ∈ P2 ⇒ S(p)(x) = 2x + x S(ap + bq)(x) = x a p(t) + b q(t) dt = a 0 x p(t) dt + b 0 q(t) dt = a S(p)(x) + b S(q)(x). 
0 (f) Let V = C 1 R, R , the space of all functions f : R → R having one continuous derivative,that is f is continuous. Let W = C 0 R, R , the space of all continuous functions f : R → R. Notice that V W , since f (x) = |x|, the absolute value function, belongs to W but it does not belong to V . The differentiation transformation D : V → W , df D(f )(x) = (x), dx is a linear transformation. The integral transformation S : W → V , x S(f )(x) = f (t) dt, 0 is also a linear transformation. The range and null spaces of an m × n matrix introduced in Sect. 2.5 can be also be defined for linear transformations between arbitrary vector spaces. Definition 5.2. Let V , W be vector spaces and T : V → W be a linear transformation. The set N (T) ⊂ V given by N (T) = x ∈ V : T(x) = 0 is called the null space of the transformation T. The set R(T) ⊂ W given by R(T) = y ∈ W : ∃ x ∈ V with T(x) = y is called the range space of the transformation T. Example 5.1.2: In the case of V = Rn , W = Rm and A = A an m × n matrix, we have seen many examples of the sets N (A) and R(A) in Sect. 2.5. Additional examples in the case of more general linear transformations are the following: (a) Let V = P3 , W = P2 , and let D : P3 → P2 be the differentiation transformation dp . dx Recalling that D(p) = 0 iff p(x) = c, with c constant, we conclude that N (D) is the set of constant polynomials, that is D(p) = N (D) = Span {p0 = 1} ⊂ P3 . G. NAGY – LINEAR ALGEBRA December 8, 2009 145 Let us now find the set R(D). Notice that the derivative of a degree three polynomial is a degree two polynomial, never a degree three. We conclude that R(D) = P2 . (b) Let V = P2 , W = P3 , and let S : P2 → P3 be the integration transformation x S(p)(x) = p(t) dt. 0 Recalling that S(p) = 0 iff p(x) = 0, we conclude that N (S) = p(x) = 0 ⊂ P2 . Let us now find the set R(S). Notice that the integral of a nonzero constant polynomial is a polynomial of degree one. We conclude that R(S) = p(x) ∈ P3 : p(x) = a1 x + a2 x2 + a3 x3 . Therefore, non-zero constant polynomials do not belong to R(S), so R(S) P3 . Given a linear transformation T : V → W between vector spaces V and W , it is not difficult to show that the sets N (T) ⊂ V and R(T) ⊂ W are also subspaces of their respective vector spaces. Proposition 5.3. Let V and W be vector spaces and T : V → W be a linear transformation. Then, the sets N (T ) ⊂ V and R(T ) ⊂ W are subspaces of V and W , respectively. The proof is formally the same as the proof of Proposition 2.26 in Sect. 2.5. Proof of Proposition 5.3: The sets N (T ) and R(T ) are subspaces because the transformation T is linear. Consider two arbitrary elements x1 , x2 ∈ N (T ), that is, T(x1 ) = 0 and T(x2 ) = 0. Then, for any a, b ∈ F holds T(ax1 + bx2 ) = a T(x1 ) + b T(x2 ) = 0 ⇒ (ax1 + bx2 ) ∈ N (T). Therefore, N (T ) ⊂ V is a subspace. Analogously, consider two arbitrary elements y1 , y2 ∈ R(T ), that is, there exist x1 , x2 ∈ V such that y1 = T(x1 ) and y2 = T(x2 ). Then, for any a, b ∈ F holds (ay1 + by2 ) = a T(x1 ) + b T(x2 ) = T(ax1 + bx2 ) ⇒ (ay2 + by2 ) ∈ R(T ). Therefore, R(T ) ⊂ W is a subspace. This establishes the Proposition. 5.1.1. Rank-Nullity Theorem. The following result relates the dimensions of the null and range spaces of a linear transformation on finite dimensional vector spaces. This result is usually called nullity-rank theorem, where the nullity of a linear transformation is the dimension of its null space, and the rank is the dimension of its range space. Theorem 5.4. 
Every linear transformation T : V → W between finite dimensional vector spaces V and W satisfies that dim N (T ) + dim R(T ) = dim V. (5.1) Proof of Theorem 5.4: Let us denote by n = dim V and by N = u1 , · · · , uk a basis for N (T ), where 0 k n, with k = 0 representing the case N (T ) = {0}. By Proposition 4.25 we know we can increase the set N into a basis of V , so let us denote this basis as V = u1 , · · · , uk , v1 , · · · , vl , In order to prove Eq. (5.1) we now show that the set R = T(v1 ), · · · , T(vl ) k + l = n. 146 G. NAGY – LINEAR ALGEBRA december 8, 2009 is a basis of R(T ). We first show that Span(R) = R(T ): Given any vector v ∈ V , we can express it in the basis V as follows, v = x1 u1 + · · · + xk uk + y1 v1 + · · · + yl vl . Since R(T ) is the set of vectors of the form T(v) for all v ∈ V , therefore T(v) ∈ R(T ) iff T(v) = y1 T(v1 ) + · · · + yl T(vl ) ∈ Span(R). But this just says that R(T ) = Span(R). Now we show that R is linearly independent: Given any linear combination of the form 0 = c1 T(v1 ) + · · · + cl T(vl ) = T(c1 v1 + · · · + cl vl ), we conclude that c1 v1 + · · · + cl vl ∈ N (T ), so there exist d1 , · · · , dk such that c1 v1 + · · · + cl vl = d1 u1 + · · · + dk uk , which is equivalent to d1 u1 + · · · + dk uk − c1 v1 − · · · − cl vl = 0. Since V is a basis we conclude that all coefficients ci and dj must vanish, for i = 1, · · · , l and j1 , · · · , k . This shows that R is a linearly independent set. Then R is a basis of R(T ) and so, dim N (T ) = k, dim R(T ) = l = n − k. This establishes the Theorem. In the case that the linear transformation is given by an m × n matrix, Theorem 5.4 establishes a relation between the the nullity and the rank of the matrix. Corollary 5.5. Every m × n matrix A satisfies that dim N (A) + rank(A) = n. Proof of Corollary 5.5: An m × n matrix A defines a linear transformation given by A : Fn → Fm by A(x) = Ax. Since rank(A) = dim R(A) and dim Fn = n, Eq. (5.1) implies that dim N (A) + rank(A) = n. An alternative wording of this result uses EA , the reduced echelon form of A. Since N (A) = N (EA ), the dim N (A) is the number of non-pivot columns in EA . We also know that rank(A) is the number of pivot columns in EA . Therefore dim N (A) + rank(A) is the total number of columns in A, which is n. 5.1.2. Injections, surjections and bijections. A linear transformation may have both, only one, or none of the following properties. Definition 5.6. Let V and W be vector spaces and T : V → W be a linear transformation. (a) T is injective (or one-to-one) iff for all vectors v1 , v2 ∈ V holds v1 = v2 ⇒ T(v1 ) = T(v2 ). (b) T is surjective (or onto) iff R(T ) = W . (c) T is bijective iff T is both injective and surjective. In Fig. 39 we sketch the meaning of these definitions using standard pictures from set theory. Notice that a given transformation can be only injective, or it can be only surjective, or it can be both injective and surjective (bijective), or it can be neither. That a transformation is injective does not imply anything about whether it is surjective or not. That a transformation is surjective does not imply anything about whether it is injective or not. Before we present examples of injective and/or surjective linear transformations it is useful to introduce a result to characterize those transformations that are injective. This can be done in terms of the null space of the transformation. G. 
NAGY – LINEAR ALGEBRA December 8, 2009 V v1 R(T)=W V W 147 T ( v 1) T ( v 2) v2 T is injective T is surjective V R(T) V W W v1 T ( v 1) = T ( v2 ) v2 T is not injective T is not surjective Figure 39. Sketch representing an injective and a non-injective function, as well as a surjective and a non-surjective function. Proposition 5.7. Given vector spaces V and W , the linear transformation T : V → W is injective iff N (T ) = {0}. Proof of Proposition 5.7: (⇒) Since the transformation T is injective, given two elements v = 0, we know that T(v) = T(0) = 0, where the last equality comes from the fact that T is linear. We conclude that the null space of T contains only the zero vector. (⇐) Since N (T) = {0}, then given any two different elements v1 , v2 , we know that v1 − v2 = 0, therefore T(v1 − v2 ) = 0, since the null space is trivial. Then, T(v1 ) = T(v2 ). We conclude that T is injective. This establishes the Proposition. Example 5.1.3: Let V = R3 , W = R2 , and consider the linear transformation A defined 123 . Is A injective? Is A surjective? by the 2 × 3 matrix A = 241 Solution: A simple way to answer these questions is to find bases for the N (A) and R(A) spaces. Both bases can be obtained from the information given in EA , the reduced echelon form of A. A simple calculation shows A= 12 24 3 1 → 1 0 2 0 0 = EA . 1 This information gives a basis for N (A), since all solutions Ax = 0 are given by x1 = −2x2 , −2 −2 x2 free variable, ⇒ x = 1 x2 ⇒ N (A) = Span 1 0 0 x3 = 0, . Since N (A) = {0}, the transformation associated with A is not injective. The reduced echelon form above also says that the first and third column vectors in A form a linearly independent set. Therefore, the set R= 1 3 , 2 1 is a basis for R(A), so dim R(A) = 2, and then R(A) = R2 . Therefore, the transformation associated with A is surjective. 148 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 5.1.4: Consider the differentiation and integration transformations x dp D : P3 → P2 D(p)(x) = (x), S : P2 → P3 S(p)(x) = p(t) dt. dx 0 Show that D above is surjective but not injective, and S above is injective but not surjective. Solution: Let us start with the differentiation transformation. In Example 5.1.2 we found that N (D) = {0} and R(D) = P2 . Therefore, D above is not injective but it is surjective. Regarding the integration transformation, we have seen in the same Example 5.1.2 that N (S) = {0} and R(S) P3 . Therefore, S above is injective but it is not surjective. 5.1.3. Transformations as vectors. We have seen that one example of a vector space is the set Fm,n consisting of all m × n matrices together with the operations of matrix addition and multiplication of a matrix by a scalar. Therefore, an m × n matrix A can be interpreted in two ways: On the one hand, as a linear transformation between vector spaces A : Fn → Fm ; on the other hand, as a vector in the space of matrices, A ∈ Fm,n . It will not be a surprise to verify that something similar happens for linear transformations between any vector spaces. Definition 5.8. Given the vector spaces V and W over F, denote by L(V, W ) the set of all linear transformations T : V → W . Given T, S ∈ L(V, W ) and any scalar a ∈ F, introduce the addition of linear transformations and scalar multiplication as follows (T + S)(v) = T(v) + S(v), (aT)(v) = a T(v), for all v ∈ V. The particular case of the space of all linear operators T : V → V is denoted as L(V ). 
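When V = Fn and W = Fm and the transformations are given by matrices, the operations in Def. 5.8 reduce to matrix addition and multiplication of a matrix by a scalar. The following short sketch (Python with NumPy assumed available; the matrices, the scalar and the vector are made up only for illustration) checks the defining identities on a sample vector:

    import numpy as np

    # Two linear operators T, S : R^2 -> R^2 given by illustrative matrices.
    T = np.array([[1., 2.],
                  [0., 1.]])
    S = np.array([[0., -1.],
                  [1.,  0.]])
    a = 3.0
    v = np.array([1., 2.])

    # Def. 5.8: (T + S)(v) = T(v) + S(v) and (a T)(v) = a T(v).
    print(np.allclose((T + S) @ v, T @ v + S @ v))   # expected: True
    print(np.allclose((a * T) @ v, a * (T @ v)))     # expected: True

In this matrix case the vector space structure on the set of transformations is just that of the space Fm,n of m × n matrices, which anticipates the result stated next.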
The space L(V, W ) with the transformation addition and scalar multiplication operations above is indeed a vector space. Proposition 5.9. Given any vector spaces V and W , the set L(V, W ) of linear transformations T : V → W together with the addition and scalar multiplication operations defined in Def. 5.8 is a vector space. The proof is to verify all properties in the Def. 4.1, and we left it to the reader. Therefore, linear transformations can be interpreted not only as functions between vector spaces, but as vectors themselves in the space L(V, W ). If V and W are finite dimensional vector spaces, then L(V, W ) is also finite dimensional and dim L(V, W ) = (dim V )(dim W ). G. NAGY – LINEAR ALGEBRA December 8, 2009 149 Exercises. 5.1.1.- Consider the operator A : R2 → R2 » – 11 1 given by the matrix A = , 11 » –2 x1 which projects a vector onto the x2 2 line x1 = x2 on R . Is A injective? Is A surjective? 5.1.2.- Fix any real number θ ∈ [0, 2π ), and define the operator » (θ) : R2 → R2 by R – cos(θ) − sin(θ) the matrix R(θ) = , sin(θ) cos(θ) which is a rotation by an angle θ counterclockwise. Is R(θ) injective? Is R(θ) surjective? 5.1.3.- Let Fn,n be the vector space of all n × n matrices, and fix A ∈ Fn,n . Determine which of the following transformations T : Fn,n → Fn,n is linear. (a) T(X) = AX − XA; (b) T(X) = XT ; (c) T(X) = XT + A; (d) T(X) = A tr (X). (e) T(X) = X + XT . 5.1.4.- Fix a vector v ∈ Fn and then define the function T : Fn → F by T(x) = vT x. Show that T is a linear transformation. Is T a linear operator? 5.1.5.- Show that the mapping ∆ : P3 → P1 is a linear transformation, where d2 p (x) dx2 Is ∆ injective? Is ∆ surjective? ∆(p)(x) = 5.1.6.- Prove the following statement: A linear operator T : V → V on a finite dimensional vector space V is injective iff T is surjective. 5.1.7.- Prove the following statement: If V and W are finite dimensional vector spaces with dim V > dim W , then every linear transformation T : V → W is not injective. 150 G. NAGY – LINEAR ALGEBRA december 8, 2009 5.2. Transformation components 5.2.1. The matrix of a linear transformation. We have seen in Sect. 2.1 and at the beginning of this Section that every m × n matrix defines a linear transformation between the vector spaces Fn and Fm . We now show that every linear transformation T : V → W between finite dimensional vector spaces V and W has associated a matrix T. The matrix associated with such linear transformation is not unique. In order to compute the matrix of a linear transformation a basis must be chosen both in V and in W . The matrix associated with a linear transformation depends on the choice of bases done in V and W . We can say that the matrix T of a linear transformation T : V → W are the components of T in the given bases for V and W , analogously to the components v of a vector v ∈ V in a basis for V studied in Sect. 4.4. Definition 5.10. Let V and W be finite dimensional vector spaces over the field F. Let V = v1 , · · · , vn and W = w1 , · · · , wm be their respective ordered bases and let the map [ ]w : W → Fm be the coordinate map on W . The matrix of the linear transformation T : V → W in the ordered bases V and W is the m × n matrix Tvw = [T(v1 )]w , · · · , [T(vn )]w . (5.2) Let us explain the notation in Eq. (5.2). Given any basis vector vi ∈ V ⊂ V , where i = 1, · · · , n, we know that T(vi ) is a vector in W . Like any vector in a vector space, T(vi ) can be decomposed in a unique way in terms of the basis vectors in W , and following Sect. 
4.4 we denote them as [T(vi )]w . When there is no possibility of confusion, we denote Tvw simply as T. Example 5.2.1: Consider the vector spaces V = W = R2 , both with the standard basis, 1 0 that is, S = e1 = , e2 = . Let the linear operator T : R2 → R2 be given by 0 1 x1 x2 T s s = 3x1 + 2x2 4x1 − x2 . (5.3) s (a) Find the matrix Tss associated with the linear operator above. 1 −1 (b) Consider the ordered basis U for R2 given by U = u1s = , u 2s = 1 1 matrix Tuu associated with the linear operator above. . Find the Solution: Part (a): The definition of Tss implies Tss = [T(e1 )]s , [T(e2 )]s . From Eq. (5.3) we know that 3 0 2 1 T = , T = . 0s s 4s 1s s −1 s Therefore, 3 2 Tss = . 4 −1 Part (b): By definition Tuu = T(u1 ) u , T(u2 ) u . Notice that from the definition of basis U we have uis = [ui ]s , the column vector form with the components of the basis vectors ui in the standard basis S . So, we can use the definition of T in Eq. (5.3), that is, [T(u1 )]s = T 1 1 s s = 5 3 , s [T(u2 )]s = T −1 1 s s = −1 −5 . s The results above are vector components in the standard basis S , so we need to translate these results into components on the U basis. We translate them in the usual way, solving G. NAGY – LINEAR ALGEBRA December 8, 2009 151 a linear system for each of these two vectors: 5 3 1 1 = y1 s + y2 s −1 1 −1 −5 , s 1 1 = z1 s −1 1 + z2 s . s The solutions are the components we are looking for, since y1 y2 [T(u1 )]u = , z1 z2 [T(u2 )]u = u . u We can solve both systems above at the same time using the augmented matrix 1 1 −1 1 5 3 −1 10 → −5 01 4 −1 −3 . −2 We have obtained that 4 −1 [T(u1 )]u = , −3 −2 [T(u2 )]u = u So we conclude that Tuu = . u 4 −3 . −1 −2 From the Definition 5.10 it is clear that the matrix associated with a linear transformation T : V → W depends on the choice of bases V and W for the spaces V and W , respectively. In the case that V = W the matrix of the linear operator T : V → V usually means Tvv , that is, to choose the same basis V for the domain space V and the range space V . But this ˜ is not the only choice. One can choose different bases V and V for the domain and range spaces, respectively. In this case, the matrix associated with the linear operator T is Tvv , ˜ that is, Tvv = [T(v1 )]v , · · · , [T(vn )]v . ˜ ˜ ˜ Example 5.2.2: Consider the linear operator defined in Eq. (5.3) in Example 5.2.1. Find the associated matrices Tus and Tsu , where S and U are the bases defined in that Example 5.2.1. Solution: The first matrix is simple to obtain, since Tus = [T(u1 )]s , [T(u2 )]s and it is straightforward to compute T 1 1 s s = 5 3 , T s −1 1 s s −1 −5 = , s so, we then conclude that Tus = 5 3 −1 −5 . us We now compute Tsu = [T(e1 )]u , [T(e2 )]u . From the definition of T we know that 1 3 0 2 T = , T = 0s s 4s 1s s −1 . s The results are expressed in the standard basis S , so we need to translate them into the U basis, as follows 3 4 = y1 s 1 1 + y2 s −1 1 , s 2 −1 = z1 s 1 1 + z2 s −1 1 . s 152 G. NAGY – LINEAR ALGEBRA december 8, 2009 The solutions are the components we are looking for, since [T(e1 )]u = y1 y2 , z1 z2 [T(e2 )]u = u . u We can solve both systems above at the same time using the augmented matrix 1 1 −1 1 3 4 20 2 → −1 02 5 −1 7 . 3 We have obtained that [T(e1 )]u = 1 5 2 −1 So we conclude that Tsu = , [T(e2 )]u = u 1 57 2 −1 3 17 23 . u . su 5.2.2. Action as matrix-vector product. 
An important property of the matrix associated with a linear transformation T is that the action of the transformation onto a vector can be represented as the matrix-vector product between the transformation matrix T the vector components in the appropriate bases. Proposition 5.11. Let V and W be finite dimensional vector spaces with ordered bases V and W , respectively. Let T : V → W be a linear transformation with associated matrix Tvw . Then, the components of the vector T(x) ∈ W in the basis W can be expressed as the matrix-vector product [T(x)]w = Tvw xv , where xv = [x]v are the components of the vector x ∈ V in the basis V . Proof of Proposition 5.11: Given any vector x ∈ V , the definition of its vector components in the basis V = v1 , · · · , vn implies that x1 . xv = . ⇔ x = x1 v1 + · · · + xn vn . . xn V Since T is a linear transformation we know that T(x) = x1 T(v1 ) + · · · + xn T(vn ). The equation above holds in any basis of W , in particular in W , that is, [T(x)]w = x1 [T(v1 )]w + · · · + xn [T(vn )]w x1 . = [T(v1 )]w , · · · [T(vn )]w . . xn v = Tvw xv . This establishes the Proposition. Example 5.2.3: Consider the linear operator T : R2 → R2 given in Example 5.2.1 above together with the bases S and U defined in that example. (a) Use the matrix vector product to express the action of the operator T when the standard basis S is used in both domain and range spaces R2 . G. NAGY – LINEAR ALGEBRA December 8, 2009 153 (b) Use the matrix vector product to express the action of the operator T when the basis U is used in both domain and range spaces R2 . Solution: Part (a): Since xs = [T(x)]s = x1 x2 , we can express the action of T on x as s 3x1 + 2x2 4x1 − x2 = s 3 4 2 −1 ss x1 x2 ⇒ [T(x)]s = Tss xs , s where we used the expression for Tss found in Example 5.2.1. Part (b): We need to find [T(x)]u and express the result using Tuu . We use the notation x ˜ xu = 1 , and we repeat the steps followed in the proof of Proposition 5.11. x2 u ˜ [T(x)]u = [T(˜1 u1 + x2 u2 )]u = x1 [T(u1 )]u + x2 [T(u2 )]u = [T(u1 )]u , [T(u2 )]u x ˜ ˜ ˜ x1 ˜ x2 ˜ , u so we conclude that [T(x)]u = Tuu xu . At the end of Example 5.2.1 we have found that Tuu = 4 −1 −3 −2 , uu therefore, the action of T on x when expressed in the basis U is given by the following matrix-vector product, 4 −3 x1 ˜ [T(x)]u = . −1 −2 uu x2 u ˜ Example 5.2.4: Express the action of the differentiation transformation D : P3 → P2 , given dp by D(p)(x) = (x) as a matrix-vector product in the ordered standard bases of P3 and dx P2 , S = p0 = 1, p1 = x, p2 = x2 , p3 = x3 , ˜ S = q0 = 1, q1 = x, q2 = x2 . Solution: Following the Proposition 5.11 we only need to find the matrix Dss associated ˜ with the transformation D, 0100 Dss = [D(p0 )]s , [D(p1 )]s , [D(p2 )]s , [D(p3 )]s ⇒ Dss = 0 0 2 0 . ˜ ˜ ˜ ˜ ˜ ˜ 0 0 0 3 ss ˜ Therefore, given any vector p(x) = a0 + a1 x + a2 x2 + a3 x3 ⇔ a0 a1 ps = a2 a3 s we obtain that D(p)(x) = a1 + 2a2 x + 3a3 x2 is equivalent to a0 0100 a1 a1 [D(p)]s = Dss ps = 0 0 2 0 ⇒ [D(p)]s = 2a2 . ˜ ˜ ˜ a2 0 0 0 3 ss 3a3 s ˜ a3 ˜ s 154 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 5.2.5: Express the action of the integration transformation S : P2 → P3 , given x by S(q)(x) = 0 q(t) dt as a matrix-vector product in the standard ordered bases of P3 and P2 , S = p0 = 1, p1 = x, p2 = x2 , p3 = x3 , ˜ S = q0 = 1, q1 = x, q2 = x2 . Solution: Following the Proposition 5.11 we only need to find the matrix Sss associated ˜ with the transformation S, 000 1 0 0 Sss = [S(q0 )]s , [S(q1 )]s , [S(q2 )]s ⇒ Sss = ˜ ˜ 0 1 0 . 
2 0 0 1 ss 3˜ Therefore, given any vector q(x) = a0 + a1 x + a2 x2 ⇔ a0 qs = a1 ˜ a2 s ˜ a1 2 a2 3 we obtain that S(q)(x) = a0 x + x+ x is equivalent to 2 3 000 0 a0 1 0 0 a a1 [S(q)]s = Sss qs = ⇒ [S(q)]s = 0 . ˜˜ 1 0 a1 /2 0 2 a2 s ˜ a2 /3 s 0 0 1 ss 3˜ 5.2.3. Composition and matrix product. The following formula relates the composition of linear transformations with the matrix product of their associated matrices. Proposition 5.12. Let U , V and W be finite-dimensional vector spaces with bases U , V and W , respectively. Let T : U → V and S : V → W be linear transformations with associated matrices Tuv and Svw , respectively. Then, the composition S ◦ T : U → W given by (S ◦ T)(u) = S( T(u) , for all u ∈ U , is a linear transformation and the associated matrix (S ◦ T)uw is given by the matrix product by (S ◦ T)uw = Svw Tuv . Proof of Proposition 5.12: First show that the composition of two linear transformations S and T is a linear transformation. Given any u1 , u2 ∈ U and arbitrary scalars a and b holds (S ◦ T)(au1 + bu2 ) = S T(au1 + bu2 ) = S a T(u1 ) + b T(u2 ) = a S T(u1 ) + b S( T(u2 ) = a (S ◦ T)(u1 ) + b (S ◦ T)(u2 ). We now compute the matrix of the composition transformation. Denote the ordered basis in U as follows, U = u1 , · · · , un . Then compute (S ◦ T)uw = S T(u1 ) w , · · · , S T(un ) w . G. NAGY – LINEAR ALGEBRA December 8, 2009 155 The column i, with i = 1, · · · , n, in the matrix above has the form S T(ui ) w = Svw [T(ui )]v = Svw Tuv uiu . Therefore, we conclude that [S ◦ T]uw = Svw Tuv . This establishes the Proposition. Example 5.2.6: Let S be the standard ordered basis of R3 , and consider the linear transformations T : R3 → R3 and S : R3 → R3 given by 2x1 − x2 + 3x3 x1 −x1 x1 = −x1 + 2x2 − 4x3 , S x2 = 2x 2 T x2 s s x2 + 3x3 x3 s 3x 3 s x3 s s (a) Find a matrices Tss and Sss . By the way, Is T injective? Is T surjective? (b) Find the matrix of the composition T ◦ S : R3 → R3 in standard ordered basis S . Solution: Since there is only one ordered basis in this problem, we represent Tss and Sss simply by T and S, respectively. Part (a): A straightforward calculation from the definitions T = [T(e1 )]s , [T(e2 )]s , [T(e3 )]s , S = [S(e1 )]s , [S(e2 )]s , [S(e3 )]s implies that the matrices associated with T and S in the standard ordered basis are given by 2 −1 3 −1 0 0 2 −4 , T = −1 S = 0 2 0 . 0 1 3 003 The information in matrix T is useful to find out if T is injective and/or surjective. First, find the reduced echelon form of T, 2 −1 3 1 −2 4 1 −2 4 1 0 10 100 −1 2 −4 → 2 −1 3 → 0 3 5 → 0 1 3 → 0 1 0 . 0 1 3 0 13 0 13 0 0 −4 001 We conclude that N (T) = {0}, which implies that T is injective. The relation dim N (T) + dim Col(T) = 3 together with dim N (T) = 0 imply that dim Col(T) = 3, hence T is surjective. Part (b): The matrix of the composition T ◦ S in the standard ordered basis S is the product TS, that is, 2 −1 3 −1 0 0 −2 −2 9 2 −4 0 2 0 = 1 4 −12 . (T ◦ S) = TS = −1 0 1 3 003 0 2 9 156 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 5.2.1.- Consider T : R2 → R2 the linear operator given by » – h “»x – ”i x1 + x2 1 T = , x2 s s −2x1 + 4x2 s where S denote the standard ordered basis of R2 . Find the matrix Tuu associated with the linear operator T in and the ordered basis U of R2 given by »– »–” “ 1 1 [u1 ]s = , [u2 ]s = . 
1s 2s 5.2.2.- Let T : R3 → R3 be given by 23 2 3 x1 − x2 h “ x1 ”i T 4x 2 5 = 4−x1 + x2 5 , s x3 s x1 − x3 s 5.2.3.- Fix an m × n matrix A and define the linear transformation T : Rn → Rm as follows: [T(x)]s = Axs , where S ⊂ ˜ ˜ Rn and S ⊂ Rm are standard ordered bases. Show that Tss = A. ˜ 5.2.4.- Find the matrix associated with the linear transformation T : P3 → P2 , dp d2 p (x), (x) − dx2 dx in the standard ordered bases of P3 , P2 . T(p)(x) = 5.2.5.- Find the matrices in the standard bases of P3 and P2 of the transformations S ◦ D : P3 → P3 , 3 S is the standard ordered basis of R . (a) Find the matrix Tuu of the linear operator T in the ordered basis 23 23 23 0 1” “1 U = 40 5 , 41 5 , 41 5 . 1s 1s 0s (b) Verify that [T(v)]u = Tuu vu , where 23 1 vs = 415 . 2s D ◦ S : P2 → P2 , that is, the composition of the differentiation and integration transformations, as defined in Sect. 5.1. G. NAGY – LINEAR ALGEBRA December 8, 2009 157 5.3. Change of basis 5.3.1. Vector components. We have seen in Sect. 4.4 that every vector v in a finite dimensional vector space V with an ordered basis V , can be expressed in a unique way as a linear combination of the basis elements, with vv = [v]v denoting the coefficients in that linear combination. These components depend on the basis chosen in V . Given two different ˜ ordered bases V , V ⊂ V , the components vv and vv associated with a vector v ∈ V are in ˜ general different. We now use the matrix-vector product to find a simple formula relating vv to vv . ˜ ˜ Let us recall the following notation. Given an n-dimensional vector space V , let V and V be two ordered bases of V given by ˜ ˜ ˜ V = v1 , · · · , vn and V = v1 , · · · , vn . Let I : V → V be the identity transformation, that is, I(v) = v for all v ∈ V , and introduce the change of basis matrices v v Ivv = ˜1v , · · · , ˜nv ˜ and Ivv = v1˜ , · · · , vnv , v ˜ ˜ where we denoted, as usual, ˜iv = [˜i ]v and viv = [vi ]v , for i = 1, · · · , n. Since the sets v v ˜ ˜ ˜ V and V are bases of V , the matrices Ivv and Ivv are invertible, and it is not difficult to ˜ ˜ show that (Ivv )−1 = Ivv . Finally, introduce the following notation for the change of basis ˜ ˜ matrices, P = Ivv and P−1 = Ivv . ˜ ˜ ˜ Theorem 5.13. Let V be a finite dimensional vector space, let V and V be two ordered bases of V , and let P = Ivv be the change of basis matrix. Then, the components xv and ˜ ˜ ˜ xv of any vector x ∈ V in the ordered bases V and V , respectively, are related by the linear equation xv = P−1 xv . (5.4) ˜ Notice that Eq. (5.4) is equivalent to the inverse equation xv = Pxv . ˜ Proof of Theorem 5.13: Let V be an n-dimensional vector space with two ordered bases ˜ ˜ ˜ V = v1 , · · · , vn and V = v1 , · · · , vn . Given any vector x ∈ V , then the definition of vector components xv = [x]v in the basis V implies x1 . xv = . ⇔ x = x1 v1 + · · · + xn vn . . xn v ˜ Express the second equation above in terms of components in the ordered basis V , x1 . xv = x1 v1˜ + · · · + xn vnv = v1˜ , · · · , vnv . ⇒ xv = Ivv xv . v ˜ ˜ v ˜ ˜ ˜ . xn We conclude that xv = P ˜ −1 v xv . This establishes the Theorem. Example 5.3.1: Consider the vector space V = R2 with the standard ordered basis S and 1 −1 the ordered basis U = u1s = ,u = . Given the vector with components 1 s 2s 1s 1 xs = , find xu . 3s Solution: The answer is given by Theorem 5.13, that says xu = P−1 xs , P = Ius . 158 G. NAGY – LINEAR ALGEBRA december 8, 2009 From the data of the problem the matrix P is simple to compute, since 1 1 P = Ius = u1s , u2s = −1 1 . 
us Computing the inverse matrix P−1 = 111 2 −1 1 su we obtain the final result xu = 111 2 −1 1 1 3 su ⇒ xu = s 2 1 . u Example 5.3.2: Let V = R2 with ordered bases B = b1 , b2 and C = c1 , c2 related by the equations b1 = −c1 + 4c2 , b2 = 5c1 − 3c2 . 5 , find xc . 3b 1 (b) Given xc = , find xb . 1c (a) Given xb = Solution: Part (a): We know that xc = P−1 xb , where P = Icb . From the data of the problem we know that −1 b1 = −c1 + 4c2 ⇔ b1c = , 4 c −1 5 , ⇒ Ibc = 4 −3 bc 5 b2 = 5c1 − 3c2 . ⇔ b1c = , −3 c hence we know the change of basis matrix Ibc = P−1 , so we conclude that xc = −1 4 5 −3 5 3 bc ⇒ xc = b 10 11 c Part (b): keeping the definition of matrix P = Icb as we introduced it in part (a), we know that xc = P−1 xb . In this part (b) we need the inverse relation xb = Pxc . Since −1 5 135 P−1 = , it is simple to obtain P−1 = 17 . Using this matrix we obtain 4 −3 bc 4 1 cb xb = 13 17 4 5 1 cb 1 1 ⇒ c xb = 18 17 5 . b 5.3.2. Transformation components. Analogously, we have seen in Sect. 5.2 that every linear transformation T : V → W between finite dimensional vector spaces V and W with ordered bases V and W , respectively, can be expressed in a unique way as an dim W × dim V matrix Tvw . These components depend on the bases chosen in V and W . Given two different ˜ ˜ ordered bases V , V ⊂ V , and given two different bases W , W ⊂ W , the matrices Tvw and Tvw ˜˜ associated with the linear transformation T are in general different. Matrix multiplication provides a simple formula relating Tvw and Tvw . ˜˜ G. NAGY – LINEAR ALGEBRA December 8, 2009 159 Let us recall old notation and also introduce a bit of new one. Given an n-dimensional ˜ vector space V , let V and V be ordered bases of V , ˜ ˜ ˜ V = v1 , · · · , vn and V = v1 , · · · , vn ; ˜ and given an m-dimensional vector space W , let W and W be two ordered bases of W , ˜ ˜ ˜ W = w1 , · · · , wm and W = w1 , · · · , wm . Let I : V → V be the identity transformation, that is, I(v) = v for all v ∈ V , and introduce the change of basis matrices Ivv = v1˜ , · · · , vnv v ˜ ˜ v v and Ivv = ˜1v , · · · , ˜nv . ˜ Let J : W → W be the identity transformation, that is, J(w) = w for all w ∈ W , and introduce the change of basis matrices Jww = w1w , · · · , wmw ˜ ˜ ˜ ˜ ˜ and Jww = w1w , · · · , wmw . ˜ ˜ Notice that the sets V and V are bases of V , therefore the n × n matrices Ivv and Iv are ˜ ˜ invertible, and (Ivv )−1 = Ivv . The similar statement is true for the m × m matrices Jww and ˜ ˜ ˜ Jww , and (Jww )−1 = Jww . Finally, introduce the following notation for the change of basis ˜ ˜ ˜ matrices, P = Ivv ˜ ⇒ P−1 = Ivv , ˜ and Q = Jww ˜ ⇒ Q−1 = Jww . ˜ ˜ Theorem 5.14. Let V and W be finite dimensional vector spaces, let V and V be two ˜ ordered bases of V , let W and W be two ordered bases of W , and let P = Ivv and Q = Jww ˜ ˜ be the change of basis matrices, respectively. Then, the components Tvw and Tvw of any ˜˜ ˜˜ linear transformation T : V → W in the bases V , W and V , W , respectively, are related by the matrix equation Tvw = Q−1 Tvw P. ˜˜ (5.5) Notice that Eq. (5.5) is equivalent to the inverse equation Tvw = QTvw P−1 . An important ˜˜ particular case of Theorem 5.14 is when T is a linear operator. In this is the case T : V → V , ˜ ˜ so V = W . Often in applications one also has V = W and V = W , which imply P = Q. ˜ Corollary 5.15. Let V be a finite dimensional vector space, let V and V be two ordered bases of V , let P = Ivv be the change of basis matrix, and let T : V → V be a linear operator. 
˜ ˜ ˜ Then, the components T = Tvv and T = Tvv of T in the bases V and V , respectively, are ˜˜ related by the matrix equation ˜ T = P−1 TP. (5.6) Proof of Theorem 5.14: Let V be an n-dimensional vector space with ordered bases ˜ ˜ ˜ V = v1 , · · · , vn and V = v1 , · · · , vn ; and let W be an m-dimensional vector space with ˜ ˜ ˜ ordered bases W = w1 , · · · , wm and W = w1 , · · · , wm . We know that the matrix Tvw ˜˜ ˜˜ associated with the transformation T and the bases V , W , and the matrix Tvw associated with the transformation T and the bases V , W satisfy the following equations, [T(x)]w = Tvw xv ˜ ˜˜ ˜ and [T(x)]w = Tvw xv . (5.7) ˜ Since T(x) ∈ W , Theorem 5.13 says that its components in the bases W and W are related by the equation [T(x)]w = Q−1 [T(x)]w , ˜ with Q = Jww . ˜ 160 G. NAGY – LINEAR ALGEBRA december 8, 2009 Therefore, the first equation in (5.7) implies that Tvw xv = Q−1 [T(x)]w ˜˜ ˜ = Q−1 Tvw xv = Q−1 Tvw P xv , ˜ with P = Iv v , ˜ where in the second line above we used the second equation in (5.7), and in the third line above we used the inverse form of Theorem 5.13. Since the equation above holds for all x ∈ V we conclude that Tvw = Q−1 Tvw P. ˜˜ This establishes the Theorem. Example 5.3.3: Let S be the standard ordered basis of R2 , and let T : R2 → R2 be the linear operator given by x1 x T =2, x2 s s x1 s that is, a reflection along the line x2 = x1 . Find the matrix Tuu , where the ordered basis U 1 −1 is given by U = u1s = ,u = . 1 s 2s 1s Solution: From the definition of T is straightforward to obtain Tss , as follows, Tss = 1 0 T s s ,T 0 1 s ⇒ s Tss = 0 1 1 0 . ss Since T is a linear operator and we want to compute the matrix Tuu from matrix Tss , we can use Corollary 5.15, which says that these matrices are related by the similarity transformation Tuu = P−1 Tss P, where P = Ius . From the data of the problem we know that Ius = u1s , u2s us = 1 1 −1 1 , us therefore, P= 1 1 −1 1 P−1 = , us 1 1 2 −1 1 1 . su We then conclude that Tuu = 1 11 2 −1 1 su 0 1 1 0 ss 1 1 −1 1 ⇒ Tuu = us 1 0 0 −1 . uu In this particular example we can see that the matrix associated with the reflection transformation T is diagonal in the basis U . For this particular transformation we have that T(u1 ) = u1 , T(u2 ) = −u2 . Non-zero vectors v with this property, that T(v) = λv, are called eigenvectors of the operator T and the scalar λ is called eigenvalue of T. In this example the elements in the basis U are eigenvectors of the reflection operator, and the matrix of T in this special basis is diagonal. Basis formed with eigenvectors of a given linear operator will be studied in a later on. G. NAGY – LINEAR ALGEBRA December 8, 2009 161 Example 5.3.4: Let S be the standard ordered basis of R2 , and let T : R2 → R2 be the linear operator given by (x1 + x2 ) 1 x1 T = , x2 s s 1s 2 that is, a projection along the line x2 = x1 . Find the matrix Tuu , where the ordered basis −1 1 ,u = . U = u 1s = 1s 1 s 2s Solution: From the definition of T is straightforward to obtain Tss , as follows, Tss = 1 0 T s s 0 1 ,T ⇒ s s 111 211 Tss = . ss Since T is a linear operator and we want to compute the matrix Tuu from matrix Tss , we again use Corollary 5.15, which says that these matrices are related by the similarity transformation Tuu = P−1 Tss P, where P = Ius . In Example 5.3.3 we have already computed P= 1 1 −1 1 su 11 21 P−1 = , us 1 1 2 −1 1 1 . su We then conclude that Tuu = 1 1 2 −1 1 1 1 1 ss 1 1 −1 1 ⇒ Tuu = us 10 00 . 
uu Again in this example we can see that the matrix associated with the reflection transformation T is diagonal in the basis U , with diagonal elements equal to one and zero. For this particular transformation we have that the basis vectors in U are eigenvectors of T with eigenvalues 1 and 0, that is, T(u1 ) = u1 and T(u2 ) = 0. Example 5.3.5: Let U and V be standard ordered bases of R3 and R2 , respectively, let T : R3 → R2 be the linear transformation x1 x − x2 + x3 T x2 =1 , x2 − x3 v x3 u v and introduce the ordered bases 1 1 0 ˜ ˜ ˜ ˜ U = u1u = 1 , u2u = 1 , u3u = 0 ⊂ R3 , 1u 1u 0u 1 ˜ V = ˜1v = v 2 , ˜2v = v v 2 1 ⊂ R2 . v Find the matrices Tuv and Tuv . ˜˜ Solution: We start finding the matrix Tuv , which by definition is given by 0 0 1 , T 0 , T 1 Tuv = T 0 v v 1u v 0u 0u 162 G. NAGY – LINEAR ALGEBRA december 8, 2009 1 −1 0 1 are related by the equation hence we obtain Tuv = 1 −1 . Theorem 5.14 says that the matrices Tuv and Tuv ˜˜ uv Tuv = Q−1 Tuv P, ˜˜ where Q = Jvv ˜ From the data of the problem we know that 101 ˜˜˜˜ Iuu = u1u , u2u , u3u uu = 1 1 0 ˜ 0 1 1 uu ˜ vv Jvv = ˜1v , ˜2v ˜ vv ˜ = 12 21 ⇒ Q= vv ˜ and , Tu v ˜˜ and the result is Tuv = ˜˜ 1 3 20 −1 0 2 −1 1 0 vv ˜ −4 5 −1 1 1 −1 uv 1 1 0 0 1 1 Q−1 = vv ˜ Therefore, we need to compute the matrix product 1 −1 = 2 3 1 P = 1 0 ⇒ 12 21 P = Iuu . ˜ 0 1 1 1 0 , 1 uu ˜ 1 −1 2 3 2 −1 . vv ˜ 1 0 1 uu ˜ . uv ˜˜ 5.3.3. Determinant and trace of linear operators. The type of matrix transformation given by Eq. (5.6) will be important later on, so we give such transformations a name. Definition 5.16. The n × n matrices A and B are related by a similarity transformation iff there exists and invertible n × n matrix P such that B = P−1 AP. We now show that the determinant and the trace of a square matrix are invariant under similarity transformations. Proposition 5.17. If the square matrices A, B are related by the similarity transformation B = P−1 AP, then holds det(B) = det(A), tr (B) = tr (A). Proof of Proposition: 5.17: The determinant is invariant under similarity transformations, since det(B) = det(P−1 AP) = det(P−1 ) det(A) det(P) = det(A). The trace is also invariant under similarity transformations, since tr (B) = tr (P−1 AP) = tr (PP−1 A) = tr (A). This establishes the Proposition. Both the invariance of the determinant and trace under similarity transformations and the transformation law for the operator components in a basis given in Corollary 5.15 imply that the determinant and the trace can be defined on linear operators. Definition 5.18. Let V be a finite dimensional vector space, let T ∈ L(V ) be a linear operator, and let Tvv be the matrix of the linear operator in an arbitrary ordered basis V of V . The determinant and trace of a linear operator T ∈ L(V ) are respectively given by, det(T) = det(Tvv ), tr (T) = tr (Tvv ). G. NAGY – LINEAR ALGEBRA December 8, 2009 163 The determinant and trace of a linear operator is well-defined, since given any other ˜ ordered basis V of V , and denoting P = Ivv , we obtain ˜ det(Tvv ) = det(P−1 Tvv P) = det(P−1 ) det(Tvv ) det(P) = det(Tvv ) = det(T) ˜˜ tr (Tvv ) = tr (P−1 Tvv P) = tr (PP−1 Tvv ) = tr (Tvv ) = tr (T). ˜˜ 164 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 5.3.1.- Let U = (u1 , u2 ) be an ordered basis of R2 given by u1 = 2e1 − 9e2 , u2 = e1 + 8e2 , where S = (e1 , e2 ) is the standard ordered basis of R2 . (a) Find both change of basis matrices Ius and Isu . (b) Given the vector x = 2u1 + u2 , find both xs and xu . 
5.3.2.- Consider the ordered bases of R3 , B = (b1 , b2 , b3 ) and C = (c1 , c2 , c3 ), where c1 = b 1 − 2 b 2 + b 3 , c2 = − b 2 + 3 b 3 , c3 = −2b1 + b3 . (a) Find both the change of basis matrices Ibc and Icb . (b) Let x = c1 − 2c2 + 2c3 . Find both xb and xc . 5.3.3.- Consider the ordered bases U and S of R2 given by »–” »– “ 2 1 , , u2s = U = u1s = 1s 2s »–o »– “ 0 1 . , e2s = S = e1s = 1s 0s »– 3 find xs . (a) Given xu = 2u (b) Find e1u and e2u . 5.3.4.- Consider the ordered bases of R2 » –” »– “ 1 1 B = b1s = , b2s = 2s −2 s »– » –” “ 1 −1 , C = c1s = , c2s = 1s 1s where S is the standard ordered basis. »– 2 (a) Given xc = , find xs . 3c (b) For the same x above, find xb . 5.3.5.- Let S = (e1 , e2 ) be the standard ordered basis of R2 and B = (b1 ,» 2 ) be b – 12 another ordered basis. Let A = 23 be the matrix that transforms the components of a vector x ∈ R2 from the basis S into the basis B, that is, xb = Axs . Find the components of the basis vectors b1 , b2 in the standard basis, that is, find b1s and b2s . 5.3.6.- Show that similarity is a transitive property, that is, if matrix A is similar to matrix B, and B is similar to matrix C, then A is similar to C. 5.3.7.- Consider R3 with the standard ordered basis S and the ordered basis 23 23 23 1 1” “1 U = 40 5 , 41 5 , 41 5 . 0s 0s 1s Let T : R3 → R3 be the linear operator 23 2 3 x1 + 2x2 − x3 h “ x1 ”i 5. −x2 T 4x 2 5 =4 s x3 s x1 + 7x3 s Find both matrices Tss and Tuu . 5.3.8.- Consider R2 with ordered bases »–” »– “ 0 1 , ,e = S = e1s = 1s 0 s 2s » –” »– “ 1 1 . ,u = U = u1s = −1 s 1 s 2s Let T : R2 → R2 be a linear transformation given by »– »– 1 3 . [T(u1 )]s = , [T(u2 )]s = 3s 1s Find the matrix Tus , then the matrices Tss , Tuu , and finally Tsu . G. NAGY – LINEAR ALGEBRA December 8, 2009 165 Chapter 6. Inner product spaces An inner product space is a vector space with an additional structure called inner product. This additional structure associates each pair of vectors in the vector space with a scalar. An inner product is a generalization to any vector space of the dot product defined on Rn . This generalization of the dot product is done in such a way that allows to translate from Rn to an arbitrary vector space the concepts of length of a vector and angle between vectors. 6.1. The dot product We review the definition of the dot product between vectors in R2 , and we describe its main properties, including the Cauchy-Schwarz inequality. We then use the dot product to introduce the notion of length of a vector, distance and angle between vectors, including the special case of perpendicular vectors. We then review that all these notions can be generalized in a straightforward way from R2 to Fn , n 1. x1 y , y = 1 in the x2 y2 standard ordered basis S . The dot product on R2 with is the function · : R2 × R2 → R given by x · y = x1 y1 + x2 y2 . The dot product norm of a vector x ∈ R2 is the value of the function : R2 → R, √ x = x · x. Definition 6.1. Given any vectors x, y ∈ R2 with components x = The norm distance between x, y ∈ R2 is the value of the function d : R2 × R2 → R, d(x, y) = x − y . The dot product can be expressed using the transpose of a vector components in the standard basis, as follows, xT y = [x1 , x2 ] y1 = x1 y1 + x2 y2 = x · y. y2 The dot product norm and the norm distance can be expressed in term of vector components in the standard ordered basis S as follows, x= (x1 )2 + (x2 )2 , d(x, y) = (x1 − y1 )2 + (x2 − y2 )2 . 
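These component formulas translate directly into a computation. The sketch below (Python with NumPy assumed available; the two vectors are sample values chosen only for illustration) evaluates the dot product, the norm and the distance, and checks the results against NumPy's built-in np.linalg.norm:

    import numpy as np

    # Sample vectors of R^2, written in the standard ordered basis S.
    x = np.array([1., 2.])
    y = np.array([3., 1.])

    dot    = x @ y                         # x . y = x1 y1 + x2 y2
    norm_x = np.sqrt(x @ x)                # ||x|| = sqrt(x . x)
    dist   = np.sqrt((x - y) @ (x - y))    # d(x, y) = ||x - y||

    print(dot, norm_x, dist)
    print(np.isclose(norm_x, np.linalg.norm(x)))      # expected: True
    print(np.isclose(dist,   np.linalg.norm(x - y)))  # expected: True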
The geometrical meaning of the norm and distance is clear from the expressions in components given above, as shown in Fig. 40. The norm of a vector is the Euclidean length from the origin point to the head point of the vector, while the distance between two vectors is the Euclidean distance between the head points of the two vectors.

Figure 40. Example of the Euclidean notions of vector length and distance between vectors in R2.

It is important that we summarize the main properties of the dot product in R2, since they are the main guide to construct the generalizations of the dot product to other vector spaces.

Proposition 6.2. The dot product on R2 satisfies:
(a) (Positive definiteness) For all x ∈ R2 holds x · x ≥ 0, and x · x = 0 iff x = 0;
(b) (Symmetry) For all x, y ∈ R2 holds x · y = y · x;
(c) (Linearity on the second argument) For all x, y, z ∈ R2 and for all a, b ∈ R holds x · (a y + b z) = a (x · y) + b (x · z).

Proof of Proposition 6.2: These properties are simple to obtain from the definition of the dot product. Part (a): This follows from x · x = (x1)^2 + (x2)^2 ≥ 0; furthermore, in the case x · x = 0 we obtain that (x1)^2 + (x2)^2 = 0 ⇔ x1 = x2 = 0. Part (b): It is simple to see that x · y = x1 y1 + x2 y2 = y1 x1 + y2 x2 = y · x. Part (c): It is also simple to see that
x · (a y + b z) = x1 (a y1 + b z1) + x2 (a y2 + b z2) = a (x1 y1 + x2 y2) + b (x1 z1 + x2 z2) = a (x · y) + b (x · z).
This establishes the Proposition.

These simple properties are crucial to establish the following result, known as the Cauchy-Schwarz inequality for the dot product in R2. This inequality allows us to express the dot product of two vectors in R2 in terms of the angle between the vectors.

Theorem 6.3. (Cauchy-Schwarz) The properties (a)-(c) in Proposition 6.2 imply that for all x, y ∈ R2 holds |x · y| ≤ ||x|| ||y||.

Proof of Theorem 6.3: From the positive definiteness property we know that the following inequality holds for all x, y ∈ R2 and for all a ∈ R,
0 ≤ ||a x − y||^2 = (a x − y) · (a x − y).
The symmetry and the linearity on the second argument imply
0 ≤ (a x − y) · (a x − y) = a^2 ||x||^2 − 2a (x · y) + ||y||^2.    (6.1)
Since the inequality above holds for all a ∈ R, let us choose a particular value of a, the solution of the equation
a ||x||^2 − (x · y) = 0  ⇒  a = (x · y) / ||x||^2.
Introduce this particular value of a into Eq. (6.1),
0 ≤ − (x · y)^2 / ||x||^2 + ||y||^2  ⇒  |x · y|^2 ≤ ||x||^2 ||y||^2.
This establishes the Theorem.

The Cauchy-Schwarz inequality implies that we can express the dot product of two vectors in an alternative and more geometrical way, in terms of an angle related with the two vectors. The Cauchy-Schwarz inequality says
−1 ≤ (x · y) / (||x|| ||y||) ≤ 1,
which suggests that the number (x · y)/(||x|| ||y||) can be expressed as a sine or a cosine of an appropriate angle.

Proposition 6.4. The angle between two vectors x, y ∈ R2 is the solution θ ∈ [0, π] of the equation
cos(θ) = (x · y) / (||x|| ||y||).

Proof of Proposition 6.4: It is not difficult to see that given any vectors x, y ∈ R2, the vectors x/||x|| and y/||y|| have unit norm. Indeed,
|| x/||x|| ||^2 = (x1)^2/||x||^2 + (x2)^2/||x||^2 = [(x1)^2 + (x2)^2] / ||x||^2 = 1.
The same holds for the vector y/||y||. The expression
(x · y) / (||x|| ||y||) = (x/||x||) · (y/||y||)
shows that the number (x · y)/(||x|| ||y||) is the inner product of two vectors in the unit circle, as shown in Fig. 41.

Figure 41.
The dot product of two vectors x, y ∈ R2 can be expressed in terms of the angle θ = θ1 − θ2 between the vectors. Therefore, we know that x cos(θ1 ) = , sin(θ1 ) x y cos(θ2 ) = . sin(θ2 ) y Their dot product is given by x x · y = cos(θ1 ), sin(θ1 ) y cos(θ2 ) = cos(θ1 ) cos(θ2 ) + sin(θ1 ) sin(θ2 ). sin(θ2 ) Using the formula cos(θ1 ) cos(θ2 ) + sin(θ1 ) sin(θ2 ) = cos(θ1 − θ2 ), and denoting the angle between the vectors by θ = θ1 − θ2 , we conclude that x·y = cos(θ). xy This establishes the Proposition. 168 G. NAGY – LINEAR ALGEBRA december 8, 2009 Recall the notion of perpendicular vectors. Definition 6.5. The vectors x, y ∈ R2 are orthogonal, denoted as x ⊥ y, iff the angle θ ∈ [0, π ] between the vectors is θ = π/2. The notion of orthogonal vectors in Def. 6.5 can be expressed in terms of the dot product, and it is equivalent to Pythagoras Theorem on right triangles. Proposition 6.6. Let x, y ∈ R2 be non-zero vectors, then the following statement holds, x⊥y ⇔ x·y =0 ⇔ x−y 2 =x 2 + y 2. Proof of Proposition 6.6: The non-zero vectors x and y ∈ R2 are orthogonal iff θ = π/2, which is equivalent to x·y = 0 ⇔ x · y = 0. xy The last part of the Proposition comes from the following calculation, x−y 2 = (x1 − y1 )2 + (x2 − y2 )2 = (x1 )2 + (x2 )2 + (y1 )2 + (y2 )2 − 2(x1 y1 + x2 y2 ) =x 2 +y 2 − 2 x · y. Hence, x ⊥ y iff x · y = 0 iff Pythagoras Theorem holds for the triangle with sides given by x, y and hypotenuse x − y. 1 3 and y = , the angle between 2 1 them, and then find a non-zero vector z orthogonal to x. Example 6.1.1: Find the length of the vectors x = Solution: We first find the length, that is, the norms of x and y, x 2 = x · x = xT x = 1 2, y 2 = y · y = yT y = 3 1 1 =1+4 2 3 =9+1 1 ⇒ ⇒ x= y= √ √ 5, 10. We now find the angle between x and y, 3 12 1 x·y 5 1 cos(θ) = = √√ = √ =√ xy 5 10 52 2 We now find z such that z ⊥ x, that is, z1 = −2z2 1 0 = z1 z2 = z1 + 2z2 ⇒ 2 z2 free variable ⇒ ⇒ θ= π . 4 z= −2 z2 . 1 6.1.1. Dot product in Fn . The notion of dot product reviewed above can be generalized in a straightforward way from R2 to Fn , n 1, where F ∈ {R, C}. Definition 6.7. The dot product on the vector space Fn , n 1, is the function · : Fn × Fn → F given by x · y = x∗ y, where x, y denote components in the standard basis of Fn . The dot product norm of a vector x ∈ Fn is the value of the function : Fn → R, √ x = x · x. G. NAGY – LINEAR ALGEBRA December 8, 2009 169 The norm distance between x, y ∈ Fn is the value of the function d : Fn × Fn → R, d(x, y) = x − y . n The vectors x, y ∈ F are orthogonal, denoted as x ⊥ y, iff holds x · y = 0. Notice that we defined two vectors to be orthogonal by the condition that their dot product vanishes. This is the appropriate generalization to Fn of the ideas we saw in R2 . The concept of angle is more difficult to study. In the case that F = C is not clear what the angle between vectors mean. In the case F = R and n > 3 we have to define angle by the number (x · y)/( x y ). This will be done after we prove the Cauchy-Schwarz inequality, which then is used to show that the number (x · y)/( x y ) ∈ [−1, 1]. The formulas above for the dot product, norm and distance can be expressed in terms of the vector components in the standard basis as follows, x · y = x1 y1 + · · · + xn yn , x= |x1 |2 + · · · + |xn |2 , |x1 − y1 |2 + · · · + |xn − yn |2 , x1 y1 . . where we used the standard notation x = . , y = . , and |xi |2 = xi xi , for i = 1, · · · n. . . 
d(x, y) = xn yn In the particular case that F = R all the vector components are real numbers, so xi = xi . Example 6.1.2: Find whether x is orthogonal to y and/or z, where 1 −5 −4 2 4 −3 x = , y = , z = . 3 −3 2 4 2 1 Solution: We need to compute the dot products x · y and x · z. We obtain −5 4 T x y = 1 2 3 4 = −5 + 8 − 9 + 8 ⇒ x · y = 2 ⇒ x ⊥ y, −3 2 −4 −3 T = − 4 − 6 + 6 + 4 ⇒ x · z = 0 ⇒ x ⊥ z. x z= 1 2 3 4 2 1 2 + 3i 2i Example 6.1.3: Find x · y, where x = i and y = 1 . 1−i 1 + 3i Solution: The first product x · y is given by 2i x∗ y = 2 − 3i −i 1 + i 1 = (2 − 3i)(2i) − i + (1 + i)(1 + 3i), 1 + 3i so x · y = 4i + 6 − i + 1 − 3 + i + 3i = 4 + 7i. 170 G. NAGY – LINEAR ALGEBRA december 8, 2009 The dot product satisfies the following properties. Proposition 6.8. The dot product on Fn , n 1, satisfies: (a) (Positive definiteness) For all x ∈ Fn holds x · x 0, and x · x = 0 iff x = 0; (b1) (Symmetry F = R) For all x, y ∈ Rn holds x · y = y · x; (b2) (Conjugate symmetry, for F = C) For all x, y ∈ Cn holds x · y = y · x; (c) (Linearity on the second argument) For all x, y, z ∈ Fn and for all a, b ∈ F holds x · (ay + bz) = a (x · y) + b (x · z). Proof of Proposition 6.8: use the expression of the dot product in terms of the vector components. The property in (a) follows from x · x = x∗ x = |x1 |2 + · · · + |xn |2 0; furthermore, in the case that x · x = 0 we obtain that |x1 |2 + · · · + |xn |2 = 0 ⇔ x1 = · · · = xn = 0 ⇔ x = 0. The property in (b1) can be established as follows, x · y = x1 y1 + · · · + xn yn = y1 x1 + · · · + yn xn = y · x. The property in (b2) can be established as follows, x · y = x1 y1 + · · · + xn yn = (y 1 x1 + · · · + y n xn ) = y · x. The property in (c) is shown in a similar way, x · (ay + bz) = x∗ (a y + b z) = a x∗ y + b x∗ z = a (x · y) + b (x · z). This establishes the Proposition. The positive definiteness property (a) above shows that the dot product norm √ indeed a is real-valued and not a complex-valued function, since x · x 0 implies that x = x · x ∈ R. In the case of F = R, the symmetry property and the linearity in the second argument property imply that the dot product on Rn is also linear in the first argument. This is a reason to call the dot product on Rn a bilinear form. Finally, notice that in the case F = C, the conjugate symmetry property and the linearity in the second argument imply that the dot product on Cn is conjugate linear on the first argument. The proof of the latter statement is the following, (ay + bz) · x = x · (ay + bz) = a (x · y) + b (x · z) = a (x · y) + b (x · z), that is, for all x, y, z ∈ Cn and all a, b ∈ C holds (ay + bz) · x = a (y · x) + b (z · x). Hence we say that the dot product on Cn is conjugate linear in the first argument. Example 6.1.4: Compute the dot product of x = 2 + 3i 3i with y = . 6i − 9 2 Solution: This is a straightforward computation x · y = x∗ y = 2 − 3i −6i − 9 Notice that x = (2 + 3i)ˆ, with ˆ = x x 3i = 6i + 9 − 12i − 18 = −9 − 6i. 2 1 , so we could use the conjugate linearity in the first 3i argument to compute x · y = (2 + 3i)ˆ · y = (2 − 3i) (ˆ · y) = (2 − 3i) 1 x x −3i 3i = (2 − 3i)(3i − 6i), 2 G. NAGY – LINEAR ALGEBRA December 8, 2009 171 and we obtain the same result, x · y = −9 − 6i. Finally, notice that y · x = −9 + 6i. An important result is that the dot product in Fn satisfies the Cauchy-Schwarz inequality. Theorem 6.9. (Cauchy-Schwarz) The properties (a)-(c) in Proposition 6.8 imply that for all x, y ∈ Fn holds |x · y| x y. 
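The following short check (an illustrative sketch, not part of the original notes, assuming Python with NumPy) evaluates the dot product of Definition 6.7 for the complex vectors of Example 6.1.3 above and verifies the Cauchy-Schwarz inequality of Theorem 6.9 for them; np.vdot conjugates its first argument, which matches the convention x · y = x* y used in these notes.

import numpy as np

# Vectors in C^3 taken from Example 6.1.3.
x = np.array([2 + 3j, 1j, 1 - 1j])
y = np.array([2j, 1 + 0j, 1 + 3j])

dot = np.vdot(x, y)                    # x . y = x* y, conjugate in the first slot
norm_x = np.sqrt(np.vdot(x, x).real)   # ||x|| = sqrt(x . x)
norm_y = np.sqrt(np.vdot(y, y).real)

print(dot)                             # (4+7j), as computed in Example 6.1.3
print(abs(dot) <= norm_x * norm_y)     # True: |x . y| <= ||x|| ||y||

Note that np.vdot(y, x) returns the conjugate value 4 - 7j, in agreement with the conjugate symmetry property (b2) of Proposition 6.8.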
We remark that the proof of the Cauchy-Schwarz inequality only uses the three properties of the dot product presented in Proposition 6.8. Any other function f : Fn × Fn → F having these three properties also satisfies the Cauchy-Schwarz inequality. Proof of Theorem 6.9: From the positive definiteness property we know that the following inequality holds for all x, y ∈ Fn and for all a ∈ F, 0 ax − y 2 = (ax − y) · (ax − y). The symmetry and the linearity on the second argument imply 0 (ax − y) · (ax − y) = a a x 2 − a (x · y) − a (y · x) + y 2 . (6.2) Since the inequality above holds for all a ∈ F, let us choose a particular value of a, the solution of the equation x·y a a x 2 − a (x · y) = 0 ⇒ a = . x2 Introduce this particular value of a into Eq. (6.2), x·y 0− (x · y) + y 2 ⇒ |x · y|2 x 2 y 2. x2 This establishes the Theorem. In the case F = R, the Cauchy-Schwarz inequality in Rn implies that the number (x · y)/( x y ) ∈ [−1, 1], which is a necessary and sufficient condition for the following definition of angle between two vectors in Rn . Definition 6.10. The angle between two vectors x, y ∈ Rn is θ ∈ [0, π ] solution of x·y cos(θ) = . xy The dot product norm function in Def. 6.7 satisfies the following properties. Proposition 6.11. The dot product norm function on Fn , n 1, satisfies: (a) (Positive definiteness) For all x ∈ Fn holds x 0, and x = 0 iff x = 0; (b) (Scaling) For all x ∈ Fn and all a ∈ F holds ax = |a| x ; (c) (Triangle inequality) For all x, y ∈ Fn holds x + y x + y. Proof of Proposition 6.11: Properties (a) and (b) are straightforward to show from the definition of dot product, and their proof is left as an exercise. We show here how to obtain the triangle inequality, property (c). The proof uses the Cauchy-Schwarz inequality presented in Theorem 6.9. Given any vectors x, y ∈ Fn holds x+y 2 = (x + y) · (x + y) =x 2 + 2 |x · y| + y x 2 + (x · y) + (y · x) + y x We conclude that x + y 2 2 +2 x x+y 2 2 2 y+y 2 = x+y 2 , . This establishes the Proposition. 172 G. NAGY – LINEAR ALGEBRA december 8, 2009 A vector v ∈ Fn is called a normal or unit vector iff v = 1. Examples of unit vectors are the vectors in the standard basis. Unit vectors parallel to a given vector are very easy to find. v is a unit vector parallel to v. Proposition 6.12. If v ∈ Fn is non-zero, then v v Proof of Proposition 6.12: Notice that u = is parallel to v, and it is straightforward v to check that u is a unit vector, since 1 v = v = 1. u= v v This establishes the Proposition. 1 Example 6.1.5: Find a unit vector parallel to x = 2. 3 Solution: First compute the norm of x, √ √ x = 1 + 4 + 9 = 14, 1 1 2 is a unit vector parallel to v. therefore u = √ 14 3 G. NAGY – LINEAR ALGEBRA December 8, 2009 173 Exercises. 6.1.1.- Consider the vector space R4 with standard basis S and dot product. Find the norm of u and v, their distance and the angle between them, where 23 23 2 1 6 17 6−17 u = 6 7, v = 6 7. 4− 4 5 4 15 −2 −1 6.1.2.- Use the dot product on R2 to find two unit vectors orthogonal to »– 3 x= . 2 6.1.3.- Use the dot product on C2 to find a unit vector parallel to – » 1 + 2i . x= 2−i 6.1.4.- Consider the vector space R2 with the dot product. (a) Give an example of a linearly independent set {x, y} with x ⊥ y. (b) Give an example of a linearly dependent set {x, y} with x ⊥ y. 6.1.5.- Consider the vector space Fn with the dot product, and let Re denote the real part of a complex number. Show that for all x, y ∈ Fn holds x−y 2 =x 2 +y 2 − 2Re(x · y). 6.1.6.- Use the result in Probl. 
5 above to prove the following generalizations of the Pythagoras Theorem to Fn with the dot product. (a) For x, y ∈ Rn holds x⊥y ⇔ x+y 2 =x 2 + y 2. =x 2 + y 2. (b) For x, y ∈ Cn holds x⊥y ⇒ x+y 2 6.1.7.- Prove that the parallelogram law holds for the dot product norm in Fn , that is, show that for all x, y ∈ Fn holds x+y 2 + x−y 2 =2 x 2 + 2 y 2. This law states that the sum of the squares of the lengths of the four sides of a parallelogram formed by x and y equals the sum of the square of the lengths of the two diagonals. 174 G. NAGY – LINEAR ALGEBRA december 8, 2009 6.2. Inner product An inner product on a vector space is a generalization of the dot product on Rn or Cn introduced in Sect. 6.1. The inner product is not defined with a particular formula, or requiring a particular basis in the vector space. Instead, the inner product is defined by a list of properties that must satisfy. We did something similar when we introduced the concept of a vector space. In that case we defined a vector space as a set of any kind of elements were linear combinations are possible, instead of defining the set by explicitly giving its elements. Definition 6.13. Let V be a vector space over the scalar field F ∈ {R, C}. A function , : V × V → F is called an inner product iff it satisfies the following properties (a) (Positive definiteness) For all x ∈ V holds x, x 0, and x, x = 0 iff x = 0; (b1) (Symmetry, for F = R) For all x, y ∈ V holds x, y = y, x ; (b2) (Conjugate symmetry, for F = C) For all x, y ∈ V holds x, y = y, x ; (c) (Linearity on the second argument) For all x, y, z ∈ V and all a, b ∈ F holds x, (ay + bz) = a x, y + b x, z . An inner product space is a pair V , , of a vector space with an inner product. Different inner products can be defined on a given vector space. The dot product is an inner product in Fn . A different inner product can be defined in Fn , as can be seen in the following example. Example 6.2.1: We show that Rn can have different inner products. (a) The dot product on Rn is an inner product, since the expression y1 . x, y = xT ys = x1 · · · xn s . = x1 y1 + · · · + xn yn , s . yn (6.3) s satisfies all the properties in Def. 6.13, with S the standard ordered basis in Rn . (b) A different inner product in Rn can be introduced by a similar formula to the one in Eq. (6.3) just by choosing a different ordered basis. If U is any ordered basis of Rn , then x, y = xT yu . u defines an inner product on Rn . The inner product defined using the basis U is not equal to the inner product defined using the standard basis S . Let P = Ius be the change of basis matrix, then we know that xu = P−1 xs . The inner product above can be expressed in terms of the S basis as follows, x, y = xT M ys , s M = P−1 T P−1 , and in general, M = In . Therefore, the inner product above is not equal to the dot product. Also, see Example 6.2.2. Example 6.2.2: Let S be the standard ordered basis in R2 , and introduce the ordered basis U as the following rescaling of S , 1 1 U = u1 = e1 , u2 = e2 . 2 3 Express the inner product x, y = xT yu in terms of xs and ys . Is this inner product the u same as the dot product? G. NAGY – LINEAR ALGEBRA December 8, 2009 175 Solution: The definition of the inner product says that x, y = xT yu . Introducing the u x1 ˜ y1 ˜ notation xu = and yu = , we obtain the usual expression x, y = x1 y1 + x2 y2 . ˜˜ ˜˜ x2 u ˜ y2 u ˜ The components xu and xs are related by the change of basis formula xu = P−1 xs , P = Ius = 1/2 0 0 1/3 20 03 P−1 = ⇒ us = P−1 T . 
su Therefore, x, y = xT yu = xT P−1 u s T P−1 ys = x1 , x2 x1 x2 where we used the standard notation xs = 2 x, y = xT P−1 ys s y1 y2 and ys = s ⇔ 4 0 s 0 9 y1 y2 su s . We conclude that s x, y = 4x1 y1 + 9x2 y2 . The inner product x, y = xT yu is different from the dot product x · y = xT ys . s u Example 6.2.3: Consider the vector space Fm,n of all m × n matrices. Show that an inner product on that space is the function , F : Fm,n × Fm,n → F A, B F = tr (A∗ B). The inner product is called Frobenius inner product. Solution: We show that the Frobenius function , F above satisfies the three properties in Def. 6.13. We use the component notation A = [Aij ], B = [Bkl ], with i, k = 1, · · · , m and j, l = 1, · · · , n, so m ∗ (A B)jl = m T A ji Bil = i=1 n Aij Bil ⇒ A, B F n = i=1 Aij Bij . j =1 i=1 The first property is satisfied, since n A, A F m = tr (A∗ A) = Aij 2 0, j =1 i=1 and A, A F = 0 iff Aij = 0 for every indices i, j , which is equivalently to A = 0. The second property is satisfied, since n A, B F n n n Aij Bij = = B ij Aij = B, A F . j =1 i=1 j =1 i=1 The same proof can be expressed in index-free notation using the properties of the trace, T T tr A∗ B ) = tr A B = tr (A B)T = tr BT A = tr B∗ A , that is, A, B F = B, A F . The third property comes from the distributive property of the matrix product, that is, A, (aB + bC) F = tr A∗ (aB + bC) = a tr A∗ B + b tr A∗ C = a A, B F + b A, C F . 176 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 6.2.4: Compute the Frobenius inner product A, B A= 12 24 3 , 1 B= 32 21 F where 1 ∈ R2,3 . 2 Solution: Since the matrices have real coefficients, the Frobenius inner product has the form A, B F = tr AT B . So, we need to compute the diagonal elements in the product 12 7∗∗ 321 AT B = 2 4 A, B F = 7 + 8 + 5 = 20. = ∗ 8 ∗ ⇒ 212 31 ∗∗5 Example 6.2.5: Consider the vector space Pn ([−1, 1]) of polynomials with real coefficients having degree less or equal n ≥ 1 and being defined on the interval [−1, 1]. Show that an inner product in this space is the following: 1 p, q = p(x)q(x) dx. p, q ∈ P n . −1 Solution: We need to verify the three properties in the Def. 6.13. The positive definiteness property is satisfied, since 1 p, p = p(x) 2 dx 0, −1 and in the case p, p = 0 this implies that the integrand must vanish, that is, [p(x)]2 = 0, which is equivalent to p = 0. The symmetry property is satisfied, since p(x)q(x) = q(x)p(x), which implies that p, q = q, p . The linearity property on the second argument is also satisfied, since 1 p, (aq + br) = p(x) a q(x) + b r(x) dx −1 1 =a 1 p(x)q(x) dx + b −1 p(x)r(x) dx −1 = a p, q + b p, r . Example 6.2.6: Consider the vector space C k ([a, b], R), with k 0 and a < b, of k -times continuously differentiable real-valued functions f : [a, b] → R. An inner product in this vector space is given by b f (x)g(x) dx. f,g = a Any positive function µ ∈ C 0 ([a, b], R) determines an inner product in C k ([a, b], R) as follows b f,g µ = µ(x) f (x)g(x) dx. a The function µ is called a weigh function. An inner product in the vector space C k ([a, b], C) of k -times continuously differentiable complex-valued functions f : [a, b] ⊂ R → C is the following, b f,g = f (x)g(x) dx. a G. NAGY – LINEAR ALGEBRA December 8, 2009 177 An inner product satisfies the following inequality. Theorem 6.14. (Cauchy-Schwarz) If V , , 2 | x, y | is an inner product space over F, then x, x y, y ∀ x, y ∈ V. Furthermore, equality holds iff y = ax, with a = x, y / x, x . 
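Before turning to the proof of Theorem 6.14, here is a small numerical check (a sketch added for illustration, not part of the original notes, assuming Python with NumPy) of the Frobenius inner product of Examples 6.2.3 and 6.2.4, together with the Cauchy-Schwarz inequality that, by Theorem 6.14, it must satisfy.

import numpy as np

# Matrices from Example 6.2.4, in R^{2,3}.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 1.0]])
B = np.array([[3.0, 2.0, 1.0],
              [2.0, 1.0, 2.0]])

def frob(X, Y):
    # Frobenius inner product <X, Y>_F = tr(X* Y) = sum over i, j of conj(X_ij) Y_ij.
    return np.trace(X.conj().T @ Y)

print(frob(A, B))        # 20.0, as obtained in Example 6.2.4

lhs = abs(frob(A, B))
rhs = np.sqrt(frob(A, A).real * frob(B, B).real)
print(lhs <= rhs)        # True: |<A, B>_F| <= ||A||_F ||B||_F

Computing the trace of the full product X* Y does more arithmetic than strictly needed, since only the diagonal entries of that product matter; the element-wise sum (X.conj() * Y).sum() returns the same value with less work.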
Proof of Theorem 6.14: From the positive definiteness property we know that the following inequality holds for all x, y ∈ V and for all scalar a ∈ F, holds 0 (ax − y), (ax − y) . The symmetry and the linearity on the second argument imply 0 (ax − y), (ax − y) = a a x, x − a x, y − a y, x + y, y . (6.4) Since the inequality above holds for all a ∈ F, let us choose a particular value of a, the solution of the equation x, y a a x, x − a x, y = 0 ⇒ a = . x, x Introduce this particular value of a into Eq. (6.4), x, y x, y + y, y ⇒ | x, y |2 x, x y, y . x, x Finally, notice that equality holds iff ax = y, and in this case, computing the inner product with x we obtain a x, x = x, y . This establishes the Theorem. 0 − 6.2.1. Inner product norm. The inner product on a vector space determines a particular notion of length, or norm, of a vector, and we call it the inner product norm. After we introduce this norm we show its main properties. In Chapter 7 later on we use these properties to define a more general notion of norm as any function on the vector space satisfying these properties. The inner product norm is just a particular case of this broader notion of length. A normed space is a vector space with any norm. Definition 6.15. Given an inner product space V , , function : V → R, x= x, x . , the inner product norm is the The Cauchy-Schwarz inequality is often expressed using the inner product norm as follows: For all x, y ∈ V holds | x, y | x y. A vector x ∈ V is a normal or unit vector iff x = 1. v Proposition 6.16. If v = 0 belongs to V , , , then is a unit vector parallel to v. v The proof is the same of Proposition 6.12. Example 6.2.7: Consider the inner product space Fm,n , , F , where Fm,n is the vector space of all m × n matrices and , F is the Frobenius inner product defined in Example 6.2.3. The associated inner product norm is called the Frobenius norm and is given by A F = A, A F = tr A∗ A . If A = [Aij ], with i = 1, · · · , m and j = 1, · · · , n, then m A F n |Aij |2 = i=1 j =1 1 /2 . 178 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 6.2.8: Find an explicit expression for the Frobenius norm of any element A ∈ F2,2 . A11 A21 Solution: The Frobenius norm of an arbitrary matrix A = A 2 F = tr A11 A12 A21 A22 A11 A21 A12 A22 A12 ∈ F2,2 is given by A22 . Since we are only interested in the diagonal elements of the matrix product in the equation above, we obtain |A11 |2 + |A21 |2 ∗ A 2 = tr F ∗ |A12 |2 + |A22 |2 which gives the formula A 2 F = |A11 |2 + |A12 |2 + |A21 |2 + |A22 |2 . 2 This is the explicit expression of the sum A F 2 |Aij |2 = 1 /2 . i=1 j =1 The inner product norm function has the following properties. Proposition 6.17. The inner product norm introduced in Def. 6.15 satisfies: (a) (Positive definiteness) For all x ∈ V holds x 0, and x = 0 iff x = 0; (b) (Scaling) For all x ∈ V and all a ∈ F holds ax = |a| x ; (c) (Triangle inequality) For all x, y ∈ V holds x + y x + y. Proof of Proposition 6.17: Properties (a) and (b) are straightforward to show from the definition of inner product, and their proof is left as an exercise. We show here how to obtain the triangle inequality, property (c). Given any vectors x, y ∈ V holds x+y 2 = (x + y), (x + y) =x 2 + x, y + y, x + y x 2 + 2 | x, y | + y x 2 +2 x y+y 2 2 2 = x+y 2 , where the last inequality comes from the Cauchy-Schwarz inequality. We then conclude that 2 x+y 2 x + y . This establishes the Proposition. 6.2.2. Norm distance. 
The norm on an inner product space determines a particular notion of distance between vectors. After we introduce this norm we show its main properties. Definition 6.18. Given a vector space V with a norm function : V → R, the norm distance between two vectors is the value of the function d : V × V → R given by d(x, y) = x − y . Proposition 6.19. The norm distance introduced in Def. 6.18 satisfies: (a) (Positive definiteness) For all x, y ∈ V holds d(x, y) 0, and d(x, y) = 0 iff x = y; (b) (Symmetry) For all x, y ∈ V holds d(x, y) = d(y, x); (c) (Triangle inequality) For all x, y, z ∈ V holds d(x, y) d(x, z) + d(z, y). G. NAGY – LINEAR ALGEBRA December 8, 2009 179 Proof of Proposition 6.17: Properties (a) and (b) are straightforward from properties (a) and (b), and their proof are left as an exercise. We show how the triangle inequality for the distance comes from the triangle inequality for the norm. Indeed d(x, y) = x − y = (x − z) − (y − z) x z + y − z = d(x, z) + d(z, y), where we used the symmetry of the distance function on the last term above. This establishes the Proposition. The presence of an inner product, and hence a norm and a distance, on a vector space permits to introduce the notion of convergence of an infinite sequence of vectors. We say that the sequence {xn }∞ ⊂ V converges to x ∈ V iff n=0 lim d(xn , x) = 0. n→∞ Some of the most important concepts related to convergence are closedness of a subspace, completeness of the vector space, and the continuity of linear operators and linear transformations. In the case of finite dimensional vector spaces the situation is straightforward. All subspaces are closed, all inner product spaces are complete and all linear operators and linear transformations are continuous. However, in the case of infinite dimensional vector spaces, things are not so simple. 180 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 6.2.1.- Prove which of the following functions , : R3 × R3 → R defines an inner product on R3 : (a) x, y = x1 y1 + x3 y3 ; (b) x, y = x1 y1 − x2 y2 + x3 y3 ; (c) x, y = 2x1 y1 + x2 y2 + 4x3 y3 ; 2 2 2 (d) x, y = x2 y1 + x2 y2 + x2 y3 . 3 2 1 We used the standard notation 23 23 x1 y1 x = 4x2 5 , y = 4y2 5 . x3 y3 6.2.2.- Prove that an inner product function , : V × V → F satisfies the following properties: (a) If x, y = 0 for all x ∈ V , then y = 0. (b) ax, y = a x, y for all x, y ∈ V . 2,2 6.2.3.- Given a matrix M ∈ R introduce the function , M : R2 × R2 → R, y, x M = y T M x. For each of the matrices M below prove whether , M defines an inner product or not, where: – » 41 ; (a) M = 19 – » 4 −3 ; (b) M = 3 9 – » 41 . (c) M = 09 6.2.4.- Fix any A ∈ Rn,n with N (A) = {0} and introduce M = AT A. Prove that , M : Rn × Rn → R, given by y , x = y T M x. is an inner product in Rn . 6.2.5.- Find k ∈ R such that the matrices A, B ∈ R2,2 satisfy A, B F = 0, where » – » – 12 12 A= , B= . 34 k1 6.2.6.- Evaluate the Frobenius norm for the matrices 2 3 » – 010 1 −2 A= , B = 40 0 1 5 . −1 2 100 6.2.7.- Prove that A ∈ Fm,n . A F = A∗ F for all 6.2.8.- Consider the vector space P2 ([0, 1]) with inner product Z1 p(x)q(x) dx. p, q = 0 Find a unit vector parallel to p(x) = 3 − 5x2 . G. NAGY – LINEAR ALGEBRA December 8, 2009 181 6.3. Orthogonal vectors The presence of an inner product allows us to define the concept of orthogonality on any vector space. Definition 6.20. Two vectors x, y in the inner product space V , , denoted as x ⊥ y, iff x, y = 0. are orthogonal, The Pythagoras Theorem holds on any inner product space. Proposition 6.21. 
Let V , , following statements hold: (a) If F = R, then x ⊥ y ⇔ (b) If F = C, then x ⊥ y ⇒ be an inner product space over the field F. Then, the x−y x−y 2 2 2 =x =x 2 + y 2; + y 2. Proof of Proposition 6.21: Both statements in Proposition 6.21 derive from the following equation: x−y 2 = (x − y), (x − y) = x, x + y, y − x, y − y, x =x 2 +y 2 − 2 Re x, y . (6.5) In the case F = R holds Re x, y = x, y , so Part (a) follows. If F = C, then x, y implies x − y 2 = x 2 + y 2 , so Part (b) follows. (Notice that the converse statement is not true in the case F = C, since Eq. (6.5) together with the hypothesis x − y 2 = x 2 + y 2 do not fix Im x, y .) This establishes the Proposition. In the case of real vector space the Cauchy-Schwarz inequality stated in Theorem 6.14 allows us to define the angle between vectors. Definition 6.22. The angle between two vectors x, y in a real vector space V with inner product , is the number θ ∈ [0, π ] solution of cos(θ) = x, y . xy Example 6.3.1: Consider the inner product space R2,2 , , matrices are orthogonal, A= 1 −1 3 , 4 B= F and show that the following −5 2 . 51 Solution: Since we need to compute the Frobenius inner product A, B the matrix 1 −1 −5 2 −10 1 AT B = = . 3 4 51 5 10 Therefore A, B F F , we first compute = tr AT B = 0, so A ⊥ B. Example 6.3.2: Consider the vector space V = C ∞ [− , ], R with the inner product f,g = f (x)g(x) dx. − Consider the functions un (x) = cos nπx and vm (x) = sin (a) Show that un ⊥ vm for all n, m. (b) Show that un ⊥ um for all n = m. mπx , where n, m are integers. 182 G. NAGY – LINEAR ALGEBRA december 8, 2009 (c) Show that vn ⊥ vm for all n = m. Solution: Recall the identities 1 2 1 cos(θ) cos(φ) = 2 1 sin(θ) sin(φ) = 2 Part (a): Using identity in Eq. (6.6) is sin(θ) cos(φ) = un , vm = cos nπx sin(θ − φ) + sin(θ + φ) , (6.6) cos(θ − φ) + cos(θ + φ) , (6.7) cos(θ − φ) − cos(θ + φ) . (6.8) simple to show that mπx sin dx − = 1 2 sin (n − m)πx + sin (n + m)πx . (6.9) − First, assume that both n − m and n + m are non-zero, un , vm = − 1 (n − m)πx cos 2 (n − m)π − + (n + m)π cos (n + m)πx − . (6.10) Since cos((n ± m)π ) = cos(−(n ± m)π ), we conclude that both terms above vanish. Second, in the case that n − m = 0 the first term in Eq. (6.9) vanishes identically and we need to compute the term with (n + m), which also vanishes by the second term in Eq. (6.10). Analogously, in the case of (n + m) = 0 the second term in Eq. (6.9) vanishes identically and we need to compute the term with (n − m) which also vanishes by the first term in Eq. (6.10). Therefore, un , vm = 0 for all n, m integers, and so un ⊥ vm in this case. Part (b): Using identity in Eq. (6.7) is simple to show that un , um = cos nπx cos mπx dx − = 1 2 cos (n − m)πx + cos (n + m)πx . (6.11) − We know that n − m is non-zero. Now, assume that n + m is non-zero, then un , um = = 1 (n − m)πx sin 2 (n − m)π (n − m)π sin((n − m)π ) + − + (n + m)π (n + m)π sin (n + m)πx − sin((n + m)π ). (6.12) Since sin((n ± m)π ) = 0 for (n ± m) integer, we conclude that both terms above vanish. In the case of (n + m) = 0 the second term in Eq. (6.11) vanishes identically and we need to compute the term with (n − m) which also vanishes by the first term in Eq. (6.12). Therefore, un , um = 0 for all n = m integers, and so un ⊥ um in this case. Part (c): Using identity in Eq. (6.8) is simple to show that vn , vm = sin nπx sin mπx dx − = 1 2 cos − (n − m)πx − cos (n + m)πx . (6.13) G. NAGY – LINEAR ALGEBRA December 8, 2009 183 Since the only difference between Eq. 
(6.13) and (6.11) is the sign of the second term, repeating the argument done in case (b) we conclude that vn , vm = 0 for all n = m integers, and so vn ⊥ vm in this case. 6.3.1. Orthonormal basis. An important property of a basis is that every vector in a vector space can be decomposed in a unique way in terms of the basis vectors. This decomposition is particularly simple to find in an inner product space when the basis is an orthonormal basis. In order to introduce such basis we start defining an orthonormal set. Definition 6.23. The set U = {u1 , · · · , up }, p 1, in an inner product space V , , called orthonormal iff for all i, j = 1, · · · , p holds 0 if i = j, ui , uj = 1 if i = j. is The set U is called orthogonal iff holds that ui , uj = 0 if i = j and ui , ui = 0. Example 6.3.3: Consider the vector space V = C ∞ [− , ], R with the inner product f,g = f (x)g(x) dx. − Show that the set 1 1 nπx 1 mπx U = u0 = √ , un (x) = √ cos , vm (x) = √ sin 2 is an orthonormal set. ∞ n=1 Solution: We have shown in Example 6.3.2 that the set U is an orthogonal set. We only need to compute the norm of the vectors u0 , un and vn , for n = 1, 2, · · · . The norm of the first vector is simple to compute, u0 , u0 = − 1 dx = 1. 2 The norm of the cosine functions is computed as follows, un , un = 1 cos2 nπx dx − = 1 2 1 + cos 2nπx dx − =1+ 2nπx ⇒ 2nπ − A similar calculation for the sine functions gives the result vn , vn = sin 1 sin2 nπx un , un = 1. dx − = 1 2 =1− 1 − cos 2nπx dx − 2nπ Therefore, U is an orthonormal set. sin 2nπx A straightforward result is the following: − ⇒ vn , vn = 1. 184 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proposition 6.24. An orthogonal set in an inner product space is linearly independent. Proof of Proposition 6.24: Let U = {u1 , · · · , up }, p c1 , · · · , cp ∈ F be scalars such that 1, be an orthogonal set. Let c1 u1 + · · · + cp up = 0. Then, for any ui ∈ U holds c1 u1 , ui + · · · + cp un ui = 0 ⇒ ci ui , ui = 0 ⇒ ci = 0. So, all scalars ci = 0 for i = 1, · · · , p. Therefore, U is a linearly independent set. Definition 6.25. A basis U of a finite dimensional inner product space is called an orthonormal basis (or orthogonal basis) iff the basis U is an orthonormal (or orthogonal) set. Example 6.3.4: Consider the inner product space R2 , · . Determine whether the following bases are orthonormal, orthogonal or neither: S = e1 = 1 0 , e2 = 0 1 , U = u1 = 1 −1 , u2 = 1 1 , V = v1 = 1 3 , v2 = 3 1 . Solution: The basis S is orthonormal, since e1 · e2 = 0 and e1 · e1 = e2 · e2 = 1. The basis U is orthogonal since u1 · u2 = 0, but it is not orthonormal. Finally, the basis V is neither orthonormal nor orthogonal. Proposition 6.26. Given the set U = {u1 , · · · , up }, p 1, in the inner product space Fn , · , introduce the matrix U = [u1 , · · · , up ]. Then the following statements hold: (a) U is an orthonormal set iff matrix U satisfies U∗ U = Ip . (b) U is an orthonormal basis of Fn iff matrix U satisfies U−1 = U∗ . Proof of Proposition 6.26: Part (a) is proved by a straightforward computation, ∗ ∗ u1 u1 · · · u∗ up u1 u1 · u1 · · · u1 · up 1 . . . = . . =I . . . . U∗ U = . u 1 , · · · , u p = . p . . . . . ∗ ∗ ∗ up u1 · · · up up up up · u1 · · · up · up Part (b) follows from part (a): If U is a basis of Fn , then p = n; since U is an orthonormal set, part (a) implies U∗ U = In . Since U is an n × n matrix, it follows that U∗ = U−1 . This establishes the Proposition. 1 2 Example 6.3.5: Consider v1 = 1, v2 = 0 , in the inner product space R3 , · . 
2 −1 (a) Show that v1 ⊥ v2 ; (b) Find x ∈ R3 such that x ⊥ v1 and x ⊥ v2 . (c) Rescale the elements of {v1 , v2 , x} so that the new set is an orthonormal set. Solution: Part (a): 11 2 2 0=2+0−2=0 −1 ⇒ v1 ⊥ v2 . G. NAGY – LINEAR ALGEBRA December 8, 2009 x1 Part (b): We need to find x = x2 such that x3 x1 v2 · x = v1 · x = 1 1 2 x2 = 0, x3 2 Gauss elimination operation on AT imply 11 20 2 10 → −1 01 −1/2 5/2 ⇒ x1 −1 x2 = 0, x3 0 The equations above can be written in matrix notation as 1 AT x = 0, where A = v1 , v2 = 1 2 185 2 0 . −1 x = 1 x , 1 3 2 5 x2 = − x3 , 2 x free. 3 1 There is a solution for any choice of x3 = 0, so we choose x3 = 2, that is, x = −5. 2 Part (c): The vectors v1 , v2 and x are mutually orthogonal. Their norms are: √ √ √ v2 = 5, x = 30. v1 = 6, Therefore, the orthonormal set is 1 2 1 1 1 1 1 , u2 = √ 0 , u3 = √ −5 . u1 = √ 62 5 −1 30 2 Finally, notice that the inverse of the matrix 1 2 1 √ U= √6 1 6 2 √ 6 √ 5 0 1 − √5 √ 30 − √5 30 √2 30 is U−1 = 1 √ √6 2 5 √1 30 1 √ 6 0 − √5 30 2 √ 6 1 − √5 √2 30 = UT . 6.3.2. Vector components. The components of an arbitrary vector in an orthonormal basis are simple to find. Proposition 6.27. If V , , is an n-dimensional inner product space with an orthonormal basis U = {u1 , · · · , un }, then every vector x ∈ V can be decomposed as x = u1 , x u1 + · · · + un , x un . (6.14) Proof of Proposition 6.27: Since U is a basis, we know that for all x ∈ V there exist scalars c1 , · · · , cn such that x = c1 u1 + · · · + cn un . Therefore, the inner product ui , x for any i = 1, · · · , n is given by ui , x = c1 ui , u1 + · · · + cn ui , un . Since U is an orthonormal set, ui , x = ci . This establishes the Proposition. 186 G. NAGY – LINEAR ALGEBRA december 8, 2009 If U = (u1 , · · · , un ) is an orthonormal ordered basis, the coordinate map [ ]u : V → Fn is expressed in terms of the Fourier coefficients as follows, u1 , x . [x]u = xu = . . . un , x 3 Example 6.3.6: Consider the inner product space R , · with the standard ordered basis 1 S , and find the vector components of x = 2 in the orthonormal ordered basis 3 1 2 1 1 1 1 1 , u2 = √ 0 , u3 = √ −5 . U = u1 = √ 62 5 −1 30 2 Solution: The vector components of x in the orthonormal basis U are given by √ 9 u1 , x 6 −√ xu = u2 , x = 15 . u3 , x − √3 30 Notice that we have done a change of basis, from the standard basis S to the U basis. In fact, we can express the calculation above as follows 1 2 1 √ −1 xu = P xs , where P = Ius 16 = √6 2 √ 6 √ 5 0 1 − √5 √ 30 − √5 30 √2 30 = U. Since U is an orthonormal basis, U−1 = UT , so we conclude that 1 9 1 2 √ √ √ √ 1 6 6 6 6 √ 1 1 2 T 0 − √5 2 = − √5 . xu = U xs = 5 √1 √2 3 − √5 − √3 30 30 30 30 G. NAGY – LINEAR ALGEBRA December 8, 2009 187 Exercises. 6.3.1.- Prove that the following form of the Pythagoras Theorem holds on complex vector spaces: Two ` vectors x, y in an ´ inner product space V, , over C are orthogonal iff for all a, b ∈ C holds ax + b y 2 = ax 2 + by 2 . 6.3.2.- Consider the vector space R3 with the dot product. Find all vectors x ∈ R3 which are orthogonal to the vector 23 1 v = 42 5 . 3 6.3.3.- Consider the vector space R3 with the dot product. Find all vectors x ∈ R3 which are orthogonal to the vectors 23 23 1 1 v1 = 425 , v1 = 405 . 1 3 6.3.4.- Let P3 ([−1, 1]) be the space of polynomials up to degree three defined on the interval [−1, 1] ⊂ R with the inner product Z1 p, q = p(x)q(x) dx. 
−1 Show that the set (p0 , p1 , p2 , p3 ) is an orthogonal basis of P3 , where p0 (x) = 1, p1 (x) = x, 1 p2 (x) = (3x2 − 1), 2 1 p3 (x) = (5x3 − 3x). 2 (These polynomials are the first four of the Legendre polynomials.) 6.3.5.- Consider the vector space R3 with the dot product. (a) Show that the following ordered basis U is orthonormal, 23 23 23 1 1 −1 “1 1 4 5 1 4 5” √ 4−15 , √ 1 ,√ −1 . 2 31 6 0 2 (b) Use part (a) to find the components in the ordered basis U of the vector 23 1 x = 4 0 5. −2 6.3.6.- Consider the vector space R2,2 with the Frobenius inner product. (a) Show that the ordered basis given by U = (E1 , E2 , E3 , E4 ) is orthonormal, where » – » – 101 11 0 E1 = √ E2 = √ 21 0 2 0 −1 – – » » 1 1 −1 111 E3 = E4 = . 1 21 2 −1 1 (b) Use part (a) to find the components in the ordered basis U of the matrix – » 11 . A= 11 6.3.7.- Consider the inner product space ` 2,2 ´ R , , F , and find the cosine of the angle between the matrices – – » » 2 −2 13 . A= , B= 24 2 0 6.3.8.- Find the third column in matrix U below such that UT = U−1 , where √ 2√ 3 1/√3 1/√14 U13 U = 41/√3 2/ √ 14 U23 5 . 1/ 3 −3/ 14 U33 188 G. NAGY – LINEAR ALGEBRA december 8, 2009 6.4. Orthogonal projections 6.4.1. Orthogonal projection onto subspaces. We start decomposing a vector in orthogonal components with respect to a one dimensional subspace. The study of this simple case describes the main ideas and the main notation used in orthogonal decompositions. Later on we present the decomposition of a vector onto an n-dimensional subspace. Proposition 6.28. Fix a vector u = 0 in an inner product space V , , . Given any vector x ∈ V decompose it as x = x + x− where x ∈ Span({u}). Then, x− ⊥ u iff holds x= u, x u. u2 (6.15) Furthermore, in the case that u is a unit vector holds x = u, x u. (6.16) The main idea of this decomposition can be understood in the inner product space R2 , · and it is sketched in Fig. 42. It is obvious that every vector x can be expressed as the sum of two vectors. What is special of the decomposition in Eq. 6.15 is that x has the precise length such that x− is orthogonal to x (see Fig. 42). x x u x Figure 42. Orthogonal decomposition of the vector x ∈ R2 onto the subspace spanned by vector u. Proof of Proposition 6.28: Since x ∈ Span({u}), there exists a scalar a such that x = au. Therefore u ⊥ x− iff holds that u, x− = 0. A straightforward computation shows, 0 = u, x− = u, x − u, x = u, x − a u, u ⇔ a= u, x . u2 We conclude that the decomposition x = x + x− satisfies x− ⊥ u ⇔ x= u, x u. u2 In the case that u is a unit vector holds u = 1, so x is given by Eq. (6.16). This establishes the Proposition. Example 6.4.1: Consider the inner product space R3 , · and decompose the vector x in orthogonal components with respect to the vector u, where 3 1 x = 2 , u = 2 . 1 3 G. NAGY – LINEAR ALGEBRA December 8, 2009 189 u·x u2 Solution: We first compute x = u. Since 3 u · x = 1 2 3 2 = 3 + 4 + 3 = 10, 1 1 u 2 = 1 2 3 2 = 1 + 4 + 9 = 14, 3 1 5 we obtain x = 2. We now compute x− as follows, 7 3 3 1 21 5 16 5 1 1 1 2= 14 − 10 = 4 x− = x − x = 2 − 7 7 7 7 1 3 7 15 −8 ⇒ 4 4 1. x− = 7 −2 Therefore, x can be decomposed as 1 4 5 4 2+ 1. x= 7 7 3 −2 We can verify that this decomposition is orthogonal with respect to u, since 4 4 4 1 2 3 1 = (4 + 2 − 6) = 0. u · x− = 7 7 −2 We now decompose a vector into orthogonal components with respect to a p-dimensional subspace with p 1. Proposition 6.29. Fix an orthogonal set U = {u1 , · · · , up }, with p 1, in an inner product space V , , . 
Given any vector x ∈ V , decompose it as x = x + x− , where x ∈ Span(U ). Then, x− ⊥ ui , for i = 1, · · · , p iff holds x= u1 , x up , x u1 + · · · + up . u1 2 up 2 (6.17) Furthermore, in the case that U is an orthonormal basis holds x = u1 , x u1 + · · · + up , x up . (6.18) The main idea behind this decomposition can be understood in the inner product space R3 , · and it is sketched in Fig. 43. Proof of Proposition 6.29: Since x ∈ Span(U ), there exist scalars ai , for i = 1, · · · , p such that x = a1 u1 + · · · + ap up . The vector x− ⊥ ui iff holds that ui = x− = 0. A straightforward computation shows that, for i = 1, · · · , p holds 0 = ui , x− = ui , x − ui , x = ui , x − a1 ui , u1 − · · · − ap ui , up = ui , x − ai ui , ui ⇔ ai = ui , x . ui 2 190 G. NAGY – LINEAR ALGEBRA december 8, 2009 x x R 3 u2 x u1 U Figure 43. Orthogonal decomposition of the vector x ∈ R3 onto the subspace U spanned by the vectors u1 and u2 . We conclude that the decomposition x = x + x− satisfies x− ⊥ ui for i = 1, · · · , p iff holds x= u1 , x up , x u1 + · · · + up . u1 2 up 2 In the case that U is an orthonormal set holds ui = 1 for i = 1, · · · , p, so x is given by Eq. (6.18). This establishes the Proposition. Example 6.4.2: Consider the inner product space R3 , · and decompose the vector x in orthogonal components with respect to the subspace U , where 1 2 −2 x = 2 , U = Span u1 = 5 , u2 = 1 . 3 −1 1 Solution: In order to use Eq. (6.17) we need an orthogonal basis of U . So, need need to verify whether u1 is orthogonal to u2 . This is indeed the case, since −2 u1 · u2 = 2 5 −1 1 = −4 + 5 − 1 = 0. 1 So now we use u1 and u2 to compute x using Eq. (6.18). We need the quantities 1 2 u1 · x = 2 5 −1 2 = 9, u1 · u1 = 2 5 −1 5 = 30, 3 −1 1 −2 u1 · x = −2 1 1 2 = 3, u2 · u2 = −2 1 1 1 = 6. 3 1 Now is simple to compute x , since 2 −2 2 −2 6 −10 3 1 1 1 9 3 5+ 1= 5+ 1= 15 + 5 , x= 30 6 10 2 10 10 −1 1 −1 1 −3 5 therefore, −4 1 20 x= 10 2 ⇒ −2 1 10 . x= 5 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 The vector x− is obtained as follows, 1 −2 5 −2 7 1 1 1 1 x− = x − x = 2 − 10 = 10 − 10 = 0 5 5 5 5 3 1 15 1 14 191 ⇒ 1 7 x− = 0 . 5 2 We conclude that −2 1 1 7 10 + 0. x= 5 5 1 2 We can verify that x− ⊥ U , since 1 7 7 2 5 −1 0 = (2 + 0 − 2) = 0, u1 · x− = 5 5 2 1 7 7 −2 1 1 0 = (2 + 0 − 2) = 0. u2 · x− = 5 5 2 6.4.2. Orthogonal complement. There is a natural extension to a vector space of the orthogonal decomposition of a vector presented above. We start introducing the orthogonal complement of a subspace. Later on it will be clear why it is called a complement. Definition 6.30. Let W be a subspace in an inner product space V , , complement of W , denoted as W ⊥ , is the set W ⊥ = {u ∈ V : u, w = 0 . The orthogonal ∀ w ∈ W }. 3 Example 6.4.3: In the inner product space R , · , the orthogonal complement to a line is a plane, and the orthogonal complement to a plane is a line, as it is shown in Fig. 44 R 3 U R 3 V 0 0 U V Figure 44. The orthogonal complement to the plane U is the line U ⊥ , and the orthogonal complement to the line V is the plane V ⊥ . As the sketch in Fig. 44 suggest, the orthogonal complement of a subspace is itself a subspace. Proposition 6.31. Given an inner product space V , , orthogonal complement W ⊥ is also a subspace of V . and a subspace W ⊂ V , the 192 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 6.31: Let u1 , u2 ∈ W ⊥ , that is, ui , w = 0 for all w ∈ W and i = 1, 2. 
Then, any linear combination au1 + bu2 also belongs to W ⊥ , since (au1 + bu2 ), w = a u1 , w + b u2 , w = 0 + 0 ∀w ∈ W. This establishes the Proposition. −1 w1 = 2 3 Example 6.4.4: Find W ⊥ for the subspace W = Span in R3 , · . Solution: We need to find the set of all x ∈ R3 such that x · w1 = 0. That is, x1 −1 2 3 x2 = 0 ⇒ x1 = 2x2 + 3x3 . x3 The solution is 2x2 + 3x3 x2 x= x3 ⇒ hence we obtain W ⊥ = Span 2 3 x = 1 x2 + 0 x3 , 0 1 2 3 1 , 0 0 1 . So, the orthogonal complement of a line is a plane, as sketched in the second picture in Fig. 44. We can verify that the result is correct, since 2 3 −1 2 3 1 = −2 + 2 + 0 = 0, −1 2 3 0 = −3 + 0 + 3 = 0. 0 1 Example 6.4.5: Find W ⊥ for the subspace W = Span 1 3 w1 = 2 , w2 = 2 3 1 in R3 , · . Solution: We need to find the set of all x ∈ R3 such that x · w1 = 0 and x · w2 = 0. That is, x 1 2 3 1 x2 = 0 321 x3 We an use Gauss elimination to find the solution, x1 = x3 , 123 1 0 −1 x2 = −2x3 , → ⇒ 321 01 2 x free. 3 hence we obtain W ⊥ = Span 1 −2 1 . ⇒ 1 x = −2 x3 , 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 193 So, the orthogonal complement of a pane is a line, as sketched in the first picture in Fig. 44. We can verify that the result is correct, since 1 1 1 2 3 −2 = 1 − 4 + 3 = 0, 3 2 1 −2 = 3 − 4 + 1 = 0. 1 1 The reason for the word “complement” in the name of an orthogonal complement is the following result, which says that the vector space can be split into the sum of two subspaces with zero intersection. Proposition 6.32. If V , , is a finite dimensional inner product space and W ⊂ V is a subspace, then holds V = W ⊕ W ⊥ . Proof of Proposition 6.32: We first show that V = W + W ⊥ . In order to do that, first choose an orthonormal basis W = {w1 , · · · , wp } for W , we here we have assumed dim W = p n = dim V . Then, for every vector x ∈ V holds that it can be decomposed as x = x + x− , x = w1 , x w1 + · · · + wp , x wp , x− = x − x . Proposition 6.29 says that x− ⊥ wi for i = 1, · · · , p. This implies that x− ∈ W ⊥ and, since x is an arbitrary vector in V , we have established that V = W + W ⊥ . We now show that W ∩ W ⊥ = {0}. Indeed, if u ∈ W ⊥ , then u, w = 0 for all w ∈ W . If u ∈ W , then choosing w = u in the equation above we get u, u = 0, which implies u = 0. Therefore, W ∩ W ⊥ = {0} and we conclude that V = W ⊕ W ⊥ . This establishes the Proposition. Since the orthogonal complement W ⊥ of a subspace W is itself a subspace, we can compute (W ⊥ )⊥ . The following statement says that the result is the original subspace W . Proposition 6.33. If V , , subspace, then holds W ⊥⊥ is a finite dimensional inner product space and W ⊂ V is a = W. Proof of Proposition 6.33: First notice that W ⊂ W ⊥ ⊥ ⊥ . Indeed, for every w ∈ W holds that w, u = 0 for u ∈ W . This condition says that w ∈ W ⊥ W⊂ W ⊥⊥ ⊥ , so we conclude . Second, notice that Proposition 6.29 says, V = W⊥ ⊕ W⊥ ⊥ . ⊥ In particular, this decomposition implies that dim V = dim W ⊥ + dim W ⊥ . Proposition 6.29 also says that V = W ⊕ W ⊥, which in particular implies that dim V = dim W +dim W ⊥ . We thus conclude that dim W = ⊥ ⊥ ⊥ dim W ⊥ . Since W ⊂ W ⊥ , we arrive to the conclusion that W = W ⊥ . This establishes the Proposition. 194 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 6.4.1.- Consider the inner product space `3 ´ R , · and use Proposition 6.28 to find the orthogonal decomposition of vector x along vector u, where 23 23 1 1 x = 425 , u = 415 . 
3 1 6.4.2.- Consider the subspace W given by 23 23 1 2 o” “n Span w1 = 425 , w2 = 4−15 1 3 `3 ´ in the inner product space R , · . (a) Find an orthogonal decomposition of the vector w2 with respect to the vector w1 . Using this decomposition, find an orthogonal basis for the space W . (b) Find the decomposition of the vector x below in orthogonal components with respect to the subspace W , where 23 4 x = 43 5 . 0 6.4.3.- Consider the subspace W given by 23 23 2 2 “n o” Span w1 = 4 0 5 , w2 = 4−15 , −2 0 `3 ´ in the inner product space R , · . Decompose the vector x below into orthogonal components with respect to W , where 23 1 x = 4 1 5. −1 (Notice that w1 ⊥ w2 .) 6.4.4.- Given the matrix A below, find a basis for the space R(A)⊥ , where 2 3 12 A = 41 1 5 . 20 6.4.5.- Consider the subspace 23 n 1o W = Span 4−15 2 ` ´ in the inner product space R3 , · . (a) Find a basis for the space W ⊥ , that is, find a basis for the orthogonal complement of the space W . (b) Use Proposition 6.28 to transform the basis of W ⊥ found in part (a) into an orthogonal basis. 6.4.6.- Consider the inner product space `4 ´ R , · , and find a basis for the orthogonal complement of the subspace W given by 23 23 2 1 “n627 647o” 6 7, 6 7 . W = Span 4 5 4 5 1 0 6 3 6.4.7.- Let X and Y be subspaces of a finite dimensional inner product space ` ´ V, , . Prove the following: (a) X ⊂ Y ⇒ Y ⊥ ⊂ X ⊥; ⊥ (b) (X + Y ) = X ⊥ ∩ Y ⊥ ; (c) (X ∩ Y )⊥ = X ⊥ + Y ⊥ . G. NAGY – LINEAR ALGEBRA December 8, 2009 195 6.5. Best approximation 6.5.1. Fourier expansions. We have seen that orthonormal bases have a practical advantage over arbitrary basis. The components [x]u of the vector x in the orthonormal basis U = {u1 , · · · , un } of the inner product space V , , are given by u1 , x . [x]u = . . . un , x In the case that an orthonormal set U = {u1 , · · · , up } is not a basis of V , that is, p < dim V , one can always introduce the orthogonal projection of a vector x ∈ V onto the subspace Span(U ) as follows, x = u1 , x u1 + · · · + up , x up . We now give such a projection a new name and we study one important property of this projection. Definition 6.34. Let U = {u1 , · · · , up } be an orthonormal set in an inner product space V , , . The Fourier expansion with respect to U of a vector x ∈ V is the unique vector x ∈ Span(U ) given by x = u1 , x u1 + · · · + up , x up . Each coefficient ui , x is called a Fourier coefficient of x with respect to U . Therefore, the Fourier expansion with respect to a an orthonormal set U of a vector x is the orthogonal projection x onto Span(U ), which was introduced above. We know that this vector x is very special since (x − x ) = x− ∈ Span(U )⊥ . This property means that x is the best approximation of the vector x from within Span(U ). See Fig. 45 for the case V = R3 , with , = ·, and Span(U ) two-dimensional. Theorem 6.35. (Best approximation) Let U = {u1 , · · · , up } be an orthonormal set of an inner product space V , , . The Fourier expansion x of a vector x ∈ V is the unique vector in Span(U ) that is closest to x, in the sense that x−x < x−y ∀ y ∈ Span(U ) − {x }. Proof of Theorem 6.35: Recall that x− = x − x is orthogonal to Span(U ), that is (x − x ) ⊥ (x − y) for all y ∈ Span(U ). Hence, x−y 2 = (x − x ) + (x − y) 2 = x−x 2 + x − y 2, (6.19) where the las equality comes from Pythagoras Theorem. Eq. (6.19) says that x − y is the smallest iff y = x and the smallest value is x − x . This establishes the Theorem. 
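The following Python sketch (added for illustration, not part of the original notes; it assumes NumPy, and the orthonormal set and the vector x are hypothetical examples) computes the Fourier expansion of Definition 6.34 in the inner product space (R^3, ·) and checks the two facts behind Theorem 6.35: the residual x - x_hat is orthogonal to Span(U), and x_hat is at least as close to x as other vectors of Span(U).

import numpy as np

# A hypothetical orthonormal set U = {u1, u2} in R^3 and a vector x.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 2.0, 3.0])

# Fourier expansion: x_hat = <u1, x> u1 + <u2, x> u2.
x_hat = (u1 @ x) * u1 + (u2 @ x) * u2

# The residual is orthogonal to every vector in the orthonormal set.
print(np.allclose([(x - x_hat) @ u1, (x - x_hat) @ u2], 0.0))   # True

# x_hat is the closest point of Span(U): compare with a few other choices.
best = np.linalg.norm(x - x_hat)
rng = np.random.default_rng(0)
for a, b in rng.normal(size=(5, 2)):
    y = a * u1 + b * u2
    print(best <= np.linalg.norm(x - y))                        # True each time

For this choice of data the Fourier coefficients are <u1, x> = 3/sqrt(2) and <u2, x> = 3, so x_hat = (1.5, 1.5, 3), and the residual (-0.5, 0.5, 0) is indeed orthogonal to both u1 and u2.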
Example 6.5.1: Consider the vector space C ∞ [− , ], R with inner product f,g = f (x)g(x) dx. − Consider the orthonormal set UN given by 1 1 1 nπx nπx , vn = √ sin UN = u0 = √ , un = √ cos 2 N n=1 . (a) Find the general expression of the Fourier expansion of a function f ∈ C ∞ [− , ], R with respect to the orthonormal set UN . (b) Consider the case = 1 and find the Fourier expansion of the function f (x) = x with respect to UN . 196 G. NAGY – LINEAR ALGEBRA december 8, 2009 R 3 x (x−y) x y Span ( U ) Figure 45. The Fourier expansion x of the vector x ∈ R3 is the best approximation of x from within Span(U ). Solution: Part (a) is straightforward, since, the Fourier expansion N f= um , f um + vm , f vm , m=0 takes the form N f (x) = am cos mπx mπx + bm sin m=0 where the coefficients a0 , an and bn , for n = 1, · · · , N are given by a0 = an = 1 2 f (x) dx, − 1 nπx f (x) cos dx, − bn = 1 nπx f (x) sin dx. − Part (b) is simply to apply formulas above to the function f (x) = x on the interval [−1, 1], since = 1. The coefficient a0 above is given by a0 = 1 2 1 x dx = −1 12 x 4 1 −1 ⇒ a0 = 0. The coefficients an , bn , for n = 1, · · · , N computed with one integration by parts, 1 x sin(nπx) + 2 2 cos(nπx), x cos(nπx) dx = nπ nπ x 1 x sin(nπx) dx = − cos(nπx) + 2 2 sin(nπx). nπ nπ The coefficients an vanish, since 1 an = x cos(nπx) dx = −1 x sin(nπx) nπ 1 −1 + 1 n2 π 2 cos(nπx) 1 −1 ⇒ an = 0. The coefficients bn are given by 1 bn = x sin(nπx) dx = − −1 x cos(nπx) nπ 1 −1 + 1 n2 π 2 sin(nπx) 1 −1 ⇒ bn = 2(−1)(n+1) . nπ G. NAGY – LINEAR ALGEBRA December 8, 2009 197 Therefore, the Fourier expansion of f (x) = x with respect to UN is given by N f (x) = 2(−1)(n+1) sin(nπx). nπ n=0 Theorem 6.35 states that the function f above is the combination of sine and cosine functions that best approximate the linear the function f (x) = x on the interval [−1, 1]. 6.5.2. Null and range spaces of a matrix. The null and range spaces associated with a matrix A ∈ Fm,n and its adjoint matrix A∗ are deeply related. Theorem 6.36. For all matrices A ∈ Fm,n hold N (A) = R(A∗ )⊥ and N (A∗ ) = R(A)⊥ . Since for every subspace W holds (W ⊥ )⊥ = W , we also have the relations N (A)⊥ = R(A∗ ), N (A∗ )⊥ = R(A). In the case of real-valued matrices, the Theorem above says that N (A) = R(AT )⊥ and N (AT ) = R(A)⊥ . Before the proof let us review the following notation: Given an m × n matrix A ∈ Fm,n , we write it either in terms of column vectors A:j ∈ Fm for j = 1, · · · , n, or in terms of row vectors Ai: ∈ Fn for i = 1, · · · , m, as follows, A1: . A = A:1 · · · A:n , A = . . . Am: Since the same type of definition holds for the n × m matrix A∗ , that is, ∗ (A )1: . ∗ ∗ ∗ ∗ A = (A ):1 · · · (A ):m , A = . , . (A∗ )n: then we have the relations (A:j )∗ = (A∗ )j : , (Ai: )∗ = (A∗ ):i . For example, consider the 2 × 3 matrix A= 12 45 The transpose is a 1 AT = 2 3 3 6 ⇒ A:1 = 1 2 3 , A:2 = , A:3 = , 4 5 6 A1: = 1 2 A2: = 4 5 3, . 6. 3 × 2 matrix that can be written as follows (AT )1: = 1 4 , 4 1 4 5 ⇒ (AT ):1 = 2 , (AT ):2 = 5 , (AT )2: = 2 5 ,. 6 3 6 (AT )3: = 3 6 . So, for example we have the relation (A:3 )T = 3 6 = (AT )3: , 4 (A2: )T = 5 = (AT ):2 . 6 198 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Theorem 6.36: We first show that the N (A) = R(AT ). A vector x ∈ Fn belongs to N (A) iff holds ∗ (A∗ ):1 · x = 0, (A∗ ):1 A1: . . . . . Ax = 0 ⇔ . x = 0 ⇔ x = 0 ⇔ . . . ∗ ∗ ∗ Am: (A ):m (A ):m · x = 0. So, x ∈ N (A) iff x is orthogonal to every column vector in A∗ , that is, x ∈ R(A∗ )⊥ . 
The equation N (A∗ ) = R(A)⊥ comes from N (B) = R(B∗ )⊥ taking B = A∗ . Nevertheless, we repeat the proof above, just to understand the previous argument. A vector y ∈ Fm belongs to N (A∗ ) iff ∗ A:1 · y = 0, (A )1: (A:1 )∗ . . . . A∗ y = 0 ⇔ . y = 0 ⇔ . y = 0 ⇔ . . . (A:n )∗ (A∗ )n: A:n · y = 0. So, y ∈ N (A∗ ) iff y is orthogonal to every column vector in A, that is, y ∈ R(A)⊥ . This establishes the Theorem. Example 6.5.2: Verify Theorem 6.36 for the matrix A = 1 3 2 2 3 . 1 Solution: We first find the N (A), that is, all x ∈ R3 solutions of Ay = 0. Gauss operations on matrix A imply x1 = x3 , 1 1 0 −1 123 x2 = −2x3 , ⇒ N (A) = Span −2 . ⇒ → 01 2 321 x free, 1 3 It is simple to find R(AT ), since 1 3 2 , 2 3 1 R(AT ) = Span Theorem 6.36 is verified, since 1 1 2 3 −2 = 1 − 4+3 = 0, 1 . 32 1 1 −2 = 3 − 4+1 = 0 1 N (A) = R(AT )⊥ . ⇒ Let us verify the same Theorem for AT . We first find N (AT ), that is, all y ∈ R2 solutions of AT y = 0. Gauss operations on matrix AT imply 13 10 2 2 → 0 1 ⇒ y = 0 ⇒ N (AT ) = {0}. 0 31 00 The space R(A) is given by R(A) = Span 1 2 3 , , 3 2 1 = Span 1 2 , 3 2 = R2 . Since (R2 )⊥ = {0}, Theorem 6.36 is verified. We end this Section with a result we used in Chapter 2 without proof. G. NAGY – LINEAR ALGEBRA December 8, 2009 199 Theorem 6.37. For every matrix A ∈ Fm,n holds that rank(A) = rank(A∗ ). Proof of Theorem 6.37: Recall the nullity plus rank result in Corollary 5.5, which says that for all matrix A ∈ Fm,n holds dim N (A) + dim R(A) = n. Equivalently, dim R(A) = n − dim N (A) = n − dim R(A∗ )⊥ , since N (A) = R(A∗ )⊥ . From the orthogonal decomposition Fn = R(A∗ ) ⊕ R(A∗ )⊥ we know that dim R(A∗ ) = n − dim R(A∗ )⊥ . We then conclude that dim R(A) = dim R(A∗ ). This establishes the Theorem. 200 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 6.5.1.- Consider the inner product space `3 ´ R , · and the orthonormal set U , 23 23 1 1 n 14 5 1 4 5o − 1 , u2 = √ 1. u1 = √ 2 31 0 Find the best approximation of x below in the subspace Span(U ), where 23 1 x = 4 0 5. −2 6.5.2.- Consider the inner product space ` 2,2 ´ R , , F and the orthonormal set U = {E1 , E2 }, where » – » – 101 11 0 E1 = √ , E2 = √ . 21 0 2 0 −1 Find the best approximation of matrix A below in the subspace Span(U ), where – » 11 . A= 11 6.5.3.- Consider the inner R product space 1 P2 ([0, 1]), with p, q = 0 p(x)q(x) dx, and the subspace U = Span(U ), where 1 U = {q0 = 1, q1 = (x − 2 )}. (a) Show that U is an orthogonal set. (b) Find r , the best approximation with respect to U of the polynomial r(x) = 2x + 3x2 . (c) Verify whether (r − r ) ∈ U ⊥ or not. 6.5.4.- Consider the space C ∞ ([− , ], R) with inner product Z f,g = f (x)g(x) dx, − and the orthonormal set U given by 1 u0 = √ 2 “ πx ” 1 u1 = √ cos “ πx ” 1 v1 = √ sin . Find the best approximation of x 0x , f (x) = −x − x < 0. in the space Span(U ) 6.5.5.- For the matrix A ∈ R3,3 below, verify that N (A) = R(AT )⊥ and that N (AT ) = R(A)⊥ , where 3 2 2 1 1 0 5. A = 4−1 −1 −2 −1 −1 G. NAGY – LINEAR ALGEBRA December 8, 2009 201 6.6. Gram-Schmidt method We now describe the Gram-Schmidt orthogonalization method, which is a method to transform a linearly independent set of vectors into an orthonormal set. The method is based on projecting the i-th vector in the set onto the subspace spanned by the previous (i − 1) vectors. Theorem 6.38. (Gram-Schmidt) Let X = {x1 , · · · , xp } be a linearly independent set in an inner product space V , , . Define the set Y = {y1 , · · · , yp } as follows, y1 = x1 , y2 = x2 − y1 , x2 y, y1 2 1 . . . 
yp = xp − y(p−1) , xp y1 , xp y1 − · · · − y . 2 y1 y(p−1) 2 (p−1) Then, Y is an orthogonal set with Span(Y ) = Span(X ). Furthermore, the set yp y Z = z1 = 1 , · · · , zp = y1 yp is an orthonormal set with Span(Z ) = Span(Y ). Using the notation in Proposition 6.29 we can write y2 = x2− , where the projection is onto the subspace Span({y1 }). Analogously, yi = xi− , for i = 2, · · · , p, where the projection is onto the subspace Span({y1 , · · · , yi−1 }). Proof of Theorem 6.38: We first show that Y is an orthogonal set. It is simple to see that y2 ∈ Span({y1 })⊥ , since y1 , y2 = y1 , x2 − y1 , x2 y1 , y1 = 0. y1 2 Assume that yi ∈ Span({y1 , · · · , yi−1 })⊥ , we then show that yi+1 ∈ Span({y1 , · · · , yi })⊥ . Indeed, for j = 1, · · · , i holds yj , yi+1 = yj , xi+1 − yj , xi+1 yj , yj = 0, yj 2 where we used that yj ∈ Span({y1 , · · · , yi−1 })⊥ , for all j = 1, · · · , i. Therefore, Y is an orthogonal set (and so, a linearly independent set). The proof that Span(X ) = Span(Y ) has two steps: On the one hand, the elements in Y are linear combinations of elements in X , hence Span(Y ) ⊂ Span(X ); on the other hand dim Span(X ) = dim Span(Y ), since X and Y are both linearly independent sets with the same number of elements We conclude that Span(X ) = Span(Y ). It is straightforward to see that Z is an orthonormal set, and since every element zi ∈ Z is proportional to every yi ∈ Y , then Span(Y ) = Span(Z ). This establishes the Theorem. Example 6.6.1: Use the Gram-Schmidt method to find an orthonormal basis for the inner product space R3 , · from the ordered basis 1 2 1 X = x1 = 1 , x2 = 1 , x3 = 1 . 0 0 1 202 G. NAGY – LINEAR ALGEBRA december 8, 2009 Solution: We first find an orthogonal basis. The first element is 1 y1 = x1 = 1 ⇒ y 1 2 = 2. 0 The second element is y2 = x2 − where y1 · x2 = 1 1 y1 · x2 y1 , y1 2 2 0 1 = 3. 0 A simple calculation shows 2 1 4 3 1 3 1 1 1 1= 2− 3= −1 , y2 = 1 − 2 2 2 2 0 0 0 0 0 therefore, 1 1 −1 y2 = 2 0 ⇒ y2 2 = 1 . 2 Finally, the last element is y3 = x3 − y1 · x3 y2 · x3 y ,− y2 , 21 y1 y2 2 where 1 1 1 y1 · x3 = 1 1 0 1 = 2, y2 · x3 = 2 1 Another simple calculation shows 1 0 1 2 1=0, y3 = 1 − 2 0 1 1 1 −1 0 1 = 0. 1 therefore, 0 y3 = 0 ⇒ y 3 2 = 1. 1 The set Y = {y1 , y2 , y3 } is an orthogonal set. Rescaling these vectors we find the orthonormal set 1 1 0 1 1 1 , z2 = √ −1 , z3 = 0 . Z z1 = √ 20 2 0 1 Example 6.6.2: Consider the vector space P3 ([−1, 1]) with the inner product 1 p, q = p(x)q(x) dx. −1 Given the basis {p0 = 1, p1 = x, p2 = x2 , p3 = x3 }, use the Gram-Schmidt method starting with the vector p0 to find an orthogonal basis for P3 ([−1, 1]). G. NAGY – LINEAR ALGEBRA December 8, 2009 203 Solution: The first element in the new basis is q0 = p0 = 1 ⇒ The second element is 1 2 q0 = dx = 2. −1 q0 , p1 q. q0 2 0 q 1 = p1 − It is simple to see that 1 q0 , p1 = x dx = −1 12 x 2 1 −1 = 0. So we conclude that q1 = p1 = x ⇒ q1 2 1 = x2 dx = −1 13 x 3 1 ⇒ −1 q1 2 The third element in the basis is q2 = p2 − q , p2 q0 , p2 q − 1 2 q1 . q0 2 0 q1 It is simple to see that 1 q0 , p2 = q1 , p2 = x2 dx = −1 1 13 x 3 1 x3 dx = 14 x 4 1 −1 −1 −1 = 2 , 3 = 0. Hence we obtain 21 1 q = x2 − 32 0 3 The norm square of this vector is q2 = p2 − q2 2 = 1 9 1 ⇒ q2 = 1 (3x2 − 1). 3 (3x2 − 1)(3x2 − 1) dx −1 1 1 (9x4 − 6x2 + 1) dx 9 −1 1 195 = x − 2x3 + x 95 −1 8 = . 45 Finally, the fourth vector of the orthogonal basis is given by = q , p3 q , p3 q0 , p3 q − 1 2 q1 − 1 2 q2 . 
q0 2 0 q1 q2 q 3 = p3 − It is simple to see that 1 q0 , p3 = q1 , p3 = q2 , p3 = x3 dx = −1 1 14 x 4 1 x4 dx = 15 x 5 1 −1 1 3 1 −1 −1 −1 = 0, = 2 , 5 (3x2 − 1) x3 dx = 116 14 x− x 32 4 1 −1 = 0. = 2 . 3 204 G. NAGY – LINEAR ALGEBRA december 8, 2009 Hence we obtain 3 1 23 q = x3 − x ⇒ q3 = (5x3 − 3x). 52 1 5 5 The orthogonal basis is then given by 1 1 q0 = 1, q1 = x, q2 = (3x2 − 1), q3 = (5x3 − 3x) . 3 5 These polynomials are proportional to the first three Legendre polynomials. The Legendre polynomials form an orthogonal set in the space P∞ ([−1, 1]) of polynomials of all degrees. They play an important role in physics, since Legendre polynomials are solution of a particular differential equation that often appears in physics. q3 = p3 − G. NAGY – LINEAR ALGEBRA December 8, 2009 205 Exercises. 6.6.1.- Find an orthonormal basis for the subspace of R3 spanned by the vectors 23 23 −2 1o n u1 = 4 2 5 , u2 = 4−35 , −1 1 using the Gram-Schmidt process starting with the vector u1 . 6.6.2.- Let W ⊂ R3 be the subspace 23 23 0 3o n Span u1 = 425 , u2 = 415 . 0 4 (a) Find an orthonormal basis for W using the Gram-Schmidt method starting with the vector u1 . (b) Decompose the vector x below as x = x + x− , with x ∈ W and x− ∈ W ⊥ , where 23 5 x = 415 . 0 6.6.3.- The column vectors in matrix A below form a linearly independent set. Use the Gram-Schmidt method to find an orthonormal basis for R(A), where 2 3 12 5 0 5. A = 40 2 1 0 −1 6.6.4.- Use the Gram-Schmidt method to find an orthonormal basis for R(A), where 2 3 12 1 A = 40 2 − 2 5 . 10 3 6.6.5.- Consider the vector space P2 ([0, 1]) with inner product Z1 p(x)q(x) dx. p, q = 0 Use the Gram-Schmidt method on the ordered basis ` ´ p0 = 1, p1 = x, p2 = x2 , starting with vector p0 , to obtain an orthogonal basis for P2 ([0, 1]). 206 G. NAGY – LINEAR ALGEBRA december 8, 2009 6.7. Least squares We describe the least squares method to find approximate solutions to inconsistent linear systems. The method is often used to find the best parameters that fit experimental data. The parameters are the unknowns of the linear system, and the experimental data determines the matrix of coefficients and the source vector of the system. Such a linear system usually contains more equations than unknowns, and it is inconsistent, since there are no parameters that fit all the data exactly. We start introducing the notion of least squares solution of a possibly inconsistent linear system. Definition 6.39. Given a matrix A ∈ Fm,n and a vector b ∈ Fm , · , the vector ˆ ∈ Fn is x called a least squares solution of the linear system Ax = b iff holds Aˆ − b x y−b ∀ y ∈ R(A). The problem we study is to find the least squares solution to an m × n linear system Ax = b. In the case that b ∈ R(A) the linear system Ax = b is consistent and the least squares solution ˆ is the actual solution of the system, hence Aˆ − b = 0. In the case that x x b does not belong to R(A), the linear system Ax = b is inconsistent. In such a case the least squares solution ˆ is the vector in Rn with the property that Aˆ is the closest vector to b in x x the inner product space Rm , · . A sketch of this situation for a matrix A ∈ R3,2 is given in Fig. 46. A R b 2 R 3 x Ax Ax R(A) Figure 46. The meaning of the least squares solution ˆ ∈ R2 for the 3 × 2 x inconsistent linear system Ax = b is that the vector Aˆ is the closest to b x in the inner product space R3 , · . Theorem 6.40. Consider the matrix A ∈ Fm,n and the vector b in the inner product space Fm , · . 
The vector ˆ ∈ Fn is a least squares solution of the m × n linear system Ax = b iff x ˆ is solution to the n × n linear system, called normal equation, x A∗ A ˆ = A∗ b. x (6.20) Furthermore, the least squares solution ˆ is unique iff the column vectors of matrix A form x a linearly independent set. In the case that F = R, the normal equation reduces to AT A ˆ = AT b. x Proof of Theorem 6.40: We are interested in finding a vector ˆ ∈ Fn such that Aˆ is the x x best approximation in R(A) of vector b ∈ Fm . That is, we want to find ˆ ∈ Fn such that x Aˆ − b x y−b ∀ y ∈ R(A). G. NAGY – LINEAR ALGEBRA December 8, 2009 207 Theorem 6.35 says that the best approximation of b is when Aˆ = b , where b is the x orthogonal projection of b onto the subspace R(A). This means that (Aˆ − b) ∈ R(A)⊥ = N (A∗ ) x ⇔ A∗ (Aˆ − b) = 0. x We then conclude that ˆ must be solution of the normal equation x A∗ Aˆ = A∗ b. x The furthermore can be shown as follows. The column vectors of matrix A form a linearly independent set iff N (A) = {0}. Lemma 6.41 establishes that, for all matrix A holds that N (A) = N (AT A). This result in our case implies that N (AT A) = {0}. Since matrix AT A is a square, n × n, matrix, we conclude that it is invertible. This is equivalent to say that the solution ˆ to the normal equation is unique; moreover, it is given by ˆ = (AT A)−1 AT b. This x x establishes the Theorem. In the proof of Theorem 6.40 above we used the following result: Lemma 6.41. If A ∈ Fm,n , then N (A) = N (A∗ A). Proof of Lemma 6.41: We first show that N (A) ⊂ N (A∗ A). Indeed, x ∈ N (A) ⇒ Ax = 0 A∗ Ax = 0 ⇒ ⇒ x ∈ N (A∗ A). Now, suppose that there exists x ∈ N (A∗ A) such that x ∈ N (A). Therefore, x = 0 and / Ax = 0, which imply that 0 = Ax 2 = x∗ A∗ Ax ⇒ A∗ Ax = 0. However, this last equation contradicts the assumption that x ∈ N (A∗ A). Therefore, we conclude that N (A) = N (A∗ A). This establishes the Lemma. Example 6.7.1: Show that the 3 × 2 linear system Ax = b is inconsistent; then find a least x ˆ squares solutions ˆ = 1 to that system, where x x2 ˆ 13 −1 A = 2 2 , b = 1 . 31 −1 Solution: We first show that the linear system above is inconsistent, since Gauss operation on the augmented matrix [A|b] imply 13 −1 1 3 −1 1 3 −1 2 2 1 → 0 −4 3 → 0 −4 3 . 31 −1 0 −8 2 0 0 1 In order to find the least squares solution to the system above we first construct the normal equation. We need to compute 13 −1 123 14 10 123 −2 2 2 = 1= AT A = , AT b = . 321 10 14 321 −2 31 −1 Therefore, the normal equation is given by 14 10 10 14 x1 ˆ −2 = . x2 ˆ −2 Since the column vectors of A form a linearly independent set, matrix AT A is invertible, AT A −1 = 1 14 96 −10 1 −10 7 = 14 48 −5 −5 . 7 208 G. NAGY – LINEAR ALGEBRA december 8, 2009 The least squares solution is unique and given by ˆ= x We now verify 1 x AT ˆ − b = 2 3 Since 1 −2 1 7 24 −5 −5 7 −1 −1 ˆ= x ⇒ 1 −1 . 12 −1 that (Aˆ − b) ∈ R(A)⊥ . Indeed, x 3 −1 −1 1 1 −1 1 2 − 1 = − 1 − 1 12 −1 3 1 −1 −1 1 1 1 2 = 1 − 4 + 3 = 0, 3 1 −2 1 2 AT ˆ − b = −2 . x 3 1 ⇒ 3 1 2 = 3 − 4 + 1 = 0, 1 we have verified that (Aˆ − b) ∈ R(A)⊥ . x We finish this Subsection with an alternative proof of Theorem 6.40 in the particular case that involves real-valued matrices, that is, F = R. The proof is interesting in its own, since it is based in solving a constrained minimization problem. Alternative proof of Theorem 6.40 for F = R: The vector ˆ ∈ Rn is a least squares x solution of the system Ax = b iff the function f : Rn → R given by f (x) = Aˆ − b 2 has a x minimum at x = ˆ. 
We then find all minima of function f . We first express f as follows, x f (x) = (Ax − b) · (Ax − b) = (Ax) · (Ax) − 2b · (Ax) + b · b = xT AT Ax − 2bT Ax + bT b. We now need to find all solutions to the equation x f (x) = 0. Recalling the definition of a gradient vector ∂f ∂x1 . xf = . , . ∂f ∂xn it is simple to see that, for any vector a ∈ Rn , holds, x (a T T x) = a, x (x a) = a. Therefore, the gradient of f is given by xf = 2AT Ax − 2AT b. We are interested in the stationary points, the ˆ solutions of x x x f (ˆ) =0 ⇔ AT Aˆx = AT b. We conclude that all stationary points ˆ are solutions of the normal equation, Eq. (6.20). x These stationary points must be a minimum of f , since f is quadratic on the vector components xi having the degree two terms all positive coefficients. This establishes the first part of Theorem 6.40 in the case that F = R. G. NAGY – LINEAR ALGEBRA December 8, 2009 209 6.7.1. Least squares fit. It is often desirable to construct a mathematical model to describe the results of an experiment. This may involve fitting an algebraic curve to the given experimental data. The least squares method can be used to find the best parameters that fit the data. Example 6.7.2: (Linear fit) The simplest situation is the case where the best curve fitting the data is a straight line. More precisely, suppose that the result of an experiment is the following collection of ordered numbers (t1 , b1 ), · · · , (tm , bm ) , m 2, and suppose that a plot on a plane of the result of this experiment is given in Fig. 47. (For example, from measuring the vertical displacement bi in a spring when a weight ti is attached to it.) Find the best line y (t) = x2 t + x1 that approximate these points in least ˆ ˆ m squares sense. The latter means to find the numbers x2 , x1 ∈ R such that i=1 |∆bi |2 is ˆˆ the smallest possible, where ∆bi = bi − y (ti ) ⇔ ∆bi = bi − (ˆ2 ti + x1 ), x ˆ i = 1, · · · , m. b y ( t ) = x2 t + x1 ( t i , bi ) bi bi ti t Figure 47. Sketch of the best line y (t) = x2 t + x1 fitting the set of points ˆ ˆ (ti , bi ), for i = 1, · · · , 10. Solution: Let us rewrite this problem as the least squares solution of an m × 2 linear system, which in general is inconsistent. We are interested to find x2 , x1 solution of the ˆˆ linear system y (t1 ) = b1 x1 + t1 x2 = b1 ˆ ˆ . . . . . . ⇔ ⇔ 1 x1 + tm x2 = bm ˆ ˆ y (tm ) = bm 1 . . . t1 b1 ˆ . x1 = . . . . . . x2 ˆ tm bm Introducing the notation 1 . A = . . 1 t1 . , . . tm ˆ= x x1 ˆ , x2 ˆ b1 . b = . , . bm 210 G. NAGY – LINEAR ALGEBRA december 8, 2009 we are then interested in finding the solution ˆ of the m × 2 linear system Aˆ = b. Introducing x x also the vector ∆b 1 . ∆b = . , . ∆b m it is clear that Aˆ − b = ∆b, and so we obtain the important relation x m Aˆ − b x 2 = ∆b 2 2 = ∆b i . i=1 m Therefore, the vector ˆ that minimizes the square of the deviation from the line, i=1 (∆bi )2 , x is precisely the same vector ˆ ∈ R2 that minimizes the number Aˆ − b 2 . We studied the x x latter problem at the begining of this Section. We called it a least squares problem, and the solution ˆ is the solution of the normal equation x AT A ˆ = AT b. x It is simple to see that AT A = 1 t1 ··· ··· 1 tm 1 . . . 1 t1 . = . . m ti tm 1 t1 AT b = ·· ··· 1 tm b1 . . = . ti , t2 1 bi . ti bi bm Therefore, we are interested in finding the solution to the 2 × 2 linear system m ti ti t2 1 x1 ˆ = x2 ˆ bi . ti b i Suppose that at least one of the ti is different from 1, then matrix AT A is invertible and the inverse is 1 t2 − ti −1 i AT A = . 
2− ti m m t2 − ti i We conclude that the solution to the normal equation is x1 ˆ = x2 ˆ m 1 t2 i − ti 2 t2 i − ti − ti bi . ti bi m So, the slope x2 and vertical intercept x1 of the best fitting line are given by ˆ ˆ x2 = ˆ m ti b i − ( m t2 i − ti )( ti bi ) 2 , x1 = ˆ ( t2 )( i m bi ) − ( t2 i − ti )( ti 2 ti bi ) . Example 6.7.3: (Polynomial fit) Find the best polynomial of degree (n − 1) 0, say p(t) = xn t(n−1) + · · · + x1 , that approximates in least squares sense the set of points ˆ ˆ (t1 , b1 ), · · · , (tm , bm ) , m n. G. NAGY – LINEAR ALGEBRA December 8, 2009 211 (See Fig. 48 for an example in the case that the fitting curve is a parabola, n = 3.) Following Example 6.7.2, the least squares approximation means to find the numbers xn , · · · , x1 ∈ R ˆ ˆ m such that i=1 |∆bi |2 is the smallest possible, where ∆bi = bi − p(ti ) (n−1) ⇔ ∆bi = bi − (ˆn ti x + · · · + x1 ), ˆ i = 1, · · · , m. 2 p ( t ) = x 3 t + x2 t + x1 b (t ,b ) b i i i bi ti t Figure 48. Sketch of the best parabola p(t) = x3 t2 + x2 t + x1 fitting the ˆ ˆ ˆ set of points (ti , bi ), for i = 1, · · · , 10. Solution: We rewrite this problem as the least squares solution of an m × n linear system, which in general is inconsistent. We are interested to find xn , · · · , x1 solution of the linear ˆ ˆ system (n−1) (n−1) p(t1 ) = b1 xn = b1 ˆ x1 + · · · + t1 ˆ 1 · · · t1 x1 ˆ b1 . . . . = . . . . . . . . . ⇔ ⇔ . . .. . . p(tm ) = bm 1 x1 + · · · + t(n−1) xn = bm ˆ ˆ m Introducing the notation 1 . A = . . ··· 1 ··· (n−1) t1 x1 ˆ . ˆ = . , x . , . . . ··· (n−1) xn ˆ bm b1 . b = . , . xn ˆ tm (n−1) tm bm we are then interested in finding the solution ˆ of the m×n linear system Aˆ = b. Introducing x x also the vector ∆b 1 . ∆b = . , . ∆b m it is clear that Aˆ − b = ∆b, and so we obtain the important relation x m Aˆ − b x 2 = ∆b 2 = 2 ∆b i . i=1 m Therefore, the vector ˆ that minimizes the square of the deviation from the line, i=1 (∆bi )2 , x is precisely the same vector ˆ ∈ R2 that minimizes the number Aˆ − b 2 . We studied the x x 212 G. NAGY – LINEAR ALGEBRA december 8, 2009 latter problem at the begining of this Section. We called it a least squares problem, and the solution ˆ is the solution of the normal equation x AT A ˆ = AT b. x (6.21) It is simple to see that Eq. (6.21) is an n × n linear system, since (n−1) 1 ··· 1 1 · · · t1 . . . . , . . . AT A = . . . . . (n−1) (n−1) (n−1) t1 · · · tm 1 · · · tm 1 ··· 1 b1 . . . T . . . . A b= . . . (n−1) (n−1) bm t1 · · · tm We do not compute these expressions explicitly here. In the case that the columns of A form a linearly independent set, the solution ˆ to the normal equation is x ˆ = AT A x −1 AT b. The components of ˆ provide the parameters for the best polynomial fitting the data in least x squares sense. 6.7.2. Linear correlation. In statistics a correlation coefficient measures the departure of two random variables from independence. For centered data, that is, for data with zero average, the correlation coefficient can be viewed as the cosine of the angle in an abstract Rn space between two vectors constructed with the random variables data. We now define and find the correlation coefficient for two variables as given in Example 6.7.2. Once again, suppose that the result of an experiment is the following collection of ordered numbers (t1 , b1 ), · · · , (tm , bm ) , m 2. (6.22) Introduce the vectors e, t, b ∈ Rm as follows, 1 t1 . . e = . , t = . , . . 1 tm b1 . b = . . . 
bm Before introducing the correlation coefficient, let us use these vectors above to write down the least squares coefficients x found in Example 6.7.2. The matrix of coefficients can be written as A = [e, t], therefore, AT A = eT tT e, t = e·e t·e , e·t t·t AT b = eT tT b= e·b . t·b The least squares solution can be written as follows, 1 x1 ˆ t·t = x2 ˆ (e · e)(t · t) − (t · e)2 −e · t −e · t e·e e·b , t·b that is, (e · e)(t · b) − (e · t)(e · b) , (e · e)(t · t) − (t · e)2 Introduce the average values e·t , t= e·e ˆ2 = x x1 = ˆ b= (e · b)(t · t) − (e · t)(t · b) . (e · e)(t · t) − (t · e)2 e·b . e·e G. NAGY – LINEAR ALGEBRA December 8, 2009 213 These are indeed the average values of ti and bi , since m e · e = m, e·t= m ti , e·b= i=1 bi . i=1 ˆ Introduce the zero-average vectors ˆ = (t − t e) and b = (b − b e). The correlation coeffit cient of the data given in (6.22) is given by cor(t, b) = ˆ· b tˆ . ˆb tˆ Therefore, the correlation coefficient between the data vectors t and b is the angle between ˆ the zero-average vectors ˆ and b in Rm . t In order to understand what measures this angle, let us consider the case where all the ordered pairs in (6.22) lies on a line, that is, there exists a solution ˆ of the linear system x Aˆ = b (a solution, not only a least squares solution). In that case we have x x1 e + x2 t = b ˆ ˆ ⇒ x1 + x2 t = b, ˆ ˆ and this implies that x 2 ( t − t e) = ( b − b e) ˆ ⇔ x2 ˆ = b ˆt ˆ ⇔ cor(t, b) = 1. That is, in the case that t is linearly related to b we obtain that the zero-average vectors ˆ t ˆ and b are parallel, so the correlation coefficient is equal one. 6.7.3. QR-factorization. The Gram-Schmidt method can be used to factor any m × n matrix A into a product of an m × n matrix Q with orthonormal column vectors and an upper triangular n × n matrix R. We will see that the QR-factorization is useful to solve the normal equation associated to a least squares problem. Theorem 6.42. If the column vectors of matrix A ∈ Fm,n form a linearly independent set, then there exist matrices Q ∈ Fm,n and R ∈ Fn,n satisfying that Q∗ Q = In , matrix R is upper triangular with positive diagonal elements, and the following equation holds A = QR. Proof of Theorem 6.42: Use the Gram-Schmidt method to obtain an orthonormal set {q1 , · · · , qn } from the column vectors of the m × n matrix A = [A:1 , · · · , A:n ], that is, p1 p1 = A:1 q1 = , p1 p2 p2 = A:2 − A:2 · q1 q1 q2 = , p2 . . . . . . pn . pn = A:n − A:n · q1 q1 − · · · − A:n · qn−1 qn−1 qn = pn Define matrix Q = [q1 , · · · , qn ], which then satisfies the equation Q∗ Q = In . Notice that the equations above can be expressed as follows, A:1 = p1 q1 , A:2 = p2 q2 + q1 · A:2 q1 . . . A:n = pn qn + q1 · A:n q1 + q2 · A:n q2 + · · · + qn−1 · A:n qn−1 . 214 G. NAGY – LINEAR ALGEBRA december 8, 2009 After some time staring at the equations above, one can rewrite it as a matrix product p1 (q1 · A:2 ) · · · (q1 · A:n ) 0 p2 ··· (q2 · A:n ) . . . . . A:1 , · · · , A:n = q1 , · · · , qn . (6.23) . . . 0 0 · · · (qn−1 · A:n ) 0 0 ··· pn Define matrix R by equation above as the matrix satisfying A = QR. Then, Eq. (6.23) says that matrix R is n × n, upper triangular, with positive diagonal elements. This establishes the Theorem. 121 Example 6.7.4: Find the QR-factorization of matrix A = 1 1 1. 001 Solution: First use the Gram-Schmidt method to transform the column vectors of matrix A into an orthonormal set. This was done in Example 6.6.1. The result defines the matrix Q as follows √ 1 1 √ 0 2 2 1 1 Q = √2 − √2 0 . 
0 0 1 Having matrix A and Q, and knowing that Theorem 6.42 is true, then we can compute matrix R by the equation R = QT A. Since the column vectors of Q form an orthonormal set, we have that QT = Q, so matrix R is given by √ √ √ 1 1 3 √ 0 2 √2 2 121 2 2 1 1 1 √ R = QA = √2 − √2 0 1 1 1 ⇒ R = 0 0 . 2 001 0 0 1 0 0 1 The QR-factorization of matrix A is then given by √ √ 1 1 √ 0 2 2 2 1 1 A = √2 − √2 0 0 0 0 1 0 3 √ 2 1 √ 2 0 √ 2 0 . 1 The QR-factorization is useful to solve the normal equation in a least squares problem. Proposition 6.43. Assume that the matrix A ∈ Rm,n admits the QR-factorization A = QR. The vector ˆ ∈ Rn is solution of the normal equation AT A ˆ = AT b iff it is solution of x x Rˆ = QT b. x Proof of Proposition 6.43: Introduce the QR-factorization into the normal equation AT A ˆ = AT b as follows, x RT QT x QR ˆ = RT QT b ⇔ RT R ˆ = RT QT b x ⇔ RT R ˆ − QT b = 0. x Since R is a square, upper triangular matrix with non-zero coefficients, we conclude that R is invertible. Therefore, from the last equation above we conclude that ˆ is solution of the x normal equation iff holds Rˆ = QT b. x This establishes the Proposition. G. NAGY – LINEAR ALGEBRA December 8, 2009 215 Exercises. 6.7.1.- Consider the matrix A and the vector b given by 2 3 23 10 1 A = 40 1 5 , b = 41 5 . 11 0 (a) Find the least-squares solution ˆ to x the linear system Ax = b. (b) Verify that the solution ˆ satisfies x (Aˆ − b) ∈ R(A)⊥ . x 6.7.2.- Consider the matrix A and the vector b given by 2 3 23 2 2 1 A = 4 0 −15 , b = 4 1 5 . −2 0 −1 (a) Find the least-squares solution ˆ to x the linear system Ax = b. (b) Find the orthogonal projection of the source vector b onto the subspace R(A). 6.7.3.- Find all the least-squares solutions ˆ to the linear system Ax = b, where x 2 3 23 12 1 A = 42 4 5 , b = 41 5 . 36 1 6.7.4.- Find the best line in least-squares sense that fits the measurements, where t1 is the independent variable and bi is the dependent variable, t1 = − 2, b 1 = 4, t2 = − 1, b 2 = 3, t3 = 0 , b 3 = 1, t4 = 2 , b 4 = 0. 6.7.5.- Find the correlation coefficient corresponding to the measurements given in Exercise 6.7.4 above. 6.7.6.- Use Gram-Schmidt method on the columns of matrix A below to find its QR factorization, where 3 2 11 A = 42 3 5 . 21 6.7.7.- Find the QR factorization of matrix 2 3 110 A = 41 0 1 5 . 011 216 G. NAGY – LINEAR ALGEBRA december 8, 2009 Chapter 7. Normed spaces 7.1. The p-norm An inner product in a vector space always determines an inner product norm, which satisfies the properties (a)-(c) in Proposition 6.17. However, the inner product norm is not the only function V → R satisfying these properties. Definition 7.1. A norm on a vector space V over the field F is any function function : V → R satisfying the following properties, (a) (Positive definiteness) For all x ∈ V holds x 0, and x = 0 iff x = 0; (b) (Scaling) For all x ∈ V and all a ∈ F holds ax = |a| x ; (c) (Triangle inequality) For all x, y ∈ V holds x + y x + y. A normed space is a pair V , of a vector space with a norm. The inner product norm, x = x, x for all x ∈ V , defined in an inner product space V , , , is thus an obvious example of a norm, since it satisfies the properties (a)-(c) in Proposition 6.17, which are precisely the conditions given in Definition 7.1. It is important to notice that alternative norms exist on inner product spaces. Moreover, a norm can be introduced in a vector space without having an inner product structure. 
One particularly important example of the former case is given by the p-norms defined on V = Fn . Definition 7.2. The p-norm on the vector space Fn , with 1 n p : F → R defined as follows, x x p ∞ = |x1 |p + · · · + |xn |p 1/p , p ∞, is the function p ∈ [1, ∞), = max |x1 |, · · · , |xn | , (p = ∞), with x = [xi ], for i = 1, · · · , n, the vector components in the standard ordered basis of Fn . 1 /2 Since the dot product norm is given by x = |x1 |2 + · · · + |xn |2 , it is simple to see n that = 2 , that is, the case p = 2 coincides with the dot product norm on F . The most commonly used norms, besides p = 2, are the cases p = 1 and p = ∞, x 1 = |x1 | + · · · + |xn |, x ∞ = max |x1 |, · · · , |xn | . Theorem 7.3. For each value of p ∈ [1, ∞] the p-norm function in Definition 7.2 is a norm on Fn . p : Fn → R introduced The Theorem above states that the function p satisfies the properties (a)-(c) in Definition 7.1. Therefore, for each value p ∈ [1, ∞] the space Fn , p is a normed space. We will see at the end of this Section that in the case p ∈ [1, ∞] and p = 2 these norms are not inner product norms. In other words, for these values of p there is no inner product , p defined on Fn such that , p. p= In order to prove that the p-norm is indeed a norm, we first need to establish the following generalization of the Cauchy-Schwarz inequality, called H¨lder inequality. o Theorem 7.4. (H¨lder inequality) For all vectors x, y ∈ Fn and p ∈ [1, ∞) holds that o |x∗ y| x p y q, with 11 + = 1. pq Proof of Theorem 7.4: We first show that for all real numbers a aλ b1−λ (1 − λ ) b + λ a λ ∈ [0, 1]. (7.1) 0, b > 0 holds (7.2) G. NAGY – LINEAR ALGEBRA December 8, 2009 217 This inequality can be shown using the auxiliary function f (t) = (1 − λ) + λ t − tλ . Its derivative is f (t) = λ[1 − t(λ−1) ], so the function f satisfies f (1) = 0, f (t) > 0 for t > 1, f (t) < 0 for 0 t < 1. We conclude that f (t) 0 for t ∈ [0, ∞). Given two real numbers a proven that f (a/b) 0, that is, 0, b > 0, we have aλ a ⇔ aλ b1−λ (1 − λ) b + λ a. (1 − λ) + λ bλ b Having established Eq. (7.2), we use it to prove H¨lder inequality. Let x = [xi ], y = [yj ] ∈ Fn o be arbitrary vectors, where i, j = 1, · · · , n, and introduce the rescaled vectors x , xp ˆ= x y , yq ˆ= y 11 + = 1. pq These rescaled vectors satisfy that ˆ p = 1 and ˆ q = 1. Denoting ˆ = [ˆi ] and ˆ = [ˆi ], x y x x y y ˆ use the inequality in Eq. (7.2) in the case that a = xi , b = yi and λ = 1/p, as follows, ˆ |xi |p n p j =1 |xj | |xi yi | = ˆˆ |yi |q n q j =1 |yj | 1 p |yi |q n q j =1 |yj | + 1 1 1 ˆ |yi |q + |xi |p = ˆ ˆ y q p q q q |xi |p n p j =1 |xj | 1 p 1− 1 p 1 1− p 1 1 |yi |q + |xi |p . ˆ ˆ q p Adding up over all components, n |ˆ∗ ˆ| xy |xi yi | ˆˆ i=1 ˆ∗ ˆ Therefore, |x y| 1 ˆ x p + p p = 11 + = 1. q p 1, which is equivalent to y x∗ xp yq 1 ⇔ |x∗ y| x p y q. This establishes the Theorem. The H¨lder inequality plays an important role to show that the p-norms satisfy the o triangle inequality. We saw the same situation when the Cauchy-Schwarz inequality played a crucial role to prove that the inner product norm satisfied the triangle inequality. Proof of Theorem 7.3: We show the proof for p ∈ [1, ∞). The case p = ∞ is left as an 1 exercise. So, we assume that p ∈ [1, ∞), we introduce q by the equation p + 1 = 1. In order q to show that the p-norm is a norm we need to show that the p-norm satisfies the properties (a)-(c) in Definition 7.1. The first two properties a simple to prove. 
The p-norm is positive, since for all x ∈ Fn holds n x p |xi |p = 1 p n 0, |xi |p and 1 p =0 ⇔ |xi | = 0 ⇔ x = 0. i=1 i=1 The p-norm satisfies the scaling property, since for all x ∈ Fn and all a ∈ F holds n ax p |axi |p = i=1 1 p n |a|p |xi |p = i=1 1 p n = |a|p |xi |p i=1 1 p = |a| x p . 218 G. NAGY – LINEAR ALGEBRA december 8, 2009 The difficult part is to establish that the p-norm satisfies the triangle inequality. We start proving the following statement: For all real numbers a, b holds p |a + b |p p |a| |a + b| q + |b| |a + b| q . (7.3) This is indeed the case, since |a + b|p = |a + b| |a + b|p−1 , 1 1 p−1 =1− = q p p and p = p − 1, q ⇔ so we conclude that p p (|a| + |b|) |a + b| q , |a + b| = |a + b| |a + b| q which is the inequality in Eq. (7.3). This inequality will be used in the following calculation. Given arbitrary vectors x = [xi ], y = [yi ] ∈ Fn , for i = 1, · · · , n, the following inequalities hold n x+y p p n p |xi + yi |p = p |xi | |xi + yi | q + |yi | |xi + yi | q , i=1 (7.4) i=1 where Eq. (7.3) was used to obtain the last inequality. Now the H¨lder inequality implies o both n n p i=1 n i=1 n p i=1 1 q |xi + yi |p i=1 n 1 p |yi |p |yi | |xi + yi | q n 1 p |xi |p |xi | |xi + yi | q 1 q |xi + yi |p , . i=1 i=1 Inserting these expressions in Eq. (7.4) we obtain x+y Recalling that p q x+y p p x p x+y p q p +y p x+y p p q . = p − 1, we obtain the inequality p p x p +y x+y p (p−1) p ⇔ x+y This establishes the Theorem for p ∈ [1, ∞). 1 Example 7.1.1: Find the length of x = 2 in the norms −3 p x p + y p. 1, 2 and ∞. Solution: A simple calculations shows, x 1 = |1| + |2| + | − 3| = 6, x 2 = 12 + 22 + (−3)2 x ∞ 1 /2 = √ 14 = 3.74, = max |1|, |2|, | − 3| = 3. Notice that for the vector x above holds x ∞ x 2 x 1. (7.5) One can prove that the inequality in Eq. (7.5) holds for all x ∈ Fn . Example 7.1.2: Sketch on R2 the set of vectors Bp = x ∈ R2 : p = 1, p = 2, and p = ∞. x p = 1 for the cases G. NAGY – LINEAR ALGEBRA December 8, 2009 Solution: Recall we use the standard basis to express x = 219 x1 . We start with the set B2 , x2 which is the circle of radius one in Fig. 49, that is, (x1 )2 + (x2 )2 = 1. The set B1 is the square of side one given by |x1 | + |x2 | = 1. The sides are given by the lines ±x1 ± x2 = 1. See Fig. 49. The set B∞ is the square of side two given by max |x1 |, |x2 | = 1. The sides are given by the lines x1 = ±1 and x2 = ±1. See Fig. 49. x2 x2 x2 1 1 1 1 −1 x1 x1 1 −1 −1 −1 1 −1 x1 −1 Figure 49. Unit sets in R2 for the p-norms, with p = 1, 2, ∞, respectively. Example 7.1.3: Show that for every x ∈ R2 holds x ∞ x 2 x 1. Solution: Introducing the unit disks Dp = x ∈ R2 : x p 1 , with p = 1, 2, ∞, then Fig. 50 shows that D1 ⊂ D2 ⊂ D∞ . Let us choose y ∈ R2 such that y 2 = 1, that is, a vector on the circle, for example the vector given in second picture in Fig. 50. Since this vector is outside the disk D1 , that implies y 1 1, and since this vector is inside the disk D∞ , that implies y ∞ 1. The three conditions together say y ∞ y 2 y 1. The equal signs correspond to the cases where y is a horizontal or a vertical vector. Since any vector x ∈ R2 is a scaling of an appropriate vector y on the border of D2 , that is, x = c y, with 0 c ∈ R, then, multiplying the inequality above by c we obtain x ∞ x 2 x 1, ∀ x ∈ R2 . The p-norms can be defined on infinite dimensional vector spaces like function spaces. 220 G. NAGY – LINEAR ALGEBRA december 8, 2009 x2 x2 1 1 y 1 −1 x1 1 −1 −1 x1 −1 Figure 50. A comparison of the unit sets in R2 for the p-norms, with p = 1, 2, ∞. 
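The computations in Examples 7.1.1 and 7.1.3 are easy to reproduce numerically. The following sketch (Python with numpy; not part of the original notes) recomputes the three norms of the vector in Example 7.1.1 and tests the chain of inequalities in Eq. (7.5) on a batch of random vectors.

```python
import numpy as np

# A sketch (not from the notes): reproduce the norms of the vector in
# Example 7.1.1 and test ||x||_inf <= ||x||_2 <= ||x||_1 on random vectors.
x = np.array([1.0, 2.0, -3.0])
print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
# expected: 6.0  3.7416...  3.0

rng = np.random.default_rng(1)
X = rng.standard_normal((10000, 5))              # random vectors in R^5
n1 = np.linalg.norm(X, 1, axis=1)
n2 = np.linalg.norm(X, 2, axis=1)
ninf = np.linalg.norm(X, np.inf, axis=1)
print(bool(np.all(ninf <= n2 + 1e-12) and np.all(n2 <= n1 + 1e-12)))  # True
```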
Definition 7.5. The p-norm on the vector space V = C k ([a, b], R), with 1 function p : V → R defined as follows, b f p = |f (x)|p dx 1/p , p ∞, is the p ∈ [1, ∞), a f ∞ = max |f (x)|, (p = ∞). x∈[a,b] One can show that the p-norms introduced in Definition 7.5 are indeed norms on the vector space C k ([a, b], R). The proof of this statement follows the same ideas given in the proofs of Theorems 7.4 and 7.3 above, and we do not present it in these notes. Example 7.1.4: Consider the normed space C 0 ([−1, 1], R), find the p-norm of the element f (x) = x. p for any p ∈ [1, ∞] and Solution: In the case of p ∈ [1, ∞) we obtain f p p 1 = 1 |x|p dx = 2 −1 xp dx = 2 0 x(p+1) 1 2 = (p + 1) 0 p + 1 ⇒ f p = 2 p+1 1 p . In the case of p = ∞ we obtain f ∞ = max |x| x∈[−1,1] ⇒ f ∞ = 1. Relations between the p-norms of a given vector f analogous to those relations found in Example 7.1.3 are not longer true. The volume of the integration region, the interval [a, b], appears in any relation between p-norms of a fixed vector. We do not address such relations in these notes. 7.1.1. Not every norm is an inner product norm. Since every inner product in a vector space determines a norm, the inner product norm, it is natural to ask whether the converse property holds: Does every norm in a vector space determine an inner product? The answer is no. Only those norms satisfying an extra condition, called the parallelogram identity, define an inner product. G. NAGY – LINEAR ALGEBRA December 8, 2009 221 Definition 7.6. A norm in a vector space V satisfies the polarization identity iff for all vectors x, y ∈ V holds x+y 2 + x−y 2 =2 x 2 2 +y . The polarization identity (also referred as parallelogram identity) is a well-known property of the dot product norm, which is geometrically described in Fig. 51 in the case of the vector space R2 . It turns out that this property is crucial to determine whether a norm is an inner product norm for some inner product. x+y x x+y x−y x y y Figure 51. The polarization identity says that the sum of the squares of the diagonals in a parallelogram is twice the sum of the squares of the sides. Theorem 7.7. Given a normed space V , , the norm the norm satisfies the polarization identity. is an inner product norm iff It is not difficult to see that an inner product norm satisfies the polarization identity; see the first part in the proof below. It is rather involved to show the converse statement. If the norm satisfies the polarization identity and V is a real vector space, then one shows that the function , : V × V → R given by 1 x+y 2− x−y 2 4 is an inner product on V ; in the case that V is a complex vector space, then one shows that the function , : V × V → C given by x, y = 1 i x+y 2− x−y 2 + x + iy 2 − x − iy 2 4 4 is an inner product on V . Proof of Theorem 7.7: (⇒) Consider the inner product space V , , with inner product norm vectors x, y ∈ V holds x, y = x+y 2 = (x + y), (x + y) = x 2 + x, y + y, x + y 2 , x−y 2 = (x − y), (x − y) = x 2 . For all − x, y − y, x + y 2 . Adding both equations above up we obtain that x+y 2 + x−y 2 =2 x 2 2 +y . (⇐) We only give the proof for real vector spaces. The proof for complex vector spaces is left as an exercise. Consider the normed space V , and assume that V is a real vector space. In this case, introduce the function , : V × V → R as follows, x, y = 1 4 x+y 2 − x−y 2 . 222 G. NAGY – LINEAR ALGEBRA december 8, 2009 Notice that this function satisfies x, 0 = 1 x 2 − x 2 = 0 for all x ∈ V . 
We now show 4 that this function , is an inner product on V . It is positive definite, since 1 2x 2 = x 4 and the norm is positive definite, so we obtain that x, x = x, x 0, and x, x = 0 2 ⇔ x = 0. The function , is symmetric, since 1 1 x+y 2− x−y 2 = y + x 2 − y − x 2 = y, x . 4 4 The difficult part is to show that the function , is linear in the second argument. Here is where we need the polarization identity. We start with the following expressions, which are obtained from the polarization identity, x, y = x+y+z 2 + x+y−z 2 =2 x+y 2 + 2 z 2, x−y+z 2 + x−y−z 2 =2 x−y 2 + 2 z 2. If we subtract the second equation from the first one, and ordering the terms conveniently, we obtain, x + (y + z) 2 − x − (y + z) 2 + x + (y − z) 2 − x − (y − z) 2 =2 x+y 2 − x−y 2 which can be written in terms of the function , as follows, x, (y + z) + x, (y − z) = 2 x, y . (7.6) Several relations come from this equation. For the first relation, take y = z, and recall that x, 0 = 0, then we obtain x, 2y = 2 x, y . (7.7) The second relation derived from Eq. (7.6) is obtained renaming the vectors y + z = u and y − z = v, that is, (u + v) x, u + x, v = 2 x, = x, (u + v) , 2 where the equation on the far right comes from Eq. (7.7). We have shown that for all x, u, v ∈ V holds x, (u + v) = x, u + x, v which is a particular case of the linearity in the second argument property of an inner product. We only need to show that for all x, y ∈ V and all a ∈ R holds x, ay = a x, y . We have proven the case a = 2 in Eq. (7.7). The case a = n ∈ N is proven by induction: If x, ny = n x, y , then the same relation holds for (n + 1), since x, (n + 1)y = x, (ny + y) = x, ny + x, y = n x, y + x, y = (n + 1) x, y . The case a = 1/n with n ∈ N comes from the relation x, y = x, n 1 y = n x, y n n ⇒ x, 1 1 y= x, y . n n G. NAGY – LINEAR ALGEBRA December 8, 2009 223 These two cases show that for any rational number a = p/q ∈ Q holds p 1 p x, y = p x, y = x, y . (7.8) q q q Finally, the same property holds for all a ∈ R by the following continuity argument. Fix arbitrary vectors x, y ∈ V and define the function f : R → R by a → f (a) = x, ay . We left as an exercise to show that f is a continuous function. Now, using Eq. (7.8) we know that this function satisfies f (a) = a f (1) ∀ a ∈ Q. {ak }∞ k=1 Let ⊂ Q be a sequence of rational numbers that converges to a ∈ R. Since f is a continuous function we know that limk→∞ f (ak ) = f (a). Since the sequence is constructed with rational numbers, for every element in this sequence holds f (ak ) = ak f (1) which implies that lim f (ak ) = k→∞ lim ak f (1) = a f (1). k→∞ So we have shown that for all a ∈ R holds f (a) = a f (1), that is, x, ay = a x, y . Since x, y ∈ V are fixed but arbitrary, we have shown that the function , in linear in the second argument. We conclude that this function is an inner product on V . This establishes the Theorem in the case that V is a real vector space. Example 7.1.5: Show that for p = 2 the p-norm on the vector space Fn introduced in Definition 7.2 does not satisfy the polarization identity. Solution: Consider the vectors e1 and e2 , the first two columns of the identity matrix In . It is simple to compute, e1 + e2 2 p = 22/p , e1 − e2 2 p = 22/p , e1 2 p = e2 2 p =1 therefore, p+2 e1 + e2 2 + e1 − e2 2 = 2 p and 2 e1 2 + e2 2 = 4. p p p p We conclude that the p-norm satisfies the polarization identity only in the case p = 2. 7.1.2. Equivalent norms. 
We have seen that a norm in a vector space determines a notion of distance between vectors given by the norm distance introduced in Definition 6.18. The notion of distance is the structure needed to define the convergence of an infinite sequence of vectors. A sequence of vectors {xi }∞ in a normed space V , converges to a vector i=1 x ∈ V iff lim x − xk = 0, k→∞ that is, for every > 0 there exists n ∈ N such that for all k > n holds that x − xk < . With the notion of convergence of a sequence it is possible to introduce concepts like the continuous and differentiable functions defined on the vector space. Therefore, the calculus can be extended from Rn to any normed vector space. We have also seen that there is no unique norm in a vector space. This implies that there is no unique notion of norm distance. It is important to know whether two different norms provide the same notion of convergence of a sequence. By this we mean that every sequence that converges (diverges) with respect to one norm distance also converges (diverges) with 224 G. NAGY – LINEAR ALGEBRA december 8, 2009 respect to the other norm distance. In order to answer this question it is useful the following notion. Definition 7.8. The norms a and equivalent iff there exist real constants K kx defined on a vector space V are called to be k > 0 such that for all x ∈ V holds b x b K x b. a It is simple to see that if two norms defined on a vector space are equivalent, then they have the same notion of convergence. What it is non-trivial to prove is that the converse also holds. Since two norms have the same notion of convergence iff they are equivalent, it is important to know whether a vector space can have non-equivalent norms. The following result addresses the case of finite dimensional vector spaces. Theorem 7.9. If V is a finite dimensional, then all norms defined on V are equivalent. This result says that there is a unique notion of convergence in any finite dimensional vector space. So, functions that are continuous or differentiable with respect to one norm are also continuous or differentiable with respect to any other norm. This is not the case of infinite dimensional vector spaces. It is possible to find non-equivalent norms on infinite dimensional vector spaces. Therefore, functions defined on such a vector space can be continuous with respect to one norm and discontinuous with respect to the other norm. Proof of Theorem 7.9: Let a and b be two norms defined on a finite dimensional vector space V . Let dim V = n, and fix a basis V = {v1 , · · · , vn } of V . Then, any vector x ∈ V can be decomposed in terms of the basis vectors as x = x1 v1 + · · · + xn vn . Since of ( n i=1 is a norm, we can obtain the following bound on x |xi |) as follows. a x a = x1 v1 + · · · + xn vn |x1 | v1 a for every x ∈ V in terms a a + · · · + |xn | vn a n |x1 | + · · · + |xn | amax ⇒ x |xi | amax . a i=1 where amax = max{ v1 duce the set a, · · · , vn a }. A lower bound can also be found as follows. Intron S = [xi ] ∈ Fn : ˆ ⊂ Fn , |xi | = 1 ˆ i=1 and then introduce the function fa : S → R as follows n fa ([ˆi ]) = x xi vi ˆ i=1 a . Since function fa is a continuous and defined on a closed and bounded set of Fn , then fa has attains a maximum and a minimum values on S . We are here interested only in the minimum value. If [ˆi ] ∈ S is a point where fa takes its minimum value, let let us denote y fa ([ˆi ]) = amin . y G. NAGY – LINEAR ALGEBRA December 8, 2009 225 Since the norm a is positive, we know that amin > 0. 
The existence of this minimum value of fa implies the following bound on x a for all x ∈ V , namely, n x a = xi vi i=1 a n j =1 |xj | n n j =1 = |xj | i=1 n = xi vi n xi |xj | j =1 vi |xj | a n |xj | xi vi ˆ j =1 n = n j =1 i=1 n = a a i=1 n |xj | fa ([ˆi ]) x ⇒ x |xj | amin . a j =1 j =1 Summarizing, we have found real numbers amax ity holds for all x ∈ V , amin > 0 such that the following inequal- n amin n |xj | x |xj | amax . a j =1 j =1 Since no special property of norm a has been used, the same type of inequality holds for norm bmin > 0 such that the following b , that is, there exist real numbers bmax inequality holds for all x ∈ B , n n |xj | bmin x |xj | bmax . b j =1 j =1 These inequalities imply that norms x b amin bmax bmax amin bmax a and b n are equivalent, since n |xj | x |xj | amax a j =1 j =1 bmin bmin amax x b. bmin (Start reading the inequality form the center, at x a , and first see the inequalities to the right; then go to the center again and read the inequalities to the left.) Denoting k = amin /bmax and K = amax /bmin , we have obtained that kx This establishes the Theorem. b x a Kx b ∀ x ∈ V. 226 G. NAGY – LINEAR ALGEBRA december 8, 2009 Exercises. 7.1.1.- Consider the vector space C4 and for p = 1, 2, ∞ find the p-norms of the vectors 23 2 3 2 1+i 6 17 61 − i7 7 x = 6 7, y = 6 4− 4 5 4 1 5. −2 4i 7.1.2.- Determine which of the following functions : R2 → R defines a norm on R2 . We denote x = [xi ] the components of x in a standard basis of R2 . Justify your answers. (a) x = |x1 |; (b) x = |x1 + x2 |; (c) x = |x1 |2 + |x2 |2 ; (d) x = 2|x1 | + 3|x2 |. 7.1.3.- True or false? Justify your answer: If a and b are two norms on a vector space V , then defined as x=x a +x b ∀x∈V is also a norm on V . 7.1.4.- Consider the space P2 ([0, 1]) with the p-norm “Z 1 ”1 p q p= |q(x)|p dx , 0 for p ∈ [1, ∞). Find the p-norm of the vector q(x) = −3x2 . Also find the supremum norm of q, defined as q ∞ = max |q(x)|. x∈[0,1] G. NAGY – LINEAR ALGEBRA December 8, 2009 227 7.2. Operator norms We have seen that the space of all linear transformations L(V, W ) between the vector spaces V and W is itself a vector space. As in any vector space, it is possible to define norm functions : L(V, W ) → R on the vector space L(V, W ). For example, We will see that in the case where V = Fn and W = Fm , one of such norms is the Frobenius norm, which is the inner product norm corresponding to the Frobenius inner product introduced in Section 6.2. More precisely, the Frobenius norm is given by A F F = = A, A ∀ A ∈ Fm,n . F It is simple to see that m A n |Aij |2 tr A∗ A = 1 2 . i=1 j =1 However, in the case where V and W are normed spaces with norms v and w , there exists a particular norm on L(V, W ) which is induced from the norms on V and W . This induced norm can be defined on elements L(V, W ) by recalling that the element T ∈ L(V, W ) as a function T : V → W . Definition 7.10. Let V , and W, be finite dimensional normed spaces. The v w induced norm on the space L(V, W ) of all linear transformations T : V → W is a function : L(V, W ) → R given by T = max T(x) w . x v =1 In the particular case that V = W and v= called the operator norm, and is given by w = , the induced norm on L(V ) is T = max T(x) . x =1 The definition above says that given arbitrary norms on the vector spaces V and W , they induce a norm on the vector space L(V, W ) of linear transformations T : V → W . 
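One crude but concrete way to read Definition 7.10 is to approximate the maximum directly: sample many unit vectors x, apply the transformation, and record the largest norm of the image. The sketch below (Python with numpy; not part of the original notes) does this for the 2-norms on R^3 and R^2, using the matrix of Example 7.2.3 below, and compares the estimate with the exact value √3 obtained there.

```python
import numpy as np

# A crude numerical illustration of Definition 7.10 (a sketch, not part of the
# notes): estimate max over ||x||_2 = 1 of ||A x||_2 by sampling unit vectors,
# and compare with numpy's spectral norm. The matrix is the one of
# Example 7.2.3 below, whose induced 2-norm turns out to be sqrt(3).
def induced_norm_estimate(A, samples=50000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((samples, A.shape[1]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # random points on the unit sphere
    return float(np.max(np.linalg.norm(X @ A.T, axis=1)))

A = np.array([[1., 0., 1.],
              [0., 1., -1.]])
print(induced_norm_estimate(A))      # close to sqrt(3) ~ 1.732 (from below)
print(np.linalg.norm(A, 2))          # largest singular value: sqrt(3)
```

A random scan of this kind can only approach the maximum from below; Proposition 7.12 below shows how to compute it exactly when both norms come from dot products.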
A particular case of this definition is when V = Fn and W = Fm , we fix standard bases on V and W , and we introduce p-norms on these spaces. So, the normed spaces Fn , p and Fm , q induce a norm on the vector space Fm,n as follows. Definition 7.11. Consider the normed spaces Fn , p and Fm , The induced (p, q )-norm on the space L(Fn , Fm ) is the function given by T p,q = max T(x) q ∀ T ∈ L(Fn , Fm ). x q , with p, q ∈ [1, ∞]. : L(Fn , Fm ) → R p,q p =1 In the particular case of p = q we denote p,p = p . In the case p = q and n = m the induced norm on L(Fn ) is called the p-operator norm and is given by T p = max x p =1 T(x) p ∀ T ∈ L(Fn ). n In the case p = q above we use the same notation p for the p-norm on F , the (p, p)n m n norm on L(F , F ) and the p-operator nor in L(F ). The context should help to decide which norm we use on every particular situation. Example 7.2.1: Consider the particular case V = W = R2 with standard ordered bases and the p = 2 norms in both, V and W . In this case, the space L(R2 ) can be identified with 228 G. NAGY – LINEAR ALGEBRA december 8, 2009 the space R2,2 of all 2 × 2 matrices. The induced norm on R2,2 , denoted as the operator norm, is the following: A 2 = max x ∀ A ∈ R2,2 . Ax 2 , 2 =1 ||2 and called The meaning of this norm is deeply related with the interpretation of the matrix A as a linear operator A : R2 → R2 . Suppose that the action of the operator A on the unit circle (B2 in the notation of Example 7.1.2) is given in Fig. 52. Then the value of the operator norm A 2 is the size measured in the 2-norm of the maximum deformation by A of the unit circle. x A 2 x2 || A x1 1 || 2 1 x1 Figure 52. Geometrical meaning of the operator norm on R2,2 induced by the 2-norm on R2 . Example 7.2.2: Consider the normed space R2 , A11 0 A= 0 A22 with 2 , and let A : R2 → R2 be the matrix |A11 | = |A22 |. Find the 2-operator norm induced on A. Solution: Since the norm on R2 is the 2-norm, the induced norm on A is given by A 2 = max x 2 =1 Ax 2 . We need to find the maximum of Ax 2 among all x subject to the constraint x 2 = 1. So, this is a constrained maximization problem, that is, a maxima-minima problem where the variable x is restricted by a constraint equation. In general, this type of problems can be solved using the Lagrange multipliers method. Since this example is simple enough, we solve it in a simpler way. We first solve the constraint equation on x, and we then find the maxima of Ax 2 among these solutions only. The general solution of the equation x 2 =1 ⇔ (x1 )2 + x2 )2 = 1 is given by x(θ) = cos(θ) sin(θ) Introduce this general solution into Ax Ax 2 = 2 with θ ∈ [0, 2π ). we obtain (A11 )2 cos2 (θ) + (A22 )2 sin2 (θ). G. NAGY – LINEAR ALGEBRA December 8, 2009 229 Since the maximum in θ of the function Ax 2 is the same as the maximum of Ax 2 , we 2 need to find the maximum on θ of the function f (θ) = Ax(θ) 2 , that is, 2 f (θ) = (A11 )2 cos2 (θ) + (A22 )2 sin2 (θ). The solution is simple, find θ solutions of df f (θ) = (θ) = 0 ⇒ 2 −(A11 )2 + (A22 )2 sin(θ) cos(θ) = 0. dθ Since we assumed that |A11 | = |A22 |, then sin(θ) = 0 θ1 = 0, θ2 = π, π 3π cos(θ) = 0 ⇒ θ3 = , θ4 = . 2 2 We obtained four solutions for θ. We evaluate f at these solutions, π 3π f (0) = f (π ) = (A11 )2 , f =f = (A22 )2 . 2 2 Recalling that Ax(θ) 2 = f (θ), we obtain A 2 ⇒ = max |A11 |, |A22 | . It was mentioned in Example 7.2.2 above that finding the operator norm requires solving a constrained maximization problem. 
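The conclusion of Example 7.2.2 can also be verified numerically, by scanning the unit circle instead of solving the maximization problem by hand. The sketch below (Python with numpy; not part of the original notes, and the diagonal entries are arbitrary illustrative values) compares the scan with numpy's spectral norm and with max(|A11|, |A22|).

```python
import numpy as np

# Numerical check of Example 7.2.2 (a sketch, not part of the notes): for a
# diagonal matrix the 2-operator norm should be max(|A11|, |A22|). We scan the
# unit circle x(t) = (cos t, sin t) and compare with numpy's spectral norm.
A = np.diag([3.0, -1.0])                        # illustrative entries, |A11| != |A22|
t = np.linspace(0.0, 2.0 * np.pi, 100001)
X = np.vstack([np.cos(t), np.sin(t)])           # columns are the unit vectors x(t)
scan = np.max(np.linalg.norm(A @ X, axis=0))    # max over t of ||A x(t)||_2
print(scan, np.linalg.norm(A, 2), max(abs(A[0, 0]), abs(A[1, 1])))   # all ~ 3.0
```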
In the case that the operator norm is induced from a dot product norm, the constrained maximization problem can be solved in an explicit form. Proposition 7.12. Consider the vector spaces Rn and Rm with inner product given by the n m dot products, inner product norms 2 , and the vector space L(R , R ) with the induced 2-norm. A 2 = max Ax 2 ∀ A ∈ L(Rn , Rm ). x 2 =1 Introduce the scalars λi ∈ R, with i = 1, · · · , k n, as all the roots of the polynomial T p(λ) = det A A − λ In . Then, all scalars λi are non-negative real numbers and the induced 2-norm of the transformation A is given by A 2 = max λ1 , · · · , λk . Proof of Proposition 7.12: Introduce the functions f : Rm → R and g : Rn → R as follows, f (x) = Ax 2 = Ax · Ax = xT AT Ax, g (x) = x 2 = x · x = xT x. 2 2 To find the induced norm of T is then equivalent to solve the constrained maximization problem: Find the maximum of f (x) for x ∈ Rn subject to the constraint g (x) = 1. The vectors x that provide solutions to the constrained maximization problem must we solutions of the Euler-Lagrange equations f =λ g (7.9) where λ ∈ R, and we introduced the gradient row vectors f= ∂f ∂x1 ··· ∂f , ∂xn f= ∂g ∂x1 ··· ∂g . ∂xn In order to understand why the solution x must satisfy the Euler-Lagrange equations above we need to recall two properties of the gradient vector. First, the gradient of a function f : Rn → R is a vector that determines the direction on Rn where f has the maximum 230 G. NAGY – LINEAR ALGEBRA december 8, 2009 increase. Second, which is deeply related to the first property, the gradient vector of a function is perpendicular to the surfaces where the function has a constant value. The surfaces of constant value of a function are called level surfaces of the function. Therefore, the function f has a maximum or minimum value at x on the constraint level surface g = 1 if f is perpendicular to the level surface g = 1 at that x. (Proof: Suppose that at a particular x on the constraint surface g = 1 the projection of f onto the constraint surface is nonzero; then the values of f increase along that direction on the constraint surface; this means that f does not attain a maximum value at that x on g = 1.) We conclude that both gradients f and g are parallel, which is precisely what Eq. (7.9) says. In our particular problem we obtain for f and g the following: f (x) = xT AT Ax ⇒ f (x) = 2xT AT A, g (x) = xT x ⇒ g (x) = 2xT . We must look for x = 0 solution of the equation xT AT A = λ xT ⇔ AT Ax = λ x, where the condition x = 0 comes from x 2 = 1. Therefore, λ must not be any scalar but the precise scalar or scalars such that the matrix (AT A − λ In ) is not invertible. An equivalent condition is that p(λ) = det AT A − λ In = 0. The function p is a polynomial of degree n in λ so it has at most n real roots. Let us denote these roots by λ1 , · · · , λk , with 1 k n. For each of these values λi the matrix AT A − λi In is not invertible, so N (AT A − λi In ) is non-trivial. Let xi be any element in N (AT A − λi In ), that is, AT Axi = λi xi , i = 1, · · · , k. At this point it is not difficult to see that λi 0 for i = 0, · · · , k . Indeed, multiply the equation above by xT , that is, i xT AT Axi = λi xT xi . i i Since both xT AT Axi and xT xi are non-negative numbers, so is λi . Returning to xi , only i i these vectors are the candidates to find a solution to our maximization problem, and f has the values f (xi ) = xT AT Axi = λi xT xi = λi i i ⇒ f (xi ) = λi , i = 1, · · · , k. 
The induced 2-norm of A is the maximum of these scalar λi . This establishes the Proposition. Example 7.2.3: Consider the vector spaces R3 and R2 with the dot product. Find the induced 2-norm of the 2 × 3 matrix A= 10 01 1 . −1 Solution: Following the Proposition 7.12 the value of the induced norm A imum of the λi roots of the polynomial p(λ) = det AT A − λ I3 = 0. We start computing 1 AT A = 0 1 0 1 1 0 −1 0 1 1 1 = 0 −1 1 0 1 −1 1 −1 . 2 2 is the max- G. NAGY – LINEAR ALGEBRA December 8, 2009 231 The next step is to find the polynomial p(λ) = (1 − λ ) 0 0 (1 − λ) 1 −1 1 −1 = (1 − λ) (1 − λ)(2 − λ) − 1 + (1 − λ), (2 − λ) therefore, we obtain p(λ) = −λ(λ − 1)(λ − 3). We have three roots λ1 = 0 , λ2 = 1 , λ 3 = 3. We then conclude that the induced norm of A is given by A 2 = 3. In the case that the norm in a normed space is not an inner product norm the induced norm on linear operators is not simple to evaluate. The constrained maximization problem is in general complicated to solve. Two particular cases can be solved explicitly though, when the operator norm is induced from the p-norms with p = 1 and p = ∞. Proposition 7.13. Consider the normed spaces Rn , space L(Rn , Rm ) with the induced p-norm. A p = max x A(x) p =1 p and Rm , p and the vector p ∀ A ∈ L(Rn , Rm ). If p = 1 or p = ∞, then the following formulas hold, respectively, m A 1 = n max j ∈{1,··· ,n} |Aij |, A ∞ = max i∈{1,··· ,m} i=1 |Aij |. j =1 Proof of proposition 7.13: From the definition of the induced p-norm for p = 1, A 1 = max x Ax 1 . 1 =1 m From the p-norm on Rm we know that Ax 1 n = Aij xj , therefore, i=1 j =1 m Ax n n m |Aij | |xj | = 1 i=1 j =1 and introducing the condition x |xj | j =1 1 n |Aij | m |xj | i=1 j =1 max j ∈{1,··· ,n} |Aij | ; i=1 = 1 we obtain the inequality m Ax max 1 j ∈{1,··· ,n} |Aij |. i=1 Recalling the column vector notation A = A:1 , · · · , A:n , we notice that A:j so the inequality above can be expressed as Ax max 1 j ∈{1,··· ,n} A:j 1 = m i=1 |Aij |, 1. Since the left hand side is independent of x, the inequality also holds for the maximum in x 1 = 1, that is, A1 max A:j 1 . j ∈{1,··· ,n} It is now clear that the equality has to be achieved, since the Ax 1 in the case x = ej , with ej a standard basis vector, takes the value Aej 1 = A:j 1 . Therefore, A 1 = max j ∈{1,··· ,n} A:j 1. 232 G. NAGY – LINEAR ALGEBRA december 8, 2009 The second part of Proposition 7.13 can be proven as follows. From the definition of the induced p-norm for p = ∞ we know that A ∞ = max x Ax ∞ =1 ∞. From the p-norm on Rm we see n Ax ∞ = i∈{1,··· ,m} n n Aij xj max max i∈{1,··· ,m} j =1 |Aij | |xj | |Aij |. max i∈{1,··· ,m} j =1 j =1 Since the far right hand side does not depend on x, the inequality must hold for the maximum in x with x ∞ = 1, so we conclude that n A max ∞ i∈{1,··· ,m} |Aij |. j =1 As in the previous p = 1 case, the value on the right hand side above is achieved for Ax ∞ in the case of x with components ±1 depending on the sign of Aij . More precisely, choose x as follows, 1 xj = −1 n 0, if Aij ⇒ if Aij < 0, n Aij xj = j =1 |Aij |, i = 1, · · · , m. j =1 n Therefore, for that x we have x ∞ = 1 and Ax ∞ = max i∈{1,··· ,m} |Aij |. So, we conclude j =1 n A ∞ = max i∈{1,··· ,m} |Aij |. j =1 This establishes the Proposition Example 7.2.4: Find the induced p-norm, where p = 1, ∞, for the 2 × 3 matrix 14 21 A= 1 . −1 Solution: Proposition 7.13 says that A 1 is the largest absolute value sum of components among columns of A, while A ∞ is the largest absolute value sum among rows of A. 
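These two formulas are easy to cross-check numerically before carrying out the computation by hand, which follows below. The sketch (Python with numpy; not part of the original notes) applies them to the matrix of Example 7.2.4.

```python
import numpy as np

# Quick cross-check of the two formulas just quoted (a sketch, not part of the
# notes); the hand computation for the matrix of Example 7.2.4 follows below.
A = np.array([[1., 4., 1.],
              [2., 1., -1.]])
print(np.linalg.norm(A, 1))        # largest absolute column sum -> 5.0
print(np.linalg.norm(A, np.inf))   # largest absolute row sum    -> 6.0
```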
In the first case we have: 2 A since 1 = max j =1,2,3 2 2 |Ai1 | = 3, 1 2 |Ai2 | = 5, i=1 therefore, A |Aij |; i=1 i=1 | Ai 3 | = 2 , i=1 = 5. In the second case e have: 2 A since ∞ = max 3 i=1,2 |Aij |; j =1 3 |A1j | = 6, j =1 therefore, A ∞ = 6. |A2j | = 4, j =1 G. NAGY – LINEAR ALGEBRA December 8, 2009 233 Exercises. 7.2.1.- Evaluate the induced p-norm, where p = 1, 2, ∞, for the matrices 2 3 » – 010 1 −2 A= , B = 40 0 1 5 . −1 2 100 ` ´ 7.2.2.- In the normed space R2 , 2 , find 2 2 the induced norm of A : R → R » – 1 3 √1 − . A= √ 8 30 ` ´ 7.2.3.- Consider the space Fn , and p the space Fn,n with the induced norm n,n and p . Prove that for all A, B ∈ F all x ∈ Fn holds (a) Ax p A p x p; (b) AB p A p B p. 234 G. NAGY – LINEAR ALGEBRA december 8, 2009 7.3. Condition numbers In Sect. 1.7 we discussed several types of approximations errors that appear when solving an m × n linear system in a floating-point number set using rounding. In this Section we discuss a particular type of square linear systems with unique solutions that are greatly affected by small changes in the coefficients of their augmented matrices. We will call such systems ill-conditioned. When an ill-conditioned n × n linear system is solved on a floatingpoint number set using rounding, a small rounding error in the coefficients of the system may produce an approximate solution that differs significantly from the exact solution. Definition 7.14. An n × n linear system having a unique solution is ill-conditioned iff a 1% perturbation in a coefficient of its augmented matrix produces a perturbed linear system still having a unique solution which differs from the unperturbed solution in a 100% or more. We remark that the choice of the values 1% and 100% in the above definition is not a standard choice in the literature. While these values may change on different books, the idea behind the definition of an ill-conditioned system is still the same, that a small change in a coefficient of the linear system produces a big change in its solution. We also remark that our definition of an ill-conditioned system applies only to square linear system having a unique solution, and such that the perturbed linear system also has a unique solution. The concept of ill-conditioned system can be generalized to other linear systems, but we do not study those cases here. The following example gives some insight to understand what causes a 2 × 2 system to be ill-conditioned. Example 7.3.1: It is not difficult to understand when a 2 × 2 linear system is ill-conditioned. The solution of a 2 × 2 linear system can be thought as the intersection of two lines on the plane, where each line represents the solution of each equation of the system. A 2 × 2 linear system is ill-conditioned when these intersecting lines are almost parallel. Then, a small change in the coefficients of the system produces a small change in the lines representing the solution of each equation. Since the lines are almost parallel, this small change on the lines may produce a large change of the intersection point. This situation is sketched on Fig. 53. x2 x2 x1 x1 Figure 53. The intersection point of almost parallel lines represents a solution of an ill-conditioned 2 × 2 linear system. A small perturbation in the coefficients of the system produces a small change in the lines, which in turn produces a large change in the solution of the system. Example 7.3.2: Show that the following 2 × 2 linear system is ill-conditioned: 0.835 x1 + 0.667 x2 = 0.168, 0.333 x1 + 0.266 x2 = 0.067. 
Solution: We first note that the solution to the linear system above is x1 = 1, x2 = −1. In order to show that the system is ill-conditioned we only need to find one coefficient such that a small change in that coefficient produces a large change in the solution. Consider the following change in the second source coefficient: 0.067 → 0.066. One can check that the solution to the new system

0.835 x1 + 0.667 x2 = 0.168,
0.333 x1 + 0.266 x2 = 0.066,

is given by x1 = −666, x2 = 834. We then conclude that the system is ill-conditioned.

Summarizing, solving a linear system in a floating-point number set using rounding introduces approximation errors in the coefficients of the system. The modified Gauss-Jordan method on the floating-point number set also introduces approximation errors in the solution of the system. Both types of approximation errors can be controlled, that is, kept small, by choosing a particular scheme of Gauss operations, for example partial pivoting or complete pivoting. However, if the original linear system we solve is ill-conditioned, then even very small approximation errors in the system coefficients and in the Gauss operations may result in a huge error in the solution. Therefore, it is important to avoid solving ill-conditioned systems when approximation errors are unavoidable. How to handle such situations is an important research area in numerical analysis.

Exercises.

7.3.1.- Consider the ill-conditioned system from Example 7.3.2,
0.835 x1 + 0.667 x2 = 0.168,
0.333 x1 + 0.266 x2 = 0.067.
(a) Solve this system in F5,10,6 without scaling, partial or complete pivoting.
(b) Solve this system in F6,10,6 without scaling, partial or complete pivoting.
(c) Compare the results in (a) and (b) with the result in R.

7.3.2.- Perturb the ill-conditioned system given in Exc. 1 as follows,
0.835 x1 + 0.667 x2 = 0.1669995,
0.333 x1 + 0.266 x2 = 0.066601.
Find the solution of this system in R and compare it with the solution in Exc. 1.

7.3.3.- Find the solution in R of the following system,
8 x1 + 5 x2 + 2 x3 = 15,
21 x1 + 19 x2 + 16 x3 = 56,
39 x1 + 48 x2 + 53 x3 = 140.
Then change 15 to 14 in the first equation and solve it again in R. Is this system ill-conditioned?

Chapter 8. Spectral decomposition

In Sect. 5.2 we discussed the matrix representation of linear transformations. We saw that this representation is basis-dependent. The matrix of a linear transformation can be complicated in one basis and simple in another basis. In the case of linear operators there sometimes exists a special basis where the operator matrix is the simplest possible: a diagonal matrix. In this Chapter we study those linear operators having a diagonal matrix representation, called diagonalizable operators. We start by introducing the eigenvalues and eigenvectors of a linear operator. Later on we present the main result, the Spectral Theorem for normal operators. We use this result to define functions of operators. We finally mention how to apply these ideas to find solutions to a linear system of ordinary differential equations.

8.1. Eigenvalues and eigenvectors

Sometimes a linear operator has the following property: the image under the operator of a particular line in the vector space is again the same line.
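This line-invariance is easy to observe numerically. The following short sketch (assuming Python with NumPy; the matrix is a hypothetical example chosen for illustration) shows a matrix that sends every vector on the line spanned by (1, 1) back onto that same line.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # hypothetical example matrix

v = np.array([1.0, 1.0])            # direction vector of the line x1 = x2
for c in [-2.0, 0.5, 3.0]:
    image = T @ (c * v)             # image of a point on the line
    print(image)                    # always 3*c*v, again a point on the line
```

Every non-zero multiple of (1, 1) is mapped to another multiple of (1, 1), so the line is left invariant by T.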
In that case we call the line special (eigen, in German), and any non-zero vector in that line is also special and is called an eigenvector. Definition 8.1. Let V be a finite dimensional vector space over the field F. The scalar λ ∈ F and the non-zero vector x ∈ V are called eigenvalue and eigenvector of the linear operator T ∈ L(V ) iff holds T(x) = λ x. (8.1) The set σT ⊂ F of all eigenvalues of the operator T is called the spectrum of T. The subspace Eλ = N (T − λ I) ⊂ V , the null space of the operator (T − λ I), is called the eigenspace of T corresponding to λ. An eigenvector of a linear operator T : V → V is a vector that remain invariant except by scaling under the action of T. The change in scaling of the eigenvector x under T determines the eigenvalue λ. Since the operator T is linear, given an eigenvector x and any non-zero scalar a ∈ F, the vector ax is also an eigenvector. (Proof: T(ax) = aT(x) = λ(ax).) The elements of the eigenspace Eλ are all eigenvectors with eigenvalue λ and the zero vector. Indeed, a vector x ∈ Eλ iff holds (T − λ I)(x) = 0, where I ∈ L(V ) is the identity operator, and this equation implies that x = 0 or T(x) = λ x. Eigenvalues and eigenvectors are notions defined on an operator, independently of any basis on the vector space. Given a particular ordered basis V in the vector space V , and denoting xv ∈ Fn and Tvv ∈ F n,n the components of an eigenvector and the operator T in that basis, respectively, then Eq. (8.1) has the form Tvv xv = λ xv . (8.2) The property that eigenvalues and eigenvectors are notions defined independently of a basis is the reason why Eq. (8.2) is invariant under similarity transformations of the matrix Tvv . ˜ Indeed, given any other ordered basis V of the vector space V , denote the change of basis matrix P = Ivv . Then, multiply Eq. (8.2) by P −1 , that is, ˜ P −1 Tvv xv = λ P −1 xv ⇔ (P −1 Tvv P )(P −1 xv ) = λ (P −1 xv ); using the change of basis formulas Tvv = P −1 Tvv P and xv = P −1 xv we conclude that ˜˜ ˜ Tvv xv = λ xv . ˜˜ ˜ ˜ Therefore, the eigenvalues and eigenvectors of a linear operator are the eigenvalues and eigenvectors of any matrix representation of the operator on any ordered basis of the vector space. 238 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 8.1.1: Consider the vector space R2 values and eigenvectors of the linear operator basis given by 0 T= 1 with the standard basis S . Find the eigenT : R2 → R2 with matrix in the standard 1 . 0 Solution: This operator makes a reflection along the line x1 = x2 , that is, Tx = 0 1 1 0 x1 x = 2. x2 x1 1 is left invariant 1 From this definition we see that any non-zero vector proportional to v1 = by T, that is, 01 1 1 = ⇔ T(v1 ) = v1 . 10 1 1 So we conclude that v1 is an eigenvector of T with eigenvalue λ1 = 1. Analogously, one can −1 check that the vector v2 = satisfies the equation 1 −1 1 −1 =− = 1 −1 1 01 10 ⇔ T(v2 ) = −v2 . So we conclude that v2 is an eigenvector of T with eigenvalue λ2 = −1. See Fig. 54. x x x1 = x 2 2 x 1= x 2 2 T(u) v2 1 T ( v1 ) = v 1 1 u −1 x1 −1 −1 x1 −1 v T ( v2 ) = − v 2 T(v) Figure 54. On the first picture we sketch the action of matrix T in Example 8.1.1, and on the second picture, and we sketch the eigenvectors v1 and v2 with eigenvalues λ1 = 1 and λ2 = −1, respectively. In this example the spectrum is σT = {1, −1} and the respective eigenspaces are E1 = Span 1 1 , E−1 = Span −1 1 . Example 8.1.2: Not every linear operator has eigenvalues and eigenvectors. Consider the vector space R2 with standard basis and fix θ ∈ (0, π ). 
The linear operator T : R2 → R2 with matrix cos(θ) − sin(θ) T= sin(θ) cos(θ) G. NAGY – LINEAR ALGEBRA December 8, 2009 239 acts on the plane rotating every vector by an angle θ counterclockwise. Since θ ∈ (0, π ), there is no line on the plane left invariant by the rotation. Therefore, this operator has no eigenvalues and eigenvectors. Example 8.1.3: Consider the vector space R2 with standard basis. Show that the linear 13 operator T : R2 → R2 with matrix T = has the eigenvalues and eigenvectors 31 v1 = 1 , 1 λ1 = 4 , and v2 = 1 , −1 λ2 = −2. Solution: We only verify that Eq. (8.1) holds for the vectors and scalars above, since Tv1 = Tv2 = 1 3 3 1 13 31 1 4 1 = =4 = λ1 v1 ; 1 4 1 1 −2 1 = = −2 = λ2 v2 . −1 2 −1 In this example the spectrum is σT = {4, −2} and the respective eigenspaces are E4 = Span 1 1 , E−2 = Span 1 −1 . Example 8.1.4: Consider the vector space V = C ∞ (R, R). Show that the vector f (x) = eax , with a = 0, is an eigenvector with eigenvalue a of the linear operator D : V → V , given by df D(f ) = . dx Solution: The proof is straightforward, since D(f )(x) = df (x) = aeax = a f (x). dx Example 8.1.5: Consider the vector space V = C ∞ (R, R). Show that the vector f (x) = cos(ax), with a = 0, is an eigenvector with eigenvalue −a2 of the linear operator T : V → V , d2 f given by T(f ) = . dx2 Solution: Again, the proof is straightforward, since T(f )(x) = d2 f (x) = −a2 cos(ax) = −a2 f (x). dx2 We know that an eigenspace of a linear operator is not only a subset of the vector space, it is a subspace. Moreover, it is not any subspace, it is an invariant subspace under the linear operator. Given a vector space V and a linear operator T ∈ L(V ), the subspace W ⊂ V is invariant under T iff holds T(W ) ⊂ W . Proposition 8.2. The eigenspace Eλ of the linear operator T ∈ L(V ) corresponding to the eigenvalue λ is an invariant subspace of the vector space V under the operator T. 240 G. NAGY – LINEAR ALGEBRA december 8, 2009 Proof of Proposition 8.2: Let x, y ∈ Eλ , that is, T(x) = λ x and T(y) = λ y. Then, for all a, b ∈ F holds, T(ax + by) = a T(x) + b T(y) = aλ x + bλ y = λ (ax + by) ⇒ (ax + by) ∈ Eλ . This shows that Eλ is a subspace. Since λ (ax + by) = T(ax + by) ∈ Eλ , this also shows that T (Eλ ) ⊂ Eλ . This establishes the Proposition. 8.1.1. Characteristic polynomial. We now address the eigenvalue-eigenvector problem: Given a finite-dimensional vector space V over F and a linear operator T ∈ L(V ), find a scalar λ ∈ F and a non-zero vector x ∈ V solution of T(x) = λ x. This problem is more complicated than solving a linear system of equations T(x) = b, since in our case the source vector is b = λx, which is not a given but part of the unknowns. One way to solve the eigenvalue-eigenvector problem is to first solve for the eigenvalues λ and then solve for the eigenvectors x. Theorem 8.3. Let V be a finite-dimensional vector space over F and let T ∈ L(V ). (a) The scalar λ ∈ F is an eigenvalue of T iff λ is solution of the equation det(T − λ I) = 0. (b) Given λ ∈ F eigenvalue of T, the corresponding eigenvectors x ∈ V are the non-zero solutions of the equation (T − λ I)(x) = 0. Recall that a determinant of a linear operator on a finite-dimensional vector space V was introduced in Def. 5.18 as the determinant of its associated matrix in any ordered basis of V . This definition is independent of the basis chosen in V , since the operator matrix transforms by a similarity transformation under a change of basis and the determinant is invariant under similarity transformations. 
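Theorem 8.3 is also the basis of the numerical computation of eigenvalues. A minimal sketch (assuming Python with NumPy) applied to the matrix of Example 8.1.3: the roots of det(T − λ I) are 4 and −2, and a library eigensolver returns both parts of the problem, eigenvalues and eigenvectors, at once.

```python
import numpy as np

T = np.array([[1.0, 3.0],
              [3.0, 1.0]])                 # matrix of Example 8.1.3

# Part (a): the eigenvalues are the roots of det(T - lambda I).
coeffs = np.poly(T)                        # [1., -2., -8.]: lambda^2 - 2*lambda - 8
print(np.roots(coeffs))                    # 4 and -2

# Part (b): for each eigenvalue, eigenvectors span N(T - lambda I).
eigvals, eigvecs = np.linalg.eig(T)
for lam, v in zip(eigvals, eigvecs.T):
    print(lam, v, np.allclose(T @ v, lam * v))   # True for both pairs
```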
Proof of Theorem 8.3:
Part (a): The scalar λ and a non-zero vector x are an eigenvalue and eigenvector of T iff

T(x) = λ x ⇔ (T − λ I)(x) = 0 has a non-zero solution x ⇔ (T − λ I) is not invertible ⇔ det(T − λ I) = 0.

Part (b) is simpler. Since λ is a scalar such that the operator (T − λ I) is not invertible, we know that N (T − λ I) ≠ {0}, that is, there exists a non-zero solution x of the linear equation (T − λ I)(x) = 0. It is simple to see that any such solution is an eigenvector of T with eigenvalue λ. This establishes the Theorem.

Definition 8.4. Given a finite-dimensional vector space V and a linear operator T ∈ L(V ), the function p(λ) = det(T − λ I) is called the characteristic polynomial of T.

The function p defined above is indeed a polynomial in λ, which can be seen from the definition of the determinant of a matrix. The eigenvalues of a linear operator are the roots of its characteristic polynomial.

Example 8.1.6: Find the eigenvalues and eigenvectors of the linear operator T : R2 → R2, with matrix T = [ 1 3 ; 3 1 ].

Solution: We start computing the eigenvalues, which are the roots of the characteristic polynomial

p(λ) = det(T − λ I) = det [ (1 − λ) 3 ; 3 (1 − λ) ] = (1 − λ)^2 − 9.

The roots satisfy (λ − 1)^2 = 3^2, hence λ1 = 4 and λ2 = −2. We now find the eigenvector for the eigenvalue λ1 = 4. We solve the system (T − 4 I) x = 0 performing Gauss operations on the matrix

T − 4 I = [ −3 3 ; 3 −3 ] → [ 1 −1 ; 0 0 ] ⇒ x1 = x2, x2 free.

Choosing x2 = 1 we obtain the eigenvector x1 = [ 1 ; 1 ]. In a similar way we find the eigenvector for the eigenvalue λ2 = −2. We solve the linear system (T + 2 I) x = 0 performing Gauss operations on the matrix

T + 2 I = [ 3 3 ; 3 3 ] → [ 1 1 ; 0 0 ] ⇒ x1 = −x2, x2 free.

Choosing x2 = −1 we obtain the eigenvector x2 = [ 1 ; −1 ]. These results λ1, x1 and λ2, x2 agree with Example 8.1.3.

We now introduce two numbers that characterize important properties of eigenvalues and eigenvectors.

Definition 8.5. Let V be a vector space over F, and let T ∈ L(V ) be a linear operator. Denote by λi, for i = 1, · · · , k, all the roots of the characteristic polynomial p, that is,

p(λ) = (λ − λ1)^r1 · · · (λ − λk)^rk q(λ), with q(λi) ≠ 0;

and denote by si = dim Eλi the dimension of the eigenspace corresponding to the eigenvalue λi. The numbers ri are called the algebraic multiplicity and the numbers si the geometric multiplicity of the eigenvalue λi.

In the case that F = C, hence the characteristic polynomial is complex-valued, the polynomial q is constant. Indeed, when p is an n-degree complex-valued polynomial, the Fundamental Theorem of Algebra says that p has n complex roots. Therefore, the characteristic polynomial has the form

p(λ) = c (λ − λ1)^r1 · · · (λ − λk)^rk, where r1 + · · · + rk = n.

On the other hand, in the case that F = R the characteristic polynomial is real-valued. In this case the polynomial q may have degree greater than zero.

Example 8.1.7: For the following matrices, find the algebraic and geometric multiplicities of their eigenvalues,

A = [ 3 2 ; 0 3 ], B = [ 3 1 1 ; 0 3 2 ; 0 0 1 ], C = [ 3 0 1 ; 0 3 2 ; 0 0 1 ].

Solution: The eigenvalues of matrix A are the roots of the characteristic polynomial

pa(λ) = det [ (3 − λ) 2 ; 0 (3 − λ) ] = (λ − 3)^2 ⇒ λ1 = 3, r1 = 2.

So the eigenvalue λ1 = 3 has algebraic multiplicity r1 = 2. To find the geometric multiplicity we need to compute the eigenspace Eλ1, which is the null space of the matrix (A − 3 I), that is,

A − 3 I2 = [ 0 2 ; 0 0 ] → [ 0 1 ; 0 0 ] ⇒ x2 = 0, x1 free ⇒ x = x1 [ 1 ; 0 ].

We have obtained

Eλ1 = Span{ [ 1 ; 0 ] } ⇒ s1 = dim Eλ1 = 1.
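This dimension count can be automated: since Eλ = N(A − λ I), its dimension equals n − rank(A − λ I). A quick sketch of the check (assuming Python with NumPy) for matrix A; the same one-liner applies to matrices B and C below.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [0.0, 3.0]])

lam = 3.0
geometric_mult = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))
print(geometric_mult)   # 1, while the algebraic multiplicity of lambda = 3 is 2
```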
In the case of matrix A the algebraic multiplicity is greater than the geometric multiplicity, 2 = r1 > s1 = 1, since the greatest linearly independent set of eigenvectors for the eigenvalue λ1 contains only one vector. The algebraic and geometric multiplicities for matrices B and C is computed in a similar way. For matrix B we obtain, Notice that these two matrices differ only in the matrix coefficient (1, 2). We will see that both matrices B and C have the same algebraic multiplicities but they have different geometric multiplicities. We start computing the characteristic polynomial of matrix B, pb (λ) = (3 − λ) 1 1 0 (3 − λ) 2 = −(λ − 3)2 (λ − 1), 0 0 (1 − λ) which implies that λ1 = 3 has algebraic multiplicity r1 = 2 and λ2 = 1 has algebraic multiplicity r2 = 1. To find the geometric multiplicities we need to find the corresponding eigenspaces. We start with λ1 = 3, x1 free, 01 1 010 x2 = 0, 2 → 0 0 1 ⇒ B − 3 I3 = 0 0 x = 0, 0 0 −2 000 3 which implies that Eλ1 = Span The geometric multiplicity for 2 B − I3 = 0 0 1 0 0 the eigenvalue 11 1 2 2 → 0 00 0 which implies that ⇒ s1 = 1 . λ2 = 1 is computed x1 00 x2 1 1 ⇒ x 00 3 = 0, = −x3 , free, Eλ2 = Span 0 −1 1 as follows, ⇒ s2 = 1 . We then conclude 2 = r1 > s1 = 1 and r2 = s2 = 1. Finally, we compute the characteristic polynomial of matrix C, pc (λ) = (3 − λ) 0 0 (3 − λ) 0 0 1 2 = −(λ − 3)2 (λ − 1) (1 − λ) so, we obtain the same eigenvalues we had for matrix B, that is, λ1 = 3 has algebraic multiplicity r1 = 2 and λ2 = 1 has algebraic multiplicity r2 = 1. The geometric multiplicity G. NAGY – LINEAR ALGEBRA December 8, 2009 of λ1 is computed as follows, 00 C − 3 I3 = 0 0 00 1 00 2 → 0 0 −2 00 1 0 0 ⇒ 243 x1 free, x2 free, x = 0, 3 which implies that 1 0 Eλ1 = Span 0 , 1 ⇒ s1 = 2 . 0 0 In this case we obtained 2 = r1 = s1 . The geometric multiplicity for the eigenvalue λ2 = 1 is computed as follows, x1 = − 1 x3 , 1 201 102 2 0 2 2 → 0 1 1 ⇒ C − I3 = x2 = −x3 , 000 000 x3 free, which implies that (choosing x3 = 2), Eλ2 = Span −1 −2 2 ⇒ s2 = 1 . We then conclude r1 = s1 = 2 and r2 = s2 = 1. Comparing the results for matrix B and C we see that a change in just one matrix coefficient can change the eigenspaces even in the case where the eigenvalues do not change. In fact, it can be shown that the eigenvalues are continuous functions of the matrix coefficients while the eigenvectors are not continuous functions. This means that a small change in the matrix coefficients produces a small change in the eigenvalues but it might produce a big change in the eigenvectors, just like in this Example. We finish this Section with one important result concerning the eigenvectors of different eigenvalues. They form a linearly independent set. Proposition 8.6. Let V be a finite dimensional vector space over F, T ∈ L(V ) be an operator with eigenvalues {λ1 , · · · , λk } ⊂ F, with k 2, and corresponding eigenvectors X = {x1 , · · · , xk } ⊂ V . If the eigenvalues λi are all different, for i = 1, · · · , k , then the set X of eigenvectors is linearly independent. Proof of Proposition 8.6: Let c1 , · · · , ck ∈ F be scalars such that c1 x1 + · · · + ck xk = 0. (8.3) Now perform the following two steps: First, apply the operator T to both sides of the equation above. Since T(xi ) = λi xi , we obtain, c1 λ1 x1 + · · · + ck λk xk = 0. (8.4) Second, multiply Eq. (8.3) by λ1 and subtract it from Eq.(8.4). The result is c2 (λ2 − λ1 ) x2 + · · · + ck (λk − λ1 ) xk = 0. (8.5) Notice that all factors λi − λ1 = 0 for i = 2, · · · , k . 
Repeat these two steps: First, apply the operator T on both sides of Eq. (8.5), that is, c2 (λ2 − λ1 ) λ2 x2 + · · · + ck (λk − λ1 ) λk xk = 0; (8.6) 244 G. NAGY – LINEAR ALGEBRA december 8, 2009 second, multiply Eq. (8.5) by λ2 and subtract it from Eq. (8.6), that is, c2 (λ2 − λ1 ) (λ3 − λ2 ) x3 + · · · + ck (λk − λ1 ) (λk − λ2 ) xk = 0. (8.7) Repeat the idea in these two steps until one reaches the equation ck (λk − λ1 ) · · · (λk − λk−1 ) xk = 0. Since the eigenvalues λi are all different, we conclude that ck = 0. Introducing this information at the very begining we get that c1 x1 + · · · + ck−1 xk−1 = 0. Repeating the whole procedure we conclude that ck−1 = 0. In this way one shows that all coefficient c1 = · · · = ck = 0. Therefore, the set X is linearly independent. This establishes the Proposition. Notes and additional reading. A detailed presentation of eigenvalues and eigenvectors of a matrix can be found in Sections 5.1 and 5.2 in Lay’s book [2], while a shorter and deeper summary can be found in Section 7.1 in Meyer’s book [3]. Also see Chapter 4 in Hassani’s book [1]. G. NAGY – LINEAR ALGEBRA December 8, 2009 245 Exercises. 8.1.1.- Find the spectrum and all eigenspaces of the operators A : R2 → R2 and B : R3 → R3 , » – −10 −7 A= , 14 11 2 3 1 10 3 15 . B = 40 1 −1 2 8.1.2.- Show that for θ ∈ (0, π ) the rotation operator R(θ) : R2 → R2 has no eigenvalues, where » – cos(θ) − sin(θ) R( θ ) = . sin(θ) cos(θ) Consider now matrix above as a linear operator R(θ) : C2 → C2 . Show that this linear operator has eigenvalues, and find them. 8.1.3.- Let A : R3 → R3 be the linear operator given by 3 2 2 −1 3 1 h5 . A = 40 0 02 (a) Find all the eigenvalues and their corresponding algebraic multiplicities of the matrix A. (b) Find the value(s) of h ∈ R such that the matrix A above has a twodimensional eigenspace, and find a basis for this eigenspace. (c) Set h = 1, and find a basis for all the eigenspaces of matrix A above. 8.1.4.- Find all the eigenvalues with their corresponding algebraic multiplicities, and find all the associated eigenspaces of the matrix A ∈ R3,3 given by 2 3 211 A = 40 2 3 5 . 001 8.1.5.- Let k ∈ R and consider the matrix A ∈ R4,4 given by 2 3 2 −2 4 −1 60 3k 07 7. A=6 40 02 45 0 00 1 (a) Find the eigenvalues of A and their algebraic multiplicity. (b) Find the number k such that matrix A has an eigenspace Eλ that is two dimensional, and find a basis for this Eλ . 8.1.6.- Comparing the characteristic polynomials for A ∈ Fn,n and AT , show that these two matrices have the same eigenvalues. 8.1.7.- Let A ∈ R3,3 be a matrix with eigenvalues 2, −1 and 3. Find the eigenvalues of: (a) A−1 . (b) Ak , for any k ∈ N. (c) A2 − A. 246 G. NAGY – LINEAR ALGEBRA december 8, 2009 8.2. Diagonalizable operators In this Section we study linear operators on a finite dimensional vector space that have a complete set of eigenvectors. This means that there exists a basis of the vector space formed with eigenvectors of the linear operator. We show that these operators are diagonalizable, that is, the matrix of the operator in the basis of its own eigenvectors is diagonal. We end this Section showing that it is not difficult to define functions of operators in the case that the operator is diagonalizable. Definition 8.7. A linear operator T ∈ L(V ) defined on an n-dimensional vector space V has a complete set of eigenvectors iff there exists a linearly independent set formed with n eigenvectors of T. 
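This condition can be tested in practice as follows (a sketch, assuming Python with NumPy): an n × n matrix has a complete set of eigenvectors exactly when the geometric multiplicities of its distinct eigenvalues add up to n. Example 8.2.1 below applies this criterion by hand to the matrices B and C of Example 8.1.7.

```python
import numpy as np

def has_complete_set(A, eigenvalues):
    """Do the geometric multiplicities of the distinct eigenvalues add up to n?"""
    n = A.shape[0]
    dims = [n - np.linalg.matrix_rank(A - lam * np.eye(n)) for lam in eigenvalues]
    return sum(dims) == n

B = np.array([[3.0, 1.0, 1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 1.0]])
C = np.array([[3.0, 0.0, 1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 1.0]])

print(has_complete_set(B, [3.0, 1.0]))   # False: B has no complete set
print(has_complete_set(C, [3.0, 1.0]))   # True:  C has a complete set
```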
In other words, a linear operator T has a complete set of eigenvectors iff there exists a basis of V formed by eigenvectors of T. Not every linear operator has a complete set of eigenvectors. For example, a linear operator without a complete set of eigenvectors is a rotation on a plane R(θ) : R2 → R2 by an angle θ ∈ (0, π ). This particular operator has not eigenvectors at all. Example 8.2.1: Matrix B in Example 8.1.7 does not have a complete set of eigenvectors. The largest linearly independent set of eigenvectors of matrix B contains only two vectors, one possibility is shown below. 311 1 0 B = 0 3 2 , X = x1 = 0 , x2 = −1 . 001 0 1 Matrix C in Example 3 C = 0 0 8.1.7 has a complete set of eigenvectors, indeed, 01 1 0 1 3 2 , X = x1 = 0 , x2 = 1 , x3 = −2 . 01 0 0 2 We now introduce the notion of a diagonalizable operator. Definition 8.8. A linear operator T ∈ L(V ) defined on a finite dimensional vector space V ˜ is called diagonalizable iff there exists a basis V of V such that the matrix Tvv is diagonal. ˜˜ Recall that a square matrix D = [Dij ] is diagonal iff Dij = 0 for i = j . We denote an n × n diagonal matrix by D = diag[D1 , · · · , Dn ], so we use only one index to label the diagonal elements, Dii = Di . Examples of 3 × 3 diagonal matrices are given by 100 200 D11 0 0 D22 0 . A = 0 2 0 , B = 0 0 0 , D= 0 003 004 0 0 D33 These two notions given in Definitions 8.7 and 8.8 are equivalent. Theorem 8.9. The linear operator T defined on an n-dimensional vector space V is diagonalizable iff the operator T has a complete set of eigenvectors. Furthermore, denoting by (λi , xi ) for i = 1, · · · , n the pairs of eigenvalue and eigenvector ˜ of T, the matrix of the operator T in the ordered basis of its eigenvectors V = (x1 , · · · , xn ) is diagonal with the eigenvalues on the diagonal, that is, Tvv = diag[λ1 , · · · , λn ]. ˜˜ G. NAGY – LINEAR ALGEBRA December 8, 2009 247 Proof of Theorem 8.9: ˜ (⇒) Since T is diagonalizable, we know that there exists a basis V = (x1 , · · · , xn ) such that its matrix is diagonal, that is, Tvv = diag[λ1 , · · · , λn ]. This implies that for i = 1, · · · , n ˜˜ holds Tvv xiv = λi xiv ⇔ T(xi ) = λi xi . ˜˜ ˜ ˜ We then conclude that xi is an eigenvector of the operator T with eigenvalue λi . Since ˜ V is a basis of the vector space V , this means that the operator T has a complete set of eigenvectors. ˜ (⇐) Since the set of eigenvectors V = (x1 , · · · , xn ) of the operator T with corresponding eigenvalues λ1 , · · · , λn form a basis of V , then the eigenvalue-eigenvector equation for i = 1, · · · , n T(xi ) = λi xi . imply that the matrix Tvv is diagonal, since ˜˜ Tvv = [T(x1 )]v , · · · , [T(xn )]v = λ1 [x1 ]v , · · · , λn [xn ]v = λ1 e1 , · · · , λn en , ˜ ˜ ˜ ˜ ˜˜ so we arrive at the equation Tvv = diag[λ1 , · · · , λn ]. This establishes the Theorem. ˜˜ Example 8.2.2: Show that the linear operator T ∈ L(R2 ) with matrix Tss = 13 in the 31 standard basis of S of R2 is diagonalizable. Solution: In Example 8.1.6 we obtained that the eigenvalues and eigenvectors of matrix Tss are given by λ1 = 4 , x1 = 1 1 and λ2 = −2, x2 = 1 . −1 We now affirm that the matrix of the linear operator T is diagonal in the ordered basis formed by the eigenvectors above, ˜ V= 1 1 , 1 −1 . ˜ First notice that the set V is linearly independent, so it is a basis for R2 . Second, the change of basis matrix P = Ivs is given by ˜ P= 1 1 1 −1 ⇒ P−1 = 11 21 1 . 
−1 Third, the result of the change of basis is a diagonal matrix: P−1 Tss P = 11 21 1 −1 13 31 1 1 11 1 = −1 21 1 −1 4 4 −2 4 = 2 0 0 = Tv v . ˜˜ −2 As stated in Theorem 8.9, the diagonal elements in Tvv are precisely the eigenvalues of the ˜˜ ˜ operator T, in the same order as the eigenvectors in the ordered basis V . The statement in Definition 8.8 can be expressed as a statement between matrices. A matrix A ∈ Fn,n is diagonalizable if there exists an invertible matrix P ∈ Fn,n and a diagonal matrix D ∈ Fn,n such that A = P D P−1 . These two notions are equivalent, since A is the matrix of an linear operator T ∈ L(V ) in the standard basis S of V , that is, A = Tss . The similarity transformation above can expressed as D = P−1 Tss P. 248 G. NAGY – LINEAR ALGEBRA december 8, 2009 ˜ Denoting P = Ivs as the change of basis matrix from the standard basis S to a basis V , we ˜ conclude that D = Tvv , ˜˜ ˜ That is, the matrix of operator T is diagonal in the basis V . Furthermore, Theorem 8.9 can also be expressed in terms of similarity transformations between matrices as follows: A square matrix has a complete set of eigenvalues iff the matrix is similar to a diagonal matrix. This point of view is common in the literature. 12 in 36 the standard basis of R2 is diagonalizable. Find a similarity transformation that converts matrix T into a diagonal matrix. Example 8.2.3: Show that the linear operator T : R2 → R2 with matrix T = Solution: To find out whether T is diagonalizable or not we need to compute its eigenvectors, so we start with its eigenvalues. The characteristic polynomial is p(λ) = (1 − λ) 3 2 = λ(λ − 7) = 0 (6 − λ) λ1 = 0 , ⇒ λ2 = 7 . Since the eigenvalues are different, we know that the corresponding eigenvectors form a linearly independent set (by Proposition 8.6), and so matrix T has a complete set of eigenvectors. So A is diagonalizable. The corresponding eigenvectors are the null spaces: N (T) and N (T − 7 I2 ), which are computed as follows: 1 3 −6 3 2 12 → 6 00 2 3 → −1 0 ⇒ −1 0 ⇒ ⇒ 3x1 = x2 x1 = −2 ; 1 ⇒ x1 = −2x2 x2 = 1 . 3 Since the set V = {x1 , x2 } is a complete set of eigenvectors of T, we conclude that T is diagonalizable. Proposition 8.9 says that D = P−1 T P, where D = 0 0 0 , 7 P= −2 1 1 . 3 We finally verify this the equation above is correct: P D P− 1 = −2 1 13 0 0 0 1 −3 77 1 1 −2 1 = 2 13 0 1 0 12 = = T. 2 36 8.2.1. Functions of diagonalizable operators. We have seen that the set of all linear operators L(V ) on a vector space V is itself a vector space, since the linear combination of linear operators is again a linear operator. The space L(V ) has an additional structure though, since the composition of two linear operators is again a linear operator. We also know that once a basis is fixed in the vector space V the composition of operators is equivalent to the matrix multiplication of the operators matrices. The additional structure on L(V ) given by operator composition allow us to introduce functions of operators. The calculations involved in computing the function of an operator are simpler in the case that the operator is diagonalizable. For that reason we consider in this Section only functions of diagonalizable operators. We start with the power function. G. NAGY – LINEAR ALGEBRA December 8, 2009 249 Lemma 8.10. Let V be an n-dimensional vector space over F, T ∈ L(V ) be a diagonalizable linear operator with matrix T in the standard ordered basis of V . Let D, P ∈ Fn,n be diagonal and an invertible matrices, respectively, such that T = P D P−1 . 
Then, for every k ∈ N holds Tk = P Dk P−1 . Proof of Lemma 8.10: Consider the case k = 2. A simple calculation shows T2 = T T = P D P−1 P D P−1 = P D D P−1 = P D2 P−1 . Suppose now that the formula holds for k ∈ N, that is, Tk = P Dk P−1 , and let us show that it also holds for k + 1. Indeed, T(k+1) = T Tk = P D P−1 P Dk P−1 = P D Dk P−1 = P D(k+1) P−1 . This establishes the Lemma. 1 3 2 . 6 0 1 −3 77 1 1 . 2 Example 8.2.4: Given the matrix T ∈ R3,3 , compute Tk , where k ∈ N and T = Solution: From Example 8.2.3 we know that T is diagonalizable, and that D= 0 0 0 , 7 P= −2 1 1 3 ⇒ P−1 = 1 −3 71 1 . 2 Therefore, Tk = P Dk P−1 = −2 1 13 0 0 0 1 −3 7k 7 1 1 −2 1 = 7(k−1) 2 13 0 0 The final result is Tk = 7(k−1) T. It is simple to generalize Lemma 8.10 to polynomial functions. Lemma 8.11. Assume the hypotheses in Lemma 8.10 and denote by D = diag [λ1 , · · · , λn ]. If p : F → F is a polynomial of degree k ∈ N, then, denoting p(D) = diag [p(λ1 ), · · · , p(λn )], the following expression holds p(T) = P p(D) P−1 . Proof of Lemma 8.11: Given scalars a0 , · · · , ak ∈ F, denote the polynomial p by p(x) = a0 + a1 x + · · · + ak xk . Then, p(T) = a0 In + a1 T + · · · + ak Tk = a0 P P−1 + a1 P D P−1 + · · · + ak P Dk P−1 = P (a0 In + a1 D + · · · + ak Dk ) P−1 . Noticing that p(D) = a0 In + a1 D + · · · + ak Dk = diag [p(λ1 ), · · · , p(λn )]. We conclude that p(T) = P p(D) P−1 . This establishes the Lemma. Functions that admit a power series expansion can defined on diagonalizable operators in the same way as polynomial functions in Lemma 8.11. We consider first an important particular case, the exponential function f (x) = ex . The exponential function is usually 250 G. NAGY – LINEAR ALGEBRA december 8, 2009 defined as the inverse of the natural logarithm function g (x) = ln(x), which in turns is defined as the area under the graph of the function h(x) = 1/x from 1 to x, that is, x 1 dy x ∈ (0, ∞). y 1 It is not clear how to use this way of defining the exponential function on real numbers to extended it to operators. However, one shows that the exponential function on real numbers has several properties, among them that it can be expressed as an infinite power series, ln(x) = ∞ ex = k=0 x2 xk xk =1+x+ + ··· + + ··· . k! 2! k! Such a power series provides the path to generalize the exponential function to a diagonalizable linear operator. Lemma 8.12. Assume the hypotheses in Lemma 8.10, denote by D = diag [λ1 , · · · , λn ], and introduce the exponential function of an operator as follows, ∞ eT = k=0 D λ1 Then, denoting e = diag [e , · · · , e λn Tk . k! , the following expression holds, eT = P eD P−1 . Proof of Lemma 8.12: For every N ∈ N introduce the partial sum SN (T) as follows, N SN (T) = k=0 Tk , k! which is a well defined polynomial in T. A straightforward computation shows that N SN (T) = k=0 P Dk P−1 =P k! N k=0 Dk P−1 . k! In the far right hand side of expression above it is possible to compute the limit as N → ∞, and the result is ∞ Dk eT = S∞ (T) = P P−1 . = P eD P−1 , k! D k=0 n λ1 where we have denoted e = diag [e , · · · , e T . We conclude that D e = P e P−1 . This establishes the Lemma. Example 8.2.5: For every t ∈ R find the value of the exponential function eA t , where A= Solution: From Example 8.2.2 we know where 4 0 1 D= , P= 0 −2 1 Therefore, 1 1 At = 1 −1 1 3 3 . 1 that A is diagonalizable, and that A = P D P−1 , 1 −1 ⇒ P−1 = 4t 0 11 0 −2t 2 1 1 −1 11 21 1 . −1 G. NAGY – LINEAR ALGEBRA December 8, 2009 251 and Lemma 8.12 imply that eA t = 1 1 1 −1 e 4t 0 0 e−2t 11 21 1 . 
−1 The function introduced in Example 8.2.5 above can be seen as an operator-valued function f : R → R2,2 given by f (t) = eA t , A ∈ R2,2 . It can be shown that this function is actually differentiable, and that df (t) = A eA t . dt A more precise statement is the following. Lemma 8.13. Assume the hypotheses in Lemma 8.10 and introduce the operator-valued function f : R → L(V ) as follows, f (x) = eT x x ∈ R. Then, the function f is differentiable and df (x) = T eT x = eT x T. dx Proof of Lemma 8.13: It is simple to see that d d D x −1 df (x) = P eD x P−1 = P e P. dx dx dx It is not difficult to see that the expression of the far right in equation above is given by d Dx d λ1 x d λn x e = diag e ,··· , e = diag [λ1 eλ1 x , · · · , λn eλn x ] = D eD x = eD x D, dx dx dx where we used that D = diag [λ1 , · · · , λn ]. Recalling that P D eD x P−1 = P D P−1 P eD x P−1 = T eT x , we conclude that d Tx e = T e T x = e T x T. dx This establishes the Lemma. Example 8.2.6: Find the derivative of f (t) = eA t , where A = 13 . 31 Solution: From Example 8.2.5 we know that eA t = 1 1 1 −1 e 4t 0 0 − 2t e 11 21 1 . −1 Then, Lemma 8.13 implies that d At 1 e= 1 dt 1 −1 4e4t 0 0 −2e−2t 11 21 1 . −1 We end this Section presenting a result without proof, that says that given any scalarvalued function with a convergent power series, that function can be extended into an operator-valued function in the case that the operator is diagonalizable. 252 G. NAGY – LINEAR ALGEBRA december 8, 2009 Theorem 8.14. Assume the hypotheses in Lemma 8.10, denote by D = diag [λ1 , · · · , λn ], and let f : F → F be a function given by a power series ∞ ck (z − z0 )k , f (z ) = k=0 which converges for |z − z0 | < r, for some positive real number r. The function f : F → L(V ) given by ∞ cn (T − z0 In )k f (T) = k=0 converges iff |λi − z0 | < r for all i = 1, · · · ,. G. NAGY – LINEAR ALGEBRA December 8, 2009 253 Exercises. 8.2.1.- Which of the following matrices cannot be diagonalized? » – 2 −2 A= , 2 −2 » – 2 0 B= , 2 −2 » – 20 C= . 22 8.2.2.- Verify that the matrix » – 7/5 1 /5 A= −1 1/2 has eigenvalues λ1 = 1 and λ2 = 9/10 and associated eigenvectors »– »– −1 −2 x1 = , x2 = . 2 5 Use this information to compute lim Ak . k→∞ 8.2.3.- Given the matrix and vector , – »– » 2 13 , x0 = A= 1 31 Show that the function x : R → R2 given by x(t) = eA t x0 is a solution of the differential equation d x(t) = A x(t) dt and satisfies that x(t = 0) = x0 . 8.2.4.- Let A ∈ R3,3 be a matrix with eigenvalues 2, −1 and 3. Find the determinant of A. 8.2.5.- Let A ∈ R4,4 be a matrix that can be decomposed as A = P D P−1 , with matrix P an invertible matrix and the matrix 1 D = diag(2, , 2, 3). 4 Knowing only this information about the matrix A, is it possible to compute the det(A)? If your answer is no, explain why not; if your answer is yes, compute det(A) and show your work. 8.2.6.- Let A ∈ R4,4 be a matrix that can be decomposed as A = P D P−1 , with matrix P an invertible matrix and the matrix D = diag(2, 0, 2, 5). Knowing only this information about the matrix A, is it possible to whether A invertible? Is it possible to know tr (A)? If your answer is no, explain why not; if your answer is yes, compute tr (A) and show your work. 254 G. NAGY – LINEAR ALGEBRA december 8, 2009 8.3. Differential equations Eigenvalues and eigenvectors of a matrix are useful to find solutions to systems of differential equations. In this Section we first recall what is a system of first order, linear, homogeneous, differential equations with constant coefficients. 
Then we use the eigenvalues and eigenvectors of the coefficient matrix to obtain solutions to such differential equations. In order to introduce a linear, first order system of differential equations we need some notation. Let A : R → Rn,n be a real matrix-valued function, x, b : R → Rn be real vector-valued functions, with values A(t), x(t), and b(t) given by A11 (t) · · · A1n (t) x1 (t) b1 (t) . . . . , . A(t) = . x(t) = . , b(t) = . . . . . . An1 (t) · · · Ann (t) xn (t) bn (t) So, A(t) is an n × n matrix for each value of t ∈ R. An example in the case n = 2 is the matrix-valued function cos(2πt) − sin(2πt) A(t) = . sin(2πt) cos(2πt) The values of this function are rotation matrices on R2 , counterclockwise by an angle 2πt. So the bigger the parameter t the bigger the is rotation. Derivatives of matrix- and vectord valued functions are computed component-wise, and we use the notation ˙ = dt ; for example dx1 x1 (t) ˙ dt (t) . dx . ˙ (t) = . is denoted as x(t) = . . . . dt dx xn (t) ˙ n (t) dt We are now ready to introduce the main definitions. Definition 8.15. A system of first order linear differential equations on n unknowns, with n 1, is the following: Given a real matrix-valued function A : R → Rn,n , and a real vector-valued function b : R → Rn , find a vector-valued function x : R → Rn solution of ˙ x(t) = A(t) x(t) + b(t). (8.8) The system in (8.8) is called homogeneous iff b(t) = 0 for all t ∈ R. The system in (8.8) is called of constant coefficients iff the matrix- and vector-valued functions are constant, that is, A(t) = A0 and b(t) = b0 for all t ∈ R. The differential equation in (8.8) is called first order because it contains only first derivatives of the the unknown vector-valued function x; it is called linear because the unknown x appears linearly in the equation. In this Section we are interested in finding solutions to an initial value problem involving a constant coefficient, homogeneous differential system. Definition 8.16. An initial value problem (IVP) for an homogeneous constant coefficients linear differential equation is the following: Given a matrix A ∈ Rn,n and a vector x0 ∈ Rn , find a vector-valued function x : R → Rn solution of the differential equation and initial condition ˙ x(t) = A x(t), x(0) = x0 . In this Section we only consider the case where the coefficient matrix A ∈ Rn,n has a complete set of eigenvectors, that is, matrix A is diagonalizable. In this case it is possible to find all solutions of the initial value problem for an homogeneous and constant coefficient differential equation. These solutions are linear combination of vectors proportional to the eigenvectors of matrix A, where the scalars involved in the linear combination depend on G. NAGY – LINEAR ALGEBRA December 8, 2009 255 the eigenvalues of matrix A. The explicit form of the solution depends on these eigenvalues and can be classified in three groups: Non-repeated real eigenvalues, non-repeated complex eigenvalues, and repeated eigenvalues. We consider in these notes only the first two cases of non-repeated eigenvalues. In this case, the main result is the following. Theorem 8.17. Assume that matrix A ∈ Rn,n has a complete set of eigenvectors denoted as V = {v1 , · · · , vn } with corresponding eigenvalues {λ1 , · · · , λn }, all different, and fix x0 ∈ Rn . Then the initial value problem ˙ x(t) = A x(t), x(0) = x0 , (8.9) has a unique solution given by x(t) = eAt x0 , (8.10) where eAt = P eDt P−1 , P = v1 , · · · , vn , eDt = diag eλ1 t , · · · , eλn t . The solution given in Eq. 
(8.10) is often written in an equivalent way, as follows: x(t) = P eDt P−1 x0 = P eDt P−1 x0 , then, introducing the notation P eDt = v1 eλ1 t , · · · , vn eλn t , c = P−1 x0 , c1 . c = . , . cn we write the solution x(t) in the form x(t) = c1 v1 eλ1 t + · · · + cn vn eλn t , P c = x0 . This latter notation is common in the literature on ordinary differential equations. The solution x(t) is expressed as a linear combination of the eigenvectors vi of the coefficient matrix A, where the components are functions of the variable t given by ci eλi t , for i = 1, · · · , n. So the eigenvalues and eigenvectors of matrix A are the crucial information to find the solution x(t) of the initial value problem in Eq. (8.16), as can be seen from the following calculation: Consider the function yi (t) = eλi t vi , i = 1, · · · , n. This function is solution of the differential equation above, since d λi t e vi = λi eλi t vi = eλi t (λi vi ) = eλi t A vi = A (eλi t vi ) = A yi (t), dt ˙ hence, yi (t) = A yi (t). This calculation is the essential part in the proof of Theorem 8.17. Proof of Theorem 8.17: Since matrix A has a complete set of eigenvectors V , then for every value of t ∈ R there exist t dependent scalars c1 (t), · · · , cn (t) such that ˙ yi (t) = x(t) = c1 (t) v1 + · · · + cn (t) vn . (8.11) The t-derivative of the expression above is ˙ x(t) = c1 (t) v1 + · · · + cn (t) vn . ˙ ˙ Recalling that A vi = λi vi , the action of matrix A in Eq. (8.11) is A x(t) = c1 (t) λ1 v1 + · · · + cn (t) λn vn . ˙ The vector x(t) is solution of the differential equation in (8.16) iff x(t) = A x(t), that is, c1 (t) v1 + · · · + cn (t) vn . = c1 (t) λ1 v1 + · · · + cn (t) λn vn , ˙ ˙ 256 G. NAGY – LINEAR ALGEBRA december 8, 2009 which is equivalent to [c1 (t) − λ1 c1 (t)] v1 + · · · + [cn (t) − λn cn (t)] vn = 0. ˙ ˙ Since the set V is a basis of Rn , each term above must vanish, that is, for all i = 1, · · · , n holds ci (t) = λi ci (t) ⇒ ci (t) = ci eλi t . ˙ So we have obtained the general solution x(t) = c1 eλ1 t v1 + · · · + cn eλn t vn . The initial condition x(0) = x0 fixes a unique set of constants ci as follows x(0) = c1 v1 + · · · + cn vn = x0 , n since the set V is a basis of R . This expression of the solution can be rewritten as follows: Using the matrix notation P = v1 , · · · , vn , we see that the vector c satisfies the equation P c = x0 . Also notice that P eDt = v1 eλ1 t , · · · , vn eλn t , therefore, the solution x(t) can be written as x(t) = P eDt P−1 x0 = P eDt P−1 x0 . Since eAt = P eDt P−1 , we conclude that x(t) = eAt x0 . This establishes the Theorem. Example 8.3.1: (Non-repeated, real eigenvalues) Given the matrix A ∈ R2,2 and vector x0 ∈ R2 below, find the function x : R → R2 solution of the initial value problem ˙ x(t) = A x(t), x(0) = x0 , where A= 1 3 3 , 1 x0 = 6 . 4 Solution: Recall that matrix A has a complete set of eigenvectors, with V = v1 = −1 1 ., , v2 = 1 1 {λ1 = 4, λ2 = −2}. Then, Theorem 8.17 says that the general solution of the differential equation above is x(t) = c1 eλ1 t v1 + c2 eλ2 t v2 x(t) = c1 e4t ⇔ 1 −1 + c2 e−2t . 1 1 The constants c1 and c2 are obtained from the initial data x0 as follows x(0) = c1 1 −1 6 + c2 = x0 = 1 1 4 ⇒ 1 1 −1 1 The solution of this linear system is 111 c1 = c2 2 −1 1 6 5 = . 4 −1 Therefore, the solution x(t) of the initial value problem above is x(t) = 5 e4t 1 −1 − e−2t . 1 1 c1 6 = . c2 4 G. NAGY – LINEAR ALGEBRA December 8, 2009 257 8.3.1. Non-repeated real eigenvalues. We present a qualitative description of the solutions to Eq. 
(8.9) in the particular case of 2 × 2 linear ordinary differential systems with matrix A having two real and different eigenvalues. The main tool will be the sketch of phase diagrams, also called phase portraits. The solution at a particular value t is given by a vector x (t) x(t) = 1 x2 (t) so it can be represented by a point on a plane, while the solution function for all t ∈ R corresponds to a curve on that plane. In the case that the solution vector x(t) represents a position function of a particle moving on the plane at the time t, the curve given in the phase diagram is the trajectory of the particle. Arrows are added to this trajectory to indicate the motion of the particle as time increases. Since the eigenvalues of the coefficient matrix A are different, Theorem 8.17 says that there always exist two linearly independent eigenvectors v1 and v2 associated with the eigenvalues λ1 and λ2 , respectively. The general solution to the Eq. (8.9) is then given by x(t) = c1 v1 eλ1 t + c2 v2 eλ2 t . A phase diagram contains several curves associated with several solutions, that correspond to different values of the free constants c1 and c2 . In the case that the eigenvalues are nonzero, the phase diagrams can be classified into three main classes according to the relative signs of the eigenvalues λ1 = λ2 of the coefficient matrix A, as follows: (i) 0 < λ2 < λ1 , that is, both eigenvalues positive; (ii) λ2 < 0 < λ1 , that is, one eigenvalue negative and the other positive; (iii) λ2 < λ1 < 0, that is, both eigenvalues negative. The study of the cases where one of the eigenvalues vanishes is simpler and is left as an exercise. We now find the phase diagrams for three examples, one for each of the classes presented above. These examples summarize the behavior of the solutions to 2 × 2 linear differential systems with coefficient matrix having two real, different and non-zero eigenvalues λ2 < λ1 . The phase diagrams can be sketched following these steps: First, plot the eigenvectors v2 and v1 corresponding to the eigenvalues λ2 and λ1 , respectively. Second, draw the whole lines parallel to these vectors and passing through the origin. These straight lines correspond to solutions with one of the coefficients c1 or c2 vanishing. Arrows on these lines indicate how the solution changes as the variable t grows. If t is interpreted as time, the arrows indicate how the solution changes into the future. The arrows point towards the origin if the corresponding eigenvalue λ is negative, and they point away form the origin if the eigenvalue is positive. Finally, find the non-straight curves correspond to solutions with both coefficient c1 and c2 non-zero. Again, arrows on these curves indicate the how the solution moves into the future. Example 8.3.2: (Case 0 < λ2 < λ1 .) Sketch the phase diagram of the solutions to the differential equation 1 11 3 ˙ x = A x, A= . (8.12) 419 Solution: The characteristic equation for matrix A is given by λ1 = 3 , det(A − λ I2 ) = λ2 − 5λ + 6 = 0 ⇒ λ2 = 2 . One can show that the corresponding eigenvectors are given by v1 = 3 , 1 v2 = −2 . 2 258 G. NAGY – LINEAR ALGEBRA december 8, 2009 So the general solution to the differential equation above is given by x(t) = c1 v1 eλ1 t + c2 v2 eλ2 t ⇔ x(t) = c1 3 3t − 2 2t e + c2 e. 1 2 In Fig. 55 we have sketched four curves, each representing a solution x(t) corresponding to a particular choice of the constants c1 and c2 . 
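A quick numerical confirmation of these eigenpairs (a sketch, assuming Python with NumPy; the displayed matrix is read here as A = (1/4)[ 11 3 ; 1 9 ], which is consistent with the eigenvalues 3 and 2 and the eigenvectors above):

```python
import numpy as np

A = 0.25 * np.array([[11.0, 3.0],
                     [1.0,  9.0]])
v1, v2 = np.array([3.0, 1.0]), np.array([-2.0, 2.0])

print(A @ v1, 3 * v1)        # [9. 3.] twice: A v1 = 3 v1
print(A @ v2, 2 * v2)        # [-4. 4.] twice: A v2 = 2 v2
print(np.linalg.eigvals(A))  # 3 and 2 (possibly in either order)
```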
These curves actually represent eight different solutions, for eight different choices of the constants c1 and c2 , as is described below. The arrows on these curves represent the change in the solution as the variable t grows. Since both eigenvalues are positive, the length of the solution vector always increases as t grows. The straight lines correspond to the following four solutions: c1 = 1, c2 = 0, Line on the first quadrant, starting at the origin, parallel to v1 ; c1 = 0, c2 = 1, Line on the second quadrant, starting at the origin, parallel to v2 ; c1 = −1, c2 = 0 , Line on the third quadrant, starting at the origin, parallel to −v1 ; c1 = 0, c2 = −1, Line on the fourth quadrant, starting at the origin, parallel to −v2 . x2 v2 v1 x1 Figure 55. The graph of several solutions to Eq. (8.12) corresponding to the case 0 < λ2 < λ1 , for different values of the constants c1 and c2 . The trivial solution x = 0 is called an unstable point. Finally, the curved lines on each quadrant start at the origin, and they correspond to the following choices of the constants: c1 > 0, c2 > 0, Line starting on the second to the first quadrant; c1 < 0, c2 > 0, Line starting on the second to the third quadrant; c1 < 0, c2 < 0, Line starting on the fourth to the third quadrant, c1 > 0, c2 < 0, Line starting on the fourth to the first quadrant. Example 8.3.3: (Case λ2 < 0 < λ1 .) Sketch the phase diagram of the solutions to the differential equation 13 ˙ x = A x, A= . (8.13) 31 G. NAGY – LINEAR ALGEBRA December 8, 2009 259 Solution: We known from the calculations performed in Example 8.3.1 that the general solution to the differential equation above is given by x(t) = c1 v1 eλ1 t + c2 v2 eλ2 t ⇔ x(t) = c1 1 4t −1 −2t e + c2 e, 1 1 where we have introduced the eigenvalues and eigenvectors λ1 = 4 , v1 = 1 1 and λ2 = −2, v2 = −1 . 1 In Fig. 56 we have sketched four curves, each representing a solution x(t) corresponding to a particular choice of the constants c1 and c2 . These curves actually represent eight different solutions, for eight different choices of the constants c1 and c2 , as is described below. The arrows on these curves represent the change in the solution as the variable t grows. The part of the solution with positive eigenvalue increases exponentially when t grows, while the part of the solution with negative eigenvalue decreases exponentially when t grows. The straight lines correspond to the following four solutions: c1 = 1, c2 = 0, Line on the first quadrant, starting at the origin, parallel to v1 ; c1 = 0, c2 = 1, Line on the second quadrant, ending at the origin, parallel to v2 ; c1 = −1, c2 = 0, Line on the third quadrant, starting at the origin, parallel to −v1 ; c1 = 0, Line on the fourth quadrant, ending at the origin, parallel to −v2 . c2 = −1, x2 1 v2 v1 −1 1 x1 −1 Figure 56. The graph of several solutions to Eq. (8.13) corresponding to the case λ2 < 0 < λ1 , for different values of the constants c1 and c2 . The trivial solution x = 0 is called a saddle point. Finally, the curved lines on each quadrant correspond to the following choices of the constants: c1 > 0, c2 > 0, Line from the second to the first quadrant, c1 < 0, c2 > 0, Line from the second to the third quadrant, c1 < 0, c2 < 0, Line from the fourth to the third quadrant, c1 > 0, c2 < 0, Line from the fourth to the first quadrant. 260 G. NAGY – LINEAR ALGEBRA december 8, 2009 Example 8.3.4: (Case λ2 < λ1 < 0.) Sketch the phase diagram of the solutions to the differential equation 1 −9 3 ˙ . 
(8.14) x = A x, A= 1 −11 4 Solution: The characteristic equation for this matrix A is given by det(A − λ I) = λ2 + 5λ + 6 = 0 ⇒ λ1 = −2, λ2 = −3. One can show that the corresponding eigenvectors are given by v1 = 3 , 1 v2 = −2 . 2 So the general solution to the differential equation above is given by x(t) = c1 v1 eλ1 t + c2 v2 eλ2 t ⇔ x(t) = c1 3 −2t − 2 − 3t e + c2 e. 1 2 In Fig. 57 we have sketched four curves, each representing a solution x(t) corresponding to a particular choice of the constants c1 and c2 . These curves actually represent eight different solutions, for eight different choices of the constants c1 and c2 , as is described below. The arrows on these curves represent the change in the solution as the variable t grows. Since both eigenvalues are negative, the length of the solution vector always decreases as t grows and the solution vector always approaches zero. The straight lines correspond to the following four solutions: c1 = 1, c2 = 0, Line on the first quadrant, ending at the origin, parallel to v1 ; c1 = 0, c2 = 1, Line on the second quadrant, ending at the origin, parallel to v2 ; c1 = −1, c2 = 0, Line on the third quadrant, ending at the origin, parallel to −v1 ; c1 = 0, Line on the fourth quadrant, ending at the origin, parallel to −v2 . c2 = −1, x2 v2 v1 x1 Figure 57. The graph of several solutions to Eq. (8.14) corresponding to the case λ2 < λ1 < 0, for different values of the constants c1 and c2 . The trivial solution x = 0 is called a stable point. G. NAGY – LINEAR ALGEBRA December 8, 2009 261 Finally, the curved lines on each quadrant start at the origin, and they correspond to the following choices of the constants: c1 > 0, c2 > 0, Line entering the first from the second quadrant; c1 < 0, c2 > 0, Line entering the third from the second quadrant; c1 < 0, c2 < 0, Line entering the third from the fourth quadrant, c1 > 0, c2 < 0, Line entering the first from the fourth quadrant. 8.3.2. Non-repeated complex eigenvalues. The complex eigenvalues of a real valued matrix A ∈ Rn,n are always complex conjugate pairs, as it is shown below. Lemma 8.18. (Conjugate pairs) If a real valued matrix A ∈ Rn, has a complex eigenvalue λ with eigenvector v, then λ and v are also an eigenvalue and eigenvector of matrix A. Proof of Lemma 8.18: Complex conjugate the eigenvalue-eigenvector equation for λ and v and recalling that A = A, we obtain Av = λv ⇔ A v = λ v. Since the complex eigenvalues of a matrix with real coefficients are always complex conjugate pairs, there is an even number of complex eigenvalues. Denoting the eigenvalue pair by λ± and the corresponding eigenvector pair by v± , it holds that λ+ = λ− and v+ = v− . Hence, an eigenvalue and eigenvector pairs have the form λ± = α ± iβ, v± = a ± ib, (8.15) where α, β ∈ R and a, b ∈ Rn . It is simple to obtain two linearly independent solutions to the differential equation in Eq. (8.9) in the case that matrix A has a complex conjugate pair of eigenvalues and eigenvectors. These solutions can be expressed both as complex-valued or as real-valued functions. Theorem 8.19. (Conjugate pairs) Let λ± = α ± iβ be eigenvalues of a matrix A ∈ Rn,n with respective eigenvectors v± = a ± ib, where α, β ∈ R, while a, b ∈ Rn , and n 2. Then a linearly independent set of complex valued solutions to the differential equation in (8.9) is formed by the functions x+ = v+ eλ- t , x− = v− eλ- t , (8.16) while a linearly independent set of real valued solutions to Eq. 
(8.9) is given by the functions x1 = a cos(βt) − b sin(βt) eαt , x2 = a sin(βt) + b cos(βt) eαt . (8.17) Proof of Theorem 8.19: We know from Theorem 8.17 that two linearly independent solutions to Eq. (8.9) are given by Eq. (8.16). The new information in Theorem 8.19 above is the real-valued solutions in Eq. (8.17). They can be obtained from Eq. (8.16) as follows: x± = (a ± ib) e(α±iβ )t = eαt (a ± ib) e±iβt = eαt (a ± ib) cos(βt) ± i sin(βt) = eαt a cos(βt) − b sin(βt) ± ieαt a sin(βt) + b cos(βt) . 262 G. NAGY – LINEAR ALGEBRA december 8, 2009 Since the differential equation in (8.9) is linear, the functions below are also solutions, 1 x+ + x− = eαt a cos(βt) − b sin(βt) , 2 1 x2 = x+ − x− = eαt a sin(βt) + b cos(βt) . 2i This establishes the Theorem. x1 = Example 8.3.5: Find a real-valued set of fundamental solutions to the differential equation ˙ x = A x, 23 , −3 2 A= (8.18) and then sketch a phase diagram for the solutions of this equation. Solution: Fist find the eigenvalues of matrix A above, 0= (2 − λ) 3 = (λ − 2)2 + 9 −3 (2 − λ) ⇒ λ± = 2 ± 3i. We then find the respective eigenvectors. The one corresponding to λ+ is the solution of the homogeneous linear system with coefficients given by 2 − (2 + 3i) −3 3 −3i 3 −i = → 2 − (2 + 3i) −3 −3i −1 Therefore the eigenvector v+ = v1 = −iv2 ⇒ 1 1 → −i −1 i 1 → −i 0 i . 0 v1 is given by v2 v2 = 1 , v1 = −i, ⇒ v+ = −i , 1 λ+ = 2 + 3i. The second eigenvectors is the complex conjugate of the eigenvector found above, that is, v− = i , 1 λ− = 2 − 3i. Notice that v± = 0 −1 ± i. 1 0 Hence, the real and imaginary parts of the eigenvalues and of the eigenvectors are given by α = 2, β = 3, a= 0 , 1 b= −1 . 0 So a real-valued expression for a fundamental set of solutions is given by x1 = 0 −1 cos(3t) − sin(3t) e2t 1 0 ⇒ x1 = sin(3t) 2t e, cos(3t) x2 = 0 −1 sin(3t) + cos(3t) e2t 1 0 ⇒ x2 = − cos(3t) 2t e. sin(3t) The phase diagram of these two fundamental solutions is given in Fig. 58 below. There is also a circle given in that diagram, corresponding to the trajectory of the vectors ˜1 = x sin(3t) cos(3t) ˜2 = x − cos(3t) . sin(3t) The trajectory of these vectors is a circle since their length is constant equal to one. G. NAGY – LINEAR ALGEBRA December 8, 2009 263 x2 x (1) a x1 b x (2) Figure 58. The graph of the fundamental solutions x1 and x2 of the Eq. (8.18). In the particular case that the matrix A in Eq. (8.9) is 2 × 2, then any solutions is a linear combination of the solutions given in Eq. (8.17). That is, the general solution of given by x(t) = c1 x1 (t) + c2 x2 (t), where x1 = a cos(βt) − b sin(βt) eαt , x2 = a sin(βt) + b cos(βt) eαt . We now do a qualitative study of the phase diagrams of the solutions for this case. We first fix the vectors a and b, and the plot phase diagrams for solutions having α > 0, α = 0, and α < 0. These diagrams are given in Fig. 59. One can see that for α > 0 the solutions spiral outward as t increases, and for α < 0 the solutions spiral inwards to the origin as t increases.. x2 x x2 x2 (2) x a b (2) x (1) x (1) a b x1 x (1) a b x1 x1 x (2) Figure 59. The graph of the fundamental solutions x1 and x2 (dashed line) of the Eq. (8.17) in the case of α > 0, α = 0, and α < 0, respectively. Finally, let us study the following cases: We fix α > 0 and the vector a, and we plot the phase diagrams for solutions with two choices of the vector b, as shown in Fig. 60. It can then bee seen that the relative directions of the vectors a and b determines the rotation direction of the solutions as t increases. 264 G. 
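A short script can reproduce spiral pictures like the ones described in Figs. 58 and 59 (a sketch under the assumption that Matplotlib is available; it is not part of these notes), using the real fundamental solutions of Example 8.3.5, where α = 2 > 0 and the trajectories spiral outward:

```python
import numpy as np
import matplotlib.pyplot as plt

alpha, beta = 2.0, 3.0                       # eigenvalues 2 +/- 3i of Example 8.3.5
a, b = np.array([0.0, 1.0]), np.array([-1.0, 0.0])
t = np.linspace(0.0, 1.5, 1000)

def solution(c1, c2):
    x1 = np.outer(a, np.cos(beta * t)) - np.outer(b, np.sin(beta * t))
    x2 = np.outer(a, np.sin(beta * t)) + np.outer(b, np.cos(beta * t))
    return (c1 * x1 + c2 * x2) * np.exp(alpha * t)

for c1, c2 in [(1, 0), (0, 1)]:              # the two fundamental solutions
    x = solution(c1, c2)
    plt.plot(x[0], x[1])

plt.axis("equal"); plt.xlabel("x1"); plt.ylabel("x2")
plt.title("Outward spirals: alpha = 2 > 0")
plt.show()
```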
NAGY – LINEAR ALGEBRA december 8, 2009 x2 x x2 (2) x a x (2) a (1) b x1 x1 b x (1) Figure 60. The graph of the fundamental solutions x1 and x2 (dashed line) of the Eq. (8.17) in the case of α > 0, for a given vector a and for two choices of the vector b. The relative positions of the vectors a and b determines the rotation direction. G. NAGY – LINEAR ALGEBRA December 8, 2009 265 Exercises. 8.3.1.- Given the matrix A ∈ R2,2 and vector x0 ∈ R2 below, find the function x : R → R2 solution of the initial value problem ˙ x(t) = A x(t), where » A= 1 3 x(0) = x0 , – »– 3 3 , x0 = . 1 4 8.3.2.- Given the matrix A and the vector x0 in Exercise 8.3.1, compute the operator valued function eA t and verify that the solution x of the initial value problem given in Exercise 8.3.1 can be written as x(t) = eA t x0 . 8.3.3.- Given the matrix A ∈ R2,2 below, find all functions x : R → R2 solutions of the differential equation ˙ x(t) = A x(t), where » – 1 −1 A= . 5 −3 Since this matrix has complex eigenvalues, express the solutions as linear combination of real vector valued functions. 8.3.4.- Given the matrix A ∈ R3,3 and vector x0 ∈ R3 below, find the function x : R → R3 solution of the initial value problem ˙ x(t) = A x(t), where 2 3 A = 40 0 0 2 0 x(0) = x0 , 3 23 1 1 2 5 , x 0 = 42 5 . 1 3 266 G. NAGY – LINEAR ALGEBRA december 8, 2009 8.4. Normal operators In this Section we introduce a particular type of linear operators called normal operators, which are defined on vector spaces having an inner product. Particular cases are Hermitian operators and unitary operators, which are widely used in physics. Rotations in space are examples of unitary operators on real vector spaces, while physical observables in quantum mechanics are examples of Hermitian operators. In this Section we restrict our description to finite dimensional inner product spaces. These definitions generalize the notions of unitary and Hermitian matrices already introduced in Chapter 2. We first describe the Riesz Representation Theorem, needed to verify that the notion of adjoint of a linear operator is well-defined. After reviewing the commutator of two operators we then introduce normal operators and discuss the particular cases of unitary and Hermitian operators. Finally we comment on the relations between these notions and the unitary and Hermitian matrices already introduced in Chapter 2. The Riesz Representation Theorem is a statement concerning linear functionals on an inner product space. Given a vector space V over the scalar field F, a linear functional is a scalar-valued linear function f : V → F, that is, for all x, y ∈ V and all a, b ∈ F holds f (ax + by) = a f (x) + b f (y) ∈ F. An example of a linear functional on R3 is the function x1 R3 x = x2 → f (x) = 3x1 + 2x2 + x3 ∈ R. x3 This function can be expressed in terms of the dot product in R3 as follows 3 f (x) = u · x, u = 2 . 1 The Riesz Representation Theorem says that what we did in this example can be done in the general case. In an inner product space V , , every linear functional f can be expressed in terms of the inner product. Theorem 8.20. Consider a finite dimensional inner product space V , , over the scalar field F. For every linear functional f : V → F there exists a unique vector uf ∈ V such that holds f (v) = uf , v ∀v ∈ V. Proof of Theorem 8.20: Introduce the set N = {v ∈ V : f (v) = 0 } ⊂ V. This set is the analogous to linear functionals of the null space of linear operators. Since f is a linear function the set N is a subspace of V . 
Theorem 8.20. Consider a finite-dimensional inner product space (V, ⟨·,·⟩) over the scalar field F. For every linear functional f : V → F there exists a unique vector u_f ∈ V such that

f(v) = ⟨u_f, v⟩    for all v ∈ V.

Proof of Theorem 8.20: Introduce the set

N = { v ∈ V : f(v) = 0 } ⊂ V.

This set plays, for linear functionals, the role that the null space plays for linear operators. Since f is a linear function, the set N is a subspace of V. (Proof: given two elements v1, v2 ∈ N and two scalars a, b ∈ F, holds f(a v1 + b v2) = a f(v1) + b f(v2) = 0 + 0 = 0, so (a v1 + b v2) ∈ N.) Introduce the orthogonal complement of N, that is,

N⊥ = { w ∈ V : ⟨w, v⟩ = 0 for all v ∈ N },

which is also a subspace of V. If N⊥ = {0}, then N = (N⊥)⊥ = {0}⊥ = V. Since the null space of f is then the whole vector space, the functional f is identically zero, and u_f = 0 is the only vector satisfying f(v) = ⟨u_f, v⟩ for all v ∈ V.

In the case that N⊥ ≠ {0} we now show that this space cannot be very big: in fact it has dimension one, as the following argument shows. Choose ũ ∈ N⊥ such that f(ũ) = 1. Then notice that for every w ∈ N⊥ the vector w − f(w) ũ is trivially in N⊥, but it is also in N, since

f( w − f(w) ũ ) = f(w) − f(w) f(ũ) = f(w) − f(w) = 0.

A vector belonging to both N and N⊥ must vanish, so w = f(w) ũ. Then every vector in N⊥ is proportional to ũ, so dim N⊥ = 1. This information is used to split any vector v ∈ V as v = a ũ + x, where x ∈ N and a ∈ F. It is clear that

f(v) = f(a ũ + x) = a f(ũ) + f(x) = a f(ũ) = a.

However, the function with values g(v) = ⟨ ũ / ‖ũ‖², v ⟩ takes precisely the same values as f, since for all v ∈ V holds

g(v) = ⟨ ũ / ‖ũ‖², v ⟩ = ⟨ ũ / ‖ũ‖², a ũ + x ⟩ = (a / ‖ũ‖²) ⟨ũ, ũ⟩ + (1 / ‖ũ‖²) ⟨ũ, x⟩ = a.

Therefore, choosing u_f = ũ / ‖ũ‖², holds f(v) = ⟨u_f, v⟩ for all v ∈ V. Finally, the choice of u_f is unique: if ⟨u_f, v⟩ = ⟨û_f, v⟩ = f(v) for all v ∈ V, then u_f − û_f is orthogonal to every vector in V, in particular to itself, so u_f = û_f. This establishes the Theorem.

Given a linear operator defined on an inner product space, a new linear operator can be defined through an equation involving the inner product.

Proposition 8.21. Let T ∈ L(V) be a linear operator on a finite-dimensional inner product space (V, ⟨·,·⟩). There exists one and only one linear operator T* ∈ L(V) such that

⟨v, T*(u)⟩ = ⟨T(v), u⟩

holds for all vectors u, v ∈ V.

Given any linear operator T on a finite-dimensional inner product space, the operator T* whose existence is guaranteed by Proposition 8.21 is called the adjoint of T.

Proof of Proposition 8.21: We first establish the following statement: for every vector u ∈ V there exists a unique vector w ∈ V such that

⟨T(v), u⟩ = ⟨v, w⟩    for all v ∈ V.    (8.19)

The proof starts by noticing that, for a fixed u ∈ V, the scalar-valued function f_u : V → F given by f_u(v) = ⟨u, T(v)⟩ is a linear functional. Therefore, the Riesz Representation Theorem 8.20 implies that there exists a unique vector w ∈ V such that f_u(v) = ⟨w, v⟩; taking complex conjugates of both sides of ⟨u, T(v)⟩ = ⟨w, v⟩ gives ⟨T(v), u⟩ = ⟨v, w⟩. This establishes that for every vector u ∈ V there exists a unique vector w ∈ V such that Eq. (8.19) holds. Now that this statement is proven we can define a map, which we choose to denote by T* : V → V, given by u ↦ T*(u) = w. We now show that this map T* is linear. Indeed, for all u1, u2 ∈ V, all a, b ∈ F, and all v ∈ V holds

⟨v, T*(a u1 + b u2)⟩ = ⟨T(v), a u1 + b u2⟩ = a ⟨T(v), u1⟩ + b ⟨T(v), u2⟩ = a ⟨v, T*(u1)⟩ + b ⟨v, T*(u2)⟩ = ⟨v, a T*(u1) + b T*(u2)⟩,

hence T*(a u1 + b u2) = a T*(u1) + b T*(u2). This establishes the Proposition.

The next result relates the adjoint of a linear operator to the adjoint of a square matrix introduced in Sect. 2.2. Recall that, once a basis of the vector space is fixed, every linear operator has associated with it a unique square matrix. Let us use the notation [T] and [T*] for the matrices, in a given basis, of the operators T and T*, respectively.
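In C^n with the dot product, the adjoint of the operator x ↦ A x is x ↦ A* x, where A* is the conjugate transpose of A; the next result, Proposition 8.22, states this in general for orthonormal bases. The following sketch is an added illustration (assuming NumPy, with a randomly chosen matrix) that checks the defining relation ⟨T(v), u⟩ = ⟨v, T*(u)⟩ numerically.

```python
# Check <A v, u> = <v, A* u> for the dot product on C^n, with A* the conjugate transpose.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T

def ip(x, y):
    # dot product on C^n, conjugate-linear in the first argument
    return np.vdot(x, y)

for _ in range(5):
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    assert np.isclose(ip(A @ v, u), ip(v, A_star @ u))
```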
Proposition 8.22. Let (V, ⟨·,·⟩) be a finite-dimensional inner product space, let {e1, · · · , en} be an orthonormal basis of V, and let [T] be the matrix of the linear operator T ∈ L(V) in that basis. Then the matrix of the adjoint operator T* in the same basis is given by

[T*] = [T]*.

Proposition 8.22 says that the matrix of the adjoint operator is the adjoint of the matrix of the operator; however, this holds only when the basis used to compute the respective matrices is orthonormal. If the basis is not orthonormal, the relation between the matrices [T] and [T*] is more involved.

Proof of Proposition 8.22: Let {e1, · · · , en} be the orthonormal basis above, that is,

⟨ei, ej⟩ = 0 if i ≠ j,    ⟨ei, ej⟩ = 1 if i = j.

The components of two arbitrary vectors u, v ∈ V in this basis are denoted as follows,

u = ∑i ui ei,    v = ∑i vi ei.

The action of the operator T can also be decomposed in this basis, as

T(ej) = ∑i [T]ij ei,    [T]ij = [T(ej)]i,

and we use the same notation for the adjoint operator, that is, T*(ej) = ∑i [T*]ij ei. The adjoint operator is defined by the equation ⟨v, T*(u)⟩ = ⟨T(v), u⟩ for all u, v ∈ V. Expressing both sides in components and using that the basis is orthonormal, one obtains

⟨v, T*(u)⟩ = ∑i,j v̄i uj [T*]ij,    ⟨T(v), u⟩ = ∑i,j v̄i ([T]*)ij uj,

where v̄i denotes the complex conjugate of vi and [T]* is the conjugate transpose of [T], that is, ([T]*)ij is the complex conjugate of [T]ji. Since the two expressions above agree for all vectors u, v ∈ V, we conclude that [T*]ij = ([T]*)ij for all i, j, that is,

[T*] = [T]*.

This establishes the Proposition.

Example 8.4.1: Consider the inner product space (C^3, ·). Find the adjoint of the linear operator T whose action in the standard basis of C^3 is given by

[T(x)] = [ x1 + 2i x2 + i x3; i x1 − x3; x1 − x2 + i x3 ],    [x] = [ x1; x2; x3 ].

Solution: The matrix of this operator in the standard basis of C^3 is given by

[T] = [ 1  2i  i; i  0  −1; 1  −1  i ].

Since the standard basis is an orthonormal basis with respect to the dot product, Proposition 8.22 implies that

[T*] = [T]* = [ 1  −i  1; −2i  0  −1; −i  −1  −i ]    ⇒    [T*(x)] = [ x1 − i x2 + x3; −2i x1 − x3; −i x1 − x2 − i x3 ].

Recall now that the commutator of two linear operators T, S ∈ L(V) is the linear operator [T, S] ∈ L(V) given by

[T, S](u) = T(S(u)) − S(T(u))    for all u ∈ V.

Two operators T, S ∈ L(V) are said to commute iff their commutator vanishes, that is, [T, S] = 0. An example of operators that commute is a pair of rotations on the plane; an example of operators that in general do not commute is a pair of arbitrary rotations in space.

Definition 8.23. A linear operator T defined on a finite-dimensional inner product space (V, ⟨·,·⟩) is called a normal operator iff [T, T*] = 0, that is, iff the operator commutes with its adjoint.

An interesting characterization of normal operators is the following: a linear operator T on an inner product space is normal iff ‖T(u)‖ = ‖T*(u)‖ holds for all u ∈ V. Normal operators are particularly important because the Spectral Theorem, which we study in Sect. 8.5, holds for them.

Two particular cases of normal operators are often used in physics. A linear operator T on an inner product space is called a unitary operator iff T* = T^{-1}, that is, iff the adjoint is the inverse operator. Unitary operators are normal operators, since

T* = T^{-1}    ⇒    T T* = I,  T* T = I    ⇒    [T, T*] = 0.

Unitary operators preserve the length of a vector, since

‖v‖² = ⟨v, v⟩ = ⟨v, T^{-1}(T(v))⟩ = ⟨v, T*(T(v))⟩ = ⟨T(v), T(v)⟩ = ‖T(v)‖².
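These properties are easy to test on a concrete operator. The sketch below is an added illustration (assuming NumPy): a plane rotation is unitary (orthogonal, in the real case), hence normal; it preserves lengths and it satisfies the characterization ‖T(u)‖ = ‖T*(u)‖.

```python
# A rotation of the plane: unitary (orthogonal), hence normal and length-preserving.
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(R.T @ R, np.eye(2))        # R* = R^T is the inverse: unitary
assert np.allclose(R @ R.T - R.T @ R, 0)      # commutator [R, R*] = 0: normal

rng = np.random.default_rng(2)
u = rng.standard_normal(2)
assert np.isclose(np.linalg.norm(R @ u), np.linalg.norm(u))        # length preserved
assert np.isclose(np.linalg.norm(R @ u), np.linalg.norm(R.T @ u))  # ||T(u)|| = ||T*(u)||
```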
Unitary operators defined on a complex inner product space are particularly important in quantum mechanics. Unitary operators defined on a real inner product space are called orthogonal operators; in particular, orthogonal operators do not change the length of a vector. Examples of orthogonal operators are rotations in R^3.

A linear operator T on an inner product space is called a Hermitian operator iff T* = T, that is, iff the adjoint is the original operator. This definition agrees with the definition of Hermitian matrices given in Chapter 2.

Example 8.4.2: Consider the inner product space (C^3, ·) and the linear operator T whose action in the standard basis of C^3 is given by

[T(x)] = [ x1 − i x2 + x3; i x1 − x3; x1 − x2 + x3 ],    [x] = [ x1; x2; x3 ].

Show that T is Hermitian.

Solution: We need to compute the adjoint of T. The matrix of this operator in the standard basis of C^3 is given by

[T] = [ 1  −i  1; i  0  −1; 1  −1  1 ].

Since the standard basis is an orthonormal basis with respect to the dot product, Proposition 8.22 implies that

[T*] = [T]* = [ 1  −i  1; i  0  −1; 1  −1  1 ] = [T].

Therefore, T* = T.

Exercises.

8.4.1.- .
8.4.2.- .

8.5. The spectral theorem

To be done.
• A sufficient condition for diagonalizability: an n × n matrix with n different eigenvalues has a linearly independent set of n eigenvectors.
• Problem: find or characterize diagonalizable operators, that is, operators having a complete set of eigenvectors. Recall that the matrix representations of an operator change by similarity transformations under a change of basis. That is why the problem of finding the diagonalizable operators is sometimes referred to as finding those matrices that can be diagonalized by a similarity transformation.
• Definition of the Hilbert adjoint operator. It generalizes the adjoint of a matrix to linear operators.
• Recall: the commutator and normal operators. Particular case: Hermitian operators.
• Review orthogonal projectors. (P Hermitian and P² = P.)
• Spectral decomposition Theorem: normal operators have a complete orthonormal set of eigenvectors.
• Also: if T is normal, then (a) N(T) ⊥ R(T), (b) N(T − λi I) ⊥ N(T − λj I) for i ≠ j, (c) the Spectral Theorem holds, and the projectors become orthogonal projectors.
• Important case: if F = R, then Hermitian operators are symmetric operators.

Exercises.

8.5.1.- .
8.5.2.- .

Chapter 9. Appendix

9.1. Review exercises

9.1.1. Chapter 1: Linear systems.

1.- Consider the linear system
2x1 + 3x2 − x3 = 6
−x1 − x2 + 2x3 = −2
x1 + 2x3 = 2
(a) Use Gauss operations to find the reduced echelon form of the augmented matrix of this system.
(b) Is this system consistent? If "yes," find all its solutions.

2.- Find all the solutions x to the linear system Ax = b and express them in vector form, where
3 23 2 1 1 −2 −1 1 8 5 , b = 42 5 . A = 42 1 1 −1 1

3.- Consider the matrix and the vector
3 23 2 0 1 −2 7 1 1 5 , b = 41 5 . A = 41 3 2 22
Is the vector b a linear combination of the column vectors of A?

4.- Let s be a real number, and consider the system
s x1 − 2s x2 = −1,
3 x1 + 6s x2 = 3.
(a) Determine the values of the parameter s for which the system above has a unique solution.
(b) For all the values of s such that the system above has a unique solution, find that solution.
5.- Find the values of k such that the system below has no solution; has one solution; has infinitely many solutions; kx1 + x2 = 1 x1 + kx2 = 1. 6.- Find a condition on the components of vector b such that the system Ax = b is consistent, where 2 3 23 1 1 −1 b1 A = 42 0 − 6 5 , b = 4b 2 5 . 3 1 −7 b3 7.- Find the general solution to the homogeneous linear system with coefficient matrix 2 3 1 3 −1 5 3 05 , A = 42 1 32 41 and write this general solution in vector form. 8.- (a) Find a value of the constants h and k such that the non-homogeneous linear system below is consistent and has one free variable. x1 + h x2 + 5x3 = 1, x2 − 2x3 = k, x1 + 3x2 − 3x3 = 5. (b) Using the value of the constants h and k found in part (a), find the general solution to the system given in part (a). 9.- (a) Find the general solution to the system below and write it in vector form, x1 + 2x2 − x3 = 2, 3x1 + 7x2 − 3x3 = 7, x1 + 4x2 − x3 = 4. (b) Sketch a graph on R3 of the general solution found in part (a). G. NAGY – LINEAR ALGEBRA December 8, 2009 275 9.1.2. Chapter 2: Matrix algebra. 1.- Consider the vectors »– »– 1 1 u= , v= , 1 −1 and the linear function T : R2 → R2 such that »– »– 1 3 T (u) = , T (v ) = . 3 1 Find the matrix A = [T (e1 ), T (e2 )] of the linear transformation, where 1 1 e1 = (u + v), e2 = (u − v). 2 2 Show your work. 2.- Find the matrix for the linear transformation T : R2 → R2 representing a reflection on the plane along the vertical axis followed by a rotation by θ = π/3 counterclockwise. 3.- Which of the following matrices below is equal to (A + B)2 for every square matrices A and B? (B + A)2 , 2 2 A + 2AB + B , (A + B)(B + A), 2 A + AB + BA + B2 , A(A + B) + (A + B)B. 4.- Find a matrix A solution of the matrix equation – » 54 AB + 2 I2 = , −2 3 where » B= 7 2 5.- Consider the matrix 2 53 A = 41 2 21 – 3 . 1 3 s −15 . 1 Find the value(s) of the constant s such that the matrix A is invertible. 6.- Let A be an n × n matrix, D be and m × m matrix, and C be and m × n matrix. Assume that both A and D are invertible matrices and denote by A−1 , D−1 their respective inverse matrices. Let M be the (n + m) × (n + m) matrix » – A0 M= . CD Find an m × n matrix X (in terms of any of the matrices A, D, A−1 , D−1 , and C) such that M is invertible and the inverse is given by » −1 – A 0 M −1 = −1 . X D 7.- Consider the matrix and the vector 2 3 23 1 −2 7 0 1 15 , b = 415 . A = 41 2 22 3 (a) Does vector b belong to the R(A)? (b) Does vector b belong to the N (A)? 8.- Consider the matrix 2 1 2 3 A = 4− 2 0 −1 6 2 −2 3 −7 0 5. 2 Find N (A) and the R(AT ). 9.- Consider the matrix 2 21 A = 44 5 69 3 3 7 5. 12 (a) Find the LU-factorization of A. (b) Use the LU-factorization above to find the solutions x1 and x2 of the systems Ax = e1 and Ax2 = e2 , where 23 23 1 0 e1 = 405 , e2 = 415 . 0 0 (c) Use the LU-factorization above to find A−1 . 276 G. NAGY – LINEAR ALGEBRA december 8, 2009 9.1.3. Chapter 3: Determinants. 1.- Find the determinant of the matrices » – 1+i −3i A= , −4i 1 − 2i 2 3 0 −1 0 0 6 0 −2 3 −27 7, B=6 40 0 −1 −35 −2 0 0 1 » – cos(θ) − sin(θ) C= . sin(θ) cos(θ) 2.- Given matrix A below, find the cofactors matrix C, and explicitly show that A CT = det(A) I3 , where 2 3 1 3 −1 1 5. A = 44 0 21 3 3.- Given matrix A below, find the coefficients (A−1 )13 and (A−1 )23 of the inverse matrix A−1 , where 2 3 53 1 A = 41 2 −15 . 
21 1 4.- Find the change in the area of the parallelogram formed by the vectors »– »– 2 1 , , v= u= 1 2 when this parallelogram is transformed under the following linear transformation, A : R2 → R2 , » – 21 A= . 12 5.- Find the volume of the parallelepiped formed by the vectors 23 23 23 1 3 1 v1 = 425 , v2 = 425 , v3 = 4 1 5 . 3 1 −1 6.- Consider the matrix » 4 A= 1 – 2 . 3 Find the values of the scalar λ such that the matrix (A − λ I2 ) is not invertible. 7.- Prove the following: If there exists an integer k 1 such that A ∈ Fn,n satisfies Ak = 0, then det(A) = 0. 8.- Assume that matrix A ∈ Fn,n satisfies the equation A2 = In . Find all possible values of det(A). 9.- Use Cramer’s rule to find the solution of the linear system Ax = b, where 3 23 2 1 1 4 −1 1 5 , b = 405 . A = 41 1 0 20 3 G. NAGY – LINEAR ALGEBRA December 8, 2009 277 9.1.4. Chapter 4: Vector spaces. 1.- Determine which of the following subsets of R3,3 are subspaces: (a) The symmetric matrices. (b) The skew-symmetric matrices. (c) The matrices A with A2 = A. (d) The matrices A with tr (A) = 0. (e) The matrices A with det(A) = 0. In the case that the set is a subspace, find a basis of this subspace. 2.- Find the dimension and give a basis of the subspace W ⊂ R3 given by 2 3 n − a + b + c − 3d o 4 b + 3c − d 5 with a, b, c, d ∈ R . a + 2b + 8c 3.- Find the dimension of both the null space and the range space of the matrix 2 3 1 11 5 2 14 7 3 5. A = 42 0 −1 2 −3 −1 4.- Consider the matrix 2 1 −1 1 A = 40 1 3 3 5 −25 . −3 (a) Find a basis for the null space of A. (b) Find a basis for the subspace in R3 consisting of all vectors b ∈ R3 such that the linear system Ax = b is consistent. 5.- Show whether the following statement is true or false: Given a vector space V , if the set {v1 , v2 , v3 } ⊂ V is linearly independent, then so is the set {w1 , w2 , w3 }, where w1 = (v1 + v2 ), w2 = (v1 + v3 ), w3 = (v2 + v3 ). 6.- Show that the set U ⊂ P3 given by all polynomials satisfying the condition Z1 p(x) dx = 0 0 is a subspace of P3 . Find a basis for U . 7.- Determine whether the set U ⊂ P2 of all polynomials of the form p(x) = a + ax + ax2 with a ∈ F, is a subspace of P2 . 278 G. NAGY – LINEAR ALGEBRA december 8, 2009 9.1.5. Chapter 5: Linear transformations. 1.- Consider the matrix 2 1 2 3 A = 4− 2 0 −1 6 2 −2 3 −7 0 5. 2 (1) Find a basis for R(A). (2) Find a basis for N (A). (3) Consider the linear transformation A : R4 → R3 determined by A. Is it injective? Is it surjective? Justify your answers. 2.- Let T : R3 → R2 be the linear transformation given by » – 2x1 + 6x2 − 2x3 [T (xs )]s = , ˜ 3x1 + 8x2 + 2x3 s 23 x1 [x]s = 4x2 5 , x3 ˜ where S and S are the standard bases in R3 and R2 , respectively. (a) Is T injective? Is T surjective? (b) Find all solutions of the linear system T (xs ) = bs , where ˜ »– 2 . bs = ˜ −1 (c) Is the set of all solutions found in part (a) a subspace of R3 ? 3.- Let A : C3 → C4 be mation 2 1 60 A=6 41 − i i the linear transfor0 i 0 1 3 i 17 7. 1 + i5 0 Find a basis for N (A) and R(A). 4.- Let D : P3 → P3 be the differentiation operator, dp D(p)(x) = (x), dx and I : Pn → Pn the identity operator. Let S = (1, x, x2 , x3 ) be the standard ordered basis of P3 . Show that the matrix of the operator ( I − D2 ) : P3 → P3 in the basis S is invertible. G. NAGY – LINEAR ALGEBRA December 8, 2009 279 9.1.6. Chapter 6: Inner product spaces. ` ´ 1.- Let V, , be a real inner product space. Show that (x − y) ⊥ (x + y) iff x = y. 2.- Consider ´the inner product space given ` by Fn , · . 
Prove that for every matrix A ∈ Fn,n holds x · (Ay) = (A∗ x) · y. 3.- A matrix A ∈ Fn,n is called unitary iff A A∗ = A∗ A = In . Show that a unitary matrix does not change the norm of a vector in the in` ´ ner product space Fn , · , that is, for all x ∈ Fn and all unitary matrix A ∈ Fn,n holds Ax = x . 4.- Find all vectors in the inner product ` ´ space R4 , · perpendicular to both 23 23 1 2 64 7 67 6 7 and u = 697 . v=4 5 485 4 1 2 5.- Consider ´ inner product space given the ` by C3 , · and the subspace W ⊂ C3 spanned by the vectors 2 3 2 3 1+i −1 u = 4 1 5, v = 4 0 5. i 2−i (a) Use the Gram-Schmidt method to find an orthonormal basis for W . (b) Extend the orthonormal basis of W into an orthonormal basis for C3 . 280 G. NAGY – LINEAR ALGEBRA december 8, 2009 9.1.7. Chapter 7: Normed spaces. 1.- . 2.- . G. NAGY – LINEAR ALGEBRA December 8, 2009 9.1.8. Chapter 8: Spectral decomposition. 1.- . 2.- . 281 282 G. NAGY – LINEAR ALGEBRA december 8, 2009 9.1.9. Practice Exam 1. 53 1 1. Consider the matrix A = 1 2 −1. Find the coefficients (A−1 )21 and (A−1 )32 of the 21 1 matrix A−1 , that is, of the inverse matrix of A. Show your work. 2. (a) Find k ∈ R such that the volume of the parallelepiped formed by the vectors below is equal to 4, where 1 v1 = 2 , 3 k v3 = 1 1 3 v2 = 2 , 1 (b) Set k = 1 and define the matrix A = [v1 , v2 , v3 ]. Matrix A determines the linear transformation A : R3 → R3 . Is this linear transformation injective (one-to-one)? Is it surjective (onto)? 3. Determine whether the subset V ⊂ R3 is a subspace, where −a + b V = a − 2b a − 7b with a, b ∈ R . If the set is a subspace, find an orthogonal basis in the inner product space R3 , · . 4. True of false: (Justify your answers.) (a) If the set of columns of A ∈ Fm,n is a linearly independent set, then Ax = b has exactly one solution for every b ∈ Fm . (b) The set of column vectors of an 5 × 7 is never linearly independent. 5. Consider the vector space R2 with the standard basis S and let T : R2 → R2 be the linear transformation [T ]ss = 1 2 2 3 . ss Find [T ]bb , the matrix of T in the basis B , where B = [b1 ]s = 1 2 , [b2 ]s = s 1 −2 . s 6. Consider the linear transformations T : R3 → R2 and S : R3 → R3 given by x1 T x2 x3 s s2 x1 − x2 + x3 = −x1 + 2x2 + x3 , s2 3 3 x1 S x2 x3 s 2 s3 3 3x3 = 2x2 , x1 s 3 where S3 and S2 are the standard basis of R and R , respectively. (a) Find a matrix [T ]s3 s2 and the matrix [S ]s3 s3 . Show your work. (b) Find the matrix of the composition T ◦ S : R3 → R2 in the standard basis, that is, find [T ◦ S ]s3 s2 . (c) Is T ◦ S injective (one-to-one)? Is T ◦ S surjective (onto)? Justify your answer. 11 2 7. Consider the matrix A = 1 2 and the vector b = 1. 21 1 G. NAGY – LINEAR ALGEBRA December 8, 2009 283 (a) Find the least-squares solution ˆ to the matrix equation A x = b. x (b) Verify whether the vector A ˆ − b belong to the space R(A)⊥ ? Justify your answers. x −1/2 −3 . 1/2 2 (a) Show that matrix A is diagonalizable. (b) Using that A is diagonalizable, find the lim Ak . 8. Consider the matrix A = k→∞ 9. Let V , , be an inner product space with inner product norm . Let T : V → V be a linear transformation and x, y ∈ V be vectors satisfying the following conditions: T(x) = 2 x, T(y) = −3 y, x = 1/3, y = 1, x ⊥ y. (a) Compute v for the vector v = 3x − y. (b) Compute T(v) for the vector v given above. 2 −1 2 10. Consider the matrix A = 0 1 h. 0 02 (a) Find all eigenvalues of matrix A and their corresponding algebraic multiplicities. 
(b) Find the value(s) of the real number h such that the matrix A above has a twodimensional eigenspace, and find a basis for this eigenspace. 284 G. NAGY – LINEAR ALGEBRA december 8, 2009 9.1.10. Practice Exam 2. −2 3 −1 2 −1. Find the coefficients (A−1 )13 and (A−1 )21 of 1. Consider the matrix A = 1 −2 −1 1 the inverse matrix of A. Show your work. 2. Consider the vector space P3 ([0, 1]) with the inner product 1 p, q = p(x)q(x) dx. 0 Given the set U = {p1 = x2 , p2 = x3 }, find an orthogonal basis for the subspace U = Span U using the Gram-Schmidt method on the set U starting with the vector p1 . 131 1 3. Consider the matrix A = 2 6 3 0 . 3 9 5 −1 3 1 (a) Verify that the vector v = belongs to the null space of A. −4 −2 (b) Extend the set {v} into a basis of the null space of A. 4. Use Cramer’s rule to find the solution to the linear system 2x1 + x2 x3 = 0 x1 + x3 = 1 x1 + 2x2 + 3x3 = 0. 5. Let S3 and S2 be standard bases of R3 and R2 , respectively, and consider the linear transformation T : R3 → R2 given by x1 −x1 + 2x2 − x3 T x2 = x1 + x3 s2 x3 s , s2 3 3 and introduce the bases U ⊂ R and V ⊂ R2 given by 1 1 0 U = [u1 ]s3 = 1 , [u2 ]s3 = 0 , [u3 ]s3 = 1 0s 1s 1s 3 3 V = [v1 ]s2 = 1 2 , [v2 ]s2 = s2 −3 2 , 3 . s2 Find the matrices [T ]s3 s2 and [T ]uv . Show your work. 6. Consider the inner product space R2,2 , , W = Span E1 = F and the subspace 01 1 , E2 = 10 0 0 −1 Find a basis for W ⊥ , the orthogonal complement of W . ⊂ R2,2 . G. NAGY – LINEAR ALGEBRA December 8, 2009 285 1 2 0 7. Consider the matrix A = 1 −1 and the vector b = 1. −2 1 0 (a) Find the least-squares solution ˆ to the matrix equation A x = b. x (b) Verify that the vector A ˆ − b, where ˆ is the least-squares solution found in part (a), x x belongs to the space R(A)⊥ , the orthogonal complement of R(A). 8. Suppose that a matrix A ∈ R3,3 has eigenvalues λ1 = 1, λ2 = 2, and λ3 = 4. (a) Find the trace of A, find the trace of A2 , and find the determinant of A. (b) Is matrix A invertible? If your answer is “yes”, then prove it and find det(A−1 ); if your answer is “no”, then prove it. 7 5 . 3 −7 (a) Find the eigenvalues and eigenvectors of A. (b) Compute the matrix eA . 9. Consider the matrix A = 10. Find the function x : R → R2 solution of the initial value problem d x(t) = A x(t), x(0) = x0 , dt −5 2 1 where the matrix A = and the vector x0 = . −12 5 1 286 G. NAGY – LINEAR ALGEBRA december 8, 2009 References [1] [2] [3] [4] S. Hassani. Mathematical physics. Springer, New York, 2000. Corrected second printing. D. Lay. Linear algebra and its applications. Addison Wesley, New York, 2005. Third updated edition. C. Meyer. Matrix analysis and applied linear algebra. SIAM, Philadelphia, 2000. G. Strang. linear algebra and its applications. Brooks/Cole, India, 2005. Fourth edition. ...