
INTERMEDIATE CALCULUS AND LINEAR ALGEBRA
Part I

J. KAZDAN
Harvard University
Lecture Notes

Preface

These notes will contain most of the material covered in class, and be distributed before each lecture (hopefully). Since the course is an experimental one and the notes are written before the lectures are delivered, there will inevitably be some sloppiness, disorganization, and even egregious blunders—not to mention the issue of clarity in exposition. But we will try. Part of your task is, in fact, to catch and point out these rough spots. In mathematics, proofs are not dogma given by authority; rather, a proof is a way of convincing one of the validity of a statement. If, after a reasonable attempt, you are not convinced, complain loudly.

Our subject matter is intermediate calculus and linear algebra. We shall develop the material of linear algebra and use it as a setting for the relevant material of intermediate calculus. The first portion of our work—Chapter 1 on infinite series—more properly belongs in the first year, but is relegated to the second year by circumstance. Presumably this topic will eventually take its more proper place in the first year.

Our course will have a tendency to swallow whole two other more advanced courses, and consequently, like the duck in Peter and the Wolf, remain undigested until regurgitated alive and kicking. To mitigate—if not avoid—this problem, we shall often take pains to state a theorem clearly and then either prove only some special case, or offer no proof at all. This will be true especially if the proof involves technical details which do not help illuminate the landscape. More often than not, when we only prove a special case, the proof in the general case is essentially identical—the equations only becoming larger.

September 1964

Afterward

I have now taught from these notes for two years. No attempt has been made to revise them, although a major revision would be needed to bring them even vaguely in line with what I now believe is the "right" way to do things. And too, the last several chapters remain unwritten. Because the notes were written as a first draft under panic pressure, they contain many incompletely thought-out ideas and expose the whimsy of my passing moods. It is with this—and the novelty of the material at the sophomore level—in mind that the following suggestions and students' reactions are listed. There are three categories: A) material that turned out to be too difficult (they found rigor hard, but not many of the abstractions); B) changes in the order of covering the material; and C) material—mainly supplementary at this level—which is not too hard, but should be omitted if one ever hopes to complete the "standard" topics within the confines of a year course.

(A) It was too hard (unless one took vast chunks of time).

(1) Completeness of the reals. Only "monotone sequences converge" is needed for infinite series.

(2) Term-by-term differentiation and integration of power series. The statement of the main theorem should be fully intelligible—but the proof is too complicated.

(3) Cosets. This is apparently too abstract. It might be possible to do after finding general solutions of linear inhomogeneous O.D.E.'s.

(4) L2 and uniform convergence of Fourier series. Again, all I ended up doing was to try to state what the issues were, and not to attempt the proof.
The ambitious student should be warned that my proof of the Weierstrass theorem is opaque (one should explicitly introduce the idea of an approximate identity).

(5) Fundamental Theorem of Algebra. The students simply don't believe inequalities in such profusion.

(6) If you want to see rank confusion, try to teach the class how to compute higher order partial derivatives using the chain rule. That computation should be one of the headaches of advanced calculus.

(7) Existence of a determinant function. I don't know a simple proof except for the one involving permutations—and I hate that one.

(8) Dual spaces. As lovely as the ideas are, this topic is too abstract, and to my knowledge, unneeded at this level where almost all of the spaces are either finite dimensional or Hilbert spaces. One should, however, mention the words "vector" and "covector" to distinguish column from row vectors. I forgot to do so in these notes and it did cause some confusion.

(B) Changes in Order and Timing.

The structure of the notes is to investigate bare linear spaces, then linear mappings between them, and finally non-linear mappings between them. It is with this in mind that linear O.D.E.'s came before nonlinear maps from Rn → R. The course ended by treating the simplest problem in the calculus of variations as an example of a nonlinear map from an infinite dimensional space to the reals. My current feeling is to consider linear and non-linear maps between finite dimensional spaces before doing the infinite dimensional example of differential equations.

The first semester should get up to the generalities on solving LX = Y, p. 319 [incidentally, the material on inverses (p. 355 ff) belongs around p. 319]. Most students find the material on linear dependence difficult—probably for two reasons: i) they are not used to formal definitions, and ii) they think they have learned a technique for doing something, not just a naked definition, and can't quite figure out just what they can do with it. In other words, they should feel these definitions about the anatomy of linear spaces are similar to those describing a football field and of little value until the game begins—i.e., until the operators between spaces make their grand entrance.

Because of time shortages, the sections on linear maps from R1 → Rn and Rn → R1, pp. 320-41, were regrettably omitted both years I taught the course. The notes were written so that these sections can be skipped.

(C) Supplementary Material.

A remarkable number of fascinating and important topics could have been included—if there were only enough time. For example:

(1) Change of bases for linear transformations (including the spectral theorem).

(2) Elementary differential geometry of curves and surfaces.

(3) Inverse and implicit function theorems. These should be stated as natural generalizations of the problems of a) inverting a linear map, b) finding the null space of a linear map, and c) generalizing dim D(L) = dim R(L) + dim N(L), all to local properties of nonlinear maps via the tangent map.

(4) Change of variable in multiple integration. Determinants were deliberately introduced as oriented volume to make the result obvious for linear maps and plausible for nonlinear maps.

(5) Constrained extrema using Lagrange multipliers.

(6) Line and surface integrals along with the theorems of Gauss, Green, and Stokes. The formal development of differential forms takes too much time to do here.
Perhaps a satisfactory solution is to restrict oneself to line integrals and these theorems in the plane, where the topological difficulties are minimal.

(7) Elementary Morse Theory. One can prove the Morse inequalities easily for the real line, the circle, the plane, and S2 merely by gradually flooding these sets and observing that the number of lakes and shore lines changes only at the critical points.

(8) Sturm-Liouville theory. An elegant fusion of the geometry of Hilbert spaces with differential equations.

(9) Translation-invariant operators with applications to constant coefficient difference and differential equations. The Laplace and Fourier transforms enter naturally here.

(10) The Calculus of Variations. The formalism of nonlinear functionals on R, i.e., maps f : R → R, generalizes immediately to nonlinear functionals defined on infinite dimensional spaces.

(11) The deleted rigor.

(12) Linear operators with finite dimensional (perhaps even compact) range.

One parting warning. When covering intermediate calculus from this viewpoint, it is all too natural to forget the innocence of the class, to enchant with glitter, and to numb with purity and formalism. Emphasis should be placed on developing insight and intuition along with routine computational facility. My classes found frequent reviews of the mathematical edifice, backward glances at the previous months' work, not only helpful but mandatory if they were to have any conception of the vast canvas which was being etched in their minds over the course of the year. The question, "What are we doing now and how does it fit into the larger plan?" must constantly be raised and at least partially resolved.

May, 1966

Contents

0 Remembrance of Things Past. (p. 1)
  0.1 Sets and Functions (p. 1)
  0.2 Relations (p. 5)
  0.3 Mathematical Induction (p. 6)
  0.4 Reals: Algebraic and Order Properties (p. 7)
  0.5 Reals: Completeness (p. 9)
  0.6 Appendix: Continuous Functions and the Mean Value Theorem (p. 15)
  0.7 Complex Numbers: Algebraic Properties (p. 22)
  0.8 Complex Numbers: Completeness and Functions (p. 28)

1 Infinite Series (p. 33)
  1.1 Introduction (p. 33)
  1.2 Tests for Convergence of Positive Series (p. 36)
  1.3 Absolute and Conditional Convergence (p. 41)
  1.4 Power Series, Infinite Series of Functions (p. 43)
  1.5 Properties of Functions Represented by Power Series (p. 48)
  1.6 Complex-Valued Functions, e^z, cos z, sin z (p. 65)
  1.7 Appendix to Chapter 1, Section 7 (p. 70)

2 Linear Vector Spaces: Algebraic Structure (p. 75)
  2.1 Examples and Definition (p. 75)
      a) The Space R2 (p. 75)
      b) The Space Rn (p. 76)
      c) The Space C[a, b] (p. 77)
      d) The Space Ck[a, b] (p. 77)
      e) The Space l1 (p. 78)
      f) The Space L1[a, b] (p. 78)
      g) The Space fn (p. 79)
      h) Appendix. Free Vectors (p. 80)
  2.2 Subspaces. Cosets. (p. 84)
  2.3 Linear Dependence and Independence. Span. (p. 88)
  2.4 Bases and Dimension (p. 93)

3 Linear Spaces: Norms and Inner Products (p. 101)
  3.1 Metric and Normed Spaces (p. 101)
  3.2 The Scalar Product in E2 (p. 107)
  3.3 Abstract Scalar Product Spaces (p. 113)
  3.4 Fourier Series (p. 132)
  3.5 Appendix. The Weierstrass Approximation Theorem (p. 140)
  3.6 The Vector Product in R3 (p. 146)

4 Linear Operators: Generalities. V1 → Vn, Vn → V1 (p. 147)
  4.1 Introduction. Algebra of Operators (p. 147)
  4.2 A Digression to Consider au'' + bu' + cu = f (p. 161)
  4.3 Generalities on LX = Y (p. 170)
  4.4 L : R1 → Rn. Parametrized Straight Lines. (p. 177)
  4.5 L : Rn → R1. Hyperplanes. (p. 182)

5 Matrix Representation (p. 187)
  5.1 L : Rm → Rn (p. 187)
  5.2 Supplement on Quadratic Forms (p. 210)
  5.3 Volume, Determinants, and Linear Algebraic Equations (p. 217)
      a) Application to Linear Equations (p. 234)
  5.4 An Application to Genetics (p. 243)
  5.5 A pause to find out where we are (p. 246)

6 Linear Ordinary Differential Equations (p. 249)
  6.1 Introduction (p. 249)
  6.2 First Order Linear (p. 252)
  6.3 Linear Equations of Second Order (p. 258)
      a) A Review of the Constant Coefficient Case (p. 258)
      b) Power Series Solutions (p. 259)
      c) General Theory (p. 266)
  6.4 First Order Linear Systems (p. 278)
  6.5 Translation Invariant Linear Operators (p. 283)
  6.6 A Linear Triatomic Molecule (p. 286)

7 Nonlinear Operators: Introduction (p. 293)
  7.1 Mappings from R1 to R1, a Review (p. 293)
  7.2 Generalities on Mappings from Rn to Rm (p. 295)
  7.3 Mapping from E1 to En (p. 300)

8 Mappings from En to E1: The Differential Calculus (p. 309)
  8.1 The Directional and Total Derivatives (p. 309)
  8.2 The Mean Value Theorem. Local Extrema. (p. 321)
  8.3 The Vibrating String (p. 332)
      a) The Mathematical Model (p. 333)
      b) Uniqueness (p. 334)
      c) Existence (p. 336)
  8.4 Multiple Integrals (p. 347)

9 Differential Calculus of Maps from En to Em (p. 361)
  9.1 The Derivative (p. 361)
  9.2 The Derivative of Composite Maps ("The Chain Rule") (p. 373)

10 Miscellaneous Supplementary Problems (p. 383)

Chapter 0  Remembrance of Things Past.

We shall treat a hodge-podge of topics in a hasty and incomplete fashion. While most of these topics should have been learned earlier, section 5 on the completeness of the real numbers has its more rightful place in advanced calculus. Do not take time to read this chapter unless the particular topic is needed; then read only the relevant portions. The chapter is included for reference.

0.1 Sets and Functions

A set is any collection of objects, called the elements of the set, together with a criterion for deciding if an object is in the set. For example, i) the set of all girls with blue eyes and blond hair, and ii) the less picturesque set of all positive even integers. We can also define a set by bluntly listing all of its elements. Thus, the set of all students in this class is defined by the list in the roll book.

Sets are often specified by a notation which is best described by examples.

i) S = { x : x is an integer } is the set of all integers.
ii) T = { (x, y) : x2 + y2 = 1 } is the set of all points (x, y) on the unit circle x2 + y2 = 1.
iii) A = { 1, 2, 7, -3 } is the set of integers 1, 2, 7 and -3.

Our attitude toward set theory will be extremely casual; we shall mainly use it as a language and notation. Without further ado, let us introduce some notation.

x ∈ S, x is an element of the set S, or just x is in S.
x ∉ S, x is not an element of the set S.
Z, the set of all integers, positive, zero, and negative.
Z+, the set of all positive integers, excluding 0.
R, the set of all real numbers (to be defined more precisely later).
C, the set of all complex numbers (also to be defined more precisely later).
∅, the set with no elements, the empty or null set. It is extremely uninteresting.

Definition: Given the two sets S and T, i) the set S ∪ T, "S union T", is the set of elements which are in either S or T, or both. ii) The set S ∩ T, "S intersection T", is the set of elements in both S and T. If we represent S by one blob and T by another, in the usual picture of two overlapping regions, S ∪ T is the shaded region while S ∩ T is the cross-hatched region. Note that all elements in S ∩ T are also in S ∪ T. Two sets are disjoint if S ∩ T = ∅, that is, if their intersection is empty.

A subset of a set is another way of referring to a portion of a given set. Formally, A is a subset of S, written A ⊂ S, if every element in A is also an element of S. The set A is a subset of the set S if and only if either A ∪ S = S, or, equivalently, A ∩ S = A. It is possible that A = S, or that A = ∅. If these degenerate cases are excluded, we say that A is a proper subset of S.
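These definitions can be tried out mechanically. The following short Python sketch (the particular finite sets are arbitrary examples chosen for illustration) mirrors the union, intersection, disjointness, and subset criteria just given.

    # Union, intersection, disjointness, and the subset criterion, on small finite sets.
    S = {1, 2, 3, 4}
    T = {3, 4, 7}
    A = {1, 2}

    print(S | T)                      # S union T = {1, 2, 3, 4, 7}: in S or T (or both)
    print(S & T)                      # S intersection T = {3, 4}: in both S and T
    print(S & {10, 11} == set())      # True: S and {10, 11} are disjoint

    # A is a subset of S exactly when A union S = S, or equivalently A intersection S = A.
    print(A <= S)                     # True: every element of A is also in S
    print(A | S == S, A & S == A)     # True True: the two equivalent criteria agree
    print(A < S)                      # True: A is moreover a proper subset of S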
Given the two sets S and T, it is natural to form a new set S × T, "S cross T", which consists of all pairs of elements, one from S and the other from T. For example, if S is the set of all men in this class, and T the set of all women in this class, then S × T is the set of all couples, a natural set to contemplate. If x ∈ S and y ∈ T, the standard notation for the induced element in S × T is (x, y). Note that the order in (x, y) is important. The element on the left is from S, while that on the right is from T. For this reason the pair of elements (x, y) is usually called an ordered pair. The whole set S × T is called the product, direct product, or Cartesian product of S and T, all three names being used interchangeably.

You have met this idea in graphing points in the plane. Since these points, (x, y), are determined by an ordered pair of real numbers, they are just the elements of R × R. From this example it is clear that even though this set R × R is the product of a set with itself, the order of the pair (x, y) is still important. For example the point (1, 2) ∈ R × R is certainly not the same as (2, 1) ∈ R × R. Having defined the direct product of two sets S and T as ordered pairs, it is reasonable to define the direct product of three sets S, T, and U as the set of ordered triplets (x, y, z), where x ∈ S, y ∈ T, and z ∈ U. The extension to n sets, S1 × S2 × · · · × Sn, is done in the same way.

Let us now recall the ideas behind the notion of a function. A function f from the set X into the set B is a rule which assigns to every x ∈ X one and only one element y = f(x) ∈ B. We shall also say that f maps X into B, and write either f : X → B, or X → B with the letter f written over the arrow. This alternative notation is useful when X and B are more important than the specific nature of f. The set X is the domain of f, while the range of f is the subset Y ⊂ B of all elements y ∈ B which are the image of (at least) one point x ∈ X, so y = f(x), or in suggestive notation, Y = f(X).

Automobile license plates supply a nice example, for they assign to every license plate sold a unique car. The domain is the set of all license plates sold, while the range is not all cars, but rather the subset of all cars which are driven. Wrecks and museum pieces neither need nor have license plates since they are not on the roads. Some other examples are i) the function f(n) = 1/n, n = 1, 2, 3, . . . , which assigns to every n ∈ Z+ the rational number 1/n, and ii) the function f(n, m) = m/n, n, m = 1, 2, 3, . . . , which assigns to every element (n, m) of Z+ × Z+ the rational number m/n.

Quite often we shall use functions which map part of some set into part of some other set. In other words the function may be defined on only a subset of a given set and take on values in a subset of some other set. The function f(n, m) = m/n of the previous paragraph is of this nature, for we defined it on a subset of Z × Z and it takes its values in the positive subset of the set of all rational numbers.
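As a concrete check of these last definitions, one can compute the image of the map f(n, m) = m/n over a small finite piece of Z+ × Z+. The Python sketch below (the bound N = 3 is an arbitrary choice for illustration) uses exact rational arithmetic; note that the image is only part of the set of positive rationals, and that distinct pairs can be sent to the same value.

    # The image (range) of f(n, m) = m/n over the finite domain {1,..,N} x {1,..,N}.
    from fractions import Fraction

    N = 3
    domain = [(n, m) for n in range(1, N + 1) for m in range(1, N + 1)]
    image = {Fraction(m, n) for (n, m) in domain}     # the set f(X)

    print([str(q) for q in sorted(image)])
    # ['1/3', '1/2', '2/3', '1', '3/2', '2', '3'] -- a proper subset of the positive
    # rationals; the pairs (1, 1) and (2, 2), for instance, are both sent to the value 1.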
There is some standard nomenclature (or $10 words if you like) associated with mappings. Say X ⊂ A and the function f : X → B. Note that we know the definition of f only on X. It may not be defined for the remainder of A.

Definition: i) If every element of B is the image of (at least) one point in X, the map f is called surjective or onto. In other words f : X → B is a surjection if the range of f is all of B. Thus f is always surjective onto its range. ii) If the map f has the property that for every x1, x2 ∈ X, we have f(x1) = f(x2) when and only when x1 = x2, the map is called injective or one to one (1-1). This is the case if no two different elements in X are mapped into the same element in B. iii) If the map f is both surjective and injective, that is, if it is both onto and 1-1, then f is called bijective.

Examples: For these, we have f : X → B where X = B = Z.

(1) The map f(n) = 2n is injective but not surjective, since the range does not contain the odd integers in B.

(2) The map f(n) = n/2 if n is even, (n + 1)/2 if n is odd, is surjective but not injective, since every element in B is the image of two distinct elements of X.

(3) The map f(n) = n + 7 is bijective.

Notational Remark: For functions whose domain is Z or Z+ it is customary to indicate the element of the range by a notation like an instead of f(n). Thus f(n) = 1/n, where n ∈ Z+, is written as an = 1/n. Such a function is usually called a sequence.

The concepts we have just defined are useful if we try to define what we mean by the inverse of a function.

Definition: A function f : X → B is invertible if to every b ∈ B there is one and only one x ∈ X such that b = f(x). Thus f is invertible if and only if it is bijective. If f is invertible, we denote the inverse function by f⁻¹, so x = f⁻¹(b).

If f : A → B, and g : B → C, then when composed (put together) these two functions induce a mapping, g ◦ f, of A into C. Slightly more generally, if B ⊂ R, and f : A → B while g : R → C, then g ◦ f : A → C. You should be able to see why the composed map g ◦ f is only defined on A, and then understand that our stipulation that B ⊂ R is a convenient requirement. If x ∈ A and z ∈ C, then g ◦ f maps x onto z = (g ◦ f)(x), or in more familiar notation, z = g(f(x)).

Now an example. Say the distance s you have walked at time t is specified by the function s = f(t), and the amount z of shoe leather worn out by walking the distance s is given by the function z = g(s). Then the amount of shoe leather you have worn out at time t is given by the composed function z = g(f(t)). Here t ∈ A, s ∈ B, and z ∈ C. Hopefully you have by now recognized that the "chain rule" for derivatives is just the procedure for finding the derivative of composed functions from their constituent parts. In our example the chain rule would be used to find dz/dt from dg/ds and df/dt, if these functions were differentiable.

We conclude this section with more symbols—if you have not yet had enough. These are borrowed from logic. Although we shall use them only infrequently as a shorthand, they might have greater use to you in class notes.

∀   "for every"
∃   "there is", or "there exists"
∍   "such that"
A ⇒ B   "the truth of statement A implies that of statement B"
A ⇔ B   "statement A is equivalent to statement B", that is, both A ⇒ B and B ⇒ A

Exercises

(1) If R = { 1, 4 }, S = { 1, 2, 3, 4 }, and T = { 2, 3, 7 }, find the six other sets R ∪ S, R ∩ S, R ∪ T, R ∩ T, S ∪ T, and S ∩ T. Which of these nine sets are proper subsets of which other sets?

(2) If S = { x : |x − 1| ≤ 2 } and T = { x : |x| ≤ 2 }, find S ∪ T and S ∩ T. A sketch is adequate.

(3) If A, B, and C are any subsets of a set S, prove
(a) (A ∪ B) ∪ C = A ∪ (B ∪ C), so that the parentheses can be omitted without creating ambiguity.
(b) (A ∩ B) ∩ C = A ∩ (B ∩ C), so that again the parentheses are superfluous.
(c) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
(d) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).
Remark: two sets X and Y are proved equal by showing that both X ⊂ Y and Y ⊂ X.
(4) If the function f has domain S , and both A ⊂ C and B ⊂ S , prove that (i) A ⊂ B ⇒ f (A) ⊂ f (B ) . (ii) f (A ∩ B ) ⊂ f (A) ∩ f (B ) [We cannot hope to prove equality because of counterexamples like: let A = { −2, −1, 0, 1, 2, 3 } and B = { −4, −3, −2, −1 } . Then with f (n) = n2 , we have f (A) = { 0, 1, 4, 9 }, f (B ) = { 1, 4, 9, 16 } , and f (A ∪ B ) = { 1, 4 } = f (A) ∩ f (B ) ]. (iii) f (A ∪ B ) = f (A) ∪ f (B ) . (5) For the following functions f : X → B , classify as to injection, surjection, or bijection, or none of these. (i) f (n) = n2 with X = Z+ and B = Z . (ii) Let X = { all rational numbers } , B = { all rational numbers } , and f (x) = n n where x = m ∈ X [Here m is assumed to be reduced to lowest terms.] (iii) f (x) = 1 x 1 m , , where x ∈ X and X = B = { all positive rational numbers } . (iv) X = { all women born in May }, B = { the thirty days in the month of June } , and let f be the function assigning “her birthday” to each woman born in June. (v) f (n) = |n| , with X = B = Z . 0.2. RELATIONS 0.2 5 Relations A relationship often exists between elements of sets. Some common examples are i) a ≥ b , ii) a ⊥ b (perpendicular to), iii) a loves b , and iv) a = b . Let S be a given set, a , b ∈ S , and let R be a relation defined on S (that is, ∀ a, b ∈ S , either aRb or aRb with no third alternative possible). Most relations have at least one of the following properties. (i) reflexive aRa ∀a ∈ S (ii) symmetric aRb ⇒ bRa (iii) transitive (aRb and bRc) ⇒ aRc . Examples: (1) perpendicular ( ⊥ ) is only symmetric. (2) “loves” enjoys none of these (well, maybe it is reflexive). (3) equality ( = ) has all three properties. (4) geometric congruence (∼ and geometric similarity ( ) both have all three. =) (5) parallel ( ) has all three—if we are willing to agree that a line is parallel to itself. (6) “is less than five miles from” is only reflexive and symmetric. (7) for a, b ∈ Z+ , the relation “ a is divisible by b ” is only reflexive and transitive but not symmetric. (8) “less than” ( < ) is only transitive. A relation which is reflexive, symmetric and transitive is called an equivalence relation. The standard examples are those of algebraic equality and of geometric congruence. An equivalence relation on a set S partitions the set into subsets of equivalent elements. Those terms are illustrated in the following. Examples: (1) In the set S of all triangles, the equivalence relation of geometric congruence partitions S into subsets of congruent triangles, any two triangles of S being in the same subset (or equivalence class as it is called) if and only i f they are congruent. (2) In the set P of all people, consider the equivalence relation ”has the same birthday,” disregarding the year. This relation partitions P into 366 equivalence classes. Two people are in the same equivalence class if their birthdays fall on the same day of the year. Notice that any two equivalence classes are either identical or disjoint, that is, they have either no elements in common or they coincide. This is particularly clear from the examples with birthdays. By the fundamental theorem of calculus, we know that the indefinite integral of an integrable function f can be represented by any function F whose derivative is f . The 6 CHAPTER 0. REMEMBRANCE OF THINGS PAST. mean value theorem told us that every other indefinite integral of f differs from F by only a constant. Thus, the indefinite integrals of a given function are an equivalence class of functions, differing from each other by constants. 
The equivalence relation is “equal up to an additive constant”. Exercises (1) If a, b, c, d ∈ Z+ , let us define the following equivalence relation between the elements of Z+ × Z+ : (a, b)R(c, d) if and only if ad = bc. Verify that R is an equivalence relation. [In real life, the pair (a, b) of this example c is written as a , so all we have said is a = d if and only if ad = bc . This equivalence b b relation partitions the set of rational numbers into very familiar equivalence classes. 3 For example the equivalent rational numbers 1 , 2 , 6 , . . . are in the same equivalence 24 class, to no one’s surprise]. (2) Explain the fallacy in the following argument by observing that equality “ = ” here is not the usual algebraic equality , but rather some other equivalence relation. “Let = dx x . Integration by parts ( p = 1/x , dq = dx ), gives 1 A = x( ) − x x(− 1 ) dx = 1 + A. x2 Hence 0 = 1 .” 0.3 Mathematical Induction You are familiar with a variety of proofs, viz. direct proofs and proofs by contradiction. There is, however, another type of proof which is not encountered very often in elementary mathematics: proof by induction. Abstractly, you have a sequence of statements P1 , P2 , P3 , . . . , and a guess for the nature of the general statement Pn . A proof by mathematical induction provides a method for showing the general statement Pn is correct. Here is how it is carried out. First verify that the statement is true in some special case, say for n = 1 , so you check the validity of P1 . Second you show that if it is true in some particular case n = k , then it is true for the next case n = k + 1 , that is, Pk ⇒ Pk+1 . Now since P1 is true, so is P1+1 = P2 , and consequently so is P2+1 = P3 , and so on up. Observe that the procedure does not tell you how in the world to guess the general statement Pn , but only shows how to verify it. Let us carry out the procedure for an example. We guess the formula n(n + 1) (0-1) 2 step 1. Is the formula true for n = 1 ? Yes, since both sides then equal 1. step 2. Assuming the formula is true for n = k , we must show this implies the formula is true for n = k + 1 . 1 + 2 + ··· + n = 1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2) . 2 0.4. REALS: ALGEBRAIC AND ORDER PROPERTIES 7 The formula, assumed to be true, for n = k is 1 + 2 + ··· + k = k (k + 1) . 2 Adding (k + 1) to both sides we find that 1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2) k (k + 1) + (k + 1) = 2 2 which is exactly the statement we wanted. This proves that formula (0.3) is true for all n ≥ 1. Exercises Use mathematical induction to prove the given statements. (1) 12 + 22 + · · · + n2 = (2) d n dx (x ) n(n+1)(2n+1) 6 = nxn−1 (use the formula for the derivative of a product). (3) Let I (n) = π 2 0 sinn x dx (a) Prove the following formula is correct when n is an odd integer ≥ 3 , I (n) = 2 · 4 · 6 · · · ·(n − 1) 1 · 3 · 5 · · · ·n (b) Guess and prove the formula when n is an even integer ≥ 2 . (4) Let Γ(s) = ∞ −t s−1 dt , 0et where s > 0 (this is the famous gamma function ). (a) Show Γ(s + 1) = sΓ(s) (Hint: integrate by parts) (b) If n ∈ Z+ , guess and prove the formula for Γ(n + 1) . 0.4 The Real Numbers: Algebraic and Order Properties. The set of all real numbers can be characterized by a set of axioms. These properties are of three different types, i) algebraic properties, ii) order properties, and iii) the completeness property. Of these, the last is by far the most difficult to grasp. But that is getting ahead of our story. Let S be a set with the following properties. I. 
Algebraic Properties A. Addition. To every pair of elements a, b ∈ S , is associated another element, denoted by a + b , with the properties A - 0. (a + b) ∈ S A - 1. Associative: for every a, b, c ∈ S, a + (b + c) = (a + b) + c . A - 2. Commutative: a + b = b + a A - 3. There is an additive identity, that is, an element ”0” ∈ S such that 0 + a = a for all a ∈ S . A - 4. For every a ∈ S , there is also a b ∈ S such that a + b = 0 . b is the additive inverse of a , usually written −a . 8 CHAPTER 0. REMEMBRANCE OF THINGS PAST. M. Multiplication. To every pair a, b ∈ S , there is associated another element, denoted by ab , with the properties M - 0. ab ∈ S M - 1. Associative. For every a, b, c ∈ S, a(bc) = (ab)c . M - 2. Commutative. ab = ba . M - 3. There is a multiplicative identity, that is, an element “ l” ∈ S such that la = a for all a ∈ S . Moreover 1 = 0 . M - 4. For every a ∈ S, a = 0 , there is also a b ∈ S such that ab = 1 . b is the 1 multiplicative inverse of a , usually written a or a−1 . D. Connection between Addition and Multiplication. D - 1. Distributive. For every a, b, c ∈ S, a(b + c) = ab + ac . Some sample - and simple—consequences of these nine axioms are i) a + 0 = a , ii) a · 1 = a , and iii) a + b = a + c ⇒ b = c . Any set whose elements satisfy the axioms A-0 to A-4 is called a commutative (or abelian ) group. The group operation here is addition. In this language, we see that the multiplication axioms just state that the elements of S —with the additive identity 0 excluded—also form a commutative group, with the group operation being multiplication. These additive and multiplicative structures are connected by the distributive axiom. Most of high school algebra takes place in this setting; however, the possibility of non-integer exponents is not yet specifically included; in particular the square root of an element of S is not necessarily also in S . Our axioms, or some part of them, are satisfied by sets other than the real numbers. The set of even integers form a commutative group with the group operation being addition, while numbers of the form 2n , n ∈ Z , form a commutative group under multiplication. The set of rational numbers satisfies all nine axioms. Any such set which satisfies all nine axioms is called a field. Both the real numbers and the rational numbers (a subset of the real numbers) are fields. A more thorough investigation of groups and fields is carried out in courses in modern algebra. II. Order Axioms Besides the above algebraic rules, we shall introduce an order relation, intuitively, the notion of ’greater than”. To do this we need to use an undefined concept of positivity for elements of S and use it to state our axioms. O -1. If a ∈ S and b ∈ S are positive, so are a + b and ab . O -2. The additive identity 0 is not positive. O - 3. For every a ∈ S, a = 0 , either a or −a is positive, but not both. If −a is positive, we shall say that a is negative. Trichotomy Theorem. For any two numbers a, b ∈ S , exactly one of the following three statements is true, i) a − b is positive, ii) b − a is positive, or iii) b − a is zero. If the notation a < b is used to mean “ b − a is positive,” and a > b means b < a , then this theorem reads, either a > b, a < b, or a = b . The proof—which you should do—is a simple consequence of our axioms. Some other consequences are a < b and b < c ⇒ a < c (transitivity of “ < ”) a < b and c > 0 ⇒ ac < bc a = 0 ⇒ a2 > 0 . (Since 1 = 12 , this implies 1 > 0 ). 
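Because these are finitely many conditions over all choices of a, b, c, they can be verified mechanically for a small finite candidate. The Python sketch below is an illustrative check only: the set {0, 1, 2} with arithmetic modulo 3 is an arbitrary example (compare exercise (2) below, which treats the analogous mod 2 case). It confirms the nine algebraic axioms by brute force, and the closing comment records why no choice of "positive" elements could also satisfy the order axioms for such a finite set.

    # Brute-force check of the nine algebraic axioms for S = {0, 1, 2} with
    # addition and multiplication taken modulo 3 (closure, A-0 and M-0, is
    # automatic here because every result is reduced mod 3).
    P = 3
    S = range(P)
    add = lambda a, b: (a + b) % P
    mul = lambda a, b: (a * b) % P

    checks = [
        all(add(a, add(b, c)) == add(add(a, b), c) for a in S for b in S for c in S),  # A-1
        all(add(a, b) == add(b, a) for a in S for b in S),                             # A-2
        all(add(0, a) == a for a in S),                                                # A-3
        all(any(add(a, b) == 0 for b in S) for a in S),                                # A-4
        all(mul(a, mul(b, c)) == mul(mul(a, b), c) for a in S for b in S for c in S),  # M-1
        all(mul(a, b) == mul(b, a) for a in S for b in S),                             # M-2
        all(mul(1, a) == a for a in S) and 1 % P != 0,                                 # M-3
        all(any(mul(a, b) == 1 for b in S) for a in S if a != 0),                      # M-4
        all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
            for a in S for b in S for c in S),                                         # D-1
    ]
    print(all(checks))   # True: this finite set satisfies all nine algebraic axioms.

    # No assignment of "positive" elements can satisfy the order axioms here:
    # 1 would have to be positive (1 = 1*1, and a nonzero square is positive in an
    # ordered field), so 1 + 1 + 1 would be positive by O-1; but 1 + 1 + 1 = 0
    # modulo 3, and 0 is not positive by O-2.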
The set of rational numbers as well as the set R of real numbers satisfy all twelve axioms. Any set which satisfies these twelve axioms is called an ordered field. 0.5. REALS: COMPLETENESS 9 Exercises √ (1) Let T be a set whose elements are of the form a + b 2 , where a and b are rational numbers (and so are elements of a field). Show that T is also a field. (2) Consider the set of all integers Z with the following equivalence relation: m ∈ Z and n ∈ Z are equivalent if they have the same remainder when divided by 2. The notation for this equivalence is m≡n (mod2) This equivalence relation partitions Z into two equivalence classes which we may denote respectively by 0 if the number is even, and 1 if the number is odd . Thus 8 ≡ −22 (mod 2) and 7 ≡ 13 (mod 2). Prove that the set Z with ordinary addition and multiplication but with this equivalence relation forms a field. (3) Prove the trichotomy theorem. (4) Prove that if a = 0 , then a2 > 0 . Use it to prove that 1 > 0 and then to conclude that all of the ’positive integers” are, in fact, positive. 0.5 The Real Numbers: Completeness Property. III. Completeness Axiom. So far our axioms do not insure that we can take fractional powers like the square root, of an element of an ordered field S and still obtain an element of the same field. The issue here is not merely that of fractional powers or other algebraic operations, but a more serious one. Imagine the (as yet undefined) real number line. Although the rational numbers are an infinite number of points on √ line, there are many √ the “holes” between the rationals. We already know of one “hole” at 2 , there is another at 3 , at π , and at e . In fact, in a sense which can be made precise, almost all of the points on the real number line represent irrational numbers. The completeness axiom is designed to eliminate the possibility of ”holes” in the real number line. It does so by more or less bluntly stating that there are no holes. This is the “Dedekind cut” form of the completeness axiom. we have chosen it over other equivalent axioms because it is easy to visualize—even though the “Cauchy sequence” form is perhaps preferable for more advanced analysis courses. A definition is needed before the axiom can be stated. Definition: Let S1 and S2 be subsets of an ordered field S . Then the set S1 precedes S2 if for every a ∈ S1 and b ∈ S2 , we have a ≤ b . If you imagine the real number line, “ S1 precedes S2 ” should be thought of as meaning that all of S1 is to the left of all of S2 . S1 and S2 of course might touch, or might just miss touching. Completeness Axiom. Let S1 and S2 be nonempty subsets of an ordered field S . If S1 precedes S2 , then there is at least one number c ∈ S such that c precedes S2 and is preceded by S1 . In other words, there is (at least) one element of S between S1 and S2 . Definition: The set of real numbers, R , is a set which satisfies the above axioms of algebra, order, and completeness. Thus, the real numbers is a complete ordered field. This type of definition of R amounts to saying “we don’t know or care what the real numbers are, but in any event they have the required properties.” If we had used the 10 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Cauchy sequence version of the completeness axiom, we would have begun the rational numbers—which we do know—and then defined the real numbers as the set of limits of rational numbers. This would have been somewhat more concrete, but would have involved the difficult concept of limit before we even get off the ground. 
From the picture associated with the completeness axiom, we see that it exactly states that the real number line has no holes, for - emotionally speaking—if there were a hole, let S1 be the set of real numbers to the left of the hole, and S2 the s et to the right of the hole. Then there would be no real number between S1 and S2 , since the hole is there, contradicting the completeness axiom. Let us use the idea of the last paragraph to show that the rational numbers, an ordered field, are not complete by exhibiting two sets, one preceding the other, which have no rational number between them. Just let S1 = { x : x > 0, x2 < 2 } and S2 = { x : x > 0, x2 > 2 }. √ The only possible number between S1 and S2 is 2 —which is irrational. This construction is just what we need to prove the following sample. Theorem 0.1 Every non-negative real number a ∈ R has a unique non-negative square root. Proof: If a = 0 , then 0 is the square root. If a > 0 , let S1 = { x : x > 0, x2 < a } and S2 = { x : x > 0, x2 > a } . We first show that neither S1 nor S2 is empty. Since 2 a (1 + a )2 = 1 + a + a > a , we know that (1 + a ) ∈ S2 , so S2 = ∅ . Also ( 1+ a )2 < a (check 2 4 2 2 a this) so that 1+ a ∈ S1 and hence S1 = ∅ . Because S1 precedes S2 , by the completeness 2 axiom there is a c ∈ R between S1 and S2 . Notice that c > 0 , since c is preceded by S1 . It remains to show that c2 = a . By the trichotomy theorem, either c2 > a, c2 < a , or c2 = a . The first two possibilities will be shown to give contradictions. If c2 > a , since 2 2 a < ( c 2+a )2 < c2 , we see that c 2+a ∈ S2 an d precedes c2 , contradicting the property c c specified in the completeness axiom that c2 precedes every element of S2 . Similarly the assumption c2 < a , with the inequality c2 < ( c2aca )2 < a , leads to a contradiction. The 2+ only remaining possibility is c2 = a , which shows that c is the desired positive square root of a . Let us now prove that the positive square root c of a is unique. Assume that there are two positive numbers c1 and c2 such that both c2 = a and c2 = a . Then 1 2 0 = c2 − c2 = (c1 − c2 )(c1 + c2 ) 1 2 Since c1 + c2 > 0 , we conclude that c1 − c2 = 0 , so c1 = c2 , completing the proof of the theorem. Definition: The real number M is an upper bound for the set A ⊂ R if for every a ∈ A , we have a ≤ M . The number µ ⊂ R is a least upper bound (l.u.b) for A if µ is an upper bound for A and no smaller number is also an upper bound for A . Lower bound and greatest lower bound (g.l.b) are defined similarly. A set A ⊂ R is bounded if it has both upper and lower bounds. Theorem 0.2 Every non-empty bounded set A ⊂ R has both a greatest lower bound and a least upper bound. 0.5. REALS: COMPLETENESS 11 Proof: Observe first that this theorem utilizes the completeness property in that without it, there might have been a ”hole” just where the g.l.b. and l.u.b. should be. Since the proofs for the g.l.b. and l.u.b. are almost identical we only prove there is a g.l.b. Let S1 = { x : x precedes A }, and S2 = A. By hypothesis S2 = ∅ . Since A is bounded, it has a lower bound m, m ∈ S1 so S1 = ∅ . By the completeness axiom, there is a c ∈ R between S1 and S2 . It should be obvious that c is both greater than or equal to every element of S1 , and less than or equal to every element of S2 - so it is the required g.l.b. Definition: The closed interval [a,b] is the set { x ∈ R : ≤ ≤ } . The open interval (a,b) is the set { x ∈ R : < < } . All we can do is apologize for the multiple use of the parentheses in notation. 
Please note that sets are not like doors. Some sets, like (a, b) = { x ∈ R : ≤ < } are neither open nor closed. Theorem 0.3 (Nested set property). Let I1 , I2 , . . . be a sequence of non-empty closed bounded intervals, In = { x : an ≤ x ≤ bn } , which are nested in the sense I1 ⊃ I2 ⊃ I3 . . . , so each covers all that follow it. Then there is at least one point c ∈ R which lies in all of the intervals, that is, c is in their intersection c ∈ ∩∞ Ik . k=1 Proof: Let S1 = { x : x precedes some In , and so all Ik , k ≥ n } S2 = { x : x preceded by some In , and so all Ik , k ≥ n }. First, neither S1 nor S2 are empty since a1 ∈ S1 and b1 ∈ S2 . Thus by the completeness axiom, there is at least one c ∈ R between S1 and S2 . This c is the required number (complete the reasoning). If the intervals Ik do not get smaller after, say IN because aN = aN +1 = . . . and bN = bN +1 = . . . , then the whole interval aN ≤ x ≤ bN is caught by the preceding argument. The more common case is there the ak ’s strictly increase and the bk ’s strictly decrease. This is what happens when approximating a real number to successively greater √ accuracy by the decimal expansion. In the case of 2 for example, I1 = { x : 1 ≤ x ≤ 2 }, I2 = { x : 1.4 ≤ x ≤ 1.5 }, I3 = { x : 1.41 ≤ x ≤ 1.42 }, I4 = { x : 1.414 ≤ x ≤ 1.415 }, √ and so on, gradually squeezing down on 2 to any desired accuracy. Definition: The sequence an ∈ R, = , , . . . . of real numbers converges to the real number c if, given any > 0 , there is an integer N such that |an − c| < for all n > N . We will then write an → c . [In practice no confusion arises for the use of → to denote both convergence and mappings (cf. 1)]. Again ordinary decimals supply an example, for they allow us to get arbitrarily close to any real number. We could have defined the real numbers as all decimals; however there would be a mess avoiding the built-in ambiguity illustrated by 1.9999 . . . . = 2.0000 . . . . Theorem 0.4 Under the hypotheses of the previous theorem, if in addition the length of In tends to zero, (bn − an ) → 0 , then the number c ∈ R found is unique. Furthermore, if uk ∈ Ik for all k , that is if ak ≤ uk ≤ bk , then uk → c too. Proof: Suppose there were two real numbers c and c in all of the intervals, ˜ ak ≤ c ≤ bk and ak ≤ c ≤ bk ˜ for all k. 12 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Rewriting the second inequality as −bk ≤ −c ≤ −ak , and adding this to the first inequality, ˜ we find that ak − bk ≤ c − c ≤ bk − ak . Since both sides of this inequality tend to zero, if ˜ c − c = 0 , we would have a contradiction. ˜ To prove uk → c , repeat the above reasoning with c replaced by uk . We find that ˜ ak − bk ≤ c − uk ≤ bk − ak . Again both sides of this inequality tend to zero. Now let us fiddle with the , N definition of limit t o complete the proof. Since bn − an → 0 , given any > 0 , there is an N such that |an − bn | < for all n > N . Thus for any > 0 and the same N, |un − c| < for n > N , which is the definition of un → c . Theorem 0.5 Bolzano-Weierstrass. Every infinite sequence of real numbers { uk } in a bounded interval I has at least one subsequence which converges to a number c ∈ R . Proof: This one is very clever and picturesque. Watch. Bisect I into two intervals I1 ˜ ˜ and I1 of equal length. At least one of I1 or I1 must contain an infinite number of the { uk } ’s. Continuing in this way we obtain a set of nested intervals I ⊃ I1 ⊃ I2 ⊃ . . . each of which have an infinite number of the { uk } ’s, and the length of In tending to zero. 
From Theorem 3 we conclude that there must be a c ∈ R common to all of the intervals. We must now select the subsequence { u_kn } of the { u_k }'s which converges to c. Since each I_n contains an infinite number of points of the sequence, we can certainly pick one, say u_kn ∈ I_n. This sequence { u_kn } satisfies the hypotheses of Theorem 4. Thus u_kn → c.

Remarks: 1. If we also assume I is closed, then we can further assert that c ∈ I. If I is not closed, c may be an end point of I. 2. If a sequence u_k converges to a c ∈ R, then every infinite subsequence u_kn also converges, and to the same number c.

Theorem 0.6 If the sequence { u_k } converges, it is bounded.

Proof: Say u_k → α, and let ε = 1 in the definition of convergence. Then there is an N such that |u_n − α| < 1 for all n > N. Thus, when n > N,
|u_n| = |u_n − α + α| ≤ |u_n − α| + |α| < 1 + |α|.
Therefore for any k the number |u_k| is bounded by the largest of the N + 1 numbers |u_1|, |u_2|, . . . , |u_N| and (1 + |α|).

The following theorem shows how to handle algebraic combinations of convergent sequences.

Theorem 0.7 If a_n → α and b_n → β, then i) a_n + b_n → α + β, ii) a_n b_n → αβ, iii) a_n/b_n → α/β if b_n ≠ 0 for all n, and if β ≠ 0.

Proof: Since the proofs are all similar, we only prove ii). Observe that
|a_n b_n − αβ| = |(a_n b_n − α b_n) + (α b_n − αβ)| ≤ |a_n − α| |b_n| + |α| |b_n − β|.
By Theorem 6, the |b_n|'s are bounded, say by B. Since a_n → α, given any ε > 0, there is an N_1 such that |a_n − α| < ε/(2B) for all n > N_1, and since b_n → β, for the same ε there is an N_2 such that |b_n − β| < ε/(2|α|) for all n > N_2. Thus, if n is greater than the larger of N_1 and N_2, n > max(N_1, N_2), we find that
|a_n b_n − αβ| < ε/2 + ε/2 = ε,
which does the job.

Definition: The sequence a_1, a_2, . . . of real numbers is said to be monotone increasing if a_1 ≤ a_2 ≤ a_3 ≤ . . . , and monotone decreasing if a_1 ≥ a_2 ≥ a_3 ≥ . . . . Both kinds are called monotone sequences.

Theorem 0.8 Every bounded monotone sequence a_1, a_2, . . . of real numbers converges. In other words, there is an α ∈ R such that a_n → α.

Proof: We assume the sequence is increasing. The proof for a decreasing sequence is identical. Since the sequence is bounded, by Theorem 2 it has a least upper bound α ∈ R. We maintain a_n → α. Given any ε > 0, we know that a_n < α + ε for all n, because α is an upper bound. Since α − ε < α, and α is the l.u.b. of the sequence, we can find an N such that α − ε < a_N. But then, because the sequence is increasing, α − ε < a_n for all n ≥ N. Thus for all n ≥ N, α − ε < a_n < α + ε; that is, |a_n − α| < ε for all n ≥ N, proving the convergence to α.

We shall close this difficult section with a wonderful procedure for computing the square root of a positive real number. I use it all of the time. It is much easier to understand than the hair-raising method taught in public school.

Theorem 0.9 For any positive real numbers A and a_0, the infinite sequence defined by
a_{n+1} = (1/2)(a_n + A/a_n),  n = 0, 1, 2, . . . ,
is monotone decreasing and converges to √A. Moreover, if we let b_n = A/a_n, then the b_n's are monotone increasing and also converge to √A:
b_1 ≤ b_2 ≤ . . . ≤ √A ≤ . . . ≤ a_2 ≤ a_1.     (0-2)

Proof: We first show that a_k^2 ≥ A and that a_{k+1} ≤ a_k:
a_k^2 − A = (1/4)(a_{k−1} + A/a_{k−1})^2 − A = (1/4)(a_{k−1} − A/a_{k−1})^2 ≥ 0,  so a_k^2 ≥ A.
From this, it is easy to see that a_{k+1} ≤ a_k, for
a_k − a_{k+1} = a_k − (1/2)(a_k + A/a_k) = (a_k^2 − A)/(2 a_k) ≥ 0.
Thus a_1 ≥ a_2 ≥ · · · ≥ √A. That the a_k^2 converge is an immediate consequence of Theorem 8, since the sequence { a_k^2 } is a bounded (below by A) monotone decreasing sequence. Denoting the limit by α, a_k^2 → α, the proof that α = A is identical to the reasoning which gave a unique limit in Theorem 4. Since b_n = A/a_n, and the a_n's decrease and are ≥ √A, the b_n's increase and are ≤ √A. This also shows that b_n ≤ a_n. Since a_n → √A, we have b_n = A/a_n → √A too.

Application: We compute √8. Take a_0 = 3. Then a_1 = (1/2)(3 + 8/3) = 17/6, and b_1 = 8 · (6/17) = 48/17. Similarly, a_2 = 577/204, b_2 = 1632/577. This gives 1632/577 ≤ √8 ≤ 577/204, or in decimal form
2.82842 < √8 < 2.82843,
astounding accuracy after only two steps. I carried the computations one step further and found
2.828427124 . . . ≤ √8 ≤ 2.828427124 . . . ,
where the dots indicate I gave up on the arithmetic, having obtained the exact value as far as the approximation went. Digital computers use this method and related ones for similar computations. It is particularly well adapted to them (and me) since only simple arithmetic operations are involved.

This Theorem 9 gives another proof that every positive real number has a unique positive square root. It is valuable to compare this proof with that of Theorem 1. The main distinction is that the second proof just given is constructive: it actually shows a way to compute successive approximations to the square root of any number. However, you are justified in asking how we ever found the procedure of equation (0-2) in the first place. The secret is that this formula is a statement of Newton's method for finding roots of f(x) = 0, applied to the particular function f(x) = x^2 − A. See most calculus books for more information about this method. Hopefully, we will have time to discuss this topic later, for it is a constructive way of proving the existence of a sought-after object. The standard existence theorem for ordinary differential equations is a close relative of Newton's method.
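The iteration of Theorem 9 is as pleasant for a machine as it is by hand. The short Python sketch below (the function name and the choice of three steps are only for illustration) uses exact rational arithmetic, so the fractions 17/6, 48/17, 577/204, and 1632/577 of the application above reappear verbatim.

    # The iteration a_{n+1} = (a_n + A/a_n)/2 of Theorem 9, with b_n = A/a_n,
    # so that b_n <= sqrt(A) <= a_n brackets the square root at every step.
    from fractions import Fraction

    def sqrt_bounds(A, a0, steps):
        """Successive lower/upper bounds (b_n, a_n) for sqrt(A), starting from a0 > 0."""
        a = Fraction(a0)
        out = []
        for _ in range(steps):
            a = (a + Fraction(A) / a) / 2          # one step of the iteration
            out.append((Fraction(A) / a, a))       # (b_n, a_n)
        return out

    for b, a in sqrt_bounds(8, 3, 3):
        print(f"{b} <= sqrt(8) <= {a}   (gap {float(a - b):.2e})")
    # 48/17 <= sqrt(8) <= 17/6         (gap 9.80e-03)
    # 1632/577 <= sqrt(8) <= 577/204   (gap 8.50e-06)
    # the third step reproduces 2.828427124... on both sides, as in the text above

Replacing Fraction by ordinary floating point arithmetic gives the form of the method that digital computers actually use, as remarked above.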
Exercises

(1) For the sequences defined below, find which converge, which do not converge but do have at least one convergent subsequence, and which have neither. In all cases n ∈ Z+.
(a) a_n = 1/(n + 1)
(b) a_n = (−1)^n/n
(c) b_n = n/e^n
(d) a_n = e^(−2n+1)
(e) a_n = 1 + n
(f) a_n = 2 + (−1)^n
(g) a_n = √(n + 1) − √n
(h) a_n = (2 − 3n)/(5n + 1)
(i) a_n = 7^n/n!
(j) s_n = 1 + 1/2 + 1/4 + 1/8 + · · · + 1/2^n (tough, isn't it?)

(2) Prove that if a_n → α and b_n → β, then (a_n + b_n) → α + β, where all the letters represent real numbers.

(3) a) Prove Bernoulli's inequality
(1 + h)^n > 1 + nh,  h ≠ 0, h > −1, n ≥ 2.
Here h ∈ R and n ∈ Z. I suggest proof by induction.
b) If s ∈ R, use part a) to prove that a_n ≡ s^n →
If a function is discontinuous at x0 , it has at least one of the four troubles (1) Jump discontinuity (2) Infinite discontinuity (3) Infinite oscillations (4) Removable discontinuity. Here are examples of each trouble at the point x = 0 . (1) f (x) = (2) f (x) = 1, 0 ≤ x −1, x < 0 1 x anything, x=0 say 1, x = 0 > 0 , there 16 CHAPTER 0. REMEMBRANCE OF THINGS PAST. (3) f (x) = 1 sin x anything, (4) f (x) = x, x = 0 1, x = 0 x=0 say 0, x = 0 Note that a function may oscillate infinitely about a point and still be continuous there. This is illustrated by the everywhere continuous function f (x) = 1 x sin x 0 , x=0 , x=0 Theorem 0.10 I. If f (x) is continuous at x = c , and f (c) = A = 0 , then f (x) will keep the same sign as f (c) in a suitably small neighborhood of x = c . Proof: : We construct the desired neighborhood. Assume A is positive. The proof if A < 0 is essentially the same. In the definition of continuity, take = A . Then there is a δ > 0 such that |f (x) − A| < A when |x − c| < δ, that is, 0 < f (x) < 2A, when |x − c| < δ. In other words, f (x) is positive in the interval |x − c| < δ . Theorem 0.11 II. If f (x) is continuous at every point of a closed and bounded interval, then there is a constant M such that |f (x)| ≤ M throughout the interval. Thus a continuous function in a closed and bounded interval is bounded. Proof: : By contradiction. If f is not bounded, there is a sequence of points xn such that |f (xn )| > n . From that sequence by Theorem 5 (Bolzano-Weierstrass) we can select a subsequence xnk which converges to some point x0 in t he interval, xnk → x0 . Thus |f (xnk )| → ∞. But we know from the continuity of f that |f (xnk )| → |f (x0 )| . A contradiction. Moreover, with the same hypotheses, we can conclude more. Theorem 0.12 III. If f is continuous at every point of a closed and bounded interval, then there are points x = α and x = β in the interval where f assumes its greatest and least values, respectively. Proof: : We show that f assumes its greatest value. The proof for the least value is essentially identical. Let S be the set of all upper bounds for f . By Theorem II S is not empty. Therefore by Theorem 2, S has a g.l.b., call it M0 . Since M0 is the greatest lower bound of upper bounds for f , there is a sequence xn such that limn→∞ f (xn ) → M0 . Use Bolzano-Weierstrass to pick a subsequence xnk of the xn such that the xnk converges, say to c . By continuity of f, limnk →∞ f (xnk ) = f (x) . Thus f (c) = M0 , so f does assume its greatest value at x = c . Remark: This theorem refers to the absolute maximum and absolute minimum values. 0.6. APPENDIX: CONTINUOUS FUNCTIONS AND THE MEAN VALUE THEOREM17 Examples: The following show that the theorem is not necessarily true if any of the hypotheses are omitted. (1) f (x) = x, 0 < x ≤ 1 . No min. (interval not closed). (2) f (x) = x, x ≤ 0 , and f (x) = unbounded.) (3) f (x) = 1 , 1+x2 all x , both have no min. (the interval is x, 0 ≤ x < 3. No max. (function is discontinuous.) x − 2, 3 ≤ x ≤ 4 Theorem 0.13 If f (x) is continuous at every point of a closed and bounded interval [a, b] , and if f (a) and f (b) have opposite sign, then there is at least one point c ∈ (a, b) such that f (c) = 0 . Proof: : Say f (a) < 0, f (b) > 0 . We find one point c , “the largest x such that f (x) = 0 ”. Let S = { x ∈ [a, b] : f (x) ≤ 0 } . Since f (a) < 0, S is not empty. It thus has a l.u.b., c . We prove that f (c) = 0 . Either f (c) > 0, f (c) < 0 , or f (c) = 0 . 
The first two possibilities cannot happen, since by Theorem I, if they did, f would be positive (or negative) in a whole neighborhood of c -violating the fact that c is the l.u.b. of S . Corollary 0.14 (intermediate value theorem). Let f (x) be continuous at every point of a closed and bounded interval [a, b] , with f (a) = A, and f (b) = B . Then if C is any number between A and B , there is at least one point c, a ≤ c < b , such that f (c) = C . Thus, f assumes every value between A and B at least once. Proof: : Apply Theorem IV to the function ϕ(x) = C − f (x) . Remark: The function may assume values other than just those between A and B . An example is the function f (x) = x2 , −1 ≤ x ≤ 3 . The theorem requires that it assume all values between f (−1) = 1 and f (3) = 9 . Besides those values , this function also happens to assume all values between 0 and 1. We can offer another proof of Corollary 0.15 Every positive number k has a unique positive square root. Proof: : Consider f (x) = x2 − k , which is clearly continuous everywhere. Since f (0) < 0 , 2 and f (1+ k ) = (1+ k )2 − k = 1+ k4 > 0 , Theorem IV shows that f must vanish somewhere 2 2 in the interval 0 < x < 1 + k . This is the root. It is the unique positive square root, for 2 say there were two positive numbers x and y such that x2 − k = 0 and y 2 − k = 0 . then x2 − y 2 = 0 . Thus, 0 = x2 − y 2 = (x − 1)(x + y ) . Since x + y > 0 , we conclude x − y = 0 , or x = y . Remark : It appears that if a function has the property of Corollary 1, the intermediate value property, then it must be continuous. This is false. An example is given by the discontinuous (trouble 3) function f (x) = 1 sin x 0 , x=0 , x=0 18 CHAPTER 0. REMEMBRANCE OF THINGS PAST. about the point x = 0 . If a is any number < 0 , and b any number > 0 , then f (x) assumes every value between f (a) and f (b) , but f (x) is not continuous throughout the interval since it is not continuous at x = 0 . Definition: The function f (x) has a relative maximum (minimum) at the point x0 , if, for all x in a sufficiently small interval containing x0 as an interior point, we have f (x) ≤ f (x0 ) (f (x) ≥ f (x0 )). Remark: By convention, we shall agree not to call the possible max (or min) at the end point of an interval a relative max (or min). This does lead to the possibility of an absolute max (or min) not being a relative max (or min). However, if the absolute max (or min) does occur at an interior point of an interval, it is also a relative max (or min). Definition: The function f (x) is differentiable at the point x0 if the following limit lim x→x0 f (x) − f (x0 ) x − x0 exists. There are the usual notations: f (x0 ) , df dx x=x 0 , Df (x0 ) . Theorem 0.16 If f (x) is differentiable at x0 , then it is continuous there. Proof: : Now if the limit f (x) − f (x0 ) x − x0 exists, as we have assumed, then the numerator must approach zero as x tends to x0 . Thus f is continuous at x0 . lim x→x0 Theorem 0.17 If f (x) is differentiable at x0 and has a relative maximum or minimum at x0 , then f (x0 ) = 0 . Proof: : Assume f has a relative min at x0 . Then for all x near x0 , f (x) ≥ f (x0 ) . )( (i) if x < x0 f (xx−f 0x0 ) ≤ 0 −x f ( x) − f ( x0 ) ≤ x− x0 )( (ii) if x > x0 f (xx−f 0x0 ) −x )− f ( so limx→x0 f (xx−x0x0 ) ≥ x>x so limx→x0 x<x0 0 ≥0 0 0 Because the function is differentiable at x0 , the two limiting values are f (x0 ) . Thus f (x) ) ≤ 0 and f (x0 ) ≥ 0 . Both statements can be true only if f (x0 ) = 0 . 
The trick here was, the slope must be negative to the left, and positive to the right of x0 . Since there is a unique slope (the derivative) at x0 , the slope must be zero there. At a relative max., the same proof holds with obvious modifications. Examples: 1. Although the function f (x) = |x| has a relative minimum at x = 0 , the conclusion of the theorem does not hold since f is not differentiable there. Note that both (i) and (ii) of the proof still do hold. 2. The differentiable function (for all x ) f (x) = 1 x4 sin x 0 , x=0 , x=0 has an infinite number of relative max and min in any interval including the origin. 0.6. APPENDIX: CONTINUOUS FUNCTIONS AND THE MEAN VALUE THEOREM19 Theorem 0.18 (Rolle). If (i) f (x) is continuous at every point of the closed and bounded interval [a, b] (ii) f (x) is differentiable at every point of the open interval (a, b) and (iii) f (a) = f (b) , then there is at least one point c , a < c < b , where f (c) = 0 . Proof: : If f (x) ≡ constant throughout [a, b] , take c to be any point in (a, b) . Otherwise f (x) must go either above or below (or both) the value f (a) . Assume it goes above. Then by Theorem III there is a point x = c where f has its absolute maximum. Since we assumed f (x) goes above f (a) , the point x = c is an interior point. Thus there is a relative maximum. Since f is differentiable in (a, b) , we may apply Theorem VI to conclude that f (c) = 0 . If we had assumed f went below f (a) , then there would have been an absolute (and relative) min. etc. Remarks: 1. From the proof of the theorem, we see that if f has values both greater and less than f (a) , then there would be at least two points in (a, b) where f = 0 . 2. You should be able to construct examples showing the theorem is not true if any of the hypotheses are dropped. Corollary 0.19 (mean value theorem) If (i) f (x) is continuous at every point of the closed and bounded interval [a, b] and (ii) f (x) is differentiable at every point of the open interval (a, b) , then there is at least one point c in (a, b) where f (c) = f (b) − f (a) . b−a Proof: : “Shift and apply Rolle’s Theorem”. In more detail, consider F (x) = f (x) − f (a) − x−a (f (b) − f (a)). b−a F (x) satisfies all of the assumption of Rolle’s Theorem. Therefore there is a point c where F (c) = 0 . Since f (b) − f (a) F (x) = f (x) − , b−a at x = c , we have f (c) = f (b) − f (a) . b−a Remarks: 1. The function f (x) = |x| in the interval [a, b], a < 0, b > 0 , shows what happens if the function fails to be differentiable at even one point of the open interval (a, b) . 2. An alternative form of the conclusion is: there is a number θ, 0 < θ < 1 , such that f (b) − f (a) = f (a + θ(b − a))(b − a). This is because every point in the interval (a, b) is of the form a + θ(b − a) , for some θ, 0 < θ < 1 . We shall now give some applications of the Mean Value Theorem. The first one is a specific example, while the others have great significance in themselves. 20 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Example: The function f (x) = a1 sin x + a2 sin 2x + b cos x + b2 cos 2x has at least one zero in the interval [0, 2π ] , no matter what the coefficients a1 , a2 , b1 and b2 are. To show this, we shall show f is the derivative of a function g (x) which satisfies the hypotheses of Rolle’s theorem. This function g is just an anti-derivative of f : g (x) = f (x) g (x) = −a cos x − b2 a2 cos 2x + b1 sin x + sin 2x. 
2 2 Since g is clearly continuous and differentiable everywhere, we must only see if g (0) = g (2π ) , which as also easy. Theorem 0.20 If f (x) is continuous and differentiable throughout [a, b] , and |f | < N there too, then the δ ( ) in the definition of continuity can be chosen as δ (c) = N . This δ works for every x in [a, b] . Proof: : Use the form of the mean value theorem in Remark 2. Then for any points x, x0 in (a, b) , f (x) − f (x0 ) = f (x)(x − x0 ), ˜ where x is somewhere between x and x0 . Thus ˜ |f (x) − f (x0 )| ≤ N |x − x0 | . We see now that if δ ( ) = N , then for any |f (x) − f (x0 )| < > 0, if |x − x0 | < δ. Theorem 0.21 If f satisfies the hypotheses of the mean value theorem and if in addition f (x) ≡ 0 throughout (a, b) , then f (x) ≡ const. Proof: : Let x1 and x2 be any points on (a, b) . Then by the form of the mean value theorem in Remark 2 f (x2 ) − f (x1 ) = 0 · (x2 − x1 ) = 0. Thus f (x2 ) = f (x1 ) for any two points in (a, b) , that is, f is identically constant. Corollary 0.22 If f (x) and g (x) both satisfy the hypotheses of the mean value theorem, and if in addition f (x) ≡ g (x) for all x in (a, b) , then f (x) = g (x) + c , where c is some constant. Proof: : consider the function F (x) = f (x) − g (x) . It satisfies the hypothesis of Theorem VII, so F (x) ≡ c, c constant. Thus f (x) − g (x) = c . Remark: Theorem IX is the converse of the theorem: “the derivative of a constant function is zero.” a figure goes here 0.6. APPENDIX: CONTINUOUS FUNCTIONS AND THE MEAN VALUE THEOREM21 Exercises (1) Look over all the theorems (and corollaries) here and be sure you can find examples showing that the theorems are not true if any of the hypotheses are relaxed. (2) Let f (x) = 1, if x is a rational number 0, if x is an irrational number. Is f continuous anywhere? (3) Let f (x) be an everywhere differentiable function which is zero at x = aj , j = 1, 2, . . . , n. Find a function which vanishes at least once between each of the zeros of f. (4) Use Theorem VIII to find a δ ( ) for the given functions. (a) f (x) = x4 − 7, −2 ≤ x ≤ 3. (b) f (x) = x2 sin x, −4 ≤ x ≤ 3 1 (c) f (x) = 1+x2 , −2 ≤ x ≤ 1 4 (d) f (x) = x 3 + 7, −2 ≤ x ≤ 8 √ (e) f (x) = x x2 + 1, −2 ≤ x ≤ 2 (5) (a) The function f (x) satisfies the following condition |f (x) − f (x0 )| ≤ 2 |x − x0 |3 for every pair of points x, x0 in the interval [a, b] . Prove f (x) ≡ constant in this interval. (b) Generalize your proof to the case when f satisfies |f (x) − f (x0 )| ≤ c |x − x0 |α , where c > 0 is some constant and α is any number > 1 . 2 (6) Consider the function f (x) = x 3 , in the interval [−8, 8] . Sketch a graph. Note that f (−8) = f (8) = 4 but there is no point where f = 0 ; which hypothesis of Rolle’s theorem is violated? (7) In a trip, the average speed of a car is 180 miles per hour. Prove that at some time during the trip, the speedometer must have registered precisely 180 miles per hour. (8) Let P1 := (x1 , y1 ) and P2 := (x2 , y2 ) be any two points on the parabola y = ax2 + bx + c , and let P3 := (x3 , y3 ) be the point on the arc P1 P2 where the tangent is parallel to the chord P1 P2 . Show that x3 = x1 + x2 . 2 (9) Prove that every polynomial of odd degree P (x) = x2n+1 + a2n x2n + · · · + a1 x + a0 has at least one real root. (10) If f is a nice function and f < 0 everywhere, prove that f is strictly decreasing. 22 0.7 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Complex Numbers: Algebraic Properties . 
In high school, to be able to find the roots of all quadratic equations ax2 + 2bx + c = 0 , √ we were forced to introduce the symbol i ≡ −1 , in other words, introduce a special symbol for a root of x2 + 1 = 0 . Before going any further, we should prove that no real number c can satisfy c2 + 1 = 0 . By contradiction, assume that there is such a c . Then necessarily either c > 0, c < 0, or c = 0 . If c = 0 , we have the immediate contradiction that 1 = 0 . If c > 0 , or c < 0, 0 < c2 . Consequently 0 < c2 + 1 too, which again contradicts 0 = c2 + 1 , and proves our contention that no real number can satisfy x2 + 1 = 0 . Observe that our proof also shows that if we introduce a new symbol for a root of 2 + 1 = 0 , that symbol cannot be an element of an ordered field, for only the ordered field x properties of the real numbers were used in the above proof. we shall see that “i” is an element of a field, but not an ordered field. It is difficult to overestimate the importance of complex numbers for all of mathematics, both from an esthetic as well as from a practical viewpoint. With them we can prove that every quadratic polynomial has exactly two roots (which may coincide). What is more surprising is that every polynomial of order n an xn + an−1 xn−1 + · · · + a1 x + a0 = 0, an = 0, has exactly n complex roots. This result, thefundamental theorem of algebra, was first proved by Gauss in his doctoral dissertation (1799). It is one of the crown jewels of mathematics. The difficult part is proving that every polynomial has at least one complex root, from which the general result follows using only the “factor theorem” of high school algebra. Later on in the semester we shall discuss this more fully and offer a proof. It is not simpleminded, for the proof is non-constructive pure existence proof, giving absolutely no method of finding the roots. Perhaps we shall even prove some more exotic results. Having gotten carried away, let us retreat and obtain the algebraic rules governing the set C of complex numbers. In order to reveal the algebraic structure most clearly, we shall denote a complex number z by an ordered pair of real numbers: z = (x, y ), x, y ∈ R . Thus C is R × R with the following additional algebraic structure. Definition: If z1 = (x1 , y1 ) and z2 = (x2 , y 2 ) are any two complex numbers, then we define Addition: z1 + z2 = (x1 + x2 , y1 + y2 ) , and Multiplication: z1 · z2 = (x1 x2 − y1 y2 , x1 y2 + y1 x2 ) . Equality: z1 = z2 if and only if both x1 = x2 and y1 = y2 . Thus, the complex number zero—the additive identity—is (0, 0) , while the complex number one—the multiplicative identity—is (1, 0) . Using the fact that the real numbers R form a field, we can now prove the Theorem 0.23 The complex numbers C form a field. Proof: Since the verification of the field axioms are entirely straightforward we give only a smattering. Note that we shall rely heavily on the field properties of R . Addition is commutative: z1 + z2 = (x1 , y1 ) + (x2 , y 2 ) = (x1 + x2 , y1 + y2 ) = (x2 + x1 , y2 + y1 ) = (x2 , y2 ) + (x1 , y1 ) = z2 + z1 . (0-3) 0.7. COMPLEX NUMBERS: ALGEBRAIC PROPERTIES 23 Additive identity: 0 + z = (0, 0) + (x, y ) = (0 + x, 0 + y ) = (x, y ) = z. Multiplicative inverse: For any z ∈ C, z = (0, 0) , we must find a z = (ˆ, y ) ∈ C such ˆ xˆ that z z = 1 , that is, find real numbers x and y such that (x, y )(ˆ, y ) = (1, 0) . 
Using ˆ ˆ ˆ xˆ the definition o f complex multiplication, this means we must solve the two linear algebraic equations xx − y y = 1 ˆ ˆ x, y ∈ R, y x + xy = 0 ˆ ˆ for x and y ∈ R . The result is ˆ ˆ z = (ˆ, y ) = ( ˆ xˆ x2 −y x , ). 2 x2 + y 2 +y 1 We will denote this multiplicative inverse, which we have just proved does exist, by z or −1 . z It is interesting to notice that complex numbers of the form (x, 0) have the same arithmetic definitions as the real numbers, viz. (x1 , 0) + (x2 , 0) = (x1 + x2 , 0) (x1 , 0)(x2 , 0) = (x1 x2 , 0). We can easily verify that all complex numbers of this form (x, 0) also form a field, a subfield of the field C . On the basis of these last two equations, we can identify a real number x with the complex number (x, 0) in the sense th at if we perform any computation with these complex numbers of this form, the result will be the same as if the computation had been performed with the real numbers alone. Thus, numbers of the form (x, 0) ∈ C are algebraically equivalent to the numbers x ∈ R . The technical term for such an algebraic equivalence is isomorphic, much as a term for geometric equivalence is congruent. After identifying the real numbers with complex numbers of the form (x, 0) , we can say that the field of real numbers R is embedded as a subfield in the field of complex numbers, R ⊂ C . After all this chatter, let us at least convince ourselves that every quadratic equation is solvable if we use complex numbers. First we solve z 2 + 1 = 0 , which may be written as (x, y )(x, y ) + (1, 0) = (0, 0) , or as the two real equations x2 − y 2 = −1, 2xy = 0 . The last equation says that either x = 0 or y = 0 . Now if y = 0 , we are left to solve x2 + 1 = 0, x ∈ R , which we know is impossible. Therefore x = 0 and then y 2 = 1 . Thus the two complex numbers (0, 1) and (0, −1) both satisfy z 2 + 1 = 0 . The general case, az 2 + bz + c = 0 is easily reduced to the special one by completing the square. One by-product of the above demonstration is that we see it is foolhardy to try to define an order relation on C to obtain an ordered field. This is because the equation x2 + 1 = 0 cannot be solved in any ordered field, as was shown earlier, whereas we have just solved it in C . Observe that every (x, y ) ∈ C can be written as (x, y ) = (x, 0)(1, 0) + (y, 0)(0, 1), where the complex number (0, 1) is called the imaginary unit and is denoted by i. If we now utilize the isomorphism between the real number a and complex numbers (a, 0) , the 24 CHAPTER 0. REMEMBRANCE OF THINGS PAST. last equation shows that (x, y ) may be thought of as x + iy . Thus, we have obtained the usual notation for complex numbers. From our development, the algebraic role of i as the symbol for the imaginary unit (0, 1) is hopefully clarified. The number x is called the real part, and y the imaginary part of the complex number z = x + iy . In symbols, x = Re{ z } and y = Im{ z } . Our introduction of complex numbers suggests a geometric interpretation. We have defined complex numbers C as ordered pairs of real numbers, elements of R × R , with an additional algebraic structure. Since the points in the plane are also elements of R × R , it is clear that there is a one to one correspondence between the complex numbers and the points in the plane. If we plot the point z = (x, y ) , the real number |z | , the “absolute value or modulus of z ” is the distance of the point z from the origin. Its value is computed by the Pythagorean theorem |z | = x2 + y 2 . 
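The ordered-pair arithmetic is easy to experiment with. The following Python sketch (the helper names add, mul and inv are ours, introduced only for this illustration) implements the definitions of addition, multiplication and the multiplicative inverse found above, and checks them against Python's built-in complex type.

def add(z1, z2):
    # (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)
    return (z1[0] + z2[0], z1[1] + z2[1])

def mul(z1, z2):
    # (x1, y1)(x2, y2) = (x1*x2 - y1*y2, x1*y2 + y1*x2)
    return (z1[0]*z2[0] - z1[1]*z2[1], z1[0]*z2[1] + z1[1]*z2[0])

def inv(z):
    # multiplicative inverse (x/(x^2 + y^2), -y/(x^2 + y^2)), defined for z != (0, 0)
    x, y = z
    d = x*x + y*y
    return (x/d, -y/d)

z1, z2 = (3.0, 4.0), (1.0, -2.0)           # arbitrary sample points
print(add(z1, z2), mul(z1, z2))            # (4.0, 2.0) and (11.0, -2.0)
print(mul(z1, inv(z1)))                    # essentially (1.0, 0.0), the multiplicative identity

# Cross-check against the familiar x + iy arithmetic built into Python.
w1, w2 = complex(*z1), complex(*z2)
print(w1 + w2, w1 * w2, w1 * (1/w1))       # (4+2j), (11-2j), and approximately (1+0j)

The last line confirms that the pair arithmetic and the usual x + iy arithmetic agree, which is exactly the isomorphism described above.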
Here are several formulas which are easily verified: |z1 z2 | = |z1 | + |z2 | |x| ≤ |z | , |y | ≤ |z | |z1 + z2 | ≤ |z1 | + |z2 | (triangle inequality) (0-4) If the line joining the point z to the origin is drawn, the angle θ between that line and the positive real (= x) axis is called the argument or amplitude or z . The absolute value r and argument θ of a complex number determine it uniquely, since we have z = r(cos θ + i sin θ) (0-5) This is the polar coordinate form of the complex number z . Note that conversely, z determines its argument only to within an additive multiple of 2π . This observation will prove of value to us shortly. Associated with every complex number, z = x + iy there is another complex number z = x − iy , the complex conjugate of z . It is the reflection of z in the real axis. Probably the main reason for introducing z is that we can solve for x and y in terms of z and z : x= z+z , 2 y= z−z . 2i Again some simple formulas: |z | = |z | , |z |2 = |z |2 = z z. ¯ ¯ (z1 + z2 ) = z1 + z2 , (z1 z2 ) = z1 z2 . (0-6) To illustrate the value of this notation, let us leave the main road to prove the interesting Theorem 0.24 . If the complex number γ is a root of the polynomial P (t) = an tn + an−1 tn−1 + · · · + a1 t + a0 , where the coefficients a0 , a1 , . . . , an are real numbers, then γ is also a root of P (t) . In other words, the roots of real equations occur in conjugate pairs. 0.7. COMPLEX NUMBERS: ALGEBRAIC PROPERTIES 25 Proof: Since γ is a root, the complex number P (γ ) = an γ n + · · · + a1 γ + a0 is zero, P (γ ) = 0 . This implies that its conjugate is also 0, P (γ ) = 0 . By using equations (0.7), we have that P (γ ) = an γ n + · · · + a1 γ + a0 , since the coefficients aj are real, aj = aj . Thus 0 = P (γ ) = an γ n + · · · a1 γ + a0 = P (γ ), that is, the complex number γ is a root of the same polynomial. Now if the proof looks like it was done with mirrors, go over each step carefully. This type of reasoning is somewhat typical of modern mathematics in that it yields information about an object (the roots of a polynomial in this case) without first obtaining an explicit formula for the object. After this digression let us return and find a geometric interpretation for the arithmetic operations on complex numbers. First, addition. The three points z1 , z2 and z1 + z2 together with the origin determine a parallelogram. (check this). Thus addition of complex numbers is sometimes called the parallelogram rule for additions. Given the points z1 and z2 , the point z1 + z2 can be constructed using compass and straight-edge. Subtraction is just z1 + (−z2 ) . Multiplication is much more difficult to interpret geometrically. We shall use equation (0.7) and write zj = |zj | (cos θj + i sin θj ), j = 1, 2. Then z1 z2 = |z1 | (cos θ1 + i sin θ1 ) |z2 | (cos θ2 + i sin θ2 ) z1 z2 = |z1 z2 | [cos θ1 + θ2 ) + i sin(θ1 + θ2 )]. (0-7) Thus the product of z1 and z2 has modulus |z1 z2 | and argument θ1 + θ2 : multiply the moduli and add the arguments. This too may be carried out using compass and straightedge. Since z1 = |z1 | (cos θ2 − i sin θ2 ) , division reads 2 2 z1 z1 = [cos(θ1 − θ2 ) + i sin(θ1 − θ2 )], z2 z2 so the moduli are divided while the arguments are subtracted. We will exploit the multiplication formula (0.7) to find all n complex roots of the specific polynomial z n = A, for any A ∈ C . This equation is one of the few whose roots can always be found explicitly. 
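Before carrying this out, the rule "multiply the moduli, add the arguments" is easy to check numerically. Here is a short Python sketch using the standard cmath and math modules; the sample points z1 = 1 + i and z2 = -2 + 2i are arbitrary choices made only for this check.

import cmath, math

z1 = 1 + 1j        # modulus sqrt(2), argument pi/4
z2 = -2 + 2j       # modulus 2*sqrt(2), argument 3*pi/4
prod = z1 * z2     # equals -4

print(abs(prod), abs(z1) * abs(z2))            # both 4.0: the moduli multiply
theta = cmath.phase(z1) + cmath.phase(z2)      # pi/4 + 3*pi/4 = pi: the arguments add
print(cmath.phase(prod), theta)                # equal here (in general, up to a multiple of 2*pi)

# Rebuild the product from its polar data r(cos(theta) + i sin(theta)).
r = abs(z1) * abs(z2)
print(r * (math.cos(theta) + 1j * math.sin(theta)))   # approximately -4+0j, i.e. z1*z2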
The trick is to write A in its polar coordinate form A = |A| [cos(α + 2kπ ) + i sin(α + 2kπ )], where α is the argument of A and k is any integer. Although we get the same A no matter what k is used, as was observed following equation (0.7), we shall retain the arbitrary k since it is the heart of the process we have in mind. From equation (0.7) we see that 1 1 A n = |A| n [cos α + 2kπ α + 2kπ + i sin n n 26 CHAPTER 0. REMEMBRANCE OF THINGS PAST. 1 in the sense that for any value of the integer k, (A n )n = A . As k runs through the integers, we get only n different angles of the form α+2kπ , since the other angles differ from these n n angles by multiples of 2π . For each of these n different angles we obtain a different 1 1 complex number A n . These n numbers for A n are the desired n roots of z n = A . It is usually convenient to obtain the angles by letting k = 0, 1, 2, . . . , n − 1 , although any n integers which do not differ by multiples of n will do. An example should help clear the air. We shall find the three cube roots of −2 , that is, solve z 3 = −2 . First, −2 = 2[cos(π + 2kπ ) + i sin(π + 2kπ )], since the argument of −2 is π while its modulus is 2. Thus, the roots are 1 z = 2 3 [cos π + 2kπ π + 2kπ + i sin, k = 0, t1 , t2 . . . 3 3 There are only three values of z possible, no matter what k ’s are used. These three cube roots of −2 are √ 1 1 1 k = 0, 3, 6, . . . z1 = 2 3 [cos( π ) + i sin( π )] = 2 3 ( 1 + i 23 ) k = 1, 4, 7, . . . z2 = 2 3 [cos(π ) + 3 3 2 1 i sin(π )] = −2 3 √ 1 1 π π 1 k = 2, 5, 8, . . . z3 = 2 3 [cos( 53 ) + i sin( 53 )] = 2 3 ( 2 + i 23 ). It is time-saving to observe that the n roots of unity, that is, of z n = 1 , can be written down immediately by utilizing the geometric interpretation of multiplication. All of the roots have modulus 1, and so must lie on the unit circle |z | = 1 . Bisecting the circle into n equal sectors by the radii, the first beginning on the positive x -axis, we find the roots of unity, wj , at the n successive intersections of these radii with the unit circle. The roots π wj , j = 1, 2, 3, of z 3 = 1 are illustrated in the figure as the intersections of θ = 0, θ = 23 , 4π 2π 2π and θ = 3 with |z | = 1 . Thus w1 = cos 0 + i sin 0 = 1, w2 = cos 3 + i sin 3 = 1 −2 + i √ 3 2 , w3 π π = cos 43 + i sin 43 = − 1 − i 2 √ 3 2. Exercises (1) Express the following complex numbers in the form a + bi . (a) (1 − i)2 (b) (2 + i)(3 − i) (c) (d) (e) (f) 1 i 1+i 2− i 1+i 1+2i i3 + i4 + i271 (2) Compute the absolute values of the complex numbers in Ex. 1. 0.7. COMPLEX NUMBERS: ALGEBRAIC PROPERTIES 27 (3) a) Add (1 + i) and (1 + 2i) using compass and straight-edge. b) Multiply (1 + i) and (1 + 2i) using compass and straight-edge. (4) Express in the form r(cos θ + i sin θ), with 0 ≤ θ < 2π : (a) i (b) 2i (c) −2i (d) 4 (e) −1 (f) −1 + i (g) (1 − i)3 (h) (i) 1 (1+i)2 √ 1 2( 3 + i) (5) Determine the (a) three cube roots of i, −i , and of 1 + i , (b) four fourth roots of −1 and +2 (c) six roots of z 6 = 1 . (6) Let A be any complex number, A = |A| [cos α + i sin α] , and let w1 , . . . , wn be the n roots of z n = 1 . Prove that the n roots of z n = A are 1 1 1 z1 = A n w1 , z2 = A n w2 , . . . zn = A n wn , where 1 1 A n = |A| n (cos α α + i sin ) n n is the principal n th root of A . This shows that the problem of finding the roots of a complex number is essentially reduced to the simpler problem of finding the roots of unity. (7) Draw a sketch of the following sets of points in the complex plane. 
(a) { z ∈ C : |z − 2| ≤ 1 } (b) { z ∈ C : |z − 1 + i| ≤ 2 } (c) { z ∈ C : |z − 2| > 3 } (d) { z ∈ C : 1 ≤ |z − 2| ≤ 3 } (e) { z ∈ C : 1 ≤ |z + i| < 2 } 28 0.8 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Complex numbers: Completeness Properties, Complex Functions. We have just considered the algebraic properties of complex numbers. Now we look at infinite sequences of complex numbers. To develop the desired properties of C , we shall utilize those of R . Definition: The sequence zn of complex numbers converges to the complex number z if, given any > 0 , there is an N such that |zn − z | < for all n > N . We shall again write zn → z . In order to apply the theorem known for real sequences to complex sequences, the following is vital. Theorem 0.25 Let zn = xn + iyn , and z = x + iy . Then zn converges to z if and only if both the real and imaginary parts converge to their respective limits. In symbols, zn → z ⇐⇒ xn → x and yn → y. Proof: Since zn → z , given any > 0 , we can find an N etc. for the zn ’s. Now by equation (0.7) |xn − x| ≤ |zn − z | < and |yn − y | ≤ |zn − z | < so both xn → x and yn → y . Conversely, given any > 0 , we can find an N1 for the xn ’s and an N2 for the yn ’s. Let N be the larger of N1 and N2 , N = max (N1 , N2 ) . This N works for both the xn and yn . But |zn − z | = |xn + iyn − x − iy | ≤ |xn − x| + |yn − y | < 2 . Therefore zn → z , completing the proof. This theorem states that a definition is equivalent to some other property. We could thus have used either property as a definition. Recall that the real numbers were defined so that there would be no“hole” in the real line. This was the completeness property. It guaranteed that if a sequence of real numbers an “looked like” they were approaching a limiting value, then indeed th ere is some a ∈ R such that an → a . The issue here was to avoid the problem of a sequence of rational numbers approaching an irrational number—which is a “hole” if our set just consisted of the rationals. One consequence of the las t theorem is that the set of complex numbers C is also complete. Theorem 0.26 . Every bounded infinite sequence of complex numbers { zk } has at least one subsequence which converges to a number z ∈ C . (By bounded, we mean that there is some r ∈ R such that |zk | < r for all k ). Proof: Since the { zk } are bounded, we know { xk } and { yk } are also bounded sequences of real numbers. The conclusion is now a consequence of the Bolzano-Weierstrass theorem 5 applied to { xk } and { yk } , and of theorem 12 just proved. There is a fine point though: how to get a subsequence of the zk whose real and imaginary parts both converge. The trick is first to select a subsequence { xkj } = { Rezkj } of the { xk } which converge to some x ∈ R . Then, from the related subsequence { ykj } = { Imzkj } , select 0.8. COMPLEX NUMBERS: COMPLETENESS AND FUNCTIONS 29 a subsequence { ykjn } which converges to some y ∈ R . Then { xkjn } also converges to x ∈ R so zkjn → z , and we a re done. With these technical results under our belts, sequences in C become no more difficult than those in R . Let us briefly examine the elements of functions of a complex variable. A complexvalued function f (z ) of the complex variable z is a mapping of some subset z U ⊂C 2 , and f (z ) = 1 . into the complex numbers C, f : U → C . Two examples are f (z ) = z z Both the domain and range of f (z ) = z 2 are all of C , while the domain and range of 1 f (z ) = z are all of C with the exception of 0. 
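Returning for a moment to sequences, the componentwise convergence criterion proved above is easy to watch numerically. In the Python sketch below, the sequence z_n = (1 + 1/n) + i(2 - 3/n^2), with limit z = 1 + 2i, is a made-up example chosen only for this illustration.

# |z_n - z|, |Re z_n - 1| and |Im z_n - 2| all go to zero together,
# as the componentwise convergence criterion asserts.
z = 1 + 2j
for n in (1, 10, 100, 1000, 10000):
    zn = (1 + 1/n) + 1j * (2 - 3/n**2)
    print(f"n = {n:5d}   |zn - z| = {abs(zn - z):.2e}   "
          f"|Re zn - 1| = {abs(zn.real - 1):.2e}   |Im zn - 2| = {abs(zn.imag - 2):.2e}")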
If f maps R → R , like f (x) = 1 + x or f (x) = ex , since R ⊂ C , one asks how the domain of definition of f can be extended from R to C . Of course there are many possible ways to do this, but most of them are entirely artificial. For f (x) = 1 + x , the natural extension is f (z ) = 1+ z, z ∈ C . Similarly, if P (x) = N=0 ak xk is any polynomial k defined for x ∈ R , the natural extension to z ∈ C is P (z ) = N=0 ak z k . We are thus led k to extend f (x) = ex for x ∈ R , to z ∈ C by defining f (z ) = e2 . The only problem is that we have absolutely no idea what it means to raise a real number, e , to a complex power. Taylor (power) series are needed to resolve this issue. This will be carried out at the end of Chapter 1. Continuity of complex functions is defined in a natural way. Let z0 be an interior point of the set U ⊂ C (that is, z0 is not on the boundary of U ). Definition: The function f : U → C is continuous at the interior point a0 U if, given any > 0 there is a δ > 0 such that |f (z ) − f (z0 )| < for all z in 0 < |z − z0 | < δ . Reasonable theorems like, if f and g are continuous at the interior point z0 U , so is the function f + g , are true too - with the same proof as was given for real-valued functions of a real variable. Although we could go on and define the derivative and integral for complex-valued functions f (z ) of a complex variable, the development would take too much work. For our future purposes, it will be sufficient to define the derivative and integral of a complexvalued function f (x) of the real variable x . The first step is to split f (x) into its real and imaginary parts, that is, find real valued functions u(x) and v (x) such that f (x) = u(x) + iv (x) . This decomposition c an always be done by taking u(x) = f (x) + f (x) f (x) + f (x) , v (x) = . 2 2i Since u(x) = u(x) and v (x) = v (x) , both u(x) and v (x) are real-valued functions. It is clear that f (x) = u(x) + iv (x) . Example: For the functions f (x) = 1 + 2ix , we have f (x) = 1 − 2ix , so u(x) = (1 + 2ix) + (1 − 2ix) (1 + 2ix) − (1 − 2ix) = 1, v (x) = = 2x. 2 2i as expected. Because f (x) is a complex number for every x in the domain where f is defined, we |f (x)| = u2 (x) + v 2 (x). With this notion of absolute value, the definitions of continuity and differentiability read just as if f were itself real-valued. For example 30 CHAPTER 0. REMEMBRANCE OF THINGS PAST. Definition: : The complex-valued function f (x) of the real variable x is differentiable at the point x0 if f (x) − f (x0 ) lim x→x0 x − x0 exists. A more convenient way of dealing with the derivative is supplied by the following Theorem 0.27 . The function f (x) = u(x) + iv (x) is differentiable at a point x0 if and only if both u(x) and v (x) are differentiable there, and du dv df = +i . dx dx dx Proof: We shall use Theorem 12. Let { xn } be any sequence whose limit is x0 . Define the sequences { an }, { αn }, and { βn } by an = αn = f (xn ) − f (x0 ) , xn − x0 u(xn ) − u(x0 ) v (xn ) − v (x0 ) , and βn = . xn − x0 xn − x0 We must show that limn→∞ an exists if and only if both limits limn→∞ αn and limn→∞ βn exist, for the existence of these limits is equivalent to the existence of the respective derivatives. But notice that an = αn + iβn , since an = f (xn ) − f (x0 ) u(xn ) + iv (xn ) − (u(x0 ) + iv (x0 )) = = αn + iβn . xn − x0 xn − x0 Thus we can appeal to Theorem 12 to conclude that lim an exists if and only if both lim αn and lim βn exist. 
The formula f = u + iv is an immediate consequence since an → f (x0 ), αn → u (x0 ), and βn → v (x0 ) Examples: df d d a) If f (x) = 1 + 2ix, dx = dx 1 + i dx 2x = 2i b) If f (θ) = cos 7θ + i sin 7θ + 2θ − iθ2 df d d 2 dθ = dθ [2θ + cos 7θ ] + i dθ [−θ + sin 7θ ] = 2 − 7 sin 7θ + i[−2θ + 7 cos 7θ ] A related result which is even easier to prove is Theorem 0.28 . The complex-valued function f (x) = u(x) + iv (x), x R is continuous at x0 R if and only if both u(x) and v (x) are continuous at x0 . Proof: An exercise. Integration is defined more directly. Definition: Let f (x) = u(x) + iv (x), x R . If the real-valued functions u(x) , and v (x) are integrable for x [a, b] , we define the definite integral of f (x) by b b f (x) dx = a b u(x) dx + i a v (x) dx. a 0.8. COMPLEX NUMBERS: COMPLETENESS AND FUNCTIONS 31 The standard theorems, like if c is any complex constant, then b a b b b f (x) dx ≤ f (x) dx, and, if a ≤ b, cf (x) dx = c a a |f (x)| dx a are proved by using the definition above and the corresponding theorems for real functions. We shall, however, need the more difficult Theorem 0.29 . If the complex-valued function f (t) = u(t) + iv (t), t R , is continuous for all t [a, b] , then there is a constant K such that |f (t)| ≤ K for all t [a, b] . Furthermore if x, x0 [a, b] , the n x f (t) dt ≤ K |x − x0 | . (0-8) x0 Notice that the left-hand side absolute value is in the sense of complex numbers. Proof: Since f (t) is continuous in [a, b] , by Theorem 15 so are both u(t) and v (t) . But a real-valued function which is continuous in a closed and bounded interval is bounded. Thus there are constants K1 and K2 such that |u(t)| ≤ K1 , |v (t)| ≤ K2 for all t [a, b] . then 2 2 |f (t)| = u2 (t) + v 2 (t) ≤ K1 + K2 ≡ K. To prove the inequality (0.29), we use the inequality mentioned before the theorem to see that if x0 ≤ x x x f (t) dt ≤ x0 |f (t)| dt. x0 Since |f (t)| ≤ K , we find that x |f (t)| dt ≤ K |x − x0 | . x0 Combining these last two inequalities, we obtain the desired inequality (0.29) if x0 ≤ x . The other case, x ≤ x0 , can be reduced to that already proved by observing that x x f (t) dt = − x0 x f (t) dt ≤ K |x0 − x| = K |x − x0 | . f (t) dt = x0 x0 Exercises (1) In the complex sequences below, which ones converge, which do not converge but have at least one convergent subsequence, and which do neither? In all cases n = 1, 2, 3, . . . . (a) zn = i n + 3i − 4 (b) zn = 2i + (−1)n (c) zn = n − i (d) zn = in √ (e) zn = 1 + i 3 − (−1)n 7n 32 CHAPTER 0. REMEMBRANCE OF THINGS PAST. (f) zn = (4+6i)n−5 1−2ni . (2) Write the following complex-valued functions f (x) of the real variable x as f (x) = u(x) + iv (x) , where u and v and real-valued. (a) f (x) = i + 2(3 − 2i)x2 , (b) f (x) = (1 + 2ix)2 (c) f (x) = cos 3x2 − (3 + i) sin x (d) f (x) = 1 1+2i−x (3) (a) Use the definition of the derivative to compute above. (b) Find df dx for all the functions in Exercise 2 above. (4) Evaluate (a) (b) df dx 3 −1 (1 + 2ix) dx 4 1 [x + (1 − i) cos 2x] dx for the function in Exercise 2a Chapter 1 Infinite Series 1.1 Introduction In elementary calculus you have met the notion of the limit of a sequence of numbers (see also Chapter 0, sections 5 and 7). This concept of limit is just what essentially distinguishes calculus from algebra. It was crucial in the definition of the derivative as the limit of a difference quotient and the integral as the limit of a Riemann sum. We now propose to discuss another limiting process, infinite series, in detail. 
An infinite series is a sum of the form ∞ ak = a1 + a2 + a3 + · · · , (1-1) k=1 where the ak ’s are real or complex numbers. Since there is no added difficulty we shall suppose the ak ’s are complex numbers. One immediate trouble is that it would take us an infinite amount of time to add an infinite sum. For example, what is ∞ (a) =? k=1 1 = 1 + 1 + 1 + 1 + · · · ∞ (b) 1 = 1 − 1 + 1 − 1 + 1 − 1··· =? k=1 ∞ 1 1 (c) = 1 + 1 + 1 + 1 + 16 + · · · =? k=1 2k−1 2 4 8 Thus, we are faced with the realization that be sum (1) is not really well defined, even in cases where we feel it might make sense. Our first task is to give a more adequate definition. Let Sn be the sum of the first n terms: n Sn := a1 + a2 + · · · + an = ak . k=1 Then for each n , we have a complex number Sn , called the nth partial sum of the series (1). Definition: If limn→∞ Sn = S , where S is a (finite) complex number, we say that the infinite series converges to S . If the sequence S1 , S2 , S3 , . . . has no limit, we say that the infinite series diverges. For the examples given just above, we have (a) Sn = n 1 = n → ∞ so the infinite series diverges to ∞ . 1 1 n odd, (b) Sn = n (−1)n+1 = . which does not have a limiting value since 1 0 n even, it oscillates between 1 and 0. 33 34 CHAPTER 1. INFINITE SERIES (c) Sn = n 2k1 1 = 2(1 − 21 ) → 2 , so the infinite series converges to the number 2 n − 1 (we found the sum of the series by realizing it is a simple geometric series: 1 + r + r2 + · · · + rN = 1 − rN +1 ) 1−r for (r = 1). With an adequate definition of convergence of infinite series, it is clear that we should develop some tests for determining if a given series converges. That will be done in the next section. In preparation, let us examine some simple types of series which occur often and prove a few useful theorems. There are two types of series whose sums can always be found, and for which the question of convergence is exceedingly elementary. Definition: An infinite geometric series is a series of the form ∞ ark = a + ar + ar2 + · · · . k=0 The partial sums are Sn = a + ar + · · · + arn = a Theorem 1.1 The infinite geometric series a |r| < 1 . Then the sum is 1−r . 1 − rn+1 1−r ∞ k k=0 ar , for (r = 1). a = 0 , converges if and only if Proof: limn→∞ rn+1 exists only if |r| < 1 . Then the limit is zero so limn→∞ Sn = (the non-convergence when |r| = 1 follow from Theorem 6, p. ?) Examples: (a) ∞ k=0 (1 (b) ∞ 1+i k k=0 ( 2 ) (c) ∞ k=1 1 (d) ∞ k k=1 (−1) + i)k diverges since |1 + i| = converges since 1+i 2 √ 2 ≥ 1. √ = 2 2 < 1. The sum of this series is 1 + i . diverges since |1| = 1 . diverges since |−1| = 1 . Definition: An infinite telescopic series is one of the form ∞ (αk − αk+1 ) = (α1 − α2 ) + (α2 − α3 ) + (α3 − α4 ) + · · · . k=1 It is clear that most of the terms cancel each other. Theorem 1.2 If αk → α , then ∞ k=1 (αk − αk+1 ) = α1 − α . Proof: Sn = (α1 − α2 ) + (α2 − α3 ) + · · · + (αn − αn+1 ) = α1 − αn+1 → α1 − α . Examples: 1 1 (a) 112 + 213 + 314 + · · · = ∞ k(k1 = ∞ ( k − k+1 ) = 1 . k=1 k=1 · · · +1) 1 1 (b) 4·11−1 + 4·21−1 + 4·31−1 + · · · = 2 ∞ ( 2k1 1 − 2k1 ) = 2 2 2 2 k=1 − +1 a 1− r 1.1. INTRODUCTION 35 We close this section with some reasonable (and desirable) theorems. The proofs are immediate consequences of the definition of convergence of infinite series and the related theorems about limits of sequences of numbers. ∞ k=1 ak Theorem 1.3 . If n k=1 ak Theorem 1.4 If → a , and c is any number then → a and n k=1 bk ∞ k=1 cak n k=1 (ak → b , then → ca . + bk ) → a + b . 
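Before stating the last of these theorems, here is a quick numerical look at the two closed-form sums obtained above: the geometric series with a = 1, r = 1/2 should approach 2, and the telescoping series of example (a) should approach 1. The helper partial_sums in this Python sketch is ours, introduced only for the illustration.

def partial_sums(term, N):
    s, out = 0.0, []
    for k in range(1, N + 1):
        s += term(k)
        out.append(s)
    return out

geom  = partial_sums(lambda k: 1 / 2**(k - 1), 30)       # geometric: a = 1, r = 1/2, sum 2
teles = partial_sums(lambda k: 1 / (k * (k + 1)), 30)    # telescoping: 1/(k(k+1)) = 1/k - 1/(k+1), sum 1

for N in (1, 5, 10, 30):
    print(f"N = {N:2d}   geometric S_N = {geom[N-1]:.8f}   telescoping S_N = {teles[N-1]:.8f}")
# Up to rounding, the printed values follow S_N = 2 - 2**(1 - N) and S_N = 1 - 1/(N + 1),
# in agreement with Theorems 1.1 and 1.2.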
Theorem 1.5 Let ak = αk + iβk , where αk and βk are real. The infinite series ak converges if and only if the two real series αk and βk both converge. That is, an infinite complex series converges if and only if both its real and imaginary parts converge. Proof: We must look at the partial sums. Let σn = n Sn = n n αk = k=1 n k=1 αk (αk + iβk ) = k=1 , and τn = n k=1 βk . Then n αk + i k=1 βn = σn + iτn . k=1 But we know from Theorem 12 of Chapter 0 that the complex sequence Sn converges if and only if both its real part, σn , and imaginary part, τn , both converge—in other words, if the series ak and βk both converge. Two remarks should be made in an attempt to mitigate some confusion. First, the index ∞ ∞ ∞ k of the series k=1 ak could have been any other letter. Thus k=1 ak = j =1 aj . This is perhaps indicated most clearly if we left an empty box instead of using any letter at all: . The connecting line means that the same letter must be used in both boxes. Now you can fill in any letter that makes you happy. No matter w hat you write, it still means a1 + a2 + a3 + · · · . In a similar way, the index need not begin with 1. Thus, for example, ∞ ∞ k=1 ak = k=17 ak−16 = a1 + a2 + · · · . Although this manipulation looks like unwanted silliness here, it is sometimes quite useful. Later on this year you will need it. The related transformation for integrals is illustrated by 3 2 1 dt = t 2 1 1 dt. t+1 Exercises (1) Find a closed form expression for the nth partial sum of the following infinite series and determine if they converge. (a) 2 3 + 2 9 (b) 1 + i + (c) 1 2! + ∞ 2 k=1 3k . 2 2 27 + · · · + 3n + · · · = i2 + i3 + · · · + in + · · · + 2 3! + 3 4! + ··· = ∞ k −1 k=2 k! = ∞ 1 k=2 ( (k−1)! 3 n (d) ln 1 + ln 2 + ln 4 + · · · + ln( n+1 ) + · · · 2 3 (e) (f) ∞ 3− 4i m m=0 ( 7 ) ∞ 2− 3i n=1 n(n+1) 1 − k) 36 CHAPTER 1. INFINITE SERIES (2) The repeating decimal 1.565656 · · · can be written as 56 56 56 1 + 2 + 4 + 6 + · · · = 1 + 56 10 10 10 ∞ ( k=1 1k ). 102 Sum the geometric series and find what rational number the repeating decimal represents. In a similar way, every decimal which begins to repeat eventually is a rational number. What rational number is represented by 1.4723? (3) A ball is dropped from a height of 20 feet. Every time it bounces, it rebounds to 3 4 of its height on the previous bounce. What is the total distance traveled by the ball? (4) If ∞ ∞ k=1 ak → a and k=1 bk ∞ (αak + βbk ) → αa + βb . k=1 (5) If an > 0 and → b , and if α and β are any numbers, prove that 1 an an converges, prove that (6) Does the convergence of ∞ n=1 an diverges. imply the convergence of ∞ n=1 (an + an+1 ) ? (7) (a) If the partial sums of an are bounded, and { bn } is a strictly decreasing sequence with limit 0, bn 0 , prove that an bn converges. (b) Use (a) to prove that if ∞ n=1 nan converges then so does the series (c) Use (a) to discuss the convergence of 1.2 ∞ sin nx n=1 n ∞ n=1 an . . Tests for Convergence of Positive Series Tests to determine convergence are of several types, i) those that give sufficient conditions, ii) those that give necessary conditions, and iii) those that give both necessary and sufficient conditions. Theorem 1 of the last section governing geometric series was of the last type; however it is more common to find convergence tests of the first two types since they are usually easier to come by. You should be careful to observe the nature of a test . A simple theorem should make the point clear. Theorem 1.6 . 
If the series ∞ k=1 ak —where ak may be complex—converges, then lim |ak | = 0. k→∞ Proof: Let Sn = a1 + a2 + · · · + an . Then |an | = |Sn − Sn−1 | . As n → ∞ both Sn and Sn−1 tend to the same limit, so |an | → 0 . Returning to the point made before, this theorem states a necessary but not sufficient (as we shall see) condition for an infinite series to converge. We can apply it to see that k k k+1 diverges—since k+1 → 1 = 0 . Thus this theorem is useful as a quick crude test to ∞1 weed out series which diverge badly. But all it tells us about the series k=1 k —for which 1 k → 0 so the criterion of the theorem is satisfied—is that it might converge. In fact, this series diverges too, as we shall now prove. ∞ k=1 1 1111 11 1 1 1 = 1 + + + + + ··· + + + ··· + + + ··· + + ··· k 2345 89 16 17 32 1.2. TESTS FOR CONVERGENCE OF POSITIVE SERIES 1+ 37 1111 1 1 1 1 1 + + + + ··· + + + ··· + + + ··· + + ··· 2448 8 16 16 32 32 =1+ 11 + 22 + 1 2 1 2 + + 1 2 + · · · .. 1 1 1 Thus S1 = 1, S2 = 1 + 1 , S4 > 1 + 1 + 2 = 1 + 2 · 2 , S8 > 1 + 3 · 2 , S16 > 1 + 4 · 1 , . . . , S2n > 2 2 2 1 1 1 + n · 2 . We can easily see that as n → ∞, S2n → ∞ , so the series k , called the harmonic series, diverges. For the many series which slip through the test of Theorem 6, more refined criteria are needed. The criteria we shall present in the remainder of this section are for series with positive terms, an ≥ 0 . Application of these criteria to series with complex terms will be made in the next section. Theorem 1.7 . If ak ≥ 0 for each k , then the series the sequence of partial sums is bounded from above. ∞ k=1 ak converges if and only if Proof: Since all the ak ’s are non-negative, Sn+1 ≥ Sn . Thus the Sn ’s are a monotone increasing sequence of real numbers. By Theorems 6 and 8 of Chapter 0, this sequence Sn converge if and only if it is bounded. Example: The series ∞1 k=1 k! of positive terms converges, since 1 1 1 1 = ≤ = k −1 k! 1 · 2 · 3 · · · .k 1 · 2 · 2 · 2 · · · .2 2 so n Sn = k=1 1 ≤ k! n k=1 1 2k − 1 ∞ ≤ k=0 1 = 2. 2k The convergence now follows since Sn is bounded from above. We can extract an exceedingly useful idea from these examples: check the convergence of a given series by comparing it with another series which we know to converge or diverge. Theorem 1.8 . (comparison test) Let ak ≤ bk for n > N . Then i) if bk converges, so does ak . ii) if ak diverges, so does bk . ak and bk be two positive series for which Proof: Let sn = n+N +1 ak and tn = n+N +1 bk . Then sn ≤ tn for all n > N , so i) if k k tn → t , then sn is bounded (sn ≤ t) , ii) if sn → ∞ , then tn → ∞ too. Remark: The “n > N ” part of the hypothesis reflects the fact that it is only the infinite tail of an infinite series that we need to worry about. Any finite number of terms can always be added later on. Examples: ∞ 1 1 1 1 (a) converges. k=1 2k +1 converges since 2k +1 < 2k and 2k ∞√ 1 1 1 √ ≥ 1 (for k ≥ 1 ) and (b) k=1 k diverges since k k diverges. k Our next test is based upon comparison with a geometric series rn . 38 CHAPTER 1. INFINITE SERIES Theorem 1.9 . (ratio test) Let following limit exists an be a series with positive terms such that the lim n→∞ an+1 = L. an Then i) if L < 1 , the series converges ii) if L > 1 , the series diverges iii) if L = 1 , the test is inconclusive. Remark: If the assumed limit does not exist, a variant of the theorem is still true but we shall not discuss it. Proof: i) If L < 1 , pick any r, L < r < 1 . Then there is an N such that for all n ≥ N, an+1 < r . Therefore an < ran−1 < r2 an−2 < . . . 
< rn−N aN , so that an < an ∞ N −1 ∞ N Krn , n ≥ N , where K > aN . The series n=1 an = n=1 an + n=N an consists of a r finite sum plus an infinite tail which is dominated by the geometric series K rn . Since r < 1 , the geometric series converges and by the comparison test, so does an . ii) If L > 1 , then an+1 > an for all n > N ; thus limn→∞ an = 0 . By Theorem 6, the series an cannot converge. iii) This is seen from the two examples. an+1 1 n (a) n , with limn→∞ an = limn→∞ n+1 = 1 , which we know diverges. (n+1)(n+2) an+1 1 (b) = 1 , which we know (Theorem n(n+1) , with limn→∞ an = limn→∞ n(n+) 2, Example a) converges. In both these cases L = 1 . You should notice that the criterion uses the limiting value 1 of an+1 /an . The divergent harmonic series n , whose ratio n/n + 1 is less than one for finite n , but 1 in the limit shows the mistake you will make if you use the ratio before passing to the limit. Examples: (a) n! : Since limn→∞ ( an+1 ) = limn→∞ ( (n+1)! ) = limn→∞ an less than one so the series converges. 1 n! (b) 10n n! (c) n! 2n 1 n+1 = 0 < 1 , the ratio is : Since limn→∞ ( an+1 ) = limn→∞ ( n10 ) = 0 < 1 , the series converges. an +1 : Since limn→∞ ( an+1 ) = limn→∞ ( n+1 ) = ∞ , the series diverges. an 2 Our last test for series with positive terms is associated with a picture. The crux of the ∞ matter is very simple and clever. We associate an area with the infinite series n=1 an . For the term an we use a rectangle between n ≤ x ≤ n + 1 of height an and base one. Then the sum of the infinite series is represented by total area under the rectangles. Now by Theorem 7, if all the an ’s are positive we know the series converges if the total area is finite. Thus, if we can find a function f (x) whose graph lies above the rectangles, and whose total area is finite, then we know the area contained in the rectangles is finite and so the series converges. ∞ Theorem 1.10 . (integral test) Let n=i an be a series of positive decreasing terms: 0 < an+1 ≤ an , and f (x) a continuous decreasing function with f (n) = an . Then the sequence N SN = N f (x) dx an and TN = n=1 1 1.2. TESTS FOR CONVERGENCE OF POSITIVE SERIES 39 either both converge or both diverge, in fact, SN − a1 ≤ TN ≤ SN −1 . Proof: First of all, N 2 f (x) dx = 3 +··· + + 1 N −1 N 1 f (x) dx. N −1 2 n+1 = n n=1 Since in the interval n ≤ x ≤ n + 1 we know that an = f (n) ≥ f (x) ≥ f (n + 1) = an+1 , we see that n+1 n+1 f (n) dx ≥ an = n n+1 f (x) dx ≥ n f (n + 1) dx = an+1 . n Adding these up, we find N −1 N −1 an ≥ n=1 or f (x) dx ≥ n=1 n N −1 an+1 n=1 N N an ≥ n=1 N −1 n+1 f (x) dx ≥ 1 an . n=2 Thus SN −1 ≥ TN ≥ SN − a1 . From this last inequality, we see that limn → ∞TN is finite if and only if limn→∞ SN is finite. Since the sequences SN and TN are both monotone increasing sequences, by Theorem 7 the sequences converge or diverge together. And we are done. Examples: ∞ 1 (a). n=1 np converges if p > 1 , diverges if p ≤ 1 . We use the function f (x) = which satisfies the hypothesis of the theorem, and examine the integral N TN = 1 1 dx = xp N 1−p −1 1− p ln N , p = 1. , p=1 1 xp , . As N → ∞, ln N → ∞ , and so does N 1−p if p < 1 , while N 1−p → 0 if p > 1 . Therefore ∞ 1 TN converges if and only if p > 1 , so by our theorem n=1 np converges if and only if 1 p > 1 . In the special case p = 1 we have again proven that the harmonic series n 1 diverges. Another often seen special case is p = 2, , which converges. Sometime later n2 ∞ 1 π2 we shall prove the amazing n=1 n2 = 6 . 
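Before turning to the next example, the two-sided bound S_N - a_1 <= T_N <= S_(N-1) from the proof is easy to watch numerically for this p = 2 series. The following short Python sketch is only an illustration; the cutoff N = 1000 is an arbitrary choice.

import math

# p = 2: a_n = 1/n^2 and f(x) = 1/x^2, so T_N = integral from 1 to N of dx/x^2 = 1 - 1/N.
N = 1000
S = [0.0]                                 # S[k] will hold the k-th partial sum
for n in range(1, N + 1):
    S.append(S[-1] + 1 / n**2)
T_N = 1 - 1 / N

print(S[N] - 1, "<=", T_N, "<=", S[N - 1])   # S_N - a_1 <= T_N <= S_(N-1), as in the theorem
print(S[N], math.pi**2 / 6)                  # the partial sums creep up toward pi^2/6 = 1.6449...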
N dx ∞ 1 (b) n=2 n ln n diverges since as N → ∞, 2 x ln 2 = ln(ln N ) − ln(ln 2) → ∞ Exercises (1) Determine if the following series converge or diverge. (a) ∞ 1 n=1 n2 +1 40 CHAPTER 1. INFINITE SERIES (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) ∞ 1 n=1 2n−1 ∞ 1 n=1 n(ln n)2 ∞ 1 n=1 10n2 ∞ n n=1 n2 +1 ∞ 1 n=1 2n+3 ∞ cos2 n n=1 2n √ n ∞ n=1 n3 +1 ∞ n2 n=1 2n ∞ −n2 n=1 ne ∞√ 1 n=1 n(n+1)(n+2) ∞ n! n=1 22n ∞ |a n | n=1 10n , |an | ∞ p −n , p n=1 n e < 10 ∈R (2) If an ≥ 0 and bn ≥ 0 for all n ≥ 1 , and if there is a constant c such that an ≤ cbn , prove that the convergence of bn implies the convergence of an . 1 (3) Use the geometric idea of the integral test to show limn→∞ [1 + 1 + · · · + n − ln n] 2 1 converges to a constant γ , and show that 2 < γ < 1 . γ is called Euler’s constant. (4) If an converges, where an ≥ 0 , prove that an 1+an also converges. (5) (a). If an converges, where an ≥ 0 , and cn have the property 0 ≤ cn ≤ K , the same K for all n , then prove that cn an converges. (b). Deduce the result of Exercise 4 from Exercise 5a. (6) Use the geometric idea behind the integral test to prove that (a). ln n! = ln 1 + ln 2 + ln 3 + · · · + ln n > From this deduce that n 1 ln x dx = n ln n − n + 1 when n ≥ 2 . (b). n! > e( n )n , when n ≥ 2 . e (c). As an application of (b), prove that limn→∞ xn n! = 0. 1 (7) (a). Use the idea in the proof of the divergence of the harmonic series, n , to prove the following test for convergence: Let { an } be a positive monotonically decreasing sequence. Then an converges or diverges respectively if and only if the “condensed” n a n converges or diverges. series 22 (b). Apply the test of part (a) to again prove that diverges if p ≤ 1 . 1 np converges if p > 1 , and (c). Apply the test of part (a) to determine the values of p for which the series ∞ 1 n=2 n(ln n)p converges and diverges. 1.3. ABSOLUTE AND CONDITIONAL CONVERGENCE 1.3 41 Absolute and Conditional Convergence The tests just given for series with positive terms can be applied to many series with complex terms an by utilizing the concept of absolute convergence. ∞ Definition: The series k=1 ak , where the ak may be complex numbers, converges ∞ absolutely if the series of positive numbers k=1 |ak | converges. It is called conditionally ∞ ∞ convergent if ak converges but |ak | diverges. k=1 k=1 Absolute convergence is stronger than ordinary convergence because Theorem 1.11 . If ∞ n=1 |an | converges, then ∞ n=1 an converges. Proof: Let aN = αn + iβn . We shall show that the real series αn and βn both converge. Then by Theorem 5 an converges too. To show that αn converges, let 2 2 cn = αn + |an | . Since |αn | ≤ (αn + βn ) = |an | , we know that 0 ≤ cn ≤ 2 |an | . Thus the positive series cn is bounded, cn ≤ 2 |an | < inf ty , and so converges by the comparison test (Theorem 8). Since αn = (cn − |an |) , and both cn and |an | converge, then αn also converges by Theorem 4. Similarly, by taking dn = βn + |an | , the series dn converges, from which we can conclude that βn converges. Examples: (a) The complex series in−1 n2 = 1 n2 1 12 + i 22 + i2 32 + and the positive series i3 + ··· 42 ∞ 1 n=1 n2 = ∞ in n=1 n2 converges absolutely since converges. (b) 1 + 212 − 213 − 214 + 215 + 216 − 217 − 218 + · · · , which is the geometric series 1 negative signs thrown in, converges absolutely since 2n converges. (c) rn , r complex, converges absolutely if 1 2n with |r|n converges, that is, if |r| < 1 . n+1 1 1 (d) 1 − 2 + 3 − 1 + 1 . . . 
= ∞ (−1) , the alternating harmonic series does not converge n=1 4 5 n 1 absolutely because n diverges. It does converge though, as we shall see shortly. Thus the alternating harmonic series is conditionally convergent. On the basis of this last theorem, many complex series can be proved to converge by proving they converge absolutely. Since absolute convergence concerns itself with series having only positive terms, all the tests for convergence developed in the previous section may be used. This is the most common way of proving a complex series converges. If it does not converge absolutely, the proof of convergence will usually be more difficult and use special ingenuity based on the particular series at hand. There is one case of conditional convergence which is easy to treat, that of alternating series. Definition: A series of real numbers is called alternating if the positive and negative terms occur alternately. They have the form ∞ (−1)n−1 an = a1 − a2 + a3 − a4 + · · · , n=1 where the an ’s are all positive. 42 CHAPTER 1. INFINITE SERIES ∞ n−1 a , a > 0 , converges if i) the a Theorem 1.12 . The alternating series n n n n=1 (−1) are monotone decreasing ( an ), and ii) limn→∞ an = 0 . If S is the sum of the series, the inequality 0 < |S − SN | < aN +1 (1-2) shows how much the N th partial sum differs from the limit S . In words inequality (2) says that the error which results by using the first N terms is less than the first neglected term aN +1 . Proof: The idea is quite simple. Observe that since an , S2n − S2n−2 = a2n−1 − a2n > 0 , so the S2n ’s increase. Similarly the S2n+1 ’s decrease. Also both sequences are bounded— from below by S2 and from above by S1 (you should check this). Therefore by Theorem 8 Chapter 0, the bounded monotonic sequences S2n and S2n+1 converge to real numbers ˆ ˆ S and S respectively. Let us show that S = S . ˆ S − S = lim S2n+1 − lim S2n = lim (S2n+1 − S2n ) = lim a2n+1 = 0 n→∞ n→∞ n→∞ n→∞ Thus the alternating series converges to the unique limit S . All that is left to verify is inequality (2). Because S2n is increasing and S2n+1 is decreasing, we know that S2n < S and S < S2n+1 Therefore 0 < S − S2n < S2n+1 − S2n = a2n+1 0 < S2n−1 − S < S2n−1 − S2n = a2n . and These two inequalities are the cases N even and N odd in (2). Examples: (a) ∞ (−1)n−1 n=1 n converges since it is an alternating sequence and tonically to zero. Later we shall show that its sum is ln 2 . (b) ∞ (−1)n n=2 ln n (c) ∞ n−1 n n=1 (−1) n+1 converges since 1 ln n 1 n decreases mono- decreases monotonically to zero. diverges by Theorem 6 since limn→∞ (−1)n−1 n n+1 is not zero. Exercises (1) Determine which of the following series converge absolutely, converge conditionally, or diverge. (a) (b) (c) (d) (e) 1 (f) 1) ∞ (−√ n+1 n=1 n ∞ (2−3i)n n=1 n! ∞ (2k+i)2 k=2 ek ∞ n−1 ln n n=1 (−1) n 1 − 1 + 1 − 212 + 5 2 3 ∞ 1 n=1 n2 +2i − 1 23 + 1 7 − 1 24 + 1 9 − ··· . 1.4. POWER SERIES, INFINITE SERIES OF FUNCTIONS (g) ∞ 1 n=1 n+2i (h) ∞ (−1)n−1 n=1 n+2i (i) ∞ (−1)n−1 , n=1 np (j) 43 2 ∞ n (1+i)n n=1 (−1) 2n2 +1 (k) ∞ cos nθ n=1 n2 , p > 0, θ arbitrary. (2) If an and bn are absolutely convergent, and α and β are any complex numbers, prove that (αan + βbn ) also converges absolutely. (3) Show that ∞ n n=1 nz converges absolutely if |z | < 1 . (4) Show that for any θ ∈ R , then ∞ then n=0 sin nθ also diverges. 1.4 ∞ n=0 cos nθ diverges, and that if θ = 0, ±π, ±2π, . . . , Power Series, Infinite Series of Functions As you will all agree, the simplest functions are polynomials. 
With infinite series at hand, it is reasonable to consider an “infinite polynomial” ∞ a0 + a1 z + a2 z 2 + a3 a3 + · · · = an z n . n=0 Because of the appearance of the powers of z , this is called a power series. The question of convergence of a power series is trivial at z = 0 , for then we have only the one term a0 . Does this series converge for any other values of z , and if so, for which ones? The answer depends on the coefficients an , but in any case, the set of complex numbers, z ∈ C , for which the series converges is always a disc |z | < ρ —with possibly some additional points on the boundary |z | = ρ —in the complex pane bC . This number ρ is called the radius of convergence of the power series. We shall first prove that the set z ∈ C for which a power series converges is always a disc. Then we shall give a way of computing the radius ρ of that disc. Theorem 1.13 . The set z ∈ C for which the power series an z n converges is always a disc |z | < ρ , inside of which it even converges absolutely. We do not exclude the two extreme possibilities that the radius of this disc is zero or infinity. The series might converge at some, none, or all of the points on the boundary of the disk |z | = ρ . Proof: We shall show that if the series converges for any ζ ∈ C , then it converges absolutely for all complex z with |z | < |ζ | . If ζ = 0 , there is nothing to prove, so assume ζ = 0 . Because an ζ n converges, limn→∞ |an ζ n | → 0 . Thus all the terms are bounded in absolute value, that is, there is an M such that |an ζ n | < M for all n . Then, since |an z n | = an ζ n zn z <M n ζ ζ n for all n, 44 CHAPTER 1. INFINITE SERIES the series |an z n | is dominated by M z ζ n . But this last series is a geometric series an z n which does converge since |z | < |ζ | , so z < 1 . Thus by the comparison test ζ converges absolutely for all z ∈ C with |z | < |ζ | . Therefore, if the power series an z n converges for some complex number ζ , then it converges in the whole disc |z | < |ζ | . The radius of convergence ρ is then the radius of the largest disc |z | < ρ for which the series converges. See Exercise 3 for examples concerning convergence on the boundary of the disk. Let us now give a method of computing ρ which covers most cases arising in practices. Theorem 1.14 . If limn→∞ an+1 an = L exists, the power series an z n has radius of 1 convergence ρ = L if L = 0, ∞ if L = 0 . In other words, if L = 0 the series converges 1 1 in the disc |z | < L and diverges if |z | > L . On the circumference |z | = 1/L , anything may happen (see Exercise 3 at the end of this section). If L = 0 , the series converges in the whole complex plane. Proof: This is a simple application of the ratio test. The series converges if the limit of z n+1 is less than one and diverges if it is greater the ratio of successive terms limn→∞ an+1 z n an than one. Thus we have convergence if lim 1 an+1 z = |z | L < 1, i.e. if |z | < , an L lim an+1 z 1 = |z | L > 1, i.e. if |z | > . an L n→∞ and divergence if n→∞ Remark: In the one additional case an+1 → ∞ as n → ∞ , the series diverges for every an |z | = 0 , as can easily be seen again by the ratio test. Examples: (a) ∞ n n=0 z (b) ∞ nz n n=0 2n converges where limn→∞ z n+1 /z n < 1 that is; for |z | < 1 . converges where limn→∞ lim n→∞ (n+1)z n+1 nz n / 2n 2n+1 < 1 . Since (n + 1)z n+1 nz n (n + 1)z z / n = lim , = n+1 n→∞ 2 2 2n 2 the series converges for all |z | < 2 . (c) ∞ zn n=0 n! converges where limn→∞ lim n→∞ z n+1 z n (n+1)! / n! < 1 . 
Since z n+1 z n z / = lim = 0, n→∞ n + 1 (n + 1)! n! the series converges for all z ∈ C , that is, in the whole complex plane. 1.4. POWER SERIES, INFINITE SERIES OF FUNCTIONS (d) ∞ n n=0 n!z converges where limn→∞ lim n→∞ (n+1)!z n+1 n!z n 45 < 1 . But (n + 1)!z n+1 = lim |(n + 1)z | = ∞ n→∞ n!z n unless z = 0 . Thus the ratio is less than one only at z = 0 , so the series converges only at the origin. Only minor modifications are needed for the more general power series ∞ a0 + a1 (z − z0 ) + a2 (z − z0 )2 + · · · = an (z − z0 )n , n=0 where a0 ∈ C . Again the series converges in a disc in the complex plane, only now the disc has its center at z0 instead of the origin, so if the radius of convergence is ρ , the series converges for |z − z0 | < ρ . An example should make this clear. Example: ∞ (z −2i)n n=1 n . By the ratio test, this converges when (z − 2i)n+1 (z − 2i)n / < 1, n+1 n lim n→∞ that is, when |z − 2i| < 1 . This is a disc with center at 2i and radius 1. A few words should be said about real power series an (x − x0 )n where both x and x0 are real (some people only use this phrase if the an are also real). This is a special case of an (z − z0 )n where z0 is on the real axis and we only ask for what real z the series converges. However we know that an (z − z0 )n converges only for those z in the disc of convergence |z − z0 | < ρ —and possibly some boundary points. Thus the real values of z for which the series an (z − z0 )n converges are exactly those points on the real axis which are also inside the disc of convergence of the complex power series. In particular the series an (x − x0 )n , with both x and x0 real converges for |x − x0 | < ρ , i.e., in the interval x0 − ρ ≤ x ≤ x0 + ρ . ∞ 1 n Example: For what x ∈ R does n=0 2n (x − 1) converge? The related complex series ∞ 1 n converges in the disc |z − 1| < 2 . The points on the real axis which are n=0 2n (z − 1) in this disc are |x − 1| < 2 , which is −1 < x < 3 . A direct check shows the series diverges at both end points x = −1 and x = 3 . If an and bn both converge, can we define their product in a meaningful way ∞ ( n=0 ∞ an )( n=0 ∞ bn ) = cn ? n=0 and if so, does the resulting series converge? The most simple-minded approach is to insert powers of z (a bookkeeping device), giving ( an z n )( bn z n ) , try long multiplication and see what happens. A computation shows that (a0 + a1 z + ax z 2 + · · · )(b0 + b1 z + b2 z 2 + · · · ) = a0 b0 + (a0 b1 + a1 b0 )z +(a0 b2 + a1 b1 + a2 b0 )z 2 + · · · + (a0 bn + a1 bn−1 + · · · + an b0 )z n + · · · . 46 CHAPTER 1. INFINITE SERIES Motivated by this, we make the following Definition: The formal product, called the Cauchy product, of the series is defined to be ∞ ∞ ( bn ∞ bn ) ≡ an )( n=0 an and n=0 cn , n=0 where n cn = a0 bn + az bn−1 + · · · + an b0 = ak bn−k . k=0 With this definition we shall answer the question we raised about multiplication of power series. Theorem 1.15 . If Cauchy product series ∞ n=0 an ∞ n=0 bn = A and ∞ ( = B both converge absolutely, then the ∞ ∞ bn ) ≡ ( an )( n=0 n=0 where cn ), n=0 ∞ ak bn−k , cn = k=0 also converges absolutely, and to C = AB . Proof: Let AN = N=0 an , BN = N=0 bn , and CN = N=0 cn . We shall show that by n n n picking N large enough, |AN BN − CN | can be made arbitrarily small. Since AN BN → AB , this will complete the proof. Observe that CN = a0 b0 + (a0 b1 + a1 b0 ) + · · · + (a0 bN + · · · + aN b0 ) = while N aj bk , N AN BN = (a0 + · · · + aN )(b0 + · · · + bN ) = aj bk . 
j =0 k=0 Therefore N N N N aj bk ≤ |AN BN − CN | = j =0 k=0 |aj | |bk | . j =0 k=0 Since j + k > N , either j > N/2 or k > N/2 , so N N j> N 2 k=0 |AN BN − CN | ≤ N N |aj | |bk | + |aj | |bk | . j =0 k> N 2 Because the original series both converge absolutely, they are bounded, ∞ ∞ |aj | < M and j =0 |bk | < M. k=0 Consequently, ∞ |AN BN − CN | ≤ M ( j> N 2 ∞ |aj | + |bk |). k> N 2 1.4. POWER SERIES, INFINITE SERIES OF FUNCTIONS 47 Again using the absolute convergence of the original series, we see that for N large, the right side can be made arbitrarily small. Since we shall need the ideas later on, let us digress briefly and examine the convergence of infinite series of functions, un (z ) . In the special case where un (z ) = an (z − z0 )n , this is a power series. Generally, there is little one can say about the convergence of such series except to apply our general tests and hope for the best. We shall only illustrate the situation with two Examples: (a) (b) ∞ cos nθ n=1 n2 , where θ is any real number. This converges for all θ since it converges nθ absolutely, that is + cos2 converges. We can see this last statement is true n cos nθ with the larger convergent series (since |cos nθ| ≤ 1 ) by comparing + n2 ∞ 1 n=1 n2 . ∞ nx n=1 ne . By the ratio test, converges if limn→∞ (n + 1)e(n+1)x /nenx < 1 . Since lim (u + 1)e(n+1)x /nenx = lim n→∞ n→∞ n+1 x e = ex , n the series converges if ex < 1 , which happens only when x < 0 . Exercises (1) Find the disc of convergence of the following power series by finding the center and radius of the disc. (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) ∞ zn n=0 n+1 ∞ (z −2)n n=0 n ∞ in n−1 n=0 2n−1 z ∞ n=0 (n + 1)[z − 2 ∞ (2z +3)n n=0 n2 +2i ∞ 1 n−2 n=0 ln n z ∞ (2n−i) n n=0 3n z ∞ 2n z n n=0 n! (0! ≡ 1) ∞ 1 in n=0 ( 2n + 3n z ∞ (z +i)n n=0 22n ∞ z2 n n=0 (2n)! ∞ n n n=0 n (z − 1) ∞ z 2n n=0 4n ∞ 1 i n=0 ( n + n2 +1 )(z + 3i]n − √ 2i)n (2) Find the set x ∈ R for which the following series converge. 48 CHAPTER 1. INFINITE SERIES (a) (b) (c) (d) (e) (f) (g) ∞ (x−1)n n=0 n2n ∞ cos nx n=0 2n ∞ 1 x− 1 n n=0 n ( x ) ∞ −n(x+1) n=0 e ∞ 2n (sin x)n n=0 n ∞ xn n=0 (1 + e ) ∞ xn n=0 (1 − e ) (3) The point of this exercise is to show that a power series might converge at some, none, or all of the points on the boundary of the disk of convergence. ∞ n n=0 z (a) Show that vergence. diverges at every point on the boundary of its disc of con- n ∞ z (b) Show that n=0 n+1 diverges for z = 1 but converges for z = −1 (in fact, it converges everywhere on |z | = 1 except at z = 1 ). (c) Show that convergence. (4) If ∞ xn n=0 (n+1)2 converges at every point on the boundary of its disc of an z n diverges for z = ζ ∈ C , prove that it diverges for all z ∈ C with |z | > |ζ | . 2 ∞ z (5) For what z ∈ C does n=0 (1+z 2 )n converge? Find a formula for the n th partial sum Sn (z ) . Evaluate limn→∞ Sn (z ) . Is the limit function continuous? ∞ n have radius of convergence rho , and let P (n) be any polyno(6) Let n=0 P (n)an z ∞ n converges and also has ρ as its radius of convermial. Prove that n=0 P (n)an z gence. (By P (n) w e mean P (n) = Ak nk + Ak−1 nk−1 + · · · + A1 n + A0 ). 1.5 Properties of Functions Represented by Power Series Having found that a power series an (z − z0 )n converges in some disc, |z − z0 | < , it is interesting to study the function f (z ) defined by the power series for z in the disc of convergence ∞ an (z − z0 )n , f (z ) = |z − z0 | < ρ. 
n=0 It turns out that functions f (z ) defined by a convergent power series are delightful, as nicely behaved as functions can be. In particular, they are not only continuous, but also automatically have an infinite number of continuous derivatives and many other amazing properties. This section will be devoted to proving the more elementary properties of functions represented by power series, while in the next section we will begin with given functions, like sin x , and see if there is a convergent power series associated with the m, as well as showing a way of obtaining the coefficients an of that power series. The profound theory of functions represented by convergent power series is called analytic functions of a complex variable. 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 49 Definition: A function f (z ) of the complex variable z is said to be analytic in the disc |z − z0 | < ρ if f (z ) can be represented by a convergent power series in that disc: ∞ an (z − z0 )n , f (z ) = |z − z0 | < ρ. n=0 df Since we have not developed the notion of the derivative, dz , of a complex valued function f (z ) of the complex variable z , nor have we considered the corresponding theory of integration, f (z )dz , the scope of our treatment will regrettably have to be narrowed. However our proofs will have the property that as soon as an adequate theory of differentiation and integration is given, the theorems and proofs remain unchanged. Instead of considering power series in the complex variable z , we shall restrict our attention to series in the real variable x ∞ an (x − x0 )n , |x − x0 | < ρ, f (x) = (1-3) n=0 still allowing the coefficients an to be complex. Thus, f (x) is a complex-valued function of the real variable x . The definitions of derivative and integral for such functions were given in Section 7 of Chapter 0. We shall use that material here . Our aim is the following: ∞ n n=0 an x Theorem 1.16 . Suppose that Then has radius of convergence ρ > 0 (possibly ∞ ). (a) the function f (x) defined by ∞ an xn , |x| < ρ, f (x) = n=0 has an infinite number of derivatives; (b) the series ∞ n−1 n=0 nan x has the same radius of convergence ρ and ∞ nan xn−1 , |x| < ρ, f (x) = n=0 and (c) the series ∞ an n+1 n=0 n+1 x has the same radius of convergence ρ , and ∞ x f (t) dt = 0 n=0 an n+1 x , |x| < ρ. n+1 Remark: If we omit f (x) from the picture and write (b) and (c) directly in terms of the infinite sum, we find ∞ ∞ d (b) [ an xn ] = nan xn−1 dx n=0 and x∞ (c) [ 0 n=0 n=0 ∞ an tn ] dt = n=0 an n−1 x . n+1 50 CHAPTER 1. INFINITE SERIES These two statements are usually abbreviated “a power series may be differentiated term by term” and “a power series may be integrated term by term” within their domain of convergence (these statements are not generally true for an arbitrary infinite series of ∞ n is functions un (x) , see Exercise 4 below). The generalization to n=0 an (x − x0 ) obvious. Our proof will be given in several parts. We begin with the Lemma 1. Under the hypothesis of the theorem, f (x) is continuous for all x with |x| < ρ . Proof: (This is a ˜ ˜ little dull). Given any > 0 , we must find a δ > 0 such that |f (x) − f (˜)| < x when |x − x| < δ. ˜ N ∞ n n Let us write fN (x) = n=0 an x and RN (x) = N +1 an x , so that f (x) = fN (x) + RN (x) . Observe that |f (x) − f (˜)| = |fN (x) − fN (˜) + RN (x) − RN (˜)| ≤ |fN (x) − fN (˜)| + x x x x |RN (x)| + |RN (˜)| . 
x We shall show that each of these three terms can be made < 3 by picking x close enough to x and N -which is entirely at our disposal- large enough. ˜ First work with RN (x) and RN (˜) . Choose r such that |x| < r < ρ . This is to x ˜ insure that we stay away from the boundary |x| = ρ where the series may diverge. Then ∞ N n n n=0 |an r | converges absolutely, say to the number S . If we let SN = 0 |an r | , N n we know that by picking N large enough, N +1 |an r | = S − SN < 3 . But |RN (x)| = ∞ ∞ n≤ n | , so that if |x| + ≤ r , by using the same N found above, we N +1 an x N +1 |an x have ∞ |RN (x)| ≤ N +1 |an rn | = S − SN < . 3 Since by the definition of r we know |x| ≤ r , this also proves that for this same ˜ x N |RN (˜)| < 3 . Thus by restricting |x| ≤ r , we have seen that both |RN (x)| and |RN (˜)| x can be made less than 3 . Having fixed N, fN (x) is a polynomial -which we know is continuous. Thus there is a δ, > 0 such that |fN (x) − fN (˜)| < when |x − x| < δ1 . x ˜ 3 This shows that |f (x) − f (˜)| < if x is in the intersection of the intervals |x| ≤ |x| < x ˜ r < ρ ) and |x − x| < δ1 . That there is some interval contained in both of these intervals is ˜ easy to see since both contain all points sufficiently close to x . And the proof is completed. ˜ As you have observed, the proof involves no new ideas but is rather technical. With this lemma proved, we know that f (x) is continuous -and hence integrable. Thus x we can work with 0 f (t) dt . Our next task is to prove a portion of Part (c) of Theorem 16. Lemma 1.17 If ∞ n n=0 an x ∞ n=0 has radius of convergence ρ > 0 , then an n+1 x = n+1 x f (t) dt for all |x| < ρ. 0 Proof: We shall show that N x f (t) dt − 0 n=0 an n+1 x n+1 (1-4) 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 51 can be made arbitrarily small by choosing N large enough. Write ∞ N an tn + f (t) = n=0 an tn . n=N +1 Then since we can integrate any finite sum term by term, we have N x f (t) dt = 0 x an n=0 ∞ x n t dt + N n [ 0 0 an t ] dt = n=1 n=N +1 an n+1 x = n+1 ∞ x [ 0 an tn ] dt, n=N +1 so that (4) reduces to showing that ∞ x an tn dt 0 n=N +1 can be made small by choosing N large. The idea here is to apply Theorem 16 of Chapter 0. This means we need to estimate the size of the above integrand. By now you should recognize the method. Because |x| < ρ , we can choose an r such that |x| < r < ρ . Then an rn is convergent so its terms are bounded, say M ≥ |an rn | for all n , that is, M |an | ≤ rn . Therefore, since |t| < |x| , we find the inequality ∞ ∞ an t n ≤ N +1 ∞ n |an | |t| ≤ N +1 N +1 But the last series is a geometric series whose sum is ∞ an tn ≤ N +1 x r N M |x|n . n r x N M | x| r r −x . Thus M |x| . r−x Applying Theorem 16 of Chapter 0, we find that ∞ x an tn ) dt ≤ ( 0 N +1 x r N M |x|2 . r−x that is, N x f (t) dt − 0 0 an n+1 x x ≤ n+1 4 N M |x|2 . r−x N Since x < 1 , we know that x → 0 as N → ∞ , which completes the proof of the r r lemma. Incidentally, all we have left to prove of part c of the theorem is that the radius of convergence of the integrated series is no larger than ρ (since the lemma shows it is at least ρ ). But this will have to wait until after Lemma 1.18 If an xn has radius of convergence ρ , the series obtained by formally differentiating term by term, nan xn−1 , has the same radius of convergence. 52 CHAPTER 1. INFINITE SERIES Remark: This lemma does not say that the derived series is equal to the derivative of the function defined by the original series. 
It only discusses the radius of convergence, not the relationship of the functions represented b y the two series. Proof: Let ρ1 be the radius of convergence of nan xn−1 . First we show that ρ1 ≤ ρ . If nan xn−1 converges for some fixed x , then so does nan xn . But the terms of this last n since |na xn | ≥ |a xn | . Thus by the comparison sequence are larger than those of an x n n n also converges for that x , which shows ρ ≤ ρ . test an x 1 To show that ρ ≤ ρ1 , assume an xn converges for some x and choose r between |x| and ρ, |x| < r < ρ . As in the proof of Lemma 2 we find that |an | < M r−n . Then the terms n−1 . By nan xn−1 are smaller than the corresponding terms in n M | x| in the series rr the ratio test this last series converges, since |x| < r . Thus the derived series nan xn−1 also converges, showing that ρ ≤ ρ1 and completing the proof of the lemma. Now we can complete the proof of part c of Theorem 16. Corollary 1.19 If an xn has radius of convergence ρ , then the series obtained by foran n+1 mally integrating term by term, also has radius of convergence ρ . n+1 x an n+1 , and we have an xn is the formal derivative of the series Proof: The series n+1 x just seen that these two series have the same radius of convergence. We shall next prove part (b) of Theorem 16 as Lemma 1.20 f (x) ≡ ∞ n n=0 an x has radius of convergence ρ > 0 then df d = [ dx dx ∞ ∞ nan xn−1 , an xn ] = n=0 n=0 and this series also has radius of convergence ρ . Proof: In Lemma 3 we proved that the radii of convergence are the same. What we must prove here is that the derivative of the function is given by the derivative of the series. This is a more or less immediate consequence of Lemma 2, for let us apply this integration lemma to the function g (x) defined by ∞ nan xn−1 , |x| < ρ. g (x) ≡ n=1 Then we find that ∞ x an xn = f (x) − a0 , |x| < ρ. g (t) dt = 0 n=1 By the fundamental theorem of calculus, we can take the derivative of the left side, and it is g (x) . Thus g (x) = f (x), that is, ∞ nan xn−1 = n=1 d f (x). dx This incidentally also proves the otherwise not obvious fact that f (x) , only known to be continuous so far (Lemma 1) is also differentiable. 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 53 To complete the proof of Theorem 16, we must prove Lemma 5. If the power series ∞ n an xn converges for |x| < ρ , then the function f (x) defined by f (x) ≡ n=0 an x has an infinite number of derivatives. The derivatives are represented by the formal series obtained by term-by-term differentiation. Proof: By induction, Lemma 4 shows us that f (x) has one derivative. Assume f (x) has k derivatives. We shall show that is has k + 1 . Let f (k) (x) = bn xn be the series for the k th derivative of f . Applying Lemma 4 to this series we find that f (k) (x) is differentiable. This proves that f has k + 1 derivatives and completes the induction proof. Examples: (a) We know that 1 = 1+t ∞ (−t)n = 1 − t + t2 − t3 + · · · . n=0 where the geometric series converges for |t| < 1 . Applying the theorem, we integrate term by term to find that x ln(1 + x) = 0 1 dt = 1+t or ∞ n=0 (−1)n · xn+1 , |x| < 1, n+1 x2 x3 x4 x5 + − + + ··· . 2 3 4 5 ln(1 + x) = x − Thus the function ln(1 + x) is equal to the power series on the right. With a little more work we can prove that the series, which converges at x = 1 , converges to ln(1 + 1) and obtain the following interesting formula. ln 2 = 1 − 1111 + − + − ··· . 
2345 The power series for ln(1 + x) can be used to illustrate the possibilities of computing with infinite series. If 0 < x < 1 the series for ln(1 + x) is a strictly alternating series to which we can apply inequality (2) of Theorem 12. For this series it reads k 0 < ln(1 + x) − n=0 (−1)n xn+1 xk+2 < , n+1 k+2 x > 0. This inequality states that if only the first k terms of the infinite series are used to compute k+2 ln(1 + x) , the error will be less than x +2 . Say we want to compute ln(1 + 1 ) = k 4 ln 5 to 5 decimal places. Then we want t o choose k so that 4 1 k+2 4 k+2 < 1 = 10−6 1, 000, 000 Cross-multiplying, writing 4 = 22 , we want k such that 106 < (k + 2)22k+4 , since k + 2 ≥ 2, 22k+5 ≤ (k + 2)22k+4 . Thus, we are done if we can find k such that 106 ≤ 22k+5 . 54 CHAPTER 1. INFINITE SERIES But since 210 = 1024 > 103 , we know 220 > 106 . Thus if 2k + 5 ≥ 20 , or k = 8 we will have the desired accuracy. This means that ln 5 1 11 11 = − ( )2 + · · · + ( )8+1 + error 4 4 24 94 where the error is less than 10−6 . From the form of the error estimate, it is clear that the series converges faster if x is smaller. This power series, valid only if |x| < 1 can be used to compute ln(1 + x) if |x| > 1 by utilizing the observation illustrated by 4 1 1 3 ln 6 = 3 ln( ) + 2 ln( ) = 3 ln(1 + ) + 2 ln(1 + ), 2 3 2 3 1 where both ln(1 + 1 ) and ln(1 + 3 ) can be computed using the power series. We should 2 confess that this series converges too slowly to be of much value for that purpose in real life. 1 (b) Since 1+t2 is also the sum of a geometric series 1 = 1 − t2 + t4 − t6 + t8 + · · · = 1 + t2 ∞ (−1)n t2n , |t| < 1, 0 if we integrate term by term, we find 2 tan−1 x = 0 dt + 1 + t2 ∞ n=0 x3 x5 (−1)n x2n+1 =x− + + ··· , 2n + 1 3 5 which converges if |x| < 1 . Further investigation shows that the series also converges at x = 1 and represents the function at that point. This yields the wonderful formula (obtained by letting x = 1 ) 111 π = 1 − + − + ··· 4 357 from which we can compute π to any desired accuracy. Exercises 1 (1) Write down an infinite series whose sum is 1−t and integrate the series term by term to obtain a power series for ln(1 − x) . For what x does the series converge? (2) Find a power series which converges about x = 0 for the function x (1−x)2 by recog- 1 nizing (1−x)2 as the derivative of a function whose power series in known. For what x does the series converge? (3) Compute ln 9 to 4 decimal places, proving the error in your approximation is correct. 8 2 ∞ sin n x (4) Show that converges for all x but the series obtained by differentiating n=1 n2 term-by-term does not converge, say at x = 0 . (5) Exercise your ingenuity and apply the theorems of this section to find the function whose power series is (a) a + 2x2 + 4x4 + 6x6 + 8x8 + · · · + (2n)x2n + · · · . (b) 2 + 3 · 2x + 4 · 3x2 + 5 · 4x3 + · · · + (k + 2)(k + 1)xk + · · · 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 55 6. Taylor’s Theorem. Representation of a Given Function in a Power Series. The Binomial Theorem. In this section we prove Taylor’s Theorem, an important generalization of the mean value theorem, and use it to investigate the questions i) when does a given function f (x) have a power series? and ii) if f (x) has a power series about x0 , f (x) = ∞ an (x − x0 )n , n=0 how can we find the coefficients an ? As a partial answer to i) we know from Theorem 16 of the last section that if f (x) has a power series about x0 , it must necessarily have an infinite number of derivatives at x0 . 
It turns out that this is not enough. Perhaps it is easiest to begin with question ii). Assume f (x) has a power series about x0 , ∞ an (x − x0 )n , f (x) = n=0 which converges for |x − x0 | < ρ . How can we find the coefficients an ? By Theorem 16 we know that f has an infinite number of derivatives at x0 . Moreover these derivatives can be calculated by differentiating the power series term-by-term. F or convenience we let x0 = 0 . f (x) = a0 + a1 x + a2 x2 + a3 x3 + · · · + an xn + · · · , f (x) = a1 + 2a2 x + 3a3 x2 + · · · + nan xn−1 + · · · , f x(x) = 2a2 + 2 · 3a3 x + 3 · 4 · a4 x2 + · · · + n(n − 1)an xn−2 + · · · , f (3) (x) = 2 · 3a3 + 2 · 3 · 4a4 x + 3 · 4 · 5a5 x2 + · · · +, f (n) (x) = n!an + (n + 1)!an+1 x + (n + 2) !an+2 x2 + · · · . x By letting x = 0 in each line, we find a0 = f (0), a1 = f (0), a2 = f (0) f (n) (0) , . . . , an = . 2 n! This proves Theorem 1.21 If f (x) = an (x − x0 )n has a convergent power series representation about x0 , then the coefficients an are equal to f (n) (x0 )/n! , so in fact ∞ f (x) = n=0 f (n) (x0 ) (x − x0 )n . n! (1-5) This formula (1.21) completely solves the problem of finding the coefficients an of a function if that function has a power series. A simple consequence is the Corollary 1.22 A function f (x) has at most one convergent Taylor series about a point x0 . Proof: By the above theorem, if f (x) = f (n) (x an (x − x0 )n and f (x) = bn (x − x0 )n , then an = n! 0 ) = bn , so the power series are identical. Remark: When f has a power series expansion about x0 , the series is usually called the Taylor series of f at x0 . In the special case x0 = 0 , the series is sometimes called the Maclaurin series for f . 56 CHAPTER 1. INFINITE SERIES Examples: n d (a) If f (x) = ex has a power series about x = 0 , what is it? Since f (n) (0) = dxn ex x=0 = ∞ 1n e0 = 1 , we know that an − 1/n! so that the power series is n=0 n! x . We cannot ∞ 1n yet write ex = n=0 n! x since we have not proved that ex does have a power series. (b) If f (x) = cos x has a power series about x = 0 , what is it? f (0) = 1, f (0) = − sin 0 = 0, f (0) = − cos 0 = −1, f (0) = sin 0 = 0, f (4) (0) = cos 0 = 1, . . . . All the odd derivatives at 0 are zero while the even derivatives alternate between +1 and −1 . Therefore the series is a− 1 1 12 x + x4 − x6 + · · · 2! 4! 6! ∞ n=0 (−1)n x2n . (2n)! Again we cannot yet claim that this is cos x . 1 (c) If f (x) = e− x2 0 , x=0 , x=0 . has a power series about x = 0 what is it? The computation is somewhat more difficult here. f (x) = 1 4 )e− x2 x6 1 1 2 − x2 , e x3 6 f (x) = (− x4 − n− 3n , and generally f (n) (x) = ( α3n + · · · + α2n+22 )e− x2 where the αk are real x x numbers we don’t need to find. If we let x = 0 in f (n) (x) , the resulting expression has the indeterminate form ∞ · 0 . Thus l’Hˆspital’s rule must be invoked. Now o 2 2 −1/x e−1/x , k > 0 . What is limx→0 e xk ? Let xk k/2 limt→0 tk/2 e−t = limt→∞ t et . If k is an even integer, times leaves a constant in the numerator and et in the f (n) (x) is the sum of terms of the form 1 t = x2 , and we must evaluate applying l’Hˆspital’s rule k/2 o denominator, so the limit is limt→∞ const = 0 . If k is odd, applying l’Hˆspital’s rule o et √ (k + 1)/2 times leaves a function of the form const , which also tends to 0 as t → ∞ . tet What we have just shown is that f (n) (0) = 0 . The power series associated with e−1/x 2 is 0+0·x+ 02 0 x + · · · xn + · · · ≡ 0. 2! n! 
2 This function e−1/x , whose power series about x = 0 is zero, is an example of a function which is clearly not equal to the power series, 0, associated with it. To find if a given function has a power series expansion about x0 we turn to Taylor’s Theorem (also known as the extended mean value theorem). Now if a function f defined in a neighborhood of x0 has a power series expansion there, we know the series is given by (5). Thus we should investigate N RN (x) ≡ f (x) − n=0 f (n) (x0 ) (x − x0 )n . n! To say that f is equal to its series expansion is the same as saying that the remainder, RN (x) , becomes arbitrarily small as N → ∞ . We must now seek an estimate of this remainder RN (x) . Taylor’s theorem is one way of finding an estimate. 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 57 Theorem 1.23 . (Taylor’s Theorem). Let f be a real-valued function with N + 1 continuous derivatives defined on an interval containing x0 and x . There exists a number ζ between x0 and x such that f (x) = f (x) + f (x0 )(x − x0 ) + + f (x0 ) f (x0 ) (x − x0 )2 + (x − x0 )3 + · · · 2! 3! f ( N + 1)(ζ ) f (N ) (x0 ) (x − x0 )N + (x − x0 )N +1 . N! (N + 1)! (1-6) In other words, RN (x) = f (N +1) (ζ ) (x − x0 )N +1 . (N + 1)! (1-7) Remark: 1 The proof will only tell us that such a ζ exists but will give us no way to find it. In practice we often try to find some upper bound M for f ( N + 1)(ζ ) , so f ( N + 1)(ζ ) ≤ M , for all N , and only use the crude resulting estimate |RN (x)| ≤ M |x − x0 |N +1 . (N + 1)! (1-8) An example of this is the series for cos x . Assuming the proof of the theorem, we know that (see Example b above) about x0 = 0 , cos x = 1 − where RN (x) = Since x2 x4 (−1)N 2N + + ··· + x + RN (x), 2! 4! (2N )! 1 d2N +2 [ 2N +2 cos x]x=ζ x2N +2 , ζ (0, x). (2N + 2)! dx d2N +2 cos x dx2N +2 we find that |RN (x)| ≤ ≤ 1, x= ζ 1 |x|2N +2 (2N + 2)! Because, for fixed x , this remainder tends to 0 as n → ∞ , we have proved that the power series for cos x at x0 = 0 does converge to cos x , so in the limit ∞ cos x = n=0 (−1)n 2n x. (2n)! We can apply Theorem 16 and differentiate both sides of this to find the series for sin x . Remark: 2 Observe that Taylor’s Theorem is only proved for real-valued functions f . It is not true if f is complex-valued. However using it we will be able to prove the inequality (7) for complex-valued f . Proof: (Taylor’s Theorem). Our proof is short—perhaps a little too slick. The trick is to appeal to the mean value theorem (really only Rolle’s theorem is used). 58 CHAPTER 1. INFINITE SERIES Fix x and define the real number A by N f (x) = n=0 (x − x0 )N +1 f ( n)(x0 ) (x − x0 )n + A . n! (N + 1)! (1-9) Now let H (t) := f (x) − f (t) + f (t)(x − t) + f (t)(x − t)2 f (N ) (t) (x − t)N +1 +···+ (x − t)N − A . 2! N! (N + 1)! Thus we are letting x0 vary, not x . Observe that H (x) = 0 (obviously) and H (x0 ) = 0 (by definition of A). Since H (t) satisfies the hypotheses of the mean value theorem, we conclude that there is some ζ between x0 and x such that H (ζ ) = 0 . But H (t) = −f (t) − f (t)(x − t) − f (t) − · · · − −A f (N +1) (t) f (N ) (t) (x − t)N − (x − t)N −1 N! (N − 1)! (x − t)N (x − t)N A − f (N +1) (t) . = N! N! Amazingly, almost all the terms canceled. Since H (ζ ) = 0 and ζ = x , we now know that A = f (N +1) (ζ ) . Substitution of this value of A into (8) gives us exactly (6), which is just what we wanted to prove. As an application let us prove the Binomial Theorem. That is the name given to the Maclaurin series for (1 + x)α , where α ∈ R . 
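Before carrying out the computation for (1 + x)^α, the crude estimate (1-8) can be checked numerically in the cos x example above. The following sketch is my own illustration, not part of the notes; the choices x = 1 and N = 3 are arbitrary. It compares the actual error of the partial sum of the cosine series with the bound |x|^(2N+2)/(2N+2)!.

```python
# Sketch: check the remainder bound for the cos x series at x = 1, N = 3.
# The bound used is |R_N(x)| <= |x|**(2N+2) / (2N+2)!, as in the text.
from math import cos, factorial

x, N = 1.0, 3
partial = sum((-1)**n * x**(2*n) / factorial(2*n) for n in range(N + 1))
error = abs(cos(x) - partial)
bound = abs(x)**(2*N + 2) / factorial(2*N + 2)
print(error, bound)          # roughly 2.45e-05 versus 2.48e-05
assert error <= bound        # the crude Taylor estimate holds here
```

With only four terms the error is already below 2.5 · 10^-5, in line with the estimate.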
The derivatives are easy to compute. f (x) = (1 + x)α f (x) = α(1 + x)α−1 f (x) = α(α − 1)(1 + x)α−2 ... f (n) = α(α − 1) · · · .(α − n + 1)(1 + x)α−n . Thus the power series about 0 associated formally with (1 + x)α is ∞ n=0 α(α − 1) · · · (α − n + 1) n x. n! By the ratio test this series converges for |x| < 1 . Does it converge to (1 + x)α when |x| < 1 ? If α is a positive integer, α = N , the terms in the power series from n = N + 1 on all are zero since they contain the factor (N − N ) . In this case we have only a finite series so convergence is trivial. The resulting polynomial is the familiar Binomial Theorem of high school algebra. Let us therefore assume α is not a positive integer (or 0). Then we have an honest infinite series. In order to prove that (1 + x)α is equal to the infinite series, we must show that the remainder N RN (x) ≡ (a + x)α − n=0 α(α − 1) · · · (α − n + 1) n x n! 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 59 tends to zero as N → ∞ . By Taylor’s Theorem RN (x) = α(α − 1) · · · (α − N ) (1 + ζ )α−N −1 xN +1 , (N + 1)! where ζ is between 0 and x . We shall prove that this tends to 0 as N → ∞ only when 0 ≤ x < 1 . It is also true for −1 < x ≤ 0 , but the proof is much longer so we will not give it [however a different attack yields the proof easily]. Now if 0 ≤ x < 1 , since 0 < ζ < x , then 1 < 1 + ζ . Therefore for N ≥ α , we have (z + ζ )α−N −1 < 1 . Thus |RN (x)| < α(α − 1) · · · (α − N ) N +1 x (N + 1)! which does tend to zero as N → ∞ (since it is the N + 1 st term of the convergent series ∞ α(α−1)···(α−n+1) n x , |x| < 1) . n! Although we have proved it only if 0 ≤ x < 1 , we shall state the complete Theorem 1.24 (Binomial Theorem). The function (1 + x)α is equal to a power series which converges for |x| < 1 . It is ∞ (1 + x)α = n=0 α(α − 1) · · · (α − n + 1) . n! (1-10) In practice it is silly to memorize this formula since it is easier to expand (1 + x)α directly in a Maclaurin series, which we have just shown (partly anyway) is equal to the function. We close this section with the generalization of Taylor’s Theorem to complex-valued function f (x) . Theorem 1.25 . Let f (x) = u(x) + iv (x) be a complex-valued function with N + 1 continuous derivatives defined on an interval containing x0 and x . There exists a real number MN depending on N such that N f (x) − n=0 f (n) (x0 ) MN (x − x0 )n ≤ |x − x0 |N +1 n! (N + 1)! (1-11) Proof: Since f has N + 1 continuous derivatives, so do the real-valued functions u(x) and v (x) . Applying Taylor’s Theorem to u and v , we find numbers ζ1 and ζ2 , both between x0 and x , such that N u(x) − n=0 u(n) (x0 ) u(N +1) (ζ1 ) (x − x0 )n = (x − x0 )N +1 , n! (N + 1)! and N v (x) − n=0 v (n) (x0 ) v (N +1) (ζ2 ) (x − x0 )n = (x − x0 )N +1 . n! (N + 1)! 60 CHAPTER 1. INFINITE SERIES Thus, by addition, since f (n) = u(n) = iv (n) , we find N f (x) − n=0 u(N +1) (ζ1 ) + iv (N +1) (ζ2 ) f (n) (x0 ) (x − x0 )n = (x − x0 )N +1 . n! (N + 1)! However since u(N +1) and v (N +1) are assumed continuous in an interval containing ˆ ˜ x0 and x , they are bounded there, say by MN and MN . Taking absolute values of the ˆ ˜ last equation, we obtain equation (10) where MN = M 2 + M 2 . N N Exercises (1) Find the Taylor series about the specified point x0 and determine the interval of convergence for the following functions. You need not prove that the series do converge to the functions. 
(a) (b) (c) (d) (e) (f) (g) (h) sin x, x0 = 0, ln x, x0 = 1, 1 x , x0 = −1, √ x, x0 = 6, 1x −x 2 (e + e ), x0 = 0 x+ i 1+x , x0 = 0, cos x, x0 = π , 4 1 i+x , x0 = 0 2 (i) e−x , x0 = 0, (j) (1 + x + x2 )−1 , x0 = 0, (k) cos x + i sin x, x0 = 0, 1 (l) √1+2x , x0 = 0. (2) Prove that in their interval of convergence about 0 the following power series associated with the given functions converge to the functions. Do this by proving that the remainder |RN (x)| → 0 as N → ∞ . (a) sin x, 1 (b) 1+x4 , (c) e−x (d) cosh x [Recall the definition: cosh x = ex +e−x 2 . 2 1 (3) One often approximates √1+x2 by 1 − x when |x| is small. Give some estimate of 2 the error if a) |x| + < 10−1 , b) |x| < 10−2 , c) |x| < 10−4 . (4) Use the Taylor series 2 e−x = 1 − x2 + 1 2 x4 x6 (−1)n x2n − + ··· + + ··· . 2! 3! n! to evaluate 0 e−x dx to three decimal places. I suggest using Theorem 16 and the error estimate of Theorem 12. 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 61 (5) Assume the ordinary differential equation y − y = 0 , with y (0) = 1 has a power series solution y (x) = ∞ an xn about x = 0 . a). Substitute this series directly n=0 into the differential equation and solve for the coefficients an . b). Find when the series converges; c). justify (a posteriori) the fact that the function defined by the convergent series does satisfy the differential equation. [We do not yet know that this is the only solution. All we know is that it is the only solution which has a power series]. (6) In this exercise you will prove that e is irrational. It all hinges on the series for 3 . e=1+1+ 1 1 1 + + ··· + + ··· . 2 3! n! (a) Prove that 2 < e < 3 , so e is not an integer (cf. page 58, bottom). (b) Assume e is rational, e = p , where p and q are integers with no common q factor and q ≥ 2 . Then use the Taylor series with q terms and the remainder Rq to show that e · q ! = N + qeζ , where 0 < ζ < 1 , and N is an integer. +1 ζ (c) From this deduce that qe must be an integer, and show that this contradicts +1 eζ < e < 3 , and q + 1 ≥ 3 . (7) This exercise generalizes the form of the remainder (6’) in Taylor’s Theorem. Fix x and define the number B by N f (x) = n=0 f (n) (x0 ) (x − x0 )n + B (x − x0 )α , α ≥ 1. n! Then consider the function H (t) defined by N H (t) ≡ f (x) − n=0 f (n) (t) (x − t)n − B (x − t)α . n! Show that there is a ζ between x0 and x such that B= f (N +1) (ζ ) (x − ζ )N +1−α , αN ! so that f (N +1) (ζ ) (x − x0 )α (x − ζ )N +1−α . αN ! This is Schlomilch’s form of the remainder. In the special case α = N + 1 , we obtain Lagrange’s form of the remainder, (6) found previously, while for α = 1 we obtain Cauchy’s form of the remainder RN = RN = f (N +1) (ζ ) (x − x0 )(x − ζ )N . N! Here are two applications of Taylor’s Theorem to problems other than infinite series. The first one deals with max-min. Let f (x) be a sufficiently smooth function (by which we mean f has plenty of derivatives—we’ll specify the number later). Now we know that 62 CHAPTER 1. INFINITE SERIES if f has a local maximum or minimum at x0 , then f (x0 ) = 0 , and it is a maximum if f (x0 ) < 0 , a minimum if f (x0 ) > 0 . But what if f (x0 ) = 0 ? Consider the examples f1 (x) = x4 , f2 (x) = −x4 , f3 (x) = x3 , the first of which has a minimum at x = 0 , the second a maximum at x = 0 , while the third has neither. These three examples suggest the criterion will depend upon the lowest non-zero derivative being an even or odd derivative, and on its sign. 
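As a quick illustration of that suggestion (a sketch of my own, not part of the notes; it uses sympy and takes x0 = 0), one can compute successive derivatives of the three examples symbolically and report the order and sign of the first one that does not vanish.

```python
# Sketch: for each example find the order of the first nonvanishing derivative
# at x0 = 0 and apply the suggested even/odd criterion.
import sympy as sp

x = sp.symbols('x')
for f in (x**4, -x**4, x**3):
    n = 1
    while sp.diff(f, x, n).subs(x, 0) == 0:
        n += 1
    value = sp.diff(f, x, n).subs(x, 0)
    if n % 2 == 1:                       # first nonzero derivative of odd order
        verdict = "neither max nor min"
    else:                                # even order: the sign decides
        verdict = "local min" if value > 0 else "local max"
    print(f, "-> order", n, ", value", value, ",", verdict)
```

The output classifies x^4 as a minimum, -x^4 as a maximum, and x^3 as neither, exactly as claimed above; the theorem below makes the criterion precise.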
a figure goes here By the definition of local maximum and minimum, the issue is the behavior of f (x) in a neighborhood of x0 , that is, the nature of f (x0 + h) for |h| small. We remind you that f has a local max at x0 if f (x0 + h) − f (x0 ) ≤ 0 for all |h| sufficiently small, and a local min at x0 if f (x0 + h) − f (x0 ) ≥ 0 for all |h| sufficiently small. Since the behavior of f (x) near x0 is determined by the Taylor polynomial f (x0 + h) = f (x0 ) + f (x0 )h + f (n) f (n+1) (ζ )hn+1 f (x0 )h2 + ··· + (x0 )hn + 2! n! (n + 1)! where ζ is between x0 and x0 + h , it is natural to look at this polynomial to answer our question. Theorem 1.26 Assume f has (at least) n + 1 continuous derivatives in some interval containing x0 . Say f (x0 ) = f (x0 ) = . . . = f n (x0 ) = 0 but f (n+1) (x0 ) = 0 , then (a) if n is even, then f has neither a max nor min at x0 . (b) if n is odd, then i) f has a max at x0 if f (n+1) (x0 ) < 0. ii) f has a min at x0 if f (n+1) (x0 ) > 0. Proof: We shall use Taylor’s polynomial with n + 1 terms. Since the first n derivatives vanish at x0 , we have f (x0 + h) − f (x0 ) = f (n+1) (ζ ) n+1 ,ζ (n+1)! h (n+1) (ζ ) must f between x0 and x0 + h . Because f (n+1) (x) is assumed continuous at x0 , have the same sign as f (n+1) (x0 ) in some neighborhood of x0 . Restrict your attention to the neighborhood. If n is even, n + 1 is odd, so that hn+1 is positive if h > 0 , negative if h < 0 . Thus f (x0 + h) − f (x0 ) changes sign in any neighborhood of x0 . However if n is odd, hn+1 is positive no matter if h > 0 or h < 0 . Therefore f (x0 + h) − f (x0 ) has the same sign as f (n+1) (x0 ) throughout some neighborhood about x0 . The precise conditions are easy to verify now. Examples: 1. f (x) = x5 + 1 has neither a max nor min at x = 0 , since f (0) = . . . = f (4) (0) = 0 , but f (5) (0) = 5! = 0 . 2. f (x) = (x − 1)6 − 7 has a min at x = 1 since f (1) = . . . = f (5) (1) = 0 , but f (6) (1) = 6! > 0 . Our second application is a geometrical interpretation of the Taylor polynomial. Given the function f (x) , consider the polynomial Pn (x) = f (x0 ) + f (x0 )(x − x0 ) + f (x0 ) f (n) (x0 ) (x − x0 ) + · · · + (x − x0 )n , 2! n! 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 63 whose first n derivatives agree with those of f at x = x0 . P1 (x) = f (x0 ) + f (x0 )(x − x0 ) is the equation of the tangent to the curve y = f (x) at x0 . It is the straight line which most closely approximates the curve at x0 . Similarly P2 (x) is the parabola which most closely approximates the curve at x0 . Generally, Pn (x) is the polynomial of degree n which most closely approximates the curve y = f (x) at the point x0 . Using this Taylor polynomial, we can define the order of contact of two curves at a point. Definition: The two curves y = f (x) and y = g (x) have order of contact n at the point x0 if their Taylor polynomials of degree n at x0 are identical, but their n + 1 st Taylor polynomials differ. An equivalent definition is that f (x0 ) = g (x0 ) , f (x0 ) = g (x0 ) , . . . , f (n) (x0 ) = (n) (x ) , but f (n+1) (x ) = g (n+1) (x ) . We have assumed that f and g have n + 1 g 0 0 0 continuous derivatives. If f and g have contact n at x0 , then f (x0 + h) − g (x0 + h) = f (n+1) (ζ1 ) − g (n+1) (ζ2 ) n+1 h . (n + 1)! One interesting consequence of this formula is that if f and g have contact of even order, then the curves will cross at x0 , while if the contact is of odd order, the curves will not cross in some neighborhood of x0 . 
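The definition of order of contact is easy to apply mechanically. Here is a short sympy sketch (the particular curves cos x and 1 − x^2/2 are my own choice of example, not from the notes); it finds the largest n for which all derivatives through order n agree at x0.

```python
# Sketch: order of contact of f(x) = cos x and g(x) = 1 - x**2/2 at x0 = 0,
# computed by comparing successive derivatives, per the equivalent definition.
import sympy as sp

x = sp.symbols('x')
f, g = sp.cos(x), 1 - x**2/2
x0 = 0

assert f.subs(x, x0) == g.subs(x, x0)          # contact of order at least 0
n = 0
while sp.diff(f, x, n + 1).subs(x, x0) == sp.diff(g, x, n + 1).subs(x, x0):
    n += 1                                     # derivatives agree through order n
print("order of contact:", n)                  # prints 3 for this pair
```

Since the contact here is of odd order (three), the remark above says the two curves do not cross near x0 = 0; indeed cos x − (1 − x^2/2) = x^4/24 − · · · is nonnegative near 0.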
We can define the curvature of a curve in the plane by using the concept of contact. First we define the curvature of a circle (whose curvature had better be constant). 1 1 Definition: The curvature k of a circle of radius R is defined to be R , k = R . Thus the smaller the circle, the larger the curvature—a natural outcome. Furthermore, a straight line—which may be thought of as a circle with infinite radius—has curvature zero. How can we define the curvature of a given curve? For all non-circles, the curvature will clearly vary from point to point of the curve. Thus, the concept we want is the curvature of a given curve y = f (x) at a point x0 . Our definition should appear reasonable. Definition: The curvature k of a plane curve y = f (x) at the point x0 is the curvature of the circle which has contact of order two at x0 . This circle which has contact of order two is called the osculating circle to the curve at x0 (osculate: Latin, to kiss). Let us convince ourselves that there is only one osculating circle (for if there were two, the curvature would not b e well defined.) Consider all circles of contact one to f (x) at x0 . These are all circles tangent to f (x) at x0 . Their centers lie on the line l normal to the curve at x0 (“normal” means perpendicular to the tangent line). It is geometrically clear that of these circles with contact 1, there will be exactly one with contact 2. Example: Find the curvature of y = ex at x = 0 . The slope of the curve at (0, 1) is 1. Therefore the equation of the normal is y − 1 = −x . Since the center (x0 , y0 ) of the osculating circle must lie on this line, and the circle contains the point (0, 1) , subject to y0 = 1 − x0 , the value of x0 must be determined from the fact that the second derivative of the circle (0, 1) must equal the second derivative of y = ex at x = 0 , that is, it must equal 1. But for any circle, (y − y0 )y + y 2 + 1 = 0 . In our case y = 1 at (0, 1) (recall the circle is tangent to ex at (0, 1) ), so that (1 − y0 ) · 1 + 1 + 1 = 0 , or y0 = 3 . The equation y0 = 1 − x0 implies that x0 = −2 . Thus the equation of the osculating circle is 1 (y − 3)2 + (x + 2)2 = 8 , and the curvature of y = ex at x = 0 is k = √8 . Later on we will give another definition of curvature which is applicable not only to plane curves, but also to curves in space. 64 CHAPTER 1. INFINITE SERIES Exercises (1) What is the order of contact of the curves y = e−x and y = 1 1+x + 1 sin2 x at x = 0 ? 2 (2) Find the osculating circle and curvature for the curve y = x2 at x = 1 . (3) Show that at x = a , the curve y = f (x) has curvature k = f (a ) 3 [1+f (a)2 ] 2 f of the osculating circle is at the point (a − f (a) [1 + f (a)2 ], f (a) + (a ) is the messy equation of the osculating circle? and the center 1+f (a)2 f (a) ). What (4) At the given points, the following curves have slope zero. Determine if the curve has a max, min, or neither there. (a). y = (x + 1)4 , x = −1, (b). y = x2 sin x, x = 0. (5) Let P1 , P , and P2 be three distinct points on the curve y = f (x) , and consider the circle passing through those three points. Show that in the limit as both P1 and P2 approach P , this circle becomes the osculating circle. (Hint: Taylor’s Theorem will be needed here). (6) In this problem we outline another derivation of Taylor’s Theorem. Whereas the one in the notes did not use the fact the f (n+1) was continuous, this proof relies upon that fact. (a) Show that x x0 (x − t)k−1 (k) (x − x0 )k f (t) dt = f (k) x0 + (k − 1)! k! x x0 (x − t)k (k+1) f (t) dt. k! 
(b) Prove by induction that f (x) = f (x0 ) + f (x0 )(x − x0 ) + · · · + f (n) (x0 ) (x − x0 )n + n! x x0 (x − t)n (n+1) f (t) dt. n! The remainder is expressed as an integral here. It is because f (n+1) is to be integrated that we require its continuity. (7) (a) Let g (x) have contact of order n with the function 0 at the point x = a , and assume that f (x) has contact of order at least n with the function 0 at x = a . Use Taylor’s Theorem to prove that f (x) f (n+1) (a) = (n+1) x→a g (x) g (a) lim This is l’Hˆspital’s Rule. o (b) Apply l’Hˆspital’s rule to evaluate o i) lim x→0 x − sin x 1 − tan θ , ii) lim x3 θ− π θ→ π 4 v 1.6. COMPLEX-VALUED FUNCTIONS, E Z , COS Z, SIN Z . 65 (8) Assume f has two derivatives in the interval [a, b] , and assume that f ≥ 0 throughout the interval. Prove that if ζ is any point in [a, b] , then the curve y = f (x) never falls below its tangent at the point x = ζ, y = f (ζ ) . [hint: Use Taylor’s Theorem with three terms]. (9) Use Cauchy’s form of the remainder (p. 103-4, no. 7) for Taylor’s Theorem to prove that the binomial series converges to (1 + x)α for −1 < x ≤ 0 . This will complete the proof of the binomial theorem. n 1d (10) The nth Legendre polynomial Pn (x) is defined by Pn (x) = 2n n! dxn [(x2 − 1)n ] . Prove that Pn (x) is a polynomial of degree n and has n distinct real zeros in the interval (−1, 1) . (11) Verify that eax is a solution of y = ay . Prove that every solution has the form Aeax , where A is a constant. (12) Assume that f (x) has plenty of derivatives in the interval [a, b] , and that f has n + 1 distinct zeros in the interval. Prove that there is at least one c ∈ (a, b) such that f (n) (c) = 0 . 1.6 Complex-Valued Functions, ez , cos z, sin z . The task of this section is to answer the following question. Say f (x) is a real or complex valued function of the real variable x . How can we define f (z ) where z is complex ? For example, if P (x) = a0 + a1 x + · · · + an xn is a polynomial, the answer is easily given: just define P (z ) = a0 + a1 z + · · · + an z n . Since this function only involves addition and multiplication of complex numbers, for any complex z the number P (z ) can be computed. P Similarly any rational function, Q(x) , where P (x) and Q(x) are both polynomials, can be ( x) P defined for complex z as Q(z ) since both P (z ) and Q(z ) are defined separately and we (z ) can then take their quotient. But how do we define ez , or cos z , or (1 + z )α , where α R is not a positive integer? As might have been suspected, the trick is to use infinite series. Definition: If f (x), x R , has a convergent Taylor series, ∞ an xn , f (x) = |x| < ρ, n=0 then we define f (z ), z C , by the infinite series ∞ an z n , f (z ) = n=0 and the infinite series converges throughout the disc |z | < ρ . The assertion that the complex series converges throughout the disc |z | < ρ is an immediate consequence of Theorem 13 on page ?. Thus, for example, we define. ∞ E (z ) = n=0 1n z, n! 66 CHAPTER 1. INFINITE SERIES ∞ C (z ) = n=0 ∞ S (z ) = n=0 and ∞ (1 + z )α = n=0 (−1)n z 2n , (2n)! (−1)n z 2n+1 , (2n + 1)! α(α − 1) · · · (α − n + 1) n z, n! αR where the first three series converge for all z C , while the last converge for |z | < 1 . We have temporarily used the notation E (z ) in place of ez , C (z ) for cos z , and S (z ) for sin z so that you do not jump to hasty conclusion s about these functions by merely extrapolating your knowledge of ex etc. 
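Because everything is defined by series, values of E, C, and S at complex arguments can be computed directly from partial sums. The following numerical sketch is my own; the truncation level N and the sample point z = 2i are arbitrary choices, not from the notes.

```python
# Sketch: evaluate C(z) and S(z) straight from their defining power series.
from math import factorial

def C(z, N=30):
    return sum((-1)**n * z**(2*n) / factorial(2*n) for n in range(N))

def S(z, N=30):
    return sum((-1)**n * z**(2*n + 1) / factorial(2*n + 1) for n in range(N))

z = 2j
print(abs(C(z)), abs(S(z)))   # about 3.762 and 3.627 -- both exceed 1
```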
For example it is not true that |sin z | ≤ 1 for all z C , even though |sin x| ≤ 1 for all x R . All properties of these function s for z C must be proved again beginning with the power series definitions. Known properties of ex , x R and wishful thinking don’t prove properties of ez , z C . Let us begin by proving Theorem 1.27 . (a) E (iz ) = C (z ) + iS (z ), for all z ∈ C . (b) E (−iz ) = C (z ) − iS (z ) , for all z ∈ C . 1 (c) C (z ) = 2 [E (iz ) + E (−iz )] , for all z ∈ C . (d) S (z ) = 1 2i [E (iz ) − E (−iz )] , for all z ∈ C . Proof: a). b). Just substitute and rearrange the series. For example C (z ) = 1 − z 2 x4 x6 + − + ··· . 2! 4! 6! iS (z ) = i[z − z3 z5 z7 + − + ···] 3! 5! 7! so z2 z3 z4 z5 −i + + i − ··· , 2! 3! 4! 5! where the adding of the two series is justified by Theorem 5(page ?). We must compare the last series with that for E (iz ) : C (z ) + iS (z ) = 1 + iz − E (iz ) = 1 + iz + (iz )2 (iz )3 (iz )4 z2 z3 z4 + + + · · · = 1 + iz − −i + + ··· , 2! 3! 4! 2! 3! 4! which is identical to the series for C (z ) + iS (z ) . c)-d). These follow by elementary algebra from a) and b). The formulas a)-d) of Theorem 21 show there is a close connection between the four functions E (iz ), E (−iz ), C (z ), and S (z ) . Our next theorem shows that the formula ex ey = ex+y , x, y ∈ R , extends to the function E (z ) . Theorem 1.28 . E (z )E (w) = E (z + w) , for all z, w ∈ C . 1.6. COMPLEX-VALUED FUNCTIONS, E Z , COS Z, SIN Z . 67 Proof: We must show that ∞ ( n=0 zn )( n! ∞ n=0 wn )= n! ∞ n=0 (z + w)n , n! The product of the two series is defined in Theorem 15. Using that definition, we find that ∞ ( n=0 zn )( n! ∞ n=0 wn )= n! ∞ n ( n=0 k=0 z k wn−k ). k ! (n − k )! However, the binomial theorem for positive integer exponents (which only uses the algebraic rules for complex numbers) states that n n (z + w) = k=0 n! z k wn−k . k !(n − k )! Upon substituting this into the last equation, we obtain the desired formula. The formula of this theorem is the key to many results, like the following generalization of sin2 x + cos2 x = 1 . Corollary 1.29 C (z )2 + S (z )2 = 1 for all z ∈ C . Proof: We use equations a) and b) of Theorem 21 to reduce the question to one of exponentials. E (iz )E (−iz ) = [C (z ) + iS (z )][C (z ) − iS (z )] = C 2 (z ) + S 2 (z ). But by Theorem 22, E (iz )E (−iz ) = E (iz − iz ) = E (0) . Directly from the power series we see that E (0) = 1 . This proves the formula. Our next corollary states that the addition formulas for sin x and cos x are still valid for C (z ) and S (z ) . Corollary 1.30 C (z + w) = C (z )C (w) − S (z )S (w) and S (z + w) = S (z )C (w) − S (w)C (z ) for all z, w ∈ C Proof: A direct algebraic computation does the job. C (z + w) + iS (z + w) = E (iz + iw) = E (iz )E (iw) = [C (z ) + iS (z )][C (w) + iS (w)] = [C (z )C (w) − S (z )S (w)] + i[S (z )C (w) + S (w)C (z )]. Similarly we find that C (z + w) − iS (z + w) = [C (z )C (w) − S (z )S (w)] − i[S (z )C (w) + S (w)C (z )]. Addition of these two equations gives the formula for C (z + w) , while subtraction gives the formula for S (z + w) . Had we but world enough, and time, we would linger a while. A lovely result we have not proved is that E (z + 2πi) = E (z ) , the periodicity of E (z ) , which is a consequence of 68 CHAPTER 1. INFINITE SERIES the formulas C (z + 2π ) = C (z ) , and S (z + 2π ) = S (z ) , the periodicity of C (z ) and S (z ) , by using Theorem 21 (but see pp. ??). We shall close this chapter by restating the results proved above in the usual language of ez etc. 
instead of the temporary notation E (z ) etc. we have been using. eiz = cos z + i sin z (1-12) −iz = cos z − i sin z (1-13) 1 cos z = 2 (eiz + e−iz ) 1 sin z = 2i (eiz − e−iz ) zw z +w (1-14) e (1-15) e e =e (1-16) 2 sin z + cos z = 1 (1-17) cos(z + w) = cos z cos w − sin z sin w (1-18) sin(z + w) = sin z cos w + sin w cos z (1-19) 2 Generally, all algebraic formulas for sin x, cos x , and ex remain valid for sin z, cos z , and ez . In fact any algebraic relationship between any combination of analytic functions remains valid as we change the in dependent variable from a real x to the complex z . Inequalities almost always fall apart in the transition from x ∈ R to z ∈ C . Exercise 2e below illustrates this. One formula which we will use frequently later on is a specialization of (1-12) to the case when z is real. Then writing the real z as θ we have the famous formula eiθ = cos θ + i sin θ, θ ∈ R. (1-20) We cannot resist stating this formula down again for θ = π : eiπ = −1, an almost mystical identity connecting the four numbers e, i π , and −1 . Notice that (1.6) also implies eiθ = 1 . If we write z = x + iy , then using (1.6) we find ez = ex+iy = ex eiy = ex (cos y + i sin y ). (1-21) |ez | = ex (1-22) A consequence of this is Exercises (1) Observe that (directly from the power series) cos(−z ) = cos z, and sin(−z ) = − sin z. Use this and the addition formula for cos(z + w) to prove that sin2 z + cos2 z = 1 . 1 1 (2) If we define sin hx = 2 (ex − e−x ) and cos hx = 2 (ex + e−x ), x ∈ R , we prove that 1.6. COMPLEX-VALUED FUNCTIONS, E Z , COS Z, SIN Z . 69 (a) cos ix = cos h x, sin ix = i sin h x (b) cos z = cos h y − i sin x sin h y, (z = x + iy ) sin z = sin x cos h y + i cos x sin h y (c) |cos z |2 = cos2 x + sin h2 y |cos z |2 = cos h2 y − sin2 x |sin z |2 = sin2 x + sin h2 y |sin z |2 = cos h2 y − cos2 x (d) Use the identities of part c) to deduce that |sin h y | ≤ |cos z | ≤ cos h y |sin h y | ≤ |sin z | ≤ cos h y (e) Prove that there is some z ∈ C such that |sinz | > 1, and |cos z | > 1. (3) Define the derivative of f (z ) at z0 , where z, z0 ∈ C , as lim z →z0 f (z ) − f (z0 ) , z − z0 if the limit exists. (a) By working directly with the power series, show that ez is differentiable for all z , and that d az e = aeaz , a, z ∈ C, dz (b) Apply this to (1-12) and (1-13) to deduce that d d cos z = − sin z, sin z = cos z dz dz (We cannot appeal to Theorem 16 and differentiate term-by-term since that theorem assumed the independent variable, x , was real). (4) Use the results of Exercise 2c to show that the only complex roots z = x + iy of sin z and cos z are at the points on the real axis y = 0 where sin x = 0 and cos x = 0 , respectively. (5) Use the results of this section to prove DeMoirve’s Theorem (cos θ + i sin θ)n = cos nθ + i sin nθ, θ ∈ R, where n is a positive integer. (6) (a) Show that the sum of the finite geometric series N einx = n=1 einx is ei(N +1/2)x − eix/2 . eix/2 − e−ix/2 70 CHAPTER 1. INFINITE SERIES (b) Take the real and imaginary parts of the above formula and prove that for all x = 0, x ∈ (0, 2π ) , N cos x = n=1 sin(N + 1/2)x − sin 1/2x 1 2 sin 2 x N sin nx = n=1 1.7 1 1 cos 2 x − cos(N + 2 )x . 2 sin 1 2 Appendix to Chapter 1, Section 7. As a special dessert let us take some time out and prove some interesting results you would probably never see otherwise. We have in mind to define a specific number α ∈ R as the smallest positive zero of cos x, x ∈ R —so α had better turn out as π/2 . 
Then we prove that 1) sin(x + 4a) = sin x etc., 2) the ratio of the circumference to diameter of a circle is 2α so that 2α does equal the π of public school fame. Furthermore, we also present a way of computing α . In this section we take sin z and cos z, z ∈ C to be defined by their power series, and use only the properties of these functions which were obtained from the power series definition. Lemma 1.31 The set A = { x ∈ R : cos x = 0, 0 < x < 2 } is not empty, that is, the equation cos x = 0 has at least one real root for x ∈ (0, 2) . Proof: Since cos x is defined by a convergent power series, it is continuous (even infinitely differentiable); furthermore because x ∈ R and the power series has real coefficients, we know that cos x, x ∈ R is real-valued. Observe that cos 0 = 1 > 0 , and the following crude inequality 22 + cos 2 = 1 − 1·2 < −1 + ∞ n=2 4∞2 2 4! k=0 (−1)n 22n < −1 + (2n)! ( )2k 5 ∞ n=2 22 n (2n)! (1-23) 50 = −1 + < 0. 63 Thus cos 0 > 0 and cos 2 < 0 , so there is at least one point in (0, 2) where the real-valued continuous function cos x vanishes. This proves the lemma. Denote the g.l.b of A (which does exist since A is bounded—say by 0 and 2) by α . We shall show that α ∈ A . Since α is the g.l.b. of A , there exists a sequence of points αk ∈ A (the αk may just be the same point repeated over and over) such that αk → α and cos αk = 0 . But since cos x is continuous, 0 = lim cos αk = cos α, k→∞ so in fact cos α = 0 too ⇒ α ∈ A . Now cos x must be positive throughout the interval [0, α) , since it is positive at x = 0 d and α is the first place it vanishes. Therefore the formula dx sin x = cos x —obtained by differentiating the real power series for sin x term by term—shows that sin x is increasing 1.7. APPENDIX TO CHAPTER 1, SECTION 7. 71 for x ∈ [0, α) . Since sin 0 = 0 , we see that sin x ≥ 0 for x ∈ [0, α) . Thus the formula d dx cos x = − sin x tells us that cos x is decreasing in the interval [0, α] . From the formula 1 = sin2 α + cos2 α = sin2 α, and the fact that sin α > 0 , we find that sin α = 1 . We can thus conclude from the addition formulas for sin x and cos x the: Theorem 1.32 Let α denote the smallest zero of cos x for x > 0 . Then cos α = 0, cos 2α = −1, cos 3α = 0, cos 4α = 1 sin α = 1, sin 2α = 0, sin 3α = −1, sin 4α = 0, or more generally cos(z + α) = − sin z, sin(z + α) = cos z cos(z + 4α) = cos z, sin(z + 4α) = sin z This proves that the sin z and cos z are periodic with period 4α . As you have guessed, α is another name for π/2 —and serves as our definition of π . This is based upon power series and is independent of circles or triangles—or even the entire concept of angle. A simple consequence is the Corollary 1.33 The function ez is periodic with period 4αi , ez +4αi = ez e4αi = ez . Proof: ez +4α = ez e4iα = ez (cos 4α + i sin 4α) + ez (1 + i0) = ez . Two issues remain to be settled before closing up. We should 1) prove that the ratio of the circumference C of a circle to its diameter D is π , i.e., C = 2αD , and 2) find some way of approximating α numerically (for all we know of alpha so far is that it is the smallest element in a set and 0 < α < 2 ). The two problems are closely related. The circle of radius R has the equation x2 + y 2 = R2 . Consider the portion in the first quadrant. 
Then using the familiar formulas for arc length, we find that C =R 4 R √ 0 dx =R R2 − x2 1 √ 0 dt , 1 − t2 where the change of variable x = Rt has been used to obtain the last integral [this is legal since the mapping “multiply by R” is a bijection and hence an invertible function]. Thus, the desired result, C = 2αD = 4αR will be proved if we can p rove Theorem 1.34 1 √ dt 0 1−t2 = α(= π ) 2 Corollary 1.35 If C denotes the arc length of the circumference of a circle of radius R , then C = 4αR . 72 CHAPTER 1. INFINITE SERIES Proof: of Theorem. We want to make the change of variable t = sin ζ , where t ∈ [0, 1] . In order to do this we must only check that the function sin ζ is differentiable and invertible function there. We know it is differentiable . Since sin x is continuous and monotone increasing for x ∈ [0, α] , and since the end points are mapped into 0 and 1 respectively ( sin 0 = 0, sin α = 1 ), the function f (ζ ) = sin ζ is invertible for x ∈ [0, α] ⇐⇒ t ∈ [0, 1] . The usual formulas are applicable and yield 1 √ 0 1 dt = 1 − t2 α dζ = α 0 Q.E.D. To compute π = 2α , it is convenient to introduce tan z = sin z/ cos z , for all z where cos z = 0 . In particular tan x is defined for all real x in the interval 0 ≤ x < α/2 . From the behavior of sin x and cos x in the interval x ∈ [0, α/2) , it is easy to show that tan x has infinitely many derivatives and is increasing for x ∈ [0, α/2) , assuming the values from 0 = tan 0 to 1 = tan α . The function tan x is therefore invertible in that interval, so we 2 can make the natural change of variable t = tan x and obtain 1 0 dt = 1 + t2 α/2 0 α/2 d 1 2 x ( dx tan x)dx = 1 + tan dx = 0 α . 2 But the integral on the left can be approximated readily because of the algebraic identity 1 = 1 + t2 N (−1)n t2n + 0 Thus α π == 4 2 or 1 0 dt = 1 + t2 (−1)N +1 t2N +2 , all t = i. 1 + t2 N 1 (−1)n 0 0 1 2N +2 t t2n dt + (−1)N +1 0 1 + t2 dt, π 111 (−1)N = 1 − + − + ··· + + RN , 4 357 2N + 1 where since 2t ≤ 1 + t2 the remainder RN can be estimated by 1 2N +2 t |RN | = 0 1+ t2 1 2N +2 t dt < 0 2t dt = 1 4N + 4 If the first 250 terms in the series are used, N = 250 , we find π 11 1 = 1 − + − ··· + + R250 , 4 35 251 1 1 where |R250 | < 1004 < 1000 , so three decimal accuracy is obtained. This is quite slow—but it does work. For practical computations, a series which converges much faster is needed. See exercise 2 below; it is neat. Since RN → 0 as N → ∞ , the following formula is a consequence of our effort: π 1111 = 1 − + − + − · · · .. 4 3579 Exercises 1.7. APPENDIX TO CHAPTER 1, SECTION 7. 73 (1) Use the method illustrated here to slow that 1 ln 2 = 0 1111 (−1)N +1 1 dx = 1 − + − + − · · · + + RN , 1+x 2345 N where limN →∞ RN = 0 . Find an N such that |RN | < 10−3 . [Hint: Write 1 1+x (2) To approximate N nn 0 (−1) x = π 4 1 = 1 + t2 + (−1)N +1 xN +1 , 1+x x = −1]. with fewer terms, the following clever device works. Write N −1 (−1)N t2N (−1)N t2N (−1)N +1 t2N +2 +( + ) 2 2 1 + t2 (−1)n t2n + 0 and show that π 11 (−1)N −1 (−1)N ˜ = 1 − + + ··· + + + RN , 4 35 2N − 1 2(2N − 1) ˜ where RN + (−1)N 2 1 t2N −t2N +2 0 1+t2 ˜ (a) Prove that RN < dt . 1 8N 2 +8N . ˜ (b) What should N be to make RN < 10−3 ? Amazing saving, isn’t it? The technique does generalize to other series and can be refined to yield even better results. (c) Apply the method given here to problem 1 above to show that ln 2 = 1 − 1 + 2 N +1 −N 1 1 1 ˜ ˜ + · · · + (N1)1 + 2 (−1) + RN , where RN < (2N +1)(2N +3) . Pick N so that 3 − N ˜ RN < 10−3 . 74 CHAPTER 1. 
INFINITE SERIES Chapter 2 Linear Vector Spaces: Algebraic Structure 2.1 Examples and Definition In order to develop intuition for linear vector spaces, a slew of standard examples are needed. From them we shall abstract the needed properties which will then be stated as a set of axioms. a) The Space R2 . We begin by informally examining a space of two dimensions (whatever that means). It is constructed by taking the Cartesian Product of R with itself. We are thus looking at R × R , which is denoted by R2 . A point X in this space is an ordered pair, X = (x1 , x2 ) , where x1 ∈ R, x2 ∈ R . x1 and x2 are called the coordinates or components of the point x . Let us propose a reasonable algebraic structure on R × R . If X = (x1 , x2 ) , and Y = (y1 , y2 ) are any two points, and α is any real number, we define addition: X + Y = (x1 + y1 , x2 + y2 ) . multiplication by scalars: α · X = (αx1 , αx2 ), α ∈ R . equality: X = Y ⇐⇒ x1 = y1 , x2 = y2 The addition formula states that the parallelogram rule is used to add points, whereas the second formula states that a point X is “stretched” by α by stretching each coordinate by α . Some immediate consequences of the above definitions are, for all X, Y, Z in R × R , (1) addition is associative (X + Y ) + Z = X + (Y + Z ) (2) addition is commutative X + Y = Y + X (3) There is an additive identity, 0=(0,0) with the property that X + 0 = X for any X . (4) Every X = (x1 , x2 ) ∈ R × R has an additive inverse (−x1 , −x2 ) , which we denote by −X . Thus X + (−X ) = 0 . Thus the set of points in R × R forms an additive abelian group. The following additional properties are also obvious, where α and β are arbitrary real numbers. 75 76 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE (5) α(βX ) = (αβ )X (6) 1 · X = X . and the two distributive laws. (7) (α + β )X = αX + βX (8) α(X + Y ) = αX + αY . To insure that you too feel these properties are obvious, let us prove, one, say 7. (α + β ) · X = (α + β ) · (x1 , x2 ) = ((α + β )x1 , (α + β )x2 ) = (αx1 + βx1 , αx2 + βx2 ) = (αx1 , αx2 ) + (βx1 , βx2 ) (2-1) = α · (x1 , x2 ) + β · (x1 , x2 ) = α · X + β · X Example: If X = (2, 1) , then 3X = (6, 3) and −2X = (−4, −2) . Instead of thinking of the elements (x1 , x2 ) in R2 as points, it is sometimes useful to think of them as directed line segments, from the origin (0,0) directed to the point (x1 , x2 ) . The figure at the right illustrates this. Note that the axes need not be perpendicular to each other in the space R2 . They could just as well veer off at some outrageous angle, as in the diagram. This is because we have yet to place a metric (distance) structure on R2 or introduce any concept of angle measurement. When we do that, we will have Euclidean 2-space E2 . But right now all we have is R2 , which might be thought of as a floppy Euclidean space. b) The Space Rn . This is a simple-minded generalization of R2 . A point X in Rn = R × ... × R is an ordered n tuple, X = (x1 , x2 , ..., xn ) of real numbers, xk ∈ R . If X = (x1 , ..., xn ) and Y = (y1 , ..., yn ) are any two points in Rn , and α is any real number, we define addition: λ + Y = (x1 + y1 , x2 + y2 , ..., xn + yn ) multiplication by scalars: α · X = (αx1 , αx2 , ..., αxn ), α ∈ R. equality: X = Y ⇐⇒ xj = yj for all j . 1 Example: The point X = (1, 2, 3) , and 1 X = ( 2 , 1, 3 ) in R3 are indicated in the figure. 2 2 Again the coordinate axes need not be mutually perpendicular. 
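The componentwise definitions above translate directly into a few lines of code. This is a minimal sketch of my own using plain tuples; the function names add and scale are not from the notes.

```python
# Sketch: componentwise addition and scalar multiplication on R^n.
def add(X, Y):
    return tuple(x + y for x, y in zip(X, Y))

def scale(alpha, X):
    return tuple(alpha * x for x in X)

X = (1.0, 2.0, 3.0)
alpha, beta = 2.0, -3.0

print(scale(0.5, X))          # (0.5, 1.0, 1.5), the point (1/2)X from the example
# Spot-check of the distributive law (7); exact for these simple values.
assert scale(alpha + beta, X) == add(scale(alpha, X), scale(beta, X))
```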
Properties 1-8 listed earlier remain valid - and with the proofs essentially unchanged (just add dots inside the parentheses). Remark: . At this stage, you probably are anxiously waiting for us to define multiplication in Rn , that is, the product of two points in Rn , X · Y = Z ∈ Rn , possibly using the multiplication of complex numbers (points in R2 ) as a guide. Well, we would if we could. It turns out that it is possible to define such a multiplication only in R1 , R2 , R4 , and in R8 –but in no others. This is a famous theorem. In R2 ordinary complex multiplication does the job. To do it in R4 , we have to abandon the commutative law for multiplication. The result is called quaternions. In R8 , the multiplication is neither commutative nor associative. The result there is the Cayley numbers. Here we shall not have time to treat this issue. All we shall do (later) is introduce a “pseudo multiplication” in R3 —the so called cross product - obtained from the quaternion 2.1. EXAMPLES AND DEFINITION 77 algebra in R4 . The major importance of this pseudo multiplication which holds only in R3 is the fact of life that our world has three space dimensions. This multiplication is extremely valuable in physics. c) The Space C [a, b] . Our next example is of an entirely different nature, it is a space of functions, a function space. The space C [a, b] is the set of all real-valued functions of a real variable x which are continuous for x ∈ [a, b] . If f and g are continuous for x ∈ [a, b] , that is if f and g ∈ C [a, b] , and if α is any real number, we define, in the usual way, addition: (f + g )(x) = f (x) + g (x) , multiplication by scalars: (αf )(x) = α[f (x)]. α ∈ R equality: f = g ⇐⇒ f (x) = g (x) for all x ∈ [a, b]. Notice that the sum of two functions in C [a, b] is again in C [a, b] , and the product of a continuous function - in C [a, b] —by a constant α is also an element of C [a, b] . We shall ignore the fact that the product of two continuous functions is also a continuous function. Properties 1-8 listed earlier are also valid here, that is, if f, g , and h are any elements in C [a, b] , then (1) f + (g + h) = (f + g ) + h (2) f + g = g + f (3) f + 0 = f (4) f + (−1)f = 0 (5) α(βf ) = (αβ )f (6) (1)f = f 1∈R (7) (α + β )f = αf + βf (8) α(f + g ) = αf + αg . Again, 1-4 state that the elements of C [a, b] form an abelian group with the group operation being addition. When we define the dimension of a vector space, it will turn out that the space C [a, b] is infinite dimensional, but don’t let that bother you. This nice space, C [a, b] , and Rn are the two most useful examples of a vector space. d) D. The Space C k [a, b] . The space C k [a, b] consists of all real-valued functions f (x) which have k continuous derivatives for x in the interval [a, b] ⊂ R . When k = 0 , this reduces to the space C [a, b] . Addition and scalar multiplication are defined just as in C [a, b] . The key property is that the sum of two functions with k continuous derivatives of x ∈ [a, b] is also a function with k continuous derivatives. All of properties 1-8 are valid in C k [a, b] . Every function f (x) which has one continuous derivative is necessarily continuous. This is a basic result from elementary calculus; it may be written as C 1 [a, b] ⊂ C [a, b] . Since the function |x| , x ∈ [−1, 1] is in C [−1, 1] but not in C 1 [−1, 1] , we see that C 1 and C are not the same, that is C 1 is a proper subset of C . Similarly, C k+1 [a, b] ⊂ C k [a, b] (see Exercise 7). 78 CHAPTER 2. 
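The same experiment can be carried out in the function space C[a, b], where the elements are functions rather than tuples and the operations act pointwise. A minimal sketch (Python; the particular functions are chosen only for illustration, with |x − 1/2| playing the role on [0, 1] of the continuous but non-differentiable |x| mentioned above):

    def f(x):
        return x ** 2            # in C[0, 1], indeed in C^k[0, 1] for every k

    def g(x):
        return abs(x - 0.5)      # continuous on [0, 1] but not differentiable at x = 1/2

    def add(f, g):
        # Pointwise sum: (f + g)(x) = f(x) + g(x); the sum of continuous functions is continuous.
        return lambda x: f(x) + g(x)

    def scale(a, f):
        # Pointwise scalar multiple: (a f)(x) = a * f(x).
        return lambda x: a * f(x)

    h = add(f, scale(2.0, g))
    print(h(0.25))               # f(0.25) + 2 g(0.25) = 0.0625 + 0.5 = 0.5625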
The space C∞[a, b] consists of all functions with an infinite number of continuous derivatives for x ∈ [a, b]. All functions which have a convergent Taylor series for x ∈ [a, b] are in C∞[a, b]. In addition, C∞[a, b] contains functions like f(x) = e^{−1/x²} for x ≠ 0, f(0) = 0, which have an infinite number of continuous derivatives (see p. ??) but do not have convergent Taylor series. Another example of a function space is the set of analytic functions A(z0, R), the functions which have a convergent Taylor series in the disc with center at z0 ∈ C and radius at least R.

e) The Space l1. The space l1 (tired yet?) consists of all infinite sequences X = (x1, x2, x3, ...) which satisfy the condition Σ_{n=1}^∞ |xn| < ∞. Addition and multiplication by scalars are defined in a natural way. If X and Y are in l1, then

    X + Y = (x1 + y1, x2 + y2, x3 + y3, ...)

and, if α is any complex number,

    α · X = (αx1, αx2, ...).

Equality is defined by X = Y ⇐⇒ xj = yj for all j.

We should show that if X and Y are in l1, then so are X + Y and α · X. To prove that X + Y ∈ l1, we must show that Σ |xn + yn| < ∞. But since |xn + yn| ≤ |xn| + |yn|, we have, for any N ∈ Z+,

    Σ_{n=1}^N |xn + yn| ≤ Σ_{n=1}^N |xn| + Σ_{n=1}^N |yn| ≤ Σ_{n=1}^∞ |xn| + Σ_{n=1}^∞ |yn| < ∞.

Now letting N → ∞ on the left, we see that Σ_{n=1}^∞ |xn + yn| < ∞. If X ∈ l1, it is obvious that α · X is also in l1 since

    Σ_{n=1}^∞ |αxn| = Σ_{n=1}^∞ |α| |xn| = |α| Σ_{n=1}^∞ |xn| < ∞.

f) The Space L1[a, b]. Yes, the space L1[a, b] does consist of all functions f(x) (possibly complex-valued) with the property that ∫_a^b |f(x)| dx < ∞. It is the integral analogue of l1. Addition and scalar multiplication are defined as in C[a, b], that is, as usual. If f and g are in L1[a, b], then so are f + g and αf, where α ∈ C, since

    ∫_a^b |f(x) + g(x)| dx ≤ ∫_a^b |f(x)| dx + ∫_a^b |g(x)| dx < ∞,

and

    ∫_a^b |αf(x)| dx = |α| ∫_a^b |f(x)| dx < ∞.

For example, f(x) = x is in L1[0, 1] but f(x) = 1/x² is not in L1[0, 1]. It is simple to check that properties 1-8 are satisfied in L1[a, b].

g) The Space Pn. If P(x) = a0 + a1 x + ... + an x^n is any polynomial of degree at most n with real coefficients and Q(x) = b0 + b1 x + ... + bn x^n is another one, then with ordinary addition, multiplication by real scalars, and equality, the set Pn of all polynomials of degree at most n satisfies conditions 1-8. Since a0 + a1 x + ... + a_{n−1} x^{n−1} = a0 + a1 x + ... + a_{n−1} x^{n−1} + 0x^n, it is clear that P_{n−1} ⊂ Pn.

Enough examples for now. You must have gotten the point. We shall meet more later on. Let us give the abstract definition of a linear vector space.

Definition: Let S be a set with elements X, Y, Z, ... and F be a field with elements α, β, .... The set S is a linear vector space (linear space, vector space) over the field F if the following conditions are satisfied. For any two elements X, Y ∈ S, there is a unique third element X + Y ∈ S, such that

(1) (X + Y) + Z = X + (Y + Z);
(2) X + Y = Y + X;
(3) there exists an element 0 ∈ S having the property that 0 + X = X for all X ∈ S;
(4) for every X ∈ S, there is an element −X ∈ S such that X + (−X) = 0.

Furthermore, if α is any element of the field F, there is a unique element αX ∈ S such that, for any α, β ∈ F,

(5) α(βX) = (αβ)X;
(6) 1 · X = X.

The additive and field multiplicative structures are related by the following distributive rules:

(7) (α + β)X = αX + βX;
(8) α(X + Y) = αX + αY.
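All of the examples above satisfy these conditions, and the verifications are purely mechanical. For the sequence space l1 the only point needing thought was closure, and even that is easy to see in action; here is a minimal numerical sketch (Python, with the infinite sums truncated at a finite cutoff N and the two particular sequences chosen purely for illustration).

    N = 10_000                                        # truncation point, for illustration only
    x = [(-1) ** n / (n + 1) ** 2 for n in range(N)]  # sum of |x_n| converges (p = 2 series)
    y = [1.0 / 2 ** n for n in range(N)]              # geometric, certainly summable

    s = [x[n] + y[n] for n in range(N)]               # the sequence X + Y, term by term

    # The termwise triangle inequality gives  sum |x_n + y_n| <= sum |x_n| + sum |y_n|.
    assert sum(abs(t) for t in s) <= sum(abs(t) for t in x) + sum(abs(t) for t in y)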
Elements of the field F are called scalars, whereas elements of S are called vectors. We shall usually take the real numbers R for our field F , although the complex numbers C will be used at times. Exercise 4 shows the need for Axiom 6 (in case you thought it was superfluous). All of the examples of this section are linear spaces. For most purposes the simple example R2 will serve you well as a guide to further expectations. The pictures there are simple. In fact, with a certain degree of cleverness, the “right” proof for R2 immediately generalizes to all other linear spaces - even “infinite dimensional” ones. 80 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE Since you probably think that everything is a linear space, here is an example to dispel the delusion. Let S be the subset of all functions f (x) in C [0, 1] which have the property f (0) = 1 . Then if f and g are in S , we are immediately stuck since f (0) + g (0) = 2 , so that f + g is not in S . Also, 0 ∈ S . Both here, and before (p.?) when defining a field, axioms “0” have been used. They all express roughly the same concept. We have some set S and an operation * defined on the set. These axioms all stated that for any x, y ∈ S , we also have x ∗ y ∈ S . In other words, the set S is closed under the operation * in the sense that performing that operation does not take us out of the set. We shall find this concept useful. h) Appendix. Free Vectors One more example is needed, an exceedingly important example. There are “physicists’ vectors” or free vectors. I always thought they were easy to define - until today. Twelve hours and fifty pages later, I begin again on the fifth attempt. The essential idea is easy to imagine but difficult to convey in a clear and precise exposition. Say you are given two elements X and Y of Rn , which we represent by directed line segments from the origin. Somehow we want to find a directed line segment V from the tip of X to the tip of Y . Now V “looks” like a vector. The problem is that all of the vectors we have met so far have been directed line segments in Rn beginning at the origin. In order to find a way out, it is best to examine the problem for the most simple case −R1 , the ordinary line. Watch closely since we will be so shrewd that all the formalism will be adequate without change for the general case of Rn . We are given two points, X and Y of R1 which we shall represent by directed line segments from the origin. To make the picture clear, we will draw them slightly above the line. a figure goes here We want a directed line segment V from the tip of X to the tip of Y . Of course you recognize this as the problem of solving X +V =Y The solution, V = Y − X , is the difference of the two real numbers Y and X . But where should we draw V ? If we are stubborn and demand that all real numbers must be represented by line segments beginning at the origin, we have the picture a figure goes here but what we really want to do is place the tail of V at the tip of X and add the line segments. Why not relent and allow ourselves this added flexibility. a figure goes here There! Now we have solved our problem. But we have made an important generalization in doing so. You see, this V has been released from its bondage to the origin and is now free to move along the whole of R . Although we were led to this V from the pair X and Y , the same V could have been ˜ ˜ generated by a different pair X and Y , as the diagram below indicates, 2.1. 
EXAMPLES AND DEFINITION 81 a figure goes here ˜ ˜ for we still have X + V = Y . In the first case we might have had X = 2 and Y = 3 , so that V = 1 , while in the second, we might have had X = −4 and Y = −3 , and again V = 1 . Even though we have let this V go free, sliding from place to place along R , we still want to say that this is only one V , and in fact, we want to identify this V with the V tied to the origin in (2). In other words, we would like to say that all three V ’s used above are equivalent to each other. More formally, the element V is generated by an ordered pair, V = [X, Y ] , which we ˜ read as the vector from X to Y , for X, Y ∈ R . If some V is generated by another ordered ˜ = [X, Y ], X, Y ∈ R , then we want equality V = V to mean that Y − X = Y − X . ˜˜ ˜˜ ˜ ˜˜ pair, V Moreover, we want to represent V = [X, Y ] , the vector from X to Y , by the vector from the origin 0 to Y − X, V = [0, Y − X ] . This representation of V is unique, since if any ˜˜ ˜ ˜ other pair also generates V, V = [X, Y ] , the representative V = [0, Y − X ] = [0, Y − X ] ˜ ˜ since V = V implies that Y − X = Y − X . Therefore much as each rational number is an equivalence class, represented by a single rational number - as 1 represents the 2 3 equivalence class 1 , 2 , 6 , ... , each V is an equivalence class of ordered pairs V = [X, Y ] , 24 where X, Y ∈ R . It is uniquely represented by an element of R , viz. V = Y − X , the representation being independent of the particular ordered pair [X, Y ] which generates V . It is possible to think of V either as an ordered pair with an equivalence relation, or just as the representative V = [0, Y − X ] of the whole equivalence class, the representation being written more simply as an element of R : V = Y − X , where here equality is between elements of R . The generalization is now easily made Definition: . (Free vectors). Let X and Y be any elements of Rn . An element V ∈ V n , “physicists’ n -space”, is defined as an equivalence class of ordered pairs of elements in Rn , V = [X, Y ], X, Y ∈ Rn , ˜ ˜˜ with the following equivalence relation: If V = [X, Y ] and V = [X, Y ] , then ˜ ˜ ˜ V = V ⇐⇒ Y − X = Y − X, where the second equality is that of elements in Rn . If we are given X and Y in Rn , we speak of V = [X, Y ] as the free vector going from X to Y . Previous reasoning also shows that each V ∈ Vn is uniquely represented by the ordered pair V = [0, Y − X ] . This representation is independent of the elements [X, Y ] which generated V . We were led to this definition of Vn by examining the situation in the special case of 1 . Since our formal reasoning there was quite algebraic and general, we know that the V definition works algebraically. The geometry works too. An example in V2 should make the general case clear. Let X = (1, 3) and Y = (2, 1) . These two points in R2 generate the ordered pair V = [(1, 3), (2, 1)] in V2 . V is the vector going from X = (1, 3) to Y = (2, 1) . Of all equivalent V ’s, the unique representative which begins at the origin is V = [(0, 0), (1, −2)] , which we simply write as V = (1, −2) and represent as an ordinary element of R2 . On ˜ ˜ the same diagram we exhibit the vector from X = (−2, 2) to Y = (−1, 0) , which is ˜ ˜ ˜ V = [(−2, 2), (−1, 0)] . The unique representative (of all V ’s equivalent of V ) which begins 82 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE ˜ ˜ from (0,0) is V = [(0, 0), (1, −2)] , which we write simply as V = (1, −2) . Comparison of ˜ reveals that they are equal, V = V . 
Thus, from the diagram, we see that a ˜ V and V free vector is an equivalence class of directed line segments, with two directed line segments ˜ V, V being equivalent as vectors in V2 if they are equivalent to the same directed line ˜ segment which begins at the origin. In more geometrical language, V = V if by sliding them “parallel to themselves”, they can be made to coincide with their representer which begins at the origin. (We shall not define “parallel” here. It is not needed because we already have a satisfactory algebraic definition of equivalence.) ˆ Notice that X = (1, 3) and Y = (2, 1) also generates a second ordered pair V = [(2, 1), (1, 3)] , the vector from Y = (2, 1) to X = (1, 3) . Its unique representation which ˆ ˆ begins at the origin is V = [(0, 0), (−1, 2)] , or more simply V = (−1, 2) . Comparison with ˆ the previous example shows that V = −V : the vector from Y to X is the negative of the vector from X to Y . We need the little arrow on our picture of V = [X, Y ] to distinguish it from −V = [Y, X ] which is also between the same points but headed in the opposite direction. From now on we shall denote a vector V ∈ Vn from X to Y by its representative Y − X in Rn , so V = Y − X . Hence the vector from (1,3) to (2,1) will be immediately written as V = (1, −2) . As we have said many times, the representation V = Y − X as an element on Rn is independent of which particular pair [X, Y ] happened to generate V . The following diagram shows a whole bunch of equivalent vectors Vj ∈ V2 , a figure goes here Vj = Vk , and their particular representative V chained to the origin. In order to justify calling the elements of Vn vectors, we should prove that the elements of Vn do form a vector space. Addition and scalar multiplication must first be defined, an easy task. Since every V ∈ Vn is uniquely represented as an element of Rn , V = Y − X ∈ Rn , we use addition and scalar multiplication for elements of Rn —which has already been defined. Because Rn is known to be a vector space, it is a tedious triviality to prove. Theorem 2.1 . Vn is a linear vector space. Proof: . Only a smattering. (1) Vn is closed under addition. Say V1 and V2 are in Vn . Then they are represented as the difference of two elements of Rn , say V1 = Y1 − X1 and V2 = Y2 − X2 . Thus V1 + V2 = (Y1 − X1 ) + (Y2 − X2 ) = (Y1 + Y2 ) − (X1 + X2 ), so that their sum is generated by [X1 + X2 , Y1 + Y2 ] . In other words, there is at least one pair of elements, [X3 , Y3 ], X3 = X1 + X2 and Y3 = Y1 + Y2 , in Rn which generate V1 + V2 , so that V3 = V1 + V2 ∈ Vn . Of course [0, Y3 − X3 ] and many other pairs also generate V3 . (2) Commutativity. V1 + V2 = (Y1 − X1 ) + (Y2 − X2 ) = (Y2 − X2 ) + (Y1 − X1 ) = V2 + V1 . (3) (α + β )V1 = (α + β )(Y1 − X1 ) = α(Y1 − X1 ) + β (Y1 − X1 ) = αV1 + βV1 2.1. EXAMPLES AND DEFINITION 83 1 Example: If A = (4, 2, −3), B = (0, 1, −2), C = (−1, 0, 1 ) and D = (4, − 2 , 1) , find the 2 vector V1 from A to B and the vector from C to D . Then compute V1 + 2V2 and V1 − V2 . solution: V1 = B − A = (0, 1, −2) − (4, 2, −3) = (−4, −1, 1) 1 11 1 V2 = D − C = (4, − , 1) − (−1, 0, ) = (5, − , ) 2 2 22 11 V1 + 2V2 = (−4, −1, 1) + 2(5, − , ) = (−4, −1, 1) + (10, −1, 1) = (6, −2, 2) 22 11 11 11 V1 − V2 = (−4, −1, 1) − (5, − , ) = (−4, −1, 1) + (−5, , − ) = (−9, − , ) 22 22 22 Exercises (1) (a) Find the vector representing the free vectors from the given A ∈ Rn to B ∈ Rn . (i) (ii) (iii) (iv) (v) (vi) A = (3, 1), B = (2, 2). A = (−3, 3), B = (0, 4). 
A = (2, 2, 3), B = (5, 2, 17) A = (0, 0, 0) B = (9, 8, −3) A = (1, 2, 3), B = (0, 0, −1) A = (0, 0, −1), B = (1, 2, 3) (b) Let V1 and V2 be the respective vectors of iii) and v) above. Compute V1 + V2 , V1 − V2 , and 2V1 − 3V2 . (c) Draw a diagram on which you indicate the vector going from A = (3, 1) to B = (2, 2) , and indicate the representer of that vector which begins at the origin. Do the same with the vector from B to A . (2) Which of the following subsets of C [−1, 1] are linear spaces: (a) The set of all even functions in C [−1, 1] , that is, functions f (x) with the additional property f (−x) = f (x) , like x2 and cos x . (b) The set of all functions f in C [−1, 1] with the additional property that |f (x)| ≤ 1. (c) The set of all functions f in C [−1, 1] with the property that f (0) = 0 . (3) In R3 , let X = (1, −1, 2) and Y = (0, 4, −3) . Find X + 2Y, Y − X , and 7X − 4Y . (4) (a) Show that for every X ∈ R3 you can find scalars αj ∈ R such that X can be written as X = α1 e1 + α2 e2 + α3 e3 , where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) . (b) If X ∈ R3 , can you find scalars αj ∈ R such that X = α1 θ1 + α2 θ2 + α3 θ3 , where θ1 = (1, −1, 0), θ2 = (−1, 1, 0), θ3 = (0, 0, 1) , and αj ∈ R ? Proof or counter-example. 84 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE (c) Find two polynomials P1 (x) and P2 (x) in ?1 such that for every polynomial P (x) ∈?1 you can find scalars αj ∈ R such that P can be written in the form P (x) = α1 P1 (x) + α2 P2 (x). (5) Let V = R × R with the following definition of addition and scalar multiplication X + Y = (x1 + x2 , y1 + y2 ), αX = (αx1 , 0), 0 = (0, 0), −X = (−x1 , −x2 ). Is V a vector space? Why? (6) Show that any field can be considered to be a vector space over itself. (7) Consider the set S = { u ∈ C 2 [0, 1] : a2 u + a1 u + a0 u = 0 }, where the aj (x) ∈ C [0, 1] . Is S a linear space? Note that we do not yet know that S has any elements at all. The proof that S is not empty is the existence theorem for ordinary differential equations. (8) By integrating |x| the “right” number of times, find a function which is in C k [−1, 1] but is not in C k+1 [−1, 1] . 2.2 Subspaces. Cosets. With this section we begin the process of assigning names to the various concepts surrounding the idea of a linear vector space. This name calling will take us the balance of the chapter. Although the ideas are elementary and theorems simple, do not deceive yourselves into thinking this must be some grotesque joke that mathematicians have perpetrated. You see, we are in the process of building a machine. Most of its constituent parts are very easy to grasp. But when combined, the machine will be equipped successfully to assault a diversity of problems which appear off hand to be unrelated. The value of this abstract formalism is that many seemingly distinct complicated specific problems are just one single problem in a variety of fancy dresses. By ignoring the extraneous paraphernalia we can concentrate on the essential issues. a figure goes here We begin by defining what is meant by a subspace of a vector space W . While reading the definition, think of a plane through the origin, which is a subspace of ordinary three dimensional space. Definition: . A set A is a linear subspace (linear variety, linear manifold) of the linear space W if i) A is a subset of W , and ii) A is also a linear space under the operations of vector addition and multiplication by scalars already defined on V . 2.2. SUBSPACES. COSETS. 
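Before looking at examples, note that in concrete cases the whole verification usually reduces to checking that the subset is closed under the two operations (this is made precise in Theorem 2.2 below). A minimal sketch of such a check (Python; the particular plane x1 + x2 + x3 = 0 in R3 is chosen only for illustration):

    def in_A(X, tol=1e-12):
        # Membership in A = { X in R^3 : x1 + x2 + x3 = 0 }, a plane through the origin.
        return abs(X[0] + X[1] + X[2]) < tol

    def add(X, Y):
        return tuple(x + y for x, y in zip(X, Y))

    def scale(a, X):
        return tuple(a * x for x in X)

    X, Y = (1.0, 2.0, -3.0), (0.5, -4.0, 3.5)
    assert in_A(X) and in_A(Y)
    assert in_A(add(X, Y)) and in_A(scale(-7.0, X))   # closed under both operations
    # The parallel plane x1 + x2 + x3 = 1 fails: (1, 0, 0) and (0, 1, 0) lie on it, but their sum does not.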
85 Examples: (1) Let A = { X ∈ R3 : X = (x1 , x2 , 0) } , that is, the points in R3 whose last coordinate is zero. Since A ⊂ R3 , and a simple check shows that A is also a linear space, we see that A is a linear subspace of R3 . Intuitively, this set A certainly “looks like” R2 . You are right, and recall that the fancy word for this equivalence - of R2 = (x1 , x2 ) and the points in R3 of the form (x1 , x2 , 0) —is isomorphic. Similarly, the set B = { X ∈ R3 : X = (x1 , 0, x3 ) } is also a subspace of R3 . B is also isomorphic to R2 . (2) Let A = { X ∈ Rn : X = (x1 , x2 , ..., xk , 0, 0, ..., 0) } , that is, the points in A are those points in Rn whose last n − k coordinates are zero. It is easy to see that A is a linear subspace of Rn , and that A is isomorphic to Rk . (3) Let A = { f ∈ C [0, 1] : f (0) = 0 } . A is a subset of the linear space C [0, 1] , and is also a linear space (check this). Thus A is a linear subspace of C [0, 1] . (4) Let A = { f ∈ C [0, 1] : f (0) = 1 } . A is a subset of C [0, 1] , but it is not a linear subspace since - as we saw in the last section (p. ?)— A is itself not a linear space. The following lemma supplies a convenient criterion for checking if a given subset A of a linear space W is a subspace. Theorem 2.2 . If A is a non-empty subset of the linear space W , then A is a linear subspace of W ⇐⇒ A is closed under addition of vectors in A and multiplication by all scalars. Proof: . ⇒ . Since A is a subspace, it is itself a linear space. But all linear spaces are, by definition, closed under addition and multiplication by scalars. ⇐ . Because A is a subset of W , and properties 1,2,5,6,7, and 8 hold in W , they also hold for the particular elements in W which happened to be in A . Notice that here we use the fact that A is closed under addition. Therefore only the existential axioms 3 and 4 need be checked. Since A is not empty, it contains at least one element, say X ∈ A . Because A is closed under multiplication by scalars we see that 0 = 0 · X ∈ A . Furthermore, for every X ∈ A , also −X = (−1) · X ∈ A . Example: Let A = { f ∈ C 1 [0, 1] : f (0) = 0 } . Since A is a subset of the linear space C 1 [0, 1] , all we need show is that A is closed under addition and multiplication by scalars in order to prove A a linear subspace of C 1 [0, 1] . If f, g ∈ A , then (f + g ) (0) = (f + g )(0) = f (0) + g (0) = 0 , so f + g ∈ A . Also, for any α ∈ R, (αf ) (0) = α(f )(0) = α · 0 = 0 , so αf ∈ A . Theorem 2.3 . The intersection of two subspaces is also a subspace, but the union of two subspaces is not necessarily a subspace. More generally, the intersection of any collection of subspaces is also a subspace. Proof: . Let A, B be subspaces of W . We show that A ∩ B is a subspace. Since A ∩ B ⊂ W , all we need show is the closure properties of A ∩ B . If X, Y ∈ A ∩ B , then X and Y are both in A and B , so X + Y ∈ A and X + Y ∈ B ⇒ X + Y ∈ A ∩ B too. Similarly for scalar multiples. The proof that A ∩ B ∩ C ∩ ... is a subspace is identical except for a notational mess. 86 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE For the second part of the theorem we merely exhibit an example of two subspaces A, B for which A ∪ B is not a subspace. In R2 let A be the linear subspace “horizontal axis”, that is, A = { X ∈ R2 : X = (x1 , 0) } , while B is “the vertical axis”, B = { X ∈ R2 : X = (0, x2 ) } . Then A ∪ B is the “cross” of all points on either the horizontal axis or the vertical axis. 
This is not a linear space because points like (1, 0) ∈ A, (0, 1) ∈ B do not have their sum (1, 0) + (0, 1) = (1, 1) in A ∪ B . Precisely for this reason R2 = R1 × R1 was constructed as the Cartesian product of R1 with itself; for if it had been constructed as R1 × R1 , then only the points situated on the axes themselves would get caught. More generally - and for the same reason - the Cartesian product is the process always used to “glue” together a larger space from several linear spaces. Only when A ⊂ B (or B ⊂ A ) is A ∪ B also a subspace (Exercise 4). Your image of a linear space should be R3 , and a subspace S is a plane or line in R3 . Note that since every subspace must contain 0, these planes or lines must pass through the origin. Example: Let Sc = { X ∈ R2 : x1 + 2x2 = c, c real } . Thus, the set Sc is all points S = (s1 , s2 ) ∈ R2 on the straight line s1 + 2s2 = c . For what value(s) of c is Sc a subspace? If Sc is a subspace, then we must have aS ∈ Sc for all scalars a , that is aS = (as1 , as2 ) ∈ Sc ⇒ as1 + 2as2 = c . But for a = 0 this states that c = 0 . Therefore the only possible subspace is S0 = { X ∈ R2 : x1 + 2x2 = 0 } . It is easy to check that if S1 and S2 are in S0 , then so are S1 + S2 and aS1 . Thus S0 is a subspace. Similarly, every straight line through the origin is a subspace. Our question now is, how can we talk about the other straight lines or planes which do not happen to pass through the origin? First we answer the question for our example above. There we have the linear space R2 and the subspace S0 which will be simply written as S . S is a line through the origin. Let X1 be any element in R2 (think of X1 as a point). Then the set of all elements of R2 which can be written in the form S + X1 , where S ∈ S , is the line “parallel” to S which passes through X1 . This line is written as S + X1 . More explicitly, say X1 = (1, 3 ) . The set S + X1 is the set of all points X = (x1 , x2 ) ∈ R2 of 2 the form X = S + X1 , which S ∈ S, or 3 (x1 , x2 ) = (s1 , s2 ) + (1, ), where s1 + 2s2 = 0. 2 3 Consequently x1 = s1 + 1 , and x2 = s2 + 2 . Using the relation s1 + 2s2 = 0 , we find that 3 x1 + 2x2 = 4 — exactly the equation of the straight line through X1 = (1, 2 ) and “parallel” 2 : X = S + X , where S ∈ S } , is called to the subspace S . This subset, S + X1 = { X ∈ R 1 the X1 coset of S . Thus, cosets are the names given to “linear objects” which are not subspaces. They are subspaces translated to pass through X1 . You might prefer to call them affine subspaces instead of cosets. Please observe that the cosets S + X1 and S + X2 , where X1 , X2 ∈ W , are not necessarily distinct. In our example, these cosets coincide if and only if X2 is on the line S + X1 , that is, if X2 ∈ S + X1 . The easiest way to test this is to see if X2 − X1 ∈ S . Say X1 = (1, 3 ) as before, and that X2 = (2, 1) . Then the cosets S + X1 and S + X2 are 2 1 the same since the point X2 − X1 = (1, − 2 ) is in S . It should be geometrically clear that 2.2. SUBSPACES. COSETS. 87 the relation of equality among these cosets is an equivalence relation (and so deserving of the title “equality”). We shall state these ideas formally as we turn from this special - but characteristic - example to the general situation. The general problem of describing lines or planes or “higher dimensional linear objects” which do not pass through the origin - so are not subspaces - is solved similarly. Definition: . Let W be a linear space, S a subspace of V , and X1 any element of W . 
All elements in W which can be written in the form S + X1 , where S ∈ S , is called the X1 coset of S , and written as S + X1 . Our first theorem states that if X2 is in the X1 coset of S , then X1 is in the X2 coset of S : Theorem 2.4 . X2 ∈ S + X1 ⇐⇒ X1 ∈ S + X2 . Proof: Since X2 ∈ S + X1 , there is an S ∈ S such that X2 = S + X1 . Therefore X1 = (−S ) + X2 . Because S is a linear space, (−S ) ∈ S . Thus X1 has been written as the sum of X2 and an element of S , which means that X1 ∈ S + X2 . By the same argument, one sees that any two cosets S + X1 and S + X2 are either identical or are disjoint (have no element in common). Thus the cosets of S partition W in the sense that every element of W is in exactly one coset, just as for our example, every point in the plane R2 was in exactly one straight line parallel to the subspace determined by x1 + 2x2 = 0 . Although we were motivated by geometrical considerations, the ideas apply without alteration to any linear space. This is illustrated by again examining the set A = { f ∈ C [−1, 1] : f (0) = 1 }, which is not a subspace. It is a coset of a subspace S of C [−1, 1] which is constructed as follows. Consider the subspace S which is “naturally” associated with A , viz. S = { g ∈ C [−1, 1] : g (0) = 0 }. Then A is the coset S + 1, A = S + 1 . This is true since clearly A ⊃ S + 1 . Also A ⊂ S + 1 because for every f ∈ A , f (x) = [f (x) − 1] + 1 = g (x) + 1, where g ∈ S. ˆ ˆ Therefore A = S +1 . Similarly, we could have written A as S + f , where f is any function in A , for example A = S + cos x . Exercises (1) Find which of the following subsets of Rn are subspaces. (a) { X ∈ Rn : x1 = 0 }, (b) { X ∈ Rn : x1 ≥ 0 }, (c) { X ∈ Rn : x1 − x2 = 0 }, (d) { X ∈ Rn : x1 − x2 = 1 }, (e) { X ∈ Rn : x2 − x2 = 0 }, 1 88 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE (2) In P3 , the linear space of all polynomials of degree ≤ 3 , let A = { p(x) ∈ P3 : p(0) = 0 } , and let B = { p(x) ∈ P3 : p(1) = 0 } . (a). Show that A and B are subspaces of P3 . (b). Find A ∩ B and A ∪ B . Give an example which shows that A ∪ B is not a subspace of P3 . (3) (a) If X1 and X2 are given fixed vectors in R2 then is A = { X ∈ R2 : X = a1 X1 + a2 X2 , a1 and a2 any scalars } a subspace of R2 ? (b) Same as (a) but replace R2 by an arbitrary linear space W . (c) If X1 , X2 , ..., Xk ∈ W , then is k A = {X ∈ W : X = aj Xj , for any scalars aj }, 1 a subspace of W ? (4) Let A and B be subspaces of a linear space W . Prove that A ∪ B is also a subspace if and only if either A ⊂ B or B ⊂ A , that is, if one of the subspaces contains the other. (5) Let S and T be subspaces of a linear space W , and suppose that A is a coset of S and B is a coset of T . Prove that (a). A ⊂ B ⇒ S ⊂ T , and also (b). A=B ⇒S=T. (6) (a) Write the plane 2x1 − 3x2 + x3 = 7 as a coset of some suitable subspace S ⊂ R36 . (b) Write the set A = { f ∈ C [0, 4] : f (0) = 1, f (1) = 3 } , as a coset of some suitable subspace S ⊂ C [0, 4] . (c) Write the set A = { f ∈ C 1 [0, 4] : f (1) = 1, f (1) = 2 } as a coset of some suitable subspace S ⊂ C 1 [0, 4] . 2.3 Linear Dependence and Independence. Span. If W is a linear space and X1 , X2 , ..., Xk ∈ W , then we know that, for any scalars aj , k aj Xj = a1 X1 + a2 X2 + ... + ak Xk Y= j =1 is also in V . Y is a linear combination of the Xj ’s. Now if 0 can be expressed as a linear combination of the Xj ’s, where at least one of the aj ’s is not zero we expect that there is something degenerate around. In fact, if 0 = a1 X1 + ... 
+ ak Xk where say a1 = 0 , then we can solve for X1 as a linear combination of X2 , X3 , ..., Xk , X1 = −1 (a2 X2 + ... + a, Xk ). a1 This leads us to make a definition and state a theorem. 2.3. LINEAR DEPENDENCE AND INDEPENDENCE. SPAN. 89 Definition: . A finite set of elements Xj ∈ W, j = 1, ..., k is called linearly dependent if k there exists a set of scalars aj , j = 1, ..., k , not all zero such that 0 = aj Xj . If the Xj 1 are not linearly dependent, we say they are linearly independent. Theorem 2.5 . A set of vectors Xj ∈ W, j = 1, ..., k is linearly dependent if and only if at least one of the Xj ’s can be written as a linear combination of the other Xj ’s. To test if a given set of vectors is linearly independent, an equivalent form of Theorem 5 is useful. Corollary 2.6 A set of vectors Xj ∈ W, j = 1, ..., k is linearly independent if and only if k aj Xj = 0 implies that a1 = a2 = ... = ak = 0 . j =1 Examples: (1) The vectors X1 = (2, 0), X2 = (0, 1), X3 = (1, 1) in R are linearly dependent since 0 = X1 + 2X2 − 2X3 . Equivalently, we could have applied the theorem since X3 can be written as a linear combination of X1 and X2 1 X3 = X1 + X2 . 2 x −x (2) The functions f1 (x) = ex , f2 (x) = e−x , f3 (x) = e +e 2 dent since 0 = f1 + f2 − 2f3 in C [0, 1] are linearly depen- (3) The vectors X1 = (2, 0, 1), X2 = (−1, 0, 0) in R3 are linearly independent, since if for some a1 , a2 , 0 = a1 X1 + a + 2X2 = (2a1 , 0, a1 ) + (−a2 , 0, 0), then 0 = (0, 0, 0) = (2a1 − a2 , 0, a1 ), which implies that 2a1 − a2 = 0 , and a1 = 0 =⇒ a1 = a2 = 0 . a figure goes here A simple consequence of these ideas is the following Theorem 2.7 . If A and B are any subsets of the linear space W and if A ⊂ B , then i) A is linearly dependent ⇒ B is linearly dependent; and the contrapositive: ii) B is linearly independent ⇒ A is linearly independent. We now prove the transitivity of linear dependence. 90 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE Theorem 2.8 . If Z is linearly dependent on the set { Yj }, j = 1, . . . , n and each Yj is linearly dependent on the set { Xl }, l = 1, . . . , m then Z is linearly dependent on the { Xl } . Proof: . This is trivial arithmetic. We know that Z = a1 Y1 + . . . + an Yn , and that Yj = c1j X1 + c2j X2 + ... + cmj Xm By substitution then Z = al (cll Xl + · · · + cml Xm ) + a2 (c12 X1 + · · · + cm2 Xm ) + · · · + an (c1m X1 + · · · + cmn Xm ) = (a1 c11 + a2 c12 + · · · + an cln )X1 + (a1 c21 + · · · + an c2n )X2 + · · · + (a1 cml + · · · + cmn )Xm n = γ1 X1 + · · · + γm Xm , where γl = aj clj . j =1 More concisely: n Z= n aj Yj = j =1 m aj j =1 m clj Xl = l=1 n m aj clj Xl = l=1 j =1 γl Xl . l=1 Let X1 and X2 be any elements of a linear space W . Is there a smallest subspace A of W which contains X1 and X2 ? There are two possible ways of answering this, constructively and non-constructively. First, constructively. We observe that the desired subspace must contain X1 and X2 , and all linear combinations of X1 and X2 , that is, A must contain all X ∈ W of the form X = a1 X1 + a2 X2 for all scalars a1 and a2 . But observe that the set B = { X ∈ V : X = a1 X1 + a2 X2 } is a linear space, since if X and Y ∈ B , then aX ∈ B for any scalar a , and also X + Y ∈ B . Thus the desired subspace A is just B itself. The constructive proof goes as follows: just let A be the intersection of all subspaces containing X1 and X2 . By Theorem 3 the intersection of these subspaces is also a subspace. It is clearly the smallest one. Do you feel cheated? 
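As a computational aside: for vectors in Rn the test of Corollary 2.6 is just the question of whether a homogeneous linear system has only the zero solution, and a machine settles that instantly. A minimal sketch (Python with numpy, reusing the vectors of Examples (1) and (3) above):

    import numpy as np

    # Columns are X1 = (2, 0), X2 = (0, 1), X3 = (1, 1) of Example (1).
    M = np.array([[2.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    # Independent exactly when a1 X1 + a2 X2 + a3 X3 = 0 forces every a_j = 0,
    # i.e. when the rank equals the number of vectors.
    print(np.linalg.matrix_rank(M))   # 2 < 3: the three vectors are dependent

    # Columns are X1 = (2, 0, 1), X2 = (-1, 0, 0) of Example (3).
    P = np.array([[2.0, -1.0],
                  [0.0,  0.0],
                  [1.0,  0.0]])
    print(np.linalg.matrix_rank(P))   # 2 = number of vectors: independent

Back now to the two descriptions just given of the smallest subspace containing X1 and X2.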
This type of reasoning is often used in modern mathematics. Although it reveals little more than the existence of the sought-after object, it is an extremely valuable procedure when you really don’t want anything more than to know the object exists. More important, procedures like this are vital when there is no constructive proof available. More generally, if S = { Xj }, j = 1, . . . , k , is any finite subset of a linear space W , we ask for the smallest subspace A of W which contains S . There are two proofs - exactly as in the simple case above (where k = 2 ). From the constructive proof we find that k A = {X ∈ W : X = aj Xj , aj scalars }, 1 so A is the set of all linear combinations of the Xj ’s. This set A is called the span of S , and denoted by A = span(S ) . We also say that S spans A , or that A is generated by S . 2.3. LINEAR DEPENDENCE AND INDEPENDENCE. SPAN. 91 Examples: (1) In R3 let X1 = (1, 0, 0) and S2 = (0, 1, 0) . Then the span of S = { Xj , j = 1, 2 } is all X ∈ R3 of the form X = a1 X1 + a2 X2 = (a1 , a2 , 0) . If we imagine R3 as ordinary 3-space, then the span of X1 and X2 is the entire x1 , x2 plane. (2) In R3 , let X1 = (1, 0, 0), X2 = (0, 1, 0) , and X3 = (0, 0, 1) . Then the span of T = { Xj , j = 1, 2, 3 } is all X ∈ R3 of the form X = a1 X1 + a2 X2 + a3 X3 (a1 , a2 , a3 ) . Since all of R3 can be so represented, we have span(T ) = R3 , that is, the set B spans R3 . Comparing these two examples, we see that S ⊂ T and span(S ) ⊂ span(T ) . (3) In R3 , let X1 = (1, 0) and X2 = (0, 1) . Then the span of S = { X1 , X2 } is all of R2 , since every X ∈ R2 can be written as X = a1 X1 + a2 X2 , where a1 and a2 are scalars. Many other sets also span R2 . In fact almost every set of two vectors X1 and X2 in R2 span R2 . This can be seen from the diagram, where we have drawn a net parallel to X1 and X2 . Then X = a1 X1 + a2 X2 . Any vectors X1 and X2 would do equally well, as long as they do not point in the same (or opposite) direction. We collect some properties of the span Theorem 2.9 . Let R, S , and T be subsets of a linear space W . Then (a) R ⊂ span(R) . (b) R ⊂ S =⇒ span(R) ⊂ span(S ). (c) R ⊂ span(S ) and S ⊂ span(T ) =⇒ R ⊂ span(T ). (d) S ⊂ span(T ) =⇒ span(S ) ⊂ span(T ). (e) span(span(T )) = span(T ). (f) A vector Xj ∈ S is linearly dependent on the other elements of S ⇐⇒ span(S ) = span(S − { Xj }) . (Here S − { Xj } means the set A with the one vector Xj deleted). Proof: These all depend on the representation of span(S ) as a linear combination of the elements of S . (a) and (b)—Obvious. They really should be if you understand the definitions. (c). A direct translation of Theorem 7. (d). This is the special case R = span(S ) of part c. (e). By part (a) span(span(T )) ⊃ span(T ) . The opposite inclusion span(span(T )) ⊂ span(T ) is the special case S = span(T ) of part (d). (f). Xj linearly dependent on S − { Xj } =⇒ S ⊂ span(S − { Xj }) . Thus by part (d), span(S ) ⊂ span(S −{ Xj }) . Inclusion in the opposite direction span(S −{ Xj }) ⊂ span(S ) follows from part (b). Therefore span(S ) = span(S − { Xj }) means that Xj ∈ span(S ) can be expressed as a linear combination S − { Xj } , i.e., the other Xk ’s. Now most likely this proof was your first taste of abstract juggling and you find it difficult. Relax and don’t be impressed with how formidable it appears. Except for parts a and b, the whole business hinges on the explicit construction of Theorem ?. 
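Deciding whether a particular vector lies in the span of a given finite set is likewise a finite computation: one asks whether a certain linear system has a solution. A minimal sketch (Python with numpy, using the vectors of Example (1) above; the helper in_span is of course just for illustration):

    import numpy as np

    X1, X2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    A = np.column_stack([X1, X2])       # the columns span the x1, x2 plane of Example (1)

    def in_span(A, Y, tol=1e-10):
        # Solve A a = Y in the least-squares sense; Y lies in the span of the columns
        # of A exactly when the residual vanishes.
        a, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return float(np.linalg.norm(A @ a - Y)) < tol

    print(in_span(A, np.array([3.0, -2.0, 0.0])))   # True:  (3, -2, 0) = 3 X1 - 2 X2
    print(in_span(A, np.array([0.0, 0.0, 1.0])))    # False: the third coordinate is out of reach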
Since (d) is a special case of (c), a good exercise is to write out the proof of (d) without relying on (c). In R2 , let X1 = (1, 0) , X2 = (0, 1) , and X3 be any vector in R2 . Observe that X1 and X2 together span R2 . Thus X3 can be expressed as a linear combination of X1 and X2 , so that X1 , X2 , and X3 are linearly dependent. The next theorem is a generalization of this idea. 92 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE Theorem 2.10 . If a finite set A = { Xj , j = 1, . . . , n } spans a linear space W , then ev˜ ery set S = { Yj ∈ V, j = 1, . . . , m > n } with more than n elements is linearly dependent. In other words, every linearly independent set has at most n elements. ˜ Proof: Pick any n + 1 elements Y1 , . . . Yn+1 from S and throw the rest away. Call the new set S . We shall show that these n + 1 elements are linearly dependent. Then, since ˜ ˜ S ⊂ S , Theorem ? tells us that S is also linearly dependent. The only problem is how to carry out the proof without getting involved in a mess of algebra. By the principle of conservation of effort, this means that there will be some fancy footwork. Reasoning by contradiction, assume S is linearly independent. If we can show that span(A) = span(S − { Yn+1 }) , then span(S ) ⊂ span(S − { Yn+1 }) because span(S ) ⊂ V = span(A) = span(S − { Yn+1 })) . Since span(S − { Yn+1 }) ⊂ span(S ) , we can apply part f of Theorem ? to conclude that S is linearly dependent - the desired contradiction. Thus, assuming S = { Y1 , . . . , Yn+1 } is linearly independent, we are done if we prove that span(A) = span(S − { Yn+1 }) . Consider the set Bk = { Y1 , . . . , Yk , Xk+1 , . . . , Xn } . We know that B0 = A , so that span(B0 ) = span(A) = W . Then by induction we shall prove that span(Bk ) = W =⇒ span(Bk+1 ) = W . Since span(Bk ) spans W , then Yk+1 is a linear combination of the elements of Bk . Because the Y ’s are assumed linearly independent, this linear combination must involve at least one of Xk+1 , . . . , Xn . Say it involves Xk+1 (if not, relabel the X ’s to make it so). Then we can solve for Xk+1 as a linear combination of span(Bk+1 ) . Therefore W = span(Bk ) = span(Bk+1 ) . Putting this part together, we find that span(A) = W = span(B0 ) = span(B1 ) = . . . = span(Bn ) . But Bn = S − { Yn+1 } . Thus span(A) = span(S − { Yn+1 }) , and the proof is completed. Example. In R2 , any three (or more) non-zero vectors are linearly dependent since the two vectors X1 = (1, 0) and X2 = (0, 1) span R2 . Exercises (1) (a) In P2 (b) In R3 , p1 (x) = 1, p2 (x) = 1 + x, p3 (x) = x − x2 X1 = (0, 1, 1), X2 = (0, 0, −1), X3 = (0, 2, 3) . (c) In C [0, π ], f (x) = sin x, g (x) = cos x . (d) In Rn , e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, 0), . . . , en = (0, 0, . . . , 0, 1) . (2) Use the result of (d) to show that any set of n + 1 vectors in Rn must be linearly dependent. (3) (a) Find a set which spans i) P3 , ii) R4 (b) Show that no finite set spans l1 . (4) Let X1 , . . . , Xk be any elements of a linear space V . (a) Prove that span({ X1 , . . . , Xk }) = span({ X1 + aXj , X2 , . . . , Xk }) , where a is any scalar and Xj is any of the X2 , X3 , . . . , Xk , . (b) Prove that span({ X1 , . . . , Xk }) = span({ aX1 , X2 , . . . , Xk }), a = 0 . 2.4. BASES AND DIMENSION 93 (c) In Rn , consider the ordered set of vectors { X1 , X2 , . . . , Xk } , where Xj = (x1j , x2j , . . . , xnj ) . 
They are said to be in echelon form if i) no Xj is zero, and ii) the index of the first non-zero entry in Xj is less than the index of the first nonzero entry in Xj +1 , for each j = 1, . . . , k − 1 . Thus X1 = (0, 1, 0), X2 = (0, 0, 1) are in echelon form while X1 = (0, 1, 0), X2 = (1, 0, 1) are not in echelon form. Prove that any set of vectors in echelon form is always linearly independent. (I suggest a proof by induction). (5) For what real value(s) of the scalar α are the vectors (α, 1, 0), (1, α, 1) and (0, 1, α) in R3 linearly dependent? (6) (a) In R3 , let X1 = (3, −1, 2) . Express (−6, 2, −4) linearly in terms of X1 . Show that (3, 4, −7) cannot be expressed linearly in terms of X1 . Can (1, 2, 1) be expressed linearly in terms of X1 ? (b) In R3 , let A = { X1 , X2 } , where X1 = (1, 3, −2) and X2 = (2, 1, 1) . Express (3, −1, 4) linearly in terms of A . Show that (0, 0, 2) cannot be expressed linearly in terms of A . Can (0, 5, −5) be expressed linearly in terms of A ? (7) (a) In C [0, 10] , let f1 , . . . , f8 be defined by f1 (x) = x2 − x + 2, f2 (x) = (x + 1)2 f3 (x) = x + 3 f4 (x) = 1 f5 (x) = x3 f6 (x) = sin x f7 (x) = cos x f8 (x) = sin(x + π/4). Let A = { f1 , f2 , f3 } . Express f4 linearly in terms of A . Show that f5 cannot be expressed linearly in terms of A . Is f6 ∈ span(A) ? Is f8 ∈ span(f6 , f7 ) ? Is f6 ∈ span(f5 , f7 , f8 ) ? (b) If we let f9 (x) = (x − 1)3 , f10 (x) = 2x − 1 , determine which of the following sets are linearly dependent: (i) (ii) (iii) (iv) 2.4 { f1 , f3 , f10 }, { f1 , f5 , f9 }, { f3 , f4 , f10 }, { f1 , f4 , f5 , f9 }, Bases and Dimension If the set { X1 , . . . , Xm } spans the linear space W , is there any set with less than m vectors which also spans W ? There certainly is if the { X1 , . . . Xm } are linearly dependent, for if say Xm depends linearly upon the { X1 , . . . , Xm−1 } , then by Theorem ??, span({ X1 , . . . , Xm }) = span({ X1 , . . . , Xm }) = W , so then { X1 , . . . , Xm−1 } span W . We can continue and eliminate the extra linearly dependent elements until we obtain a set { X1 , . . . , Xn } of linearly independent vectors which still span W . Definition. A set of vectors Xj ∈ W, j = 1, . . . , n which is i) linearly independent, and ii) spans W is called a basis for W . Examples. 94 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE (1) In R2 , the vectors X1 = (1, 0) and X2 = (0, 1) are linearly independent and span R2 . Therefore X1 and X2 form a basis for R2 . The vectors X3 = (3, −1) and X4 = (−2, 2) in R2 are also linearly independent and span R2 . They thus constitute another basis for R2 . Almost any two vectors in R2 span R2 , as long as they do not point on the same or opposite direction. (2) In P2 , the polynomials p1 (x) = 1 , and p2 (x) = x − x2 do not form a basis. They are linearly independent but do not span the space - since for example you can never obtain the polynomial p(x) = x which is in P2 . If we add the third polynomial, say p3 (x) = x − 2x2 , then p1 , p2 and p3 do form a basis for P2 . Bases have an important property. Theorem 2.11 . If { X1 , . . . , Xn } form a basis for the linear space W , then every X ∈ W can be expressed uniquely as a linear combination of the Xj ’s. Remark: Every set which spans W has, by definition, the property that every X ∈ W can be expressed as a linear combination of the Xj ’s. The point here is that for a basis, this linear combination is uniquely determined. n Proof: Suppose that X = n ak Xk and also X = 1 bk Xk . 
We must show that ak = bk 1 n ck Xk , where ck = ak − bk . for all k . Subtracting the two equations we find that 0 = 1 But since the Xk ’s are linearly independent, by the Corollary to Theorem 5, the only way a linear combination can be zero is if ck = 0, k = 1, . . . , n , that is, ak = bk for all k . We have observed that a linear space may have several different bases. Is it possible that different bases contain a different number of elements? Our next theorem states that the answer is NO. Theorem 2.12 . If a linear space W has one basis with a finite number of elements, say n , then all other bases are finite and also have exactly n elements. Proof: We invoke Theorem ?. Let A be a basis with n elements and B be a basis with m elements. Now A spans W and the elements of B are linearly independent, so the Theorem ?, m ≤ n . Reversing the roles of A and B we find that n ≤ m . Therefore n = m. With this result behind us, we can now define the dimension of a linear space. Definition. If a linear space W has a basis with n elements, then we say that the dimension of W is n . If a linear space W has the property that no finite set of elements spans it, we say it is infinite dimensional. Remarks. Theorem ? states that the dimension of W is independent of which basis we happened to pick. If we want to emphasize the dimension of a finite dimensional space, we will write W n . Announcement. The dimension of Rn is n , for the n elements e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) are linearly independent and span Rn . A picture. We have seen that e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) form a basis in R3 . Thus every X ∈ R3 can be expressed uniquely as a linear combination of the ej ’s, X = a1 e1 + a2 e2 + a3 e3 . If we represent e1 as a directed line segment from the origin to (1, 0, 0) , and similarly for e2 and e3 , then X is the geometrical sum of a1 e1 + a2 e2 + a3 e3 , 2.4. BASES AND DIMENSION 95 and is represented as a directed line segment from the origin to (a1 , a2 , a3 ) . In R3 , e1 is usually written as i , e2 as j and e3 as k , so that a vector X ∈ R3 is written as X = a1 i + a2 j + a3 k . The points in the plane x3 = 0 , which is isomorphic to R2 , are then represented as ˆ X = a1ˆ + a2 ˆ + 0k = a1ˆ + a2 ˆ . We would retain this notation except that one runs out of i j i j letters when considering spaces of higher dimension. For that reason the subscript notation e1 , e2 , . . . is better suited to our purposes. It behooves us to show that the linear space C [0, 1] of functions continuous in the interval [0, 1] is infinite dimensional. This will be done by proving that the functions f0 (x) = 1, f1 (x) = ex , f2 (x) = e2x , . . . , fn (x) = enx , . . . are linearly independent. Assume N ak ekx , where N is any non-negative integer. We must show that all the ak ’s that 0 = k=0 are zero. The trick is to use induction. For N = 0 , we know that 0 = a0 only if a0 = 0 . N −1 ak ekx = 0 if and only Suppose 1, ex , e2x , . . . , e(N −1)x are linearly independent. Then k=0 N ak ekx = 0 if and only if if all of the ak ’s are zero. Let us show that this implies that k=0 all the ak ’s vanish. Take the derivative. The constant term drops out and we are left with 0 = a1 ex + 2a2 e2x + · · · + N aN eN x . Factor out ex 0 = ex (a1 + 2a2 ex + · · · + N aN e(N −1)x ). Since ex is never zero, we know that 0 = a1 + 2a2 ex + · · · + N aN e(N −1)x . By our induction hypothesis, this linear combination of !, ex , . . . 
, e(N −1)x can be zero if and only if a1 = a2 = a3 = · · · = aN = 0 . It remains to show that a0 = 0 . This is an N ak ekx = 0 and the vanishing of ak , for k ≥ 1 . immediate consequence of k=0 Since the functions 1, ex , e2x , . . . , are in C k [a, b] for any k we have shown that these spaces are infinite dimensional too. Moreover, the exact same proof also shows that the set { eα1 x , eα2 x , . . . , eαN x } , where α1 , . . . , aN are arbitrary distinct complex numbers, is linearly independent. This fact will be needed later. Perhaps we shall present a different proof - or several different ones - at that time. All of the other proofs still involve some calculus - but that should be no surprise since we used calculus to define the exponential function in the first place. Not all spaces of functions are infinite dimensional. For example, the function space A = { f ∈ C [−1, 1] : f (x) = a + bex , a, b ∈ R } has dimension 2. The functions f1 (x) = 1 and f2 (x) = ex constitute a basis for A because every f ∈ A can be written in the form f = a1 f1 + a2 f2 , where a1 and a2 are real numbers. Another basis for A is f3 (x) = 1 + ex and f4 = 2 − ex . There are many ways to see this. One is to observe that f3 + f4 = 3 b and 2f3 − f4 = 3ex . Thus if f (x) = a + bex ∈ A , then f = a (f3 + f4 ) + 3 (2f3 − f4 ) = 3 a 2b a b ( 3 + 3 )f3 + ( 3 − 3 )f4 . 96 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE The function space B = { f ∈ C [−1, 1] : f (x) = a sin(x + α), α, a ∈ R } , also has dimension two, since f (x) = (a cos α) sin x + (a sin α) cos x = a1 sin x + a2 cos x . Thus f1 (x) = sin x and f2 (x) = cos x form a basis. Actually, we have only shown that f1 and f2 span B , but not that they are linearly independent. You can settle that point yourselves. A few more remarks should be added. If A is a subspace of an n dimensional space n , we would like to enlarge a basis { e , . . . , e } for A to a larger basis { e , . . . , e } for W 1 1 n k all of W . Since A ⊂ W n , it is clear that k = dim A ≤ n . If A = W n , we are done since { e1 , . . . , ek } already span W n . Otherwise there is some element ek+1 in W n which is not in A . Let A1 = span{ e1 , . . . , ek+1 } ⊂ W n . If A1 = W n , then { e1 , . . . , ek+1 } form a basis for W n . Otherwise there is some element ek+2 in W n which is not in A1 . Form A2 = sp{ e1 , . . . , em+2 } . Repeat this process until you finally get a basis for all W n . Only a finite number of steps are needed since the dimension of W n is finite. This proves Theorem 2.13 . If A is a subspace of (finite dimensional) space W , then any basis for A can be extended to a basis that spans all of W . Consider a subspace A of a linear space W . Somehow we would like to discuss - and give a name to - the part A of V which is not in A . We would like A to be a subspace of V such that the only element of V which A and A share is 0, and such that every element in V can be written as the sum of an element in A and an element in A . Definition: . Let A be a subspace of the linear space V . A complementary subspace A of A is a subset of V with the properties 1. A is a subspace of V , 2. If X ∈ V , then X = X1 + X2 , where X1 ∈ A and X2 ∈ A . 3. A ∩ A = 0 . (The zero vector, not the empty set). Our first task is to prove Theorem 2.14 . Every subspace A ⊂ V has at least one complement A . . Proof: Let { e1 , . . . , em } be a basis for A , and { e1 , . . . em , em+1 , . . . , en } an extension to a basis for V . We shall verify that A = sp{ em+1 , . . . 
, en } satisfies both criteria. Now if X ∈ A and X ∈ A , then we can write X = a1 e1 + . . . + am em ∈ A , and X = am+1 ee+1 + . . . + an en ∈ A . Subtracting these equations, we find 0 = a1 e1 + . . . + am em − am+1 em+1 − . . . − an en . But since { e1 , . . . , en } is a basis for V , the elements are linearly independent. Thus a1 = a2 = . . . = am = am+1 = . . . = an = 0 , so X = 0 . Therefore A ∩ A = 0 . Furthermore, if X ∈ V since { e1 , . . . , en } is a basis for V , then n X= cj ej + cj ej = j =1 n m j =1 cj ej . j =m+1 Thus we just let X1 = c1 e1 + . . . + cm em ∈ A and X2 and X2 = cm+1 em+1 + . . . + cn en . It is easy to see that the above construction of A is independent of the basis chosen for A . This is because the construction of em+1 , . . . , en (Theorem ??) did not depend on the particular basis for A . That construction only utilized the fact that we can pick elements 2.4. BASES AND DIMENSION 97 not in A . However, the construction of A does depend on which elements em+1 , . . . en (not in A ) we pick. For example, let V = R2 , and A be some one dimensional subspace. Then we pick e , as any vector in A , and e2 as any vector not in A . The resulting complement A is then the span of e2 . But { e1 } could have been extended to a basis ˜ for V by choosing another vector e2 A . This determines a different complement A of ˜ A . A subspace has many possible complements. This ambiguity will not bother us since we shall only use the properties of a particular complement which do not depend on which particular complement is chosen. The dimension of the complement is such a property. It only depends on the dimension of the subspace A and the larger space V , and has the reasonable formula dim A = dim V − dim A , which we now prove. Theorem 2.15 . If A is a subspace of a linear space V and if A is any complement of A , then dim A + dim A = dim V. Thus, the dimension of A is determined by A and V alone. Proof: The dim A and dim V are given data. We shall compute dim A . Since the union of a basis for A with a basis for any A spans V (property 2), it is clear that dim A + dim A ≥ dim V . However A and any A intersect only at the origin (property 3) and are subspaces of V . Thus the union of their bases can span at most V , that is, dim A + dim A ≤ dim V . These two inequalities prove the theorem. REMARK. Some people refer to dim A as the codimension of A (complementary dimension). In this way they avoid mentioning A at all. The last theorem can be written as dim A + codim A = dim V . A simple result closes the chapter. Theorem 2.16 . If A is a subspace of V and A is a complement of A , then for X ∈ V the decomposition X = X1 + X2 , X1 ∈ A, X2 ∈ A is unique. ˜ ˜ Proof: Assume there are two decompositions, X = X1 + X2 and X = X1 + X2 . Then ˜ 1 + X2 = X1 + X2 or X1 − X1 = X2 − X2 . However the left side of this equation is in ˜ ˜ ˜ X ˜ A while the right is in A . The only element in both A and A is 0. Thus X1 = X1 and ˜ X2 = X2 . a figure goes here EXERCISES (1) (a) Let A = { X ∈ R2 : x1 = 0 } . Find a basis for A and extend it to a basis for all of R2 . Use this to define a complement A of A . Sketch A and A . Extend the same basis for A in a different way to a basis for all of R2 . Use this to ˜ ˜ define another complement A of A . Sketch A . (b) Find a basis for the subspace A = { X ∈ R2 : x1 + x2 + x3 = 0 } . Extend this basis to one for all of R3 . Define a complement A of A induced by this extension. Write X = (−1, 0, 7) as X = Y1 + Y2 where Y1 ∈ A and Y2 ∈ A . 
(2) (a) Let A = { p ∈ P2 : p(0) = 0 } . Find a basis for A and extend it to a basis for all of P2 . Define A induced by this extension. Is the particular polynomial p(x) = 1 + x2 in A ? in A ? Write p(x) as p(x) = q1 (x) + q2 (x) where q1 (x) ∈ A, q2 (x) ∈ A . 98 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE (b) Let A = { p ∈ P2 : p(1) = 0 } . Find a basis for A and extend it to a basis for all of P2 . (3) Let A be a subspace of a linear space V . Show by an example that a basis for V need not contain a basis for A . (4) If dim V = n and V = sp{ X1 , . . . , Xn } , prove that X1 , . . . , Xn are linearly independent. (5) Let V = P4 and A the subspace spanned by 1, x2 and x4 . Find three different subspaces complementary to A (you may specify a subspace by giving a basis for it). After all this about bases, it is probably best to notify you that properties of linear spaces are best defined and proved without introducing a particular basis. As soon as you define a property of a linear space in terms of a basis, you must then prove that the property is intrinsic to the space itself and does not depend upon the basis you choose. We met this problem in defining the dimension in terms of a basis - and were consequently forced to prove Theorem ? which stated that the property really only depended on the space itself, not on the basis chosen. This, in fact, corresponds to one of the major requisites for laws of physics: they should not depend upon the particular coordinate system you choose (picking a coordinate system is equivalent to picking a basis). Moreover, the laws should not depend on the units you choose for each axis of the coordinate system. But these are long, involved questions which must be investigated deeply to make our remarks precise. One should, however, distinguish theoretical issues from computational ones. In theoretical questions, the rule is never pick a specific basis unless there is no way out. On the other hand, for computational questions you must always pick a basis. Just as in physics, on order to perform any measurements, you must pick some specific coordinate system and specific units. If the theoretical foundations are firm, then you can feel confident that no matter what choice of basis you make, the essential nature of the results will remain unchanged. As an example, let us consider a point P and two different fixed coordinate systems in the plane of this paper. You should feel that any motion of the point P can be described adequately in either coordinate system - and that when the observers in the two coordinate systems get together and discuss the motion of P , they will agree as to what happened. A common example is the meeting of two people from countries using different units of money. Exercises (1) Prove that any n + 1 elements in a linear space of dimension n must be linearly dependent. (2) Prove that Pn has dimension n + 1 . (3) Since a basis for a linear space of dimension n must contain exactly n elements, all one must test is that the n elements which are candidates for a basis are linearly independent - or equivalently that they span the space. Show that the vectors { X1 , . . . , Xn } form a basis for Rn if and only if e1 , e2 , . . . , en can all be expressed as a linear combination of the { X1 , . . . , Xn } . 2.4. BASES AND DIMENSION 99 (4) Use Exercise 3 to determine which of the following sets from bases for R3 . (a) X1 = (1, 1, 0), X2 = (1, 0, 1), X3 = (0, 1, 1). (b) X1 = (1, 0, 1), X2 = (1, 1, 1). 
(c) X1 = (1, 0, 1), X2 = (1, 1, 0), X3 = (0, −1, 1). (d) X1 = (1, 1, 1), X2 = (1, 2, 3), X3 = (17, 3, 9), X4 = (−2, 7, −1). 11 (e) X1 = (−1, 0, 2), X2 = (1, 1, 1), X3 = ( 2 , 3 , −1). (5) Prove that the subspace of functions in C [0, π ] which vanish at x = 0 and at x = π is infinite dimensional by showing that the functions f1 (x) = sin x, f2 (x) = sin 2x, . . . , fk (x) = sin kx, . . . are all linearly independent. [Hint: Assume that 0 = N ak sin kx , for arbitrary N and show that all the ak ’s must be zero by multiplying k=1 both sides by sin nx and utilizing the important formula π sin nx sin kx dx = 0 0 π 2 , k=n , k = n. .] (6) Let C ∗ [a, b] denote the set of all complex-valued functions f (x) = u(x) + iv (x) which are continuous for x ∈ [a, b] . The complex number field C is the field of scalars for C ∗ . What is the dimension of the subspace A = { f ∈ C ∗ [−π, π ] : f (x) = aeix + be−ix , a, b ∈ C } ? Show that f1 (x) = cos x and f2 (x) = sin x constitute a basis for A . [Hint: Use (?) on p. ?]. (7) Which of the following sets of vectors form a basis for R4 ? (a) X1 = (1, 0, 0, 5), X2 = (0, 3, 2, 6), (b) X1 = (1, 6, 7, 0), X2 = (−2, 2, 5, 0), X3 = (4, 5, 6, 0), X4 = (7, 8, 3, 0). (c) X1 = (1, 2, 5, 7), X5 = (0, 0, 0, 1). X2 = (4, 9, 11, 8), X3 = (6, 3, 12, 2), X4 = (3, −4, 7, 6), (d) X1 = (1, 2, 3, 4), X2 = (0, 2, 3, 4), (8) Find a basis for the following subspaces. (a) A = { X ∈ R2 : x1 + x2 = 0 } (b) B = { X ∈ R3 : x1 + x2 + x3 = 0 } (c) C = { p ∈ P3 : p(0) = 0 } (d) D = { p ∈ P3 : p(1) = 0 } (e) E = { u ∈ C 1 [−1, 1] : u − u = 0 } (f) F = { u ∈ C 1 [−1, 1] : u + 2u = 0 } . X3 = (0, 0, 1, 2), X3 = (0, 0, 3, 4), X4 = (0, 0, 0, 1). X4 = (0, 0, 0, 4). 100 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE Chapter 3 Linear Spaces: Norms and Inner Products 3.1 Metric and Normed Spaces Until now we have been contented with being able to add two elements X1 and X2 of a linear space, and to multiply them by scalars, aX . Since only these algebraic operations have been defined, only algebraic questions could have been raised and answered. Notably absent were any mention of convergence, because the idea of one element of a linear space being “close” to another was not defined. In this chapter we shall introduce a distance or metric structure into linear spaces. Instead of lingering in the realm of generalities, we shall define metric and norm in this first section and devote the balance of the chapter to a particular kind of metric which generalizes the “Pythagorean distance” of ordinary Euclidean space. Fourier series supply a wonderful and valuable application. Our first notion of distance, that of a metric, makes sense for elements X, Y, Z of an arbitrary set S . The idea is to define the distance d(X, Y ) between any two elements of S . This distance is a function which assigns to every pair of points (X, Y ) a positive real number d(X, Y ) called the “distance between X and Y ”. Definition. Let S be a non-empty set. A metric on S is a real-valued function d : S × S → R , where X, Y ∈ S , which has the three properties: i) d(X, Y ) ≥ 0. d(X, Y ) = 0 ⇐⇒ X = Y ii) (symmetry) d(X, Y ) = d(Y, X ) , iii) (triangle inequality) d(X, Z ) ≤ d(X, Y ) + d(Y, Z ) . Well, they certainly are reasonable requirements for any function we intend to think of as measuring distance. Examples. (1) This first example is trivial but acts as an important check on intuition. 
With it, you see that every non-empty set can be regarded as a metric space with the following metric 0 , if X = Y d(X, Y ) = 1 , if X = Y. A moments reflection will show that this is a metric—but not too useful since it is so coarse. 101 102 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (2) For the real line, R , with the usual definition of absolute value we define d(X, Y ) = |X − Y | , which is clearly a metric. |X − Y | (3) Another less common metric may be given to R . We define d(X, Y ) = 1+|X −Y | . Only the triangle inequality is not evident—and that involves some algebra. This metric has the property that the distance between any two points is always less than one, d(X, Y ) < 1 for all X, Y ∈ R . (4) Rn can be endowed with many metrics. Let X = (x1 , x2 , . . . , xn ) , Y = (y1 , . . . , yn ) and Z = (z1 , . . . zn ) be arbitrary points in Rn . The metric you most expect is the Euclidean distance n 2 2 1/2 d(X, Y ) = [(x1 − y1 ) + . . . + (xn − yn ) ] (xk − yk )2 ]1/2 =[ k=1 Again, only the triangle inequality is not obvious. It is a consequence of the CauchySchwarz inequality 2 n n n x2 k ≤ xk yk k=1 2 yk , (3-1) k=1 k=1 which in turn is an immediate consequence of the algebraic identity 2 n xk yk n n k=1 2 yk − x2 k = k=1 k=1 1 2 n n (xi yj − xj yi )2 . i=1 j =1 And now the triangle inequality. Let ak = xk − yk , and bk = yk − zk . Then xk − zk = ak + bk . Thus, using Cauchy- Schwarz in the second line below, we find that n [d(X, Z )]2 = n n (ak + bk )2 = k=1 n n a2 + 2 k ≤ k=1 a2 k k=1 1/2 n a2 k = k=1 b2 k b2 k (3-2) k=1 1/2 2 n b2 k + k=1 n + k=1 b2 k ak bk + k=1 1/2 k=1 n n a2 + 2 k = [d(X, Y ) + d(Y, Z )]2 , k=1 so d(X, Z ) ≤ d(X, Y ) + d(Y, Z ). Another proof of the Schwarz and triangle inequalities for this metric will be given later in the chapter. (5) A second metric for R is n |xk − yk | d(X, Y ) = k=1 The axioms for a metric are easily verified. 3.1. METRIC AND NORMED SPACES 103 (6) A third metric for Rn is 1/p n p |xk − yk | d(X, Y ) = , 1 ≤ p < ∞. k=1 Example 4 is the special case p = 2 , while example 5 is the special case p = 1 . And again, all but the triangle inequality are obvious. However the triangle inequality, called Minkowski’s inequality in this general case, is not simple. We shall not prove it here. Perhaps it will appear as an exercise later. (7) The usual metric for C [a, b] is the uniform metric d(f, g ) = max |f (x) − g (x)| . a ≤ x≤ b Geometrically, this distance is the largest vertical distance between the graphs of f and g for all x ∈ [a, b] . (8) The space L1 [a, b] of functions whose absolute value is integrable has the “natural” metric b |f (x) − g (x)| dx, d(f, g ) = a which can be interpreted as the total area between the two curves. Since every function which is continuous for x ∈ [a, b] is integrable there, i.e., C [a, b] ⊂ L1 [a, b] , this metric is another metric for C [a, b] . (9) For the function space C 1 [a, b] , the standard metric is d(f, g ) = max |f (x) − g (x)| + max f (x) − g (x) a ≤ x≤ b a ≤ x≤ b The metric for C k [a, b] is defined similarly. There are many theorems one can prove about metric spaces (a metric space is a set S on which a metric is defined). Look in any book on general topology (or point set topology, as it is often called) and you will find more than enough to satisfy you. For most of our purposes metric spaces are too general. Normed linear spaces will suffice. The norm X of an element X in a linear space V is the “distance” of X from the origin—the 0 element of V . Definition. 
Let V be a linear space over the real or complex field. If to every element X ∈ V there is associated a real number X , the norm of X , which has the three properties i) X ≥ 0. X = 0 ⇐⇒ X = 0 ii) aX = |a| iii) X + Y ≤ X + Y , (triangle inequality), X (homogeneity), a is a scalar, then we say that V is a normed linear space. How does a norm differ from a metric? First of all, a norm is only defined on a linear space (since aX and X + Y appear in the definition) whereas a metric may be defined on any set (cf. example 1 above). But if we 104 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS restrict our attention to linear spaces, how do the concepts of norm and metric differ? Every normed linear space can be made into a metric space in such a way that X is indeed the distance of X from the origin, d(X, 0) = X . The explicit formula for d(X, Y ) should surprise no one d(X, Y ) = X − Y . It is easy to check that d(X, Y ) is a metric. Thus every normed linear space has a “natural” metric induced upon it. However, a linear space which has a metric need not be a normed linear space. For example in R , the linear space of the real numbers, the metric of example 3 |X − Y | d(X, Y ) = 1 + |X − Y | is not associated with a norm because axiom ii) for a norm is not satisfied. Of the examples considered earlier, all but the first and third metrics arise from norms, in the sense that d(X, Y ) = d(X − Y, 0) = X − Y . By far the most common norm in Rn is that given by the Pythagorean theorem (example 4). Then 1/2 n X= x2 1 + x2 2 + ··· + 2 kn x2 x = k=1 and the induced metric is 1/2 n 2 d(X, Y ) = X − Y = (xk − yk ) k=1 For obvious historical reasons, we shall refer to R2 with this Pythagorean norm as Euclidean n -space, and denote it by En . Note that En is a linear space with a particular way of measuring length specified. A metric removes the floppiness from Rn , giving the additional structure needed to investigate those geometrical concepts which utilize the notion of distance. Once we have a norm (or metric) it becomes possible to discuss convergence of a sequence of elements. Definition: If V is a normed linear space, the sequence Xn ∈ V converges to X ∈ V if, given any > 0 , there in an N such that Xn − X < for all n > N. As an example, we shall prove the sample (n) (n) (n) Theorem 3.1 . A sequence of points Xn = (x1 , x2 , . . . xk ) in Ek converges to the (n) point X = (x1 , . . . , xk ) in Ek if and only if each component xj converges to its respective (n) limit, lim xj n→∞ = xj , j = 1, . . . , k . (n) Proof: i) Xn → X ⇒ xj (n) xj − xj ≤ → xj . This is a consequence of the trivial inequality (n) (n) (x1 − x1 )2 + · · · + (xk − xk )2 = Xn − X ; 3.1. METRIC AND NORMED SPACES 105 (n) (n) for if Xn − X < for n > N , then xj − xj < for n > N too. Thus xj → xj . If the subscripts are cluttering up the proof, go through it again in a special case, say (n) x2 → x2 . (n) ii) xj → xj ⇒ Xn → X . By hypothesis, given any > 0 , there are numbers (n) N1 , N2 , . . . Nk such that x1 − x1 < N2 , . . . , (n) xk − xk < work for all the (n) xj (n) , for all n > N1 , x2 − x2 < for all n > for all n > Nk . Pick N = max(N1 , N2 , . . . , Nk ) . This N will ’s, that is, for every j , (n) xj − xj < for all n > N. Thus Xn − X = < (n) (n) (x1 − x1 )2 + . . . + (xk − xk )2 √ 2 + ... + 2 = k, for all n > N. (3-3) Since k is a fixed finite number, this shows that Xn − X may be made arbitrarily small by picking n big enough, so Xn does converge to X . 1 n Example. 
In E4 , the sequence Xn = ( n+1 , 2, − n , 0) converges to X = (1, 2, 0, 0) since n 1 n+1 → 1, 2 → 2, − n → 0 , and 0 → 0 . A useful elementary result is Theorem 3.2 . If V is a normed linear space, and if Xn → X, Yn → Y in V , then for any scalars a and b, aXn + bYn → aX + bY . Proof: There are essentially no changes from the case of R1 . We must show that aXn + bYn − aX − bY can be made arbitrarily small by picking n large enough. One application of the triangle inequality aXn + bYn − aX − bY ≤ aXn − aX + bYn − bY , and the homogeneity of a norm, yields ≤ |a| Xn − X + |b| Yn − Y . Because Xn → X and Yn → Y , if n > N1 , then Yn − Y < . Pick N = max(N1 , N2 ) . Thus Xn − X < . Also, if n > N2 , then aXn + bYn − aX − bY < |a| + |b| = (|a| + |b|) , n > N, and the desired convergence is proved. For a given linear space V , there may be two (or even more) norms defined, say and 1 to distinguish them. Why carry them both around? First of all, a sequence may converge in one norm and not in the other. Second, even if both norms yield the same convergent sequences, one norm may be more convenient in some particular computation. Example. Consider the linear space C [−1, 1] of functions f (x) continuous for x ∈ [−1, 1] , with the two norms (Examples 7 and 8) 1 f ∞ = max |f (x)| − 1 ≤ x≤ 1 ; f 1 |f (x)| dx, = −1 106 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS that is, the uniform norm and the L1 norm. We shall exhibit a sequence of functions which converge in the second norm but not in the first. Let fn (x) be 1 x ∈ [−1, − n2 ] 0, 3 1 1 n (x + n2 ) x ∈ [− n2 , 0] fn (x) = 3 (x − 1 ) x ∈ [0, 1 ] −n n2 n2 1 0 x ∈ [ n2 , 1] Then by inspection from the graph ( fn ∞ = n , and 1 is the area), we see that 1 fn 1 = n . As n → ∞, fn − 0 1 → 0 so that fn → 0 in the L1 norm. On the other hand, fn ∞ → ∞ so the limit does not exist in the uniform norm. If you look at the 1 1 graph, fn is zero except for a spike in the interval [− n2 , n2 ] . As n → ∞ , the function is zero in essentially the whole interval, except for the bit around the origin where it blows up—but it blows up slowly enough that the area under the curve tends to zero. However, we can prove the Theorem 3.3 . Let fn and f be continuous functions, n = 1, 2, . . . . If fn → f in the uniform norm, then also fn → f in the L1 norm. Remark. We have just seen that the converse is false. Proof: An immediate consequence of the Lemma 3.4 fn − f 1 ≤ (b − a) fn − f ∞ Proof: b fn − f 1 b |fn (x) − f (x)| dx ≤ = fn − f n ∞ dx a (3-4) b = fn − f dx = (b − a) fn − f ∞ a Exercises (1) In the set Z , define d(m, n) = |m − n| where |x| is ordinary absolute value. Prove that d(m, n) is a metric. (2) Suppose that d1 (X, Y ) and d2 (X, Y ) are both metrics for a set S , where X, Y ∈ S . a). Show that [d1 (X, Y )]2 is not, in general, a metric. b). Prove that d1 + d2 and d2 + d2 are also metrics for S . 1 2 (3) Prove that the function d(X, Y ) = a norm on R . |X − Y | 1+|X −Y | , (4) Let X = (x1 , . . . , xk ) ∈ Rk . Define (a) Prove that X ∞ X ∞ X 1 1≤ l ≤ k 1/2 k |xl | , and = = max |xl | . is a norm for Rk , and write down the induced metric. k (b) Let X, Y ∈ R , is a metric, but that it is not X 2 2 |xl | = l=1 . l=1 Prove X ∞ ≤X 2 ≤X 1 ≤k X ∞. 3.2. THE SCALAR PRODUCT IN E2 107 1 1 (c) Consider the sequence Xn = (1 − n , −7, n2 ) in R3 . In which of the norms ∞, 2, 1 does it converge, and to what? (5) Let Xn be a sequence of elements in a normed linear space V (not necessarily finite dimensional). 
Prove that if Xn → X , then the sequence Xn is bounded in norm (a sequence Xn in a normed linear space is bounded if there is an M ∈ R such that Xn ≤ M for all n ). [Hint: Compare with Theorem 6, page ??]. (6) Compute the R3 . 1, 2 , and (cf. Ex. 4) norms of the following vectors in ∞ b) Y = (2, −2, 1) , a) X = (1, 2, 2) , (7) Compute the [−1, 1] . 1, 2 , and a) f (x) = −2x + 3 ∞ c) Z = (0, 3, −4) , d) W = (0, −1, 0) . norms of the following functions for the interval b) g (x) = sin πx , c) hn (x) = xn d) As n → ∞ , does the sequence hn converge in any of these three norms? (8) Which of the following define norms for the given linear spaces? a) For R3 , [X ] = x2 + x2 + x2 x 1 3 b) For P3 , [p] = max p(x) 0 ≤ x≤ 1 c) For P3 , [p] = max |p(x)| 0 ≤ x≤ 1 d) For R3 , [X ] = |x1 | + |x2 | e) For R4 , [X ] = 1 + x2 + x2 + x2 + x2 . 1 2 3 4 (9) Prove that [X ] = x2 + x2 defines a norm for R2 (some algebraic fortitude will be 1 2 needed to prove the triangle inequality). 3.2 The Scalar Product in 2 E In Euclidean space E2 —which we remind you is R2 with the Euclidean norm X = x2 + x2 —one can introduce many geometric concepts and develop a corresponding geo1 2 metric theory. Most important of these concepts is that of angle—especially orthogonality (perpendicularity). It turns out that these ideas generalize almost immediately to all En , and even to some exceedingly important infinite dimensional spaces. This section is devoted to the most simple situation: E2 . Please look at the pictures. To begin, we introduce the scalar product (also called the dot product, or inner product) of two vectors X and Y . Definition: If X and Y are two vectors in E2 , their scalar product X, Y (sometimes written X · Y ) is defined by X, Y = X Y cos θ, where θ is the angle between X and Y . Notice that the scalar product of two vectors is a real number, a scalar, not another vector. We need not specify the direction in which θ is measured, counterclockwise or clockwise, since cos θ = cos(−θ) . Further, we can use either the acute or obtuse angle 108 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS between X and Y since cos(2π − θ) = cos θ . It is important to observe that the scalar product of two vectors is defined independent of any coordinate system. We are immediately led to some simple consequences. Lemma 3.5 . Two vectors X and Y are orthogonal if and only if X, Y = 0 Proof: If X and Y are orthogonal, the angle θ between them is π , so X, Y = 2 Y cos θ = 0 . If X Y cos π = 0 . In the other direction, if X, Y = 0 , then X 2 π neither X nor Y = 0 , then cos θ = 0 , that is θ = π or 32 . Thus X is orthogonal 2 to Y . If X or Y = 0 , then one of them is just the point at the origin, the zero vector. We agree to say that the zero vector is orthogonal to every other vector. With this agreement, X, Y = 0 ⇒ X ⊥ Y , and the second half of the theorem is proved too. There is a nice geometric interpretation of the scalar product. A hint of it appeared in our last lemma. Let e be a unit vector, e = 1 . Consider X, e = X cos θ (see figure). This is the length of the projection of X in the direction of e , or in other words, the length of the projection of X into the subspace spanned by the single vector e . Strictly X, e is not really a “length”, since “length” carries the implication of being positive, whereas the real number X, e will be negative if the projection “points” in the direction opposite to e . We shall, however, allow ourselves this abuse of language. 
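A small numerical illustration, added here for concreteness and not taken from the notes themselves: suppose \(e\) is a unit vector and \(\|X\| = 2\). Then
\[
\langle X, e\rangle = \|X\|\cos\theta = 2\cos\tfrac{\pi}{3} = 1 \quad\text{if } \theta = \tfrac{\pi}{3},
\qquad
\langle X, e\rangle = 2\cos\tfrac{2\pi}{3} = -1 \quad\text{if } \theta = \tfrac{2\pi}{3}.
\]
In both cases the magnitude, 1, is the length of the projection; the sign merely records whether that projection points with or against \(e\).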
The vector U1 which is the projection of X into the subspace spanned by e is U1 = X, e e . If Y is a (non-zero) vector in E2 which is not a unit vector, the above geometric idea goes through by making the simple observation that given any vector Y = 0 , the vector e = Y / Y is a unit vector in the direction of Y . Now you are certainly wondering how in the world we compute this scalar product. You could take out your ruler, protractor and table of cosines—but we will present a more convenient method. In order to compute this as is always the case, a particular basis must be chosen. Then the vectors X and Y can be given explicitly in terms of the basis. Since we want to show that the concepts are independent of any particular basis, you must relax and be patient. Only after the theory has been exposed will we reveal how to compute in terms of a given basis. Theorem 3.6 (a) X, X = X 2 (b) X, Y = Y , X (c) aX, Y = a X, Y (d) X, aY = a X, Y , where a ∈ R . (e) X + Y, Z = X, Z + Y , Z (f) X, Y + Z = X, Y + X, Z (g) | X, Y | ≤ X where a ∈ R . Y (Cauchy- Schwarz inequality) Proof: (a) Obvious since θ = 0 and cos 0 = 1 . (b) Obvious since cos(−θ) = cos θ . 3.2. THE SCALAR PRODUCT IN E2 109 (c) The vectors X and aX lie along the same line through the origin. There are two cases, a > 0 and a < 0 ( a = 0 is trivial). If a > 0 , the angle θ between X and Y is identical to that between aX and Y . Since aX = a X for a > 0 , this case is proved, for aX, Y = aX Y cos θ = a X, Y . If a < 0 , then aX points in the direction opposite to X . Thus the angle θ1 between aX and Y equals π − θ , where θ is the angle between X and Y . The following computation completes the proof: aX, Y = aX = − |a| Y cos θ1 = |a| X X Y cos(π − θ) Y cos θ = a X (3-5) Y cos θ = a X, Y (d) By (b) and (c) and (b) again we are done X, aY = aY, X = a Y , X = a X, Y . (e) This is the most subtle part. We shall rely on the interpretation of the scalar product U, e as the length of the projection of U in the subspace spanned by e . First, let e = Z/ Z be the unit vector in the direction of Z . We shall show that X + Y, e = X, e + Y , e . A picture is all that is needed now. Two situations are illustrated, where both X and Y are on the same side of the line perpendicular to e and a figure goes here where X and Y are on opposite sides of that line. The vector X + Y is found from X and Y by the parallelogram rule for addition. Interpreting the scalar product of a vector with e as the length of the projection into the subspace (line) spanned by e , we see (look) that we must prove → → → OP =OQ + OM . → → But since OA and BC are on opposite sides of the same parallelogram, know that → → OM =QP both in magnitude and direction. The natural substitution yields → → → OP =OQ + QP , which is indeed all we desired. Thus X + Y, e = X, e + Y , e To prove the general result for Z = Z e , multiply the last equation by is a scalar. Then by part a we find X + Y, Z e = X, Z e + Y , Z e , or X + Y, Z = X, Z + Y , Z , We are done. Z , which 110 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (f) By parts (b), (e) and (b) again we obtain the result. X, Y + Z = Y + Z, X = Y , X + Z, X = X, Y + X, Z (g) Obvious since |cos θ| ≤ 1 . It is evident that equality occurs when and only when cos θ ± 1 , that is, when X and Y lie along the same line (possibly pointing in opposite directions). If e is a unit vector, we know how to find the projection U1 of a given vector X into the subspace spanned by e , it is U1 = X, e e . 
Similarly, if Y is any vector—not necessarily of length one, then since Y / Y is a unit vector in the direction of Y , the projection of X into the subspace spanned by Y is X, Y / Y Y / Y = X, Y Y / Y 2 . We can also find the projection U2 of X into the subspace orthogonal to the unit vector e . Since the sum of U1 and U2 must add up to X, X = U1 + U2 , we find that U2 = X − U1 = X − X, e e . Thus, we have proved Theorem 3.7 . If X and Y are any two vectors, Y = 0 , then X can be decomposed into two vectors U1 and U2 , X = U1 + U2 such that U1 is in the subspace spanned by Y and U2 is in the orthogonal subspace. The decomposition is given by U1 = X, Y Y / Y 2 and U2 = X − X, Y Y / Y 2 , so that X = X, Y Y Y 2 + (X − X, Y Y Y 2 ). Without further delay, we shall show how to compute the scalar product of two vectors. In order to carry this out we must introduce a basis. Let X1 and X2 be any two vectors in E2 which span E2 . Then every vector X ∈ E2 can be written in the form X = a1 X1 +a2 X2 , where the scalars a1 and a2 are determined uniquely by the vector X . Now it is most convenient to have a basis whose vectors are i) orthogonal to each other and ii) have unit length. Such a basis is called an orthonormal basis (orthogonal and normalized to have unit length). In other words e1 and e2 are an orthonormal basis for E2 if ej = 1 and e1 , e2 = 0 . This requirement is most conveniently stated by introducing the Kronecker symbol δjk 0 j=k δjk = 1 j = k. Then the orthonormality property reads ej , ek = δjk , j, k = 1, 2 . The notation is perhaps excessive for this simple case, but will really be useful in our generalizations. Therefore, let e1 and e2 be an orthonormal basis for E2 , so that if X ∈ E2 , X = x1 e1 + x2 e2 . Fix the basis throughout the ensuing discussion. Observe that x1 and x2 can be computed in terms of X , and the basis vectors e1 and e2 , viz X, e1 = x1 e1 + x2 e2 , e1 = x1 e1 , e1 + x2 e2 , e1 = x1 , since e1 , e1 = 1 and e1 , e2 = 0 . Similarly, X, ex = x2 . Thus we have proved Theorem 3.8 . If { ej }, j = 1, 2 , form an orthonormal basis for E2 , then every vector 2 X ∈ E2 can be written as X = xj ej , where xj is the length of the projection of X j =1 into the subspace spanned by ej , xj = X, ej . 3.2. THE SCALAR PRODUCT IN E2 111 If X = x1 e1 + x2 e2 and Y = y1 e1 + y2 e2 are any two vectors in E2 , then X, Y = x1 e1 + x2 e2 , y1 e1 + y2 e2 = x1 e1 + x2 e2 , y1 e1 + x1 e1 + x2 e2 , y2 e2 = x1 e1 , y1 e1 + x2 e2 , y1 e1 + x1 e1 , y2 e2 + x2 e2 , , y2 = x1 y1 e1 , e1 + x2 yx e2 , e1 + x1 y2 e1 , e2 + x2 y2 e2 , e2 = x1 y1 + 0 + 0 + x2 y2 = x1 y1 + x2 y2 . Now you see how easy it is to compute the scalar product of X and Y in terms of the representation from an orthonormal basis. Let us rewrite our result formally. 2 Theorem 3.9 . Let { ej }, j = 1, 2 , form an orthonormal basis for E2 . If X = xj ej j =1 2 and Y = vj ej , then j =1 2 X, Y = xj yj = x1 y1 + x2 y2 . j =1 Some numerical examples should reassure you of the basic simplicity of the computation. As our orthonormal basis in E2 , we choose the vectors e1 = (1, 0) and e2 = (0, 1) . These both have unit length, and are perpendicular (one is on the horizontal axis, the other on the vertical axis). Let X = (−2, 3) . Then X = −2e1 + 3e2 . Notice that −2e1 and 3e2 are exactly the projections of X into the subspaces spanned by e1 and e2 respectively. If Y = (1, −2) , then our theorem shows that X, Y = (−2)(−1) + (3)(−2) = −2 − 6 = −8. 
From this computation we can reverse the geometric procedure and find the angle θ between X and Y , for we know the formula X, Y . X Y √ √ √ √ In this example, X, Y = −8, X = 4 + 9 = 13 and Y = 1 + 4 = 5 . Thus −8 θ = cos−1 ( √65 ) which can be evaluated by consulting your favorite numerical tables. It is equally simple to check if two vectors are orthogonal. Let X = (2, −3) and Y = (6, 4) . Then X, Y = (2)(6) + (−3)(4) = 0 ; consequently X and Y are orthogonal. Another consequence is the law of cosines. Let X = (x1 , x2 ) and Y = (y1 , y2 ) . Then from the parallelogram construction, the length of the segment joining the tip of X to the tip of Y has length Y − X . But Y − X 2 = Y − X, Y − X = X 2 + Y 2 − 2 X, Y = X 2+ Y 2−2 X Y ??θ. One more example. We shall find the distance of the point P = (−3, 2) from the coset A = { X = (x1 , x2 ) ∈ E2 : x1 − 2x2 = 2 } . Pick some point in X0 in A , say X0 = (3, 1 ) . 2 The distance d from P to A is then the length of the projection of the segment X0 P onto a line l orthogonal to A . First of all, we can replace the segment X0 P by a vector 3 from the origin 0 to the point Q = P − X0 = (−6, 2 ) , for the length of the projection of 0¯ onto a line l orthogonal to A is equal to the length of the projection of X¯P onto Q 0 cos θ = 112 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS l (see figure). Now we have the vector Q = (−6, 3 ) ; all we need to compute the desired 2 projection is another vector N orthogonal to A , for then d = | Y, N/ N | . To find a vector N orthogonal to A , we realize that N will also be orthogonal to the subspace S parallel to the coset A so A = S + X0 , where S = { X = (x1 , x2 ) ∈ E2 : x1 − 2x2 = 0 } . If N = (n1 , n2 ) and X is any element of S , since N ⊥ S , we must have 0 = X, N = x1 n1 + x2 n2 . However X ∈ S so x1 − 2x2 = 0 . We want the equation x1 n1 + x2 n2 = 0 to hold for all points on x1 − 2x2 = 0 , that is for all X ∈ S . This is only possible if √ 1 = 1 · c and n2 = −2 · c , where c is any constant. Thus N = c(1, −2) and n N = |c| 5 . The distance d between the point P and the coset A is then c 3 √ (1, −2) d = | Y, N/ N | = (−6, ), 2 |c| 5 c 9 3 −2c = (−6)( √ ) + ( )( √ ) = √ . 2 |c| 5 |c| 5 5 (3-6) This example contained a plethora of ideas. It would be wise to go through it again and list the constructions and concepts used. The exercises will develop many of them in greater generality. Now you should try some problems on your own. Exercises → (1) If X = (3, 4) and Y = (5, −12) are two points in E2 , find the angle between OX → and OY , where 0 is the origin. (2) If X = (3, −4) and Y = (5, 12) are two vectors in E2 , find vectors U1 ∈ span(Y ) and U2 orthogonal to span(Y ) such that X = U1 + U2 . (3) Show that the vector N = (a1 , a2 ) is perpendicular to the straight line whose equation is a1 x1 + a2 x2 = c (you will have to supply the natural definition of what it means for a vector to be perpendicular to a straight line). (4) (a) Find the distance of the point P = (2, −1) from the coset A = { X ∈ E2 : x1 + x2 = −2 } . (b) Find the distance between the two “parallel” cosets A defined above and B = { X ∈ E2 : x1 + x2 = 1 } . (Hint: Draw a figure and observe that P ∈ B ) . (5) (a) Prove that the distance d of the point P = (y1 , y2 ) from the coset A = { X ∈ E2 : a1 x1 + a2 x2 = c } is given by d= |a1 y1 + a2 y2 − c| a2 + a2 1 2 . 
(b) Prove that the distance d between the two “parallel” cosets A = { X ∈ E2 : a1 x1 + a2 x2 = c1 } , and B = { X ∈ E2 : a1 x1 + a2 x2 = c2 } is given by d= |c1 − c2 | a2 + a2 1 2 . (Hint: If you use part (a) and are cunning, the derivation takes but one line). 3.3. ABSTRACT SCALAR PRODUCT SPACES 113 (6) (a) If it is known that X, Y1 = X, Y2 , and that X = 0 for a fixed X , can you “cancel” X from both sides and conclude that Y1 = Y2 ? Reason? (b) If it is known that X, Y Reason? (c) If it is known that Y1 = Y2 ? Reason? = 0 for every X , can you conclude that Y = 0 ? X , Y1 = X , Y2 for every X , can you conclude that (7) (a) Show that the vector Z= X Y+ Y X . X+Y bisects the angle between the vectors X and Y . (b) Show that the vector X Y. X Y + Y X is perpendicular to the vector Y X− (8) Express the angle between an edge and a diagonal of a rectangle in terms of the scalar product. (9) Let two of the sides of a parallelogram be given by the vectors X and Y . The parallelogram theorem states that the sum of the squares of the sides is equal to the sum of the squares of the diagonals, that is, X +Y 2 + X −Y 2 =2 X 2 +2 Y 2 . Prove this in two ways: i) using elementary geometry, and ii) using only the fact that X and Y are elements of a linear space, and the properties of the scalar product contained in Theorem 4 (using 4a to define ). (10) Let X be any vector in E2 , and let e be a unit vector. Define the vector U = ae , where a = X, e is the length of the projection of X into the subspace spanned by e , and V = αe , where α is any scalar. Prove that X −V 2 ≥ X −U 2 =X 2 −U 2 =X 2 − a2 . This shows that in the subspace spanned by e , the vector closest to X is the projection U of X into that subspace. (11) If X is orthogonal to Y , prove the Pythagorean theorem X + Y 2 = X 2 + Y using only V 2 = V , V and the properties of a scalar product in Theorem 4. (12) Let X and Y be orthogonal elements of E2 , with neither X nor Y that X and Y are linearly independent. Do not introduce a basis. 3.3 2 zero. Prove Abstract Scalar Product Spaces We shall turn the tables around. Whereas in the last section we defined the scalar product geometrically and deduced its properties, in this section we define a scalar product space as a linear space upon which a scalar product is defined, and the scalar product is stipulated to have the properties deduced earlier. After presenting our abstract definition, we shall give examples—other than E2 —of scalar product spaces. 114 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS Definition. A linear space H is called a real scalar product space if to every pair of elements X, Y ∈ H is associated a real number X, Y , the scalar product of X and Y , which has the properties 1. X, X ≥ 0 with equality if and only if X = 0 . 2. X, Y = Y , X 3. aX, Y = a X, Y , a ∈ R 4. X + Y, Z = X, Z + Y , Z You should observe that the scalar product in E2 does have these properties (Theorem X, X and suspect that 4). Using E2 as our model, it is natural to define X = is indeed a norm on the linear space H . This is true, but proving the triangle inequality for this norm using only properties 1-4 will take some work. We shall do just that after presenting Examples (1) Let X = (x1 , . . . , xn ) and Y = (y1 , . . . , yn ) be points in the linear space R2 . We define X, Y = x1 y1 + x2 y2 + . . . + xn yn . Only easy algebra is needed to verify that the real number X, Y satisfies all of the properties of a scalar product. 
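A sketch of that easy algebra, written out here for completeness (the notes only assert it): properties 2 through 4 follow from the commutativity and distributivity of real multiplication, for instance
\[
\langle X, Y\rangle = \sum_{k=1}^{n} x_k y_k = \sum_{k=1}^{n} y_k x_k = \langle Y, X\rangle,
\qquad
\langle aX, Y\rangle = \sum_{k=1}^{n} (a x_k) y_k = a\,\langle X, Y\rangle,
\]
and similarly for additivity. For property 1, \(\langle X, X\rangle = \sum_{k=1}^{n} x_k^2 \ge 0\), with equality exactly when every \(x_k = 0\), that is, when \(X = 0\).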
It turns out (after we prove the triangle inequality) X, X is the Euclidean norm, so this is E2 . that the natural norm X = (2) This example is the first hint that our abstractions are fruitful. Let the functions f (x) and g (x) be points in the linear function space C [a, b] of real-valued functions continuous for a ≤ x ≤ b . We define b f, g = f (x)g (x) dx. a You might be surprised; in any event let us verify that the real number f , g associated with the pair of functions f and g does satisfy the four properties of a scalar product. (i) (ii) (iii) (iv) b f , f = a f 2 (x) dx . This is clearly non-negative and f = 0 implies that b f , f = 0 . All we must show is that if f , f = a f 2 (x) dx = 0 , then f = 0 . By contradiction, assume f (x) = 0 . Then there is some point x0 ∈ [a, b] such that f (x0 ) = c = 0 . Thus f 2 (x0 ) = c2 > 0 . Since f —and hence f 2 —is continuous, this means that f 2 is positive in some interval about x0 (p. 29b, b Theorem I), so that a f 2 (x) dx > 0 , the desired contradiction. b b a f (x)g (x) dx = a g (x)f (x) dx = g , f . b b αf, g = a αf (x)g (x) dx = α a f (x)g (x) dx = α b f + g, h = a (f (x) + g (x))h(x) dx b b = a f (x)h(x) dx + a g (x)h(x) dx f, g = f , g , where α ∈ R . = f , h + g, h . There. We did it. After we prove the triangle inequality for an abstract scalar product space, the natural candidate for a norm f is a norm: b f 2 (x) dx. f= a 3.3. ABSTRACT SCALAR PRODUCT SPACES 115 I like this space very much. You will be meeting it often, becoming much more intimate with its finer features. We shall—somewhat improperly—refer to this linear space with the given scalar product as L2 [a, b] . The name is improper since L2 [a, b] is customarily used for our space but with more general functions and an extended notion of integration. (3) Let f (x) and g (x) be in C [0, ∞] . This time define ∞ f, g = f (x)g (x)e−x dx. 0 e−x Since is continuous and positive for all x , we are assured that f , f ≥ 0 , with equality if and only if f = 0 . The other properties of an inner product follow from simple manipulations. Do them. Remark: Complex scalar product spaces are defined similarly. For them, X, Y may be a complex number, and complex scalars are admitted. The only change in the axioms is that property 2 is dropped in favor of ¯ Y , X = X, Y , ¯ 2. where the bar means take the complex conjugate of the complex number X, Y . Since we shall not develop the theory far enough, our attention henceforth will be restricted to real scalar product spaces. The first order of business is to prove that the natural candidate for a norm X = X, X is in fact a norm for the linear space V . Only properties 1-4 may be used. (1) X ≥ 0 , with equality if and only if X = 0 . This follows immediately from the corresponding property of X, X . (2) aX = |a| |a| X. X . For aX = aX, aX = a2 X, X = |a| X, X = The proof of the triangle inequality (3) X + Y ≤ X + Y involves more labor. We shall first need to prove the CauchySchwarz inequality (cf. Theorem 4,g). Theorem 3.10 (Cauchy-Schwarz inequality). | X, Y | ≤ X Proof: If either X or Y nor Y is zero and define Y. is zero, this is immediate. Thus, assume that neither X X Y , V= , X Y so that both U and V are unit vectors, U = V = 1 . Then U= 0≤ U ±V 2 = = = U ± V, U ± V U, U ± U, V ± V , U + V , V U 2 ± 2 U, V + V 2 , . Since U = 1 and V = 1 , this shows ± U, V ≤ 1 . Substituting for U and V , we obtain the inequality sought: | X, Y | ≤ X Y . 116 CHAPTER 3. 
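Before moving on, it may help to record (as an aside that is not in the original notes) what the abstract inequality says in the two examples of this section. In \(E^n\) (Example 1) it reduces to the algebraic inequality (3-1) of Section 3.1, while for the integral scalar product of Example 2 it reads
\[
\left( \int_a^b f(x)\,g(x)\,dx \right)^{2} \;\le\; \left( \int_a^b f^{2}(x)\,dx \right)\left( \int_a^b g^{2}(x)\,dx \right),
\]
an inequality about integrals that would be awkward to verify directly but falls out of the abstract proof for free.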
LINEAR SPACES: NORMS AND INNER PRODUCTS Theorem 3.11 (Triangle inequality) X +Y ≤ X + Y . Proof: This is identical to that given in section 1. X + Y 2 = X + Y, X + Y X 2 + 2 X, Y + Y 2 . By Cauchy-Schwarz, X, Y ≤ X Y , so X +Y 2 ≤X 2 +2 X 2 Y +Y = = ( X + Y )2 . Now take square root of both sides to find X +Y ≤ X + Y . Nice, eh? See how clean everything is. We have proved X, X in terms Theorem 3.12 . If H is a scalar product space and we define X = of the scalar product, then is a norm and H is a normed linear space with that norm. This special case where the norm is induced by a scalar product is called a pre-Hilbert space (an honest Hilbert space has the additional property of being “complete”). Let us state two easy algebraic consequences of our axioms for a scalar product. The proofs are identical to those of Theorem 4 in the previous section. Theorem 3.13 X, aY = a X, Y , a∈R (3-7) X, Y + Z = X, Y + X, Z , (3-8) Needless to say, we hope you are still thinking in the geometric terms presented earlier. In particular, the next definition should be reasonable. Definition Two vectors X, Y are said to be orthogonal if X, Y = 0 . The Pythagorean theorem suggests Theorem 3.14 . If X and Y are orthogonal, then X ±Y 2 =X 2 +Y 2 , and conversely. Proof: Both parts are an immediate consequence of the identity X ±Y 2 = X + Y, X + Y = X 2 ± 2 X, Y + Y 2 . Examples. (1) Let X = (2, 3, −1) and Y = (1, −1, −1) be points in E3 , where we use the scalar product of example 1 in this section. Then X, Y = 2 · 1 + 3(−1) + (−1)(−1) = 0 so X and Y are orthogonal. Similarly X = (2, 3, 1, −1) and Y = (3, −3, 3, 0) in E4 are orthogonal. A useful example is supplied by the vectors e1 = (1, 0, 0, . . . , 0) , e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1) in En . These are orthonormal since ek , ek = 1 , but ek , el = 0, k = l , that is, ek , el = δkl . 3.3. ABSTRACT SCALAR PRODUCT SPACES 117 (2) Consider the functions Φk (x) = sin kx in L2 [−π, π ] , where k = 1, 2, 3, . . . . Then, 1 since sin θ sin Ψ = 2 [cos(θ − Ψ) − cos(θ + Ψ)] , we find that π sin2 kx dx = π Φk , Φk = −π and for k = l p Φk , Φl = i−π sin kx sin lx dx = 0. as a computation reveals. Thus in L2 [−π, π ] the function sin kx is orthogonal to the function sin lx when k = l . The whole computation may be summarized by Φk , Φl = sin kx, sin lx = πδkl . It is only the factor π which does not allow us to say that the Φk are orthonormal— √ kx but that is easily patched up. Let ek (x) = sin π . Then sin kx sin lx √ ,√ π π 1 = sin kx, sin lx , π ek , el = or ek , el = δkl . √ kx Therefore the functions ek (x) = sin π are orthonormal. Don’t attempt to imagine it. Just keep on thinking of a big E2 and all will be well. So far we have discussed the notion of two vectors X and Y being orthogonal. This can be restated as one vector X being orthogonal to the subspace A spanned by Y , for all vectors in A are of the form aY where a is a scalar, and X, aY = 0 ⇐⇒ X, Y = 0 since X, aY = a X, Y . One can also introduce the concept of a vector X being orthogonal to an arbitrary subspace A . Think of A as being a plane (through the origin of course). Definition The vector X is orthogonal to the subspace A if X is orthogonal to every vector in the subspace A . In practice, the usual way to check if X is orthogonal to the subspace A is as follows. Pick some basis { Y1 , Y2 , . . . } for A . 
Then every Y ∈ A is of the form Y= ak Yk (if the basis has an infinite number of elements—that is, if A is infinite dimensional— one should worry about convergence; however we shall ignore that issue for now). By the algebraic rules for the scalar product, we find that X, Y = X, ak Yk = ak X, Yk . Thus, X is orthogonal to the subspace A if X is orthogonal to every element in some basis for A X, Yk = 0 . For example, if A is the x1 x2 plane in E3 , and X is the vector (0, 0, 1) , then we can show that X = (0, 0, 1) is orthogonal to A by showing it is orthogonal to both 118 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS the vector e1 = (1, 0, 0) and to e2 = (0, 1, 0) , since e1 and e2 form a basis for A . The computation X, e1 = 0 and X, e2 = 0 is immediate. Because Y1 = (1, 2, 0) and Y2 = (1, −1, 0) also form a basis for A , we could prove that X is orthogonal to A by showing that X, Y1 = 0 and X, Y2 = 0 —which is equally simple. A less obvious example is supplied by the function Ψ(x) = cos x which is orthogonal to the subspace A spanned by Φ1 (x) = sin x, Φ2 (x) = sin 2x, . . . , Φn (x) = sin nx in Lx (−π, π ) . The proof is a consequence of the integration formula π Ψ, Φk = cos x sin kx dx = 0 for all k. −π Even more general than a vector being orthogonal to a subspace is the idea that two subspaces A and B are orthogonal, by which we mean that every vector in A is orthogonal to every vector in B . If A is a subspace of a scalar product space H , then it is natural to define the orthogonal complement A⊥ of A as the set A⊥ = { X ∈ H : X, Y = 0 for all Y ∈ A } of vectors X orthogonal to A , that is, orthogonal to every vector Y ∈ A . The set A⊥ is a subspace since it is closed under vector addition and multiplication by scalars (Theorem 2, p. 142). Without fear of evoking surprise, we define the angle θ between two vectors X and Y by the formula X, Y . cos θ = X Y No matter what X and Y are, this defines a real angle since the right side of the equation is a real number between −1 and +1 (by the Cauchy-Schwarz inequality). To be honest, there is little use for the concept of angles other than right angles. In E3 the formula has some use, but is totally unused for more general scalar product spaces. If we are given a set of linearly independent vectors { X1 , X2 , . . . } which span a linear scalar product space H , how can we construct an orthonormal set { e1 , e2 , . . . } which also 1 spans the space? The process is carried out inductively. Let e1 = X1 . Now we want a X unit vector e2 orthogonal to e1 . A reasonable candidate is e2 = X2 − X2 , e1 e1 , ˜ which is X2 with the projection of X2 onto e1 subtracted off (see fig.) This vector e2 is ˜ orthogonal to e1 since e2 , e1 = 0 . We divide by its length to obtain the unit vector e2 , ˜ e2 = X2 − X2 , e1 e1 . X2 − X2 , e1 e1 Next we take X2 and subtract off both its projection into the subspace spanned by e1 and e2 e3 = X3 − [ X3 , e1 e1 + X3 , e2 e2 ]. ˜ This vector e3 is orthogonal to both e1 and e2 . Normalize it to get e3 = e3 / e3 . ˜ ˜˜ 3.3. ABSTRACT SCALAR PRODUCT SPACES 119 More generally, say we have used the vectors X1 , X2 , . . . , Xk to obtain the orthonormal set e1 , e2 , . . . , ek . Then ek+1 is given by k Xk+1 − Xk+1 , el el l=1 k ek+1 = Xk+1 − Xk+1 , el l=1 This procedure is called the Gram-Schmidt orthogonalization process. 
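A tiny worked example of the process may help (the vectors are our own choice, not from the notes). In \(E^2\) take \(X_1 = (1,1)\) and \(X_2 = (1,0)\). Then
\[
e_1 = \frac{X_1}{\|X_1\|} = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right),
\qquad
\tilde e_2 = X_2 - \langle X_2, e_1\rangle e_1
= (1,0) - \tfrac{1}{\sqrt{2}}\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)
= \left(\tfrac{1}{2}, -\tfrac{1}{2}\right),
\]
so \(e_2 = \tilde e_2 / \|\tilde e_2\| = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)\). One checks directly that \(\langle e_1, e_2\rangle = 0\) and \(\|e_1\| = \|e_2\| = 1\), as the general argument guarantees.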
With it we can assert that if some set of linearly independent vectors spans a linear space A , we might as well suppose that those vectors constitute an orthonormal set, for if they don’t just use GramSchmidt to construct a set that is orthonormal. The next result is a useful observation. Theorem 3.15 . A set { X1 , X2 , . . . , Xn } of orthogonal vectors, none of which is the zero vector, is necessarily linearly independent. Proof: The hypothesis states that Xj , Xk = 0, j = k and that Xj , Xj = 0 . Assume there are scalars a1 , a2 , . . . an such that 0 = a1 X1 + a2 X2 + . . . + an Xn . We shall show that a1 = a2 = . . . = an = 0 . Take the scalar product of both sides with the vector X1 . Then 0, X1 = a1 X1 , X1 + a2 X2 , X1 + · · · + an Xn , X1 . so that 0 = a1 X1 , X1 . Since X1 , X1 = 0 , we conclude that a1 = 0 . Similarly, by taking the scalar product with X2 we find that a2 = 0 , and so on. An easy consequence of this theorem is the fact that the functions fn (x) = sin nx, n = 1, 2, . . . , N where x ∈ [−π, π ] are linearly independent, for they are orthogonal (cf. Exercise 5, p. ???). Say we are given an orthonormal set of n vectors, { ej }, j = 1, . . . , n, ej , ek = δjk , and X an element of the linear space A spanned by the { ej } . Then n X= xj ej , j =1 where the xj are uniquely determined just from the general theory of linear spaces (p. 160, Theorem 10). In the special case of a scalar product space we can conclude even more. Theorem 3.16 . Let { ej , j = 1, . . . , n } be an orthonormal set of vectors which span A . n Then every vector X ∈ A can be uniquely written as X = xj ej , where xj is the length j =1 of the projection of X into the subspace spanned by ej , that is, xj = X, ej . The xj are the Fourier coefficients of X with respect to the orthonormal basis { ej } . 120 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS Proof: This is identical to Theorem 6 of the last section. Take the inner product of both n sides of X = xj ej with ek . Then n=1 n X , ek = xj ej , ek j =1 n = (3-9) n xj ej , ek = j =1 xj δjk , j =1 so that X, ek = xk . Furthermore, Theorem 3.17 . Let { ej }, j = 1, . . . , n be an orthonormal set of vectors which span A . n If X = n xj ej and Y = j =1 yj ej are vectors in A , then j =1 n xj yj = x1 y1 + x2 y2 + · · · + xn yn . X, Y = j =1 Proof: Identical to Theorem 7 of the last section. n n X, Y = xj ej , j =1 n = yk ek k=1 n xj ej , j =1 n = n xj ( j =1 n = yk ek k=1 (3-10) yk ej , ek ) k=1 n xj ( j =1 so that yk δjk ), k=1 n xj yj . X, Y = j =1 Remark: We shall see that these two theorems extend to the case n = ∞ . Examples. (1) The vectors e1 = (1, 0, 0), e2 = (0, 1, 0) , and e3 (0, 0, 1) clearly form an orthonormal basis for E3 . Let X = (2, −1, 4) . We shall compute the xj in 3 X= xj ej . j =1 3.3. ABSTRACT SCALAR PRODUCT SPACES 121 Since xj = X, ej , we find z1 = X, e1 = (2, −1, 4), (1, 0, 0) = 2·1+(−1)·0+4·0 = 2 , and similarly, x2 = −1, x3 = 4 as expected. Thus (2, −1, 4) = 2e1 − e2 + 4e3 . In the same way, if Y = (7, 1, −3) , then Y = 7e1 + e2 − 3e3 . Also, X, Y = (2)(7) + (−1)(1) + (4)(−3) = 1. The projection of X into the subspace spanned by Y is X, Y / Y Y Y 1 (7, 1, −3) 59 1 3 7 = e1 + e2 − e3 . 59 59 59 = (3-11) 1 1 1 1 Another orthonormal basis for E3 is e1 = ( √2 , √2 , 0), e2 = (− √2 , √2 , 0) , and e3 = ˜ ˜ ˜ (0, 0, 1) , since ej , ek = δjk . The expansion for X in this basis is ˜˜ 3 X= xj ej , ˜˜ j =1 where 1 11 x1 = X, e1 = (2, −1, 4), ( √ , √ , 0) = √ , ˜ ˜ 22 2 3 x2 = − √ , and x3 = 4. 
˜ ˜ 2 (3-12) (3-13) Thus 3 1 ˜ ˜ e X = √ e1 − √ e2 + 4˜3 . 2 2 Similarly, 6 8 Y = √ e1 − √ e2 − 3˜3 . ˜ ˜ e 2 2 Therefore 1 8 3 6 X, Y = ( √ )( √ ) + (− √ )(− √ ) + (4)(−3) = 1. 2 2 2 2 Notice that the number X, Y is the same no matter which basis is used. This is not a coincidence. Recall that the scalar product X, Y was defined independently of any basis. Hence its value should not be dependent upon which basis we happen to choose. If you think of X, Y geometrically in terms of the projection, it should be clear that the number should not depend upon which particular basis is used to describe the vectors. 122 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (2) For our second example, we consider the set of orthonormal functions e1 (x) = sin x √ π , sin x √ π e2 (x) = and let A be the set in L2 (−π, π ) which they span. We would like to expand some function 2 f (x) + fj ej (x). j =1 The only trouble is that Theorems 14 and 15 only allow us to expand functions f which are in the subspace A , that is, are a linear combination of the basis elements e1 and e2 . Since we secretly know that f (x) = sin x cos x(= 1 sin 2x) is such a function, 2 let us find its expansion. By elementary integration, π sin x (sin x cos x) √ dx = 0, x −π f1 = f , e1 = and √ sin 2x π . (sin x cos x) √ dx = 2 π −π π f2 = f , e2 = Therefore √ √ π π f = 0 · e1 + e2 = e2 2 2 or √ sin 2x π sin 2x ( √ )= , sin x cos x = 2 2 π which we knew was the case from trigonometry. If the orthonormal set { ej }, j = 1, . . . , m spans a subspace A of a linear scalar product space H , and if X ∈ H , can any sense be made of the expansion m ? X= xj ej ? j =1 One way to seek an answer is to examine a special case. Again geometry will supply the key. Let H = E3 and let A be the subspace spanned by the orthonormal vectors e1 = (1, 0, 0) and e2 = (0, 1, 0) . Then if X ∈ E3 , how can we interpret 2 ? X= xj ej = x1 e1 + x2 e2 ? j =1 Plowing blindly ahead, we take the scalar product of both sides with e1 and then with e2 . This gives us xj = X, ej . Thus the right side, x1 e1 + x2 e2 , is the projection of X into the subspace A spanned by { ej } . It is now clear how our original quandary is resolved. Definition If the orthonormal set { ej }, j = 1, . . . , m spans a subspace A of a linear m scalar product space H , and if X ∈ H , then the vector xj ej , where xj = X, ej , is j =1 the projection of X into the subspace A . Remark. It is customary to denote the projection of X into A by PA X . Think of PA as an operator (function) which maps the vector X into its projection in A . With this notation the above definition reads m PA X = xj ej , j =1 3.3. ABSTRACT SCALAR PRODUCT SPACES 123 where xj = X, ej and the orthonormal set { ej } spans A . Since the projection PA X is defined in terms of a particular basis for A , we should show that this geometrical object is independent of the basis you choose for A . But we shall not take the time right now. In reality, Theorem 17 below leads us to make a better definition of projection. Theorem 3.18 . If the orthonormal set { ej }, j = 1, . . . , m spans a subspace A ⊂ H , and if X and Y are in H , then m a) PA X, PA Y = xj yj , j =1 where xj = X, ej and yj = Y , ej . In particular m b) x2 . j PA X = j =1 Furthermore, X − PA X ∈ A⊥ , that is, for every Y ∈ A X − PA X, Y = 0 c) Every X ∈ H can be written as d) X = PA X + PA⊥ X, where PA⊥ X ≡ X − PA X is in A⊥ . m Proof: Since both vectors PA X = m xj ej and PA Y = j =1 yj ej are in A itself, a) and j =1 b) are immediate consequences of Theorem 15. 
Although the equation c) is geometrically clear, we shall compute it too. Since the ej span A , this is equivalent to showing it is orthogonal to all the ej . Now X − PA X, ej = X, ej − PA X, ej = xj − xj = 0 . Since trivially X = PA X +(X − PA X ) , the only content of part d) is that (X − PA X ) ∈ A⊥ , which is just what part c) proved. Corollary 3.19 a) X X 2 2 = PA X = PA X 2 2 + X − PA X + PA⊥ X 2 2 (Pythagorean Theorem) m b) X 2 ≥ PA X 2 x2 j = (Bessel’s Inequality) j =1 Proof: a) is a result of the fact that PA X ∈ A is orthogonal to X − PA X ∈ A⊥ and Theorem 12. The inequality b), Bessel’s inequality, is simply a weaker form of a)—since X −PA X ≥ 0 . There is equality if and only if X ∈ A , for only then does X −PA X = 0 . Examples: 124 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (1) Let A be the subspace of E3 spanned by e1 = (1, 0, 0) and e2 = (0, 1, 0) . The projection of X = (3, −1, 7) into A is represented by PA X = X, e1 e1 + X, e2 e2 = 3e1 − e2 ∈ A Also PA⊥ X = X − PA X = 3e1 − e2 + 7e3 − (3e1 − e2 ) = 7e3 ∈ A⊥ . 1 1 1 1 ˜ Since e1 = ( √2 , √2 , 0) and e2 = (− √2 , √2 , 0) also form an orthonormal basis for A , ˜ we can equally well write 2 4 PA X = X, e1 e1 + X, e2 e2 = √ e1 − √ e2 . ˜˜ ˜˜ ˜ ˜ 2 2 (2) Let A be the subspace of L2 [−π, π ] spanned by the orthonormal functions e1 (x) = sin x √ , e2 (x) = sin 2x . The projection of the function f (x) ≡ x into A is represented √ π π by PA f = f , e1 e1 + f , e2 e2 . Since an integration by parts shows that π −x cos kx k x sin kx dx = −π =− x cos kx k π −π =− 2π k,k π − 2k , 2π cos kπ = k we find π −π cos kx dx odd k even = (−1)k+1 2π , k π f , e1 = x, e1 = and √ sin x x √ dx = 2 π π −π π f , e2 = x, e2 = Thus √ sin 2x x √ dx = − π. π −π √ sin x √ sin 2x PA x = 2 π √ − π √ , π π or PA x = 2 sin x = sin 2x. Also, 2 PA X = f , e1 2 + f , e2 2 = 5π. ˜ More generally, we can let A be the subspace of L2 [−π, π ] spanned by { ek }, k = sin kx ˜ 1, 2, . . . , N , where ek (x) = √π . Then the projection of x onto A is given by N PA x = ˜ x, ek ek (x). k=1 Since √ sin kx k+1 2 π = x √ dx = (−1) , k π −π π x, ek 3.3. ABSTRACT SCALAR PRODUCT SPACES 125 we have √ N k+1 2 (−1) PA x = ˜ k=1 N π sin kx √ k π (−1)k+1 sin kx k =2 k=1 = 2(sin x − (3-14) sin 2x sin 3x sin N x + − . . . + (−1)N +1 ). 2 3 N Furthermore, N 2 PA x ˜ N = 2 x, ek = k=1 k=1 4π = 4π k2 N k=1 1 . k2 It is from this formula that we eventually intend to obtain the famous formula ∞ k=1 1 1 1 1 π2 = 1 + 2 + 2 + 2 + ··· = . k2 2 3 4 6 We will observe that π f 2 2 =X x · x dx = = −π 2π 3 3 and prove lim N →∞ PA⊥ x = lim ˜ N →∞ x − PA x = 0 ˜ Then from the Corollary to Theorem 16, 2 X = lim N →∞ PA x 2 , ˜ or 2π 3 = 4π 3 ? k=1 1 π2 ⇒ = k2 6 ∞ k=1 1 . k2 Geometry leads us to the next theorem—and the proof too. Let X be a given vector and PA X its projection into the subspace A . Since distance is measured by dropping a perpendicular, we expect that PA X is the vector in A which is closest to X , that is, most closely approximates X . Theorem 3.20 . Let X be a vector in a scalar product space H and A a subspace of H . Then if V is any vector in A , X − PA X ≤ X − V . Proof: We shall prove the stronger statement (cf. fig. above) X − PA X 2 + V − PA X 2 = X −V 2 . Observe that (V − PA X ) ∈ A , since both terms are in A and A is a subspace. Moreover X − PA X ∈ A⊥ (Theorem 16c). Therefore X − PA X is orthogonal to PA X − V , so the identity is a consequence of Theorem 12. 126 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS Remark. 
With this theorem in mind, we could define the projection PA X into a subspace A as the element in A which is closest to X . This definition is independent of any basis, whereas our original definition was not. One must, however, be somewhat careful when defining the projection into an infinite dimensional subspace. Although it is clear that the number X − V has a g.l.b. as V wanders throughout A , it is not clear that it has an actual min, that is, there really is a vector U ∈ A such that X − U takes on its g.l.b. as a min. If there is such a U , we call it PA X . Otherwise there is no projection. When projecting into a finite dimensional space this difficulty does not arise (but we will stop without further explanation of this detail). Some discussion of these results is needed to place the material in its proper perspective. If you are given an orthonormal set of vectors { ej } which span some subspace A of a scalar product space H , then for any X in H you can find a representation for PA X in terms of that basis, PA X = xj ej . If the vector X happened to already lie in A , then PA X = X x2 . This last equation for the length of X is the so X = xj ej and X = j Pythagorean Theorem. If X did not lie entirely in A , but “stuck out” of it into the rest of H , then PA X = xj ej only represents a piece of X , its projection into A . Since x2 . This inequality part of X has been omitted, we expect that X > PA X = j was the content of the Corollary to Theorem 16. Informally, if no vector X ∈ H sticks out of the linear space spanned by the { ej } , then the set { ej } is said to be complete (do not confuse this with the complete of Chapter 0; they are entirely different concepts, an unfortunate coincidence). More precisely, Definition An orthonormal set is complete for the scalar product space H if that orthonormal set is not properly contained in a larger orthonormal set. There are many ways to check if a given orthonormal set is complete for H . Geometry suggests them all. Theorem 3.21 . Let { ej } be an orthonormal set which spans the subspace A of the scalar product space H . The following statements are equivalent (a) The set { ej } is complete for H . (b) If X, ej = 0 for all j , then X = 0 . (c) A = H . (d) If X ∈ H , then X = xj ej , where xj = X, ej . (e) If X and Y ∈ H , then X, Y = xj yj , where xj = X, ej and yj = Y , ej (f) If X ∈ H , then (Pythagorean Theorem) X 2 = x2 , where xj = X, ej j Proof: We shall use the chain of reasoning a ⇒ b ⇒ c . . . ⇒ f ⇒ a . a ⇒ b . If X, ej = 0 but X = 0 , then X/ X is a unit vector orthogonal to all the ej . This means that { X , e1 , e2 , . . . } is an orthonormal set which contains { e1 , e2 , . . . } X as a proper subset. b ⇒ c . If there is an X ∈ H but X A , then PA⊥ X = X − PA X ∈ A⊥ and is not zero. Since all the ej ∈ A , we have PA⊥ X, ej = 0 for all j but PA⊥ X = 0 , contradicting b). Thus H ⊂ A . Since A ⊂ H by hypothesis, this proves that H = A . 3.3. ABSTRACT SCALAR PRODUCT SPACES 127 c ⇒ d . Since every X ∈ A has the form X = xj ej (by Theorem 14) and since H = A , the conclusion is immediate. d ⇒ e ⇒ f . A restatement of Theorem 16 since for every X ∈ H , we know that PA X = PH X = X . f ⇒ a . If { ej } is not complete, it is contained in a larger orthonormal set. Let e be a vector in that larger set which is not one of the ej . Then by f), and the fact that e, ej = 0 , e 2 = e, ej 2 = 0. Therefore e = 0 . Remarks. 1. 
Because each of the six conditions a-f are equivalent, any one of them could have been used as the definition of a complete orthonormal set. 2. If the orthonormal set { ej } has a (countably) infinite number of elements, the theorem is still valid but some convergence questions for d-f arise because of the then ∞ infinite series X = xj ej . The appropriate sense of convergence is that the remainder 1 ∞ N xj ej = X − after N terms, N +1 xj ej tends to zero in the norm of the scalar product 1 space, that is, if N lim N →∞ X− xj ej = 0. 1 We shall meet this in the next section for the space L2 [−π, π ] . Condition f) gives us no convergence problems since the series is an infinite series of positive terms which is always bounded by X 2 (Bessel’s Inequality—Corollary b to Theorem 16), and so always converges. This criterion just asks if the sum of the series actually equals X 2 (we know it is no larger). Examples 1 1 1 1 (1) The set of orthonormal vectors e1 = ( √2 , √2 , 0) and e2 = ( √2 , − √2 , 0) are not complete for E3 since any basis for E3 must have three elements because its dimension is 3. This could also be seen geometrically from the fact that, for example X = (1, 2, 3) sticks out of the space spanned by e1 and e2 , or from the fact that e3 = (0, 0, 2) is a non-zero vector orthogonal to both e1 and e2 , or in many other ways. The dimension argument is the easiest to apply if H is finite dimension, for then the number of elements in a complete orthonormal set { ek } must equal the dimension of H. √ (2) The set { en } where en (x) = sin nx is an orthonormal set of functions in the scalar ˜ ˜ π product space L2 [−π, π ] , but it is not a complete orthonormal set for that space since the function cos x is a non-zero function in L2 [−π, π ] which is orthogonal to all the en , ˜ π sin nx cos x, en = ˜ cos √ dx = 0. π −π Thus, although the set { en } has an infinite number of elements, it is still not big ˜ enough to span all of L2 [−π, π ] . The next section will be devoted to proving that the 128 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS larger orthonormal set, e0 , e1 , e1 , e2 , e2 , . . . , where ˜ ˜ 1 cos nx sin nx e0 = √ , en (x) = √ , en (x) = √ ˜ π π 2π is a complete orthonormal set for the scalar product space L2 [−π, π ] . This is a difficult theorem. Specific applications of the ideas in this section are contained in the exercises. For many of them you would be wise if you referred to their corresponding special cases which appeared in Section 2. Exercises (1) Let X and Y be points in Rn . Determine which of the following make Rn into a scalar product space, and why—or why not. n (a) X, Y = k=1 n (b) 1 xk yk . k (−1)k xk yk . X, Y = k=1 n (c) 2 x2 yk . k X, Y = k=1 n (d) X, Y = ak xk yk , where ak > 0 for all k . k=1 (2) Let f and g be continuous real-valued functions in the interval [0, 1] , so f, g ∈ C [0, 1] . Determine which of the following make C [0, 1] into a scalar product space, and why—or why not. 1 (a) f, g = f (x)g (x) 0 1 dx . 1 + x2 1 (b) f, g = f (x)g (x) sin 2πx dx . 0 1 (c) f (x)g 2 (x) dx f, g = 0 1 (d) f, g = f (x)g (x)ρ(x) dx , where ρ(x) is a fixed continuous function with the 0 property ρ(x) > 0 . (e) f , g = f (0)g (0) . (3) This is the analogue of L2 for sequences. Let l2 be the set of all sequences X = ∞ (x1 , x2 , xe , . . .) with the property that X x2 < ∞ . Prove that l2 is a j = j =1 normed linear space (cf. the example for l1 in Section 1). 3.3. 
ABSTRACT SCALAR PRODUCT SPACES 129 ∞ ∞ n2 a2 < ∞ , then n (4) Use the Cauchy-Schwarz inequality to prove that if n=1 (Hint: |an | = 1 n |an | < ∞ . n=1 |nan | ). (5) Consider the following linearly independent vectors in E3 : X1 = (1, 0, −1), X3 = (2, −1, 0). X2 = (0, 3, 1), (a) Use the Gram-Schmidt orthogonalization process to find an orthonormal set of vectors, e1 , e2 and e3 such that e1 is in the subspace spanned by X1 . 3 (b) Write X = (1, 2, 3) as X = xj ej , where the ej are those of part a). Also, j =1 compute X and PX . (6) Consider the following linearly independent set of functions in L2 [−1, 1] f1 (x) = 1, f2 (x) = x, f3 (x) = x2 . (a) Use the Gram-Schmidt orthogonalization process to find an orthonormal set of functions e1 (x), e2 (x) and e3 (x) such that e1 is in the subspace spanned by f1 . (b) Find the projection of the function f (x) = (1+ x)3 into the subspace of L2 [−1, 1] spanned by e1 (x), e2 (x) , and e3 (x) . Also, compute f and P f . (7) Let Pn (x) = 1 dn 2n n! dxn (1 − x2 )n , n = 0, 1, 2, . . . . These are the Legendre Polynomials. 1 (a) Prove that Pn , Pm = Pn (x)Pm (x) dx = 0, n = m , that is, the Pn are −1 orthogonal in L2 [−1, 1] by first proving that 1 Pn (x)xm dx = 0, m < n. −1 (b) Show that Pn 2 = 2 2n+1 . Thus the functions en (x) = 2n + 1 Pn (x) 2 are an orthonormal set of functions for L2 [−1, 1] . Compute e0 (x), e1 (x) , and e2 (x) and compare with Exercise 6a. (8) (a) Show that the vector N = (a1 , a2 , a3 ) is orthogonal to the coset (a plane in E3 ) A = { X ∈ E3 : a1 x1 + a2 x2 + a3 x3 = c } . (b) Show that the vector N = (a1 , . . . , an ) is orthogonal to the coset (a hyperplane in En ) A = { X ∈ En : a1 x1 + . . . an xn = c } . (c) Find the coset A ⊂ E3 which passes through the point X0 = (1, −1, 2) and is orthogonal to N = (1, 3, 2) . In ordinary language, A is the plane containing the point X0 which is orthogonal to N . 130 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (d) Show that the coset A ⊂ En which passes through the point X0 = (˜1 , . . . , xn ) x ˜ and is orthogonal to N = (a1 , . . . , an ) is A = { X ∈ En : X, N = X0 , N }. (9) (a) Use Problem 8a to show that the distance d from the point P = (y1 , y2 , y3 ) ∈ E3 to the coset a1 x1 + a2 x2 + a3 x3 = c in E3 is d= |a1 y1 + a2 y2 + a3 y3 − c| a2 a + a2 2 + a2 3 = | N, P − c| N (b) Show that the distance d from the point P = (y1 , . . . , yn ) ∈ En to the coset a1 x1 + . . . + an xn = c in En is d= |a1 y1 + a2 y2 + · · · + an yn − c| a2 1 + a2 2 + ··· + a2 n = | N, P − c| . N (c) Show that the distance d between the “parallel” cosets a1 x1 + · · · + an xn = c1 and a1 x1 + · · · + an xn = c2 in En is d= |c1 − c2 | a2 + · · · + a2 n 1 = |c1 − c2 | . N (Hint: Pick a point P in one of the cosets and apply part b). (10) Find the angle between the diagonal of a cube and one of its edges. (11) Let Y1 and Y2 be fixed vectors in a scalar product space H . a). If X, Y1 = 0 for all X ∈ H , prove that Y1 = 0 . b). If X, Y1 = X, Y2 for all X ∈ H , prove that Y1 = Y2 . (12) Let Y0 be a fixed vector in a scalar product space H . Let A = { X ∈ H : Y , Y0 = 0Rightarrow X, Y = 0 } . Prove that A is the span of Y0 : { X ∈ ARightarrowX = cY0 } for some scalar c . Make sure to see the geometrical situation for the case H = E3 . [Hint: Let B be the set of all vectors orthogonal to Y0 , so Y ∈ B . Since H is composed of two parts, Y0 and B , every X ∈ H can be written as X = cY0 + Z , where cY0 is the projection of X into the subspace spanned by Y0 (so c = X, Y0 / Y0 2 ) and Z = (X − cY0 ) ∈ B . 
Now show that X ∈ A ⇒ Z = 0) . (13) (a) Let X = (1, 3, −1) and Y = (2, 1, 1) . Find a vector N which is orthogonal to the subspace spanned by X and Y . (b) Let X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ) . Find a vector N which is orthogonal to the subspace spanned by X and Y . [Answer. N = c(x2 y3 − y2 x3 , y1 x3 − x1 y3 , x1 y2 − y1 x2 ) , where c is any non-zero scalar]. (14) Let A be the subspace of L2 [−π, π ] spanned by the orthonormal set { en (x) }, n = √ 1, 2, . . . , N , where en (x) = sin nx . π (a) Find the projection of f (x) = x2 , into A . (The answer should surprise you). Compute f and PA f too. 3.3. ABSTRACT SCALAR PRODUCT SPACES 131 (b) Find the projection of f (x) = 1 + sin3 x into A . Compute f and PA f . (c) If f (x) is an even function, f (x) = f (−x) , show that its projection into A is zero. Now look at part (a) again. (15) (a) If f ∈ C [a, b] , show that b b f (x) dx)2 ≤ (b − a) ( f 2 (x) dx. a a [Hint: Write f (x) = 1 · f (x) and use the Cauchy-Schwarz inequality for L2 [a, b] ]. (b) If f ∈ C 1 [a, b] , prove that b |f (x) − f (a)|2 ≤ (x − a) f (x)2 dx, x ∈ (a, b). a x af [Hint: Write f (x) − f (a) = (t) dt and apply part a)] (c) If f ∈ C 1 [a, b] and f (a) = 0 , use part b to prove that b f 2 (x) dx ≤ a (b − a)2 2 b f (x)2 dx. a (16) (a) Let A = { h ∈ C 1 [a, b] : h(a) = h(b) } and let B = { h ∈ C 1 [a, b] : 1, h = 0 } , where h = dh . Show that the subspaces A and B are identical h ∈ A ⇐⇒ dx h∈B. b (b) Let f (x) be any continuous function such that f (x)h (x) dx = 0 for all a h(x) ∈ C 1 [a, b] with h(a) = h(b) . Show that f ≡ constant. [Hint: Use part (a) and the result of Exercise 12]. b (17) If f (x) ∈ C [a, b] and satisfies the condition which satisfy the conditions b b h(x) dx = 0, a f (x)h(x) dx = 0 for all h(x) ∈ C [a, b] a b xn h(x) dx = 0, xh(x) dx = 0, . . . , a a prove that f ∈ Pn , that is, f is of the form f (x) = a0 + a1 x + · · · + an xn , where the aj are constants. [Hint: Use Exercise 12]. (18) Determine which of the following orthonormal sets are complete for their respective spaces. (a) In E3 , e1 = (0, 1, 0), e2 = ( 3 , 0, 4 ), e3 = (− 4 , 0, 3 ) 5 5 5 5 1 1 (b) In E4 , e1 = (1, 0, 0, 0), ex = (0, 1, 0, 0), e3 = (0, 0, √2 , √2 ). 1 1 (c) In E4 , e1 , e2 , e3 as in (b), and e4 = (0, 0, √2 , − √2 ). 132 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (19) Let e1 , e2 , and e3 be an orthonormal basis for E3 , and let A be the subspace spanned by X1 = 3e1 − 4e3 . Find an orthonormal basis for A⊥ . (20) Let A be a subspace of a scalar product space H . If X ∈ H , prove that PA (PA X ) = 2 PA X and interpret this geometrically. This result can be written as PA = PA . (21) Let A be any operator (not necessarily linear) on a scalar product space. Prove the polarization identity 2 AX, AY = AX + AY 3.4 2 − AX 2 − AY 2 . Fourier Series. Throughout this section we shall only use the scalar product of L2 [a, b] , b f, g = f (x)g (x) dx. a We begin with the observation that in the interval [−π, π ] π sin nx, sin mx = sin nx sin mx dx = πδnm , (3-15) −π π sin nx, cos mx = sin nx cos mx dx = 0, (3-16) cos nx cos mx dx = πδnm , (3-17) −π and π cos nx, cos mx = −π where n, m = 0, 1, 2, 3, . . . . Thus the functions sin nx cos nx 1 ˜ e0 (x) = √ , en (x) = √ , cn (x) = √ π π 2π form an orthonormal set: en , em = δnm en , em = 0, en , em = δnm . ˜ ˜ ˜˜ Thus, if f ∈ Lx [−π, π ] , we can find the projection PN f of f into the subspace spanned by e0 , e1 , e, . . . , eN , eN . 
˜ ˜ N (PN f ) = a0 e0 + an en + bn en , ˜ (3-18) n=1 where ak = f , ek and bk = f , ek ˜ (3-19) More explicitly, N 1 cos nx sin nx (PN f )(x) = a0 √ + an √ + bn √ , π π 2π n=1 (3-20) 3.4. FOURIER SERIES. 133 where π a0 = and 1 f (x) · √ dx, 2π −π π π an = cos nx dx, f (x) √ π −π bn = sin nx f (x) √ dx π −π (3-21) A natural question arises: as N → ∞ , does the series converge: PN f → f , in the sense that f − PN f → 0 ? In other words, is the set { ej (x), ej (x) }, j = 0, 1, 2, . . . a complete ˜ orthonormal set of functions for L2 [−π, π ] ? The answer is yes, as we shall prove. Thus for any f ∈ L2 [−π, π ] , ∞ sin nx 1 cos nx + bn √ , (3-22) f (x) = a0 √ + an √ π π 2π n=1 where the Fourier coefficients, an , bn are determined by the formulas (2). The expansion (3) is called the Fourier series for f . Historically, Fourier series did not arise from the geometrical considerations we have developed. Mathematical physics—in particular the vibrations of strings and the flow of heat in a bar—take the credit for these ideas. Only in recent years has the geometrical viewpoint been investigated. Later on we shall discuss some of the fascinating problems in mathematical physics to which Fourier series can be applied. Beware. The equality which appears in (3) is equality in the L2 [−π, π ] norm, viz. b [f (x) − (PN f )(x)]2 dx → 0 f − PN f = a This is quite different than the convergence of infinite series to which you’re accustomed, which is the uniform norm f − PN f ∞ = max |f (x) − (PN f )(x)| . − π ≤ x≤ π In Section 1 (p. 176) you saw one instance of where a sequence of functions converged in some norm (the L1 norm there) but did not converge in the uniform norm. Such is also the case here. In fact, contrasting the situation in the L2 norm, there do exist continuous functions f whose Fourier series (3) does not converge to f in the uniform norm. However if the function f has one derivative, then its Fourier series does converge to f in the uniform norm. In addition, there are some discontinuous functions whose Fourier series converge. These ideas will become clearer later on. You should be warned that our definition (1), (3) of a Fourier series is not the standard √ nx ˜ √ one. Most books do not work with the orthonormal set e0 = √1 π , en = cos π , en = sin nx , π 2 ˜ but rather use just an orthogonal set which is not normalized θ0 = 1 , θn = cos nx, θn = 2 sin nx . For these people, f (x) = where A0 + 2 π An = f (x) −π ∞ An cos nx + Bn sin nx, n=1 cos nx dx, π π Bn = f (x) −π sin nx dx, π 134 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS √ n = 0, 1, 2 . . . . As you can see, these differ from our formulas only by factors of π . Needless to say, the resulting Fourier series for a given function f does not depend which intermediate formulas you use. We prefer the less standard ones because they are more intimately tied to geometry (so there is less to remember). Before discussing the difficult issues of convergence in detail, we will find the Fourier series associated with some specific functions. Examples. (1) Find the Fourier series associated with the functions f (x) = x , −π ≤ x ≤ π . We actually found this in the previous section. A computation (involving integration by parts) shows that π a0 = f , e0 = 1 x · √ dx = 0 2π −π π an = f , en = cos nx x√ dx = 0, π −π n = 1, 2, . . . √ sin nx 2(−1)n+1 π = x √ dx = , n π −π π bn = f , en ˜ n = 1, 2, . . . 
Thus, upon substituting into (3) we find that ∞ ∞ 2(−1)n+1 √ sin nx (−1)n+1 x= π√ =2 sin nx n n π n=1 n=1 or x = 2[sin x − sin 2x sin 3x sin 4x + − + · · · ]. 2 3 4 Again we remind you that the equality here is in the sense of convergence in L2 . For this particular function, there is also equality in the usual sense of convergence for infinite series for all x ∈ (−π, π ) . Direct substitution reveals that it does not converge in the usual sense at x = ±π . These remarks are based upon convergence theorems we have yet to prove. At x = π , this yields 2 π 111 = 1 − + − + ··· 4 357 (2) Since the formulas (2)’ make sense even if the function f (x) has a finite number of discontinuities, we are tempted to find the Fourier series for discontinuous functions (in contrast, recall that the coefficients of an infinite power series are only defined if the function had an infinite number of derivatives). We shall find the Fourier series associated with the discontinuous function f (x) = 0, −π ≤ x ≤ 0 π, 0 < x < π 3.4. FOURIER SERIES. 135 The computations are particularly simple. 0 π 1 1 0 · √ dx + π · √ dx 2π 2π −π 0 0 π cos nx cos nx = 0· √ dx + π· √ dx = 0, π π −π 0 π2 = √ , (3-23) 2π a0 = f , e0 = an = f , en 0 bn = f , en = ˜ √ sin nx 0 · √ dx + π −π π = (1 − cos nπ ) = n √ 2π n 0 π 0 n > 0, (3-24) sin nx π · √ dx π (3-25) , n odd , n even (3-26) Therefore the Fourier series associated with this function is √ 1 π2 f (x) = √ · √ + 2 π 2π 2π or f (x) = sin x sin 3x sin 5x √ + √ + √ + ··· π 3π 5π , sin 3x sin 5x π + 2(sin x + + + · · · ). 2 3 5 As usual, the equality is meant in the sense of convergence in the L2 norm. The series also converges to the function f in the uniform norm in the whole interval except for a neighborhood of x = 0 . At 0 it hasn’t got a chance because of the discontinuity of f there. A glance at the series reveals that at x = 0 , the right side is π/2 —the arithmetic mean between the values of f just to the left and right of 0. This is the usual case at a discontinuity: a Fourier series converges to the average of the function values to the right and left of the point where f is discontinuous. We still offer no proof for these statements. Observe that the Fourier series (3) for any function f (x) depends only upon the values of x in the interval −π ≤ x ≤ π . However the series itself is periodic with period 2π . If the function f (x) , which we considered only for x ∈ [−π, π ] is defined for all other x by the formula f (x + 2π ) = f (x) (making f periodic too), then both sides of the Fourier series (3) are periodic with period 2π . Therefore whatever they do in the interval [−π, π ] is repeated every 2π . For example, the function f (x) = x, x ∈ [−π, π ] when continued outside the interval [−π, π ] as a function periodic with period 2π becomes a figure goes here Since the Fourier series for this particular function converges uniformly for all x ∈ (−π, π ) , it also converges uniformly to the periodically continued function for all x ∈ (kπ, kπ + 2π ), k = 1, ±1, ±2, . . . . This also makes it clear why the Fourier series for f (x) = x converges to zero at x = ±π , for the series is just converging to the arithmetic mean of its neighboring values at the discontinuity. It is pleasant to look at a picture. Let us see how the first four terms of its Fourier series approximates the function x x = 2(sin x − sin 2x sin 3x sin 4x + − + ···) 2 3 4 a figure goes here 136 CHAPTER 3. 
LINEAR SPACES: NORMS AND INNER PRODUCTS Notice that as more terms are used, the projection PN x PN x = 2(sin x − sin22x + · · · + N (−1)N +1 sinN x ) more and more closely approximate x . This reflects the convergence of the Fourier series, PN f → f . One popular interpretation of a Fourier series is as a sum of “waves” which approximate a given function. Thus the function x is the sum of 2 times the wave sin x plus (−1) times the wave sin 2x and so on. In other words, the Fourier series for the function f (x) = x represents that function as the superposition of sine waves. The term 2 sin x is spoken of 2 as the first harmonic, the term – sin 2x as the second harmonic, the term 3 sin 3x as the third harmonic, etc. Although it is difficult to believe, the ear hears by taking the sound wave f (x) which impinges on the ear drum and splitting it up into its Fourier components (3). It then analyzes each component an en + bn en —only considering the coefficients an and bn . These ˜ Fourier coefficients measure the intensity of the n th harmonic. Particular sounds are then heard in terms of the intensity of their various harmonics. We recognize familiar sounds by recognizing that the sound waves have similar Fourier coefficients. Amazing. It is time to consider the convergence of Fourier series. The question is: does the partial Fourier series N PN f = a0 e0 + an en + bn en ˜ n=0 converge to the function f as N → ∞ . Since there are several norms, in particular the L2 norm and the uniform norm ∞ , we must investigate convergence in each norm. Even though our proofs are reasonably slick, they are neither short nor particularly simple. A great deal of analytical technique will be needed. The proofs to be presented have been chosen because each of the devices invoked are important devices in their own right. We begin with some useful facts which have nothing especially to do with Fourier series. Theorem 3.22 (Weierstrass Approximation Theorem). If f (x) is continuous in the interval [−π, π ] and f (−π ) = f (π ) , then given any > 0 there is a trigonometric polynomial N TN (x) = α0 + αn cos nx + βn sin nx n=1 N (3-27) ˆ˜ αn en + βn en , ˆ = α0 e0 + ˆ n=1 √ √ ˆ√ ˆ ˆ (where α0 = α0 2π, αn = αn π, βn = βn π ), such that f − TN ∞ = max |f (x) − TN (x)| < . − π ≤ x≤ π Note that the numbers αn and βn are not necessarily the Fourier coefficients of f . The proof, which is placed as an appendix at the end of this section, will indicate how they can be found. The following theorem states that convergence in the uniform norm implies convergence in the L2 norm. Theorem 3.23 . If θ(x) is any bounded integrable function, then (if b > a ) √ θ ≤ b − a θ ∞. 3.4. FOURIER SERIES. Proof: Since θ ∞ 137 = max |θ(x)| we find immediately that x∈[a,b] b b θ(x)2 dx ≤ a b θ 2 ∞ dx =θ 2 ∞ dx = (b − a) θ a 2 ∞ a from which the conclusion is obvious. On geometrical grounds the theorem is even easier, since θ ∞ is the greatest height of the curve θ(x) . Although convergence in the L2 norm does not imply convergence in the uniform norm (the example in Section 1 comparing L1 convergence and uniform convergence also works for L2 ), a useful weaker statement is true. Theorem 3.24 . (cf. Ex. 15 Section 3). If θ ∈ C 1 [a, b] and θ(x0 ) = 0 , where x0 ∈ [a, b] , then for every x ∈ [a, b] |θ(x)| ≤ √ b b−a θ dt = √ b−a θ . a Since the right side is independent of x , this implies that θ ∞ = maxx∈[a,b] |θ(x)| ≤ √ b−a θ = √ b − a Dθ Proof: By the fundamental theorem of calculus, x θ(x) = θ(x) − θ(x0 ) = θ (t) dt. 
x0 Thus the Cauchy-Schwarz inequality yields 2 2 x |θ(x)| = 1 · θ (t) dt x x 12 dt ≤ x0 x0 θ (t)2 dt x0 x θ (t)2 dt = (x − x0 ) (3-28) x0 b θ (t)2 dt. ≤ (b − a) a Therefore |θ(x)|2 = (b − a) θ 2 . With these preliminaries behind us we turn to the convergence of Fourier series. First up is convergence in the L2 norm. Theorem 3.25 . Assume f is continuous in the interval [−π, π ] and f (−π ) = f (π ) . Denote the sum of the first N terms of its Fourier series by PN f . Then π lim N →∞ [f (x) − (PN f )(x)]2 dx = 0 f − PN f = lim N →∞ −π 138 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS Proof: Given any > 0 , let TN (x) be the trigonometric polynomial given by Weierstrass Approximation Theorem. The trick is to apply Theorem 17. Using the N of TN , we know that N PN f = a0 e0 + an en + bn en ˜ n=1 and N ˆ˜ αn en + βn en . ˆ TN = a0 e0 + ˆ n=1 Let A be the subspace of H = L2 [−π, π ] spanned by e0 , e1 , e1 , . . . eN , eN Then both PN f ˜ ˜ and TN are in A . Thus by Theorem 17 of the last section (where slightly different notation was used), f − PN f ≤ f − T N , and by Theorem 20 ≤ √ b − a f − TN < ∞ √ b−a . Thus lim N →∞ f − PN f = 0, proving the theorem. Corollary 3.26 (Parseval’s Theorem). If f (x) is continuous in the interval [−π, π ] and f (−π ) = f (π ) , then f 2 = lim PN f 2 , N →∞ that is, ∞ π 2 f (x) dx = −π a2 0 (a2 + b2 ), n n + n=1 where the Fourier coefficients aj and bj are determined by equations (2) or (2)’. Proof: The Corollary to Theorem 16 states that f 2 = PN f 2 + f − PN f 2 . If we now let N → ∞ , the second term on the right vanishes by the theorem just proved. Remark: The theorem and corollary state that the orthonormal set of functions e0 = √1 , en (x) = cos nx , and en (x) = sin nx is a complete orthonormal set for the scalar prod√ √ ˜ π π 2π uct space L2 [−π, π ] . The formula contained in the corollary is a generalization of the Pythagorean Theorem to L2 [−π, π ] . The proof of convergence in the uniform norm if the function has one continuous derivative is only slightly more difficult. We shall need a preliminary Lemma 3.27 . Assume f ∈ C 1 [−π, π ] . Extend it as a periodic function with period 2π by f (x + 2π ) = f (x) . Let (PN f ) be the sum of the first N terms of its Fourier series. Then the sum of the first N terms in the Fourier series for Df = df is PN (Df ) , that is dx PN (Df ) = D(PN f ). This in not necessarily true for other bases in L2 [−π, π ] . 3.4. FOURIER SERIES. 139 Proof: We know that N 1 cos nx sin nx (PN f )(x) = a0 √ + an √ + bn √ . π π 2π n=1 Since we can differentiate a finite sum term by term, we find that N D(PN f )(x) = sin nx cos nx −nan √ + nbn √ , π π n=1 where the an and bn are found by using formulas (2)’. If N cos nx 1 sin nx An √ + Bn √ , PN (Df ) = A0 √ + π π 2π n=1 where the An and Bn are also found by using (2)’, we must show that An = nbn , and Bn = −nan A0 = 0, But π A0 = 1 1 (Df (x)) √ dx = √ [f (π ) − f (−π )] = 0 2π 2π −π since f is periodic. Integrating by parts, we further find π cos nx dx = n (Df (x)) √ π −π An = and π Bn = sin nx (Df (x)) √ dx = −n π −π π sin nx f (x) √ dx = nbn π −π π cos nx dx = −nan . f (x) √ π −π Our result is now only a few steps away. Theorem 3.28 . If f ∈ C 1 [−π, π ] and if both f and f are periodic with period 2π , then the Fourier series PN f converges to f in the uniform norm lim N →∞ f − PN f ∞ = 0. Proof: The key observation is that f is a continuous function, so that Theorem 22 can be applied to its Fourier series. This shows that lim N →∞ Df − PN (Df ) = 0. 
By the above lemma, D(f − Pn f ) = Df − D(PN f ) = Df − PN (Df ). Thus lim N →∞ D(f − PN f ) = 0. (3-29) 140 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS We would like to apply Theorem 21 to the function θN = f − PN f . In order to do so, we must only verify that θN vanishes somewhere in [−π, π ] . But the area under θN = f − PN f is π √ θN (x) dx = 2π θN , e0 = 0 −π since f − PN f is orthogonal to the space spanned by e0 , e1 , e1 , . . . , eN , eN (Theorem 16c). ˜ ˜ Because θN (x) is a continuous function (the difference of the C 1 ) function f and the infinitely differentiable trigonometric polynomial PN f ), the area under it can be zero only if θN vanishes somewhere. Thus Theorem 21 is applicable and yields the inequality √ f − PN f ∞ ≤ b − a D(f − PN f ) . We now pass to the limit N → ∞ and use equation (4) to complete the proof of the theorem: √ lim f − PN f ∞ ≤ lim b − a D(f − PN f ) = 0. N →∞ N →∞ C 1 [−π, π ] Remarks. The hypothesis that f ∈ and is periodic with period 2π has been proved a sufficient condition for the Fourier series to converge to the function in the uniform norm. Much weaker hypotheses also suffice to prove the same result—but mere continuity is not enough. Convergence of Fourier series or generalizations thereof is a vast and deep subject, one still the object of intense study. On the basis of the theorems we have proved, many other problems are reasonably accessible—like the convergence of the Fourier series for a function which is nice except for a finite number of jump discontinuities. But there is not time for this pleasant excursion. a figure goes here 3.5 Appendix. The Weierstrass Approximation Theorem . The proof—which is difficult—will be given as a series of lemmas. Lemma 3.29 . If f (x) is continuous and periodic with period 2π , then for any a ∈ R , the following equality holds a+2π 2π f (x) dx = f (x) dx. a 0 Proof: This is clear from a graph of f , since the area under one period of f does not depend upon where you begin measuring. We also offer a computational proof. Write a+2π 0 f (x) dx = a 2π f (x) dx + a a+2π f (x) dx + 0 f (x) dx. 2π Let x = t + 2π in the last integral and use the fact that f (t + 2π ) = f (t) . The last integral is then 0 − f (t) dt, a which cancels the unwanted term in the last equation and proves the lemma. 3.5. APPENDIX. THE WEIERSTRASS APPROXIMATION THEOREM π /2 cos2n t dt = Lemma 3.30 . 0 141 1 2 · 4 · 6 · · · (2n) 1 , where cn = v . 2cn π 1 · 3 · 5 · · · (2n − 1) Proof: A computation. Integrate by parts to show that π /2 cos2n t dt = (2n − 1)(I2n−2 − I2n ). I2n = 0 Thus I2n = I0 = π/2 . 2n−1 2n I2n−2 . Now induction can be used to do the rest, since by observation Lemma 3.31 . Assume f (x) is continuous and periodic with period 2π . Let TN (x) = Then given any cN 2 π f (t) cos2N ( −π t−x ) dt 2 (3-30) > 0 , there is an N such that f − TN = max |f (x) − TN (x)| < . ∞ − π ≤ x≤ π Proof: How did we guess the formula (4)? We observed that cos2N x is one at x = 0 , and strictly less than one for all other x ∈ [−π, π ] . Thus, for large N, cos2N x is one at x = 0 , and decreases sharply thereafter so cos2N ( t−x ) has the same property at x − t = 0 , 2 where x = t . Then essentially the only values of f (t) which will count are those about t = x , so what comes out will be f (x) . Let us proceed with the details. Take s = t−x . Then 2 π /2 f (x + 2s) cos2N s ds. TN (x) = cN −π/2 Split the integral into two pieces, from − π to 0 and from 0 to 2 −s in the first one. 
This gives π 2 , and then replace s by π /2 [f (x) + 2s) + f (x − 2s)] cos2N s ds. TN (x) = cN 0 From Lemma 2 we know that π /2 2f (x) cos2N s ds, f (x) = cN 0 since f (x) is a constant in the integration with respect to s . Therefore π /2 [f (x + 2s) − 2f (x) + f (x − 2s)] cos2N s ds. TN (x) − f (x) = cN 0 Now given any such that > 0 , from the continuity of f we can pick a δ > 0 independent of x |f (x1 ) − f (x2 )| < when |x1 − x2 | < δ. 2 This will be the of our conclusion. Break the integral into two parts, one from 0 to δ and the other from δ to π/2 , where δ is the δ we just found. Then in the [0, δ ] interval, |f (x + 2s) − 2f (x) + f (x − 2s)| ≤ |f (x + 2s) − f (x)| + |f (x) − f (x − 2s)| < , 142 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS while in the [δ, π ] interval, 2 |f (x + 2s) − 2f (x) + f (x − 2s)| ≤ |f (x + 2s)| + 2 |f (x)| + |f (x − 2s)| ≤ 4M, where M = max |f (x)| . Hence x∈[−π,π ] δ π /2 cos2N s ds + 4M |f (x) − TN (x)| < cN [ 0 cos2N s ds]. 0 Now we observe that π /2 δ cos2N s ds < cos2N s ds = 0 0 1 , 2cN and that, since cos s decreases as s goes to π/2 , π π /2 cos2N s ds < cos2N δ ds < δ δ πN γ, 2 where γ = cos2 δ < 1 . Thus |f (x) − TN (x)| < Now πcN = 2 3 2 + 2πM cN γ N . 4 2N − 2 · 5 · · · 2N −1 · 2N < 2N , so that 2πM cN γ N < 4M N γ N . Because γ < 1 , we know that lim N γ N = 0 . Thus, pick N so large that N γ N < N →∞ same 8M , where this is the as before. Consequently, for this N , |f (x) − TN (x)| < . Since is independent of x , f (x) − TN (x) = max |f (x) − TN (x)| < x∈[−π,π ] too. A difficult lemma is thereby proved. The whole proof is completed in the following simple Lemma 3.32 . The function TN (x) defined by (3) TN (x) = cN 2 π f (t) cos2N ( −π t−x ) dt 2 is a trigonometric polynomial. Proof: This can be horribly messy unless one is shrewd. We shall use the formula eiθ = cos θ + i sin θ and the binomial theorem (top p. 108). First notice that cos2N θ = eiθ + e−iθ 2 2N = 1 22N 2N k=0 (2N )! eikθ e−i(2N −k)θ . (2N − k )k ! 3.5. APPENDIX. THE WEIERSTRASS APPROXIMATION THEOREM 143 Let dk = (2N )!/22N (2N − k )!k ! Then 2N dk e−i(2N −2k)θ cos2N θ = k=0 2N (3-31) dk [cos(2N − 2k )θ − i sin(2N − 2k )θ]. = k=0 Since cos2N θ is real, the sum of the imaginary terms on the right must be zero. Thus, replacing 2θ by t − x , we find that cos2N ( t−x 2= ) 2N dk cos(N − k )(t − x) k=0 2N (3-32) dk [cos(N − k )t cos(N − k )x + sin(N − k )t sin(N − k )x]. = k=0 Split the sum into two parts, one from 0 to N , the other from N + 1 to 2N , and let n = N − k in the first, n = k − N in the second. This gives cos2N ( t−x )= 2 N dN −n [cos nt cos nx + sin nt sin nx] n=0 N + (3-33) dN +n [cos nt cos nx + sin nt sin nx], n=1 so cos2N ( t−x ) = dN + 2 N (dN +n + dN −n )[cos nt cos nx + sin nt sin nx], n=1 which is much more simple than one might have anticipated. Substituting this into (4) and realizing that the t integrations just yield constants, we find that TN (x) is indeed a trigonometric polynomial. Coupled with Lemma 3, the proof of Weierstrass’ Approximation Theorem is completely proved. Exercises (1) Find the Fourier series with period 2π for the given functions. (a) f (x) = 0, −π ≤ x ≤ 0 2, 0 < x < π (b) f (x) = −2, −π ≤ x < 0 2, 0≤x<π (c) f (x) = sin 17x + cos 2s, −π ≤ x < π (d) f (x) = sin2 x, −π ≤ x ≤ π ; (e) f (x) = x2 , −π ≤ x ≤ π 144 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS (f) f (x) = ∞ x + π, −π ≤ x ≤ 0 −x + π, 0 ≤ x ≤ π (Also, compute f 2 (a2 + b2 ) n n and a2 + 0 n=1 for (a)-(f)). 
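The closing request of Exercise 1—compute ‖f‖² and a0² + Σ(an² + bn²)—is a direct check of Parseval's Theorem, which Exercise 2 below exploits. Here is a numerical sketch for part (e), f(x) = x²; it is our own (Python with numpy, trapezoid quadrature), not part of the notes, and of course it only confirms the identity to machine accuracy.

```python
import numpy as np

# Parseval's relation ||f||^2 = a0^2 + sum(an^2 + bn^2), checked numerically
# for f(x) = x^2 on [-pi, pi]  (Exercise 1(e)).
x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
f = x ** 2

def ip(u, v):                                    # trapezoid rule for <u, v>
    w = u * v
    return dx * (w.sum() - 0.5 * (w[0] + w[-1]))

norm_sq = ip(f, f)

a0 = ip(f, np.ones_like(x) / np.sqrt(2 * np.pi))
total = a0 ** 2
for n in range(1, 201):
    an = ip(f, np.cos(n * x) / np.sqrt(np.pi))
    bn = ip(f, np.sin(n * x) / np.sqrt(np.pi))
    total += an ** 2 + bn ** 2

# the two sides agree (both equal 2*pi^5/5)
print(round(norm_sq, 4), round(total, 4), round(2 * np.pi ** 5 / 5, 4))
```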
(2) (a) Apply Parseval’s Theorem (Corollary to Theorem 22) to the function f (x) = x and its Fourier series to deduce that 1 1 1 π2 = 1 + 2 + 2 + 2 + ··· 6 2 3 4 (cf. the example before Theorem 17 of Section 3). (b) Do the same for the function f (x) = x2 (Ex. 1, e above) to evaluate 1+ 1 1 1 + 4 + 4 + · · · =? 4 2 3 4 (3) A function f (x) is even if f (−x) = f (x) , odd if f (−x) = −f (x) . Thus 2 + x2 is an even function, x3 − sin x is an odd function, while 1 + x is neither even nor odd. Let an and bn be the Fourier coefficients of the piecewise continuous function f (x) . Prove the following statements. (a) If f is an odd function, π an = 0, bn = 2 0 sin nx f (x) √ dx π (b) If f is an even function π an = 2 0 cos nx f (x) √ dx, π bn = 0 (c) A function f defined in [0, π ] may be extended to [−π, π ] as either an even or odd function by the formulas even extension : f (−x) = f (x), x ≥ 0, f (−x) = −f (x), x ≥ 0. or odd extension : The even extension of f (x) = x, x ∈ [0, π ] is f (x) = |x| , x ∈ [−π, π ] , while its odd extension is f (x) = x, x ∈ [−π, π ] . The odd extension of f (x) = x2 , x ∈ x2 , x ∈ [0, π ] [0, π ] is f (x) = . Extend the function f (x) = 1, x ∈ [0, π ] −x2 , x ∈ [−π, 0] to the interval [−π, π ] as an odd function and sketch its graph. Find its Fourier series using part (a). (4) (a) Let f (x) be a given function. Find a solution of the O.D. E. u + λ2 u = f , where λ is a real number and u(x) satisfies the boundary condition u(−π ) = u(π ) = 0 , by the following procedure: Expand f in its Fourier series and assume u has a Fourier series whose coefficients are to be found. Find a formula for the Fourier coefficients of u in terms of those for f in the case where λ is not an integer. 3.5. APPENDIX. THE WEIERSTRASS APPROXIMATION THEOREM 145 (b) If λ = n is an integer, show that there is a solution if and only if 0 = f , en = ˜ π sin nx f (x) √ dx . π −π (5) (a) State Parseval’s Theorem for the special cases i) f is a continuous even function in [−π, π ] , and ii) f is a continuous odd function in [−π, π ] . (b) If f is a continuous even function in [−π, π ] and π f (x) cos nx dx = 0, n = 0, 1, 2, 3, . . . , 0 show that f = 0 in [−π, π ] . (c) State and prove a theorem similar to (b) in the case of a continuous odd function. (6) In this exercise you show how a function f ∈ L2 [−A, A] can be expanded in a modified Fourier series (so far we know only L2 [−π, π ] ). Let y = πx —this maps the interval A [−A, A] onto [−π, π ] —and define g (y ) by f (x) = f ( πx Ay ) = g (y ) = g ( ). π A Since g (y ) ∈ L2 [−π, π ] , it can be expanded in a Fourier series ∞ cos ny 1 sin ny g (y ) = a0 √ + + bn √ , an √ π π 2π n=1 where the an and bn are given by the usual formulas (2)’. (a) Prove that f (x) ∈ L2 [−A, A] has the modified Fourier series ∞ bn 1 nx nπ f (x) = a0 √ x + √ sin x, + cos A A 2A n=1 A where 1 a0 = √ 2A 1 an = √ A A f (x) cos −A nπx dx, A A f (x) dx −A 1 bn = √ A A f (x) sin −A nπx dx. A (b) Find the modified Fourier series for f (x) = |x| , in the interval [−1, 1] . The following exercises all concern the Weierstrass Approximation Theorem. (7) Prove the following version of the Weierstrass Approximation Theorem. Let f ∈ C [a, b] . Then given any > 0 , there is a polynomial Q(x) such that f −Q ∞ = max |f (x) − Q(x)| < . x∈[a,b] −a (Hint: Let y = −π +2 (x−a ) π . This maps [a, b] into [−π, π ] . Define g (y ), y ∈ [−π, π ] b by (b − a) (x − a) (y + π ) = g (y ) = g (−π + 2 π ). f (x) = f (a + 2π b−a 146 CHAPTER 3. 
LINEAR SPACES: NORMS AND INNER PRODUCTS Use the version of the theorem proved to approximate g (y ), y ∈ [−π, π ] by a trigonometric polynomial TN (y ) to within /2 . Then approximate sin ny and cos ny to within c (you pick c ) by a finite piece of their Taylor series—which are polynomials. Put both parts together to obtain the complete proof for g (y ) . The transition back to f (x) is trivial.] (8) (Riemann-Lebesgue Lemma). Let f ∈ C [a, b] . Prove that b lim f (x) sin λx dx = 0. λ→∞ a [Hint: Integrate by parts to prove it first for all f ∈ C 1 [a, b] . For arbitrary f , approximate f by a polynomial—Ex. 7 above—to within /2 and realize that every polynomial is in C 1 [a, b] ]. (9) If f ∈ C [0, 1] , prove that 1 f (x)xn dx = f (1). lim n n→∞ 0 [Hint: Use the hint in Ex. 8]. (10) If f ∈ C [a, b] , and if b f (x)xn dx = 0, n = 0, 1, 2, 3, . . . , a b show that f = 0 . [Hint: This implies that f (x)Q(x) dx = 0 , where Q is a ˜ any polynomial. f can be approximated by some polynomial Q . Now show that b f 2 (x) dx = 0 .] a 3.6 The Vector Product in R 3 . As you grasped many years ago, the world we live in has three space dimensions. For this reason the material in this section is important in many applications. What we intend to do is define a way to multiply two vectors X and Y in R3 . Whereas the scalar product X, Y is a scalar, this product X × Y , the vector product, or cross product as it is often called, is a vector. For several reasons [i) we shall not cover this in class, and ii) I can probably not do as good a job as appears in many books] we shall let you read about this topic elsewhere. But make sure to read about it even though you’ll never be examined on it. Chapter 4 Linear Operators: Generalities. V 1 → Vn, Vn → V 1 4.1 Introduction. Algebra of Operators . Let V by a linear space. So far we have considered the algebraic structure of such a space; however most significant reason for studying linear spaces is so that one can study operators defined on them. Operator is another, more organic, name for function. Thus an operator T: A→B T maps elements in its domain A into elements of B , where B contains the range of T . If X ∈ A , then T (X ) = Y ∈ B . Think of feeding X into the operator T , and Y being a figure goes here what T sends out in return. It is useful to think of T as some type of machine or factory, the input (raw material) is X , and the output is Y . Some examples should illustrate the situation and its potential power. Examples: (1) Let V = R2 . If X = (x1 , x2 ) ∈ R2 , and Y = (y1 , y2 , y3 ) , we define T (X ) = Y by x1 + 2x2 = y1 x1 + x2 = y2 T (X ) = , 3x1 + x2 = y3 or T (X ) = T (x1 , x2 ) = (x1 + 2x2 , x1 + x2 , 3x1 + x2 ) = (y1 , y2 , y3 ) = Y. This operator T has the property that to every X ∈ R2 it assigns a Y ∈ R3 . In other words T maps the two dimensional space R2 into the three dimensional space R3 T : R2 → R3 . 147 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 148 R2 is the domain of T , denoted by D(T ) , while the range of T, R(T ) is contained in R3 , D(T ) = R2 , R(T ) ⊂ R3 . Since y1 = y2 = 0 implies that x1 = x2 = 0 , which in turn implies that y3 = 0 , we see that the point (0, 0, 1) ∈ R3 is not in the range of T . Thus, T is not surjective onto R3 . It is injective (one-to-one) since every point Y ∈ R(T ) is the image of exactly one X ∈ D(T ) . This an be seen by observing that y1 and y2 suffice to determine X = (x1 , x2 ) uniquely by solving the first two equations −y1 + 2y2 = x1 y1 − y2 = x2 . 
Hence if Y = T (X1 ) and also Y = T (X2 ) , then X1 = X2 . Since the operator T is completely determined by the coefficients in the equations, it is reasonable to represent this T by the matrix 12 T = 1 1 31 If you care to think of X as the input into a paint-making machine, then x1 might represent the quantity of yellow and x2 the quantity of blue used. In this case y1 , y2 and y3 represent the quantities of three different shades of green the machine yields. For this machine, as soon as you specify the desired quantities of any two of the greens, say y1 and y2 , the quantities x1 and x2 of the input colors are completely determined, as is the quantity y3 of the remaining shade of green. (2) Let V be R2 again. With X = (x1 , x2 ) ∈ R2 , and Y = (y1 ) ∈ R1 , define T by x2 + x2 = y1 , 1 2 or T (X ) = x2 + x2 . 1 2 This operator T maps R2 into R1 T : R2 → R1 . It is not surjective onto R1 since the negative half of R1 is completely omitted from R(T ) . Furthermore, it is not injective either since each point y1 ∈ R(T ) other than zero is the image of infinitely many points—all of those on the circle x2 + x2 = y1 . 1 2 (3) Let V be C [−1, 1] . If f ∈ C [−1, 1] , we define T by T (f ) = f (0). Thus, if f (x) = 2 + cos x , then T f = 3 . This operator T is usually denoted by δ and called the Dirac delta functional. It was first used by Dirac in his work on quantum mechanics and is extremely valuable in modern mathematics and physics. T assigns to each continuous function f its value at x = 0 , a real number. Therefore T : C [−1, 1] → R1 . 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 149 The operator T is not injective, since for example the element 2 ∈ R1 is the image of both f (x) = 1 + ex and f (x) = 2 . It is surjective since every element a ∈ R1 is the image of at least one element in C [−1, 1] (if f (x) ≡ a , then clearly T (f ) = a ). (4) Let V be C [−1, 1] . If f ∈ C 1 [−1, 1] then the differentiation operator D is defined by df (x). (Df )(x) = dx It maps each function into its derivative. If f (x) = x2 , then (Df )(x) = 2x . Since the derivative of a continuously differentiable function (a function in C 1 ) is necessarily continuous, we see that D : C 1 [−1, 1] → C [−1, 1]. D is not injective since, for example, the function g (x) = 1 is the image of both f1 (x) = x and f2 (x) = 2 + x . D is surjective onto C [−1, 1] . R(D) = C [−1, 1], since if g (x) is any element of C [−1, 1] , then g is the image of the particular function f ∈ C 1 [−1, 1] defined by x f (x) = g (s) ds, 0 because Df = g by the fundamental theorem of calculus. Throughout this and the next chapter we will study some of the elementary aspects of linear operators. It is reasonable to denote a linear operator by L . Definition Let V1 and V2 both be linear spaces over the same field of scalars. An ˜ operator L mapping V1 into V2 is called a linear operator if for every X and X in V1 and any scalar a , L satisfies the two conditions ˜ ˜ 1. L(X + X ) = L(X ) + L(X ) 2. L(aX ) = aL(X ). Whenever ambiguity does not arise, we will omit the parentheses and write LX instead of L(X ) . An equivalent form of the definition is Theorem 4.1 . L is a linear operator ⇐⇒ ˜ ˜ L(aX + bX ) = aL(X ) + bL(X ), ˜ where X, X ∈ V1 and a and b are any scalars. Proof: ⇒ ˜ ˜ L(aX + bX ) = L(aX ) + L(bX ) (property 1) ˜ = aLX + bLX (property 2). (4-1) ⇐ Property 1 is the special case a = b = 1 . Property 2 is the special case b = 0 . Remark: It is useful to observe that always L(0) = L(0 · X ) = 0L(X ) = 0 . 
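A machine can spot-check the two conditions of the definition at randomly chosen vectors; such a test can expose non-linearity, but of course it can never prove linearity. The sketch below is our own (Python with numpy) and tries the operators T of Examples 1 and 2 above.

```python
import numpy as np

# Numerical spot-check of L(aX + bY) = aL(X) + bL(Y) at random vectors.
rng = np.random.default_rng(0)

def T1(X):                       # Example 1: R^2 -> R^3
    x1, x2 = X
    return np.array([x1 + 2 * x2, x1 + x2, 3 * x1 + x2])

def T2(X):                       # Example 2: R^2 -> R^1
    x1, x2 = X
    return np.array([x1 ** 2 + x2 ** 2])

def looks_linear(T, trials=100):
    for _ in range(trials):
        X, Y = rng.normal(size=2), rng.normal(size=2)
        a, b = rng.normal(size=2)
        if not np.allclose(T(a * X + b * Y), a * T(X) + b * T(Y)):
            return False
    return True

print(looks_linear(T1))   # True  (and in fact T1 is linear)
print(looks_linear(T2))   # False (the scaling condition fails)
```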
This identity is often the easiest way to test if an operator is not linear. 150 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Examples: (1) The operator L defined by example 1 where L : R2 → R3 is LX = (x1 + 2x2 , x1 + x2 , 3x1 + x2 ), ˜ is linear. Let X = (x1 , x2 ) and X = (˜1 , x2 ) . Then x˜ ˜ L(X + X ) = (x1 + x1 + 2x2 + 2˜2 , x1 + x1 + x2 + x2 , 3x1 + 3˜1 + x2 + x2 ) ˜ x ˜ ˜ x ˜ = (x1 + 2x2 , x1 + x2 , 3x1 + x2 ) + (˜1 + 2˜2 , x1 + x2 , 3˜1 + x2 ) x x˜ ˜x ˜ ˜ = LX + LX and L(aX ) = (ax1 + 2ax2 , ax1 + ax2 , 3ax1 + ax2 ) = a(x1 + 2x2 , x1 + x2 , 3xz + x2 ) = aLX. (4-2) (2) The operator T X = x2 + x2 with domain R2 and range R1 is not linear, since 1 2 T (aX ) = (ax1 )2 + (ax2 )2 = a2 [x2 + x2 ] = aT X 1 2 except for the particular scalars a = 0, 1 . df (3) The operator Df = dx with domain C 1 [−1, 1] and range C [−1, 1] is linear since if f1 and f2 are in C 1 [−1, 1] and a and b are any real numbers, then by elementary calculus d df1 df2 (af1 + bf2 ) = a +b dx dx dx = aDf1 + bDf2 . D(af1 + bf2 ) = (4-3) (4) The operator L defined as Lu = a2 (x)u + a1 (x)u + a0 (x)u, (= d ), dx where u(x) ∈ D(L) = C 2 , and where a0 (x) , a1 (x) , and a2 (x) are continuous functions, is a linear operator, L : C 2 → C. If A and B are any constants (scalars for C 2 ), then for any u1 and u2 ∈ C 2 , L(Au1 + Bu2 ) = ax [Au1 + Bu2 ] + a1 [Au1 + Bu2 ] + a0 [Au1 + Bu2 ] = a2 Au1 + a2 B2 + a1 Au1 + a1 Bu2 + a0 Au1 + a0 Bu2 = A[a2 u1 + a1 u1 + a0 u1 ] + B [a2 u2 + a1 u2 + a0 u2 ] = ALu1 + BLu2 . (4-4) 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 151 (5) The identity operator I is the operator which leaves everything unchanged. Because it is so simple, it can be defined on an arbitrary set S and maps S into itself S → S in a trivial way. If X ∈ S , then we define IX = X. What could be more simple? If S is a linear space V (so aX and X1 + X2 are defined), then I is trivially a linear operator, since I (aX1 + bX2 ) = aX1 + bX2 = aIX1 + bIX2 Why are linear operators important? There are several reasons. First, they are much easier to work with than nonlinear operators. Second, most of the operators which arise in applications are linear. The feature possessed by linear operators which is central to applications is that of superposition. If Lu1 = f and Lu2 = g , then L(u1 + u2 ) = f + g . In other words, if u1 is the response to some external influence f and u2 the response to g , then the response to f + g is found by adding the separate responses. The special case of a linear operator whose range is the real number line R1 arises often enough to receive a name of its own. Definition: A linear operator whose range is R1 is called a linear functional, V → R1 . The Dirac delta functional is such an operator. So is the operator 1 l(f ) = f (x) dx, 0 which assigns to every continuous function f ∈ C [0, 1] the real number equal to the area between the graph of f and the x -axis. Check that is linear. If the linear operator L : V1 → V2 the range of L —a subset of the linear space V2 —has a particularly nice structure. In fact, R(L) is not just any clump of points in V2 but Theorem 4.2 . The range of a linear operator L : V1 → V2 is a linear subspace of V2 . Remark: Even more is true. We shall prove (p. 312-3) that dim R(L) ≤ dim D(L) so that no matter how large V2 is, the range has at most the same dimension as the domain. Proof: The range of L consists of all elements Y ∈ V2 of the form Y = LX where X ∈ V1 . We know that R(L) is a subset of the linear space V2 . 
The only task is to prove that it is actually a subspace. Since V2 is a linear space, it is sufficient to show that the set R(L) is closed under multiplication by scalars, and under addition of vectors. i) R(L) is closed under multiplication by scalars. If Y ∈ R(L) , there is an X ∈ V1 = D(L) such ˜ ˜ that Y + LX . We must find some X in V1 such that aY = LX , where a is any scalar. ˜ Since aY = aLX = L(aX ) , we take X = aX . ii) R(L) is closed under addition of vectors. If Y1 and Y2 are in R(L) , there are elements X1 and X2 in V1 = D(L) such that Y1 = LX1 and Y2 = LX2 . We must show that ˜ ˜ Y1 + Y2 ∈ D(L) , that is, find some X ∈ V1 such that Y1 + Y2 = LX . But Y1 + Y2 = ˜ = X1 + X2 . LX1 + LX2 = L(X1 + X2 ) . Thus we can take X Before moving further on into the realm of special linear operators, we shall take this opportunity to define algebraic operations (addition and multiplication) for linear operators. But first we define equality, L1 = L2 , in a straightforward way. 152 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Definition: (equality) If L1 and L2 both map V1 into V2 , where V1 and V2 are linear spaces, and if L1 X = L2 X for all X in V1 , then L1 equals L2 . Thus, two operators are equal if they have the same effect on any vector. Addition is equally simple. Definition: (addition). If L1 : V1 → V2 and L2 : V1 → V2 then their sum, L1 + L2 , is defined by the rule (L1 + L2 )X = L1 X + L2 X, X ∈ V1 Examples: (1) Let L1 : R2 → R3 be defined by X = (x1 , x2 ) ∈ R2 L1 (X ) = (x1 + x2 , x1 + 2x2 , −x2 ), and L2 : R2 → R3 be defined by X = (x1 , x2 ) ∈ R2 . L2 X = (−3x1 + x2 , x1 − x2 , x1 ), Then L1 + L2 is defined, and is (L1 + L2 )X + L1 X + L2 X = (x1 + x2 , x1 + 2x2 , −x2 ) + (−3x1 + x2 , x1 − x2 , x1 ) = (−2x1 + 2x2 , 2x1 + x2 , x1 − x2 ) (4-5) (2) Let D : C 1 → C be defined by Du = du dx u ∈ C 1, and L : C 1 → C be defined by 1 Lu = ex−t u(t) dt u ∈ C1 0 1 x =e (4-6) −t e u(t) dt. 0 (In reality, L may be defined on a much larger class of functions— u ∈ C is plenty, while its image is the smaller space, constant ex ⊂ C . We have decided on the smaller domain and larger image space so that the sum D + L is defined). Then for any u ∈ C 1 . (D + L)u = Du + Lu = du + dx 1 ex−t u(t) dt. 0 The following theorem is a statement of some simple facts about the sum of two linear operators. Theorem 4.3 . Let L1 , L2 , L3 , . . . be any linear operators which map V1 → V2 , so that their sums are defined. Then 0. L = L1 + L2 is a linear operator (1) L1 + (L2 + L3 ) + (L1 + L2 ) + L3 , 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 153 (2) L1 + L2 = L2 + L1 (3) Let 0 be the operator which maps every element of V1 into 0 ∈ V2 , so 0X = 0 . Then L1 + 0 = L1 . (4) L1 + (−L1 ) = 0 . Here −L1 is the operator which maps every element X ∈ V1 into −(L1 X ) . Proof: These are just computations. Let X1 , X2 ∈ V1 . 0. L(aX1 + bX2 ) = (L1 + L2 )(aX1 + bX2 ) = L1 (aX1 + bX2 ) + L2 (aX1 + bX2 ) = aL1 X1 + bL1 X2 + aL2 X1 + bL2 X2 = a(L1 X1 + L2 X1 ) + b(L1 X2 + L2 X2 ) (4-7) = a(L1 + L2 )X1 + b(L1 + L2 )X2 = aLX1 + bLX2 . (1) (L1 + (L2 + L3 ))X = L1 X + (L2 + L3 )X = L1 X + L2 X + L3 X = (L1 + L2 )X + L3 X = ((L1 + L2 ) + L3 )X. (2) (L1 + L2 )X = L1 X + L2 X = L2 X + L1 X = (L2 + L1 )X . The step L1 X + L2 X = L2 X + L1 X is justified on the grounds that the vectors Y1 := L1 X and Y2 := L2 X are elements of V2 —which is a linear space—so that Y1 + Y2 = Y2 + Y1 . (3) (L1 + 0)X = L1 X + 0X = L1 X + 0 = L1 X Note that the 0 in 0X is an operator, while the 0 in the next step is an element of V2 . 
This ambiguity causes no trouble once you understand it. (4) (L1 + (−L1 ))X = L1 X + (−L1 )X = L1 X − L1 X = 0 The crucial step (−L1 )X = −L1 X is the definition of the operator (−L1 ) . Remark: This theorem states that the set of all linear operators mapping one linear space V1 into another V2 form an abelian group under addition. Multiplication of operators is not much more difficult. If L1 and L2 are linear operators, then their product L2 L1 in that order is defined by the rule L2 L1 X = L2 (L1 X ) . In other words, first operate on X with L1 giving a vector Y = L1 X . Then operate on this new vector Y with L2 , giving L2 Y = L2 (L1 X ) . It is clear that in order for this to make sense, for every X ∈ D(L1 ) , the new vector Y = L1 X must be in the domain of L2 . Thus to form the product L2 L1 , we require that R(L1 ) ⊂ D(L2 ) . Look at our machine again. a figure goes here 154 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 The multiplication L2 L1 means sending the output from L1 as input into L2 . In order to join the machines in this way, surely one necessary requirement is that L2 is equipped to act on the output from R(L1 ) , that is, R(L1 ) ⊂ D(L2 ) . Of course the L2 machine might be able to digest input other than what L1 sends out. But all we care is that L2 can digest at least what L1 sends it. Definition: (multiplication). Let L1 : V1 → V2 and L2 : V3 → V4 . If the range of L1 is contained in the domain of L2 , R(L1 ) ⊂ D(L2 ) , then the product L2 L1 is definable by the composition rule L2 L1 X = L2 (L1 X ), where X ∈ V1 = D(L1 ). The product L2 L1 maps the input V1 for L1 into the output V4 for L2 , L2 L1 : V1 → V3 → V4 . We exhibit a little diagram (cf. p. ???). a figure goes here The way to get from V1 to V4 using L2 L1 is to first use L1 to reach V2 . Then use L2 to get to V4 . Remarks: If L2 L1 is defined, it is not necessarily true that L1 L2 is defined (Example 1 below). Furthermore, even if L1 L2 is also defined, it is only a rare coincidence that multiplication is commutative. Usually L2 L1 = L1 L2 when both products are defined. Thus the order L2 L1 is important. Examples: (1) Let L1 : R2 → R3 be defined as L1 X = (x1 − x2 , x2 , −x1 − 2x2 ), where X = (x1 , x2 ) ∈ R2 , and let L2 : R3 → R1 be defined as L2 Y = (y1 + 2y2 − y3 ), where Y = (y1 , y2 , y3 ) ∈ R3 . L 1 Then R(L1 ) ⊂ R3 = D(L2 ) so that the product L2 L1 is definable and L2 L1 : R2 → L2 R3 → R1 . Consider what L2 L1 does to the particular vector X0 = (−1, 2) ∈ R2 . L2 L1 X0 = L2 (L1 X0 ) = L2 (−3, 2, −3) = (−3 + 4 + 3 = 4) Thus L2 L1 maps (−1, 2) ∈ R2 into 4 ∈ R1 . More generally, if X is any vector in R2 , L2 L1 X = L2 (L1 X ) = L2 (x1 − x2 , x2 , −x1 − 2x2 ) = (x1 − x2 + 2x2 + x1 + 2x2 ) = 2x1 + 3x2 ∈ R1 . (4-8) Thus L2 L1 maps (x1 , x2 ) ∈ R2 into 2x1 + 3x2 ∈ R1 . Since R(L2 ) = R1 and D(L1 ) = R2 , R(L2 ) not ⊂ D(L1 ) so that the product L1 L2 is not defined. You might be thinking that R1 is part of R2 . What you mean is that R2 has one dimensional subspaces. It certainly does—an infinite number of them, all of the straight lines through the origin. Because there are so many subspaces of R2 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 155 which are one dimensional, there is no natural way of regarding R1 as being contained in R2 . [On the other hand, there is a natural way in which C 1 can be regarded as contained in C . We used this above in our second example for addition of linear operators]. 
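Example 1 can be redone by machine once each operator is written out by its array of coefficients, as was done for the operator T in Example 1 of the last section. The following sketch is ours (Python with numpy); it confirms the computation L2 L1 X = 2x1 + 3x2 and shows why L1 L2 is not even defined.

```python
import numpy as np

# Example 1 above with coefficient matrices: L1 : R^2 -> R^3, L2 : R^3 -> R^1.
# Composition of the operators corresponds to the matrix product.
L1 = np.array([[ 1, -1],
               [ 0,  1],
               [-1, -2]])          # L1 X = (x1 - x2, x2, -x1 - 2 x2)
L2 = np.array([[1, 2, -1]])        # L2 Y = y1 + 2 y2 - y3

X0 = np.array([-1, 2])
print(L2 @ (L1 @ X0))              # [4], as in the text
print(L2 @ L1)                     # [[2, 3]]: the operator X -> 2 x1 + 3 x2

# The product in the other order is not defined: L1 @ L2 would try to
# multiply a 3x2 matrix by a 1x3 matrix, and the shapes do not match.
```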
(2) Define L1 : R2 → R2 by the rule L1 X = (2x1 − 3x2 , −x1 + x2 ) and L2 : R2 → R2 by the rule L2 X = (2x2 , x1 + x2 ) Then R(L1 ) = R2 = D(L2 ) so that L2 L1 is defined. It is given by L2 L1 X = L2 (2x1 − 3x2 , −x1 + x2 ) = (−2x1 + 2x2 , x1 − 2x2 ) In particular, L2 L1 maps X0 = (1, 2) into (2, −3) . Now R(L2 ) = R2 = D(L1 ) , so that L1 L2 is also definable. It is given by L1 L2 X = L1 (2x2 , x1 + x2 ) = (2 · 2x2 − 3 · (x1 + x2 ), −2x2 + (x1 + x2 )) (4-9) = (−3x1 + x2 , x1 − x2 ). In particular, L1 L2 maps X0 = (1, 2) into (−1, −1) . Since L1 L2 and L2 L1 map the point X0 = (1, 2) into two different points, it is clear that L1 L2 = L2 L1 , the operators do not commute. (3) Let A be the subspace of R2 spanned by some unit vector e1 and B be the subspace spanned by another unit vector e2 . Consider the projection operators PA and PB . They are linear since, for example, PA (aX1 + bX2 ) = aX1 + bX2 , e1 e1 = a X1 , e1 e1 + b X2 , e1 e1 (4-10) = aPA X1 + bPA X2 . Because PA : R2 → R2 and PB : R2 → R2 , both products PA PB and PB PA are defined. We have PA PB X = PA (PB X ) = PA ( X, e2 e2 ) = X, e2 PA e2 = X, e2 e2 , e1 e1 . (4-11) Also, PB PA X = PB (PA X ) = PB ( X, e1 e1 ) = X, e1 PB e1 = X, e1 e1 , e2 e2 . (4-12) Since PA PB X ∈ A ⊂ R2 , while PB PA X ∈ B ⊂ R2 , it is clear that usually PA PB = PB PA . They will happen to be equal if A = B , or if A ⊥ B (for then PA PB = PB PA = 0 ). See the figure at the beginning of this example—and draw some more special cases for yourself. (4) Let L : C ∞ → C ∞ ( C ∞ is the space of infinitely differentiable functions) be defined by (Lu)(x) = xu(x), u ∈ C ∞ , CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 156 and D : C ∞ → C ∞ be defined by (Du)(x) = du (x), dx u ∈ C ∞. Then R(L) = D(D) so that the product DL is definable by DLu = D(Lu) = D(xu) = d (xu(x)) = xu + u. dx Also, R(D) = D(L) so LD is definable by LDu = L(Du) = L(u ) = xu . Notice that LD = DL unless u = 0 . We collect some properties of multiplication. Theorem 4.4 . If L1 : V1 → V2 , L2 : V3 → V4 , and L3 : V5 → V6 , where V1 ⊂ V3 and V4 ⊂ V5 , then 0. The operator L = L2 L1 is a linear operator. 1. L3 (L2 L1 ) = (L3 L2 )L1 —Associative law. Proof: 0. L(aX1 + bX2 ) = L2 L1 (aX1 + bX2 ) = L2 aL1 X1 + bL1 X2 = L2 (aL1 X1 ) + L2 (bL1 X2 ) (4-13) = aL2 L1 X1 + bL2 L1 X2 = aLX1 + bLX2 . (1) By definition of the product, [L3 (L2 L1 )]X = L3 [(L2 L1 )X ] = L3 [L2 (L1 X )] and [(L3 L2 )L1 ]X = (L3 L2 )(L1 X ) = L3 [L2 (L1 X )]. Now match the ends. Notice that the commonly occurring special case V1 = V2 = V3 = V4 = V5 = V6 is included in this theorem. In this special case, even more can be proved. For then the identity operator I , defined by IX = X for all X ∈ V can be used to multiply any other operator. Moreover, addition, L1 + L2 also makes sense. Theorem 4.5 . If the linear operators L1 , L2 , L3 all map V into V , then representing any one of these by L , (1) LI = IL = L. (2) For any positive integer n , we define Ln inductively by the rule Ln+1 = LLn , and L0 = I . Then for any non-negative integers m and n , Lm+1 = Lm Ln . 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 157 (3) (L1 + L2 )L3 = L1 L3 + L2 L3 . (4) L3 (L1 + L2 ) = L3 L1 + L3 L2 (This is needed in addition to 3 because of the noncommutativity). Proof: (1) If X ∈ V , (LI )X = L(IX ) = LX (IL)X = I (LX ) = LX (2) We shall prove Lm+n = Lm Ln by induction on m . The statement is true, by definition, for m = 1 . Assume it is true for m = k , so Lk+n = Lk Ln . Our job is to prove the statement for m = k + 1 . 
By the definition and the induction hypothesis, we have Lk+n+1 = LLk+n = L(Lk Ln ). Since multiplication is associative, we find that L(Lk Ln ) = (LLk )Ln . But, by definition, LLk = Lk+1 . Thus, Lk+n+1 = Lk+1 Ln . This completes the induction proof. (3) If X ∈ V , [(L1 + L2 )L3 ]X = (L1 + L2 )(L3 X ) Let L3 X = Y ∈ V . Then (L1 + L2 )Y = L1 Y + L2 Y. Thus [(L1 + L2 )L3 ]X = L1 (L3 X ) + L2 (L3 X ) = (L1 , L3 )X + (L2 L3 )X. (4) Same proof as 3. Remark: If V1 and V2 are two linear spaces, the set of all linear operators which map V1 into V2 is usually denoted by Hom(V1 , V2 ) —Hom rhymes with Mom and Tom. In this notation, the last theorem concerned Hom(V, V ) . The abbreviation Hom is for the impressive word “homomorphism”. Tell your friends. Examples: Consider D : C ∞ → C ∞ defined by (Du)(x) = dn . dxn Exercises (1) Determine which of the following are linear operators. du dx (x) . Then Dn = 158 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 (a) T : R2 → R2 T X = (x1 + x2 , x1 − x2 ), where X = (x1 , x2 ) ∈ R2 . (b) T : R2 → R2 , T (X ) = (x1 + x2 + 1, x1 − x2 ) (c) T : R3 → R2 T (X ) = (x1 + x1 x2 , x2 ) (d) T : R3 → R1 T (X ) = (x1 + x2 − x3 ) (e) T : R3 → R1 T (X ) + (x1 + x2 − x3 + 2) (f) D : P2 → P1 . If P (x) = a2 x2 + a1 x + a0 ∈ P2 then D(P ) = 2a2 + a1 ∈ P1 . (g) T : C 1 [−1, 1] → R1 . If u(x) ∈ C 1 [−1, 1] , then T (u) = u(0) + u (0). (h) T : C [2, 3] → C [2, 3] . If u ∈ C [2, 3] , 3 ex−t u(t) dt (T u)(x) = 2 (i) T : C [2, 3] → C [2, 3] , 3 (T u)(x) = 1 + ex−t u(t) dt 2 (j) T : C [2, 3] → C [2, 3], 3 (T u)(x) = ex−t u2 (t) dt 2 (k) S1 : C [0, ∞] → C [0, ∞] (S1 u)(x) = u(x + 1) − u(x) (l) L : A → C [0, ∞] , where A = { u ∈ C [0, ∞] : ∞ 0 |u(t)| dt < ∞ } , ∞ (Lu)(x) = e−xt u(t) dt, 0 [Our restriction on A is just to insure that the integral exists. Lu is usually called the Laplace transform of u ]. 4.1. INTRODUCTION. ALGEBRA OF OPERATORS 159 (m) T : C [0, ∞] → C [0, ∞] (T u)(x) = a2 u(x2 ) + a1 u(x + 1) + a0 u(x), where the ak (x) are continuous functions. (n) T : C [0, 1] → C [0, 1] . (T u)(x) = 2xu(x). (o) T : R2 → R1 T X = |x1 + x2 | , where X = (x1 , x2 ) ∈ R2 . [Answers: a,d,f,g,h,k,l,m,n are linear]. (2) (a) If l(x) is a linear functional mapping R1 → R1 , prove that l(x) = αx , where α = l(1) . n (b) If l(X ) is a linear functional mapping Rn → R1 , prove that l(X ) = αk xk , k=1 where X = (x1 , . . . , xn ) . (3) Let L1 : R1 → R2 be defined by L1 X = (x1 , 3x1 ), where X = (x1 ) ∈ R1 , and L2 : R2 → R2 be defined by L2 Y = (y1 + y2 , y1 + 2y2 ), where Y = (y1 , y2 ) ∈ R2 . Compute L2 L1 X0 , where X0 = 2 ∈ R1 . Is L1 L2 defined? (4) Let A : R2 → R2 be defined by X = (x1 , x2 ) ∈ R2 AX = (x1 + 3x2 , −x1 − x2 ), and B : R2 → R2 by BX = (−x1 + x2 , 2x1 + x2 ). a). Compute ABX, BAX, B 2 X, A2 BX , and (A + B )X . b). Find an operator C such that CA = I . [hint: Let CX = (c11 x1 + c12 x2 , c21 x1 + c22 x2 ) and solve for c11 , c12 , etc.] (5) Consider the operators D : C ∞ → C ∞ , (Du) = u and L : C ∞ → C ∞ , (Lu)(x) = x 0 u(t) dt . (a) Show that DL = I, LD = I − δ , where δ is the delta functional. x 2 (L2 u)(x) = u(t) dt 0 ds. 0 Integrate by parts to conclude that x (L2 u)(x) + (x − t)u(t) dt. 0 160 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 (b) Observe that D2 L2 = D(DL)L = DIL = DL = I . Use this observation to find a solution of the differential equation D2 u = f for u , where f ∈ C ∞ . 
Solve 1 the particular equation (D2 u)(x) = 1+x2 (6) Let A : R2 → R2 be defined by AX = (a11 x1 + a12 x2 , a21 x1 + a22 x2 ), and B : R2 → R2 be defined by BX = (b11 x1 + b12 x2 , b21 x1 + b22 x2 ). (a) Compute AB . (b) Find a matrix B such that AB = I , that is, determine b11 , b12 , . . . in terms of a11 , a12 , . . . such that AB = I . [In the course of your computation, I suggest introducing a symbol, say ∆ , for a11 a12 − a12 a21 when that algebraic combination crops up.] (7) In the plane E2 , consider the operator R which rotates a vector by 90o and the operator P projecting onto the subspace spanned by e (see fig). (a) Prove that R is linear. (b). Let X = (x1 , x2 ) be any point on E2 . Compute P RX and RP X . Draw a sketch for the special case X = (1, 1) . (8) In R3 , let A denote the operator of rotation through 90o about the x1 -axis (so A : (0, 1, 0) → (0, 0, 1) ), B the operator of rotation through 90o about the x2 -axis and C the operator of rotation through 90o about the x3 -axis (see fig.) Prove these operators are linear (just do it for A ). Show that A4 = B 4 = C 4 = I, AB = BA , and that A2 B 2 = B 2 A2 . Is it true that ABAB = A2 B 2 ? (9) Let P denote the linear space of all polynomials in x . For p ∈ P , consider the dp operators Dp = dx and Lp = xp . Show that DL − LD = I . (10) (a) If L1 L2 = L2 L1 , prove that (L1 + L2 )2 = L2 + 2L1 L2 + L2 . 1 2 (b) If L1 L2 = L2 L1 , then (L1 + L2 )2 =? (11) If L1 and L2 are operators such that L1 L2 − L2 L1 = I , prove the formula Ln L2 − 1 L2 Ln = nLn−1 , where n = 1, 2, 3, . . . . 1 1 (12) If L1 is a linear operator, L1 : V1 → V2 [or L1 ∈ Hom(V1 , V2 ) ], and a is any scalar, define the operator L = aL1 by the rule LX = (aL1 )X = a(L1 X ) , where X ∈ V1 . Prove (0). L = aL1 is a linear operator, L : V1 → V1 . (5). a(bL1 ) = (ab)L1 , where a, b are any scalars. (6). 1 · L1 = L1 . (7). (a + b)L1 = aL1 + bL1 . (8). a(L1 + L2 ) = aL1 + aL2 , where L2 ∈ Hom(V1 , V2 ) . 4.2. A DIGRESSION TO CONSIDER AU + BU + CU = F . 161 Coupled with Theorem 3, this exercise proves that the set of all linear operators mapping one linear space in to another linear is itself a linear space, that is, Hom (V1 , V2 ) is a linear space. (13) (a). In E2 , let L denote the operator which rotates a vector by 90o . Then L : E2 → E2 . If X = (x1 , x2 ) = x1 e1 + x2 e2 , where e1 = (1, 0) and e2 = (0, 1) , write L as LX = (a11 x1 + a12 x2 , a21 x1 + a22 x2 ), That is, find the coefficients a11 , a12 , . . . . This gives two ways to represent L , as a rotation (geometrically), and by linear equations in terms of a particular basis (algebraically). (b). In E2 , consider the operator L of rotation through an angle α . Show that Le1 = (cos α, sin α), Le2 = (− sin α, cos α), and then deduce that if X = (x1 , x2 ) = x1 e1 + x2 e2 , LX = (x1 cos α − x2 sin α, x1 sin α + x2 cos α). (14) Consider the space Pn of all polynomial of degree n . Define L : Pn → Pn as the translation operator (Lp)(x) = p(x + 1) , and D : Pn → Pn as the differentiation dp operator, (Dp)(x) = dx (x) . Show that L=I +D+ D2 Dn−1 Dn + ··· + + 2! (n − 1)! n! (15) Consider the linear operators L1 = a1 D2 + b1 D + c1 I , and L2 = a2 D2 + b2 D + e2 I . Both L1 and L2 map the linear space of infinitely differentiable function into itself, Lj : C ∞ → C ∞ . If the coefficients a1 , a2 , b1 , . . . are constants, prove that L1 L2 = L2 L1 . 4.2 A Digression to Consider au + bu + cu = f . 
Essentially the only linear equation you can solve explicitly are linear algebraic equations, like two equations in two unknowns. Since our theory applies to much more general situations, we shall develop a different example for you to keep in the back of your minds along with that of linear algebraic equations. The example we have chosen has the additional virtue that it contains most of the solvable differential equations which arise anywhere. Watch closely because we shall be brief and with a high density of valuable ideas. Problems concerning vibration or oscillatory phenomena are among the most important and significant ones which arise in applications. The simplest case is that of a simple harmonic oscillator. We have a figure goes here 162 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 a mass m attached to a spring. Pull the mass back a little and watch it move back and forth, back and forth. These are oscillations. To make the situation simple, we assume that the spring has no mass and that the surface upon which the mass rests is frictionless. Let u(t) denote the displacement of the center of gravity of the mass from the equilibrium position. Two experimental results are needed from physics. . 1. Newton’s Second Law: m . . u = F , where F means the resultant of all the forces on the center of gravity of the mass (we assume all forces are acting horizontally). 2. Hooke’s Law: If a spring is not stretched too far, then the force it exerts is proportional to the displacement, F = −ku, k > 0. We chose the minus sign since if a spring is displaced, the force it exerts is in the direction opposite to the displacement. [Under larger displacements, actually F (u) = a1 u + a2 u2 + a3 u3 + . . . -where a0 = F (0) = 0 . If the displacement u is small, the lowest term in the Taylor series for F (u) gives an adequate approximation. This is a more precise statement of Hooke’s Law]. Putting these two results together, we find that mu = −ku + F1 , ¨ (notation : u = ¨ d2 u ) dt2 where F1 represents all of the remaining forces on the mass. One possible force (so far incorporated into F1 ) is a so-called viscous damping force. It is of the form Fv = −µu where ˙ µ > 0 ; at low velocities, this force is experimentally found to account for air resistance. It is directed opposite to the velocity, and increases as the speed does (speed = velocity ). [Again, Fv = b1 u + b2 u2 + . . . , that is Fv (u) is given by a Taylor series with Fv (0) = 0 . At ¨ ˙ ˙ low speeds, the higher order terms can be neglected to yield a reasonable approximation.] Thus, to our approximation, mu = −ku − µu + F2 , ¨ ˙ where F2 represents the forces yet unaccounted for. Let us assume that these remaining forces do not depend on the motion and are applied by the outside world. Then the force F2 depends only on time, F2 = f (t) . It is called the applied or external force. Newton’s law gives mu = −ku − µu + f (t), ¨ ˙ or Lu : = au = bu + cu = f (t), ¨ ˙ where a = m , b = µ , and c = k . For the purposes of our discussion, we shall assume that k and µ do not depend on time. Then a, b and c are non negative constants. In order to determine the motion of the mass, we must solve the ordinary differential equation Lu = f for u . Have we given enough information to determine the solution? In other words, is the solution unique? 
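Before taking up that question, it may help to see what such motions look like numerically. The following sketch is not part of the notes; it assumes Python with NumPy and SciPy, and the constants m, mu, k and the forcing f are chosen purely for illustration. It rewrites m u'' = -k u - mu u' + f(t) as a first-order system and integrates it.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative constants (assumed, not from the text): mass, damping coefficient, spring constant.
m, mu, k = 1.0, 0.3, 4.0

def f(t):
    return np.cos(2.0 * t)          # a sample external force, chosen arbitrarily

# Rewrite m u'' = -k u - mu u' + f(t) as a first-order system in (u, u').
def rhs(t, y):
    u, v = y
    return [v, (-k * u - mu * v + f(t)) / m]

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0])   # initial position 1, initial velocity 0
print(sol.y[0, -1])                              # displacement at t = 20
```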
For any physically reasonable problem, we expect the mathematical model to have a unique solution, since (neglecting quantum mechanical effects) once we let the mass go it will certainly move in one particular way, the same way every time we perform the same experiment. It is clear that the motion will depend on the initial position $u(t_0)$. But if two masses have the same initial position, the resulting motions will still differ if their initial velocities $\dot u(t_0)$ are different. Thus we must also specify the initial velocity $\dot u(t_0)$ as well as the initial position $u(t_0)$. Are these sufficient to determine the motion? Yes, but that requires proof. What must be proved is that if we have two solutions $u_1(t)$ and $u_2(t)$ of the same ordinary differential equation, and if their initial positions and velocities coincide, then the solutions coincide, $u_1 = u_2$, for all later time $t \ge t_0$.

Theorem 4.6 (Uniqueness). Let $u_1(t)$ and $u_2(t)$ be two solutions of the ordinary differential equation
$$Lu := a\ddot u + b\dot u + cu = f(t),$$
where $a, b$, and $c$ are constants, $a > 0$, $b \ge 0$, $c \ge 0$. If $u_1(t_0) = u_2(t_0)$ and $\dot u_1(t_0) = \dot u_2(t_0)$, then $u_1(t) = u_2(t)$ for all $t \ge t_0$; in other words, the solution is uniquely determined by the initial position and velocity.

Remark: The theorem is true under much more general conditions, as we shall prove in Chapter 6.

Proof: Let $w(t) = u_2(t) - u_1(t)$. We shall show that $w(t) \equiv 0$ for all $t \ge t_0$. Now $Lw = L(u_2 - u_1) = Lu_2 - Lu_1 = f - f = 0$, that is,
$$a\ddot w + b\dot w + cw = 0. \tag{4-14}$$
Furthermore, since $w(t_0) = u_2(t_0) - u_1(t_0) = 0$ and $\dot w(t_0) = \dot u_2(t_0) - \dot u_1(t_0) = 0$,
$$w(t_0) = 0 \quad\text{and}\quad \dot w(t_0) = 0. \tag{4-15}$$
This reduces the question to showing that if $Lw = 0$, and if $w$ has zero initial position and velocity, then in fact $w \equiv 0$. The trick is to introduce a new function $E(t)$ associated with (4-14) (which happens to be the total energy of the system),
$$E(t) = \tfrac{1}{2} a\dot w^2 + \tfrac{1}{2} c w^2.$$
How does this function change with time? We compute its derivative:
$$\dot E(t) = a\dot w\ddot w + c w\dot w = \dot w(a\ddot w + cw).$$
Using (4-14) we know that $a\ddot w + cw = -b\dot w$. Therefore
$$\dot E(t) = -b\dot w^2 \le 0 \qquad (\text{since } b \ge 0).$$
[Thus energy is dissipated if $b > 0$, or conserved, $\dot E = 0$, in the special case of no damping, $b = 0$.] Consequently
$$E(t) \le E(t_0) \quad\text{for all } t \ge t_0. \tag{4-16}$$
Now observe that for the mechanical system associated with $w$ we have $E(t_0) = \tfrac{a}{2}\dot w^2(t_0) + \tfrac{c}{2} w^2(t_0) = 0$. Furthermore, it is obvious from the definition of $E(t)$ (since $a > 0$ and $c \ge 0$) that $0 \le E(t)$. Substitution of this information into (4-16) reveals
$$0 \le E(t) \le 0 \quad\text{for all } t \ge t_0.$$
This proves $E(t) \equiv 0$ for all $t \ge t_0$. If $c > 0$, the definition of $E$ then forces $w \equiv 0$; if $c = 0$, it forces $\dot w \equiv 0$, and since $w(t_0) = 0$ we again conclude $w \equiv 0$. Our proof is completed. We have taken some care since all of our uniqueness proofs will use essentially no additional ideas. A more general case ($a, b$, and $c$ still constants but not necessarily positive) will be treated in Exercise 9.

Having proved that there is at most one solution of the initial value problem
$$Lu := a\ddot u + b\dot u + cu = f(t) \quad\text{(differential equation)}, \qquad u(t_0) = \alpha,\ \dot u(t_0) = \beta \quad\text{(initial conditions)},$$
we must now prove there is at least one solution. This is the question of existence. For this special equation, the solution is shown to exist by explicitly exhibiting it.
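Before moving on, the energy argument above is easy to test numerically. The following sketch is an added illustration (not part of the notes), assuming NumPy and SciPy, with arbitrary constants; it integrates a w'' + b w' + c w = 0 and checks that E(t) = (1/2) a (w')^2 + (1/2) c w^2 never increases.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 1.0, 0.5, 3.0                   # a > 0, b >= 0, c >= 0, chosen only for illustration

def rhs(t, y):
    w, v = y                              # v = dw/dt
    return [v, (-b * v - c * w) / a]

# Start from nonzero initial data, so E(t0) > 0, and watch the energy decay.
sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], max_step=0.05, rtol=1e-9, atol=1e-12)
E = 0.5 * a * sol.y[1] ** 2 + 0.5 * c * sol.y[0] ** 2
print(E[0], E[-1])                        # energy at the start and at the end
print(np.max(np.diff(E)) <= 1e-7)         # True: no step increases E beyond integration error
```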
In the case of more complicated equations we are not as fortunate and must content ourselves with just showing that a unique solution exists but cannot exhibit it in closed form. It is easiest to begin with the homogeneous equation Lu = 0 , that is, find a solution of au + bu + cu = 0 ¨ ˙ with u(t0 ) = α, and u(t0 ) = β. ˙ Without motivation, let us see what the substitution u(t) = eλt yields. Here λ is a constant. We must compute Leλt . Leλt = (aλ2 + bλ + c)eλt . Can λ be chosen so that eλt is a solution of Lu = 0 ? Since eλt = 0 for any t , this means, is it possible to pick λ so that aλ2 + bλ + c = 0 ? Yes. In fact that “quadratic equation formula” yields two roots √ √ −b − b2 − 4ac −b + b2 − 4ac , λ2 = λ1 = 2a 2a of the characteristic polynomial p(λ) = aλ2 + bλ + c . Notice that we have assumed a = 0 . Thus, two solutions of the homogeneous equation are u1 (t) = eλ1 t and u2 (t) = eλ2 t . Since the operator L is linear, every linear combination of solutions is also a solution, L(Au1 + Bu2 ) = ALu1 + BLu2 = 0 . Therefore u(t) = Au1 (t) + Bu2 (t) is a solution of the homogeneous equation Lu = 0 for any choice of the scalars A and B . What about the initial conditions u(t0 ) = α, u(t0 ) = β ; can they be satisfied by picking ˙ the constants A and B suitably? Let us try. We want to pick A and B so that Aeλ1 t0 + Beλ2 t0 = α λ1 t0 Aλ1 e λ2 t0 + Bλ2 e (u(t0 ) = α) =β (u(t0 ) = β ). ˙ These equations can be solved as long as 0 = λ2 e(λ1 +λ2 )t0 − λ1 e(λ2 +λ1 )t0 = (λ2 − λ1 )e(λ1 +λ2 )t0 , which means λ1 = λ2 or b2 − 4ac = 0 . [The linear equations Ar1 + Bs1 = α, Ar2 + Bs2 = β can be solved for A and B if r1 s2 − r2 s1 = 0 ]. Before dealing with the degenerate case b2 − 4ac = 0 , let us consider an 4.2. A DIGRESSION TO CONSIDER AU + BU + CU = F . 165 Example: Solve u + 3u + 2u = 0 with the initial conditions u(0) = 1 and u(0) = 0 . If ¨ ˙ ˙ λt , the characteristic polynomial is λ2 + 3λ + 2 = 0 , we seek a solution of the form u(t) = e and has roots λ1 = −1, λ2 = −2 . Therefore u(t) = Ae−1 + Be−2t is a solution. Since λ1 = λ2 , we can solve for A and B by using the initial conditions. We find A+B =1 (u(0) = 1), −A − 2B = 0 (u(0) = 0). ˙ These two equations yield A = 1, B = −1 . Thus u(t) = 2e−t − e−2t is the unique solution of our initial value problem. The degenerate case b2 − 4ac = 0 must be discussed separately. In this case λ1 = λ2 = b − 2a , so the two solutions eλ1 t and eλ2 t are really the same solution. Without motivation (but see Exercise 12) we claim that teλ1 t is also a solution. This is easy to verify by a calculation. L(teλ1 t ) = a(tλ2 eλ1 t + 2λ1 eλ1 t ) + b(eλ1 t + λ1 teλ1 t ) + cteλt 1 = (aλ2 + bλ1 + c)teλt + (2aλ1 + b)eλ1 t 1 (4-17) Since (aλ2 + bλ1 + c) = 0 by definition of λ1 , and λ1 = − 2ba in our special case, both terms 1 on the right vanish. Hence both u1 (t) = eλ1 t and u2 (t) = teλ1 t are solutions of Lu = 0 (if b2 − 4ac = 0) , so u(t) = Aeλ1 t + Bteλ1 t is a solution for any choice of A and B . It is possible to pick A and B to satisfy arbitrary initial conditions u(t0 ) = α, u(t0 ) = β . ˙ Aeλ1 t0 + Bt0 eλ1 t0 = α, λ1 t0 Aλ1 e (u(t0 ) = α) λ1 t0 + B (1 + λ1 t0 )e =β (u(t0 ) = β ). ˙ These can be solved for A and B since 0 = (1 + λ1 t0 )e2λ1 t0 − λ1 t0 e2λ1 t0 = e2λ1 t0 . Example: Solve u +6u +9u = 0 with the initial conditions u(1) = 2 , u(1) = −1 . Seeking ¨ ˙ ˙ a solution in the form eλt , we are led to the characteristic equation λ2 + 6λ + 9 = 0 , which has λ1 = −3, λ2 = −3 , as roots. Therefore u1 (t) = e−3t is a solution of Lu = 0 . 
Since λ1 = λ2 , another solution is u2 (t) = te−3t . Thus u(t) = Ae−35 + Bte−3t is a solution for any A and B . To solve for A and B in terms of the initial conditions, we must solve the algebraic equations Ae−3 + B · 1 · e−3 = 2, −3 − 3Ae −3 + B (1 − 3)e (u(1) = 2), = −1, (u(1) = −1). ˙ We find that A = −3e3 and B = 5e3 . Thus u(t) = −3e3 e−3t + 5e3 te−3t , or, equivalently, u(t) = −3e−3(t−1) + 5te−3(t−1) . Our results will now be collected as 166 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Theorem 4.7 . The initial value problem au + bu + cu = 0, a = 0, ¨ ˙ with u(t0 ) = α, u(t0 ) = β, ˙ where a, b , and c are constants, has a unique solution. i) If b2 − 4ac = 0 , it is of the form u(t) = Aeλ1 t + Beλ2 t . ii) If b2 − 4ac = 0 , so λ1 = λ2 , it is of the form u(t) = Aeλ1 t + Bteλ1 t . Here λ1 and λ2 are the roots of the characteristic equation aλ2 + bλ + c = 0 , and the constants A and B are determined from the initial conditions. Remark: We have omitted the condition a > 0 , b ≥ 0 , c ≥ 0 from our theorem since the construction presented to find a solution did not depend on this. Uniqueness for that case is treated as exercise 9, as we mentioned earlier. Another Example: Solve u − 2u + 2u = 0 , with the initial conditions u(0) = 1, u(0) = 1 . The ¨ ˙ ˙ characteristic polynomial is λ2 − 2λ + 2 = 0 . Its roots are λ1 = 1 + i , and λ2 = 1 − i . Since λ1 = λ2 , the solution is of the form u(t) = Ae(1i )t + Be(1−i)t . From the initial conditions, we find that A + B = 1, (u(0) = 1), (1 + i)A + (1 − i)B = 1, Thus A = 1 , B = 2 1 2 (u(0) = 1). ˙ , so 1 1 u(t) = e(1+i)t + e(1−i)t . 2 2 Recalling that ex+iy = ex (cos t + i sin y ) , this solution may be written in a more familiar form: 1 1 u(t) = et (cos t + i sin t) + et (cos t − i sin t), 2 2 that is, u(t) = et cos t. What has been done can be summarized elegantly in the language of linear spaces. We have sought a solution of a second order linear O.D.E., which we write as Lu = 0 . It was found that every solution of this equation could be expressed as a linear combination of two specific solutions u1 and u2 , u(t) = Au1 (t) + Bu2 (t) , where the constants A and B are uniquely determined from u(t0 ) and u(t0 ) . Thus, the set of functions u which satisfy ˙ Lu = 0 form a two dimensional subspace of D(L) = C 2 . The functions u1 and u2 span that subspace. If we call the set of all solutions of Lu = 0 the nullspace of L, N(L) , then our result simply reads “dim N(L) = 2 ”. A particular solution of Lu = 0 is found by specifying u(t0 ) and u(t0 ) . ˙ The inhomogeneous equation Lu = f is treated by finding a coset of the nullspace of L . For if u0 is a particular solution of the inhomogeneous equation Lu0 = f , then 4.2. A DIGRESSION TO CONSIDER AU + BU + CU = F . 167 u = u+u0 ; where u ∈ N(L) , is also a solution since Lu = L(˜+u0 ) = Lu+Lu0 = 0+f = f . ˜ ˜ u ˜ Therefore, if one solution u0 of the inhomogeneous equation Lu = f is found, the general solution is u = u + u0 where u ∈ N(L) . In particular, the solution u ∈ N(L) can be ˜ ˜ ˜ chosen so that arbitrary initial conditions for u , u(t0 ) = α, u(t0 ) = β , can be met. We ˙ shall defer (until our systematic treatment of linear O.D.E.’s) presenting a general method for finding a solution u0 of the inhomogeneous equation. In our example, the particular solution will be found by guessing. Example: Solve Lu : = u − u = 2t , with the initial conditions u(0) = −1, u(0) = 3 . The ¨ ˙ homogeneous equation Lu = 0 has the general solution u(t) = Aet + Be−t . 
We observe ˜ that the function u0 (t) = −2t is a particular solution of the inhomogeneous equation, Lu = 2t . Thus u(t) = Aet + Be−t . The initial conditions lead us to solve the following equations for A and B , A + B = −1 (4-18) A − B − 2 = 3. (4-19) A computation gives A = 2 , B = −3 . Thus the solution of our problem is u(t) = 2et − 3e−t − 2t It is routine to verify that this function u(t) does satisfy the O.D.E. and initial conditions (you should verify the solutions to check for algebraic mistakes). Exercises (1) Solve the following homogeneous initial value problems, (a). u − u = 0, ¨ u(0) = 0, u(0) = 1 . ˙ (b). u + u = 0, ¨ u(0) = 1, u(0) = 0. ˙ (c). u − 4u + 5u = 0, ¨ ˙ u(0) = −1, (d). u + 2u − 8u = 0, ¨ ˙ u(2) = 3, (e). u = 0, ¨ u(0) = 7, u(0) = 2. ˙ u(2) = 0. ˙ u(0) = 3. ˙ (2) Solve the following inhomogeneous initial value problems by guessing a particular solution of the inhomogeneous equation. Check your answers. (a) u − u = t2 , ¨ u(0) = 0, hint: Try u0 (t) = a1 t2 (b) u − 4u +5u = sin t, ¨ ˙ u(0) = 0 ˙ + a2 t + a3 and solve for a1 , a2 , a3 .] u(0) = 1, u(0) = 0 [hint: Try u0 (t) = a1 sin t + a2 cos t .] ˙ (3) Consider an undamped harmonic oscillator with a sinusoidal forcing term, u + n2 u = ¨ 2 = n2 [try u (t) = a sin γt + a cos γt for a sin γt . Find the general solution if γ 0 1 2 + particular solution]. What happens if γ →− n ? This is called resonance. (4) You shall discuss damping in this problem. Consider the equation u + 2µu + ku = 0 , ¨ ˙ where µ > 0 , and k > 0 . We shall let γ = |µ2 − k | . 168 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 (a) Light damping (µ2 < k ) . Show that the solution is u(t) = e−µt (A cos γt + B sin γt), and sketch a rough graph for the case A = 1, B = 0 . This is the kind of oscillation you want for a pendulum clock, with µ small. (b) Heavy damping (µ2 > k ) . Show that the solution is u(t) = e−µt (Aeγt + Be−γt ). Show that u(t) vanishes at most once. Sketch a graph for the two cases A = B = 1 and A = −1, B = 3 . The first describes the oscillation of an ideal screen door, while the second describes the ideal oscillation of a slammed car door. (5) It is often useful to study the oscillations described by u + 2µu + ku = 0 by sketching ¨ ˙ the solution in the u, u plane - or phase space as it is called. Investigate the curves ˙ for heavily and lightly damped oscillators. Show that the curve for a heavily damped oscillator will be a straight line through the origin for special initial conditions. What does the phase space curve look like for an undamped oscillator (µ = 0, k > 0) ? (6) Consider the linear operator Lu = au + bu + cu , where a, b, c are constants. We have ¨ ˙ seen that Lert = p(r)ert where p(r) is the characteristic polynomial. (a) If r is not one of the roots of the characteristic polynomial, observe that you can find a particular solution of Lu = ert . What is it? (b) If neither r1 nor r2 is a root of the characteristic polynomial, find a particular solution of Lu = a1 er1 t + a2 er2 t , where a1 and a2 are specified constants. (c) Use this procedure to find a particular solution of i)¨ − 4u = cos ht, u ii)¨ + 4u = sin t u (7) (a) Imitate our procedure and develop a theory for the first order homogeneous O.D.E. Lu : = u + bu = 0 , where b is a constant. In particular, you should prove ˙ that there exists a unique solution satisfying the initial condition u(t0 ) = α , and give a recipe for finding it. Use your recipe to solve u + 2u = 0, u(0) = 3 . 
˙ (b) And now you will show us how to find a particular solution of the inhomogeneous equation Lu = f , where f (t) is some given continuous function and Lu : = d u + bu . [hint: Try to find a function µ(t) such that µ(u + bu) = dt (µu) . Then ˙ ˙ d integrate dt (µu) = µf , and solve for u ]. Use your method to find a particular solution for u + 2u = x , and then a solution of the same equation which satisfies ˙ the initial condition u(0) = 1 . (8) Find a solution of u − 2u − u + 2u = 0 which satisfies the initial conditions u(0) = u (0) = 0, u (0) = 1 . [hint: The cubic equation γ 3 − 2γ 2 − γ + 2 has roots +1, −1 and 2]. (9) You will prove the uniqueness theorem for the equation u + bu + cu = 0 , where b and ¨ ˙ c are any constants (we have let a = 1 , because if it is not 1, just divide the whole equation by a). The trick is to reduce this to the special case b ≥ 0, c ≥ 0 , already done. 4.2. A DIGRESSION TO CONSIDER AU + BU + CU = F . 169 (a) Show that in order to prove the solution of u + bu + cu = f, where u(t0 ) = α, u(t0 ) = β ¨ ˙ ˙ is unique, it is sufficient to prove that the only solution of w + bw + cw = 0, w(t0 ) = 0, w(t0 ) = 0 ¨ ˙ ˙ is w(t) ≡ 0 . (b) Define ϕ(t) by w(t) = eγt ϕ(t) . Observe: to prove w = 0 , it is sufficient to prove ϕ ≡ 0 ( here γ is any constant). Use the differential equation and initial conditions for w to find the differential equation and initial conditions for ϕ . Show that γ can be picked so that the D.E. for ϕ is ϕ + ˜ϕ + ˜ϕ = 0, ¨ b˙ 0 where ˜ and c are positive. Deduce that ϕ ≡ 0 , and from that, that w ≡ 0 , b ˜ completing the proof. (10) A boundary value problem for the equation u + bu + cu = 0 is to find a solution of the equation with given boundary values, say u(0) = α and u(1) = β . Assume b and c are real numbers. (a) Show that a solution of the boundary value problem always exists if b2 − 4c ≥ 0 (the case b2 − 4c = 0 will have to be done separately). (b) Prove that if b2 − 4c ≥ 0 , the solution is unique too. [I suggest letting u(t) = eγt v (t) , and then choosing γ so that the equation satisfied by v is of the form v + cv = 0 , where c ≤ 0 . The case c = 0 is trivial. If ˜c) < 0 , can the solution ˜ ˜ ˜ ( have a positive maximum or negative minimum?] (11) If a spring is hung vertically and a mass m placed at its end, an external force of magnitude mg due to gravity is placed on the system. Assume there are no dissipative forces of any kind. (a) Set up the differential equation of motion. Remember that you must specify which is the positive direction. (b) If the tip of the spring is displaced a distance d by placing the mass on it (no motion yet), so the equilibrium position is d below the unstretched end of the spring, show that the spring constant k is given by k = mg/d . (c) Let the body weigh 32 pounds, and d be 2 feet. Find the subsequent motion if the body is initially displaced from rest one foot below its equilibrium position. [Take |g | = 32 ft/sec 2 ]. (12) * Consider au + bu + cu = 0 . If γ1 = γ2 , are the roots of the characteristic equation, observe that the function eγ1 t − eγ2 t u(t) = ˜ γ1 − γ2 170 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 is also a solution (it is a linear combination of eγ1 t and eγ2 t ). Now pass to the limit γ2 → γ1 (leave γ1 fixed and let γ2 move) by using the Taylor series for eγt . The function you get is then a “guess” for a second solution in the degenerate case γ1 = γ2 . This supplies some motivation for the guess made earlier. 
(13) * Consider Lu : = u + 2u = f , where f is given. You know how to solve Lu = A sin nx (Exercise 6). Find a particular solution to the general inhomogeneous equation in the interval [−π, π ] by expanding f in a Fourier series and then use superposition. Apply this to solve u + 2u = x . (14) Consider an undamped harmonic oscillator, whose motion is specified by u(t) , where k k t + B1 sin t may mu + ku = 0, k > 0 . Show that the solution u(t) = A1 cos m m be written in the form u(t) = A sin(wt + θ), where A is the amplitude of the oscillation, w = 2πv, v is the frequency, and θ is the phase. Show that u(t) is periodic, u(t + T ) = u(t) , where the period T = 1/v . Interpret the amplitude and phase and determine A, w, and θ in terms of A1 , B1 , k and m . [I suggest looking at a specific example and its graph first]. 4.3 Generalities on LX = Y . Undoubtedly the fundamental problem in the theory of linear (and nonlinear) operators is to determine the nature of the range of an operator L . One particular aspect of this is the vast problem of solving the equation LX = Y for X when Y is given to you. The question here is, “is a given Y in the range of L ?”, or “can we find some X such that LX = Y ?” If one can solve the problem uniquely for any Y , then the solution is written as X = L−1 (Y ), where L−1 is the operator inverse to L , in the sense that L−1 L = I (so to solve LX = Y , apply L−1 , X = L−1 LX = L−1 Y ). Let us give some examples, familiar and unfamiliar, of problems of the form LX = Y , where Y is given. 1. LX = (2x1 + 3x2 , x1 + 2x2 ), X ∈ R2 , L : R2 → R2 . The problem of solving LX = Y where Y = (−1, 2) ∈ R2 is that of solving the two equations 2x1 + 3x2 = −1 x1 + 2x2 = 2 for two unknowns (x1 , x2 ) = X . 4.3. GENERALITIES ON LX = Y . 171 2. Lu = u + 2u + 3u, where u ∈ C 2 , L : C 2 → C. The problem of solving L(u) = x is that of solving the inhomogeneous ordinary differential equation Lu : = u + 2u + 3u = x for u(x) . π cos(x − t)u(t) dt, 3. Lu = u ∈ C [0, π ]. 0 You should check that L is a linear operator. The problem of solving L(u) = sin x is that of solving the integral equation π cos(x − t)u(t) dt = sin x Lu : = 0 for the function u . In this example, it is instructive to examine the range more closely. Since cos(x − t) = cos x cos t + sin x sin t and since functions of x are constant with respect to t integration, we see that Lu may be written as π Lu : = cos x π cos t u(t) dt + sin x 0 sin t u(t) dt, 0 or Lu : = α1 cos x + α2 sin x, where the numbers α1 and α2 are π α1 = π (cos t)u(t) dt; α2 = 0 (sin t)u(t) dt. 0 Thus, the range of L is the linear space spanned by cos x and sin x , which has dimension two. This linear operator L therefore maps the infinite dimensional space C [0, π ] into a finite (two) dimensional space. In order to even have a chance of solving Lu = f for this operator L , we first check to see if f even lies in this two dimensional subspace (for if it doesn’t, it is futile to go further). The particular function sin x does, so it is reasonable to look for a solution - which we shall not do right now (however there are infinitely many 2 solutions, among them u(x) = π sin x ). One particularly important equation which arises frequently is the homogeneous equation LX = 0, which is the special case Y = 0 of the inhomogeneous equation, LX = Y. Since L is a linear operator, there is no problem of our finding one solution of LX = 0 for X = 0 is a solution, the so-called trivial solution of the homogeneous equation. 
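If you are skeptical of the claim in Example 3 above that u(x) = (2/pi) sin x solves the integral equation, a short symbolic check settles it. This sketch is an added illustration (assuming SymPy), not part of the notes.

```python
import sympy as sp

x, t = sp.symbols('x t')
u = 2 / sp.pi * sp.sin(t)                              # the claimed solution u(t) = (2/pi) sin t
lhs = sp.integrate(sp.cos(x - t) * u, (t, 0, sp.pi))   # (Lu)(x) for the integral operator of Example 3
print(sp.simplify(lhs))                                # prints sin(x), as required
```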
The problem is to find a non-trivial solution, or better yet, all solutions. In the previous section this question was answered fully for the particular operator $Lu = a\ddot u + b\dot u + cu$, where $a, b$, and $c$ are constants. Many of our results there generalize immediately, as we shall see now.

Definition: The set of all solutions of the homogeneous equation $LX = 0$, where $L$ is a linear operator, is called the nullspace of $L$. This nullspace of $L$, $N(L)$, consists of all $X$ in the domain of $L$ which are mapped into zero by $L$:
$$L : N(L) \to 0, \qquad N(L) \subset D(L).$$
We have called $N(L)$ the nullspace of $L$, not the null set, because of

Theorem 4.8. The nullspace of a linear operator $L : V_1 \to V_2$ is a linear space, a subspace of the domain of $L$.

Proof: Since the domain of $L$, $D(L) = V_1$, is a linear space and $N(L) \subset D(L)$, by Theorem 2, p. 142, all we need show is that the set $N(L)$ is closed under multiplication by scalars and under addition of vectors. Say $X_1$ and $X_2 \in N(L)$. Then $LX_1 = 0$ and $LX_2 = 0$. We must show that $L(aX_1) = 0$ for any scalar $a$, and that $L(X_1 + X_2) = 0$. But $L(aX_1) = aL(X_1) = a \cdot 0 = 0$, and $L(X_1 + X_2) = LX_1 + LX_2 = 0 + 0 = 0$. Thus $N(L)$ is a subspace of $D(L) = V_1$.

One important reason for examining the nullspace of a linear operator is this: if $N(L)$ is known, and if any one solution of the inhomogeneous equation is known, say $LX_1 = Y$ (where $Y$ was given and $X_1$ is the solution we know), then every solution of the inhomogeneous equation is of the form $\tilde X + X_1$, where $\tilde X \in N(L)$. In other words, every solution of $LX = Y$ is in $N(L) + X_1$, the $X_1$ coset of the subspace $N(L)$.

Theorem 4.9. Let $L : V_1 \to V_2$ be a linear operator. If $X_1$ and $X_2$ are any two solutions of the inhomogeneous equation $LX = Y$, where $Y$ is given, then $X_2 - X_1 \in N(L)$; that is, $X_2 = \tilde X + X_1$ where $\tilde X \in N(L)$.

Proof: Let $\tilde X = X_2 - X_1$. We shall show that $\tilde X \in N(L)$:
$$L\tilde X = L(X_2 - X_1) = LX_2 - LX_1 = Y - Y = 0.$$

By using this theorem, we see that if all solutions of the homogeneous equation $LX = 0$ are known (the nullspace of $L$), and if one solution of the inhomogeneous equation $LX_1 = Y$ is known, then all of the solutions of the inhomogeneous equation are known. This solution set of the inhomogeneous equation is the $X_1$ coset of $N(L)$.

Example 1: Let $L : R^2 \to R^2$ be defined by $LX = (x_1 + x_2, x_1 - x_2)$. Then $N(L)$ is the set of all points in $R^2$ such that $LX = 0$, that is, which satisfy the equations
$$x_1 + x_2 = 0, \qquad x_1 - x_2 = 0.$$
Thus the nullspace of $L$ consists of the intersection of the two lines $x_1 + x_2 = 0$ and $x_1 - x_2 = 0$. The only point on both lines is $0$, so $N(L)$ is just the point $0$. To solve the inhomogeneous equation $LX = Y$, where $Y = (1, 1)$,
$$x_1 + x_2 = 1, \qquad x_1 - x_2 = 1,$$
we find one solution of it, $X_1 = (1, 0)$. Then every solution of the inhomogeneous equation is of the form $X = \tilde X + X_1$, where $\tilde X \in N(L)$. But since $\tilde X = 0$ is the only point in $N(L)$, every solution is of the form $X = 0 + X_1 = X_1$. Thus $X_1$ is the unique solution of $LX = Y$. This situation is a general one. Again, we also saw this for $Lu = a\ddot u + b\dot u + cu$.

Theorem 4.10. If the nullspace $N(L)$ of the linear operator consists only of $0$, then the solution of the inhomogeneous equation $LX = Y$ (if a solution exists) is unique. (Thus, if the nullspace contains only $0$, then $L$ is injective.)

Proof: Say there were two solutions $X_1$ and $X_2$. Then $LX_1 = Y$ and $LX_2 = Y$, which implies $L(X_2 - X_1) = LX_2 - LX_1 = Y - Y = 0$.
Therefore (X2 − X1 ) ∈ N(L) . Since the only element of N(L) is 0, X2 − X1 = 0 , or, X1 = X2 . In other words, the two solutions are the same. Example: 2 Let L : C 2 → C be defined on functions u ∈ C 2 by Lu : = a(x)u + b(x)u + c(x)u. The nullspace of L consists of all solutions of the homogeneous equation Lu = 0 . It turns out (see chapter 6) - as in the constant coefficient case - that every solution of this homogeneous O.D.E. has the form u = Au1 + Bu2 , where u1 and u2 are any two linearly independent solutions of the equation, and where A and B are constants. Thus N(L) is a two dimensional space spanned by u1 and u2 . If u1 is a particular solution of the inhomogeneous equation Lu1 = f , then all the solutions of Lu = f are just the elements of the u1 coset of N(L) , that is, functions of the form u = u + u1 , where u ∈ N(L) . ˜ ˜ With every linear operator L : V1 → V2 , V1 = D(L) , we have associated two other linear spaces, the nullspace N(L) ⊂ D(L) = V1 and range R(L) ⊂ V2 . There is a valuable and elegant way to connect D(L), N(L) and R(L) . The result we are aiming at is certainly the most important theorem of this section. We know that R(L) ⊂ V2 . The space V2 may be of arbitrarily high dimension. However, since R(L) is the image of D(L) , we suspect that R(L) can take up “no more room” then D(L) . To be more precise, dim R(L) ≤ dim D(L). Thus, for example, if L : R2 → R17 , we expect that the range of L is a subspace of dimension no more than two in R17 . Not only is this a justifiable expectation, but even more is true. If Dim R(L) = Dim D(L) , essentially all of D(L) is carried over under the mapping. But if Dim R(L) < Dim D(L) , what has happened to the remainder of D(L) ? Let us look at N(L) ⊂ D(L) . The elements of N(L) are all squashed into the zero element of V2 . In other words, a set of dim N(L) in V1 = D(L) is mapped into a set of dimension zero in V2 . Does L decompose D(L) + V1 into two parts, N(L) and a complement N(L) such that L maps N(L) into zero and the dimension of the remainder, N(L) , is preserved under L (so dim N(L) = dim R(L)) . Of course, a figure goes here 174 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Theorem 4.11 . Let the linear operator L map V1 = D(L) into V2 . If D(L) has finite dimension, then dim D(L) = dim R(L) + dim N(L). Proof: Let N(L) be a complement of N(L) (cf. pp. 163a-d). Since dim N(L) + dim N(L) = dim D(L) , it is sufficient to prove that dim N(L) = dim R(L) . For X ∈ V1 , we can write X = X1 + X2 , where X1 ∈ N(L) and X2 ∈ N (L) . Now LX = LX1 + LX2 , so the image of D(L) is the same as the image of N(L) . In addition, if X2 ∈ N(L) , then LX2 = 0 if and only if X2 = 0 , merely because N(L) is a complement of the nullspace. Let { θ1 , . . . , θk } be a basis for N(L ) . If X2 ∈ N(L) , we can write X2 = k k aj θj , and LX2 = j =1 aj Lθj . Let Lθ1 = Y1 , Lθ2 = Y2 , . . . , Lθk = Yk . Since the image j =1 of N(L) is R(L) , the vectors Y1 , . . . , Yk span R(L) . Thus, dim R(L) ≤ k = dim N(L) . To show that there is equality, dim R(L) = dim N(L) , we prove that Y1 , . . . , Yk are linearly independent. If c1 Y1 + · · · + ck Yk = 0 , then 0 = c1 Lθ1 + · · · + ck Lθk = L(c1 θ1 + ˜ ˜ · · · + ck θk ) = LX where x = c1 θ1 + · · · + ck θk ∈ N(L) . However for any X ∈ N(L) , ˜ ˜ we know Lx = 0 implies that X = 0 . The linear independence of θ1 , . . . , θk further ˜ shows that c1 = c2 = · · · = ck = 0 . 
The hypothesis c1 Y1 + · · · + ck Yk = 0 has led us to conclude that the cj ’s are all zero, that is, the Yj ’s are linearly independent. Therefore dim R(L) = dim N(L) . Coupled with our first relationship, this proves the result. Corollary 4.12 : dim R(L) ≤ dim D(L). Proof: dim N(L) ≥ 0 . Two examples, one an illustration, the other an application. Example: 1 Consider a projection operator, PA , mapping vectors from E n into a subspace A of En , where the dim A = m < n . Let us first show that PA is a linear operator. If e1 , . . . em is an orthonormal basis for A , then for any X and Y in En , m PA (X + Y ) = m X + Y, ek ek = k=1 m = ( X, ek + Y , ek )ek k=1 m X, ek ek + k=1 Y , ek ek = PA X + PA Y. k=1 Similarly, PA (aX ) = aPA X for every scalar a . Thus the projection operator is a linear operator. Since R(PA ) = A and dim A = m , while dim En = n , we conclude that dim N(PA ) = n − m . This could have been arrived at immediately since PA will certainly map everything perpendicular to A , that is A⊥ , into 0 (see fig. illustrating the case E2 → A , where A is a line). Thus N(PA ) + A⊥ , so dim N(PA ) = dim A⊥ = n − m . Example: 2 Define L : Rn → Rk by, LX = (a11 x1 + a12 x2 + · · · + a1n xn , a21 x1 + · · · + a2n xn , · · · , akl x1 + ak2 x2 + · · · + akn xn ) where X = (x1 , x2 , . . . xn ) ∈ Rn . If we let Y = (y1 , . . . , yk ) ∈ Rk , then writing Y = LX , the linear operator L may be defined by the k equations (for y1 , . . . , yk ) in n “unknowns” 4.3. GENERALITIES ON LX = Y . 175 (x1 , . . . , xn ) , a11 x1 + a12 x2 + · · · + a1n xn = y1 a21 x1 + a22 x2 + · · · + a2n xn = y2 . . . ak1 x1 + ak2 x2 + · · · + akn xn = yk . The problem of solving LX = Y , where Y is given, is that of solving k equations with n “unknowns”. Consider the special case k < n , when there are less equations than unknowns. Since the range of L is contained in Rk , R(L) ⊂ Rk , then dim R(L) ≤ dim Rk = k . Because D(L) = Rn , we also know that dim D(L) = dim Rn = n . Thus dim N(L) = dim D(L) − dim R(L) ≥ n − k > 0. However if dim N(L) > 0 , then N(L) must contain something other than zero. Thus there ˜ ˜ ˜ is at least one non-trivial solution X of the homogeneous equation, LX = 0 . Since aX is also a solution, where a is any scalar, there are, in fact an infinite number of solutions. Notice that the above was a non-constructive existence theorem. We proved that a solution does exist but never gave a recipe to obtain it. One consequence of this result is that, if dim N(L) > 0 , and if a solution of the inhomogeneous equation LX = Y exists, it ˜ ˜ is not unique; for if LX1 = Y , then also L(X1 + X ) = Y , where X is any solution of the homogeneous equation. In the special case n = k , and dim N(L) = 0 a fascinating (and non-constructive) theorem falls out of Theorem 11: the inhomogeneous equation LX = Y always has a solution and the solution is unique. Put in more conventional terms, if there are the same number of equations as unknowns, and if the only solution of the homogeneous equation is zero, then the inhomogeneous equation always has a unique solution. Thus, if n = k , uniqueness implies existence. Since dim N(L) = 0 , then dim R(L) = dim D(L) = n . However L : Rn → Rn in this case (n = k ) . Since R(L) ⊂ Rn and dim R(L) = n , we see that R(L) must be all of Rn , that is, every Y ∈ Rn is in the range of L , which means that the inhomogeneous equation LX = Y is solvable for every Y ∈ Rn . Theorem 10 gives the uniqueness. We shall obtain a better theorem later. 
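For operators given by systems of linear equations, the dimension count dim D(L) = dim R(L) + dim N(L) can be checked by machine. The sketch below is an added illustration (assuming SymPy); the 3 x 5 system is arbitrary and illustrates the case k < n just discussed, where a non-trivial nullspace is forced.

```python
import sympy as sp

# An arbitrary operator L : R^5 -> R^3 written as k = 3 equations in n = 5 unknowns.
A = sp.Matrix([[1, 2, 0, -1, 3],
               [0, 1, 1,  2, 0],
               [1, 3, 1,  1, 3]])

dim_range = A.rank()               # dim R(L), the dimension of the image
dim_null  = len(A.nullspace())     # dim N(L), the number of independent homogeneous solutions
print(dim_range, dim_null)         # 2 and 3 for this matrix
print(dim_range + dim_null == 5)   # True: their sum is dim D(L) = n
```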
Remark: Some people refer to dim R(L) as the rank of the linear operator L . We shall, however, refer to it as the dimension of the range of L . If L1 : V1 → V2 and L2 : V2 → V3 , it is easy to make a few statements about dim R(L2 L1 ) . Theorem 4.13 . If L1 : V1 → V2 and L2 : V3 → V4 , when V2 ⊂ V3 , (so L2 L1 ) is defined), then dim R(L2 L1 ) ≤ min(dim R(L1 ), dim R(L2 )). Proof: The last corollary states that an operator is like a funnel with respect to dimension: the dimension can only get smaller or remain the same. After passing through two funnels, we obtain no more than the smallest allowed through. One might think that there should be equality in the formula. That this is not the case can be seen from the possibility illustrated in the figure. Only the shaded stuff gets through. 176 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Exercises (1) Let L : Rn → Rn be defined by LX = (x1 , x2 , . . . , xk , 0, . . . , 0), where X = (x1 , x2 , . . . xn ) ∈ Rn . Describe R(L) and N(L) . Compute dim R(L) and dim N(L) . (2) (a) Describe the range and nullspace of the linear operator L : R3 → R3 defined by LX = (x1 + x2 − x3 , 2x1 − x2 + x3 , x2 − x3 ), X = (x1 , x2 , x3 ) ∈ R3 . (b) Compute dim R(L) and dim N(L) . (c) Is (1, 2, 0) ∈ R(L) ? Is (1, 2, 1) ∈ N(L) ? Is (1, 2, 2) ∈ N(L) ? Is (0, −1, −1) ∈ N(L) ? (3) Let A = { u ∈ C 2 [0, 2] : u(0) = u(1) = 0 } , and define L : A → C [0, 1] by Lu = u + b(x)u − u , where b(x) is some continuous function. Prove N(L) = 0 . [hint: If u ∈ N(L) , can u have a positive maximum or negative minimum?] (4) Consider the linear operator L : C [0, 1] → C [0, 1] defined by 1 (Lu)(x) = u(x) + 2 ex−t u(t) dt 0 (a) Find the nullspace of L . (b) Solve Lu = 3ex . Is the solution unique? (c) Show that the unique solution of Lu = f , where f ∈ C [0, 1] is u(x) = f (x) − cex , where c = 2 3 1 e−t f (t) dt. 0 (5) Let L : V → V (so Lk is defined for k = 0, 1, 2 . . . ). Prove that (a) R(L) ⊂ N(L) if and only if L2 = 0 . (b) N(L) ⊂ N(L2 ) ⊂ N(L3 ) ⊂ . . . (c) N(L) ⊃ N(L2 ) ⊃ N(L3 ) ⊃ . . . . (6) If L1 : V1 → V2 and L2 : V3 → V4 where V2 ⊂ V3 , Theorem 12 gives an upper bound for dim R(L2 L1 ) . (a) Prove the corresponding lower bound dim R(L2 L1 ) ≥ dim R(L1 ) + dim R(L2 ) − dim V3 . [hint: Prove the equivalent inequality dim R(L1 ) ≤ dim R(L2 L1 ) + dim N(L2 ) ˜ ˜ by letting V = R(L1 ) and applying Theorem 11 to L2 defined on V ]. (b) Prove: if dim N(L2 ) = 0 , then dim R(L2 L1 ) = dim R(L1 ). 4.4. L : R1 → RN . PARAMETRIZED STRAIGHT LINES. 177 (c) If dim N(L1 ) = 0 , is it then true that dim R(L2 L1 ) = dim R(L2 ) ? Proof or counterexample. (d) If dim V1 = dim V2 = dim V3 and dim N(L1 ) = 0 , is it true that dim R(L2 L1 ) = dim R(L2 ) ? Proof or counterexample. (7) If L1 and L2 both map V1 → V2 , prove |dim R(L1 ) − dim R(L2 )| ≤ dim R(L1 + L2 ). (8) Consider the operator L : C 2 [0, ∞) → C [0, ∞) defined by Lu : = u + 3u + 2u. (a) Describe N(L) . What is dim N(L) ? Is f (x) = sin x ∈ R(L) ? (b) Consider the same operator L but mapping A into C [0, ∞] , where A = { u ∈ C 2 [0, ∞) : u(0) + u (0) = 0 } . Answer the same questions as part a). (c) Same as b but A = { u ∈ C 2 [0, ∞) : u(1) + u (1) = 0 } this time. 4.4 L: R 1 → Rn . Parametrized Straight Lines. Our study of particular linear operators begins with the most simple case: a linear operator which maps a one- dimensional space R1 into an n dimensional space Rn . Since the dimension of the range of L is no greater than that of the domain R1 and dim R1 = 1 , then dim R(L) ≤ 1. 
This proves Remark: If L : R1 → Rn , then the dimension of the range of L is either one or zero. The case dim R(L) = 0 is trivial, for then L must map all of R1 into a single point, and that single point must be the origin since the range of L is a subspace. Thus, if dim R(L) = 0 , then L maps every point into 0. Without change, the same holds if L : V1 → V2 (where V1 and V2 are any linear spaces) and dim R(L) = 0 . Not very profound. If dim R(L) = 1 , then the subspace R(L) in Rn is a one dimensional subspace in the n dimensional space Rn , this is, R(L) is a “straight line” through the origin of Rn . This straight line is determined if any one point P = 0 on it is known. Then there is a point X1 ∈ R1 such that LX1 = P . Since R1 is one dimensional it is spanned by any element other than zero, so every X ∈ R1 can be written as X = sX1 . Therefore, if X is any element of R1 , LX = L(sX1 ) = sLX1 = tP. In other words, this last equation states that the range of L is a multiple of a particular vector P , that is, a straight line through the origin. Example: If L : R1 → R2 such that the point X1 = 2 ∈ R1 is mapped into the point P = (1, −2) ∈ R2 , then L : X = s2 → (s, −2s), In particular, the point X = 3(s = 3 ) is mapped into the point ( 3 , −3) . 2 2 178 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 a figure goes here In applications, the domain R1 usually represents time, while the range represents the position of a particle. Then L : R1 → Rn is an operator which specifies the position of a particle at a given time. Since L is linear and L0 = 0 , the path of the particle must be a straight line which passes through the origin at t = 0 . Later on in this section we shall show how to treat the situation of a straight line not through the origin, while in Chapter 7 we shall examine curved paths (non-linear operators). Example: This is the same example as before. L : R1 → R2 is such that at time t = 2 ∈ R1 a particle is at the point (1, −2) (while at t = 0 it is at the origin). At any time t = s2 , 3 the particle is at (s, −2s) . In particular, at t = 3(s = 2 ) , the particle is at ( 3 , −3) . It is 2 also convenient to rewrite the position (s, −2s) directly in terms of the time. Since t = 2s , t the position at time t is ( 2 , −t) . Thus we can write t L : t → ( , −t), 2 which clearly indicates the position at a given time. If a point in the space R2 is specified by Y = (y1 , y2 ) ∈ R2 , then the operator can be written as 1 y1 = t 2 y2 = −t. All of these are useful ways to write the operator L . In some situations, one might be more useful than another. This brings us to an issue which perhaps seems a bit pedantic but can serve you well in times of need. How can we represent the operator in a picture? There are three distinct ways. Some clarity can be gained by distinguishing them carefully. The same ideas carry over immediately to nonlinear operators. Our first picture has two parts. If L : R1 → Rn , then the first part is a diagram of R1 , the second part a diagram of Rn , and between them are arrows to indicate the image of each point in R1 . The picture below the first example was of this type. All of the arrows get in the way, so a more convenient picture is needed. That comes next. The second picture is the graph of an operator L . The graph L : R1 → Rn is the set of points (X, LX ) in the Cartesian product space R1 × Rn . Thus, if V 1 is time, and Rn space with L assigning a position to every time, then the points on the graph are points t in time - space (X, LX ) . 
For the previous example, these are the points (t, ( 2 , −t)) in R1 × R2 , a straight line in time-space ( or space-time if you prefer). To each time, there is a unique point in space. In a sense, this second picture, the graph, associated with an operator results from gluing together the two pieces of the first picture. By using the graph of an operator, we avoid the arrow mess of the first picture. The third picture just indicates the range of an operator (when thinking of pictures, the range is often referred to as the path of the operator). In terms of the time- position example, this picture only shows the path of a particle in space and ignores when a particle had a given position. Thus, this picture is the second half of the first picture. From our physical interpretation, it is clear that two different operators might have the same path (for two particles could travel the same path without having the same position at every time). Thus, this picture is an incomplete representation of an operator. 4.4. L : R1 → RN . PARAMETRIZED STRAIGHT LINES. 179 ˜ Example: If L : R1 → R2 such that the point X1 = 1 ∈ R1 is mapped into the point P = (1, −2) ∈ R2 , then ˜ L : X = s · 1 → (s, −2s). ˜ In particular, the point X = 3(s = 3) is mapped into the point (3, −6) . The graph of L 1 × R2 . Compare this with the is the set of points (s, s, −2s) , which is a straight line in R operator L considered previously (we remind you that L : X = 2s → (s, −2s)) . The graph ˜ of L was the set of point (2s, s, −2s) . These two sets of points the graphs of L and L , respectively, do not coincide since the operators are the same. On the other hand, the path ˜ of L is the set of points (s, −2s) , which is exactly the same set of points as the path of L . Shortly, we shall ask the question, how can we describe a straight line in Rn ? One way is to find an operator whose path is that straight line. Since many operators have the same path, there will be many possible ways to describe the straight line. All we need do is pick one, any one will do. Let L : R1 → Rn and Y0 be some fixed point in Rn . Consider the operator M X := LX + Y0 . Since M 0 = L0 + Y0 = Y0 = 0 , we see that M is not a linear operator; it is called an affine operator or affine mapping. The range of M is the subspace translated by the vector Y0 , a straight line which does not necessarily pass through the origin ( it will if and only if Y0 ∈ R(L) ). In other words, R(M ) is the Y0 coset of the subspace R(L) . Example: Take L to be the same as before, so L : X = 2s → (s, −2s) or L(2s) = (s, −2s) . Let Y0 = (−3, 2) . Then M X := LX + Y0 = (s, −2s) + (−3, 2) = (s − 3, −2s + 2) , where e X = 2s . In particular, M maps the point X = 3 ∈ V 1 (s = 2 ) into (− 3 , −1) ∈ R2 . The 2 figure shows the path of L and M . Since X = 2s , we can eliminate s from the above formula and write 1 M X = ( X − 3, −X + 2), X ∈ R1 . 2 If we denote by Y = (y1 , y2 ) a general point in R2 , then M may be written in the standard form 1 y−1= X −3 2 y − 2 = −X + 2. Of course, one could eliminate X from these too and be left with 2y1 + y2 = −4 , which is the equation of the path and could come from any mapping with the same path. It is instructive to investigate the reverse question, given two points P and Q in Rn , find an equation for the straight line passing through them. Any mapping whose path is the desired line will do. We have learned that M X = LX + Y0 is the general equation of a straight line through Y0 . 
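As a quick check on the example just worked (an added illustration, assuming NumPy): every point on the path of M, namely (s - 3, -2s + 2), does satisfy 2y1 + y2 = -4.

```python
import numpy as np

s = np.linspace(-5.0, 5.0, 11)               # sample parameter values (X = 2s)
y1, y2 = s - 3.0, -2.0 * s + 2.0             # points MX = (s - 3, -2s + 2) on the path
print(np.allclose(2.0 * y1 + y2, -4.0))      # True: the path lies on the line 2*y1 + y2 = -4
```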
There is complete freedom in specifying which points are mapped into P and Q , so we would be foolish not to pick the most simple case. Let M : 0 → P and M : 1 → Q . Then P = M (0) = L(0) + Y0 = Y0 , so Y0 = P , and Q = M (1) = L(1) + Y0 = L(1) + P , so L : 1 → P − Q . This completely determines M (since L is determined once the image of one point is known, L : 1 → P − Q , and the vector Y0 is also determined, Y0 = P ). Example: Find an equation for the straight line passing through the two points P = (1, 2, −3, −4) , Q = (−1, 3, 2, −2) in R4 . Say P is the image of 0 and Q is the image of 1, so M : 0 → P and M : 1 → Q . Then since M X = LX + Y0 ⇒ P = M (0) = Y0 so Y0 = (1, 2, −3, −4) . Also Q = L(1) + Y0 ⇒ L(1) = Q − Y0 = Q − P = (−2, 1, 5, 2) . Because 180 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 every X ∈ R1 can be written as X = s · 1 ⇒ LX = L(s · 1) = sL(1) = s(−2, 1, 5, 2) , or LX = (−2s, s, 5s, 2s) , where X = s · 1 ∈ R1 . Thus M X = LX + Y0 = (−2s, s, 5s, 2s) + (1, 2, −3, −4) , or M X = (−2s + 1, s + 2, 5s − 3, 2s − 4), where X = s· ∈ R1 . If we use Y = (y1 , y2 , y3 , y4 ) to indicate a general point in R4 , then M : R1 → R4 can be written as four equations. y1 = −2s + 1 y2 = s + 2 y3 = 5s − 3 y4 = 2s − 4 where X = s · 1 ∈ R1 . For example, the image of X = 2(s = 2) in R1 is the point Y = (−3, 4, 7, 4) ∈ R4 . The discussion before the example contained most of the proof of Theorem 13. Let P and Q be two points in Rn . Then the affine mapping M X = P + s(Q − P ), has as its path the straight line passing through P and Q . ˜ Remark: 1 The affine mapping M X = P + ks(Q − P ) , where k = 0 is some constant, has the same path too. The only change is that while M : 0 → P and M : 1 → Q this ˜ ˜ ˜ mapping M : 0 → P and M : ks → Q . In other words for M we have chosen to take ks (not s ) as the pre-image of Q . This pre-image of Q was entirely arbitrary anyway. Remark: 2 The equation M X = P + s(Q − P ) of M : R1 → Rn , where X = s · 1 ∈ R1 is called a parametric equation of the straight line which passes through P and Q in Rn , and s is called the parameter. Other parametric equations of the same line arise if X = ks · 1 ∈ R1 (cf. Remark 1), where k is some non-zero constant. In order to introduce the slope of a straight line, let us paraphrase the last few paragraphs in terms of particle motion. If P and Q are two points in Rn , then M t = P + t(Q − P ) M : R1 → Rn , where t ∈ R1 describes the position of the particle at time t . At t = 0 the particle is at P , while at t = 1 the particle is at Q . Another particle ˜ moving k times as fast has the position M t = P + kt(Q − P ) . This other particle is also 1 at P when t = 0 , but takes time t = k to reach the point Q . It still has the same path as the first particle. If we denote by Y = (y1 , y2 , . . . , yn ) an arbitrary point in Rn , then the position Y at time t is y1 = p1 + kt(q1 − p1 ) . . . y2 = p2 + kt(q2 − p2 ) yn = pn + kt(qn − pn ). Now consider the mapping M t = P + kt(Q − P ) . The derivative at t = t1 is dM dt t=t1 = lim t2 →t1 M (t2 ) − M (t1 ) t2 − t1 4.4. L : R1 → RN . PARAMETRIZED STRAIGHT LINES. 181 It represents the velocity at t = t1 . To have this make sense, we must introduce a norm in Rn so that the limit can be defined. Use the Euclidean norm (although any other one could be used, for it turns out that there is no need for a limit in the case of a straight line). 
Since M (t2 ) − M (t1 ) = P + kt2 (Q − P ) − [P + kt1 (Q − P )] = k (t2 − t1 )(Q − P ) , we have M (t2 ) − M (t1 ) = k (Q − P ), t2 − t1 so dM (t) = k (Q − P ). dt Because this is independent of t , it is the derivative at any time t . Thus, the derivative is a vector, k (Q − P ) . The derivative represents the velocity of a particle moving on the line. The speed is the length of the velocity vector, speed = k (Q − P ) . What is the slope of the line? Since the line is the path of a mapping, it should not depend on which mapping is used. In terms of mechanics, the slope should not depend on the speed of the particle moving along the line, but only that it moved along the straight line, that is its velocity vector was along the line. Thus we define the slope as a unit vector in the direction of the velocity. In our case, slope = Q − P/ Q − P . This is a unit vector from P to Q and only depends upon the mapping to specify a positive direction (orientation) for the line. Example: A particle moves on a straight line from P = (1, −2, 1) at t = 0 to Q = (3, 1, −5) at t = 2 . Find the position of the particle as a function of time, the velocity and speed of the particle, and slope of the path. The equation of the path is M t = P + kt(Q − P ) , where k is determined from 1 Q = M (2) = P + 2k (Q − P ) , so k = 1 . Thus M (t) = (1, −2, 1) + 2 t(2, 3, −6) = 2 3 1 3 (1 + t, 2 + 2 t, 1 − 3t) . Velocity = 2 (Q − P ) = (1, 2 , −3) . Speed = velocity = 7 . Slope 2 6 = velocity /speed = ( 2 , 3 , − 7 ) . 77 A glance at the formulas which precede the example reveals that the position of a particle which moves along a straight line through P can be written in any of the forms 1. M (t) = P + kt(Q − P ). where Q is another point on the path and the particle is at Q when t = 2. M (t) = P + 1 k , dM t dt or 3. M (t) = P + V t, where V is the velocity. See Exercise 5 too. Exercises (1) (a) If L : R1 → R2 such that the point X1 = 3 ∈ R1 is mapped into P = (1, 0) , which of the following points are in R(L) i) (2, 0) , ii) (1, 2) , iii) (−1, 0) ? (b) Sketch two pictures, one of the graph of L , the other of the path of L . ˜ (c) Find another operator L : R1 → R2 whose path is the same as that for L . 182 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 (2) Find a mapping whose path is the straight line passing through the points (2, −1, 3) and (1, −3, −5) . Find its slope too. (3) If a point is at (1, −1, 0) at t = 0 and at (2, 3, 8) at t = 3 , find the position as a function of time if the particle moves along a straight line. What is the velocity and speed of the particle? (4) If a particle is initially at (0, 1, 0, 1) and has constant velocity (1, −2, 3, −1) , find its position as a function of time. Where is it at t = 3 ? ˜ (5) A particle moves along a straight line in such a way that at t = t0 it is at P , while ˜. at t = t1 it is at Q (a) Show that its position M (t) as a function of time is ˜ ˜ Q−P ˜ M (t) = P + (t − t0 ) t1 − t0 (b) What is the velocity? (c) Show that M (t) = M (t0 ) + dM (t − t0 ). dt (6) Two straight lines are parallel if they have the same slope. If M (t) = P + t(Q − P ) is a parametric equation of one line, find an equation for the parallel line which passes ˜ through the point P . 4.5 L: R n → R1 . Hyperplanes. Whereas in the previous section we examined linear mappings from a one-dimensional linear space into an n dimensional space, now we shall look at the opposite extreme, linear mappings from an n dimensional space into a one- dimensional space. Let L : Rn → R1 . 
We would like to find a representation theorem for this linear operator. The most natural way to do this is to work with a basis { e1 , . . . , en } for Rn . n Then every X ∈ Rn can be written as X = xk ek . Consequently, 1 n n LX = L xk ek 1 = n L(xk ek ) = 1 xk L(ek ). 1 It is clear that LX is determined once we know all the numbers Lek . In other words, the linear mapping L is determined by the effect of the mapping on a basis for the domain of the operator. This proves Theorem 4.14 . Let L : Rn → R1 linearly. If { ek } is a basis for the domain of L, Rn , then n LX = a1 x1 + a2 x2 + . . . + an xn = ak xk , k =a 4.5. L : RN → R1 . HYPERPLANES. 183 n where X = xk ek and ak = Lek . Notice that the ak are scalars since they are in the 1 range of L —and the range of L is R1 by hypothesis. Examples: (1) Consider the linear operator L : R3 → R1 , which maps L : e1 = (1, 0, 0) → 1, L : e2 = (0, 1, 0) → 0 , and L : e3 = (0, 0, 1) → 0 . Since the ek constitute a basis for R3 , the mapping L is completely determined by using Theorem 14. If X = (x1 , x2 , x3 ) ∈ R3 , then X = x1 e1 + x2 e2 + x3 e3 . Thus LX = x1 Le1 + x2 Le2 + x3 Le3 = x1 − x2 or LX = x1 . For example, L : (2, 1, 7) → 2 . The nullspace of L —those points X ∈ R3 such that LX = 0 —are the points X = (x1 , x2 , x3 ) ∈ R3 such that x1 = 0 which is the x2 x3 plane. (2) Let L : R4 → R1 such that Le1 = 1 , Le2 = −2 , Le3 = 5 , Le4 = −3 , where e1 = (1, 0, 0, 0) , e2 = etc. Then if X = (x1 , x2 , x3 , x4 ) ∈ R4 , we have LX = x1 − 2x2 + 5x3 − 3x4 . The nullspace of L is again a hyperplane, the hyperplane x1 − 2x2 + 5x3 − 3x4 = 0 in R4 . So far we have not given any attention to the range of L , all of our pictures being in the domain of L . Since the range is R1 , its picture is a simple straight line which is not very interesting. However the graph of L is interesting. Let L : Rn → R1 and Y = (y ) ∈ R1 . Then y = a1 x1 + . . . + an xn . The graph of L is the set of points (X, LX ) [or (X, Y ) where Y = LX ] in Rn × R1 ∼ = Rn+1 . A point (X, Y ) = (x1 , . . . , xn , y ) ∈ Rn × R1 is on the graph if the coordinates satisfy the equation y = a1 x1 + . . . + an xn . This equation can be written as 0 = a1 x1 + . . . + an xn + (−1)y which is a hyperplane in Rn+1 . Thus we have found two ways to associate a hyperplane with L : Rn → R1 , i) All X such that LX = 0 , which is the nullspace of L , a linear space of dimension n − 1 (since dim N(L) = dim D(L) − dim R(L) = n − 1) . ii) The graph of L , that is, all points of the form (X, LX ) , is a linear space of dimension n+1. Although this is confusing, both ways are used in practice, whichever is most convenient for the problem at hand. For the remainder of this section, we shall confine our attention to hyperplanes defined in the first way. Since linear mappings L : Rn → R1 all have the form LX = a1 x1 + . . . + an xn , and since it is natural to think of the sum as the scalar product of the vectors N = (a1 , . . . , an ) and X = (x1 , . . . , xn ) . Theorem 14 may be rephrased as Theorem 4.15 . If L : Rn → R1 , then LX = N , X , where N is the vector N = (Le1 , . . . , Len ) and { ek } form a basis for Rn . 184 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 Remark: The vector N is an element of the so-called dual space of Rn . From the above, it is clear that the dual space of Rn also has dimension n . Theorem 14’ is a “representation theorem”. It states that every linear mapping L : Rn → 1 may be represented in the form LX := N , X for some vector N which depends on R L . 
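In coordinates, the representation theorem is nothing but a dot product. Here is a short sketch (an added illustration, assuming NumPy) using the functional of Example 2, LX = x1 - 2x2 + 5x3 - 3x4.

```python
import numpy as np

N = np.array([1.0, -2.0, 5.0, -3.0])   # N = (L e1, L e2, L e3, L e4)

def L(X):
    return N @ X                       # LX = <N, X>

X = np.array([2.0, 1.0, 0.0, 1.0])     # an arbitrary test vector
print(L(X))                            # 2 - 2 + 0 - 3 = -3.0
```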
You may wish to think of N as a vector perpendicular to the hyperplane LX = 0 (cf. Ex. 8, p. 225). Example: Consider the operator L of Example 2 in this section. For it, LX = N , X where N is the particular vector N = (1, −2, 5, 3) . Recall that a linear functional is a linear operator l whose range is R1 . Since the operators L : Rn → R1 we are considering have range R1 , they are all linear functionals. We may again rephrase Theorem 14 in this language. It states that every linear functional defined on Rn may be represented in the form l(X ) = N , X , where N depends on the functional l at hand. This is just a restatement of Theorem 14 with the realization that our L ’s are linear functionals. Don’t let the excess language bewilder you. So far in this section, we have concentrated our attention on the algebraic representation of a linear operator (functional) L : Rn → R1 . Let us turn to geometry for a bit. In passing we observed that the nullspace of the operator was a hyperplane in the domain of L (a hyperplane in a linear space V is a “flat” subset of V whose dimension is one less than V , that is, of codimension one). These hyperplanes, { X ∈ Rn : LX = 0 } , all passed through the origin of Rn . A plane parallel to this one which passes through the particular point X 0 ∈ Rn has the form L(X − X 0 ) = 0. It is clear that the point X = X 0 does satisfy the equation. From the representation theorem, L(X − X 0 ) = a1 (x1 − x0 ) + a2 (x2 − x0 ) + . . . + an (xn − x0 ) = 0, 1 2 n is the equation of this hyperplane, where X = (x1 , x2 , . . . , xn ) and X 0 = (x0 , x0 , . . . , x0 ) . n 12 If we again write N = (a1 , a2 , . . . , an ) , then the equation of the hyperplane is N , X − X 0 = 0, all vectors X such that X − X0 is perpendicular to N . Examples: (1) Find the equation of a plane which passes through the point X 0 = (1, 2, −5) and is parallel to the plane −2x1 + 7x2 + 4x3 = 0 . Solution : Here N = (−2, 7, 4) , X = (x1 , x2 , x3 ) , so the plane has the equation 0 = N , X − X 0 = −2(x1 − 1) + 7(x2 − 2) + 4(x3 + 5), which may be written as −2x1 + 7x2 + 4x3 = −8. The equation has been cooked up so that X 0 = (1, 2, −5) does satisfy it. (2) Find the equation of a plane which passes through the point X 0 = (1, 2, −5) and is parallel to the plane −2x1 + 7x2 + 4x3 = 37 . Solution : Since this plane is also parallel to the plane −2x1 + 7x2 + 4x3 = 0 , the solution is that of Example 1. 4.5. L : RN → R1 . HYPERPLANES. 185 (3) Find the equation of the plane in R4 which is perpendicular to the vector N = (1, −2, 3, 1) and passes through the point X 0 = (1, 0, 1, −1) . Easy. The plane is all points X such that N , X − X 0 = 0, that is (x1 − 1) − 2(x2 − 0) + 3(x3 − 1) + (x4 + 1) = 0, or x1 − 2x2 + 3x3 + x4 = 3. (4) Find the equation of the plane in R3 which passes through the three points X 1 = (7, 0, 0), X 2 = (1, 0, −2), X 3 = (0, 5, 1). We shall find this by using the general equation of a plane, a1 (x1 − x0 ) + a2 (x2 − x0 ) + a3 (x3 − x0 ) = 0. 1 2 3 Here X 0 = (x0 , x0 , x0 ) is a particular point on the plane. We may use any of X 1 , X 2 , 123 or X 3 for it. Since X 1 is simplest, we take X 0 = (7, 0, 0) . All that remains is to find the coefficients a1 , a2 , and a3 in a1 (x1 − 7) + a2 x2 + a + 3x3 = 0. Since X 2 and X 3 are in the plane (and so must satisfy its equation), the substitution X = X 2 and X = X 3 yields two equations for the coefficients, a1 (1 − 7) + a2 0 + a3 (−2) = 0 a1 (0 − 7) + a2 (5) + a3 (1) = 0. 
These two equations in three unknowns may be solved for any two in terms of the third. We find a3 = −3a1 and a2 = 2a1 , so the equation is a1 (x1 − 7) + 2a1 x2 − 3a1 x3 = 0. Factoring out the coefficient a1 , we obtain the desired equation x1 − 7 + 2x2 − 3x3 = 0. (It is clear from the general equation of a plane that the coefficients are determined only to within a constant multiple). Exercises (1) Let L : R2 → R1 map L : (1, 0) → 3, L : (0, 1) → −2. Write LX in the form Lx = a1 x1 + a2 x2 . L : (7, 3) → ? 186 CHAPTER 4. LINEAR OPERATORS: GENERALITIES. V 1 → VN , VN → V 1 (2) Let L : R2 → R1 map L : (2, 1) → 1, L : (0, 3) → −2. Write LX in the form LX = a1 x1 + a2 x2 . L : (7, 3) → ? (3) Find the equation of a plane in R3 which passes through the point (3, −1, 2) and is parallel to the plane x1 − x2 − 2x3 = 7 . (4) Find the equation of a plane in R5 which is perpendicular to the vector N = (6, 2, −3, 1, −1) and contains the point (1, 1, 1, 1, 4) . (5) Find the equation of a plane in R4 which contains the four points X1 = (2, 0, 0, 0) , X2 = (1, 0, 2, 0) , X3 = (0, −1, 0, −1) , X4 = (3, 0, 1, 1) . (6) In this problem, you will have to use the norm induced by the scalar product. a). Show that the distance between the point Y ∈ Rn and the plane A = { X ∈ Rn : N , X − X 0 = 0 } is d(Y, A) = N, Y − X 0 . N b). Prove that the distance between the parallel planes A = { X ∈ Rn : N , X − X 1 = 0 } and B = { X ∈ Rn : N , X − X 2 = 0 } is d(A, B ) = N, X 2 − X 1 . N Chapter 5 Matrices and the Matrix Representation of a Linear Operator 5.1 L: R m → Rn . The simplest example of a linear operator L which maps Rm into Rn is supplied by n linear algebraic equations with m variables. Let X = (x1 , . . . , xm ) ∈ Rm . Then we define a11 x1 + a12 x2 + · · · + a1m xm a x + a22 x2 + · · · + a2m xm LX = 21 1 (5-1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . an1 x1 + · · · · · · + anm xm Notice the right side of this equation is a (column) vector with n components. If we let Y = (y1 , . . . , yn ) , then the equation LX = Y or m aij xj = yi , i = 1, 2, . . . , n, j =1 determines a vector Y in Rn for every X in Rm . Since the operator L is essentially specified by the coefficients a11 , a12 , . . . , anm , it is convenient to represent it by the notation a11 a12 · · · alm a a22 · · · a2m L = 21 . . . . . . . . . . . . . . . . . . . , anl an2 · · · anm and use the notation a11 a12 · · · alm x1 a21 a22 · · · a2m x2 LX = . . . . . . . . . . . . . . . . . . . . anl . . . . . . . . anm xm (5-2) The ordered array of m × n coefficients is called a matrix associated with L , and the numbers aij are called the elements of the matrix. The first index i refers to the row while 187 188 CHAPTER 5. MATRIX REPRESENTATION the second index j refers to the column. We may also write L = ((aij )) as a shorthand to refer to the whole matrix. Since we shall only use linear operators in this chapter it is convenient to drop the letter L for the operator and use A = ((aij )) instead. This will facilitate the notation when referring to other matrices B = ((bin ) , etc. since there will be enough subscripts without adding to the confusion by using L1 , L2 , etc. for linear operators. In this section we shall work out the meaning of operator algebra applied to the special case of operators L : Rm → Rn which are represented by matrices. It turns out that every operator L : Rm → Rn can be represented by a matrix (proved later in this very section). 
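To see equations (5-1) and (5-2) side by side on a concrete instance, here is a short Python sketch; the particular 2 x 3 matrix and the vector X are our own illustrative numbers, not taken from the text.

import numpy as np

# A concrete instance of equation (5-1): n = 2 equations in m = 3 variables.
A = np.array([[1.0, -2.0,  3.0],
              [0.0,  4.0, -1.0]])     # the matrix ((a_ij)), 2 x 3, so A : R^3 -> R^2

X = np.array([2.0, 1.0, 1.0])

# Writing out the two linear equations by hand ...
Y_by_hand = np.array([1*2 + (-2)*1 + 3*1,
                      0*2 +   4*1  + (-1)*1])

# ... gives the same Y as the matrix-vector product of equation (5-2).
assert np.allclose(A @ X, Y_by_hand)
print(A @ X)    # [3. 3.]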
Let us first i) define equality, ii) exhibit the matrices for the zero operator O(X ) = 0 (additive identity). If A = ((aij )) and B = ((bij )) both map Rm → Rn , then by definition, A = B if and only if AX = BX for every X ∈ Rm , that is, for all X = (x1 , x2 , . . . , xm ) , ai1 x1 + ai2 x2 + · · · + aim xm = bil xl + · · · + bim xm , or m i = 1, 2, . . . , n m aij xj = bij xj , i = 1, 2, . . . , n. (aij − bij )xj = 0, i = 1, 2, . . . , n j =1 j =1 Subtracting, we find that m j =1 must hold for any choice of X = (x1 , x2 , . . . , xm ) . From the particular choice X = (1, 0, 0, . . . 0) , we see that ai1 − bi1 = 0, i = 1, 2, . . . , n, that is, a11 = b11 , a21 = b21 , . . . , anl = bnl . Similarly, by using other vectors X , we conclude Theorem 5.1 1 (equality). If A = ((aij )) and B = ((bij )) both map Rm → Rn , then A = B if and only if the corresponding elements of their matrices are equal, aij = bij , i = 1, 2, . . . , n, j = 1, 2, . . . , m. It is clear that the n × m matrix all of whose elements are zero 0 0 ··· 0 0 0 · · · 0 0= . . . . . . . . . . . . 0 0 ··· 0 has the property that it maps every X ∈ Rm into zero, and thus satisfies the conditions for the zero matrix. That this is the only such matrix follows from Theorem 1, since any other matrix which acts the same way on every vector X ∈ Rm must have the same elements all zeroes. 5.1. L : RM → RN . 189 Theorem 5.2 2. The zero matrix 0 : Rm → Rn is uniquely represented by a matrix with n rows and m columns, all of whose elements are zero. How is the identity matrix I defined? Since I : Rn → Rn maps every vector into itself, IX = X , the linear equations (1) must have the property that given any vector X = (x1 , x2 , . . . , xn ) ∈ Rn , then n δij xj = xi , i = 1, 2, . . . , n j =1 If aij = δij (the Kronecker delta), so a11 = a22 = · · · = ann = 1 while aij = 0, then indeed i = j, n δij xj = xi , i = 1, 2, . . . n j =1 is satisfied. Thus, the coefficients of the identity matrix are I = ((δij )) . This is a square (nx m) matrix, 1 0 ··· 0 0 1 · · · 0 I= . . . . . . . . . . . . 0 0 ··· 1 with ones along the main diagonal and zeroes elsewhere. Theorem 5.3 3. The identity matrix I : Rn → Rn is uniquely represented by a square (n × n) matrix whose elements are I = ((δij )) . We turn to addition. Let A = ((aij )) and B = ((bij )) be two n × m matrices, so they both represent operators mapping Rm into Rn . Their sum C = A + B is defined as the operator which acts upon X according to the rule (p. 268) CX = AX + BX, X ∈ Rm . The elements cij of the matrix C consequently satisfy m m cij xj = j =1 aij xj + j =1 or m bij xj , j = 1, 2, . . . , n. j =m m = (aij + bij )xj , j = 1, 2, . . . , n j =1 for all X = (x1 , x2 , . . . , xm ) . Thus, the cij are in fact aij + bij (by Theorem 1) Theorem 5.4 4. If A = ((aij )) and B = ((bij )) both map Rm into Rn , then their sum C = A + B has elements cij = aij + bij . 190 CHAPTER 5. MATRIX REPRESENTATION Remark: From this it follows that the zero matrix is actually the additive identity, for if A = ((aij )) , then C = A + 0 has elements cij = aij + 0 = aij , that is, A + 0 = A . Example: 1 Let A and B which map R3 → R4 be represented by the matrices 2 2 2 −3 0 1 −3 7 2 −1 0 0 . A= 5 4 −3 ; B = −4 −2 2 0 −1 −1 01 1 Then −1 4 A+B = 1 0 2 3 2 −1 . 2 −1 0 0 Example: 2. Let A and B be the operators on p. 268 (called L1 and L2 there) which map R2 → R3 . Then 1 1 −3 1 2 , A= 1 B = 1 −1 , 0 −1 1 0 so −2 2 1 A+B = 2 1 −1 which agrees with the sum obtained there. 
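The defining property CX = AX + BX and the elementwise rule c_ij = a_ij + b_ij of Theorem 4 are also easy to check by machine. In the following sketch the 2 x 3 matrices are our own illustrative numbers, not those of the examples above.

import numpy as np

# Illustrative 2 x 3 matrices, chosen only to exercise the definitions.
A = np.array([[ 2.0, -3.0, 0.0],
              [ 1.0,  4.0, 5.0]])
B = np.array([[-1.0,  2.0, 7.0],
              [ 0.0, -4.0, 1.0]])
X = np.array([1.0, 2.0, 3.0])

C = A + B                                  # elementwise, c_ij = a_ij + b_ij (Theorem 4)
assert np.allclose(C @ X, A @ X + B @ X)   # the defining property CX = AX + BX

Z = np.zeros((2, 3))                       # the zero matrix of Theorem 2
assert np.allclose(A + Z, A)               # so the zero matrix is the additive identity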
˜ ˜ ˜ If A = ((aij )) , is there a matrix A such that A + A = 0 ? Clearly the matrix A ˜ defined by A = ((−aij )) does the job since ˜ A + A = ((aij )) + ((−aij )) = ((0)) by definition of addition. We shall denote the matrix with elements ((−aij )) by “ −A ” since A + (−A) = 0 . This matrix “ −A ” is the additive inverse to A . Example: If 1 −1 2 A = −π 0 −1 then −1 1 − A = π −2 . 0 1 Since a linear operator which is represented by a matrix is still a linear operator, Theorem 3 (p. 269) certainly holds for matrix addition. We shall rewrite it. Theorem 5.5 Let A, B, C, . . . be matrices which map Rm into Rn (so they are n × m matrices). The set of all such matrices forms an abelian (commutative) group under addition, that is, 1. A + (B + C ) = (A + B ) + C 2. A + B = B + A 3. A + 0 = A 4. For every A , there is a matrix ( −A ) such that A + (−A) = 0. 5.1. L : RM → RN . 191 Proof: No need to do this again since it was carried out in even greater generality on p. 270. For practice, you might want to write out the proof in the special case of 2 × 3 matrices and see how much more awkward the formulas become when you use the specific elements instead of proceeding more abstractly as we did in the proof on p. 270. If α is a scalar and A = ((aij )) is an n × m matrix which represents a linear operator mapping Rm → Rn , the operator αA is defined by the rule (αA)X = A(αX ) where X is any vector in Rm . In terms of the elements ((aij )) , this means that the elements ((˜ij )) of αA are given by a m m aij xj = ˜ j =1 aij (αxj ), i = 1, 2, . . . , n (αaij )xj , i = 1, 2, . . . , n j =1 m = j =1 so aij = αaij . Thus, the matrix αA is found by multiplying each of the elements of A by ˜ α, αa11 αa12 · · · αa1m a11 a12 · · · a1m a . . . . . . . . a2m αa21 αa22 · · · αa2m α 21 . . . . . . . . . . . . . . . . . . . = . . . . . . . . . . . . . . . . . . . . . . . αanl . . . · · · αanm anl . . . . . . anm Example: 7 1 3 −14 −2 −6 −2 −1 4 4 2 −8 −1 = 9 6 5 −18 −12 −10 −3 1 −1 6 −2 2 . The following theorem concerns multiplication of matrices by scalars. It is proved either by direct computation - or more simply by realizing that it is a special case of Exercise 12, p. 284. Theorem 5.6 . If A and B are matrices which map Rm → Rn , and if α, β are any scalars, then 1. α(βA) = (αβ )A 2. 1 · A = A 3. (α + β )A = αA + βA 4. α(A + B ) = αA + αB. Remark: Theorems 5 and 6 together state that the set of all matrices which map Rm into Rn forms a linear space. It is easy to show that the dimension of this space is m · n (by exhibiting m · n linearly independent matrices which span the whole space). Now we get more algebraic structure and see how to multiply. Let A map R? into Rn and B = ((bij )) map Rr into Rs . By definition of operator multiplication (p. 271-2), the product AB is defined on an element X ∈ Rr = D(B ) by the rule ABX = A(BX ). 192 CHAPTER 5. MATRIX REPRESENTATION Since the vector BX ∈ Rs must be fed into A , we find that BX ∈ Rm too. Thus, in order for the product AB of an n × m matrix A with a s × r matrix B to make sense, we must have ? = m , that is, the range of B must be contained in the domain of A , a figure goes here If C = ((cij )) = AB , then for every X ∈ Rr CX = A(BX ) or r r s cij xk = bjk xk , i = 1, 2, . . . , n aij bjk xk , i = 1, 2, . . . , n. aij j =1 k=1 k=1 so r = s k=1 j =1 Therefore, the elements cik of the product AB are given by the formula s cik = aij bjk j =1 i = 1, 2, . . . , n . k = 1, 2, . . . 
, r Since the summation signs have probably overwhelmed you, we repeat it in a special case. Let B be determined by the linear equations b11 x1 + b12 x2 + b13 x3 = y1 b21 x1 + b22 x2 + b23 x3 = y2 . Then B : R3 → R2 . Also let A : R2 → R2 be determined by a11 y1 + a12 y2 = z1 a21 y1 + a22 y2 = z2 . The product AB maps a vector X ∈ R3 first into Y = BX ∈ R2 and then into Z = ABX ∈ R2 . a figure goes here Ordinary substitution yields Z = ABX as a function of X : a11 (b11 x1 + b12 x2 + b13 x3 ) + a12 (b21 x1 + b22 x2 + b21 x3 ) = z1 a21 (b11 x1 + b12 x2 + b13 x3 ) + a22 (b21 x1 + b22 x2 + b23 x3 ) = z2 , or (a11 b11 + a12 b21 )x1 + (a11 b12 + a12 b22 )x2 + (a11 b13 + a12 b23 )x3 = z1 (a21 b11 + a22 b21 )x1 + (a21 b12 + a22 b22 )x2 + (a21 b13 + a22 b23 )x3 = z2 . 5.1. L : RM → RN . 193 If we write this in the matrix form x1 x2 = x3 c11 c12 c13 c21 c22 c23 z1 z2 , we find c11 = a11 b11 + a12 b21 , c12 = a11 b12 + a12 b22 etc., just as was dictated by the general formula for the multiplication of matrices. Theorem 5.7 . If A = ((aij )) and B = ((bij )) are matrices with B : Rr → Rs and A : Rs → Rn , then the product C = AB is defined and the elements of the product C = ((cij )) are given by the formula s cik = aij bjk , i = 1, 2, . . . , n; k = 1, 2, . . . , r. j =a Remark: Since this formula for matrix multiplication is impossible to remember as it stands, it is fortunate that there is an easy way to remember it. We shall work with the example of matrices A : R2 → R2 and B : R3 → R2 discussed earlier. Then AB = b11 b12 b13 b21 b22 b23 a11 a12 a21 a22 = c11 c12 c13 c21 c22 c23 . To compute the element cik , we merely observe that 2 cik = aij bjk = ai1 b1k + a12 b2k j =1 cik is the scalar product of the i th row in A with the k th column in B (see fig.). Thus, the element c21 in C = AB is the scalar product of the 2nd row of A with the 1st column of B . Do not be embarrassed to use two hands to multiply matrices. Everybody does. Examples: (1) (cf. p. 274 where this was done without matrices). If A= 2 −3 −1 1 , B= 02 11 , then AB = 2 −3 −1 1 BA = 02 11 02 11 = −3 1 1 −1 2 −3 −1 1 = −2 2 1 −2 and . Notice that even though AB and BA are both defined, we have AB = BA —the expected noncommutativity in operator multiplication. 194 CHAPTER 5. MATRIX REPRESENTATION (2) (cf. p. 272 bottom where this was done without matrices). If 1 −1 1 , A= 0 −1 −2 B = (1, 2, −1), then 1 −1 1 = (2, 3). BA = (1, 2, −1) 0 −1 −2 However the product AB does not make sense. From the general theory of linear operators (Theorem 4, p. 276) we can conclude Theorem 5.8 . Matrix multiplication is associative, that is, if A B C Rk → Rl → Rm → Rn , so the products C (BA) and (CB )A are defined, then C (BA) = (CB )A. Thus the parenthesis can be omitted without risking chaos. Remark: Returning to linear algebraic equations, you will observe that the matrix notation AX there [eq(2)] can now be viewed as matrix multiplication of the n × m matrix A = ((aij )) with the m × 1 matrix (column vector) X . In developing the algebra of matrices - and operators in general - we have been neglecting one important issue, that of an inverse operator. If L : V1 → V2 , can we find ˜ an operator L : V2 → V1 which reverses the effect of L , that is, if LX = Y , where ˜ ˜ X ∈ V1 and Y ∈ V2 , is there an operator L such that LY = X ? If so, then ˜ ˜ LLX = LY = X, and we write ˜ LL = I. ˜ ˆ This operator L is the left (multiplicative) inverse of L . Similarly, an operator L ˆ = I is the right (multiplicative) inverse of L . 
We shall shortly prove such that LL ˆ that if an operator L has both a left inverse L and a right inverse L , then they are ˆ = L , so without ambiguity one can write L−1 for the inverse. ˜ equal, L a figure goes here 5.1. L : RM → RN . 195 To begin, we compute the inverse of the matrix A= 5 −2 3 −1 associated with the system of linear equations 5x1 − 2x2 = y1 3x1 − x2 = y2 . These equations specify a mapping from R2 into R2 . They map a point X into Y . Finding the inverse of A is equivalent to answering the question, if we are given a point Y , can we find the X whence it came? AX = Y, X = A−1 Y. Finding the X in terms of Y means solving these two equations, a routine task. The answer is x1 = −y1 + 2y2 x1 −1 2 y1 so = x2 = −3y1 + 5y2 . x2 −3 5 y2 Thus, X = A−1 Y, where A−1 = −1 2 −3 5 . The matrix A−1 is the matrix inverse to A . It is easy to check that AA−1 = 5 −2 3 −1 −1 2 −3 5 = 10 01 =I A−1 A = −1 2 −3 5 5 −2 3 −1 = 10 01 = I. and Thus, this matrix A−1 is both the right and left inverse of A . Our second example is of a more geometric nature. We shall consider a matrix R which represents rotation of a vector in E2 through an angle α . a figure goes here R is represented by the matrix (cf. Ex. 13b p. 285) R= cos α − sin α sin α cos α . It is geometrically clear that in inverse of this operator R is an operator which rotates through an angle −α , unwinding the effect of R . Thus, immediately from the formula for R , we find cos(−α) − sin(−α) cos α sin α R −1 = = . sin(−α) cos(−α) − sin α cos α 196 CHAPTER 5. MATRIX REPRESENTATION To check that geometry has not deceived us, we should multiply out RR−1 and R−1 R . Do it. You will find RR−1 = R−1 R = I . One could also have found R−1 by solving linear algebraic equations as was done in the first example. The problem of finding the matrix inverse to any square matrix, a11 a12 · · · a1n a21 a22 · · · a2n A= . . . . . . an1 an2 · · · ann is equivalent to the dull problem of solving n linear algebraic equations in n unknowns a11 x1 + · · · a21 x1 + · · · . . . an1 x1 + · · · for X in terms of Y, yields the formulas +a1n xn +a2n xn . . . +ann xn = yn = y1 = y2 X = A−1 Y . For n = 2 the computation is not too grotesque, and a12 a22 y1 − y2 ∆ ∆ a11 −a21 y1 + y1 x2 = ∆ ∆ x1 = where ∆ = a11 a22 − a12 a21 (= determinant of A , for those who have seen this before). From this formula we read off that the inverse of the 2 × 2 matrix A= a11 a12 a21 a22 is A−1 = 1 ∆ a22 −a12 −a21 a11 . As a check, one computes that AA−1 = A−1 A = I. Thus the 2 × 2 matrix A has an inverse if and only if ∆ := a11 a22 − a12 a21 = 0 . a figure goes here Fortunately, one rarely needs the explicit formula for the inverse of a square n × n matrix other than the reasonable cases n = 2 and n = 3 . The inverse of a matrix has greater conceptual use as the inverse of an operator. Having relegated the computation of the inverse of a matrix to the future, let us see what can be said about the inverse without computation. This will necessarily be a bit more abstract. Since the issues involve solving systems of linear algebraic equations, we shall invoke the theory concerning that which was developed in Chapter 4 Section 3. For this discussion, it is convenient to use the following definition (cf. p. 6). Definition: An operator A : V1 → V2 is invertible if it has the two properties i) If X1 = X2 then AX1 = AX2 (injective, 1-1) ii) To every Y ∈ V2 , there is at least one X ∈ V1 such that AX = Y (surjective, onto). 5.1. L : RM → RN . 
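Before pursuing the abstract side of invertibility, the two concrete computations above can be confirmed numerically. The Python sketch below applies the 2 x 2 inverse formula (with Delta = a11 a22 - a12 a21) to the first example and checks that the rotation through -alpha undoes the rotation through alpha; the value of alpha is arbitrary.

import numpy as np

# The matrix of the first example and its inverse from the 2 x 2 formula.
A = np.array([[5.0, -2.0],
              [3.0, -1.0]])
Delta = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]          # here Delta = -5 + 6 = 1, so A is invertible
A_inv = (1.0/Delta) * np.array([[ A[1, 1], -A[0, 1]],
                                [-A[1, 0],  A[0, 0]]])
assert np.allclose(A_inv, [[-1.0, 2.0], [-3.0, 5.0]])   # agrees with the inverse found above
assert np.allclose(A @ A_inv, np.eye(2))
assert np.allclose(A_inv @ A, np.eye(2))

# The rotation matrix: its inverse is the rotation through -alpha.
alpha = 0.7
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
R_inv = np.array([[ np.cos(alpha), np.sin(alpha)],
                  [-np.sin(alpha), np.cos(alpha)]])
assert np.allclose(R @ R_inv, np.eye(2))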
197 Thus, an operator is invertible if and only if it is bijective. An invertible matrix is usually called non-singular, while a matrix which is not invertible is called singular. To show that this definition is identical with the previous one, we must show that every invertible linear operator A has a right and left inverse. A more pressing matter though, is Theorem 5.9 . If the linear operator A : V1 → V2 where V1 and V2 are finite dimensional, is invertible, then dim V1 = dim V2 , so a matrix must necessarily be square for an inverse to exist (but being square is not sufficient, as was seen in the 2 × 2 case where the additional condition a11 a22 − a21 a12 = 0 we needed). In other words, you haven’t got a chance to invert a matrix unless it is square, but being square is not enough. Proof: Condition i) states that N(A) = 0 , for if X1 = 0 , then AX1 = 0 . Therefore dim R(A) = dim D(A) − dim N(A) = dim V1 − 0 = dim V1 . On the other hand, condition ii) states that V2 ⊂ R(A) . Since A : V1 → V2 , we know that R(A) ⊂ V2 . Therefore R(A) = V2 . Coupled with the first part, we have dim V1 = dim R(A) = dim V2 . Theorem 5.10 . Given an operator A which is invertible, there is a linear operator A−1 such that AA−1 = A−1 A = I . ˜ ˜ ˜ ˜ ˜ Proof: If Y ∈ V2 , there is an X ∈ V1 such that AX = Y (by property ii), and that X −1 Y = X . A similar ˜ ˜ is unique (property i). Therefore without ambiguity we can define A process defines the operator A−1 for every Y ∈ V2 . From our construction, it is clear (or should be) that AA−1 = A−1 A = I. ˜ ˜ ˆ ˜ All that remains is to show A−1 is linear. If AX = Y and AX = Y , then since A is linear, ˜ + bX ) = aAX + bAX = aY + bY . Thus A−1 (aY + bY ) = aX + bX = aA−1 Y + bA−1 Y . ˆ ˜ ˆ ˜ ˆ ˜ ˆ ˜ ˆ ˜ ˆ A(aX Remark: Glancing over this proof, it should be observed that finite dimensionality (or even the concept of dimension) never entered - so the result is true for infinite dimensional spaces. Furthermore, linearity was only used to show that A−1 was linear. Thus the theorem (except for the claim that A−1 is linear) is true for nonlinear operators as well. Needless to say, this construction of A−1 one point at a time is useless as a method for finding A−1 (since even in the simplest case A : R1 → R1 it involves an infinite number of points). This theorem shows that if an operator A is invertible, then there are right and left inverses which are equal AA−1 = A−1 A = I . We can reverse the theorem and prove ˆ Theorem 5.11 . Given the linear operator A : V1 → V2 , if there are linear operators A ˜ (left inverse) such that (right inverse) and A ˆ ˜ AA = AA = I, ˆ ˜ then A is invertible and A−1 = A = A . 198 CHAPTER 5. MATRIX REPRESENTATION ˜ ˜ ˜ Proof: Verify condition i: If AX1 = AX2 , then AAX1 = AAX2 . Since AA = I , this implies X1 = X2 . ˆ ˆ Verify condition ii. If Y is any element in V2 , let X = AY . Then AX = AAY = Y , so that Y is the image of X under the mapping. ˜ ˆ The proof that A−1 = A = A is delightfully easy. Only the associative property of multiplication is used: ˆ ˆ ˆ ˜ ˜ ˜ A = (A−1 A)A = A−1 (AA) = A−1 = (AA)A−1 = A(AA−1 ) = A. Examples: (1) The identity operator I on every linear space is invertible, for it trivially satisfies both criteria. Not only that, but it is its own inverse for II = I . (2) The zero operator is never invertible, for even though X1 = X2 , we always have 0(X1 ) = 0 = 0(X2 ) . 
(3) The 2 × 2 matrix 1 3 −2 −6 A= is not invertible since, from the formula AX = 1 3 −2 −6 x1 x2 = x1 + 3x2 −2x1 − 6x2 , we see that the vector (−3, 1) = 0 is mapped into zero by A (whereas criterion i). states that only 0 can be mapped into 0 by an invertible linear operator). Another way to see that A is not invertible is to observe that ∆ = a11 a22 − a12 a21 = 0 . thus violating the explicit condition for 2 × 2 matrices found earlier. In this last example, we observed that if a linear operator A is invertible, then by property i) the equation AX = 0 has exactly one solution X = 0 . If A : V1 → V2 on a finite dimensional space, and dim V1 = dim V2 the converse is true also. Theorem 5.12 If the linear operator A maps the linear space V1 into V2 and dim V1 = dim V2 < ∞ , then A is invertible ⇐⇒ AX = 0 implies X = 0. Proof: ⇒ A restatement of condition i) in the definition. ⇐ A restatement of lines 7-10 on page 316. Corollary 5.13 . A square matrix A = ((aij )) is invertible if and only if its columns a12 a1n a11 a21 a22 a2n · · · A1 = , A2 = , . . . , An = · · · · · · an1 an2 ann are linearly independent vectors. 5.1. L : RM → RN . 199 Proof: To test for linear independence, we examine xa A1 + x2 A2 + · · · + xn An = 0, and try to prove that x1 = x2 = · · · = xn = 0 . But writing the equation in full, it reads a11 x1 + a12 x2 + · · · a21 x1 + a22 x2 + · · · · · · · · · an1 x1 + an2 x2 + · · · +a1n xn = 0 +a2n xn = 0 · · · +ann xn = 0, or AX = 0. By the theorem, A is invertible if and only if the equation AX = 0 has only the solution X = 0 . Thus A is invertible if and only if the only solution of x1 A1 + x2 A2 + · · · + xn An = 0 is x1 = x2 = · · · = xn = 0 . We close our discussion of invertible operators with Theorem 5.14 . The set of all invertible linear operators which map a space into itself constitutes a (non- commutative) group under multiplication; that is, if L1 , L2 , . . . are invertible operators which map V into itself then they satisfy 0. Closed under multiplication ( L1 L2 is an invertible linear operator which maps V into itself ). (1) L1 (L2 L3 ) = (L1 L2 )L3 - Associative (2) There is an identity I such that IL = LI = L. (3) For every operator L in the set, there is another operator L−1 for which LL−1 = L−1 L = I. Proof: 0) L1 L2 is a linear operator which maps V into itself by part 0. of Theorem 4 (p. 276). It is invertible since its inverse can be written in the explicit form (an important formula) (L1 L2 )−1 = L−1 LL−1 , 2 1 as we will verify: (L1 L2 )(L−1 L−1 ) = L1 (L2 L−1 )L−1 = L1 IL−1 = L1 L−1 = I 2 1 2 1 1 1 connect these?? (L−1 L−1 )(L1 L2 ) = L−1 (L−1 L1 )L2 = L−1 IL2 = L−1 IL2 = L−1 L2 = I. 2 1 2 1 2 2 2 (1) Part 1 of Theorem 4 (p. 276). 200 CHAPTER 5. MATRIX REPRESENTATION (2) Part 1 of Theorem 5 (p. 277) (3) A direct restatement of the fact that our set consists only of invertible operators. Closely associated with a matrix A : Rn → Rm a11 a12 · · · a21 · · · · · · · A= · · am1 am2 · · · is another matrix A∗ , the transpose or adjoint the rows and columns of A , viz. a11 a21 a12 a22 · ∗ A = · · a1n · · · a1n · · · · amn . of A , which is obtained by interchanging ··· ··· ··· am1 · · · · amn . For example, 1 2 if A = 4 −2 , then A∗ = 5 −2 1 4 5 2 −2 −1 If A = ((aij )) , then A∗ = ((aij )) . The adjoint of an m × n matrix Thus, if A : Rn → Rm then A∗ : Rm → Rn , and for any Z ∈ Rm , we a11 z1 + a21 z2 + a11 a21 · · · am1 z1 a12 z1 + · · · a12 a22 · · · · · · · ∗ · = · A Z= · · · · · · · zm a1n z1 + · · · a1n · · · · · · amn . is an n × m matrix. 
have · · · + am1 zm · · · + am2 zm · , · · · · · + amn zm so the j th component (A∗ Z )j of the vector A∗ Z ∈ Rn is m (A∗ Z )j = aij zi = a1j z1 + a2j z2 + · · · + amj zm . i=1 Beware: The classical literature on matrices uses the term “adjoint of a matrix” for an entirely different object. Our nomenclature is now standard in the theory of linear operators. A real square matrix A is called symmetric or self-adjoint if A = A∗ . For example, 7 2 −3 5 = A∗ . A = 2 −1 −3 5 4 For a symmetric matrix A , we have aij = aji . 5.1. L : RM → RN . 201 The significance of the adjoint of a matrix (as well as its relation to the more general conception of the adjoint of an arbitrary operator) arises in the following way. If A : En → Em , then for any X in En the vector Y = AX is a vector in Em . We can form the scalar product of this vector Y = AX with any other vector Z in Em (because Y and Z are both in Em Z, Y = Z, AX . Since A∗ : Em → En , and Z ∈ Em , then A∗ Z makes sense, and is a vector in En , so A∗ Z, X is a real number for any X ∈ En . Claim: Z, AX = A∗ Z, X . This is easy to verify. Let A = ((aij )) . Then n m (AX )i = and (A∗ Z )j = aij xj j =1 aij zi , i=1 so that m Z, AX = m zi (AX )i = i=1 m n zi ( i=1 aij xj ) j =1 n = zi aij xj . i=1 j =1 In the same way, n A∗ Z, X = n (A∗ Z )j xj = j =1 mn = m ( aij zi )xj j =1 i=1 zi aij xj . i=1 j =1 Comparison reveals we have proved Theorem 5.15 . If A : En → Em , then for any X ∈ En and any Z ∈ Em , Z, AX = A ∗ Z, X , where A∗ is the adjoint of A . Remark: From a more abstract point of view, the operator A∗ is usually defined as the operator which has the above property. If this definition is adopted, one must use it to prove the adjoint A∗ of a matrix A is found by merely interchanging the rows and columns (try to do it!). It is remarkably easy to obtain some properties of the adjoint by using Theorem 14. Our attention will be restricted to square matrices (although the results are still true with but minor modifications for a rectangular matrix). 202 CHAPTER 5. MATRIX REPRESENTATION Theorem 5.16 . Let A and B be n × n matrices (so the products AB , BA , B ∗ A∗ , A + B etc. are all defined). Then 0. I ∗ = I (because I is symmetric) 1. (A∗ )∗ = A 2. (AB )∗ = B ∗ A∗ 3. (A + B )∗ = A∗ + B ∗ . 4. (cA)∗ = cA∗ , c is a real scalar. 5. A is invertible if and only if A∗ is invertible, and (A∗ )−1 = (A−1 )∗ . 6. A is invertible if and only if the rows of A are linearly independent. Proof: We could use subscripts and the aij stuff - but it is clearer to use the result of Theorem 14. In order to do so, an important preliminary result is needed. Theorem 5.17 . If C : En → Em , then the equation C x, Y = 0 ⇐⇒ for all X in En and Y in Em C is the zero operator, C = 0 . Thus if C1 and C2 map En into itself, the equation C1 X, Y = C2 X, Y for all X, Y ∈ En ⇐⇒ C1 = C2 . Proof: ⇒ By contradiction, if C = 0 there is some X0 such that 0 = CX0 ∈ En . Now just pick Y0 = CX0 . Then 0 = C X0 , Y0 = C X0 , CX0 = C X0 2 >0 because by assumption CX0 = 0 . A glance at this line reveals the desired contradiction. ⇐ Obvious. The last assertion of the theorem follows by subtraction, 0 = C1 X, Y − C2 X, Y = C1 X − C2 X, Y = (C1 − C2 )X, Y and letting C = C1 − C2 . Now we return to the Proof of Theorem 15: The vectors X, Z will be in En . (0) Particularly clear because I is symmetric. You should try constructing another proof patterned on those below. (1) Two successive interchanges of the rows and columns of a matrix leave it unchanged. 
Again, try to construct another proof patterned on those below. (2) (AB )∗ Z, X = Z, ABX = Z, A(BX ) = A∗ Z, BX = B ∗ (A∗ Z ), X = (B ∗ A∗ )Z, X for all X, Z in En . Application of Theorem 16 yields the result. 5.1. L : RM → RN . (3) 203 (A + B )∗ Z, X = Z, (A + B )X = Z, AX + BX = Z, AX + Z, BX, = A∗ Z, X + B ∗ Z, X = A∗ Z + B ∗ Z, X = (A∗ + B ∗ )Z, X . And apply Theorem 16. (4) (cA)∗ Z, X = Z, cAX = c Z, AX = c A∗ Z, X = (cA∗ )Z, X . Apply Theorem 16. (5) If A is invertible, then AA−1 = A−1 A = I . An application of parts 0 and 2 shows (A−1 )∗ A∗ = (AA−1 )∗ = I ∗ = I. Similarly, A∗ (A−1 )∗ = I . Thus A∗ has a left and right inverse, so it is invertible by Theorem 11. The above formulas reveal (A∗ )−1 = (A−1 )∗ . In the other direction, assume A∗ is invertible. Since A∗∗ = A (part 1) the matrix A is the adjoint of A∗ . But we just saw that if a matrix is invertible then its adjoint is too. Thus the invertibility of A∗ implies that of A . (6) By the Corollary to Theorem 12, A∗ is invertible if and only if its columns are linearly independent. Since the columns of A∗ are the rows of A , we find that A∗ is invertible if and only if the rows of A are linearly independent. Coupled with Part 5, the proof is completed. In our later work we shall need an inequality. Why not insert it here for future reference. Theorem 5.18 . If A = ((aij )) is an m × n matrix, so A : En → Em , then for any X in En and Y in Em AX ≤ k X and | Y, AX | ≤ k X where m Y, n k2 = a2 . ij i=1 j =1 Proof: By definition m AX 2 m (AX )2 i = ( = i=1 n aij xj )2 , i=1 j =1 where (AX )i is the i th component of the vector AX . The Schwarz inequality shows n n 2 aij xj ) ≤ ( j =1 n a2 ij j =1 n x2 j j =1 =X 2 a2 . ij j =1 204 CHAPTER 5. MATRIX REPRESENTATION Thus, m AX 2 ≤X 2 n ( a2 ) = k 2 X ij 2 , i=1 j =1 which proves the first part. The second part follows from this and one more application of Schwarz: | Y, AX | ≤ Y AX ≤ k X Y . After all of this detailed discussion of matrices as an example of a linear operator L mapping one finite dimensional space into another, our next theorem will show why matrices are so ubiquitous. You see, we shall prove that every such linear operator L : V1 → V2 can be represented as a matrix after bases for V1 and V2 have been selected. Theorem 5.19 . (Representation Theorem) Let L be a linear operator which maps one finite dimensional space into another L : V1 → V2 . Let { e1 , e2 , . . . , en } be a basis for V1 , and { θ1 , θ2 , . . . θm } be a basis for V2 . Then in terms of these bases L may be represented by the matrix θ Le whose j th column is the vector (Lej )θ , that is, the vector Lej (which is a vector in V2 ) written in terms of the θ basis for V2 . Pictorially we have θ Le = ((Le1 )θ · · · (Len )θ ) . Proof: Finding the representation of L in terms of given bases for V1 and V2 means: given a vector X in V1 which is represented in the e basis for V1 (write it as Xe ) to find a matrix θ Le such that the image vector θ Le Xe is the image (LX )θ of X written in the θ basis for V2 . We have used the cumbersome notation θ Le to make explicit the fact that it maps vectors written in the e basis for V1 into vectors written in the θ basis V2 . To avoid even further notation, we shall carry out the details only for the particular case where the domain V1 is two dimensional with basis { e1 , e2 } and V2 is three dimensional with basis { θ1 , θ2 , θ3 } . The general case is proved in the same way. 
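Before carrying out those details, it may help to watch the recipe of Theorem 19 at work numerically in the simplest situation, where both spaces are coordinate spaces with their standard bases playing the roles of { e1, e2 } and { theta1, theta2, theta3 }. The particular operator L in the following Python sketch is purely illustrative.

import numpy as np

# An illustrative linear map from a 2-dimensional space to a 3-dimensional one.
def L(X):
    x1, x2 = X
    return np.array([x1 + 2*x2, -x1, 3*x2])

# The recipe of Theorem 19: the j-th column of the representing matrix is L(e_j)
# written in the theta basis (here, just the vector L(e_j) itself).
M = np.column_stack([L(e) for e in np.eye(2)])

X = np.array([4.0, -1.0])
assert np.allclose(M @ X, L(X))      # the matrix reproduces the operator
print(M)
# [[ 1.  2.]
#  [-1.  0.]
#  [ 0.  3.]]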
Since the vectors Le1 and Le2 are in V2 , they can be written in the θ basis, say, Le1 = a1 θ1 + b1 θ2 + c1 θ3 , Le2 = a2 θ1 + b2 θ2 + c2 θ3 , so a1 (Le1 )θ = b1 c1 a2 and (Le2 )θ = b2 . c2 Given X in V1 , it can be written in the e basis for V1 , X = x1 e1 + x2 e2 , so Xe = x1 x2 . Then LX = L(x1 e1 + x2 e2 ) = x1 Le1 + x2 Le2 = x1 (a1 θ1 + b1 θ2 + c1 θ3 ) + x2 (a2 θ1 + b2 θ2 + c2 θ3 ) = (a1 x1 + a2 x2 )θ1 + (b1 x1 + b2 x2 )θ2 + (c1 x1 + c2 x2 )θ3 . 5.1. L : RM → RN . 205 If we write LX as a column vector in the θ basis it is a1 x1 + a2 x2 (LX )θ = b1 x1 + b2 x2 c1 x2 + c2 x3 which is recognized as a product a1 a2 θ L e = b1 b2 c1 c2 a1 a2 = b1 b2 X e c1 c2 x1 x2 therefore, the matrix we want is a1 a2 b1 b2 ((Le1 ) (Le2 ) ) , θ Le = θ θ c1 c2 a matrix whose j th column is the vector Lej written in the θ basis for V2 . x Example: Consider the integral operator L : = 0 as a map of the two dimensional space P1 into the three dimensional space P2 . Any bases for P1 and P2 will do, however we must simply fix our attention to specific bases. Say basis for P1 := { e1 (x) = 1, e2 (x) = x } basis for P2 := { θ1 (x) = 1+x , θ2 (x) = 1−x , θ3 (x) = x2 }. 2 2 Then x 1 dt = x = θ1 − θ2 Le1 = 0 and x Le2 = t dt = 0 Therefore 1 (Le1 )θ = −1 , 0 x2 1 = θ3 . 2 2 0 and (Le2 )θ = 0 , 1 2 so 1 Le = ((Le1 )θ (Le2 )θ ) = −1 θ 0 0 0 1 2 is the matrix representing L in terms of the given e basis for P1 and θ basis for P2 . To make you believe this, let us evaluate x LP = P 0 for some polynomial p ∈ P1 by using the matrix. For example, p(x) = 3 − x = 3e1 − e2 , 3 so in the e basis for P1 , P3 = . Its image under L in terms of the θ basis for −1 P2 is then 10 3 3 (Lp )θ =θ L(pe ) = −1 0 = −3 ; e −1 1 1 02 −2 206 CHAPTER 5. MATRIX REPRESENTATION that is, 1+x 1−x 1 1 1 ) − 3( ) − (x2 ) = 3x − x2 Lp = 3θ1 − 3θ2 − θ3 = 3( 2 2 2 2 2 which, of course, agrees with x x p(t) dt = 0 0 1 (3 − t) dt = 3x − x2 . 2 WARNING: If we had used a different basis for either P1 or P2 , the resulting matrix representing L would be different. For example, if the same basis were used for P1 but a different basis for P2 , ˜ ˜ θ basis for P2 := { θ1 (x) = 1, then ˜ Le1 = x = θ2 so and 0 (Le1 )θ = 1 , ˜ 0 ˜ θ2 (x) = x, ˜ θ3 (x) = x2 }, 1˜ x2 = θ3 , 2 2 0 (Le2 )θ = 0 . ˜ Le2 = 1 2 ˜ ˜ Therefore the matrix θL e which represents L in terms of the e basis for P1 and the θ basis for P2 is 00 1 0 . ˜ θ Le = 1 02 ˜ Again, if p(x) = 3 − x = 3e1 − e2 , then in the θ basis 00 3 (Lp )θ =θ Le Pe = 1 0 ˜ ˜ −1 01 2 = 0 3 ; 1 −2 that is, 1˜ 1 ˜ ˜ Lp = 0θ1 + 3θ2 − θ3 = 3x − x2 , 2 2 to no one’s surprise. Observe that the matrices θ Le and θ L3 both represent L —but with respect to dif˜ ferent basis. The second matrix θ Le is somewhat simpler that the first since it has more ˜ zeroes. It is often useful to pick bases in order that the representing matrix be as simple as possible. We shall not discuss that issue right now. There is a simple class of operators (transformations) which are not linear, but enjoy most of the properties which linear ones do. They are affine operators, or affine transformations. To define them, it is best to first define the translation operator. Definition: If V is any linear space and Y0 a particular element of V , then the operator T : V → V defined by T Y = Y + Y0 , Y ∈ V, is the translation operator. It translates a vector Y into the vector Y + Y0 . 5.1. L : RM → RN . 207 Definition: An affine transformation A is a linear transformation L followed by a translation. if L : V1 → V2 and Y0 ∈ V2 , it has the form AX := LX + Y0 . 
X ∈ V1 , Y0 ∈ V2 . Affine transformations can be added and multiplied by the same definition which governed linear transformations. Thus, if A and B are affine transformations mapping V1 into V2 , (A + B )X := AX + BX. In particular, if AX = L1 X + Y0 and BX = L2 X + Z0 , where Y0 and Z0 are in V2 , then (A + B )X = AX + BX = L1 X + Y0 + L2 X + Z0 = (L1 + L2 )X + (Y0 + Z0 ). Similarly, if A : V1 → V2 and B : V3 → V4 , where V2 ⊂ V3 , then (BA)X := B (AX ) = B (L1 X + Y0 ) = L2 (L1 X + Y0 ) + Z0 = L2 L1 X + L2 Y0 + Z0 , where Y0 ∈ V2 and Z0 ∈ V4 . You will carry out the (straightforward) proofs of the algebraic properties for affine transformations in Exercise 23. The curtain on this longest of sections will be brought down with a brief discussion of the operators which characterize rigid body motions, or Euclidean motions, as they are often called. Definition: The transformation R : En → En is an isometric transformation, (or Euclidean transformation or rigid body transformation) if the distance between two points is preserved (invariant) under the transformation. Thus, R is an isometry if RX − RY = X − Y for all X and Y in En . It is interesting to think for a moment how all these names originated. The phrase rigid body transformation arises from the idea that any motion of a rigid body (such as a translation or rotation) does not alter the distance between any two points in the body. In the framework of Euclidean geometry the whole notion of congruence is defined to be just those properties of a figure which are invariant under isometries. By allowing deformations other than isometries, one obtains geometries, so affine geometry is the study of properties invariant under all affine motions. The study of isometric transformations is mainly contained in that of a special case, orthogonal transformations. These are isometries which leave the origin fixed, R0 = 0 . It should be clear from our next theorem (part 3) that the idea of an orthogonal transformation generalizes the idea of a rotation to higher dimensional space. Reflections (mirror images) are also orthogonal transformations. Theorem 20 states that every isometric transformation is the result of an orthogonal transformation followed by a translation. Example: The matrix R = X= x1 x2 , 1 0 0 −1 defines an orthogonal transformation since if then RX = 1 0 0 −1 x1 x2 = x1 −x2 , 208 CHAPTER 5. MATRIX REPRESENTATION and if Y= y1 y2 , y1 −y2 then RY = . Consequently RX − RY = X − Y = (x1 − y1 )2 + (x2 − y2 )2 , so R , being isometric and linear is an orthogonal transformation. It represents a reflection across the x1 axis. Our definition of an orthogonal transformation does not presume its linearity. This is because the linearity is a consequence of the given properties. A proof is outlined in Ex. 16, p. 390. For convenience, the linearity will be assumed in the following theorem where we collect the standard properties of orthogonal transformations. Theorem 5.20 . Let R : En → En be a linear transformation. The following properties of R are equivalent. (1) R is an orthogonal transformation, that is RX − RY = X − Y (2) RX, RY = X, Y R 0 = 0. RX = X (3) and (so angles are preserved) (4) R∗ R = I (5) R is invertible and R−1 = R∗ . (Only in this part do we use the finite dimensionality of En ). Proof: We shall prove the following chain of implications: 1 =⇒ 2 =⇒ 3 =⇒ 4 =⇒ 5 =⇒ 4 =⇒ 1 1 =⇒ 2 . Trivial, for RX = RX − R0 = X − 0 = X . 2 =⇒ 3 . By linearity and part 2) applied to the vector X + Y , we have RX + RY = R(X + Y ) = X + Y . 
Now square both sides and express the norm as a scalar product: RX + RY, RX + RY = X + Y, X + Y . Upon expanding both sides, we find that RX 2 + 2 RX, RY + RY 2 =X 2 + 2 X, Y + Y 2 . Since by part 2) RX = X and RY = Y , we are done. 3 =⇒ 4 . By part 3) and Theorem 14 (p. 369), R∗ RX, Y = RX, RY = X, Y . Thus, an application of the second part of Theorem 16 (p. 371) gives us R∗ R = I . 4 =⇒ 5 . Since X = R∗ RX , we see that RX = 0 implies X = 0 , consequently, R is invertible (Theorem 12, p. 364). Moreover R∗ R = I so R∗ = R−1 . 5 =⇒ 4 . Clear, since R∗ = R−1 . 5.1. L : RM → RN . 209 5 =⇒ 1 . Because R is linear, R0 = 0 . It remains to show that RX − RY = X − Y , an easy computation. RX − RY 2 = R(X − Y ) 2 = R(X − Y ), R(X − Y ) , so using 4) = R∗ R(X − Y ), X − Y = (X − Y ), (X − Y ) = X − Y 2 . Done. Earlier in this section (p. 357-8) we considered a matrix R which represented the operator which rotates a vector in E2 through an angle α . This matrix is the simplest (non-trivial) example of a rigid body transformation which leaves the origin fixed, that is, an orthogonal transformation. R= cos α − sin α sin α cos α . To prove that R is an orthogonal matrix, by Theorem 19 part 3, it is sufficient to verify RX, RY = X, Y for all X and Y in E2 . A calculation is in order here. RX = cos α − sin α sin α cos α x1 x2 = x1 cos α − x2 sin α x1 sin α + x2 cos α . Similarly for RY , just replace x1 and x2 by y1 and y2 respectively. Then RX, RY = (x1 cos α − x2 sin α)(y1 cos α − y2 sin α) + missing? (x1 sin α + x2 cos α)(y1 sin α + y2 cos α) = x1 y1 cos2 α − (x1 y2 + x2 y1 ) sin α cos α + x2 y2 sin2 α +x1 y1 sin2 α + (x1 y2 + x2 y1 ) sin α cos α + x2 y2 cos2 α = x1 y1 + x2 y2 = X, Y . Done. We previously found an expression for R−1 (p. 358) by geometric reasoning. It is reassuring to notice R−1 = R∗ , just as part 5 of our theorem states. The most general rotation in E3 may be decomposed into a product of these simple two dimensional rotations. For a brief discussion - complete with pictures - open Goldstein, Classical Mechanics to pp. 107-9. Now to the last theorem of this section. Theorem 5.21 . If R : En → En is a rigid body transformation, then for every X ∈ En RX = R0 X + X0 , where R0 is an orthogonal transformation (rotation) and X0 is a fixed vector in En . Thus, every rigid body motion is composed of a rotation (by R0 and a translation (through X0 ). Proof: Let R0 X = RX − R0 . Since R0 0 = R 0 − R 0 = 0, the operator R0 has the property R0 0 = 0 . Furthermore, for any X and Y in En , R0 X − R0 Y = RX − R0 − RY + R0 = RX − RY = X − Y . Therefore R0 satisfies the definition of an orthogonal transformation. The proof is completed by defining X0 to be the image of the origin under R, X0 = R0 . Then R0 X = RX − X0 , or RX = R0 X + X0 . 210 5.2 CHAPTER 5. MATRIX REPRESENTATION Supplement on Quadratic Forms Quadratic polynomials of the form Q(X ) = αx2 + βx1 x2 + γx2 , 1 2 X = (x1 , x2 ) and the generalization to n variables X = (x1 , x2 , . . . , xn ) n n Q(X ) = αij xi xj i j =1 often arise in mathematics. They are called quadratic forms and can always be represented in the form X, SX where S is a self adjoint matrix. For example, the first quadratic form can be written as Q(X ) = (x1 , x2 ) α β 2 β 2 γ x1 x2 = X, SX , where S is the matrix indicated. The procedure for finding the elements ((aij )) of the matrix S is simple. First take care of the diagonal terms by letting aii be the coefficient of x2 in Q(X ) . 
Realizing that 1 xi xj = xi xj , collect the terms αij xi xj and αji xj xi in Q(X ) , getting (αij + αji )xi xj . Then let 1 aij = aji = (αij + αji ) i = j. 2 Example: Q(X ) = x2 − 2x1 x3 − x2 + 6x1 x2 + 4x3 x1 . Rewrite this as Q(X ) = x2 − x2 + 1 2 1 2 6x1 x2 + 2x1 x3 . Then 1 31 S = 3 −1 0 1 00 and Q(X ) = X, SX . as you can easily verify. Definition: A quadratic form Q(X ) is positive semi definite if Q(X ) ≥ 0 for all X and positive definite if Q(X ) > 0, x = 0. Q(X ) is negative semi definite or negative definite if, respectively, Q(X ) ≤ 0 , or Q(X ) < 0, X = 0 . If S is the self adjoint matrix associated with the quadratic form Q(X ) , then S is positive semi definite, positive definite, etc., if Q(X ) has the respective property. We may think of Q(X ) as representing a quadratic surface. Thus, if S is diagonal, for example 200 S = 0 1 0 , 003 with positive diagonal elements, then the equation Q(X ) = 1 , where Q(X ) = X, SX = 2x2 + x2 +3x2 , represents an ellipsoid. This matrix S is positive definite since by inspection 1 2 3 Q(X ) > 0, X = 0 . It is easy to see if a diagonal matrix S is positive semi definite, negative semi definite, positive definite, or negative definite. 5.2. SUPPLEMENT ON QUADRATIC FORMS 211 Example: The diagonal matrix γ1 . . . 0 S = ... 0 . . . γn is (a) positive semi definite if and only if γ1 , . . . , γn are all non-negative, (b) positive definite if and only if γ1 , . . . , γn are all positive (not zero), and the obvious statements for negative semi definite and negative definite. The problem of determining if a non diagonal symmetric matrix is positive etc. is more subtle. We shall find necessary and sufficient conditions for the two variable case, but only necessary conditions for the general case. Consider the 2 × 2 self-adjoint matrix S= ab bc and the associated quadratic form Q(X ) = ax2 + 2bxy + cy 2 . There are several cases. (i) If a = 0 , then Q(X ) = (2bx + cy )y. If b = 0 , by choosing x and y appropriately, we can make Q(X ) assume both positive and negative values. Thus, for a = 0, b = 0, Q can be neither a positive nor a negative semi-definite form. On the other hand, if a = 0 , and b = 0 , then Q is positive (negative) semi definite if and only if c ≥ 0 (c ≤ 0) . If a = 0, Q can never be positive definite or negative definite since if X = (x, 0) where x = 0 , then Q(X ) = 0 but X = 0 . (ii) If a = 0 , then Q can be written as Q(X ) = 1 [(ax + by )2 = (ac − b2 )y 2 ]. a We can immediately read off the conditions from this. Q is positive semi definite (definite) if and only if a > 0 and ac − b2 ≥ 0 (ac − b2 > 0) , and negative semi definite (definite) if and only if a < 0 and ac − b2 ≥ 0 (ac − b2 > 0) . In summary, we have proved Theorem 5.22 A. Let Q(X ) = ax2 + 2bxy + cy 2 , and S be the associated symmetric matrix. Then (a) Q is positive semi definite if and only if a ≥ 0 and ac − b2 ≥ 0 (this implies c ≥ 0 too). (b) Q is positive definite if and only if a > 0 and ac − b2 > 0 (this implies c > 0 too). 212 CHAPTER 5. MATRIX REPRESENTATION The general case of a quadratic form in n variables is much more difficult to treat. There are known necessary and sufficient conditions, but they are not too useful in practice, especially for a large number of variables. We shall only prove one necessary condition for a quadratic form to be positive semi-definite (or positive definite), a condition which is both transparent to verify in practice and even easier to prove. THEOREM B. 
If the self adjoint matrix S = ((aij )) is positive definite, then the diagonal elements must all be positive, a11 , a22 , . . . , ann > 0 . Similarly, if S is negative definite then the diagonal elements must all be negative. n Proof: Q(X ) = X, SX = aij xi xj . Since Q is positive definite, Q(X ) > 0 for all i,j =1 X = 0 . In particular, Q(ek ) > 0, k = 1, . . . , n , where ek is the k th coordinate vector ek = (0, 0, . . . , 0, 1, 0, . . . , 0) . But Q(ek ) = akk . Thus akk > 0, k = 1, . . . , n , just what we wanted to prove. Examples: 1. The quadratic form Q(X ) = 3x2 + 743xy − y 2 + 4z 2 + xz is positive definite or semi definite since the coefficient of y 2 is negative. It is not negative definite or semi definite since the coefficient of x2 is positive. 2. The quadratic form Q(X ) = x2 − 5xy + y 2 + 2z 2 satisfies the necessary conditions of Theorem B, but the conditions of Theorem D were not sufficient conditions for positive definiteness. Thus, we cannot conclude this Q(X ) is positive definite. In fact, this Q(X ) is not positive definite or semi definite since, for example, if X = (1, 1, 1) , then Q(X ) = −1 . It is clearly not negative definite or semi definite. Exercises (1) Find the self-adjoint matrix S associated with the following quadratic forms: (a) Q(X ) = x2 − 2x1 x2 + 4x2 . 1 2 (b) Q(X ) = −x2 + x1 x2 − x1 x3 + x2 − 3x2 x1 − 2x3 x2 + 3x2 1 2 3 (c) Q(X ) = 2x1 x2 − 3x3 x2 + 4x2 x4 + x3 x4 + 7x2 2 [Answers: (a) 1 −1 −1 4 −1 −1 −1 1 , (b) 1 − 2 −1 0 1 0 1 7 −3 2 −1 , (c) 0 −3 0 2 3 1 0 2 2 −1 2 0 2 1 2 0 (2) Use Theorem A or B to determine which of the following quadratic forms in two variables are positive or negative definite, or semi definite, or none of these. (a) Q(X ) = x2 − 2x1 x2 + 4x2 1 2 (b) Q(X ) = −x2 + x1 x2 − 4x2 1 2 (c) Q(X ) = x2 − 6x1 x2 − 4x2 1 2 (d) Q(X ) = x2 − 6x1 x2 + 4x2 1 2 (e) Q(X ) = x2 − 6x1 x2 + 4x2 x3 − x2 + 4x2 1 2 3 (3) If the self-adjoint matrix S is positive definite, prove it is invertible. Give an example of an invertible self-adjoint matrix which is neither positive nor negative definite. 5.2. SUPPLEMENT ON QUADRATIC FORMS 213 (4) Find all real values for λ for which the quadratic form Q(X ) = 2x2 + y 2 + 3z 2 + 2λxy + 2xz √ is positive definite. [Hint: Q(X ) = ( 5 − λ2 )x2 + (λx + y )2 + ( 3z + 3 1 √ x)2 3 (5) Let the integer n be ≥ 3 . If the quadratic form n Q(X ) = aij xi xj , aij = aji i,j =1 is the product of two linear forms n Q(X ) = ( n λi xi )( i=1 µj xj ), j =1 show that det A = det((aij )) = 0 . (6) If the self-adjoint matrix S is positive definite or semi-definite, prove the generalized Schwarz inequality: | Y, SX |2 ≤ Y , SY X, SX for all X and Y . [Hint: Observe [X, Y ] := Y , SX scalar product]. satisfies all the axioms for a (7) If the self-adjoint matrix S is positive definite (so S −1 exists by Exercise 3), prove that S −1 is also positive definite. [Hint: Use the generalized Schwarz inequality, Exercise 6, with Y = S −1 X and the inequality X, SX ≤ k 2 X 2 of Theorem 17, p. 373]. (8) Proof or counterexample: (a) If a matrix A = ((aij )) is positive definite, then all of its elements are positive, aij > 0 for all i, j . (b) If a matrix A is such that all of its elements are positive, aij > 0 , then the matrix is positive definite. Exercises (1) Write out the matrices associated with the operators A and B in Exercise 4a, p. 281, and carry out the computation there using matrices. (2) Write out the matrices RA , RB , and RC for the rotation operators A, B , and C in Exercise 8 p. 
281 and complete that problem using matrices. [Ans. RA = 10 0 0 0 −1 in terms of the basis e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) ]. 01 0 (3) Prove Exercise 2b (p. 281) as a corollary of Theorem 18. 214 CHAPTER 5. MATRIX REPRESENTATION (4) If 00 01 A= compute AB, , 01 00 B= BA , and B 2 . (5) Compute A−1 if 12 34 (a). A = −2 [ans. A−1 = 40 5 (b). A = 0 1 −6 30 4 1100 0 1 1 0 (c) A = 0 0 1 1 0001 3 2 1 1 −2 4 0 −5 [ans. A−1 = −18 1 24 ] −3 0 4 1 −1 1 −1 0 1 −1 1 ] [ans. A−1 = 0 0 1 −1 0 0 0 1 (6) If A is the matrix of 5a) above, from the definition compute directly, (a) −6A−1 + 1 A∗ 2 [ans. −9 2 −8 5 25 2 . (b) (A∗ )−1 and (A−1 )∗ . Compare them. State and prove a general theorem. (c) AA∗ and A∗ A . (7) If A= 12 34 , B= 1 1 2 −1 , compute (AB )∗ , A∗ B ∗ , and B ∗ A∗ . Compare (AB )∗ and B ∗ A∗ and explain the outcome. (8) Prove that I ∗ = I and A∗∗ = A using only Theorems 14 and 16 (cf. Parts 2-4 of Theorem 15). (9) If A : Rn → Rm and B : Rm → Rn where n > m , prove that BA (an n × n matrix) is singular. Is AB necessarily singular? (Proof or counterexample). (10) Given two square matrices A and B such that AB = 0 , which of the following statements are always true. Proofs or counterexamples are called for. [I suggest you confine your search for counterexamples to the case of 2 × 2 matrices.] (a). A = 0 . (b). B = 0. (c). A and/or B are (is) singular (not invertible). (d). A is singular. (e). B −1 exists. (f). If A−1 exists, then B = 0 . (g). If B is nonsingular, then A = C . 5.2. SUPPLEMENT ON QUADRATIC FORMS 215 (h). BA = 0 . (i). If A = 0 and B = 0 , then neither A nor B are invertible. (11) (a). If A is a square matrix which satisfies A2 − 2A − I = 0, find A−1 in terms of A . [Hint: Find a matrix B such that AB = BA = I .] (b). If A is a square matrix which satisfies An + an−1 An−1 + an−2 An−2 + . . . + a1 A + a0 I = 0, a0 = 0, where a0 , a1 , . . . , an−1 are scalars, prove that A is invertible and find A−1 in terms of A . (12) (a). If L : En → Em , prove N(L∗ ) = R(L)⊥ [Hint: Show (in two lines) that X ∈ N(L∗ ) ⇐⇒ X, LZ = 0 for all Z ∈ En —from which the result is immediate.] (b). Use part (a) to show that dim R(L) = dim R(L∗ ) . (c). Do exercise 19, page 441. (13) (a). If T : En → En is a translation, T X = X + X0 , prove T is invertible by explicitly finding T −1 (which is a trivial task). [Answer: T −1 X = X − X0 .] (b). If R : En → En is a rigid body transformation, show that R is always invertible by exhibiting R−1 . [Answer: If RX = R0 X + X0 , then R can be written as ∗ Rx = (T R0 )X. R−1 = R0 T −1 .] (14) If A is any n × n matrix, find matrices A1 and A2 such that A is decomposed into the two parts A = A1 + A2 where A1 is symmetric and A2 is anti-symmetric, i.e., A∗ = −A2 . [Hint: Assume 2 there is such a decomposition and use it to find A1 and A2 in terms of A and A∗ . Then verify that these work.] d (15) Consider the operator D = dx on P5 . Prove that D is not invertible (return to the definition p. 360) but exhibit an operator L which is a right inverse, DL = I . (16) This problem proves that R is orthogonal if and only if R is linear and isometric. (a) Prove that if R is linear and isometric, then it is orthogonal. (Trivial!). (b) If R is orthogonal, prove that i) RX = X ii) RX, RY = X, Y (Hint: Use RX − RY 2 = X − Y 2 ) iii) R(aX ) = aRX (Hint: Prove R(aX ) − aRX 2 = 0 ) iv) R(X + Y ) = RX + RY (Hint: Prove “something” 2 = 0 ) v) R is linear and isometric [Warning: If you assume linearity in b), you’ll vitiate the whole problem]. 216 CHAPTER 5. 
MATRIX REPRESENTATION (17) (a). Let A be a square matrix such that A5 = 0 . Verify that (I + A)−1 = I − A + A2 − A3 + A4 . (b). If A7 = 0 , then (I − A)−1 = ? (18) Consider the matrices (a). (c). α 1 −2 1 2 δ 0β γ0 , (b). , (d). 1 √ 2 γ β , 1 √ 2 1β 02 . For what value(s) of α, β, γ and δ do these matrices represent orthogonal transformations? (19) If A = ((aij )) is a square (n × n) matrix, the trace of A is defined as the sum of the elements on the main diagonal, tr A : = a11 + a22 + . . . + ann . Prove (a). tr(αA) = αA , where α is a scalar. (b). tr(A + B ) = tr A + tr B , where B is also an n × n matrix. (c). tr(AB ) = tr(BA) . (d). tr(I ) =? (20) Assume that A : En → En is anti-symmetric, A∗ = −A . (a). Prove A − I is invertible. [By Theorem 12, it is sufficient to show (A − I )X = 0 ⇒ X = 0 . Use the property of A to prove it AX = X , then X, AX = X 2, A∗ X, X = − X 2 , and X, AX = A∗ X, X .] (b). If U = (A + I )(A − I )−1 , then U is an orthogonal transformation. (21) Let An be the orthogonal matrix which rotates vectors in E2 through an angle of 2π/n . (a). Find a matrix representing An (use the standard basis for E2 ). (b). Let B denote the orthogonal matrix of reflection across the x1 axis (p. 382). Show that BAb = A−1 B . [The group of matrices generated by An and B and all n possible products is the dihedral group of order n ]. (22) Prove that the set of all orthogonal transformations of En into En forms a (noncommutative) group under multiplication. (23) An affine transformation AX = LX + X0 of a linear space into itself is called non-singular if the linear transformation L is non-singular. Prove that the set of all such non-singular affine transformations form a (non-commutative) group under multiplication. (24) Let A = ((aij )) be a square matrix. Find all such matrices with the property that tr(AA∗ ) = 0 (see Ex. 19 for the definition of the trace). 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 217 (25) Consider the linear space S = { f (x) : f (x) = a + b cos x + c sin x }. with the scalar product 1 f , g = aa + (b˜ + cc), ˜ b ˜ 2 where g (x) = a + ˜ cos x + c sin x . Define the linear transformation R : S → S by the ˜b ˜ rule (Rf )(x) = f (x + α), α real. (a) Show that R is an orthogonal transformation by proving that Rf, Rg = f , g for all f, g in S . (b) Choose a basis for S and exhibit a matrix e Re which represents R with respect to that basis for both the domain and target. (26) Let A : En → Em . Prove: A is surjective (= onto) if and only if A∗ is injective (= one to one). (27) Define A : P3 → R3 by A[p(x)] = (p(0), p(1), p(−1)) where p ∈ P3 . Find the matrix for this transformation with respect to the basis e1 = 1 e2 = (x + 1)2 , e3 = (x − 1)2 , e4 = x3 for P3 ; and the standard basis for R3 . (b). Find the matrix representing A using the same basis for R3 but using the basis e1 = 1, e2 = x, e3 = x2 and e4 = x3 for P3 . ˆ ˆ ˆ ˆ (28) If A and B both map the linear space V into itself, and if B is the only right inverse of A, AB = I , prove A is invertible. [Hint: Consider BA + B + I ]. (29) Let A : En → Em be represented by the matrix ((aij )) , and B : Em → En by ((bij )) . If Y , AX = B Y, X for all X ∈ En and all Y ∈ Em , prove B = A∗ . This proves the statement made in the remark following Theorem 14. (30) Let L : R4 → R4 be defined by LX = (x1 , 0, x3 , 0) , where X = (x1 , x2 , x3 , x4 ) . Find a matrix representing L in terms of some basis. You may use the same basis for both the domain and the target. 
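Many of the identities appearing in these exercises can be sanity-checked numerically before (or after) one proves them. The following Python sketch, with randomly chosen matrices, tests the adjoint rule (AB)* = B*A* (Exercise 7), the trace identity tr(AB) = tr(BA) (Exercise 19c), and the orthogonality of the rotation through 2 pi / n (cf. Exercises 18 and 21). Such a check is no substitute for a proof, but it catches mistakes cheaply.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# The adjoint of a real matrix is its transpose, and (AB)* = B*A*  (cf. Theorem 16, Exercise 7).
assert np.allclose((A @ B).T, B.T @ A.T)

# tr(AB) = tr(BA)  (Exercise 19c).
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# The rotation through 2 pi / n is orthogonal: R*R = I and norms are preserved (Exercises 18, 21).
n = 5
c, s = np.cos(2*np.pi/n), np.sin(2*np.pi/n)
R = np.array([[c, -s], [s, c]])
X = rng.standard_normal(2)
assert np.allclose(R.T @ R, np.eye(2))
assert np.isclose(np.linalg.norm(R @ X), np.linalg.norm(X))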
5.3 Volume, Determinants, and Linear Algebraic Equations. Often we have stated that thus and so is true if and only if a certain set of vectors are linearly independent. But we still have no adequate criteria for determining if a set of vectors is linearly independent. What would be an ideal criterion? One superb criteria would be as follows. Find a function which assigns to a set of n vectors X1 , X2 , . . . , Xn in Rn a real number, with the property that this number is zero if and only if the vectors are linearly dependent. 218 CHAPTER 5. MATRIX REPRESENTATION There is a geometric way of solving this problem. For clarity we shall work in two dimensions, E2 . If X1 and X2 are any two vectors in E2 , then intuition tells us X1 and X2 are linearly dependent if and only if the area of the parallelogram (see fig.) is zero. Thus, once we define the analogue of volume for n dimensional parallelepipeds in Rn , the appropriate criterion appears to be that a set of n vectors X1 , . . . , Xn in Em is linearly dependent if and only if the volume of the parallelepiped they span is zero. The major hurdle is constructing a volume function which behaves in the manner dictated by two and three dimensional intuition. Our program is to state a few (four to be exact) desirable properties of a volume function V for parallelepipeds, then construct a simpler related function - the determinant D , and observe that V = |D| (absolute value of D ) is a volume function. This determinant function will prove useful in the theory of linear algebraic equations. Let X1 and X2 be any two vectors in R2 . We define the parallelogram spanned by X1 and X2 to be the set of points X in R2 which have the form X = t1 X 1 + t2 X 2 , 0 ≤ t 1 ≤ 1, 0 ≤ t 2 ≤ 1 You can check that these points are precisely those in the parallelogram drawn above. The volume function (really area in this case) V (X1 , X2 ) which assigns to each parallelogram its volume should have the properties 1. V (X1 , X2 ) ≥ 0. 2. V (λX1 , X2 ) = |λ| V (X1 , X2 ), λ scalar. 3. V (X1 + X2 , X2 ) = V (X1 , X2 ) = V (X1 , X1 + X2 ) . 4. V (e1 , e2 ) = 1. e1 = (1, 0), e2 = (0, 1). The second property states that if one side is multiplied by λ1 then the volume is multiplied by |λ| (see fig.). The third property is more subtle. It states that the volume of the parallelogram spanned by X1 and X2 is the same as the parallelogram spanned by X1 and X1 + X2 . This is clear from the figure since both parallelograms have the same base and height. The last property merely normalizes the volume. It states that the unit square has volume 1. Our first task is to define a parallelepiped in En . Definition: The n dimensional parallelepiped in En spanned by a linearly independent set of vectors X1 , X2 , . . . , Xn is the set of all points X in Rn of the form X = t1 X 1 + t2 X 2 + · · · + tn X n , 0 ≤ tj ≤ 1. It is a straightforward matter to write the axioms for the volume V (X1 , X2 , . . . , Xn ) for the n dimensional parallelepiped in En . V-1. V (X1 , X2 , . . . , Xn ) ≥ 0. V-2. V (X1 , X2 , . . . , Xn ) is multiplied by |λ| if some Xj is replaced by λXj where λ is real. V-3. V (X1 , X2 , . . . , Xn ) does not change if some Xj is replaced by Xj + Xk , where j =k. V-4. V (e1 , e2 , . . . , en ) = 1 , where e1 = (1, 0, 0, . . . , 0) , etc. These axioms are amazingly simple. It is surprising that the volume function V in uniquely determined by them; that is, there is only one function which satisfies these axioms. 
You might wonder why we did not add the reasonable stipulation that volume remains 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 219 unchanged if the parallelepiped is subjected to a rigid body transformation. The reason is that this axiom would be redundant, for this invariance of volume under rigid body transformation will be one of our theorems. The most simple way to obtain the volume function is to first obtain the determinant function D(X1 , X2 , . . . , Xn ) . We define the determinant function D(X1 , X2 , . . . , Xn ) of n vectors X1 , X2 , . . . , Xn in Rn by the following axioms (selected from those for V ). D-1. D(X1 , X2 , . . . , Xn ) is a real number. D-2. D(X1 , X2 , . . . , Xn ) is multiplied by λ if some Xj is replaced by λXj where λ is real. D-3. D(X1 , X2 , . . . , Xn ) does not change if some Xj is replaced by Xj + Xk , where j =k. D-4. D(e1 , e2 , . . . , en ) = 1 , where e1 = (1, 0, 0, . . . , 0) etc. Remarks: (1) If A = ((aij )) is a (square) n × n matrix, A= a11 a12 · · · a21 a22 · · · · · · an1 an2 · · · a1n · · · · ann we can consider it as being composed of n column vectors A1 , A2 , . . . , An , and define the determinant of the square matrix A in terms of the determinant of these vectors det A = D(A1 , A2 , . . . , An ) = a11 a12 · · · a21 · · · an1 · · · · · · a1n a2n · · · ann . (2) Although we have written a set of axioms for D , it is not at all obvious that such a function exists. Rest assured that we will prove the existence of such a function. (3) Observe: if we define V (X1 , X2 , . . . , Xn ) := |D(X1 , X2 , . . . , Xn )| , Xj ∈ En , then V does satisfy the axioms for volume. Granting existence of D , we derive some algebraic consequences of the axioms. Theorem 5.23 . Let D be a function which satisfies axiom D-1 to D-3 (not necessarily D-4). (1) If Xj is replaced by Xj = λk Xk then D does not change. k =j 220 CHAPTER 5. MATRIX REPRESENTATION (2) If one of the vectors Xj is zero, then D = 0 . (3) If the vectors X1 , X2 , . . . , Xn are linearly dependent then D = 0 . In particular D = 0 if two vectors are equal. (4) D is a linear function of each of its variables, that is D(. . . , λY + µZ, . . .) = λD(. . . , Y, . . .) + µD(. . . , Z, . . .) (so D is a multilinear function). (5) If any two vectors Xi and Xj are interchanged, then D is multiplied by −1 . D(. . . , Xi , . . . , Xj , . . .) = −D(. . . , Xj , . . . , Xi , . . .) Proof: These proofs, like the statements above, are conceptually simple but notationally awkward. Notice that only Axioms 1-3 but not Axiom 4 will be used. We shall need this fact shortly. (1) We prove this only if Xj is replaced by Xj + λXk , j = k and λ = 0 . The general case is a simple repetition of this until the other Xk ’s are used up. It is simplest to work backward. By Axiom 2, D(. . . , Xj + λXk , . . . , Xk . . .) = 1 D(. . . , Xj + λXk , . . . , λXk , . . .) λ so by axiom 3 (since λXk is now a vector in D ) = 1 D(. . . , Xj , . . . , λXk , . . .) λ and axiom 2 again = D(. . . , Xj , . . . , Xk , . . .). (2) Write the vector Xj = 0 as 0Xj where 0 is now a scalar. This scalar may be brought outside D by axiom 2. Since D is a real number, 0 · D = 0 . (3) Let Xj = ak Xk . By part 1, D does not change if Xj is replaced by Xj + k =j λk X k . k =j Choose λk = −ak . This gives a D with one vector zero, Xj − ak Xk = 0 . Thus k =j D is zero by part 2. (4) The trickiest part. Axiom 2 immediately reduced this to the special case λ = µ = 1 . For notational convenience, let Y + Z by in the last slot. 
We have to prove D(X1 , X2 , . . . , Y + Z ) = D(X1 , X2 , . . . , Y ) + D(X1 , X2 , . . . , Z ). If X1 , X2 , . . . , Xn−1 (which appear in all three terms above) are linearly dependent, we are done by part 3. Thus assume they are linearly independent. Since our linear space Rn has dimension n , these n − 1 vectors can be extended to a basis for Rn 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 221 ˜ by adding one more, Xn . Now we can write Y and Z as a linear combination of these basis vectors ˜ Y = a1 X1 + · · · + an−1 Xn−1 + an Xn , ˜ Z = b1 X1 + · · · + bn−1 Xn−1 + bn Xn . Substituting this into D we obtain n−1 ˜ (aj + bj )Xj + (an + bn )Xn ). D(X1 , . . . , Y + Z ) = D(X1 , . . . , . . . , 1 But by part 1, ˜ = D(X1 , . . . , (an + bn )Xn ) and axiom 1 results in ˜ = (an + bn )D(X1 , . . . , Xn ). However, again by part 1, n−1 ˜ aj Xj + an Xn ) D(X1 , . . . , Y ) = D(X1 , . . . , 1 ˜ ˜ = D(X1 , . . . , . . . , an Xn ) = an D(X1 , . . . , Xn ). Similarly ˜ D(X1 , . . . , Z ) = bn D(X1 , . . . , Xn ). Adding these two expressions and comparing them with the above, we obtain the result. (5) To avoid a mess, indicate only the i th and j th vectors. Our task is to prove D(. . . , Xi , . . . , Xj , . . .) = −D(. . . , Xj , . . . , Xi , . . .). This is clever. Watch: By the multilinearity (part 4) D(. . . , Xi + Xj , . . . , Xi + Xj , . . .) = D(. . . , Xi , . . . , Xi , . . . , ) + · · · + D(. . . , Xi , . . . , Xj , . . .) + D(. . . , Xj , . . . , Xi , . . .) + · · · + D(. . . , Xj , . . . , Xj , . . .). However part 2 states that the left side as well as the first and last terms on the right are zero. Thus 0 = D(. . . , Xi , . . . , Xj , . . .) + D(. . . , Xj , . . . , Xi , . . .). Transposition of one of the terms to the other side of the equality sign completes the proof. You should also be able to fashion an easy proof of this part which uses only the axioms directly (and uses none of the other parts of this theorem). Instead of moving on immediately, it is instructive to compute D[X1 , X2 ] where X1 and X2 are vectors in R2 , X1 = (a, b), X2 = (c, d) . Then we are computing D a b , c d , 222 CHAPTER 5. MATRIX REPRESENTATION ac bd which is, equivalently, the determinant of the matrix D a b , c d = aD = aD = aD 1 b a 1 b a 1 b a = (ad − bc) D = (ad − bc) D = (ad − bc) D , c d , c d (axiom 2) 1 −c (algebra) ad−cb a 1 b a 1 b a 1 0 0 1 , − , (Theorem 21 part 1) b a 0 , . b a (axiom 2) 0 1 0 1 = (ad − bc) D [e1 , e2 ] = ad − bc , 0 1 (Theorem 21 part 1) (algebra) (axiom 4). Thus |Area| = |(a + c)(b + d) − 2bc − cd − ab| = |ad − bc| You can indulge in a bit of analytic geometry (or look at my figure) to show that the area of a parallelogram spanned by X1 and X2 is |ad − bc| . From our explicit calculation, the existence and uniqueness of the determinant of two vectors in R2 has been proved. There are several ways to prove the general existence and uniqueness of a determinant function. Our procedure is to first prove there is at most one determinant function (uniqueness). Then we shall define a function inductively, and verify it satisfies the axioms. By uniqueness, it must be the only function. Two interesting and important preliminary propositions are needed. The following lemma shows how to evaluate the determinant if all of the elements above the principal diagonal are zeroes (that is, the determinant of a lower triangular matrix). LEMMA: Let X1 , · · · , Xn be the columns of a lower triangular matrix a11 0 0 ... 
0 a21 a22 0 · · · 0 · · · · · · an1 an2 · · · · · · ann Then D(X1 , · · · , Xn ) = a11 a22 · · · ann D(e1 , · · · , en ) = a11 a22 · · · ann , that is, the determinant of a triangular matrix is the product of the diagonal elements. Proof: If any one of the principal diagonal elements are zero, then the determinant is zero. For example, if ajj = 0 , then the n − j + 1 vectors Xj , · · · , Xn all have their first j components zero, and hence can span at most an n − j dimensional space. Since n − j + 1 > n − j , these vectors must be linearly dependent. Therefore, by Theorem 21, part 3, the determinant is zero, as the theorem asserts. [If you didn’t follow this, look at a 3 × 3 or 4 × 4 lower triangular matrix and think for a moment]. 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 223 If none of the diagonal elements are zero, we can carry out the following simple recipe. The recipe gives a procedure for reducing the problem to evaluating a matrix which is zero everywhere except along the diagonal. First, we get all zeros to the left of a22 in the second row by multiplying the second column, X2 , by −a21 /a22 and adding the resulting vector to X1 . This gives a new first column with i = 2, j = 1 element zero. Moreover, the new matrix has the same determinant as the old one (Theorem 21, part 1). It looks like a11 0 0 0 0 a22 0 · a31 a32 a33 · ˜ · · · 0 . · · · · · · · · an1 an2 · · · ann ˜ Only the first column has changed. Repeat the same process to get all zeros to the left of a33 . Thus, multiply the third column by −a31 /a33 and −a31 /a33 and add the result ˜ to the first and second columns respectively. This gives a new matrix, again with equal determinant, but which looks like a11 0 0 0 ··· 0 0 a22 0 0 0 0 0 a33 0 · a41 a42 a43 a44 ˆ · . ˆ · · 0 · · · · · · an1 an2 an3 ˆ ˆ · · Moving on, we gradually eliminate all of the terms the same diagonal ones. The final result is a11 0 · · · 0 0 a22 · · · 0 · · · · · · 0 0 0 ann ann to the left of the diagonal but keep . It has the same determinant as the original matrix, so D(X1 , . . . , Xn ) = D(a11 e1 , . . . , ann en ) = a11 · · · ann D(e1 , . . . , en ), where Axiom 2 has been used to pull out the constants. Now Axiom 4, D(e1 , . . . , en ) = 1 , can be used to complete the proof. Observe that Axiom 4 is not used until the very last step. Thus, the formula D = (something) D(e1 , . . . , en ) depends only on Axioms 1-3. We shall need this soon. The above theorem shows how easy it is to evaluate the determinant of a lower triangular matrix. It becomes particularly valuable when coupled with the next theorem which 224 CHAPTER 5. MATRIX REPRESENTATION shows how the determinant of an arbitrary matrix can be reduced to that of a lower triangular matrix. The reduction procedure given here is the best practical way of evaluating a determinant. There is a peculiar criss-cross method for evaluating 3 × 3 determinants which is taught in many high schools. Forget it. The method is not very practical and does not generalize to 4 × 4 or larger determinants. Theorem 5.24 . The evaluation of the determinant D(X1 , . . . , Xn ) can be reduced to the evaluation of a lower triangular matrix - and hence has the form D = (something)D(e1 , . . . , en ). The proof gives a way of computing “something” in terms of the original matrix. 
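The recipe of the lemma and of Theorem 5.24 is easy to mechanize. The sketch below is supplementary and not part of the notes: the helper name is invented, NumPy is used only for array bookkeeping, and the 4 x 4 matrix is the one worked out by hand in the example that follows, whose determinant should come out to -10.

```python
import numpy as np

def det_by_column_reduction(A):
    """Evaluate a determinant by the recipe of Theorem 5.24: use column
    operations to make the matrix lower triangular, then multiply the
    diagonal entries.  (The function name is invented for this sketch.)"""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for i in range(n):
        if abs(A[i, i]) < 1e-12:
            # Swap in a later column with a nonzero entry in row i;
            # each swap multiplies the determinant by -1.
            for j in range(i + 1, n):
                if abs(A[i, j]) > 1e-12:
                    A[:, [i, j]] = A[:, [j, i]]
                    sign = -sign
                    break
            else:
                return 0.0   # row i is zero to the right: columns dependent
        # Adding a multiple of one column to another leaves the determinant
        # unchanged, so clear out row i to the right of the diagonal.
        for j in range(i + 1, n):
            A[:, j] -= (A[i, j] / A[i, i]) * A[:, i]
    return sign * np.prod(np.diag(A))

# The 4x4 example worked by hand below; its determinant should be -10.
X = [[ 1,  2, -1,  0],
     [-1, -2,  3,  1],
     [ 0, -1,  4, -3],
     [ 2,  5,  0,  1]]
print(det_by_column_reduction(X), np.linalg.det(np.array(X)))
```

Each column operation leaves the determinant unchanged, each column swap contributes a factor of -1, and the final product of diagonal entries is exactly what the lemma predicts.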
Remark: In the above formula, we did not utilize the fact that D(e1 , · · · , en ) = 1 since this one step in the proof is the only place where Axiom 4 would be used, so we can (and shall) use the fact that this result holds for any function which only satisfies Axioms 1-3. Proof: This is just a recipe for carrying out the reduction. It essentially is a repetition of the last part of the preceding lemma. Instead of waving our hands at the procedure, we shall work out a representative Example: Evaluate D = D(X1 , X2 , X3 , X4 ) = 1 2 −1 0 −1 −2 3 1 0 −1 4 −3 2 5 0 1 by reducing it to a lower triangular determinant. First we get all zeros to the right of the diagonal in the first row, that is, except in the a11 slot, by multiplying X1 by the constants −2, 1 and 0 and adding the resulting vectors to X2 , X3 , and X4 , respectively. We obtain D= 1 2 −1 0 −1 −2 3 1 0 −1 4 −3 2 5 0 1 = 1 0 −1 0 −1 0 3 1 0 −1 4 −3 2 1 0 1 = 1 0 −1 0 0 −1 2 1 0 0 2 1 4 −3 2 1 Now we get all zeros to the right of the diagonal in the second row. Since the new a22 element above is zero, interchange the second and third columns (one could have interchanged the second and fourth). This introduces a factor of −1 (by Theorem 21, part 5). Then multiply the new second column by the constants 0 and − 1 , respectively, and add to the 2 last two columns, respectively. This gives D= 1 0 −1 0 0 −1 2 1 0 0 2 1 4 −3 2 1 = 1 −1 0 2 0 0 0 2 0 1 4 −1 −3 2 1 1 = 1 −1 0 2 0 0 0 2 0 0 4 −1 −5 2 1 0 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 225 And on the third row, where we again want all zeros to the right of the diagonal, so multiply the new third column by −5 and add it to the fourth column: 1 −1 D=− 0 2 0 0 0 2 0 0 4 −1 −5 2 1 0 1 −1 =− 0 2 0 0 0 2 0 0 4 −1 0 2 1 −5 = −(1)(2)(−1)(−5) = −10, where we have used the lemma about determinants of lower triangular matrices to evaluate the last determinant. Uniqueness is now elementary. Theorem 5.25 . There is at most one function D(X1 , · · · , Xn ), X k ∈ Rn , which satisfies the 4 axioms for a determinant function. Proof: Assume there are two such functions, D(X1 , · · · , Xn ) and ˜ D(X1 , · · · , Xn ). Let ˜ ∆(X1 , · · · , Xn ) = D(X1 , · · · , Xn ) − D(X1 , · · · , Xn ). ˜ We shall show ∆(X1 , · · · , Xn ) = 0 for any choice of X1 , · · · , Xn . Since both D and D satisfy Axioms 1-4, we have ˜ 1). ∆ = D − D is real valued. ˜ 2). ∆(. . . , λXj , . . .) = D(. . . , λXj , . . .) − D(. . . , λXj , . . .) ˜ = λD(. . . , Xj , . . .) − λD(. . . , Xj , . . .) = λ∆(. . . , Xj , . . .). ˜ 3). ∆(. . . , Xj + Xk , . . .) = D(. . . , Xj + Xk , . . .) − D(. . . , Xj + Xk , . . .) ˜ = D(. . . , Xj , . . . − D(. . . , Xj , . . .) = ∆(. . . , Xj , . . .), j = k. ˜ 4). ∆(e1 , . . . , en ) = D(e1 , . . . , en ) − D(e1 , . . . , en ) = 1 − 1 = 0 . Thus, ∆ satisfies the same first three axioms but ∆(e1 , . . . , en ) = 0 in place of Axiom 4. Because the proof of Theorem 22 and its predecessors never used Axiom 4, we know that ∆(X1 , . . . , Xn ) = (something) ∆(e1 , . . . , en ) = 0. Thus ∆(X1 , · · · , Xn ) = 0 for any vectors Xj . If it exists, the determinant function is known to be unique. We intend to define the determinant of order n , that is, of n vectors in Rn , in terms of determinants of order n − 1 . The key to such an approach is a relationship between a determinant of order n and 226 CHAPTER 5. MATRIX REPRESENTATION determinants of order n − 1 . To motivate our definition, we first examine the case n = 3 and utilize the intimate relation between determinant and volume. 
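Looking ahead, the inductive plan just described, an order-n determinant computed from determinants of order n - 1, can also be sketched in code. This is purely illustrative and runs ahead of the derivation that follows; the expansion used here (along the first column, with alternating signs) is the formula the next pages justify.

```python
import numpy as np

def det_by_minors(A):
    """Order-n determinant from order n-1 determinants: expand along the
    first column with alternating signs.  (Illustrative sketch only; the
    formula itself is the one derived in the pages that follow.)"""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
        total += (-1) ** i * A[i, 0] * det_by_minors(minor)
    return total

X = [[1, 2, -1, 0], [-1, -2, 3, 1], [0, -1, 4, -3], [2, 5, 0, 1]]
print(det_by_minors(X))   # expect -10, matching the reduction above
```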
Let X1 , X2 and X3 be three vectors in R3 . To find the determinant D(X1 , X2 , X3 ) , we can resolve one of the vectors, say X1 , into its components X1 = a11 e1 + a21 e2 + a31 e3 . Since the determinant function is linear (Theorem 21, part 4), D = D(X1 , X2 , X3 ) = a11 D(e1 , X2 , X3 ) + a21 D(e2 , X2 , X3 ) + a31 D(e3 , X2 , X3 ). How can we interpret D(a11 e1 , X2 , X3 ) , a11 a12 a13 0 a22 a23 ? 0 a32 a33 D(a11 e1 , X2 , X3 ) = By subtracting suitable multiples of the first column from the other two, we have a11 0 0 0 a22 a23 0 a32 a33 D(a11 e1 , X2 , X3 ) = . Consider the related volume function. The vectors in the last matrix span a parallelepiped whose base is the parallelogram spanned by (0, a22 , a32 ) and (0, a23 , a33 ) , while the height is a11 . Thus, we expect the volume to be a11 times the area of the base. Since the area of a a the base is det 22 23 , we hope a32 a33 a11 0 0 0 a22 a23 0 a32 a33 a22 a23 a32 a33 = a11 . except possibly for a factor of ±1 . This last formula is the connection between determinants of order three and those of order two. Notice that the determinant on the right in the last equation is obtained from that of D = D(X1 , X2 , X3 ) by deleting both the first row and first column. It is called the 1,1 minor of D , and written D11 . More generally, the i, j minor Dij of D is the determinant obtained by deleting the i th row and j th column of D . If D is of order n , then each Dij is of order n − 1 . In this notation, we expect from the expansion of D(X1 , X2 , X3 ) that D(X1 , X2 , X3 ) = ±?a11 D11 ±?a21 D21 ±?31 D31 , or a11 a21 a13 a21 a22 a23 a31 a32 a33 = ±?a11 a22 a23 a32 a33 ±?a21 a12 a13 a32 a33 ±?a31 a12 a13 a22 a23 . where ? indicates our doubt as to the signs. Explicit evaluation of both sides (using Theorem 22) reveals that the correct sign pattern is +, −, + . Having examined this special case (and the 4 × 4 case too), we are tentatively led to Suspicion (Expansion by Minors). If D(X1 , . . . , Xn ) is a determinant function, that is, if it satisfies the axioms, then n (−1)i+j aij Dij , D(X1 , X2 , . . . , Xn ) = i=1 (5-3) 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 227 where Xj = (a1j , a2j , . . . , anj ) . For the case n = 3, j = 1 this is the formula we found above. To verify that the formula is correct, we must verify that the function satisfies our axioms for a determinant. The reasoning goes as follows: we know exactly what determinants of order two are by a previous computation, so the formula gives a candidate for the determinant of order three, which in turn gives a candidate for a determinant function of order four, and so on. Thus, by induction, let us assume that determinants of order k − 1 are known. We must prove Theorem 5.26 . The previous function D(X1 , . . . , Xk ) defined by the above formula is a determinant function, that is, it satisfies the axioms. Proof: 1). D(X1 , . . . , Xk ) is real valued since, by our induction hypothesis, each of the Dij , determinants of order k − 1 , is real valued. 2). D(. . . , λXl , . . .) = λD(. . . , Xl , . . .) . There are two cases. If l = j , then λXj means that a1j , a2j , . . . , is multiplied by λ . Thus n (−1)i+j λaij Dij = λD(. . . , Xj , . . .), D(. . . , λXj , . . .) = i=1 so the axiom is satisfied. If l = j , then some vector X other so · · · λa1l · · · · · · λa2l · D(. . . , λXl , . . . , Xj , . . .) 
= · · · · · λakl than Xj is multiplied by λ , a1j a2j · · · akj ··· ··· ··· Since Dij is formed by deleting the i th row and j th column of D , and l = j , one column in minor Dij will have the factor λ appearing in it. By the induction hypothesis, the factor can be pulled out of each one, and hence from any linear combination of them. Because the expansion formula for D is a linear combination of the minors, the axiom is verified in this case too. 3). Omitted. This one is just plain messy. If you don’t care to try the general case for yourself, at least try the case n = 3 and verify it there. 4). To prove D(e1 , . . . , en ) = 1 . Of the coefficients a1j , a2j , . . . , anj , only ajj = 0 , and ajj = 1 . Thus D(e1 , . . . , en ) = (−1)j +j ajj Djj = Djj . But by the induction hypothesis, Djj = 1 since it has only ones on its main diagonal and zero elsewhere. Therefore D(e1 , . . . , en ) = 1 , as desired. This theorem completes (except for one segment) the proof that a unique determinant function exists. The uniqueness was proved directly, while the existence was obtained from the known existence of 2 × 2 determinant functions (the simpler case of 1 × 1 determinants could also have been used) and proving inductively that a candidate for the n × n determinant function does satisfy the axioms. Emerging from the jungle of the existence proof, we are fully equipped with the powerful determinant function and the associated volume function. It will be relatively simple to prove the remaining theorems involving determinants. The trick in most of them is to make clever use of the fact that the determinant function is unique. We shall expose this trick in its bare form. 228 CHAPTER 5. MATRIX REPRESENTATION Theorem 5.27 . Let ∆(X1 , . . . , Xn ) be a function of n vectors in Rn which satisfies axioms 1-3 for the determinant. Then for every set of vectors X1 , . . . , Xn ∆(X1 , . . . , Xn ) = ∆(e1 , . . . , en )D(X1 , . . . , Xn ). Thus, the function ∆ differs from D only by a constant multiplicative factor, which is the number ∆ assigns to the unit matrix (geometrically, the unit cube) in Rn . Proof: If ∆(e1 , . . . , en ) = 1 , then ∆ satisfies Axiom 4 also, so by the uniqueness theorem, it must be D itself. If ∆(e1 , . . . , en ) = 1 , consider D(X1 , . . . , Xn ) = ∆(X1 , . . . , Xn ) ˜ . D(X1 , . . . , Xn ) := 1 − ∆(e1 , . . . , en ) Note that the denominator is a fixed scalar which does not depend on X1 , . . . , Xn . It is a ˜ ˜ mental calculation to verify that D satisfies all of Axioms 1-4. Therefore D(X1 , . . . , Xn ) := D(X1 , . . . , Xn ) by uniqueness. Solving the last equation for ∆(X1 , . . . , Xn ) yields the formula. Consider D(X1 , . . . , Xn ) . If B = ((bij )) is a square n × n matrix representing a linear transformation from Rn to Rn , how are D(X1 , . . . , Xn ) and D(BX1 , BX2 , . . . , BXn ) related? The answer to this question is vital if we are to find how volume varies under a linear transformation B . If A = ((aij )) is the matrix whose columns are X1 , . . . , Xn , and C = ((cij )) is the matrix whose columns are BX1 , BX2 , . . . , BXn , then C = BA [since, for example, c11 —the first element in the vector BX1 —is c11 = b11 a11 + b12 a21 + b13 a31 + · · · + b1n an1 .] Because D(X1 , . . . , Xn ) = det A and D(BX1 , . . . , BXn ) = det C , our question becomes one of relating det C = det(BA) to det A . The result is as simple as one could possibly expect. Theorem 5.28 . 
If A and B are two n × n matrices, then det(BA) = (det B )(det A) = (det A)(det B ) = det(AB ) or, if X1 , . . . , Xn are the column vectors of A , then this is equivalent to D(BX1 , . . . , BXn ) = D(Be1 , Be2 , . . . , Ben )D(X1 , . . . , Xn ) (since the matrix whose columns are Be1 , . . . , Ben is just B ). Proof: Let ∆(X1 , . . . , Xn ) := D(BX1 , . . . , BXn ) . This function clearly satisfies Axiom 1. We shall verify Axioms 2 and 3 at the same time. ∆(. . . , λXj + µXk , . . .) = D(. . . , B (λXn + µXk ), . . .) Because B is a linear transformation, we have = D(. . . , λBXj + µBXk , . . .). By the linearity of D (Theorem 21, part 4) = λD(. . . , BXj , . . .) + µD(. . . , BXk , . . .). 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 229 If j = k , then the vector BXk in the second term on the right also appears as another column in the same determinant. Hence the second term vanishes. Thus if j = k , ∆(. . . , λXj + µXk , . . .) = λD(. . . , BXj , . . .). The special case µ = 0 shows Axiom 2 holds for ∆ , while the case λ = µ = 1 verifies Axiom 3. Therefore ∆ satisfies Axioms 1-3. Applying the preceding Theorem (25), we have ∆(X1 , . . . , Xn ) = ∆(e1 , . . . , en )D(X1 , . . . , Xn ). By definition, ∆(e1 , . . . , en ) := D(Be1 , . . . , Ben ) . Substitution verifies our formula. The commutativity (det B )(det A) = (det A)(det B ) follows from the fact that det A and det B are real numbers - which do commute under multiplications. Corollary 5.29 . If A is an invertible matrix, then det(A−1 ) = 1 . det A Proof: Since AA−1 = I , and det I = 1 , we find (det A)(det A−1 ) = det(AA−1 ) = det I = 1. Ordinary division completes the proof. Our next theorem is also a corollary, but because of its importance, we call it Theorem 5.30 . The vectors X1 , . . . , Xn in Rn are linearly independent if and only if D(X1 , . . . , Xn ) = 0 . Proof: ⇐ If D(X1 , . . . , Xn ) = 0 , then the vectors X1 , . . . , Xn are linearly independent, since if they were dependent, then D = 0 by part 3 of Theorem 21. ⇒ . If X1 , . . . , Xn are linearly independent vectors in Rn , then the Corollary to Theorem 12 (p. 364) shows that the matrix A whose columns are the Xj is invertible. Let A−1 be its inverse. From the computation in the corollary preceding this theorem, (det A)(det A−1 ) = 1. Thus the real number det A cannot be zero. The equivalent form of our theorem is also a consequence of the Corollary to Theorem 12. Example: (cf. p. 157, Ex. 1b). Are the vectors X1 = (0, 1, 1), X2 = (0, 0, 1), X3 = (0, 2, 3) linearly dependent? We compute the determinant D(X1 , X2 , X3 ) = 0 00 1 02. 1 −1 3 230 CHAPTER 5. MATRIX REPRESENTATION If we knew that “the determinant of a matrix was equal to the determinant of its adjoint” (a true theorem to be proved below), then taking the adjoint we get a matrix with one column zero 0 which gives D = 0 . Since the quoted theorem is not yet proved, we proceed differently and reduce our 3 × 3 determinant to 2 × 2 determinants expanding by minors (p. 411). The simplest column to use is the second. 0 00 1 02 1 −1 3 12 13 = (−1)1+20 = 00 12 + (−1)2+20 00 13 + (−1)3+2 (−1) 00 12 =0·2−1·0=0 by the explicit formula for evaluating 2 × 2 determinants. Thus D = 0 so the vectors X1 , X2 , X3 are linearly dependent. That nice theorem we could have used in the above example is our next target. Theorem 5.31 . If A is an n × n matrix, then det A∗ = det A. Proof: Let A1 , . . . , An be the columns of A and B, . . . 
, Bn its rows, A= a11 a12 · · · a21 · · · · · · · · · an1 · · · · · · a1n a2n · · · ann B1 } B2 · . · · } Bn Consider the function D(B1 , . . . , Bn ) = a11 · · · a12 · · a1n an1 · · · ann } A1 · · = det A∗ . · } An since the rows of A are the columns of A∗ . Let us define a new function ˆ D(A1 , . . . , An ) := D(B1 , . . . , Bn ). ˆ Our task is to verify that D(A1 , . . . , An ) satisfies all of Axioms 1-4. Then by uniqueness ˆ det A∗ := D(A1 , . . . , An ) = D(A1 , . . . , An ) = det A. ˆ (1) D(A1 , . . . , An ) is a real number since det A∗ , the determinant of the matrix A∗ is a real number. 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 231 ˆ ˆ (2) We must show D(. . . , λAj , . . .) = λD(. . . , Aj , . . .) , that is, a11 · · λa1j · · · a1n a2j ··· λa2j ... ... =λ ann ··· anl ··· anj a1n · · · . . . λanj an1 ann a11 · · a1j (a fact we only know so far if a column is multiplied by a scalar). Trick: observe that 10 0 1 j th row . . . 0 ··· 0 = 0 0 λ 1 0 0... 0 1 0 a11 · · · · · a1j · · · · · a1n · · · anl · · anj · · ann a11 · · · · · a1j · · · · · a1n · · · anl · · anj · · ann The matrix on the left is the identity matrix I except for a λ in its j th row and j th column. Its determinant is λ (since you can factor λ from the j th column and are left with the identity matrix). By Theorem 26, the determinant of the product ˆ ˆ ˆ on the left is λD(A1 , . . . , An ) while the right is D(A1 , . . . , An ) , proving D satisfies Axiom 2. ˆ (3) The proof of Axiom 3 involves a similar trick. We have to show D(. . . , Aj + Ak , . . .) = ˆ D(. . . , Aj , . . .) where j = k , that is, to show a11 ··· · · a1j + a1k · · · · · · a1n ··· an1 · · anj + ank · · · ann = a11 · · · · · a1j · · · · · · a1n · · · an1 anj · · · ann , j = k. 232 CHAPTER 5. MATRIX REPRESENTATION Observe that 1 0 ··· 0 0 0 1 ··· 0 0 ··· ··· ··· ··· 0 1 0 a11 · · · · · a1j · · · · · · a1n · · · an1 anj = ann a11 ··· · · a1j + a1k · · · · · · a1n ··· an1 · · anj + ank , ann where the matrix on the left is the identity matrix with an extra 1 in the j th row, k th column. Since the determinant of this matrix is one (check by a mental computation), the rule for the determinant of a product of matrices shows that Axiom 3 is satisfied. (4) Easy, for ˆ D(e1 , . . . , en ) = 1 0 0 1 ··· 0 ··· ··· ··· ··· ··· ··· ··· 1 0 0 0 0 1 = D(e1 , . . . , en ) = 1. This verification of the four Axioms coupled with the remarks at the beginning of the proof completes the proof. Corollary 5.32 . The column operations of Theorem 21 are also valid as row operations. Proof: Every row operation on a matrix A (like adding two rows) can be split up to : i) take A∗ so the rows become columns, ii) carry out the operation on the column of A∗ and iii) take the adjoint again. Since the determinant does not change under these operations, we are done. Corollary 5.33 . If R is an orthogonal matrix then det R = ±1. Proof: If R is orthogonal, then R∗ R = I by Theorem 19 (p. 383). Thus, a = det I = det(R∗ R) = (det R∗ )(det R) = (det R)2 , where Theorems 25 and 27 were invoked once each. Now take the square root of both sides. The orthogonal matrices R1 = 10 01 and R2 = 01 10 , for which det R1 = 1 and det R2 = −1 show that both signs are possible. If det R = −1 , then the orthogonal transformation has not only been a rotation but also a reflection. The transformation given by R2 is a figure goes here 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 
233 which can be thought of as the composition (product) of a rotation by +900 followed ˆ˜ by a reflection (mirror image). In fact, R2 may be factored into RR = r2 , where 1 0 0 −1 01 −1 0 01 10 = = R2 . Pictorially a figure goes here Our theorems about determinants also imply the following valuable result about volume. Theorem 5.34 . Let X1 , . . . , Xn span a parallelepiped Q in En and the matrix A map En into En . Then the volume is magnified by |det A| , that is, V [AX1 , . . . , AXn ] = |det A| V [X1 , . . . , Xn ]. If we denote the image of Q by A(Q) , then this theorem reads Vol[A(Q)] = |det A| Vol[Q]. Proof: [We should first prove that there is at most one volume function V satisfying its four axioms. Since V := |D| is a volume function, assume there is another volume function ˜ V ∗ and define D(X1 , . . . , Xn ) by ˜ D(X1 , . . . , Xn ) := V ∗ (X1 ,...,Xn )D(X1 ,...,Xn ) |D(X1 ,...,Xn )| 0 if D = 0 if D = 0. ˜ ˜ It is simple to check that D satisfies the axioms for a determinant. By uniqueness, D = D . ∗ (X , . . . , X ) = |D (X . . . , X )| ≡ V (X , . . . , X ) , so Solving the last equation, we find V 1 n , n 1 n the volume function is also unique.] The theorem is easily proved. Since V = |D| , an application of Theorem 26 tells us that V [AX1 , . . . , AXn ] = |D(AX1 , . . . , AXn )| = |D(Ae1 , . . . , Aen )| = |det A| |D(X1 , . . . , Xn )| V [X1 , . . . , Xn ]. Done. Corollary 5.35 . Volume is invariant under an orthogonal transformation. V (RQ) = V (Q) Proof: If R is an orthogonal transformation, |det R| = 1 . Remark 1. Since we eventually want to define the volume of suitable sets by approximating the sets by parallelepipeds, this theorem will allow us to conclude the same results about how the volume of some set changes under a linear transformation in general and an orthogonal transformation in particular. 234 CHAPTER 5. MATRIX REPRESENTATION Remark: 2 We define the determinant of a linear transformation L which maps Rn into Rn as the determinant of a matrix which represents L . This definition makes it mandatory to prove: “the determinant of two different matrices which represent L (different because of a different choice of bases) are equal.” However the theorem is an immediate consequence of the following fact we never proved: “if A and B are matrices which represent the same linear transformation L with respect to different bases then there is a nonsingular matrix C such that B = CAC −1 .” The matrix C is the matrix expressing one set of bases vectors in terms of the other bases. Using this theorem, we find det B = det(CAC −1 ) = (det C )(det A)(det C −1 ) = det A. How does volume change under a translation T, T X = X + X0 ? A little thought is needed. Imagine a parallelepiped Q spanned by X1 , . . . , Xn . The crux of the matter is to realize that the parallelepiped has the origin as one of its vertices and X1 , . . . , Xn at the others. Under the translation T , not only do the Xj ’s get translated through X0 , but so does the origin, 0 → X0 , X1 → X1 + X0 , X2 → X2 + X0 , etc. a figure goes here In terms of free vectors, the edge from 0 to Xj becomes the edge from X0 to Xj + X0 (see figure). Thus the free vector representing this edge is (Xj + X0 ) − X0 , that is, it is still Xj ! This motivates the Definition: The volume of a parallelepiped is defined to be the volume of the parallelepiped after translating one vertex to the origin. Theorem 5.36 . 
The change in volume of a parallelepiped Q under an affine transformation AX = LX + X0 , L linear, is given by: Vol[A(Q)] = |det L| Vol[Q]. In particular, volume is invariant under a rigid body transformation (for then L is an orthogonal transformation). Proof: The affine transformation may be factored into A = T L , a linear transformation followed by a translation (p. 380). Since L changes volume by |det L| while translation preserves the volume, the net result is a change by |det L| as claimed. a) Application to Linear Equations What have our geometrically motivated determinants in common with the determinants of high school fame - where they were used to solve systems of linear algebraic equations? Everything, for they are the same. Since determinants are defined only for square matrices, they are applicable to linear algebraic equations only when there are the same number of equations as unknowns. At the end of this section, we shall make some remarks about the case when the number of equations and unknowns are not equal. Consider the system of equations a11 x1 + · · · + a1n xn = y1 a21 x1 + · · · + a2n xn = y2 . . . . . . an1 x1 + · · · + ann xn = yn , 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 235 which we can write as x1 A1 + · · · + xn An = Y, where Aj is the j th column of the matrix A = ((aij )) and Y is the obvious column vector. The problem is to find numbers x1 , . . . , xn such that x1 A1 + · · · + xn An = Y , where Y is given. Theorem 5.37 . Let A = ((aij )) be a square n × n matrix and Y a given vector. The system of linear algebraic equations AX = Y can always be solved for X if and only if det A = 0 . This can be rephrased as, A is invertible if and only if det A = 0 . Proof: Let Aj be the j th vector of A . Each Aj is a vector in Rn . If det A = 0 , then the An ’s are linearly independent by Theorem 27, p. 417. But since they are linearly independent and there are n of them, A1 , · · · , An , they must span Rn . Thus, any Y ∈ Rn can be written as a linear combination of the Aj ’s. The numbers x1 , · · · , xn are just the coefficients in this linear combination. Conversely, if the equations AX = Y can be solved for any Y ∈ Rn , then the vectors A1 , · · · , An span Rn . But if n vectors span Rn , these vectors must be linearly independent, so det A = 0 , again by Theorem 27, page 417. Theorem 5.38 . Let A be a square matrix. The system of homogeneous equations AX = 0 has a non-trivial solution if and only if det A = 0 . Proof: By Theorem 27, Page 417, det A = 0 if and only if the column vectors A1 , . . . , An are linearly dependent. Now if the column vectors A1 , . . . , An are linearly dependent, then there are numbers x1 , . . . , xn , not all zero, such that x1 A1 + . . . + xn An = 0 . The vector X = (x1 , . . . , xn ) is then a non-trivial solution of AX = 0 . Conversely, if there is a non-trivial solution of AX = 0 , then x1 A1 + · · · + xn An = 0 , so the Aj ’s are linearly dependent. Hence det A = 0 . In contrast to the above theorems which give no hint of a procedure for finding the desired vector X , the next theorem gives an explicit formula for the solution of AX = Y . Theorem 5.39 (Cramer’s Rule). Let A = ((aij )) be a square n × n matrix with columns A1 , . . . , An . Assume det A = 0 . Then for any vector Y , the solution of AX = Y is x1 = D(Y, A2 , . . . , An ) , D(A1 , . . . , An ) x2 = D(A1 , Y, A3 , . . . , An ) D(A1 , . . . , An ) . . . xn = D(A1 , . . . , An−1 , Y ) . D(A1 , . . . 
, An ) For example, in detail, the formula for x2 is ··· a1n . . . an1 yn an3 · · · ann a11 . . . x2 = y1 . . . a13 . . . a12 . . . a13 . . . ··· . . . a1n . . . an1 an2 an3 ··· ann a11 . . . . 236 CHAPTER 5. MATRIX REPRESENTATION Proof: A snap. Since det A = 0 , by Theorem 31 we know a solution X = (x1 , . . . , xn ) exists. Thus x1 A1 + · · · + xn An = Y . Let us obtain the formula for x2 as a representative case. Observe that D(A1 , Y, A3 , · · · , An ) = D(A1 , x1 A1 + · · · + xn An , A3 , · · · , An ). Since D is multilinear, we can expand the above to = x1 D(A1 , A1 , A3 , · · · , An ) + xn D(A1 , A2 , A3 , · · · , An ) + · · · + xn D(A1 , An , A3 · · · An ). Now all of these determinants, except the second one, vanishes since each has two identical columns (part 5 of Theorem 21, page 400). Thus D(A1 , Y, A3 , ·, An ) = x2 D(A1 , A2 , · · · , An ). Because det A = D(A1 , · · · , An ) = 0 , we can divide to find the desired formula for x2 . Done. Remark: This elegant formula is mainly of theoretical use. It is not the most efficient procedure for solving such equations. That honor belongs to the method of reducing to triangular form which was outlined in the proof of Theorem 22. To be more vivid, if Cramer’s rule were used to solve a system of 26 equations, approximately (23 + 1)! ≈ 1028 multiplications would be required. Reduction to triangular form, on the other hand, would only require about (1/3)(23)3 ≈ 6000 multiplications. Think about that. For non-square matrices, determinants are not applicable. Given a vector Y , one would still like a criterion to determine if one can solve AX = Y , that is, one would like a criterion to see if Y ∈ R(A) . Theorem 5.40 . Let L : V1 → V2 be a linear operator. Then R(L)⊥ = N(L∗ ); or equivalently (for finite dimensional spaces) R(L) = N(L∗ )⊥ . Proof: If X ∈ V , and Y ∈ R(L)⊥ , then for all X 0 = Y , LX = L∗ Y, X . This means L∗ Y is orthogonal to all X , consequently, L∗ Y = 0 , so Y ∈ N(L∗ ) . The converse is proved by observing that our steps are reversible. Application. For what vectors Y = (y1 , y2 , y3 ) can you solve the equations 2x1 , +3x2 = y1 x1 − x2 = y2 x1 + 2x2 = y3 ? 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 237 If the equations are written as AX = Y , then by the above theorem Y ∈ R(A) if and only if Y⊥ N(A∗ ) . Let us find a basis for N(A∗ ) . This means solving the homogeneous equations A∗ Z = 0 , 2z 1 + z 2 + z 3 = 0 3z1 − z2 + 2z3 = 0. If we let z1 = α , and solve the resulting equations for z2 and z3 , we find that z3 = −5α/3 and z2 = −11α/3 . Consequently, all vectors Z ∈ (A∗ ) have the form Z = (3α, −11α, −5α) . A basis for N(A∗ ) is e = (3, −11, −5) . Therefore, Y⊥ N(A∗ ) if and only if 3y1 − 11y2 − 5y3 = 0 . By the above reasoning, the equation AX = Y can be solved for only these Y ’s. Remark: The use of Theorem 34 as a criterion for finding if Y ∈ R(L) is much more valuable in infinite dimensional spaces, for it quite often turns out that N(L∗ ) is still finite dimensional while R(L) is infinite dimensional. For more on these ideas, see page 389, Exercise 12 and page 501 Exercises 27- 29. Exercises (1) Evaluate the following determinants as you see fit: a). 7 3 , 2 −1 c). −10 −2 3 −3 2 1, 5 0 −1 f ). 2 1 1 1 1 2 1 1 1 1 2 1 b). 1 1 , 1 2 1 2 5 . −3 4 d). g). a b c c d 1 1 0 0 e 53 17 29 36 12 39 , 69 23 75 e). 1 1 0 1 2 0 3 4 1 −5 2 3 1 0 6 4 00 0 00 0 0 1 −b 0 1 −a 1f g [Answers: a) −13 , b) 17, c) −14 , d) 6, e) 5, g) −(b − a)2 ]. 
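As a supplementary check of Exercise 1 (not part of the notes), NumPy confirms two of the stated answers; parts (a) and (c) are used because their entries can be read off unambiguously, row by row, from the display above.

```python
import numpy as np

a = np.array([[7, 3],
              [2, -1]])
c = np.array([[-10, -2,  3],
              [ -3,  2,  1],
              [  5,  0, -1]])
print(np.linalg.det(a))   # expect -13 (up to rounding)
print(np.linalg.det(c))   # expect -14 (up to rounding)
```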
(2) If A and B are the matrices whose respective determinants appear in #1 a) and b), compute det(AB ) by first finding AB . Compare with (det A)(det B ) . (3) a). Use Cramer’s rule (Theorem 33) to solve the equation AX = Y , where A is given below. Then observe you have computed A−1 , so exhibit it. 1 1 1 6 8 2 1 −6 −3 3 ]. A = 2 −3 −1 . [A−1 = 30 4 9 1 30 −5 −5 b). Use the formula for A−1 to solve the equations AX = Y where Y = (1, 2, 0). 238 CHAPTER 5. MATRIX REPRESENTATION (4) a). Find the volume of the parallelepiped Q in E3 which is spanned by the vectors X1 = (1, 1, 1), X2 = (2, −1, −3) and X3 = (4, 1, 9) . [Answer: Volume = 30]. b). The matrix A , −10 −2 3 2 1 A = −3 5 0 −1 − (cf. #1,c) maps E3 into itself. Find the volume of the image of Q , that is, the volume of A(Q) . [Answer: 420]. (5) Let B = A − λI where A is a square matrix. The values λ for which B is singular are called the eigenvalues of A . Find the eigenvalues for a). A = 3 2 2 −1 c). A = ab cd , b). A = 3 2 1 −1 . . [Hint: If B is singular, then 0 = det B = det(A − λI ) . Now observe that det(A − λI ) 1 is a polynomial in λ . The answer to c) is λ = 2 (a + d ± (a + d)2 − 4(ad − bc))]. (6) For what value(s) of α are the vectors X1 = (1, 2, 3), X2 = (2, 0, 1), X3 = (0, α, −1) linearly dependent? (7) If X1 , X2 , X3 and Y1 , Y2 , Y3 are vectors in R3 , prove that D[X1 , X2 , X3 ] − D[Y1 , Y2 , Y3 ] = D[X1 − Y1 , X2 , X3 ] + D[X1 , X2 − Y2 , X3 ] + D[X1 , X2 , X3 − Y3 ]. [Hint: First work out the corresponding formula for the 2 × 2 case.] (8) Here you shall compute the derivative of a determinant if the coefficients of A = ((aij )) depend on t, aij (t) . Let X1 (t), . . . , Xn (t) be the vectors which constitute the columns of A . The problem is to compute dD(t) d d = D[X1 , . . . , Xn ](t) = dt dt dt a11 (t), · · · , a1n (t) · · · an1 (t), · · · , ann (t) a). Use Exercise 7 (generalized to n × n matrices) to show D(t + ∆t) − D(t) ≡ D[X1 (t + ∆t), X2 (t + ∆t), . . .] − D[X1 (t), X2 (t), . . .] n D[X1 (t), . . . , Xj −1 (t), Xj (t + ∆t) − Xj (t), Xj +1 (t + ∆b), . . .] = j =1 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 239 [Hint: Do the cases n = 2 and n = 3 first]. b). Use part a to show that dD D(t + ∆t) − D(t) = lim ∆t→0 dt ∆t n = D[X1 , . . . , Xj −1 , j =1 dXj , Xj +1 , . . . , Xn ], dt so the derivative of a determinant is found by taking the derivative one column at a time and adding the result. (9) Let u1 (t) , and u3 (t) be solutions of the differential equation u + a1 (t)u + a0 (t)u = 0. Consider the Wronski determinant W (u1 , u2 )(t) := u1 (t) u2 (t) u1 (t) u2 (t) (a) Use Exercise 8 to prove dW = −a1 (t)W. dt (b) Consequently, show t W (t) = W (t0 ) exp − a1 (s) ds . t0 (c) Apply this to show that if the vectors (u1 (t), u1 (t)) and (u2 (t), u2 (t)) are linearly independent at t = t0 , then they are always linearly independent. (d) Let u1 (t) . . . , un (t) be solutions of the differential equation u(n) + an−1 (t)u(n−1) + · · · + a1 (t)u + a0 (t)u = 0. Consider the Wronski determinant of u1 , . . . , un W (u1 , . . . , un ) = u1 u1 · · · (n−1) u1 Prove u2 u2 ··· ··· (n−) ··· a2 un un (n−1) un dW = −an−1 (t)W, dt so again t W (t) = W (t0 ) exp − an−1 (s) ds . t0 240 CHAPTER 5. MATRIX REPRESENTATION (e) Use part d) to conclude that the n vectors (n−1) (u1 , u1 , . . . , u1 (n−1) ), (u2 , u2 , . . . , u2 , ), · · · (un , un , . . . , u(n−1) ) n (where the uj are solutions of the O.D.E.) are linearly independent for all t if and only if they are so at t = t0 . 
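Exercise 8(b) can also be spot-checked symbolically. The sketch below is supplementary and uses SymPy with four generic differentiable functions a(t), b(t), c(t), d(t); it verifies, in the 2 x 2 case, that the derivative of a determinant is the sum of the determinants obtained by differentiating one column at a time.

```python
import sympy as sp

t = sp.symbols('t')
a, b, c, d = [sp.Function(name)(t) for name in 'abcd']

# d/dt det [[a, c], [b, d]]  =  det [[a', c], [b, d]] + det [[a, c'], [b, d']]
D = sp.Matrix([[a, c], [b, d]]).det()
lhs = sp.diff(D, t)
rhs = (sp.Matrix([[sp.diff(a, t), c], [sp.diff(b, t), d]]).det()
       + sp.Matrix([[a, sp.diff(c, t)], [b, sp.diff(d, t)]]).det())
print(sp.simplify(lhs - rhs))   # expect 0
```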
(10) A matrix A is upper (lower) triangular if diagonal are zero, a11 a12 0 a22 A= 0 0 0 0 all the elements below (above) the main ··· ··· ··· an · . · ann If A is upper (or lower) triangular, prove again that det A = a11 a22 . . . ann . by expanding by minors. What is the relation of this result to the exercise ( #4 , p. 157) on echelon form? ˆ (11) Let X1 , . . . , Xn be vectors in Rn and let D(X1 , . . . , Xn ) be a real valued function ˆ which has properties 1 and 4 of Theorem 21. Thus D is skew-symmetric, and is ˆ necessarily satisfies Axioms 2 and 3 for the linear in each of its columns. Prove D determinant, and conclude that ˆ D(X1 , . . . , Xn ) = kD(X1 , . . . , Xn ), where the constant k = D(e1 , . . . , en ) . (12) Let u1 (t), . . . , un (t) be sufficiently differentiable functions (C n−1 is enough). Define the Wronskian as in Exercise 9 part d. Prove that if the functions u1 , . . . , un are linearly dependent, then W (t) ≡ 0 . Thus, if W (t0 ) = 0 , the functions are linearly independent in any interval containing t0 . [Do not try to apply the result of Exercise 9 for it is not applicable]. (13) (a) If I is the n × n identity matrix, evaluate det(λI ) where λ is a constant. (b) If A is an n × n matrix, prove det(λA) = λn det A. (c) If A or B are n × n matrices, is ? det(A + B ) = det A + det B ? Proof or counterexample. (14) For what value of α does the system of equations x + 2y + z = 0 −2x + αy + 2z = 0 x + 2y + 3z = 0 have more than one solution? 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 241 (15) A matrix is nilpotent if some power of it is zero, that is, AN = 0 for some positive integer N . Prove that if A is nilpotent, then det A = 0 . (16) (a) Solve the systems of equations i) x + y = 1 , x − .9y = −1 and ii) x + y = 1 , x − 1.1y = −1 , and compare your solutions, which should be almost the same. (b) Solve the systems of equations i) x + y = 1 , x + .9y = −1, and x + y = 1 , x + 1.1y = −1. and again compare your solutions. Explain the result in terms of the theory in this section. (c) Consider the solution of the systems of equations x+y =1 x + αy = −1 as the point where the lines x + y = 1 and x + αy = −1 intersect. Sketch the graph of these lines for α near −1 and then for α near +1 . Use these observations to again explain the phenomena in parts a) and b). (17) Let ∆n be the n × n determinant of a matrix with a ’s along the main diagonal and b ’s on the two “off diagonals” directly above and below the main diagonal. Thus ∆5 = ab000 bab00 0bab0. 00bab 000ba (a) Prove ∆n = a∆n−1 − b2 ∆n−2 . (b) Compute ∆1 and ∆2 by hand. Then use the formula to compute ∆3 and ∆4 . (c) If a2 = 4b2 , can you show ∆n = √ a2 1 a+ 2 − 4b √ a2 2 − 4b 2 n+1 − a− √ a2 2 − 4b2 n+1 ? Later, we shall give a method for obtaining this directly from the equation of part a). [p. 522-523]. (18) Prove Part 5 of Theorem 21 using only the axioms and no other part of Theorem 21. 242 CHAPTER 5. MATRIX REPRESENTATION (19) Apply the result of Exercise 12 on page 389. Try to prove the following. A is a square matrix. a). dim N(A) = dim N(A∗ ). Thus, the homogeneous equation AX = 0 has the same number of linearly independent solutions as does the equation A∗ Z = 0 . b). Let Z1 , . . . , Zk span N(A∗ ) . Then the inhomogeneous equation AX = Y has a solution, that is, Y ∈ R(A) , if and only if Zj , Y = 0, j = 1, 2, . . . , k. In other words, the equation AX = Y has a solution if and only if Y is orthogonal to the solutions of the homogeneous adjoint equation. c). 
Consider the system of linear equations 2x − 3y + z = 1 −3x + 2y − 4z = α x − 4y − 2z = β. Let A be the coefficient matrix. Find a basis for N(A∗ ) . [Answer: dim N(A∗ ) = 1 and Z1 = (2, 1, −1) is a basis]. For what value(s) of the constants α, β can you solve the given system of equations? [Answer: There is a solution if and only if β − α = 2 .] Find a solution if α = 1 and β = 3 . d). Repeat part c) for the system of equations x−y =1 x − 2y = −1 x + 3y = α. [Answer: dim N(A) = 1 and Z1 = (−5, 4, 1) is a basis. There is a solution if and only if α = −1 ]. (20) Use the result of Exercise 12 to prove that each of the following sets of functions are linearly independent everywhere. a) u1 (x) = sin x, b) u1 (x) = sin nx, u2 (x) = cos x u2 (x) = cos mx, where n = 0. c) u1 (x) = ex , u2 (x) = e2x , u3 (x) = e3x . d) u1 (x) = eax , u2 (x) = ebx , u3 (x) = ecx , where a, b , and c are distinct numbers. e) u1 (x) = 1, u2 (x) = x, u3 (x) = x2 , u4 (x) = x3 f) u1 (x) = ex , u2 (x) = e−x , u3 (x) = xex , u4 (x) = xe−x . 5.4. AN APPLICATION TO GENETICS 5.4 243 An Application to Genetics A mathematical model is developed and solved. Although this particular model will be motivated by genetics, the resulting mathematical problem also arises in sociometrics and statistical mechanics as well as many other places. In the literature you will find these mathematical ideas listed under the title Markov chains. Part of the value you should glean from our discourse is insight into the process of going from vague qualitative phenomena to setting up a quantitative model. One part of this scientific process we shall not have time to investigate in detail is the very important step of comparing the quantitative results with experimental data. Furthermore, we shall never delve into the fertile realm of generalizing our accumulated knowledge to more complicated - as well as more interesting and realistic - situations. In bisexual mating, the genes of the resulting offspring occur in pairs, one gene in each pair being contributed by each parent. Consider the simplest case of a trait which is determined by a single pair of genes, each of which is one of two types g and G . Thus, the father contributes G or g to the pair, and the mother does likewise. Since experimental results show that the pair Gg is identical to the pair gG , the offspring has one of the three pairs GG Gg gg. The gene G dominates g if the resulting offspring with genetic types GG and Gg “appear” identical but both are different from gg . In this case, an individual with genetic type GG is called dominant, while the types gg and Gg are called recessive and hybrid, respectively. An offspring can have the pair GG (resp. gg ) if and only if both parents contributed a gene of type G (resp. g) while the combination Gg occurs if either parent contributed G and the other g . A fundamental assumption we shall make is that a parent with genetic type ab can only contribute a gene of type a or of type b . This assumption ignores such things as radioactivity as a genetic force. Thus, a dominant parent, GG can only contribute a dominant gene, G , a recessive parent, gg , can only contribute g , and a hybrid parent Gg can contribute either G or g (with equal probability). Consequently, if two hybrids are mated, the offspring has probability 1 of getting G or g from each parent, so the 2 probability of his having genetic type GG of gg is 1 each, while the probability of having 4 1 genetic type Gg is 2 . 
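In other words, crossing two hybrids produces genetic types GG, Gg, gg with probabilities 1/4, 1/2, 1/4. A tiny enumeration (supplementary, not in the notes) makes the count explicit: each parent contributes G or g with equal probability, and the unordered pair determines the type.

```python
from collections import Counter
from itertools import product

# Each hybrid parent contributes G or g with probability 1/2; the unordered
# pair (Gg is the same as gG) determines the offspring's genetic type.
outcomes = Counter(''.join(sorted(pair)) for pair in product('Gg', repeat=2))
total = sum(outcomes.values())
for genetic_type, count in sorted(outcomes.items()):
    print(genetic_type, count / total)
# expect: GG 0.25, Gg 0.5, gg 0.25
```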
We introduce a probability vector V = (v1 , v2 , v3 ) , with v1 representing the probability of being genetic type GG , v2 of being type Gg , and v3 of being type gg . Thus for an 11 offspring of two hybrid parents, V = ( 4 , 2 , 1 ) . Observe that, by definition of probability, 4 0 ≤ vj ≤ 1, j = 1, 2, 3 , and v1 + v2 + v3 = 1 (since with probability one - certainty - the offspring is either GG , Gg , or gg ). Consider the issue of mating an individual whose genetic type is unknown with an individual of known genetic type (dominant, hybrid or recessive). To be specific, assume the known person is of dominant type. Then the following matrix of transition probabilities 1 1 /2 0 D = 0 1 /2 1 000 describes the probability of the offspring’s genetic type in the following sense: if the unknown parent had genetic type V0 (so V0 = (1, 0, 0) if unknown was dominant, V0 = (0, 1, 0) if 244 CHAPTER 5. MATRIX REPRESENTATION hybrid, and V0 = (0, 0, 1) if recessive), then V1 = DV0 , is the probability vector of the offspring. For example, if the unknown parent was hy1 brid, then V1 = DV0 = ( 1 , 2 , 0) . Thus the offspring can, with equal likelihood, be either 2 dominant or hybrid, but cannot be recessive. Notice that the matrix D embodies the fact that one of the parents is dominant. If the individual of unknown genetic type were crossed with an individual of hybrid type, then the corresponding matrix H is 1 1 2 40 1 H = 1 1 2 , 2 2 1 1 042 while if the person of unknown type were crossed with the individual of recessive type, then 000 1 R = 1 2 0 . 1 021 It is of interest to investigate the question of genetic stability under various circumstances. Say we begin with an individual of unknown genetic type and cross it with a dominant individual, then cross that offspring with another dominant individual, and so on, always mating the resulting offspring with a dominant individual. Let Vn represent the genetic probability vector for the offspring in the n th generation. Then Vn = DVn−1 = D2 Vn−2 = · · · = Dn V0 , where V0 is the unknown vector for the initial parent (of unknown genetic type). Without knowing V0 , can we predict the eventual (n → ∞ ) genetic types of the offspring? Intuitively, we expect that no matter what the type of the initial parent, the repeated mating with a dominant individual will produce a dominant strain. The question we are asking is, does lim Vn exist, and if so, what is it? n→∞ Assume for the moment that the limit does exist and denote it by V . Then V = DV since V = lim Vn = lim Vn+1 = lim DVn = D( lim Vn ) = DV n→∞ n→∞ n→∞ n→∞ Armed with the equation DV = V , we can solve linear equations for the vector V = (v1 , v2 , v3 ) 1 v1 + v2 + 0 = v1 2 1 0 + v2 + v3 = v2 2 0 + 0 + 0 = v3 . Clearly v1 = v2 = v3 = 0 is a trivial solution. A non-trivial one can be found by transposing the vj ’s to the left side and solving. We find v1 = 1, v2 = 0, v3 = 0(v1 = 1 since v1 + v2 + v3 = 1 ). Thus, if the limit Vn exists, the limit must be V = (1, 0, 0) . In genetic terms, this sustains our feeling that the offspring will eventually become genetically dominant. 5.4. AN APPLICATION TO GENETICS 245 But does the limit exist? To prove it does, we must show for any probability vector V0 = (v1 , v2 , v3 ) , where v1 + v2 + v3 = 1 , that the limit lim Vn = lim Dn V0 , n→∞ n→∞ exists and equals V = (1, 0, 0) . By evaluating D, D2 , and D3 explicitly, we are led to guess 1 1 − 21 1 − 2n1 1 n − 1 1 , Dn = 0 2n 2n−1 0 0 0 which is then easily verified using mathematical induction. 
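Before completing the computation, here is a quick numerical spot check of the guessed formula for D^n and of the convergence it suggests. The sketch is supplementary (NumPy, not part of the notes); the starting vector (0.2, 0.5, 0.3) is an arbitrary probability vector chosen only for illustration.

```python
import numpy as np

# Transition matrix for crossing with a dominant (GG) individual.
D = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0],
              [0.0, 0.0, 0.0]])

def D_power_formula(n):
    # The closed form for D^n guessed above.
    return np.array([[1.0, 1 - 2.0**-n, 1 - 2.0**-(n - 1)],
                     [0.0, 2.0**-n,     2.0**-(n - 1)],
                     [0.0, 0.0,         0.0]])

print(np.allclose(np.linalg.matrix_power(D, 6), D_power_formula(6)))  # True

# Repeated crossing drives any starting probability vector toward (1, 0, 0).
V = np.array([0.2, 0.5, 0.3])
for _ in range(10):
    V = D @ V
print(V)   # approximately [1, 0, 0]
```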
Thus v1 + (1 − 21 ) + (1 − 2n1 1 )v3 n − 1 + 2n1 1 v3 Vn = D n V0 = 0 + − 2n v2 0+ 0 +0 v1 + v2 + v3 − 1 = 2n v2 + 0 1 2n (v2 + 1 v 2n−1 3 2v3 ) Since v1 + v2 + v3 = 1 , we find 1 −1 1 Vn = Dn V0 = 0 + n (v2 + 2v3 ) 1 . 2 0 0 It is now clear that the limit as n → ∞ does exist, and is V = (1, 0, 0) . Consequently, if we begin with a random individual (you) and mate that individual and the successive offspring with a dominant gene bearer, then the resulting generations will tend to all dominant individuals. Moreover, the process proceeds exponentially because the “damping factor” is 1 essentially 2 for each generation (see above formula). Were there enough time, you would see a second application of matrices to the special theory of relativity. Given your knowledge of linear spaces, it is possible to present an elegant exposition of the theory. The Lorentz transformation would appear as an orthogonal transformation - a rotation - in world space or Minkowski’s space as it is often called. This is a four dimensional space three of whose dimension are those of ordinary space, while √ the fourth dimension is an imaginary (i = −1) time dimension. Goldstein’s Classical Mechanics contains the topic. Regrettably, he does not begin with the Michelson - Morley experiment but rather plunges immediately into mathematical technicalities. Exercises 1. If you begin with an individual of unknown genetic type and cross it with a hybrid individual and then cross the successive offspring with hybrids, does the resulting strain approach equilibrium? If so, what is it? 2. Same as 1 but you mate an individual of unknown type with a recessive individual. 3. Beginning with an individual of unknown genetic type, you mate it with a dominant individual, mate the offspring with a hybrid, mate that offspring with a dominant, and continue mating alternate generations with dominants and hybrids respectively. Does the 246 CHAPTER 5. MATRIX REPRESENTATION resulting strain approach equilibrium? If so, what is it? (You will need to define equilibrium to cope with this problem. There are several reasonable definitions.) 4. a). The city X has found that each year 5% of the city dwellers move to the suburbs, while only 1% of the suburbanites move to the city. Assuming the total population of the city plus suburb does not change, show that the matrix of transition probabilities is P= .95 .01 .05 .99 , where a vector V = (v1 , v2 ) = (proportion of people in city, proportion of people in suburb). b). Given any initial population distribution V , does the population approach an equilibrium distribution? If so, find it. 5. A long queue in front of a Moscow market in the Stalin era sees the butcher whisper to the first in line. He tells her “Yes, there is steak today.” She tells the one behind her and so on down the line. However, Moscow housewives are not reliable transmitters. If one is told “yes”, there is only an 80% chance she’ll report “yes” to the person behind her. On the other hand, being optimistic, if one hears “no”, she will report “yes” 40% of the time. If the queue is very long, what fraction of them will hear “there is no steak”? [This problem can be solved without finding a formula for P n , although you might find it a challenge to find the formula]. 5.5 A pause to find out where we are . We all know the homily about the forest and the trees. The next few pages are about the forest. In the beginning we introduced dead linear spaces with their algebraic structure (Chapter II). 
Then we investigated the geometry induced by defining an inner product on a linear space and saw how easily many of the results in Euclidean geometry generalize (Chapter III). Our next step was to consider mappings, linear mappings, between linear spaces (Chapter IV). Not much could be said in general, so we began investigating a particular case, linear maps between finite dimensional spaces. Two important special cases of this L : R1 → Rn , and L : Rn → R1 , were treated before the general case, L : Rn → Rm . A key theorem which facilitates the theory of linear mappings between finite dimensional spaces is the representation theorem (page 374): every such map can be represented as a matrix. What next? There are two equally reasonable alternatives: 5.5. A PAUSE TO FIND OUT WHERE WE ARE 247 (A) We can continue with linear maps, L : V1 → V2 , and consider the case where V1 or V2 , or both are infinite dimensional. The general theory here is in its youth and still undeveloped. Only one of the sources of difficulty is that a generalization of the representation theorem (page 374) remains unknown - except for some special cases. Thus, many special types of mappings have to be investigated individually. We shall consider only one type of linear mapping between infinite dimensional spaces, those defined by linear differential operators (Chapter VI and Chapter VII, Section 3). (B) The second alternative is to continue our study of mappings between finite dimensional spaces, only now switch to non linear mappings. This theory should parallel the transition in elementary calculus from the analytic geometry of straight lines, f (x) = a + bx, that is, affine mappings, to genuine non linear mappings, as √ f (x) = x2 − 7 x or f (x) = x3 − esin x . You recall, one important idea was to approximate the graph of a function y = f (x) at a point x0 by its tangent line at x0 , since for x near x0 , the curve and the tangent line there approximately agree. For example, one easily proves that at a maximum or minimum, the tangent line must be horizontal, f = 0 . In generalizing this to functions of several variables, Y = F (X ) = F (x1 , · · · , xn ), the role of the derivative at X0 is assumed by the affine map, A(X ) = Y0 + LX, which is tangent to F at X0 . Thus, linear algebra appears as the natural extension of analytic geometry to higher dimensional spaces. See Chapters VII - IX for this. 248 CHAPTER 5. MATRIX REPRESENTATION Chapter 6 Linear Ordinary Differential Equations 6.1 Introduction . A differential equation is an equation relating the values of a function u(t) with the values of its derivatives at a point, F (t, u(t), du dn u ,..., n ) = 0 dt dt (6-1) The order of the equation is the order, n , of the highest derivative which appears. For example, the equations 3 d2 u du −7 + t2 u2 − sin t = 0 dt2 dt du − t sin u2 = 0 dt are of order two and one respectively. A function u(t) is a solution of the differential equation if it has at least as many derivatives as the order of the equation, and if substitution of it into the equation yields an identity. Thus, the equation du dt 2 + u2 = 1 has the function u(t) = sin t as a solution, since for all t d sin t dt 2 + (sin t)2 = 1. A differential equation (1) for the unknown function u(t) is linear if it has the form Lu := an (t) dn u dn−1 + an−1 (t) n−1 + · · · + a0 (t)u = 0 dtn dt (6-2) You should verify that this coincides with the notion of a linear operator used earlier. 
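Both claims made in this introduction are easy to verify mechanically. The SymPy sketch below is supplementary to the notes: it checks that u(t) = sin t satisfies (du/dt)^2 + u^2 = 1, and it checks the linearity of one concrete operator of the form (2); the particular coefficients in L u = u'' + t^2 u' - u are made up for the example.

```python
import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta')
u = sp.Function('u')(t)
v = sp.Function('v')(t)

# The solution check from the text: u(t) = sin t satisfies (du/dt)^2 + u^2 = 1.
print(sp.simplify(sp.diff(sp.sin(t), t)**2 + sp.sin(t)**2))          # 1

# A concrete operator of the form (2), L u = u'' + t^2 u' - u, is linear:
L = lambda w: sp.diff(w, t, 2) + t**2 * sp.diff(w, t) - w
print(sp.simplify(L(alpha*u + beta*v) - (alpha*L(u) + beta*L(v))))   # 0
```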
Equation (2) is sometimes called linear homogeneous to distinguish it from the inhomogeneous equation Lu = f (t), (6-3) 249 250 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS that is an (t) dn u + · · · + a0 (t)u = f (t). dtn (6-4) The subject of this chapter is linear ordinary differential equations with variable coefficients (to distinguish them from the special case where the aj ’s are constants). This operator L defined by (2) has as its domain the set of all sufficiently differentiable functions— n derivatives is enough. These functions constitute an infinite dimensional linear space. Thus, the differential operator L acts on an infinite dimensional space, as opposed to a matrix which acts on a finite dimensional space. Differential equations abound throughout applications of mathematics. This is because most phenomena are described by laws which relate the rate of change of a function - the derivative - at a given time (or point) to the values of the function at that same time. For example, we have seen that at any time the acceleration of a harmonic oscillator is determined by its position and velocity at the same time, u = −µu − ku. ¨ ˙ When confronted by a differential equation, your first reaction should be to attempt to find the solution explicitly. We were able to do this for linear constant coefficient equations (Chapter 4, Section 2). One of the main goals of this chapter is to show you how to solve as many linear ordinary differential equations as possible. However, it is naive to expect to solve an arbitrary equation which crops up in terms of the few functions we know: xα , ex , log x, sin x , and cos x . In fact, to even solve the elementary equation 1 du =, dx x appearing in elementary calculus, we were forced to define a new function as the solution of this equation u(x) = log x + c and obtain the properties of this function and its inverse ex directly from the differential equation. Many many functions arise which cannot be expressed in terms of the few elementary functions we know and love. Most of these functions - like Bessel’s functions, elliptic functions, and hypergeometric functions, arise directly because they are the solutions of differential equations nature has forced us to consider. How do we know these strange sounding functions are solutions of the differential equations? Well, we somehow prove a solution exists and then simply give a name to the solution - much as babies are given names at birth. Furthermore, as is the case with babies, their actual “names” are the least important aspect. To summarize briefly, we shall solve as many equations as we can. For the remaining ones (which include most equations), we shall attempt to describe a few of the main properties so that if one arises in your work, you will have a place to begin the attack. Later on, we shall again return to the more complicated situation of nonlinear equations. Much less can be said there. Only very few general results are known. Lest you get the wrong idea, we shall cover but a fraction of the known theory for just linear ordinary differential equations. In the next chapter, we shall only look at one partial differential equation (the wave equation for a vibrating violin string). The general theory there is too complicated to allow discussion for more than one particular equation. 6.1. INTRODUCTION 251 Exercises 1. Assume there exists a unique function E (x) which satisfies the following differential equation for all x and satisfies the initial condition du = u, dx u(0) = 1. 
(a) Use the “chain rule” and uniqueness to prove for any a ∈ R E (x + a) = E (a)E (x) ˜ [Hint: Prove E (x) := E (x + a) is also a solution of the equation. Then apply the ˜ uniqueness to the function E (x)/E (a) ]. (b) Prove E (−x) = 1 . E (x) (c) Prove for any x E (nx) = [E (x)]n , n ∈ Z. In particular, show E (n) = [E (1)]n , and E( n∈Z 1 ) = [E (1)]1/m , m m ∈ Z+ (d) Prove n ) = [E (1)]n/m , n ∈ Z, m ∈ Z+ m n [Thus, the function E (x) is defined for all rational x = m as the number E (1) to the power n/m . Since E (x) is continuous (even differentiable by definition, we can extend the last formula to irrational x by continuity: if rj is a sequence of rational numbers converging to the real number x (which may or may not be rational) then by continuity E (x) = lim E (rj ) = lim [E (1)]r j = E (1)x . E( j →∞ j →∞ Consequently, E (x) is the familiar exponential function ex ]. 2. Find the general solutions of the following equations by any method you can. (a) du dx − 2u = 0 (b) du dx = x2 + sin x (c) du 2 dx + 4u2 = 1 (d) du dx = (e) du dx = x2 eu (f) d2 u dx2 x u+1 + 3 du − 4u = 4 dx 252 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS 6.2 First Order Linear . Except for those differential equations which can be solved by inspection, the next most simple equation is one which is linear and first order, the homogeneous equation du + a(x)u = 0, dx (6-5) du + a(x)u = f (x). dx (6-6) and the inhomogeneous equation The homogeneous equation can be solved by first writing it in the form 1 du = −a(x) u dx and then integrating both sides x log u(x) = − a(s) ds + C1 . Thus x u(x) = Ce − a(s) ds (6-7) is the solution of equation (4) for any constant C . In the very special case a(s) ≡ constant, the solution does have the form found earlier (Chapter 4, Section 2) for a linear equation with constant coefficients. How can we integrate the inhomogeneous equation (5)? A useful device is needed. Multiply both sides of this equation by an unknown function q (x) q (x) du + q (x)a(x)u = q (x)f (x), dx If we can find q (x) so that the left side is a derivative, q (x) du d + q (x)a(x)u = (q (x)u), dx dx (6-8) then the equation reads d (q (x)u) = q (x)f (x), dx which can be integrated immediately, x q (x)u(x) = q (s)f (s) ds + c, (6-9) and then solved for u(x) by dividing by q (x) . Thus, the problem is reduced to finding a q (x) which satisfies (7). Evaluating the right side of (7), we find du dq du q + qa u = u +q , dx dx dx 6.2. FIRST ORDER LINEAR 253 so q (x) must satisfy dq = q (x)a(x). dx It is easy to find a function q (x) which satisfies this - for it is a homogeneous equation of the form (4). Therefore Rx q (x) = e a(t) dt , the reciprocal of the solution (6) to the homogeneous equation, does satisfy (7). Notice we have ignored the arbitrary constant factor in the solution since all we want is any one function q (x) for (7). Now we can substitute into (8) to find the solution of the inhomogeneous equation u(x) = x 1 q (x) q (s)f (s) ds + c , q (x) (6-10) where q (x) is given by the formula at the top of the page. If it makes you happier, substitute the expression for q (x) into (9) to obtain the messy formula. We have left some room. a figure goes here Examples: 1. First, du dx 2 + x u = (1 + x3 )17 , x q (x) = exp( Thus x = 0. 2 ds) = exp(2 ln x) = exp(ln x2 ) = x2 . s d2 (x u) = x2 (1 + x3 )17 . dx Integrating both sides we find x2 u(x) = Therefore u(x) = 2. First, du dx (1 + x3 )18 + C. 54 1 (1 + x3 )18 C + 2, 54 x2 x x = 0. + 2xu = x x q (x) = exp( 2s ds) = exp x2 . Thus, d x2 2 (e u) = ex x. 
dx Integrating both sides, we find 12 2 ex u(x) = ex + C, 2 so 1 2 + Ce−x . 2 This formula could have been guessed much earlier since we know the general solution of the inhomogeneous equation can be expressed as the sum of a particular solution to that u(x) = 254 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS equation plus the general solution of the homogeneous equation. The particular solution u0 (x) = 1 can be obtained by inspection of the D.E. 2 Let us summarize our results. Theorem 6.1 . Consider the first order linear inhomogeneous equation Lu := du + a(x)u = f (x). dx If a(x) and f (x) are continuous functions, the equation has the solutions x u(x) = u(x) ˜ f (s) ds + C u(x) ˜ u(s) ˜ where (6-11) x u(x) = exp(− ˜ a(s) ds) is a non-trivial solution of the homogeneous equation. Moreover, if we specify the initial condition u(x0 ) = α , then the solution which satisfies this initial condition is unique. Proof: The existence follows from the explicit formula (9) or (10) and from the fact that a continuous function is always integrable. Uniqueness. This will be quite similar to the proof carried out in Chapter 4. If u1 (x) and u2 (x) are two solutions of the inhomogeneous equation Lu = f , with the same initial conditions, then the function w(x) := u1 (x) − u2 (x) satisfies the homogeneous equation Lw := w + a(x)w = 0, and is zero at x0 , w(x0 ) = u1 (x0 ) − u2 (x0 ) = 0. Our task is to prove w(x) ≡ 0 . Multiply the equation (20) by w(x) . Then ww = −a(x)w2 , or 1d 2 w = −a(x)w2 . 2 dx Since a(x) is continuous, for any closed and bounded interval [A, B ] there is a constant k (depending on the interval) such that −a(x) ≤ k for all x ∈ [A, B ] . Consequently, 1d 2 w ≤ kw2 , 2 dx or d2 w − 2kw2 ≤ 0. dx Now we need an important identity which can be verified by direct computation: for any smooth function g , and any constant α , g + αg = e−αx (eαx g ) . We apply this to the above inequality with g = w2 and α = −2k to conclude that e2kx d −2kx 2 [e w ] ≤ 0. dx 6.2. FIRST ORDER LINEAR 255 Because e2kx is always positive, by the mean value theorem this inequality states that e−2kx w2 is a decreasing function of x . Thus e−2kx w2 (x) ≤ e−2kx0 w2 (x0 ), x ≥ x0 , or w2 (x) ≤ e2k(x−x0 ) w2 (x0 ), x ≥ x0 . But since w(x0 ) = 0 and w2 (x) ≥ 0 this means that 0 ≤ w2 (x) ≤ 0. Therefore w(x) ≡ 0 x ≥ x0 . To prove w(x) ≡ 0 for x ≤ x0 , merely observe that the equation (11) has the same form if x is replaced by −x . Thus the above proof applies and shows w(x) ≡ 0 for x ≤ x0 too. Remark: Although a formula has been exhibited for the solution, this does not mean that the integrals which occur can be evaluated in terms of elementary functions. These integrals however can be at least evaluated approximately using a computer if a numerical result is needed. Exercises (1) . Find the solution of the following equations with given initial values (a) u + 7u = 3, (b) 5u − 2u = u(1) = 2 e3x , u(0) = 1. 2x2 , u(−1) = 0. (d) xu + u = 4x3 + 2, u(1) = −1. (c) 3u + u = x − (e) u + (cot x)u = ecos x + 1, u( π ) = 0. [ cot x dx = ln(sin x)] . 2 (2) . The differential equation L du + Ru = E sin ωt, L, R, E constants dt arises in circuit theory. Find the solution satisfying u(0) = 0 and show that it can be written in the form u(t) = R2 ωEL E e−Rt/L + √ sin(ωt − α) 2 L2 +ω R2 + ω 2 L2 where tan α = ωL . R (3) Bernoulli’s equation is u + a(x)u = b(x)uk , k a constant. 256 CHAPTER 6. 
LINEAR ORDINARY DIFFERENTIAL EQUATIONS (a) Use the substitution v (x) = u(x)1−k to transform this nonlinear equation to the linear equation v + (1 − k )a(x)v = (1 − k )b(x). (b) Apply the above procedure to find the general solution of u − 2ex u = ex u3/2 . (4) . Consider the equation u + au = f (x), where a is a constant, f is continuous in the interval [0, ∞] , and |f (x)| < M for all x . (a) Show that the solution of this equation is x u(x) = e−ax u(0) + e−ax eat f (t) dt 0 (b) Prove (if a = 0 ) u(x) − e−ax u(0) ≤ M [1 − e−ax ]. a (5) (a) Show the uniqueness proof yields the following stronger fact. If u1 (x) and u2 (x) are both solutions of the same equation u + a(x)u = f (x) but satisfy different initial conditions u1 (x0 ) = α, u2 (x0 ) = β, then |u1 (x) − u2 (x)| ≤ ek(x−x0 ) |α − β | , x ≥ x0 for all x ∈ [A, B ] , where −a(x) < k in the interval. Thus, if the initial values are close, then the solutions cannot get too far apart. (b) Show that if a(x) ≤ A < 0 , where A is a constant, then as x → 0 any two solutions of the same equation - but with possibly different initial values - tend to the same function. (6) . Show that the differential equation y = a(x)F (y ) + b(x)G(y ) can be reduced to a linear equation by the substitution u = F (y )/G(y ) or u = G(y )/F (y ) if (F G − GF )/G or (F G − GF )/F , respectively, is a constant. Use this substitution to again solve Bernoulli’s equation. 6.2. FIRST ORDER LINEAR 257 (7) . Let S = { u ∈ C : u(0) = 0 } , and define the operator L from S to C by Lu = u + u. Prove L is injective and R(L) = C . (8) . Set up the differential equation and solve. The rate of growth of a bacteria culture at any time t is proportional to the amount of material present at that time. If there was one ounce of culture in 1940 and 3 ounces in 1950, find the amount present in the year 2000. The doubling time is the interval it takes for a given amount to double. Find the doubling time for this example. (9) . Find the general solution of x2 u + 3xu = sin x . (10) . Assume that a body decreases its temperature u(t) at a rate proportional to the difference between the temperature of the body and the temperature T of the surrounding air. A body originally at a temperature of 1000 is placed in air which is kept at a temperature of 500 . If at the end of one hour the temperature of the body has fallen 200 , how long will it take for the body to reach 600 ? (11) . Here is one simple mathematical model governing economic behavior. Think of yourself as a widget manufacturer for now. Let i) S (t) be the supply of widgets available at time t . This is the only function you can control directly. ii) P (t) be the market price of a widget at time t . iii) D(t) is the demand for widgets at time t —the number of widgets people want to buy at time t . You cannot control this given function. It has been found that the market price P (t) changes at a rate proportional to the difference between demand and supply, dP = k (D(t) − S (t)), dt where k > 0 is a fixed constant. You decide to vary the supply so that it is a fixed constant S0 plus an amount proportional to the market price, S (t) = S0 + αP (t), α > 0. (a) Set up the differential equation for S (t) in terms of the given function D(t) and solve it. (b) Analyze the solution and give an argument making it plausible that the market for widgets behaves roughly in this way. What criticisms can you make of the model? 
(c) How does the market behave if the demand increases for a long time and then levels off at some constant value, D(t) = D(t1 ) for t ≥ t1 ? A qualitative description of S (t) and P (t) is called for here. In particular, say whether price increases without bound (bringing the evils of inflation) or whether it, too, levels off. 258 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (12) It is found that a juicy rumor spreads at a rate proportional to the number of people who “know”. If one person knows initially, t = 0 , and tells one other person by the next day, t = 1 , approximately how long does it take before 4000 people know? Analyze the mathematical model as t → ∞ and state why it is, in fact, the wrong model. (The question to ask yourself is, “how long will it take before everyone even remotely concerned knows?”). The same mathematical model applies to the spreading of contagious diseases - and many other similar phenomena. 6.3 Linear Equations of Second Order In this section we will consider a portion of the general theory of second order linear O.D.E.’s, with variable coefficients, Lu := a2 (x) du d2 u + a1 (x) + a0 (x)u = f (x). dx2 dx Although all of the results obtained generalize immediately to linear equation of order n , only the special case n = 2 will be treated. This special case has the advantage of clearly illustrating the general situation and supplying proofs which generalize immediately - while avoiding the inevitable computation complexities inherent in the general case. There are three parts: A). a review of the constant coefficient case, B). power series solutions, and C). the general theory. Whereas the first two parts are concerned with obtaining explicit formulas for the solutions, the last resigns itself to some statements which can be made without finding the solution explicitly. a) A Review of the Constant Coefficient Case. Here we have the operator Lu := a2 u + a1 u + a0 u, (6-12) where a0 , a1 , and a2 are constants. In order to solve the homogeneous equation Lu = 0, the function eλx is tried. Substitution yields L(eλx ) = (a2 λ2 + a1 λ + a0 )eλx = p(λ)eλx . (6-13) labeleq:13 The polynomial p(λ) is called the characteristic polynomial for L . If λ1 is a root of this polynomial, p(λ1 ) = 0 , then u1 (x) = eλ1 x is a solution of the homogeneous equation Lu = 0 . If λ2 is another root of this polynomial λ1 = λ2 , u2 (x) = eλ2 x is another solution. Then every function of the form u(x) = Au1 (x) + Bu2 (x) = Aeλ1 x + Beλ2 x , (6-14) where A and B are constants, is a solution of the homogeneous equation. The uniqueness theorem showed that every solution of Lu = 0 is of the form (14). 6.3. LINEAR EQUATIONS OF SECOND ORDER 259 If the two roots of p(λ) coincide, then a second solution is u2 (x) = xeλ1 x , and every function of the form u(x) = Au1 (x) + Bu2 (x) = Aeλ1 x + Bxeλ1 x (6-15) where A and B are constants, is a solution of the homogeneous equation. Again the uniqueness theorem showed that every solution of Lu = 0 is of the form (15). In both (14) and (15), the constants A and B can be chosen to find a unique function u(x) which satisfies the homogeneous equation Lu = 0 as well as the initial conditions u(x0 ) = α, u (x0 ) = β, where α and β are specified constants. It turns out that the inhomogeneous O.D.E. Lu = f, where f is a given continuous function, can always be solved once two linearly independent solutions u1 and u2 of the homogeneous equation Luj = 0 are known. 
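As a computational aside (a sketch in Python with sympy; the particular equation u'' + 3u' + 2u = 0 is chosen only for illustration and is not taken from the text), one can let the machine find the characteristic roots and confirm that the two-parameter family (14) built from them does satisfy the homogeneous equation.

    import sympy as sp

    x, lam = sp.symbols('x lam')

    # Characteristic polynomial of Lu = u'' + 3u' + 2u.
    p = lam**2 + 3*lam + 2
    roots = sp.solve(p, lam)              # [-2, -1]

    # General solution A*exp(l1*x) + B*exp(l2*x), as in (14).
    A, B = sp.symbols('A B')
    u = A*sp.exp(roots[0]*x) + B*sp.exp(roots[1]*x)

    # Verify that Lu = 0 identically.
    Lu = sp.diff(u, x, 2) + 3*sp.diff(u, x) + 2*u
    print(sp.simplify(Lu))                # prints 0

    # dsolve reproduces the same two-parameter family.
    f = sp.Function('f')
    print(sp.dsolve(f(x).diff(x, 2) + 3*f(x).diff(x) + 2*f(x), f(x)))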
Since the procedure for solving the inhomogeneous equation also works if the coefficients in the differential operator L are not constant, it is described later in this section in the more general situation (p. 487-8, Theorem 8). Somewhat simpler techniques can be used for the constant coefficient equation if the function f is a linear combination of functions of the form xk erx , where k is a nonnegative integer and r is some real or complex constant (cf. Exercise 6, p. 300). Because both sin nx and cos nx are of this form, Fourier series can be used to supply a solution for any function f which has a convergent Fourier series (cf. Exercise 13, p. 303). Section 5 of this chapter contains an interesting generalization of the theory for constant coefficient ordinary differential operators to operators which are “translation invariant”. b) Power Series Solutions . Many ordinary differential equations (linear and nonlinear) can be solved by merely assuming the solution can be expanded in a power series u(x) = cn xn , and plugging into the differential equation to find the coefficients cn . A simple example illustrates this. Example: Solve u − 2xu = 0 with the initial conditions u(0) = 1, u (0) = 0 . Solution: We try u(x) = c0 + c1 x + c2 x2 + · · · + cn xn + · · · . Then u (x) = c1 + 2c2 x + 3c3 x2 + · · · + ncn xn−1 + · · · so 2xu (x) = 2c1 x + 4c2 x2 + · · · + 2ncn xn + · · · 260 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS Also u (x) = 2c2 + 2 · 3c3 x + 3 · 4c4 x2 + · + (n − 1)ncn xn + · · · Adding u − 2xu − u and collecting like powers of x we find that 0 = u − 2xu − u = [2c2 − c0 ] + [2 · 3c3 − 2c1 − c1 ]x + [3 · 4c4 − 4c2 − c2 ]x2 + · · · + [(k + 1)(k + 2)ck+2 − 2kck − ck ]xk + · · · If the right side, a Taylor series, is to be zero (= the left side), then the coefficient of each power of x must vanish because the only convergent Taylor series for zero is zero itself. The coefficient of x0 is 2c2 − c0 x1 is 6c3 − 3c1 x2 is 12c4 − 5c2 xk is (k + 1)(k + 2)ck+2 − (2k + 1)ck Equating these to zero we find that c2 = c0 , 2 c3 = c1 , 2 c4 = 5c2 5 = c0 12 24 and, more generally, ck+2 = 2k + 1 ck . (k + 2)(k + 1) (6-16) Thus, for this example eeven is some multiple of c0 while codd is some multiple of c1 . Since u(0) = c0 and u (0) = c1 , the constants c0 and c1 are determined by the initial conditions. c0 = 1, c1 = 0. Consequently, all of the odd coefficients c3 , c5 , . . . vanish, while 1 c2 = , 2 c4 = 5 , 24 c6 = 3 1 c4 = , 10 16 c8 = . . . , so the first few terms in the series for u(x) are 1 5 1 u(x) = 1 + x2 + x4 + x6 + · · · 2 24 16 (6-17) We should investigate if this formal power series expansion converges. Using (16), the ratio of successive terms in the series for u(x) is ck+2 xk+2 (2k + 1) x2 = k (k + 2)(k + 1) ck x Therefore the ratio test shows the formal power series actually converges for all x . By Theorem 16, p. 82, the series can be differentiated term by term and does satisfy the equation. Although the computation is lengthy, the series (17) is a solution. Since there is no way of finding the solution in terms of elementary functions, we must be contented with the power series solution. You have seen (Chapter 1, Section 7) how properties of a function can be extracted from a power series definition. This example is typical. 6.3. LINEAR EQUATIONS OF SECOND ORDER 261 Theorem 6.2 . 
If the differential equation a2 (x)u + a1 (x)u + a0 (x)u = 0 has analytic coefficients about x = 0 , that is, if the coefficients all have convergent Taylor series expansions about x = 0 , and if a2 (0) = 0 , then given any initial values u(0) = α, u (0) = β, there is a unique solution u(x) which satisfies the equation and initial conditions. Moreover, the solution is analytic about x = 0 and converges in the largest interval [−r, r] in which the series for a1 /a2 and a0 /a2 both converge. Outline of Proof. There are two parts: i) find a formal power series u(x) = cn xn , and ii) prove the formal power series converges. Since explicit formulas can be found for the cn ’s (cf. Exercise 30a) the first part is true. Proof of the second part is sketched in the exercises too (Exercise 30b). From the explicit formulas mentioned above for the cn ’s, it is clear there is at most one analytic solution. But because the general uniqueness proof (p. 510, Theorem 9) states there is at most one solution which is twice differentiable and since u(x) is certainly such a function - the uniqueness of u(x) among all twice differentiable functions follows as soon as Theorem 9 is proved. The restriction a2 (0) = 0 which was made in Theorem 3 is very important. If a2 (0) = 0 then the differential equation a2 (x)u + a1 (x)u + a0 (x)u = 0 is degenerate at x = 0 because the coefficient of the highest order derivative vanishes there. Then the point x = 0 is called a singularity of the differential equation. A simple example illustrates the situation. The function u(x) = x5/2 satisfies the differential equation 4x2 u − 15u = 0 and the initial conditions u(0) = 0, u (0) = 0 . However u(x) ≡ 0 is also a solution. Thus it will be impossible to prove any uniqueness theorem at x = 0 for this equation. Perhaps the singular nature of this equation at x = 0 is more vivid if the equation is written as u− 15 u = 0. 4x2 Although the possibility of a uniqueness result is ruled out for equations with singularities, it is important to be able to find the non-zero solutions of these equations, important because many of the equations which arise in practice do happen to have singularities (Bessel’s equation, Legendre’s equation, the hypergeometric equation, . . . ). In all of the commonly occurring cases, the coefficients a0 (x), a1 (x) , and a2 (x) , a2 u + a1 u + a0 u = 0, are analytic functions. Thus the only obstacle to applying Theorem 3 is the condition a2 = 0 . We persist, however, in the belief that a power series, or some modification of it, 262 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS should work. The modification must allow for such solutions as u(x) = x3/2 which do not have Taylor expansions about x = 0 . Undoubtedly the most naive candidate for a solution is to try ∞ u(x) = xρ cn xn , (6-18) n=0 where ρ may be any real number. The particular choice ρ = 3/2, c0 = 1, c1 = c2 = c3 = . . . = 0 does yield the function u(x) = x3/2 . It turns out that (18) is usually the correct guess. Again, we turn to an example. Bessel’s equation of order n , x2 u + xu + (x2 − n2 )u = 0, which arises in the study of waves in a two dimensional circular domain, like those on tympani, in a tea cup, or on your ear drum. Let us find a solution to Bessel’s equation of order one, x2 u + xu + (x2 − 1)u = 0 (6-19) This equation does have a singularity at the origin, x = 0 . If u has the form (18), then ∞ cn xn+ρ , u(x) = n=0 ∞ (n + ρ)cn xn+ρ−1 , u (x) = n=0 and ∞ (n + ρ)(n + ρ − 1)cn xn+ρ−2 . 
u (x) = n=0 Substituting this into the differential equation (19), we find ∞ ∞ (n + ρ)(n + ρ − 1)cn xn+p + n=0 (n + ρ)cn xn+ρ n=0 ∞ ∞ cn xn+ρ+2 − + n=0 cn xn+ρ = 0. (6-20) n=0 We must equate the coefficients of successive powers of x to zero. The lowest power of x which appears is xρ , the next xρ+1 , and so on. xρ : xρ+1 : xρ+2 : xρ+3 : · · · · ρ+ n : x ρ(ρ − 1)c0 + ρc0 − c0 = 0 (ρ + 1)ρc1 + (ρ + 1)c1 − c1 = 0 (ρ + 2)(ρ + 1)c2 + (ρ + 2)c2 + c0 − c2 = 0 (ρ + 3)(ρ + 2)c3 + (ρ + 3)c3 + c1 − c3 = 0 (ρ + n)(ρ + n − 1)cn + (ρ + n)cn + cn−2 − cn = 0. 6.3. LINEAR EQUATIONS OF SECOND ORDER 263 From the equation for the power xρ , we find (ρ2 − 1)c0 = 0 The polynomial q (ρ) = ρ2 − 1 which appears in the coefficient of the lowest power of x in (20) is called the indicial polynomial since it will be used to determine the index ρ . If c0 = 0 , the equation (ρ2 − 1)c0 = 0 can be satisfied only if ρ is a root of the indicial polynomial. Thus ρ1 = 1, ρ2 = −1 . Consider the largest root ρ1 = 1 . Then the equation for the coefficients of xρ+1 in (20) is xρ+1 = x2 : 3c1 = 0 ⇒ c1 = 0, while the equation for the coefficient of xρ+n in (20) is xρ+n = x1+n : (n + 1)ncn + (n + 1)cn + cn−2 − cn = 0, or cn = − cn−2 , n(n + 2) n = 2, 3, . . . Since c1 = 0 , this equation implies codd = 0 and determines the ceven in terms of c0 , c2 = − c0 , 2·4 c2k = c4 = − c2 c0 = , 4·6 2 · 42 · 6 c6 = − c4 c0 = 6·8 2 · 42 · 62 · 8 (−1)k c0 (−1)k c0 . = 2k 2 · 42 · 62 · 82 · · · (2k )2 (2k + 2) 2 k !(k + 1)! Thus, the formal series we find for the solution, J1 (x) , of the Bessel equation of first order corresponding to the largest indicial root, ρ1 = 1 is x2 x4 1 J1 (x) = x1 (1 − + − ···) 2 2 · 4 2 · 42 · 6 or 1 J1 (x) = x 2 ∞ k=0 (−1)k x2k 22k k !(k + 1)! (6-21) 1 since it is customary to choose the constant c0 for J1 (x) as c0 = 2 (and the constant c0 n n! when n is a positive integer). for Jn (x) as 1/2 The other (smaller) root, ρ2 = −1 , is much more difficult to treat. If the above steps are imitated (which you should try), division by zero needed to solve for c2 from c0 . It turns out that the solution corresponding to the smaller root ρ2 = −1 is not of the form (18). We shall not enter into this matter further except to note that the difficulty occurs because the two roots ρ1 and ρ2 differ by an integer. If the two roots ρ1 and ρ2 do not differ by an integer, the above method yields two different solutions of the form (18) for the equation. In any case, this method always gives a solution of the form (18) for the largest root of the indicial equation. It is easy to check that the power series (21) does converge for all x and is therefore a solution to Bessel’s equation of the first order. From the power series, with considerable effort one can obtain a series of identities for Bessel functions which exactly parallels those for the trigonometric functions. The functions Jn (x) behaving in many ways similar to sin nx or cos nx . Here is a graph of J1 (x) : 264 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS a figure goes here For x very large, J1 (x) is asymptotically 2 cos(x − 3π/4) √ , π x √ which is a cosine curve whose amplitude decreases like 1/ x . For good reason this curve resembles the height of surface waves on a lake after a pebble has been dropped into the water, or those on the surface of a cup of tea. Having worked out this example in detail, we shall state a definition in preparation for our theorem. 
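Since the partial sums of (21) are easy to compute, it is pleasant to check them numerically against a library implementation of J1. The sketch below (Python; the availability of the scipy library is an assumption outside these notes) sums the first dozen terms of (21), with the customary choice c0 = 1/2, and compares with scipy.special.j1 at a few points.

    from math import factorial
    from scipy.special import j1

    def J1_series(x, terms=12):
        # Partial sum of (21): sum of (-1)^k (x/2)^(2k+1) / (k! (k+1)!).
        total = 0.0
        for k in range(terms):
            total += (-1)**k * (x / 2)**(2*k + 1) / (factorial(k) * factorial(k + 1))
        return total

    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, J1_series(x), j1(x))   # the two columns agree closely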
Definition: The differential equation J1 (x) ∼ a2 (x)u + a1 (x)u + a0 (x)u = 0, where the aj (x) are analytic about x = 0 , it has a regular singularity at x = 0 if it can be written in the form x2 u + A(x)xu + B (x)u = 0, where the functions A(x) and B (x) are analytic about x = 0 . Otherwise the singularity is irregular. Examples: (1) . x2 (1 + x)u + 2(sin x)u − ex u = − has a regular singularity at x = 0 since the equation may be written as x2 u + −ex 2(sin x) u− u = 0, 1+x 1+x where the coefficients 2 sin x/(1 + x)x and ex /1 + x do have convergent Taylor series 2 about x = 0 . (Here we observed that sin x = 1 − x + · · · ). x 3! (2) . xu − 7u + the form 3 cos x u = 0 has a regular singularity at x = 0 since it can be written in 3x u = 0, cos x where the coefficients −7 and 3x/ cos x are analytic about x = 0 . x2 u − 7xu + (3) . x2 u − 2u + xu = 0 has an irregular singularity at x = 0 since it cannot be written in the desired form. (4) . x3 u − 2xu + u = 0 has an irregular singularity at x = 0 . Theorem 6.3 . (Frobenius) Consider the equation with a regular singularity at x = 0 a2 (x)u + a1 (x)u + a0 (x)u = 0, so it can be written in the form x2 u + A(x)xu + B (x)u = 0, 6.3. LINEAR EQUATIONS OF SECOND ORDER 265 where the analytic function A(x) and B (x) have convergent power series for |x| < r . Let ρ1 and ρ2 be the roots of the indicial polynomial q (ρ) = ρ(ρ − 1) + A(0)ρ + B (0), where ρ1 ≥ ρ2 (or Reρ1 ≥ Reρ2 if roots are complex). Then the differential equation has one solution u1 (x) of the form ∞ ρ1 cn xn u1 (x) = x (c0 = 0), n=0 the series converging for all |x| < r . Moreover, if ρ1 − ρ2 is not an integer (or zero), there is a second solution u2 (x) of the form ∞ ρ2 cn xn ˜ u2 (x) = x (˜0 = 0), c n=0 where this series also converges in the interval |x| < r . In the special case ρ1 − ρ2 = integer, there may not be a solution of the form (18) - see Exercise 19c. Notice: although the power series do converge at x = 0 , the functions u1 (x) and u2 (x) may not be solutions 1 at that point because the functions xρ may not be twice differentiable (for example, if ρ = 2 √ then x has no derivatives at x = 0 ). Outline of Proof. Like Theorem 2, this proof also has two parts; i) finding the coefficients cn for the formal power series, and ii) proving the formal power series converges. As in Theorem 3, part i) is proved by exhibiting formulas for the cn ’s, while part ii) is proved by comparing the series cn xn with another convergent series Cn xn whose coefficients are larger, |cn | ≤ Cn . To illustrate the procedure of part i), we will obtain the stated formula for the indicial ∞ ∞ αn xn and B (x) = polynomial q (ρ) . Let A(x) = n=0 βn xn be the power series expann=0 sions of A(x) and B (x) . Then assuming u(x) has a solution in the form (18), we find by substituting these formulas into the differential equation that ∞ ∞ (ρ + n)(ρ + n − 1)cn xρ+n + ( n=0 n=0 ∞ +( (ρ + n)cn xρ+n ) n=0 ∞ cn xρ+n ) = 0. n βn x )( n=0 The lowest power of x appearing is ∞ αn xn )( n=0 xρ , then comes xρ+1 , . . . . xρ : ρ(ρ − 1)c0 + α0 ρc0 + β0 c0 = 0 xρ+1 : (ρ + 1)ρc1 + [α1 ρc0 + α0 (ρ + 1)c1 ] + [β1 c0 + β0 c1 ] = 0 · · · n xρ+n : (ρ + n)(ρ + n − 1)cn + n αn−k [(ρ + k )ck ] + k=0 βn−k ck = 0, k=0 266 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS the last formula arising from the formula for the coefficients in the product of two power series (p. 76). If c0 = 0 , the first equation states q (ρ) := ρ(ρ − 1) + α0 ρ + β0 = 0, where q (ρ) is the indicial polynomial. 
Since α0 = A(0) and β0 = B (0) , this is precisely the formula given in the theorem. c) General Theory We begin immediately by stating Theorem 6.4 (Existence and Uniqueness). Consider the second order linear O.D.E. Lu := a2 (x)u + a1 (x)u + a0 (x)u = f (x), where the coefficients a0 , a1 , and a2 as well as f are continuous functions, and a2 (x) = 0 . There exists a unique twice differentiable function u(x) which satisfies the equation and the initial conditions u(x0 ) = α, u (x0 ) = β, where α and β are arbitrary constants. If time permits, the existence proof will be carried out in the last chapter as a special case of a more general result. The uniqueness will be proved later too, as a special case of Theorem 9, page 510 - in the next section. We will not be guilty of circular reasoning. Now what? Although this theorem appears to make further study unnecessary, there are several general statements which can be made because the equation is linear. Two other theorems are particularly nice; the first is dim N(L) = 2 , while the second gives a procedure for solving the inhomogeneous equation once two linearly independent solutions of the homogeneous equation are known. A preliminary result on linear dependence and independence of functions is needed. If the differentiable functions u1 (x) and u2 (x) are linearly dependent, there are constants c1 and c2 not both zero such that c1 u1 (x) + c2 u2 (x) ≡ 0. Differentiating this equation, we find c1 u1 (x) + c2 u2 (x) ≡ 0. Since the two homogeneous algebraic equations for c1 and c2 have a non-trivial solution, by Theorem 32 (page 428), the determinant W (x) := W (u1 , u2 )(x) := u1 (x) u2 (x) u1 (x) u2 (x) =0 must vanish. This determinant is called the Wronskian of u1 and u2 . We have proved 6.3. LINEAR EQUATIONS OF SECOND ORDER 267 Theorem 6.5 . If the differentiable functions u1 (x), u2 (x) are linearly dependent in the interval [α, β ] , then necessarily W (x) ≡ 0 throughout [α, β ] . Thus, if W = 0 , the uj ’s are independent. Remark: The condition W = 0 is necessary for linear dependence but not sufficient in general, as can be seen from the example u1 (x) = x2 , x ≥ 0, 0 , x<0 u2 (x) = 0 , x≥0 x2 , x < 0, for which W (u1 , u2 ) ≡ 0 for all x but u1 and u2 are linearly independent. However it is sufficient if u1 and u2 are solutions of a second order linear O.D.E., Luj = 0 . An even stronger statement is true in this case. All we need require is that W vanish at one point x0 . Theorem 6.6 . Let u1 and u2 both be solutions of Lu := a2 u + a1 u + a0 u = 0, where a2 = 0 . If W (x0 ) = 0 at some point x0 , then u1 and u2 are linearly dependent which implies by Theorem 6 that W (x) ≡ 0 for all x . In other words, if W (x0 ) = 0 , then u1 and u2 are linearly independent. Proof: Since W (x0 ) = 0 , the homogeneous algebraic equations c1 u1 (x0 ) + c2 u2 (x0 ) = 0 c1 u1 (x0 ) + c2 u2 (x0 ) = 0 have a non-trivial solution c1 , c2 . Let v (x) = c1 u1 (x) + c2 u2 (x). We went to prove v (x) ≡ 0 . Observe Lv = 0 . Moreover v (x0 ) = 0 and v (x0 ) = 0 . Thus by uniqueness, v (x) ≡ 0 , establishing the linear dependence of u1 and u2 . The same type of reasoning proves Theorem 6.7 . Let Lu := a2 u + a1 u + a0 u , where a2 (x) = 0 . Then dim N(L) = 2. Proof: We exhibit two special solutions φ1 and φ2 of Lu = 0 and prove they constitute a basis for N(L) . Let φ1 (x) satisfy Lφ1 = 0 with φ1 (x0 ) = 1, φ1 (x0 ) = 0 φ2 (x) satisfy Lφ2 = 0 with φ2 (x0 ) = 0, φ2 (x0 ) = 1. There are such functions by the existence theorem. 268 CHAPTER 6. 
LINEAR ORDINARY DIFFERENTIAL EQUATIONS i) They are linearly independent. W (x0 ) = W (φ1 , φ2 )(x0 ) = φ1 (x0 ) φ2 (x0 ) φ1 (x0 ) φ2 (x0 ) = 10 01 = 1 = 0. Thus by Theorem 7, φ1 and φ2 are linearly independent. ii) They span N(L) . Let u(x) be any element in N(L) and consider the function v (x) = u(x) − [u(x0 )φ1 (x) + u (x0 )φ2 (x)]. Then Lv = 0 and v (x0 ) = 0, v (x0 ) = 0 . By uniqueness, v (x) ≡ 0 . Thus every u ∈ N(L) can be written as u(x) = Aφ1 (x) + Bφ2 (x), where the constants A and B are A = u(x0 ), B = u (x0 ) . All of our attention has been on the homogeneous equation Lu = 0 . Let us solve the inhomogeneous equation. This is particularly simple for a linear differential equation once we have a basis for N(L) . Theorem 6.8 (Lagrange). Let u1 (x) and u2 (x) be a basis for N(L) , where Lu := a2 (x)u + a1 (x)u + a0 (x)u , with a2 = 0 . Then the inhomogeneous equation Lu = f has the particular solution x up (x) = u1 (x) W1 (s) f (s) ds + u2 (x) W (s) x W2 (s) f (s) ds, W (s) where W (s) := W (u1 , u2 )(s) and Wj (s) is obtained from W (s) by replacing the j th column (uj , uj ) of W by the vector (0, 1/a2 ) . Remark: If we let G(x; s) = u1 (x)W1 (s) + u2 (x)W2 (s) W (s) then the above formula assumes the elegant form x up (x) = G(x; s)f (s) ds. Proof: A device (due to Lagrange) called variation of parameters is needed. We already used a form of this device to solve the inhomogeneous first order linear equation (5, p. 457). The trick is to let up (x) = v1 (x)u1 (x) + v2 (x)u2 (x) where the functions v1 (x) and v2 (x) are to be found. This attempt to find up is reminiscent of writing the general solution of the homogeneous equation as Au1 + Bu2 . Differentiate: up (x) = v1 u1 + v2 u2 + [v1 u1 + v2 u2 ]. The functions v1 and v2 will be chosen to make v1 u1 + v2 u2 = 0. 6.3. LINEAR EQUATIONS OF SECOND ORDER 269 Using this, we differentiate again up (x) = v1 u1 = v2 u2 + [v1 u1 + v2 u2 ] Now multiply up by a2 , up by a1 , up by a0 , and add to find Lup = v1 Lu1 + v2 Lu2 + a2 [v1 u1 + v2 u2 ] = a2 [v1 u1 + v2 u2 ]. If we can choose v1 and v2 so that a2 [ = f , then indeed Lup = f , so u0 = v1 u1 +v2 u2 is a particular solution. It remains to see if v1 and v2 can be found which satisfy the two needed conditions v1 u1 + v2 u2 = 0 v1 u1 + v2 u2 = f . a2 These two linear equations for v1 and v2 may be solved by Cramer’s rule (Theorem 33, page 429), 0 u2 0 u2 f f /a2 u2 1/a2 u2 W1 v1 = = = f W W W u1 0 u1 0 f u1 f /a2 u1 1/a2 W2 v2 = = = f W W W Integration of these equations yields v1 and v2 , which, when substituted into up = u1 v1 + u2 v2 , do give the stated result With this theorem, knowing the general solution of the homogeneous equation Lu = 0 ˜ allows us to find a particular solution of the homogeneous equation Lup = f . The general solution u of the inhomogeneous equation Lu = f is then the up coset of N(L) , that is, all functions of the form u = up + u. ˜ This puts the burden on finding the general solution of the homogeneous equation. Examples: (1) . The homogeneous equation x2 u − 3xu + 3u = 0, x = 0 , has the two linearly independent solutions u1 (x) = x, u2 (x) = x3 —which might have been found by the power series method. Therefore a particular solution of the inhomogeneous equation x2 u − 3xu + 3u = 2x4 can be found by the variation of parameters. We try up = v1 x3 + v2 x and are led to the equations v1 = − 2 x4 3 x x2 , 3 2x v2 = 2 x4 x x2 3 2x 270 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS or v1 = −x2 , Thus v1 (x) = − x3 , 3 v2 = 1. v2 (x) = x. 
Therefore x3 2 ) + x3 (x) = x4 3 3 The general solution to the inhomogeneous equation is found by adding the general solution of the homogeneous equation to this particular solution, up (x) = x(− 2 u(x) = Ax + Bx3 + x4 . 3 (2) The homogeneous equation u + u = 0 has the linearly independent solutions u1 (x) = cos x, u2 (x) = sin x . Let us solve u + u = f (x), where f is an arbitrary continuous function. Trying up (x) = v1 cos x + v2 sin x, we are led to v1 = Thus −f sin x , 1 v2 = f cos x . 1 x v1 (x) = − x f (s) sin s ds, Therefore v2 (x) = x up (x) = − cos x f (s) cos s ds. x f (s) sin s ds + sin x f (s) cos s ds x = f (s)[− sin s cos x + cos s sin x] ds x = f (s) sin(x − s) ds. Consequently, the handsome formula x u(x) = A sin x + B cos x + f (s) sin(x − s) ds is the general solution of the inhomogeneous equation u + u = f . Exercises (1) Solve the following initial value problems any way you can. Check your answers by substituting back into the differential equation. 6.3. LINEAR EQUATIONS OF SECOND ORDER (a) u + 2u = 0, u(1) = 2 (b) u + 3u + 2u = 7, u(0) = 0, u (0) = 0 (c) u + 3u + 2u = 2ex , (d) u + 3u + 2u = e−2x , u(0) = 0, u (0) = 1 u(0) = 1, u (0) = 0 u( π ) = 1 6 (e) (tan x) du + u − sin2 x = 0, dx u(0) = u (0) = 1, x (− π , π ). 22 (f) u + u = tan x, (g) u − 8u = 0, (h) u u(0) = 1, u (0) = 2, u (0) = 3 − k 4 u = 0. General solution. (i) u − 6u + 10u = x2 + sin x, (j) u − 7u − 8u = 0, (k) xu + u = x3 , (l) u + 4u = (m) u − u = 4x2 ex (o) − u(4) + u(0) = u (0) = 0. u(0) = 3, u (0) = 8, u (0) = 65, u (0) = 511. u(1) = 1. + cos 2x, u(0) = 0, u (0) = 1 . General solution. (n) u = 3u + 3u − u = 0, u(5) 271 3u(3) − 3u(2) u(0) = 1, u (0) = 2, u (0) = 3 − 4u(1) + 4u = 0 . General solution. [Hint: λ5 − λ4 + 3λ3 − 3λ2 − 4λ + 4 = (λ2 − 1)(λ2 + 4)(λ − 1) ]. (2) Find the first four non-zero terms (if there are that many) in the power series solutions about x = 0 for the following equations. (a) u − xu − u = 0, u(0) = u (0) = 1 (b) u − 2xu + 2u = 0, u(0) = 0, u (0) = 1. (c) u − 2xu − 2u = 0, u(0) = 1, u (0) = 0. (d) u + xu = 0, u(0) = 1, u (0) = −1. (e) u − xu = 0, u(0) = 1, u (0) = u (0) = 0. (f) u − x2 u = (g) u − 1 1− x u 1 , 1 − x2 = 0, 1 = 1 − x2 1 1−x =?] u(0) = 0, u (0) = 0. [Hint: u(0) = 0, u (0) = 1. [Hint: 1 + x2 + x4 + · · · ] (3) a) - e) Find where the power series in Ex. 2 a-e converge. (4) Find the first four non-zero terms (if there are that many) in the power series solutions corresponding to the larger root of the indicial polynomial. (a) 2x2 u − 3xu + 2u = 0 ∞ (b) xu + 2u − xu = 0. [Answer: u(x) = c0 0 x2n. (2n + 1)! (c) 4xu + 2u + u = 0. (d) xu + (sin x)u + x2 u = 0, (e) xu + u = u(0) = 0, u (0) = 1. x2 . (5) (a-e). Investigate the convergence of the series solutions found in Exercise 4 above. 272 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (6) Find the power series solution about x = 0 for the n th order Bessel equation corresponding to the highest root of the indicial polynomial. The answer is: x Jn (x) = ( )n 2 ∞ k=0 (−1)k x ( )2k , k !(k + n)! 2 where we have chosen c0 = 1/2n n! . (7) Find two linearly independent power series solutions of u + xu + u = 0 and prove they are linearly independent. Find all solutions. (8) The Hermite equation is u − 2xu + 2αu = 0. For which value(s) of the constant α are the solutions polynomials - that is, a solution with a finite Taylor series. These are the Hermite polynomials. (9) Find the first three non-zero terms in the power series about x = 0 for two linearly independent solutions of 2x2 u + xu + (x − 1)u = 0. 
(10) The homogeneous equation Lu := 2x2 u − 3xu − 2u = 0 has the two linearly inde√ pendent solutions u1 (x) = x2 , u2 (x) = x (see Ex. 20c below). Find the general solution of the inhomogeneous equation Lu = log(x3 ) . (11) Let Lu = (1 − x2 )u − 2xu + n(n + 1)u where n is an integer. Show that Lu = 0 has a polynomial solution - the Legendre polynomial. Compute this for n = 3 . (cf. page 104l Ex. 10). dJ0 is a solution dx of the first order Bessel equation. [Hint: Work directly with the equation itself, not with power series]. (12) Let J0 (x) be a solution of the zero th order Bessel equation. Prove (13) Consider the equation a2 (x)u + a1 (x)u + a0 (x)u = 0. (a) Let u(x) := u1 (x)v (x) . Show that the result arranged as an equation for v (x) is a2 u1 v + (2a2 u1 + a1 u1 )v + (a2 u1 + a1 u1 + a0 u1 )v = 0 (b) If u1 is known to be one solution of the equation, show that the second solution is u2 (x) u2 (x) = u1 (x) w(x) dx where w(x) is a solution of the first order equation a2 u1 w + (2a2 u1 + a1 u1 )w = 0. 6.3. LINEAR EQUATIONS OF SECOND ORDER 273 Thus, if one solution of a second order linear O.D.E. is known, the problem of finding a second solution is reduced to the problem of solving a first order linear O.D.E. which can always be solved by separation of variables. (14) Apply Exercise 13 to the following: (a) One solution of 2x2 u − 3xu + 2u = 0 is u1 (x) = x2 . Find another. (b) One solution of x2 u − xu + u = 0 is u1 (x) = x . Find another. (c) One solution of (1 + x)xu − xu + u = 0 is u1 (x) = x . Find another, and then write down the general solution. (d) One solution of the equation x2 u + 2xu = 0 is clearly u1 (x) = 1 . Find another. Prove the solutions are linearly independent for x > 0 . Find the general solution of x2 u + 2xu = 1 . (15) Consider the O.D.E. u + a(x)u + b(x)u = 0 , where a and b are continuous about x0 . If the graphs of two solutions are tangent at x = x0 , are these two solutions linearly dependent? Explain: Can you make an even stronger deduction? (16) (a) Let L be a constant coefficient differential operator with characteristic polynomial p(λ) . If p(λ) = p(−λ) , prove L(sin kx) = p(ik ) sin kx (b) Apply this to find a particular solution of u − u = sin 2x (17) Find a particular solution of the equation u − n2 u = f, n = 0. [You will need: sin h(α − β ) = sin h α cos h β − sin h β cos h α ]. 1x [Answer: u(x) = f (s) sin h n(x − s) ds. ] n0 (18) Use the method of variation of parameters to find a particular solution to u = f . Compare with Exercise 5, p. 282. (19) Consider the differential operator Lu := x2 u + axu + bu, where a and b are constants. This is called Euler’s equation. It is the simplest equation with a regular singularity at x = 0 . (a) Show that Lxρ = q (ρ)xρ , where q (ρ) is the indicial polynomial for L . (b) If the roots of q (ρ) = 0 are distinct, find two solutions of Lu = 0, x > 0 , and prove the solutions are linearly independent for x > 0 . (c) If the roots ρ1 and ρ2 of q (ρ) = 0 coincide, take the derivative with respect to ρ of the equation in a) - holding x fixed - to obtain the candidate u2 (x) = xρ1 ln x for a second solution. Verify by substitution that u2 is a solution in this case and prove the two solutions u1 (x) = xρ1 , u2 (x) = xρ1 ln x, are linearly independent for x = 0 . x>0 274 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (20) Apply the method of Exercise 19 to find two linearly independent solutions for each of the following Euler equations a). x2 u + xu = 0. b). 2x2 u − 3xu + 2u = 0. c). 2x2 u − 3xu − 2u = 0. d). 
x2 u − xu + u = 0. (21) (a) Use the result of Ex. 19 a) to find a particular solution of the equation Lu = xα , where Lu := x2 u + axu + bu, with a and b constant, and where α is not a root of the indicial polynomial q (ρ) (cf. Ex. 6, p. 300). (b) If neither α not β are roots of q (ρ) , find a particular solution to the inhomogeneous equation Lu = Axα + Bxβ . (c) Apply this procedure to find the general solution of 2x2 u − 3xu − 2u = 3x − 4x1/3 . (d) How can you solve Lu = xα if α is a root of the indicial polynomial? (22) (a) If u has n derivatives and λ is a constant, prove Dn [eλx u] = eλx (D + λI )n u. Thus (D + λI )n u = e−λx Dn [eλx u] . (b) Let L = (D − a)n be a constant coefficient differential operator with characteristic polynomial p(λ) = (λ − a)n . Show u(x) is a solution of the equation Lu = 0 if and only if u(x) has the form u(x) = eax Q(x), where Q(x) is a polynomial of degree ≤ n − 1 . (23) Consider the O.D.E. Lu = f , where L is a second order constant coefficient operator, and let λ1 and λ2 be the characteristic roots of L1 . Assume i) Reλ1 < 0 and Reλ2 < 0 , and ii)there is some constant M such that |f (x)| ≤ M for all x ∈ [0, ∞] . (a) Prove every solution of Lu = f is bounded for x ∈ [0, ∞] . (b) If lim f (x) = 0 , prove that as x → ∞ , every solution of Lu = f tends to zero. x→∞ (24) Consider the operator Lu := a2 (x)u +a1 (x)u +a0 (x)u , where the aj ’s are continuous for x ∈ [α, β ] . Let u1 , u2 and φ1 , φ2 both be bases for N(L) . Prove there is a constant k = 0 such that W (u1 , u2 )(x) = kW (φ1 , φ2 )(x) for all x ∈ [α, β ]. 6.3. LINEAR EQUATIONS OF SECOND ORDER 275 (25) (a) Generalize the procedure of Ex. 21b and show how the inhomogeneous Euler equation Lu = f can be solved if f has a power series expansion. You will have to assume that no root of the indicial polynomial is a positive integer. (b) Apply a) to find a particular solution (as a power series) of 2x2 u + 3xu − u = 1 . 1−x (26) Given the equation Lu := u + a(x)u + b(x)u = 0 has solutions u1 (x) = sin x , u2 (x) = tan x , find the general solution of the inhomogeneous equation Lu = cos x . 1 + sin2 x (27) (a) If Lu := a2 u + a1 u + a0 u and L∗ v := (a2 v ) − (a1 v ) + a0 v , prove the Lagrange identity d [a2 (u v − v u) + (a1 − a2 )uv ], vLu − uL∗ v = dx where the functions aj are assumed to be sufficiently differentiable. The operator L∗ is the adjoint of L . (b) Show that L is self-adjoint, L = L∗ , if and only if a2 = a1 . Write the Lagrange identity in this case. (c) If c1 u1 (x) + c2 u2 (x) is the general solution of the equation Lu = 0 find the c3 u1 + c4 u2 general solution of the adjoint equation L∗ v = 0 . [Answer: v =. u1 u2 − u1 u2 (d) Let u be a twice differentiable function which vanishes at α and β . Show the adjoint operator L∗ has the property that for all such functions u and v , v , Lu = L∗ v, u where β f , g := f (x)g (x) dx. α (28) (a) Let L be a self-adjoint operator, L = L∗ . If LX1 = λ1 X1 and LX2 = λ2 X2 , where λ1 and λ2 are real number, λ1 = λ2 , prove X1 and X2 are orthogonal X1 , X2 = 0. [Hint: Compare X2 , LX1 = λ1 X2 , X1 with LX2 , X1 = λ2 X2 , X1 ]. 2 d (b) Let L = dx2 . For what values of λ can you find a non-zero solution u of the equation Lu = λu where u satisfies the boundary conditions u(0) = u(π ) = 0 ? (c) Apply parts a) and b) as well a Ex. 27d to prove π sin nx, sin mx = sin nx sin mx dx = 0, 0 where n and m are unequal integers. 276 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (29) . 
Consider the boundary value problem Lu := u + u = f, u(0) = 0, u(π ) = 0, where f is continuous in [0, π ] . a). Show that if a solution exists, it is not unique. b). Show a solution exists if and only if π f (x) sin x dx = 0. 0 [Hint: First find the general solution of the homogeneous equation]. Remark: In the notation of Ex. 27, we have L = L∗ . Moreover, N(L∗ ) = span{ sin x } . The conclusions of b) states that R(L) = N(L∗ )⊥ , and illustrates how Theorem 34, p. 431, is used in infinite dimensional spaces. (30) . A proof of Theorem 3. Since a2 (x) = 0 , the equation can be written as u + a(x)u + b(x)u = 0. If ∞ ∞ αn x2 , a(x) = n=0 let βn xn , b(x) = n=0 ∞ cn xn , u(x) = where u(0) = c0 , u (0) = c1 , n=0 (a) Imitate the example to prove the remaining cn ’s must satisfy n cn+2 = − k=0 [αn−k (k + 1)ck+1 + βn−k ck ] . (n + 2)(n + 1) Show that if c0 and c1 are known, then the remaining cn ’s are determined inductively by the above formula. (b) Because the series for a(x) and b(x) converge for |x| < r , if R is any number M M less than r , there is a constant M such that for all n, |αn | ≤ Rn and |βn | ≤ Rn (cf. p. 72, line 2). Define constants Cn as C0 = |c0 | , C1 = |c1 | , and for n ≥ 0 n [(k + 1)Ck+1 + Ck ]Rk + M Cn+1 R M Rn Cn+2 = (i) Prove |cn | ≤ Cn , k=0 (n + 2)(n + 1) n = 0, 1, 2, 3, . . . . 6.3. LINEAR EQUATIONS OF SECOND ORDER 277 (ii) Prove Cn+1 xn+1 n(n − 1) + M nR + M R2 = |x| . Cn xn R(n + 1)n ∞ Cn xn converges for |x| < R , where R is any number less than (iii) Prove r. n=0 ∞ cn xn converges for |x| < R , where R is any number less than (iv) Prove r. n=0 (31) . (a) Let u(x) and v (x) be solutions of the equations L1 u := u + a(x)u = 0 , and L2 v := v + b(x)v = 0 respectively, in some interval, where a and b are continuous. If b(x) ≥ a(x) throughout the interval, prove there must be a zero of v between any two zeroes of u . This is the Sturm oscillation theorem. [Hint: Suppose α and β are consecutive zeroes of u and u > 0 in (α, β ) . Prove β (vL1 u − uL2 v ) dx = vu 0= α (b) (c) (d) (e) β α β − (b − a)uv dx, α and show, because u (α) > 0, u (β ) < 0 , there is a contradiction if v does not vanish somewhere in (α, β ) .] Let u1 (x) and u2 (x) be two linearly independent solutions of u + a(x)u = 0 . Prove between any two zeroes of u1 , there is a zero of u2 and vice verse. Thus, the zeroes interlace. Apply b) to the solutions sin γx and cos γx of the equation u + γ 2 u = 0 to conclude a well-known fact. If b(x) ≥ δ > 0 , where δ is a constant, prove every solution of v + b(x)v = 0 must have an infinite number of zeros by comparing v with a solution of u + γ 2 u = 0 , where γ is an appropriate constant. Apply d) to prove every solution of 3 )v = 0, 4x2 has an infinite number of zeroes for x ≥ 1 . √ (f) Let u1 (x) be a solution of the first order Bessel equation. Take v (x) = u1 (x) x and show that v satisfies the equation in e). Deduce that J1 (x) has infinitely many zeroes. v + (1 − (32) Let L1 and L2 be linear constant coefficient differential operators with characteristic polynomials p1 (λ) and p2 (λ) respectively. (a) If there is a function u(x), u(x) ≡ 0 , which satisfies both L1 u = 0 and L2 u = 0 , prove the polynomials p1 and p2 have a common root. (b) If p1 and p2 have no common roots, prove the solution of L1 L2 u = 0 are exactly all functions of the form c1 u1 + c2 u2 where u1 is a solution of L1 u1 = 0 , and u2 of L2 u2 = 0 . Thus N(L1 L2 ) may be decomposed into the two complementary subspaces N(L1 ) and N(L2 ), N(L1 L2 ) = N(L1 ) ⊕ N(L2 ) . 278 CHAPTER 6. 
LINEAR ORDINARY DIFFERENTIAL EQUATIONS (33) Imitate Exercise 30 and prove Theorem 3. Make sure to observe the trouble in trying to find the solution corresponding to the lower root of the indicial polynomial if the roots differ by an integer. (34) The purpose of this exercise is to show that an equation with an irregular singular point may have a formal power series at that point which does not converge to the solution. Try to find a solution of the form (18) for the following equation which has an irregular singularity at x = 0 , x6 u + 3x5 u − 4u = 0. What happened? Two linearly independent solutions for x = 0 are u1 (x) = e−1/x 2 2 and u2 (x) = e1/x . How does this explain the situation (cf. p. 95-6)? (35) Consider the equation 2x2 u + 3xu + u = (x) . Two linearly independent solutions of the homogeneous equation are x−1/2 and x−1 . Find the general solution of the homogeneous equation. (36) Consider the equation u + b(x)u + c(x)u = 0 , where b and c are continuous functions and c(x) < 0 . Prove that a solution cannot have a positive maximum or negative minimum. 6.4 First Order Linear Systems Quite often in applications you must consider systems of differential equations. We shall consider a linear system of the form du1 + a11 (x)u1 + a12 (x)u2 + · · · + a1n (x)un = f1 (x) dx du2 + a21 (x)u1 + a22 (x)u2 + · · · + a2n (x)un = f2 (x) dx . . . . . . dun + an1 (x)u1 + an2 (x)u2 + · · · + ann (x)un = fn (x) , dx (6-22) (6-23) (6-24) (6-25) where the functions aij (x) and fj (x) are continuous. If we anticipate the next chapter and write the derivative of a vector U = (u1 , . . . , un ) as the derivative of its components, d U (x) = dx du1 du2 dun , ,··· , dx dx dx , then the above system can be written in the clean form dU + A(x)U = F (x), dx where, A(x) = ((aij )), F = (f1 , f2 , . . . , fn ) (6-26) 6.4. FIRST ORDER LINEAR SYSTEMS 279 and U (x) = (u1 , u2 , . . . , un ). The initial value problem for the system of differential equations (22) is to find a vector U (x) which satisfies the equation as well as the initial condition U (x0 ) = U0 , (6-27) where U0 is a vector of constants. It is useful to observe that the initial value problem for a single linear equation of order n u(n) + an−1 (x)u(n−1) + · · · + a0 (x)u = f (x) u(x0 ) = α1 , u (x0 ) = α2 , . . . , u(n−1) (x0 ) = αn , can be transformed to the conceptually simpler problem (22)-(23). Let u1 (x) := u(x) , u2 (x) := u (x), . . . , and un (x) = u(n−1) (x) . Then the components of the vector U (x) = (u1 , u2 , . . . , un ) must obviously satisfy the relations du1 dx du2 dx = u2 = u3 · · · dun−1 dx dun dx = un = −a0 u1 − a1 u2 − · · · − an−1 un + f (x), which may be written as U = M U + F, where M (x) = 0 0 1 0 . . . 0 1 ··· ··· . . . 0 0 0 ··· −a0 −a1 −a2 · · · 0 0 1 −an−1 , and F = (0, 0, . . . , 0, f ). The initial conditions read U (x0 ) = (α1 , α2 , . . . , αn ). Conversely, if U is any solution of this system of equations with the proper initial conditions, then the first component u1 (x) is a solution of the single n th order equation. Thus, the general theory of a single n th order linear O.D.E. is completely subsumed as a portion of the theory of a system of first order linear O.D.E.’s. You should be warned that this generalization is mainly of theoretical value and is of little use if you are seeking an explicit solution. Both the existence and uniqueness theorems are true for systems, and supply an example where the theoretical advantages of systems become clear. To illustrate this, we shall prove the uniqueness theorem. 
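Before turning to the uniqueness theorem, here is a brief numerical illustration of the reduction just described (a sketch in Python; the scipy library and the particular equation u'' + u = 0 are assumptions made only for this example). The equation is rewritten as U' = MU with the companion matrix above, the system is integrated, and the first component of U reproduces sin x.

    import numpy as np
    from scipy.integrate import solve_ivp

    # u'' + u = 0 written as U' = M U with U = (u, u').
    M = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])

    def rhs(x, U):
        return M @ U

    # Initial conditions u(0) = 0, u'(0) = 1, so the exact solution is u = sin x.
    sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 1.0], rtol=1e-9, atol=1e-12,
                    t_eval=np.linspace(0.0, 10.0, 5))

    # The first component u(x) should agree with sin x.
    print(np.max(np.abs(sol.y[0] - np.sin(sol.t))))   # a very small number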
Our proof is patterned directly after the uniqueness proof for a single equation (Theorem 1). 280 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS Theorem 6.9 (Uniqueness). Let A(x) be a matrix whose coefficients aij (x) are bounded |aij (x)| ≤ M for x in some interval, and let F (x) be a continuous function. Then there is at most one solution U (x) of the initial value problem U + AU = F, U (x0 ) = U0 . Remark: The existence theorem states, if A is nonsingular and each element is integrable there is at least one solution. Thus, there is then exactly one solution. Proof: Assume U1 and U2 are both solutions. Let W = U1 − U2 . Then W satisfies the homogeneous equation and is zero at x0 , W + AW = 0, W (x0 ) = 0. Take the scalar product of this with W , W, W + W, AW = 0. But = w1 w1 + ww2 + · · · + wn wn 1d 2 2 (w2 + w2 + · · · + wn ) = 2 dx 1 1d = W 2. 2 dx W, W Thus, 1d W 2 = − W, AW . 2 dx By Theorem 17, p. 173 and the hypothesis |aij (x)| ≤ M , we know n W, AW |aij |2 ≤ 1/2 2 W ≤ nM W 2 . i,j =1 so that 1d W 2 dx 2 ≤ nM W 2 . Therefore, as on p. 462-3 d (W dx or e2nM x 2 2 ) − 2nM W d −2nM x [e W dx 2 ≤ 0, ≤ 0. Because e2nM x is always positive, by the mean value theorem the quantity [ decreasing function. Its value for x > x0 is then less than at x0 , e−2nM x W (x) 2 ≤ e−2nM x0 W (x0 ) 2 , x ≥ x0 is a 6.4. FIRST ORDER LINEAR SYSTEMS 281 Consequently W (x) ≤ enM (x−x0 ) W (x0 ) , x ≥ x0 . Since W (x0 ) = 0 and the norm is non negative, we have 0 ≤ W (x) ≤ 0, x ≥ x0 , which implies x ≥ x0 . W (x) = 0, Therefore, W (x) ≡ 0 x ≥ x0 . By replacing x with −x in the original equation, the same statement is true for x ≤ x0 . Thus, throughout the interval where |aij (x)| ≤ M , we have proved W (x) ≡ 0 , that is, U1 (x) ≡ U2 (x) , so the solution is indeed unique. Because a single linear n th order O.D.E. can be replaced by an equivalent system of equations, this theorem implies the uniqueness theorem for a single O.D.E. of order n if the coefficients aj (x) are bounded in some interval - which is certainly true in every interval if the aj ’s are continuous. With this theorem, a short section closes. Further developments in the theory of systems of linear O.D.E.’s make elegant use of linear operators in general and matrices in particular. As you might well accept, the exercises contain a few of the more accessible results. Exercises (1) . Find functions u1 (x), u2 (x) which satisfy u1 = u1 u2 = u1 − u2 , with the initial conditions U (0) := (u1 (0), u2 (0) = (1, 0) . Find the general solution too. [Hint: Solve the equation u1 = u1 first, then substitute. Answer: General solution is U (x) = (γ1 ex , γ1 ex + γ2 e−x ) ]. 2 (2) Consider the system u1 = 2u1 − u2 u2 = 3u1 − 2u2 , that is, U = AU, where A= 2 −1 3 −2 . Let φ1 (x) = au1 + bu2 , φ2 (x) = cu1 + du2 , where a, b, c and d are constants. Thus, Φ = SU, where S= ab cc , Φ = (φ1 , φ2 ). 282 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (a) By direct substitution, find the differential equations satisfied by the φj ’s and show they can be written as Φ = SAS −1 Φ. (b) Pick the coefficients of S so the matrix SAS −1 is a diagonal matrix, SAS −1 = λ1 0 0 λ2 ≡Λ (c) Solve the resulting equation Φ = ΛΦ . [Solution: φ1 = αex , might have φ1 and φ2 interchanged]. φ2 = βe−x —you (d) Use this solution to solve the original equations for U . [hint: Recall U = S −1 Φ ]. (3) By only a slight modification of Exercise 2, solve v1 = 2v1 − v2 v2 = 3v1 − 2v2 . [Hint: Everything, even the algebra, is identical. 
The only difference is in part c) you have to solve Φ = ΛΦ . Then V = S −1 Φ as before]. (4) A bathtub initially contains Q1 gallons of gin and Q2 gallons of vermouth, where Q1 + Q2 = Q, Q being the capacity of the tub. Pure gin enters from one faucet at a constant rate of R1 gallons per minute, while pure vermouth enters from another faucet at a constant rate R2 gallons per minute. The well stirred mixture of martinis leaves the drain at a rate R1 + R2 gallons per minute (so the total amount of fluid in the tub remains constant at Q gallons). Let G(t) be the quantity of gin in the tub at time t and V (t) be the quantity of vermouth. (a) Show G dG = R1 − (R1 + R2 ) dt Q dV V = R2 − (R1 + R2 ). dt Q (b) Integrate this simple system of equations to find G(t) and V (t) . Also find their ratio P (t) := G(t)/V (t) which is the strength of the martinis at time t . (c) Prove lim P (t) = t→∞ R1 . R2 Compare this with your intuitive expectations. (d) If Q1 = 20, Q2 = 0, R1 = R2 = 1 gal/min, how long must I wait to get a perfect martini (for me, perfect is 5 parts gin to 1 part vermouth). [Needless to say, the mathematical model is applicable to many problems in the mixing of chemicals which do not react with each other. If the chemicals do interact, the model must be changed to account for the interaction]. 6.5. TRANSLATION INVARIANT LINEAR OPERATORS 283 (5) Consider the homogeneous equation U = A(x)U , where A is non-singular (so det A = 0 ). Assuming the validity of the existence theorem, prove there exists n linearly independent vectors U1 (x), U2 (x), . . . , Un (x) which are solutions, Uk = AUk , k = 1, . . . , n . [Hint: Construct n solutions which are linearly independent at x = x0 , and then prove a set of n solutions are linearly independent in an interval if and only if they are linearly independent at x = x0 , where x0 is a point in the interval]. (6) Let LU := U − A(x)U as in Exercise 5. Prove dim N(L) = n . (7) Let LU := U − A(x)U . If a basis U1 , . . . , Un , for N(L) is known, prove the inhomogeneous equation LU = F can be solved by variation of parameters. That is, seek a particular solution Up of LU = F in the form n Up = Ui vi i=1 where the vi (x) are scalar-valued functions (not vectors). (a) Compute Up and substitute into the O.D.E. to conclude Up is a particular solution if n Ui vi = F. i=1 (b) Let U be the n × n matrix whose columns are U1 , U2 , . . . , Un . Prove U is invertible and show vi (x) = (U −1 F )ith component . (c) Show n Up (x) = x Ui (x) [U −1 (s)F (s)]i ds. i=1 This may also be written in the form x Up (x) = U (x) U −1 (s)F (s) ds (d) Apply this procedure to find the general solution of uq = u1 + e2x cf. Ex 1 u2 = u1 − u2 + 1. 6.5 Translation Invariant Linear Operators This section develops various extensions and applications of the procedure used to solve linear ordinary differential equations with constant coefficients. The results will be proved as a series of exercises interspersed by various remarks. Definition: The translation operator Tt acting on functions u(x) is defined by the property (Tt u)(x) = u(x − t). x, t ∈ R. 284 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS A linear operator L is translation invariant if LTt = Tt L for every t , that is, if L(Tt u) = Tt (Lu) for every t and for every function u for which the operators are defined. Example: 1 Let (Lu)(x) := 3u(x) − 2u(x − 1) . Then [Tt (Lu)](x) = 3u(x − t) − 2u(x − t − 1), and [L(Tt u)](x) = Lu(x − t) = 3u(x − t) − 2u(x − t − 1). 
Thus, LTt = Tt L, so the operator L is translation invariant. 2. Let (Lu)(x) := 3xu(x). Then [Tt (Lu)](x) = 3(x − t)u(x − t), and [L(Tt u)](x) = Lu(x − t) = 3xu(x − t). Thus LTt = Tt L, so this operator is not translation invariant. Exercises (1) Which of the following linear operators (verify!) are also translation invariant? (a) (Lu)(x) := cu(x), (b) (Lu)(x) := c ≡ constant u ( x+ h ) − u ( x) , h x h ≡ constant = 0 . k (x − s)u(s) ds (c) (Lu)(x) := −∞ (d) (Lu)(x) := (x − 1)u(x) (e) (Lu)(x) = du dx (x). (f) Any linear ordinary differential operator with constant coefficients, Lu := an u(n) + an−1 u(n−1) + · · · + a0 u, ak constants. (g) Any linear ordinary differential operator with variable coefficients. n ak u(x − γk ), (h) (Lu)(x) = ak and γk constants. k=1 [Answers: All but d) and g) are translation invariant]. 6.5. TRANSLATION INVARIANT LINEAR OPERATORS 285 (2) If L1 and L2 are translation invariant operators which map some linear space into itself, then so are a). AL1 + BL2 , A, B constants b). L1 L2 and L2 L1 c). If in addition L is invertible, then L−1 is also translation invariant. Theorem 6.10 . If L is a translation invariant linear operator, then L(eλa ) = φ(λ)eλx . Proof: We know so little about L that all we can hope to do is compute Tt L(eλx ) and LTt (eλx ) and see what happens. Let Leλx = ψ (λ; x) , where ψ is some unknown function whose value depends on both λ and x . Then Tt L(eλx ) = ψ (λ; x − t), while LTt eλx − Leλ(x−t) = L(e−λt eλx ) = e−λt Leλx = e−λt ψ (λ; x). Since Tt L = LTt , we find e−λt ψ (λ; x) = ψ (λ; x − t), or ψ (λ; x) = ψ (λ; x − t)eλt . Because the left side does not contain t , the right side must not depend on which value of t is chosen. Using this freedom, we let t = x and conclude ψ (λ; x) = ψ (λ; 0)eλx . By setting φ(λ) = ψ (λ, 0) , we find Leλx = ψ (λ; x) = φ(λ)eλx as desired. Exercises (3) By direct substitution, find φ(λ) for those operators in Exercise 1 which are translation invariant. [Answers: a) φ(λ) = c , b) φ(λ) = (e−ah − 1)/h c) φ(λ) = n 0 λs k (−s)e ak λk (the characteristic polynomial), ds , d) φ(λ) = cλ , f) φ(λ) = −∞ k=0 n ak e−λγ k ]. h) φ(λ) = k=1 286 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (4) With the same assumptions and notation as in the theorem, if φ(λ) = 0 is a polynomial equation with N distinct roots λ1 , λ2 , . . . , λN , so φ(λj ) = 0, j = 1, . . . , N , prove any linear combination of the function eλjx is in N(L) , that is, N Lu = 0 where cj eλjx . u(x) = 1 (5) Apply the theorem to find the solution of Exercise 4 for the equation Lu = 0 , where (a) Lu := u − u − u . (b) (Lu)(x) = u(x + 2) − u(x + 1) − u(x) . (c) Find a special solution of b) which satisfies the “initial conditions” u(0) = u(1) = 1 . Compute u(2), u(3) and u(4) directly from b). The integers u(n), n ∈ Z+ are called the Fibonacci sequence. [Answer: u(2) = 2, u(3) = 3, u(4) = 5 , and surprisingly, √ n+1 √ n+1 1+ 5 1− 5 1 ]. − u(n) = √ 2 2 5 (6) Solve u(x) − au(x − 1) + b2 u(x − 2) = 0 with the initial conditions u(1) = a, u(2) = a2 − b2 . Compare with Exercise 17, p. 440. (7) Extend Exercises 5(b - c) and 6 to develop a theory of second order difference equations with constant coefficients. Thus Lu := a2 u(x + 2) + a1 u(x + 1) + a0 u(x), a2 = 0, x ∈ Z. In particular, you should, (a) Find two linearly independent solutions of Lu = 0 . Remember the degenerate case a2 − 4a0 a2 = 0 . 1 (b) Prove there is at most one solution of the initial value problem Lu = f , u(0) = α0 , u(1) = α1 . (c) Prove dim N(L) = 2 . 
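It is amusing to check Exercise 5(c) numerically. In the sketch below (Python, our own choice; only the recursion u(n+2) = u(n+1) + u(n) with u(0) = u(1) = 1 and the closed form quoted in the answer come from the exercise), the two roots mu = (1 +- sqrt 5)/2 of mu^2 - mu - 1 = 0 — that is, e^lambda for the roots of phi(lambda) = 0 — generate the Fibonacci numbers exactly.

```python
import math

phi = (1 + math.sqrt(5)) / 2   # the two roots mu of mu^2 - mu - 1 = 0,
psi = (1 - math.sqrt(5)) / 2   # i.e. e^lambda for the roots of phi(lambda) = 0

def u_closed(n):
    # Closed form from Exercise 5(c): u(n) = (phi^(n+1) - psi^(n+1)) / sqrt(5)
    return (phi ** (n + 1) - psi ** (n + 1)) / math.sqrt(5)

# The recursion itself: u(n+2) = u(n+1) + u(n), with u(0) = u(1) = 1.
u = [1, 1]
for n in range(2, 11):
    u.append(u[-1] + u[-2])

for n in range(11):
    print(n, u[n], round(u_closed(n)))   # the two columns agree
```

The same two-term recursion with other starting values produces every other solution of Lu = 0, in line with dim N(L) = 2 from Exercise 7(c).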
Remarks: The ideas presented above generalize immediately to the case where X ∈ Rn instead of just R1 , as well as to the case where the u ’s are vectors and not scalars. These few concepts lie at the heart of any treatment of many linear operators with constant coefficients, especially ordinary and partial differential operators. This mildly abstract formulation manages to penetrate through the obscuring details of particular cases to observe a rather simple structure unifying many seemingly different problems. 6.6 A Linear Triatomic Molecule A molecule composed of three atoms is called a triatomic. Consider a triatomic molecule whose equilibrium configuration is a straight line with two atoms of equal mass m situated on either side of a central atom of mass M . 6.6. A LINEAR TRIATOMIC MOLECULE 287 a figure goes here To simplify the situation further, we shall only consider the motion along the straight line (axis) of these atoms, and shall assume the inter-atomic forces can be approximated by springs with equal spring constants k . u1 (t), u2 (t) and u3 (t) will denote the displacements of the atoms (see fig.) from their equilibrium position. Newton’s second law, mu = ¨ F , will give the equations of motion. The atom on the left only “feels” the force due to the spring attached to it, the force being equal to the spring constant k times the amount that spring is stretched, u2 − u1 . Thus, mu1 = k (u2 − u1 ). ¨ The central atom “feels” two forces, one from each side, with the resulting equation of motion M u2 = −k (u2 − u1 ) + k (u3 − u2 ). ¨ In the same way, the equation of motion for the remaining atom is mu3 = −k (u3 − u2 ). ¨ Collecting our equations, we have k k u1 + u2 m m k 2k k u2 = ¨ u1 − u2 + u3 M M M k k u3 = u2 − u3 . ¨ m m u1 = − ¨ These are a system of three linear ordinary differential equations with constant coefficients. They cannot be integrated as they stand since each equation involves functions from the other equations, that is, the equations are copied (not surprising since we are considering coupled oscillators. Now we can integrate such a system immediately if they are in the simple form ¨ φ1 = λ1 φ1 ¨ φ2 = λ2 φ2 ¨ φ3 = λ3 φ3 by integrating each equation separately. By using an important method, we will be able to place our system in this special form. Before doing so, it is suggestive to rewrite the system in matrix form k k −m u1 ¨ m u 2 = k − 2k ¨ M M k u3 ¨ 0 m 0 k M k −m u1 u2 . u3 Letting A denote the 3 ×3 matrix, our hope is to somehow change A into a diagonal matrix (one with zeroes everywhere except along the principal diagonal), for then the differential equations will be in a form mentioned above which can be immediately integrated. 288 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS The trick is to replace the basis u1 , u2 , u3 by some other basis in which the matrix assumes a diagonal form. The differential equation can be written in the form ¨ U = AU, where U = (u1 , u2 , u3 ) , and the derivative of a vector being defined as the derivative of each of its components. Let φ1 (t), φ2 (t) , and φ3 (t) be three other functions - which we plan to use as a new basis. Then the φj ’s can be written as a linear combination of the uj ’s, φ = s11 u1 + s12 u2 + s13 u3 φ2 = s21 u1 + s22 u2 + s23 u3 φ3 = s31 u1 + s32 u2 + s33 u3 , where sij are constants. Writing S = ((sij )) and Φ = (φ1 , φ2 , φ3 ) , this last equation reads Φ = SU. Taking the derivative of both sides (or going back to the equations defining φj in terms of the uk ’s), we find ¨ ¨ Φ = SU . 
Because both u1 , u2 and u3 as well as φ1 , φ2 , and φ3 are bases for the solution, the matrix S must be non-singular (its inverse expresses the φj s in terms of the uj ’s). Thus ¨ Φ = SAS −1 Φ. The problem has been reduced to finding a matrix S such that the matrix SAS −1 is a diagonal matrix, λ1 0 0 SAS −1 = 0 λ2 0 ≡ Λ. 0 0 λ3 Multiply by S −1 on the left: AS −1 = S −1 Λ. Since this equation is equally between matrices, their corresponding columns must be equal. ˆ Thus, if we denote by Si , the i th column of S −1 , the above equation then reads ˆ ˆ ASi = λi Si , or ˆ (A − λi I )Si = 0. For each i this is a system of three linear algebraic equations for the three components of ˆ Si . If there is to be a non-trivial solution, we know det(A − λi I ) = 0. Since k − m − λi det(A − λi I ) = k M 0 k m 2k −M − k m 0 λi k M k −m − λi , 6.6. A LINEAR TRIATOMIC MOLECULE 289 (algebra later) k 2 1 + λi )[λi + ( + )k ] m M m We see the three possible values of λ for det(A − λi I ) = 0 are = −λi ( λ1 = 0, λ2 = − k , m λ3 = −k ( 2 1 + ). M m ˆ These numbers λi are the eigenvalues of A . The non-trivial solution Si of the homoˆ geneous equations (A − λi I )Si = 0 corresponding to the i th eigenvalue is called the ˆ eigenvalue of A corresponding to the eigenvalue λi . For example, S2 is the solution of (A − λ2 I )S2 = 0 corresponding to λ2 = −k/m , 0ˆ12 + s We see s22 = 0 while s12 ˆ ˆ k s22 + 0ˆ32 = 0 ˆ s m k 2k k k s12 − ( ˆ − )ˆ22 + s s32 = 0 ˆ M M m M k 0ˆ12 + s22 + 0ˆ32 = 0. s ˆ s m = −s32 . Thus, one solution is ˆ ˆ S2 = (1, 0, −1) ˆ Similarly we find one solution for S1 is ˆ S1 = (1, 1, 1), ˆ while one solution for S3 is 2m ˆ S3 = (1, − , 1). M The computation is over. All that remains is to put the parts together and interpret the solution. If you got lost, presumably this recapitulation will help. We have found a transformation S to new coordinates (φ1 , φ2 , φ3 ) such that the differential equations for ¨ the φj ’s are in diagonal form, φm = λj φj , ¨ φ1 = 0 k ¨ φ2 = − φ2 m 2 1 ¨ φ3 = −k ( + )φ3 . M m The solutions are φ1 (t) = A1 + B1 t, φ2 (t) = A2 cos φ3 = A3 cos k( k t + B2 sin m 2 1 + )t + B3 sin M m k t m k( 2 1 + )t. M m 290 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS ˆ Since Φ = SU , and the Sj are the columns of S −1 , 1 1 1 0 − 2m , S −1 = 1 M 1 −1 1 we have U = S −1 Φ , u1 (t) = φ1 (t) + φ2 (t) + φ3 (t) 2m u2 (t) = φ1 (t) − φ3 (t) M u3 (t) = φ1 (t) − φ2 (t) + φ3 (t) Although the solutions φ1 (t), φ2 (t) , and φ3 (t) can now be substituted into the first set of equations for the uj ’s, it is more instructive to leave that step to your imagination and analyze the nature of the solution. (1) If φ1 (t) = 0 but φ2 (t) = φ3 (t) = 0, then u1 (t) = u2 (t) = u3 (t) = A1 + B1 t. Thus all three atoms - the whole molecule - moves with a constant velocity B1 . This is the trivial translation motion of the molecule, simply moving without internal oscillations at all. (2) If φ2 (t) = 0 but φ1 (t) = φ3 (t) = 0 , then u1 (t) = φ2 (t) = −u3 (t), and u2 (t) = 0. Thus, the two outside atoms vibrate in opposite directions with frequency while the center atom remains still: k /m a figure goes here (3) If φ3 (t) = 0 but φ1 (t) = φ2 (t) = 0 2m φ3 (t). M A bit more complicated. The two outside atoms move in the same direction with same u1 (t) = u3 (t) = φ3 (t), u2 (t) = − 2 1 frequency k ( M + m ) , while the center atom moves in a direction opposite to them and with the same frequency but a different amplitude (to conserve linear momentum mu1 + ˙ M u2 + mu3 = 0 ). In the figure we take m = M . 
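The eigenvalue computation just completed is easy to check numerically. Here is a minimal Python sketch; the numbers m = 1, M = 2, k = 1 and the use of numpy's eig routine are illustrative assumptions, not part of the problem. It builds the 3 x 3 matrix A and compares the computed eigenvalues with lambda_1 = 0, lambda_2 = -k/m, lambda_3 = -k(2/M + 1/m).

```python
import numpy as np

# Illustrative numbers (assumptions, not from the notes): m = 1, M = 2, k = 1.
m, M, k = 1.0, 2.0, 1.0

A = np.array([[-k/m,    k/m,    0.0 ],
              [ k/M,  -2*k/M,   k/M ],
              [ 0.0,    k/m,   -k/m ]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals.real))
# Expected: 0, -k/m, -k(2/M + 1/m), i.e. 0, -1, -2 for these numbers.
print(sorted([0.0, -k/m, -k * (2/M + 1/m)]))

# The eigenvector for lambda = -k/m should be a multiple of (1, 0, -1),
# and the one for lambda = -k(2/M + 1/m) a multiple of (1, -2m/M, 1).
```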
˙ ˙ a figure goes here These three simple motions are called the normal modes of oscillation of the molecule. They are the oscillations determined by the φ1 , φ2 , and φ3 . Every motion of the system is a linear combination of the normal modes of oscillation, the particular oscillation depending on what initial conditions are given. By an appropriate choice of the initial conditions, one or another of the normal modes will result. Otherwise some less recognizable motion will result. Exercises Consider the simpler model of a diatomic molecule 6.6. A LINEAR TRIATOMIC MOLECULE 291 a figure goes here which we will represent as two masses joined by a spring with spring constant k . (a) Show the equations of motion are mu1 = k (u2 − u1 ) ¨ M u2 = −k (u2 − u1 ) ¨ (b) Introduce new variables, Φ = SU , φ1 = s11 + s12 u2 φ2 = s21 u1 + s22 u2 , and find S so that the equation ¨ Φ = SAS −1 Φ is in diagonal form. (c) Solve the resulting equation and find the normal modes of oscillation. Interpret your results with a diagram. 292 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS Chapter 7 Nonlinear Operators: Introduction 7.1 Mappings from R 1 to R 1 , a Review . The subject of this section is one you presumably know well. Our intention is to briefly review the more important results, stating them in a form which suggests the generalizations we intend to develop. Consider a function y = f (x), x ∈ R . This function assigns to each number x another real number y . Thus we may write f : R → R. f is a scalar-valued function of a scalar. What are the simplest such functions? Linear ones of course, f (x) = ax + b. In keeping with our more sophisticated terminology, this should be called an “affine” function (mapping, operator, . . . ) since it is linear only if b = 0 . We shall, however, be abusive and refer to such functions as linear mappings. The study of linear functions in one variable, x , is carried out in elementary analytic geometry. At an early age we enlarged our vocabulary of functions from linear ones to a more general class which includes, for example, √ f1 (x) = ax2 + bx + c, f2 (x) = sin x, f3 (x) = x. These functions are all examples of nonlinear functions. They map the reals (only the positive reals in the case of f3 ) into the reals. The portion of the reals for which they are defined is called their domain of definition, D(f ) . Thus D(f1 ) = R1 , D(f2 ) = R1 , D(f3 ) = { x ∈ R1 : x > 0 }. The class of all real valued functions of a real variable is too large to consider. For most purposes it is sufficient to restrict oneself to the class of continuous or sufficiently differentiable functions. Here is an outline of the basic definitions and theorems from elementary calculus. In our prospective generalization from the simplest case of a function (operator) f which maps numbers to numbers, f : R1 → R1 , to the case of a function from vectors to vectors f : Rn → Rm , all of these concepts and results will need to be extended. 293 294 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Definition: ak converges to a, ak and a ∈ R1 . Definition: Continuity. Theorem 7.1 The set of continuous functions forms a linear space. Definition: The derivative: limit of difference quotient. df dg d Theorem 7.2 1. dx (af + bg ) = a dx f + b dx (linearity) dg df d 2. dx (f g ) = f dx + ( dx )g (Product rule) df dg d 3. dx (f ◦ g ) = dg dx (Chain rule) Theorem 7.3 The Mean Value Theorem. Definition: The integral. b a f (x) dx = − Theorem 7.4 1. a b 2. f (x) dx b c f (x) dx + c f (x) dx = 1 f (x) dx b a b b 3. 
[αf (x) + βg (x)] dx = α a a b (f ◦ φ)(x) 4. a dφ dx = dx b Theorem 7.5 1. a x d 2. dx g (x) dx (linearity) a φ(b) f (x) dx (Change of variable in an integral) φ(a) df (x) dx = f (b) − f (a) dx f (t) dt = f (x) a b 3. b f (x) dx + β f (x) a dg dx = f g dx b a b − a df g (x) dx (Integration by parts). dx Remark: These theorems contain essentially all of elementary calculus. What are missing are specific formulas for the derivatives and integrals of the basic functions as well as the application of these theorems to compute maxima, area, etc. Exercises (1) Use the definition of the derivative (as the limit of a difference quotient) to compute the derivatives of the following functions at the given point. a). 3x2 − x + 1, x0 = 2 b). c). d). 1 x+1 , x 1+x , x 1− x , x0 = 2 x0 = 2 x = x0 = 1. 7.2. GENERALITIES ON MAPPINGS FROM RN TO RM . 295 (2) Use the definition of the integral to evaluate 2 x2 dx. 0 You should approximate the area by rectangular strips and evaluate the limit as the width of the thickest strip tends to zero. [Hint: 12 + 22 + 32 + · · · + n2 = n(n+1)(2n+1) ]. 6 (3) Prove that .6 < log 2 < .8 (log 2 = 0.693) by using the definition of the integral to find upper and lower bounds for 2 log 2 = 1 1 dx. x (4) Find the equation of the straight line which is tangent to the curve f (x) = x7/3 + 1 at x = 1 . Draw a sketch indicating both the curve and tangent line. Use the tangent line to approximately evaluate (1.01)7/3 . Find some estimate for the error in your approximation. 7.2 Generalities on Mappings from R n to R m . A function, or operator, F which maps Rn to Rm , is a rule which assigns to each vector X in Rn another vector Y = F (X ) in Rm . It is a function from vectors to vectors, a vector-valued function of a vector. We have already discussed the case when F is an affine operator, Y = F (X ) = b + LX or in coordinates, y1 = b1 + a11 x2 + · · · + a1n x1n y2 = b2 + a21 x2 + · · · + a2n xn · · · ym = bm + am1 x2 + · · · + amn xn Linear algebra can be thought of as the study of higher dimensional analytic geometry, the affine transformations taking the role of the straight line y = b + cx . But now it is time to consider more complicated mappings from Rn to Rm . Here is an y1 = x1 + x2 sin πx3 √ y2 = e1−x1 − x2 . This transformation maps vectors X = (x1 , x2 , x3 ) ∈ R3 to vectors Y = (y1 , y2 ) ∈ R2 . Note the second function is only defined for x2 ≥ 0 . Thus the domain of the transformation F is D(F ) = X ∈ R3 : x2 ≥ 0 . Example: For example, F maps the point (1, 4, 1 ) into the point (3, −1) . 6 296 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION It is usual to write a transformation F which maps a set A ⊂ Rn to a set B ⊂ Rm in terms of its components, y1 = f1 (x1 , . . . , xn ) = f1 (X ) y2 = f2 (x1 , . . . , xn ) = f2 (X ) · · · ym = fm (x1 , . . . , xn ) = fm (X ), or more concisely as Y = F (X ). To discuss continuity etc. for nonlinear mappings from Rn to Rm , it is necessary that the distance between points be defined. We shall use the Euclidean norm - although any other norm could also be used. If X = (x1 , . . . , xk ) is a point (or vector, if you like) in Rk , then X= x2 + · · · + x2 . To review briefly, a sequence of points Xj in Rk converges 1 k to a point X in Rk if, given any > 0 , there is an integer N such that Xj − X < textf orall j ≥ N. An open ball in Rk of radius r about the point X0 is the set B (X0 ; r) = { X ∈ Rk : X − X0 < r } . A closed ball in Rk is ¯ B (X0 ; r) = { X ∈ Rk : X − X0 ≤ r }. 
The only difference is the open ball does not contain the boundary of the ball. In two dimensions, R2 , the names open and closed disc are often used. A set D ⊂ Rk is open if each point X ∈ D is the center of some ball contained entirely within D . The radius may be very tiny. Every open ball is open, as can be seen in the figure. A closed ball is not open since there is no way of placing a small ball about a point on the boundary in such a way that the small ball is inside the larger one. A set A is closed if it contains all of its limit points, that is, if the points Xj ∈ A converge to a point X, Xj → X , then X is also in A . An open ball is not closed, for a sequence of points in the ball may converge to a point on the boundary, and the boundary points are not in the ball. For the special case of R1 , these notions coincide with those of open and closed intervals. Again, sets - like doors - may be neither open nor closed. A point set D is bounded if it is contained in some ball (of possibly large radius). The point X is exterior to D if X does not belong to D and if there is some ball about X none of whose point are in D . X is interior to D if X belongs to D and there is some ball about X all of whose points are in D . X is a boundary point of D if it is neither interior nor exterior to D . Note that a boundary point of D may or may not belong to ¯ D . For example, the boundaries of the open and closed balls B (0; r), B (0; r) are the same. The boundary of a set D is denoted by ∂D . It is evident that a set is open if and only if every point is an interior point, and a set is closed if and only if it contains all of its boundary points. Definition: Let A be a set in Rn and C a set in Rm . The function F : A → C is continuous at the interior point X0 ∈ A if, given any radius > 0 , there is a radius δ > 0 7.2. GENERALITIES ON MAPPINGS FROM RN TO RM . 297 such that F (X ) − F (X0 ) < [Observe the norm on the left is in It is easy to prove Rm textf orall X − X0 < δ. while that on the right is in Rn ]. Theorem 7.6 . An affine mapping F (X ) = b + LX from Rn to Rm is continuous at every point X0 ∈ Rm . Proof: First, F (X ) − F (X0 ) = b + LX − b − LX0 = L(X − X0 ) . Thus, F (X ) − F (X0 ) = L(X − X0 ) . Let ((aij )) be a matrix representing L with respect to some bases for Rn and Rm . Then by Theorem 17, p. 373 L(X − X0 ) 2 = L(X − X0 ), L(X − X0 ) ≤ k L(X − X0 ) where m X − X0 , n a2 . ij k2 = i=1 j =1 Therefore L(X − X0 ) ≤ k X − X0 . It is now clear that if X → X0 , then L(X − X0 ) → 0 . More formally, given any δ = k+1 , we have F (X ) − F (X0 ) < textf orall > 0 , if X − X0 < δ. The following theorems have the same proofs as were given earlier for special cases. (See a first year calculus book and our Chapter 0). Theorem 7.7 . Let F1 and F2 map A ⊂ Rn into C ⊂ Rm . If F1 and F2 are continuous at the interior point X0 ∈ A , then 1. aF1 + bF2 is continuous at X0 . 2. F1 , F2 is continuous at X0 . Theorem 7.8 . Let F = (f1 , . . . , fm ) map A ⊂ Rn into C ⊂ Rm . Then F is continuous at the interior point X0 ∈ A if and only if each of the fj , j = 1, . . . , m , is continuous at X0 . Theorem 7.9 . Let F : A → C , where A is a closed and bounded (= compact) set. If F is continuous at every point of A , then it is bounded; that is, there is a constant M such that F (x) ≤ M for all X ∈ A . Moreover, if M0 is the least upper bound, then there is a point X0 ∈ A such that F (X0 ) = M0 . Similarly, if m0 is the greatest lower bound for F , then there is a point X1 ∈ A such that F (X1 ) = m0 . 
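The key inequality in the proof of Theorem 7.6, ||L(X - X0)|| <= k ||X - X0|| with k^2 the sum of the squares of the entries a_ij, can be tested numerically. Here is a minimal Python sketch; the random affine map, the dimensions, and the small tolerance are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random affine map F(X) = b + L X from R^n to R^m (illustrative sizes).
n, m = 4, 3
L = rng.normal(size=(m, n))
b = rng.normal(size=m)

k = np.linalg.norm(L, 'fro')   # k^2 = sum of the squares of the entries a_ij

# Check ||F(X) - F(X0)|| = ||L(X - X0)|| <= k ||X - X0|| on random pairs.
for _ in range(5):
    X, X0 = rng.normal(size=n), rng.normal(size=n)
    lhs = np.linalg.norm((b + L @ X) - (b + L @ X0))
    rhs = k * np.linalg.norm(X - X0)
    print(lhs <= rhs + 1e-12, lhs, rhs)
```

Of course such a check proves nothing; it merely illustrates that k, the square root of the sum of the squares of the entries, really does dominate the stretching done by L.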
There is nothing better than to close this otherwise unauspicious section with one of the crown jewels of mathematics - the Fundamental Theorem of Algebra, all of whose proofs require the non-algebraic notion of continuity. Let p(z ) = a0 + a1 z + · · · + an z n , (n ≥ 1), where the aj ’s are complex numbers and an is not zero. For every complex number z , the value of the function p(z ) is a complex number. Thus p : C → C . We want to prove there is at least one z0 ∈ C such that p(z0 ) = 0 . 298 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Lemma 7.10 . p(z ) is a continuous function for every z ∈ C . Proof: Identical to the proof that a real polynomial is continuous everywhere. Lemma 7.11 Let D be a set in the complex plane in which p(z ) = 0 . The minimum modulus of p(z ) , that is, the minimum value of |p(z )| , cannot occur at an interior point of D . It must occur on the boundary ∂D of D . Proof: Let z0 be any interior point of D . Rewrite p(z ) in the form p(z ) = b0 + b1 (z − z0 ) + · · · + bn (z − z0 )n . Since p(z0 ) = 0 , we know b0 = 0 . Also, because p is not identically constant, at least one coefficient following b0 is not zero. Take bk to be the first such coefficient. We must write b0 , bk and z − z0 in polar form, b0 = ρ0 euα bk = ρ1 eiβ z − z0 = ρeiθ , where ρ0 = |p(z0 )| , ρ1 and ρ are positive real numbers. Here we are restricting z to a point on a circle of radius ρ about z0 , after taking ρ small enough to insure this circle is interior to D . Then p(z ) = ρ0 eiα + ρ1 eiβ ρk eikθ + bk+1 (z − z0 )k+1 + · · · + bn (z − z0 )n = ρ0 eiα + ρ1 ρk ei(β +kθ) + (z − z0 )k+1 [bk+1 + · · · + bn (z − z0 )n−k−1 ]. Pick the particular point z on the circle whose argument θ is given by β + kθ = α + π . ˆ Then ei(β +kθ) = ei(α+π) = −eiα , so p(ˆ) = (ρ0 − ρ1 ρk )eiα + (ˆ − z0 )k+1 [bk+1 + · · · + bn (ˆ − z0 )n−k−1 ]. z z z By the triangle inequality we find |p(ˆ)| ≤ ρ0 − ρ1 ρk + ρk+1 [|bk+1 | + · · · + |bn | ρn−k−1 ]. z Choose the radius ρ so small that ρ0 − ρ1 ρk ≥ 0 . Then |p(ˆ)| ≤ ρ0 − ρ1 ρk + ρk+1 [|bk+1 | + · · · + |bn | ρn−k−1 ]. z By choosing ρ smaller yet, if necessary, we can make the term ρ[|bk+1 | + · · · + |bn | ρn−k−1 ] < 1 2 ρ1 . Consequently, 1 1 |p(ˆ)| ≤ ρ0 − ρ1 ρk + ρ1 ρk = ρ0 − ρ1 ρk z 2 2 < ρ0 = |p(z0 )| . Thus, if z0 is any interior point of a domain D in which p does not vanish, then there is a point z also interior to D such that |p(z )| < |p(z0 )| . Therefore, the minimum of |p(z )| ˆ must occur on the boundary of any set in which p does not vanish. Lemma 7.12 . Given any real number M , there is a circle |z | = R on which |p(z )| > M for all z, |z | = R . 7.2. GENERALITIES ON MAPPINGS FROM RN TO RM . 299 Proof: For z = 0 , we can write the polynomial p(z ) as p(z ) an−1 a0 = an + + ··· + n. n z z z From the triangle inequality written in the form |f1 + f2 | ≥ |f1 | − |f2 | , we find an−1 p(z ) a0 ≥ |an | − + ··· + n . zn z z If |z | is taken large enough, |z | ≥ R0 , it is possible to make the second term on the right less than |an | /2 , an−1 an a0 + ··· + n < , z z 2 |z | = R > R0 texton Therefore, for |z | = R ≥ R0 1 an p(z ) = |an | , ≥ |an | − n z 2 2 so 1 |an | Rn , texton |z | = R. 2 It is now clear that by choosing R sufficiently large, |p(z )| can be made to exceed any constant M on the circle |z | = R . |p(z )| ≥ Theorem 7.13 (Fundamental Theorem of Algebra). Let p(z ) = a0 + a1 z + · · · + an z n , an = 0, n ≥ 1, be any polynomial with possibly complex coefficients, a0 , a1 , . . . , an . Then there is at least one number z0 ∈ C such that p(z0 ) = 0 . 
In other words, every polynomial has at least one complex root. Proof: By Lemma 3, we can find a large circle |z | = R , on which |p(z )| > 2 |a0 | for all |z | = R . Since p(z ) is a continuous function, by Theorem 4 there is a point z0 in the closed and bounded disc |z | ≤ R for which |p| attains its minimum value m0 , |p(z0 )| = m0 . If p(z0 ) = 0 , we are done. However if p does not vanish inside the closed disc, by the important Lemma 2 its minimum value is attained only on the boundary, so z0 is on the circle |z0 | = R . But on the circle we know |p(z0 )| > 2 |a0 | = 2 |p(0)| , so the minimum is not at z0 after all. The assumption that p does not vanish in the disc |z | ≤ R had led us to a contradiction. Notice the proof does not give a procedure for finding the root whose existence has been proved. Exercises 1. Prove Theorem 2, part 1. 2. Use the Fundamental Theorem of Algebra along with the “factor theorem” of high school algebra to prove that a polynomial of degree n has exactly n roots (some of which may be repeated roots). 300 7.3 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Mapping from 1 E to n E . As a particle moves along a curve γ in En its position F (t) at time t can be specified by a vector X = F (t) = (f1 (t), f2 (t), . . . , fn (t)), where xj = fj (t) is the j th coordinate of the position at time t . Thus, the curve is specified by F (t) , a mapping from numbers to vectors, F : A ⊂ E1 → En , where A is the domain of definition of F . For example, the mapping F : t → (cos πt, sin πt, t), t ∈ (−∞, ∞) which may also be written as F (t) = (cos πt, sin πt, t) can be thought of as describing the motion of a particle along a helix. It is natural to ask about the velocity, which means derivative must be defined. Definition: Let F (t) define a curve γ for t in the interval A = [a, b] . Consider the difference quotient F (t + h) − F (t) , h t textand t+h in A, where t is fixed. If this vector has a limit as h tends to zero, then F is said to have a derivative F (t) at t , F (t + h) − F (t) F (t) = lim , h→0 h while the curve has slope F (t) at t . Some other common notations are ˙ F (t), dF , dt Dt F. The curve γ is called smooth if i) the derivative F (t) exists and is continuous for each t in [a, b] , and if ii) F (t) = 0 for any point t in [a, b] . If t represents time, then F (t) is the velocity of the particle at time t while F (t) is the speed. If F (t) is given in terms of coordinate functions, F : t → (f1 (t), . . . , fn (t)) , how can the derivative of F be computed? Theorem 7.14 . If F (t) = (f1 (t), . . . , fn (t)) is a differentiable mapping of A ⊂ E1 into En , then the coordinate functions are differentiable and dF = dt df1 df2 dfn , ,··· , dt dt dt . Conversely, if the coordinate functions are differentiable, then so is F (t) and the derivative is given by the above formula. 7.3. MAPPING FROM E1 TO EN 301 Proof: If t and t + h are both in A , then 1 F (t + h) − F (t) = [(f1 (t + h), . . . , fn (t + h)) − (f1 (t), . . . , fn (t))] h h = fn (t + h) − fn (t) f1 (t + h) − f1 (t) ,··· , h h Since the limit as h → 0 of the expression on the left exists if and only if all of the limits lim h→0 fj (t + h) − fj (t) , h j = 1, . . . , n exist, the theorem is proved. Examples: (1) If F : t → (cos πt, sin πt, t), t ∈ (−∞, ∞), F is differentiable for all t since each of the coordinate functions are differentiable. Also, F (t) = (−π sin πt, π cos πt, 1). 
In addition, the curve - a helix - which F defines is smooth since F is continuous and F (t) − π 2 sin2 πt + π 2 cos2 πt + 1 = π 2 + 1 = 0, (2) Let F : t → (a1 + b1 t, a2 + b2 t, a3 + b3 t) = P + Qt where P = (a1 , a2 , a3 ) and Q = (b1 , b2 , b3 ) are constant vectors. Then the curve F defines is a straight line which passes through the point P = (a1 , a2 , a3 ) at t = 0 . F is differentiable for all t , since each of the coordinate functions are differentiable. Furthermore, F (t) = Q = (b1 , b2 , b3 ), a constant vector pointing in the direction Q = (b1 , b2 , b3 ) , as is anticipated for a straight line. Because F (t) = Q = b2 + b2 + b2 , 1 2 3 this curve is smooth except in the degenerate case b1 = b2 = b3 = 0 , that is, Q = 0 , when the curve degenerates to a single point, F (t) = (a1 , a2 , a3 ) = P . (3) The curve defined by the mapping F : t → (t, |t|) is differentiable everywhere and F (t) = 0 except at t = 0 . It is not differentiable there since the second coordinate function, f2 (t) = |t| is not differentiable at t = 0 . Thus, the curve is smooth except at t = 0 . (4) The curve defined by the mapping F : t → (t3 , t2 ) is differentiable everywhere, and F (t) = (3t2 , 2t). √ However, F (t) = 9t4 + 4t2 , so the curve is smooth everywhere except at t = 0 , which corresponds to a cusp at the origin in the x1 , x2 plane. 302 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION It is elementary to compute the derivative of the sum of two vectors. The derivative of a product can be defined for the inner product, and for the product with scalar-valued function. Theorem 7.15 . If F (t) and G(t) both map an interval A ⊂ E1 into En , and are both differentiable there, then for all t ∈ A , d 1. dt [aF + bG] = a dF + b dG (linearity of the derivative). dt dt d d 2. dt F, G = F , G + F, G , (in “dot product” notation: dt (F · G) = F · G + F · G ). Proof: Since these are identical to the proofs of the corresponding statements for scalarvalued functions, we prove only the second statement. 1 d F (t), G(t) = lim [ F (t + h), G(t + h) − F (t), G(t) ] h→0 h dt 1 [ F (t + h) − F (t), G(t + h) + F (t), G(t + h) − G(t) ] h→0 h = lim = lim h→0 F (t + h) − F (t), h G(t + h) − G(t) + F (t), G(t + h) h = F (t), G(t) + F (t), G (t) . An interesting and simple consequence is the fact that if a particle moves on a curve F (t) which remains a fixed distance from the origin, F (t) ≡ constant = c , then the velocity vector F is always orthogonal to the position vector F . This follows from c2 = F (t) 2 = F (t), F (t) , so taking the derivative of both sides we find 0 = F , F + F, F = 2 F, F . Thus F, F = 0 for all t , an algebraic statement of the orthogonality. As a particular example, the mapping π π F (t) = (cos , sin ) 2 1+t 1 + t2 has the property F (t) = 1 for all t . You can see the path of the particle in the figure. At t = 0 the particle is at (−1, 0) . As time increases, the particle moves along an arc of the unit circle toward (1, 0) , reaching (0, 1) at t = 1 . The velocity at time t is F (t) = 2πt π π (sin , − cos − ). (1 + t2 )2 1 + t2 1 + t2 From this expression, it is evident the particle slows down as it approaches (1, 0) . In fact, the particle never does manage to reach (1, 0) . We would like to define the notion of a straight line which is tangent to a smooth curve at a given point. There is one touchy issue. You see, the curve may intersect itself, thus having two or more tangents at the same point. 
Once acknowledged, the difficulty is resolved by realizing that for each value of t , there is a unique point F (t) on the curve. X0 is a double point if F (t1 ) = F (t2 ) = X0 . 7.3. MAPPING FROM E1 TO EN 303 By picking one value of t , there will be a unique tangent line to the curve for this value of t . Thus, we define the tangent line for t = t1 to the curve defined by a differentiable function F (t) as the straight line whose equation is A(t) = F (t1 ) + F (t1 )(t − t1 ). At t = t1 , the curves defined by F (t) and A(t) have the same value F (t1 ) = X0 and the same derivative (slope), F (t) . Example: Consider the curve defined by the mapping F : t → (3 + t3 − t, t2 − t), t ∈ (−∞, ∞) . The point (3, 0) is a double point since F : 0 → (3, 0) and F : 1 → (3, 0) . Thus, the line tangent to the point (3, 0) when t = 1 is defined by A(t) = (3, 0) + (2, 1)(t − 1) = (3, 0) + (2(t − 1), (t − 1)) or A(t) = (1, −1) + (2t, t). Since we are still working with functions F (t) of one real variable t , the mean value theorem and chain rule follow immediately by applying the corresponding theorems for scalar valued functions to each of the components f1 (t), . . . , fn (t) of F (t) . Theorem 7.16 (Approximation Theorem and Mean Value Theorem). If the vector valued function F (t) is continuous for t ∈ [a, b] and differentiable for t ∈ (a, b) then for t0 ∈ (a, b) , 1. F (t) = F (t0 ) + dF t0 (t − t0 ) + R(t, t0 ) |t − t0 | where dt lim R(t, t0 ) = 0. t→t0 2. There is a point τ between t and t0 such that F (t) − F (t0 ) ≤ F (τ ) |t − t0 | . 3. If F = (f1 , . . . , fn ) , there are points τ1 , . . . , τn between t and t0 such that F (t) = F (t0 ) + L(t − t0 ), where L is the linear transformation L = (f1 (τ1 ), f2 (τ2 ), . . . , fn (τn )) Remark: Although 1 and 3 follow from the one variable case f (t) —and will be proved again in greater generality later on - the proof of 2 is difficult under our weak hypothesis. If the stronger assumption, F is continuously differentiable, is made, then 2 becomes easy, and the factor F (τ ) can be replaced by a constant M = max F (τ ) , since a continuous τ ∈[a,b] function F (τ ) does assume its maximum if τ is in a closed and bounded set, τ ∈ [a, b] . Corollary 7.17 . If F satisfies the hypotheses of Theorem 8 and if F (t) ≡ 0 for all t ∈ [a, b] , then F is a constant vector. Proof: Just look at 2 or 3 above to see that for any points t, t0 in [a, b] , we have F (t) = F (t0 ) . 304 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Theorem 7.18 (Chain Rule). Consider the vector-valued function F (t) which is differentiable for t ∈ (a, b) , and the scalar valued function φ(s) which is differentiable for s ∈ (α, β ) . If the range of φ is contained in (a, b), R(φ) ⊂ (a, b) , then the composed function G(s) = (F ◦ φ)(s) = F (φ(s)) is differentiable as a function of s for all s in (α, β ) and G (s) = F (φ(s))φ (s), that is, dF dφ dF dG (s) = (φ) (s) = (t) ds dφ ds dt t=φ(s) dφ (s). ds If F (t) = (f1 (t), . . . , fn (t)) , then G(s) = F ((s)) = (f1 (φ(s)), . . . , fn (φ(s)), and G (s) =)f1 (φ)φ (s), . . . , fn (φ)φ (s)) = (f1 (φ), . . . , fn (φ))φ (s). Proof not given here. It is the same as that given in elementary calculus for n = 1 . A more general theorem containing this one is proved later (p. 701). Examples: 1. If F (t) = (1 − t2 , t3 − sin πt) and φ(s) = e−s , then G(s) = (F ◦ φ)(s) = (1 − −2s , e−3w − sin πe−2 ) . We compute G (s) in two distinct ways, using the chain rule, and e directly from the formula for G(s) . 
By the chain rule: G (s) = F (t) t=φ(s) 2 φ (s) = (−2t, 3t − π cos πt) t=e−s (−)e−s = −(−2e−s , 3e−2s − π cos πe−s )e−s , In particular, at s = 0 , since t = 1 when s = 0 , we find G (0) = −(−2, 3 + π ) = (2, −3, −π ) Directly from the formula for G(s) = (1 − e−2s , −3e−3s − sin πe−s ), we find G (s) = (2e−2s , −3e−3s + πe−s cos πe−2 ), which agrees with the chain rule computation. Since the derivative F (t) of a function F (t) from numbers to vectors, F : E1 → En , is also a function of the same type, the second and higher order derivatives can be defined inductively; d dk+1 d d2 F (t) := F (t), k+1 F (t) := F (k) (t). dt2 dt dt dt 7.3. MAPPING FROM E1 TO EN 305 Example: If F : t → (cos πt, sin πt, t) , then d (−π sin πt, π cos πt, 1) dt F (t) = = (−π 2 cos πt, −π 2 sin πt, 0). If F (t) represents the position of a particle at time t , then F (t) is the acceleration of the particle at time t . All of these ideas were used in the last two sections in Chapter 6 where linear systems of ordinary differential equations were encountered. Time permitting, a second application to a non-linear system of O.D.E.’s will be treated in Section of Chapter. There another of the crown jewels in the intellectual history of mankind will be discussed: Newton’s incredible solution of “the two body problem”, that is, to determine the motion of the heavenly bodies. Recall that the length of a curve is defined to be the limit of the lengths of inscribed polygons which approximate the curve as the length of the longest subinterval tends to zero - if the limit does exist. Let the curve γ , which we assume is smooth, be determined by the function F (t), t ∈ [a, b] . Then the length of the straight line joining F (tj ) to F (tj + ∆tj ), tj +1 = tj + ∆tj , is F (tj + ∆tj ) − F (tj ) ∆ tj ∆ tj F (tj + ∆tj ) − F (tj ) = Adding up the lengths of these segments and letting the largest ∆tj tend to zero, we find the length of γ is given by b L(γ ) = F (t) dt. 1 If the function F is defined through coordinates, F (t) = (f1 (t), . . . , fn (t)) , this formula reads b L(γ ) = f a 2 1 +f 2 2 + ··· + f 2 dt. n You will recognize the special case where F (t) = (x(t), y (t)) b L(γ ) = x2 + y 2 dt. ˙ a Example: Find the length of the portion of the helix γ defined by F (t) = (cos t, sin t, t) , for t ∈ [0, 2π ] . This is one “hoop” of the helix. Since F (t) = (− sin t, cos t, 1) , we have √ F (t) = sin2 t + cos2 t + 1 = 2 , so the length is 2π L(γ ) = √ √ 2 dt = 2π 2. 0 For each t ∈ [a, b] , we can define an arc length function s(t) , the arc length from a to t , by t s(t) = F (τ ) dτ. a Note we are using a dummy variable of integration τ . By the fundamental theorem of calculus, we have ds = F (t) dt 306 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Since ds/dt can be thought of as the rate of change of arc length with respect to time, it is the speed of a particle moving along the curve, the tangential speed. The integral used in arc length is the integral of a scalar- valued function F (t) . How can we define the integral of a vector-valued function F (t) = (f1 (t), . . . , fn (t)) ? Just integrate each component, assuming they are all integrable of course, b b F (t) dt := ( a b f1 (t) dt, . . . , a For example, if F (t) = (t − 3t2 , 1 − 2 √ 2t, e3t ) , then 2 0 2 (t − 3t2 ) dt, F (t) dt = ( fn (t) dt). a 0 (1 − √ 2 e3t dt) 2t) dt, 0 0 2 = (−4, − , e6 − 1). 3 We give no physical interpretation of the integral (as an area or the like) except in the case b where F (t) represents the velocity of a particle. 
Then from the position at t = a to the position at t = b . F (t) dt is the vector pointing a Exercises (1) (a) Describe and sketch the images of the curves F : E1 → E2 defined by (i) (ii) (iii) (iv) (v) F (t) = (2t, 3 − t) F (t) = (2t, |3 − t|) F (t) = (t2 , 1 + t2 ) F (t) = (2t, sin t) F (t) = (t2 , 1 + t4 ) (b) Which of the above mappings are differentiable and for what value(s) of t ? Find the derivatives if the functions are differentiable. Which of the curves defined by these mappings are smooth, and where are they not smooth? (2) Use the definition of the derivative to find F (t) at t = 2π for the functions a). F (t) = (2t, 3 − t) b). F (t) = (1 + t ∈ (−∞, ∞). t2 , sin 2t). t ∈ (−∞, ∞). (3) Find the lengths of the curves γ defined by the mappings a). F (t) = (a1 + b1 t, a2 + b2 t, . . . , an + bn t), = P + Qt, t ∈ [0, 1]. b). F (t) = (sin 2t, 1 − 3t, cos 2t, 2t3/2 ), t ∈ [−π, 2π ] (4) Consider the curve defined by the equation F (t) = (t − t2 , t4 − t2 + 1), t ∈ (−∞, ∞) a). Sketch the curve. b). Where does the curve intersect itself? c). Find the line tangent to the curve at the image of t = 1 . 7.3. MAPPING FROM E1 TO EN 307 (5) If F : A ⊂ E1 → En is twice continuously differentiable and F (t) ≡ 0 for all t ∈ A , what can you conclude? Please prove your assertion. [Hint: First consider the special case where F : E1 → E1 ]. (6) Let F (t) be a twice differentiable function which maps a set in E1 into En and satisfies the ordinary differential equation F + µF + kF = 0 , where k and µ are positive constants. Define the energy as E (t) = 1 F 2 2 1 + kF 2 2 (a) Prove E (t) is a non-increasing function of t (energy is dissipated). [Hint: dE/dt =? ]. (b) If F (0) = 0 and F (0) = 0 , prove E (t) ≡ 0 . (c) Prove there is at most one function which satisfies the given differential equation as well as the initial conditions F (0) = A, F (0) = B , where A and B are given vectors. 1 (7) If F (t) = (1 − e2t , t3 , 1+t2 ) , and φ(x) = using the chain rule. 1 1+x , x > −1 , compute d dx (F ◦ φ)(x) by (8) Compute d2 F/dt2 for the function F (t) in Exercise 7. (9) (a) Show that the equation of a straight line which passes through the point P1 at t = 0 and P2 at t = 1 is F (t) = P1 + (P2 − P1 )t. (b) Find the equation of a straight line which passes through the point P1 = (1, 2, 3) at t = 0 and P2 = (1, −5, 0) at t = 1 . (c) Find the equation of a straight line which passes through the point P1 at t = t1 and P2 at t = t2 . (d) Apply this to find the equation of a straight line which passes through P1 = (−3, 1, −2) at t = −1 and P2 = (0, 2, 1) at t = 2 . What is the slope of this line? (10) Given a smooth curve all of whose tangent lines pass through a given point, prove that the curve is a straight line. (11) Let F : E1 → En define a smooth curve which does not pass through the origin. Show that the position vector F (t) is orthogonal to the velocity vector at the point of the curve which is closest to the origin. Apply this to prove anew the well known fact that the radius vector to any point on a circle is perpendicular to the tangent vector at that point. [Hint: Why is it sufficient to minimize ϕ(t) = F (t), F (t) ?] 308 CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION Chapter 8 Mappings from En to E : The Differential Calculus 8.1 The Directional and Total Derivatives . Throughout this and the next chapter we shall consider functions which map En or a portion of it A , into E . 
By the statement f : A → E, A ⊂ En we mean that to every vector X in A , the function (operator, map, transformation) assigns a unique real number w . Thus w = f (X ) in this case is a map from vectors to numbers. Two particular examples prove helpful in thinking conceptually about mappings of this type. (1) The temperature function. f : A → E , where the set A ⊂ E3 is the room in which you are sitting. To every point X in the room, A , this function f assigns a number - the temperature f (X ) at X, w = f (X ) . (2) The height function. f : A → E , where the set A is some set in the plane E2 . To every point X in A , this function f assigns a number - the height f (X ) of a surface (or manifold) M above that point. Thus, the set of all pairs (X, f (x)), X ∈ A , defines a portion of a surface, a surface in E2 × E ∼ E3 . = From the second example, it is clear that every function f : A ⊂ En → E may be regarded as the graph of a surface in En × E ∼ En+1 , the surface being regarded as all = points in En+1 of the form (X, f (X )) , where X ∈ A . For example, the temperature function can be thought of as the graph of a surface in E4 , the height of the surface w = f (X ) above X being the temperature at X . (Compare with the discussion from p. 322 bottom, to p. 324). In concrete situations, the point X ∈ En is specified by giving its coordinates with respect to some fixed bases for En and E . The particular coordinate system used depends on the geometry of the problem at hand. Rectangular symmetry calls for the standard 309 310 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS rectangular coordinates, while polar coordinates are well suited to problems with circular symmetry. We shall meet these issues head-on a bit later. If X = (x1 , . . . , xn ) with respect to some coordinates for En , then we write w = f (X ) = f (x1 , . . . , xn ) . The points (X, f (X )) on the graph are (x1 , . . . , xn , f (x1 , . . . , xn )) , which we may also write as (x1 , . . . , xn , f ) or else as (x1 , . . . , xn , w) . For low dimensional spaces, E2 or E3 , it is convenient to avoid subscripts. In these situations we shall write w = f (x, y ) and w = f (x, y, z ) for mappings with domains in E2 and E3 , respectively. We now examine some more specific examples. Examples: (1) w = − 1 x + y − 1 . This function assigns to every point X = (x, y ) in E2 a number w 2 in E . We can represent the function, an affine mapping from E2 → E , as the graph of a plane in E3 . The linear nature of the plane reflects the fact that the mapping is an affine mapping - a linear mapping except for a translation of the origin. More generally, the function w = α + a1 x1 + a2 x2 + · · · + an xn , an affine mapping from En → E , represents a plane in En+1 . In fact, this can be taken as the algebraic definition of a plane in En+1 . These affine functions are the simplest functions which map En into E . Although we shall not, it is customary to abuse the nomenclature and refer to affine mappings as being linear. This is because they share most of the algebraic and geometric properties of proper linear mappings, as opposed to the honestly nonlinear mappings we will be treating as in the next examples. (2) w = x2 + y 2 . This function assigns to every point X = (x, y ) in E2 a real number w ∈ E . We can represent the function as the graph of a paraboloid of revolution, obtained by rotating the parabola w = x2 about the w axis. 
If this paraboloid is cut by a plane parallel to the x, y plane, say w = 2 , the intersection of these two surfaces is the circle x2 + y 2 = 2 . (3) w = −x2 + y 2 . This function can be represented as the graph of a very fancy surface - a hyperbolic paraboloid. If this surface is cut by a plane parallel to the x, y plane, w = c , the intersection is the curve c = −x2 + y 2 . For c > 0 , this curve is a hyperbola which opens about the y axis, while if c < 0 , the curve is a hyperbola which opens + about the x axis. For c = 0 we obtain two straight lines, x =− y (see fig). The intersection of the surface with the plane x = c is a parabola which opens upward in the y, w plane. Similarly, the intersection of the surface with the plane y = c is a parabola which opens downward in the xw plane. This curve is rightly called a saddle, and the origin (0, 0, 0) a saddle point (or mountain pass) since a particle can remain at rest at that point, or ii) move on the surface in one direction and go up, or iii) move on the surface in another direction and go down. Let f (X ) be a function from vectors to numbers, f : A ⊂ En → E. How can we define the notion of derivative for such functions? The derivative should measure the rate of change of f (X ) as X moves about. But if you think of f (X ) as the temperature function, it is clear that the temperature will change at different rates depending which direction you move. Thus, if you move across the room in 8.1. THE DIRECTIONAL AND TOTAL DERIVATIVES 311 the direction of the door, the temperature may decrease, while if you move up to the ceiling, the temperature will likely increase. Thus, the natural notion of a derivative is the rate of change in a particular direction - a directional derivative. Let X0 denote your position and f (X0 ) the temperature there. Take η to be a free vector, which we shall think of as pointing from X0 to X0 + η . We want to define the rate at which the temperature changes as you move from X0 in the direction η toward X0 + η . Since all points on the line joining X0 to X0 + η are of the form X0 + λη , where λ is a real number, the difference f (X0 + λη ) − f (X0 ) is the difference between the temperatures at X0 + λη and at X0 . Definition: Let f : A ⊂ En → E . The derivative of f at the interior point X0 ∈ A with respect to the vector η is f (X0 + λη ) − f (X0 ) , λ→0 λ f (X0 ; η ) = lim if the limit exists. In the special case when η = e is a unit vector, e = 1 , we see that λ = λe . Then De f (X0 ) := f (X0 ; e) is the instantaneous rate of change of f per unit length as X moves from X0 toward X0 + 3 . This normalization to using only unit vectors is necessary to have a meaningful definition of a directional derivative. Thus, the directional derivative of f at X0 ∈ A in the direction of the unit vector e is the derivative with respect to the unit vector e . It measures how f changes as you move from X0 to a point on the unit sphere about X0 . For theoretical purposes, the derivative of f with respect to any vector η is useful, while for practical purposes, the more restrictive notion of the directional derivative is needed. Example: 1 Find the directional derivative of f (X ) = x2 − 2x1 x2 + 3x2 at X0 = (1, 0) in 1 η the direction η = (−1, 1) . Note that η is not a unit vector. The unit vector is e = η = 1 1 (− √2 , √2 ) . Then 11 λλ X0 + λe = (1, 0) + λ(− √ , √ ) = (1 − √ , √ ), 22 22 so λ λ λ λ f (X0 + λe) = (1 − √ )2 − 2(1 − √ )( √ ) + 3( √ ) 2 2 2 2 3 λ = 1 − √ + λ2 . 
22 Thus, 1− f (X0 + λe) − f (X0 ) = λ Therefore, the directional derivative De f is λ √ 2 3 + 2 λ2 − 1 λ −1 3 = √ + λ. 22 f (X0 + λe) − f (X0 ) −1 =√ . λ→0 λ 2 De f (X0 ) = lim 1 In words, the rate of change of f at X0 in the direction of the unit vector e is − √2 . One qualitative conclusion we arrive at is that f (X ) decreases as X moves from X0 in the direction e . 312 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS 2. Compute f (X ; η ) if f (X ) = X, AX , where A is a self-adjoint transformation. f (X + λη ) = X + λη, A(X + λη ) = X, AX + λ η , AX + λ X, Aη + λ2 η , Aη and since A is self-adjoint, = X, AX + 2λ AX, η + λ2 η , Aη . Thus, f (X + λη ) − f (X ) = 2 AX, η . λ→0 λ f (X, η ) = lim In particular when A = I is the identity operator, f (X ) = X 2 X, η 2 , we find f (X ; η ) = The directional derivatives of f in the particular direction of the coordinate axes e1 = (1, 0, . . .), e2 = (0, 1, 0, . . .) have special names. They are called the partial derivatives of f . For example, the partial derivative of f (X ) = f (x1 , x2 , . . . , xn ) at X0 with respect to x2 is ∂f f (X0 + λe2 ) − f (X0 ) (X0 ) := f (X0 ; e2 ) = lim λ→0 ∂x2 λ There are many other competing notations, all of them being used. We shall list them shortly, after observing there is a simple way to compute these partial derivatives. Consider f (X ) = f (x − 1, x2 , x3 ) . Then f (X + λe1 ) − f (X ) ∂f (X ) = lim λ→0 ∂x2 λ Since X + λe1 = (x1 , x2 , x3 ) + λ(1, 0, 0) = (x1 + λ, x2 , x3 ) we have f (x1 + λ, x2 , x3 ) − f (x1 , x2 , x3 ) ∂f (X ) = lim . λ→0 ∂x2 λ But this is the ordinary derivative of f with respect to the single variable x1 , while holding the other variables x2 and x3 fixed. Thus, ∂f /∂x1 can be computed by merely taking the ordinary one variable derivative of f with respect to x1 , pretending the other variables are constants. Example: If f (X ) = x2 + x1 ex1 x2 , find the rate of change of f at the point X0 in the 1 ∂f ∂f directions e1 = (1, 0) and e2 = (0, 1) . Thus, we want to compute ∂x1 (X0 ) and ∂x2 (X0 ). ∂f = 2x1 + ex1 x2 + x1 x2 ex1 x2 ∂x1 ∂f = x2 ex1 x2 1 ∂x2 At the pint X0 = (2, −1) , we have ∂f ∂x1 = 4 − e−2 , 2 ,− 1 ∂f ∂x2 = 4e−2 . (2,−1) 8.1. THE DIRECTIONAL AND TOTAL DERIVATIVES 313 Some common notation. If w = f (x1 , x2 ) , then ∂w ∂f = = D1 f = f1 = fx1 = wx1 ∂x1 ∂x1 ∂w ∂f = = D2 f = f2 = fx2 = wx2 ∂x2 ∂x2 If f : A ⊂ En → E , then ∂f /∂xj is another function of X = (x1 , x2 , . . . , xn ) . It is then possible to take further partial derivatives. Example: Let w = f (X ) = x2 + x1 ex1 x2 as in the previous example. Then 1 w11 = f11 = fx1 x1 = ∂2f ∂ ∂f x1 x2 = x2 ex1 x2 + x1 x2 ex1 x2 2 2 = ∂x ( ∂x ) = 2 + x2 e ∂x1 1 1 w12 = f12 = fx1 x2 = ∂ ∂f ∂2 = ( ) = x1 ex1 x2 = x1 ex1 x2 + x2 x2 ex1 x2 1 ∂x1 ∂x2 ∂x2 ∂x1 w21 = f21 = fx2 x1 = ∂2f ∂ ∂f = ( ) = 2x1 ex1 x2 = x2 x2 ex1 x2 = f12 1 ∂x2 ∂x1 ∂x1 ∂x2 w22 = f22 = fx2 x2 = ∂2f ∂ ∂f 3 x1 x2 . 2 = ∂x ( ∂x ) = x1 e ∂x2 2 2 And even higher derivatives can be computed too, like f221 = fx2 x2 x1 = ∂3f ∂ ∂2f 2 x1 x2 + x3 x2 ex1 x2 . 1 2 ∂x = ∂x ( ∂x2 ) = 3x1 e ∂x2 1 1 2 Remark: From this one example, it appears possible that we always have f12 = f21 , ∂2f ∂2f that is ∂x1 ∂x2 = ∂x2 ∂x1 . This is indeed the case if the second partial derivatives of f are continuous, but for lack of time we shall not prove it (see Exercise 6). So far we have defined the directional derivative of a function f : En → E and called particular attention to those in the direction of the coordinate axes - the partial derivatives of f . 
Although the actual computation of the partial derivatives has been reduced to the formal procedure of computing ordinary derivatives, the computation of the directional derivative in an arbitrary direction must still be done by using the definition: the limit of a difference quotient. We shall now reduce the computation of all directional derivatives to a simple formal procedure. In order to do so, we shall introduce the concept of the total derivative for functions f : A ⊂ En → E1 . This derivative will not be a directional derivative, but rather a more general object. The motivating idea here is the important one of approximating a non-linear function f at a point X0 by a linear function. If we think of the function f (X ) as defining a surface M in En+1 with point (X, f (X )) , then the picture is that of approximating the surface M near X0 by a plane (or hyperplane) tangent to the surface at X0 . We want to write f (X ) ∼ f (X0 ) + L(X − X0 ), where L is a linear operator, L : En → E , which may depend on the “base point” X0 . Of course, as X → X0 we want the accuracy to improve in the sense that the tangent plane should be a better approximation the closer X is to X0 . At X = X0 , the tangent 314 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS plane f (X0 ) + L(X − X0 ) and surface M touch since they both pass through the point (X0 , f (X0 )) . Notice that the function f (X) ) + L(X − X0 ) is affine, so it does represent a plane surface. Motivated by the above considerations, we can now make a reasonable Definition: Let f : A ⊂ En → E and X0 be an interior point of A . f is differentiable at X0 if there exists a linear transformation L : En → E such that lim h →0 f (X0 + h) − f (X0 ) − Lh = 0, h for any vector h in some small ball about X0 (so f (X0 + h) is defined). The operator L will usually depend on the base point X0 . If f is differentiable at X0 , we shall use the notation df (X0 ) = f (X0 ) = L(X0 ) = L, dX and refer to f (X0 ) as the total derivative of f at X0 . [The notation f (X0 ) and grad f (X0 ) , for gradient, are also used]. If L = f (X0 ) , a linear operator from En to E , exists and depends continuously on the base point X0 for all X0 ∈ A , then f is said to be continuously differentiable in A , written f ∈ C 1 (A) . Remark: The condition that f be differentiable at X0 can also be written in the following useful form: f (X0 + h) = f (X0 ) + Lh + R(X0 , h) h , (8-1) where the remainder R(X0 , h) has the property lim R(X0 , h) = 0. h →0 This abstract operator L has the delightful property that it can be computed easily. But before telling you how, we should first prove for a given f there can be at most one linear operator L which is the total derivative. Theorem 8.1 . (Uniqueness of the total derivative). Let f : A → E be differentiable at the interior point X0 ∈ A . If L1 and L2 are linear operators both of which satisfy the conditions for the total derivative of f at X0 , then L1 = L2 . Proof: Let L = L1 − L2 . We shall show L is the zero operator. Since Lh = L1 h − L2 h = [f (X0 + h) − f (X0 ) − L2 h] − [f (X0 + h) − f (X0 − L1 h], by the triangle inequality we have Lh ≤ f (X0 + h) − f (X0 ) − L2 h + f (X0 + h) − f (X0 ) − L1 h . Consequently, lim h →0 Lh = 0. h 8.1. THE DIRECTIONAL AND TOTAL DERIVATIVES 315 To complete the proof, a trick is needed. Fix η = 0 . If λ is a constant, λ → 0 , then λη → 0 so L(λη ) lim = 0. λη λ →0 But since L is linear, Lλη = λLη = |λ| Lη , so the factor λ can be canceled in numerator and denominator. 
Thus the last equation is independent of λ , so Lη / η = 0 . Because η = 0 , this implies Lη = 0 . Therefore L must be the zero operator. Next, we give a method for computing L . Not only that, but we also find an easy way to compute the directional derivatives. Theorem 8.2 . Let f : A → E be differentiable at the interior point X0 ∈ A . Then a) the directional derivative of f at X0 exists for every direction e and is given by the formula De f (X0 ) = Le. b) Moreover, if f is given in terms of coordinates, f (X ) = f (x1 , . . . , xn ) , then L is represented by the 1 × n matrix f (X0 ) = L = (fx1 (X0 ), . . . , fxn (X0 )). c) Consequently, the directional derivative is simply the product of this matrix L with the unit vector e , which can also be thought of as the scalar product of the 1 × n matrix, a vector, and the vector e , De f (X0 ) = f (X0 ), e . Proof: This falls out of the definitions. First f (X0 + λe) − f (X0 ) . λ→0 λ De f (X0 ) = lim f (X0 + λe) − f (X0 ) − L(λe) + L(λe) . λ→0 λ = lim Since L(λe) = λLe and λe = λ = lim λe →0 f (X0 + λe) − f (X0 ) − L(λe) + Le. λe Because f is differentiable at X0 , the first term tends to zero. Thus proving the first part. To prove the last part, it is sufficient to observe that if e = ej is one of the coordinate vectors, then by definition Dej f (X0 ) := fxj . Thus, if h is any vector h = (h1 , . . . , hn ) = h1 e1 + · · · + hn en , by the linearity of L we have Lh = L(h1 e1 + · · · + hn en ) = h1 Le1 + · · · + hn Len = h1 fx1 (X0 ) + · · · + hn fxn (X0 ) h1 · = (fx1 (X0 ), . . . , fxn (X0 )) · . · hn 316 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS Since h is any vector, we have shown L is represented by the given matrix. Remark: The theorem states that if f is differentiable, then all the partial derivatives exist and f (X0 ) := L is represented by the above matrix. It does not state that if the partial derivatives exist, then f is differentiable. This is false (see Exercise 16). However, if the partial derivatives of f exist and are continuous, then f is differentiable. The last statement will be proved as Theorem 3. Example: The same one worked before (p. 573). Find the directional derivative of f (X ) = x2 − 2x1 x2 + 3x2 at X0 = (1, 0) in the direction η = (−1, 1) . 1 1 1 Since η is not a unit vector, we let e = η = (− √2 , √2 ) . Now at a point X , η L = f (X ) = (fx1 , fx2 ) = (2x1 − 2x2 , −2x1 + 3). In particular, at X = X0 = (1, 0) , L = (2, −2 + 3) = (2, 1). Therefore De f (X0 ) = Le = (2, 1) 1 − √2 1 √ 2 1 = −√ , 2 which checks with the answer found previously. Consider the mapping w = f (X ), X ∈ A ⊂ En , w ∈ E as defining a surface M ⊂ En+1 . It is now evident how to define the tangent plane to M at the point (X0 , f (X0 )) , where X0 ∈ A . Definition: Let F : A ⊂ En → E be a differentiable mapping, thus defining a surface M with points (X, f (X )), X ∈ A . The tangent plane to M at the point (X0 , f (X0 )) , where X0 ∈ A , is the surface defined by the affine mapping Φ(X ) = f (X0 ) + f (X0 )(X − X0 ). or Φ(X ) = f (X0 ) + L(X − X0 ), where L = f (X0 ), Thus, the tangent plane to the surface defined by f is merely the “affine part” of f at X0 . Example: Consider the function w = f (X ) = 3 − x2 − x2 . This function defines a 1 2 paraboloid (see fig.). Let us find the tangent plane to this surface at (X0 , f (X0 )) , where X0 = (1, −1) , so f (X0 ) = 3 − 12 − (−1)2 = 1 . Also fx1 (X ) = −2x1 , fx2 (X ) = −2x2 . Thus f (X0 ) = (fx1 (X0 ), fx2 (X0 )) = (−2, 2). 
Since X − X0 = (x1 , x2 ) − (1, −1) = (x1 − 1, x2 + 1) we find the equation of the tangent plane is x1 − 1 Φ(X ) = 1 + (−2, 2) = 1 − 2(x1 − 1) + 2(x2 + 1), x2 + 1 8.1. THE DIRECTIONAL AND TOTAL DERIVATIVES 317 or Φ(X ) = 5 − 2x1 + 2x2 . This tangent plane is the unique plane with the property Φ(X0 ) = f (X0 ), and Φ (X0 ) = f (X0 ). Although we have given necessary conditions that a function be differentiable (all directional derivatives exist, in particular, all partial derivatives exist), we have not given sufficient conditions. The next theorem gives sufficient conditions for a function to be continuously differentiable. Theorem 8.3 . Let f : A ⊂ En → E , where A is an open set. Then f is continuously differentiable throughout A if and only if all the partial derivatives of f exist and are continuous. Proof: ⇒ If f is continuously differentiable, then the partial derivatives exist by Theorem 2. Furthermore, for any X and Y in A , fxi (X ) − fxi (Y ) = f (X ), ei − f (Y ), ei = f (X ) − f (Y ), ei . Thus, applying the Schwartz inequality we find |fxi (X ) − fxi (Y )| ≤ f (X ) − f (Y ) . The statement f is continuously differentiable means the vector f (X ) is a continuous function of X . Therefore, given any > 0 is a δ > 0 such that f (X ) − f (Y ) < for all X − Y < δ . For any > 0 , the inequality above shows |fxi (X ) − fxi (Y )| is also less than for the same δ . Consequently, fxi is continuous. ⇐ . A little more difficult. The idea is to use the mean value theorem for functions of one variable. Let X and Y be points on A . To prove continuity at X , it is sufficient to restrict Y to being in some ball about X which is entirely in A (some ball does exist since A is open). For notational convenience, we take n = 2 . Then f (Y ) − f (X ) = f (Y ) − f (Z ) + f (Z ) − f (X ), where Z is a point in A whose coordinates, except the first, are the same as X and whose coordinates, except the second, are the same as Y . By the one variable mean value ˜ ˆ theorem, there is a point X between X and Z and point X between Y and Z such that ∂f ˜ ∂f ˆ f (Z ) − f (X ) = (X )(y1 − x1 ), f (Y ) − f (Z ) = (X )(y2 − x2 ). ∂x1 ∂x2 Therefore ∂f ˆ ∂f ˜ (X )(y1 − x1 ) + (X )(y2 − x2 ), f (Y ) − f (X ) = ∂x1 ∂x2 so f (Y ) − f (X ) − [fxi (X )(y1 − x1 ) + fx2 (X )(y2 − x2 )] ˜ ˆ = [fxi (X ) − fxi (X )](y1 − x1 ) + [fx2 (X ) − fx2 (X )](y2 − x2 ). Therefore ˜ f (Y ) − f (X ) − L(Y − X ) ≤ fxi (X ) − fxi (X ) |y1 − x1 | ˆ fx2 (X ) − fx2 (X ) |y2 − x2 | 318 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS where we have written L = (fx1 (X ), fx2 (X )) . Since |yj − xj | ≤ Y − X , we see that f (X ) − f (Y ) − L(Y − X ) ˜ ˆ ≤ fx1 (X ) − fx1 (X ) + fx2 (X ) − fx2 (X ) . Y −X ˜ ˆ Because fx1 and fx2 are continuous and X − X < Y − X , X − X < Y − X , by making Y − X sufficiently small the right side of the above inequality can be made arbitrarily small. This proves the limit as Y − X → 0 of the expression on the left - exists and is zero. Since L is linear, the proof that f is differentiable is complete. The continuous differentiability is an immediate consequence of the linearity of L and the continuity of its components - the partial derivatives fxi . Exercises (1) i) Use the definition of the directional derivative to compute the given directional derivatives, ii) Check your answer by computing the directional derivative using the procedure of the Corollary to Theorem I. (a) f (x1 , x2 ) = 1 − 2x1 + 3x2 , at (2, −1) in the direction (3, 4) . [Answer: + 6 ]. 
4 √ x+2y , at (3, −2) in the direction (1, 1) . [Answer: 3e−1 / 2 ]. (b) f (x, y ) = e (c) f (u, v, w) = 3uv + uw − v 2 , at (1, 1, 1) in the direction (1, −2, 2) . 3 4 (d) f (x, y ) = 1 − 3y + xy at (0, 6) in the direction ( 5 , − 5 ) . (2) i) Compute all of the first and second partial derivatives for the following functions. (a) f (x1 , x2 ) = x1 + x1 sin 2x1 √ (b) f (x1 , x2 , x3 ) = x2 x2 + 2x1 x3 − x3 1 (c) f (x, y ) = xy (d) f (x1 , x2 , . . . , xn ) = a + a1 x1 + a2 x2 + . . . + an xn . n (e) f (x1 , x2 , . . . , xn ) = aij xi xj = X, AX , where aij = aji (first try the cases i,j =1 n = 2 and n = 3 to see what is happening). ii) Find the 1 × n matrix f (x) . (3) For the surfaces defined by the functions f (X ) listed below, find the equation of the tangent plane to the surface at the point (X0 , f (X0 )) . Draw a sketch showing the surface and its tangent plane. (a) f (X ) = x2 + 3x2 + 1, 1 2 (b) f (X ) = ex1 x2 , X0 = (0, 1) (c) f (X ) = x2 sin πx2 , 1 X0 = (−1, 1 ) 2 (d) f (X ) = − 1 x1 + x2 + 1, 2 x2 1 X0 = (0, 0). 2x2 2 X0 = (2, 1) (e) f (X ) = + − x1 x3 + x1 , X0 = (1, −2, −1). Why can’t you sketch the surface defined by this function? 8.1. THE DIRECTIONAL AND TOTAL DERIVATIVES 319 (4) Let f (X ) and g (X ) both map A ⊂ En → E1 . If f and g are differentiable for all X ∈ A , prove d dX [af (X ) (a) df dg + bg (X )] = a dX (X ) + b dX (X ) (Linearity), where a and b are con- stants. dg df d dX [f (X )g (X )] = f (X ) dX (X ) + g (X ) dX (X ) f (X ) g (X )f (X )− f (X )g (X ) d , if g (X ) = 0 . dX g (X ) = g 2 (X ) (b) (c) d d (5) Use the rules (a-c) of Exercise 4 to compute dX [2f − 3g ], dX [f · g ] , and f (X ) = f (x1 , x2 ) = 1 − x1 + x1 x2 , and g (X ) = g (x1 , x2 ) = ex1 −x2 . (6) Let f (X ) = f (x, y ) = xy (x2 −y 2 ) , x2 + y 2 0 df dX [ g ] , where X = (x, y ) = 0 X=0 Prove (a) f, fx , fy are continuous for all X ∈ E2 . [Hint: Prove and use 2xy ≤ x2 + y 2 ]. (b) fxy and fyx exist for all X ∈ E2 , and are continuous except at the origin. (c) fxy (0) = 1, fyx (0) = −1 , so fxy (0) = fyx (0) (cf. Remark p. 577). (7) Let f : A ⊂ En → E be a differentiable map. Prove it is necessarily continuous. [Hint: This is a simple consequence of the definition in the form (1)]. (8) Let f : A ⊂ En → E be a continuous map. We say f has a local maximum at the point X0 interior to A if f (X0 ) ≥ f (X ) for all X in some sufficiently small ball about X0 . If we assume f is continuously differentiable, more can be said. (a) If f as above has a local maximum at the point X0 , prove f (X0 , X − X0 ) + R(X0 , X ) X − X0 ≤ 0 for all X is some small ball about X0 . (b) Use the property of R(X0 , X ) to conclude the stronger statement f (X0 ), (X − X0 ) ≤ 0. for all X in some small ball about X0 . (c) Observe the statement must also hold for the vector X0 − X , which points in the direction opposite to X − X0 , to conclude f (X0 ), (X − X0 ) ≥ 0, and hence that in fact f (X0 ), Z = 0, for all vectors Z = X − X0 . (d) Finally, show that at a maximum, f (X0 ) = 0. (9) (a) Find the equation of the plane which is tangent at the point X0 = (2, 6, 3) to the surface consisting of the points (X, f (X )) , where f (X ) = f (x, y, z ) = (x2 + y 2 + z 2 )1/2 . 320 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (b) Use the tangent plane found above to find the approximate value of ((2.01)2 + (5.98)2 + (2.99)2 )1/2 . (10) Assume the continuously differentiable function f (X ) has a zero derivative, f (X ) ≡ 0 , for X in some ball in En . Prove that f (X ) ≡ constant throughout the ball. 
(11) (a) Show the following functions satisfy the two dimensional Laplace equation ∂2u ∂2u + 2 =0 ∂x2 ∂y i) u(x, y ) = x2 − y 2 − 3xy + 5y − 6 ii) u(x, y ) = log(x2 + y 2 ) , except at the origin, (x, y ) = 0. iii) u(x, y ) = ex sin y (b) Show the following functions satisfy the one (space) dimensional wave equation utt = c2 uxx , c ≡ constant [Here t is time and x is space; c is the velocity of light, sound, etc.] i) u(x, y ) = ex−ct − 2ex+ct ii) u(x, y ) = 2(x + ct)2 + sin 2(x − ct). (12) Let f : A ⊂ En → E be continuously differentiable throughout A . If X0 ∈ A is not a critical point of f , so f (X0 ) = 0 , prove the directional derivative at X0 is greatest in the direction emax := f (X0 )/ f (X0 ) , and least in the opposite direction, emin := −emax . [Hint: Use the Schwarz inequality.] (13) Consider the function f (X ) = f (x, y ) = xy , x2 + y 2 0, X = (x, y ) = 0 X=0 Since F is the quotient of two continuous functions, it is continuous except possibly at the origin, where the denominator vanishes. Show that f (X ) is not continuous at the origin by finding lim f (X ) as X → 0 along paths 1 and 2, and showing that lim f (X ) = lim f (X ). X →0 X →0 path1 path2 (14) Let L be the partial differential operator defined by Lu = ∂2u ∂2u ∂2u −5 +6 2. ∂x2 ∂x∂y ∂y Show that L[eαx+βy ] = p(α, β )eαx+βy , where p(α, β ) is a polynomial in α and β . Find a solution of the linear homogeneous partial differential equation Lu = 0 . Find an infinite number of solutions of Lu = 0 , one for each value of α , by choosing α to depend on β in a particular way. [Answer: e2βx+βy and e3βx+βy are solutions for any β ]. 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 321 (15) The two equations x = eu cos v y = eu sin v define u = f (x, y ) and v = g (x, y ) . Find the functions f and g for x > 0 . Compute f (X ) and g (X ) and show f (X ) ⊥ g (X ) . (16) This exercise gives an example in which the first partial derivatives of a function exist but the function is not continuous, let alone differentiable. Let f (X ) = f (x, y ) = xy 2 , x2 + y 4 0, X = (x, y ) = 0 X = 0. (a) If cos α = 0 , prove the directional derivative at the origin in the direction e = (cos α, sin α) exists and is De f (0) = 2 sin2 α , cos α cos α = 0 while if cos α = 0 , De f (0) = 0, cos α = 0. (b) Prove f is discontinuous at the origin by showing lim f (X ) has two different X →0 values along the two paths in the figure. Then appeal to exercise 7 to conclude f is not differentiable. (17) (a) Let P (X ), X ∈ En , be a polynomial of degree N , that is, ak1 ,...,kn xk1 xk2 · · · xkn , n 12 P (X ) = k1 +k2 +···+kn ≤N where k1 , k2 , . . . , kn are all non-negative integers. Prove P (α) is continuously differentiable. [Hint: How do you prove a polynomial in one variable is continuously differentiable.] (b) Let R(X ), X ∈ En , be a rational function - that is, the quotient of two polynomials. Prove R(X ) is continuously differentiable whenever the denominator is not zero. (18) If f : E1 → E1 , show that the definition of differentiability on page 578 coincides with the usual one. 8.2 The Mean Value Theorem. Local Extrema. Although the full “chain rule” will not be proved until Chapter 10, we shall need a very special and elementary case to develop the main features of the theory of mappings from En to E . Let f : A ⊂ En → E be a continuously differentiable function at all interior points of A . Take X and Z to be fixed interior points of A . Let φ(t) = f (X + tZ ) . 
We want to compute d d φ(t) = f (X + tZ ) dt dt that is, the rate of change of f (X ) at the point X + tZ as X varies along the line joining X to Z . 322 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS Theorem 8.4 . Let f : A → E be a differentiable function throughout A . If X and Z are two interior points of A , and if the line segment joining them is in A , then d f (X + tZ ) = f (X + tZ )Z, dt t ∈ (0, 1). By the product f (Y )Z we mean matrix multiplication. Proof: For fixed X and Z , the function φ(t) := f (X + tZ ) an ordinary scalar valued function of the one variable t . Thus φ(tλ ) − φ(t) d φ(t) = lim λ→0 dt λ f (X + tZ + λZ ) − f (X + tZ ) = lim λ→0 λ f (X + tZ + λZ ) − f (X + tZ ) − f (X + tZ )(λZ ) + f (X + tZ )(λZ ) = lim λ→0 λ Since f is differentiable at X + tZ , then as λ → 0 the first three terms tend to zero. The factor λ in the last term cancels. Therefore d f (X + tZ ) = lim f (X + tZ )Z = f (X + tZ )Z, λ→0 dt as claimed. An easy consequence is Theorem set in En , X and Y joining X 8.5 (The Mean Value Theorem). Let f : A → E , where A is an open convex that is, if X and Y are any points in Z , then the straight line segment joining is in A too. If f is differentiable in A , there is a point Z on the segment and Y such that f (Y ) − f (X ) = f (Z )(Y − X ). If, moreover, f is bounded by some constant C, f (X ) ≤ C for all X ∈ A , then |f (Y ) − f (X )| ≤ C Y − X a figure goes here Proof: Every point on the segment joining X and Y is of the form X + t(Y − X ) , where t ∈ [0, 1] . Consider the function φ(t) of one variable, φ(t) = f (X + t(Y − X )). Theorem 4 states φ is differentiable. Therefore, by the one variable mean value theorem, there is a number t0 in the interval (0, 1) such that φ(1) − φ(0) = φ (t0 ) . But φ(1) = f (Y ), φ(0) = f (X ) and, by Theorem 4, φ (t0 ) = f (X + t0 (Y − X ))(Y − X ) . Letting Z = X + t0 (Y − X ) , a point on the segment joining X to Y , we conclude f (Y ) − f (X ) = f (Z )(Y − X ). 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 323 The second part of the theorem follows by applying the Schwarz inequality to the function f (Z )(Y − X ) which can be written as f (Z ), (Y − X ) . Then f (Z ), Y − X ≤ f (Z ) Therefore if Y −X . f (Z ) ≤ C for all Z ∈ A , we find |f (Y ) − f (X )| ≤ C Y − X . Corollary 8.6 Let f : A → E be a differentiable map and A an open connected set in En (by a connected open set we mean it is possible to join any two points in A by a polygonal curve contained in A). If f (X ) ≡ 0 for every X ∈ A , that is, if fx1 (X ) = . . . = fxn (X ) = 0 , then f (X ) ≡ c , c a constant. Proof: If A is convex, say a ball, this is an immediate consequence of the second part of the mean value theorem, for f (X ) = 0 so |f (Y ) − f (X )| = 0 . Thus f (Y ) = f (X ) = constant for any two points X and Y . The requirement that A is connected is to exclude the possibility that A consists of two (or more) disjoint sets, in which case, all we can conclude is that f is constant on each connected part, but not necessarily the same constant. However, if A is connected, then any two points in A can be joined by a polygonal curve which is contained in A . Consider some straight line segment in this curve. By the mean value theorem, f must be constant on it. In particular, it has the same value at both end points. Checking the beginning and end of the whole polygonal curve, we find that f (X ) = f (Y ) . Because X and Y were any points, we are done. 
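Remark (an illustrative check, not in the original notes): Theorem 8.4 can be tested numerically on any concrete function. In the Python sketch below — the particular f, the points X and Z, and the step size are arbitrary choices made only for illustration — the derivative of φ(t) = f(X + tZ) is approximated by a difference quotient and compared with the matrix product f'(X + tZ)Z formed from the hand-computed partial derivatives.

import math

def f(x, y):
    return x**2 * y + math.sin(x * y)

def grad_f(x, y):
    # partial derivatives of this particular f, computed by hand
    return (2 * x * y + y * math.cos(x * y),
            x**2 + x * math.cos(x * y))

X = (1.0, 2.0)
Z = (0.3, -0.5)
t0 = 0.4

def phi(t):
    return f(X[0] + t * Z[0], X[1] + t * Z[1])

h = 1e-6
dphi = (phi(t0 + h) - phi(t0 - h)) / (2 * h)        # d/dt of f(X + tZ) by differencing
gx, gy = grad_f(X[0] + t0 * Z[0], X[1] + t0 * Z[1])
print(dphi, gx * Z[0] + gy * Z[1])                  # f'(X + tZ)Z; the two values agree closely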
It is not at all difficult to generalize the mean value theorem to Taylor’s theorem and then to power series for functions of several variables. The only problem is one of notation, and that is a problem. As a compromise, we will prove the Taylor theorem - but only the first two terms for functions of three variables f (x, y, z ) . Just as in the mean value theorem, the idea is to reduce the problem to a function φ(t) of one real variable, because we do know the result for these functions. Let f be differentiable in some open set A ⊂ E3 and X0 a point in A . If X0 + h is also in A , we would like to express f (X0 + h) in terms of f and its derivatives at X0 . Fix X0 and h and consider the real valued function φ(t) of one variable defined by φ(t) = f (X0 + th), t ∈ [0, 1]. Then by Theorem 4, φ (t) = f (X0 + th)h = fx (X0 + th)h1 + fy (X0 + th)h2 + fz (X0 + th)h3 , where h = (h1 , h2 , h3 ) . Since each of the partial derivatives are maps from A to E , they can be differentiated in the same way f was. So can a sum of such functions. Thus φ (t) = d [fx (X0 + th)h1 + · · · + fz (X0 + th)hn ] dt = fxx (X0 + th)h1 h1 + fxy (X0 + th)h1 h2 + fxz (X0 + th)h1 h3 +fyx (X0 + th)h2 h1 + fyy (X0 + th)h2 h2 + fyz (X0 + th)h2 h3 +fzx (X0 + th)h3 h1 + fzy (X0 + th)h3 h2 + fzz (X0 + th)h3 h2 . 324 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS If we introduce a matrix H (X ) , the Hessian matrix, whose elements are ∂ 2 f (X ) , ∂xi ∂xj φ (t) can be written as φ (t) = h, H (X0 + th)h . We remark that if f is sufficiently differentiable (two continuous derivatives is enough), then the Hessian matrix is self-adjoint since fxi xj = fxj xi , as we mentioned - but did not prove - earlier. If φ(t) is twice differentiable, by Taylor’s theorem for functions of one variable, we know that φ(1) = φ(0) + φ (0) + 1 φ (τ ), 2! τ ∈ (0, 1). Substituting into this formula, we find 1 h, H (X0 + τ h)h . 2! f (X0 + h) = f (X0 ) + f (X0 )h + Let us summarize. We have proved Theorem 8.7 (Taylor’s Theorem with two terms) . Let f : A → E , where A is an open connected set in En . Assume f has two continuous derivatives - that is, all the second partial derivatives of f exist and are continuous. If X0 is in A and X0 + h is in a ball about X0 in A , then 1 h, H (X0 + τ h)h , 2! f (X0 + h) = f (X0 ) + f (X0 )h + ∂2f )) is the n × n Hessian matrix and τ ∈ (0, 1) . ∂x1 ∂xj Letting X = X0 + h and Z = X0 + τ h, Z being a point on the line segment joining X0 to X , this reads where H (X ) = (( f (X ) = f (X0 ) + f (X0 )(X − X0 ) + 1 X − X0 , H (Z )(X − X0 ) , 2! or, in more detail, n f (X ) = f (X0 ) + i=1 1 ∂f (X0 ) (xi − x0 ) + i ∂xi 2 n n i=j j =1 ∂ 2 f (Z ) (xi − x0 )(xj − x0 ). i j ∂xi ∂xj Example: Find the first two terms in the Taylor expansion for the function f (X ) = f (x, y ) = 5 + (2x − y )3 about the point X0 = (1, 3) . We compute fx (X ) = 6(2x − y )2 , fy (X ) = −3(2x − y )2 fxx (X ) = 24(2x − y ), fxy (X ) = fyx (X ) = −12(2x − y ), fyy (X ) = 6(2x − y ). Therefore f (X0 ) = 4, fx (X0 ) = 6, fy (X0 ) = −3 , so f (X ) = 4 + (6, −3) 1 x−1 2ξ − η −12(2ξ − η ) + (x − 1, y − 3) y−3 −12(2ξ − η ) 6(2ξ − η ) 2 x−1 y−3 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 325 where Z = (ξ, η ) is a point on the segment between X0 = (1, 3) and X = (x, y ) . Written out, the above equation reads, 1 f (x, y ) = 4 + 6(x − 1) − 3(y − 3) + [fxx (x − 1)2 + 2fxy (x − 1)(y − 3) + fyy (y − x)2 ], 2 where the second derivatives are evaluated at Z = (ξ, η ) . We are now in a position to examine the extrema of functions of several variables. 
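Remark (a numerical companion to this example, not part of the original notes): if the Hessian in Taylor's formula is evaluated at X0 itself instead of at the intermediate point Z, one obtains the quadratic Taylor polynomial, and for a smooth function the error f(X0 + h) minus that polynomial should shrink like the cube of the size of h. The Python sketch below — written only as an illustration, with arbitrarily chosen increments — checks this for f(x, y) = 5 + (2x - y)^3 about X0 = (1, 3), using f(X0) = 4, f'(X0) = (6, -3), and the Hessian entries fxx = -24, fxy = 12, fyy = -6 obtained from the derivative formulas above (since 2x - y = -1 at X0).

def f(x, y):
    return 5 + (2 * x - y)**3

# data at X0 = (1, 3), read off from the derivative formulas in the example
f0 = 4.0
grad = (6.0, -3.0)                   # (fx, fy) at X0
H = ((-24.0, 12.0), (12.0, -6.0))    # Hessian at X0

def taylor2(h1, h2):
    # two-term Taylor polynomial with the Hessian frozen at X0
    quad = H[0][0] * h1 * h1 + 2 * H[0][1] * h1 * h2 + H[1][1] * h2 * h2
    return f0 + grad[0] * h1 + grad[1] * h2 + 0.5 * quad

for s in (0.1, 0.01, 0.001):
    h1, h2 = 0.7 * s, -0.4 * s
    print(s, f(1 + h1, 3 + h2) - taylor2(h1, h2))
# the printed error equals (2*h1 - h2)**3 = (1.8*s)**3, so it falls by a factor of 1000 at each step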
Finding the maxima and minima of functions is important for several reasons. First of all, there is the vague emotional feeling that all patterns of action should maximize or minimize something. Second, we can investigate a complicated geometrical object by the relatively easy procedure of finding the local maxima and minima. Without further mention, for the balance of this section f (X ) will be a twice continuously differentiable function which maps the open set A ⊂ En into E . Definition: A function f : A → E has a local maximum at the interior point X0 ∈ A if, for all X in some open ball about X0 f (X ) ≤ f (X0 ). f has a local minimum at X0 if for all X in some open ball about X0 f (X ) ≥ f (X0 ). If f has a local maximum or minimum at X0 , is f (X0 ) = 0 ? Certainly. Theorem 8.8 . If f has a local maximum or minimum at X0 , then f (X0 ) = 0 . In coordinates, this means all the partial derivatives vanish at X0 , ∂f ∂f ∂f (X0 ) = · · · = (X0 ) = 0. (X0 ) = ∂x1 1 ∂x2 ∂xn Proof: Let η be any fixed vector. Then the function φ(t) of one variable φ(t) = f (X0 + tη ) has a local maximum or minimum at t = 0 . Consequently φ (0) = 0 . But by Theorem 4, φ (0) = f (X0 )η which we may write as f (X0 ), η . Thus f (X0 ), η = 0 , so the vector f (X0 ) is orthogonal to η . Since η was any vector, we conclude that f (X0 ) = 0 . The derivative f (X0 ) may vanish at points other than maxima or minima. An example is the “saddle point” of the hyperbolic paraboloid at the beginning of Section 1. All points where f vanishes are called critical points or stationary points of f . Let us give a precise definition of a saddle point. f has a saddle point at X0 if X0 is a critical point of f and if every ball about X0 contains points X1 and X2 such that f (X1 ) < f (X0 ) and f (X2 ) > f (X0 ) . Thus, every critical point is either a local maximum, minimum, or saddle point. There is a more intuitive way to prove Theorem 7. If e is a unit vector, then by Theorem 2, the directional derivative at X in the direction e is De f (X ) = f (X ), e . In what way should you move so f increases fastest? By the Schwartz inequality, we find |De f (X )| ≤ f (X ) e = f (X ) , 326 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS with equality if and only if the vectors e and f (X ) are parallel. Thus, the directional derivative is largest when e has the same direction as f (X ) , and smallest when e has the opposite direction, emax = f (X )/ f (X ) , emin = −emax , Demax f (X ) = f (X ) , Demin f (X ) = − f (X ) . If X0 is a local maximum of f , then f (X0 ) must be zero, for otherwise you could move in the direction of f (X0 ) and increase the value of f . Similarly, if X0 is a local minimum, f (X0 ) must be zero. Once we know X0 is a critical point of f, f (X0 ) = 0 , an effective criterion is needed to determine if X0 is a local maximum, minimum, or saddle point for f . In elementary calculus, the sign of the second derivative was used. Our next theorem generalizes this test. The idea is essentially the same as in the one variable case (p. 104a-c). If f has a local maxima or minima, the tangent plane to the surface whose points are (X, f (X )) is horizontal, that is, f (X0 ) = 0 . Thus, near X0 the quadratic terms - the next lowest power in the Taylor expansion of f about X0 —will determine the behavior of f near X0 . Let X0 be the origin and take f (X ) = f (x, y ) to be a function of two variables with f (0) = 0 . 
Then near X0 = 0 , by Taylor’s theorem, we have 1 f (x, y ) ∼ [ax2 + 2bxy + cy 2 ], 2 where a = fxx (0), b = fxy (0) , and c = fyy (0) . The nature of the quadratic form Q(X ) = ax2 + 2bxy + cy 2 has already been determined. If Q(X ) is positive definite, then Q(X ) > 0 for X = 0 . Since f (x, y ) ∼ Q(X ) , this means f (x, y ) is positive near the origin. Because f (0, 0) = 0 , this implies the origin is a minimum for f . Instead of completing and rigorously justifying this special case, we shall immediately treat the general situation. Theorem 8.9 . Assume the twice continuously differentiable function f : A → E has a critical point at an interior point X0 of A ⊂ En , f (X0 ) = 0 . Let H (X0 ) be the Hessian ∂2f matrix (X0 ) evaluated at X0 . ∂xi ∂xj (a) If H (X0 ) is positive definite, then f has a local minimum at X0 . (b) If H (X0 ) is negative definite, then f has a local maximum at X0 . (c) If at least two of the diagonal elements of H (X0 ), fx1 x1 (X0 ), . . . , fxn xn (X0 ) have different signs, then X0 is a saddle point. (d) Otherwise the test fails. Proof: If X0 is a critical point for f , then Taylor’s theorem (Theorem 6) states f (X0 + η ) = f (X0 ) + 1 η , H (Z )η 2 where Z is between X0 and X0 + η . The linear term has been dropped since f (X0 ) = 0 . 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 327 As in the proof of Taylor’s theorem, let φ(t) = f (X0 + tη ). Then φ (t) = η , H (X0 + tη )η . Since the second derivatives of f are assumed to be continuous, the function φ (t) is a continuous function of t . Consequently, if φ (0) is positive then φ (t) is also positive for all t sufficiently close to zero (Theorem I p. 29b). Because φ (0) = η , H (X0 )η and φ (τ ) = η , H (Z )η , where Z = X0 + τ η , this implies if H is positive definite at X0 , it is also positive definite at Z when Z is close to X0 . Assuming H (X0 ) is positive definite, we see that for all η sufficiently small, H (Z ) is positive definite. Therefore, f (X0 + η ) − f (X0 ) = 1 η , H (Z )η > 0, 2 η = 0, that is f (X0 + η ) − f (X0 ) > 0 for all η is some small ball about X0 . Thus f has a local minimum at X0 . If H (X0 ) is negative definite, the same proof with trivial modifications works. Another way to complete the proof is to apply part a) to the function g (X ) := −f (X ) . The Hessian for g at X0 will be −H (X0 ) which is positive definite (since H (X0 ) was negative definite). Thus g has a local minimum at X0 so f : = −g has a local maximum at X0 . If any two of the diagonal elements of H (X0 ) have opposite sign, say fx1 x1 (X0 ) > 0 and fx2 x2 (X0 ) < 0 , then for η = λe1 = (λ, 0, 0, . . . , 0) , λ any real number, we find η , H (X0 )η = λ2 fx1 x1 (X0 ) > 0 , while for η = λe2 = (0, λ, 0, . . . , 0) η , H (X0 )η = λ2 fx2 x2 (X0 ) < 0 . Therefore the quadratic form η , H (X0 )η assumes positive and negative values in any ball about X0 , proving X0 is a saddle point. Since this theorem reduces the investigation of the nature of a critical point to testing if a matrix is positive or negative definite, it would do well in this context to repeat Theorem A (p. 386d) which tells us when a 2 × 2 matrix is positive definite. Corollary 8.10 . Let X0 be a critical point for the function of two variables f (x, y ) with Hessian matrix fxx (X0 ) fxy (X0 ) H (X0 ) = . fxy (X0 ) fyy (X0 ) (a) If det H (X0 ) > 0 and fxx (X0 ) > 0 , then f has a local minimum at X0 . (b) If det H (X0 ) > 0 and fxx (X0 ) < 0 , then f has a local maximum at X0 . 
(c) If det H (X0 ) < 0 , then f has a saddle point at X0 (this is a stronger statement than part c of Theorem 8). Proof: Since these merely join Theorem A (p. 386d) with Theorem 8, the proof is done. Examples: 328 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (1) Find and classify the critical points of the function w = f (x, y ) := 3 − x2 − 4y 2 + 2x . A sketch of the surface with points (x, y, f (x, y )) , a paraboloid, is at the right. At a critical point f (X ) = 0 , that is, fx = 0, fy = 0 . Since fx = −2x + 2, fy = −8y, at a critical point −2x + 2 = 0, −8y = 0. There is therefore only one critical point, X0 = (1, 0) . We look at the Hessian to determine the nature of the critical point. Because fxx = −2, fxy = fyx = 0, fyy = −8 , −2 0 H (X0 ) = . 0 −8 Since det H (X0 ) = 16 > 0 and fxx (X0 ) = −2 < 0, H (X0 ) is negative definite so X0 = (1, 0) is a local maximum for the function, and at that point f (X0 ) = 4 . (2) Find and classify the critical points of w = f (x, y ) = −x2 + y 2 . The surface (x, y, f (x, y )) is a hyperbolic paraboloid. We expect a saddle point at the origin. At a critical point fx = −2x = 0, fy = 2y = 0. Thus the origin (0, 0) is the only critical point. Since H (x, y ) = −2 0 02 , and det H (0, 0) = −4 < 0 , the origin is a saddle point. This also follows from the observation that the diagonal elements have different signs. (3) Find and classify the critical points of w = f (x, y ) = [x2 + (y + 1)2 ][x2 + (y − 1)2 ]. At a critical point, fx = 2x[x2 + (y − 1)2 ] + 2x[x2 + (y + 1)2 ] = 0 and fy = 2(y + 1)[x2 + (y − 1)2 ] + 2(y − 1)[x2 + (y + 1)2 ] = 0. The first equation implies x = 0 . Substituting this into the second we find y = 0, y = 1, y = −1 . Thus there are three critical points X1 = (0, 0), X2 = (0, 1), X3 = (0, −1). We must evaluate the Hessian matrix at these points. Since fxx = 12x2 + 4y 2 + 4, H (X1 ) = 4 0 0 −4 fxy = 9xy, , fyy = 4x2 + 12y 2 = −4, H (X2 ) = 80 08 = H (X3 ). Because det H (X1 ) = −16 < 0, X1 = (0, 0) is a saddle point. Because det H (X2 ) > det H (X3 ) = 64 > 0 and fxx (X2 ) = fxx (X3 ) = 8 > 0 , both X2 = (0, 1) and X3 = (0, −1) are local minima. To complete the computation, we find f (X1 ) = 1, f (X2 ) = 0, f (X3 ) = 0 . A sketch of the surface is at the right. 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 329 (4) Find and classify the critical points of w = f (x, y, z ) = 1 − 2x + 3x2 − xy + xz − z 2 + 4z + y 2 + 2yz. At a critical point, fx = −2 + 6x − y + z, fy = −x + 2y + 2z, fz = x − 2z + 4 + 2y. Solving these equations, we find only one critical point, X0 = (0, −1, 1), where f (X0 ) = 3 . Since fxx = 6, fxy = −1, fxz = 1 fyy = 2, fyx = 2, fzz = −2, then 6 −1 1 2 2 . H (X ) = −1 1 2 −2 Because the diagonal elements 6, 2, −2 are not all of the same sign, by part c of the theorem, the critical point X0 = (0, −1, 1) is a saddle point. (5) Find and classify the critical points of w = f (x, y ) := x2 y 2 . At a critical point, fx = 2xy 2 = 0, fy = 2x2 y = 0. Thus the points where either x = 0 or y = 0 are all critical points. Since fxx = 2y 2 , fxy = 4xy, we find H (X ) = fyy = 2x2 , 2y 2 4xy 4xy 2x2 If either x = 0 or y = 0 , then det H = 0 so none of our tests apply to determine the nature of the critical point. However, a glance at the function f (x, y ) = x2 y 2 reveals that all of the points where either x = 0 or y = 0 are clearly local minima, since f = 0 there, while f > 0 elsewhere. Exercises (1) Find and classify the critical points of the following functions. 
(a) f (x, y ) = x2 − 3x + 2y 2 + 10 (b) f (x, y ) = 3 − 2x + 2y + x2 y 2 (c) f (x, y ) = [x2 + (y + 1)2 ][4 − x2 − (y − 1)2 ] (d) f (x, y ) = x3 − 3xy 2 (figure on next page) (e) f (x, y ) = xy − x + y + 2 (f) f (x, y ) = x cos y (g) f (x, y, z ) = 2x2 + 3xz + 5z 2 + 4y − y 2 + 7 330 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (h) f (x, y, z ) = 5x2 + 4xy + 2y 2 + z 2 − 4z + 31 (2) Let X1 , . . . , XN be N distinct points in En . Find a point X ∈ En such that the function f (X ) = X − X1 2 + · · · + X − XN 2 is a minimum. [Answer: X = 1 N N j =1 Xj , the center of gravity.] (3) (a) Find the minimum distance from the origin in E3 to the plane 2x + y − z = 5. (b) Find the minimum distance from the origin in En to the hyperplane a1 x1 + a2 x2 + · · · + an xn = c . (c) Find the minimum distance between the fixed point X0 = (˜1 , . . . , xn ) and the x ˜ hyperplane a1 x1 + . . . + an xn = c . (d) Find the minimum distance between the two parallel planes a1 x1 +· · ·+an xn = c1 and a1 x1 + · · · + an xn = c2 , (4) If f (x, y ) has two continuous derivatives, use Taylor’s Theorem (Theorem 6) to prove f (x + h1 , y + h2 ) = f (x, y ) + fx (x, y )h1 + fy (x, y )h2 1 + [fxx (x, y )h2 + 2fxy (x, y )h1 h2 + fyy (x, y )h2 ] + (h2 + h2 )R, 1 2 1 2 2 where R depends on x, y, h1 and h2 , and lim R = 0 . h1 →0 h2 →0 (5) (a) If u(x, y ) has two continuous derivatives, use the result of Exercise 4 to prove uxx (x, y ) = u(x + h1 , y ) − 2u(x, y ) + u(x − h1 , y ) ˜ + h1 R h2 1 uyy (x, y ) = u(x, y + h2 ) − 2u(x, y ) + u(x, y − h2 ) ˆ + h2 R, h2 2 and ˜ where lim R = 0 h1 →0 and ˆ lim R = 0 . h2 →0 (b) Use part a) to deduce that if h1 = h2 = h then uxx (x, y ) + uyy (x, y ) u(x + h, y ) + u(x − h, y ) + u(x, y + h) + u(x, y − h) 4 + h2 R = 2 [u(x, y ) − h 4 (8-2) where lim R = 0 . h→0 (c) Use part b) to deduce that if h is small, the solution of the partial differential equation uxx + uyy = 0 , Laplace’s equation, approximately satisfies the difference equation u(x + h, y ) + u(x − h, y ) + u(x, y + h) + u(x, y − h) 4 This difference equation states that the value of u at the center of a cross equals the arithmetic mean (“average”) of its values at the four ends of the cross. One could use the difference equation to solve Laplace’s equation numerically. u(x, y ) = 8.2. THE MEAN VALUE THEOREM. LOCAL EXTREMA. 331 (d) Prove that any function which satisfies the above difference equation in some set cannot have a maxima or minima inside that set. [Do not differentiate! Reason directly from the difference equation. No computation is necessary.] (6) If all the second partial derivatives of a function f (X ) vanish identically in some open connected set, prove that f is an affine function. (7) (The Method of Least Squares). Let Z1 , . . . , ZN be N distinct points in En , and w1 , . . . , wN a set of N numbers. We imagine the points (Zj , wj ) ∈ En+1 to be points on a surface M in En+1 . Find a hyperplane w = φ(X ) = c + ξ1 x1 + · · · , +ξn xn ≡ c + ξ , X which most closely approximates the surface M in the sense that the error E (ξ ) N |φ(Zn ) − wj |2 = E (ξ ) := j =1 is minimized. Note that you are to find the coefficients ξ1 , . . . , ξn in the equation of the hyperplane. (8) (a) Let u(x, y ) be a twice continuously differentiable function which satisfies the partial differential equation Lu := uxx + uyy + aux + buy − cu = 0 in some open set D , where the coefficients a(x, y ), b(x, y ) , and c(x, y ) are continuous functions. 
If c > 0 throughout D , prove that u(x, y ) cannot have a positive maximum or negative minimum anywhere in D . (b) Extend the result of part a) to functions u(x1 , . . . , xn ) which satisfy n Lu := i=1 n ∂2u ∂u − cu = 0, aj 2+ ∂x1 j =1 ∂xj in some open set D , where c > 0 throughout D . (c) If u(x, y ) satisfies the equation of part a) and u vanishes on the boundary of D, u ≡ 0 on ∂D , prove that u(x, y ) ≡ 0 throughout D . (d) Assume u(x, y ) and v (x, y ) both satisfy the same equation Lu = 0, Lv = 0 , where L is the operator of part a). If u(x, y ) ≡ v (x, y ) on the whole boundary of D , prove that u(x, y ) ≡ v (x, y ) throughout the interior of D . (9) Let f be a twice continuously differentiable function throughout the open set A . Prove that (a) if f has a local minimum at X0 ∈ A , then its Hessian H (X0 ) is positive definite or semi-definite there. (b) if f has a local maximum at X0 ∈ A , then its Hessian H (X0 ) is negative definite or semi-definite there. 332 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (10) Let A be a square n × n self-adjoint matrix and Y a fixed vector in En , and let f (X ) = X, AX − 2 X, Y . (a) If f (X ) has a critical point at X0 , prove X0 satisfies the equation AX0 = Y. (b) If A is positive definite and X0 satisfies the equation AX0 = Y , prove f (X ) defined above has a minimum at X0 . [The results of this problem remain valid if A is any positive definite linear operator - possibly a differential operator. The nonlinear function f (X ) defines a variational problem associated with the equation AX = Y .] (11) If f : A → E has three continuous derivatives in the open set A ⊂ E2 containing the origin, state precisely and prove Taylor’s Theorem with three terms about the origin. The resulting expression will be f (X ) := f (x, y ) = f (0) + fx (0)x + fy (0)y + 1 [fxx (0)x2 + 2fxy (0)xy + fyy (0)y 2 ] 2! 1 + [fxxx (Z )x3 + 3fxxy (Z )x2 y + 3fxyy (Z )xy 2 + fyyy (Z )y 3 ] 3! where Z is on the line segment between 0 and X = (x, y ) . (12) If u(x, y ) has the property uxy (x, y ) = 0 for (x, y ) in some open set, prove u(x, y ) = φ(x) + ψ (y ) , where φ and ψ are functions of one variable. (13) Compute the direction(s) at X0 in which the following functions f i) increase most rapidly, ii) decrease most rapidly, iii) remain constant. (a) f (x1 , x2 ) = 3 − 2x1 + 5x2 (b) f (x, y ) = e2x+y (c) f (x, y, z ) = 2x2 at X0 = (1, −2) + 3xy + 5z 2 + 4y − y 2 + 7 (d) f (u, v ) = uv − u + v + 2 8.3 at X0 = (2, 1) at at X0 (1, 0, −1) (−1, 1) . The Vibrating String. Waves. You have been hearing about them your whole life. Waves are the term used to describe the oscillatory behavior of continuous media; water waves and sound waves being the most familiar. We shall give a mathematical description of a very simple type of wave - those in an oscillating violin string. The resulting mathematical model will be a second order linear partial differential equation - the wave equation - with both initial and boundary conditions. 8.3. THE VIBRATING STRING. a) 333 The Mathematical Model Consider a string of length stretched along the x axis. Imagine the string vibrating in the plane of the paper and let u(x, t) denote the vertical displacement of the point x at time t . In order to end up with a tractable mathematical model several reasonable simplifying assumptions will be made. 
We assume the tension τ and density ρ of the string are constant throughout the motion, while the string is taken to be perfectly flexible so the tension force in the string acts along the tangential direction. Dissipative effects (air resistance, heating, etc.) are entirely neglected. One more assumption will be made when needed. It essentially states that the oscillations are small in some sense. Newton’s second law, ma = F , is where we begin. Draw your attention to a small segment of the string whose length, at rest, is ∆x = x2 − x1 . The mass of the segment is ρ∆x . By Newton’s second law the segment moves in such a way that the product of its center of gravity equals the resultant of the forces acting on it. For the vertical component, this means ∂2u x ρ∆x 2 (˜, t) = Fv , ∂t where x ∈ (x, x + ∆x) is the horizontal coordinate of the center of gravity of the segment, ˜ and Fv means the vertical component of the resultant force. There are two types of forces. One is the tension acting at both ends of the segment. The other is gravity acting down with a force equal to the weight of the segment, ρg ∆x . To evaluate the tension forces, let θ1 and θ2 be the angles the string makes with the horizontal at either end of the segment (see figure above). Then the vertical component of the tension force is τ sin θ2 − τ sin θ1 . The signs indicate one force is up while the other is down. Adding the tension force to the gravitational force and substituting into Newton’s second law, we find ρ∆ x ∂2u (˜, t) = τ (sin θ2 − sin θ1 ) − ρg ∆x. x ∂t2 The dependence of θ1 and θ2 on the displacement can be brought out by using the relation ux sin θ = , 1 + u2 x which follows from the relation ux = tan θ for the slope of the string. Using this, we obtain the equation ρ∆ x ∂2u (˜, t) = τ x ∂t2 ux 1+ u2 x x= x2 − ux 1 + u2 x − ρg ∆x. x= x1 A simplifying assumption is badly needed. If the function ux / a Taylor series, 1 ux = ux − u3 + · · · , 2 2x 1 + ux 1 + u2 is expanded in x 334 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS we see that if the slope ux is small, essentially only the linear term in this series counts. Therefore, we do assume the slope ux is small (this is the same assumption made in treating the simple pendulum). With this simplification, the equation of motion is ρ∆ x ∂2u (˜, t) = τ [ux (x2 , t) − ux (x1 , t)] − ρg ∆x. x ∂t2 Divide both sides of this equation by ∆x = x2 − x1 and let the length of the interval shrink to zero. Since ∂ ∂2u ux (x2 , t) − ux (x1 , t) = ux (x, t) = (x, t), x2 − x1 ∂x ∂x2 (x2 −x1 )→0 lim where x is the limiting value of x1 and x2 , we find ρ ∂2u ∂2u (x, t) = τ 2 (x, t) − ρg 2 ∂t ∂x Because the length of the interval has been shrunk to one point x , the center of gravity is now at x too. It is customary to let τ /ρ = c2 . The constant c has units of velocity, and, in fact, is just the speed with which waves travel along the string. Thus Lu := utt − c2 uxx = −g. This is the wave equation, a second order linear inhomogeneous partial differential equation. As was the case with linear ordinary differential equations, it is easier to attempt first to solve the homogeneous equation Lu := utt − c2 uxx = 0. On physical grounds, we expect the motion u(x, t) of the string will be determined if the initial position u(x, 0) and initial velocity ut (x, 0) are known, along with the motion of both end points u(0, t) and u( , t) . 
However the mathematical model must be examined to see if these four facts do determine the subsequent motion (which it should if the model is to be of any use). Thus we must prove that given the initial position u(x, 0) = f (x), x ∈ [0, ] initial velocity ut (x, 0) = g (x), x ∈ [0, ] motion of left end u(0, t) = φ(t) t≥0 motion of right end u( , t) = ψ (t), t ≥ 0, then a solution u(x, t) of the wave equation utt − c2 uxx = 0 does exist which has these properties, and there is only one such solution. Existence and uniqueness theorems must therefore be proved. b) Uniqueness . This is almost identical to all uniqueness theorems encountered earlier, especially that for the simple harmonic oscillator in Chapter 4, Section 2. 8.3. THE VIBRATING STRING. 335 Theorem 8.11 (Uniqueness). There exists at most one twice continuously differentiable function u(x, t) which satisfies the inhomogeneous wave equation Lu := utt − c2 uxx = F (x, t) and the subsidiary initial conditions: u(x, 0) = f (x) , ut (x, 0) = g (x), x ∈ [0, ] boundary conditions: u(0, t) = φ(t) , u( , t) = ψ (t), t ≥ 0 , where F, f, g, φ , and ψ are given functions. Proof: Assume u(x, t) and v (x, t) both satisfy the same equation and the same subsidiary conditions. Let w(x, t) = u(x, t) − v (x, t) . Then Lw = Lu − Lv = F − F = 0 , so w satisfies the homogeneous equation Lw := wtt − c2 wxx = 0 and has zero subsidiary data initial conditions: w(x, 0) ≡ 0, wt (x, 0) ≡ 0, x ∈ [0, ] boundary conditions: w(0, t) ≡ 0, w( , t) ≡ 0, t ≥ 0 We want to prove w(x, t) ≡ 0 . Notice that w satisfies the equation for a vibrating string which is initially at rest on the x axis, and whose ends never move. Therefore our desire to prove the string never moves, w(x, t) ≡ 0 , is certain physically reasonable. For this function w , define the new function E (t) E (t) = 1 2 2 2 [wt + c2 wx ] dx. 0 We have named the function E (t) since it actually happens to be the energy in the string associated with the motion w(x, t) at time t , except for a factor of ρ . Assume it is “legal” to differentiate under the integral sign (it is). Upon doing so, we get dE = dt [wt wtt + c2 wx wxt ] dx. 0 But an integration by parts reveals that wx wxt dx = wx wt 0 0 − wt wxx dx. 0 Because the end points are held fixed, w(0, t) = 0 and w( , t) = 0 , the velocity at those points is zero too, wt (0, t) = 0 and wt ( , t) = 0 . This drops out the boundary terms in the integration by parts. Substituting the last expression into that for dE/dt , we find that dE = dt wt [wtt − c2 wxx ] dx. 0 But w satisfies the homogeneous wave equation wtt − c2 wxx = 0 . Therefore dE/dt ≡ 0 , so E (t) ≡ constant = E (0), that is, energy is conserved. Now E (0) = 1 2 2 2 [wt (x, 0) + c2 wx (x, 0)] dx. 0 336 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS Since the initial position is zero, w(x, 0) = 0 , its slope is also zero, wx (x, 0) = 0 . The initial velocity wt (x, 0) is also zero, wt (x, 0) = 0 . Thus E (t) ≡ E (0) ≡ 0, that is, 0 = E (t) = 1 2 2 2 [wt (x, t) + c2 wx (x, t)] dx. 0 Because the integrand is positive, we conclude wt (x, t) ≡ 0 and wx (x, t) ≡ 0 . Consequently w(x, t) ≡ constant. Since w(0, t) = 0 , that constant is the zero constant, w(x, t) ≡ 0. Therefore u(x, t) − v (x, t) ≡ w(x, t) ≡ 0, so u(x, t) ≡ v (x, t) : the solution is unique. c) Existence For the simple one (space) dimension wave equation, there are many ways to prove a solution exists. 
The one to be given here is not the simplest (see Exercise 6 for the result of that method), but it does generalize immediately to many other problems. It makes no difference how we find a solution, for once found, by the uniqueness theorem it is the only possible solution. To avoid complications, we shall consider only the homogeneous equation and assume the end points are tied down. Thus, we want to solve Wave equations: utt − c2 uxx = 0. Initial conditions: u(x, 0) = f (x), ut (x, 0) = g (x). Boundary conditions: u(0, t) = 0, u( , t) = 0. The idea is first to find special solutions u1 (x, t), u2 (x, t), . . . , which satisfy the boundary conditions but do not necessarily satisfy the initial conditions. Then, as was done for linear O.D.E.’s, we build the solution which does satisfy the given initial conditions as a linear combination of these special solutions, u(x, t) = Aj uj (x, t), that is, by superposition. Let us seek special solutions in the form of a standing wave, u(x, t) = X (x)T (t). Here X (x) and T (t) are functions of one variable. Our procedure is reasonably called separation of variables. Substitution of this into the wave equation gives ¨ T (t)X (x) − c2 X (x)T (t) = 0, or ¨ X (x) 1 T (t) =2 . X (x) c T (t) 8.3. THE VIBRATING STRING. 337 Since the left side depends only on x , while the right depends only on t , both sides must be constant (a somewhat tricky remark; think it over). Let that constant be −γ (using −γ instead of γ is the result of hindsight, as you shall see). ¨ 1T X = 2 = −γ. X cT This leads us to the two ordinary differential equations ¨ T (t) + γc2 T (t) = 0. X (x) + γX (x) = 0, Since u(0, t) = 0 and u( , t) = 0 and u(x, t) = X (x)T (t) , the function X (x) must also satisfy the boundary conditions X (0) = 0, X ( ) = 0. There are several ways to show γ must be positive. Perhaps the simplest is to observe that if γ < 0 or γ = 0 , the only function X (t) which satisfies the differential equation X + γX = 0 and boundary conditions X (0) = X ( ) = 0 is the zero function X (x) ≡ 0 . Since for this function u(x, t) = X (x)T (t) ≡ 0 , it is devoid of further interest. Another way to show γ is positive is to multiply the ordinary differential equation X + γX = 0 by X (x) and integrate over the length of the string, [X (x)X (x) + γX 2 (x)] dx = 0. 0 Upon integrating by parts, we find that ?(x)X (x) dx = XX 0 0 X 2 (x) dx. − 0 Since X (0) = X ( ) = 0 , the boundary terms drop out. Substituting this into the above equation, we find that X 2 (x) dx = γ 0 X 2 (x) dx. 0 If X (x) is not identically zero, this can be solved for γ X 2 (x) dx γ= 0 , 2 X (x) dx 0 and clearly shows γ > 0 . Enough for that. The solution of X + γX = 0, γ > 0 , is X (x) = A cos √ √ γx + B sin γx. The boundary condition X (0) = 0 implies A = 0 , while the boundary condition at the other end point X ( ) = 0 , implies √ 0 = B sin γ . 338 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS If B = 0 too, then X (x) ≡ 0 , so u(x, t) ≡ 0 . This is of no use to us. The only alternative √ √ √ is to restrict γ so that sin γl = 0 . This means γ is a multiple of π, γ = nπ, n = 1, 2, . . . , nπ √ γ= , n = 1, 2, . . . . There is then one possible solution X (x) for each integer n , Xn (x) = Bn sin nπ x, where the constants Bn are arbitrary. Remark: There is a similarity of deep significance for mathematics and physics between the work in these last few paragraphs and that done for the coupled oscillators in Chapter 6. There (p. 
528-9), we had an operator A and wanted to find nonzero vectors Sn and numbers λ such that ASn = λn Sn . The numbers found λn were called the eigenvalues of A , and Sn the corresponding eigenvectors. d2 Here, we were given the operator A = − dx2 and wanted to find nonzero functions Xn (t) ∈ { X ∈ C 2 [0, ] : X (0) = X ( ) = 0 } which satisfy the equation AXn = γn Xn The numbers found, γn = n2 π 2 / 2 , are also called the eigenvalues of A , and the function Xn (t) = sin nπ x , the eigenfunction of A corresponding to the eigenvalue γn . Associated with each possible eigenvalue γn , there is a solution of the time equation, ¨ T + γc2 T = 0 , ncπ ncπ Tn (t) = Cn cos t + Dn sin t. We therefore have found one special solution, un (x, t) − Xn (t)Tn (t) , for each value of the index n , nπx ncπt ncπt un (x, t) = sin (αn cos + βn sin ). The arbitrary constants have been lumped in this equation. These special solutions are the “natural” vibrations of the string, or normal modes of vibration. A snapshot at t = t0 of the string moving in the n th normal mode would reveal the sine curve un (x, t0 ) = C sin nπx , the constant C accounting for the remaining terms, which are constant for t fixed. In music, the integer n refers to the octave. The fundamental tone is the case n = 1 , while the tone for n = 2 , the second harmonic or first overtone, is one octave higher. a figure goes here 8.3. THE VIBRATING STRING. 339 The time frequency Vn of the n th normal mode is Vn = ncπ , this is the number of oscillations in 2π units of time. It is the time frequency which we usually associate with musical pitch. The (time) period τn of the n th normal mode is 2π/Vn , that is τn = 2l/nc . Another name you will want to know is the wave length λn of the n th normal mode, λn = 2 /n (see figures above). Notice that Vn λn = c , an important relationship. Having found the special normal mode solutions, un (x, t) , we hope that arbitrary constants αn and βn can be chosen so a linear combination ∞ u(x, t) = ∞ un (x, t) = n=1 ncπt (αn cos + βn sin ncπt ) sin nπx n=1 will satisfy the given initial conditions. Every function u(x, t) of this form automatically satisfies the boundary conditions u(0, t) = 0, u( , t) = 0 since each of the un ’s satisfy them. If u(x, 0) = f (x) and ut (x, 0) = g (x) , then from the above equation, we must have ∞ f (x) = ∞ un (x, 0) = n=1 and ∞ g (x) = n=1 αn sin nπx n=1 ∂un (x, 0) = ∂t ∞ nπc βn sin nπx . n=1 Thus, the coefficients αn are the coefficients in the Fourier sine series for f , while the βn are essentially the coefficients in the Fourier sine series for g . In fact, this is how Fourier was led to the series bearing his name. These formulas for u(x, y ), f (x) , and g (x) become easier on the eye if the length of the string is π, = π . Then ∞ u(x, y ) = (αn cos nct + βn sin nct) sin nx, n=1 while ∞ f (x) = ∞ un (x, 0) = αn sin nx, n=1 and ∞ g (x) = n=1 (8-3) n=1 ∂un (x, 0) = ∂t ∞ ncβn sin nx. n=1 Finding the coefficients αn and βn is particularly simple if f and g can be represented by finite series. Examples: Find the solution u(x, t) of the wave equation for a string of length π, l = π , which is pinned down at its end points, u(0, t) = u(π, t) = 0 , and satisfies the given initial conditions. (1) u(x, 0) = f (x) = 2 sin 3x, ut (x, 0) = g (x) = the two series 1 2 sin 4x . We have to find αn and βn for ∞ 2 sin 3x = αn sin nx n=1 340 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS 1 sin 4x = 2 ∞ ncβn sin nx. 
n=1 For these simple functions, just match coefficients, giving α3 = 2, αn = 0, n = 3, and β4 = 1 , βn = 0, n = 4. 8c Therefore, the sum of the two waves u(x, t) = 2 cos 3ct sin 3x + 1 sin 4ct sin 4x 8c is the (unique!) solution of this example. (2) u(x, 0) = f (x) = 1 2 sin 3x − sin 17x and ut (x, 0) = g (x) = −9 sin x + 13 sin 973x. We have to find αn and βn for the two series 1 sin 3x − sin 17x = 2 and ∞ αn sin nx n=1 ∞ −9 sin x + 13 sin 973x = ncβn sin nx. n=1 By matching again, we find α3 = 1 , α17 = −1 , and αn = 0 for n = 3 or 17 . Also, 2 13 β1 = 9 , β973 = 973c , and βn = 0 for n = 1 or 973 . The (unique) solution is then a c sum of four waves 9 1 u(x, t) = − sin ct sin x + cos 3ct sin 3x 3 2 − cos 17ct sin 17x + 13 sin 973ct sin 973x. 973c Since f and g are not usually given in the simple form of these examples, the full Fourier series is needed. Recall that the string is pinned down at both ends. Therefore both the initial position function f (x) and velocity function g (x) have the property f (0) = f (π ) = 0 , and g (0) = g (π ) = 0 , where we have taken the length of the string to be π . It is now possible to extend both f and g , assumed continuous in [0, π ] , to the whole interval [−π, π ] as continuous odd functions, a figure goes here that is, if x ∈ [0, π ] , we can define f (−x) = −f (x) and g (−x) = −g (x), since the right sides, −f (x) and −g (x) , are known functions for x ∈ [0, π ] . As odd functions now on the whole interval [−π, π ] , the functions f and g have Fourier sine series (cf. p. 252, Exercise 3a). 8.3. THE VIBRATING STRING. 341 ∞ f (x) = sin nx bn √ π n=1 ∞ g (x) = where π bn = 2 0 ˜n sin nx b√ π n=1 sin nx f (x) √ dx, π π ˜n = 2 b 0 sin nx g (x) √ dx π (8-4) Comparing with the previous formulas (3) for f and g , we find √ √ αn = bn / π, and βn = ˜n /nc π b Consequently ∞ u(x, t) = b cos nct ˜n sin nct √ ) sin nx + (bn √ nc π π n=1 (8-5) the coefficients bn and ˜n being determined from the initial conditions by equation b (4). Thus, we have almost proved Theorem 8.12 . If f (x) is twice continuously differentiable and g (x) once continuously differentiable for x ∈ [0, π ] and both functions vanish at x = 0 and x = π , then the function u(x, t) defined by equation (5) is a solution of the homogeneous wave equation utt − c2 uxx = 0 and satisfies the initial conditions: u(x, 0) = f (x), ut (x, 0) = g (x), x ∈ [0, π ], as well as the boundary conditions: u(0, t) = 0, u(π, t) = 0, t ≥ 0, where bn and ˜n are determined from f and g through equations (4). Moreover, this b solution is unique (by Theorem 9). Outline of Proof. If it is possible to differentiate the infinite series (5) term by term u(x, t) would satisfy the wave equation since each special solution un (x, t) does. In any case, the initial condition u(x, 0) = f (x) is clearly satisfied. However, checking the other initial condition ut (x, 0) = g (x) also involves differentiating the infinite series term by term. Thus, we must only justify the term by term differentiation of an infinite Fourier series. For power series, we found (p. 82-3, Theorem 16) we can always differentiate term by term within its disc of convergence. Such is not the case with Fourier series. For example, ∞ sin n2 x the Fourier series converges for all x , but the series obtain by differentiating n2 ∞ n=1 cos n2 x diverges at x = 0 . However, if a function is sufficiently smooth, its formally, n=1 Fourier series can be differentiated term by term and does converge to the derivative of the 342 CHAPTER 8. 
MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS function. Since the details of a complete proof are but a rehash of the proof carried out for power series (p. 82ff), we omit it. Example: Find the displacement u(x, t) of a violin string of length π with fixed end points which is plucked at its midpoint to height h . The initial position is then xh, x ∈ [0, π/2] (π − x)h, x ∈ [π/2, π ], f (x) = and the initial velocity, g (x) , is zero. We must find the coefficients bn and ˜n in the series (5). After mentally continuing f b and g to the interval [−π, π ] as odd functions, the formulas (4) give us bn and ˜n , b π bn = 2 0 π /2 2h sin nx f (x) √ dx = √ π π π (π − x) sin nx dx . x sin nx dx + 0 π/2 Integrating and simplifying, we find that 0, n even nπ 4h 1, n = 1, 5, 9, 13, . . . = bn = √ 2 sin 2 πn −1, n = 3, 7, 11, 15. From g (x) ≡ 0 , it is immediate that βn = 0 for all n . Thus, 4h u(x, t) = π ∞ n=1 nπ 1 sin cos n ct sin nx n2 2 4h cos 3ct sin x cos ct sin 3x cos 5ct sin 5x [ − + + ···] π 1 32 52 is the desired solution. = Exercises (1) (a) Find a solution u(x, t) of the homogeneous wave equation for a string of length π whose end points are held fixed if the initial position function is u(x, 0) = 1 sin 4x − sin 7x, 2 while the initial velocity is ut (x, 0) = sin 3x + sin 73x. (b) Same problem as a), but u(x, 0) = sin 5x + 12 sin 6x − 7 sin 9x ut (x, 0) = − sin x + 91 sin 273x. 8.3. THE VIBRATING STRING. 343 (2) Find a solution u(x, t) of the homogeneous wave equation for a string of length π whose end points are held fixed if the string is initially plucked at the point x = π/4 to the height h . (3) Consider a vibrating string of length whose end points are on rings which can slide freely on poles at 0 and . Then the boundary conditions at the end points are ux (0, t) = 0, ux ( , t) = 0 that is, zero slope. (a) Use the method of separation of variables to find the form of special standing wave solutions. [Answer: un (x, t) = cos nπx (αn cos ncπt + βn sin ncπt ) ]. (b) Use these to find a solution with the initial conditions u(x, 0) = cos x − 6 cos 3x ut (x, 0) = (let = π ) 1 cos 2x. 2 (4) Let u(x, t) satisfy the homogeneous wave equation. Instead of keeping the end points fixed, we either put them on rings (cf. Exercise 3) or attach them by elastic bands, in which case the boundary conditions become ux (0, t) − c1 u(0, t) = 0, ux (π, t) + c2 u(π, t) = 0, c1 , c2 ≥ 0. (a) Define the energy as before, and prove that energy is dissipated with these boundary conditions, unless c1 and c2 vanish. (b) Prove there is at most one function u(x, t) which satisfies the inhomogeneous wave equation utt − c2 uxx = F (x, t) with initial conditions as before, but with elastic boundary conditions ux (0, t) − c1 u(0, t) = φ(t), ux (π, t) + c2 u(π, t) = ψ (t), where c1 and c2 are non-negative constants. (5) To account for the effect of air resistance on a vibrating string, one common assumption is that the resistance on a segment of length ∆x is proportional to the velocity of its center of gravity, x Fres = −k ∆xut (˜, t), k > 0, where k is a numerical constant. This is analogous to the standard viscous resistance force on a harmonic oscillator. (a) Find the equation of motion ignoring gravity. [Answer: 1 u c2 tt + kut = uxx ] (b) Find the form of the special standing wave solutions, assuming, the end points are held fixed. (c) Write a formula giving the probable form for the general solution u(x, t) . (d) If the end points are pinned down, what do you expect the behavior of the string will be as t → ∞ ? 
Does the formula found in part c) verify your belief (it should). 344 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (e) Define the energy E (t) as before and show that energy is dissipated if the ends are held fixed. ˙ (f) Use the result of e) to prove E (t) + 2kE (t) ≥ 0 , and conclude that E (t) ≥ −2kt for t ≥ 0 . This shows that the energy is not dissipated too rapidly. E (0)e (6) It is possible to write the solution of the homogeneous wave equation for a string of length π with fixed end points in a simple closed form by using the trigonometric identities 2 sin nx cos nct = sin n(x − ct) + sin n(x + ct). 2 sin nx sin nct = sin n(x − ct) − cos n(x + ct). (a) Do this and obtain d’Alembert’s formula u(x, t) = 1 f (x − ct) + f (x + ct) + 2 2c x+ct g (ξ )dξ. x−ct (b) Solve the example of a plucked string (p. 641) again using this formula. Draw π two sketches, one indicating the position of the string at time t = 2c and another π at t = c . 2 2 ∂ ∂ (7) (a) Prove the wave operator L := ∂t2 − c2 ∂x2 , c a constant, is translation invariant, that is, if T : u(x, t) → u(x + x0 , t + t0 ), prove (LT )u = (T L)u for all values of x0 and t0 , and for all functions u for which the operators make sense. (b) Find the function φ(a, b) in the formula Leax+bt = φ(a, b)eax+bt . (c) Use part b) to show that if a is any constant, the four functions ea(x+ct) , e−a(x+ct) , ea(x−ct) , e−a(x−ct) are solutions of the homogeneous wave equation Lu = 0 . (d) Use the fact that each of the above functions satisfies the ordinary differential equation v (x) = a2 v (x) to conclude that if linear combinations of these functions are to satisfy the boundary conditions v (0) = v ( ) = 0 , then necessarily a2 < 0 , so the constant a is pure imaginary and we can write a = iγ , where γ is real. (e) Let u(x, t) be a linear combination of the four functions part c) with a = iγ . Show that u(x, t) may be written in the form u(x, t) = sin γx [A cos γct + B sin γct]. (f) If u(0, t) = u( , t) = 0 , show that γn = nπ . Find an infinite set of special solutions un (x, t) which satisfy the homogeneous wave equation with zero boundary values [From here on, one proceeds as before to find the general solution. This problem has shown how the idea of translation invariance can also be used to lead one to the special solutions un ]. 8.3. THE VIBRATING STRING. 345 (8) (a) By inspection, find a particular solution for the solution of the inhomogeneous wave equations Lu := utt − c2 uxx = g, g ≡ constant. (b) How can this particular solution be used to find the solution of the equation Lu = g which has given initial conditions and zero boundary conditions? (9) Flow of heat in a thin insulated rod on the x axis is governed by the heat equation ut (x, t) = k 2 uxx (x, t), where u(x, t) represents the temperature at the point x at time t , and k 2 , the diffusivity, is a constant depending on the material. The “energy” in a rod of length , 0 ≤ x ≤ , is defined as E (t) = 1 2 l u2 (x, t) dx. 0 (a) If the ends of the rod have zero temperature, u(0, t) = u( , t) = 0 , prove “energy” ˙ is dissipated, E (t) ≤ 0 , by showing dE (t) = −k 2 dt u2 (x, t) dx. x 0 (b) Given a rod whose ends have zero temperature and whose initial temperature is zero, u(x, 0) = 0 , prove that the temperature remains zero, u(x, t) ≡ 0 . (c) Prove the temperature of a rod is uniquely determined if the following three data are known: initial temperature: u(x, 0) = f (x), x ∈ [0, ]. boundary conditions: u(0, t) = φ(t), u( , t) = ψ (t), t ≥ 0. 
(d) Use the method of separation of variables to find an infinite number of special solutions of the heat equation for a thin rod whose end points have zero temperature for all t ≥ 0 . [Answer: un (x, t) = cn e −n2 k2 π 2 t 2 sin nπ x, n = 1, 2, . . . ] (e) If the ends of a rod have zero temperature for all t ≥ 0 , what do you intuitively expect the temperature u(x, t) will be as t → ∞ ? Is this borne out by the formulas for the special solutions? (f) Find the temperature distribution in a rod of length π if the ends have zero temperature and if the initial temperature distribution in the rod is u(x, 0) = sin x − 4 sin 7x, (10) If the temperature at the ends of the bar of length is constant but not necessarily zero, say u(0, t) = θ1 , u ( , t) = θ 2 , the temperature distribution can be found be splitting the solution into two parts, u(x, t) = u(x, t) + up (x, t) , where up (x, t) is a particular solution having the correct ˜ temperature at the ends of the bar and u(x, t) is a general solution which has zero temperature at the ends. 346 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (a) Find a particular solution of the homogeneous heat equation ut = k 2 uxx which satisfies u(0, t) = 200 , u( , t) = 500 , but does not necessarily satisfy any prescribed initial condition. [Answer: Many possible solutions - for example up (x, t) = 20 + 30 x , or up (x, t) = 20 + 30 sin πx ] . 2 (b) Find the temperature distribution in a rod of length π if the initial temperature is u(x, 0) = 2 sin x − sin 4x , while the boundary conditions are as in part a). (11) If the ends of a bar of length boundary conditions are are insulated instead of being kept at zero, the ux (0, t) = ux ( , t) = 0. (a) Use the method of separation of variables to find an infinite number of special solutions for the homogeneous heat equation with insulated ends. [Answer: un (x, t) = cn e −n2 k2 π 2 t 2 cos nπx , n = 0, 1, 2, . . . ]. (b) What is the temperature distribution in a rod whose ends are insulated if the initial temperature distribution is u(x, t) = 3 cos 2πx − 1 5πx cos . 5 (12) In this exercise you will find a quantitative estimate for the rate of decrease of energy for the heat in a rod of length with zero temperature at the ends. (a) Use the result of Exercise 9a to prove the differential inequality dE ≤ −cE (t), dt where c is a positive constant. [Hint: Look at p. 227 Exercise 15c]. (b) Conclude that E (t) ≤ E (0)e−ct , t ≥ 0. This is the desired estimate for the decrease of energy in the rod. (13) The linear partial differential equation uxx − u = ut governs the temperature distribution in a rod of length made up of a material which uses up heat to carry out a chemical process. Define the energy E (t) in the rod as in Exercise 9. (a) Prove that if the ends of the rod have zero temperature, then the energy is ˙ dissipated, E (t) ≤ 0 . (b) Given a rod whose ends have zero temperature and whose initial temperature u(x, 0) is zero, use a) to prove that the temperature remains zero, u(x, t) ≡ 0, t ≥ 0 . (c) Use part b) to prove that the temperature of the rod described above is uniquely determined if the following three data are known u(x, 0) for x ∈ [0, ], u(0, t) and u( , t) for t ≥ 0. 8.4. MULTIPLE INTEGRALS 347 (14) In setting up the mathematical model for the vibrating string, we never examined the horizontal components of the forces. 
(a) Show that the net horizontal force is Fh = τ cos θ2 − τ cos θ1 (b) Under our assumption ux is small, show that the net horizontal force is zero so there is no horizontal motion of the string. This justifies the statement that the motion of the string is entirely vertical. (15) Use the formula Vn = nπc/ (page 635) for the frequency and the relationship between c, T and ρ (page 624) to derive a formula for Vn in terms of the physical constants , T , and ρ for a vibrating string. Interpret the effect on the frequency, Vn , if the physical constants are changed. Does this agree with your experience in tuning stringed instruments? 8.4 Multiple Integrals How can we extend the notion of integration from functions of one variable to functions of several variables? That is the problem we shall face in this section. Let w = f (X ) = f (x1 , . . . , xn ) be a scalar-valued function defined in C ⊂ En . For the purposes of this section it will be convenient to think of f as either the height function for a surface M in En+1 over D , or as the mass density of D . In the first case. f D should be the volume of the solid contained between M and D (see fig.), whereas in the second case, f should be total mass of the set D . D Two problems have to be solved. First, define the integral in En . Second, give a reasonable procedure for explicitly evaluating the integral in sufficiently simple situations. More so than for the single integral, the problem of defining the multiple integral bristles with technical difficulties. However, after this is done the evaluation of integrals in En can be reduced to the evaluation of repeated integrals, that is, a sequence of n integrals in E1 , which is in turn effected not by using the definition of the integral, but rather by recourse to the fundamental theorem of calculus. Before starting the formalities, it is well advised to see where some difficulties lie. Suppose we are given a density function f defined on some domain D and want to find the total mass of D . To make things even simpler, assume for the moment that the density is constant and equal to 1, for all X ∈ D ⊂ En . Then the mass coincides with the volume of the domain. For the special case of functions of one variable D ⊂ E1 is an interval so the “volume” of D (really the length of D ) is trivial a figure goes here to compute, Vol (D) = b − a . However if D has two or more dimensions, even finding the volume of D (area if D ⊂ E2 ) is itself difficult. The problem is that a connected set D in E1 can only be a line segment, whereas a connected open set in En , n ≥ Z can be much more complicated topologically. In E1 , the 348 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS closed “cube” and closed “ball” are both intervals [a, b] , and every other connected set is also an interval. In E2 , not only do the cube and ball become distinct, but also a slew of other possibilities arise. D may be riddled with holes and its a figure goes here boundary wild (contrasted to the boundary of a connected set in E1 which is always just two points, the end points of the interval). It should be clear that the notion of volume of a set D may only be definable if the boundary of D is sufficiently smooth. As you should be anticipating, the volume of a set D will be defined by filling it up with little cubes of volume ∆x1 ∆x2 . . . ∆xn = ∆V , and then proving that as the size of the cubes becomes small, the sum of volumes of the cubes approaches a limit (here is where the smoothness of θD enters). 
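As a concrete illustration of this cube-filling idea, the following minimal sketch (plain Python, nothing beyond the standard library; the helper name is ad hoc) traps the area of the unit disk between an inner sum of little squares lying entirely inside the disk and an outer sum of squares that merely meet it. As the mesh size shrinks, the two sums close in on π; the gap between them is exactly the area contributed by squares straddling the boundary, which is where the smoothness of ∂D enters.

```python
# Trap the area of the unit disk between an inner and an outer sum of little
# squares of side dx.  (Illustrative sketch of the cube-filling idea.)
import math

def disk_area_bounds(dx):
    inner = outer = 0.0
    n = int(round(2.0 / dx))          # the grid covers the bounding box [-1,1] x [-1,1]
    for i in range(n):
        for j in range(n):
            x0, x1 = -1.0 + i*dx, -1.0 + (i + 1)*dx
            y0, y1 = -1.0 + j*dx, -1.0 + (j + 1)*dx
            # squared distance from the origin to the farthest / nearest point of the square
            far = max(x0*x0, x1*x1) + max(y0*y0, y1*y1)
            nx = 0.0 if x0 <= 0.0 <= x1 else min(abs(x0), abs(x1))
            ny = 0.0 if y0 <= 0.0 <= y1 else min(abs(y0), abs(y1))
            near = nx*nx + ny*ny
            if far <= 1.0:
                inner += dx*dx        # square lies entirely inside the disk
            if near <= 1.0:
                outer += dx*dx        # square meets the disk
    return inner, outer

for dx in (0.2, 0.1, 0.05, 0.025):
    lo, hi = disk_area_bounds(dx)
    print(dx, lo, hi)                 # lo <= pi <= hi, and hi - lo shrinks with dx
print(math.pi)
```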
In two dimensions, D ⊂ E2 , this roughly reads Area (D) = lim ∆x∆y = ∆x→0 dx dy. D ∆y →0 Only after the volume of a domain is defined can the more general notion of mass of a set D for a density function f be defined. The procedure here is straightforward, however it is important that the density f be “essentially” continuous. Using the same approximating cubes, we assign to each little cube its approximate density, say by using the value of the density f at the center of the little cube. Adding up the masses of these little cubes and passing to the limit again, we find the total mass of the solid D with density f . Again, in two dimensions this roughly reads Mass (D) = lim ∆x→0 f (xi , yj )∆x∆y = f (x, y ) dx dy. D ∆y →0 Because of the technical complications, we shall only state a series of propositions which give the existence of the integral. The proofs of several crucial - but believable - results will not be carried out, but can be found in many advanced calculus books. For convenience, the geometric language of the plane, E2 , will be used. The ideas extend immediately to higher dimensions. Now some terminology. Definition: A shaved rectangle is a rectangle with its bottom and left sides omitted, that is, a set of the form Q = {X = (x1 , x2 ) : aj < xj ≤ bj , j = 1, 2}. A rectangular complex is a finite union of shaved rectangles, which can always be assumed disjoint, that is, non-overlapping. This should more accurately be called a shaved rectangular complex, but is not for the sake of euphony. If D is a set, the characteristic function of D , XD is defined by XD (X ) = 1, 0, X∈D X D. A step function s(X ) is a finite linear combination of characteristic functions of shaved rectangles. The graph of this function looks like its name implies. 8.4. MULTIPLE INTEGRALS 349 a figure goes here A function f has compact support if it is identically zero outside some sufficiently large rectangle. The support of a particular function f , written supp f , is the smallest closed set outside of which f is zero. Thus, it is the set of all points X where f (X ) = 0 and the limit points of those points. We take the area of a shaved rectangle Q as a known quantity - the height times base, and define the integral as I (XQ ) = dA ≡ Area (Q), XQ dA = E2 D where the Area (Q) is defined in the natural way as length × width. You may wish to think of dA as representing an “infinitesimal element of area”. We however assign no meaning to the symbol and use it only as a reminder. Some prefer to do without it altogether and write XQ = Area (Q). E2 Our task is to define I (f ) ≡ f dA E2 for density functions other than XQ ’s. For example, if D is some set, for the function XD we want to define Area (D) = XD dA = E2 dA D But this will not make sense unless it is shown that the set D does have a number associated with it which has the properties of area. It is easy to define the integral of a step function S . Let n S (X ) = aj XQj (X ), j =1 where the Qj ’s are disjoint. Then S dA should represent the total mass of a plate composed of rectangles Q1 , . . . , Qn with respective densities a1 , . . . , an . Thus, we define n I (S ) = S dA ≡ a1 Area (Q1 ) + . . . + an Area , (Qn ) = aj j =1 The integrals of step functions clearly satisfy the following Lemma 8.13 . If S1 (X ) and S2 (X ) are step functions, then a). I (aS1 + bS2 ) = aI (S1 ) + bI (S2 ). b). S1 (X ) ≤ S2 (X ) implies I (S1 ) ≤ I (S2 ). c). If S (X ) is bounded by M, S (X ) ≤ M , then I (S ) ≤ cM, where c is the area of the support of S . XQj dA. 
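Step functions are the computational bedrock here, and their integrals reduce to arithmetic. The short sketch below (a plain-Python illustration with ad hoc helper names, not taken from the notes; it ignores the "shaved" edges, since omitted edges carry no area) stores a step function as a list of (rectangle, coefficient) pairs, evaluates I(S) = a1 Area(Q1) + · · · + an Area(Qn), and spot-checks the linearity asserted in part (a) of the lemma.

```python
# Evaluate I(S) for step functions S = sum_j a_j * X_{Q_j}.  (Illustrative sketch.)
# A rectangle Q is stored as ((ax, bx), (ay, by)); shaved edges contribute no area.

def area(Q):
    (ax, bx), (ay, by) = Q
    return (bx - ax) * (by - ay)

def I(step):                          # step: list of (Q, a) pairs
    return sum(a * area(Q) for Q, a in step)

S1 = [(((0, 1), (0, 1)),  2),         # density  2 on (0,1] x (0,1]
      (((1, 3), (0, 2)), -1)]         # density -1 on (1,3] x (0,2]
S2 = [(((0, 2), (0, 1)),  5)]         # density  5 on (0,2] x (0,1]

print(I(S1), I(S2))                   # -2 and 10

# Part (a) of the lemma: I(3*S1 + 2*S2) = 3*I(S1) + 2*I(S2).
# (By linearity the formula does not care whether the rectangles overlap.)
combo = [(Q, 3*a) for Q, a in S1] + [(Q, 2*a) for Q, a in S2]
print(I(combo), 3*I(S1) + 2*I(S2))    # 14 and 14
```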
350 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS The integral of any other more complicated function is defined by using step functions. Definition: A function f : E2 → E is Riemann integrable if given any > 0 , there are step functions s and S with s(X ) ≤ f (X ) ≤ S (X ) for all X ∈ E2 such that I (S ) − I (s) < , that is S dA − s dA < . E2 E2 Intuitively, a function is Riemann integrable if it can be trapped between two step functions S and s in such a way that the integrals of S and s differ by an arbitrarily small amount. Definition: If f is Riemann integrable, let Sn and sn be a trapping sequence, for f , 1 that is, sn (X ) ≤ f (X ) ≤ Sn (X ) and I (Sn ) − I (sn ) < n . Then the Riemann integral of f, I (f ) is defined as (cf. page 21, for the definition of l.u.b. = least upper bound, and of g.l.b.). I (f ) ≡ l.u.b.n→∞ I (sn ) We could have equivalently defined I (f ) as I (f ) = g.l.b.n→∞ I (Sn ) . Since both limits are the same, it is irrelevant. However, it is important to show that I (f ) has the same ˆ value if any other trapping sequence Sn (X ), sn (X ) is used. This is the content of ˆ Lemma 8.14 . If f is Riemann integrable, then I (f ) does not depend on which trapping sequences are used. Proof not given. Now we exhibit a class of functions which are Riemann integrable. The issue boils down to finding functions which can be approximated well by step functions. Lemma 8.15 . If f is a continuous function and D is a closed and bounded set, then f can be approximated arbitrarily closely from above and below by step functions S and s throughout D . Thus, given any > 0 , there are step functions S and s such that 0 ≤ S (X ) − f (x) < , and 0 ≤ f (X ) − s(X ) < for all X ∈ D. Proof not given. Theorem 8.16 . If f is a continuous function with compact support, then it is Riemann integrable. Proof: Let S (X ) and s(X ) be as in the lemma where D is the support of f . Then s(X ) ≤ f (X ) ≤ S (X ) and S (X ) − s(X ) = [S (X ) − f (X )] + [f (X ) − s(X )] < 2 . Thus by Lemma 1, I (S ) − I (s) = I (S − s) < 2c , where c is the area of the set (supp S ) ∪ (supp s ). Because f has compact support, the constant c is bounded. Therefore the factor 2c can be made arbitrarily small by choosing small. This verifies all the conditions for integrability. 8.4. MULTIPLE INTEGRALS 351 We have disposed of the problem of integrating continuous functions with compact support. Notice that the above procedure is identical to that used for functions of one variable (see figure.) We still do not know how to find the area of a domain D . Although we anticipate that Area (D) = I (XD ) , this does not yet make sense (except for rectangular complexes) since the discontinuous function XD is not covered by Theorem 1). Let us remedy this now. The problem is to show the boundary ∂D does not have any area. Definition: A set in E2 has content zero if it can be enclosed in a rectangular complex whose total area is arbitrarily small. Thus, if a set has content zero, given any > 0 , there is a rectangular complex R containing ∂D such that Area (R) = I (XR ) < . It should be clear that any set with a finite number of points has content zero (since each point can be enclosed on a square of side , so the total area of N such squares is N 2 , which can be made arbitrarily small.) One would also expect that curves will have zero content. This is not necessarily true unless the curve is not too badly behaved. Lemma 8.17 . If a curve is composed of a finite number of smooth curves, then it has zero content. 
In particular, if the boundary ∂D of a bounded domain D is such a curve, it has zero content. Proof not given. Theorem 8.18 . If the boundary ∂D of a domain D ⊂ E2 has content zero, then the function XD is Riemann integrable. Consequently, the area of D is definable and given by Area (D) = XD dA = E2 dA. D Proof: Almost identical to that for Theorem 11. Let > 0 be given and let R be the rectangular complex which encloses the boundary ∂D , where R has area less than , I (XR ) < . Then the part of D which is enclosed by R, D− = D − R ∩ D , is a rectangular complex as is D+ = R ∪ D− and D+ − D− = R . Since D+ ⊃ D ⊃ D− , we have XD− (X ) ≤ XD (X ) ≤ XD+ (X ) for all X. Also, I (XD+ ) − I (XD− ) = I (XR ) < . Thus XD is trapped by the step functions S = XD+ and s = SD− and I (S ) − I (s) < , proving the theorem. It is now possible to define f dA D for continuous functions f where D is not necessarily the support of f . Theorem 8.19 . If f is continuous in a closed and bounded set D whose boundary ∂D has content zero, then the function fXD is Riemann integrable and f dA ≡ I (fXD ). D 352 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS Proof: Let R be the rectangular complex which encloses ∂D and has area less than , I (XR ) < . Take D− = D − R ∩ D and D+ = R ∪ D− as in Theorem 12. Further let S1 and s1 be step functions which trap f within for all X ∈ D− (this is possible by Lemma 3) 0 ≤ S1 (X ) − f (X ) < , 0 ≤ f (X ) − s1 (X ) < for all X D− , so 0 ≤ S1 (X ) − s1 (X ) < 2 X ∈ D. for all Let M be an upper bound for |f | on D, |f (X )| ≤ M for all X ∈ D− . Then define S = S1 + M X R and s = s1 − M XR . These functions S and s trap f on all of D , s(X ) ≤ f (X ) ≤ S (X ) for all X ∈ D, that is, s ≤ f XD ≤ S for all X. Furthermore I (S − s) = I (S1 − s1 ) + 2M I (XR ) < 2c + 2M = (2c + 2M ) , where c is the area of D− . Since S and s are step functions which trap f , and since I (S − s) can be made arbitrarily small, the proof that fXD is Riemann integrable is completed. We follow custom and write I (f XD ) ≡ f dA. D Except for the three unproved lemmas, this completes the proof of the existence of the integral. The next theorem summarizes some important properties of the integral. Theorem 8.20 . If f and g are Riemann integrable, then a). I (af + bg ) = aI (f ) + bI (g ), a, b constants b). f ≤ g implies I (f ) ≤ I (g ) . c). |I (f )| ≤ I (|f |) Proof: a) and b) are immediate consequences of the corresponding statements for step functions (Lemma 1) and the definition of the Riemann integral as the limit of step functions. To prove c), we first observe that if f is integrable, so is |f | . Since − |f | ≤ f ≤ |f | , by parts a and b −I (|f |) ≤ I (f ) ≤ I (|f |), which is equivalent to the stated property. Although the approximate value of the integral f dA can be evaluated by using the D procedures of the above theorems, we have as yet no routine way of evaluating the integral 8.4. MULTIPLE INTEGRALS 353 if f and D are simple. Some notation will suggest the method. Write dA = dx dy and think of dx dy as the area of an “infinitesimal” rectangle. Then f dA = f (x, y ) dx dy. D D If D is the domain in the figure, it is reasonable to evaluate the double integral, which we shall think of as the mass of D with density f , by first finding the mass of a horizontal strip γ2 g (y ) = f (x, y ) dx, γ1 and then adding up the horizontal strips to find the total mass γ4 f (x, y ) dx dy = γ4 γ2 g (y ) dy = D f (x, y ) dx γ3 γ3 dy. 
γ1 The integral on the right is called an iterated or repeated integral. In a similar way, one could begin with mass of vertical strips γ4 h(x) = f (x, y ) dy γ3 and add these up γ2 f (x, y ) dx dy = D γ2 γ4 h(x) dy = f (x, y ) dy γ1 γ1 dx. γ3 For most purposes, it is sufficient to consider domains which are of the two types pictured a figure goes here that is, D is bounded on two sides by straight line segments. More complicated domains can be treated by decomposing them into domains of these two types, where one or both of the straight line segments might degenerate to a point. Theorem 8.21 . If f is continuous on a domain D1 (respectively D2 ) as above, then the iterated integral b φ2 (x) β f (x, y ) dy a dx φ2 (y ) [resp. φ1 (x) f (x, y ) dx α dy ] φ1 (y ) exists and equals f dA. D Proof not given. It is rather technical. Remark: If a domain D happens to be of both types (as, for example, rectangles and triangles are ) then either iterated integral can be used and yield the same result - since they are both equal f dA . See Examples 1 and 3 below (Example 2 could also have D been done both ways). Examples: 354 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS f dA where f (x, y ) = x2 y and D is the rectangle in the figure. We (1) Evaluate D shall integrate with respect to x first. 2 3 (x2 + xy ) dx f dA = D 1 dy. 1 The inner integral is the mass of a strip. Think of y as being the fixed height of the strip. Then 3 (x2 + xy ) dx = 1 x3 x2 y + 3 2 x=3 x=1 =9+ 9y 1 y 26 −−= + 4y 2 32 3 Therefore, adding up all the strips we find 2 f dA = ( D 1 26 26 + 4y ) dy = ( y + 2y 2 ) 3 3 y =2 = y =1 26 44 +6= 3 3 Let us evaluate this again, now integrating first with respect to y . 3 2 (x2 + xy ) dy f dA = D First 1 2 (x2 + xy ) dy = (x2 y + 1 so 3 f dA = D 1 dx. 1 xy 2 ) 2 y =2 y =1 3 = x2 + x 2 x2 3 3 (x2 + x) dx = ( + x2 ) 2 3 4 x=3 x=1 = 44 , 3 which agrees with the previous computation. Instead of imagining f as the density of D , one can also take f to be the height function of a surface above D . Then the integral f dA is the volume of the solid whose base is D and whose “top” is the D surface M with points (x, y, f (x, y )) . In this case, the volume is 44/3. f dA where f (x, y ) = x2 + xy + 2 and D is the domain bounded by (2) Evaluate D the curves φ1 (x) = 2x2 , φ2 (x) = 4 + x2 , and x = 0 . Integrate first with respect to y . Then y varies between 2x2 and 4 + x2 , while x varies between the two straight lines x = 0 and x = 2 . 2 4+x2 (x2 + xy + 2) dy f dA = 0 D 2 (x2 y + = 0 2 0 dx 2 x2 xy 2 + 2y ) 2 y =4+x2 y =2x2 dy 3 464 (8 + 8x + 2x2 + 4x3 − x4 − x5 ) dy = 2 15 8.4. MULTIPLE INTEGRALS 355 f dA where f (x, y ) = (x − 2y )2 and D is the triangle bounded by (3) Evaluate D x = 1, y = −2 , and y + 2x = 6. We shall integrate first with respect to x . Then x varies between x = 1 and x = − 1 y + 2 , while y varies between the lines y = −2 and y = 2 . 2 1 y +2 2 2 f dA = −2 D (x − 2y )2 dx dy 1 Since 1 − 2 y +2 1 1 (x − 2y )2 (x − 2y )2 dx = (x − 2y )3 3 we find f dA = D x=− 1 y +2 2 x=1 1 5 1 = (2 − y )3 − (1 − 2y )3 , 3 2 3 2 1 3 164 5 . [(2 − y )3 − (1 − 2y )3 ] dy = 2 3 −2 One can also integrate first with respect to y . Then y varies between y = −2 and y = −2x + 6 , while x varies between the lines x = 1 and x = 3 . −2x+4 3 (x − 2y )2 dy f dA = D dx. −2 1 Since −2x+4 −2 1 (x − 2y )2 dy = − (x − 2y )3 6 y =−2x+4 y =−2 1 = − [(5x − 8)3 − (x + 4)3 ] 6 we again find f dA = − D 1 6 3 [(5x − 8)3 − (x + 4)3 ] dx = 1 164 . 
3 (4) Find the volume of the pyramid P bounded by the four planes x = 0, y = 0, z = 0, and x + y + z = 1 . The easiest way to do this is to let z = f (x, y ) = 1 − x − y be the height function of the tilted plane which we shall take as the top of the pyramid which lies above the triangle D (in the xy plane) which is bounded by the three lines x = 0, y = 0 , and x + y = 1 . Then Volume (P ) = f (x, y ) dx dy D One can integrate with respect to either x or y first. We shall do the x integration first. 1− y 1 (1 − x − y ) dx f dA = D Since 1− y 0 0 dy. 0 1 (1 − x − y ) dx = − (1 − x − y )2 2 x=1−y x=0 1 = (1 − y )2 2 356 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS we find Volume (P ) = f dA = D 1 2 1 0 1 (1 − y )2 dy = − (1 − y )3 6 1 0 1 =. 6 This agrees with the usual formula for the volume of a pyramid 1 Vol = altitude × area of base. 3 The identical methods work for triple integrals. All of the theorems and proofs remain unchanged. Again the integral f dV D can either be interpreted as the mass of a solid D with density f , or as the “volume” of a four dimensional solid whose base is D and top in the surface with points (x, y, z, f (x, y, z )) . Because of conceptual difficulties, one usually thinks of f as a density. Calculation of triple integrals is done by evaluating three integrals, as f dV = f (x, y, z ) dz dy dx, D where the limits in the iterated integral on the right are determined from the domain D . An example should illustrate the idea adequately, f dV where f (x, y, z ) ≡ c and D is the solid bounded by the Example: Evaluate D two planes z ≡ 0, y ≡ 2 , and the surface z ≡ −x2 + y 2 . We have to evaluate c dV D which is the mass of the solid D with constant density c , that is c times the volume of D . It is convenient to carry out the z integration first, then the x integration 2 − x2 + y 2 y c dV = c D dz −y 0 dx dy. 0 The x limits of integration have been found by looking at the region of integration in the xy plane beneath the surface z = −x2 + y 2 . This region, found by setting z = 0 , consists of the points between the straight lines 0 = −x2 + y 2 , that is between the lines x = y and x = −y . Then 2 y (−x2 + y 2 ) dx f dV = c D 2 (− =c 0 x3 + xy 2 ) 3 dy −y 0 2 x= y x= − y dy = c 0 4 16 dy = c. 3 3 By letting c = 1 , the volume of the solid is seen to be 16/3 . Exercises (1) Evaluate xy dx dy for the following domains D in two ways: D and ( xy dy ) dx . ( xy dx) dy 8.4. MULTIPLE INTEGRALS 357 (a) D is the rectangle with vertices at (1, 1), (1, 5), (3, 1) and (3, 5) . (b) D is the triangle with vertices at (1, 1), (3, 1) and (3, 5) . (c) D is the region enclosed by the lines x = 1, y = 2 , and the curve y = x3 (a curvilinear triangle). √ (d) D is the region enclosed by the curves y = x2 and y = x . (2) Evaluate sin π (2x + y ) dx dy, D where D is the triangle bounded by the lines x = 1, y = 2 and x − y = 5 . (3) Evaluate (xy − y3 ) dx dy, D where D is the region enclosed by the lines x = −1, x = 1, y = −2 and the curve y = 2 − x2 . (4) Evaluate (xy + z ) dx dy dz, D where D is the rectangular parallelepiped bounded by the six planes x = −2, y = 1, z = 0, x = 1, y = 2, z = 3 . (5) Evaluate xyz dx dy dz, D where D is the solid enclosed by the paraboloid z = x2 + y 2 and the plane z = 4 . (6) Find the volume of an octant of the ball x2 + y 2 + z 2 ≤ a2 in two ways; (a) by evaluating f (x, y )dx dy D where f is a suitable function and D a suitable domain (b) by evaluating dx dy dz, D where D is the ball. 
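A computer algebra system gives a quick, independent check of iterated integrals like these. The sketch below is only an illustration (it assumes Python with the sympy library; any CAS would serve): it recomputes the pyramid volume of Example 4 and the triple-integral example just worked out, and the same pattern can be used to check answers to the exercises in this set.

```python
# Recompute two of the worked examples as a cross-check.  (Illustrative sketch; requires sympy.)
import sympy as sp

x, y, z = sp.symbols('x y z')

# Example 4: volume of the pyramid bounded by x = 0, y = 0, z = 0, x + y + z = 1.
# Innermost variable first: x from 0 to 1 - y, then y from 0 to 1.
pyramid = sp.integrate(1 - x - y, (x, 0, 1 - y), (y, 0, 1))
print(pyramid)        # 1/6

# Triple-integral example: solid bounded by z = 0, y = 2 and z = y**2 - x**2, with density c = 1.
solid = sp.integrate(1, (z, 0, y**2 - x**2), (x, -y, y), (y, 0, 2))
print(solid)          # 16/3

# The same one-liner checks the iterated integrals asked for in the exercises, e.g. 1(a).
```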
(7) If f (x, y ) > 0 is the density function of a plate D , the x and y coordinates of the center of mass (¯, y ) are defined by x¯ x= xf (x, y ) dx dy , D f (x, y ) dx dy D y= yf (x, y ) dx dy . D f (x, y ) dx dy D Find the center of mass of a triangle whose vertices are at the points (0, 0), (0, 4) , and (2, 0) , and whose density is f (x, y ) = xy + 1 . 358 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS (8) The moment of inertia with respect to a point p = (ξ, η ) of a plate D with density f (x, y ) is defined by [(x − ξ )2 + (y − η )2 ]f (x, y )dx dy. Jp (D) = D (a) Find the moment of inertia of the plate in Exercise 7, with respect to the point p = (1, 0) . (b) If D is any plate (with sufficiently smooth boundary), prove that the moment of inertia is smallest if the point f = (ξ, η ) is taken to be the center of mass of D . [Hint: Consider J as a function of the two variables ξ and η and show J has a minimum at (¯, y ) .] x¯ (9) (a) Show that fxy (x, y )dx dy = f (p1 ) − f (p2 ) + f (p3 ) − f (p4 ), D where D is a rectangle with vertices at p1 , p2 , p3 , p4 (see fig.). (b) Use the result of part (a) to again evaluate the integral in Ex. 1a. (c) If U (x, y ) satisfies the partial differential equation Uxy = 0 for 0 < y < x and U (x, x) = 0 while U (x, 0) = x sin x , find U (x, y ) for all points (x, y ) in the wedge 0 < y < x . [Answer: U (x, y ) = x sin x − y sin y for 0 < y < x ]. (10) Let f (x, y ) be a bounded function which is continuous except as a set of points of content zero, and suppose f has compact support. Prove that f is Riemann integrable. This again proves Theorem 13. (11) Let D1 and D2 be domains whose boundaries have zero content and whose intersection D1 ∩ D2 has zero content. (a) If f is continuous on D1 ∪ D2 , prove that the integral f dA exists and D1 ∪D2 that f dA = D1 ∪D2 f dA + D1 f dA. D2 (b) Give an example showing the above equality does not hold if D1 ∩ D2 has nonzero content. (12) (a) By an explicit construction, show that the region D = {(x, y ) E2 : |x| + |y | ≤ 1} has boundary with zero content. (b) By an explicit construction, show that the circle ? = {(x, y ) E2 : x2 + y 2 = 1} has zero content. (13) (a) By interchanging the order of integration, show that x s ( 0 x (b) 2 ( 0 0 r f (t) dt)dr) ds =? ( 0 x 0 (x − t)f (t) dt. f (t) dt) ds = 0 8.4. MULTIPLE INTEGRALS 359 (14) Let D be a plate in the x, y plane with density f and total mass M . If p = (ξ, η ) is an arbitrary point in the plane and p = (¯, y ) is the center of mass of D , prove ¯ x¯ Jp (D) = Jp (D) + M p − p0 2 , ¯ where the notation of Exercise 8 has been used. This is the parallel axis theorem. It again proves the result of Exercise 8b. 360 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS Chapter 9 Differential Calculus of Maps from n to Em , s. E 9.1 The Derivative . Now we generalize the ideas of Chapters 7 and 8 and consider nonlinear mappings from a set D in En to Em , F : D ⊂ En → Em , or Y = F (X ) , where X ∈ D and Y ∈ Em . In coordinates, these functions look like y1 = f1 (x1 , . . . , xn ) · · · ym = fm (x1 , . . . , xn ) where the functions fj are scalar-valued. The special case n = 1, m arbitrary, was treated in Chapter 7, section 3, while the special case m = 1, n arbitrary, was treated in Chapter 8. One interpretation of maps F : D ⊂ En → Em is as a geometric transformation from some subset D of En into all or part of Em . EXAMPLES. (1) The affine map Y = F (X ) defined by y1 = 2 + x1 − 2x2 y2 = 1 + x1 + x2 maps E2 into E2 . 
Under this map, the origin goes into (2, 1) , the x1 axis (i.e. the line x2 = 0 ) goes into the line y1 − y2 = 1 , a figure goes here 361 362 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. while the x2 axis goes into the line y1 + 2y2 = 4 . The shaded region indicates the image of the indicated square. (2) The map Y = F (X ) defined by y1 = x1 − x2 y2 = x2 + x2 1 2 maps all of E2 onto the upper half y1 y2 plane (since y2 ≥ 0 ). Let us see what happens to a rectangle under this mapping. Consider the rectangle R in the figure. 2 The x1 axis, x2 = 0 , goes into the parabola y2 = y1 , and the line x2 = 1 into 2. y2 = 1 + (y1 + 1) a figure goes here Similarly, the line x1 = 1 is mapped into y2 = 1 + (y1 − 1)2 , while x1 = 2 is mapped into y2 = 4 + (y1 − 2)2 . By following the images of the boundary ∂R , we now see that the interior of R is mapped into the shaded curvilinear “parallelogram”. This mapping, though injective when restricted to our rectangle, is not injective for all (x1 , x2 ) ∈ E2 , since, for example, the points X1 = (1, 2) and X2 = (−2, −1) are both mapped into the same point (−1, 5) . (3) The function w = x2 + x2 whose graph is a paraboloid, is a map from E2 into 1 2 E1 . It can also be regarded as a map from E2 into E3 by a useful artifice. Let y1 = x1 , y2 = x2 , and y3 = w = x2 + x2 . Then 1 2 y1 = x1 y2 = x2 y3 = x2 + x2 1 2 is a map F from E2 into E3 . The image of the unit square (see figure) is then the shaded region in the figure above the image (y1 , y2 ) of the square R a figure goes here (4) The map F : E2 → E3 defined by (cf. example 2) y1 = x1 − x2 y2 = x2 + x2 1 2 y3 = x1 + x2 2 2 also represents a surface M . In fact, since y1 + y3 = 2y2 , this surface is a paraboloid opening out on the y2 axis. Again, we investigate where the rectangle R of example 2 is mapped. Since the y1 and y2 components of the mapping are the same as before, the image of R will lie on the surface M above the image (y1 , y2 ) of (x1 , x2 ) . Thus the image of the rectangle R is a patch of the surface M . 9.1. THE DERIVATIVE 363 From these examples, we see it is natural to regard any map F : D ⊂ E2 → Em as an ordinary surface, or two dimensional manifold, embedded in Em , much as a map F : D ⊂ E1 → Em was regarded as an ordinary curve. In the case m = 1 , the surface F : D ⊂ E2 → E1 was representable as the graph of the function F . For m = 2 and higher, this surface is seen as the range of the map. In the same way, an n dimensional surface, or manifold, embedded in I Em is a map F : D ⊂ En → Em . You might want to think of n as being the number of “degrees of freedom” on the manifold. In a strict sense, the map F : D ⊂ En → Em is not an n manifold embedded in Em unless Em is big enough to hold an m manifold, i.e. m ≥ n . However by either using the graph of F , a subset of Em+n , or by using the trick of example 3 we can always think of the map F : En → Em as an n dimensional surface. For m ≥ n , this surface can be embedded as a subset of En . There are several valuable physical interpretations of these vector valued functions of a vector, Y = F (X ) . Consider a fluid flowing through a domain D in E3 . The fluid could be air and D as the outside of an airplane, or the fluid could be an organic fluid, and D as some portion of the body. The velocity V of a particle of fluid is a three vector which depends upon the space coordinate (x1 , x2 , x3 ) as well as the time coordinate t of the particle, V = F (x1 , x2 , x3 , t) = F (X, t) . 
This velocity vector V (X, t) at X points in the direction the fluid is moving. Thus, the velocity function is an example of a mapping from space-time E3 × E1 ∼ E4 into = vectors in E3 . In this case, we think of the velocity vector V = F (X, t) as having its foot at the point X ∈ D and imagine the mapping as the domain D along with a vector V attached to each point of D (see fig. above). One calls this a vector field defined on the domain D , since it assigns a vector to each point of D . A very common vector field is a field of forces. By this we mean that to every point X of a domain D , we associate a vector F (X ) equal to the force an object at X “feels”. If the forces are time dependent, then the force field is written F (X, t), X ∈ D . You are most familiar with the force field due to gravity. If e3 is the direction toward the center of the earth, and say e1 points east and e2 north along the surface of the earth (other coordinates must be chosen for the north and south poles), then the gravitational force is usually written as F = (0, 0, g ) , a constant vector pointing down to the center of the earth. For more precise purposes, one must take into account the fact that g does vary from place to place of the earth’s surface. Then F (x) = (0, 0, g (X )) . In even more accurate experiments - or in outer space - must further account for the effect of the other heavenly bodies. This brings in the other components of force as well as a time dependence due to the motion of the earth, F (X, t) = (f1 (X, t), f2 (X, t), f3 (X, t)) . The force field is imagined as a vector attached to each point X in space, the vector having the magnitude and direction of the net force F there. An entirely different example of a mapping F from En to Em is a factory - or an even larger economic system. The vector X = (x1 , x2 , . . . , xn ) might represent the quantities x1 , x2 , . . . of different raw materials needed. Y = F (X ) could then represent the output from the factory, the number yj being the quantity of the j th product produced from the input X . Turning to the quantitative mathematical aspect of the mappings F : En → Em , we define the derivative. The definition will be formal, patterned directly on the definition of the total derivative given previously (p. 578-9). Definition: Let F : D ⊂ En → Em and X0 be an interior point of D . F is differentiable 364 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. at X0 , if there exists a linear transformation L(X0 ) : En → Em , depending on the base point X0 , such that lim h →0 F (X0 + h) − F (X0 ) − L(X0 ) h =0 h for any vector h in some sufficiently small ball about X0 . If F is differentiable at X0 , we shall use the notations dF (X0 ) = F (X0 ) = L(X0 ) dX and refer to them as the derivative of F at X0 . If F (X0 ) depends continuously on the base point X0 for all X0 in D , then F is said to be continuously differentiable in D , written F ∈ C 1 (D) . Many of the results from Chapter 8 Sections 1 and 2 generalize immediately to the present situation. Proposition 9.1 . The function F : D ⊂ En → Em is differentiable at the interior point X0 ∈ D if and only if there is a linear operator L(X0 ) : En → Em and a function R(X0 , h) such that F (X0 + h) = F (X0 ) + L(X0 ) h + R(X0 , h) h, where the remainder R(X0 , h) has the property lim h →0 R(X0 , h) = 0. Proof: ⇐ If F is differentiable at X0 , let L(X0 ) be the derivative and take R(X0 , h) = [F (X0 + h) − F (X0 ) − L(X0 ) h]/ h . 
Then this L(X0 ) and R(X0 , h) do satisfy the above conditions. ⇒ If L(X0 ) and R(X0 , h) are as above, then lim h →0 F (X0 + h) − F (X0 ) − L(X0 ) h = lim R(X0 , h) = 0. h h →0 Since L(X0 ) is linear, this proves F is differentiable at X0 . There is at most one derivative operator L(X0 ) , that is Proposition 9.2 . (Uniqueness of the derivative). Let F : D ⊂ En → Em be differentiable ˆ ˜ at the interior point X0 ∈ D . If L(X0 ) and L(X0 ) are linear operators both of which satisfy ˆ ˜ the conditions for the derivative of F and X0 , then L(X0 ) = L(X0 ) . Proof: Word for word the same as the proof of Theorem 1, page 579-80. If the map F = F (X ) is given in terms of coordinates, y1 = f1 (x1 , . . . , xn ) y2 = f2 (x1 , . . . , xn ) · · · · · · ym = fm (x1 , . . . , xn ), how is the derivative computed, and what is its relationship to the derivative of the individual coordinate functions fj ? The answer is contained in 9.1. THE DERIVATIVE 365 Theorem 9.3 . Let F map D ⊂ En into Em be given in terms of the coordinate functions fj (X ), j = 1, . . . , m y1 = f1 (X ) f1 (x1 , . . . , xn ) · · · ym = fm (X ) = fm (x1 , . . . , xm ). (a) Then F is differentiable or continuously differentiable at the interior point X0 ∈ D if and only if all of the fj ’s are respectively differentiable or continuously differentiable. (b) Moreover, if F is differentiable at X0 , then the derivative in these coordinates is given by the m × n matrix of partial derivatives ∂f1 ∂f1 (X0 ), . . . , ∂xn (X0 ) f1 (X0 ) ∂x1 · · = · L(X0 ) := F (X0 ) = · . · · ∂fm fm (X0 ) (X0 ), . . . , ∂fm (X0 ) ∂x1 ∂xn The matrix is sometimes called the Jacobian matrix. Proof: (a) Observe that the limit lim h →0 F (X0 + h) − F (X0 ) − L(X0 ) h =0 h exists if and only if each of its components tend to zero, lim h →0 fj X0 + h) − fj (X0 ) − Lj (X0 ) h = 0, h j = 1, 2, . . . , m. Thus, if F is differentiable at X0 , each of the coordinate functions fj are differentiable and have total derivative Lj (X0 ) . Conversely, if each of the coordinate functions are differentiable at X0 , all of the above limits exist so the vector valued function F is also differentiable. (b) Since the differentiability of F implies that of the coordinate vectors, we have f1 (X0 ) · . · F (X0 ) = · fm (X0 ) The result now follows by writing out each of the derivatives f1 (X0 ) = ( ∂f1 (X0 ) ∂f1 (X0 ) ,..., ) ∂x1 ∂xn f2 (X0 ) = . . . etc. and then inserting these in the expression for F (X0 ) . Corollary 9.4 . A function F : D ⊂ En → Em is continuously differentiable in D if and only if all the partial derivatives of its components ∂fi /∂xj exist and are continuous. CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. 366 Proof: This follows from this theorem and Theorem 3, p. 585. EXAMPLES. 1. Let F be an affine map from En to Em F (X ) = Y0 + BX, where B is a linear operator from En to Em (which you may choose to think of as an m × n matrix with respect to some coordinate system) and Y0 = F (0) is a fixed vector in Em . Then F is differentiable at every point of En and it given by the eminently reasonable formula F (X0 ) = B, where the operator B does not depend on X0 . For proof, we observe that F (X0 + h) − F (X0 ) = Y0 + B (X0 + h) − [Y0 + BX0 ] = Bh. Thus lim h →0 F (X0 + h) − F (X0 ) − Bh 0 = lim = 0. h h →0 h Since B is linear, this shows the derivatives exists and is B . Let us do this again in coordinates. If B = ((bij )) the function F is f1 (X ) = y01 + b11 x1 + b12 x2 + . . . +b1n xn f2 (X ) = y02 + b21 x1 + . . . +b2n xn · · · +bmn xn . 
fm (X ) = y0m + bm1 x1 + . . . Therefore each of the functions fj is clearly differentiable and ∂f ∂f1 f1 = ( ∂x1 , . . . , ∂xn ) = (b11 , . . . , b1n ) 1 · · · · · · · · · ∂fm ∂fm fm = ( ∂x1 , . . . , ∂xn ) = (bm1 , . . . , bmn ). Consequently, F (X0 ) = f1 (X0 ) · · · fm (X0 ) = b11 , . . . , b1m · · · bm1 , . . . , bmn which agrees with the result obtained without coordinates. 2. Let F : E2 → E3 be defined by f1 (x1 , x2 ) = 2 − x1 + x2 2 = B, 9.1. THE DERIVATIVE 367 f2 (x1 , x2 ) = x1 x2 − x3 2 f3 (x1 , x2 ) = x2 − 3x1 x2 . 1 Since each of the coordinate functions fj are continuously differentiable, so is F . Because f1 (X ) = (−1, 2x2 ), f2 (X ) − (x2 , x1 − 3x2 ), 2 f3 (X ) = (2x1 − 3x2 , −3x1 ), we find that at X0 = (3, 1) f1 (X0 ) −1 2 0 . F (X0 ) = f2 (X0 ) = 1 f3 (X0 ) 3 −9 If X is near X0 , then by Proposition 1 with h = X − X0 F (X ) = F (X0 ) + f (X0 )(X − X0 ) + remainder −1 2 0 x1 − 3 0 + remainder, = 2 + 1 x2 − 1 3 −9 3 where the remainder term becomes less significant the closer X is to X0 . Motivated by our previous work, it is natural to formally define the tangent map as follows. Definition: Let F : D ⊂ En → Em be differentiable at the interior point X0 ∈ D . The tangent map at F (X0 ) to the (hyper) surface defined by F is defined to be the affine mapping Φ(X ) = F (X0 ) + f (X0 )(X − X0 ). Examples: (1) Let F be the function of Example 2 above. Then the tangent map at X0 = (3, 1) is −1 2 0 x1 − 3 0 . Φ(X ) = 2 + 1 x2 − 1 3 −9 3 (2) Let F be the function of Example 4 (page 679). Then 1 −1 F (X ) = 2x1 2x2 . 1 1 Thus the tangent map at (2, 1) is 1 −1 1 2 Φ(X ) = 5 + 4 3 1 1 x1 − 2 x2 − 1 If we let Y = Φ(X ) , then the target plane in the tangent space is found from y1 = 1 + (x1 − 2) − (x2 − 1) y2 = 5 + 4(x1 − 2) + 2(x2 − 1) y3 = 3 + (x1 − 2) + (x2 − 1) By eliminating x1 and x2 from these equations, we find y2 = −5 + y1 + 3y3 . A graph of the surface M and the tangent plane can now be drawn. 368 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. a figure goes here The next result is the generalization of the mean value theorem. Theorem 9.5 . (Mean Value Theorem). Let F : D ⊂ En → Em be differentiable at every point of D , where D is an open convex set in En . If F (X ) is bounded in D , that is, if ∂ fi there is a constant γ < ∞ such that ∂xj (X ) ≤ γ for all X ∈ D and for all i = 1, . . . , m , and j = 1, . . . , n , then F (X2 ) − F (X1 ) ≤ c X2 − X1 √ for all X1 and X2 in D , where C = nmγ . Proof: The idea is to use the components of F and to appeal to the similar theorem (p. 597-8) for the function from En → E1 . By that theorem, if X1 and X2 are in D , then there is a point Z1 on the line segment joining X1 to X2 such that f1 (X2 ) = f1 (X1 ) + f1 (Z1 )(X2 − X1 ), and similarly for the other components f2 , f3 , . . . , fm . Thus f1 (Z1 ) f1 (X1 ) f1 (X2 ) · · · = = · · · · · · fm (Zm ) fm (X1 ) fm (X2 ) (X2 − X1 ), where Z1 , . . . , Zm are all on the segment joining a figure goes here X1 to X2 . Observe that the fj (Zj ) ’s are all vectors. Let L be the matrix of derivatives in the last term above, that is ∂f1 ∂f1 (Z1 ) · · · ∂xn (Z1 ) f1 (Z1 ) ∂x2 · · = · · L= . · · ∂fm ∂fm fm (Zm ) ∂x (Zm ) · · · ∂x (Zm ) 1 n The above equation then reads F (X2 ) = F (X1 ) + L(X2 − X1 ). (9-1) This equation itself is sometimes referred to as the mean value theorem. Note, however, that the partial derivatives in L are not all evaluated at the same point. ∂ fi Since ∂xj (X ) ≤ γ for all X , if η is any vector in En , by Theorem 17, p. 373. we find that √ Lη ≤ nmγ η . 
Taking η = X2 − X1 , and using (1), we are led to the inequality √ F (X2 ) − F (X1 ) ≤ nmγ X2 − X1 , 9.1. THE DERIVATIVE 369 which holds for any points X1 and X2 in D . With C = inequality. √ nmγ , this is the desired A few heuristic remarks. We have been considering mappings F : En → Em . In the case of linear mappings, L : En → Em , it was possible to prove that the range of L had dimension no greater than n , dim R(L) ≤ n . Although this does not remain true for an arbitrary nonlinear map F , it is still true if F is differentiable - after a suitable definition of dimension for an arbitrary point set is made (for the range of F will not usually be a linear space, the only sets whose dimension we have so far defined). In the case of differentiable maps F , it is easy to make a reasonable definition of dimension. The idea is to define dimension of the range of F locally, that is, in the neighborhood of every point in the range. If F : D ⊂ En → Em and F is differentiable at X ∈ D , then for all h sufficiently small, F (X + h) = F (X ) + L(X ) h + remainder. The dimension of the range of F at F (X ) is defined to be the dimension of its affine part, which is the same as dim bR(L(X ) ) . Since L(X ) is a linear operator, its range has a well defined dimension. Geometrically, we have defined dimension of the range of F at F (X ) as the dimension of the tangent plane at F (X ) . Our definition makes good physical sense for it is exactly the number an insect on the surface would use for the dimension. The illustration below is for a map F : D E2 → E3 whose range has dimension 2, a figure goes here Some special remarks should be made about maps from one space into another of the same dimension, F : D En → En . Let us assume F is differentiable throughout D . Then the dimension of the range of F at F (X ), X ∈ D , is the dimension of the range of L(X ) = F (X ) . If F is to preserve dimension at every point, then we must have dim R(L(X ) ) = n for all X ∈ D . For maps F given in terms of coordinates, this means the determinant of the n × n matrix L(X ) does not vanish, det L(X ) = det F (X ) = 0 for all x ∈ D . In more conceptual terms, this states that a map F : D ⊂ En → En is dimension preserving at X0 ∈ D if its “affine part” Φ(X0 + h) = F (X0 ) + F (X0 )h is dimension preserving at X0 (there is no trouble with the constant vector F (X0 ) since it only represents a translation of the origin - which does not affect dimensionality). From the geometric interpretation of determinants as volume, we see that the condition det F (X0 ) = 0 means that if a small set S ⊂ D has non-zero volume, then its image F (X ) also has non-zero volume. In fact, we expect that if S is a small set about X , then Vol (F (S )) = det F (X0 ) Vol (S ). Our expectation is based upon the realization that if the points of S are all near X0 , then F will behave like its affine part, (X0 + h) = F (X0 ) + F (X0 )h , on the points X0 + h ∈ S . The above formula is a restatement of the effect of affine maps on volume (Corollary to Theorem 30, page 426). We shall return to this later (Chapter 10, Section 4). CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. 370 Because of its frequent appearance, det F (X ) has a name of its own. It is called the Jacobian determinant or just the Jacobian of F . If F is given in terms of coordinates, y1 = f1 (x1 , . . . , xn ) · · · yn = fn (x1 , . . . , xn ), then another common notation for the Jacobian is det F (X ) = ∂ (f1 , f2 , . . . , fn ) . ∂ (x1 , x2 , . . . 
, xn ) For these maps F from a space into one of the same dimension, F : D ⊂ En → En , there is a very special derivative which appears often. It is the sum of the diagonal elements of the derivative matrix F (X ) . One writes this expression as . F or ÷F , the divergence of F , ∂f1 (X ) ∂f2 (X ) ∂fn (X ) · F (X ) = div F (X ) = + + ··· + ∂x1 ∂x2 ∂xn For example, if Y = F (X ) is defined by y1 = x1 + 2x1 x2 y2 = x2 − 3x2 , 1 then F (X ) = 1 + 2x2 2x1 2x1 −3 and · F (X ) = div F (X ) = (1 + 2x2 ) + (−3) = −2 + 2x2 . The significance of the divergence will become clear later (Chapter 10, Section 2). You will probably find it helpful to think of as the operator =( Then ∂ ∂ ,··· , ). ∂x1 ∂xn · F is the “scalar product” of the operator with the vector F . EXERCISES. (1) (a) Find the derivative matrix at the given point for the following mappings Y = F (X ) . (i) y1 y2 (ii) y1 y2 y3 y4 = x2 + sin x1 x2 1 = x2 + cos x1 x2 at X0 = (0, 0) 2 = x2 + x3 ex2 − x3 1 2 = x1 − 3x2 + x1 log x3 = x2 + x3 = 5x1 x2 x3 at X0 = (2, 0, 1) (b) Find the equation of the tangent plane to the above surfaces at the given point. 9.1. THE DERIVATIVE 371 (2) Consider the following map from E2 → E2 , u = ex cos y v = ex sin y (a) Find the image of the following regions i) x ≥ 0, 0 ≤ y ≤ π 4 ii) x ≥ 0, 0 ≤ y ≤ π iii) x ≤ 0, 0 ≤ y ≤ 2π iv) 1 < x < 2, π ≤ y ≤ π . 6 3 (b) Compute the derivative matrix and its determinant. (3) If F : D ⊂ En → Em is differentiable at X0 ∈ D , prove it is then also continuous at X0 . (4) Let F and G both map D ⊂ En → Em , so the function f (X ) = F (X ), G(X ) is defined for all X ∈ D and f : D → E1 . (a) If F and G are differentiable in D , prove f is also, and that f =F G+GF (b) Apply this result to the function f (X ) = X, AX − 2 X, Y , where A is a constant linear operator from En → En and Y is a constant vector in En . How does the result simplify if A is self adjoint? (5) If ϕ : D ⊂ En → E1 and F : D ⊂ En → Em , then the function G(X ) := ϕ(X )F (X ) is defined for all x ∈ D and G : En → Em . (a) Let ϕ(x2 , x2 ) = ax1 + bx2 and F (x1 , x2 ) = (αx1 + βx2 , γx1 + δx2 ) . Let G = ϕF and compute G (X ) . (b) More generally, prove that if ϕ and F are differentiable in D , then G := ϕF is also differentiable and find a formula for G . If F is expressed in terms of coordinate functions, F = (f1 , f2 , . . . , fm ) , how does your formula read? Check the result with that of part (a). (6) (a) If F : D ⊂ En → Em is differentiable in the open connected set D , and if F (X ) ≡ 0 for all x ∈ D , prove that F is a constant vector. (b) If F and G map D ⊂ En → Em are differentiable in the open connected set D , and if F (X ) ≡ G (X ) for all x ∈ D , what can you conclude? (7) Consider the map F : Q → R3 defined by x = (a + b cos ϕ) cos θ F : y = (a + b cos ϕ) sin θ z = b sin ϕ 372 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. a figure goes here (a) Compute F . (b) Find the equation of the tangent map at (0, 0) and at (π/2, π/2) . (c) Determine the range of the tangent map at the above two points and indicate your findings in a sketch. 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). 9.2 373 The Derivative of Composite Maps (“The Chain Rule”). Consider the two mappings F : A ⊂ En → Em and G : B ⊂ Em → Er . Then the composite map H := G ◦ F : A ⊂ En → Er is defined if B contains the image of all the points from A, F (A) ⊂ B . a figure goes here The map H = G ◦ F takes points from A ⊂ En and sends them into Er . 
From knowledge of the derivatives of F and G , it is possible to compute the derivative of the composite map G ◦ F .

Theorem 9.6. Let F : A ⊂ En → Em and G : B ⊂ Em → Er be differentiable maps defined in the open sets A and B , respectively, with F(A) ⊂ B (so the composite map H(X) := (G ◦ F)(X) is defined for all X ∈ A ). If X0 ∈ A , let Y0 = F(X0) ∈ B . Then the composite map H is differentiable at X0 and
\[
H'(X_0)=G'(Y_0)\circ F'(X_0).
\]

Remark: The multiplication G′(Y0) ◦ F′(X0) is the multiplication of the linear operators G′ and F′ . If F and G are given in terms of coordinates, then the formula is just the product of the two matrices G′ and F′ .

Before proving this theorem, we shall illustrate its meaning.

Example: Let F : E2 → E2 and G : E2 → E3 be defined by Y = F(X) and Z = G(Y) as follows
\[
\begin{aligned}
y_1&=x_1-x_2^2, &\qquad z_1&=y_1y_2,\\
y_2&=x_2\sin\pi x_1, &\qquad z_2&=1+y_1^2+y_2^2,\\
& &\qquad z_3&=5-y_2^3.
\end{aligned}
\]
Then
\[
F'(X)=\begin{pmatrix}1 & -2x_2\\ \pi x_2\cos\pi x_1 & \sin\pi x_1\end{pmatrix},\qquad
G'(Y)=\begin{pmatrix}y_2 & y_1\\ 2y_1 & 2y_2\\ 0 & -3y_2^2\end{pmatrix}.
\]
At X0 = (3, 2) , we find Y0 = F(X0) = (−1, 0) . Thus
\[
F'(X_0)=\begin{pmatrix}1 & -4\\ -2\pi & 0\end{pmatrix},\qquad
G'(Y_0)=\begin{pmatrix}0 & -1\\ -2 & 0\\ 0 & 0\end{pmatrix}.
\]
If H(X) = (G ◦ F)(X) = G(F(X)) , then the derivative of H at X0 is
\[
H'(X_0)=G'(Y_0)\circ F'(X_0)=\begin{pmatrix}0 & -1\\ -2 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}1 & -4\\ -2\pi & 0\end{pmatrix}=\begin{pmatrix}2\pi & 0\\ -2 & 8\\ 0 & 0\end{pmatrix}.
\]
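This matrix product is easy to check by machine. The minimal sketch below (an illustration, assuming Python with the sympy library) forms the two Jacobian matrices, multiplies them as the theorem prescribes, and compares the product with the Jacobian of the composite map computed directly, which is exactly the longer route described next in the text.

```python
# Check the chain-rule example H'(X0) = G'(Y0) F'(X0) at X0 = (3, 2).  (Illustrative sketch; requires sympy.)
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

F = sp.Matrix([x1 - x2**2, x2*sp.sin(sp.pi*x1)])          # F : E^2 -> E^2
G = sp.Matrix([y1*y2, 1 + y1**2 + y2**2, 5 - y2**3])      # G : E^2 -> E^3

JF = F.jacobian([x1, x2])          # F'(X)
JG = G.jacobian([y1, y2])          # G'(Y)

X0 = {x1: 3, x2: 2}
Y0 = dict(zip((y1, y2), F.subs(X0)))     # Y0 = F(X0) = (-1, 0)

chain = JG.subs(Y0) * JF.subs(X0)        # G'(Y0) F'(X0)

# The longer way: substitute F into G and differentiate H = G o F directly.
H = G.subs([(y1, F[0]), (y2, F[1])])
direct = H.jacobian([x1, x2]).subs(X0)

print((chain - direct).applyfunc(sp.simplify))   # zero matrix
print(chain)                                     # Matrix([[2*pi, 0], [-2, 8], [0, 0]])
```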
If we write Y = F (X ) in coordinates F = (f1 , f2 , . . . , fm ) , and let h = g ◦ F , then ∂h ∂x1 = ∂g ∂f1 ∂y1 ∂x1 + ∂g ∂f2 ∂y2 ∂x1 + ··· + ∂g ∂fm ∂ym ∂x1 = ∂g ∂f1 ∂y1 ∂xn + ∂g ∂f2 ∂y2 ∂xn + ··· + ∂g ∂fm ∂ym ∂xn · · · ∂h ∂xn Remark: This is the chain rule for scalar-valued functions. Proof: By Theorem 3, dq dF dh = dX dY dX Since ∂g ∂g dq ,··· , ) =( dY ∂y1 ∂ym and dF = dX ∂f1 ∂x1 ··· ∂f1 ∂xn · · · ∂fm ∂x1 ··· ∂fm ∂xn , we find upon multiplying the matrices that dh =( dX But we also know m j =1 ∂g ∂fj , ∂yj ∂x1 m j =1 ∂g ∂fj ,··· , ∂yj ∂x2 m j =1 ∂h ∂h ∂h dh =( , ,··· ). dX ∂x1 ∂x2 ∂xn ∂g ∂fj ). ∂yj ∂xn CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. 376 Comparison of the last two formulas gives the stated result. EXAMPLE. Let F : E2 → E2 and g : E2 → E1 be defined by f1 (x1 , x2 ) = x1 − ex2 , f2 (x1 , x2 ) = ex1 + x2 Then F (X ) = 1 ex1 −ex2 1 2 g (y1 , y2 ) = y1 + y1 y2 . , g (Y ) = (2y1 + y2 , y1 ). If h = g ◦ F = g (F (x1 , x2 )) , then dh = (2y1 + y2 , y1 ) dX −ex2 1 1 ex1 = (2y1 + y2 + y1 ex1 , −(2y1 + y2 )ex2 + y1 ). In particular, we find ∂h = 2y1 + y2 + y1 ex1 ∂x1 and ∂h = −(2y1 + y2 )ex2 + y1 . ∂x2 These formulas could also have been found by directly applying the corollary, viz. ∂h ∂g ∂f1 ∂g ∂f2 = + = (2y1 + y2 )1 + y1 (ex1 ), ∂x1 ∂y1 ∂x1 ∂y2 ∂x1 and similarly for ∂h/∂x2 . Many applications of the chain rule are more complicated. Consider a real valued ˜ function g (x1 , x2 , x3 , t) , which depends on the point X = (x1 , x2 , x3 ) as well as t . The ˜ function g could be an expression of the temperature at a point X at time t . If the point ˜ ˜ X represents your position in the room, then since you move around the room, X is itself ˜ = F (t) , ˜ a function of t . Thus, if your position is specified by X x1 = f1 (t), x2 = f2 (t), x3 = f3 (t), ˜ the temperature where you stand is h(t) = g (f1 (t), f2 (t), f3 (t), t) . Since F : E1 → E3 while 4 → E1 , the chain rule is not directly applicable because g is defined on E4 , while the g:E ˜ image of F is in E3 . A simple - if artificial - device clears up the difficulty. Introduce another variable x4 and let X = (x1 , x2 , x3 , x4 ) . Then write g (x1 , x2 , x3 , x4 ) , as well as X = F (t) , with x1 = f1 (t), x2 = f2 (t), x3 = f3 (t), x4 = f4 (t) ≡ t. Now, as before, h(t) = g (f1 (t), f2 (t), f3 (t), t) , but F : E1 → E4 and g : E4 → E1 . The chain rule is thus applicable and gives dh dg dF = dt dX dt 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). =( ∂g ∂g ∂g ∂g , , , ∂x1 ∂x2 ∂x3 ∂x4 df1 dt df2 dt df3 dt 377 , 1 so that ∂g ∂f1 ∂g ∂f2 ∂g ∂f3 ∂g dh = + + + . dt ∂x1 ∂t ∂x2 ∂t ∂x3 ∂t ∂x4 Since x4 ≡ t , the last equation can also be written as dh ∂g df1 ∂g ∂f2 ∂g df3 ∂g = + + + . dt ∂x1 dt ∂x2 ∂t ∂x3 dt dt From a less formal viewpoint, this could have been obtained directly from the equation h(t) = g (f1 (t), f2 (t), f3 (t), t) without dragging in the artificial auxiliary variable x4 . The variable x4 has been introduced to show how the chain rule applies. Once the process is understood, the variable x4 can (and should) be omitted. 4 EXAMPLE. Let g (x1 , x2 , x3 , t) = x1 t + 3x2 − x1 x3 + 1+t2 , and let x1 = 3t − 1, x2 = 2 et−1 , x3 = t2 − 1 . If h(t) = g (x1 (t), x2 (t), x3 (t), t) , we find ∂g ∂x1 ∂g dx2 ∂g dx3 ∂g dh = + + + . dt ∂x1 dt ∂x2 ∂t ∂x3 dt ∂t = (t − x3 )3 + (6x2 )et−1 − (x1 )2t + x1 − 8t . (1 + t2 )2 In particular, at t = 1 , we have x1 = 2, x2 = 1, x3 = 0 so that 8 dh(1) = (1 − 0)3 + (6)1 − (2)2 + 2 − = 5. 
dt 4 It is straightforward to compute the second derivative d2 h/dt2 from the formula for the first derivative. ∂ dg dx1 ∂ dg dx2 ∂ dg dx3 dg d2 h () () () = + + + ( ). dt2 ∂x1 dt dt ∂x2 dt dt ∂x3 dt dt ∂t dt For this example, this gives d2 h = (−2t + 1)3 + (6et−1 )et−1 + (−3)2t+ dt2 (3 + 6x2 et−1 − 2x1 − 8 1 − 3t 2 ). (1 + t2 )3 At t = 1 , we have ∂2h −2 (1) = (−2 + 1)3 + 6 − 6 + (3 + 6 − 4 − 8 ) = 4. 2 ∂t 8 ∂ The next example brings to the surface an ambiguity in the notation ∂x for partial derivatives. This ambiguity is often a source of great confusion. Consider a scalar valued function g (x1 , x2 , t, s) . If x1 = f1 (t) and x2 = f2 (t) , then h(t, s) = g (f1 (t), f2 (t), t, s) 378 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. depends on the two variables t and s . In order to see how h changes with respect to t , we regard s as being held fixed and use the previous example to find ∂g ∂f1 ∂g ∂f2 ∂g ∂h = + + . dt ∂x1 dt ∂x2 ∂t ∂t We were careful and realized that the functions g (x1 , x2 , t, s) , a function with four independent variables, and h(t, s) := g (f1 (t), f2 (t), t, s) , a function with only two independent variables, were different functions. The usual (occasionally confusing) approach is to be less careful and write ∂g ∂g ∂f1 ∂g ∂f2 ∂g = + + . dt ∂x1 dt ∂x2 ∂t ∂t In the above equation, the term ∂g/∂t on the right is the partial derivative of g (x1 , x2 , t, s) with respect to t while thinking of all four variables x1 , x2 , t and s as being independent. On the other hand, the term ∂g/∂t on the left is the partial derivative of g (f1 (t), f2 (t), t, s) as a function of two variables. After being spelled out like this, the formula does have a clear meaning - but this is not at all obvious from a glance. One might even be mistakenly tempted to cancel the terms ∂g/∂t from both sides of the equation. It is often awkward to introduce a new name, as h(t, s) , for g (f1 (t), f2 (t), t, s) . Another unambiguous procedure is available: use the numerical subscript notation for the partial derivatives. Then g,1 always refers to the partial derivative of g with respect to its first variable, g,2 with respect to the second variable, etc. Thus, for the above example of g (x1 , x2 , t, s) where x1 = f1 (t) and x2 = f2 (t) , we have df1 df2 ∂g = g,1 + g,2 + g,3 . ∂t dt dt This clearly distinguishes the two time derivatives g,3 and ∂g/∂t . The seemingly unnecessary comma in the notation is to take care of the possibility of vector valued functions G(x1 , x2 , t, s) whose coordinate functions are indicated by subg1 is a map into E2 , where the coordinate functions are scripts. For example, if G = g2 g1 (x1 , x2 , t, s) and g2 (x1 , x2 , t, s) , then if x1 = f1 (t) and x2 = f2 (t) , we have ∂G = ∂t ∂ g1 ∂t ∂g2 ∂t = g1,1 f1 + g1,2 f2 + g1,3 g2,1 f1 + g2,2 f2 + g1,3 . Here g1,1 = ∂g1 /∂x1 , etc. The notation f1 for df1 (t)/dt could also have been replaced by f1,1 —but this is unnecessary here since the fj are functions of one variable. In applications, one commonly meets a problem of the following type. Let u(x, y ) be a scalar valued function which satisfies the wave equation uxx − uyy = 0 . If F : E2 → E2 is defined by 1 x = f1 (ξ, η ) = (ξ, +η ) 2 1 y = f2 (ξ, η ) = (ξ − η ) 2 and if h = u ◦ F , that is, h(ξ, η ) = u(f1 (ξ, η ), f2 (ξ, η )) , what differential equation does h satisfy? First, we compute hξ and hη ∂h ∂u ∂f1 ∂u ∂f2 1 1 1 = + = ux ( ) + uy ( ) = (ux + uy ) ∂ξ ∂x ∂ξ ∂y ∂ξ 2 2 2 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). 
379 ∂h ∂u ∂f1 ∂u ∂f2 1 1 1 = + = ux ( ) + uy (− ) = (ux − uy ) ∂η ∂x ∂η ∂y ∂η 2 2 2 In a similar way the second derivatives hξξ , hξη and hηη are found, ∂ (hξ ) ∂f1 ∂ (hξ ) ∂f2 ∂2h = + 2 ∂ξ ∂x ∂ξ ∂y ∂ξ = 1∂ 1 1∂ 1 1 (ux + uy ) + (ux + uy ) · = [uxx + 2uxy + uyy ] 2 ∂x 2 2 ∂y 2 4 ∂ (hξ ) ∂ (hξ ) ∂f1 ∂ (hξ ) ∂f2 ∂2h = = + ∂ξ∂η ∂η ∂x ∂η ∂y ∂η = 1∂ 1 1∂ −1 1 (ux + uy ) · + (ux + uy ) · = [uxx − uyy ] 2 ∂x 2 2 ∂y 2 4 ∂ (hη ) ∂f1 ∂ (hη ) ∂f2 ∂2h = + 2 ∂η ∂x ∂η ∂y ∂η = 1∂ 1 1∂ 1 1 (ux − uy ) · + (ux − uy )(− ) = [uxx − 2uxy + uyy ] 2 ∂x 2 2 ∂y 2 4 Since hξη = 1 [uxx − uyy ] , and u satisfies the wave equation, we see that h satisfies the 4 equation hξη = 0, so, in fact, the equations for hxiξ and hηη are superfluous to obtain the desired result. From this, it is easy to give another procedure for solving the wave equation, independent of Fourier series. Because hξη = 0 , we know that h(ξ, η ) = ϕ(ξ ) + ψ (η ) , where the functions ϕ and ψ are any twice differentiable functions. However, h(ξ, η ) = u( ξ+η , ξ−η ) . 2 2 Since the equations x = ξ+η , y = ξ−η may be solved for ξ and η in terms of x and y , 2 2 viz. ξ = x + y and η = x − y , we have h(x + y, x − y ) = u(x, y ) . But h(ξ, η ) = ϕ(ξ )+ ψ (η ) . Consequently u(x, y ) = ϕ(x + y ) + ψ (x − y ). This formula is the general solution of the one space dimensional wave equation. It expresses u in terms of two arbitrary functions ϕ and ψ . These functions ϕ and ψ can be chosen so that the function u(x, y ) , a solution of the wave equation, has any given initial position u(x, 0) = f (x) and initial velocity uy (x, 0) = g (x) . Let us do this. From the initial conditions we find f (x) = u(x, 0) = ϕ(x) + ψ (x) g (x) = uy (x, 0) = ϕ (x) − ψ (x). After differentiating the first expression, one can solve for ϕ and ψ , ϕ (x) = f (x) + g (x) , 2 ψ (x) = f (x) − g (x) . 2 Integrate these: x ϕ(x) = ϕ(0) + 0 f (s) + g (s) f (x) − f (0) 1 ds = ϕ(0) + + 2 2 2 x g (s) ds. 0 380 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. x ψ (x) = ψ (0) + 0 x f (s) + g (s) f (x) − f (0) 1 ds = ψ (0) + + 2 2 2 g (s) ds. 0 Thus, u(x, y ) = ϕ(x + y ) + ψ (x − y ) = ϕ(0) + ψ (0) + f (x + y ) − f (0) 1 + 2 2 x+ y g (s) ds+ 0 f (x − y ) − f (0) 1 − intx−y g (s) ds. 0 2 2 Because f (0) = ϕ(0) + ψ (0) , this simplifies to u(x, y ) = f (x + y ) − f (x − y ) 1 + 2s 2 x+ y g (s) ds, x− y the famous d’Alembert formula for the solution of the one space dimensional wave equation in terms of the initial position f (x) and initial velocity g (x) . Unfortunately, simple formulas like this are exceedingly rare. That is why a different, more generally applicable, procedure was used earlier to solve the wave equation. As was seen in Exercise 6, p. 645, the d’Alembert formula is recoverable from the Fourier series. Exercises (1) For the following function g and f , compute the point X0 = (2, 2) . d dX (g ◦ F ) and evaluate ∂ ∂x1 (g ◦ F ) at (a) g (y1 , y2 ) = y1 y2 − y2 e2y1 , F : yz = 2x1 − x1 x2 , y2 = x2 + x2 1 2 (b) g (y1 , y2 ) = 7 + ey1 sin y2 F : y1 = 2x1 x2 , y2 = x2 − x2 1 2 2 2 (c) g (y1 , y2 , y3 ) = y1 − y2 − 3y1 y3 + y2 F : y1 = 2x1 − x2 , y2 = 2x1 + x2 , y3 = x2 1 (2) Let ϕ(x1 , x2 , t) := x2 x2 − te2x1 . If X = F (t) is defined by x1 = 1 − t2 , d find dt (ϕ ◦ F ) at t = 1 . (3) Let ϕ(x, s, t) := xs + xt + st . If x = f (t) = t3 − 7 , compute ∂2 Also compute ∂t2 (ϕ ◦ f ) at t = 3 . ∂ ∂t (ϕ x2 = 3t + 1 , ◦ f ) at t = 3 . (4) If u(x, y ) = x2 − y 2 , while F := (f1 , fx ) is given by x = f1 (r, θ) = r cos θ, y = f2 (r, θ) = r sin θ find hr and hθ , where h := u ◦ F . 
Also compute, hrr , hrθ and hθθ . 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). 381 (5) (a) Let u(x, y ) be a scalar valued function and F : E2 → E2 be defined by the polar coordinate transformation f1 (r, 0) = r cos θ, f2 (rθ) = r sin θ, Take h := u ◦ F . Find hr , hθ , hrr , hrθ , and hθ,θ . [Answer: h4 = ux cos θ + uy sin θ, hrr = −uxx r sin θ + uyy (r cos θ − r sin θ)+ uyy r cos θ − ux sin θ + uy cos θ] (b) Show that uxx + uyy = hrr + 1 1 h + hr . 2 θθ r r (6) The two space dimensional wave equation is utt = uxx + uyy (a) If the space variables x, y are changed to polar coordinates (ex. 5) while the time variable is not changed, the wave equation reads htt =? where h(r, θ, t) = u(r cos θ, r sin θ, t). (b) If a given wave form depends only on the distance r from the origin and time t , but not on the angle ∂ , how does the wave equation for h simplify? (c) Consider the equation you found in b. Use the method of separation of variables and seek a solution in the form h(r, t) = R(r)T (t) . What are the resulting ordinary differential equations? Compare the equation for R(r) with Bessel’s differential equation. (7) If w = f (x, y, s) , while x = ϕ(y, s, t) and y = ψ (s, t) , find expressions for the partial derivative of the composite function g (ϕ(ψ, s, t), ψs) with respect to s and t . (8) (a) Let u(x, y ) = f (x − y ) . Show that u satisfies the partial differential equation ux + uy = 0. (b) Let u(x, y ) = f (xy ) . Show that u satisfies the equation xux − yuy = 0 . (c) Let u(x, y ) = f ( x ) . Show that u satisfies the equation y xux + yuy = 0. (d) Let u(x, y ) = f (x2 + y 2 ) , so u only depends on the distance from the origin. Show that u satisfies yux − xuy = 0. (9) Let u(x, y ) satisfy the equation xux + yuy = 0 . (a) Change the equation to polar coordinates [Answer: if h(r, θ) := u(r cos θ, r sin θ) , then rhr = 0 ]. (b) Solve the equation for h(r, θ) and use it to deduce that u(x, y ) = f ( x ) for some y function f . (cf. Ex. 8c) 382 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. (10) Assume u(x, y ) satisfies the equation uxx − 2uxy − 3uyy = 0. (a) Choose the constants α, β, γ , and δ so that after the change of variables x = αξ + βη, y = γξ + δη , the equation for h(ξ, η ) = u(αξ + βη, γξ + δη ) is hξη = 0 . (b) Use the result of part (a) to find the general solution of the equation for u . [Answer: u(x, y ) = ϕ(3x − y ) + ψ (x + y ) ]. (11) If f (x, y ) is a known scalar valued function, find both partial derivatives of the function f (f (x, y ), y ) . (12) If W = G(Y ) and Y = F (X ) are defined by G: find d dX (G w1 = ey1 −y2 , w2 = ey1 +y2 y1 = x2 − 3x2 − x3 1 y2 = x1 + x2 + 3x3 , 2 F: ◦ F) . (13) Let u(x, y ) be a solution of the two dimensional Laplace’s equation uxx + uyy = 0 . (a) If u depends only on the distance from the origin u(x, y ) = h(r) , where r = x2 + y 2 , what ordinary differential equation does h satisfy? Compare your answer with that found in Exercise 5. (b) Solve the resulting equation for h and deduce that all the solutions of the two dimensional Laplace equation which depend only on the distance from the origin are of the form u(x, y ) = A + B log(x2 + y 2 ), where A and B are constants. (c) Now do the same thing all over again for a solution u(x1 , x2 , . . . , xn ) of the n dimensional Laplace equation ux1 x1 + . . . + uxn xn = 0 , i.e. find the form of all solutions which only depend on r = x2 + . . . + x2 , u(x1 , . . . , xn ) = h(r) . n 1 B B [Answer: u(x1 , . . . , xn ) = A + (x2 +...+x2 ) n−2 = A + rn−2 , n ≥ 3 ]. 
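Two of the closed-form results above lend themselves to a quick symbolic check: the general solution u(x, y) = ϕ(x + y) + ψ(x − y) of the wave equation obtained before these exercises, and the radial solutions of Laplace's equation quoted in the answer to Exercise 13. The following is a minimal sketch assuming Python with SymPy, which is not otherwise used in the notes.

```python
# Symbolic sanity checks of two results stated above, assuming SymPy.
import sympy as sp

x, y, z, A, B = sp.symbols('x y z A B')
phi, psi = sp.Function('phi'), sp.Function('psi')

# (1) u = phi(x+y) + psi(x-y) satisfies the wave equation u_xx - u_yy = 0
#     for arbitrary twice-differentiable phi and psi.
u = phi(x + y) + psi(x - y)
print(sp.simplify(u.diff(x, 2) - u.diff(y, 2)))                    # -> 0

# (2) The radial solutions of Laplace's equation quoted in Exercise 13:
#     A + B*log(x**2 + y**2) in the plane, and A + B/r for the case n = 3.
u2 = A + B * sp.log(x**2 + y**2)
print(sp.simplify(u2.diff(x, 2) + u2.diff(y, 2)))                  # -> 0

u3 = A + B / sp.sqrt(x**2 + y**2 + z**2)
print(sp.simplify(u3.diff(x, 2) + u3.diff(y, 2) + u3.diff(z, 2)))  # -> 0
```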
2 1 n (14) If f (t) is a differentiable scalar valued function with the property that f (x + y ) = f (x) + f (y ) for all x, y ∈ E1 , prove that f (x) ≡ kx where k = f (1) . (15) (a) Find the general solution of the partial differential equation ux − 2uy = 0 . [Hint: Introduce new variables as in Ex. 10] (b) What is the solution if one requires that u(x, 0) = x2 ? [Answer: u(x, y ) = 1 (x + 2 y )2 ]. Chapter 10 Miscellaneous Supplementary Problems 1. (a) Sn , n = 1, 2, . . . , be a given sequence. Find another sequence an such that N SN = an . In other words, given the partial sums Sn , find a series whose n=1 partial sums are Sn . To what extent are the an uniquely determined? (b) Apply part (a) to find an infinite series an whose n th partial sum Sn is given by 1 (i) Sn = , (ii) Sn = e−n n 2. Let S = { x ∈ R : x ∈ (−1, 1) } . Define addition on S by the formula x ⊕ y = x+ y 1+xy , x, y ∈ S , where the operations on the right are the usual ones of arithmetic. Show that the elements of S form a commutative group with the operation ⊕ . 3. (a) If an → a , prove that a1 +a2 +···+an n → a also. (b) Assume that f is continuous on the interval [0, ∞] and lim f (x) = A . Define x→∞ 1N HN = f (x) dx . Prove that lim HN exists and find its value. [Hint: x→∞ N0 Interpret HN as the average height of the function f ]. 4. (a) Suppose that all the zeroes of a polynomial P (x) are real. Does this imply that all the zeroes of its derivative P (x) are also real? (Proof or counterexample). What can you say about higher derivatives P (k) (x) ? (b) Define the n th Laguerre polynomial by Ln (x) = ex d n n −x [x e ]. dxn Show that Ln is a polynomial of degree n . Prove that the zeroes of Ln (x) are all positive real numbers, and that there are exactly n of them. 383 384 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS ∞ an xn (which converges to f for |x| < ρ so 5. If f (x) has a Taylor series: f (x) = n=0 the remainder does go to zero there) prove that f (cxk ) , where c is a constant and k a positive integer, has the Taylor series ∞ k an cn xnk f (cx ) = n=0 which converges to f (cxk ) for |x| < ( |ρ| )1/k . You must show that i) the Taylor c coefficients for f (cxk ) are an cn , that ii) the power series for f (cxk ) converges for |x| < ( |ρ| )1/k , and that iii) the remainder tends to zero. Apply the result to obtain c the Taylor series for cos(2x2 ) from that of cos x . 6. Yet another proof of Taylor’s Theorem. Beginning with equation 9 on p. 98, define the function K (s) by N K (s) = f (s) − n=0 f (n) (x0 ) A(s − x0 )N +1 (s − x0 )n − , n! (N + 1)! where A is picked so that K (ˆ) = 0 . x (a) Verify that K (x0 ) = K (x0 ) = . . . K (N ) (x0 ) = 0 . (b) Use Rolle’s Theorem to prove that if a function K (s) satisfies the properties of a), and if K (ˆ) = 0 , then there is a ξ between x and x0 such that K (N +1) (ξ ) = 0 . x ˆ (c) Apply parts a) and b) to prove Taylor’s Theorem. 7. Assume an converges. You are to investigate the convergence of |an | under various hypotheses. a2 and n (a) an arbitrary complex number (b) an ≥ 0 . an + 1 (c) lim < 1 (not = 1). n→∞ an 1 8. The harmonic series 1+ 1 + 3 +· · · has been said to diverge with “infuriating slowness”. 2 1 Find a number N such that 1 + 1 + 1 + · · · + N is at least 100. Compare this with 2 3 23 . Avogadro’s number ∼ 6 × 10 9. Consider the series ∞ n=1 an , where the an ’s are real. (a) Let b1 , b2 , b3 , . . . and c1 , c2 , c3 , . . . denote the positive and negative terms respec∞ tively from a1 , a2 , . . . . 
If n=1 an converges conditionally but not absolutely, ∞ ∞ prove that both series n=1 bn and n=1 cn diverge. (b) Let d1 , d2 , d3 , . . . , denote the terms a1 , a2 , a3 , . . . rearranged in any way. Prove ∞ Riemann’s theorem, which states that if n=1 an converges conditionally but ∞ not absolutely, then by picking some suitable rearrangement, the series n=1 dn can be made to converge to any real number, while using other rearrangements, it can be made to diverge to plus or minus infinity. 385 10. If A and B are subsets of a linear space V , a) show that span{ A ∩ B } ⊂ span{ A }∩ span{ B } . Give an example showing that span{ A ∩ B } may be smaller than span{ A } ∩ span{ B } . b). Show that if A ⊂ B ⊂ span{ A } , then span{ A } ⊃ span{ B } . 11. Let A = { X1 , . . . , Xk } be a set of vectors in a linear space V . Denote by cs A (coset of A ) the set k csA = { X ∈ V : X = k aj Xj , j =1 aj = 1 }. where j =1 Prove that cs A is a coset of V , in fact, the smallest coset of V which contains the vectors X1 , . . . , Xk . √ 12. (a) Consider the set of real numbers of the form a + b 2 , where a and b are rational numbers. Prove that this set is a vector space over the field of rational numbers. What is the dimension of this vector space? (b) Consider√ set of numbers of the form a + bi , where a and b are real numbers the and i = −1 . Prove that this set is a vector space over the field of real numbers and find its dimension. 13. If F1 and F2 are fields with F1 ⊂ F2 , we call F2 an extension field of F1 – such as R ⊂ C . As such, we may think of F2 as a vector space over the field F1 (see exercise 1l). In other words, take F2 as an additive group and take the scalars from F1 . If this vector space is finite dimensional, the field F2 is called a finite extension of F1 , and the dimension n of this vector space is called the degree of the extension and written n = [F2 : F1 ] . (a) Prove that every element ξ ∈ F2 satisfies an equation an ξ n + an−1 ξ n−1 + · · · + a0 = 0, where the ak ∈ F1 and n = [F2 : F1 ] . [Hint: look at the examples of exercise 1l]. (b) If F1 ⊂ F2 ⊂ F3 are fields with [F2 : F1 ] = n < ∞ and [F3 : F2 ] = m < ∞, prove that [F3 : F1 ] < ∞ , in fact, prove [F3 : F1 ] = [F3 : F2 ]]F2 : F1 ] = nm. (c) Let F1 be the field of rationals, F2 the field whose elements have the form √ a + b 3 , where a and b are rational, and let F3 be the field whose elements √ have the form c + d 5 , where c and d are√ F2 . Compute [F2 : F1 ] and find in the polynomial of part a) satisfied by (1 − 3) ∈ F )2 . Compute [F3 : F2 ] and [F3 : F1 ] . Find a basis for F3 as a vector space whose scalars are elements of F1 . [The ideas in this problem are basic to modern algebra, particularly Galois’ theory of equations.] 386 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 14. Let Pj = (αj , βj ), j = 1, . . . , n, αj = αk be any n distinct points in the plane R2 . One often wants to find a polynomial p(x) = a0 + a1 x + · · · + aN xN which passes through these n points, p(αj ) = βj , j = 1, . . . , n . Thus, p(x) is an interpolating polynomial. Given any points P1 , . . . , Pn , prove that a unique interpolating polynomial p(x) degree n − 1(= N ) can be found. (More about this is in Exercises 17-18 below). 15. Let L1 and L2 be linear operators mapping V → V . Then they can be both multiplied and added (or subtracted). The bracket product or commutator [L1 , L2 ] ≡ L1 L2 − L2 L1 “measures the non-commutativity”. It is important in mathematics and physics. 
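The commutator identities listed in the next problem, in particular the Jacobi identity of part (f), can be illustrated numerically by letting random matrices stand in for the operators L1, L2, L3. The sketch below assumes Python with NumPy and is an illustration only, not a proof.

```python
# Numerical illustration (not a proof) of the commutator identities below,
# using random 3x3 matrices as stand-ins for the operators L1, L2, L3.
import numpy as np

def bracket(A, B):
    """The commutator [A, B] = AB - BA."""
    return A @ B - B @ A

rng = np.random.default_rng(0)
L1, L2, L3 = (rng.standard_normal((3, 3)) for _ in range(3))

# Antisymmetry: [L1, L2] = -[L2, L1]
print(np.allclose(bracket(L1, L2), -bracket(L2, L1)))      # True

# Jacobi identity: [L1,[L2,L3]] + [L2,[L3,L1]] + [L3,[L1,L2]] = 0
jacobi = (bracket(L1, bracket(L2, L3))
          + bracket(L2, bracket(L3, L1))
          + bracket(L3, bracket(L1, L2)))
print(np.allclose(jacobi, np.zeros((3, 3))))               # True
```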
[In quantum mechanics, the observables - like energy, momentum, and position - are represented by self-adjoint operators. Two observables can be measured at the same time if and only if their associated operators commute]. Prove the identities (a) [L1 , L1 ] = 0, [L1 , I ] = 0 (b) [L1 , L2 ] = −[L2 , L1 ] (c) [aL1 , L2 ] = a[L1 , L2 ] , a scalar (d) [L1 + L2 , L3 ] = [L1 , L3 ] + [L2 , L3 ] (e) [L1 , L2 , L3 ] = [L1 , L2 ]L3 + L2 [L1 , L3 ] (f) [L1 , [L2 , L3 ]] + [L2 , [L3 , L1 ]] + [L3 , [L1 , L2 ]] = 0 (Part f is the Jacobi identity. It has been said that everyone should verify it once in her lifetime.) 16. * Consider the normalized Legendre Polynomials, en (x) = 2 1 dn 2 (x − 1)n , 2n + 1 2n n! dxn n = 0, 1, 2, . . . which are an orthonormal set of polynomials in L2 [−1, 1], en being of degree n . If f ∈ C [−1, 1] , prove that N PN f = f , en en n=0 converges to f in the norm of L2 [−1, 1] . [Hint: Use the form of the Weierstrass Approximation Theorem (p. 255) and the method of Theorem (p. 241)]. 17. * We again take up the interpolation problem begun in Exercise 13 above. Let Pj = (αj , βj ), j = 1, 2, . . . , n be n points in the plane, αi = αj . Although we proved there is a unique polynomial p(x) = a0 + a1 x + · · · + an−1 xn−1 of degree n − 1 passing through the n points, the proof was entirely non-constructive. Here we (or you) explicitly construct the polynomial. (a) Show that the polynomial of degree n − 1 pj (x) = Πn= (x − αk ) ˜ k k =j is zero if x = αk , k = j , but pj (αj ) = 0 . ˜ 387 (b) Construct a polynomial pj (x) with the property pj (αk ) = δjk . (c) Show that n p(x) = βj pj (x) j =1 is the desired (unique by Ex. 13) interpolating polynomial. (d) Let P1 = (1, 1), P2 = (2, 1), P3 = (4, −1), P4 = (−1, −2) . Find the interpolating polynomial using the above construction. 18. * If f is some complicated function, it is often useful to use an interpolating polynomial instead of the function. Then the polynomial p(x) will pass through the points Pj = (αm , f (αj )), j = 1, . . . , n , so by Exercise 16, n p(x) = f (αj )pj (x). j =1 How much will p differ from f in an interval [a, b] containing the αj ? You must estimate the remainder R = f − p . (a) Assume f ∈ C n [a, b] . Since R(x) = f (x) − p(x) vanishes at x = αj , j = 1, . . . , n , it is reasonable to write R(x) = (x − α1 ) · · · (x − αn ) · (?) Fix x and define the constant A by ˆ f (ˆ) − p(ˆ) = A(ˆ − α1 ) · · · (ˆ − αn ). x x x x By a trick similar to that used in Taylor’s Theorem (cf. P. 104j Ex. 12), prove that A = f (n) (ξ )/n! where ξ is some point in (a, b) . Thus, f (ˆ) = p(ˆ) + x x (ˆ − α1 ) · · · (ˆ − αn ) (n−1) x x f (ξ ), ξ ∈ (a, b). n! (b) Let f (x) = 2x , and α1 = −1, α2 = 0, α3 = 1, α4 = 2. Find the approximating polynomial and find an upper bound for the error in the interval [−2, 2] . 19. If x is irrational and a,b,c, and d are rational (with ad − bc = 0) , prove that is irrational. 20. Prove by induction that 1 + 3 + 5 + · · · + (2n − 1) = n2 . 21. (a) If x ≥ 0 , use the mean value theorem to prove ex ≥ 1 + x. ax+b cx+d 388 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (b) If ak ≥ 0 , prove that n Pn ak ≤ Πn=1 (1 + ak ) ≤ e k k=1 ak , k=1 (where Πn=1 bk = b1 b2 · · · bn ). k (c) If ak ≥ 0 , prove that the infinite product Π∞ (1 + ak ) := limn→∞ Πn=1 (1 + ak ) k=1 k ∞ converges if and only if the infinite series k=1 converges. 22. Let an+1 = 2 1+an , where a1 > 1 . Prove that (a) the sequence a2n+1 is monotone decreasing and bounded from below. 
(b) the sequence a2n is monotone increasing and bounded from above. (c) does lim an exist? n→∞ 23. Let ak , k = 1, . . . , n + 1 be arbitrary real numbers which satisfy a1 + a2 + · · · + 2 an+1 an n−1 has at least one zero for n + n+1 = 0 . Show that P (x) = a1 + a2 x + · · · + an x x ∈ (0, 1) . 24. Suppose f ∈ C 2 in some neighborhood of x0 . Prove that f (x0 + h) − 2f (x0 ) + f (x0 − h) = f (x0 ). h→0 h2 lim 25. Let s(x) and c(x) be continuously differentiable functions defined for all x , and having the properties s (x) = c(x), c (x) = s(x) s(0) = 0, c(0) = 1. (a) Prove that c2 (x) − s2 (x) = 1 . (b) Show that c(s) and s(x) are uniquely determined by these properties. 26. Consider an and (a) If lim n→∞ bn . bn = K, K = 0, ∞ , then the series both converge or diverge together. an (b) If an converges and lim bn = 0 , then an (c) If an converges and lim bn = ∞ , then the series an n→∞ n→∞ diverge (give examples). (d) Apply these to: ∞ (i) 1 √ n− n n=2 bn converges. bn may converge or 389 ∞ (ii) n=1 ∞ n3 1 √ −2 n (−1)n sin (iii) n=1 π . (Hint: as x → 0, n sin x x → 1 ). 27. The following (a weak form of Stirling’s formula) is an improvement of the result on page 64, Ex. 6. n log n − (n − 1) < log n! < (n + 1) log(n + 1) − 2 log 2 − (n − 1), from which one finds nn e−n+1 < n! < 1 (n + 1)(n+1) e−n+1 . 4 Prove these. 28. (a) Find the Taylor series expansion for f (x) = e−x about x = 0 . (b) Show that the series found in (a) converges to e−x for all x in the interval [−r, r] , where r > 0 is an arbitrary but fixed real number. 29. Consider the sequence N SN = 2 sin πx dx. x Does lim SN exist? [Hint: observe that SN can be written as N →∞ N −1 SN = an , 2 where n+1 an = n sin πx dx. x sin πx x ,x ≥ 2 , to deduce - by inspection - the needed properties of Sketch a graph of the an ’s. Please do not attempt to evaluate the integrals for an ]. 30. Let A = { p ∈ P9 : p(x) = p(−x) } . (a) Prove that A is a subspace of P9 . (b) Compute the dimension of A . 31. Let X and Y be elements in a real linear space. Prove that X = Y if (X + Y ) ⊥ (X − Y ) . 32. In the space R2 , introduce the new scalar product < X, Y >= x1 y1 + 4x2 y2 , where X = (x1 , x2 ) and Y = (y1 , y2 ) . if and only 390 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (a) Verify that this indeed is a scalar product and define the associated norm X . (b) Let X1 = (0, 1) and X2 = (4, −2) . Using this norm and scalar product, find an orthonormal set of vectors e1 and e2 such that e1 is in the subspace spanned by X1 . 33. Let H be a scalar product space with X and Y in H . Find a scalar α which makes X − αY a minimum. For this α , how are X − αY and Y related? [Hint: Draw a picture in E2 ]. ∞ ∞ an converges, where an ≥ 0 , does the series 34. If n=1 1 √ a also converge? Proof or n2 counterexample. 35. Use the Taylor series about x0 = 0 to calculate sin.2 making an error less than .005 . Justify your statements. 36. Let A = span{ (1, 1, 1, 1), (1, 0, 1, 0) } be a subspace of E4 . Find the orthogonal complement, A⊥ , of A by giving a basis for A⊥ . 37. Prove that (a) 1 + (b) 1 + 1 < 8 1 2 ∞ k=1 ∞ k=1 1 1 <1+ . 3 k 2 1 3 <1+ . k2 4 38. Let ak be a sequence of positive numbers decreasing to zero, ak → 0 , and let SN = a1 + a2 + · · · + aN . (a) Prove that SN ≥ N aN . (b) Use this to estimate the number, N , of terms needed to make N k −1/4 > 1000. k=1 39. Prove or give a counterexample: ∞ (a) If ∞ b2n must converge. bn converges, then n=1 ∞ n=1 ∞ |bn | converges, then (b) If n=1 |b2n | must converge. n=1 40. 
Let X1 and X2 be elements of a scalar product space. (a) If X1 ⊥ X2 , prove that X1 − aX2 ≤ X1 for any real number a . 391 (b) Prove the converse, that is, if X1 − aX2 ≤ X1 for every real number a , then X1 ⊥ X2 . [Hint: After your first approach has failed, try looking at the problem geometrically. How would you pick a to minimize the left side of the inequality?]. 41. Let Sn = a1 + a2 + · · · + an , where an → 0 as n → ∞ . Prove that Sn converges if and only if S2n = a1 + a2 + · · · + a2n−1 + a2n converges (one could also use S3n etc.). ∞ 42. Show that the error in approximating the series n=1 1 by the first N terms is less nn than N −N −1 . 43. A sample “multiplication” for points X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ) in R3 is to define X Y ≡ (x1 y2 , x2 y2 , x3 y3 ). Define a multiplicative identity by yourself. Using these definitions for the multiplicative structure and the usual rules for the additive structure, show that the resulting algebraic object is not a field. 44. (a) Assume an ≥ 0 and bn ≥ 0 . Prove that ∠(an + bn ) converges if and only if the series ∠an and ∠bn both converge. (b) What if you allow the bn ’s to be negative? 1 1 1 1 45. (a) Show that the vectors e1 = ( √2 , √2 ), e2 = ( √2 , − √2 ) form an orthonormal basis for E2 . (b) Write the vector X = (7, −3) in the form X = a1 e1 + a2 e2 , using the scalar product to find a1 and a2 (don’t solve linear equations). 46. Consider the linear space P2 as a subspace of L2 [0, 1] . (a) If p(x) = 1 − x2 , compute p. (b) Find the orthonormal basis for A⊥ , where A = span{ 2 + x } . (c) Find the polynomial ϕ ∈ P2 such that p, ϕ = p(1) for all p ∈ P2 , that is, the same ϕ should work for all p ’s. 47. Give formal proofs for the following (trivial) properties of a norm on a linear space. Only the axioms may be used. (a) −X = X (b) X −Y = Y −X (c) X +Y ≥ X − Y (d) X1 + X2 + · · · + Xn ≤ X1 + X2 + · · · + Xn (I suggest induction here). 392 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 48. Consider R2 with the norms 1, 2 , and ∞ . (a) Draw a sketch of R2 indicating the unit ball for each of these three norms. (The ball may not turn out to be “round”). (b) Which of these three linear spaces have the following property: “given any subspace M and a point X0 not in M , then there is a unique point on M which is closest to M .” 49. Are the following scalar products the set of functions continuous on [a, b] ? Proof or counterexample. b (a) [f, g ] = ( b f (x) dx)( a g (x) dx) a b b |f (x)| dx)( (b) [f, g ] = ( a |g (x)| dx) a 50. (a) Let dim V = n and { X1 , . . . , Xn } ∈ V . Prove that { X1 , . . . , Xn } are linearly independent if and only if they span V (so in either case, they form a basis for V ). (b) Let { e1 , . . . , en } be an orthonormal set of vectors for an inner product space H . Prove this set of vectors is a complete orthonormal set for H if and only if n = dim H . (c) Prove that dim V = largest possible number of linearly independent vectors in V. 51. (a) Let X and Y be any two elements in an inner product space. Prove that the parallelogram law holds X +Y 2 + X −Y 2 =2 X 2 +2 Y 2 (cf. page 192, Ex. 9). (b) Consider the set of continuous functions on [0, 1] with the uniform norm, f ∞ = max0≤x≤1 |f (x)| . Show that this norm cannot arise from an inner product, i.e. there is no inner product such that for all f , f ∞ = f , f . [Hint: If there were, the relationship of part a would hold between the norms of various elements. Show that relationship does not, in fact, hold for the function f (x) = 1 and g (x) = x ]. 
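The hint to part (b) of the preceding problem can be made concrete numerically: for f(x) = 1 and g(x) = x on [0, 1], the two sides of the parallelogram law differ under the uniform norm. The sketch below assumes Python with NumPy; the exercise itself still asks for the written argument.

```python
# Numerical illustration of the hint to Problem 51(b): the parallelogram
# law fails for the uniform (sup) norm on C[0,1], using f(x)=1, g(x)=x.
import numpy as np

xs = np.linspace(0.0, 1.0, 10001)
f = np.ones_like(xs)
g = xs

sup = lambda h: np.max(np.abs(h))      # the uniform norm on [0, 1]

lhs = sup(f + g)**2 + sup(f - g)**2    # = 2**2 + 1**2 = 5
rhs = 2 * sup(f)**2 + 2 * sup(g)**2    # = 2 + 2      = 4
print(lhs, rhs)                        # 5.0 4.0 -- the law fails
```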
52. (a) Let H be a finite dimensional inner product space and (X ) a linear functional defined for all X ∈ H . Show that there is a fixed vector X0 ∈ H such that (X ) = X, X0 for all X ∈ H. This shows that every linear functional can be represented simply as the result of taking the inner product with some vector X0 . [Hint: First pick a basis { e1 , . . . , en } for H and let cj = (en ) . Now use the fact that the ej ’s are a basis and that is linear]. 393 (b) Consider the linear space P2 with the L2 [0, 1] inner product. This gives an inner product space H . (i) Show that (p) = p( 1 ) is a linear functional. 3 (ii) Find a polynomial p0 such that (p) = p, p0 for all p ∈ H . 53. Consider the set S of pairs of real numbers X = (x1 , x2 ) . Define X + Y = (x1 + y1 , x2 + y2 ), aX = (ax1 , x2 ). Is S , with this definition of vector addition and multiplication by scalars, a vector space? 54. By inspection, place suitable restrictions on the contents a, b, c, · · · in order to make the following operator linear: T u = a[ d2 u du d3 u 2 + bx2 2 + cu + eu + f sin u + g. 3 dx dx dx d 55. Consider the operator D = dx on the linear space Pn of all polynomials of degree less than or equal to n . Find R(D) and N(D) as well as dim R(D) and dim N(D) . 56. Let 1 −2 A = 2 0 , 31 30 2 and C = 1 4 −1 . 0 −2 0 B= −1 0 −2 , 210 Compute all of the following products which make sense: AB, BA, AC, CA, BC, CB, A2 , B 2 , C 2 , ABC, CAB. 57. Consider the mapping A : R4 → R3 which is 1 −1 A = 2 1 0 −3 defined by the matrix 11 1 4 1 −2 (a) Find bases for N(A) and R(A) . (b) Compute dim N(A) and dim R(A) . 58. Let A be a square matrix. Consider the system of linear algebraic equations AX = Y0 , where Y0 is a fixed vector. Assume these equations have two distinct solutions X1 and X2 , AX1 = Y0 , AX2 = Y0 , X1 = X2 . (a) Find a third solution X3 . 394 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (b) Does there exist a vector Y1 such that the equations AX = Y1 have no solutions? Why? (c) det A =? 59. Let Q be a parallelepiped in En whose vertices Xk are at points with integer coordinates, Xk = (a1k , a2k , · · · ank ), aik integers. Prove that the volume of Q is an integer. 60. Let A and B be self-adjoint matrices. Prove that their product AB is self-adjoint if and only if AB = BA . 61. Solve the following initial value problems. u(0) = 1 , u (0) = 0 2 (a) u + 8u + 16u = 0, (b) u + 10u + 16u = 0, u(0) = 1, u (0) = 2 1 u(0) = 4 , u (0) = 1 (c) u + 64u = 0, u(0) = 2, u (0) = −1 (d) u + 4u + 5u = 0, u(0) = 0, u (0) = −2 (e) 2u + 6u + 5u = 0, (f) 4u − 4u + u = 0, u(1) = −1, u (1) = 0 (g) u + 8u + 16u = 2, 1 u(0) = 2 , u (0) = 0 (h) u + 8u + 16u = t, u(0) = 1 , u (0) = 0 2 (i) u + 8u + 16u = t − 2, u(0) = 0, u (0) = 0 (j) u + 8u + 16u = t − 2, 1 u(0) = 2 , u (0) = 0 (k) u + 10u + 16u = t, u(0) = 1, u (0) = 2 1 u(0) = 4 , u (0) = 2 (l) u + 64u = 64, (m) u + 64u = t − 64, u(0) = 3 , u(0) = 0 4 (n) 2u + 6u + 5u = t2 , u(0) = 0, u (0) = −2 62. (The complex numbers as matrices). (a) Show that the set of matrices C={ a −b ba : a and b are real numbers } is a field. (b) Find a map ϕ : C → complex numbers such that ϕ is bijective and such that for all A, B ∈ C (i) ϕ(A + B ) = ϕ(A) + ϕ(B ) (ii) ϕ(AB ) = ϕ(A)ϕ(B ). 395 63. (Quaternions as matrices). A definition: A division ring is an algebraic object which satisfies all of the field axioms except commutativity of multiplication. 
(a) Show that the set of matrices Q={ z −w ¯ wz ¯ : z, w are complex numbers } form a division ring with the usual definitions of additions and multiplication for matrices. √ (b) If we write z = x + iy, w = u + iv where i = −1 and x, y, u , and v are real numbers, then Q can be considered as a vector space over the reals with basis 1= 10 01 i= i0 0 −i j= 0 −1 10 k= 0i . i0 Compute i2 , j2 , k2 , ij, jk, ki, ji, kj , and ik . (The set Q is called the quaternions ). 64. Let 2 −3 1 0 0 2 −3 1 . A= 0 0 2 −3 00 0 2 (a) Find det A . (b) Find A−1 . 2 8 (c) Solve AX = Y , where Y = 8 . −16 (d) Let L : P3 → P3 be the linear operator defined by Lp = p − 3p + 2p, (u = du ). dx Find the matrix eL e for L with respect to the following basis for P3 e1 (x) = 1, e2 (x) = x, e3 (x) = x2 , 2 e4 (x) = x3 . 3! (e) Use the above results to find a solution of 8 Lu = 2 + 8x + 4x2 − x3 . 3 [Hint: Express the right side in the basis of part d.]. 65. Let H be an inner product space, and suppose that A is a symmetric operator, A∗ = A , with the additional property that A2 = A . Show that there exist two subspaces V1 and V2 of H with all of the following properties 396 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (i) V1 ⊥ V2 (ii) If X ∈ V1 , then AX = X (iii) If Y ∈ V2 , then AY = 0 (iv) If Z ∈ H , then Z can be written uniquely as Z = X + Y where X ∈ V1 and Y ∈ V2 . 66. (a) Find the inverse of the matrix 2 1 0 1 . A = −1 0 0 −1 −1 (b) Use the result of a) to solve AX = b for X where b = (7, −3, 2) . 67. Let A and B be 2 × 2 positive definite matrices with det A = det B . Prove that det(A − B ) < 0 . 68. Let L : V1 → V2 be a linear operator with LX1 = Y1 and LX2 = Y2 . Give a proof or counterexample to each of the following assertions: (a) If X1 and X2 are linearly independent, then Y1 and Y2 must be linearly independent. (b) If Y1 and Y2 are linearly independent, then X1 and X2 must be linearly independent. 69. Let p0 , p1 , p2 , , . . . be an orthogonal set of polynomials in [a, b] where pn has degree n. (a) Prove that pn is orthogonal to 1, x, x2 , . . . , xn−1 . (b) Prove that pn is orthogonal to any polynomial q of degree less than n . (c) Prove that pn has exactly n distinct real zeros in (a, b) . [Hint: Let α1 , . . . , αk be the places in (a, b) where pn (x) changes sign, so p(x) = r(x)(x − α1 )(x − α2 ) . . . (x − αk ) where r(x) is a polynomial of degree n − k which does not change sign for x in (a, b) , say r(x) ≥ 0 . Show that b p(x)(x − α1 ) · · · (x − αk ) dx > 0. a If k < n , show that this contradicts the result of part b).]. 70. Consider the system of inhomogeneous equations a11 x1 + · · · + a1n xn = b, . . . ak1 x1 + · · · + akn xn = bn . 397 Let A = ((aij )) and let Ab denote the augmented matrix a11 · · · a1n b1 . Ab = . . ak1 · · · akn bn formed by adding the bj ’s as an extra column to A . Prove that the given system of equations has a solution if and only if dim R(A) = dim R(Ab ) . 71. Let A be an n × n matrix. (a) Show that you can not solve the equation A2 = −I if n is odd. (b) Find a 2 × 2 matrix A such that A2 = −I . (c) If n is even, find an n × n matrix A such that A2 = −I . 72. Let A be an n×n matrix such that A2 = I . Prove that dim R(A+I )+dim R(A−I ) = n. 73. Let f (x, y ) = (y − 2x2 )(y − x2 ) . Show that the origin is a critical point. Then show that if you approach the origin along a straight line, the origin appears to be a minimum. On the other hand, show that if curved paths are also used, then the origin is a saddle point of f . 
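The behavior described in Problem 73 is easy to see numerically. The sketch below (assuming Python with NumPy, not part of the notes) samples f along a few straight lines through the origin, where the values stay nonnegative near 0, and along the parabola y = (3/2)x², where they are negative.

```python
# Numerical illustration of Problem 73: along straight lines through the
# origin f looks like a minimum, but along the parabola y = 1.5*x**2 the
# values are negative, so the origin is actually a saddle point.
import numpy as np

f = lambda x, y: (y - 2 * x**2) * (y - x**2)

t = np.linspace(-0.1, 0.1, 5)
for m in (0.0, 1.0, -2.0):                            # straight lines y = m*x
    print(f'y = {m}x :', np.round(f(t, m * t), 6))    # >= 0 near the origin

print('y = 1.5x^2:', np.round(f(t, 1.5 * t**2), 8))   # < 0 away from 0
```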
[The point of this exercise is to illustrate the fact that the nature of a critical point cannot be determined by merely approaching it along straight lines]. 74. (a) Let A be a diagonal matrix, no two of whose diagonal elements are the same. If B is another matrix and AB = BA , prove that B is also diagonal. (b) Let A be a diagonal matrix, B a matrix with at least one zero-free column and with the further property that AB = BA . Prove that all of the diagonal elements of A are equal. √ an 1 75. (a) If an converges, where an ≥ 0 , prove that converges if p > 2 . np [Hint: Schwarz]. (b) Find an example showing that the series may diverge if p = 1 2 . 76. Let [X, Y ] be an inner product on R3 with basis vectors e1 , e2 , e3 , not necessarily orthonormal. Let aij = [ei , ej ] . Prove that the quadratic form 3 3 Q(X ) = aij xi xj i=1 j =1 is positive definite. 77. If A is self-adjoint and AX = λ1 X, AY = λ2 Y with λ1 = λ2 , prove that X ⊥ Y . 398 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 78. Let S be a positive definite matrix. Prove that det S > 0 . [Hint: Consider the matrix A(t) ≡ tS + (1 − t)I , where 0 ≤ t ≤ 1 . Show that A(t) is positive definite, so det A(t) = 0 . Then use the fact that A(0) = I and A(1) = S to obtain the conclusion]. 79. Consider the linear space of infinite sequences X = (x1 , x2 , x3 , · · · ) with the usual addition. Define the linear operator S (the right shift operator) by SX = (0, x1 , x2 , x3 , · · · ) (a) Does S have a left inverse? If so, what is it? (b) Does S have a right inverse? If so, what is it? 80. Find a right inverse for the matrix A= 101 . 010 Can A have a left inverse? Why? 81. Which of the following statements are true for all square matrices A ? Proof or counterexample. (a) If A2 = I , then det A = I . (b) If A2 = A , then det A = 1. (c) If A2 = 0 , then det A = 0 (d) If A2 = I − A , then det A2 = 1 − det A . 82. Let L be a linear operator on an inner product space H with inner product <, > . Define [X, Y ] = LX, LY . Under what further condition(s) on L is [X, Y ] an inner product too? 83. Let L : H → H be an invertible transformation on the inner product space H . If L “preserves orthogonality” in the sense that X ⊥ Y implies LX ⊥ LY , prove that there is a constant α such that R ≡ αL is an orthogonal transformation. 84. Let H be an inner product space. If the vectors X1 and X2 are at opposite ends of a diameter of the sphere of radius r about the origin, and if Y is any other point on that sphere, prove that Y − X1 is perpendicular to Y − X2 , proving that an angle inscribed in a hemisphere is a right angle. 85. If L is skew-adjoint, L∗ = −L , prove that X, LX = 0 for all X. 399 86. Let Dn be a n × n matrix with x on the main diagonal and super-diagonals, so x1 x10 1 x x1 D2 = , D3 = 1 x 1 , D4 = 0 1 1x 01x 00 If x = 2 cos θ , prove that det Dn = sin(n+1)θ sin θ and 1 s on both the sub- 0 1 x 1 0 0 , 1 x D5 = · · · . . 87. Let A and B be square matrices of the same size. If I − AB is invertible, prove that I − BA is also invertible by exhibiting a formula for its inverse. 88. Assume an converges, where an ≥ 0 . Does the series √ an an+1 also converge? Proof or counterexample. 89. Let A be a square matrix. (a) Prove that AA∗ is self-adjoint. (b) Is AA∗ always equal to A∗ A ? Proof or counterexample. 90. Show that C [0, 1] is a direct sum of the space V1 spanned by e1 (x) = x and e2 (x) = x4 , and the subspace V2 of all functions ϕ(x) such that 1 0= 1 xϕ(x) dx, x4 ϕ(x) dx. 
0= 0 0 [Hint: Show that if f ∈ [0, 1] , there are unique constants a and b such that g (x) ≡ f (x) − [ax + bx4 ] belongs to V2 ]. 91. Let V1 be the linear space of all complex-valued analytic functions in the open unit disc, that is, V1 consists of all complex-valued functions f of the complex variable z which have convergent power series expansions ∞ an z n f (z ) = 0 in the open disc, |z | < 1 . Let V2 be the linear space of all sequences of complex numbers ( a0 , a1 , a2 , · · · ) with the natural definition of addition and multiplication by constants. Define L : V1 → V2 by the rule Lf = (a0 , a1 , a2 , · · · ), where the aj ’s are the Taylor series coefficients of f . Answer the following questions with a proof or counterexample. 400 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (a) Is L injective? (b) Is L surjective? (c) Is 2 contained in R(L) ? (Note: 2 is the subspace of V2 such that ∞ |ak |2 < ∞). k=0 92. Do the following series converge or diverge? ∞ ∞ 1 + 1/n, (a) ( 1 + 1/n2 − 1). (b) n=1 n=1 93. Consider the set of four operators { T1 , T2 , T3 , T4 } defined as follows on the set of square invertible matrices. T2 A = A−1 T1 A = A, T3 A = A∗ , T4 A = (A−1 )∗ . Show that this set of four operators forms a commutative group with the group operation being ordinary operator multiplication. 94. Let Sn = a1 − a2 + a3 − a4 + a5 − · · · . If 0 < ak and the ak ’s are increasing, prove that |SN | ≤ aN . 95. The Monge-Ampere equation is uxx uyy − u2 = 0 . Show that it is satisfied by any xy u(x, y ) ∈ C 2 of the form u(x, y ) = ϕ(ax + by ) , where a and b are constants. 96. (a) Consider the differential operator Lu = u − 4u (i) Find a basis for the nullspace of L . (ii) Find a particular solution of Lu = e2x+1 . (iii) Find the general solution of Lu = e2x+1 . (b) Consider the differential operator Lu = u + 4u Repeat part (a), only here use Lu = f , where f (x) = sec 2x . 97. Find the general solution for each of the following (a) 2u + 5u − 3u = 0 (b) u − 6u + 9u = 0 (c) u − 4u + 5u = 0 401 98. Find the first four non-zero terms in the series solution of 4x2 u − 4xu + (3 − 4x2 )u = 0 corresponding to the largest root of the indicial equation. Where does the series converge? 99. Find the complete solution of each of the following equations valid near x = 0 by using power series. (a) x2 u + xu − (x2 − 1 )u = 0 4 (b) u + xu − u = 0 (only first five non-zero terms) [Answers: ∞ (a) u(x) = Ax−1/2 k=0 x2k + Bx1/2 (2k )! (b) u(x) = Ax + B (1 + x2 2! − x4 4! + ∞ k=0 3 x6 6! − x2k , (2k + 1)! 15x8 8! + · · · ) ]. 100. Consider the matrix −1 −4 −12 1 3 6 A= 0 0 −1 0 −4 −12 0 0 . 0 1 (a) Compute det A . (b) Compute A−1 . (c) Solve AX = b where b = (1, 2, 3, −1) . 101. True or false. Justify your response if you believe the statement is false (a counterexample is adequate). (a) The set A = { X ∈ R3 : x1 = 2 } is a linear subspace of R3 . (b) The vectors X1 = (2, 4) and X2 = (−2, 4) span R2 . (c) The vectors X1 = (1, 2, 3), X2 = (−7, 3, 2), X3 = (2, −1, 1) , and X4 = (π, e, 5) are linearly independent. (d) The set A = { u ∈ C [0, 1] : u(x) = a1 x + a2 ex } is an infinite dimensional subspace of C [0, 1] . (e) The functions f1 (x) = x and f2 (x) = ex are linearly dependent functions in C [0, 1] . (f) If { e1 , e2 , . . . , en } are an orthonormal set of vectors in E8 , then n ≤ 7 . (g) The vector Y = (1, 2, 3) is orthogonal to the subspace of E3 spanned by e1 = (0, 3, −2) and e2 = (−1, −1, 1) . 402 CHAPTER 10. 
MISCELLANEOUS SUPPLEMENTARY PROBLEMS (h) The elements of the set A = { u ∈ C 2 [0, 10] : u + xu − 3u = 6x } can be represented as u(x) = u(x) + x3 , where ˜ u ∈ S = { u ∈ C 2 [0, 10] : u + xu − 3x = 0 }. ˜ 1 83 2 1 2 1 (i) The set of vectors e1 = ( 3 , 0, 2 , − 3 ), e2 = (0, 0, √2 , √2 ), and e3 = ( 9 , 9 , − 9 , 2 ) 3 9 constitute a complete orthonormal basis for E4 . (j) In the vector space of bounded functions f (x), x ∈ [0, 1] , the functions f1 (x) = 1, f2 (x) = 1, 0 ≤ x ≤ 1 , 2 0, 1 < x ≤ 1 2 f3 (x) = 0, 0 ≤ x ≤ 1 2 1, 1 < x ≤ 1 2 are linearly independent. (k) The function f (x) = |x| can be represented by a convergent Taylor series about the point x0 = 0 . (l) The function f (x) = x2 − x73 can be represented by a convergent Taylor series about the point x0 = −1 . (m) The function f (x) = |x| can be represented by a convergent Taylor series about the point x0 = −1 . (n) The plane of all points (x1 , x2 , x3 , x4 ) ∈ E4 such that 2x1 − 4x2 + 6x3 − 5x4 = 7 is perpendicular to the vector (2, −4, 6, −5) . 3 (o) If e1 = ( 3 , 4 ) and e2 = ( 4 , − 5 ) , then X = (−1, 2) can be written as X = 55 5 2e1 − e2 . (p) The set of all integers (positive, negative, and zero) is a field. (q) Consider the infinite series ∞ ak . k=0 If lim |ak | = 0 , then the series must converge. k→0 (r) Let { an } be a sequence of rational numbers. If this sequence converges to a , then the limiting value, a , must be a rational number too. (s) The equation x6 + 3 = 0 , where x is an element of an ordered field, has no solutions. √ (t) It is possible to write i in the form a + ib , where a and b are real numbers. √ (Here i = −1 , of course). (u) Let an be a sequence of complex numbers. If the sequence of absolute values, |an | , converges, then the sequence an must converge. (v) If ∞ ak z k k=0 converges at the point z = 3 , then it must converge at z = 1 + i . 403 (w) The linear subspace A = { p ∈ P7 : p(x) = a1 x + a2 x5 } is a five dimensional subspace of P7 . (x) The linear subspace A = { u ∈ C [−1, 1] : u(x) = a1 x + a2 x5 } is an infinite dimensional subspace of C [−1, 1] . (y) There is a number α such that the vectors X = (1, 1, 1) and Y = (1, α, α2 ) form a basis for R3 . (z) The operator T : C 2 → C 1 defined for u ∈ C 2 by T u = u − 7u is a linear operator. 102. (a) The operator T : C [0, 1] → R defined for u ∈ C [0, 1] by 1 |u(x)| dx Tu = 0 is a linear operator. (b) The sequence (1 + i)n converges to √ 2. (c) The series ∞ k=1 k+1 2345 = + + + + ··· 2k + 1 3579 converges. (d) If t is real, then eit = 1 . (e) Let V1 and V2 be linear spaces and let the operator T map V1 into V2 . If T 0 = 0 , then T is a linear operator. (f) The operator T : C ∞ [−7, 13] → C ∞ [−7, 13] defined by Tu = u du dx is linear. (g) The operator T : C [0, 13] → C [0, 13] defined by x (T u)(x) = u(t) sin t dt, x ∈ [0, 13] 0 is linear (h) In the scalar product space L2 [0, 1] , the functions f and g whose graphs are a figure goes here are orthogonal. (i) Let L be a linear operator. IF LX1 = Y and LX2 = Y , where X1 = X2 , then the solution of the homogeneous equation LX = 0 is not unique. (j) Let L be a linear operator. If X1 and X2 are solutions of LX = 0 , then 3X1 − 7X2 is also a solution of LX = 0 . 404 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (k) Let e1 = (1, 1) and e2 = (0, 1) , and let the linear operator L which maps R2 into R3 satisfy Le1 = (1, 2, 3), Le2 = (1, −2, −1). Then L(2, 3) = (1, 1, 1). 
(l) In the space L2 [0, 1] , if f is orthogonal to the function x2 , then either f ≡ 0 or else f must be positive somewhere in [0, 1] . (m) If F (X ) = (2, 3, 4) for all X ∈ E3 , then F is an affine mapping. (n) If f : E3 → E1 is such that f : (1, 0, 0) → 1 and f : (0, 4, 0) → 2 , there is a 1 point Z ∈ E3 such that f (Z ) ≥ 5 . (o) Let A and B be square matrices with det A = 7 and det B = 3 . Then det AB = 10. det(A + B ) = 10. (p) If A : R3 → R3 is given by A= 231 , 192 then dim N(A) = 2 . (q) The function f (x, y, z ) = 9 + 3x + 4y − 7z does not take on its maximum value. (r) If the function u(x) has two derivatives in some neighborhood of x = 0 , and satisfies the differential equation 9x2 u − 28u = 0, then u(0) = 0 . (s) There are constants a, b and c such that the function u(x) = ex + 2e2x − e−x is a solution of au + bu + cu = 0. (t) The vector (xy, x) is the derivative of some real-valued function f (x, y ) . (u) The vector (y, x) is not the derivative of some real-valued function f (x, y ) . (v) Given any q × p matrix A = ((aij (X ))) , where X = (x1 , · · · , xp ) and where the elements aij (X ) are sufficiently differentiable functions, then there is a map F : Rp → Rq such that F (X ) = A . (w) If A is a square matrix and A2 = A , then A = I . (x) If A is a square matrix and A2 = 0 , then A = 0 . (y) If A is a square matrix and det A = 0 , then A2 = A if and only if A = I . (z) If X, Y , and Z are three linearly independent vectors, then X + Y, and X + Z are also linearly independent. Y +Z, 103. Define L : P2 → P2 as follows: if p ∈ P2 Lp = (x + 1) dp dx (a) Find the matrix e Le representing the operator L with respect to the bases e1 = 1, e2 = x1 , e3 = x2 for P2 . 405 (b) Is L an invertible operator? Why? (c) Find dim R(L) and dim N(L) . 104. Let A= 1 2 √ 3 2 √ 3 2 − 1 2 , B= 5 √ 3 √ 3 . 3 (a) Compute AA∗ , ABA∗ , and (ABA∗ )100 . (b) How could you use the result of part (a) to compute B 100 ? 105. Consider the following system of three equations as a linear map L : R2 → R3 x1 + x2 = y1 4x1 + x2 = y2 x1 − 2x2 = y3 (a) Find a basis for N(L∗ ) . (b) Use the result of part a) to determine the value(s) of α such that Y = (1, 2, α) is in R(L) . 106. Find the unique solution to each of the following initial value problems. (a) u + u − 2u = 0, u(0) = 3, u (0) = 0 (b) u + 4u + 4u = 0, u(0) = 1 u (0) = −1 (c) u − 2u + 5u = 0, u(0) = 2, u (0) = 2 107. Consider the special second order inhomogeneous constant coefficient O.D.E. Lu = f , where Lu ≡ u − 4u, and where f is assumed to be a suitably differentiable function which is periodic with period 2π, f (x + 2π ) = f (x) . (a) Expand f in its Fourier series and seek a candidate, u , for a solution of Lu = f as a Fourier series, showing how the Fourier coefficients of u are determined by the Fourier coefficients of f . (b) Apply the above procedure to the trivial example where f (x) = sin 3x − 4 cos 17x + 3 sin 36x. 108. (a) Find the directional derivative of the function f (x, y ) = 2 − x + xy at the point (0, 6) in the direction (3, −4) by using the definition of the directional derivative as a limit. Check your answer by using the short method. 406 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (b) Repeat part (a) for f (x, y ) = 1 − 3y + xy . 109. Find and classify the critical points of the following functions. (a) f (x, y ) = x3 + y 2 − 3x − 2y + 2 (b) f (x, y ) = x2 − 4x + y 2 − 2y + 6 (c) f (x, y ) = (x2 + y 2 )2 − 8y 2 (d) f (x, y ) = (x2 − y 2 )2 − 8y 2 (e) f (x, y ) = (x2 − y 2 )2 (f) f (x, y ) = x2 − 2xy + 1 y 3 − 3y 3 110. 
Consider the function x3 + y 2 − 3x − 2y + 2 . At the point (2, 1) find the direction in which the directional derivative is greatest. Find the direction where it is least. 111. Let f : E2 → E be a suitably differentiable function and let X (t) be the equation of a smooth curve C in E2 on which f is identically constant, say, f (X (t)) ≡ 4 . Show that on this curve, f is perpendicular to the velocity vector X (t) . [Hint: Do something to ϕ(t) = f (X (t)) . The proof takes but one line.]. 112. Consider the following statements concerning a function f : En → E . (A) f is continuous. (B) f has a total derivative everywhere. (C) f has first order partial derivatives everywhere. (D) f has a total derivative everywhere which is continuous everywhere. (E) f has first order partial derivatives everywhere and they are continuous functions everywhere. (F) f is an affine function. (G) f ≡ 0 . (a) Which of these statements always imply which others. A sample (possibly incorrect) answer might look like (A) ⇒ B, F, · · · (B ) ⇒ A, · · · (b) Find examples illustrating each case where a given statement does not imply another (the Exercises, pp. 588-95, contain the required examples). 113. Solve the following ordinary differential equations subject to the given auxiliary conditions (a) u − u − 6u = 0, (b) xu + u = ex−1 , u(0) = 0, u (0) = 5 u(1) = 2 (c) u − 6u + 10u = 0 , general solution. 407 114. (a) If u(x, y, t) = xexy + t2 , while x = 1 − t3 and y = log t2 , then let w(t) = u(x(t), y (t), t) . Find dw at t = 1 . dt (b) If F : E3 → E2 and G : E2 → E2 are defined by F (X ) = 2x1 − x2 + x2 x3 + 1 2 , x2 − x2 + x2 1 3 G(Y ) = y1 + y2 sin y1 2, −3y1 y2 + y2 (i) Why doesn’t F ◦ G make sense? (ii) Compute [G ◦ F ] at the point X0 = (0, 1, 0) . 115. Let F = E2 → E2 and G : E3 → E2 be defined by 2 F (w, z ) = ew+z 2 ez +w G(r, s, t) = r + s2 + t3 . s + t2 + r 3 (a) Find F and G . (b) Which of F ◦ G or G ◦ F makes sense? (c) If G ◦ F makes sense, compute (G ◦ F ) at (−1, −1) . (d) If F ◦ G makes sense, compute (F ◦ G) at (−1, 0, 0) . 116. Let F : X → Y and G : Y → Z be defined by F: y1 = x2 − ex1 +2x2 , y2 = x1 x2 G: w1 = y2 + y2 sin y1 w2 = (y1 + y2 )2 (a) Compute F at X0 = (−2, 1) and G at Y0 = F (X0 ) . (b) Let H = G ◦ F . Compute H at X0 = (−2, 1) . 117. Consider the map F : E2 → E3 defined by f1 (x, y ) = y + ex−y f2 (x, y ) = sin(x − 2y + 1) F: f3 (x, y ) = x − 3x2 + y 2 (a) Find the tangent map at the point X0 = (1, 1) . (b) Use the result of part (a) to evaluate approximately F at X1 = (1.1, .9) . 118. Consider the system of O.D.E.’s u = αu v = αu − βv, where α and β are constants. If u(0) = A and v (0) = B , (a) Find u(t) . (b) Find v (t) (remember to consider the case α = β separately). 408 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 119. (a) Consider the homogeneous equation u + a(t)u = 0, where a(t) is continuous and periodic with period P , so a(t + P ) = a(t) . (i) If a(t) ≡ 1 , show that there is no non-trivial periodic solution by merely solving the equation. (ii) If a(t) = cos t , show (again by solving the equation) that there is a periodic solution u(t) with period 2π . (iii) In general, if u(t) is a solution, not necessarily periodic, show that v (t) ≡ u(t + P ) is also a solution. (iv) Show that the homogeneous equation has a non-trivial periodic solution of period P if and only if P a(t) dt = 0 0 (b) Consider the inhomogeneous equation u + a(t)u = f (t), where both a(t) and f (t) are continuous and periodic with period P . 
P (i) If a(t) dt = K = 0 , show that the inhomogeneous equation has one and 0 only one periodic solution with period P . P (ii) If a(t) dt = 0 , find a necessary condition on f that the inhomogeneous 0 equation have a periodic solution with period P . 120. Let f : En → E be a differentiable function and denote the directional derivative in the direction of the unit vector e by De f . Prove that D−e f = −De f . 121. Let f : En → E be of the form f (a1 x1 + . . . + an xn ) . Write α = (a1 , · · · , an ) and β = (b1 , · · · , bn ) . If β is perpendicular to α , prove that β ⊥ f . 122. Let R denote the rectangle 0 ≤ x1 < 2π, 0 ≤ x2 < 2π , and define the map f : R → E1 by f (x1 , x2 ) = (3 + 2 cos x2 ) sin x1 Find and classify the critical points of f . (This function is the height function of a torus with major radius 3 and minor radius 2). 123. Consider the constant coefficient differential operator Lu ≡ au + bu + cu, (a, b, c real, a = 0.) Let λ1 and λ2 denote the roots of the characteristic polynomial p(λ) = aλ2 + bλ + c . (a) If λ1 = λ2 , find a formula for a particular solution of Lu = f . x 1 [Answer: up (x) = [eλ1 (x−t) − e−λ2 (x−t) ]f (t) dt. λ1 − λ2 409 ¯ (b) If λ1 is complex, say, λ1 = α + iβ , then λ2 = λ1 = α − iβ . Show that in this case, the above formula simplifies to up (x) = 1 β x eα(x−t) sin β (x − t)f (t) dt. (c) If λ1 = λ2 , find a formula for a particular solution of Lu = f . x [Answer: up (x) = (x − t)eλ1 (x−t) f (t) dt ]. f dA where D is the triangle with vertices at (−1, 1), (0, 0) , and 124. Consider D (3, 1) . (a) Set up the iterated integrals in two ways. (b) Evaluate one of the integrals in (a) for the integrand f (x, y ) = (x + y )2 . 125. When a double integral was set up for the mass M of a certain plate with density f (x, y ) , the following sum of iterated integrals was obtained x3 2 M= ( 1 8 f (x, y ) dy ) dx + 8 ( x 2 f (x, y ) dy ) dx. x (a) Sketch the domain of integration and express M as an iterated integral in which the order of integration is reversed. (b) Evaluate M if f (x, y ) = 1 x . y 1 xy dx dy . 126. Evaluate 0 0 127. It is difficult to evaluate the integral I = f dA , where f (x, y ) = D is the indicated rectangle. However, you can show that (trivially) 1 3 < I < 3, 2 and, with a bit more effort but the same method, that 1 2 Please do so. < I < 3. 2 1 1+x+y 2 and D 410 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 128. Consider the integral I = D f dA , where f (x, h) = 3 8+ x4 + y 4 and D is the domain inside the curve x4 + y 4 = 16 . Show that √ 2 2 < I < 6. [Hint: Show that 1 < f < 3 in D . Then approximate the area of D by an inscribed 4 8 A 3 and circumscribed square. For the record, it turns out that I = 34 ln( 2 ) , where A is the 21 area = Γ( )2 ]. π4 129. (a) Find the derivative matrix for the following mappings Y = F (X ) at the given point X0 . y1 = x2 + sin x1 x2 1 y2 = x2 + cos x1 x2 2 (i) F: (ii) y1 y2 F: y3 y4 at X0 = (0, 0) = x2 + x3 ex2 − x3 1 2 = x1 − 3x2 + x1 log x3 = x2 + x3 = f x1 x2 x3 at X0 = (2, 0, 1) (b) Find the equation of the tangent plane to the above surfaces at the given point. 130. Consider the following map F from E2 → E2 , the familiar change of variables from polar to rectangular coordinates. F: y1 = x1 cos x2 y2 = x1 sin x2 (a) Find the images of (i) the semi-infinite strip 1 ≤ x1 < ∞, (ii) the semi-infinite strip 0 ≤ x1 < ∞, 0 ≤ x2 ≤ 0 ≤ x2 ≤ (b) Compute F and det F . 131. Given that up (x) = e3x + e−2x − 2ex/2 is a solution of au + bu + cu = e3 x, find the constants a, b , and c . 132. 
Evaluate the determinants of the following 1 1y 11 1 1 2 x z 1 −1 1 −1 (b) w2 x2 0 (a) 1 −1 −1 1 w3 x3 0 1 1 −1 −1 w4 x4 0 matrices. zt t y 0 0 . 0 0 00 π 2. 3π 2. 411 133. For what value(s) of x is the following matrix invertible? 1 1 1 1 111 2 22 23 3 32 33 x x2 x3 (Hint: Observe that the determinant is a cubic polynomial all of whose roots are obvious). n 134. Let f (x) = k=1 sin kx and g (x) = ak √ π n sin r x b√ . π r=1 By direct integration prove that n π f (x)g (x) dx = −π aj bj . j =1 After you are done, compare with Theorem 15, page 206-7 and its proof. 135. Let 1 1 A= 0 0 1 0 0 0 00 0 0 0 −1 10 (a) Find det A . (b) Find A−1 . 2 2 (c) Solve AX = Y , where Y = . 1 3 (d) Let S = { u : u(x) = aex + bex + c sin x + d cos x } , where a, b, c , and d are any real numbers, and define a linear operator L : S → S by the rule Lu ≡ u − u + u. Find the matrix e Le for L with respect to the following basis for S : e1 (x) = xex , e2 (x) = x, e3 (x) = sin x, e4 (x) = cos x. (e) Use the above results to find a solution of Lu = 2xex + 2ex + sin x + 3 cos x. 136. Let u1 and u2 be solutions of the homogeneous equation Lu ≡ a2 (x)u + a1 (x)u + a0 (x)u = 0. 412 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS (a) Show that W (x) ≡ W (u1 , u2 )(x) , the Wronskian of u1 and u2 satisfies the differential equation a1 (x) W. W =− a2 (x) (b) Find the equation of (a) for the particular operator Lu ≡ x2 u − 2xu + 2u and solve it for W under the condition that W (1) = 1 . (c) Given that u1 (x) = x is a solution of Lu = 0 for the operator of part (b), use the result of (b) to show that if u2 is another solution of Lu = 0 , then u2 satisfies the equation 1 u2 − u2 = x, x provided that W (x, u2 )(1) = 1 . (d) Solve the equation of part (c) under the assumption that u2 (1) = 1 , and thus find a second independent solution of the equation Lu = 0 for the operator of part (b). (e) Generalize the idea of parts (c) - (d) by stating and proving some theorem. 137. Here are some linear transformations defined in terms of matrices. In each case, describe geometrically what the transformation does, by computing the images of the three parallelograms Q1 : with vertices at (0, 0), (2, 0), (3, 1), (1, 1). Q2 : with vertices at (1, 2), (3, 2), (4, 3), (2, 3). Q3 : with vertices at (1, 0), (0, 2), (−1, 0), (0, −2). (a) Diagonal Maps (Stretchings) L1 = L4 = 30 , 01 10 , 0 −1 L7 = a0 , 0a L2 = L5 = L8 = a0 , 01 L3 = −1 0 , 0 −1 10 , 0b −4 0 , 06 L6 = L9 = −2 0 , 00 a0 , 0b (Remember to consider negative values of a and b ). (b) Maps with 0 on the diagonal. L1 = L4 = 01 , 00 01 , 10 L2 = L5 = 0a , 00 0a , 10 L3 = L6 = 00 , −1 0 0a . b0 413 (c) Upper Triangular Matrices. L1 = 11 , 01 L2 = 1 −1 , 01 L4 = 1a , 01 L5 = −1 −1 , 0 0 L3 = L6 = 11 , 0 −1 a1 . 0b (d) Orthogonal Matrices (Rotations and Reflections). L1 = 0 −1 10 L3 = 3 5 L2 4 5 = 4 5 L4 = −3 5 1 − √2 1 √ 2 1 √ 2 1 √ 2 . 138. Let a and b be real numbers such that a2 + b2 = 1 . Let S= a2 ab , ab b2 P e1 = e1 (b) Se2 = −e2 , (c) S 2 = I, P= e2 = (−b, a) , so e1 ⊥ e2 . Show that and let e1 = (a, b), (a) Se1 = e2 , a2 − b2 2ab , 2ab b2 − a2 P e2 = 0 P2 = P (d) Show that S can be interpreted as the reflection which leaves the line through e1 fixed, and that P can be interpreted as the projection onto the line through e1 parallel to e2 . 139. (a) Consider the following relation defined on the set of all integers: nRm if n and m are both even integers. Verify that this relation is symmetric and transitive but not reflexive (since, for example, 1R1 ). 
139. (a) Consider the following relation defined on the set of all integers: $nRm$ if $n$ and $m$ are both even integers. Verify that this relation is symmetric and transitive but not reflexive (since, for example, $1 \not\mathrel{R} 1$).

(b) Let $R$ be a symmetric and transitive relation defined on a set $A$. If, given any element $x$ in $A$, there is some element $y$ related to it, $xRy$, prove that the relation $R$ is also reflexive. (The example in part (a) shows that the assertion will be false if some element is related to no others.)

140. Let $a_n$ be a decreasing sequence of positive real numbers which satisfy $a_{n-1}a_{n+1} \le a_n^2$. If $\sum a_n$ converges, prove that $\sum\left(\dfrac{a_n}{a_{n-1}}\right)^{n}$ converges too. [Hint: Show that $\left(\dfrac{a_n}{a_{n-1}}\right)^{n} \le \dfrac{a_n}{a_1}$.]

141. (a) Prove that the series $\sum a_nz^n$ and $\sum a_n^2z^{2n}$ have the same radii of convergence.

(b) Prove that the series $\sum a_nz^n$ and $\sum (a_n)^kz^{kn}$, where $k > 0$, have the same radii of convergence.

142. Let $V$ be a linear space and $L$ an invertible linear map, $L : V \to V$. If $\{e_1, \dots, e_n\}$ is a basis for $V$, prove that its image $\{Le_1, Le_2, \dots, Le_n\}$ is also a basis for $V$.

143. Let $H$ be an inner product space and $R$ an orthogonal transformation, $R : H \to H$. If $\{e_1, \dots, e_n\}$ is a complete orthonormal set for $H$, prove that its image $\{Re_1, \dots, Re_n\}$ is also a complete orthonormal set for $H$.

144. (a) Let $R$ be an orthogonal matrix and let $\rho_1$ and $\rho_2$ be any two of its column vectors. Prove that $\rho_1 \perp \rho_2$. Prove that any two rows of an orthogonal matrix are also orthogonal to each other.

(b) Conversely, let $A$ be a square matrix whose column vectors are orthogonal. Must $A$ be an orthogonal matrix? Proof or counterexample.

145. Let $H$ be an inner product space and $A$ the subspace of $H$ spanned by the vectors $X_1, \dots, X_n$. The Gram determinant of those vectors is defined as
$$G(X_1, \dots, X_n) = \begin{vmatrix} \langle X_1, X_1\rangle & \langle X_1, X_2\rangle & \cdots & \langle X_1, X_n\rangle \\ \vdots & & & \vdots \\ \langle X_n, X_1\rangle & \cdots & \cdots & \langle X_n, X_n\rangle \end{vmatrix}.$$

(a) Prove that $X_1, \dots, X_n$ are linearly dependent if and only if $G(X_1, \dots, X_n) = 0$. [Suggestion: If $Z \in A$, then $Z = a_1X_1 + \cdots + a_nX_n$, where the scalars $a_1, \dots, a_n$ are to be found. This can be done in two ways, by Theorem 31, page 428, or by solving the $n$ equations
$$\langle Z, X_1\rangle = a_1\langle X_1, X_1\rangle + \cdots + a_n\langle X_n, X_1\rangle$$
$$\vdots$$
$$\langle Z, X_n\rangle = a_1\langle X_1, X_n\rangle + \cdots + a_n\langle X_n, X_n\rangle,$$
which are obtained from $\langle Z, X_j\rangle = \langle a_1X_1 + \cdots + a_nX_n,\, X_j\rangle$. Couple both methods to prove the result.]

(b) If $X_1, \dots, X_n$ are an orthogonal set of vectors, compute $G(X_1, \dots, X_n)$.

(c) If $Y \in H$, prove that the distance $\delta = \|Y - P_AY\|$ of $Y$ from the subspace $A$ is given by the formula
$$\delta^2 = \|Y - P_AY\|^2 = \frac{G(Y, X_1, \dots, X_n)}{G(X_1, \dots, X_n)}.$$
[Suggestion: Observe that $\delta^2 = \|Y - P_AY\|^2 = \langle Y - P_AY,\, Y\rangle$ and that $\langle P_AY, Y\rangle = a_1\langle X_1, Y\rangle + \cdots + a_n\langle X_n, Y\rangle$. Now write $P_AY$ as $Z$, use the $n$ equations in (a) and the one equation $\delta^2 = \langle Y, Y\rangle - a_1\langle X_1, Y\rangle - \cdots - a_n\langle X_n, Y\rangle$ to solve for $\delta^2$ by using Cramer's rule.]

(d) Use the fact that $G(X_1) = \langle X_1, X_1\rangle$ to prove that the Gram determinant of linearly independent vectors is always positive. In particular, deduce the Cauchy-Schwarz inequality from $G(X_1, X_2) \ge 0$.

(e) In $L^2[0,1]$, let $X_1 = 1 + x$ and $X_2 = x^3$. Compute $G(X_1, X_2)$. Let $Y = 2 - x^4$ and compute $\|Y - P_AY\|$, where $A$ is the subspace spanned by $X_1$ and $X_2$.

(f) (Müntz) In $L^2[0,1]$, let $A_n = \operatorname{span}\{x^{j_1}, x^{j_2}, \dots, x^{j_n}\}$, where $j_1, \dots, j_n$ are distinct positive integers. Let $Y = x^k$, where $k$ is a positive integer but not one of the $j$'s. Prove that $\lim_{n\to\infty}\|Y - P_{A_n}Y\| = 0$ if and only if $\sum\dfrac{1}{j_n}$ diverges.
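The distance formula of Problem 145(c) is easy to test numerically. The sketch below is an editor's addition (not part of the original notes): it compares the Gram-determinant formula with an ordinary least-squares projection for a few randomly chosen vectors in $E^5$; the dimension and the random data are arbitrary choices.

```python
# Editor's sketch (not from the original notes): numerical check of Problem 145(c).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))   # rows play the role of X1, X2, X3 in E^5
Y = rng.standard_normal(5)

def gram_det(*vectors):
    """Determinant of the matrix of inner products <v_i, v_j>."""
    V = np.array(vectors)
    return np.linalg.det(V @ V.T)

# delta^2 via the Gram-determinant formula of Problem 145(c)
delta_sq_gram = gram_det(Y, *X) / gram_det(*X)

# delta^2 via the orthogonal (least-squares) projection of Y onto span{X1, X2, X3}
coeffs, *_ = np.linalg.lstsq(X.T, Y, rcond=None)
delta_sq_proj = np.sum((Y - X.T @ coeffs) ** 2)

print(delta_sq_gram, delta_sq_proj)      # the two numbers agree
assert np.isclose(delta_sq_gram, delta_sq_proj)
```

The same script, with the data of Problem 147 below substituted for the random vectors, gives a quick check of one's answer there as well.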
146. (a) Use Theorem 17, page 217 to find linear polynomials $P$ and $Q$ such that, respectively,

(i) $\displaystyle\int_{-1}^1 [x^2 - P(x)]^2\,dx$ is minimized,

(ii) $\displaystyle\int_0^1 [x^2 - Q(x)]^2\,dx$ is minimized.

(b) Write $P(x) = a + bx$ and use calculus to again find the values of $a$ and $b$ such that $\displaystyle\int_{-1}^1 [x^2 - P(x)]^2\,dx$ is minimized.

147. Let $Z = (1,1,1,1,1) \in E^5$ and let $A$ be the subspace of $E^5$ spanned by $X_1 = (1,0,1,0,0)$, $X_2 = (1,0,0,-1,0)$, and $X_3 = (0,1,0,0,1)$. Find $\|Z - P_AZ\|$.

148. Let $\Gamma_0$ be a closed planar curve which encloses a convex region, and let $\Gamma_r$ be the "parallel" curve obtained by moving out a distance of $r$ along the outer normal.

(a) Discover a formula relating the arc length of $\Gamma_r$ to that of $\Gamma_0$. [Advice: Examine the special cases of a circle, rectangle, and convex polygon.]

(b) Prove the result you conjectured in part (a).

149. The hypergeometric function $F(a, b; c; x)$ is defined by the power series
$$F(a,b;c;x) = 1 + \frac{a\cdot b}{1\cdot c}\,x + \frac{a(a+1)b(b+1)}{1\cdot 2\cdot c(c+1)}\,x^2 + \frac{a(a+1)(a+2)b(b+1)(b+2)}{1\cdot 2\cdot 3\cdot c(c+1)(c+2)}\,x^3 + \cdots$$

(a) Show that the series converges for all $|x| < 1$.

(b) Show that $\dfrac{d}{dx}F(a,b;c;x) = \dfrac{ab}{c}\,F(a+1, b+1; c+1; x)$.

(c) Show that

(i) $(1-x)^n = F(-n, b; b; x)$

(ii) $(1+x)^n = F(-n, b; b; -x)$

(iii) $\log(1-x) = -xF(1, 1; 2; x)$

(iv) $\log\left(\dfrac{1+x}{1-x}\right) = 2xF\!\left(\tfrac12, 1; \tfrac32; x^2\right)$

(v) $e^x = \lim_{b\to\infty} F(1, b; 1; x/b)$

(vi) $\cos x = F\!\left(\tfrac12, -\tfrac12; \tfrac12; \sin^2 x\right)$

(vii) $\sin^{-1}x = xF\!\left(\tfrac12, \tfrac12; \tfrac32; x^2\right)$

(viii) $\tan^{-1}x = xF\!\left(\tfrac12, 1; \tfrac32; -x^2\right)$

(d) Show that $F$ satisfies the hypergeometric differential equation
$$x(1-x)\frac{d^2F}{dx^2} + [c - (a+b+1)x]\frac{dF}{dx} - ab\,F = 0.$$
[This equation is essentially the most general one with three regular singular points, in this case located at $0$, $1$, and $\infty$.]

150. Let $\{e_1, \dots, e_n\}$ be a complete orthonormal set of $E^n$ and let $\{X_1, \dots, X_n\}$ be a set of vectors which are close to the $e_j$'s in the sense that
$$\sum_{j=1}^n \|X_j - e_j\|^2 < 1.$$
Prove that the $X_j$'s are linearly independent. Give an example in $E^3$ of linearly dependent vectors $\{X_1, X_2, X_3\}$ which satisfy $\sum_{j=1}^{3}\|X_j - e_j\|^2 = 1$. [In fact, one can prove that
$$\dim A^{\perp} \le \sum_{j=1}^n \|X_j - e_j\|^2,$$
where $A = \operatorname{span}\{X_1, \dots, X_n\}$.] A numerical experiment along these lines is sketched after Problem 151.

151. (a) Show that the function $f(z) = e^z$, $z \in \mathbb{C}$, is never zero.

(b) Scrutinize the proof of the Fundamental Theorem of Algebra (pp. 544-548) and find where it breaks down if one attempts to extend it to prove that $e^z$ has at least one zero.
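As noted in Problem 150, here is a small numerical experiment (an editor's addition, not part of the original notes). It perturbs the standard basis of $E^n$ by vectors of total squared length less than 1 and confirms, via the rank, that the perturbed vectors remain linearly independent; the dimension $n$, the random seed, and the target value 0.9 are arbitrary choices.

```python
# Editor's sketch (not from the original notes): numerical experiment for Problem 150.
import numpy as np

rng = np.random.default_rng(1)
n = 6
E = np.eye(n)                            # rows are e_1, ..., e_n

# Random perturbation, rescaled so that sum_j ||X_j - e_j||^2 = 0.9 < 1.
D = rng.standard_normal((n, n))
D *= np.sqrt(0.9 / np.sum(D**2))
X = E + D                                # rows are X_1, ..., X_n

total = np.sum((X - E)**2)
rank = np.linalg.matrix_rank(X)
print("sum of ||X_j - e_j||^2 =", total)
print("rank =", rank, " so dim A-perp =", n - rank)
assert rank == n                         # the X_j are linearly independent
assert n - rank <= total                 # the bracketed inequality of Problem 150
```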