INTERMEDIATE CALCULUS
AND
LINEAR ALGEBRA

Part I

J. KAZDAN
Harvard University

Lecture Notes

Preface
These notes will contain most of the material covered in class, and be distributed before
each lecture (hopefully). Since the course is an experimental one and the notes written
before the lectures are delivered, there will inevitably be some sloppiness, disorganization,
and even egregious blunders—not to mention the issue of clarity in exposition. But we will
try. Part of your task is, in fact, to catch and point out these rough spots. In mathematics,
proofs are not dogma given by authority; rather a proof is a way of convincing one of the
validity of a statement. If, after a reasonable attempt, you are not convinced, complain
loudly.
Our subject matter is intermediate calculus and linear algebra. We shall develop the
material of linear algebra and use it as setting for the relevant material of intermediate
calculus. The ﬁrst portion of our work—Chapter 1 on inﬁnite series—more properly belongs
in the ﬁrst year, but is relegated to the second year by circumstance. Presumably this topic
will eventually take its more proper place in the ﬁrst year.
Our course will have a tendency to swallow whole two other more advanced courses,
and consequently, like the duck in Peter and the Wolf, remain undigested until regurgitated
alive and kicking. To mitigate—if not avoid—this problem, we shall often take pains to
state a theorem clearly and then either prove only some special case, or oﬀer no proof at
all. This will be true especially if the proof involves technical details which do not help
illuminate the landscape. More often than not, when we only prove a special case, the
proof in the general case is essentially identical—the equations only becoming larger.
September 1964

Afterword
I have now taught from these notes for two years. No attempt has been made to revise
them, although a major revision would be needed to bring them even vaguely in line with
what I now believe is the “right” way to do things. And too, the last several chapters
remain unwritten. Because the notes were written as a ﬁrst draft under panic pressure,
they contain many incompletely thought-out ideas and expose the whimsy of my passing
moods.
It is with this—and the novelty of the material at the sophomore level—in mind, that
the following suggestions and students' reactions are listed. There are three categories:
A) material that turned out to be too difficult (they found rigor hard, but not many of
the abstractions); B) changes in the order of covering the stuff; and C) material—mainly
supplementary at this level—which is not too hard, but should be omitted if one ever hopes
to complete the "standard" topics within the confines of a year course.
(A) It was too hard (unless one took vast chunks of time).
(1) Completeness of reals. Only “monotone sequences converge” is needed for inﬁnite
series.
(2) Term-by-term diﬀerentiation and integration of power series. The statement of
the main theorem should be fully intelligible—but the proof is too complicated.
(3) Cosets. This is apparently too abstract. It might be possible to do after ﬁnding
general solutions of linear inhomogeneous O.D.E.’s.
(4) L2 and uniform convergence of Fourier series. Again, all I ended up doing was to
try to state what the issues were, and not to attempt the proof. The ambitious
student should be warned that my proof of the Weierstrass theorem is opaque
(one should explicitly introduce the idea of an approximate identity).
(5) Fundamental Theorem of Algebra. The students simply don’t believe inequalities
in such profusion.
(6) If you want to see rank confusion, try to teach the class how to compute higher
order partial derivatives using the chain rule. That computation should be one
of the headaches of advanced calculus.
(7) Existence of a determinant function. I don’t know a simple proof except for the
one involving permutations—and I hate that one.
(8) Dual spaces. As lovely as the ideas are, this topic is too abstract, and to my
knowledge, unneeded at this level where almost all of the spaces are either ﬁnite
dimensional or Hilbert spaces. One should, however, mention the words “vector”
and “covector” to distinguish column from row vectors. I forgot to do so in these
notes and it did cause some confusion.
(B) Changes in Order and Timing. The structure of the notes is to investigate bare
linear spaces, then linear mappings between them, and ﬁnally non-linear mappings
between them. It is with this in mind that linear O.D.E.’s came before nonlinear maps
from Rn → R . The course ended by treating the simplest problem in the calculus
of variations as an example of a nonlinear map from an infinite dimensional space
to the reals. My current feeling is to consider linear and non-linear maps between
ﬁnite dimensional spaces before doing the inﬁnite dimensional example of diﬀerential
equations.
The ﬁrst semester should get up to the generalities on solving LX = Y , p. 319
[incidentally, the material on inverses (p. 355 ﬀ) belongs around p. 319]. Most
students ﬁnd the material on linear dependence diﬃcult—probably for two reasons:
i) they are not used to formal definitions, and ii) they think they have learned a
technique for doing something, not just a naked deﬁnition, and can’t quite ﬁgure out
just what they can do with it. In other words, they should feel these deﬁnitions about
the anatomy of linear spaces are similar to those describing a football ﬁeld and of
little value until the game begins—i.e., until the operators between spaces make their
grand entrance.
Because of time shortages, the sections on linear maps from R1 → Rn and Rn → R1 ,
pp. 320-41 were regrettably omitted both years I taught the course. The notes were
written so that these sections can be skipped.
(C) Supplementary Material. A remarkable number of fascinating and important topics
could have been included—if there were only enough time. For example:
(1) Change of bases for linear transformations (including the spectral theorem).
(2) Elementary diﬀerential geometry of curves and surfaces.
(3) Inverse and implicit function theorems. These should be stated as natural generalizations of the problems of a) inverting a linear map, b) ﬁnding the null space
of a linear map, and c) generalizing dim D(L) = dim R(L) + dim N (L) all to
local properties of nonlinear maps via the tangent map.
(4) Change of variable in multiple integration. Determinants were deliberately introduced as oriented volume to make the result obvious for linear maps and
plausible for nonlinear maps.
(5) Constrained extrema using Lagrange multipliers.
(6) Line and surface integrals along with the theorems of Gauss, Green, and Stokes.
The formal development of diﬀerential forms takes too much time to do here.
Perhaps a satisfactory solution is to restrict oneself to line integrals and these
theorems in the plane, where the topological diﬃculties are minimal.
(7) Elementary Morse Theory. One can prove the Morse inequalities easily for the
real line, the circle, the plane, and S 2 merely by gradually ﬂooding these sets
and observing the number of lakes and shore line changes only at the critical
points.
(8) Sturm-Liouville theory. An elegant application of the geometry of Hilbert spaces to
differential equations.
(9) Translation-invariant operators with applications to constant coeﬃcient diﬀerence and diﬀerential equations. The Laplace and Fourier transforms enter naturally here.
(10) The Calculus of Variations. The formalism of nonlinear functionals on R , i.e.,
maps f : R → R , generalizes immediately to nonlinear functionals deﬁned on
infinite dimensional spaces.
(11) The deleted rigor.
(12) Linear operators with ﬁnite dimensional (perhaps even compact) range.
One parting warning. When covering intermediate calculus from this viewpoint, it is
all too natural to forget the innocence of the class, to enchant with glitter, and to numb
with purity and formalism. Emphasis should be placed on developing insight and intuition
along with routine computational facility.
My classes found frequent reviews of the mathematical ediﬁce, backward glances at the
previous months’ work, not only helpful but mandatory if they were to have any conception
of the vast canvas which was being etched in their minds over the course of the year. The
question, “What are we doing now and how does it ﬁt into the larger plan?” must constantly
be raised and at least partially resolved.
May, 1966

Contents

0 Remembrance of Things Past.  1
  0.1 Sets and Functions  1
  0.2 Relations  5
  0.3 Mathematical Induction  6
  0.4 Reals: Algebraic and Order Properties  7
  0.5 Reals: Completeness  9
  0.6 Appendix: Continuous Functions and the Mean Value Theorem  15
  0.7 Complex Numbers: Algebraic Properties  22
  0.8 Complex Numbers: Completeness and Functions  28

1 Infinite Series  33
  1.1 Introduction  33
  1.2 Tests for Convergence of Positive Series  36
  1.3 Absolute and Conditional Convergence  41
  1.4 Power Series, Infinite Series of Functions  43
  1.5 Properties of Functions Represented by Power Series  48
  1.6 Complex-Valued Functions, e^z, cos z, sin z  65
  1.7 Appendix to Chapter 1, Section 7  70

2 Linear Vector Spaces: Algebraic Structure  75
  2.1 Examples and Definition  75
    a) The Space R^2  75
    b) The Space R^n  76
    c) The Space C[a, b]  77
    d) The Space C^k[a, b]  77
    e) The Space l^1  78
    f) The Space L^1[a, b]  78
    g) The Space Fn  79
    h) Appendix. Free Vectors  80
  2.2 Subspaces. Cosets.  84
  2.3 Linear Dependence and Independence. Span.  88
  2.4 Bases and Dimension  93

3 Linear Spaces: Norms and Inner Products  101
  3.1 Metric and Normed Spaces  101
  3.2 The Scalar Product in E^2  107
  3.3 Abstract Scalar Product Spaces  113
  3.4 Fourier Series.  132
  3.5 Appendix. The Weierstrass Approximation Theorem  140
  3.6 The Vector Product in R^3  146

4 Linear Operators: Generalities. V^1 → V^n, V^n → V^1  147
  4.1 Introduction. Algebra of Operators  147
  4.2 A Digression to Consider au'' + bu' + cu = f  161
  4.3 Generalities on LX = Y  170
  4.4 L : R^1 → R^n. Parametrized Straight Lines.  177
  4.5 L : R^n → R^1. Hyperplanes.  182

5 Matrix Representation  187
  5.1 L : R^m → R^n  187
  5.2 Supplement on Quadratic Forms  210
  5.3 Volume, Determinants, and Linear Algebraic Equations.  217
    a) Application to Linear Equations  234
  5.4 An Application to Genetics  243
  5.5 A pause to find out where we are  246

6 Linear Ordinary Differential Equations  249
  6.1 Introduction  249
  6.2 First Order Linear  252
  6.3 Linear Equations of Second Order  258
    a) A Review of the Constant Coefficient Case.  258
    b) Power Series Solutions  259
    c) General Theory  266
  6.4 First Order Linear Systems  278
  6.5 Translation Invariant Linear Operators  283
  6.6 A Linear Triatomic Molecule  286

7 Nonlinear Operators: Introduction  293
  7.1 Mappings from R^1 to R^1, a Review  293
  7.2 Generalities on Mappings from R^n to R^m  295
  7.3 Mappings from E^1 to E^n  300

8 Mappings from E^n to E^1: The Differential Calculus  309
  8.1 The Directional and Total Derivatives  309
  8.2 The Mean Value Theorem. Local Extrema.  321
  8.3 The Vibrating String.  332
    a) The Mathematical Model  333
    b) Uniqueness  334
    c) Existence  336
  8.4 Multiple Integrals  347

9 Differential Calculus of Maps from E^n to E^m  361
  9.1 The Derivative  361
  9.2 The Derivative of Composite Maps ("The Chain Rule").  373

10 Miscellaneous Supplementary Problems  383

Chapter 0

Remembrance of Things Past.
We shall treat a hodge-podge of topics in a hasty and incomplete fashion. While most
of these topics should have been learned earlier, section 5 on the completeness of the real
numbers has its more rightful place in advanced calculus. Do not take time to read this
chapter unless the particular topic is needed; then read only the relevant portions. The
chapter is included for reference.

0.1 Sets and Functions

A set is any collection of objects, called the elements of the set, together with a criterion
for deciding if an object is in the set. For example, i) the set of all girls with blue eyes and
blond hair, and ii) the less picturesque set of all positive even integers. We can also deﬁne a
set by bluntly listing all of its elements. Thus, the set of all students in this class is deﬁned
by the list in the roll book.
Sets are often speciﬁed by a notation which is best described by examples.
i) S = { x : x is an integer } is the set of all integers.
ii) T = { (x, y) : x^2 + y^2 = 1 } is the set of all points (x, y) on the unit circle x^2 + y^2 = 1 .
iii) A = { 1, 2, 7, −3 } is the set of integers 1, 2, 7 and −3 .
Our attitude toward set theory will be extremely casual; we shall mainly use it as a
language and notation. Without further ado, let us introduce some notation.
x ∈ S ,   x is an element of the set S , or just x is in S .
x ∉ S ,   x is not an element of the set S .
Z ,   the set of all integers, positive, zero, and negative.
Z+ ,   the set of all positive integers, excluding 0.
R ,   the set of all real numbers (to be defined more precisely later).
C ,   the set of all complex numbers (also to be defined more precisely later).
∅ ,   the set with no elements, the empty or null set. It is extremely uninteresting.
Definition: Given the two sets S and T , i) the set S ∪ T , “ S union T ”, is the set of
elements which are in either S or T , or both.
ii) The set S ∩ T , “ S intersection T ”, is the set of elements in both S and T .
If we represent S by one blob and T by another, S ∪ T is the shaded region while
S ∩ T is the cross-hatched region. Note that all elements in S ∩ T are also in S ∪ T . Two
sets are disjoint if S ∩ T = ∅ , that is, if their intersection is empty.
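These operations exist verbatim in modern languages; a minimal Python sketch of union, intersection, and disjointness (the particular sets are arbitrary choices, not from the text):

```python
# Union, intersection, and disjointness with Python's built-in sets.
# S and T are arbitrary finite examples.
S = {1, 2, 3, 4}
T = {3, 4, 5}

print(sorted(S | T))   # [1, 2, 3, 4, 5]  -- S union T
print(sorted(S & T))   # [3, 4]           -- S intersection T

# Every element of S ∩ T is also in S ∪ T.
assert (S & T) <= (S | T)

# Disjoint sets have empty intersection.
assert {1, 2}.isdisjoint({3, 4})
assert ({1, 2} & {3, 4}) == set()
```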
A subset of a set is another way of referring to a portion of a given set. Formally, A is
a subset of S , written A ⊂ S , if every element in A is also an element of S . The set
A is a subset of the set S if and only if either
A ∪ S = S, or, equivalently, A ∩ S = A.
It is possible that A = S , or that A = ∅ . If these degenerate cases are excluded, we say
that A is a proper subset of S .
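The two subset criteria A ∪ S = S and A ∩ S = A can be checked mechanically; a small sketch with arbitrary finite sets:

```python
# A ⊂ S if and only if A ∪ S = S, or equivalently A ∩ S = A.
S = {1, 2, 3, 4}
A = {2, 4}

assert A <= S                  # A is a subset of S
assert (A | S) == S            # the union test
assert (A & S) == A            # the intersection test

# Proper subset: exclude the degenerate cases A = S and A = empty set.
assert A < S and A != set()
```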
Given the two sets S and T , it is natural to form a new set S × T , “ S cross T ”,
which consists of all pairs of elements, one from S and the other from T . For example, if
S is the set of all men in this class, and T the set of all women in this class, then S × T
is the set of all couples, a natural set to contemplate.
If x ∈ S and y ∈ T , the standard notation for the induced element in S × T is (x, y ) .
Note that the order in (x, y ) is important. The element on the left is from S , while that on
the right is from T . For this reason the pair of elements (x, y ) is usually called an ordered
pair. The whole set S × T is called the product, direct product, or Cartesian product of S
and T , all three names being used interchangeably.
You have met this idea in graphing points in the plane. Since these points, (x, y ) , are
determined by an ordered pair of real numbers, they are just the elements of R × R . From
this example it is clear that even though this set R × R is the product of a set with itself,
the order of the pair (x, y ) is still important. For example the point (1, 2) ∈ R × R is
certainly not the same as (2, 1) ∈ R × R .
Having deﬁned the direct product of two sets S and T as ordered pairs, it is reasonable
to deﬁne the direct product of three sets S, T, and U as the set of ordered triplets (x, y, z ) ,
where x ∈ S, y ∈ T , and z ∈ U . The extension to n sets, S1 × S2 × · · · × Sn , is done in
the same way.
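Python's standard library builds exactly these ordered pairs and n-tuples; a brief sketch (the sets are chosen arbitrarily):

```python
from itertools import product

# S × T is the set of ordered pairs (x, y), x ∈ S, y ∈ T; order matters.
S = {1, 2}
T = {'a', 'b'}

S_cross_T = set(product(S, T))
print(sorted(S_cross_T))   # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# Ordered pairs: (1, 2) and (2, 1) are distinct elements of R × R.
assert (1, 2) != (2, 1)

# Triple (and n-fold) products extend the same construction.
U = {True, False}
assert len(set(product(S, T, U))) == len(S) * len(T) * len(U)
```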
Let us now recall the ideas behind the notion of a function.
A function f from the set X into the set B is a rule which assigns to every x ∈ X
one and only one element y = f (x) ∈ B . We shall also say that f maps X into B , and
write either
f
f : X → B, or X → B.
This alternative notation is useful when X and B are more important than the speciﬁc
nature of f . The set X is the domain of f , while the range of f is the subset Y ⊂ B of
all elements y ∈ B which are the image of (at least) one point x ∈ X , so y = f (x) , or in
suggestive notation, Y = f (X ).
Automobile license plates supply a nice example, for they assign to every license plate
sold a unique car. The domain is the set of all license plates sold, while the range is not all
cars, but rather the subset of all cars which are driven. Wrecks and museum pieces neither
need nor have license plates since they are not on the roads. Some other examples are i)
the function f(n) = 1/n , n = 1, 2, 3, . . . , which assigns to every n ∈ Z+ the rational number
1/n , and ii) the function f(n, m) = m/n , n, m = 1, 2, 3, . . . , which assigns to every element of
Z+ × Z+ the rational number m/n .
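Both maps can be written down directly with Python's exact-rational Fraction type; a sketch:

```python
from fractions import Fraction

# i) f(n) = 1/n on Z+;  ii) g(n, m) = m/n on Z+ × Z+.
def f(n):
    return Fraction(1, n)

def g(n, m):
    return Fraction(m, n)

assert f(4) == Fraction(1, 4)
assert g(3, 2) == Fraction(2, 3)
assert g(2, 4) == g(1, 2)     # g is not injective: (2, 4) and (1, 2) agree
```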
Quite often we shall use functions which map part of some set into part of some other
set. In other words the function may be defined on only a subset of a given set and take on
values in a subset of some other set. The function f(n, m) = m/n of the previous paragraph
is of this nature, for we defined it on a subset of Z × Z and it takes its values in the positive
subset of the set of all rational numbers.

There is some standard nomenclature (or $10 words if you like) associated with mappings. Say X ⊂ A and the function f : X → B . Note that we know the definition of f
only on X . It may not be defined for the remainder of A .
Definition: i) if every element of B is the image of (at least) one point in X , the map
f is called surjective or onto. In other words f : X → B is a surjection if the range of f
is all of B . Thus f is always surjective onto its range.
ii) If the map f has the property that for every x1 , x2 ∈ X , we have f (x1 ) = f (x2 )
when and only when x1 = x2 , the map is called injective or one to one (1-1). This is the
case if no two diﬀerent elements in X are mapped into the same element in B .
iii) If the map f is both surjective and injective, that is, if it is both onto and 1-1, then
f is called bijective.
Examples: For these, we have f : X → B where X = B = Z .
(1) The map f (n) = 2n is injective but not surjective since the range does not contain
the odd integers in B .
(2) The map

        f(n) = n/2 if n is even,   f(n) = (n + 1)/2 if n is odd,

is surjective but not injective, since every element in B is the image of two distinct
elements of X .

(3) The map f(n) = n + 7 is bijective.
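Since Z is infinite, a program can only test such properties on a finite window of integers; with that caveat, a Python sketch of examples (1)-(3):

```python
# Finite-window check of examples (1)-(3); X = B = Z in the text, so a
# program can only sample, never prove, these properties.
domain = range(-50, 51)

# (1) f(n) = 2n: injective (no two inputs collide) but not surjective.
values = [2 * n for n in domain]
assert len(set(values)) == len(values)   # no collisions on the sample
assert 3 not in values                   # an odd integer that is never hit

# (2) f(n) = n/2 (n even) or (n+1)/2 (n odd): surjective, not injective.
def f2(n):
    return n // 2 if n % 2 == 0 else (n + 1) // 2

assert f2(1) == f2(2) == 1                                # two inputs, one output
assert set(f2(n) for n in domain) >= set(range(-20, 21))  # hits every sampled integer

# (3) f(n) = n + 7 is bijective; n - 7 undoes it.
assert all((n + 7) - 7 == n for n in domain)
```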
Notational Remark: For functions whose domain is Z or Z+ it is customary to indicate
the element of the range by a notation like a_n instead of f(n) . Thus f(n) = 1/n , where
n ∈ Z+ , is written as a_n = 1/n . Such a function is usually called a sequence.
The concepts we have just deﬁned are useful if we try to deﬁne what we mean by the
inverse of a function.
Definition: A function f : X → B is invertible if to every b ∈ B there is one and only
one x ∈ X such that b = f (x) . Thus f is invertible if and only if it is bijective. If f is
invertible, we denote the inverse function by f −1 , so x = f −1 (b) .
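For a map given by a finite table, inversion is just reversing the pairs of its graph; a sketch (the finite domain is an arbitrary stand-in for Z):

```python
# Reversing the graph of a bijection yields its inverse; a non-injective
# map loses information under the same reversal.
domain = range(0, 10)
f = {n: n + 7 for n in domain}            # graph of the bijection n -> n + 7
f_inv = {y: x for x, y in f.items()}      # swap every pair (x, y) to (y, x)

assert f_inv[f[3]] == 3                   # f^(-1)(f(x)) = x
assert f[f_inv[10]] == 10                 # f(f^(-1)(b)) = b

g = {n: n * n for n in (-2, -1, 1, 2)}    # n -> n^2 is not injective
g_inv = {y: x for x, y in g.items()}
assert len(g_inv) < len(g)                # collisions were overwritten
```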
If f : A → B , and g : B → C , then when composed (put together) these two functions
induce a mapping, g ◦ f , of A into C . Slightly more generally, if B ⊂ R , and f : A → B
while g : R → C , then g ◦ f : A → C .
You should be able to see why the composed map g ◦ f is only deﬁned on A , and then
understand that our stipulation that B ⊂ R is a convenient requirement.
If x ∈ A and z ∈ C , then g ◦ f maps x onto z = (g ◦ f )(x) , or in more familiar
notation, z = g (f (x)) . Now an example. Say the distance s you have walked at time t is
speciﬁed by the function s = f (t) , and the amount z of shoe leather worn out by walking
the distance s is given by the function z = g (s) . Then the amount of shoe leather you have
worn out at time t is given by the composed function z = g (f (t)) . Here t ∈ A, s ∈ B ,
and z ∈ C . Hopefully you have by now recognized that the “chain rule” for derivatives is
just the procedure for ﬁnding the derivative of composed functions from their constituent
parts. In our example the chain rule would be used to find dz/dt from dg/ds and df/dt, if these
functions were differentiable.
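The shoe-leather composition can be sketched in Python; the particular functions f and g below are invented for illustration (the text leaves them abstract):

```python
# z = g(f(t)): distance walked by time t, then leather worn per distance.
# Both rates are hypothetical choices, not taken from the text.
def f(t):
    return 3 * t          # distance after t hours at a steady 3 mph

def g(s):
    return s / 10         # leather worn after walking distance s

def g_of_f(t):            # the composition (g ∘ f)(t) = g(f(t))
    return g(f(t))

assert g_of_f(10) == 3.0

# Chain rule: dz/dt = (dg/ds)(df/dt) = (1/10) * 3 = 0.3, for every t.
h = 1e-6
numeric = (g_of_f(2 + h) - g_of_f(2)) / h
assert abs(numeric - 0.3) < 1e-6
```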
We conclude this section with more symbols—if you have not yet had enough. These
are borrowed from logic. Although we shall use them only infrequently as a shorthand, they
might have greater use to you in class notes.
∀   "for every"
∃   "there is", or "there exists"
∋   "such that"
A ⇒ B   "the truth of statement A implies that of statement B ".
A ⇔ B   "statement A is equivalent to statement B ", that is, both A ⇒ B and B ⇒ A.

Exercises
(1) If R = { 1, 4 }, S = { 1, 2, 3, 4 } , and T = { 2, 3, 7 } , find the six other sets R ∪ S, R ∩
S, R ∪ T, R ∩ T, S ∪ T, and S ∩ T . Which of these nine sets are proper subsets of
which other sets?
(2) If S = { x : |x − 1| ≤ 2 } and T = { x : |x| ≤ 2 } , ﬁnd S ∪ T and S ∩ T . A sketch
is adequate.
(3) If A, B , and C are any subsets of a set S , prove
(a) (A ∪ B) ∪ C = A ∪ (B ∪ C) —so that the parentheses can be omitted without
creating ambiguity.
(b) (A ∩ B ) ∩ C = A ∩ (B ∩ C ) —so that again the parentheses are superﬂuous.
(c) (A ∪ B ) ∩ C = (A ∩ C ) ∪ (B ∩ C ).
(d) (A ∩ B ) ∪ C = (A ∪ C ) ∩ (B ∪ C ).
Remark: two sets X and Y are proved equal by showing that both X ⊂ Y and
Y ⊂X.
(4) If the function f has domain S , and both A ⊂ S and B ⊂ S , prove that
(i) A ⊂ B ⇒ f (A) ⊂ f (B ) .
(ii) f(A ∩ B) ⊂ f(A) ∩ f(B) . [We cannot hope to prove equality because of counterexamples like: let A = { −2, −1, 0, 1, 2, 3 } and B = { −4, −3, −2, −1 } .
Then with f(n) = n^2 , we have f(A) = { 0, 1, 4, 9 }, f(B) = { 1, 4, 9, 16 } , and
f(A ∩ B) = { 1, 4 } ≠ { 1, 4, 9 } = f(A) ∩ f(B) .]
(iii) f (A ∪ B ) = f (A) ∪ f (B ) .
(5) For the following functions f : X → B , classify as to injection, surjection, or bijection,
or none of these.
(i) f (n) = n2 with X = Z+ and B = Z .
(ii) Let X = { all rational numbers } , B = { all rational numbers } , and f(x) = 1/m ,
where x = n/m ∈ X . [Here n/m is assumed to be reduced to lowest terms.]

(iii) f(x) = 1/x , where x ∈ X and X = B = { all positive rational numbers } .

(iv) X = { all women born in June }, B = { the thirty days in the month of June } ,
and let f be the function assigning “her birthday” to each woman born in June.
(v) f(n) = |n| , with X = B = Z .

0.2 Relations

A relationship often exists between elements of sets. Some common examples are i) a ≥ b ,
ii) a ⊥ b (perpendicular to), iii) a loves b , and iv) a = b . Let S be a given set, a ,
b ∈ S , and let R be a relation defined on S (that is, ∀ a, b ∈ S , either aRb holds or it fails, with
no third alternative possible). Most relations have at least one of the following properties.
(i) reflexive: aRa ∀ a ∈ S
(ii) symmetric: aRb ⇒ bRa
(iii) transitive: (aRb and bRc) ⇒ aRc .
Examples:
(1) perpendicular ( ⊥ ) is only symmetric.
(2) “loves” enjoys none of these (well, maybe it is reﬂexive).
(3) equality ( = ) has all three properties.
(4) geometric congruence ( ≅ ) and geometric similarity ( ∼ ) both have all three.

(5) parallel ( ∥ ) has all three—if we are willing to agree that a line is parallel to itself.
(6) “is less than ﬁve miles from” is only reﬂexive and symmetric.
(7) for a, b ∈ Z+ , the relation “ a is divisible by b ” is only reﬂexive and transitive but
not symmetric.
(8) “less than” ( < ) is only transitive.
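Such properties can be verified exhaustively on a finite set; a sketch checking example (7), divisibility on a small sample of Z+:

```python
# Example (7): "a is divisible by b" on a finite sample of Z+.
sample = range(1, 11)

def rel(a, b):
    return a % b == 0                     # a is divisible by b

reflexive = all(rel(a, a) for a in sample)
symmetric = all(rel(b, a) for a in sample for b in sample if rel(a, b))
transitive = all(rel(a, c) for a in sample for b in sample for c in sample
                 if rel(a, b) and rel(b, c))

print(reflexive, symmetric, transitive)   # True False True
```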
A relation which is reﬂexive, symmetric and transitive is called an equivalence relation.
The standard examples are those of algebraic equality and of geometric congruence. An
equivalence relation on a set S partitions the set into subsets of equivalent elements. Those
terms are illustrated in the following.
Examples:
(1) In the set S of all triangles, the equivalence relation of geometric congruence partitions S into subsets of congruent triangles, any two triangles of S being in the same
subset (or equivalence class as it is called) if and only if they are congruent.
(2) In the set P of all people, consider the equivalence relation "has the same birthday,"
disregarding the year. This relation partitions P into 366 equivalence classes. Two
people are in the same equivalence class if their birthdays fall on the same day of the
year.
Notice that any two equivalence classes are either identical or disjoint, that is, they
have either no elements in common or they coincide. This is particularly clear from the
examples with birthdays.
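Grouping by a key produces exactly this partition; a Python sketch (the people and birthdays are invented for illustration):

```python
from collections import defaultdict

# "Has the same birthday" partitions a set of people into equivalence
# classes; everyone who shares a key lands in the same class.
birthday = {"Ann": "03-14", "Bob": "07-04", "Carol": "03-14", "Dan": "12-25"}

classes = defaultdict(set)
for person, day in birthday.items():
    classes[day].add(person)

print({day: sorted(people) for day, people in sorted(classes.items())})
# {'03-14': ['Ann', 'Carol'], '07-04': ['Bob'], '12-25': ['Dan']}

# Any two classes coincide or are disjoint, and together they cover P.
blocks = list(classes.values())
assert all(p == q or not (p & q) for p in blocks for q in blocks)
assert set().union(*blocks) == set(birthday)
```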
By the fundamental theorem of calculus, we know that the indeﬁnite integral of an
integrable function f can be represented by any function F whose derivative is f . The
mean value theorem told us that every other indefinite integral of f differs from F by
only a constant. Thus, the indeﬁnite integrals of a given function are an equivalence class
of functions, diﬀering from each other by constants. The equivalence relation is “equal up
to an additive constant".

Exercises
(1) If a, b, c, d ∈ Z+ , let us deﬁne the following equivalence relation between the elements
of Z+ × Z+ :
(a, b)R(c, d) if and only if ad = bc.
Verify that R is an equivalence relation. [In real life, the pair (a, b) of this example
is written as a/b , so all we have said is a/b = c/d if and only if ad = bc . This equivalence
relation partitions the set of rational numbers into very familiar equivalence classes.
For example the equivalent rational numbers 1/2 , 2/4 , 3/6 , . . . are in the same equivalence
class, to no one's surprise].
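The relation of exercise (1) can be tested directly, and compared against Python's Fraction type, which stores each rational in lowest terms; a sketch:

```python
from fractions import Fraction

# (a, b) R (c, d) iff ad = bc; the class of (a, b) is the rational a/b.
def related(p, q):
    (a, b), (c, d) = p, q
    return a * d == b * c

assert related((1, 2), (2, 4)) and related((2, 4), (3, 6))
assert not related((1, 2), (2, 3))

# Two pairs are related exactly when they reduce to the same fraction.
pairs = [(1, 2), (2, 4), (3, 6), (2, 3), (4, 6)]
for p in pairs:
    for q in pairs:
        assert related(p, q) == (Fraction(*p) == Fraction(*q))
```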
(2) Explain the fallacy in the following argument by observing that equality “ = ” here is
not the usual algebraic equality , but rather some other equivalence relation.
"Let A = ∫ dx/x . Integration by parts ( p = 1/x , dq = dx ) gives

A = x · (1/x) − ∫ x · (−1/x^2) dx = 1 + A.

Hence 0 = 1 ."

0.3 Mathematical Induction

You are familiar with a variety of proofs, viz. direct proofs and proofs by contradiction.
There is, however, another type of proof which is not encountered very often in elementary
mathematics: proof by induction.
Abstractly, you have a sequence of statements P1 , P2 , P3 , . . . , and a guess for the nature
of the general statement Pn . A proof by mathematical induction provides a method for
showing the general statement Pn is correct. Here is how it is carried out. First verify
that the statement is true in some special case, say for n = 1 , so you check the validity of
P1 . Second you show that if it is true in some particular case n = k , then it is true for
the next case n = k + 1 , that is, Pk ⇒ Pk+1 . Now since P1 is true, so is P1+1 = P2 , and
consequently so is P2+1 = P3 , and so on up. Observe that the procedure does not tell you
how in the world to guess the general statement Pn , but only shows how to verify it.
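A machine cannot carry out the inductive step for all k at once, but it can spot-check a formula and its inductive step for many values; a Python sketch using the sum formula worked out in the example that follows:

```python
# Spot-check of the formula 1 + 2 + ... + n = n(n + 1)/2 and of its
# inductive step. A loop confirms finitely many cases; induction proves all.
for n in range(1, 500):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2      # the statement P_n

for k in range(1, 500):
    # the step: adding (k + 1) to k(k + 1)/2 must give (k + 1)(k + 2)/2
    assert k * (k + 1) // 2 + (k + 1) == (k + 1) * (k + 2) // 2

print("P_n checked for n = 1, ..., 499")
```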
Let us carry out the procedure for an example. We guess the formula

1 + 2 + · · · + n = n(n + 1)/2.    (0-1)

Step 1. Is the formula true for n = 1 ? Yes, since both sides then equal 1.

Step 2. Assuming the formula is true for n = k , we must show this implies the formula is
true for n = k + 1 :

1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2.

The formula, assumed to be true for n = k , is

1 + 2 + · · · + k = k(k + 1)/2.

Adding (k + 1) to both sides we find that

1 + 2 + · · · + k + (k + 1) = k(k + 1)/2 + (k + 1) = (k + 1)(k + 2)/2,

which is exactly the statement we wanted. This proves that formula (0-1) is true for all
n ≥ 1.

Exercises
Use mathematical induction to prove the given statements.
(1) 1^2 + 2^2 + · · · + n^2 = n(n + 1)(2n + 1)/6 .

(2) d/dx (x^n) = nx^(n−1) (use the formula for the derivative of a product).

(3) Let I(n) = ∫_0^(π/2) sin^n x dx .

    (a) Prove the following formula is correct when n is an odd integer ≥ 3 ,

        I(n) = (2 · 4 · 6 · · · (n − 1)) / (1 · 3 · 5 · · · n).

    (b) Guess and prove the formula when n is an even integer ≥ 2 .

(4) Let Γ(s) = ∫_0^∞ e^(−t) t^(s−1) dt , where s > 0 (this is the famous gamma function ).

    (a) Show Γ(s + 1) = sΓ(s) (Hint: integrate by parts).

    (b) If n ∈ Z+ , guess and prove the formula for Γ(n + 1) .

0.4 The Real Numbers: Algebraic and Order Properties

The set of all real numbers can be characterized by a set of axioms. These properties are of
three diﬀerent types, i) algebraic properties, ii) order properties, and iii) the completeness
property. Of these, the last is by far the most diﬃcult to grasp. But that is getting ahead
of our story. Let S be a set with the following properties.
I. Algebraic Properties
A. Addition. To every pair of elements a, b ∈ S , is associated another element, denoted
by a + b , with the properties
A - 0. (a + b) ∈ S
A - 1. Associative: for every a, b, c ∈ S, a + (b + c) = (a + b) + c .
A - 2. Commutative: a + b = b + a
A - 3. There is an additive identity, that is, an element ”0” ∈ S such that 0 + a = a
for all a ∈ S .
A - 4. For every a ∈ S , there is also a b ∈ S such that a + b = 0 . b is the additive
inverse of a , usually written −a . 8 CHAPTER 0. REMEMBRANCE OF THINGS PAST. M. Multiplication. To every pair a, b ∈ S , there is associated another element, denoted
by ab , with the properties
M - 0. ab ∈ S
M - 1. Associative. For every a, b, c ∈ S, a(bc) = (ab)c .
M - 2. Commutative. ab = ba .
M - 3. There is a multiplicative identity, that is, an element “1” ∈ S such that 1a = a
for all a ∈ S . Moreover 1 ≠ 0 .
M - 4. For every a ∈ S, a ≠ 0 , there is also a b ∈ S such that ab = 1 . b is the
multiplicative inverse of a , usually written 1/a or a^(−1) .
D. Connection between Addition and Multiplication.
D - 1. Distributive. For every a, b, c ∈ S, a(b + c) = ab + ac .
Some sample, and simple, consequences of these nine axioms are i) a + 0 = a , ii)
a · 1 = a , and iii) a + b = a + c ⇒ b = c .
Any set whose elements satisfy the axioms A-0 to A-4 is called a commutative (or
abelian ) group. The group operation here is addition. In this language, we see that the
multiplication axioms just state that the elements of S —with the additive identity 0
excluded—also form a commutative group, with the group operation being multiplication.
These additive and multiplicative structures are connected by the distributive axiom. Most
of high school algebra takes place in this setting; however, the possibility of non-integer
exponents is not yet speciﬁcally included; in particular the square root of an element of S
is not necessarily also in S .
Our axioms, or some part of them, are satisﬁed by sets other than the real numbers. The
set of even integers forms a commutative group with the group operation being addition,
while numbers of the form 2^n , n ∈ Z , form a commutative group under multiplication.
The set of rational numbers satisﬁes all nine axioms. Any such set which satisﬁes all nine
axioms is called a ﬁeld. Both the real numbers and the rational numbers (a subset of the
real numbers) are ﬁelds. A more thorough investigation of groups and ﬁelds is carried out
in courses in modern algebra.
II. Order Axioms
Besides the above algebraic rules, we shall introduce an order relation, intuitively, the
notion of “greater than”. To do this we take an undefined concept of positivity for
elements of S and use it to state our axioms.
O -1. If a ∈ S and b ∈ S are positive, so are a + b and ab .
O -2. The additive identity 0 is not positive.
O - 3. For every a ∈ S, a ≠ 0 , either a or −a is positive, but not both. If −a is
positive, we shall say that a is negative.
Trichotomy Theorem. For any two numbers a, b ∈ S , exactly one of the following three
statements is true, i) a − b is positive, ii) b − a is positive, or iii) b − a is zero. If the
notation a < b is used to mean “ b − a is positive,” and a > b means b < a , then this
theorem reads, either a > b, a < b, or a = b . The proof—which you should do—is a simple
consequence of our axioms.
Some other consequences are
a < b and b < c ⇒ a < c (transitivity of “ < ”)
a < b and c > 0 ⇒ ac < bc
a ≠ 0 ⇒ a^2 > 0 . (Since 1 = 1^2 , this implies 1 > 0 .)
The set of rational numbers as well as the set R of real numbers satisfy all twelve
axioms. Any set which satisfies these twelve axioms is called an ordered field.

Exercises

(1) Let T be a set whose elements are of the form a + b√2 , where a and b are rational
numbers (and so are elements of a field). Show that T is also a field.
(2) Consider the set of all integers Z with the following equivalence relation: m ∈ Z
and n ∈ Z are equivalent if they have the same remainder when divided by 2. The
notation for this equivalence is
m ≡ n (mod 2) .

This equivalence relation partitions Z into two equivalence classes which we may
denote respectively by 0 if the number is even, and 1 if the number is odd. Thus
8 ≡ −22 (mod 2) and 7 ≡ 13 (mod 2). Prove that the set Z with ordinary addition
and multiplication but with this equivalence relation forms a field.
(3) Prove the trichotomy theorem.
(4) Prove that if a ≠ 0 , then a^2 > 0 . Use it to prove that 1 > 0 and then to conclude
that all of the “positive integers” are, in fact, positive.

0.5 The Real Numbers: Completeness Property.

III. Completeness Axiom.
So far our axioms do not insure that we can take fractional powers, like the square root,
of an element of an ordered field S and still obtain an element of the same field. The issue
here is not merely that of fractional powers or other algebraic operations, but a more serious
one. Imagine the (as yet undefined) real number line. Although the rational numbers are
an infinite number of points on the line, there are many “holes” between the rationals. We
already know of one “hole” at √2 ; there is another at √3 , at π , and at e . In fact, in a
sense which can be made precise, almost all of the points on the real number line represent
irrational numbers.
The completeness axiom is designed to eliminate the possibility of “holes” in the real
number line. It does so by more or less bluntly stating that there are no holes. This is the
“Dedekind cut” form of the completeness axiom. We have chosen it over other equivalent
axioms because it is easy to visualize—even though the “Cauchy sequence” form is perhaps
preferable for more advanced analysis courses. A deﬁnition is needed before the axiom can
be stated.
Definition: Let S1 and S2 be subsets of an ordered ﬁeld S . Then the set S1 precedes
S2 if for every a ∈ S1 and b ∈ S2 , we have a ≤ b .
If you imagine the real number line, “ S1 precedes S2 ” should be thought of as meaning
that all of S1 is to the left of all of S2 . S1 and S2 of course might touch, or might just
miss touching.
Completeness Axiom. Let S1 and S2 be nonempty subsets of an ordered ﬁeld S . If
S1 precedes S2 , then there is at least one number c ∈ S such that c precedes S2 and is
preceded by S1 . In other words, there is (at least) one element of S between S1 and S2 .
Definition: The set of real numbers, R , is a set which satisﬁes the above axioms of algebra,
order, and completeness. Thus, the set of real numbers is a complete ordered field.
This type of deﬁnition of R amounts to saying “we don’t know or care what the real
numbers are, but in any event they have the required properties.” If we had used the
Cauchy sequence version of the completeness axiom, we would have begun with the rational
numbers—which we do know—and then deﬁned the real numbers as the set of limits of
rational numbers. This would have been somewhat more concrete, but would have involved
the diﬃcult concept of limit before we even get oﬀ the ground.
From the picture associated with the completeness axiom, we see that it exactly states
that the real number line has no holes, for, emotionally speaking, if there were a hole, let
S1 be the set of real numbers to the left of the hole, and S2 the set to the right of the
hole. Then there would be no real number between S1 and S2 , since the hole is there,
contradicting the completeness axiom.
Let us use the idea of the last paragraph to show that the rational numbers, an ordered
ﬁeld, are not complete by exhibiting two sets, one preceding the other, which have no
rational number between them. Just let
S1 = { x : x > 0, x^2 < 2 } and S2 = { x : x > 0, x^2 > 2 }.

The only possible number between S1 and S2 is √2 , which is irrational. This construction
is just what we need to prove the following sample.
Theorem 0.1 Every non-negative real number a ∈ R has a unique non-negative square
root.
Proof: If a = 0 , then 0 is the square root. If a > 0 , let S1 = { x : x > 0, x^2 < a }
and S2 = { x : x > 0, x^2 > a } . We first show that neither S1 nor S2 is empty. Since
(1 + a/2)^2 = 1 + a + a^2/4 > a , we know that (1 + a/2) ∈ S2 , so S2 ≠ ∅ . Also
( a/(1 + a/2) )^2 < a (check this), so that a/(1 + a/2) ∈ S1 and hence S1 ≠ ∅ . Because
S1 precedes S2 , by the completeness axiom there is a c ∈ R between S1 and S2 . Notice
that c > 0 , since c is preceded by S1 .
It remains to show that c^2 = a . By the trichotomy theorem, either c^2 > a , c^2 < a ,
or c^2 = a . The first two possibilities will be shown to give contradictions. If c^2 > a ,
since a < ( (c^2 + a)/2c )^2 < c^2 , we see that (c^2 + a)/2c ∈ S2 and precedes c ,
contradicting the property specified in the completeness axiom that c precedes every
element of S2 . Similarly the assumption c^2 < a , with the inequality
c^2 < ( 2ac/(c^2 + a) )^2 < a , leads to a contradiction. The only remaining possibility
is c^2 = a , which shows that c is the desired positive square root of a .
Let us now prove that the positive square root c of a is unique. Assume that there
are two positive numbers c1 and c2 such that both c1^2 = a and c2^2 = a . Then

0 = c1^2 − c2^2 = (c1 − c2)(c1 + c2) .

Since c1 + c2 > 0 , we conclude that c1 − c2 = 0 , so c1 = c2 , completing the proof of the
theorem.
Definition: The real number M is an upper bound for the set A ⊂ R if for every a ∈ A ,
we have a ≤ M . The number µ ∈ R is a least upper bound (l.u.b.) for A if µ is an upper
bound for A and no smaller number is also an upper bound for A . Lower bound and
greatest lower bound (g.l.b.) are defined similarly. A set A ⊂ R is bounded if it has both
upper and lower bounds.
Theorem 0.2 Every non-empty bounded set A ⊂ R has both a greatest lower bound and
a least upper bound.

Proof: Observe first that this theorem utilizes the completeness property in that without
it, there might have been a “hole” just where the g.l.b. and l.u.b. should be. Since the
proofs for the g.l.b. and l.u.b. are almost identical we only prove there is a g.l.b. Let

S1 = { x : x precedes A }, and S2 = A.

By hypothesis S2 ≠ ∅ . Since A is bounded, it has a lower bound m, m ∈ S1 , so S1 ≠ ∅ .
By the completeness axiom, there is a c ∈ R between S1 and S2 . It should be obvious
that c is both greater than or equal to every element of S1 , and less than or equal to every
element of S2 , so it is the required g.l.b.
Definition: The closed interval [a, b] is the set { x ∈ R : a ≤ x ≤ b } .
The open interval (a, b) is the set { x ∈ R : a < x < b } . All we can do is apologize for
the multiple use of the parentheses in notation. Please note that sets are not like doors.
Some sets, like [a, b) = { x ∈ R : a ≤ x < b } , are neither open nor closed.
Theorem 0.3 (Nested set property). Let I1 , I2 , . . . be a sequence of non-empty closed
bounded intervals, In = { x : an ≤ x ≤ bn } , which are nested in the sense I1 ⊃ I2 ⊃ I3 . . . ,
so each covers all that follow it. Then there is at least one point c ∈ R which lies in all of
the intervals, that is, c is in their intersection: c ∈ ∩_{k=1}^∞ Ik .

Proof: Let

S1 = { x : x precedes some In , and so all Ik , k ≥ n },
S2 = { x : x is preceded by some In , and so all Ik , k ≥ n }.
First, neither S1 nor S2 is empty, since a1 ∈ S1 and b1 ∈ S2 . Thus by the completeness
axiom, there is at least one c ∈ R between S1 and S2 . This c is the required number
(complete the reasoning).
If the intervals Ik do not get smaller after, say, IN , because aN = aN+1 = ··· and
bN = bN+1 = ··· , then the whole interval aN ≤ x ≤ bN is caught by the preceding
argument. The more common case is where the ak ’s strictly increase and the bk ’s strictly
decrease. This is what happens when approximating a real number to successively greater
accuracy by its decimal expansion. In the case of √2 , for example,
I1 = { x : 1 ≤ x ≤ 2 },
I2 = { x : 1.4 ≤ x ≤ 1.5 },
I3 = { x : 1.41 ≤ x ≤ 1.42 },
I4 = { x : 1.414 ≤ x ≤ 1.415 },
and so on, gradually squeezing down on √2 to any desired accuracy.
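This decimal squeeze is easy to carry out mechanically. The sketch below is ours, not part of the notes (the function name is invented); it uses exact rational arithmetic so that no rounding intrudes on the comparisons:

```python
from fractions import Fraction

def nested_intervals(square, steps):
    """Decimal nested intervals [a_n, b_n] with a_n^2 <= square <= b_n^2,
    each one tenth the length of the one before, as in the sqrt(2) example."""
    a, width = Fraction(1), Fraction(1)
    out = []
    for _ in range(steps):
        # advance a by multiples of `width` while a^2 stays at or below `square`
        while (a + width) ** 2 <= square:
            a += width
        out.append((a, a + width))
        width /= 10
    return out

ivs = nested_intervals(2, 4)
# ivs[0] is (1, 2), ivs[1] is (14/10, 15/10), ..., ivs[3] is (1414/1000, 1415/1000)
```

Each interval produced is one tenth the length of its predecessor, exactly as in the display above.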
Definition: The sequence an ∈ R , n = 1, 2, . . . , of real numbers converges to the real
number c if, given any ε > 0 , there is an integer N such that |an − c| < ε for all n > N .
We will then write an → c . [In practice no confusion arises from the use of → to denote
both convergence and mappings (cf. 1)].
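As a concrete instance of the definition (our illustration, with invented names): for the sequence an = 1/n with limit c = 0 , the integer N(ε) = ⌈1/ε⌉ does the job, since n > N forces 1/n < ε :

```python
import math

def witness_N(eps):
    # For a_n = 1/n -> 0: if n > N = ceil(1/eps), then |a_n - 0| = 1/n < eps.
    return math.ceil(1 / eps)

def definition_holds(eps, check_up_to=50_000):
    N = witness_N(eps)
    return all(abs(1 / n - 0) < eps for n in range(N + 1, check_up_to))

# definition_holds(0.01) is True: every term past N = 100 lies within 0.01 of 0.
```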
Again ordinary decimals supply an example, for they allow us to get arbitrarily close
to any real number. We could have defined the real numbers as all decimals; however there
would be a mess in avoiding the built-in ambiguity illustrated by 1.9999 . . . = 2.0000 . . . .
Theorem 0.4 Under the hypotheses of the previous theorem, if in addition the length of
In tends to zero, (bn − an ) → 0 , then the number c ∈ R found is unique. Furthermore, if
uk ∈ Ik for all k , that is if ak ≤ uk ≤ bk , then uk → c too.
Proof: Suppose there were two real numbers c and c̃ in all of the intervals,

ak ≤ c ≤ bk and ak ≤ c̃ ≤ bk for all k.

Rewriting the second inequality as −bk ≤ −c̃ ≤ −ak , and adding this to the first inequality,
we find that ak − bk ≤ c − c̃ ≤ bk − ak . Since both sides of this inequality tend to zero, if
c − c̃ ≠ 0 , we would have a contradiction.

To prove uk → c , repeat the above reasoning with c̃ replaced by uk . We find that
ak − bk ≤ c − uk ≤ bk − ak . Again both sides of this inequality tend to zero. Now let us
fiddle with the ε, N definition of limit to complete the proof. Since bn − an → 0 , given
any ε > 0 , there is an N such that |an − bn | < ε for all n > N . Thus for the same N ,
|un − c| < ε for n > N , which is the definition of un → c .
Theorem 0.5 Bolzano-Weierstrass. Every infinite sequence of real numbers { uk } in
a bounded interval I has at least one subsequence which converges to a number c ∈ R .

Proof: This one is very clever and picturesque. Watch. Bisect I into two intervals I1
and Ĩ1 of equal length. At least one of I1 or Ĩ1 must contain an infinite number of the
{ uk } ’s; keep such a half and bisect it in turn. Continuing in this way we obtain a set of
nested intervals I ⊃ I1 ⊃ I2 ⊃ . . . , each of which contains an infinite number of the
{ uk } ’s, with the length of In tending to zero.
From Theorem 3 we conclude that there must be a c ∈ R common to all of the intervals.
We must now select the subsequence { ukn } of the { uk } ’s which converges to c . Since
each In contains an infinite number of points of the sequence, we can certainly pick one,
say ukn ∈ In , with kn > kn−1 . This sequence { ukn } satisfies the hypotheses of Theorem 4.
Thus ukn → c .
Remarks: 1. If we also assume I is closed, then we can further assert that c ∈ I . If I is
not closed, c may be an end point of I .
2. If a sequence uk converges to a c ∈ R , then every inﬁnite subsequence ukn also
converges, and to the same number c .
Theorem 0.6 If the sequence { uk } converges, it is bounded.

Proof: Say uk → α , and let ε = 1 in the definition of convergence. Then there is an N
such that |un − α| < 1 for all n > N . Thus, when n > N ,

|un | = |un − α + α| ≤ |un − α| + |α| < 1 + |α| .

Therefore for any k the number |uk | is bounded by the largest of the N + 1 numbers |u1 | ,
|u2 | , . . . , |uN | , and (1 + |α|) .
The following theorem shows how to handle algebraic combinations of convergent sequences.
Theorem 0.7 If an → α and bn → β , then

i) an + bn → α + β

ii) an bn → αβ

iii) an / bn → α / β if both bn ≠ 0 , for all n , and if β ≠ 0 .

Proof: Since the proofs are all similar, we only prove ii). Observe that

|an bn − αβ | = |(an bn − αbn ) + (αbn − αβ )| ≤ |an − α| |bn | + |α| |bn − β | .

By Theorem 6, the |bn | ’s are bounded, say by B . Since an → α , given any ε > 0 , there
is an N1 such that |an − α| < ε/(2B) for all n > N1 , and since bn → β , for the same ε there
is an N2 such that |bn − β | < ε/(2|α|) for all n > N2 . Thus, if n is greater than the larger
of N1 and N2 , n > max(N1 , N2 ) , we find that

|an bn − αβ | < ε/2 + ε/2 = ε ,

which does the job.
Definition: The sequence a1 , a2 , . . . of real numbers is said to be monotone increasing if
a1 ≤ a2 ≤ a3 ≤ . . . , and monotone decreasing if a1 ≥ a2 ≥ a3 ≥ . . . . Both kinds are called
monotone sequences.
Theorem 0.8 Every bounded monotone sequence a1 , a2 , . . . of real numbers converges. In
other words, there is an α ∈ R such that an → α .

Proof: We assume the sequence is increasing. The proof for a decreasing sequence is
identical. Since the sequence is bounded, by Theorem 2 it has a least upper bound α ∈ R . We
maintain an → α . Given any ε > 0 , we know that for all n, an < α + ε because α is an
upper bound. Since α − ε < α , and α is the l.u.b. of the sequence, we can find an N such
that α − ε < aN . But then, because the sequence is increasing, α − ε < an for all n ≥ N .
Thus for all n ≥ N, α − ε < an < α + ε ; that is, |an − α| < ε for all n ≥ N , proving the
convergence to α .
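A classical illustration of the theorem (our example, not the notes’): the sequence an = (1 + 1/n)^n is monotone increasing and bounded above, hence convergent; its limit is the number e . A quick numerical check:

```python
import math

# a_n = (1 + 1/n)^n is monotone increasing and bounded above (by 3),
# hence convergent by the theorem; the limit is e = 2.71828...
terms = [(1 + 1 / n) ** n for n in range(1, 101)]
increasing = all(s < t for s, t in zip(terms, terms[1:]))
bounded = all(t < 3 for t in terms)
gap_to_e = abs(terms[-1] - math.e)   # roughly 0.014 at n = 100
```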
We shall close this diﬃcult section with a wonderful procedure for computing the square
root of a positive real number. I use it all of the time. It is much easier to understand than
the hair-raising method taught in public school.
Theorem 0.9 For any positive real numbers A and a0 , the infinite sequence defined by

an+1 = (an + A/an )/2 , n = 0, 1, 2, . . . ,

is monotone decreasing and converges to √A . Moreover, if we let bn = A/an , then the bn ’s
are monotone increasing and also converge to √A :

b1 ≤ b2 ≤ . . . ≤ √A ≤ . . . ≤ a2 ≤ a1 . (0-2)

Proof: We first show that ak^2 ≥ A and that ak+1 ≤ ak . First,

ak^2 − A = (1/4)(ak−1 + A/ak−1 )^2 − A = (1/4)(ak−1 − A/ak−1 )^2 ≥ 0 , so ak^2 ≥ A .

From this, it is easy to see that ak+1 ≤ ak , for

ak − ak+1 = ak − (ak + A/ak )/2 = (ak^2 − A)/2ak ≥ 0 .

Thus a1 ≥ a2 ≥ ··· ≥ √A .

That the ak^2 converge is an immediate consequence of Theorem 8, since the sequence
{ ak^2 } is a bounded (by A ) monotone decreasing sequence. Denoting the limit by α ,
ak^2 → α ; the proof that α = A is identical to the reasoning which gave a unique limit in
Theorem 4.

Since bn = A/an , and the an ’s decrease and are ≥ √A , the bn ’s increase and are
≤ √A . This also shows that bn ≤ an . Since an → √A , we have bn = A/an → √A too.
Application: We compute √8 . Take a0 = 3 . Then a1 = (3 + 8/3)/2 = 17/6 , and
b1 = 8 · (6/17) = 48/17 . Similarly, a2 = 577/204 , b2 = 1632/577 . This gives
1632/577 ≤ √8 ≤ 577/204 , or in decimal form

2.82842 < √8 < 2.82843,

astounding accuracy after only two steps. I carried the computations one step further and
found

2.828427124 . . . ≤ √8 ≤ 2.828427124 . . . ,

where the dots indicate I gave up on the arithmetic, having obtained the exact value as far
as the approximation went. Digital computers use this method and related ones for similar
computations. It is particularly well adapted to them (and me) since only simple arithmetic
operations are involved.
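The whole computation is painless to reproduce in exact rational arithmetic. A sketch (ours; the function name is invented) of the iteration an+1 = (an + A/an )/2 for A = 8 , a0 = 3 :

```python
from fractions import Fraction

def babylonian(A, a0, steps):
    """Iterate a_{n+1} = (a_n + A/a_n)/2 exactly; return the pairs (b_n, a_n)."""
    a, A = Fraction(a0), Fraction(A)
    pairs = []
    for _ in range(steps):
        a = (a + A / a) / 2
        pairs.append((A / a, a))  # b_n = A/a_n is a lower bound, a_n an upper bound
    return pairs

pairs = babylonian(8, 3, 2)
# pairs[0] == (Fraction(48, 17), Fraction(17, 6))
# pairs[1] == (Fraction(1632, 577), Fraction(577, 204))
```

Running it recovers a1 = 17/6 , b1 = 48/17 , a2 = 577/204 , and b2 = 1632/577 exactly, as in the text.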
This Theorem 9 gives another proof that every positive real number has a unique
positive square root. It is valuable to compare this proof with that of Theorem 1. The
main distinction is that the second proof just given is constructive: it actually shows a
way to compute successive approximations to the square root of any number. However,
you are justified in asking how we ever found the procedure of equation (0.9) in the first
place. The secret is that this formula is a statement of Newton’s method for finding roots
of f (x) = 0 , applied to the particular function f (x) = x^2 − A . See most calculus books
for more information about this method. Hopefully, we will have time to discuss this topic
later, for it is a constructive way of proving the existence of a sought-after object. The
standard existence theorem for ordinary differential equations is a close relative of Newton’s
method.

Exercises
(1) For the sequences deﬁned below, ﬁnd which converge, which do not converge but
do have at least one convergent subsequence, and which have neither. In all cases
n ∈ Z+ .
(a) an = 1/(n + 1)

(b) an = (−1)^n / n

(c) bn = n / e^n

(d) an = e^(−2n+1)

(e) an = 1 + n

(f) an = 2 + (−1)^n

(g) an = √(n + 1) − √n

(h) an = 7^n / n!

(i) an = (2 − 3n)/(5n + 1)

(j) sn = 1 + 1/2 + 1/4 + 1/8 + ··· + 1/2^n (tough, isn’t it?)

(2) Prove that if an → α and bn → β , then (an + bn ) → α + β , where all the letters
represent real numbers.

(3) a). Prove Bernoulli’s inequality

(1 + h)^n > 1 + nh, h ≠ 0, h > −1, n ≥ 2.

Here h ∈ R and n ∈ Z . I suggest proof by induction.

b). If s ∈ R , use part a) to prove that

an ≡ s^n → 0 if |s| < 1 , while |s|^n → ∞ if |s| > 1.

[Hint: If |s| < 1 , write |s| = 1/(1 + h) , h > 0 , while if |s| > 1 , write |s| = 1 + h , h > 0 .]

0.6 Appendix: Continuous Functions and the Mean Value Theorem

Definition: The function f (x) is continuous at the point x0 if, given any ε > 0 , there
is a δ(ε) > 0 such that

|f (x) − f (x0 )| < ε when 0 < |x − x0 | < δ(ε) .
Remark: This may be rephrased as

lim_{x→x0} f (x) = f (x0 ).

Note that either statement requires that

(1) f be defined at x0 ,

(2) lim_{x→x0, x≠x0} f (x) exists, and

(3) the limiting value of f at x0 is equal to the defined value of f at x0 .
If a function is discontinuous at x0 , it has at least one of the four troubles

(1) Jump discontinuity

(2) Infinite discontinuity

(3) Infinite oscillations

(4) Removable discontinuity.

Here are examples of each trouble at the point x = 0 .

(1) f (x) = 1 for 0 ≤ x , and f (x) = −1 for x < 0 .

(2) f (x) = 1/x for x ≠ 0 , and f (x) = anything, say 1, at x = 0 .

(3) f (x) = sin(1/x) for x ≠ 0 , and f (x) = anything, say 0, at x = 0 .

(4) f (x) = x for x ≠ 0 , and f (x) = 1 at x = 0 .

Note that a function may oscillate infinitely about a point and still be continuous there.
This is illustrated by the everywhere continuous function

f (x) = x sin(1/x) for x ≠ 0 , and f (x) = 0 at x = 0 .

Theorem 0.10 I. If f (x) is continuous at x = c , and f (c) = A ≠ 0 , then f (x) will
keep the same sign as f (c) in a suitably small neighborhood of x = c .
Proof: We construct the desired neighborhood. Assume A is positive. The proof if
A < 0 is essentially the same. In the definition of continuity, take ε = A . Then there is a
δ > 0 such that

|f (x) − A| < A when |x − c| < δ,

that is,

0 < f (x) < 2A, when |x − c| < δ.

In other words, f (x) is positive in the interval |x − c| < δ .
Theorem 0.11 II. If f (x) is continuous at every point of a closed and bounded interval, then there is a constant M such that |f (x)| ≤ M throughout the interval. Thus a
continuous function in a closed and bounded interval is bounded.
Proof: By contradiction. If f is not bounded, there is a sequence of points xn such
that |f (xn )| > n . From that sequence, by Theorem 5 (Bolzano-Weierstrass), we can select
a subsequence xnk which converges to some point x0 in the interval, xnk → x0 . Thus

|f (xnk )| → ∞.

But we know from the continuity of f that |f (xnk )| → |f (x0 )| . A contradiction.
Moreover, with the same hypotheses, we can conclude more.
Theorem 0.12 III. If f is continuous at every point of a closed and bounded interval,
then there are points x = α and x = β in the interval where f assumes its greatest and
least values, respectively.
Proof: We show that f assumes its greatest value. The proof for the least value is
essentially identical. Let S be the set of all upper bounds for f . By Theorem II, S is not
empty. Therefore by Theorem 2, S has a g.l.b., call it M0 . Since M0 is the greatest lower
bound of upper bounds for f , there is a sequence xn such that limn→∞ f (xn ) = M0 . Use
Bolzano-Weierstrass to pick a subsequence xnk of the xn such that the xnk converges, say
to c . By continuity of f , limk→∞ f (xnk ) = f (c) . Thus f (c) = M0 , so f does assume its
greatest value at x = c .
Remark: This theorem refers to the absolute maximum and absolute minimum values.
Examples: The following show that the theorem is not necessarily true if any of the
hypotheses are omitted.
(1) f (x) = x, 0 < x ≤ 1 . No min. (The interval is not closed.)

(2) f (x) = x , for all x , and f (x) = 1/(1 + x^2) , for all x : both have no min. (The
interval is unbounded.)

(3) f (x) = x for 0 ≤ x < 3 , and f (x) = x − 2 for 3 ≤ x ≤ 4 . No max. (The function is
discontinuous.)

Theorem 0.13 If f (x) is continuous at every point of a closed and bounded interval [a, b] ,
and if f (a) and f (b) have opposite sign, then there is at least one point c ∈ (a, b) such
that f (c) = 0 .
Proof: Say f (a) < 0, f (b) > 0 . We find one point c , “the largest x such that
f (x) = 0 ”. Let S = { x ∈ [a, b] : f (x) ≤ 0 } .

Since f (a) < 0, S is not empty. It thus has a l.u.b., c . We prove that f (c) = 0 .
Either f (c) > 0, f (c) < 0 , or f (c) = 0 . The first two possibilities cannot happen, since
by Theorem I, if they did, f would be positive (or negative) in a whole neighborhood of
c , violating the fact that c is the l.u.b. of S .
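The argument above is a pure existence proof, but it has a constructive cousin, bisection: repeatedly halve an interval on which f changes sign. A minimal sketch (ours, with a made-up sample function, not the notes’ method):

```python
def bisect_root(f, a, b, tol=1e-9):
    """Locate a zero of a continuous f with f(a) < 0 < f(b) by repeated halving."""
    assert f(a) < 0 < f(b)
    while b - a > tol:
        m = (a + b) / 2
        if f(m) <= 0:
            a = m   # the sign change is now inside [m, b]
        else:
            b = m   # the sign change is now inside [a, m]
    return (a + b) / 2

# Sample use: f(x) = x^3 - 2x - 5 changes sign on [2, 3].
root = bisect_root(lambda x: x**3 - 2*x - 5, 2.0, 3.0)
```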
Corollary 0.14 (intermediate value theorem). Let f (x) be continuous at every point
of a closed and bounded interval [a, b] , with f (a) = A and f (b) = B . Then if C is any
number between A and B , there is at least one point c, a ≤ c ≤ b , such that f (c) = C .
Thus, f assumes every value between A and B at least once.

Proof: Apply Theorem IV to the function ϕ(x) = C − f (x) .
Remark: The function may assume values other than just those between A and B . An
example is the function f (x) = x2 , −1 ≤ x ≤ 3 . The theorem requires that it assume all
values between f (−1) = 1 and f (3) = 9 . Besides those values, this function also happens
to assume all values between 0 and 1.
We can oﬀer another proof of
Corollary 0.15 Every positive number k has a unique positive square root.

Proof: Consider f (x) = x^2 − k , which is clearly continuous everywhere. Since f (0) < 0 ,
and f (1 + k/2) = (1 + k/2)^2 − k = 1 + k^2/4 > 0 , Theorem IV shows that f must vanish
somewhere in the interval 0 < x < 1 + k/2 . This is the root. It is the unique positive
square root, for say there were two positive numbers x and y such that x^2 − k = 0 and
y^2 − k = 0 . Then x^2 − y^2 = 0 . Thus, 0 = x^2 − y^2 = (x − y)(x + y) . Since x + y > 0 ,
we conclude x − y = 0 , or x = y .
Remark: It appears that if a function has the property of Corollary 1, the intermediate
value property, then it must be continuous. This is false. An example is given by the
discontinuous (trouble 3) function

f (x) = sin(1/x) for x ≠ 0 , and f (x) = 0 at x = 0 ,

about the point x = 0 . If a is any number < 0 , and b any number > 0 , then f (x)
assumes every value between f (a) and f (b) , but f (x) is not continuous throughout the
interval since it is not continuous at x = 0 .
Definition: The function f (x) has a relative maximum (minimum) at the point x0 , if,
for all x in a suﬃciently small interval containing x0 as an interior point, we have
f (x) ≤ f (x0 ) ( f (x) ≥ f (x0 ) ).

Remark: By convention, we shall agree not to call the possible max (or min) at the end
point of an interval a relative max (or min). This does lead to the possibility of an absolute
max (or min) not being a relative max (or min). However, if the absolute max (or min)
does occur at an interior point of an interval, it is also a relative max (or min).
Definition: The function f (x) is differentiable at the point x0 if the following limit exists:

lim_{x→x0} ( f (x) − f (x0 ) ) / ( x − x0 ) .

There are the usual notations for this limit: f′(x0 ) , df/dx evaluated at x = x0 , and Df (x0 ) .

Theorem 0.16 If f (x) is differentiable at x0 , then it is continuous there.
Proof: If the limit

lim_{x→x0} ( f (x) − f (x0 ) ) / ( x − x0 )

exists, as we have assumed, then the numerator must approach zero as x tends to x0 .
Thus f is continuous at x0 .

Theorem 0.17 If f (x) is differentiable at x0 and has a relative maximum or minimum
at x0 , then f′(x0 ) = 0 .

Proof: Assume f has a relative min at x0 . Then for all x near x0 , f (x) ≥ f (x0 ) .

(i) If x < x0 , then ( f (x) − f (x0 ) ) / ( x − x0 ) ≤ 0 , so lim_{x→x0, x<x0} ( f (x) − f (x0 ) ) / ( x − x0 ) ≤ 0 .

(ii) If x > x0 , then ( f (x) − f (x0 ) ) / ( x − x0 ) ≥ 0 , so lim_{x→x0, x>x0} ( f (x) − f (x0 ) ) / ( x − x0 ) ≥ 0 .

Because the function is differentiable at x0 , the two limiting values both equal f′(x0 ) . Thus
f′(x0 ) ≤ 0 and f′(x0 ) ≥ 0 . Both statements can be true only if f′(x0 ) = 0 . The trick here
was: the secant slope must be ≤ 0 to the left, and ≥ 0 to the right of x0 . Since there is
a unique slope (the derivative) at x0 , the slope must be zero there. At a relative max, the
same proof holds with obvious modifications.
Examples: 1. Although the function f (x) = |x| has a relative minimum at x = 0 , the
conclusion of the theorem does not hold since f is not diﬀerentiable there. Note that both
(i) and (ii) of the proof still do hold.
2. The differentiable function (for all x )

f (x) = x^4 sin(1/x) for x ≠ 0 , and f (x) = 0 at x = 0 ,

has an infinite number of relative max and min in any interval including the origin.
Theorem 0.18 (Rolle). If

(i) f (x) is continuous at every point of the closed and bounded interval [a, b] ,

(ii) f (x) is differentiable at every point of the open interval (a, b) , and

(iii) f (a) = f (b) ,

then there is at least one point c , a < c < b , where f′(c) = 0 .

Proof: If f (x) ≡ constant throughout [a, b] , take c to be any point in (a, b) . Otherwise
f (x) must go either above or below (or both) the value f (a) . Assume it goes above. Then
by Theorem III there is a point x = c where f has its absolute maximum. Since we
assumed f (x) goes above f (a) , the point x = c is an interior point. Thus there is
a relative maximum there. Since f is differentiable in (a, b) , we may apply Theorem VI to
conclude that f′(c) = 0 . If we had assumed f went below f (a) , then there would have
been an absolute (and relative) min, etc.

Remarks: 1. From the proof of the theorem, we see that if f has values both greater and
less than f (a) , then there would be at least two points in (a, b) where f′ = 0 .

2. You should be able to construct examples showing the theorem is not true if any of
the hypotheses are dropped.
Corollary 0.19 (mean value theorem) If

(i) f (x) is continuous at every point of the closed and bounded interval [a, b] , and

(ii) f (x) is differentiable at every point of the open interval (a, b) ,

then there is at least one point c in (a, b) where

f′(c) = ( f (b) − f (a) ) / ( b − a ) .

Proof: “Shift and apply Rolle’s Theorem”. In more detail, consider

F (x) = f (x) − f (a) − ( (x − a)/(b − a) )( f (b) − f (a) ).

F (x) satisfies all of the assumptions of Rolle’s Theorem. Therefore there is a point c where
F′(c) = 0 . Since

F′(x) = f′(x) − ( f (b) − f (a) ) / ( b − a ) ,

at x = c we have

f′(c) = ( f (b) − f (a) ) / ( b − a ) .

Remarks: 1. The function f (x) = |x| in the interval [a, b], a < 0, b > 0 , shows what
happens if the function fails to be differentiable at even one point of the open interval (a, b) .

2. An alternative form of the conclusion is: there is a number θ, 0 < θ < 1 , such that

f (b) − f (a) = f′(a + θ(b − a))(b − a).

This is because every point in the interval (a, b) is of the form a + θ(b − a) , for some
θ, 0 < θ < 1 .
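As a quick numerical illustration of the conclusion (our example, not from the notes): for f (x) = x^3 on [0, 2] , the theorem demands a c in (0, 2) with 3c^2 = (f (2) − f (0))/2 = 4 , that is c = 2/√3 :

```python
# Verify the mean value theorem numerically for f(x) = x^3 on [a, b] = [0, 2]:
# some c in (0, 2) must satisfy f'(c) = (f(b) - f(a)) / (b - a).
a, b = 0.0, 2.0
f = lambda x: x**3
fprime = lambda x: 3 * x**2

slope = (f(b) - f(a)) / (b - a)   # = 4.0
c = (slope / 3) ** 0.5            # solve 3c^2 = slope, i.e. c = 2/sqrt(3)
```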
We shall now give some applications of the Mean Value Theorem. The ﬁrst one is a
specific example, while the others have great significance in themselves.

Example: The function f (x) = a1 sin x + a2 sin 2x + b1 cos x + b2 cos 2x has at least one
zero in the interval [0, 2π ] , no matter what the coefficients a1 , a2 , b1 and b2 are. To show
this, we shall show f is the derivative of a function g (x) which satisfies the hypotheses of
Rolle’s theorem. This function g is just an anti-derivative of f , g′(x) = f (x) :

g (x) = −a1 cos x − (a2/2) cos 2x + b1 sin x + (b2/2) sin 2x.

Since g is clearly continuous and differentiable everywhere, we must only see if g (0) =
g (2π ) , which is also easy.
Theorem 0.20 If f (x) is continuous and differentiable throughout [a, b] , and |f′| < N
there too, then the δ(ε) in the definition of continuity can be chosen as δ(ε) = ε/N . This
δ works for every x0 in [a, b] .

Proof: Use the form of the mean value theorem in Remark 2. Then for any points x, x0
in (a, b) ,

f (x) − f (x0 ) = f′(x̃)(x − x0 ),

where x̃ is somewhere between x and x0 . Thus

|f (x) − f (x0 )| ≤ N |x − x0 | .

We see now that if δ(ε) = ε/N , then for any ε > 0 ,

|f (x) − f (x0 )| < ε if |x − x0 | < δ .
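To see the theorem in action (our sample function, not one from the notes): on [0, 2] the function f (x) = x^2 has |f′(x)| = |2x| ≤ 4 , so δ(ε) = ε/4 serves every point of the interval at once:

```python
# On [0, 2], f(x) = x^2 has |f'| <= N = 4, so delta(eps) = eps/N works uniformly:
# |x^2 - y^2| = |x + y| |x - y| <= 4 |x - y|.
N = 4

def delta(eps):
    return eps / N

def check(eps, grid=400):
    pts = [2 * i / grid for i in range(grid + 1)]
    d = delta(eps)
    return all(
        abs(x * x - y * y) < eps
        for x in pts
        for y in pts
        if abs(x - y) < d
    )
```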
if |x − x0 | < δ. Theorem 0.21 If f satisﬁes the hypotheses of the mean value theorem and if in addition
f (x) ≡ 0 throughout (a, b) , then f (x) ≡ const.
Proof: : Let x1 and x2 be any points on (a, b) . Then by the form of the mean value
theorem in Remark 2
f (x2 ) − f (x1 ) = 0 · (x2 − x1 ) = 0.
Thus f (x2 ) = f (x1 ) for any two points in (a, b) , that is, f is identically constant.
Corollary 0.22 If f (x) and g(x) both satisfy the hypotheses of the mean value theorem,
and if in addition f′(x) ≡ g′(x) for all x in (a, b) , then f (x) = g(x) + c , where c is some
constant.
Proof: Consider the function F (x) = f (x) − g(x) . It satisfies the hypotheses of Theorem
0.21, so F (x) ≡ c , c constant. Thus f (x) − g(x) = c .
Remark: Theorem 0.21 is the converse of the theorem: "the derivative of a constant function
is zero."
(a figure goes here)
0.6. APPENDIX: CONTINUOUS FUNCTIONS AND THE MEAN VALUE THEOREM
Exercises
(1) Look over all the theorems (and corollaries) here and be sure you can ﬁnd examples
showing that the theorems are not true if any of the hypotheses are relaxed.
(2) Let f (x) = 1 if x is a rational number, and f (x) = 0 if x is an irrational number.
Is f continuous anywhere?
(3) Let f (x) be an everywhere diﬀerentiable function which is zero at x = aj , j =
1, 2, . . . , n. Find a function which vanishes at least once between each of the zeros of
f.
(4) Use Theorem 0.20 to find a δ(ε) for the given functions.
(a) f (x) = x^4 − 7, −2 ≤ x ≤ 3
(b) f (x) = x^2 sin x, −4 ≤ x ≤ 3
(c) f (x) = 1/(1 + x^2), −2 ≤ x ≤ 1
(d) f (x) = x^(4/3) + 7, −2 ≤ x ≤ 8
(e) f (x) = x√(x^2 + 1), −2 ≤ x ≤ 2
(5) (a) The function f (x) satisﬁes the following condition
|f (x) − f (x0)| ≤ 2 |x − x0|^3
for every pair of points x, x0 in the interval [a, b] . Prove f (x) ≡ constant in this
interval.
(b) Generalize your proof to the case when f satisﬁes
|f (x) − f (x0 )| ≤ c |x − x0 |α ,
where c > 0 is some constant and α is any number > 1 .
(6) Consider the function f (x) = x^(2/3) in the interval [−8, 8] .
Sketch a graph. Note that f (−8) = f (8) = 4 but there is no point where f′(x) = 0 ;
which hypothesis of Rolle's theorem is violated?
(7) In a trip, the average speed of a car is 180 miles per hour. Prove that at some time
during the trip, the speedometer must have registered precisely 180 miles per hour.
(8) Let P1 := (x1 , y1 ) and P2 := (x2 , y2 ) be any two points on the parabola y =
ax2 + bx + c , and let P3 := (x3 , y3 ) be the point on the arc P1 P2 where the tangent
is parallel to the chord P1 P2 . Show that
x3 = (x1 + x2)/2 .
(9) Prove that every polynomial of odd degree
P (x) = x^(2n+1) + a_{2n} x^(2n) + · · · + a1 x + a0
has at least one real root.
(10) If f is a nice function and f′ < 0 everywhere, prove that f is strictly decreasing.
0.7 Complex Numbers: Algebraic Properties
In high school, to be able to find the roots of all quadratic equations ax^2 + 2bx + c = 0 ,
we were forced to introduce the symbol i ≡ √−1 ; in other words, to introduce a special symbol
for a root of x^2 + 1 = 0 . Before going any further, we should prove that no real number c
can satisfy c2 + 1 = 0 . By contradiction, assume that there is such a c . Then necessarily
either c > 0, c < 0, or c = 0 . If c = 0 , we have the immediate contradiction that 1 = 0 . If
c > 0 or c < 0 , then 0 < c^2 . Consequently 0 < c^2 + 1 too, which again contradicts 0 = c^2 + 1 ,
and proves our contention that no real number can satisfy x2 + 1 = 0 .
Observe that our proof also shows that if we introduce a new symbol for a root of
x^2 + 1 = 0 , that symbol cannot be an element of an ordered field, for only the ordered field
properties of the real numbers were used in the above proof. We shall see that "i" is an
element of a ﬁeld, but not an ordered ﬁeld.
It is diﬃcult to overestimate the importance of complex numbers for all of mathematics,
both from an esthetic as well as from a practical viewpoint. With them we can prove that
every quadratic polynomial has exactly two roots (which may coincide). What is more
surprising is that every polynomial of degree n ,
an x^n + a_{n−1} x^(n−1) + · · · + a1 x + a0 = 0,   an ≠ 0,
has exactly n complex roots. This result, the fundamental theorem of algebra, was first
proved by Gauss in his doctoral dissertation (1799). It is one of the crown jewels of mathematics. The difficult part is proving that every polynomial has at least one complex root,
from which the general result follows using only the "factor theorem" of high school algebra. Later on in the semester we shall discuss this more fully and offer a proof. It is not
simpleminded, for the proof is a non-constructive pure existence proof, giving absolutely no
method of finding the roots. Perhaps we shall even prove some more exotic results.
Having gotten carried away, let us retreat and obtain the algebraic rules governing the
set C of complex numbers. In order to reveal the algebraic structure most clearly, we shall
denote a complex number z by an ordered pair of real numbers: z = (x, y ), x, y ∈ R .
Thus C is R × R with the following additional algebraic structure.
Definition: If z1 = (x1 , y1) and z2 = (x2 , y2) are any two complex numbers, then we
deﬁne
Addition:
z1 + z2 = (x1 + x2 , y1 + y2 ) ,
and
Multiplication:
z1 · z2 = (x1 x2 − y1 y2 , x1 y2 + y1 x2 ) .
Equality: z1 = z2 if and only if both x1 = x2 and y1 = y2 .
Thus, the complex number zero—the additive identity—is (0, 0) , while the complex
number one—the multiplicative identity—is (1, 0) . Using the fact that the real numbers
R form a ﬁeld, we can now prove the
Theorem 0.23 The complex numbers C form a ﬁeld.
Proof: Since the veriﬁcation of the ﬁeld axioms are entirely straightforward we give only
a smattering. Note that we shall rely heavily on the ﬁeld properties of R . Addition is
commutative:
z1 + z2 = (x1 , y1) + (x2 , y2) = (x1 + x2 , y1 + y2)
= (x2 + x1 , y2 + y1) = (x2 , y2) + (x1 , y1) = z2 + z1 .  (0-3)
Additive identity:
0 + z = (0, 0) + (x, y ) = (0 + x, 0 + y ) = (x, y ) = z.
Multiplicative inverse: For any z ∈ C, z ≠ (0, 0) , we must find a ẑ = (x̂, ŷ) ∈ C such
that z ẑ = 1 , that is, find real numbers x̂ and ŷ such that (x, y)(x̂, ŷ) = (1, 0) . Using
the definition of complex multiplication, this means we must solve the two linear algebraic
equations
x x̂ − y ŷ = 1
y x̂ + x ŷ = 0,    x̂, ŷ ∈ R,
for x̂ and ŷ ∈ R . The result is
ẑ = (x̂, ŷ) = ( x/(x^2 + y^2) , −y/(x^2 + y^2) ).
We will denote this multiplicative inverse, which we have just proved does exist, by 1/z or
z^(−1) .
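As a quick numerical sketch (the sample values are arbitrary, not from the text), the inverse formula can be checked against Python's built-in complex division:

```python
# For z = (x, y) != (0, 0), the inverse derived above is
# ( x/(x^2+y^2), -y/(x^2+y^2) ).
for x, y in [(3.0, -4.0), (0.5, 2.0), (1.0, 0.0)]:
    d = x*x + y*y
    xh, yh = x/d, -y/d
    # complex multiplication rule: (x,y)(xh,yh) = (x*xh - y*yh, x*yh + y*xh)
    prod = (x*xh - y*yh, x*yh + y*xh)
    assert abs(prod[0] - 1) < 1e-12 and abs(prod[1]) < 1e-12
    # agrees with Python's own complex arithmetic
    assert abs(complex(xh, yh) - 1/complex(x, y)) < 1e-12
```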
It is interesting to notice that complex numbers of the form (x, 0) have the same
arithmetic definitions as the real numbers, viz.
(x1 , 0) + (x2 , 0) = (x1 + x2 , 0)
(x1 , 0)(x2 , 0) = (x1 x2 , 0).
We can easily verify that all complex numbers of this form (x, 0) also form a ﬁeld,
a subﬁeld of the ﬁeld C . On the basis of these last two equations, we can identify a real
number x with the complex number (x, 0) in the sense that if we perform any computation
with these complex numbers of this form, the result will be the same as if the computation
had been performed with the real numbers alone. Thus, numbers of the form (x, 0) ∈ C are
algebraically equivalent to the numbers x ∈ R . The technical term for such an algebraic
equivalence is isomorphic, much as a term for geometric equivalence is congruent. After
identifying the real numbers with complex numbers of the form (x, 0) , we can say that the
ﬁeld of real numbers R is embedded as a subﬁeld in the ﬁeld of complex numbers, R ⊂ C .
After all this chatter, let us at least convince ourselves that every quadratic equation
is solvable if we use complex numbers. First we solve z 2 + 1 = 0 , which may be written
as (x, y )(x, y ) + (1, 0) = (0, 0) , or as the two real equations x2 − y 2 = −1, 2xy = 0 .
The last equation says that either x = 0 or y = 0 . Now if y = 0 , we are left to solve
x2 + 1 = 0, x ∈ R , which we know is impossible. Therefore x = 0 and then y 2 = 1 . Thus
the two complex numbers (0, 1) and (0, −1) both satisfy z 2 + 1 = 0 . The general case,
az 2 + bz + c = 0 is easily reduced to the special one by completing the square.
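The reduction by completing the square can be sketched numerically; the coefficients below are arbitrary sample values, and `cmath.sqrt` supplies a complex square root:

```python
import cmath

# Solve a z^2 + b z + c = 0 via z = (-b ± sqrt(b^2 - 4ac)) / (2a),
# which is exactly what completing the square produces.
a, b, c = 1 + 0j, 2 + 2j, 5j
d = cmath.sqrt(b*b - 4*a*c)
for z in ((-b + d)/(2*a), (-b - d)/(2*a)):
    assert abs(a*z*z + b*z + c) < 1e-9     # both roots check out
```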
One by-product of the above demonstration is that we see it is foolhardy to try to deﬁne
an order relation on C to obtain an ordered ﬁeld. This is because the equation x2 + 1 = 0
cannot be solved in any ordered ﬁeld, as was shown earlier, whereas we have just solved it
in C .
Observe that every (x, y ) ∈ C can be written as
(x, y ) = (x, 0)(1, 0) + (y, 0)(0, 1),
where the complex number (0, 1) is called the imaginary unit and is denoted by i. If we
now utilize the isomorphism between the real number a and complex numbers (a, 0) , the last equation shows that (x, y) may be thought of as x + iy . Thus, we have obtained the
usual notation for complex numbers. From our development, the algebraic role of i as the
symbol for the imaginary unit (0, 1) is hopefully clariﬁed. The number x is called the real
part, and y the imaginary part of the complex number z = x + iy . In symbols, x = Re{ z }
and y = Im{ z } .
Our introduction of complex numbers suggests a geometric interpretation. We have
deﬁned complex numbers C as ordered pairs of real numbers, elements of R × R , with an
additional algebraic structure. Since the points in the plane are also elements of R × R , it
is clear that there is a one to one correspondence between the complex numbers and the
points in the plane. If we plot the point z = (x, y ) , the real number |z | , the “absolute
value or modulus of z ” is the distance of the point z from the origin. Its value is computed
by the Pythagorean theorem
|z| = √(x^2 + y^2) .
Here are several formulas which are easily verified:
|z1 z2| = |z1| |z2| ,   |x| ≤ |z| , |y| ≤ |z| ,   |z1 + z2| ≤ |z1| + |z2|  (triangle inequality)  (0-4)
If the line joining the point z to the origin is drawn, the angle θ between that line
and the positive real (= x ) axis is called the argument or amplitude of z . The absolute
value r and argument θ of a complex number determine it uniquely, since we have
z = r(cos θ + i sin θ) .  (0-5)
This is the polar coordinate form of the complex number z . Note that conversely, z
determines its argument only to within an additive multiple of 2π . This observation will
prove of value to us shortly.
Associated with every complex number z = x + iy there is another complex number
z̄ = x − iy , the complex conjugate of z . It is the reflection of z in the real axis. Probably
the main reason for introducing z̄ is that we can solve for x and y in terms of z and z̄ :
x = (z + z̄)/2 ,   y = (z − z̄)/(2i) .
Again some simple formulas:
|z̄| = |z| ,   |z̄|^2 = |z|^2 = z z̄ ,   (z1 + z2)‾ = z̄1 + z̄2 ,   (z1 z2)‾ = z̄1 z̄2 .  (0-6)
To illustrate the value of this notation, let us leave the main road to prove the interesting
Theorem 0.24 . If the complex number γ is a root of the polynomial
P (t) = an tn + an−1 tn−1 + · · · + a1 t + a0 ,
where the coefficients a0 , a1 , . . . , an are real numbers, then γ̄ is also a root of P (t) . In
other words, the roots of real equations occur in conjugate pairs.
Proof: Since γ is a root, the complex number
P (γ) = an γ^n + · · · + a1 γ + a0
is zero, P (γ) = 0 . This implies that its conjugate is also 0 , (P (γ))‾ = 0 . By using equations
(0-6), we have
(P (γ))‾ = ān γ̄^n + · · · + ā1 γ̄ + ā0 ,
and since the coefficients aj are real, āj = aj . Thus
0 = (P (γ))‾ = an γ̄^n + · · · + a1 γ̄ + a0 = P (γ̄),
that is, the complex number γ̄ is a root of the same polynomial.
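A small numerical illustration (the polynomial is my own arbitrary example, chosen to have the root i):

```python
# P(t) = t^3 - 2t^2 + t - 2 = (t - 2)(t^2 + 1) has real coefficients
# and the root gamma = i, so by the theorem -i must also be a root.
coeffs = [1, -2, 1, -2]          # highest power first

def P(t):
    s = 0
    for c in coeffs:
        s = s*t + c              # Horner's rule
    return s

gamma = 1j
assert abs(P(gamma)) < 1e-12                 # gamma is a root
assert abs(P(gamma.conjugate())) < 1e-12     # so is its conjugate
```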
Now if the proof looks like it was done with mirrors, go over each step carefully. This
type of reasoning is somewhat typical of modern mathematics in that it yields information
about an object (the roots of a polynomial in this case) without ﬁrst obtaining an explicit
formula for the object.
After this digression let us return and ﬁnd a geometric interpretation for the arithmetic
operations on complex numbers. First, addition. The three points z1 , z2 and z1 + z2
together with the origin determine a parallelogram (check this). Thus addition of complex
numbers is sometimes called the parallelogram rule for addition. Given the points z1 and
z2 , the point z1 + z2 can be constructed using compass and straight-edge. Subtraction is
just z1 + (−z2 ) .
Multiplication is much more difficult to interpret geometrically. We shall use equation
(0-5) and write zj = |zj| (cos θj + i sin θj), j = 1, 2 . Then
z1 z2 = |z1| (cos θ1 + i sin θ1) |z2| (cos θ2 + i sin θ2)
= |z1 z2| [cos(θ1 + θ2) + i sin(θ1 + θ2)].  (0-7)
Thus the product of z1 and z2 has modulus |z1 z2| and argument θ1 + θ2 : multiply the
moduli and add the arguments. This too may be carried out using compass and straightedge. Since 1/z2 = |z2|^(−1) (cos θ2 − i sin θ2) , division reads
z1/z2 = (|z1| / |z2|) [cos(θ1 − θ2) + i sin(θ1 − θ2)],
so the moduli are divided while the arguments are subtracted.
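A quick sketch of these rules in Python (z1 and z2 are arbitrary sample values; `cmath.phase` returns the argument):

```python
import cmath, math

z1, z2 = 2 + 1j, -1 + 3j
r1, t1 = abs(z1), cmath.phase(z1)
r2, t2 = abs(z2), cmath.phase(z2)

# product: moduli multiply, arguments add
w = (r1*r2) * complex(math.cos(t1 + t2), math.sin(t1 + t2))
assert abs(w - z1*z2) < 1e-12

# quotient: moduli divide, arguments subtract
q = (r1/r2) * complex(math.cos(t1 - t2), math.sin(t1 - t2))
assert abs(q - z1/z2) < 1e-12
```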
We will exploit the multiplication formula (0-7) to find all n complex roots of the
specific equation
z^n = A,
for any A ∈ C . This equation is one of the few whose roots can always be found explicitly.
The trick is to write A in its polar coordinate form
A = |A| [cos(α + 2kπ ) + i sin(α + 2kπ )],
where α is the argument of A and k is any integer. Although we get the same A no matter
what k is used, as was observed following equation (0-5), we shall retain the arbitrary k
since it is the heart of the process we have in mind. From equation (0-7) we see that
A^(1/n) = |A|^(1/n) [cos((α + 2kπ)/n) + i sin((α + 2kπ)/n)]
in the sense that for any value of the integer k , (A^(1/n))^n = A . As k runs through the integers,
we get only n different angles of the form (α + 2kπ)/n , since the other angles differ from
these n angles by multiples of 2π . For each of these n different angles we obtain a different
complex number A^(1/n) . These n numbers for A^(1/n) are the desired n roots of z^n = A . It
is usually convenient to obtain the angles by letting k = 0, 1, 2, . . . , n − 1 , although any n
integers which do not differ by multiples of n will do.
An example should help clear the air. We shall ﬁnd the three cube roots of −2 , that
is, solve z 3 = −2 . First,
−2 = 2[cos(π + 2kπ ) + i sin(π + 2kπ )],
since the argument of −2 is π while its modulus is 2. Thus, the roots are
z = 2^(1/3) [cos((π + 2kπ)/3) + i sin((π + 2kπ)/3)],   k = 0, ±1, ±2, . . . .
There are only three values of z possible, no matter what k 's are used. These three
cube roots of −2 are
k = 0, 3, 6, . . . :   z1 = 2^(1/3) [cos(π/3) + i sin(π/3)] = 2^(1/3) (1/2 + i√3/2)
k = 1, 4, 7, . . . :   z2 = 2^(1/3) [cos π + i sin π] = −2^(1/3)
k = 2, 5, 8, . . . :   z3 = 2^(1/3) [cos(5π/3) + i sin(5π/3)] = 2^(1/3) (1/2 − i√3/2).
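The same computation can be sketched in a few lines of Python, following the root formula with A = −2, |A| = 2, α = π:

```python
import math

A = -2
r, alpha = abs(A), math.pi
roots = []
for k in range(3):
    theta = (alpha + 2*math.pi*k)/3
    z = r**(1/3) * complex(math.cos(theta), math.sin(theta))
    roots.append(z)
    assert abs(z**3 - A) < 1e-12      # each one really satisfies z^3 = -2

# k = 1 gives the real cube root -2^(1/3)
assert abs(roots[1] - (-2**(1/3))) < 1e-12
```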
It is time-saving to observe that the n roots of unity, that is, of z^n = 1 , can be written
down immediately by utilizing the geometric interpretation of multiplication. All of the
roots have modulus 1, and so must lie on the unit circle |z| = 1 . Dividing the circle into
n equal sectors by radii, the first beginning on the positive x -axis, we find the roots of
unity, wj , at the n successive intersections of these radii with the unit circle. The roots
wj , j = 1, 2, 3 , of z^3 = 1 are illustrated in the figure as the intersections of θ = 0 , θ = 2π/3 ,
and θ = 4π/3 with |z| = 1 . Thus
w1 = cos 0 + i sin 0 = 1,
w2 = cos(2π/3) + i sin(2π/3) = −1/2 + i√3/2,
w3 = cos(4π/3) + i sin(4π/3) = −1/2 − i√3/2.
Exercises
(1) Express the following complex numbers in the form a + bi .
(a) (1 − i)2
(b) (2 + i)(3 − i)
(c) 1/i
(d) (1 + i)/(2 − i)
(e) (1 + i)/(1 + 2i)
(f) i^3 + i^4 + i^271
(2) Compute the absolute values of the complex numbers in Ex. 1.
(3) a) Add (1 + i) and (1 + 2i) using compass and straight-edge.
b) Multiply (1 + i) and (1 + 2i) using compass and straight-edge.
(4) Express in the form r(cos θ + i sin θ), with 0 ≤ θ < 2π :
(a) i
(b) 2i
(c) −2i
(d) 4
(e) −1
(f) −1 + i
(g) (1 − i)3
(h) 1/(1 + i)^2
(i) (1/2)(√3 + i)
(5) Determine the
(a) three cube roots of i, −i , and of 1 + i ,
(b) four fourth roots of −1 and +2
(c) six roots of z 6 = 1 .
(6) Let A be any complex number, A = |A| [cos α + i sin α] , and let w1 , . . . , wn be the
n roots of z n = 1 . Prove that the n roots of z n = A are
z1 = A^(1/n) w1 , z2 = A^(1/n) w2 , . . . , zn = A^(1/n) wn ,
where
A^(1/n) = |A|^(1/n) (cos(α/n) + i sin(α/n))
is the principal n th root of A . This shows that the problem of finding the roots of
a complex number is essentially reduced to the simpler problem of ﬁnding the roots
of unity.
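This exercise can be previewed numerically; the sketch below uses arbitrary sample values A and n, and `cmath.exp` to build the roots of unity:

```python
import cmath, math

# The n roots of z^n = A are the principal root times the n roots of unity.
A, n = 3 - 4j, 5
alpha, r = cmath.phase(A), abs(A)
principal = r**(1/n) * cmath.exp(1j*alpha/n)

for j in range(n):
    w = cmath.exp(2j*math.pi*j/n)        # an nth root of unity
    z = principal * w
    assert abs(z**n - A) < 1e-9          # every product is a root of z^n = A
```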
(7) Draw a sketch of the following sets of points in the complex plane.
(a) { z ∈ C : |z − 2| ≤ 1 }
(b) { z ∈ C : |z − 1 + i| ≤ 2 }
(c) { z ∈ C : |z − 2| > 3 }
(d) { z ∈ C : 1 ≤ |z − 2| ≤ 3 }
(e) { z ∈ C : 1 ≤ |z + i| < 2 }
0.8 Complex Numbers: Completeness Properties, Complex Functions
We have just considered the algebraic properties of complex numbers. Now we look at
inﬁnite sequences of complex numbers. To develop the desired properties of C , we shall
utilize those of R .
Definition: The sequence zn of complex numbers converges to the complex number z if,
given any ε > 0 , there is an N such that |zn − z| < ε for all n > N . We shall again write
zn → z .
In order to apply the theorem known for real sequences to complex sequences, the
following is vital.
Theorem 0.25 Let zn = xn + iyn , and z = x + iy . Then zn converges to z if and only
if both the real and imaginary parts converge to their respective limits. In symbols,
zn → z ⇐⇒ xn → x and yn → y.
Proof: Since zn → z , given any ε > 0 , we can find an N etc. for the zn 's. Now by
equation (0-4)
|xn − x| ≤ |zn − z| < ε   and   |yn − y| ≤ |zn − z| < ε ,
so both xn → x and yn → y .
Conversely, given any ε > 0 , we can find an N1 for the xn 's and an N2 for the yn 's.
Let N be the larger of N1 and N2 , N = max(N1 , N2) . This N works for both the xn
and yn . But
|zn − z| = |xn + iyn − x − iy| ≤ |xn − x| + |yn − y| < 2ε .
Therefore zn → z , completing the proof.
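The theorem can be watched in action on an arbitrary sample sequence (my own example, not from the text):

```python
# zn -> z exactly when the real and imaginary parts converge separately.
z = 1 + 2j
def zn(n):
    return complex(1 + 1/n, 2 - 1/n**2)

eps = 1e-3
N = 1001   # |zn - z| < eps for all n > N, since the 1/n term dominates
for n in range(N + 1, N + 500):
    w = zn(n)
    assert abs(w - z) < eps
    # the same N works for each part, since |Re w - Re z| <= |w - z|
    assert abs(w.real - z.real) < eps and abs(w.imag - z.imag) < eps
```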
This theorem states that a deﬁnition is equivalent to some other property. We could
thus have used either property as a deﬁnition.
Recall that the real numbers were defined so that there would be no "hole" in the real
line. This was the completeness property. It guaranteed that if a sequence of real numbers
an "looked like" they were approaching a limiting value, then indeed there is some a ∈ R
such that an → a . The issue here was to avoid the problem of a sequence of rational
numbers approaching an irrational number—which is a “hole” if our set just consisted of
the rationals. One consequence of the last theorem is that the set of complex numbers C
is also complete.
Theorem 0.26 . Every bounded inﬁnite sequence of complex numbers { zk } has at least
one subsequence which converges to a number z ∈ C . (By bounded, we mean that there is
some r ∈ R such that |zk | < r for all k ).
Proof: Since the { zk } are bounded, we know { xk } and { yk } are also bounded sequences of real numbers. The conclusion is now a consequence of the Bolzano-Weierstrass
theorem 5 applied to { xk } and { yk } , and of Theorem 0.25 just proved. There is a fine
point though: how to get a subsequence of the zk whose real and imaginary parts both
converge. The trick is first to select a subsequence { x_{k_j} } = { Re z_{k_j} } of the { xk } which
converges to some x ∈ R . Then, from the related subsequence { y_{k_j} } = { Im z_{k_j} } , select
a subsequence { y_{k_{j_n}} } which converges to some y ∈ R . Then { x_{k_{j_n}} } also converges to
x ∈ R , so z_{k_{j_n}} → z , and we are done.
With these technical results under our belts, sequences in C become no more diﬃcult
than those in R .
Let us briefly examine the elements of functions of a complex variable. A complex-valued function f (z) of the complex variable z is a mapping of some subset
U ⊂ C
into the complex numbers C , f : U → C . Two examples are f (z) = z^2 and f (z) = 1/z .
Both the domain and range of f (z) = z^2 are all of C , while the domain and range of
f (z) = 1/z are all of C with the exception of 0 .
If f maps R → R , like f (x) = 1 + x or f (x) = e^x , since R ⊂ C , one asks how
the domain of definition of f can be extended from R to C . Of course there are many
possible ways to do this, but most of them are entirely artificial. For f (x) = 1 + x , the
natural extension is f (z) = 1 + z, z ∈ C . Similarly, if P (x) = Σ_{k=0}^{N} ak x^k is any polynomial
defined for x ∈ R , the natural extension to z ∈ C is P (z) = Σ_{k=0}^{N} ak z^k . We are thus led
to extend f (x) = e^x for x ∈ R , to z ∈ C by defining f (z) = e^z . The only problem is that
we have absolutely no idea what it means to raise a real number, e , to a complex power.
Taylor (power) series are needed to resolve this issue. This will be carried out at the end
of Chapter 1.
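As a preview of that power-series definition (a sketch, not the text's construction), the partial sums of Σ z^k / k! do settle down to the value Python's `cmath.exp` assigns to e^z:

```python
import cmath

z = 1 + 1j                 # an arbitrary sample point
s, term = 0j, 1 + 0j
for k in range(1, 30):     # accumulate sum_{k=0}^{28} z^k / k!
    s += term
    term *= z / k
assert abs(s - cmath.exp(z)) < 1e-12
```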
Continuity of complex functions is deﬁned in a natural way. Let z0 be an interior point
of the set U ⊂ C (that is, z0 is not on the boundary of U ).
Definition: The function f : U → C is continuous at the interior point z0 ∈ U if, given
any ε > 0 there is a δ > 0 such that |f (z) − f (z0)| < ε for all z in 0 < |z − z0| < δ .
Reasonable theorems like, if f and g are continuous at the interior point z0 ∈ U , so is
the function f + g , are true too—with the same proof as was given for real-valued functions
of a real variable.
Although we could go on and deﬁne the derivative and integral for complex-valued
functions f (z ) of a complex variable, the development would take too much work. For
our future purposes, it will be sufficient to define the derivative and integral of a complex-valued function f (x) of the real variable x . The first step is to split f (x) into its real
and imaginary parts, that is, find real-valued functions u(x) and v(x) such that f (x) =
u(x) + iv(x) . This decomposition can always be done by taking
u(x) = ( f (x) + f̄ (x) ) / 2 ,   v(x) = ( f (x) − f̄ (x) ) / (2i) .
Since ū(x) = u(x) and v̄(x) = v(x) , both u(x) and v(x) are real-valued functions. It is
clear that f (x) = u(x) + iv(x) .
Example: For the function f (x) = 1 + 2ix , we have f̄ (x) = 1 − 2ix , so
u(x) = ((1 + 2ix) + (1 − 2ix))/2 = 1 ,   v(x) = ((1 + 2ix) − (1 − 2ix))/(2i) = 2x ,
as expected.
Because f (x) is a complex number for every x in the domain where f is defined, we
can compute its absolute value
|f (x)| = √( u^2(x) + v^2(x) ).
With this notion of absolute value, the definitions of continuity and differentiability read
just as if f were itself real-valued. For example:
Definition: The complex-valued function f (x) of the real variable x is differentiable
at the point x0 if
lim_{x→x0} ( f (x) − f (x0) ) / ( x − x0 )
exists.
A more convenient way of dealing with the derivative is supplied by the following
Theorem 0.27 The function f (x) = u(x) + iv(x) is differentiable at a point x0 if and
only if both u(x) and v(x) are differentiable there, and
df/dx = du/dx + i dv/dx .
Proof: We shall use Theorem 0.25. Let { xn } be any sequence whose limit is x0 . Define
the sequences { an } , { αn } , and { βn } by
an = ( f (xn) − f (x0) ) / ( xn − x0 ) ,
αn = ( u(xn) − u(x0) ) / ( xn − x0 ) , and βn = ( v(xn) − v(x0) ) / ( xn − x0 ) .
We must show that limn→∞ an exists if and only if both limits limn→∞ αn and limn→∞ βn
exist, for the existence of these limits is equivalent to the existence of the respective derivatives. But notice that an = αn + iβn , since
an = ( f (xn) − f (x0) ) / ( xn − x0 ) = ( u(xn) + iv(xn) − (u(x0) + iv(x0)) ) / ( xn − x0 ) = αn + iβn .
Thus we can appeal to Theorem 0.25 to conclude that lim an exists if and only if both lim αn
and lim βn exist. The formula f′ = u′ + iv′ is an immediate consequence since
an → f′(x0) , αn → u′(x0) , and βn → v′(x0) .
Examples:
a) If f (x) = 1 + 2ix , then df/dx = (d/dx) 1 + i (d/dx) 2x = 2i .
b) If f (θ) = cos 7θ + i sin 7θ + 2θ − iθ^2 , then
df/dθ = (d/dθ)[2θ + cos 7θ] + i (d/dθ)[−θ^2 + sin 7θ] = 2 − 7 sin 7θ + i[−2θ + 7 cos 7θ] .
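Example (b) can be spot-checked against a central difference quotient (the evaluation point is an arbitrary choice):

```python
import math

def f(t):
    # cos 7t + i sin 7t + 2t - i t^2, split into real and imaginary parts
    return complex(math.cos(7*t) + 2*t, math.sin(7*t) - t**2)

def fprime(t):
    # the derivative computed in the text
    return complex(2 - 7*math.sin(7*t), -2*t + 7*math.cos(7*t))

t, h = 0.37, 1e-6
numeric = (f(t + h) - f(t - h)) / (2*h)
assert abs(numeric - fprime(t)) < 1e-6
```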
A related result which is even easier to prove is
Theorem 0.28 The complex-valued function f (x) = u(x) + iv(x), x ∈ R , is continuous at
x0 ∈ R if and only if both u(x) and v(x) are continuous at x0 .
Proof: An exercise.
Integration is deﬁned more directly.
Definition: Let f (x) = u(x) + iv(x), x ∈ R . If the real-valued functions u(x) and v(x)
are integrable for x ∈ [a, b] , we define the definite integral of f (x) by
∫_a^b f (x) dx = ∫_a^b u(x) dx + i ∫_a^b v(x) dx .
The standard theorems, like if c is any complex constant, then
∫_a^b c f (x) dx = c ∫_a^b f (x) dx ,   and, if a ≤ b ,
| ∫_a^b f (x) dx | ≤ ∫_a^b |f (x)| dx ,
are proved by using the definition above and the corresponding theorems for real functions.
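A numerical sketch of both facts, using a midpoint Riemann sum and the function f (x) = 1 + 2ix on [0, 1] (my own sample choice):

```python
def f(x):
    return complex(1, 2*x)          # 1 + 2ix

n, a, b = 100000, 0.0, 1.0
h = (b - a)/n
# integrate the real and imaginary parts together (midpoint rule)
integral = sum(f(a + (j + 0.5)*h) for j in range(n)) * h
integral_abs = sum(abs(f(a + (j + 0.5)*h)) for j in range(n)) * h

assert abs(integral - (1 + 1j)) < 1e-6     # int u = 1, int v = 1
assert abs(integral) <= integral_abs + 1e-12   # |int f| <= int |f|
```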
We shall, however, need the more diﬃcult
Theorem 0.29 If the complex-valued function f (t) = u(t) + iv(t), t ∈ R , is continuous for
all t ∈ [a, b] , then there is a constant K such that |f (t)| ≤ K for all t ∈ [a, b] . Furthermore,
if x, x0 ∈ [a, b] , then
| ∫_{x0}^{x} f (t) dt | ≤ K |x − x0| .  (0-8)
Notice that the left-hand side absolute value is in the sense of complex numbers.
Proof: Since f (t) is continuous in [a, b] , by Theorem 0.28 so are both u(t) and v(t) . But
a real-valued function which is continuous in a closed and bounded interval is bounded.
Thus there are constants K1 and K2 such that |u(t)| ≤ K1 , |v(t)| ≤ K2 for all t ∈ [a, b] .
Then
|f (t)| = √( u^2(t) + v^2(t) ) ≤ √( K1^2 + K2^2 ) ≡ K .
To prove the inequality (0-8), we use the inequality mentioned before the theorem to see
that if x0 ≤ x ,
| ∫_{x0}^{x} f (t) dt | ≤ ∫_{x0}^{x} |f (t)| dt .
Since |f (t)| ≤ K , we find that
∫_{x0}^{x} |f (t)| dt ≤ K |x − x0| .
Combining these last two inequalities, we obtain the desired inequality (0-8) if x0 ≤ x .
The other case, x ≤ x0 , can be reduced to that already proved by observing that
| ∫_{x0}^{x} f (t) dt | = | − ∫_{x}^{x0} f (t) dt | = | ∫_{x}^{x0} f (t) dt | ≤ K |x0 − x| = K |x − x0| .
Exercises
(1) In the complex sequences below, which ones converge, which do not converge but
have at least one convergent subsequence, and which do neither? In all cases n =
1, 2, 3, . . . .
(a) zn = i/n + 3i − 4
(b) zn = 2i + (−1)^n
(c) zn = n − i
(d) zn = i^n
(e) zn = 1 + i√3 − (−1)^n / (7n)
(f) zn = ((4 + 6i)n − 5) / (1 − 2ni) .
(2) Write the following complex-valued functions f (x) of the real variable x as f (x) =
u(x) + iv(x) , where u and v are real-valued.
(a) f (x) = i + 2(3 − 2i)x^2
(b) f (x) = (1 + 2ix)^2
(c) f (x) = cos 3x^2 − (3 + i) sin x
(d) f (x) = 1/(1 + 2i − x)
(3) (a) Use the definition of the derivative to compute df/dx for the function in Exercise 2a
above.
(b) Find df/dx for all the functions in Exercise 2 above.
(4) Evaluate
(a) ∫_{−1}^{3} (1 + 2ix) dx
(b) ∫_{1}^{4} [x + (1 − i) cos 2x] dx
Chapter 1
Infinite Series
1.1 Introduction
In elementary calculus you have met the notion of the limit of a sequence of numbers (see also
Chapter 0, sections 5 and 7). This concept of limit is just what essentially distinguishes
calculus from algebra. It was crucial in the deﬁnition of the derivative as the limit of a
diﬀerence quotient and the integral as the limit of a Riemann sum. We now propose to
discuss another limiting process, inﬁnite series, in detail.
An inﬁnite series is a sum of the form
Σ_{k=1}^{∞} ak = a1 + a2 + a3 + · · · ,  (1-1)
where the ak 's are real or complex numbers. Since there is no added difficulty we shall
suppose the ak ’s are complex numbers. One immediate trouble is that it would take us an
inﬁnite amount of time to add an inﬁnite sum. For example, what is
(a) Σ_{k=1}^{∞} 1 = 1 + 1 + 1 + 1 + · · · = ?
(b) Σ_{k=1}^{∞} (−1)^(k+1) = 1 − 1 + 1 − 1 + 1 − 1 · · · = ?
(c) Σ_{k=1}^{∞} 1/2^(k−1) = 1 + 1/2 + 1/4 + 1/8 + 1/16 + · · · = ?
Thus, we are faced with the realization that the sum (1-1) is not really well defined, even
in cases where we feel it might make sense.
Our ﬁrst task is to give a more adequate deﬁnition. Let Sn be the sum of the ﬁrst n
terms:
Sn := a1 + a2 + · · · + an = Σ_{k=1}^{n} ak .
Then for each n , we have a complex number Sn , called the nth partial sum of the series
(1-1).
Definition: If limn→∞ Sn = S , where S is a (ﬁnite) complex number, we say that the
inﬁnite series converges to S . If the sequence S1 , S2 , S3 , . . . has no limit, we say that the
inﬁnite series diverges.
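The definition is easy to experiment with; the sketch below computes partial sums for two of the examples above:

```python
# Example (c): S_n = sum_{k=1}^n 1/2^(k-1) = 2(1 - 1/2^n) -> 2.
def S(n):
    return sum(1/2**(k - 1) for k in range(1, n + 1))

assert abs(S(10) - 2*(1 - 1/2**10)) < 1e-12
assert abs(S(60) - 2) < 1e-9

# Example (b): the partial sums of 1 - 1 + 1 - ... oscillate, so no limit.
T = [sum((-1)**(k + 1) for k in range(1, n + 1)) for n in range(1, 9)]
assert T == [1, 0, 1, 0, 1, 0, 1, 0]
```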
For the examples given just above, we have
(a) Sn = Σ_{k=1}^{n} 1 = n → ∞ , so the infinite series diverges to ∞ .
(b) Sn = Σ_{k=1}^{n} (−1)^(k+1) = 1 if n is odd, 0 if n is even; this does not have a limiting value since
it oscillates between 1 and 0.
(c) Sn = Σ_{k=1}^{n} 1/2^(k−1) = 2(1 − 1/2^n) → 2 , so the infinite series converges to the number 2
(we found the sum of the series by realizing it is a simple geometric series:
1 + r + r^2 + · · · + r^N = (1 − r^(N+1))/(1 − r)   for r ≠ 1 ).
With an adequate definition of convergence of infinite series, it is clear that we should
develop some tests for determining if a given series converges. That will be done in the next
section. In preparation, let us examine some simple types of series which occur often and
prove a few useful theorems.
There are two types of series whose sums can always be found, and for which the
question of convergence is exceedingly elementary.
Definition: An inﬁnite geometric series is a series of the form
Σ_{k=0}^{∞} a r^k = a + ar + ar^2 + · · · .
The partial sums are
Sn = a + ar + · · · + ar^n = a (1 − r^(n+1))/(1 − r)   for r ≠ 1 .
Theorem 1.1 The infinite geometric series Σ_{k=0}^{∞} a r^k , a ≠ 0 , converges if and only if
|r| < 1 . Then the sum is a/(1 − r) .
Proof: limn→∞ r^(n+1) exists only if |r| < 1 . Then the limit is zero, so limn→∞ Sn = a/(1 − r) .
Examples:
(a) ∞
k=0 (1 (b) ∞
1+i k
k=0 ( 2 ) (c) ∞
k=1 1 (d) ∞
k
k=1 (−1) + i)k diverges since |1 + i| =
converges since 1+i
2 √ 2 ≥ 1. √ = 2
2 < 1. The sum of this series is 1 + i . diverges since |1| = 1 .
diverges since |−1| = 1 . Definition: An inﬁnite telescopic series is one of the form
∞ (αk − αk+1 ) = (α1 − α2 ) + (α2 − α3 ) + (α3 − α4 ) + · · · .
k=1 It is clear that most of the terms cancel each other.
Theorem 1.2 If αk → α , then ∞
k=1 (αk − αk+1 ) = α1 − α . Proof: Sn = (α1 − α2 ) + (α2 − α3 ) + · · · + (αn − αn+1 ) = α1 − αn+1 → α1 − α .
Examples:
1
1
(a) 112 + 213 + 314 + · · · = ∞ k(k1 = ∞ ( k − k+1 ) = 1 .
k=1
k=1
·
·
·
+1)
1
1
(b) 4·11−1 + 4·21−1 + 4·31−1 + · · · = 2 ∞ ( 2k1 1 − 2k1 ) = 2
2
2
2
k=1
−
+1 a
1− r 1.1. INTRODUCTION 35 We close this section with some reasonable (and desirable) theorems. The proofs are
immediate consequences of the deﬁnition of convergence of inﬁnite series and the related
theorems about limits of sequences of numbers.
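Before those theorems, the geometric and telescoping sums above can be checked numerically (the geometric sample values are arbitrary):

```python
# Geometric: sum a r^k -> a/(1-r) when |r| < 1.
a, r = 1 + 1j, (1 + 1j)/2
S = sum(a * r**k for k in range(200))
assert abs(S - a/(1 - r)) < 1e-12

# Telescoping example (a): partial sum is alpha_1 - alpha_{n+1} = 1 - 1/(n+1).
n = 10**5
T = sum(1/(k*(k + 1)) for k in range(1, n + 1))
assert abs(T - (1 - 1/(n + 1))) < 1e-9
```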
Theorem 1.3 If Σ_{k=1}^{∞} ak → a , and c is any number, then Σ_{k=1}^{∞} c ak → ca .
Theorem 1.4 If Σ_{k=1}^{n} ak → a and Σ_{k=1}^{n} bk → b , then Σ_{k=1}^{n} (ak + bk) → a + b .
Theorem 1.5 Let ak = αk + iβk , where αk and βk are real. The infinite series Σ ak
converges if and only if the two real series Σ αk and Σ βk both converge. That is, an
infinite complex series converges if and only if both its real and imaginary parts converge.
Proof: We must look at the partial sums. Let σn = Σ_{k=1}^{n} αk , and τn = Σ_{k=1}^{n} βk . Then
Sn = Σ_{k=1}^{n} (αk + iβk) = Σ_{k=1}^{n} αk + i Σ_{k=1}^{n} βk = σn + iτn .
But we know from Theorem 0.25 of Chapter 0 that the complex sequence Sn converges if
and only if its real part, σn , and imaginary part, τn , both converge—in other words, if
the series Σ αk and Σ βk both converge.
∞
∞
∞
k of the series
k=1 ak could have been any other letter. Thus
k=1 ak =
j =1 aj . This
is perhaps indicated most clearly if we left an empty box instead of using any letter at all:
. The connecting line means that the same letter must be used in both boxes. Now you
can ﬁll in any letter that makes you happy. No matter w hat you write, it still means
a1 + a2 + a3 + · · · . In a similar way, the index need not begin with 1. Thus, for example,
∞
∞
k=1 ak =
k=17 ak−16 = a1 + a2 + · · · . Although this manipulation looks like unwanted
silliness here, it is sometimes quite useful. Later on this year you will need it. The related
transformation for integrals is illustrated by
3
2 1
dt =
t 2
1 1
dt.
t+1 Exercises
(1) Find a closed form expression for the nth partial sum of the following inﬁnite series
and determine if they converge.
(a) 2
3 + 2
9 (b) 1 + i +
(c) 1
2! + ∞
2
k=1 3k . 2
2
27 + · · · + 3n + · · · =
i2 + i3 + · · · + in + · · · + 2
3! + 3
4! + ··· = ∞ k −1
k=2 k! = ∞
1
k=2 ( (k−1)! 3
n
(d) ln 1 + ln 2 + ln 4 + · · · + ln( n+1 ) + · · ·
2
3 (e)
(f) ∞
3− 4i m
m=0 ( 7 )
∞
2− 3i
n=1 n(n+1) 1
− k) 36 CHAPTER 1. INFINITE SERIES (2) The repeating decimal 1.565656 · · · can be written as
1 + 56/10² + 56/10⁴ + 56/10⁶ + ··· = 1 + 56 ∑_{k=1}^∞ (1/10²)^k .
Sum the geometric series and find what rational number the repeating decimal represents. In a similar way, every decimal which eventually begins to repeat is a rational number. What rational number is represented by 1.4723?
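The computation asked for in Exercise (2) can be sketched with exact rational arithmetic; the value 155/99 is what the geometric series yields (the code itself is not part of the exercise):

```python
from fractions import Fraction

# 1.565656... = 1 + 56 * sum_{k>=1} (1/10^2)^k.  The geometric tail sums to
# (1/100) / (1 - 1/100) = 1/99, so the decimal equals 1 + 56/99 = 155/99.
r = Fraction(1, 100)
tail = r / (1 - r)           # closed form of the geometric series
value = 1 + 56 * tail
assert value == Fraction(155, 99)
assert abs(float(value) - 1.565656) < 1e-5   # matches the repeating decimal
```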
(3) A ball is dropped from a height of 20 feet. Every time it bounces, it rebounds to 3/4 of its height on the previous bounce. What is the total distance traveled by the ball?
(4) If ∑_{k=1}^∞ a_k → a and ∑_{k=1}^∞ b_k → b, and if α and β are any numbers, prove that ∑_{k=1}^∞ (αa_k + βb_k) → αa + βb .
(5) If a_n > 0 and ∑ a_n converges, prove that ∑ 1/a_n diverges.
(6) Does the convergence of ∑_{n=1}^∞ a_n imply the convergence of ∑_{n=1}^∞ (a_n + a_{n+1}) ?
(7) (a) If the partial sums of ∑ a_n are bounded, and { b_n } is a strictly decreasing sequence with limit 0, b_n ↓ 0, prove that ∑ a_n b_n converges.
(b) Use (a) to prove that if ∑_{n=1}^∞ n a_n converges then so does the series ∑_{n=1}^∞ a_n .
(c) Use (a) to discuss the convergence of ∑_{n=1}^∞ (sin nx)/n .

1.2 Tests for Convergence of Positive Series

Tests to determine convergence are of several types: i) those that give sufficient conditions,
ii) those that give necessary conditions, and iii) those that give both necessary and suﬃcient
conditions. Theorem 1 of the last section governing geometric series was of the last type;
however it is more common to ﬁnd convergence tests of the ﬁrst two types since they are
usually easier to come by. You should be careful to observe the nature of a test. A simple
theorem should make the point clear.
Theorem 1.6. If the series ∑_{k=1}^∞ a_k (where a_k may be complex) converges, then lim_{k→∞} |a_k| = 0.
Proof: Let S_n = a_1 + a_2 + ··· + a_n. Then |a_n| = |S_n − S_{n−1}|. As n → ∞ both S_n and S_{n−1} tend to the same limit, so |a_n| → 0.
Returning to the point made before, this theorem states a necessary but not sufficient (as we shall see) condition for an infinite series to converge. We can apply it to see that ∑ k/(k+1) diverges, since k/(k+1) → 1 ≠ 0. Thus this theorem is useful as a quick crude test to weed out series which diverge badly. But all it tells us about the series ∑_{k=1}^∞ 1/k, for which 1/k → 0 so the criterion of the theorem is satisfied, is that it might converge. In fact, this series diverges too, as we shall now prove.
∑_{k=1}^∞ 1/k = 1 + 1/2 + 1/3 + 1/4 + 1/5 + ··· + 1/8 + 1/9 + ··· + 1/16 + 1/17 + ··· + 1/32 + ···
≥ 1 + 1/2 + 1/4 + 1/4 + 1/8 + ··· + 1/8 + 1/16 + ··· + 1/16 + 1/32 + ··· + 1/32 + ···
= 1 + 1/2 + 1/2 + 1/2 + 1/2 + ··· .
Thus S_1 = 1, S_2 = 1 + 1/2, S_4 > 1 + 1/2 + 1/2 = 1 + 2·(1/2), S_8 > 1 + 3·(1/2), S_16 > 1 + 4·(1/2), …, S_{2^n} > 1 + n·(1/2). We can easily see that as n → ∞, S_{2^n} → ∞, so the series ∑ 1/k, called the harmonic series, diverges.
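The grouping argument is easy to confirm numerically; a small sketch (the range of n is arbitrary):

```python
# Partial sums of the harmonic series at N = 2^n exceed 1 + n/2 (for n >= 2),
# exactly as the grouping argument predicts, so S_N grows without bound.
def harmonic(N):
    return sum(1.0 / k for k in range(1, N + 1))

for n in range(2, 15):
    assert harmonic(2**n) > 1 + n / 2
```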
For the many series which slip through the test of Theorem 6, more refined criteria are needed. The criteria we shall present in the remainder of this section are for series with positive terms, a_n ≥ 0. Application of these criteria to series with complex terms will be made in the next section.

Theorem 1.7. If a_k ≥ 0 for each k, then the series ∑_{k=1}^∞ a_k converges if and only if the sequence of partial sums is bounded from above.

Proof: Since all the a_k's are non-negative, S_{n+1} ≥ S_n. Thus the S_n's form a monotone increasing sequence of real numbers. By Theorems 6 and 8 of Chapter 0, this sequence converges if and only if it is bounded.

Example: The series ∑_{k=1}^∞ 1/k! of positive terms converges, since
1/k! = 1/(1·2·3···k) ≤ 1/(1·2·2···2) = 1/2^{k−1} ,
so
S_n = ∑_{k=1}^n 1/k! ≤ ∑_{k=1}^n 1/2^{k−1} ≤ ∑_{k=0}^∞ 1/2^k = 2 .
The convergence now follows since S_n is bounded from above.
We can extract an exceedingly useful idea from these examples: check the convergence
of a given series by comparing it with another series which we know to converge or diverge.
Theorem 1.8 (comparison test). Let ∑ a_k and ∑ b_k be two positive series for which a_k ≤ b_k for k > N. Then
i) if ∑ b_k converges, so does ∑ a_k ;
ii) if ∑ a_k diverges, so does ∑ b_k .
Proof: Let s_n = ∑_{k=N+1}^{n+N+1} a_k and t_n = ∑_{k=N+1}^{n+N+1} b_k . Then s_n ≤ t_n for all n, so i) if t_n → t, then s_n is bounded (s_n ≤ t); ii) if s_n → ∞, then t_n → ∞ too.
Remark: The “k > N” part of the hypothesis reflects the fact that it is only the infinite tail of an infinite series that we need to worry about. Any finite number of terms can always be added later on.
Examples:
(a) ∑_{k=1}^∞ 1/(2^k + 1) converges since 1/(2^k + 1) < 1/2^k and ∑ 1/2^k converges.
(b) ∑_{k=1}^∞ 1/√k diverges since 1/√k ≥ 1/k (for k ≥ 1) and ∑ 1/k diverges.
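Example (a) can be illustrated directly; a sketch (truncation at 40 terms is my choice):

```python
# 1/(2^k + 1) < 1/2^k term by term, so the partial sums of the first series
# are trapped below those of the geometric series, which sum to less than 1.
terms = [1 / (2**k + 1) for k in range(1, 40)]
dominating = [1 / 2**k for k in range(1, 40)]
assert all(a < b for a, b in zip(terms, dominating))
assert sum(terms) < sum(dominating) < 1.0   # sum_{k>=1} 1/2^k = 1
```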
Our next test is based upon comparison with a geometric series ∑ r^n.

Theorem 1.9 (ratio test). Let ∑ a_n be a series with positive terms such that the following limit exists:
lim_{n→∞} a_{n+1}/a_n = L .
Then
i) if L < 1, the series converges;
ii) if L > 1, the series diverges;
iii) if L = 1, the test is inconclusive.
Remark: If the assumed limit does not exist, a variant of the theorem is still true but we
shall not discuss it.
Proof: i) If L < 1, pick any r, L < r < 1. Then there is an N such that for all n ≥ N, a_{n+1}/a_n < r. Therefore a_n < r a_{n−1} < r² a_{n−2} < ··· < r^{n−N} a_N, so that a_n ≤ K r^n for n ≥ N, where K = a_N / r^N. The series ∑_{n=1}^∞ a_n = ∑_{n=1}^{N−1} a_n + ∑_{n=N}^∞ a_n consists of a finite sum plus an infinite tail which is dominated by the geometric series ∑ K r^n. Since r < 1, the geometric series converges and by the comparison test, so does ∑ a_n.
ii) If L > 1, then a_{n+1} > a_n for all n > N; thus lim_{n→∞} a_n ≠ 0. By Theorem 6, the series ∑ a_n cannot converge.
iii) This is seen from the two examples.
(a) ∑ 1/n, with lim_{n→∞} a_{n+1}/a_n = lim_{n→∞} n/(n+1) = 1, which we know diverges.
(b) ∑ 1/(n(n+1)), with lim_{n→∞} a_{n+1}/a_n = lim_{n→∞} n(n+1)/((n+1)(n+2)) = lim_{n→∞} n/(n+2) = 1, which we know (Theorem 2, Example a) converges.
In both these cases L = 1. You should notice that the criterion uses the limiting value of a_{n+1}/a_n. The divergent harmonic series ∑ 1/n, whose ratio n/(n+1) is less than one for finite n but 1 in the limit, shows the mistake you will make if you use the ratio before passing to the limit.
Examples:
(a) ∑ 1/n! : Since lim_{n→∞} (a_{n+1}/a_n) = lim_{n→∞} (n!/(n+1)!) = lim_{n→∞} 1/(n+1) = 0 < 1, the ratio is less than one so the series converges.
(b) ∑ 10^n/n! : Since lim_{n→∞} (a_{n+1}/a_n) = lim_{n→∞} (10/(n+1)) = 0 < 1, the series converges.
(c) ∑ n!/2^n : Since lim_{n→∞} (a_{n+1}/a_n) = lim_{n→∞} ((n+1)/2) = ∞, the series diverges.
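The limiting ratios in these three examples can be sampled numerically; a rough sketch (n = 100 is an arbitrary sample point and the tolerances are mine):

```python
from math import factorial

n = 100
# (a) a_n = 1/n!    : a_{n+1}/a_n = 1/(n+1)   -> 0
ratio_a = (1 / factorial(n + 1)) / (1 / factorial(n))
# (b) a_n = 10^n/n! : a_{n+1}/a_n = 10/(n+1)  -> 0
ratio_b = (10**(n + 1) / factorial(n + 1)) / (10**n / factorial(n))
# (c) a_n = n!/2^n  : a_{n+1}/a_n = (n+1)/2   -> infinity
ratio_c = (factorial(n + 1) / 2**(n + 1)) / (factorial(n) / 2**n)

assert abs(ratio_a - 1 / (n + 1)) < 1e-12
assert abs(ratio_b - 10 / (n + 1)) < 1e-12
assert abs(ratio_c - (n + 1) / 2) < 1e-6
```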
Our last test for series with positive terms is associated with a picture. The crux of the matter is very simple and clever. We associate an area with the infinite series ∑_{n=1}^∞ a_n. For the term a_n we use a rectangle between n ≤ x ≤ n+1 of height a_n and base one. Then the sum of the infinite series is represented by the total area under the rectangles. Now by Theorem 7, if all the a_n's are positive we know the series converges if the total area is finite. Thus, if we can find a function f(x) whose graph lies above the rectangles, and whose total area is finite, then we know the area contained in the rectangles is finite and so the series converges.

Theorem 1.10 (integral test). Let ∑_{n=1}^∞ a_n be a series of positive decreasing terms: 0 < a_{n+1} ≤ a_n, and f(x) a continuous decreasing function with f(n) = a_n. Then the sequences
S_N = ∑_{n=1}^N a_n and T_N = ∫_1^N f(x) dx
either both converge or both diverge; in fact, S_N − a_1 ≤ T_N ≤ S_{N−1} .
Proof: First of all,
∫_1^N f(x) dx = ∫_1^2 f(x) dx + ∫_2^3 f(x) dx + ··· + ∫_{N−1}^N f(x) dx = ∑_{n=1}^{N−1} ∫_n^{n+1} f(x) dx .
Since in the interval n ≤ x ≤ n+1 we know that
a_n = f(n) ≥ f(x) ≥ f(n+1) = a_{n+1} ,
we see that
∫_n^{n+1} f(n) dx = a_n ≥ ∫_n^{n+1} f(x) dx ≥ ∫_n^{n+1} f(n+1) dx = a_{n+1} .
Adding these up, we find
∑_{n=1}^{N−1} a_n ≥ ∫_1^N f(x) dx ≥ ∑_{n=1}^{N−1} a_{n+1} = ∑_{n=2}^N a_n .
Thus
S_{N−1} ≥ T_N ≥ S_N − a_1 .
From this last inequality, we see that lim_{N→∞} T_N is finite if and only if lim_{N→∞} S_N is finite. Since S_N and T_N are both monotone increasing sequences, by Theorem 7 the sequences converge or diverge together. And we are done.
Examples:
(a) ∑_{n=1}^∞ 1/n^p converges if p > 1, diverges if p ≤ 1. We use the function f(x) = 1/x^p, which satisfies the hypothesis of the theorem, and examine the integral
T_N = ∫_1^N dx/x^p = (N^{1−p} − 1)/(1 − p) if p ≠ 1, and T_N = ln N if p = 1.
As N → ∞, ln N → ∞, and so does N^{1−p} if p < 1, while N^{1−p} → 0 if p > 1. Therefore T_N converges if and only if p > 1, so by our theorem ∑_{n=1}^∞ 1/n^p converges if and only if p > 1. In the special case p = 1 we have again proven that the harmonic series ∑ 1/n diverges. Another often seen special case is p = 2, ∑ 1/n², which converges. Sometime later we shall prove the amazing ∑_{n=1}^∞ 1/n² = π²/6 .
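The bracketing S_N − a_1 ≤ T_N ≤ S_{N−1} of Theorem 1.10 can be watched numerically for the p-series; a sketch (N = 1000 and p = 2 are my choices):

```python
# Partial sums S_N of sum 1/n^p bracketed by the integral T_N = int_1^N dx/x^p.
N, p = 1000, 2
S = [0.0]                              # S[k] = sum of the first k terms
for n in range(1, N + 1):
    S.append(S[-1] + 1 / n**p)
T_N = (N**(1 - p) - 1) / (1 - p)       # = 1 - 1/N for p = 2
assert S[N] - 1 <= T_N <= S[N - 1]     # S_N - a_1 <= T_N <= S_{N-1}
```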
(b) ∑_{n=2}^∞ 1/(n ln n) diverges since, as N → ∞, ∫_2^N dx/(x ln x) = ln(ln N) − ln(ln 2) → ∞ .

Exercises
(1) Determine if the following series converge or diverge.
(a) ∑_{n=1}^∞ 1/(n²+1)
(b) ∑_{n=1}^∞ 1/(2n−1)
(c) ∑_{n=2}^∞ 1/(n(ln n)²)
(d) ∑_{n=1}^∞ 1/(10n²)
(e) ∑_{n=1}^∞ n/(n²+1)
(f) ∑_{n=1}^∞ 1/(2n+3)
(g) ∑_{n=1}^∞ (cos² n)/2^n
(h) ∑_{n=1}^∞ √n/(n³+1)
(i) ∑_{n=1}^∞ n²/2^n
(j) ∑_{n=1}^∞ n e^{−n²}
(k) ∑_{n=1}^∞ 1/√(n(n+1)(n+2))
(l) ∑_{n=1}^∞ n!/2^{2n}
(m) ∑_{n=1}^∞ |a_n|/10^n , where |a_n| < 10
(n) ∑_{n=1}^∞ n^p e^{−n} , p ∈ R
∈R (2) If an ≥ 0 and bn ≥ 0 for all n ≥ 1 , and if there is a constant c such that an ≤ cbn ,
prove that the convergence of
bn implies the convergence of
an .
1
(3) Use the geometric idea of the integral test to show limn→∞ [1 + 1 + · · · + n − ln n]
2
1
converges to a constant γ , and show that 2 < γ < 1 . γ is called Euler’s constant. (4) If an converges, where an ≥ 0 , prove that an
1+an also converges. (5) (a). If
an converges, where an ≥ 0 , and cn have the property 0 ≤ cn ≤ K , the
same K for all n , then prove that
cn an converges.
(b). Deduce the result of Exercise 4 from Exercise 5a.
(6) Use the geometric idea behind the integral test to prove that
(a). ln n! = ln 1 + ln 2 + ln 3 + · · · + ln n >
From this deduce that n
1 ln x dx = n ln n − n + 1 when n ≥ 2 . (b). n! > e( n )n , when n ≥ 2 .
e
(c). As an application of (b), prove that limn→∞ xn
n! = 0. 1
(7) (a). Use the idea in the proof of the divergence of the harmonic series,
n , to prove
the following test for convergence: Let { an } be a positive monotonically decreasing
sequence. Then
an converges or diverges respectively if and only if the “condensed”
n a n converges or diverges.
series
22 (b). Apply the test of part (a) to again prove that
diverges if p ≤ 1 . 1
np converges if p > 1 , and (c). Apply the test of part (a) to determine the values of p for which the series
∞
1
n=2 n(ln n)p converges and diverges. 1.3. ABSOLUTE AND CONDITIONAL CONVERGENCE 1.3 41 Absolute and Conditional Convergence The tests just given for series with positive terms can be applied to many series with complex
terms a_n by utilizing the concept of absolute convergence.

Definition: The series ∑_{k=1}^∞ a_k, where the a_k may be complex numbers, converges absolutely if the series of positive numbers ∑_{k=1}^∞ |a_k| converges. It is called conditionally convergent if ∑_{k=1}^∞ a_k converges but ∑_{k=1}^∞ |a_k| diverges.
Absolute convergence is stronger than ordinary convergence because

Theorem 1.11. If ∑_{n=1}^∞ |a_n| converges, then ∑_{n=1}^∞ a_n converges.

Proof: Let a_n = α_n + iβ_n. We shall show that the real series ∑ α_n and ∑ β_n both converge. Then by Theorem 5, ∑ a_n converges too. To show that ∑ α_n converges, let c_n = α_n + |a_n|. Since |α_n| ≤ √(α_n² + β_n²) = |a_n|, we know that 0 ≤ c_n ≤ 2|a_n|. Thus the positive series ∑ c_n is bounded, ∑ c_n ≤ 2 ∑ |a_n| < ∞, and so converges by the comparison test (Theorem 8). Since ∑ α_n = ∑ (c_n − |a_n|), and both ∑ c_n and ∑ |a_n| converge, ∑ α_n also converges by Theorem 4. Similarly, by taking d_n = β_n + |a_n|, the series ∑ d_n converges, from which we can conclude that ∑ β_n converges.
Examples:
(a) The complex series ∑_{n=1}^∞ i^{n−1}/n² = 1/1² + i/2² + i²/3² + i³/4² + ··· converges absolutely, since |i^{n−1}/n²| = 1/n² and the positive series ∑_{n=1}^∞ 1/n² converges.
(b) 1 + 1/2² − 1/2³ − 1/2⁴ + 1/2⁵ + 1/2⁶ − 1/2⁷ − 1/2⁸ + ··· , which is the geometric series ∑ 1/2^n with negative signs thrown in, converges absolutely since ∑ 1/2^n converges.
(c) ∑ r^n, r complex, converges absolutely if ∑ |r|^n converges, that is, if |r| < 1.
(d) 1 − 1/2 + 1/3 − 1/4 + 1/5 − ··· = ∑_{n=1}^∞ (−1)^{n+1}/n, the alternating harmonic series, does not converge absolutely because ∑ 1/n diverges. It does converge though, as we shall see shortly. Thus the alternating harmonic series is conditionally convergent.
On the basis of this last theorem, many complex series can be proved to converge by
proving they converge absolutely. Since absolute convergence concerns itself with series
having only positive terms, all the tests for convergence developed in the previous section
may be used. This is the most common way of proving a complex series converges. If it
does not converge absolutely, the proof of convergence will usually be more diﬃcult and use
special ingenuity based on the particular series at hand.
There is one case of conditional convergence which is easy to treat, that of alternating
series.
Definition: A series of real numbers is called alternating if the positive and negative terms
occur alternately. They have the form
∑_{n=1}^∞ (−1)^{n−1} a_n = a_1 − a_2 + a_3 − a_4 + ··· ,
where the a_n's are all positive.

Theorem 1.12. The alternating series ∑_{n=1}^∞ (−1)^{n−1} a_n , a_n > 0, converges if i) the a_n are monotone decreasing (a_n ↓) and ii) lim_{n→∞} a_n = 0. If S is the sum of the series, the inequality
0 < |S − S_N| < a_{N+1}    (1-2)
shows how much the Nth partial sum differs from the limit S. In words, inequality (2) says that the error which results by using the first N terms is less than the first neglected term a_{N+1}.
Proof: The idea is quite simple. Observe that since a_n ↓, S_{2n} − S_{2n−2} = a_{2n−1} − a_{2n} > 0, so the S_{2n}'s increase. Similarly the S_{2n+1}'s decrease. Also both sequences are bounded: from below by S_2 and from above by S_1 (you should check this). Therefore by Theorem 8 of Chapter 0, the bounded monotonic sequences S_{2n} and S_{2n+1} converge to real numbers S and Ŝ respectively. Let us show that S = Ŝ:
Ŝ − S = lim_{n→∞} S_{2n+1} − lim_{n→∞} S_{2n} = lim_{n→∞} (S_{2n+1} − S_{2n}) = lim_{n→∞} a_{2n+1} = 0 .
Thus the alternating series converges to the unique limit S. All that is left to verify is inequality (2). Because S_{2n} is increasing and S_{2n+1} is decreasing, we know that
S_{2n} < S and S < S_{2n+1} .
Therefore
0 < S − S_{2n} < S_{2n+1} − S_{2n} = a_{2n+1} and 0 < S_{2n−1} − S < S_{2n−1} − S_{2n} = a_{2n} .
These two inequalities are the cases N even and N odd in (2).
Examples:
(a) ∑_{n=1}^∞ (−1)^{n−1}/n converges since it is an alternating series and 1/n decreases monotonically to zero. Later we shall show that its sum is ln 2.
(b) ∑_{n=2}^∞ (−1)^n/ln n converges since 1/ln n decreases monotonically to zero.
(c) ∑_{n=1}^∞ (−1)^{n−1} n/(n+1) diverges by Theorem 6 since lim_{n→∞} (−1)^{n−1} n/(n+1) is not zero.

Exercises
(1) Determine which of the following series converge absolutely, converge conditionally, or diverge.
(a) ∑_{n=1}^∞ (−1)^{n+1}/√n
(b) ∑_{n=1}^∞ (2−3i)^n/n!
(c) ∑_{k=2}^∞ (2k+i)²/e^k
(d) ∑_{n=1}^∞ (−1)^{n−1} (ln n)/n
(e) 1 − 1/2 + 1/3 − 1/2² + 1/5 − 1/2³ + 1/7 − 1/2⁴ + 1/9 − ···
(f) ∑_{n=1}^∞ 1/(n² + 2i)
(g) ∑_{n=1}^∞ 1/(n + 2i)
(h) ∑_{n=1}^∞ (−1)^{n−1}/(n + 2i)
(i) ∑_{n=1}^∞ (−1)^{n−1}/n^p , p > 0
(j) ∑_{n=1}^∞ (−1)^n (1+i)^n/(2n² + 1)
(k) ∑_{n=1}^∞ (cos nθ)/n² , θ arbitrary.
(2) If ∑ a_n and ∑ b_n are absolutely convergent, and α and β are any complex numbers, prove that ∑ (αa_n + βb_n) also converges absolutely.
(3) Show that ∑_{n=1}^∞ n z^n converges absolutely if |z| < 1.
(4) Show that for any θ ∈ R, ∑_{n=0}^∞ cos nθ diverges, and that if θ ≠ 0, ±π, ±2π, …, then ∑_{n=0}^∞ sin nθ also diverges.

1.4 Power Series, Infinite Series of Functions

As you will all agree, the simplest functions are polynomials. With infinite series at hand,
it is reasonable to consider an “inﬁnite polynomial”
a_0 + a_1 z + a_2 z² + a_3 z³ + ··· = ∑_{n=0}^∞ a_n z^n .
Because of the appearance of the powers of z, this is called a power series. The question
of convergence of a power series is trivial at z = 0 , for then we have only the one term a0 .
Does this series converge for any other values of z , and if so, for which ones?
The answer depends on the coeﬃcients an , but in any case, the set of complex numbers,
z ∈ C , for which the series converges is always a disc |z | < ρ —with possibly some additional
points on the boundary |z| = ρ —in the complex plane C. This number ρ is called the
radius of convergence of the power series. We shall ﬁrst prove that the set z ∈ C for which
a power series converges is always a disc. Then we shall give a way of computing the radius
ρ of that disc.
Theorem 1.13. The set of z ∈ C for which the power series ∑ a_n z^n converges is always
a disc |z| < ρ, inside of which it even converges absolutely. We do not exclude the two
extreme possibilities that the radius of this disc is zero or inﬁnity.
The series might converge at some, none, or all of the points on the boundary of the
disk |z | = ρ .
Proof: We shall show that if the series converges for any ζ ∈ C, then it converges absolutely for all complex z with |z| < |ζ|. If ζ = 0, there is nothing to prove, so assume ζ ≠ 0. Because ∑ a_n ζ^n converges, lim_{n→∞} |a_n ζ^n| = 0. Thus all the terms are bounded in absolute value; that is, there is an M such that |a_n ζ^n| < M for all n. Then, since
|a_n z^n| = |a_n ζ^n| |z/ζ|^n < M |z/ζ|^n for all n,
the series ∑ |a_n z^n| is dominated by ∑ M |z/ζ|^n. But this last series is a geometric series which does converge since |z| < |ζ|, so |z/ζ| < 1. Thus by the comparison test ∑ a_n z^n converges absolutely for all z ∈ C with |z| < |ζ|.
Therefore, if the power series ∑ a_n z^n converges for some complex number ζ, then it converges in the whole disc |z| < |ζ|. The radius of convergence ρ is then the radius of the largest disc |z| < ρ for which the series converges.
See Exercise 3 for examples concerning convergence on the boundary of the disk.
Let us now give a method of computing ρ which covers most cases arising in practice.
Theorem 1.14. If lim_{n→∞} a_{n+1}/a_n = L exists, the power series ∑ a_n z^n has radius of convergence ρ = 1/L if L ≠ 0, and ρ = ∞ if L = 0. In other words, if L ≠ 0 the series converges in the disc |z| < 1/L and diverges if |z| > 1/L. On the circumference |z| = 1/L anything may happen (see Exercise 3 at the end of this section). If L = 0, the series converges in the whole complex plane.
Proof: This is a simple application of the ratio test. The series converges if the limit of
the ratio of successive terms, lim_{n→∞} |a_{n+1} z^{n+1} / (a_n z^n)|, is less than one and diverges if it is greater than one. Thus we have convergence if
lim_{n→∞} |a_{n+1} z / a_n| = |z| L < 1, i.e. if |z| < 1/L ,
and divergence if
lim_{n→∞} |a_{n+1} z / a_n| = |z| L > 1, i.e. if |z| > 1/L .
Remark: In the one additional case a_{n+1}/a_n → ∞ as n → ∞, the series diverges for every z ≠ 0, as can easily be seen again by the ratio test.
Examples:
(a) ∑_{n=0}^∞ z^n converges where lim_{n→∞} |z^{n+1}/z^n| < 1, that is, for |z| < 1.
(b) ∑_{n=0}^∞ n z^n/2^n converges where lim_{n→∞} |((n+1)z^{n+1}/2^{n+1}) / (n z^n/2^n)| < 1. Since
((n+1)z^{n+1}/2^{n+1}) / (n z^n/2^n) = ((n+1)/n)(z/2) → z/2 ,
the series converges for all |z| < 2.
(c) ∑_{n=0}^∞ z^n/n! converges where lim_{n→∞} |(z^{n+1}/(n+1)!) / (z^n/n!)| < 1. Since
(z^{n+1}/(n+1)!) / (z^n/n!) = z/(n+1) → 0 ,
the series converges for all z ∈ C, that is, in the whole complex plane.
(d) ∑_{n=0}^∞ n! z^n converges where lim_{n→∞} |(n+1)! z^{n+1}/(n! z^n)| < 1. But
|(n+1)! z^{n+1}/(n! z^n)| = |(n+1) z| → ∞
unless z = 0. Thus the ratio is less than one only at z = 0, so the series converges only at the origin.
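Theorem 1.14 also suggests a numerical estimate of ρ from the coefficient ratio; a sketch for example (b), where a_n = n/2^n so L = 1/2 and ρ = 2 (the sample index and tolerances are mine):

```python
# Coefficient ratio a_{n+1}/a_n = ((n+1)/2^(n+1)) / (n/2^n) = (n+1)/(2n),
# which tends to L = 1/2, so the radius of convergence is rho = 1/L = 2.
def coeff(n):
    return n / 2**n

n = 500
L = coeff(n + 1) / coeff(n)      # = 501/1000 at n = 500
assert abs(L - 0.5) < 1e-2
rho = 1 / L
assert abs(rho - 2) < 1e-2
```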
Only minor modiﬁcations are needed for the more general power series
a_0 + a_1 (z − z_0) + a_2 (z − z_0)² + ··· = ∑_{n=0}^∞ a_n (z − z_0)^n ,
where z_0 ∈ C. Again the series converges in a disc in the complex plane, only now the disc
has its center at z0 instead of the origin, so if the radius of convergence is ρ , the series
converges for |z − z0 | < ρ . An example should make this clear.
Example: ∑_{n=1}^∞ (z − 2i)^n/n . By the ratio test, this converges when
lim_{n→∞} | ((z − 2i)^{n+1}/(n+1)) / ((z − 2i)^n/n) | < 1,
n lim n→∞ that is, when |z − 2i| < 1 . This is a disc with center at 2i and radius 1.
A few words should be said about real power series ∑ a_n (x − x_0)^n where both x and x_0 are real (some people only use this phrase if the a_n are also real). This is a special case of ∑ a_n (z − z_0)^n where z_0 is on the real axis and we only ask for what real z the series converges. However we know that ∑ a_n (z − z_0)^n converges only for those z in the disc of convergence |z − z_0| < ρ, and possibly some boundary points. Thus the real values of z for which the series converges are exactly those points on the real axis which are also inside the disc of convergence of the complex power series. In particular the series ∑ a_n (x − x_0)^n, with both x and x_0 real, converges for |x − x_0| < ρ, i.e., in the interval x_0 − ρ < x < x_0 + ρ.
Example: For what x ∈ R does ∑_{n=0}^∞ (1/2^n)(x − 1)^n converge? The related complex series ∑_{n=0}^∞ (1/2^n)(z − 1)^n converges in the disc |z − 1| < 2. The points on the real axis which are in this disc are |x − 1| < 2, which is −1 < x < 3. A direct check shows the series diverges at both end points x = −1 and x = 3.
If ∑ a_n and ∑ b_n both converge, can we define their product in a meaningful way,
(∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) = ∑_{n=0}^∞ c_n ?
and if so, does the resulting series converge? The most simple-minded approach is to insert powers of z (a bookkeeping device), giving (∑ a_n z^n)(∑ b_n z^n), try long multiplication and
see what happens. A computation shows that
(a_0 + a_1 z + a_2 z² + ···)(b_0 + b_1 z + b_2 z² + ···) = a_0 b_0 + (a_0 b_1 + a_1 b_0) z + (a_0 b_2 + a_1 b_1 + a_2 b_0) z² + ··· + (a_0 b_n + a_1 b_{n−1} + ··· + a_n b_0) z^n + ··· .
Motivated by this, we make the following
Definition: The formal product, called the Cauchy product, of the series ∑_{n=0}^∞ a_n and ∑_{n=0}^∞ b_n is defined to be
(∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) ≡ ∑_{n=0}^∞ c_n ,
where
c_n = a_0 b_n + a_1 b_{n−1} + ··· + a_n b_0 = ∑_{k=0}^n a_k b_{n−k} .
With this definition we shall answer the question we raised about multiplication of power series.

Theorem 1.15. If ∑_{n=0}^∞ a_n = A and ∑_{n=0}^∞ b_n = B both converge absolutely, then the Cauchy product series
(∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) ≡ ∑_{n=0}^∞ c_n , where c_n = ∑_{k=0}^n a_k b_{n−k} ,
also converges absolutely, and to C = AB.
n
n
n
picking N large enough, |AN BN − CN | can be made arbitrarily small. Since AN BN →
AB , this will complete the proof. Observe that
CN = a0 b0 + (a0 b1 + a1 b0 ) + · · · + (a0 bN + · · · + aN b0 ) =
while N aj bk , N AN BN = (a0 + · · · + aN )(b0 + · · · + bN ) = aj bk .
j =0 k=0 Therefore
N N N N aj bk ≤ |AN BN − CN | =
j =0 k=0 |aj | |bk | .
j =0 k=0 Since j + k > N , either j > N/2 or k > N/2 , so
N N j> N
2 k=0 |AN BN − CN | ≤ N N |aj | |bk | + |aj | |bk | .
j =0 k> N
2 Because the original series both converge absolutely, they are bounded,
∞ ∞ |aj | < M and
j =0 |bk | < M.
k=0 Consequently, ∞ |AN BN − CN | ≤ M (
j> N
2 ∞ |aj | + |bk |).
k> N
2 1.4. POWER SERIES, INFINITE SERIES OF FUNCTIONS 47 Again using the absolute convergence of the original series, we see that for N large,
the right side can be made arbitrarily small.
Since we shall need the ideas later on, let us digress brieﬂy and examine the convergence
of inﬁnite series of functions,
un (z ) . In the special case where un (z ) = an (z − z0 )n ,
this is a power series. Generally, there is little one can say about the convergence of such
series except to apply our general tests and hope for the best. We shall only illustrate the
situation with two
Examples:
(a) (b) ∞ cos nθ
n=1 n2 , where θ is any real number. This converges for all θ since it converges
nθ
absolutely, that is
+ cos2
converges. We can see this last statement is true
n
cos nθ
with the larger convergent series (since |cos nθ| ≤ 1 )
by comparing
+ n2
∞
1
n=1 n2 .
∞
nx
n=1 ne . By the ratio test, converges if limn→∞ (n + 1)e(n+1)x /nenx < 1 . Since
lim (u + 1)e(n+1)x /nenx = lim n→∞ n→∞ n+1 x
e = ex ,
n the series converges if ex < 1 , which happens only when x < 0 . Exercises
(1) Find the disc of convergence of the following power series by ﬁnding the center and
radius of the disc.
(a) ∑_{n=0}^∞ z^n/(n+1)
(b) ∑_{n=1}^∞ (z−2)^n/n
(c) ∑_{n=0}^∞ (i^n/2^{n−1}) z^{n−1}
(d) ∑_{n=0}^∞ (n+1)(z − √2 i)^n
(e) ∑_{n=0}^∞ (2z+3)^n/(n²+2i)
(f) ∑_{n=2}^∞ (1/ln n) z^{n−2}
(g) ∑_{n=0}^∞ ((2n−i)/3^n) z^n
(h) ∑_{n=0}^∞ 2^n z^n/n! (0! ≡ 1)
(i) ∑_{n=0}^∞ (1/2^n + i^n/3^n) z^n
(j) ∑_{n=0}^∞ (z+i)^n/2^{2n}
(k) ∑_{n=0}^∞ z^{2n}/(2n)!
(l) ∑_{n=0}^∞ n^n (z − 1)^n
(m) ∑_{n=0}^∞ z^{2n}/4^n
(n) ∑_{n=1}^∞ (1/n + i/(n²+1))(z + 3i)^n
(2) Find the set x ∈ R for which the following series converge.
(a) ∑_{n=1}^∞ (x−1)^n/(n 2^n)
(b) ∑_{n=0}^∞ (cos nx)/2^n
(c) ∑_{n=1}^∞ (1/n)((x−1)/x)^n
(d) ∑_{n=0}^∞ e^{−n(x+1)}
(e) ∑_{n=1}^∞ 2^n (sin x)^n/n
(f) ∑_{n=0}^∞ (1 + e^x)^n
(g) ∑_{n=0}^∞ (1 − e^x)^n
(3) The point of this exercise is to show that a power series might converge at some, none, or all of the points on the boundary of the disk of convergence.
(a) Show that ∑_{n=0}^∞ z^n diverges at every point on the boundary of its disc of convergence.
(b) Show that ∑_{n=0}^∞ z^n/(n+1) diverges for z = 1 but converges for z = −1 (in fact, it converges everywhere on |z| = 1 except at z = 1).
(c) Show that ∑_{n=0}^∞ z^n/(n+1)² converges at every point on the boundary of its disc of convergence.
(4) If ∑ a_n z^n diverges for z = ζ ∈ C, prove that it diverges for all z ∈ C with |z| > |ζ|.
(5) For what z ∈ C does ∑_{n=0}^∞ z²/(1+z²)^n converge? Find a formula for the nth partial sum S_n(z). Evaluate lim_{n→∞} S_n(z). Is the limit function continuous?
(6) Let ∑_{n=0}^∞ a_n z^n have radius of convergence ρ, and let P(n) be any polynomial. Prove that ∑_{n=0}^∞ P(n) a_n z^n converges and also has ρ as its radius of convergence. (By P(n) we mean P(n) = A_k n^k + A_{k−1} n^{k−1} + ··· + A_1 n + A_0.)

1.5 Properties of Functions Represented by Power Series

Having found that a power series
∑ a_n (z − z_0)^n converges in some disc |z − z_0| < ρ, it is interesting to study the function f(z) defined by the power series for z in the disc of convergence:
f(z) = ∑_{n=0}^∞ a_n (z − z_0)^n ,  |z − z_0| < ρ.
It turns out that functions f(z) defined by a convergent power series are delightful, as
nicely behaved as functions can be. In particular, they are not only continuous, but also
automatically have an inﬁnite number of continuous derivatives and many other amazing
properties.
This section will be devoted to proving the more elementary properties of functions
represented by power series, while in the next section we will begin with given functions,
like sin x, and see if there is a convergent power series associated with them, as well as
showing a way of obtaining the coeﬃcients an of that power series. The profound theory
of functions represented by convergent power series is called analytic functions of a complex
variable.

Definition: A function f(z) of the complex variable z is said to be analytic in the disc |z − z_0| < ρ if f(z) can be represented by a convergent power series in that disc:
f(z) = ∑_{n=0}^∞ a_n (z − z_0)^n ,  |z − z_0| < ρ.
Since we have not developed the notion of the derivative df/dz of a complex valued function f(z) of the complex variable z, nor have we considered the corresponding theory of integration, ∫ f(z) dz, the scope of our treatment will regrettably have to be narrowed. However our proofs will have the property that as soon as an adequate theory of differentiation and integration is given, the theorems and proofs remain unchanged.
Instead of considering power series in the complex variable z, we shall restrict our attention to series in the real variable x:
f(x) = ∑_{n=0}^∞ a_n (x − x_0)^n ,  |x − x_0| < ρ,    (1-3)
still allowing the coefficients a_n to be complex. Thus, f(x) is a complex-valued function of the real variable x. The definitions of derivative and integral for such functions were given in Section 7 of Chapter 0. We shall use that material here. Our aim is the following:

Theorem 1.16. Suppose that ∑_{n=0}^∞ a_n x^n has radius of convergence ρ > 0 (possibly ∞). Then
(a) the function f(x) defined by
f(x) = ∑_{n=0}^∞ a_n x^n ,  |x| < ρ,
has an infinite number of derivatives;
(b) the series ∑_{n=1}^∞ n a_n x^{n−1} has the same radius of convergence ρ and
f′(x) = ∑_{n=1}^∞ n a_n x^{n−1} ,  |x| < ρ,
and
(c) the series ∑_{n=0}^∞ (a_n/(n+1)) x^{n+1} has the same radius of convergence ρ, and
∫_0^x f(t) dt = ∑_{n=0}^∞ (a_n/(n+1)) x^{n+1} ,  |x| < ρ.
Remark: If we omit f(x) from the picture and write (b) and (c) directly in terms of the
infinite sum, we find
(b) d/dx [ ∑_{n=0}^∞ a_n x^n ] = ∑_{n=1}^∞ n a_n x^{n−1}
and
(c) ∫_0^x [ ∑_{n=0}^∞ a_n t^n ] dt = ∑_{n=0}^∞ (a_n/(n+1)) x^{n+1} .
These two statements are usually abbreviated “a power series may be differentiated
term by term” and “a power series may be integrated term by term” within their domain
of convergence (these statements are not generally true for an arbitrary infinite series of functions ∑ u_n(x); see Exercise 4 below). The generalization to ∑_{n=0}^∞ a_n (x − x_0)^n is obvious.
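Both term-by-term operations can be checked against the geometric series f(x) = ∑ x^n = 1/(1−x) on |x| < 1; a numerical sketch (x = 0.3 and the truncation at 200 terms are mine):

```python
from math import log

# f(x) = sum_{n>=0} x^n = 1/(1-x) for |x| < 1.  Term by term:
#   derivative: sum_{n>=1} n x^(n-1)     = 1/(1-x)^2
#   integral:   sum_{n>=0} x^(n+1)/(n+1) = -ln(1-x)
x, N = 0.3, 200
deriv = sum(n * x**(n - 1) for n in range(1, N))
integ = sum(x**(n + 1) / (n + 1) for n in range(N))
assert abs(deriv - 1 / (1 - x)**2) < 1e-10
assert abs(integ + log(1 - x)) < 1e-10
```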
Our proof will be given in several parts. We begin with
Lemma 1. Under the hypothesis of the theorem, f(x) is continuous for all x with |x| < ρ.
Proof: (This is a little dull.) Given any ε > 0, we must find a δ > 0 such that
|f(x) − f(x̃)| < ε when |x − x̃| < δ.
Let us write f_N(x) = ∑_{n=0}^N a_n x^n and R_N(x) = ∑_{n=N+1}^∞ a_n x^n , so that f(x) = f_N(x) + R_N(x).
Observe that
|f(x) − f(x̃)| = |f_N(x) − f_N(x̃) + R_N(x) − R_N(x̃)| ≤ |f_N(x) − f_N(x̃)| + |R_N(x)| + |R_N(x̃)| .
We shall show that each of these three terms can be made < ε/3 by picking x close enough to x̃ and N (which is entirely at our disposal) large enough.
First work with R_N(x) and R_N(x̃). Choose r such that |x̃| < r < ρ. This is to insure that we stay away from the boundary |x| = ρ where the series may diverge. Then ∑_{n=0}^∞ |a_n r^n| converges, say to the number S. If we let S_N = ∑_{n=0}^N |a_n r^n|, we know that by picking N large enough, ∑_{n=N+1}^∞ |a_n r^n| = S − S_N < ε/3. But |R_N(x)| = |∑_{n=N+1}^∞ a_n x^n| ≤ ∑_{n=N+1}^∞ |a_n x^n|, so that if |x| ≤ r, by using the same N found above, we have
|R_N(x)| ≤ ∑_{n=N+1}^∞ |a_n r^n| = S − S_N < ε/3 .
Since by the definition of r we know |x̃| ≤ r, this also proves that for this same N, |R_N(x̃)| < ε/3. Thus by restricting |x| ≤ r, we have seen that both |R_N(x)| and |R_N(x̃)| can be made less than ε/3.
Having fixed N, f_N(x) is a polynomial, which we know is continuous. Thus there is a δ_1 > 0 such that
|f_N(x) − f_N(x̃)| < ε/3 when |x − x̃| < δ_1 .
This shows that |f(x) − f(x̃)| < ε if x is in the intersection of the intervals |x| ≤ r (< ρ) and |x − x̃| < δ_1. That there is some interval contained in both of these intervals is easy to see since both contain all points sufficiently close to x̃. And the proof is completed.
As you have observed, the proof involves no new ideas but is rather technical.
With this lemma proved, we know that f(x) is continuous, and hence integrable. Thus we can work with ∫_0^x f(t) dt. Our next task is to prove a portion of Part (c) of Theorem 16.
Lemma 1.17. If ∑_{n=0}^∞ a_n x^n has radius of convergence ρ > 0, then
∑_{n=0}^∞ (a_n/(n+1)) x^{n+1} = ∫_0^x f(t) dt for all |x| < ρ.
Proof: We shall show that
| ∫_0^x f(t) dt − ∑_{n=0}^N (a_n/(n+1)) x^{n+1} |    (1-4)
can be made arbitrarily small by choosing N large enough. Write
f(t) = ∑_{n=0}^N a_n t^n + ∑_{n=N+1}^∞ a_n t^n .
Then since we can integrate any finite sum term by term, we have
∫_0^x f(t) dt = ∑_{n=0}^N a_n ∫_0^x t^n dt + ∫_0^x [ ∑_{n=N+1}^∞ a_n t^n ] dt = ∑_{n=0}^N (a_n/(n+1)) x^{n+1} + ∫_0^x [ ∑_{n=N+1}^∞ a_n t^n ] dt ,
so that (4) reduces to showing that
| ∫_0^x ∑_{n=N+1}^∞ a_n t^n dt |
can be made small by choosing N large. The idea here is to apply Theorem 16 of Chapter
0. This means we need to estimate the size of the above integrand. By now you should
recognize the method. Because |x| < ρ , we can choose an r such that |x| < r < ρ .
Then
an rn is convergent so its terms are bounded, say M ≥ |an rn | for all n , that is,
M
|an | ≤ rn . Therefore, since |t| < |x| , we ﬁnd the inequality
∞ ∞ an t n ≤ N +1 ∞
n |an | |t| ≤
N +1 N +1 But the last series is a geometric series whose sum is
∞ an tn ≤
N +1 x
r N M
|x|n .
n
r x N M | x|
r
r −x . Thus M |x|
.
r−x Applying Theorem 16 of Chapter 0, we ﬁnd that
∞ x an tn ) dt ≤ (
0 N +1 x
r N M |x|2
.
r−x that is,
N x f (t) dt −
0 0 an n+1
x
x
≤
n+1
4 N M |x|2
.
r−x N Since x < 1 , we know that x
→ 0 as N → ∞ , which completes the proof of the
r
r
lemma.
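The estimate in this proof is easy to test numerically. Here is a short Python sketch (Python and the particular numbers are our illustration, not part of the notes), using $f(t) = 1/(1-t)$, so $a_n = 1$, $\rho = 1$, and $\int_0^x f(t)\,dt = -\ln(1-x)$:

```python
import math

# Term-by-term integration check for f(t) = 1/(1-t) = sum t^n  (a_n = 1, rho = 1).
# The integrated partial sums  sum_{n<=N} x^(n+1)/(n+1)  should approach
# the integral of f over [0, x], which is -ln(1 - x).

x, r = 0.5, 0.75          # |x| < r < rho, as in the proof
M = 1.0                   # |a_n| * r^n = r^n <= 1, so M = 1 bounds the terms
exact = -math.log(1 - x)  # integral of 1/(1-t) from 0 to x

for N in (5, 10, 20):
    partial = sum(x**(n + 1) / (n + 1) for n in range(N + 1))
    error = abs(exact - partial)
    bound = (x / r)**N * M * x**2 / (r - x)   # the estimate from the proof
    assert error <= bound
    print(N, error, bound)
```

Both the error and the bound shrink as $N$ grows, with the actual error well inside the geometric bound.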
Incidentally, all we have left to prove of part (c) of the theorem is that the radius of convergence of the integrated series is no larger than $\rho$ (since the lemma shows it is at least $\rho$). But this will have to wait until after

Lemma 1.18 If $\sum a_n x^n$ has radius of convergence $\rho$, then the series obtained by formally differentiating term by term, $\sum n a_n x^{n-1}$, has the same radius of convergence.

Remark: This lemma does not say that the derived series is equal to the derivative of the function defined by the original series. It only discusses the radius of convergence, not the relationship of the functions represented by the two series.
Proof: Let $\rho_1$ be the radius of convergence of $\sum n a_n x^{n-1}$. First we show that $\rho_1 \le \rho$. If $\sum n a_n x^{n-1}$ converges for some fixed $x$, then so does $\sum n a_n x^n$. But for $n \ge 1$ the terms of this last series are at least as large as those of $\sum a_n x^n$, since $|n a_n x^n| \ge |a_n x^n|$. Thus by the comparison test $\sum a_n x^n$ also converges for that $x$, which shows $\rho_1 \le \rho$.

To show that $\rho \le \rho_1$, assume $\sum a_n x^n$ converges for some $x$ and choose $r$ between $|x|$ and $\rho$, $|x| < r < \rho$. As in the proof of Lemma 2 we find that $|a_n| < M r^{-n}$. Then the terms in the series $\sum n a_n x^{n-1}$ are smaller than the corresponding terms in $\sum \frac{nM}{r}\bigl(\frac{|x|}{r}\bigr)^{n-1}$. By the ratio test this last series converges, since $|x| < r$. Thus the derived series $\sum n a_n x^{n-1}$ also converges, showing that $\rho \le \rho_1$ and completing the proof of the lemma.
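The conclusion of the lemma can be seen numerically through the ratio test. A small Python check (our illustration, not part of the notes), with $a_n = 2^{-n}$, for which $\rho = 2$:

```python
# Lemma 1.18 numerically: a_n = 2**(-n) gives radius rho = 2, and the
# formally differentiated series  sum n*a_n*x**(n-1)  has the same radius.
# The ratio test radius is the limit of |a_n / a_(n+1)|.

a = lambda n: 2.0**(-n)

n = 200
ratio_orig = a(n) / a(n + 1)                       # exactly 2
ratio_derived = (n * a(n)) / ((n + 1) * a(n + 1))  # 2 * n/(n+1) -> 2
print(ratio_orig, ratio_derived)
assert abs(ratio_orig - 2) < 1e-9
assert abs(ratio_derived - 2) < 0.05
```

The extra factor $n$ changes the term ratios only by $n/(n+1) \to 1$, which is exactly why the radius is unchanged.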
Now we can complete the proof of part (c) of Theorem 16.

Corollary 1.19 If $\sum a_n x^n$ has radius of convergence $\rho$, then the series obtained by formally integrating term by term, $\sum \frac{a_n}{n+1}\,x^{n+1}$, also has radius of convergence $\rho$.

Proof: The series $\sum a_n x^n$ is the formal derivative of the series $\sum \frac{a_n}{n+1}\,x^{n+1}$, and we have just seen that these two series have the same radius of convergence.
We shall next prove part (b) of Theorem 16 as

Lemma 1.20 If $f(x) \equiv \sum_{n=0}^{\infty} a_n x^n$ has radius of convergence $\rho > 0$, then
$$\frac{df}{dx} = \frac{d}{dx}\Bigl[\sum_{n=0}^{\infty} a_n x^n\Bigr] = \sum_{n=1}^{\infty} n a_n x^{n-1},$$
and this series also has radius of convergence $\rho$.

Proof: In Lemma 3 we proved that the radii of convergence are the same. What we must prove here is that the derivative of the function is given by the derivative of the series. This is a more or less immediate consequence of Lemma 2, for let us apply this integration lemma to the function $g(x)$ defined by
$$g(x) \equiv \sum_{n=1}^{\infty} n a_n x^{n-1}, \qquad |x| < \rho.$$
Then we find that
$$\int_0^x g(t)\,dt = \sum_{n=1}^{\infty} a_n x^n = f(x) - a_0, \qquad |x| < \rho.$$
By the fundamental theorem of calculus, we can take the derivative of the left side, and it is $g(x)$. Thus $g(x) = f'(x)$, that is,
$$\sum_{n=1}^{\infty} n a_n x^{n-1} = \frac{d}{dx} f(x).$$
This incidentally also proves the otherwise not obvious fact that $f(x)$, so far only known to be continuous (Lemma 1), is also differentiable.

To complete the proof of Theorem 16, we must prove Lemma 5: if the power series $\sum_{n=0}^{\infty} a_n x^n$ converges for $|x| < \rho$, then the function $f(x)$ defined by $f(x) \equiv \sum_{n=0}^{\infty} a_n x^n$ has an infinite number of derivatives. The derivatives are represented by the formal series obtained by term-by-term differentiation.

Proof: By induction. Lemma 4 shows us that $f(x)$ has one derivative. Assume $f(x)$ has $k$ derivatives. We shall show that it has $k+1$. Let $f^{(k)}(x) = \sum b_n x^n$ be the series for the $k$-th derivative of $f$. Applying Lemma 4 to this series we find that $f^{(k)}(x)$ is differentiable. This proves that $f$ has $k+1$ derivatives and completes the induction proof.
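Term-by-term differentiation is easy to test on the geometric series, where both sides are known in closed form. A Python sketch (our illustration, not part of the notes): differentiating $\sum_{n\ge 0} x^n = 1/(1-x)$ term by term should give $\sum_{n\ge 1} n x^{n-1} = 1/(1-x)^2$.

```python
# Lemma 1.20 on the geometric series: the derived series of sum x^n
# is sum n*x^(n-1), which should equal d/dx [1/(1-x)] = 1/(1-x)**2.

x = 0.3
N = 60
f_prime_series = sum(n * x**(n - 1) for n in range(1, N + 1))
closed_form = 1.0 / (1 - x)**2
print(f_prime_series, closed_form)
assert abs(f_prime_series - closed_form) < 1e-12
```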
Examples: (a) We know that
$$\frac{1}{1+t} = \sum_{n=0}^{\infty} (-t)^n = 1 - t + t^2 - t^3 + \cdots,$$
where the geometric series converges for $|t| < 1$. Applying the theorem, we integrate term by term to find that
$$\ln(1+x) = \int_0^x \frac{1}{1+t}\,dt = \sum_{n=0}^{\infty} \frac{(-1)^n x^{n+1}}{n+1}, \qquad |x| < 1,$$
or
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots.$$
Thus the function $\ln(1+x)$ is equal to the power series on the right. With a little more work we can prove that the series, which converges at $x = 1$, converges to $\ln(1+1)$ and obtain the following interesting formula:
$$\ln 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots.$$
The power series for $\ln(1+x)$ can be used to illustrate the possibilities of computing
with infinite series. If $0 < x < 1$, the series for $\ln(1+x)$ is a strictly alternating series to which we can apply inequality (2) of Theorem 12. For this series it reads
$$\Bigl|\ln(1+x) - \sum_{n=0}^{k} \frac{(-1)^n x^{n+1}}{n+1}\Bigr| < \frac{x^{k+2}}{k+2}, \qquad x > 0.$$
This inequality states that if only the terms through $n = k$ of the infinite series are used to compute $\ln(1+x)$, the error will be less than $x^{k+2}/(k+2)$. Say we want to compute $\ln(1 + \frac14) = \ln\frac54$ to 5 decimal places. Then we want to choose $k$ so that
$$\frac{(1/4)^{k+2}}{k+2} < \frac{1}{1{,}000{,}000} = 10^{-6}.$$
Cross-multiplying, and writing $4 = 2^2$, we want $k$ such that $10^6 < (k+2)\,2^{2k+4}$. Since $k+2 \ge 2$, we have $2^{2k+5} \le (k+2)\,2^{2k+4}$. Thus we are done if we can find $k$ such that
$$10^6 \le 2^{2k+5}.$$
But since $2^{10} = 1024 > 10^3$, we know $2^{20} > 10^6$. Thus if $2k+5 \ge 20$, say $k = 8$, we will have the desired accuracy. This means that
$$\ln\frac54 = \frac14 - \frac12\Bigl(\frac14\Bigr)^2 + \cdots + \frac19\Bigl(\frac14\Bigr)^{9} + \text{error},$$
where the error is less than $10^{-6}$.
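This computation can be replayed directly. A Python sketch (our illustration, not part of the notes) summing the terms through $n = k = 8$ for $x = 1/4$:

```python
import math

# Compute ln(5/4) from the alternating series with k = 8, as in the text,
# and check the alternating-series error bound x^(k+2)/(k+2) < 10^(-6).

x, k = 0.25, 8
approx = sum((-1)**n * x**(n + 1) / (n + 1) for n in range(k + 1))
err = abs(math.log(1.25) - approx)
bound = x**(k + 2) / (k + 2)
print(approx, err, bound)
assert err < bound < 1e-6
```

The actual error is even a little smaller than the bound, so 5-place accuracy is comfortably achieved.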
From the form of the error estimate, it is clear that the series converges faster if $x$ is smaller. This power series, valid only if $|x| < 1$, can nevertheless be used to compute logarithms of larger numbers by utilizing the observation illustrated by
$$\ln 6 = 3\ln\frac32 + 2\ln\frac43 = 3\ln\Bigl(1+\frac12\Bigr) + 2\ln\Bigl(1+\frac13\Bigr),$$
where both $\ln(1+\frac12)$ and $\ln(1+\frac13)$ can be computed using the power series. We should confess that this series converges too slowly to be of much value for that purpose in real life.
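The splitting trick is worth one quick check, since it rests on the identity $(3/2)^3 (4/3)^2 = 6$. In Python (our illustration, not part of the notes):

```python
import math

# ln 6 = 3 ln(3/2) + 2 ln(4/3), because (3/2)**3 * (4/3)**2 = 6.
assert abs(1.5**3 * (4 / 3)**2 - 6) < 1e-12

def ln1p(x, terms=80):
    # the series ln(1+x) = x - x^2/2 + x^3/3 - ...,  |x| < 1
    return sum((-1)**n * x**(n + 1) / (n + 1) for n in range(terms))

ln6 = 3 * ln1p(0.5) + 2 * ln1p(1 / 3)
print(ln6, math.log(6))
assert abs(ln6 - math.log(6)) < 1e-10
```

Both series arguments stay well inside $|x| < 1$, which is the whole point of the rearrangement.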
(b) Since $\frac{1}{1+t^2}$ is also the sum of a geometric series,
$$\frac{1}{1+t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \cdots = \sum_{n=0}^{\infty} (-1)^n t^{2n}, \qquad |t| < 1,$$
if we integrate term by term, we find
$$\tan^{-1} x = \int_0^x \frac{dt}{1+t^2} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots,$$
which converges if $|x| < 1$. Further investigation shows that the series also converges at $x = 1$ and represents the function at that point. This yields the wonderful formula (obtained by letting $x = 1$)
$$\frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 + \cdots,$$
from which we can compute $\pi$ to any desired accuracy.

Exercises
(1) Write down an infinite series whose sum is $\frac{1}{1-t}$ and integrate the series term by term to obtain a power series for $\ln(1-x)$. For what $x$ does the series converge?

(2) Find a power series which converges about $x = 0$ for the function $\frac{x}{(1-x)^2}$ by recognizing $\frac{1}{(1-x)^2}$ as the derivative of a function whose power series is known. For what $x$ does the series converge?

(3) Compute $\ln\frac98$ to 4 decimal places, proving the error in your approximation is correct.

(4) Show that $\sum_{n=1}^{\infty} \frac{\sin n^2 x}{n^2}$ converges for all $x$ but the series obtained by differentiating term-by-term does not converge, say at $x = 0$.

(5) Exercise your ingenuity and apply the theorems of this section to find the function whose power series is
(a) $a + 2x^2 + 4x^4 + 6x^6 + 8x^8 + \cdots + (2n)x^{2n} + \cdots$.
(b) $2 + 3 \cdot 2x + 4 \cdot 3x^2 + 5 \cdot 4x^3 + \cdots + (k+2)(k+1)x^k + \cdots$

6. Taylor's Theorem. Representation of a Given Function in a Power Series. The Binomial Theorem.
In this section we prove Taylor's Theorem, an important generalization of the mean value theorem, and use it to investigate the questions: (i) when does a given function $f(x)$ have a power series? and (ii) if $f(x)$ has a power series about $x_0$, $f(x) = \sum_{n=0}^{\infty} a_n (x-x_0)^n$, how can we find the coefficients $a_n$? As a partial answer to (i) we know from Theorem 16 of the last section that if $f(x)$ has a power series about $x_0$, it must necessarily have an infinite number of derivatives at $x_0$. It turns out that this is not enough.

Perhaps it is easiest to begin with question (ii). Assume $f(x)$ has a power series about $x_0$,
$$f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n,$$
which converges for $|x - x_0| < \rho$. How can we find the coefficients $a_n$? By Theorem 16 we know that $f$ has an infinite number of derivatives at $x_0$. Moreover these derivatives can be calculated by differentiating the power series term-by-term. For convenience we let $x_0 = 0$:
$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots,$$
$$f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1} + \cdots,$$
$$f''(x) = 2a_2 + 2 \cdot 3\,a_3 x + 3 \cdot 4\,a_4 x^2 + \cdots + n(n-1)a_n x^{n-2} + \cdots,$$
$$f^{(3)}(x) = 2 \cdot 3\,a_3 + 2 \cdot 3 \cdot 4\,a_4 x + 3 \cdot 4 \cdot 5\,a_5 x^2 + \cdots,$$
$$f^{(n)}(x) = n!\,a_n + (n+1)!\,a_{n+1} x + \frac{(n+2)!}{2!}\,a_{n+2} x^2 + \cdots.$$
By letting $x = 0$ in each line, we find
$$a_0 = f(0), \quad a_1 = f'(0), \quad a_2 = \frac{f''(0)}{2}, \quad \ldots, \quad a_n = \frac{f^{(n)}(0)}{n!}.$$
This proves
Theorem 1.21 If $f(x) = \sum a_n (x-x_0)^n$ has a convergent power series representation about $x_0$, then the coefficients $a_n$ are equal to $f^{(n)}(x_0)/n!$, so in fact
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n. \tag{1-5}$$

This formula (1-5) completely solves the problem of finding the coefficients $a_n$ of a function if that function has a power series. A simple consequence is the

Corollary 1.22 A function $f(x)$ has at most one convergent Taylor series about a point $x_0$.

Proof: By the above theorem, if $f(x) = \sum a_n (x-x_0)^n$ and $f(x) = \sum b_n (x-x_0)^n$, then $a_n = \frac{f^{(n)}(x_0)}{n!} = b_n$, so the power series are identical.

Remark: When $f$ has a power series expansion about $x_0$, the series is usually called the Taylor series of $f$ at $x_0$. In the special case $x_0 = 0$, the series is sometimes called the Maclaurin series for $f$.

Examples:
(a) If $f(x) = e^x$ has a power series about $x = 0$, what is it? Since $f^{(n)}(0) = \frac{d^n}{dx^n}e^x\big|_{x=0} = e^0 = 1$, we know that $a_n = 1/n!$, so that the power series is $\sum_{n=0}^{\infty} \frac{1}{n!}x^n$. We cannot yet write $e^x = \sum_{n=0}^{\infty} \frac{1}{n!}x^n$ since we have not proved that $e^x$ does have a power series.

(b) If $f(x) = \cos x$ has a power series about $x = 0$, what is it? $f(0) = 1$, $f'(0) = -\sin 0 = 0$, $f''(0) = -\cos 0 = -1$, $f'''(0) = \sin 0 = 0$, $f^{(4)}(0) = \cos 0 = 1, \ldots$. All the odd derivatives at $0$ are zero while the even derivatives alternate between $+1$ and $-1$. Therefore the series is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}.$$
Again we cannot yet claim that this is $\cos x$.
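Though we cannot yet claim equality, the numerics are suggestive. A Python sketch (our illustration, not part of the notes) compares the partial sums of the two formal series with the library values of $e^x$ and $\cos x$:

```python
import math

# Partial sums of the formal Maclaurin series for e^x and cos x.
# Numerically they match the library functions; the proof that they
# really converge to the functions comes with Taylor's Theorem below.

x = 1.3
exp_series = sum(x**n / math.factorial(n) for n in range(30))
cos_series = sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(30))
print(exp_series, math.exp(x))
print(cos_series, math.cos(x))
assert abs(exp_series - math.exp(x)) < 1e-12
assert abs(cos_series - math.cos(x)) < 1e-12
```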
(c) If
$$f(x) = \begin{cases} e^{-1/x^2}, & x \ne 0,\\ 0, & x = 0, \end{cases}$$
has a power series about $x = 0$, what is it? The computation is somewhat more difficult here. We find
$$f'(x) = \frac{2}{x^3}\,e^{-1/x^2}, \qquad f''(x) = \Bigl(-\frac{6}{x^4} + \frac{4}{x^6}\Bigr)e^{-1/x^2},$$
and generally
$$f^{(n)}(x) = \Bigl(\frac{\alpha_{n+2}}{x^{n+2}} + \cdots + \frac{\alpha_{3n}}{x^{3n}}\Bigr)e^{-1/x^2},$$
where the $\alpha_k$ are real numbers we don't need to find. If we let $x \to 0$ in $f^{(n)}(x)$, the resulting expression has the indeterminate form $\infty \cdot 0$. Thus l'Hôpital's rule must be invoked. Now $f^{(n)}(x)$ is the sum of terms of the form $e^{-1/x^2}/x^k$, $k > 0$. What is $\lim_{x\to 0} e^{-1/x^2}/x^k$? Let $t = 1/x^2$, and we must evaluate
$$\lim_{t\to\infty} t^{k/2} e^{-t} = \lim_{t\to\infty} \frac{t^{k/2}}{e^t}.$$
If $k$ is an even integer, applying l'Hôpital's rule $k/2$ times leaves a constant in the numerator and $e^t$ in the denominator, so the limit is $\lim_{t\to\infty} \frac{\text{const}}{e^t} = 0$. If $k$ is odd, applying l'Hôpital's rule $(k+1)/2$ times leaves a function of the form $\frac{\text{const}}{\sqrt{t}\,e^t}$, which also tends to $0$ as $t \to \infty$.

What we have just shown is that $f^{(n)}(0) = 0$ for every $n$. The power series associated with $e^{-1/x^2}$ is therefore
$$0 + 0\cdot x + \frac{0}{2!}x^2 + \cdots + \frac{0}{n!}x^n + \cdots \equiv 0.$$
This function $e^{-1/x^2}$, whose power series about $x = 0$ is zero, is an example of a function
which is clearly not equal to the power series, 0, associated with it.
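The key limit, that $e^{-1/x^2}$ crushes every power of $x$ near the origin, is striking numerically. A Python sketch (our illustration, not part of the notes):

```python
import math

def f(x):
    return 0.0 if x == 0 else math.exp(-1.0 / x**2)

# e^(-1/x^2) / x^k -> 0 as x -> 0, for every power k > 0.
for k in (1, 3, 6):
    vals = [f(x) / x**k for x in (0.5, 0.2, 0.1, 0.05)]
    print(k, vals)
    assert vals == sorted(vals, reverse=True)  # decreasing toward 0
    assert vals[-1] < 1e-100                   # already astronomically small
```

Even divided by $x^6$, the function is below $10^{-100}$ well before $x$ reaches $0.05$; this is why every derivative at $0$ vanishes.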
To find if a given function has a power series expansion about $x_0$ we turn to Taylor's Theorem (also known as the extended mean value theorem). Now if a function $f$ defined in a neighborhood of $x_0$ has a power series expansion there, we know the series is given by (1-5). Thus we should investigate
$$R_N(x) \equiv f(x) - \sum_{n=0}^{N} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n.$$
To say that $f$ is equal to its series expansion is the same as saying that the remainder, $R_N(x)$, becomes arbitrarily small as $N \to \infty$. We must now seek an estimate of this remainder $R_N(x)$. Taylor's theorem is one way of finding an estimate.

Theorem 1.23 (Taylor's Theorem). Let $f$ be a real-valued function with $N+1$ continuous derivatives defined on an interval containing $x_0$ and $x$. There exists a number $\zeta$ between $x_0$ and $x$ such that
$$f(x) = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \frac{f'''(x_0)}{3!}(x-x_0)^3 + \cdots + \frac{f^{(N)}(x_0)}{N!}(x-x_0)^N + \frac{f^{(N+1)}(\zeta)}{(N+1)!}(x-x_0)^{N+1}. \tag{1-6}$$
In other words,
$$R_N(x) = \frac{f^{(N+1)}(\zeta)}{(N+1)!}(x-x_0)^{N+1}. \tag{1-7}$$

Remark 1: The proof will only tell us that such a $\zeta$ exists but will give us no way to find it. In practice we often try to find some upper bound $M$ for $|f^{(N+1)}(\zeta)|$, so $|f^{(N+1)}(\zeta)| \le M$ for all $N$, and only use the resulting crude estimate
$$|R_N(x)| \le \frac{M}{(N+1)!}|x-x_0|^{N+1}. \tag{1-8}$$
An example of this is the series for $\cos x$. Assuming the proof of the theorem, we know that (see Example (b) above) about $x_0 = 0$,
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^N}{(2N)!}x^{2N} + R_N(x),$$
where
$$R_N(x) = \frac{1}{(2N+2)!}\Bigl[\frac{d^{2N+2}}{dx^{2N+2}}\cos x\Bigr]_{x=\zeta} x^{2N+2}, \qquad \zeta \in (0, x).$$
Since
$$\Bigl|\frac{d^{2N+2}}{dx^{2N+2}}\cos x\Bigr|_{x=\zeta} \le 1,$$
we find that
$$|R_N(x)| \le \frac{1}{(2N+2)!}|x|^{2N+2}.$$
Because, for fixed $x$, this remainder tends to $0$ as $N \to \infty$, we have proved that the power series for $\cos x$ at $x_0 = 0$ does converge to $\cos x$, so in the limit
$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\,x^{2n}.$$
We can apply Theorem 16 and differentiate both sides of this to find the series for $\sin x$.
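The remainder estimate for $\cos x$ can be checked directly. A Python sketch (our illustration, not part of the notes), with $x = 2$ and $N = 6$:

```python
import math

# Check |cos x - (partial sum through n = N)| <= |x|^(2N+2) / (2N+2)!.
x, N = 2.0, 6
partial = sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(N + 1))
bound = x**(2 * N + 2) / math.factorial(2 * N + 2)
err = abs(math.cos(x) - partial)
print(err, bound)
assert err <= bound
assert bound < 1e-4
```

Seven terms already give better than four-decimal accuracy at $x = 2$, exactly as the factorial in the bound predicts.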
Remark: 2 Observe that Taylor’s Theorem is only proved for real-valued functions f . It
is not true if f is complex-valued. However using it we will be able to prove the inequality
(7) for complex-valued f .
Proof: (Taylor’s Theorem). Our proof is short—perhaps a little too slick. The trick is to
appeal to the mean value theorem (really only Rolle’s theorem is used). 58 CHAPTER 1. INFINITE SERIES
Fix x and deﬁne the real number A by
N f (x) =
n=0 (x − x0 )N +1
f ( n)(x0 )
(x − x0 )n + A
.
n!
(N + 1)! (1-9) Now let
H (t) := f (x) − f (t) + f (t)(x − t) + f (t)(x − t)2
f (N ) (t)
(x − t)N +1
+···+
(x − t)N − A
.
2!
N!
(N + 1)! Thus we are letting x0 vary, not x . Observe that H (x) = 0 (obviously) and H (x0 ) = 0
(by deﬁnition of A). Since H (t) satisﬁes the hypotheses of the mean value theorem, we
conclude that there is some ζ between x0 and x such that H (ζ ) = 0 . But
H (t) = −f (t) − f (t)(x − t) − f (t) − · · · −
−A f (N +1) (t)
f (N ) (t)
(x − t)N −
(x − t)N −1
N!
(N − 1)! (x − t)N
(x − t)N
A − f (N +1) (t) .
=
N!
N! Amazingly, almost all the terms canceled. Since H (ζ ) = 0 and ζ = x , we now know
that A = f (N +1) (ζ ) . Substitution of this value of A into (8) gives us exactly (6), which is
just what we wanted to prove.
As an application let us prove the Binomial Theorem. That is the name given to the
Maclaurin series for (1 + x)α , where α ∈ R . The derivatives are easy to compute.
f (x) = (1 + x)α
f (x) = α(1 + x)α−1
f (x) = α(α − 1)(1 + x)α−2
...
f (n) = α(α − 1) · · · .(α − n + 1)(1 + x)α−n .
Thus the power series about 0 associated formally with (1 + x)α is
∞
n=0 α(α − 1) · · · (α − n + 1) n
x.
n! By the ratio test this series converges for |x| < 1 . Does it converge to (1 + x)α when
|x| < 1 ?
If α is a positive integer, α = N , the terms in the power series from n = N + 1 on all
are zero since they contain the factor (N − N ) . In this case we have only a ﬁnite series so
convergence is trivial. The resulting polynomial is the familiar Binomial Theorem of high
school algebra.
Let us therefore assume α is not a positive integer (or 0). Then we have an honest
inﬁnite series. In order to prove that (1 + x)α is equal to the inﬁnite series, we must show
that the remainder
N RN (x) ≡ (a + x)α −
n=0 α(α − 1) · · · (α − n + 1) n
x
n! 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 59 tends to zero as N → ∞ . By Taylor’s Theorem
RN (x) = α(α − 1) · · · (α − N )
(1 + ζ )α−N −1 xN +1 ,
(N + 1)! where ζ is between 0 and x . We shall prove that this tends to 0 as N → ∞ only when
0 ≤ x < 1 . It is also true for −1 < x ≤ 0 , but the proof is much longer so we will not give
it [however a diﬀerent attack yields the proof easily].
Now if 0 ≤ x < 1 , since 0 < ζ < x , then 1 < 1 + ζ . Therefore for N ≥ α , we have
(z + ζ )α−N −1 < 1 . Thus
|RN (x)| < α(α − 1) · · · (α − N ) N +1
x
(N + 1)! which does tend to zero as N → ∞ (since it is the N + 1 st term of the convergent series
∞ α(α−1)···(α−n+1) n
x , |x| < 1) .
n!
Although we have proved it only if 0 ≤ x < 1 , we shall state the complete
Theorem 1.24 (Binomial Theorem). The function (1 + x)α is equal to a power series
which converges for |x| < 1 . It is
∞ (1 + x)α =
n=0 α(α − 1) · · · (α − n + 1)
.
n! (1-10) In practice it is silly to memorize this formula since it is easier to expand (1 + x)α
directly in a Maclaurin series, which we have just shown (partly anyway) is equal to the
function.
We close this section with the generalization of Taylor’s Theorem to complex-valued
function f (x) .
Theorem 1.25 . Let f (x) = u(x) + iv (x) be a complex-valued function with N + 1
continuous derivatives deﬁned on an interval containing x0 and x . There exists a real
number MN depending on N such that
N f (x) −
n=0 f (n) (x0 )
MN
(x − x0 )n ≤
|x − x0 |N +1
n!
(N + 1)! (1-11) Proof: Since f has N + 1 continuous derivatives, so do the real-valued functions u(x)
and v (x) . Applying Taylor’s Theorem to u and v , we ﬁnd numbers ζ1 and ζ2 , both
between x0 and x , such that
N u(x) −
n=0 u(n) (x0 )
u(N +1) (ζ1 )
(x − x0 )n =
(x − x0 )N +1 ,
n!
(N + 1)! and
N v (x) −
n=0 v (n) (x0 )
v (N +1) (ζ2 )
(x − x0 )n =
(x − x0 )N +1 .
n!
(N + 1)! 60 CHAPTER 1. INFINITE SERIES
Thus, by addition, since f (n) = u(n) = iv (n) , we ﬁnd
N f (x) −
n=0 u(N +1) (ζ1 ) + iv (N +1) (ζ2 )
f (n) (x0 )
(x − x0 )n =
(x − x0 )N +1 .
n!
(N + 1)! However since u(N +1) and v (N +1) are assumed continuous in an interval containing
ˆ
˜
x0 and x , they are bounded there, say by MN and MN . Taking absolute values of the
ˆ
˜
last equation, we obtain equation (10) where MN = M 2 + M 2 .
N N Exercises
(1) Find the Taylor series about the speciﬁed point x0 and determine the interval of
convergence for the following functions. You need not prove that the series do converge
to the functions.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h) sin x, x0 = 0,
ln x, x0 = 1,
1
x , x0 = −1,
√
x, x0 = 6,
1x
−x
2 (e + e ), x0 = 0
x+ i
1+x , x0 = 0,
cos x, x0 = π ,
4
1
i+x , x0 = 0
2 (i) e−x , x0 = 0,
(j) (1 + x + x2 )−1 , x0 = 0,
(k) cos x + i sin x, x0 = 0,
1
(l) √1+2x , x0 = 0.
(2) Prove that in their interval of convergence about 0 the following power series associated
with the given functions converge to the functions. Do this by proving that the
remainder |RN (x)| → 0 as N → ∞ .
(a) sin x,
1
(b) 1+x4 ,
(c) e−x
(d) cosh x [Recall the deﬁnition: cosh x = ex +e−x
2
. 2 1
(3) One often approximates √1+x2 by 1 − x when |x| is small. Give some estimate of
2
the error if a) |x| + < 10−1 , b) |x| < 10−2 , c) |x| < 10−4 . (4) Use the Taylor series
2 e−x = 1 − x2 +
1 2 x4 x6
(−1)n x2n
−
+ ··· +
+ ··· .
2!
3!
n! to evaluate 0 e−x dx to three decimal places. I suggest using Theorem 16 and the
error estimate of Theorem 12. 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 61 (5) Assume the ordinary diﬀerential equation y − y = 0 , with y (0) = 1 has a power
series solution y (x) = ∞ an xn about x = 0 . a). Substitute this series directly
n=0
into the diﬀerential equation and solve for the coeﬃcients an . b). Find when the
series converges; c). justify (a posteriori) the fact that the function deﬁned by the
convergent series does satisfy the diﬀerential equation. [We do not yet know that this
is the only solution. All we know is that it is the only solution which has a power
series].
(6) In this exercise you will prove that e is irrational. It all hinges on the series for 3 .
e=1+1+ 1
1
1
+ + ··· +
+ ··· .
2 3!
n! (a) Prove that 2 < e < 3 , so e is not an integer (cf. page 58, bottom).
(b) Assume e is rational, e = p , where p and q are integers with no common
q
factor and q ≥ 2 . Then use the Taylor series with q terms and the remainder
Rq to show that e · q ! = N + qeζ , where 0 < ζ < 1 , and N is an integer.
+1
ζ (c) From this deduce that qe must be an integer, and show that this contradicts
+1
eζ < e < 3 , and q + 1 ≥ 3 .
(7) This exercise generalizes the form of the remainder (6’) in Taylor’s Theorem. Fix x
and deﬁne the number B by
N f (x) =
n=0 f (n) (x0 )
(x − x0 )n + B (x − x0 )α , α ≥ 1.
n! Then consider the function H (t) deﬁned by
N H (t) ≡ f (x) −
n=0 f (n) (t)
(x − t)n − B (x − t)α .
n! Show that there is a ζ between x0 and x such that
B= f (N +1) (ζ )
(x − ζ )N +1−α ,
αN ! so that f (N +1) (ζ )
(x − x0 )α (x − ζ )N +1−α .
αN !
This is Schlomilch’s form of the remainder. In the special case α = N + 1 , we obtain
Lagrange’s form of the remainder, (6) found previously, while for α = 1 we obtain
Cauchy’s form of the remainder
RN = RN = f (N +1) (ζ )
(x − x0 )(x − ζ )N .
N! Here are two applications of Taylor’s Theorem to problems other than inﬁnite series.
The ﬁrst one deals with max-min. Let f (x) be a suﬃciently smooth function (by which
we mean f has plenty of derivatives—we’ll specify the number later). Now we know that 62 CHAPTER 1. INFINITE SERIES if f has a local maximum or minimum at x0 , then f (x0 ) = 0 , and it is a maximum if
f (x0 ) < 0 , a minimum if f (x0 ) > 0 . But what if f (x0 ) = 0 ? Consider the examples
f1 (x) = x4 , f2 (x) = −x4 , f3 (x) = x3 , the ﬁrst of which has a minimum at x = 0 , the
second a maximum at x = 0 , while the third has neither. These three examples suggest
the criterion will depend upon the lowest non-zero derivative being an even or odd derivative,
and on its sign.
a figure goes here
By the deﬁnition of local maximum and minimum, the issue is the behavior of f (x) in
a neighborhood of x0 , that is, the nature of f (x0 + h) for |h| small. We remind you that
f has a local max at x0 if f (x0 + h) − f (x0 ) ≤ 0 for all |h| suﬃciently small, and a local
min at x0 if f (x0 + h) − f (x0 ) ≥ 0 for all |h| suﬃciently small. Since the behavior of f (x)
near x0 is determined by the Taylor polynomial
f (x0 + h) = f (x0 ) + f (x0 )h + f (n)
f (n+1) (ζ )hn+1
f (x0 )h2
+ ··· +
(x0 )hn +
2!
n!
(n + 1)! where ζ is between x0 and x0 + h , it is natural to look at this polynomial to answer our
question.
Theorem 1.26 Assume f has (at least) n + 1 continuous derivatives in some interval
containing x0 . Say f (x0 ) = f (x0 ) = . . . = f n (x0 ) = 0 but f (n+1) (x0 ) = 0 , then
(a) if n is even, then f has neither a max nor min at x0 .
(b) if n is odd, then
i) f has a max at x0 if f (n+1) (x0 ) < 0.
ii) f has a min at x0 if f (n+1) (x0 ) > 0.
Proof: We shall use Taylor’s polynomial with n + 1 terms.
Since the ﬁrst n derivatives vanish at x0 , we have f (x0 + h) − f (x0 ) = f (n+1) (ζ ) n+1
,ζ
(n+1)! h
(n+1) (ζ ) must
f between x0 and x0 + h . Because f (n+1) (x) is assumed continuous at x0 ,
have the same sign as f (n+1) (x0 ) in some neighborhood of x0 . Restrict your attention to
the neighborhood. If n is even, n + 1 is odd, so that hn+1 is positive if h > 0 , negative
if h < 0 . Thus f (x0 + h) − f (x0 ) changes sign in any neighborhood of x0 . However if n
is odd, hn+1 is positive no matter if h > 0 or h < 0 . Therefore f (x0 + h) − f (x0 ) has the
same sign as f (n+1) (x0 ) throughout some neighborhood about x0 . The precise conditions
are easy to verify now.
Examples:
1. f (x) = x5 + 1 has neither a max nor min at x = 0 , since f (0) = . . . = f (4) (0) = 0 ,
but f (5) (0) = 5! = 0 .
2. f (x) = (x − 1)6 − 7 has a min at x = 1 since f (1) = . . . = f (5) (1) = 0 , but
f (6) (1) = 6! > 0 .
Our second application is a geometrical interpretation of the Taylor polynomial. Given
the function f (x) , consider the polynomial
Pn (x) = f (x0 ) + f (x0 )(x − x0 ) + f (x0 )
f (n) (x0 )
(x − x0 ) + · · · +
(x − x0 )n ,
2!
n! 1.5. PROPERTIES OF FUNCTIONS REPRESENTED BY POWER SERIES 63 whose ﬁrst n derivatives agree with those of f at x = x0 . P1 (x) = f (x0 ) + f (x0 )(x − x0 )
is the equation of the tangent to the curve y = f (x) at x0 . It is the straight line which
most closely approximates the curve at x0 . Similarly P2 (x) is the parabola which most
closely approximates the curve at x0 . Generally, Pn (x) is the polynomial of degree n
which most closely approximates the curve y = f (x) at the point x0 . Using this Taylor
polynomial, we can deﬁne the order of contact of two curves at a point.
Definition: The two curves y = f (x) and y = g (x) have order of contact n at the point
x0 if their Taylor polynomials of degree n at x0 are identical, but their n + 1 st Taylor
polynomials diﬀer.
An equivalent deﬁnition is that f (x0 ) = g (x0 ) , f (x0 ) = g (x0 ) , . . . , f (n) (x0 ) =
(n) (x ) , but f (n+1) (x ) = g (n+1) (x ) . We have assumed that f and g have n + 1
g
0
0
0
continuous derivatives. If f and g have contact n at x0 , then
f (x0 + h) − g (x0 + h) = f (n+1) (ζ1 ) − g (n+1) (ζ2 ) n+1
h
.
(n + 1)! One interesting consequence of this formula is that if f and g have contact of even
order, then the curves will cross at x0 , while if the contact is of odd order, the curves will
not cross in some neighborhood of x0 .
We can deﬁne the curvature of a curve in the plane by using the concept of contact.
First we deﬁne the curvature of a circle (whose curvature had better be constant).
1
1
Definition: The curvature k of a circle of radius R is deﬁned to be R , k = R .
Thus the smaller the circle, the larger the curvature—a natural outcome. Furthermore,
a straight line—which may be thought of as a circle with inﬁnite radius—has curvature zero.
How can we deﬁne the curvature of a given curve? For all non-circles, the curvature will
clearly vary from point to point of the curve. Thus, the concept we want is the curvature
of a given curve y = f (x) at a point x0 . Our deﬁnition should appear reasonable.
Definition: The curvature k of a plane curve y = f (x) at the point x0 is the curvature
of the circle which has contact of order two at x0 .
This circle which has contact of order two is called the osculating circle to the curve
at x0 (osculate: Latin, to kiss). Let us convince ourselves that there is only one osculating
circle (for if there were two, the curvature would not b e well deﬁned.) Consider all circles
of contact one to f (x) at x0 . These are all circles tangent to f (x) at x0 . Their centers
lie on the line l normal to the curve at x0 (“normal” means perpendicular to the tangent
line). It is geometrically clear that of these circles with contact 1, there will be exactly one
with contact 2.
Example: Find the curvature of y = ex at x = 0 . The slope of the curve at (0, 1) is
1. Therefore the equation of the normal is y − 1 = −x . Since the center (x0 , y0 ) of the
osculating circle must lie on this line, and the circle contains the point (0, 1) , subject to
y0 = 1 − x0 , the value of x0 must be determined from the fact that the second derivative
of the circle (0, 1) must equal the second derivative of y = ex at x = 0 , that is, it must
equal 1. But for any circle, (y − y0 )y + y 2 + 1 = 0 . In our case y = 1 at (0, 1) (recall
the circle is tangent to ex at (0, 1) ), so that (1 − y0 ) · 1 + 1 + 1 = 0 , or y0 = 3 . The
equation y0 = 1 − x0 implies that x0 = −2 . Thus the equation of the osculating circle is
1
(y − 3)2 + (x + 2)2 = 8 , and the curvature of y = ex at x = 0 is k = √8 . Later on we will
give another deﬁnition of curvature which is applicable not only to plane curves, but also
to curves in space. 64 CHAPTER 1. INFINITE SERIES Exercises
(1) What is the order of contact of the curves y = e−x and y = 1
1+x + 1 sin2 x at x = 0 ?
2 (2) Find the osculating circle and curvature for the curve y = x2 at x = 1 .
(3) Show that at x = a , the curve y = f (x) has curvature k = f (a )
3 [1+f (a)2 ] 2 f
of the osculating circle is at the point (a − f (a) [1 + f (a)2 ], f (a) +
(a )
is the messy equation of the osculating circle? and the center
1+f (a)2
f (a) ). What (4) At the given points, the following curves have slope zero. Determine if the curve has
a max, min, or neither there.
(a). y = (x + 1)4 , x = −1,
(b). y = x2 sin x, x = 0.
(5) Let P1 , P , and P2 be three distinct points on the curve y = f (x) , and consider the
circle passing through those three points. Show that in the limit as both P1 and P2
approach P , this circle becomes the osculating circle. (Hint: Taylor’s Theorem will
be needed here).
(6) In this problem we outline another derivation of Taylor’s Theorem. Whereas the one
in the notes did not use the fact the f (n+1) was continuous, this proof relies upon
that fact.
(a) Show that
x
x0 (x − t)k−1 (k)
(x − x0 )k
f (t) dt = f (k) x0
+
(k − 1)!
k! x
x0 (x − t)k (k+1)
f
(t) dt.
k! (b) Prove by induction that
f (x) = f (x0 ) + f (x0 )(x − x0 ) + · · · + f (n) (x0 )
(x − x0 )n +
n! x
x0 (x − t)n (n+1)
f
(t) dt.
n! The remainder is expressed as an integral here. It is because f (n+1) is to be
integrated that we require its continuity.
(7) (a) Let g (x) have contact of order n with the function 0 at the point x = a , and
assume that f (x) has contact of order at least n with the function 0 at x = a .
Use Taylor’s Theorem to prove that
f (x)
f (n+1) (a)
= (n+1)
x→a g (x)
g
(a)
lim This is l’Hˆspital’s Rule.
o
(b) Apply l’Hˆspital’s rule to evaluate
o
i) lim x→0 x − sin x
1 − tan θ
, ii) lim
x3
θ− π
θ→ π
4
v 1.6. COMPLEX-VALUED FUNCTIONS, E Z , COS Z, SIN Z . 65 (8) Assume f has two derivatives in the interval [a, b] , and assume that f ≥ 0 throughout the interval. Prove that if ζ is any point in [a, b] , then the curve y = f (x) never
falls below its tangent at the point x = ζ, y = f (ζ ) . [hint: Use Taylor’s Theorem
with three terms].
(9) Use Cauchy’s form of the remainder (p. 103-4, no. 7) for Taylor’s Theorem to prove
that the binomial series converges to (1 + x)α for −1 < x ≤ 0 . This will complete
the proof of the binomial theorem.
n 1d
(10) The nth Legendre polynomial Pn (x) is deﬁned by Pn (x) = 2n n! dxn [(x2 − 1)n ] . Prove
that Pn (x) is a polynomial of degree n and has n distinct real zeros in the interval
(−1, 1) . (11) Verify that eax is a solution of y = ay . Prove that every solution has the form
Aeax , where A is a constant.
(12) Assume that f (x) has plenty of derivatives in the interval [a, b] , and that f has
n + 1 distinct zeros in the interval. Prove that there is at least one c ∈ (a, b) such
that f (n) (c) = 0 . 1.6 Complex-Valued Functions, ez , cos z, sin z . The task of this section is to answer the following question. Say f (x) is a real or complex
valued function of the real variable x . How can we deﬁne f (z ) where z is complex ?
For example, if P (x) = a0 + a1 x + · · · + an xn is a polynomial, the answer is easily given:
just deﬁne P (z ) = a0 + a1 z + · · · + an z n . Since this function only involves addition and
multiplication of complex numbers, for any complex z the number P (z ) can be computed.
P
Similarly any rational function, Q(x) , where P (x) and Q(x) are both polynomials, can be
( x)
P
deﬁned for complex z as Q(z ) since both P (z ) and Q(z ) are deﬁned separately and we
(z )
can then take their quotient.
But how do we deﬁne ez , or cos z , or (1 + z )α , where α R is not a positive integer?
As might have been suspected, the trick is to use inﬁnite series.
Definition: If f (x), x R , has a convergent Taylor series,
∞ an xn , f (x) = |x| < ρ, n=0 then we deﬁne f (z ), z C , by the inﬁnite series
∞ an z n , f (z ) =
n=0 and the inﬁnite series converges throughout the disc |z | < ρ .
The assertion that the complex series converges throughout the disc |z | < ρ is an
immediate consequence of Theorem 13 on page ?.
Thus, for example, we define

    E(z) = Σ_{n=0}^∞ z^n/n!,

66 CHAPTER 1. INFINITE SERIES

    C(z) = Σ_{n=0}^∞ (−1)^n z^{2n}/(2n)!,

    S(z) = Σ_{n=0}^∞ (−1)^n z^{2n+1}/(2n+1)!,

and

    (1 + z)^α = Σ_{n=0}^∞ [α(α − 1)···(α − n + 1)/n!] z^n,   α ∈ R,

where the first three series converge for all z ∈ C, while the last converges for |z| < 1. We have temporarily used the notation E(z) in place of e^z, C(z) for cos z, and S(z) for sin z so that you do not jump to hasty conclusions about these functions by merely extrapolating your knowledge of e^x etc. For example it is not true that |sin z| ≤ 1 for all z ∈ C, even though |sin x| ≤ 1 for all x ∈ R. All properties of these functions for z ∈ C must be proved again beginning with the power series definitions. Known properties of e^x, x ∈ R, and wishful thinking don't prove properties of e^z, z ∈ C. Let us begin by proving
Theorem 1.27.

(a) E(iz) = C(z) + iS(z), for all z ∈ C.

(b) E(−iz) = C(z) − iS(z), for all z ∈ C.

(c) C(z) = (1/2)[E(iz) + E(−iz)], for all z ∈ C.

(d) S(z) = (1/2i)[E(iz) − E(−iz)], for all z ∈ C.

Proof: a), b). Just substitute and rearrange the series. For example
    C(z) = 1 − z²/2! + z⁴/4! − z⁶/6! + ···,

    iS(z) = i[z − z³/3! + z⁵/5! − z⁷/7! + ···],

so

    C(z) + iS(z) = 1 + iz − z²/2! − i z³/3! + z⁴/4! + i z⁵/5! − ···,

where the adding of the two series is justified by Theorem 5 (page ?). We must compare the last series with that for E(iz):

    E(iz) = 1 + iz + (iz)²/2! + (iz)³/3! + (iz)⁴/4! + ··· = 1 + iz − z²/2! − i z³/3! + z⁴/4! + ···,

which is identical to the series for C(z) + iS(z).
c)-d). These follow by elementary algebra from a) and b).
The formulas a)-d) of Theorem 1.27 show there is a close connection between the four
functions E (iz ), E (−iz ), C (z ), and S (z ) . Our next theorem shows that the formula
ex ey = ex+y , x, y ∈ R , extends to the function E (z ) .
Theorem 1.28. E(z)E(w) = E(z + w), for all z, w ∈ C.

Proof: We must show that

    (Σ_{n=0}^∞ z^n/n!)(Σ_{n=0}^∞ w^n/n!) = Σ_{n=0}^∞ (z + w)^n/n!.

The product of the two series is defined in Theorem 15. Using that definition, we find that

    (Σ_{n=0}^∞ z^n/n!)(Σ_{n=0}^∞ w^n/n!) = Σ_{n=0}^∞ ( Σ_{k=0}^n z^k w^{n−k}/[k!(n − k)!] ).

However, the binomial theorem for positive integer exponents (which only uses the algebraic rules for complex numbers) states that

    (z + w)^n = Σ_{k=0}^n [n!/(k!(n − k)!)] z^k w^{n−k}.

Upon substituting this into the last equation, we obtain the desired formula.
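Numerically, the addition theorem is easy to watch with truncated series (a hedged sketch; the function name E and the 40-term cutoff are our own choices, not the text's):

```python
from math import factorial

def E(z, terms=40):
    # Partial sum of the series E(z) = sum_{n>=0} z^n / n!
    return sum(z**n / factorial(n) for n in range(terms))

z, w = 1 + 2j, 0.5 - 1j
# E(z)E(w) and E(z+w) agree to machine precision once the tails are negligible.
assert abs(E(z) * E(w) - E(z + w)) < 1e-12
```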
The formula of this theorem is the key to many results, like the following generalization
of sin2 x + cos2 x = 1 .
Corollary 1.29 C (z )2 + S (z )2 = 1 for all z ∈ C .
Proof: We use equations a) and b) of Theorem 1.27 to reduce the question to one of
exponentials.
E (iz )E (−iz ) = [C (z ) + iS (z )][C (z ) − iS (z )] = C 2 (z ) + S 2 (z ).
But by Theorem 1.28, E(iz)E(−iz) = E(iz − iz) = E(0). Directly from the power series we
see that E (0) = 1 . This proves the formula.
Our next corollary states that the addition formulas for sin x and cos x are still valid
for C (z ) and S (z ) .
Corollary 1.30 C(z + w) = C(z)C(w) − S(z)S(w) and S(z + w) = S(z)C(w) + S(w)C(z)
for all z, w ∈ C
Proof: A direct algebraic computation does the job.
C (z + w) + iS (z + w) = E (iz + iw) = E (iz )E (iw) = [C (z ) + iS (z )][C (w) + iS (w)] = [C (z )C (w) − S (z )S (w)] + i[S (z )C (w) + S (w)C (z )].
Similarly we ﬁnd that
C (z + w) − iS (z + w) = [C (z )C (w) − S (z )S (w)] − i[S (z )C (w) + S (w)C (z )].
Addition of these two equations gives the formula for C (z + w) , while subtraction gives the
formula for S (z + w) .
Had we but world enough, and time, we would linger a while. A lovely result we have
not proved is that E(z + 2πi) = E(z), the periodicity of E(z), which is a consequence of the formulas C(z + 2π) = C(z) and S(z + 2π) = S(z), the periodicity of C(z) and S(z), by using Theorem 1.27 (but see pp. ??).
We shall close this chapter by restating the results proved above in the usual language
of ez etc. instead of the temporary notation E (z ) etc. we have been using.
    e^{iz} = cos z + i sin z                       (1-12)
    e^{−iz} = cos z − i sin z                      (1-13)
    cos z = (1/2)(e^{iz} + e^{−iz})                (1-14)
    sin z = (1/2i)(e^{iz} − e^{−iz})               (1-15)
    e^z e^w = e^{z+w}                              (1-16)
    sin² z + cos² z = 1                            (1-17)
    cos(z + w) = cos z cos w − sin z sin w         (1-18)
    sin(z + w) = sin z cos w + sin w cos z         (1-19)

Generally, all algebraic formulas for sin x, cos x, and e^x remain valid for sin z, cos z, and
ez . In fact any algebraic relationship between any combination of analytic functions remains
valid as we change the independent variable from a real x to the complex z. Inequalities
almost always fall apart in the transition from x ∈ R to z ∈ C . Exercise 2e below illustrates
this.
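For instance, a quick check with the standard library (whose complex sine implements these same power series) shows how badly the real-axis bound fails:

```python
import cmath
import math

# On the real axis |sin x| <= 1, but sin(2i) = i sinh 2, so
# |sin(2i)| = sinh 2, already well above 1.
z = 2j
assert abs(cmath.sin(z)) > 1
assert abs(abs(cmath.sin(z)) - math.sinh(2)) < 1e-12
```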
One formula which we will use frequently later on is a specialization of (1-12) to the
case when z is real. Then writing the real z as θ we have the famous formula
    e^{iθ} = cos θ + i sin θ,   θ ∈ R.             (1-20)

We cannot resist writing this formula down again for θ = π:

    e^{iπ} = −1,

an almost mystical identity connecting the four numbers e, i, π, and −1. Notice that (1-20) also implies |e^{iθ}| = 1.

If we write z = x + iy, then using (1-20) we find

    e^z = e^{x+iy} = e^x e^{iy} = e^x (cos y + i sin y).   (1-21)

A consequence of this is

    |e^z| = e^x.                                   (1-22)

Exercises
(1) Observe that (directly from the power series)
cos(−z ) = cos z, and sin(−z ) = − sin z.
Use this and the addition formula for cos(z + w) to prove that sin2 z + cos2 z = 1 .
(2) If we define sinh x = (1/2)(e^x − e^{−x}) and cosh x = (1/2)(e^x + e^{−x}), x ∈ R, prove that

    (a) cos ix = cosh x, sin ix = i sinh x

    (b) cos z = cos x cosh y − i sin x sinh y, (z = x + iy)
        sin z = sin x cosh y + i cos x sinh y

    (c) |cos z|² = cos² x + sinh² y
        |cos z|² = cosh² y − sin² x
        |sin z|² = sin² x + sinh² y
        |sin z|² = cosh² y − cos² x

    (d) Use the identities of part c) to deduce that

        |sinh y| ≤ |cos z| ≤ cosh y
        |sinh y| ≤ |sin z| ≤ cosh y

    (e) Prove that there is some z ∈ C such that

        |sin z| > 1, and |cos z| > 1.
(3) Define the derivative of f(z) at z₀, where z, z₀ ∈ C, as

        lim_{z→z₀} [f(z) − f(z₀)]/(z − z₀),

    if the limit exists.

    (a) By working directly with the power series, show that e^z is differentiable for all z, and that

        (d/dz) e^{az} = a e^{az},   a, z ∈ C.

    (b) Apply this to (1-12) and (1-13) to deduce that

        (d/dz) cos z = − sin z,   (d/dz) sin z = cos z
(We cannot appeal to Theorem 16 and diﬀerentiate term-by-term since that
theorem assumed the independent variable, x , was real).
(4) Use the results of Exercise 2c to show that the only complex zeros z = x + iy of sin z
and cos z are at the points on the real axis y = 0 where sin x = 0 and cos x = 0 ,
respectively.
(5) Use the results of this section to prove De Moivre's Theorem
(cos θ + i sin θ)n = cos nθ + i sin nθ, θ ∈ R,
where n is a positive integer.
(6) (a) Show that the sum of the finite geometric series Σ_{n=1}^N e^{inx} is

        [e^{i(N+1/2)x} − e^{ix/2}] / [e^{ix/2} − e^{−ix/2}].

    (b) Take the real and imaginary parts of the above formula and prove that for all x ∈ (0, 2π),

        Σ_{n=1}^N cos nx = [sin(N + 1/2)x − sin(x/2)] / [2 sin(x/2)],

        Σ_{n=1}^N sin nx = [cos(x/2) − cos(N + 1/2)x] / [2 sin(x/2)].

1.7 Appendix to Chapter 1, Section 7.

As a special dessert let us take some time out and prove some interesting results you would
probably never see otherwise. We have in mind to deﬁne a speciﬁc number α ∈ R as the
smallest positive zero of cos x, x ∈ R —so α had better turn out as π/2 . Then we prove
that 1) sin(x + 4α) = sin x etc., 2) the ratio of the circumference to diameter of a circle
is 2α so that 2α does equal the π of public school fame. Furthermore, we also present a
way of computing α .
In this section we take sin z and cos z, z ∈ C to be deﬁned by their power series,
and use only the properties of these functions which were obtained from the power series
deﬁnition.
Lemma 1.31 The set A = { x ∈ R : cos x = 0, 0 < x < 2 } is not empty, that is, the
equation cos x = 0 has at least one real root for x ∈ (0, 2) .
Proof: Since cos x is deﬁned by a convergent power series, it is continuous (even inﬁnitely
diﬀerentiable); furthermore because x ∈ R and the power series has real coeﬃcients, we
know that cos x, x ∈ R, is real-valued. Observe that cos 0 = 1 > 0, and the following crude inequality:

    cos 2 = 1 − 2²/(1·2) + Σ_{n=2}^∞ (−1)^n 2^{2n}/(2n)!
          < −1 + Σ_{n=2}^∞ 2^{2n}/(2n)!
          < −1 + (2⁴/4!) Σ_{k=0}^∞ (2/5)^{2k}           (1-23)
          = −1 + 50/63 < 0.

Thus cos 0 > 0 and cos 2 < 0, so there is at least one point in (0, 2) where the real-valued
continuous function cos x vanishes. This proves the lemma.
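Both the sign of cos 2 and the crude bound in (1-23) are easily checked numerically (a sketch; the 30-term cutoff is our choice):

```python
import math

# Partial sum of the cosine series at x = 2; it converges very fast.
partial = sum((-1)**n * 2**(2*n) / math.factorial(2*n) for n in range(30))
assert abs(partial - math.cos(2)) < 1e-12
# cos 2 is indeed below the bound -1 + 50/63 < 0 derived in (1-23).
assert math.cos(2) < -1 + 50/63 < 0
```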
Denote the g.l.b. of A (which does exist since A is bounded—say by 0 and 2) by α.
We shall show that α ∈ A . Since α is the g.l.b. of A , there exists a sequence of points
αk ∈ A (the αk may just be the same point repeated over and over) such that αk → α
and cos αk = 0 . But since cos x is continuous,
0 = lim cos αk = cos α,
k→∞ so in fact cos α = 0 too ⇒ α ∈ A .
Now cos x must be positive throughout the interval [0, α), since it is positive at x = 0 and α is the first place it vanishes. Therefore the formula (d/dx) sin x = cos x—obtained by differentiating the real power series for sin x term by term—shows that sin x is increasing for x ∈ [0, α). Since sin 0 = 0, we see that sin x ≥ 0 for x ∈ [0, α). Thus the formula (d/dx) cos x = − sin x tells us that cos x is decreasing in the interval [0, α]. From the formula

    1 = sin² α + cos² α = sin² α,

and the fact that sin α ≥ 0, we find that sin α = 1. We can thus conclude from the
addition formulas for sin x and cos x the:
Theorem 1.32 Let α denote the smallest zero of cos x for x > 0 . Then
cos α = 0, cos 2α = −1, cos 3α = 0, cos 4α = 1
sin α = 1, sin 2α = 0, sin 3α = −1, sin 4α = 0,
or more generally
cos(z + α) = − sin z, sin(z + α) = cos z
cos(z + 4α) = cos z, sin(z + 4α) = sin z
This proves that sin z and cos z are periodic with period 4α.
As you have guessed, α is another name for π/2 —and serves as our deﬁnition of π .
This is based upon power series and is independent of circles or triangles—or even the entire
concept of angle. A simple consequence is the
Corollary 1.33 The function ez is periodic with period 4αi ,
    e^{z+4αi} = e^z e^{4αi} = e^z.

Proof: e^{z+4αi} = e^z e^{4iα} = e^z (cos 4α + i sin 4α) = e^z (1 + i0) = e^z.
Two issues remain to be settled before closing up. We should 1) prove that the ratio
of the circumference C of a circle to its diameter D is π , i.e., C = 2αD , and 2) ﬁnd
some way of approximating α numerically (for all we know of α so far is that it is
the smallest element in a set and 0 < α < 2 ). The two problems are closely related.
The circle of radius R has the equation x2 + y 2 = R2 . Consider the portion in the
first quadrant. Then using the familiar formulas for arc length, we find that

    C/4 = R ∫₀^R dx/√(R² − x²) = R ∫₀^1 dt/√(1 − t²),

where the change of variable x = Rt has been used to obtain the last integral [this is legal since the mapping "multiply by R" is a bijection and hence an invertible function]. Thus, the desired result, C = 2αD = 4αR, will be proved if we can prove

Theorem 1.34   ∫₀^1 dt/√(1 − t²) = α (= π/2).

Corollary 1.35 If C denotes the arc length of the circumference of a circle of radius R,
then C = 4αR.

Proof of Theorem: We want to make the change of variable t = sin ζ, where t ∈ [0, 1]. In order to do this we must only check that the function sin ζ is differentiable and invertible there. We know it is differentiable. Since sin x is continuous and monotone increasing for x ∈ [0, α], and since the end points are mapped into 0 and 1 respectively (sin 0 = 0, sin α = 1), the function f(ζ) = sin ζ is invertible for ζ ∈ [0, α] ⇐⇒ t ∈ [0, 1]. The usual formulas are applicable and yield

    ∫₀^1 dt/√(1 − t²) = ∫₀^α dζ = α.   Q.E.D.
To compute π = 2α, it is convenient to introduce tan z = sin z / cos z, for all z where cos z ≠ 0. In particular tan x is defined for all real x in the interval 0 ≤ x ≤ α/2. From the behavior of sin x and cos x in the interval [0, α/2], it is easy to show that tan x has infinitely many derivatives and is increasing for x ∈ [0, α/2], assuming the values from 0 = tan 0 to 1 = tan(α/2). The function tan x is therefore invertible in that interval, so we can make the natural change of variable t = tan x and obtain

    ∫₀^1 dt/(1 + t²) = ∫₀^{α/2} [1/(1 + tan² x)] (d/dx) tan x dx = ∫₀^{α/2} dx = α/2.
.
2 But the integral on the left can be approximated readily because of the algebraic identity
1
=
1 + t2 N (−1)n t2n +
0 Thus
α
π
==
4
2
or 1
0 dt
=
1 + t2 (−1)N +1 t2N +2
, all t = i.
1 + t2 N 1 (−1)n
0 0 1 2N +2
t t2n dt + (−1)N +1
0 1 + t2 dt, π
111
(−1)N
= 1 − + − + ··· +
+ RN ,
4
357
2N + 1 where since 2t ≤ 1 + t2 the remainder RN can be estimated by
1 2N +2
t |RN | =
0 1+ t2 1 2N +2
t dt <
0 2t dt = 1
4N + 4 If the ﬁrst 250 terms in the series are used, N = 250 , we ﬁnd
π
11
1
= 1 − + − ··· +
+ R250 ,
4
35
251
1
1
where |R250 | < 1004 < 1000 , so three decimal accuracy is obtained. This is quite slow—but
it does work. For practical computations, a series which converges much faster is needed.
See exercise 2 below; it is neat.
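The slow convergence is easy to see by computing the partial sums directly (a sketch; leibniz is our own name for the partial sum):

```python
import math

def leibniz(N):
    # Partial sum 1 - 1/3 + 1/5 - ... + (-1)^N/(2N + 1) of the series for pi/4
    return sum((-1)**n / (2*n + 1) for n in range(N + 1))

# With N = 250 the bound |R_N| < 1/(4N + 4) = 1/1004 gives three decimals:
assert abs(leibniz(250) - math.pi / 4) < 1e-3
# A handful of terms is nowhere near as good:
assert abs(leibniz(10) - math.pi / 4) > 1e-2
```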
Since R_N → 0 as N → ∞, the following formula is a consequence of our effort:

    π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − ···.

Exercises

(1) Use the method illustrated here to show that
        ln 2 = ∫₀^1 dx/(1 + x) = 1 − 1/2 + 1/3 − 1/4 + 1/5 − ··· + (−1)^{N+1}/N + R_N,

    where lim_{N→∞} R_N = 0. Find an N such that |R_N| < 10⁻³.
    [Hint: Write

        1/(1 + x) = Σ_{n=0}^N (−1)^n x^n + (−1)^{N+1} x^{N+1}/(1 + x),   x ≠ −1.]

(2) To approximate π/4 = ∫₀^1 dt/(1 + t²) with fewer terms, the following clever device works. Write

        1/(1 + t²) = Σ_{n=0}^{N−1} (−1)^n t^{2n} + (−1)^N t^{2N}/2 + [(−1)^N t^{2N}/2 + (−1)^{N+1} t^{2N+2}/(1 + t²)]

    and show that

        π/4 = 1 − 1/3 + 1/5 − ··· + (−1)^{N−1}/(2N − 1) + (−1)^N/(2(2N + 1)) + R̃_N,

    where

        R̃_N = [(−1)^N/2] ∫₀^1 (t^{2N} − t^{2N+2})/(1 + t²) dt.

    (a) Prove that |R̃_N| < 1/(8N² + 8N).

    (b) What should N be to make |R̃_N| < 10⁻³? Amazing saving, isn't it? The technique does generalize to other series and can be refined to yield even better results.

    (c) Apply the method given here to problem 1 above to show that ln 2 = 1 − 1/2 + 1/3 − ··· + (−1)^{N+1}/N + (1/2)(−1)^N/(N + 1) + R̃_N, where |R̃_N| < 1/((2N + 1)(2N + 3)). Pick N so that |R̃_N| < 10⁻³.

Chapter 2

Linear Vector Spaces: Algebraic Structure

2.1 Examples and Definition

In order to develop intuition for linear vector spaces, a slew of standard examples are needed.
From them we shall abstract the needed properties which will then be stated as a set of
axioms.

a) The Space R2.

We begin by informally examining a space of two dimensions (whatever that means). It is
constructed by taking the Cartesian Product of R with itself. We are thus looking at R × R ,
which is denoted by R2 . A point X in this space is an ordered pair, X = (x1 , x2 ) , where
x1 ∈ R, x2 ∈ R. x1 and x2 are called the coordinates or components of the point X. Let
us propose a reasonable algebraic structure on R × R . If X = (x1 , x2 ) , and Y = (y1 , y2 )
are any two points, and α is any real number, we deﬁne
addition: X + Y = (x1 + y1 , x2 + y2 ) .
multiplication by scalars: α · X = (αx1 , αx2 ), α ∈ R .
equality: X = Y ⇐⇒ x1 = y1 , x2 = y2
The addition formula states that the parallelogram rule is used to add points, whereas
the second formula states that a point X is “stretched” by α by stretching each coordinate
by α .
Some immediate consequences of the above deﬁnitions are, for all X, Y, Z in R × R ,
(1) addition is associative (X + Y ) + Z = X + (Y + Z )
(2) addition is commutative X + Y = Y + X
(3) There is an additive identity, 0=(0,0) with the property that X + 0 = X for any X .
(4) Every X = (x1 , x2 ) ∈ R × R has an additive inverse (−x1 , −x2 ) , which we denote
by −X . Thus X + (−X ) = 0 . Thus the set of points in R × R forms an additive
abelian group.
The following additional properties are also obvious, where α and β are arbitrary
real numbers.
(5) α(βX) = (αβ)X
(6) 1 · X = X .
and the two distributive laws.
(7) (α + β )X = αX + βX
(8) α(X + Y ) = αX + αY .
To ensure that you too feel these properties are obvious, let us prove one, say 7.
(α + β ) · X = (α + β ) · (x1 , x2 ) = ((α + β )x1 , (α + β )x2 )
= (αx1 + βx1 , αx2 + βx2 ) = (αx1 , αx2 ) + (βx1 , βx2 ) (2-1) = α · (x1 , x2 ) + β · (x1 , x2 ) = α · X + β · X
Example: If X = (2, 1) , then 3X = (6, 3) and −2X = (−4, −2) .
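A hedged sketch of these coordinate-wise operations (tuple-based; add and scale are our own helper names, not the text's):

```python
def add(X, Y):
    # Parallelogram rule: add coordinate by coordinate
    return tuple(x + y for x, y in zip(X, Y))

def scale(a, X):
    # "Stretch" each coordinate by the scalar a
    return tuple(a * x for x in X)

X = (2, 1)
assert scale(3, X) == (6, 3)
assert scale(-2, X) == (-4, -2)
# Distributive law (7): (a + b)X = aX + bX
a, b = 3, -2
assert scale(a + b, X) == add(scale(a, X), scale(b, X))
```

The same two helpers work unchanged for Rn, since they act coordinate by coordinate.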
Instead of thinking of the elements (x1 , x2 ) in R2 as points, it is sometimes useful to
think of them as directed line segments, from the origin (0,0) directed to the point (x1 , x2 ) .
The ﬁgure at the right illustrates this.
Note that the axes need not be perpendicular to each other in the space R2 . They
could just as well veer oﬀ at some outrageous angle, as in the diagram. This is because we
have yet to place a metric (distance) structure on R2 or introduce any concept of angle
measurement. When we do that, we will have Euclidean 2-space E2 . But right now all we
have is R2, which might be thought of as a floppy Euclidean space.

b) The Space Rn.
This is a simple-minded generalization of R2 . A point X in Rn = R × ... × R is an
ordered n tuple, X = (x1 , x2 , ..., xn ) of real numbers, xk ∈ R . If X = (x1 , ..., xn ) and
Y = (y1 , ..., yn ) are any two points in Rn , and α is any real number, we deﬁne
addition: X + Y = (x1 + y1, x2 + y2, ..., xn + yn)
multiplication by scalars: α · X = (αx1 , αx2 , ..., αxn ), α ∈ R.
equality: X = Y ⇐⇒ xj = yj for all j .
Example: The point X = (1, 2, 3), and (1/2)X = (1/2, 1, 3/2) in R3 are indicated in the figure.
Again the coordinate axes need not be mutually perpendicular.
Properties 1-8 listed earlier remain valid - and with the proofs essentially unchanged
(just add dots inside the parentheses).
Remark: . At this stage, you probably are anxiously waiting for us to deﬁne multiplication
in Rn , that is, the product of two points in Rn , X · Y = Z ∈ Rn , possibly using the
multiplication of complex numbers (points in R2 ) as a guide. Well, we would if we could.
It turns out that it is possible to deﬁne such a multiplication only in R1 , R2 , R4 , and in
R8 –but in no others. This is a famous theorem. In R2 ordinary complex multiplication
does the job. To do it in R4 , we have to abandon the commutative law for multiplication.
The result is called quaternions. In R8 , the multiplication is neither commutative nor
associative. The result there is the Cayley numbers.
Here we shall not have time to treat this issue. All we shall do (later) is introduce a
"pseudo multiplication" in R3—the so-called cross product—obtained from the quaternion algebra in R4. The major importance of this pseudo multiplication which holds only in R3
is the fact of life that our world has three space dimensions. This multiplication is extremely
valuable in physics.

c) The Space C[a, b].

Our next example is of an entirely different nature; it is a space of functions, a function
space. The space C [a, b] is the set of all real-valued functions of a real variable x which
are continuous for x ∈ [a, b] . If f and g are continuous for x ∈ [a, b] , that is if f and
g ∈ C [a, b] , and if α is any real number, we deﬁne, in the usual way,
addition: (f + g )(x) = f (x) + g (x) ,
multiplication by scalars: (αf )(x) = α[f (x)]. α ∈ R
equality: f = g ⇐⇒ f (x) = g (x) for all x ∈ [a, b].
Notice that the sum of two functions in C [a, b] is again in C [a, b] , and the product of
a continuous function - in C [a, b] —by a constant α is also an element of C [a, b] . We shall
ignore the fact that the product of two continuous functions is also a continuous function.
Properties 1-8 listed earlier are also valid here, that is, if f, g , and h are any elements
in C [a, b] , then
(1) f + (g + h) = (f + g ) + h
(2) f + g = g + f
(3) f + 0 = f
(4) f + (−1)f = 0
(5) α(βf ) = (αβ )f
(6) (1)f = f, 1 ∈ R

(7) (α + β)f = αf + βf
(8) α(f + g ) = αf + αg .
Again, 1-4 state that the elements of C [a, b] form an abelian group with the group
operation being addition. When we deﬁne the dimension of a vector space, it will turn
out that the space C [a, b] is inﬁnite dimensional, but don’t let that bother you. This nice
space, C[a, b], and Rn are the two most useful examples of a vector space.

d) The Space C^k[a, b].

The space C^k[a, b] consists of all real-valued functions f(x) which have k continuous
derivatives for x in the interval [a, b] ⊂ R . When k = 0 , this reduces to the space C [a, b] .
Addition and scalar multiplication are deﬁned just as in C [a, b] . The key property is that
the sum of two functions with k continuous derivatives for x ∈ [a, b] is also a function with
k continuous derivatives. All of properties 1-8 are valid in C k [a, b] .
Every function f (x) which has one continuous derivative is necessarily continuous.
This is a basic result from elementary calculus; it may be written as C 1 [a, b] ⊂ C [a, b] .
Since the function |x| , x ∈ [−1, 1] is in C [−1, 1] but not in C 1 [−1, 1] , we see that C 1 and
C are not the same, that is C 1 is a proper subset of C . Similarly, C k+1 [a, b] ⊂ C k [a, b]
(see Exercise 7).

The space C^∞[a, b] consists of all functions with an infinite number of continuous
derivatives for x ∈ [a, b] . All functions which have a convergent Taylor series for x ∈ [a, b]
are in C^∞[a, b]. In addition, C^∞[a, b] contains functions like f(x) = e^{−1/x²}, x ≠ 0, f(0) =
0 , which have an inﬁnite number of continuous derivatives (see p. ??) but do not have
convergent Taylor series.
Another example of a function space is the set of analytic functions A(z0 , R) , functions
which have a convergent Taylor series in the disc with center at z0 ∈ C and radius at least
R.

e) The Space l1.

The space l1 (tired yet?) consists of all infinite sequences X = (x1, x2, x3, ...) which satisfy the condition Σ_{n=1}^∞ |xn| < ∞. Addition and multiplication by scalars are defined in a natural way. If X and Y are in l1, then

    X + Y = (x1 + y1, x2 + y2, x3 + y3, ...)

and, if α is any complex number,

    α · X = (αx1, αx2, ...).

Equality is defined by

    X = Y ⇐⇒ xj = yj for all j.

We should show that if X and Y are in l1, then so are X + Y and α · X. To prove that X + Y ∈ l1, we must show that Σ |xn + yn| < ∞. But since |xn + yn| ≤ |xn| + |yn|, we have for any N ∈ Z+

    Σ_{n=1}^N |xn + yn| ≤ Σ_{n=1}^N |xn| + Σ_{n=1}^N |yn| ≤ Σ_{n=1}^∞ |xn| + Σ_{n=1}^∞ |yn| < ∞.

Now letting N → ∞ on the left, we see that Σ_{n=1}^∞ |xn + yn| < ∞. If X ∈ l1, it is obvious that α · X is also in l1 since

    Σ_{n=1}^∞ |αxn| = Σ_{n=1}^∞ |α| |xn| = |α| Σ_{n=1}^∞ |xn| < ∞.

f) The Space L1[a, b].

Yes, the space L1[a, b] does consist of all functions f(x) (possibly complex-valued) with
the property that ∫_a^b |f(x)| dx < ∞. It is the integral analogue of l1. Addition and scalar multiplication are defined as in C[a, b], that is, as usual. If f and g are in L1[a, b], then so are f + g and αf, where α ∈ C, since

    ∫_a^b |f(x) + g(x)| dx ≤ ∫_a^b |f(x)| dx + ∫_a^b |g(x)| dx < ∞,

and

    ∫_a^b |αf(x)| dx = |α| ∫_a^b |f(x)| dx < ∞.

For example, f(x) = x is in L1[0, 1] but f(x) = 1/x² is not in L1[0, 1]. It is simple to check that properties 1-8 are satisfied in L1[a, b].

g) The Space fn.

If P(x) = a0 + a1x + ... + anx^n is any polynomial of degree n with real coefficients and
Q(x) = b0 + b1 x + ... + bn xn is another one, then with ordinary addition, multiplication by
real scalars and equality, the set fn of all polynomials of degree at most n satisfies conditions 1-8.
Since
a0 + a1 x + ... + an−1 xn−1 = a0 + a1 x + ... + an−1 xn−1 + 0xn ,
it is clear that fn−1 ⊂ fn .
Enough examples for now. You must have gotten the point. We shall meet more later
on. Let us give the abstract deﬁnition of a linear vector space.
Definition: . Let S be a set with elements X, Y, Z, ... and F be a ﬁeld with elements
α, β, ... . The set S is a linear vector space (linear space, vector space) over the field F if the
following conditions are satisﬁed.
For any two elements X, Y ∈ S , there is a unique third element X + Y ∈ S , such that
(1) (X + Y ) + Z = X + (Y + Z );
(2) X + Y = Y + X ;
(3) There exists an element 0 ∈ S having the property that 0 + X = X for all X ∈ S ;
(4) for every X ∈ S , there is an element −X ∈ S ; such that X + (−X ) = 0 .
Furthermore, if α is any element of the ﬁeld F , there is a unique element αX ∈ S
such that, for any α, β ∈ F ,
(5) α(βX ) = (αβ )X ;
(6) 1 · X = X .
The additive and ﬁeld multiplicative structures are related by the following distributive rules
(7) (α + β )X = αX + βX
(8) α(X + Y ) = αX + αY.
Elements of the ﬁeld F are called scalars, whereas elements of S are called vectors.
We shall usually take the real numbers R for our ﬁeld F , although the complex numbers
C will be used at times. Exercise 4 shows the need for Axiom 6 (in case you thought it was
superﬂuous).
All of the examples of this section are linear spaces. For most purposes the simple
example R2 will serve you well as a guide to further expectations. The pictures there are
simple. In fact, with a certain degree of cleverness, the “right” proof for R2 immediately
generalizes to all other linear spaces - even "infinite dimensional" ones.

Since you probably think that everything is a linear space, here is an example to dispel
the delusion. Let S be the subset of all functions f (x) in C [0, 1] which have the property
f (0) = 1 . Then if f and g are in S , we are immediately stuck since f (0) + g (0) = 2 , so
that f + g is not in S. Also, 0 ∉ S.
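The failure of closure can be made concrete (a sketch; the particular functions chosen are ours):

```python
# S = continuous functions on [0, 1] with f(0) = 1; closure under + fails.
f = lambda x: 1.0           # f(0) = 1, so f lies in S
g = lambda x: 1.0 - x       # g(0) = 1, so g lies in S
h = lambda x: f(x) + g(x)   # pointwise sum, as defined in C[0, 1]
assert h(0) == 2.0          # h(0) != 1, so f + g is not in S
zero = lambda x: 0.0        # the zero function
assert zero(0) != 1         # 0 is not in S either
```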
Both here, and before (p.?) when deﬁning a ﬁeld, axioms “0” have been used. They all
express roughly the same concept. We have some set S and an operation * deﬁned on the
set. These axioms all stated that for any x, y ∈ S , we also have x ∗ y ∈ S . In other words,
the set S is closed under the operation * in the sense that performing that operation does
not take us out of the set. We shall find this concept useful.

h) Appendix. Free Vectors

One more example is needed, an exceedingly important example. There are "physicists'
vectors” or free vectors. I always thought they were easy to deﬁne - until today. Twelve
hours and ﬁfty pages later, I begin again on the ﬁfth attempt. The essential idea is easy to
imagine but diﬃcult to convey in a clear and precise exposition.
Say you are given two elements X and Y of Rn , which we represent by directed line
segments from the origin. Somehow we want to ﬁnd a directed line segment V from the
tip of X to the tip of Y . Now V “looks” like a vector. The problem is that all of the
vectors we have met so far have been directed line segments in Rn beginning at the origin.
In order to ﬁnd a way out, it is best to examine the problem for the most simple case
—R1, the ordinary line. Watch closely since we will be so shrewd that all the formalism
will be adequate without change for the general case of Rn .
We are given two points, X and Y of R1 which we shall represent by directed line
segments from the origin. To make the picture clear, we will draw them slightly above the
line.
a figure goes here
We want a directed line segment V from the tip of X to the tip of Y . Of course you
recognize this as the problem of solving
X +V =Y
The solution, V = Y − X , is the diﬀerence of the two real numbers Y and X . But
where should we draw V ? If we are stubborn and demand that all real numbers must be
represented by line segments beginning at the origin, we have the picture
a figure goes here
but what we really want to do is place the tail of V at the tip of X and add the line
segments. Why not relent and allow ourselves this added ﬂexibility.
a figure goes here
There! Now we have solved our problem. But we have made an important generalization
in doing so. You see, this V has been released from its bondage to the origin and is now
free to move along the whole of R .
Although we were led to this V from the pair X and Y , the same V could have been
generated by a different pair X̃ and Ỹ, as the diagram below indicates,

a figure goes here

for we still have X̃ + V = Ỹ.
In the ﬁrst case we might have had X = 2 and Y = 3 , so that V = 1 , while in the
second, we might have had X = −4 and Y = −3 , and again V = 1 . Even though we
have let this V go free, sliding from place to place along R , we still want to say that this
is only one V , and in fact, we want to identify this V with the V tied to the origin in
(2). In other words, we would like to say that all three V ’s used above are equivalent to
each other.
More formally, the element V is generated by an ordered pair, V = [X, Y], which we read as the vector from X to Y, for X, Y ∈ R. If some Ṽ is generated by another ordered pair, Ṽ = [X̃, Ỹ], X̃, Ỹ ∈ R, then we want equality V = Ṽ to mean that Y − X = Ỹ − X̃. Moreover, we want to represent V = [X, Y], the vector from X to Y, by the vector from the origin 0 to Y − X, V = [0, Y − X]. This representation of V is unique, since if any other pair also generates V, V = [X̃, Ỹ], the representative Ṽ = [0, Ỹ − X̃] = [0, Y − X] since V = Ṽ implies that Ỹ − X̃ = Y − X. Therefore, much as each rational number is an equivalence class, represented by a single rational number - as 1/2 represents the equivalence class 1/2, 2/4, 3/6, ... - each V is an equivalence class of ordered pairs V = [X, Y], where X, Y ∈ R. It is uniquely represented by an element of R, viz. V = Y − X, the representation being independent of the particular ordered pair [X, Y] which generates V. It is possible to think of V either as an ordered pair with an equivalence relation, or just as the representative V = [0, Y − X] of the whole equivalence class, the representation being written more simply as an element of R: V = Y − X, where here equality is between elements of R.
The generalization is now easily made
Definition: (Free vectors). Let X and Y be any elements of Rn. An element V ∈ Vn, "physicists' n-space", is defined as an equivalence class of ordered pairs of elements in Rn,

    V = [X, Y],   X, Y ∈ Rn,

with the following equivalence relation: If V = [X, Y] and Ṽ = [X̃, Ỹ], then

    V = Ṽ ⇐⇒ Y − X = Ỹ − X̃,
where the second equality is that of elements in Rn . If we are given X and Y in Rn , we
speak of V = [X, Y ] as the free vector going from X to Y .
Previous reasoning also shows that each V ∈ Vn is uniquely represented by the ordered
pair V = [0, Y − X ] . This representation is independent of the elements [X, Y ] which
generated V .
We were led to this definition of Vn by examining the situation in the special case of V1. Since our formal reasoning there was quite algebraic and general, we know that the definition works algebraically. The geometry works too. An example in V2 should make
the general case clear.
Let X = (1, 3) and Y = (2, 1) . These two points in R2 generate the ordered pair
V = [(1, 3), (2, 1)] in V2 . V is the vector going from X = (1, 3) to Y = (2, 1) . Of all
equivalent V ’s, the unique representative which begins at the origin is V = [(0, 0), (1, −2)] ,
which we simply write as V = (1, −2) and represent as an ordinary element of R2 . On
the same diagram we exhibit the vector from X̃ = (−2, 2) to Ỹ = (−1, 0), which is Ṽ = [(−2, 2), (−1, 0)]. The unique representative (of all V's equivalent to Ṽ) which begins from (0,0) is Ṽ = [(0, 0), (1, −2)], which we write simply as Ṽ = (1, −2). Comparison of V and Ṽ reveals that they are equal, V = Ṽ. Thus, from the diagram, we see that a free vector is an equivalence class of directed line segments, with two directed line segments V, Ṽ being equivalent as vectors in V2 if they are equivalent to the same directed line segment which begins at the origin. In more geometrical language, V = Ṽ if by sliding
them “parallel to themselves”, they can be made to coincide with their representer which
begins at the origin. (We shall not deﬁne “parallel” here. It is not needed because we
already have a satisfactory algebraic deﬁnition of equivalence.)
Notice that X = (1, 3) and Y = (2, 1) also generate a second ordered pair V̂ = [(2, 1), (1, 3)], the vector from Y = (2, 1) to X = (1, 3). Its unique representation which begins at the origin is V̂ = [(0, 0), (−1, 2)], or more simply V̂ = (−1, 2). Comparison with the previous example shows that V̂ = −V: the vector from Y to X is the negative of the
vector from X to Y . We need the little arrow on our picture of V = [X, Y ] to distinguish
it from −V = [Y, X ] which is also between the same points but headed in the opposite
direction.
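The test for equivalence of directed segments is mechanical: reduce each pair [X, Y ] to its representative Y − X and compare. A small Python sketch (the helper names are our own, not the notes'):

```python
# A directed segment is stored as a pair (X, Y) of points in R^n (tuples).
# Two segments determine the same free vector exactly when they have the
# same representative Y - X beginning at the origin.

def representative(segment):
    """Return Y - X, the unique representative starting at the origin."""
    X, Y = segment
    return tuple(y - x for x, y in zip(X, Y))

def equivalent(seg1, seg2):
    """Do two directed segments determine the same free vector?"""
    return representative(seg1) == representative(seg2)

V = ((1, 3), (2, 1))          # from X = (1, 3) to Y = (2, 1)
V_tilde = ((-2, 2), (-1, 0))  # from (-2, 2) to (-1, 0)
V_hat = ((2, 1), (1, 3))      # the reversed segment, from Y back to X

# V and V_tilde both reduce to (1, -2); V_hat reduces to (-1, 2) = -V.
```

Reversing a segment negates its representative, matching the observation that the vector from Y to X is the negative of the vector from X to Y.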
From now on we shall denote a vector V ∈ Vn from X to Y by its representative
Y − X in Rn , so V = Y − X . Hence the vector from (1,3) to (2,1) will be immediately
written as V = (1, −2) . As we have said many times, the representation V = Y − X as
an element of Rn is independent of which particular pair [X, Y ] happened to generate V .
The following diagram shows a whole bunch of equivalent vectors Vj ∈ V2 , Vj = Vk , and
their particular representative V chained to the origin.
a figure goes here
In order to justify calling the elements of Vn vectors, we should prove that the elements
of Vn do form a vector space. Addition and scalar multiplication must ﬁrst be deﬁned, an
easy task. Since every V ∈ Vn is uniquely represented as an element of Rn , V = Y − X ∈
Rn , we use addition and scalar multiplication for elements of Rn —which has already been
defined. Because Rn is known to be a vector space, the proof of the following theorem is a
tedious triviality.
Theorem 2.1 . Vn is a linear vector space.
Proof: Only a smattering.
(1) Vn is closed under addition. Say V1 and V2 are in Vn . Then they are represented
as the diﬀerence of two elements of Rn , say V1 = Y1 − X1 and V2 = Y2 − X2 . Thus
V1 + V2 = (Y1 − X1 ) + (Y2 − X2 ) = (Y1 + Y2 ) − (X1 + X2 ),
so that their sum is generated by [X1 + X2 , Y1 + Y2 ] . In other words, there is at
least one pair of elements, [X3 , Y3 ], X3 = X1 + X2 and Y3 = Y1 + Y2 , in Rn which
generate V1 + V2 , so that V3 = V1 + V2 ∈ Vn . Of course [0, Y3 − X3 ] and many other
pairs also generate V3 .
(2) Commutativity.
V1 + V2 = (Y1 − X1 ) + (Y2 − X2 ) = (Y2 − X2 ) + (Y1 − X1 ) = V2 + V1 .
(3) (α + β )V1 = (α + β )(Y1 − X1 ) = α(Y1 − X1 ) + β (Y1 − X1 ) = αV1 + βV1 .

Example: If A = (4, 2, −3), B = (0, 1, −2), C = (−1, 0, 1/2) and D = (4, −1/2, 1) , find the
vector V1 from A to B and the vector V2 from C to D . Then compute V1 + 2V2 and
V1 − V2 .
Solution:
V1 = B − A = (0, 1, −2) − (4, 2, −3) = (−4, −1, 1)
V2 = D − C = (4, −1/2, 1) − (−1, 0, 1/2) = (5, −1/2, 1/2)
V1 + 2V2 = (−4, −1, 1) + 2(5, −1/2, 1/2) = (−4, −1, 1) + (10, −1, 1) = (6, −2, 2)
V1 − V2 = (−4, −1, 1) − (5, −1/2, 1/2) = (−4, −1, 1) + (−5, 1/2, −1/2) = (−9, −1/2, 1/2)
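The componentwise arithmetic of this kind of example is easy to reproduce in code. A Python sketch (helper names are our own; exact fractions avoid any rounding of the halves):

```python
# Free-vector arithmetic done componentwise, following the worked example.
from fractions import Fraction as F

def vec(P, Q):
    """The free vector from point P to point Q, i.e. Q - P."""
    return tuple(q - p for p, q in zip(P, Q))

def add(U, V):
    return tuple(u + v for u, v in zip(U, V))

def scale(a, V):
    return tuple(a * v for v in V)

A, B = (4, 2, -3), (0, 1, -2)
C, D = (-1, 0, F(1, 2)), (4, F(-1, 2), 1)

V1 = vec(A, B)                 # (-4, -1, 1)
V2 = vec(C, D)                 # (5, -1/2, 1/2)
combo = add(V1, scale(2, V2))  # V1 + 2 V2 = (6, -2, 2)
diff = add(V1, scale(-1, V2))  # V1 - V2 = (-9, -1/2, 1/2)
```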
Exercises
(1) (a) Find the vector representing the free vector from the given A ∈ Rn to B ∈ Rn .
(i) A = (3, 1), B = (2, 2).
(ii) A = (−3, 3), B = (0, 4).
(iii) A = (2, 2, 3), B = (5, 2, 17).
(iv) A = (0, 0, 0), B = (9, 8, −3).
(v) A = (1, 2, 3), B = (0, 0, −1).
(vi) A = (0, 0, −1), B = (1, 2, 3).
(b) Let V1 and V2 be the respective vectors of (iii) and (v) above. Compute V1 + V2 ,
V1 − V2 , and 2V1 − 3V2 .
(c) Draw a diagram on which you indicate the vector going from A = (3, 1) to
B = (2, 2) , and indicate the representer of that vector which begins at the
origin. Do the same with the vector from B to A .
(2) Which of the following subsets of C [−1, 1] are linear spaces:
(a) The set of all even functions in C [−1, 1] , that is, functions f (x) with the additional property f (−x) = f (x) , like x² and cos x .
(b) The set of all functions f in C [−1, 1] with the additional property that |f (x)| ≤
1.
(c) The set of all functions f in C [−1, 1] with the property that f (0) = 0 .
(3) In R3 , let X = (1, −1, 2) and Y = (0, 4, −3) . Find X + 2Y, Y − X , and 7X − 4Y .
(4) (a) Show that for every X ∈ R3 you can ﬁnd scalars αj ∈ R such that X can be
written as
X = α1 e1 + α2 e2 + α3 e3 ,
where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) .
(b) If X ∈ R3 , can you ﬁnd scalars αj ∈ R such that
X = α1 θ1 + α2 θ2 + α3 θ3 ,
where θ1 = (1, −1, 0), θ2 = (−1, 1, 0), θ3 = (0, 0, 1) , and αj ∈ R ? Proof or
counter-example.
(c) Find two polynomials P1 (x) and P2 (x) in P1 such that for every polynomial
P (x) ∈ P1 you can find scalars αj ∈ R such that P can be written in the form
P (x) = α1 P1 (x) + α2 P2 (x).
(5) Let V = R × R with the following definition of addition and scalar multiplication:
X + Y = (x1 + x2 , y1 + y2 ), αX = (αx1 , 0),
0 = (0, 0), −X = (−x1 , −x2 ).
Is V a vector space? Why?
(6) Show that any ﬁeld can be considered to be a vector space over itself.
(7) Consider the set
S = { u ∈ C 2 [0, 1] : a2 u′′ + a1 u′ + a0 u = 0 },
where the aj (x) ∈ C [0, 1] . Is S a linear space? Note that we do not yet know that
S has any elements at all. The proof that S is not empty is the existence theorem
for ordinary diﬀerential equations.
(8) By integrating |x| the “right” number of times, ﬁnd a function which is in C k [−1, 1]
but is not in C k+1 [−1, 1] .

2.2 Subspaces. Cosets.

With this section we begin the process of assigning names to the various concepts surrounding the idea of a linear vector space. This name calling will take us the balance of the
chapter. Although the ideas are elementary and theorems simple, do not deceive yourselves
into thinking this must be some grotesque joke that mathematicians have perpetrated. You
see, we are in the process of building a machine. Most of its constituent parts are very
easy to grasp. But when combined, the machine will be equipped successfully to assault a
diversity of problems which appear offhand to be unrelated.
The value of this abstract formalism is that many seemingly distinct complicated speciﬁc
problems are just one single problem in a variety of fancy dresses. By ignoring the extraneous
paraphernalia we can concentrate on the essential issues.
a figure goes here
We begin by deﬁning what is meant by a subspace of a vector space W . While reading
the deﬁnition, think of a plane through the origin, which is a subspace of ordinary three
dimensional space.
Definition: A set A is a linear subspace (linear variety, linear manifold) of the linear
space W if i) A is a subset of W , and ii) A is also a linear space under the operations of
vector addition and multiplication by scalars already defined on W .

Examples:
(1) Let A = { X ∈ R3 : X = (x1 , x2 , 0) } , that is, the points in R3 whose last coordinate
is zero. Since A ⊂ R3 , and a simple check shows that A is also a linear space,
we see that A is a linear subspace of R3 . Intuitively, this set A certainly “looks
like” R2 . You are right, and recall that the fancy word for this equivalence - of
R2 = (x1 , x2 ) and the points in R3 of the form (x1 , x2 , 0) —is isomorphic. Similarly,
the set B = { X ∈ R3 : X = (x1 , 0, x3 ) } is also a subspace of R3 . B is also
isomorphic to R2 .
(2) Let A = { X ∈ Rn : X = (x1 , x2 , ..., xk , 0, 0, ..., 0) } , that is, the points in A are those
points in Rn whose last n − k coordinates are zero. It is easy to see that A is a
linear subspace of Rn , and that A is isomorphic to Rk .
(3) Let A = { f ∈ C [0, 1] : f (0) = 0 } . A is a subset of the linear space C [0, 1] , and is
also a linear space (check this). Thus A is a linear subspace of C [0, 1] .
(4) Let A = { f ∈ C [0, 1] : f (0) = 1 } . A is a subset of C [0, 1] , but it is not a linear
subspace since - as we saw in the last section (p. ?)— A is itself not a linear space.
The following lemma supplies a convenient criterion for checking if a given subset A of
a linear space W is a subspace.
Theorem 2.2 . If A is a non-empty subset of the linear space W , then A is a linear
subspace of W ⇐⇒ A is closed under addition of vectors in A and multiplication by all
scalars.
Proof: . ⇒ . Since A is a subspace, it is itself a linear space. But all linear spaces are,
by deﬁnition, closed under addition and multiplication by scalars.
⇐ . Because A is a subset of W , and properties 1,2,5,6,7, and 8 hold in W , they also
hold for the particular elements in W which happened to be in A . Notice that here we use
the fact that A is closed under addition. Therefore only the existential axioms 3 and 4 need
be checked. Since A is not empty, it contains at least one element, say X ∈ A . Because
A is closed under multiplication by scalars we see that 0 = 0 · X ∈ A . Furthermore, for
every X ∈ A , also −X = (−1) · X ∈ A .
Example: Let A = { f ∈ C 1 [0, 1] : f ′(0) = 0 } . Since A is a subset of the linear space
C 1 [0, 1] , all we need show is that A is closed under addition and multiplication by scalars in
order to prove A a linear subspace of C 1 [0, 1] . If f, g ∈ A , then (f + g)′(0) = f ′(0) + g ′(0) =
0 + 0 = 0 , so f + g ∈ A . Also, for any α ∈ R, (αf )′(0) = αf ′(0) = α · 0 = 0 , so
αf ∈ A .
Theorem 2.3 . The intersection of two subspaces is also a subspace, but the union of two
subspaces is not necessarily a subspace. More generally, the intersection of any collection
of subspaces is also a subspace.
Proof: . Let A, B be subspaces of W . We show that A ∩ B is a subspace. Since
A ∩ B ⊂ W , all we need show is the closure properties of A ∩ B . If X, Y ∈ A ∩ B , then
X and Y are both in A and B , so X + Y ∈ A and X + Y ∈ B ⇒ X + Y ∈ A ∩ B
too. Similarly for scalar multiples. The proof that A ∩ B ∩ C ∩ ... is a subspace is identical
except for a notational mess.
For the second part of the theorem we merely exhibit an example of two subspaces
A, B for which A ∪ B is not a subspace. In R2 let A be the linear subspace “horizontal
axis”, that is, A = { X ∈ R2 : X = (x1 , 0) } , while B is “the vertical axis”, B = { X ∈
R2 : X = (0, x2 ) } . Then A ∪ B is the “cross” of all points on either the horizontal axis or
the vertical axis. This is not a linear space because points like (1, 0) ∈ A, (0, 1) ∈ B do
not have their sum (1, 0) + (0, 1) = (1, 1) in A ∪ B . Precisely for this reason R2 = R1 × R1
was constructed as the Cartesian product of R1 with itself; for if it had been constructed
as R1 ∪ R1 , then only the points situated on the axes themselves would get caught. More
generally - and for the same reason - the Cartesian product is the process always used to
“glue” together a larger space from several linear spaces. Only when A ⊂ B (or B ⊂ A )
is A ∪ B also a subspace (Exercise 4).
Your image of a linear space should be R3 , and a subspace S is a plane or line in R3 .
Note that since every subspace must contain 0, these planes or lines must pass through the
origin.
Example: Let Sc = { X ∈ R2 : x1 + 2x2 = c } , where c is real. Thus, the set Sc is all points
S = (s1 , s2 ) ∈ R2 on the straight line s1 + 2s2 = c . For what value(s) of c is Sc a
subspace? If Sc is a subspace, then we must have aS ∈ Sc for all scalars a , that is
aS = (as1 , as2 ) ∈ Sc ⇒ as1 + 2as2 = c . But for a = 0 this states that c = 0 . Therefore
the only possible subspace is S0 = { X ∈ R2 : x1 + 2x2 = 0 } . It is easy to check that if S1
and S2 are in S0 , then so are S1 + S2 and aS1 . Thus S0 is a subspace. Similarly, every
straight line through the origin is a subspace.
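The closure computations for S0 can at least be spot-checked by machine. A Python sketch (a sanity check on finitely many samples, not a proof; the names are our own):

```python
# Spot-check the closure properties of S0 = { (x1, x2) : x1 + 2*x2 = 0 }.

def in_S0(X):
    x1, x2 = X
    return x1 + 2 * x2 == 0

samples = [(2, -1), (-4, 2), (0, 0)]   # a few points on the line
scalars = [0, 1, -3, 5]

# Sums of sample points stay on the line ...
closed_add = all(in_S0((x1 + y1, x2 + y2))
                 for (x1, x2) in samples for (y1, y2) in samples)
# ... and so do scalar multiples.
closed_scale = all(in_S0((a * x1, a * x2))
                   for a in scalars for (x1, x2) in samples)
```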
Our question now is, how can we talk about the other straight lines or planes which do
not happen to pass through the origin? First we answer the question for our example above.
There we have the linear space R2 and the subspace S0 , which will be simply written as
S . S is a line through the origin. Let X1 be any element in R2 (think of X1 as a point).
Then the set of all elements of R2 which can be written in the form S + X1 , where S ∈ S ,
is the line "parallel" to S which passes through X1 . This line is written as S + X1 . More
explicitly, say X1 = (1, 3/2) . The set S + X1 is the set of all points X = (x1 , x2 ) ∈ R2 of
the form

X = S + X1 , where S ∈ S,

or

(x1 , x2 ) = (s1 , s2 ) + (1, 3/2), where s1 + 2s2 = 0.

Consequently x1 = s1 + 1 , and x2 = s2 + 3/2 . Using the relation s1 + 2s2 = 0 , we find that
x1 + 2x2 = 4 - exactly the equation of the straight line through X1 = (1, 3/2) and "parallel"
to the subspace S . This subset, S + X1 = { X ∈ R2 : X = S + X1 , where S ∈ S } , is called
the X1 coset of S . Thus, cosets are the names given to "linear objects" which are not
subspaces. They are subspaces translated to pass through X1 . You might prefer to call
them affine subspaces instead of cosets.
Please observe that the cosets S + X1 and S + X2 , where X1 , X2 ∈ W , are not
necessarily distinct. In our example, these cosets coincide if and only if X2 is on the line
S + X1 , that is, if X2 ∈ S + X1 . The easiest way to test this is to see if X2 − X1 ∈ S .
Say X1 = (1, 3/2) as before, and that X2 = (2, 1) . Then the cosets S + X1 and S + X2 are
the same since the point X2 − X1 = (1, −1/2) is in S . It should be geometrically clear that
the relation of equality among these cosets is an equivalence relation (and so deserving of
the title "equality"). We shall state these ideas formally as we turn from this special - but
characteristic - example to the general situation.
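The coset test X2 − X1 ∈ S used above is one line of code. A Python sketch for this particular S (helper names are our own):

```python
# Cosets S + X1 and S + X2 of S = { (x1, x2) : x1 + 2*x2 = 0 } coincide
# exactly when the difference X2 - X1 lies in S.
from fractions import Fraction as F

def in_S(X):
    x1, x2 = X
    return x1 + 2 * x2 == 0

def same_coset(X1, X2):
    """S + X1 == S + X2  <=>  X2 - X1 in S."""
    return in_S((X2[0] - X1[0], X2[1] - X1[1]))

X1 = (1, F(3, 2))
X2 = (2, 1)
# X2 - X1 = (1, -1/2) satisfies 1 + 2*(-1/2) = 0, so the cosets agree.
```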
The general problem of describing lines or planes or “higher dimensional linear objects”
which do not pass through the origin - so are not subspaces - is solved similarly.
Definition: Let W be a linear space, S a subspace of W , and X1 any element of W .
The set of all elements in W which can be written in the form S + X1 , where S ∈ S , is
called the X1 coset of S , and is written as S + X1 .
Our ﬁrst theorem states that if X2 is in the X1 coset of S , then X1 is in the X2
coset of S :
Theorem 2.4 . X2 ∈ S + X1 ⇐⇒ X1 ∈ S + X2 .
Proof: Since X2 ∈ S + X1 , there is an S ∈ S such that X2 = S + X1 . Therefore
X1 = (−S ) + X2 . Because S is a linear space, (−S ) ∈ S . Thus X1 has been written as
the sum of X2 and an element of S , which means that X1 ∈ S + X2 .
By the same argument, one sees that any two cosets S + X1 and S + X2 are either
identical or are disjoint (have no element in common). Thus the cosets of S partition W
in the sense that every element of W is in exactly one coset, just as for our example, every
point in the plane R2 was in exactly one straight line parallel to the subspace determined
by x1 + 2x2 = 0 .
Although we were motivated by geometrical considerations, the ideas apply without
alteration to any linear space. This is illustrated by again examining the set
A = { f ∈ C [−1, 1] : f (0) = 1 },
which is not a subspace. It is a coset of a subspace S of C [−1, 1] which is constructed as
follows. Consider the subspace S which is “naturally” associated with A , viz.
S = { g ∈ C [−1, 1] : g (0) = 0 }.
Then A is the coset S + 1, A = S + 1 . This is true since clearly A ⊃ S + 1 . Also A ⊂ S + 1
because for every f ∈ A ,
f (x) = [f (x) − 1] + 1 = g (x) + 1, where g ∈ S.
Therefore A = S + 1 . Similarly, we could have written A as S + f̂ , where f̂ is any function
in A , for example A = S + cos x .

Exercises
(1) Find which of the following subsets of Rn are subspaces.
(a) { X ∈ Rn : x1 = 0 },
(b) { X ∈ Rn : x1 ≥ 0 },
(c) { X ∈ Rn : x1 − x2 = 0 },
(d) { X ∈ Rn : x1 − x2 = 1 },
(e) { X ∈ Rn : x1² − x2 = 0 }.

(2) In P3 , the linear space of all polynomials of degree ≤ 3 , let A = { p(x) ∈ P3 : p(0) =
0 } , and let B = { p(x) ∈ P3 : p(1) = 0 } .
(a). Show that A and B are subspaces of P3 .
(b). Find A ∩ B and A ∪ B . Give an example which shows that A ∪ B is not a
subspace of P3 .
(3) (a) If X1 and X2 are given ﬁxed vectors in R2 then is
A = { X ∈ R2 : X = a1 X1 + a2 X2 , a1 and a2 any scalars }
a subspace of R2 ?
(b) Same as (a) but replace R2 by an arbitrary linear space W .
(c) If X1 , X2 , ..., Xk ∈ W , then is
A = { X ∈ W : X = a1 X1 + · · · + ak Xk , for any scalars aj },
a subspace of W ?
(4) Let A and B be subspaces of a linear space W . Prove that A ∪ B is also a subspace
if and only if either A ⊂ B or B ⊂ A , that is, if one of the subspaces contains the
other.
(5) Let S and T be subspaces of a linear space W , and suppose that A is a coset
of S and B is a coset of T . Prove that (a). A ⊂ B ⇒ S ⊂ T , and also (b).
A=B ⇒S=T.
(6) (a) Write the plane 2x1 − 3x2 + x3 = 7 as a coset of some suitable subspace S ⊂ R3 .
(b) Write the set A = { f ∈ C [0, 4] : f (0) = 1, f (1) = 3 } , as a coset of some suitable
subspace S ⊂ C [0, 4] .
(c) Write the set A = { f ∈ C 1 [0, 4] : f (1) = 1, f ′(1) = 2 } as a coset of some
suitable subspace S ⊂ C 1 [0, 4] .

2.3 Linear Dependence and Independence. Span.

If W is a linear space and X1 , X2 , ..., Xk ∈ W , then we know that, for any scalars aj ,

Y = a1 X1 + a2 X2 + ... + ak Xk

is also in W . Y is a linear combination of the Xj 's. Now if 0 can be expressed as a linear
combination of the Xj 's, where at least one of the aj 's is not zero, we expect that there is
something degenerate around. In fact, if 0 = a1 X1 + ... + ak Xk where say a1 ≠ 0 , then we
can solve for X1 as a linear combination of X2 , X3 , ..., Xk ,
X1 = −(1/a1 )(a2 X2 + ... + ak Xk ).

This leads us to make a definition and state a theorem.

Definition: A finite set of elements Xj ∈ W, j = 1, ..., k is called linearly dependent if
there exists a set of scalars aj , j = 1, ..., k , not all zero, such that 0 = a1 X1 + ... + ak Xk .
If the Xj are not linearly dependent, we say they are linearly independent.
Theorem 2.5 . A set of vectors Xj ∈ W, j = 1, ..., k is linearly dependent if and only if
at least one of the Xj ’s can be written as a linear combination of the other Xj ’s.
To test if a given set of vectors is linearly independent, an equivalent form of Theorem
5 is useful.
Corollary 2.6 A set of vectors Xj ∈ W, j = 1, ..., k is linearly independent if and only if
a1 X1 + ... + ak Xk = 0 implies that a1 = a2 = ... = ak = 0 .
Examples:
(1) The vectors X1 = (2, 0), X2 = (0, 1), X3 = (1, 1) in R2 are linearly dependent since
0 = X1 + 2X2 − 2X3 . Equivalently, we could have applied the theorem, since X3 can
be written as a linear combination of X1 and X2 :
X3 = (1/2)X1 + X2 .
(2) The functions f1 (x) = e^x , f2 (x) = e^{-x} , f3 (x) = (e^x + e^{-x})/2 in C [0, 1] are
linearly dependent since
0 = f1 + f2 − 2f3 .
(3) The vectors X1 = (2, 0, 1), X2 = (−1, 0, 0) in R3 are linearly independent, since if
for some a1 , a2 ,
0 = a1 X1 + a2 X2 = (2a1 , 0, a1 ) + (−a2 , 0, 0),
then
0 = (0, 0, 0) = (2a1 − a2 , 0, a1 ),
which implies that 2a1 − a2 = 0 and a1 = 0 , and hence a1 = a2 = 0 .
a figure goes here
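For vectors in Rn the dependence question is decided by the rank of the matrix whose rows are the given vectors: the set is dependent exactly when the rank is smaller than the number of vectors. A Python sketch using exact Gaussian elimination (helper names are our own):

```python
# Rank via exact Gaussian elimination over the rationals.
from fractions import Fraction as F

def rank(rows):
    M = [[F(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        # find a pivot at or below row r in this column
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def dependent(vectors):
    return rank(vectors) < len(vectors)

# Example (1): (2,0), (0,1), (1,1) are dependent; example (3):
# (2,0,1), (-1,0,0) are independent.
```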
A simple consequence of these ideas is the following
Theorem 2.7 . If A and B are any subsets of the linear space W and if A ⊂ B , then
i) A is linearly dependent ⇒ B is linearly dependent; and the contrapositive: ii) B is
linearly independent ⇒ A is linearly independent.
We now prove the transitivity of linear dependence.
Theorem 2.8 . If Z is linearly dependent on the set { Yj }, j = 1, . . . , n , and each Yj
is linearly dependent on the set { Xl }, l = 1, . . . , m then Z is linearly dependent on the
{ Xl } .
Proof: This is trivial arithmetic. We know that

Z = a1 Y1 + . . . + an Yn ,

and that

Yj = c1j X1 + c2j X2 + ... + cmj Xm .

By substitution, then,

Z = a1 (c11 X1 + · · · + cm1 Xm ) + a2 (c12 X1 + · · · + cm2 Xm )
+ · · · + an (c1n X1 + · · · + cmn Xm )
= (a1 c11 + a2 c12 + · · · + an c1n )X1 + (a1 c21 + · · · + an c2n )X2
+ · · · + (a1 cm1 + · · · + an cmn )Xm
= γ1 X1 + · · · + γm Xm , where γl = a1 cl1 + · · · + an cln .

More concisely:

Z = Σ_j aj Yj = Σ_j aj ( Σ_l clj Xl ) = Σ_l ( Σ_j aj clj ) Xl = Σ_l γl Xl ,

where j runs from 1 to n and l runs from 1 to m .

Let X1 and X2 be any elements of a linear space W . Is there a smallest subspace
A of W which contains X1 and X2 ? There are two possible ways of answering this,
constructively and non-constructively.
First, constructively. We observe that the desired subspace must contain X1 and
X2 , and all linear combinations of X1 and X2 , that is, A must contain all X ∈ W
of the form X = a1 X1 + a2 X2 for all scalars a1 and a2 . But observe that the set
B = { X ∈ W : X = a1 X1 + a2 X2 } is a linear space, since if X and Y ∈ B , then aX ∈ B
for any scalar a , and also X + Y ∈ B . Thus the desired subspace A is just B itself.
The non-constructive proof goes as follows: just let A be the intersection of all subspaces
containing X1 and X2 . By Theorem 3 the intersection of these subspaces is also a subspace.
It is clearly the smallest one. Do you feel cheated? This type of reasoning is often used in
modern mathematics. Although it reveals little more than the existence of the sought-after
object, it is an extremely valuable procedure when you really don’t want anything more
than to know the object exists. More important, procedures like this are vital when there
is no constructive proof available.
More generally, if S = { Xj }, j = 1, . . . , k , is any ﬁnite subset of a linear space W , we
ask for the smallest subspace A of W which contains S . There are two proofs - exactly
as in the simple case above (where k = 2 ). From the constructive proof we ﬁnd that
A = { X ∈ W : X = a1 X1 + · · · + ak Xk , aj scalars },
so A is the set of all linear combinations of the Xj 's. This set A is called the span of S ,
and denoted by A = span(S ) . We also say that S spans A , or that A is generated by S .
Examples:
(1) In R3 let X1 = (1, 0, 0) and X2 = (0, 1, 0) . Then the span of S = { Xj , j = 1, 2 } is
all X ∈ R3 of the form X = a1 X1 + a2 X2 = (a1 , a2 , 0) . If we imagine R3 as ordinary
3-space, then the span of X1 and X2 is the entire x1 , x2 plane.
(2) In R3 , let X1 = (1, 0, 0), X2 = (0, 1, 0) , and X3 = (0, 0, 1) . Then the span of
T = { Xj , j = 1, 2, 3 } is all X ∈ R3 of the form X = a1 X1 + a2 X2 + a3 X3 = (a1 , a2 , a3 ) .
Since all of R3 can be so represented, we have span(T ) = R3 , that is, the set T spans
R3 . Comparing these two examples, we see that S ⊂ T and span(S ) ⊂ span(T ) .
(3) In R2 , let X1 = (1, 0) and X2 = (0, 1) . Then the span of S = { X1 , X2 } is all of
R2 , since every X ∈ R2 can be written as X = a1 X1 + a2 X2 , where a1 and a2 are
scalars. Many other sets also span R2 . In fact almost every set of two vectors X1 and
X2 in R2 spans R2 . This can be seen from the diagram, where we have drawn a net
parallel to X1 and X2 . Then X = a1 X1 + a2 X2 . Any vectors X1 and X2 would
do equally well, as long as they do not point in the same (or opposite) direction.
We collect some properties of the span
Theorem 2.9 . Let R, S , and T be subsets of a linear space W . Then
(a) R ⊂ span(R) .
(b) R ⊂ S =⇒ span(R) ⊂ span(S ).
(c) R ⊂ span(S ) and S ⊂ span(T ) =⇒ R ⊂ span(T ).
(d) S ⊂ span(T ) =⇒ span(S ) ⊂ span(T ).
(e) span(span(T )) = span(T ).
(f) A vector Xj ∈ S is linearly dependent on the other elements of S ⇐⇒ span(S ) =
span(S − { Xj }) . (Here S − { Xj } means the set S with the one vector Xj deleted.)
Proof: These all depend on the representation of span(S ) as a linear combination of the
elements of S .
(a) and (b)—Obvious. They really should be if you understand the deﬁnitions.
(c). A direct translation of Theorem 7.
(d). This is the special case R = span(S ) of part c.
(e). By part (a) span(span(T )) ⊃ span(T ) . The opposite inclusion span(span(T )) ⊂
span(T ) is the special case S = span(T ) of part (d).
(f). Xj linearly dependent on S − { Xj } =⇒ S ⊂ span(S − { Xj }) . Thus by part (d),
span(S ) ⊂ span(S −{ Xj }) . Inclusion in the opposite direction span(S −{ Xj }) ⊂ span(S )
follows from part (b). Therefore span(S ) = span(S − { Xj }) means that Xj ∈ span(S )
can be expressed as a linear combination of the elements of S − { Xj } , i.e., the other Xk 's.
Now most likely this proof was your ﬁrst taste of abstract juggling and you ﬁnd it
diﬃcult. Relax and don’t be impressed with how formidable it appears. Except for parts a
and b, the whole business hinges on the explicit construction of Theorem ?. Since (d) is a
special case of (c), a good exercise is to write out the proof of (d) without relying on (c).
In R2 , let X1 = (1, 0) , X2 = (0, 1) , and X3 be any vector in R2 . Observe that X1
and X2 together span R2 . Thus X3 can be expressed as a linear combination of X1 and
X2 , so that X1 , X2 , and X3 are linearly dependent. The next theorem is a generalization
of this idea. 92 CHAPTER 2. LINEAR VECTOR SPACES: ALGEBRAIC STRUCTURE Theorem 2.10 . If a ﬁnite set A = { Xj , j = 1, . . . , n } spans a linear space W , then ev˜
ery set S = { Yj ∈ V, j = 1, . . . , m > n } with more than n elements is linearly dependent.
In other words, every linearly independent set has at most n elements.
˜
Proof: Pick any n + 1 elements Y1 , . . . Yn+1 from S and throw the rest away. Call the
new set S . We shall show that these n + 1 elements are linearly dependent. Then, since
˜
˜
S ⊂ S , Theorem ? tells us that S is also linearly dependent. The only problem is how
to carry out the proof without getting involved in a mess of algebra. By the principle of
conservation of eﬀort, this means that there will be some fancy footwork.
Reasoning by contradiction, assume S is linearly independent. If we can show that
span(A) = span(S − { Yn+1 }) , then span(S ) ⊂ span(S − { Yn+1 }) because span(S ) ⊂ W =
span(A) = span(S − { Yn+1 }) . Since span(S − { Yn+1 }) ⊂ span(S ) , we can apply part f
of Theorem ? to conclude that S is linearly dependent - the desired contradiction.
Thus, assuming S = { Y1 , . . . , Yn+1 } is linearly independent, we are done if we prove
that span(A) = span(S − { Yn+1 }) . Consider the set Bk = { Y1 , . . . , Yk , Xk+1 , . . . , Xn } .
We know that B0 = A , so that span(B0 ) = span(A) = W . Then by induction we shall
prove that span(Bk ) = W =⇒ span(Bk+1 ) = W . Since Bk spans W , Yk+1
is a linear combination of the elements of Bk . Because the Y ’s are assumed linearly
independent, this linear combination must involve at least one of Xk+1 , . . . , Xn . Say it
involves Xk+1 (if not, relabel the X 's to make it so). Then we can solve for Xk+1 as a
linear combination of the elements of Bk+1 . Therefore W = span(Bk ) = span(Bk+1 ) . Putting this
part together, we ﬁnd that span(A) = W = span(B0 ) = span(B1 ) = . . . = span(Bn ) . But
Bn = S − { Yn+1 } . Thus span(A) = span(S − { Yn+1 }) , and the proof is completed.
Example. In R2 , any three (or more) non-zero vectors are linearly dependent since the two
vectors X1 = (1, 0) and X2 = (0, 1) span R2 .

Exercises
(1) Determine whether the following sets of vectors are linearly dependent or linearly independent.
(a) In P2 , p1 (x) = 1, p2 (x) = 1 + x, p3 (x) = x − x² .
(b) In R3 , X1 = (0, 1, 1), X2 = (0, 0, −1), X3 = (0, 2, 3) .
(c) In C [0, π ], f (x) = sin x, g (x) = cos x .
(d) In Rn , e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1) .
(2) Use the result of (d) to show that any set of n + 1 vectors in Rn must be linearly
dependent.
(3) (a) Find a set which spans i) P3 , ii) R4 .
(b) Show that no finite set spans l1 .
(4) Let X1 , . . . , Xk be any elements of a linear space V .
(a) Prove that span({ X1 , . . . , Xk }) = span({ X1 + aXj , X2 , . . . , Xk }) , where a is
any scalar and Xj is any of X2 , X3 , . . . , Xk .
(b) Prove that span({ X1 , . . . , Xk }) = span({ aX1 , X2 , . . . , Xk }) , where a ≠ 0 .
(c) In Rn , consider the ordered set of vectors { X1 , X2 , . . . , Xk } , where Xj =
(x1j , x2j , . . . , xnj ) . They are said to be in echelon form if i) no Xj is zero, and ii)
the index of the ﬁrst non-zero entry in Xj is less than the index of the ﬁrst nonzero entry in Xj +1 , for each j = 1, . . . , k − 1 . Thus X1 = (0, 1, 0), X2 = (0, 0, 1)
are in echelon form while X1 = (0, 1, 0), X2 = (1, 0, 1) are not in echelon form.
Prove that any set of vectors in echelon form is always linearly independent. (I
suggest a proof by induction).
(5) For what real value(s) of the scalar α are the vectors (α, 1, 0), (1, α, 1) and (0, 1, α)
in R3 linearly dependent?
(6) (a) In R3 , let X1 = (3, −1, 2) . Express (−6, 2, −4) linearly in terms of X1 . Show
that (3, 4, −7) cannot be expressed linearly in terms of X1 . Can (1, 2, 1) be
expressed linearly in terms of X1 ?
(b) In R3 , let A = { X1 , X2 } , where X1 = (1, 3, −2) and X2 = (2, 1, 1) . Express
(3, −1, 4) linearly in terms of A . Show that (0, 0, 2) cannot be expressed linearly
in terms of A . Can (0, 5, −5) be expressed linearly in terms of A ?
(7) (a) In C [0, 10] , let f1 , . . . , f8 be deﬁned by
f1 (x) = x² − x + 2,   f2 (x) = (x + 1)²,   f3 (x) = x + 3,   f4 (x) = 1,
f5 (x) = x³,   f6 (x) = sin x,   f7 (x) = cos x,   f8 (x) = sin(x + π/4).
Let A = { f1 , f2 , f3 } . Express f4 linearly in terms of A . Show that f5 cannot
be expressed linearly in terms of A . Is f6 ∈ span(A) ? Is f8 ∈ span(f6 , f7 ) ? Is
f6 ∈ span(f5 , f7 , f8 ) ?
(b) If we let f9 (x) = (x − 1)3 , f10 (x) = 2x − 1 , determine which of the following sets
are linearly dependent:
(i) { f1 , f3 , f10 },
(ii) { f1 , f5 , f9 },
(iii) { f3 , f4 , f10 },
(iv) { f1 , f4 , f5 , f9 } .

2.4 Bases and Dimension

If the set { X1 , . . . , Xm } spans the linear space W , is there any set with fewer than m
vectors which also spans W ? There certainly is if the { X1 , . . . , Xm } are linearly dependent, for if say Xm depends linearly upon { X1 , . . . , Xm−1 } , then by Theorem ??,
span({ X1 , . . . , Xm }) = span({ X1 , . . . , Xm−1 }) = W , so { X1 , . . . , Xm−1 } spans W .
We can continue and eliminate the extra linearly dependent elements until we obtain a set
{ X1 , . . . , Xn } of linearly independent vectors which still span W .
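The elimination procedure just described can be sketched as a single greedy pass: keep a vector only if it is independent of those already kept. A Python sketch for Rn (helper names are our own; exact arithmetic via fractions):

```python
# Prune a spanning list down to a linearly independent spanning subset.
from fractions import Fraction as F

def independent_of(kept, v):
    """Is v outside the span of the (independent) vectors in `kept`?"""
    M = [[F(x) for x in row] for row in kept + [v]]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            if M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r == len(kept) + 1          # rank grew, so v was independent

def prune_to_basis(vectors):
    kept = []
    for v in vectors:
        if independent_of(kept, v):
            kept.append(v)
    return kept

# (2,0), (0,1), (1,1) span R^2 but are dependent; pruning keeps the first two.
```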
Deﬁnition. A set of vectors Xj ∈ W, j = 1, . . . , n which is i) linearly independent, and
ii) spans W is called a basis for W .
Examples.
(1) In R2 , the vectors X1 = (1, 0) and X2 = (0, 1) are linearly independent and span
R2 . Therefore X1 and X2 form a basis for R2 . The vectors X3 = (3, −1) and
X4 = (−2, 2) in R2 are also linearly independent and span R2 . They thus constitute
another basis for R2 . Almost any two vectors in R2 span R2 , as long as they do not
point on the same or opposite direction.
(2) In P2 , the polynomials p1 (x) = 1 , and p2 (x) = x − x2 do not form a basis. They
are linearly independent but do not span the space - since for example you can never
obtain the polynomial p(x) = x which is in P2 . If we add the third polynomial, say
p3 (x) = x − 2x2 , then p1 , p2 and p3 do form a basis for P2 .
Bases have an important property.
Theorem 2.11 . If { X1 , . . . , Xn } form a basis for the linear space W , then every X ∈
W can be expressed uniquely as a linear combination of the Xj ’s.
Remark: Every set which spans W has, by deﬁnition, the property that every X ∈ W can
be expressed as a linear combination of the Xj ’s. The point here is that for a basis, this
linear combination is uniquely determined.
Proof: Suppose that X = a1 X1 + · · · + an Xn and also X = b1 X1 + · · · + bn Xn . We must
show that ak = bk for all k . Subtracting the two equations, we find that
0 = c1 X1 + · · · + cn Xn , where ck = ak − bk .
But since the Xk 's are linearly independent, by the Corollary to Theorem 5, the only way
a linear combination can be zero is if ck = 0, k = 1, . . . , n , that is, ak = bk for all k .
that diﬀerent bases contain a diﬀerent number of elements? Our next theorem states that
the answer is NO.
Theorem 2.12 . If a linear space W has one basis with a ﬁnite number of elements, say
n , then all other bases are ﬁnite and also have exactly n elements.
Proof: We invoke Theorem ?. Let A be a basis with n elements and B be a basis with
m elements. Now A spans W and the elements of B are linearly independent, so by
Theorem ?, m ≤ n . Reversing the roles of A and B we find that n ≤ m . Therefore
n = m.
With this result behind us, we can now deﬁne the dimension of a linear space.
Deﬁnition. If a linear space W has a basis with n elements, then we say that the dimension
of W is n . If a linear space W has the property that no ﬁnite set of elements spans it,
we say it is inﬁnite dimensional.
Remarks. Theorem ? states that the dimension of W is independent of which basis we
happened to pick. If we want to emphasize the dimension of a ﬁnite dimensional space, we
will write W n .
Announcement. The dimension of Rn is n , for the n elements e1 = (1, 0, 0, . . . , 0), e2 =
(0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) are linearly independent and span Rn .
A picture. We have seen that e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) form a basis
in R3 . Thus every X ∈ R3 can be expressed uniquely as a linear combination of the ej ’s,
X = a1 e1 + a2 e2 + a3 e3 . If we represent e1 as a directed line segment from the origin to
(1, 0, 0) , and similarly for e2 and e3 , then X is the geometrical sum of a1 e1 + a2 e2 + a3 e3 ,
and is represented as a directed line segment from the origin to (a1 , a2 , a3 ) . In R3 , e1 is
usually written as i , e2 as j and e3 as k , so that a vector X ∈ R3 is written as
X = a1 i + a2 j + a3 k .
The points in the plane x3 = 0 , which is isomorphic to R2 , are then represented as
X = a1 i + a2 j + 0 k = a1 i + a2 j . We would retain this notation except that one runs out of
letters when considering spaces of higher dimension. For that reason the subscript notation
e1 , e2 , . . . is better suited to our purposes.
It behooves us to show that the linear space C [0, 1] of functions continuous in the
interval [0, 1] is inﬁnite dimensional. This will be done by proving that the functions
f0 (x) = 1, f1 (x) = e^x , f2 (x) = e^{2x} , . . . , fn (x) = e^{nx} , . . . are linearly independent. Assume
that 0 = a0 + a1 e^x + · · · + aN e^{Nx} , where N is any non-negative integer. We must show
that all the ak 's are zero.
The trick is to use induction. For N = 0 , we know that 0 = a0 only if a0 = 0 .
Suppose 1, e^x , e^{2x} , . . . , e^{(N−1)x} are linearly independent. Then a0 + a1 e^x + · · · + a_{N−1} e^{(N−1)x} = 0 if and only
if all of the ak 's are zero. Let us show that this implies that a0 + a1 e^x + · · · + aN e^{Nx} = 0 if and only if
all the ak 's vanish. Take the derivative. The constant term drops out and we are left with
0 = a1 e^x + 2a2 e^{2x} + · · · + N aN e^{Nx} .
Factor out e^x :
0 = e^x (a1 + 2a2 e^x + · · · + N aN e^{(N−1)x} ).
Since e^x is never zero, we know that
0 = a1 + 2a2 e^x + · · · + N aN e^{(N−1)x} .
By our induction hypothesis, this linear combination of 1, e^x , . . . , e^{(N−1)x} can be zero if
and only if a1 = a2 = a3 = · · · = aN = 0 . It remains to show that a0 = 0 . This is an
immediate consequence of a0 + a1 e^x + · · · + aN e^{Nx} = 0 and the vanishing of ak for k ≥ 1 .
Since the functions 1, e^x , e^{2x} , . . . are in C^k [a, b] for any k , we have shown that these
spaces are infinite dimensional too. Moreover, the exact same proof also shows that the
set { e^{α1 x} , e^{α2 x} , . . . , e^{αN x} } , where α1 , . . . , αN are arbitrary distinct complex numbers, is
linearly independent. This fact will be needed later. Perhaps we shall present a different
proof - or several different ones - at that time. All of the other proofs still involve some
calculus - but that should be no surprise since we used calculus to define the exponential
function in the first place.
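The independence of 1, e^x , e^{2x} can also be checked by a modern computation (a numerical check, not a replacement for the proof above; the sample points 0, 1, 2 are our own arbitrary choice): if a combination a0 + a1 e^x + a2 e^{2x} vanished identically, it would vanish at three distinct points, and the resulting 3×3 system has only the trivial solution when its determinant is nonzero.

```python
import math

# If a0*1 + a1*e^x + a2*e^{2x} = 0 for all x, then in particular at
# x = 0, 1, 2, so M a = 0 with the matrix M below.  A nonzero
# determinant forces a0 = a1 = a2 = 0.
xs = [0.0, 1.0, 2.0]                       # arbitrary distinct points
M = [[math.exp(k * x) for k in range(3)] for x in xs]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

assert abs(det3(M)) > 1e-9   # invertible, so only the trivial combination
```

Here M is a Vandermonde matrix in the values e^0, e^1, e^2, which are distinct, so the determinant is automatically nonzero; the same idea works for any distinct exponents.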
Not all spaces of functions are infinite dimensional. For example, the function space
A = { f ∈ C [−1, 1] : f (x) = a + b e^x , a, b ∈ R } has dimension 2. The functions f1 (x) = 1
and f2 (x) = e^x constitute a basis for A because every f ∈ A can be written in the form
f = a1 f1 + a2 f2 , where a1 and a2 are real numbers. Another basis for A is f3 (x) = 1 + e^x
and f4 (x) = 2 − e^x . There are many ways to see this. One is to observe that f3 + f4 = 3
and 2f3 − f4 = 3 e^x . Thus if f (x) = a + b e^x ∈ A , then f = (a/3)(f3 + f4 ) + (b/3)(2f3 − f4 ) =
(a/3 + 2b/3) f3 + (a/3 − b/3) f4 .
The function space B = { f ∈ C [−1, 1] : f (x) = a sin(x + α), α, a ∈ R } also has
dimension two, since f (x) = (a cos α) sin x + (a sin α) cos x = a1 sin x + a2 cos x . Thus
f1 (x) = sin x and f2 (x) = cos x form a basis. Actually, we have only shown that f1
and f2 span B , but not that they are linearly independent. You can settle that point
yourselves.
A few more remarks should be added. If A is a subspace of an n -dimensional space
W^n , we would like to enlarge a basis { e1 , . . . , ek } for A to a larger basis { e1 , . . . , en } for
all of W^n . Since A ⊂ W^n , it is clear that k = dim A ≤ n . If A = W^n , we are done since
{ e1 , . . . , ek } already span W^n . Otherwise there is some element ek+1 in W^n which is
not in A . Let A1 = span{ e1 , . . . , ek+1 } ⊂ W^n . If A1 = W^n , then { e1 , . . . , ek+1 } form
a basis for W^n . Otherwise there is some element ek+2 in W^n which is not in A1 . Form
A2 = span{ e1 , . . . , ek+2 } . Repeat this process until you finally get a basis for all of W^n . Only
a finite number of steps are needed since the dimension of W^n is finite. This proves
Theorem 2.13 . If A is a subspace of a (finite dimensional) space W , then any basis for
A can be extended to a basis that spans all of W .
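The extension process in the proof can be sketched as a modern computation (illustrative only; the `rank` routine, the candidate pool of standard basis vectors, and the example subspace are our own devices, not from the notes): keep adjoining vectors that are not already in the span until the whole space is reached.

```python
from fractions import Fraction

def rank(vectors):
    """Row-reduce exactly over the rationals and count the pivots."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r, n = 0, len(rows[0]) if rows else 0
    for c in range(n):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(basis, n):
    """Enlarge an independent set in R^n to a basis, as in the proof:
    repeatedly adjoin an element not in the current span (candidates
    drawn, for convenience, from the standard basis e1, ..., en)."""
    basis = [list(v) for v in basis]
    for j in range(n):
        e = [1 if i == j else 0 for i in range(n)]
        if rank(basis + [e]) > rank(basis):
            basis.append(e)
    return basis

# Example: extend a basis of the plane x1 = x2 inside R^3.
B = extend_to_basis([[1, 1, 0], [0, 0, 1]], 3)
```

Only finitely many steps occur, exactly as the theorem argues, because each adjoined vector raises the rank and the rank cannot exceed n.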
Consider a subspace A of a linear space V . Somehow we would like to discuss - and
give a name to - the part Ā of V which is not in A . We would like Ā to be a subspace
of V such that the only element of V which A and Ā share is 0, and such that every
element in V can be written as the sum of an element in A and an element in Ā .
Definition. Let A be a subspace of the linear space V . A complementary subspace Ā
of A is a subset of V with the properties:
1. Ā is a subspace of V ,
2. if X ∈ V , then X = X1 + X2 , where X1 ∈ A and X2 ∈ Ā ,
3. A ∩ Ā = 0 (the zero vector, not the empty set).
Our first task is to prove
Theorem 2.14 . Every subspace A ⊂ V has at least one complement Ā .
Proof: Let { e1 , . . . , em } be a basis for A , and { e1 , . . . , em , em+1 , . . . , en } an extension
to a basis for V . We shall verify that Ā = span{ em+1 , . . . , en } satisfies both criteria.
Now if X ∈ A and X ∈ Ā , then we can write X = a1 e1 + . . . + am em ∈ A , and
X = am+1 em+1 + . . . + an en ∈ Ā . Subtracting these equations, we find
0 = a1 e1 + . . . + am em − am+1 em+1 − . . . − an en .
But since { e1 , . . . , en } is a basis for V , the elements are linearly independent. Thus
a1 = a2 = . . . = am = am+1 = . . . = an = 0 , so X = 0 . Therefore A ∩ Ā = 0 .
Furthermore, if X ∈ V , then since { e1 , . . . , en } is a basis for V ,
X = Σ_{j=1}^n cj ej = Σ_{j=1}^m cj ej + Σ_{j=m+1}^n cj ej .
Thus we just let X1 = c1 e1 + . . . + cm em ∈ A and X2 = cm+1 em+1 + . . . + cn en ∈ Ā .
It is easy to see that the above construction of Ā is independent of the basis chosen for
A . This is because the construction of em+1 , . . . , en (Theorem ??) did not depend on the
particular basis for A . That construction only utilized the fact that we can pick elements
not in A . However, the construction of Ā does depend on which elements em+1 , . . . , en
(not in A ) we pick. For example, let V = R2 , and A be some one dimensional subspace.
Then we pick e1 as any vector in A , and e2 as any vector not in A . The resulting
complement Ā is then the span of e2 . But { e1 } could have been extended to a basis
for V by choosing another vector ẽ2 ∉ A . This determines a different complement Ã of
A . A subspace has many possible complements. This ambiguity will not bother us since
we shall only use the properties of a particular complement which do not depend on which
particular complement is chosen. The dimension of the complement is such a property. It
only depends on the dimension of the subspace A and the larger space V , and has the
reasonable formula dim Ā = dim V − dim A , which we now prove.
Theorem 2.15 . If A is a subspace of a linear space V and if Ā is any complement of
A , then
dim A + dim Ā = dim V.
Thus, the dimension of Ā is determined by A and V alone.
Proof: dim A and dim V are given data. We shall compute dim Ā . Since the
union of a basis for A with a basis for any Ā spans V (property 2), it is clear that
dim A + dim Ā ≥ dim V . However A and any Ā intersect only at the origin (property
3) and are subspaces of V . Thus the union of their bases can span at most V , that is,
dim A + dim Ā ≤ dim V . These two inequalities prove the theorem.
Remark. Some people refer to dim Ā as the codimension of A (complementary
dimension). In this way they avoid mentioning Ā at all. The last theorem can be written
as dim A + codim A = dim V .
A simple result closes the chapter.
Theorem 2.16 . If A is a subspace of V and Ā is a complement of A , then for X ∈ V
the decomposition X = X1 + X2 , X1 ∈ A, X2 ∈ Ā , is unique.
Proof: Assume there are two decompositions, X = X1 + X2 and X = X̃1 + X̃2 . Then
X1 + X2 = X̃1 + X̃2 , or X1 − X̃1 = X̃2 − X2 . However the left side of this equation is in
A while the right is in Ā . The only element in both A and Ā is 0. Thus X1 = X̃1 and
X2 = X̃2 .
a figure goes here
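As a concrete instance of the unique decomposition (a modern sketch; the subspaces and the sample point are our own illustrative choices, not from the notes): in R2 take A as the x-axis and the complement Ā = span{ (1, 1) }. The decomposition of any X is found by solving two linear equations, and it comes out the same every time.

```python
# Decompose X in R^2 as X1 + X2 with X1 in A = span{(1,0)} and
# X2 in the complement A-bar = span{(1,1)}.  These subspaces are an
# illustrative (hypothetical) choice.
def decompose(X):
    # X = a*(1,0) + b*(1,1)  forces  b = X[1]  and  a = X[0] - X[1].
    b = X[1]
    a = X[0] - b
    return (a, 0.0), (b, b)        # X1 in A, X2 in A-bar

X1, X2 = decompose((3.0, 5.0))
```

The two coefficients are forced by the equations, which is exactly the uniqueness asserted in Theorem 2.16.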
EXERCISES
(1) (a) Let A = { X ∈ R2 : x1 = 0 } . Find a basis for A and extend it to a basis for all
of R2 . Use this to define a complement Ā of A . Sketch A and Ā . Extend
the same basis for A in a different way to a basis for all of R2 . Use this to
define another complement Ã of A . Sketch Ã .
(b) Find a basis for the subspace A = { X ∈ R3 : x1 + x2 + x3 = 0 } . Extend
this basis to one for all of R3 . Define a complement Ā of A induced by this
extension. Write X = (−1, 0, 7) as X = Y1 + Y2 where Y1 ∈ A and Y2 ∈ Ā .
(2) (a) Let A = { p ∈ P2 : p(0) = 0 } . Find a basis for A and extend it to a basis for
all of P2 . Define Ā induced by this extension. Is the particular polynomial
p(x) = 1 + x^2 in A ? in Ā ? Write p(x) as p(x) = q1 (x) + q2 (x) where
q1 (x) ∈ A, q2 (x) ∈ Ā .
(b) Let A = { p ∈ P2 : p(1) = 0 } . Find a basis for A and extend it to a basis for
all of P2 .
(3) Let A be a subspace of a linear space V . Show by an example that a basis for V
need not contain a basis for A .
(4) If dim V = n and V = sp{ X1 , . . . , Xn } , prove that X1 , . . . , Xn are linearly independent.
(5) Let V = P4 and A the subspace spanned by 1, x2 and x4 . Find three diﬀerent
subspaces complementary to A (you may specify a subspace by giving a basis for it).
After all this about bases, it is probably best to notify you that properties of linear
spaces are best deﬁned and proved without introducing a particular basis. As soon as you
deﬁne a property of a linear space in terms of a basis, you must then prove that the property
is intrinsic to the space itself and does not depend upon the basis you choose. We met this
problem in deﬁning the dimension in terms of a basis - and were consequently forced to
prove Theorem ? which stated that the property really only depended on the space itself,
not on the basis chosen.
This, in fact, corresponds to one of the major requisites for laws of physics: they should
not depend upon the particular coordinate system you choose (picking a coordinate system
is equivalent to picking a basis). Moreover, the laws should not depend on the units you
choose for each axis of the coordinate system. But these are long, involved questions which
must be investigated deeply to make our remarks precise.
One should, however, distinguish theoretical issues from computational ones. In theoretical
questions, the rule is: never pick a specific basis unless there is no way out. On the
other hand, for computational questions you must always pick a basis. Just as in physics,
in order to perform any measurements, you must pick some specific coordinate system
and specific units. If the theoretical foundations are firm, then you can feel confident that
no matter what choice of basis you make, the essential nature of the results will remain
unchanged.
As an example, let us consider a point P and two diﬀerent ﬁxed coordinate systems in
the plane of this paper. You should feel that any motion of the point P can be described
adequately in either coordinate system - and that when the observers in the two coordinate
systems get together and discuss the motion of P , they will agree as to what happened. A
common example is the meeting of two people from countries using different units of money.

Exercises
(1) Prove that any n + 1 elements in a linear space of dimension n must be linearly
dependent.
(2) Prove that Pn has dimension n + 1 .
(3) Since a basis for a linear space of dimension n must contain exactly n elements,
all one must test is that the n elements which are candidates for a basis are linearly
independent - or equivalently, that they span the space. Show that the vectors
{ X1 , . . . , Xn } form a basis for Rn if and only if e1 , e2 , . . . , en can all be expressed
as linear combinations of the { X1 , . . . , Xn } .
(4) Use Exercise 3 to determine which of the following sets form bases for R3 .
(a) X1 = (1, 1, 0), X2 = (1, 0, 1), X3 = (0, 1, 1).
(b) X1 = (1, 0, 1), X2 = (1, 1, 1).
(c) X1 = (1, 0, 1), X2 = (1, 1, 0), X3 = (0, −1, 1).
(d) X1 = (1, 1, 1), X2 = (1, 2, 3), X3 = (17, 3, 9), X4 = (−2, 7, −1).
(e) X1 = (−1, 0, 2), X2 = (1, 1, 1), X3 = (1/2, 1/3, −1).
(5) Prove that the subspace of functions in C [0, π ] which vanish at x = 0 and at
x = π is infinite dimensional by showing that the functions f1 (x) = sin x, f2 (x) =
sin 2x, . . . , fk (x) = sin kx, . . . are all linearly independent. [Hint: Assume that 0 =
Σ_{k=1}^N ak sin kx , for arbitrary N , and show that all the ak 's must be zero by multiplying
both sides by sin nx and utilizing the important formula
∫_0^π sin nx sin kx dx = 0 for k ≠ n , and = π/2 for k = n .]
(6) Let C∗[a, b] denote the set of all complex-valued functions f (x) = u(x) + iv(x)
which are continuous for x ∈ [a, b] . The complex number field C is the field of
scalars for C∗ . What is the dimension of the subspace A = { f ∈ C∗[−π, π ] : f (x) =
a e^{ix} + b e^{−ix} , a, b ∈ C } ? Show that f1 (x) = cos x and f2 (x) = sin x constitute a
basis for A . [Hint: Use (?) on p. ?].
(7) Which of the following sets of vectors form a basis for R4 ?
(a) X1 = (1, 0, 0, 5), X2 = (0, 3, 2, 6), X3 = (0, 0, 1, 2), X4 = (0, 0, 0, 1), X5 = (0, 0, 0, 1).
(b) X1 = (1, 6, 7, 0), X2 = (−2, 2, 5, 0), X3 = (4, 5, 6, 0), X4 = (7, 8, 3, 0).
(c) X1 = (1, 2, 5, 7), X2 = (4, 9, 11, 8), X3 = (6, 3, 12, 2), X4 = (3, −4, 7, 6).
(d) X1 = (1, 2, 3, 4), X2 = (0, 2, 3, 4), X3 = (0, 0, 3, 4), X4 = (0, 0, 0, 4).
(8) Find a basis for the following subspaces.
(a) A = { X ∈ R2 : x1 + x2 = 0 }
(b) B = { X ∈ R3 : x1 + x2 + x3 = 0 }
(c) C = { p ∈ P3 : p(0) = 0 }
(d) D = { p ∈ P3 : p(1) = 0 }
(e) E = { u ∈ C^1 [−1, 1] : u′ − u = 0 }
(f) F = { u ∈ C^1 [−1, 1] : u′ + 2u = 0 } .

Chapter 3

Linear Spaces: Norms and Inner Products

3.1 Metric and Normed Spaces

Until now we have been content with being able to add two elements X1 and X2 of a
linear space, and to multiply them by scalars, aX . Since only these algebraic operations
have been deﬁned, only algebraic questions could have been raised and answered. Notably
absent was any mention of convergence, because the idea of one element of a linear space
being “close” to another was not deﬁned. In this chapter we shall introduce a distance
or metric structure into linear spaces. Instead of lingering in the realm of generalities, we
shall deﬁne metric and norm in this ﬁrst section and devote the balance of the chapter
to a particular kind of metric which generalizes the “Pythagorean distance” of ordinary
Euclidean space. Fourier series supply a wonderful and valuable application.
Our ﬁrst notion of distance, that of a metric, makes sense for elements X, Y, Z of an
arbitrary set S . The idea is to define the distance d(X, Y ) between any two elements of
S . This distance is a function which assigns to every pair of points (X, Y ) a non-negative real
number d(X, Y ) called the “distance between X and Y ”.
Definition. Let S be a non-empty set. A metric on S is a real-valued function d :
S × S → R which has the three properties (for all X, Y, Z ∈ S ):
i) d(X, Y ) ≥ 0 , and d(X, Y ) = 0 ⇐⇒ X = Y ,
ii) (symmetry) d(X, Y ) = d(Y, X ) ,
iii) (triangle inequality) d(X, Z ) ≤ d(X, Y ) + d(Y, Z ) .
Well, they certainly are reasonable requirements for any function we intend to think of
as measuring distance.
Examples.
(1) This first example is trivial but acts as an important check on intuition. With it, you
see that every non-empty set can be regarded as a metric space with the following
metric:
d(X, Y ) = 0 if X = Y , and d(X, Y ) = 1 if X ≠ Y .
A moment's reflection will show that this is a metric—but not too useful since it is so
coarse.
(2) For the real line, R , with the usual definition of absolute value we define d(X, Y ) =
|X − Y | , which is clearly a metric.
(3) Another less common metric may be given to R . We define d(X, Y ) = |X − Y | / (1 + |X − Y |) .
Only the triangle inequality is not evident—and that involves some algebra. This
metric has the property that the distance between any two points is always less than
one, d(X, Y ) < 1 for all X, Y ∈ R .
(4) Rn can be endowed with many metrics. Let X = (x1 , x2 , . . . , xn ) , Y = (y1 , . . . , yn )
and Z = (z1 , . . . , zn ) be arbitrary points in Rn . The metric you most expect is the
Euclidean distance
d(X, Y ) = [ (x1 − y1 )^2 + . . . + (xn − yn )^2 ]^{1/2} = [ Σ_{k=1}^n (xk − yk )^2 ]^{1/2} .
Again, only the triangle inequality is not obvious. It is a consequence of the Cauchy-
Schwarz inequality
( Σ_{k=1}^n xk yk )^2 ≤ ( Σ_{k=1}^n xk^2 ) ( Σ_{k=1}^n yk^2 ) , (3-1)
which in turn is an immediate consequence of the algebraic identity
( Σ_{k=1}^n xk^2 ) ( Σ_{k=1}^n yk^2 ) − ( Σ_{k=1}^n xk yk )^2 = (1/2) Σ_{i=1}^n Σ_{j=1}^n (xi yj − xj yi )^2 .
And now the triangle inequality. Let ak = xk − yk and bk = yk − zk . Then
xk − zk = ak + bk . Thus, using Cauchy-Schwarz in the second line below, we find
that
[d(X, Z )]^2 = Σ_{k=1}^n (ak + bk )^2 = Σ_{k=1}^n ak^2 + 2 Σ_{k=1}^n ak bk + Σ_{k=1}^n bk^2
≤ Σ_{k=1}^n ak^2 + 2 ( Σ_{k=1}^n ak^2 )^{1/2} ( Σ_{k=1}^n bk^2 )^{1/2} + Σ_{k=1}^n bk^2 (3-2)
= [ ( Σ_{k=1}^n ak^2 )^{1/2} + ( Σ_{k=1}^n bk^2 )^{1/2} ]^2 = [d(X, Y ) + d(Y, Z )]^2 ,
so
d(X, Z ) ≤ d(X, Y ) + d(Y, Z ).
Another proof of the Schwarz and triangle inequalities for this metric will be given
later in the chapter.
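Both the algebraic identity and the resulting Cauchy-Schwarz inequality are easy to spot-check by a modern computation (the sample vectors below are an arbitrary choice of ours):

```python
# Spot-check Lagrange's identity and the Cauchy-Schwarz inequality (3-1)
# on sample vectors; the particular numbers are arbitrary.
x = [1.0, -2.0, 3.0, 0.5]
y = [4.0, 1.0, -1.0, 2.0]
n = len(x)

dot = sum(xk * yk for xk, yk in zip(x, y))
sx = sum(xk * xk for xk in x)
sy = sum(yk * yk for yk in y)

# Identity: sx*sy - dot^2 = (1/2) * sum_{i,j} (x_i y_j - x_j y_i)^2
rhs = 0.5 * sum((x[i] * y[j] - x[j] * y[i]) ** 2
                for i in range(n) for j in range(n))
assert abs(sx * sy - dot ** 2 - rhs) < 1e-9

# Cauchy-Schwarz follows because the right side above is >= 0:
assert dot ** 2 <= sx * sy
```

The check mirrors the argument in the text: the double sum of squares is manifestly non-negative, which is the whole content of (3-1).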
(5) A second metric for Rn is
d(X, Y ) = Σ_{k=1}^n |xk − yk | .
The axioms for a metric are easily verified.
1/p n
p |xk − yk | d(X, Y ) = , 1 ≤ p < ∞. k=1 Example 4 is the special case p = 2 , while example 5 is the special case p = 1 . And
again, all but the triangle inequality are obvious. However the triangle inequality,
called Minkowski’s inequality in this general case, is not simple. We shall not prove it
here. Perhaps it will appear as an exercise later.
(7) The usual metric for C [a, b] is the uniform metric
d(f, g ) = max |f (x) − g (x)| .
a ≤ x≤ b Geometrically, this distance is the largest vertical distance between the graphs of f
and g for all x ∈ [a, b] .
(8) The space L1 [a, b] of functions whose absolute value is integrable has the “natural”
metric
b |f (x) − g (x)| dx, d(f, g ) =
a which can be interpreted as the total area between the two curves. Since every function
which is continuous for x ∈ [a, b] is integrable there, i.e., C [a, b] ⊂ L1 [a, b] , this metric
is another metric for C [a, b] .
(9) For the function space C 1 [a, b] , the standard metric is
d(f, g ) = max |f (x) − g (x)| + max f (x) − g (x)
a ≤ x≤ b a ≤ x≤ b The metric for C k [a, b] is deﬁned similarly.
There are many theorems one can prove about metric spaces (a metric space is a set
S on which a metric is deﬁned). Look in any book on general topology (or point set
topology, as it is often called) and you will ﬁnd more than enough to satisfy you. For
most of our purposes metric spaces are too general. Normed linear spaces will suﬃce.
The norm X of an element X in a linear space V is the “distance” of X from
the origin—the 0 element of V .
Deﬁnition. Let V be a linear space over the real or complex ﬁeld. If to every element
X ∈ V there is associated a real number X , the norm of X , which has the three
properties
i) X ≥ 0. X = 0 ⇐⇒ X = 0 ii) aX = |a| iii) X + Y ≤ X + Y , (triangle inequality), X (homogeneity), a is a scalar, then we say that V is a normed linear space.
How does a norm diﬀer from a metric?
First of all, a norm is only deﬁned on a linear space (since aX and X + Y appear in
the deﬁnition) whereas a metric may be deﬁned on any set (cf. example 1 above). But if we 104 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS restrict our attention to linear spaces, how do the concepts of norm and metric diﬀer? Every
normed linear space can be made into a metric space in such a way that X is indeed the
distance of X from the origin, d(X, 0) = X . The explicit formula for d(X, Y ) should
surprise no one
d(X, Y ) = X − Y .
It is easy to check that d(X, Y ) is a metric. Thus every normed linear space has a “natural”
metric induced upon it. However, a linear space which has a metric need not be a normed
linear space. For example in R , the linear space of the real numbers, the metric of example
3
|X − Y |
d(X, Y ) =
1 + |X − Y |
is not associated with a norm because axiom ii) for a norm is not satisﬁed.
Of the examples considered earlier, all but the ﬁrst and third metrics arise from norms,
in the sense that
d(X, Y ) = d(X − Y, 0) = X − Y .
By far the most common norm in Rn is that given by the Pythagorean theorem (example 4). Then
1/2 n X= x2
1 + x2
2 + ··· + 2
kn x2
x =
k=1 and the induced metric is
1/2 n
2 d(X, Y ) = X − Y = (xk − yk )
k=1 For obvious historical reasons, we shall refer to R2 with this Pythagorean norm as Euclidean n -space, and denote it by En . Note that En is a linear space with a particular way
of measuring length speciﬁed. A metric removes the ﬂoppiness from Rn , giving the additional structure needed to investigate those geometrical concepts which utilize the notion
of distance.
Once we have a norm (or metric) it becomes possible to discuss convergence of a sequence of elements.
Deﬁnition: If V is a normed linear space, the sequence Xn ∈ V converges to X ∈ V if,
given any > 0 , there in an N such that
Xn − X < for all n > N. As an example, we shall prove the sample
(n) (n) (n) Theorem 3.1 . A sequence of points Xn = (x1 , x2 , . . . xk ) in Ek converges to the
(n)
point X = (x1 , . . . , xk ) in Ek if and only if each component xj converges to its respective
(n) limit, lim xj
n→∞ = xj , j = 1, . . . , k .
(n) Proof: i) Xn → X ⇒ xj
(n) xj − xj ≤ → xj . This is a consequence of the trivial inequality
(n) (n) (x1 − x1 )2 + · · · + (xk − xk )2 = Xn − X ; 3.1. METRIC AND NORMED SPACES 105
(n) (n) for if Xn − X < for n > N , then xj − xj < for n > N too. Thus xj → xj .
If the subscripts are cluttering up the proof, go through it again in a special case, say
(n)
x2 → x2 .
(n)
ii) xj → xj ⇒ Xn → X . By hypothesis, given any
> 0 , there are numbers
(n) N1 , N2 , . . . Nk such that x1 − x1 <
N2 , . . . , (n)
xk − xk < work for all the (n)
xj (n) , for all n > N1 , x2 − x2 < for all n > for all n > Nk . Pick N = max(N1 , N2 , . . . , Nk ) . This N will ’s, that is, for every j ,
(n) xj − xj < for all n > N. Thus
Xn − X =
< (n) (n) (x1 − x1 )2 + . . . + (xk − xk )2
√
2 + ... + 2 =
k, for all n > N. (3-3) Since k is a ﬁxed ﬁnite number, this shows that Xn − X may be made arbitrarily small
by picking n big enough, so Xn does converge to X .
1
n
Example. In E4 , the sequence Xn = ( n+1 , 2, − n , 0) converges to X = (1, 2, 0, 0) since
n
1
n+1 → 1, 2 → 2, − n → 0 , and 0 → 0 .
A useful elementary result is
Theorem 3.2 . If V is a normed linear space, and if Xn → X, Yn → Y in V , then for
any scalars a and b, aXn + bYn → aX + bY .
Proof: There are essentially no changes from the case of R1 . We must show that aXn +
bYn − aX − bY can be made arbitrarily small by picking n large enough. One application
of the triangle inequality
aXn + bYn − aX − bY ≤ aXn − aX + bYn − bY ,
and the homogeneity of a norm, yields
≤ |a| Xn − X + |b| Yn − Y . Because Xn → X and Yn → Y , if n > N1 , then
Yn − Y < . Pick N = max(N1 , N2 ) . Thus Xn − X < . Also, if n > N2 , then aXn + bYn − aX − bY < |a| + |b| = (|a| + |b|) , n > N, and the desired convergence is proved.
For a given linear space V , there may be two (or even more) norms deﬁned, say
and
1 to distinguish them. Why carry them both around? First of all, a sequence
may converge in one norm and not in the other. Second, even if both norms yield the same
convergent sequences, one norm may be more convenient in some particular computation.
Example. Consider the linear space C [−1, 1] of functions f (x) continuous for x ∈ [−1, 1] ,
with the two norms (Examples 7 and 8)
1 f ∞ = max |f (x)|
− 1 ≤ x≤ 1 ; f 1 |f (x)| dx, =
−1 106 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS that is, the uniform norm and the L1 norm. We shall exhibit a sequence of functions which
converge in the second norm but not in the ﬁrst. Let fn (x) be 1
x ∈ [−1, − n2 ] 0,
3 1
1
n (x + n2 )
x ∈ [− n2 , 0]
fn (x) =
3 (x − 1 ) x ∈ [0, 1 ] −n n2
n2 1
0
x ∈ [ n2 , 1]
Then by inspection from the graph (
fn ∞ = n , and
1 is the area), we see that
1
fn 1 = n . As n → ∞, fn − 0 1 → 0 so that fn → 0 in the L1 norm. On the other
hand, fn ∞ → ∞ so the limit does not exist in the uniform norm. If you look at the
1
1
graph, fn is zero except for a spike in the interval [− n2 , n2 ] . As n → ∞ , the function is
zero in essentially the whole interval, except for the bit around the origin where it blows
up—but it blows up slowly enough that the area under the curve tends to zero.
However, we can prove the
Theorem 3.3 . Let fn and f be continuous functions, n = 1, 2, . . . . If fn → f in the
uniform norm, then also fn → f in the L1 norm.
Remark. We have just seen that the converse is false.
Proof: An immediate consequence of the
Lemma 3.4 fn − f 1 ≤ (b − a) fn − f ∞ Proof:
b fn − f 1 b |fn (x) − f (x)| dx ≤ = fn − f n ∞ dx a (3-4) b = fn − f dx = (b − a) fn − f ∞ a Exercises
(1) In the set Z , deﬁne d(m, n) = |m − n| where |x| is ordinary absolute value. Prove
that d(m, n) is a metric.
(2) Suppose that d1 (X, Y ) and d2 (X, Y ) are both metrics for a set S , where X, Y ∈ S .
a). Show that [d1 (X, Y )]2 is not, in general, a metric. b). Prove that d1 + d2 and
d2 + d2 are also metrics for S .
1
2
(3) Prove that the function d(X, Y ) =
a norm on R . |X − Y |
1+|X −Y | , (4) Let X = (x1 , . . . , xk ) ∈ Rk . Deﬁne
(a) Prove that X ∞ X ∞ X 1 1≤ l ≤ k 1/2 k |xl | , and = = max |xl | . is a norm for Rk , and write down the induced metric. k (b) Let X, Y ∈ R , is a metric, but that it is not X 2 2 |xl | = l=1 . l=1 Prove
X ∞ ≤X 2 ≤X 1 ≤k X ∞. 3.2. THE SCALAR PRODUCT IN E2 107 1
1
(c) Consider the sequence Xn = (1 − n , −7, n2 ) in R3 . In which of the norms ∞, 2, 1 does it converge, and to what? (5) Let Xn be a sequence of elements in a normed linear space V (not necessarily ﬁnite
dimensional). Prove that if Xn → X , then the sequence Xn is bounded in norm (a
sequence Xn in a normed linear space is bounded if there is an M ∈ R such that
Xn ≤ M for all n ). [Hint: Compare with Theorem 6, page ??].
(6) Compute the
R3 . 1, 2 , and (cf. Ex. 4) norms of the following vectors in ∞ b) Y = (2, −2, 1) , a) X = (1, 2, 2) ,
(7) Compute the
[−1, 1] . 1, 2 , and a) f (x) = −2x + 3 ∞ c) Z = (0, 3, −4) , d) W = (0, −1, 0) . norms of the following functions for the interval b) g (x) = sin πx , c) hn (x) = xn d) As n → ∞ , does the sequence hn converge in any of these three norms?
(8) Which of the following deﬁne norms for the given linear spaces?
a) For R3 , [X ] = x2 + x2 + x2
x
1
3
b) For P3 , [p] = max p(x)
0 ≤ x≤ 1 c) For P3 , [p] = max |p(x)|
0 ≤ x≤ 1 d) For R3 , [X ] = |x1 | + |x2 | e) For R4 , [X ] = 1 + x2 + x2 + x2 + x2 .
1
2
3
4 (9) Prove that [X ] = x2 + x2 deﬁnes a norm for R2 (some algebraic fortitude will be
1
2
needed to prove the triangle inequality). 3.2 The Scalar Product in 2
E In Euclidean space E2 —which we remind you is R2 with the Euclidean norm X =
x2 + x2 —one can introduce many geometric concepts and develop a corresponding geo1
2
metric theory. Most important of these concepts is that of angle—especially orthogonality
(perpendicularity). It turns out that these ideas generalize almost immediately to all En ,
and even to some exceedingly important inﬁnite dimensional spaces. This section is devoted
to the most simple situation: E2 . Please look at the pictures.
To begin, we introduce the scalar product (also called the dot product, or inner product)
of two vectors X and Y .
Deﬁnition: If X and Y are two vectors in E2 , their scalar product X, Y (sometimes
written X · Y ) is deﬁned by
X, Y = X Y cos θ, where θ is the angle between X and Y .
Notice that the scalar product of two vectors is a real number, a scalar, not another
vector. We need not specify the direction in which θ is measured, counterclockwise or
clockwise, since cos θ = cos(−θ) . Further, we can use either the acute or obtuse angle 108 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS between X and Y since cos(2π − θ) = cos θ . It is important to observe that the scalar
product of two vectors is deﬁned independent of any coordinate system.
We are immediately led to some simple consequences.
Lemma 3.5 . Two vectors X and Y are orthogonal if and only if X, Y = 0
Proof: If X and Y are orthogonal, the angle θ between them is π , so X, Y =
2
Y cos θ = 0 . If
X
Y cos π = 0 . In the other direction, if X, Y = 0 , then X
2
π
neither X nor Y = 0 , then cos θ = 0 , that is θ = π or 32 . Thus X is orthogonal
2
to Y . If X or Y = 0 , then one of them is just the point at the origin, the zero
vector. We agree to say that the zero vector is orthogonal to every other vector. With this
agreement, X, Y = 0 ⇒ X ⊥ Y , and the second half of the theorem is proved too.
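Although the notes deliberately postpone the coordinate formula for the scalar product, the definition can already be evaluated from lengths alone: the law of cosines gives ‖X − Y ‖^2 = ‖X‖^2 + ‖Y ‖^2 − 2 ‖X‖ ‖Y ‖ cos θ , so ⟨X, Y ⟩ = ( ‖X‖^2 + ‖Y ‖^2 − ‖X − Y ‖^2 ) / 2 . A modern sketch (the helper names are ours):

```python
import math

def norm(X):
    """Euclidean length in E^2 (or E^n)."""
    return math.sqrt(sum(x * x for x in X))

def scalar_product(X, Y):
    """<X, Y> = ||X|| ||Y|| cos(theta), computed from lengths only via
    the law of cosines: ||X - Y||^2 = ||X||^2 + ||Y||^2 - 2 <X, Y>."""
    XmY = tuple(x - y for x, y in zip(X, Y))
    return 0.5 * (norm(X) ** 2 + norm(Y) ** 2 - norm(XmY) ** 2)

# Lemma 3.5: orthogonal vectors have scalar product zero.
assert abs(scalar_product((1.0, 0.0), (0.0, 2.0))) < 1e-12
```

This also makes Theorem 3.6(a) below transparent: taking Y = X gives ⟨X, X⟩ = ‖X‖^2 .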
There is a nice geometric interpretation of the scalar product. A hint of it appeared in
our last lemma. Let e be a unit vector, ‖e‖ = 1 . Consider ⟨X, e⟩ = ‖X‖ cos θ (see figure).
This is the length of the projection of X in the direction of e , or in other words, the length
of the projection of X into the subspace spanned by the single vector e . Strictly, ⟨X, e⟩ is
not really a “length”, since “length” carries the implication of being positive, whereas the
real number ⟨X, e⟩ will be negative if the projection “points” in the direction opposite to
e . We shall, however, allow ourselves this abuse of language. The vector U1 which is the
projection of X into the subspace spanned by e is U1 = ⟨X, e⟩ e .
If Y is a (non-zero) vector in E2 which is not a unit vector, the above geometric idea
goes through by making the simple observation that given any vector Y ≠ 0 , the vector
e = Y /‖Y ‖ is a unit vector in the direction of Y .
Now you are certainly wondering how in the world we compute this scalar product.
You could take out your ruler, protractor, and table of cosines—but we will present a more
convenient method. In order to compute, as is always the case, a particular basis must
be chosen. Then the vectors X and Y can be given explicitly in terms of the basis. Since
we want to show that the concepts are independent of any particular basis, you must relax
and be patient. Only after the theory has been exposed will we reveal how to compute in
terms of a given basis.
Theorem 3.6 .
(a) ⟨X, X⟩ = ‖X‖^2
(b) ⟨X, Y ⟩ = ⟨Y, X⟩
(c) ⟨aX, Y ⟩ = a ⟨X, Y ⟩ , where a ∈ R
(d) ⟨X, aY ⟩ = a ⟨X, Y ⟩ , where a ∈ R
(e) ⟨X + Y, Z⟩ = ⟨X, Z⟩ + ⟨Y, Z⟩
(f) ⟨X, Y + Z⟩ = ⟨X, Y ⟩ + ⟨X, Z⟩
(g) |⟨X, Y ⟩| ≤ ‖X‖ ‖Y ‖ (Cauchy-Schwarz inequality)
Proof:
(a) Obvious since θ = 0 and cos 0 = 1 .
(b) Obvious since cos(−θ) = cos θ .
(c) The vectors X and aX lie along the same line through the origin. There are two
cases, a > 0 and a < 0 ( a = 0 is trivial). If a > 0 , the angle θ between X and Y
is identical to that between aX and Y . Since ‖aX‖ = a ‖X‖ for a > 0 , this case is
proved, for
⟨aX, Y ⟩ = ‖aX‖ ‖Y ‖ cos θ = a ‖X‖ ‖Y ‖ cos θ = a ⟨X, Y ⟩ .
If a < 0 , then aX points in the direction opposite to X . Thus the angle θ1 between
aX and Y equals π − θ , where θ is the angle between X and Y . The following
computation completes the proof:
⟨aX, Y ⟩ = ‖aX‖ ‖Y ‖ cos θ1 = |a| ‖X‖ ‖Y ‖ cos(π − θ) = − |a| ‖X‖ ‖Y ‖ cos θ = a ‖X‖ ‖Y ‖ cos θ = a ⟨X, Y ⟩ . (3-5)
(d) By (b) and (c) and (b) again we are done:
⟨X, aY ⟩ = ⟨aY, X⟩ = a ⟨Y, X⟩ = a ⟨X, Y ⟩ .
(e) This is the most subtle part. We shall rely on the interpretation of the scalar product
U, e as the length of the projection of U in the subspace spanned by e . First, let
e = Z/ Z be the unit vector in the direction of Z . We shall show that X + Y, e =
X, e + Y , e . A picture is all that is needed now. Two situations are illustrated,
where both X and Y are on the same side of the line perpendicular to e and
a figure goes here
where X and Y are on opposite sides of that line. The vector X + Y is found from
X and Y by the parallelogram rule for addition. Interpreting the scalar product of
a vector with e as the length of the projection into the subspace (line) spanned by
e , we see (look) that we must prove
OP = OQ + OM.
But since OA and BC are opposite sides of the same parallelogram, we know that
OM = QP both in magnitude and direction. The natural substitution yields
OP = OQ + QP,
which is indeed all we desired. Thus
⟨X + Y, e⟩ = ⟨X, e⟩ + ⟨Y, e⟩.
To prove the general result for Z = ‖Z‖e, multiply the last equation by ‖Z‖, which
is a scalar. Then by part (d) we find
⟨X + Y, ‖Z‖e⟩ = ⟨X, ‖Z‖e⟩ + ⟨Y, ‖Z‖e⟩,
or
⟨X + Y, Z⟩ = ⟨X, Z⟩ + ⟨Y, Z⟩.
We are done.
(f) By parts (b), (e), and (b) again we obtain the result:
⟨X, Y + Z⟩ = ⟨Y + Z, X⟩ = ⟨Y, X⟩ + ⟨Z, X⟩ = ⟨X, Y⟩ + ⟨X, Z⟩.
(g) Obvious since |cos θ| ≤ 1. It is evident that equality occurs when and only when
cos θ = ±1, that is, when X and Y lie along the same line (possibly pointing in
opposite directions).
If e is a unit vector, we know how to find the projection U1 of a given vector X into the
subspace spanned by e: it is U1 = ⟨X, e⟩e. Similarly, if Y is any vector—not necessarily
of length one—then since Y/‖Y‖ is a unit vector in the direction of Y, the projection of X
into the subspace spanned by Y is ⟨X, Y/‖Y‖⟩ Y/‖Y‖ = ⟨X, Y⟩Y/‖Y‖². We can also find
the projection U2 of X into the subspace orthogonal to the unit vector e. Since U1 and U2
must add up to X, X = U1 + U2, we find that U2 = X − U1 = X − ⟨X, e⟩e.
Thus, we have proved
Theorem 3.7. If X and Y are any two vectors, Y ≠ 0, then X can be decomposed
into two vectors U1 and U2, X = U1 + U2, such that U1 is in the subspace spanned by Y
and U2 is in the orthogonal subspace. The decomposition is given by U1 = ⟨X, Y⟩Y/‖Y‖²
and U2 = X − ⟨X, Y⟩Y/‖Y‖², so that
X = ⟨X, Y⟩Y/‖Y‖² + (X − ⟨X, Y⟩Y/‖Y‖²).
Without further delay, we shall show how to compute the scalar product of two vectors.
In order to carry this out we must introduce a basis. Let X1 and X2 be any two vectors in
E2 which span E2 . Then every vector X ∈ E2 can be written in the form X = a1 X1 +a2 X2 ,
where the scalars a1 and a2 are determined uniquely by the vector X . Now it is most
convenient to have a basis whose vectors are i) orthogonal to each other and ii) have unit
length. Such a basis is called an orthonormal basis (orthogonal and normalized to have
unit length). In other words, e1 and e2 are an orthonormal basis for E2 if ‖ej‖ = 1 and
⟨e1, e2⟩ = 0. This requirement is most conveniently stated by introducing the Kronecker
symbol δjk:
δjk = 0 if j ≠ k,   δjk = 1 if j = k.
Then the orthonormality property reads ⟨ej, ek⟩ = δjk, j, k = 1, 2. The notation is perhaps
excessive for this simple case, but will really be useful in our generalizations.
Therefore, let e1 and e2 be an orthonormal basis for E2 , so that if X ∈ E2 , X =
x1 e1 + x2 e2 . Fix the basis throughout the ensuing discussion. Observe that x1 and x2
can be computed in terms of X and the basis vectors e1 and e2, viz. ⟨X, e1⟩ = ⟨x1 e1 +
x2 e2, e1⟩ = x1⟨e1, e1⟩ + x2⟨e2, e1⟩ = x1, since ⟨e1, e1⟩ = 1 and ⟨e1, e2⟩ = 0. Similarly,
⟨X, e2⟩ = x2. Thus we have proved
Theorem 3.8. If {ej}, j = 1, 2, form an orthonormal basis for E2, then every vector
X ∈ E2 can be written as X = Σ_{j=1}^{2} xj ej, where xj is the length of the projection
of X into the subspace spanned by ej, xj = ⟨X, ej⟩.
If X = x1 e1 + x2 e2 and Y = y1 e1 + y2 e2 are any two vectors in E2, then
⟨X, Y⟩ = ⟨x1 e1 + x2 e2, y1 e1 + y2 e2⟩
= ⟨x1 e1 + x2 e2, y1 e1⟩ + ⟨x1 e1 + x2 e2, y2 e2⟩
= ⟨x1 e1, y1 e1⟩ + ⟨x2 e2, y1 e1⟩ + ⟨x1 e1, y2 e2⟩ + ⟨x2 e2, y2 e2⟩
= x1 y1⟨e1, e1⟩ + x2 y1⟨e2, e1⟩ + x1 y2⟨e1, e2⟩ + x2 y2⟨e2, e2⟩
= x1 y1 + 0 + 0 + x2 y2 = x1 y1 + x2 y2.
Now you see how easy it is to compute the scalar product of X and Y in terms of the
representation from an orthonormal basis. Let us rewrite our result formally.
Theorem 3.9. Let {ej}, j = 1, 2, form an orthonormal basis for E2. If X = Σ_{j=1}^{2} xj ej
and Y = Σ_{j=1}^{2} yj ej, then
⟨X, Y⟩ = Σ_{j=1}^{2} xj yj = x1 y1 + x2 y2.
Some numerical examples should reassure you of the basic simplicity of the computation.
As our orthonormal basis in E2 , we choose the vectors e1 = (1, 0) and e2 = (0, 1) . These
both have unit length, and are perpendicular (one is on the horizontal axis, the other on
the vertical axis). Let X = (−2, 3) . Then X = −2e1 + 3e2 . Notice that −2e1 and 3e2
are exactly the projections of X into the subspaces spanned by e1 and e2 respectively. If
Y = (1, −2) , then our theorem shows that
⟨X, Y⟩ = (−2)(1) + (3)(−2) = −2 − 6 = −8.
From this computation we can reverse the geometric procedure and ﬁnd the angle θ between
X and Y, for we know the formula
cos θ = ⟨X, Y⟩ / (‖X‖ ‖Y‖).
In this example, ⟨X, Y⟩ = −8, ‖X‖ = √(4 + 9) = √13 and ‖Y‖ = √(1 + 4) = √5. Thus
θ = cos⁻¹(−8/√65), which can be evaluated by consulting your favorite numerical tables.
It is equally simple to check if two vectors are orthogonal. Let X = (2, −3) and
Y = (6, 4). Then ⟨X, Y⟩ = (2)(6) + (−3)(4) = 0; consequently X and Y are orthogonal.
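These little computations are easy to mechanize. The following sketch (Python; the function names `scalar_product`, `norm`, and `angle` are ours, not the text's) reproduces the two examples above using the coordinate formula of Theorem 3.9.

```python
import math

def scalar_product(X, Y):
    """<X, Y> computed in an orthonormal basis: x1*y1 + x2*y2 + ..."""
    return sum(x * y for x, y in zip(X, Y))

def norm(X):
    """||X|| = sqrt(<X, X>)."""
    return math.sqrt(scalar_product(X, X))

def angle(X, Y):
    """The angle theta with cos(theta) = <X, Y> / (||X|| ||Y||)."""
    return math.acos(scalar_product(X, Y) / (norm(X) * norm(Y)))

X, Y = (-2, 3), (1, -2)
print(scalar_product(X, Y))              # -8, as computed above
print(angle(X, Y))                       # arccos(-8/sqrt(65)), no tables needed

print(scalar_product((2, -3), (6, 4)))   # 0, so the vectors are orthogonal
```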
Another consequence is the law of cosines. Let X = (x1 , x2 ) and Y = (y1 , y2 ) . Then
from the parallelogram construction, the length of the segment joining the tip of X to the
tip of Y has length ‖Y − X‖. But
‖Y − X‖² = ⟨Y − X, Y − X⟩
= ‖X‖² + ‖Y‖² − 2⟨X, Y⟩
= ‖X‖² + ‖Y‖² − 2‖X‖ ‖Y‖ cos θ.
One more example. We shall find the distance of the point P = (−3, 2) from the coset
A = { X = (x1, x2) ∈ E2 : x1 − 2x2 = 2 }. Pick some point X0 in A, say X0 = (3, 1/2).
The distance d from P to A is then the length of the projection of the segment X0P
onto a line l orthogonal to A. First of all, we can replace the segment X0P by a vector
from the origin 0 to the point Q = P − X0 = (−6, 3/2), for the length of the projection of
0Q onto a line l orthogonal to A is equal to the length of the projection of X0P onto
l (see figure). Now we have the vector Q = (−6, 3/2); all we need to compute the desired
projection is another vector N orthogonal to A, for then d = |⟨Q, N⟩/‖N‖|.
To find a vector N orthogonal to A, we realize that N will also be orthogonal to
the subspace S parallel to the coset A, so A = S + X0, where S = { X = (x1, x2) ∈
E2 : x1 − 2x2 = 0 }. If N = (n1, n2) and X is any element of S, since N ⊥ S, we must
have 0 = ⟨X, N⟩ = x1 n1 + x2 n2. However X ∈ S so x1 − 2x2 = 0. We want the equation
x1 n1 + x2 n2 = 0 to hold for all points on x1 − 2x2 = 0, that is, for all X ∈ S. This is only
possible if n1 = 1·c and n2 = −2·c, where c is any constant. Thus N = c(1, −2) and
‖N‖ = |c|√5. The distance d between the point P and the coset A is then
d = |⟨Q, N⟩/‖N‖| = |⟨(−6, 3/2), (c/(|c|√5))(1, −2)⟩|
= |(−6)(c/(|c|√5)) + (3/2)(−2c/(|c|√5))| = 9/√5.     (3-6)
This example contained a plethora of ideas. It would be wise to go through it again and list
the constructions and concepts used. The exercises will develop many of them in greater
generality.
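The distance computation can also be checked mechanically. This Python sketch (the helper name `distance_to_coset` is ours) carries out d = |⟨Q, N⟩|/‖N‖ for the example above.

```python
import math

def distance_to_coset(P, X0, N):
    """Distance from P to the coset X0 + S, where N is a vector orthogonal to S:
    d = |<Q, N>| / ||N||  with  Q = P - X0."""
    Q = [p - x for p, x in zip(P, X0)]
    dot = sum(q * n for q, n in zip(Q, N))
    return abs(dot) / math.sqrt(sum(n * n for n in N))

# P = (-3, 2), A = {x1 - 2*x2 = 2}, X0 = (3, 1/2) in A, N = (1, -2):
d = distance_to_coset((-3, 2), (3, 0.5), (1, -2))
print(d)                  # 9/sqrt(5) = 4.0249...
print(9 / math.sqrt(5))
```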
Now you should try some problems on your own.
Exercises
(1) If X = (3, 4) and Y = (5, −12) are two points in E2, find the angle between OX
and OY, where 0 is the origin.
(2) If X = (3, −4) and Y = (5, 12) are two vectors in E2 , ﬁnd vectors U1 ∈ span(Y )
and U2 orthogonal to span(Y ) such that X = U1 + U2 .
(3) Show that the vector N = (a1 , a2 ) is perpendicular to the straight line whose equation
is a1 x1 + a2 x2 = c (you will have to supply the natural deﬁnition of what it means
for a vector to be perpendicular to a straight line).
(4) (a) Find the distance of the point P = (2, −1) from the coset A = { X ∈ E2 : x1 +
x2 = −2 } .
(b) Find the distance between the two “parallel” cosets A deﬁned above and B =
{ X ∈ E2 : x1 + x2 = 1 } . (Hint: Draw a ﬁgure and observe that P ∈ B ) .
(5) (a) Prove that the distance d of the point P = (y1, y2) from the coset A = { X ∈
E2 : a1 x1 + a2 x2 = c } is given by
d = |a1 y1 + a2 y2 − c| / √(a1² + a2²).
(b) Prove that the distance d between the two “parallel” cosets A = { X ∈ E2 : a1 x1 +
a2 x2 = c1 } and B = { X ∈ E2 : a1 x1 + a2 x2 = c2 } is given by
d = |c1 − c2| / √(a1² + a2²).
(Hint: If you use part (a) and are cunning, the derivation takes but one line.)
(6) (a) If it is known that ⟨X, Y1⟩ = ⟨X, Y2⟩, and that X ≠ 0 for a fixed X, can
you “cancel” X from both sides and conclude that Y1 = Y2? Reason?
(b) If it is known that ⟨X, Y⟩ = 0 for every X, can you conclude that Y = 0? Reason?
(c) If it is known that ⟨X, Y1⟩ = ⟨X, Y2⟩ for every X, can you conclude that
Y1 = Y2? Reason?
(7) (a) Show that the vector
Z = (‖X‖ Y + ‖Y‖ X) / ‖X + Y‖
bisects the angle between the vectors X and Y.
(b) Show that the vector ‖X‖ Y + ‖Y‖ X is perpendicular to the vector ‖Y‖ X − ‖X‖ Y.
(8) Express the angle between an edge and a diagonal of a rectangle in terms of the scalar
product.
(9) Let two of the sides of a parallelogram be given by the vectors X and Y. The
parallelogram theorem states that the sum of the squares of the sides is equal to the
sum of the squares of the diagonals, that is,
‖X + Y‖² + ‖X − Y‖² = 2‖X‖² + 2‖Y‖².
Prove this in two ways: i) using elementary geometry, and ii) using only the fact that
X and Y are elements of a linear space, and the properties of the scalar product
contained in Theorem 3.6 (using 3.6(a) to define ‖·‖).
(10) Let X be any vector in E2, and let e be a unit vector. Define the vector U = ae,
where a = ⟨X, e⟩ is the length of the projection of X into the subspace spanned by
e, and V = αe, where α is any scalar. Prove that
‖X − V‖² ≥ ‖X − U‖² = ‖X‖² − ‖U‖² = ‖X‖² − a².
This shows that in the subspace spanned by e, the vector closest to X is the projection U of X into that subspace.
(11) If X is orthogonal to Y, prove the Pythagorean theorem ‖X + Y‖² = ‖X‖² + ‖Y‖²
using only ‖V‖² = ⟨V, V⟩ and the properties of a scalar product in Theorem 3.6.
(12) Let X and Y be orthogonal elements of E2, with neither X nor Y zero. Prove
that X and Y are linearly independent. Do not introduce a basis.
3.3 Abstract Scalar Product Spaces
We shall turn the tables around. Whereas in the last section we defined the scalar product
geometrically and deduced its properties, in this section we deﬁne a scalar product space as
a linear space upon which a scalar product is deﬁned, and the scalar product is stipulated
to have the properties deduced earlier. After presenting our abstract deﬁnition, we shall
give examples—other than E2—of scalar product spaces.
Definition. A linear space H is called a real scalar product space if to every pair of
elements X, Y ∈ H is associated a real number ⟨X, Y⟩, the scalar product of X and Y,
which has the properties
1. ⟨X, X⟩ ≥ 0 with equality if and only if X = 0.
2. ⟨X, Y⟩ = ⟨Y, X⟩
3. ⟨aX, Y⟩ = a⟨X, Y⟩, a ∈ R
4. ⟨X + Y, Z⟩ = ⟨X, Z⟩ + ⟨Y, Z⟩
You should observe that the scalar product in E2 does have these properties (Theorem
3.6). Using E2 as our model, it is natural to define ‖X‖ = √⟨X, X⟩ and suspect that ‖·‖
is indeed a norm on the linear space H. This is true, but proving the triangle inequality
for this norm using only properties 1-4 will take some work. We shall do just that after
presenting some examples.
Examples
(1) Let X = (x1, . . . , xn) and Y = (y1, . . . , yn) be points in the linear space Rn. We
define
⟨X, Y⟩ = x1 y1 + x2 y2 + · · · + xn yn.
Only easy algebra is needed to verify that the real number ⟨X, Y⟩ satisfies all of the
properties of a scalar product. It turns out (after we prove the triangle inequality)
that the natural norm ‖X‖ = √⟨X, X⟩ is the Euclidean norm, so this is En.
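The "easy algebra" can be spot-checked numerically. Here is a sketch verifying the four scalar-product properties for this example on random vectors (Python; the abbreviation `sp` is ours).

```python
import random

def sp(X, Y):
    """The scalar product of Example (1): sum of x_j * y_j."""
    return sum(x * y for x, y in zip(X, Y))

random.seed(1)
n = 5
X = [random.uniform(-1, 1) for _ in range(n)]
Y = [random.uniform(-1, 1) for _ in range(n)]
Z = [random.uniform(-1, 1) for _ in range(n)]
a = random.uniform(-2, 2)

assert sp(X, X) >= 0                                          # property 1
assert abs(sp(X, Y) - sp(Y, X)) < 1e-12                       # property 2
assert abs(sp([a * x for x in X], Y) - a * sp(X, Y)) < 1e-12  # property 3
S = [x + y for x, y in zip(X, Y)]
assert abs(sp(S, Z) - (sp(X, Z) + sp(Y, Z))) < 1e-12          # property 4
print("properties 1-4 hold")
```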
(2) This example is the ﬁrst hint that our abstractions are fruitful. Let the functions
f (x) and g (x) be points in the linear function space C [a, b] of real-valued functions
continuous for a ≤ x ≤ b. We define
⟨f, g⟩ = ∫_a^b f(x) g(x) dx.
You might be surprised; in any event let us verify that the real number ⟨f, g⟩ associated with the pair of functions f and g does satisfy the four properties of a scalar
product.
(i) ⟨f, f⟩ = ∫_a^b f²(x) dx. This is clearly non-negative, and f = 0 implies that
⟨f, f⟩ = 0. All we must show is that if ⟨f, f⟩ = ∫_a^b f²(x) dx = 0, then f = 0.
By contradiction, assume f(x) is not identically zero. Then there is some point
x0 ∈ [a, b] such that f(x0) = c ≠ 0. Thus f²(x0) = c² > 0. Since f—and hence
f²—is continuous, this means that f² is positive in some interval about x0 (p. 29b,
Theorem I), so that ∫_a^b f²(x) dx > 0, the desired contradiction.
(ii) ⟨f, g⟩ = ∫_a^b f(x)g(x) dx = ∫_a^b g(x)f(x) dx = ⟨g, f⟩.
(iii) ⟨αf, g⟩ = ∫_a^b αf(x)g(x) dx = α ∫_a^b f(x)g(x) dx = α⟨f, g⟩, where α ∈ R.
(iv) ⟨f + g, h⟩ = ∫_a^b (f(x) + g(x))h(x) dx
= ∫_a^b f(x)h(x) dx + ∫_a^b g(x)h(x) dx = ⟨f, h⟩ + ⟨g, h⟩.
There. We did it. After we prove the triangle inequality for an abstract scalar
product space, we shall know that the natural candidate
‖f‖ = √( ∫_a^b f²(x) dx )
is a norm.
I like this space very much. You will be meeting it often, becoming much more
intimate with its ﬁner features. We shall—somewhat improperly—refer to this
linear space with the given scalar product as L2 [a, b] . The name is improper
since L2 [a, b] is customarily used for our space but with more general functions
and an extended notion of integration.
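For a function space the scalar product is an integral, but it can be approximated on a machine. A sketch (the midpoint-rule helper `l2_inner` is our own, not anything from the notes):

```python
def l2_inner(f, g, a, b, n=20000):
    """Approximate <f, g> = integral over [a, b] of f(x)*g(x) dx, midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x
g = lambda x: x * x
print(l2_inner(f, g, 0, 1))   # ~ integral of x^3 over [0,1] = 1/4
print(l2_inner(g, f, 0, 1))   # the same number: <f, g> = <g, f>
print(l2_inner(f, f, 0, 1))   # ~ 1/3 > 0, as property 1 demands
```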
(3) Let f(x) and g(x) be in C[0, ∞). This time define
⟨f, g⟩ = ∫_0^∞ f(x) g(x) e^{−x} dx.
Since e^{−x} is continuous and positive for all x, we are assured that ⟨f, f⟩ ≥ 0, with
equality if and only if f = 0. The other properties of an inner product follow from
simple manipulations. Do them.
Remark: Complex scalar product spaces are defined similarly. For them, ⟨X, Y⟩
may be a complex number, and complex scalars are admitted. The only change in the
axioms is that property 2 is dropped in favor of
2′. ⟨Y, X⟩ = ⟨X, Y⟩‾ ,
where the bar means take the complex conjugate of the complex number ⟨X, Y⟩.
Since we shall not develop the theory far enough, our attention henceforth will be
restricted to real scalar product spaces.
The first order of business is to prove that the natural candidate for a norm, ‖X‖ =
√⟨X, X⟩, is in fact a norm for the linear space H. Only properties 1-4 may be used.
(1) ‖X‖ ≥ 0, with equality if and only if X = 0. This follows immediately from the
corresponding property of ⟨X, X⟩.
(2) ‖aX‖ = |a| ‖X‖. For ‖aX‖ = √⟨aX, aX⟩ = √(a²⟨X, X⟩) = |a| √⟨X, X⟩ = |a| ‖X‖.
The proof of the triangle inequality
(3) ‖X + Y‖ ≤ ‖X‖ + ‖Y‖
involves more labor. We shall first need to prove the Cauchy-Schwarz inequality (cf. Theorem 3.6(g)).
Theorem 3.10 (Cauchy-Schwarz inequality). |⟨X, Y⟩| ≤ ‖X‖ ‖Y‖.
Proof: If either X or Y is zero, this is immediate. Thus, assume that neither X
nor Y is zero and define
U = X/‖X‖ ,   V = Y/‖Y‖ ,
so that both U and V are unit vectors, ‖U‖ = ‖V‖ = 1. Then
0 ≤ ‖U ± V‖² = ⟨U ± V, U ± V⟩
= ⟨U, U⟩ ± ⟨U, V⟩ ± ⟨V, U⟩ + ⟨V, V⟩
= ‖U‖² ± 2⟨U, V⟩ + ‖V‖².
Since ‖U‖ = 1 and ‖V‖ = 1, this shows ±⟨U, V⟩ ≤ 1. Substituting for U and V, we
obtain the inequality sought:
|⟨X, Y⟩| ≤ ‖X‖ ‖Y‖.
Theorem 3.11 (Triangle inequality). ‖X + Y‖ ≤ ‖X‖ + ‖Y‖.
Proof: This is identical to that given in Section 1.
‖X + Y‖² = ⟨X + Y, X + Y⟩ = ‖X‖² + 2⟨X, Y⟩ + ‖Y‖².
By Cauchy-Schwarz, ⟨X, Y⟩ ≤ ‖X‖ ‖Y‖, so
‖X + Y‖² ≤ ‖X‖² + 2‖X‖ ‖Y‖ + ‖Y‖² = (‖X‖ + ‖Y‖)².
Now take the square root of both sides to find
‖X + Y‖ ≤ ‖X‖ + ‖Y‖.
X, X in terms
Theorem 3.12 . If H is a scalar product space and we deﬁne X =
of the scalar product, then
is a norm and H is a normed linear space with that norm.
This special case where the norm is induced by a scalar product is called a pre-Hilbert space
(an honest Hilbert space has the additional property of being “complete”).
Let us state two easy algebraic consequences of our axioms for a scalar product. The
proofs are identical to those of Theorem 3.6 in the previous section.
Theorem 3.13
⟨X, aY⟩ = a⟨X, Y⟩,  a ∈ R     (3-7)
⟨X, Y + Z⟩ = ⟨X, Y⟩ + ⟨X, Z⟩.     (3-8)
Needless to say, we hope you are still thinking in the geometric terms presented earlier.
In particular, the next deﬁnition should be reasonable.
Definition. Two vectors X, Y are said to be orthogonal if ⟨X, Y⟩ = 0.
The Pythagorean theorem suggests
Theorem 3.14. If X and Y are orthogonal, then
‖X ± Y‖² = ‖X‖² + ‖Y‖²,
and conversely.
Proof: Both parts are an immediate consequence of the identity
‖X ± Y‖² = ⟨X ± Y, X ± Y⟩ = ‖X‖² ± 2⟨X, Y⟩ + ‖Y‖².
Examples.
(1) Let X = (2, 3, −1) and Y = (1, −1, −1) be points in E3, where we use the scalar
product of Example 1 in this section. Then ⟨X, Y⟩ = 2·1 + 3(−1) + (−1)(−1) = 0,
so X and Y are orthogonal. Similarly X = (2, 3, 1, −1) and Y = (3, −3, 3, 0) in
E4 are orthogonal. A useful example is supplied by the vectors e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1) in En. These are orthonormal since
⟨ek, ek⟩ = 1, but ⟨ek, el⟩ = 0 for k ≠ l; that is, ⟨ek, el⟩ = δkl.
(2) Consider the functions Φk(x) = sin kx in L2[−π, π], where k = 1, 2, 3, . . . . Then,
since sin θ sin ψ = ½[cos(θ − ψ) − cos(θ + ψ)], we find that
⟨Φk, Φk⟩ = ∫_{−π}^{π} sin² kx dx = π,
and for k ≠ l
⟨Φk, Φl⟩ = ∫_{−π}^{π} sin kx sin lx dx = 0,
as a computation reveals. Thus in L2[−π, π] the function sin kx is orthogonal to the
function sin lx when k ≠ l. The whole computation may be summarized by
⟨Φk, Φl⟩ = ⟨sin kx, sin lx⟩ = πδkl.
It is only the factor π which does not allow us to say that the Φk are orthonormal—
but that is easily patched up. Let ek(x) = sin kx/√π. Then
⟨ek, el⟩ = ⟨sin kx/√π, sin lx/√π⟩ = (1/π)⟨sin kx, sin lx⟩,
or
⟨ek, el⟩ = δkl.
Therefore the functions ek(x) = sin kx/√π are orthonormal. Don’t attempt to imagine it.
Just keep on thinking of a big E2 and all will be well.
So far we have discussed the notion of two vectors X and Y being orthogonal. This
can be restated as one vector X being orthogonal to the subspace A spanned by Y, for
all vectors in A are of the form aY where a is a scalar, and ⟨X, aY⟩ = 0 ⟺ ⟨X, Y⟩ = 0
since ⟨X, aY⟩ = a⟨X, Y⟩. One can also introduce the concept of a vector X being
orthogonal to an arbitrary subspace A . Think of A as being a plane (through the origin
of course).
Deﬁnition The vector X is orthogonal to the subspace A if X is orthogonal to every
vector in the subspace A .
In practice, the usual way to check if X is orthogonal to the subspace A is as follows.
Pick some basis { Y1, Y2, . . . } for A. Then every Y ∈ A is of the form
Y = Σ_k ak Yk
(if the basis has an infinite number of elements—that is, if A is infinite dimensional—
one should worry about convergence; however we shall ignore that issue for now). By the
algebraic rules for the scalar product, we find that
⟨X, Y⟩ = ⟨X, Σ_k ak Yk⟩ = Σ_k ak⟨X, Yk⟩.
Thus, X is orthogonal to the subspace A if X is orthogonal to every element in some basis
for A: ⟨X, Yk⟩ = 0 for all k.
For example, if A is the x1 x2 plane in E3 , and X is the vector (0, 0, 1) , then
we can show that X = (0, 0, 1) is orthogonal to A by showing it is orthogonal to both
the vector e1 = (1, 0, 0) and to e2 = (0, 1, 0), since e1 and e2 form a basis for A. The
computation ⟨X, e1⟩ = 0 and ⟨X, e2⟩ = 0 is immediate. Because Y1 = (1, 2, 0) and
Y2 = (1, −1, 0) also form a basis for A, we could prove that X is orthogonal to A by
showing that ⟨X, Y1⟩ = 0 and ⟨X, Y2⟩ = 0—which is equally simple.
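Checking orthogonality to a subspace really does reduce to finitely many scalar products. A sketch of the two checks just made (Python; `sp` is our abbreviation):

```python
def sp(X, Y):
    return sum(x * y for x, y in zip(X, Y))

X = (0, 0, 1)
# Two different bases for the x1-x2 plane A in E3:
for basis in [((1, 0, 0), (0, 1, 0)), ((1, 2, 0), (1, -1, 0))]:
    assert all(sp(X, B) == 0 for B in basis)
print("X is orthogonal to each basis, hence to all of A")
```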
A less obvious example is supplied by the function Ψ(x) = cos x, which is orthogonal
to the subspace A spanned by Φ1(x) = sin x, Φ2(x) = sin 2x, . . . , Φn(x) = sin nx in
L2(−π, π). The proof is a consequence of the integration formula
⟨Ψ, Φk⟩ = ∫_{−π}^{π} cos x sin kx dx = 0 for all k.
Even more general than a vector being orthogonal to a subspace is the idea that two
subspaces A and B are orthogonal, by which we mean that every vector in A is orthogonal
to every vector in B . If A is a subspace of a scalar product space H , then it is natural
to deﬁne the orthogonal complement A⊥ of A as the set
A⊥ = { X ∈ H : ⟨X, Y⟩ = 0 for all Y ∈ A }
of vectors X orthogonal to A , that is, orthogonal to every vector Y ∈ A . The set A⊥ is
a subspace since it is closed under vector addition and multiplication by scalars (Theorem
2, p. 142).
Without fear of evoking surprise, we deﬁne the angle θ between two vectors X and
Y by the formula
cos θ = ⟨X, Y⟩ / (‖X‖ ‖Y‖).
No matter what X and Y are, this deﬁnes a real angle since the right side of the equation
is a real number between −1 and +1 (by the Cauchy-Schwarz inequality). To be honest,
there is little use for the concept of angles other than right angles. In E3 the formula has
some use, but is totally unused for more general scalar product spaces.
If we are given a set of linearly independent vectors { X1 , X2 , . . . } which span a linear
scalar product space H , how can we construct an orthonormal set { e1 , e2 , . . . } which also
spans the space? The process is carried out inductively. Let e1 = X1/‖X1‖. Now we want
a unit vector e2 orthogonal to e1. A reasonable candidate is
ẽ2 = X2 − ⟨X2, e1⟩e1,
which is X2 with the projection of X2 onto e1 subtracted off (see fig.). This vector ẽ2 is
orthogonal to e1 since ⟨ẽ2, e1⟩ = 0. We divide by its length to obtain the unit vector e2,
e2 = (X2 − ⟨X2, e1⟩e1) / ‖X2 − ⟨X2, e1⟩e1‖.
Next we take X3 and subtract off both its projections into the subspaces spanned by e1
and e2:
ẽ3 = X3 − [⟨X3, e1⟩e1 + ⟨X3, e2⟩e2].
This vector ẽ3 is orthogonal to both e1 and e2. Normalize it to get e3 = ẽ3/‖ẽ3‖.
More generally, say we have used the vectors X1, X2, . . . , Xk to obtain the orthonormal
set e1, e2, . . . , ek. Then ek+1 is given by
ek+1 = ( Xk+1 − Σ_{l=1}^{k} ⟨Xk+1, el⟩el ) / ‖ Xk+1 − Σ_{l=1}^{k} ⟨Xk+1, el⟩el ‖.
This procedure is called the Gram-Schmidt orthogonalization process. With it we can assert
that if some set of linearly independent vectors spans a linear space A , we might as well
suppose that those vectors constitute an orthonormal set, for if they don’t, just use Gram-Schmidt to construct a set that is orthonormal.
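The inductive recipe above translates directly into a program. A sketch of Gram-Schmidt (Python; a sketch under the assumption that the input vectors are linearly independent, so no normalizing division is by zero):

```python
import math

def sp(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors: subtract from each X its
    projections <X, e_l> e_l onto the e's already found, then normalize."""
    es = []
    for X in vectors:
        tilde = list(X)                     # the vector e-tilde of the text
        for e in es:
            c = sp(X, e)
            tilde = [t - c * ei for t, ei in zip(tilde, e)]
        n = math.sqrt(sp(tilde, tilde))
        es.append([t / n for t in tilde])
    return es

es = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for j, ej in enumerate(es):
    for k, ek in enumerate(es):
        want = 1.0 if j == k else 0.0       # <e_j, e_k> = delta_jk
        assert abs(sp(ej, ek) - want) < 1e-12
print("the output is an orthonormal set")
```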
The next result is a useful observation.
Theorem 3.15 . A set { X1 , X2 , . . . , Xn } of orthogonal vectors, none of which is the zero
vector, is necessarily linearly independent.
Proof: The hypothesis states that ⟨Xj, Xk⟩ = 0 for j ≠ k, and that ⟨Xj, Xj⟩ ≠ 0. Assume
there are scalars a1, a2, . . . , an such that
0 = a1 X1 + a2 X2 + · · · + an Xn.
We shall show that a1 = a2 = · · · = an = 0. Take the scalar product of both sides with
the vector X1. Then
⟨0, X1⟩ = a1⟨X1, X1⟩ + a2⟨X2, X1⟩ + · · · + an⟨Xn, X1⟩,
so that
0 = a1⟨X1, X1⟩.
Since ⟨X1, X1⟩ ≠ 0, we conclude that a1 = 0. Similarly, by taking the scalar product with
X2 we find that a2 = 0, and so on.
An easy consequence of this theorem is the fact that the functions fn (x) = sin nx, n =
1, 2, . . . , N where x ∈ [−π, π ] are linearly independent, for they are orthogonal (cf. Exercise
5, p. ???).
Say we are given an orthonormal set of n vectors, {ej}, j = 1, . . . , n, ⟨ej, ek⟩ = δjk,
and X an element of the linear space A spanned by the {ej}. Then
X = Σ_{j=1}^{n} xj ej,
where the xj are uniquely determined just from the general theory of linear spaces (p. 160,
Theorem 10). In the special case of a scalar product space we can conclude even more.
Theorem 3.16. Let {ej, j = 1, . . . , n} be an orthonormal set of vectors which span A.
Then every vector X ∈ A can be uniquely written as X = Σ_{j=1}^{n} xj ej, where xj is the length
of the projection of X into the subspace spanned by ej, that is, xj = ⟨X, ej⟩. The xj are
the Fourier coefficients of X with respect to the orthonormal basis {ej}.
Proof: This is identical to Theorem 3.8 of the last section. Take the inner product of both
sides of X = Σ_{j=1}^{n} xj ej with ek. Then
⟨X, ek⟩ = ⟨Σ_{j=1}^{n} xj ej, ek⟩ = Σ_{j=1}^{n} xj⟨ej, ek⟩ = Σ_{j=1}^{n} xj δjk,     (3-9)
so that
⟨X, ek⟩ = xk.
Theorem 3.17 . Let { ej }, j = 1, . . . , n be an orthonormal set of vectors which span A .
n If X = n xj ej and Y =
j =1 yj ej are vectors in A , then
j =1
n xj yj = x1 y1 + x2 y2 + · · · + xn yn . X, Y =
j =1 Proof: Identical to Theorem 7 of the last section.
n n X, Y = xj ej ,
j =1
n = yk ek
k=1
n xj ej ,
j =1
n = n xj (
j =1
n = yk ek
k=1 (3-10) yk ej , ek )
k=1
n xj (
j =1 so that yk δjk ),
k=1 n xj yj . X, Y =
j =1 Remark: We shall see that these two theorems extend to the case n = ∞ .
Examples.
(1) The vectors e1 = (1, 0, 0), e2 = (0, 1, 0) , and e3 (0, 0, 1) clearly form an orthonormal
basis for E3 . Let X = (2, −1, 4) . We shall compute the xj in
3 X= xj ej .
j =1 3.3. ABSTRACT SCALAR PRODUCT SPACES 121 Since xj = X, ej , we ﬁnd z1 = X, e1 = (2, −1, 4), (1, 0, 0) = 2·1+(−1)·0+4·0 =
2 , and similarly, x2 = −1, x3 = 4 as expected. Thus
(2, −1, 4) = 2e1 − e2 + 4e3 .
In the same way, if Y = (7, 1, −3) , then
Y = 7e1 + e2 − 3e3 .
Also,
X, Y = (2)(7) + (−1)(1) + (4)(−3) = 1.
The projection of X into the subspace spanned by Y is
X, Y / Y Y
Y 1
(7, 1, −3)
59
1
3
7
= e1 + e2 − e3 .
59
59
59 = (3-11) 1
1
Another orthonormal basis for E3 is ẽ1 = (1/√2, 1/√2, 0), ẽ2 = (−1/√2, 1/√2, 0), and
ẽ3 = (0, 0, 1), since ⟨ẽj, ẽk⟩ = δjk. The expansion for X in this basis is
X = Σ_{j=1}^{3} x̃j ẽj,
where
x̃1 = ⟨X, ẽ1⟩ = ⟨(2, −1, 4), (1/√2, 1/√2, 0)⟩ = 1/√2,     (3-12)
x̃2 = −3/√2, and x̃3 = 4.     (3-13)
Thus
X = (1/√2)ẽ1 − (3/√2)ẽ2 + 4ẽ3.
Similarly,
Y = (8/√2)ẽ1 − (6/√2)ẽ2 − 3ẽ3.
Therefore
⟨X, Y⟩ = (1/√2)(8/√2) + (−3/√2)(−6/√2) + (4)(−3) = 1.
Notice that the number ⟨X, Y⟩ is the same no matter which basis is used. This is
not a coincidence. Recall that the scalar product ⟨X, Y⟩ was defined independently
of any basis. Hence its value should not be dependent upon which basis we happen
to choose. If you think of ⟨X, Y⟩ geometrically in terms of the projection, it should
be clear that the number should not depend upon which particular basis is used to
describe the vectors.
(2) For our second example, we consider the set of orthonormal functions e1(x) = sin x/√π,
e2(x) = sin 2x/√π, and let A be the set in L2(−π, π) which they span. We would like to
expand some function
f(x) = Σ_{j=1}^{2} fj ej(x).
The only trouble is that Theorems 3.16 and 3.17 only allow us to expand functions f
which are in the subspace A, that is, are a linear combination of the basis elements e1
and e2. Since we secretly know that f(x) = sin x cos x (= ½ sin 2x) is such a function,
let us find its expansion. By elementary integration,
f1 = ⟨f, e1⟩ = ∫_{−π}^{π} (sin x cos x)(sin x/√π) dx = 0,
and
f2 = ⟨f, e2⟩ = ∫_{−π}^{π} (sin x cos x)(sin 2x/√π) dx = √π/2.
Therefore
f = 0·e1 + (√π/2)e2 = (√π/2)e2,
or
sin x cos x = (√π/2)(sin 2x/√π) = (sin 2x)/2,
which we knew was the case from trigonometry.
If the orthonormal set { ej }, j = 1, . . . , m spans a subspace A of a linear scalar product
space H , and if X ∈ H , can any sense be made of the expansion
X =? Σ_{j=1}^{m} xj ej ?
One way to seek an answer is to examine a special case. Again geometry will supply the key.
Let H = E3 and let A be the subspace spanned by the orthonormal vectors e1 = (1, 0, 0)
and e2 = (0, 1, 0). Then if X ∈ E3, how can we interpret
X =? Σ_{j=1}^{2} xj ej = x1 e1 + x2 e2 ?
Plowing blindly ahead, we take the scalar product of both sides with e1 and then with e2.
This gives us xj = ⟨X, ej⟩. Thus the right side, x1 e1 + x2 e2, is the projection of X into
the subspace A spanned by {ej}. It is now clear how our original quandary is resolved.
Definition. If the orthonormal set {ej}, j = 1, . . . , m, spans a subspace A of a linear
scalar product space H, and if X ∈ H, then the vector Σ_{j=1}^{m} xj ej, where xj = ⟨X, ej⟩, is
the projection of X into the subspace A.
Remark. It is customary to denote the projection of X into A by PA X . Think of PA
as an operator (function) which maps the vector X into its projection in A . With this
notation the above deﬁnition reads
PA X = Σ_{j=1}^{m} xj ej,
where xj = ⟨X, ej⟩ and the orthonormal set {ej} spans A.
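As a concrete sketch, the projection operator of this definition can be written in a few lines (Python; the names `sp` and `project` are ours):

```python
def sp(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def project(X, basis):
    """P_A X = sum of <X, e_j> e_j over an orthonormal basis of A."""
    P = [0.0] * len(X)
    for e in basis:
        c = sp(X, e)
        P = [p + c * ei for p, ei in zip(P, e)]
    return P

A = [(1, 0, 0), (0, 1, 0)]        # the x1-x2 plane in E3
X = (3, -1, 7)
PX = project(X, A)
print(PX)                         # [3.0, -1.0, 0.0]
residual = [x - p for x, p in zip(X, PX)]
assert all(sp(residual, e) == 0 for e in A)   # X - P_A X is orthogonal to A
```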
Since the projection PA X is deﬁned in terms of a particular basis for A , we should
show that this geometrical object is independent of the basis you choose for A . But we
shall not take the time right now. In reality, Theorem 3.20 below leads us to make a better
definition of projection.
Theorem 3.18. If the orthonormal set {ej}, j = 1, . . . , m, spans a subspace A ⊂ H,
and if X and Y are in H, then
a) ⟨PA X, PA Y⟩ = Σ_{j=1}^{m} xj yj,
where xj = ⟨X, ej⟩ and yj = ⟨Y, ej⟩. In particular
b) ‖PA X‖² = Σ_{j=1}^{m} xj².
Furthermore, X − PA X ∈ A⊥, that is, for every Y ∈ A
c) ⟨X − PA X, Y⟩ = 0.
Every X ∈ H can be written as
d) X = PA X + PA⊥ X, where PA⊥ X ≡ X − PA X is in A⊥.
Proof: Since both vectors PA X = Σ_{j=1}^{m} xj ej and PA Y = Σ_{j=1}^{m} yj ej are in A itself, a) and
b) are immediate consequences of Theorem 3.17. Although the equation c) is geometrically
clear, we shall compute it too.
Since the ej span A, it is equivalent to show that X − PA X is orthogonal to all the ej. Now
⟨X − PA X, ej⟩ = ⟨X, ej⟩ − ⟨PA X, ej⟩ = xj − xj = 0. Since trivially X = PA X + (X − PA X),
the only content of part d) is that (X − PA X) ∈ A⊥, which is just what part c) proved.
Corollary 3.19
a) ‖X‖² = ‖PA X‖² + ‖X − PA X‖² = ‖PA X‖² + ‖PA⊥ X‖² (Pythagorean Theorem)
b) ‖X‖² ≥ ‖PA X‖² = Σ_{j=1}^{m} xj² (Bessel’s Inequality)
Proof: a) is a result of the fact that PA X ∈ A is orthogonal to X − PA X ∈ A⊥ and
Theorem 3.14. The inequality b), Bessel’s inequality, is simply a weaker form of a)—since
‖X − PA X‖² ≥ 0. There is equality if and only if X ∈ A, for only then does X − PA X = 0.
Examples:
(1) Let A be the subspace of E3 spanned by e1 = (1, 0, 0) and e2 = (0, 1, 0). The
projection of X = (3, −1, 7) into A is represented by
PA X = ⟨X, e1⟩e1 + ⟨X, e2⟩e2 = 3e1 − e2 ∈ A.
Also
PA⊥ X = X − PA X = 3e1 − e2 + 7e3 − (3e1 − e2) = 7e3 ∈ A⊥.
Since ẽ1 = (1/√2, 1/√2, 0) and ẽ2 = (−1/√2, 1/√2, 0) also form an orthonormal basis for A,
we can equally well write
PA X = ⟨X, ẽ1⟩ẽ1 + ⟨X, ẽ2⟩ẽ2 = (2/√2)ẽ1 − (4/√2)ẽ2.
(2) Let A be the subspace of L2[−π, π] spanned by the orthonormal functions e1(x) =
sin x/√π, e2(x) = sin 2x/√π. The projection of the function f(x) ≡ x into A is represented
by
PA f = ⟨f, e1⟩e1 + ⟨f, e2⟩e2.
Since an integration by parts shows that
∫_{−π}^{π} x sin kx dx = −(x cos kx)/k |_{−π}^{π} + (1/k) ∫_{−π}^{π} cos kx dx = −(2π/k) cos kπ = (−1)^{k+1} (2π/k),
we find
⟨f, e1⟩ = ⟨x, e1⟩ = ∫_{−π}^{π} x (sin x/√π) dx = 2√π
and
⟨f, e2⟩ = ⟨x, e2⟩ = ∫_{−π}^{π} x (sin 2x/√π) dx = −√π.
Thus
PA x = 2√π (sin x/√π) − √π (sin 2x/√π),
or
PA x = 2 sin x − sin 2x.
Also,
More generally, we can let A be the subspace of L2 [−π, π ] spanned by { ek }, k =
sin kx
˜
1, 2, . . . , N , where ek (x) = √π . Then the projection of x onto A is given by
N PA x =
˜ x, ek ek (x).
k=1 Since √
sin kx
k+1 2 π
=
x √ dx = (−1)
,
k
π
−π
π x, ek 3.3. ABSTRACT SCALAR PRODUCT SPACES 125 we have
√ N k+1 2 (−1) PA x =
˜ k=1
N π sin kx
√
k
π (−1)k+1
sin kx
k =2
k=1 = 2(sin x − (3-14) sin 2x sin 3x
sin N x
+
− . . . + (−1)N +1
).
2
3
N Furthermore,
N
2 PA x
˜ N = 2 x, ek = k=1 k=1 4π
= 4π
k2 N
k=1 1
.
k2 It is from this formula that we eventually intend to obtain the famous formula
∞
k=1 1
1
1
1
π2
= 1 + 2 + 2 + 2 + ··· =
.
k2
2
3
4
6 We will observe that π f 2 2 =X x · x dx = =
−π 2π 3
3 and prove
lim N →∞ PA⊥ x = lim
˜ N →∞ x − PA x = 0
˜ Then from the Corollary to Theorem 16,
2 X = lim N →∞ PA x 2 ,
˜ or
2π 3
= 4π
3 ?
k=1 1
π2
⇒
=
k2
6 ∞
k=1 1
.
Geometry leads us to the next theorem—and the proof too. Let X be a given vector
and PA X its projection into the subspace A . Since distance is measured by dropping a
perpendicular, we expect that PA X is the vector in A which is closest to X , that is, most
closely approximates X .
Theorem 3.20. Let X be a vector in a scalar product space H and A a subspace of
H. Then if V is any vector in A,
‖X − PA X‖ ≤ ‖X − V‖.
Proof: We shall prove the stronger statement (cf. fig. above)
X − PA X 2 + V − PA X 2 = X −V 2 . Observe that (V − PA X ) ∈ A , since both terms are in A and A is a subspace. Moreover
X − PA X ∈ A⊥ (Theorem 16c). Therefore X − PA X is orthogonal to PA X − V , so the
identity is a consequence of Theorem 12. 126 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS Remark. With this theorem in mind, we could deﬁne the projection PA X into a subspace
A as the element in A which is closest to X . This deﬁnition is independent of any basis,
whereas our original deﬁnition was not. One must, however, be somewhat careful when
deﬁning the projection into an inﬁnite dimensional subspace. Although it is clear that the
number X − V has a g.l.b. as V wanders throughout A , it is not clear that it has an
actual min, that is, there really is a vector U ∈ A such that X − U takes on its g.l.b.
as a min. If there is such a U , we call it PA X . Otherwise there is no projection. When
projecting into a ﬁnite dimensional space this diﬃculty does not arise (but we will stop
without further explanation of this detail).
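Theorem 3.20 is easy to test numerically. The sketch below (an illustrative aside; the subspace $A$, the vector $X$, and the sample vectors $V$ are arbitrary choices made for this demonstration) verifies both the Pythagorean identity in the proof and the resulting minimality:

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def norm(u):   return math.sqrt(dot(u, u))

# A = the xy-plane in E^3, spanned by the orthonormal e1, e2.
e1, e2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
X = (1.0, 2.0, 3.0)

# P_A X = <X,e1> e1 + <X,e2> e2
PX = tuple(dot(X, e1) * a + dot(X, e2) * b for a, b in zip(e1, e2))

for V in [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, -2.0, 0.0)]:
    # the stronger statement proved above:
    lhs = norm(sub(X, PX)) ** 2 + norm(sub(V, PX)) ** 2
    assert abs(lhs - norm(sub(X, V)) ** 2) < 1e-9
    # hence P_A X is at least as close to X as any V in A:
    assert norm(sub(X, PX)) <= norm(sub(X, V)) + 1e-12
```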
Some discussion of these results is needed to place the material in its proper perspective. If you are given an orthonormal set of vectors $\{e_j\}$ which spans some subspace $A$ of a scalar product space $H$, then for any $X$ in $H$ you can find a representation for $P_A X$ in terms of that basis, $P_A X = \sum x_j e_j$. If the vector $X$ happened to already lie in $A$, then $P_A X = X$, so $X = \sum x_j e_j$ and $\|X\|^2 = \sum x_j^2$. This last equation for the length of $X$ is the Pythagorean Theorem. If $X$ did not lie entirely in $A$, but "stuck out" of it into the rest of $H$, then $P_A X = \sum x_j e_j$ only represents a piece of $X$, its projection into $A$. Since part of $X$ has been omitted, we expect that $\|X\| > \|P_A X\| = \bigl(\sum x_j^2\bigr)^{1/2}$. This inequality was the content of the Corollary to Theorem 16. Informally, if no vector $X \in H$ sticks out of the linear space spanned by the $\{e_j\}$, then the set $\{e_j\}$ is said to be complete (do not confuse this with the completeness of Chapter 0; they are entirely different concepts, an unfortunate coincidence). More precisely,
Definition. An orthonormal set is complete for the scalar product space $H$ if that orthonormal set is not properly contained in a larger orthonormal set.
There are many ways to check if a given orthonormal set is complete for $H$. Geometry suggests them all.

Theorem 3.21. Let $\{e_j\}$ be an orthonormal set which spans the subspace $A$ of the scalar product space $H$. The following statements are equivalent:
(a) The set $\{e_j\}$ is complete for $H$.
(b) If $\langle X, e_j\rangle = 0$ for all $j$, then $X = 0$.
(c) $A = H$.
(d) If $X \in H$, then $X = \sum x_j e_j$, where $x_j = \langle X, e_j\rangle$.
(e) If $X$ and $Y \in H$, then $\langle X, Y\rangle = \sum x_j y_j$, where $x_j = \langle X, e_j\rangle$ and $y_j = \langle Y, e_j\rangle$.
(f) If $X \in H$, then (Pythagorean Theorem) $\|X\|^2 = \sum x_j^2$, where $x_j = \langle X, e_j\rangle$.
Proof: We shall use the chain of reasoning a $\Rightarrow$ b $\Rightarrow$ c $\Rightarrow \cdots \Rightarrow$ f $\Rightarrow$ a.

a $\Rightarrow$ b. If $\langle X, e_j\rangle = 0$ but $X \ne 0$, then $X/\|X\|$ is a unit vector orthogonal to all the $e_j$. This means that $\{\frac{X}{\|X\|}, e_1, e_2, \ldots\}$ is an orthonormal set which contains $\{e_1, e_2, \ldots\}$ as a proper subset.

b $\Rightarrow$ c. If there is an $X \in H$ with $X \notin A$, then $P_{A^\perp} X = X - P_A X \in A^\perp$ and is not zero. Since all the $e_j \in A$, we have $\langle P_{A^\perp} X, e_j\rangle = 0$ for all $j$ but $P_{A^\perp} X \ne 0$, contradicting b). Thus $H \subset A$. Since $A \subset H$ by hypothesis, this proves that $H = A$.

c $\Rightarrow$ d. Since every $X \in A$ has the form $X = \sum x_j e_j$ (by Theorem 14) and since $H = A$, the conclusion is immediate.

d $\Rightarrow$ e $\Rightarrow$ f. A restatement of Theorem 16, since for every $X \in H$ we know that $P_A X = P_H X = X$.

f $\Rightarrow$ a. If $\{e_j\}$ is not complete, it is contained in a larger orthonormal set. Let $e$ be a vector in that larger set which is not one of the $e_j$. Then by f), and the fact that $\langle e, e_j\rangle = 0$,
$$\|e\|^2 = \sum \langle e, e_j\rangle^2 = 0.$$
Therefore $e = 0$, contradicting the fact that $e$ is a unit vector.
Remarks. 1. Because each of the six conditions a–f is equivalent to the others, any one of them could have been used as the definition of a complete orthonormal set.

2. If the orthonormal set $\{e_j\}$ has a (countably) infinite number of elements, the theorem is still valid, but some convergence questions for d–f arise because of the then infinite series $X = \sum_{1}^{\infty} x_j e_j$. The appropriate sense of convergence is that the remainder after $N$ terms, $\sum_{N+1}^{\infty} x_j e_j = X - \sum_{1}^{N} x_j e_j$, tends to zero in the norm of the scalar product space, that is,
$$\lim_{N\to\infty} \Bigl\| X - \sum_{1}^{N} x_j e_j \Bigr\| = 0.$$
We shall meet this in the next section for the space $L^2[-\pi,\pi]$. Condition f) gives us no convergence problems since the series is an infinite series of positive terms which is always bounded by $\|X\|^2$ (Bessel's Inequality—Corollary b to Theorem 16), and so always converges. This criterion just asks if the sum of the series actually equals $\|X\|^2$ (we know it is no larger).
Examples

(1) The orthonormal vectors $e_1 = (\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0)$ and $e_2 = (\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0)$ are not complete for $E^3$, since any basis for $E^3$ must have three elements because its dimension is 3. This could also be seen geometrically from the fact that, for example, $X = (1, 2, 3)$ sticks out of the space spanned by $e_1$ and $e_2$, or from the fact that $e_3 = (0, 0, 2)$ is a non-zero vector orthogonal to both $e_1$ and $e_2$, or in many other ways. The dimension argument is the easiest to apply if $H$ is finite dimensional, for then the number of elements in a complete orthonormal set $\{e_k\}$ must equal the dimension of $H$.
(2) The set $\{\tilde e_n\}$, where $\tilde e_n(x) = \frac{\sin nx}{\sqrt{\pi}}$, is an orthonormal set of functions in the scalar product space $L^2[-\pi,\pi]$, but it is not a complete orthonormal set for that space, since the function $\cos x$ is a non-zero function in $L^2[-\pi,\pi]$ which is orthogonal to all the $\tilde e_n$:
$$\langle \cos x, \tilde e_n\rangle = \int_{-\pi}^{\pi} \cos x\,\frac{\sin nx}{\sqrt{\pi}}\, dx = 0.$$
Thus, although the set $\{\tilde e_n\}$ has an infinite number of elements, it is still not big enough to span all of $L^2[-\pi,\pi]$. The next section will be devoted to proving that the larger orthonormal set $e_0, e_1, \tilde e_1, e_2, \tilde e_2, \ldots$, where
$$e_0 = \frac{1}{\sqrt{2\pi}}, \qquad e_n(x) = \frac{\cos nx}{\sqrt{\pi}}, \qquad \tilde e_n(x) = \frac{\sin nx}{\sqrt{\pi}},$$
is a complete orthonormal set for the scalar product space $L^2[-\pi,\pi]$. This is a difficult theorem.

Specific applications of the ideas in this section are contained in the exercises. For many of them you would be wise to refer to their corresponding special cases which appeared in Section 2.

Exercises
(1) Let $X$ and $Y$ be points in $R^n$. Determine which of the following make $R^n$ into a scalar product space, and why—or why not.
(a) $\langle X, Y\rangle = \sum_{k=1}^{n} \frac{1}{k}\, x_k y_k$.
(b) $\langle X, Y\rangle = \sum_{k=1}^{n} (-1)^k x_k y_k$.
(c) $\langle X, Y\rangle = \sum_{k=1}^{n} x_k^2 y_k^2$.
(d) $\langle X, Y\rangle = \sum_{k=1}^{n} a_k x_k y_k$, where $a_k > 0$ for all $k$.
(2) Let $f$ and $g$ be continuous real-valued functions on the interval $[0,1]$, so $f, g \in C[0,1]$. Determine which of the following make $C[0,1]$ into a scalar product space, and why—or why not.
(a) $\langle f, g\rangle = \int_0^1 f(x)g(x)\,\dfrac{dx}{1+x^2}$.
(b) $\langle f, g\rangle = \int_0^1 f(x)g(x)\sin 2\pi x\, dx$.
(c) $\langle f, g\rangle = \int_0^1 f(x)g^2(x)\, dx$.
(d) $\langle f, g\rangle = \int_0^1 f(x)g(x)\rho(x)\, dx$, where $\rho(x)$ is a fixed continuous function with the property $\rho(x) > 0$.
(e) $\langle f, g\rangle = f(0)g(0)$.

(3) This is the analogue of $L^2$ for sequences. Let $l^2$ be the set of all sequences $X = (x_1, x_2, x_3, \ldots)$ with the property that $\|X\|^2 = \sum_{j=1}^{\infty} x_j^2 < \infty$. Prove that $l^2$ is a normed linear space (cf. the example for $l^1$ in Section 1).

(4) Use the Cauchy–Schwarz inequality to prove that if $\sum_{n=1}^{\infty} n^2 a_n^2 < \infty$, then $\sum_{n=1}^{\infty} |a_n| < \infty$. (Hint: $|a_n| = \frac{1}{n}\,|n a_n|$.)
(5) Consider the following linearly independent vectors in $E^3$:
$$X_1 = (1, 0, -1), \qquad X_2 = (0, 3, 1), \qquad X_3 = (2, -1, 0).$$
(a) Use the Gram–Schmidt orthogonalization process to find an orthonormal set of vectors $e_1$, $e_2$, and $e_3$ such that $e_1$ is in the subspace spanned by $X_1$.
(b) Write $X = (1, 2, 3)$ as $X = \sum_{j=1}^{3} x_j e_j$, where the $e_j$ are those of part a). Also, compute $\|X\|$ and $\|PX\|$.

(6) Consider the following linearly independent set of functions in $L^2[-1,1]$:
$$f_1(x) = 1, \qquad f_2(x) = x, \qquad f_3(x) = x^2.$$
(a) Use the Gram–Schmidt orthogonalization process to find an orthonormal set of functions $e_1(x)$, $e_2(x)$, and $e_3(x)$ such that $e_1$ is in the subspace spanned by $f_1$.
(b) Find the projection of the function $f(x) = (1+x)^3$ into the subspace of $L^2[-1,1]$ spanned by $e_1(x)$, $e_2(x)$, and $e_3(x)$. Also, compute $\|f\|$ and $\|Pf\|$.
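For readers who want to experiment, the Gram–Schmidt process of Exercises 5 and 6 can be sketched in a few lines (an illustrative aside, not part of the notes; the code works in $E^n$ and checks orthonormality on the data of Exercise 5a):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors in E^n."""
    basis = []
    for X in vectors:
        # subtract the projection of X onto the span of the previous e_j
        for e in basis:
            X = sub(X, scale(dot(X, e), e))
        basis.append(scale(1.0 / math.sqrt(dot(X, X)), X))
    return basis

e = gram_schmidt([(1, 0, -1), (0, 3, 1), (2, -1, 0)])
# the result is pairwise orthonormal:
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(dot(e[i], e[j]) - expected) < 1e-12
```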
(7) Let $P_n(x) = \frac{1}{2^n n!}\,\frac{d^n}{dx^n}(1 - x^2)^n$, $n = 0, 1, 2, \ldots$. These are the Legendre polynomials.
(a) Prove that $\langle P_n, P_m\rangle = \int_{-1}^{1} P_n(x)P_m(x)\, dx = 0$ for $n \ne m$, that is, the $P_n$ are orthogonal in $L^2[-1,1]$, by first proving that
$$\int_{-1}^{1} P_n(x)\,x^m\, dx = 0, \qquad m < n.$$
(b) Show that $\|P_n\|^2 = \frac{2}{2n+1}$. Thus the functions
$$e_n(x) = \sqrt{\frac{2n+1}{2}}\; P_n(x)$$
are an orthonormal set of functions for $L^2[-1,1]$. Compute $e_0(x)$, $e_1(x)$, and $e_2(x)$ and compare with Exercise 6a.
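The orthogonality relations of Exercise 7 can also be checked numerically. The sketch below (an aside, not part of the notes) generates $P_n$ by the standard three-term recurrence $(k+1)P_{k+1} = (2k+1)xP_k - kP_{k-1}$ — a known equivalent of the Rodrigues formula above — and verifies $\langle P_n, P_m\rangle = \frac{2}{2n+1}\,\delta_{nm}$ by midpoint-rule quadrature:

```python
import math

def legendre(n, x):
    """P_n(x) via the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def inner(n, m, pts=20000):
    """Midpoint-rule approximation of the L^2[-1,1] product <P_n, P_m>."""
    h = 2.0 / pts
    return sum(legendre(n, -1 + (i + 0.5) * h) * legendre(m, -1 + (i + 0.5) * h)
               for i in range(pts)) * h

for n in range(4):
    for m in range(4):
        exact = 2.0 / (2 * n + 1) if n == m else 0.0
        assert abs(inner(n, m) - exact) < 1e-4
```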
(8) (a) Show that the vector $N = (a_1, a_2, a_3)$ is orthogonal to the coset (a plane in $E^3$)
$$A = \{\, X \in E^3 : a_1 x_1 + a_2 x_2 + a_3 x_3 = c \,\}.$$
(b) Show that the vector $N = (a_1, \ldots, a_n)$ is orthogonal to the coset (a hyperplane in $E^n$) $A = \{\, X \in E^n : a_1 x_1 + \cdots + a_n x_n = c \,\}$.
(c) Find the coset $A \subset E^3$ which passes through the point $X_0 = (1, -1, 2)$ and is orthogonal to $N = (1, 3, 2)$. In ordinary language, $A$ is the plane containing the point $X_0$ which is orthogonal to $N$.
(d) Show that the coset $A \subset E^n$ which passes through the point $X_0 = (\tilde x_1, \ldots, \tilde x_n)$ and is orthogonal to $N = (a_1, \ldots, a_n)$ is
$$A = \{\, X \in E^n : \langle X, N\rangle = \langle X_0, N\rangle \,\}.$$

(9) (a) Use Problem 8a to show that the distance $d$ from the point $P = (y_1, y_2, y_3) \in E^3$ to the coset $a_1 x_1 + a_2 x_2 + a_3 x_3 = c$ in $E^3$ is
$$d = \frac{|a_1 y_1 + a_2 y_2 + a_3 y_3 - c|}{\sqrt{a_1^2 + a_2^2 + a_3^2}} = \frac{|\langle N, P\rangle - c|}{\|N\|}.$$
(b) Show that the distance $d$ from the point $P = (y_1, \ldots, y_n) \in E^n$ to the coset $a_1 x_1 + \cdots + a_n x_n = c$ in $E^n$ is
$$d = \frac{|a_1 y_1 + a_2 y_2 + \cdots + a_n y_n - c|}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}} = \frac{|\langle N, P\rangle - c|}{\|N\|}.$$
(c) Show that the distance $d$ between the "parallel" cosets $a_1 x_1 + \cdots + a_n x_n = c_1$ and $a_1 x_1 + \cdots + a_n x_n = c_2$ in $E^n$ is
$$d = \frac{|c_1 - c_2|}{\sqrt{a_1^2 + \cdots + a_n^2}} = \frac{|c_1 - c_2|}{\|N\|}.$$
(Hint: Pick a point $P$ in one of the cosets and apply part b.)
(10) Find the angle between the diagonal of a cube and one of its edges.

(11) Let $Y_1$ and $Y_2$ be fixed vectors in a scalar product space $H$.
a) If $\langle X, Y_1\rangle = 0$ for all $X \in H$, prove that $Y_1 = 0$.
b) If $\langle X, Y_1\rangle = \langle X, Y_2\rangle$ for all $X \in H$, prove that $Y_1 = Y_2$.

(12) Let $Y_0$ be a fixed vector in a scalar product space $H$. Let $A = \{\, X \in H : \langle Y, Y_0\rangle = 0 \Rightarrow \langle X, Y\rangle = 0 \,\}$. Prove that $A$ is the span of $Y_0$: $A = \{\, X : X = cY_0 \text{ for some scalar } c \,\}$. Make sure to see the geometrical situation for the case $H = E^3$. [Hint: Let $B$ be the set of all vectors orthogonal to $Y_0$, so $Y \in B$. Since $H$ is composed of two parts, $Y_0$ and $B$, every $X \in H$ can be written as $X = cY_0 + Z$, where $cY_0$ is the projection of $X$ into the subspace spanned by $Y_0$ (so $c = \langle X, Y_0\rangle / \|Y_0\|^2$) and $Z = (X - cY_0) \in B$. Now show that $X \in A \Rightarrow Z = 0$.]
(13) (a) Let $X = (1, 3, -1)$ and $Y = (2, 1, 1)$. Find a vector $N$ which is orthogonal to the subspace spanned by $X$ and $Y$.
(b) Let $X = (x_1, x_2, x_3)$ and $Y = (y_1, y_2, y_3)$. Find a vector $N$ which is orthogonal to the subspace spanned by $X$ and $Y$. [Answer: $N = c(x_2 y_3 - y_2 x_3,\; y_1 x_3 - x_1 y_3,\; x_1 y_2 - y_1 x_2)$, where $c$ is any non-zero scalar.]
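The answer to Exercise 13b is, of course, the cross product. A quick check (an aside) on the data of part (a):

```python
def cross(X, Y):
    """The vector of Exercise 13b with c = 1: the cross product of X and Y."""
    x1, x2, x3 = X
    y1, y2, y3 = Y
    return (x2 * y3 - y2 * x3, y1 * x3 - x1 * y3, x1 * y2 - y1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

N = cross((1, 3, -1), (2, 1, 1))
assert dot(N, (1, 3, -1)) == 0 and dot(N, (2, 1, 1)) == 0
print(N)  # (4, -3, -5)
```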
(14) Let $A$ be the subspace of $L^2[-\pi,\pi]$ spanned by the orthonormal set $\{e_n(x)\}$, $n = 1, 2, \ldots, N$, where $e_n(x) = \frac{\sin nx}{\sqrt{\pi}}$.
(a) Find the projection of $f(x) = x^2$ into $A$. (The answer should surprise you.) Compute $\|f\|$ and $\|P_A f\|$ too.
(b) Find the projection of $f(x) = 1 + \sin^3 x$ into $A$. Compute $\|f\|$ and $\|P_A f\|$.
(c) If $f(x)$ is an even function, $f(x) = f(-x)$, show that its projection into $A$ is zero. Now look at part (a) again.
(15) (a) If $f \in C[a,b]$, show that
$$\Bigl(\int_a^b f(x)\, dx\Bigr)^2 \le (b-a)\int_a^b f^2(x)\, dx.$$
[Hint: Write $f(x) = 1 \cdot f(x)$ and use the Cauchy–Schwarz inequality for $L^2[a,b]$.]
(b) If $f \in C^1[a,b]$, prove that
$$|f(x) - f(a)|^2 \le (x-a)\int_a^b f'(x)^2\, dx, \qquad x \in (a,b).$$
[Hint: Write $f(x) - f(a) = \int_a^x f'(t)\, dt$ and apply part a).]
(c) If $f \in C^1[a,b]$ and $f(a) = 0$, use part b to prove that
$$\int_a^b f^2(x)\, dx \le \frac{(b-a)^2}{2}\int_a^b f'(x)^2\, dx.$$

(16) (a) Let $A = \{\, h \in C^1[a,b] : h(a) = h(b) \,\}$ and let $B = \{\, h \in C^1[a,b] : \langle 1, h'\rangle = 0 \,\}$, where $h' = \frac{dh}{dx}$. Show that the subspaces $A$ and $B$ are identical: $h \in A \iff h \in B$.
(b) Let $f(x)$ be any continuous function such that $\int_a^b f(x)h'(x)\, dx = 0$ for all $h(x) \in C^1[a,b]$ with $h(a) = h(b)$. Show that $f \equiv$ constant. [Hint: Use part (a) and the result of Exercise 12.]
(17) If $f(x) \in C[a,b]$ satisfies the condition $\int_a^b f(x)h(x)\, dx = 0$ for all $h(x) \in C[a,b]$ which satisfy the conditions
$$\int_a^b h(x)\, dx = 0, \quad \int_a^b x\,h(x)\, dx = 0, \quad \ldots, \quad \int_a^b x^n h(x)\, dx = 0,$$
prove that $f \in P_n$, that is, $f$ is of the form
$$f(x) = a_0 + a_1 x + \cdots + a_n x^n,$$
where the $a_j$ are constants. [Hint: Use Exercise 12.]
(18) Determine which of the following orthonormal sets are complete for their respective spaces.
(a) In $E^3$: $e_1 = (0, 1, 0)$, $e_2 = (\tfrac{3}{5}, 0, \tfrac{4}{5})$, $e_3 = (-\tfrac{4}{5}, 0, \tfrac{3}{5})$.
(b) In $E^4$: $e_1 = (1, 0, 0, 0)$, $e_2 = (0, 1, 0, 0)$, $e_3 = (0, 0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$.
(c) In $E^4$: $e_1$, $e_2$, $e_3$ as in (b), and $e_4 = (0, 0, \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}})$.

(19) Let $e_1$, $e_2$, and $e_3$ be an orthonormal basis for $E^3$, and let $A$ be the subspace spanned by $X_1 = 3e_1 - 4e_3$. Find an orthonormal basis for $A^\perp$.

(20) Let $A$ be a subspace of a scalar product space $H$. If $X \in H$, prove that $P_A(P_A X) = P_A X$, and interpret this geometrically. This result can be written as $P_A^2 = P_A$.
(21) Let $A$ be any operator (not necessarily linear) on a scalar product space. Prove the polarization identity
$$2\langle AX, AY\rangle = \|AX + AY\|^2 - \|AX\|^2 - \|AY\|^2.$$

3.4 Fourier Series.

Throughout this section we shall only use the scalar product of $L^2[a,b]$,
$$\langle f, g\rangle = \int_a^b f(x)g(x)\, dx.$$
We begin with the observation that in the interval $[-\pi,\pi]$
$$\langle \sin nx, \sin mx\rangle = \int_{-\pi}^{\pi} \sin nx \sin mx\, dx = \pi\delta_{nm}, \tag{3-15}$$
$$\langle \sin nx, \cos mx\rangle = \int_{-\pi}^{\pi} \sin nx \cos mx\, dx = 0, \tag{3-16}$$
and
$$\langle \cos nx, \cos mx\rangle = \int_{-\pi}^{\pi} \cos nx \cos mx\, dx = \pi\delta_{nm}, \tag{3-17}$$
where $n, m = 1, 2, 3, \ldots$. Thus the functions
$$e_0(x) = \frac{1}{\sqrt{2\pi}}, \qquad e_n(x) = \frac{\cos nx}{\sqrt{\pi}}, \qquad \tilde e_n(x) = \frac{\sin nx}{\sqrt{\pi}}$$
form an orthonormal set:
$$\langle e_n, e_m\rangle = \delta_{nm}, \qquad \langle e_n, \tilde e_m\rangle = 0, \qquad \langle \tilde e_n, \tilde e_m\rangle = \delta_{nm}.$$
Thus, if $f \in L^2[-\pi,\pi]$, we can find the projection $P_N f$ of $f$ into the subspace spanned by $e_0, e_1, \tilde e_1, \ldots, e_N, \tilde e_N$:
$$P_N f = a_0 e_0 + \sum_{n=1}^{N} \bigl(a_n e_n + b_n \tilde e_n\bigr), \tag{3-18}$$
where
$$a_k = \langle f, e_k\rangle \quad\text{and}\quad b_k = \langle f, \tilde e_k\rangle. \tag{3-19}$$
More explicitly,
$$(P_N f)(x) = a_0 \frac{1}{\sqrt{2\pi}} + \sum_{n=1}^{N} \Bigl( a_n \frac{\cos nx}{\sqrt{\pi}} + b_n \frac{\sin nx}{\sqrt{\pi}} \Bigr), \tag{3-20}$$
where
$$a_0 = \int_{-\pi}^{\pi} f(x)\cdot\frac{1}{\sqrt{2\pi}}\, dx \tag{3-21}$$
and
$$a_n = \int_{-\pi}^{\pi} f(x)\,\frac{\cos nx}{\sqrt{\pi}}\, dx, \qquad b_n = \int_{-\pi}^{\pi} f(x)\,\frac{\sin nx}{\sqrt{\pi}}\, dx.$$
that f − PN f → 0 ? In other words, is the set { ej (x), ej (x) }, j = 0, 1, 2, . . . a complete
˜
orthonormal set of functions for L2 [−π, π ] ? The answer is yes, as we shall prove. Thus for
any f ∈ L2 [−π, π ] ,
∞
sin nx
1
cos nx
+ bn √ ,
(3-22)
f (x) = a0 √ +
an √
π
π
2π n=1
where the Fourier coeﬃcients, an , bn are determined by the formulas (2). The expansion
(3) is called the Fourier series for f .
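The coefficient formulas (3-21) are easy to evaluate by quadrature. The sketch below (a modern aside, not part of the notes; it uses midpoint-rule integration, and the test function $f(x) = x$, whose coefficients $\langle x, \tilde e_k\rangle = (-1)^{k+1}2\sqrt{\pi}/k$ were already computed in the last section):

```python
import math

def quad(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def fourier_coeffs(f, N):
    """The coefficients a_0, a_n, b_n of (3-21) for f on [-pi, pi]."""
    a0 = quad(lambda x: f(x) / math.sqrt(2 * math.pi), -math.pi, math.pi)
    a = [quad(lambda x, n=n: f(x) * math.cos(n * x) / math.sqrt(math.pi),
              -math.pi, math.pi) for n in range(1, N + 1)]
    b = [quad(lambda x, n=n: f(x) * math.sin(n * x) / math.sqrt(math.pi),
              -math.pi, math.pi) for n in range(1, N + 1)]
    return a0, a, b

# For f(x) = x the cosine coefficients vanish (odd function) and
# b_n should match 2 (-1)^{n+1} sqrt(pi) / n.
a0, a, b = fourier_coeffs(lambda x: x, 4)
print(a0, a)   # all close to zero
print(b)
```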
Historically, Fourier series did not arise from the geometrical considerations we have developed. Mathematical physics—in particular the vibrations of strings and the flow of heat in a bar—takes the credit for these ideas. Only in recent years has the geometrical viewpoint been investigated. Later on we shall discuss some of the fascinating problems in mathematical physics to which Fourier series can be applied.

Beware. The equality which appears in (3-22) is equality in the $L^2[-\pi,\pi]$ norm, viz.
$$\|f - P_N f\| = \Bigl(\int_{-\pi}^{\pi} [f(x) - (P_N f)(x)]^2\, dx\Bigr)^{1/2} \to 0.$$
This is quite different from the convergence of infinite series to which you're accustomed, which is convergence in the uniform norm
$$\|f - P_N f\|_\infty = \max_{-\pi \le x \le \pi} |f(x) - (P_N f)(x)|.$$
In Section 1 (p. 176) you saw one instance where a sequence of functions converged in some norm (the $L^1$ norm there) but did not converge in the uniform norm. Such is also the case here. In fact, in contrast with the situation in the $L^2$ norm, there do exist continuous functions $f$ whose Fourier series (3-22) does not converge to $f$ in the uniform norm. However, if the function $f$ has one derivative, then its Fourier series does converge to $f$ in the uniform norm.

In addition, there are some discontinuous functions whose Fourier series converge. These ideas will become clearer later on.
You should be warned that our definition (3-20), (3-22) of a Fourier series is not the standard one. Most books do not work with the orthonormal set $e_0 = \frac{1}{\sqrt{2\pi}}$, $e_n = \frac{\cos nx}{\sqrt{\pi}}$, $\tilde e_n = \frac{\sin nx}{\sqrt{\pi}}$, but rather use just an orthogonal set which is not normalized: $\theta_0 = \frac{1}{2}$, $\theta_n = \cos nx$, $\tilde\theta_n = \sin nx$. For these people,
$$f(x) = \frac{A_0}{2} + \sum_{n=1}^{\infty} \bigl( A_n \cos nx + B_n \sin nx \bigr),$$
where
$$A_n = \int_{-\pi}^{\pi} f(x)\,\frac{\cos nx}{\pi}\, dx, \qquad B_n = \int_{-\pi}^{\pi} f(x)\,\frac{\sin nx}{\pi}\, dx,$$
$n = 0, 1, 2, \ldots$. As you can see, these differ from our formulas only by factors of $\sqrt{\pi}$. Needless to say, the resulting Fourier series for a given function $f$ does not depend on which intermediate formulas you use. We prefer the less standard ones because they are more intimately tied to geometry (so there is less to remember).
Before discussing the difficult issues of convergence in detail, we will find the Fourier series associated with some specific functions.

Examples.

(1) Find the Fourier series associated with the function $f(x) = x$, $-\pi \le x \le \pi$. We actually found this in the previous section. A computation (involving integration by parts) shows that
$$a_0 = \langle f, e_0\rangle = \int_{-\pi}^{\pi} x \cdot \frac{1}{\sqrt{2\pi}}\, dx = 0,$$
$$a_n = \langle f, e_n\rangle = \int_{-\pi}^{\pi} x\,\frac{\cos nx}{\sqrt{\pi}}\, dx = 0, \qquad n = 1, 2, \ldots,$$
$$b_n = \langle f, \tilde e_n\rangle = \int_{-\pi}^{\pi} x\,\frac{\sin nx}{\sqrt{\pi}}\, dx = \frac{2(-1)^{n+1}\sqrt{\pi}}{n}, \qquad n = 1, 2, \ldots.$$
Thus, upon substituting into (3-22) we find that
$$x = \sum_{n=1}^{\infty} \frac{2(-1)^{n+1}\sqrt{\pi}}{n}\,\frac{\sin nx}{\sqrt{\pi}} = 2\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\sin nx,$$
or
$$x = 2\Bigl[\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \frac{\sin 4x}{4} + \cdots\Bigr].$$
Again we remind you that the equality here is in the sense of convergence in $L^2$. For this particular function, there is also equality in the usual sense of convergence for infinite series for all $x \in (-\pi,\pi)$. Direct substitution reveals that at $x = \pm\pi$ the series converges to $0$, not to $f(\pm\pi) = \pm\pi$. These remarks are based upon convergence theorems we have yet to prove. At $x = \pi/2$, this yields
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
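The behavior just described can be observed numerically (an illustrative aside, not part of the notes):

```python
import math

def partial_sum(x, N):
    """P_N x = 2 * sum_{n<=N} (-1)^{n+1} sin(n x) / n."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))

# At an interior point the partial sums approach the function value x:
for N in (10, 100, 1000):
    print(N, partial_sum(1.0, N))   # approaches 1.0

# At x = pi/2 the identity pi/4 = 1 - 1/3 + 1/5 - ... appears:
print(partial_sum(math.pi / 2, 10001) / 2, math.pi / 4)

# At the endpoint x = pi every term vanishes, so the sum is 0, not pi:
print(partial_sum(math.pi, 100))
```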
(2) Since the formulas (3-21) make sense even if the function $f(x)$ has a finite number of discontinuities, we are tempted to find the Fourier series for discontinuous functions (in contrast, recall that the coefficients of an infinite power series are only defined if the function has an infinite number of derivatives). We shall find the Fourier series associated with the discontinuous function
$$f(x) = \begin{cases} 0, & -\pi \le x \le 0\\ \pi, & 0 < x < \pi. \end{cases}$$
The computations are particularly simple:
$$a_0 = \langle f, e_0\rangle = \int_{-\pi}^{0} 0 \cdot \frac{1}{\sqrt{2\pi}}\, dx + \int_{0}^{\pi} \pi \cdot \frac{1}{\sqrt{2\pi}}\, dx = \frac{\pi^2}{\sqrt{2\pi}}, \tag{3-23}$$
$$a_n = \langle f, e_n\rangle = \int_{-\pi}^{0} 0 \cdot \frac{\cos nx}{\sqrt{\pi}}\, dx + \int_{0}^{\pi} \pi \cdot \frac{\cos nx}{\sqrt{\pi}}\, dx = 0, \qquad n > 0, \tag{3-24}$$
$$b_n = \langle f, \tilde e_n\rangle = \int_{-\pi}^{0} 0 \cdot \frac{\sin nx}{\sqrt{\pi}}\, dx + \int_{0}^{\pi} \pi \cdot \frac{\sin nx}{\sqrt{\pi}}\, dx \tag{3-25}$$
$$= \frac{\sqrt{\pi}}{n}\,(1 - \cos n\pi) = \begin{cases} \dfrac{2\sqrt{\pi}}{n}, & n \text{ odd}\\[4pt] 0, & n \text{ even.} \end{cases} \tag{3-26}$$
Therefore the Fourier series associated with this function is
$$f(x) = \frac{\pi^2}{\sqrt{2\pi}}\cdot\frac{1}{\sqrt{2\pi}} + 2\sqrt{\pi}\Bigl(\frac{\sin x}{\sqrt{\pi}} + \frac{\sin 3x}{3\sqrt{\pi}} + \frac{\sin 5x}{5\sqrt{\pi}} + \cdots\Bigr),$$
or
$$f(x) = \frac{\pi}{2} + 2\Bigl(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots\Bigr).$$
As usual, the equality is meant in the sense of convergence in the $L^2$ norm. The series also converges to the function $f$ in the uniform norm in the whole interval except for a neighborhood of $x = 0$. At $0$ it hasn't got a chance because of the discontinuity of $f$ there. A glance at the series reveals that at $x = 0$ the right side is $\pi/2$—the arithmetic mean between the values of $f$ just to the left and right of $0$. This is the usual case at a discontinuity: a Fourier series converges to the average of the function values to the right and left of the point where $f$ is discontinuous. We still offer no proof for these statements.
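Again the convergence can be watched numerically (an aside, not part of the notes; note in particular the exact value $\pi/2$ at the discontinuity):

```python
import math

def series(x, N):
    """pi/2 + 2 (sin x + sin 3x / 3 + ...), using the first N odd terms."""
    return math.pi / 2 + 2 * sum(math.sin((2 * k - 1) * x) / (2 * k - 1)
                                 for k in range(1, N + 1))

print(series(0.0, 1000))             # exactly pi/2: the mean of 0 and pi
print(series(1.0, 1000), math.pi)    # approaches f(1) = pi
print(series(-1.0, 1000), 0.0)       # approaches f(-1) = 0
```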
Observe that the Fourier series (3-22) for any function $f(x)$ depends only upon the values of $f$ in the interval $-\pi \le x \le \pi$. However, the series itself is periodic with period $2\pi$. If the function $f(x)$, which we considered only for $x \in [-\pi,\pi]$, is defined for all other $x$ by the formula $f(x + 2\pi) = f(x)$ (making $f$ periodic too), then both sides of the Fourier series (3-22) are periodic with period $2\pi$. Therefore whatever they do in the interval $[-\pi,\pi]$ is repeated every $2\pi$.

For example, the function $f(x) = x$, $x \in [-\pi,\pi]$, when continued outside the interval $[-\pi,\pi]$ as a function periodic with period $2\pi$ becomes

a figure goes here

Since the Fourier series for this particular function converges for all $x \in (-\pi,\pi)$, it also converges to the periodically continued function for all $x \in ((2k-1)\pi, (2k+1)\pi)$, $k = 0, \pm1, \pm2, \ldots$. This also makes it clear why the Fourier series for $f(x) = x$ converges to zero at $x = \pm\pi$, for the series is just converging to the arithmetic mean of its neighboring values at the discontinuity.

It is pleasant to look at a picture. Let us see how the first four terms of its Fourier series approximate the function $x$:
$$x = 2\Bigl(\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \frac{\sin 4x}{4} + \cdots\Bigr)$$

a figure goes here

Notice that as more terms are used, the projection
$$P_N x = 2\Bigl(\sin x - \frac{\sin 2x}{2} + \cdots + (-1)^{N+1}\,\frac{\sin Nx}{N}\Bigr)$$
more and more closely approximates $x$. This reflects the convergence of the Fourier series, $P_N f \to f$.
One popular interpretation of a Fourier series is as a sum of "waves" which approximate a given function. Thus the function $x$ is the sum of $2$ times the wave $\sin x$, plus $(-1)$ times the wave $\sin 2x$, and so on. In other words, the Fourier series for the function $f(x) = x$ represents that function as a superposition of sine waves. The term $2\sin x$ is spoken of as the first harmonic, the term $-\sin 2x$ as the second harmonic, the term $\frac{2}{3}\sin 3x$ as the third harmonic, etc.

Although it is difficult to believe, the ear hears by taking the sound wave $f(x)$ which impinges on the ear drum and splitting it up into its Fourier components (3-22). It then analyzes each component $a_n e_n + b_n \tilde e_n$—considering only the coefficients $a_n$ and $b_n$. These Fourier coefficients measure the intensity of the $n$th harmonic. Particular sounds are then heard in terms of the intensity of their various harmonics. We recognize familiar sounds by recognizing that the sound waves have similar Fourier coefficients. Amazing.
It is time to consider the convergence of Fourier series. The question is: does the partial Fourier series
$$P_N f = a_0 e_0 + \sum_{n=1}^{N} \bigl(a_n e_n + b_n \tilde e_n\bigr)$$
converge to the function $f$ as $N \to \infty$? Since there are several norms, in particular the $L^2$ norm $\|\cdot\|$ and the uniform norm $\|\cdot\|_\infty$, we must investigate convergence in each norm. Even though our proofs are reasonably slick, they are neither short nor particularly simple. A great deal of analytical technique will be needed. The proofs to be presented have been chosen because each of the devices invoked is important in its own right.
Theorem 3.22 (Weierstrass Approximation Theorem). If f (x) is continuous in the interval [−π, π ] and f (−π ) = f (π ) , then given any > 0 there is a trigonometric polynomial
N TN (x) = α0 + αn cos nx + βn sin nx
n=1
N (3-27)
ˆ˜
αn en + βn en ,
ˆ = α0 e0 +
ˆ
n=1 √
√
ˆ√
ˆ
ˆ
(where α0 = α0 2π, αn = αn π, βn = βn π ), such that
f − TN ∞ = max |f (x) − TN (x)| < .
− π ≤ x≤ π Note that the numbers αn and βn are not necessarily the Fourier coeﬃcients of f . The
proof, which is placed as an appendix at the end of this section, will indicate how they can
be found.
The following theorem states that convergence in the uniform norm implies convergence in the $L^2$ norm.

Theorem 3.23. If $\theta(x)$ is any bounded integrable function on $[a,b]$ (with $b > a$), then
$$\|\theta\| \le \sqrt{b-a}\;\|\theta\|_\infty.$$

Proof: Since $\|\theta\|_\infty = \max_{x\in[a,b]} |\theta(x)|$, we find immediately that
$$\int_a^b \theta(x)^2\, dx \le \int_a^b \|\theta\|_\infty^2\, dx = \|\theta\|_\infty^2 \int_a^b dx = (b-a)\,\|\theta\|_\infty^2,$$
from which the conclusion is obvious. On geometrical grounds the theorem is even easier, since $\|\theta\|_\infty$ is the greatest height of the curve $\theta(x)$.
Although convergence in the $L^2$ norm does not imply convergence in the uniform norm (the example in Section 1 comparing $L^1$ convergence and uniform convergence also works for $L^2$), a useful weaker statement is true.

Theorem 3.24 (cf. Ex. 15, Section 3). If $\theta \in C^1[a,b]$ and $\theta(x_0) = 0$, where $x_0 \in [a,b]$, then for every $x \in [a,b]$
$$|\theta(x)| \le \sqrt{b-a}\,\Bigl(\int_a^b \theta'(t)^2\, dt\Bigr)^{1/2} = \sqrt{b-a}\;\|\theta'\|.$$
Since the right side is independent of $x$, this implies that
$$\|\theta\|_\infty = \max_{x\in[a,b]} |\theta(x)| \le \sqrt{b-a}\;\|\theta'\| = \sqrt{b-a}\;\|D\theta\|.$$

Proof: By the fundamental theorem of calculus,
$$\theta(x) = \theta(x) - \theta(x_0) = \int_{x_0}^{x} \theta'(t)\, dt.$$
Thus the Cauchy–Schwarz inequality yields
$$|\theta(x)|^2 = \Bigl(\int_{x_0}^{x} 1\cdot\theta'(t)\, dt\Bigr)^2 \le \Bigl(\int_{x_0}^{x} 1^2\, dt\Bigr)\Bigl(\int_{x_0}^{x} \theta'(t)^2\, dt\Bigr) = (x - x_0)\int_{x_0}^{x} \theta'(t)^2\, dt \tag{3-28}$$
$$\le (b-a)\int_a^b \theta'(t)^2\, dt.$$
Therefore $|\theta(x)|^2 \le (b-a)\,\|\theta'\|^2$.

With these preliminaries behind us we turn to the convergence of Fourier series. First up is convergence in the $L^2$ norm.
Theorem 3.25. Assume $f$ is continuous in the interval $[-\pi,\pi]$ and $f(-\pi) = f(\pi)$. Denote the sum of the first $N$ terms of its Fourier series by $P_N f$. Then
$$\lim_{N\to\infty} \|f - P_N f\| = \lim_{N\to\infty} \Bigl(\int_{-\pi}^{\pi} [f(x) - (P_N f)(x)]^2\, dx\Bigr)^{1/2} = 0.$$

Proof: Given any $\epsilon > 0$, let $T_N(x)$ be the trigonometric polynomial given by the Weierstrass Approximation Theorem. The trick is to apply Theorem 3.20. Using the $N$ of $T_N$, we know that
$$P_N f = a_0 e_0 + \sum_{n=1}^{N} \bigl(a_n e_n + b_n \tilde e_n\bigr) \quad\text{and}\quad T_N = \hat\alpha_0 e_0 + \sum_{n=1}^{N} \bigl(\hat\alpha_n e_n + \hat\beta_n \tilde e_n\bigr).$$
Let $A$ be the subspace of $H = L^2[-\pi,\pi]$ spanned by $e_0, e_1, \tilde e_1, \ldots, e_N, \tilde e_N$. Then both $P_N f$ and $T_N$ are in $A$. Thus by Theorem 3.20 of the last section (where slightly different notation was used),
$$\|f - P_N f\| \le \|f - T_N\|,$$
and by Theorem 3.23,
$$\|f - T_N\| \le \sqrt{b-a}\;\|f - T_N\|_\infty < \sqrt{b-a}\;\epsilon.$$
Thus
$$\lim_{N\to\infty} \|f - P_N f\| = 0,$$
proving the theorem.
Corollary 3.26 (Parseval's Theorem). If $f(x)$ is continuous in the interval $[-\pi,\pi]$ and $f(-\pi) = f(\pi)$, then
$$\|f\|^2 = \lim_{N\to\infty} \|P_N f\|^2,$$
that is,
$$\int_{-\pi}^{\pi} f(x)^2\, dx = a_0^2 + \sum_{n=1}^{\infty} \bigl(a_n^2 + b_n^2\bigr),$$
where the Fourier coefficients $a_j$ and $b_j$ are determined by equations (3-21) (or their unnormalized variants).

Proof: The Corollary to Theorem 16 states that
$$\|f\|^2 = \|P_N f\|^2 + \|f - P_N f\|^2.$$
If we now let $N \to \infty$, the second term on the right vanishes by the theorem just proved.

Remark: The theorem and corollary state that the orthonormal set of functions $e_0 = \frac{1}{\sqrt{2\pi}}$, $e_n(x) = \frac{\cos nx}{\sqrt{\pi}}$, and $\tilde e_n(x) = \frac{\sin nx}{\sqrt{\pi}}$ is a complete orthonormal set for the scalar product space $L^2[-\pi,\pi]$. The formula contained in the corollary is a generalization of the Pythagorean Theorem to $L^2[-\pi,\pi]$.
The proof of convergence in the uniform norm when the function has one continuous derivative is only slightly more difficult. We shall need a preliminary

Lemma 3.27. Assume $f \in C^1[-\pi,\pi]$. Extend it as a periodic function with period $2\pi$ by $f(x + 2\pi) = f(x)$. Let $P_N f$ be the sum of the first $N$ terms of its Fourier series. Then the sum of the first $N$ terms in the Fourier series for $Df = \frac{df}{dx}$ is $D(P_N f)$, that is,
$$P_N(Df) = D(P_N f).$$
This is not necessarily true for other bases in $L^2[-\pi,\pi]$.

Proof: We know that
$$(P_N f)(x) = a_0 \frac{1}{\sqrt{2\pi}} + \sum_{n=1}^{N} \Bigl( a_n \frac{\cos nx}{\sqrt{\pi}} + b_n \frac{\sin nx}{\sqrt{\pi}} \Bigr).$$
Since we can differentiate a finite sum term by term, we find that
$$D(P_N f)(x) = \sum_{n=1}^{N} \Bigl( -n a_n \frac{\sin nx}{\sqrt{\pi}} + n b_n \frac{\cos nx}{\sqrt{\pi}} \Bigr),$$
where the $a_n$ and $b_n$ are found by using the formulas (3-21). If
$$P_N(Df) = A_0 \frac{1}{\sqrt{2\pi}} + \sum_{n=1}^{N} \Bigl( A_n \frac{\cos nx}{\sqrt{\pi}} + B_n \frac{\sin nx}{\sqrt{\pi}} \Bigr),$$
where the $A_n$ and $B_n$ are also found by using (3-21), we must show that
$$A_0 = 0, \qquad A_n = n b_n, \qquad B_n = -n a_n.$$
But
$$A_0 = \int_{-\pi}^{\pi} (Df(x))\,\frac{1}{\sqrt{2\pi}}\, dx = \frac{1}{\sqrt{2\pi}}\,[f(\pi) - f(-\pi)] = 0,$$
since $f$ is periodic. Integrating by parts, we further find
$$A_n = \int_{-\pi}^{\pi} (Df(x))\,\frac{\cos nx}{\sqrt{\pi}}\, dx = n\int_{-\pi}^{\pi} f(x)\,\frac{\sin nx}{\sqrt{\pi}}\, dx = n b_n$$
and
$$B_n = \int_{-\pi}^{\pi} (Df(x))\,\frac{\sin nx}{\sqrt{\pi}}\, dx = -n\int_{-\pi}^{\pi} f(x)\,\frac{\cos nx}{\sqrt{\pi}}\, dx = -n a_n.$$
Theorem 3.28 . If f ∈ C 1 [−π, π ] and if both f and f are periodic with period 2π ,
then the Fourier series PN f converges to f in the uniform norm
lim N →∞ f − PN f ∞ = 0. Proof: The key observation is that f is a continuous function, so that Theorem 22 can
be applied to its Fourier series. This shows that
lim N →∞ Df − PN (Df ) = 0. By the above lemma,
D(f − Pn f ) = Df − D(PN f ) = Df − PN (Df ).
Thus
lim N →∞ D(f − PN f ) = 0. (3-29) 140 CHAPTER 3. LINEAR SPACES: NORMS AND INNER PRODUCTS We would like to apply Theorem 21 to the function θN = f − PN f . In order to do so, we
must only verify that θN vanishes somewhere in [−π, π ] . But the area under θN = f − PN f
is
π
√
θN (x) dx = 2π θN , e0 = 0
−π since f − PN f is orthogonal to the space spanned by e0 , e1 , e1 , . . . , eN , eN (Theorem 16c).
˜
˜
Because θN (x) is a continuous function (the diﬀerence of the C 1 ) function f and the
inﬁnitely diﬀerentiable trigonometric polynomial PN f ), the area under it can be zero only
if θN vanishes somewhere. Thus Theorem 21 is applicable and yields the inequality
√
f − PN f ∞ ≤ b − a D(f − PN f ) .
We now pass to the limit N → ∞ and use equation (4) to complete the proof of the
theorem:
√
lim f − PN f ∞ ≤ lim b − a D(f − PN f ) = 0.
N →∞ N →∞ C 1 [−π, π ] Remarks. The hypothesis that f ∈
and is periodic with period 2π has been
proved a suﬃcient condition for the Fourier series to converge to the function in the uniform
norm. Much weaker hypotheses also suﬃce to prove the same result—but mere continuity
is not enough. Convergence of Fourier series or generalizations thereof is a vast and deep
subject, one still the object of intense study.
On the basis of the theorems we have proved, many other problems are reasonably
accessible—like the convergence of the Fourier series for a function which is nice except for
a ﬁnite number of jump discontinuities. But there is not time for this pleasant excursion.
a figure goes here 3.5 Appendix. The Weierstrass Approximation Theorem .
The proof—which is difficult—will be given as a series of lemmas.

Lemma 3.29. If $f(x)$ is continuous and periodic with period $2\pi$, then for any $a \in R$ the following equality holds:
$$\int_a^{a+2\pi} f(x)\, dx = \int_0^{2\pi} f(x)\, dx.$$

Proof: This is clear from a graph of $f$, since the area under one period of $f$ does not depend upon where you begin measuring. We also offer a computational proof. Write
$$\int_a^{a+2\pi} f(x)\, dx = \int_a^0 f(x)\, dx + \int_0^{2\pi} f(x)\, dx + \int_{2\pi}^{a+2\pi} f(x)\, dx.$$
Let $x = t + 2\pi$ in the last integral and use the fact that $f(t + 2\pi) = f(t)$. The last integral is then
$$\int_0^a f(t)\, dt = -\int_a^0 f(t)\, dt,$$
which cancels the unwanted term in the last equation and proves the lemma.
Lemma 3.30. ∫_0^{π/2} cos^{2n} t dt = 1/(2c_n), where c_n = (1/π) · (2·4·6···(2n))/(1·3·5···(2n−1)).

Proof: A computation. Integrate by parts to show that

I_{2n} = ∫_0^{π/2} cos^{2n} t dt = (2n−1)(I_{2n−2} − I_{2n}).

Thus I_{2n} = ((2n−1)/(2n)) I_{2n−2}. Now induction can be used to do the rest, since by observation I_0 = π/2.

Lemma 3.31. Assume f(x) is continuous and periodic with period 2π. Let

T_N(x) = (c_N/2) ∫_{−π}^{π} f(t) cos^{2N}((t−x)/2) dt.   (3-30)

Then given any ε > 0, there is an N such that

‖f − T_N‖_∞ = max_{−π ≤ x ≤ π} |f(x) − T_N(x)| < ε.

Proof: How did we guess the formula (3-30)? We observed that cos^{2N} x is one at x = 0, and strictly less than one for all other x ∈ [−π, π]. Thus, for large N, cos^{2N} x is one at x = 0 and decreases sharply thereafter, so cos^{2N}((t−x)/2) has the same property at t − x = 0, that is, where t = x. Then essentially the only values of f(t) which will count are those about t = x, so what comes out will be f(x). Let us proceed with the details.

Take s = (t−x)/2. Then

T_N(x) = c_N ∫_{−π/2}^{π/2} f(x + 2s) cos^{2N} s ds.

Split the integral into two pieces, from −π/2 to 0 and from 0 to π/2, and then replace s by −s in the first one. This gives

T_N(x) = c_N ∫_0^{π/2} [f(x + 2s) + f(x − 2s)] cos^{2N} s ds.

From Lemma 3.30 we know that

f(x) = c_N ∫_0^{π/2} 2f(x) cos^{2N} s ds,

since f(x) is a constant in the integration with respect to s. Therefore

T_N(x) − f(x) = c_N ∫_0^{π/2} [f(x + 2s) − 2f(x) + f(x − 2s)] cos^{2N} s ds.

Now given any ε > 0, from the (uniform) continuity of f we can pick a δ > 0 independent of x such that

|f(x_1) − f(x_2)| < ε/2 when |x_1 − x_2| < 2δ.

This will be the ε of our conclusion. Break the integral into two parts, one from 0 to δ and the other from δ to π/2, where δ is the δ we just found. Then in the [0, δ] interval,

|f(x + 2s) − 2f(x) + f(x − 2s)| ≤ |f(x + 2s) − f(x)| + |f(x) − f(x − 2s)| < ε,

while in the [δ, π/2] interval,

|f(x + 2s) − 2f(x) + f(x − 2s)| ≤ |f(x + 2s)| + 2|f(x)| + |f(x − 2s)| ≤ 4M,

where M = max_{x∈[−π,π]} |f(x)|. Hence

|f(x) − T_N(x)| < c_N [ ε ∫_0^δ cos^{2N} s ds + 4M ∫_δ^{π/2} cos^{2N} s ds ].

Now we observe that

∫_0^δ cos^{2N} s ds < ∫_0^{π/2} cos^{2N} s ds = 1/(2c_N),

and that, since cos s decreases as s goes to π/2,

∫_δ^{π/2} cos^{2N} s ds < ∫_δ^{π/2} cos^{2N} δ ds < (π/2) γ^N,

where γ = cos² δ < 1. Thus

|f(x) − T_N(x)| < ε/2 + 2πM c_N γ^N.

Now πc_N = 2 · (4/3) · (6/5) ··· (2N/(2N−1)) < 2N, so that 2πM c_N γ^N < 4MN γ^N. Because γ < 1, we know that lim_{N→∞} N γ^N = 0. Thus, pick N so large that N γ^N < ε/(8M), where this ε is the same as before. Consequently, for this N,

|f(x) − T_N(x)| < ε.

Since N is independent of x,

‖f − T_N‖_∞ = max_{x∈[−π,π]} |f(x) − T_N(x)| < ε

too. A difficult lemma is thereby proved.
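Lemma 3.31 can be tested numerically. In the sketch below (my own illustration, with an arbitrarily chosen smooth periodic f), c_N comes from Lemma 3.30's product formula and T_N(x) is evaluated with the trapezoid rule; the uniform error should shrink as N grows:

```python
import math

def c(N):
    # c_N = (1/pi) * (2*4*...*(2N)) / (1*3*...*(2N-1)), from Lemma 3.30
    prod = 1.0
    for k in range(1, N + 1):
        prod *= 2 * k / (2 * k - 1)
    return prod / math.pi

def T(f, N, x, m=2000):
    # T_N(x) = (c_N/2) * integral_{-pi}^{pi} f(t) cos^{2N}((t-x)/2) dt,
    # by the trapezoid rule; the integrand is 2*pi-periodic in t.
    h = 2 * math.pi / m
    s = sum(f(-math.pi + j * h) * math.cos((-math.pi + j * h - x) / 2) ** (2 * N)
            for j in range(m))
    return 0.5 * c(N) * s * h

f = lambda x: math.exp(math.cos(x))   # a smooth 2*pi-periodic test function

xs = [-math.pi + 2 * math.pi * j / 40 for j in range(41)]
err = {N: max(abs(f(x) - T(f, N, x)) for x in xs) for N in (4, 64)}

assert err[64] < err[4]   # the uniform error shrinks as N grows
```

The convergence is slow (the proof's bound decays like N·γ^N only after δ is fixed), which is why the kernel needs large N before the error is small.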
The whole proof is completed in the following simple

Lemma 3.32. The function T_N(x) defined by (3-30),

T_N(x) = (c_N/2) ∫_{−π}^{π} f(t) cos^{2N}((t−x)/2) dt,

is a trigonometric polynomial.

Proof: This can be horribly messy unless one is shrewd. We shall use the formula e^{iθ} = cos θ + i sin θ and the binomial theorem (top p. 108). First notice that

cos^{2N} θ = ((e^{iθ} + e^{−iθ})/2)^{2N} = (1/2^{2N}) Σ_{k=0}^{2N} ((2N)!/((2N−k)! k!)) e^{ikθ} e^{−i(2N−k)θ}.

Let d_k = (2N)!/(2^{2N} (2N−k)! k!). Then

cos^{2N} θ = Σ_{k=0}^{2N} d_k e^{−i(2N−2k)θ} = Σ_{k=0}^{2N} d_k [cos(2N−2k)θ − i sin(2N−2k)θ].   (3-31)

Since cos^{2N} θ is real, the sum of the imaginary terms on the right must be zero. Thus, replacing 2θ by t − x, we find that

cos^{2N}((t−x)/2) = Σ_{k=0}^{2N} d_k cos((N−k)(t−x)) = Σ_{k=0}^{2N} d_k [cos(N−k)t cos(N−k)x + sin(N−k)t sin(N−k)x].   (3-32)

Split the sum into two parts, one from 0 to N, the other from N+1 to 2N, and let n = N−k in the first, n = k−N in the second. This gives

cos^{2N}((t−x)/2) = Σ_{n=0}^{N} d_{N−n} [cos nt cos nx + sin nt sin nx] + Σ_{n=1}^{N} d_{N+n} [cos nt cos nx + sin nt sin nx],   (3-33)

so

cos^{2N}((t−x)/2) = d_N + Σ_{n=1}^{N} (d_{N+n} + d_{N−n}) [cos nt cos nx + sin nt sin nx],

which is much simpler than one might have anticipated. Substituting this into (3-30) and realizing that the t integrations just yield constants, we find that T_N(x) is indeed a trigonometric polynomial. Coupled with Lemma 3.31, the proof of the Weierstrass Approximation Theorem is complete.

Exercises
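As a warm-up, the finite cosine expansion derived in Lemma 3.32 can be verified numerically (a sketch; `cos2N_via_expansion` is a hypothetical helper name of mine):

```python
import math

def cos2N_via_expansion(N, theta):
    # Lemma 3.32's identity: with d_k = C(2N, k) / 2^{2N},
    # cos^{2N}(theta) = d_N + sum_{n=1}^{N} (d_{N+n} + d_{N-n}) * cos(2n*theta).
    d = [math.comb(2 * N, k) / 4 ** N for k in range(2 * N + 1)]
    return d[N] + sum((d[N + n] + d[N - n]) * math.cos(2 * n * theta)
                      for n in range(1, N + 1))

for N in (1, 3, 7):
    for theta in (0.0, 0.5, 1.3, 2.9):
        assert abs(cos2N_via_expansion(N, theta) - math.cos(theta) ** (2 * N)) < 1e-12
```

The agreement to machine precision reflects that (3-31) is an exact finite identity, not an approximation.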
(1) Find the Fourier series with period 2π for the given functions.

(a) f(x) = 0 for −π ≤ x ≤ 0; f(x) = 2 for 0 < x < π
(b) f(x) = −2 for −π ≤ x < 0; f(x) = 2 for 0 ≤ x < π
(c) f(x) = sin 17x + cos 2x, −π ≤ x < π
(d) f(x) = sin² x, −π ≤ x ≤ π
(e) f(x) = x², −π ≤ x ≤ π
(f) f(x) = x + π for −π ≤ x ≤ 0; f(x) = −x + π for 0 ≤ x ≤ π

(Also, compute ‖f‖² and a_0² + Σ_{n=1}^{∞} (a_n² + b_n²) for (a)-(f).)
(2) (a) Apply Parseval's Theorem (Corollary to Theorem 22) to the function f(x) = x and its Fourier series to deduce that

π²/6 = 1 + 1/2² + 1/3² + 1/4² + ···

(cf. the example before Theorem 17 of Section 3).

(b) Do the same for the function f(x) = x² (Ex. 1e above) to evaluate

1 + 1/2⁴ + 1/3⁴ + 1/4⁴ + ··· = ?

(3) A function f(x) is even if f(−x) = f(x), odd if f(−x) = −f(x). Thus 2 + x² is an even function, x³ − sin x is an odd function, while 1 + x is neither even nor odd. Let a_n and b_n be the Fourier coefficients of the piecewise continuous function f(x). Prove the following statements.

(a) If f is an odd function,

a_n = 0,   b_n = 2 ∫_0^π f(x) (sin nx)/√π dx.

(b) If f is an even function,

a_n = 2 ∫_0^π f(x) (cos nx)/√π dx,   b_n = 0.

(c) A function f defined in [0, π] may be extended to [−π, π] as either an even or an odd function by the formulas

even extension: f(−x) = f(x), x ≥ 0,
odd extension: f(−x) = −f(x), x ≥ 0.

The even extension of f(x) = x, x ∈ [0, π] is f(x) = |x|, x ∈ [−π, π], while its odd extension is f(x) = x, x ∈ [−π, π]. The odd extension of f(x) = x², x ∈ [0, π] is f(x) = x² for x ∈ [0, π], f(x) = −x² for x ∈ [−π, 0]. Extend the function f(x) = 1, x ∈ [0, π] to the interval [−π, π] as an odd function and sketch its graph. Find its Fourier series using part (a).
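Exercise 2(a) can be previewed numerically. With the normalized coefficients used in these notes, f(x) = x has b_n = 2√π(−1)^{n+1}/n, so ‖f‖² = 2π³/3 must equal Σ b_n² = 4π Σ 1/n². The sketch below checks this (the truncation point 200000 is an arbitrary choice of mine):

```python
import math

# Parseval for f(x) = x on [-pi, pi]: ||f||^2 = integral of x^2 = (2/3)*pi^3,
# while the squared coefficients are b_n^2 = 4*pi/n^2, so sum(1/n^2) = pi^2/6.
norm_sq = 2 * math.pi ** 3 / 3
coeff_sq = sum(4 * math.pi / n ** 2 for n in range(1, 200001))

assert abs(norm_sq - coeff_sq) / norm_sq < 1e-4
assert abs(sum(1 / n ** 2 for n in range(1, 200001)) - math.pi ** 2 / 6) < 1e-4
```

The leftover discrepancy is just the tail of the series, of size roughly 1/200000.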
(4) (a) Let f(x) be a given function. Find a solution of the O.D.E. u″ + λ²u = f, where λ is a real number and u(x) satisfies the boundary condition u(−π) = u(π) = 0, by the following procedure: Expand f in its Fourier series and assume u has a Fourier series whose coefficients are to be found. Find a formula for the Fourier coefficients of u in terms of those for f in the case where λ is not an integer.

(b) If λ = n is an integer, show that there is a solution if and only if

0 = ⟨f, e_n⟩ = ∫_{−π}^{π} f(x) (sin nx)/√π dx.

(5) (a) State Parseval's Theorem for the special cases i) f is a continuous even function in [−π, π], and ii) f is a continuous odd function in [−π, π].

(b) If f is a continuous even function in [−π, π] and

∫_0^π f(x) cos nx dx = 0,   n = 0, 1, 2, 3, ...,

show that f = 0 in [−π, π].

(c) State and prove a theorem similar to (b) in the case of a continuous odd function.
(6) In this exercise you show how a function f ∈ L²[−A, A] can be expanded in a modified Fourier series (so far we know only L²[−π, π]). Let y = πx/A (this maps the interval [−A, A] onto [−π, π]) and define g(y) by

f(x) = f(Ay/π) = g(y) = g(πx/A).

Since g(y) ∈ L²[−π, π], it can be expanded in a Fourier series

g(y) = a_0 (1/√(2π)) + Σ_{n=1}^{∞} [ a_n (cos ny)/√π + b_n (sin ny)/√π ],

where the a_n and b_n are given by the usual formulas (2)′.

(a) Prove that f(x) ∈ L²[−A, A] has the modified Fourier series

f(x) = a_0 (1/√(2A)) + Σ_{n=1}^{∞} [ (a_n/√A) cos(nπx/A) + (b_n/√A) sin(nπx/A) ],

where

a_0 = (1/√(2A)) ∫_{−A}^{A} f(x) dx,

a_n = (1/√A) ∫_{−A}^{A} f(x) cos(nπx/A) dx,

b_n = (1/√A) ∫_{−A}^{A} f(x) sin(nπx/A) dx.

(b) Find the modified Fourier series for f(x) = |x| in the interval [−1, 1].
The following exercises all concern the Weierstrass Approximation Theorem.

(7) Prove the following version of the Weierstrass Approximation Theorem. Let f ∈ C[a, b]. Then given any ε > 0, there is a polynomial Q(x) such that

‖f − Q‖_∞ = max_{x∈[a,b]} |f(x) − Q(x)| < ε.

(Hint: Let y = −π + 2((x−a)/(b−a))π. This maps [a, b] into [−π, π]. Define g(y), y ∈ [−π, π], by

f(x) = f(a + ((b−a)/(2π))(y + π)) = g(y) = g(−π + 2((x−a)/(b−a))π).

Use the version of the theorem proved above to approximate g(y), y ∈ [−π, π], by a trigonometric polynomial T_N(y) to within ε/2. Then approximate sin ny and cos ny to within c (you pick c) by a finite piece of their Taylor series, which are polynomials. Put both parts together to obtain the complete proof for g(y). The transition back to f(x) is trivial.)

(8) (Riemann-Lebesgue Lemma). Let f ∈ C[a, b]. Prove that

lim_{λ→∞} ∫_a^b f(x) sin λx dx = 0.

[Hint: Integrate by parts to prove it first for all f ∈ C¹[a, b]. For arbitrary f, approximate f by a polynomial (Ex. 7 above) to within ε/2 and realize that every polynomial is in C¹[a, b].]

(9) If f ∈ C[0, 1], prove that

lim_{n→∞} n ∫_0^1 f(x) xⁿ dx = f(1).

[Hint: Use the hint in Ex. 8.]

(10) If f ∈ C[a, b], and if

∫_a^b f(x) xⁿ dx = 0,   n = 0, 1, 2, 3, ...,

show that f = 0. [Hint: This implies that ∫_a^b f(x)Q(x) dx = 0, where Q is any polynomial. f can be approximated by some polynomial Q̃. Now show that ∫_a^b f²(x) dx = 0.]
3.6 The Vector Product in R³.
As you grasped many years ago, the world we live in has three space dimensions. For this reason the material in this section is important in many applications. What we intend to do is define a way to multiply two vectors X and Y in R³. Whereas the scalar product ⟨X, Y⟩ is a scalar, this product X × Y, the vector product, or cross product as it is often called, is a vector.

For several reasons [i) we shall not cover this in class, and ii) I can probably not do as good a job as appears in many books] we shall let you read about this topic elsewhere. But make sure to read about it even though you'll never be examined on it.

Chapter 4

Linear Operators: Generalities. V1 → Vn, Vn → V1

4.1 Introduction. Algebra of Operators.
Let V be a linear space. So far we have considered the algebraic structure of such a space; however, the most significant reason for studying linear spaces is so that one can study operators defined on them. Operator is another, more organic, name for function. Thus an operator

T : A → B

maps elements in its domain A into elements of B, where B contains the range of T. If X ∈ A, then T(X) = Y ∈ B. Think of feeding X into the operator T, and Y being what T sends out in return.

a figure goes here

It is useful to think of T as some type of machine or factory: the input (raw material) is X, and the output is Y. Some examples should illustrate the situation and its potential power.
Examples:
(1) Let V = R². If X = (x1, x2) ∈ R² and Y = (y1, y2, y3), we define T(X) = Y by

x1 + 2x2 = y1
x1 + x2 = y2
3x1 + x2 = y3,

or

T(X) = T(x1, x2) = (x1 + 2x2, x1 + x2, 3x1 + x2) = (y1, y2, y3) = Y.

This operator T has the property that to every X ∈ R² it assigns a Y ∈ R³. In other words, T maps the two dimensional space R² into the three dimensional space R³,

T : R² → R³.

R² is the domain of T, denoted by D(T), while the range of T, R(T), is contained in R³:

D(T) = R², R(T) ⊂ R³.

Since y1 = y2 = 0 implies that x1 = x2 = 0, which in turn implies that y3 = 0, we see that the point (0, 0, 1) ∈ R³ is not in the range of T. Thus, T is not surjective onto R³. It is injective (one-to-one) since every point Y ∈ R(T) is the image of exactly one X ∈ D(T). This can be seen by observing that y1 and y2 suffice to determine X = (x1, x2) uniquely by solving the first two equations:

−y1 + 2y2 = x1
y1 − y2 = x2.

Hence if Y = T(X1) and also Y = T(X2), then X1 = X2. Since the operator T is completely determined by the coefficients in the equations, it is reasonable to represent this T by the matrix

T = ( 1 2
      1 1
      3 1 )

If you care to think of X as the input into a paint-making machine, then x1 might represent the quantity of yellow and x2 the quantity of blue used. In this case y1, y2, and y3 represent the quantities of three different shades of green the machine yields. For this machine, as soon as you specify the desired quantities of any two of the greens, say y1 and y2, the quantities x1 and x2 of the input colors are completely determined, as is the quantity y3 of the remaining shade of green.
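Example 1's operator is easy to experiment with. The sketch below (the function names are mine) applies T and recovers the input from y1 and y2 alone, illustrating injectivity and the failure of surjectivity:

```python
def T(x1, x2):
    # The operator of Example 1: T(x1, x2) = (x1 + 2*x2, x1 + x2, 3*x1 + x2).
    return (x1 + 2 * x2, x1 + x2, 3 * x1 + x2)

def recover(y1, y2):
    # Injectivity in practice: y1 and y2 already determine the input,
    # x1 = -y1 + 2*y2 and x2 = y1 - y2 (solving the first two equations).
    return (-y1 + 2 * y2, y1 - y2)

y1, y2, y3 = T(5.0, -2.0)
assert recover(y1, y2) == (5.0, -2.0)
assert y3 == 3 * 5.0 + (-2.0)          # the third output is then forced

# (0, 0, 1) is not attained: y1 = y2 = 0 forces x1 = x2 = 0, hence y3 = 0.
assert T(*recover(0, 0)) == (0, 0, 0)
```

This is exactly the paint-machine picture: choosing two of the three greens determines the input colors, and hence the third green.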
(2) Let V be R² again. With X = (x1, x2) ∈ R² and Y = (y1) ∈ R¹, define T by

x1² + x2² = y1,

or

T(X) = x1² + x2².

This operator T maps R² into R¹,

T : R² → R¹.

It is not surjective onto R¹ since the negative half of R¹ is completely omitted from R(T). Furthermore, it is not injective either, since each point y1 ∈ R(T) other than zero is the image of infinitely many points: all of those on the circle x1² + x2² = y1.
(3) Let V be C[−1, 1]. If f ∈ C[−1, 1], we define T by

T(f) = f(0).

Thus, if f(x) = 2 + cos x, then Tf = 3. This operator T is usually denoted by δ and called the Dirac delta functional. It was first used by Dirac in his work on quantum mechanics and is extremely valuable in modern mathematics and physics. T assigns to each continuous function f its value at x = 0, a real number. Therefore

T : C[−1, 1] → R¹.

The operator T is not injective, since for example the element 2 ∈ R¹ is the image of both f(x) = 1 + eˣ and f(x) = 2. It is surjective since every element a ∈ R¹ is the image of at least one element in C[−1, 1] (if f(x) ≡ a, then clearly T(f) = a).
(4) Let V be C[−1, 1]. If f ∈ C¹[−1, 1], then the differentiation operator D is defined by

(Df)(x) = df/dx (x).

It maps each function into its derivative. If f(x) = x², then (Df)(x) = 2x. Since the derivative of a continuously differentiable function (a function in C¹) is necessarily continuous, we see that

D : C¹[−1, 1] → C[−1, 1].

D is not injective since, for example, the function g(x) = 1 is the image of both f1(x) = x and f2(x) = 2 + x. D is surjective onto C[−1, 1],

R(D) = C[−1, 1],

since if g(x) is any element of C[−1, 1], then g is the image of the particular function f ∈ C¹[−1, 1] defined by

f(x) = ∫_0^x g(s) ds,

because Df = g by the fundamental theorem of calculus.
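Example 4's surjectivity argument can be illustrated numerically: build f = ∫_0^x g and check that Df recovers g. The quadrature and difference step sizes below are arbitrary choices of mine:

```python
import math

def antiderivative(g, x, n=2000):
    # f(x) = integral_0^x g(s) ds, via the trapezoid rule.
    if x == 0:
        return 0.0
    h = x / n
    return h * (0.5 * g(0) + 0.5 * g(x) + sum(g(k * h) for k in range(1, n)))

def D(f, x, h=1e-5):
    # central difference approximation to the derivative
    return (f(x + h) - f(x - h)) / (2 * h)

g = lambda s: math.cos(3 * s) + s ** 2     # an arbitrary continuous g

# D(antiderivative of g) recovers g: the fundamental theorem of calculus.
for x in (-0.8, 0.4, 1.0):
    assert abs(D(lambda t: antiderivative(g, t), x) - g(x)) < 1e-4
```

Every continuous g is hit this way, which is the content of R(D) = C[−1, 1].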
Throughout this and the next chapter we will study some of the elementary aspects of linear operators. It is reasonable to denote a linear operator by L.

Definition: Let V1 and V2 both be linear spaces over the same field of scalars. An operator L mapping V1 into V2 is called a linear operator if for every X and X̃ in V1 and any scalar a, L satisfies the two conditions

1. L(X + X̃) = L(X) + L(X̃)
2. L(aX) = aL(X).

Whenever ambiguity does not arise, we will omit the parentheses and write LX instead of L(X).

An equivalent form of the definition is

Theorem 4.1. L is a linear operator ⟺

L(aX̃ + bX) = aL(X̃) + bL(X),   (4-1)

where X, X̃ ∈ V1 and a and b are any scalars.

Proof: ⇒

L(aX + bX̃) = L(aX) + L(bX̃)   (property 1)
= aLX + bLX̃   (property 2).

⇐ Property 1 is the special case a = b = 1. Property 2 is the special case b = 0.

Remark: It is useful to observe that always L(0) = L(0 · X) = 0L(X) = 0. This identity is often the easiest way to test if an operator is not linear.

Examples:
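The remark suggests a quick screen for non-linearity, and Theorem 4.1 gives the full test. Here is a sketch of an empirical checker (my own construction; passing random spot-checks is evidence of linearity, while a single failure is a disproof):

```python
import random

def looks_linear(T, dim, trials=100, tol=1e-9):
    # Empirical test of Theorem 4.1: T(aX + bY) == a*T(X) + b*T(Y).
    rng = random.Random(0)
    for _ in range(trials):
        a, b = rng.uniform(-5, 5), rng.uniform(-5, 5)
        X = [rng.uniform(-5, 5) for _ in range(dim)]
        Y = [rng.uniform(-5, 5) for _ in range(dim)]
        lhs = T([a * x + b * y for x, y in zip(X, Y)])
        rhs = [a * u + b * v for u, v in zip(T(X), T(Y))]
        if any(abs(p - q) > tol for p, q in zip(lhs, rhs)):
            return False
    return True

L = lambda X: [X[0] + 2 * X[1], X[0] + X[1], 3 * X[0] + X[1]]   # Example 1: linear
T = lambda X: [X[0] ** 2 + X[1] ** 2]                           # Example 2: not linear

assert looks_linear(L, 2)
assert not looks_linear(T, 2)
# The remark's quick screen: a linear operator must send 0 to 0.
assert L([0, 0]) == [0, 0, 0]
```

Note the asymmetry: the checker can refute linearity with certainty but can only accumulate evidence for it; a proof still requires the algebra below.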
(1) The operator L defined by Example 1, where L : R² → R³ is

LX = (x1 + 2x2, x1 + x2, 3x1 + x2),

is linear. Let X = (x1, x2) and X̃ = (x̃1, x̃2). Then

L(X + X̃) = (x1 + x̃1 + 2x2 + 2x̃2, x1 + x̃1 + x2 + x̃2, 3x1 + 3x̃1 + x2 + x̃2)
= (x1 + 2x2, x1 + x2, 3x1 + x2) + (x̃1 + 2x̃2, x̃1 + x̃2, 3x̃1 + x̃2)
= LX + LX̃,

and

L(aX) = (ax1 + 2ax2, ax1 + ax2, 3ax1 + ax2)
= a(x1 + 2x2, x1 + x2, 3x1 + x2)   (4-2)
= aLX.

(2) The operator TX = x1² + x2², with domain R² and range in R¹, is not linear, since

T(aX) = (ax1)² + (ax2)² = a²[x1² + x2²] = a²TX ≠ aTX

except for the particular scalars a = 0, 1.
(3) The operator Df = df/dx, with domain C¹[−1, 1] and range C[−1, 1], is linear, since if f1 and f2 are in C¹[−1, 1] and a and b are any real numbers, then by elementary calculus

D(af1 + bf2) = (d/dx)(af1 + bf2) = a df1/dx + b df2/dx   (4-3)
= aDf1 + bDf2.

(4) The operator L defined as

Lu = a2(x)u″ + a1(x)u′ + a0(x)u,   (′ = d/dx),

where u(x) ∈ D(L) = C², and where a0(x), a1(x), and a2(x) are continuous functions, is a linear operator,

L : C² → C.

If A and B are any constants (scalars for C²), then for any u1 and u2 ∈ C²,

L(Au1 + Bu2) = a2[Au1 + Bu2]″ + a1[Au1 + Bu2]′ + a0[Au1 + Bu2]
= a2Au1″ + a2Bu2″ + a1Au1′ + a1Bu2′ + a0Au1 + a0Bu2
= A[a2u1″ + a1u1′ + a0u1] + B[a2u2″ + a1u2′ + a0u2]   (4-4)
= ALu1 + BLu2.

(5) The identity operator I is the operator which leaves everything unchanged. Because it is so simple, it can be defined on an arbitrary set S and maps S into itself, S → S, in a trivial way. If X ∈ S, then we define

IX = X.

What could be more simple? If S is a linear space V (so aX and X1 + X2 are defined), then I is trivially a linear operator, since

I(aX1 + bX2) = aX1 + bX2 = aIX1 + bIX2.
Why are linear operators important? There are several reasons. First, they are much
easier to work with than nonlinear operators. Second, most of the operators which arise
in applications are linear. The feature possessed by linear operators which is central to
applications is that of superposition. If Lu1 = f and Lu2 = g , then L(u1 + u2 ) = f + g .
In other words, if u1 is the response to some external inﬂuence f and u2 the response to
g , then the response to f + g is found by adding the separate responses.
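Superposition is easy to see concretely. Taking L to be differentiation, approximated by central differences (the choice of L and of the responses u1, u2 is my own illustration):

```python
def D(u, h=1e-5):
    # a stand-in linear operator: L = d/dx via central differences
    return lambda x: (u(x + h) - u(x - h)) / (2 * h)

u1 = lambda x: x ** 3        # response u1, with Lu1 = f where f(x) = 3x^2
u2 = lambda x: 5 * x         # response u2, with Lu2 = g where g(x) = 5

combined = lambda x: u1(x) + u2(x)

# Superposition: L(u1 + u2) = f + g, checked pointwise.
for x in (-1.0, 0.2, 2.0):
    assert abs(D(combined)(x) - (D(u1)(x) + D(u2)(x))) < 1e-6
    assert abs(D(combined)(x) - (3 * x ** 2 + 5)) < 1e-6
```

The response to the combined influence is the sum of the separate responses, which is exactly what fails for the nonlinear operator of Example 2.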
The special case of a linear operator whose range is the real number line R¹ arises often enough to receive a name of its own.

Definition: A linear operator whose range is R¹ is called a linear functional, V → R¹.

The Dirac delta functional is such an operator. So is the operator

l(f) = ∫_0^1 f(x) dx,

which assigns to every continuous function f ∈ C[0, 1] the real number equal to the area between the graph of f and the x-axis. Check that l is linear.

If L : V1 → V2 is a linear operator, the range of L, a subset of the linear space V2, has a particularly nice structure. In fact, R(L) is not just any clump of points in V2 but

Theorem 4.2. The range of a linear operator L : V1 → V2 is a linear subspace of V2.

Remark: Even more is true. We shall prove (p. 312-3) that dim R(L) ≤ dim D(L), so that no matter how large V2 is, the range has at most the same dimension as the domain.
Proof: The range of L consists of all elements Y ∈ V2 of the form Y = LX where X ∈ V1. We know that R(L) is a subset of the linear space V2. The only task is to prove that it is actually a subspace. Since V2 is a linear space, it is sufficient to show that the set R(L) is closed under multiplication by scalars, and under addition of vectors.

i) R(L) is closed under multiplication by scalars. If Y ∈ R(L), there is an X ∈ V1 = D(L) such that Y = LX. We must find some X̃ in V1 such that aY = LX̃, where a is any scalar. Since aY = aLX = L(aX), we take X̃ = aX.

ii) R(L) is closed under addition of vectors. If Y1 and Y2 are in R(L), there are elements X1 and X2 in V1 = D(L) such that Y1 = LX1 and Y2 = LX2. We must show that Y1 + Y2 ∈ R(L), that is, find some X̃ ∈ V1 such that Y1 + Y2 = LX̃. But Y1 + Y2 = LX1 + LX2 = L(X1 + X2). Thus we can take X̃ = X1 + X2.

Before moving further on into the realm of special linear operators, we shall take this opportunity to define algebraic operations (addition and multiplication) for linear operators. But first we define equality, L1 = L2, in a straightforward way.

Definition: (equality) If L1 and L2 both map V1 into V2, where V1 and V2 are linear spaces, and if L1X = L2X for all X in V1, then L1 equals L2. Thus, two operators are equal if they have the same effect on any vector.
Addition is equally simple.
Definition: (addition). If L1 : V1 → V2 and L2 : V1 → V2 then their sum, L1 + L2 , is
deﬁned by the rule
(L1 + L2 )X = L1 X + L2 X, X ∈ V1
Examples:
(1) Let L1 : R² → R³ be defined by

L1(X) = (x1 + x2, x1 + 2x2, −x2),   X = (x1, x2) ∈ R²,

and L2 : R² → R³ be defined by

L2X = (−3x1 + x2, x1 − x2, x1),   X = (x1, x2) ∈ R².

Then L1 + L2 is defined, and is

(L1 + L2)X = L1X + L2X = (x1 + x2, x1 + 2x2, −x2) + (−3x1 + x2, x1 − x2, x1)
= (−2x1 + 2x2, 2x1 + x2, x1 − x2).   (4-5)
(2) Let D : C¹ → C be defined by

Du = du/dx,   u ∈ C¹,

and L : C¹ → C be defined by

Lu = ∫_0^1 e^{x−t} u(t) dt = eˣ ∫_0^1 e^{−t} u(t) dt,   u ∈ C¹.   (4-6)

(In reality, L may be defined on a much larger class of functions: u ∈ C is plenty, while its image is the smaller space of constant multiples of eˣ ⊂ C. We have decided on the smaller domain and larger image space so that the sum D + L is defined.) Then for any u ∈ C¹,

(D + L)u = Du + Lu = du/dx + ∫_0^1 e^{x−t} u(t) dt.

The following theorem is a statement of some simple facts about the sum of two linear operators.
Theorem 4.3. Let L1, L2, L3, ... be any linear operators which map V1 → V2, so that their sums are defined. Then

0. L = L1 + L2 is a linear operator.
(1) L1 + (L2 + L3) = (L1 + L2) + L3.
(2) L1 + L2 = L2 + L1.
(3) Let 0 be the operator which maps every element of V1 into 0 ∈ V2, so 0X = 0. Then L1 + 0 = L1.
(4) L1 + (−L1) = 0. Here −L1 is the operator which maps every element X ∈ V1 into −(L1X).

Proof: These are just computations. Let X1, X2 ∈ V1.

0.
L(aX1 + bX2) = (L1 + L2)(aX1 + bX2)
= L1(aX1 + bX2) + L2(aX1 + bX2)
= aL1X1 + bL1X2 + aL2X1 + bL2X2
= a(L1X1 + L2X1) + b(L1X2 + L2X2)   (4-7)
= a(L1 + L2)X1 + b(L1 + L2)X2
= aLX1 + bLX2.

(1) (L1 + (L2 + L3))X = L1X + (L2 + L3)X = L1X + L2X + L3X = (L1 + L2)X + L3X = ((L1 + L2) + L3)X.

(2) (L1 + L2)X = L1X + L2X = L2X + L1X = (L2 + L1)X. The step L1X + L2X = L2X + L1X is justified on the grounds that the vectors Y1 := L1X and Y2 := L2X are elements of V2, which is a linear space, so that Y1 + Y2 = Y2 + Y1.

(3) (L1 + 0)X = L1X + 0X = L1X + 0 = L1X.
Note that the 0 in 0X is an operator, while the 0 in the next step is an element of V2. This ambiguity causes no trouble once you understand it.

(4) (L1 + (−L1))X = L1X + (−L1)X = L1X − L1X = 0.
The crucial step (−L1)X = −L1X is the definition of the operator (−L1).

Remark: This theorem states that the set of all linear operators mapping one linear space V1 into another V2 form an abelian group under addition.
Multiplication of operators is not much more difficult. If L1 and L2 are linear operators, then their product L2L1, in that order, is defined by the rule L2L1X = L2(L1X). In other words, first operate on X with L1, giving a vector Y = L1X. Then operate on this new vector Y with L2, giving L2Y = L2(L1X). It is clear that in order for this to make sense, for every X ∈ D(L1) the new vector Y = L1X must be in the domain of L2. Thus, to form the product L2L1, we require that R(L1) ⊂ D(L2).

Look at our machine again.

a figure goes here

The multiplication L2L1 means sending the output from L1 as input into L2. In order to join the machines in this way, surely one necessary requirement is that L2 is equipped to act on the output from L1, that is, R(L1) ⊂ D(L2). Of course the L2 machine might be able to digest input other than what L1 sends out. But all we care about is that L2 can digest at least what L1 sends it.
Definition: (multiplication). Let L1 : V1 → V2 and L2 : V3 → V4. If the range of L1 is contained in the domain of L2, R(L1) ⊂ D(L2), then the product L2L1 is definable by the composition rule

L2L1X = L2(L1X), where X ∈ V1 = D(L1).

The product L2L1 maps the input V1 for L1 into the output V4 for L2: L2L1 : V1 → V4.

We exhibit a little diagram (cf. p. ???).

a figure goes here

The way to get from V1 to V4 using L2L1 is to first use L1 to reach V2, then use L2 to get to V4.

Remarks: If L2L1 is defined, it is not necessarily true that L1L2 is defined (Example 1 below). Furthermore, even if L1L2 is also defined, it is only a rare coincidence that multiplication is commutative. Usually L2L1 ≠ L1L2 when both products are defined. Thus the order in L2L1 is important.
Examples:
(1) Let L1 : R² → R³ be defined as

L1X = (x1 − x2, x2, −x1 − 2x2), where X = (x1, x2) ∈ R²,

and let L2 : R³ → R¹ be defined as

L2Y = y1 + 2y2 − y3, where Y = (y1, y2, y3) ∈ R³.

Then R(L1) ⊂ R³ = D(L2), so that the product L2L1 is definable and L2L1 : R² → R¹. Consider what L2L1 does to the particular vector X0 = (−1, 2) ∈ R²:

L2L1X0 = L2(L1X0) = L2(−3, 2, −3) = −3 + 4 + 3 = 4.

Thus L2L1 maps (−1, 2) ∈ R² into 4 ∈ R¹. More generally, if X is any vector in R²,

L2L1X = L2(L1X) = L2(x1 − x2, x2, −x1 − 2x2)
= (x1 − x2) + 2x2 − (−x1 − 2x2) = 2x1 + 3x2 ∈ R¹.   (4-8)

Thus L2L1 maps (x1, x2) ∈ R² into 2x1 + 3x2 ∈ R¹.

Since R(L2) = R¹ and D(L1) = R², R(L2) ⊄ D(L1), so the product L1L2 is not defined. You might be thinking that R¹ is part of R². What you mean is that R² has one dimensional subspaces. It certainly does: an infinite number of them, all of the straight lines through the origin. Because there are so many subspaces of R² which are one dimensional, there is no natural way of regarding R¹ as being contained in R². [On the other hand, there is a natural way in which C¹ can be regarded as contained in C. We used this above in our second example for addition of linear operators.]
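The computation in Example 1 can be replayed in code (a small sketch, with hypothetical names):

```python
L1 = lambda x1, x2: (x1 - x2, x2, -x1 - 2 * x2)        # L1 : R^2 -> R^3
L2 = lambda y1, y2, y3: y1 + 2 * y2 - y3               # L2 : R^3 -> R^1

L2L1 = lambda x1, x2: L2(*L1(x1, x2))                  # defined since R(L1) is in D(L2)

assert L2L1(-1, 2) == 4                                # the worked example X0 = (-1, 2)
# and in general L2 L1 X = 2*x1 + 3*x2:
for x1, x2 in ((0, 0), (1, 0), (0, 1), (3, -7)):
    assert L2L1(x1, x2) == 2 * x1 + 3 * x2
```

Trying to form `L1(L2L1(x1, x2))`-style compositions in the other order fails immediately: L2 produces a single number, which L1 cannot accept, mirroring R(L2) ⊄ D(L1).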
(2) Define L1 : R² → R² by the rule L1X = (2x1 − 3x2, −x1 + x2) and L2 : R² → R² by the rule L2X = (2x2, x1 + x2). Then R(L1) = R² = D(L2), so that L2L1 is defined. It is given by

L2L1X = L2(2x1 − 3x2, −x1 + x2) = (−2x1 + 2x2, x1 − 2x2).

In particular, L2L1 maps X0 = (1, 2) into (2, −3). Now R(L2) = R² = D(L1), so that L1L2 is also definable. It is given by

L1L2X = L1(2x2, x1 + x2)
= (2 · 2x2 − 3 · (x1 + x2), −2x2 + (x1 + x2))   (4-9)
= (−3x1 + x2, x1 − x2).

In particular, L1L2 maps X0 = (1, 2) into (−1, −1). Since L1L2 and L2L1 map the point X0 = (1, 2) into two different points, it is clear that L1L2 ≠ L2L1: the operators do not commute.
(3) Let A be the subspace of R² spanned by some unit vector e1 and B be the subspace spanned by another unit vector e2. Consider the projection operators PA and PB. They are linear since, for example,

PA(aX1 + bX2) = ⟨aX1 + bX2, e1⟩e1 = a⟨X1, e1⟩e1 + b⟨X2, e1⟩e1   (4-10)
= aPAX1 + bPAX2.

Because PA : R² → R² and PB : R² → R², both products PAPB and PBPA are defined. We have

PAPBX = PA(PBX) = PA(⟨X, e2⟩e2) = ⟨X, e2⟩PAe2 = ⟨X, e2⟩⟨e2, e1⟩e1.   (4-11)

Also,

PBPAX = PB(PAX) = PB(⟨X, e1⟩e1) = ⟨X, e1⟩PBe1 = ⟨X, e1⟩⟨e1, e2⟩e2.   (4-12)

Since PAPBX ∈ A ⊂ R², while PBPAX ∈ B ⊂ R², it is clear that usually PAPB ≠ PBPA. They will happen to be equal if A = B, or if A ⊥ B (for then PAPB = PBPA = 0). See the figure at the beginning of this example, and draw some more special cases for yourself.
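Example 3's conclusion, PA PB ≠ PB PA, shows up immediately in a numerical experiment (the particular unit vectors below are my own choice, neither parallel nor perpendicular):

```python
import math

def proj(e):
    # P X = <X, e> e, projection onto the line spanned by the unit vector e
    return lambda X: tuple((X[0] * e[0] + X[1] * e[1]) * c for c in e)

e1 = (1.0, 0.0)
t = math.pi / 6
e2 = (math.cos(t), math.sin(t))   # a second unit vector, at 30 degrees to e1

PA, PB = proj(e1), proj(e2)
X = (1.0, 2.0)

PAPB = PA(PB(X))   # lands in span(e1)
PBPA = PB(PA(X))   # lands in span(e2)

assert max(abs(u - v) for u, v in zip(PAPB, PBPA)) > 0.1   # PA PB != PB PA
```

The two results land on different lines through the origin, which is the geometric reason the products differ.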
(4) Let L : C∞ → C∞ (C∞ is the space of infinitely differentiable functions) be defined by

(Lu)(x) = xu(x),   u ∈ C∞,

and D : C∞ → C∞ be defined by

(Du)(x) = du/dx (x),   u ∈ C∞.

Then R(L) ⊂ C∞ = D(D), so that the product DL is definable by

DLu = D(Lu) = D(xu) = (d/dx)(xu(x)) = xu′ + u.

Also, R(D) ⊂ C∞ = D(L), so LD is definable by

LDu = L(Du) = L(u′) = xu′.

Notice that LDu ≠ DLu unless u = 0; in fact (DL − LD)u = u.
We collect some properties of multiplication.
Theorem 4.4. If L1 : V1 → V2, L2 : V3 → V4, and L3 : V5 → V6, where V2 ⊂ V3 and V4 ⊂ V5, then

0. The operator L = L2L1 is a linear operator.
1. L3(L2L1) = (L3L2)L1 — Associative law.

Proof: 0.

L(aX1 + bX2) = L2L1(aX1 + bX2)
= L2(aL1X1 + bL1X2)
= L2(aL1X1) + L2(bL1X2)   (4-13)
= aL2L1X1 + bL2L1X2
= aLX1 + bLX2.

(1) By definition of the product,

[L3(L2L1)]X = L3[(L2L1)X] = L3[L2(L1X)]

and

[(L3L2)L1]X = (L3L2)(L1X) = L3[L2(L1X)].

Now match the ends.
Notice that the commonly occurring special case V1 = V2 = V3 = V4 = V5 = V6 is
included in this theorem. In this special case, even more can be proved. For then the
identity operator I , deﬁned by IX = X for all X ∈ V can be used to multiply any other
operator. Moreover, addition, L1 + L2 also makes sense.
Theorem 4.5. If the linear operators L1, L2, L3 all map V into V, then, representing any one of these by L,

(1) LI = IL = L.

(2) For any positive integer n, we define Lⁿ inductively by the rule Lⁿ⁺¹ = LLⁿ, and L⁰ = I. Then for any non-negative integers m and n,

Lᵐ⁺ⁿ = LᵐLⁿ.

(3) (L1 + L2)L3 = L1L3 + L2L3.

(4) L3(L1 + L2) = L3L1 + L3L2. (This is needed in addition to 3 because of the noncommutativity.)
Proof:
(1) If X ∈ V ,
(LI)X = L(IX) = LX and (IL)X = I(LX) = LX.
(2) We shall prove Lm+n = Lm Ln by induction on m . The statement is true, by
deﬁnition, for m = 1 . Assume it is true for m = k , so Lk+n = Lk Ln . Our job is to
prove the statement for m = k + 1 . By the deﬁnition and the induction hypothesis,
we have
Lk+n+1 = LLk+n = L(Lk Ln ).
Since multiplication is associative, we ﬁnd that
L(Lk Ln ) = (LLk )Ln .
But, by deﬁnition,
LLk = Lk+1 .
Thus,
Lk+n+1 = Lk+1 Ln .
This completes the induction proof.
(3) If X ∈ V ,
[(L1 + L2 )L3 ]X = (L1 + L2 )(L3 X )
Let L3X = Y ∈ V. Then (L1 + L2)Y = L1Y + L2Y. Thus

[(L1 + L2)L3]X = L1(L3X) + L2(L3X) = (L1L3)X + (L2L3)X.
(4) Same proof as 3.
Remark: If V1 and V2 are two linear spaces, the set of all linear operators which
map V1 into V2 is usually denoted by Hom(V1 , V2 ) —Hom rhymes with Mom and
Tom. In this notation, the last theorem concerned Hom(V, V ) . The abbreviation
Hom is for the impressive word “homomorphism”. Tell your friends.
Examples: Consider D : C∞ → C∞ defined by (Du)(x) = du/dx (x). Then Dⁿ = dⁿ/dxⁿ.

Exercises

(1) Determine which of the following are linear operators.
(a) T : R² → R², TX = (x1 + x2, x1 − x2), where X = (x1, x2) ∈ R².

(b) T : R² → R², T(X) = (x1 + x2 + 1, x1 − x2).

(c) T : R³ → R², T(X) = (x1 + x1x2, x2).

(d) T : R³ → R¹, T(X) = x1 + x2 − x3.

(e) T : R³ → R¹, T(X) = x1 + x2 − x3 + 2.

(f) D : P2 → P1. If P(x) = a2x² + a1x + a0 ∈ P2, then D(P) = 2a2x + a1 ∈ P1.

(g) T : C¹[−1, 1] → R¹. If u(x) ∈ C¹[−1, 1], then T(u) = u(0) + u′(0).

(h) T : C[2, 3] → C[2, 3]. If u ∈ C[2, 3],

(Tu)(x) = ∫_2^3 e^{x−t} u(t) dt.

(i) T : C[2, 3] → C[2, 3],

(Tu)(x) = 1 + ∫_2^3 e^{x−t} u(t) dt.

(j) T : C[2, 3] → C[2, 3],

(Tu)(x) = ∫_2^3 e^{x−t} u²(t) dt.

(k) S1 : C[0, ∞] → C[0, ∞], (S1u)(x) = u(x + 1) − u(x).

(l) L : A → C[0, ∞], where A = { u ∈ C[0, ∞] : ∫_0^∞ |u(t)| dt < ∞ },

(Lu)(x) = ∫_0^∞ e^{−xt} u(t) dt.

[Our restriction on A is just to insure that the integral exists. Lu is usually called the Laplace transform of u.]

(m) T : C[0, ∞] → C[0, ∞],

(Tu)(x) = a2(x)u(x²) + a1(x)u(x + 1) + a0(x)u(x),

where the a_k(x) are continuous functions.

(n) T : C[0, 1] → C[0, 1], (Tu)(x) = 2xu(x).

(o) T : R² → R¹, TX = |x1 + x2|, where X = (x1, x2) ∈ R².

[Answers: a, d, f, g, h, k, l, m, n are linear.]
(2) (a) If l(x) is a linear functional mapping R¹ → R¹, prove that l(x) = αx, where α = l(1).

(b) If l(X) is a linear functional mapping Rⁿ → R¹, prove that l(X) = Σ_{k=1}^{n} α_k x_k, where X = (x1, ..., xn).
(3) Let L1 : R1 → R2 be deﬁned by
L1 X = (x1 , 3x1 ), where X = (x1 ) ∈ R1 ,
and L2 : R2 → R2 be deﬁned by
L2 Y = (y1 + y2 , y1 + 2y2 ), where Y = (y1 , y2 ) ∈ R2 .
Compute L2 L1 X0 , where X0 = 2 ∈ R1 . Is L1 L2 deﬁned?
(4) Let A : R² → R² be defined by

AX = (x1 + 3x2, −x1 − x2), where X = (x1, x2) ∈ R²,

and B : R² → R² by

BX = (−x1 + x2, 2x1 + x2).

(a) Compute ABX, BAX, B²X, A²BX, and (A + B)X.

(b) Find an operator C such that CA = I. [Hint: Let CX = (c11x1 + c12x2, c21x1 + c22x2) and solve for c11, c12, etc.]
(5) Consider the operators D : C∞ → C∞, Du = u′, and L : C∞ → C∞, (Lu)(x) = ∫_0^x u(t) dt.

(a) Show that DL = I and LD = I − δ, where δ is the delta functional. Also,

(L²u)(x) = ∫_0^x ( ∫_0^s u(t) dt ) ds.

Integrate by parts to conclude that

(L²u)(x) = ∫_0^x (x − t) u(t) dt.

(b) Observe that D²L² = D(DL)L = DIL = DL = I. Use this observation to find a solution of the differential equation D²u = f for u, where f ∈ C∞. Solve the particular equation (D²u)(x) = 1/(1 + x²).

(6) Let A : R² → R² be defined by
AX = (a11x1 + a12x2, a21x1 + a22x2),

and B : R² → R² be defined by

BX = (b11x1 + b12x2, b21x1 + b22x2).

(a) Compute AB.

(b) Find a matrix B such that AB = I, that is, determine b11, b12, ... in terms of a11, a12, ... such that AB = I. [In the course of your computation, I suggest introducing a symbol, say ∆, for a11a22 − a12a21 when that algebraic combination crops up.]
(7) In the plane E², consider the operator R which rotates a vector by 90° and the operator P projecting onto the subspace spanned by e (see fig.). (a) Prove that R is linear. (b) Let X = (x1, x2) be any point in E². Compute PRX and RPX. Draw a sketch for the special case X = (1, 1).

(8) In R³, let A denote the operator of rotation through 90° about the x1-axis (so A : (0, 1, 0) → (0, 0, 1)), B the operator of rotation through 90° about the x2-axis, and C the operator of rotation through 90° about the x3-axis (see fig.). Prove these operators are linear (just do it for A). Show that A⁴ = B⁴ = C⁴ = I, AB ≠ BA, and that A²B² = B²A². Is it true that ABAB = A²B²?
(9) Let P denote the linear space of all polynomials in x. For p ∈ P, consider the operators Dp = dp/dx and Lp = xp. Show that DL − LD = I.
(10) (a) If L1 L2 = L2 L1, prove that
(L1 + L2)² = L1² + 2 L1 L2 + L2².
(b) If L1 L2 ≠ L2 L1, then (L1 + L2)² = ?
(11) If L1 and L2 are operators such that L1 L2 − L2 L1 = I, prove the formula L1ⁿ L2 − L2 L1ⁿ = n L1ⁿ⁻¹, where n = 1, 2, 3, . . . .
(12) If L1 is a linear operator, L1 : V1 → V2 [or L1 ∈ Hom(V1 , V2 )], and a is any scalar, define the operator L = aL1 by the rule LX = (aL1)X = a(L1 X), where X ∈ V1. Prove
(0). L = aL1 is a linear operator, L : V1 → V2.
(5). a(bL1) = (ab)L1, where a, b are any scalars.
(6). 1 · L1 = L1.
(7). (a + b)L1 = aL1 + bL1.
(8). a(L1 + L2) = aL1 + aL2, where L2 ∈ Hom(V1 , V2 ).
Coupled with Theorem 3, this exercise proves that the set of all linear operators mapping one linear space into another linear space is itself a linear space, that is, Hom(V1 , V2 ) is a linear space.
(13) (a). In E2, let L denote the operator which rotates a vector by 90°. Then L : E2 → E2. If X = (x1 , x2 ) = x1 e1 + x2 e2, where e1 = (1, 0) and e2 = (0, 1), write L as
LX = (a11 x1 + a12 x2 , a21 x1 + a22 x2 ),
That is, ﬁnd the coeﬃcients a11 , a12 , . . . . This gives two ways to represent L , as
a rotation (geometrically), and by linear equations in terms of a particular basis
(algebraically).
(b). In E2 , consider the operator L of rotation through an angle α . Show that
Le1 = (cos α, sin α), Le2 = (− sin α, cos α), and then deduce that if X = (x1 , x2 ) = x1 e1 + x2 e2 ,
LX = (x1 cos α − x2 sin α, x1 sin α + x2 cos α).
(14) Consider the space Pn of all polynomials of degree n. Define L : Pn → Pn as the translation operator, (Lp)(x) = p(x + 1), and D : Pn → Pn as the differentiation operator, (Dp)(x) = dp/dx (x). Show that
L = I + D + D²/2! + · · · + Dⁿ⁻¹/(n − 1)! + Dⁿ/n!
(15) Consider the linear operators L1 = a1 D² + b1 D + c1 I and L2 = a2 D² + b2 D + c2 I.
Both L1 and L2 map the linear space of infinitely differentiable functions into itself, Lj : C∞ → C∞. If the coefficients a1, a2, b1, . . . are constants, prove that L1 L2 = L2 L1.

4.2 A Digression to Consider aü + bu̇ + cu = f

Essentially the only linear equations you can solve explicitly are linear algebraic equations,
like two equations in two unknowns. Since our theory applies to much more general situations, we shall develop a diﬀerent example for you to keep in the back of your minds along
with that of linear algebraic equations. The example we have chosen has the additional
virtue that it contains most of the solvable diﬀerential equations which arise anywhere.
Watch closely, because we shall be brief, with a high density of valuable ideas.
Problems concerning vibration or oscillatory phenomena are among the most important
and signiﬁcant ones which arise in applications. The simplest case is that of a simple
harmonic oscillator. We have
[a figure goes here]
a mass m attached to a spring. Pull the mass back a little and watch it move back and
forth, back and forth. These are oscillations. To make the situation simple, we assume
that the spring has no mass and that the surface upon which the mass rests is frictionless.
Let u(t) denote the displacement of the center of gravity of the mass from the equilibrium
position. Two experimental results are needed from physics.
1. Newton's Second Law: mü = F, where F means the resultant of all the forces on the center of gravity of the mass (we assume all forces are acting horizontally).
2. Hooke's Law: If a spring is not stretched too far, then the force it exerts is proportional to the displacement,
F = −ku,  k > 0.
We chose the minus sign since if a spring is displaced, the force it exerts is in the direction
opposite to the displacement. [Under larger displacements, actually
F(u) = a1 u + a2 u² + a3 u³ + · · · ,
where a0 = F(0) = 0. If the displacement u is small, the lowest term in the Taylor series for F(u) gives an adequate approximation. This is a more precise statement of Hooke's Law.]
Putting these two results together, we ﬁnd that
mü = −ku + F1,     (notation: ü = d²u/dt²)
where F1 represents all of the remaining forces on the mass. One possible force (so far incorporated into F1) is a so-called viscous damping force. It is of the form Fv = −µu̇, where µ > 0; at low velocities, this force is experimentally found to account for air resistance. It is directed opposite to the velocity, and increases as the speed does (speed = |velocity|). [Again, Fv = b1 u̇ + b2 u̇² + · · · , that is, Fv(u̇) is given by a Taylor series with Fv(0) = 0. At low speeds, the higher order terms can be neglected to yield a reasonable approximation.]
Thus, to our approximation,
mü = −ku − µu̇ + F2,
where F2 represents the forces yet unaccounted for. Let us assume that these remaining
forces do not depend on the motion and are applied by the outside world. Then the force
F2 depends only on time, F2 = f (t) . It is called the applied or external force. Newton’s
law gives
mü = −ku − µu̇ + f(t),
or
Lu := aü + bu̇ + cu = f(t),
where a = m, b = µ, and c = k. For the purposes of our discussion, we shall assume that k and µ do not depend on time. Then a, b, and c are nonnegative constants.
In order to determine the motion of the mass, we must solve the ordinary diﬀerential
equation Lu = f for u . Have we given enough information to determine the solution? In
other words, is the solution unique? For any physically reasonable problem, we expect the
mathematical model has a unique solution since (neglecting quantum mechanical eﬀects)
once we let the mass go, it will certainly move in one particular way, the same way every
time we perform the same experiment. It is clear that the motion will depend on the initial position u(t0). But if two masses have the same initial position, the resulting motion will still be different if their initial velocities u̇(t0) are different. Thus we must also specify the initial velocity u̇(t0) as well as the initial position u(t0). Are these sufficient to determine the motion? Yes; however, that requires proof. What must be proved is that if we have two solutions u1(t) and u2(t) of the same ordinary differential equation, and if their initial positions and velocities coincide, then the solutions coincide, u1 = u2 for all later time, t ≥ t0.
Theorem 4.6 (Uniqueness). Let u1(t) and u2(t) be two solutions of the ordinary differential equation
Lu := aü + bu̇ + cu = f(t),
where a, b, and c are constants, a > 0, b ≥ 0, c ≥ 0. If u1(t0) = u2(t0) and u̇1(t0) = u̇2(t0), then u1(t) = u2(t) for all t ≥ t0; in other words, the solution is uniquely determined by the initial position and velocity.
Remark: The theorem is true under much more general conditions - as we shall prove in
Chapter 6.
Proof: Let w(t) = u2(t) − u1(t). We shall show that w(t) ≡ 0 for all t ≥ t0. Now
Lw = L(u2 − u1) = Lu2 − Lu1 = f − f = 0,
that is,
aẅ + bẇ + cw = 0.     (4-14)
Furthermore
w(t0) = 0 and ẇ(t0) = 0,     (4-15)
since w(t0) = u2(t0) − u1(t0) = 0, and ẇ(t0) = u̇2(t0) − u̇1(t0) = 0. This reduces the question to showing that if Lw = 0, and if w has zero initial position and velocity, then in fact w ≡ 0.
The trick is to introduce a new function, E(t), associated with (4-14) (which happens to be the total energy of the system):
E(t) = (1/2) a ẇ² + (1/2) c w².
How does this function change with time? We compute its derivative:
Ė(t) = a ẇ ẅ + c w ẇ = ẇ(a ẅ + c w).
Using (4-14) we know that aẅ + cw = −bẇ. Therefore
Ė(t) = −b ẇ² ≤ 0     (since b ≥ 0).
[Thus energy is dissipated (b > 0), or conserved, Ė = 0, in the special case of no damping (b = 0).] Consequently
E(t) ≤ E(t0) for all t ≥ t0.     (4-16)
Now observe that for the mechanical system associated with w, we have E(t0) = (a/2)ẇ²(t0) + (c/2)w²(t0) = 0. Furthermore, it is obvious from the definition of E(t) (since a and c are positive) that 0 ≤ E(t). Substitution of this information into (4-16) reveals
0 ≤ E(t) ≤ 0 for all t ≥ t0.
This proves E(t) ≡ 0 for all t ≥ t0, which in turn implies w(t) ≡ 0, again from the
definition of E(t). Our proof is complete. We have taken some care, since all of our uniqueness proofs will use essentially no additional ideas. A more general case (a, b, and c still constants, but not necessarily positive) will be treated in Exercise 9.
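The energy argument above lends itself to a quick numerical experiment. The following sketch is my own illustration, not part of the notes (the coefficient values and the step size are arbitrary choices): it integrates aẅ + bẇ + cw = 0 with a standard Runge–Kutta step and checks that E(t) = (1/2)aẇ² + (1/2)cw² never increases.

```python
# Numerical sanity check of the energy argument: integrate a*w'' + b*w' + c*w = 0
# and watch E(t) = (1/2)*a*w'^2 + (1/2)*c*w^2 decrease when b > 0.
a, b, c = 1.0, 0.5, 4.0          # playing the roles of m, mu, k (illustrative values)

def deriv(state):
    w, v = state                  # v stands for w'
    return (v, -(b * v + c * w) / a)

def rk4_step(state, h):
    # one classical fourth-order Runge-Kutta step
    k1 = deriv(state)
    k2 = deriv((state[0] + h/2*k1[0], state[1] + h/2*k1[1]))
    k3 = deriv((state[0] + h/2*k2[0], state[1] + h/2*k2[1]))
    k4 = deriv((state[0] + h*k3[0], state[1] + h*k3[1]))
    return (state[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            state[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def energy(state):
    w, v = state
    return 0.5 * a * v**2 + 0.5 * c * w**2

state, h = (1.0, 0.0), 0.001      # initial displacement 1, initial velocity 0
energies = [energy(state)]
for _ in range(5000):
    state = rk4_step(state, h)
    energies.append(energy(state))

# E(t) <= E(t0): the energy never rises (up to tiny integration error)
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
```

With b = 0 the same computation shows E(t) essentially constant, the conservative case noted in the proof.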
Having proved that there is at most one solution of the initial value problem
Lu := aü + bu̇ + cu = f(t)     (differential equation)
u(t0) = α and u̇(t0) = β     (initial conditions)
we must now prove there is at least one solution. This is the question of existence. For this special equation, the solution is shown to exist by explicitly exhibiting it. In the case of more complicated equations we are not as fortunate; we must content ourselves with showing that a unique solution exists without exhibiting it in closed form.
It is easiest to begin with the homogeneous equation Lu = 0, that is, to find a solution of
aü + bu̇ + cu = 0 with u(t0) = α and u̇(t0) = β.
Without motivation, let us see what the substitution u(t) = e^{λt} yields. Here λ is a constant. We must compute Le^{λt}:
Le^{λt} = (aλ² + bλ + c)e^{λt}.
Can λ be chosen so that e^{λt} is a solution of Lu = 0? Since e^{λt} ≠ 0 for any t, this means: is it possible to pick λ so that aλ² + bλ + c = 0? Yes. In fact, the "quadratic formula" yields the two roots
λ1 = (−b + √(b² − 4ac)) / 2a,   λ2 = (−b − √(b² − 4ac)) / 2a
of the characteristic polynomial p(λ) = aλ² + bλ + c. Notice that we have assumed a ≠ 0. Thus, two solutions of the homogeneous equation are
u1(t) = e^{λ1 t} and u2(t) = e^{λ2 t}.
Since the operator L is linear, every linear combination of solutions is also a solution,
L(Au1 + Bu2 ) = ALu1 + BLu2 = 0 . Therefore u(t) = Au1 (t) + Bu2 (t) is a solution of the
homogeneous equation Lu = 0 for any choice of the scalars A and B .
What about the initial conditions u(t0) = α, u̇(t0) = β; can they be satisfied by picking the constants A and B suitably? Let us try. We want to pick A and B so that
A e^{λ1 t0} + B e^{λ2 t0} = α     (u(t0) = α)
A λ1 e^{λ1 t0} + B λ2 e^{λ2 t0} = β.     (u̇(t0) = β)
These equations can be solved as long as
0 ≠ λ2 e^{(λ1+λ2)t0} − λ1 e^{(λ1+λ2)t0} = (λ2 − λ1) e^{(λ1+λ2)t0},
which means λ1 ≠ λ2, or b² − 4ac ≠ 0. [The linear equations A r1 + B s1 = α, A r2 + B s2 = β can be solved for A and B if r1 s2 − r2 s1 ≠ 0.] Before dealing with the degenerate case b² − 4ac = 0, let us consider an
Example: Solve ü + 3u̇ + 2u = 0 with the initial conditions u(0) = 1 and u̇(0) = 0. If
we seek a solution of the form u(t) = e^{λt}, the characteristic equation is λ² + 3λ + 2 = 0, which has roots λ1 = −1, λ2 = −2. Therefore u(t) = A e^{−t} + B e^{−2t} is a solution. Since λ1 ≠ λ2, we can solve for A and B by using the initial conditions. We find
A + B = 1     (u(0) = 1),
−A − 2B = 0     (u̇(0) = 0).
These two equations yield A = 2, B = −1. Thus
u(t) = 2e^{−t} − e^{−2t}
is the unique solution of our initial value problem.
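As a quick check of the arithmetic (a verification of mine, not in the original notes), one can confirm numerically that this function satisfies both the equation and the initial conditions:

```python
# Check that u(t) = 2e^{-t} - e^{-2t} solves u'' + 3u' + 2u = 0, u(0)=1, u'(0)=0.
import math

def u(t):
    return 2 * math.exp(-t) - math.exp(-2 * t)

def up(t):   # u'
    return -2 * math.exp(-t) + 2 * math.exp(-2 * t)

def upp(t):  # u''
    return 2 * math.exp(-t) - 4 * math.exp(-2 * t)

assert abs(u(0) - 1) < 1e-12 and abs(up(0)) < 1e-12   # initial conditions
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:                   # the ODE at sample points
    assert abs(upp(t) + 3 * up(t) + 2 * u(t)) < 1e-12
```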
The degenerate case b² − 4ac = 0 must be discussed separately. In this case λ1 = λ2 = −b/2a, so the two solutions e^{λ1 t} and e^{λ2 t} are really the same solution. Without motivation (but see Exercise 12), we claim that t e^{λ1 t} is also a solution. This is easy to verify by a calculation:
L(t e^{λ1 t}) = a(t λ1² e^{λ1 t} + 2 λ1 e^{λ1 t}) + b(e^{λ1 t} + λ1 t e^{λ1 t}) + c t e^{λ1 t}
= (a λ1² + b λ1 + c) t e^{λ1 t} + (2a λ1 + b) e^{λ1 t}.     (4-17)
Since a λ1² + b λ1 + c = 0 by definition of λ1, and λ1 = −b/2a in our special case, both terms on the right vanish. Hence both u1(t) = e^{λ1 t} and u2(t) = t e^{λ1 t} are solutions of Lu = 0 (if b² − 4ac = 0), so u(t) = A e^{λ1 t} + B t e^{λ1 t} is a solution for any choice of A and B. It is possible to pick A and B to satisfy arbitrary initial conditions u(t0) = α, u̇(t0) = β:
A e^{λ1 t0} + B t0 e^{λ1 t0} = α,     (u(t0) = α)
A λ1 e^{λ1 t0} + B(1 + λ1 t0) e^{λ1 t0} = β.     (u̇(t0) = β)
These can be solved for A and B since
0 ≠ (1 + λ1 t0) e^{2λ1 t0} − λ1 t0 e^{2λ1 t0} = e^{2λ1 t0}.
Example: Solve ü + 6u̇ + 9u = 0 with the initial conditions u(1) = 2, u̇(1) = −1. Seeking a solution in the form e^{λt}, we are led to the characteristic equation λ² + 6λ + 9 = 0, which has λ1 = −3, λ2 = −3 as roots. Therefore u1(t) = e^{−3t} is a solution of Lu = 0. Since λ1 = λ2, another solution is u2(t) = t e^{−3t}. Thus u(t) = A e^{−3t} + B t e^{−3t} is a solution for any A and B. To solve for A and B in terms of the initial conditions, we must solve the algebraic equations
A e^{−3} + B · 1 · e^{−3} = 2,     (u(1) = 2)
−3A e^{−3} + B(1 − 3) e^{−3} = −1.     (u̇(1) = −1)
We find that A = −3e³ and B = 5e³. Thus
u(t) = −3e³ e^{−3t} + 5e³ t e^{−3t},
or, equivalently,
u(t) = −3e^{−3(t−1)} + 5t e^{−3(t−1)}.
Our results will now be collected as
Theorem 4.7. The initial value problem
aü + bu̇ + cu = 0, a ≠ 0, with u(t0) = α, u̇(t0) = β,
where a, b, and c are constants, has a unique solution.
i) If b² − 4ac ≠ 0, it is of the form
u(t) = A e^{λ1 t} + B e^{λ2 t}.
ii) If b² − 4ac = 0, so λ1 = λ2, it is of the form
u(t) = A e^{λ1 t} + B t e^{λ1 t}.
Here λ1 and λ2 are the roots of the characteristic equation aλ² + bλ + c = 0, and the constants A and B are determined from the initial conditions.
Remark: We have omitted the condition a > 0, b ≥ 0, c ≥ 0 from our theorem since the construction presented to find a solution did not depend on this. Uniqueness for that case is treated in Exercise 9, as we mentioned earlier.
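The recipe of Theorem 4.7 is entirely mechanical, so it can be sketched in code. The following is my own illustration, not part of the notes (the function name and the tolerance used to detect a repeated root are my choices); working over the complex numbers lets one treat distinct, repeated, and complex roots uniformly.

```python
# Sketch of the Theorem 4.7 recipe: find the characteristic roots, then fit
# A and B to the initial conditions. Assumes a != 0.
import cmath

def solve_ivp(a, b, c, t0, alpha, beta):
    """Return u(t) solving a*u'' + b*u' + c*u = 0, u(t0)=alpha, u'(t0)=beta."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    l1, l2 = (-b + disc) / (2 * a), (-b - disc) / (2 * a)
    if abs(l1 - l2) > 1e-12:   # case (i): u = A e^{l1 t} + B e^{l2 t} (Cramer's rule)
        det = (l2 - l1) * cmath.exp((l1 + l2) * t0)
        A = (alpha * l2 - beta) * cmath.exp(l2 * t0) / det
        B = (beta - l1 * alpha) * cmath.exp(l1 * t0) / det
        return lambda t: A * cmath.exp(l1 * t) + B * cmath.exp(l2 * t)
    # case (ii): repeated root, u = (A + B t) e^{l1 t}
    A = (alpha * (1 + l1 * t0) - beta * t0) * cmath.exp(-l1 * t0)
    B = (beta - l1 * alpha) * cmath.exp(-l1 * t0)
    return lambda t: (A + B * t) * cmath.exp(l1 * t)
```

For instance, solve_ivp(1, 3, 2, 0, 1, 0) reproduces the solution 2e^{−t} − e^{−2t} of the example above, and solve_ivp(1, 6, 9, 1, 2, −1) reproduces the repeated-root example.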
Another
Example: Solve ü − 2u̇ + 2u = 0 with the initial conditions u(0) = 1, u̇(0) = 1. The characteristic equation is λ² − 2λ + 2 = 0. Its roots are λ1 = 1 + i and λ2 = 1 − i. Since λ1 ≠ λ2, the solution is of the form u(t) = A e^{(1+i)t} + B e^{(1−i)t}. From the initial conditions, we find that
A + B = 1,     (u(0) = 1)
(1 + i)A + (1 − i)B = 1.     (u̇(0) = 1)
Thus A = 1/2, B = 1/2, so
u(t) = (1/2) e^{(1+i)t} + (1/2) e^{(1−i)t}.
Recalling that e^{x+iy} = e^x (cos y + i sin y), this solution may be written in a more familiar form:
u(t) = (1/2) e^t (cos t + i sin t) + (1/2) e^t (cos t − i sin t),
that is,
u(t) = e^t cos t.
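The cancellation of the imaginary parts can be seen numerically as well. This small check (my own addition, outside the notes) confirms that the complex form of the solution agrees with the real form e^t cos t:

```python
# The complex combination (1/2)e^{(1+i)t} + (1/2)e^{(1-i)t} collapses to e^t cos t.
import cmath, math

for t in [0.0, 0.5, 1.0, 2.0]:
    complex_form = 0.5 * cmath.exp((1 + 1j) * t) + 0.5 * cmath.exp((1 - 1j) * t)
    assert abs(complex_form.imag) < 1e-12                     # imaginary parts cancel
    assert abs(complex_form.real - math.exp(t) * math.cos(t)) < 1e-12
```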
What has been done can be summarized elegantly in the language of linear spaces. We have sought a solution of a second order linear O.D.E., which we write as Lu = 0. It was found that every solution of this equation could be expressed as a linear combination of two specific solutions u1 and u2, u(t) = A u1(t) + B u2(t), where the constants A and B are uniquely determined from u(t0) and u̇(t0). Thus, the set of functions u which satisfy Lu = 0 forms a two dimensional subspace of D(L) = C². The functions u1 and u2 span that subspace. If we call the set of all solutions of Lu = 0 the nullspace of L, N(L), then our result simply reads "dim N(L) = 2". A particular solution of Lu = 0 is found by specifying u(t0) and u̇(t0).
The inhomogeneous equation Lu = f is treated by finding a coset of the nullspace of L. For if u0 is a particular solution of the inhomogeneous equation, Lu0 = f, then u = ũ + u0, where ũ ∈ N(L), is also a solution, since Lu = L(ũ + u0) = Lũ + Lu0 = 0 + f = f. Therefore, if one solution u0 of the inhomogeneous equation Lu = f is found, the general solution is u = ũ + u0, where ũ ∈ N(L). In particular, the solution ũ ∈ N(L) can be chosen so that arbitrary initial conditions for u, u(t0) = α, u̇(t0) = β, can be met. We shall defer (until our systematic treatment of linear O.D.E.'s) presenting a general method for finding a solution u0 of the inhomogeneous equation. In our example, the particular solution will be found by guessing.
Example: Solve Lu := ü − u = 2t with the initial conditions u(0) = −1, u̇(0) = 3. The homogeneous equation Lu = 0 has the general solution ũ(t) = A e^t + B e^{−t}. We observe that the function u0(t) = −2t is a particular solution of the inhomogeneous equation Lu = 2t. Thus u(t) = A e^t + B e^{−t} − 2t. The initial conditions lead us to solve the following equations for A and B:
A + B = −1     (4-18)
A − B − 2 = 3.     (4-19)
A computation gives A = 2, B = −3. Thus the solution of our problem is
u(t) = 2e^t − 3e^{−t} − 2t.
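Here is a small numerical check (an addition of mine, not part of the notes) that the formula above satisfies ü − u = 2t together with the initial conditions:

```python
# Check u(t) = 2e^t - 3e^{-t} - 2t against u'' - u = 2t, u(0) = -1, u'(0) = 3.
import math

u   = lambda t: 2 * math.exp(t) - 3 * math.exp(-t) - 2 * t
up  = lambda t: 2 * math.exp(t) + 3 * math.exp(-t) - 2   # u'
upp = lambda t: 2 * math.exp(t) - 3 * math.exp(-t)       # u''

assert abs(u(0) + 1) < 1e-12 and abs(up(0) - 3) < 1e-12  # u(0) = -1, u'(0) = 3
for t in [0.0, 1.0, 2.5]:
    assert abs((upp(t) - u(t)) - 2 * t) < 1e-9           # u'' - u = 2t
```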
It is routine to verify that this function u(t) does satisfy the O.D.E. and the initial conditions (you should verify the solutions to check for algebraic mistakes).

Exercises
(1) Solve the following homogeneous initial value problems.
(a). ü − u = 0, u(0) = 0, u̇(0) = 1.
(b). ü + u = 0, u(0) = 1, u̇(0) = 0.
(c). ü − 4u̇ + 5u = 0, u(0) = −1, u̇(0) = 2.
(d). ü + 2u̇ − 8u = 0, u(2) = 3, u̇(2) = 0.
(e). ü = 0, u(0) = 7, u̇(0) = 3.
(2) Solve the following inhomogeneous initial value problems by guessing a particular solution of the inhomogeneous equation. Check your answers.
(a) ü − u = t², u(0) = 0, u̇(0) = 0. [hint: Try u0(t) = a1 t² + a2 t + a3 and solve for a1, a2, a3.]
(b) ü − 4u̇ + 5u = sin t, u(0) = 1, u̇(0) = 0. [hint: Try u0(t) = a1 sin t + a2 cos t.]
(3) Consider an undamped harmonic oscillator with a sinusoidal forcing term, ü + n²u = sin γt. Find the general solution if γ² ≠ n² [try u0(t) = a1 sin γt + a2 cos γt for a particular solution]. What happens if γ → n? This is called resonance.
(4) You will discuss damping in this problem. Consider the equation ü + 2µu̇ + ku = 0, where µ > 0 and k > 0. We shall let γ = √(|µ² − k|).
(a) Light damping (µ² < k). Show that the solution is
u(t) = e^{−µt}(A cos γt + B sin γt),
and sketch a rough graph for the case A = 1, B = 0. This is the kind of oscillation you want for a pendulum clock, with µ small.
(b) Heavy damping (µ² > k). Show that the solution is
u(t) = e^{−µt}(A e^{γt} + B e^{−γt}).
Show that u(t) vanishes at most once. Sketch a graph for the two cases A = B = 1 and A = −1, B = 3. The first describes the oscillation of an ideal screen door, while the second describes the ideal oscillation of a slammed car door.
(5) It is often useful to study the oscillations described by ü + 2µu̇ + ku = 0 by sketching the solution in the (u, u̇) plane, or phase space as it is called. Investigate the curves for heavily and lightly damped oscillators. Show that the curve for a heavily damped oscillator will be a straight line through the origin for special initial conditions. What does the phase space curve look like for an undamped oscillator (µ = 0, k > 0)?
(6) Consider the linear operator Lu = aü + bu̇ + cu, where a, b, c are constants. We have seen that L e^{rt} = p(r) e^{rt}, where p(r) is the characteristic polynomial.
(a) If r is not one of the roots of the characteristic polynomial, observe that you can find a particular solution of Lu = e^{rt}. What is it?
(b) If neither r1 nor r2 is a root of the characteristic polynomial, find a particular solution of Lu = a1 e^{r1 t} + a2 e^{r2 t}, where a1 and a2 are specified constants.
(c) Use this procedure to find a particular solution of
i) ü − 4u = cosh t,     ii) ü + 4u = sin t.
(7) (a) Imitate our procedure and develop a theory for the first order homogeneous O.D.E. Lu := u̇ + bu = 0, where b is a constant. In particular, you should prove that there exists a unique solution satisfying the initial condition u(t0) = α, and give a recipe for finding it. Use your recipe to solve u̇ + 2u = 0, u(0) = 3.
(b) And now you will show us how to find a particular solution of the inhomogeneous equation Lu = f, where f(t) is some given continuous function and Lu := u̇ + bu. [hint: Try to find a function µ(t) such that µ(u̇ + bu) = d/dt(µu). Then integrate d/dt(µu) = µf, and solve for u.] Use your method to find a particular solution for u̇ + 2u = x, and then a solution of the same equation which satisfies the initial condition u(0) = 1.
(8) Find a solution of u‴ − 2u″ − u′ + 2u = 0 which satisfies the initial conditions u(0) = u′(0) = 0, u″(0) = 1. [hint: The cubic γ³ − 2γ² − γ + 2 has roots +1, −1, and 2.]
(9) You will prove the uniqueness theorem for the equation ü + bu̇ + cu = 0, where b and c are any constants (we have let a = 1, because if it is not 1, just divide the whole equation by a). The trick is to reduce this to the special case b ≥ 0, c ≥ 0, already done.
(a) Show that in order to prove the solution of
ü + bu̇ + cu = f, where u(t0) = α, u̇(t0) = β,
is unique, it is sufficient to prove that the only solution of
ẅ + bẇ + cw = 0, w(t0) = 0, ẇ(t0) = 0,
is w(t) ≡ 0.
(b) Define ϕ(t) by w(t) = e^{γt} ϕ(t). Observe: to prove w ≡ 0, it is sufficient to prove ϕ ≡ 0 (here γ is any constant). Use the differential equation and initial conditions for w to find the differential equation and initial conditions for ϕ. Show that γ can be picked so that the D.E. for ϕ is
ϕ̈ + b̃ϕ̇ + c̃ϕ = 0,
where b̃ and c̃ are positive. Deduce that ϕ ≡ 0, and from that, that w ≡ 0, completing the proof.
(10) A boundary value problem for the equation
ü + bu̇ + cu = 0
is to find a solution of the equation with given boundary values, say u(0) = α and u(1) = β. Assume b and c are real numbers.
(a) Show that a solution of the boundary value problem always exists if b² − 4c ≥ 0 (the case b² − 4c = 0 will have to be done separately).
(b) Prove that if b² − 4c ≥ 0, the solution is unique too. [I suggest letting u(t) = e^{γt} v(t), and then choosing γ so that the equation satisfied by v is of the form v̈ + c̃v = 0, where c̃ ≤ 0. The case c̃ = 0 is trivial. If c̃ < 0, can the solution have a positive maximum or negative minimum?]
(11) If a spring is hung vertically and a mass m placed at its end, an external force of
magnitude mg due to gravity is placed on the system. Assume there are no dissipative
forces of any kind.
(a) Set up the diﬀerential equation of motion. Remember that you must specify
which is the positive direction.
(b) If the tip of the spring is displaced a distance d by placing the mass on it (no
motion yet), so the equilibrium position is d below the unstretched end of the
spring, show that the spring constant k is given by k = mg/d .
(c) Let the body weigh 32 pounds, and d be 2 feet. Find the subsequent motion if
the body is initially displaced from rest one foot below its equilibrium position.
[Take |g | = 32 ft/sec 2 ].
(12) * Consider aü + bu̇ + cu = 0. If γ1 ≠ γ2 are the roots of the characteristic equation, observe that the function
ũ(t) = (e^{γ1 t} − e^{γ2 t}) / (γ1 − γ2)
is also a solution (it is a linear combination of e^{γ1 t} and e^{γ2 t}). Now pass to the limit γ2 → γ1 (leave γ1 fixed and let γ2 move) by using the Taylor series for e^{γt}. The function you get is then a "guess" for a second solution in the degenerate case γ1 = γ2. This supplies some motivation for the guess made earlier.
(13) * Consider Lu := ü + 2u = f, where f is given. You know how to solve
Lu = A sin nx (Exercise 6). Find a particular solution to the general inhomogeneous equation on the interval [−π, π] by expanding f in a Fourier series and then using superposition. Apply this to solve ü + 2u = x.
(14) Consider an undamped harmonic oscillator, whose motion is specified by u(t), where mü + ku = 0, k > 0. Show that the solution u(t) = A1 cos(√(k/m) t) + B1 sin(√(k/m) t) may be written in the form
u(t) = A sin(ωt + θ),
where A is the amplitude of the oscillation, ω = 2πν, ν is the frequency, and θ is the phase. Show that u(t) is periodic, u(t + T) = u(t), where the period T = 1/ν. Interpret the amplitude and phase and determine A, ω, and θ in terms of A1, B1, k, and m. [I suggest looking at a specific example and its graph first.]

4.3 Generalities on LX = Y

Undoubtedly the fundamental problem in the theory of linear (and nonlinear) operators is
to determine the nature of the range of an operator L . One particular aspect of this is the
vast problem of solving the equation
LX = Y
for X when Y is given to you. The question here is, “is a given Y in the range of L ?”,
or “can we ﬁnd some X such that LX = Y ?” If one can solve the problem uniquely for
any Y , then the solution is written as
X = L−1 (Y ),
where L−1 is the operator inverse to L , in the sense that L−1 L = I (so to solve LX = Y ,
apply L−1 , X = L−1 LX = L−1 Y ).
Let us give some examples, familiar and unfamiliar, of problems of the form LX = Y ,
where Y is given.
1. LX = (2x1 + 3x2 , x1 + 2x2 ), X ∈ R2 ,
L : R2 → R2 .
The problem of solving LX = Y where Y = (−1, 2) ∈ R2 is that of solving the two
equations
2x1 + 3x2 = −1
x1 + 2x2 = 2
for the two unknowns (x1 , x2 ) = X.
2. Lu = u″ + 2u′ + 3u, where u ∈ C2, L : C2 → C.
The problem of solving L(u) = x is that of solving the inhomogeneous ordinary differential equation
Lu := u″ + 2u′ + 3u = x
for u(x).
3. Lu = ∫0^π cos(x − t) u(t) dt, u ∈ C[0, π].
of solving the integral equation
π cos(x − t)u(t) dt = sin x Lu : =
0 for the function u . In this example, it is instructive to examine the range more closely.
Since cos(x − t) = cos x cos t + sin x sin t and since functions of x are constant with respect
to t integration, we see that Lu may be written as
π Lu : = cos x π cos t u(t) dt + sin x
0 sin t u(t) dt,
0 or
Lu : = α1 cos x + α2 sin x,
where the numbers α1 and α2 are
π α1 = π (cos t)u(t) dt; α2 = 0 (sin t)u(t) dt.
0 Thus, the range of L is the linear space spanned by cos x and sin x , which has dimension
two. This linear operator L therefore maps the inﬁnite dimensional space C [0, π ] into a
ﬁnite (two) dimensional space. In order to even have a chance of solving Lu = f for this
operator L , we ﬁrst check to see if f even lies in this two dimensional subspace (for if it
doesn’t, it is futile to go further). The particular function sin x does, so it is reasonable
to look for a solution - which we shall not do right now (however there are inﬁnitely many
2
solutions, among them u(x) = π sin x ).
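This computation is easy to check numerically. The sketch below is my own addition (the quadrature routine and the number of subintervals are arbitrary choices): it computes α1 and α2 for the claimed solution u(t) = (2/π) sin t and confirms that Lu = sin x.

```python
# Numerical check that u(t) = (2/pi) sin t solves the integral equation
# (Lu)(x) = integral_0^pi cos(x - t) u(t) dt = alpha1*cos x + alpha2*sin x = sin x.
import math

def simpson(f, a, b, n=1000):          # composite Simpson rule, n even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

u = lambda t: (2 / math.pi) * math.sin(t)
alpha1 = simpson(lambda t: math.cos(t) * u(t), 0, math.pi)
alpha2 = simpson(lambda t: math.sin(t) * u(t), 0, math.pi)

assert abs(alpha1) < 1e-9        # coefficient of cos x vanishes
assert abs(alpha2 - 1) < 1e-9    # coefficient of sin x is 1, so Lu = sin x
```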
One particularly important equation which arises frequently is the homogeneous equation
LX = 0,
which is the special case Y = 0 of the inhomogeneous equation,
LX = Y.
Since L is a linear operator, there is no problem in finding one solution of LX = 0, for X = 0 is a solution, the so-called trivial solution of the homogeneous equation. The problem is to find a non-trivial solution, or better yet, all solutions. In the previous section, this question was answered fully for the particular operator Lu = aü + bu̇ + cu, where a, b, and c are constants. Many of our results there generalize immediately, as we shall see now.
Definition: The set of all solutions of the homogeneous equation LX = 0, where L is a
in the domain of L which are mapped into zero by L ,
L : N(L) → 0. N(L) ⊂ D(L).
We have called N(L) the nullspace of L , not the null set because of
Theorem 4.8 . The nullspace of a linear operator L : V1 → V2 is a linear space, a subspace
of the domain of L .
Proof: Since the domain of L, D(L) = V1 , is a linear space and N(L) ⊂ D(L) , by
Theorem 2, p.142 all we need show is that the set N(L) is closed under multiplication by
scalars and under addition of vectors. Say X1 and X2 ∈ N(L) . Then LX1 = 0 and
LX2 = 0 . We must show that L(aX1 ) = 0 for any scalar a , and that L(X1 + X2 ) = 0 .
But L(aX1 ) = aL(X1 ) = a · 0 = 0 , and L(X1 + X2 ) = LX1 + LX2 = 0 + 0 = 0 . Thus
N(L) is a subspace of D(L) = V1 .
One important reason for examining the nullspace of a linear operator is that if N(L) is known, and if any one solution of the inhomogeneous equation is known, say LX1 = Y (where Y was given and X1 is the solution we know), then every solution of the inhomogeneous equation is of the form X̃ + X1, where X̃ ∈ N(L). In other words, every solution of LX = Y is in N(L) + X1, the X1 coset of the subspace N(L).
Theorem 4.9. Let L : V1 → V2 be a linear operator. If X1 and X2 are any two solutions of the inhomogeneous equation LX = Y, where Y is given, then X2 − X1 ∈ N(L); that is, X2 = X̃ + X1, where X̃ ∈ N(L).
Proof: Let X̃ = X2 − X1. We shall show that X̃ ∈ N(L):
LX̃ = L(X2 − X1) = LX2 − LX1 = Y − Y = 0.
By using this theorem, we see that if all solutions of the homogeneous equation LX = 0 are known (the nullspace of L), and if one solution of the inhomogeneous equation LX1 = Y is known, then all of the solutions of the inhomogeneous equation are known. This solution set of the inhomogeneous equation is the X1 coset of N(L).
Example 1: Let L : R2 → R2 be defined by
LX = (x1 + x2 , x1 − x2 ).
Then N(L) is the set of all points in R2 such that LX = 0, that is, which satisfy the equations
x1 + x2 = 0
x1 − x2 = 0.
Thus the nullspace of L consists of the intersection of the two lines x1 + x2 = 0 and x1 − x2 = 0. The only point on both lines is 0. Thus N(L) is just the point 0. To solve the inhomogeneous equation LX = Y, where Y = (1, 1), that is,
x1 + x2 = 1, x1 − x2 = 1,
we find one solution of it, X1 = (1, 0). Then every solution of the inhomogeneous equation is of the form X = X̃ + X1, where X̃ ∈ N(L). But since 0 is the only point in N(L), every solution is of the form X = 0 + X1 = X1. Thus every solution is exactly X1, which is the unique solution of LX = Y. This situation is a general one. Again, we also saw this for Lu = aü + bu̇ + cu.
Theorem 4.10. If the nullspace N(L) of the linear operator L consists only of 0, then the solution of the inhomogeneous equation LX = Y (if a solution exists) is unique. (Thus, if the nullspace contains only 0, then L is injective.)
Proof: Say there were two solutions, X1 and X2. Then LX1 = Y and LX2 = Y, which implies L(X2 − X1) = LX2 − LX1 = Y − Y = 0. Therefore X2 − X1 ∈ N(L). Since the only element of N(L) is 0, X2 − X1 = 0, or X1 = X2. In other words, the two solutions are the same.
Example 2: Let L : C2 → C be defined on functions u ∈ C2 by
Lu := a(x)u″ + b(x)u′ + c(x)u.
The nullspace of L consists of all solutions of the homogeneous equation Lu = 0. It turns out (see Chapter 6), as in the constant coefficient case, that every solution of this homogeneous O.D.E. has the form u = Au1 + Bu2, where u1 and u2 are any two linearly independent solutions of the equation, and where A and B are constants. Thus N(L) is a two dimensional space spanned by u1 and u2. If u0 is a particular solution of the inhomogeneous equation Lu0 = f, then all the solutions of Lu = f are just the elements of the u0 coset of N(L), that is, functions of the form u = ũ + u0, where ũ ∈ N(L).
With every linear operator L : V1 → V2, V1 = D(L), we have associated two other linear spaces, the nullspace N(L) ⊂ D(L) = V1 and the range R(L) ⊂ V2. There is a valuable and elegant way to connect D(L), N(L), and R(L). The result we are aiming at is certainly the most important theorem of this section.
We know that R(L) ⊂ V2. The space V2 may be of arbitrarily high dimension. However, since R(L) is the image of D(L), we suspect that R(L) can take up "no more room" than D(L). To be more precise,
dim R(L) ≤ dim D(L).
Thus, for example, if L : R2 → R17, we expect that the range of L is a subspace of dimension no more than two in R17. Not only is this a justifiable expectation, but even more is true.
If dim R(L) = dim D(L), essentially all of D(L) is carried over under the mapping. But if dim R(L) < dim D(L), what has happened to the remainder of D(L)? Let us look at N(L) ⊂ D(L). The elements of N(L) are all squashed into the zero element of V2. In other words, a set of dimension dim N(L) in V1 = D(L) is mapped into a set of dimension zero in V2. Does L decompose D(L) = V1 into two parts, N(L) and a complement Ñ(L), such that L maps N(L) into zero and the dimension of the remainder, Ñ(L), is preserved under L (so dim Ñ(L) = dim R(L))? Of course.
[a figure goes here]
Theorem 4.11. Let the linear operator L map V1 = D(L) into V2. If D(L) has finite
dimension, then
dim D(L) = dim R(L) + dim N(L).
Proof: Let Ñ(L) be a complement of N(L) (cf. pp. 163a-d). Since dim N(L) +
dim Ñ(L) = dim D(L) , it is sufficient to prove that dim Ñ(L) = dim R(L) .
For X ∈ V1 , we can write X = X1 + X2 , where X1 ∈ N(L) and X2 ∈ Ñ(L) . Now
LX = LX1 + LX2 = LX2 , so the image of D(L) is the same as the image of Ñ(L) . In addition, if
X2 ∈ Ñ(L) , then LX2 = 0 if and only if X2 = 0 , merely because Ñ(L) is a complement
of the nullspace. Let { θ1 , . . . , θk } be a basis for Ñ(L) . If X2 ∈ Ñ(L) , we can write
X2 = a1 θ1 + · · · + ak θk , and LX2 = a1 Lθ1 + · · · + ak Lθk . Let Lθ1 = Y1 , Lθ2 = Y2 , . . . , Lθk = Yk . Since the image
of Ñ(L) is R(L) , the vectors Y1 , . . . , Yk span R(L) . Thus, dim R(L) ≤ k = dim Ñ(L) .
To show that there is equality, dim R(L) = dim Ñ(L) , we prove that Y1 , . . . , Yk are
linearly independent. If c1 Y1 + · · · + ck Yk = 0 , then 0 = c1 Lθ1 + · · · + ck Lθk = L(c1 θ1 +
· · · + ck θk ) = LX̃ , where X̃ = c1 θ1 + · · · + ck θk ∈ Ñ(L) . However, for any X̃ ∈ Ñ(L) ,
we know that LX̃ = 0 implies X̃ = 0 , that is, c1 θ1 + · · · + ck θk = 0 . The linear independence
of θ1 , . . . , θk then shows that c1 = c2 = · · · = ck = 0 . The hypothesis c1 Y1 + · · · + ck Yk = 0 has led us to
conclude that the cj 's are all zero, that is, the Yj 's are linearly independent. Therefore
dim R(L) = dim Ñ(L) . Coupled with our first relationship, this proves the result.
Corollary 4.12 : dim R(L) ≤ dim D(L).
Proof: dim N(L) ≥ 0 .
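The counting in Theorem 4.11 and Corollary 4.12 can be checked numerically. The sketch below is a modern aside, not part of the notes: it computes dim R(L) as the rank of a matrix by Gaussian elimination, and the particular operator L : R3 → R2 is an assumed illustration.

```python
# Checking dim D(L) = dim R(L) + dim N(L) for an assumed operator L : R^3 -> R^2.
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                     # pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                          # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

L = [[1, 2, 3],
     [2, 4, 6]]            # second row is twice the first, so dim R(L) = 1
dim_domain = 3             # L is defined on all of R^3
dim_range = rank(L)
dim_nullspace = dim_domain - dim_range       # forced by Theorem 4.11
print(dim_range, dim_nullspace)              # 1 2
```

Here 1 + 2 = 3 = dim D(L) , as the theorem demands.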
Two examples, one an illustration, the other an application.
Example 1: Consider a projection operator, PA , mapping vectors from En into a subspace
A of En , where dim A = m < n . Let us first show that PA is a linear operator. If
e1 , . . . , em is an orthonormal basis for A , then for any X and Y in En ,

PA (X + Y ) = Σₖ₌₁ᵐ ⟨X + Y, ek ⟩ ek = Σₖ₌₁ᵐ ( ⟨X, ek ⟩ + ⟨Y, ek ⟩ ) ek
            = Σₖ₌₁ᵐ ⟨X, ek ⟩ ek + Σₖ₌₁ᵐ ⟨Y, ek ⟩ ek = PA X + PA Y.

Similarly, PA (aX ) = aPA X for every scalar a . Thus the projection operator is a linear
operator. Since R(PA ) = A and dim A = m , while dim En = n , we conclude that
dim N(PA ) = n − m . This could have been arrived at immediately, since PA will certainly
map everything perpendicular to A , that is A⊥ , into 0 (see fig. illustrating the case
E2 → A , where A is a line). Thus N(PA ) = A⊥ , so dim N(PA ) = dim A⊥ = n − m .
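Example 1 can be made concrete. Assuming E3 with A the x1 x2 -plane and its standard orthonormal basis (an illustrative choice, not from the text), the formula PA X = Σₖ ⟨X, ek ⟩ ek reads:

```python
# Orthogonal projection P_A onto a subspace A spanned by an orthonormal basis.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, basis):
    """P_A x = sum over k of <x, e_k> e_k (basis must be orthonormal)."""
    out = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)                          # the coefficient <x, e_k>
        out = [o + c * ei for o, ei in zip(out, e)]
    return out

basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]     # A = x1 x2 -plane, so m = 2, n = 3
print(project((3.0, -1.0, 4.0), basis))        # [3.0, -1.0, 0.0]
print(project((0.0, 0.0, 1.0), basis))         # [0.0, 0.0, 0.0]
```

The second line illustrates N(PA ) = A⊥ : everything perpendicular to A is squashed to zero, so dim N(PA ) = n − m = 1 here.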
Example 2: Define L : Rn → Rk by

LX = (a11 x1 + a12 x2 + · · · + a1n xn , a21 x1 + · · · + a2n xn , . . . , ak1 x1 + ak2 x2 + · · · + akn xn ),

where X = (x1 , x2 , . . . , xn ) ∈ Rn . If we let Y = (y1 , . . . , yk ) ∈ Rk , then writing Y = LX ,
the linear operator L may be defined by the k equations (for y1 , . . . , yk ) in n “unknowns”
(x1 , . . . , xn ) ,
a11 x1 + a12 x2 + · · · + a1n xn = y1
a21 x1 + a22 x2 + · · · + a2n xn = y2
.
.
.
ak1 x1 + ak2 x2 + · · · + akn xn = yk .
The problem of solving LX = Y , where Y is given, is that of solving k equations with n
“unknowns”.
Consider the special case k < n , when there are fewer equations than unknowns. Since
the range of L is contained in Rk , R(L) ⊂ Rk , then dim R(L) ≤ dim Rk = k . Because
D(L) = Rn , we also know that dim D(L) = dim Rn = n . Thus
dim N(L) = dim D(L) − dim R(L) ≥ n − k > 0.
However, if dim N(L) > 0 , then N(L) must contain something other than zero. Thus there
is at least one non-trivial solution X̃ of the homogeneous equation, LX̃ = 0 . Since aX̃ is
also a solution, where a is any scalar, there are, in fact, an infinite number of solutions.
Notice that the above was a non-constructive existence theorem. We proved that a
solution does exist but never gave a recipe to obtain it. One consequence of this result is
that, if dim N(L) > 0 , and if a solution of the inhomogeneous equation LX = Y exists, it
is not unique; for if LX1 = Y , then also L(X1 + X̃ ) = Y , where X̃ is any solution of the
homogeneous equation.
In the special case n = k and dim N(L) = 0 , a fascinating (and non-constructive)
theorem falls out of Theorem 11: the inhomogeneous equation LX = Y always has a
solution and the solution is unique. Put in more conventional terms, if there are the same
number of equations as unknowns, and if the only solution of the homogeneous equation
is zero, then the inhomogeneous equation always has a unique solution. Thus, if n = k ,
uniqueness implies existence.
Since dim N(L) = 0 , then dim R(L) = dim D(L) = n . However L : Rn → Rn in this
case (n = k ) . Since R(L) ⊂ Rn and dim R(L) = n , we see that R(L) must be all of Rn ,
that is, every Y ∈ Rn is in the range of L , which means that the inhomogeneous equation
LX = Y is solvable for every Y ∈ Rn . Theorem 10 gives the uniqueness. We shall obtain
a better theorem later.
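For n = k = 2 the principle can be tried out directly. In the sketch below (an aside; the particular 2 × 2 system is an assumed example), a nonzero determinant means the homogeneous system has only the zero solution, and Cramer's rule then delivers the unique solution of LX = Y for every Y .

```python
# Uniqueness implies existence for n = k: a 2x2 system solved by Cramer's rule.
from fractions import Fraction

def solve2(a, y):
    """Solve a[0][0]x1 + a[0][1]x2 = y[0], a[1][0]x1 + a[1][1]x2 = y[1]."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        # dim N(L) > 0: solutions, when they exist, are never unique
        raise ValueError("homogeneous equation has non-trivial solutions")
    x1 = Fraction(y[0] * a[1][1] - a[0][1] * y[1], det)
    x2 = Fraction(a[0][0] * y[1] - y[0] * a[1][0], det)
    return x1, x2

print(solve2([[1, 1], [1, -1]], (3, 1)))   # (Fraction(2, 1), Fraction(1, 1))
```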
Remark: Some people refer to dim R(L) as the rank of the linear operator L . We shall,
however, refer to it as the dimension of the range of L .
If L1 : V1 → V2 and L2 : V2 → V3 , it is easy to make a few statements about
dim R(L2 L1 ) .
Theorem 4.13 . If L1 : V1 → V2 and L2 : V3 → V4 , where V2 ⊂ V3 (so L2 L1 is
defined), then
dim R(L2 L1 ) ≤ min(dim R(L1 ), dim R(L2 )).
Proof: The last corollary states that an operator is like a funnel with respect to dimension:
the dimension can only get smaller or remain the same. After passing through two funnels,
we obtain no more than the smallest allowed through. One might think that there should be
equality in the formula. That this is not the case can be seen from the possibility illustrated
in the figure. Only the shaded stuff gets through.

Exercises
(1) Let L : Rn → Rn be deﬁned by
LX = (x1 , x2 , . . . , xk , 0, . . . , 0),
where X = (x1 , x2 , . . . xn ) ∈ Rn . Describe R(L) and N(L) . Compute dim R(L)
and dim N(L) .
(2) (a) Describe the range and nullspace of the linear operator L : R3 → R3 deﬁned by
LX = (x1 + x2 − x3 , 2x1 − x2 + x3 , x2 − x3 ), X = (x1 , x2 , x3 ) ∈ R3 .
(b) Compute dim R(L) and dim N(L) .
(c) Is (1, 2, 0) ∈ R(L) ? Is (1, 2, 1) ∈ N(L) ?
Is (1, 2, 2) ∈ N(L) ? Is (0, −1, −1) ∈ N(L) ?
(3) Let A = { u ∈ C2 [0, 1] : u(0) = u(1) = 0 } , and define L : A → C [0, 1] by Lu =
u'' + b(x)u' − u , where b(x) is some continuous function. Prove N(L) = 0 . [hint: If
u ∈ N(L) , can u have a positive maximum or negative minimum?]
(4) Consider the linear operator L : C [0, 1] → C [0, 1] defined by

(Lu)(x) = u(x) + 2 ∫₀¹ e^(x−t) u(t) dt.

(a) Find the nullspace of L .
(b) Solve Lu = 3e^x . Is the solution unique?
(c) Show that the unique solution of Lu = f , where f ∈ C [0, 1] , is

u(x) = f (x) − c e^x ,    where c = (2/3) ∫₀¹ e^(−t) f (t) dt.

(5) Let L : V → V (so Lk is defined for k = 0, 1, 2, . . . ). Prove that
(a) R(L) ⊂ N(L) if and only if L2 = 0 .
(b) N(L) ⊂ N(L2 ) ⊂ N(L3 ) ⊂ . . .
(c) R(L) ⊃ R(L2 ) ⊃ R(L3 ) ⊃ . . . .
(6) If L1 : V1 → V2 and L2 : V3 → V4 where V2 ⊂ V3 , Theorem 12 gives an upper bound
for dim R(L2 L1 ) .
(a) Prove the corresponding lower bound
dim R(L2 L1 ) ≥ dim R(L1 ) + dim R(L2 ) − dim V3 .
[hint: Prove the equivalent inequality dim R(L1 ) ≤ dim R(L2 L1 ) + dim N(L2 )
by letting Ṽ = R(L1 ) and applying Theorem 11 to L2 defined on Ṽ ].
(b) Prove: if dim N(L2 ) = 0 , then
dim R(L2 L1 ) = dim R(L1 ).

(c) If dim N(L1 ) = 0 , is it then true that dim R(L2 L1 ) = dim R(L2 ) ? Proof or
counterexample.
(d) If dim V1 = dim V2 = dim V3 and dim N(L1 ) = 0 , is it true that dim R(L2 L1 ) =
dim R(L2 ) ? Proof or counterexample.
(7) If L1 and L2 both map V1 → V2 , prove
|dim R(L1 ) − dim R(L2 )| ≤ dim R(L1 + L2 ).
(8) Consider the operator L : C2 [0, ∞) → C [0, ∞) defined by

Lu := u'' + 3u' + 2u.

(a) Describe N(L) . What is dim N(L) ? Is f (x) = sin x ∈ R(L) ?
(b) Consider the same operator L but mapping A into C [0, ∞) , where A = { u ∈
C2 [0, ∞) : u(0) + u'(0) = 0 } . Answer the same questions as part (a).
(c) Same as (b) but A = { u ∈ C2 [0, ∞) : u(1) + u'(1) = 0 } this time.

4.4 L: R1 → Rn . Parametrized Straight Lines.

Our study of particular linear operators begins with the simplest case: a linear operator
which maps a one-dimensional space R1 into an n dimensional space Rn . Since the
dimension of the range of L is no greater than that of the domain R1 and dim R1 = 1 ,
then
dim R(L) ≤ 1.
This proves
Remark: If L : R1 → Rn , then the dimension of the range of L is either one or zero.
The case dim R(L) = 0 is trivial, for then L must map all of R1 into a single point, and
that single point must be the origin since the range of L is a subspace. Thus, if dim R(L) =
0 , then L maps every point into 0. Without change, the same holds if L : V1 → V2 (where
V1 and V2 are any linear spaces) and dim R(L) = 0 . Not very profound.
If dim R(L) = 1 , then the subspace R(L) in Rn is a one dimensional subspace in the
n dimensional space Rn , that is, R(L) is a “straight line” through the origin of Rn . This
straight line is determined if any one point P ≠ 0 on it is known. Then there is a point
X1 ∈ R1 such that LX1 = P . Since R1 is one dimensional, it is spanned by any element
other than zero, so every X ∈ R1 can be written as X = sX1 . Therefore, if X is any
element of R1 ,
LX = L(sX1 ) = sLX1 = sP.
In other words, this last equation states that the range of L is a multiple of a particular
vector P , that is, a straight line through the origin.
Example: If L : R1 → R2 such that the point X1 = 2 ∈ R1 is mapped into the point P =
(1, −2) ∈ R2 , then

L : X = 2s → (s, −2s).

In particular, the point X = 3 (s = 3/2) is mapped into the point (3/2, −3) .

[a figure goes here]

In applications, the domain R1 usually represents time, while the range represents the
position of a particle. Then L : R1 → Rn is an operator which speciﬁes the position of a
particle at a given time. Since L is linear and L0 = 0 , the path of the particle must be
a straight line which passes through the origin at t = 0 . Later on in this section we shall
show how to treat the situation of a straight line not through the origin, while in Chapter
7 we shall examine curved paths (non-linear operators).
Example: This is the same example as before. L : R1 → R2 is such that at time t = 2 ∈ R1
a particle is at the point (1, −2) (while at t = 0 it is at the origin). At any time t = 2s ,
the particle is at (s, −2s) . In particular, at t = 3 (s = 3/2), the particle is at (3/2, −3) . It is
also convenient to rewrite the position (s, −2s) directly in terms of the time. Since t = 2s ,
the position at time t is (t/2, −t) . Thus we can write

L : t → (t/2, −t),
which clearly indicates the position at a given time. If a point in the space R2 is speciﬁed
by Y = (y1 , y2 ) ∈ R2 , then the operator can be written as
y1 = (1/2) t
y2 = −t.
All of these are useful ways to write the operator L . In some situations, one might be more
useful than another.
This brings us to an issue which perhaps seems a bit pedantic but can serve you well
in times of need. How can we represent the operator in a picture? There are three distinct
ways. Some clarity can be gained by distinguishing them carefully. The same ideas carry
over immediately to nonlinear operators.
Our ﬁrst picture has two parts. If L : R1 → Rn , then the ﬁrst part is a diagram of R1 ,
the second part a diagram of Rn , and between them are arrows to indicate the image of
each point in R1 . The picture below the ﬁrst example was of this type. All of the arrows
get in the way, so a more convenient picture is needed. That comes next.
The second picture is the graph of an operator L . The graph L : R1 → Rn is the set
of points (X, LX ) in the Cartesian product space R1 × Rn . Thus, if R1 is time, and Rn
space, with L assigning a position to every time, then the points on the graph are points
in time-space (X, LX ) . For the previous example, these are the points (t, (t/2, −t)) in
R1 × R2 , a straight line in time-space (or space-time if you prefer). To each time, there
is a unique point in space. In a sense, this second picture, the graph, associated with an
operator results from gluing together the two pieces of the ﬁrst picture. By using the graph
of an operator, we avoid the arrow mess of the ﬁrst picture.
The third picture just indicates the range of an operator (when thinking of pictures,
the range is often referred to as the path of the operator). In terms of the time- position
example, this picture only shows the path of a particle in space and ignores when a particle
had a given position. Thus, this picture is the second half of the ﬁrst picture. From our
physical interpretation, it is clear that two diﬀerent operators might have the same path (for
two particles could travel the same path without having the same position at every time).
Thus, this picture is an incomplete representation of an operator.

Example: If L̃ : R1 → R2 such that the point X1 = 1 ∈ R1 is mapped into the point
P = (1, −2) ∈ R2 , then

L̃ : X = s · 1 → (s, −2s).

In particular, the point X = 3 (s = 3) is mapped into the point (3, −6) . The graph of L̃
is the set of points (s, s, −2s) , which is a straight line in R1 × R2 . Compare this with the
operator L considered previously (we remind you that L : X = 2s → (s, −2s) ). The graph
of L was the set of points (2s, s, −2s) . These two sets of points, the graphs of L̃ and L ,
respectively, do not coincide, since the operators are not the same. On the other hand, the
path of L̃ is the set of points (s, −2s) , which is exactly the same set of points as the path of
L .

Shortly, we shall ask the question, how can we describe a straight line in Rn ? One way
is to ﬁnd an operator whose path is that straight line. Since many operators have the same
path, there will be many possible ways to describe the straight line. All we need do is pick
one, any one will do.
Let L : R1 → Rn and let Y0 be some fixed point in Rn . Consider the operator M X :=
LX + Y0 . Since M 0 = L0 + Y0 = Y0 ≠ 0 (assuming Y0 ≠ 0 ), we see that M is not a linear
operator; it is called an affine operator or affine mapping. The range of M is the subspace
R(L) translated by the vector Y0 , a straight line which does not necessarily pass through
the origin (it will if and only if Y0 ∈ R(L) ). In other words, R(M ) is the Y0 coset of the
subspace R(L) .
Example: Take L to be the same as before, so L : X = 2s → (s, −2s) , or L(2s) = (s, −2s) .
Let Y0 = (−3, 2) . Then M X := LX + Y0 = (s, −2s) + (−3, 2) = (s − 3, −2s + 2) , where
X = 2s . In particular, M maps the point X = 3 ∈ R1 (s = 3/2) into (−3/2, −1) ∈ R2 . The
figure shows the paths of L and M . Since X = 2s , we can eliminate s from the above
formula and write

M X = ( (1/2)X − 3, −X + 2 ),    X ∈ R1 .

If we denote by Y = (y1 , y2 ) a general point in R2 , then M may be written in the standard
form

y1 = (1/2)X − 3
y2 = −X + 2.

Of course, one could eliminate X from these too and be left with 2y1 + y2 = −4 , which is
the equation of the path and could come from any mapping with the same path.
It is instructive to investigate the reverse question, given two points P and Q in Rn ,
ﬁnd an equation for the straight line passing through them. Any mapping whose path is
the desired line will do. We have learned that M X = LX + Y0 is the general equation
of a straight line through Y0 . There is complete freedom in specifying which points are
mapped into P and Q , so we would be foolish not to pick the simplest case. Let
M : 0 → P and M : 1 → Q . Then P = M (0) = L(0) + Y0 = Y0 , so Y0 = P , and
Q = M (1) = L(1) + Y0 = L(1) + P , so L : 1 → Q − P . This completely determines M
(since L is determined once the image of one point is known, L : 1 → Q − P , and the
vector Y0 is also determined, Y0 = P ).
Example: Find an equation for the straight line passing through the two points P =
(1, 2, −3, −4) , Q = (−1, 3, 2, −2) in R4 . Say P is the image of 0 and Q is the image of
1, so M : 0 → P and M : 1 → Q . Then since M X = LX + Y0 ⇒ P = M (0) = Y0 so
Y0 = (1, 2, −3, −4) . Also Q = L(1) + Y0 ⇒ L(1) = Q − Y0 = Q − P = (−2, 1, 5, 2) . Because
every X ∈ R1 can be written as X = s · 1 , LX = L(s · 1) = sL(1) = s(−2, 1, 5, 2) , or
LX = (−2s, s, 5s, 2s) , where X = s · 1 ∈ R1 . Thus M X = LX + Y0 = (−2s, s, 5s, 2s) +
(1, 2, −3, −4) , or

M X = (−2s + 1, s + 2, 5s − 3, 2s − 4),    where X = s · 1 ∈ R1 .
If we use Y = (y1 , y2 , y3 , y4 ) to indicate a general point in R4 , then M : R1 → R4 can be
written as four equations.
y1 = −2s + 1
y2 = s + 2
y3 = 5s − 3
y4 = 2s − 4
where X = s · 1 ∈ R1 . For example, the image of X = 2 (s = 2) in R1 is the point
Y = (−3, 4, 7, 0) ∈ R4 .
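The computation in this example is easy to mechanize; the following lines (a modern aside) reproduce the mapping M X = LX + Y0 = P + s(Q − P ) and its values:

```python
# The affine mapping through P = M(0) and Q = M(1) from the example.
P = (1, 2, -3, -4)
Q = (-1, 3, 2, -2)

def M(s):
    # M(s) = P + s(Q - P); here L(s) = s(Q - P) and Y0 = P
    return tuple(p + s * (q - p) for p, q in zip(P, Q))

print(M(0))   # (1, 2, -3, -4), which is P
print(M(1))   # (-1, 3, 2, -2), which is Q
print(M(2))   # (-3, 4, 7, 0), the image of X = 2 (s = 2)
```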
The discussion before the example contained most of the proof of Theorem 13: Let P
and Q be two points in Rn . Then the affine mapping

M X = P + s(Q − P ),    where X = s · 1 ∈ R1 ,

has as its path the straight line passing through P and Q .
Remark 1: The affine mapping M̃ X = P + ks(Q − P ) , where k ≠ 0 is some constant,
has the same path too. The only change is that while M : 0 → P and M : 1 → Q , this
mapping has M̃ : 0 → P and M̃ : 1/k → Q . In other words, for M̃ we have chosen to take
1/k (not 1) as the pre-image of Q . This pre-image of Q was entirely arbitrary anyway.
Remark: 2 The equation M X = P + s(Q − P ) of M : R1 → Rn , where X = s · 1 ∈ R1
is called a parametric equation of the straight line which passes through P and Q in
Rn , and s is called the parameter. Other parametric equations of the same line arise if
X = ks · 1 ∈ R1 (cf. Remark 1), where k is some non-zero constant.
In order to introduce the slope of a straight line, let us paraphrase the last few paragraphs
in terms of particle motion. If P and Q are two points in Rn , then M t = P + t(Q − P ) ,
M : R1 → Rn , where t ∈ R1 , describes the position of the particle at time
t . At t = 0 the particle is at P , while at t = 1 the particle is at Q . Another particle
moving k times as fast has the position M̃ t = P + kt(Q − P ) . This other particle is also
at P when t = 0 , but takes time t = 1/k to reach the point Q . It still has the same path
as the first particle. If we denote by Y = (y1 , y2 , . . . , yn ) an arbitrary point in Rn , then
the position Y at time t is

y1 = p1 + kt(q1 − p1 )
y2 = p2 + kt(q2 − p2 )
. . .
yn = pn + kt(qn − pn ).
yn = pn + kt(qn − pn ). Now consider the mapping M t = P + kt(Q − P ) . The derivative at t = t1 is
dM
dt t=t1 = lim t2 →t1 M (t2 ) − M (t1 )
t2 − t1 4.4. L : R1 → RN . PARAMETRIZED STRAIGHT LINES. 181 It represents the velocity at t = t1 . To have this make sense, we must introduce a norm
in Rn so that the limit can be deﬁned. Use the Euclidean norm (although any other one
could be used, for it turns out that there is no need for a limit in the case of a straight line).
Since M (t2 ) − M (t1 ) = P + kt2 (Q − P ) − [P + kt1 (Q − P )] = k (t2 − t1 )(Q − P ) , we have
[M (t2 ) − M (t1 )] / (t2 − t1 ) = k (Q − P ),

so

dM/dt (t) = k (Q − P ).
Because this is independent of t , it is the derivative at any time t . Thus, the derivative is
a vector, k (Q − P ) . The derivative represents the velocity of a particle moving on the line.
The speed is the length of the velocity vector, speed = ‖k (Q − P )‖ . What is the slope of
the line? Since the line is the path of a mapping, it should not depend on which mapping
is used. In terms of mechanics, the slope should not depend on the speed of the particle
moving along the line, but only that it moved along the straight line, that is its velocity
vector was along the line. Thus we deﬁne the slope as a unit vector in the direction of the
velocity. In our case, slope = (Q − P )/‖Q − P ‖ . This is a unit vector from P to Q and
only depends upon the mapping to specify a positive direction (orientation) for the line.
Example: A particle moves on a straight line from P = (1, −2, 1) at t = 0 to Q =
(3, 1, −5) at t = 2 . Find the position of the particle as a function of time, the velocity and
speed of the particle, and the slope of the path.
The equation of the path is M t = P + kt(Q − P ) , where k is determined from
Q = M (2) = P + 2k (Q − P ) , so k = 1/2 . Thus M (t) = (1, −2, 1) + (1/2) t(2, 3, −6) =
(1 + t, −2 + (3/2) t, 1 − 3t) . Velocity = (1/2)(Q − P ) = (1, 3/2, −3) . Speed = ‖velocity‖ = 7/2 .
Slope = velocity/speed = (2/7, 3/7, −6/7) .
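The arithmetic of this example can be confirmed in a few lines (an aside in modern notation):

```python
# Particle from P at t = 0 to Q at t = 2: position, velocity, speed, slope.
from math import sqrt

P = (1.0, -2.0, 1.0)
Q = (3.0, 1.0, -5.0)
k = 0.5                                               # from Q = M(2) = P + 2k(Q - P)

velocity = tuple(k * (q - p) for p, q in zip(P, Q))   # k(Q - P) = (1, 3/2, -3)
speed = sqrt(sum(v * v for v in velocity))            # ||velocity|| = 7/2
slope = tuple(v / speed for v in velocity)            # unit vector (2/7, 3/7, -6/7)

def M(t):
    return tuple(p + v * t for p, v in zip(P, velocity))

print(M(2.0))            # (3.0, 1.0, -5.0): the particle reaches Q at t = 2
print(speed)             # 3.5
```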
A glance at the formulas which precede the example reveals that the position of a
particle which moves along a straight line through P can be written in any of the forms

1. M (t) = P + kt(Q − P ) , where Q is another point on the path and the particle is at Q when t = 1/k , or
2. M (t) = P + (dM/dt) t , or
3. M (t) = P + V t , where V is the velocity.

See Exercise 5 too.

Exercises
(1) (a) If L : R1 → R2 such that the point X1 = 3 ∈ R1 is mapped into P = (1, 0) ,
which of the following points are in R(L) i) (2, 0) , ii) (1, 2) , iii) (−1, 0) ?
(b) Sketch two pictures, one of the graph of L , the other of the path of L .
(c) Find another operator L̃ : R1 → R2 whose path is the same as that for L .

(2) Find a mapping whose path is the straight line passing through the points (2, −1, 3)
and (1, −3, −5) . Find its slope too.
(3) If a point is at (1, −1, 0) at t = 0 and at (2, 3, 8) at t = 3 , ﬁnd the position as a
function of time if the particle moves along a straight line. What is the velocity and
speed of the particle?
(4) If a particle is initially at (0, 1, 0, 1) and has constant velocity (1, −2, 3, −1) , ﬁnd its
position as a function of time. Where is it at t = 3 ?
(5) A particle moves along a straight line in such a way that at t = t0 it is at P̃ , while
at t = t1 it is at Q̃ .
(a) Show that its position M (t) as a function of time is

M (t) = P̃ + (t − t0 ) (Q̃ − P̃ ) / (t1 − t0 ).
(b) What is the velocity?
(c) Show that

M (t) = M (t0 ) + (dM/dt)(t − t0 ).

(6) Two straight lines are parallel if they have the same slope. If M (t) = P + t(Q − P ) is
a parametric equation of one line, find an equation for the parallel line which passes
through the point P̃ .

4.5 L: Rn → R1 . Hyperplanes.

Whereas in the previous section we examined linear mappings from a one-dimensional linear
space into an n dimensional space, now we shall look at the opposite extreme, linear
mappings from an n dimensional space into a one-dimensional space.
Let L : Rn → R1 . We would like to ﬁnd a representation theorem for this linear
operator. The most natural way to do this is to work with a basis { e1 , . . . , en } for Rn .
Then every X ∈ Rn can be written as X = Σₖ₌₁ⁿ xk ek . Consequently,

LX = L( Σₖ₌₁ⁿ xk ek ) = Σₖ₌₁ⁿ L(xk ek ) = Σₖ₌₁ⁿ xk L(ek ).

It is clear that LX is determined once we know all the numbers Lek . In other words, the
linear mapping L is determined by the eﬀect of the mapping on a basis for the domain of
the operator. This proves
Theorem 4.14 . Let L : Rn → R1 linearly. If { ek } is a basis for the domain of L , Rn ,
then

LX = a1 x1 + a2 x2 + . . . + an xn = Σₖ₌₁ⁿ ak xk ,

where X = Σₖ₌₁ⁿ xk ek and ak = Lek . Notice that the ak are scalars, since they are in the
range of L —and the range of L is R1 by hypothesis.
Examples:
(1) Consider the linear operator L : R3 → R1 which maps L : e1 = (1, 0, 0) → 1 , L : e2 =
(0, 1, 0) → 0 , and L : e3 = (0, 0, 1) → 0 . Since the ek constitute a basis for R3 , the
mapping L is completely determined by using Theorem 14. If X = (x1 , x2 , x3 ) ∈ R3 ,
then X = x1 e1 + x2 e2 + x3 e3 . Thus
LX = x1 Le1 + x2 Le2 + x3 Le3 = x1 · 1 + x2 · 0 + x3 · 0,
or
LX = x1 .
For example, L : (2, 1, 7) → 2 . The nullspace of L —those points X ∈ R3 such that
LX = 0 —consists of the points X = (x1 , x2 , x3 ) ∈ R3 such that x1 = 0 , which is the x2 x3
plane.
(2) Let L : R4 → R1 such that Le1 = 1 , Le2 = −2 , Le3 = 5 , Le4 = −3 , where
e1 = (1, 0, 0, 0) , e2 = etc. Then if X = (x1 , x2 , x3 , x4 ) ∈ R4 , we have
LX = x1 − 2x2 + 5x3 − 3x4 .
The nullspace of L is again a hyperplane, the hyperplane x1 − 2x2 + 5x3 − 3x4 = 0
in R4 .
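Theorem 14 in action: the functional of Example 2 is pinned down by the four numbers ak = Lek . The code below is an illustrative aside using the data of that example.

```python
# A linear functional L : R^4 -> R^1 determined by its values on the basis.
a = (1, -2, 5, -3)                  # a_k = L e_k, as in Example 2

def L(x):
    """L X = a1 x1 + a2 x2 + a3 x3 + a4 x4."""
    return sum(ak * xk for ak, xk in zip(a, x))

print(L((1, 0, 0, 0)))   # 1, the value L e1
print(L((1, 1, 1, 1)))   # 1 - 2 + 5 - 3 = 1
print(L((2, 1, 0, 0)))   # 0: this point lies on the hyperplane N(L)
```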
So far we have not given any attention to the range of L , all of our pictures being in the
domain of L . Since the range is R1 , its picture is a simple straight line which is not very
interesting. However the graph of L is interesting. Let L : Rn → R1 and Y = (y ) ∈ R1 .
Then
y = a1 x1 + . . . + an xn .
The graph of L is the set of points (X, LX ) [or (X, Y ) where Y = LX ] in Rn × R1 ≅ Rn+1 .
A point (X, Y ) = (x1 , . . . , xn , y ) ∈ Rn × R1 is on the graph if the coordinates satisfy
the equation y = a1 x1 + . . . + an xn . This equation can be written as 0 = a1 x1 + . . . +
an xn + (−1)y , which is a hyperplane in Rn+1 .
Thus we have found two ways to associate a hyperplane with L : Rn → R1 ,
i) All X such that LX = 0 , which is the nullspace of L , a linear space of dimension
n − 1 (since dim N(L) = dim D(L) − dim R(L) = n − 1) .
ii) The graph of L , that is, all points of the form (X, LX ) , which is a linear space of
dimension n in Rn+1 .
Although this is confusing, both ways are used in practice, whichever is most convenient
for the problem at hand. For the remainder of this section, we shall conﬁne our attention
to hyperplanes deﬁned in the ﬁrst way.
Since linear mappings L : Rn → R1 all have the form LX = a1 x1 + . . . + an xn , it
is natural to think of the sum as the scalar product ⟨N, X ⟩ of the vectors N = (a1 , . . . , an )
and X = (x1 , . . . , xn ) . Theorem 14 may therefore be rephrased as
Theorem 4.15 . If L : Rn → R1 , then LX = ⟨N, X ⟩ , where N is the vector N =
(Le1 , . . . , Len ) and { ek } form a basis for Rn .

Remark: The vector N is an element of the so-called dual space of Rn . From the above,
it is clear that the dual space of Rn also has dimension n .
Theorem 14' is a “representation theorem”. It states that every linear mapping L : Rn →
R1 may be represented in the form LX := ⟨N, X ⟩ for some vector N which depends on
L . You may wish to think of N as a vector perpendicular to the hyperplane LX = 0 (cf.
Ex. 8, p. 225).
Example: Consider the operator L of Example 2 in this section. For it, LX = ⟨N, X ⟩ ,
where N is the particular vector N = (1, −2, 5, −3) .
Recall that a linear functional is a linear operator l whose range is R1 . Since the
operators L : Rn → R1 we are considering have range R1 , they are all linear functionals.
We may again rephrase Theorem 14 in this language. It states that every linear functional
defined on Rn may be represented in the form l(X ) = ⟨N, X ⟩ , where N depends on the
functional l at hand. This is just a restatement of Theorem 14 with the realization that
our L ’s are linear functionals. Don’t let the excess language bewilder you.
So far in this section, we have concentrated our attention on the algebraic representation
of a linear operator (functional) L : Rn → R1 . Let us turn to geometry for a bit. In passing
we observed that the nullspace of the operator was a hyperplane in the domain of L (a
hyperplane in a linear space V is a “ﬂat” subset of V whose dimension is one less than V ,
that is, of codimension one). These hyperplanes, { X ∈ Rn : LX = 0 } , all passed through
the origin of Rn . A plane parallel to this one which passes through the particular point
X 0 ∈ Rn has the form
L(X − X 0 ) = 0.
It is clear that the point X = X 0 does satisfy the equation. From the representation
theorem,
L(X − X 0 ) = a1 (x1 − x1⁰ ) + a2 (x2 − x2⁰ ) + . . . + an (xn − xn⁰ ) = 0,

is the equation of this hyperplane, where X = (x1 , x2 , . . . , xn ) and X 0 = (x1⁰ , x2⁰ , . . . , xn⁰ ) .
If we again write N = (a1 , a2 , . . . , an ) , then the equation of the hyperplane is
⟨N, X − X 0 ⟩ = 0,

that is, all vectors X such that X − X 0 is perpendicular to N .
Examples:
(1) Find the equation of a plane which passes through the point X 0 = (1, 2, −5) and is
parallel to the plane −2x1 + 7x2 + 4x3 = 0 .
Solution : Here N = (−2, 7, 4) , X = (x1 , x2 , x3 ) , so the plane has the equation
0 = ⟨N, X − X 0 ⟩ = −2(x1 − 1) + 7(x2 − 2) + 4(x3 + 5),
which may be written as
−2x1 + 7x2 + 4x3 = −8.
The equation has been cooked up so that X 0 = (1, 2, −5) does satisfy it.
(2) Find the equation of a plane which passes through the point X 0 = (1, 2, −5) and is
parallel to the plane −2x1 + 7x2 + 4x3 = 37 .
Solution : Since this plane is also parallel to the plane −2x1 + 7x2 + 4x3 = 0 , the
solution is that of Example 1.

(3) Find the equation of the plane in R4 which is perpendicular to the vector N =
(1, −2, 3, 1) and passes through the point X 0 = (1, 0, 1, −1) . Easy. The plane is all
points X such that
⟨N, X − X 0 ⟩ = 0,
that is
(x1 − 1) − 2(x2 − 0) + 3(x3 − 1) + (x4 + 1) = 0,
or
x1 − 2x2 + 3x3 + x4 = 3.
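Example 3 is easy to verify mechanically; the membership test ⟨N, X − X 0 ⟩ = 0 below uses the data of the example (the extra test points are assumed for illustration):

```python
# The plane in R^4 through X0 = (1, 0, 1, -1) perpendicular to N = (1, -2, 3, 1).
N = (1, -2, 3, 1)
X0 = (1, 0, 1, -1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def on_plane(X):
    """True exactly when <N, X - X0> = 0."""
    return dot(N, tuple(x - x0 for x, x0 in zip(X, X0))) == 0

print(on_plane(X0))              # True: X0 itself satisfies the equation
print(on_plane((0, 0, 1, 0)))    # True, since 0 - 0 + 3 + 0 = 3
print(on_plane((0, 0, 0, 0)))    # False: the plane misses the origin
```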
(4) Find the equation of the plane in R3 which passes through the three points
X 1 = (7, 0, 0), X 2 = (1, 0, −2), X 3 = (0, 5, 1).

We shall find this by using the general equation of a plane,

a1 (x1 − x1⁰ ) + a2 (x2 − x2⁰ ) + a3 (x3 − x3⁰ ) = 0.

Here X 0 = (x1⁰ , x2⁰ , x3⁰ ) is a particular point on the plane. We may use any of X 1 , X 2 ,
or X 3 for it. Since X 1 is simplest, we take X 0 = (7, 0, 0) . All that remains is to
find the coefficients a1 , a2 , and a3 in

a1 (x1 − 7) + a2 x2 + a3 x3 = 0.
Since X 2 and X 3 are in the plane (and so must satisfy its equation), the substitution
X = X 2 and X = X 3 yields two equations for the coeﬃcients,
a1 (1 − 7) + a2 · 0 + a3 (−2) = 0
a1 (0 − 7) + a2 (5) + a3 (1) = 0.
These two equations in three unknowns may be solved for any two in terms of the
third. We ﬁnd a3 = −3a1 and a2 = 2a1 , so the equation is
a1 (x1 − 7) + 2a1 x2 − 3a1 x3 = 0.
Factoring out the coeﬃcient a1 , we obtain the desired equation
x1 − 7 + 2x2 − 3x3 = 0.
(It is clear from the general equation of a plane that the coefficients are determined
only to within a constant multiple.)

Exercises
(1) Let L : R2 → R1 map
L : (1, 0) → 3, L : (0, 1) → −2. Write LX in the form LX = a1 x1 + a2 x2 . L : (7, 3) → ?

(2) Let L : R2 → R1 map
L : (2, 1) → 1, L : (0, 3) → −2. Write LX in the form LX = a1 x1 + a2 x2 . L : (7, 3) → ?
(3) Find the equation of a plane in R3 which passes through the point (3, −1, 2) and is
parallel to the plane x1 − x2 − 2x3 = 7 .
(4) Find the equation of a plane in R5 which is perpendicular to the vector N =
(6, 2, −3, 1, −1) and contains the point (1, 1, 1, 1, 4) .
(5) Find the equation of a plane in R4 which contains the four points X1 = (2, 0, 0, 0) ,
X2 = (1, 0, 2, 0) , X3 = (0, −1, 0, −1) , X4 = (3, 0, 1, 1) .
(6) In this problem, you will have to use the norm induced by the scalar product.
(a) Show that the distance between the point Y ∈ Rn and the plane A = { X ∈
Rn : ⟨N, X − X 0 ⟩ = 0 } is

d(Y, A) = |⟨N, Y − X 0 ⟩| / ‖N ‖ .

(b) Prove that the distance between the parallel planes A = { X ∈ Rn : ⟨N, X − X 1 ⟩ =
0 } and B = { X ∈ Rn : ⟨N, X − X 2 ⟩ = 0 } is

d(A, B ) = |⟨N, X 2 − X 1 ⟩| / ‖N ‖ .

Chapter 5

Matrices and the Matrix Representation of a Linear Operator

5.1 L: Rm → Rn .

The simplest example of a linear operator L which maps Rm into Rn is supplied by n
linear algebraic equations with m variables. Let X = (x1 , . . . , xm ) ∈ Rm . Then we define

      ( a11 x1 + a12 x2 + · · · + a1m xm )
 LX = ( a21 x1 + a22 x2 + · · · + a2m xm )          (5-1)
      ( . . . . . . . . . . . . . . . . )
      ( an1 x1 + an2 x2 + · · · + anm xm )

Notice the right side of this equation is a (column) vector with n components. If we let
Y = (y1 , . . . , yn ) , then the equation LX = Y , or

Σⱼ₌₁ᵐ aij xj = yi ,    i = 1, 2, . . . , n,

determines a vector Y in Rn for every X in Rm . Since the operator L is essentially
specified by the coefficients a11 , a12 , . . . , anm , it is convenient to represent it by the notation

     ( a11 a12 · · · a1m )
 L = ( a21 a22 · · · a2m )
     ( . . . . . . . . . )
     ( an1 an2 · · · anm ) ,

and use the notation

      ( a11 a12 · · · a1m ) ( x1 )
 LX = ( a21 a22 · · · a2m ) ( x2 )          (5-2)
      ( . . . . . . . . . ) ( . . )
      ( an1 an2 · · · anm ) ( xm ) .

The ordered array of coefficients, with n rows and m columns, is called the matrix
associated with L , and the
numbers aij are called the elements of the matrix. The ﬁrst index i refers to the row while
187 188 CHAPTER 5. MATRIX REPRESENTATION the second index j refers to the column. We may also write L = ((aij )) as a shorthand
to refer to the whole matrix. Since we shall only use linear operators in this chapter it
is convenient to drop the letter L for the operator and use A = ((aij )) instead. This
will facilitate the notation when referring to other matrices B = ((bij )) , etc., since there
will be enough subscripts without adding to the confusion by using L1 , L2 , etc. for linear
operators.
In this section we shall work out the meaning of operator algebra applied to the special
case of operators L : Rm → Rn which are represented by matrices. It turns out that every
operator L : Rm → Rn can be represented by a matrix (proved later in this very section).
Let us ﬁrst i) deﬁne equality, ii) exhibit the matrices for the zero operator O(X ) = 0
(additive identity). If A = ((aij )) and B = ((bij )) both map Rm → Rn , then by deﬁnition,
A = B if and only if AX = BX for every X ∈ Rm , that is, for all X = (x1 , x2 , . . . , xm ) ,
ai1 x1 + ai2 x2 + · · · + aim xm = bi1 x1 + · · · + bim xm ,    i = 1, 2, . . . , n,

or

Σ_{j=1}^{m} aij xj = Σ_{j=1}^{m} bij xj ,    i = 1, 2, . . . , n.

Subtracting, we find that

Σ_{j=1}^{m} (aij − bij )xj = 0,    i = 1, 2, . . . , n

must hold for any choice of X = (x1 , x2 , . . . , xm ) . From the particular choice X = (1, 0, 0, . . . , 0) , we see that

ai1 − bi1 = 0,    i = 1, 2, . . . , n,

that is,

a11 = b11 , a21 = b21 , . . . , an1 = bn1 .
Similarly, by using other vectors X , we conclude
Theorem 5.1 (equality). If A = ((aij )) and B = ((bij )) both map Rm → Rn , then
A = B if and only if the corresponding elements of their matrices are equal,
aij = bij ,    i = 1, 2, . . . , n,  j = 1, 2, . . . , m.

It is clear that the n × m matrix all of whose elements are zero,

        0 0 · · · 0
0 =     0 0 · · · 0
        . . . . . .
        0 0 · · · 0

has the property that it maps every X ∈ Rm into zero, and thus satisfies the conditions for the zero matrix. That this is the only such matrix follows from Theorem 1, since any other matrix which acts the same way on every vector X ∈ Rm must have the same elements: all zeroes.

Theorem 5.2. The zero matrix 0 : Rm → Rn is uniquely represented by a matrix with
n rows and m columns, all of whose elements are zero.
How is the identity matrix I deﬁned? Since I : Rn → Rn maps every vector into
itself, IX = X , the linear equations (1) must have the property that given any vector
X = (x1 , x2 , . . . , xn ) ∈ Rn , then
Σ_{j=1}^{n} aij xj = xi ,    i = 1, 2, . . . , n.

If aij = δij (the Kronecker delta), so a11 = a22 = · · · = ann = 1 while aij = 0 for i ≠ j , then indeed

Σ_{j=1}^{n} δij xj = xi ,    i = 1, 2, . . . , n

is satisfied. Thus, the coefficients of the identity matrix are I = ((δij )) . This is a square (n × n) matrix,

        1 0 · · · 0
I =     0 1 · · · 0
        . . . . . .
        0 0 · · · 1

with ones along the main diagonal and zeroes elsewhere.

Theorem 5.3. The identity matrix I : Rn → Rn is uniquely represented by a square (n × n) matrix whose elements are I = ((δij )) .
We turn to addition. Let A = ((aij )) and B = ((bij )) be two n × m matrices, so they
both represent operators mapping Rm into Rn . Their sum C = A + B is deﬁned as the
operator which acts upon X according to the rule (p. 268)
CX = AX + BX,    X ∈ Rm .

The elements cij of the matrix C consequently satisfy

Σ_{j=1}^{m} cij xj = Σ_{j=1}^{m} aij xj + Σ_{j=1}^{m} bij xj = Σ_{j=1}^{m} (aij + bij )xj ,    i = 1, 2, . . . , n

for all X = (x1 , x2 , . . . , xm ) . Thus, the cij are in fact aij + bij (by Theorem 1).
Theorem 5.4. If A = ((aij )) and B = ((bij )) both map Rm into Rn , then their sum C = A + B has elements

cij = aij + bij .

Remark: From this it follows that the zero matrix is actually the additive identity, for if
A = ((aij )) , then C = A + 0 has elements cij = aij + 0 = aij , that is, A + 0 = A .
Example 1: Let A and B , which map R3 → R4 , be represented by 4 × 3 matrices; their sum A + B is the 4 × 3 matrix obtained by adding corresponding elements. (The numerical entries of A , B , and A + B were given in a display here.)
Example 2: Let A and B be the operators on p. 268 (called L1 and L2 there) which map R2 → R3 . Then

        1  1            −3  1
A =     1  2 ,    B =    1 −1 ,
        1  0             0 −1

so

            −2  2
A + B =      2  1
             1 −1

which agrees with the sum obtained there.
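Entrywise addition is short enough to check by machine. A minimal sketch (the function name `matrix_add` is ours; the matrices are those of Example 2):

```python
def matrix_add(A, B):
    # (A + B)_{ij} = a_ij + b_ij, as in Theorem 5.4
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Two 3 x 2 matrices (operators R^2 -> R^3) added entry by entry
A = [[1, 1], [1, 2], [1, 0]]
B = [[-3, 1], [1, -1], [0, -1]]
assert matrix_add(A, B) == [[-2, 2], [2, 1], [1, -1]]
```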
If A = ((aij )) , is there a matrix Ã such that A + Ã = 0 ? Clearly the matrix Ã defined by Ã = ((−aij )) does the job, since

A + Ã = ((aij )) + ((−aij )) = ((0))

by definition of addition. We shall denote the matrix with elements ((−aij )) by “ −A ” since A + (−A) = 0 . This matrix “ −A ” is the additive inverse to A .
Example: If

         1 −1
A =     −π  2
         0 −1

then

        −1  1
−A =     π −2 .
         0  1

Since a linear operator which is represented by a matrix is still a linear operator,
Theorem 3 (p. 269) certainly holds for matrix addition. We shall rewrite it.
Theorem 5.5 Let A, B, C, . . . be matrices which map Rm into Rn (so they are n ×
m matrices). The set of all such matrices forms an abelian (commutative) group under
addition, that is,
1. A + (B + C ) = (A + B ) + C
2. A + B = B + A
3. A + 0 = A
4. For every A , there is a matrix ( −A ) such that
A + (−A) = 0.

Proof: No need to do this again, since it was carried out in even greater generality on
p. 270. For practice, you might want to write out the proof in the special case of 2 × 3
matrices and see how much more awkward the formulas become when you use the speciﬁc
elements instead of proceeding more abstractly as we did in the proof on p. 270.
If α is a scalar and A = ((aij )) is an n × m matrix which represents a linear operator mapping Rm → Rn , the operator αA is defined by the rule

(αA)X = A(αX )

where X is any vector in Rm . In terms of the elements ((aij )) , this means that the elements ((ãij )) of αA are given by

Σ_{j=1}^{m} ãij xj = Σ_{j=1}^{m} aij (αxj ) = Σ_{j=1}^{m} (αaij )xj ,    i = 1, 2, . . . , n,

so ãij = αaij . Thus, the matrix αA is found by multiplying each of the elements of A by α ,

     a11 a12 · · · a1m       αa11 αa12 · · · αa1m
α    a21 a22 · · · a2m   =   αa21 αa22 · · · αa2m
     . . . . . . . . .       . . . . . . . . . . .
     an1 an2 · · · anm       αan1 αan2 · · · αanm
Example:

         7  1  3        −14  −2  −6
−2      −1  4  4    =     2  −8  −8
         9  6  5        −18 −12 −10
        −3  1 −1          6  −2   2 .

The following theorem concerns multiplication of matrices by scalars. It is proved either
by direct computation, or more simply by realizing that it is a special case of Exercise 12,
p. 284.
Theorem 5.6 . If A and B are matrices which map Rm → Rn , and if α, β are any
scalars, then
1. α(βA) = (αβ )A
2. 1 · A = A
3. (α + β )A = αA + βA
4. α(A + B ) = αA + αB.
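The four laws above are exact integer identities, so they can be verified mechanically for any sample matrices. A quick sketch (helper names `scalar_mult` and `matrix_add` are ours):

```python
def scalar_mult(alpha, A):
    # (alpha A)_{ij} = alpha * a_ij
    return [[alpha * a for a in row] for row in A]

def matrix_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[7, 1, 3], [-3, 1, -1]]
B = [[2, 0, -1], [4, 4, 9]]
alpha, beta = -2, 5

# 1. alpha(beta A) = (alpha beta) A
assert scalar_mult(alpha, scalar_mult(beta, A)) == scalar_mult(alpha * beta, A)
# 3. (alpha + beta) A = alpha A + beta A
assert scalar_mult(alpha + beta, A) == \
    matrix_add(scalar_mult(alpha, A), scalar_mult(beta, A))
# 4. alpha(A + B) = alpha A + alpha B
assert scalar_mult(alpha, matrix_add(A, B)) == \
    matrix_add(scalar_mult(alpha, A), scalar_mult(alpha, B))
```

Of course a finite check is not a proof; the theorem asserts these identities for all matrices and scalars.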
Remark: Theorems 5 and 6 together state that the set of all matrices which map Rm into
Rn forms a linear space. It is easy to show that the dimension of this space is m · n (by
exhibiting m · n linearly independent matrices which span the whole space).
Now we get more algebraic structure and see how to multiply. Let A = ((aij )) be an n × m matrix mapping Rm into Rn , and let B = ((bjk )) be an s × r matrix mapping Rr into Rs . By definition of operator multiplication (p. 271-2), the product AB is defined on an element X ∈ Rr = D(B ) by the rule

ABX = A(BX ).

Since the vector BX ∈ Rs must be fed into A , we find that BX must lie in Rm too. Thus, in order for the product AB of an n × m matrix A with an s × r matrix B to make sense, we must have s = m ; that is, the range of B must be contained in the domain of A .

a figure goes here

If C = ((cik )) = AB , then for every X ∈ Rr ,

CX = A(BX ),

or

Σ_{k=1}^{r} cik xk = Σ_{j=1}^{s} aij ( Σ_{k=1}^{r} bjk xk ) = Σ_{k=1}^{r} ( Σ_{j=1}^{s} aij bjk ) xk ,    i = 1, 2, . . . , n.

Therefore, the elements cik of the product AB are given by the formula

cik = Σ_{j=1}^{s} aij bjk ,    i = 1, 2, . . . , n;  k = 1, 2, . . . , r.

Since the summation signs have probably overwhelmed you, we repeat it in a special
case. Let B be determined by the linear equations
b11 x1 + b12 x2 + b13 x3 = y1
b21 x1 + b22 x2 + b23 x3 = y2 .
Then B : R3 → R2 . Also let A : R2 → R2 be determined by
a11 y1 + a12 y2 = z1
a21 y1 + a22 y2 = z2 .
The product AB maps a vector X ∈ R3 ﬁrst into Y = BX ∈ R2 and then into Z =
ABX ∈ R2 .
a figure goes here
Ordinary substitution yields Z = ABX as a function of X :
a11 (b11 x1 + b12 x2 + b13 x3 ) + a12 (b21 x1 + b22 x2 + b23 x3 ) = z1
a21 (b11 x1 + b12 x2 + b13 x3 ) + a22 (b21 x1 + b22 x2 + b23 x3 ) = z2 ,
or
(a11 b11 + a12 b21 )x1 + (a11 b12 + a12 b22 )x2 + (a11 b13 + a12 b23 )x3 = z1
(a21 b11 + a22 b21 )x1 + (a21 b12 + a22 b22 )x2 + (a21 b13 + a22 b23 )x3 = z2 .

If we write this in the matrix form

                       x1
c11 c12 c13            x2      =    z1
c21 c22 c23            x3           z2 ,

we find

c11 = a11 b11 + a12 b21 ,    c12 = a11 b12 + a12 b22 ,    etc.,

just as was dictated by the general formula for the multiplication of matrices.
Theorem 5.7. If A = ((aij )) and B = ((bij )) are matrices with B : Rr → Rs and A : Rs → Rn , then the product C = AB is defined, and the elements of the product C = ((cik )) are given by the formula

cik = Σ_{j=1}^{s} aij bjk ,    i = 1, 2, . . . , n;  k = 1, 2, . . . , r.

Remark: Since this formula for matrix multiplication is impossible to remember as it
stands, it is fortunate that there is an easy way to remember it. We shall work with the
example of matrices A : R2 → R2 and B : R3 → R2 discussed earlier. Then
        a11 a12     b11 b12 b13        c11 c12 c13
AB =    a21 a22     b21 b22 b23   =    c21 c22 c23 .

To compute the element cik , we merely observe that

cik = Σ_{j=1}^{2} aij bjk = ai1 b1k + ai2 b2k ;

cik is the scalar product of the i th row in A with the k th column in B (see fig.). Thus,
the element c21 in C = AB is the scalar product of the 2nd row of A with the 1st column
of B . Do not be embarrassed to use two hands to multiply matrices. Everybody does.
Examples:
(1) (cf. p. 274 where this was done without matrices). If
        2 −3           0 2
A =    −1  1 ,    B =  1 1 ,

then

        2 −3     0 2       −3  1
AB =   −1  1     1 1   =    1 −1

and

        0 2     2 −3       −2  2
BA =    1 1    −1  1   =    1 −2 .

Notice that even though AB and BA are both defined, we have AB ≠ BA , the expected noncommutativity in operator multiplication.

(2) (cf. p. 272 bottom where this was done without matrices). If

        1 −1
A =     0  1 ,    B = (1, 2, −1),
       −1 −2

then

                     1 −1
BA = (1, 2, −1)      0  1    = (2, 3).
                    −1 −2

However the product AB does not make sense.
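The row-by-column rule, and the noncommutativity in Example (1), can be checked directly. A short sketch (the function `matmul` is ours; the matrices are those of Example (1)):

```python
def matmul(A, B):
    # c_ik = sum_j a_ij b_jk : scalar product of the i-th row of A
    # with the k-th column of B
    n, s, r = len(A), len(B), len(B[0])
    assert len(A[0]) == s, "columns of A must equal rows of B"
    return [[sum(A[i][j] * B[j][k] for j in range(s)) for k in range(r)]
            for i in range(n)]

A = [[2, -3], [-1, 1]]
B = [[0, 2], [1, 1]]
assert matmul(A, B) == [[-3, 1], [1, -1]]
assert matmul(B, A) == [[-2, 2], [1, -2]]
assert matmul(A, B) != matmul(B, A)   # the expected noncommutativity
```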
From the general theory of linear operators (Theorem 4, p. 276) we can conclude

Theorem 5.8. Matrix multiplication is associative; that is, if

A : Rk → Rl ,    B : Rl → Rm ,    C : Rm → Rn ,

so the products C (BA) and (CB )A are defined, then

C (BA) = (CB )A.

Thus the parentheses can be omitted without risking chaos.

Remark: Returning to linear algebraic equations, you will observe that the matrix
notation AX there [eq(2)] can now be viewed as matrix multiplication of the n × m
matrix A = ((aij )) with the m × 1 matrix (column vector) X .
In developing the algebra of matrices, and operators in general, we have been neglecting one important issue: that of an inverse operator. If L : V1 → V2 , can we find an operator L̃ : V2 → V1 which reverses the effect of L ? That is, if LX = Y , where X ∈ V1 and Y ∈ V2 , is there an operator L̃ such that L̃Y = X ? If so, then

L̃LX = L̃Y = X,

and we write

L̃L = I.

This operator L̃ is the left (multiplicative) inverse of L . Similarly, an operator L̂ such that LL̂ = I is the right (multiplicative) inverse of L . We shall shortly prove that if an operator L has both a left inverse L̃ and a right inverse L̂ , then they are equal, L̃ = L̂ , so without ambiguity one can write L−1 for the inverse.
a figure goes here

To begin, we compute the inverse of the matrix

        5 −2
A =     3 −1

associated with the system of linear equations

5x1 − 2x2 = y1
3x1 −  x2 = y2 .

These equations specify a mapping from R2 into R2 . They map a point X into Y . Finding the inverse of A is equivalent to answering the question: if we are given a point Y , can we find the X whence it came?

AX = Y,    X = A−1 Y.

Finding the X in terms of Y means solving these two equations, a routine task. The answer is

x1 = −y1 + 2y2                    x1       −1 2     y1
                        so             =
x2 = −3y1 + 5y2 ,                 x2       −3 5     y2 .

Thus,

X = A−1 Y,    where    A−1 =    −1 2
                                −3 5 .

The matrix A−1 is the matrix inverse to A . It is easy to check that

          5 −2     −1 2        1 0
AA−1 =    3 −1     −3 5   =    0 1    = I

and

           −1 2     5 −2       1 0
A−1 A =    −3 5     3 −1  =    0 1    = I.

Thus, this matrix A−1 is both the right and left inverse of A .
Our second example is of a more geometric nature. We shall consider a matrix R which represents rotation of a vector in E2 through an angle α .

a figure goes here

R is represented by the matrix (cf. Ex. 13b p. 285)

        cos α   − sin α
R =     sin α     cos α .

It is geometrically clear that the inverse of this operator R is an operator which rotates through an angle −α , unwinding the effect of R . Thus, immediately from the formula for R , we find

          cos(−α)   − sin(−α)           cos α   sin α
R−1 =     sin(−α)     cos(−α)    =    − sin α   cos α .

To check that geometry has not deceived us, we should multiply out RR−1 and R−1 R . Do it. You will find RR−1 = R−1 R = I . One could also have found R−1 by solving linear algebraic equations as was done in the first example.
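The multiplication RR−1 = I the text asks for can be carried out numerically (up to floating point error). A sketch, with our helper names `rot` and `matmul`:

```python
import math

def rot(alpha):
    # the rotation matrix through angle alpha
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

a = 0.7
RR = matmul(rot(a), rot(-a))   # R R^{-1}; should be the identity
for i in range(2):
    for j in range(2):
        assert abs(RR[i][j] - (1 if i == j else 0)) < 1e-12
```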
The problem of finding the matrix inverse to any square matrix

        a11 a12 · · · a1n
A =     a21 a22 · · · a2n
        . . . . . . . . .
        an1 an2 · · · ann

is equivalent to the dull problem of solving n linear algebraic equations in n unknowns,

a11 x1 + · · · + a1n xn = y1
a21 x1 + · · · + a2n xn = y2
. . . . . . . . . . . . . .
an1 x1 + · · · + ann xn = yn ,

for X in terms of Y , yielding X = A−1 Y . For n = 2 the computation is not too grotesque, and
x1 =  (a22 /∆) y1 − (a12 /∆) y2
x2 = (−a21 /∆) y1 + (a11 /∆) y2 ,

where ∆ = a11 a22 − a12 a21 (= determinant of A , for those who have seen this before). From this formula we read off that the inverse of the 2 × 2 matrix

        a11 a12
A =     a21 a22

is

A−1 = (1/∆)      a22  −a12
                −a21   a11 .

As a check, one computes that

AA−1 = A−1 A = I.

Thus the 2 × 2 matrix A has an inverse if and only if ∆ := a11 a22 − a12 a21 ≠ 0 .
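The 2 × 2 inversion formula is short enough to code and test against the first example of this section. A sketch (the function name `inverse_2x2` is ours):

```python
def inverse_2x2(A):
    # A^{-1} = (1/Delta) [[a22, -a12], [-a21, a11]],
    # Delta = a11 a22 - a12 a21
    (a, b), (c, d) = A
    delta = a * d - b * c
    assert delta != 0, "Delta = 0: A is not invertible"
    return [[d / delta, -b / delta], [-c / delta, a / delta]]

A = [[5, -2], [3, -1]]          # the example from the text; Delta = 1
assert inverse_2x2(A) == [[-1.0, 2.0], [-3.0, 5.0]]
```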
a figure goes here
Fortunately, one rarely needs the explicit formula for the inverse of a square n × n
matrix other than the reasonable cases n = 2 and n = 3 . The inverse of a matrix has
greater conceptual use as the inverse of an operator.
Having relegated the computation of the inverse of a matrix to the future, let us see
what can be said about the inverse without computation. This will necessarily be a bit
more abstract. Since the issues involve solving systems of linear algebraic equations, we
shall invoke the theory developed in Chapter 4, Section 3. For
this discussion, it is convenient to use the following deﬁnition (cf. p. 6).
Definition: An operator A : V1 → V2 is invertible if it has the two properties
i) If X1 ≠ X2 then AX1 ≠ AX2 (injective, 1-1)
ii) To every Y ∈ V2 , there is at least one X ∈ V1 such that AX = Y (surjective,
onto).

Thus, an operator is invertible if and only if it is bijective. An invertible matrix is usually
called non-singular, while a matrix which is not invertible is called singular.
To show that this deﬁnition is identical with the previous one, we must show that every
invertible linear operator A has a right and left inverse. A more pressing matter though,
is
Theorem 5.9. If the linear operator A : V1 → V2 , where V1 and V2 are finite dimensional, is invertible, then dim V1 = dim V2 , so a matrix must necessarily be square for an inverse to exist (but being square is not sufficient, as was seen in the 2 × 2 case, where the additional condition a11 a22 − a21 a12 ≠ 0 was needed). In other words, you haven’t got a chance to invert a matrix unless it is square, but being square is not enough.
Proof: Condition i) states that N(A) = 0 , for if X1 ≠ 0 , then AX1 ≠ 0 . Therefore
dim R(A) = dim D(A) − dim N(A) = dim V1 − 0 = dim V1 .
On the other hand, condition ii) states that V2 ⊂ R(A) . Since A : V1 → V2 , we know that
R(A) ⊂ V2 . Therefore R(A) = V2 . Coupled with the ﬁrst part, we have
dim V1 = dim R(A) = dim V2 .
Theorem 5.10. Given an operator A which is invertible, there is a linear operator A−1 such that AA−1 = A−1 A = I .

Proof: If Y ∈ V2 , there is an X̃ ∈ V1 such that AX̃ = Y (by property ii), and that X̃ is unique (property i). Therefore without ambiguity we can define A−1 Y = X̃ . A similar process defines the operator A−1 for every Y ∈ V2 . From our construction, it is clear (or should be) that

AA−1 = A−1 A = I.

All that remains is to show A−1 is linear. If AX̃ = Ỹ and AX̂ = Ŷ , then since A is linear, A(aX̃ + bX̂ ) = aAX̃ + bAX̂ = aỸ + bŶ . Thus A−1 (aỸ + bŶ ) = aX̃ + bX̂ = aA−1 Ỹ + bA−1 Ŷ .

Remark: Glancing over this proof, it should be observed that finite dimensionality (or even the concept of dimension) never entered, so the result is true for infinite dimensional spaces. Furthermore, linearity was only used to show that A−1 was linear. Thus the theorem (except for the claim that A−1 is linear) is true for nonlinear operators as well. Needless to say, this construction of A−1 one point at a time is useless as a method for finding A−1 (since even in the simplest case A : R1 → R1 it involves an infinite number of points).
This theorem shows that if an operator A is invertible, then there are right and left inverses which are equal: AA−1 = A−1 A = I . We can reverse the theorem and prove

Theorem 5.11. Given the linear operator A : V1 → V2 , if there are linear operators Â (right inverse) and Ã (left inverse) such that

AÂ = ÃA = I,

then A is invertible and A−1 = Ã = Â .

Proof: Verify condition i): If AX1 = AX2 , then ÃAX1 = ÃAX2 . Since ÃA = I , this implies X1 = X2 .

Verify condition ii): If Y is any element in V2 , let X = ÂY . Then AX = AÂY = Y , so that Y is the image of X under the mapping.

The proof that A−1 = Ã = Â is delightfully easy. Only the associative property of multiplication is used:

Â = (A−1 A)Â = A−1 (AÂ) = A−1 = (ÃA)A−1 = Ã(AA−1 ) = Ã.
Examples:
(1) The identity operator I on every linear space is invertible, for it trivially satisﬁes
both criteria. Not only that, but it is its own inverse for II = I .
(2) The zero operator is never invertible, for even though X1 ≠ X2 , we always have 0(X1 ) = 0 = 0(X2 ) .

(3) The 2 × 2 matrix

         1  3
A =     −2 −6

is not invertible since, from the formula

          1  3     x1          x1 + 3x2
AX =     −2 −6     x2    =   −2x1 − 6x2 ,

we see that the nonzero vector (−3, 1) is mapped into zero by A (whereas criterion i) states that only 0 can be mapped into 0 by an invertible linear operator). Another way to see that A is not invertible is to observe that ∆ = a11 a22 − a12 a21 = 0 , thus violating the explicit condition for 2 × 2 matrices found earlier.
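Example (3) is easy to confirm by direct computation (the helper `apply` is our name for "matrix times vector"):

```python
def apply(A, X):
    # (AX)_i = sum_j a_ij x_j
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[1, 3], [-2, -6]]    # Delta = 1*(-6) - 3*(-2) = 0, so A is singular
assert apply(A, [-3, 1]) == [0, 0]   # a nonzero vector mapped to zero
```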
In this last example, we observed that if a linear operator A is invertible, then by
property i) the equation AX = 0 has exactly one solution X = 0 . If A : V1 → V2 on a
ﬁnite dimensional space, and dim V1 = dim V2 the converse is true also.
Theorem 5.12 If the linear operator A maps the linear space V1 into V2 and dim V1 =
dim V2 < ∞ , then
A is invertible ⇐⇒ AX = 0 implies X = 0.
Proof: ⇒ A restatement of condition i) in the deﬁnition.
⇐ A restatement of lines 7-10 on page 316.
Corollary 5.13. A square matrix A = ((aij )) is invertible if and only if its columns

        a11             a12                     a1n
A1 =    a21 ,    A2 =   a22 ,    . . . , An =   a2n
        . .             . .                     . .
        an1             an2                     ann

are linearly independent vectors.

Proof: To test for linear independence, we examine

x1 A1 + x2 A2 + · · · + xn An = 0,
and try to prove that x1 = x2 = · · · = xn = 0 . But writing the equation in full, it reads
a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
. . . . . . . . . . . . . . . . .
an1 x1 + an2 x2 + · · · + ann xn = 0,

or
AX = 0.
By the theorem, A is invertible if and only if the equation AX = 0 has only the solution
X = 0 . Thus A is invertible if and only if the only solution of
x1 A1 + x2 A2 + · · · + xn An = 0
is x1 = x2 = · · · = xn = 0 .
We close our discussion of invertible operators with
Theorem 5.14 . The set of all invertible linear operators which map a space into itself
constitutes a (non-commutative) group under multiplication; that is, if L1 , L2 , . . . are
invertible operators which map V into itself then they satisfy
0. Closed under multiplication ( L1 L2 is an invertible linear operator which maps V
into itself ).
(1) L1 (L2 L3 ) = (L1 L2 )L3 - Associative
(2) There is an identity I such that
IL = LI = L.
(3) For every operator L in the set, there is another operator L−1 for which
LL−1 = L−1 L = I.
Proof: 0) L1 L2 is a linear operator which maps V into itself by part 0 of Theorem 4 (p. 276). It is invertible since its inverse can be written in the explicit form (an important formula)

(L1 L2 )−1 = L2−1 L1−1 ,

as we will verify:

(L1 L2 )(L2−1 L1−1 ) = L1 (L2 L2−1 )L1−1 = L1 I L1−1 = L1 L1−1 = I

and

(L2−1 L1−1 )(L1 L2 ) = L2−1 (L1−1 L1 )L2 = L2−1 I L2 = L2−1 L2 = I.

(1) Part 1 of Theorem 4 (p. 276).

(2) Part 1 of Theorem 5 (p. 277).
(3) A direct restatement of the fact that our set consists only of invertible operators.
Closely associated with a matrix A : Rn → Rm ,

        a11 a12 · · · a1n
A =     a21 a22 · · · a2n
        . . . . . . . . .
        am1 am2 · · · amn ,

is another matrix A∗ , the transpose or adjoint of A , which is obtained by interchanging the rows and columns of A , viz.

         a11 a21 · · · am1
A∗ =     a12 a22 · · · am2
         . . . . . . . . .
         a1n a2n · · · amn .

For example,

        1  2                    1  4  5
if A =  4 −2 ,    then A∗ =     2 −2 −1 .
        5 −1

If A = ((aij )) , then A∗ = ((aji )) . The adjoint of an m × n matrix is an n × m matrix. Thus, if A : Rn → Rm then A∗ : Rm → Rn , and for any Z ∈ Rm , we have

          a11 a21 · · · am1     z1        a11 z1 + a21 z2 + · · · + am1 zm
A∗ Z =    a12 a22 · · · am2     z2    =   a12 z1 + a22 z2 + · · · + am2 zm
          . . . . . . . . .     . .       . . . . . . . . . . . . . . . .
          a1n a2n · · · amn     zm        a1n z1 + a2n z2 + · · · + amn zm ,

so the j th component (A∗ Z )j of the vector A∗ Z ∈ Rn is

(A∗ Z )j = Σ_{i=1}^{m} aij zi = a1j z1 + a2j z2 + · · · + amj zm .

Beware: The classical literature on matrices uses the term “adjoint of a matrix” for an
entirely diﬀerent object. Our nomenclature is now standard in the theory of linear operators.
A real square matrix A is called symmetric or self-adjoint if A = A∗ . For example,

         7  2 −3
A =      2 −1  5    = A∗ .
        −3  5  4

For a symmetric matrix A , we have aij = aji .

The significance of the adjoint of a matrix (as well as its relation to the more general
conception of the adjoint of an arbitrary operator) arises in the following way. If A : En →
Em , then for any X in En the vector Y = AX is a vector in Em . We can form the scalar
product of this vector Y = AX with any other vector Z in Em (because Y and Z are both in Em ):

⟨Z, Y ⟩ = ⟨Z, AX ⟩.

Since A∗ : Em → En , and Z ∈ Em , then A∗ Z makes sense, and is a vector in En , so ⟨A∗ Z, X ⟩ is a real number for any X ∈ En . Claim:

⟨Z, AX ⟩ = ⟨A∗ Z, X ⟩.
This is easy to verify. Let A = ((aij )) . Then
(AX )i = Σ_{j=1}^{n} aij xj    and    (A∗ Z )j = Σ_{i=1}^{m} aij zi ,

so that

⟨Z, AX ⟩ = Σ_{i=1}^{m} zi (AX )i = Σ_{i=1}^{m} zi ( Σ_{j=1}^{n} aij xj ) = Σ_{i=1}^{m} Σ_{j=1}^{n} zi aij xj .

In the same way,

⟨A∗ Z, X ⟩ = Σ_{j=1}^{n} (A∗ Z )j xj = Σ_{j=1}^{n} ( Σ_{i=1}^{m} aij zi )xj = Σ_{i=1}^{m} Σ_{j=1}^{n} zi aij xj .

Comparison reveals we have proved

Theorem 5.15. If A : En → Em , then for any X ∈ En and any Z ∈ Em ,

⟨Z, AX ⟩ = ⟨A∗ Z, X ⟩,
the adjoint A∗ of a matrix A is found by merely interchanging the rows and columns (try
to do it!).
It is remarkably easy to obtain some properties of the adjoint by using Theorem 14.
Our attention will be restricted to square matrices (although the results are still true with
but minor modiﬁcations for a rectangular matrix). 202 CHAPTER 5. MATRIX REPRESENTATION Theorem 5.16 . Let A and B be n × n matrices (so the products AB , BA , B ∗ A∗ ,
A + B etc. are all deﬁned). Then
0. I ∗ = I (because I is symmetric)
1. (A∗ )∗ = A
2. (AB )∗ = B ∗ A∗
3. (A + B )∗ = A∗ + B ∗ .
4. (cA)∗ = cA∗ , c is a real scalar.
5. A is invertible if and only if A∗ is invertible, and
(A∗ )−1 = (A−1 )∗ .
6. A is invertible if and only if the rows of A are linearly independent.
Proof: We could use subscripts and the aij stuﬀ - but it is clearer to use the result of
Theorem 14. In order to do so, an important preliminary result is needed.
Theorem 5.17 . If C : En → Em , then the equation
C x, Y = 0
⇐⇒ for all X in En and Y in Em C is the zero operator, C = 0 . Thus if C1 and C2 map En into itself, the equation
C1 X, Y = C2 X, Y for all X, Y ∈ En ⇐⇒ C1 = C2 . Proof: ⇒ By contradiction, if C = 0 there is some X0 such that 0 = CX0 ∈ En . Now
just pick Y0 = CX0 . Then
0 = C X0 , Y0 = C X0 , CX0 = C X0 2 >0 because by assumption CX0 = 0 . A glance at this line reveals the desired contradiction.
⇐ Obvious.
The last assertion of the theorem follows by subtraction,
0 = C1 X, Y − C2 X, Y = C1 X − C2 X, Y = (C1 − C2 )X, Y
and letting C = C1 − C2 .
Now we return to the
Proof of Theorem 15: The vectors X, Z will be in En .
(0) Particularly clear because I is symmetric. You should try constructing another
proof patterned on those below.
(1) Two successive interchanges of the rows and columns of a matrix leave it unchanged.
Again, try to construct another proof patterned on those below.
(2) (AB )∗ Z, X = Z, ABX = Z, A(BX ) = A∗ Z, BX
= B ∗ (A∗ Z ), X = (B ∗ A∗ )Z, X
for all X, Z in En . Application of Theorem 16 yields the result. 5.1. L : RM → RN .
(3) 203 (A + B )∗ Z, X = Z, (A + B )X = Z, AX + BX
= Z, AX + Z, BX, = A∗ Z, X + B ∗ Z, X
= A∗ Z + B ∗ Z, X = (A∗ + B ∗ )Z, X .
And apply Theorem 16. (4) (cA)∗ Z, X = Z, cAX = c Z, AX
= c A∗ Z, X = (cA∗ )Z, X .
Apply Theorem 16. (5) If A is invertible, then AA−1 = A−1 A = I . An application of parts 0 and 2 shows
(A−1 )∗ A∗ = (AA−1 )∗ = I ∗ = I.
Similarly, A∗ (A−1 )∗ = I . Thus A∗ has a left and right inverse, so it is invertible by
Theorem 11. The above formulas reveal (A∗ )−1 = (A−1 )∗ .
In the other direction, assume A∗ is invertible. Since A∗∗ = A (part 1) the matrix
A is the adjoint of A∗ . But we just saw that if a matrix is invertible then its adjoint
is too. Thus the invertibility of A∗ implies that of A .
(6) By the Corollary to Theorem 12, A∗ is invertible if and only if its columns are linearly
independent. Since the columns of A∗ are the rows of A , we ﬁnd that A∗ is invertible
if and only if the rows of A are linearly independent. Coupled with Part 5, the proof
is completed.
In our later work we shall need an inequality. Why not insert it here for future reference.
Theorem 5.18 . If A = ((aij )) is an m × n matrix, so A : En → Em , then for any X
in En and Y in Em
AX ≤ k X
and
| Y, AX | ≤ k X
where m Y, n k2 = a2 .
ij
i=1 j =1 Proof: By deﬁnition
m AX 2 m (AX )2
i = ( = i=1 n aij xj )2 , i=1 j =1 where (AX )i is the i th component of the vector AX . The Schwarz inequality shows
n n
2 aij xj ) ≤ (
j =1 n a2
ij
j =1 n x2
j
j =1 =X 2 a2 .
ij
j =1 204 CHAPTER 5. MATRIX REPRESENTATION Thus, m AX 2 ≤X 2 n ( a2 ) = k 2 X
ij 2 , i=1 j =1 which proves the ﬁrst part. The second part follows from this and one more application of
Schwarz:
| Y, AX | ≤ Y AX ≤ k X Y .
After all of this detailed discussion of matrices as an example of a linear operator L
mapping one ﬁnite dimensional space into another, our next theorem will show why matrices
are so ubiquitous. You see, we shall prove that every such linear operator L : V1 → V2 can
be represented as a matrix after bases for V1 and V2 have been selected.
Theorem 5.19 . (Representation Theorem) Let L be a linear operator which maps one
ﬁnite dimensional space into another
L : V1 → V2 .
Let { e1 , e2 , . . . , en } be a basis for V1 , and { θ1 , θ2 , . . . θm } be a basis for V2 . Then in
terms of these bases L may be represented by the matrix θ Le whose j th column is the
vector (Lej )θ , that is, the vector Lej (which is a vector in V2 ) written in terms of the θ
basis for V2 . Pictorially we have
θ Le = ((Le1 )θ · · · (Len )θ ) . Proof: Finding the representation of L in terms of given bases for V1 and V2 means:
given a vector X in V1 which is represented in the e basis for V1 (write it as Xe ) to ﬁnd
a matrix θ Le such that the image vector θ Le Xe is the image (LX )θ of X written in the
θ basis for V2 . We have used the cumbersome notation θ Le to make explicit the fact that
it maps vectors written in the e basis for V1 into vectors written in the θ basis V2 .
To avoid even further notation, we shall carry out the details only for the particular case
where the domain V1 is two dimensional with basis { e1 , e2 } and V2 is three dimensional
with basis { θ1 , θ2 , θ3 } . The general case is proved in the same way.
Since the vectors Le1 and Le2 are in V2 , they can be written in the θ basis, say,
Le1 = a1 θ1 + b1 θ2 + c1 θ3 , Le2 = a2 θ1 + b2 θ2 + c2 θ3 ,
so a1
(Le1 )θ = b1 c1 a2
and (Le2 )θ = b2 .
c2 Given X in V1 , it can be written in the e basis for V1 ,
X = x1 e1 + x2 e2 , so Xe = x1
x2 . Then
LX = L(x1 e1 + x2 e2 ) = x1 Le1 + x2 Le2
= x1 (a1 θ1 + b1 θ2 + c1 θ3 ) + x2 (a2 θ1 + b2 θ2 + c2 θ3 )
= (a1 x1 + a2 x2 )θ1 + (b1 x1 + b2 x2 )θ2 + (c1 x1 + c2 x2 )θ3 . 5.1. L : RM → RN . 205 If we write LX as a column vector in the θ basis it is a1 x1 + a2 x2
(LX )θ = b1 x1 + b2 x2 c1 x2 + c2 x3
which is recognized as a product a1 a2
θ L e = b1 b2 c1 c2 a1 a2
= b1 b2 X e
c1 c2 x1
x2 therefore, the matrix we want is a1 a2 b1 b2 ((Le1 ) (Le2 ) ) ,
θ Le =
θ
θ
c1 c2 a matrix whose j th column is the vector Lej written in the θ basis for V2 .
x Example: Consider the integral operator L : = 0 as a map of the two dimensional space
P1 into the three dimensional space P2 . Any bases for P1 and P2 will do, however we
must simply ﬁx our attention to speciﬁc bases. Say
basis for P1 := { e1 (x) = 1,
e2 (x) = x }
basis for P2 := { θ1 (x) = 1+x , θ2 (x) = 1−x ,
θ3 (x) = x2 }.
2
2
Then
x 1 dt = x = θ1 − θ2 Le1 =
0 and
x Le2 = t dt =
0 Therefore 1
(Le1 )θ = −1 ,
0 x2
1
= θ3 .
2
2 0
and (Le2 )θ = 0 , 1
2 so 1
Le = ((Le1 )θ (Le2 )θ ) = −1
θ
0 0
0
1
2 is the matrix representing L in terms of the given e basis for P1 and θ basis for P2 . To
make you believe this, let us evaluate
x LP = P
0 for some polynomial p ∈ P1 by using the matrix. For example, p(x) = 3 − x = 3e1 − e2 ,
3
so in the e basis for P1 , P3 =
. Its image under L in terms of the θ basis for
−1
P2 is then 10
3
3
(Lp )θ =θ L(pe ) = −1 0 = −3 ;
e
−1
1
1
02
−2 206 CHAPTER 5. MATRIX REPRESENTATION that is,
1+x
1−x
1
1
1
) − 3(
) − (x2 ) = 3x − x2
Lp = 3θ1 − 3θ2 − θ3 = 3(
2
2
2
2
2
which, of course, agrees with
x x p(t) dt =
0 0 1
(3 − t) dt = 3x − x2 .
2 WARNING: If we had used a diﬀerent basis for either P1 or P2 , the resulting matrix
representing L would be diﬀerent. For example, if the same basis were used for P1 but a
diﬀerent basis for P2 ,
˜
˜
θ basis for P2 := { θ1 (x) = 1,
then
˜
Le1 = x = θ2
so and 0
(Le1 )θ = 1 ,
˜
0 ˜
θ2 (x) = x, ˜
θ3 (x) = x2 }, 1˜
x2
= θ3 ,
2
2 0
(Le2 )θ = 0 .
˜
Le2 = 1
2 ˜
˜
Therefore the matrix θL e which represents L in terms of the e basis for P1 and the θ
basis for P2 is 00 1 0 .
˜
θ Le =
1
02
˜
Again, if p(x) = 3 − x = 3e1 − e2 , then in the θ basis 00
3
(Lp )θ =θ Le Pe = 1 0 ˜
˜
−1
01
2 = 0
3 ; 1
−2 that is,
1˜
1
˜
˜
Lp = 0θ1 + 3θ2 − θ3 = 3x − x2 ,
2
2
to no one’s surprise.
Observe that the matrices θ Le and θ L3 both represent L —but with respect to dif˜
ferent basis. The second matrix θ Le is somewhat simpler that the ﬁrst since it has more
˜
zeroes. It is often useful to pick bases in order that the representing matrix be as simple as
possible. We shall not discuss that issue right now.
There is a simple class of operators (transformations) which are not linear, but enjoy
most of the properties which linear ones do. They are aﬃne operators, or aﬃne transformations. To deﬁne them, it is best to ﬁrst deﬁne the translation operator.
Definition: If V is any linear space and Y0 a particular element of V , then the operator
T : V → V deﬁned by
T Y = Y + Y0 , Y ∈ V,
is the translation operator. It translates a vector Y into the vector Y + Y0 . 5.1. L : RM → RN . 207 Definition: An aﬃne transformation A is a linear transformation L followed by a translation. if L : V1 → V2 and Y0 ∈ V2 , it has the form
AX := LX + Y0 . X ∈ V1 , Y0 ∈ V2 . Aﬃne transformations can be added and multiplied by the same deﬁnition which governed linear transformations. Thus, if A and B are aﬃne transformations mapping V1
into V2 ,
(A + B )X := AX + BX.
In particular, if AX = L1 X + Y0 and BX = L2 X + Z0 , where Y0 and Z0 are in V2 , then
(A + B )X = AX + BX = L1 X + Y0 + L2 X + Z0
= (L1 + L2 )X + (Y0 + Z0 ).
Similarly, if A : V1 → V2 and B : V3 → V4 , where V2 ⊂ V3 , then
(BA)X := B (AX ) = B (L1 X + Y0 ) = L2 (L1 X + Y0 ) + Z0
= L2 L1 X + L2 Y0 + Z0 ,
where Y0 ∈ V2 and Z0 ∈ V4 .
You will carry out the (straightforward) proofs of the algebraic properties for aﬃne
transformations in Exercise 23.
The curtain on this longest of sections will be brought down with a brief discussion
of the operators which characterize rigid body motions, or Euclidean motions, as they are
often called.
Definition: The transformation R : En → En is an isometric transformation, (or Euclidean
transformation or rigid body transformation) if the distance between two points is preserved
(invariant) under the transformation. Thus, R is an isometry if
RX − RY = X − Y
for all X and Y in En .
It is interesting to think for a moment how all these names originated. The phrase
rigid body transformation arises from the idea that any motion of a rigid body (such as a
translation or rotation) does not alter the distance between any two points in the body. In
the framework of Euclidean geometry the whole notion of congruence is deﬁned to be just
those properties of a ﬁgure which are invariant under isometries. By allowing deformations
other than isometries, one obtains geometries, so aﬃne geometry is the study of properties
invariant under all aﬃne motions.
The study of isometric transformations is mainly contained in that of a special case,
orthogonal transformations. These are isometries which leave the origin ﬁxed, R0 = 0 . It
should be clear from our next theorem (part 3) that the idea of an orthogonal transformation
generalizes the idea of a rotation to higher dimensional space. Reﬂections (mirror images)
are also orthogonal transformations. Theorem 20 states that every isometric transformation
is the result of an orthogonal transformation followed by a translation.
Example: The matrix R =
X= x1
x2 , 1
0
0 −1 deﬁnes an orthogonal transformation since if then RX = 1
0
0 −1 x1
x2 = x1
−x2 , 208 CHAPTER 5. MATRIX REPRESENTATION and if
Y= y1
y2 , y1
−y2 then RY = . Consequently RX − RY = X − Y = (x1 − y1 )2 + (x2 − y2 )2 , so R , being isometric
and linear is an orthogonal transformation. It represents a reﬂection across the x1 axis.
Our deﬁnition of an orthogonal transformation does not presume its linearity. This is
because the linearity is a consequence of the given properties. A proof is outlined in Ex.
16, p. 390. For convenience, the linearity will be assumed in the following theorem where
we collect the standard properties of orthogonal transformations.
Theorem 5.20 . Let R : En → En be a linear transformation. The following properties of
R are equivalent.
(1) R is an orthogonal transformation, that is
RX − RY = X − Y
(2) RX, RY = X, Y R 0 = 0. RX = X (3) and (so angles are preserved) (4) R∗ R = I
(5) R is invertible and R−1 = R∗ . (Only in this part do we use the ﬁnite dimensionality of En ).
Proof: We shall prove the following chain of implications: 1 =⇒ 2 =⇒ 3 =⇒ 4 =⇒ 5 =⇒
4 =⇒ 1
1 =⇒ 2 . Trivial, for
RX = RX − R0 = X − 0 = X .
2 =⇒ 3 . By linearity and part 2) applied to the vector X + Y , we have
RX + RY = R(X + Y ) = X + Y .
Now square both sides and express the norm as a scalar product:
RX + RY, RX + RY = X + Y, X + Y .
Upon expanding both sides, we ﬁnd that
RX 2 + 2 RX, RY + RY 2 =X 2 + 2 X, Y + Y 2 . Since by part 2) RX = X and RY = Y , we are done.
3 =⇒ 4 . By part 3) and Theorem 14 (p. 369),
R∗ RX, Y = RX, RY = X, Y .
Thus, an application of the second part of Theorem 16 (p. 371) gives us R∗ R = I .
4 =⇒ 5 . Since X = R∗ RX , we see that RX = 0 implies X = 0 , consequently, R is
invertible (Theorem 12, p. 364). Moreover R∗ R = I so R∗ = R−1 .
5 =⇒ 4 . Clear, since R∗ = R−1 . 5.1. L : RM → RN . 209 5 =⇒ 1 . Because R is linear, R0 = 0 . It remains to show that RX − RY = X − Y ,
an easy computation. RX − RY 2 = R(X − Y ) 2 = R(X − Y ), R(X − Y ) , so using 4)
= R∗ R(X − Y ), X − Y = (X − Y ), (X − Y ) = X − Y 2 . Done.
Earlier in this section (p. 357-8) we considered a matrix R which represented the
operator which rotates a vector in E2 through an angle α . This matrix is the simplest
(non-trivial) example of a rigid body transformation which leaves the origin ﬁxed, that is,
an orthogonal transformation.
R= cos α − sin α
sin α
cos α . To prove that R is an orthogonal matrix, by Theorem 19 part 3, it is suﬃcient to verify
RX, RY = X, Y for all X and Y in E2 . A calculation is in order here.
RX = cos α − sin α
sin α
cos α x1
x2 = x1 cos α − x2 sin α
x1 sin α + x2 cos α . Similarly for RY , just replace x1 and x2 by y1 and y2 respectively. Then
RX, RY = (x1 cos α − x2 sin α)(y1 cos α − y2 sin α) + missing?
(x1 sin α + x2 cos α)(y1 sin α + y2 cos α)
= x1 y1 cos2 α − (x1 y2 + x2 y1 ) sin α cos α + x2 y2 sin2 α
+x1 y1 sin2 α + (x1 y2 + x2 y1 ) sin α cos α + x2 y2 cos2 α
= x1 y1 + x2 y2 = X, Y . Done.
We previously found an expression for R−1 (p. 358) by geometric reasoning. It is
reassuring to notice R−1 = R∗ , just as part 5 of our theorem states.
The most general rotation in E3 may be decomposed into a product of these simple
two dimensional rotations. For a brief discussion - complete with pictures - open Goldstein,
Classical Mechanics to pp. 107-9.
Now to the last theorem of this section.
Theorem 5.21 . If R : En → En is a rigid body transformation, then for every X ∈ En
RX = R0 X + X0 ,
where R0 is an orthogonal transformation (rotation) and X0 is a ﬁxed vector in En . Thus,
every rigid body motion is composed of a rotation (by R0 and a translation (through X0 ).
Proof: Let R0 X = RX − R0 . Since
R0 0 = R 0 − R 0 = 0,
the operator R0 has the property R0 0 = 0 . Furthermore, for any X and Y in En ,
R0 X − R0 Y = RX − R0 − RY + R0
= RX − RY = X − Y .
Therefore R0 satisﬁes the deﬁnition of an orthogonal transformation. The proof is completed by deﬁning X0 to be the image of the origin under R, X0 = R0 . Then
R0 X = RX − X0 ,
or
RX = R0 X + X0 . 210 5.2 CHAPTER 5. MATRIX REPRESENTATION Supplement on Quadratic Forms Quadratic polynomials of the form
Q(X ) = αx2 + βx1 x2 + γx2 ,
1
2 X = (x1 , x2 ) and the generalization to n variables X = (x1 , x2 , . . . , xn )
n n Q(X ) = αij xi xj
i j =1 often arise in mathematics. They are called quadratic forms and can always be represented
in the form X, SX where S is a self adjoint matrix. For example, the ﬁrst quadratic
form can be written as
Q(X ) = (x1 , x2 ) α
β
2 β
2 γ x1
x2 = X, SX , where S is the matrix indicated.
The procedure for ﬁnding the elements ((aij )) of the matrix S is simple. First take
care of the diagonal terms by letting aii be the coeﬃcient of x2 in Q(X ) . Realizing that
1
xi xj = xi xj , collect the terms αij xi xj and αji xj xi in Q(X ) , getting (αij + αji )xi xj .
Then let
1
aij = aji = (αij + αji ) i = j.
2
Example: Q(X ) = x2 − 2x1 x3 − x2 + 6x1 x2 + 4x3 x1 . Rewrite this as Q(X ) = x2 − x2 +
1
2
1
2
6x1 x2 + 2x1 x3 . Then 1
31
S = 3 −1 0 1
00
and
Q(X ) = X, SX .
as you can easily verify.
Definition: A quadratic form Q(X ) is positive semi deﬁnite if Q(X ) ≥ 0 for all X and
positive deﬁnite if Q(X ) > 0, x = 0. Q(X ) is negative semi deﬁnite or negative deﬁnite
if, respectively, Q(X ) ≤ 0 , or Q(X ) < 0, X = 0 . If S is the self adjoint matrix associated
with the quadratic form Q(X ) , then S is positive semi deﬁnite, positive deﬁnite, etc., if
Q(X ) has the respective property.
We may think of Q(X ) as representing a quadratic surface. Thus, if S is diagonal,
for example 200
S = 0 1 0 ,
003
with positive diagonal elements, then the equation Q(X ) = 1 , where Q(X ) = X, SX =
2x2 + x2 +3x2 , represents an ellipsoid. This matrix S is positive deﬁnite since by inspection
1
2
3
Q(X ) > 0, X = 0 .
It is easy to see if a diagonal matrix S is positive semi deﬁnite, negative semi deﬁnite,
positive deﬁnite, or negative deﬁnite. 5.2. SUPPLEMENT ON QUADRATIC FORMS 211 Example: The diagonal matrix γ1 . . . 0 S = ... 0 . . . γn is
(a) positive semi deﬁnite if and only if γ1 , . . . , γn are all non-negative,
(b) positive deﬁnite if and only if γ1 , . . . , γn are all positive (not zero), and the obvious
statements for negative semi deﬁnite and negative deﬁnite.
The problem of determining if a non diagonal symmetric matrix is positive etc. is more
subtle. We shall ﬁnd necessary and suﬃcient conditions for the two variable case, but only
necessary conditions for the general case.
Consider the 2 × 2 self-adjoint matrix
S= ab
bc and the associated quadratic form
Q(X ) = ax2 + 2bxy + cy 2 .
There are several cases.
(i) If a = 0 , then
Q(X ) = (2bx + cy )y.
If b = 0 , by choosing x and y appropriately, we can make Q(X ) assume both positive
and negative values. Thus, for a = 0, b = 0, Q can be neither a positive nor a negative
semi-deﬁnite form. On the other hand, if a = 0 , and b = 0 , then Q is positive (negative)
semi deﬁnite if and only if c ≥ 0 (c ≤ 0) . If a = 0, Q can never be positive deﬁnite or
negative deﬁnite since if X = (x, 0) where x = 0 , then Q(X ) = 0 but X = 0 .
(ii) If a = 0 , then Q can be written as
Q(X ) = 1
[(ax + by )2 = (ac − b2 )y 2 ].
a We can immediately read oﬀ the conditions from this. Q is positive semi deﬁnite (deﬁnite)
if and only if a > 0 and ac − b2 ≥ 0 (ac − b2 > 0) , and negative semi deﬁnite (deﬁnite) if
and only if a < 0 and ac − b2 ≥ 0 (ac − b2 > 0) .
In summary, we have proved
Theorem 5.22 A. Let Q(X ) = ax2 + 2bxy + cy 2 , and S be the associated symmetric
matrix. Then
(a) Q is positive semi deﬁnite if and only if a ≥ 0 and ac − b2 ≥ 0 (this implies c ≥ 0
too).
(b) Q is positive deﬁnite if and only if a > 0 and ac − b2 > 0 (this implies c > 0
too). 212 CHAPTER 5. MATRIX REPRESENTATION The general case of a quadratic form in n variables is much more diﬃcult to treat.
There are known necessary and suﬃcient conditions, but they are not too useful in practice,
especially for a large number of variables. We shall only prove one necessary condition for a
quadratic form to be positive semi-deﬁnite (or positive deﬁnite), a condition which is both
transparent to verify in practice and even easier to prove.
THEOREM B. If the self adjoint matrix S = ((aij )) is positive deﬁnite, then the
diagonal elements must all be positive, a11 , a22 , . . . , ann > 0 . Similarly, if S is negative
deﬁnite then the diagonal elements must all be negative.
n Proof: Q(X ) = X, SX = aij xi xj . Since Q is positive deﬁnite, Q(X ) > 0 for all
i,j =1 X = 0 . In particular, Q(ek ) > 0, k = 1, . . . , n , where ek is the k th coordinate vector
ek = (0, 0, . . . , 0, 1, 0, . . . , 0) . But Q(ek ) = akk . Thus akk > 0, k = 1, . . . , n , just what we
wanted to prove.
Examples: 1. The quadratic form Q(X ) = 3x2 + 743xy − y 2 + 4z 2 + xz is positive deﬁnite
or semi deﬁnite since the coeﬃcient of y 2 is negative. It is not negative deﬁnite or semi
deﬁnite since the coeﬃcient of x2 is positive.
2. The quadratic form Q(X ) = x2 − 5xy + y 2 + 2z 2 satisﬁes the necessary conditions
of Theorem B, but the conditions of Theorem D were not suﬃcient conditions for positive
deﬁniteness. Thus, we cannot conclude this Q(X ) is positive deﬁnite. In fact, this Q(X ) is
not positive deﬁnite or semi deﬁnite since, for example, if X = (1, 1, 1) , then Q(X ) = −1 .
It is clearly not negative deﬁnite or semi deﬁnite. Exercises
(1) Find the self-adjoint matrix S associated with the following quadratic forms:
(a) Q(X ) = x2 − 2x1 x2 + 4x2 .
1
2
(b) Q(X ) = −x2 + x1 x2 − x1 x3 + x2 − 3x2 x1 − 2x3 x2 + 3x2
1
2
3
(c) Q(X ) = 2x1 x2 − 3x3 x2 + 4x2 x4 + x3 x4 + 7x2
2 [Answers: (a) 1 −1
−1
4 −1 −1 −1
1
, (b)
1
− 2 −1 0
1
0
1
7 −3
2
−1 , (c) 0 −3
0
2
3
1
0
2
2 −1
2 0
2 1
2
0 (2) Use Theorem A or B to determine which of the following quadratic forms in two
variables are positive or negative deﬁnite, or semi deﬁnite, or none of these.
(a) Q(X ) = x2 − 2x1 x2 + 4x2
1
2
(b) Q(X ) = −x2 + x1 x2 − 4x2
1
2
(c) Q(X ) = x2 − 6x1 x2 − 4x2
1
2
(d) Q(X ) = x2 − 6x1 x2 + 4x2
1
2
(e) Q(X ) = x2 − 6x1 x2 + 4x2 x3 − x2 + 4x2
1
2
3
(3) If the self-adjoint matrix S is positive deﬁnite, prove it is invertible. Give an example
of an invertible self-adjoint matrix which is neither positive nor negative deﬁnite. 5.2. SUPPLEMENT ON QUADRATIC FORMS 213 (4) Find all real values for λ for which the quadratic form
Q(X ) = 2x2 + y 2 + 3z 2 + 2λxy + 2xz
√
is positive deﬁnite. [Hint: Q(X ) = ( 5 − λ2 )x2 + (λx + y )2 + ( 3z +
3 1
√ x)2
3 (5) Let the integer n be ≥ 3 . If the quadratic form
n Q(X ) = aij xi xj , aij = aji i,j =1 is the product of two linear forms
n Q(X ) = ( n λi xi )(
i=1 µj xj ),
j =1 show that det A = det((aij )) = 0 .
(6) If the self-adjoint matrix S is positive deﬁnite or semi-deﬁnite, prove the generalized
Schwarz inequality:
| Y, SX |2 ≤ Y , SY X, SX
for all X and Y . [Hint: Observe [X, Y ] := Y , SX
scalar product]. satisﬁes all the axioms for a (7) If the self-adjoint matrix S is positive deﬁnite (so S −1 exists by Exercise 3), prove
that S −1 is also positive deﬁnite. [Hint: Use the generalized Schwarz inequality,
Exercise 6, with Y = S −1 X and the inequality X, SX ≤ k 2 X 2 of Theorem 17,
p. 373].
(8) Proof or counterexample:
(a) If a matrix A = ((aij )) is positive deﬁnite, then all of its elements are positive,
aij > 0 for all i, j .
(b) If a matrix A is such that all of its elements are positive, aij > 0 , then the
matrix is positive deﬁnite. Exercises
(1) Write out the matrices associated with the operators A and B in Exercise 4a, p.
281, and carry out the computation there using matrices.
(2) Write out the matrices RA , RB , and RC for the rotation operators A, B , and C
in Exercise 8
p. 281 and complete that problem using matrices. [Ans. RA = 10
0 0 0 −1 in terms of the basis e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) ].
01
0
(3) Prove Exercise 2b (p. 281) as a corollary of Theorem 18. 214 CHAPTER 5. MATRIX REPRESENTATION (4) If
00
01 A=
compute AB, , 01
00 B= BA , and B 2 . (5) Compute A−1 if
12
34 (a). A = −2 [ans. A−1 = 40
5
(b). A = 0 1 −6
30
4 1100
0 1 1 0
(c) A = 0 0 1 1
0001 3
2 1
1
−2 4 0 −5
[ans. A−1 = −18 1 24 ]
−3 0
4 1 −1
1 −1
0
1 −1
1
]
[ans. A−1 = 0
0
1 −1 0
0
0
1 (6) If A is the matrix of 5a) above, from the deﬁnition compute directly,
(a) −6A−1 + 1 A∗
2 [ans. −9
2
−8
5
25
2
. (b) (A∗ )−1 and (A−1 )∗ . Compare them. State and prove a general theorem.
(c) AA∗ and A∗ A .
(7) If
A= 12
34 , B= 1
1
2 −1 , compute (AB )∗ , A∗ B ∗ , and B ∗ A∗ . Compare (AB )∗ and B ∗ A∗ and explain the
outcome.
(8) Prove that I ∗ = I and A∗∗ = A using only Theorems 14 and 16 (cf. Parts 2-4 of
Theorem 15).
(9) If A : Rn → Rm and B : Rm → Rn where n > m , prove that BA (an n × n matrix)
is singular. Is AB necessarily singular? (Proof or counterexample).
(10) Given two square matrices A and B such that AB = 0 , which of the following
statements are always true. Proofs or counterexamples are called for. [I suggest you
conﬁne your search for counterexamples to the case of 2 × 2 matrices.]
(a). A = 0 .
(b). B = 0.
(c). A and/or B are (is) singular (not invertible).
(d). A is singular.
(e). B −1 exists.
(f). If A−1 exists, then B = 0 .
(g). If B is nonsingular, then A = C . 5.2. SUPPLEMENT ON QUADRATIC FORMS 215 (h). BA = 0 .
(i). If A = 0 and B = 0 , then neither A nor B are invertible.
(11) (a). If A is a square matrix which satisﬁes
A2 − 2A − I = 0,
ﬁnd A−1 in terms of A . [Hint: Find a matrix B such that AB = BA = I .]
(b). If A is a square matrix which satisﬁes
An + an−1 An−1 + an−2 An−2 + . . . + a1 A + a0 I = 0, a0 = 0, where a0 , a1 , . . . , an−1 are scalars, prove that A is invertible and ﬁnd A−1 in terms
of A .
(12) (a). If L : En → Em , prove N(L∗ ) = R(L)⊥
[Hint: Show (in two lines) that X ∈ N(L∗ ) ⇐⇒ X, LZ = 0 for all Z ∈ En —from
which the result is immediate.]
(b). Use part (a) to show that dim R(L) = dim R(L∗ ) .
(c). Do exercise 19, page 441.
(13) (a). If T : En → En is a translation, T X = X + X0 , prove T is invertible by explicitly
ﬁnding T −1 (which is a trivial task). [Answer: T −1 X = X − X0 .]
(b). If R : En → En is a rigid body transformation, show that R is always invertible
by exhibiting R−1 . [Answer: If RX = R0 X + X0 , then R can be written as
∗
Rx = (T R0 )X. R−1 = R0 T −1 .]
(14) If A is any n × n matrix, ﬁnd matrices A1 and A2 such that A is decomposed into
the two parts
A = A1 + A2
where A1 is symmetric and A2 is anti-symmetric, i.e., A∗ = −A2 . [Hint: Assume
2
there is such a decomposition and use it to ﬁnd A1 and A2 in terms of A and A∗ .
Then verify that these work.]
d
(15) Consider the operator D = dx on P5 . Prove that D is not invertible (return to the
deﬁnition p. 360) but exhibit an operator L which is a right inverse, DL = I . (16) This problem proves that R is orthogonal if and only if R is linear and isometric.
(a) Prove that if R is linear and isometric, then it is orthogonal. (Trivial!).
(b) If R is orthogonal, prove that
i) RX = X
ii) RX, RY = X, Y (Hint: Use RX − RY 2 = X − Y 2 )
iii) R(aX ) = aRX (Hint: Prove R(aX ) − aRX 2 = 0 )
iv) R(X + Y ) = RX + RY (Hint: Prove
“something” 2 = 0 )
v) R is linear and isometric
[Warning: If you assume linearity in b), you’ll vitiate the whole problem]. 216 CHAPTER 5. MATRIX REPRESENTATION (17) (a). Let A be a square matrix such that A5 = 0 . Verify that (I + A)−1 = I − A +
A2 − A3 + A4 .
(b). If A7 = 0 , then (I − A)−1 = ?
(18) Consider the matrices
(a). (c). α
1
−2 1
2 δ 0β
γ0 , (b). , (d). 1
√
2 γ β , 1
√
2 1β
02 . For what value(s) of α, β, γ and δ do these matrices represent orthogonal transformations?
(19) If A = ((aij )) is a square (n × n) matrix, the trace of A is deﬁned as the sum of the
elements on the main diagonal, tr A : = a11 + a22 + . . . + ann . Prove
(a). tr(αA) = αA , where α is a scalar.
(b). tr(A + B ) = tr A + tr B , where B is also an n × n matrix.
(c). tr(AB ) = tr(BA) .
(d). tr(I ) =?
(20) Assume that A : En → En is anti-symmetric, A∗ = −A .
(a). Prove A − I is invertible. [By Theorem 12, it is suﬃcient to show (A − I )X =
0 ⇒ X = 0 . Use the property of A to prove it AX = X , then X, AX =
X 2,
A∗ X, X = − X 2 , and X, AX = A∗ X, X .]
(b). If U = (A + I )(A − I )−1 , then U is an orthogonal transformation.
(21) Let An be the orthogonal matrix which rotates vectors in E2 through an angle of
2π/n .
(a). Find a matrix representing An (use the standard basis for E2 ).
(b). Let B denote the orthogonal matrix of reﬂection across the x1 axis (p. 382).
Show that BAb = A−1 B . [The group of matrices generated by An and B and all
n
possible products is the dihedral group of order n ].
(22) Prove that the set of all orthogonal transformations of En into En forms a (noncommutative) group under multiplication.
(23) An aﬃne transformation AX = LX + X0 of a linear space into itself is called
non-singular if the linear transformation L is non-singular. Prove that the set of
all such non-singular aﬃne transformations form a (non-commutative) group under
multiplication.
(24) Let A = ((aij )) be a square matrix. Find all such matrices with the property that
tr(AA∗ ) = 0 (see Ex. 19 for the deﬁnition of the trace). 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 217 (25) Consider the linear space
S = { f (x) : f (x) = a + b cos x + c sin x }.
with the scalar product
1
f , g = aa + (b˜ + cc),
˜
b
˜
2
where g (x) = a + ˜ cos x + c sin x . Deﬁne the linear transformation R : S → S by the
˜b
˜
rule
(Rf )(x) = f (x + α), α real.
(a) Show that R is an orthogonal transformation by proving that Rf, Rg = f , g
for all f, g in S .
(b) Choose a basis for S and exhibit a matrix e Re which represents R with respect
to that basis for both the domain and target.
(26) Let A : En → Em . Prove: A is surjective (= onto) if and only if A∗ is injective (=
one to one).
(27) Deﬁne A : P3 → R3 by
A[p(x)] = (p(0), p(1), p(−1)) where p ∈ P3 . Find the matrix for this transformation with respect to the basis e1 = 1 e2 =
(x + 1)2 , e3 = (x − 1)2 , e4 = x3 for P3 ; and the standard basis for R3 .
(b). Find the matrix representing A using the same basis for R3 but using the basis
e1 = 1, e2 = x, e3 = x2 and e4 = x3 for P3 .
ˆ
ˆ
ˆ
ˆ
(28) If A and B both map the linear space V into itself, and if B is the only right
inverse of A, AB = I , prove A is invertible. [Hint: Consider BA + B + I ].
(29) Let A : En → Em be represented by the matrix ((aij )) , and B : Em → En by ((bij )) .
If
Y , AX = B Y, X
for all X ∈ En and all Y ∈ Em , prove B = A∗ . This proves the statement made in
the remark following Theorem 14.
(30) Let L : R4 → R4 be deﬁned by LX = (x1 , 0, x3 , 0) , where X = (x1 , x2 , x3 , x4 ) . Find
a matrix representing L in terms of some basis. You may use the same basis for both
the domain and the target. 5.3 Volume, Determinants, and Linear Algebraic Equations. Often we have stated that thus and so is true if and only if a certain set of vectors are
linearly independent. But we still have no adequate criteria for determining if a set of
vectors is linearly independent. What would be an ideal criterion? One superb criteria
would be as follows. Find a function which assigns to a set of n vectors X1 , X2 , . . . , Xn in
Rn a real number, with the property that this number is zero if and only if the vectors are
linearly dependent. 218 CHAPTER 5. MATRIX REPRESENTATION There is a geometric way of solving this problem. For clarity we shall work in two
dimensions, E2 . If X1 and X2 are any two vectors in E2 , then intuition tells us X1 and
X2 are linearly dependent if and only if the area of the parallelogram (see ﬁg.) is zero.
Thus, once we deﬁne the analogue of volume for n dimensional parallelepipeds in Rn , the
appropriate criterion appears to be that a set of n vectors X1 , . . . , Xn in Em is linearly
dependent if and only if the volume of the parallelepiped they span is zero.
The major hurdle is constructing a volume function which behaves in the manner dictated by two and three dimensional intuition. Our program is to state a few (four to be
exact) desirable properties of a volume function V for parallelepipeds, then construct a
simpler related function - the determinant D , and observe that V = |D| (absolute value
of D ) is a volume function. This determinant function will prove useful in the theory of
linear algebraic equations.
Let X1 and X2 be any two vectors in R2 . We deﬁne the parallelogram spanned by
X1 and X2 to be the set of points X in R2 which have the form
X = t1 X 1 + t2 X 2 , 0 ≤ t 1 ≤ 1, 0 ≤ t 2 ≤ 1 You can check that these points are precisely those in the parallelogram drawn above. The
volume function (really area in this case) V (X1 , X2 ) which assigns to each parallelogram
its volume should have the properties
1. V (X1 , X2 ) ≥ 0.
2. V (λX1 , X2 ) = |λ| V (X1 , X2 ), λ scalar.
3. V (X1 + X2 , X2 ) = V (X1 , X2 ) = V (X1 , X1 + X2 ) .
4. V (e1 , e2 ) = 1. e1 = (1, 0), e2 = (0, 1).
The second property states that if one side is multiplied by λ1 then the volume is
multiplied by |λ| (see ﬁg.).
The third property is more subtle. It states that the volume of the parallelogram
spanned by X1 and X2 is the same as the parallelogram spanned by X1 and X1 + X2 .
This is clear from the ﬁgure since both parallelograms have the same base and height.
The last property merely normalizes the volume. It states that the unit square has
volume 1.
Our ﬁrst task is to deﬁne a parallelepiped in En .
Definition: The n dimensional parallelepiped in En spanned by a linearly independent
set of vectors X1 , X2 , . . . , Xn is the set of all points X in Rn of the form
X = t1 X 1 + t2 X 2 + · · · + tn X n , 0 ≤ tj ≤ 1. It is a straightforward matter to write the axioms for the volume V (X1 , X2 , . . . , Xn )
for the n dimensional parallelepiped in En .
V-1. V (X1 , X2 , . . . , Xn ) ≥ 0.
V-2. V (X1 , X2 , . . . , Xn ) is multiplied by |λ| if some Xj is replaced by λXj where λ
is real.
V-3. V (X1 , X2 , . . . , Xn ) does not change if some Xj is replaced by Xj + Xk , where
j =k.
V-4. V (e1 , e2 , . . . , en ) = 1 , where e1 = (1, 0, 0, . . . , 0) , etc.
These axioms are amazingly simple. It is surprising that the volume function V in
uniquely determined by them; that is, there is only one function which satisﬁes these axioms.
You might wonder why we did not add the reasonable stipulation that volume remains 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 219 unchanged if the parallelepiped is subjected to a rigid body transformation. The reason
is that this axiom would be redundant, for this invariance of volume under rigid body
transformation will be one of our theorems.
The most simple way to obtain the volume function is to ﬁrst obtain the determinant
function D(X1 , X2 , . . . , Xn ) . We deﬁne the determinant function D(X1 , X2 , . . . , Xn ) of n
vectors X1 , X2 , . . . , Xn in Rn by the following axioms (selected from those for V ).
D-1. D(X1 , X2 , . . . , Xn ) is a real number.
D-2. D(X1 , X2 , . . . , Xn ) is multiplied by λ if some Xj is replaced by λXj where λ
is real.
D-3. D(X1 , X2 , . . . , Xn ) does not change if some Xj is replaced by Xj + Xk , where
j =k.
D-4. D(e1 , e2 , . . . , en ) = 1 , where e1 = (1, 0, 0, . . . , 0) etc.
Remarks:
(1) If A = ((aij )) is a (square) n × n matrix, A= a11 a12 · · ·
a21 a22 · · ·
·
·
·
an1 an2 · · · a1n
·
·
·
·
ann we can consider it as being composed of n column vectors A1 , A2 , . . . , An , and deﬁne
the determinant of the square matrix A in terms of the determinant of these vectors det A = D(A1 , A2 , . . . , An ) = a11 a12 · · ·
a21
·
·
·
an1 · · · · · · a1n
a2n
·
·
·
ann . (2) Although we have written a set of axioms for D , it is not at all obvious that such a
function exists. Rest assured that we will prove the existence of such a function.
(3) Observe: if we deﬁne
V (X1 , X2 , . . . , Xn ) := |D(X1 , X2 , . . . , Xn )| , Xj ∈ En , then V does satisfy the axioms for volume.
Granting existence of D , we derive some algebraic consequences of the axioms.
Theorem 5.23 . Let D be a function which satisﬁes axiom D-1 to D-3 (not necessarily
D-4).
(1) If Xj is replaced by Xj = λk Xk then D does not change.
k =j 220 CHAPTER 5. MATRIX REPRESENTATION
(2) If one of the vectors Xj is zero, then D = 0 .
(3) If the vectors X1 , X2 , . . . , Xn are linearly dependent then D = 0 . In particular
D = 0 if two vectors are equal.
(4) D is a linear function of each of its variables, that is
D(. . . , λY + µZ, . . .) = λD(. . . , Y, . . .) + µD(. . . , Z, . . .)
(so D is a multilinear function).
(5) If any two vectors Xi and Xj are interchanged, then D is multiplied by −1 .
D(. . . , Xi , . . . , Xj , . . .) = −D(. . . , Xj , . . . , Xi , . . .) Proof: These proofs, like the statements above, are conceptually simple but notationally
awkward. Notice that only Axioms 1-3 but not Axiom 4 will be used. We shall need this
fact shortly.
(1) We prove this only if Xj is replaced by Xj + λXk , j = k and λ = 0 . The general
case is a simple repetition of this until the other Xk ’s are used up. It is simplest to
work backward. By Axiom 2,
D(. . . , Xj + λXk , . . . , Xk . . .) = 1
D(. . . , Xj + λXk , . . . , λXk , . . .)
λ so by axiom 3 (since λXk is now a vector in D )
= 1
D(. . . , Xj , . . . , λXk , . . .)
λ and axiom 2 again
= D(. . . , Xj , . . . , Xk , . . .).
(2) Write the vector Xj = 0 as 0Xj where 0 is now a scalar. This scalar may be brought
outside D by axiom 2. Since D is a real number, 0 · D = 0 .
(3) Let Xj = ak Xk . By part 1, D does not change if Xj is replaced by Xj +
k =j λk X k .
k =j Choose λk = −ak . This gives a D with one vector zero, Xj − ak Xk = 0 . Thus
k =j D is zero by part 2.
(4) The trickiest part. Axiom 2 immediately reduced this to the special case λ = µ = 1 .
For notational convenience, let Y + Z by in the last slot. We have to prove
D(X1 , X2 , . . . , Y + Z ) = D(X1 , X2 , . . . , Y ) + D(X1 , X2 , . . . , Z ).
If X1 , X2 , . . . , Xn−1 (which appear in all three terms above) are linearly dependent,
we are done by part 3. Thus assume they are linearly independent. Since our linear
space Rn has dimension n , these n − 1 vectors can be extended to a basis for Rn 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 221 ˜
by adding one more, Xn . Now we can write Y and Z as a linear combination of
these basis vectors
˜
Y = a1 X1 + · · · + an−1 Xn−1 + an Xn , ˜
Z = b1 X1 + · · · + bn−1 Xn−1 + bn Xn . Substituting this into D we obtain
n−1 ˜
(aj + bj )Xj + (an + bn )Xn ). D(X1 , . . . , Y + Z ) = D(X1 , . . . , . . . ,
1 But by part 1,
˜
= D(X1 , . . . , (an + bn )Xn )
and axiom 1 results in
˜
= (an + bn )D(X1 , . . . , Xn ).
However, again by part 1,
n−1 ˜
aj Xj + an Xn ) D(X1 , . . . , Y ) = D(X1 , . . . ,
1 ˜
˜
= D(X1 , . . . , . . . , an Xn ) = an D(X1 , . . . , Xn ).
Similarly
˜
D(X1 , . . . , Z ) = bn D(X1 , . . . , Xn ).
Adding these two expressions and comparing them with the above, we obtain the
result.
(5) To avoid a mess, indicate only the i th and j th vectors. Our task is to prove
D(. . . , Xi , . . . , Xj , . . .) = −D(. . . , Xj , . . . , Xi , . . .).
This is clever. Watch: By the multilinearity (part 4)
D(. . . , Xi + Xj , . . . , Xi + Xj , . . .)
= D(. . . , Xi , . . . , Xi , . . . , ) + · · · + D(. . . , Xi , . . . , Xj , . . .)
+ D(. . . , Xj , . . . , Xi , . . .) + · · · + D(. . . , Xj , . . . , Xj , . . .).
However part 2 states that the left side as well as the ﬁrst and last terms on the right
are zero. Thus
0 = D(. . . , Xi , . . . , Xj , . . .) + D(. . . , Xj , . . . , Xi , . . .).
Transposition of one of the terms to the other side of the equality sign completes the
proof. You should also be able to fashion an easy proof of this part which uses only
the axioms directly (and uses none of the other parts of this theorem).
Instead of moving on immediately, it is instructive to compute D[X1 , X2 ] where X1
and X2 are vectors in R2 , X1 = (a, b), X2 = (c, d) . Then we are computing
D a
b , c
d , 222 CHAPTER 5. MATRIX REPRESENTATION
ac
bd which is, equivalently, the determinant of the matrix D a
b , c
d = aD
= aD
= aD 1
b
a 1
b
a 1
b
a = (ad − bc) D
= (ad − bc) D
= (ad − bc) D , c
d , c
d (axiom 2)
1 −c (algebra) ad−cb
a 1
b
a 1
b
a 1
0 0
1 ,
−
, (Theorem 21 part 1) b
a 0 , . b
a (axiom 2)
0
1 0
1 = (ad − bc) D [e1 , e2 ] = ad − bc , 0
1 (Theorem 21 part 1) (algebra)
(axiom 4). Thus |Area| = |(a + c)(b + d) − 2bc − cd − ab| = |ad − bc|
You can indulge in a bit of analytic geometry (or look at my ﬁgure) to show that the
area of a parallelogram spanned by X1 and X2 is |ad − bc| . From our explicit calculation,
the existence and uniqueness of the determinant of two vectors in R2 has been proved.
There are several ways to prove the general existence and uniqueness of a determinant function. Our procedure is to ﬁrst prove there is at most one determinant function
(uniqueness). Then we shall deﬁne a function inductively, and verify it satisﬁes the axioms.
By uniqueness, it must be the only function. Two interesting and important preliminary
propositions are needed.
The following lemma shows how to evaluate the determinant if all of the elements above
the principal diagonal are zeroes (that is, the determinant of a lower triangular matrix).
LEMMA: Let X1 , · · · , Xn be the columns of a lower triangular matrix a11 0
0 ...
0 a21 a22 0
· ·
·
0 ·
·
· ·
·
·
an1 an2 · · · · · · ann
Then
D(X1 , · · · , Xn ) = a11 a22 · · · ann D(e1 , · · · , en ) = a11 a22 · · · ann ,
that is, the determinant of a triangular matrix is the product of the diagonal elements.
Proof: If any one of the principal diagonal elements is zero, then the determinant is zero. For example, if ajj = 0 , then the n − j + 1 vectors Xj , · · · , Xn all have their first j components zero, and hence can span at most an (n − j)-dimensional space. Since n − j + 1 > n − j , these vectors must be linearly dependent. Therefore, by Theorem 21, part 3, the determinant is zero, as the lemma asserts. [If you didn't follow this, look at a 3 × 3 or 4 × 4 lower triangular matrix and think for a moment.]

5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 223

If none of the diagonal elements is zero, we can carry out the following simple recipe. The recipe gives a procedure for reducing the problem to evaluating a matrix which is zero everywhere except along the diagonal.
First, we get all zeros to the left of a22 in the second row by multiplying the second column, X2 , by −a21 /a22 and adding the resulting vector to X1 . This gives a new first column with i = 2, j = 1 element zero. Moreover, the new matrix has the same determinant as the old one (Theorem 21, part 1). It looks like

| a11   0    0   · · ·   0  |
|  0   a22   0   · · ·   0  |
| ã31  a32  a33  · · ·   0  |
|  ·    ·    ·            ·  |
| ãn1  an2  · · ·        ann |
Only the first column has changed (hence the tildes). Repeat the same process to get all zeros to the left of a33 . Thus, multiply the third column by −ã31 /a33 and −a32 /a33 and add the result to the first and second columns, respectively. This gives a new matrix, again with equal determinant, but which looks like

| a11   0    0    0   · · ·   0  |
|  0   a22   0    0   · · ·   0  |
|  0    0   a33   0   · · ·   0  |
| â41  â42  â43  a44  · · ·   0  |
|  ·    ·    ·    ·            ·  |
| ân1  ân2  ân3  · · ·       ann |

Moving on, we gradually eliminate all of the terms to the left of the diagonal but keep the same diagonal ones. The final result is

| a11   0   · · ·   0  |
|  0   a22  · · ·   0  |
|  ·         ·      ·  |
|  0    0   · · ·  ann |

It has the same determinant as the original matrix, so
D(X1 , . . . , Xn ) = D(a11 e1 , . . . , ann en )
= a11 · · · ann D(e1 , . . . , en ),
where Axiom 2 has been used to pull out the constants. Now Axiom 4, D(e1 , . . . , en ) = 1 ,
can be used to complete the proof. Observe that Axiom 4 is not used until the very last
step. Thus, the formula D = (something) D(e1 , . . . , en ) depends only on Axioms 1-3. We
shall need this soon.
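As a quick machine check of the lemma (our own sketch, not from the notes; the naive cofactor routine is suitable only for small matrices), the determinant of a lower triangular matrix is indeed the product of its diagonal elements:

```python
def det(M):
    """Naive cofactor expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

L = [[2, 0, 0, 0],
     [7, 3, 0, 0],
     [1, 4, 5, 0],
     [9, 2, 8, 1]]  # lower triangular

assert det(L) == 2 * 3 * 5 * 1  # product of the diagonal
```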
The above lemma shows how easy it is to evaluate the determinant of a lower triangular matrix. It becomes particularly valuable when coupled with the next theorem, which shows how the determinant of an arbitrary matrix can be reduced to that of a lower triangular matrix. The reduction procedure given here is the best practical way of evaluating
a determinant. There is a peculiar criss-cross method for evaluating 3 × 3 determinants
which is taught in many high schools. Forget it. The method is not very practical and does
not generalize to 4 × 4 or larger determinants.
Theorem 5.24 . The evaluation of the determinant D(X1 , . . . , Xn ) can be reduced to the
evaluation of a lower triangular matrix, and hence has the form
D = (something)D(e1 , . . . , en ).
The proof gives a way of computing “something” in terms of the original matrix.
Remark: In the above formula, we did not utilize the fact that
D(e1 , · · · , en ) = 1
since this one step in the proof is the only place where Axiom 4 would be used. Consequently the formula holds for any function which satisfies only Axioms 1-3, a fact we can (and shall) use.
Proof: This is just a recipe for carrying out the reduction. It essentially is a repetition
of the last part of the preceding lemma. Instead of waving our hands at the procedure, we
shall work out a representative

Example: Evaluate

D = D(X1 , X2 , X3 , X4 ) =
|  1   2  −1   0 |
| −1  −2   3   1 |
|  0  −1   4  −3 |
|  2   5   0   1 |

by reducing it to a lower triangular determinant.
First we get all zeros to the right of the diagonal in the first row, that is, except in the a11 slot, by multiplying X1 by the constants −2, 1, and 0 and adding the resulting vectors to X2 , X3 , and X4 , respectively. We obtain

D =
|  1   2  −1   0 |
| −1  −2   3   1 |
|  0  −1   4  −3 |
|  2   5   0   1 |
=
|  1   0  −1   0 |
| −1   0   3   1 |
|  0  −1   4  −3 |
|  2   1   0   1 |
=
|  1   0   0   0 |
| −1   0   2   1 |
|  0  −1   4  −3 |
|  2   1   2   1 |

Now we get all zeros to the right of the diagonal in the second row. Since the new a22 element above is zero, interchange the second and third columns (one could have interchanged the second and fourth). This introduces a factor of −1 (by Theorem 21, part 5). Then multiply the new second column by the constants 0 and −1/2, respectively, and add to the last two columns, respectively. This gives

D =
|  1   0   0   0 |
| −1   0   2   1 |
|  0  −1   4  −3 |
|  2   1   2   1 |
= −
|  1   0   0   0 |
| −1   2   0   1 |
|  0   4  −1  −3 |
|  2   2   1   1 |
= −
|  1   0   0   0 |
| −1   2   0   0 |
|  0   4  −1  −5 |
|  2   2   1   0 |

And on the third row, where we again want all zeros to the right of the diagonal, multiply the new third column by −5 and add it to the fourth column:

D = −
|  1   0   0   0 |
| −1   2   0   0 |
|  0   4  −1  −5 |
|  2   2   1   0 |
= −
|  1   0   0   0 |
| −1   2   0   0 |
|  0   4  −1   0 |
|  2   2   1  −5 |
= −(1)(2)(−1)(−5) = −10,

where we have used the lemma about determinants of lower triangular matrices to evaluate
the last determinant.
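The recipe is easy to mechanize. The following Python sketch (ours, not the notes'; it uses the standard fractions module for exact arithmetic) reduces a matrix to lower triangular form by column operations, tracks the sign changes from column interchanges, and reproduces D = −10 on the example above:

```python
from fractions import Fraction

def triangular_det(A):
    """Evaluate a determinant by reducing to lower triangular form
    with column operations, as in the text's recipe."""
    A = [[Fraction(x) for x in row] for row in A]
    n, sign = len(A), 1
    for i in range(n):
        # find a nonzero entry in row i at column >= i, swap it onto the diagonal
        p = next((j for j in range(i, n) if A[i][j] != 0), None)
        if p is None:
            return 0          # no pivot available: the determinant is zero
        if p != i:
            for row in A:
                row[i], row[p] = row[p], row[i]
            sign = -sign      # column interchange (Theorem 21, part 5)
        # clear everything to the right of the diagonal in row i
        for j in range(i + 1, n):
            m = A[i][j] / A[i][i]
            for row in A:
                row[j] -= m * row[i]   # add a multiple of column i to column j
    d = sign
    for i in range(n):
        d *= A[i][i]          # product of the diagonal (the lemma)
    return d

A = [[1, 2, -1, 0],
     [-1, -2, 3, 1],
     [0, -1, 4, -3],
     [2, 5, 0, 1]]
assert triangular_det(A) == -10
```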
Uniqueness is now elementary.
Theorem 5.25 . There is at most one function D(X1 , · · · , Xn ), Xk ∈ Rn , which satisfies the 4 axioms for a determinant function.

Proof: Assume there are two such functions, D(X1 , · · · , Xn ) and D̃(X1 , · · · , Xn ). Let

∆(X1 , · · · , Xn ) = D(X1 , · · · , Xn ) − D̃(X1 , · · · , Xn ).

We shall show ∆(X1 , · · · , Xn ) = 0 for any choice of X1 , · · · , Xn . Since both D and D̃ satisfy Axioms 1-4, we have

1). ∆ = D − D̃ is real valued.

2). ∆(. . . , λXj , . . .) = D(. . . , λXj , . . .) − D̃(. . . , λXj , . . .)
                        = λD(. . . , Xj , . . .) − λD̃(. . . , Xj , . . .)
                        = λ∆(. . . , Xj , . . .).

3). ∆(. . . , Xj + Xk , . . .) = D(. . . , Xj + Xk , . . .) − D̃(. . . , Xj + Xk , . . .)
                              = D(. . . , Xj , . . .) − D̃(. . . , Xj , . . .)
                              = ∆(. . . , Xj , . . .),   j ≠ k.

4). ∆(e1 , . . . , en ) = D(e1 , . . . , en ) − D̃(e1 , . . . , en ) = 1 − 1 = 0 .

Thus, ∆ satisfies the same first three axioms but ∆(e1 , . . . , en ) = 0 in place of Axiom 4. Because the proof of Theorem 22 and its predecessors never used Axiom 4, we know that

∆(X1 , . . . , Xn ) = (something) ∆(e1 , . . . , en ) = 0.

Thus ∆(X1 , · · · , Xn ) = 0 for any vectors Xj .
If it exists, the determinant function is known to be unique. We intend to deﬁne the
determinant of order n , that is, of n vectors in Rn , in terms of determinants of order
n − 1 . The key to such an approach is a relationship between a determinant of order n and determinants of order n − 1 . To motivate our definition, we first examine the case n = 3
and utilize the intimate relation between determinant and volume.
Let X1 , X2 and X3 be three vectors in R3 . To ﬁnd the determinant D(X1 , X2 , X3 ) ,
we can resolve one of the vectors, say X1 , into its components X1 = a11 e1 + a21 e2 + a31 e3 .
Since the determinant function is linear (Theorem 21, part 4),
D = D(X1 , X2 , X3 ) = a11 D(e1 , X2 , X3 ) + a21 D(e2 , X2 , X3 ) + a31 D(e3 , X2 , X3 ).
How can we interpret D(a11 e1 , X2 , X3 ) ? We are computing

D(a11 e1 , X2 , X3 ) =
| a11  a12  a13 |
|  0   a22  a23 | .
|  0   a32  a33 |

By subtracting suitable multiples of the first column from the other two, we have

D(a11 e1 , X2 , X3 ) =
| a11   0    0  |
|  0   a22  a23 | .
|  0   a32  a33 |

Consider the related volume function. The vectors in the last matrix span a parallelepiped whose base is the parallelogram spanned by (0, a22 , a32 ) and (0, a23 , a33 ) , while the height is a11 . Thus, we expect the volume to be a11 times the area of the base. Since the area of the base is

det | a22  a23 |
    | a32  a33 | ,

we hope

| a11   0    0  |
|  0   a22  a23 |  =  a11 | a22  a23 |
|  0   a32  a33 |         | a32  a33 | ,

except possibly for a factor of ±1 . This last formula is the connection between determinants of order three and those of order two.
Notice that the determinant on the right in the last equation is obtained from that of
D = D(X1 , X2 , X3 ) by deleting both the ﬁrst row and ﬁrst column. It is called the 1,1
minor of D , and written D11 . More generally, the i, j minor Dij of D is the determinant
obtained by deleting the i th row and j th column of D . If D is of order n , then each
Dij is of order n − 1 .
In this notation, we expect from the expansion of D(X1 , X2 , X3 ) that

D(X1 , X2 , X3 ) = ±? a11 D11 ±? a21 D21 ±? a31 D31 ,

or

| a11  a12  a13 |
| a21  a22  a23 |  =  ±? a11 | a22  a23 |  ±? a21 | a12  a13 |  ±? a31 | a12  a13 |
| a31  a32  a33 |            | a32  a33 |         | a32  a33 |         | a22  a23 |

where ? indicates our doubt as to the signs. Explicit evaluation of both sides (using Theorem 22) reveals that the correct sign pattern is +, −, + .
Having examined this special case (and the 4 × 4 case too), we are tentatively led to
Suspicion (Expansion by Minors). If D(X1 , . . . , Xn ) is a determinant function, that is, if
it satisﬁes the axioms, then
D(X1 , X2 , . . . , Xn ) = Σ_{i=1}^{n} (−1)^{i+j} aij Dij ,     (5-3)

where Xj = (a1j , a2j , . . . , anj ) .
For the case n = 3, j = 1 this is the formula we found above. To verify that the
formula is correct, we must verify that the function satisﬁes our axioms for a determinant.
The reasoning goes as follows: we know exactly what determinants of order two are by a
previous computation, so the formula gives a candidate for the determinant of order three,
which in turn gives a candidate for a determinant function of order four, and so on. Thus,
by induction, let us assume that determinants of order k − 1 are known. We must prove
Theorem 5.26 . The previous function D(X1 , . . . , Xk ) deﬁned by the above formula is a
determinant function, that is, it satisﬁes the axioms.
Proof: 1). D(X1 , . . . , Xk ) is real valued since, by our induction hypothesis, each of the
Dij , determinants of order k − 1 , is real valued.
2). D(. . . , λXl , . . .) = λD(. . . , Xl , . . .) . There are two cases. If l = j , then λXj means that each of a1j , a2j , . . . is multiplied by λ . Thus

D(. . . , λXj , . . .) = Σ_{i=1}^{n} (−1)^{i+j} λ aij Dij = λD(. . . , Xj , . . .),

so the axiom is satisfied. If l ≠ j , then some vector Xl other than Xj is multiplied by λ , so

D(. . . , λXl , . . . , Xj , . . .) =
| · · ·  λa1l  · · ·  a1j  · · · |
| · · ·  λa2l  · · ·  a2j  · · · |
|          ·           ·        |
| · · ·  λakl  · · ·  akj  · · · |

Since Dij is formed by deleting the i th row and j th column of D , and l ≠ j , one
column in minor Dij will have the factor λ appearing in it. By the induction hypothesis,
the factor can be pulled out of each one, and hence from any linear combination of them.
Because the expansion formula for D is a linear combination of the minors, the axiom is
veriﬁed in this case too.
3). Omitted. This one is just plain messy. If you don’t care to try the general case for
yourself, at least try the case n = 3 and verify it there.
4). To prove D(e1 , . . . , en ) = 1 : of the coefficients a1j , a2j , . . . , anj , only ajj ≠ 0 , and ajj = 1 . Thus D(e1 , . . . , en ) = (−1)^{j+j} ajj Djj = Djj . But by the induction hypothesis, Djj = 1 since it has only ones on its main diagonal and zeros elsewhere. Therefore
D(e1 , . . . , en ) = 1 , as desired.
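For the computationally minded, formula (5-3) translates directly into a recursive routine (our sketch, not the notes'; the code is 0-indexed, so j = 0 below corresponds to the text's j = 1):

```python
def det(M, j=0):
    """Expansion by minors down column j, formula (5-3): determinants of
    order n are defined through determinants of order n - 1 (recursion)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for i in range(n):
        # the i,j minor: delete row i and column j
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
        total += (-1) ** (i + j) * M[i][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == -2
assert det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]) == 24  # product of the diagonal
```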
This theorem completes (except for one segment) the proof that a unique determinant
function exists. The uniqueness was proved directly, while the existence was obtained from
the known existence of 2 × 2 determinant functions (the simpler case of 1 × 1 determinants could also have been used) and proving inductively that a candidate for the n × n
determinant function does satisfy the axioms.
Emerging from the jungle of the existence proof, we are fully equipped with the powerful
determinant function and the associated volume function. It will be relatively simple to
prove the remaining theorems involving determinants. The trick in most of them is to make
clever use of the fact that the determinant function is unique. We shall expose this trick in
its bare form.

Theorem 5.27 . Let ∆(X1 , . . . , Xn ) be a function of n vectors in Rn which satisfies
axioms 1-3 for the determinant. Then for every set of vectors X1 , . . . , Xn
∆(X1 , . . . , Xn ) = ∆(e1 , . . . , en )D(X1 , . . . , Xn ).
Thus, the function ∆ diﬀers from D only by a constant multiplicative factor, which is the
number ∆ assigns to the unit matrix (geometrically, the unit cube) in Rn .
Proof: If ∆(e1 , . . . , en ) = 1 , then ∆ satisfies Axiom 4 also, so by the uniqueness theorem, it must be D itself. If ∆(e1 , . . . , en ) ≠ 1 , consider

D̃(X1 , . . . , Xn ) := [ D(X1 , . . . , Xn ) − ∆(X1 , . . . , Xn ) ] / [ 1 − ∆(e1 , . . . , en ) ].

Note that the denominator is a fixed scalar which does not depend on X1 , . . . , Xn . It is a mental calculation to verify that D̃ satisfies all of Axioms 1-4. Therefore D̃(X1 , . . . , Xn ) = D(X1 , . . . , Xn ) by uniqueness. Solving the last equation for ∆(X1 , . . . , Xn ) yields the
formula.
Consider D(X1 , . . . , Xn ) . If B = ((bij )) is a square n × n matrix representing a linear
transformation from Rn to Rn , how are D(X1 , . . . , Xn ) and D(BX1 , BX2 , . . . , BXn )
related? The answer to this question is vital if we are to ﬁnd how volume varies under a
linear transformation B . If A = ((aij )) is the matrix whose columns are X1 , . . . , Xn , and
C = ((cij )) is the matrix whose columns are BX1 , BX2 , . . . , BXn , then C = BA [since,
for example, c11 —the ﬁrst element in the vector BX1 —is
c11 = b11 a11 + b12 a21 + b13 a31 + · · · + b1n an1 .]
Because D(X1 , . . . , Xn ) = det A and D(BX1 , . . . , BXn ) = det C , our question becomes
one of relating det C = det(BA) to det A . The result is as simple as one could possibly
expect.
Theorem 5.28 . If A and B are two n × n matrices, then
det(BA) = (det B )(det A) = (det A)(det B ) = det(AB )
or, if X1 , . . . , Xn are the column vectors of A , then this is equivalent to
D(BX1 , . . . , BXn ) = D(Be1 , Be2 , . . . , Ben )D(X1 , . . . , Xn )
(since the matrix whose columns are Be1 , . . . , Ben is just B ).
Proof: Let ∆(X1 , . . . , Xn ) := D(BX1 , . . . , BXn ) . This function clearly satisﬁes Axiom
1. We shall verify Axioms 2 and 3 at the same time.
∆(. . . , λXj + µXk , . . .) = D(. . . , B (λXj + µXk ), . . .).
Because B is a linear transformation, we have
= D(. . . , λBXj + µBXk , . . .).
By the linearity of D (Theorem 21, part 4)
= λD(. . . , BXj , . . .) + µD(. . . , BXk , . . .).

If j ≠ k , then the vector BXk in the second term on the right also appears as another column in the same determinant. Hence the second term vanishes. Thus if j ≠ k ,

∆(. . . , λXj + µXk , . . .) = λD(. . . , BXj , . . .).
The special case µ = 0 shows Axiom 2 holds for ∆ , while the case λ = µ = 1 veriﬁes
Axiom 3. Therefore ∆ satisﬁes Axioms 1-3. Applying the preceding Theorem (25), we
have
∆(X1 , . . . , Xn ) = ∆(e1 , . . . , en )D(X1 , . . . , Xn ).
By deﬁnition, ∆(e1 , . . . , en ) := D(Be1 , . . . , Ben ) . Substitution veriﬁes our formula. The
commutativity
(det B )(det A) = (det A)(det B )
follows from the fact that det A and det B are real numbers, which commute under multiplication.
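A small numerical check of the product rule (our own sketch, reusing a naive cofactor determinant; the matrices are arbitrary):

```python
def det(M):
    """Cofactor expansion down the first column."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** i * M[i][0] *
               det([r[1:] for k, r in enumerate(M) if k != i])
               for i in range(len(M)))

def matmul(B, A):
    """Product of two square matrices, C = BA."""
    n = len(A)
    return [[sum(B[i][k] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
assert det(matmul(B, A)) == det(B) * det(A) == det(matmul(A, B))
```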
Corollary 5.29 . If A is an invertible matrix, then
det(A−1 ) = 1 / det A .

Proof: Since AA−1 = I , and det I = 1 , we find
(det A)(det A−1 ) = det(AA−1 ) = det I = 1.
Ordinary division completes the proof.
Our next theorem is also a corollary, but because of its importance, we call it
Theorem 5.30 . The vectors X1 , . . . , Xn in Rn are linearly independent if and only if
D(X1 , . . . , Xn ) ≠ 0 .

Proof: ⇐ If D(X1 , . . . , Xn ) ≠ 0 , then the vectors X1 , . . . , Xn are linearly independent,
since if they were dependent, then D = 0 by part 3 of Theorem 21.
⇒ . If X1 , . . . , Xn are linearly independent vectors in Rn , then the Corollary to
Theorem 12 (p. 364) shows that the matrix A whose columns are the Xj is invertible.
Let A−1 be its inverse. From the computation in the corollary preceding this theorem,
(det A)(det A−1 ) = 1.
Thus the real number det A cannot be zero. The equivalent form of our theorem is also a
consequence of the Corollary to Theorem 12.
Example: (cf. p. 157, Ex. 1b). Are the vectors

X1 = (0, 1, 1),   X2 = (0, 0, −1),   X3 = (0, 2, 3)

linearly dependent? We compute the determinant

D(X1 , X2 , X3 ) =
| 0   0   0 |
| 1   0   2 | .
| 1  −1   3 |

If we knew that "the determinant of a matrix was equal to the determinant of its adjoint" (a true theorem to be proved below), then taking the adjoint we get a matrix with one column zero, which gives D = 0 . Since the quoted theorem is not yet proved, we proceed differently and reduce our 3 × 3 determinant to 2 × 2 determinants by expanding by minors (p. 411). The simplest column to use is the second.

| 0   0   0 |
| 1   0   2 |  =  (−1)^{1+2} · 0 · | 1  2 |  +  (−1)^{2+2} · 0 · | 0  0 |  +  (−1)^{3+2} · (−1) · | 0  0 |
| 1  −1   3 |                      | 1  3 |                      | 1  3 |                         | 1  2 |

= (0 · 2 − 0 · 1) = 0

by the explicit formula for evaluating 2 × 2 determinants. Thus D = 0 so the vectors
X1 , X2 , X3 are linearly dependent.
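The test in this example is easy to automate (our sketch; det3 takes the three vectors as the columns of the matrix, as in the text):

```python
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    M = list(zip(c1, c2, c3))  # rows of the matrix
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# All first components are zero, so the vectors are dependent:
assert det3((0, 1, 1), (0, 0, -1), (0, 2, 3)) == 0
# The standard basis, by contrast, is independent:
assert det3((1, 0, 0), (0, 1, 0), (0, 0, 1)) == 1
```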
That nice theorem we could have used in the above example is our next target.
Theorem 5.31 . If A is an n × n matrix, then
det A∗ = det A.
Proof: Let A1 , . . . , An be the columns of A and B1 , . . . , Bn its rows,

A =
| a11  a12  · · ·  a1n |  } B1
| a21  · · ·       a2n |  } B2
|  ·                ·  |    ·
| an1  · · ·       ann |  } Bn

Consider the function

D(B1 , . . . , Bn ) =
| a11  · · ·  an1 |  } A1
| a12           ·  |    ·
|  ·            ·  |    ·
| a1n  · · ·  ann |  } An
= det A∗ ,

since the rows of A are the columns of A∗ . Let us define a new function

D̂(A1 , . . . , An ) := D(B1 , . . . , Bn ).

Our task is to verify that D̂(A1 , . . . , An ) satisfies all of Axioms 1-4. Then by uniqueness

det A∗ := D̂(A1 , . . . , An ) = D(A1 , . . . , An ) = det A.

(1) D̂(A1 , . . . , An ) is a real number since det A∗ , the determinant of the matrix A∗ , is a real number.
(2) We must show D̂(. . . , λAj , . . .) = λD̂(. . . , Aj , . . .) , that is,

| a11   · · ·  an1  |       | a11  · · ·  an1 |
|  ·            ·   |       |  ·           ·  |
| λa1j  · · ·  λanj |  =  λ | a1j  · · ·  anj |
|  ·            ·   |       |  ·           ·  |
| a1n   · · ·  ann  |       | a1n  · · ·  ann |

(a fact we only know so far if a column is multiplied by a scalar). Trick: observe that

| 1              |   | a11  · · ·  an1 |     | a11   · · ·  an1  |
|   ·            |   |  ·           ·  |     |  ·            ·   |
|     λ          | · | a1j  · · ·  anj |  =  | λa1j  · · ·  λanj |
|        ·       |   |  ·           ·  |     |  ·            ·   |
|           1    |   | a1n  · · ·  ann |     | a1n   · · ·  ann  |

The matrix on the left is the identity matrix I except for a λ in its j th row and j th column. Its determinant is λ (since you can factor λ from the j th column and are left with the identity matrix). By Theorem 26, the determinant of the product on the left is λ D̂(A1 , . . . , An ) , while the right side is D̂(. . . , λAj , . . .) , proving D̂ satisfies Axiom 2.
(3) The proof of Axiom 3 involves a similar trick. We have to show D̂(. . . , Aj + Ak , . . .) = D̂(. . . , Aj , . . .) where j ≠ k , that is, to show

| a11        · · ·  an1       |     | a11  · · ·  an1 |
|  ·                 ·        |     |  ·           ·  |
| a1j + a1k  · · ·  anj + ank |  =  | a1j  · · ·  anj |  ,   j ≠ k.
|  ·                 ·        |     |  ·           ·  |
| a1n        · · ·  ann       |     | a1n  · · ·  ann |

Observe that

| 1  0  · · ·       0 |   | a11  · · ·  an1 |     | a11        · · ·  an1       |
| 0  1  · · ·       0 |   |  ·           ·  |     |  ·                 ·        |
|  ·    1 · · 1       | · | a1j  · · ·  anj |  =  | a1j + a1k  · · ·  anj + ank |
|  ·          ·       |   |  ·           ·  |     |  ·                 ·        |
| 0  0  · · ·       1 |   | a1n  · · ·  ann |     | a1n        · · ·  ann       |

where the matrix on the left is the identity matrix with an extra 1 in the j th row, k th
column. Since the determinant of this matrix is one (check by a mental computation),
the rule for the determinant of a product of matrices shows that Axiom 3 is satisﬁed.
(4) Easy, for

D̂(e1 , . . . , en ) =
| 1  0  · · ·  0 |
| 0  1  · · ·  0 |
| ·      ·     · |
| 0  0  · · ·  1 |
= D(e1 , . . . , en ) = 1.

This verification of the four Axioms, coupled with the remarks at the beginning of the
proof completes the proof.
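The theorem det A∗ = det A can also be confirmed numerically (our own sketch, with an arbitrary integer matrix):

```python
def det(M):
    """Cofactor expansion down the first column."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** i * M[i][0] *
               det([r[1:] for k, r in enumerate(M) if k != i])
               for i in range(len(M)))

def transpose(M):
    """The adjoint A* of the text: rows become columns."""
    return [list(col) for col in zip(*M)]

A = [[1, 2, 0], [3, -1, 4], [2, 2, 5]]
assert det(transpose(A)) == det(A)
```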
Corollary 5.32 . The column operations of Theorem 21 are also valid as row operations.
Proof: Every row operation on a matrix A (like adding two rows) can be split up into: i) take A∗ , so the rows become columns, ii) carry out the operation on the columns of A∗ , and iii) take the adjoint again. Since the determinant does not change under these operations,
we are done.
Corollary 5.33 . If R is an orthogonal matrix then
det R = ±1.
Proof: If R is orthogonal, then R∗ R = I by Theorem 19 (p. 383). Thus,
1 = det I = det(R∗ R) = (det R∗ )(det R) = (det R)² ,

where Theorems 25 and 27 were invoked once each. Now take the square root of both sides.

The orthogonal matrices

R1 = | 1  0 |     and     R2 = | 0  1 |
     | 0  1 |                  | 1  0 |

for which det R1 = 1 and det R2 = −1 , show that both signs are possible. If det R = −1 ,
then the orthogonal transformation has not only been a rotation but also a reﬂection. The
transformation given by R2 is
a figure goes here 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 233 which can be thought of as the composition (product) of a rotation by +900 followed
ˆ˜
by a reﬂection (mirror image). In fact, R2 may be factored into RR = r2 , where
1
0
0 −1 01
−1 0 01
10 = = R2 . Pictorially
a figure goes here
Our theorems about determinants also imply the following valuable result about volume.
Theorem 5.34 . Let X1 , . . . , Xn span a parallelepiped Q in En and the matrix A map
En into En . Then the volume is magniﬁed by |det A| , that is,
V [AX1 , . . . , AXn ] = |det A| V [X1 , . . . , Xn ].
If we denote the image of Q by A(Q) , then this theorem reads
Vol[A(Q)] = |det A| Vol[Q].

Proof: [We should first prove that there is at most one volume function V satisfying its four axioms. Since V := |D| is a volume function, assume there is another volume function V∗ and define D̃(X1 , . . . , Xn ) by

D̃(X1 , . . . , Xn ) :=  { V∗(X1 , . . . , Xn ) D(X1 , . . . , Xn ) / |D(X1 , . . . , Xn )|    if D ≠ 0
                         { 0                                                                if D = 0.

It is simple to check that D̃ satisfies the axioms for a determinant. By uniqueness, D̃ = D . Solving the last equation, we find V∗(X1 , . . . , Xn ) = |D(X1 , . . . , Xn )| ≡ V (X1 , . . . , Xn ) , so the volume function is also unique.]

The theorem is easily proved. Since V = |D| , an application of Theorem 26 tells us that

V [AX1 , . . . , AXn ] = |D(AX1 , . . . , AXn )| = |D(Ae1 , . . . , Aen )| |D(X1 , . . . , Xn )| = |det A| V [X1 , . . . , Xn ]. Done.
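A numerical illustration of this theorem (our own sketch; the matrix and spanning vectors are chosen arbitrarily):

```python
def det3(M):
    """Determinant of a 3x3 matrix given by its rows."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def apply(A, X):
    """The image AX of a vector X."""
    return tuple(sum(A[i][k] * X[k] for k in range(3)) for i in range(3))

def vol(X1, X2, X3):
    """Volume of the parallelepiped spanned by X1, X2, X3 (= |det|)."""
    return abs(det3([list(r) for r in zip(X1, X2, X3)]))

A  = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
Xs = [(1, 0, 2), (0, 1, 1), (3, 1, 0)]
lhs = vol(*(apply(A, X) for X in Xs))   # volume of the image A(Q)
rhs = abs(det3(A)) * vol(*Xs)           # |det A| times the volume of Q
assert lhs == rhs
```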
Corollary 5.35 . Volume is invariant under an orthogonal transformation.
V (RQ) = V (Q)
Proof: If R is an orthogonal transformation, |det R| = 1 .
Remark 1. Since we eventually want to deﬁne the volume of suitable sets by approximating
the sets by parallelepipeds, this theorem will allow us to conclude the same results about how
the volume of some set changes under a linear transformation in general and an orthogonal
transformation in particular.

Remark 2. We define the determinant of a linear transformation L which maps Rn into
Rn as the determinant of a matrix which represents L . This deﬁnition makes it mandatory
to prove: “the determinant of two diﬀerent matrices which represent L (diﬀerent because
of a diﬀerent choice of bases) are equal.” However the theorem is an immediate consequence
of the following fact we never proved: “if A and B are matrices which represent the same
linear transformation L with respect to diﬀerent bases then there is a nonsingular matrix
C such that B = CAC −1 ." The matrix C is the matrix expressing one set of basis vectors in terms of the other basis. Using this theorem, we find
det B = det(CAC −1 ) = (det C )(det A)(det C −1 ) = det A.
How does volume change under a translation T, T X = X + X0 ? A little thought is
needed. Imagine a parallelepiped Q spanned by X1 , . . . , Xn . The crux of the matter is to
realize that the parallelepiped has the origin as one of its vertices and X1 , . . . , Xn at the
others. Under the translation T , not only do the Xj ’s get translated through X0 , but so
does the origin, 0 → X0 , X1 → X1 + X0 , X2 → X2 + X0 , etc.
a figure goes here
In terms of free vectors, the edge from 0 to Xj becomes the edge from X0 to Xj + X0
(see ﬁgure). Thus the free vector representing this edge is (Xj + X0 ) − X0 , that is, it is
still Xj ! This motivates the
Definition: The volume of a parallelepiped is deﬁned to be the volume of the parallelepiped
after translating one vertex to the origin.
Theorem 5.36 . The change in volume of a parallelepiped Q under an aﬃne transformation AX = LX + X0 , L linear, is given by:
Vol[A(Q)] = |det L| Vol[Q].
In particular, volume is invariant under a rigid body transformation (for then L is an
orthogonal transformation).
Proof: The aﬃne transformation may be factored into A = T L , a linear transformation
followed by a translation (p. 380). Since L changes volume by |det L| while translation
preserves the volume, the net result is a change by |det L| as claimed.

a) Application to Linear Equations

What do our geometrically motivated determinants have in common with the determinants
of high school fame - where they were used to solve systems of linear algebraic equations?
Everything, for they are the same. Since determinants are deﬁned only for square matrices,
they are applicable to linear algebraic equations only when there are the same number of
equations as unknowns. At the end of this section, we shall make some remarks about the
case when the number of equations and unknowns are not equal.
Consider the system of equations
a11 x1 + · · · + a1n xn = y1
a21 x1 + · · · + a2n xn = y2
        · · ·
an1 x1 + · · · + ann xn = yn ,

which we can write as

x1 A1 + · · · + xn An = Y,
where Aj is the j th column of the matrix A = ((aij )) and Y is the obvious column
vector. The problem is to ﬁnd numbers x1 , . . . , xn such that x1 A1 + · · · + xn An = Y ,
where Y is given.
Theorem 5.37 . Let A = ((aij )) be a square n × n matrix and Y a given vector. The
system of linear algebraic equations AX = Y can always be solved for X if and only if
det A ≠ 0 . This can be rephrased as: A is invertible if and only if det A ≠ 0 .

Proof: Let Aj be the j th column of A . Each Aj is a vector in Rn . If det A ≠ 0 , then the Aj 's are linearly independent by Theorem 27, p. 417. But since they are linearly
independent and there are n of them, A1 , · · · , An , they must span Rn . Thus, any Y ∈ Rn
can be written as a linear combination of the Aj ’s. The numbers x1 , · · · , xn are just the
coeﬃcients in this linear combination.
Conversely, if the equations AX = Y can be solved for any Y ∈ Rn , then the vectors A1 , · · · , An span Rn . But if n vectors span Rn , these vectors must be linearly
independent, so det A ≠ 0 , again by Theorem 27, page 417.
Theorem 5.38 . Let A be a square matrix. The system of homogeneous equations AX =
0 has a non-trivial solution if and only if det A = 0 .
Proof: By Theorem 27, Page 417, det A = 0 if and only if the column vectors A1 , . . . , An
are linearly dependent. Now if the column vectors A1 , . . . , An are linearly dependent,
then there are numbers x1 , . . . , xn , not all zero, such that x1 A1 + . . . + xn An = 0 . The
vector X = (x1 , . . . , xn ) is then a non-trivial solution of AX = 0 . Conversely, if there is
a non-trivial solution of AX = 0 , then x1 A1 + · · · + xn An = 0 , so the Aj ’s are linearly
dependent. Hence det A = 0 .
In contrast to the above theorems which give no hint of a procedure for ﬁnding the
desired vector X , the next theorem gives an explicit formula for the solution of AX = Y .
Theorem 5.39 (Cramer’s Rule). Let A = ((aij )) be a square n × n matrix with columns
A1 , . . . , An . Assume det A ≠ 0 . Then for any vector Y , the solution of AX = Y is

x1 = D(Y, A2 , . . . , An ) / D(A1 , . . . , An ) ,
x2 = D(A1 , Y, A3 , . . . , An ) / D(A1 , . . . , An ) ,
        .
        .
        .
xn = D(A1 , . . . , An−1 , Y ) / D(A1 , . . . , An ) .

For example, in detail, the formula for x2 is

x2 =  | a11  y1  a13  · · ·  a1n |   ÷   | a11  a12  a13  · · ·  a1n |
      |  ·   ·    ·           ·  |       |  ·    ·    ·           ·  |
      | an1  yn  an3  · · ·  ann |       | an1  an2  an3  · · ·  ann |  .

Proof: A snap. Since det A ≠ 0 , by Theorem 31 we know a solution X = (x1 , . . . , xn )
exists. Thus x1 A1 + · · · + xn An = Y . Let us obtain the formula for x2 as a representative
case. Observe that
D(A1 , Y, A3 , · · · , An ) = D(A1 , x1 A1 + · · · + xn An , A3 , · · · , An ).
Since D is multilinear, we can expand the above to
= x1 D(A1 , A1 , A3 , · · · , An ) + x2 D(A1 , A2 , A3 , · · · , An ) + · · · + xn D(A1 , An , A3 , · · · , An ).

Now all of these determinants, except the second one, vanish since each has two identical columns (part 5 of Theorem 21, page 400). Thus

D(A1 , Y, A3 , · · · , An ) = x2 D(A1 , A2 , · · · , An ).

Because det A = D(A1 , · · · , An ) ≠ 0 , we can divide to find the desired formula for x2 .
Done.
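Cramer's rule is equally mechanical. The sketch below (ours, not the notes'; exact arithmetic via the standard fractions module) solves a 3 × 3 system — the matrix of Exercise 3 below — and checks the answer by substitution:

```python
from fractions import Fraction

def det(M):
    """Cofactor expansion down the first column."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** i * M[i][0] *
               det([r[1:] for k, r in enumerate(M) if k != i])
               for i in range(len(M)))

def cramer(A, Y):
    """Solve AX = Y by Cramer's rule: xj is the determinant of A with
    its j-th column replaced by Y, divided by det A."""
    d = det(A)
    n = len(A)
    return [Fraction(det([row[:j] + [Y[i]] + row[j + 1:]
                          for i, row in enumerate(A)]), d)
            for j in range(n)]

A = [[1, 1, 1], [2, -3, -1], [4, 9, 1]]
Y = [1, 2, 0]
X = cramer(A, Y)
# the solution really does satisfy AX = Y:
assert all(sum(A[i][j] * X[j] for j in range(3)) == Y[i] for i in range(3))
```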
Remark: This elegant formula is mainly of theoretical use. It is not the most eﬃcient
procedure for solving such equations. That honor belongs to the method of reducing to
triangular form which was outlined in the proof of Theorem 22. To be more vivid, if
Cramer's rule were used to solve a system of 26 equations, approximately (26 + 1)! ≈ 10^28 multiplications would be required. Reduction to triangular form, on the other hand, would only require about (1/3)(26)³ ≈ 6000 multiplications. Think about that.
For non-square matrices, determinants are not applicable. Given a vector Y , one would
still like a criterion to determine if one can solve AX = Y , that is, one would like a criterion
to see if Y ∈ R(A) .
Theorem 5.40 . Let L : V1 → V2 be a linear operator. Then
R(L)⊥ = N(L∗ );
or equivalently (for ﬁnite dimensional spaces)
R(L) = N(L∗ )⊥ .
Proof: If X ∈ V1 and Y ∈ R(L)⊥ , then for all X

0 = ⟨Y, LX⟩ = ⟨L∗ Y, X⟩ .
This means L∗ Y is orthogonal to all X , consequently, L∗ Y = 0 , so Y ∈ N(L∗ ) . The
converse is proved by observing that our steps are reversible.
Application. For what vectors Y = (y1 , y2 , y3 ) can you solve the equations
2x1 + 3x2 = y1
x1 − x2 = y2
x1 + 2x2 = y3 ?

If the equations are written as AX = Y , then by the above theorem Y ∈ R(A) if and
only if Y⊥ N(A∗ ) . Let us ﬁnd a basis for N(A∗ ) . This means solving the homogeneous
equations A∗ Z = 0 ,
2z 1 + z 2 + z 3 = 0
3z1 − z2 + 2z3 = 0.
If we let z1 = α , and solve the resulting equations for z2 and z3 , we find that z3 = −5α/3 and z2 = −α/3 . Consequently, all vectors Z ∈ N(A∗ ) have the form Z = (α, −α/3, −5α/3) = (α/3)(3, −1, −5) . A basis for N(A∗ ) is e = (3, −1, −5) . Therefore, Y ⊥ N(A∗ ) if and only if 3y1 − y2 − 5y3 = 0 . By the above reasoning, the equation AX = Y can be solved for only these Y 's.
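One can check by brute force which linear condition the range of A satisfies (our own sketch): every Y = AX , built from the columns (2, 1, 1) and (3, −1, 2) of A , is annihilated by the vector (3, −1, −5):

```python
def range_condition(a, b):
    """For Y = a*(2, 1, 1) + b*(3, -1, 2), evaluate 3*y1 - y2 - 5*y3.
    It should vanish identically, since (3, -1, -5) spans N(A*)."""
    y1, y2, y3 = 2 * a + 3 * b, a - b, a + 2 * b
    return 3 * y1 - y2 - 5 * y3

assert all(range_condition(a, b) == 0
           for a in range(-3, 4) for b in range(-3, 4))
```

Since the expression is linear in a and b, vanishing on a grid of sample points already forces it to vanish identically.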
Remark: The use of Theorem 34 as a criterion for ﬁnding if Y ∈ R(L) is much more
valuable in inﬁnite dimensional spaces, for it quite often turns out that N(L∗ ) is still ﬁnite
dimensional while R(L) is inﬁnite dimensional. For more on these ideas, see page 389,
Exercise 12 and page 501, Exercises 27-29.

Exercises
(1) Evaluate the following determinants as you see fit:

a). | 7   3 |
    | 2  −1 |

b). |  1   5 |
    | −3   2 |

c). | −10  −2   3 |
    |  −3   2   1 |
    |   5   0  −1 |

d). | 53  17  29 |
    | 36  12  39 |
    | 69  23  75 |

e). | 1   1   0   1 |
    | 2   0   3   4 |
    | 1  −5   2   3 |
    | 1   0   6   4 |

f). | 2  1  1  1 |
    | 1  2  1  1 |
    | 1  1  2  1 |
    | 1  1  1  2 |

g). | a  1  0   0   0 |
    | b  1  0   0   0 |
    | c  0  1  −b   0 |
    | c  0  1  −a   0 |
    | d  e  1   f   g |

[Answers: a) −13 , b) 17, c) −14 , d) 6, e) 5, g) −(b − a)² ].
(2) If A and B are the matrices whose respective determinants appear in #1 a) and
b), compute det(AB ) by ﬁrst ﬁnding AB . Compare with (det A)(det B ) .
(3) a). Use Cramer's rule (Theorem 33) to solve the equation AX = Y , where A is given below. Then observe you have computed A−1 , so exhibit it.

A = | 1   1   1 |
    | 2  −3  −1 |
    | 4   9   1 |

        [ A−1 = (1/30) |  6   8   2 |
                       | −6  −3   3 |
                       | 30  −5  −5 |  ].
b). Use the formula for A−1 to solve the equations
AX = Y where Y = (1, 2, 0).

(4) a). Find the volume of the parallelepiped Q in E3 which is spanned by the vectors X1 = (1, 1, 1), X2 = (2, −1, −3) and X3 = (4, 1, 9) . [Answer: Volume = 30].

b). The matrix

A = | −10  −2   3 |
    |  −3   2   1 |
    |   5   0  −1 |

(cf. #1,c) maps E3 into itself. Find the volume of the image of Q , that is, the volume of A(Q) .
[Answer: 420].
(5) Let B = A − λI where A is a square matrix. The values λ for which B is singular
are called the eigenvalues of A . Find the eigenvalues for
a). A = | 3   2 |        b). A = | 3   2 |        c). A = | a  b |
        | 2  −1 |                | 1  −1 |                | c  d |

[Hint: If B is singular, then 0 = det B = det(A − λI ) . Now observe that det(A − λI ) is a polynomial in λ . The answer to c) is λ = (1/2)( (a + d) ± √((a + d)² − 4(ad − bc)) ) ].
(6) For what value(s) of α are the vectors
X1 = (1, 2, 3), X2 = (2, 0, 1), X3 = (0, α, −1) linearly dependent?
(7) If X1 , X2 , X3 and Y1 , Y2 , Y3 are vectors in R3 , prove that
D[X1 , X2 , X3 ] − D[Y1 , Y2 , Y3 ]
= D[X1 − Y1 , X2 , X3 ] + D[X1 , X2 − Y2 , X3 ] + D[X1 , X2 , X3 − Y3 ].
[Hint: First work out the corresponding formula for the 2 × 2 case.]
(8) Here you shall compute the derivative of a determinant if the coefficients of A = ((aij ))
depend on t , aij (t) . Let X1 (t), . . . , Xn (t) be the vectors which constitute the columns
of A . The problem is to compute

     dD(t)/dt = (d/dt) D[X1 , . . . , Xn ](t) = (d/dt) det(( aij (t) )) .

a). Use Exercise 7 (generalized to n × n matrices) to show

     D(t + ∆t) − D(t) ≡ D[X1 (t + ∆t), X2 (t + ∆t), . . .] − D[X1 (t), X2 (t), . . .]
                      = Σ_{j=1}^{n} D[X1 (t), . . . , Xj−1 (t), Xj (t + ∆t) − Xj (t), Xj+1 (t + ∆t), . . .]

[Hint: Do the cases n = 2 and n = 3 first].
b). Use part a to show that

     dD/dt = lim_{∆t→0} [ D(t + ∆t) − D(t) ] / ∆t = Σ_{j=1}^{n} D[X1 , . . . , Xj−1 , dXj/dt , Xj+1 , . . . , Xn ],

so the derivative of a determinant is found by taking the derivative one column at a
time and adding the results.
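The column-by-column rule is easy to test on a concrete case; here, as a hypothetical example, A(t) = [[t, t^2], [1, t^3]], so D(t) = t^4 − t^2 and D'(t) = 4t^3 − 2t.

```python
# Check the column-by-column differentiation rule on a 2x2 example:
# A(t) = [[t, t^2], [1, t^3]],  D(t) = t^4 - t^2,  D'(t) = 4t^3 - 2t.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def rule(t):
    # derivative taken one COLUMN at a time, then summed
    d_col1 = det2([[1, t * t], [0, t ** 3]])       # X1 -> dX1/dt = (1, 0)
    d_col2 = det2([[t, 2 * t], [1, 3 * t * t]])    # X2 -> dX2/dt = (2t, 3t^2)
    return d_col1 + d_col2

for t in (0.5, 1.0, 2.0):
    print(rule(t), 4 * t ** 3 - 2 * t)   # the two printed numbers agree
```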
(9) Let u1 (t) , and u3 (t) be solutions of the diﬀerential equation
u + a1 (t)u + a0 (t)u = 0.
Consider the Wronski determinant
W (u1 , u2 )(t) := u1 (t) u2 (t)
u1 (t) u2 (t) (a) Use Exercise 8 to prove
dW
= −a1 (t)W.
dt
(b) Consequently, show
t W (t) = W (t0 ) exp − a1 (s) ds .
t0 (c) Apply this to show that if the vectors (u1 (t), u1 (t)) and (u2 (t), u2 (t)) are linearly
independent at t = t0 , then they are always linearly independent.
(d) Let u1 (t) . . . , un (t) be solutions of the diﬀerential equation
u(n) + an−1 (t)u(n−1) + · · · + a1 (t)u + a0 (t)u = 0.
Consider the Wronski determinant of u1 , . . . , un W (u1 , . . . , un ) = u1
u1
·
·
·
(n−1) u1
Prove u2
u2 ···
··· (n−) ··· a2 un
un (n−1) un dW
= −an−1 (t)W,
dt so again
t W (t) = W (t0 ) exp − an−1 (s) ds .
t0 240 CHAPTER 5. MATRIX REPRESENTATION
(e) Use part d) to conclude that the n vectors
(n−1) (u1 , u1 , . . . , u1 (n−1) ), (u2 , u2 , . . . , u2 , ), · · · (un , un , . . . , u(n−1) )
n (where the uj are solutions of the O.D.E.) are linearly independent for all t if
and only if they are so at t = t0 .
(10) A matrix A is upper (lower) triangular if
diagonal are zero, a11 a12 0 a22
A=
0
0
0
0 all the elements below (above) the main
···
···
··· an
·
.
·
ann If A is upper (or lower) triangular, prove again that
det A = a11 a22 . . . ann .
by expanding by minors. What is the relation of this result to the exercise ( #4 , p.
157) on echelon form?
ˆ
(11) Let X1 , . . . , Xn be vectors in Rn and let D(X1 , . . . , Xn ) be a real valued function
ˆ
which has properties 1 and 4 of Theorem 21. Thus D is skew-symmetric, and is
ˆ necessarily satisﬁes Axioms 2 and 3 for the
linear in each of its columns. Prove D
determinant, and conclude that
ˆ
D(X1 , . . . , Xn ) = kD(X1 , . . . , Xn ),
where the constant k = D(e1 , . . . , en ) .
(12) Let u1 (t), . . . , un (t) be suﬃciently diﬀerentiable functions (C n−1 is enough). Deﬁne
the Wronskian as in Exercise 9 part d. Prove that if the functions u1 , . . . , un are
linearly dependent, then W (t) ≡ 0 . Thus, if W (t0 ) = 0 , the functions are linearly
independent in any interval containing t0 . [Do not try to apply the result of Exercise
9 for it is not applicable].
(13) (a) If I is the n × n identity matrix, evaluate det(λI ) where λ is a constant.
(b) If A is an n × n matrix, prove
det(λA) = λn det A.
(c) If A or B are n × n matrices, is
? det(A + B ) = det A + det B ?
Proof or counterexample.
(14) For what value of α does the system of equations
x + 2y + z = 0
−2x + αy + 2z = 0
x + 2y + 3z = 0
have more than one solution? 5.3. VOLUME, DETERMINANTS, AND LINEAR ALGEBRAIC EQUATIONS. 241 (15) A matrix is nilpotent if some power of it is zero, that is, AN = 0 for some positive
integer N . Prove that if A is nilpotent, then det A = 0 .
(16) (a) Solve the systems of equations
i) x + y = 1 , x − .9y = −1
and
ii) x + y = 1 , x − 1.1y = −1 ,
and compare your solutions, which should be almost the same.
(b) Solve the systems of equations
i) x + y = 1 , x + .9y = −1,
and
x + y = 1 , x + 1.1y = −1.
and again compare your solutions. Explain the result in terms of the theory in
this section.
(c) Consider the solution of the systems of equations x+y =1
x + αy = −1
as the point where the lines x + y = 1 and x + αy = −1 intersect. Sketch
the graph of these lines for α near −1 and then for α near +1 . Use these
observations to again explain the phenomena in parts a) and b).
(17) Let ∆n be the n × n determinant of a matrix with a ’s along the main diagonal and
b ’s on the two “oﬀ diagonals” directly above and below the main diagonal. Thus ∆5 = ab000
bab00
0bab0.
00bab
000ba (a) Prove ∆n = a∆n−1 − b2 ∆n−2 .
(b) Compute ∆1 and ∆2 by hand. Then use the formula to compute ∆3 and ∆4 .
(c) If a2 = 4b2 , can you show ∆n = √ a2 1 a+
2
− 4b √ a2
2 − 4b 2 n+1 − a− √ a2
2 − 4b2 n+1 ? Later, we shall give a method for obtaining this directly from the equation of part a).
[p. 522-523].
(18) Prove Part 5 of Theorem 21 using only the axioms and no other part of Theorem 21. 242 CHAPTER 5. MATRIX REPRESENTATION (19) Apply the result of Exercise 12 on page 389. Try to prove the following. A is a square
matrix.
a). dim N(A) = dim N(A∗ ).
Thus, the homogeneous equation AX = 0 has the same number of linearly independent solutions as does the equation A∗ Z = 0 .
b). Let Z1 , . . . , Zk span N(A∗ ) . Then the inhomogeneous equation
AX = Y
has a solution, that is, Y ∈ R(A) , if and only if
Zj , Y = 0, j = 1, 2, . . . , k. In other words, the equation AX = Y has a solution if and only if Y is orthogonal
to the solutions of the homogeneous adjoint equation.
c). Consider the system of linear equations
2x − 3y + z = 1
−3x + 2y − 4z = α
x − 4y − 2z = β.
Let A be the coeﬃcient matrix. Find a basis for N(A∗ ) . [Answer: dim N(A∗ ) = 1
and Z1 = (2, 1, −1) is a basis]. For what value(s) of the constants α, β can you solve
the given system of equations? [Answer: There is a solution if and only if β − α = 2 .]
Find a solution if α = 1 and β = 3 .
d). Repeat part c) for the system of equations
x−y =1
x − 2y = −1
x + 3y = α.
[Answer: dim N(A) = 1 and Z1 = (−5, 4, 1) is a basis. There is a solution if and
only if α = −1 ].
(20) Use the result of Exercise 12 to prove that each of the following sets of functions are
linearly independent everywhere.
a) u1 (x) = sin x,
b) u1 (x) = sin nx, u2 (x) = cos x
u2 (x) = cos mx, where n = 0. c) u1 (x) = ex , u2 (x) = e2x , u3 (x) = e3x .
d) u1 (x) = eax , u2 (x) = ebx , u3 (x) = ecx , where a, b , and c are distinct numbers.
e) u1 (x) = 1, u2 (x) = x, u3 (x) = x2 , u4 (x) = x3
f) u1 (x) = ex , u2 (x) = e−x , u3 (x) = xex , u4 (x) = xe−x . 5.4. AN APPLICATION TO GENETICS 5.4 243 An Application to Genetics A mathematical model is developed and solved. Although this particular model will be
motivated by genetics, the resulting mathematical problem also arises in sociometrics and
statistical mechanics as well as many other places. In the literature you will ﬁnd these
mathematical ideas listed under the title Markov chains.
Part of the value you should glean from our discourse is insight into the process of
going from vague qualitative phenomena to setting up a quantitative model. One part of
this scientiﬁc process we shall not have time to investigate in detail is the very important step
of comparing the quantitative results with experimental data. Furthermore, we shall never
delve into the fertile realm of generalizing our accumulated knowledge to more complicated
- as well as more interesting and realistic - situations.
In bisexual mating, the genes of the resulting oﬀspring occur in pairs, one gene in
each pair being contributed by each parent. Consider the simplest case of a trait which is
determined by a single pair of genes, each of which is one of two types g and G . Thus, the
father contributes G or g to the pair, and the mother does likewise. Since experimental
results show that the pair Gg is identical to the pair gG , the oﬀspring has one of the three
pairs
GG
Gg
gg.
The gene G dominates g if the resulting oﬀspring with genetic types GG and Gg “appear”
identical but both are diﬀerent from gg . In this case, an individual with genetic type GG
is called dominant, while the types gg and Gg are called recessive and hybrid, respectively.
An oﬀspring can have the pair GG (resp. gg ) if and only if both parents contributed
a gene of type G (resp. g) while the combination Gg occurs if either parent contributed
G and the other g . A fundamental assumption we shall make is that a parent with genetic
type ab can only contribute a gene of type a or of type b . This assumption ignores such
things as radioactivity as a genetic force. Thus, a dominant parent, GG can only contribute
a dominant gene, G , a recessive parent, gg , can only contribute g , and a hybrid parent
Gg can contribute either G or g (with equal probability). Consequently, if two hybrids
are mated, the oﬀspring has probability 1 of getting G or g from each parent, so the
2
probability of his having genetic type GG of gg is 1 each, while the probability of having
4
1
genetic type Gg is 2 .
We introduce a probability vector V = (v1 , v2 , v3 ) , with v1 representing the probability
of being genetic type GG , v2 of being type Gg , and v3 of being type gg . Thus for an
11
oﬀspring of two hybrid parents, V = ( 4 , 2 , 1 ) . Observe that, by deﬁnition of probability,
4
0 ≤ vj ≤ 1, j = 1, 2, 3 , and v1 + v2 + v3 = 1 (since with probability one - certainty - the
oﬀspring is either GG , Gg , or gg ).
Consider the issue of mating an individual whose genetic type is unknown with an
individual of known genetic type (dominant, hybrid or recessive). To be speciﬁc, assume
the known person is of dominant type. Then the following matrix of transition probabilities 1 1 /2 0
D = 0 1 /2 1 000
describes the probability of the oﬀspring’s genetic type in the following sense: if the unknown
parent had genetic type V0 (so V0 = (1, 0, 0) if unknown was dominant, V0 = (0, 1, 0) if 244 CHAPTER 5. MATRIX REPRESENTATION hybrid, and V0 = (0, 0, 1) if recessive), then
V1 = DV0 ,
is the probability vector of the oﬀspring. For example, if the unknown parent was hy1
brid, then V1 = DV0 = ( 1 , 2 , 0) . Thus the oﬀspring can, with equal likelihood, be either
2
dominant or hybrid, but cannot be recessive.
Notice that the matrix D embodies the fact that one of the parents is dominant.
If the individual of unknown genetic type were crossed with an individual of hybrid
type, then the corresponding matrix H is 1 1
2
40
1
H = 1 1 2 ,
2
2
1
1
042
while if the person of unknown type were crossed with the individual of recessive type, then
000
1
R = 1 2 0 .
1
021
It is of interest to investigate the question of genetic stability under various circumstances. Say we begin with an individual of unknown genetic type and cross it with a
dominant individual, then cross that oﬀspring with another dominant individual, and so
on, always mating the resulting oﬀspring with a dominant individual. Let Vn represent the
genetic probability vector for the oﬀspring in the n th generation. Then
Vn = DVn−1 = D2 Vn−2 = · · · = Dn V0 ,
where V0 is the unknown vector for the initial parent (of unknown genetic type). Without
knowing V0 , can we predict the eventual (n → ∞ ) genetic types of the oﬀspring? Intuitively, we expect that no matter what the type of the initial parent, the repeated mating
with a dominant individual will produce a dominant strain. The question we are asking is,
does lim Vn exist, and if so, what is it?
n→∞
Assume for the moment that the limit does exist and denote it by V . Then V = DV
since
V = lim Vn = lim Vn+1 = lim DVn = D( lim Vn ) = DV
n→∞ n→∞ n→∞ n→∞ Armed with the equation DV = V , we can solve linear equations for the vector V =
(v1 , v2 , v3 )
1
v1 + v2 + 0 = v1
2
1
0 + v2 + v3 = v2
2
0 + 0 + 0 = v3 .
Clearly v1 = v2 = v3 = 0 is a trivial solution. A non-trivial one can be found by transposing
the vj ’s to the left side and solving. We ﬁnd v1 = 1, v2 = 0, v3 = 0(v1 = 1 since
v1 + v2 + v3 = 1 ). Thus, if the limit Vn exists, the limit must be V = (1, 0, 0) . In
genetic terms, this sustains our feeling that the oﬀspring will eventually become genetically
dominant. 5.4. AN APPLICATION TO GENETICS 245 But does the limit exist? To prove it does, we must show for any probability vector
V0 = (v1 , v2 , v3 ) , where v1 + v2 + v3 = 1 , that the limit
lim Vn = lim Dn V0 , n→∞ n→∞ exists and equals V = (1, 0, 0) . By evaluating D, D2 , and D3 explicitly, we are led to
guess 1 1 − 21 1 − 2n1 1
n
−
1
1
,
Dn = 0
2n
2n−1
0
0
0
which is then easily veriﬁed using mathematical induction. Thus v1 + (1 − 21 ) + (1 − 2n1 1 )v3
n
−
1 + 2n1 1 v3
Vn = D n V0 = 0 +
−
2n v2
0+
0
+0 v1 + v2 + v3 −
1
=
2n v2 +
0 1
2n (v2 +
1
v
2n−1 3 2v3 ) Since v1 + v2 + v3 = 1 , we ﬁnd 1
−1
1
Vn = Dn V0 = 0 + n (v2 + 2v3 ) 1 .
2
0
0 It is now clear that the limit as n → ∞ does exist, and is V = (1, 0, 0) . Consequently, if we
begin with a random individual (you) and mate that individual and the successive oﬀspring
with a dominant gene bearer, then the resulting generations will tend to all dominant
individuals. Moreover, the process proceeds exponentially because the “damping factor” is
1
essentially 2 for each generation (see above formula).
Were there enough time, you would see a second application of matrices to the special
theory of relativity. Given your knowledge of linear spaces, it is possible to present an
elegant exposition of the theory. The Lorentz transformation would appear as an orthogonal
transformation - a rotation - in world space or Minkowski’s space as it is often called. This
is a four dimensional space three of whose dimension are those of ordinary space, while
√
the fourth dimension is an imaginary (i = −1) time dimension. Goldstein’s Classical
Mechanics contains the topic. Regrettably, he does not begin with the Michelson - Morley
experiment but rather plunges immediately into mathematical technicalities. Exercises
1. If you begin with an individual of unknown genetic type and cross it with a hybrid
individual and then cross the successive oﬀspring with hybrids, does the resulting strain
approach equilibrium? If so, what is it?
2. Same as 1 but you mate an individual of unknown type with a recessive individual.
3. Beginning with an individual of unknown genetic type, you mate it with a dominant
individual, mate the oﬀspring with a hybrid, mate that oﬀspring with a dominant, and
continue mating alternate generations with dominants and hybrids respectively. Does the 246 CHAPTER 5. MATRIX REPRESENTATION resulting strain approach equilibrium? If so, what is it? (You will need to deﬁne equilibrium
to cope with this problem. There are several reasonable deﬁnitions.)
4. a). The city X has found that each year 5% of the city dwellers move to the suburbs,
while only 1% of the suburbanites move to the city. Assuming the total population of the
city plus suburb does not change, show that the matrix of transition probabilities is
P= .95 .01
.05 .99 , where a vector V = (v1 , v2 ) = (proportion of people in city, proportion of people in suburb).
b). Given any initial population distribution V , does the population approach an
equilibrium distribution? If so, ﬁnd it.
5. A long queue in front of a Moscow market in the Stalin era sees the butcher whisper to
the ﬁrst in line. He tells her “Yes, there is steak today.” She tells the one behind her and so
on down the line. However, Moscow housewives are not reliable transmitters. If one is told
“yes”, there is only an 80% chance she’ll report “yes” to the person behind her. On the
other hand, being optimistic, if one hears “no”, she will report “yes” 40% of the time. If
the queue is very long, what fraction of them will hear “there is no steak”? [This problem
can be solved without ﬁnding a formula for P n , although you might ﬁnd it a challenge to
ﬁnd the formula]. 5.5 A pause to ﬁnd out where we are .
We all know the homily about the forest and the trees. The next few pages are about
the forest.
In the beginning we introduced dead linear spaces with their algebraic structure (Chapter II). Then we investigated the geometry induced by deﬁning an inner product on a linear
space and saw how easily many of the results in Euclidean geometry generalize (Chapter
III).
Our next step was to consider mappings, linear mappings, between linear spaces (Chapter IV). Not much could be said in general, so we began investigating a particular case, linear
maps between ﬁnite dimensional spaces. Two important special cases of this
L : R1 → Rn ,
and
L : Rn → R1 ,
were treated before the general case,
L : Rn → Rm .
A key theorem which facilitates the theory of linear mappings between ﬁnite dimensional
spaces is the representation theorem (page 374): every such map can be represented as a
matrix.
What next? There are two equally reasonable alternatives: 5.5. A PAUSE TO FIND OUT WHERE WE ARE 247 (A) We can continue with linear maps,
L : V1 → V2 ,
and consider the case where V1 or V2 , or both are inﬁnite dimensional. The general
theory here is in its youth and still undeveloped. Only one of the sources of diﬃculty
is that a generalization of the representation theorem (page 374) remains unknown
- except for some special cases. Thus, many special types of mappings have to be
investigated individually. We shall consider only one type of linear mapping between
inﬁnite dimensional spaces, those deﬁned by linear diﬀerential operators (Chapter VI
and Chapter VII, Section 3).
(B) The second alternative is to continue our study of mappings between ﬁnite dimensional
spaces, only now switch to non linear mappings. This theory should parallel the
transition in elementary calculus from the analytic geometry of straight lines,
f (x) = a + bx,
that is, aﬃne mappings, to genuine non linear mappings, as
√
f (x) = x2 − 7 x
or
f (x) = x3 − esin x .
You recall, one important idea was to approximate the graph of a function y = f (x)
at a point x0 by its tangent line at x0 , since for x near x0 , the curve and the tangent
line there approximately agree. For example, one easily proves that at a maximum or
minimum, the tangent line must be horizontal, f = 0 .
In generalizing this to functions of several variables,
Y = F (X ) = F (x1 , · · · , xn ),
the role of the derivative at X0 is assumed by the aﬃne map,
A(X ) = Y0 + LX,
which is tangent to F at X0 . Thus, linear algebra appears as the natural extension
of analytic geometry to higher dimensional spaces. See Chapters VII - IX for this. 248 CHAPTER 5. MATRIX REPRESENTATION Chapter 6 Linear Ordinary Diﬀerential
Equations
6.1 Introduction .
A diﬀerential equation is an equation relating the values of a function u(t) with the
values of its derivatives at a point,
F (t, u(t), du
dn u
,..., n ) = 0
dt
dt (6-1) The order of the equation is the order, n , of the highest derivative which appears. For
example, the equations
3
d2 u
du
−7
+ t2 u2 − sin t = 0
dt2
dt
du
− t sin u2 = 0
dt
are of order two and one respectively. A function u(t) is a solution of the diﬀerential
equation if it has at least as many derivatives as the order of the equation, and if substitution
of it into the equation yields an identity. Thus, the equation
du
dt 2 + u2 = 1 has the function u(t) = sin t as a solution, since for all t
d
sin t
dt 2 + (sin t)2 = 1. A diﬀerential equation (1) for the unknown function u(t) is linear if it has the form
Lu := an (t) dn u
dn−1
+ an−1 (t) n−1 + · · · + a0 (t)u = 0
dtn
dt (6-2) You should verify that this coincides with the notion of a linear operator used earlier. Equation (2) is sometimes called linear homogeneous to distinguish it from the inhomogeneous
equation
Lu = f (t),
(6-3)
249 250 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS that is
an (t) dn u
+ · · · + a0 (t)u = f (t).
dtn (6-4) The subject of this chapter is linear ordinary diﬀerential equations with variable coeﬃcients (to distinguish them from the special case where the aj ’s are constants). This operator L deﬁned by (2) has as its domain the set of all suﬃciently diﬀerentiable functions— n
derivatives is enough. These functions constitute an inﬁnite dimensional linear space. Thus,
the diﬀerential operator L acts on an inﬁnite dimensional space, as opposed to a matrix
which acts on a ﬁnite dimensional space.
Diﬀerential equations abound throughout applications of mathematics. This is because
most phenomena are described by laws which relate the rate of change of a function - the
derivative - at a given time (or point) to the values of the function at that same time.
For example, we have seen that at any time the acceleration of a harmonic oscillator is
determined by its position and velocity at the same time,
u = −µu − ku.
¨
˙
When confronted by a diﬀerential equation, your ﬁrst reaction should be to attempt to
ﬁnd the solution explicitly. We were able to do this for linear constant coeﬃcient equations
(Chapter 4, Section 2). One of the main goals of this chapter is to show you how to solve
as many linear ordinary diﬀerential equations as possible. However, it is naive to expect
to solve an arbitrary equation which crops up in terms of the few functions we know:
xα , ex , log x, sin x , and cos x . In fact, to even solve the elementary equation
1
du
=,
dx
x
appearing in elementary calculus, we were forced to deﬁne a new function as the solution
of this equation
u(x) = log x + c
and obtain the properties of this function and its inverse ex directly from the diﬀerential
equation. Many many functions arise which cannot be expressed in terms of the few elementary functions we know and love. Most of these functions - like Bessel’s functions, elliptic
functions, and hypergeometric functions, arise directly because they are the solutions of
diﬀerential equations nature has forced us to consider.
How do we know these strange sounding functions are solutions of the diﬀerential equations? Well, we somehow prove a solution exists and then simply give a name to the solution
- much as babies are given names at birth. Furthermore, as is the case with babies, their
actual “names” are the least important aspect.
To summarize brieﬂy, we shall solve as many equations as we can. For the remaining
ones (which include most equations), we shall attempt to describe a few of the main properties so that if one arises in your work, you will have a place to begin the attack. Later
on, we shall again return to the more complicated situation of nonlinear equations. Much
less can be said there. Only very few general results are known.
Lest you get the wrong idea, we shall cover but a fraction of the known theory for just
linear ordinary diﬀerential equations. In the next chapter, we shall only look at one partial
diﬀerential equation (the wave equation for a vibrating violin string). The general theory
there is too complicated to allow discussion for more than one particular equation. 6.1. INTRODUCTION 251 Exercises
1. Assume there exists a unique function E (x) which satisﬁes the following diﬀerential
equation for all x and satisﬁes the initial condition
du
= u,
dx u(0) = 1. (a) Use the “chain rule” and uniqueness to prove for any a ∈ R
E (x + a) = E (a)E (x)
˜
[Hint: Prove E (x) := E (x + a) is also a solution of the equation. Then apply the
˜
uniqueness to the function E (x)/E (a) ].
(b) Prove
E (−x) = 1
.
E (x) (c) Prove for any x
E (nx) = [E (x)]n , n ∈ Z. In particular, show
E (n) = [E (1)]n ,
and
E( n∈Z 1
) = [E (1)]1/m ,
m m ∈ Z+ (d) Prove n
) = [E (1)]n/m , n ∈ Z, m ∈ Z+
m
n
[Thus, the function E (x) is deﬁned for all rational x = m as the number E (1) to
the power n/m . Since E (x) is continuous (even diﬀerentiable by deﬁnition, we can
extend the last formula to irrational x by continuity: if rj is a sequence of rational
numbers converging to the real number x (which may or may not be rational) then
by continuity
E (x) = lim E (rj ) = lim [E (1)]r j = E (1)x .
E( j →∞ j →∞ Consequently, E (x) is the familiar exponential function ex ].
2. Find the general solutions of the following equations by any method you can.
(a) du
dx − 2u = 0 (b) du
dx = x2 + sin x (c) du 2
dx + 4u2 = 1 (d) du
dx = (e) du
dx = x2 eu (f) d2 u
dx2 x
u+1 + 3 du − 4u = 4
dx 252 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS 6.2 First Order Linear .
Except for those diﬀerential equations which can be solved by inspection, the next most
simple equation is one which is linear and ﬁrst order, the homogeneous equation
du
+ a(x)u = 0,
dx (6-5) du
+ a(x)u = f (x).
dx (6-6) and the inhomogeneous equation The homogeneous equation can be solved by ﬁrst writing it in the form
1 du
= −a(x)
u dx
and then integrating both sides
x log u(x) = − a(s) ds + C1 . Thus x u(x) = Ce − a(s) ds (6-7) is the solution of equation (4) for any constant C . In the very special case a(s) ≡ constant,
the solution does have the form found earlier (Chapter 4, Section 2) for a linear equation
with constant coeﬃcients.
How can we integrate the inhomogeneous equation (5)? A useful device is needed.
Multiply both sides of this equation by an unknown function q (x)
q (x) du
+ q (x)a(x)u = q (x)f (x),
dx If we can ﬁnd q (x) so that the left side is a derivative,
q (x) du
d
+ q (x)a(x)u =
(q (x)u),
dx
dx (6-8) then the equation reads
d
(q (x)u) = q (x)f (x),
dx
which can be integrated immediately,
x q (x)u(x) = q (s)f (s) ds + c, (6-9) and then solved for u(x) by dividing by q (x) .
Thus, the problem is reduced to ﬁnding a q (x) which satisﬁes (7). Evaluating the right
side of (7), we ﬁnd
du
dq
du
q
+ qa u = u
+q ,
dx
dx
dx 6.2. FIRST ORDER LINEAR 253 so q (x) must satisfy
dq
= q (x)a(x).
dx
It is easy to ﬁnd a function q (x) which satisﬁes this - for it is a homogeneous equation of
the form (4). Therefore
Rx
q (x) = e a(t) dt ,
the reciprocal of the solution (6) to the homogeneous equation, does satisfy (7). Notice
we have ignored the arbitrary constant factor in the solution since all we want is any one
function q (x) for (7).
Now we can substitute into (8) to ﬁnd the solution of the inhomogeneous equation
u(x) = x 1
q (x) q (s)f (s) ds + c
,
q (x) (6-10) where q (x) is given by the formula at the top of the page. If it makes you happier,
substitute the expression for q (x) into (9) to obtain the messy formula. We have left some
room.
a figure goes here
Examples: 1.
First, du
dx 2
+ x u = (1 + x3 )17 ,
x q (x) = exp(
Thus x = 0. 2
ds) = exp(2 ln x) = exp(ln x2 ) = x2 .
s d2
(x u) = x2 (1 + x3 )17 .
dx Integrating both sides we ﬁnd
x2 u(x) =
Therefore
u(x) =
2.
First, du
dx (1 + x3 )18
+ C.
54 1 (1 + x3 )18
C
+ 2,
54
x2
x x = 0. + 2xu = x
x q (x) = exp( 2s ds) = exp x2 . Thus,
d x2
2
(e u) = ex x.
dx
Integrating both sides, we ﬁnd
12
2
ex u(x) = ex + C,
2
so 1
2
+ Ce−x .
2
This formula could have been guessed much earlier since we know the general solution of
the inhomogeneous equation can be expressed as the sum of a particular solution to that
u(x) = 254 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS equation plus the general solution of the homogeneous equation. The particular solution
u0 (x) = 1 can be obtained by inspection of the D.E.
2
Let us summarize our results.
Theorem 6.1 . Consider the ﬁrst order linear inhomogeneous equation
Lu := du
+ a(x)u = f (x).
dx If a(x) and f (x) are continuous functions, the equation has the solutions
x u(x) = u(x)
˜ f (s)
ds + C u(x)
˜
u(s)
˜ where (6-11) x u(x) = exp(−
˜ a(s) ds) is a non-trivial solution of the homogeneous equation. Moreover, if we specify the initial
condition u(x0 ) = α , then the solution which satisﬁes this initial condition is unique.
Proof: The existence follows from the explicit formula (9) or (10) and from the fact that
a continuous function is always integrable.
Uniqueness. This will be quite similar to the proof carried out in Chapter 4. If u1 (x)
and u2 (x) are two solutions of the inhomogeneous equation Lu = f , with the same initial
conditions, then the function
w(x) := u1 (x) − u2 (x)
satisﬁes the homogeneous equation
Lw := w + a(x)w = 0,
and is zero at x0 ,
w(x0 ) = u1 (x0 ) − u2 (x0 ) = 0.
Our task is to prove w(x) ≡ 0 . Multiply the equation (20) by w(x) . Then
ww = −a(x)w2 ,
or 1d 2
w = −a(x)w2 .
2 dx
Since a(x) is continuous, for any closed and bounded interval [A, B ] there is a constant k
(depending on the interval) such that −a(x) ≤ k for all x ∈ [A, B ] . Consequently,
1d 2
w ≤ kw2 ,
2 dx
or d2
w − 2kw2 ≤ 0.
dx
Now we need an important identity which can be veriﬁed by direct computation: for any
smooth function g , and any constant α , g + αg = e−αx (eαx g ) . We apply this to the
above inequality with g = w2 and α = −2k to conclude that
e2kx d −2kx 2
[e
w ] ≤ 0.
dx 6.2. FIRST ORDER LINEAR 255 Because e2kx is always positive, by the mean value theorem this inequality states that
e−2kx w2 is a decreasing function of x . Thus
e−2kx w2 (x) ≤ e−2kx0 w2 (x0 ), x ≥ x0 , or
w2 (x) ≤ e2k(x−x0 ) w2 (x0 ), x ≥ x0 . But since w(x0 ) = 0 and w2 (x) ≥ 0 this means that
0 ≤ w2 (x) ≤ 0.
Therefore w(x) ≡ 0 x ≥ x0 .
To prove w(x) ≡ 0 for x ≤ x0 , merely observe that the equation (11) has the same
form if x is replaced by −x . Thus the above proof applies and shows w(x) ≡ 0 for x ≤ x0
too.
Remark: Although a formula has been exhibited for the solution, this does not mean that
the integrals which occur can be evaluated in terms of elementary functions. These integrals
however can be at least evaluated approximately using a computer if a numerical result is
needed. Exercises
(1) . Find the solution of the following equations with given initial values
(a) u + 7u = 3,
(b) 5u − 2u = u(1) = 2 e3x , u(0) = 1. 2x2 , u(−1) = 0. (d) xu + u = 4x3 + 2, u(1) = −1. (c) 3u + u = x − (e) u + (cot x)u = ecos x + 1, u( π ) = 0. [ cot x dx = ln(sin x)] .
2 (2) . The diﬀerential equation
L du
+ Ru = E sin ωt, L, R, E constants
dt arises in circuit theory. Find the solution satisfying u(0) = 0 and show that it can
be written in the form
u(t) = R2 ωEL
E
e−Rt/L + √
sin(ωt − α)
2 L2
+ω
R2 + ω 2 L2 where
tan α = ωL
.
R (3) Bernoulli’s equation is
u + a(x)u = b(x)uk , k a constant. 256 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS
(a) Use the substitution v (x) = u(x)1−k to transform this nonlinear equation to the
linear equation
v + (1 − k )a(x)v = (1 − k )b(x).
(b) Apply the above procedure to ﬁnd the general solution of
u − 2ex u = ex u3/2 . (4) . Consider the equation
u + au = f (x),
where a is a constant, f is continuous in the interval [0, ∞] , and |f (x)| < M for
all x .
(a) Show that the solution of this equation is
x u(x) = e−ax u(0) + e−ax eat f (t) dt
0 (b) Prove (if a = 0 )
u(x) − e−ax u(0) ≤ M
[1 − e−ax ].
a (5) (a) Show the uniqueness proof yields the following stronger fact. If u1 (x) and u2 (x)
are both solutions of the same equation
u + a(x)u = f (x)
but satisfy diﬀerent initial conditions
u1 (x0 ) = α, u2 (x0 ) = β, then
|u1 (x) − u2 (x)| ≤ ek(x−x0 ) |α − β | , x ≥ x0 for all x ∈ [A, B ] , where −a(x) < k in the interval. Thus, if the initial values
are close, then the solutions cannot get too far apart.
(b) Show that if a(x) ≤ A < 0 , where A is a constant, then as x → 0 any two
solutions of the same equation - but with possibly diﬀerent initial values - tend
to the same function.
(6) . Show that the diﬀerential equation
y = a(x)F (y ) + b(x)G(y )
can be reduced to a linear equation by the substitution
u = F (y )/G(y ) or u = G(y )/F (y ) if (F G − GF )/G or (F G − GF )/F , respectively, is a constant. Use this substitution
to again solve Bernoulli’s equation. 6.2. FIRST ORDER LINEAR 257 (7) . Let S = { u ∈ C : u(0) = 0 } , and deﬁne the operator L from S to C by
Lu = u + u.
Prove L is injective and R(L) = C .
(8) . Set up the diﬀerential equation and solve. The rate of growth of a bacteria culture
at any time t is proportional to the amount of material present at that time. If there
was one ounce of culture in 1940 and 3 ounces in 1950, ﬁnd the amount present in the
year 2000. The doubling time is the interval it takes for a given amount to double.
Find the doubling time for this example.
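The arithmetic this exercise calls for can be sketched in a few lines (the dates and amounts are those given in the exercise; the use of Python is our own illustration, not part of the notes):

```python
import math

# Growth law from the exercise: u'(t) = k u(t), so u(t) = u(1940) * e^(k (t - 1940)).
# Data: u(1940) = 1 ounce, u(1950) = 3 ounces, hence e^(10k) = 3.
k = math.log(3) / 10

def amount(year):
    """Ounces of culture present in the given year."""
    return math.exp(k * (year - 1940))

# Sixty years is six tripling periods, so the year 2000 gives 3^6 = 729 ounces.
print(amount(2000))

# The doubling time T solves e^(kT) = 2, i.e. T = 10 ln 2 / ln 3 ≈ 6.31 years.
T = math.log(2) / k
print(T)
```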
(9) Find the general solution of x²u′ + 3xu = sin x .

(10) Assume that a body decreases its temperature u(t) at a rate proportional to the
    difference between the temperature of the body and the temperature T of the
    surrounding air. A body originally at a temperature of 100° is placed in air which is
    kept at a temperature of 50° . If at the end of one hour the temperature of the body
    has fallen 20° , how long will it take for the body to reach 60° ?
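As a check on the cooling exercise, the solution u(t) = T + (u(0) − T)e^(−kt) can be evaluated directly (a Python sketch with the temperatures from the exercise):

```python
import math

# Newton's law of cooling: u(t) = T_air + (u0 - T_air) * e^(-k t), air at 50°.
T_air, u0 = 50.0, 100.0

# After one hour the body has fallen 20°, i.e. u(1) = 80°:
#   80 = 50 + 50 e^(-k)  =>  e^(-k) = 3/5.
k = -math.log(3 / 5)

def u(t):
    return T_air + (u0 - T_air) * math.exp(-k * t)

# Time to reach 60°:  10 = 50 e^(-k t)  =>  t = ln 5 / ln(5/3) ≈ 3.15 hours.
t60 = math.log(5) / math.log(5 / 3)
print(u(1.0), t60)
```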
(11) . Here is one simple mathematical model governing economic behavior. Think of
yourself as a widget manufacturer for now. Let
i) S (t) be the supply of widgets available at time t . This is the only function you
can control directly.
ii) P (t) be the market price of a widget at time t .
iii) D(t) is the demand for widgets at time t —the number of widgets people want
to buy at time t . You cannot control this given function.
It has been found that the market price P(t) changes at a rate proportional to the
difference between demand and supply,
        dP/dt = k(D(t) − S(t)),
where k > 0 is a fixed constant.
You decide to vary the supply so that it is a fixed constant S0 plus an amount
proportional to the market price,
        S(t) = S0 + αP(t),   α > 0.
(a) Set up the differential equation for S(t) in terms of the given function D(t) and
solve it.
(b) Analyze the solution and give an argument making it plausible that the market
for widgets behaves roughly in this way. What criticisms can you make of the
model?
(c) How does the market behave if the demand increases for a long time and then
levels oﬀ at some constant value, D(t) = D(t1 ) for t ≥ t1 ? A qualitative
description of S (t) and P (t) is called for here. In particular, say whether price
increases without bound (bringing the evils of inﬂation) or whether it, too, levels
off.

(12) It is found that a juicy rumor spreads at a rate proportional to the number of people
who "know". If one person knows initially, t = 0 , and tells one other person by
the next day, t = 1 , approximately how long does it take before 4000 people know?
Analyze the mathematical model as t → ∞ and state why it is, in fact, the wrong
model. (The question to ask yourself is, "how long will it take before everyone even
remotely concerned knows?"). The same mathematical model applies to the spreading
of contagious diseases - and many other similar phenomena.

6.3 Linear Equations of Second Order

In this section we will consider a portion of the general theory of second order linear
O.D.E.'s, with variable coefficients,
        Lu := a2(x) d²u/dx² + a1(x) du/dx + a0(x)u = f(x).
Although all of the results obtained generalize immediately to linear equations of order n ,
only the special case n = 2 will be treated. This special case has the advantage of clearly
illustrating the general situation and supplying proofs which generalize immediately - while
avoiding the inevitable computational complexities inherent in the general case.
There are three parts:
A). a review of the constant coefficient case,
B). power series solutions, and
C). the general theory.
Whereas the first two parts are concerned with obtaining explicit formulas for the solutions,
the last resigns itself to some statements which can be made without finding the solution
explicitly.

a) A Review of the Constant Coefficient Case. Here we have the operator
        Lu := a2 u″ + a1 u′ + a0 u ,                                   (6-12)
where a0 , a1 , and a2 are constants. In order to solve the homogeneous equation
        Lu = 0,
the function e^(λx) is tried. Substitution yields
        L(e^(λx)) = (a2 λ² + a1 λ + a0)e^(λx) = p(λ)e^(λx).            (6-13)
The polynomial p(λ) is called the characteristic polynomial for L . If λ1 is a
root of this polynomial, p(λ1) = 0 , then u1(x) = e^(λ1 x) is a solution of the homogeneous
equation Lu = 0 . If λ2 is another root of this polynomial, λ1 ≠ λ2 , then u2(x) = e^(λ2 x) is
another solution. Then every function of the form
        u(x) = Au1(x) + Bu2(x) = Ae^(λ1 x) + Be^(λ2 x) ,               (6-14)
where A and B are constants, is a solution of the homogeneous equation. The uniqueness
theorem showed that every solution of Lu = 0 is of the form (14).
If the two roots of p(λ) coincide, then a second solution is u2(x) = xe^(λ1 x) , and every
function of the form
        u(x) = Au1(x) + Bu2(x) = Ae^(λ1 x) + Bxe^(λ1 x) ,              (6-15)
where A and B are constants, is a solution of the homogeneous equation. Again the
uniqueness theorem showed that every solution of Lu = 0 is of the form (15).
In both (14) and (15), the constants A and B can be chosen to find a unique function
u(x) which satisfies the homogeneous equation
        Lu = 0
as well as the initial conditions
        u(x0) = α ,   u′(x0) = β ,
where α and β are specified constants.
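The recipe just reviewed - roots of the characteristic polynomial give exponential solutions - is easy to check mechanically. A minimal sketch (the sample equation u″ + 3u′ + 2u = 0 and the numerical check are our own illustration):

```python
import cmath
import math

# Characteristic polynomial p(λ) = a2 λ² + a1 λ + a0 for Lu = a2 u″ + a1 u′ + a0 u.
a2, a1, a0 = 1.0, 3.0, 2.0          # sample equation u″ + 3u′ + 2u = 0

disc = cmath.sqrt(a1 * a1 - 4 * a2 * a0)
lam1 = (-a1 + disc) / (2 * a2)
lam2 = (-a1 - disc) / (2 * a2)
print(lam1, lam2)                    # roots -1 and -2, so u = A e^(-x) + B e^(-2x)

# Numerical check that u(x) = e^(λ1 x) satisfies Lu = 0, via central differences.
h, x = 1e-5, 0.7
u = lambda t: math.exp(lam1.real * t)
residual = (a2 * (u(x + h) - 2 * u(x) + u(x - h)) / h**2
            + a1 * (u(x + h) - u(x - h)) / (2 * h)
            + a0 * u(x))
print(abs(residual))                 # small (limited only by roundoff)
```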
It turns out that the inhomogeneous O.D.E.
        Lu = f,
where f is a given continuous function, can always be solved once two linearly independent
solutions u1 and u2 of the homogeneous equation Luj = 0 are known. Since the procedure
for solving the inhomogeneous equation also works if the coefficients in the differential
operator L are not constant, it is described later in this section in the more general situation
(p. 487-8, Theorem 8). Somewhat simpler techniques can be used for the constant coefficient
equation if the function f is a linear combination of functions of the form x^k e^(rx) , where k
is a nonnegative integer and r is some real or complex constant (cf. Exercise 6, p. 300).
Because both sin nx and cos nx are of this form, Fourier series can be used to supply a
solution for any function f which has a convergent Fourier series (cf. Exercise 13, p. 303).
Section 5 of this chapter contains an interesting generalization of the theory for constant
coefficient ordinary differential operators to operators which are "translation invariant".

b) Power Series Solutions.
Many ordinary differential equations (linear and nonlinear) can be solved by merely
assuming the solution can be expanded in a power series u(x) = Σ cn x^n , and plugging into
the differential equation to find the coefficients cn . A simple example illustrates this.
Example: Solve u″ − 2xu′ − u = 0 with the initial conditions u(0) = 1, u′(0) = 0 .
Solution: We try
        u(x) = c0 + c1 x + c2 x² + · · · + cn x^n + · · · .
Then
        u′(x) = c1 + 2c2 x + 3c3 x² + · · · + n cn x^(n−1) + · · ·
so
        2xu′(x) = 2c1 x + 4c2 x² + · · · + 2n cn x^n + · · ·
Also
        u″(x) = 2c2 + 2·3 c3 x + 3·4 c4 x² + · · · + (n − 1)n cn x^(n−2) + · · ·
Adding u″ − 2xu′ − u and collecting like powers of x we find that
        0 = u″ − 2xu′ − u = [2c2 − c0] + [2·3 c3 − 2c1 − c1]x + [3·4 c4 − 4c2 − c2]x²
                + · · · + [(k + 1)(k + 2)c_{k+2} − 2k ck − ck]x^k + · · ·
If the right side, a Taylor series, is to be zero (= the left side), then the coefficient of each
power of x must vanish because the only convergent Taylor series for zero is zero itself.
The coefficient of
        x^0  is   2c2 − c0
        x^1  is   6c3 − 3c1
        x^2  is   12c4 − 5c2
        x^k  is   (k + 1)(k + 2)c_{k+2} − (2k + 1)ck
Equating these to zero we find that
        c2 = c0/2 ,   c3 = c1/2 ,   c4 = 5c2/12 = 5c0/24 ,
and, more generally,
        c_{k+2} = (2k + 1)ck / ((k + 2)(k + 1)) .                      (6-16)
Thus, for this example c_even is some multiple of c0 while c_odd is some multiple of c1 .
Since u(0) = c0 and u′(0) = c1 , the constants c0 and c1 are determined by the initial
conditions:
        c0 = 1 ,   c1 = 0 .
Consequently, all of the odd coefficients c3 , c5 , . . . vanish, while
        c2 = 1/2 ,   c4 = 5/24 ,   c6 = (3/10)c4 = 1/16 ,   c8 = . . . ,
so the first few terms in the series for u(x) are
        u(x) = 1 + (1/2)x² + (5/24)x⁴ + (1/16)x⁶ + · · ·               (6-17)
We should investigate if this formal power series expansion converges. Using (16), the ratio
of successive terms in the series for u(x) is
        c_{k+2} x^(k+2) / (ck x^k) = (2k + 1)x² / ((k + 2)(k + 1)) .
Therefore the ratio test shows the formal power series actually converges for all x . By
Theorem 16, p. 82, the series can be differentiated term by term and does satisfy the
equation.
Although the computation is lengthy, the series (17) is a solution. Since there is no
way of finding the solution in terms of elementary functions, we must be content with the
power series solution. You have seen (Chapter 1, Section 7) how properties of a function
can be extracted from a power series definition.
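The recurrence (16) is also easy to run on a machine; a short sketch (exact rational arithmetic is our choice of tooling, not part of the notes):

```python
from fractions import Fraction

# Recurrence (16): c_{k+2} = (2k + 1) c_k / ((k + 2)(k + 1)), with c0 = 1, c1 = 0.
c = [Fraction(0)] * 12
c[0], c[1] = Fraction(1), Fraction(0)
for k in range(10):
    c[k + 2] = Fraction(2 * k + 1, (k + 2) * (k + 1)) * c[k]

# Matches (17): coefficients 1/2, 5/24, 1/16, and all odd coefficients vanish.
print(c[2], c[4], c[6])

def u(x, terms=12):
    """Partial sum of the series (17)."""
    return float(sum(c[n] * x**n for n in range(terms)))

print(u(0.5))
```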
This example is typical.

Theorem 6.2. If the differential equation
        a2(x)u″ + a1(x)u′ + a0(x)u = 0
has analytic coefficients about x = 0 , that is, if the coefficients all have convergent Taylor
series expansions about x = 0 , and if a2(0) ≠ 0 , then given any initial values
        u(0) = α ,   u′(0) = β ,
there is a unique solution u(x) which satisfies the equation and initial conditions. Moreover,
the solution is analytic about x = 0 and converges in the largest interval [−r, r] in which
the series for a1/a2 and a0/a2 both converge.

Outline of Proof. There are two parts: i) find a formal power series u(x) = Σ cn x^n , and ii)
prove the formal power series converges. Since explicit formulas can be found for the cn's
(cf. Exercise 30a) the first part is true. Proof of the second part is sketched in the exercises
too (Exercise 30b).
From the explicit formulas mentioned above for the cn's, it is clear there is at most
one analytic solution. But because the general uniqueness proof (p. 510, Theorem 9) states
there is at most one solution which is twice differentiable - and since u(x) is certainly such
a function - the uniqueness of u(x) among all twice differentiable functions follows as soon
as Theorem 9 is proved.
The restriction a2(0) ≠ 0 which was made in Theorem 3 is very important. If a2(0) = 0
then the differential equation
        a2(x)u″ + a1(x)u′ + a0(x)u = 0
is degenerate at x = 0 because the coefficient of the highest order derivative vanishes there.
Then the point x = 0 is called a singularity of the differential equation. A simple example
illustrates the situation. The function u(x) = x^(5/2) satisfies the differential equation
        4x²u″ − 15u = 0
and the initial conditions u(0) = 0, u′(0) = 0 . However u(x) ≡ 0 is also a solution. Thus
it will be impossible to prove any uniqueness theorem at x = 0 for this equation. Perhaps
the singular nature of this equation at x = 0 is more vivid if the equation is written as
        u″ − (15/(4x²))u = 0 .
Although the possibility of a uniqueness result is ruled out for equations with singularities, it is important to be able to find the non-zero solutions of these equations, important
because many of the equations which arise in practice do happen to have singularities
(Bessel's equation, Legendre's equation, the hypergeometric equation, . . . ). In all of the
commonly occurring cases, the coefficients a0(x), a1(x) , and a2(x) in
        a2 u″ + a1 u′ + a0 u = 0
are analytic functions. Thus the only obstacle to applying Theorem 3 is the condition
a2(0) ≠ 0 . We persist, however, in the belief that a power series, or some modification of it,
should work. The modification must allow for such solutions as u(x) = x^(3/2) which do not
have Taylor expansions about x = 0 . Undoubtedly the most naive candidate for a solution
is to try
        u(x) = x^ρ Σ_{n=0}^∞ cn x^n ,                                  (6-18)
where ρ may be any real number. The particular choice ρ = 3/2, c0 = 1, c1 = c2 = c3 =
. . . = 0 does yield the function u(x) = x^(3/2) . It turns out that (18) is usually the correct
guess.
Again, we turn to an example: Bessel's equation of order n ,
        x²u″ + xu′ + (x² − n²)u = 0 ,
which arises in the study of waves in a two dimensional circular domain, like those on
tympani, in a tea cup, or on your ear drum. Let us find a solution to Bessel's equation of
order one,
        x²u″ + xu′ + (x² − 1)u = 0 .                                   (6-19)
This equation does have a singularity at the origin, x = 0 . If u has the form (18), then
        u(x) = Σ_{n=0}^∞ cn x^(n+ρ) ,
        u′(x) = Σ_{n=0}^∞ (n + ρ)cn x^(n+ρ−1) ,
and
        u″(x) = Σ_{n=0}^∞ (n + ρ)(n + ρ − 1)cn x^(n+ρ−2) .
Substituting this into the differential equation (19), we find
        Σ_{n=0}^∞ (n + ρ)(n + ρ − 1)cn x^(n+ρ) + Σ_{n=0}^∞ (n + ρ)cn x^(n+ρ)
                + Σ_{n=0}^∞ cn x^(n+ρ+2) − Σ_{n=0}^∞ cn x^(n+ρ) = 0 .  (6-20)
We must equate the coefficients of successive powers of x to zero. The lowest power of x
which appears is x^ρ , the next x^(ρ+1) , and so on.
        x^ρ :      ρ(ρ − 1)c0 + ρc0 − c0 = 0
        x^(ρ+1) :  (ρ + 1)ρc1 + (ρ + 1)c1 − c1 = 0
        x^(ρ+2) :  (ρ + 2)(ρ + 1)c2 + (ρ + 2)c2 + c0 − c2 = 0
        x^(ρ+3) :  (ρ + 3)(ρ + 2)c3 + (ρ + 3)c3 + c1 − c3 = 0
        · · ·
        x^(ρ+n) :  (ρ + n)(ρ + n − 1)cn + (ρ + n)cn + c_{n−2} − cn = 0 .
From the equation for the power x^ρ , we find
        (ρ² − 1)c0 = 0 .
The polynomial q(ρ) = ρ² − 1 which appears in the coefficient of the lowest power of x
in (20) is called the indicial polynomial since it will be used to determine the index ρ . If
c0 ≠ 0 , the equation (ρ² − 1)c0 = 0 can be satisfied only if ρ is a root of the indicial
polynomial. Thus ρ1 = 1, ρ2 = −1 .
Consider the largest root ρ1 = 1 . Then the equation for the coefficients of x^(ρ+1) in
(20) is
        x^(ρ+1) = x² :   3c1 = 0  ⇒  c1 = 0 ,
while the equation for the coefficient of x^(ρ+n) in (20) is
        x^(ρ+n) = x^(1+n) :   (n + 1)n cn + (n + 1)cn + c_{n−2} − cn = 0 ,
or
        cn = − c_{n−2} / (n(n + 2)) ,   n = 2, 3, . . .
Since c1 = 0 , this equation implies c_odd = 0 and determines the c_even in terms of c0 :
        c2 = − c0/(2·4) ,   c4 = − c2/(4·6) = c0/(2·4²·6) ,   c6 = − c4/(6·8) = − c0/(2·4²·6²·8) ,
and in general
        c_{2k} = (−1)^k c0 / (2·4²·6² · · · (2k)²(2k + 2)) = (−1)^k c0 / (2^(2k) k!(k + 1)!) .
Thus, the formal series we find for the solution, J1(x) , of the Bessel equation of first order
corresponding to the largest indicial root, ρ1 = 1 , is
        J1(x) = (1/2)x (1 − x²/(2·4) + x⁴/(2·4²·6) − · · ·)
or
        J1(x) = (1/2)x Σ_{k=0}^∞ (−1)^k x^(2k) / (2^(2k) k!(k + 1)!) ,  (6-21)
since it is customary to choose the constant c0 for J1(x) as c0 = 1/2 (and the constant c0
for Jn(x) as 1/(2^n n!) when n is a positive integer).
The other (smaller) root, ρ2 = −1 , is much more difficult to treat. If the above steps
are imitated (which you should try), a division by zero is needed to solve for c2 from c0 . It
turns out that the solution corresponding to the smaller root ρ2 = −1 is not of the form
(18). We shall not enter into this matter further except to note that the difficulty occurs
because the two roots ρ1 and ρ2 differ by an integer. If the two roots ρ1 and ρ2 do not
differ by an integer, the above method yields two different solutions of the form (18) for the
equation. In any case, this method always gives a solution of the form (18) for the largest
root of the indicial equation.
It is easy to check that the power series (21) does converge for all x and is therefore
a solution to Bessel's equation of the first order. From the power series, with considerable
effort one can obtain a series of identities for Bessel functions which exactly parallels those
for the trigonometric functions. The functions Jn(x) behave in many ways similar to
sin nx or cos nx . Here is a graph of J1(x) :
[a figure goes here]

For x very large, J1(x) is asymptotically
        J1(x) ∼ √(2/(πx)) cos(x − 3π/4) ,
which is a cosine curve whose amplitude decreases like 1/√x . For good reason this curve
resembles the height of surface waves on a lake after a pebble has been dropped into the
water, or those on the surface of a cup of tea.
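The series (21) can be summed directly. A sketch, comparing a partial sum against the tabulated reference value J1(1) = 0.4400505857… (the tolerance and number of terms are our own choices):

```python
import math

def J1(x, terms=30):
    """Partial sum of the series (6-21):
    J1(x) = sum_k (-1)^k (x/2)^(2k+1) / (k! (k+1)!)."""
    s = 0.0
    for k in range(terms):
        s += (-1) ** k * (x / 2) ** (2 * k + 1) / (
            math.factorial(k) * math.factorial(k + 1))
    return s

# The series converges rapidly for moderate x.
print(J1(1.0))
```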
Having worked out this example in detail, we shall state a definition in preparation for
our theorem.

Definition: The differential equation
        a2(x)u″ + a1(x)u′ + a0(x)u = 0 ,
where the aj(x) are analytic about x = 0 , has a regular singularity at x = 0 if it can
be written in the form
        x²u″ + A(x)xu′ + B(x)u = 0 ,
where the functions A(x) and B(x) are analytic about x = 0 . Otherwise the singularity
is irregular.
Examples:
(1) x²(1 + x)u″ + 2(sin x)u′ − e^x u = 0 has a regular singularity at x = 0 since the
equation may be written as
        x²u″ + (2 sin x/(1 + x))u′ − (e^x/(1 + x))u = 0 ,
where the coefficients 2 sin x/((1 + x)x) and e^x/(1 + x) do have convergent Taylor series
about x = 0 . (Here we observed that (sin x)/x = 1 − x²/3! + · · · ).
(2) xu″ − 7u′ + (3/cos x)u = 0 has a regular singularity at x = 0 since it can be written in
the form
        x²u″ − 7xu′ + (3x/cos x)u = 0 ,
where the coefficients −7 and 3x/cos x are analytic about x = 0 .
(3) x²u″ − 2u′ + xu = 0 has an irregular singularity at x = 0 since it cannot be written
in the desired form.
(4) x³u″ − 2xu′ + u = 0 has an irregular singularity at x = 0 .
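The classification in these examples can be mechanized. A sketch using a computer algebra system (our own illustration; the finite-limit test below is a quick check which suffices here because for quotients of analytic functions a finite limit at the point means the singularity is removable):

```python
import sympy as sp

x = sp.symbols('x')

def regular_at_zero(a2, a1, a0):
    """For x^2 u'' + A(x) x u' + B(x) u = 0 we need A = x*a1/a2 and
    B = x^2*a0/a2 to stay finite (hence analytic here) as x -> 0."""
    A = sp.limit(sp.cancel(x * a1 / a2), x, 0)
    B = sp.limit(sp.cancel(x**2 * a0 / a2), x, 0)
    return A.is_finite is True and B.is_finite is True

examples = [
    (x**2 * (1 + x), 2 * sp.sin(x), -sp.exp(x)),   # (1) regular
    (x, -7, 3 / sp.cos(x)),                        # (2) regular
    (x**2, -2, x),                                 # (3) irregular
    (x**3, -2 * x, 1),                             # (4) irregular
]
print([regular_at_zero(*e) for e in examples])     # [True, True, False, False]
```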
Theorem 6.3 (Frobenius). Consider the equation with a regular singularity at x = 0 ,
        a2(x)u″ + a1(x)u′ + a0(x)u = 0 ,
so it can be written in the form
        x²u″ + A(x)xu′ + B(x)u = 0 ,
where the analytic functions A(x) and B(x) have convergent power series for |x| < r . Let
ρ1 and ρ2 be the roots of the indicial polynomial
        q(ρ) = ρ(ρ − 1) + A(0)ρ + B(0) ,
where ρ1 ≥ ρ2 (or Re ρ1 ≥ Re ρ2 if the roots are complex). Then the differential equation has
one solution u1(x) of the form
        u1(x) = x^(ρ1) Σ_{n=0}^∞ cn x^n   (c0 ≠ 0) ,
the series converging for all |x| < r . Moreover, if ρ1 − ρ2 is not an integer (or zero), there
is a second solution u2(x) of the form
        u2(x) = x^(ρ2) Σ_{n=0}^∞ c̃n x^n   (c̃0 ≠ 0) ,
where this series also converges in the interval |x| < r . In the special case ρ1 − ρ2 =
integer, there may not be a solution of the form (18) - see Exercise 19c. Notice: although
the power series do converge at x = 0 , the functions u1(x) and u2(x) may not be solutions
at that point because the functions x^ρ may not be twice differentiable (for example, if ρ = 1/2
then √x has no derivatives at x = 0 ).
Outline of Proof. Like Theorem 2, this proof also has two parts: i) finding the coefficients
cn for the formal power series, and ii) proving the formal power series converges. As in
Theorem 3, part i) is proved by exhibiting formulas for the cn's, while part ii) is proved by
comparing the series Σ cn x^n with another convergent series Σ Cn x^n whose coefficients
are larger, |cn| ≤ Cn .
To illustrate the procedure of part i), we will obtain the stated formula for the indicial
polynomial q(ρ) . Let A(x) = Σ_{n=0}^∞ αn x^n and B(x) = Σ_{n=0}^∞ βn x^n be the power
series expansions of A(x) and B(x) . Then assuming u(x) has a solution in the form (18),
we find by substituting these formulas into the differential equation that
        Σ_{n=0}^∞ (ρ + n)(ρ + n − 1)cn x^(ρ+n) + (Σ_{n=0}^∞ αn x^n)(Σ_{n=0}^∞ (ρ + n)cn x^(ρ+n))
                + (Σ_{n=0}^∞ βn x^n)(Σ_{n=0}^∞ cn x^(ρ+n)) = 0 .
The lowest power of x appearing is x^ρ , then comes x^(ρ+1) , . . . :
        x^ρ :      ρ(ρ − 1)c0 + α0 ρc0 + β0 c0 = 0
        x^(ρ+1) :  (ρ + 1)ρc1 + [α1 ρc0 + α0 (ρ + 1)c1] + [β1 c0 + β0 c1] = 0
        · · ·
        x^(ρ+n) :  (ρ + n)(ρ + n − 1)cn + Σ_{k=0}^n α_{n−k}[(ρ + k)ck] + Σ_{k=0}^n β_{n−k} ck = 0 ,
the last formula arising from the formula for the coefficients in the product of two power
series (p. 76). If c0 ≠ 0 , the first equation states
        q(ρ) := ρ(ρ − 1) + α0 ρ + β0 = 0 ,
where q(ρ) is the indicial polynomial. Since α0 = A(0) and β0 = B(0) , this is precisely
the formula given in the theorem.

c) General Theory

We begin immediately by stating
Theorem 6.4 (Existence and Uniqueness). Consider the second order linear O.D.E.
        Lu := a2(x)u″ + a1(x)u′ + a0(x)u = f(x) ,
where the coefficients a0 , a1 , and a2 as well as f are continuous functions, and a2(x) ≠ 0 .
There exists a unique twice differentiable function u(x) which satisfies the equation and the
initial conditions
        u(x0) = α ,   u′(x0) = β ,
where α and β are arbitrary constants.
If time permits, the existence proof will be carried out in the last chapter as a special
case of a more general result. The uniqueness will be proved later too, as a special case of
Theorem 9, page 510 - in the next section. We will not be guilty of circular reasoning.
Now what? Although this theorem appears to make further study unnecessary, there
are several general statements which can be made because the equation is linear. Two
other theorems are particularly nice; the ﬁrst is dim N(L) = 2 , while the second gives a
procedure for solving the inhomogeneous equation once two linearly independent solutions
of the homogeneous equation are known.
A preliminary result on linear dependence and independence of functions is needed. If
the differentiable functions u1(x) and u2(x) are linearly dependent, there are constants c1
and c2 not both zero such that
        c1 u1(x) + c2 u2(x) ≡ 0 .
Differentiating this equation, we find
        c1 u1′(x) + c2 u2′(x) ≡ 0 .
Since the two homogeneous algebraic equations for c1 and c2 have a non-trivial solution,
by Theorem 32 (page 428), the determinant
        W(x) := W(u1, u2)(x) := det [ u1(x)  u2(x) ; u1′(x)  u2′(x) ] = 0
must vanish. This determinant is called the Wronskian of u1 and u2 . We have proved

Theorem 6.5. If the differentiable functions u1(x), u2(x) are linearly dependent in the
interval [α, β] , then necessarily W(x) ≡ 0 throughout [α, β] . Thus, if W ≠ 0 , the uj's
are independent.

Remark: The condition W ≡ 0 is necessary for linear dependence but not sufficient in
general, as can be seen from the example
        u1(x) = { x² for x ≥ 0 ; 0 for x < 0 } ,   u2(x) = { 0 for x ≥ 0 ; x² for x < 0 } ,
for which W(u1, u2) ≡ 0 for all x but u1 and u2 are linearly independent. However it is
sufficient if u1 and u2 are solutions of a second order linear O.D.E., Luj = 0 . An even
stronger statement is true in this case. All we need require is that W vanish at one point
x0 .
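The Remark's example can be checked directly (a Python sketch; the sample points are arbitrary):

```python
import math

# The two piecewise functions from the Remark and their derivatives.
def u1(x):  return x * x if x >= 0 else 0.0
def u1p(x): return 2 * x if x >= 0 else 0.0
def u2(x):  return 0.0 if x >= 0 else x * x
def u2p(x): return 0.0 if x >= 0 else 2 * x

def W(x):
    """Wronskian u1 u2' - u2 u1'."""
    return u1(x) * u2p(x) - u2(x) * u1p(x)

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([W(x) for x in samples])   # identically zero, yet u1, u2 are independent

# By contrast, for the solutions cos x, sin x of u'' + u = 0 the Wronskian is
# cos^2 x + sin^2 x = 1, which never vanishes.
print(math.cos(1.0) ** 2 + math.sin(1.0) ** 2)
```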
Theorem 6.6. Let u1 and u2 both be solutions of
        Lu := a2 u″ + a1 u′ + a0 u = 0 ,
where a2 ≠ 0 . If W(x0) = 0 at some point x0 , then u1 and u2 are linearly dependent -
which implies by Theorem 6 that W(x) ≡ 0 for all x . In other words, if W(x0) ≠ 0 , then
u1 and u2 are linearly independent.
Proof: Since W(x0) = 0 , the homogeneous algebraic equations
        c1 u1(x0) + c2 u2(x0) = 0
        c1 u1′(x0) + c2 u2′(x0) = 0
have a non-trivial solution c1 , c2 . Let
        v(x) = c1 u1(x) + c2 u2(x) .
We want to prove v(x) ≡ 0 . Observe Lv = 0 . Moreover v(x0) = 0 and v′(x0) = 0 . Thus
by uniqueness, v(x) ≡ 0 , establishing the linear dependence of u1 and u2 .
The same type of reasoning proves
Theorem 6.7. Let Lu := a2 u″ + a1 u′ + a0 u , where a2(x) ≠ 0 . Then
        dim N(L) = 2 .
Proof: We exhibit two special solutions φ1 and φ2 of Lu = 0 and prove they constitute
a basis for N(L) . Let
        φ1(x) satisfy Lφ1 = 0 with φ1(x0) = 1, φ1′(x0) = 0 ,
        φ2(x) satisfy Lφ2 = 0 with φ2(x0) = 0, φ2′(x0) = 1 .
There are such functions by the existence theorem.
i) They are linearly independent:
        W(x0) = W(φ1, φ2)(x0) = det [ φ1(x0)  φ2(x0) ; φ1′(x0)  φ2′(x0) ] = det [ 1  0 ; 0  1 ] = 1 ≠ 0 .
Thus by Theorem 7, φ1 and φ2 are linearly independent.
ii) They span N(L) . Let u(x) be any element in N(L) and consider the function
        v(x) = u(x) − [u(x0)φ1(x) + u′(x0)φ2(x)] .
Then Lv = 0 and v(x0) = 0, v′(x0) = 0 . By uniqueness, v(x) ≡ 0 . Thus every u ∈ N(L)
can be written as
        u(x) = Aφ1(x) + Bφ2(x) ,
where the constants A and B are A = u(x0), B = u′(x0) .
All of our attention has been on the homogeneous equation Lu = 0 . Let us solve the
inhomogeneous equation. This is particularly simple for a linear differential equation once
we have a basis for N(L) .

Theorem 6.8 (Lagrange). Let u1(x) and u2(x) be a basis for N(L) , where Lu :=
a2(x)u″ + a1(x)u′ + a0(x)u , with a2 ≠ 0 . Then the inhomogeneous equation Lu = f
has the particular solution
        up(x) = u1(x) ∫^x (W1(s)/W(s)) f(s) ds + u2(x) ∫^x (W2(s)/W(s)) f(s) ds ,
where W(s) := W(u1, u2)(s) and Wj(s) is obtained from W(s) by replacing the j th
column (uj, uj′) of W by the vector (0, 1/a2) .

Remark: If we let
        G(x; s) = (u1(x)W1(s) + u2(x)W2(s)) / W(s)
then the above formula assumes the elegant form
        up(x) = ∫^x G(x; s)f(s) ds .

Proof: A device (due to Lagrange) called variation of parameters is needed. We already
used a form of this device to solve the inhomogeneous first order linear equation (5, p. 457).
The trick is to let
        up(x) = v1(x)u1(x) + v2(x)u2(x) ,
where the functions v1(x) and v2(x) are to be found. This attempt to find up is reminiscent
of writing the general solution of the homogeneous equation as Au1 + Bu2 . Differentiate:
        up′(x) = v1 u1′ + v2 u2′ + [v1′ u1 + v2′ u2] .
The functions v1 and v2 will be chosen to make
        v1′ u1 + v2′ u2 = 0 .
Using this, we differentiate again:
        up″(x) = v1 u1″ + v2 u2″ + [v1′ u1′ + v2′ u2′] .
Now multiply up″ by a2 , up′ by a1 , up by a0 , and add to find
        Lup = v1 Lu1 + v2 Lu2 + a2[v1′ u1′ + v2′ u2′]
            = a2[v1′ u1′ + v2′ u2′] .
If we can choose v1 and v2 so that a2[v1′ u1′ + v2′ u2′] = f , then indeed Lup = f , so
up = v1 u1 + v2 u2 is a particular solution. It remains to see if v1 and v2 can be found
which satisfy the two needed conditions
        v1′ u1 + v2′ u2 = 0
        v1′ u1′ + v2′ u2′ = f/a2 .
These two linear equations for v1′ and v2′ may be solved by Cramer's rule (Theorem 33,
page 429):
        v1′ = det [ 0  u2 ; f/a2  u2′ ] / W = f · det [ 0  u2 ; 1/a2  u2′ ] / W = f W1/W ,
        v2′ = det [ u1  0 ; u1′  f/a2 ] / W = f · det [ u1  0 ; u1′  1/a2 ] / W = f W2/W .
Integration of these equations yields v1 and v2 , which, when substituted into up = u1 v1 +
u2 v2 , do give the stated result.
With this theorem, knowing the general solution of the homogeneous equation Lu = 0
allows us to find a particular solution up of the inhomogeneous equation Lup = f . The
general solution u of the inhomogeneous equation Lu = f is then the up coset of N(L) ,
that is, all functions of the form
        u = up + ũ ,   ũ ∈ N(L) .
This puts the burden on finding the general solution of the homogeneous equation.
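Before turning to examples, Theorem 8 can be sanity-checked symbolically. For L = D² + 1 with the basis cos x, sin x one has W ≡ 1, a2 = 1, and the kernel works out to G(x; s) = sin(x − s) (as derived in Example (2) below). A sketch with the trial right side f(s) = s (our own choice):

```python
import sympy as sp

x, s = sp.symbols('x s')

# Particular solution from the Lagrange formula: up(x) = ∫_0^x f(s) sin(x - s) ds.
f = s
u_p = sp.integrate(f * sp.sin(x - s), (s, 0, x))
print(sp.simplify(u_p))                        # x - sin(x)

# Verify L up = up'' + up = f:
print(sp.simplify(sp.diff(u_p, x, 2) + u_p))   # x
```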
Examples:
(1) The homogeneous equation x²u″ − 3xu′ + 3u = 0, x ≠ 0 , has the two linearly
independent solutions u1(x) = x, u2(x) = x³ - which might have been found by the
power series method. Therefore a particular solution of the inhomogeneous equation
        x²u″ − 3xu′ + 3u = 2x⁴
can be found by variation of parameters. We try
        up = v1 x + v2 x³
and, using W(x, x³) = 2x³ and f/a2 = 2x⁴/x² = 2x² , are led to the equations
        v1′ = − x³ · 2x² / (2x³) = −x² ,    v2′ = x · 2x² / (2x³) = 1 .
Thus
        v1(x) = − x³/3 ,    v2(x) = x .
Therefore
        up(x) = x(− x³/3) + x³(x) = (2/3)x⁴ .
The general solution to the inhomogeneous equation is found by adding the general
solution of the homogeneous equation to this particular solution,
        u(x) = Ax + Bx³ + (2/3)x⁴ .
cos x, u2 (x) = sin x . Let us solve
u + u = f (x),
where f is an arbitrary continuous function. Trying
up (x) = v1 cos x + v2 sin x,
we are led to
v1 =
Thus −f sin x
,
1 v2 = f cos x
.
1 x v1 (x) = − x f (s) sin s ds, Therefore v2 (x) = x up (x) = − cos x f (s) cos s ds.
x f (s) sin s ds + sin x f (s) cos s ds x = f (s)[− sin s cos x + cos s sin x] ds
x = f (s) sin(x − s) ds. Consequently, the handsome formula
x u(x) = A sin x + B cos x + f (s) sin(x − s) ds is the general solution of the inhomogeneous equation u + u = f . Exercises
(1) Solve the following initial value problems any way you can. Check your answers by
substituting back into the diﬀerential equation. 6.3. LINEAR EQUATIONS OF SECOND ORDER
(a) u + 2u = 0, u(1) = 2 (b) u + 3u + 2u = 7, u(0) = 0, u (0) = 0 (c) u + 3u + 2u = 2ex , (d) u + 3u + 2u = e−2x , u(0) = 0, u (0) = 1
u(0) = 1, u (0) = 0
u( π ) = 1
6 (e) (tan x) du + u − sin2 x = 0,
dx u(0) = u (0) = 1, x (− π , π ).
22 (f) u + u = tan x,
(g) u − 8u = 0,
(h) u u(0) = 1, u (0) = 2, u (0) = 3 − k 4 u = 0. General solution. (i) u − 6u + 10u = x2 + sin x,
(j) u − 7u − 8u = 0, (k) xu + u = x3 ,
(l) u + 4u =
(m) u − u = 4x2
ex (o) − u(4) + u(0) = u (0) = 0. u(0) = 3, u (0) = 8, u (0) = 65, u (0) = 511. u(1) = 1.
+ cos 2x, u(0) = 0, u (0) = 1 . General solution. (n) u = 3u + 3u − u = 0,
u(5) 271 3u(3) − 3u(2) u(0) = 1, u (0) = 2, u (0) = 3
− 4u(1) + 4u = 0 . General solution. [Hint: λ5 − λ4 + 3λ3 − 3λ2 − 4λ + 4 = (λ2 − 1)(λ2 + 4)(λ − 1) ].
(2) Find the ﬁrst four non-zero terms (if there are that many) in the power series solutions
about x = 0 for the following equations.
(a) u − xu − u = 0, u(0) = u (0) = 1 (b) u − 2xu + 2u = 0, u(0) = 0, u (0) = 1. (c) u − 2xu − 2u = 0, u(0) = 1, u (0) = 0. (d) u + xu = 0, u(0) = 1, u (0) = −1. (e) u − xu = 0, u(0) = 1, u (0) = u (0) = 0. (f) u − x2 u =
(g) u − 1
1− x u 1
,
1 − x2 = 0, 1
=
1 − x2
1
1−x =?] u(0) = 0, u (0) = 0. [Hint:
u(0) = 0, u (0) = 1. [Hint: 1 + x2 + x4 + · · · ] (3) a) - e) Find where the power series in Ex. 2 a-e converge.
(4) Find the ﬁrst four non-zero terms (if there are that many) in the power series solutions
corresponding to the larger root of the indicial polynomial.
(a) 2x2 u − 3xu + 2u = 0
∞ (b) xu + 2u − xu = 0. [Answer: u(x) = c0
0 x2n.
(2n + 1)! (c) 4xu + 2u + u = 0.
(d) xu + (sin x)u + x2 u = 0,
(e) xu + u = u(0) = 0, u (0) = 1. x2 . (5) (a-e). Investigate the convergence of the series solutions found in Exercise 4 above. 272 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (6) Find the power series solution about x = 0 for the n th order Bessel equation corresponding to the highest root of the indicial polynomial. The answer is:
x
Jn (x) = ( )n
2 ∞
k=0 (−1)k
x
( )2k ,
k !(k + n)! 2 where we have chosen c0 = 1/2n n! .
(7) Find two linearly independent power series solutions of
u + xu + u = 0
and prove they are linearly independent. Find all solutions.
(8) The Hermite equation is
u − 2xu + 2αu = 0.
For which value(s) of the constant α are the solutions polynomials - that is, a solution
with a ﬁnite Taylor series. These are the Hermite polynomials.
(9) Find the ﬁrst three non-zero terms in the power series about x = 0 for two linearly
independent solutions of
2x2 u + xu + (x − 1)u = 0.
(10) The homogeneous equation Lu := 2x²u″ − 3xu′ + 2u = 0 has the two linearly
independent solutions u1(x) = x², u2(x) = √x (see Ex. 20b below). Find the general
solution of the inhomogeneous equation Lu = log(x³) .
(11) Let Lu = (1 − x2 )u − 2xu + n(n + 1)u where n is an integer. Show that Lu = 0
has a polynomial solution - the Legendre polynomial. Compute this for n = 3 . (cf.
page 104l Ex. 10).
(12) Let J0(x) be a solution of the zeroth order Bessel equation. Prove dJ0/dx is a solution
of the first order Bessel equation. [Hint: Work directly with the equation itself, not
with power series].
(13) Consider the equation
        a2(x)u″ + a1(x)u′ + a0(x)u = 0.
(a) Let u(x) := u1(x)v(x) . Show that the result, arranged as an equation for v(x) ,
is
        a2 u1 v″ + (2a2 u1′ + a1 u1)v′ + (a2 u1″ + a1 u1′ + a0 u1)v = 0 .
(b) If u1 is known to be one solution of the equation, show that a second solution
is
        u2(x) = u1(x) ∫ w(x) dx ,
where w(x) is a solution of the first order equation
        a2 u1 w′ + (2a2 u1′ + a1 u1)w = 0 .
Thus, if one solution of a second order linear O.D.E. is known, the problem of finding
a second solution is reduced to the problem of solving a first order linear O.D.E. -
which can always be solved by separation of variables.
(14) Apply Exercise 13 to the following:
(a) One solution of 2x²u″ − 3xu′ + 2u = 0 is u1(x) = x² . Find another.
(b) One solution of x²u″ − xu′ + u = 0 is u1(x) = x . Find another.
(c) One solution of (1 + x)xu″ − xu′ + u = 0 is u1(x) = x . Find another, and then
write down the general solution.
(d) One solution of the equation x²u″ + 2xu′ = 0 is clearly u1(x) = 1 . Find another.
Prove the solutions are linearly independent for x > 0 . Find the general solution
of x²u″ + 2xu′ = 1 .
(15) Consider the O.D.E. u″ + a(x)u′ + b(x)u = 0 , where a and b are continuous about
x0 . If the graphs of two solutions are tangent at x = x0 , are these two solutions
linearly dependent? Explain. Can you make an even stronger deduction?
(16) (a) Let L be a constant coefficient differential operator with characteristic polynomial p(λ) . If p(λ) = p(−λ) , prove
        L(sin kx) = p(ik) sin kx .
(b) Apply this to find a particular solution of u″ − u = sin 2x .
(17) Find a particular solution of the equation
        u″ − n²u = f ,   n ≠ 0.
[You will need: sinh(α − β) = sinh α cosh β − sinh β cosh α ].
[Answer: u(x) = (1/n) ∫_0^x f(s) sinh n(x − s) ds. ]
(18) Use the method of variation of parameters to find a particular solution to u″ = f .
Compare with Exercise 5, p. 282.
(19) Consider the differential operator
        Lu := x²u″ + axu′ + bu ,
where a and b are constants. This is called Euler's equation. It is the simplest
equation with a regular singularity at x = 0 .
(a) Show that Lx^ρ = q(ρ)x^ρ , where q(ρ) is the indicial polynomial for L .
(b) If the roots of q(ρ) = 0 are distinct, find two solutions of Lu = 0, x > 0 , and
prove the solutions are linearly independent for x > 0 .
(c) If the roots ρ1 and ρ2 of q(ρ) = 0 coincide, take the derivative with respect to ρ
of the equation in a) - holding x fixed - to obtain the candidate u2(x) = x^(ρ1) ln x
for a second solution. Verify by substitution that u2 is a solution in this case
and prove the two solutions
        u1(x) = x^(ρ1) ,   u2(x) = x^(ρ1) ln x ,   x > 0 ,
are linearly independent for x = 0 . x>0 274 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (20) Apply the method of Exercise 19 to ﬁnd two linearly independent solutions for each
of the following Euler equations
a). x^2 u'' + xu' = 0.
b). 2x^2 u'' − 3xu' + 2u = 0.
c). 2x^2 u'' − 3xu' − 2u = 0.
d). x^2 u'' − xu' + u = 0.
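The indicial roots in Exercise 20 can be computed mechanically. For a2 x^2 u'' + a1 x u' + a0 u = 0 the indicial polynomial of Exercise 19 is q(ρ) = a2 ρ(ρ − 1) + a1 ρ + a0; a small sketch (the function name is ours, chosen for illustration):

```python
import math

# Roots of the indicial polynomial q(rho) = a2*rho*(rho-1) + a1*rho + a0
# for the Euler equation a2*x^2 u'' + a1*x u' + a0*u = 0 (Exercise 19).
def indicial_roots(a2, a1, a0):
    # q(rho) = a2*rho^2 + (a1 - a2)*rho + a0
    b, c = a1 - a2, a0
    disc = b*b - 4*a2*c
    if disc < 0:
        return None          # complex roots; not needed for Exercise 20
    r = math.sqrt(disc)
    return sorted(((-b - r)/(2*a2), (-b + r)/(2*a2)))

print(indicial_roots(1, 1, 0))    # a): double root 0 -> solutions 1, ln x
print(indicial_roots(2, -3, 2))   # b): 1/2 and 2 -> x^(1/2), x^2
print(indicial_roots(1, -1, 1))   # d): double root 1 -> x, x ln x
```

A double root signals the logarithmic second solution of Exercise 19(c).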
(21) (a) Use the result of Ex. 19 a) to find a particular solution of the equation Lu = x^α, where
Lu := x^2 u'' + axu' + bu,
with a and b constant, and where α is not a root of the indicial polynomial q(ρ) (cf. Ex. 6, p. 300).
(b) If neither α nor β is a root of q(ρ), find a particular solution to the inhomogeneous equation
Lu = Ax^α + Bx^β.
(c) Apply this procedure to find the general solution of
2x^2 u'' − 3xu' − 2u = 3x − 4x^(1/3).
(d) How can you solve Lu = x^α if α is a root of the indicial polynomial?
(22) (a) If u has n derivatives and λ is a constant, prove
Dn [eλx u] = eλx (D + λI )n u.
Thus (D + λI )n u = e−λx Dn [eλx u] .
(b) Let L = (D − a)n be a constant coeﬃcient diﬀerential operator with characteristic polynomial p(λ) = (λ − a)n . Show u(x) is a solution of the equation
Lu = 0 if and only if u(x) has the form
u(x) = eax Q(x),
where Q(x) is a polynomial of degree ≤ n − 1 .
(23) Consider the O.D.E. Lu = f, where L is a second order constant coefficient operator, and let λ1 and λ2 be the characteristic roots of L. Assume i) Re λ1 < 0 and Re λ2 < 0, and ii) there is some constant M such that |f(x)| ≤ M for all x ∈ [0, ∞).
(a) Prove every solution of Lu = f is bounded for x ∈ [0, ∞).
(b) If lim_{x→∞} f(x) = 0, prove that every solution of Lu = f tends to zero as x → ∞.
(24) Consider the operator Lu := a2(x)u'' + a1(x)u' + a0(x)u, where the aj's are continuous for x ∈ [α, β]. Let u1, u2 and φ1, φ2 both be bases for N(L). Prove there is a constant k ≠ 0 such that
W(u1, u2)(x) = kW(φ1, φ2)(x) for all x ∈ [α, β].
(25) (a) Generalize the procedure of Ex. 21b and show how the inhomogeneous Euler
equation Lu = f can be solved if f has a power series expansion. You will have
to assume that no root of the indicial polynomial is a positive integer.
(b) Apply a) to find a particular solution (as a power series) of
2x^2 u'' + 3xu' − u = 1/(1 − x).
(26) Given that the equation Lu := u'' + a(x)u' + b(x)u = 0 has solutions u1(x) = sin x, u2(x) = tan x, find the general solution of the inhomogeneous equation
Lu = cos x / (1 + sin^2 x).
(27) (a) If Lu := a2 u'' + a1 u' + a0 u and L*v := (a2 v)'' − (a1 v)' + a0 v, prove the Lagrange identity
vLu − uL*v = d/dx [ a2(u'v − uv') + (a1 − a2')uv ],
where the functions aj are assumed to be sufficiently differentiable. The operator L* is the adjoint of L.
(b) Show that L is self-adjoint, L = L*, if and only if a2' = a1. Write the Lagrange identity in this case.
(c) If c1 u1(x) + c2 u2(x) is the general solution of the equation Lu = 0, find the general solution of the adjoint equation L*v = 0. [Answer: v = (c3 u1 + c4 u2)/(u1 u2' − u1' u2).]
(d) Let u and v be twice differentiable functions which vanish at α and β. Show the adjoint operator L* has the property that for all such functions u and v,
⟨v, Lu⟩ = ⟨L*v, u⟩,
where
⟨f, g⟩ := ∫_α^β f(x)g(x) dx.
(28) (a) Let L be a self-adjoint operator, L = L*. If LX1 = λ1 X1 and LX2 = λ2 X2,
where λ1 and λ2 are real numbers, λ1 ≠ λ2, prove X1 and X2 are orthogonal:
⟨X1, X2⟩ = 0.
[Hint: Compare ⟨X2, LX1⟩ = λ1⟨X2, X1⟩ with ⟨LX2, X1⟩ = λ2⟨X2, X1⟩.]
(b) Let L = d^2/dx^2. For what values of λ can you find a non-zero solution u of the equation Lu = λu where u satisfies the boundary conditions u(0) = u(π) = 0?
(c) Apply parts a) and b) as well as Ex. 27d to prove
⟨sin nx, sin mx⟩ = ∫_0^π sin nx sin mx dx = 0,
where n and m are unequal integers.
(29) Consider the boundary value problem
Lu := u'' + u = f,  u(0) = 0,  u(π) = 0,
where f is continuous in [0, π].
a). Show that if a solution exists, it is not unique.
b). Show a solution exists if and only if
∫_0^π f(x) sin x dx = 0.
[Hint: First find the general solution of the homogeneous equation.]
Remark: In the notation of Ex. 27, we have L = L*. Moreover, N(L*) = span{ sin x }. The conclusion of b) states that R(L) = N(L*)⊥, and illustrates how Theorem 34, p. 431, is used in infinite dimensional spaces.
(30) A proof of Theorem 3. Since a2(x) ≠ 0, the equation can be written as
u'' + a(x)u' + b(x)u = 0.
Let
a(x) = Σ_{n=0}^∞ αn x^n,  b(x) = Σ_{n=0}^∞ βn x^n,  u(x) = Σ_{n=0}^∞ cn x^n,
where u(0) = c0 and u'(0) = c1.
(a) Imitate the example to prove the remaining cn's must satisfy
c_{n+2} = − ( Σ_{k=0}^n [ α_{n−k}(k+1)c_{k+1} + β_{n−k} c_k ] ) / ( (n+2)(n+1) ).
Show that if c0 and c1 are known, then the remaining cn's are determined inductively by the above formula.
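The recursion in (a) is easy to check numerically. Taking a(x) ≡ 0 and b(x) ≡ 1 (so all αn = 0, β0 = 1, and the other βn = 0) reduces the equation to u'' + u = 0, and with c0 = 1, c1 = 0 the recursion should reproduce the Taylor coefficients of cos x. A minimal sketch (function name ours):

```python
from math import factorial

def series_coeffs(alpha, beta, c0, c1, N):
    """Run c_{n+2} = -sum_k [alpha_{n-k}(k+1)c_{k+1} + beta_{n-k} c_k]
    / ((n+2)(n+1)) to produce N coefficients."""
    c = [c0, c1]
    for n in range(N - 2):
        s = sum(alpha[n - k]*(k + 1)*c[k + 1] + beta[n - k]*c[k]
                for k in range(n + 1))
        c.append(-s / ((n + 2)*(n + 1)))
    return c

N = 10
alpha = [0.0]*N                 # a(x) = 0
beta  = [1.0] + [0.0]*(N - 1)   # b(x) = 1  ->  u'' + u = 0
c = series_coeffs(alpha, beta, 1.0, 0.0, N)
cosine = [((-1)**(n//2))/factorial(n) if n % 2 == 0 else 0.0 for n in range(N)]
print(all(abs(a - b) < 1e-12 for a, b in zip(c, cosine)))   # True
```

With c0 = 0, c1 = 1 the same run produces the sine series instead.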
(b) Because the series for a(x) and b(x) converge for |x| < r, if R is any number less than r, there is a constant M such that for all n, |αn| ≤ M/R^n and |βn| ≤ M/R^n (cf. p. 72, line 2). Define constants Cn as
C0 = |c0|,  C1 = |c1|,
and for n ≥ 0
C_{n+2} = [ (M/R^n) Σ_{k=0}^n [ (k+1)C_{k+1} + C_k ] R^k + M C_{n+1} R ] / ( (n+2)(n+1) ).
(i) Prove |cn| ≤ Cn, n = 0, 1, 2, 3, . . . .
(ii) Prove
|C_{n+1} x^{n+1}| / |C_n x^n| = ( ( n(n−1) + MnR + MR^2 ) / ( R(n+1)n ) ) |x|.
(iii) Prove Σ_{n=0}^∞ Cn x^n converges for |x| < R, where R is any number less than r.
(iv) Prove Σ_{n=0}^∞ cn x^n converges for |x| < R, where R is any number less than r.
(31)
(a) Let u(x) and v(x) be solutions of the equations L1 u := u'' + a(x)u = 0 and L2 v := v'' + b(x)v = 0, respectively, in some interval, where a and b are continuous. If b(x) ≥ a(x) throughout the interval, prove there must be a zero of v between any two zeroes of u. This is the Sturm oscillation theorem. [Hint: Suppose α and β are consecutive zeroes of u and u > 0 in (α, β). Prove
0 = ∫_α^β (vL1u − uL2v) dx = vu' |_α^β − ∫_α^β (b − a)uv dx,
and show, because u'(α) > 0 and u'(β) < 0, there is a contradiction if v does not vanish somewhere in (α, β).]
(b) Let u1(x) and u2(x) be two linearly independent solutions of u'' + a(x)u = 0. Prove between any two zeroes of u1 there is a zero of u2, and vice versa. Thus, the zeroes interlace.
(c) Apply b) to the solutions sin γx and cos γx of the equation u'' + γ^2 u = 0 to conclude a well-known fact.
(d) If b(x) ≥ δ > 0, where δ is a constant, prove every solution of v'' + b(x)v = 0 must have an infinite number of zeros by comparing v with a solution of u'' + γ^2 u = 0, where γ is an appropriate constant.
(e) Apply d) to prove every solution of
v'' + (1 − 3/(4x^2))v = 0
has an infinite number of zeroes for x ≥ 1.
(f) Let u1(x) be a solution of the first order Bessel equation. Take v(x) = u1(x)√x and show that v satisfies the equation in e). Deduce that J1(x) has infinitely many zeroes.
(32) Let L1 and L2 be linear constant coefficient differential operators with characteristic
polynomials p1 (λ) and p2 (λ) respectively.
(a) If there is a function u(x), u(x) ≡ 0 , which satisﬁes both L1 u = 0 and L2 u = 0 ,
prove the polynomials p1 and p2 have a common root.
(b) If p1 and p2 have no common roots, prove the solutions of L1L2u = 0 are exactly all functions of the form c1u1 + c2u2, where u1 is a solution of L1u1 = 0, and u2 of L2u2 = 0. Thus N(L1L2) may be decomposed into the two complementary subspaces N(L1) and N(L2): N(L1L2) = N(L1) ⊕ N(L2).
(33) Imitate Exercise 30 and prove Theorem 3. Make sure to observe the trouble in trying
to ﬁnd the solution corresponding to the lower root of the indicial polynomial if the
roots diﬀer by an integer.
(34) The purpose of this exercise is to show that an equation with an irregular singular
point may have a formal power series at that point which does not converge to the
solution.
Try to find a solution of the form (18) for the following equation, which has an irregular singularity at x = 0:
x^6 u'' + 3x^5 u' − 4u = 0.
What happened? Two linearly independent solutions for x ≠ 0 are
u1(x) = e^(−1/x^2) and u2(x) = e^(1/x^2).
How does this explain the situation (cf. p. 95-6)?
(35) Consider the equation 2x^2 u'' + 5xu' + u = f(x). Two linearly independent solutions of the homogeneous equation are x^(−1/2) and x^(−1). Find the general solution of the inhomogeneous equation.
(36) Consider the equation u'' + b(x)u' + c(x)u = 0, where b and c are continuous functions and c(x) < 0. Prove that a solution cannot have a positive maximum or a negative minimum.

6.4 First Order Linear Systems

Quite often in applications you must consider systems of differential equations. We shall
consider a linear system of the form
du1/dx + a11(x)u1 + a12(x)u2 + · · · + a1n(x)un = f1(x)
du2/dx + a21(x)u1 + a22(x)u2 + · · · + a2n(x)un = f2(x)
. . .
dun/dx + an1(x)u1 + an2(x)u2 + · · · + ann(x)un = fn(x),   (6-22)
where the functions aij(x) and fj(x) are continuous. If we anticipate the next chapter
and write the derivative of a vector U = (u1, . . . , un) as the vector of the derivatives of its components,
(d/dx)U(x) = ( du1/dx, du2/dx, · · · , dun/dx ),
then the above system can be written in the clean form
dU/dx + A(x)U = F(x),   (6-26)
where
A(x) = ((aij)),  F = (f1, f2, . . . , fn),
and
U(x) = (u1, u2, . . . , un).
The initial value problem for the system of differential equations (22) is to find a vector U(x) which satisfies the equation as well as the initial condition
U(x0) = U0,   (6-27)
where U0 is a vector of constants.
It is useful to observe that the initial value problem for a single linear equation of order n,
u^(n) + a_{n−1}(x)u^(n−1) + · · · + a0(x)u = f(x),
u(x0) = α1,  u'(x0) = α2,  . . . ,  u^(n−1)(x0) = αn,
can be transformed to the conceptually simpler problem (22)-(23). Let u1(x) := u(x), u2(x) := u'(x), . . . , and un(x) := u^(n−1)(x). Then the components of the vector U(x) = (u1, u2, . . . , un) must obviously satisfy the relations
du1/dx = u2
du2/dx = u3
· · ·
du_{n−1}/dx = un
dun/dx = −a0 u1 − a1 u2 − · · · − a_{n−1} un + f(x),
which may be written as
U' = MU + F,
where
M(x) =
(  0     1     0    · · ·    0        )
(  0     0     1    · · ·    0        )
(  ·     ·     ·             ·        )
(  0     0     0    · · ·    1        )
( −a0   −a1   −a2   · · ·   −a_{n−1}  )
and
F = (0, 0, . . . , 0, f).
The initial conditions read
U(x0) = (α1, α2, . . . , αn).
Conversely, if U is any solution of this system of equations with the proper initial conditions, then the first component u1(x) is a solution of the single nth order equation. Thus, the general theory of a single nth order linear O.D.E. is completely subsumed as a portion of the theory of a system of first order linear O.D.E.'s. You should be warned that this generalization is mainly of theoretical value and is of little use if you are seeking an explicit solution.
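As a concrete illustration (ours, not the text's), the second order equation u'' + u = 0 with u(0) = 1, u'(0) = 0 becomes the system U' = MU with M = [[0, 1], [−1, 0]], whose first component should reproduce cos x. A quick numerical sketch with a classical fourth order Runge-Kutta step:

```python
import math

# u'' + u = 0, u(0)=1, u'(0)=0, rewritten as U' = M U with U = (u, u').
def f(U):
    u, v = U
    return (v, -u)          # the companion matrix M applied to U

def rk4(U, h, steps):
    for _ in range(steps):
        k1 = f(U)
        k2 = f((U[0] + h/2*k1[0], U[1] + h/2*k1[1]))
        k3 = f((U[0] + h/2*k2[0], U[1] + h/2*k2[1]))
        k4 = f((U[0] + h*k3[0], U[1] + h*k3[1]))
        U = (U[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             U[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
    return U

u1, _ = rk4((1.0, 0.0), 0.001, 1000)   # integrate out to x = 1
print(abs(u1 - math.cos(1.0)) < 1e-9)  # first component matches cos x
```

This is exactly the sense in which the first order system subsumes the nth order equation.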
Both the existence and uniqueness theorems are true for systems, and supply an example where the theoretical advantages of systems become clear. To illustrate this, we shall prove the uniqueness theorem. Our proof is patterned directly after the uniqueness proof for a single equation (Theorem 1).

Theorem 6.9 (Uniqueness). Let A(x) be a matrix whose coefficients aij(x) are bounded, |aij(x)| ≤ M, for x in some interval, and let F(x) be a continuous function. Then there is at most one solution U(x) of the initial value problem
U' + AU = F,  U(x0) = U0.
Remark: The existence theorem states that if A is nonsingular and each element is integrable, there is at least one solution. Thus, there is then exactly one solution.
Proof: Assume U1 and U2 are both solutions. Let
W = U1 − U2.
Then W satisfies the homogeneous equation and is zero at x0:
W' + AW = 0,  W(x0) = 0.
Take the scalar product of this with W:
⟨W, W'⟩ + ⟨W, AW⟩ = 0.
But
⟨W, W'⟩ = w1 w1' + w2 w2' + · · · + wn wn'
= (1/2) d/dx ( w1^2 + w2^2 + · · · + wn^2 )
= (1/2) d/dx ||W||^2.
Thus,
(1/2) d/dx ||W||^2 = −⟨W, AW⟩.
By Theorem 17, p. 173 and the hypothesis |aij(x)| ≤ M, we know
⟨W, AW⟩ ≤ ( Σ_{i,j=1}^n |aij|^2 )^(1/2) ||W||^2 ≤ nM ||W||^2,
so that
(1/2) d/dx ||W||^2 ≤ nM ||W||^2.
Therefore, as on p. 462-3,
d/dx ( ||W||^2 ) − 2nM ||W||^2 ≤ 0,
or
d/dx [ e^(−2nMx) ||W||^2 ] ≤ 0.
Because e^(2nMx) is always positive, by the mean value theorem the quantity e^(−2nMx) ||W||^2 is a decreasing function. Its value for x > x0 is then less than at x0:
e^(−2nMx) ||W(x)||^2 ≤ e^(−2nMx0) ||W(x0)||^2,  x ≥ x0.
Consequently,
||W(x)|| ≤ e^(nM(x−x0)) ||W(x0)||,  x ≥ x0.
Since W(x0) = 0 and the norm is non-negative, we have
0 ≤ ||W(x)|| ≤ 0,  x ≥ x0,
which implies
W(x) ≡ 0,  x ≥ x0.
By replacing x with −x in the original equation, the same statement is true for x ≤ x0. Thus, throughout the interval where |aij(x)| ≤ M, we have proved W(x) ≡ 0, that is,
Because a single linear n th order O.D.E. can be replaced by an equivalent system of
equations, this theorem implies the uniqueness theorem for a single O.D.E. of order n if the
coeﬃcients aj (x) are bounded in some interval - which is certainly true in every interval if
the aj ’s are continuous.
With this theorem, a short section closes. Further developments in the theory of systems of linear O.D.E.'s make elegant use of linear operators in general and matrices in particular. As you might well expect, the exercises contain a few of the more accessible results.

Exercises

(1) Find functions u1(x), u2(x) which satisfy
u1' = u1
u2' = u1 − u2,
with the initial conditions U(0) := (u1(0), u2(0)) = (1, 0). Find the general solution too. [Hint: Solve the equation u1' = u1 first, then substitute. Answer: The general solution is U(x) = (γ1 e^x, (γ1/2)e^x + γ2 e^(−x)).]
(2) Consider the system
u1 = 2u1 − u2
u2 = 3u1 − 2u2 ,
that is,
U = AU, where A= 2 −1
3 −2 . Let φ1 (x) = au1 + bu2 , φ2 (x) = cu1 + du2 , where a, b, c and d are constants. Thus,
Φ = SU,
where
S= ab
cc , Φ = (φ1 , φ2 ). 282 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS
(a) By direct substitution, ﬁnd the diﬀerential equations satisﬁed by the φj ’s and
show they can be written as
Φ = SAS −1 Φ.
(b) Pick the coeﬃcients of S so the matrix SAS −1 is a diagonal matrix,
SAS −1 = λ1 0
0 λ2 ≡Λ (c) Solve the resulting equation Φ = ΛΦ . [Solution: φ1 = αex ,
might have φ1 and φ2 interchanged]. φ2 = βe−x —you (d) Use this solution to solve the original equations for U . [hint: Recall U =
S −1 Φ ].
(3) By only a slight modiﬁcation of Exercise 2, solve
v1 = 2v1 − v2
v2 = 3v1 − 2v2 .
[Hint: Everything, even the algebra, is identical. The only diﬀerence is in part c) you
have to solve Φ = ΛΦ . Then V = S −1 Φ as before].
(4) A bathtub initially contains Q1 gallons of gin and Q2 gallons of vermouth, where
Q1 + Q2 = Q, Q being the capacity of the tub. Pure gin enters from one faucet at
a constant rate of R1 gallons per minute, while pure vermouth enters from another
faucet at a constant rate R2 gallons per minute. The well stirred mixture of martinis
leaves the drain at a rate R1 + R2 gallons per minute (so the total amount of ﬂuid in
the tub remains constant at Q gallons). Let G(t) be the quantity of gin in the tub
at time t and V (t) be the quantity of vermouth.
(a) Show
G
dG
= R1 − (R1 + R2 )
dt
Q
dV
V
= R2 − (R1 + R2 ).
dt
Q
(b) Integrate this simple system of equations to ﬁnd G(t) and V (t) . Also ﬁnd their
ratio P (t) := G(t)/V (t) which is the strength of the martinis at time t .
(c) Prove
lim P (t) = t→∞ R1
.
R2 Compare this with your intuitive expectations.
(d) If Q1 = 20, Q2 = 0, R1 = R2 = 1 gal/min, how long must I wait to get a perfect
martini (for me, perfect is 5 parts gin to 1 part vermouth). [Needless to say, the
mathematical model is applicable to many problems in the mixing of chemicals
which do not react with each other. If the chemicals do interact, the model must
be changed to account for the interaction]. 6.5. TRANSLATION INVARIANT LINEAR OPERATORS 283 (5) Consider the homogeneous equation U = A(x)U , where A is non-singular (so
det A = 0 ). Assuming the validity of the existence theorem, prove there exists
n linearly independent vectors U1 (x), U2 (x), . . . , Un (x) which are solutions, Uk =
AUk , k = 1, . . . , n . [Hint: Construct n solutions which are linearly independent at
x = x0 , and then prove a set of n solutions are linearly independent in an interval
if and only if they are linearly independent at x = x0 , where x0 is a point in the
interval].
(6) Let LU := U − A(x)U as in Exercise 5. Prove dim N(L) = n .
(7) Let LU := U − A(x)U . If a basis U1 , . . . , Un , for N(L) is known, prove the inhomogeneous equation LU = F can be solved by variation of parameters. That is, seek
a particular solution Up of LU = F in the form
n Up = Ui vi
i=1 where the vi (x) are scalar-valued functions (not vectors).
(a) Compute Up and substitute into the O.D.E. to conclude Up is a particular
solution if
n Ui vi = F.
i=1 (b) Let U be the n × n matrix whose columns are U1 , U2 , . . . , Un . Prove U is
invertible and show
vi (x) = (U −1 F )ith component .
(c) Show
n Up (x) = x Ui (x) [U −1 (s)F (s)]i ds. i=1 This may also be written in the form
x Up (x) = U (x) U −1 (s)F (s) ds (d) Apply this procedure to ﬁnd the general solution of
uq = u1 + e2x cf. Ex 1 u2 = u1 − u2 + 1. 6.5 Translation Invariant Linear Operators This section develops various extensions and applications of the procedure used to solve
linear ordinary diﬀerential equations with constant coeﬃcients. The results will be proved
as a series of exercises interspersed by various remarks.
Definition: The translation operator Tt acting on functions u(x) is deﬁned by the property
(Tt u)(x) = u(x − t).
x, t ∈ R. 284 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS A linear operator L is translation invariant if
LTt = Tt L
for every t , that is, if
L(Tt u) = Tt (Lu)
for every t and for every function u for which the operators are deﬁned.
Example: 1 Let (Lu)(x) := 3u(x) − 2u(x − 1) . Then
[Tt (Lu)](x) = 3u(x − t) − 2u(x − t − 1),
and
[L(Tt u)](x) = Lu(x − t) = 3u(x − t) − 2u(x − t − 1).
Thus,
LTt = Tt L,
so the operator L is translation invariant.
2. Let (Lu)(x) := 3xu(x). Then
[Tt (Lu)](x) = 3(x − t)u(x − t),
and
[L(Tt u)](x) = Lu(x − t) = 3xu(x − t).
Thus
LTt = Tt L,
so this operator is not translation invariant. Exercises
(1) Which of the following linear operators (verify!) are also translation invariant?
(a) (Lu)(x) := cu(x),
(b) (Lu)(x) := c ≡ constant u ( x+ h ) − u ( x)
,
h
x h ≡ constant = 0 . k (x − s)u(s) ds (c) (Lu)(x) :=
−∞ (d) (Lu)(x) := (x − 1)u(x)
(e) (Lu)(x) = du
dx (x). (f) Any linear ordinary diﬀerential operator with constant coeﬃcients,
Lu := an u(n) + an−1 u(n−1) + · · · + a0 u, ak constants. (g) Any linear ordinary diﬀerential operator with variable coeﬃcients.
n ak u(x − γk ), (h) (Lu)(x) = ak and γk constants. k=1 [Answers: All but d) and g) are translation invariant]. 6.5. TRANSLATION INVARIANT LINEAR OPERATORS 285 (2) If L1 and L2 are translation invariant operators which map some linear space into
itself, then so are
a). AL1 + BL2 , A, B constants b). L1 L2 and L2 L1
c). If in addition L is invertible, then L−1 is also translation invariant.
Theorem 6.10 . If L is a translation invariant linear operator, then
L(eλa ) = φ(λ)eλx .
Proof: We know so little about L that all we can hope to do is compute Tt L(eλx )
and LTt (eλx ) and see what happens. Let Leλx = ψ (λ; x) , where ψ is some unknown
function whose value depends on both λ and x . Then
Tt L(eλx ) = ψ (λ; x − t),
while
LTt eλx − Leλ(x−t) = L(e−λt eλx )
= e−λt Leλx = e−λt ψ (λ; x).
Since Tt L = LTt , we ﬁnd
e−λt ψ (λ; x) = ψ (λ; x − t),
or
ψ (λ; x) = ψ (λ; x − t)eλt .
Because the left side does not contain t , the right side must not depend on which
value of t is chosen. Using this freedom, we let t = x and conclude
ψ (λ; x) = ψ (λ; 0)eλx .
By setting φ(λ) = ψ (λ, 0) , we ﬁnd
Leλx = ψ (λ; x) = φ(λ)eλx
as desired. Exercises
(3) By direct substitution, ﬁnd φ(λ) for those operators in Exercise 1 which are translation invariant. [Answers: a) φ(λ) = c , b) φ(λ) = (e−ah − 1)/h c) φ(λ) =
n 0
λs k (−s)e ak λk (the characteristic polynomial), ds , d) φ(λ) = cλ , f) φ(λ) = −∞ k=0 n ak e−λγ k ]. h) φ(λ) =
k=1 286 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS (4) With the same assumptions and notation as in the theorem, if φ(λ) = 0 is a polynomial equation with N distinct roots λ1 , λ2 , . . . , λN , so φ(λj ) = 0, j = 1, . . . , N ,
prove any linear combination of the function eλjx is in N(L) , that is,
N Lu = 0 where cj eλjx . u(x) =
1 (5) Apply the theorem to ﬁnd the solution of Exercise 4 for the equation Lu = 0 , where
(a) Lu := u − u − u .
(b) (Lu)(x) = u(x + 2) − u(x + 1) − u(x) .
(c) Find a special solution of b) which satisﬁes the “initial conditions” u(0) = u(1) =
1 . Compute u(2), u(3) and u(4) directly from b). The integers u(n), n ∈ Z+
are called the Fibonacci sequence. [Answer: u(2) = 2, u(3) = 3, u(4) = 5 , and
surprisingly, √ n+1
√ n+1
1+ 5
1− 5
1
].
−
u(n) = √ 2
2
5
(6) Solve u(x) − au(x − 1) + b2 u(x − 2) = 0 with the initial conditions u(1) = a, u(2) =
a2 − b2 . Compare with Exercise 17, p. 440.
(7) Extend Exercises 5(b - c) and 6 to develop a theory of second order diﬀerence equations
with constant coeﬃcients. Thus
Lu := a2 u(x + 2) + a1 u(x + 1) + a0 u(x), a2 = 0, x ∈ Z. In particular, you should,
(a) Find two linearly independent solutions of Lu = 0 . Remember the degenerate
case a2 − 4a0 a2 = 0 .
1
(b) Prove there is at most one solution of the initial value problem Lu = f , u(0) =
α0 , u(1) = α1 .
(c) Prove dim N(L) = 2 .
Remarks: The ideas presented above generalize immediately to the case where X ∈ Rn
instead of just R1 , as well as to the case where the u ’s are vectors and not scalars. These few
concepts lie at the heart of any treatment of many linear operators with constant coeﬃcients,
especially ordinary and partial diﬀerential operators. This mildly abstract formulation
manages to penetrate through the obscuring details of particular cases to observe a rather
simple structure unifying many seemingly diﬀerent problems. 6.6 A Linear Triatomic Molecule A molecule composed of three atoms is called a triatomic. Consider a triatomic molecule
whose equilibrium conﬁguration is a straight line with two atoms of equal mass m situated
on either side of a central atom of mass M . 6.6. A LINEAR TRIATOMIC MOLECULE 287 a figure goes here
To simplify the situation further, we shall only consider the motion along the straight
line (axis) of these atoms, and shall assume the inter-atomic forces can be approximated by
springs with equal spring constants k . u1 (t), u2 (t) and u3 (t) will denote the displacements
of the atoms (see ﬁg.) from their equilibrium position.
Newton’s second law, mu =
¨
F , will give the equations of motion. The atom on
the left only “feels” the force due to the spring attached to it, the force being equal to the
spring constant k times the amount that spring is stretched, u2 − u1 . Thus,
mu1 = k (u2 − u1 ).
¨
The central atom “feels” two forces, one from each side, with the resulting equation of
motion
M u2 = −k (u2 − u1 ) + k (u3 − u2 ).
¨
In the same way, the equation of motion for the remaining atom is
mu3 = −k (u3 − u2 ).
¨
Collecting our equations, we have
k
k
u1 + u2
m
m
k
2k
k
u2 =
¨
u1 −
u2 +
u3
M
M
M
k
k
u3 = u2 − u3 .
¨
m
m u1 = −
¨ These are a system of three linear ordinary diﬀerential equations with constant coeﬃcients.
They cannot be integrated as they stand since each equation involves functions from the
other equations, that is, the equations are copied (not surprising since we are considering
coupled oscillators. Now we can integrate such a system immediately if they are in the
simple form
¨
φ1 = λ1 φ1
¨
φ2 = λ2 φ2
¨
φ3 = λ3 φ3
by integrating each equation separately. By using an important method, we will be able to
place our system in this special form.
Before doing so, it is suggestive to rewrite the system in matrix form
k
k
−m
u1
¨
m u 2 = k − 2k
¨
M
M
k
u3
¨
0
m 0
k
M
k
−m u1 u2 .
u3 Letting A denote the 3 ×3 matrix, our hope is to somehow change A into a diagonal matrix
(one with zeroes everywhere except along the principal diagonal), for then the diﬀerential
equations will be in a form mentioned above which can be immediately integrated. 288 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS The trick is to replace the basis u1 , u2 , u3 by some other basis in which the matrix
assumes a diagonal form. The diﬀerential equation can be written in the form
¨
U = AU,
where U = (u1 , u2 , u3 ) , and the derivative of a vector being deﬁned as the derivative of
each of its components. Let φ1 (t), φ2 (t) , and φ3 (t) be three other functions - which we
plan to use as a new basis. Then the φj ’s can be written as a linear combination of the
uj ’s,
φ = s11 u1 + s12 u2 + s13 u3
φ2 = s21 u1 + s22 u2 + s23 u3
φ3 = s31 u1 + s32 u2 + s33 u3 ,
where sij are constants. Writing S = ((sij )) and Φ = (φ1 , φ2 , φ3 ) , this last equation reads
Φ = SU.
Taking the derivative of both sides (or going back to the equations deﬁning φj in terms of
the uk ’s), we ﬁnd
¨
¨
Φ = SU .
Because both u1 , u2 and u3 as well as φ1 , φ2 , and φ3 are bases for the solution, the matrix
S must be non-singular (its inverse expresses the φj s in terms of the uj ’s). Thus
¨
Φ = SAS −1 Φ.
The problem has been reduced to ﬁnding a matrix S such that the matrix SAS −1 is a
diagonal matrix, λ1 0 0
SAS −1 = 0 λ2 0 ≡ Λ.
0 0 λ3
Multiply by S −1 on the left:
AS −1 = S −1 Λ.
Since this equation is equally between matrices, their corresponding columns must be equal.
ˆ
Thus, if we denote by Si , the i th column of S −1 , the above equation then reads
ˆ
ˆ
ASi = λi Si ,
or
ˆ
(A − λi I )Si = 0.
For each i this is a system of three linear algebraic equations for the three components of
ˆ
Si . If there is to be a non-trivial solution, we know
det(A − λi I ) = 0.
Since k
− m − λi det(A − λi I ) = k
M 0 k
m
2k
−M −
k
m 0
λi k
M k
−m − λi , 6.6. A LINEAR TRIATOMIC MOLECULE 289 (algebra later)
k
2
1
+ λi )[λi + (
+ )k ]
m
M
m
We see the three possible values of λ for det(A − λi I ) = 0 are
= −λi ( λ1 = 0, λ2 = − k
,
m λ3 = −k ( 2
1
+ ).
M
m ˆ
These numbers λi are the eigenvalues of A . The non-trivial solution Si of the homoˆ
geneous equations (A − λi I )Si = 0 corresponding to the i th eigenvalue is called the
ˆ
eigenvalue of A corresponding to the eigenvalue λi . For example, S2 is the solution of
(A − λ2 I )S2 = 0 corresponding to λ2 = −k/m ,
0ˆ12 +
s We see s22 = 0 while s12
ˆ
ˆ k
s22 + 0ˆ32 = 0
ˆ
s
m k
2k
k
k
s12 − (
ˆ
− )ˆ22 +
s
s32 = 0
ˆ
M
M
m
M
k
0ˆ12 + s22 + 0ˆ32 = 0.
s
ˆ
s
m
= −s32 . Thus, one solution is
ˆ
ˆ
S2 = (1, 0, −1) ˆ
Similarly we ﬁnd one solution for S1 is
ˆ
S1 = (1, 1, 1),
ˆ
while one solution for S3 is 2m
ˆ
S3 = (1, −
, 1).
M
The computation is over. All that remains is to put the parts together and interpret
the solution. If you got lost, presumably this recapitulation will help. We have found a
transformation S to new coordinates (φ1 , φ2 , φ3 ) such that the diﬀerential equations for
¨
the φj ’s are in diagonal form, φm = λj φj ,
¨
φ1 = 0
k
¨
φ2 = − φ2
m
2
1
¨
φ3 = −k (
+ )φ3 .
M
m
The solutions are
φ1 (t) = A1 + B1 t,
φ2 (t) = A2 cos
φ3 = A3 cos k( k
t + B2 sin
m 2
1
+ )t + B3 sin
M
m k
t
m
k( 2
1
+ )t.
M
m 290 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS ˆ
Since Φ = SU , and the Sj are the columns of S −1 , 1
1
1
0 − 2m ,
S −1 = 1
M
1 −1
1
we have U = S −1 Φ ,
u1 (t) = φ1 (t) + φ2 (t) + φ3 (t)
2m
u2 (t) = φ1 (t)
−
φ3 (t)
M
u3 (t) = φ1 (t) − φ2 (t) + φ3 (t)
Although the solutions φ1 (t), φ2 (t) , and φ3 (t) can now be substituted into the ﬁrst
set of equations for the uj ’s, it is more instructive to leave that step to your imagination
and analyze the nature of the solution.
(1) If φ1 (t) = 0 but φ2 (t) = φ3 (t) = 0, then
u1 (t) = u2 (t) = u3 (t) = A1 + B1 t. Thus all three atoms - the whole molecule - moves with a constant velocity B1 .
This is the trivial translation motion of the molecule, simply moving without internal
oscillations at all.
(2) If φ2 (t) = 0 but φ1 (t) = φ3 (t) = 0 , then
u1 (t) = φ2 (t) = −u3 (t), and u2 (t) = 0. Thus, the two outside atoms vibrate in opposite directions with frequency
while the center atom remains still: k /m a figure goes here
(3) If φ3 (t) = 0 but φ1 (t) = φ2 (t) = 0
2m
φ3 (t).
M
A bit more complicated. The two outside atoms move in the same direction with same
u1 (t) = u3 (t) = φ3 (t), u2 (t) = − 2
1
frequency
k ( M + m ) , while the center atom moves in a direction opposite to them and
with the same frequency but a diﬀerent amplitude (to conserve linear momentum mu1 +
˙
M u2 + mu3 = 0 ). In the ﬁgure we take m = M .
˙
˙ a figure goes here
These three simple motions are called the normal modes of oscillation of the molecule.
They are the oscillations determined by the φ1 , φ2 , and φ3 . Every motion of the system is
a linear combination of the normal modes of oscillation, the particular oscillation depending
on what initial conditions are given. By an appropriate choice of the initial conditions, one
or another of the normal modes will result. Otherwise some less recognizable motion will
result. Exercises
Consider the simpler model of a diatomic molecule 6.6. A LINEAR TRIATOMIC MOLECULE 291 a figure goes here
which we will represent as two masses joined by a spring with spring constant k .
(a) Show the equations of motion are
mu1 = k (u2 − u1 )
¨
M u2 = −k (u2 − u1 )
¨
(b) Introduce new variables, Φ = SU ,
φ1 = s11 + s12 u2
φ2 = s21 u1 + s22 u2 ,
and ﬁnd S so that the equation
¨
Φ = SAS −1 Φ
is in diagonal form.
(c) Solve the resulting equation and ﬁnd the normal modes of oscillation. Interpret your
results with a diagram. 292 CHAPTER 6. LINEAR ORDINARY DIFFERENTIAL EQUATIONS Chapter 7 Nonlinear Operators: Introduction
7.1 Mappings from R 1 to R 1 , a Review .
The subject of this section is one you presumably know well. Our intention is to brieﬂy
review the more important results, stating them in a form which suggests the generalizations
we intend to develop.
Consider a function y = f (x), x ∈ R . This function assigns to each number x another
real number y . Thus we may write
f : R → R.
f is a scalar-valued function of a scalar. What are the simplest such functions? Linear
ones of course,
f (x) = ax + b.
In keeping with our more sophisticated terminology, this should be called an “aﬃne” function (mapping, operator, . . . ) since it is linear only if b = 0 . We shall, however, be abusive
and refer to such functions as linear mappings. The study of linear functions in one variable,
x , is carried out in elementary analytic geometry.
At an early age we enlarged our vocabulary of functions from linear ones to a more
general class which includes, for example,
√
f1 (x) = ax2 + bx + c, f2 (x) = sin x, f3 (x) = x.
These functions are all examples of nonlinear functions. They map the reals (only the
positive reals in the case of f3 ) into the reals. The portion of the reals for which they are
deﬁned is called their domain of deﬁnition, D(f ) . Thus
D(f1 ) = R1 , D(f2 ) = R1 , D(f3 ) = { x ∈ R1 : x > 0 }.
The class of all real valued functions of a real variable is too large to consider. For
most purposes it is suﬃcient to restrict oneself to the class of continuous or suﬃciently
diﬀerentiable functions.
Here is an outline of the basic deﬁnitions and theorems from elementary calculus. In
our prospective generalization from the simplest case of a function (operator) f which
maps numbers to numbers, f : R1 → R1 , to the case of a function from vectors to vectors
f : Rn → Rm , all of these concepts and results will need to be extended.
CHAPTER 7. NONLINEAR OPERATORS: INTRODUCTION

Definition: ak converges to a , where the ak and a ∈ R1 .
Definition: Continuity.
Theorem 7.1 The set of continuous functions forms a linear space.
Definition: The derivative: the limit of the difference quotient.
Theorem 7.2
1. d/dx (af + bg ) = a df/dx + b dg/dx  (linearity)
2. d/dx (f g ) = f dg/dx + (df/dx) g  (product rule)
3. d/dx (f ◦ g ) = (df/dg)(dg/dx)  (chain rule)
Theorem 7.3 The Mean Value Theorem.
Definition: The integral.
Theorem 7.4
1. ∫_b^a f (x) dx = − ∫_a^b f (x) dx
2. ∫_a^c f (x) dx + ∫_c^b f (x) dx = ∫_a^b f (x) dx
3. ∫_a^b [αf (x) + βg (x)] dx = α ∫_a^b f (x) dx + β ∫_a^b g (x) dx  (linearity)
4. ∫_a^b (f ◦ φ)(x) (dφ/dx) dx = ∫_{φ(a)}^{φ(b)} f (x) dx  (change of variable in an integral)
Theorem 7.5
1. ∫_a^b (df/dx)(x) dx = f (b) − f (a)
2. (d/dx) ∫_a^x f (t) dt = f (x)
3. ∫_a^b f (dg/dx) dx = f g |_a^b − ∫_a^b (df/dx) g (x) dx  (integration by parts).
Remark: These theorems contain essentially all of elementary calculus. What is missing
are the specific formulas for the derivatives and integrals of the basic functions, as well as the
application of these theorems to compute maxima, areas, etc.

Exercises
(1) Use the definition of the derivative (as the limit of a difference quotient) to compute
the derivatives of the following functions at the given point.
a). 3x^2 − x + 1, x0 = 2
b). 1/(x + 1), x0 = 2
c). x/(1 + x), x0 = 2
d). x/(1 − x), x0 = 1.

(2) Use the definition of the integral to evaluate
∫_0^2 x^2 dx.
You should approximate the area by rectangular strips and evaluate the limit as the
width of the thickest strip tends to zero. [Hint: 1^2 + 2^2 + 3^2 + · · · + n^2 = n(n + 1)(2n + 1)/6 ].
(3) Prove that
.6 < log 2 < .8  (log 2 = 0.693)
by using the definition of the integral to find upper and lower bounds for
log 2 = ∫_1^2 (1/x) dx.
x (4) Find the equation of the straight line which is tangent to the curve f (x) = x7/3 + 1
at x = 1 . Draw a sketch indicating both the curve and tangent line. Use the tangent
line to approximately evaluate (1.01)7/3 . Find some estimate for the error in your
approximation.

7.2 Generalities on Mappings from Rn to Rm

A function, or operator, F , which maps Rn to Rm is a rule which assigns to each vector
X in Rn another vector Y = F (X ) in Rm . It is a function from vectors to vectors, a
vector-valued function of a vector. We have already discussed the case when F is an aﬃne
operator,
Y = F (X ) = b + LX
or in coordinates,
y1 = b1 + a11 x1 + · · · + a1n xn
y2 = b2 + a21 x1 + · · · + a2n xn
. . .
ym = bm + am1 x1 + · · · + amn xn .
Linear algebra can be thought of as the study of higher dimensional analytic geometry, the
aﬃne transformations taking the role of the straight line y = b + cx .
But now it is time to consider more complicated mappings from Rn to Rm . Here is an example:
y1 = x1 + x2 sin πx3
y2 = e^{1−x1} − √x2 .
This transformation maps vectors X = (x1 , x2 , x3 ) ∈ R3 to vectors Y = (y1 , y2 ) ∈ R2 . Note
the second function is only deﬁned for x2 ≥ 0 . Thus the domain of the transformation F
is
D(F ) = { X ∈ R3 : x2 ≥ 0 }.

Example: F maps the point (1, 4, 1/6) into the point (3, −1) , since y1 = 1 + 4 sin(π/6) = 3 and y2 = e^0 − √4 = −1 .

It is usual to write a transformation F which maps a set A ⊂ Rn to a set B ⊂ Rm in
terms of its components,
y1 = f1 (x1 , . . . , xn ) = f1 (X )
y2 = f2 (x1 , . . . , xn ) = f2 (X )
·
·
·
ym = fm (x1 , . . . , xn ) = fm (X ),
or more concisely as
Y = F (X ).
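Concretely, such a transformation is just a list of component functions. The following sketch (a modern Python aside; the function name F is ours, not the text's) evaluates the example transformation y1 = x1 + x2 sin πx3 , y2 = e^{1−x1} − √x2 and reproduces its value at the point (1, 4, 1/6):

```python
import math

def F(X):
    # The example transformation F : R^3 -> R^2 from the text:
    #   y1 = x1 + x2 sin(pi x3),  y2 = e^(1 - x1) - sqrt(x2)
    x1, x2, x3 = X
    if x2 < 0:
        raise ValueError("F is only defined for x2 >= 0")
    return (x1 + x2 * math.sin(math.pi * x3),
            math.exp(1 - x1) - math.sqrt(x2))

Y = F((1, 4, 1/6))
# Y is (3.0, -1.0) up to rounding, since sin(pi/6) = 1/2 and e^0 - sqrt(4) = -1
```

The guard on x2 reflects the domain D(F ) described above.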
To discuss continuity etc. for nonlinear mappings from Rn to Rm , it is necessary that
the distance between points be deﬁned. We shall use the Euclidean norm - although any
other norm could also be used. If X = (x1 , . . . , xk ) is a point (or vector, if you like) in Rk ,
then ‖X‖ = √(x1^2 + · · · + xk^2) . To review briefly, a sequence of points Xj in Rk converges
to a point X in Rk if, given any ε > 0 , there is an integer N such that ‖Xj − X‖ < ε for all j ≥ N.
An open ball in Rk of radius r about the point X0 is the set B (X0 ; r) = { X ∈ Rk : ‖X − X0‖ < r } .
A closed ball in Rk is
B̄ (X0 ; r) = { X ∈ Rk : ‖X − X0‖ ≤ r }.
The only diﬀerence is the open ball does not contain the boundary of the ball. In two
dimensions, R2 , the names open and closed disc are often used.
A set D ⊂ Rk is open if each point X ∈ D is the center of some ball contained entirely
within D . The radius may be very tiny. Every open ball is open, as can be seen in the
ﬁgure. A closed ball is not open since there is no way of placing a small ball about a point
on the boundary in such a way that the small ball is inside the larger one. A set A is
closed if it contains all of its limit points, that is, if the points Xj ∈ A converge to a point
X, Xj → X , then X is also in A . An open ball is not closed, for a sequence of points
in the ball may converge to a point on the boundary, and the boundary points are not in
the ball. For the special case of R1 , these notions coincide with those of open and closed
intervals. Again, sets - like doors - may be neither open nor closed.
A point set D is bounded if it is contained in some ball (of possibly large radius). The
point X is exterior to D if X does not belong to D and if there is some ball about X
none of whose points are in D . X is interior to D if X belongs to D and there is some
ball about X all of whose points are in D . X is a boundary point of D if it is neither
interior nor exterior to D . Note that a boundary point of D may or may not belong to
D . For example, the boundaries of the open and closed balls B (0; r) and B̄ (0; r) are the same.
The boundary of a set D is denoted by ∂D . It is evident that a set is open if and only
if every point is an interior point, and a set is closed if and only if it contains all of its
boundary points.
Definition: Let A be a set in Rn and C a set in Rm . The function F : A → C is
continuous at the interior point X0 ∈ A if, given any radius ε > 0 , there is a radius δ > 0
such that
‖F (X ) − F (X0 )‖ < ε  for all  ‖X − X0‖ < δ.
[Observe the norm on the left is in Rm while that on the right is in Rn ].
It is easy to prove

Theorem 7.6 An affine mapping F (X ) = b + LX from Rn to Rm is continuous at
every point X0 ∈ Rn .
Proof: First, F (X ) − F (X0 ) = b + LX − b − LX0 = L(X − X0 ) . Thus,
‖F (X ) − F (X0 )‖ = ‖L(X − X0 )‖ .
Let ((aij )) be a matrix representing L with respect to some bases for Rn and Rm . Then
by Theorem 17, p. 373,
‖L(X − X0 )‖^2 = ⟨L(X − X0 ), L(X − X0 )⟩ ≤ k ‖L(X − X0 )‖ ‖X − X0‖ ,
where
k^2 = Σ_{i=1}^m Σ_{j=1}^n aij^2 .
Therefore
‖L(X − X0 )‖ ≤ k ‖X − X0‖ .
It is now clear that if X → X0 , then ‖L(X − X0 )‖ → 0 . More formally, given any
ε > 0 , if we take δ = ε/(k + 1) , we have
‖F (X ) − F (X0 )‖ < ε  for all  ‖X − X0‖ < δ.
The following theorems have the same proofs as were given earlier for special cases. (See a
ﬁrst year calculus book and our Chapter 0).
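The constant k in the proof of Theorem 7.6 is computable from the matrix entries, so the bound ‖L(X − X0 )‖ ≤ k ‖X − X0‖ can be spot-checked numerically. A minimal Python sketch (the helper names are ours):

```python
import math, random

def frobenius(A):
    # the k of the proof: square root of the sum of the squares of all entries
    return math.sqrt(sum(a * a for row in A for a in row))

def apply_matrix(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # an L : R^3 -> R^2
k = frobenius(A)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    assert norm(apply_matrix(A, x)) <= k * norm(x) + 1e-9  # ||Lx|| <= k ||x||
```

The inequality here is exactly the Cauchy-Schwarz estimate used in the proof, applied row by row.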
Theorem 7.7 . Let F1 and F2 map A ⊂ Rn into C ⊂ Rm . If F1 and F2 are continuous
at the interior point X0 ∈ A , then
1. aF1 + bF2 is continuous at X0 .
2. ⟨F1 , F2 ⟩ is continuous at X0 .
Theorem 7.8 . Let F = (f1 , . . . , fm ) map A ⊂ Rn into C ⊂ Rm . Then F is continuous
at the interior point X0 ∈ A if and only if each of the fj , j = 1, . . . , m , is continuous at
X0 .
Theorem 7.9 Let F : A → C , where A is a closed and bounded (= compact) set. If F
is continuous at every point of A , then it is bounded; that is, there is a constant M such
that ‖F (X )‖ ≤ M for all X ∈ A . Moreover, if M0 is the least upper bound of ‖F‖ , then there is
a point X0 ∈ A such that ‖F (X0 )‖ = M0 . Similarly, if m0 is the greatest lower bound
of ‖F‖ , then there is a point X1 ∈ A such that ‖F (X1 )‖ = m0 .
There is no better way to close this otherwise inauspicious section than with one of
the crown jewels of mathematics - the Fundamental Theorem of Algebra, all of whose proofs
require the non-algebraic notion of continuity. Let
p(z ) = a0 + a1 z + · · · + an z n , (n ≥ 1), where the aj ’s are complex numbers and an is not zero.
For every complex number z , the value of the function p(z ) is a complex number.
Thus p : C → C . We want to prove there is at least one z0 ∈ C such that p(z0 ) = 0 .

Lemma 7.10 p(z ) is a continuous function for every z ∈ C .
Proof: Identical to the proof that a real polynomial is continuous everywhere.
Lemma 7.11 Let D be a set in the complex plane in which p(z ) ≠ 0 . The minimum
modulus of p(z ) , that is, the minimum value of |p(z )| , cannot occur at an interior point
of D . It must occur on the boundary ∂D of D .
Proof: Let z0 be any interior point of D . Rewrite p(z ) in the form
p(z ) = b0 + b1 (z − z0 ) + · · · + bn (z − z0 )^n .
Since p(z0 ) ≠ 0 , we know b0 ≠ 0 . Also, because p is not identically constant, at least one
coefficient following b0 is not zero. Take bk to be the first such coefficient. We next write
b0 , bk , and z − z0 in polar form,
b0 = ρ0 e^{iα} ,  bk = ρ1 e^{iβ} ,  z − z0 = ρ e^{iθ} ,
where ρ0 = |p(z0 )| , and ρ1 and ρ are positive real numbers. Here we are restricting z to a
point on a circle of radius ρ about z0 , after taking ρ small enough to insure this circle is
interior to D . Then
p(z ) = ρ0 e^{iα} + ρ1 e^{iβ} ρ^k e^{ikθ} + b_{k+1} (z − z0 )^{k+1} + · · · + bn (z − z0 )^n
      = ρ0 e^{iα} + ρ1 ρ^k e^{i(β+kθ)} + (z − z0 )^{k+1} [ b_{k+1} + · · · + bn (z − z0 )^{n−k−1} ].
Pick the particular point ẑ on the circle whose argument θ is given by β + kθ = α + π .
Then e^{i(β+kθ)} = e^{i(α+π)} = −e^{iα} , so
p(ẑ ) = (ρ0 − ρ1 ρ^k ) e^{iα} + (ẑ − z0 )^{k+1} [ b_{k+1} + · · · + bn (ẑ − z0 )^{n−k−1} ].
By the triangle inequality we find
|p(ẑ )| ≤ |ρ0 − ρ1 ρ^k| + ρ^{k+1} [ |b_{k+1}| + · · · + |bn| ρ^{n−k−1} ].
Choose the radius ρ so small that ρ0 − ρ1 ρ^k ≥ 0 . Then
|p(ẑ )| ≤ ρ0 − ρ1 ρ^k + ρ^{k+1} [ |b_{k+1}| + · · · + |bn| ρ^{n−k−1} ].
By choosing ρ smaller yet, if necessary, we can make the term ρ [ |b_{k+1}| + · · · + |bn| ρ^{n−k−1} ] < (1/2) ρ1 . Consequently,
|p(ẑ )| ≤ ρ0 − ρ1 ρ^k + (1/2) ρ1 ρ^k = ρ0 − (1/2) ρ1 ρ^k < ρ0 = |p(z0 )| .
Thus, if z0 is any interior point of a domain D in which p does not vanish, then there is
a point ẑ also interior to D such that |p(ẑ )| < |p(z0 )| . Therefore, the minimum of |p(z )|
must occur on the boundary of any set in which p does not vanish.
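Lemma 7.11 can be watched in action numerically: at any point where p does not vanish, small circles about the point contain points of strictly smaller modulus. A short Python sketch with the sample polynomial p(z) = z^2 + 1 (our choice, not the text's):

```python
import cmath

def p(z):
    return z * z + 1          # sample polynomial; p(1) = 2, so z0 = 1 is not a root

z0 = 1.0
rho = 0.1                     # radius of a small circle about z0
circle = (z0 + rho * cmath.exp(2j * cmath.pi * k / 360) for k in range(360))
best = min(abs(p(z)) for z in circle)
assert best < abs(p(z0))      # some nearby point has strictly smaller modulus
```

Repeating this descent is, in spirit, how the minimum of |p| is pushed down to zero in the proof of the Fundamental Theorem below.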
Lemma 7.12 . Given any real number M , there is a circle |z | = R on which |p(z )| > M
for all z with |z | = R .

Proof: For z ≠ 0 , we can write the polynomial p(z ) as
p(z )/z^n = an + a_{n−1}/z + · · · + a0 /z^n .
From the triangle inequality written in the form |f1 + f2 | ≥ |f1 | − |f2 | , we find
|p(z )/z^n| ≥ |an | − | a_{n−1}/z + · · · + a0 /z^n | .
If |z | is taken large enough, |z | ≥ R0 , it is possible to make the second term on the right
less than |an | /2 ,
| a_{n−1}/z + · · · + a0 /z^n | < |an |/2  for  |z | = R ≥ R0 .
Therefore, for |z | = R ≥ R0 ,
|p(z )/z^n| ≥ |an | − |an |/2 = |an |/2 ,
so
|p(z )| ≥ (1/2) |an | R^n  for  |z | = R.
It is now clear that by choosing R sufficiently large, |p(z )| can be made to exceed any
constant M on the circle |z | = R .

Theorem 7.13 (Fundamental Theorem of Algebra). Let
p(z ) = a0 + a1 z + · · · + an z^n ,  an ≠ 0 , n ≥ 1,
be any polynomial with possibly complex coefficients, a0 , a1 , . . . , an . Then there is at least
one number z0 ∈ C such that p(z0 ) = 0 . In other words, every polynomial has at least one
complex root.
Proof: By Lemma 7.12, we can find a large circle |z | = R on which |p(z )| > 2 |a0 | for all
|z | = R . Since p(z ) is a continuous function, by Theorem 7.9 there is a point z0 in the closed
and bounded disc |z | ≤ R for which |p| attains its minimum value m0 , |p(z0 )| = m0 . If
p(z0 ) = 0 , we are done. However, if p does not vanish inside the closed disc, then by
the important Lemma 7.11 its minimum value is attained only on the boundary, so z0 is on the
circle |z0 | = R . But on the circle we know |p(z0 )| > 2 |a0 | = 2 |p(0)| , so the minimum is
not at z0 after all. The assumption that p does not vanish in the disc |z | ≤ R has led us
to a contradiction. Notice the proof does not give a procedure for finding the root whose
existence has been proved.

Exercises
1. Prove Theorem 7.7, part 1.
2. Use the Fundamental Theorem of Algebra along with the “factor theorem” of high
school algebra to prove that a polynomial of degree n has exactly n roots (some of which
may be repeated roots).

7.3 Mapping from E1 to En

As a particle moves along a curve γ in En its position F (t) at time t can be speciﬁed
by a vector
X = F (t) = (f1 (t), f2 (t), . . . , fn (t)),
where xj = fj (t) is the j th coordinate of the position at time t . Thus, the curve is
speciﬁed by F (t) , a mapping from numbers to vectors, F : A ⊂ E1 → En , where A is the
domain of deﬁnition of F .
For example, the mapping
F : t → (cos πt, sin πt, t), t ∈ (−∞, ∞) which may also be written as
F (t) = (cos πt, sin πt, t)
can be thought of as describing the motion of a particle along a helix.
It is natural to ask about the velocity, which means the derivative must be defined.
Definition: Let F (t) define a curve γ for t in the interval A = [a, b] . Consider the
difference quotient
[F (t + h) − F (t)] / h ,  for t and t + h in A,
where t is fixed. If this vector has a limit as h tends to zero, then F is said to have a
derivative F ′(t) at t ,
F ′(t) = lim_{h→0} [F (t + h) − F (t)] / h ,
while the curve has slope F ′(t) at t . Some other common notations are
Ḟ (t) ,  dF/dt ,  Dt F.
The curve γ is called smooth if i) the derivative F ′(t) exists and is continuous for each t
in [a, b] , and if ii) F ′(t) ≠ 0 for any point t in [a, b] .
If t represents time, then F ′(t) is the velocity of the particle at time t while ‖F ′(t)‖
is the speed.
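The componentwise difference quotient converges quickly in practice. Here is a numerical sketch (a modern Python aside; the step size h is our choice) for the helix F (t) = (cos πt, sin πt, t) , whose derivative is (−π sin πt, π cos πt, 1) :

```python
import math

def F(t):
    return (math.cos(math.pi * t), math.sin(math.pi * t), t)

def dF(t, h=1e-6):
    # componentwise difference quotient [F(t + h) - F(t)] / h
    return tuple((a - b) / h for a, b in zip(F(t + h), F(t)))

t = 0.3
exact = (-math.pi * math.sin(math.pi * t), math.pi * math.cos(math.pi * t), 1.0)
assert all(abs(a - e) < 1e-4 for a, e in zip(dF(t), exact))
```

Note that the quotient is computed component by component, which is exactly the content of Theorem 7.14 below.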
If F (t) is given in terms of coordinate functions, F : t → (f1 (t), . . . , fn (t)) , how can
the derivative of F be computed?
Theorem 7.14 . If F (t) = (f1 (t), . . . , fn (t)) is a diﬀerentiable mapping of A ⊂ E1 into
En , then the coordinate functions are diﬀerentiable and
dF/dt = ( df1 /dt , df2 /dt , . . . , dfn /dt ) .
Conversely, if the coordinate functions are differentiable, then so is F (t) and the derivative
is given by the above formula.

Proof: If t and t + h are both in A , then
[F (t + h) − F (t)] / h = (1/h) [ (f1 (t + h), . . . , fn (t + h)) − (f1 (t), . . . , fn (t)) ]
    = ( [f1 (t + h) − f1 (t)] / h , . . . , [fn (t + h) − fn (t)] / h ) .
Since the limit as h → 0 of the expression on the left exists if and only if all of the limits
lim_{h→0} [fj (t + h) − fj (t)] / h ,  j = 1, . . . , n ,
exist, the theorem is proved.
Examples:
(1) If F : t → (cos πt, sin πt, t), t ∈ (−∞, ∞), F is diﬀerentiable for all t since each of
the coordinate functions are diﬀerentiable. Also,
F ′(t) = (−π sin πt, π cos πt, 1).
In addition, the curve - a helix - which F defines is smooth since F ′ is continuous
and
‖F ′(t)‖ = √(π^2 sin^2 πt + π^2 cos^2 πt + 1) = √(π^2 + 1) ≠ 0 .
(2) Let F : t → (a1 + b1 t, a2 + b2 t, a3 + b3 t) = P + Qt where P = (a1 , a2 , a3 ) and
Q = (b1 , b2 , b3 ) are constant vectors. Then the curve F deﬁnes is a straight line
which passes through the point P = (a1 , a2 , a3 ) at t = 0 . F is diﬀerentiable for all
t , since each of the coordinate functions are diﬀerentiable. Furthermore,
F ′(t) = Q = (b1 , b2 , b3 ),
a constant vector pointing in the direction Q = (b1 , b2 , b3 ) , as is anticipated for a
straight line. Because
‖F ′(t)‖ = ‖Q‖ = √(b1^2 + b2^2 + b3^2) ,
this curve is smooth except in the degenerate case b1 = b2 = b3 = 0 , that is, Q = 0 ,
when the curve degenerates to a single point, F (t) = (a1 , a2 , a3 ) = P .
(3) The curve defined by the mapping F : t → (t, |t|) is differentiable, with
F ′(t) ≠ 0 , except at t = 0 . It is not differentiable there since the second coordinate
function, f2 (t) = |t| is not diﬀerentiable at t = 0 . Thus, the curve is smooth except
at t = 0 .
(4) The curve deﬁned by the mapping F : t → (t3 , t2 ) is diﬀerentiable everywhere, and
F ′(t) = (3t^2 , 2t).
However, ‖F ′(t)‖ = √(9t^4 + 4t^2) , so the curve is smooth everywhere except at t = 0 ,
which corresponds to a cusp at the origin in the x1 , x2 plane.

It is elementary to compute the derivative of the sum of two vectors. The derivative
of a product can be deﬁned for the inner product, and for the product with scalar-valued
function.
Theorem 7.15 If F (t) and G(t) both map an interval A ⊂ E1 into En , and are both
differentiable there, then for all t ∈ A ,
1. d/dt [aF + bG] = a dF/dt + b dG/dt  (linearity of the derivative).
2. d/dt ⟨F, G⟩ = ⟨F ′, G⟩ + ⟨F, G′⟩  (in “dot product” notation: d/dt (F · G) = F ′ · G + F · G′ ).
Proof: Since these are identical to the proofs of the corresponding statements for scalar-valued functions, we prove only the second statement.
d/dt ⟨F (t), G(t)⟩ = lim_{h→0} (1/h) [ ⟨F (t + h), G(t + h)⟩ − ⟨F (t), G(t)⟩ ]
    = lim_{h→0} (1/h) [ ⟨F (t + h) − F (t), G(t + h)⟩ + ⟨F (t), G(t + h) − G(t)⟩ ]
    = lim_{h→0} [ ⟨ (F (t + h) − F (t))/h , G(t + h) ⟩ + ⟨ F (t), (G(t + h) − G(t))/h ⟩ ]
    = ⟨F ′(t), G(t)⟩ + ⟨F (t), G′(t)⟩ .

An interesting and simple consequence is the fact that if a particle moves on a curve
F (t) which remains a fixed distance from the origin, ‖F (t)‖ ≡ constant = c , then the
velocity vector F ′ is always orthogonal to the position vector F . This follows from
c^2 = ‖F (t)‖^2 = ⟨F (t), F (t)⟩ ,
so taking the derivative of both sides we find
0 = ⟨F ′, F⟩ + ⟨F, F ′⟩ = 2 ⟨F, F ′⟩ .
Thus ⟨F, F ′⟩ = 0 for all t , an algebraic statement of the orthogonality. As a particular
example, the mapping
F (t) = ( cos( π/(1 + t^2) ) , sin( π/(1 + t^2) ) )
has the property ‖F (t)‖ = 1 for all t . You can see the path of the particle in the figure.
At t = 0 the particle is at (−1, 0) . As time increases, the particle moves along an arc of
the unit circle toward (1, 0) , reaching (0, 1) at t = 1 . The velocity at time t is
F ′(t) = ( 2πt/(1 + t^2)^2 ) ( sin( π/(1 + t^2) ) , − cos( π/(1 + t^2) ) ).
From this expression, it is evident the particle slows down as it approaches (1, 0) . In fact,
the particle never does manage to reach (1, 0) .
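The orthogonality ⟨F, F ′⟩ = 0 for curves of constant norm can be verified numerically for this very mapping. A short Python sketch using a central difference quotient (the tolerance is our choice):

```python
import math

def F(t):
    a = math.pi / (1 + t * t)
    return (math.cos(a), math.sin(a))      # ||F(t)|| = 1 for every t

def dF(t, h=1e-6):
    # central difference quotient, componentwise
    return tuple((p - q) / (2 * h) for p, q in zip(F(t + h), F(t - h)))

for t in (0.0, 0.5, 1.0, 3.0):
    dot = sum(a * b for a, b in zip(F(t), dF(t)))
    assert abs(dot) < 1e-6                 # <F, F'> = 0 at every sampled t
```

The dot product vanishes (up to rounding) at every sample, as the algebraic argument above predicts.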
We would like to deﬁne the notion of a straight line which is tangent to a smooth
curve at a given point. There is one touchy issue. You see, the curve may intersect itself,
thus having two or more tangents at the same point. Once acknowledged, the diﬃculty is
resolved by realizing that for each value of t , there is a unique point F (t) on the curve.
X0 is a double point if F (t1 ) = F (t2 ) = X0 for some t1 ≠ t2 .
By picking one value of t , there will be a unique tangent line to the curve for this value
of t . Thus, we deﬁne the tangent line for t = t1 to the curve deﬁned by a diﬀerentiable
function F (t) as the straight line whose equation is
A(t) = F (t1 ) + F (t1 )(t − t1 ).
At t = t1 , the curves deﬁned by F (t) and A(t) have the same value F (t1 ) = X0 and the
same derivative (slope), F ′(t1 ) .
Example: Consider the curve deﬁned by the mapping F : t → (3 + t3 − t, t2 − t), t ∈
(−∞, ∞) . The point (3, 0) is a double point since F : 0 → (3, 0) and F : 1 → (3, 0) .
Thus, the line tangent to the point (3, 0) when t = 1 is deﬁned by
A(t) = (3, 0) + (2, 1)(t − 1) = (3, 0) + (2(t − 1), (t − 1))
or
A(t) = (1, −1) + (2t, t).
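The double point and the tangent line in this example are easy to confirm by direct evaluation; a small Python sketch (the derivative is computed by hand):

```python
def F(t):
    return (3 + t**3 - t, t**2 - t)

def dF(t):
    return (3 * t**2 - 1, 2 * t - 1)       # F'(t), computed by hand

def A(t):
    # tangent line at t1 = 1:  A(t) = F(1) + F'(1)(t - 1)
    return (3 + 2 * (t - 1), 1 * (t - 1))

assert F(0) == (3, 0) and F(1) == (3, 0)   # (3, 0) is a double point
assert dF(1) == (2, 1)                     # the slope used in A(t)
assert A(1) == F(1)                        # the line touches the curve at t = 1
```

Choosing t1 = 0 instead would produce the other tangent line through the same point (3, 0).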
Since we are still working with functions F (t) of one real variable t , the mean value
theorem and chain rule follow immediately by applying the corresponding theorems for
scalar valued functions to each of the components f1 (t), . . . , fn (t) of F (t) .
Theorem 7.16 (Approximation Theorem and Mean Value Theorem). If the vector valued
function F (t) is continuous for t ∈ [a, b] and diﬀerentiable for t ∈ (a, b) then for t0 ∈
(a, b) ,
1. F (t) = F (t0 ) + (dF/dt)(t0 ) (t − t0 ) + R(t, t0 ) |t − t0 | , where lim_{t→t0} R(t, t0 ) = 0.
2. There is a point τ between t and t0 such that
‖F (t) − F (t0 )‖ ≤ ‖F ′(τ )‖ |t − t0 | .
3. If F = (f1 , . . . , fn ) , there are points τ1 , . . . , τn between t and t0 such that
F (t) = F (t0 ) + L (t − t0 ),
where L is the linear transformation
L = (f1′ (τ1 ), f2′ (τ2 ), . . . , fn′ (τn )) .
Remark: Although 1 and 3 follow from the one variable case f (t) - and will be proved
again in greater generality later on - the proof of 2 is difficult under our weak hypothesis. If
the stronger assumption that F is continuously differentiable is made, then 2 becomes easy, and
the factor ‖F ′(τ )‖ can be replaced by a constant M = max_{τ ∈[a,b]} ‖F ′(τ )‖ , since the continuous
function ‖F ′(τ )‖ does assume its maximum if τ is in a closed and bounded set, τ ∈ [a, b] .
Corollary 7.17 If F satisfies the hypotheses of Theorem 7.16 and if F ′(t) ≡ 0 for all
t ∈ [a, b] , then F is a constant vector.
Proof: Just look at 2 or 3 above to see that for any points t, t0 in [a, b] , we have
F (t) = F (t0 ) .

Theorem 7.18 (Chain Rule). Consider the vector-valued function F (t) which is differentiable for t ∈ (a, b) , and the scalar-valued function φ(s) which is differentiable for
s ∈ (α, β ) . If the range of φ is contained in (a, b) , R(φ) ⊂ (a, b) , then the composed function G(s) = (F ◦ φ)(s) = F (φ(s)) is differentiable as a function of s for all s in (α, β )
and
G′ (s) = F ′ (φ(s)) φ′ (s),
that is,
dG/ds (s) = (dF/dφ)(φ) (dφ/ds)(s) = (dF/dt)(t) |_{t=φ(s)} (dφ/ds)(s).
If F (t) = (f1 (t), . . . , fn (t)) , then
G(s) = F (φ(s)) = (f1 (φ(s)), . . . , fn (φ(s))) , and
G′ (s) = (f1′ (φ) φ′ (s), . . . , fn′ (φ) φ′ (s))
       = (f1′ (φ), . . . , fn′ (φ)) φ′ (s).
Proof not given here. It is the same as that given in elementary calculus for n = 1 . A more
general theorem containing this one is proved later (p. 701).
Examples:
1. If F (t) = (1 − t^2 , t^3 − sin πt) and φ(s) = e^{−s} , then G(s) = (F ◦ φ)(s) =
(1 − e^{−2s} , e^{−3s} − sin πe^{−s} ) . We compute G′ (s) in two distinct ways: using the chain rule, and
directly from the formula for G(s) . By the chain rule:
G′ (s) = F ′(t) |_{t=φ(s)} φ′ (s) = (−2t, 3t^2 − π cos πt) |_{t=e^{−s}} (−e^{−s} ) = −(−2e^{−s} , 3e^{−2s} − π cos πe^{−s} ) e^{−s} .
In particular, at s = 0 , since t = 1 when s = 0 , we find
G′ (0) = −(−2, 3 + π ) = (2, −3 − π ).
Directly from the formula G(s) = (1 − e^{−2s} , e^{−3s} − sin πe^{−s} ) , we find
G′ (s) = (2e^{−2s} , −3e^{−3s} + πe^{−s} cos πe^{−s} ),
which agrees with the chain rule computation.
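As a further check, a central difference quotient at s = 0 should reproduce G′ (0) = (2, −3 − π) ; a Python sketch:

```python
import math

def G(s):
    # G = F o phi with F(t) = (1 - t^2, t^3 - sin(pi t)) and phi(s) = e^(-s)
    t = math.exp(-s)
    return (1 - t * t, t**3 - math.sin(math.pi * t))

h = 1e-6
num = tuple((a - b) / (2 * h) for a, b in zip(G(h), G(-h)))  # quotient at s = 0
exact = (2.0, -3.0 - math.pi)
assert all(abs(a - e) < 1e-4 for a, e in zip(num, exact))
```

Both the symbolic computations and the numerical quotient agree.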
Since the derivative F (t) of a function F (t) from numbers to vectors, F : E1 → En ,
is also a function of the same type, the second and higher order derivatives can be deﬁned
inductively:
d^2 F/dt^2 (t) := d/dt F ′(t) ,  d^{k+1} F/dt^{k+1} (t) := d/dt F^{(k)} (t).

Example: If F : t → (cos πt, sin πt, t) , then
F ′′(t) = d/dt (−π sin πt, π cos πt, 1) = (−π^2 cos πt, −π^2 sin πt, 0).
If F (t) represents the position of a particle at time t , then F ′′(t) is the acceleration
of the particle at time t . All of these ideas were used in the last two sections in Chapter 6
where linear systems of ordinary diﬀerential equations were encountered. Time permitting,
a second application to a non-linear system of O.D.E.’s will be treated in Section of Chapter.
There another of the crown jewels in the intellectual history of mankind will be discussed:
Newton’s incredible solution of “the two body problem”, that is, to determine the motion
of the heavenly bodies.
Recall that the length of a curve is deﬁned to be the limit of the lengths of inscribed
polygons which approximate the curve as the length of the longest subinterval tends to
zero - if the limit does exist. Let the curve γ , which we assume is smooth, be determined
by the function F (t), t ∈ [a, b] . Then the length of the straight line joining F (tj ) to
F (tj + ∆tj ), tj +1 = tj + ∆tj , is
‖F (tj + ∆tj ) − F (tj )‖ = ‖ (F (tj + ∆tj ) − F (tj )) / ∆tj ‖ ∆tj .
Adding up the lengths of these segments and letting the largest ∆tj tend to zero, we find
the length of γ is given by
L(γ ) = ∫_a^b ‖F ′(t)‖ dt.
If the function F is defined through coordinates, F (t) = (f1 (t), . . . , fn (t)) , this formula
reads
L(γ ) = ∫_a^b √( f1′^2 + f2′^2 + · · · + fn′^2 ) dt.
You will recognize the special case where F (t) = (x(t), y (t)) ,
L(γ ) = ∫_a^b √( ẋ^2 + ẏ^2 ) dt.

Example: Find the length of the portion of the helix γ defined by F (t) = (cos t, sin t, t) ,
for t ∈ [0, 2π ] . This is one “hoop” of the helix. Since F ′(t) = (− sin t, cos t, 1) , we have
‖F ′(t)‖ = √(sin^2 t + cos^2 t + 1) = √2 , so the length is
L(γ ) = ∫_0^{2π} √2 dt = 2π √2.

For each t ∈ [a, b] , we can define an arc length function s(t) , the arc length from a to
t , by
s(t) = ∫_a^t ‖F ′(τ )‖ dτ.
Note we are using a dummy variable of integration τ . By the fundamental theorem of
calculus, we have
ds/dt = ‖F ′(t)‖ .
Since ds/dt can be thought of as the rate of change of arc length with respect to time, it
is the speed of a particle moving along the curve, the tangential speed.
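The inscribed-polygon definition of length can be tested on the helix just computed: with F (t) = (cos t, sin t, t) on [0, 2π] , the polygon lengths should approach 2π √2. A Python sketch (the number of subdivisions N is our choice):

```python
import math

def F(t):
    return (math.cos(t), math.sin(t), t)

N = 20000
pts = [F(2 * math.pi * k / N) for k in range(N + 1)]
# total length of the inscribed polygon with N equal parameter steps
L = sum(math.dist(pts[k], pts[k + 1]) for k in range(N))
assert abs(L - 2 * math.pi * math.sqrt(2)) < 1e-6
```

The polygon length underestimates the true length, but the gap shrinks like 1/N^2 here.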
The integral used in arc length is the integral of a scalar-valued function ‖F ′(t)‖ .
How can we define the integral of a vector-valued function F (t) = (f1 (t), . . . , fn (t)) ? Just
integrate each component, assuming they are all integrable of course,
∫_a^b F (t) dt := ( ∫_a^b f1 (t) dt, . . . , ∫_a^b fn (t) dt ).
For example, if F (t) = (t^3 − 3t^2 , 1 − √(2t) , 3e^{3t} ) , then
∫_0^2 F (t) dt = ( ∫_0^2 (t^3 − 3t^2 ) dt, ∫_0^2 (1 − √(2t) ) dt, ∫_0^2 3e^{3t} dt ) = (−4, −2/3 , e^6 − 1).
We give no physical interpretation of the integral (as an area or the like) except in the case
where F (t) represents the velocity of a particle. Then ∫_a^b F (t) dt is the vector pointing
from the position at t = a to the position at t = b .
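Componentwise integration is easy to check numerically. A Python sketch using the midpoint rule (the first and third components of the integrand are our reconstruction of the garbled original, chosen to match the stated value (−4, −2/3, e^6 − 1)):

```python
import math

def F(t):
    # integrand reconstructed to be consistent with the printed answer
    return (t**3 - 3 * t**2, 1 - math.sqrt(2 * t), 3 * math.exp(3 * t))

def integrate(f, a, b, n=20000):
    # componentwise midpoint rule
    h = (b - a) / n
    total = [0.0, 0.0, 0.0]
    for k in range(n):
        for i, c in enumerate(f(a + (k + 0.5) * h)):
            total[i] += c * h
    return tuple(total)

I = integrate(F, 0.0, 2.0)
expected = (-4.0, -2.0 / 3.0, math.exp(6) - 1)
assert all(abs(x - e) < 1e-3 for x, e in zip(I, expected))
```

Each component is integrated independently, which is all the definition above requires.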
Exercises
(1) (a) Describe and sketch the images of the curves F : E1 → E2 deﬁned by
(i) F (t) = (2t, 3 − t)
(ii) F (t) = (2t, |3 − t|)
(iii) F (t) = (t^2 , 1 + t^2 )
(iv) F (t) = (2t, sin t)
(v) F (t) = (t^2 , 1 + t^4 )
(b) Which of the above mappings are differentiable and for what value(s) of t ? Find
the derivatives if the functions are diﬀerentiable. Which of the curves deﬁned by
these mappings are smooth, and where are they not smooth?
(2) Use the definition of the derivative to find F ′(t) at t = 2π for the functions
a). F (t) = (2t, 3 − t), t ∈ (−∞, ∞).
b). F (t) = (1 + t^2 , sin 2t), t ∈ (−∞, ∞).
(3) Find the lengths of the curves γ defined by the mappings
a). F (t) = (a1 + b1 t, a2 + b2 t, . . . , an + bn t) = P + Qt, t ∈ [0, 1].
b). F (t) = (sin 2t, 1 − 3t, cos 2t, 2t^{3/2} ), t ∈ [−π, 2π ].
(4) Consider the curve defined by the equation
F (t) = (t − t^2 , t^4 − t^2 + 1), t ∈ (−∞, ∞).
a). Sketch the curve.
b). Where does the curve intersect itself?
c). Find the line tangent to the curve at the image of t = 1 .

(5) If F : A ⊂ E1 → En is twice continuously differentiable and F ′′(t) ≡ 0 for all t ∈ A ,
what can you conclude? Please prove your assertion. [Hint: First consider the special
case where F : E1 → E1 ].
(6) Let F (t) be a twice diﬀerentiable function which maps a set in E1 into En and
satisfies the ordinary differential equation F ′′ + µF ′ + kF = 0 , where k and µ are
positive constants. Define the energy as
E (t) = (1/2) ‖F ′‖^2 + (1/2) k ‖F‖^2 .
(a) Prove E (t) is a non-increasing function of t (energy is dissipated). [Hint:
dE/dt =? ].
(b) If F (0) = 0 and F ′(0) = 0 , prove E (t) ≡ 0 .
(c) Prove there is at most one function which satisﬁes the given diﬀerential equation
as well as the initial conditions F (0) = A, F ′(0) = B , where A and B are
given vectors.
(7) If F (t) = (1 − e^{2t} , t^3 , 1/(1 + t^2) ) and φ(x) = 1/(1 + x) , x > −1 , compute
d/dx (F ◦ φ)(x) by using the chain rule.

(8) Compute d^2 F/dt^2 for the function F (t) in Exercise 7.
(9) (a) Show that the equation of a straight line which passes through the point P1 at
t = 0 and P2 at t = 1 is
F (t) = P1 + (P2 − P1 )t.
(b) Find the equation of a straight line which passes through the point P1 = (1, 2, 3)
at t = 0 and P2 = (1, −5, 0) at t = 1 .
(c) Find the equation of a straight line which passes through the point P1 at t = t1
and P2 at t = t2 .
(d) Apply this to ﬁnd the equation of a straight line which passes through P1 =
(−3, 1, −2) at t = −1 and P2 = (0, 2, 1) at t = 2 . What is the slope of this
line?
(10) Given a smooth curve all of whose tangent lines pass through a given point, prove
that the curve is a straight line.
(11) Let F : E1 → En deﬁne a smooth curve which does not pass through the origin. Show
that the position vector F (t) is orthogonal to the velocity vector at the point of the
curve which is closest to the origin. Apply this to prove anew the well known fact
that the radius vector to any point on a circle is perpendicular to the tangent vector
at that point. [Hint: Why is it sufficient to minimize ϕ(t) = ⟨F (t), F (t)⟩ ?]

Chapter 8. Mappings from En to E : The Differential Calculus

8.1 The Directional and Total Derivatives
Throughout this and the next chapter we shall consider functions which map En or a
portion of it A , into E . By the statement
f : A → E ,  A ⊂ En ,
we mean that to every vector X in A , the function (operator, map, transformation) assigns
a unique real number w . Thus w = f (X ) in this case is a map from vectors to numbers.
Two particular examples prove helpful in thinking conceptually about mappings of this
type.
(1) The temperature function. f : A → E , where the set A ⊂ E3 is the room in which
you are sitting. To every point X in the room, A , this function f assigns a number
- the temperature f (X ) at X, w = f (X ) .
(2) The height function. f : A → E , where the set A is some set in the plane E2 . To every
point X in A , this function f assigns a number - the height f (X ) of a surface (or
manifold) M above that point. Thus, the set of all pairs (X, f (X )), X ∈ A , defines
a portion of a surface, a surface in E2 × E ≅ E3 .
From the second example, it is clear that every function f : A ⊂ En → E may be
regarded as the graph of a surface in En × E ≅ En+1 , the surface being regarded as all
points in En+1 of the form (X, f (X )) , where X ∈ A . For example, the temperature
function can be thought of as the graph of a surface in E4 , the height of the surface
w = f (X ) above X being the temperature at X . (Compare with the discussion from p.
322 bottom, to p. 324).
In concrete situations, the point X ∈ En is speciﬁed by giving its coordinates with
respect to some ﬁxed bases for En and E . The particular coordinate system used depends
on the geometry of the problem at hand. Rectangular symmetry calls for the standard
rectangular coordinates, while polar coordinates are well suited to problems with circular
symmetry. We shall meet these issues head-on a bit later.
If X = (x1 , . . . , xn ) with respect to some coordinates for En , then we write w =
f (X ) = f (x1 , . . . , xn ) . The points (X, f (X )) on the graph are (x1 , . . . , xn , f (x1 , . . . , xn )) ,
which we may also write as (x1 , . . . , xn , f ) or else as (x1 , . . . , xn , w) . For low dimensional
spaces, E2 or E3 , it is convenient to avoid subscripts. In these situations we shall write
w = f (x, y ) and w = f (x, y, z ) for mappings with domains in E2 and E3 , respectively.
We now examine some more speciﬁc examples.
Examples:
(1) w = −(1/2) x + y − 1 . This function assigns to every point X = (x, y ) in E2 a number w
in E . We can represent the function, an aﬃne mapping from E2 → E , as the graph
of a plane in E3 . The linear nature of the plane reﬂects the fact that the mapping
is an aﬃne mapping - a linear mapping except for a translation of the origin. More
generally, the function w = α + a1 x1 + a2 x2 + · · · + an xn , an aﬃne mapping from
En → E , represents a plane in En+1 . In fact, this can be taken as the algebraic
deﬁnition of a plane in En+1 . These aﬃne functions are the simplest functions which
map En into E . Although we shall not, it is customary to abuse the nomenclature
and refer to aﬃne mappings as being linear. This is because they share most of
the algebraic and geometric properties of proper linear mappings, as opposed to the
honestly nonlinear mappings we will be treating as in the next examples.
(2) w = x2 + y 2 . This function assigns to every point X = (x, y ) in E2 a real number
w ∈ E . We can represent the function as the graph of a paraboloid of revolution,
obtained by rotating the parabola w = x2 about the w axis. If this paraboloid is
cut by a plane parallel to the x, y plane, say w = 2 , the intersection of these two
surfaces is the circle x2 + y 2 = 2 .
(3) w = −x2 + y 2 . This function can be represented as the graph of a very fancy surface
- a hyperbolic paraboloid. If this surface is cut by a plane parallel to the x, y plane,
w = c , the intersection is the curve c = −x2 + y 2 . For c > 0 , this curve is a hyperbola
which opens about the y axis, while if c < 0 , the curve is a hyperbola which opens
about the x axis. For c = 0 we obtain two straight lines, x = ±y (see fig.). The
intersection of the surface with the plane x = c is a parabola which opens upward
in the y, w plane. Similarly, the intersection of the surface with the plane y = c is
a parabola which opens downward in the x, w plane. This surface is rightly called a
saddle, and the origin (0, 0, 0) a saddle point (or mountain pass), since a particle can
i) remain at rest at that point, or ii) move on the surface in one direction and go up, or
iii) move on the surface in another direction and go down.
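The saddle behavior described in i)–iii) is easy to confirm numerically. The following sketch (mine, not part of the notes) evaluates w = −x² + y² along the coordinate axes and along the lines x = ±y:

```python
def w(x, y):
    # the hyperbolic paraboloid of example (3)
    return -x**2 + y**2

# moving from the origin along the y axis the surface goes up,
# along the x axis it goes down, and at the origin itself w = 0
for t in [0.5, 1.0, 2.0]:
    assert w(0.0, t) > 0.0   # uphill in the y direction
    assert w(t, 0.0) < 0.0   # downhill in the x direction
assert w(0.0, 0.0) == 0.0

# on the lines x = +y and x = -y (the c = 0 level set) the height stays 0
for t in [-2.0, 1.5, 3.0]:
    assert w(t, t) == 0.0 and w(t, -t) == 0.0
```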
Let f (X ) be a function from vectors to numbers,
f : A ⊂ En → E.
How can we deﬁne the notion of derivative for such functions? The derivative should
measure the rate of change of f (X ) as X moves about. But if you think of f (X )
as the temperature function, it is clear that the temperature will change at diﬀerent
rates depending on which direction you move. Thus, if you move across the room in
the direction of the door, the temperature may decrease, while if you move up to the
ceiling, the temperature will likely increase. Thus, the natural notion of a derivative
is the rate of change in a particular direction - a directional derivative.
Let X0 denote your position and f (X0 ) the temperature there. Take η to be a free
vector, which we shall think of as pointing from X0 to X0 + η . We want to deﬁne the rate
at which the temperature changes as you move from X0 in the direction η toward X0 + η .
Since all points on the line joining X0 to X0 + η are of the form X0 + λη , where λ is a
real number, the diﬀerence f (X0 + λη ) − f (X0 ) is the diﬀerence between the temperatures
at X0 + λη and at X0 .
Definition: Let f : A ⊂ En → E . The derivative of f at the interior point X0 ∈ A with
respect to the vector η is

    f′(X0; η) = lim_{λ→0} [f(X0 + λη) − f(X0)] / λ ,

if the limit exists.
In the special case when η = e is a unit vector, ‖e‖ = 1 , we have ‖λe‖ = |λ| . Then
De f(X0) := f′(X0; e) is the instantaneous rate of change of f per unit length as X moves
from X0 toward X0 + e . This normalization to using only unit vectors is necessary to
have a meaningful definition of a directional derivative. Thus, the directional derivative of
f at X0 ∈ A in the direction of the unit vector e is the derivative with respect to the unit
vector e . It measures how f changes as you move from X0 to a point on the unit sphere
about X0 . For theoretical purposes, the derivative of f with respect to any vector η is
useful, while for practical purposes, the more restrictive notion of the directional derivative
is needed.
Example: 1. Find the directional derivative of f(X) = x1² − 2x1x2 + 3x2 at X0 = (1, 0) in
the direction η = (−1, 1) . Note that η is not a unit vector. The unit vector is e = η/‖η‖ =
(−1/√2, 1/√2) . Then

    X0 + λe = (1, 0) + λ(−1/√2, 1/√2) = (1 − λ/√2, λ/√2),

so

    f(X0 + λe) = (1 − λ/√2)² − 2(1 − λ/√2)(λ/√2) + 3(λ/√2)
               = 1 − λ/√2 + (3/2)λ² .

Thus,

    [f(X0 + λe) − f(X0)]/λ = [1 − λ/√2 + (3/2)λ² − 1]/λ = −1/√2 + (3/2)λ .

Therefore, the directional derivative De f is

    De f(X0) = lim_{λ→0} [f(X0 + λe) − f(X0)]/λ = −1/√2 .

In words, the rate of change of f at X0 in the direction of the unit vector e is −1/√2 . One
qualitative conclusion we arrive at is that f(X) decreases as X moves from X0 in the
direction e .
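The limit just computed can be checked with a small difference quotient. This sketch (mine, not from the notes) evaluates [f(X0 + λe) − f(X0)]/λ for a small λ:

```python
import math

def f(x1, x2):
    # the function of the worked example
    return x1**2 - 2*x1*x2 + 3*x2

X0 = (1.0, 0.0)
e = (-1/math.sqrt(2), 1/math.sqrt(2))   # unit vector in the direction (-1, 1)

lam = 1e-6
# the difference quotient from the definition of the directional derivative
quotient = (f(X0[0] + lam*e[0], X0[1] + lam*e[1]) - f(*X0)) / lam

# the worked answer: De f(X0) = -1/sqrt(2); the quotient is -1/sqrt(2) + (3/2)lam
assert abs(quotient - (-1/math.sqrt(2))) < 1e-4
```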
2. Compute f′(X; η) if f(X) = ⟨X, AX⟩ , where A is a self-adjoint transformation.

    f(X + λη) = ⟨X + λη, A(X + λη)⟩
              = ⟨X, AX⟩ + λ⟨η, AX⟩ + λ⟨X, Aη⟩ + λ²⟨η, Aη⟩

and since A is self-adjoint,

              = ⟨X, AX⟩ + 2λ⟨AX, η⟩ + λ²⟨η, Aη⟩ .

Thus,

    f′(X; η) = lim_{λ→0} [f(X + λη) − f(X)]/λ = 2⟨AX, η⟩ .

In particular, when A = I is the identity operator, f(X) = ‖X‖² , and we find f′(X; η) = 2⟨X, η⟩ .

The directional derivatives of f in the particular direction of the coordinate axes e1 =
(1, 0, . . .), e2 = (0, 1, 0, . . .) have special names. They are called the partial derivatives of
f . For example, the partial derivative of f(X) = f(x1, x2, . . . , xn) at X0 with respect to
x2 is

    ∂f/∂x2 (X0) := f′(X0; e2) = lim_{λ→0} [f(X0 + λe2) − f(X0)]/λ .

There are many other competing notations, all of them being used. We shall list them
shortly, after observing there is a simple way to compute these partial derivatives. Consider
f(X) = f(x1, x2, x3) . Then

    ∂f/∂x1 (X) = lim_{λ→0} [f(X + λe1) − f(X)]/λ .

Since X + λe1 = (x1, x2, x3) + λ(1, 0, 0) = (x1 + λ, x2, x3) , we have

    ∂f/∂x1 (X) = lim_{λ→0} [f(x1 + λ, x2, x3) − f(x1, x2, x3)]/λ .

But this is the ordinary derivative of f with respect to the single variable x1 , while holding
the other variables x2 and x3 fixed. Thus, ∂f/∂x1 can be computed by merely taking the
ordinary one variable derivative of f with respect to x1 , pretending the other variables
are constants.
Example: If f(X) = x1² + x1 e^{x1 x2} , find the rate of change of f at the point X0 in the
directions e1 = (1, 0) and e2 = (0, 1) . Thus, we want to compute ∂f/∂x1 (X0) and ∂f/∂x2 (X0) .

    ∂f/∂x1 = 2x1 + e^{x1 x2} + x1 x2 e^{x1 x2}
    ∂f/∂x2 = x1² e^{x1 x2}

At the point X0 = (2, −1) , we have

    ∂f/∂x1 (X0) = 4 − e^{−2} ,    ∂f/∂x2 (X0) = 4e^{−2} .

Some common notation. If w = f(x1, x2) , then

    ∂w/∂x1 = ∂f/∂x1 = D1 f = f1 = fx1 = wx1
    ∂w/∂x2 = ∂f/∂x2 = D2 f = f2 = fx2 = wx2
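A central difference quotient gives a quick numerical check of partial derivatives; this sketch (an addition of mine, not part of the notes) compares them with the hand-computed values for f(X) = x1² + x1 e^{x1 x2} at X0 = (2, −1):

```python
import math

def f(x1, x2):
    # f(X) = x1^2 + x1 e^{x1 x2}, the function of the example above
    return x1**2 + x1*math.exp(x1*x2)

def partial(g, X, j, h=1e-6):
    # central difference approximation to the ordinary one-variable
    # derivative with respect to x_j, the other variable held fixed
    Xp, Xm = list(X), list(X)
    Xp[j] += h
    Xm[j] -= h
    return (g(*Xp) - g(*Xm)) / (2*h)

X0 = (2.0, -1.0)
# the hand-computed values: 4 - e^{-2} and 4 e^{-2}
assert abs(partial(f, X0, 0) - (4 - math.exp(-2))) < 1e-5
assert abs(partial(f, X0, 1) - 4*math.exp(-2)) < 1e-5
```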
If f : A ⊂ En → E , then ∂f /∂xj is another function of X = (x1 , x2 , . . . , xn ) . It is
then possible to take further partial derivatives.
Example: Let w = f(X) = x1² + x1 e^{x1 x2} as in the previous example. Then

    w11 = f11 = fx1x1 = ∂²f/∂x1² = ∂/∂x1 (∂f/∂x1) = 2 + 2x2 e^{x1 x2} + x1 x2² e^{x1 x2}

    w12 = f12 = fx1x2 = ∂²f/∂x2∂x1 = ∂/∂x2 (∂f/∂x1) = 2x1 e^{x1 x2} + x1² x2 e^{x1 x2}

    w21 = f21 = fx2x1 = ∂²f/∂x1∂x2 = ∂/∂x1 (∂f/∂x2) = 2x1 e^{x1 x2} + x1² x2 e^{x1 x2} = f12

    w22 = f22 = fx2x2 = ∂²f/∂x2² = ∂/∂x2 (∂f/∂x2) = x1³ e^{x1 x2} .

And even higher derivatives can be computed too, like

    f221 = fx2x2x1 = ∂³f/∂x1∂x2² = ∂/∂x1 (∂²f/∂x2²) = 3x1² e^{x1 x2} + x1³ x2 e^{x1 x2} .
Remark: From this one example, it appears possible that we always have f12 = f21 ,
that is, ∂²f/∂x1∂x2 = ∂²f/∂x2∂x1 . This is indeed the case if the second partial derivatives of f are
continuous, but for lack of time we shall not prove it (see Exercise 6).
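The equality f12 = f21 claimed in this remark can be observed numerically for the running example. In this sketch (mine, not part of the notes), each mixed partial is formed by differencing the appropriate first partial:

```python
import math

def f(x1, x2):
    # the running example f(X) = x1^2 + x1 e^{x1 x2}
    return x1**2 + x1*math.exp(x1*x2)

H = 1e-5

def fx1(x1, x2):
    return (f(x1+H, x2) - f(x1-H, x2)) / (2*H)

def fx2(x1, x2):
    return (f(x1, x2+H) - f(x1, x2-H)) / (2*H)

def f12(x1, x2):
    # d/dx2 of f_x1
    return (fx1(x1, x2+H) - fx1(x1, x2-H)) / (2*H)

def f21(x1, x2):
    # d/dx1 of f_x2
    return (fx2(x1+H, x2) - fx2(x1-H, x2)) / (2*H)

# the two mixed partials agree (up to difference-quotient error)
for (a, b) in [(0.5, 0.3), (1.0, -1.0), (2.0, 0.1)]:
    assert abs(f12(a, b) - f21(a, b)) < 1e-3
```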
So far we have deﬁned the directional derivative of a function f : En → E and called
particular attention to those in the direction of the coordinate axes - the partial derivatives
of f . Although the actual computation of the partial derivatives has been reduced to
the formal procedure of computing ordinary derivatives, the computation of the directional
derivative in an arbitrary direction must still be done by using the deﬁnition: the limit of
a diﬀerence quotient. We shall now reduce the computation of all directional derivatives
to a simple formal procedure. In order to do so, we shall introduce the concept of the
total derivative for functions f : A ⊂ En → E1 . This derivative will not be a directional
derivative, but rather a more general object.
The motivating idea here is the important one of approximating a non-linear function
f at a point X0 by a linear function. If we think of the function f (X ) as deﬁning a surface
M in En+1 with points (X, f(X)) , then the picture is that of approximating the surface
M near X0 by a plane (or hyperplane) tangent to the surface at X0 . We want to write
f (X ) ∼ f (X0 ) + L(X − X0 ),
where L is a linear operator, L : En → E , which may depend on the “base point” X0 .
Of course, as X → X0 we want the accuracy to improve in the sense that the tangent
plane should be a better approximation the closer X is to X0 . At X = X0 , the tangent
plane f(X0) + L(X − X0) and surface M touch, since they both pass through the point
(X0, f(X0)) . Notice that the function f(X0) + L(X − X0) is affine, so it does represent a
plane surface.
Motivated by the above considerations, we can now make a reasonable
Definition: Let f : A ⊂ En → E and X0 be an interior point of A . f is differentiable
at X0 if there exists a linear transformation L : En → E such that

    lim_{‖h‖→0} |f(X0 + h) − f(X0) − Lh| / ‖h‖ = 0,

for any vector h in some small ball about the origin (so f(X0 + h) is defined). The operator L
will usually depend on the base point X0 . If f is differentiable at X0 , we shall use the
notation

    df/dX (X0) = f′(X0) = L(X0) = L,

and refer to f′(X0) as the total derivative of f at X0 . [The notation ∇f(X0) and grad
f(X0) , for gradient, are also used]. If L = f′(X0) , a linear operator from En to E , exists
and depends continuously on the base point X0 for all X0 ∈ A , then f is said to be
continuously differentiable in A , written f ∈ C¹(A) .
Remark: The condition that f be differentiable at X0 can also be written in the following
useful form:

    f(X0 + h) = f(X0) + Lh + R(X0, h)‖h‖ ,        (8-1)

where the remainder R(X0, h) has the property

    lim_{‖h‖→0} R(X0, h) = 0.

This abstract operator L has the delightful property that it can be computed easily.
But before telling you how, we should ﬁrst prove for a given f there can be at most one
linear operator L which is the total derivative.
Theorem 8.1 . (Uniqueness of the total derivative). Let f : A → E be diﬀerentiable at
the interior point X0 ∈ A . If L1 and L2 are linear operators both of which satisfy the
conditions for the total derivative of f at X0 , then L1 = L2 .
Proof: Let L = L1 − L2 . We shall show L is the zero operator. Since

    Lh = L1h − L2h = [f(X0 + h) − f(X0) − L2h] − [f(X0 + h) − f(X0) − L1h],

by the triangle inequality we have

    |Lh| ≤ |f(X0 + h) − f(X0) − L2h| + |f(X0 + h) − f(X0) − L1h| .

Consequently,

    lim_{‖h‖→0} |Lh| / ‖h‖ = 0.

To complete the proof, a trick is needed. Fix η ≠ 0 . If λ is a constant, λ → 0 , then
λη → 0 , so

    lim_{λ→0} |L(λη)| / ‖λη‖ = 0.

But since L is linear, |L(λη)| = |λ| |Lη| and ‖λη‖ = |λ| ‖η‖ , so the factor |λ| can be canceled in
numerator and denominator. Thus the last expression is independent of λ , so |Lη| / ‖η‖ = 0 .
Because η ≠ 0 , this implies Lη = 0 . Therefore L must be the zero operator.
Next, we give a method for computing L . Not only that, but we also ﬁnd an easy way
to compute the directional derivatives.
Theorem 8.2 . Let f : A → E be differentiable at the interior point X0 ∈ A . Then a) the
directional derivative of f at X0 exists for every direction e and is given by the formula

    De f(X0) = Le.

b) Moreover, if f is given in terms of coordinates, f(X) = f(x1, . . . , xn) , then L is
represented by the 1 × n matrix

    f′(X0) = L = (fx1(X0), . . . , fxn(X0)).

c) Consequently, the directional derivative is simply the product of this matrix L with the
unit vector e , which can also be thought of as the scalar product of the vector f′(X0) and
the vector e ,

    De f(X0) = ⟨f′(X0), e⟩ .
Proof: This falls out of the definitions. First

    De f(X0) = lim_{λ→0} [f(X0 + λe) − f(X0)]/λ
             = lim_{λ→0} [f(X0 + λe) − f(X0) − L(λe) + L(λe)]/λ .

Since L(λe) = λLe and ‖λe‖ = |λ| , this is

             = lim_{‖λe‖→0} [f(X0 + λe) − f(X0) − L(λe)]/λ + Le.

Because f is differentiable at X0 , the first term tends to zero. This proves the first part.
To prove the last part, it is sufficient to observe that if e = ej is one of the coordinate
vectors, then by definition Dej f(X0) := fxj(X0) . Thus, if h is any vector h = (h1, . . . , hn) =
h1e1 + · · · + hnen , by the linearity of L we have

    Lh = L(h1e1 + · · · + hnen) = h1Le1 + · · · + hnLen
       = h1 fx1(X0) + · · · + hn fxn(X0)
       = (fx1(X0), . . . , fxn(X0)) · (h1, . . . , hn) .

Since h is any vector, we have shown L is represented by the given matrix.
Remark: The theorem states that if f is diﬀerentiable, then all the partial derivatives
exist and f (X0 ) := L is represented by the above matrix. It does not state that if the
partial derivatives exist, then f is diﬀerentiable. This is false (see Exercise 16). However,
if the partial derivatives of f exist and are continuous, then f is differentiable. The last
statement will be proved as Theorem 8.3.
Example: The same one worked before (p. 573). Find the directional derivative of f(X) =
x1² − 2x1x2 + 3x2 at X0 = (1, 0) in the direction η = (−1, 1) .
Since η is not a unit vector, we let e = η/‖η‖ = (−1/√2, 1/√2) . Now at a point X ,

    L = f′(X) = (fx1, fx2) = (2x1 − 2x2, −2x1 + 3).

In particular, at X = X0 = (1, 0) ,

    L = (2, −2 + 3) = (2, 1).

Therefore

    De f(X0) = Le = (2, 1) · (−1/√2, 1/√2) = −1/√2 ,

which checks with the answer found previously.
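Theorem 8.2 reduces every directional derivative to one matrix product. A quick numerical cross-check of this example (my sketch, not the notes'):

```python
import math

def f(x1, x2):
    return x1**2 - 2*x1*x2 + 3*x2

def fprime(x1, x2):
    # the 1 x 2 matrix L = (f_x1, f_x2) computed in the example
    return (2*x1 - 2*x2, -2*x1 + 3)

X0 = (1.0, 0.0)
e = (-1/math.sqrt(2), 1/math.sqrt(2))

# part c) of Theorem 8.2: De f(X0) = <f'(X0), e>
L = fprime(*X0)
De = L[0]*e[0] + L[1]*e[1]
assert abs(De - (-1/math.sqrt(2))) < 1e-12

# and this agrees with the difference quotient from the definition
lam = 1e-6
quot = (f(X0[0] + lam*e[0], X0[1] + lam*e[1]) - f(*X0)) / lam
assert abs(quot - De) < 1e-4
```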
Consider the mapping w = f (X ), X ∈ A ⊂ En , w ∈ E as deﬁning a surface M ⊂ En+1 .
It is now evident how to deﬁne the tangent plane to M at the point (X0 , f (X0 )) , where
X0 ∈ A .
Definition: Let f : A ⊂ En → E be a differentiable mapping, thus defining a surface M
with points (X, f(X)), X ∈ A . The tangent plane to M at the point (X0, f(X0)) , where
X0 ∈ A , is the surface defined by the affine mapping

    Φ(X) = f(X0) + f′(X0)(X − X0),

or

    Φ(X) = f(X0) + L(X − X0),    where L = f′(X0).

Thus, the tangent plane to the surface defined by f is merely the “affine part” of f at
X0 .
Example: Consider the function w = f(X) = 3 − x1² − x2² . This function defines a
paraboloid (see fig.). Let us find the tangent plane to this surface at (X0, f(X0)) , where
X0 = (1, −1) , so f(X0) = 3 − 1² − (−1)² = 1 . Also

    fx1(X) = −2x1 ,    fx2(X) = −2x2 .

Thus

    f′(X0) = (fx1(X0), fx2(X0)) = (−2, 2).

Since X − X0 = (x1, x2) − (1, −1) = (x1 − 1, x2 + 1) , we find the equation of the tangent
plane is

    Φ(X) = 1 + (−2, 2) · (x1 − 1, x2 + 1) = 1 − 2(x1 − 1) + 2(x2 + 1),

or

    Φ(X) = 5 − 2x1 + 2x2 .
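The tangent plane to this paraboloid can be tested numerically: it touches the surface at X0 , and the error f − Φ is quadratically small nearby. A sketch of the check (mine, not part of the notes):

```python
def f(x1, x2):
    # the paraboloid w = 3 - x1^2 - x2^2
    return 3 - x1**2 - x2**2

def Phi(x1, x2):
    # the tangent plane at X0 = (1, -1) found above
    return 5 - 2*x1 + 2*x2

X0 = (1.0, -1.0)
# the plane and the surface touch at X0
assert Phi(*X0) == f(*X0) == 1.0

# here f - Phi = -[(x1-1)^2 + (x2+1)^2], so the error equals the
# squared distance to X0 -- quadratically small near X0
for (x1, x2) in [(1.1, -1.0), (1.0, -0.99), (1.05, -1.02)]:
    err = abs(f(x1, x2) - Phi(x1, x2))
    dist2 = (x1 - X0[0])**2 + (x2 - X0[1])**2
    assert abs(err - dist2) < 1e-9
```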
This tangent plane is the unique plane with the property

    Φ(X0) = f(X0)  and  Φ′(X0) = f′(X0).

Although we have given necessary conditions that a function be differentiable (all directional
derivatives exist, in particular, all partial derivatives exist), we have not given suﬃcient
conditions. The next theorem gives suﬃcient conditions for a function to be continuously
diﬀerentiable.
Theorem 8.3 . Let f : A ⊂ En → E , where A is an open set. Then f is continuously
diﬀerentiable throughout A if and only if all the partial derivatives of f exist and are
continuous.
Proof: ⇒ If f is continuously differentiable, then the partial derivatives exist by Theorem
8.2. Furthermore, for any X and Y in A ,

    fxi(X) − fxi(Y) = ⟨f′(X), ei⟩ − ⟨f′(Y), ei⟩ = ⟨f′(X) − f′(Y), ei⟩ .

Thus, applying the Schwarz inequality we find

    |fxi(X) − fxi(Y)| ≤ ‖f′(X) − f′(Y)‖ .

The statement that f is continuously differentiable means the vector f′(X) is a continuous
function of X . Therefore, given any ε > 0 there is a δ > 0 such that ‖f′(X) − f′(Y)‖ < ε for
all ‖X − Y‖ < δ . The inequality above then shows |fxi(X) − fxi(Y)| is also less
than ε for the same δ . Consequently, fxi is continuous.
⇐ . A little more difficult. The idea is to use the mean value theorem for functions of
one variable. Let X and Y be points in A . To prove differentiability at X , it is sufficient
to restrict Y to being in some ball about X which is entirely in A (some such ball does exist
since A is open). For notational convenience, we take n = 2 . Then

    f(Y) − f(X) = [f(Y) − f(Z)] + [f(Z) − f(X)],

where Z is a point in A whose coordinates, except the first, are the same as X and
whose coordinates, except the second, are the same as Y . By the one variable mean value
theorem, there is a point X̃ between X and Z and a point X̂ between Y and Z such
that

    f(Z) − f(X) = fx1(X̃)(y1 − x1),    f(Y) − f(Z) = fx2(X̂)(y2 − x2).

Therefore

    f(Y) − f(X) = fx1(X̃)(y1 − x1) + fx2(X̂)(y2 − x2),

so

    f(Y) − f(X) − [fx1(X)(y1 − x1) + fx2(X)(y2 − x2)]
        = [fx1(X̃) − fx1(X)](y1 − x1) + [fx2(X̂) − fx2(X)](y2 − x2).

Therefore

    |f(Y) − f(X) − L(Y − X)| ≤ |fx1(X̃) − fx1(X)| |y1 − x1| + |fx2(X̂) − fx2(X)| |y2 − x2| ,

where we have written L = (fx1(X), fx2(X)) . Since |yj − xj| ≤ ‖Y − X‖ , we see that

    |f(Y) − f(X) − L(Y − X)| / ‖Y − X‖ ≤ |fx1(X̃) − fx1(X)| + |fx2(X̂) − fx2(X)| .

Because fx1 and fx2 are continuous and ‖X̃ − X‖ < ‖Y − X‖ , ‖X̂ − X‖ < ‖Y − X‖ ,
by making ‖Y − X‖ sufficiently small the right side of the above inequality can be made
arbitrarily small. This proves that the limit as ‖Y − X‖ → 0 of the expression on the left exists
and is zero. Since L is linear, the proof that f is differentiable is complete. The continuous
differentiability is an immediate consequence of the linearity of L and the continuity of its
components, the partial derivatives fxi .

Exercises
(1) i) Use the deﬁnition of the directional derivative to compute the given directional
derivatives, ii) Check your answer by computing the directional derivative using the
procedure of the Corollary to Theorem I.
(a) f(x1, x2) = 1 − 2x1 + 3x2 , at (2, −1) in the direction (3, 4) . [Answer: 6/5 ].
(b) f(x, y) = e^{x+2y} , at (3, −2) in the direction (1, 1) . [Answer: 3e^{−1}/√2 ].
(c) f(u, v, w) = 3uv + uw − v² , at (1, 1, 1) in the direction (1, −2, 2) .
(d) f(x, y) = 1 − 3y + xy , at (0, 6) in the direction (3/5, −4/5) .

(2) i) Compute all of the first and second partial derivatives for the following functions.
(a) f (x1 , x2 ) = x1 + x1 sin 2x1
(b) f(x1, x2, x3) = x1² x2 + 2x1x3 − √x3
(c) f (x, y ) = xy
(d) f (x1 , x2 , . . . , xn ) = a + a1 x1 + a2 x2 + . . . + an xn .
(e) f(x1, x2, . . . , xn) = Σ_{i,j=1}^{n} aij xi xj = ⟨X, AX⟩ , where aij = aji (first try the cases
n = 2 and n = 3 to see what is happening).

ii) Find the 1 × n matrix f′(X) .
(3) For the surfaces deﬁned by the functions f (X ) listed below, ﬁnd the equation of the
tangent plane to the surface at the point (X0 , f (X0 )) . Draw a sketch showing the
surface and its tangent plane.
(a) f(X) = x1² + 3x2² + 1 ,    X0 = (2, 1)
(b) f(X) = e^{x1 x2} ,    X0 = (0, 1)
(c) f(X) = x1² sin πx2 ,    X0 = (−1, 1/2)
(d) f(X) = −(1/2)x1 + x2 + 1 ,    X0 = (0, 0)
(e) f(X) = x1² + 2x2² − x1x3 + x1 ,    X0 = (1, −2, −1).
Why can’t you sketch the surface defined by this function?

(4) Let f(X) and g(X) both map A ⊂ En → E1 . If f and g are differentiable for all
X ∈ A , prove
(a) d/dX [af(X) + bg(X)] = a df/dX (X) + b dg/dX (X) (Linearity), where a and b are constants.
(b) d/dX [f(X)g(X)] = f(X) dg/dX (X) + g(X) df/dX (X)
(c) d/dX [f(X)/g(X)] = [g(X) df/dX (X) − f(X) dg/dX (X)] / g²(X) , if g(X) ≠ 0 .

(5) Use the rules (a–c) of Exercise 4 to compute d/dX [2f − 3g] , d/dX [f · g] , and d/dX [f/g] , where
f(X) = f(x1, x2) = 1 − x1 + x1x2 , and g(X) = g(x1, x2) = e^{x1−x2} .

(6) Let

    f(X) = f(x, y) = xy(x² − y²)/(x² + y²) for X = (x, y) ≠ 0 ,    f(0) = 0.

Prove
(a) f, fx, fy are continuous for all X ∈ E2 . [Hint: Prove and use 2|xy| ≤ x² + y² ].
(b) fxy and fyx exist for all X ∈ E2 , and are continuous except at the origin.
(c) fxy(0) = 1, fyx(0) = −1 , so fxy(0) ≠ fyx(0) (cf. Remark p. 577).
(7) Let f : A ⊂ En → E be a differentiable map. Prove it is necessarily continuous. [Hint:
This is a simple consequence of the definition in the form (8-1)].
(8) Let f : A ⊂ En → E be a continuous map. We say f has a local maximum at the
point X0 interior to A if f (X0 ) ≥ f (X ) for all X in some suﬃciently small ball
about X0 . If we assume f is continuously diﬀerentiable, more can be said.
(a) If f as above has a local maximum at the point X0 , prove f′(X0)(X − X0) +
R(X0, X − X0)‖X − X0‖ ≤ 0 for all X in some small ball about X0 .
(b) Use the property of the remainder R to conclude the stronger statement

    ⟨f′(X0), X − X0⟩ ≤ 0

for all X in some small ball about X0 .
(c) Observe the statement must also hold for the vector X0 − X , which points in
the direction opposite to X − X0 , to conclude

    ⟨f′(X0), X − X0⟩ ≥ 0,

and hence that in fact

    ⟨f′(X0), Z⟩ = 0

for all vectors Z = X − X0 .
(d) Finally, show that at a maximum,
f′(X0) = 0.
(9) (a) Find the equation of the plane which is tangent at the point X0 = (2, 6, 3) to
the surface consisting of the points (X, f (X )) , where
f (X ) = f (x, y, z ) = (x2 + y 2 + z 2 )1/2 . 320 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
(b) Use the tangent plane found above to ﬁnd the approximate value of
((2.01)2 + (5.98)2 + (2.99)2 )1/2 .
(10) Assume the continuously differentiable function f(X) has a zero derivative, f′(X) ≡
0 , for X in some ball in En . Prove that f(X) ≡ constant throughout the ball.
(11) (a) Show the following functions satisfy the two dimensional Laplace equation
    ∂²u/∂x² + ∂²u/∂y² = 0
i) u(x, y) = x² − y² − 3xy + 5y − 6
ii) u(x, y) = log(x² + y²) , except at the origin, (x, y) = 0.
iii) u(x, y) = e^x sin y
(b) Show the following functions satisfy the one (space) dimensional wave equation
    utt = c²uxx ,    c ≡ constant
[Here t is time and x is space; c is the velocity of light, sound, etc.]
i) u(x, t) = e^{x−ct} − 2e^{x+ct}
ii) u(x, t) = 2(x + ct)² + sin 2(x − ct).
(12) Let f : A ⊂ En → E be continuously differentiable throughout A . If X0 ∈ A is
not a critical point of f , so f′(X0) ≠ 0 , prove the directional derivative at X0 is
greatest in the direction emax := f′(X0)/‖f′(X0)‖ , and least in the opposite direction,
emin := −emax . [Hint: Use the Schwarz inequality.]
(13) Consider the function

    f(X) = f(x, y) = xy/(x² + y²) for X = (x, y) ≠ 0 ,    f(0) = 0.

Since f is the quotient of two continuous functions, it is continuous except possibly
at the origin, where the denominator vanishes. Show that f (X ) is not continuous at
the origin by ﬁnding lim f (X ) as X → 0 along paths 1 and 2, and showing that
    lim_{X→0, path 1} f(X) ≠ lim_{X→0, path 2} f(X).

(14) Let L be the partial differential operator defined by
    Lu = ∂²u/∂x² − 5 ∂²u/∂x∂y + 6 ∂²u/∂y² .

Show that
    L[e^{αx+βy}] = p(α, β) e^{αx+βy} ,
where p(α, β ) is a polynomial in α and β . Find a solution of the linear homogeneous
partial diﬀerential equation Lu = 0 . Find an inﬁnite number of solutions of Lu = 0 ,
one for each value of β , by choosing α to depend on β in a particular way. [Answer:
e^{2βx+βy} and e^{3βx+βy} are solutions for any β ].

(15) The two equations
x = eu cos v
y = eu sin v
define u = f(x, y) and v = g(x, y) . Find the functions f and g for x > 0 . Compute
f′(X) and g′(X) and show f′(X) ⊥ g′(X) .
(16) This exercise gives an example in which the ﬁrst partial derivatives of a function exist
but the function is not continuous, let alone diﬀerentiable. Let
    f(X) = f(x, y) = xy²/(x² + y⁴) for X = (x, y) ≠ 0 ,    f(0) = 0.

(a) If cos α ≠ 0 , prove the directional derivative at the origin in the direction e =
(cos α, sin α) exists and is

    De f(0) = sin²α / cos α ,

while if cos α = 0 ,

    De f(0) = 0.

(b) Prove f is discontinuous at the origin by showing lim_{X→0} f(X) has two different
values along the two paths in the figure. Then appeal to Exercise 7 to conclude
f is not diﬀerentiable.
(17) (a) Let P(X), X ∈ En , be a polynomial of degree N , that is,

    P(X) = Σ_{k1+k2+···+kn ≤ N} a_{k1,...,kn} x1^{k1} x2^{k2} · · · xn^{kn} ,

where k1, k2, . . . , kn are all non-negative integers. Prove P(X) is continuously
differentiable. [Hint: How do you prove a polynomial in one variable is continuously differentiable?]
(b) Let R(X ), X ∈ En , be a rational function - that is, the quotient of two polynomials. Prove R(X ) is continuously diﬀerentiable whenever the denominator is
not zero.
(18) If f : E1 → E1 , show that the deﬁnition of diﬀerentiability on page 578 coincides with
the usual one.

8.2 The Mean Value Theorem. Local Extrema.

Although the full “chain rule” will not be proved until Chapter 10, we shall need a very
special and elementary case to develop the main features of the theory of mappings from En
to E . Let f : A ⊂ En → E be a continuously diﬀerentiable function at all interior points
of A . Take X and Z to be ﬁxed interior points of A . Let φ(t) = f (X + tZ ) . We want
to compute

    d/dt φ(t) = d/dt f(X + tZ) ,

that is, the rate of change of f at the point X + tZ as the point varies along the line joining
X to X + Z .
Theorem 8.4 . Let f : A → E be a diﬀerentiable function throughout A . If X and Z
are two interior points of A , and if the line segment joining them is in A , then
    d/dt f(X + tZ) = f′(X + tZ)Z ,    t ∈ (0, 1).

By the product f′(Y)Z we mean matrix multiplication.
Proof: For fixed X and Z , the function φ(t) := f(X + tZ) is an ordinary scalar valued
function of the one variable t . Thus

    d/dt φ(t) = lim_{λ→0} [φ(t + λ) − φ(t)]/λ
              = lim_{λ→0} [f(X + tZ + λZ) − f(X + tZ)]/λ
              = lim_{λ→0} [f(X + tZ + λZ) − f(X + tZ) − f′(X + tZ)(λZ) + f′(X + tZ)(λZ)]/λ .

Since f is differentiable at X + tZ , as λ → 0 the first three terms together tend to zero. The
factor λ in the last term cancels. Therefore

    d/dt f(X + tZ) = lim_{λ→0} f′(X + tZ)Z = f′(X + tZ)Z ,

as claimed.
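Theorem 8.4 is easy to exercise numerically on any smooth function. This sketch (with a test function of my own choosing, not from the notes) compares the predicted derivative f′(X + tZ)Z with a difference quotient of φ:

```python
import math

def f(x1, x2):
    # a hypothetical smooth test function (not from the notes)
    return math.sin(x1) + x1*x2**2

def fprime(x1, x2):
    # its gradient, computed by hand
    return (math.cos(x1) + x2**2, 2*x1*x2)

X = (0.3, 0.7)
Z = (1.0, -2.0)

def phi(t):
    return f(X[0] + t*Z[0], X[1] + t*Z[1])

t0 = 0.4
# Theorem 8.4: phi'(t0) = f'(X + t0 Z) Z  (a matrix product)
g = fprime(X[0] + t0*Z[0], X[1] + t0*Z[1])
predicted = g[0]*Z[0] + g[1]*Z[1]

h = 1e-6
numeric = (phi(t0 + h) - phi(t0 - h)) / (2*h)
assert abs(numeric - predicted) < 1e-6
```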
An easy consequence is
Theorem 8.5 (The Mean Value Theorem). Let f : A → E , where A is an open convex
set in En ; that is, if X and Y are any points in A , then the straight line segment joining
X and Y is in A too. If f is differentiable in A , there is a point Z on the segment
joining X and Y such that

    f(Y) − f(X) = f′(Z)(Y − X).

If, moreover, f′ is bounded by some constant C , ‖f′(X)‖ ≤ C for all X ∈ A , then

    |f(Y) − f(X)| ≤ C ‖Y − X‖ .

(a figure goes here)
Proof: Every point on the segment joining X and Y is of the form X + t(Y − X ) , where
t ∈ [0, 1] . Consider the function φ(t) of one variable,
φ(t) = f (X + t(Y − X )).
Theorem 8.4 states φ is differentiable. Therefore, by the one variable mean value theorem,
there is a number t0 in the interval (0, 1) such that φ(1) − φ(0) = φ′(t0) . But φ(1) =
f(Y), φ(0) = f(X) and, by Theorem 8.4, φ′(t0) = f′(X + t0(Y − X))(Y − X) . Letting
Z = X + t0(Y − X) , a point on the segment joining X to Y , we conclude

    f(Y) − f(X) = f′(Z)(Y − X).

The second part of the theorem follows by applying the Schwarz inequality to the function
f′(Z)(Y − X) , which can be written as ⟨f′(Z), Y − X⟩ . Then

    |⟨f′(Z), Y − X⟩| ≤ ‖f′(Z)‖ ‖Y − X‖ .

Therefore, if ‖f′(Z)‖ ≤ C for all Z ∈ A , we find

    |f(Y) − f(X)| ≤ C ‖Y − X‖ .
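The bound |f(Y) − f(X)| ≤ C‖Y − X‖ can be exercised numerically. In this sketch (my own test function, not the notes'), f(x, y) = sin x cos y has ‖f′‖² = cos²x cos²y + sin²x sin²y ≤ 1 everywhere, so C = 1 works:

```python
import math
import random

def f(x, y):
    # hypothetical test function; its gradient norm never exceeds 1
    return math.sin(x) * math.cos(y)

C = 1.0   # a bound for ||f'(X)|| over all of E2

random.seed(0)
for _ in range(200):
    X = (random.uniform(-2, 2), random.uniform(-2, 2))
    Y = (random.uniform(-2, 2), random.uniform(-2, 2))
    dist = math.hypot(Y[0] - X[0], Y[1] - X[1])
    # the mean value theorem's consequence, with a tiny rounding allowance
    assert abs(f(*Y) - f(*X)) <= C * dist + 1e-12
```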
(by a connected open set we mean it is possible to join any two points in A by a polygonal
curve contained in A). If f (X ) ≡ 0 for every X ∈ A , that is, if fx1 (X ) = . . . = fxn (X ) =
0 , then f (X ) ≡ c , c a constant.
Proof: If A is convex, say a ball, this is an immediate consequence of the second part of
the mean value theorem, for f (X ) = 0 so |f (Y ) − f (X )| = 0 . Thus f (Y ) = f (X ) =
constant for any two points X and Y . The requirement that A is connected is to exclude
the possibility that A consists of two (or more) disjoint sets, in which case, all we can
conclude is that f is constant on each connected part, but not necessarily the same constant.
However, if A is connected, then any two points in A can be joined by a polygonal curve
which is contained in A . Consider some straight line segment in this curve. By the mean
value theorem, f must be constant on it. In particular, it has the same value at both
end points. Checking the beginning and end of the whole polygonal curve, we ﬁnd that
f (X ) = f (Y ) . Because X and Y were any points, we are done.
It is not at all diﬃcult to generalize the mean value theorem to Taylor’s theorem and
then to power series for functions of several variables. The only problem is one of notation,
and that is a problem. As a compromise, we will prove the Taylor theorem - but only the
ﬁrst two terms for functions of three variables f (x, y, z ) .
Just as in the mean value theorem, the idea is to reduce the problem to a function
φ(t) of one real variable, because we do know the result for these functions. Let f be
diﬀerentiable in some open set A ⊂ E3 and X0 a point in A . If X0 + h is also in A , we
would like to express f (X0 + h) in terms of f and its derivatives at X0 . Fix X0 and h
and consider the real valued function φ(t) of one variable deﬁned by
    φ(t) = f(X0 + th) ,    t ∈ [0, 1].

Then by Theorem 8.4,

    φ′(t) = f′(X0 + th)h = fx(X0 + th)h1 + fy(X0 + th)h2 + fz(X0 + th)h3 ,
where h = (h1 , h2 , h3 ) . Since each of the partial derivatives are maps from A to E , they
can be diﬀerentiated in the same way f was. So can a sum of such functions. Thus
    φ″(t) = d/dt [fx(X0 + th)h1 + fy(X0 + th)h2 + fz(X0 + th)h3]

          = fxx(X0 + th)h1h1 + fxy(X0 + th)h1h2 + fxz(X0 + th)h1h3
          + fyx(X0 + th)h2h1 + fyy(X0 + th)h2h2 + fyz(X0 + th)h2h3
          + fzx(X0 + th)h3h1 + fzy(X0 + th)h3h2 + fzz(X0 + th)h3h3 .
If we introduce a matrix H(X) , the Hessian matrix, whose elements are ∂²f(X)/∂xi∂xj ,
then φ″(t) can be written as

    φ″(t) = ⟨h, H(X0 + th)h⟩ .
We remark that if f is suﬃciently diﬀerentiable (two continuous derivatives is enough),
then the Hessian matrix is self-adjoint since fxi xj = fxj xi , as we mentioned - but did not
prove - earlier. If φ(t) is twice differentiable, by Taylor’s theorem for functions of one
variable, we know that

    φ(1) = φ(0) + φ′(0) + (1/2!) φ″(τ) ,    τ ∈ (0, 1).

Substituting into this formula, we find
    f(X0 + h) = f(X0) + f′(X0)h + (1/2!) ⟨h, H(X0 + τh)h⟩ .

Let us summarize. We have proved

Theorem 8.7 (Taylor’s Theorem with two terms) . Let f : A → E , where A is
Let us summarize. We have proved Theorem 8.7 (Taylor’s Theorem with two terms) . Let f : A → E , where A is
an open connected set in En . Assume f has two continuous derivatives - that is, all the
second partial derivatives of f exist and are continuous. If X0 is in A and X0 + h is in
a ball about X0 in A , then
1
h, H (X0 + τ h)h ,
2! f (X0 + h) = f (X0 ) + f (X0 )h + ∂2f
)) is the n × n Hessian matrix and τ ∈ (0, 1) .
∂x1 ∂xj
Letting X = X0 + h and Z = X0 + τ h, Z being a point on the line segment joining
X0 to X , this reads where H (X ) = (( f (X ) = f (X0 ) + f (X0 )(X − X0 ) + 1
X − X0 , H (Z )(X − X0 ) ,
or, in more detail,

    f(X) = f(X0) + Σ_{i=1}^{n} [∂f(X0)/∂xi](xi − xi⁰) + (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} [∂²f(Z)/∂xi∂xj](xi − xi⁰)(xj − xj⁰).
Example: Find the first two terms in the Taylor expansion for the function f(X) =
f(x, y) = 5 + (2x − y)³ about the point X0 = (1, 3) .
We compute
    fx(X) = 6(2x − y)² ,    fy(X) = −3(2x − y)²
    fxx(X) = 24(2x − y) ,    fxy(X) = fyx(X) = −12(2x − y) ,    fyy(X) = 6(2x − y).

Therefore f(X0) = 4, fx(X0) = 6, fy(X0) = −3 , so

    f(X) = 4 + (6, −3) · (x − 1, y − 3)
           + (1/2) (x − 1, y − 3) [  24(2ξ − η)   −12(2ξ − η) ; −12(2ξ − η)   6(2ξ − η) ] (x − 1, y − 3)ᵀ ,

where Z = (ξ, η) is a point on the segment between X0 = (1, 3) and X = (x, y) . Written
out, the above equation reads

    f(x, y) = 4 + 6(x − 1) − 3(y − 3) + (1/2)[fxx (x − 1)² + 2fxy (x − 1)(y − 3) + fyy (y − 3)²],

where the second derivatives are evaluated at Z = (ξ, η) .
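The first-order (affine) part of this expansion can be tested numerically: the error f(X) − [4 + 6(x−1) − 3(y−3)] should shrink like the square of the step, which is exactly what the Hessian remainder term accounts for. A sketch of the check (mine, not part of the notes):

```python
def f(x, y):
    # the example: f(x, y) = 5 + (2x - y)^3
    return 5 + (2*x - y)**3

X0 = (1.0, 3.0)
fX0 = 4.0              # 5 + (2*1 - 3)^3 = 5 - 1
grad = (6.0, -3.0)     # (6(2x-y)^2, -3(2x-y)^2) at X0, where 2x - y = -1

def affine_part(x, y):
    return fX0 + grad[0]*(x - X0[0]) + grad[1]*(y - X0[1])

# shrinking the step h by 10 shrinks the error by roughly 100: it is O(h^2)
errs = []
for h in [1e-1, 1e-2, 1e-3]:
    X = (X0[0] + h, X0[1] + h)
    errs.append(abs(f(*X) - affine_part(*X)))
assert errs[0] / errs[1] > 50
assert errs[1] / errs[2] > 50
```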
We are now in a position to examine the extrema of functions of several variables.
Finding the maxima and minima of functions is important for several reasons. First of all,
there is the vague emotional feeling that all patterns of action should maximize or minimize
something. Second, we can investigate a complicated geometrical object by the relatively
easy procedure of ﬁnding the local maxima and minima. Without further mention, for the
balance of this section f (X ) will be a twice continuously diﬀerentiable function which maps
the open set A ⊂ En into E .
Definition: A function f : A → E has a local maximum at the interior point X0 ∈ A if,
for all X in some open ball about X0
f (X ) ≤ f (X0 ).
f has a local minimum at X0 if for all X in some open ball about X0
f (X ) ≥ f (X0 ).
If f has a local maximum or minimum at X0 , is f′(X0) = 0 ? Certainly.
Theorem 8.8 . If f has a local maximum or minimum at X0 , then f′(X0) = 0 . In
coordinates, this means all the partial derivatives vanish at X0 ,

    ∂f/∂x1 (X0) = ∂f/∂x2 (X0) = · · · = ∂f/∂xn (X0) = 0.
Proof: Let η be any fixed vector. Then the function φ(t) of one variable

   φ(t) = f(X0 + tη)

has a local maximum or minimum at t = 0. Consequently φ'(0) = 0. But by Theorem 4,
φ'(0) = f'(X0)η, which we may write as ⟨f'(X0), η⟩. Thus ⟨f'(X0), η⟩ = 0, so the vector
f'(X0) is orthogonal to η. Since η was an arbitrary vector, we conclude that f'(X0) = 0.
The derivative f'(X0) may vanish at points other than maxima or minima. An example
is the “saddle point” of the hyperbolic paraboloid at the beginning of Section 1. All points
where f' vanishes are called critical points or stationary points of f. Let us give a precise
definition of a saddle point: f has a saddle point at X0 if X0 is a critical point of f
and if every ball about X0 contains points X1 and X2 such that f(X1) < f(X0) and
f(X2) > f(X0). Thus, every critical point is either a local maximum, minimum, or saddle
point.
There is a more intuitive way to prove Theorem 7. If e is a unit vector, then by
Theorem 2, the directional derivative at X in the direction e is De f(X) = ⟨f'(X), e⟩. In
what way should you move so f increases fastest? By the Schwarz inequality, we find

   |De f(X)| ≤ ‖f'(X)‖ ‖e‖ = ‖f'(X)‖,

326 CHAPTER 8. MAPPINGS FROM EN TO E: THE DIFFERENTIAL CALCULUS

with equality if and only if the vectors e and f'(X) are parallel. Thus, the directional
derivative is largest when e has the same direction as f'(X), and smallest when e has the
opposite direction:

   emax = f'(X)/‖f'(X)‖,   emin = −emax,
   Demax f(X) = ‖f'(X)‖,   Demin f(X) = −‖f'(X)‖.

If X0 is a local maximum of f, then f'(X0) must be zero, for otherwise you could move in
the direction of f'(X0) and increase the value of f. Similarly, if X0 is a local minimum,
f'(X0) must be zero.
Once we know X0 is a critical point of f, f'(X0) = 0, an effective criterion is needed
to determine whether X0 is a local maximum, minimum, or saddle point for f. In elementary
calculus, the sign of the second derivative was used. Our next theorem generalizes this test.
The idea is essentially the same as in the one variable case (p. 104a-c). If f has a
local maximum or minimum, the tangent plane to the surface whose points are (X, f(X)) is
horizontal, that is, f'(X0) = 0. Thus, near X0 the quadratic terms, the next lowest powers
in the Taylor expansion of f about X0, will determine the behavior of f near X0. Let
X0 be the origin and take f(X) = f(x, y) to be a function of two variables with f(0) = 0.
Then near X0 = 0, by Taylor's theorem, we have

   f(x, y) ∼ (1/2)[ax² + 2bxy + cy²],

where a = fxx(0), b = fxy(0), and c = fyy(0). The nature of the quadratic form

   Q(X) = ax² + 2bxy + cy²

has already been determined. If Q(X) is positive definite, then Q(X) > 0 for X ≠ 0.
Since f(x, y) ∼ (1/2)Q(X), this means f(x, y) is positive near the origin. Because
f(0, 0) = 0, this implies the origin is a minimum for f.
Instead of completing and rigorously justifying this special case, we shall immediately
treat the general situation.
Theorem 8.9. Assume the twice continuously differentiable function f : A → E has a
critical point at an interior point X0 of A ⊂ En, f'(X0) = 0. Let H(X0) be the Hessian
matrix ( ∂²f/∂xi∂xj(X0) ) evaluated at X0.

(a) If H(X0) is positive definite, then f has a local minimum at X0.
(b) If H(X0) is negative definite, then f has a local maximum at X0.
(c) If at least two of the diagonal elements of H(X0), fx1x1(X0), . . . , fxnxn(X0), have
different signs, then X0 is a saddle point.
(d) Otherwise the test fails.

Proof: If X0 is a critical point for f, then Taylor's theorem (Theorem 6) states

   f(X0 + η) = f(X0) + (1/2)⟨η, H(Z)η⟩,

where Z is between X0 and X0 + η. The linear term has been dropped since f'(X0) = 0.

As in the proof of Taylor's theorem, let
φ(t) = f (X0 + tη ).
Then
   φ''(t) = ⟨η, H(X0 + tη)η⟩.
Since the second derivatives of f are assumed to be continuous, the function φ''(t) is a
continuous function of t. Consequently, if φ''(0) is positive, then φ''(t) is also positive
for all t sufficiently close to zero (Theorem I, p. 29b). Because φ''(0) = ⟨η, H(X0)η⟩ and
φ''(τ) = ⟨η, H(Z)η⟩, where Z = X0 + τη, this implies that if H is positive definite at X0,
it is also positive definite at Z when Z is close to X0.

Assuming H(X0) is positive definite, we see that for all η sufficiently small, H(Z) is
positive definite. Therefore,

   f(X0 + η) − f(X0) = (1/2)⟨η, H(Z)η⟩ > 0,   η ≠ 0,

that is,

   f(X0 + η) − f(X0) > 0

for all η in some small ball about the origin. Thus f has a local minimum at X0.
If H(X0) is negative definite, the same proof with trivial modifications works. Another
way to complete the proof is to apply part (a) to the function g(X) := −f(X). The Hessian
for g at X0 will be −H(X0), which is positive definite (since H(X0) was negative definite).
Thus g has a local minimum at X0, so f = −g has a local maximum at X0.

If any two of the diagonal elements of H(X0) have opposite sign, say fx1x1(X0) > 0
and fx2x2(X0) < 0, then for η = λe1 = (λ, 0, 0, . . . , 0), λ any real number, we find
⟨η, H(X0)η⟩ = λ²fx1x1(X0) > 0, while for η = λe2 = (0, λ, 0, . . . , 0), ⟨η, H(X0)η⟩ =
λ²fx2x2(X0) < 0. Therefore the quadratic form ⟨η, H(X0)η⟩ assumes positive and negative
values in any ball about X0, proving X0 is a saddle point.
Since this theorem reduces the investigation of the nature of a critical point to testing
whether a matrix is positive or negative definite, it is worthwhile in this context to repeat
Theorem A (p. 386d), which tells us when a 2 × 2 matrix is positive definite.
Corollary 8.10. Let X0 be a critical point for the function of two variables f(x, y), with
Hessian matrix

   H(X0) = [ fxx(X0)  fxy(X0) ]
           [ fxy(X0)  fyy(X0) ].

(a) If det H(X0) > 0 and fxx(X0) > 0, then f has a local minimum at X0.
(b) If det H(X0) > 0 and fxx(X0) < 0, then f has a local maximum at X0.
(c) If det H(X0) < 0, then f has a saddle point at X0 (this is a stronger statement
than part (c) of Theorem 8).

Proof: Since these merely join Theorem A (p. 386d) with Theorem 8, the proof is done.
Examples:

(1) Find and classify the critical points of the function w = f(x, y) := 3 − x² − 4y² + 2x.
A sketch of the surface with points (x, y, f(x, y)), a paraboloid, is at the right. At a
critical point f'(X) = 0, that is, fx = 0, fy = 0. Since

   fx = −2x + 2,   fy = −8y,

at a critical point

   −2x + 2 = 0,   −8y = 0.

There is therefore only one critical point, X0 = (1, 0). We look at the Hessian to
determine the nature of the critical point. Because fxx = −2, fxy = fyx = 0, fyy = −8,

   H(X0) = [ −2   0 ]
           [  0  −8 ].

Since det H(X0) = 16 > 0 and fxx(X0) = −2 < 0, H(X0) is negative definite, so
X0 = (1, 0) is a local maximum for the function, and at that point f(X0) = 4.
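The test in Corollary 8.10 is mechanical enough to automate. The sketch below (all function names are ours, not from the text) estimates fxx, fxy, fyy by central differences and applies the determinant criterion; on this example it reports a local maximum at (1, 0):

```python
# Second-derivative test (Corollary 8.10) with central-difference Hessians.

def hessian(f, x, y, h=1e-4):
    # central-difference estimates of the three second partials
    fxx = (f(x + h, y) - 2*f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2*f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4*h**2)
    return fxx, fxy, fyy

def classify(f, x, y):
    # apply the determinant test of Corollary 8.10 at a critical point (x, y)
    fxx, fxy, fyy = hessian(f, x, y)
    det = fxx*fyy - fxy**2
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "test fails"

# Example (1): f(x, y) = 3 - x^2 - 4y^2 + 2x has a critical point at (1, 0).
print(classify(lambda x, y: 3 - x**2 - 4*y**2 + 2*x, 1.0, 0.0))
```

Running it on the hyperbolic paraboloid −x² + y² at the origin reports a saddle point, in agreement with the next example.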
(2) Find and classify the critical points of w = f(x, y) = −x² + y².
The surface (x, y, f(x, y)) is a hyperbolic paraboloid. We expect a saddle point at
the origin. At a critical point

   fx = −2x = 0,   fy = 2y = 0.

Thus the origin (0, 0) is the only critical point. Since

   H(x, y) = [ −2  0 ]
             [  0  2 ],

and det H(0, 0) = −4 < 0, the origin is a saddle point. This also follows from the
observation that the diagonal elements have different signs.
(3) Find and classify the critical points of

   w = f(x, y) = [x² + (y + 1)²][x² + (y − 1)²].

At a critical point,

   fx = 2x[x² + (y − 1)²] + 2x[x² + (y + 1)²] = 0

and

   fy = 2(y + 1)[x² + (y − 1)²] + 2(y − 1)[x² + (y + 1)²] = 0.

The first equation implies x = 0. Substituting this into the second we find y = 0,
y = 1, y = −1. Thus there are three critical points,

   X1 = (0, 0),   X2 = (0, 1),   X3 = (0, −1).

We must evaluate the Hessian matrix at these points. Since

   fxx = 12x² + 4y² + 4,   fxy = 8xy,   fyy = 4x² + 12y² − 4,

   H(X1) = [ 4   0 ]      H(X2) = [ 8  0 ] = H(X3).
           [ 0  −4 ],             [ 0  8 ]

Because det H(X1) = −16 < 0, X1 = (0, 0) is a saddle point. Because det H(X2) =
det H(X3) = 64 > 0 and fxx(X2) = fxx(X3) = 8 > 0, both X2 = (0, 1) and
X3 = (0, −1) are local minima. To complete the computation, we find f(X1) = 1,
f(X2) = 0, f(X3) = 0. A sketch of the surface is at the right.

(4) Find and classify the critical points of
   w = f(x, y, z) = 1 − 2x + 3x² − xy + xz − z² + 4z + y² + 2yz.

At a critical point,

   fx = −2 + 6x − y + z = 0,   fy = −x + 2y + 2z = 0,   fz = x + 2y − 2z + 4 = 0.

Solving these equations, we find only one critical point, X0 = (0, −1, 1), where
f(X0) = 3. Since

   fxx = 6,   fxy = −1,   fxz = 1,   fyy = 2,   fyz = 2,   fzz = −2,

then

   H(X) = [  6  −1   1 ]
          [ −1   2   2 ]
          [  1   2  −2 ].

Because the diagonal elements 6, 2, −2 are not all of the same sign, by part (c) of the
theorem, the critical point X0 = (0, −1, 1) is a saddle point.
(5) Find and classify the critical points of w = f(x, y) := x²y². At a critical point,

   fx = 2xy² = 0,   fy = 2x²y = 0.

Thus the points where either x = 0 or y = 0 are all critical points. Since

   fxx = 2y²,   fxy = 4xy,   fyy = 2x²,

we find

   H(X) = [ 2y²  4xy ]
          [ 4xy  2x² ].

If either x = 0 or y = 0, then det H = 0, so none of our tests apply to determine the
nature of the critical point. However, a glance at the function f(x, y) = x²y² reveals that
all of the points where either x = 0 or y = 0 are clearly local minima, since f = 0 there,
while f > 0 elsewhere.

Exercises
(1) Find and classify the critical points of the following functions.
(a) f(x, y) = x² − 3x + 2y² + 10
(b) f(x, y) = 3 − 2x + 2y + x²y²
(c) f(x, y) = [x² + (y + 1)²][4 − x² − (y − 1)²]
(d) f(x, y) = x³ − 3xy² (figure on next page)
(e) f(x, y) = xy − x + y + 2
(f) f(x, y) = x cos y
(g) f(x, y, z) = 2x² + 3xz + 5z² + 4y − y² + 7
(h) f(x, y, z) = 5x² + 4xy + 2y² + z² − 4z + 31
(2) Let X1, . . . , XN be N distinct points in En. Find a point X ∈ En such that the
function

   f(X) = ‖X − X1‖² + · · · + ‖X − XN‖²

is a minimum. [Answer: X = (1/N) Σ_{j=1}^N Xj, the center of gravity.]

(3) (a) Find the minimum distance from the origin in E3 to the plane 2x + y − z = 5.
(b) Find the minimum distance from the origin in En to the hyperplane a1 x1 +
a2 x2 + · · · + an xn = c .
(c) Find the minimum distance between the fixed point X0 = (x̃1, . . . , x̃n) and the
hyperplane a1x1 + · · · + anxn = c.
(d) Find the minimum distance between the two parallel planes a1x1 + · · · + anxn = c1
and a1x1 + · · · + anxn = c2.
(4) If f(x, y) has two continuous derivatives, use Taylor's Theorem (Theorem 6) to prove

   f(x + h1, y + h2) = f(x, y) + fx(x, y)h1 + fy(x, y)h2
       + (1/2)[fxx(x, y)h1² + 2fxy(x, y)h1h2 + fyy(x, y)h2²] + (h1² + h2²)R,

where R depends on x, y, h1, and h2, and R → 0 as h1, h2 → 0.
(5) (a) If u(x, y) has two continuous derivatives, use the result of Exercise 4 to prove

   uxx(x, y) = [u(x + h1, y) − 2u(x, y) + u(x − h1, y)]/h1² + h1R̃

and

   uyy(x, y) = [u(x, y + h2) − 2u(x, y) + u(x, y − h2)]/h2² + h2R̂,

where R̃ → 0 as h1 → 0 and R̂ → 0 as h2 → 0.

(b) Use part (a) to deduce that if h1 = h2 = h, then
   uxx(x, y) + uyy(x, y)
      = (4/h²)[ (u(x + h, y) + u(x − h, y) + u(x, y + h) + u(x, y − h))/4 − u(x, y) ] + hR,   (8-2)

where R → 0 as h → 0.
(c) Use part (b) to deduce that if h is small, the solution of the partial differential
equation uxx + uyy = 0, Laplace's equation, approximately satisfies the difference
equation

   u(x, y) = [u(x + h, y) + u(x − h, y) + u(x, y + h) + u(x, y − h)]/4.

This difference equation states that the value of u at the center of a cross equals
the arithmetic mean ("average") of its values at the four ends of the cross. One
could use the difference equation to solve Laplace's equation numerically.

(d) Prove that any function which satisfies the above difference equation in some set
cannot have a maximum or minimum inside that set. [Do not differentiate! Reason
directly from the difference equation. No computation is necessary.]
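The numerical method hinted at in part (c) can be sketched in a few lines: repeatedly replace each interior grid value by the average of its four neighbours (Jacobi iteration). Here the boundary data come from the harmonic function x² − y², which satisfies the five-point difference equation exactly, so the iteration should reproduce it inside the square. Grid size, sweep count, and all names below are our own illustrative choices:

```python
# Solve the discrete Laplace equation on an n x n grid by averaging sweeps.

def solve_laplace(boundary, n, sweeps=2000):
    # boundary(i, j) prescribes u on the edge of the grid; interior starts at 0
    u = [[boundary(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # the difference equation: centre = mean of the four neighbours
                new[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
        u = new
    return u

n = 11
h = 1.0 / (n - 1)
u = solve_laplace(lambda i, j: (i*h)**2 - (j*h)**2, n)
# interior values converge to x^2 - y^2; at the centre that is 0
print(u[5][5])
```

The convergence of this averaging process is exactly the maximum-principle phenomenon of part (d): no interior value can ever exceed the boundary values.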
(6) If all the second partial derivatives of a function f (X ) vanish identically in some open
connected set, prove that f is an aﬃne function.
(7) (The Method of Least Squares). Let Z1, . . . , ZN be N distinct points in En, and
w1, . . . , wN a set of N numbers. We imagine the points (Zj, wj) ∈ En+1 to be points
on a surface M in En+1. Find a hyperplane

   w = φ(X) = c + ξ1x1 + · · · + ξnxn ≡ c + ⟨ξ, X⟩

which most closely approximates the surface M in the sense that the error

   E(ξ) := Σ_{j=1}^N |φ(Zj) − wj|²

is minimized. Note that you are to find the coefficients ξ1, . . . , ξn in the equation of
the hyperplane.
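In the simplest case n = 1 the minimization can be carried out explicitly: setting ∂E/∂c = ∂E/∂ξ = 0 gives two linear "normal equations" for c and ξ. A sketch (all names ours):

```python
# Least-squares line w = c + xi*z through data (z_j, w_j):
# minimise E = sum (c + xi*z_j - w_j)^2.  The conditions dE/dc = dE/dxi = 0
# give the normal equations
#     N c      + (sum z)   xi = sum w
#     (sum z) c + (sum z^2) xi = sum z w

def least_squares_line(zs, ws):
    N = len(zs)
    Sz, Sw = sum(zs), sum(ws)
    Szz = sum(z*z for z in zs)
    Szw = sum(z*w for z, w in zip(zs, ws))
    det = N*Szz - Sz*Sz            # nonzero when the z_j are distinct
    c  = (Sw*Szz - Sz*Szw) / det
    xi = (N*Szw - Sz*Sw) / det
    return c, xi

# Data lying exactly on w = 2 + 3z are recovered exactly.
c, xi = least_squares_line([0, 1, 2, 3], [2, 5, 8, 11])
print(c, xi)    # 2.0 3.0
```

For general n the same idea yields an (n + 1) × (n + 1) linear system; that is the content of the exercise.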
(8) (a) Let u(x, y ) be a twice continuously diﬀerentiable function which satisﬁes the
partial diﬀerential equation
Lu := uxx + uyy + aux + buy − cu = 0
in some open set D , where the coeﬃcients a(x, y ), b(x, y ) , and c(x, y ) are continuous functions. If c > 0 throughout D , prove that u(x, y ) cannot have a
positive maximum or negative minimum anywhere in D .
(b) Extend the result of part (a) to functions u(x1, . . . , xn) which satisfy

   Lu := Σ_{i=1}^n ∂²u/∂xi² + Σ_{j=1}^n aj ∂u/∂xj − cu = 0

in some open set D, where c > 0 throughout D.
(c) If u(x, y ) satisﬁes the equation of part a) and u vanishes on the boundary of
D, u ≡ 0 on ∂D , prove that u(x, y ) ≡ 0 throughout D .
(d) Assume u(x, y ) and v (x, y ) both satisfy the same equation Lu = 0, Lv = 0 ,
where L is the operator of part a). If u(x, y ) ≡ v (x, y ) on the whole boundary
of D , prove that u(x, y ) ≡ v (x, y ) throughout the interior of D .
(9) Let f be a twice continuously diﬀerentiable function throughout the open set A .
Prove that
(a) if f has a local minimum at X0 ∈ A , then its Hessian H (X0 ) is positive deﬁnite
or semi-deﬁnite there.
(b) if f has a local maximum at X0 ∈ A , then its Hessian H (X0 ) is negative
definite or semi-definite there.
(10) Let A be a square n × n self-adjoint matrix and Y a fixed vector in En, and let

   f(X) = ⟨X, AX⟩ − 2⟨X, Y⟩.

(a) If f(X) has a critical point at X0, prove X0 satisfies the equation

   AX0 = Y.

(b) If A is positive definite and X0 satisfies the equation AX0 = Y, prove f(X)
defined above has a minimum at X0. [The results of this problem remain valid
if A is any positive definite linear operator, possibly a differential operator.
The nonlinear function f(X) defines a variational problem associated with the
equation AX = Y.]
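Exercise (10) can be watched numerically: since the gradient of f is 2(AX − Y), gradient descent on f should settle at the solution of AX = Y when A is positive definite. The matrix, step size, and iteration count below are illustrative choices of ours, not from the text:

```python
# Minimising f(X) = <X, AX> - 2<X, Y> by gradient descent; the minimiser
# should satisfy A X = Y.  Here A is symmetric positive definite.

A = [[2.0, 1.0], [1.0, 3.0]]
Y = [1.0, 2.0]

def grad(X):
    # gradient of f is 2(AX - Y)
    return [2*(sum(A[i][j]*X[j] for j in range(2)) - Y[i]) for i in range(2)]

X = [0.0, 0.0]
for _ in range(500):
    g = grad(X)
    X = [X[i] - 0.1*g[i] for i in range(2)]

print(X)    # close to the solution (1/5, 3/5) of AX = Y
```

Minimizing f instead of solving AX = Y directly is the "variational" viewpoint mentioned in the bracketed remark; for differential operators it is often the only practical route.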
(11) If f : A → E has three continuous derivatives in the open set A ⊂ E2 containing the
origin, state precisely and prove Taylor's Theorem with three terms about the origin.
The resulting expression will be

   f(X) := f(x, y) = f(0) + fx(0)x + fy(0)y + (1/2!)[fxx(0)x² + 2fxy(0)xy + fyy(0)y²]
       + (1/3!)[fxxx(Z)x³ + 3fxxy(Z)x²y + 3fxyy(Z)xy² + fyyy(Z)y³],

where Z is on the line segment between 0 and X = (x, y).
(12) If u(x, y ) has the property uxy (x, y ) = 0 for (x, y ) in some open set, prove u(x, y ) =
φ(x) + ψ (y ) , where φ and ψ are functions of one variable.
(13) Compute the direction(s) at X0 in which the following functions f
i) increase most rapidly,
ii) decrease most rapidly,
iii) remain constant.

(a) f(x1, x2) = 3 − 2x1 + 5x2 at X0 = (1, −2)
(b) f(x, y) = e^(2x+y) at X0 = (2, 1)
(c) f(x, y, z) = 2x² + 3xy + 5z² + 4y − y² + 7 at X0 = (1, 0, −1)
(d) f(u, v) = uv − u + v + 2 at X0 = (−1, 1).

8.3 The Vibrating String

Waves. You have been hearing about them your whole life. "Waves" is the term used to
describe the oscillatory behavior of continuous media, water waves and sound waves being
the most familiar. We shall give a mathematical description of a very simple type of wave:
those in an oscillating violin string. The resulting mathematical model will be a second
order linear partial differential equation, the wave equation, with both initial and boundary
conditions.

a) The Mathematical Model

Consider a string of length ℓ stretched along the x axis. Imagine the string vibrating
in the plane of the paper and let u(x, t) denote the vertical displacement of the point x
at time t . In order to end up with a tractable mathematical model several reasonable
simplifying assumptions will be made. We assume the tension τ and density ρ of the
string are constant throughout the motion, while the string is taken to be perfectly ﬂexible
so the tension force in the string acts along the tangential direction. Dissipative eﬀects (air
resistance, heating, etc.) are entirely neglected. One more assumption will be made when
needed. It essentially states that the oscillations are small in some sense.
Newton's second law, ma = ΣF, is where we begin. Direct your attention to a small
segment of the string whose length, at rest, is Δx = x2 − x1. The mass of the segment is
ρΔx. By Newton's second law the segment moves in such a way that the product of its
mass and the acceleration of its center of gravity equals the resultant of the forces acting
on it. For the vertical component, this means

   ρΔx ∂²u/∂t² (x̃, t) = Fv,

where x̃ ∈ (x1, x2) is the horizontal coordinate of the center of gravity of the segment,
and Fv means the vertical component of the resultant force.
There are two types of forces. One is the tension acting at both ends of the segment.
The other is gravity acting down with a force equal to the weight of the segment, ρgΔx. To
evaluate the tension forces, let θ1 and θ2 be the angles the string makes with the horizontal
at either end of the segment (see figure above). Then the vertical component of the tension
force is

   τ sin θ2 − τ sin θ1.

The signs indicate one force is up while the other is down. Adding the tension force to the
gravitational force and substituting into Newton's second law, we find

   ρΔx ∂²u/∂t² (x̃, t) = τ(sin θ2 − sin θ1) − ρgΔx.
∂t2 The dependence of θ1 and θ2 on the displacement can be brought out by using the
relation
ux
sin θ =
,
1 + u2
x
which follows from the relation ux = tan θ for the slope of the string. Using this, we obtain
the equation
ρ∆ x ∂2u
(˜, t) = τ
x
∂t2 ux
1+ u2
x x= x2 − ux
1 + u2
x − ρg ∆x.
x= x1 A simplifying assumption is badly needed. If the function ux /
a Taylor series,
1
ux
= ux − u3 + · · · ,
2
2x
1 + ux 1 + u2 is expanded in
x 334 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
we see that if the slope ux is small, essentially only the linear term in this series counts.
Therefore, we do assume the slope ux is small (this is the same assumption made in treating
the simple pendulum). With this simpliﬁcation, the equation of motion is
ρ∆ x ∂2u
(˜, t) = τ [ux (x2 , t) − ux (x1 , t)] − ρg ∆x.
x
∂t2 Divide both sides of this equation by ∆x = x2 − x1 and let the length of the interval shrink
to zero. Since
∂
∂2u
ux (x2 , t) − ux (x1 , t)
=
ux (x, t) =
(x, t),
x2 − x1
∂x
∂x2
(x2 −x1 )→0
lim where x is the limiting value of x1 and x2 , we ﬁnd
ρ ∂2u
∂2u
(x, t) = τ 2 (x, t) − ρg
2
∂t
∂x Because the length of the interval has been shrunk to one point x , the center of gravity is
now at x too.
It is customary to let τ /ρ = c2 . The constant c has units of velocity, and, in fact, is
just the speed with which waves travel along the string. Thus
Lu := utt − c2 uxx = −g.
This is the wave equation, a second order linear inhomogeneous partial differential equation.
As was the case with linear ordinary differential equations, it is easier to attempt first to
solve the homogeneous equation

   Lu := utt − c²uxx = 0.

On physical grounds, we expect the motion u(x, t) of the string will be determined if
the initial position u(x, 0) and initial velocity ut(x, 0) are known, along with the motion of
both end points u(0, t) and u(ℓ, t). However, the mathematical model must be examined
to see if these four facts do determine the subsequent motion (which it should if the model
is to be of any use). Thus we must prove that given the

   initial position:      u(x, 0) = f(x),    x ∈ [0, ℓ],
   initial velocity:      ut(x, 0) = g(x),   x ∈ [0, ℓ],
   motion of left end:    u(0, t) = φ(t),    t ≥ 0,
   motion of right end:   u(ℓ, t) = ψ(t),    t ≥ 0,

then a solution u(x, t) of the wave equation

   utt − c²uxx = 0

does exist which has these properties, and there is only one such solution. Existence and
uniqueness theorems must therefore be proved.

b) Uniqueness

This is almost identical to all uniqueness theorems encountered earlier, especially that
for the simple harmonic oscillator in Chapter 4, Section 2.

Theorem 8.11 (Uniqueness). There exists at most one twice continuously differentiable
function u(x, t) which satisfies the inhomogeneous wave equation

   Lu := utt − c²uxx = F(x, t)

and the subsidiary

   initial conditions: u(x, 0) = f(x), ut(x, 0) = g(x), x ∈ [0, ℓ];
   boundary conditions: u(0, t) = φ(t), u(ℓ, t) = ψ(t), t ≥ 0,

where F, f, g, φ, and ψ are given functions.

Proof: Assume u(x, t) and v(x, t) both satisfy the same equation and the same subsidiary
conditions. Let w(x, t) = u(x, t) − v(x, t). Then Lw = Lu − Lv = F − F = 0, so w satisfies
the homogeneous equation

   Lw := wtt − c²wxx = 0

and has zero subsidiary data:

   initial conditions: w(x, 0) ≡ 0, wt(x, 0) ≡ 0, x ∈ [0, ℓ];
   boundary conditions: w(0, t) ≡ 0, w(ℓ, t) ≡ 0, t ≥ 0.

We want to prove w(x, t) ≡ 0. Notice that w satisfies the equation for a vibrating string
which is initially at rest on the x axis, and whose ends never move. Therefore our desire
to prove the string never moves, w(x, t) ≡ 0, is certainly physically reasonable.
For this function w, define the new function E(t),

   E(t) = (1/2) ∫₀^ℓ [wt² + c²wx²] dx.

We have named the function E(t) since it actually happens to be the energy in the string
associated with the motion w(x, t) at time t, except for a factor of ρ. Assume it is "legal"
to differentiate under the integral sign (it is). Upon doing so, we get

   dE/dt = ∫₀^ℓ [wt wtt + c²wx wxt] dx.

But an integration by parts reveals that

   ∫₀^ℓ wx wxt dx = wx wt |₀^ℓ − ∫₀^ℓ wt wxx dx.

Because the end points are held fixed, w(0, t) = 0 and w(ℓ, t) = 0, the velocity at those
points is zero too, wt(0, t) = 0 and wt(ℓ, t) = 0. This drops out the boundary terms in the
integration by parts. Substituting the last expression into that for dE/dt, we find that

   dE/dt = ∫₀^ℓ wt [wtt − c²wxx] dx.

But w satisfies the homogeneous wave equation wtt − c²wxx = 0. Therefore dE/dt ≡ 0,
so

   E(t) ≡ constant = E(0),

that is, energy is conserved. Now

   E(0) = (1/2) ∫₀^ℓ [wt(x, 0)² + c²wx(x, 0)²] dx.
Since the initial position is zero, w(x, 0) = 0, its slope is also zero, wx(x, 0) = 0. The
initial velocity is also zero, wt(x, 0) = 0. Thus

   E(t) ≡ E(0) ≡ 0,

that is,

   0 = E(t) = (1/2) ∫₀^ℓ [wt(x, t)² + c²wx(x, t)²] dx.

Because the integrand is non-negative, we conclude wt(x, t) ≡ 0 and wx(x, t) ≡ 0.
Consequently w(x, t) ≡ constant. Since w(0, t) = 0, that constant is zero,

   w(x, t) ≡ 0.

Therefore

   u(x, t) − v(x, t) ≡ w(x, t) ≡ 0,

so u(x, t) ≡ v(x, t): the solution is unique.

c) Existence

For the simple one (space) dimensional wave equation, there are many ways to prove a solution
exists. The one to be given here is not the simplest (see Exercise 6 for the result of that
method), but it does generalize immediately to many other problems. It makes no diﬀerence
how we ﬁnd a solution, for once found, by the uniqueness theorem it is the only possible
solution. To avoid complications, we shall consider only the homogeneous equation and
assume the end points are tied down. Thus, we want to solve

   Wave equation:        utt − c²uxx = 0.
   Initial conditions:   u(x, 0) = f(x),  ut(x, 0) = g(x).
   Boundary conditions:  u(0, t) = 0,  u(ℓ, t) = 0.

The idea is first to find special solutions u1(x, t), u2(x, t), . . . , which satisfy the
boundary conditions but do not necessarily satisfy the initial conditions. Then, as was
done for linear O.D.E.'s, we build the solution which does satisfy the given initial
conditions as a linear combination of these special solutions,

   u(x, t) = Σ Aj uj(x, t),

that is, by superposition.

Let us seek special solutions in the form of a standing wave,

   u(x, t) = X(x)T(t).
Here X(x) and T(t) are functions of one variable. Our procedure is reasonably called
separation of variables. Substitution of this into the wave equation gives

   T''(t)X(x) − c²X''(x)T(t) = 0,

or

   X''(x)/X(x) = (1/c²) T''(t)/T(t).

Since the left side depends only on x, while the right depends only on t, both sides must
be constant (a somewhat tricky remark; think it over). Let that constant be −γ (using
−γ instead of γ is the result of hindsight, as you shall see):

   X''/X = (1/c²) T''/T = −γ.

This leads us to the two ordinary differential equations

   X''(x) + γX(x) = 0,   T''(t) + γc²T(t) = 0.

Since u(0, t) = 0 and u(ℓ, t) = 0 and u(x, t) = X(x)T(t), the function X(x) must also
satisfy the boundary conditions

   X(0) = 0,   X(ℓ) = 0.

There are several ways to show γ must be positive. Perhaps the simplest is to observe
that if γ < 0 or γ = 0, the only function X(x) which satisfies the differential equation
X'' + γX = 0 and the boundary conditions X(0) = X(ℓ) = 0 is the zero function X(x) ≡ 0.
Since for this function u(x, t) = X(x)T(t) ≡ 0, it is devoid of further interest.

Another way to show γ is positive is to multiply the ordinary differential equation
X'' + γX = 0 by X(x) and integrate over the length of the string,

   ∫₀^ℓ [X''(x)X(x) + γX(x)²] dx = 0.
Upon integrating by parts, we find that

   ∫₀^ℓ X''(x)X(x) dx = X'X |₀^ℓ − ∫₀^ℓ X'(x)² dx.

Since X(0) = X(ℓ) = 0, the boundary terms drop out. Substituting this into the above
equation, we find that

   ∫₀^ℓ X'(x)² dx = γ ∫₀^ℓ X(x)² dx.

If X(x) is not identically zero, this can be solved for γ:

   γ = ∫₀^ℓ X'(x)² dx / ∫₀^ℓ X(x)² dx,

which clearly shows γ > 0.
Enough of that. The solution of X'' + γX = 0, γ > 0, is

   X(x) = A cos(√γ x) + B sin(√γ x).

The boundary condition X(0) = 0 implies A = 0, while the boundary condition at the
other end point, X(ℓ) = 0, implies

   0 = B sin(√γ ℓ).

If B = 0 too, then X(x) ≡ 0, so u(x, t) ≡ 0. This is of no use to us. The only
alternative is to restrict γ so that sin(√γ ℓ) = 0. This means √γ ℓ must be a multiple
of π, √γ ℓ = nπ, that is,

   √γ = nπ/ℓ,   n = 1, 2, . . . .

There is then one possible solution X(x) for each integer n,

   Xn(x) = Bn sin(nπx/ℓ),

where the constants Bn are arbitrary.
Remark: There is a similarity of deep significance for mathematics and physics between
the work in these last few paragraphs and that done for the coupled oscillators in Chapter
6. There (p. 528-9), we had an operator A and wanted to find nonzero vectors Sn and
numbers λn such that

   ASn = λn Sn.

The numbers λn found were called the eigenvalues of A, and the Sn the corresponding
eigenvectors. Here, we were given the operator A = −d²/dx² and wanted to find nonzero
functions Xn(x) ∈ { X ∈ C²[0, ℓ] : X(0) = X(ℓ) = 0 } which satisfy the equation

   AXn = γn Xn.

The numbers found, γn = n²π²/ℓ², are also called the eigenvalues of A, and the function
Xn(x) = sin(nπx/ℓ) the eigenfunction of A corresponding to the eigenvalue γn.
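The correspondence can be checked numerically: for Xn(x) = sin(nπx/ℓ) the quotient −Xn''/Xn should be the constant γn = (nπ/ℓ)². A short check with ℓ = π, using a central difference for the second derivative (the function names are ours):

```python
# For X_n(x) = sin(n x)  (string length L = pi), -X_n''/X_n should equal
# the eigenvalue gamma_n = n^2, independent of x.

import math

def X(n, x):
    return math.sin(n*x)

def quotient(n, x, h=1e-5):
    # central-difference second derivative, then form -X''/X
    Xpp = (X(n, x + h) - 2*X(n, x) + X(n, x - h)) / h**2
    return -Xpp / X(n, x)

for n in (1, 2, 3):
    print(n, quotient(n, 0.4))    # close to the eigenvalue n^2
```

That the quotient is independent of the sample point x is exactly the statement that sin(nx) is an eigenfunction of −d²/dx².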
Associated with each possible eigenvalue γn, there is a solution of the time equation
T'' + γn c²T = 0,

   Tn(t) = Cn cos(ncπt/ℓ) + Dn sin(ncπt/ℓ).

We therefore have found one special solution, un(x, t) = Xn(x)Tn(t), for each value of
the index n,

   un(x, t) = sin(nπx/ℓ)(αn cos(ncπt/ℓ) + βn sin(ncπt/ℓ)).

The arbitrary constants have been lumped together in this equation. These special
solutions are the "natural" vibrations of the string, or normal modes of vibration. A
snapshot at t = t0 of the string moving in the nth normal mode would reveal the sine
curve

   un(x, t0) = C sin(nπx/ℓ),

the constant C accounting for the remaining terms, which are constant for t fixed. In
music, the integer n refers to the octave. The fundamental tone is the case n = 1, while
the tone for n = 2, the second harmonic or first overtone, is one octave higher.
a figure goes here

The time frequency νn of the nth normal mode is νn = ncπ/ℓ; this is the number of
oscillations in 2π units of time. It is the time frequency which we usually associate with
musical pitch. The (time) period τn of the nth normal mode is 2π/νn, that is, τn = 2ℓ/nc.
Another name you will want to know is the wavelength λn of the nth normal mode,
λn = 2ℓ/n (see figures above). Notice that λn/τn = c, an important relationship.
Having found the special normal mode solutions un(x, t), we hope that the arbitrary
constants αn and βn can be chosen so that the linear combination

   u(x, t) = Σ_{n=1}^∞ un(x, t) = Σ_{n=1}^∞ (αn cos(ncπt/ℓ) + βn sin(ncπt/ℓ)) sin(nπx/ℓ)

will satisfy the given initial conditions. Every function u(x, t) of this form automatically
satisfies the boundary conditions u(0, t) = 0, u(ℓ, t) = 0, since each of the un's satisfies
them.

If u(x, 0) = f(x) and ut(x, 0) = g(x), then from the above equation we must have

   f(x) = Σ_{n=1}^∞ un(x, 0) = Σ_{n=1}^∞ αn sin(nπx/ℓ)

and

   g(x) = Σ_{n=1}^∞ ∂un/∂t (x, 0) = Σ_{n=1}^∞ (ncπ/ℓ) βn sin(nπx/ℓ).

Thus, the coefficients αn are the coefficients in the Fourier sine series for f, while the βn
are essentially the coefficients in the Fourier sine series for g. In fact, this is how Fourier
was led to the series bearing his name. These formulas for u(x, t), f(x), and g(x) become
easier on the eye if the length of the string is π, ℓ = π. Then

   u(x, t) = Σ_{n=1}^∞ (αn cos nct + βn sin nct) sin nx,   (8-3)

while

   f(x) = Σ_{n=1}^∞ un(x, 0) = Σ_{n=1}^∞ αn sin nx

and

   g(x) = Σ_{n=1}^∞ ∂un/∂t (x, 0) = Σ_{n=1}^∞ nc βn sin nx.
by ﬁnite series.
Examples: Find the solution u(x, t) of the wave equation for a string of length π, l = π ,
which is pinned down at its end points, u(0, t) = u(π, t) = 0 , and satisﬁes the given initial
conditions.
(1) u(x, 0) = f (x) = 2 sin 3x, ut (x, 0) = g (x) =
the two series 1
2 sin 4x . We have to ﬁnd αn and βn for ∞ 2 sin 3x = αn sin nx
n=1 340 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
1
sin 4x =
2 ∞ ncβn sin nx.
n=1 For these simple functions, just match coeﬃcients, giving
α3 = 2, αn = 0, n = 3, and β4 = 1
, βn = 0, n = 4.
8c Therefore, the sum of the two waves
u(x, t) = 2 cos 3ct sin 3x + 1
sin 4ct sin 4x
8c is the (unique!) solution of this example.
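Because of the uniqueness theorem, it is enough to check that this u actually satisfies the wave equation and the initial data; a numerical spot-check with central differences does that in a few lines (the value c = 1.3 and the sample point are arbitrary choices of ours for the test):

```python
# Spot-check of Example (1): u(x,t) = 2 cos(3ct) sin(3x) + sin(4ct) sin(4x)/(8c)
# should satisfy u_tt = c^2 u_xx and u_t(x, 0) = (1/2) sin 4x.

import math

c = 1.3

def u(x, t):
    return 2*math.cos(3*c*t)*math.sin(3*x) + math.sin(4*c*t)*math.sin(4*x)/(8*c)

h = 1e-4
x0, t0 = 0.7, 0.5
utt = (u(x0, t0 + h) - 2*u(x0, t0) + u(x0, t0 - h)) / h**2
uxx = (u(x0 + h, t0) - 2*u(x0, t0) + u(x0 - h, t0)) / h**2
print(abs(utt - c*c*uxx))             # essentially zero: the wave equation holds

ut0 = (u(x0, h) - u(x0, -h)) / (2*h)  # u_t(x0, 0)
print(abs(ut0 - 0.5*math.sin(4*x0)))  # essentially zero: initial velocity matches
```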
(2) u(x, 0) = f(x) = (1/2) sin 3x − sin 17x and ut(x, 0) = g(x) = −9 sin x + 13 sin 973x.
We have to find αn and βn for the two series

   (1/2) sin 3x − sin 17x = Σ_{n=1}^∞ αn sin nx

and

   −9 sin x + 13 sin 973x = Σ_{n=1}^∞ nc βn sin nx.

By matching again, we find α3 = 1/2, α17 = −1, and αn = 0 for n ≠ 3 or 17. Also,
β1 = −9/c, β973 = 13/(973c), and βn = 0 for n ≠ 1 or 973. The (unique) solution is then
a sum of four waves:

   u(x, t) = −(9/c) sin ct sin x + (1/2) cos 3ct sin 3x
             − cos 17ct sin 17x + (13/(973c)) sin 973ct sin 973x.

Since f and g are not usually given in the simple form of these examples, the full
Fourier series is needed. Recall that the string is pinned down at both ends. Therefore
both the initial position function f(x) and the velocity function g(x) have the property
f(0) = f(π) = 0 and g(0) = g(π) = 0, where we have taken the length of the string
to be π. It is now possible to extend both f and g, assumed continuous in [0, π],
to the whole interval [−π, π] as continuous odd functions,

a figure goes here

that is, if x ∈ [0, π], we can define

   f(−x) := −f(x)   and   g(−x) := −g(x),

since the right sides, −f(x) and −g(x), are known functions for x ∈ [0, π].

As odd functions now on the whole interval [−π, π], the functions f and g have
Fourier sine series (cf. p. 252, Exercise 3a):

   f(x) = Σ_{n=1}^∞ bn sin(nx)/√π   and   g(x) = Σ_{n=1}^∞ b̃n sin(nx)/√π,

where

   bn = 2 ∫₀^π f(x) sin(nx)/√π dx,   b̃n = 2 ∫₀^π g(x) sin(nx)/√π dx.   (8-4)
√
√
αn = bn / π, and βn = ˜n /nc π
b
Consequently ∞ u(x, t) = b
cos nct ˜n sin nct
√ ) sin nx
+
(bn √
nc
π
π
n=1 (8-5) the coeﬃcients bn and ˜n being determined from the initial conditions by equation
b
(4). Thus, we have almost proved
Theorem 8.12 . If f (x) is twice continuously diﬀerentiable and g (x) once continuously
diﬀerentiable for x ∈ [0, π ] and both functions vanish at x = 0 and x = π , then the
function u(x, t) deﬁned by equation (5) is a solution of the homogeneous wave equation
utt − c2 uxx = 0
and satisﬁes the
initial conditions: u(x, 0) = f (x), ut (x, 0) = g (x), x ∈ [0, π ],
as well as the
boundary conditions: u(0, t) = 0, u(π, t) = 0, t ≥ 0,
where bn and ˜n are determined from f and g through equations (4). Moreover, this
b
solution is unique (by Theorem 9).
Outline of Proof. If it is possible to diﬀerentiate the inﬁnite series (5) term by term u(x, t)
would satisfy the wave equation since each special solution un (x, t) does. In any case, the
initial condition u(x, 0) = f (x) is clearly satisﬁed. However, checking the other initial
condition ut (x, 0) = g (x) also involves diﬀerentiating the inﬁnite series term by term.
Thus, we must only justify the term by term diﬀerentiation of an inﬁnite Fourier series.
For power series, we found (p. 82-3, Theorem 16) we can always diﬀerentiate term by
term within its disc of convergence. Such is not the case with Fourier series. For example,
∞
sin n2 x
the Fourier series
converges for all x , but the series obtain by diﬀerentiating
n2
∞ n=1 cos n2 x diverges at x = 0 . However, if a function is suﬃciently smooth, its formally,
n=1 Fourier series can be diﬀerentiated term by term and does converge to the derivative of the 342 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
function. Since the details of a complete proof are but a rehash of the proof carried out for
power series (p. 82ﬀ), we omit it.
Example: Find the displacement u(x, t) of a violin string of length π with ﬁxed end
points which is plucked at its midpoint to height h . The initial position is then
xh, x ∈ [0, π/2]
(π − x)h, x ∈ [π/2, π ], f (x) = and the initial velocity, g (x) , is zero.
We must ﬁnd the coeﬃcients bn and ˜n in the series (5). After mentally continuing f
b
and g to the interval [−π, π ] as odd functions, the formulas (4) give us bn and ˜n ,
b
π bn = 2
0 π /2 2h
sin nx
f (x) √ dx = √
π
π π (π − x) sin nx dx . x sin nx dx +
0 π/2 Integrating and simplifying, we ﬁnd that 0, n even
nπ 4h
1, n = 1, 5, 9, 13, . . .
=
bn = √ 2 sin 2
πn
−1, n = 3, 7, 11, 15.
From g (x) ≡ 0 , it is immediate that βn = 0 for all n . Thus,
4h
u(x, t) =
π ∞
n=1 nπ
1
sin
cos n ct sin nx
n2
2 4h cos 3ct sin x cos ct sin 3x cos 5ct sin 5x
[
−
+
+ ···]
π
1
32
52
is the desired solution.
= Exercises
(1) (a) Find a solution u(x, t) of the homogeneous wave equation for a string of length
π whose end points are held ﬁxed if the initial position function is
u(x, 0) = 1
sin 4x − sin 7x,
2 while the initial velocity is
ut (x, 0) = sin 3x + sin 73x.
(b) Same problem as a), but
u(x, 0) = sin 5x + 12 sin 6x − 7 sin 9x
ut (x, 0) = − sin x + 91 sin 273x. 8.3. THE VIBRATING STRING. 343 (2) Find a solution u(x, t) of the homogeneous wave equation for a string of length π
whose end points are held ﬁxed if the string is initially plucked at the point x = π/4
to the height h .
(3) Consider a vibrating string of length whose end points are on rings which can slide
freely on poles at 0 and . Then the boundary conditions at the end points are
ux (0, t) = 0, ux ( , t) = 0
that is, zero slope.
(a) Use the method of separation of variables to ﬁnd the form of special standing
wave solutions. [Answer: un (x, t) = cos nπx (αn cos ncπt + βn sin ncπt ) ].
(b) Use these to ﬁnd a solution with the initial conditions
u(x, 0) = cos x − 6 cos 3x
ut (x, 0) = (let = π ) 1
cos 2x.
2 (4) Let u(x, t) satisfy the homogeneous wave equation. Instead of keeping the end points
ﬁxed, we either put them on rings (cf. Exercise 3) or attach them by elastic bands,
in which case the boundary conditions become
ux (0, t) − c1 u(0, t) = 0, ux (π, t) + c2 u(π, t) = 0, c1 , c2 ≥ 0.
(a) Deﬁne the energy as before, and prove that energy is dissipated with these boundary conditions, unless c1 and c2 vanish.
(b) Prove there is at most one function u(x, t) which satisﬁes the inhomogeneous
wave equation utt − c2 uxx = F (x, t) with initial conditions as before, but with
elastic boundary conditions
ux (0, t) − c1 u(0, t) = φ(t), ux (π, t) + c2 u(π, t) = ψ (t),
where c1 and c2 are non-negative constants.
(5) To account for the eﬀect of air resistance on a vibrating string, one common assumption is that the resistance on a segment of length ∆x is proportional to the velocity
of its center of gravity,
x
Fres = −k ∆xut (˜, t), k > 0,
where k is a numerical constant. This is analogous to the standard viscous resistance
force on a harmonic oscillator.
(a) Find the equation of motion ignoring gravity. [Answer: 1
u
c2 tt + kut = uxx ] (b) Find the form of the special standing wave solutions, assuming, the end points
are held ﬁxed.
(c) Write a formula giving the probable form for the general solution u(x, t) .
(d) If the end points are pinned down, what do you expect the behavior of the string
will be as t → ∞ ? Does the formula found in part c) verify your belief (it
should). 344 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
(e) Deﬁne the energy E (t) as before and show that energy is dissipated if the ends
are held ﬁxed.
˙
(f) Use the result of e) to prove E (t) + 2kE (t) ≥ 0 , and conclude that E (t) ≥
−2kt for t ≥ 0 . This shows that the energy is not dissipated too rapidly.
E (0)e
(6) It is possible to write the solution of the homogeneous wave equation for a string of
length π with ﬁxed end points in a simple closed form by using the trigonometric
identities
2 sin nx cos nct = sin n(x − ct) + sin n(x + ct).
2 sin nx sin nct = sin n(x − ct) − cos n(x + ct).
(a) Do this and obtain d’Alembert’s formula
u(x, t) = 1
f (x − ct) + f (x + ct)
+
2
2c x+ct g (ξ )dξ.
x−ct (b) Solve the example of a plucked string (p. 641) again using this formula. Draw
π
two sketches, one indicating the position of the string at time t = 2c and another
π
at t = c .
2 2 ∂
∂
(7) (a) Prove the wave operator L := ∂t2 − c2 ∂x2 , c a constant, is translation invariant,
that is, if T : u(x, t) → u(x + x0 , t + t0 ), prove (LT )u = (T L)u for all values of
x0 and t0 , and for all functions u for which the operators make sense. (b) Find the function φ(a, b) in the formula
Leax+bt = φ(a, b)eax+bt .
(c) Use part b) to show that if a is any constant, the four functions
ea(x+ct) , e−a(x+ct) , ea(x−ct) , e−a(x−ct)
are solutions of the homogeneous wave equation Lu = 0 .
(d) Use the fact that each of the above functions satisﬁes the ordinary diﬀerential
equation v (x) = a2 v (x) to conclude that if linear combinations of these functions are to satisfy the boundary conditions v (0) = v ( ) = 0 , then necessarily
a2 < 0 , so the constant a is pure imaginary and we can write a = iγ , where γ
is real.
(e) Let u(x, t) be a linear combination of the four functions part c) with a = iγ .
Show that u(x, t) may be written in the form
u(x, t) = sin γx [A cos γct + B sin γct].
(f) If u(0, t) = u( , t) = 0 , show that γn = nπ . Find an inﬁnite set of special solutions un (x, t) which satisfy the homogeneous wave equation with zero boundary
values [From here on, one proceeds as before to ﬁnd the general solution. This
problem has shown how the idea of translation invariance can also be used to
lead one to the special solutions un ]. 8.3. THE VIBRATING STRING. 345 (8) (a) By inspection, ﬁnd a particular solution for the solution of the inhomogeneous
wave equations
Lu := utt − c2 uxx = g, g ≡ constant.
(b) How can this particular solution be used to ﬁnd the solution of the equation
Lu = g which has given initial conditions and zero boundary conditions?
(9) Flow of heat in a thin insulated rod on the x axis is governed by the heat equation
ut (x, t) = k 2 uxx (x, t),
where u(x, t) represents the temperature at the point x at time t , and k 2 , the
diﬀusivity, is a constant depending on the material. The “energy” in a rod of length
, 0 ≤ x ≤ , is deﬁned as
E (t) = 1
2 l u2 (x, t) dx.
0 (a) If the ends of the rod have zero temperature, u(0, t) = u( , t) = 0 , prove “energy”
˙
is dissipated, E (t) ≤ 0 , by showing
dE (t)
= −k 2
dt u2 (x, t) dx.
x
0 (b) Given a rod whose ends have zero temperature and whose initial temperature is
zero, u(x, 0) = 0 , prove that the temperature remains zero, u(x, t) ≡ 0 .
(c) Prove the temperature of a rod is uniquely determined if the following three data
are known:
initial temperature: u(x, 0) = f (x), x ∈ [0, ].
boundary conditions: u(0, t) = φ(t), u( , t) = ψ (t), t ≥ 0.
(d) Use the method of separation of variables to ﬁnd an inﬁnite number of special
solutions of the heat equation for a thin rod whose end points have zero temperature for all t ≥ 0 . [Answer: un (x, t) = cn e −n2 k2 π 2
t
2 sin nπ x, n = 1, 2, . . . ] (e) If the ends of a rod have zero temperature for all t ≥ 0 , what do you intuitively
expect the temperature u(x, t) will be as t → ∞ ? Is this borne out by the
formulas for the special solutions?
(f) Find the temperature distribution in a rod of length π if the ends have zero
temperature and if the initial temperature distribution in the rod is
u(x, 0) = sin x − 4 sin 7x,
(10) If the temperature at the ends of the bar of length is constant but not necessarily
zero, say
u(0, t) = θ1 ,
u ( , t) = θ 2 ,
the temperature distribution can be found be splitting the solution into two parts,
u(x, t) = u(x, t) + up (x, t) , where up (x, t) is a particular solution having the correct
˜
temperature at the ends of the bar and u(x, t) is a general solution which has zero
temperature at the ends. 346 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
(a) Find a particular solution of the homogeneous heat equation ut = k 2 uxx which
satisﬁes u(0, t) = 200 , u( , t) = 500 , but does not necessarily satisfy any
prescribed initial condition. [Answer: Many possible solutions - for example
up (x, t) = 20 + 30 x , or up (x, t) = 20 + 30 sin πx ] .
2
(b) Find the temperature distribution in a rod of length π if the initial temperature
is u(x, 0) = 2 sin x − sin 4x , while the boundary conditions are as in part a).
(11) If the ends of a bar of length
boundary conditions are are insulated instead of being kept at zero, the
ux (0, t) = ux ( , t) = 0. (a) Use the method of separation of variables to ﬁnd an inﬁnite number of special solutions for the homogeneous heat equation with insulated ends. [Answer:
un (x, t) = cn e −n2 k2 π 2
t
2 cos nπx , n = 0, 1, 2, . . . ]. (b) What is the temperature distribution in a rod whose ends are insulated if the
initial temperature distribution is
u(x, t) = 3 cos 2πx − 1
5πx
cos
.
5 (12) In this exercise you will ﬁnd a quantitative estimate for the rate of decrease of energy
for the heat in a rod of length with zero temperature at the ends.
(a) Use the result of Exercise 9a to prove the diﬀerential inequality
dE
≤ −cE (t),
dt
where c is a positive constant. [Hint: Look at p. 227 Exercise 15c].
(b) Conclude that
E (t) ≤ E (0)e−ct , t ≥ 0.
This is the desired estimate for the decrease of energy in the rod.
(13) The linear partial diﬀerential equation
uxx − u = ut
governs the temperature distribution in a rod of length made up of a material which
uses up heat to carry out a chemical process. Deﬁne the energy E (t) in the rod as
in Exercise 9.
(a) Prove that if the ends of the rod have zero temperature, then the energy is
˙
dissipated, E (t) ≤ 0 .
(b) Given a rod whose ends have zero temperature and whose initial temperature
u(x, 0) is zero, use a) to prove that the temperature remains zero, u(x, t) ≡
0, t ≥ 0 .
(c) Use part b) to prove that the temperature of the rod described above is uniquely
determined if the following three data are known
u(x, 0) for x ∈ [0, ], u(0, t) and u( , t) for t ≥ 0. 8.4. MULTIPLE INTEGRALS 347 (14) In setting up the mathematical model for the vibrating string, we never examined the
horizontal components of the forces.
(a) Show that the net horizontal force is
Fh = τ cos θ2 − τ cos θ1
(b) Under our assumption ux is small, show that the net horizontal force is zero so there is no horizontal motion of the string. This justiﬁes the statement that
the motion of the string is entirely vertical.
(15) Use the formula Vn = nπc/ (page 635) for the frequency and the relationship between c, T and ρ (page 624) to derive a formula for Vn in terms of the physical
constants , T , and ρ for a vibrating string. Interpret the eﬀect on the frequency,
Vn , if the physical constants are changed. Does this agree with your experience in
tuning stringed instruments? 8.4 Multiple Integrals How can we extend the notion of integration from functions of one variable to functions of
several variables? That is the problem we shall face in this section.
Let w = f (X ) = f (x1 , . . . , xn ) be a scalar-valued function deﬁned in C ⊂ En . For
the purposes of this section it will be convenient to think of f as either the height function
for a surface M in En+1 over D , or as the mass density of D . In the ﬁrst case. f
D should be the volume of the solid contained between M and D (see ﬁg.), whereas in the
second case, f should be total mass of the set D .
D Two problems have to be solved. First, deﬁne the integral in En . Second, give a
reasonable procedure for explicitly evaluating the integral in suﬃciently simple situations.
More so than for the single integral, the problem of deﬁning the multiple integral bristles
with technical diﬃculties. However, after this is done the evaluation of integrals in En can
be reduced to the evaluation of repeated integrals, that is, a sequence of n integrals in E1 ,
which is in turn eﬀected not by using the deﬁnition of the integral, but rather by recourse
to the fundamental theorem of calculus.
Before starting the formalities, it is well advised to see where some diﬃculties lie.
Suppose we are given a density function f deﬁned on some domain D and want to ﬁnd
the total mass of D . To make things even simpler, assume for the moment that the density
is constant and equal to 1, for all X ∈ D ⊂ En . Then the mass coincides with the volume
of the domain. For the special case of functions of one variable D ⊂ E1 is an interval so
the “volume” of D (really the length of D ) is trivial
a figure goes here
to compute, Vol (D) = b − a . However if D has two or more dimensions, even ﬁnding the
volume of D (area if D ⊂ E2 ) is itself diﬃcult.
The problem is that a connected set D in E1 can only be a line segment, whereas a
connected open set in En , n ≥ Z can be much more complicated topologically. In E1 , the 348 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
closed “cube” and closed “ball” are both intervals [a, b] , and every other connected set is
also an interval. In E2 , not only do the cube and ball become distinct, but also a slew of
other possibilities arise. D may be riddled with holes and its
a figure goes here
boundary wild (contrasted to the boundary of a connected set in E1 which is always just
two points, the end points of the interval). It should be clear that the notion of volume of
a set D may only be deﬁnable if the boundary of D is suﬃciently smooth.
As you should be anticipating, the volume of a set D will be deﬁned by ﬁlling it up
with little cubes of volume ∆x1 ∆x2 . . . ∆xn = ∆V , and then proving that as the size of
the cubes becomes small, the sum of volumes of the cubes approaches a limit (here is where
the smoothness of θD enters). In two dimensions, D ⊂ E2 , this roughly reads
Area (D) = lim ∆x∆y = ∆x→0 dx dy.
D ∆y →0 Only after the volume of a domain is deﬁned can the more general notion of mass
of a set D for a density function f be deﬁned. The procedure here is straightforward,
however it is important that the density f be “essentially” continuous. Using the same
approximating cubes, we assign to each little cube its approximate density, say by using the
value of the density f at the center of the little cube. Adding up the masses of these little
cubes and passing to the limit again, we ﬁnd the total mass of the solid D with density f .
Again, in two dimensions this roughly reads
Mass (D) = lim
∆x→0 f (xi , yj )∆x∆y = f (x, y ) dx dy.
D ∆y →0 Because of the technical complications, we shall only state a series of propositions which
give the existence of the integral. The proofs of several crucial - but believable - results will
not be carried out, but can be found in many advanced calculus books. For convenience,
the geometric language of the plane, E2 , will be used. The ideas extend immediately to
higher dimensions. Now some terminology.
Definition: A shaved rectangle is a rectangle with its bottom and left sides omitted, that
is, a set of the form
Q = {X = (x1 , x2 ) : aj < xj ≤ bj , j = 1, 2}.
A rectangular complex is a ﬁnite union of shaved rectangles, which can always be assumed
disjoint, that is, non-overlapping. This should more accurately be called a shaved rectangular complex, but is not for the sake of euphony.
If D is a set, the characteristic function of D , XD is deﬁned by
XD (X ) = 1,
0, X∈D
X D. A step function s(X ) is a ﬁnite linear combination of characteristic functions of shaved
rectangles. The graph of this function looks like its name implies. 8.4. MULTIPLE INTEGRALS 349
a figure goes here A function f has compact support if it is identically zero outside some suﬃciently large
rectangle. The support of a particular function f , written supp f , is the smallest closed
set outside of which f is zero. Thus, it is the set of all points X where f (X ) = 0 and the
limit points of those points.
We take the area of a shaved rectangle Q as a known quantity - the height times base,
and deﬁne the integral as
I (XQ ) = dA ≡ Area (Q), XQ dA =
E2 D where the Area (Q) is deﬁned in the natural way as length × width. You may wish to think
of dA as representing an “inﬁnitesimal element of area”. We however assign no meaning
to the symbol and use it only as a reminder. Some prefer to do without it altogether and
write
XQ = Area (Q).
E2 Our task is to deﬁne
I (f ) ≡ f dA
E2 for density functions other than XQ ’s. For example, if D is some set, for the function XD
we want to deﬁne
Area (D) = XD dA =
E2 dA
D But this will not make sense unless it is shown that the set D does have a number associated
with it which has the properties of area. It is easy to deﬁne the integral of a step function
S . Let
n S (X ) = aj XQj (X ),
j =1 where the Qj ’s are disjoint. Then S dA should represent the total mass of a plate composed of rectangles Q1 , . . . , Qn with respective densities a1 , . . . , an . Thus, we deﬁne
n I (S ) = S dA ≡ a1 Area (Q1 ) + . . . + an Area , (Qn ) = aj
j =1 The integrals of step functions clearly satisfy the following
Lemma 8.13 . If S1 (X ) and S2 (X ) are step functions, then
a). I (aS1 + bS2 ) = aI (S1 ) + bI (S2 ).
b). S1 (X ) ≤ S2 (X ) implies I (S1 ) ≤ I (S2 ).
c). If S (X ) is bounded by M, S (X ) ≤ M , then
I (S ) ≤ cM,
where c is the area of the support of S . XQj dA. 350 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
The integral of any other more complicated function is deﬁned by using step functions.
Definition: A function f : E2 → E is Riemann integrable if given any > 0 , there are step
functions s and S with s(X ) ≤ f (X ) ≤ S (X ) for all X ∈ E2 such that I (S ) − I (s) < ,
that is
S dA −
s dA < .
E2 E2 Intuitively, a function is Riemann integrable if it can be trapped between two step
functions S and s in such a way that the integrals of S and s diﬀer by an arbitrarily
small amount.
Definition: If f is Riemann integrable, let Sn and sn be a trapping sequence, for f ,
1
that is, sn (X ) ≤ f (X ) ≤ Sn (X ) and I (Sn ) − I (sn ) < n . Then the Riemann integral of
f, I (f ) is deﬁned as (cf. page 21, for the deﬁnition of l.u.b. = least upper bound, and of
g.l.b.).
I (f ) ≡ l.u.b.n→∞ I (sn )
We could have equivalently deﬁned I (f ) as I (f ) = g.l.b.n→∞ I (Sn ) . Since both limits
are the same, it is irrelevant. However, it is important to show that I (f ) has the same
ˆ
value if any other trapping sequence Sn (X ), sn (X ) is used. This is the content of
ˆ
Lemma 8.14 . If f is Riemann integrable, then I (f ) does not depend on which trapping
sequences are used. Proof not given.
Now we exhibit a class of functions which are Riemann integrable. The issue boils down
to ﬁnding functions which can be approximated well by step functions.
Lemma 8.15 . If f is a continuous function and D is a closed and bounded set, then
f can be approximated arbitrarily closely from above and below by step functions S and s
throughout D . Thus, given any > 0 , there are step functions S and s such that
0 ≤ S (X ) − f (x) < , and 0 ≤ f (X ) − s(X ) < for all X ∈ D. Proof not given.
Theorem 8.16 . If f is a continuous function with compact support, then it is Riemann
integrable.
Proof: Let S (X ) and s(X ) be as in the lemma where D is the support of f . Then
s(X ) ≤ f (X ) ≤ S (X )
and
S (X ) − s(X ) = [S (X ) − f (X )] + [f (X ) − s(X )] < 2 .
Thus by Lemma 1,
I (S ) − I (s) = I (S − s) < 2c ,
where c is the area of the set (supp S ) ∪ (supp s ).
Because f has compact support, the constant c is bounded. Therefore the factor
2c can be made arbitrarily small by choosing small. This veriﬁes all the conditions for
integrability. 8.4. MULTIPLE INTEGRALS 351 We have disposed of the problem of integrating continuous functions with compact
support. Notice that the above procedure is identical to that used for functions of one
variable (see ﬁgure.)
We still do not know how to ﬁnd the area of a domain D . Although we anticipate that
Area (D) = I (XD ) , this does not yet make sense (except for rectangular complexes) since
the discontinuous function XD is not covered by Theorem 1). Let us remedy this now.
The problem is to show the boundary ∂D does not have any area.
Definition: A set in E2 has content zero if it can be enclosed in a rectangular complex
whose total area is arbitrarily small. Thus, if a set has content zero, given any > 0 , there
is a rectangular complex R containing ∂D such that
Area (R) = I (XR ) < .
It should be clear that any set with a ﬁnite number of points has content zero (since
each point can be enclosed on a square of side , so the total area of N such squares is
N 2 , which can be made arbitrarily small.) One would also expect that curves will have
zero content. This is not necessarily true unless the curve is not too badly behaved.
Lemma 8.17 . If a curve is composed of a ﬁnite number of smooth curves, then it has
zero content. In particular, if the boundary ∂D of a bounded domain D is such a curve,
it has zero content. Proof not given.
Theorem 8.18 . If the boundary ∂D of a domain D ⊂ E2 has content zero, then the
function XD is Riemann integrable. Consequently, the area of D is deﬁnable and given by
Area (D) = XD dA =
E2 dA.
D Proof: Almost identical to that for Theorem 11. Let
> 0 be given and let R be
the rectangular complex which encloses the boundary ∂D , where R has area less than
, I (XR ) < . Then the part of D which is enclosed by R, D− = D − R ∩ D , is a
rectangular complex as is D+ = R ∪ D− and D+ − D− = R . Since D+ ⊃ D ⊃ D− , we
have
XD− (X ) ≤ XD (X ) ≤ XD+ (X ) for all X.
Also,
I (XD+ ) − I (XD− ) = I (XR ) < .
Thus XD is trapped by the step functions S = XD+ and s = SD− and I (S ) − I (s) < ,
proving the theorem.
It is now possible to deﬁne
f dA
D for continuous functions f where D is not necessarily the support of f .
Theorem 8.19 . If f is continuous in a closed and bounded set D whose boundary ∂D
has content zero, then the function fXD is Riemann integrable and
f dA ≡ I (fXD ).
D 352 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
Proof: Let R be the rectangular complex which encloses ∂D and has area less than
, I (XR ) < . Take D− = D − R ∩ D and D+ = R ∪ D− as in Theorem 12. Further let
S1 and s1 be step functions which trap f within
for all X ∈ D− (this is possible by
Lemma 3)
0 ≤ S1 (X ) − f (X ) < , 0 ≤ f (X ) − s1 (X ) < for all X D− , so
0 ≤ S1 (X ) − s1 (X ) < 2 X ∈ D. for all Let M be an upper bound for |f | on D, |f (X )| ≤ M for all X ∈ D− . Then deﬁne
S = S1 + M X R and s = s1 − M XR . These functions S and s trap f on all of D ,
s(X ) ≤ f (X ) ≤ S (X ) for all X ∈ D, that is,
s ≤ f XD ≤ S for all X. Furthermore
I (S − s) = I (S1 − s1 ) + 2M I (XR )
< 2c + 2M = (2c + 2M ) ,
where c is the area of D− . Since S and s are step functions which trap f , and since
I (S − s) can be made arbitrarily small, the proof that fXD is Riemann integrable is
completed. We follow custom and write
I (f XD ) ≡ f dA.
D Except for the three unproved lemmas, this completes the proof of the existence of the
integral. The next theorem summarizes some important properties of the integral.
Theorem 8.20 . If f and g are Riemann integrable, then
a). I (af + bg ) = aI (f ) + bI (g ),
a, b constants
b). f ≤ g implies I (f ) ≤ I (g ) .
c). |I (f )| ≤ I (|f |)
Proof:
a) and b) are immediate consequences of the corresponding statements for step functions
(Lemma 1) and the deﬁnition of the Riemann integral as the limit of step functions. To
prove c), we ﬁrst observe that if f is integrable, so is |f | . Since − |f | ≤ f ≤ |f | , by parts
a and b
−I (|f |) ≤ I (f ) ≤ I (|f |),
which is equivalent to the stated property.
Although the approximate value of the integral f dA can be evaluated by using the
D procedures of the above theorems, we have as yet no routine way of evaluating the integral 8.4. MULTIPLE INTEGRALS 353 if f and D are simple. Some notation will suggest the method. Write dA = dx dy and
think of dx dy as the area of an “inﬁnitesimal” rectangle. Then
f dA = f (x, y ) dx dy. D D If D is the domain in the ﬁgure, it is reasonable to evaluate the double integral, which we
shall think of as the mass of D with density f , by ﬁrst ﬁnding the mass of a horizontal
strip
γ2 g (y ) = f (x, y ) dx,
γ1 and then adding up the horizontal strips to ﬁnd the total mass
γ4 f (x, y ) dx dy = γ4 γ2 g (y ) dy = D f (x, y ) dx γ3 γ3 dy. γ1 The integral on the right is called an iterated or repeated integral. In a similar way, one
could begin with mass of vertical strips
γ4 h(x) = f (x, y ) dy
γ3 and add these up
γ2 f (x, y ) dx dy =
D γ2 γ4 h(x) dy = f (x, y ) dy γ1 γ1 dx. γ3 For most purposes, it is suﬃcient to consider domains which are of the two types
pictured
a figure goes here
that is, D is bounded on two sides by straight line segments. More complicated domains
can be treated by decomposing them into domains of these two types, where one or both
of the straight line segments might degenerate to a point.
Theorem 8.21 . If f is continuous on a domain D1 (respectively D2 ) as above, then
the iterated integral
b φ2 (x) β f (x, y ) dy
a dx φ2 (y ) [resp. φ1 (x) f (x, y ) dx
α dy ] φ1 (y ) exists and equals
f dA.
D Proof not given. It is rather technical.
Remark: If a domain D happens to be of both types (as, for example, rectangles and
triangles are ) then either iterated integral can be used and yield the same result - since
they are both equal f dA . See Examples 1 and 3 below (Example 2 could also have
D been done both ways).
Examples: 354 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
f dA where f (x, y ) = x2 y and D is the rectangle in the ﬁgure. We (1) Evaluate
D shall integrate with respect to x ﬁrst.
2 3 (x2 + xy ) dx f dA =
D 1 dy. 1 The inner integral is the mass of a strip. Think of y as being the ﬁxed height of the
strip. Then
3 (x2 + xy ) dx =
1 x3 x2 y
+
3
2 x=3
x=1 =9+ 9y 1 y
26
−−=
+ 4y
2
32
3 Therefore, adding up all the strips we ﬁnd
2 f dA = ( D 1 26
26
+ 4y ) dy = ( y + 2y 2 )
3
3 y =2 = y =1 26
44
+6=
3
3 Let us evaluate this again, now integrating ﬁrst with respect to y .
3 2 (x2 + xy ) dy f dA =
D First 1 2 (x2 + xy ) dy = (x2 y +
1 so 3 f dA =
D 1 dx. 1 xy 2
)
2 y =2
y =1 3
= x2 + x
2 x2 3
3
(x2 + x) dx = ( + x2 )
2
3
4 x=3
x=1 = 44
,
3 which agrees with the previous computation. Instead of imagining f as the density
of D , one can also take f to be the height function of a surface above D . Then the
integral f dA is the volume of the solid whose base is D and whose “top” is the
D surface M with points (x, y, f (x, y )) . In this case, the volume is 44/3.
f dA where f (x, y ) = x2 + xy + 2 and D is the domain bounded by (2) Evaluate
D the curves φ1 (x) = 2x2 , φ2 (x) = 4 + x2 , and x = 0 .
Integrate ﬁrst with respect to y . Then y varies between 2x2 and 4 + x2 , while x
varies between the two straight lines x = 0 and x = 2 .
2 4+x2 (x2 + xy + 2) dy f dA =
0 D
2 (x2 y + =
0
2
0 dx 2 x2 xy 2
+ 2y )
2 y =4+x2
y =2x2 dy 3
464
(8 + 8x + 2x2 + 4x3 − x4 − x5 ) dy =
2
15 8.4. MULTIPLE INTEGRALS 355 f dA where f (x, y ) = (x − 2y )2 and D is the triangle bounded by (3) Evaluate
D x = 1, y = −2 , and y + 2x = 6.
We shall integrate ﬁrst with respect to x . Then x varies between x = 1 and
x = − 1 y + 2 , while y varies between the lines y = −2 and y = 2 .
2
1
y +2
2 2 f dA =
−2 D (x − 2y )2 dx dy 1 Since
1
− 2 y +2 1 1
(x − 2y )2 (x − 2y )2 dx = (x − 2y )3
3 we ﬁnd
f dA =
D x=− 1 y +2
2
x=1 1
5
1
= (2 − y )3 − (1 − 2y )3 ,
3
2
3 2 1
3 164
5
.
[(2 − y )3 − (1 − 2y )3 ] dy =
2
3
−2 One can also integrate ﬁrst with respect to y . Then y varies between y = −2 and
y = −2x + 6 , while x varies between the lines x = 1 and x = 3 .
−2x+4 3 (x − 2y )2 dy f dA =
D dx. −2 1 Since
−2x+4
−2 1
(x − 2y )2 dy = − (x − 2y )3
6 y =−2x+4
y =−2 1
= − [(5x − 8)3 − (x + 4)3 ]
6 we again ﬁnd
f dA = −
D 1
6 3 [(5x − 8)3 − (x + 4)3 ] dx =
1 164
.
3 (4) Find the volume of the pyramid P bounded by the four planes x = 0, y = 0, z = 0,
and x + y + z = 1 . The easiest way to do this is to let z = f (x, y ) = 1 − x − y be
the height function of the tilted plane which we shall take as the top of the pyramid
which lies above the triangle D (in the xy plane) which is bounded by the three
lines x = 0, y = 0 , and x + y = 1 . Then
Volume (P ) = f (x, y ) dx dy
D One can integrate with respect to either x or y ﬁrst. We shall do the x integration
ﬁrst.
1− y 1 (1 − x − y ) dx f dA =
D Since 1− y
0 0 dy. 0 1
(1 − x − y ) dx = − (1 − x − y )2
2 x=1−y
x=0 1
= (1 − y )2
2 356 CHAPTER 8. MAPPINGS FROM EN TO E : THE DIFFERENTIAL CALCULUS
we ﬁnd
Volume (P ) = f dA =
D 1
2 1
0 1
(1 − y )2 dy = − (1 − y )3
6 1
0 1
=.
6 This agrees with the usual formula for the volume of a pyramid
1
Vol = altitude × area of base.
3
The identical methods work for triple integrals. All of the theorems and proofs remain
unchanged. Again the integral
f dV
D can either be interpreted as the mass of a solid D with density f , or as the “volume” of a four dimensional solid whose base is D and top in the surface with points
(x, y, z, f (x, y, z )) . Because of conceptual diﬃculties, one usually thinks of f as a density.
Calculation of triple integrals is done by evaluating three integrals, as
f dV = f (x, y, z ) dz dy dx, D where the limits in the iterated integral on the right are determined from the domain D .
An example should illustrate the idea adequately,
f dV where f (x, y, z ) ≡ c and D is the solid bounded by the Example: Evaluate
D two planes z ≡ 0, y ≡ 2 , and the surface z ≡ −x2 + y 2 . We have to evaluate c dV
D which is the mass of the solid D with constant density c , that is c times the volume of
D . It is convenient to carry out the z integration ﬁrst, then the x integration
2 − x2 + y 2 y c dV = c
D dz
−y 0 dx dy. 0 The x limits of integration have been found by looking at the region of integration in the
xy plane beneath the surface z = −x2 + y 2 . This region, found by setting z = 0 , consists
of the points between the straight lines 0 = −x2 + y 2 , that is between the lines x = y and
x = −y . Then
2 y (−x2 + y 2 ) dx f dV = c
D
2 (− =c
0 x3
+ xy 2 )
3 dy −y 0 2 x= y
x= − y dy = c
0 4
16
dy = c.
3
3 By letting c = 1 , the volume of the solid is seen to be 16/3 . Exercises
(1) Evaluate xy dx dy for the following domains D in two ways:
D and ( xy dy ) dx . ( xy dx) dy 8.4. MULTIPLE INTEGRALS 357 (a) D is the rectangle with vertices at (1, 1), (1, 5), (3, 1) and (3, 5) .
(b) D is the triangle with vertices at (1, 1), (3, 1) and (3, 5) .
(c) D is the region enclosed by the lines x = 1, y = 2, and the curve y = x³ (a
curvilinear triangle).
(d) D is the region enclosed by the curves y = x² and y = √x.
(2) Evaluate
\[ \iint_D \sin \pi(2x + y)\, dx\, dy, \]
where D is the triangle bounded by the lines x = 1, y = 2 and x − y = 5.
(3) Evaluate
\[ \iint_D (xy - y^3)\, dx\, dy, \]
where D is the region enclosed by the lines x = −1, x = 1, y = −2 and the curve y = 2 − x².
(4) Evaluate
\[ \iiint_D (xy + z)\, dx\, dy\, dz, \]
where D is the rectangular parallelepiped bounded by the six planes x = −2, y = 1, z = 0, x = 1, y = 2, z = 3.
(5) Evaluate
\[ \iiint_D xyz\, dx\, dy\, dz, \]
where D is the solid enclosed by the paraboloid z = x² + y² and the plane z = 4.
(6) Find the volume of an octant of the ball x² + y² + z² ≤ a² in two ways:
(a) by evaluating
\[ \iint_D f(x, y)\, dx\, dy, \]
where f is a suitable function and D a suitable domain;
(b) by evaluating
\[ \iiint_D dx\, dy\, dz, \]
where D is the octant of the ball.
(7) If f(x, y) > 0 is the density function of a plate D, the x and y coordinates of the center of mass (x̄, ȳ) are defined by
\[
\bar x = \frac{\iint_D x f(x, y)\, dx\, dy}{\iint_D f(x, y)\, dx\, dy},
\qquad
\bar y = \frac{\iint_D y f(x, y)\, dx\, dy}{\iint_D f(x, y)\, dx\, dy}.
\]
Find the center of mass of a triangle whose vertices are at the points (0, 0), (0, 4), and (2, 0), and whose density is f(x, y) = xy + 1.
(8) The moment of inertia with respect to a point p = (ξ, η) of a plate D with density f(x, y) is defined by
\[ J_p(D) = \iint_D [(x - \xi)^2 + (y - \eta)^2]\, f(x, y)\, dx\, dy . \]
(a) Find the moment of inertia of the plate in Exercise 7 with respect to the point p = (1, 0).
(b) If D is any plate (with sufficiently smooth boundary), prove that the moment of inertia is smallest if the point p = (ξ, η) is taken to be the center of mass of D. [Hint: Consider J as a function of the two variables ξ and η and show J has a minimum at (x̄, ȳ).]
(9) (a) Show that
\[ \iint_D f_{xy}(x, y)\, dx\, dy = f(p_1) - f(p_2) + f(p_3) - f(p_4), \]
where D is a rectangle with vertices at p1, p2, p3, p4 (see fig.).
(b) Use the result of part (a) to again evaluate the integral in Ex. 1a.
(c) If U(x, y) satisfies the partial differential equation Uxy = 0 for 0 < y < x and U(x, x) = 0 while U(x, 0) = x sin x, find U(x, y) for all points (x, y) in the wedge 0 < y < x. [Answer: U(x, y) = x sin x − y sin y for 0 < y < x.]
(10) Let f(x, y) be a bounded function which is continuous except on a set of points of content zero, and suppose f has compact support. Prove that f is Riemann integrable. This again proves Theorem 13.
(11) Let D1 and D2 be domains whose boundaries have zero content and whose intersection D1 ∩ D2 has zero content.
(a) If f is continuous on D1 ∪ D2, prove that the integral \( \iint_{D_1 \cup D_2} f\, dA \) exists and that
\[
\iint_{D_1 \cup D_2} f\, dA = \iint_{D_1} f\, dA + \iint_{D_2} f\, dA .
\]
(b) Give an example showing the above equality does not hold if D1 ∩ D2 has nonzero content.
(12) (a) By an explicit construction, show that the region D = {(x, y) ∈ E2 : |x| + |y| ≤ 1} has boundary with zero content.
(b) By an explicit construction, show that the circle {(x, y) ∈ E2 : x² + y² = 1} has zero content.
(13) (a) By interchanging the order of integration, show that
\[
\int_0^x \Big( \int_0^s f(t)\, dt \Big) ds = \int_0^x (x - t) f(t)\, dt .
\]
(b) Show that
\[
\int_0^x \Big( \int_0^s \Big( \int_0^r f(t)\, dt \Big) dr \Big) ds = \int_0^x \frac{(x - t)^2}{2}\, f(t)\, dt .
\]
(14) Let D be a plate in the x, y plane with density f and total mass M. If p = (ξ, η) is an arbitrary point in the plane and p̄ = (x̄, ȳ) is the center of mass of D, prove
\[ J_p(D) = J_{\bar p}(D) + M \|p - \bar p\|^2 , \]
where the notation of Exercise 8 has been used. This is the parallel axis theorem. It again proves the result of Exercise 8b.

Chapter 9

Differential Calculus of Maps from En to Em

9.1 The Derivative
Now we generalize the ideas of Chapters 7 and 8 and consider nonlinear mappings from
a set D in En to Em , F : D ⊂ En → Em , or Y = F (X ) , where X ∈ D and Y ∈ Em . In
coordinates, these functions look like
\[
\begin{aligned}
y_1 &= f_1(x_1, \dots, x_n) \\
&\ \vdots \\
y_m &= f_m(x_1, \dots, x_n),
\end{aligned}
\]
where the functions fj are scalar-valued. The special case n = 1, m arbitrary, was treated
in Chapter 7, section 3, while the special case m = 1, n arbitrary, was treated in Chapter
8.
One interpretation of maps F : D ⊂ En → Em is as a geometric transformation from
some subset D of En into all or part of Em .
EXAMPLES.
(1) The aﬃne map Y = F (X ) deﬁned by
y1 = 2 + x1 − 2x2
y2 = 1 + x1 + x2
maps E2 into E2 . Under this map, the origin goes into (2, 1) , the x1 axis (i.e. the
line x2 = 0 ) goes into the line y1 − y2 = 1 ,
a figure goes here
while the x2 axis goes into the line y1 + 2y2 = 4 . The shaded region indicates the
image of the indicated square.

(2) The map Y = F(X) defined by
\[
y_1 = x_1 - x_2, \qquad y_2 = x_1^2 + x_2^2
\]
maps all of E2 onto the upper half y1y2 plane (since y2 ≥ 0). Let us see what happens to a rectangle under this mapping. Consider the rectangle R in the figure. The x1 axis, x2 = 0, goes into the parabola y2 = y1², and the line x2 = 1 into y2 = 1 + (y1 + 1)².

a figure goes here

Similarly, the line x1 = 1 is mapped into y2 = 1 + (y1 − 1)², while x1 = 2 is mapped into y2 = 4 + (y1 − 2)². By following the images of the boundary ∂R, we now see that the interior of R is mapped into the shaded curvilinear "parallelogram". This mapping, though injective when restricted to our rectangle, is not injective for all (x1, x2) ∈ E2 since, for example, the points X1 = (1, 2) and X2 = (−2, −1) are both mapped into the same point (−1, 5).
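The failure of injectivity claimed in example (2) is easy to confirm directly; the following check is not part of the original notes.

```python
# The map of example (2): F(x1, x2) = (x1 - x2, x1^2 + x2^2).
def F(x1, x2):
    return (x1 - x2, x1**2 + x2**2)

# Both points are sent to the same image, so F is not injective on E2;
# the second component also shows why the range lies in y2 >= 0.
print(F(1, 2))
print(F(-2, -1))
```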
(3) The function w = x₁² + x₂², whose graph is a paraboloid, is a map from E2 into E1. It can also be regarded as a map from E2 into E3 by a useful artifice. Let y1 = x1, y2 = x2, and y3 = w = x₁² + x₂². Then
\[
y_1 = x_1, \qquad y_2 = x_2, \qquad y_3 = x_1^2 + x_2^2
\]
is a map F from E2 into E3. The image of the unit square (see figure) is then the shaded region in the figure above the image (y1, y2) of the square R.

a figure goes here
(4) The map F : E2 → E3 defined by (cf. example 2)
\[
y_1 = x_1 - x_2, \qquad y_2 = x_1^2 + x_2^2, \qquad y_3 = x_1 + x_2
\]
also represents a surface M. In fact, since y₁² + y₃² = 2y₂, this surface is a paraboloid opening out on the y2 axis. Again, we investigate where the rectangle R of example 2 is mapped. Since the y1 and y2 components of the mapping are the same as before, the image of R will lie on the surface M above the image (y1, y2) of (x1, x2). Thus the image of the rectangle R is a patch of the surface M.

From these examples, we see it is natural to regard any map F : D ⊂ E2 → Em
as an ordinary surface, or two dimensional manifold, embedded in Em , much as a map
F : D ⊂ E1 → Em was regarded as an ordinary curve. In the case m = 1 , the surface
F : D ⊂ E2 → E1 was representable as the graph of the function F . For m = 2 and
higher, this surface is seen as the range of the map. In the same way, an n dimensional
surface, or manifold, embedded in Em is a map F : D ⊂ En → Em. You might want to think of n as being the number of "degrees of freedom" on the manifold. In a strict sense, the map F : D ⊂ En → Em is not an n manifold embedded in Em unless Em is big enough to hold an n manifold, i.e. m ≥ n. However, by either using the graph of F, a subset of Em+n, or by using the trick of example 3, we can always think of the map F : En → Em as an n dimensional surface. For m ≥ n, this surface can be embedded as a subset of Em.
There are several valuable physical interpretations of these vector valued functions of a
vector, Y = F (X ) . Consider a ﬂuid ﬂowing through a domain D in E3 . The ﬂuid could
be air and D the outside of an airplane, or the fluid could be an organic fluid, and D some portion of the body.
The velocity V of a particle of ﬂuid is a three vector which depends upon the space coordinate (x1 , x2 , x3 ) as well as the time coordinate t of the particle, V = F (x1 , x2 , x3 , t) =
F (X, t) . This velocity vector V (X, t) at X points in the direction the ﬂuid is moving.
Thus, the velocity function is an example of a mapping from space-time E3 × E1 ≅ E4 into vectors in E3. In this case, we think of the velocity vector V = F(X, t) as having its foot
at the point X ∈ D and imagine the mapping as the domain D along with a vector V
attached to each point of D (see ﬁg. above). One calls this a vector ﬁeld deﬁned on the
domain D , since it assigns a vector to each point of D .
A very common vector ﬁeld is a ﬁeld of forces. By this we mean that to every point
X of a domain D , we associate a vector F (X ) equal to the force an object at X “feels”.
If the forces are time dependent, then the force ﬁeld is written F (X, t), X ∈ D . You are
most familiar with the force ﬁeld due to gravity. If e3 is the direction toward the center
of the earth, and say e1 points east and e2 north along the surface of the earth (other
coordinates must be chosen for the north and south poles), then the gravitational force
is usually written as F = (0, 0, g ) , a constant vector pointing down to the center of the
earth. For more precise purposes, one must take into account the fact that g does vary
from place to place on the earth's surface. Then F(X) = (0, 0, g(X)). In even more accurate experiments, or in outer space, one must further account for the effect of the other heavenly
bodies. This brings in the other components of force as well as a time dependence due to the
motion of the earth, F (X, t) = (f1 (X, t), f2 (X, t), f3 (X, t)) . The force ﬁeld is imagined as
a vector attached to each point X in space, the vector having the magnitude and direction
of the net force F there.
An entirely diﬀerent example of a mapping F from En to Em is a factory - or an even
larger economic system. The vector X = (x1 , x2 , . . . , xn ) might represent the quantities
x1 , x2 , . . . of diﬀerent raw materials needed. Y = F (X ) could then represent the output
from the factory, the number yj being the quantity of the j th product produced from the
input X .
Turning to the quantitative mathematical aspect of the mappings F : En → Em , we
deﬁne the derivative. The deﬁnition will be formal, patterned directly on the deﬁnition of
the total derivative given previously (p. 578-9).
Definition: Let F : D ⊂ En → Em and X0 be an interior point of D. F is differentiable at X0 if there exists a linear transformation L(X0) : En → Em, depending on the base point X0, such that
\[
\lim_{\|h\| \to 0} \frac{\|F(X_0 + h) - F(X_0) - L(X_0)h\|}{\|h\|} = 0,
\]
where h ranges over all sufficiently small vectors in En. If F is differentiable at X0, we shall use the notations
\[
\frac{dF}{dX}(X_0) = F'(X_0) = L(X_0)
\]
and refer to them as the derivative of F at X0. If F′(X0) depends continuously on the base point X0 for all X0 in D, then F is said to be continuously differentiable in D, written F ∈ C¹(D).
Many of the results from Chapter 8 Sections 1 and 2 generalize immediately to the
present situation.
Proposition 9.1. The function F : D ⊂ En → Em is differentiable at the interior point X0 ∈ D if and only if there is a linear operator L(X0) : En → Em and a function R(X0; h) such that
\[
F(X_0 + h) = F(X_0) + L(X_0)h + R(X_0; h)\,\|h\|,
\]
where the remainder R(X0; h) has the property
\[
\lim_{\|h\| \to 0} \|R(X_0; h)\| = 0.
\]
Proof: (⇒) If F is differentiable at X0, let L(X0) be the derivative and take R(X0; h) = [F(X0 + h) − F(X0) − L(X0)h]/‖h‖. Then this L(X0) and R(X0; h) do satisfy the above conditions.
(⇐) If L(X0) and R(X0; h) are as above, then
\[
\lim_{\|h\| \to 0} \frac{\|F(X_0 + h) - F(X_0) - L(X_0)h\|}{\|h\|} = \lim_{\|h\| \to 0} \|R(X_0; h)\| = 0.
\]
Since L(X0) is linear, this proves F is differentiable at X0.
There is at most one derivative operator L(X0), that is,

Proposition 9.2 (Uniqueness of the derivative). Let F : D ⊂ En → Em be differentiable at the interior point X0 ∈ D. If L̂(X0) and L̃(X0) are linear operators both of which satisfy the conditions for the derivative of F at X0, then L̂(X0) = L̃(X0).

Proof: Word for word the same as the proof of Theorem 1, page 579-80.
If the map F = F(X) is given in terms of coordinates,
\[
\begin{aligned}
y_1 &= f_1(x_1, \dots, x_n) \\
y_2 &= f_2(x_1, \dots, x_n) \\
&\ \vdots \\
y_m &= f_m(x_1, \dots, x_n),
\end{aligned}
\]
how is the derivative computed, and what is its relationship to the derivative of the individual coordinate functions fj? The answer is contained in

Theorem 9.3. Let F mapping D ⊂ En into Em be given in terms of the coordinate functions fj(X), j = 1, …, m:
\[
\begin{aligned}
y_1 &= f_1(X) = f_1(x_1, \dots, x_n) \\
&\ \vdots \\
y_m &= f_m(X) = f_m(x_1, \dots, x_n).
\end{aligned}
\]
(a) Then F is differentiable or continuously differentiable at the interior point X0 ∈ D if and only if all of the fj's are respectively differentiable or continuously differentiable.
(b) Moreover, if F is differentiable at X0, then the derivative in these coordinates is given by the m × n matrix of partial derivatives
\[
L(X_0) := F'(X_0)
= \begin{pmatrix} f_1'(X_0) \\ \vdots \\ f_m'(X_0) \end{pmatrix}
= \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(X_0) & \cdots & \dfrac{\partial f_1}{\partial x_n}(X_0) \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(X_0) & \cdots & \dfrac{\partial f_m}{\partial x_n}(X_0)
\end{pmatrix}.
\]
The matrix is sometimes called the Jacobian matrix.
Proof: (a) Observe that the limit condition
\[
\lim_{\|h\|\to 0} \frac{\|F(X_0 + h) - F(X_0) - L(X_0)h\|}{\|h\|} = 0
\]
holds if and only if each of its components tends to zero,
\[
\lim_{\|h\|\to 0} \frac{|f_j(X_0 + h) - f_j(X_0) - L_j(X_0)h|}{\|h\|} = 0, \qquad j = 1, 2, \dots, m.
\]
Thus, if F is differentiable at X0, each of the coordinate functions fj is differentiable and has total derivative Lj(X0). Conversely, if each of the coordinate functions is differentiable at X0, all of the above limits exist, so the vector valued function F is also differentiable.
(b) Since the differentiability of F implies that of the coordinate functions, we have
\[
F'(X_0) = \begin{pmatrix} f_1'(X_0) \\ \vdots \\ f_m'(X_0) \end{pmatrix}.
\]
The result now follows by writing out each of the derivatives
\[
f_1'(X_0) = \Big( \frac{\partial f_1}{\partial x_1}(X_0), \dots, \frac{\partial f_1}{\partial x_n}(X_0) \Big), \qquad f_2'(X_0) = \dots \text{ etc.}
\]
and then inserting these in the expression for F′(X0).

Corollary 9.4. A function F : D ⊂ En → Em is continuously differentiable in D if and only if all the partial derivatives of its components ∂fi/∂xj exist and are continuous.

Proof: This follows from this theorem and Theorem 3, p. 585.
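The Jacobian matrix of Theorem 9.3 can be approximated by difference quotients, which gives a practical way to check a hand computation. The following sketch is not from the notes, and the sample map in it is a hypothetical illustration.

```python
def jacobian(F, X0, h=1e-6):
    """Approximate the m x n matrix of partial derivatives by central differences."""
    m = len(F(X0))
    n = len(X0)
    J = []
    for i in range(m):
        row = []
        for j in range(n):
            Xp = list(X0); Xp[j] += h
            Xm = list(X0); Xm[j] -= h
            row.append((F(Xp)[i] - F(Xm)[i]) / (2 * h))
        J.append(row)
    return J

# Hypothetical sample map F : E2 -> E3 with coordinate functions
# f1 = x1*x2, f2 = x1^2, f3 = x1 + 3*x2.
F = lambda X: (X[0] * X[1], X[0] ** 2, X[0] + 3 * X[1])
print(jacobian(F, [2.0, 1.0]))  # analytic Jacobian at (2,1): [[1,2],[4,0],[1,3]]
```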
EXAMPLES.
1. Let F be an affine map from En to Em,
\[ F(X) = Y_0 + BX, \]
where B is a linear operator from En to Em (which you may choose to think of as an m × n matrix with respect to some coordinate system) and Y0 = F(0) is a fixed vector in Em. Then F is differentiable at every point of En and its derivative is given by the eminently reasonable formula
\[ F'(X_0) = B, \]
where the operator B does not depend on X0. For proof, we observe that
\[ F(X_0 + h) - F(X_0) = Y_0 + B(X_0 + h) - [Y_0 + BX_0] = Bh. \]
Thus
\[
\lim_{\|h\|\to 0} \frac{\|F(X_0 + h) - F(X_0) - Bh\|}{\|h\|} = \lim_{\|h\|\to 0} \frac{0}{\|h\|} = 0.
\]
Since B is linear, this shows the derivative exists and is B. Let us do this again in coordinates. If B = ((b_{ij})), the function F is
\[
\begin{aligned}
f_1(X) &= y_{01} + b_{11}x_1 + b_{12}x_2 + \dots + b_{1n}x_n \\
f_2(X) &= y_{02} + b_{21}x_1 + \dots + b_{2n}x_n \\
&\ \vdots \\
f_m(X) &= y_{0m} + b_{m1}x_1 + \dots + b_{mn}x_n .
\end{aligned}
\]
Therefore each of the functions fj is clearly differentiable and
\[
\begin{aligned}
f_1'(X) &= \Big( \frac{\partial f_1}{\partial x_1}, \dots, \frac{\partial f_1}{\partial x_n} \Big) = (b_{11}, \dots, b_{1n}) \\
&\ \vdots \\
f_m'(X) &= \Big( \frac{\partial f_m}{\partial x_1}, \dots, \frac{\partial f_m}{\partial x_n} \Big) = (b_{m1}, \dots, b_{mn}).
\end{aligned}
\]
Consequently,
\[
F'(X_0) = \begin{pmatrix} f_1'(X_0) \\ \vdots \\ f_m'(X_0) \end{pmatrix}
= \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix} = B,
\]
which agrees with the result obtained without coordinates.
2. Let F : E2 → E3 be defined by
\[
\begin{aligned}
f_1(x_1, x_2) &= 2 - x_1 + x_2^2 \\
f_2(x_1, x_2) &= x_1 x_2 - x_2^3 \\
f_3(x_1, x_2) &= x_1^2 - 3 x_1 x_2 .
\end{aligned}
\]
Since each of the coordinate functions fj is continuously differentiable, so is F. Because
\[
f_1'(X) = (-1,\ 2x_2), \qquad f_2'(X) = (x_2,\ x_1 - 3x_2^2), \qquad f_3'(X) = (2x_1 - 3x_2,\ -3x_1),
\]
we find that at X0 = (3, 1)
\[
F'(X_0) = \begin{pmatrix} f_1'(X_0) \\ f_2'(X_0) \\ f_3'(X_0) \end{pmatrix}
= \begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 3 & -9 \end{pmatrix}.
\]
If X is near X0, then by Proposition 9.1 with h = X − X0,
\[
F(X) = F(X_0) + F'(X_0)(X - X_0) + \text{remainder}
= \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}
+ \begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 3 & -9 \end{pmatrix}
\begin{pmatrix} x_1 - 3 \\ x_2 - 1 \end{pmatrix} + \text{remainder},
\]
where the remainder term becomes less significant the closer X is to X0.
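The shrinking remainder in Example 2 can be seen numerically. This check (not part of the original notes) compares F with its affine part at a point near X0 = (3, 1).

```python
# Example 2: F and its affine approximation F(X0) + F'(X0)(X - X0) at X0 = (3, 1).
def F(x1, x2):
    return (2 - x1 + x2**2, x1 * x2 - x2**3, x1**2 - 3 * x1 * x2)

def affine(x1, x2):
    h1, h2 = x1 - 3, x2 - 1
    # F(X0) = (0, 2, 0), rows of F'(X0) are (-1, 2), (1, 0), (3, -9)
    return (0 - h1 + 2 * h2, 2 + h1, 0 + 3 * h1 - 9 * h2)

print(F(3.01, 1.02))
print(affine(3.01, 1.02))  # the two agree to roughly the square of |X - X0|
```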
Motivated by our previous work, it is natural to formally define the tangent map as follows.

Definition: Let F : D ⊂ En → Em be differentiable at the interior point X0 ∈ D. The tangent map at F(X0) to the (hyper)surface defined by F is defined to be the affine mapping
\[ \Phi(X) = F(X_0) + F'(X_0)(X - X_0). \]

Examples:

(1) Let F be the function of Example 2 above. Then the tangent map at X0 = (3, 1) is
\[
\Phi(X) = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}
+ \begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 3 & -9 \end{pmatrix}
\begin{pmatrix} x_1 - 3 \\ x_2 - 1 \end{pmatrix}.
\]
(2) Let F be the function of Example 4 (page 679). Then
\[
F'(X) = \begin{pmatrix} 1 & -1 \\ 2x_1 & 2x_2 \\ 1 & 1 \end{pmatrix}.
\]
Thus the tangent map at (2, 1) is
\[
\Phi(X) = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix}
+ \begin{pmatrix} 1 & -1 \\ 4 & 2 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 - 2 \\ x_2 - 1 \end{pmatrix}.
\]
If we let Y = Φ(X), then the tangent plane is found from
\[
\begin{aligned}
y_1 &= 1 + (x_1 - 2) - (x_2 - 1) \\
y_2 &= 5 + 4(x_1 - 2) + 2(x_2 - 1) \\
y_3 &= 3 + (x_1 - 2) + (x_2 - 1).
\end{aligned}
\]
By eliminating x1 and x2 from these equations, we find y2 = −5 + y1 + 3y3. A graph of the surface M and the tangent plane can now be drawn.
a figure goes here
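The elimination of x1 and x2 above can be confirmed by brute force: every point produced by the tangent map should satisfy the plane equation. This check is not part of the original notes.

```python
# Tangent map of example (2): Phi(X) = (1,5,3) + [[1,-1],[4,2],[1,1]] (X - (2,1)).
def Phi(x1, x2):
    h1, h2 = x1 - 2, x2 - 1
    return (1 + h1 - h2, 5 + 4 * h1 + 2 * h2, 3 + h1 + h2)

# Every image point should satisfy the plane equation y2 = -5 + y1 + 3*y3.
for (x1, x2) in [(2, 1), (2.5, 0.3), (-1, 4)]:
    y1, y2, y3 = Phi(x1, x2)
    print(abs(y2 - (-5 + y1 + 3 * y3)))  # zero up to rounding
```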
The next result is the generalization of the mean value theorem.

Theorem 9.5 (Mean Value Theorem). Let F : D ⊂ En → Em be differentiable at every point of D, where D is an open convex set in En. If F′(X) is bounded in D, that is, if there is a constant γ < ∞ such that |∂fi/∂xj (X)| ≤ γ for all X ∈ D and for all i = 1, …, m and j = 1, …, n, then
\[ \|F(X_2) - F(X_1)\| \le C\, \|X_2 - X_1\| \]
for all X1 and X2 in D, where C = √(nm) γ.

Proof: The idea is to use the components of F and to appeal to the similar theorem (p. 597-8) for functions from En → E1. By that theorem, if X1 and X2 are in D, then there is a point Z1 on the line segment joining X1 to X2 such that
\[ f_1(X_2) = f_1(X_1) + f_1'(Z_1)(X_2 - X_1), \]
and similarly for the other components f2, f3, …, fm. Thus
\[
\begin{pmatrix} f_1(X_2) \\ \vdots \\ f_m(X_2) \end{pmatrix}
= \begin{pmatrix} f_1(X_1) \\ \vdots \\ f_m(X_1) \end{pmatrix}
+ \begin{pmatrix} f_1'(Z_1) \\ \vdots \\ f_m'(Z_m) \end{pmatrix} (X_2 - X_1),
\]
where Z1, …, Zm are all on the segment joining X1 to X2.

a figure goes here

Observe that the fj′(Zj)'s are all row vectors. Let L be the matrix of derivatives in the last term above, that is,
\[
L = \begin{pmatrix} f_1'(Z_1) \\ \vdots \\ f_m'(Z_m) \end{pmatrix}
= \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(Z_1) & \cdots & \dfrac{\partial f_1}{\partial x_n}(Z_1) \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(Z_m) & \cdots & \dfrac{\partial f_m}{\partial x_n}(Z_m)
\end{pmatrix}.
\]
The above equation then reads
\[ F(X_2) = F(X_1) + L(X_2 - X_1). \tag{9-1} \]
This equation itself is sometimes referred to as the mean value theorem. Note, however, that the partial derivatives in L are not all evaluated at the same point.

Since |∂fi/∂xj (X)| ≤ γ for all X, if η is any vector in En, by Theorem 17, p. 373, we find that
\[ \|L\eta\| \le \sqrt{nm}\, \gamma\, \|\eta\| . \]
Taking η = X2 − X1 and using (9-1), we are led to the inequality
\[ \|F(X_2) - F(X_1)\| \le \sqrt{nm}\, \gamma\, \|X_2 - X_1\|, \]
which holds for any points X1 and X2 in D. With C = √(nm) γ, this is the desired inequality.

A few heuristic remarks. We have been considering mappings F : En → Em. In the
dimension no greater than n , dim R(L) ≤ n . Although this does not remain true for an
arbitrary nonlinear map F , it is still true if F is diﬀerentiable - after a suitable deﬁnition of
dimension for an arbitrary point set is made (for the range of F will not usually be a linear
space, the only sets whose dimension we have so far deﬁned). In the case of diﬀerentiable
maps F , it is easy to make a reasonable deﬁnition of dimension. The idea is to deﬁne
dimension of the range of F locally, that is, in the neighborhood of every point in the
range. If F : D ⊂ En → Em and F is diﬀerentiable at X ∈ D , then for all h suﬃciently
small,
F (X + h) = F (X ) + L(X ) h + remainder.
The dimension of the range of F at F(X) is defined to be the dimension of its affine part, which is the same as dim R(L(X)). Since L(X) is a linear operator, its range has a well defined dimension. Geometrically, we have defined the dimension of the range of F at F(X) as the dimension of the tangent plane at F(X). Our definition makes good physical sense, for it is exactly the number an insect on the surface would use for the dimension. The illustration below is for a map F : D ⊂ E2 → E3 whose range has dimension 2.
a figure goes here
Some special remarks should be made about maps from one space into another of the same dimension,
\[ F : D \subset E^n \to E^n . \]
Let us assume F is differentiable throughout D. Then the dimension of the range of F at F(X), X ∈ D, is the dimension of the range of L(X) = F′(X). If F is to preserve dimension at every point, then we must have dim R(L(X)) = n for all X ∈ D. For maps F given in terms of coordinates, this means the determinant of the n × n matrix L(X) does not vanish,
\[ \det L(X) = \det F'(X) \ne 0 \]
for all X ∈ D. In more conceptual terms, this states that a map F : D ⊂ En → En is dimension preserving at X0 ∈ D if its "affine part" Φ(X0 + h) = F(X0) + F′(X0)h is dimension preserving at X0 (there is no trouble with the constant vector F(X0), since it only represents a translation of the origin, which does not affect dimensionality).
From the geometric interpretation of determinants as volume, we see that the condition det F′(X0) ≠ 0 means that if a small set S ⊂ D has non-zero volume, then its image F(S) also has non-zero volume. In fact, we expect that if S is a small set about X0, then
\[ \mathrm{Vol}(F(S)) \approx |\det F'(X_0)|\, \mathrm{Vol}(S). \]
Our expectation is based upon the realization that if the points of S are all near X0, then F will behave like its affine part, Φ(X0 + h) = F(X0) + F′(X0)h, on the points X0 + h ∈ S. The above formula is a restatement of the effect of affine maps on volume (Corollary to Theorem 30, page 426). We shall return to this later (Chapter 10, Section 4).

Because of its frequent appearance, det F′(X) has a name of its own. It is called the
Jacobian determinant or just the Jacobian of F . If F is given in terms of coordinates,
\[
\begin{aligned}
y_1 &= f_1(x_1, \dots, x_n) \\
&\ \vdots \\
y_n &= f_n(x_1, \dots, x_n),
\end{aligned}
\]
then another common notation for the Jacobian is
\[
\det F'(X) = \frac{\partial(f_1, f_2, \dots, f_n)}{\partial(x_1, x_2, \dots, x_n)} .
\]
For these maps F from a space into one of the same dimension, F : D ⊂ En → En, there is a very special derivative which appears often. It is the sum of the diagonal elements of the derivative matrix F′(X). One writes this expression as ∇·F or div F, the divergence of F,
\[
\nabla \cdot F(X) = \operatorname{div} F(X)
= \frac{\partial f_1(X)}{\partial x_1} + \frac{\partial f_2(X)}{\partial x_2} + \cdots + \frac{\partial f_n(X)}{\partial x_n} .
\]
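Since the divergence is just the trace of the derivative matrix, it too can be checked with central differences. The sketch below is not from the notes, and the sample field in it is hypothetical.

```python
# Divergence as the sum of diagonal difference quotients of F'(X).
def divergence(F, X, h=1e-6):
    n = len(X)
    total = 0.0
    for i in range(n):
        Xp = list(X); Xp[i] += h
        Xm = list(X); Xm[i] -= h
        total += (F(Xp)[i] - F(Xm)[i]) / (2 * h)
    return total

# Hypothetical field on E2: F(X) = (x1*x2, x1 - x2^2), so
# div F = x2 - 2*x2 = -x2, which is -3 at the point (2, 3).
F = lambda X: (X[0] * X[1], X[0] - X[1] ** 2)
print(divergence(F, [2.0, 3.0]))
```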
For example, if Y = F (X ) is deﬁned by
\[
y_1 = x_1 + 2x_1 x_2, \qquad y_2 = x_1^2 - 3x_2 ,
\]
then
\[
F'(X) = \begin{pmatrix} 1 + 2x_2 & 2x_1 \\ 2x_1 & -3 \end{pmatrix}
\]
and
\[
\nabla \cdot F(X) = \operatorname{div} F(X) = (1 + 2x_2) + (-3) = -2 + 2x_2 .
\]
The signiﬁcance of the divergence will become clear later (Chapter 10, Section 2). You will
probably find it helpful to think of ∇ as the operator
\[
\nabla = \Big( \frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_n} \Big).
\]
Then ∇·F is the "scalar product" of the operator ∇ with the vector F.

EXERCISES.
(1) (a) Find the derivative matrix at the given point for the following mappings Y = F(X).
(i)
\[
y_1 = x_1^2 + \sin x_1 x_2, \qquad y_2 = x_2^2 + \cos x_1 x_2 \qquad \text{at } X_0 = (0, 0)
\]
(ii)
\[
y_1 = x_1^2 + x_3 e^{x_2} - x_3, \quad y_2 = x_1 - 3x_2 + x_1 \log x_3, \quad y_3 = x_2 + x_3, \quad y_4 = 5x_1 x_2 x_3 \qquad \text{at } X_0 = (2, 0, 1)
\]
(b) Find the equation of the tangent plane to the above surfaces at the given point.

(2) Consider the following map from E2 → E2,
\[
u = e^x \cos y, \qquad v = e^x \sin y .
\]
(a) Find the image of the following regions:
i) x ≥ 0, 0 ≤ y ≤ π/4
ii) x ≥ 0, 0 ≤ y ≤ π
iii) x ≤ 0, 0 ≤ y ≤ 2π
iv) 1 < x < 2, π/6 ≤ y ≤ π/3.
(b) Compute the derivative matrix and its determinant.
(3) If F : D ⊂ En → Em is diﬀerentiable at X0 ∈ D , prove it is then also continuous at
X0 .
(4) Let F and G both map D ⊂ En → Em, so the function f(X) = ⟨F(X), G(X)⟩ is defined for all X ∈ D and f : D → E1.
(a) If F and G are differentiable in D, prove f is also, and that
\[ f' = \langle F', G \rangle + \langle F, G' \rangle . \]
(b) Apply this result to the function
\[ f(X) = \langle X, AX \rangle - 2\langle X, Y \rangle, \]
where A is a constant linear operator from En → En and Y is a constant vector in En. How does the result simplify if A is self adjoint?

(5) If ϕ : D ⊂ En → E1 and F : D ⊂ En → Em, then the function G(X) := ϕ(X)F(X) is defined for all X ∈ D and G : En → Em.
(a) Let ϕ(x1, x2) = ax1 + bx2 and F(x1, x2) = (αx1 + βx2, γx1 + δx2). Let G = ϕF and compute G′(X).
(b) More generally, prove that if ϕ and F are differentiable in D, then G := ϕF is also differentiable, and find a formula for G′. If F is expressed in terms of coordinate functions, F = (f1, f2, …, fm), how does your formula read? Check the result with that of part (a).

(6) (a) If F : D ⊂ En → Em is differentiable in the open connected set D, and if F′(X) ≡ 0 for all X ∈ D, prove that F is a constant vector.
(b) If F and G, mapping D ⊂ En → Em, are differentiable in the open connected set D, and if F′(X) ≡ G′(X) for all X ∈ D, what can you conclude?

(7) Consider the map F : Q → R3 defined by
\[
F : \quad x = (a + b\cos\varphi)\cos\theta, \qquad y = (a + b\cos\varphi)\sin\theta, \qquad z = b\sin\varphi .
\]

a figure goes here

(a) Compute F′.
(b) Find the equation of the tangent map at (0, 0) and at (π/2, π/2).
(c) Determine the range of the tangent map at the above two points and indicate your findings in a sketch.

9.2 The Derivative of Composite Maps ("The Chain Rule").

Consider the two mappings
F : A ⊂ En → Em and G : B ⊂ Em → Er .
Then the composite map H := G ◦ F : A ⊂ En → Er is deﬁned if B contains the image of
all the points from A, F (A) ⊂ B .
a figure goes here
The map H = G ◦ F takes points from A ⊂ En and sends them into Er . From
knowledge of the derivatives of F and G , it is possible to compute the derivative of the
composite map G ◦ F .
Theorem 9.6. Let F : A ⊂ En → Em and G : B ⊂ Em → Er be differentiable maps defined in the open sets A and B, respectively, with F(A) ⊂ B (so the composite map H(X) := (G ∘ F)(X) is defined for all X ∈ A). If X0 ∈ A, let Y0 = F(X0) ∈ B. Then the composite map H is differentiable at X0 and
\[ H'(X_0) = G'(Y_0) \circ F'(X_0). \]
Remark: The multiplication G′(Y0) ∘ F′(X0) is the composition of the linear operators G′ and F′. If F and G are given in terms of coordinates, then the formula is just the product of the two matrices G′ and F′.
Before proving this theorem, we shall illustrate its meaning.
Example: Let F : E2 → E2 and G : E2 → E3 be defined by Y = F(X) and Z = G(Y) as follows:
\[
\begin{aligned}
y_1 &= x_1 - x_2^2 \\
y_2 &= x_2 \sin \pi x_1
\end{aligned}
\qquad\qquad
\begin{aligned}
z_1 &= y_1 y_2 \\
z_2 &= 1 + y_1^2 + y_2^2 \\
z_3 &= 5 - y_2^3 .
\end{aligned}
\]
Then
\[
F'(X) = \begin{pmatrix} 1 & -2x_2 \\ \pi x_2 \cos \pi x_1 & \sin \pi x_1 \end{pmatrix},
\qquad
G'(Y) = \begin{pmatrix} y_2 & y_1 \\ 2y_1 & 2y_2 \\ 0 & -3y_2^2 \end{pmatrix}.
\]
At X0 = (3, 2), we find Y0 = F(X0) = (−1, 0). Thus
\[
F'(X_0) = \begin{pmatrix} 1 & -4 \\ -2\pi & 0 \end{pmatrix},
\qquad
G'(Y_0) = \begin{pmatrix} 0 & -1 \\ -2 & 0 \\ 0 & 0 \end{pmatrix}.
\]
If H(X) = (G ∘ F)(X) = G(F(X)), then the derivative of H at X0 is
\[
H'(X_0) = G'(Y_0) \circ F'(X_0)
= \begin{pmatrix} 0 & -1 \\ -2 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & -4 \\ -2\pi & 0 \end{pmatrix}
= \begin{pmatrix} 2\pi & 0 \\ -2 & 8 \\ 0 & 0 \end{pmatrix}.
\]
from the formulas for F and G
z1 = y1 y2 = (x1 − x2 )(x2 sin πx1 )
2
2
z2 = 1 + y1 + y2 = 1 + (x1 − x2 )2 + x2 sin πx1
2
3
z3 = 5 − y2 = 5 − (x2 sin πx1 )3
and now directly computing H (X0 ) .
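The matrix product H′(3, 2) found above can be verified against difference quotients of the explicit composition. This numerical check is not part of the original notes.

```python
import math

# The chain rule example: F(x1,x2) = (x1 - x2^2, x2*sin(pi*x1)),
# G(y1,y2) = (y1*y2, 1 + y1^2 + y2^2, 5 - y2^3), and H = G o F.
def H(x1, x2):
    y1, y2 = x1 - x2**2, x2 * math.sin(math.pi * x1)
    return (y1 * y2, 1 + y1**2 + y2**2, 5 - y2**3)

def num_jacobian(X, h=1e-6):
    J = []
    for i in range(3):
        row = []
        for j in range(2):
            Xp = list(X); Xp[j] += h
            Xm = list(X); Xm[j] -= h
            row.append((H(*Xp)[i] - H(*Xm)[i]) / (2 * h))
        J.append(row)
    return J

# The theorem gives H'(3,2) = [[2*pi, 0], [-2, 8], [0, 0]].
print(num_jacobian([3.0, 2.0]))
```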
Proof of Theorem. Since F is differentiable at X0 ∈ A ⊂ En and G is differentiable at Y0 ∈ B ⊂ Em, for all sufficiently small vectors h ∈ En and k ∈ Em we can write
\[
\begin{aligned}
F(X_0 + h) &= F(X_0) + F'(X_0)h + R_1(X_0; h)\,\|h\| \\
G(Y_0 + k) &= G(Y_0) + G'(Y_0)k + R_2(Y_0; k)\,\|k\|,
\end{aligned}
\]
where
\[
\lim_{\|h\|\to 0} \|R_1(X_0; h)\| = 0 \qquad \text{and} \qquad \lim_{\|k\|\to 0} \|R_2(Y_0; k)\| = 0.
\]
Consequently, since H(X) := (G ∘ F)(X) = G(F(X)),
\[
\begin{aligned}
H(X_0 + h) &= G(F(X_0 + h)) \\
&= G\big(F(X_0) + F'(X_0)h + R_1(X_0; h)\,\|h\|\big) \\
&= G(F(X_0)) + G'(Y_0)F'(X_0)h + R_3(X_0; h)\,\|h\| ,
\end{aligned}
\]
where
\[
R_3(X_0; h) = G'(Y_0)R_1(X_0; h) + R_2(Y_0; k)\, \frac{\|k\|}{\|h\|}
\]
and
\[
k = F'(X_0)h + R_1(X_0; h)\,\|h\| .
\]
Thus, for all sufficiently small h,
\[
H(X_0 + h) = H(X_0) + G'(Y_0)F'(X_0)h + R_3(X_0; h)\,\|h\| .
\]
Because G′(Y0) and F′(X0) are linear maps, so is their product. Therefore we are done if we prove lim_{‖h‖→0} R3(X0; h) = 0.

By the triangle inequality,
\[
\|R_3(X_0; h)\| \le \|G'(Y_0)R_1(X_0; h)\| + \|R_2(Y_0; k)\|\, \frac{\|k\|}{\|h\|} .
\]
Since for fixed X0 the operators F′(X0) and G′(Y0) are constant operators, by Theorem 17, p. 373, there exist constants α and β such that for any vectors ξ ∈ En and η ∈ Em, ‖F′(X0)ξ‖ ≤ α‖ξ‖ and ‖G′(Y0)η‖ ≤ β‖η‖. This means
\[
\|k\| \le \|F'(X_0)h\| + \|R_1(X_0; h)\|\,\|h\| \le \big(\alpha + \|R_1(X_0; h)\|\big)\|h\|
\]
and
\[
\|G'(Y_0)R_1(X_0; h)\| \le \beta \|R_1(X_0; h)\| .
\]
Thus,
\[
\|R_3(X_0; h)\| \le \beta \|R_1(X_0; h)\| + \big(\alpha + \|R_1(X_0; h)\|\big)\|R_2(Y_0; k)\| .
\]
Now, as ‖h‖ → 0, so does ‖k‖ ≤ (α + ‖R1(X0; h)‖)‖h‖. From the definition of R1 and R2, this implies ‖R3(X0; h)‖ → 0 as ‖h‖ → 0, and completes the proof.
written in the form
dG dY
d
(G ◦ F ) =
◦
,
dx
dY dX
which could hardly be more simple to remember.
For the balance of this section, we shall work out a few more illustrations showing how
the chain rule is applied in diﬀerent concrete situations. We isolate the next example as an
important
Corollary 9.7. Let F : D ⊂ En → Em and the scalar valued function g : Em → E1 both satisfy the hypotheses of Theorem 9.6. If we write Y = F(X) in coordinates, F = (f1, f2, …, fm), and let h = g ∘ F, then
\[
\begin{aligned}
\frac{\partial h}{\partial x_1} &= \frac{\partial g}{\partial y_1}\frac{\partial f_1}{\partial x_1} + \frac{\partial g}{\partial y_2}\frac{\partial f_2}{\partial x_1} + \cdots + \frac{\partial g}{\partial y_m}\frac{\partial f_m}{\partial x_1} \\
&\ \vdots \\
\frac{\partial h}{\partial x_n} &= \frac{\partial g}{\partial y_1}\frac{\partial f_1}{\partial x_n} + \frac{\partial g}{\partial y_2}\frac{\partial f_2}{\partial x_n} + \cdots + \frac{\partial g}{\partial y_m}\frac{\partial f_m}{\partial x_n} .
\end{aligned}
\]
Remark: This is the chain rule for scalar-valued functions.
Proof: By Theorem 9.6,
\[
\frac{dh}{dX} = \frac{dg}{dY}\,\frac{dF}{dX} .
\]
Since
\[
\frac{dg}{dY} = \Big( \frac{\partial g}{\partial y_1}, \cdots, \frac{\partial g}{\partial y_m} \Big)
\]
and
\[
\frac{dF}{dX} =
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{pmatrix},
\]
we find upon multiplying the matrices that
\[
\frac{dh}{dX} = \Big( \sum_{j=1}^m \frac{\partial g}{\partial y_j}\frac{\partial f_j}{\partial x_1},\ \sum_{j=1}^m \frac{\partial g}{\partial y_j}\frac{\partial f_j}{\partial x_2},\ \cdots,\ \sum_{j=1}^m \frac{\partial g}{\partial y_j}\frac{\partial f_j}{\partial x_n} \Big).
\]
But we also know
\[
\frac{dh}{dX} = \Big( \frac{\partial h}{\partial x_1}, \frac{\partial h}{\partial x_2}, \cdots, \frac{\partial h}{\partial x_n} \Big).
\]
Comparison of the last two formulas gives the stated result.
EXAMPLE. Let F : E2 → E2 and g : E2 → E1 be defined by
\[
f_1(x_1, x_2) = x_1 - e^{x_2}, \qquad f_2(x_1, x_2) = e^{x_1} + x_2, \qquad g(y_1, y_2) = y_1^2 + y_1 y_2 .
\]
Then
\[
F'(X) = \begin{pmatrix} 1 & -e^{x_2} \\ e^{x_1} & 1 \end{pmatrix},
\qquad
g'(Y) = (2y_1 + y_2,\ y_1).
\]
If h = g ∘ F = g(F(x1, x2)), then
\[
\frac{dh}{dX} = (2y_1 + y_2,\ y_1) \begin{pmatrix} 1 & -e^{x_2} \\ e^{x_1} & 1 \end{pmatrix}
= \big(2y_1 + y_2 + y_1 e^{x_1},\ -(2y_1 + y_2)e^{x_2} + y_1\big).
\]
In particular, we find
\[
\frac{\partial h}{\partial x_1} = 2y_1 + y_2 + y_1 e^{x_1}
\qquad \text{and} \qquad
\frac{\partial h}{\partial x_2} = -(2y_1 + y_2)e^{x_2} + y_1 .
\]
These formulas could also have been found by directly applying the corollary, viz.
\[
\frac{\partial h}{\partial x_1} = \frac{\partial g}{\partial y_1}\frac{\partial f_1}{\partial x_1} + \frac{\partial g}{\partial y_2}\frac{\partial f_2}{\partial x_1} = (2y_1 + y_2)\cdot 1 + y_1 (e^{x_1}),
\]
˜
function g (x1 , x2 , x3 , t) , which depends on the point X = (x1 , x2 , x3 ) as well as t . The
˜
function g could be an expression of the temperature at a point X at time t . If the point
˜
˜
X represents your position in the room, then since you move around the room, X is itself
˜ = F (t) ,
˜
a function of t . Thus, if your position is speciﬁed by X
x1 = f1 (t), x2 = f2 (t), x3 = f3 (t), ˜
the temperature where you stand is h(t) = g (f1 (t), f2 (t), f3 (t), t) . Since F : E1 → E3 while
4 → E1 , the chain rule is not directly applicable because g is deﬁned on E4 , while the
g:E
˜
image of F is in E3 .
A simple - if artiﬁcial - device clears up the diﬃculty. Introduce another variable x4
and let X = (x1 , x2 , x3 , x4 ) . Then write g (x1 , x2 , x3 , x4 ) , as well as X = F (t) , with
x1 = f1 (t), x2 = f2 (t), x3 = f3 (t), x4 = f4 (t) ≡ t. Now, as before, h(t) = g (f1 (t), f2 (t), f3 (t), t) , but F : E1 → E4 and g : E4 → E1 . The
chain rule is thus applicable and gives
dh
dg dF
=
dt
dX dt 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). =( ∂g ∂g ∂g ∂g ,
,
,
∂x1 ∂x2 ∂x3 ∂x4 df1
dt
df2
dt
df3
dt 377 , 1
so that ∂g ∂f1
∂g ∂f2
∂g ∂f3
∂g
dh
=
+
+
+
.
dt
∂x1 ∂t
∂x2 ∂t
∂x3 ∂t
∂x4
Since x4 ≡ t , the last equation can also be written as
dh
∂g df1
∂g ∂f2
∂g df3 ∂g
=
+
+
+
.
dt
∂x1 dt
∂x2 ∂t
∂x3 dt
dt From a less formal viewpoint, this could have been obtained directly from the equation
h(t) = g (f1 (t), f2 (t), f3 (t), t) without dragging in the artiﬁcial auxiliary variable x4 . The
variable x4 has been introduced to show how the chain rule applies. Once the process is
understood, the variable x4 can (and should) be omitted.
4
EXAMPLE. Let g (x1 , x2 , x3 , t) = x1 t + 3x2 − x1 x3 + 1+t2 , and let x1 = 3t − 1, x2 =
2
et−1 , x3 = t2 − 1 . If h(t) = g (x1 (t), x2 (t), x3 (t), t) , we ﬁnd ∂g ∂x1
∂g dx2
∂g dx3 ∂g
dh
=
+
+
+
.
dt
∂x1 dt
∂x2 ∂t
∂x3 dt
∂t
= (t − x3 )3 + (6x2 )et−1 − (x1 )2t + x1 − 8t
.
(1 + t2 )2 In particular, at t = 1 , we have x1 = 2, x2 = 1, x3 = 0 so that
8
dh(1)
= (1 − 0)3 + (6)1 − (2)2 + 2 − = 5.
dt
4
It is straightforward to compute the second derivative d2 h/dt2 from the formula for
the ﬁrst derivative.
∂ dg dx1
∂ dg dx2
∂ dg dx3
dg
d2 h
()
()
()
=
+
+
+ ( ).
dt2
∂x1 dt dt
∂x2 dt dt
∂x3 dt dt
∂t dt
For this example, this gives
d2 h
= (−2t + 1)3 + (6et−1 )et−1 + (−3)2t+
dt2
(3 + 6x2 et−1 − 2x1 − 8 1 − 3t 2
).
(1 + t2 )3 At t = 1 , we have
∂2h
−2
(1) = (−2 + 1)3 + 6 − 6 + (3 + 6 − 4 − 8 ) = 4.
2
∂t
8
∂
The next example brings to the surface an ambiguity in the notation ∂x for partial
derivatives. This ambiguity is often a source of great confusion. Consider a scalar valued
function g (x1 , x2 , t, s) . If x1 = f1 (t) and x2 = f2 (t) , then h(t, s) = g (f1 (t), f2 (t), t, s) 378 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. depends on the two variables t and s . In order to see how h changes with respect to t ,
we regard s as being held ﬁxed and use the previous example to ﬁnd
∂g ∂f1
∂g ∂f2 ∂g
∂h
=
+
+
.
dt
∂x1 dt
∂x2 ∂t
∂t
We were careful and realized that the functions g (x1 , x2 , t, s) , a function with four
independent variables, and h(t, s) := g (f1 (t), f2 (t), t, s) , a function with only two independent variables, were diﬀerent functions. The usual (occasionally confusing) approach is to
be less careful and write
∂g
∂g ∂f1
∂g ∂f2 ∂g
=
+
+
.
dt
∂x1 dt
∂x2 ∂t
∂t
In the above equation, the term ∂g/∂t on the right is the partial derivative of g (x1 , x2 , t, s)
with respect to t while thinking of all four variables x1 , x2 , t and s as being independent.
On the other hand, the term ∂g/∂t on the left is the partial derivative of g (f1 (t), f2 (t), t, s)
as a function of two variables. After being spelled out like this, the formula does have a
clear meaning - but this is not at all obvious from a glance. One might even be mistakenly
tempted to cancel the terms ∂g/∂t from both sides of the equation.
It is often awkward to introduce a new name, as h(t, s) , for g (f1 (t), f2 (t), t, s) . Another
unambiguous procedure is available: use the numerical subscript notation for the partial
derivatives. Then g,1 always refers to the partial derivative of g with respect to its ﬁrst
variable, g,2 with respect to the second variable, etc. Thus, for the above example of
g (x1 , x2 , t, s) where x1 = f1 (t) and x2 = f2 (t) , we have
df1
df2
∂g
= g,1
+ g,2
+ g,3 .
∂t
dt
dt
This clearly distinguishes the two time derivatives g,3 and ∂g/∂t .
The seemingly unnecessary comma in the notation is to take care of the possibility
of vector valued functions G(x1 , x2 , t, s) whose coordinate functions are indicated by subg1
is a map into E2 , where the coordinate functions are
scripts. For example, if G =
g2
g1 (x1 , x2 , t, s) and g2 (x1 , x2 , t, s) , then if x1 = f1 (t) and x2 = f2 (t) , we have
∂G
=
∂t ∂ g1
∂t
∂g2
∂t = g1,1 f1 + g1,2 f2 + g1,3
g2,1 f1 + g2,2 f2 + g1,3 . Here g1,1 = ∂g1 /∂x1 , etc. The notation f1 for df1 (t)/dt could also have been replaced by
f1,1 —but this is unnecessary here since the fj are functions of one variable.
In applications, one commonly meets a problem of the following type. Let u(x, y ) be
a scalar valued function which satisﬁes the wave equation uxx − uyy = 0 . If F : E2 → E2
is deﬁned by
1
x = f1 (ξ, η ) = (ξ, +η )
2
1
y = f2 (ξ, η ) = (ξ − η )
2
and if h = u ◦ F , that is, h(ξ, η ) = u(f1 (ξ, η ), f2 (ξ, η )) , what diﬀerential equation does h
satisfy? First, we compute hξ and hη
∂h
∂u ∂f1 ∂u ∂f2
1
1
1
=
+
= ux ( ) + uy ( ) = (ux + uy )
∂ξ
∂x ∂ξ
∂y ∂ξ
2
2
2 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). 379 ∂h
∂u ∂f1 ∂u ∂f2
1
1
1
=
+
= ux ( ) + uy (− ) = (ux − uy )
∂η
∂x ∂η
∂y ∂η
2
2
2
In a similar way the second derivatives hξξ , hξη and hηη are found,
∂ (hξ ) ∂f1 ∂ (hξ ) ∂f2
∂2h
=
+
2
∂ξ
∂x ∂ξ
∂y ∂ξ
= 1∂
1 1∂
1
1
(ux + uy ) +
(ux + uy ) · = [uxx + 2uxy + uyy ]
2 ∂x
2 2 ∂y
2
4
∂ (hξ )
∂ (hξ ) ∂f1 ∂ (hξ ) ∂f2
∂2h
=
=
+
∂ξ∂η
∂η
∂x ∂η
∂y ∂η
= 1∂
1 1∂
−1
1
(ux + uy ) · +
(ux + uy ) ·
= [uxx − uyy ]
2 ∂x
2 2 ∂y
2
4
∂ (hη ) ∂f1 ∂ (hη ) ∂f2
∂2h
=
+
2
∂η
∂x ∂η
∂y ∂η = 1∂
1 1∂
1
1
(ux − uy ) · +
(ux − uy )(− ) = [uxx − 2uxy + uyy ]
2 ∂x
2 2 ∂y
2
4 Since hξη = 1 [uxx − uyy ] , and u satisﬁes the wave equation, we see that h satisﬁes the
4
equation
hξη = 0,
so, in fact, the equations for hxiξ and hηη are superﬂuous to obtain the desired result.
From this, it is easy to give another procedure for solving the wave equation, independent of Fourier series. Because hξη = 0 , we know that h(ξ, η ) = ϕ(ξ ) + ψ (η ) , where the
functions ϕ and ψ are any twice diﬀerentiable functions. However, h(ξ, η ) = u( ξ+η , ξ−η ) .
2
2
Since the equations x = ξ+η , y = ξ−η may be solved for ξ and η in terms of x and y ,
2
2
viz. ξ = x + y and η = x − y , we have h(x + y, x − y ) = u(x, y ) . But h(ξ, η ) = ϕ(ξ )+ ψ (η ) .
Consequently
u(x, y ) = ϕ(x + y ) + ψ (x − y ).
This formula is the general solution of the one space dimensional wave equation. It expresses
u in terms of two arbitrary functions ϕ and ψ .
These functions ϕ and ψ can be chosen so that the function u(x, y ) , a solution
of the wave equation, has any given initial position u(x, 0) = f (x) and initial velocity
uy (x, 0) = g (x) . Let us do this.
From the initial conditions we ﬁnd
f (x) = u(x, 0) = ϕ(x) + ψ (x)
g (x) = uy (x, 0) = ϕ (x) − ψ (x).
After diﬀerentiating the ﬁrst expression, one can solve for ϕ and ψ ,
ϕ (x) = f (x) + g (x)
,
2 ψ (x) = f (x) − g (x)
.
2 Integrate these:
x ϕ(x) = ϕ(0) +
0 f (s) + g (s)
f (x) − f (0) 1
ds = ϕ(0) +
+
2
2
2 x g (s) ds.
0 380 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S.
x ψ (x) = ψ (0) +
0 x f (s) + g (s)
f (x) − f (0) 1
ds = ψ (0) +
+
2
2
2 g (s) ds.
0 Thus,
u(x, y ) = ϕ(x + y ) + ψ (x − y ) = ϕ(0) + ψ (0) + f (x + y ) − f (0) 1
+
2
2 x+ y g (s) ds+
0 f (x − y ) − f (0) 1
− intx−y g (s) ds.
0
2
2 Because f (0) = ϕ(0) + ψ (0) , this simpliﬁes to
u(x, y ) = f (x + y ) − f (x − y ) 1
+
2s
2 x+ y g (s) ds,
x− y the famous d’Alembert formula for the solution of the one space dimensional wave equation in terms of the initial position f (x) and initial velocity g (x) . Unfortunately, simple
formulas like this are exceedingly rare. That is why a diﬀerent, more generally applicable,
procedure was used earlier to solve the wave equation. As was seen in Exercise 6, p. 645,
the d’Alembert formula is recoverable from the Fourier series. Exercises
(1) For the following function g and f , compute
the point X0 = (2, 2) . d
dX (g ◦ F ) and evaluate ∂
∂x1 (g ◦ F ) at (a) g (y1 , y2 ) = y1 y2 − y2 e2y1 ,
F : yz = 2x1 − x1 x2 , y2 = x2 + x2
1
2 (b) g (y1 , y2 ) = 7 + ey1 sin y2
F : y1 = 2x1 x2 , y2 = x2 − x2
1
2 2
2
(c) g (y1 , y2 , y3 ) = y1 − y2 − 3y1 y3 + y2 F : y1 = 2x1 − x2 , y2 = 2x1 + x2 , y3 = x2
1 (2) Let ϕ(x1 , x2 , t) := x2 x2 − te2x1 . If X = F (t) is deﬁned by x1 = 1 − t2 ,
d
ﬁnd dt (ϕ ◦ F ) at t = 1 .
(3) Let ϕ(x, s, t) := xs + xt + st . If x = f (t) = t3 − 7 , compute
∂2
Also compute ∂t2 (ϕ ◦ f ) at t = 3 . ∂
∂t (ϕ x2 = 3t + 1 , ◦ f ) at t = 3 . (4) If u(x, y ) = x2 − y 2 , while F := (f1 , fx ) is given by x = f1 (r, θ) = r cos θ, y =
f2 (r, θ) = r sin θ ﬁnd hr and hθ , where h := u ◦ F . Also compute, hrr , hrθ and
hθθ . 9.2. THE DERIVATIVE OF COMPOSITE MAPS (“THE CHAIN RULE”). 381 (5) (a) Let u(x, y ) be a scalar valued function and F : E2 → E2 be deﬁned by the polar
coordinate transformation
f1 (r, 0) = r cos θ, f2 (rθ) = r sin θ, Take h := u ◦ F . Find hr , hθ , hrr , hrθ , and hθ,θ . [Answer: h4 = ux cos θ +
uy sin θ, hrr = −uxx r sin θ + uyy (r cos θ − r sin θ)+ uyy r cos θ − ux sin θ + uy cos θ]
(b) Show that
uxx + uyy = hrr + 1
1
h + hr .
2 θθ
r
r (6) The two space dimensional wave equation is
utt = uxx + uyy
(a) If the space variables x, y are changed to polar coordinates (ex. 5) while the
time variable is not changed, the wave equation reads
htt =?
where h(r, θ, t) = u(r cos θ, r sin θ, t).
(b) If a given wave form depends only on the distance r from the origin and time
t , but not on the angle ∂ , how does the wave equation for h simplify?
(c) Consider the equation you found in b. Use the method of separation of variables
and seek a solution in the form h(r, t) = R(r)T (t) . What are the resulting
ordinary diﬀerential equations? Compare the equation for R(r) with Bessel’s
diﬀerential equation.
(7) If w = f (x, y, s) , while x = ϕ(y, s, t) and y = ψ (s, t) , ﬁnd expressions for the partial
derivative of the composite function g (ϕ(ψ, s, t), ψs) with respect to s and t .
(8) (a) Let u(x, y ) = f (x − y ) . Show that u satisﬁes the partial diﬀerential equation
ux + uy = 0.
(b) Let u(x, y ) = f (xy ) . Show that u satisﬁes the equation xux − yuy = 0 .
(c) Let u(x, y ) = f ( x ) . Show that u satisﬁes the equation
y
xux + yuy = 0.
(d) Let u(x, y ) = f (x2 + y 2 ) , so u only depends on the distance from the origin.
Show that u satisﬁes
yux − xuy = 0.
(9) Let u(x, y ) satisfy the equation xux + yuy = 0 .
(a) Change the equation to polar coordinates [Answer: if h(r, θ) := u(r cos θ, r sin θ) ,
then rhr = 0 ].
(b) Solve the equation for h(r, θ) and use it to deduce that u(x, y ) = f ( x ) for some
y
function f . (cf. Ex. 8c) 382 CHAPTER 9. DIFFERENTIAL CALCULUS OF MAPS FROM EN TO EM , S. (10) Assume u(x, y ) satisﬁes the equation
uxx − 2uxy − 3uyy = 0.
(a) Choose the constants α, β, γ , and δ so that after the change of variables x =
αξ + βη, y = γξ + δη , the equation for h(ξ, η ) = u(αξ + βη, γξ + δη ) is hξη = 0 .
(b) Use the result of part (a) to ﬁnd the general solution of the equation for u .
[Answer: u(x, y ) = ϕ(3x − y ) + ψ (x + y ) ].
(11) If f (x, y ) is a known scalar valued function, ﬁnd both partial derivatives of the
function f (f (x, y ), y ) .
(12) If W = G(Y ) and Y = F (X ) are deﬁned by
G:
ﬁnd d
dX (G w1 = ey1 −y2
,
w2 = ey1 +y2 y1 = x2 − 3x2 − x3
1
y2 = x1 + x2 + 3x3 ,
2 F: ◦ F) . (13) Let u(x, y ) be a solution of the two dimensional Laplace’s equation uxx + uyy = 0 .
(a) If u depends only on the distance from the origin u(x, y ) = h(r) , where r =
x2 + y 2 , what ordinary diﬀerential equation does h satisfy? Compare your
answer with that found in Exercise 5.
(b) Solve the resulting equation for h and deduce that all the solutions of the two
dimensional Laplace equation which depend only on the distance from the origin
are of the form
u(x, y ) = A + B log(x2 + y 2 ),
where A and B are constants.
(c) Now do the same thing all over again for a solution u(x1 , x2 , . . . , xn ) of the n
dimensional Laplace equation ux1 x1 + . . . + uxn xn = 0 , i.e. ﬁnd the form of
all solutions which only depend on r = x2 + . . . + x2 , u(x1 , . . . , xn ) = h(r) .
n
1
B
B
[Answer: u(x1 , . . . , xn ) = A + (x2 +...+x2 ) n−2 = A + rn−2 , n ≥ 3 ].
2
1 n (14) If f (t) is a diﬀerentiable scalar valued function with the property that f (x + y ) =
f (x) + f (y ) for all x, y ∈ E1 , prove that f (x) ≡ kx where k = f (1) .
(15) (a) Find the general solution of the partial diﬀerential equation ux − 2uy = 0 . [Hint:
Introduce new variables as in Ex. 10]
(b) What is the solution if one requires that u(x, 0) = x2 ? [Answer: u(x, y ) =
1
(x + 2 y )2 ]. Chapter 10 Miscellaneous Supplementary
Problems 1. (a) Sn , n = 1, 2, . . . , be a given sequence. Find another sequence an such that
N SN = an . In other words, given the partial sums Sn , ﬁnd a series whose
n=1 partial sums are Sn . To what extent are the an uniquely determined?
(b) Apply part (a) to ﬁnd an inﬁnite series
an whose n th partial sum Sn is
given by
1
(i) Sn = ,
(ii) Sn = e−n
n
2. Let S = { x ∈ R : x ∈ (−1, 1) } . Deﬁne addition on S by the formula x ⊕ y =
x+ y
1+xy , x, y ∈ S , where the operations on the right are the usual ones of arithmetic.
Show that the elements of S form a commutative group with the operation ⊕ .
3. (a) If an → a , prove that a1 +a2 +···+an
n → a also. (b) Assume that f is continuous on the interval [0, ∞] and lim f (x) = A . Deﬁne
x→∞ 1N
HN =
f (x) dx . Prove that lim HN exists and ﬁnd its value. [Hint:
x→∞
N0
Interpret HN as the average height of the function f ].
4. (a) Suppose that all the zeroes of a polynomial P (x) are real. Does this imply that
all the zeroes of its derivative P (x) are also real? (Proof or counterexample).
What can you say about higher derivatives P (k) (x) ?
(b) Deﬁne the n th Laguerre polynomial by
Ln (x) = ex d n n −x
[x e ].
dxn Show that Ln is a polynomial of degree n . Prove that the zeroes of Ln (x) are
all positive real numbers, and that there are exactly n of them.
383 384 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS
∞ an xn (which converges to f for |x| < ρ so 5. If f (x) has a Taylor series: f (x) =
n=0 the remainder does go to zero there) prove that f (cxk ) , where c is a constant and
k a positive integer, has the Taylor series
∞
k an cn xnk f (cx ) =
n=0 which converges to f (cxk ) for |x| < ( |ρ| )1/k . You must show that i) the Taylor
c
coeﬃcients for f (cxk ) are an cn , that ii) the power series for f (cxk ) converges for
|x| < ( |ρ| )1/k , and that iii) the remainder tends to zero. Apply the result to obtain
c
the Taylor series for cos(2x2 ) from that of cos x .
6. Yet another proof of Taylor’s Theorem. Beginning with equation 9 on p. 98, deﬁne
the function K (s) by
N K (s) = f (s) −
n=0 f (n) (x0 )
A(s − x0 )N +1
(s − x0 )n −
,
n!
(N + 1)! where A is picked so that K (ˆ) = 0 .
x
(a) Verify that K (x0 ) = K (x0 ) = . . . K (N ) (x0 ) = 0 .
(b) Use Rolle’s Theorem to prove that if a function K (s) satisﬁes the properties of a),
and if K (ˆ) = 0 , then there is a ξ between x and x0 such that K (N +1) (ξ ) = 0 .
x
ˆ
(c) Apply parts a) and b) to prove Taylor’s Theorem.
7. Assume
an converges. You are to investigate the convergence of
|an | under various hypotheses. a2 and
n (a) an arbitrary complex number
(b) an ≥ 0 .
an + 1
(c) lim
< 1 (not = 1).
n→∞
an
1
8. The harmonic series 1+ 1 + 3 +· · · has been said to diverge with “infuriating slowness”.
2
1
Find a number N such that 1 + 1 + 1 + · · · + N is at least 100. Compare this with
2
3
23 .
Avogadro’s number ∼ 6 × 10 9. Consider the series ∞
n=1 an , where the an ’s are real. (a) Let b1 , b2 , b3 , . . . and c1 , c2 , c3 , . . . denote the positive and negative terms respec∞
tively from a1 , a2 , . . . . If
n=1 an converges conditionally but not absolutely,
∞
∞
prove that both series
n=1 bn and
n=1 cn diverge.
(b) Let d1 , d2 , d3 , . . . , denote the terms a1 , a2 , a3 , . . . rearranged in any way. Prove
∞
Riemann’s theorem, which states that if
n=1 an converges conditionally but
∞
not absolutely, then by picking some suitable rearrangement, the series
n=1 dn
can be made to converge to any real number, while using other rearrangements,
it can be made to diverge to plus or minus inﬁnity. 385
10. If A and B are subsets of a linear space V , a) show that span{ A ∩ B } ⊂ span{ A }∩
span{ B } . Give an example showing that span{ A ∩ B } may be smaller than
span{ A } ∩ span{ B } .
b). Show that if A ⊂ B ⊂ span{ A } , then span{ A } ⊃ span{ B } .
11. Let A = { X1 , . . . , Xk } be a set of vectors in a linear space V . Denote by cs A
(coset of A ) the set
k csA = { X ∈ V : X = k aj Xj ,
j =1 aj = 1 }. where
j =1 Prove that cs A is a coset of V , in fact, the smallest coset of V which contains the
vectors X1 , . . . , Xk .
√
12. (a) Consider the set of real numbers of the form a + b 2 , where a and b are rational
numbers. Prove that this set is a vector space over the ﬁeld of rational numbers.
What is the dimension of this vector space?
(b) Consider√ set of numbers of the form a + bi , where a and b are real numbers
the
and i = −1 . Prove that this set is a vector space over the ﬁeld of real numbers
and ﬁnd its dimension.
13. If F1 and F2 are ﬁelds with F1 ⊂ F2 , we call F2 an extension ﬁeld of F1 – such
as R ⊂ C . As such, we may think of F2 as a vector space over the ﬁeld F1 (see
exercise 1l). In other words, take F2 as an additive group and take the scalars from
F1 . If this vector space is ﬁnite dimensional, the ﬁeld F2 is called a ﬁnite extension
of F1 , and the dimension n of this vector space is called the degree of the extension
and written n = [F2 : F1 ] .
(a) Prove that every element ξ ∈ F2 satisﬁes an equation
an ξ n + an−1 ξ n−1 + · · · + a0 = 0,
where the ak ∈ F1 and n = [F2 : F1 ] . [Hint: look at the examples of exercise
1l].
(b) If F1 ⊂ F2 ⊂ F3 are ﬁelds with
[F2 : F1 ] = n < ∞ and [F3 : F2 ] = m < ∞,
prove that [F3 : F1 ] < ∞ , in fact, prove
[F3 : F1 ] = [F3 : F2 ]]F2 : F1 ] = nm.
(c) Let F1 be the ﬁeld of rationals, F2 the ﬁeld whose elements have the form
√
a + b 3 , where a and b are rational, and let F3 be the ﬁeld whose elements
√
have the form c + d 5 , where c and d are√ F2 . Compute [F2 : F1 ] and ﬁnd
in
the polynomial of part a) satisﬁed by (1 − 3) ∈ F )2 . Compute [F3 : F2 ] and
[F3 : F1 ] . Find a basis for F3 as a vector space whose scalars are elements of
F1 . [The ideas in this problem are basic to modern algebra, particularly Galois’
theory of equations.] 386 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 14. Let Pj = (αj , βj ), j = 1, . . . , n, αj = αk be any n distinct points in the plane R2 .
One often wants to ﬁnd a polynomial p(x) = a0 + a1 x + · · · + aN xN which passes
through these n points, p(αj ) = βj , j = 1, . . . , n . Thus, p(x) is an interpolating
polynomial. Given any points P1 , . . . , Pn , prove that a unique interpolating polynomial p(x) degree n − 1(= N ) can be found. (More about this is in Exercises 17-18
below).
15. Let L1 and L2 be linear operators mapping V → V . Then they can be both
multiplied and added (or subtracted). The bracket product or commutator
[L1 , L2 ] ≡ L1 L2 − L2 L1
“measures the non-commutativity”. It is important in mathematics and physics. [In
quantum mechanics, the observables - like energy, momentum, and position - are
represented by self-adjoint operators. Two observables can be measured at the same
time if and only if their associated operators commute]. Prove the identities
(a) [L1 , L1 ] = 0, [L1 , I ] = 0
(b) [L1 , L2 ] = −[L2 , L1 ]
(c) [aL1 , L2 ] = a[L1 , L2 ] , a scalar
(d) [L1 + L2 , L3 ] = [L1 , L3 ] + [L2 , L3 ]
(e) [L1 , L2 , L3 ] = [L1 , L2 ]L3 + L2 [L1 , L3 ]
(f) [L1 , [L2 , L3 ]] + [L2 , [L3 , L1 ]] + [L3 , [L1 , L2 ]] = 0
(Part f is the Jacobi identity. It has been said that everyone should verify it once in
her lifetime.)
16. * Consider the normalized Legendre Polynomials,
en (x) = 2
1 dn 2
(x − 1)n ,
2n + 1 2n n! dxn n = 0, 1, 2, . . . which are an orthonormal set of polynomials in L2 [−1, 1], en being of degree n . If
f ∈ C [−1, 1] , prove that
N PN f = f , en en
n=0 converges to f in the norm of L2 [−1, 1] . [Hint: Use the form of the Weierstrass
Approximation Theorem (p. 255) and the method of Theorem (p. 241)].
17. * We again take up the interpolation problem begun in Exercise 13 above. Let Pj =
(αj , βj ), j = 1, 2, . . . , n be n points in the plane, αi = αj . Although we proved there
is a unique polynomial p(x) = a0 + a1 x + · · · + an−1 xn−1 of degree n − 1 passing
through the n points, the proof was entirely non-constructive. Here we (or you)
explicitly construct the polynomial.
(a) Show that the polynomial of degree n − 1
pj (x) = Πn= (x − αk )
˜
k
k =j is zero if x = αk , k = j , but pj (αj ) = 0 .
˜ 387
(b) Construct a polynomial pj (x) with the property pj (αk ) = δjk .
(c) Show that
n p(x) = βj pj (x)
j =1 is the desired (unique by Ex. 13) interpolating polynomial.
(d) Let P1 = (1, 1), P2 = (2, 1), P3 = (4, −1), P4 = (−1, −2) .
Find the interpolating polynomial using the above construction.
18. * If f is some complicated function, it is often useful to use an interpolating polynomial instead of the function. Then the polynomial p(x) will pass through the points
Pj = (αm , f (αj )), j = 1, . . . , n , so by Exercise 16,
n p(x) = f (αj )pj (x).
j =1 How much will p diﬀer from f in an interval [a, b] containing the αj ? You must
estimate the remainder R = f − p .
(a) Assume f ∈ C n [a, b] . Since R(x) = f (x) − p(x) vanishes at x = αj , j =
1, . . . , n , it is reasonable to write
R(x) = (x − α1 ) · · · (x − αn ) · (?)
Fix x and deﬁne the constant A by
ˆ
f (ˆ) − p(ˆ) = A(ˆ − α1 ) · · · (ˆ − αn ).
x
x
x
x
By a trick similar to that used in Taylor’s Theorem (cf. P. 104j Ex. 12), prove
that A = f (n) (ξ )/n! where ξ is some point in (a, b) . Thus,
f (ˆ) = p(ˆ) +
x
x (ˆ − α1 ) · · · (ˆ − αn ) (n−1)
x
x
f
(ξ ), ξ ∈ (a, b).
n! (b) Let f (x) = 2x , and α1 = −1, α2 = 0, α3 = 1, α4 = 2.
Find the approximating polynomial and ﬁnd an upper bound for the error in the
interval [−2, 2] .
19. If x is irrational and a,b,c, and d are rational (with ad − bc = 0) , prove that
is irrational.
20. Prove by induction that
1 + 3 + 5 + · · · + (2n − 1) = n2 .
21. (a) If x ≥ 0 , use the mean value theorem to prove
ex ≥ 1 + x. ax+b
cx+d 388 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS
(b) If ak ≥ 0 , prove that
n Pn ak ≤ Πn=1 (1 + ak ) ≤ e
k k=1 ak , k=1 (where Πn=1 bk = b1 b2 · · · bn ).
k
(c) If ak ≥ 0 , prove that the inﬁnite product Π∞ (1 + ak ) := limn→∞ Πn=1 (1 + ak )
k=1
k
∞
converges if and only if the inﬁnite series
k=1 converges.
22. Let an+1 = 2
1+an , where a1 > 1 . Prove that (a) the sequence a2n+1 is monotone decreasing and bounded from below.
(b) the sequence a2n is monotone increasing and bounded from above.
(c) does lim an exist?
n→∞ 23. Let ak , k = 1, . . . , n + 1 be arbitrary real numbers which satisfy a1 + a2 + · · · +
2
an+1
an
n−1 has at least one zero for
n + n+1 = 0 . Show that P (x) = a1 + a2 x + · · · + an x
x ∈ (0, 1) .
24. Suppose f ∈ C 2 in some neighborhood of x0 . Prove that
f (x0 + h) − 2f (x0 ) + f (x0 − h)
= f (x0 ).
h→0
h2
lim 25. Let s(x) and c(x) be continuously diﬀerentiable functions deﬁned for all x , and
having the properties
s (x) = c(x),
c (x) = s(x)
s(0) = 0, c(0) = 1. (a) Prove that c2 (x) − s2 (x) = 1 .
(b) Show that c(s) and s(x) are uniquely determined by these properties.
26. Consider an and (a) If lim n→∞ bn . bn
= K, K = 0, ∞ , then the series both converge or diverge together.
an (b) If an converges and lim bn
= 0 , then
an (c) If an converges and lim bn
= ∞ , then the series
an n→∞ n→∞ diverge (give examples).
(d) Apply these to:
∞ (i) 1
√
n− n
n=2 bn converges.
bn may converge or 389
∞ (ii)
n=1
∞ n3 1
√
−2 n (−1)n sin (iii)
n=1 π
. (Hint: as x → 0,
n sin x
x → 1 ). 27. The following (a weak form of Stirling’s formula) is an improvement of the result on
page 64, Ex. 6.
n log n − (n − 1) < log n! < (n + 1) log(n + 1) − 2 log 2 − (n − 1),
from which one ﬁnds
nn e−n+1 < n! < 1 (n + 1)(n+1) e−n+1 .
4
Prove these.
28. (a) Find the Taylor series expansion for f (x) = e−x about x = 0 .
(b) Show that the series found in (a) converges to e−x for all x in the interval
[−r, r] , where r > 0 is an arbitrary but ﬁxed real number.
29. Consider the sequence
N SN =
2 sin πx
dx.
x Does lim SN exist? [Hint: observe that SN can be written as
N →∞ N −1 SN = an ,
2 where n+1 an =
n sin πx
dx.
x sin πx
x ,x ≥ 2 , to deduce - by inspection - the needed properties of
Sketch a graph of
the an ’s. Please do not attempt to evaluate the integrals for an ].
30. Let A = { p ∈ P9 : p(x) = p(−x) } .
(a) Prove that A is a subspace of P9 .
(b) Compute the dimension of A .
31. Let X and Y be elements in a real linear space. Prove that X = Y
if (X + Y ) ⊥ (X − Y ) .
32. In the space R2 , introduce the new scalar product
< X, Y >= x1 y1 + 4x2 y2 ,
where X = (x1 , x2 ) and Y = (y1 , y2 ) . if and only 390 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS
(a) Verify that this indeed is a scalar product and deﬁne the associated norm X .
(b) Let X1 = (0, 1) and X2 = (4, −2) . Using this norm and scalar product, ﬁnd an
orthonormal set of vectors e1 and e2 such that e1 is in the subspace spanned
by X1 . 33. Let H be a scalar product space with X and Y in H . Find a scalar α which
makes X − αY a minimum. For this α , how are X − αY and Y related? [Hint:
Draw a picture in E2 ].
∞ ∞ an converges, where an ≥ 0 , does the series 34. If
n=1 1 √ a
also converge? Proof or
n2 counterexample.
35. Use the Taylor series about x0 = 0 to calculate sin.2 making an error less than
.005 . Justify your statements.
36. Let A = span{ (1, 1, 1, 1), (1, 0, 1, 0) } be a subspace of E4 . Find the orthogonal
complement, A⊥ , of A by giving a basis for A⊥ .
37. Prove that
(a) 1 + (b) 1 + 1
<
8
1
2 ∞
k=1 ∞
k=1 1
1
<1+ .
3
k
2 1
3
<1+ .
k2
4 38. Let ak be a sequence of positive numbers decreasing to zero, ak → 0 , and let SN =
a1 + a2 + · · · + aN .
(a) Prove that SN ≥ N aN .
(b) Use this to estimate the number, N , of terms needed to make
N k −1/4 > 1000.
k=1 39. Prove or give a counterexample:
∞ (a) If ∞ b2n must converge. bn converges, then
n=1
∞ n=1
∞ |bn | converges, then (b) If
n=1 |b2n | must converge.
n=1 40. Let X1 and X2 be elements of a scalar product space.
(a) If X1 ⊥ X2 , prove that X1 − aX2 ≤ X1 for any real number a . 391
(b) Prove the converse, that is, if X1 − aX2 ≤ X1 for every real number a ,
then X1 ⊥ X2 . [Hint: After your ﬁrst approach has failed, try looking at the
problem geometrically. How would you pick a to minimize the left side of the
inequality?].
41. Let Sn = a1 + a2 + · · · + an , where an → 0 as n → ∞ . Prove that Sn converges if
and only if S2n = a1 + a2 + · · · + a2n−1 + a2n converges (one could also use S3n etc.).
∞ 42. Show that the error in approximating the series
n=1 1
by the ﬁrst N terms is less
nn than N −N −1 .
43. A sample “multiplication” for points X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ) in R3 is
to deﬁne
X Y ≡ (x1 y2 , x2 y2 , x3 y3 ).
Deﬁne a multiplicative identity by yourself. Using these deﬁnitions for the multiplicative structure and the usual rules for the additive structure, show that the resulting
algebraic object is not a ﬁeld.
44. (a) Assume an ≥ 0 and bn ≥ 0 . Prove that ∠(an + bn ) converges if and only if the
series ∠an and ∠bn both converge.
(b) What if you allow the bn ’s to be negative?
1
1
1
1
45. (a) Show that the vectors e1 = ( √2 , √2 ), e2 = ( √2 , − √2 ) form an orthonormal basis
for E2 . (b) Write the vector X = (7, −3) in the form X = a1 e1 + a2 e2 , using the scalar
product to ﬁnd a1 and a2 (don’t solve linear equations).
46. Consider the linear space P2 as a subspace of L2 [0, 1] .
(a) If p(x) = 1 − x2 , compute p. (b) Find the orthonormal basis for A⊥ , where A = span{ 2 + x } .
(c) Find the polynomial ϕ ∈ P2 such that
p, ϕ = p(1) for all p ∈ P2 , that is, the same ϕ should work for all p ’s.
47. Give formal proofs for the following (trivial) properties of a norm on a linear space.
Only the axioms may be used.
(a) −X = X (b) X −Y = Y −X (c) X +Y ≥ X − Y (d) X1 + X2 + · · · + Xn ≤ X1 + X2 + · · · + Xn (I suggest induction here). 392 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 48. Consider R2 with the norms 1, 2 , and ∞ . (a) Draw a sketch of R2 indicating the unit ball for each of these three norms. (The
ball may not turn out to be “round”).
(b) Which of these three linear spaces have the following property: “given any subspace M and a point X0 not in M , then there is a unique point on M which
is closest to M .”
49. Are the following scalar products the set of functions continuous on [a, b] ? Proof or
counterexample.
b (a) [f, g ] = ( b f (x) dx)(
a g (x) dx)
a b b |f (x)| dx)( (b) [f, g ] = (
a |g (x)| dx)
a 50. (a) Let dim V = n and { X1 , . . . , Xn } ∈ V . Prove that { X1 , . . . , Xn } are linearly
independent if and only if they span V (so in either case, they form a basis for
V ).
(b) Let { e1 , . . . , en } be an orthonormal set of vectors for an inner product space
H . Prove this set of vectors is a complete orthonormal set for H if and only if
n = dim H .
(c) Prove that dim V = largest possible number of linearly independent vectors in
V.
51. (a) Let X and Y be any two elements in an inner product space. Prove that the
parallelogram law holds
X +Y 2 + X −Y 2 =2 X 2 +2 Y 2 (cf. page 192, Ex. 9).
(b) Consider the set of continuous functions on [0, 1] with the uniform norm, f ∞ =
max0≤x≤1 |f (x)| . Show that this norm cannot arise from an inner product, i.e.
there is no inner product such that for all f , f ∞ =
f , f . [Hint: If there
were, the relationship of part a would hold between the norms of various elements. Show that relationship does not, in fact, hold for the function f (x) = 1
and g (x) = x ].
52. (a) Let H be a ﬁnite dimensional inner product space and (X ) a linear functional
deﬁned for all X ∈ H . Show that there is a ﬁxed vector X0 ∈ H such that
(X ) = X, X0 for all X ∈ H. This shows that every linear functional can be represented simply as the result
of taking the inner product with some vector X0 . [Hint: First pick a basis
{ e1 , . . . , en } for H and let cj = (en ) . Now use the fact that the ej ’s are a
basis and that is linear]. 393
(b) Consider the linear space P2 with the L2 [0, 1] inner product. This gives an
inner product space H .
(i) Show that (p) = p( 1 ) is a linear functional.
3
(ii) Find a polynomial p0 such that (p) = p, p0 for all p ∈ H .
53. Consider the set S of pairs of real numbers X = (x1 , x2 ) . Deﬁne
X + Y = (x1 + y1 , x2 + y2 ), aX = (ax1 , x2 ). Is S , with this deﬁnition of vector addition and multiplication by scalars, a vector
space?
54. By inspection, place suitable restrictions on the contents a, b, c, · · · in order to make
the following operator linear:
T u = a[ d2 u
du
d3 u 2 + bx2 2 + cu
+ eu + f sin u + g.
3
dx
dx
dx d
55. Consider the operator D = dx on the linear space Pn of all polynomials of degree
less than or equal to n . Find R(D) and N(D) as well as dim R(D) and dim N(D) . 56. Let 1 −2
A = 2 0 ,
31 30
2
and C = 1 4 −1 .
0 −2 0 B= −1 0 −2
,
210 Compute all of the following products which make sense:
AB, BA, AC, CA, BC, CB, A2 , B 2 , C 2 , ABC, CAB.
57. Consider the mapping A : R4 → R3 which is 1 −1
A = 2 1
0 −3 deﬁned by the matrix 11
1 4
1 −2 (a) Find bases for N(A) and R(A) .
(b) Compute dim N(A) and dim R(A) .
58. Let A be a square matrix. Consider the system of linear algebraic equations
AX = Y0 ,
where Y0 is a ﬁxed vector. Assume these equations have two distinct solutions X1
and X2 ,
AX1 = Y0 , AX2 = Y0 , X1 = X2 .
(a) Find a third solution X3 . 394 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS
(b) Does there exist a vector Y1 such that the equations
AX = Y1
have no solutions? Why?
(c) det A =? 59. Let Q be a parallelepiped in En whose vertices Xk are at points with integer coordinates,
Xk = (a1k , a2k , · · · ank ), aik integers.
Prove that the volume of Q is an integer.
60. Let A and B be self-adjoint matrices. Prove that their product AB is self-adjoint
if and only if AB = BA .
61. Solve the following initial value problems.
u(0) = 1 , u (0) = 0
2 (a) u + 8u + 16u = 0,
(b) u + 10u + 16u = 0, u(0) = 1, u (0) = 2 1
u(0) = 4 , u (0) = 1 (c) u + 64u = 0, u(0) = 2, u (0) = −1 (d) u + 4u + 5u = 0, u(0) = 0, u (0) = −2 (e) 2u + 6u + 5u = 0,
(f) 4u − 4u + u = 0, u(1) = −1, u (1) = 0 (g) u + 8u + 16u = 2, 1
u(0) = 2 , u (0) = 0 (h) u + 8u + 16u = t, u(0) = 1 , u (0) = 0
2 (i) u + 8u + 16u = t − 2, u(0) = 0, u (0) = 0 (j) u + 8u + 16u = t − 2, 1
u(0) = 2 , u (0) = 0 (k) u + 10u + 16u = t, u(0) = 1, u (0) = 2 1
u(0) = 4 , u (0) = 2 (l) u + 64u = 64,
(m) u + 64u = t − 64, u(0) = 3 , u(0) = 0
4 (n) 2u + 6u + 5u = t2 , u(0) = 0, u (0) = −2 62. (The complex numbers as matrices).
(a) Show that the set of matrices
C={ a −b
ba : a and b are real numbers } is a ﬁeld.
(b) Find a map ϕ : C → complex numbers such that ϕ is bijective and such that
for all A, B ∈ C
(i) ϕ(A + B ) = ϕ(A) + ϕ(B )
(ii) ϕ(AB ) = ϕ(A)ϕ(B ). 395
63. (Quaternions as matrices). A definition: A division ring is an algebraic object which satisfies all of the field axioms except commutativity of multiplication.
(a) Show that the set of matrices

    Q = { [ z  −w̄ ]
          [ w   z̄ ]  : z, w are complex numbers }

form a division ring with the usual definitions of addition and multiplication for matrices.
(b) If we write z = x + iy, w = u + iv where i = √−1 and x, y, u , and v are real numbers, then Q can be considered as a vector space over the reals with basis

    1 = [ 1  0 ]    i = [ i   0 ]    j = [ 0  −1 ]    k = [ 0  i ]
        [ 0  1 ]        [ 0  −i ]        [ 1   0 ]        [ i  0 ] .

Compute i^2 , j^2 , k^2 , ij, jk, ki, ji, kj , and ik . (The set Q is called the quaternions).
64. Let

    A = [ 2  −3   1   0 ]
        [ 0   2  −3   1 ]
        [ 0   0   2  −3 ]
        [ 0   0   0   2 ] .

(a) Find det A .
(b) Find A^(−1) .
(c) Solve AX = Y , where Y = (2, 8, 8, −16) .
(d) Let L : P3 → P3 be the linear operator defined by

    Lp = p'' − 3p' + 2p ,    (u' = du/dx) .

Find the matrix eLe for L with respect to the following basis for P3 :

    e1 (x) = 1 ,  e2 (x) = x ,  e3 (x) = x^2/2 ,  e4 (x) = x^3/3! .

(e) Use the above results to find a solution of

    Lu = 2 + 8x + 4x^2 − (8/3)x^3 .

[Hint: Express the right side in the basis of part d.].
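Parts (a) and (c) can be sanity-checked numerically. A short sketch, assuming the banded matrix and right-hand side reconstructed above: A is upper triangular, so det A is the product of the diagonal, and AX = Y solves by back substitution.

```python
A = [[2, -3, 1, 0],
     [0, 2, -3, 1],
     [0, 0, 2, -3],
     [0, 0, 0, 2]]
Y = [2, 8, 8, -16]

# determinant of an upper-triangular matrix = product of the diagonal
det = 1
for i in range(4):
    det *= A[i][i]
print("det A =", det)          # 16

# back substitution for AX = Y
X = [0.0] * 4
for i in range(3, -1, -1):
    s = sum(A[i][j] * X[j] for j in range(i + 1, 4))
    X[i] = (Y[i] - s) / A[i][i]
print("X =", X)                # [-1.0, -4.0, -8.0, -8.0]
```

Read in the basis of part (d), these components give u = −1 − 4x − 8(x^2/2) − 8(x^3/3!) = −1 − 4x − 4x^2 − (4/3)x^3 , which one can check satisfies the equation of part (e).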
65. Let H be an inner product space, and suppose that A is a symmetric operator,
A∗ = A , with the additional property that A2 = A . Show that there exist two
subspaces V1 and V2 of H with all of the following properties
(i) V1 ⊥ V2
(ii) If X ∈ V1 , then AX = X
(iii) If Y ∈ V2 , then AY = 0
(iv) If Z ∈ H , then Z can be written uniquely as Z = X + Y where X ∈ V1 and
Y ∈ V2 .

66. (a) Find the inverse of the matrix

    A = [  2   1   0 ]
        [ −1   0   1 ]
        [  0  −1  −1 ] .
(b) Use the result of a) to solve AX = b for X where b = (7, −3, 2) .
67. Let A and B be 2 × 2 positive deﬁnite matrices with det A = det B . Prove that
det(A − B ) < 0 .
68. Let L : V1 → V2 be a linear operator with LX1 = Y1 and LX2 = Y2 . Give a proof
or counterexample to each of the following assertions:
(a) If X1 and X2 are linearly independent, then Y1 and Y2 must be linearly
independent.
(b) If Y1 and Y2 are linearly independent, then X1 and X2 must be linearly
independent.
69. Let p0 , p1 , p2 , . . . be an orthogonal set of polynomials on [a, b] where pn has degree n .
(a) Prove that pn is orthogonal to 1, x, x2 , . . . , xn−1 .
(b) Prove that pn is orthogonal to any polynomial q of degree less than n .
(c) Prove that pn has exactly n distinct real zeros in (a, b) . [Hint: Let α1 , . . . , αk be the places in (a, b) where pn (x) changes sign, so pn (x) = r(x)(x − α1 )(x − α2 ) · · · (x − αk ) where r(x) is a polynomial of degree n − k which does not change sign for x in (a, b) , say r(x) ≥ 0 . Show that

    ∫_a^b pn (x)(x − α1 ) · · · (x − αk ) dx > 0 .

If k < n , show that this contradicts the result of part b).].
70. Consider the system of inhomogeneous equations

    a11 x1 + · · · + a1n xn = b1 ,
        ·
        ·
        ·
    ak1 x1 + · · · + akn xn = bk .

Let A = ((aij )) and let Ab denote the augmented matrix

    Ab = [ a11  · · ·  a1n  b1 ]
         [  ·               ·  ]
         [ ak1  · · ·  akn  bk ]

formed by adding the bj 's as an extra column to A . Prove that the given system of equations has a solution if and only if dim R(A) = dim R(Ab ) .
71. Let A be an n × n matrix.
(a) Show that you cannot solve the equation
A2 = −I
if n is odd.
(b) Find a 2 × 2 matrix A such that A2 = −I .
(c) If n is even, ﬁnd an n × n matrix A such that A2 = −I .
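For parts (b) and (c) the standard candidate is the 90-degree rotation block, with copies placed down the diagonal when n is even. A quick numerical check (not a proof):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# the 2x2 rotation by 90 degrees squares to -I
J = [[0, -1],
     [1,  0]]
print(matmul(J, J))  # [[-1, 0], [0, -1]]

# block-diagonal version for n = 4
J4 = [[0, -1, 0, 0],
      [1,  0, 0, 0],
      [0,  0, 0, -1],
      [0,  0, 1, 0]]
print(matmul(J4, J4))
```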
72. Let A be an n×n matrix such that A2 = I . Prove that dim R(A+I )+dim R(A−I ) =
n.
73. Let f (x, y ) = (y − 2x2 )(y − x2 ) . Show that the origin is a critical point. Then
show that if you approach the origin along a straight line, the origin appears to be
a minimum. On the other hand, show that if curved paths are also used, then the
origin is a saddle point of f . [The point of this exercise is to illustrate the fact that
the nature of a critical point cannot be determined by merely approaching it along
straight lines].
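The phenomenon in this exercise is easy to see numerically; a small sketch (the sample slopes and the factor 1.5 are arbitrary choices):

```python
# f is positive near the origin along every straight line, yet negative
# along the curved path y = 1.5 x^2 between the two parabolas.
def f(x, y):
    return (y - 2 * x**2) * (y - x**2)

t = 1e-3
for m in [-3, -1, 0, 1, 3]:
    assert f(t, m * t) > 0     # lines y = m x through the origin
assert f(0.0, t) > 0           # the y-axis
assert f(t, 1.5 * t**2) < 0    # a curved path: origin is a saddle
print("minimum along every line, saddle along a curved path")
```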
74. (a) Let A be a diagonal matrix, no two of whose diagonal elements are the same.
If B is another matrix and AB = BA , prove that B is also diagonal.
(b) Let A be a diagonal matrix, B a matrix with at least one zero-free column
and with the further property that AB = BA . Prove that all of the diagonal
elements of A are equal.
75. (a) If Σ an converges, where an ≥ 0 , prove that Σ √an / n^p converges if p > 1/2 . [Hint: Schwarz].
(b) Find an example showing that the series may diverge if p = 1/2 .

76. Let [X, Y ] be an inner product on R3 with basis vectors e1 , e2 , e3 , not necessarily orthonormal. Let aij = [ei , ej ] . Prove that the quadratic form

    Q(X) = Σ_{i=1}^{3} Σ_{j=1}^{3} aij xi xj

is positive definite.
77. If A is self-adjoint and AX = λ1 X , AY = λ2 Y with λ1 ≠ λ2 , prove that X ⊥ Y .

78. Let S be a positive definite matrix. Prove that det S > 0 . [Hint: Consider the matrix A(t) ≡ tS + (1 − t)I , where 0 ≤ t ≤ 1 . Show that A(t) is positive definite, so det A(t) ≠ 0 . Then use the fact that A(0) = I and A(1) = S to obtain the
conclusion].
79. Consider the linear space of inﬁnite sequences
X = (x1 , x2 , x3 , · · · )
with the usual addition. Deﬁne the linear operator S (the right shift operator) by
SX = (0, x1 , x2 , x3 , · · · )
(a) Does S have a left inverse? If so, what is it?
(b) Does S have a right inverse? If so, what is it?
80. Find a right inverse for the matrix

    A = [ 1  0  1 ]
        [ 0  1  0 ] .

Can A have a left inverse? Why?
81. Which of the following statements are true for all square matrices A ? Proof or
counterexample.
(a) If A^2 = I , then det A = 1 .
(b) If A^2 = A , then det A = 1 .
(c) If A^2 = 0 , then det A = 0 .
(d) If A^2 = I − A , then det A^2 = 1 − det A .
82. Let L be a linear operator on an inner product space H with inner product <, > .
Deﬁne
[X, Y ] = ⟨LX, LY ⟩ .
Under what further condition(s) on L is [X, Y ] an inner product too?
83. Let L : H → H be an invertible transformation on the inner product space H . If
L “preserves orthogonality” in the sense that X ⊥ Y implies LX ⊥ LY , prove that
there is a constant α such that R ≡ αL is an orthogonal transformation.
84. Let H be an inner product space. If the vectors X1 and X2 are at opposite ends of
a diameter of the sphere of radius r about the origin, and if Y is any other point on
that sphere, prove that Y − X1 is perpendicular to Y − X2 , proving that an angle
inscribed in a hemisphere is a right angle.
85. If L is skew-adjoint, L∗ = −L , prove that
⟨X, LX⟩ = 0   for all X .
86. Let Dn be an n × n matrix with x on the main diagonal and 1's on both the sub- and super-diagonals, so

    D2 = [ x  1 ]       D3 = [ x  1  0 ]       D4 = [ x  1  0  0 ]
         [ 1  x ] ,          [ 1  x  1 ]            [ 1  x  1  0 ]
                             [ 0  1  x ] ,          [ 0  1  x  1 ]
                                                    [ 0  0  1  x ] ,    D5 = · · · .

If x = 2 cos θ , prove that det Dn = sin((n + 1)θ) / sin θ .

87. Let A and B be square matrices of the same size. If I − AB is invertible, prove
that I − BA is also invertible by exhibiting a formula for its inverse.
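Before hunting for the formula, the claim can be made plausible numerically: for sample matrices, det(I − AB) and det(I − BA) agree, so the two matrices are invertible together. A sketch (the particular A and B are arbitrary choices; this check does not reveal the inverse formula the exercise asks for):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def i_minus(M):
    # returns I - M for a 2x2 matrix M
    return [[(1 if i == j else 0) - M[i][j] for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
d1 = det2(i_minus(matmul(A, B)))
d2 = det2(i_minus(matmul(B, A)))
print(d1, d2)  # both determinants equal 6
assert d1 == d2
```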
88. Assume Σ an converges, where an ≥ 0 . Does the series Σ √(an an+1) also converge? Proof or counterexample.
89. Let A be a square matrix.
(a) Prove that AA∗ is self-adjoint.
(b) Is AA∗ always equal to A∗ A ? Proof or counterexample.
90. Show that C[0, 1] is a direct sum of the space V1 spanned by e1 (x) = x and e2 (x) = x^4 , and the subspace V2 of all functions ϕ(x) such that

    0 = ∫_0^1 x ϕ(x) dx ,    0 = ∫_0^1 x^4 ϕ(x) dx .

[Hint: Show that if f ∈ C[0, 1] , there are unique constants a and b such that g(x) ≡ f (x) − [ax + bx^4 ] belongs to V2 ].
91. Let V1 be the linear space of all complex-valued analytic functions in the open unit
disc, that is, V1 consists of all complex-valued functions f of the complex variable
z which have convergent power series expansions

    f (z) = Σ_{n=0}^{∞} an z^n

in the open disc, |z| < 1 .
Let V2 be the linear space of all sequences of complex numbers ( a0 , a1 , a2 , · · · ) with
the natural deﬁnition of addition and multiplication by constants.
Deﬁne L : V1 → V2 by the rule
Lf = (a0 , a1 , a2 , · · · ),
where the aj ’s are the Taylor series coeﬃcients of f . Answer the following questions
with a proof or counterexample.
(a) Is L injective?
(b) Is L surjective?
(c) Is ℓ^2 contained in R(L) ? (Note: ℓ^2 is the subspace of V2 such that Σ_{k=0}^{∞} |ak |^2 < ∞ ).

92. Do the following series converge or diverge?

    (a) Σ_{n=1}^{∞} √(1 + 1/n) ,      (b) Σ_{n=1}^{∞} ( √(1 + 1/n^2) − 1 ) .

93. Consider the set of four operators { T1 , T2 , T3 , T4 } defined as follows on the set of
square invertible matrices.
    T1 A = A ,    T2 A = A^(−1) ,    T3 A = A* ,    T4 A = (A^(−1))* .

Show that this set of four operators forms a commutative group with the group operation being ordinary operator multiplication.
94. Let SN = a1 − a2 + a3 − a4 + a5 − · · · ± aN . If 0 < ak and the ak 's are increasing, prove that |SN | ≤ aN .

95. The Monge–Ampère equation is uxx uyy − uxy^2 = 0 . Show that it is satisfied by any u(x, y) ∈ C^2 of the form u(x, y) = ϕ(ax + by) , where a and b are constants.
96. (a) Consider the differential operator

    Lu = u'' − 4u .

(i) Find a basis for the nullspace of L .
(ii) Find a particular solution of Lu = e^(2x+1) .
(iii) Find the general solution of Lu = e^(2x+1) .
(b) Consider the differential operator

    Lu = u'' + 4u .

Repeat part (a), only here use Lu = f , where f (x) = sec 2x .
97. Find the general solution for each of the following.
(a) 2u'' + 5u' − 3u = 0
(b) u'' − 6u' + 9u = 0
(c) u'' − 4u' + 5u = 0
98. Find the first four non-zero terms in the series solution of

    4x^2 u'' − 4xu' + (3 − 4x^2)u = 0
corresponding to the largest root of the indicial equation. Where does the series
converge?
99. Find the complete solution of each of the following equations valid near x = 0 by using power series.
(a) x^2 u'' + xu' − (x^2 + 1/4)u = 0
(b) u'' + xu' − u = 0 (only first five non-zero terms)
[Answers:
(a) u(x) = A x^(−1/2) Σ_{k=0}^{∞} x^{2k}/(2k)! + B x^(1/2) Σ_{k=0}^{∞} x^{2k}/(2k + 1)! ,
(b) u(x) = Ax + B ( 1 + x^2/2! − x^4/4! + 3x^6/6! − 15x^8/8! + · · · ) ].

100. Consider the matrix
    A = [ −1  −4  −12  0 ]
        [  1   3    6  0 ]
        [  0   0   −1  0 ]
        [  0  −4  −12  1 ] .

(a) Compute det A .
(b) Compute A−1 .
(c) Solve AX = b where b = (1, 2, 3, −1) .
101. True or false. Justify your response if you believe the statement is false (a counterexample is adequate).
(a) The set A = { X ∈ R3 : x1 = 2 } is a linear subspace of R3 .
(b) The vectors X1 = (2, 4) and X2 = (−2, 4) span R2 .
(c) The vectors X1 = (1, 2, 3), X2 = (−7, 3, 2), X3 = (2, −1, 1) , and X4 = (π, e, 5)
are linearly independent.
(d) The set A = { u ∈ C [0, 1] : u(x) = a1 x + a2 ex } is an inﬁnite dimensional
subspace of C [0, 1] .
(e) The functions f1 (x) = x and f2 (x) = ex are linearly dependent functions in
C [0, 1] .
(f) If { e1 , e2 , . . . , en } are an orthonormal set of vectors in E8 , then n ≤ 7 .
(g) The vector Y = (1, 2, 3) is orthogonal to the subspace of E3 spanned by e1 =
(0, 3, −2) and e2 = (−1, −1, 1) .
(h) The elements of the set

    A = { u ∈ C^2 [0, 10] : u'' + xu' − 3u = 6x }

can be represented as u(x) = ũ(x) + x^3 , where

    ũ ∈ S = { u ∈ C^2 [0, 10] : u'' + xu' − 3u = 0 } .
(i) The set of vectors e1 = (1/3, 0, 2/3, −2/3) , e2 = (0, 0, 1/√2, 1/√2) , and e3 = (8/9, 3/9, −2/9, 2/9) constitute a complete orthonormal basis for E4 .
(j) In the vector space of bounded functions f (x), x ∈ [0, 1] , the functions

    f1 (x) = 1 ,
    f2 (x) = 1 for 0 ≤ x ≤ 1/2 ,  0 for 1/2 < x ≤ 1 ,
    f3 (x) = 0 for 0 ≤ x ≤ 1/2 ,  1 for 1/2 < x ≤ 1

are linearly independent.
(k) The function f (x) = |x| can be represented by a convergent Taylor series about
the point x0 = 0 .
(l) The function f (x) = x^2 − x^73 can be represented by a convergent Taylor series
about the point x0 = −1 .
(m) The function f (x) = |x| can be represented by a convergent Taylor series about
the point x0 = −1 .
(n) The plane of all points (x1 , x2 , x3 , x4 ) ∈ E4 such that
2x1 − 4x2 + 6x3 − 5x4 = 7
is perpendicular to the vector (2, −4, 6, −5) .
(o) If e1 = (3/5, 4/5) and e2 = (4/5, −3/5) , then X = (−1, 2) can be written as X = 2e1 − e2 .
(p) The set of all integers (positive, negative, and zero) is a field.
(q) Consider the infinite series Σ_{k=0}^{∞} ak . If lim_{k→∞} |ak | = 0 , then the series must converge.
then the limiting value, a , must be a rational number too.
(s) The equation x6 + 3 = 0 , where x is an element of an ordered ﬁeld, has no
solutions.
(t) It is possible to write √i in the form a + ib , where a and b are real numbers. (Here i = √−1 , of course).
(u) Let an be a sequence of complex numbers. If the sequence of absolute values,
|an | , converges, then the sequence an must converge.
(v) If Σ_{k=0}^{∞} ak z^k converges at the point z = 3 , then it must converge at z = 1 + i .
(w) The linear subspace A = { p ∈ P7 : p(x) = a1 x + a2 x5 } is a ﬁve dimensional
subspace of P7 .
(x) The linear subspace A = { u ∈ C [−1, 1] : u(x) = a1 x + a2 x5 } is an inﬁnite
dimensional subspace of C [−1, 1] .
(y) There is a number α such that the vectors X = (1, 1, 1) and Y = (1, α, α2 )
form a basis for R3 .
(z) The operator T : C^2 → C^1 defined for u ∈ C^2 by T u = u' − 7u is a linear
operator.
102. (a) The operator T : C[0, 1] → R defined for u ∈ C[0, 1] by

    T u = ∫_0^1 |u(x)| dx

is a linear operator.
(b) The sequence |(1 + i)^n |^{1/n} converges to √2 .
(c) The series

    Σ_{k=1}^{∞} (k + 1)/(2k + 1) = 2/3 + 3/5 + 4/7 + 5/9 + · · ·

converges.
(d) If t is real, then eit = 1 .
(e) Let V1 and V2 be linear spaces and let the operator T map V1 into V2 . If
T 0 = 0 , then T is a linear operator.
(f) The operator T : C ∞ [−7, 13] → C ∞ [−7, 13] deﬁned by
Tu = u du
dx is linear.
(g) The operator T : C [0, 13] → C [0, 13] deﬁned by
    (T u)(x) = ∫_0^x u(t) sin t dt ,    x ∈ [0, 13]

is linear.
(h) In the scalar product space L2 [0, 1] , the functions f and g whose graphs are
a figure goes here
are orthogonal.
(i) Let L be a linear operator. If LX1 = Y and LX2 = Y , where X1 ≠ X2 , then
the solution of the homogeneous equation LX = 0 is not unique.
(j) Let L be a linear operator. If X1 and X2 are solutions of LX = 0 , then
3X1 − 7X2 is also a solution of LX = 0 .
(k) Let e1 = (1, 1) and e2 = (0, 1) , and let the linear operator L which maps R2
into R3 satisfy
Le1 = (1, 2, 3), Le2 = (1, −2, −1).
Then L(2, 3) = (1, 1, 1).
(l) In the space L2 [0, 1] , if f is orthogonal to the function x2 , then either f ≡ 0
or else f must be positive somewhere in [0, 1] .
(m) If F (X ) = (2, 3, 4) for all X ∈ E3 , then F is an aﬃne mapping.
(n) If f : E3 → E1 is such that f : (1, 0, 0) → 1 and f : (0, 4, 0) → 2 , there is a point Z ∈ E3 such that f (Z) ≥ 1/5 .
(o) Let A and B be square matrices with det A = 7 and det B = 3 . Then det(A + B) = 10 .
(p) If A : R3 → R2 is given by

    A = [ 2  3  1 ]
        [ 1  9  2 ] ,

then dim N(A) = 2 .
(q) The function f (x, y, z ) = 9 + 3x + 4y − 7z does not take on its maximum value.
(r) If the function u(x) has two derivatives in some neighborhood of x = 0 , and satisfies the differential equation

    9x^2 u'' − 28u = 0 ,

then u(0) = 0 .
(s) There are constants a, b , and c such that the function u(x) = e^x + 2e^{2x} − e^{−x} is a solution of

    au'' + bu' + cu = 0 .
(t) The vector (xy, x) is the derivative of some real-valued function f (x, y ) .
(u) The vector (y, x) is not the derivative of some real-valued function f (x, y ) .
(v) Given any q × p matrix A = ((aij (X))) , where X = (x1 , · · · , xp ) and where the elements aij (X) are sufficiently differentiable functions, then there is a map F : Rp → Rq such that F '(X) = A .
(w) If A is a square matrix and A2 = A , then A = I .
(x) If A is a square matrix and A2 = 0 , then A = 0 .
(y) If A is a square matrix and det A = 0 , then A2 = A if and only if A = I .
(z) If X, Y , and Z are three linearly independent vectors, then X + Y , Y + Z , and X + Z are also linearly independent.

103. Define L : P2 → P2 as follows: if p ∈ P2 ,

    Lp = (x + 1) dp/dx .

(a) Find the matrix eLe representing the operator L with respect to the basis e1 = 1 , e2 = x , e3 = x^2 for P2 .
(b) Is L an invertible operator? Why?
(c) Find dim R(L) and dim N(L) .
104. Let

    A = [  1/2   √3/2 ]        B = [  5   √3 ]
        [ √3/2  −1/2  ] ,          [ √3   3  ] .

(a) Compute AA* , ABA* , and (ABA*)^100 .
(b) How could you use the result of part (a) to compute B^100 ?
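Assuming the reconstructed entries above, the point of part (a) can be checked in a few lines: AA* = I with A* = A, so powers of ABA* collapse to A B^k A*. A sketch (testing the cube rather than the 100th power):

```python
from math import sqrt, isclose

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r = sqrt(3) / 2
A = [[0.5, r], [r, -0.5]]            # symmetric and orthogonal: A* = A = A^(-1)
B = [[5.0, sqrt(3)], [sqrt(3), 3.0]]

AA = matmul(A, A)                    # should be the identity
assert all(isclose(AA[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
           for i in range(2) for j in range(2))

C = matmul(matmul(A, B), A)          # A B A*  (A* = A here)
C3 = matmul(matmul(C, C), C)         # (A B A*)^3
B3 = matmul(matmul(B, B), B)
AB3A = matmul(matmul(A, B3), A)      # A B^3 A*
assert all(isclose(C3[i][j], AB3A[i][j], rel_tol=1e-9, abs_tol=1e-9)
           for i in range(2) for j in range(2))
print("(ABA*)^3 equals A B^3 A*")
```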
105. Consider the following system of three equations as a linear map L : R2 → R3
x1 + x2 = y1
4x1 + x2 = y2
x1 − 2x2 = y3
(a) Find a basis for N(L∗ ) .
(b) Use the result of part a) to determine the value(s) of α such that Y = (1, 2, α)
is in R(L) .
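One consistent candidate answer can be verified as follows; note that this sketch exhibits the values for (a) and (b) rather than deriving them, so try the problem first:

```python
A = [[1, 1],
     [4, 1],
     [1, -2]]       # the matrix of L; L* has matrix A-transpose
v = [3, -1, 1]      # candidate basis vector for N(L*)

for j in range(2):  # v is orthogonal to both columns of A
    assert sum(A[i][j] * v[i] for i in range(3)) == 0

alpha = -1          # Y = (1, 2, alpha) lies in R(L) iff Y is orthogonal to v
Y = [1, 2, alpha]
assert sum(Y[i] * v[i] for i in range(3)) == 0
print("N(L*) = span{(3, -1, 1)},  alpha =", alpha)
```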
106. Find the unique solution to each of the following initial value problems.
(a) u'' + u' − 2u = 0 ,    u(0) = 3 ,  u'(0) = 0
(b) u'' + 4u' + 4u = 0 ,   u(0) = 1 ,  u'(0) = −1
(c) u'' − 2u' + 5u = 0 ,   u(0) = 2 ,  u'(0) = 2

107. Consider the special second order inhomogeneous constant coefficient O.D.E. Lu = f ,
where
Lu ≡ u'' − 4u ,
and where f is assumed to be a suitably diﬀerentiable function which is periodic with
period 2π, f (x + 2π ) = f (x) .
(a) Expand f in its Fourier series and seek a candidate, u , for a solution of Lu = f
as a Fourier series, showing how the Fourier coeﬃcients of u are determined by
the Fourier coeﬃcients of f .
(b) Apply the above procedure to the trivial example where
f (x) = sin 3x − 4 cos 17x + 3 sin 36x.
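The recipe of part (a) can be spot-checked on this f: a term a·sin(nx) or a·cos(nx) in f forces the coefficient −a/(n^2 + 4) in u, since Lu = u'' − 4u multiplies that mode by −(n^2 + 4). A sketch (the candidate u below is built by that rule):

```python
from math import sin, cos

def f(x):
    return sin(3 * x) - 4 * cos(17 * x) + 3 * sin(36 * x)

def u(x):        # candidate solution, mode by mode
    return (-sin(3 * x) / 13 + 4 * cos(17 * x) / 293
            - 3 * sin(36 * x) / 1300)

def u2(x):       # u'' computed term by term
    return (9 * sin(3 * x) / 13 - 1156 * cos(17 * x) / 293
            + 3888 * sin(36 * x) / 1300)

for k in range(8):
    x = 0.4 * k
    assert abs((u2(x) - 4 * u(x)) - f(x)) < 1e-9
print("u'' - 4u reproduces f, mode by mode")
```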
108. (a) Find the directional derivative of the function
f (x, y ) = 2 − x + xy
at the point (0, 6) in the direction (3, −4) by using the definition of the directional derivative as a limit. Check your answer by using the short method.
(b) Repeat part (a) for f (x, y) = 1 − 3y + xy .

109. Find and classify the critical points of the following functions.
(a) f (x, y ) = x3 + y 2 − 3x − 2y + 2
(b) f (x, y ) = x2 − 4x + y 2 − 2y + 6
(c) f (x, y ) = (x2 + y 2 )2 − 8y 2
(d) f (x, y ) = (x2 − y 2 )2 − 8y 2
(e) f (x, y ) = (x2 − y 2 )2
(f) f (x, y) = x^2 − 2xy + (1/3)y^3 − 3y
110. Consider the function x3 + y 2 − 3x − 2y + 2 . At the point (2, 1) ﬁnd the direction
in which the directional derivative is greatest. Find the direction where it is least.
111. Let f : E2 → E be a suitably diﬀerentiable function and let X (t) be the equation
of a smooth curve C in E2 on which f is identically constant, say, f (X (t)) ≡ 4 .
Show that on this curve, f ' is perpendicular to the velocity vector X '(t) . [Hint: Do
something to ϕ(t) = f (X (t)) . The proof takes but one line.].
112. Consider the following statements concerning a function f : En → E .
(A) f is continuous.
(B) f has a total derivative everywhere.
(C) f has ﬁrst order partial derivatives everywhere.
(D) f has a total derivative everywhere which is continuous everywhere.
(E) f has ﬁrst order partial derivatives everywhere and they are continuous functions
everywhere.
(F) f is an aﬃne function.
(G) f ≡ 0 .
(a) Which of these statements always imply which others? A sample (possibly incorrect) answer might look like
(A) ⇒ B, F, · · ·
(B ) ⇒ A, · · ·
(b) Find examples illustrating each case where a given statement does not imply
another (the Exercises, pp. 588-95, contain the required examples).
113. Solve the following ordinary differential equations subject to the given auxiliary conditions.
(a) u'' − u' − 6u = 0 ,    u(0) = 0 ,  u'(0) = 5
(b) xu' + u = e^(x−1) ,    u(1) = 2
(c) u'' − 6u' + 10u = 0 ,  general solution.
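For part (a), assuming the reconstructed data u(0) = 0, u'(0) = 5: the roots of r^2 − r − 6 are 3 and −2, which leads to the candidate u(t) = e^(3t) − e^(−2t). A numerical check:

```python
from math import exp

def u(t):
    return exp(3 * t) - exp(-2 * t)

def u1(t):   # u'
    return 3 * exp(3 * t) + 2 * exp(-2 * t)

def u2(t):   # u''
    return 9 * exp(3 * t) - 4 * exp(-2 * t)

# initial conditions
assert abs(u(0)) < 1e-12 and abs(u1(0) - 5) < 1e-12
# the differential equation u'' - u' - 6u = 0 at sample points
for t in [0.0, 0.3, 1.0]:
    assert abs(u2(t) - u1(t) - 6 * u(t)) < 1e-9
print("u(t) = e^(3t) - e^(-2t) solves (a)")
```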
114. (a) If u(x, y, t) = x e^{xy} + t^2 , while x = 1 − t^3 and y = log t^2 , let w(t) = u(x(t), y(t), t) . Find dw/dt at t = 1 .
(b) If F : E3 → E2 and G : E2 → E2 are defined by

    F (X) = ( 2x1 − x2^2 + x2 x3 + 1 ,  x1^2 − x2 + x3^2 ) ,
    G(Y ) = ( y1 + y2^2 sin y1 ,  −3y1 y2 + y2^2 ) ,

(i) Why doesn't F ◦ G make sense?
(ii) Compute [G ◦ F ]' at the point X0 = (0, 1, 0) .
115. Let F : E2 → E2 and G : E3 → E2 be defined by

    F (w, z) = ( e^{w+z^2} ,  e^{z+w^2} ) ,      G(r, s, t) = ( r + s^2 + t^3 ,  s + t^2 + r^3 ) .

(a) Find F ' and G' .
(b) Which of F ◦ G or G ◦ F makes sense?
(c) If G ◦ F makes sense, compute (G ◦ F )' at (−1, −1) .
(d) If F ◦ G makes sense, compute (F ◦ G)' at (−1, 0, 0) .
116. Let F : X → Y and G : Y → Z be defined by

    F :  y1 = x2 − e^{x1+2x2} ,  y2 = x1 x2 ;      G :  w1 = y2 + y2 sin y1 ,  w2 = (y1 + y2 )^2 .

(a) Compute F ' at X0 = (−2, 1) and G' at Y0 = F (X0 ) .
(b) Let H = G ◦ F . Compute H ' at X0 = (−2, 1) .
117. Consider the map F : E2 → E3 defined by

    F :  f1 (x, y) = y + e^{x−y} ,
         f2 (x, y) = sin(x − 2y + 1) ,
         f3 (x, y) = x − 3x^2 + y^2 .
(a) Find the tangent map at the point X0 = (1, 1) .
(b) Use the result of part (a) to evaluate approximately F at X1 = (1.1, .9) .
118. Consider the system of O.D.E.'s

    u' = αu
    v' = αu − βv ,

where α and β are constants. If u(0) = A and v(0) = B ,
(a) Find u(t) .
(b) Find v(t) (remember to consider the case α = β separately).

119. (a) Consider the homogeneous equation
    u' + a(t)u = 0 ,

where a(t) is continuous and periodic with period P , so a(t + P ) = a(t) .
(i) If a(t) ≡ 1 , show that there is no non-trivial periodic solution by merely solving the equation.
(ii) If a(t) = cos t , show (again by solving the equation) that there is a periodic solution u(t) with period 2π .
(iii) In general, if u(t) is a solution, not necessarily periodic, show that v(t) ≡ u(t + P ) is also a solution.
(iv) Show that the homogeneous equation has a non-trivial periodic solution of period P if and only if

    ∫_0^P a(t) dt = 0 .

(b) Consider the inhomogeneous equation

    u' + a(t)u = f (t) ,

where both a(t) and f (t) are continuous and periodic with period P .
(i) If ∫_0^P a(t) dt = K ≠ 0 , show that the inhomogeneous equation has one and only one periodic solution with period P .
(ii) If ∫_0^P a(t) dt = 0 , find a necessary condition on f that the inhomogeneous equation have a periodic solution with period P .
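The content of part (a) is easy to see numerically: every solution of u' + a(t)u = 0 is u(t) = u(0)·exp(−A(t)) with A(t) the integral of a, and u has period P exactly when the integral of a over one period vanishes. A sketch with the two sample coefficients from (i) and (ii):

```python
from math import exp, sin, pi, isclose

u = lambda t: exp(-sin(t))   # a(t) = cos t, integral sin t: 2*pi-periodic
v = lambda t: exp(-t)        # a(t) = 1, integral t: not periodic

assert isclose(u(1.3 + 2 * pi), u(1.3))
assert not isclose(v(1.3 + 2 * pi), v(1.3))
print("a = cos t gives a 2*pi-periodic solution; a = 1 does not")
```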
120. Let f : En → E be a diﬀerentiable function and denote the directional derivative in
the direction of the unit vector e by De f . Prove that D−e f = −De f .
121. Let f : En → E be of the form f (X) = g(a1 x1 + · · · + an xn ) . Write α = (a1 , · · · , an ) and β = (b1 , · · · , bn ) . If β is perpendicular to α , prove that β ⊥ f '(X) .
122. Let R denote the rectangle 0 ≤ x1 < 2π, 0 ≤ x2 < 2π , and deﬁne the map
f : R → E1 by
f (x1 , x2 ) = (3 + 2 cos x2 ) sin x1
Find and classify the critical points of f . (This function is the height function of a
torus with major radius 3 and minor radius 2).
123. Consider the constant coeﬃcient diﬀerential operator
    Lu ≡ au'' + bu' + cu ,    (a, b, c real, a ≠ 0 ).

Let λ1 and λ2 denote the roots of the characteristic polynomial p(λ) = aλ^2 + bλ + c .
(a) If λ1 ≠ λ2 , find a formula for a particular solution of Lu = f .

    [Answer: up (x) = (1/(λ1 − λ2)) ∫^x [ e^{λ1 (x−t)} − e^{λ2 (x−t)} ] f (t) dt .]

(b) If λ1 is complex, say, λ1 = α + iβ , then λ2 = λ̄1 = α − iβ . Show that in this case, the above formula simplifies to

    up (x) = (1/β) ∫^x e^{α(x−t)} sin β(x − t) f (t) dt .

(c) If λ1 = λ2 , find a formula for a particular solution of Lu = f .

    [Answer: up (x) = ∫^x (x − t) e^{λ1 (x−t)} f (t) dt .]

124. Consider ∫∫_D f dA where D is the triangle with vertices at (−1, 1), (0, 0) , and (3, 1) .
(a) Set up the iterated integrals in two ways.
(b) Evaluate one of the integrals in (a) for the integrand f (x, y) = (x + y)^2 .

125. When a double integral was set up for the mass M of a certain plate with density
f (x, y) , the following sum of iterated integrals was obtained

    M = ∫_1^2 ( ∫_x^{x^3} f (x, y) dy ) dx + ∫_2^8 ( ∫_x^8 f (x, y) dy ) dx .

(a) Sketch the domain of integration and express M as an iterated integral in which the order of integration is reversed.
(b) Evaluate M if f (x, y) = x/y .

126. Evaluate ∫_0^1 ∫_0^1 xy dx dy .

127. It is difficult to evaluate the integral I = ∫∫_D f dA , where f (x, y) = 1/(1 + x + y^2) and D is the indicated rectangle. However, you can show that (trivially)

    1/3 < I < 3/2 ,

and, with a bit more effort but the same method, that

    1/2 < I < 3/2 .

Please do so.

128. Consider the integral I = ∫∫_D f dA , where
    f (x, y) = 3/(8 + x^4 + y^4)

and D is the domain inside the curve x^4 + y^4 = 16 . Show that

    2√2 < I < 6 .

[Hint: Show that 1/8 < f < 3/8 in D . Then approximate the area of D by an inscribed and circumscribed square. For the record, it turns out that I = (3A/4) ln(3/2) , where A is the area, A = (2/√π) Γ(1/4)^2 ].
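The stated bounds can be confirmed by a crude Riemann sum, assuming the reconstructed integrand and domain above (both recovered from a garbled scan):

```python
from math import sqrt

# midpoint Riemann sum over the square [-2, 2] x [-2, 2], keeping only
# grid cells inside the curve x^4 + y^4 = 16
n = 400
h = 4.0 / n
total = 0.0
for i in range(n):
    x = -2 + (i + 0.5) * h
    for j in range(n):
        y = -2 + (j + 0.5) * h
        if x**4 + y**4 <= 16:
            total += 3.0 / (8 + x**4 + y**4) * h * h

print("I is approximately", total)
assert 2 * sqrt(2) < total < 6
```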
129. (a) Find the derivative matrix for the following mappings Y = F (X) at the given point X0 .

(i)  F :  y1 = x1^2 + sin x1 x2 ,
          y2 = x2^2 + cos x1 x2 ,      at X0 = (0, 0) ;

(ii) F :  y1 = x1^2 + x3^2 e^{x2} − x3 ,
          y2 = x1 − 3x2 + x1 log x3 ,
          y3 = x2 + x3 ,
          y4 = x1 x2 x3 ,      at X0 = (2, 0, 1) .

(b) Find the equation of the tangent plane to the above surfaces at the given point.
130. Consider the following map F from E2 → E2 , the familiar change of variables from
polar to rectangular coordinates.
F: y1 = x1 cos x2
y2 = x1 sin x2 (a) Find the images of
(i) the semi-infinite strip 1 ≤ x1 < ∞ ,  0 ≤ x2 ≤ π/2 ;
(ii) the semi-infinite strip 0 ≤ x1 < ∞ ,  0 ≤ x2 ≤ 3π/2 .
(b) Compute F ' and det F ' .
131. Given that up (x) = e^{3x} + e^{−2x} − 2e^{x/2} is a solution of

    au'' + bu' + cu = e^{3x} ,

find the constants a, b , and c .
132. Evaluate the determinants of the following matrices.

    (a)  [ 1   1   1   1 ]        (b)  [ 1    y    z   t ]
         [ 1  −1   1  −1 ]             [ w^2  x^2  0   0 ]
         [ 1  −1  −1   1 ]             [ w^3  x^3  0   0 ]
         [ 1   1  −1  −1 ]             [ w^4  x^4  0   0 ] .
133. For what value(s) of x is the following matrix invertible?

    [ 1   1    1    1  ]
    [ 1   2   2^2  2^3 ]
    [ 1   3   3^2  3^3 ]
    [ 1   x   x^2  x^3 ]

(Hint: Observe that the determinant is a cubic polynomial all of whose roots are obvious).
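The hint can be confirmed numerically: the determinant vanishes exactly at x = 1, 2, 3 (the matrix then repeats a row), so the matrix is invertible for every other x. A sketch with an exact rational determinant routine:

```python
from fractions import Fraction

def det4(M):
    # Gaussian elimination over the rationals, tracking row swaps
    M = [[Fraction(v) for v in row] for row in M]
    d = Fraction(1)
    for c in range(4):
        p = next((r for r in range(c, 4) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, 4):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d

def V(x):
    return [[1, 1, 1, 1],
            [1, 2, 4, 8],
            [1, 3, 9, 27],
            [1, x, x**2, x**3]]

assert det4(V(1)) == det4(V(2)) == det4(V(3)) == 0
assert det4(V(4)) != 0
print("invertible exactly when x is not 1, 2, or 3")
```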
134. Let

    f (x) = Σ_{k=1}^{n} ak sin kx / √π    and    g(x) = Σ_{r=1}^{n} br sin rx / √π .

By direct integration prove that

    ∫_{−π}^{π} f (x) g(x) dx = Σ_{j=1}^{n} aj bj .

After you are done, compare with Theorem 15, pages 206-7, and its proof.
135. Let

    A = [ 1  0   0  0 ]
        [ 1  1   0  0 ]
        [ 0  0   0  1 ]
        [ 0  0  −1  0 ] .

(a) Find det A .
(b) Find A^(−1) .
(c) Solve AX = Y , where Y = (2, 2, 1, 3) .
(d) Let S = { u : u(x) = a x e^x + b e^x + c sin x + d cos x } , where a, b, c , and d are any real numbers, and define a linear operator L : S → S by the rule

    Lu ≡ u'' − u' + u .

Find the matrix eLe for L with respect to the following basis for S :

    e1 (x) = x e^x ,  e2 (x) = e^x ,  e3 (x) = sin x ,  e4 (x) = cos x .

(e) Use the above results to find a solution of

    Lu = 2x e^x + 2e^x + sin x + 3 cos x .
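Assuming the matrix reconstructed above (columns = images of the basis vectors under L), solving AX = Y with Y = (2, 2, 1, 3) hands back the coordinates of a solution of part (e). A quick check:

```python
A = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, -1, 0]]
Y = [2, 2, 1, 3]

# the system is nearly triangular, so it solves by inspection
x1 = Y[0]          # 2
x2 = Y[1] - x1     # 0
x3 = -Y[3]         # -3
x4 = Y[2]          # 1
X = [x1, x2, x3, x4]
assert [sum(A[i][j] * X[j] for j in range(4)) for i in range(4)] == Y
print("u =", x1, "* x e^x +", x2, "* e^x +", x3, "* sin x +", x4, "* cos x")
```

One can verify directly that u = 2xe^x − 3 sin x + cos x satisfies Lu = u'' − u' + u = 2xe^x + 2e^x + sin x + 3 cos x .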
136. Let u1 and u2 be solutions of the homogeneous equation

    Lu ≡ a2 (x)u'' + a1 (x)u' + a0 (x)u = 0 .

(a) Show that W (x) ≡ W (u1 , u2 )(x) , the Wronskian of u1 and u2 , satisfies the differential equation

    W ' = − ( a1 (x) / a2 (x) ) W .

(b) Find the equation of (a) for the particular operator

    Lu ≡ x^2 u'' − 2xu' + 2u

and solve it for W under the condition that W (1) = 1 .
(c) Given that u1 (x) = x is a solution of Lu = 0 for the operator of part (b), use the result of (b) to show that if u2 is another solution of Lu = 0 , then u2 satisfies the equation

    u2' − (1/x) u2 = x ,

provided that W (x, u2 )(1) = 1 .
(d) Solve the equation of part (c) under the assumption that u2 (1) = 1 , and thus find a second independent solution of the equation Lu = 0 for the operator of part (b).
(e) Generalize the idea of parts (c) - (d) by stating and proving some theorem.

137. Here are some linear transformations defined in terms of matrices. In each case,
describe geometrically what the transformation does, by computing the images of the three parallelograms

    Q1 : with vertices at (0, 0), (2, 0), (3, 1), (1, 1).
    Q2 : with vertices at (1, 2), (3, 2), (4, 3), (2, 3).
    Q3 : with vertices at (1, 0), (0, 2), (−1, 0), (0, −2).

(a) Diagonal Maps (Stretchings)

    L1 = [ 3  0 ]    L2 = [ a  0 ]    L3 = [ −1   0 ]
         [ 0  1 ] ,       [ 0  1 ] ,       [  0  −1 ] ,

    L4 = [ 1   0 ]   L5 = [ 1  0 ]    L6 = [ −2  0 ]
         [ 0  −1 ] ,      [ 0  b ] ,       [  0  0 ] ,

    L7 = [ a  0 ]    L8 = [ −4  0 ]   L9 = [ a  0 ]
         [ 0  a ] ,       [  0  6 ] ,      [ 0  b ] .

(Remember to consider negative values of a and b ).
(b) Maps with 0 on the diagonal.

    L1 = [ 0  1 ]    L2 = [ 0  a ]    L3 = [  0  0 ]
         [ 0  0 ] ,       [ 0  0 ] ,       [ −1  0 ] ,

    L4 = [ 0  1 ]    L5 = [ 0  a ]    L6 = [ 0  a ]
         [ 1  0 ] ,       [ 1  0 ] ,       [ b  0 ] .

(c) Upper Triangular Matrices.

    L1 = [ 1  1 ]    L2 = [ 1  −1 ]   L3 = [ 1   1 ]
         [ 0  1 ] ,       [ 0   1 ] ,      [ 0  −1 ] ,

    L4 = [ 1  a ]    L5 = [ −1  −1 ]  L6 = [ a  1 ]
         [ 0  1 ] ,       [  0   0 ] ,     [ 0  b ] .

(d) Orthogonal Matrices (Rotations and Reflections).

    L1 = [ 0  −1 ]       L2 = [ 3/5   4/5 ]
         [ 1   0 ] ,          [ 4/5  −3/5 ] ,

    L3 = [ 1/√2  −1/√2 ]     L4 = [ −1/√2  1/√2 ]
         [ 1/√2   1/√2 ] ,        [  1/√2  1/√2 ] .
2 . 138. Let a and b be real numbers such that a2 + b2 = 1 . Let
    S = [ a^2 − b^2    2ab     ]        P = [ a^2   ab  ]
        [   2ab      b^2 − a^2 ] ,          [ ab   b^2 ] ,

and let e1 = (a, b) ,  e2 = (−b, a) , so e1 ⊥ e2 . Show that
(a) Se1 = e1 ,   P e1 = e1
(b) Se2 = −e2 ,  P e2 = 0
(c) S^2 = I ,    P^2 = P
(d) Show that S can be interpreted as the reflection which leaves the line through e1 fixed, and that P can be interpreted as the projection onto the line through e1 parallel to e2 .
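A numerical spot check of (a)-(b) with the sample values a = 3/5, b = 4/5 (so a^2 + b^2 = 1; the matrix entries follow the reconstruction above):

```python
a, b = 0.6, 0.8
S = [[a*a - b*b, 2*a*b], [2*a*b, b*b - a*a]]   # reflection candidate
P = [[a*a, a*b], [a*b, b*b]]                   # projection candidate
e1, e2 = [a, b], [-b, a]

def apply(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def close(u, v):
    return all(abs(x - y) < 1e-12 for x, y in zip(u, v))

assert close(apply(S, e1), e1)                 # S fixes e1
assert close(apply(S, e2), [-e2[0], -e2[1]])   # S negates e2
assert close(apply(P, e1), e1)                 # P fixes e1
assert close(apply(P, e2), [0.0, 0.0])         # P kills e2
print("S reflects across the e1 line; P projects onto it")
```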
139. (a) Consider the following relation deﬁned on the set of all integers: nRm if n and
m are both even integers. Verify that this relation is symmetric and transitive but not reflexive (since, for example, 1 is not related to 1).
(b) Let R be a symmetric and transitive relation deﬁned on a set A . If, given any
element x in A , there is some element y related to it, xRy , prove that the
relation R is also reﬂexive. (The example in part (a) shows that the assertion
will be false if some element is related to no others).
140. Let an be a decreasing sequence of positive real numbers which satisfy an−1 an+1 ≤
1/n
an
a2 . If
an
converges, prove that
n
an−1 converges too. [Hint: Show that
(an /an−1 )1/n ≤ an ].
141. (a) Prove that the series an z n and a2 z n have the same radii of convergence.
n (b) Prove that the series
of convergence. an z n and (an )kz n , where k > 0 , have the same radii 142. Let V be a linear space and L an invertible linear map, L : V → V . If { e1 , . . . , en }
is a basis for V , prove that its image { Le1 , Le2 , . . . , Len } is also a basis for V .
143. Let H be an inner product space and R an orthogonal transformation, R : H →
H . If { e1 , . . . , en } is a complete orthonormal set for H , prove that its image
{ Re1 , . . . , Ren } is also a complete orthonormal set for H . 414 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS 144. (a) Let R be an orthogonal matrix and let ρ1 and ρ2 be any two of its column
vectors. Prove that ρ1 ⊥ ρ2 . Prove that any two rows of an orthogonal matrix
are also orthogonal to each other.
(b) Conversely, let A be a square matrix whose column vectors are orthogonal. Must
A be an orthogonal matrix? Proof or counterexample.
145. Let H be an inner product space and A the subspace of H spanned by the vectors
X1 , . . . , Xn . The Gram determinant of those vectors is deﬁned as G(X1 , . . . , Xn ) = X1 , X1
X1 , X2
·
·
·
X1 , Xn ··· ··· Xn , X1
·
·
·
·
Xn , Xn (a) Prove that X1 , · · · , Xn are linearly dependent if and only if G(X1 , · · · , Xn ) = 0 .
[Suggestion: If Z ∈ A , then Z = a1 X1 + · · · an Xn , where the scalars a1 , · · · , an
are to be found. This can be done in two ways, by Theorem 31, page 428, or by
solving the n equations
Z , X1
·
·
·
·
Z , Xn = a1 X1 , X1 + · · · + an Xn , X1 = a1 X1 , Xn + · · · + an Xn , Xn which are obtained from Z, Xj = ai X1 + · · · + an Xn , Xj . Couple both
methods to prove the result].
(b) If X1 , · · · , Xn are an orthogonal set of vectors, compute G(X1 , · · · , Xn ) .
(c) If Y ∈ H , prove that the distance of Y from the subspace A,
is given by the formula
δ 2 = Y − PA Y 2 = Y − PA Y = δ , G(Y, X1 , . . . , Xn )
.
G(X1 , . . . , Xn ) [Suggestion: Observe that δ 2 = Y −PA Y 2 = Y −PA Y, Y and that PA Y, Y =
an X1 , Y + · · · + an Xn , Y . Now write PA Y as Z , use the n equations in
a) and the one equation δ 2 = Y , Y − a1 X1 , Y − · · · − an Xn , Y to solve for
δ 2 by using Cramer’s rule].
(d) Use the fact that G(X1 ) = X1 , X1 to prove the Gram determinant of linearly independent vectors is always positive. In particular, deduce the Cauchy Schwarz inequality from G(X1 , X2 ) ≥ 0 .
(e) In L2 [0, 1] , let X1 = 1+ x , and X2 = x3 . Compute G(X1 , X2 ) . Let Y = 2 − x4
and compute Y − PA Y , where A is the subspace spanned by X1 and X2 . 415
(f) (Muntz) In L2 [0, 1] , let An = span{ xj1 , xj2 , · · · , xjn } where j1 , · · · , jn are
distinct positive integers. Let Y = xk , where k is a positive integer by not one
1
of the j ’s. Prove that lim Y − PAn Y = 0 if and only if
diverges.
n→∞
jn
146. (a) Use Theorem 17, page 217 to ﬁnd linear polynomials P and Q such that,
respectively,
1 [x2 − P (x)]2 dx is minimized, (i)
−1
1 [x2 − Q(x)]2 dx is minimized. (ii)
0 (b) Write P (x) = a + bx and use calculus to again ﬁnd the values of a and b such
that
1 [x2 − P (x)]2 dx
−1 is minimized.
147. Let Z = (1, 1, 1, 1, 1) ∈ E5 and let A be the subspace of E5 spanned by X1 =
(1, 0, 1, 0, 0), X2 = (1, 0, 0, −1, 0) , and X3 = (0, 1, 0, 0, 1) . Find Z − PA Z .
148. Let Γ0 be a closed planar curve which encloses a convex region, and let Γr be the
“parallel” curve obtained by moving out a distance of r along the outer normal.
(a) Discover a formula relating the arc length of Γr to that of Γ0 . [Advise: Examine
the special cases of a circle, rectangle, and convex polygon].
(b) Prove the result you conjectured in part a).
149. The hypergeometric function F (a, b; c; x) is deﬁned by the power series
F (a, b; c; x) = 1 + a(a + 1)b(b + 1) 2 a(a + 1)(a + 2)b(b + 1)(b + 2) 2
a·b
x+
x+
x +···
1·c
1 · 2 · c(c + 1)
1 · 2 · 3c(c + 1)(c + 2) (a) Show that the series converges for all |x| < 1 .
(b) Show that d
dx F (a, b; c; x) = ab
c F (a (c) Show that
(i)
(ii)
(iii)
(iv)
(v) (1 − x)n = F (−n, b; b; x)
(1 + x)n = F (−n, b; b; −x)
log(1 − x) = −xF (1, 1; 2, x)
1
log( 1+x ) = 2xF ( 2 , 1; 3 ; x2 )
1− x
2
ex = lim F (1, b; 1; x/b)
b→∞ 1
1
(vi) cos x = F ( 2 , − 1 ; 2 , sin2 x)
2
113
(vii) sin−1 x = xF ( 2 , 2 ; 2 ; x2 )
1
(viii) tan−1 x = xF ( 2 , 1; 3 ; −x2 )
2 + 1, b + 1; c + 1; x) . 416 CHAPTER 10. MISCELLANEOUS SUPPLEMENTARY PROBLEMS
(d) Show that F satisﬁes the hypergeometric diﬀerential equation
x(1 − x) d2 F
dF
+ [c − (a + b + 1)x]
− ab F = 0.
2
dx
dx [This equation is essentially the most general one with three regular singular
points - in this case located at 0, 1 , and ∞ ].
150. Let { e1 , · · · , en } be a complete orthonormal set of En and let { X1 , · · · , Xn } be a
set of vectors which are close to the ej ’s in the sense that
n Xj − ej 2 < 1. j =1 Prove that the Xj ’s are linearly independent. Give an example in E3 of linearly
dependent vectors { X1 , X2 , X3 } which satisfy
n Xj − ej 2 = 1. j =1 [In fact, one can prove that
n
⊥ dim A ≤ Xj − ej 2 , j =1 where A = span{ X1 , · · · , Xn } ].
151. (a) Show that the function f (z ) = ez , z ∈ C , is never zero.
(b) Scrutinize the proof of the Fundamental Theorem of Algebra (pp. 544-548) and
ﬁnd where it breaks down if one attempts to extend it to prove that ez has at
least one zero. ...

View
Full Document