CHAPTER 1
First-Order Differential
Equations
Among all of the mathematical disciplines the theory of differential equations is the
most important. It furnishes the explanation of all those elementary manifestations
of nature which involve time.
Sophus Lie

1.1 How Differential Equations Arise
In this section we will introduce the idea of a differential equation through the mathematical formulation of a variety of problems. We then use these problems throughout
the chapter to illustrate the applicability of the techniques introduced.

Newton’s Second Law of Motion
Newton’s second law of motion states that, for an object of constant mass m, the sum of
the applied forces acting on the object is equal to the mass of the object multiplied by the
acceleration of the object. If the object is moving in one dimension under the influence
of a force F, then the mathematical statement of this law is
$$m \frac{dv}{dt} = F, \qquad (1.1.1)$$
where v(t) denotes the velocity of the object at time t. We let y(t) denote the displacement
of the object at time t. Then, using the fact that velocity and displacement are related via
$$v = \frac{dy}{dt},$$
we can write (1.1.1) as
$$m \frac{d^2y}{dt^2} = F. \qquad (1.1.2)$$
This is an example of a differential equation, so called because it involves derivatives
of the unknown function y(t).
Gravitational Force: As a specific example, consider the case of an object falling
freely under the influence of gravity (see Figure 1.1.1). In this case the only force acting
on the object is F = mg, where g denotes the (constant) acceleration due to gravity.
Choosing the positive y-direction as downward, it follows from Equation (1.1.2) that the
motion of the object is governed by the differential equation
$$m \frac{d^2y}{dt^2} = mg, \qquad (1.1.3)$$
or equivalently,
$$\frac{d^2y}{dt^2} = g.$$

[Figure 1.1.1: Object falling under the influence of gravity. The force mg acts in the positive y-direction.]

Since g is a (positive) constant, we can integrate this equation to determine y(t). Performing one integration yields
$$\frac{dy}{dt} = gt + c_1,$$
where c1 is an arbitrary integration constant. Integrating once more with respect to t, we
obtain
$$y(t) = \frac{1}{2} g t^2 + c_1 t + c_2, \qquad (1.1.4)$$
where c2 is a second integration constant. We see that the differential equation has an
infinite number of solutions parameterized by the constants c1 and c2. In order to uniquely
specify the motion, we must augment the differential equation with initial conditions that
specify the initial position and initial velocity of the object. For example, if the object
is released at t = 0 from y = y0 with a velocity v0, then, in addition to the differential
equation, we have the initial conditions
$$y(0) = y_0, \qquad \frac{dy}{dt}(0) = v_0. \qquad (1.1.5)$$
These conditions must be imposed on the solution (1.1.4) in order to determine the values
of c1 and c2 that correspond to the particular problem under investigation. Setting t = 0
in (1.1.4) and using the first initial condition from (1.1.5), we find that
$$y_0 = c_2.$$
Substituting this into Equation (1.1.4), we get
$$y(t) = \frac{1}{2} g t^2 + c_1 t + y_0. \qquad (1.1.6)$$
In order to impose the second initial condition from (1.1.5), we first differentiate Equation
(1.1.6) to obtain
$$\frac{dy}{dt} = gt + c_1.$$
Consequently the second initial condition in (1.1.5) requires
$$c_1 = v_0.$$
From (1.1.6), it follows that the position of the object at time t is
$$y(t) = \frac{1}{2} g t^2 + v_0 t + y_0.$$
The differential equation (1.1.3) together with the initial conditions (1.1.5) is an example
of an initial-value problem.
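
As an illustrative check (not part of the original text), the following Python sketch evaluates the falling-object solution, position equal to (1/2)gt^2 + v0 t + y0, and uses finite differences to confirm numerically that it satisfies the differential equation (1.1.3) and the initial conditions (1.1.5). The numerical values of g, v0, and y0 and all function names are assumptions chosen only for this example.

```python
# Sketch: numerical check of y(t) = (1/2) g t^2 + v0 t + y0.
# The values of g, v0, y0 are illustrative assumptions, not data from the text.

def position(t, g, v0, y0):
    """Displacement at time t (positive y-direction is downward)."""
    return 0.5 * g * t**2 + v0 * t + y0


def first_derivative(f, t, h=1e-3):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def second_derivative(f, t, h=1e-3):
    """Central-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2


g, v0, y0 = 9.8, 2.0, 5.0  # assumed values: m/s^2, m/s, m


def y(t):
    return position(t, g, v0, y0)


initial_position = y(0.0)                    # should equal y0
initial_velocity = first_derivative(y, 0.0)  # should be close to v0
acceleration = second_derivative(y, 1.0)     # should be close to g

print(initial_position, initial_velocity, acceleration)
```

Changing v0 or y0 changes which member of the two-parameter family (1.1.4) is selected, while the estimated acceleration stays equal to g.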
Spring Force: As a second application of Newton’s law of motion, consider the spring–mass
system depicted in Figure 1.1.2, where, for simplicity, we are neglecting frictional
and external forces. In this case, the only force acting on the mass is the restoring force (or
spring force), Fs, due to the displacement of the spring from its equilibrium (unstretched)
position. We use Hooke’s law to model this force:

[Figure 1.1.2: A simple harmonic oscillator. The mass is shown in its equilibrium position, y = 0, and at displacement y(t), with the positive y-direction indicated.]

Hooke’s Law: The restoring force of a spring is directly proportional to the displacement
of the spring from its equilibrium position and is directed toward the equilibrium position.

If y(t) denotes the displacement of the spring from its equilibrium position at time
t (see Figure 1.1.2), then according to Hooke’s law, the restoring force is
$$F_s = -ky,$$
where k is a positive constant called the spring constant. Consequently, Newton’s second
law of motion implies that the motion of the spring–mass system is governed by the
differential equation
$$m \frac{d^2y}{dt^2} = -ky,$$
which we write in the equivalent form
$$\frac{d^2y}{dt^2} + \omega^2 y = 0, \qquad (1.1.7)$$
where $\omega = \sqrt{k/m}$. At present we cannot solve this differential equation. However, we
leave it as an exercise (Problem 7) to verify by direct substitution that
$$y(t) = A \cos(\omega t - \phi)$$
is a solution to the differential equation (1.1.7), where A and φ are constants (determined
from the initial conditions for the problem). We see that the resulting motion is periodic
with amplitude A. This is consistent with what we might expect physically, since no
frictional forces or external forces are acting on the system. This type of motion is
referred to as simple harmonic motion, and the physical system is called a simple
harmonic oscillator.
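
The direct substitution left to Problem 7 can also be spot-checked numerically. The sketch below is an illustration under assumed values: k, m, A, and φ are arbitrary choices and the function names are hypothetical. It estimates the second derivative of A cos(ωt − φ) by central differences and confirms that the residual of (1.1.7) is close to zero and that the motion repeats with period 2π/ω.

```python
import math

# Sketch: numerical spot-check that y(t) = A cos(omega t - phi) satisfies
# y'' + omega^2 y = 0. All parameter values are assumptions for illustration.

def shm(t, A, omega, phi):
    """Simple harmonic motion y(t) = A cos(omega t - phi)."""
    return A * math.cos(omega * t - phi)


def residual(t, A, omega, phi, h=1e-4):
    """Central-difference estimate of y''(t) + omega^2 y(t)."""
    y_plus = shm(t + h, A, omega, phi)
    y_mid = shm(t, A, omega, phi)
    y_minus = shm(t - h, A, omega, phi)
    y_second = (y_plus - 2.0 * y_mid + y_minus) / h**2
    return y_second + omega**2 * y_mid


k, m = 4.0, 1.0            # assumed spring constant and mass
omega = math.sqrt(k / m)   # omega = sqrt(k/m), as in the text
A, phi = 1.5, 0.3          # assumed amplitude and phase

# Largest residual of the differential equation over sample times in [0, 10).
max_residual = max(abs(residual(0.1 * i, A, omega, phi)) for i in range(100))

# Periodicity: the motion repeats after one period 2*pi/omega.
period = 2.0 * math.pi / omega
periodic_gap = abs(shm(0.7 + period, A, omega, phi) - shm(0.7, A, omega, phi))

print(max_residual, periodic_gap)
```

The residual is not exactly zero because the second derivative is only approximated; it shrinks toward rounding error as the step h is refined.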

Newton’s Law of Cooling
We now build a mathematical model describing the cooling (or heating) of an object.
Suppose that we bring an object into a room. If the temperature of the object is hotter
than that of the room, then the object will begin to cool. Further, we might expect that the
major factor governing the rate at which the object cools is the temperature difference
between it and the room.
Newton’s Law of Cooling: The rate of change of temperature of an object is proportional
to the temperature difference between the object and its surrounding medium.
To formulate this law mathematically, we let T (t) denote the temperature of the object at time t , and let Tm (t) denote the temperature of the surrounding medium. Newton’s
law of cooling can then be expressed as the differential equation
$$\frac{dT}{dt} = -k(T - T_m), \qquad (1.1.8)$$
where k is a constant. The minus sign in front of the constant k is traditional. It ensures
that k will always be positive (see footnote 1). After we study Section 1.4, it will be easy to show that,
when Tm is constant, the solution to this differential equation is
$$T(t) = T_m + c e^{-kt}, \qquad (1.1.9)$$
where c is a constant (see also Problem 12). Newton’s law of cooling therefore predicts
that as t approaches infinity (t → ∞), the temperature of the object approaches that
of the surrounding medium (T → Tm). This is certainly consistent with our everyday
experience (see Figure 1.1.3).
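
To make the limiting behavior concrete, here is a small Python sketch, offered as an illustration only. It evaluates the solution (1.1.9) for assumed values of Tm, k, and the initial temperature T0, fixes c from T(0) = Tm + c, and checks by finite differences that dT/dt = −k(T − Tm). The parameter values and function names are hypothetical.

```python
import math

# Sketch: Newton's law of cooling, T(t) = T_m + c e^{-kt}, with assumed values.

def temperature(t, T_m, c, k):
    """Temperature of the object at time t for a constant medium temperature T_m."""
    return T_m + c * math.exp(-k * t)


T_m = 70.0   # assumed temperature of the surrounding medium
T0 = 100.0   # assumed initial temperature of the object
k = 0.5      # assumed positive cooling constant
c = T0 - T_m # follows from T(0) = T_m + c


def T(t):
    return temperature(t, T_m, c, k)


# Compare a central-difference estimate of dT/dt with -k (T - T_m) at t = 2.
h = 1e-5
t_check = 2.0
numeric_rate = (T(t_check + h) - T(t_check - h)) / (2.0 * h)
model_rate = -k * (T(t_check) - T_m)

# For large t the temperature is very close to T_m.
late_gap = abs(T(50.0) - T_m)

print(numeric_rate, model_rate, late_gap)
```

Taking T0 below Tm makes c negative, which gives the heating curve rather than the cooling curve.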

[Figure 1.1.3: According to Newton’s law of cooling, the temperature of an object approaches room temperature exponentially. Two panels plot T(t) against t: an object that is cooling (T0 above Tm) and an object that is heating (T0 below Tm).]

Footnote 1: If T > Tm, then the object will cool, so that dT/dt < 0. Hence, from Equation (1.1.8), k must be positive. Similarly, if T < Tm, then dT/dt > 0, and once more Equation (1.1.8) implies that k must be positive.

The Orthogonal Trajectory Problem
Next we consider a geometric problem that has many interesting and important applications. Suppose
$$F(x, y, c) = 0 \qquad (1.1.10)$$
defines a family of curves in the xy-plane, where the constant c labels the different
curves. For instance, the equation
$$x^2 + y^2 - c = 0$$
describes a family of concentric circles with center at the origin, whereas
$$-x^2 + y - c = 0$$
describes a family of parabolas that are vertical shifts of the standard parabola $y = x^2$.
We assume that every curve in the family F(x, y, c) = 0 has a well-defined tangent
line at each point. Associated with this family is a second family of curves, say,
$$G(x, y, k) = 0, \qquad (1.1.11)$$
with the property that whenever a curve from the family (1.1.10) intersects a curve
from the family (1.1.11), it does so at right angles (see footnote 2). We say that the curves in the
family (1.1.11) are orthogonal trajectories of the family (1.1.10), and vice versa. For
example, from elementary geometry, it follows that the lines y = kx in the family
G(x, y, k) = y − kx = 0 are orthogonal trajectories of the family of concentric circles
$x^2 + y^2 = c^2$. (See Figure 1.1.4.)
Orthogonal trajectories arise in various applications. For example, a family of curves
and its orthogonal trajectories can be used to define an orthogonal coordinate system in
the xy-plane. In Figure 1.1.4 the families $x^2 + y^2 = c^2$ and y = kx are the coordinate
curves of a polar coordinate system (that is, the curves r = constant and θ = constant,
respectively). In physics, the lines of electric force of a static configuration are the
orthogonal trajectories of the family of equipotential curves. As a final example, if we
consider a two-dimensional heated plate, then the heat energy flows along the orthogonal
trajectories to the constant-temperature curves (isotherms).

[Figure 1.1.4: The family of curves $x^2 + y^2 = c^2$ and the orthogonal trajectories y = kx.]

Statement of the Problem: Given the equation of a family of curves, find the equation of
the family of orthogonal trajectories.

Mathematical Formulation: We recall that curves that intersect at right angles satisfy the
following:
The product of the slopes (see footnote 3) at the point of intersection is −1.
Thus if the given family F(x, y, c) = 0 has slope m1 = f(x, y) at the point (x, y), then
the slope of the family of orthogonal trajectories G(x, y, k) = 0 is m2 = −1/f(x, y),
and therefore the differential equation that determines the orthogonal trajectories is
$$\frac{dy}{dx} = -\frac{1}{f(x, y)}.$$

Footnote 2: That is, the tangent lines to each curve are perpendicular at any point of intersection.
Footnote 3: By the slope of a curve at a given point, we mean the slope of the tangent line to the curve at that point.
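
The slope condition can be illustrated with the circles and lines of Figure 1.1.4. In the sketch below (an illustration with hypothetical function names and arbitrarily chosen sample points), the slope of the circle through (x, y) is taken as −x/y, which follows from implicit differentiation of $x^2 + y^2 = c^2$, and the slope of the line y = kx through the same point is k = y/x. The product of the two slopes should be −1 wherever both are defined.

```python
# Sketch: product-of-slopes check for circles x^2 + y^2 = c^2 and lines y = kx.
# Sample points are arbitrary assumptions; both coordinates must be nonzero.

def circle_slope(x, y):
    """Slope of the circle through (x, y), from 2x + 2y (dy/dx) = 0."""
    return -x / y


def line_slope(x, y):
    """Slope k of the line y = kx through (x, y)."""
    return y / x


points = [(1.0, 2.0), (-3.0, 0.5), (2.5, -4.0)]
products = [circle_slope(x, y) * line_slope(x, y) for x, y in points]

for (x, y), p in zip(points, products):
    print((x, y), p)
```

Points on the coordinate axes are excluded because one of the two tangent lines is vertical there, so its slope is undefined even though the curves still meet at right angles.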

Example 1.1.1 Determine the equation of the family of orthogonal trajectories to the curves with equation
$$y^2 = cx. \qquad (1.1.12)$$

Solution: According to the preceding discussion, the differential equation determining the orthogonal trajectories is
$$\frac{dy}{dx} = -\frac{1}{f(x, y)},$$
where f(x, y) denotes the slope of the given family at the point (x, y). To determine
f(x, y), we differentiate Equation (1.1.12) implicitly with respect to x to obtain
$$2y \frac{dy}{dx} = c. \qquad (1.1.13)$$
We must now eliminate c from the previous equation to obtain an expression that gives
the slope at the point (x, y). From Equation (1.1.12) we have
$$c = \frac{y^2}{x},$$
which, when substituted into Equation (1.1.13), yields
$$\frac{dy}{dx} = \frac{y}{2x}.$$
Consequently, the slope of the given family at the point (x, y) is
$$f(x, y) = \frac{y}{2x},$$
so that the orthogonal trajectories are obtained by solving the differential equation
$$\frac{dy}{dx} = -\frac{2x}{y}.$$
A key point to notice is that we cannot solve this differential equation by simply integrating with respect to x, since the function on the right-hand side of the differential
equation depends on both x and y. However, multiplying by y, we see that
$$y \frac{dy}{dx} = -2x,$$
or equivalently,
$$\frac{d}{dx}\left(\frac{1}{2} y^2\right) = -2x.$$
Since the right-hand side of this equation depends only on x, whereas the term on the
left-hand side is a derivative with respect to x, we can integrate both sides of the equation
with respect to x to obtain
$$\frac{1}{2} y^2 = -x^2 + c_1,$$
which we write as
$$2x^2 + y^2 = k, \qquad (1.1.14)$$
where k = 2c1. We see that the curves in the given family (1.1.12) are parabolas, and the
orthogonal trajectories (1.1.14) are a family of ellipses. This is illustrated in Figure 1.1.5.

[Figure 1.1.5: The family of curves $y^2 = cx$ and its orthogonal trajectories $2x^2 + y^2 = k$.]

Exercises for 1.1

Key Terms
Differential equation, Initial conditions, Initial-value problem, Newton’s second law of motion, Hooke’s law, Spring
constant, Simple harmonic motion, Simple harmonic oscillator, Newton’s law of cooling, Orthogonal trajectories.

Skills
• Given a differential equation, be able to check whether
or not a given function y = f(x) is indeed a solution
to the differential equation.
• Be able to find the distance, velocity, and acceleration functions for an object moving freely under the
influence of gravity.
• Be able to determine the motion of an object in a
spring–mass system with no frictional or external
forces.
• Be able to describe qualitatively how the temperature
of an object changes as a function of time according
to Newton’s law of cooling.
• Be able to find the equation of the orthogonal trajectories to a given family of curves. In simple geometric cases, be prepared to provide rough sketches of some
representative orthogonal trajectories.

True-False Review
For Questions 1–11, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. A differential equation for a function y = f(x) must
contain the first derivative $y' = f'(x)$.
2. The numerical values y(0) and $y'(0)$ accompanying
a differential equation for a function y = f(x) are
called initial conditions of the differential equation.
3. The relationship between the velocity and the acceleration of an object falling under the influence of gravity can be expressed mathematically as a differential
equation.
4. A sketch of the height of an object falling freely under
the influence of gravity as a function of time takes the
shape of a parabola.
5. Hooke’s law states that the restoring force of a spring is
directly proportional to the displacement of the spring
from its equilibrium position and is directed in the direction of the displacement from the equilibrium position.
6. If room temperature is 70°F, then an object whose
temperature is 100°F at a particular time cools faster
at that time than an object whose temperature at that
time is 90°F.
7. According to Newton’s law of cooling, the temperature of an object eventually becomes the same as the
temperature of the surrounding medium.
8. A hot cup of coffee that is put into a cold room cools
more in the first hour than the second hour.
9. At a point of intersection of a curve and one of its orthogonal trajectories, the slopes of the two curves are
reciprocals of one another.
10. The family of orthogonal trajectories for a family of
parallel lines is another family of parallel lines.
11. The family of orthogonal trajectories for a family of
circles that are centered at the origin is another family
of circles centered at the origin.

Problems
1. An object is released from rest at a height of 100 meters above the ground. Neglecting frictional forces,
the subsequent motion is governed by the initial-value
problem
$$\frac{d^2y}{dt^2} = g, \qquad y(0) = 0, \qquad \frac{dy}{dt}(0) = 0,$$
where y(t) denotes the displacement of the object from
its initial position at time t. Solve this initial-value
problem and use your solution to determine the time
when the object hits the ground.
2. A five-foot-tall boy tosses a tennis ball straight up from
the level of the top of his head. Neglecting frictional
forces, the subsequent motion is governed by the differential equation
$$\frac{d^2y}{dt^2} = g.$$
If the object hits the ground 8 seconds after the boy
releases it, find
(a) the time when the tennis ball reaches its maximum height.
(b) the maximum height of the tennis ball.
3. A pyrotechnic rocket is to be launched vertically upward from the ground. For optimal viewing, the rocket
should reach a maximum height of 90 meters above
the ground. Ignore frictional forces.
(a) How fast must the rocket be launched in order to
achieve optimal viewing?
(b) Assuming the rocket is launched with the speed
determined in part (a), how long after it is
launched will it reach its maximum height?
4. Repeat Problem 3 under the assumption that the rocket
is launched from a platform 5 meters above the ground.
5. An object thrown vertically upward with a speed of
2 m/s from a height of h meters takes 10 seconds to
reach the ground. Set up and solve the initial-value
problem that governs the motion of the object, and
determine h.
6. An object released from a height h meters above the
ground with a vertical velocity of v0 m/s hits the
ground after t0 seconds. Neglecting frictional forces,
set up and solve the initial-value problem governing
the motion, and use your solution to show that
$$v_0 = \frac{1}{2t_0}\left(2h - g t_0^2\right).$$
7. Verify that y(t) = A cos(ωt − φ) is a solution to the
differential equation (1.1.7), where A, ω, and φ are
constants with A and ω nonzero. Determine the constants A and φ (with |φ| < π radians) in the particular
case when the initial conditions are
$$y(0) = a, \qquad \frac{dy}{dt}(0) = 0.$$
8. Verify that y(t) = c1 cos ωt + c2 sin ωt
is a solution to the differential equation (1.1.7). Show
that the amplitude of the motion is
$$A = \sqrt{c_1^2 + c_2^2}.$$
9. Verify that, for t > 0, y(t) = ln t is a solution to the
differential equation
$$2\left(\frac{dy}{dt}\right)^3 = \frac{d^3y}{dt^3}.$$
10. Verify that y(x) = x/(x + 1) is a solution to the differential equation
$$y + \frac{d^2y}{dx^2} = \frac{dy}{dx} + \frac{x^3 + 2x^2 - 3}{(1 + x)^3}.$$
11. Verify that $y(x) = e^x \sin x$ is a solution to the differential equation
$$2y \cot x - \frac{d^2y}{dx^2} = 0.$$
12. By writing Equation (1.1.8) in the form
$$\frac{1}{T - T_m} \frac{dT}{dt} = -k$$
and using $u^{-1} \frac{du}{dt} = \frac{d}{dt}(\ln u)$, derive (1.1.9).
13. A glass of water whose temperature is 50°F is taken
outside at noon on a day whose temperature is constant at 70°F. If the water’s temperature is 55°F at 2
p.m., do you expect the water’s temperature to reach
60°F before 4 p.m. or after 4 p.m.? Use Newton’s law
of cooling to explain your answer.
14. On a cold winter day (10°F), an object is brought outside from a 70°F room. If it takes 40 minutes for the
object to cool from 70°F to 30°F, did it take more or
less than 20 minutes for the object to reach 50°F? Use
Newton’s law of cooling to explain your answer.

For Problems 15–20, find the equation of the orthogonal trajectories to the given family of curves. In each case, sketch
some curves from each family.
15. $x^2 + 4y^2 = c$.
16. y = c/x.
17. $y = cx^2$.
18. $y = cx^4$.
19. $y^2 = 2x + c$.
20. $y = ce^x$.

For Problems 21–24, m denotes a fixed nonzero constant,
and c is the constant distinguishing the different curves in
the given family. In each case, find the equation of the orthogonal trajectories.
21. y = mx + c.
22. $y = cx^m$.
23. $y^2 + mx^2 = c$.
24. $y^2 = mx + c$.
25. We call a coordinate system (u, v) orthogonal if its
coordinate curves (the two families of curves u = constant and v = constant) are orthogonal trajectories (for
example, a Cartesian coordinate system or a polar coordinate system). Let (u, v) be orthogonal coordinates,
where $u = x^2 + 2y^2$, and x and y are Cartesian coordinates. Find the Cartesian equation of the v-coordinate
curves, and sketch the (u, v) coordinate system.
26. Any curve with the property that whenever it intersects a curve of a given family it does so at an angle
a ≠ π/2 is called an oblique trajectory of the given
family. (See Figure 1.1.6.) Let m1 (equal to tan a1)
denote the slope of the required family at the point
(x, y), and let m2 (equal to tan a2) denote the slope of
the given family. Show that
$$m_1 = \frac{m_2 - \tan a}{1 + m_2 \tan a}.$$
[Hint: From Figure 1.1.6, tan a1 = tan(a2 − a). Thus,
the equation of the family of oblique trajectories is
obtained by solving
$$\frac{dy}{dx} = \frac{m_2 - \tan a}{1 + m_2 \tan a}.]$$

[Figure 1.1.6: Oblique trajectories intersecting at an angle a. A curve of the required family (slope m1 = tan a1) crosses a curve of the given family (slope m2 = tan a2).]

1.2 Basic Ideas and Terminology
In the preceding section we have used some applied problems to illustrate how differential
equations arise. We now undertake to formalize mathematically several ideas introduced
through these examples. We begin with a very general definition of a differential equation.

DEFINITION 1.2.1
A differential equation is an equation involving one or more derivatives of an unknown function.

Example 1.2.2 The following are all differential equations:
(a) $\frac{dy}{dx} + y = x^2$,
(b) $\frac{d^2y}{dx^2} = -k^2 y$,
(c) $\left(\frac{d^3y}{dx^3}\right)^5 + \frac{d^2y}{dx^2} + \cos x = 0$,
(d) $\sin\left(\frac{dy}{dx}\right) + \tan^{-1} y = 1$,
(e) $\phi_{xx} + \phi_{yy} - \phi_x = e^x + x \sin y$.

The differential equations occurring in (a) through (d) are called ordinary differential equations, since the unknown function y(x) depends only on one variable, x. In (e),
the unknown function φ(x, y) depends on more than one variable; hence the equation
involves partial derivatives. Such a differential equation is called a partial differential
equation. In this text we consider only ordinary differential equations.
We now introduce some more definitions and terminology.

DEFINITION 1.2.3
The order of the highest derivative occurring in a differential equation is called the
order of the differential equation.
In Example 1.2.2, (a) has order 1, (b) has order 2, (c) has order 3, and (d) has order 1.
If we look back at the examples from the previous section, we see that problems formulated using Newton’s second law of motion will always be governed by a second-order
differential equation (for the position of the object). Indeed, second-order differential
equations play a very fundamental role in applied problems, although differential equations of other orders also arise. For example, the differential equation obtained from
Newton’s law of cooling is a first-order differential equation, as is the differential equation for determining the orthogonal trajectories to a given family of curves. As another
example, we note that under certain conditions, the deflection, y(x), of a horizontal beam
is governed by the fourth-order differential equation
$$\frac{d^4y}{dx^4} = F(x)$$
for an appropriate function F(x).
Any differential equation of order n can be written in the form
$$G(x, y, y', y'', \ldots, y^{(n)}) = 0, \qquad (1.2.1)$$
where we have introduced the prime notation to denote derivatives, and $y^{(n)}$ denotes the
nth derivative of y with respect to x (not y to the power of n). Of particular interest to us
throughout the text will be linear differential equations. These arise as the special case of
Equation (1.2.1), when $y, y', \ldots, y^{(n)}$ occur to the first degree only, and not as products
or arguments of other functions. The general form for such a differential equation is
given in the next definition.

DEFINITION 1.2.4
A differential equation that can be written in the form
$$a_0(x) y^{(n)} + a_1(x) y^{(n-1)} + \cdots + a_n(x) y = F(x),$$
where a0, a1, ..., an and F are functions of x only, is called a linear differential
equation of order n. Such a differential equation is linear in $y, y', y'', \ldots, y^{(n)}$.
A differential equation that does not satisfy this definition is called a nonlinear
differential equation.

Example 1.2.5 The equations
$$y'' + x^2 y' + (\sin x) y = e^x \quad \text{and} \quad x y''' + 4x^2 y' - \frac{2}{1 + x^2}\, y = 0$$
are linear differential equations of order 2 and order 3, respectively, whereas the differential equations
$$y'' + x \sin(y') - xy = x^2 \quad \text{and} \quad y'' - x^2 y' + y^2 = 0$$
are nonlinear. In the first case the nonlinearity arises from the $\sin(y')$ term, whereas in
the second, the nonlinearity is due to the $y^2$ term.
Example 1.2.6 The general forms for first- and second-order linear differential equations are
$$a_0(x) \frac{dy}{dx} + a_1(x) y = F(x)$$
and
$$a_0(x) \frac{d^2y}{dx^2} + a_1(x) \frac{dy}{dx} + a_2(x) y = F(x),$$
respectively.
If we consider the examples from the previous section, we see that the differential
equation governing the simple harmonic oscillator is a second-order linear differential
equation. In this case the linearity was imposed in the modeling process when we assumed
that the restoring force was directly proportional to the displacement from equilibrium
(Hooke’s law). Not all springs satisfy this relationship. For example, Duffing’s equation
$$m \frac{d^2y}{dx^2} + k_1 y + k_2 y^3 = 0$$
gives a mathematical model of a nonlinear spring–mass system. If k2 = 0, this reduces
to the simple harmonic oscillator equation. Newton’s law of cooling assumes a linear relationship between the rate of change of the temperature of an object and the temperature
difference between the object and that of the surrounding medium. Hence, the resulting
differential equation is linear. This can be seen explicitly by writing Equation (1.1.8) as
$$\frac{dT}{dt} + kT = kT_m,$$
which is a first-order linear differential equation. Finally, the differential equation for
determining the orthogonal trajectories of a given family of curves will in general be
nonlinear, as seen in Example 1.1.1.

Solutions of Differential Equations
We now define precisely what is meant by a solution to a differential equation.

DEFINITION 1.2.7
A function y = f(x) that is (at least) n times differentiable on an interval I is called
a solution to the differential equation (1.2.1) on I if the substitution $y = f(x), y' = f'(x), \ldots, y^{(n)} = f^{(n)}(x)$ reduces the differential equation (1.2.1) to an identity
valid for all x in I. In this case we say that y = f(x) satisfies the differential equation.

Example 1.2.8 Verify that for all constants c1 and c2, y(x) = c1 sin x + c2 cos x is a solution to the
linear differential equation $y'' + y = 0$ for x in the interval (−∞, ∞).

Solution: The function y(x) is certainly twice differentiable for all real x. Furthermore,
$$y'(x) = c_1 \cos x - c_2 \sin x$$
and
$$y''(x) = -(c_1 \sin x + c_2 \cos x).$$
Consequently,
$$y'' + y = -(c_1 \sin x + c_2 \cos x) + c_1 \sin x + c_2 \cos x = 0,$$
so that $y'' + y = 0$ for every x in (−∞, ∞). It follows from the preceding definition
that the given function is a solution to the differential equation on (−∞, ∞).
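
The same verification can be mirrored numerically. The following sketch is illustrative only: the constants c1 and c2, the sample points, and the function names are assumptions. It codes the derivative formulas from the solution above, cross-checks the first derivative against a finite-difference estimate, and evaluates the residual of the differential equation at several points.

```python
import math

# Sketch: numerical companion to Example 1.2.8 for one assumed choice of c1, c2.

def y(x, c1, c2):
    return c1 * math.sin(x) + c2 * math.cos(x)


def y_prime(x, c1, c2):
    """First derivative, as computed in the example."""
    return c1 * math.cos(x) - c2 * math.sin(x)


def y_double_prime(x, c1, c2):
    """Second derivative, as computed in the example."""
    return -(c1 * math.sin(x) + c2 * math.cos(x))


c1, c2 = 2.0, -3.0                    # assumed constants
xs = [-5.0, -1.0, 0.0, 0.5, 4.0]      # assumed sample points

# Residual of y'' + y = 0 at each sample point.
residuals = [y_double_prime(x, c1, c2) + y(x, c1, c2) for x in xs]

# Cross-check y' against a central-difference estimate of the slope of y.
h = 1e-6
slope_errors = [
    abs((y(x + h, c1, c2) - y(x - h, c1, c2)) / (2.0 * h) - y_prime(x, c1, c2))
    for x in xs
]

print(residuals, max(slope_errors))
```

A check at finitely many points supports the algebra but does not replace it; the identity for every x comes from the symbolic cancellation shown in the example.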
In the preceding example, x could assume all real values. Often, however, the independent variable will be restricted in some manner. For example, the differential equation
$$\frac{dy}{dx} = \frac{1}{2\sqrt{x}}\,(y - 1)$$
is undefined when x ≤ 0, and so any solution would be defined only for x > 0. In fact
this linear differential equation has solution
$$y(x) = c e^{\sqrt{x}} + 1, \qquad x > 0,$$
where c is a constant. (The reader can check this by plugging in to the given differential
equation, as was done in Example 1.2.8. In Section 1.4 we will introduce a technique that
will enable us to derive this solution.)

We now distinguish two ways in which solutions
to a differential equation can be expressed. Often, as in Example 1.2.8, we will be able
to obtain a solution to a differential equation in the explicit form y = f(x), for some
function f. However, when dealing with nonlinear differential equations, we usually
have to be content with a solution written in implicit form
$$F(x, y) = 0,$$
where the function F defines the solution, y(x), implicitly as a function of x. This is
illustrated in Example 1.2.9.
Example 1.2.9 Verify that the relation $x^2 + y^2 - 4 = 0$ defines an implicit solution to the nonlinear
differential equation
$$\frac{dy}{dx} = -\frac{x}{y}.$$

Solution: We regard the given relation as defining y as a function of x. Differentiating
this relation with respect to x yields (see footnote 4)
$$2x + 2y \frac{dy}{dx} = 0.$$
That is,
$$\frac{dy}{dx} = -\frac{x}{y},$$
as required. In this example we can obtain y explicitly in terms of x, since $x^2 + y^2 - 4 = 0$
implies that
$$y = \pm\sqrt{4 - x^2}.$$
The implicit relation therefore contains the two explicit solutions
$$y(x) = \sqrt{4 - x^2}, \qquad y(x) = -\sqrt{4 - x^2},$$
which correspond graphically to the two semi-circles sketched in Figure 1.2.1.

[Figure 1.2.1: Two solutions to the differential equation $y' = -x/y$: the semicircles $y(x) = (4 - x^2)^{1/2}$ and $y(x) = -(4 - x^2)^{1/2}$. Both solutions are undefined when x = ±2.]

Footnote 4: Note that we have used implicit differentiation in obtaining $d(y^2)/dx = 2y \cdot (dy/dx)$.

Since x = ±2 corresponds to y = 0 in both of these equations, whereas the
differential equation is defined only for y ≠ 0, we must omit x = ±2 from the domains
of the solutions. Consequently, both of the foregoing solutions to the differential equation
are valid for −2 < x < 2.
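
As a hedged illustration of Example 1.2.9, the sketch below compares a finite-difference slope of each explicit semicircle solution with the right-hand side −x/y at a few interior points of (−2, 2). The sample points, step size, and function names are assumptions made for the example.

```python
import math

# Sketch: check that both semicircles satisfy dy/dx = -x/y inside (-2, 2).

def upper(x):
    """Upper semicircle y = sqrt(4 - x^2)."""
    return math.sqrt(4.0 - x * x)


def lower(x):
    """Lower semicircle y = -sqrt(4 - x^2)."""
    return -math.sqrt(4.0 - x * x)


def slope_error(branch, x, h=1e-6):
    """|central-difference slope of the branch minus (-x / y)| at x."""
    numeric = (branch(x + h) - branch(x - h)) / (2.0 * h)
    return abs(numeric - (-x / branch(x)))


xs = [-1.5, -0.5, 0.0, 0.5, 1.5]  # assumed interior sample points
worst_error = max(slope_error(b, x) for b in (upper, lower) for x in xs)

# At the endpoints y = 0, so the right-hand side -x/y is undefined there.
endpoint_values = (upper(2.0), upper(-2.0))

print(worst_error, endpoint_values)
```

The endpoint values printed at the end are both zero, which is why x = ±2 must be excluded from the domain of each solution.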
In the preceding example the solutions to the differential equation are more simply
expressed in implicit form, although, as we have shown, it is quite easy to obtain the
corresponding explicit solutions. In the following example the solution must be expressed
in implicit form, since it is impossible to solve the implicit relation (analytically) for y
as a function of x .
Example 1.2.10 Show that the relation $\sin(xy) + y^2 - x = 0$ defines a solution to
$$\frac{dy}{dx} = \frac{1 - y\cos(xy)}{x\cos(xy) + 2y}.$$

Solution: Differentiating the given relationship implicitly with respect to x yields
$$\cos(xy)\left(y + x\frac{dy}{dx}\right) + 2y\frac{dy}{dx} - 1 = 0.$$
That is,
$$\frac{dy}{dx}\,[x\cos(xy) + 2y] = 1 - y\cos(xy),$$
which implies that
$$\frac{dy}{dx} = \frac{1 - y\cos(xy)}{x\cos(xy) + 2y}$$
as required.
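
Although the relation in Example 1.2.10 cannot be solved for y in closed form, it can be solved numerically at individual values of x. The sketch below is an illustration, not a method from the text: it uses Newton's method (an assumed choice of root finder, with an assumed starting guess and the assumed sample point x = 1) to find y on the curve, then compares a finite-difference slope of the curve with the slope formula derived above.

```python
import math

# Sketch: numerical check of Example 1.2.10 near x = 1 (an assumed sample point).

def F(x, y):
    """Left-hand side of the implicit relation sin(xy) + y^2 - x = 0."""
    return math.sin(x * y) + y * y - x


def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return x * math.cos(x * y) + 2.0 * y


def solve_y(x, y_guess, tol=1e-14, max_iter=50):
    """Newton's method in y for fixed x, started from y_guess."""
    y = y_guess
    for _ in range(max_iter):
        step = F(x, y) / F_y(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y


def formula_slope(x, y):
    """Slope dy/dx given by the differential equation in the example."""
    return (1.0 - y * math.cos(x * y)) / (x * math.cos(x * y) + 2.0 * y)


x0, h = 1.0, 1e-5
y0 = solve_y(x0, 1.0)  # a point (x0, y0) on the curve

# Central-difference slope of the implicitly defined function y(x) at x0.
numeric_slope = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2.0 * h)

print(y0, numeric_slope, formula_slope(x0, y0))
```

The relation may define more than one branch; the starting guess determines which branch Newton's method follows, so the check applies only to the branch found here.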
Now consider the simple differential equation
$$\frac{d^2y}{dx^2} = 12x.$$
From elementary calculus we know that all functions whose second derivative is 12x can
be obtained by performing two integrations. Integrating the given differential equation
once yields
$$\frac{dy}{dx} = 6x^2 + c_1,$$
where c1 is an arbitrary constant. Integrating again, we obtain
$$y(x) = 2x^3 + c_1 x + c_2, \qquad (1.2.2)$$
where c2 is another arbitrary constant. The point to notice about this solution is that
it contains two arbitrary constants. Further, by assigning appropriate values to these
constants, we can determine all solutions to the differential equation. We call (1.2.2)
the general solution to the differential equation. In this example the given differential
equation was of second-order, and the general solution contained two arbitrary constants,
which arose because two integrations were required to solve the differential equation.
In the case of an nth-order differential equation we might suspect that the most general
form of solution that can arise would contain n arbitrary constants. This is indeed the
case and motivates the following definition.

DEFINITION 1.2.11
A solution to an nth-order differential equation on an interval I is called the general
solution on I if it satisfies the following conditions:
1. The solution contains n constants c1, c2, ..., cn.
2. All solutions to the differential equation can be obtained by assigning
appropriate values to the constants.

Remark Not all differential equations have a general solution. For example, consider
$$(y')^2 + (y - 1)^2 = 0.$$
The only solution to this differential equation is y(x) = 1, and hence the differential
equation does not have a solution containing an arbitrary constant.

Example 1.2.12 Find the general solution to the differential equation $y'' = e^{-x}$.

Solution: Integrating the given differential equation with respect to x yields
$$y' = -e^{-x} + c_1,$$
where c1 is an integration constant. Integrating this equation, we obtain
$$y(x) = e^{-x} + c_1 x + c_2, \qquad (1.2.3)$$
where c2 is another integration constant. Consequently, all solutions to $y'' = e^{-x}$ are of
the form (1.2.3), and therefore, according to Definition 1.2.11, this is the general solution
to $y'' = e^{-x}$ on any interval.
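
A brief numerical illustration of Example 1.2.12 follows. It is a sketch under assumed values: the pairs (c1, c2), the sample points, and the function names are hypothetical. For each pair of constants it estimates the second derivative of the general solution (1.2.3) by central differences and compares it with e^(−x), showing that the constants do not affect the result.

```python
import math

# Sketch: every member of y(x) = e^{-x} + c1 x + c2 should satisfy y'' = e^{-x}.

def general_solution(x, c1, c2):
    return math.exp(-x) + c1 * x + c2


def second_derivative_error(x, c1, c2, h=1e-3):
    """|central-difference estimate of y''(x) minus e^{-x}|."""
    y_plus = general_solution(x + h, c1, c2)
    y_mid = general_solution(x, c1, c2)
    y_minus = general_solution(x - h, c1, c2)
    y_second = (y_plus - 2.0 * y_mid + y_minus) / h**2
    return abs(y_second - math.exp(-x))


constant_pairs = [(0.0, 0.0), (1.0, 0.0), (-2.0, 3.0)]  # assumed (c1, c2) values
xs = [-1.0, 0.0, 0.5, 2.0]                              # assumed sample points

worst_error = max(
    second_derivative_error(x, c1, c2)
    for (c1, c2) in constant_pairs
    for x in xs
)

print(worst_error)
```

The terms c1 x + c2 are annihilated by two differentiations, which is why every choice of constants passes the same check.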
As the preceding example illustrates, we can, in principle, always find the general
solution to a differential equation of the form
$$\frac{d^ny}{dx^n} = f(x) \qquad (1.2.4)$$
by performing n integrations. However, if the function on the right-hand side of the
differential equation is not a function of x only, this procedure cannot be used. Indeed,
one of the major aims of this text is to determine solution techniques for differential
equations that are more complicated than Equation (1.2.4). A solution to a differential
equation is called a particular solution if it does not contain any arbitrary constants not
present in the differential equation itself. One way in which particular solutions arise
is by our assigning specific values to the arbitrary constants occurring in the general
solution to a differential equation. For example, from (1.2.3),
$$y(x) = e^{-x} + x$$
is a particular solution to the differential equation $d^2y/dx^2 = e^{-x}$ (the solution corresponding to c1 = 1, c2 = 0).

Initial-Value Problems
As discussed in the preceding section, the unique speciﬁcation of an applied problem
requires more than just a differential equation. We must also give appropriate auxiliary
conditions that characterize the problem under investigation. Of particular interest to us
is the case of the initial-value problem deﬁned for an nth-order differential equation as
follows.

DEFINITION 1.2.13
An $n$th-order differential equation together with $n$ auxiliary conditions of the form
$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1},$$
where $y_0, y_1, \ldots, y_{n-1}$ are constants, is called an initial-value problem.

Example 1.2.14 Solve the initial-value problem
$$y'' = e^{-x}, \tag{1.2.5}$$
$$y(0) = 1, \quad y'(0) = 4. \tag{1.2.6}$$

Solution: From Example 1.2.12, the general solution to Equation (1.2.5) is
$$y(x) = e^{-x} + c_1 x + c_2. \tag{1.2.7}$$
We now impose the auxiliary conditions (1.2.6). Setting $x = 0$ in (1.2.7), we see that
$$y(0) = 1 \quad \text{if and only if} \quad 1 = 1 + c_2.$$
So $c_2 = 0$. Using this value for $c_2$ in (1.2.7) and differentiating the result yields
$$y'(x) = -e^{-x} + c_1.$$
Consequently
$$y'(0) = 4 \quad \text{if and only if} \quad 4 = -1 + c_1,$$
and hence $c_1 = 5$. Thus the given auxiliary conditions pick out the particular solution to the differential equation (1.2.5) with $c_1 = 5$ and $c_2 = 0$, so that the initial-value problem has the unique solution
$$y(x) = e^{-x} + 5x.$$

Initial-value problems play a fundamental role in the theory and applications of
differential equations. In the previous example, the initial-value problem had a unique
solution. More generally, suppose we have a differential equation that can be written in
the normal form
$$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)}).$$
According to Definition 1.2.13, the initial-value problem for such an $n$th-order differential equation is the following:

Statement of the initial-value problem: Solve
$$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)})$$
subject to
$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1},$$
where $y_0, y_1, \ldots, y_{n-1}$ are constants.

It can be shown that this initial-value problem always has a unique solution, provided that $f$ and its partial derivatives with respect to $y, y', \ldots, y^{(n-1)}$ are continuous in an appropriate region. This is a fundamental result in the theory of differential equations.
In Chapter 6 we will show how the following special case can be used to develop the
theory for linear differential equations.
Theorem 1.2.15 Let $a_1, a_2, \ldots, a_n, F$ be functions that are continuous on an interval $I$. Then, for any $x_0$ in $I$, the initial-value problem
$$y^{(n)} + a_1(x)y^{(n-1)} + \cdots + a_{n-1}(x)y' + a_n(x)y = F(x),$$
$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1}$$
has a unique solution on $I$.
The next example, which we will refer back to on many occasions throughout the
remainder of the text, illustrates the power of the preceding theorem.
Example 1.2.16 Prove that the general solution to the differential equation
$$y'' + \omega^2 y = 0, \quad -\infty < x < \infty, \tag{1.2.8}$$
where $\omega$ is a nonzero constant, is
$$y(x) = c_1 \cos \omega x + c_2 \sin \omega x, \tag{1.2.9}$$
where $c_1, c_2$ are arbitrary constants.

Solution: It is a routine computation to verify that $y(x) = c_1 \cos \omega x + c_2 \sin \omega x$ is a solution to the differential equation (1.2.8) on $(-\infty, \infty)$. According to Definition 1.2.11 we must now establish that every solution to (1.2.8) is of the form (1.2.9). To that end, suppose that $y = f(x)$ is any solution to (1.2.8). Then according to the preceding theorem, $y = f(x)$ is the unique solution to the initial-value problem
$$y'' + \omega^2 y = 0, \quad y(0) = f(0), \quad y'(0) = f'(0). \tag{1.2.10}$$
However, consider the function
$$y(x) = f(0) \cos \omega x + \frac{f'(0)}{\omega} \sin \omega x. \tag{1.2.11}$$
This is of the form $y(x) = c_1 \cos \omega x + c_2 \sin \omega x$, where $c_1 = f(0)$ and $c_2 = f'(0)/\omega$, and therefore solves the differential equation (1.2.8). Further, evaluating (1.2.11) and its derivative at $x = 0$ yields
$$y(0) = f(0) \quad \text{and} \quad y'(0) = f'(0).$$
Consequently, (1.2.11) solves the initial-value problem (1.2.10). But, by assumption, $y(x) = f(x)$ solves the same initial-value problem. Owing to the uniqueness of the solution to this initial-value problem, it follows that these two solutions must coincide. Therefore,
$$f(x) = f(0) \cos \omega x + \frac{f'(0)}{\omega} \sin \omega x = c_1 \cos \omega x + c_2 \sin \omega x.$$
Since $f(x)$ was an arbitrary solution to the differential equation (1.2.8), we can conclude that every solution to (1.2.8) is of the form
$$y(x) = c_1 \cos \omega x + c_2 \sin \omega x$$
and therefore this is the general solution on $(-\infty, \infty)$.
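The uniqueness argument in Example 1.2.16 can be illustrated numerically. The sketch below integrates $y'' + \omega^2 y = 0$ from arbitrary initial data with the classical fourth-order Runge-Kutta method (a method chosen here for illustration; it is not used in the example itself) and compares the result with formula (1.2.11). The function names, step count, and test values are assumptions.

```python
import math

def rk4_oscillator(omega, y0, v0, x_end, steps=2000):
    """Integrate y'' + omega^2 y = 0 from x = 0 to x_end with classical RK4.

    The second-order equation is rewritten as the system y' = v, v' = -omega^2 y.
    Returns the approximate value y(x_end).
    """
    h = x_end / steps
    y, v = y0, v0

    def rhs(y, v):
        return v, -omega * omega * y

    for _ in range(steps):
        k1y, k1v = rhs(y, v)
        k2y, k2v = rhs(y + 0.5 * h * k1y, v + 0.5 * h * k1v)
        k3y, k3v = rhs(y + 0.5 * h * k2y, v + 0.5 * h * k2v)
        k4y, k4v = rhs(y + h * k3y, v + h * k3v)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return y

def closed_form(omega, y0, v0, x):
    """Formula (1.2.11) with f(0) = y0 and f'(0) = v0."""
    return y0 * math.cos(omega * x) + (v0 / omega) * math.sin(omega * x)

omega, y0, v0, x_end = 2.0, 1.5, -0.7, 3.0
print(rk4_oscillator(omega, y0, v0, x_end), closed_form(omega, y0, v0, x_end))
```

Whatever initial values are chosen, the numerically integrated solution agrees with (1.2.11) to within the integration error, as the uniqueness theorem predicts.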
In the remainder of this chapter we will focus primarily on ﬁrst-order differential
equations and some of their elementary applications. We will investigate such differential
equations qualitatively, analytically, and numerically.

Exercises for 1.2

Key Terms
Differential equation, Order of a differential equation, Linear differential equation, Nonlinear differential equation, General solution to a differential equation, Particular solution to a differential equation, Initial-value problem.

Skills
• Be able to determine the order of a differential equation.
• Be able to determine whether a given differential equation is linear or nonlinear.
• Be able to determine whether or not a given function $y(x)$ is a particular solution to a given differential equation.
• Be able to determine whether or not a given implicit relation defines a particular solution to a given differential equation.
• Be able to find the general solution to differential equations of the form $y^{(n)} = f(x)$ via $n$ integrations.
• Be able to use initial conditions to find the solution to an initial-value problem.

True-False Review
For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. The order of a differential equation is the order of the lowest derivative appearing in the differential equation.
2. The general solution to a third-order differential equation must contain three constants.
3. An initial-value problem always has a unique solution if the functions and partial derivatives involved are continuous.
4. The general solution to $y'' + y = 0$ is $y(x) = c_1 \cos x + 5c_2 \cos x$.
5. The general solution to $y'' + y = 0$ is $y(x) = c_1 \cos x + 5c_1 \sin x$.
6. The general solution to a differential equation of the form $y^{(n)} = F(x)$ can be obtained by $n$ consecutive integrations of the function $F(x)$.

Problems
For Problems 1–6, determine the order of the given differential equation and state whether it is linear or nonlinear.
1. $\dfrac{d^2y}{dx^2} + e^{xy}\dfrac{dy}{dx} = x^2$.
2. $\dfrac{d^3y}{dx^3} + 4\dfrac{d^2y}{dx^2} + \sin x\,\dfrac{dy}{dx} = xy + \tan x$.
3. $y'' + 3x(y')^3 - y = 1 + 3x$.
4. $\sin x \cdot e^{y''} + y' - \tan y = \cos x$.
5. $\sqrt{x}\,y'' + \dfrac{\ln x}{y'} = 3x^3$.
6. $\dfrac{d^4y}{dx^4} + 3\dfrac{d^2y}{dx^2} = x$.

For Problems 7–18, verify that the given function is a solution to the given differential equation ($c_1$ and $c_2$ are arbitrary constants), and state the maximum interval over which the solution is valid.
7. $y(x) = c_1 e^x \cos 2x + c_2 e^x \sin 2x$, $y'' - 2y' + 5y = 0$.
8. $y(x) = c_1 e^x + c_2 e^{-2x}$, $y'' + y' - 2y = 0$.
9. $y(x) = \dfrac{1}{x + 4}$, $y' = -y^2$.
10. $y(x) = c_1 x^{1/2}$, $y' = \dfrac{y}{2x}$.
11. $y(x) = e^{-x} \sin 2x$, $y'' + 2y' + 5y = 0$.
12. $y(x) = c_1 \cosh 3x + c_2 \sinh 3x$, $y'' - 9y = 0$.
13. $y(x) = c_1 x^{-3} + c_2 x^{-1}$, $x^2 y'' + 5xy' + 3y = 0$.
14. $y(x) = c_1 x^{1/2} + 3x^2$, $2x^2 y'' - xy' + y = 9x^2$.
15. $y(x) = c_1 x^2 + c_2 x^3 - x^2 \sin x$, $x^2 y'' - 4xy' + 6y = x^4 \sin x$.
16. $y(x) = c_1 e^{ax} + c_2 e^{bx}$, $y'' - (a + b)y' + aby = 0$, where $a$ and $b$ are constants and $a \ne b$.
17. $y(x) = e^{ax}(c_1 + c_2 x)$, $y'' - 2ay' + a^2 y = 0$, where $a$ is a constant.
18. $y(x) = e^{ax}(c_1 \cos bx + c_2 \sin bx)$, $y'' - 2ay' + (a^2 + b^2)y = 0$, where $a$ and $b$ are constants.

For Problems 19–22, determine all values of the constant $r$ such that the given function solves the given differential equation.
19. $y(x) = e^{rx}$, $y'' + 2y' - 3y = 0$.
20. $y(x) = e^{rx}$, $y'' - 8y' + 16y = 0$.
21. $y(x) = x^r$, $x^2 y'' + xy' - y = 0$.
22. $y(x) = x^r$, $x^2 y'' + 5xy' + 4y = 0$.

23. When $N$ is a positive integer, the Legendre equation
$$(1 - x^2)y'' - 2xy' + N(N + 1)y = 0,$$
with $-1 < x < 1$, has a solution that is a polynomial of degree $N$. Show by substitution into the differential equation that in the case $N = 3$ such a solution is
$$y(x) = \frac{1}{2}x(5x^2 - 3).$$
24. Determine a solution to the differential equation
$$(1 - x^2)y'' - xy' + 4y = 0$$
of the form $y(x) = a_0 + a_1 x + a_2 x^2$ satisfying the normalization condition $y(1) = 1$.

For Problems 25–29, show that the given relation defines an implicit solution to the given differential equation, where $c$ is an arbitrary constant.
25. $x \sin y - e^x = c$, $y' = \dfrac{e^x - \sin y}{x \cos y}$.
26. $xy^2 + 2y - x = c$, $y' = \dfrac{1 - y^2}{2(1 + xy)}$.
27. $e^{xy} - x = c$, $y' = \dfrac{1 - ye^{xy}}{xe^{xy}}$. Determine the solution with $y(1) = 0$.
28. $e^{y/x} + xy^2 - x = c$, $y' = \dfrac{x^2(1 - y^2) + ye^{y/x}}{x(e^{y/x} + 2x^2 y)}$.
29. $x^2 y^2 - \sin x = c$, $y' = \dfrac{\cos x - 2xy^2}{2x^2 y}$. Determine the explicit solution that satisfies $y(\pi) = 1/\pi$.

For Problems 30–33, find the general solution to the given differential equation and the maximum interval on which the solution is valid.
30. $y' = \sin x$.
31. $y' = x^{-1/2}$.
32. $y'' = xe^x$.
33. $y'' = x^n$, $n$ an integer.

For Problems 34–37, solve the given initial-value problem.
34. $y' = \ln x$, $y(1) = 2$.
35. $y'' = \cos x$, $y(0) = 2$, $y'(0) = 1$.
36. $y''' = 6x$, $y(0) = 1$, $y'(0) = -1$, $y''(0) = 4$.
37. $y'' = xe^x$, $y(0) = 3$, $y'(0) = 4$.
38. Prove that the general solution to $y'' - y = 0$ on any interval $I$ is $y(x) = c_1 e^x + c_2 e^{-x}$.

A second-order differential equation together with two auxiliary conditions imposed at different values of the independent variable is called a boundary-value problem. For Problems 39–40, solve the given boundary-value problem.
39. $y'' = e^{-x}$, $y(0) = 1$, $y(1) = 0$.
40. $y'' = -2(3 + 2 \ln x)$, $y(1) = y(e) = 0$.
41. The differential equation $y'' + y = 0$ has the general solution $y(x) = c_1 \cos x + c_2 \sin x$.
(a) Show that the boundary-value problem $y'' + y = 0$, $y(0) = 0$, $y(\pi) = 1$ has no solutions.
(b) Show that the boundary-value problem $y'' + y = 0$, $y(0) = 0$, $y(\pi) = 0$ has an infinite number of solutions.

For Problems 42–47, verify that the given function is a solution to the given differential equation. In these problems, $c_1$ and $c_2$ are arbitrary constants. Throughout the text, the symbol refers to exercises for which some form of technology, such as a graphing calculator or computer algebra system (CAS), is recommended.
42. $y(x) = c_1 e^{2x} + c_2 e^{-3x}$, $y'' + y' - 6y = 0$.
43. $y(x) = c_1 x^4 + c_2 x^{-2}$, $x^2 y'' - xy' - 8y = 0$, $x > 0$.
44. $y(x) = c_1 x^2 + c_2 x^2 \ln x + \frac{1}{6}x^2 (\ln x)^3$, $x^2 y'' - 3xy' + 4y = x^2 \ln x$, $x > 0$.
45. $y(x) = x^a [c_1 \cos(b \ln x) + c_2 \sin(b \ln x)]$, $x^2 y'' + (1 - 2a)xy' + (a^2 + b^2)y = 0$, $x > 0$, where $a$ and $b$ are arbitrary constants.
46. $y(x) = c_1 e^x + c_2 e^{-x}(1 + 2x + 2x^2)$, $xy'' - 2y' + (2 - x)y = 0$, $x > 0$.
47. $y(x) = \displaystyle\sum_{k=0}^{10} \frac{1}{k!}x^k$, $xy'' - (x + 10)y' + 10y = 0$, $x > 0$.
48. (a) Derive the polynomial of degree five that satisfies both the Legendre equation
$$(1 - x^2)y'' - 2xy' + 30y = 0$$
and the normalization condition $y(1) = 1$.
(b) Sketch your solution from (a) and determine approximations to all zeros and local maxima and local minima on the interval $(-1, 1)$.
49. One solution to the Bessel equation of (nonnegative) integer order $N$
$$x^2 y'' + xy' + (x^2 - N^2)y = 0$$
is
$$y(x) = J_N(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(N + k)!}\left(\frac{x}{2}\right)^{2k+N}.$$
(a) Write the first three terms of $J_0(x)$.
(b) Let $J(0, x, m)$ denote the $m$th partial sum
$$J(0, x, m) = \sum_{k=0}^{m} \frac{(-1)^k}{(k!)^2}\left(\frac{x}{2}\right)^{2k}.$$
Plot $J(0, x, 4)$ and use your plot to approximate the first positive zero of $J_0(x)$. Compare your value against a tabulated value or one generated by a computer algebra system.
(c) Plot $J_0(x)$ and $J(0, x, 4)$ on the same axes over the interval $[0, 2]$. How well do they compare?
(d) If your system has built-in Bessel functions, plot $J_0(x)$ and $J(0, x, m)$ on the same axes over the interval $[0, 10]$ for various values of $m$. What is the smallest value of $m$ that gives an accurate approximation to the first three positive zeros of $J_0(x)$?

1.3 The Geometry of First-Order Differential Equations
The primary aim of this chapter is to study the ﬁrst-order differential equation
$$\frac{dy}{dx} = f(x, y), \tag{1.3.1}$$
where $f(x, y)$ is a given function of $x$ and $y$. In this section we focus our attention mainly
on the geometric aspects of the differential equation and its solutions. The graph of any
solution to the differential equation (1.3.1) is called a solution curve. If we recall the
geometric interpretation of the derivative dy/dx as giving the slope of the tangent line
at any point on the curve with equation y = y(x), we see that the function f (x, y) in
(1.3.1) gives the slope of the tangent line to the solution curve passing through the point
(x, y). Consequently, when we solve Equation (1.3.1), we are ﬁnding all curves whose
slope at the point (x, y) is given by the function f (x, y). According to our deﬁnition in
the previous section, the general solution to the differential equation (1.3.1) will involve
one arbitrary constant, and therefore, geometrically, the general solution gives a family
of solution curves in the xy -plane, one solution curve corresponding to each value of the
arbitrary constant.
Example 1.3.1 Find the general solution to the differential equation $dy/dx = 2x$, and sketch the corresponding solution curves.

Solution: The differential equation can be integrated directly to obtain $y(x) = x^2 + c$. Consequently the solution curves are a family of parabolas in the $xy$-plane. This is illustrated in Figure 1.3.1.

Figure 1.3.1: Some solution curves for the differential equation $dy/dx = 2x$.

Figure 1.3.2 gives a Mathematica plot of some solution curves to the differential equation
$$\frac{dy}{dx} = y - x^2.$$
This illustrates that generally the solution curves of a differential equation are quite complicated. Upon completion of the material in this section, the reader will be able to obtain Figure 1.3.2 without needing a computer algebra system.

Existence and Uniqueness of Solutions
It is useful for the further analysis of the differential equation (1.3.1) to give at this point
a brief discussion of the existence and uniqueness of solutions to the corresponding
initial-value problem
$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0. \tag{1.3.2}$$
Geometrically, we are interested in finding the particular solution curve to the differential equation that passes through the point in the $xy$-plane with coordinates $(x_0, y_0)$. The following questions arise regarding the initial-value problem:
1. Existence: Does the initial-value problem have any solutions?
2. Uniqueness: If the answer to question 1 is yes, does the initial-value problem have only one solution?

Figure 1.3.2: Some solution curves for the differential equation $dy/dx = y - x^2$. At the marked point $(x_0, y_0)$ the slope is $y'(x_0) = f(x_0, y_0) = y_0 - x_0^2$.

Certainly in the case of an applied problem we would be interested only in initial-value
problems that have precisely one solution. The following theorem establishes conditions
on f that guarantee the existence and uniqueness of a solution to the initial-value problem
(1.3.2).
Theorem 1.3.2 (Existence and Uniqueness Theorem)
Let $f(x, y)$ be a function that is continuous on the rectangle
$$R = \{(x, y) : a \le x \le b,\ c \le y \le d\}.$$
Suppose further that $\partial f/\partial y$ is continuous in $R$. Then for any interior point $(x_0, y_0)$ in the rectangle $R$, there exists an interval $I$ containing $x_0$ such that the initial-value problem (1.3.2) has a unique solution for $x$ in $I$.

Proof A complete proof of this theorem can be found, for example, in G. F. Simmons, Differential Equations (New York: McGraw-Hill, 1972). Figure 1.3.3 gives a geometric illustration of the result.

Remark From a geometric viewpoint, if $f(x, y)$ satisfies the hypotheses of the existence and uniqueness theorem in a region $R$ of the $xy$-plane, then throughout that region the solution curves of the differential equation $dy/dx = f(x, y)$ cannot intersect. For if two solution curves did intersect at $(x_0, y_0)$ in $R$, then that would imply there was more than one solution to the initial-value problem
$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0,$$
which would contradict the existence and uniqueness theorem.

Figure 1.3.3: Illustration of the existence and uniqueness theorem for first-order differential equations.

The following example illustrates how the preceding theorem can be used to establish
the existence of a unique solution to a differential equation, even though at present we
do not know how to determine the solution.
Example 1.3.3 Prove that the initial-value problem
$$\frac{dy}{dx} = 3xy^{1/3}, \quad y(0) = a$$
has a unique solution whenever $a \ne 0$.

Solution: In this case the initial point is $x_0 = 0$, $y_0 = a$, and $f(x, y) = 3xy^{1/3}$. Hence, $\partial f/\partial y = xy^{-2/3}$. Consequently, $f$ is continuous at all points in the $xy$-plane, whereas $\partial f/\partial y$ is continuous at all points not lying on the $x$-axis ($y = 0$). Provided $a \ne 0$, we can certainly draw a rectangle containing $(0, a)$ that does not intersect the $x$-axis. (See Figure 1.3.4.) In any such rectangle the hypotheses of the existence and uniqueness theorem are satisfied, and therefore the initial-value problem does indeed have a unique solution.
Figure 1.3.4: The initial-value problem in Example 1.3.3 satisfies the hypotheses of the existence and uniqueness theorem in the small rectangle, but not in the large rectangle.

Example 1.3.4 Discuss the existence and uniqueness of solutions to the initial-value problem
$$\frac{dy}{dx} = 3xy^{1/3}, \quad y(0) = 0.$$

Solution: The differential equation is the same as in the previous example, but the initial condition is imposed on the $x$-axis. Since $\partial f/\partial y = xy^{-2/3}$ is not continuous along the $x$-axis, there is no rectangle containing $(0, 0)$ in which the hypotheses of the existence and uniqueness theorem are satisfied. We can therefore draw no conclusion from the theorem itself. We leave it as an exercise to verify by direct substitution that the given initial-value problem does in fact have the following two solutions:
$$y(x) = 0 \quad \text{and} \quad y(x) = x^3.$$
Consequently in this case the initial-value problem does not have a unique solution.

Slope Fields
We now return to our discussion of the geometry of solutions to the differential equation
$$\frac{dy}{dx} = f(x, y).$$
The fact that the function f (x, y) gives the slope of the tangent line to the solution
curves of this differential equation leads to a simple and important idea for determining
the overall shape of the solution curves. We compute the value of f (x, y) at several
points and draw through each of the corresponding points in the xy -plane small line
segments having f (x, y) as their slopes. The resulting sketch is called the slope ﬁeld
for the differential equation. The key point is that each solution curve must be tangent to
the line segments that we have drawn, and therefore by studying the slope ﬁeld we can
obtain the general shape of the solution curves.

Example 1.3.5 Sketch the slope field for the differential equation $dy/dx = 2x^2$.

Solution: The slope of the solution curves to the differential equation at each point in the $xy$-plane depends on $x$ only. Consequently, the slopes of the solution curves will be the same at every point on any line parallel to the $y$-axis (on such a line, $x$ is constant). Table 1.3.1 contains the values of the slope of the solution curves at various points in the interval $[-1, 1]$.

    x        Slope = 2x^2
    0        0
    ±0.2     0.08
    ±0.4     0.32
    ±0.6     0.72
    ±0.8     1.28
    ±1.0     2

Table 1.3.1: Values of the slope for the differential equation in Example 1.3.5.
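The slope values of Table 1.3.1, and the short line segments that make up a slope field, can be generated with a few lines of code. The following is a minimal sketch; the helper names `slope`, `slope_table`, and `segment`, and the segment length, are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def slope(x, y):
    """Right-hand side f(x, y) = 2x^2 of Example 1.3.5 (it does not depend on y)."""
    return 2.0 * x * x

def slope_table(xs):
    """Pairs (x, slope) rounded to two decimals, as in Table 1.3.1."""
    return [(x, round(slope(x, 0.0), 2)) for x in xs]

def segment(x, y, length=0.1):
    """Endpoints of a short segment of the given length, centered at (x, y),
    whose slope is f(x, y). Drawing many of these gives the slope field."""
    m = slope(x, y)
    dx = 0.5 * length / math.sqrt(1.0 + m * m)
    return (x - dx, y - m * dx), (x + dx, y + m * dx)

for x, m in slope_table([0.0, 0.2, 0.4, 0.6, 0.8, 1.0]):
    print(f"x = +/-{x:.1f}   slope = {m:.2f}")

# The slope is the same at every point of a vertical line x = constant.
print(slope(0.6, -1.0) == slope(0.6, 2.0))
```

Because the slope is an even function of $x$, only nonnegative values of $x$ need to be tabulated.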
Using this information, we obtain the slope field shown in Figure 1.3.5. In this example, we can integrate the differential equation to obtain the general solution
$$y(x) = \frac{2}{3}x^3 + c.$$
Some solution curves and their relation to the slope field are also shown in Figure 1.3.5.

Figure 1.3.5: Slope field and some representative solution curves for the differential equation $dy/dx = 2x^2$.

In the preceding example, the slope field could be obtained fairly easily because the slopes of the solution curves to the differential equation were constant on lines parallel to the $y$-axis. For more complicated differential equations, further analysis is generally required if we wish to obtain an accurate plot of the slope field and the behavior of the corresponding solution curves. Below we have listed three useful procedures.

1. Isoclines: For the differential equation
$$\frac{dy}{dx} = f(x, y), \tag{1.3.3}$$
the function $f(x, y)$ determines the regions in the $xy$-plane where the slope of the solution curves is positive, as well as those where it is negative. Furthermore, each solution curve will have the same slope $k$ along the family of curves
$$f(x, y) = k.$$
These curves are called the isoclines of the differential equation, and they can be
very useful in determining slope ﬁelds. When sketching a slope ﬁeld, we often
start by drawing several isoclines and the corresponding line segments with slope
k at various points along them.
2. Equilibrium Solutions: Any solution to the differential equation (1.3.3) of the
form y(x) = y0 , where y0 is a constant, is called an equilibrium solution to the
differential equation. The corresponding solution curve is a line parallel to the $x$-axis. From Equation (1.3.3), equilibrium solutions are given by any constant values of $y$ for which $f(x, y) = 0$, and therefore can often be obtained by inspection. For example, the differential equation
$$\frac{dy}{dx} = (y - x)(y + 1)$$
has the equilibrium solution $y(x) = -1$. One reason that equilibrium solutions are
useful in sketching slope ﬁelds and determining the general behavior of the full
family of solution curves is that, from the existence and uniqueness theorem, we
know that no other solution curves can intersect the solution curve corresponding
to an equilibrium solution. Consequently, equilibrium solutions serve to divide the
xy -plane into different regions.
3. Concavity Changes: By differentiating Equation (1.3.3) (implicitly) with respect
to $x$ we can obtain an expression for $d^2y/dx^2$ in terms of $x$ and $y$. This can be
useful in determining the behavior of the concavity of the solution curves to the
differential equation (1.3.3). The remaining examples illustrate the application of
the foregoing procedures.
Example 1.3.6 Sketch the slope field for the differential equation
$$\frac{dy}{dx} = y - x. \tag{1.3.4}$$

Solution: By inspection we see that the differential equation has no equilibrium solutions. The isoclines of the differential equation are the family of straight lines $y - x = k$. Thus each solution curve of the differential equation has slope $k$ at all points along the line $y - x = k$. Table 1.3.2 contains several values for the slopes of the solution curves, and the equations of the corresponding isoclines. We note that the slope at all points along the isocline $y = x + 1$ is unity, which, from Table 1.3.2, coincides with the slope of any solution curve that meets it. This implies that the isocline must in fact coincide with a solution curve. Hence, one solution to the differential equation (1.3.4) is $y(x) = x + 1$, and, by the existence and uniqueness theorem, no other solution curve can intersect this one.
    Slope of Solution Curves     Equation of Isocline
    k = -2                       y = x - 2
    k = -1                       y = x - 1
    k = 0                        y = x
    k = 1                        y = x + 1
    k = 2                        y = x + 2

Table 1.3.2: Slope and isocline information for the differential equation in Example 1.3.6.

In order to determine the behavior of the concavity of the solution curves, we differentiate the given differential equation implicitly with respect to $x$ to obtain
$$\frac{d^2y}{dx^2} = \frac{dy}{dx} - 1 = y - x - 1,$$
where we have used (1.3.4) to substitute for $dy/dx$ in the second step. We see that the solution curves are concave up ($y'' > 0$) at all points above the line
$$y = x + 1 \tag{1.3.5}$$
and concave down ($y'' < 0$) at all points beneath this line. We also note that Equation (1.3.5) coincides with the particular solution already identified. Putting all of this information together, we obtain the slope field sketched in Figure 1.3.6.
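The three observations used in Example 1.3.6 (constant slope along each isocline, the solution $y = x + 1$, and the sign of the concavity) can be checked directly. This is a small sketch with assumed helper names; the sample points are arbitrary.

```python
def f(x, y):
    """Right-hand side of (1.3.4): dy/dx = y - x."""
    return y - x

def isocline_y(x, k):
    """The point on the isocline y - x = k with abscissa x."""
    return x + k

def concavity(x, y):
    """d^2y/dx^2 = dy/dx - 1 = y - x - 1 along a solution curve."""
    return f(x, y) - 1.0

xs = (-3.0, 0.0, 1.5)

# 1. The slope equals k everywhere on the isocline y = x + k (Table 1.3.2).
for k in (-2, -1, 0, 1, 2):
    print(k, [f(x, isocline_y(x, k)) for x in xs])

# 2. Along y = x + 1 the slope field has slope 1, the slope of the line itself,
#    so this isocline is also a solution curve.
print([f(x, x + 1.0) for x in xs])

# 3. Concave up above y = x + 1, concave down below it.
print(concavity(0.0, 2.0) > 0, concavity(0.0, 0.0) < 0)
```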
Figure 1.3.6: Hand-drawn slope field, isoclines, and some approximate solution curves for the differential equation in Example 1.3.6.

Generating Slope Fields Using Technology
Many computer algebra systems (CAS) and graphing calculators have built-in programs
to generate slope ﬁelds. As an example, in the CAS Maple the command
    diffeq := diff(y(x), x) = y(x) - x;
assigns the name diffeq to the differential equation considered in the previous example. After the DEtools package, which contains DEplot, has been loaded with
    with(DEtools):
the further command
    DEplot(diffeq, y(x), x = -3..3, y = -3..3, arrows = line);
then produces a sketch of the slope field for the differential equation on the square $-3 \le x \le 3$, $-3 \le y \le 3$. Initial conditions such as $y(0) = 0$, $y(0) = 1$, $y(0) = 2$, $y(0) = -1$ can be specified using the command
    IC := {[0, 0], [0, 1], [0, 2], [0, -1]};
Then the command
    DEplot(diffeq, y(x), x = -3..3, IC, y = -3..3, arrows = line);
not only plots the slope ﬁeld, but also gives a numerical approximation to each of the
solution curves satisfying the speciﬁed initial conditions. Some of the methods that can
be used to generate such numerical approximations will be discussed in Section 1.10.
The preceding sequence of Maple commands was used to generate the Maple plot given
in Figure 1.3.7. Clearly the generation of slope ﬁelds and approximate solution curves
is one area where technology can be extremely helpful.
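For readers without Maple, the numerical part of this computation can be sketched in plain Python. The code below approximates the solution curves of $dy/dx = y - x$ through the same four initial points using the classical fourth-order Runge-Kutta method. That method is an assumption made here for illustration; the text does not say which scheme DEplot uses. For comparison the sketch uses $y(x) = x + 1 + (y_0 - 1)e^x$, which one can check by substitution satisfies the equation with $y(0) = y_0$; the function names and step count are also assumptions.

```python
import math

def f(x, y):
    """Right-hand side of dy/dx = y - x (Example 1.3.6)."""
    return y - x

def rk4(f, x0, y0, x_end, steps=600):
    """Approximate y(x_end) for y' = f(x, y), y(x0) = y0, using classical RK4.
    A negative step is used automatically when x_end < x0."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * h, y + 0.5 * h * k1)
        k3 = f(x + 0.5 * h, y + 0.5 * h * k2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y

def exact(x, y0):
    """Closed-form solution with y(0) = y0, used only as a check."""
    return x + 1.0 + (y0 - 1.0) * math.exp(x)

for y0 in (0.0, 1.0, 2.0, -1.0):
    for x_end in (-3.0, 3.0):
        print(y0, x_end, rk4(f, 0.0, y0, x_end), exact(x_end, y0))
```

The initial condition $y(0) = 1$ reproduces the straight-line solution $y = x + 1$ found in Example 1.3.6, while the other three curves move away from it exponentially as $x$ increases.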
Figure 1.3.7: Maple plot of the slope field and some approximate solution curves for the differential equation in Example 1.3.6.

Example 1.3.7 Sketch the slope field and some approximate solution curves for the differential equation
$$\frac{dy}{dx} = y(2 - y). \tag{1.3.6}$$

Solution: We first note that the given differential equation has the two equilibrium solutions
$$y(x) = 0 \quad \text{and} \quad y(x) = 2.$$
Consequently, from Theorem 1.3.2, the $xy$-plane can be divided into the three distinct
regions y < 0, 0 < y < 2, and y > 2. From Equation (1.3.6) the behavior of the sign
of the slope of the solution curves in each of these regions is given in the following
schematic.
    sign of slope:   − − − −   |   + + + +   |   − − − −
    y-interval:                0             2

The isoclines are determined from $y(2 - y) = k$. That is,
$$y^2 - 2y + k = 0,$$
so that the solution curves have slope $k$ at all points of intersection with the horizontal lines
$$y = 1 \pm \sqrt{1 - k}. \tag{1.3.7}$$
Table 1.3.3 contains some of the isocline equations. Note from Equation (1.3.7) that the largest possible positive slope is $k = 1$. We see that the slopes of the solution curves quickly become very large and negative for $y$ outside the interval $[0, 2]$. Finally, differentiating Equation (1.3.6) implicitly with respect to $x$ yields
$$\frac{d^2y}{dx^2} = 2\frac{dy}{dx} - 2y\frac{dy}{dx} = 2(1 - y)\frac{dy}{dx} = 2y(1 - y)(2 - y).$$

    Slope of Solution Curves     Equation of Isocline
    k = 1                        y = 1
    k = 0                        y = 2 and y = 0
    k = -1                       y = 1 ± √2
    k = -2                       y = 1 ± √3
    k = -3                       y = 3 and y = -1
    k = -n, n ≥ 1                y = 1 ± √(n + 1)

Table 1.3.3: Slope and isocline information for the differential equation in Example 1.3.7.

The sign of $d^2y/dx^2$ is given in the following schematic.

    sign of y'':     − − − −   |   + + + +   |   − − − −   |   + + + +
    y-interval:                0             1             2

Figure 1.3.8: Hand-drawn slope field, isoclines, and some solution curves for the differential equation $dy/dx = y(2 - y)$.

Using this information leads to the slope field sketched in Figure 1.3.8. We have also
included some approximate solution curves. We see from the slope ﬁeld that for any
initial condition y(x0 ) = y0 , with 0 ≤ y0 ≤ 2, the corresponding unique solution to
the differential equation will be bounded. In contrast, if y0 > 2, the slope ﬁeld suggests
that all corresponding solutions approach y = 2 as x → ∞, whereas if y0 < 0, then all
corresponding solutions approach y = 0 as x → −∞. Furthermore, the behavior of the
slope ﬁeld also suggests that the solution curves that do not lie in the region 0 < y < 2
may diverge at ﬁnite values of x . We leave it as an exercise to verify (by substitution into
Equation (1.3.6)) that for all values of the constant $c$,
$$y(x) = \frac{2ce^{2x}}{ce^{2x} - 1}$$
is a solution to the given differential equation. We see that any initial condition that yields a positive value for $c$ will indeed lead to a solution that has a vertical asymptote at $x = \frac{1}{2}\ln(1/c)$.
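The closed-form solution of Example 1.3.7 and the location of its vertical asymptote can be explored numerically. In the sketch below, `c_from_initial` solves $y_0 = 2u/(u - 1)$ for $u = ce^{2x_0}$, which gives $u = y_0/(y_0 - 2)$. This helper, the other function names, and the finite-difference step are assumptions introduced for illustration.

```python
import math

def y_formula(x, c):
    """y(x) = 2 c e^{2x} / (c e^{2x} - 1), the family given in Example 1.3.7."""
    u = c * math.exp(2.0 * x)
    return 2.0 * u / (u - 1.0)

def c_from_initial(x0, y0):
    """Constant c for which y(x0) = y0; requires y0 != 2."""
    return (y0 / (y0 - 2.0)) * math.exp(-2.0 * x0)

def asymptote(c):
    """Vertical asymptote x = (1/2) ln(1/c); only defined for c > 0."""
    return 0.5 * math.log(1.0 / c)

def residual(x, c, h=1e-5):
    """|y'(x) - y(2 - y)| with y' estimated by a central difference."""
    dy = (y_formula(x + h, c) - y_formula(x - h, c)) / (2.0 * h)
    y = y_formula(x, c)
    return abs(dy - y * (2.0 - y))

for y0 in (3.0, -1.0, 1.0):
    c = c_from_initial(0.0, y0)
    where = asymptote(c) if c > 0 else None  # None: no vertical asymptote
    print(y0, c, where, residual(1.0, c))
```

With $y(0) = 3$ the asymptote lies to the left of the initial point, so the solution exists for all larger $x$ and tends to $2$. With $y(0) = -1$ the asymptote lies to the right, so the solution diverges at a finite value of $x$. With $0 < y_0 < 2$ the constant $c$ is negative and there is no asymptote.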
The tools that we have introduced in this section enable us to analyze the solution behavior of many first-order differential equations. However, for complicated functions $f(x, y)$ in Equation (1.3.3), performing these computations by hand can be a tedious task. Fortunately, as we have illustrated, there are many computer programs available for drawing slope fields and generating solution curves (numerically). Furthermore, several graphing calculators also have these capabilities.

Exercises for 1.3

Key Terms
Solution curve, Existence and uniqueness theorem, Slope
field, Isocline, Equilibrium solution.

Skills
• Be able to find isoclines for a differential equation $dy/dx = f(x, y)$.
• Be able to determine equilibrium solutions for a differential equation $dy/dx = f(x, y)$.
• Be able to sketch the slope field for a differential equation, using isoclines, equilibrium solutions, and concavity changes.
• Be able to sketch solution curves to a differential equation.
• Be able to apply the existence and uniqueness theorem to find unique solutions to initial-value problems.

True-False Review
For Questions 1–7, decide if the given statement is true or
false, and give a brief justiﬁcation for your answer. If true,
you can quote a relevant deﬁnition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. If $f(x, y)$ satisfies the hypotheses of the existence and uniqueness theorem in a region $R$ of the $xy$-plane, then the solution curves to a differential equation $dy/dx = f(x, y)$ cannot intersect in $R$.
2. Every differential equation $dy/dx = f(x, y)$ has at least one equilibrium solution.
3. The differential equation $dy/dx = x(y^2 - 4)$ has no equilibrium solutions.
4. The circle $x^2 + y^2 = 4$ is an isocline for the differential equation $dy/dx = x^2 + y^2$.
5. The equilibrium solutions of a differential equation are always parallel to one another.
6. The isoclines for the differential equation
$$\frac{dy}{dx} = \frac{x^2 + y^2}{2y}$$
are the family of circles $x^2 + (y - k)^2 = k^2$.
7. No solution to the differential equation $dy/dx = f(x, y)$ can intersect with equilibrium solutions of the differential equation.

Problems
For Problems 1–7, determine the differential equation giving the slope of the tangent line at the point $(x, y)$ for the given family of curves.
1. $y = c/x$.
2. $y = cx^2$.
3. $x^2 + y^2 = 2cx$.
4. $y^2 = cx$.
5. $2cy = x^2 - c^2$.
6. $y^2 - x^2 = c$.
7. $(x - c)^2 + (y - c)^2 = 2c^2$.

For Problems 8–11, verify that the given function (or relation) defines a solution to the given differential equation and sketch some of the solution curves. If an initial condition is given, label the solution curve corresponding to the resulting unique solution. (In these problems, $c$ denotes an arbitrary constant.)
8. $x^2 + y^2 = c$, $y' = -x/y$.
9. $y = cx^3$, $y' = 3y/x$, $y(2) = 8$.
10. $y^2 = cx$, $2x\,dy - y\,dx = 0$, $y(1) = 2$.
11. $(x - c)^2 + y^2 = c^2$, $y' = \dfrac{y^2 - x^2}{2xy}$, $y(2) = 2$.

12. Prove that the initial-value problem
$$y' = x \sin(x + y), \quad y(0) = 1$$
has a unique solution.
13. Use the existence and uniqueness theorem to prove that $y(x) = 3$ is the only solution to the initial-value problem
$$y' = \frac{x}{x^2 + 1}(y^2 - 9), \quad y(0) = 3.$$
14. Do you think that the initial-value problem
$$y' = xy^{1/2}, \quad y(0) = 0$$
has a unique solution? Justify your answer.
15. Even simple-looking differential equations can have complicated solution curves. In this problem, we study the solution curves of the differential equation
$$y' = -2xy^2. \tag{1.3.8}$$
(a) Verify that the hypotheses of the existence and uniqueness theorem (Theorem 1.3.2) are satisfied for the initial-value problem
$$y' = -2xy^2, \quad y(x_0) = y_0$$
for every $(x_0, y_0)$. This establishes that the initial-value problem always has a unique solution on some interval containing $x_0$.
(b) Verify that for all values of the constant $c$, $y(x) = 1/(x^2 + c)$ is a solution to (1.3.8).
(c) Use the solution to (1.3.8) given in (b) to solve the following initial-value problems. For each case, sketch the corresponding solution curve, and state the maximum interval on which your solution is valid.
(i) y = −2xy 2 ,
(ii) y = −2xy 2 ,
(iii) y = −2xy 2 , The Geometry of First-Order DIfferential Equations 24. y = x 2 + y 2 .
25. According to Newton’s law of cooling (see Section 1.1), the temperature of an object at time t is
governed by the differential equation y(0) = 1.
y(1) = 1.
y(0) = −1. dT
= −k(T − Tm ),
dt
where Tm is the temperature of the surrounding
medium, and k is a constant. Consider the case when
Tm = 70 and k = 1/80. Sketch the corresponding
slope ﬁeld and some representative solution curves.
What happens to the temperature of the object as
t → ∞? Note that this result is independent of the
initial temperature of the object. (d) What is the unique solution to the following
initial-value problem?
y = −2xy 2 , y(0) = 0. 16. Consider the initial-value problem:
y = y(y − 1), y(x0 ) = y0 . (a) Verify that the hypotheses of the existence and
uniqueness theorem are satisﬁed for this initialvalue problem for any x0 , y0 . This establishes that
the initial-value problem always has a unique solution on some interval containing x0 .
(b) By inspection, determine all equilibrium solutions to the differential equation.
(c) Determine the regions in the xy -plane where the
solution curves are concave up, and determine
those regions where they are concave down.
(d) Sketch the slope ﬁeld for the differential equation, and determine all values of y0 for which
the initial-value problem has bounded solutions.
On your slope ﬁeld, sketch representative solution
curves in the three cases y0 < 0, 0 < y0 < 1, and
y0 > 1.
For Problems 17–24, sketch the slope ﬁeld and some representative solution curves for the given differential equation.
17. y = 4x .
18. y = 1/x .
19. y = x + y .
20. y = x/y .
21. y = −4x/y .
22. y = x 2 y .
23. y = x 2 cos y . 31 For Problems 26–31, determine the slope ﬁeld and some representative solution curves for the given differential equation.
26. y = −2xy . 27. y= 28. y = 3x − y . 29. y = 2x 2 sin y . 30. y= 2 + y2
.
3 + 0.5x 2 31. y= 1 − y2
.
2 + 0.5x 2 x sin x
.
1 + y2 32.
(a) Determine the slope ﬁeld for the differential
equation
y = x −1 (3 sin x − y)
on the interval (0, 10].
(b) Plot the solution curves corresponding to each of
the following initial conditions:
y(0.5) = 0,
y(1) = 2, y(1) = −1,
y(3) = 0. What do you conclude about the behavior as
x → 0+ of solutions to the differential equation? i i i i i i i “main”
2007/2/16
page 32
i 32 CHAPTER 1 First-Order Differential Equations (c) Plot the solution curve corresponding to the initial condition y(π/2) = 6/π . How does this ﬁt
in with your answer to part (b)?
(d) Describe the behavior of the solution curves for
large positive x . bers of the given family of curves. Describe the
family of orthogonal trajectories.
34. Consider the differential equation Consider the family of curves y = kx 2 , where k is
a constant. di
+ ai = b,
dt (a) Show that the differential equation of the family
of orthogonal trajectories is 33. where a and b are constants. By drawing the slope
ﬁelds corresponding to various values of a and b, formulate a conjecture regarding the value of dy
x
=− .
dx
2y lim i(t). t →∞ (b) On the same axes sketch the slope ﬁeld for the
preceding differential equation and several mem- 1.4 Separable Differential Equations
In the previous section we analyzed ﬁrst-order differential equations using qualitative
techniques. We now begin an analytical study of these differential equations by developing some solution techniques that enable us to determine the exact solution to certain
types of differential equations. The simplest differential equations for which a solution
technique can be obtained are the so-called separable equations, which are deﬁned as
follows:

DEFINITION 1.4.1
A first-order differential equation is called separable if it can be written in the form
$$p(y)\,\frac{dy}{dx} = q(x). \tag{1.4.1}$$

The solution technique for a separable differential equation is given in Theorem 1.4.2.

Theorem 1.4.2 If p(y) and q(x) are continuous, then Equation (1.4.1) has the general solution
$$\int p(y)\,dy = \int q(x)\,dx + c, \tag{1.4.2}$$
where c is an arbitrary constant.

Proof We use the chain rule for derivatives to rewrite Equation (1.4.1) in the equivalent form
$$\frac{d}{dx}\left(\int p(y)\,dy\right) = q(x).$$
Integrating both sides of this equation with respect to x yields Equation (1.4.2).

Remark In differential form, Equation (1.4.1) can be written as
$$p(y)\,dy = q(x)\,dx,$$
and the general solution (1.4.2) is obtained by integrating the left-hand side with respect to y and the right-hand side with respect to x. This is the general procedure for solving separable equations.

Example 1.4.3 Solve
$$(1 + y^2)\,\frac{dy}{dx} = x \cos x.$$

Solution: By inspection we see that the differential equation is separable. Integrating both sides of the differential equation yields
$$\int (1 + y^2)\,dy = \int x \cos x\,dx + c.$$
Using integration by parts to evaluate the integral on the right-hand side, we obtain
$$y + \tfrac{1}{3} y^3 = x \sin x + \cos x + c,$$
or equivalently
$$y^3 + 3y = 3(x \sin x + \cos x) + c_1,$$
where $c_1 = 3c$. As often happens with separable differential equations, the solution is given in implicit form.
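
An implicit solution such as the one in Example 1.4.3 can be checked numerically. The following is a minimal sketch, not part of the text's method: it integrates dy/dx = x cos x / (1 + y^2) with a classical fourth-order Runge-Kutta scheme and confirms that the quantity y + y^3/3 - (x sin x + cos x) stays constant along the computed curve. The starting point (0, 1), the interval [0, 5], the step count, and all function names are assumptions chosen for illustration.

```python
import math

def rhs(x, y):
    """dy/dx = x cos x / (1 + y^2): Example 1.4.3 solved for dy/dx."""
    return x * math.cos(x) / (1.0 + y * y)

def implicit_invariant(x, y):
    """F(x, y) = y + y^3/3 - (x sin x + cos x); equals the constant c on a solution."""
    return y + y**3 / 3.0 - (x * math.sin(x) + math.cos(x))

def rk4_path(x0, y0, x_end, steps):
    """Integrate with classical RK4 and return the list of (x, y) points."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
        path.append((x, y))
    return path

# Assumed initial condition y(0) = 1, which fixes c = F(0, 1).
path = rk4_path(0.0, 1.0, 5.0, 2000)
c = implicit_invariant(*path[0])
max_drift = max(abs(implicit_invariant(x, y) - c) for x, y in path)
print("c =", c, " largest deviation of F from c:", max_drift)
```

A deviation at the level of rounding and truncation error indicates that the numerically computed curve lies on the level set described by the implicit solution.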
In general, the differential equation dy/dx = f(x)g(y) is separable, since it can be written as
$$\frac{1}{g(y)}\,\frac{dy}{dx} = f(x),$$
which is of the form of Equation (1.4.1) with p(y) = 1/g(y). It is important to note, however, that in writing the given differential equation in this way, we have assumed that g(y) ≠ 0. Thus the general solution to the resulting differential equation may not
include solutions of the original equation corresponding to any values of y for which
g(y) = 0. (These are the equilibrium solutions for the original differential equation.)
We will illustrate with an example.
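
The caveat about dividing by g(y) can also be seen numerically. The sketch below uses dy/dx = -2xy^2 (the equation of Example 1.4.4, which follows), where f(x) = -2x and g(y) = y^2. It checks by central differences that members of the family y = 1/(x^2 - c) satisfy the equation, that the equilibrium solution y = 0 satisfies it too, and that no member of the family is ever zero. The sample point x = 0.7, the sample values of c, and the helper names are assumptions made for illustration.

```python
def rhs(x, y):
    """Right-hand side of dy/dx = -2 x y^2."""
    return -2.0 * x * y * y

def family(x, c):
    """One-parameter family y = 1/(x^2 - c) obtained by separating variables."""
    return 1.0 / (x * x - c)

def residual(y_func, x, h=1e-6):
    """Central-difference estimate of y'(x) - rhs(x, y(x)); near zero for a solution."""
    slope = (y_func(x + h) - y_func(x - h)) / (2.0 * h)
    return slope - rhs(x, y_func(x))

# Members of the family satisfy the equation (c < 0 keeps x^2 - c away from zero).
family_residuals = [
    abs(residual(lambda x, c=c: family(x, c), 0.7)) for c in (-3.0, -1.0, -0.25)
]

# The equilibrium solution y(x) = 0 satisfies the equation as well ...
zero_residual = abs(residual(lambda x: 0.0, 0.7))

# ... but 1/(x^2 - c) is never zero, so no choice of c reproduces it.
family_hits_zero = any(family(0.7, c) == 0.0 for c in (-3.0, -1.0, -0.25, -1e6))

print(family_residuals, zero_residual, family_hits_zero)
```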
Example 1.4.4 Find all solutions to
$$y' = -2y^2 x. \tag{1.4.3}$$

Solution: Separating the variables yields
$$y^{-2}\,dy = -2x\,dx. \tag{1.4.4}$$
Integrating both sides, we obtain
$$-y^{-1} = -x^2 + c,$$
so that
$$y(x) = \frac{1}{x^2 - c}. \tag{1.4.5}$$
This is the general solution to Equation (1.4.4). It is not the general solution to Equation (1.4.3), since there is no value of the constant c for which y(x) = 0, whereas by inspection we see y(x) = 0 is a solution to Equation (1.4.3). This solution is not contained in (1.4.5), since in separating the variables, we divided by $y^2$ and hence assumed implicitly that y ≠ 0. Thus the solutions to Equation (1.4.3) are
$$y(x) = \frac{1}{x^2 - c} \quad \text{and} \quad y(x) = 0.$$
The slope field for the given differential equation is depicted in Figure 1.4.1, together with some representative solution curves.

[Figure 1.4.1: The slope field and some solution curves for the differential equation dy/dx = −2xy^2.]

Many difficulties that students encounter with first-order differential equations arise
not from the solution techniques themselves, but in the algebraic simpliﬁcations that are
used to obtain a simple form for the resulting solution. We will explicitly illustrate some
of the standard simpliﬁcations using the differential equation
$$\frac{dy}{dx} = -2xy.$$
First notice that y(x) = 0 is an equilibrium solution to the differential equation. Consequently, no other solution curves can cross the x-axis. For y ≠ 0 we can separate the variables to obtain
$$\frac{1}{y}\,dy = -2x\,dx. \tag{1.4.6}$$
Integrating this equation yields
$$\ln|y| = -x^2 + c.$$
Exponentiating both sides of this solution gives
$$|y| = e^{-x^2 + c},$$
or equivalently,
$$|y| = e^{c} e^{-x^2}.$$
We now introduce a new constant $c_1$ defined by $c_1 = e^{c}$. Then the preceding expression for |y| reduces to
$$|y| = c_1 e^{-x^2}. \tag{1.4.7}$$
Notice that $c_1$ is a positive constant. This is a perfectly acceptable form for the solution. However, a redefinition of the integration constant can be used to eliminate the absolute-value bars as follows. According to (1.4.7), the solution to the differential equation is
$$y(x) = \begin{cases} c_1 e^{-x^2}, & \text{if } y > 0, \\ -c_1 e^{-x^2}, & \text{if } y < 0. \end{cases} \tag{1.4.8}$$
We can now define a new constant $c_2$ by
$$c_2 = \begin{cases} c_1, & \text{if } y > 0, \\ -c_1, & \text{if } y < 0, \end{cases}$$
in terms of which the solutions given in (1.4.8) can be combined into the single formula
$$y(x) = c_2 e^{-x^2}. \tag{1.4.9}$$
The appropriate sign for $c_2$ will be determined from the initial conditions. For example, the initial condition y(0) = 1 would require that $c_2 = 1$, with corresponding unique solution
$$y(x) = e^{-x^2}.$$
Similarly the initial condition y(0) = −1 leads to $c_2 = -1$, so that
$$y(x) = -e^{-x^2}.$$
We make one further point about the solution (1.4.9). In obtaining the separable form (1.4.6), we divided the given differential equation by y, and so the derivation of the solution obtained assumes that y ≠ 0. However, as we have already noted, y(x) = 0 is indeed a solution to this differential equation. Formally this solution is the special case $c_2 = 0$ in (1.4.9) and corresponds to the initial condition y(0) = 0. Thus (1.4.9) does give the general solution to the differential equation, provided we allow $c_2$ to assume the value zero. The slope field for the differential equation, together with some particular
solution curves, is shown in Figure 1.4.2.

[Figure 1.4.2: Slope field and some solution curves for the differential equation dy/dx = −2xy.]

Example 1.4.5 An object of mass m falls from rest, starting at a point near the earth's surface. Assuming that the air resistance is proportional to the velocity of the object, determine the
subsequent motion. Solution: Let y(t) be the distance traveled by the object at time t from the point it
was released, and let the positive y -direction be downward. Then, y(0) = 0, and the
velocity of the object is v(t) = dy/dt . Since the object was dropped from rest, we have
v(0) = 0. The forces acting on the object are those due to gravity, Fg = mg , and the
force due to air resistance, Fr = −kv , where k is a positive constant (see Figure 1.4.3).
According to Newton’s second law, the differential equation describing the motion of
the object is
$$m\,\frac{dv}{dt} = F_g + F_r = mg - kv.$$

[Figure 1.4.3: Particle falling under the influence of gravity and air resistance.]

We are also given the initial condition v(0) = 0. Thus the initial-value problem governing the behavior of v is
$$m\,\frac{dv}{dt} = mg - kv, \quad v(0) = 0. \tag{1.4.10}$$
Separating the variables in Equation (1.4.10) yields
$$\frac{m}{mg - kv}\,dv = dt,$$
which can be integrated directly to obtain
$$-\frac{m}{k} \ln|mg - kv| = t + c.$$
Multiplying both sides of this equation by −k/m and exponentiating the result yields
$$|mg - kv| = c_1 e^{-(k/m)t},$$
where $c_1 = e^{-ck/m}$. By redefining the constant $c_1$, we can write this in the equivalent form
$$mg - kv = c_2 e^{-(k/m)t}.$$
Hence,
$$v(t) = \frac{mg}{k} - c_3 e^{-(k/m)t}, \tag{1.4.11}$$
where $c_3 = c_2/k$. Imposing the initial condition v(0) = 0 yields
$$c_3 = \frac{mg}{k}.$$
So the solution to the initial-value problem (1.4.10) is
$$v(t) = \frac{mg}{k}\left(1 - e^{-(k/m)t}\right). \tag{1.4.12}$$
Notice that the velocity does not increase indefinitely, but approaches a so-called limiting velocity $v_L$ defined by
$$v_L = \lim_{t \to \infty} v(t) = \lim_{t \to \infty} \frac{mg}{k}\left(1 - e^{-(k/m)t}\right) = \frac{mg}{k}.$$
The behavior of the velocity as a function of time is shown in Figure 1.4.4. Owing to the negative exponent in (1.4.11), we see that this result is independent of the value of the initial velocity.
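
The closed form (1.4.11) and the claim about the limiting velocity can be checked with a short computation. The sketch below is illustrative only: it compares the formula against a Runge-Kutta integration of m dv/dt = mg − kv and evaluates the formula at a late time for several initial velocities. The sample values m = 2, g = 9.8, k = 0.5 and the extra parameter `v0` (which gives $c_3 = mg/k - v_0$ in (1.4.11), reducing to (1.4.12) when $v_0 = 0$) are assumptions, not data from the example.

```python
import math

def limiting_velocity(m, g, k):
    """v_L = mg/k."""
    return m * g / k

def velocity_exact(t, m, g, k, v0=0.0):
    """(1.4.11) with c3 = mg/k - v0, so that v(0) = v0; v0 = 0 gives (1.4.12)."""
    v_lim = limiting_velocity(m, g, k)
    return v_lim - (v_lim - v0) * math.exp(-(k / m) * t)

def velocity_rk4(t_end, m, g, k, v0=0.0, steps=2000):
    """Integrate dv/dt = g - (k/m) v from t = 0 to t_end with classical RK4."""
    def accel(v):
        return g - (k / m) * v
    h = t_end / steps
    v = v0
    for _ in range(steps):
        k1 = accel(v)
        k2 = accel(v + 0.5 * h * k1)
        k3 = accel(v + 0.5 * h * k2)
        k4 = accel(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v

M, G, K = 2.0, 9.8, 0.5   # assumed sample values (SI units)

# Formula versus direct numerical integration at t = 10.
gap = abs(velocity_rk4(10.0, M, G, K) - velocity_exact(10.0, M, G, K))

# Different initial velocities all settle at the same limiting velocity.
late = [velocity_exact(200.0, M, G, K, v0) for v0 in (0.0, 20.0, 60.0)]

print("gap:", gap, " late-time velocities:", late, " mg/k:", limiting_velocity(M, G, K))
```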
[Figure 1.4.4: The behavior of the velocity of the object in Example 1.4.5.]

Since dy/dt = v, it follows from (1.4.12) that the position of the object at time t can be determined by solving the initial-value problem
$$\frac{dy}{dt} = \frac{mg}{k}\left(1 - e^{-(k/m)t}\right), \quad y(0) = 0.$$
The differential equation can be integrated directly to obtain
$$y(t) = \frac{mg}{k}\left(t + \frac{m}{k}\,e^{-(k/m)t}\right) + c.$$
Imposing the initial condition y(0) = 0 yields
$$c = -\frac{m^2 g}{k^2},$$
so that
$$y(t) = \frac{mg}{k}\left[t + \frac{m}{k}\left(e^{-(k/m)t} - 1\right)\right].$$
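
As a consistency check on the position formula of Example 1.4.5, one can confirm that it vanishes at t = 0 and that its derivative reproduces the velocity (1.4.12). The sketch below does this with a central difference; the sample values of m, g, and k, the step size, and the function names are assumptions for illustration.

```python
import math

def velocity(t, m, g, k):
    """(1.4.12): v(t) = (mg/k)(1 - e^{-(k/m)t})."""
    return (m * g / k) * (1.0 - math.exp(-(k / m) * t))

def position(t, m, g, k):
    """y(t) = (mg/k)[t + (m/k)(e^{-(k/m)t} - 1)], which satisfies y(0) = 0."""
    return (m * g / k) * (t + (m / k) * (math.exp(-(k / m) * t) - 1.0))

M, G, K = 2.0, 9.8, 0.5   # assumed sample values (SI units)
h = 1e-5                  # step for the central difference

slope_gaps = []
for t in (0.5, 2.0, 10.0):
    slope = (position(t + h, M, G, K) - position(t - h, M, G, K)) / (2.0 * h)
    slope_gaps.append(abs(slope - velocity(t, M, G, K)))

print("y(0) =", position(0.0, M, G, K), " |dy/dt - v| at sample times:", slope_gaps)
```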
Example 1.4.6 A hot metal bar whose temperature is 350°F is placed in a room whose temperature is constant at 70°F. After two minutes, the temperature of the bar is 210°F. Using Newton's law of cooling, determine
1. the temperature of the bar after four minutes.
2. the time required for the bar to cool to 100°F.

Solution: According to Newton's law of cooling (see Section 1.1), the temperature of the object at time t is governed by the differential equation
$$\frac{dT}{dt} = -k(T - T_m), \tag{1.4.13}$$
where, from the statement of the problem,
$$T_m = 70^\circ\mathrm{F}, \quad T(0) = 350^\circ\mathrm{F}, \quad T(2) = 210^\circ\mathrm{F}.$$
Substituting for $T_m$ in Equation (1.4.13), we have the separable equation
$$\frac{dT}{dt} = -k(T - 70).$$
Separating the variables yields
$$\frac{1}{T - 70}\,dT = -k\,dt,$$
which we can integrate immediately to obtain
$$\ln|T - 70| = -kt + c.$$
Exponentiating both sides and solving for T yields
$$T(t) = 70 + c_1 e^{-kt}, \tag{1.4.14}$$
where we have redefined the integration constant. The two constants $c_1$ and k can be determined from the given auxiliary conditions as follows. The condition T(0) = 350°F requires that $350 = 70 + c_1$. Hence, $c_1 = 280$. Substituting this value for $c_1$ into (1.4.14) yields
$$T(t) = 70\left(1 + 4e^{-kt}\right). \tag{1.4.15}$$
Consequently, T(2) = 210°F if and only if
$$210 = 70\left(1 + 4e^{-2k}\right),$$
so that $e^{-2k} = \tfrac{1}{2}$. Hence, $k = \tfrac{1}{2} \ln 2$, and so, from (1.4.15),
$$T(t) = 70\left[1 + 4e^{-(t/2)\ln 2}\right]. \tag{1.4.16}$$
We can now determine the quantities requested.
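
The determination of $c_1$ and k from the two temperature readings is easy to reproduce numerically. The following sketch takes the data of Example 1.4.6, recovers the constants in (1.4.14), and evaluates the two requested quantities; the function and variable names are illustrative assumptions.

```python
import math

T_ROOM = 70.0     # T_m, in degrees Fahrenheit
T_START = 350.0   # T(0)
T_AT_2 = 210.0    # T(2), with t in minutes

# T(t) = T_m + c1 e^{-kt}: c1 follows from T(0), k from T(2).
c1 = T_START - T_ROOM
k = -math.log((T_AT_2 - T_ROOM) / c1) / 2.0

def temperature(t):
    """Temperature of the bar at time t, from (1.4.14) with the fitted constants."""
    return T_ROOM + c1 * math.exp(-k * t)

def time_to_reach(target):
    """Solve T(t) = target for t; valid for T_m < target <= T(0)."""
    return -math.log((target - T_ROOM) / c1) / k

print("c1 =", c1, " k =", k)
print("T(4) =", temperature(4.0))
print("time to reach 100 F =", time_to_reach(100.0))
```

The computed k agrees with (1/2) ln 2, and the two printed quantities correspond to parts 1 and 2 of the example.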
1. We have
$$T(4) = 70\left(1 + 4e^{-2\ln 2}\right) = 70\left(1 + 4 \cdot \frac{1}{2^2}\right) = 140^\circ\mathrm{F}.$$
2. From (1.4.16), T(t) = 100°F when
$$100 = 70\left[1 + 4e^{-(t/2)\ln 2}\right],$$
that is, when
$$e^{-(t/2)\ln 2} = \frac{3}{28}.$$
Taking the natural logarithm of both sides and solving for t yields
$$t = \frac{2 \ln(28/3)}{\ln 2} \approx 6.4 \text{ minutes}.$$

Exercises for 1.4

Skills
• Be able to recognize whether or not a given differential equation is separable.
• Be able to solve separable differential equations.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. Every differential equation of the form dy/dx = f(x)g(y) is separable.
2. The general solution to a separable differential equation contains one constant whose value can be determined from an initial condition for the differential equation.
3. Newton's law of cooling is a separable differential equation.
4. The differential equation $dy/dx = x^2 + y^2$ is separable.
5. The differential equation $dy/dx = x \sin(xy)$ is separable.
6. The differential equation $dy/dx = e^{x+y}$ is separable.
7. The differential equation
$$\frac{dy}{dx} = \frac{1}{x^2(1 + y^2)}$$
is separable.
8. The differential equation
$$\frac{dy}{dx} = \frac{x + 4y}{4x + y}$$
is separable.
9. The differential equation
$$\frac{dy}{dx} = \frac{x^3 y + x^2 y^2}{x^2 + xy}$$
is separable.

Problems
For Problems 1–11, solve the given differential equation.
1. $\dfrac{dy}{dx} = 2xy$.
2. $\dfrac{dy}{dx} = \dfrac{y^2}{x^2 + 1}$.
3. $e^{x+y}\,dy - dx = 0$.
4. $\dfrac{dy}{dx} = \dfrac{y}{x \ln x}$.
5. $y\,dx - (x - 2)\,dy = 0$.
6. $\dfrac{dy}{dx} = \dfrac{2x(y - 1)}{x^2 + 3}$.
7. $y - x\,\dfrac{dy}{dx} = 3 - 2x^2\,\dfrac{dy}{dx}$.
8. $\dfrac{dy}{dx} = \dfrac{\cos(x - y)}{\sin x \sin y} - 1$.
9. $\dfrac{dy}{dx} = \dfrac{x(y^2 - 1)}{2(x - 2)(x - 1)}$.
10. $\dfrac{dy}{dx} = \dfrac{x^2 y - 32}{16 - x^2} + 2$.
11. $(x - a)(x - b)y' - (y - c) = 0$, where a, b, c are constants.

In Problems 12–15, solve the given initial-value problem.
12. $(x^2 + 1)y' + y^2 = -1$, $y(0) = 1$.
13. $(1 - x^2)y' + xy = ax$, $y(0) = 2a$, where a is a constant.
14. $\dfrac{dy}{dx} = 1 - \dfrac{\sin(x + y)}{\sin y \cos x}$, $y(\pi/4) = \pi/4$.
15. $y' = y^3 \sin x$, $y(0) = 0$.

16. One solution to the initial-value problem
$$\frac{dy}{dx} = \frac{2}{3}(y - 1)^{1/2}, \quad y(1) = 1$$
is y(x) = 1. Determine another solution. Does this contradict the existence and uniqueness theorem (Theorem 1.3.2)? Explain.
17. An object of mass m falls from rest, starting at a point near the earth's surface. Assuming that the air resistance varies as the square of the velocity of the object, a simple application of Newton's second law yields the initial-value problem for the velocity, v(t), of the object at time t:
$$m\,\frac{dv}{dt} = mg - kv^2, \quad v(0) = 0,$$
where k, m, g are positive constants.
(a) Solve the foregoing initial-value problem for v in terms of t.
(b) Does the velocity of the object increase indefinitely? Justify.
(c) Determine the position of the object at time t.
18. Find the equation of the curve that passes through the point $(0, \tfrac{1}{2})$ and whose slope at each point (x, y) is $-x/4y$.
19. Find the equation of the curve that passes through the point (3, 1) and whose slope at each point (x, y) is $e^{x-y}$.
20. Find the equation of the curve that passes through the point (−1, 1) and whose slope at each point (x, y) is $x^2 y^2$.
21. At time t, the velocity v(t) of an object moving in a straight line satisfies
$$\frac{dv}{dt} = -(1 + v^2). \tag{1.4.17}$$
(a) Show that
$$\tan^{-1}(v) = \tan^{-1}(v_0) - t,$$
where $v_0$ denotes the velocity of the object at time t = 0 (and we assume $v_0 > 0$). Hence prove that the object comes to rest after a finite time $\tan^{-1}(v_0)$. Does the object remain at rest?
(b) Use the chain rule to show that (1.4.17) can be written as
$$v\,\frac{dv}{dx} = -(1 + v^2),$$
where x(t) denotes the distance traveled by the object at time t, from its position at t = 0. Determine the distance traveled by the object when it first comes to rest.
22. The differential equation governing the velocity of an object is
$$\frac{dv}{dt} = -kv^n,$$
where k > 0 and n are constants. At t = 0, the object is set in motion with velocity $v_0$.
(a) Show that the object comes to rest in a finite time if and only if n < 1, and determine the maximum distance traveled by the object in this case.
(b) If 1 ≤ n < 2, show that the maximum distance traveled by the object in a finite time is less than
$$\frac{v_0^{2-n}}{(2 - n)k}.$$
(c) If n ≥ 2, show that there is no limit to the distance that the object can travel.
23. The pressure p, and density, ρ, of the atmosphere at a height y above the earth's surface are related by
$$dp = -g\rho\,dy.$$
Assuming that p and ρ satisfy the adiabatic equation of state $p = p_0\left(\dfrac{\rho}{\rho_0}\right)^{\gamma}$, where γ ≠ 1 is a constant and $p_0$ and $\rho_0$ denote the pressure and density at the earth's surface, respectively, show that
$$p = p_0\left[1 - \frac{(\gamma - 1)}{\gamma} \cdot \frac{\rho_0 g y}{p_0}\right]^{\gamma/(\gamma - 1)}.$$
24. An object whose temperature is 615°F is placed in a room whose temperature is 75°F. At 4 p.m. the temperature of the object is 135°F, and an hour later its temperature is 95°F. At what time was the object placed in the room?
25. A flammable substance whose initial temperature is 50°F is inadvertently placed in a hot oven whose temperature is 450°F. After 20 minutes, the substance's temperature is 150°F. Find the temperature of the substance after 40 minutes. Assuming that the substance ignites when its temperature reaches 350°F, find the time of combustion.
26. At 2 p.m. on a cool (34°F) afternoon in March, Sherlock Holmes measured the temperature of a dead body to be 38°F. One hour later, the temperature was 36°F. After a quick calculation using Newton's law of cooling, and taking the normal temperature of a living body to be 98°F, Holmes concluded that the time of death was 10 a.m. Was Holmes right?
27. At 4 p.m., a hot coal was pulled out of a furnace and allowed to cool at room temperature (75°F). If, after 10 minutes, the temperature of the coal was 415°F, and after 20 minutes, its temperature was 347°F, find the following:
(a) The temperature of the furnace.
(b) The time when the temperature of the coal was 100°F.
28. A hot object is placed in a room whose temperature is 72°F. After one minute the temperature of the object is 150°F and its rate of change of temperature is 20°F per minute. Find the initial temperature of the object and the rate at which its temperature is changing after 10 minutes.

1.5 Some Simple Population Models
In this section we consider two important models of population growth whose mathematical formulation leads to separable differential equations. Malthusian Growth
The simplest mathematical model of population growth is obtained by assuming that the
rate of increase of the population at any time is proportional to the size of the population
at that time. If we let P (t) denote the population at time t , then
$$\frac{dP}{dt} = kP,$$
where k is a positive constant. Separating the variables and integrating yields
$$P(t) = P_0 e^{kt}, \tag{1.5.1}$$
where $P_0$ denotes the population at t = 0. This law predicts an exponential increase in the population with time, which gives a reasonably accurate description of the growth of certain algae, bacteria, and cell cultures. It is called the Malthusian growth model. The time taken for such a culture to double in size is called the doubling time. This is the time, $t_d$, when $P(t_d) = 2P_0$. Substituting into (1.5.1) yields
$$2P_0 = P_0 e^{k t_d}.$$
Dividing both sides by $P_0$ and taking logarithms, we find
$$k t_d = \ln 2,$$
so that the doubling time is
$$t_d = \frac{1}{k} \ln 2.$$

Example 1.5.1 The number of bacteria in a certain culture grows at a rate that is proportional to the
number present. If the number increased from 500 to 2000 in 2 hours, determine
1. the number present after 12 hours.
2. the doubling time.

Solution: The behavior of the system is governed by the differential equation
$$\frac{dP}{dt} = kP,$$
so that
$$P(t) = P_0 e^{kt},$$
where the time t is measured in hours. Taking t = 0 as the time when the population was 500, we have $P_0 = 500$. Thus,
$$P(t) = 500e^{kt}.$$
Further, P(2) = 2000 implies that
$$2000 = 500e^{2k},$$
so that
$$k = \tfrac{1}{2} \ln 4 = \ln 2.$$
Consequently,
$$P(t) = 500e^{t \ln 2}.$$
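
The same fitting step can be carried out in a few lines of code. The sketch below recovers k from the two population counts of Example 1.5.1 and then evaluates the Malthusian law (1.5.1) and the doubling-time formula; the helper names are assumptions made for illustration.

```python
import math

def malthus_rate(p_start, p_later, elapsed):
    """Growth constant k in P(t) = P0 e^{kt}, from two counts taken `elapsed` apart."""
    return math.log(p_later / p_start) / elapsed

def malthus_population(p0, k, t):
    """P(t) = P0 e^{kt}, Equation (1.5.1)."""
    return p0 * math.exp(k * t)

def doubling_time(k):
    """t_d = (ln 2) / k."""
    return math.log(2.0) / k

# Data of Example 1.5.1: 500 bacteria at t = 0, 2000 at t = 2 hours.
k = malthus_rate(500.0, 2000.0, 2.0)
population_after_12h = malthus_population(500.0, k, 12.0)
t_double = doubling_time(k)

print("k =", k, " P(12) =", population_after_12h, " doubling time =", t_double)
```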
1. The number of bacteria present after 12 hours is therefore
$$P(12) = 500e^{12 \ln 2} = 500(2^{12}) = 2{,}048{,}000.$$
2. The doubling time of the system is
$$t_d = \frac{1}{k} \ln 2 = 1 \text{ hour}.$$

Logistic Population Model
The Malthusian growth law (1.5.1) does not provide an accurate model for the growth of
a population over a long time period. To obtain a more realistic model we need to take
account of the fact that as the population increases, several factors will begin to affect the
growth rate. For example, there will be increased competition for the limited resources
that are available, increases in disease, and overcrowding of the limited available space,
all of which would serve to slow the growth rate. In order to model this situation mathematically, we modify the differential equation leading to the simple exponential growth law by adding in a term that slows the growth down as the population increases. If we consider a closed environment (neglecting factors such as immigration and emigration), then the rate of change of population can be modeled by the differential equation
$$\frac{dP}{dt} = [B(t) - D(t)]P,$$
where B(t) and D(t) denote the birth rate and death rate per individual, respectively.
The simple exponential law corresponds to the case when B(t) = k and D(t) = 0. In
the more general situation of interest now, the increased competition as the population
grows will result in a corresponding increase in the death rate per individual. Perhaps
the simplest way to take account of this is to assume that the death rate per individual is
directly proportional to the instantaneous population, and that the birth rate per individual
remains constant. The resulting initial-value problem governing the population growth
can then be written as
$$\frac{dP}{dt} = (B_0 - D_0 P)P, \quad P(0) = P_0,$$
where $B_0$ and $D_0$ are positive constants. It is useful to write the differential equation in the equivalent form
$$\frac{dP}{dt} = r\left(1 - \frac{P}{C}\right)P, \tag{1.5.2}$$
where $r = B_0$, and $C = B_0/D_0$. Equation (1.5.2) is called the logistic equation, and the
corresponding population model is called the logistic model. The differential equation
(1.5.2) is separable and can be solved without difﬁculty. Before doing that, however, we
give a qualitative analysis of the differential equation.
The constant C in Equation (1.5.2) is called the carrying capacity of the population.
We see from Equation (1.5.2) that if P < C , then dP /dt > 0 and the population
increases, whereas if P > C , then dP /dt < 0 and the population decreases. We can
therefore interpret C as representing the maximum population that the environment can
sustain. We note that P (t) = C is an equilibrium solution to the differential equation,
as is P (t) = 0. The isoclines for Equation (1.5.2) are determined from
$$r\left(1 - \frac{P}{C}\right)P = k,$$
where k is a constant. This can be written as
$$P^2 - CP + \frac{kC}{r} = 0,$$
so that the isoclines are the lines
$$P = \frac{1}{2}\left(C \pm \sqrt{C^2 - \frac{4kC}{r}}\right).$$
This tells us that the slopes of the solution curves satisfy
$$C^2 - \frac{4kC}{r} \geq 0,$$
so that
$$k \leq rC/4.$$
Furthermore, the largest value that the slope can assume is k = rC/4, which corresponds to P = C/2. We also note that the slope approaches zero as the solution curves approach the equilibrium solutions P(t) = 0 and P(t) = C. Differentiating Equation (1.5.2) yields
$$\frac{d^2P}{dt^2} = r\left[\left(1 - \frac{P}{C}\right)\frac{dP}{dt} - \frac{P}{C}\,\frac{dP}{dt}\right] = r\left(1 - 2\,\frac{P}{C}\right)\frac{dP}{dt} = \frac{r^2}{C^2}(C - 2P)(C - P)P,$$
where we have substituted for dP/dt from (1.5.2) and simplified the result. Since P = C and P = 0 are solutions to the differential equation (1.5.2), the only points of inflection occur along the line P = C/2. The behavior of the concavity is therefore given by the following schematic:

sign of P'':   | + + + + | − − − − | + + + +
P-interval:    0        C/2        C

This information determines the general behavior of the solution curves to the differential equation (1.5.2). Figure 1.5.1 gives a Maple plot of the slope field and some
representative solution curves. Of course, such a ﬁgure could have been constructed by
hand, using the information we have obtained. From Figure 1.5.1, we see that if the
initial population is less than the carrying capacity, then the population increases monotonically toward the carrying capacity. Similarly, if the initial population is bigger than
the carrying capacity, then the population monotonically decreases toward the carrying
capacity. Once more this illustrates the power of the qualitative techniques that have
been introduced for analyzing ﬁrst-order differential equations.
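
The qualitative conclusions above (steepest slope rC/4 at P = C/2, and the sign pattern of the second derivative) can be spot-checked numerically. The sketch below does so for one assumed pair of parameter values, r = 0.5 and C = 100; these values, the sampling grid, and the function names are illustrative choices, not taken from the text.

```python
def logistic_rhs(P, r, C):
    """Slope dP/dt = r (1 - P/C) P from Equation (1.5.2)."""
    return r * (1.0 - P / C) * P

def concavity(P, r, C):
    """Factored second derivative (r^2 / C^2)(C - 2P)(C - P)P."""
    return (r * r / (C * C)) * (C - 2.0 * P) * (C - P) * P

def concavity_chain_rule(P, r, C):
    """Same quantity via the chain rule: d/dP[r(1 - P/C)P] * dP/dt."""
    return r * (1.0 - 2.0 * P / C) * logistic_rhs(P, r, C)

def sign(x):
    return (x > 0) - (x < 0)

r, C = 0.5, 100.0   # assumed sample values

# Largest slope on a grid P = 0, 0.5, ..., 150 and where it occurs.
grid = [0.5 * i for i in range(301)]
steepest_P = max(grid, key=lambda P: logistic_rhs(P, r, C))
max_slope = logistic_rhs(steepest_P, r, C)

# Concavity signs on (0, C/2), (C/2, C), and above C.
signs = [sign(concavity(P, r, C)) for P in (25.0, 75.0, 125.0)]

# The two expressions for the second derivative agree.
formula_gap = max(
    abs(concavity(P, r, C) - concavity_chain_rule(P, r, C)) for P in grid
)

print(steepest_P, max_slope, signs, formula_gap)
```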
[Figure 1.5.1: Representative slope field and some approximate solution curves for the logistic equation.]

We turn now to obtaining an analytical solution to the differential equation (1.5.2). Separating the variables in Equation (1.5.2) and integrating yields
$$\int \frac{C}{P(C - P)}\,dP = rt + c_1,$$
where $c_1$ is an integration constant. Using a partial-fraction decomposition on the left-hand side, we find
$$\int \left(\frac{1}{P} + \frac{1}{C - P}\right)dP = rt + c_1,$$
which upon integration gives
$$\ln\left|\frac{P}{C - P}\right| = rt + c_1.$$
Exponentiating, and redefining the integration constant, yields
$$\frac{P}{C - P} = c_2 e^{rt},$$
which can be solved algebraically for P to obtain
$$P(t) = \frac{c_2 C e^{rt}}{1 + c_2 e^{rt}},$$
or equivalently,
$$P(t) = \frac{c_2 C}{c_2 + e^{-rt}}.$$
Imposing the initial condition $P(0) = P_0$, we find that $c_2 = P_0/(C - P_0)$. Inserting this value of $c_2$ into the preceding expression for P(t) yields
$$P(t) = \frac{C P_0}{P_0 + (C - P_0)e^{-rt}}. \tag{1.5.3}$$
We make two comments regarding this formula. First, we see that, owing to the negative exponent of the exponential term in the denominator, as t → ∞ the population does indeed tend to the carrying capacity C independently of the initial population $P_0$. Second, by writing (1.5.3) in the equivalent form
$$P(t) = \frac{P_0}{P_0/C + (1 - P_0/C)e^{-rt}},$$
it follows that if $P_0$ is very small compared to the carrying capacity, then for small t the terms involving $P_0$ in the denominator can be neglected, leading to the approximation
$$P(t) \approx P_0 e^{rt}.$$
Consequently, in this case, the Malthusian population model does approximate the logistic model for small time intervals.
Although we now have a formula for the solution to the logistic population model, the
qualitative analysis is certainly enlightening with regard to the general overall properties
of the solution. Of course if we want to investigate speciﬁc details of a particular model,
then we use the corresponding exact solution (1.5.3).
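
The two comments made about formula (1.5.3) lend themselves to a quick numerical illustration. The sketch below evaluates (1.5.3) for assumed sample values ($P_0 = 10$, r = 0.3, C = 1000, plus a second start above the carrying capacity), checks by a central difference that it satisfies the logistic equation (1.5.2), and compares it with the Malthusian law for small and large t. All parameter values and names are assumptions for illustration.

```python
import math

def logistic(t, P0, r, C):
    """Closed-form solution (1.5.3) of the logistic equation."""
    return C * P0 / (P0 + (C - P0) * math.exp(-r * t))

def malthus(t, P0, r):
    """Malthusian approximation P0 e^{rt}."""
    return P0 * math.exp(r * t)

P0, r, C = 10.0, 0.3, 1000.0   # assumed sample values, with P0 much smaller than C

# (1.5.3) satisfies dP/dt = r (1 - P/C) P; checked at t = 5 by a central difference.
h = 1e-5
P5 = logistic(5.0, P0, r, C)
slope = (logistic(5.0 + h, P0, r, C) - logistic(5.0 - h, P0, r, C)) / (2.0 * h)
ode_gap = abs(slope - r * (1.0 - P5 / C) * P5)

# Long-time behavior: the carrying capacity is approached from below and from above.
late_from_below = logistic(200.0, P0, r, C)
late_from_above = logistic(200.0, 1500.0, r, C)

# Small t: the Malthusian law is a good approximation; large t: it is not.
early_rel_gap = abs(malthus(1.0, P0, r) - logistic(1.0, P0, r, C)) / logistic(1.0, P0, r, C)
late_ratio = malthus(30.0, P0, r) / logistic(30.0, P0, r, C)

print(ode_gap, late_from_below, late_from_above, early_rel_gap, late_ratio)
```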
Example 1.5.2 The initial population (measured in thousands) of a city is 20. After 10 years this has
increased to 50.87, and after 15 years to 78.68. Use the logistic model to predict the
population after 30 years.

Solution: In this problem we have $P_0 = P(0) = 20$, P(10) = 50.87, P(15) = 78.68, and we wish to find P(30). Substituting for $P_0$ into Equation (1.5.3) yields
$$P(t) = \frac{20C}{20 + (C - 20)e^{-rt}}. \tag{1.5.4}$$
Imposing the two remaining auxiliary conditions leads to the following pair of equations for determining r and C:
$$50.87 = \frac{20C}{20 + (C - 20)e^{-10r}},$$
$$78.68 = \frac{20C}{20 + (C - 20)e^{-15r}}.$$
This is a pair of nonlinear equations that are tedious to solve by hand. We therefore turn to technology. Using the algebraic capabilities of Maple, we find that
$$r \approx 0.1, \quad C \approx 500.37.$$
Substituting these values of r and C in Equation (1.5.4) yields
$$P(t) = \frac{10007.4}{20 + 480.37e^{-0.1t}}.$$
Accordingly, the predicted value of the population after 30 years is
$$P(30) = \frac{10007.4}{20 + 480.37e^{-3}} = 227.87.$$
A sketch of P(t) is given in Figure 1.5.2.

[Figure 1.5.2: Solution curve corresponding to the population model in Example 1.5.2. The population is measured in thousands of people.]
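
The text solves the pair of nonlinear equations with Maple. As one possible alternative (not the method used in the text), the pair can be reduced to a single equation in C and solved by bisection: each data point determines $e^{-rt}$ as a function of C through (1.5.4), and requiring both points to give the same r leaves one unknown. The bracketing interval [200, 2000], the tolerance, and all function names below are assumptions made for this sketch.

```python
import math

P0 = 20.0               # P(0), in thousands
T1, P1 = 10.0, 50.87    # P(10)
T2, P2 = 15.0, 78.68    # P(15)

def decay_factor(C, P):
    """Value of e^{-rt} implied by (1.5.4) when the population equals P."""
    return P0 * (C - P) / (P * (C - P0))

def mismatch(C):
    """Zero exactly when both data points imply the same growth rate r.

    With A = e^{-r T1} and B = e^{-r T2}, consistency means T2 ln A = T1 ln B.
    """
    return T2 * math.log(decay_factor(C, P1)) - T1 * math.log(decay_factor(C, P2))

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

C = bisect(mismatch, 200.0, 2000.0)          # assumed bracket for the carrying capacity
r = -math.log(decay_factor(C, P1)) / T1
P30 = P0 * C / (P0 + (C - P0) * math.exp(-r * 30.0))

print("C =", C, " r =", r, " P(30) =", P30)
```

The computed values can be compared with the rounded figures r ≈ 0.1, C ≈ 500.37, and P(30) = 227.87 quoted in the example.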
Exercises for 1.5

Key Terms
Malthusian growth model, Doubling time, Logistic growth model, Carrying capacity.

Skills
• Be able to solve the basic differential equations describing the Malthusian and logistic population growth models.
• Be able to solve word problems involving initial conditions, doubling time, etc., for the Malthusian and logistic population growth models.
• Be able to compute the carrying capacity for a logistic population model.
• Be able to discuss the qualitative behavior of a population governed by a Malthusian or logistic model, based on initial values, doubling time, and so on as a function of time.
True-False Review
For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. A population whose growth rate at any given time is proportional to its size at that time obeys the Malthusian growth model.
2. If a population obeys the logistic growth model, then its size can never exceed the carrying capacity of the population.
3. The differential equations which describe population growth according to the Malthusian model and the logistic model are both separable.
4. The rate of change of a population whose growth is described with the logistic model eventually tends toward zero, regardless of the initial population.
5. If the doubling time of a population governed by the Malthusian growth model is five minutes, then the initial population increases 64-fold in a half-hour.
6. If a population whose growth is based on the Malthusian growth model has a doubling time of 10 years, then it takes approximately 30–40 years in order for the initial population size to increase tenfold.
7. The population growth rate according to the Malthusian growth model is always constant.
8. The logistic population model always has exactly two equilibrium solutions.
9. The concavity of the graph of population governed by the logistic model changes if and only if the initial population is less than the carrying capacity.
10. The concavity of the graph of a population governed by the Malthusian growth model never changes, regardless of the initial population.

Problems
1. The number of bacteria in a culture grows at a rate proportional to the number present. Initially there were 10 bacteria in the culture. If the doubling time of the culture is 3 hours, find the number of bacteria that were present after 24 hours.
2. The number of bacteria in a culture grows at a rate proportional to the number present. After 10 hours, there were 5000 bacteria present, and after 12 hours, 6000 bacteria. Determine the initial size of the culture and the doubling time of the population.
3. A certain cell culture has a doubling time of 4 hours. Initially there were 2000 cells present. Assuming an exponential growth law, determine the time it takes for the culture to contain $10^6$ cells.
4. At time t, the population P(t) of a certain city is increasing at a rate proportional to the number of residents in the city at that time. In January 1990 the population of the city was 10,000, and by 1995 it had risen to 20,000.
(a) What will the population of the city be at the beginning of the year 2010?
(b) In what year will the population reach one million?

In the logistic population model (1.5.3), if $P(t_1) = P_1$ and $P(2t_1) = P_2$, then it can be shown (through some algebra performed tediously by hand, or easily on a computer algebra system) that

$$r = \frac{1}{t_1}\ln\left[\frac{P_2(P_1 - P_0)}{P_0(P_2 - P_1)}\right], \tag{1.5.5}$$

$$C = \frac{P_1[P_1(P_0 + P_2) - 2P_0P_2]}{P_1^2 - P_0P_2}. \tag{1.5.6}$$

These formulas will be used in Problems 5–7.
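
Formulas (1.5.5) and (1.5.6) translate directly into code. The sketch below is only an illustration: the function names and the round-trip values (r = 0.3, C = 1000, P0 = 50, t1 = 4) are made up for the check and do not come from the text or from any of the problems.

```python
import math


def logistic_parameters(p0, p1, p2, t1):
    """Return (r, C) from P(0)=p0, P(t1)=p1, P(2*t1)=p2 via (1.5.5)-(1.5.6)."""
    r = math.log(p2 * (p1 - p0) / (p0 * (p2 - p1))) / t1
    c = p1 * (p1 * (p0 + p2) - 2.0 * p0 * p2) / (p1 ** 2 - p0 * p2)
    return r, c


def logistic(t, p0, r, c):
    """Logistic solution C*P0 / (P0 + (C - P0)*exp(-r*t))."""
    return c * p0 / (p0 + (c - p0) * math.exp(-r * t))


# Round-trip check on hypothetical values: generate P1 and P2 from a known
# (r, C), then recover (r, C) from the three population readings.
r_true, c_true, p0, t1 = 0.3, 1000.0, 50.0, 4.0
p1 = logistic(t1, p0, r_true, c_true)
p2 = logistic(2.0 * t1, p0, r_true, c_true)
r_est, c_est = logistic_parameters(p0, p1, p2, t1)
print(r_est, c_est)
```

Note that the logarithm in (1.5.5) and the denominator in (1.5.6) constrain which triples (P0, P1, P2) are admissible, which is the point of Problem 7.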
5. The initial population in a small village is 500. After
5 years this has grown to 800 and after 10 years to
1000. Using the logistic population model, determine
the population after 15 years.
6. An animal sanctuary had an initial population of 50 animals. After two years the population was 62 and after
four years 76. Using the logistic population model,
determine the carrying capacity and the number of animals in the sanctuary after 20 years.
7. (a) Using Equations (1.5.5) and (1.5.6), and the fact
that r and C are positive, derive two inequalities
that P0 , P1 , P2 must satisfy in order for there to
be a solution to the logistic equation satisfying
the conditions
$$P(0) = P_0, \quad P(t_1) = P_1, \quad P(2t_1) = P_2.$$
(b) The initial population in a town is 10,000. After 5 years this has grown to 12,000, and after 10 years to 18,000. Is there a solution to the logistic equation that fits this data?
8. Of the 1500 passengers, crew, and staff that board a
cruise ship, 5 have the ﬂu. After one day of sailing,
the number of infected people has risen to 10. Assuming that the rate at which the ﬂu virus spreads is
proportional to the product of the number of infected
individuals and the number not yet infected, determine
how many people will have the ﬂu at the end of the
14-day cruise. Would you like to be a member of the
customer relations department for the cruise line the
day after the ship docks?
9. Consider the population model
$$\frac{dP}{dt} = r(P - T)P, \quad P(0) = P_0, \tag{1.5.7}$$
where r, T, and $P_0$ are positive constants.
(a) Perform a qualitative analysis of the differential
equation in the initial-value problem (1.5.7), following the steps used in the text for the logistic
equation. Identify the equilibrium solutions, the
isoclines, and the behavior of the slope and concavity of the solution curves.
(b) Using the information obtained in (a), sketch the
slope ﬁeld for the differential equation and include representative solution curves.
(c) What predictions can you make regarding the
behavior of the population? Consider the cases
P0 < T and P0 > T . The constant T is called
the threshold level. Based on your predictions,
why is this an appropriate term to use for T ?
10. In the preceding problem, a qualitative analysis of the differential equation in (1.5.7) was carried out. In this problem, we determine the exact solution to the differential equation and verify the predictions from the qualitative analysis.
(a) Solve the initial-value problem (1.5.7).
(b) Using your solution from (a), verify that if $P_0 < T$, then $\lim_{t \to \infty} P(t) = 0$. What does this mean for the population?
(c) Using your solution from (a), verify that if $P_0 > T$, then each solution curve has a vertical asymptote at $t = t_e$, where
$$t_e = \frac{1}{rT}\ln\left(\frac{P_0}{P_0 - T}\right).$$
How do you interpret this result in terms of population growth? Note that this was not obvious from the qualitative analysis performed in the previous problem.
11. As a modification to the population model considered in the previous two problems, suppose that P(t) satisfies the initial-value problem
$$\frac{dP}{dt} = r(C - P)(P - T)P, \quad P(0) = P_0,$$
where r, C, T, $P_0$ are positive constants, and $0 < T < C$. Perform a qualitative analysis of this model. Sketch the slope field and some representative solution curves in the three cases $0 < P_0 < T$, $T < P_0 < C$, and $P_0 > C$. Describe the behavior of the corresponding solutions.

The next two problems consider the Gompertz population model, which is governed by the initial-value problem
$$\frac{dP}{dt} = rP(\ln C - \ln P), \quad P(0) = P_0, \tag{1.5.8}$$
where r, C, and $P_0$ are positive constants.
12. Determine all equilibrium solutions for the differential equation in (1.5.8), and the behavior of the slope and concavity of the solution curves. Use this information to sketch the slope field and some representative solution curves.
13. Solve the initial-value problem (1.5.8) and verify that all solutions satisfy $\lim_{t \to \infty} P(t) = C$.

Problems 14–16 consider the phenomenon of exponential decay. This occurs when a population P(t) is governed by the differential equation
$$\frac{dP}{dt} = kP,$$
where k is a negative constant.
14. A population of swans in a wildlife sanctuary is declining due to the presence of dangerous chemicals in the water. If the population of swans is experiencing exponential decay, and if there were 400 swans in the park at the beginning of the summer and 340 swans 30 days later,
(a) How many swans are in the park 60 days after the start of summer? 100 days after the start of summer?
(b) How long does it take for the population of swans to be cut in half? (This is known as the half-life of the population.)
15. At the conclusion of the Super Bowl, the number of fans remaining in the stadium decreases at a rate proportional to the number of fans in the stadium. Assume that there are 100,000 fans in the stadium at the end of the Super Bowl and ten minutes later there are 80,000 fans in the stadium.
(a) Thirty minutes after the Super Bowl will there be more or less than 40,000 fans? How do you know this without doing any calculations?
(b) What is the half-life (see the previous problem) for the fan population in the stadium?
(c) When will there be only 15,000 fans left in the stadium?
(d) Explain why the exponential decay model for the population of fans in the stadium is not realistic from a qualitative perspective.
16. Cobalt-60, an isotope used in cancer therapy, decays exponentially with a half-life of 5.2 years (i.e., half the original sample remains after 5.2 years). How long does it take for a sample of cobalt-60 to disintegrate to the extent that only 4% of the original amount remains?
17. Use some form of technology to solve the pair of equations
$$P_1 = \frac{CP_0}{P_0 + (C - P_0)e^{-rt_1}}, \qquad P_2 = \frac{CP_0}{P_0 + (C - P_0)e^{-2rt_1}},$$
for r and C, and thereby derive the expressions given in Equations (1.5.5) and (1.5.6).
18. According to data from the U.S. Bureau of the Census, the population (measured in millions of people) of the United States in 1950, 1960, and 1970 was, respectively, 151.3, 179.4, and 203.3.
(a) Using the 1950 and 1960 population figures, solve the corresponding Malthusian population model.
(b) Determine the logistic model corresponding to the given data.
(c) On the same set of axes, plot the solution curves obtained in (a) and (b). From your plots, determine the values the different models would have predicted for the population in 1980 and 1990, and compare these predictions to the actual values of 226.54 and 248.71, respectively.
19. In a period of five years, the population of a city doubles from its initial size of 50 (measured in thousands of people). After ten more years, the population has reached 250. Determine the logistic model corresponding to this data. Sketch the solution curve and use your plot to estimate the time it will take for the population to reach 95% of the carrying capacity.

1.6 First-Order Linear Differential Equations
In this section we derive a technique for determining the general solution to any ﬁrst-order
linear differential equation. This is the most important technique in the chapter.

DEFINITION 1.6.1
A differential equation that can be written in the form
$$a(x)\frac{dy}{dx} + b(x)y = r(x), \tag{1.6.1}$$
where a(x), b(x), and r(x) are functions defined on an interval (a, b), is called a first-order linear differential equation.

We assume that $a(x) \neq 0$ on (a, b) and divide both sides of (1.6.1) by a(x) to obtain the standard form
$$\frac{dy}{dx} + p(x)y = q(x), \tag{1.6.2}$$
where p(x) = b(x)/a(x) and q(x) = r(x)/a(x). The idea behind the solution technique for (1.6.2) is to rewrite the differential equation in the form
$$\frac{d}{dx}[g(x, y)] = F(x)$$
for an appropriate function g(x, y). The general solution to the differential equation can then be obtained by an integration with respect to x. First consider an example.
Example 1.6.2 Solve the differential equation
$$\frac{dy}{dx} + \frac{1}{x}y = e^x, \quad x > 0. \tag{1.6.3}$$

Solution: If we multiply (1.6.3) by x, we obtain
$$x\frac{dy}{dx} + y = xe^x.$$
But, from the product rule for differentiation, the left-hand side of this equation is just the expanded form of $\frac{d}{dx}(xy)$. Thus (1.6.3) can be written in the equivalent form
$$\frac{d}{dx}(xy) = xe^x.$$
Integrating both sides of this equation with respect to x, we obtain
$$xy = xe^x - e^x + c.$$
Dividing by x yields the general solution to (1.6.3) as
$$y(x) = x^{-1}[e^x(x - 1) + c],$$
where c is an arbitrary constant.
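
A quick numerical sanity check of this general solution is sketched below. It approximates dy/dx with a central difference and evaluates the residual of (1.6.3) at a few sample points; the sample values of x and c and the step size h are arbitrary choices made for the illustration.

```python
import math


def y(x, c):
    """General solution of Example 1.6.2: y = (e^x (x - 1) + c) / x."""
    return (math.exp(x) * (x - 1.0) + c) / x


def residual(x, c, h=1e-5):
    """dy/dx + y/x - e^x, with dy/dx approximated by a central difference."""
    dydx = (y(x + h, c) - y(x - h, c)) / (2.0 * h)
    return dydx + y(x, c) / x - math.exp(x)


# Largest residual over a few arbitrary constants c and points x > 0.
worst = max(
    abs(residual(x, c))
    for c in (-2.0, 0.0, 3.0)
    for x in (0.5, 1.0, 2.0, 4.0)
)
print(worst)
```

A residual near the finite-difference error level for every c is consistent with c being a genuinely arbitrary constant.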
In the preceding example we multiplied the given differential equation by the function I(x) = x. This had the effect of reducing the left-hand side of the resulting differential equation to the integrable form
$$\frac{d}{dx}(xy).$$
Motivated by this example, we now consider the possibility of multiplying the general linear differential equation
$$\frac{dy}{dx} + p(x)y = q(x) \tag{1.6.4}$$
by a nonzero function I(x), chosen in such a way that the left-hand side of the resulting differential equation is
$$\frac{d}{dx}[I(x)y].$$
Henceforth we will assume that the functions p and q are continuous on (a, b). Multiplying the differential equation (1.6.4) by I(x) yields
$$I\frac{dy}{dx} + p(x)Iy = Iq(x). \tag{1.6.5}$$
Furthermore, from the product rule for derivatives, we know that
$$\frac{d}{dx}(Iy) = I\frac{dy}{dx} + \frac{dI}{dx}y. \tag{1.6.6}$$
Comparing Equations (1.6.5) and (1.6.6), we see that Equation (1.6.5) can indeed be written in the integrable form
$$\frac{d}{dx}(Iy) = Iq(x),$$
provided the function I(x) is a solution to⁵
$$I\frac{dy}{dx} + p(x)Iy = I\frac{dy}{dx} + \frac{dI}{dx}y.$$
This will hold whenever I(x) satisfies the separable differential equation
$$\frac{dI}{dx} = p(x)I. \tag{1.6.7}$$
Separating the variables and integrating yields
$$\ln|I| = \int p(x)\,dx + c,$$
so that
$$I(x) = c_1 e^{\int p(x)\,dx},$$
where $c_1$ is an arbitrary constant. Since we require only one solution to Equation (1.6.7), we set $c_1 = 1$, in which case
$$I(x) = e^{\int p(x)\,dx}.$$
We can therefore draw the following conclusion.
Multiplying the linear differential equation
$$\frac{dy}{dx} + p(x)y = q(x) \tag{1.6.8}$$
by $I(x) = e^{\int p(x)\,dx}$ reduces it to the integrable form
$$\frac{d}{dx}\left[e^{\int p(x)\,dx}\,y\right] = q(x)e^{\int p(x)\,dx}. \tag{1.6.9}$$
The general solution to (1.6.8) can now be obtained from (1.6.9) by integration. Formally we have
$$y(x) = e^{-\int p(x)\,dx}\left[\int q(x)e^{\int p(x)\,dx}\,dx + c\right]. \tag{1.6.10}$$

⁵ This is obtained by equating the left-hand side of Equation (1.6.5) to the right-hand side of Equation (1.6.6).

Remarks
1. The function $I(x) = e^{\int p(x)\,dx}$ is called an integrating factor for the differential equation (1.6.8), since it enables us to reduce the differential equation to a form that is directly integrable.
2. It is not necessary to memorize (1.6.10). In a specific problem, we first evaluate the integrating factor $e^{\int p(x)\,dx}$ and then use (1.6.9).

Example 1.6.3 Solve the initial-value problem
$$\frac{dy}{dx} + xy = xe^{x^2/2}, \quad y(0) = 1.$$

Solution: An appropriate integrating factor in this case is
$$I(x) = e^{\int x\,dx} = e^{x^2/2}.$$
Multiplying the given differential equation by I and using (1.6.9) yields
$$\frac{d}{dx}\left(e^{x^2/2}y\right) = xe^{x^2}.$$
Integrating both sides with respect to x, we obtain
$$e^{x^2/2}y = \tfrac{1}{2}e^{x^2} + c.$$
Hence,
$$y(x) = e^{-x^2/2}\left(\tfrac{1}{2}e^{x^2} + c\right).$$
Imposing the initial condition y(0) = 1 yields
$$1 = \tfrac{1}{2} + c,$$
so that $c = \tfrac{1}{2}$. Thus the required particular solution is
$$y(x) = \tfrac{1}{2}e^{-x^2/2}\left(e^{x^2} + 1\right) = \tfrac{1}{2}\left(e^{x^2/2} + e^{-x^2/2}\right) = \cosh(x^2/2).$$

Example 1.6.4 Solve
$$x\frac{dy}{dx} + 2y = \cos x, \quad x > 0.$$

Solution: We first write the given differential equation in standard form. Dividing by x yields
$$\frac{dy}{dx} + 2x^{-1}y = x^{-1}\cos x. \tag{1.6.11}$$
An integrating factor is
$$I(x) = e^{\int 2x^{-1}\,dx} = e^{2\ln x} = x^2,$$
so that upon multiplying Equation (1.6.11) by I, we obtain
$$\frac{d}{dx}(x^2y) = x\cos x.$$
Integrating and rearranging gives
$$y(x) = x^{-2}(x\sin x + \cos x + c),$$
where we have used integration by parts on the right-hand side.

Example 1.6.5 Solve the initial-value problem
$$y' - y = f(x), \quad y(0) = 0,$$
where
$$f(x) = \begin{cases} 1, & \text{if } x < 1, \\ 2 - x, & \text{if } x \geq 1. \end{cases}$$

Solution: We have sketched f(x) in Figure 1.6.1. An integrating factor for the differential equation is $I(x) = e^{-x}$.

[Figure 1.6.1: A sketch of the function f(x) from Example 1.6.5.]

Upon multiplication by the integrating factor, the differential equation reduces to
$$\frac{d}{dx}(e^{-x}y) = e^{-x}f(x).$$
We now integrate this differential equation over the interval [0, x]. To do so we need to use a dummy integration variable, which we denote by w. We therefore obtain
$$\left[e^{-w}y(w)\right]_0^x = \int_0^x e^{-w}f(w)\,dw,$$
or equivalently,
$$e^{-x}y(x) - y(0) = \int_0^x e^{-w}f(w)\,dw.$$
Multiplying by $e^x$ and substituting for y(0) = 0 yields
$$y(x) = e^x\int_0^x e^{-w}f(w)\,dw. \tag{1.6.12}$$
Owing to the form of f(x), the value of the integral on the right-hand side will depend on whether x < 1 or x ≥ 1. If x < 1, then f(w) = 1, and so (1.6.12) can be written as
$$y(x) = e^x\int_0^x e^{-w}\,dw = e^x(1 - e^{-x}),$$
so that
$$y(x) = e^x - 1, \quad x < 1.$$
If x ≥ 1, then the interval of integration [0, x] must be split into two parts. From (1.6.12) we have
$$y(x) = e^x\left[\int_0^1 e^{-w}\,dw + \int_1^x (2 - w)e^{-w}\,dw\right].$$
A straightforward integration leads to
$$y(x) = e^x\left\{(1 - e^{-1}) + \left[-2e^{-w} + we^{-w} + e^{-w}\right]_1^x\right\},$$
which simplifies to
$$y(x) = e^x(1 - e^{-1}) + x - 1.$$
The solution to the initial-value problem can therefore be written as
$$y(x) = \begin{cases} e^x - 1, & \text{if } x < 1, \\ e^x(1 - e^{-1}) + x - 1, & \text{if } x \geq 1. \end{cases}$$
A sketch of the corresponding solution curve is given in Figure 1.6.2.

[Figure 1.6.2: The solution curve for the initial-value problem in Example 1.6.5. The dashed curve is the continuation of $y(x) = e^x - 1$ for x > 1.]

Differentiating both branches of this function, we find
$$y'(x) = \begin{cases} e^x, & \text{if } x < 1, \\ e^x(1 - e^{-1}) + 1, & \text{if } x \geq 1, \end{cases} \qquad y''(x) = \begin{cases} e^x, & \text{if } x < 1, \\ e^x(1 - e^{-1}), & \text{if } x \geq 1. \end{cases}$$
We see that even though the function f in the original differential equation was not differentiable at x = 1, the solution to the initial-value problem has a continuous derivative at that point. The discontinuity in the derivative of the driving term does show up in the second derivative of the solution, as indeed it must.
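
The piecewise solution of Example 1.6.5 can be cross-checked against formula (1.6.12) evaluated numerically. The sketch below uses a composite Simpson rule, a quadrature choice made only for this illustration, and splits the integral at w = 1 where f has a corner; the function names are hypothetical.

```python
import math


def f(x):
    """Driving term of Example 1.6.5."""
    return 1.0 if x < 1.0 else 2.0 - x


def y_exact(x):
    """Piecewise closed-form solution derived in the example."""
    if x < 1.0:
        return math.exp(x) - 1.0
    return math.exp(x) * (1.0 - math.exp(-1.0)) + x - 1.0


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0


def y_quadrature(x):
    """Evaluate (1.6.12): y(x) = e^x * integral_0^x e^{-w} f(w) dw."""
    g = lambda w: math.exp(-w) * f(w)
    if x <= 1.0:
        integral = simpson(g, 0.0, x)
    else:
        # Split at the corner of f so each piece is smooth.
        integral = simpson(g, 0.0, 1.0) + simpson(g, 1.0, x)
    return math.exp(x) * integral


for x in (0.5, 1.0, 2.0, 3.0):
    print(x, y_quadrature(x), y_exact(x))
```

The two columns should agree to many digits on both sides of x = 1, which also confirms that the two branches join continuously there.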
Exercises for 1.6

Key Terms
First-order linear differential equation, Integrating factor.

Skills
• Be able to recognize a first-order linear differential equation.
• Be able to find an integrating factor for a given first-order linear differential equation.
• Be able to solve a first-order linear differential equation.

True-False Review
For Questions 1–5, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. There is a unique integrating factor for a differential equation of the form $y' + p(x)y = q(x)$.
2. An integrating factor for the differential equation $y' + p(x)y = q(x)$ is $e^{\int p(x)\,dx}$.
3. Upon multiplying the differential equation $y' + p(x)y = q(x)$ by an integrating factor I(x), the differential equation becomes $(I(x) \cdot y)' = q(x)I$.
4. An integrating factor for the differential equation $\dfrac{dy}{dx} = x^2y + \sin x$ is $I(x) = e^{\int x^2\,dx}$.
5. An integrating factor for the differential equation $\dfrac{dy}{dx} = x - \dfrac{y}{x}$ is $I(x) = 5x$.

Problems
For Problems 1–14, solve the given differential equation.
1. $\dfrac{dy}{dx} - y = e^{2x}$.
2. $x^2y' - 4xy = x^7\sin x$, $x > 0$.
3. $y' + 2xy = 2x^3$.
4. $\dfrac{dy}{dx} + \dfrac{2x}{1 - x^2}y = 4x$, $-1 < x < 1$.
5. $\dfrac{dy}{dx} + \dfrac{2x}{1 + x^2}y = \dfrac{4}{(1 + x^2)^2}$.
6. $2(\cos^2 x)y' + y\sin 2x = 4\cos^4 x$, $0 \leq x < \pi/2$.
7. $y' + \dfrac{1}{x\ln x}y = 9x^2$.
8. $y' - y\tan x = 8\sin^3 x$.
9. $t\dfrac{dx}{dt} + 2x = 4e^t$, $t > 0$.
10. $y' = \sin x\,(y\sec x - 2)$.
11. $(1 - y\sin x)\,dx - (\cos x)\,dy = 0$.
12. $y' - x^{-1}y = 2x^2\ln x$.
13. $y' + \alpha y = e^{\beta x}$, where α, β are constants.
14. $y' + mx^{-1}y = \ln x$, where m is constant.

In Problems 15–20, solve the given initial-value problem.
15. $y' + 2x^{-1}y = 4x$, $y(1) = 2$.
16. $(\sin x)y' - y\cos x = \sin 2x$, $y(\pi/2) = 2$.
17. $\dfrac{dx}{dt} + \dfrac{2}{4 - t}x = 5$, $x(0) = 4$.
18. $(y - e^x)\,dx + dy = 0$, $y(0) = 1$.
19. $y' + y = f(x)$, $y(0) = 3$, where
$$f(x) = \begin{cases} 1, & \text{if } x \leq 1, \\ 0, & \text{if } x > 1. \end{cases}$$
20. $y' - 2y = f(x)$, $y(0) = 1$, where
$$f(x) = \begin{cases} 1 - x, & \text{if } x < 1, \\ 0, & \text{if } x \geq 1. \end{cases}$$
21. Solve the initial-value problem in Example 1.6.5 as follows. First determine the general solution to the differential equation on each interval separately. Then use the given initial condition to find the appropriate integration constant for the interval (−∞, 1). To determine the integration constant on the interval [1, ∞), use the fact that the solution must be continuous at x = 1.
22. Find the general solution to the second-order differential equation
$$\frac{d^2y}{dx^2} + \frac{1}{x}\frac{dy}{dx} = 9x, \quad x > 0.$$
[Hint: Let u = dy/dx.]
23. Solve the differential equation for Newton's law of cooling by viewing it as a first-order linear differential equation.
24. Suppose that an object is placed in a medium whose temperature is increasing at a constant rate of α °F per minute. Show that, according to Newton's law of cooling, the temperature of the object at time t is given by
$$T(t) = \alpha(t - k^{-1}) + c_1 + c_2e^{-kt},$$
where $c_1$ and $c_2$ are constants.
25. Between 8 a.m. and 12 p.m. on a hot summer day, the temperature rose at a rate of 10°F per hour from an initial temperature of 65°F. At 9 a.m. the temperature of an object was measured to be 35°F and was, at that time, increasing at a rate of 5°F per hour. Show that the temperature of the object at time t was
$$T(t) = 10t - 15 + 40e^{(1 - t)/8}, \quad 0 \leq t \leq 4.$$
26. It is known that a certain object has constant of proportionality k = 1/40 in Newton's law of cooling. When the temperature of this object is 0°F, it is placed in a medium whose temperature is changing in time according to
$$T_m(t) = 80e^{-t/20}.$$
(a) Using Newton's law of cooling, show that the temperature of the object at time t is
$$T(t) = 80(e^{-t/40} - e^{-t/20}).$$
(b) What happens to the temperature of the object as t → +∞? Is this reasonable?
(c) Determine the time, $t_{\max}$, when the temperature of the object is a maximum. Find $T(t_{\max})$ and $T_m(t_{\max})$.
(d) Make a sketch to depict the behavior of T(t) and $T_m(t)$.
27. The differential equation
$$\frac{dT}{dt} = -k_1[T - T_m(t)] + A_0, \tag{1.6.13}$$
where $k_1$ and $A_0$ are positive constants, can be used to model the temperature variation T(t) in a building. In this equation, the first term on the right-hand side gives the contribution due to the variation in the outside temperature, and the second term on the right-hand side gives the contribution due to the heating effect from internal sources such as machinery, lighting, people, and so on. Consider the case when
$$T_m(t) = A - B\cos\omega t, \quad \omega = \pi/12, \tag{1.6.14}$$
where A and B are constants, and t is measured in hours.
(a) Make a sketch of $T_m(t)$. Taking t = 0 to correspond to midnight, describe the variation of the external temperature over a 24-hour period.
(b) With $T_m$ given in (1.6.14), solve (1.6.13) subject to the initial condition $T(0) = T_0$.
28. This problem demonstrates the variation-of-parameters method for first-order linear differential equations. Consider the first-order linear differential equation
$$y' + p(x)y = q(x). \tag{1.6.15}$$
(a) Show that the general solution to the associated homogeneous equation
$$y' + p(x)y = 0$$
is
$$y_H(x) = c_1e^{-\int p(x)\,dx}.$$
(b) Determine the function u(x) such that
$$y(x) = u(x)e^{-\int p(x)\,dx}$$
is a solution to (1.6.15), and hence derive the general solution to (1.6.15).

For Problems 29–32, use the technique derived in the previous problem to solve the given differential equation.
29. $y' + x^{-1}y = \cos x$, $x > 0$.
30. $y' + y = e^{-2x}$.
31. $y' + y\cot x = 2\cos x$, $0 < x < \pi$.
32. $xy' - y = x^2\ln x$.

For Problems 33–38, use a differential equation solver to determine the solution to each of the initial-value problems and sketch the corresponding solution curve.
33. The initial-value problem in Problem 15.
34. The initial-value problem in Problem 16.
35. The initial-value problem in Problem 17.
36. The initial-value problem in Problem 18.
37. The initial-value problem in Problem 19.
38. The initial-value problem in Problem 20.

1.7 Modeling Problems Using First-Order Linear Differential Equations
There are many examples of applied problems whose mathematical formulation leads
to a ﬁrst-order linear differential equation. In this section we analyze two in detail. Mixing Problems
Statement of the Problem: Consider the situation depicted in Figure 1.7.1. A tank initially
contains V0 liters of a solution in which is dissolved A0 grams of a certain chemical. A
solution containing c1 grams/liter of the same chemical ﬂows into the tank at a constant
rate of r1 liters/minute, and the mixture ﬂows out at a constant rate of r2 liters/minute. We
assume that the mixture is kept uniform by stirring. Then at any time t the concentration
of chemical in the tank, c2 (t), is the same throughout the tank and is given by
$$c_2 = \frac{A(t)}{V(t)}, \tag{1.7.1}$$
where V(t) denotes the volume of solution in the tank at time t and A(t) denotes the amount of chemical in the tank at time t.

[Figure 1.7.1: A mixing problem. Solution of concentration $c_1$ grams/liter flows in at a rate of $r_1$ liters/minute; solution of concentration $c_2$ grams/liter flows out at a rate of $r_2$ liters/minute. A(t) = amount of chemical in the tank at time t; V(t) = volume of solution in the tank at time t; $c_2(t)$ = A(t)/V(t) = concentration of chemical in the tank at time t.]

Mathematical Formulation: The two functions in the problem are V(t) and A(t). In order to determine how they change with time, we first consider their change during a short time interval, Δt minutes. In time Δt, $r_1\Delta t$ liters of solution flow into the tank, whereas $r_2\Delta t$ liters flow out. Thus during the time interval Δt, the change in the volume of solution in the tank is
$$\Delta V = r_1\Delta t - r_2\Delta t = (r_1 - r_2)\Delta t. \tag{1.7.2}$$
Since the concentration of chemical in the inflow is $c_1$ grams/liter (assumed constant), it follows that in the time interval Δt the amount of chemical that flows into the tank is $c_1r_1\Delta t$. Similarly, the amount of chemical that flows out in this same time interval is approximately⁶ $c_2r_2\Delta t$. Thus, the total change in the amount of chemical in the tank during the time interval Δt, denoted by ΔA, is approximately
$$\Delta A \approx c_1r_1\Delta t - c_2r_2\Delta t = (c_1r_1 - c_2r_2)\Delta t. \tag{1.7.3}$$

⁶ This is only an approximation, since $c_2$ is not constant over the time interval Δt. The approximation will become more accurate as Δt → 0.

Dividing Equations (1.7.2) and (1.7.3) by Δt yields
$$\frac{\Delta V}{\Delta t} = r_1 - r_2 \quad \text{and} \quad \frac{\Delta A}{\Delta t} \approx c_1r_1 - c_2r_2,$$
respectively. These equations describe the rates of change of V and A over the short, but finite, time interval Δt. In order to determine the instantaneous rates of change of V and A, we take the limit as Δt → 0 to obtain
$$\frac{dV}{dt} = r_1 - r_2 \tag{1.7.4}$$
and
$$\frac{dA}{dt} = c_1r_1 - \frac{A}{V}r_2, \tag{1.7.5}$$
where we have substituted for $c_2$ from Equation (1.7.1). Since $r_1$ and $r_2$ are constants, we can integrate Equation (1.7.4) directly, obtaining
$$V(t) = (r_1 - r_2)t + V_0,$$
where $V_0$ is an integration constant. Substituting for V into Equation (1.7.5) and rearranging terms yields the linear equation for A(t):
$$\frac{dA}{dt} + \frac{r_2}{(r_1 - r_2)t + V_0}A = c_1r_1. \tag{1.7.6}$$
This differential equation can be solved, subject to the initial condition $A(0) = A_0$, to determine the behavior of A(t).

Remark The reader need not memorize Equation (1.7.6), since it is better to derive it for each specific example.
Example 1.7.1 A tank contains 8 L (liters) of water in which is dissolved 32 g (grams) of chemical. A solution containing 2 g/L of the chemical flows into the tank at a rate of 4 L/min, and the well-stirred mixture flows out at a rate of 2 L/min.
1. Determine the amount of chemical in the tank after 20 minutes.
2. What is the concentration of chemical in the tank at that time?

Solution: We are given $r_1 = 4$ L/min, $r_2 = 2$ L/min, $c_1 = 2$ g/L, V(0) = 8 L, and A(0) = 32 g. For parts 1 and 2, we must find A(20) and A(20)/V(20), respectively. Now,
$$\Delta V = r_1\Delta t - r_2\Delta t$$
implies that
$$\frac{dV}{dt} = 2.$$
Integrating this equation and imposing the initial condition that V(0) = 8 yields
$$V(t) = 2(t + 4). \tag{1.7.7}$$
Further,
$$\Delta A \approx c_1r_1\Delta t - c_2r_2\Delta t$$
implies that
$$\frac{dA}{dt} = 8 - 2c_2.$$
That is, since $c_2 = A/V$,
$$\frac{dA}{dt} = 8 - 2\frac{A}{V}.$$
Substituting for V from (1.7.7), we must solve
$$\frac{dA}{dt} + \frac{1}{t + 4}A = 8. \tag{1.7.8}$$
This first-order linear equation has integrating factor
$$I = e^{\int 1/(t + 4)\,dt} = t + 4.$$
Consequently (1.7.8) can be written in the equivalent form
$$\frac{d}{dt}[(t + 4)A] = 8(t + 4),$$
which can be integrated directly to obtain
$$(t + 4)A = 4(t + 4)^2 + c.$$
Hence
$$A(t) = \frac{1}{t + 4}[4(t + 4)^2 + c].$$
Imposing the given initial condition A(0) = 32 g implies that c = 64. Consequently
$$A(t) = \frac{4}{t + 4}[(t + 4)^2 + 16].$$
Setting t = 20 gives us the values for parts 1 and 2:
1. We have
$$A(20) = \frac{1}{6}[(24)^2 + 16] = \frac{296}{3}\ \text{g}.$$
2. Furthermore, using (1.7.7),
$$\frac{A(20)}{V(20)} = \frac{1}{48} \cdot \frac{296}{3} = \frac{37}{18}\ \text{g/L}.$$
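
The arithmetic in Example 1.7.1 can be reproduced exactly with rational numbers, and the closed form can be compared with a direct numerical integration of (1.7.8). The sketch below is one hedged way to do so; the classical fourth-order Runge–Kutta integrator and its step count are choices made for this illustration, not something the text prescribes.

```python
from fractions import Fraction


def amount(t):
    """Closed form A(t) = 4/(t + 4) * ((t + 4)^2 + 16), in exact arithmetic."""
    t = Fraction(t)
    return Fraction(4) / (t + 4) * ((t + 4) ** 2 + 16)


def volume(t):
    """V(t) = 2(t + 4) from (1.7.7)."""
    return 2 * (Fraction(t) + 4)


def rhs(t, a):
    """Right-hand side of (1.7.8) rewritten as dA/dt = 8 - A/(t + 4)."""
    return 8.0 - a / (t + 4.0)


def rk4(a0, t_end, steps=2000):
    """Integrate dA/dt = rhs(t, A) from t = 0 with classical RK4."""
    h = t_end / steps
    t, a = 0.0, a0
    for _ in range(steps):
        k1 = rhs(t, a)
        k2 = rhs(t + h / 2, a + h * k1 / 2)
        k3 = rhs(t + h / 2, a + h * k2 / 2)
        k4 = rhs(t + h, a + h * k3)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return a


print(amount(20), amount(20) / volume(20))  # exact fractions
print(rk4(32.0, 20.0), float(amount(20)))   # numerical vs closed form
```

Using `Fraction` keeps the values as exact ratios, so they can be compared directly with the fractions obtained in parts 1 and 2.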
Electric Circuits
An important application of differential equations arises from the analysis of simple
electric circuits. The most basic electric circuit is obtained by connecting the ends of
a wire to the terminals of a battery or generator. This causes a ﬂow of charge, q(t),
measured in coulombs (C), through the wire, thereby producing a current, i(t), measured
in amperes (A), deﬁned to be the rate of change of charge. Thus,
$$i(t) = \frac{dq}{dt}. \tag{1.7.9}$$
In practice a circuit will contain several components that oppose the flow of charge. As current passes through these components, work has to be done, and the loss of energy is described by the resulting voltage drop across each component. For the circuits that we will consider, the behavior of the current in the circuit is governed by Kirchhoff's second law, which can be stated as follows.
Kirchhoff's Second Law: The sum of the voltage drops around a closed circuit is zero.
In order to apply this law we need to know the relationship between the current
passing through each component in the circuit and the resulting voltage drop. The components of interest to us are resistors, capacitors, and inductors. We brieﬂy describe each
of these next.
1. Resistors: A resistor is a component that, owing to its constituency, directly resists
the ﬂow of charge through it. According to Ohm’s law, the voltage drop, VR ,
between the ends of a resistor is directly proportional to the current that is passing
through it. This is expressed mathematically as
$$V_R = iR, \tag{1.7.10}$$
where the constant of proportionality, R, is called the resistance of the resistor. The units of resistance are ohms (Ω).
2. Capacitors: A capacitor can be thought of as a component that stores charge and thereby opposes the passage of current. If q(t) denotes the charge on the capacitor at time t, then the drop in voltage, $V_C$, as current passes through it is directly proportional to q(t). It is usual to express this law in the form
$$V_C = \frac{1}{C}q, \tag{1.7.11}$$
where the constant C is called the capacitance of the capacitor. The units of capacitance are farads (F).
3. Inductors: The third component that is of interest to us is an inductor. This can be considered as a component that opposes any change in the current flowing through it. The drop in voltage as current passes through an inductor is directly proportional to the rate at which the current is changing. We write this as
$$V_L = L\frac{di}{dt}, \tag{1.7.12}$$
where the constant L is called the inductance of the inductor, measured in units of henrys (H).
4. EMF: The final component in our circuits will be a source of voltage that produces an electromotive force (EMF), driving the charge through the circuit. As current passes through the voltage source, there is a voltage gain, which we denote by E(t) volts (that is, a voltage drop of −E(t) volts).

[Figure 1.7.2: A simple RLC circuit, showing the current i(t), the EMF E(t), inductance L, capacitance C, resistance R, and a switch.]

A circuit containing all of these components is shown in Figure 1.7.2. Such a circuit is called an RLC circuit. According to Kirchhoff's second law, the sum of the voltage drops at any instant must be zero. Applying this to the RLC circuit in Figure 1.7.2, we obtain
$$V_R + V_C + V_L - E(t) = 0. \tag{1.7.13}$$
Substituting into Equation (1.7.13) from (1.7.10)–(1.7.12) and rearranging yields the basic differential equation for an RLC circuit, namely,
$$L\frac{di}{dt} + Ri + \frac{q}{C} = E(t). \tag{1.7.14}$$
Three cases are important in applications, two of which are governed by first-order
linear differential equations.
Case 1: An RL CIRCUIT. In the case when no capacitor is present, we have what is
referred to as an RL circuit. The differential equation (1.7.14) then reduces to
\[ \frac{di}{dt} + \frac{R}{L}\, i = \frac{1}{L} E(t). \tag{1.7.15} \]
This is a first-order linear differential equation for the current in the circuit at any time $t$.
Case 2: An RC CIRCUIT. Now consider the case when no inductor is present in the
circuit. Setting L = 0 in Equation (1.7.14) yields
\[ i + \frac{1}{RC}\, q = \frac{E}{R}. \]
In this equation we have two unknowns, $q(t)$ and $i(t)$. Substituting from (1.7.9) for
$i(t) = dq/dt$, we obtain the following differential equation for $q(t)$:
\[ \frac{dq}{dt} + \frac{1}{RC}\, q = \frac{E}{R}. \tag{1.7.16} \]
In this case, the first-order linear differential equation (1.7.16) can be solved for the
charge $q(t)$ on the plates of the capacitor. The current in the circuit can then be obtained
from
\[ i(t) = \frac{dq}{dt} \]
by differentiation.
Case 3: An RLC CIRCUIT. In the general case, we must consider all three components to be present in the circuit. Substituting from Equation (1.7.9) into Equation (1.7.14) yields the following differential equation for determining the charge on the capacitor:
\[ \frac{d^2 q}{dt^2} + \frac{R}{L} \frac{dq}{dt} + \frac{1}{LC}\, q = \frac{1}{L} E(t). \]
We will develop techniques in Chapter 6 that enable us to solve this differential equation
without difﬁculty.
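
The RL equation (1.7.15) can also be explored numerically before any closed-form solution is found. The following is a minimal sketch, not part of the text's method: it integrates (1.7.15) with the classical fourth-order Runge-Kutta scheme. The function names `rl_rhs` and `rk4_solve` and the component values are assumptions chosen only for illustration. For a constant EMF, setting di/dt = 0 in (1.7.15) gives the limiting current E0/R, which the computed value should approach.

```python
def rl_rhs(t, i, R, L, emf):
    """Right-hand side of (1.7.15) solved for di/dt: (E(t) - R*i) / L."""
    return (emf(t) - R * i) / L


def rk4_solve(rhs, t0, y0, t_end, steps):
    """Integrate dy/dt = rhs(t, y) from t0 to t_end with classical RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Illustrative (assumed) values: R in ohms, L in henrys, E0 in volts.
R, L, E0 = 2.0, 0.5, 10.0
i_end = rk4_solve(lambda t, i: rl_rhs(t, i, R, L, lambda s: E0),
                  t0=0.0, y0=0.0, t_end=5.0, steps=5000)
print(i_end, E0 / R)  # the two numbers should agree closely
```

Passing a different `emf` function, such as one returning E0 cos ωt, reuses the same integrator for the alternating case.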
For the remainder of this section we restrict our attention to RL and RC circuits. Since
both are governed by first-order linear differential equations, we can solve them using the technique derived in the previous section, once the applied EMF, $E(t)$, has been specified.
The two most important forms for $E(t)$ are
\[ E(t) = E_0 \quad \text{and} \quad E(t) = E_0 \cos \omega t, \]
where $E_0$ and $\omega$ are constants. The first of these corresponds to a source of EMF such
as a battery. The resulting current is called a direct current (DC). The second form of
EMF oscillates between $\pm E_0$, and the resulting current is called an alternating current (AC).
Example 1.7.2 Determine the current in an RL circuit if the applied EMF is $E(t) = E_0 \cos \omega t$, where
$E_0$ and $\omega$ are constants, and the initial current is zero.

Solution: Substituting into Equation (1.7.15) for $E(t)$ yields the differential equation
\[ \frac{di}{dt} + \frac{R}{L}\, i = \frac{E_0}{L} \cos \omega t, \]
which we write as
\[ \frac{di}{dt} + ai = \frac{E_0}{L} \cos \omega t, \tag{1.7.17} \]
where $a = R/L$. An integrating factor for (1.7.17) is $I(t) = e^{at}$, so that the equation
can be written in the equivalent form
\[ \frac{d}{dt}\left(e^{at} i\right) = \frac{E_0}{L}\, e^{at} \cos \omega t. \]
Integrating this equation using the standard integral
\[ \int e^{at} \cos \omega t \, dt = \frac{1}{a^2 + \omega^2}\, e^{at} (a \cos \omega t + \omega \sin \omega t) + c, \]
we obtain
\[ e^{at} i = \frac{E_0}{L(a^2 + \omega^2)}\, e^{at} (a \cos \omega t + \omega \sin \omega t) + c, \]
where $c$ is an integration constant. Consequently,
\[ i(t) = \frac{E_0}{L(a^2 + \omega^2)} (a \cos \omega t + \omega \sin \omega t) + c e^{-at}. \]
Imposing the initial condition $i(0) = 0$, we find
\[ c = -\frac{E_0 a}{L(a^2 + \omega^2)}, \]
so that
\[ i(t) = \frac{E_0}{L(a^2 + \omega^2)} \left(a \cos \omega t + \omega \sin \omega t - a e^{-at}\right). \tag{1.7.18} \]
This solution can be written in the form
\[ i(t) = i_S(t) + i_T(t), \]
where
\[ i_S(t) = \frac{E_0}{L(a^2 + \omega^2)} (a \cos \omega t + \omega \sin \omega t), \qquad i_T(t) = -\frac{a E_0}{L(a^2 + \omega^2)}\, e^{-at}. \]
The term $i_T(t)$ decays exponentially with time and is referred to as the transient part
of the solution. As $t \to \infty$, the solution (1.7.18) approaches the steady-state solution,
$i_S(t)$. The steady-state solution can be written in a more illuminating form as follows.
If we construct the right-angled triangle (see Figure 1.7.3) with sides $a$ and $\omega$, then the
hypotenuse of the triangle is $\sqrt{a^2 + \omega^2}$. Consequently, there exists a unique angle $\phi$ in
$(0, \pi/2)$ such that
\[ \cos \phi = \frac{a}{\sqrt{a^2 + \omega^2}}, \qquad \sin \phi = \frac{\omega}{\sqrt{a^2 + \omega^2}}. \]

[Figure 1.7.3: Defining the phase angle for an RL circuit.]

Equivalently,
\[ a = \sqrt{a^2 + \omega^2}\, \cos \phi, \qquad \omega = \sqrt{a^2 + \omega^2}\, \sin \phi. \]
Substituting for $a$ and $\omega$ into the expression for $i_S$ yields
\[ i_S(t) = \frac{E_0}{L\sqrt{a^2 + \omega^2}} (\cos \omega t \cos \phi + \sin \omega t \sin \phi), \]
which can be written, using an appropriate trigonometric identity, as
\[ i_S(t) = \frac{E_0}{L\sqrt{a^2 + \omega^2}} \cos(\omega t - \phi). \]
This is referred to as the phase-amplitude form of the solution. Comparing this with the
original driving term, $E_0 \cos \omega t$, we see that the system has responded with a steady-state solution having the same periodic behavior, but with a phase shift of $\phi$ radians.
Furthermore, the amplitude of the response is
\[ A = \frac{E_0}{L\sqrt{a^2 + \omega^2}} = \frac{E_0}{\sqrt{R^2 + \omega^2 L^2}}, \tag{1.7.19} \]
where we have substituted for $a = R/L$. This is illustrated in Figure 1.7.4.

[Figure 1.7.4: The response of an RL circuit to the driving term $E(t) = E_0 \cos \omega t$.]

[Figure 1.7.5: The transient part of the solution for an RL circuit dies out as t increases.]

The general
picture that we have, therefore, is that the transient part of the solution affects $i(t)$ for a
short period of time, after which the current settles into a steady state. In the case when
the driving EMF has the form $E(t) = E_0 \cos \omega t$, the steady state is a phase shift of
this driving EMF with an amplitude given in Equation (1.7.19). This general behavior is
illustrated in Figure 1.7.5.
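
The formulas of Example 1.7.2 can be checked numerically. The sketch below evaluates the current (1.7.18), its steady-state and transient parts, and the amplitude and phase from (1.7.19). The function names and the sample component values are assumptions made for illustration; the phase is computed from tan φ = ω/a = ωL/R, which follows from the expressions for cos φ and sin φ above.

```python
import math


def rl_ac_current(t, R, L, E0, omega):
    """Return (i, i_S, i_T) from (1.7.18) for E(t) = E0*cos(omega*t), i(0) = 0."""
    a = R / L
    k = E0 / (L * (a**2 + omega**2))
    i_steady = k * (a * math.cos(omega * t) + omega * math.sin(omega * t))
    i_transient = -k * a * math.exp(-a * t)
    return i_steady + i_transient, i_steady, i_transient


def phase_amplitude(R, L, E0, omega):
    """Return (A, phi): amplitude from (1.7.19) and phase angle in (0, pi/2)."""
    amplitude = E0 / math.sqrt(R**2 + omega**2 * L**2)
    phi = math.atan2(omega * L, R)  # tan(phi) = omega / a = omega * L / R
    return amplitude, phi


# Illustrative (assumed) values: R in ohms, L in henrys, E0 in volts, omega in rad/s.
R, L, E0, omega = 2.0, 0.5, 10.0, 3.0
A, phi = phase_amplitude(R, L, E0, omega)
for t in (0.0, 0.5, 2.0, 5.0):
    i, i_s, i_t = rl_ac_current(t, R, L, E0, omega)
    # The last two columns compare i_S(t) with A*cos(omega*t - phi).
    print(t, i, i_t, i_s, A * math.cos(omega * t - phi))
```

The transient column shrinks toward zero as t grows, while the last two columns agree at every t, which is the content of the phase-amplitude form.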
Our next example illustrates the procedure for solving the differential equation
(1.7.16) governing the behavior of an RC circuit.
Example 1.7.3 Consider the RC circuit in which $R = 0.5$ Ω, $C = 0.1$ F, and $E_0 = 20$ V. Given that the
capacitor has zero initial charge, determine the current in the circuit after 0.25 seconds.

Solution: In this case we first solve Equation (1.7.16) for $q(t)$ and then determine
the current in the circuit by differentiating the result. Substituting for $R$, $C$, and $E$ into
Equation (1.7.16) yields
\[ \frac{dq}{dt} + 20q = 40, \]
which has general solution
\[ q(t) = 2 + c e^{-20t}, \]
where $c$ is an integration constant. Imposing the initial condition $q(0) = 0$ yields $c = -2$,
so that
\[ q(t) = 2(1 - e^{-20t}). \]
Differentiating this expression for $q$ gives the current in the circuit
\[ i(t) = \frac{dq}{dt} = 40 e^{-20t}. \]
Consequently,
\[ i(0.25) = 40 e^{-5} \approx 0.27 \text{ A}. \]
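
As a check on Example 1.7.3, the sketch below evaluates the charge and current for an RC circuit with constant EMF. It assumes the closed form q(t) = CE0 + (q0 − CE0)e^{−t/(RC)}, obtained by solving (1.7.16) with q(0) = q0 in the same way as in the example; the function name `rc_constant_emf` is hypothetical.

```python
import math


def rc_constant_emf(t, R, C, E0, q0=0.0):
    """Charge and current for dq/dt + q/(RC) = E0/R with q(0) = q0."""
    rate = 1.0 / (R * C)
    q_inf = C * E0  # equilibrium charge, where dq/dt = 0
    decay = math.exp(-rate * t)
    q = q_inf + (q0 - q_inf) * decay
    i = -rate * (q0 - q_inf) * decay  # i = dq/dt
    return q, i


# Data of Example 1.7.3: R = 0.5 ohm, C = 0.1 F, E0 = 20 V, q(0) = 0.
q, i = rc_constant_emf(0.25, R=0.5, C=0.1, E0=20.0)
print(round(i, 2))  # current in amperes after 0.25 s
```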
Exercises for 1.7

Key Terms
Mixing problem, Concentration, Electric circuit, Kirchhoff's
second law, Resistor, Capacitor, Inductor, Electromotive
force (EMF), RL circuit, RC circuit, RLC circuit, Direct
current, Alternating current, Transient solution, Steady-state
solution, Phase, Amplitude.

Skills
• Be able to use information about a mixing problem to
provide the correct mathematical formulation of the
problem.
• Be able to solve mixing problems by deriving and solving the differential equation (1.7.6) for a speciﬁc mixing problem and using initial conditions.
• Know the relationship between the charge and the current in an electric circuit.
• Be familiar with the basic components of an electric
circuit, such as electromotive force, resistors, capacitors, and inductors.
• Be able to write down and solve the differential equation for the current in an RL circuit and for the charge
in an RC circuit, for either a direct current or an alternating current.
• Be able to identify the transient and steady-state components of current in an electric circuit with an alternating current.
• Be able to put the steady-state component of the current in an RL circuit in phase-amplitude form, and
identify the phase shift and the amplitude.

True-False Review
For Questions 1–8, decide if the given statement is true or
false, and give a brief justiﬁcation for your answer. If true,
you can quote a relevant deﬁnition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. The amount of chemical A(t) in a tank at time t is obtained by multiplying the concentration of chemical
c(t) in the tank at time t by the volume of the solution,
V (t), at time t .
2. If r1 and r2 denote the rates at which ﬂuid is ﬂowing
into a tank and out of the tank, respectively, then the
rate of change of the volume of the tank is r2 − r1.
3. For the mixing problems described in this section, we
assume that the concentration of the chemical entering
the tank is independent of time.
4. For the mixing problems described in this section, we
assume that the concentration of the chemical leaving
the tank is independent of time.
5. Kirchhoff's second law states that the sum of the voltage
drops around a closed circuit is independent of time.
6. The larger the resistance in a resistor, the greater the
voltage drop between the ends of the resistor.
7. Given an alternating current in an RL circuit, the transient part of the current decays to zero with time, while
the steady-state part of the current oscillates with the
same frequency as the applied EMF.
8. The higher the frequency of an applied EMF in an
RL circuit, the lower the amplitude of the steady-state
current.

Problems
1. A container initially contains 10 L of water in which
there is 20 g of salt dissolved. A solution containing
4 g/L of salt is pumped into the container at a rate
of 2 L/min, and the well-stirred mixture runs out at a
rate of 1 L/min. How much salt is in the tank after 40
minutes?
2. A tank initially contains 600 L of solution in which
there is dissolved 1500 g of chemical. A solution containing 5 g/L of the chemical ﬂows into the tank at a
rate of 6 L/min, and the well-stirred mixture ﬂows out
at a rate of 3 L/min. Determine the concentration of
chemical in the tank after one hour.
3. A tank whose volume is 40 L initially contains 20 L of
water. A solution containing 10 g/L of salt is pumped
into the tank at a rate of 4 L/min, and the well-stirred
mixture ﬂows out at a rate of 2 L/min. How much salt
is in the tank just before the solution overﬂows?
4. A tank whose volume is 200 L is initially half full of
a solution that contains 100 g of chemical. A solution
containing 0.5 g/L of the same chemical ﬂows into the
tank at a rate of 6 L/min, and the well-stirred mixture
ﬂows out at a rate of 4 L/min. Determine the concentration of chemical in the tank just before the solution
overflows.

5. A tank initially contains 10 L of a salt solution. Water flows into the tank at a rate of 3 L/min, and the well-stirred mixture flows out at a rate of 2 L/min. After 5 min, the concentration of salt in the tank is 0.2 g/L. Find:
(a) The amount of salt in the tank initially.
(b) The volume of solution in the tank when the concentration of salt is 0.1 g/L.

6. A tank initially contains 20 L of water. A solution containing 1 g/L of chemical flows into the tank at a rate of 3 L/min, and the mixture flows out at a rate of 2 L/min.
(a) Set up and solve the initial-value problem for $A(t)$, the amount of chemical in the tank at time $t$.
(b) When does the concentration of chemical in the tank reach 0.5 g/L?

7. A tank initially contains $w$ liters of a solution in which is dissolved $A_0$ grams of chemical. A solution containing $k$ g/L of this chemical flows into the tank at a rate of $r$ L/min, and the mixture flows out at the same rate.
(a) Show that the amount of chemical, $A(t)$, in the tank at time $t$ is
\[ A(t) = e^{-(rt)/w} \left[ kw \left( e^{(rt)/w} - 1 \right) + A_0 \right]. \]
(b) Show that as $t \to \infty$, the concentration of chemical in the tank approaches $k$ g/L. Is this result reasonable? Explain.

8. Consider the double mixing problem depicted in Figure 1.7.6.

[Figure 1.7.6: Double mixing problem, with two tanks containing $A_1$ and $A_2$ and flows labeled $r_1, c_1$; $r_2, c_2$; and $r_3, c_3$.]

(a) Show that the following are differential equations for $A_1(t)$ and $A_2(t)$:
\[ \frac{dA_1}{dt} + \frac{r_2}{(r_1 - r_2)t + V_1}\, A_1 = c_1 r_1, \]
\[ \frac{dA_2}{dt} + \frac{r_3}{(r_2 - r_3)t + V_2}\, A_2 = \frac{r_2 A_1}{(r_1 - r_2)t + V_1}, \]
where $V_1$ and $V_2$ are constants.
(b) Let $r_1 = 6$ L/min, $r_2 = 4$ L/min, $r_3 = 3$ L/min, and $c_1 = 0.5$ g/L. If the first tank initially holds 40 L of water in which 4 grams of chemical is dissolved, whereas the second tank initially contains 20 g of chemical dissolved in 20 L of water, determine the amount of chemical in the second tank after 10 min.

9. Consider the RL circuit in which $R = 4$ Ω, $L = 0.1$ H, and $E(t) = 20$ V. If no current is flowing initially, determine the current in the circuit for $t \ge 0$.

10. Consider the RC circuit which has $R = 5$ Ω, $C = \frac{1}{50}$ F, and $E(t) = 100$ V. If the capacitor is uncharged initially, determine the current in the circuit for $t \ge 0$.

11. An RL circuit has EMF $E(t) = 10 \sin 4t$ V. If $R = 2$ Ω, $L = \frac{2}{3}$ H, and there is no current flowing initially, determine the current for $t \ge 0$.

12. Consider the RC circuit with $R = 2$ Ω, $C = \frac{1}{8}$ F, and $E(t) = 10 \cos 3t$ V. If $q(0) = 1$ C, determine the current in the circuit for $t \ge 0$.

13. Consider the general RC circuit with $E(t) = 0$. Suppose that $q(0) = 5$ C. Determine the charge on the capacitor for $t > 0$. What happens as $t \to \infty$? Is this reasonable? Explain.

14. Determine the current in an RC circuit if the capacitor has zero charge initially and the driving EMF is $E = E_0$, where $E_0$ is a constant. Make a sketch showing the change in the charge $q(t)$ on the capacitor with time and show that $q(t)$ approaches a constant value as $t \to \infty$. What happens to the current in the circuit as $t \to \infty$?

15. Determine the current flowing in an RL circuit if the applied EMF is $E(t) = E_0 \sin \omega t$, where $E_0$ and $\omega$ are constants. Identify the transient part of the solution and the steady-state solution.

16. Determine the current flowing in an RL circuit if the applied EMF is constant and the initial current is zero.

17. Determine the current flowing in an RC circuit if the capacitor is initially uncharged and the driving EMF is given by $E(t) = E_0 e^{-at}$, where $E_0$ and $a$ are constants.

18. Consider the special case of the RLC circuit in which the resistance is negligible and the driving EMF is zero. The differential equation governing the charge on the capacitor in this case is
\[ \frac{d^2 q}{dt^2} + \frac{1}{LC}\, q = 0. \]
If the capacitor has an initial charge of $q_0$ coulombs, and no current is flowing initially, determine the charge on the capacitor for $t > 0$, and the corresponding current in the circuit. [Hint: Let $u = dq/dt$ and use the chain rule to show that this implies $du/dt = u(du/dq)$.]

19. Repeat the previous problem for the case in which the driving EMF is $E(t) = E_0$, a constant.

1.8 Change of Variables
So far we have introduced techniques for solving separable and ﬁrst-order linear differential equations. Clearly, most ﬁrst-order differential equations are not of these two
types. In this section, we consider two further types of differential equations that can be
solved by using a change of variables to reduce them to one of the types we know how
to solve. The key point to grasp, however, is not the speciﬁc changes of variables that
we discuss, but the general idea of changing variables in a differential equation. Further
examples are considered in the exercises. We first require a preliminary definition.

DEFINITION 1.8.1
A function $f(x, y)$ is said to be homogeneous of degree zero [7] if
\[ f(tx, ty) = f(x, y) \]
for all positive values of $t$ for which $(tx, ty)$ is in the domain of $f$.

[7] More generally, $f(x, y)$ is said to be homogeneous of degree $m$ if $f(tx, ty) = t^m f(x, y)$.

Remark Equivalently, we can say that $f$ is homogeneous of degree zero if it is
invariant under a rescaling of the variables $x$ and $y$.
The simplest nonconstant functions that are homogeneous of degree zero are
$f(x, y) = y/x$ and $f(x, y) = x/y$.

Example 1.8.2 If
\[ f(x, y) = \frac{x^2 - y^2}{2xy + y^2}, \]
then
\[ f(tx, ty) = \frac{t^2 (x^2 - y^2)}{t^2 (2xy + y^2)} = f(x, y), \]
so that $f$ is homogeneous of degree zero.

In the previous example, if we factor an $x^2$ term from the numerator and denominator,
then the function $f$ can be written in the form
\[ f(x, y) = \frac{x^2 [1 - (y/x)^2]}{x^2 [2(y/x) + (y/x)^2]}. \]
That is,
\[ f(x, y) = \frac{1 - (y/x)^2}{2(y/x) + (y/x)^2}. \]
Thus $f$ can be considered to depend on the single variable $V = y/x$. The following
theorem establishes that this is a basic property of all functions that are homogeneous of
degree zero.
Theorem 1.8.3 A function $f(x, y)$ is homogeneous of degree zero if and only if it depends on $y/x$ only.

Proof Suppose that $f$ is homogeneous of degree zero. We must consider two cases
separately.
(a) If x > 0, we can take t = 1/x in Deﬁnition 1.8.1 to obtain
f (x, y) = f (1, y/x),
which is a function of V = y/x only.
(b) If x < 0, then we can take t = −1/x in Deﬁnition 1.8.1. In this case we obtain
f (x, y) = f (−1, −y/x),
which once more depends on y/x only.
Conversely, suppose that f (x, y) depends only on y/x . If we replace x by tx and y
by ty, then f is unaltered, since y/x = (ty)/(tx), and hence is homogeneous of degree
zero.

Remark Do not memorize the formulas in the preceding theorem. Just remember that
a function f (x, y) that is homogeneous of degree zero depends only on the combination
y/x and hence can be considered as a function of a single variable, say, V , where
V = y/x .
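
Definition 1.8.1 and Theorem 1.8.3 lend themselves to a quick numerical sanity check. The sketch below is an illustration only: the helper `is_homogeneous_degree_zero` is a hypothetical name, the sample points and scale factors are arbitrary, and passing the check is evidence rather than a proof. It tests the function of Example 1.8.2, compares it with its single-variable form F(V), and contrasts it with y²/(x + y²), which is not invariant under rescaling.

```python
import math


def is_homogeneous_degree_zero(f, points, scales, rel_tol=1e-9):
    """Check f(t*x, t*y) == f(x, y) at sample points for sample t > 0."""
    return all(
        math.isclose(f(t * x, t * y), f(x, y), rel_tol=rel_tol)
        for (x, y) in points
        for t in scales
    )


def f_example(x, y):
    """The function of Example 1.8.2."""
    return (x**2 - y**2) / (2 * x * y + y**2)


def F_example(V):
    """The same function written in terms of V = y/x."""
    return (1 - V**2) / (2 * V + V**2)


def g_counter(x, y):
    """A function that is not homogeneous of degree zero."""
    return y**2 / (x + y**2)


points = [(1.0, 2.0), (3.0, 1.0), (-2.0, 5.0)]
scales = [0.5, 2.0, 7.0]
print(is_homogeneous_degree_zero(f_example, points, scales))
print(is_homogeneous_degree_zero(g_counter, points, scales))
print(all(math.isclose(f_example(x, y), F_example(y / x)) for x, y in points))
```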
We now consider solving differential equations that satisfy the following definition.

DEFINITION 1.8.4
If $f(x, y)$ is homogeneous of degree zero, then the differential equation
\[ \frac{dy}{dx} = f(x, y) \]
is called a homogeneous first-order differential equation.

In general, if
\[ \frac{dy}{dx} = f(x, y) \]
is a homogeneous first-order differential equation, then we cannot solve it directly. However, our preceding discussion implies that such a differential equation can be written in
the equivalent form
\[ \frac{dy}{dx} = F(y/x), \tag{1.8.1} \]
for an appropriate function $F$. This suggests that, instead of using the variables $x$ and $y$,
we should use the variables $x$ and $V$, where $V = y/x$, or equivalently,
\[ y = xV(x). \tag{1.8.2} \]
Substitution of (1.8.2) into the right-hand side of Equation (1.8.1) has the effect of
reducing it to a function of V only. We must also determine how the derivative term
dy/dx transforms. Differentiating (1.8.2) with respect to x using the product rule yields
the following relationship between dy/dx and dV /dx :
\[ \frac{dy}{dx} = x \frac{dV}{dx} + V. \]
Substituting into Equation (1.8.1), we therefore obtain
\[ x \frac{dV}{dx} + V = F(V), \]
or equivalently,
\[ x \frac{dV}{dx} = F(V) - V. \]
The variables can now be separated to yield
\[ \frac{1}{F(V) - V}\, dV = \frac{1}{x}\, dx, \]
which can be solved directly by integration. We have therefore established the next
theorem.
Theorem 1.8.5 The change of variables $y = xV(x)$ reduces a homogeneous first-order differential
equation $dy/dx = f(x, y)$ to the separable equation
\[ \frac{1}{F(V) - V}\, dV = \frac{1}{x}\, dx. \]

Remark The separable equation that results in the previous technique can be integrated to obtain a relationship between $V$ and $x$. We then obtain the solution to the given
differential equation by substituting $y/x$ for $V$ in this relationship.

Example 1.8.6 Find the general solution to
\[ \frac{dy}{dx} = \frac{4x + y}{x - 4y}. \tag{1.8.3} \]

Solution: The function on the right-hand side of Equation (1.8.3) is homogeneous of
degree zero, so that we have a first-order homogeneous differential equation. Substituting
$y = xV$ into the equation yields
\[ \frac{d}{dx}(xV) = \frac{4 + V}{1 - 4V}. \]
That is,
\[ x \frac{dV}{dx} + V = \frac{4 + V}{1 - 4V}, \]
or equivalently,
\[ x \frac{dV}{dx} = \frac{4(1 + V^2)}{1 - 4V}. \]
Separating the variables gives
\[ \frac{1 - 4V}{4(1 + V^2)}\, dV = \frac{1}{x}\, dx. \]
We write this as
\[ \left[ \frac{1}{4(1 + V^2)} - \frac{V}{1 + V^2} \right] dV = \frac{1}{x}\, dx, \]
which can be integrated directly to obtain
\[ \frac{1}{4} \arctan V - \frac{1}{2} \ln(1 + V^2) = \ln|x| + c. \]
Substituting $V = y/x$ and multiplying through by 2 yields
\[ \frac{1}{2} \arctan\left(\frac{y}{x}\right) - \ln\left(\frac{x^2 + y^2}{x^2}\right) = \ln(x^2) + c_1, \]
which simplifies to
\[ \frac{1}{2} \arctan\left(\frac{y}{x}\right) - \ln(x^2 + y^2) = c_1. \tag{1.8.4} \]
Although this technically gives the answer, the solution is more easily expressed in terms
of polar coordinates:
\[ x = r \cos\theta \text{ and } y = r \sin\theta \iff r = \sqrt{x^2 + y^2} \text{ and } \theta = \arctan\left(\frac{y}{x}\right). \]
Substituting into Equation (1.8.4) yields
\[ \frac{1}{2}\theta - \ln(r^2) = c_1, \]
or equivalently,
\[ \ln r = \frac{1}{4}\theta + c_2. \]
Exponentiating both sides of this equation gives
\[ r = c_3 e^{\theta/4}. \]
For each value of $c_3$, this is the equation of a logarithmic spiral. The particular spiral
with equation $r = \frac{1}{2} e^{\theta/4}$ is shown in Figure 1.8.1.
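
The polar form of the solution can be checked against the original equation (1.8.3). The sketch below is an illustrative verification, not part of the text's derivation: it parametrizes the spiral r = c3 e^{θ/4} by θ, estimates dy/dx along the curve with a central difference, and compares the result with (4x + y)/(x − 4y). The function names, the step size `h`, and the sample angles are assumptions; angles where x − 4y = 0 are avoided.

```python
import math


def spiral_point(theta, c3=0.5):
    """Cartesian point on the spiral r = c3 * exp(theta / 4)."""
    r = c3 * math.exp(theta / 4)
    return r * math.cos(theta), r * math.sin(theta)


def slope_along_spiral(theta, c3=0.5, h=1e-6):
    """Central-difference estimate of dy/dx along the spiral at angle theta."""
    x_plus, y_plus = spiral_point(theta + h, c3)
    x_minus, y_minus = spiral_point(theta - h, c3)
    return (y_plus - y_minus) / (x_plus - x_minus)


def ode_rhs(x, y):
    """Right-hand side of Equation (1.8.3)."""
    return (4 * x + y) / (x - 4 * y)


for theta in (0.0, 1.0, 2.5):
    x, y = spiral_point(theta)
    # The two printed slopes should agree to several digits.
    print(theta, slope_along_spiral(theta), ode_rhs(x, y))
```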
[Figure 1.8.1: Graph of the logarithmic spiral with polar equation $r = \frac{1}{2} e^{\theta/4}$, $-5\pi/6 \le \theta \le 22\pi/6$.]

Example 1.8.7 Find the equation of the orthogonal trajectories to the family
\[ x^2 + y^2 - 2cx = 0. \tag{1.8.5} \]
(Completing the square in $x$, we obtain $(x - c)^2 + y^2 = c^2$, which represents the family
of circles centered at $(c, 0)$, with radius $c$.)

Solution: First we need an expression for the slope of the given family at the point
$(x, y)$. Differentiating Equation (1.8.5) implicitly with respect to $x$ yields
\[ 2x + 2y \frac{dy}{dx} - 2c = 0, \]
which simplifies to
\[ \frac{dy}{dx} = \frac{c - x}{y}. \tag{1.8.6} \]
This is not the differential equation of the given family, since it still contains the constant
$c$ and hence is dependent on the individual curves in the family. Therefore, we must
eliminate $c$ to obtain an expression for the slope of the family that is independent of any
particular curve in the family. From Equation (1.8.5) we have
\[ c = \frac{x^2 + y^2}{2x}. \]
Substituting this expression for $c$ into Equation (1.8.6) and simplifying gives
\[ \frac{dy}{dx} = \frac{y^2 - x^2}{2xy}. \]
Therefore, the differential equation for the family of orthogonal trajectories is
\[ \frac{dy}{dx} = -\frac{2xy}{y^2 - x^2}. \tag{1.8.7} \]
This differential equation is first-order homogeneous. Substituting $y = xV(x)$ into
Equation (1.8.7) yields
\[ \frac{d}{dx}(xV) = \frac{2V}{1 - V^2}, \]
so that
\[ x \frac{dV}{dx} + V = \frac{2V}{1 - V^2}. \]
Hence
\[ x \frac{dV}{dx} = \frac{V + V^3}{1 - V^2}, \]
or in separated form,
\[ \frac{1 - V^2}{V(1 + V^2)}\, dV = \frac{1}{x}\, dx. \]
Decomposing the left-hand side into partial fractions yields
\[ \left[ \frac{1}{V} - \frac{2V}{1 + V^2} \right] dV = \frac{1}{x}\, dx, \]
which can be integrated directly to obtain
\[ \ln|V| - \ln(1 + V^2) = \ln|x| + c, \]
or equivalently,
\[ \ln\left( \frac{|V|}{1 + V^2} \right) = \ln|x| + c. \]
Exponentiating both sides and redefining the constant yields
\[ \frac{V}{1 + V^2} = c_1 x. \]
Substituting back for $V = y/x$, we obtain
\[ \frac{xy}{x^2 + y^2} = c_1 x. \]
That is,
\[ x^2 + y^2 = c_2 y, \]
where $c_2 = 1/c_1$. Completing the square in $y$ yields
\[ x^2 + (y - k)^2 = k^2, \tag{1.8.8} \]
where $k = c_2/2$. Equation (1.8.8) is the equation of the family of orthogonal trajectories.
This is the family of circles centered at $(0, k)$ with radius $k$ (circles along the $y$-axis).
(See Figure 1.8.2.)
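
The orthogonality claimed in Example 1.8.7 can be confirmed at individual points. In the sketch below (the function name `slopes_at` and the test points are assumptions for illustration), the member of each family through a point (x, y) is identified from (1.8.5) and from x² + y² = 2ky, the two slopes are computed by implicit differentiation, and their product is compared with −1. Points with x = 0, y = 0, or y² = x² are avoided, since a slope is undefined or infinite there.

```python
import math


def slopes_at(x, y):
    """Slopes at (x, y) of the two circles, one from each family, through it."""
    c = (x**2 + y**2) / (2 * x)  # member of (x - c)^2 + y^2 = c^2 through (x, y)
    k = (x**2 + y**2) / (2 * y)  # member of x^2 + (y - k)^2 = k^2 through (x, y)
    m_given = (c - x) / y        # Equation (1.8.6)
    m_orth = -x / (y - k)        # from 2x + 2(y - k) dy/dx = 0
    return m_given, m_orth


for point in [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)]:
    m1, m2 = slopes_at(*point)
    print(point, m1 * m2)  # each product should be -1 up to rounding
```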
[Figure 1.8.2: The family $(x - c)^2 + y^2 = c^2$ and its orthogonal trajectories $x^2 + (y - k)^2 = k^2$.]

Bernoulli Equations
We now consider a special type of nonlinear differential equation that can be reduced to
a linear equation by a change of variables.

DEFINITION 1.8.8
A differential equation that can be written in the form
\[ \frac{dy}{dx} + p(x) y = q(x) y^n, \tag{1.8.9} \]
where $n$ is a real constant, is called a Bernoulli equation.

If $n = 0$ or $n = 1$, Equation (1.8.9) is linear, but otherwise it is nonlinear. We can
reduce it to a linear equation as follows. We first divide Equation (1.8.9) by $y^n$ to obtain
\[ y^{-n} \frac{dy}{dx} + y^{1-n} p(x) = q(x). \tag{1.8.10} \]
We now make the change of variables
\[ u(x) = y^{1-n}, \tag{1.8.11} \]
which implies that
\[ \frac{du}{dx} = (1 - n) y^{-n} \frac{dy}{dx}. \]
That is,
\[ y^{-n} \frac{dy}{dx} = \frac{1}{1 - n} \frac{du}{dx}. \]
Substituting into Equation (1.8.10) for $y^{1-n}$ and $y^{-n}\, dy/dx$ yields the linear differential
equation
\[ \frac{1}{1 - n} \frac{du}{dx} + p(x) u = q(x), \]
or in standard form,
\[ \frac{du}{dx} + (1 - n) p(x) u = (1 - n) q(x). \tag{1.8.12} \]
The linear equation (1.8.12) can now be solved for $u$ as a function of $x$. The solution to
the original equation is then obtained from (1.8.11).
Example 1.8.9 Solve
\[ \frac{dy}{dx} + \frac{3}{x}\, y = \frac{12 y^{2/3}}{\sqrt{1 + x^2}}, \qquad x > 0. \]

Solution: The differential equation is a Bernoulli equation. Dividing both sides of
the differential equation by $y^{2/3}$ yields
\[ y^{-2/3} \frac{dy}{dx} + \frac{3}{x}\, y^{1/3} = \frac{12}{\sqrt{1 + x^2}}. \tag{1.8.13} \]
We make the change of variables
\[ u = y^{1/3}, \tag{1.8.14} \]
which implies that
\[ \frac{du}{dx} = \frac{1}{3}\, y^{-2/3} \frac{dy}{dx}. \]
Substituting into Equation (1.8.13) yields
\[ 3 \frac{du}{dx} + \frac{3}{x}\, u = \frac{12}{\sqrt{1 + x^2}}, \]
or in standard form,
\[ \frac{du}{dx} + \frac{1}{x}\, u = \frac{4}{\sqrt{1 + x^2}}. \tag{1.8.15} \]
An integrating factor for this linear equation is
\[ I(x) = e^{\int (1/x)\, dx} = e^{\ln x} = x, \]
so that Equation (1.8.15) can be written as
\[ \frac{d}{dx}(xu) = \frac{4x}{\sqrt{1 + x^2}}. \]
Integrating, we obtain
\[ u(x) = x^{-1} \left( 4\sqrt{1 + x^2} + c \right), \]
and so, from (1.8.14), the solution to the original differential equation is
\[ y^{1/3} = x^{-1} \left( 4\sqrt{1 + x^2} + c \right). \]
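
The solution of Example 1.8.9 can be substituted back into the differential equation as a numerical check. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the derivative is approximated by a central difference with an assumed step `h`, and the constant c = 1 is chosen so that y^{1/3} > 0 at the sample points, which keeps y^{2/3} real.

```python
import math


def y_solution(x, c):
    """y(x) from Example 1.8.9, where y^(1/3) = (4*sqrt(1 + x^2) + c) / x."""
    u = (4 * math.sqrt(1 + x**2) + c) / x
    return u**3


def residual(x, c, h=1e-6):
    """dy/dx + (3/x) y - 12 y^(2/3) / sqrt(1 + x^2), using a central difference."""
    dydx = (y_solution(x + h, c) - y_solution(x - h, c)) / (2 * h)
    y = y_solution(x, c)
    return dydx + 3 * y / x - 12 * y ** (2 / 3) / math.sqrt(1 + x**2)


for x in (0.5, 1.0, 2.0):
    # The residual should be tiny compared with the size of y.
    print(x, y_solution(x, 1.0), residual(x, 1.0))
```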
Exercises for 1.8

Key Terms
Homogeneous of degree zero, Homogeneous first-order differential equation, Bernoulli equation.

Skills
• Be able to recognize whether or not a function $f(x, y)$ is homogeneous of degree zero, and whether or not a given differential equation is a homogeneous first-order differential equation.
• Know how to change the variables in a homogeneous first-order differential equation in order to get a differential equation that is separable and thus can be solved.
• Be able to recognize whether or not a given first-order differential equation is a Bernoulli equation.
• Know how to change the variables in a Bernoulli equation in order to get a differential equation that is first-order linear and thus can be solved.
• Be able to make other changes of variables to differential equations in order to turn them into differential equations that can be solved by methods from earlier in this chapter.

True-False Review
For Questions 1–9, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.

1. The function
\[ f(x, y) = \frac{2xy - x^2}{2xy + y^2} \]
is homogeneous of degree zero.
2. The function
\[ f(x, y) = \frac{y^2}{x + y^2} \]
is homogeneous of degree zero.
3. The differential equation
\[ \frac{dy}{dx} = \frac{1 + xy^2}{1 + x^2 y} \]
is a first-order homogeneous differential equation.
4. The differential equation
\[ \frac{dy}{dx} = \frac{x^2 y^2}{x^4 + y^4} \]
is a first-order homogeneous differential equation.
5. The change of variables $y = xV(x)$ always turns a first-order homogeneous differential equation into a separable differential equation for $V$ as a function of $x$.
6. The change of variables $u = y^{-n}$ always turns a Bernoulli differential equation into a first-order linear differential equation for $u$ as a function of $x$.
7. The differential equation
√
dy
√
= xy + xy
dx
is a Bernoulli differential equation.
8. The differential equation
\[ \frac{dy}{dx} - e^{xy} y = 5x \sqrt{y} \]
is a Bernoulli differential equation.
9. The differential equation
\[ \frac{dy}{dx} + xy = x^2 y^{2/3} \]
is a Bernoulli differential equation.

Problems
For Problems 1–8, determine whether the given function is
homogeneous of degree zero. Rewrite those that are as functions of the single variable V = y/x .
1. f (x, y) = x2 − y2
.
xy 2. f (x, y) = x − y .
3. f (x, y) =
4. f (x, y) = x sin(x/y) − y cos(y/x)
.
y
x2 + y2
,
x−y x > 0.
25. 5. f (x, y) =
6. f (x, y) = x − 3 5y + 9
+
.
y
3y
x2 + y2
,
x 7. f (x, y) = x2 8. f (x, y) = −x+y
,
x + 3y x
x, y ≠ 0. For Problems 9–22, solve the given differential equation.
9. (3x − 2y) dy
= 3y.
dx (x + y)2
10. y =
.
2x 2
y
y
11. sin
(xy − y) = x cos
.
x
x
12. xy = 16x 2 − y 2 + y, 13. xy − y = x > 0. 9x 2 + y 2 , dy
y−
=
dx tx > 0. 15. xy + y ln x = y ln y . 18. x 2 + 2y 2 ) dx = 0. 19. yy = + y2 − x, x > 0. 20. 2x(y + 2x)y = y(4x − y). 4x 2 − y 2 , x > 0. can be written in polar form as r = keaθ .
(b) For the particular case when a = 1/2, determine the solution satisfying the initial condition
y(1) = 1, and ﬁnd the maximum x -interval on
which this solution is valid. [Hint: When does
the solution curve have a vertical tangent?]
(c) On the same set of axes, sketch the spiral corresponding to your solution in (b), and the line
y = x/2. Thus verify the x -interval obtained in
(b) with the graph. 29. x 2 + y 2 = 2cy . x x2 + y2 + y2
dy
=
,
22.
dx
xy 31. Fix a real number m. Let S1 denote the family of circles, centered on the line y = mx , each member of
which passes through the origin.
(a) Show that the equation of S1 can be written in the
form dy
= x tan(y/x) + y.
21. x
dx (x − a)2 + (y − ma)2 = a 2 (m2 + 1),
x > 0. 23. Solve the differential equation in Example 1.8.6 by
ﬁrst transforming it into polar coordinates. [Hint:
Write the differential equation in differential form and
then express dx and dy in terms of r and θ .]
For Problems 24–26, solve the given initial-value problem.
dy
2(2y − x)
24.
=
,
dx
x+y dy
−y =
dx 30. (x − c)2 + (y − c)2 = 2c2 . dy
= y 2 + 3xy + x 2 .
dx
x2 y(3) = 4. For Problems 29–30, determine the orthogonal trajectories
to the given family of curves. Sketch some curves from each
family. y 2 + 2xy − 2x 2
dy
.
=2
16.
dx
x − xy + y 2
2 /x 2 x2 + y2
,
x 28. (a) Show that the general solution to the differential
equation
dy
x + ay
=
dx
ax − y 14. y(x 2 − y 2 )dx − x(x 2 + y 2 ) dy = 0. 17. 2xy dy − (x 2 e−y y(1) = 1. 27. Find all solutions to x < 0. + 4y 2 2x − y
dy
=
,
dx
x + 4y 26. y
.
x−1 y(1) = 2. where a is a constant that labels particular members of the family.
(b) Determine the equation of the family of orthogonal trajectories to S1 , and show that it consists of the family of circles centered on the line
x = −my that pass through the origin.
(c) Sketch some curves from both families when $m = \sqrt{3}/3$.

Let $F_1$ and $F_2$ be two families of curves with the property that whenever a curve from the family $F_1$ intersects one from the family $F_2$, it does so at an angle
$\alpha \neq \pi/2$. If we know the equation of $F_2$, then it can
be shown (see Problem 26 in Section 1.1) that the differential equation for determining $F_1$ is
\[ \frac{dy}{dx} = \frac{m_2 - \tan \alpha}{1 + m_2 \tan \alpha}, \tag{1.8.16} \]
where $m_2$ denotes the slope of the family $F_2$ at the
point $(x, y)$.
For Problems 32–34, use Equation (1.8.16) to determine the
equation of the family of curves that cuts the given family at
an angle $\alpha = \pi/4$.
32. $x^2 + y^2 = c$.
33. $y = cx^6$.
34. $x^2 + y^2 = 2cx$.

35. (a) Use Equation (1.8.16) to find the equation of the family of curves that intersects the family of hyperbolas $y = c/x$ at an angle $\alpha = \alpha_0$.
(b) When $\alpha_0 = \pi/4$, sketch several curves from each family.

36. (a) Use Equation (1.8.16) to show that the family of curves that intersects the family of concentric circles $x^2 + y^2 = c$ at an angle $\alpha = \tan^{-1} m$ has polar equation $r = k e^{m\theta}$.
(b) When $\alpha = \pi/6$, sketch several curves from each family.

For Problems 37–49, solve the given differential equation.
37. $y' - x^{-1} y = 4x^2 y^{-1} \cos x$, $x > 0$.
38. $\dfrac{dy}{dx} + \dfrac{1}{2} (\tan x)\, y = 2y^3 \sin x$.
39. $\dfrac{dy}{dx} - \dfrac{3}{2x}\, y = 6 y^{1/3} x^2 \ln x$.
40. $y' + 2x^{-1} y = 6 \sqrt{1 + x^2}\, \sqrt{y}$, $x > 0$.
41. $y' + 2x^{-1} y = 6 y^2 x^4$.
42. $2x(y' + y^3 x^2) + y = 0$.
43. $(x - a)(x - b)(y' - \sqrt{y}) = 2(b - a) y$, where $a$, $b$ are constants.
44. $y' + 6x^{-1} y = 3x^{-1} y^{2/3} \cos x$, $x > 0$.
45. $y' + 4xy = 4x^3 y^{1/2}$.
46. $\dfrac{dy}{dx} - \dfrac{1}{2x \ln x}\, y = 2x y^3$.
47. $\dfrac{dy}{dx} - \dfrac{1}{(\pi - 1)x}\, y = \dfrac{3}{1 - \pi}\, x y^{\pi}$.
48. $2y' + y \cot x = 8 y^{-1} \cos^3 x$.
49. $(1 - \sqrt{3})\, y' + y \sec x = y^{\sqrt{3}} \sec x$.

For Problems 50–51, solve the given initial-value problem.
50. $\dfrac{dy}{dx} + \dfrac{2x}{1 + x^2}\, y = x y^2$, $y(0) = 1$.
51. $y' + y \cot x = y^3 \sin^3 x$, $y(\pi/2) = 1$.

52. Consider the differential equation
\[ y' = F(ax + by + c), \tag{1.8.17} \]
where $a$, $b \neq 0$, and $c$ are constants. Show that the change of variables from $x, y$ to $x, V$, where
\[ V = ax + by + c \]
reduces Equation (1.8.17) to the separable form
\[ \frac{1}{bF(V) + a}\, dV = dx. \]

For Problems 53–55, use the result from the previous problem to solve the given differential equation. For Problem 53,
impose the given initial condition as well.
53. $y' = (9x - y)^2$, $y(0) = 0$.
54. $y' = (4x + y + 2)^2$.
55. $y' = \sin^2(3x - 3y + 1)$.

56. Show that the change of variables $V = xy$ transforms the differential equation
\[ \frac{dy}{dx} = \frac{y}{x} F(xy) \]
into the separable differential equation
\[ \frac{1}{V[F(V) + 1]} \frac{dV}{dx} = \frac{1}{x}. \]

57. Use the result from the previous problem to solve
\[ \frac{dy}{dx} = \frac{y}{x} [\ln(xy) - 1]. \]
58. Consider the differential equation
$$\frac{dy}{dx} = \frac{x + 2y - 1}{2x - y + 3}. \tag{1.8.18}$$
(a) Show that the change of variables defined by
$$x = u - 1, \qquad y = v + 1$$
transforms Equation (1.8.18) into the homogeneous equation
$$\frac{dv}{du} = \frac{u + 2v}{2u - v}. \tag{1.8.19}$$
(b) Find the general solution to Equation (1.8.19), and hence solve Equation (1.8.18).
59. A differential equation of the form
$$y' + p(x)y + q(x)y^2 = r(x) \tag{1.8.20}$$
is called a Riccati equation.
(a) If $y = Y(x)$ is a known solution to Equation (1.8.20), show that the substitution
$$y = Y(x) + v^{-1}(x)$$
reduces it to the linear equation
$$v' - [p(x) + 2Y(x)q(x)]v = q(x).$$
(b) Find the general solution to the Riccati equation
$$x^2y' - xy - x^2y^2 = 1, \qquad x > 0,$$
given that $y = -x^{-1}$ is a solution.
60. Consider the Riccati equation
$$y' + 2x^{-1}y - y^2 = -2x^{-2}, \qquad x > 0. \tag{1.8.21}$$
(a) Determine the values of the constants a and r such that $y(x) = ax^r$ is a solution to Equation (1.8.21).
(b) Use the result from part (a) of the previous problem to determine the general solution to Equation (1.8.21).
61. (a) Show that the change of variables $y = x^{-1} + w$ transforms the Riccati differential equation
$$y' + 7x^{-1}y - 3y^2 = 3x^{-2} \tag{1.8.22}$$
into the Bernoulli equation
$$w' + x^{-1}w = 3w^2. \tag{1.8.23}$$
(b) Solve Equation (1.8.23), and hence determine the general solution to (1.8.22).
62. Consider the differential equation
$$y^{-1}y' + p(x)\ln y = q(x), \tag{1.8.24}$$
where p(x) and q(x) are continuous functions on some interval (a, b). Show that the change of variables $u = \ln y$ reduces Equation (1.8.24) to the linear differential equation
$$u' + p(x)u = q(x),$$
and hence show that the general solution to Equation (1.8.24) is
$$y(x) = \exp\left\{ I^{-1}\left[\int I(x)q(x)\,dx + c\right] \right\},$$
where
$$I = e^{\int p(x)\,dx} \tag{1.8.25}$$
and c is an arbitrary constant.
63. Use the technique derived in the previous problem to solve the initial-value problem
$$y^{-1}y' - 2x^{-1}\ln y = x^{-1}(1 - 2\ln x), \qquad y(1) = e.$$
64. Consider the differential equation
$$f'(y)\,\frac{dy}{dx} + p(x)f(y) = q(x), \tag{1.8.26}$$
where p and q are continuous functions on some interval (a, b), and f is an invertible function. Show that Equation (1.8.26) can be written as
$$\frac{du}{dx} + p(x)u = q(x),$$
where $u = f(y)$, and hence show that the general solution to Equation (1.8.26) is
$$y(x) = f^{-1}\left\{ I^{-1}\left[\int I(x)q(x)\,dx + c\right] \right\},$$
where I is given in (1.8.25), $f^{-1}$ is the inverse of f, and c is an arbitrary constant.
65. Solve
$$\sec^2 y\,\frac{dy}{dx} + \frac{1}{2\sqrt{1 + x}}\,\tan y = \frac{1}{2\sqrt{1 + x}}.$$

1.9 Exact Differential Equations
For the next technique it is best to consider ﬁrst-order differential equations written in
differential form
$$M(x, y)\,dx + N(x, y)\,dy = 0, \tag{1.9.1}$$
where M and N are given functions, assumed to be sufficiently smooth.⁸ The method that we will consider is based on the idea of a differential. Recall from a previous calculus course that if φ = φ(x, y) is a function of two variables, x and y, then the differential of φ, denoted dφ, is defined by
$$d\phi = \frac{\partial\phi}{\partial x}\,dx + \frac{\partial\phi}{\partial y}\,dy. \tag{1.9.2}$$

Example 1.9.1 Solve
$$2x\sin y\,dx + x^2\cos y\,dy = 0. \tag{1.9.3}$$
Solution: This equation is separable, but we will use a different technique to solve it. By inspection, we notice that
$$2x\sin y\,dx + x^2\cos y\,dy = d(x^2\sin y).$$
Consequently, Equation (1.9.3) can be written as $d(x^2\sin y) = 0$, which implies that $x^2\sin y$ is constant, hence the general solution to Equation (1.9.3) is
$$\sin y = \frac{c}{x^2},$$
where c is an arbitrary constant.
In the foregoing example we were able to write the given differential equation in the
form dφ(x, y) = 0, and hence obtain its solution. However, we cannot always do this.
Indeed we see by comparing Equation (1.9.1) with (1.9.2) that the differential equation
M(x, y) dx + N(x, y) dy = 0
can be written as dφ = 0 if and only if
$$M = \frac{\partial\phi}{\partial x} \quad\text{and}\quad N = \frac{\partial\phi}{\partial y}$$
for some function φ. This motivates the following definition:

8 This means we assume that the functions M and N have continuous derivatives of sufficiently high order.

DEFINITION 1.9.2
The differential equation
M(x, y) dx + N(x, y) dy = 0
is said to be exact in a region R of the xy-plane if there exists a function φ(x, y) such that
$$\frac{\partial\phi}{\partial x} = M, \qquad \frac{\partial\phi}{\partial y} = N, \tag{1.9.4}$$
for all (x, y) in R.
Any function φ satisfying (1.9.4) is called a potential function for the differential
equation
M(x, y) dx + N(x, y) dy = 0.
We emphasize that if such a function exists, then the preceding differential equation can
be written as
dφ = 0.
This is why such a differential equation is called an exact differential equation. From the
previous example, a potential function for the differential equation
2x sin y dx + x 2 cos y dy = 0
is
φ(x, y) = x 2 sin y.
We now show that if a differential equation is exact and we can ﬁnd a potential function
φ , its solution can be written down immediately.
Theorem 1.9.3 The general solution to an exact equation
M(x, y) dx + N(x, y) dy = 0
is defined implicitly by
φ(x, y) = c,
where φ satisfies (1.9.4) and c is an arbitrary constant.

Proof We rewrite the differential equation in the form
$$M(x, y) + N(x, y)\,\frac{dy}{dx} = 0.$$
Since the differential equation is exact, there exists a potential function φ (see (1.9.4)) such that
$$\frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial y}\,\frac{dy}{dx} = 0.$$
By the chain rule, the left-hand side is the derivative with respect to x of φ(x, y(x)) along a solution y = y(x), so this equation states that
$$\frac{d}{dx}\,\phi(x, y(x)) = 0.$$
Consequently, φ is constant along every solution curve. We conclude therefore that φ(x, y) = c, where c is a constant.

Remarks
1. The potential function φ is a function of two variables x and y , and we interpret the
relationship φ(x, y) = c as deﬁning y implicitly as a function of x . The preceding
theorem states that this relationship deﬁnes the general solution to the differential
equation for which φ is a potential function.
2. Geometrically, Theorem 1.9.3 says that the solution curves of an exact differential
equation are the family of curves φ(x, y) = k , where k is a constant. These are
called the level curves of the function φ(x, y).
The following two questions now arise:
1. How can we tell whether a given differential equation is exact?
2. If we have an exact equation, how do we ﬁnd a potential function?
The answers are given in the next theorem and its proof.
Theorem 1.9.4 (Test for Exactness) Let M, N, and their first partial derivatives $M_y$ and $N_x$ be continuous in a (simply connected⁹) region R of the xy-plane. Then the differential equation
M(x, y) dx + N(x, y) dy = 0
is exact for all x, y in R if and only if
$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}. \tag{1.9.5}$$

Proof We first prove that exactness implies the validity of Equation (1.9.5). If the differential equation is exact, then by definition there exists a potential function φ(x, y) such that $\phi_x = M$ and $\phi_y = N$. Thus, taking partial derivatives, $\phi_{xy} = M_y$ and $\phi_{yx} = N_x$. Since $M_y$ and $N_x$ are continuous in R, it follows that $\phi_{xy}$ and $\phi_{yx}$ are continuous in R. But, from multivariable calculus, this implies that $\phi_{xy} = \phi_{yx}$ and hence that $M_y = N_x$.
We now prove the converse. Thus we assume that Equation (1.9.5) holds and must prove that there exists a potential function φ such that
$$\frac{\partial\phi}{\partial x} = M \tag{1.9.6}$$
and
$$\frac{\partial\phi}{\partial y} = N. \tag{1.9.7}$$
The proof is constructional. That is, we will actually find a potential function φ. We begin by integrating Equation (1.9.6) with respect to x, holding y fixed (this is a partial integration) to obtain
$$\phi(x, y) = \int^x M(s, y)\,ds + h(y), \tag{1.9.8}$$
where h(y) is an arbitrary function of y (this is the integration "constant" that we must allow to depend on y, since we held y fixed in performing the integration¹⁰). We now show how to determine h(y) so that the function φ defined in (1.9.8) also satisfies Equation (1.9.7). Differentiating (1.9.8) partially with respect to y yields
$$\frac{\partial\phi}{\partial y} = \frac{\partial}{\partial y}\int^x M(s, y)\,ds + \frac{dh}{dy}.$$
In order that φ satisfy Equation (1.9.7) we must choose h(y) to satisfy
$$\frac{\partial}{\partial y}\int^x M(s, y)\,ds + \frac{dh}{dy} = N(x, y).$$
That is,
$$\frac{dh}{dy} = N(x, y) - \frac{\partial}{\partial y}\int^x M(s, y)\,ds. \tag{1.9.9}$$
Since the left-hand side of this expression is a function of y only, we must show, for consistency, that the right-hand side also depends only on y. Taking the derivative of the right-hand side with respect to x yields
$$\begin{aligned}
\frac{\partial}{\partial x}\left[N - \frac{\partial}{\partial y}\int^x M(s, y)\,ds\right]
&= \frac{\partial N}{\partial x} - \frac{\partial^2}{\partial x\,\partial y}\int^x M(s, y)\,ds \\
&= \frac{\partial N}{\partial x} - \frac{\partial}{\partial y}\left[\frac{\partial}{\partial x}\int^x M(s, y)\,ds\right] \\
&= \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}.
\end{aligned}$$
Thus, using (1.9.5), we have
$$\frac{\partial}{\partial x}\left[N - \frac{\partial}{\partial y}\int^x M(s, y)\,ds\right] = 0,$$
so that the right-hand side of Equation (1.9.9) does depend only on y. It follows that (1.9.9) is a consistent equation, and hence we can integrate both sides with respect to y to obtain
$$h(y) = \int^y N(x, t)\,dt - \int^y \frac{\partial}{\partial t}\left[\int^x M(s, t)\,ds\right]dt.$$
Finally, substituting into (1.9.8) yields the potential function
$$\phi(x, y) = \int^x M(s, y)\,ds + \int^y N(x, t)\,dt - \int^y \frac{\partial}{\partial t}\left[\int^x M(s, t)\,ds\right]dt.$$

9 Roughly speaking, simply connected means that the interior of any closed curve drawn in the region also lies in the region. For example, the interior of a circle is a simply connected region, although the region between two concentric circles is not.

Remark There is no need to memorize the final result for φ. For each particular problem, one can construct an appropriate potential function from first principles. This is illustrated in Examples 1.9.6 and 1.9.7.
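The construction in the proof can also be carried out numerically. The following Python sketch is an illustration only and is not part of the text: instead of the indefinite integrals used above, it assumes the definite-integral variant $\phi(x, y) = \int_{x_0}^{x} M(s, y_0)\,ds + \int_{y_0}^{y} N(x, t)\,dt$, which satisfies (1.9.6) and (1.9.7) on a rectangle whenever $M_y = N_x$, and it approximates each integral with composite Simpson's rule. The helper names `simpson` and `potential` and the base point $(x_0, y_0) = (0, 0)$ are hypothetical choices made for this example.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def potential(M, N, x, y, x0=0.0, y0=0.0):
    """phi(x, y) = int_{x0}^{x} M(s, y0) ds + int_{y0}^{y} N(x, t) dt.

    Assumes the equation M dx + N dy = 0 is exact on the rectangle
    with corners (x0, y0) and (x, y)."""
    along_x = simpson(lambda s: M(s, y0), x0, x)
    along_y = simpson(lambda t: N(x, t), y0, y)
    return along_x + along_y

# The exact equation 2x e^y dx + (x^2 e^y + cos y) dy = 0 (Example 1.9.6).
M = lambda x, y: 2 * x * math.exp(y)
N = lambda x, y: x**2 * math.exp(y) + math.cos(y)

x, y = 1.3, 0.7
phi_numeric = potential(M, N, x, y)
phi_closed_form = x**2 * math.exp(y) + math.sin(y)  # potential derived in Example 1.9.6
print(abs(phi_numeric - phi_closed_form))  # difference should be tiny
```

With the base point (0, 0) the numerical potential vanishes at the origin, as does $x^2e^y + \sin y$, so the two values can be compared directly rather than only up to an additive constant.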
10 Throughout the text, $\int^x f(t)\,dt$ means "evaluate the indefinite integral $\int f(t)\,dt$ and replace t with x in the result."

Example 1.9.5 Determine whether the given differential equation is exact.
1. $[1 + \ln(xy)]\,dx + (x/y)\,dy = 0$.
2. $x^2y\,dx - (xy^2 + y^3)\,dy = 0$.
Solution:
1. In this case, $M = 1 + \ln(xy)$ and $N = x/y$, so that $M_y = 1/y = N_x$. It follows from the previous theorem that the differential equation is exact.
2. In this case, we have $M = x^2y$, $N = -(xy^2 + y^3)$, so that $M_y = x^2$, whereas $N_x = -y^2$. Since $M_y \neq N_x$, the differential equation is not exact.

Example 1.9.6 Find the general solution to $2xe^y\,dx + (x^2e^y + \cos y)\,dy = 0$.
Solution: We have
$$M(x, y) = 2xe^y, \qquad N(x, y) = x^2e^y + \cos y,$$
so that
$$M_y = 2xe^y = N_x.$$
Hence the given differential equation is exact, and so there exists a potential function φ such that (see Definition 1.9.2)
$$\frac{\partial\phi}{\partial x} = 2xe^y, \tag{1.9.10}$$
$$\frac{\partial\phi}{\partial y} = x^2e^y + \cos y. \tag{1.9.11}$$
Integrating Equation (1.9.10) with respect to x, holding y fixed, yields
$$\phi(x, y) = x^2e^y + h(y), \tag{1.9.12}$$
where h is an arbitrary function of y. We now determine h(y) such that (1.9.12) also satisfies Equation (1.9.11). Taking the derivative of (1.9.12) with respect to y yields
$$\frac{\partial\phi}{\partial y} = x^2e^y + \frac{dh}{dy}. \tag{1.9.13}$$
Equations (1.9.11) and (1.9.13) give two expressions for ∂φ/∂y. This allows us to determine h. Subtracting Equation (1.9.11) from Equation (1.9.13) gives the consistency requirement
$$\frac{dh}{dy} = \cos y,$$
which implies, upon integration, that
$$h(y) = \sin y,$$
where we have set the integration constant equal to zero without loss of generality, since we require only one potential function. Substitution into (1.9.12) yields the potential function
$$\phi(x, y) = x^2e^y + \sin y.$$
Consequently, the given differential equation can be written as
$$d(x^2e^y + \sin y) = 0,$$
and so, from Theorem 1.9.3, the general solution is
$$x^2e^y + \sin y = c.$$
Notice that the solution obtained in the preceding example is an implicit solution.
Owing to the nature of the way in which the potential function for an exact equation is
obtained, this is usually the case.
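A quick numerical sanity check of the test for exactness and of a candidate potential function can be sketched in Python as follows. This is an illustration under stated assumptions, not a proof: the partial derivatives are approximated by central differences, agreement is checked only at a few hand-picked sample points, and the helper names `partial_x`, `partial_y`, and `looks_exact` are hypothetical.

```python
import math

def partial_x(f, x, y, h=1e-5):
    """Central-difference approximation to df/dx at (x, y)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-5):
    """Central-difference approximation to df/dy at (x, y)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def looks_exact(M, N, points, tol=1e-6):
    """True if M_y and N_x agree to within tol at every sample point."""
    return all(abs(partial_y(M, x, y) - partial_x(N, x, y)) < tol
               for x, y in points)

points = [(0.5, 0.3), (1.2, -0.4), (2.0, 1.1)]

# Example 1.9.6: 2x e^y dx + (x^2 e^y + cos y) dy = 0
M = lambda x, y: 2 * x * math.exp(y)
N = lambda x, y: x**2 * math.exp(y) + math.cos(y)
phi = lambda x, y: x**2 * math.exp(y) + math.sin(y)

exact_196 = looks_exact(M, N, points)

# The potential function must satisfy phi_x = M and phi_y = N (Definition 1.9.2).
phi_ok = all(abs(partial_x(phi, x, y) - M(x, y)) < 1e-6 and
             abs(partial_y(phi, x, y) - N(x, y)) < 1e-6
             for x, y in points)

# Example 1.9.5, part 2: x^2 y dx - (x y^2 + y^3) dy = 0, which is not exact.
M2 = lambda x, y: x**2 * y
N2 = lambda x, y: -(x * y**2 + y**3)
exact_195_2 = looks_exact(M2, N2, points)

print(exact_196, phi_ok, exact_195_2)
```

A passing check at sample points only suggests exactness; the analytic comparison of $M_y$ and $N_x$ remains the actual test.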
Example 1.9.7 Find the general solution to
$$[\sin(xy) + xy\cos(xy) + 2x]\,dx + [x^2\cos(xy) + 2y]\,dy = 0.$$
Solution: We have
$$M(x, y) = \sin(xy) + xy\cos(xy) + 2x \quad\text{and}\quad N(x, y) = x^2\cos(xy) + 2y.$$
Thus,
$$M_y = 2x\cos(xy) - x^2y\sin(xy) = N_x,$$
and so the differential equation is exact. Hence there exists a potential function φ(x, y) such that
$$\frac{\partial\phi}{\partial x} = \sin(xy) + xy\cos(xy) + 2x, \tag{1.9.14}$$
$$\frac{\partial\phi}{\partial y} = x^2\cos(xy) + 2y. \tag{1.9.15}$$
In this case, Equation (1.9.15) is the simpler equation, and so we integrate it with respect to y, holding x fixed, to obtain
$$\phi(x, y) = x\sin(xy) + y^2 + g(x), \tag{1.9.16}$$
where g(x) is an arbitrary function of x. We now determine g(x), and hence φ, from (1.9.14) and (1.9.16). Differentiating (1.9.16) partially with respect to x yields
$$\frac{\partial\phi}{\partial x} = \sin(xy) + xy\cos(xy) + \frac{dg}{dx}. \tag{1.9.17}$$
Equations (1.9.14) and (1.9.17) are consistent if and only if
$$\frac{dg}{dx} = 2x.$$
Hence, upon integrating,
$$g(x) = x^2,$$
where we have once more set the integration constant to zero without loss of generality, since we require only one potential function. Substituting into (1.9.16) gives the potential function
$$\phi(x, y) = x\sin(xy) + x^2 + y^2.$$
The original differential equation can therefore be written as
$$d(x\sin(xy) + x^2 + y^2) = 0,$$
and hence the general solution is
$$x\sin(xy) + x^2 + y^2 = c.$$

Remark At first sight the above procedure appears to be quite complicated. However, with a little bit of practice, the steps are seen to be, in fact, fairly straightforward. As we have shown in Theorem 1.9.4, the method works in general, provided one starts with an exact differential equation.

Integrating Factors
Usually a given differential equation will not be exact. However, sometimes it is possible
to multiply the differential equation by a nonzero function to obtain an exact equation
that can then be solved using the technique we have described in this section. Notice
that the solution to the resulting exact equation will be the same as that of the original
equation, since we multiply by a nonzero function.

DEFINITION 1.9.8
A nonzero function I(x, y) is called an integrating factor for the differential equation M(x, y) dx + N(x, y) dy = 0 if the differential equation
I(x, y)M(x, y) dx + I(x, y)N(x, y) dy = 0
is exact.

Example 1.9.9 Show that $I = x^2y$ is an integrating factor for the differential equation
$$(3y^2 + 5x^2y)\,dx + (3xy + 2x^3)\,dy = 0. \tag{1.9.18}$$
Solution: Multiplying the given differential equation (which is not exact) by $x^2y$ yields
$$(3x^2y^3 + 5x^4y^2)\,dx + (3x^3y^2 + 2x^5y)\,dy = 0. \tag{1.9.19}$$
Thus,
$$M_y = 9x^2y^2 + 10x^4y = N_x,$$
so that the differential equation (1.9.19) is exact, and hence $I = x^2y$ is an integrating factor for Equation (1.9.18). Indeed we leave it as an exercise to verify that (1.9.19) can be written as
$$d(x^3y^3 + x^5y^2) = 0,$$
so that the general solution to Equation (1.9.19) (and hence the general solution to Equation (1.9.18)) is defined implicitly by
$$x^3y^3 + x^5y^2 = c.$$
That is,
$$x^3y^2(y + x^2) = c.$$

As shown in the next theorem, using the test for exactness, it is straightforward to determine the conditions that a function I(x, y) must satisfy in order to be an integrating factor for the differential equation M(x, y) dx + N(x, y) dy = 0.
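Theorem 1.9.10 The function I(x, y) is an integrating factor for
$$M(x, y)\,dx + N(x, y)\,dy = 0 \tag{1.9.20}$$
if and only if it is a solution to the partial differential equation
$$N\,\frac{\partial I}{\partial x} - M\,\frac{\partial I}{\partial y} = \left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial x}\right) I. \tag{1.9.21}$$

Proof Multiplying Equation (1.9.20) by I yields
$$IM\,dx + IN\,dy = 0.$$
This equation is exact if and only if
$$\frac{\partial}{\partial y}(IM) = \frac{\partial}{\partial x}(IN),$$
that is, if and only if
$$\frac{\partial I}{\partial y}\,M + I\,\frac{\partial M}{\partial y} = \frac{\partial I}{\partial x}\,N + I\,\frac{\partial N}{\partial x}.$$
Rearranging the terms in this equation yields Equation (1.9.21).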
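The equivalence in Theorem 1.9.10 can be checked numerically for the integrating factor $I = x^2y$ of Example 1.9.9. The Python sketch below is only an illustration: derivatives are approximated by central differences, the sample points are arbitrary, and the helper names `d_dx`, `d_dy`, `pde_residual`, and `exactness_gap` are hypothetical.

```python
def d_dx(f, x, y, h=1e-5):
    """Central-difference approximation to df/dx at (x, y)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_dy(f, x, y, h=1e-5):
    """Central-difference approximation to df/dy at (x, y)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# Equation (1.9.18) and the integrating factor from Example 1.9.9.
M = lambda x, y: 3 * y**2 + 5 * x**2 * y
N = lambda x, y: 3 * x * y + 2 * x**3
I = lambda x, y: x**2 * y

def pde_residual(x, y):
    """N I_x - M I_y - (M_y - N_x) I, which is zero where (1.9.21) holds."""
    return (N(x, y) * d_dx(I, x, y) - M(x, y) * d_dy(I, x, y)
            - (d_dy(M, x, y) - d_dx(N, x, y)) * I(x, y))

def exactness_gap(P, Q, x, y):
    """P_y - Q_x, which is zero where P dx + Q dy = 0 passes the exactness test."""
    return d_dy(P, x, y) - d_dx(Q, x, y)

IM = lambda x, y: I(x, y) * M(x, y)
IN = lambda x, y: I(x, y) * N(x, y)

points = [(0.5, 0.8), (1.1, -0.6), (1.5, 1.2)]
max_residual = max(abs(pde_residual(x, y)) for x, y in points)
gap_before = max(abs(exactness_gap(M, N, x, y)) for x, y in points)
gap_after = max(abs(exactness_gap(IM, IN, x, y)) for x, y in points)

print(max_residual, gap_before, gap_after)
```

The residual of (1.9.21) and the exactness gap of the multiplied equation should both be zero up to discretization and round-off error, while the gap for the original equation is clearly nonzero.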
The preceding theorem is not too useful in general, since it is usually no easier to
solve the partial differential equation (1.9.21) to ﬁnd I than it is to solve the original
Equation (1.9.20). However, it sometimes happens that an integrating factor exists that
depends only on one variable. We now show that Theorem 1.9.10 can be used to determine
when such an integrating factor exists and also to actually ﬁnd a corresponding integrating
factor.

Theorem 1.9.11 Consider the differential equation M(x, y) dx + N(x, y) dy = 0.
1. There exists an integrating factor that is dependent only on x if and only if $(M_y - N_x)/N = f(x)$, a function of x only. In such a case, an integrating factor is
$$I(x) = e^{\int f(x)\,dx}.$$
2. There exists an integrating factor that is dependent only on y if and only if $(M_y - N_x)/M = g(y)$, a function of y only. In such a case, an integrating factor is
$$I(y) = e^{-\int g(y)\,dy}.$$

Proof For part 1 of the theorem, we begin by assuming that I = I(x) is an integrating factor for M(x, y) dx + N(x, y) dy = 0. Then ∂I/∂y = 0, and so, from (1.9.21), I is a solution to
$$\frac{dI}{dx}\,N = (M_y - N_x)\,I.$$
That is,
$$\frac{1}{I}\,\frac{dI}{dx} = \frac{M_y - N_x}{N}.$$
Since, by assumption, I is a function of x only, it follows that the left-hand side of this expression depends only on x and hence also the right-hand side.
Conversely, suppose that $(M_y - N_x)/N = f(x)$, a function of x only. Then, dividing (1.9.21) by N, it follows that I is an integrating factor for M(x, y) dx + N(x, y) dy = 0 if and only if it is a solution to
$$\frac{\partial I}{\partial x} - \frac{M}{N}\,\frac{\partial I}{\partial y} = I f(x). \tag{1.9.22}$$
We must show that this differential equation has a solution I that depends on x only. We do this by explicitly integrating the differential equation under the assumption that I = I(x). Indeed, if I = I(x), then Equation (1.9.22) reduces to
$$\frac{dI}{dx} = I f(x),$$
which is a separable equation with solution
$$I(x) = e^{\int f(x)\,dx}.$$
The proof of part 2 is similar, and so we leave it as an exercise (see Problem 30).
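Part 1 of the theorem lends itself to a small numerical experiment. The Python sketch below, offered as an illustration rather than a general algorithm, takes the equation $(2x - y^2)\,dx + xy\,dy = 0$, $x > 0$, treated in Example 1.9.12, checks at a few values of y that $(M_y - N_x)/N$ does not depend on y, and then evaluates $I(x) = \exp\left(\int_1^x f(s)\,ds\right)$ with Simpson's rule. The lower limit 1, the evaluation at y = 1, and the helper names are assumptions made for this example; a different lower limit only rescales I by a nonzero constant.

```python
import math

def d_dx(f, x, y, h=1e-5):
    """Central-difference approximation to df/dx at (x, y)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_dy(f, x, y, h=1e-5):
    """Central-difference approximation to df/dy at (x, y)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

M = lambda x, y: 2 * x - y**2
N = lambda x, y: x * y

def ratio(x, y):
    """(M_y - N_x) / N, the quantity examined in part 1 of Theorem 1.9.11."""
    return (d_dy(M, x, y) - d_dx(N, x, y)) / N(x, y)

def integrating_factor(x, x_ref=1.0):
    """I(x) = exp(int_{x_ref}^{x} f(s) ds), with f sampled along y = 1."""
    return math.exp(simpson(lambda s: ratio(s, 1.0), x_ref, x))

# At x = 2 the ratio should equal -3/2 whatever the value of y.
spread = max(abs(ratio(2.0, y) - (-3 / 2.0)) for y in (0.5, 1.0, 3.0))
I_at_2 = integrating_factor(2.0)

print(spread, I_at_2, 2.0 ** -3)
```

The computed value of I at x = 2 should agree closely with $2^{-3}$, consistent with the integrating factor $x^{-3}$.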
Example 1.9.12 Solve
$$(2x - y^2)\,dx + xy\,dy = 0, \qquad x > 0. \tag{1.9.23}$$
Solution: The equation is not exact ($M_y \neq N_x$). However,
$$\frac{M_y - N_x}{N} = \frac{-2y - y}{xy} = -\frac{3}{x},$$
which is a function of x only. It follows from part 1 of the preceding theorem that an integrating factor for Equation (1.9.23) is
$$I(x) = e^{-\int (3/x)\,dx} = e^{-3\ln x} = x^{-3}.$$
Multiplying Equation (1.9.23) by I yields the exact equation
$$(2x^{-2} - x^{-3}y^2)\,dx + x^{-2}y\,dy = 0. \tag{1.9.24}$$
(The reader should check that this is exact, although it must be, by the previous theorem.) We leave it as an exercise to verify that a potential function for Equation (1.9.24) is
$$\phi(x, y) = \tfrac{1}{2}x^{-2}y^2 - 2x^{-1},$$
and hence the general solution to (1.9.23) is given implicitly by
$$\tfrac{1}{2}x^{-2}y^2 - 2x^{-1} = c,$$
or equivalently,
$$y^2 - 4x = c_1x^2.$$

Exercises for 1.9

Key Terms
Exact differential equation, Potential function, Integrating factor.

Skills
• Be able to determine whether or not a given differential equation is exact.
• Given the partial derivatives ∂φ/∂x and ∂φ/∂y of a potential function φ(x, y), be able to determine φ(x, y).
• Be able to find the general solution to an exact differential equation.
• When circumstances allow, be able to use an integrating factor to convert a given differential equation into an exact differential equation with the same solution set.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. The differential equation M(x, y) dx + N(x, y) dy = 0 is exact in a simply connected region R if $M_x$ and $N_y$ are continuous partial derivatives with $M_x = N_y$.
2. The solution to an exact differential equation is called a potential function.
3. If M(x) and N(y) are continuous functions, then the differential equation M(x) dx + N(y) dy = 0 is exact.
4. If $(M_y - N_x)/N(x, y)$ is a function of x only, then the differential equation M(x, y) dx + N(x, y) dy = 0 becomes exact when it is multiplied through by
$$I(x) = \exp\left(\int \frac{M_y - N_x}{N(x, y)}\,dx\right).$$
5. There is a unique potential function for an exact differential equation M(x, y) dx + N(x, y) dy = 0.
6. The differential equation
$$(2ye^{2x} - \sin y)\,dx + (e^{2x} - x\cos y)\,dy = 0$$
is exact.
7. The differential equation
$$\frac{-2xy}{(x^2 + y)^2}\,dx + \frac{x^2}{(x^2 + y)^2}\,dy = 0$$
is exact.
8. The differential equation
$$(y^2 + \cos x)\,dx + 2xy^2\,dy = 0$$
is exact.
9. The differential equation
$$(e^{x\sin y}\sin y)\,dx + (e^{x\sin y}\cos y)\,dy = 0$$
is exact.

Problems
For Problems 1–3, determine whether the given differential equation is exact.
1. $(y + 3x^2)\,dx + x\,dy = 0$.
2. $[\cos(xy) - xy\sin(xy)]\,dx - x^2\sin(xy)\,dy = 0$.
3. $ye^{xy}\,dx + (2y - xe^{xy})\,dy = 0$.
For Problems 4–12, solve the given differential equation.
4. $2xy\,dx + (x^2 + 1)\,dy = 0$.
5. $(y^2 + \cos x)\,dx + (2xy + \sin y)\,dy = 0$.
6. $x^{-1}(xy - 1)\,dx + y^{-1}(xy + 1)\,dy = 0$.
7. $(4e^{2x} + 2xy - y^2)\,dx + (x - y)^2\,dy = 0$.
8. $(y^2 - 2x)\,dx + 2xy\,dy = 0$.
9. $\left(\dfrac{1}{x} - \dfrac{y}{x^2 + y^2}\right)dx + \dfrac{x}{x^2 + y^2}\,dy = 0$.
10. $[1 + \ln(xy)]\,dx + xy^{-1}\,dy = 0$.
11. $[y\cos(xy) - \sin x]\,dx + x\cos(xy)\,dy = 0$.
12. $(2xy + \cos y)\,dx + (x^2 - x\sin y - 2y)\,dy = 0$.
For Problems 13–15, solve the given initial-value problem.
13. $(3x^2\ln x + x^2 - y)\,dx - x\,dy = 0$, $y(1) = 5$.
14. $2x^2y' + 4xy = 3\sin x$, $y(2\pi) = 0$.
15. $(ye^{xy} + \cos x)\,dx + xe^{xy}\,dy = 0$, $y(\pi/2) = 0$.
16. Show that if φ(x, y) is a potential function for M(x, y) dx + N(x, y) dy = 0, then so is φ(x, y) + c, where c is an arbitrary constant. This shows that potential functions are uniquely defined only up to an additive constant.
For Problems 17–19, determine whether the given function is an integrating factor for the given differential equation.
17. $I(x, y) = \cos(xy)$, $[\tan(xy) + xy]\,dx + x^2\,dy = 0$.
18. $I(x) = \sec x$, $[2x - (x^2 + y^2)\tan x]\,dx + 2y\,dy = 0$.
19. $I(x, y) = y^{-2}e^{-x/y}$, $y(x^2 - 2xy)\,dx - x^3\,dy = 0$.
For Problems 20–26, determine an integrating factor for the given differential equation, and hence find the general solution.
20. $(xy - 1)\,dx + x^2\,dy = 0$.
21. $y\,dx - (2x + y^4)\,dy = 0$.
22. $x^2y\,dx + y(x^3 + e^{-3y}\sin y)\,dy = 0$.
23. $(y - x^2)\,dx + 2x\,dy = 0$, $x > 0$.
24. $xy[2\ln(xy) + 1]\,dx + x^2\,dy = 0$, $x > 0$.
25. $\dfrac{dy}{dx} + \dfrac{2x}{1 + x^2}\,y = \dfrac{1}{(1 + x^2)^2}$.
26. $(3xy - 2y^{-1})\,dx + x(x + y^{-2})\,dy = 0$.
For Problems 27–29, determine the values of the constants r and s such that $I(x, y) = x^ry^s$ is an integrating factor for the given differential equation.
27. $(y^{-1} - x^{-1})\,dx + (xy^{-2} - 2y^{-1})\,dy = 0$.
28. $y(5xy^2 + 4)\,dx + x(xy^2 - 1)\,dy = 0$.
29. $2y(y + 2x^2)\,dx + x(4y + 3x^2)\,dy = 0$.
30. Prove that if $(M_y - N_x)/M = g(y)$, a function of y only, then an integrating factor for
M(x, y) dx + N(x, y) dy = 0
is $I(y) = e^{-\int g(y)\,dy}$.
31. Consider the general first-order linear differential equation
$$\frac{dy}{dx} + p(x)y = q(x), \tag{1.9.25}$$
where p(x) and q(x) are continuous functions on some interval (a, b).
(a) Rewrite Equation (1.9.25) in differential form, and show that an integrating factor for the resulting equation is
$$I(x) = e^{\int p(x)\,dx}. \tag{1.9.26}$$
(b) Show that the general solution to Equation (1.9.25) can be written in the form
$$y(x) = I^{-1}\left\{\int^x I(t)q(t)\,dt + c\right\},$$
where I is given in Equation (1.9.26), and c is an arbitrary constant.

1.10 Numerical Solution to First-Order Differential Equations
So far in this chapter we have investigated ﬁrst-order differential equations geometrically
via slope ﬁelds, and analytically by trying to construct exact solutions to certain types of
differential equations. Certainly, for most ﬁrst-order differential equations, it simply is
not possible to ﬁnd analytic solutions, since they will not fall into the few classes for which
solution techniques are available. Our ﬁnal approach to analyzing ﬁrst-order differential
equations is to look at the possibility of constructing a numerical approximation to the
unique solution to the initial-value problem
$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0. \tag{1.10.1}$$
We consider three techniques that give varying levels of accuracy. In each case, we generate a sequence of approximations $y_1, y_2, \ldots$ to the value of the exact solution at the points $x_1, x_2, \ldots$, where $x_{n+1} = x_n + h$, $n = 0, 1, \ldots$, and h is a real number.
We emphasize that numerical methods do not generate a formula for the solution to the
differential equation. Rather they generate a sequence of approximations to the value of
the solution at speciﬁed points. Furthermore, if we use a sufﬁcient number of points, then
by plotting the points (xi , yi ) and joining them with straight-line segments, we are able to
obtain an overall approximation to the solution curve corresponding to the solution of the
given initial-value problem. This is how the approximate solution curves were generated
in the preceding sections via the computer algebra system Maple. There are many subtle
ideas associated with constructing numerical solutions to initial-value problems that are
beyond the scope of this text. Indeed, a full discussion of the application of numerical
methods to differential equations is best left for a future course in numerical analysis.

Euler's Method
Suppose we wish to approximate the solution to the initial-value problem (1.10.1) at $x = x_1 = x_0 + h$, where h is small. The idea behind Euler's method is to use the tangent line to the solution curve through $(x_0, y_0)$ to obtain such an approximation. (See Figure 1.10.1.)
The equation of the tangent line through $(x_0, y_0)$ is
$$y(x) = y_0 + m(x - x_0),$$
where m is the slope of the curve at $(x_0, y_0)$. From Equation (1.10.1), $m = f(x_0, y_0)$, so
$$y(x) = y_0 + f(x_0, y_0)(x - x_0).$$

Figure 1.10.1: Euler's method for approximating the solution to the initial-value problem dy/dx = f(x, y), y(x0) = y0.

Setting $x = x_1$ in this equation yields the Euler approximation to the exact solution at $x_1$, namely,
$$y_1 = y_0 + f(x_0, y_0)(x_1 - x_0),$$
which we write as
$$y_1 = y_0 + hf(x_0, y_0).$$
Now suppose we wish to obtain an approximation to the exact solution to the initial-value problem (1.10.1) at $x_2 = x_1 + h$. We can use the same idea, except we now use the tangent line to the solution curve through $(x_1, y_1)$. From (1.10.1), the slope of this tangent line is $f(x_1, y_1)$, so that the equation of the required tangent line is
$$y(x) = y_1 + f(x_1, y_1)(x - x_1).$$
Setting $x = x_2$, and substituting $x_2 - x_1 = h$, yields the approximation
$$y_2 = y_1 + hf(x_1, y_1)$$
to the solution to the initial-value problem at $x = x_2$. Continuing in this manner, we determine the sequence of approximations
$$y_{n+1} = y_n + hf(x_n, y_n), \qquad n = 0, 1, \ldots$$
to the solution to the initial-value problem (1.10.1) at the points $x_{n+1} = x_n + h$.
In summary, Euler's method for approximating the solution to the initial-value problem
$$y' = f(x, y), \qquad y(x_0) = y_0$$
at the points $x_{n+1} = x_0 + (n + 1)h$ (n = 0, 1, . . . ) is
$$y_{n+1} = y_n + hf(x_n, y_n), \qquad n = 0, 1, \ldots. \tag{1.10.2}$$
Example 1.10.1 Consider the initial-value problem
$$y' = y - x, \qquad y(0) = \tfrac{1}{2}.$$
Use Euler's method with (a) h = 0.1 and (b) h = 0.05 to obtain an approximation to y(1). Given that the exact solution to the initial-value problem is
$$y(x) = x + 1 - \tfrac{1}{2}e^x,$$
compare the errors in the two approximations to y(1).
Solution: In this problem we have
$$f(x, y) = y - x, \qquad x_0 = 0, \qquad y_0 = \tfrac{1}{2}.$$
(a) Setting h = 0.1 in (1.10.2) yields
$$y_{n+1} = y_n + 0.1(y_n - x_n).$$
Hence,
$$y_1 = y_0 + 0.1(y_0 - x_0) = 0.5 + 0.1(0.5 - 0) = 0.55,$$
$$y_2 = y_1 + 0.1(y_1 - x_1) = 0.55 + 0.1(0.55 - 0.1) = 0.595.$$
Continuing in this manner, we generate the approximations listed in Table 1.10.1, where we have rounded the calculations to six decimal places.

 n    x_n   y_n        Exact Solution   Absolute Error
 1    0.1   0.55       0.547414         0.002585
 2    0.2   0.595      0.589299         0.005701
 3    0.3   0.6345     0.625070         0.009430
 4    0.4   0.66795    0.654088         0.013862
 5    0.5   0.694745   0.675639         0.019106
 6    0.6   0.714219   0.688941         0.025278
 7    0.7   0.725641   0.693124         0.032518
 8    0.8   0.728205   0.687229         0.040976
 9    0.9   0.721026   0.670198         0.050828
10    1.0   0.703129   0.640859         0.062270

Table 1.10.1: The results of applying Euler's method with h = 0.1 to the initial-value problem in Example 1.10.1.

We have also listed the values of the exact solution and the absolute value of the error. In this case, the approximation to y(1) is $y_{10} = 0.703129$, with an absolute error of
$$|y(1) - y_{10}| = 0.062270. \tag{1.10.3}$$
(b) When h = 0.05, Euler's method gives
$$y_{n+1} = y_n + 0.05(y_n - x_n), \qquad n = 0, 1, \ldots, 19,$$
which generates the approximations given in Table 1.10.2, where we have listed only every other intermediate approximation. We see that the approximation to y(1) is
$$y_{20} = 0.673351$$
and that the absolute error in this approximation is
$$|y(1) - y_{20}| = 0.032492.$$

 n    x_n   y_n        Exact Solution   Absolute Error
 2    0.1   0.54875    0.547414         0.001335
 4    0.2   0.592247   0.589299         0.002948
 6    0.3   0.629952   0.625070         0.004881
 8    0.4   0.661272   0.654088         0.007185
10    0.5   0.685553   0.675639         0.009913
12    0.6   0.702072   0.688941         0.013131
14    0.7   0.710034   0.693124         0.016910
16    0.8   0.708563   0.687229         0.021333
18    0.9   0.696690   0.670198         0.026492
20    1.0   0.673351   0.640859         0.032492

Table 1.10.2: The results of applying Euler's method with h = 0.05 to the initial-value problem in Example 1.10.1.

Figure 1.10.2: The exact solution to the initial-value problem considered in Example 1.10.1 and the two approximations obtained using Euler's method.

Comparing this with (1.10.3), we see that the smaller step size has led to a better approximation. In fact, it has almost halved the error at x = 1. In Figure 1.10.2 we have plotted the exact solution and the Euler approximations just obtained.
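The calculations in Example 1.10.1 are easy to reproduce in a few lines of code. The following Python sketch applies the Euler iteration (1.10.2) to the same initial-value problem; the function name `euler` and its argument list are hypothetical choices for this illustration, and the grid point is recomputed from $x_0$ at each step only to limit the accumulation of round-off error.

```python
import math

def euler(f, x0, y0, h, n_steps):
    """Apply n_steps Euler steps of size h to y' = f(x, y), y(x0) = y0.

    Returns the approximation to the solution at x0 + n_steps * h."""
    x, y = x0, y0
    for n in range(n_steps):
        y = y + h * f(x, y)        # Equation (1.10.2)
        x = x0 + (n + 1) * h       # next grid point
    return y

# Initial-value problem of Example 1.10.1: y' = y - x, y(0) = 1/2.
f = lambda x, y: y - x
exact = lambda x: x + 1 - 0.5 * math.exp(x)

y10 = euler(f, 0.0, 0.5, 0.1, 10)    # h = 0.1
y20 = euler(f, 0.0, 0.5, 0.05, 20)   # h = 0.05
err10 = abs(exact(1.0) - y10)
err20 = abs(exact(1.0) - y20)

print(round(y10, 6), round(err10, 6))
print(round(y20, 6), round(err20, 6))
print(err10 / err20)   # ratio of the two errors
```

The printed ratio of the two errors is a little under 2, which is the "almost halved" behavior noted in the example.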
In the preceding example we saw that halving the step size had the effect of essentially halving the error. However, even then the accuracy was not as good as we probably
would have liked. Of course we could just keep decreasing the step size (provided we
did not take h to be so small that round-off errors started to play a role) to increase the
accuracy, but then the number of steps we would have to take would make the calculations very cumbersome. A better approach is to derive methods that have a higher order
of accuracy. We will consider two such methods.

Modified Euler Method (Heun's Method)
The method that we consider here is an example of what is called a predictor-corrector method. The idea is to use the formula from Euler's method to obtain a first approximation to the solution $y(x_{n+1})$. We denote this approximation by $y^*_{n+1}$, so that
$$y^*_{n+1} = y_n + hf(x_n, y_n).$$
We now improve (or "correct") this approximation by once more applying Euler's method. But this time, we use the average of the slopes of the solution curves through $(x_n, y_n)$ and $(x_{n+1}, y^*_{n+1})$. This gives
$$y_{n+1} = y_n + \tfrac{1}{2}h\left[f(x_n, y_n) + f(x_{n+1}, y^*_{n+1})\right].$$
As illustrated in Figure 1.10.3 for the case n = 0, we can interpret the modified Euler approximations as arising from first stepping to the point
$$P\left(x_n + \frac{h}{2},\; y_n + \frac{hf(x_n, y_n)}{2}\right)$$
along the tangent line to the solution curve through $(x_n, y_n)$ and then stepping from P to $(x_{n+1}, y_{n+1})$ along the line through P whose slope is $f(x_{n+1}, y^*_{n+1})$.

Figure 1.10.3: Derivation of the first step in the modified Euler method.

In summary, the modified Euler method for approximating the solution to the initial-value problem
$$y' = f(x, y), \qquad y(x_0) = y_0$$
at the points $x_{n+1} = x_0 + (n + 1)h$ (n = 0, 1, . . . ) is
$$y_{n+1} = y_n + \tfrac{1}{2}h\left[f(x_n, y_n) + f(x_{n+1}, y^*_{n+1})\right],$$
where
∗
yn+1 = yn + hf (xn , yn ), Example 1.10.2 n = 0, 1, . . . . Apply the modiﬁed Euler method with h = 0.1 to determine an approximation to the
solution to the initial-value problem
y = y − x, y(0) = 1
2 at x = 1. i i i i i i i “main”
Solution: Taking $h = 0.1$ and $f(x, y) = y - x$ in the modified Euler method yields

$$y^*_{n+1} = y_n + 0.1(y_n - x_n),$$
$$y_{n+1} = y_n + 0.05(y_n - x_n + y^*_{n+1} - x_{n+1}).$$

Hence,

$$y_{n+1} = y_n + 0.05\{ y_n - x_n + [y_n + 0.1(y_n - x_n)] - x_{n+1} \}.$$

That is,

$$y_{n+1} = y_n + 0.05(2.1 y_n - 1.1 x_n - x_{n+1}), \qquad n = 0, 1, \dots, 9.$$

When $n = 0$,

$$y_1 = y_0 + 0.05(2.1 y_0 - 1.1 x_0 - x_1) = 0.5475,$$

and when $n = 1$,

$$y_2 = y_1 + 0.05(2.1 y_1 - 1.1 x_1 - x_2) = 0.5894875.$$
| n | x_n | y_n | Exact Solution | Absolute Error |
|----|-----|----------|----------|----------|
| 1 | 0.1 | 0.5475 | 0.547414 | 0.000085 |
| 2 | 0.2 | 0.589487 | 0.589299 | 0.000189 |
| 3 | 0.3 | 0.625384 | 0.625070 | 0.000313 |
| 4 | 0.4 | 0.654549 | 0.654088 | 0.000461 |
| 5 | 0.5 | 0.676277 | 0.675639 | 0.000637 |
| 6 | 0.6 | 0.689786 | 0.688941 | 0.000845 |
| 7 | 0.7 | 0.694213 | 0.693124 | 0.001089 |
| 8 | 0.8 | 0.688605 | 0.687229 | 0.001376 |
| 9 | 0.9 | 0.671909 | 0.670198 | 0.001711 |
| 10 | 1.0 | 0.642959 | 0.640859 | 0.002100 |

Table 1.10.3: The results of applying the modified Euler method with h = 0.1 to the
initial-value problem in Example 1.10.2.

Continuing in this manner, we generate the results displayed in Table 1.10.3. From this
table, we see that the approximation to y(1) according to the modiﬁed Euler method is
y10 = 0.642960.
As seen in the previous example, the value of the exact solution at x = 1 is
y(1) = 0.640859.
Consequently, the absolute error in the approximation at x = 1 using the modiﬁed Euler
approximation with h = 0.1 is
|y(1) − y10 | = 0.002100.
Comparing this with the results of the previous example, we see that the modiﬁed Euler
method has picked up approximately one decimal place of accuracy when using a step
size h = 0.1. This is indicative of the general result that the error in the modiﬁed Euler
method behaves as order $h^2$, as compared to the order $h$ behavior of the Euler method.
In Figure 1.10.4 we have sketched the exact solution to the differential equation and the
modified Euler approximation with h = 0.1.
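The hand calculation in Example 1.10.2 is easy to automate. The following is a minimal sketch, not the text's own program: the names `modified_euler` and `f` are illustrative, and the closed form $y = 1 + x - \tfrac{1}{2}e^x$ used for comparison is an assumption derived here (it satisfies $y' = y - x$ and $y(0) = \tfrac{1}{2}$) rather than quoted from this section.

```python
import math


def modified_euler(f, x0, y0, h, n_steps):
    """Return the list of (x_n, y_n) pairs produced by the modified Euler
    (Heun) method for y' = f(x, y), y(x0) = y0."""
    points = [(x0, y0)]
    y = y0
    for n in range(n_steps):
        x = x0 + n * h                 # x_n
        x_next = x0 + (n + 1) * h      # x_{n+1}
        slope = f(x, y)
        y_star = y + h * slope         # predictor: ordinary Euler step
        # corrector: average the slopes at (x_n, y_n) and (x_{n+1}, y*_{n+1})
        y = y + 0.5 * h * (slope + f(x_next, y_star))
        points.append((x_next, y))
    return points


def f(x, y):
    # Right-hand side of the initial-value problem in Example 1.10.2.
    return y - x


if __name__ == "__main__":
    for x, y in modified_euler(f, 0.0, 0.5, 0.1, 10):
        exact = 1 + x - 0.5 * math.exp(x)   # assumed closed form (see lead-in)
        print(f"{x:.1f}  {y:.6f}  {exact:.6f}  {abs(exact - y):.6f}")
```

Running the script prints one row per step in the same column order as Table 1.10.3 (after an initial row for $x_0$), so the output can be compared against the table directly.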
Figure 1.10.4: The exact solution to the initial-value problem in Example 1.10.2 and the
approximations obtained using the modified Euler method with h = 0.1.

Runge-Kutta Method of Order Four
The ﬁnal method that we consider is somewhat more tedious to use in hand calculations,
but is very easily programmed into a calculator or computer. It is a fourth-order method,
which, in the case of a differential equation of the form $y' = f(x)$, reduces to Simpson's rule (which the reader has probably studied in a calculus course) for numerically
evaluating definite integrals. Without justification, we state the algorithm.
The fourth-order Runge-Kutta method for approximating the solution to the initial-value problem

$$y' = f(x, y), \qquad y(x_0) = y_0$$

at the points $x_{n+1} = x_0 + (n+1)h$ $(n = 0, 1, \dots)$ is

$$y_{n+1} = y_n + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),$$

where

$$k_1 = h f(x_n, y_n), \quad k_2 = h f(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1), \quad k_3 = h f(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_2), \quad k_4 = h f(x_{n+1},\; y_n + k_3),$$

$n = 0, 1, 2, \dots$.

Remark In the previous sections, we used Maple to generate slope fields and approximate solution curves for first-order differential equations. The solution curves were in
fact generated using a Runge-Kutta approximation.

Example 1.10.3 Apply the fourth-order Runge-Kutta method with $h = 0.1$ to determine an approximation
to the solution to the initial-value problem below at $x = 1$:

$$y' = y - x, \qquad y(0) = \tfrac{1}{2}.$$
Solution: We take $h = 0.1$ and $f(x, y) = y - x$ in the fourth-order Runge-Kutta
method, and we need to determine $y_{10}$. First we determine $k_1, k_2, k_3, k_4$:

$$k_1 = 0.1 f(x_n, y_n) = 0.1(y_n - x_n),$$
$$k_2 = 0.1 f(x_n + 0.05,\; y_n + 0.5k_1) = 0.1(y_n + 0.5k_1 - x_n - 0.05),$$
$$k_3 = 0.1 f(x_n + 0.05,\; y_n + 0.5k_2) = 0.1(y_n + 0.5k_2 - x_n - 0.05),$$
$$k_4 = 0.1 f(x_{n+1},\; y_n + k_3) = 0.1(y_n + k_3 - x_{n+1}).$$

When $n = 0$,

$$k_1 = 0.1(0.5) = 0.05,$$
$$k_2 = 0.1[0.5 + (0.5)(0.05) - 0.05] = 0.0475,$$
$$k_3 = 0.1[0.5 + (0.5)(0.0475) - 0.05] = 0.047375,$$
$$k_4 = 0.1(0.5 + 0.047375 - 0.1) = 0.0447375,$$

so that

$$y_1 = y_0 + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4) = 0.5 + \tfrac{1}{6}(0.2844875) = 0.54741458,$$

rounded to eight decimal places. Continuing in this manner, we obtain the results displayed in Table 1.10.4.
| n | x_n | y_n | Exact Solution | Absolute Error |
|----|-----|------------|------------|------------|
| 1 | 0.1 | 0.54741458 | 0.54741454 | 0.00000004 |
| 2 | 0.2 | 0.58929871 | 0.58929862 | 0.00000009 |
| 3 | 0.3 | 0.62507075 | 0.62507060 | 0.00000015 |
| 4 | 0.4 | 0.65408788 | 0.65408765 | 0.00000022 |
| 5 | 0.5 | 0.67563968 | 0.67563936 | 0.00000032 |
| 6 | 0.6 | 0.68894102 | 0.68894060 | 0.00000042 |
| 7 | 0.7 | 0.69312419 | 0.69312365 | 0.00000054 |
| 8 | 0.8 | 0.68723022 | 0.68722954 | 0.00000068 |
| 9 | 0.9 | 0.67019929 | 0.67019844 | 0.00000085 |
| 10 | 1.0 | 0.64086013 | 0.64085909 | 0.00000104 |

Table 1.10.4: The results of applying the fourth-order Runge-Kutta method with h = 0.1 to
the initial-value problem in Example 1.10.3.

In particular, we see that the fourth-order Runge-Kutta method approximation to y(1) is
y10 = 0.64086013,
so that
|y(1) − y10 | = 0.00000104.
Clearly this is an excellent approximation. If we increase the step size to h = 0.2, the
corresponding approximation to y(1) becomes
y5 = 0.640874,
with absolute error
|y(1) − y5 | = 0.000015,
which is still very impressive.
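As the text notes, this method is tedious by hand but easy to program. The following is a minimal sketch of the algorithm stated above, applied to the initial-value problem of Example 1.10.3; the function names `rk4` and `f` are illustrative choices and not part of the text.

```python
def rk4(f, x0, y0, h, n_steps):
    """Return y_n after n_steps fourth-order Runge-Kutta steps of size h
    for the initial-value problem y' = f(x, y), y(x0) = y0."""
    y = y0
    for n in range(n_steps):
        x = x0 + n * h                          # x_n
        k1 = h * f(x, y)
        k2 = h * f(x + 0.5 * h, y + 0.5 * k1)
        k3 = h * f(x + 0.5 * h, y + 0.5 * k2)
        k4 = h * f(x + h, y + k3)               # x + h is x_{n+1}
        y = y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y


def f(x, y):
    # Right-hand side of the initial-value problem in Example 1.10.3.
    return y - x


if __name__ == "__main__":
    print(rk4(f, 0.0, 0.5, 0.1, 10))   # approximation to y(1) with h = 0.1
    print(rk4(f, 0.0, 0.5, 0.2, 5))    # approximation to y(1) with h = 0.2
```

The two calls correspond to the step sizes $h = 0.1$ and $h = 0.2$ discussed in the example, so their output can be compared with the values of $y_{10}$ and $y_5$ reported there.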
Exercises for 1.10

Key Terms
Euler's method, Predictor-corrector method, Modified Euler method (Heun's method), Fourth-order Runge-Kutta method.

Skills
• Be able to apply Euler's method to approximate the solution to an initial-value problem at a point near the initial value $x_0$.
• Be able to use the modified Euler method (Heun's method) to approximate the solution to an initial-value problem at a point near the initial value $x_0$.
• Be able to use the fourth-order Runge-Kutta method to approximate the solution to an initial-value problem at a point near the initial value $x_0$.

True-False Review
For Questions 1–4, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. Generally speaking, the smaller the step size in Euler's method, the more accurate the approximation to the solution of an initial-value problem at a point near the initial value $x_0$.
2. Euler's method is based on the equation of a tangent line to a curve at a given point $(x_0, y_0)$.
3. With each additional step that is taken in Euler's method, the error in the approximation obtained from the method can only grow in size.
4. At each step of length $h$, Heun's method requires two applications of Euler's method with step size $h/2$.

Problems
For Problems 1–5, use Euler's method with the specified
step size to determine the solution to the given initial-value
problem at the specified point.
1. $y' = 4y - 1$, $y(0) = 1$, $h = 0.05$, $y(0.5)$.
2. $y' = -\dfrac{2xy}{1 + x^2}$, $y(0) = 1$, $h = 0.1$, $y(1)$.
3. $y' = x - y^2$, $y(0) = 2$, $h = 0.05$, $y(0.5)$.
4. $y' = -x^2 y$, $y(0) = 1$, $h = 0.2$, $y(1)$.
5. $y' = 2xy^2$, $y(0) = 0.5$, $h = 0.1$, $y(1)$.

For Problems 6–10, use the modified Euler method with the
specified step size to determine the solution to the given
initial-value problem at the specified point. In each case,
compare your answer to that obtained using Euler's method.
6. The initial-value problem in Problem 1.
7. The initial-value problem in Problem 2.
8. The initial-value problem in Problem 3.
9. The initial-value problem in Problem 4.
10. The initial-value problem in Problem 5.

For Problems 11–15, use the fourth-order Runge-Kutta
method with the specified step size to determine the solution to the given initial-value problem at the specified point.
In each case, compare your answer to that obtained using
Euler's method.
11. The initial-value problem in Problem 1.
12. The initial-value problem in Problem 2.
13. The initial-value problem in Problem 3.
14. The initial-value problem in Problem 4.
15. The initial-value problem in Problem 5.
16. Use the fourth-order Runge-Kutta method with $h = 0.5$ to approximate the solution to the initial-value problem
$$y' + \tfrac{1}{10} y = e^{-x/10} \cos x, \qquad y(0) = 0$$
at the points $x = 0.5, 1.0, \dots, 25$. Plot these points and describe the behavior of the corresponding solution.

1.11 Some Higher-Order Differential Equations
So far we have developed analytical techniques only for solving special types of first-order differential equations. The methods that we have discussed do not apply directly to
higher-order differential equations, and so the solution to such equations usually requires
the derivation of new techniques. One approach is to replace a higher-order differential
equation by an equivalent system of first-order equations. (This will be developed further
in Chapter 7.) For example, any second-order differential equation that can be written in
the form

$$\frac{d^2y}{dx^2} = F\left(x, y, \frac{dy}{dx}\right), \tag{1.11.1}$$

where $F$ is a known function, can be replaced by an equivalent pair of first-order differential equations as follows. We let $v = dy/dx$. Then $d^2y/dx^2 = dv/dx$, and so solving Equation (1.11.1) is equivalent to solving the following two first-order differential
equations:

$$\frac{dy}{dx} = v, \tag{1.11.2}$$
$$\frac{dv}{dx} = F(x, y, v). \tag{1.11.3}$$

In general the differential equation (1.11.3) cannot be solved directly, since it involves
three variables, x , y , and v . However, for certain forms of the function F , Equation (1.11.3) will involve only two variables, and then we can sometimes solve it for
v using one of our previous techniques. Having obtained v , we can then substitute into
Equation (1.11.2) to obtain a ﬁrst-order differential equation for y . We now discuss two
forms of F for which this is certainly the case.
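Even when the system (1.11.2), (1.11.3) cannot be solved analytically, the same reduction makes a second-order equation accessible to the numerical methods of Section 1.10. The following is a minimal sketch, assuming we advance the pair $(y, v)$ together using Euler's method; the function name `solve_second_order` and the test equation $y'' = -y$ are illustrative choices, not taken from the text.

```python
import math


def solve_second_order(F, x0, y0, v0, h, n_steps):
    """Approximate y(x0 + n_steps*h) for y'' = F(x, y, y') by applying
    Euler's method to the system dy/dx = v, dv/dx = F(x, y, v)."""
    x, y, v = x0, y0, v0
    for _ in range(n_steps):
        # Both updates use the values of x, y, v at the start of the step.
        y, v = y + h * v, v + h * F(x, y, v)
        x += h
    return y


if __name__ == "__main__":
    # Illustrative test problem (an assumption for this sketch):
    # y'' = -y, y(0) = 1, y'(0) = 0, whose exact solution is y = cos x.
    approx = solve_second_order(lambda x, y, v: -y, 0.0, 1.0, 0.0, 0.001, 1000)
    print(approx, math.cos(1.0))
```

Euler's method is used here only because it is the simplest choice; either of the higher-order methods of Section 1.10 could be applied to the pair $(y, v)$ in the same way.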
Case 1: Second-Order Equations with the Dependent Variable Missing
If $y$ does not occur explicitly in the function $F$, then Equation (1.11.1) assumes the
form

$$\frac{d^2y}{dx^2} = F\left(x, \frac{dy}{dx}\right). \tag{1.11.4}$$

Substituting $v = dy/dx$ and $dv/dx = d^2y/dx^2$ into this equation allows us to replace
it with the two first-order equations

$$\frac{dy}{dx} = v, \tag{1.11.5}$$
$$\frac{dv}{dx} = F(x, v). \tag{1.11.6}$$

Thus, to solve Equation (1.11.4), we first solve Equation (1.11.6) for $v$ in terms of
$x$ and then solve Equation (1.11.5) for $y$ as a function of $x$.

Example 1.11.1 Find the general solution to

$$\frac{d^2y}{dx^2} = \frac{1}{x}\left(\frac{dy}{dx} + x^2 \cos x\right), \qquad x > 0. \tag{1.11.7}$$
Solution: In Equation (1.11.7) the dependent variable is missing, and so we let $v = dy/dx$, which implies that $d^2y/dx^2 = dv/dx$. Substituting into Equation (1.11.7) yields
the following equivalent first-order system:

$$\frac{dy}{dx} = v, \tag{1.11.8}$$
$$\frac{dv}{dx} = \frac{1}{x}(v + x^2 \cos x). \tag{1.11.9}$$

Equation (1.11.9) is a first-order linear differential equation with standard form

$$\frac{dv}{dx} - x^{-1} v = x \cos x. \tag{1.11.10}$$

An appropriate integrating factor is

$$I(x) = e^{-\int x^{-1}\,dx} = e^{-\ln x} = x^{-1}.$$

Multiplying Equation (1.11.10) by $x^{-1}$ reduces it to

$$\frac{d}{dx}(x^{-1} v) = \cos x,$$

which can be integrated directly to obtain

$$x^{-1} v = \sin x + c.$$

Thus,

$$v = x \sin x + cx. \tag{1.11.11}$$

Substituting the expression for $v$ from (1.11.11) into Equation (1.11.8) gives

$$\frac{dy}{dx} = x \sin x + cx,$$

which we can integrate to obtain

$$y(x) = -x \cos x + \sin x + c_1 x^2 + c_2,$$

where $c_1 = c/2$.
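As a check on the result of Example 1.11.1 (a verification added here, not part of the original solution), we can differentiate $y(x) = -x\cos x + \sin x + c_1 x^2 + c_2$ twice and substitute into Equation (1.11.7):

```latex
\begin{align*}
y'  &= -\cos x + x\sin x + \cos x + 2c_1 x = x\sin x + 2c_1 x, \\
y'' &= \sin x + x\cos x + 2c_1, \\
\frac{1}{x}\left(y' + x^2\cos x\right) &= \sin x + 2c_1 + x\cos x = y''.
\end{align*}
```

Both sides agree for every choice of $c_1$ and $c_2$, as expected of a general solution containing two arbitrary constants.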
Case 2: Second-Order Equations with the Independent Variable Missing
If $x$ does not occur explicitly in the function $F$ in Equation (1.11.1), then we must
solve a differential equation of the form

$$\frac{d^2y}{dx^2} = F\left(y, \frac{dy}{dx}\right). \tag{1.11.12}$$

In this case, we still let

$$v = \frac{dy}{dx},$$

as previously, but now we use the chain rule to express $d^2y/dx^2$ in terms of $dv/dy$.
Specifically, we have

$$\frac{d^2y}{dx^2} = \frac{dv}{dx} = \frac{dv}{dy}\frac{dy}{dx} = v\frac{dv}{dy}.$$

Substituting for $dy/dx$ and $d^2y/dx^2$ into Equation (1.11.12) reduces the second-order
equation to the equivalent first-order system

$$\frac{dy}{dx} = v, \tag{1.11.13}$$
$$v\frac{dv}{dy} = F(y, v). \tag{1.11.14}$$

In this case, we first solve Equation (1.11.14) for $v$ as a function of $y$ and then solve
Equation (1.11.13) for $y$ as a function of $x$.
Example 1.11.2 Find the general solution to

$$\frac{d^2y}{dx^2} = -\frac{2}{1 - y}\left(\frac{dy}{dx}\right)^2. \tag{1.11.15}$$

Solution: In this differential equation, the independent variable does not occur explicitly. Therefore, we let $v = dy/dx$ and use the chain rule to obtain

$$\frac{d^2y}{dx^2} = \frac{dv}{dx} = \frac{dv}{dy}\frac{dy}{dx} = v\frac{dv}{dy}.$$

Substituting into Equation (1.11.15) results in the equivalent system

$$\frac{dy}{dx} = v, \tag{1.11.16}$$
$$v\frac{dv}{dy} = -\frac{2}{1 - y}\, v^2. \tag{1.11.17}$$

Separating the variables in the differential equation (1.11.17) gives

$$\frac{1}{v}\,dv = -\frac{2}{1 - y}\,dy, \tag{1.11.18}$$

which can be integrated to obtain

$$\ln|v| = 2\ln|1 - y| + c.$$

Combining the logarithm terms and exponentiating yields

$$v(y) = c_1 (1 - y)^2, \tag{1.11.19}$$

where we have set $c_1 = \pm e^c$. Notice that in solving Equation (1.11.17), we implicitly
assumed that $v \neq 0$, since we divided by it to obtain Equation (1.11.18). However, the
general form (1.11.19) does include the solution $v = 0$, provided we allow $c_1$ to equal
zero. Substituting for $v$ into Equation (1.11.16) yields

$$\frac{dy}{dx} = c_1 (1 - y)^2.$$

Separating the variables and integrating, we obtain

$$(1 - y)^{-1} = c_1 x + d_1.$$

That is,

$$1 - y = \frac{1}{c_1 x + d_1}.$$

Solving for $y$ gives

$$y(x) = \frac{c_1 x + (d_1 - 1)}{c_1 x + d_1}, \tag{1.11.20}$$

which can be written in the simpler form

$$y(x) = \frac{x + a}{x + b}, \tag{1.11.21}$$

where the constants $a$ and $b$ are defined by $a = (d_1 - 1)/c_1$ and $b = d_1/c_1$. Notice
that the form (1.11.21) does not include the solution $y = $ constant, which is contained
in (1.11.20) (set $c_1 = 0$). This is because in dividing by $c_1$, we implicitly assumed that
$c_1 \neq 0$. Thus in specifying the solution in the form (1.11.21), we should also include the
statement that any constant function $y = k$ ($k$ a constant) is a solution.

Example 1.11.3 Determine the displacement at time $t$ of a simple harmonic oscillator that is extended a
distance $A$ units from its equilibrium position and released from rest at $t = 0$.

Solution: According to the derivation in Section 1.1, the motion of the simple harmonic oscillator is governed by the initial-value problem
$$\frac{d^2y}{dt^2} = -\omega^2 y, \tag{1.11.22}$$
$$y(0) = A, \qquad \frac{dy}{dt}(0) = 0, \tag{1.11.23}$$

where $\omega$ is a positive constant. The differential equation (1.11.22) has the independent
variable $t$ missing. We therefore let $v = dy/dt$ and use the chain rule to write

$$\frac{d^2y}{dt^2} = v\frac{dv}{dy}.$$

It then follows that Equation (1.11.22) can be replaced by the equivalent first-order
system

$$\frac{dy}{dt} = v, \tag{1.11.24}$$
$$v\frac{dv}{dy} = -\omega^2 y. \tag{1.11.25}$$

Separating the variables and integrating Equation (1.11.25) yields

$$\tfrac{1}{2} v^2 = -\tfrac{1}{2}\omega^2 y^2 + c,$$

which implies that

$$v = \pm\sqrt{c_1 - \omega^2 y^2},$$

where $c_1 = 2c$. Substituting for $v$ into Equation (1.11.24) yields

$$\frac{dy}{dt} = \pm\sqrt{c_1 - \omega^2 y^2}. \tag{1.11.26}$$

Setting $t = 0$ in this equation and using the initial conditions (1.11.23), we find that
$c_1 = \omega^2 A^2$. Equation (1.11.26) therefore gives

$$\frac{dy}{dt} = \pm\omega\sqrt{A^2 - y^2}.$$

By separating the variables and integrating, we obtain

$$\arcsin(y/A) = \pm\omega t + b,$$

where $b$ is an integration constant. Thus,

$$y(t) = A\sin(b \pm \omega t).$$

The initial condition $y(0) = A$ implies that $\sin b = 1$, and so we can choose $b = \pi/2$.
We therefore have

$$y(t) = A\sin(\pi/2 \pm \omega t).$$

That is,

$$y(t) = A\cos\omega t.$$
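As a brief check (added here for illustration, not part of the original solution), the function $y(t) = A\cos\omega t$ can be substituted directly into the initial-value problem (1.11.22), (1.11.23), and into the first integral obtained along the way with $c_1 = \omega^2 A^2$:

```latex
\begin{align*}
y(t) &= A\cos\omega t, & y(0) &= A, \\
\frac{dy}{dt} &= -A\omega\sin\omega t, & \frac{dy}{dt}(0) &= 0, \\
\frac{d^2y}{dt^2} &= -A\omega^2\cos\omega t = -\omega^2 y, &
\left(\frac{dy}{dt}\right)^2 + \omega^2 y^2 &= \omega^2 A^2 .
\end{align*}
```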
Consequently the predicted motion is that the mass oscillates between $\pm A$ for all $t$. This
solution makes sense physically, since the simple harmonic oscillator does not include
dissipative forces that would slow the motion.

Remark In Chapter 6 we will see how to solve the initial-value problem (1.11.22),
(1.11.23) in just a few lines of work without requiring any integration!

Exercises for 1.11
Skills
• Be familiar with the strategy of solving a higher-order differential equation by replacing it with an equivalent system of first-order differential equations, and be able to carry out this strategy in particular instances.

Problems
For Problems 1–13, solve the given differential equation.
1. $y'' = 2x^{-1} y' + 4x^2$.
2. $(x - 1)(x - 2)y'' = y' - 1$.
3. $y'' + 2y^{-1}(y')^2 = y'$.
4. $y'' = (y')^2 \tan y$.
5. $y'' + y' \tan x = (y')^2$.
6. $\dfrac{d^2x}{dt^2} = \left(\dfrac{dx}{dt}\right)^2 + 2\dfrac{dx}{dt}$.
7. $y'' - 2x^{-1} y' = 6x^4$.
8. $t\,\dfrac{d^2x}{dt^2} = 2\left(t + \dfrac{dx}{dt}\right)$.
9. $y'' - \alpha(y')^2 - \beta y' = 0$, where $\alpha$ and $\beta$ are nonzero constants.
10. $y'' - 2x^{-1} y' = 18x^4$.
11. $(1 + x^2)y'' = -2xy'$.
12. $y'' + y^{-1}(y')^2 = y e^{-y}(y')^3$.
13. $y'' - y' \tan x = 1$, $0 \le x < \pi/2$.

In Problems 14–15, solve the given initial-value problem.
14. $yy'' = 2(y')^2 + y^2$, $y(0) = 1$, $y'(0) = 0$.
15. $y'' = \omega^2 y$, $y(0) = a$, $y'(0) = 0$, where $\omega$, $a$ are positive constants.

16. The following initial-value problem arises in the analysis of a cable suspended between two fixed points:
$$y'' = \frac{1}{a}\sqrt{1 + (y')^2}, \qquad y(0) = a, \quad y'(0) = 0,$$
where $a$ is a nonzero constant. Solve this initial-value problem for $y(x)$. The corresponding solution curve is called a catenary.
17. Consider the general second-order linear differential equation with dependent variable missing:
$$y'' + p(x)y' = q(x).$$
Replace this differential equation with an equivalent pair of first-order equations and express the solution in terms of integrals.
18. Consider the general third-order differential equation of the form
$$y''' = F(x, y''). \tag{1.11.27}$$
(a) Show that Equation (1.11.27) can be replaced by the equivalent first-order system
$$\frac{du_1}{dx} = u_2, \qquad \frac{du_2}{dx} = u_3, \qquad \frac{du_3}{dx} = F(x, u_3),$$
where the variables $u_1, u_2, u_3$ are defined by
$$u_1 = y, \qquad u_2 = y', \qquad u_3 = y''.$$
(b) Solve $y''' = x^{-1}(y'' - 1)$.
19. A simple pendulum consists of a particle of mass $m$ supported by a piece of string of length $L$. Assuming that the pendulum is displaced through an angle $\theta_0$ radians from the vertical and then released from rest, the resulting motion is described by the initial-value problem
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\sin\theta = 0, \qquad \theta(0) = \theta_0, \quad \frac{d\theta}{dt}(0) = 0. \tag{1.11.28}$$
(a) For small oscillations, $\theta \ll 1$, we can use the approximation $\sin\theta \approx \theta$ in Equation (1.11.28) to obtain the linear equation
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\theta = 0, \qquad \theta(0) = \theta_0, \quad \frac{d\theta}{dt}(0) = 0.$$
Solve this initial-value problem for $\theta$ as a function of $t$. Is the predicted motion reasonable?
(b) Obtain the following first integral of (1.11.28):
$$\frac{d\theta}{dt} = \pm\sqrt{\frac{2g}{L}(\cos\theta - \cos\theta_0)}. \tag{1.11.29}$$
(c) Show from Equation (1.11.29) that the time $T$ (equal to one-fourth of the period of motion) required for $\theta$ to go from $0$ to $\theta_0$ is given by the elliptic integral of the first kind
$$T = \sqrt{\frac{L}{2g}} \int_0^{\theta_0} \frac{1}{\sqrt{\cos\theta - \cos\theta_0}}\,d\theta. \tag{1.11.30}$$
(d) Show that (1.11.30) can be written as
$$T = \sqrt{\frac{L}{g}} \int_0^{\pi/2} \frac{1}{\sqrt{1 - k^2\sin^2 u}}\,du,$$
where $k = \sin(\theta_0/2)$. [Hint: First express $\cos\theta$ and $\cos\theta_0$ in terms of $\sin^2(\theta/2)$ and $\sin^2(\theta_0/2)$.]

1.12 Chapter Review

Basic Theory of Differential Equations
This chapter has provided an introduction to the theory of differential equations. A
differential equation involves one or more derivatives of an unknown function, and the
order of the highest derivative that appears is the order of the differential equation.
For an nth-order differential equation, the general solution contains n arbitrary constants, and all solutions can be obtained by assigning appropriate values to the constants.
This chapter is concerned mainly with first-order differential equations, which may
be written in the form

$$\frac{dy}{dx} = f(x, y), \tag{1.12.1}$$

for some given function $f$. If we impose an initial condition specifying the value of a
solution $y(x)$ to the differential equation (1.12.1) at a particular point $x_0$, say $y_0 = y(x_0)$,
then we have an initial-value problem:

$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0. \tag{1.12.2}$$

To solve an initial-value problem of the form (1.12.2), the first step is to determine the
general solution to the differential equation (1.12.1), and then use the initial condition to
determine the specific value of the arbitrary constant appearing in the general solution.

Solution Techniques for First-Order Differential Equations
One of our main goals in this chapter is to ﬁnd solutions to ﬁrst-order differential equations of the form (1.12.1). There are various ways in which we can seek these solutions:
1. Geometrically: The function f (x, y) gives the slope of the tangent line to the
solution curves of the differential equation (1.12.1) at the point (x, y). Thus, by
computing f (x, y) for various points (x, y), we can draw small line segments
through the point (x, y) with slope f (x, y) to depict how a solution curve would
pass through (x, y). The resulting picture of line segments is called the slope ﬁeld
of the differential equation, and any solution curves to the differential equation in
the xy -plane must be tangent to the slope ﬁeld at all points.
For example, the differential equation dy/dx = −x/y determines a slope ﬁeld
consisting of small line segments that encircle the origin. Indeed, the solutions to
this differential equation consist of concentric circles centered at the origin.
One piece of theory is that different solution curves for the same differential equation can never cross (this essentially tells us that an initial-value problem cannot
have multiple solutions). Thus, for example, if we ﬁnd a solution to the differential
equation (1.12.1) of the form y(x) = y0 , for some constant y0 (recall that such a
solution is called an equilibrium solution), then all other solution curves to the
differential equation must lie entirely above the line y = y0 or entirely below it.
2. Numerically: Suppose we wish to approximate the solution to the initial-value
problem (1.12.2) at the point $x = x_1 = x_0 + h$, where $h$ is small. Euler's method
uses the slope of the solution at $(x_0, y_0)$, which is $f(x_0, y_0)$, to form a tangent line
approximation to the solution:
$$y(x) \approx y_0 + f(x_0, y_0)(x - x_0).$$
Therefore, we approximate
$$y(x_1) \approx y_1 = y_0 + f(x_0, y_0)(x_1 - x_0) = y_0 + h f(x_0, y_0).$$
Now, starting from the point $(x_1, y_1)$, we can repeat the process to find approximations to the solution at other points $x_2, x_3, \dots$. The conclusion is that the
approximation to the solution to the initial-value problem (1.12.2) at the points
$x_{n+1} = x_0 + (n+1)h$ $(n = 0, 1, \dots)$ is
$$y_{n+1} = y_n + h f(x_n, y_n), \qquad n = 0, 1, \dots.$$
In Section 1.10, other modifications to Euler's method are also discussed.
3. Analytically: In some situations, we can explicitly obtain an equation for the general solution to the differential equation (1.12.1). These include situations in which
the differential equation is separable, ﬁrst-order linear, ﬁrst-order homogeneous,
Bernoulli, and/or exact. Table 1.12.1 shows the types of differential equations we
can solve analytically and summarizes the solution techniques. If a given differential equation cannot be written in one of these forms, then the next step is to try
to determine an integrating factor. If that fails, then we might try to ﬁnd a change
of variables that would reduce the differential equation to one of the above types.

| Type | Standard Form | Technique |
|------|---------------|-----------|
| Separable | $p(y)y' = q(x)$ | Separate the variables and integrate. |
| First-order linear | $y' + p(x)y = q(x)$ | Rewrite as $\frac{d}{dx}(I \cdot y) = I \cdot q(x)$, where $I = e^{\int p(x)\,dx}$, and integrate with respect to $x$. |
| First-order homogeneous | $y' = f(x, y)$ where $f(tx, ty) = f(x, y)$ | Change variables: $y = xV(x)$, and reduce to a separable equation. |
| Bernoulli | $y' + p(x)y = q(x)y^n$ | Divide by $y^n$ and make the change of variables $u = y^{1-n}$. This reduces the differential equation to a linear equation. |
| Exact | $M\,dx + N\,dy = 0$, with $M_y = N_x$ | The solution is $\phi(x, y) = c$, where $\phi$ is determined by integrating $\phi_x = M$, $\phi_y = N$. |

Table 1.12.1: A summary of the basic solution techniques for $y' = f(x, y)$.

Example 1.12.1 Determine which of the above types, if any, the following differential equation falls into:

$$\frac{dy}{dx} = -\frac{8x^5 + 3y^4}{4xy^3}.$$

Solution: Since the given differential equation is written in the form $dy/dx = f(x, y)$, we first check whether it is separable or homogeneous. By inspection, we
see that it is neither of these. We next check to see whether it is a linear or a Bernoulli
equation. We therefore rewrite the equation in the equivalent form

$$\frac{dy}{dx} + \frac{3}{4x}\,y = -2x^4 y^{-3}, \tag{1.12.3}$$

which we recognize as a Bernoulli equation with $n = -3$. We could therefore solve the
equation using the appropriate technique. Owing to the $y^{-3}$ term in Equation (1.12.3),
it follows that the equation is not a linear equation. Finally, we check for exactness. The
natural differential form to try for the given differential equation is

$$(8x^5 + 3y^4)\,dx + 4xy^3\,dy = 0. \tag{1.12.4}$$

In this form, we have

$$M_y = 12y^3, \qquad N_x = 4y^3,$$

so that the equation is not exact. However, we see that

$$(M_y - N_x)/N = 2x^{-1},$$

so that according to Theorem 1.9.11, $I(x) = x^2$ is an integrating factor. Therefore, we
could multiply Equation (1.12.4) by $x^2$ and then solve it as an exact equation.

Examples of First-Order Differential Equations
There are numerous real-world examples of ﬁrst-order differential equations. Among the
applications discussed in this chapter are Newton’s law of cooling, families of orthogonal trajectories, Malthusian and logistic population models, mixing problems, electric
circuits, and others.

Additional Problems
1. A racquetball player standing at the back wall of the
court hits the ball from a height of 2 feet horizontally
toward the front wall at 80 miles per hour. The length
of a regulation racquetball court is 40 feet. Does the
ball reach the front wall before hitting the ground?
Neglect air resistance, and assume the acceleration of
gravity is 32 feet/sec².
2. A boy 2 meters tall shoots a toy rocket straight up
from head level at 10 meters per second. Assume the
acceleration of gravity is 9.8 meters/sec².
(a) What is the highest point above the ground
reached by the rocket?
(b) When does the rocket hit the ground?
In Problems 3–6, find the equation of the orthogonal trajectories to the given family of curves.
3. $y = cx^3$.
4. $y^2 = cx^3$.
5. $y = \ln(cx)$.
6. $x^4 + y^4 = c$.
7. Consider the family of curves
$$x^2 + 3y^2 = 2cy. \tag{1.12.5}$$
(a) Show that the differential equation of this family is
$$\frac{dy}{dx} = \frac{2xy}{x^2 - 3y^2}.$$
(b) Determine the orthogonal trajectories to the family (1.12.5).
In Problems 8–9, sketch the slope field and some representative solution curves for the given differential equation.
8. $y' = \sin x$.
9. $y' = y/x^2$.
10. At time $t$ the velocity, $v(t)$, of an object is governed by the differential equation
$$\frac{dv}{dt} = \frac{1}{2}(25 - v), \qquad t > 0.$$
(a) Verify that $v(t) = 25$ is a solution to this differential equation.
(b) Sketch the slope field for $0 \le v \le 25$. What happens to $v(t)$ as $t \to \infty$?
11. An object of mass $m$ is released from rest in a medium
in which the frictional forces are proportional to the
square of the velocity. The initial-value problem that
governs the subsequent motion is
$$mv\frac{dv}{dy} = mg - kv^2, \qquad v(0) = 0, \tag{1.12.6}$$
where $v(t)$ denotes the velocity of the object at time
$t$, $y(t)$ denotes the distance traveled by the object at
time $t$ as measured from the point at which the object
was released, and $k$ is a positive constant.
(a) Solve (1.12.6) and show that
$$v^2 = \frac{mg}{k}\left(1 - e^{-2ky/m}\right).$$
(b) Make a sketch of v2 2xy
dy
.
=− 2
dx
x + 2y 15. (y 2 + x2) dx − x2 dy = 0. 16. y + y(tan x + y sin x) = 0.
dy
2e 2x
1
17.
.
+
y = 2x
2x
dx
1+e
e −1
18. y − x −1 y = x −1 x 2 − y 2 .
sin y + y cos x + 1
dy
=
.
19.
dx
1 − x cos y − sin x
1
dy
+ y=
dx
x 33. dy
x2
y
=2
+.
2
dx
x
x −y 34. [ln (xy) + 1] dx +
35. y + x
+ 2y
y d y = 0. 25 ln x
y
.
=
x
2x 3 y 36. (x + xy 2 )y = x 3 yex −y . 25x 2 ln x
2y π
.
2 For Problems 38–41, determine which of the ﬁve types of
differential equations we have studied the given differential
equation falls into, and use an appropriate technique to ﬁnd
the solution to the initial-value problem. 2x 2 ln x . 14. 20. dy
√
− x2y = y.
dx 37. y = cos x(y csc x − 1), 0 < x < 2 ln x
dy
=
.
12.
dx
xy + 3xy 32. as a function of y . In Problems 12–37, determine which of the ﬁve types of differential equations we have studied the given equation falls
into (see Table 1.12.1), and use an appropriate technique to
ﬁnd the general solution. 13. xy − 2y = 31. x sec2 (xy) dy = − y sec2 (xy) + 2x d x . . 38. y − x 2 y = x 2 , y(0) = 5.
39. e−3x +2y dx + ex −4y dy = 0, y(0) = 0.
40. (3x 2 + 2xy 2 ) dx + (2x 2 y) dy = 0, y(1) = 3.
41. dy
1
− (sin x)y = e− cos x , y(0) = .
dx
e 42. Determine all values of the constants m and n, if there
are any, for which the differential equation
(x 5 + y m ) dx − x n y 3 dy = 0
is each of the following: 21. e2x +y dy − ex −y dx = 0. (a) Exact. 22. y + y cot x = sec x . (b) Separable. dy
2e x
√
23.
+
y = 2 ye−x .
dx
1 + ex
24. y [ln (y/x) + 1]dx − xdy = 0.
25. (1 + 2xey ) dx − (ey + x) dy = 0.
26. y + y sin x = sin x .
27. (3y 2 + x 2 ) dx − 2xy dy = 0.
28. 2x(ln x)y − y = −9x 3 y 3 ln x .
29. (1 + x)y = y(2 + x).
30. (x 2 − 1)(y − 1) + 2y = 0. (c) Homogeneous.
(d) Linear.
(e) Bernoulli.
43. A man's sandals are moved from poolside (80°F) to
a sauna (180°F) to warm and dry them. If they are
100°F after 3 minutes in the sauna, how much time is
required in the sauna to increase their temperature to
140°F, according to Newton's law of cooling?
44. A hot plate (150°F) is placed on a countertop in a
room kept at 70°F. If the plate cools 25°F in the first
10 minutes, when does the plate reach 100°F, according to Newton's law of cooling?
45. A simple nonlinear law of cooling states that the rate
of change of temperature of an object is proportional
to the square of the temperature difference between
the object and its surrounding medium (you may assume that the temperature of the surrounding medium
is constant). Set up and solve the initial-value problem
that governs this cooling process if the initial temperature is T0 . What happens to the temperature of the
object as t → ∞?
46. The temperature of an object at time t is governed by
the linear differential equation
dT
= −k(T − 5 cos 2t).
dt
At t = 0, the temperature of the object is 0°F and is,
at that time, increasing at a rate of 5°F/min.
(a) Determine the value of the constant k .
(b) Determine the temperature of the object at time
t.
(c) Describe the behavior of the temperature of the
object for large values of t .
47. Each spring, sandhill cranes migrate through the Platte
River valley in central Nebraska. An estimated maximum of a half-million of these birds reach the region
by April 1 each year. If there are only 100,000 sandhill
cranes 15 days later and the sandhill cranes leave the
Platte River valley at a rate proportional to the number
of them still in the valley at the time,
(a) How many sandhill cranes remain in the valley
30 days after April 1?
(b) How many sandhill cranes remain in the valley
35 days after April 1?
(c) How many days after April 1 will there be fewer
than 1000 sandhill cranes in the valley?
48. A city’s population in the year 2000 was 200,000, in
2003 it was 230,000, and in 2006 it was 250,000. Using
the logistic model of population, predict the population in 2010 and 2020.
49. Consider an RC circuit with $R = 4\ \Omega$, $C = \tfrac{1}{5}$ F,
and $E(t) = 6\cos 2t$ V. If $q(0) = 3$ C, determine the
current in the circuit for $t \ge 0$.
50. Consider an RL circuit with $R = 3\ \Omega$, $L = 0.3$ H, and
$E(t) = 10$ V. If $i(0) = 3$ A, determine the current in
the circuit for $t \ge 0$.
51. A solution containing 3 g/L of a salt solution pours into
a tank, initially half full of water, at a rate of 6 L/min.
The well-stirred mixture ﬂows out at a rate of 4 L/min.
If the tank holds 60 L, ﬁnd the amount of salt (in grams)
in the tank when the solution overﬂows.
In Problems 52–53, use Euler’s method with the speciﬁed
step size to determine the solution to the given initial-value
problem at the speciﬁed point.
52. $y' = x^2 + 2y^2$, y(0) = −3, h = 0.1, y(1).
53. $y' = \dfrac{3x}{y} + 2$, y(1) = 2, h = 0.05, y(1.5).
In Problems 54–55, use the modified Euler method with the
speciﬁed step size to determine the solution to the given
initial-value problem at the speciﬁed point. In each case,
compare your answer to that determined by using Euler’s
method.
54. The initial-value problem in Problem 52.
55. The initial-value problem in Problem 53.
In Problems 56–57, use the fourth-order Runge-Kutta
method with the speciﬁed step size to determine the solution to the given initial-value problem at the speciﬁed point.
In each case, compare your answer to that determined by
using Euler’s method.
56. The initial-value problem in Problem 52.
57. The initial-value problem in Problem 53.

Project: A Cylindrical Tank Problem
Consider an open cylindrical tank of height h0 meters and radius r meters that is ﬁlled
with water. A circular hole of radius l meters in the bottom of the tank allows the water
to ﬂow out under the inﬂuence of gravity. According to Torricelli’s law, the water ﬂows
out with the same speed that it would acquire in falling freely from the water level in the
tank to the hole.
1. Use Torricelli’s law to derive the following equation for the rate of change of
volume of water in the tank,
\[ \frac{dV}{dt} = -a\sqrt{2gh}, \]
where h(t) denotes the height of water in the tank at time t, a denotes the area of
the hole, and g denotes the acceleration due to gravity. [Hint: First show that an
object that is released from rest at a height h hits the ground with a speed $\sqrt{2gh}$.
Then consider the change in the volume of water in the tank in a time interval $\Delta t$.]
2. Show that the rate of change of volume of water in the tank is also given by
\[ \frac{dV}{dt} = \pi r^2 \frac{dh}{dt}. \]
3. Using the results from problems (1) and (2), determine the height of the water in
the tank at time t , and show that the tank will empty when t = te where
\[ t_e = \frac{\pi r^2}{a}\sqrt{\frac{2h_0}{g}}. \]
4. Suppose now that starting at t = 0 chemical is added to the water in the tank at a
rate of w grams/second. Derive the following differential equation governing the
amount of chemical, A(t), in the tank at time t :
\[ \frac{dA}{dt} - \frac{2}{t - t_e}\,A = w, \qquad 0 < t < t_e. \tag{1.12.7} \]
5. Solve the differential equation (1.12.7). Determine the time when A(t) is a maximum.
6. By making an appropriate change of variables in the differential equation (1.12.7),
derive a differential equation for the concentration c(t) of chemical in the tank at
time t . Solve your differential equation and verify that you get the same expression
for c(t) as you do by dividing the expression for A(t) obtained in the previous
problem by V (t).
7. In the particular case when h0 = 16 m, r = 5 m, l = 0.1 m, and w = 15 g/s, determine
te, and the time when the concentration of chemical in the tank reaches 1 g/L.

CHAPTER 2
Matrices and Systems
of Linear Equations
Algebra is the intellectual instrument which has been created for rendering clear
the quantitative aspects of the world. — Alfred North Whitehead

We will see in the later chapters that most problems in linear algebra can be reduced
to questions regarding the solutions of systems of linear equations. In preparation for
this, the next two chapters provide a detailed introduction to the theory and solution
techniques for such systems. An example of a linear system of equations in the unknowns
x1 , x2 , x3 is
3x1 + 4x2 − 7x3 = 5,
2x1 − 3x2 + 9x3 = 7,
7x1 + 2x2 − 3x3 = 4.
We see that this system is completely determined by the array of numbers
\[ \begin{bmatrix} 3 & 4 & -7 & 5 \\ 2 & -3 & 9 & 7 \\ 7 & 2 & -3 & 4 \end{bmatrix}, \]
which contains the coefficients of the unknowns on the left-hand side of the system and
the numbers appearing on the right-hand side of the system. Such an array is an example
of a matrix. In this chapter we see that, in general, linear systems of equations are best
represented in terms of matrices and that, once such a representation has been made, the
set of all solutions to the system can be easily determined. In the ﬁrst few sections of
this chapter we therefore introduce the basics of matrix algebra. We then apply matrices
to solve systems of linear equations. In Chapter 7, we will see how matrices also give a
natural framework for formulating and solving systems of linear differential equations.
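As a minimal sketch of this idea in Python (the list-of-lists representation and the helper names `split_augmented` and `left_hand_sides` are illustrative assumptions, not notation from the text), the array above can be stored row by row and split back into the coefficients and the right-hand sides:

```python
# Array of numbers for the system
#   3x1 + 4x2 - 7x3 = 5,  2x1 - 3x2 + 9x3 = 7,  7x1 + 2x2 - 3x3 = 4,
# stored as a list of rows (an illustrative representation).
augmented = [
    [3, 4, -7, 5],
    [2, -3, 9, 7],
    [7, 2, -3, 4],
]


def split_augmented(rows):
    """Separate the coefficient rows from the right-hand-side column."""
    coefficients = [row[:-1] for row in rows]
    right_hand_side = [row[-1] for row in rows]
    return coefficients, right_hand_side


def left_hand_sides(coefficients, x):
    """Evaluate the left-hand side of every equation at the trial values x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in coefficients]


coefficients, right_hand_side = split_augmented(augmented)
trial = [1, 1, 1]
# A trial vector solves the system only if the two lists agree entry by entry.
print(left_hand_sides(coefficients, trial), right_hand_side)
```

Here the trial values x1 = x2 = x3 = 1 do not solve the system, since the evaluated left-hand sides differ from the right-hand sides.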
2.1 Matrices: Definitions and Notation
We begin our discussion of matrices with a definition.

DEFINITION 2.1.1
An m × n (read “m by n”) matrix is a rectangular array of numbers arranged in m
horizontal rows and n vertical columns. Matrices are usually denoted by uppercase
letters, such as A and B . The entries in the matrix are called the elements of the
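matrix.

Example 2.1.2 The following are examples of a 2 × 3 and a 3 × 3 matrix, respectively:
\[ A = \begin{bmatrix} \tfrac{3}{2} & \tfrac{5}{4} & \tfrac{1}{5} \\ 0 & -\tfrac{3}{7} & \tfrac{5}{9} \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}. \]
We will use the index notation to denote the elements of a matrix. According to this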
notation, the element in the i th row and j th column of the matrix A will be denoted aij .
Thus, for the matrices in the previous example we have
\[ a_{13} = \tfrac{1}{5}, \qquad a_{22} = -\tfrac{3}{7}, \qquad b_{23} = -1, \]
and so on. Using the index notation, a general m × n matrix A is written
\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \]
or, in a more abbreviated form, A = [aij ].

Remark The expression m × n representing the number of rows and columns of a
general matrix A is sometimes informally called the size of the matrix A. The numbers
m and n themselves are sometimes called the dimensions1 of the matrix A.
Next we define what is meant by equality of matrices.

DEFINITION 2.1.3
Two matrices A and B are equal, written A = B , if
1. They both have the same size, m × n.
2. All corresponding elements in the matrices are equal: aij = bij for all i and j
with 1 ≤ i ≤ m and 1 ≤ j ≤ n.
1 Be careful not to confuse this usage of the term with the dimension of a vector space, which will be
introduced in Chapter 4.

According to Definition 2.1.3, even though the matrices
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 2 \\ 3 & 6 \\ 1 & 5 \end{bmatrix} \]
contain the same six numbers, and therefore store the same basic information, they are
not equal as matrices.

Row Vectors and Column Vectors
Of particular interest to us in the future will be 1 × n and n × 1 matrices. For this reason
we give them special names.

DEFINITION 2.1.4
A 1 × n matrix is called a row n-vector. An n × 1 matrix is called a column n-vector.
The elements of a row or column n-vector are called the components of the vector.

Remarks
1. We can refer to the objects just deﬁned simply as row vectors and column vectors
if the value of n is clear from the context.
2. We will see later in this chapter that when a system of linear equations is written
using matrices, the basic unknown in the reformulated system is a column vector.
A similar formulation will also be given in Chapter 7 for systems of differential
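equations.

Example 2.1.5 The matrix $\mathbf{a} = \begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{5} & \tfrac{4}{7} \end{bmatrix}$ is a row 3-vector and
\[ \mathbf{b} = \begin{bmatrix} 1 \\ -1 \\ 3 \\ 4 \end{bmatrix} \]
is a column 4-vector.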
As indicated here, we usually denote a row or column vector by a lowercase letter
in bold print.
Associated with any m × n matrix are m row n-vectors and n column m-vectors.
These are referred to as the row vectors of the matrix and the column vectors of the
matrix, respectively.
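Example 2.1.6 Associated with the matrix
\[ A = \begin{bmatrix} -2 & 1 & 3 & 4 \\ 1 & 2 & 1 & 1 \\ 3 & -1 & 2 & 5 \end{bmatrix} \]
are the row 4-vectors
\[ \begin{bmatrix} -2 & 1 & 3 & 4 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 1 & 1 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 3 & -1 & 2 & 5 \end{bmatrix}, \]
and the column 3-vectors
\[ \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 4 \\ 1 \\ 5 \end{bmatrix}. \]
Conversely, if a1 , a2 , . . . , an are each column m-vectors, then we let [a1 , a2 , . . . , an ]
denote the m × n matrix whose column vectors are a1 , a2 , . . . , an . Similarly, if b1 , b2 ,
. . . , bm are each row n-vectors, then we write
\[ \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_m \end{bmatrix} \]
for the m × n matrix with row vectors b1 , b2 , . . . , bm . The reader should observe that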
a list of vectors arranged in a row will always consist of column vectors, while a list of
vectors arranged in a column will always consist of row vectors.
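The following Python sketch illustrates these conventions with the matrix of Example 2.1.6. The list-of-rows representation and the helper names `row_vectors`, `column_vectors`, and `from_columns` are assumptions made for illustration only.

```python
def row_vectors(matrix):
    """Return the row vectors of a matrix stored as a list of rows."""
    return [list(row) for row in matrix]


def column_vectors(matrix):
    """Return the column vectors: the kth one collects the kth entry of every row."""
    return [list(column) for column in zip(*matrix)]


def from_columns(columns):
    """Assemble the matrix [a1, a2, ..., an] from a list of column vectors."""
    return [list(row) for row in zip(*columns)]


A = [
    [-2, 1, 3, 4],
    [1, 2, 1, 1],
    [3, -1, 2, 5],
]

print(row_vectors(A))     # three row 4-vectors
print(column_vectors(A))  # four column 3-vectors
# Reassembling the matrix from its own column vectors gives A back.
print(from_columns(column_vectors(A)) == A)
```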
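Example 2.1.7 If
\[ \mathbf{a}_1 = \begin{bmatrix} \tfrac{1}{5} \\ \tfrac{2}{3} \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} \tfrac{4}{7} \\ \tfrac{5}{9} \end{bmatrix}, \quad\text{and}\quad \mathbf{a}_3 = \begin{bmatrix} -\tfrac{1}{3} \\ \tfrac{3}{11} \end{bmatrix}, \]
then
\[ [\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3] = \begin{bmatrix} \tfrac{1}{5} & \tfrac{4}{7} & -\tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{5}{9} & \tfrac{3}{11} \end{bmatrix}. \]

DEFINITION 2.1.8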
If we interchange the row vectors and column vectors in an m × n matrix A, we obtain
an n × m matrix called the transpose of A. We denote this matrix by $A^T$. In index
notation, the (i, j)th element of $A^T$, denoted $a^T_{ij}$, is given by
\[ a^T_{ij} = a_{ji}. \]

Example 2.1.9 If
\[ A = \begin{bmatrix} 1 & 2 & 6 & 2 \\ 0 & 3 & 4 & 7 \end{bmatrix}, \quad\text{then}\quad A^T = \begin{bmatrix} 1 & 0 \\ 2 & 3 \\ 6 & 4 \\ 2 & 7 \end{bmatrix}. \]
If
\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 0 & 7 \\ 3 & 4 & 9 \end{bmatrix}, \quad\text{then}\quad A^T = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 0 & 4 \\ 5 & 7 & 9 \end{bmatrix}. \]

Square Matrices
An n × n matrix is called a square matrix, since it has the same number of rows as
columns. If A is a square matrix, then the elements aii , 1 ≤ i ≤ n, make up the main
diagonal, or leading diagonal, of the matrix. (See Figure 2.1.1 for the 3 × 3 case.)
[Figure 2.1.1: The main diagonal of a 3 × 3 matrix.]
The sum of the main diagonal elements of an n × n matrix A is called the trace of
A and is denoted tr(A). Thus,
tr(A) = a11 + a22 + · · · + ann .
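A short Python sketch of the trace follows. The function name `trace` and the list-of-rows representation are illustrative assumptions; the matrix used is the 3 × 3 matrix from Example 2.1.9.

```python
def trace(matrix):
    """Sum the main diagonal elements a11 + a22 + ... + ann of a square matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("the trace is defined only for square matrices")
    return sum(matrix[i][i] for i in range(n))


A = [
    [1, 3, 5],
    [2, 0, 7],
    [3, 4, 9],
]
print(trace(A))  # 1 + 0 + 9 = 10
```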
An n × n matrix A is said to be lower triangular if aij = 0 whenever i < j (zeros
everywhere above (i.e., “northeast of”) the main diagonal), and it is said to be upper
triangular if aij = 0 whenever i > j (zeros everywhere below (i.e., “southwest of”) the
main diagonal). The following are examples of an upper triangular and lower triangular
matrix, respectively:
\[ \begin{bmatrix} 1 & -8 & 5 \\ 0 & -3 & 9 \\ 0 & 0 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ -6 & 7 & -3 \end{bmatrix}. \]
Observe that the transpose of a lower (upper) triangular matrix is an upper (lower)
triangular matrix.
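The definitions above translate directly into index tests. The sketch below (function names are illustrative assumptions) checks the two example matrices and the observation about transposes:

```python
def transpose(matrix):
    """Interchange rows and columns: the (i, j) element becomes the (j, i) element."""
    return [list(column) for column in zip(*matrix)]


def is_lower_triangular(matrix):
    """True if a_ij = 0 whenever i < j (zeros above the main diagonal)."""
    n = len(matrix)
    return all(matrix[i][j] == 0 for i in range(n) for j in range(n) if i < j)


def is_upper_triangular(matrix):
    """True if a_ij = 0 whenever i > j (zeros below the main diagonal)."""
    n = len(matrix)
    return all(matrix[i][j] == 0 for i in range(n) for j in range(n) if i > j)


U = [[1, -8, 5], [0, -3, 9], [0, 0, 4]]
L = [[2, 0, 0], [0, 1, 0], [-6, 7, -3]]

print(is_upper_triangular(U), is_lower_triangular(L))
# Transposing swaps the two kinds of triangular matrix.
print(is_lower_triangular(transpose(U)), is_upper_triangular(transpose(L)))
```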
If every element on the main diagonal of a lower (upper) triangular matrix is a 1,
the matrix is called a unit lower (upper) triangular matrix.
An n × n matrix D = [dij ] that has all off-diagonal elements equal to zero is
called a diagonal matrix. Note that a matrix D is a diagonal matrix if and only if D is
simultaneously upper and lower triangular. Such a matrix is completely determined by
giving its main diagonal elements, since dij = 0 whenever i ≠ j . Consequently, we can
specify a diagonal matrix in the compact form
D = diag(d1 , d2 , . . . , dn ),
where di denotes the diagonal element dii .
Example 2.1.10 The 4 × 4 diagonal matrix D = diag(1, 2, 0, 3) is
\[ D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}. \]
The transpose naturally picks out two important types of square matrices as follows.

DEFINITION 2.1.11
1. A square matrix A satisfying $A^T = A$ is called a symmetric matrix.
2. If A = [aij ], then we let −A denote the matrix with elements −aij . A square
matrix A satisfying $A^T = -A$ is called a skew-symmetric (or anti-symmetric)
matrix.

Example 2.1.12 The matrix
\[ A = \begin{bmatrix} 1 & -1 & 1 & 5 \\ -1 & 2 & 2 & 6 \\ 1 & 2 & 3 & 4 \\ 5 & 6 & 4 & 9 \end{bmatrix} \]
is symmetric, whereas
\[ B = \begin{bmatrix} 0 & -1 & -5 & 3 \\ 1 & 0 & 1 & -2 \\ 5 & -1 & 0 & 7 \\ -3 & 2 & -7 & 0 \end{bmatrix} \]
is skew-symmetric.
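Both conditions in Definition 2.1.11 can be checked mechanically. The following Python sketch (with hypothetical helper names) tests the two matrices of Example 2.1.12:

```python
def transpose(matrix):
    """Return the transpose: the (i, j) element of the result is a_ji."""
    return [list(column) for column in zip(*matrix)]


def negate(matrix):
    """Return -A, the matrix with elements -a_ij."""
    return [[-entry for entry in row] for row in matrix]


def is_symmetric(matrix):
    """True if A^T = A."""
    return transpose(matrix) == matrix


def is_skew_symmetric(matrix):
    """True if A^T = -A."""
    return transpose(matrix) == negate(matrix)


A = [[1, -1, 1, 5], [-1, 2, 2, 6], [1, 2, 3, 4], [5, 6, 4, 9]]
B = [[0, -1, -5, 3], [1, 0, 1, -2], [5, -1, 0, 7], [-3, 2, -7, 0]]

print(is_symmetric(A), is_skew_symmetric(B))
# The main diagonal of the skew-symmetric matrix B consists of zeros.
print([B[i][i] for i in range(len(B))])
```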
Notice that the main diagonal elements of the skew-symmetric matrix in the preceding
example are all zero. This is true in general, since if A is a skew-symmetric matrix, then
aij = −aji , which implies that when i = j , aii = −aii , so that aii = 0.

Matrix and Vector Functions
Later in the text we will be concerned with systems of two or more differential equations.
The most effective way to study such systems, as it turns out, is to represent the system
using matrices and vectors. However, we will need to allow the elements of the matrices
and vectors that arise to contain functions of a single variable, not just real or complex
numbers. This leads to the following definition, reminiscent of Definition 2.1.1.

DEFINITION 2.1.13
An m × n matrix function A is a rectangular array with m rows and n columns whose
elements are functions of a single real variable t.

Example 2.1.14 Here are two examples of matrix functions:
\[ A(t) = \begin{bmatrix} t^3 & t - \cos t & 5 \\ e^{t^2} & \ln(t+1) & te^t \end{bmatrix} \quad\text{and}\quad B(t) = \begin{bmatrix} 5 - t + t^2 & \sin(e^{2t}) \\ -1 & \tan t \\ 6 & 6 - t \end{bmatrix}. \]
A matrix function A(t) is defined only for real values of t such that all elements in A(t)
assume a well-defined value. The matrix function A in this example is defined only for real values of t with
t > −1, since ln (t + 1) is defined only for t > −1. The reader should determine the
values of t for which the matrix function B is defined.

Remark It is possible, of course, to consider matrix functions of more than one
variable. However, this will not be particularly relevant for our purposes in this text.
Finally in this section, we have the following special type of matrix function.

DEFINITION 2.1.15
An n × 1 matrix function is called a column n-vector function.
For instance,
\[ \begin{bmatrix} t^2 \\ -6te^t \end{bmatrix} \]
is a column 2-vector function.2

Exercises for 2.1

Key Terms
Matrices, Elements, Size (dimensions) of a matrix, Row
vector, Column vector, Square matrix, Main diagonal,
Trace, Lower (Upper) triangular matrix, Unit lower (upper)
triangular matrix, Diagonal matrix, Symmetric matrix,
Skew-symmetric matrix, Matrix function, Column n-vector
function.

Skills
• Be able to determine the elements of a matrix.
• Be able to identify the size (i.e., dimensions) of a matrix.
• Be able to identify the row and column vectors of a matrix.
• Be able to determine the components of a row or column vector.
• Be able to say whether or not two given matrices are equal.
• Be able to find the transpose of a matrix.
• Be able to compute the trace of a square matrix.
• Be able to recognize square matrices that are upper triangular, lower triangular, or diagonal.
• Be able to recognize square matrices that are symmetric or skew-symmetric.
• Be able to determine the values of the variable t such that a matrix function A is defined.

True-False Review
For Questions 1–10, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. A diagonal matrix must be both upper triangular and lower triangular.
2. An m × n matrix has m column vectors and n row vectors.
3. If A is a symmetric matrix, then so is $A^T$.
4. The trace of a matrix is the product of the elements along the main diagonal.
5. A skew-symmetric matrix must have zeros along the main diagonal.
6. A matrix that is both symmetric and skew-symmetric cannot contain any nonzero elements.
7. The matrix functions
√ t 3t 2
1 sin 2t
|t | and −2 + t ln t
esin t −3 are defined for exactly the same values of t .
2 We could, of course, also speak of row n-vector functions as the 1 × n matrix functions, but we will not
need them in this text.
8. The matrix function cos t
t2 −2
−t t
1
e√
t −3 is deﬁned for all positive real numbers t .
9. Any matrix of numbers is a matrix function deﬁned
for all real values of the variable t .
10. If A and B are matrix functions such that the matrices
A(0) and B(0) are the same, then we should consider
A and B to be the same matrix function.

Problems
1. If
\[ A = \begin{bmatrix} 1 & -2 & 3 & 2 \\ 7 & -6 & 5 & -1 \\ 0 & 2 & -3 & 4 \end{bmatrix}, \]
determine a31 , a24 , a14 , a32 , a21 , and a34 .
For Problems 2–6, write the matrix with the given elements.
In each case, specify the dimensions of the matrix.
2. a11 = 1, a21 = −1, a12 = 5, a22 = 3.
3. a11 = 2, a12 = 1, a13 = −1, a21 = 0, a22 = 4, a23 = −2.
4. a11 = −1, a41 = −5, a31 = 1, a21 = 1.
5. a11 = 1, a31 = 2, a42 = −1, a32 = 7, a13 = −2, a23 = 0, a33 = 4, a21 = 3, a41 = −4, a12 = −3, a22 = 6, a43 = 5.
6. a12 = −1, a13 = 2, a23 = 3, aji = −aij , 1 ≤ i ≤ 3, 1 ≤ j ≤ 3.
For Problems 7–9, determine tr(A) for the given matrix.
7. $A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$.
8. $A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 2 & -2 \\ 7 & 5 & -3 \end{bmatrix}$.
9. $A = \begin{bmatrix} 2 & 0 & 1 \\ 3 & 2 & 5 \\ 0 & 1 & -5 \end{bmatrix}$.
For Problems 10–12, write the column vectors and row vectors of the given matrix.
10. $A = \begin{bmatrix} 1 & -1 \\ 3 & 5 \end{bmatrix}$.
11. $A = \begin{bmatrix} 1 & 3 & -4 \\ -1 & -2 & 5 \\ 2 & 6 & 7 \end{bmatrix}$.
12. $A = \begin{bmatrix} 2 & 10 & 6 \\ 5 & -1 & 3 \end{bmatrix}$.
13. If a1 = [1 2], a2 = [3 4], and a3 = [5 1], write the matrix
\[ A = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \mathbf{a}_3 \end{bmatrix}, \]
and determine the column vectors of A.
14. If
\[ \mathbf{b}_1 = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 5 \\ 7 \\ -6 \end{bmatrix}, \quad \mathbf{b}_3 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{b}_4 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \]
write the matrix B = [b1 , b2 , b3 , b4 ] and determine the row vectors of B .
15. If a1 , a2 , . . . , ap are each column q-vectors, what are
the dimensions of the matrix that has a1 , a2 , . . . , ap
as its column vectors?
For Problems 16–20, give an example of a matrix of the specified form.
16. 3 × 3 diagonal matrix.
17. 4 × 4 upper triangular matrix.
18. 4 × 4 skew-symmetric matrix.
19. 3 × 3 upper triangular symmetric matrix.
20. 3 × 3 lower triangular skew-symmetric matrix.
For Problems 21–24, give an example of a matrix function of the specified form.
21. 2 × 3 matrix function defined only for values of t with −2 ≤ t < 3.
22. 4 × 2 matrix function A such that A(0) = A(1) = A(2).
23. 1 × 5 matrix function A that is nonconstant such that
all elements of A(t) are positive for all t in R.
24. 2 × 1 matrix function A that is nonconstant such that
all elements of A(t) are in [0, 1] for every t in R.
25. Construct distinct matrix functions A and B defined
on all of R such that A(0) = B(0) and A(1) = B(1).
26. Prove that a symmetric upper triangular matrix is diagonal.
27. Determine all elements of the 3 × 3 skew-symmetric
matrix A with a21 = 1, a31 = 3, a23 = −1.

2.2 Matrix Algebra
In the previous section we introduced the general idea of a matrix. The next step is to
develop the algebra of matrices. Unless otherwise stated, we assume that all elements of
the matrices that appear are real or complex numbers.

Addition and Subtraction of Matrices and Multiplication of a Matrix by a Scalar
Addition and subtraction of matrices is defined only for matrices with the same dimensions. We begin with addition.

DEFINITION 2.2.1
If A and B are both m × n matrices, then we deﬁne addition (or the sum) of A and
B , denoted by A + B , to be the m × n matrix whose elements are obtained by adding
corresponding elements of A and B . In index notation, if A = [aij ] and B = [bij ],
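then A + B = [aij + bij ].

Example 2.2.2 We have
\[ \begin{bmatrix} 2 & -1 & 3 \\ 4 & -5 & 0 \end{bmatrix} + \begin{bmatrix} -1 & 0 & 5 \\ -5 & 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 8 \\ -1 & -3 & 7 \end{bmatrix}. \]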
Properties of Matrix Addition: If A, B, and C are all m × n matrices, then
\[ \begin{aligned} A + B &= B + A && \text{(matrix addition is commutative)}, \\ A + (B + C) &= (A + B) + C && \text{(matrix addition is associative)}. \end{aligned} \]
Both of these properties follow directly from Definition 2.2.1.
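As an illustration, the sum in Example 2.2.2 and the commutative property can be checked with a few lines of Python. The helper name `add` and the list-of-rows representation are assumptions made for this sketch.

```python
def add(A, B):
    """Add two matrices of the same size element by element: [a_ij + b_ij]."""
    same_size = len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))
    if not same_size:
        raise ValueError("matrix addition requires matrices of the same size")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


A = [[2, -1, 3], [4, -5, 0]]
B = [[-1, 0, 5], [-5, 2, 7]]

print(add(A, B))               # [[1, -1, 8], [-1, -3, 7]]
print(add(A, B) == add(B, A))  # commutativity holds for this pair
```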
In order that we can model oscillatory physical phenomena, in much of the later
work we will need to use complex as well as real numbers. Throughout the text we will
use the term scalar to mean a real or complex number.

DEFINITION 2.2.3
If A is an m × n matrix and s is a scalar, then we let sA denote the matrix obtained by
multiplying every element of A by s . This procedure is called scalar multiplication.
In index notation, if A = [aij ], then sA = [saij ].

Example 2.2.4 If $A = \begin{bmatrix} 2 & -1 \\ 4 & 6 \end{bmatrix}$, then $5A = \begin{bmatrix} 10 & -5 \\ 20 & 30 \end{bmatrix}$.

Example 2.2.5 If $A = \begin{bmatrix} 1+i & i \\ 2+3i & 4 \end{bmatrix}$ and s = 1 − 2i, where $i = \sqrt{-1}$, find sA.

Solution: We have
\[ sA = \begin{bmatrix} (1-2i)(1+i) & (1-2i)i \\ (1-2i)(2+3i) & (1-2i)4 \end{bmatrix} = \begin{bmatrix} 3-i & 2+i \\ 8-i & 4-8i \end{bmatrix}. \]

DEFINITION 2.2.6
We deﬁne subtraction of two matrices with the same dimensions by
A − B = A + (−1)B.
In index notation, A − B = [aij − bij ]. That is, we subtract corresponding elements.
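Scalar multiplication and subtraction can be sketched in Python as follows, using Python's built-in complex numbers (written with `j` rather than i) to reproduce Example 2.2.5. The helper names are illustrative assumptions.

```python
def scalar_multiply(s, A):
    """Multiply every element of A by the scalar s: [s * a_ij]."""
    return [[s * entry for entry in row] for row in A]


def add(A, B):
    """Element-by-element sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def subtract(A, B):
    """A - B, defined as A + (-1)B."""
    return add(A, scalar_multiply(-1, B))


# Example 2.2.5 with s = 1 - 2i.
A = [[1 + 1j, 1j], [2 + 3j, 4]]
s = 1 - 2j
print(scalar_multiply(s, A))  # entries 3-i, 2+i, 8-i, 4-8i

# Subtraction of corresponding elements.
print(subtract([[2, -1], [4, 6]], [[1, 0], [5, 2]]))  # [[1, -1], [-1, 4]]
```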
Further properties satisﬁed by the operations of matrix addition and multiplication
of a matrix by a scalar are as follows:
Properties of Scalar Multiplication: For any scalars s and t , and for any matrices A
and B of the same size,
\[ \begin{aligned} 1A &= A && \text{(unit property)}, \\ s(A + B) &= sA + sB && \text{(distributivity of scalars over matrix addition)}, \\ (s + t)A &= sA + tA && \text{(distributivity of scalar addition over matrices)}, \\ s(tA) &= (st)A = (ts)A = t(sA) && \text{(associativity of scalar multiplication)}. \end{aligned} \]
The m × n zero matrix, denoted $0_{m \times n}$ (or simply 0, if the dimensions are clear),
is the m × n matrix whose elements are all zeros. In the case of the n × n zero matrix,
we may write 0n . We now collect a few properties of the zero matrix. The ﬁrst of these
below indicates that the zero matrix plays a similar role in matrix addition to that played
by the number zero in the addition of real numbers.
Properties of the Zero Matrix: For all matrices A and the zero matrix of the same size,
we have
A + 0 = A,
A − A = 0,
and
0A = 0.
Note that in the last property here, the zero on the left side of the equation is a scalar,
while the zero on the right side of the equation is a matrix.

Multiplication of Matrices
The deﬁnition we introduced above for how to multiply a matrix by a scalar is essentially
the only possibility if, in the case when s is a positive integer, we want sA to be the same
matrix as the one obtained when A is added to itself s times. We now deﬁne how to
multiply two matrices together. In this case the multiplication operation is by no means
obvious. However, in Chapter 5 when we study linear transformations, the motivation for
the matrix multiplication procedure we are deﬁning here will become quite transparent
(see Theorem 5.5.7).
We will build up to the general deﬁnition of matrix multiplication in three stages.
Case 1: Product of a row n-vector and a column n-vector. We begin by generalizing
a concept from elementary calculus. If a and b are either row or column n-vectors, with
components a1 , a2 , . . . , an , and b1 , b2 , . . . , bn , respectively, then their dot product,
denoted a · b, is the number
\[ \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. \]
As we will see, this is the key formula in defining the product of two matrices. Now let
a be a row n-vector, and let x be a column n-vector. Then their matrix product ax is
defined to be the 1 × 1 matrix whose single element is obtained by taking the dot product
of the row vectors a and $\mathbf{x}^T$. Thus,
\[ \mathbf{a}\mathbf{x} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = [a_1 x_1 + a_2 x_2 + \cdots + a_n x_n]. \]

Example 2.2.7 If $\mathbf{a} = \begin{bmatrix} 2 & -1 & 3 & 5 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 3 \\ 2 \\ -3 \\ 4 \end{bmatrix}$, then
\[ \mathbf{a}\mathbf{x} = \begin{bmatrix} 2 & -1 & 3 & 5 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ -3 \\ 4 \end{bmatrix} = [(2)(3) + (-1)(2) + (3)(-3) + (5)(4)] = [15]. \]

Case 2: Product of an m × n matrix and a column n-vector. If A is an m × n matrix
and x is a column n-vector, then the product Ax is defined to be the m × 1 matrix whose
ith element is obtained by taking the dot product of the ith row vector of A with x. (See
Figure 2.2.1.)
[Figure 2.2.1: Multiplication of an m × n matrix with a column n-vector.]
The ith row vector of A, ai , is
\[ \mathbf{a}_i = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}, \]
so that Ax has ith element
\[ (A\mathbf{x})_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n. \]
Consequently the column vector Ax has elements
\[ (A\mathbf{x})_i = \sum_{k=1}^{n} a_{ik} x_k, \qquad 1 \le i \le m. \tag{2.2.1} \]
As illustrated in the next example, in practice, we do not use the formula (2.2.1); rather,
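we explicitly take the matrix products of the row vectors of A with the column vector x.

Example 2.2.8 Find Ax if $A = \begin{bmatrix} 2 & 3 & -1 \\ 1 & 4 & -6 \\ 5 & -2 & 0 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 7 \\ -3 \\ 1 \end{bmatrix}$.

Solution: We have
\[ A\mathbf{x} = \begin{bmatrix} 2 & 3 & -1 \\ 1 & 4 & -6 \\ 5 & -2 & 0 \end{bmatrix} \begin{bmatrix} 7 \\ -3 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ -11 \\ 41 \end{bmatrix}. \]
The following result regarding multiplication of a column vector by a matrix will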
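be used repeatedly in later chapters.

Theorem 2.2.9 If $A = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n]$ is an m × n matrix and $\mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$ is a column n-vector, then
\[ A\mathbf{c} = c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 + \cdots + c_n \mathbf{a}_n. \tag{2.2.2} \]
Proof The element aik of A is the ith component of the column m-vector ak , so
\[ a_{ik} = (\mathbf{a}_k)_i. \]
Applying formula (2.2.1) for multiplication of a column vector by a matrix yields
\[ (A\mathbf{c})_i = \sum_{k=1}^{n} a_{ik} c_k = \sum_{k=1}^{n} (\mathbf{a}_k)_i c_k = \sum_{k=1}^{n} (c_k \mathbf{a}_k)_i. \]
Consequently,
\[ A\mathbf{c} = \sum_{k=1}^{n} c_k \mathbf{a}_k = c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 + \cdots + c_n \mathbf{a}_n \]
as required.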
If x1 , x2 , . . . , xn are column m-vectors and c1 , c2 , . . . , cn are scalars, then an expression of the form
c1 x1 + c2 x2 + · · · + cn xn
is called a linear combination of the column vectors. Therefore, from Equation (2.2.2),
we see that the vector Ac is obtained by taking a linear combination of the column vectors
of A. For example, if
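\[ A = \begin{bmatrix} 2 & -1 \\ 4 & 3 \end{bmatrix} \quad\text{and}\quad \mathbf{c} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}, \]
then
\[ A\mathbf{c} = c_1 \mathbf{a}_1 + c_2 \mathbf{a}_2 = 5 \begin{bmatrix} 2 \\ 4 \end{bmatrix} + (-1) \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 11 \\ 17 \end{bmatrix}. \]

Case 3: Product of an m × n matrix and an n × p matrix. If A is an m × n matrix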
and B is an n × p matrix, then the product AB has columns deﬁned by multiplying
the matrix A by the respective column vectors of B , as described in Case 2. That is, if
B = [b1 , b2 , . . . , bp ], then AB is the m × p matrix deﬁned by
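\[ AB = [A\mathbf{b}_1, A\mathbf{b}_2, \ldots, A\mathbf{b}_p]. \]

Example 2.2.10 If $A = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 5 & 7 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 \\ 5 & -2 \\ 8 & 4 \end{bmatrix}$, determine AB.

Solution: We have
\[ AB = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 5 & 7 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 5 & -2 \\ 8 & 4 \end{bmatrix} = \begin{bmatrix} (1)(2) + (4)(5) + (2)(8) & (1)(3) + (4)(-2) + (2)(4) \\ (3)(2) + (5)(5) + (7)(8) & (3)(3) + (5)(-2) + (7)(4) \end{bmatrix} = \begin{bmatrix} 38 & 3 \\ 87 & 27 \end{bmatrix}. \]

Example 2.2.11 If $A = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 \end{bmatrix}$, determine AB.

Solution: We have
\[ AB = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} \begin{bmatrix} 2 & 4 \end{bmatrix} = \begin{bmatrix} (2)(2) & (2)(4) \\ (-1)(2) & (-1)(4) \\ (3)(2) & (3)(4) \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ -2 & -4 \\ 6 & 12 \end{bmatrix}. \]
Another way to describe AB is to note that the element (AB)ij is obtained by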
computing the matrix product of the i th row vector of A and the j th column vector of
B . That is,
\[ (AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \]
Expressing this using the summation notation yields the following result:

DEFINITION 2.2.12
If A = [aij ] is an m × n matrix, B = [bij ] is an n × p matrix, and C = AB , then
\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad 1 \le i \le m, \quad 1 \le j \le p. \tag{2.2.3} \]
This is called the index form of the matrix product.
The formula (2.2.3) for the ijth element of AB is very important and will often be
required in the future. The reader should memorize it.
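To make the two descriptions of AB concrete, here is a minimal Python sketch (function names and the list-of-rows representation are assumptions for illustration). One function follows the index form (2.2.3); the other builds AB one column at a time as in Case 3. Both are applied to Example 2.2.10.

```python
def matmul(A, B):
    """Index form: c_ij is the sum over k of a_ik * b_kj."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


def matmul_by_columns(A, B):
    """Column form: AB = [A b1, A b2, ..., A bp], with b_j the columns of B."""
    columns_of_ab = [
        [sum(a * x for a, x in zip(row, b)) for row in A]  # the product A b
        for b in zip(*B)
    ]
    # Place the computed columns side by side to form the m x p matrix.
    return [list(row) for row in zip(*columns_of_ab)]


A = [[1, 4, 2], [3, 5, 7]]
B = [[2, 3], [5, -2], [8, 4]]

print(matmul(A, B))             # [[38, 3], [87, 27]]
print(matmul_by_columns(A, B))  # the same matrix
```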
In order for the product AB to be defined, we see that A and B must satisfy
number of columns of A = number of rows of B .
In such a case, if C represents the product matrix AB , then the relationship between the
dimensions of the matrices is
\[ A_{m \times n}\, B_{n \times p} = C_{m \times p}, \]
where the two inner dimensions (n) must be the same and the two outer dimensions (m and p) give the size of the result.
Now we give some further examples of matrix multiplication.
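Example 2.2.13 If $A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & -2 & 0 \\ 1 & 5 & 3 \end{bmatrix}$, then
\[ AB = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 2 & -2 & 0 \\ 1 & 5 & 3 \end{bmatrix} = \begin{bmatrix} 5 & 13 & 9 \\ 8 & 16 & 12 \end{bmatrix}. \]

Example 2.2.14 If $A = \begin{bmatrix} 1 & 2 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} -1 & 1 \\ 0 & 1 \\ 1 & 2 \end{bmatrix}$, then
\[ AB = \begin{bmatrix} 1 & 2 & -1 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -2 & 1 \end{bmatrix}. \]

Example 2.2.15 If $A = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 4 & -6 \end{bmatrix}$, then
\[ AB = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 4 & -6 \end{bmatrix} = \begin{bmatrix} 2 & 8 & -12 \\ -1 & -4 & 6 \\ 3 & 12 & -18 \end{bmatrix}. \]

Example 2.2.16 If $A = \begin{bmatrix} 1-i & i \\ 2+i & 1+i \end{bmatrix}$ and $B = \begin{bmatrix} 3+2i & 1+4i \\ i & -1+2i \end{bmatrix}$, then
\[ AB = \begin{bmatrix} 1-i & i \\ 2+i & 1+i \end{bmatrix} \begin{bmatrix} 3+2i & 1+4i \\ i & -1+2i \end{bmatrix} = \begin{bmatrix} 4-i & 3+2i \\ 3+8i & -5+10i \end{bmatrix}. \]
Notice that in Examples 2.2.13 and 2.2.14 above, the product BA is not defined,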
since the number of columns of the matrix B does not agree with the number of rows of
the matrix A.
We can now establish some basic properties of matrix multiplication.
Theorem 2.2.17 If A, B and C have appropriate dimensions for the operations to be performed, then
\[ \begin{aligned} A(BC) &= (AB)C && \text{(associativity of matrix multiplication)}, && (2.2.4) \\ A(B + C) &= AB + AC && \text{(left distributivity of matrix multiplication)}, && (2.2.5) \\ (A + B)C &= AC + BC && \text{(right distributivity of matrix multiplication)}. && (2.2.6) \end{aligned} \]
Proof The idea behind the proof of each of these results is to use the definition of matrix
multiplication to show that the ij th element of the matrix on the left-hand side of each
equation is equal to the ij th element of the matrix on the right-hand side. We illustrate
by proving (2.2.6), but we leave the proofs of (2.2.4) and (2.2.5) as exercises. Suppose
that A and B are m × n matrices and that C is an n × p matrix. Then, from Equation
(2.2.3),
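\[ \begin{aligned} [(A + B)C]_{ij} &= \sum_{k=1}^{n} (a_{ik} + b_{ik}) c_{kj} = \sum_{k=1}^{n} a_{ik} c_{kj} + \sum_{k=1}^{n} b_{ik} c_{kj} \\ &= (AC)_{ij} + (BC)_{ij} \\ &= (AC + BC)_{ij}, \qquad 1 \le i \le m, \quad 1 \le j \le p. \end{aligned} \]
Consequently,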
(A + B)C = AC + BC.
Theorem 2.2.17 states that matrix multiplication is associative and distributive (over
addition). We now consider the question of commutativity of matrix multiplication. If A
is an m × n matrix and B is an n × m matrix, we can form both of the products AB and
BA, which are m × m and n × n, respectively. In the ﬁrst of these, we say that B has
been premultiplied by A, whereas in the second, we say that B has been postmultiplied
by A. If m ≠ n, then the matrices AB and BA will have different dimensions, so they
cannot be equal. It is important to realize, however, that even if m = n, in general (that
is, except for special cases)
\[ AB \neq BA. \]
This is the statement that
matrix multiplication is not commutative.
With a little bit of thought this should not be too surprising, in view of the fact that
the ij th element of AB is obtained by taking the matrix product of the i th row vector
of A with the j th column vector of B , whereas the ij th element of BA is obtained by
taking the matrix product of the i th row vector of B with the j th column vector of A.
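We illustrate with an example.

Example 2.2.18 If $A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 1 \\ 2 & -1 \end{bmatrix}$, find AB and BA.

Solution: We have
\[ AB = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 7 & -1 \\ 3 & -4 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 3 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 9 \\ 3 & 1 \end{bmatrix}. \]
Thus we see that in this example, AB ≠ BA.
As an exercise, the reader can calculate the matrix BA in Examples 2.2.15 and
2.2.16 and again see that AB ≠ BA.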
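The computation in Example 2.2.18 can be reproduced with a short Python sketch. The `matmul` helper below is an illustrative implementation of the index form (2.2.3), not code from the text.

```python
def matmul(A, B):
    """Matrix product via the index form c_ij = sum over k of a_ik * b_kj."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2], [-1, 3]]
B = [[3, 1], [2, -1]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[7, -1], [3, -4]]
print(BA)        # [[2, 9], [3, 1]]
print(AB == BA)  # False: the two products differ
```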
For an n × n matrix we use the usual power notation to denote the operation of
multiplying A by itself. Thus,
\[ A^2 = AA, \qquad A^3 = AAA, \]
and so on.
The identity matrix, $I_n$ (or just I if the dimensions are obvious), is the n × n matrix
with ones on the main diagonal and zeros elsewhere. For example,
\[ I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

DEFINITION 2.2.19
The elements of $I_n$ can be represented by the Kronecker delta symbol, $\delta_{ij}$, defined
by
\[ \delta_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{cases} \]
Then,
\[ I_n = [\delta_{ij}]. \]
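A small Python sketch of this definition follows. The names `kronecker_delta`, `identity`, and `matmul` are illustrative assumptions, and the 3 × 2 matrix is chosen only as a test case for multiplying by an identity matrix of matching size on either side.

```python
def kronecker_delta(i, j):
    """Return 1 if i = j and 0 otherwise."""
    return 1 if i == j else 0


def identity(n):
    """Build I_n = [delta_ij] as a list of rows."""
    return [[kronecker_delta(i, j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Matrix product via the index form c_ij = sum over k of a_ik * b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[2, -1], [3, 5], [0, -2]]  # a 3 x 2 matrix

print(identity(3))
print(matmul(identity(3), A) == A)  # I_3 A = A
print(matmul(A, identity(2)) == A)  # A I_2 = A
```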
The following properties of the identity matrix indicate that it plays the same role
in matrix multiplication as the number 1 does in the multiplication of real numbers.
Properties of the Identity Matrix:
1. $A_{m \times n} I_n = A_{m \times n}$.
2. $I_m A_{m \times p} = A_{m \times p}$.
Proof We establish property 1 and leave the proof of property 2 as an exercise (Problem 25). Using the index form of the matrix product, we have
\[ (AI)_{ij} = \sum_{k=1}^{n} a_{ik} \delta_{kj} = a_{i1} \delta_{1j} + a_{i2} \delta_{2j} + \cdots + a_{ij} \delta_{jj} + \cdots + a_{in} \delta_{nj}. \]
But, from the definition of the Kronecker delta symbol, we see that all terms in the
summation with k ≠ j vanish, so that we are left with
\[ (AI)_{ij} = a_{ij} \delta_{jj} = a_{ij}, \qquad 1 \le i \le m, \quad 1 \le j \le n. \]
The next example illustrates property 2 of the identity matrix.

Example 2.2.20 If $A = \begin{bmatrix} 2 & -1 \\ 3 & 5 \\ 0 & -2 \end{bmatrix}$, verify that $I_3 A = A$.

Solution: We have
\[ I_3 A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 3 & 5 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 3 & 5 \\ 0 & -2 \end{bmatrix} = A. \]

Properties of the Transpose
The operation of taking the transpose of a matrix was introduced in the previous section.
The next theorem gives three important properties satisﬁed by the transpose. These
should be memorized.
Theorem 2.2.21 Let A and C be m × n matrices, and let B be an n × p matrix. Then
1. $(A^T)^T = A$.
2. $(A + C)^T = A^T + C^T$.
3. $(AB)^T = B^T A^T$.
Proof For all three statements, our strategy is again to show that the (i, j)-elements of
each side of the equation are the same. We prove statement 3 and leave the proofs of 1
and 2 for the exercises (Problem 24). From the definition of the transpose and the index
form of the matrix product, we have
\[ \begin{aligned} [(AB)^T]_{ij} &= (AB)_{ji} && \text{(definition of the transpose)} \\ &= \sum_{k=1}^{n} a_{jk} b_{ki} && \text{(index form of the matrix product)} \\ &= \sum_{k=1}^{n} b_{ki} a_{jk} = \sum_{k=1}^{n} b^T_{ik} a^T_{kj} \\ &= (B^T A^T)_{ij}. \end{aligned} \]
Consequently,
\[ (AB)^T = B^T A^T. \]

Results for Triangular Matrices
Upper and lower triangular matrices play a signiﬁcant role in the analysis of linear systems of equations. The following theorem and its corollary will be needed in Section 2.7.
Theorem 2.2.22 The product of two lower (upper) triangular matrices is a lower (upper) triangular matrix.

Proof Suppose that $A$ and $B$ are $n \times n$ lower triangular matrices. Then $a_{ik} = 0$ whenever $i < k$, and $b_{kj} = 0$ whenever $k < j$. If we let $C = AB$, then we must prove that
$$c_{ij} = 0 \quad \text{whenever } i < j.$$
Using the index form of the matrix product, we have
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=j}^{n} a_{ik} b_{kj} \quad (\text{since } b_{kj} = 0 \text{ if } k < j). \tag{2.2.7}$$
We now impose the condition that $i < j$. Then, since $k \ge j$ in (2.2.7), it follows that $k > i$. However, this implies that $a_{ik} = 0$ (since $A$ is lower triangular), and hence, from (2.2.7), that
$$c_{ij} = 0 \quad \text{whenever } i < j,$$
as required.
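The lower triangular case just proved can be checked numerically. The following is a minimal Python sketch, not part of the text: the helper names `matmul` and `is_lower_triangular` and the sample matrices `A` and `B` are assumptions chosen only for illustration.

```python
def matmul(a, b):
    """Matrix product via the index form c_ij = sum_k a_ik * b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_lower_triangular(m):
    """True when every entry above the main diagonal is zero."""
    return all(m[i][j] == 0
               for i in range(len(m))
               for j in range(i + 1, len(m[0])))

# Hypothetical sample lower triangular matrices (not taken from the text).
A = [[2, 0, 0], [1, 3, 0], [4, -1, 5]]
B = [[1, 0, 0], [-2, 1, 0], [0, 6, 2]]

C = matmul(A, B)
print(C)                        # [[2, 0, 0], [-5, 3, 0], [6, 29, 10]]
print(is_lower_triangular(C))   # True
```

A single numerical example does not replace the proof, but it makes the role of the vanishing terms in (2.2.7) easy to see.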
To establish the result for upper triangular matrices, either we can give an argument
similar to that presented above for lower triangular matrices, or we can use the fact
that the transpose of a lower triangular matrix is an upper triangular matrix, and vice
versa. Hence, if $A$ and $B$ are $n \times n$ upper triangular matrices, then $A^T$ and $B^T$ are lower triangular, and therefore by what we proved above, $(AB)^T = B^T A^T$ remains lower triangular. Thus, $AB$ is upper triangular.

Corollary 2.2.23 The product of two unit lower (upper) triangular matrices is a unit lower (upper) triangular matrix.

Proof Let $A$ and $B$ be unit lower triangular $n \times n$ matrices. We know from Theorem 2.2.22 that $C = AB$ is a lower triangular matrix. We must establish that $c_{ii} = 1$ for each $i$. The elements on the main diagonal of $C$ can be obtained by setting $j = i$ in (2.2.7):
$$c_{ii} = \sum_{k=i}^{n} a_{ik} b_{ki}. \tag{2.2.8}$$
Since $a_{ik} = 0$ whenever $k > i$, the only nonzero term in the summation in (2.2.8) occurs when $k = i$. Consequently,
$$c_{ii} = a_{ii} b_{ii} = 1 \cdot 1 = 1, \qquad i = 1, 2, \dots, n.$$
The proof for unit upper triangular matrices is similar and left as an exercise.

The Algebra and Calculus of Matrix Functions
By and large, the algebra of matrix and vector functions is the same as that for matrices
and vectors of real or complex numbers. Since vector functions are a special case of
matrix functions, we focus here on matrix functions. The main comment here pertains to
scalar multiplication. In the description of scalar multiplication of matrices of numbers,
the scalars were required to be real or complex numbers. However, for matrix functions,
we can scalar multiply by any scalar function $s(t)$.

Example 2.2.24 If $s(t) = e^t$ and $A(t) = \begin{bmatrix} -2 + t & e^{2t} \\ 4 & \cos t \end{bmatrix}$, then
$$s(t)A(t) = \begin{bmatrix} e^t(-2 + t) & e^{3t} \\ 4e^t & e^t \cos t \end{bmatrix}.$$

Example 2.2.25 Referring to $A$ and $B$ from Example 2.1.14, find $2A - tB^T$.

Solution: We have
$$
\begin{aligned}
2A - tB^T &= \begin{bmatrix} 2t^3 & 2t - 2\cos t & 10 \\ 2e^{t^2} & 2\ln(t + 1) & 2te^t \end{bmatrix} - \begin{bmatrix} 5t - t^2 + t^3 & -t & 6t \\ t\sin(e^{2t}) & t\tan t & 6t - t^2 \end{bmatrix} \\
&= \begin{bmatrix} t^3 + t^2 - 5t & 3t - 2\cos t & 10 - 6t \\ 2e^{t^2} - t\sin(e^{2t}) & 2\ln(t + 1) - t\tan t & 2te^t + t^2 - 6t \end{bmatrix}.
\end{aligned}
$$

We can also perform calculus operations on matrix functions. In particular we can
differentiate and integrate them. The rules for doing so are as follows:
1. The derivative of a matrix function is obtained by differentiating every element of the matrix. Thus, if $A(t) = [a_{ij}(t)]$, then
$$\frac{dA}{dt} = \left[\frac{d a_{ij}(t)}{dt}\right],$$
provided that each of the $a_{ij}$ is differentiable.
2. It follows from (1) and the index form of the matrix product that if $A$ and $B$ are both differentiable and the product $AB$ is defined, then
$$\frac{d}{dt}(AB) = A\frac{dB}{dt} + \frac{dA}{dt}B.$$
The key point to notice is that the order of the multiplication must be preserved.
3. If $A(t) = [a_{ij}(t)]$, where each $a_{ij}(t)$ is integrable on an interval $[a, b]$, then
$$\int_a^b A(t)\,dt = \left[\int_a^b a_{ij}(t)\,dt\right].$$

Example 2.2.26 If $A(t) = \begin{bmatrix} 2t & 1 \\ 6t^2 & 4e^{2t} \end{bmatrix}$, determine $dA/dt$ and $\int_0^1 A(t)\,dt$.

Solution: We have
$$\frac{dA}{dt} = \begin{bmatrix} 2 & 0 \\ 12t & 8e^{2t} \end{bmatrix},$$
whereas
$$\int_0^1 A(t)\,dt = \begin{bmatrix} \int_0^1 2t\,dt & \int_0^1 1\,dt \\ \int_0^1 6t^2\,dt & \int_0^1 4e^{2t}\,dt \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 2(e^2 - 1) \end{bmatrix}.$$

Exercises for 2.2

Key Terms
Matrix addition and subtraction, Scalar multiplication, Matrix multiplication, Dot product, Linear combination of column vectors, Index form, Premultiplication, Postmultiplication, Zero matrix, Identity matrix, Kronecker delta symbol.

Skills
• Be able to perform matrix addition, subtraction, and multiplication.
• Know the basic relationships between the dimensions of two matrices $A$ and $B$ in order for $A + B$ to be defined, and in order for $AB$ to be defined.
• Be able to multiply a matrix by a scalar.
• Be able to express the product $A\mathbf{x}$ of a matrix and a column vector as a linear combination of the columns of $A$.
• Be familiar with all of the basic properties of matrix addition, matrix multiplication, scalar multiplication, the zero matrix, the identity matrix, the transpose of a matrix, and lower (upper) triangular matrices.
• Know the basic technique for showing formally that two matrices are equal.
• Be able to perform algebra and calculus operations on matrix functions.

True-False Review
For Questions 1–12, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. For all matrices $A$, $B$, and $C$ of the appropriate dimensions, we have $(AB)C = (CA)B$.
2. If $A$ is an $m \times n$ matrix, $B$ is an $n \times p$ matrix, and $C$ is a $p \times q$ matrix, then $ABC$ is an $m \times q$ matrix.
3. If $A$ and $B$ are symmetric $n \times n$ matrices, then so is $A + B$.
4. If $A$ and $B$ are skew-symmetric $n \times n$ matrices, then $AB$ is a symmetric matrix.
5. For $n \times n$ matrices $A$ and $B$, we have $(A + B)^2 = A^2 + 2AB + B^2$.
6. If $AB = 0$, then either $A = 0$ or $B = 0$.
7. If $A$ and $B$ are square matrices such that $AB$ is upper triangular, then $A$ and $B$ must both be upper triangular.
8. If $A$ is a square matrix such that $A^2 = A$, then $A$ must be the zero matrix or the identity matrix.
9. If $A$ is a matrix of numbers, then if we consider $A$ as a matrix function, its derivative is the zero matrix.
10. If $A$ and $B$ are matrix functions whose product $AB$ is defined, then
$$\frac{d}{dt}(AB) = A\frac{dB}{dt} + B\frac{dA}{dt}.$$
11. If $A$ is an $n \times n$ matrix function such that $A$ and $dA/dt$ are the same function, then $A = ce^t I_n$ for some constant $c$.
12. If $A$ and $B$ are matrix functions whose product $AB$ is defined, then the matrix functions $(AB)^T$ and $B^T A^T$ are the same.

Problems
1. If $A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 5 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 5 \end{bmatrix}$, find $2A$, $-3B$, $A - 2B$, and $3A + 4B$.
2. If $A = \begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}$, $C = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 2 & 3 \\ -1 & 1 & 0 \end{bmatrix}$, find the matrix $D$ such that $2A + B - 3C + 2D = A + 4C$.
3. Let $A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 2 & -1 & 3 \\ 5 & 1 & 2 \\ 4 & 6 & -2 \end{bmatrix}$, $C = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$, $D = \begin{bmatrix} 2 & -2 & 3 \end{bmatrix}$. Find, if possible, $AB$, $BC$, $CA$, $DC$, $DB$, $AD$, and $CD$.

For Problems 4–6, determine $AB$ for the given matrices. In these problems $i$ denotes $\sqrt{-1}$.
4. $A = \begin{bmatrix} 2 - i & 1 + i \\ -i & 2 + 4i \end{bmatrix}$, $B = \begin{bmatrix} i & 1 - 3i \\ 0 & 4 + i \end{bmatrix}$.
5. $A = \begin{bmatrix} 3 + 2i & 2 - 4i \\ 5 + i & -1 + 3i \end{bmatrix}$, $B = \begin{bmatrix} -1 + i & 3 + 2i \\ 4 - 3i & 1 + i \end{bmatrix}$.
6. $A = \begin{bmatrix} 3 - 2i & i \\ -i & 1 \end{bmatrix}$, $B = \begin{bmatrix} -1 + i & 2 - i & 0 \\ 1 + 5i & 0 & 3 - 2i \end{bmatrix}$.
7. Let $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ -2 & 3 & 4 & 6 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 2 \\ 1 & 5 \\ 4 & -3 \\ -1 & 6 \end{bmatrix}$, $C = \begin{bmatrix} -3 & 2 \\ 1 & -4 \end{bmatrix}$. Find $ABC$ and $CAB$.
8. If $A = \begin{bmatrix} 1 & -2 \\ 3 & 1 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 2 \\ 5 & 3 \end{bmatrix}$, $C = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$, find $(2A - 3B)C$.

For Problems 9–11, determine $A\mathbf{c}$ by computing an appropriate linear combination of the column vectors of $A$.
9. $A = \begin{bmatrix} 1 & 3 \\ -5 & 4 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$.
10. $A = \begin{bmatrix} 3 & -1 & 4 \\ 2 & 1 & 5 \\ 7 & -6 & 3 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}$.
11. $A = \begin{bmatrix} -1 & 2 \\ 4 & 7 \\ 5 & -4 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}$.
12. If $A$ is an $m \times n$ matrix and $C$ is an $r \times s$ matrix, what must be the dimensions of $B$ in order for the product $ABC$ to be defined? Write an expression for the $(i, j)$th element of $ABC$ in terms of the elements of $A$, $B$ and $C$.
13. Find $A^2$, $A^3$, and $A^4$ if
(a) $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$.
(b) $A = \begin{bmatrix} 0 & 1 & 0 \\ -2 & 0 & 1 \\ 4 & -1 & 0 \end{bmatrix}$.
14. If $A$ and $B$ are $n \times n$ matrices, prove that
(a) $(A + B)^2 = A^2 + AB + BA + B^2$.
(b) $(A - B)^2 = A^2 - AB - BA + B^2$.
15. If $A = \begin{bmatrix} 3 & -1 \\ -5 & -1 \end{bmatrix}$, calculate $A^2$ and verify that $A$ satisfies $A^2 - 2A - 8I_2 = 0_2$.
16. Find a matrix $A = \begin{bmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{bmatrix}$ such that
$$A^2 + \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} = I_3.$$
17. If $A = \begin{bmatrix} x & 1 \\ -2 & y \end{bmatrix}$, determine all values of $x$ and $y$ for which $A^2 = A$.
18. The Pauli spin matrices $\sigma_1$, $\sigma_2$, and $\sigma_3$ are defined by
$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
Verify that they satisfy $\sigma_1\sigma_2 = i\sigma_3$, $\sigma_2\sigma_3 = i\sigma_1$, $\sigma_3\sigma_1 = i\sigma_2$.

If $A$ and $B$ are $n \times n$ matrices, we define their commutator, denoted $[A, B]$, by
$$[A, B] = AB - BA.$$
Thus, $[A, B] = 0$ if and only if $A$ and $B$ commute. That is, $AB = BA$. Problems 19–22 require the commutator.
19. If $A = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}$, find $[A, B]$.
20. If $A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $A_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $A_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$, compute all of the commutators $[A_i, A_j]$, and determine which of the matrices commute.
21. If $A_1 = \frac{1}{2}\begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$, $A_2 = \frac{1}{2}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, $A_3 = \frac{1}{2}\begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$, verify that $[A_1, A_2] = A_3$, $[A_2, A_3] = A_1$, $[A_3, A_1] = A_2$.
22. If $A$, $B$ and $C$ are $n \times n$ matrices, find $[A, [B, C]]$ and prove the Jacobi identity
$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0.$$
23. Use the index form of the matrix product to prove properties (2.2.4) and (2.2.5).
24. Prove parts 1 and 2 of Theorem 2.2.21.
25. Prove property 2 of the identity matrix.
26. If $A$ and $B$ are $n \times n$ matrices, prove that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
27. If $A = \begin{bmatrix} 1 & -1 & 1 & 4 \\ 2 & 0 & 2 & -3 \\ 3 & 4 & -1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ -1 & 2 \\ 1 & 1 \\ 2 & 1 \end{bmatrix}$, find $A^T$, $B^T$, $AA^T$, $AB$ and $B^T A^T$.
28. Let $A = \begin{bmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{bmatrix}$, and let $S$ be the matrix with column vectors
$$\mathbf{s}_1 = \begin{bmatrix} -x \\ 0 \\ x \end{bmatrix}, \quad \mathbf{s}_2 = \begin{bmatrix} -y \\ y \\ -y \end{bmatrix}, \quad \mathbf{s}_3 = \begin{bmatrix} z \\ 2z \\ z \end{bmatrix},$$
where $x$, $y$, $z$ are constants.
(a) Show that $AS = [\mathbf{s}_1, \mathbf{s}_2, 7\mathbf{s}_3]$.
(b) Find all values of $x$, $y$, $z$ such that $S^T A S = \operatorname{diag}(1, 1, 7)$.
29. A matrix that is a multiple of $I_n$ is called an $n \times n$ scalar matrix.
(a) Determine the $4 \times 4$ scalar matrix whose trace is 8.
(b) Determine the $3 \times 3$ scalar matrix such that the product of the elements on the main diagonal is 343.
30. Prove that for each positive integer $n$, there is a unique scalar matrix whose trace is a given constant $k$.

If $A$ is an $n \times n$ matrix, then the matrices $S$ and $T$ defined by
$$S = \tfrac{1}{2}(A + A^T), \qquad T = \tfrac{1}{2}(A - A^T)$$
are referred to as the symmetric and skew-symmetric parts of $A$, respectively. Problems 31–34 investigate properties of $S$ and $T$.
31. Use the properties of the transpose to show that $S$ and $T$ are symmetric and skew-symmetric, respectively.
32. Find $S$ and $T$ for the matrix $A = \begin{bmatrix} 1 & -5 & 3 \\ 3 & 2 & 4 \\ 7 & -2 & 6 \end{bmatrix}$.
33. If $A$ is an $n \times n$ symmetric matrix, show that $T = 0$. What is the corresponding result for skew-symmetric matrices?
34. Show that every $n \times n$ matrix can be written as the sum of a symmetric and a skew-symmetric matrix.
35. Prove that if $A$ is an $n \times p$ matrix and $D = \operatorname{diag}(d_1, d_2, \dots, d_n)$, then $DA$ is the matrix obtained by multiplying the $i$th row vector of $A$ by $d_i$ $(1 \le i \le n)$.
36. Use the properties of the transpose to prove that
(a) $AA^T$ is a symmetric matrix.
(b) $(ABC)^T = C^T B^T A^T$.

For Problems 37–40, determine the derivative of the given matrix function.
37. $A(t) = \begin{bmatrix} e^{-2t} \\ \sin t \end{bmatrix}$.
38. $A(t) = \begin{bmatrix} t & \sin t \\ \cos t & 4t \end{bmatrix}$.
39. $A(t) = \begin{bmatrix} e^t & e^{2t} & t^2 \\ 2e^t & 4e^{2t} & 5t^2 \end{bmatrix}$.
40. $A(t) = \begin{bmatrix} \sin t & \cos t & 0 \\ -\cos t & \sin t & t \\ 0 & 3t & 1 \end{bmatrix}$.
41. Let $A = [a_{ij}(t)]$ be an $m \times n$ matrix function and let $B = [b_{ij}(t)]$ be an $n \times p$ matrix function. Use the definition of matrix multiplication to prove that
$$\frac{d}{dt}(AB) = A\frac{dB}{dt} + \frac{dA}{dt}B.$$

For Problems 42–45, determine $\int_a^b A(t)\,dt$ for the given matrix function.
42. $A(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}$, $a = 0$, $b = \pi/2$.
43. $A(t) = \begin{bmatrix} e^t & e^{-t} \\ 2e^t & 5e^{-t} \end{bmatrix}$, $a = 0$, $b = 1$.
44. $A(t) = \begin{bmatrix} e^{2t} & \sin 2t \\ t^2 - 5 & te^t \\ \sec^2 t & 3t - \sin t \end{bmatrix}$, $a = 0$, $b = 1$.
45. The matrix function $A(t)$ in Problem 39, with $a = 0$ and $b = 1$.

Integration of matrix functions given in the text was done with definite integrals, but one can naturally compute indefinite integrals of matrix functions as well, by performing indefinite integrals for each element of the matrix function.

In Problems 46–49, evaluate the indefinite integral $\int A(t)\,dt$ for the given matrix function. You may assume that the constants of all indefinite integrations are zero.
46. $A(t) = \begin{bmatrix} 2t \\ 3t^2 \end{bmatrix}$.
47. The matrix function $A(t)$ in Problem 40.
48. The matrix function $A(t)$ in Problem 43.
49. The matrix function $A(t)$ in Problem 44.

2.3 Terminology for Systems of Linear Equations
As we mentioned in Section 2.1, a main aim of this chapter is to apply matrices to
determine the solution properties of any system of linear equations. We are now in a
position to pursue that aim. We begin by introducing some notation and terminology.

DEFINITION 2.3.1
The general $m \times n$ system of linear equations is of the form
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
\tag{2.3.1}
$$
where the system coefficients $a_{ij}$ and the system constants $b_j$ are given scalars and $x_1, x_2, \dots, x_n$ denote the unknowns in the system. If $b_i = 0$ for all $i$, then the system is called homogeneous; otherwise it is called nonhomogeneous.

DEFINITION 2.3.2
By a solution to the system (2.3.1) we mean an ordered $n$-tuple of scalars, $(c_1, c_2, \dots, c_n)$, which, when substituted for $x_1, x_2, \dots, x_n$ into the left-hand side of system (2.3.1), yield the values on the right-hand side. The set of all solutions to system (2.3.1) is called the solution set to the system.

Remarks
1. Usually the aij and bj will be real numbers, and we will then be interested in determining only the real solutions to system (2.3.1). However, many of the problems
that arise in the later chapters will require the solution to systems with complex
coefﬁcients, in which case the corresponding solutions will also be complex.
2. If (c1 , c2 , . . . , cn ) is a solution to the system (2.3.1), we will sometimes specify
this solution by writing x1 = c1 , x2 = c2 , . . . , xn = cn . For example, the ordered
pair of numbers (1, 2) is a solution to the system
x1 + x2 = 3,
3x1 − 2x2 = −1,
and we could express this solution in the equivalent form $x_1 = 1$, $x_2 = 2$.

At this point, we pause to introduce some important notation that will be used frequently throughout the remainder of the text.

Notation 2.3.3 The set of all ordered $n$-tuples of real numbers $(c_1, c_2, \dots, c_n)$ will be denoted by $\mathbb{R}^n$.
Therefore, the set of all real solutions to the linear system (2.3.1) forms a subset of Rn .
In like manner, the set of all ordered n-tuples of complex numbers will be denoted by
Cn , and the solution set for a linear system (2.3.1) containing complex coefﬁcients can
be viewed as a subset of Cn .
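Notice that when we restrict all scalar values to be real, we have a natural correspondence between elements of $\mathbb{R}^n$, row $n$-vectors, and column $n$-vectors:
$$(x_1, x_2, \dots, x_n) \longleftrightarrow \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \longleftrightarrow \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$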
Therefore, we may use the operations of addition, subtraction, and scalar multiplication
of row n-vectors and column n-vectors to naturally equip Rn with these same operations.
Therefore, just as we can perform addition and scalar multiplication of row or column
vectors, so too can we perform these operations on n-tuples of scalars. In fact, we will
often treat ordered n-tuples of scalars, row n-vectors, and column n-vectors as if they
were just different representations of the same basic object.
Of course, if we allow all scalars in question to assume complex values, then the
correspondence is between elements of Cn , row n-vectors, and column n-vectors. We
will have much more to say about the sets Rn and Cn in Chapter 4.
Returning to the general discussion of system (2.3.1), we will consider some fundamental questions:
1. Does the system (2.3.1) have a solution?
2. If the answer to question 1 is yes, then how many solutions are there?
3. How do we determine all of the solutions?
To obtain an idea of the answer to questions 1 and 2, consider the special case of a
system of three equations in three unknowns. The linear system (2.3.1) then reduces to
a11 x1 + a12 x2 + a13 x3 = b1 ,
a21 x1 + a22 x2 + a23 x3 = b2 ,
a31 x1 + a32 x2 + a33 x3 = b3 ,
which can be interpreted as deﬁning three planes in space. An ordered triple (c1 , c2 , c3 )
is a solution to this system if and only if it corresponds to the coordinates of a point of
intersection of the three planes. There are precisely four possibilities:
1. The planes have no intersection point.
2. The planes intersect in just one point.
3. The planes intersect in a line.
4. The planes are all identical.

In case 1, the corresponding system has no solution, whereas in case 2, the system has just one solution. Finally, in cases 3 and 4, every point on the line or plane (respectively) is a solution to the linear system and hence the system has an infinite number of solutions. Cases 1, 2, and 3 are illustrated in Figure 2.3.1.

[Figure 2.3.1: Possible intersection points for three planes in space. Panels: three parallel planes (no intersection): no solution; no common intersection: no solution; planes intersect at a point: a unique solution; planes intersect in a line: an infinite number of solutions.]

We have therefore proved, geometrically, that there are precisely three possibilities
for the solutions of a system of three equations in three unknowns. The system either
has no solution, it has just one solution, or it has an inﬁnite number of solutions. In
Section 2.5, we will establish that these are the only possibilities for the general m × n
system (2.3.1).

DEFINITION 2.3.4
A system of equations that has at least one solution is said to be consistent, whereas a system that has no solution is called inconsistent.

Our problem will be to determine whether a given system is consistent and then, if it is, to find its solution set.

DEFINITION 2.3.5
Naturally associated with the system (2.3.1) are the following two matrices:
1. The matrix of coefficients $A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}$.
2. The augmented matrix $A^{\#} = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{array}\right]$.

The augmented matrix completely characterizes a system of equations, since it
contains all of the system coefﬁcients and system constants. We will see in the subsequent
sections that the relationship between A and A# determines the solution properties of a
linear system. Notice that the matrix of coefﬁcients is the matrix consisting of the ﬁrst n
columns of A# .
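The relationship between the matrix of coefficients and the augmented matrix described above can be sketched in a few lines of Python. This is only an illustration: the function names `augmented_matrix` and `coefficient_matrix` are assumptions, and matrices are represented as plain lists of rows. The sample system is the one from Remark 2 earlier in this section.

```python
def augmented_matrix(A, b):
    """Append the right-hand-side entry b[i] to row i of the coefficient matrix A."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [row + [bi] for row, bi in zip(A, b)]

def coefficient_matrix(aug):
    """Recover the coefficient matrix: the first n columns of an m x (n + 1) augmented matrix."""
    return [row[:-1] for row in aug]

# The system x1 + x2 = 3, 3*x1 - 2*x2 = -1.
A = [[1, 1], [3, -2]]
b = [3, -1]

A_sharp = augmented_matrix(A, b)
print(A_sharp)                           # [[1, 1, 3], [3, -2, -1]]
print(coefficient_matrix(A_sharp) == A)  # True
```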
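Example 2.3.6 Write the system of equations with the following augmented matrix:
$$\left[\begin{array}{cccc|c} 1 & 2 & 9 & -1 & 1 \\ 2 & -3 & 7 & 4 & 2 \\ 1 & 3 & 5 & 0 & -1 \end{array}\right].$$

Solution: The appropriate system is
$$
\begin{aligned}
x_1 + 2x_2 + 9x_3 - x_4 &= 1, \\
2x_1 - 3x_2 + 7x_3 + 4x_4 &= 2, \\
x_1 + 3x_2 + 5x_3 \phantom{{}+ 0x_4} &= -1.
\end{aligned}
$$

Vector Formulation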
We next show that the matrix product described in the preceding section can be used to
write a linear system as a single equation involving the matrix of coefﬁcients and column
vectors. For example, the system
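$$
\begin{aligned}
x_1 + 3x_2 - 4x_3 &= 1, \\
2x_1 + 5x_2 - x_3 &= 5, \\
x_1 \phantom{{}+ 0x_2} + 6x_3 &= 3
\end{aligned}
$$
can be written as the vector equation
$$\begin{bmatrix} 1 & 3 & -4 \\ 2 & 5 & -1 \\ 1 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix},$$
since this vector equation is satisfied if and only if
$$\begin{bmatrix} x_1 + 3x_2 - 4x_3 \\ 2x_1 + 5x_2 - x_3 \\ x_1 + 6x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix};$$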
that is, if and only if each equation of the given system is satisﬁed.
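The equivalence between the matrix product and the individual equations can be checked directly. The sketch below is illustrative only: `matvec` and `left_hand_sides` are hypothetical helper names, the test vectors are arbitrary, and the particular solution $(-9/11, 16/11, 7/11)$ is not stated in the text; it was worked out by hand for this example and is verified here with exact fractions.

```python
from fractions import Fraction

def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector (list of entries)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def left_hand_sides(x1, x2, x3):
    """Left-hand sides of the three equations, written out one equation at a time."""
    return [x1 + 3 * x2 - 4 * x3, 2 * x1 + 5 * x2 - x3, x1 + 6 * x3]

A = [[1, 3, -4], [2, 5, -1], [1, 0, 6]]
b = [1, 5, 3]

# For any vector x, the entries of A x are exactly the left-hand sides of the system.
for x in [(1, 0, 0), (2, -1, 3), (0, 4, -5)]:
    assert matvec(A, x) == left_hand_sides(*x)

# A vector solves the system precisely when A x equals b.
x_solution = [Fraction(-9, 11), Fraction(16, 11), Fraction(7, 11)]
print(matvec(A, x_solution) == b)   # True
```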
Similarly, the general m × n system of linear equations
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
$$
can be written as the vector equation
$$A\mathbf{x} = \mathbf{b},$$
where $A$ is the $m \times n$ matrix of coefficients and
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$
We will refer to the column $n$-vector $\mathbf{x}$ as the vector of unknowns, and to the column $m$-vector $\mathbf{b}$ as the right-hand-side vector. Assuming that all elements in the system are real, we can view $\mathbf{b}$ as an element of $\mathbb{R}^m$ and $\mathbf{x}$ as an element of $\mathbb{R}^n$. We can denote these statements by $\mathbf{b} \in \mathbb{R}^m$ and $\mathbf{x} \in \mathbb{R}^n$, respectively.³ Therefore, the set of all real solutions to the system $A\mathbf{x} = \mathbf{b}$ is
$$S = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{b}\},$$
which is a subset of Rn .
Example 2.3.7 It can be shown, using the techniques of the next two sections, that the solution set of
the linear system
x1 + x2 + 2x3 − x4 = 0,
3x1 − 2x2 + x3 + 2x4 = 0,
5x1 + 3x2 + 3x3 − 2x4 = 0,
is the subset of R4 deﬁned by
S = {(−t, 4t, t, 5t) : t ∈ R}.
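Although deriving this solution set requires the techniques of the next two sections, confirming that each element of $S$ solves the system takes only substitution. A minimal Python sketch follows; the helper names `matvec` and `solution` and the sampled values of $t$ are assumptions made for illustration.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector (list of entries)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Coefficient matrix of the homogeneous system in Example 2.3.7.
A = [[1, 1, 2, -1],
     [3, -2, 1, 2],
     [5, 3, 3, -2]]

def solution(t):
    """The element (-t, 4t, t, 5t) of S for a given parameter value t."""
    return [-t, 4 * t, t, 5 * t]

# Substituting several sample parameter values gives the zero right-hand side each time.
residuals = [matvec(A, solution(t)) for t in (-2, 0, 1, 7)]
print(all(r == [0, 0, 0] for r in residuals))   # True
```

This only confirms that every such 4-tuple is a solution; showing that there are no other solutions is what the later techniques establish.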
A similar vector formulation for systems of differential equations can be used not
only in developing the theory for such systems, but also in deriving solution techniques.
As an example of this formulation, consider the system of differential equations
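$$
\begin{aligned}
\frac{dx_1}{dt} &= 3t x_1 + 9x_2 + 6e^t, \\
\frac{dx_2}{dt} &= 2x_1 - 7x_2 + 3e^t.
\end{aligned}
$$
Using matrix and vector functions, this system can be written as the vector equation
$$\frac{d\mathbf{x}}{dt} = A(t)\mathbf{x}(t) + \mathbf{b}(t),$$
where
$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}, \quad \frac{d\mathbf{x}}{dt} = \begin{bmatrix} dx_1/dt \\ dx_2/dt \end{bmatrix}, \quad A(t) = \begin{bmatrix} 3t & 9 \\ 2 & -7 \end{bmatrix}, \quad \text{and} \quad \mathbf{b}(t) = \begin{bmatrix} 6e^t \\ 3e^t \end{bmatrix}.$$
In this formulation, the basic unknown is the column 2-vector function $\mathbf{x}(t)$.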
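To see that the vector formulation carries exactly the same information as the two scalar equations, one can evaluate both right-hand sides at a sample point. The Python sketch below is illustrative only: the function names and the sample values of $t$, $x_1$, and $x_2$ are assumptions, not part of the text.

```python
import math

def A(t):
    """Coefficient matrix function A(t) of the system."""
    return [[3 * t, 9], [2, -7]]

def b(t):
    """Vector function b(t) of the system."""
    return [6 * math.exp(t), 3 * math.exp(t)]

def rhs_vector(t, x):
    """Right-hand side A(t) x + b(t), computed entry by entry from the matrix form."""
    At, bt = A(t), b(t)
    return [sum(At[i][j] * x[j] for j in range(2)) + bt[i] for i in range(2)]

def rhs_componentwise(t, x):
    """Right-hand sides of the two scalar equations as originally written."""
    x1, x2 = x
    return [3 * t * x1 + 9 * x2 + 6 * math.exp(t),
            2 * x1 - 7 * x2 + 3 * math.exp(t)]

# Arbitrary sample point (hypothetical values chosen for illustration).
t, x = 0.5, [1.0, -2.0]
agree = all(math.isclose(u, v)
            for u, v in zip(rhs_vector(t, x), rhs_componentwise(t, x)))
print(agree)   # True
```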
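Example 2.3.8 Give the vector formulation for the system of equations
$$
\begin{aligned}
x_1' &= 3x_1 + (\sin t)x_2 + e^t, \\
x_2' &= 7t x_1 + t^2 x_2 - 4e^{-t}.
\end{aligned}
$$

Solution: We have
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & \sin t \\ 7t & t^2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e^t \\ -4e^{-t} \end{bmatrix}.$$
That is,
$$\mathbf{x}'(t) = A(t)\mathbf{x}(t) + \mathbf{b}(t),$$
where
$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} 3 & \sin t \\ 7t & t^2 \end{bmatrix}, \quad \mathbf{b}(t) = \begin{bmatrix} e^t \\ -4e^{-t} \end{bmatrix}.$$

³ The symbol ∈ is the set-theoretic notation declaring membership in a set; it will be often encountered in the text.

Exercises for 2.3

Key Terms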
System coefficients, System constants, Homogeneous system, Nonhomogeneous system, Solution, Solution set, Consistent system, Inconsistent system, Matrix of coefficients, Augmented matrix, Vector of unknowns, Right-hand-side vector.

Skills
• Be able to write a linear system of equations as a vector equation, and identify the matrix of coefficients, the right-hand-side vector, and the augmented matrix.
• Given a matrix of coefficients and a right-hand-side vector, or an augmented matrix, be able to write the corresponding linear system.
• Understand the geometric difference between a consistent linear system and an inconsistent one.
• Be able to verify that the components of a given vector provide a solution to a linear system.
• Be able to give the vector formulation for a system of differential equations.

True-False Review
For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. If a linear system of equations has an $m \times n$ augmented matrix, then the system has $m$ equations and $n$ unknowns.
2. A linear system that contains three distinct planes can have at most one solution.
3. If the matrix of coefficients of a linear system is an $m \times n$ matrix, then the right-hand-side vector must have $n$ components.
4. It is impossible for a linear system of equations to have exactly two solutions.
5. If a linear system has an $m \times n$ coefficient matrix, then the augmented matrix for the linear system is $m \times (n + 1)$.
6. If $A$ is an $n \times n$ matrix, then the linear systems $A\mathbf{x} = \mathbf{0}$ and $A^T\mathbf{x} = \mathbf{0}$ have the same solution set.

Problems
For Problems 1–2, verify that the given triple of real numbers is a solution to the given system.
1. $(1, -1, 2)$;
$$
\begin{aligned}
2x_1 - 3x_2 + 4x_3 &= 13, \\
x_1 + x_2 - x_3 &= -2, \\
5x_1 + 4x_2 + x_3 &= 3.
\end{aligned}
$$
2. $(2, -3, 1)$;
$$
\begin{aligned}
x_1 + x_2 - 2x_3 &= -3, \\
3x_1 - x_2 - 7x_3 &= 2, \\
x_1 + x_2 + x_3 &= 0, \\
2x_1 + 2x_2 - 4x_3 &= -6.
\end{aligned}
$$
3. Verify that for all values of $t$,
$$(1 - t,\ 2 + 3t,\ 3 - 2t)$$
is a solution to the linear system
$$
\begin{aligned}
x_1 + x_2 + x_3 &= 6, \\
x_1 - x_2 - 2x_3 &= -7, \\
5x_1 + x_2 - x_3 &= 4.
\end{aligned}
$$
4. Verify that for all values of $s$ and $t$,
$$(s,\ s - 2t,\ 2s + 3t,\ t)$$
is a solution to the linear system
$$
\begin{aligned}
x_1 + x_2 - x_3 + 5x_4 &= 0, \\
2x_2 - x_3 + 7x_4 &= 0, \\
4x_1 + 2x_2 - 3x_3 + 13x_4 &= 0.
\end{aligned}
$$
5. By making a sketch in the $xy$-plane, prove that the following linear system has no solution:
$$
\begin{aligned}
2x + 3y &= 1, \\
2x + 3y &= 2.
\end{aligned}
$$

For Problems 6–8, determine the coefficient matrix, $A$, the right-hand-side vector, $\mathbf{b}$, and the augmented matrix $A^{\#}$ of the given system.
6. $x_1 + 2x_2 - 3x_3 = 1$,
   $2x_1 + 4x_2 - 5x_3 = 2$,
   $7x_1 + 2x_2 - x_3 = 3$.
7. $x + y + z - w = 3$,
   $2x + 4y - 3z + 7w = 2$.
8. $x_1 + 2x_2 - x_3 = 0$,
   $2x_1 + 3x_2 - 2x_3 = 0$,
   $5x_1 + 6x_2 - 5x_3 = 0$.

For Problems 9–10, write the system of equations with the given coefficient matrix and right-hand-side vector.
9. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 1 & 1 & -2 & 6 \\ 3 & 1 & 4 & 2 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$.
10. $A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 2 \\ 7 & 6 & 3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}$.
11. Consider the $m \times n$ homogeneous system of linear equations
$$A\mathbf{x} = \mathbf{0}. \tag{2.3.2}$$
(a) If $\mathbf{x} = [x_1\ x_2\ \dots\ x_n]^T$ and $\mathbf{y} = [y_1\ y_2\ \dots\ y_n]^T$ are solutions to (2.3.2), show that
$$\mathbf{z} = \mathbf{x} + \mathbf{y} \quad \text{and} \quad \mathbf{w} = c\mathbf{x}$$
are also solutions, where $c$ is an arbitrary scalar.
(b) Is the result of (a) true when $\mathbf{x}$ and $\mathbf{y}$ are solutions to the nonhomogeneous system $A\mathbf{x} = \mathbf{b}$? Explain.

For Problems 12–15, write the vector formulation for the given system of differential equations.
12. $x_1' = -4x_1 + 3x_2 + 4t$, $\quad x_2' = 6x_1 - 4x_2 + t^2$.
13. $x_1' = t^2 x_1 - t x_2$, $\quad x_2' = (-\sin t)x_1 + x_2$.
14. $x_1' = e^{2t} x_2$, $\quad x_2' + (\sin t)x_1 = 1$.
15. $x_1' = (-\sin t)x_2 + x_3 + t$, $\quad x_2' = -e^t x_1 + t^2 x_3 + t^3$, $\quad x_3' = -t x_1 + t^2 x_2 + 1$.

For Problems 16–17 verify that the given vector function $\mathbf{x}$ defines a solution to $\mathbf{x}' = A\mathbf{x} + \mathbf{b}$ for the given $A$ and $\mathbf{b}$.
16. $\mathbf{x}(t) = \begin{bmatrix} e^{4t} \\ -2e^{4t} \end{bmatrix}$, $A = \begin{bmatrix} 2 & -1 \\ -2 & 3 \end{bmatrix}$, $\mathbf{b}(t) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.
17. $\mathbf{x}(t) = \begin{bmatrix} 4e^{-2t} + 2\sin t \\ 3e^{-2t} - \cos t \end{bmatrix}$, $A = \begin{bmatrix} 1 & -4 \\ -3 & 2 \end{bmatrix}$, $\mathbf{b}(t) = \begin{bmatrix} -2(\cos t + \sin t) \\ 7\sin t + 2\cos t \end{bmatrix}$.

2.4 Elementary Row Operations and Row-Echelon Matrices
In the next section we will develop methods for solving a system of linear equations.
These methods will consist of reducing a given system of equations to a new system that
has the same solution set as the given system but is easier to solve. In this section we
introduce the requisite mathematical results.

Elementary Row Operations
The ﬁrst step in deriving systematic procedures for solving a linear system is to determine
what operations can be performed on such a system without altering its solution set.
Example 2.4.1 Consider the system of equations
\begin{align}
x_1 + 2x_2 + 4x_3 &= 2, \tag{2.4.1} \\
2x_1 - 5x_2 + 3x_3 &= 6, \tag{2.4.2} \\
4x_1 + 6x_2 - 7x_3 &= 8. \tag{2.4.3}
\end{align}

Solution: If we permute (i.e., interchange), say, Equations (2.4.1) and (2.4.2), the resulting system is
$$
\begin{aligned}
2x_1 - 5x_2 + 3x_3 &= 6, \\
x_1 + 2x_2 + 4x_3 &= 2, \\
4x_1 + 6x_2 - 7x_3 &= 8,
\end{aligned}
$$
which certainly has the same solution set as the original system. Returning to the original system, if we multiply, say, Equation (2.4.2) by 5, we obtain the system
$$
\begin{aligned}
x_1 + 2x_2 + 4x_3 &= 2, \\
10x_1 - 25x_2 + 15x_3 &= 30, \\
4x_1 + 6x_2 - 7x_3 &= 8,
\end{aligned}
$$
which again has the same solution set as the original system. Finally, if we add, say, twice Equation (2.4.1) to Equation (2.4.3), we obtain the system
\begin{align}
x_1 + 2x_2 + 4x_3 &= 2, \tag{2.4.4} \\
2x_1 - 5x_2 + 3x_3 &= 6, \tag{2.4.5} \\
(4x_1 + 6x_2 - 7x_3) + 2(x_1 + 2x_2 + 4x_3) &= 8 + 2(2). \tag{2.4.6}
\end{align}
We can verify that, if (2.4.4)–(2.4.6) are satisfied, then so are (2.4.1)–(2.4.3), and
vice versa. It follows that the system of equations (2.4.4)–(2.4.6) has the same solution
set as the original system of equations (2.4.1)–(2.4.3).
More generally, similar reasoning can be used to show that the following three
operations can be performed on any m × n system of linear equations without altering
the solution set:
1. Permute equations.
2. Multiply an equation by a nonzero constant.
3. Add a multiple of one equation to another equation.
Since these operations involve changes only in the system coefﬁcients and constants
(and not changes in the variables), they can be represented by the following operations
on the rows of the augmented matrix of the system:
1. Permute rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of one row to another row.

These three operations, called elementary row operations, will be a basic computational
tool throughout the text, even in cases when the matrix under consideration is not derived
from a system of linear equations. The following notation will be used to describe
elementary row operations performed on a matrix A.
1. $P_{ij}$: Permute the $i$th and $j$th rows in $A$.
2. $M_i(k)$: Multiply every element of the $i$th row of $A$ by a nonzero scalar $k$.
3. $A_{ij}(k)$: Add to the elements of the $j$th row of $A$ the scalar $k$ times the corresponding elements of the $i$th row of $A$.

Furthermore, the notation $A \sim B$ will mean that matrix $B$ has been obtained from matrix $A$ by a sequence of elementary row operations. To reference a particular elementary row operation used in, say, the $n$th step of the sequence of elementary row operations, we will write $A \overset{n}{\sim} B$.
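Example 2.4.2 The one-step operations performed on the system in Example 2.4.1 can be described as follows using elementary row operations on the augmented matrix of the system:
$$\left[\begin{array}{ccc|c} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 2 & -5 & 3 & 6 \\ 1 & 2 & 4 & 2 \\ 4 & 6 & -7 & 8 \end{array}\right] \qquad 1.\ P_{12}.\ \text{Permute (2.4.1) and (2.4.2).}$$
$$\left[\begin{array}{ccc|c} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 1 & 2 & 4 & 2 \\ 10 & -25 & 15 & 30 \\ 4 & 6 & -7 & 8 \end{array}\right] \qquad 1.\ M_2(5).\ \text{Multiply (2.4.2) by 5.}$$
$$\left[\begin{array}{ccc|c} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 6 & 10 & 1 & 12 \end{array}\right] \qquad 1.\ A_{13}(2).\ \text{Add 2 times (2.4.1) to (2.4.3).}$$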
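The three elementary row operations are simple to express in code. The following Python sketch is an illustration under stated assumptions rather than notation from the text: the function names `permute`, `scale`, and `add_multiple` are hypothetical stand-ins for $P_{ij}$, $M_i(k)$, and $A_{ij}(k)$, rows are numbered from 1 as in the text, and each function returns a new matrix instead of modifying its input.

```python
def permute(M, i, j):
    """P_ij: interchange rows i and j (1-based) of M."""
    R = [row[:] for row in M]
    R[i - 1], R[j - 1] = R[j - 1], R[i - 1]
    return R

def scale(M, i, k):
    """M_i(k): multiply every element of row i by the nonzero scalar k."""
    if k == 0:
        raise ValueError("k must be nonzero")
    R = [row[:] for row in M]
    R[i - 1] = [k * a for a in R[i - 1]]
    return R

def add_multiple(M, i, j, k):
    """A_ij(k): add k times row i to row j."""
    R = [row[:] for row in M]
    R[j - 1] = [b + k * a for a, b in zip(R[i - 1], R[j - 1])]
    return R

# Augmented matrix of the system (2.4.1)-(2.4.3).
aug = [[1, 2, 4, 2], [2, -5, 3, 6], [4, 6, -7, 8]]

print(permute(aug, 1, 2))          # [[2, -5, 3, 6], [1, 2, 4, 2], [4, 6, -7, 8]]
print(scale(aug, 2, 5))            # [[1, 2, 4, 2], [10, -25, 15, 30], [4, 6, -7, 8]]
print(add_multiple(aug, 1, 3, 2))  # [[1, 2, 4, 2], [2, -5, 3, 6], [6, 10, 1, 12]]
```

Returning copies keeps the original augmented matrix available, which matches the example above, where each operation is applied to the original system rather than to the result of the previous step.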
It is important to realize that each elementary row operation is reversible; we can “undo” a given elementary row operation by another elementary row operation to bring the
modiﬁed linear system back into its original form. Speciﬁcally, in terms of the notation
introduced above, the reverse operations are determined as follows (ERO refers here to
“elementary row operation”):
ERO applied to A ($A \sim B$)      Reverse ERO applied to B ($B \sim A$)
$P_{ij}$                           $P_{ji}$: Permute rows $j$ and $i$ in $B$.
$M_i(k)$                           $M_i(1/k)$: Multiply the $i$th row of $B$ by $1/k$.
$A_{ij}(k)$                        $A_{ij}(-k)$: Add to the elements of the $j$th row of $B$ the scalar $-k$ times the corresponding elements of the $i$th row of $B$.

We introduce a special term for matrices that are related via elementary row operations.

DEFINITION 2.4.3
Let $A$ be an $m \times n$ matrix. Any matrix obtained from $A$ by a finite sequence of elementary row operations is said to be row-equivalent to $A$.

Thus, all of the matrices in the previous example are row-equivalent. Since elementary row operations do not alter the solution set of a linear system, we have the next
theorem.
Theorem 2.4.4 Systems of linear equations with row-equivalent augmented matrices have the same solution sets.

Row-Echelon Matrices
Our methods for solving a system of linear equations will consist of using elementary
row operations to reduce the augmented matrix of the given system to a simple form.
But how simple a form should we aim for? In order to answer this question, consider the
system
\begin{align}
x_1 + x_2 - x_3 &= 4, \tag{2.4.7} \\
x_2 - 3x_3 &= 5, \tag{2.4.8} \\
x_3 &= 2. \tag{2.4.9}
\end{align}
This system can be solved most easily as follows. From Equation (2.4.9), $x_3 = 2$. Substituting this value into Equation (2.4.8) and solving for $x_2$ yields $x_2 = 5 + 6 = 11$. Finally, substituting for $x_3$ and $x_2$ into Equation (2.4.7) and solving for $x_1$, we obtain $x_1 = -5$. Thus, the solution to the given system of equations is $(-5, 11, 2)$, a single vector in $\mathbb{R}^3$. This technique is called back substitution and could be used because the given system has a simple form. The augmented matrix of the system is
$$\left[\begin{array}{ccc|c} 1 & 1 & -1 & 4 \\ 0 & 1 & -3 & 5 \\ 0 & 0 & 1 & 2 \end{array}\right].$$
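The back-substitution steps just described can be sketched as a short Python routine. This is a minimal illustration, not the text's algorithm for general systems: the name `back_substitute` is an assumption, and the routine assumes a square system whose coefficient part is upper triangular with nonzero diagonal entries, as in (2.4.7)–(2.4.9). Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def back_substitute(aug):
    """Solve an n x (n + 1) augmented matrix whose coefficient part is
    upper triangular with nonzero diagonal entries, working upward from the last row."""
    n = len(aug)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Contribution of the unknowns that have already been found.
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(aug[i][n] - known) / aug[i][i]
    return x

# Augmented matrix of the system (2.4.7)-(2.4.9).
aug = [[1, 1, -1, 4],
       [0, 1, -3, 5],
       [0, 0, 1, 2]]

solution = back_substitute(aug)
print([int(v) for v in solution])   # [-5, 11, 2]
```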
We see that the submatrix consisting of the ﬁrst three columns (which corresponds
to the matrix of coefﬁcients) is an upper triangular matrix with the leftmost nonzero
entry in each row equal to 1. The back-substitution method will work on any system of
linear equations with an augmented matrix of this form. Unfortunately, not all systems
of equations have augmented matrices that can be reduced to such a form. However,
there is a simple type of matrix to which any matrix can be reduced by elementary row
operations, and which also represents a system of equations that can be solved (if it has
a solution) by back substitution. This is called a row-echelon matrix and is defined as follows:

DEFINITION 2.4.5
An m × n matrix is called a row-echelon matrix if it satisﬁes the following three
conditions:
1. If there are any rows consisting entirely of zeros, they are grouped together at
the bottom of the matrix.
2. The first nonzero element in any nonzero row (see footnote 4) is a 1 (called a leading 1).
3. The leading 1 of any row below the ﬁrst row is to the right of the leading 1 of
the row above it.
Footnote 4: A nonzero row (nonzero column) is any row (column) that does not consist entirely of zeros.

Example 2.4.6 Examples of row-echelon matrices are 1 −2 3 7
0 1 5 0 ,
0 001 001
0 0 0 ,
000 whereas 1 0 −1
0 1 2
0 1 −1 and and 1
0 0
0 1 −1 6 5 9
0 0 1 2 5 0 0 0 1 0 ,
0 0000 00
0 0 1 −1
01 are not row-echelon matrices.
The basic result that will allow us to determine the solution set to any system of
linear equations is stated in the next theorem.
Theorem 2.4.7 Any matrix is row-equivalent to a row-echelon matrix.
According to this theorem, by applying an appropriate sequence of elementary row
operations to any m × n matrix, we can always reduce it to a row-echelon matrix. When
a matrix A has been reduced to a row-echelon matrix in this way, we say that it has been
reduced to row-echelon form and refer to the resulting matrix as a row-echelon form
of A. The proof of Theorem 2.4.7 consists of giving an algorithm that will reduce an
arbitrary m × n matrix to a row-echelon matrix after a finite sequence of elementary
row operations. Before presenting such an algorithm, we first illustrate the result with
an example.

Example 2.4.8 Use elementary row operations to reduce
\[
\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -1 & 2 & 1 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix}
\]
to row-echelon form.

Solution: We show each step in detail.
Step 1: Put a leading 1 in the (1, 1) position.
This is most easily accomplished by permuting rows 1 and 2.
\[
\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -1 & 2 & 1 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 2 & 1 & -1 & 3 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix}
\]
Step 2: Use the leading 1 to put zeros beneath it in column 1.
This is accomplished by adding appropriate multiples of row 1 to the remaining
rows.
\[
\overset{2}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 3 & -5 & 1 \\ 0 & 2 & 1 & 5 \\ 0 & 2 & -3 & 1 \end{bmatrix}
\]
Step 2 row operations: Add −2 times row 1 to row 2. Add 4 times row 1 to row 3. Add −2 times row 1 to row 4.
Step 3: Put a leading 1 in the (2, 2) position.
We could accomplish this by multiplying row 2 by 1/3. However, this would introduce fractions into the matrix and thereby complicate the remaining computations. In hand calculations, fewer algebraic errors result if we avoid the use of fractions. In this
case, we can obtain a leading 1 without the use of fractions by adding −1 times row 3
to row 2.
\[
\overset{3}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 2 & 1 & 5 \\ 0 & 2 & -3 & 1 \end{bmatrix}
\]
Step 3 row operation: Add −1 times row 3 to row 2.
Step 4: Use the leading 1 in the (2, 2) position to put zeros beneath it in column 2.
We now add appropriate multiples of row 2 to the rows beneath it. For row-echelon
form, we need not be concerned about the row above it, however.
\[
\overset{4}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 13 & 13 \\ 0 & 0 & 9 & 9 \end{bmatrix}
\]
Step 4 row operations: Add −2 times row 2 to row 3. Add −2 times row 2 to row 4.
Step 5: Put a leading 1 in the (3, 3) position.
This can be accomplished by multiplying row 3 by 1/13.
\[
\overset{5}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 9 & 9 \end{bmatrix}
\]
Step 6: Use the leading 1 in the (3, 3) position to put zeros beneath it in column 3.
The appropriate row operation is to add −9 times row 3 to row 4.
\[
\overset{6}{\sim}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
This is a row-echelon matrix, hence the given matrix has been reduced to row-echelon
form. The specific operations used at each step are given next, using the notation introduced previously in this section. In future examples, we will simply indicate briefly the
elementary row operation used at each step. The following shows this description for the
present example.
1. P12   2. A12(−2), A13(4), A14(−2)   3. A32(−1)   4. A23(−2), A24(−2)   5. M3(1/13)   6. A34(−9)

Remarks
1. Notice that in steps 2 and 4 of the preceding example we have performed multiple elementary row operations of the type Aij(k) in a single step. With this one
exception, the reader is strongly advised not to combine multiple elementary row
operations into a single step, particularly when they are of different types. This is
a common source of calculation errors.
2. The reader may have noticed that the particular steps taken in the preceding example are not uniquely determined. For instance, we could have achieved a leading 1
in the (1, 1) position in step 1 by multiplying the first row by 1/2, rather than permuting the first two rows. Therefore, we may have multiple strategies for reducing
a matrix to row-echelon form, and indeed, many possible row-echelon forms for
a given matrix A. In this particular case, we chose not to multiply the first row by
1/2 in order to avoid introducing fractions into the calculations.
The reader is urged to study the foregoing example very carefully, since it illustrates
the general procedure for reducing an m × n matrix to row-echelon form using elementary
row operations. This procedure will be used repeatedly throughout the text. The idea
behind reduction to row-echelon form is to start at the upper left-hand corner of the
matrix and proceed downward and to the right in the matrix. The following algorithm
formalizes the steps that reduce any m × n matrix to row-echelon form using a ﬁnite
number of elementary row operations and thereby provides a proof of Theorem 2.4.7.
An illustration of this algorithm is given in Figure 2.4.1.
Algorithm for Reducing an m × n Matrix A to Row-Echelon Form
1. Start with an m × n matrix A. If A = 0, go to step 7.
2. Determine the leftmost nonzero column (this is called a pivot column,
and the topmost position in this column is called a pivot position).
3. Use elementary row operations to put a 1 in the pivot position.
4. Use elementary row operations to put zeros below the pivot position.
5. If there are no more nonzero rows below the pivot position go to step 7,
otherwise go to step 6.
6. Apply steps 2 through 5 to the submatrix consisting of the rows that lie
below the pivot position.
7. The matrix is a row-echelon matrix.
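
The seven steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the name `row_echelon` and the use of exact `Fraction` arithmetic are choices made here, not part of the text, and step 3 is carried out by permuting the first available nonzero entry into the pivot position and then scaling the row, which may introduce fractions that a hand calculation would try to avoid.

```python
from fractions import Fraction


def row_echelon(rows):
    """Reduce a matrix (a list of rows) to a row-echelon form with exact arithmetic."""
    A = [[Fraction(x) for x in row] for row in rows]
    m = len(A)
    n = len(A[0]) if m else 0
    top = 0  # first row of the current submatrix (step 6 shrinks it each pass)
    for col in range(n):
        if top == m:
            break
        # Step 2: the pivot column is the leftmost column of the submatrix
        # with a nonzero entry at or below row `top`.
        pivot = next((r for r in range(top, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        # Step 3: permute that entry into the pivot position, then scale to a leading 1.
        A[top], A[pivot] = A[pivot], A[top]
        lead = A[top][col]
        A[top] = [x / lead for x in A[top]]
        # Step 4: add multiples of the pivot row to put zeros below the leading 1.
        for r in range(top + 1, m):
            k = A[r][col]
            A[r] = [x - k * y for x, y in zip(A[r], A[top])]
        top += 1
    return A


# The matrix of Example 2.4.8.
R = row_echelon([[2, 1, -1, 3], [1, -1, 2, 1], [-4, 6, -7, 1], [2, 0, 1, 3]])
for row in R:
    print([str(x) for x in row])
```

Because this sketch scales row 1 by 1/2 instead of permuting rows, its first two rows are (1, 1/2, −1/2, 3/2) and (0, 1, −5/3, 1/3), which differ from the rows found by hand in Example 2.4.8. Both results are row-echelon forms of the same matrix, and both have three nonzero rows.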
[Figure 2.4.1: Illustration of an algorithm for reducing an m × n matrix to row-echelon form. The panels mark the pivot column and pivot position, the submatrix, and the new submatrix obtained after elementary row operations are applied.]

Remark In order to obtain a row-echelon matrix, we put a 1 in each pivot position.
However, many algorithms for solving systems of linear equations numerically are based
around the preceding algorithm, except that in step 3 we place a nonzero number (not
necessarily a 1) in the pivot position. Of course, the matrix resulting from an application
of this algorithm differs from a row-echelon matrix, since it will have arbitrary nonzero
elements in the pivot positions.

Example 2.4.9 Reduce
\[
\begin{bmatrix} 3 & 2 & -5 & 2 \\ 1 & 1 & -1 & 1 \\ 1 & 0 & -3 & 4 \end{bmatrix}
\]
to row-echelon form.

Solution: Applying the row-reduction algorithm leads to the following sequence of
elementary row operations. The speciﬁc row operations used at each step are given at
the end of the process.
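\[
\begin{bmatrix} 3 & 2 & -5 & 2 \\ 1 & 1 & -1 & 1 \\ 1 & 0 & -3 & 4 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 \\ 3 & 2 & -5 & 2 \\ 1 & 0 & -3 & 4 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & -1 & -2 & -1 \\ 0 & -1 & -2 & 3 \end{bmatrix}
\]
\[
\overset{3}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & -1 & -2 & 3 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix}
\overset{5}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
(At each stage the pivot column is the leftmost nonzero column of the current submatrix, and the pivot position is its topmost entry.)
This is a row-echelon matrix and hence we are done. The row operations used are
summarized here:
1. P12   2. A12(−3), A13(−1)   3. M2(−1)   4. A23(1)   5. M3(1/4)

The Rank of a Matrix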
We now derive some further results on row-echelon matrices that will be required in the
next section to develop the theory for solving systems of linear equations.
We first observe that a row-echelon form for a matrix A is not unique. Given one
row-echelon form for A, we can always obtain a different one by taking the first row-echelon form for A and adding some multiple of a given row to any rows above it. The
result is still in row-echelon form.
However, even though the row-echelon form of A is not unique, we do have the
following theorem (in Chapter 4 we will see how the proof of this theorem arises naturally
from the more sophisticated ideas from linear algebra yet to be introduced).
Theorem 2.4.10 Let A be an m × n matrix. All row-echelon matrices that are row-equivalent to A have
the same number of nonzero rows.
Theorem 2.4.10 associates a number with any m × n matrix A—namely, the number
of nonzero rows in any row-echelon form of A. As we will see in the next section, this
number is fundamental in determining the solution properties of linear systems, and
indeed it plays a central role in linear algebra in general. For this reason, we give it a
special name.

DEFINITION 2.4.11
The number of nonzero rows in any row-echelon form of a matrix A is called the
rank of A and is denoted rank(A).

Example 2.4.12 Determine rank(A) if
\[
A = \begin{bmatrix} 3 & 1 & 4 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix}.
\]

Solution: In order to determine rank(A), we must first reduce A to row-echelon form.
\[
\begin{bmatrix} 3 & 1 & 4 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & 2 & 1 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & 2 & 1 \\ 0 & -5 & 1 \\ 0 & -5 & 1 \end{bmatrix}
\overset{3}{\sim}
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -\tfrac{1}{5} \\ 0 & -5 & 1 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -\tfrac{1}{5} \\ 0 & 0 & 0 \end{bmatrix}.
\]
Since there are two nonzero rows in this row-echelon form of A, it follows from Definition 2.4.11 that rank(A) = 2.
1. A31(−1)   2. A12(−4), A13(−2)   3. M2(−1/5)   4. A23(5)
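
The rank computation in Example 2.4.12 can be checked with a short Python sketch. The function name `rank` is an assumption made here for illustration. The sketch counts pivots during elimination without scaling them to 1, which does not change the number of nonzero rows that remain.

```python
from fractions import Fraction


def rank(rows):
    """Count the pivots found while eliminating below each pivot position."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0  # pivots found so far; also the index of the next pivot row
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):
            k = A[i][col] / A[r][col]
            A[i] = [x - k * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


A = [[3, 1, 4], [4, 3, 5], [2, -1, 3]]
print(rank(A))  # 2, in agreement with Example 2.4.12
```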
In the preceding example, the original matrix A had three nonzero rows, whereas any
row-echelon form of A has only two nonzero rows. We can interpret this geometrically as
follows. The three row vectors of A can be considered as vectors in R3 with components
a1 = (3, 1, 4), a2 = (4, 3, 5), a3 = (2, −1, 3).
In performing elementary row operations on A, we are taking combinations of these
vectors in the following way:
c1 a1 + c2 a2 + c3 a3 ,
and thus the rows of a row-echelon form of A are all of this form. We have been combining
the vectors linearly. The fact that we obtained a row of zeros in the row-echelon form
means that there are values of the constants c1 , c2 , c3 such that
c1 a1 + c2 a2 + c3 a3 = 0,
where 0 denotes the zero vector (0, 0, 0). Equivalently, one of the vectors can be written
in terms of the other two vectors, and therefore the three vectors lie in a plane. Reducing
the matrix to row-echelon form has uncovered this relationship among the three vectors.
We shall have much more to say about this in Chapter 4.

Remark If A is an m × n matrix, then rank(A) ≤ m and rank(A) ≤ n. This is because
the number of nonzero rows in a row-echelon form of A is equal to the number of pivots
in a row-echelon form of A, which cannot exceed the number of rows or columns of A,
since there can be at most one pivot per row and per column.

Reduced Row-Echelon Matrices
In the future we will need to consider the special row-echelon matrices that arise when
zeros are placed above, as well as beneath, each leading 1. Any such matrix is called a
reduced row-echelon matrix and is defined precisely as follows.

DEFINITION 2.4.13
An m × n matrix is called a reduced row-echelon matrix if it satisfies the following
conditions:
1. It is a row-echelon matrix.
2. Any column that contains a leading 1 has zeros everywhere else.

Example 2.4.14 The following are examples of reduced row-echelon matrices:
\[
\begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & -1 & 7 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 5 & 3 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Although an m × n matrix A does not have a unique row-echelon form, in reducing
A to a reduced row-echelon matrix we are making a particular choice of row-echelon
matrix, since we arrange that all elements above each leading 1 are zeros. In view of this,
the following theorem should not be too surprising:
Theorem 2.4.15 An m × n matrix is row-equivalent to a unique reduced row-echelon matrix.
The unique reduced row-echelon matrix to which a matrix A is row-equivalent will
be called the reduced row-echelon form of A. As illustrated in the next example, the
row-reduction algorithm is easily extended to determine the reduced row-echelon form
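of A: we just put zeros above and beneath each leading 1.

Example 2.4.16 Determine the reduced row-echelon form of
\[
A = \begin{bmatrix} 3 & -1 & 22 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix}.
\]

Solution: We apply the row-reduction algorithm, but put 0s above and below the
leading 1s. In so doing, it is immaterial whether we first reduce A to row-echelon form
and then arrange 0s above the leading 1s, or arrange 0s both above and below the leading
1s as we proceed from left to right.
\[
A = \begin{bmatrix} 3 & -1 & 22 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & 9 & 26 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & 9 & 26 \\ 0 & 14 & 28 \\ 0 & -14 & -28 \end{bmatrix}
\overset{3}{\sim}
\begin{bmatrix} 1 & 9 & 26 \\ 0 & 1 & 2 \\ 0 & -14 & -28 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & 0 & 8 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix},
\]
which is the reduced row-echelon form of A.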
1. A21(2)   2. A12(1), A13(−2)   3. M2(1/14)   4. A21(−9), A23(14)

Exercises for 2.4

Key Terms
Elementary row operations, Row-equivalent matrices, Back substitution, Row-echelon matrix, Row-echelon form, Leading 1, Pivot, Rank of a matrix, Reduced row-echelon matrix.

Skills
• Be able to perform elementary row operations on a matrix.
• Be able to determine a row-echelon form or reduced row-echelon form for a matrix.
• Be able to find the rank of a matrix.

True-False Review
For Questions 1–9, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. A matrix A can have many row-echelon forms but only one reduced row-echelon form.
2. Any upper triangular n × n matrix is in row-echelon form.
3. Any n × n matrix in row-echelon form is upper triangular.
4. If a matrix A has more rows than a matrix B, then rank(A) ≥ rank(B).
5. For any matrices A and B of the same dimensions, rank(A + B) = rank(A) + rank(B).
6. For any matrices A and B of the appropriate dimensions, rank(AB) = rank(A) · rank(B).
7. If a matrix has rank zero, then it must be the zero matrix.
8. The matrices A and 2A must have the same rank.
9. The matrices A and 2A must have the same reduced row-echelon form.

Problems
For Problems 1–8, determine whether the given matrices are
in reduced row-echelon form, row-echelon form but not reduced row-echelon form, or neither.
1. \( \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \).
2. \( \begin{bmatrix} 1 & 0 & 2 & 5 \\ 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \end{bmatrix} \).
3. \( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \).
4. \( \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \).
5. \( \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \).
6. \( \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \).
7. \( \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \).
8. \( \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \).

For Problems 9–18, use elementary row operations to reduce
the given matrix to row-echelon form, and hence determine
the rank of each matrix.
9. \( \begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix} \).
10. \( \begin{bmatrix} 2 & -4 \\ -4 & 8 \end{bmatrix} \).
11. \( \begin{bmatrix} 2 & 1 & 4 \\ 2 & -3 & 4 \\ 3 & -2 & 6 \end{bmatrix} \).
12. \( \begin{bmatrix} 0 & 1 & 3 \\ 0 & 1 & 4 \\ 0 & 3 & 5 \end{bmatrix} \).
13. \( \begin{bmatrix} 2 & -1 \\ 3 & 2 \\ 2 & 5 \end{bmatrix} \).
14. \( \begin{bmatrix} 2 & -1 & 3 \\ 3 & 1 & -2 \\ 2 & -2 & 1 \end{bmatrix} \).
15. \( \begin{bmatrix} 2 & -1 & 3 & 4 \\ 1 & -2 & 1 & 3 \\ 1 & -5 & 0 & 5 \end{bmatrix} \).
16. \( \begin{bmatrix} 2 & -2 & -1 & 3 \\ 3 & -2 & 3 & 1 \\ 1 & -1 & 1 & 0 \\ 2 & -1 & 2 & 2 \end{bmatrix} \).
17. \( \begin{bmatrix} 4 & 7 & 4 & 7 \\ 3 & 5 & 3 & 5 \\ 2 & -2 & 2 & -2 \\ 5 & -2 & 5 & -2 \end{bmatrix} \).
18. \( \begin{bmatrix} 2 & 1 & 3 & 4 & 2 \\ 1 & 0 & 2 & 1 & 3 \\ 2 & 3 & 1 & 5 & 7 \end{bmatrix} \).

For Problems 19–25, reduce the given matrix to reduced row-echelon form and hence determine the rank of each matrix.
19. \( \begin{bmatrix} 3 & 2 \\ 1 & -1 \end{bmatrix} \).
20. \( \begin{bmatrix} 3 & 7 & 10 \\ 2 & 3 & -1 \\ 1 & 2 & 1 \end{bmatrix} \).
21. \( \begin{bmatrix} 3 & -3 & 6 \\ 2 & -2 & 4 \\ 6 & -6 & 12 \end{bmatrix} \).
22. \( \begin{bmatrix} 3 & 5 & -12 \\ 2 & 3 & -7 \\ -2 & -1 & 1 \end{bmatrix} \).
23. \( \begin{bmatrix} 1 & -1 & -1 & 2 \\ 3 & -2 & 0 & 7 \\ 2 & -1 & 2 & 4 \\ 4 & -2 & 3 & 8 \end{bmatrix} \).
24. \( \begin{bmatrix} 1 & -2 & 1 & 3 \\ 3 & -6 & 2 & 7 \\ 4 & -8 & 3 & 10 \end{bmatrix} \).
25. \( \begin{bmatrix} 0 & 1 & 2 & 1 \\ 0 & 3 & 1 & 2 \\ 0 & 2 & 0 & 1 \end{bmatrix} \).

Many forms of technology have commands for performing
elementary row operations on a matrix A. For example, in
the linear algebra package of Maple, the three elementary
row operations are
• swaprow(A, i, j): permute rows i and j
• mulrow(A, i, k): multiply row i by k
• addrow(A, i, j, k): add k times row i to row j
For Problems 26–28, use some form of technology to determine a row-echelon form of the given matrix.
26. The matrix in Problem 14.
27. The matrix in Problem 15.
28. The matrix in Problem 18.

Many forms of technology also have built-in functions
for directly determining the reduced row-echelon form of
a given matrix A. For example, in the linear algebra package of Maple, the appropriate command is rref(A). In Problems 29–31, use technology to determine directly the reduced
row-echelon form of the given matrix.
29. The matrix in Problem 21.
30. The matrix in Problem 24.
31. The matrix in Problem 25.

2.5 Gaussian Elimination
We now illustrate how elementary row-operations applied to the augmented matrix of a
system of linear equations can be used ﬁrst to determine whether the system is consistent,
and second, if the system is consistent, to ﬁnd all of its solutions. In doing so, we will
develop the general theory for linear systems of equations.

Example 2.5.1 Determine the solution set to
3x1 − 2x2 + 2x3 = 9,
x1 − 2x2 + x3 = 5,
2x1 − x2 − 2x3 = −1. (2.5.1)

Solution: We first use elementary row operations to reduce the augmented matrix of
the system to row-echelon form.
\[
\begin{bmatrix} 3 & -2 & 2 & 9 \\ 1 & -2 & 1 & 5 \\ 2 & -1 & -2 & -1 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & -2 & 1 & 5 \\ 3 & -2 & 2 & 9 \\ 2 & -1 & -2 & -1 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & -2 & 1 & 5 \\ 0 & 4 & -1 & -6 \\ 0 & 3 & -4 & -11 \end{bmatrix}
\]
\[
\overset{3}{\sim}
\begin{bmatrix} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 3 & -4 & -11 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 0 & -13 & -26 \end{bmatrix}
\overset{5}{\sim}
\begin{bmatrix} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 0 & 1 & 2 \end{bmatrix}.
\]
1. P12   2. A12(−3), A13(−2)   3. A32(−1)   4. A23(−3)   5. M3(−1/13)
The system corresponding to this row-echelon form of the augmented matrix is
x1 − 2x2 + x3 = 5, (2.5.2)
x2 + 3x3 = 5, (2.5.3)
x3 = 2, (2.5.4)
which can be solved by back substitution. From Equation (2.5.4), x3 = 2. Substituting
into Equation (2.5.3) and solving for x2 , we ﬁnd that x2 = −1. Finally, substituting into
Equation (2.5.2) for x3 and x2 and solving for x1 yields x1 = 1. Thus, our original system
of equations has the unique solution (1, −1, 2), and the solution set to the system is
S = {(1, −1, 2)},
which is a subset of R3 .
The process of reducing the augmented matrix to row-echelon form and then using
back substitution to solve the equivalent system is called Gaussian elimination. The
particular case of Gaussian elimination that arises when the augmented matrix is reduced
to reduced row-echelon form is called Gauss-Jordan elimination.
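
As a hedged illustration of Gaussian elimination, the following Python sketch reduces the augmented matrix to row-echelon form and then applies back substitution. It assumes a square coefficient matrix with a unique solution; the function name `gaussian_elimination` and the use of `Fraction` for exact arithmetic are choices made here, not part of the text.

```python
from fractions import Fraction


def gaussian_elimination(A, b):
    """Solve Ax = b for a square A with a unique solution (assumption of this sketch)."""
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    # Forward phase: reduce to row-echelon form with leading 1s on the diagonal.
    for j in range(n):
        pivot = next((i for i in range(j, n) if M[i][j] != 0), None)
        if pivot is None:
            raise ValueError("the system does not have a unique solution")
        M[j], M[pivot] = M[pivot], M[j]
        lead = M[j][j]
        M[j] = [x / lead for x in M[j]]
        for i in range(j + 1, n):
            k = M[i][j]
            M[i] = [x - k * y for x, y in zip(M[i], M[j])]
    # Back substitution: solve for x_n first, then work upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x


# The system (2.5.1) of Example 2.5.1.
solution = gaussian_elimination([[3, -2, 2], [1, -2, 1], [2, -1, -2]], [9, 5, -1])
print([str(v) for v in solution])  # ['1', '-1', '2']
```

The intermediate row-echelon form produced by the sketch need not match the one found by hand, since row-echelon forms are not unique, but the solution obtained by back substitution is the same.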
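Example 2.5.2 Use Gauss-Jordan elimination to determine the solution set to
x1 + 2x2 − x3 = 1,
2x1 + 5x2 − x3 = 3,
x1 + 3x2 + 2x3 = 6.

Solution: In this case, we first reduce the augmented matrix of the system to reduced
row-echelon form.
\[
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 2 & 5 & -1 & 3 \\ 1 & 3 & 2 & 6 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 3 & 5 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & 0 & -3 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 4 \end{bmatrix}
\overset{3}{\sim}
\begin{bmatrix} 1 & 0 & -3 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix}
\]
1. A12(−2), A13(−1)   2. A21(−2), A23(−1)   3. M3(1/2)   4. A31(3), A32(−1)
The augmented matrix is now in reduced row-echelon form. The equivalent system is
x1 = 5,
x2 = −1,
x3 = 2,
and the solution can be read off directly as (5, −1, 2). Consequently, the given system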
has solution set
S = {(5, −1, 2)}
in R3 .
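
Gauss-Jordan elimination can be sketched in the same way. In the following Python illustration, the function name `rref` is an assumption made here; it clears the entries above as well as below each leading 1, so that for a system with a unique solution the answer can be read from the last column without back substitution.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    top = 0  # row that will receive the next leading 1
    for col in range(n):
        if top == m:
            break
        pivot = next((r for r in range(top, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[top], A[pivot] = A[pivot], A[top]
        lead = A[top][col]
        A[top] = [x / lead for x in A[top]]
        # Put zeros above and below the leading 1.
        for r in range(m):
            if r != top and A[r][col] != 0:
                k = A[r][col]
                A[r] = [x - k * y for x, y in zip(A[r], A[top])]
        top += 1
    return A


# Augmented matrix of the system in Example 2.5.2.
R = rref([[1, 2, -1, 1], [2, 5, -1, 3], [1, 3, 2, 6]])
print([str(row[-1]) for row in R])  # ['5', '-1', '2']
```

Since the reduced row-echelon form is unique (Theorem 2.4.15), the output does not depend on the order in which the row operations are applied.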
We see from the preceding two examples that the advantage of Gauss-Jordan elimination over Gaussian elimination is that it does not require back substitution. However,
the disadvantage is that reducing the augmented matrix to reduced row-echelon form
requires more elementary row operations than reduction to row-echelon form. It can be shown, in fact, that in general, Gaussian elimination is the more computationally efficient technique. As we will see in the next section, the main reason for introducing the
Gauss-Jordan method is its application to the computation of the inverse of an n × n
matrix.

Remark The Gaussian elimination method is so systematic that it can be programmed
easily on a computer. Indeed, many large-scale programs for solving linear systems are
based on the row-reduction method.
In both of the preceding examples,
rank(A) = rank(A# ) = number of unknowns in the system
and the system had a unique solution. More generally, we have the following lemma:
Lemma 2.5.3 Consider the m × n linear system Ax = b. Let A# denote the augmented matrix of the
system. If rank(A) = rank(A#) = n, then the system has a unique solution.

Proof If rank(A) = rank(A#) = n, then there are n leading ones in any row-echelon
form of A, hence back substitution gives a unique solution. The form of the row-echelon
form of A# is shown below, with m − n rows of zeros at the bottom of the matrix omitted
and where the ∗'s denote unknown elements of the row-echelon form.
\[
\begin{bmatrix}
1 & * & * & * & \cdots & * & * \\
0 & 1 & * & * & \cdots & * & * \\
0 & 0 & 1 & * & \cdots & * & * \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & *
\end{bmatrix}
\]
Note that rank(A) cannot exceed rank(A# ). Thus, there are only two possibilities
for the relationship between rank(A) and rank(A# ): rank(A) < rank(A# ) or rank(A) =
rank(A# ). We now consider what happens in these cases.
Example 2.5.4 Determine the solution set to
x1 + x2 − x3 + x4 = 1,
2x1 + 3x2 + x3 = 4,
3x1 + 5x2 + 3x3 − x4 = 5.

Solution: We use elementary row operations to reduce the augmented matrix:
\[
\begin{bmatrix} 1 & 1 & -1 & 1 & 1 \\ 2 & 3 & 1 & 0 & 4 \\ 3 & 5 & 3 & -1 & 5 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 & 1 \\ 0 & 1 & 3 & -2 & 2 \\ 0 & 2 & 6 & -4 & 2 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & 1 & -1 & 1 & 1 \\ 0 & 1 & 3 & -2 & 2 \\ 0 & 0 & 0 & 0 & -2 \end{bmatrix}
\]
1. A12(−2), A13(−3)   2. A23(−2)
The last row tells us that the system of equations has no solution (that is, it is inconsistent),
since it requires
0x1 + 0x2 + 0x3 + 0x4 = −2,
which is clearly impossible. The solution set to the system is thus the empty set ∅.
In the previous example, rank(A) = 2, whereas rank(A#) = 3. Thus, rank(A) <
rank(A# ), and the corresponding system has no solution. Next we establish that this
result is true in general.
Lemma 2.5.5 Consider the m × n linear system Ax = b. Let A# denote the augmented matrix of the
system. If rank(A) < rank(A# ), then the system is inconsistent. Proof If rank(A) < rank(A# ), then there will be one row in the reduced row-echelon
form of the augmented matrix whose ﬁrst nonzero element arises in the last column.
Such a row corresponds to an equation of the form
0x1 + 0x2 + · · · + 0xn = 1,
which has no solution. Consequently, the system is inconsistent.
Finally, we consider the case when rank(A) = rank(A# ). If rank(A) = n, we have
already seen in Lemma 2.5.3 that the system has a unique solution. We now consider an
example in which rank(A) < n.
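Example 2.5.6 Determine the solution set to
5x1 − 6x2 + x3 = 4,
2x1 − 3x2 + x3 = 1,
4x1 − 3x2 − x3 = 5. (2.5.5)

Solution: We begin by reducing the augmented matrix of the system.
\[
\begin{bmatrix} 5 & -6 & 1 & 4 \\ 2 & -3 & 1 & 1 \\ 4 & -3 & -1 & 5 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 1 & -3 & 2 & -1 \\ 2 & -3 & 1 & 1 \\ 4 & -3 & -1 & 5 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 1 & -3 & 2 & -1 \\ 0 & 3 & -3 & 3 \\ 0 & 9 & -9 & 9 \end{bmatrix}
\]
\[
\overset{3}{\sim}
\begin{bmatrix} 1 & -3 & 2 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 9 & -9 & 9 \end{bmatrix}
\overset{4}{\sim}
\begin{bmatrix} 1 & -3 & 2 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
1. A31(−1)   2. A12(−2), A13(−4)   3. M2(1/3)   4. A23(−9)
The augmented matrix is now in row-echelon form, and the equivalent system is
x1 − 3x2 + 2x3 = −1, (2.5.6)
x2 − x3 = 1. (2.5.7)
Since we have three variables, but only two equations relating them, we are free to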
specify one of the variables arbitrarily. The variable that we choose to specify is called
a free variable or free parameter. The remaining variables are then determined by
the system of equations and are called bound variables or bound parameters. In the
foregoing system, we take x3 as the free variable and set
x3 = t,
where t can assume any real value (see footnote 5). It follows from (2.5.7) that
x2 = 1 + t.
Footnote 5: When considering systems of equations with complex coefficients, we allow free variables to assume complex values as well.
Further, from Equation (2.5.6),
x1 = −1 + 3(1 + t) − 2t = 2 + t.
Thus the solution set to the given system of equations is the following subset of R3 :
S = {(2 + t, 1 + t, t) : t ∈ R}.
The system has an inﬁnite number of solutions, obtained by allowing the parameter t to
assume all real values. For example, two particular solutions of the system are
(2, 1, 0) and (0, −1, −2), corresponding to t = 0 and t = −2, respectively. Note that we can also write the solution
set S above in the form
S = {(2, 1, 0) + t(1, 1, 1) : t ∈ R}.

Remark The geometry of the foregoing solution is as follows. The given system
(2.5.5) can be interpreted as consisting of three planes in 3-space. Any solution to the
system gives the coordinates of a point of intersection of the three planes. In the preceding
example the planes intersect in a line whose parametric equations are
x1 = 2 + t, x2 = 1 + t, x3 = t. (See Figure 2.3.1.)
In general, the solution to a consistent m × n system of linear equations may involve
more than one free variable. Indeed, the number of free variables will depend on how
many nonzero rows arise in any row-echelon form of the augmented matrix, A# , of the
system; that is, it will depend on the rank of A# . More precisely, if rank(A# ) = r # , then the
equivalent system will have only r # relationships between the n variables. Consequently,
provided the system is consistent,
number of free variables = n − r # .
We therefore have the following lemma.
Lemma 2.5.7 Consider the m × n linear system Ax = b. Let A# denote the augmented matrix of the
system and let r # = rank(A# ). If r # = rank(A) < n, then the system has an inﬁnite
number of solutions, indexed by n − r # free variables. Proof As discussed before, any row-echelon equivalent system will have only r # equations involving the n variables, and so there will be n − r # > 0 free variables. If we
assign arbitrary values to these free variables, then the remaining r # variables will be
uniquely determined, by back substitution, from the system. Since each free variable can
assume inﬁnitely many values, in this case there are an inﬁnite number of solutions to
the system.
Example 2.5.8 Use Gaussian elimination to solve
x1 − 2x2 + 2x3 − x4 = 3,
3x1 + x2 + 6x3 + 11x4 = 16,
2x1 − x2 + 4x3 + 4x4 = 9.

Solution: A row-echelon form of the augmented matrix of the system is
\[
\begin{bmatrix} 1 & -2 & 2 & -1 & 3 \\ 0 & 1 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},
\]
so that we have two free variables. The equivalent system is
x1 − 2x2 + 2x3 − x4 = 3, (2.5.8)
x2 + 2x4 = 1. (2.5.9)
Notice that we cannot choose any two variables freely. For example, from Equation
(2.5.9), we cannot specify both x2 and x4 independently. The bound variables should be
taken as those that correspond to leading 1s in the row-echelon form of A# , since these
are the variables that can always be determined by back substitution (they appear as the
leftmost variable in some equation of the system corresponding to the row echelon form
of the augmented matrix).
Choose as free variables those variables that
do not correspond to a leading 1 in a row-echelon form of A# .
Applying this rule to Equations (2.5.8) and (2.5.9), we choose x3 and x4 as free variables
and therefore set
x3 = s, x4 = t. It then follows from Equation (2.5.9) that
x2 = 1 − 2t.
Substitution into (2.5.8) yields
x1 = 5 − 2s − 3t,
so that the solution set to the given system is the following subset of R4 :
S = {(5 − 2s − 3t, 1 − 2t, s, t) : s, t ∈ R}
  = {(5, 1, 0, 0) + s(−2, 0, 1, 0) + t(−3, −2, 0, 1) : s, t ∈ R}.
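
The rule for choosing free variables can be sketched in Python. This is a minimal illustration under stated assumptions: the names `rref_with_pivots` and `solution_set` are chosen here, the sketch works from the reduced row-echelon form rather than a row-echelon form (so back substitution is already built in), and it assumes the system is consistent.

```python
from fractions import Fraction


def rref_with_pivots(rows):
    """Return the reduced row-echelon form and the list of pivot column indices."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots = []
    for col in range(n):
        top = len(pivots)
        if top == m:
            break
        pivot = next((r for r in range(top, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[top], A[pivot] = A[pivot], A[top]
        lead = A[top][col]
        A[top] = [x / lead for x in A[top]]
        for r in range(m):
            if r != top and A[r][col] != 0:
                k = A[r][col]
                A[r] = [x - k * y for x, y in zip(A[r], A[top])]
        pivots.append(col)
    return A, pivots


def solution_set(augmented):
    """Return (particular, directions): every solution is particular + sum of t_j * directions[j]."""
    R, pivots = rref_with_pivots(augmented)
    n = len(augmented[0]) - 1  # number of unknowns
    if n in pivots:
        raise ValueError("the system is inconsistent")
    # Free variables are those whose columns contain no leading 1.
    free = [j for j in range(n) if j not in pivots]
    particular = [Fraction(0)] * n
    for i, p in enumerate(pivots):
        particular[p] = R[i][n]
    directions = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        directions.append(v)
    return particular, directions


# Augmented matrix of the system in Example 2.5.8.
particular, directions = solution_set(
    [[1, -2, 2, -1, 3], [3, 1, 6, 11, 16], [2, -1, 4, 4, 9]]
)
print([str(v) for v in particular])                # ['5', '1', '0', '0']
print([[str(v) for v in d] for d in directions])   # [['-2', '0', '1', '0'], ['-3', '-2', '0', '1']]
```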
Lemmas 2.5.3, 2.5.5, and 2.5.7 completely characterize the solution properties of
an m × n linear system. Combining the results of these three lemmas gives the next
theorem.
Theorem 2.5.9 Consider the m × n linear system Ax = b. Let r denote the rank of A, and let r # denote
the rank of the augmented matrix of the system. Then
1. If r < r # , the system is inconsistent.
2. If r = r # , the system is consistent and
(a) There exists a unique solution if and only if r # = n.
(b) There exists an infinite number of solutions if and only if r# < n.
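
Theorem 2.5.9 translates directly into a rank comparison. The following Python sketch is an illustration only: the names `rank` and `classify` are assumptions made here, and the rank is computed by counting pivots during elimination with exact `Fraction` arithmetic.

```python
from fractions import Fraction


def rank(rows):
    """Number of pivots, that is, of nonzero rows in a row-echelon form."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):
            k = A[i][col] / A[r][col]
            A[i] = [x - k * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def classify(A, b):
    """Apply Theorem 2.5.9 to the m x n system Ax = b."""
    n = len(A[0])
    r = rank(A)
    r_aug = rank([row + [rhs] for row, rhs in zip(A, b)])  # rank of the augmented matrix
    if r < r_aug:
        return "inconsistent"
    return "unique solution" if r_aug == n else "infinitely many solutions"


# Examples 2.5.1, 2.5.4 and 2.5.6, in that order.
print(classify([[3, -2, 2], [1, -2, 1], [2, -1, -2]], [9, 5, -1]))
print(classify([[1, 1, -1, 1], [2, 3, 1, 0], [3, 5, 3, -1]], [1, 4, 5]))
print(classify([[5, -6, 1], [2, -3, 1], [4, -3, -1]], [4, 1, 5]))
```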
Homogeneous Linear Systems
Many problems that we will meet in the future will require the solution to a homogeneous
system of linear equations. The general form for such a system is
a11 x1 + a12 x2 + · · · + a1n xn = 0,
a21 x1 + a22 x2 + · · · + a2n xn = 0,
⋮
am1 x1 + am2 x2 + · · · + amn xn = 0, (2.5.10)
or, in matrix form, Ax = 0, where A is the coefficient matrix of the system and 0 denotes
the m-vector whose elements are all zeros.
Corollary 2.5.10 The homogeneous linear system Ax = 0 is consistent for any coefﬁcient matrix A, with
a solution given by x = 0. Proof We can see immediately from (2.5.10) that if x = 0, then Ax = 0, so x = 0 is a
solution to the homogeneous linear system.
Alternatively, we can deduce the consistency of this system from Theorem 2.5.9 as
follows. The augmented matrix A# of a homogeneous linear system differs from that of
the coefﬁcient matrix A only by the addition of a column of zeros, a feature that does
not affect the rank of the matrix. Consequently, for a homogeneous system, we have
rank(A# ) = rank(A), and therefore, from Theorem 2.5.9, such a system is necessarily
consistent.
Remarks
1. The solution x = 0 is referred to as the trivial solution. Consequently, from
Theorem 2.5.9, a homogeneous system either has only the trivial solution or has
an inﬁnite number of solutions (one of which must be the trivial solution).
2. Once more it is worth mentioning the geometric interpretation of Corollary 2.5.10
in the case of a homogeneous system with three unknowns. We can regard each
equation of such a system as deﬁning a plane. Owing to the homogeneity, each
plane passes through the origin, hence the planes intersect at least at the origin.
Often we will be interested in determining whether a given homogeneous system
has an inﬁnite number of solutions, and not in actually obtaining the solutions. The
following corollary to Theorem 2.5.9 can sometimes be used to determine by inspection
whether a given homogeneous system has nontrivial solutions:
Corollary 2.5.11 A homogeneous system of m linear equations in n unknowns, with m < n, has an inﬁnite
number of solutions. Proof Let r and r # be as in Theorem 2.5.9. Using the fact that r = r # for a homogeneous
system, we see that since r # ≤ m < n, Theorem 2.5.9 implies that the system has an
inﬁnite number of solutions.
Remark If m ≥ n, then we may or may not have nontrivial solutions, depending on
whether the rank of the augmented matrix, r # , satisﬁes r # < n or r # = n, respectively.
We encourage the reader to construct linear systems that illustrate each of these two
possibilities.

Example 2.5.12 Determine the solution set to Ax = 0, if
\[
A = \begin{bmatrix} 0 & 2 & 3 \\ 0 & 1 & -1 \\ 0 & 3 & 7 \end{bmatrix}.
\]

Solution: The augmented matrix of the system is
\[
\begin{bmatrix} 0 & 2 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 3 & 7 & 0 \end{bmatrix},
\]
with reduced row-echelon form
\[
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
The equivalent system is
x2 = 0,
x3 = 0.
It is tempting, but incorrect, to conclude from this that the solution to the system
is x1 = x2 = x3 = 0. Since x1 does not occur in the system, it is a free variable and
therefore not necessarily zero. Consequently, the correct solution to the foregoing system
is (r, 0, 0), where r is a free variable, and the solution set is {(r, 0, 0) : r ∈ R}.
The linear systems that we have so far encountered have all had real coefﬁcients, and
we have considered corresponding real solutions. The techniques that we have developed
for solving linear systems are also applicable to the case when our system has complex
coefﬁcients. The corresponding solutions will also be complex. Remark In general, the simplest method of putting a leading 1 in a position that
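contains the complex number a + ib is to multiply the corresponding row by the scalar
$\frac{1}{a^2 + b^2}(a - ib)$. This is illustrated in steps 1 and 4 in the next example. If difficulties
are encountered, consultation of Appendix A is in order.
Example 2.5.13 Determine the solution set to
\[
\begin{aligned}
(1 + 2i)x_1 + 4x_2 + (3 + i)x_3 &= 0,\\
(2 - i)x_1 + (1 + i)x_2 + 3x_3 &= 0,\\
5i\,x_1 + (7 - i)x_2 + (3 + 2i)x_3 &= 0.
\end{aligned}
\]
Solution: We reduce the augmented matrix of the system.
\[
\left[\begin{array}{ccc|c} 1 + 2i & 4 & 3 + i & 0 \\ 2 - i & 1 + i & 3 & 0 \\ 5i & 7 - i & 3 + 2i & 0 \end{array}\right]
\overset{1}{\sim}
\left[\begin{array}{ccc|c} 1 & \frac{4}{5}(1 - 2i) & 1 - i & 0 \\ 2 - i & 1 + i & 3 & 0 \\ 5i & 7 - i & 3 + 2i & 0 \end{array}\right]
\]
\[
\overset{2}{\sim}
\left[\begin{array}{ccc|c} 1 & \frac{4}{5}(1 - 2i) & 1 - i & 0 \\ 0 & (1 + i) - \frac{4}{5}(1 - 2i)(2 - i) & 3 - (1 - i)(2 - i) & 0 \\ 0 & (7 - i) - 4i(1 - 2i) & (3 + 2i) - 5i(1 - i) & 0 \end{array}\right]
=
\left[\begin{array}{ccc|c} 1 & \frac{4}{5}(1 - 2i) & 1 - i & 0 \\ 0 & 1 + 5i & 2 + 3i & 0 \\ 0 & -1 - 5i & -2 - 3i & 0 \end{array}\right]
\]
\[
\overset{3}{\sim}
\left[\begin{array}{ccc|c} 1 & \frac{4}{5}(1 - 2i) & 1 - i & 0 \\ 0 & 1 + 5i & 2 + 3i & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\overset{4}{\sim}
\left[\begin{array}{ccc|c} 1 & \frac{4}{5}(1 - 2i) & 1 - i & 0 \\ 0 & 1 & \frac{1}{26}(17 - 7i) & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]
1. M1((1 − 2i)/5) 2. A12(−(2 − i)), A13(−5i) 3. A23(1) 4. M2((1 − 5i)/26)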
This matrix is now in row-echelon form. The equivalent system is
\[
\begin{aligned}
x_1 + \tfrac{4}{5}(1 - 2i)x_2 + (1 - i)x_3 &= 0,\\
x_2 + \tfrac{1}{26}(17 - 7i)x_3 &= 0.
\end{aligned}
\]
There is one free variable, which we take to be x3 = t, where t can assume any complex
value. Applying back substitution yields
\[
\begin{aligned}
x_2 &= \tfrac{1}{26}t(-17 + 7i),\\
x_1 &= -\tfrac{2}{65}t(1 - 2i)(-17 + 7i) - t(1 - i) = -\tfrac{1}{65}t(59 + 17i),
\end{aligned}
\]
so that the solution set to the system is the subset of $\mathbb{C}^3$
\[ \left\{ \left( -\tfrac{1}{65}t(59 + 17i),\ \tfrac{1}{26}t(-17 + 7i),\ t \right) : t \in \mathbb{C} \right\}. \]
Exercises for 2.5
Key Terms
Gaussian elimination, Gauss-Jordan elimination, Free variables, Bound (or leading) variables, Trivial solution.
Skills
• Be able to solve a linear system of equations by Gaussian elimination and by Gauss-Jordan elimination.
• Be able to identify free variables and bound variables
and know how they are used to construct the solution
set to a linear system.
• Understand the relationship between the ranks of A
and A#, and how this affects the number of solutions
to a linear system.
True-False Review
For Questions 1–6, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. The process by which a matrix is brought via elementary row operations to row-echelon form is known as
Gauss-Jordan elimination.
2. A homogeneous linear system of equations is always
consistent.
3. For a linear system Ax = b, every column of the
row-echelon form of A corresponds to either a bound
variable or a free variable, but not both, of the linear
system.
4. A linear system Ax = b is consistent if and only if the
last column of the row-echelon form of the augmented
matrix [A b] is not a pivot column.
5. A linear system is consistent if and only if there are
free variables in the row-echelon form of the corresponding augmented matrix.
6. The columns of the row-echelon form of A# that contain the leading 1s correspond to the free variables.
Problems
For Problems 1–9, use Gaussian elimination to determine
the solution set to the given system.
1. x1 + 2x2 + x3 = 1,
3x1 + 5x2 + x3 = 3,
2x1 + 6x2 + 7x3 = 1.
2. 3x1 − x2 = 1,
2x1 + x2 + 5x3 = 4,
7x1 − 5x2 − 8x3 = −3.
3. 3x1 + 5x2 − x3 = 14,
x1 + 2x2 + x3 = 3,
2x1 + 5x2 + 6x3 = 2.
4. 6x1 − 3x2 + 3x3 = 12,
2x1 − x2 + x3 = 4,
−4x1 + 2x2 − 2x3 = −8.
5. 2x1 − x2 + 3x3 = 14,
3x1 + x2 − 2x3 = −1,
7x1 + 2x2 − 3x3 = 3,
5x1 − x2 − 2x3 = 5.
6. 2x1 − x2 − 4x3 = 5,
3x1 + 2x2 − 5x3 = 8,
5x1 + 6x2 − 6x3 = 20,
x1 + x2 − 3x3 = −3.
7. x1 + 2x2 − x3 + x4 = 1,
2x1 + 4x2 − 2x3 + 2x4 = 2,
5x1 + 10x2 − 5x3 + 5x4 = 5.
8. x1 + 2x2 − x3 + x4 = 1,
2x1 − 3x2 + x3 − x4 = 2,
x1 − 5x2 + 2x3 − 2x4 = 1,
4x1 + x2 − x3 + x4 = 3.
9. x1 + 2x2 + x3 + x4 − 2x5 = 3,
x3 + 4x4 − 3x5 = 2,
2x1 + 4x2 − x3 − 10x4 + 5x5 = 0.
For Problems 10–15, use Gauss-Jordan elimination to determine the solution set to the given system.
10. 2x1 − x2 − x3 = 2,
4x1 + 3x2 − 2x3 = −1,
x1 + 4x2 + x3 = 4.
11. 3x1 + x2 + 5x3 = 2,
x1 + x2 − x3 = 1,
2x1 + x2 + 2x3 = 3.
12. x1 − 2x3 = −3,
3x1 − 2x2 − 4x3 = −9,
x1 − 4x2 + 2x3 = −3.
13. 2x1 − x2 + 3x3 − x4 = 3,
3x1 + 2x2 + x3 − 5x4 = −6,
x1 − 2x2 + 3x3 + x4 = 6.
14. x1 + x2 + x3 − x4 = 4,
x1 − x2 − x3 − x4 = 2,
x1 + x2 − x3 + x4 = −2,
x1 − x2 + x3 + x4 = −8.
15. 2x1 − x2 + 3x3 + x4 − x5 = 11,
x1 − 3x2 − 2x3 − x4 − 2x5 = 2,
3x1 + x2 − 2x3 − x4 + x5 = −2,
x1 + 2x2 + x3 + 2x4 + 3x5 = −3,
5x1 − 3x2 − 3x3 + x4 + 2x5 = 2.
For Problems 16–20, determine the solution set to the system Ax = b for the given coefficient matrix A and right-hand
side vector b.
16. $A = \begin{bmatrix} 1 & -3 & 1 \\ 5 & -4 & 1 \\ 2 & 4 & -3 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 8 \\ 15 \\ -4 \end{bmatrix}$.
17. $A = \begin{bmatrix} 1 & 0 & 5 \\ 3 & -2 & 11 \\ 2 & -2 & 6 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}$.
18. $A = \begin{bmatrix} 0 & 1 & -1 \\ 0 & 5 & 1 \\ 0 & 2 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -2 \\ 8 \\ 5 \end{bmatrix}$.
19. $A = \begin{bmatrix} 1 & -1 & 0 & -1 \\ 2 & 1 & 3 & 7 \\ 3 & -2 & 1 & 0 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}$.
20. $A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 3 & 1 & -2 & 3 \\ 2 & 3 & 1 & 2 \\ -2 & 3 & 5 & -2 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 2 \\ 8 \\ 3 \\ -9 \end{bmatrix}$.
21. Determine all values of the constant k for which the
following system has (a) no solution, (b) an inﬁnite
number of solutions, and (c) a unique solution.
x1 + 2x2 − x3 = 3,
2x1 + 5x2 + x3 = 7,
x1 + x2 − $k^2$x3 = −k.
22. Determine all values of the constant k for which the
following system has (a) no solution, (b) an inﬁnite
number of solutions, and (c) a unique solution.
2x1 + x2 − x3 + x4 = 0,
x1 + x2 + x3 − x4 = 0,
4x1 + 2x2 − x3 + x4 = 0,
3x1 − x2 + x3 + kx4 = 0.
23. Determine all values of the constants a and b for which
the following system has (a) no solution, (b) an inﬁnite
number of solutions, and (c) a unique solution.
x1 + x2 − 2x3 = 4,
3x1 + 5x2 − 4x3 = 16,
2x1 + 3x2 − ax3 = b.
24. Determine all values of the constants a and b for which
the following system has (a) no solution, (b) an inﬁnite
number of solutions, and (c) a unique solution.
x1 − ax2 = 3,
2x1 + x2 = 6,
−3x1 + (a + b)x2 = 1.
25. Show that the system
x1 + x2 + x3 = y1 ,
2x1 + 3x2 + x3 = y2 ,
3x1 + 5x2 + x3 = y3 ,
has an inﬁnite number of solutions, provided that
(y1 , y2 , y3 ) lies on the plane whose equation is
y1 − 2y2 + y3 = 0.
26. Consider the system of linear equations
a11 x1 + a12 x2 = b1 ,
a21 x1 + a22 x2 = b2 .
Define Δ, Δ1, and Δ2 by
Δ = a11 a22 − a12 a21 ,
Δ1 = a22 b1 − a12 b2 ,
Δ2 = a11 b2 − a21 b1 .
(a) Show that the given system has a unique solution
if and only if Δ ≠ 0, and that the unique solution
in this case is x1 = Δ1/Δ, x2 = Δ2/Δ.
(b) If Δ = 0 and a11 ≠ 0, determine the conditions
on Δ2 that would guarantee that the system has (i)
no solution, (ii) an infinite number of solutions.
(c) Interpret your results in terms of intersections of
straight lines.
Gaussian elimination with partial pivoting uses the following algorithm to reduce the augmented matrix:
1. Start with augmented matrix A# .
2. Determine the leftmost nonzero column.
3. Permute rows to put the element of largest absolute
value in the pivot position.
4. Use elementary row operations to put zeros beneath
the pivot position.
5. If there are no more nonzero rows below the pivot position, go to 7, otherwise go to 6.
6. Apply (2)–(5) to the submatrix consisting of the rows
that lie below the pivot position.
7. The matrix is in reduced form.6
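The seven steps above can be sketched in code as follows. This is an illustrative sketch rather than a reference implementation: the function names `reduce_partial_pivoting` and `back_substitute` are ours, exact fractions are used so that no rounding occurs, pivots are sought only in the coefficient columns (an assumption on our part), and the 2 × 2 system at the end is a made-up example, not one of the problems.

```python
from fractions import Fraction

def reduce_partial_pivoting(aug):
    """Reduce an augmented matrix by steps 1-7 (no leading 1s are created)."""
    m = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(m), len(m[0])
    top = 0  # first row of the current submatrix
    for c in range(cols - 1):  # assumption: never pivot on the right-hand side
        if top == rows:
            break
        # Step 3: find the entry of largest absolute value in this column.
        p = max(range(top, rows), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0:
            continue  # Step 2: column is zero in the submatrix, move right
        m[top], m[p] = m[p], m[top]
        # Step 4: put zeros beneath the pivot position.
        for i in range(top + 1, rows):
            factor = m[i][c] / m[top][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[top])]
        top += 1  # Steps 5-6: continue with the rows below the pivot
    return m

def back_substitute(m):
    """Solve a reduced n x n system, assuming every diagonal entry is nonzero."""
    n = len(m)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

# Hypothetical system: x1 + 2x2 = 5, 3x1 + 4x2 = 11.
reduced = reduce_partial_pivoting([[1, 2, 5], [3, 4, 11]])
solution = back_substitute(reduced)
print(reduced)
print(solution)
```

With exact fractions, choosing the largest pivot does not change the answer. The row permutation matters when the arithmetic is carried out in floating point, where dividing by a small pivot magnifies rounding error.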
In Problems 27–30, use the preceding algorithm to reduce A#
and then apply back substitution to solve the equivalent system. Technology might be useful in performing the required
row operations.
27. The system in Problem 1.
28. The system in Problem 5.
29. The system in Problem 6.
30. The system in Problem 10.
6 Notice that this reduced form is not a row-echelon matrix.
31. (a) An n × n system of linear equations whose matrix of coefficients is a lower triangular matrix
is called a lower triangular system. Assuming
that aii ≠ 0 for each i, devise a method for solving such a system that is analogous to the back-substitution method.
(b) Use your method from (a) to solve
x1 = 2,
2x1 − 3x2 = 1,
3x1 + x2 − x3 = 8.
32. Find all solutions to the following nonlinear system of
equations:
$4x_1^3 + 2x_2^2 + 3x_3 = 12$,
$x_1^3 - x_2^2 + x_3 = 2$,
$3x_1^3 + x_2^2 - x_3 = 2$.
Does your answer contradict Theorem 2.5.9? Explain.
For Problems 33–43, determine the solution set to the given
system.
33. 3x1 + 2x2 − x3 = 0,
2x1 + x2 + x3 = 0,
5x1 − 4x2 + x3 = 0.
34. 2x1 + x2 − x3 = 0,
3x1 − x2 + 2x3 = 0,
x1 − x2 − x3 = 0,
5x1 + 2x2 − 2x3 = 0.
35. 2x1 − x2 − x3 = 0,
5x1 − x2 + 2x3 = 0,
x1 + x2 + 4x3 = 0.
36. (1 + 2i)x1 + (1 − i)x2 + x3 = 0,
ix1 + (1 + i)x2 − ix3 = 0,
2ix1 + x2 + (1 + 3i)x3 = 0.
37. 3x1 + 2x2 + x3 = 0,
6x1 − x2 + 2x3 = 0,
12x1 + 6x2 + 4x3 = 0.
38. 2x1 + x2 − 8x3 = 0,
3x1 − 2x2 − 5x3 = 0,
5x1 − 6x2 − 3x3 = 0,
3x1 − 5x2 + x3 = 0.
39. x1 + (1 + i)x2 + (1 − i)x3 = 0,
ix1 + x2 + ix3 = 0,
(1 − 2i)x1 − (1 − i)x2 + (1 − 3i)x3 = 0.
40. x1 − x2 + x3 = 0,
3x2 + 2x3 = 0,
3x1 − x3 = 0,
5x1 + x2 − x3 = 0.
41. 2x1 − 4x2 + 6x3 = 0,
3x1 − 6x2 + 9x3 = 0,
x1 − 2x2 + 3x3 = 0,
5x1 − 10x2 + 15x3 = 0.
42. 4x1 − 2x2 − x3 − x4 = 0,
3x1 + x2 − 2x3 + 3x4 = 0,
5x1 − x2 − 2x3 + x4 = 0.
43. 2x1 + x2 − x3 + x4 = 0,
x1 + x2 + x3 − x4 = 0,
3x1 − x2 + x3 − 2x4 = 0,
4x1 + 2x2 − x3 + x4 = 0.
For Problems 44–54, determine the solution set to the system
Ax = 0 for the given matrix A.
44. $A = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}$.
45. $A = \begin{bmatrix} 1 - i & 2i \\ 1 + i & -2 \end{bmatrix}$.
46. $A = \begin{bmatrix} 1 + i & 1 - 2i \\ -1 + i & 2 + i \end{bmatrix}$.
47. $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$.
48. $A = \begin{bmatrix} 1 & 1 & 1 & -1 \\ -1 & 0 & -1 & 2 \\ 1 & 3 & 2 & 2 \end{bmatrix}$.
49. $A = \begin{bmatrix} 2 - 3i & 1 + i & i - 1 \\ 3 + 2i & -1 + i & -1 - i \\ 5 - i & 2i & -2 \end{bmatrix}$.
50. $A = \begin{bmatrix} 1 & 3 & 0 \\ -2 & -3 & 0 \\ 1 & 4 & 0 \end{bmatrix}$.
51. $A = \begin{bmatrix} 1 & 0 & 3 \\ 3 & -1 & 7 \\ 2 & 1 & 8 \\ 1 & 1 & 5 \\ -1 & 1 & -1 \end{bmatrix}$.
52. $A = \begin{bmatrix} 1 & -1 & 0 & 1 \\ 3 & -2 & 0 & 5 \\ -1 & 2 & 0 & 1 \end{bmatrix}$.
53. $A = \begin{bmatrix} 1 & 0 & -3 & 0 \\ 3 & 0 & -9 & 0 \\ -2 & 0 & 6 & 0 \end{bmatrix}$.
54. $A = \begin{bmatrix} 2 + i & i & 3 - 2i \\ i & 1 - i & 4 + 3i \\ 3 - i & 1 + i & 1 + 5i \end{bmatrix}$.
2.6 The Inverse of a Square Matrix
In this section we investigate the situation when, for a given n × n matrix A, there exists
a matrix B satisfying
AB = In and BA = In (2.6.1)
and derive an efficient method for determining B (when it does exist). As a possible
application of the existence of such a matrix B, consider the n × n linear system
Ax = b. (2.6.2)
Premultiplying both sides of (2.6.2) by an n × n matrix B yields
(BA)x = Bb.
Assuming that BA = In, this reduces to
x = Bb. (2.6.3)
Thus, we have determined a solution to the system (2.6.2) by a matrix multiplication. Of
course, this depends on the existence of a matrix B satisfying (2.6.1), and even if such a
matrix B does exist, it will turn out that using (2.6.3) to solve n × n systems is not very
efficient computationally. Therefore it is generally not used in practice to solve n × n
systems. However, from a theoretical point of view, a formula such as (2.6.3) is very
useful. We begin the investigation by establishing that there can be at most one matrix
B satisfying (2.6.1) for a given n × n matrix A.
Theorem 2.6.1 Let A be an n × n matrix. Suppose B and C are both n × n matrices satisfying
AB = BA = In, (2.6.4)
AC = CA = In, (2.6.5)
respectively. Then B = C.
Proof From (2.6.4), it follows that
C = CIn = C(AB).
That is,
C = (CA)B = In B = B,
where we have used (2.6.5) to replace CA by In in the second step.
Since the identity matrix In plays the role of the number 1 in the multiplication of
matrices, the properties given in (2.6.1) are the analogs for matrices of the properties
$xx^{-1} = 1$, $x^{-1}x = 1$,
which hold for all (nonzero) numbers x. It is therefore natural to denote the matrix B
in (2.6.1) by $A^{-1}$ and to call it the inverse of A. The following definition introduces the
appropriate terminology.
DEFINITION 2.6.2
Let A be an n × n matrix. If there exists an n × n matrix $A^{-1}$ satisfying
$AA^{-1} = A^{-1}A = I_n$,
then we call $A^{-1}$ the matrix inverse to A, or just the inverse of A. We say that A is
invertible if $A^{-1}$ exists.
Invertible matrices are sometimes called nonsingular, while matrices that are not
invertible are sometimes called singular.
Remark It is important to realize that $A^{-1}$ denotes the matrix that satisfies
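$AA^{-1} = A^{-1}A = I_n$. It does not mean 1/A, which has no meaning whatsoever.
Example 2.6.3 If $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix}$, verify that $B = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}$ is the inverse of A.
Solution: By direct multiplication, we find that
\[ AB = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3 \]
and
\[ BA = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3. \]
Consequently, (2.6.1) is satisfied, hence B is indeed the inverse of A. We therefore
write
\[ A^{-1} = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}. \]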
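The direct multiplication in Example 2.6.3 is easy to reproduce by machine. The short sketch below uses plain Python lists; the helper `matmul` is our own name for the usual row-by-column product and is not notation from the text.

```python
def matmul(X, Y):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, -1, 2], [2, -3, 3], [1, -1, 1]]
B = [[0, -1, 3], [1, -1, 1], [1, 0, -1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Both products must equal the identity for B to be the inverse of A.
print(matmul(A, B) == I3)
print(matmul(B, A) == I3)
```

Both products are checked because condition (2.6.1) asks for AB = In and BA = In.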
We now return to the n × n system of Equations (2.6.2).
Theorem 2.6.4 If $A^{-1}$ exists, then the n × n system of linear equations
Ax = b
has the unique solution $x = A^{-1}b$ for every b in $\mathbb{R}^n$.
Proof We can verify by direct substitution that $x = A^{-1}b$ is indeed a solution to the
linear system. The uniqueness of this solution is contained in the calculation leading
from (2.6.2) to (2.6.3).
Our next theorem establishes when $A^{-1}$ exists, and it also uncovers an efficient
method for computing $A^{-1}$.
Theorem 2.6.5 An n × n matrix A is invertible if and only if rank(A) = n.
Proof If $A^{-1}$ exists, then by Theorem 2.6.4, any n × n linear system Ax = b has a
unique solution. Hence, Theorem 2.5.9 implies that rank(A) = n.
Conversely, suppose rank(A) = n. We must establish that there exists an n × n
matrix X satisfying
AX = In = XA.
Let e1, e2, . . . , en denote the column vectors of the identity matrix In. Since rank(A) = n,
Theorem 2.5.9 implies that each of the linear systems
Axi = ei, i = 1, 2, . . . , n (2.6.6)
has a unique solution7 xi. Consequently, if we let X = [x1, x2, . . . , xn], where x1, x2,
. . . , xn are the unique solutions of the systems in (2.6.6), then
A[x1, x2, . . . , xn] = [Ax1, Ax2, . . . , Axn] = [e1, e2, . . . , en];
that is,
AX = In. (2.6.7)
We must also show that, for the same matrix X,
XA = In.
Postmultiplying both sides of (2.6.7) by A yields
(AX)A = A.
That is,
A(XA − In) = 0n. (2.6.8)
Now let y1, y2, . . . , yn denote the column vectors of the n × n matrix XA − In.
Equating corresponding column vectors on either side of (2.6.8) implies that
Ayi = 0, i = 1, 2, . . . , n. (2.6.9)
But, by assumption, rank(A) = n, and so each system in (2.6.9) has a unique solution
that, since the systems are homogeneous, must be the trivial solution. Consequently, each
yi is the zero vector, and thus
XA − In = 0n.
Therefore,
XA = In. (2.6.10)
Equations (2.6.7) and (2.6.10) imply that $X = A^{-1}$.
7 Notice that for an n × n system Ax = b, if rank(A) = n, then rank(A#) = n.
We now have the following converse to Theorem 2.6.4.
Corollary 2.6.6 Let A be an n × n matrix. If Ax = b has a unique solution for some column n-vector b,
then $A^{-1}$ exists.
Proof If Ax = b has a unique solution, then from Theorem 2.5.9, rank(A) = n, and
so from the previous theorem, $A^{-1}$ exists.
Remark In particular, the above corollary tells us that if the homogeneous linear
system Ax = 0 has only the trivial solution x = 0, then $A^{-1}$ exists.
Other criteria for deciding whether or not an n × n matrix A has an inverse will be
developed in the next three chapters, but our goal at present is to develop a method for
ﬁnding A−1 , should it exist.
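Assuming that rank(A) = n, let x1, x2, . . . , xn denote the column vectors of $A^{-1}$.
Then, from (2.6.6), these column vectors can be obtained by solving each of the n × n
systems
Axi = ei, i = 1, 2, . . . , n.
As we now show, some computation can be saved if we employ the Gauss-Jordan method
in solving these systems. We first illustrate the method when n = 3. In this case, from
(2.6.6), the column vectors of $A^{-1}$ are determined by solving the three linear systems
Ax1 = e1, Ax2 = e2, Ax3 = e3.
The augmented matrices of these systems can be written as
\[ \left[\begin{array}{c|c} A & \begin{matrix} 1 \\ 0 \\ 0 \end{matrix} \end{array}\right], \quad \left[\begin{array}{c|c} A & \begin{matrix} 0 \\ 1 \\ 0 \end{matrix} \end{array}\right], \quad \left[\begin{array}{c|c} A & \begin{matrix} 0 \\ 0 \\ 1 \end{matrix} \end{array}\right], \]
respectively. Furthermore, since rank(A) = 3 by assumption, the reduced row-echelon
form of A is I3. Consequently, using elementary row operations to reduce the augmented
matrix of the first system to reduced row-echelon form will yield, schematically,
\[ \left[\begin{array}{c|c} A & \begin{matrix} 1 \\ 0 \\ 0 \end{matrix} \end{array}\right] \sim \text{ERO} \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{array}\right], \]
which implies that the first column vector of $A^{-1}$ is
\[ \mathbf{x}_1 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}. \]
Similarly, for the second system, the reduction
\[ \left[\begin{array}{c|c} A & \begin{matrix} 0 \\ 1 \\ 0 \end{matrix} \end{array}\right] \sim \text{ERO} \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & b_1 \\ 0 & 1 & 0 & b_2 \\ 0 & 0 & 1 & b_3 \end{array}\right] \]
implies that the second column vector of $A^{-1}$ is
\[ \mathbf{x}_2 = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \]
Finally, for the third system, the reduction
\[ \left[\begin{array}{c|c} A & \begin{matrix} 0 \\ 0 \\ 1 \end{matrix} \end{array}\right] \sim \text{ERO} \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & c_1 \\ 0 & 1 & 0 & c_2 \\ 0 & 0 & 1 & c_3 \end{array}\right] \]
implies that the third column vector of $A^{-1}$ is
\[ \mathbf{x}_3 = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}. \]
Consequently,
\[ A^{-1} = [\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}. \]
The key point to notice is that in solving for x1, x2, x3 we use the same elementary
row operations to reduce A to I3. We can therefore save a significant amount of work by
combining the foregoing operations as follows:
\[ \left[\begin{array}{c|ccc} A & \begin{matrix} 1 \\ 0 \\ 0 \end{matrix} & \begin{matrix} 0 \\ 1 \\ 0 \end{matrix} & \begin{matrix} 0 \\ 0 \\ 1 \end{matrix} \end{array}\right] \sim \text{ERO} \sim \cdots \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & a_1 & b_1 & c_1 \\ 0 & 1 & 0 & a_2 & b_2 & c_2 \\ 0 & 0 & 1 & a_3 & b_3 & c_3 \end{array}\right]. \]
The generalization to the n × n case is immediate. We form the n × 2n matrix $[A \;\; I_n]$
and reduce A to In using elementary row operations. Schematically,
\[ [A \;\; I_n] \sim \text{ERO} \sim \cdots \sim [I_n \;\; A^{-1}]. \]
This method of finding $A^{-1}$ is called the Gauss-Jordan technique.
Remark Notice that if we are given an n × n matrix A, we likely will not know from
the outset whether rank(A) = n, hence we will not know whether $A^{-1}$ exists. However,
if at any stage in the row reduction of $[A \;\; I_n]$ we find that rank(A) < n, then it will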
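follow from Theorem 2.6.5 that A is not invertible.
Example 2.6.7 Find $A^{-1}$ if $A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \\ 3 & 5 & -1 \end{bmatrix}$.
Solution: Using the Gauss-Jordan technique, we proceed as follows.
\[
\left[\begin{array}{ccc|ccc} 1 & 1 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 3 & 5 & -1 & 0 & 0 & 1 \end{array}\right]
\overset{1}{\sim}
\left[\begin{array}{ccc|ccc} 1 & 1 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 2 & -10 & -3 & 0 & 1 \end{array}\right]
\overset{2}{\sim}
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & -14 & -3 & -2 & 1 \end{array}\right]
\]
\[
\overset{3}{\sim}
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & \frac{3}{14} & \frac{1}{7} & -\frac{1}{14} \end{array}\right]
\overset{4}{\sim}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{11}{14} & -\frac{8}{7} & \frac{1}{14} \\ 0 & 1 & 0 & -\frac{3}{7} & \frac{5}{7} & \frac{1}{7} \\ 0 & 0 & 1 & \frac{3}{14} & \frac{1}{7} & -\frac{1}{14} \end{array}\right].
\]
Thus,
\[ A^{-1} = \begin{bmatrix} \frac{11}{14} & -\frac{8}{7} & \frac{1}{14} \\ -\frac{3}{7} & \frac{5}{7} & \frac{1}{7} \\ \frac{3}{14} & \frac{1}{7} & -\frac{1}{14} \end{bmatrix}. \]
We leave it as an exercise to confirm that $AA^{-1} = A^{-1}A = I_3$.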
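The Gauss-Jordan technique translates directly into a short routine. The sketch below is one possible rendering under our own assumptions: the function name `gauss_jordan_inverse` is hypothetical, exact fractions stand in for hand arithmetic, and the row operations are applied column by column, which may differ in order from the steps used in the example. Since the reduced row-echelon form is unique, the final matrix agrees with the hand computation.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Reduce [A | I_n] to [I_n | A^-1]; return None if rank(A) < n."""
    n = len(A)
    # Form the n x 2n matrix [A | I_n].
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Find a row at or below row c with a nonzero entry in column c.
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return None  # no pivot in this column, so A is not invertible
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]  # leading 1 in position (c, c)
        for i in range(n):
            if i != c:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]  # right-hand block is the inverse

A = [[1, 1, 3], [0, 1, 2], [3, 5, -1]]
A_inv = gauss_jordan_inverse(A)
for row in A_inv:
    print([str(x) for x in row])

# A matrix of rank 1 (hypothetical example) has no inverse.
print(gauss_jordan_inverse([[1, 2], [2, 4]]))
```

Returning `None` as soon as a column has no pivot mirrors the Remark before the example: once the reduction shows rank(A) < n, Theorem 2.6.5 rules out an inverse.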
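The row operations used in the reduction of Example 2.6.7 were:
1. A13(−3) 2. A21(−1), A23(−2) 3. M3(−1/14) 4. A31(−1), A32(−2)
Example 2.6.8 Continuing the previous example, use $A^{-1}$ to solve the system
x1 + x2 + 3x3 = 2,
x2 + 2x3 = 1,
3x1 + 5x2 − x3 = 4.
Solution: The system can be written as
Ax = b,
where A is the matrix in the previous example, and
\[ \mathbf{b} = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}. \]
Since A is invertible, the system has a unique solution that can be written as $x = A^{-1}b$.
Thus, from the previous example we have
\[ \mathbf{x} = \begin{bmatrix} \frac{11}{14} & -\frac{8}{7} & \frac{1}{14} \\ -\frac{3}{7} & \frac{5}{7} & \frac{1}{7} \\ \frac{3}{14} & \frac{1}{7} & -\frac{1}{14} \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{5}{7} \\ \frac{3}{7} \\ \frac{2}{7} \end{bmatrix}. \]
Consequently, $x_1 = \frac{5}{7}$, $x_2 = \frac{3}{7}$, and $x_3 = \frac{2}{7}$, so that the solution to the system is
$\left(\frac{5}{7}, \frac{3}{7}, \frac{2}{7}\right)$.
We now return to more theoretical information pertaining to the inverse of a matrix.
Properties of the Inverse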
The inverse of an n × n matrix satisﬁes the properties stated in the following theorem,
which should be committed to memory:
Theorem 2.6.9 Let A and B be invertible n × n matrices. Then
1. $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
2. AB is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
3. $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.
Proof The proof of each result consists of verifying that the appropriate matrix products
yield the identity matrix.
1. We must verify that
$A^{-1}A = I_n$ and $AA^{-1} = I_n$.
Both of these follow directly from Definition 2.6.2.
2. We must verify that
$(AB)(B^{-1}A^{-1}) = I_n$ and $(B^{-1}A^{-1})(AB) = I_n$.
We establish the first equality, leaving the second equation as an exercise. We have
$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n$.
3. We must verify that
$A^T(A^{-1})^T = I_n$ and $(A^{-1})^TA^T = I_n$.
Again, we prove the first part, leaving the second part as an exercise.
First recall from Theorem 2.2.21 that $A^TB^T = (BA)^T$. Using this property with
$B = A^{-1}$ yields
$A^T(A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n$.
The proof of property 2 of Theorem 2.6.9 can easily be extended to a statement
about invertibility of a product of an arbitrary finite number of matrices. More precisely,
we have the following.
Corollary 2.6.10 Let A1, A2, . . . , Ak be invertible n × n matrices. Then A1 A2 · · · Ak is invertible, and
$(A_1A_2\cdots A_k)^{-1} = A_k^{-1}A_{k-1}^{-1}\cdots A_1^{-1}$.
Proof The proof is left as an exercise (Problem 28).
Some Further Theoretical Results
Finally, in this section, we establish two results that will be required in Section 2.7 and
also in a proof that arises in Section 3.2.
Theorem 2.6.11 Let A and B be n × n matrices. If AB = In, then both A and B are invertible and
$B = A^{-1}$.
Proof Let b be an arbitrary column n-vector. Then, since AB = In, we have
A(Bb) = In b = b.
Consequently, for every b, the system Ax = b has the solution x = Bb. But this implies
that rank(A) = n. To see why, suppose that rank(A) < n, and let $A^*$ denote a row-echelon
form of A. Note that the last row of $A^*$ is zero. Choose $b^*$ to be any column
n-vector whose last component is nonzero. Then, since rank(A) < n, it follows that the
system
$A^*x = b^*$
is inconsistent. But, applying to the augmented matrix $[A^* \;\; b^*]$ the inverse row operations
that reduced A to row-echelon form yields [A b] for some b. Since Ax = b has
the same solution set as $A^*x = b^*$, it follows that Ax = b is inconsistent. We therefore
have a contradiction, and so it must be the case that rank(A) = n, and therefore that A
is invertible by Theorem 2.6.5.
We now establish that8 $A^{-1} = B$. Since AB = In by assumption, we have
$A^{-1} = A^{-1}I_n = A^{-1}(AB) = (A^{-1}A)B = I_nB = B$,
as required. It now follows directly from property 1 of Theorem 2.6.9 that B is invertible
with inverse A.
Corollary 2.6.12 Let A and B be n × n matrices. If AB is invertible, then both A and B are invertible.
Proof If we let $C = B(AB)^{-1}$ and D = AB, then
$AC = AB(AB)^{-1} = DD^{-1} = I_n$.
It follows from Theorem 2.6.11 that A is invertible. Similarly, if we let $C = (AB)^{-1}A$,
then
$CB = (AB)^{-1}AB = I_n$.
Once more we can apply Theorem 2.6.11 to conclude that B is invertible.
8 Note that it now makes sense to speak of $A^{-1}$, whereas prior to proving in the preceding paragraph that
A is invertible, it would not have been legal to use the notation $A^{-1}$.
Exercises for 2.6
Key Terms
Inverse, Invertible, Singular, Nonsingular, Gauss-Jordan
technique.
Skills
• Be able to check directly whether or not two matrices
A and B are inverses of each other.
• Be able to find the inverse of an invertible matrix via
the Gauss-Jordan technique.
• Be able to use the inverse of a coefficient matrix of a
linear system in order to solve the system.
• Know the basic properties related to how the inverse
operation behaves with respect to itself, multiplication, and transpose (Theorem 2.6.9).
True-False Review
For Questions 1–10, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. An invertible matrix is also known as a singular matrix.
2. Every square matrix that does not contain a row of
zeros is invertible.
3. A linear system Ax = b with an n × n invertible coefficient matrix A has a unique solution.
4. If A is a matrix such that there exists a matrix B with
AB = In, then A is invertible.
5. If A and B are invertible n × n matrices, then so is
A + B.
6. If A and B are invertible n × n matrices, then so is
AB.
7. If A is an invertible matrix such that $A^2 = A$, then A
is the identity matrix.
8. If A is an n × n invertible matrix and B and C are n × n
matrices such that AB = AC, then B = C.
9. If A is a 5 × 5 matrix of rank 4, then A is not invertible.
10. If A is a 6 × 6 matrix of rank 6, then A is invertible.
Problems
For Problems 1–3 verify by direct multiplication that the
given matrices are inverses of one another.
1. $A = \begin{bmatrix} 2 & -1 \\ 3 & -1 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} -1 & 1 \\ -3 & 2 \end{bmatrix}$.
2. $A = \begin{bmatrix} 4 & 9 \\ 3 & 7 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 7 & -9 \\ -3 & 4 \end{bmatrix}$.
3. $A = \begin{bmatrix} 3 & 5 & 1 \\ 1 & 2 & 1 \\ 2 & 6 & 7 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 8 & -29 & 3 \\ -5 & 19 & -2 \\ 2 & -8 & 1 \end{bmatrix}$.
For Problems 4–16, determine $A^{-1}$, if possible, using the
Gauss-Jordan method. If $A^{-1}$ exists, check your answer by
verifying that $AA^{-1} = I_n$.
4. $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$.
5. $A = \begin{bmatrix} 1 & 1 + i \\ 1 - i & 1 \end{bmatrix}$.
6. $A = \begin{bmatrix} 1 & -i \\ -1 + i & 2 \end{bmatrix}$.
7. $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$.
8. $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 11 \\ 4 & -3 & 10 \end{bmatrix}$.
9. $A = \begin{bmatrix} 3 & 5 & 1 \\ 1 & 2 & 1 \\ 2 & 6 & 7 \end{bmatrix}$.
10. $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix}$.
11. $A = \begin{bmatrix} 4 & 2 & -13 \\ 2 & 1 & -7 \\ 3 & 2 & 4 \end{bmatrix}$.
12. $A = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 6 & -2 \\ -1 & 1 & 4 \end{bmatrix}$.
13. $A = \begin{bmatrix} 1 & i & 2 \\ 1 + i & -1 & 2i \\ 2 & 2i & 5 \end{bmatrix}$.
14. $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 2 \\ 3 & 3 & 4 \end{bmatrix}$.
15. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 2 & 0 & 3 & -4 \\ 3 & -1 & 7 & 8 \\ 1 & 0 & 3 & 5 \end{bmatrix}$.
16. $A = \begin{bmatrix} 0 & -2 & -1 & -3 \\ 2 & 0 & 2 & 1 \\ 1 & -2 & 0 & 2 \\ 3 & -1 & -2 & 0 \end{bmatrix}$.
17. Let
$A = \begin{bmatrix} 2 & -1 & 4 \\ 5 & 1 & 2 \\ 1 & -1 & 3 \end{bmatrix}$.
Find the second column vector of $A^{-1}$ without determining the whole inverse.
For Problems 18–22, use $A^{-1}$ to find the solution to the given
system.
18. x1 + 3x2 = 1,
2x1 + 5x2 = 3.
19. x1 + x2 − 2x3 = −2,
x2 + x3 = 3,
2x1 + 4x2 − 3x3 = 1.
20. x1 − 2ix2 = 2,
(2 − i)x1 + 4ix2 = −i.
21. 3x1 + 4x2 + 5x3 = 1,
2x1 + 10x2 + x3 = 1,
4x1 + x2 + 8x3 = 1.
22. x1 + x2 + 2x3 = 12,
x1 + 2x2 − x3 = 24,
2x1 − x2 + x3 = −36.
An n × n matrix A is called orthogonal if $A^T = A^{-1}$. For
Problems 23–26, show that the given matrices are orthogonal.
23. $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.
24. $A = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix}$.
25. $A = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}$.
26. $A = \dfrac{1}{1 + 2x^2} \begin{bmatrix} 1 & -2x & 2x^2 \\ 2x & 1 - 2x^2 & -2x \\ 2x^2 & 2x & 1 \end{bmatrix}$.
27. Complete the proof of Theorem 2.6.9 by verifying the
remaining properties in parts 2 and 3.
28. Prove Corollary 2.6.10.
For Problems 29–30, use properties of the inverse to prove
the given statement.
29. If A is an n × n invertible symmetric matrix, then $A^{-1}$
is symmetric.
30. If A is an n × n invertible skew-symmetric matrix, then
$A^{-1}$ is skew-symmetric.
31. Let A be an n × n matrix with $A^4 = 0$. Prove that
In − A is invertible with
$(I_n - A)^{-1} = I_n + A + A^2 + A^3$.
32. Prove that if A, B, C are n × n matrices satisfying
BA = In and AC = In, then B = C.
33. If A, B, C are n × n matrices satisfying BA = In and
CA = In, does it follow that B = C? Justify your
answer.
34. Consider the general 2 × 2 matrix
$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$
and let Δ = a11 a22 − a12 a21 with a11 ≠ 0. Show that
if Δ ≠ 0,
$A^{-1} = \dfrac{1}{\Delta} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$.
The quantity Δ defined above is referred to as the determinant of A. We will investigate determinants in
more detail in the next chapter.
35. Let A be an n × n matrix, and suppose that we have
to solve the p linear systems
Axi = bi, i = 1, 2, . . . , p
where the bi are given. Devise an efficient method for
solving these systems.
36. Use your method from the previous problem to solve
the three linear systems
Axi = bi, i = 1, 2, 3
if
$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 1 & 1 & 6 \end{bmatrix}$, $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -1 \\ 2 \\ 5 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$.
37. Let A be an m × n matrix with m ≤ n.
(a) If rank(A) = m, prove that there exists a matrix
B satisfying AB = Im. Such a matrix is called a
right inverse of A.
(b) If
$A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 7 & 4 \end{bmatrix}$,
determine all right inverses of A.
For Problems 38–39, reduce the matrix $[A \;\; I_n]$ to reduced
row-echelon form and thereby determine, if possible, the inverse of A.
38. $A = \begin{bmatrix} 5 & 9 & 17 \\ 7 & 21 & 13 \\ 27 & 16 & 8 \end{bmatrix}$.
39. A is a randomly generated 4 × 4 matrix.
For Problems 40–42, use built-in functions of some form
of technology to determine rank(A) and, if possible, $A^{-1}$.
40. $A = \begin{bmatrix} 3 & 5 & -7 \\ 2 & 5 & 9 \\ 13 & -11 & 22 \end{bmatrix}$.
41. $A = \begin{bmatrix} 7 & 13 & 15 & 21 \\ 9 & -2 & 14 & 23 \\ 17 & -27 & 22 & 31 \\ 19 & -42 & 21 & 33 \end{bmatrix}$.
42. A is a randomly generated 5 × 5 matrix.
43. For the system in Problem 21, determine $A^{-1}$ and
use it to solve the system.
44. Consider the n × n Hilbert matrix
$H_n = \left[ \dfrac{1}{i + j - 1} \right]$, 1 ≤ i, j ≤ n.
(a) Determine $H_4$ and show that it is invertible.
(b) Find $H_4^{-1}$ and use it to solve $H_4x = b$ if b =
$[2, -1, 3, 5]^T$.
2.7 Elementary Matrices and the LU Factorization
We now introduce some matrices that can be used to perform elementary row operations
on a matrix. Although they are of limited computational use, they do play a signiﬁcant
role in linear algebra and its applications.
DEFINITION 2.7.1
Any matrix obtained by performing a single elementary row operation on the identity
matrix is called an elementary matrix.
In particular, an elementary matrix is always a square matrix. In general we will
denote elementary matrices by E. If we are describing a specific elementary matrix, then
in keeping with the notation introduced previously for elementary row operations, we
will use the following notation for the three types of elementary matrices:
Type 1: $P_{ij}$: permute rows i and j in In.
Type 2: $M_i(k)$: multiply row i of In by the nonzero scalar k.
Type 3: $A_{ij}(k)$: add k times row i of In to row j of In.
Example 2.7.2 Write all 2 × 2 elementary matrices.
Solution: From Definition 2.7.1 and using the notation introduced above, we have
1. Permutation matrix: $P_{12} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
2. Scaling matrices: $M_1(k) = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$, $M_2(k) = \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$.
3. Row combinations: $A_{12}(k) = \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$, $A_{21}(k) = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$.
We leave it as an exercise to verify that the n × n elementary matrices have the
following structure:
$P_{ij}$: ones along main diagonal except (i, i) and (j, j), ones in the (i, j) and (j, i)
positions, and zeros elsewhere.
$M_i(k)$: the diagonal matrix diag(1, 1, . . . , k, . . . , 1), where k appears in the (i, i)
position.
$A_{ij}(k)$: ones along the main diagonal, k in the (j, i) position, and zeros elsewhere.
A key point to note about elementary matrices is the following:
Premultiplying an n × p matrix A by an n × n elementary matrix E has the effect
of performing the corresponding elementary row operation on A.
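The premultiplication property can be checked numerically. In the sketch below the constructors `perm`, `scale`, and `add_multiple` are hypothetical names of ours for the matrices Pij, Mi(k), and Aij(k); each is built by applying the corresponding row operation to the identity matrix, with rows numbered from 1 as in the text. The 2 × 3 matrix and the value k = 10 are arbitrary illustrative choices.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def perm(n, i, j):
    """P_ij: permute rows i and j of I_n (rows numbered from 1)."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def scale(n, i, k):
    """M_i(k): multiply row i of I_n by the nonzero scalar k."""
    E = identity(n)
    E[i - 1][i - 1] = k
    return E

def add_multiple(n, i, j, k):
    """A_ij(k): add k times row i of I_n to row j, so k lands in position (j, i)."""
    E = identity(n)
    E[j - 1][i - 1] = k
    return E

def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))]
            for r in range(len(X))]

A = [[3, -1, 4], [2, 7, 5]]
k = 10
print(matmul(scale(2, 1, k), A))            # row 1 of A multiplied by k
print(matmul(add_multiple(2, 2, 1, k), A))  # k times row 2 added to row 1
print(matmul(perm(2, 1, 2), A))             # rows 1 and 2 interchanged
```

Note that the elementary matrix is n × n while A may be n × p, so the product is defined for non-square A as well.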
Rather than proving this statement, which we leave as an exercise, we illustrate with
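an example.
Example 2.7.3 If $A = \begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix}$, then, for example,
\[ M_1(k)A = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix} = \begin{bmatrix} 3k & -k & 4k \\ 2 & 7 & 5 \end{bmatrix}. \]
Similarly,
\[ A_{21}(k)A = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix} = \begin{bmatrix} 3 + 2k & -1 + 7k & 4 + 5k \\ 2 & 7 & 5 \end{bmatrix}. \]
Since elementary row operations can be performed on a matrix by premultiplication
by an appropriate elementary matrix, it follows that any matrix A can be reduced to row-echelon
form by multiplication by a sequence of elementary matrices. Schematically we
can therefore write
$E_kE_{k-1}\cdots E_2E_1A = U$,
where U denotes a row-echelon form of A and the Ei are elementary matrices.
Example 2.7.4 Determine elementary matrices that reduce $A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$ to row-echelon form.
Solution: We can reduce A to row-echelon form using the following sequence of
elementary row operations:
\[ \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 1 & 4 \\ 0 & -5 \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}. \]
1. P12 2. A12(−2) 3. M2(−1/5)
Consequently,
\[ M_2(-\tfrac{1}{5})A_{12}(-2)P_{12}A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}, \]
which we can verify by direct multiplication:
\[
\begin{aligned}
M_2(-\tfrac{1}{5})A_{12}(-2)P_{12}A
&= \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{5} \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 0 & -5 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}.
\end{aligned}
\]
Since any elementary row operation is reversible, it follows that each elementary
matrix is invertible. Indeed, in the 2 × 2 case it is easy to see that
\[ P_{12}^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad M_1(k)^{-1} = \begin{bmatrix} 1/k & 0 \\ 0 & 1 \end{bmatrix}, \quad M_2(k)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1/k \end{bmatrix}, \]
\[ A_{12}(k)^{-1} = \begin{bmatrix} 1 & 0 \\ -k & 1 \end{bmatrix}, \quad A_{21}(k)^{-1} = \begin{bmatrix} 1 & -k \\ 0 & 1 \end{bmatrix}. \]
We leave it as an exercise to verify that in the n × n case, we have:
\[ M_i(k)^{-1} = M_i(1/k), \quad P_{ij}^{-1} = P_{ij}, \quad A_{ij}(k)^{-1} = A_{ij}(-k). \]
Now consider an invertible n × n matrix A. Since the unique reduced row-echelon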
form of such a matrix is the identity matrix In, it follows from the preceding discussion
that there exist elementary matrices E1, E2, . . . , Ek such that
$E_kE_{k-1}\cdots E_2E_1A = I_n$.
But this implies that
$A^{-1} = E_kE_{k-1}\cdots E_2E_1$,
and hence,
$A = (A^{-1})^{-1} = (E_k\cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1}\cdots E_k^{-1}$,
which is a product of elementary matrices. So any invertible matrix is a product of elementary matrices. Conversely, since elementary matrices are invertible, a product of
elementary matrices is a product of invertible matrices, hence is invertible by Corollary 2.6.10. Therefore, we have established the following.
Theorem 2.7.5 Let A be an n × n matrix. Then A is invertible if and only if A is a product of elementary
matrices.
The LU Decomposition of an Invertible Matrix9
For the remainder of this section, we restrict our attention to invertible n × n matrices. In
reducing such a matrix to row-echelon form, we have always placed leading ones on the
main diagonal in order that we obtain a row-echelon matrix. We now lift the requirement
that the main diagonal of the row-echelon form contain ones. As a consequence, the
matrix that results from row reduction will be an upper triangular matrix but will not
necessarily be in row-echelon form. Furthermore, reduction to such an upper triangular
form can be accomplished without the use of Type 2 row operations.
9 The material in the remainder of this section is not used elsewhere in the text.

Example 2.7.6 Use elementary row operations to reduce the matrix
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix}$$
to upper triangular form.

Solution: The given matrix can be reduced to upper triangular form using the following sequence of elementary row operations:
$$\begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix}
\overset{1}{\sim}
\begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & \tfrac{9}{2} & \tfrac{5}{2} \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix}.$$
1. $A_{12}(-\tfrac{3}{2})$, $A_{13}(\tfrac{1}{2})$   2. $A_{23}(\tfrac{9}{13})$

When using elementary row operations of Type 3, the multiple of a specific row that
is subtracted from row i to put a zero in the (i, j) position is called a multiplier and
denoted $m_{ij}$. Thus, in the preceding example, there are three multipliers, namely,
$$m_{21} = \tfrac{3}{2}, \qquad m_{31} = -\tfrac{1}{2}, \qquad m_{32} = -\tfrac{9}{13}.$$
The multipliers will be used in the forthcoming discussion.
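To make the definition of a multiplier concrete, here is a small sketch that performs the reduction of Example 2.7.6 and records each $m_{ij}$ along the way. The function name `reduce_with_multipliers` is a hypothetical choice, and the sketch assumes that every pivot is nonzero, so no row interchange is needed.

```python
from fractions import Fraction as F

def reduce_with_multipliers(A):
    """Reduce A to upper triangular form with Type 3 operations only.

    Returns (U, m), where m[(i, j)] is the multiplier m_ij (1-based indices).
    Assumes every pivot is nonzero, so no rows need to be permuted.
    """
    U = [[F(x) for x in row] for row in A]
    n = len(U)
    m = {}
    for j in range(n):                  # work column by column, left to right
        if U[j][j] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(j + 1, n):       # rows beneath the pivot row
            m[(i + 1, j + 1)] = U[i][j] / U[j][j]
            # Subtract m_ij times row j from row i, i.e. apply A_ji(-m_ij).
            U[i] = [a - m[(i + 1, j + 1)] * b for a, b in zip(U[i], U[j])]
    return U, m

U, m = reduce_with_multipliers([[2, 5, 3], [3, 1, -2], [-1, 2, 1]])
for (i, j), value in sorted(m.items()):
    print(f"m{i}{j} = {value}")
```

Note the sign convention: the multiplier is the multiple that is subtracted, so the row operation $A_{12}(-\tfrac{3}{2})$ corresponds to $m_{21} = \tfrac{3}{2}$.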
In Example 2.7.6 we were able to reduce A to upper triangular form using only row
operations of Type 3. This is not always the case. For example, the matrix
$$\begin{bmatrix} 0 & 5 \\ 3 & 2 \end{bmatrix}$$
requires that the two rows be permuted to obtain an upper triangular form. For the
moment, however, we will restrict our attention to invertible matrices A for which the
reduction to upper triangular form can be accomplished without permuting rows. In this
case, we can therefore reduce A to upper triangular form using row operations of Type
3 only. Furthermore, throughout the reduction process, we can restrict ourselves to Type
3 operations that add multiples of a row to rows beneath that row, by simply performing
the row operations column by column, from left to right. According to our description
of the elementary matrices Aij (k), our reduction process therefore uses only elementary
matrices that are unit lower triangular. More specifically, in terms of elementary matrices
we have
$$E_k E_{k-1} \cdots E_2 E_1 A = U,$$
where $E_k, E_{k-1}, \dots, E_2, E_1$ are unit lower triangular Type 3 elementary matrices and
U is an upper triangular matrix. Since each elementary matrix is invertible, we can write
the preceding equation as
$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U. \tag{2.7.1}$$
But, as we have already argued, each of the elementary matrices in (2.7.1) is a unit lower
triangular matrix, and we know from Corollary 2.2.23 that the product of two unit lower
triangular matrices is also a unit lower triangular matrix. Consequently, (2.7.1) can be
written as
$$A = LU, \tag{2.7.2}$$
where
$$L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{2.7.3}$$
is a unit lower triangular matrix and U is an upper triangular matrix. Equation (2.7.2)
is referred to as the LU factorization of A. It can be shown (Problem 29) that this LU
factorization is unique.
Example 2.7.7 Determine the LU factorization of the matrix
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix}.$$

Solution: Using the results of Example 2.7.6, we can write
$$E_3 E_2 E_1 A = \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix},$$
where
$$E_1 = A_{12}(-\tfrac{3}{2}), \qquad E_2 = A_{13}(\tfrac{1}{2}), \qquad \text{and} \qquad E_3 = A_{23}(\tfrac{9}{13}).$$
Therefore,
$$U = \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix}$$
and from (2.7.3),
$$L = E_1^{-1} E_2^{-1} \cdots E_k^{-1}. \tag{2.7.4}$$
Computing the inverses of the elementary matrices, we have
$$E_1^{-1} = A_{12}(\tfrac{3}{2}), \qquad E_2^{-1} = A_{13}(-\tfrac{1}{2}), \qquad \text{and} \qquad E_3^{-1} = A_{23}(-\tfrac{9}{13}).$$
Substituting these results into (2.7.4) yields
$$L = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\tfrac{9}{13} & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{9}{13} & 1 \end{bmatrix}.$$
Consequently,
$$A = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{9}{13} & 1 \end{bmatrix}
\begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix},$$
which is easily verified by a matrix multiplication.

Computing the lower triangular matrix L in the LU factorization of A using (2.7.3)
can require a significant amount of work. However, if we look carefully at the matrix
L in Example 2.7.7, we see that the elements beneath the leading diagonal are just the
corresponding multipliers. That is, if $l_{ij}$ denotes the (i, j) element of the matrix L, then
$$l_{ij} = m_{ij}, \qquad i > j. \tag{2.7.5}$$
Furthermore, it can be shown that this relationship holds in general. Consequently, we
do not need to use (2.7.3) to obtain L. Instead we use row operations of Type 3 to reduce
A to upper triangular form, and then we can use (2.7.5) to obtain L directly.
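The shortcut (2.7.5) translates directly into a procedure: reduce A with Type 3 operations and store each multiplier in the corresponding position of L. The sketch below is one way to write this, under the assumption that no row interchanges are required; the names `lu` and `matmul` are illustrative and do not come from the text.

```python
from fractions import Fraction as F

def lu(A):
    """LU factorization without row interchanges (a sketch).

    L is built from the multipliers via l_ij = m_ij for i > j, as in (2.7.5).
    """
    n = len(A)
    U = [[F(x) for x in row] for row in A]
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if U[j][j] == 0:
            raise ValueError("zero pivot: this sketch does not permute rows")
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]                      # multiplier m_ij
            U[i] = [a - L[i][j] * b for a, b in zip(U[i], U[j])]
    return L, U

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*Y)] for r in X]

A = [[2, 5, 3], [3, 1, -2], [-1, 2, 1]]      # the matrix of Example 2.7.7
L, U = lu(A)
print(L)
print(matmul(L, U) == A)
```

For the matrix of Example 2.7.7 this reproduces the L and U found there, without forming any of the inverse elementary matrices in (2.7.3).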
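Example 2.7.8 Determine the LU decomposition for the matrix
$$A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 5 & -1 & 2 & 1 \\ 3 & 2 & 6 & -5 \\ -1 & 1 & 3 & 2 \end{bmatrix}.$$

Solution: To determine U, we reduce A to upper triangular form using only row
operations of Type 3 in which we add multiples of a given row only to rows below the
given row.
$$A \overset{1}{\sim}
\begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & \tfrac{13}{2} & \tfrac{9}{2} & -8 \\ 0 & -\tfrac{1}{2} & \tfrac{7}{2} & 3 \end{bmatrix}
\overset{2}{\sim}
\begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & \tfrac{45}{13} & \tfrac{35}{13} \end{bmatrix}
\overset{3}{\sim}
\begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & 0 & \tfrac{71}{13} \end{bmatrix} = U.$$

Row Operations                                Corresponding Multipliers
(1) A12(−5/2), A13(−3/2), A14(1/2)            m21 = 5/2, m31 = 3/2, m41 = −1/2
(2) A23(−1), A24(1/13)                        m32 = 1, m42 = −1/13
(3) A34(−9/13)                                m43 = 9/13

Consequently, from (2.7.5),
$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \tfrac{5}{2} & 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{1}{13} & \tfrac{9}{13} & 1 \end{bmatrix}.$$
We leave it as an exercise to verify that LU = A.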
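For readers who want to confirm the hand computation by machine, the product LU for Example 2.7.8 can be formed in exact arithmetic. This is only a checking sketch; the entries of `L` and `U` are copied from the example, and `matmul` is a hypothetical helper.

```python
from fractions import Fraction as F

A = [[2, -3, 1, 2], [5, -1, 2, 1], [3, 2, 6, -5], [-1, 1, 3, 2]]
L = [[1, 0, 0, 0],
     [F(5, 2), 1, 0, 0],
     [F(3, 2), 1, 1, 0],
     [F(-1, 2), F(-1, 13), F(9, 13), 1]]
U = [[2, -3, 1, 2],
     [0, F(13, 2), F(-1, 2), -4],
     [0, 0, 5, -4],
     [0, 0, 0, F(71, 13)]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*Y)] for r in X]

product = matmul(L, U)
print(product == A)
```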
The question undoubtedly in the reader's mind is: What is the use of the LU decomposition? In order to answer this question, consider the n × n system of linear equations
Ax = b, where A = LU. If we write the system as
$$LUx = b$$
and let Ux = y, then solving Ax = b is equivalent to solving the pair of equations
$$Ly = b, \qquad Ux = y.$$
Owing to the triangular form of each of the coefficient matrices L and U, these systems
can be solved easily: the first by "forward" substitution and the second by back substitution. In the case when we have a single right-hand-side vector b, the LU factorization for
solving the system has no advantage over Gaussian elimination. However, if we require
the solution of several systems of equations with the same coefficient matrix A, say
$$Ax_i = b_i, \qquad i = 1, 2, \dots, p,$$
then it is more efficient to compute the LU factorization of A once, and then successively
solve the triangular systems
$$Ly_i = b_i, \qquad Ux_i = y_i, \qquad i = 1, 2, \dots, p.$$
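The two triangular solves described above are short to write down. The following sketch assumes that L is unit lower triangular and that U has nonzero diagonal entries; the function names `forward_sub` and `back_sub` and the small 2 × 2 factorization are hypothetical, chosen only to illustrate reusing one factorization for several right-hand sides.

```python
from fractions import Fraction as F

def forward_sub(L, b):
    """Solve Ly = b for a unit lower triangular matrix L."""
    y = []
    for i, row in enumerate(L):
        y.append(F(b[i]) - sum(row[j] * y[j] for j in range(i)))
    return y

def back_sub(U, y):
    """Solve Ux = y for an upper triangular matrix U with nonzero diagonal."""
    n = len(U)
    x = [F(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (F(y[i]) - s) / U[i][i]
    return x

# Hypothetical illustration: A = LU = [[2, 1], [4, 5]].
L = [[1, 0], [2, 1]]
U = [[2, 1], [0, 3]]
rhs = [[3, 9], [1, 2], [0, 3]]        # several right-hand sides b_i

# The factorization is computed once; each b_i costs only two triangular solves.
solutions = [back_sub(U, forward_sub(L, b)) for b in rhs]
print(solutions)
```

The design point is that the elimination work is done once when L and U are found; each additional right-hand side needs only the two substitution passes.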
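Example 2.7.9 Use the LU decomposition of
$$A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 5 & -1 & 2 & 1 \\ 3 & 2 & 6 & -5 \\ -1 & 1 & 3 & 2 \end{bmatrix}$$
to solve the system Ax = b if $b = \begin{bmatrix} 2 \\ -3 \\ 5 \\ 7 \end{bmatrix}$.

Solution: We have shown in the previous example that A = LU where
$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \tfrac{5}{2} & 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{1}{13} & \tfrac{9}{13} & 1 \end{bmatrix}
\qquad \text{and} \qquad
U = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & 0 & \tfrac{71}{13} \end{bmatrix}.$$
We now solve the two triangular systems Ly = b and Ux = y. Using forward substitution
on the first of these systems, we have
$$\begin{aligned}
y_1 &= 2, \qquad y_2 = -3 - \tfrac{5}{2} y_1 = -8, \qquad y_3 = 5 - \tfrac{3}{2} y_1 - y_2 = 5 - 3 + 8 = 10, \\
y_4 &= 7 + \tfrac{1}{2} y_1 + \tfrac{1}{13} y_2 - \tfrac{9}{13} y_3 = 8 - \tfrac{8}{13} - \tfrac{90}{13} = \tfrac{6}{13}.
\end{aligned}$$
Solving Ux = y via back substitution yields
$$\begin{aligned}
x_4 &= \tfrac{13}{71} y_4 = \tfrac{6}{71}, \qquad
x_3 = \tfrac{1}{5}(y_3 + 4x_4) = \tfrac{1}{5}\left(10 + \tfrac{24}{71}\right) = \tfrac{734}{355}, \\
x_2 &= \tfrac{2}{13}\left(y_2 + \tfrac{1}{2} x_3 + 4x_4\right) = \tfrac{2}{13}\left(-8 + \tfrac{367}{355} + \tfrac{24}{71}\right) = -\tfrac{362}{355}, \\
x_1 &= \tfrac{1}{2}(y_1 + 3x_2 - x_3 - 2x_4) = \tfrac{1}{2}\left(2 - \tfrac{1086}{355} - \tfrac{734}{355} - \tfrac{12}{71}\right) = -\tfrac{117}{71}.
\end{aligned}$$
Consequently,
$$x = \left(-\tfrac{117}{71},\; -\tfrac{362}{355},\; \tfrac{734}{355},\; \tfrac{6}{71}\right).$$

In the more general case when row interchanges are required to reduce an invertible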
matrix A to upper triangular form, it can be shown that A has a factorization of the form
$$A = PLU, \tag{2.7.6}$$
where P is an appropriate product of elementary permutation matrices, L is a unit
lower triangular matrix, and U is an upper triangular matrix. From the properties of the
elementary permutation matrices, it follows (see Problem 27) that $P^{-1} = P^T$. Using
(2.7.6), the linear system Ax = b can be written as
$$PLUx = b,$$
or equivalently,
$$LUx = P^T b.$$
Consequently, to solve Ax = b in this case we can solve the two triangular systems
$$Ly = P^T b, \qquad Ux = y.$$
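A sketch of how the factorization with row interchanges might be used in practice is given below. It is an illustration under stated assumptions rather than the text's algorithm: the permutation P is stored as a list `perm` (row `perm[i]` of A is row i of LU), the pivot rule simply takes the first nonzero entry in the column, and the names `plu` and `solve` are hypothetical. The 2 × 2 matrix is the one shown earlier as requiring a row permutation; the right-hand side is made up for the illustration.

```python
from fractions import Fraction as F

def plu(A):
    """Return (perm, L, U) such that row perm[i] of A equals row i of LU."""
    n = len(A)
    U = [[F(x) for x in row] for row in A]
    L = [[F(0)] * n for _ in range(n)]
    perm = list(range(n))
    for j in range(n):
        # Pick the first row at or below the diagonal with a nonzero entry.
        p = next((i for i in range(j, n) if U[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        U[j], U[p] = U[p], U[j]
        L[j], L[p] = L[p], L[j]          # carry along multipliers already stored
        perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]
            U[i] = [a - L[i][j] * b for a, b in zip(U[i], U[j])]
    for i in range(n):
        L[i][i] = F(1)                   # unit diagonal
    return perm, L, U

def solve(A, b):
    """Solve Ax = b via Ly = P^T b followed by Ux = y."""
    perm, L, U = plu(A)
    n = len(A)
    pb = [F(b[i]) for i in perm]         # the vector P^T b
    y = []
    for i in range(n):                   # forward substitution
        y.append(pb[i] - sum(L[i][j] * y[j] for j in range(i)))
    x = [F(0)] * n
    for i in reversed(range(n)):         # back substitution
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

x = solve([[0, 5], [3, 2]], [10, 7])
print(x)
```

Numerical libraries usually choose the pivot of largest absolute value to control rounding error; with exact fractions, as assumed here, any nonzero pivot works.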
For a full discussion of this and other factorizations of n × n matrices, and their
applications, the reader is referred to more advanced texts on linear algebra or numerical
analysis [for example, B. Noble and J. W. Daniel, Applied Linear Algebra (Englewood
Cliffs, N.J.: Prentice Hall, 1988); J. Ll. Morris, Computational Methods in Elementary
Numerical Analysis (New York: Wiley, 1983)].

Exercises for 2.7

Key Terms
Elementary matrix, Multiplier, LU Factorization of a matrix.

Skills
• Be able to determine whether or not a given matrix is an elementary matrix.
• Know the form for the permutation matrices, scaling matrices, and row combination matrices.
• Be able to write down the inverse of an elementary matrix without any computation.
• Be able to determine elementary matrices that reduce a given matrix to row-echelon form.
• Be able to express an invertible matrix as a product of elementary matrices.
• Be able to determine the multipliers of a matrix.
• Be able to determine the LU factorization of a matrix.
• Be able to use the LU factorization of a matrix A to solve a linear system Ax = b.

True-False Review
For Questions 1–10, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. Every elementary matrix is invertible.
2. A product of elementary matrices is an elementary
matrix.
3. Every matrix can be expressed as a product of elementary matrices.
4. If A is an m × n matrix and E is an m × m elementary
matrix, then the matrices A and EA have the same
rank.
5. If $P_{ij}$ is a permutation matrix, then $P_{ij}^2 = P_{ij}$.
6. If $E_1$ and $E_2$ are n × n elementary matrices, then $E_1 E_2 = E_2 E_1$.
7. If $E_1$ and $E_2$ are n × n elementary matrices of the same type, then $E_1 E_2 = E_2 E_1$.
8. Every matrix has an LU factorization.
9. In the LU factorization of a matrix A, the matrix L is a unit lower triangular matrix and the matrix U is a unit upper triangular matrix.
10. A 4 × 4 matrix A that has an LU factorization has 10 multipliers.

Problems
1. Write all 3 × 3 elementary matrices and their inverses.

For Problems 2–5, determine elementary matrices that reduce the given matrix to row-echelon form.
2. $\begin{bmatrix} 3 & 5 \\ 1 & -2 \end{bmatrix}$.
3. $\begin{bmatrix} 5 & 8 & 2 \\ 1 & 3 & -1 \end{bmatrix}$.
4. $\begin{bmatrix} 3 & -1 & 4 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{bmatrix}$.
5. $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \end{bmatrix}$.

For Problems 6–12, express the matrix A as a product of elementary matrices.
6. $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$.
7. $A = \begin{bmatrix} -2 & -3 \\ 5 & 7 \end{bmatrix}$.
8. $A = \begin{bmatrix} 3 & -4 \\ -1 & 2 \end{bmatrix}$.
9. $A = \begin{bmatrix} 4 & -5 \\ 1 & 4 \end{bmatrix}$.
10. $A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 2 & 2 \\ 3 & 1 & 3 \end{bmatrix}$.
11. $A = \begin{bmatrix} 0 & -4 & -2 \\ 1 & -1 & 3 \\ -2 & 2 & 2 \end{bmatrix}$.
12. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 0 \\ 3 & 4 & 5 \end{bmatrix}$.
13. Determine elementary matrices $E_1, E_2, \dots, E_k$ that reduce $A = \begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix}$ to reduced row-echelon form. Verify by direct multiplication that $E_1 E_2 \cdots E_k A = I_2$.
14. Determine a Type 3 lower triangular elementary matrix $E_1$ that reduces $A = \begin{bmatrix} 3 & -2 \\ -1 & 5 \end{bmatrix}$ to upper triangular form. Use Equation (2.7.3) to determine L and verify Equation (2.7.2).

For Problems 15–20, determine the LU factorization of the given matrix. Verify your answer by computing the product LU.
15. $A = \begin{bmatrix} 2 & 3 \\ 5 & 1 \end{bmatrix}$.
16. $A = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$.
17. $A = \begin{bmatrix} 3 & -1 & 2 \\ 6 & -1 & 1 \\ -3 & 5 & 2 \end{bmatrix}$.
18. $A = \begin{bmatrix} 5 & 2 & 1 \\ -10 & -2 & 3 \\ 15 & 2 & -3 \end{bmatrix}$.
19. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 2 & 0 & 3 & -4 \\ 3 & -1 & 7 & 8 \\ 1 & 3 & 4 & 5 \end{bmatrix}$.
20. $A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 4 & -1 & 1 & 1 \\ -8 & 2 & 2 & -5 \\ 6 & 1 & 5 & 2 \end{bmatrix}$.

For Problems 21–24, use the LU factorization of A to solve the system Ax = b.
21. $A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$.
22. $A = \begin{bmatrix} 1 & -3 & 5 \\ 3 & 2 & 2 \\ 2 & 5 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 5 \\ -1 \end{bmatrix}$.
23. $A = \begin{bmatrix} 2 & 2 & 1 \\ 6 & 3 & -1 \\ -4 & 2 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$.
24. $A = \begin{bmatrix} 4 & 3 & 0 & 0 \\ 8 & 1 & 2 & 0 \\ 0 & 5 & 3 & 6 \\ 0 & 0 & -5 & 7 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 5 \end{bmatrix}$.
25. Use the LU factorization of $A = \begin{bmatrix} 2 & -1 \\ -8 & 3 \end{bmatrix}$ to solve each of the systems $Ax_i = b_i$ if
$b_1 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$, $b_2 = \begin{bmatrix} 2 \\ 7 \end{bmatrix}$, $b_3 = \begin{bmatrix} 5 \\ -9 \end{bmatrix}$.
26. Use the LU factorization of $A = \begin{bmatrix} -1 & 4 & 2 \\ 3 & 1 & 4 \\ 5 & -7 & 1 \end{bmatrix}$ to solve each of the systems $Ax_i = e_i$ and thereby determine $A^{-1}$.
27. If $P = P_1 P_2 \cdots P_k$, where each $P_i$ is an elementary permutation matrix, show that $P^{-1} = P^T$.
28. Prove that
(a) The inverse of an invertible upper triangular matrix is upper triangular. Repeat for an invertible lower triangular matrix.
(b) The inverse of a unit upper triangular matrix is unit upper triangular. Repeat for a unit lower triangular matrix.
29. In this problem, we prove that the LU decomposition of an invertible n × n matrix is unique in the sense that, if $A = L_1 U_1$ and $A = L_2 U_2$, where $L_1, L_2$ are unit lower triangular matrices and $U_1, U_2$ are upper triangular matrices, then $L_1 = L_2$ and $U_1 = U_2$.
(a) Apply Corollary 2.6.12 to conclude that $L_2$ and $U_1$ are invertible, and then use the fact that $L_1 U_1 = L_2 U_2$ to establish that $L_2^{-1} L_1 = U_2 U_1^{-1}$.
(b) Use the result from (a) together with Theorem 2.2.22 and Corollary 2.2.23 to prove that $L_2^{-1} L_1 = I_n$ and $U_2 U_1^{-1} = I_n$, from which the required result follows.
30. QR Factorization: It can be shown that any invertible n × n matrix has a factorization of the form
$$A = QR,$$
where Q and R are invertible, R is upper triangular, and Q satisfies $Q^T Q = I_n$ (i.e., Q is orthogonal). Determine an algorithm for solving the linear system Ax = b using this QR factorization.

For Problems 31–33, use some form of technology to determine the LU factorization of the given matrix. Verify the factorization by computing the product LU.
31. $A = \begin{bmatrix} 3 & 5 & -2 \\ 2 & 7 & 9 \\ -5 & 5 & 11 \end{bmatrix}$.
32. $A = \begin{bmatrix} 27 & -19 & 32 \\ 15 & -16 & 9 \\ 23 & -13 & 51 \end{bmatrix}$.
33. $A = \begin{bmatrix} 34 & 13 & 19 & 22 \\ 53 & 17 & -71 & 20 \\ 21 & 37 & 63 & 59 \\ 81 & 93 & -47 & 39 \end{bmatrix}$.

2.8 The Invertible Matrix Theorem I
In Section 2.6, we defined an n × n invertible matrix A to be a matrix such that there
exists an n × n matrix B satisfying $AB = BA = I_n$. There are, however, many other
important and useful viewpoints on invertibility of matrices. Some of these we have
already encountered in the preceding two sections, while others await us in later chapters.
It is worthwhile to begin collecting a list of conditions on an n × n matrix A that are
mathematically equivalent to its invertibility. We refer to this theorem as the Invertible
Matrix Theorem. As we have indicated, this result is somewhat a “work in progress,”
and we shall return to it later in Sections 3.2 and 4.10.
Theorem 2.8.1 (Invertible Matrix Theorem)
Let A be an n × n matrix with real elements. The following conditions on A are
equivalent:
(a) A is invertible.
(b) The equation Ax = b has a unique solution for every b in Rn .
(c) The equation Ax = 0 has only the trivial solution x = 0.
(d) rank(A) = n.
(e) A can be expressed as a product of elementary matrices.
(f) A is row-equivalent to $I_n$.

Proof The equivalence of (a), (b), and (d) has already been established in Section 2.6
in Theorems 2.6.4 and 2.6.5, as well as in Corollary 2.6.6. Moreover, the equivalence of
(a) and (e) was already established in Theorem 2.7.5.
Next we establish that (c) is an equivalent statement by proving that (b) ⇒ (c)
⇒ (d). Assuming that (b) holds, we can conclude that the linear system Ax = 0 has
a unique solution. However, one solution is evidently x = 0, hence this is the unique
solution to Ax = 0, which establishes (c). Next, assume that (c) holds. The fact that
Ax = 0 has only the trivial solution means that, in reducing A to row-echelon form, we
find no free parameters. Thus, every column (and hence every row) of A contains a pivot,
which means that the row-echelon form of A has n nonzero rows; that is, rank(A) = n,
which is (d).
Finally, we prove that (e) ⇒ (f) ⇒ (a). If (e) holds, we can left multiply $I_n$
by a product of elementary matrices (corresponding to a sequence of elementary row
operations applied to $I_n$) to obtain A. This means that A is row-equivalent to $I_n$, which
is (f). Last, if A is row-equivalent to $I_n$, we can write A as a product of elementary
matrices, each of which is invertible. Since a product of invertible matrices is invertible
(by Corollary 2.6.10), we conclude that A is invertible, as needed.

Exercises for 2.8

Skills
• Know the list of characterizations of invertible matrices given in the Invertible Matrix Theorem.
• Be able to use the Invertible Matrix Theorem to draw conclusions related to the invertibility of a matrix.

True-False Review
For Questions 1–4, decide if the given statement is true or
false, and give a brief justification for your answer. If true,
you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation
of why the statement is false.
1. If the linear system Ax = 0 has a nontrivial solution, then A can be expressed as a product of elementary matrices.
2. A 4 × 4 matrix A with rank(A) = 4 is row-equivalent to $I_4$.
3. If A is a 3 × 3 matrix with rank(A) = 2, then the linear system Ax = b must have infinitely many solutions.
4. Any n × n upper triangular matrix is row-equivalent to $I_n$.

Problems
1. Use part (c) of the Invertible Matrix Theorem to prove that if A is an invertible matrix and B and C are matrices of the same size as A such that AB = AC, then B = C. [Hint: Consider AB − AC = 0.]
2. Give a direct proof of the fact that (d) ⇒ (c) in the Invertible Matrix Theorem.
3. Give a direct proof of the fact that (c) ⇒ (b) in the Invertible Matrix Theorem.
4. Use the equivalence of (a) and (e) in the Invertible Matrix Theorem to prove that if A and B are invertible n × n matrices, then so is AB.
5. Use the equivalence of (a) and (c) in the Invertible Matrix Theorem to prove that if A and B are invertible n × n matrices, then so is AB.

2.9 Chapter Review
In this chapter we have investigated linear systems of equations. Matrices provide a
convenient mathematical representation for linear systems, and whether or not a linear
system has a solution (and if so, how many) can be determined entirely from the matrix
for the linear system.
An m × n matrix A = [aij ] is a rectangular array of numbers arranged in m rows
and n columns. The entry in the i th row and j th column is written aij . More generally,
such an array, whose entries are allowed to depend on an indeterminate t , is known as
a matrix function. Matrix functions can be used to formulate systems of differential
equations.
If m = n, the matrix (or matrix function) is called a square matrix.

Concepts Related to Square Matrices
• Main diagonal: the entries $a_{11}, a_{22}, \dots, a_{nn}$ in the matrix.
• Trace: the sum of the entries on the main diagonal.
• Upper triangular matrix: $a_{ij} = 0$ for i > j.
• Lower triangular matrix: $a_{ij} = 0$ for i < j.
• Diagonal matrix: $a_{ij} = 0$ for i ≠ j.
• Transpose: applying to any m × n matrix A, this is the n × m matrix $A^T$ obtained
from A by interchanging its rows and columns.
• Symmetric matrix: $A^T = A$; that is, $a_{ij} = a_{ji}$.
• Skew-symmetric matrix: $A^T = -A$; that is, $a_{ij} = -a_{ji}$. In particular, $a_{ii} = 0$
for each i.

Matrix Algebra
Given two matrices A and B of the same size m × n, we can perform the following
operations:
• Addition/subtraction A ± B: add/subtract the corresponding elements of A and B.
• Scalar multiplication rA: multiply each entry of A by the real (or complex)
scalar r.
If A is m × n and B is n × p, we can form their product AB, which is an m × p
matrix whose (i, j)-entry is computed by taking the dot product of the i th row vector of
A with the j th column vector of B. Note that, in general, AB ≠ BA.

Linear Systems
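The general m × n system of linear equations is of the form
$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2, \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m.
\end{aligned}$$
If each $b_i = 0$, the system is called homogeneous. There are two useful ways to formulate
the above linear system:
1. Augmented matrix:
$$A^{\#} = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right].$$
2. Vector form:
$$Ax = b,$$
where
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$

Elementary Row Operations and Row Echelon Form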
There are three types of elementary row operations on a matrix A:
1. Pij : Permute the i th and j th rows in A.
2. Mi (k): Multiply the entries in the i th row of A by the nonzero scalar k .
3. Aij (k): Add to the elements of the j th row of A the scalar k times the corresponding
elements of the i th row of A.
By performing elementary row operations on the augmented matrix above, we can
determine solutions, if any, to the linear system. The strategy is to apply elementary
row operations in such a way that A is transformed into row-echelon form—a process
known as Gaussian elimination. By applying back substitution to the linear system
corresponding to the row-echelon form obtained, we ﬁnd the solution. This solution
agrees with the solution to the original linear system. If necessary, free parameters may
be used to express this solution. A leading one in the far right-hand column of the
row-echelon form indicates that the system has no solution.
A row-echelon form matrix is one in which
• All rows consisting entirely of zeros are placed at the bottom of the matrix.
• All other rows begin with a (leading) “1”, called a pivot.
• The leading ones occur in columns strictly to the right of the leading ones in the
rows above.

Invertible Matrices
An n × n matrix A is invertible if there exists an n × n matrix B such that AB = In = BA,
where In is the n × n identity matrix (ones on the main diagonal, zeros elsewhere). We
write A−1 for the (unique) inverse B of A. One procedure for determining A−1 , if it
exists, is the Gauss-Jordan technique:
[A|In ] ∼ ERO ∼ [In |A−1 ].
...
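The Gauss-Jordan technique can be sketched in a few lines: adjoin $I_n$ to A, row-reduce until the left block is $I_n$, and read off the right block. The code below is an illustrative sketch only; the function name `inverse` is hypothetical, exact fractions are assumed, and the function returns `None` when some column has no pivot, that is, when rank(A) < n.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan: row-reduce [A | I_n] to [I_n | A^{-1}], or return None."""
    n = len(A)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for j in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column j.
        p = next((i for i in range(j, n) if aug[i][j] != 0), None)
        if p is None:
            return None                           # no pivot: A is not invertible
        aug[j], aug[p] = aug[p], aug[j]           # Type 1: permute rows
        pivot = aug[j][j]
        aug[j] = [v / pivot for v in aug[j]]      # Type 2: create a leading one
        for i in range(n):
            if i != j and aug[i][j] != 0:
                k = aug[i][j]
                # Type 3: clear the rest of column j.
                aug[i] = [a - k * b for a, b in zip(aug[i], aug[j])]
    return [row[n:] for row in aug]

Ainv = inverse([[2, 3], [1, 4]])
print(Ainv)
print(inverse([[2, -7], [-4, 14]]))   # a singular matrix
```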
Invertible matrices A share all of the following equivalent properties:
• A can be reduced to In via a sequence of elementary row operations.
• The linear system Ax = b has a unique solution x.
• The linear system Ax = 0 has only the trivial solution x = 0.
• A can be expressed as a product of elementary matrices that are obtained from
the identity matrix by applying exactly one elementary row operation.

Additional Problems
Let
$$A = \begin{bmatrix} -2 & 4 & 2 & 6 \\ -1 & -1 & 5 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} -3 & 0 \\ 2 & 2 \\ 1 & -3 \\ 0 & 1 \end{bmatrix}, \qquad
C = \begin{bmatrix} -5 \\ -6 \\ 3 \\ 1 \end{bmatrix},$$
and r = −4. For Problems 1–6, compute the given expression, if possible.
1. $rA - B^T$.
2. $AB$ and $\operatorname{tr}(AB)$.
3. $(AC)(AC)^T$.
4. $(rB)A$.
5. $(AB)^{-1}$.
6. $C^T C$ and $\operatorname{tr}(C^T C)$.
7. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 3 & b \\ -4 & a \\ a & b \end{bmatrix}.$$
(a) Compute AB and determine the values of a and b such that $AB = I_2$.
(b) Using the values of a and b obtained in (a), compute BA.
8. Let A be an m × n matrix and let B be a p × n matrix. Use the index form of the matrix product to prove that $(AB^T)^T = BA^T$.
9. Let A be an n × n matrix.
(a) Use the index form of the matrix product to write the ij th element of $A^2$.
(b) In the case when A is a symmetric matrix, show that $A^2$ is also symmetric.
10. Let A and B be n × n matrices. If A is skew-symmetric, use properties of the transpose to establish that $B^T A B$ is also skew-symmetric.

An n × n matrix A is called nilpotent if $A^p = 0$ for some positive integer p. For Problems 11–12, show that the given matrix is nilpotent.
11. $A = \begin{bmatrix} 3 & 9 \\ -1 & -3 \end{bmatrix}$.
12. $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.
24. −7
t2 6 − t 3t 3 + 6t 2 B(t) = 1 + t cos(πt/2) .
1 − t3
et and Compute the given expression, if possible.
13. A (t).
14. 10x1 +kx2 −x3 = 0,
kx1 +x2 −x3 = 0,
2x1 +x2 −x3 = 0. 27. e−3t − sec2 t
A(t) = 2t 3 cos t 6 ln t 36 − 5t kx1 + 2x2 − x3 = 2,
kx2 + x3 = 2. 26. x1 − kx2 = 6,
2x1 + 3x2 = k. 25. For Problems 13–16, let x1 − kx2 + k 2 x3 = 0,
x1
+ kx3 = 0,
x2 − x3 = 1. 28. Do the three planes x1 + 2x2 + x3 = 4, x2 − x3 = 1,
and x1 + 3x2 = 0 have at least one common point of
intersection? Explain. 1
0 B(t) dt . 15. t 3 · A(t) − sin t · B(t).
16. B (t) − et A(t).
For Problems 17–23, determine the solution set to the given
linear system of equations. For Problems 29–34, (a) ﬁnd a row-echelon form of the given
matrix A, (b) determine rank(A), and (c) use the GaussJordan technique to determine the inverse of A, if it exists.
29. A = 17. x1 + 5x2 + 2x3 = −6,
4x2 − 7x3 = 2,
5x3 = 0. 18. 5x1 − x2 + 2x3 = 7,
−2x1 + 6x2 + 9x3 = 0,
−7x1 + 5x2 − 3x3 = −7. 19. x + 2y − z = 1,
x
+ z = 5,
4x + 4y
= 12. 32. x1 − 2x2 − x3 + 3x4 = 0,
−2x1 + 4x2 + 5x3 − 5x4 = 3,
3x1 − 6x2 − 6x3 + 8x4 = 2. 33. 20. 21. +
+
+
+ +
+
+
+ −
−
−
− 31. x5
2x5
x5
4x5 = 1,
= −1,
= 5.
= −2. 3x5
5x5
9x5
8x5 =
=
=
= 22. x1
x1
2x1
2x1 23. 6,
8,
17,
14. x1 − 3x2 + 2ix3 = 1,
−2ix1 + 6x2 + 2x3 = −2. x2
x2
3x2
2x2 x3
x3
x3
2x3 x4
2x4
4x4
3x4 2 −7
.
−4 14 3 −1 6
A = 0 2 3 .
3 −5 0 2100
1 2 0 0 A= 0 0 3 4 .
0043 300
A = 0 2 −1 .
1 −1 2 −2 −3 1
A = 1 4 2 .
0 53 30. A = 3x1
− x3 + 2x4 −
x1 + 3x2 + x3 − 3x4 +
4x1 − 2x2 − 3x3 + 6x4 −
x4 +
+
+
+
+ 47
.
−2 5 For Problems 24–27, determine all values of k for which the
given linear system has (a) no solution, (b) a unique solution,
and (c) infinitely many solutions. 34.

35. Let
$$A = \begin{bmatrix} 1 & -1 & 3 \\ 4 & -3 & 13 \\ 1 & 1 & 4 \end{bmatrix}.$$
Solve each of the systems $Ax_i = e_i$, i = 1, 2, 3, where $e_i$ denote the column vectors of the identity matrix $I_3$.
36. Solve each of the systems $Ax_i = b_i$ if
$$A = \begin{bmatrix} 2 & 5 \\ 7 & -2 \end{bmatrix}, \qquad
b_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad
b_2 = \begin{bmatrix} 4 \\ 3 \end{bmatrix}, \qquad
b_3 = \begin{bmatrix} -2 \\ 5 \end{bmatrix}.$$
37. Let A and B be invertible matrices.
(a) By computing an appropriate matrix product, verify that $(A^{-1}B)^{-1} = B^{-1}A$.
(b) Use properties of the inverse to derive $(A^{-1}B)^{-1} = B^{-1}A$.
38. Let S be an invertible n × n matrix and let k be a nonnegative integer. If $A = SDS^{-1}$, prove that $A^k = SD^kS^{-1}$.

For Problems 39–42, (a) express the given matrix as a product of elementary matrices, and (b) determine the LU decomposition of the matrix.
39. The matrix in Problem 29.
40. The matrix in Problem 32.
41. The matrix in Problem 33.
42. The matrix in Problem 34.
43. (a) Prove that if A and B are n × n matrices, then
$$(A + B)^3 = A^3 + A^2B + ABA + BA^2 + AB^2 + BAB + B^2A + B^3.$$
(b) How does the formula change for $(A - B)^3$?
(c) Can you make a conjecture about the number of terms in the expansion of $(A + B)^k$, in terms of k?
44. Suppose that A and B are invertible matrices. Prove that the block matrix
$$\begin{bmatrix} A & 0 \\ 0 & B^{-1} \end{bmatrix}$$
is invertible.
45. In how many different positions can two leading ones of a row-echelon form of a 2 × 4 matrix occur? How about three leading ones for a 3 × 4 matrix? How about four leading ones for a 4 × 6 matrix? How about m leading ones for an m × n matrix with m ≤ n?
46. If the inverse of $A^2$ is the matrix B, what is the inverse of the matrix $A^{10}$? Prove your answer.

Project: Circles and Spheres via Gaussian Elimination
Part 1: Circles In this part, we shall see that any three noncollinear points in the plane
can be found on a unique circle, and we will use Gaussian elimination to find the center
and radius of this circle.
(a) Show geometrically that three noncollinear points in the plane must lie on a unique
circle. [Hint: The center must lie on the line that passes through the midpoint of
the segment connecting two of the three points and that is perpendicular to that
segment.]
(b) A circle in the plane has an equation that can be given in the form
$$(x - a)^2 + (y - b)^2 = r^2,$$
where (a, b) is the center and r is the radius. By expanding the formula, we may
write the equation of the circle in the form
$$x^2 + y^2 + cx + dy = k,$$
for constants c, d, and k. Using this latter formula together with Gaussian elimination, determine c, d, and k for each set of points below. Then solve for (a, b)
and r to write the equation of the circle.
(i) (2, −1), (3, 3), (4, −1).
(ii) (−1, 0), (1, 2), (2, 2).

Part 2: Spheres In this part, we shall extend the ideas of Part 1 and consider four
noncoplanar points in 3-space. Any three of these four points lie in a plane but are
noncollinear (why?). A sphere in 3-space has an equation that can be given in the form
$$(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2,$$
where (a, b, c) is the center and r is the radius. By expanding the formula, we may write
the equation of the sphere in the form
$$x^2 + y^2 + z^2 + ux + vy + wz = k,$$
for constants u, v, w, and k.
(a) Using the latter formula above together with Gaussian elimination, determine
u, v, w, and k for each set of points below. Then solve for (a, b, c) and r to write
the equation of the sphere.
(i) (1, −1, 2), (2, −1, 4), (−1, −1, −1), (1, 4, 1).
(ii) (2, 0, 0), (0, 3, 0), (0, 0, 4), (0, 0, 6).
(b) What goes wrong with the procedure in (a) if the points lie on a single plane?
Choose four points of your own and carry out the procedure in part (a) to see what
happens. Can you describe circumstances under which the four coplanar points
will lie on a sphere?