Chapters 1-2

Chapters 1-2 - i i i “main” 2007/2/16 page 1 i CHAPTER...


CHAPTER 1 First-Order Differential Equations

Among all of the mathematical disciplines the theory of differential equations is the most important. It furnishes the explanation of all those elementary manifestations of nature which involve time.
(Sophus Lie)

1.1 How Differential Equations Arise

In this section we introduce the idea of a differential equation through the mathematical formulation of a variety of problems. We then use these problems throughout the chapter to illustrate the applicability of the techniques introduced.

Newton's Second Law of Motion

Newton's second law of motion states that, for an object of constant mass m, the sum of the applied forces acting on the object is equal to the mass of the object multiplied by the acceleration of the object. If the object is moving in one dimension under the influence of a force F, then the mathematical statement of this law is

$$m\frac{dv}{dt} = F, \tag{1.1.1}$$

where v(t) denotes the velocity of the object at time t. We let y(t) denote the displacement of the object at time t. Then, using the fact that velocity and displacement are related via $v = dy/dt$, we can write (1.1.1) as

$$m\frac{d^2y}{dt^2} = F. \tag{1.1.2}$$

This is an example of a differential equation, so called because it involves derivatives of the unknown function y(t).

Gravitational Force: As a specific example, consider the case of an object falling freely under the influence of gravity (see Figure 1.1.1). In this case the only force acting on the object is F = mg, where g denotes the (constant) acceleration due to gravity. Choosing the positive y-direction as downward, it follows from Equation (1.1.2) that the motion of the object is governed by the differential equation

$$m\frac{d^2y}{dt^2} = mg, \tag{1.1.3}$$

or equivalently,

$$\frac{d^2y}{dt^2} = g.$$

[Figure 1.1.1: Object falling under the influence of gravity. The gravitational force mg acts in the positive (downward) y-direction.]

Since g is a (positive) constant, we can integrate this equation to determine y(t). Performing one integration yields

$$\frac{dy}{dt} = gt + c_1,$$

where $c_1$ is an arbitrary integration constant. Integrating once more with respect to t, we obtain

$$y(t) = \tfrac{1}{2}gt^2 + c_1t + c_2, \tag{1.1.4}$$

where $c_2$ is a second integration constant. We see that the differential equation has infinitely many solutions, parameterized by the constants $c_1$ and $c_2$. To specify the motion uniquely, we must augment the differential equation with initial conditions that specify the initial position and initial velocity of the object. For example, if the object is released at t = 0 from $y = y_0$ with velocity $v_0$, then, in addition to the differential equation, we have the initial conditions

$$y(0) = y_0, \qquad \frac{dy}{dt}(0) = v_0. \tag{1.1.5}$$

These conditions must be imposed on the solution (1.1.4) in order to determine the values of $c_1$ and $c_2$ that correspond to the particular problem under investigation. Setting t = 0 in (1.1.4) and using the first initial condition from (1.1.5), we find that $y_0 = c_2$. Substituting this into Equation (1.1.4), we get

$$y(t) = \tfrac{1}{2}gt^2 + c_1t + y_0. \tag{1.1.6}$$

To impose the second initial condition from (1.1.5), we first differentiate Equation (1.1.6) to obtain

$$\frac{dy}{dt} = gt + c_1.$$

Consequently the second initial condition in (1.1.5) requires $c_1 = v_0$. From (1.1.6), it follows that the position of the object at time t is

$$y(t) = \tfrac{1}{2}gt^2 + v_0t + y_0.$$

The differential equation (1.1.3) together with the initial conditions (1.1.5) is an example of an initial-value problem.

Spring Force: As a second application of Newton's law of motion, consider the spring–mass system depicted in Figure 1.1.2, where, for simplicity, we neglect frictional and external forces.
In this case, the only force acting on the mass is the restoring force (or spring force), $F_s$, due to the displacement of the spring from its equilibrium (unstretched) position. We use Hooke's law to model this force:

Hooke's Law: The restoring force of a spring is directly proportional to the displacement of the spring from its equilibrium position and is directed toward the equilibrium position.

[Figure 1.1.2: A simple harmonic oscillator. The mass is shown in its equilibrium position (y = 0) and at displacement y(t), with the positive y-direction marked.]

If y(t) denotes the displacement of the spring from its equilibrium position at time t (see Figure 1.1.2), then according to Hooke's law, the restoring force is $F_s = -ky$, where k is a positive constant called the spring constant. Consequently, Newton's second law of motion implies that the motion of the spring–mass system is governed by the differential equation

$$m\frac{d^2y}{dt^2} = -ky,$$

which we write in the equivalent form

$$\frac{d^2y}{dt^2} + \omega^2y = 0, \tag{1.1.7}$$

where $\omega = \sqrt{k/m}$. At present we cannot solve this differential equation. However, we leave it as an exercise (Problem 7) to verify by direct substitution that $y(t) = A\cos(\omega t - \phi)$ is a solution to the differential equation (1.1.7), where A and φ are constants (determined from the initial conditions for the problem). We see that the resulting motion is periodic with amplitude A. This is consistent with what we might expect physically, since no frictional forces or external forces are acting on the system. This type of motion is referred to as simple harmonic motion, and the physical system is called a simple harmonic oscillator.

Newton's Law of Cooling

We now build a mathematical model describing the cooling (or heating) of an object. Suppose that we bring an object into a room. If the object is hotter than the room, then the object will begin to cool.
Further, we might expect that the major factor governing the rate at which the object cools is the temperature difference between it and the room.

Newton's Law of Cooling: The rate of change of temperature of an object is proportional to the temperature difference between the object and its surrounding medium.

To formulate this law mathematically, we let T(t) denote the temperature of the object at time t, and let $T_m(t)$ denote the temperature of the surrounding medium. Newton's law of cooling can then be expressed as the differential equation

$$\frac{dT}{dt} = -k(T - T_m), \tag{1.1.8}$$

where k is a constant. The minus sign in front of the constant k is traditional. It ensures that k will always be positive.¹ After we study Section 1.4, it will be easy to show that, when $T_m$ is constant, the solution to this differential equation is

$$T(t) = T_m + ce^{-kt}, \tag{1.1.9}$$

where c is a constant (see also Problem 12). Newton's law of cooling therefore predicts that as t approaches infinity ($t \to \infty$), the temperature of the object approaches that of the surrounding medium ($T \to T_m$). This is certainly consistent with our everyday experience (see Figure 1.1.3).

[Figure 1.1.3: According to Newton's law of cooling, the temperature of an object approaches room temperature exponentially. Two panels plot T(t) against t: an object that is cooling (starting at $T_0 > T_m$) and an object that is heating (starting at $T_0 < T_m$); in both cases T(t) approaches $T_m$.]

¹ If $T > T_m$, then the object will cool, so that $dT/dt < 0$. Hence, from Equation (1.1.8), k must be positive. Similarly, if $T < T_m$, then $dT/dt > 0$, and once more Equation (1.1.8) implies that k must be positive.

The Orthogonal Trajectory Problem

Next we consider a geometric problem that has many interesting and important applications. Suppose

$$F(x, y, c) = 0 \tag{1.1.10}$$

defines a family of curves in the xy-plane, where the constant c labels the different curves.
For instance, the equation $x^2 + y^2 - c = 0$ describes a family of concentric circles with center at the origin, whereas $-x^2 + y - c = 0$ describes a family of parabolas that are vertical shifts of the standard parabola $y = x^2$. We assume that every curve in the family F(x, y, c) = 0 has a well-defined tangent line at each point. Associated with this family is a second family of curves, say,

$$G(x, y, k) = 0, \tag{1.1.11}$$

with the property that whenever a curve from the family (1.1.10) intersects a curve from the family (1.1.11), it does so at right angles.² We say that the curves in the family (1.1.11) are orthogonal trajectories of the family (1.1.10), and vice versa. For example, from elementary geometry, it follows that the lines y = kx in the family $G(x, y, k) = y - kx = 0$ are orthogonal trajectories of the family of concentric circles $x^2 + y^2 = c^2$ (see Figure 1.1.4).

Orthogonal trajectories arise in various applications. For example, a family of curves and its orthogonal trajectories can be used to define an orthogonal coordinate system in the xy-plane. In Figure 1.1.4 the families $x^2 + y^2 = c^2$ and y = kx are the coordinate curves of a polar coordinate system (that is, the curves r = constant and θ = constant, respectively). In physics, the lines of electric force of a static configuration are the orthogonal trajectories of the family of equipotential curves. As a final example, if we consider a two-dimensional heated plate, then the heat energy flows along the orthogonal trajectories to the constant-temperature curves (isotherms).

[Figure 1.1.4: The family of curves $x^2 + y^2 = c^2$ and the orthogonal trajectories y = kx.]

Statement of the Problem: Given the equation of a family of curves, find the equation of the family of orthogonal trajectories.

Mathematical Formulation: We recall that curves that intersect at right angles satisfy the following: the product of the slopes³ at the point of intersection is −1.
Thus if the given family F(x, y, c) = 0 has slope $m_1 = f(x, y)$ at the point (x, y), then the slope of the family of orthogonal trajectories G(x, y, k) = 0 is $m_2 = -1/f(x, y)$, and therefore the differential equation that determines the orthogonal trajectories is

$$\frac{dy}{dx} = -\frac{1}{f(x, y)}.$$

² That is, the tangent lines to each curve are perpendicular at any point of intersection.
³ By the slope of a curve at a given point, we mean the slope of the tangent line to the curve at that point.

Example 1.1.1 Determine the equation of the family of orthogonal trajectories to the curves with equation

$$y^2 = cx. \tag{1.1.12}$$

Solution: According to the preceding discussion, the differential equation determining the orthogonal trajectories is

$$\frac{dy}{dx} = -\frac{1}{f(x, y)},$$

where f(x, y) denotes the slope of the given family at the point (x, y). To determine f(x, y), we differentiate Equation (1.1.12) implicitly with respect to x to obtain

$$2y\frac{dy}{dx} = c. \tag{1.1.13}$$

We must now eliminate c from the previous equation to obtain an expression that gives the slope at the point (x, y). From Equation (1.1.12) we have

$$c = \frac{y^2}{x},$$

which, when substituted into Equation (1.1.13), yields

$$\frac{dy}{dx} = \frac{y}{2x}.$$

Consequently, the slope of the given family at the point (x, y) is

$$f(x, y) = \frac{y}{2x},$$

so that the orthogonal trajectories are obtained by solving the differential equation

$$\frac{dy}{dx} = -\frac{2x}{y}.$$

A key point to notice is that we cannot solve this differential equation by simply integrating with respect to x, since the function on the right-hand side of the differential equation depends on both x and y. However, multiplying by y, we see that

$$y\frac{dy}{dx} = -2x,$$

or equivalently,

$$\frac{d}{dx}\left(\tfrac{1}{2}y^2\right) = -2x.$$
Since the right-hand side of this equation depends only on x, whereas the left-hand side is a derivative with respect to x, we can integrate both sides of the equation with respect to x to obtain

$$\tfrac{1}{2}y^2 = -x^2 + c_1,$$

which we write as

$$2x^2 + y^2 = k, \tag{1.1.14}$$

where $k = 2c_1$. We see that the curves in the given family (1.1.12) are parabolas, and the orthogonal trajectories (1.1.14) are a family of ellipses. This is illustrated in Figure 1.1.5.

[Figure 1.1.5: The family of curves $y^2 = cx$ and its orthogonal trajectories $2x^2 + y^2 = k$.]

Exercises for 1.1

Key Terms
Differential equation, Initial conditions, Initial-value problem, Newton's second law of motion, Hooke's law, Spring constant, Simple harmonic motion, Simple harmonic oscillator, Newton's law of cooling, Orthogonal trajectories.

Skills
• Given a differential equation, be able to check whether or not a given function y = f(x) is indeed a solution to the differential equation.
• Be able to find the distance, velocity, and acceleration functions for an object moving freely under the influence of gravity.
• Be able to determine the motion of an object in a spring–mass system with no frictional or external forces.
• Be able to describe qualitatively how the temperature of an object changes as a function of time according to Newton's law of cooling.
• Be able to find the equation of the orthogonal trajectories to a given family of curves. In simple geometric cases, be prepared to provide rough sketches of some representative orthogonal trajectories.

True-False Review
For Questions 1–11, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.

1. A differential equation for a function y = f(x) must contain the first derivative $y' = f'(x)$.
2. The numerical values y(0) and y'(0) accompanying a differential equation for a function y = f(x) are called initial conditions of the differential equation.
3. The relationship between the velocity and the acceleration of an object falling under the influence of gravity can be expressed mathematically as a differential equation.
4. A sketch of the height of an object falling freely under the influence of gravity as a function of time takes the shape of a parabola.
5. Hooke's law states that the restoring force of a spring is directly proportional to the displacement of the spring from its equilibrium position and is directed in the direction of the displacement from the equilibrium position.
6. If room temperature is 70°F, then an object whose temperature is 100°F at a particular time cools faster at that time than an object whose temperature at that time is 90°F.
7. According to Newton's law of cooling, the temperature of an object eventually becomes the same as the temperature of the surrounding medium.
8. A hot cup of coffee that is put into a cold room cools more in the first hour than the second hour.
9. At a point of intersection of a curve and one of its orthogonal trajectories, the slopes of the two curves are reciprocals of one another.
10. The family of orthogonal trajectories for a family of parallel lines is another family of parallel lines.
11. The family of orthogonal trajectories for a family of circles that are centered at the origin is another family of circles centered at the origin.

Problems
1. An object is released from rest at a height of 100 meters above the ground. Neglecting frictional forces, the subsequent motion is governed by the initial-value problem
$$\frac{d^2y}{dt^2} = g, \qquad y(0) = 0, \qquad \frac{dy}{dt}(0) = 0,$$
where y(t) denotes the displacement of the object from its initial position at time t. Solve this initial-value problem and use your solution to determine the time when the object hits the ground.
2. A five-foot-tall boy tosses a tennis ball straight up from the level of the top of his head. Neglecting frictional forces, the subsequent motion is governed by the differential equation
$$\frac{d^2y}{dt^2} = g.$$
If the ball hits the ground 8 seconds after the boy releases it, find
(a) the time when the tennis ball reaches its maximum height.
(b) the maximum height of the tennis ball.
3. A pyrotechnic rocket is to be launched vertically upward from the ground. For optimal viewing, the rocket should reach a maximum height of 90 meters above the ground. Ignore frictional forces.
(a) How fast must the rocket be launched in order to achieve optimal viewing?
(b) Assuming the rocket is launched with the speed determined in part (a), how long after it is launched will it reach its maximum height?
4. Repeat Problem 3 under the assumption that the rocket is launched from a platform 5 meters above the ground.
5. An object thrown vertically upward with a speed of 2 m/s from a height of h meters takes 10 seconds to reach the ground. Set up and solve the initial-value problem that governs the motion of the object, and determine h.
6. An object released from a height h meters above the ground with a vertical velocity of $v_0$ m/s hits the ground after $t_0$ seconds. Neglecting frictional forces, set up and solve the initial-value problem governing the motion, and use your solution to show that
$$v_0 = \frac{1}{2t_0}\left(2h - gt_0^2\right).$$
7. Verify that $y(t) = A\cos(\omega t - \phi)$ is a solution to the differential equation (1.1.7), where A, ω, and φ are constants with A and ω nonzero. Determine the constants A and φ (with |φ| < π radians) in the particular case when the initial conditions are
$$y(0) = a, \qquad \frac{dy}{dt}(0) = 0.$$
8. Verify that $y(t) = c_1\cos\omega t + c_2\sin\omega t$ is a solution to the differential equation (1.1.7). Show that the amplitude of the motion is
$$A = \sqrt{c_1^2 + c_2^2}.$$
9. Verify that, for t > 0, y(t) = ln t is a solution to the differential equation
$$2\left(\frac{dy}{dt}\right)^3 = \frac{d^3y}{dt^3}.$$
10. Verify that y(x) = x/(x + 1) is a solution to the differential equation
$$y + \frac{d^2y}{dx^2} = \frac{dy}{dx} + \frac{x^3 + 2x^2 - 3}{(1 + x)^3}.$$
11. Verify that $y(x) = e^x\sin x$ is a solution to the differential equation
$$2y\cot x - \frac{d^2y}{dx^2} = 0.$$
12. By writing Equation (1.1.8) in the form
$$\frac{1}{T - T_m}\frac{dT}{dt} = -k$$
and using $u^{-1}\frac{du}{dt} = \frac{d}{dt}(\ln u)$, derive (1.1.9).
13. A glass of water whose temperature is 50°F is taken outside at noon on a day whose temperature is constant at 70°F. If the water's temperature is 55°F at 2 p.m., do you expect the water's temperature to reach 60°F before 4 p.m. or after 4 p.m.? Use Newton's law of cooling to explain your answer.
14. On a cold winter day (10°F), an object is brought outside from a 70°F room. If it takes 40 minutes for the object to cool from 70°F to 30°F, did it take more or less than 20 minutes for the object to reach 50°F? Use Newton's law of cooling to explain your answer.

For Problems 15–20, find the equation of the orthogonal trajectories to the given family of curves. In each case, sketch some curves from each family.
15. $x^2 + 4y^2 = c$.
16. y = c/x.
17. $y = cx^2$.
18. $y = cx^4$.
19. $y^2 = 2x + c$.
20. $y = ce^x$.

For Problems 21–24, m denotes a fixed nonzero constant, and c is the constant distinguishing the different curves in the given family. In each case, find the equation of the orthogonal trajectories.
21. y = mx + c.
22. $y = cx^m$.
23. $y^2 + mx^2 = c$.
24. $y^2 = mx + c$.
25. We call a coordinate system (u, v) orthogonal if its coordinate curves (the two families of curves u = constant and v = constant) are orthogonal trajectories (for example, a Cartesian coordinate system or a polar coordinate system). Let (u, v) be orthogonal coordinates, where $u = x^2 + 2y^2$, and x and y are Cartesian coordinates. Find the Cartesian equation of the v-coordinate curves, and sketch the (u, v) coordinate system.
26. Any curve that intersects each curve of a given family at an angle $a \neq \pi/2$ is called an oblique trajectory of the given family (see Figure 1.1.6). Let $m_1$ (equal to $\tan a_1$) denote the slope of the required family at the point (x, y), and let $m_2$ (equal to $\tan a_2$) denote the slope of the given family. Show that
$$m_1 = \frac{m_2 - \tan a}{1 + m_2\tan a}.$$
[Hint: From Figure 1.1.6, $\tan a_1 = \tan(a_2 - a)$. Thus, the equation of the family of oblique trajectories is obtained by solving
$$\frac{dy}{dx} = \frac{m_2 - \tan a}{1 + m_2\tan a}.$$]

[Figure 1.1.6: Oblique trajectories intersecting at an angle a. A curve of the required family (slope $m_1 = \tan a_1$) meets a curve of the given family (slope $m_2 = \tan a_2$).]

1.2 Basic Ideas and Terminology

In the preceding section we used some applied problems to illustrate how differential equations arise. We now formalize several of the ideas introduced through these examples. We begin with a very general definition of a differential equation.

DEFINITION 1.2.1
A differential equation is an equation involving one or more derivatives of an unknown function.

Example 1.2.2 The following are all differential equations:
(a) $\frac{dy}{dx} + y = x^2$,
(b) $\frac{d^2y}{dx^2} = -k^2y$,
(c) $\frac{d^3y}{dx^3} + \left(\frac{d^2y}{dx^2}\right)^5 + \cos x = 0$,
(d) $\sin\left(\frac{dy}{dx}\right) + \tan^{-1}y = 1$,
(e) $\phi_{xx} + \phi_{yy} - \phi_x = e^x + x\sin y$.

The differential equations occurring in (a) through (d) are called ordinary differential equations, since the unknown function y(x) depends only on one variable, x. In (e), the unknown function φ(x, y) depends on more than one variable; hence the equation involves partial derivatives. Such a differential equation is called a partial differential equation. In this text we consider only ordinary differential equations. We now introduce some more definitions and terminology.
DEFINITION 1.2.3
The order of the highest derivative occurring in a differential equation is called the order of the differential equation.

In Example 1.2.2, (a) has order 1, (b) has order 2, (c) has order 3, and (d) has order 1. Looking back at the examples from the previous section, we see that problems formulated using Newton's second law of motion are always governed by a second-order differential equation (for the position of the object). Indeed, second-order differential equations play a fundamental role in applied problems, although differential equations of other orders also arise. For example, the differential equation obtained from Newton's law of cooling is a first-order differential equation, as is the differential equation for determining the orthogonal trajectories to a given family of curves. As another example, we note that under certain conditions, the deflection, y(x), of a horizontal beam is governed by the fourth-order differential equation

$$\frac{d^4y}{dx^4} = F(x)$$

for an appropriate function F(x). Any differential equation of order n can be written in the form

$$G(x, y, y', y'', \ldots, y^{(n)}) = 0, \tag{1.2.1}$$

where we have introduced the prime notation to denote derivatives, and $y^{(n)}$ denotes the nth derivative of y with respect to x (not y to the power of n). Of particular interest to us throughout the text will be linear differential equations. These arise as the special case of Equation (1.2.1) when $y, y', \ldots, y^{(n)}$ occur to the first degree only, and not as products or arguments of other functions. The general form for such a differential equation is given in the next definition.

DEFINITION 1.2.4
A differential equation that can be written in the form

$$a_0(x)y^{(n)} + a_1(x)y^{(n-1)} + \cdots + a_n(x)y = F(x),$$

where $a_0, a_1, \ldots, a_n$ and F are functions of x only, is called a linear differential equation of order n.
Such a differential equation is linear in $y, y', y'', \ldots, y^{(n)}$. A differential equation that does not satisfy this definition is called a nonlinear differential equation.

Example 1.2.5 The equations

$$y'' + x^2y' + (\sin x)y = e^x \quad\text{and}\quad xy''' + 4x^2y' - \frac{2}{1 + x^2}y = 0$$

are linear differential equations of order 2 and order 3, respectively, whereas the differential equations

$$y'' + x\sin(y') - xy = x^2 \quad\text{and}\quad y'' - x^2y' + y^2 = 0$$

are nonlinear. In the first case the nonlinearity arises from the $\sin(y')$ term, whereas in the second, the nonlinearity is due to the $y^2$ term.

Example 1.2.6 The general forms for first- and second-order linear differential equations are

$$a_0(x)\frac{dy}{dx} + a_1(x)y = F(x) \quad\text{and}\quad a_0(x)\frac{d^2y}{dx^2} + a_1(x)\frac{dy}{dx} + a_2(x)y = F(x),$$

respectively.

If we consider the examples from the previous section, we see that the differential equation governing the simple harmonic oscillator is a second-order linear differential equation. In this case the linearity was imposed in the modeling process when we assumed that the restoring force was directly proportional to the displacement from equilibrium (Hooke's law). Not all springs satisfy this relationship. For example, Duffing's equation

$$m\frac{d^2y}{dt^2} + k_1y + k_2y^3 = 0$$

gives a mathematical model of a nonlinear spring–mass system. If $k_2 = 0$, this reduces to the simple harmonic oscillator equation. Newton's law of cooling assumes a linear relationship between the rate of change of the temperature of an object and the difference between the temperature of the object and that of the surrounding medium. Hence, the resulting differential equation is linear. This can be seen explicitly by writing Equation (1.1.8) as

$$\frac{dT}{dt} + kT = kT_m,$$

which is a first-order linear differential equation. Finally, the differential equation for determining the orthogonal trajectories of a given family of curves will, in general, be nonlinear, as seen in Example 1.1.1.
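Both first-order equations in the preceding paragraph can be integrated numerically and checked against the closed forms obtained earlier. The linear cooling equation can be compared with (1.1.9), and the nonlinear trajectory equation $dy/dx = -2x/y$ from Example 1.1.1 can be compared with the ellipse (1.1.14). The following Python sketch is an illustration only and is not a method from the text. It uses the classical fourth-order Runge–Kutta scheme, a numerical technique this chapter has not introduced. The helper name `rk4` and all numerical values (k, $T_m$, $T_0$, and the starting point (0, 2)) are assumptions chosen for the example.

```python
import math


def rk4(f, t0, y0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta for a scalar ODE y' = f(t, y).

    Returns the approximate value of y at t_end after n_steps equal steps.
    """
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Newton's law of cooling (1.1.8) with constant T_m: dT/dt = -k (T - T_m).
# Illustrative (assumed) values: k = 0.1 per minute, T_m = 70, T(0) = 100.
k, T_m, T0 = 0.1, 70.0, 100.0


def cooling(t, T):
    return -k * (T - T_m)


T_numeric = rk4(cooling, 0.0, T0, 30.0, 300)
# Closed form (1.1.9) with c = T0 - T_m, which matches T(0) = T0.
T_exact = T_m + (T0 - T_m) * math.exp(-k * 30.0)


# Orthogonal trajectories of y^2 = c x (Example 1.1.1): dy/dx = -2x / y.
def trajectory(x, y):
    return -2 * x / y


# Start at (0, 2), which lies on the ellipse 2x^2 + y^2 = 4, and integrate
# to x = 1, staying away from y = 0 where the right-hand side is undefined.
y_at_1 = rk4(trajectory, 0.0, 2.0, 1.0, 1000)
ellipse_constant = 2 * 1.0**2 + y_at_1**2  # (1.1.14) predicts k = 4

print(f"cooling: numeric {T_numeric:.6f}, formula {T_exact:.6f}")
print(f"trajectory: 2x^2 + y^2 at x = 1 is {ellipse_constant:.10f}")
```

The nonlinearity of $dy/dx = -2x/y$ does not prevent numerical integration. However, the interval must avoid $y = 0$, where the right-hand side is undefined. On the ellipse through (0, 2), that point is reached at $x = \sqrt{2}$.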
Solutions of Differential Equations

We now define precisely what is meant by a solution to a differential equation.

DEFINITION 1.2.7
A function y = f(x) that is (at least) n times differentiable on an interval I is called a solution to the differential equation (1.2.1) on I if the substitution $y = f(x), y' = f'(x), \ldots, y^{(n)} = f^{(n)}(x)$ reduces the differential equation (1.2.1) to an identity valid for all x in I. In this case we say that y = f(x) satisfies the differential equation.

Example 1.2.8 Verify that for all constants $c_1$ and $c_2$, $y(x) = c_1\sin x + c_2\cos x$ is a solution to the linear differential equation $y'' + y = 0$ for x in the interval (−∞, ∞).

Solution: The function y(x) is certainly twice differentiable for all real x. Furthermore,

$$y'(x) = c_1\cos x - c_2\sin x \quad\text{and}\quad y''(x) = -(c_1\sin x + c_2\cos x).$$

Consequently,

$$y'' + y = -(c_1\sin x + c_2\cos x) + c_1\sin x + c_2\cos x = 0,$$

so that $y'' + y = 0$ for every x in (−∞, ∞). It follows from the preceding definition that the given function is a solution to the differential equation on (−∞, ∞).

In the preceding example, x could assume all real values. Often, however, the independent variable will be restricted in some manner. For example, the differential equation

$$\frac{dy}{dx} = \frac{1}{2\sqrt{x}}(y - 1)$$

is undefined when x ≤ 0, and so any solution would be defined only for x > 0. In fact this linear differential equation has solution

$$y(x) = ce^{\sqrt{x}} + 1, \qquad x > 0,$$

where c is a constant. (The reader can check this by substituting into the given differential equation, as was done in Example 1.2.8. In Section 1.4 we will introduce a technique that will enable us to derive this solution.)

We now distinguish two ways in which solutions to a differential equation can be expressed. Often, as in Example 1.2.8, we will be able to obtain a solution to a differential equation in the explicit form y = f(x), for some function f.
However, when dealing with nonlinear differential equations, we usually have to be content with a solution written in implicit form F(x, y) = 0, where the function F defines the solution, y(x), implicitly as a function of x. This is illustrated in Example 1.2.9.

Example 1.2.9 Verify that the relation $x^2 + y^2 - 4 = 0$ defines an implicit solution to the nonlinear differential equation

$$\frac{dy}{dx} = -\frac{x}{y}.$$

Solution: We regard the given relation as defining y as a function of x. Differentiating this relation with respect to x yields⁴

$$2x + 2y\frac{dy}{dx} = 0.$$

That is,

$$\frac{dy}{dx} = -\frac{x}{y},$$

as required.

In this example we can obtain y explicitly in terms of x, since $x^2 + y^2 - 4 = 0$ implies that $y = \pm\sqrt{4 - x^2}$. The implicit relation therefore contains the two explicit solutions

$$y(x) = \sqrt{4 - x^2}, \qquad y(x) = -\sqrt{4 - x^2},$$

which correspond graphically to the two semicircles sketched in Figure 1.2.1. Since x = ±2 corresponds to y = 0 in both of these equations, whereas the differential equation is defined only for y ≠ 0, we must omit x = ±2 from the domains of the solutions. Consequently, both of the foregoing solutions to the differential equation are valid for −2 < x < 2.

[Figure 1.2.1: Two solutions to the differential equation $y' = -x/y$: the upper semicircle $y(x) = (4 - x^2)^{1/2}$ and the lower semicircle $y(x) = -(4 - x^2)^{1/2}$. Both solutions are undefined when x = ±2.]

⁴ Note that we have used implicit differentiation in obtaining $d(y^2)/dx = 2y \cdot (dy/dx)$.

In the preceding example the solutions to the differential equation are more simply expressed in implicit form, although, as we have shown, it is quite easy to obtain the corresponding explicit solutions. In the following example the solution must be expressed in implicit form, since it is impossible to solve the implicit relation (analytically) for y as a function of x.

Example 1.2.10 Show that the relation $\sin(xy) + y^2 - x = 0$ defines a solution to

$$\frac{dy}{dx} = \frac{1 - y\cos(xy)}{x\cos(xy) + 2y}.$$

Solution: Differentiating the given relation implicitly with respect to x yields

$$\cos(xy)\left(y + x\frac{dy}{dx}\right) + 2y\frac{dy}{dx} - 1 = 0.$$

That is,

$$\frac{dy}{dx}\left[x\cos(xy) + 2y\right] = 1 - y\cos(xy),$$

which implies that

$$\frac{dy}{dx} = \frac{1 - y\cos(xy)}{x\cos(xy) + 2y},$$

as required.

Now consider the simple differential equation

$$\frac{d^2y}{dx^2} = 12x.$$

From elementary calculus we know that all functions whose second derivative is 12x can be obtained by performing two integrations. Integrating the given differential equation once yields

$$\frac{dy}{dx} = 6x^2 + c_1,$$

where $c_1$ is an arbitrary constant. Integrating again, we obtain

$$y(x) = 2x^3 + c_1x + c_2, \tag{1.2.2}$$

where $c_2$ is another arbitrary constant. The point to notice about this solution is that it contains two arbitrary constants. Further, by assigning appropriate values to these constants, we can determine all solutions to the differential equation. We call (1.2.2) the general solution to the differential equation.

In this example the given differential equation was of second order, and the general solution contained two arbitrary constants, which arose because two integrations were required to solve the differential equation. In the case of an nth-order differential equation we might suspect that the most general form of solution that can arise would contain n arbitrary constants. This is indeed the case and motivates the following definition.

DEFINITION 1.2.11
A solution to an nth-order differential equation on an interval I is called the general solution on I if it satisfies the following conditions:
1. The solution contains n constants $c_1, c_2, \ldots, c_n$.
2. All solutions to the differential equation can be obtained by assigning appropriate values to the constants.

Remark Not all differential equations have a general solution. For example, consider

$$(y')^2 + (y - 1)^2 = 0.$$
The only solution to this differential equation is y(x) = 1, and hence the differential equation does not have a solution containing an arbitrary constant.

Example 1.2.12 Find the general solution to the differential equation $y'' = e^{-x}$.

Solution: Integrating the given differential equation with respect to x yields

$$y' = -e^{-x} + c_1,$$

where $c_1$ is an integration constant. Integrating this equation, we obtain

$$y(x) = e^{-x} + c_1x + c_2, \tag{1.2.3}$$

where $c_2$ is another integration constant. Consequently, all solutions to $y'' = e^{-x}$ are of the form (1.2.3), and therefore, according to Definition 1.2.11, this is the general solution to $y'' = e^{-x}$ on any interval.

As the preceding example illustrates, we can, in principle, always find the general solution to a differential equation of the form

$$\frac{d^ny}{dx^n} = f(x) \tag{1.2.4}$$

by performing n integrations. However, if the function on the right-hand side of the differential equation is not a function of x only, this procedure cannot be used. Indeed, one of the major aims of this text is to determine solution techniques for differential equations that are more complicated than Equation (1.2.4).

A solution to a differential equation is called a particular solution if it does not contain any arbitrary constants not present in the differential equation itself. One way in which particular solutions arise is by assigning specific values to the arbitrary constants occurring in the general solution to a differential equation. For example, from (1.2.3), $y(x) = e^{-x} + x$ is a particular solution to the differential equation $d^2y/dx^2 = e^{-x}$ (the solution corresponding to $c_1 = 1$, $c_2 = 0$).

Initial-Value Problems

As discussed in the preceding section, the unique specification of an applied problem requires more than just a differential equation. We must also give appropriate auxiliary conditions that characterize the problem under investigation.
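The free-fall problem of Section 1.1 shows what such auxiliary conditions accomplish. The general solution (1.1.4) contains two constants, and prescribing the position and velocity at one instant fixes both of them. The following Python sketch generalizes (1.1.5) slightly by imposing the conditions at an arbitrary time $t_0$ rather than at t = 0. The function names and numerical values are illustrative assumptions, including g = 9.81 m/s² with the positive direction downward as in Section 1.1. As an independent check, the sketch also integrates $y'' = g$ numerically with the classical Runge–Kutta method (a technique not used in the text) and compares the result with the formula.

```python
def free_fall_constants(g, t0, y0, v0):
    """Return (c1, c2) such that y(t) = g t^2 / 2 + c1 t + c2, the general
    solution (1.1.4), satisfies y(t0) = y0 and y'(t0) = v0."""
    c1 = v0 - g * t0                     # from y'(t0) = g t0 + c1 = v0
    c2 = y0 - 0.5 * g * t0**2 - c1 * t0  # from y(t0) = g t0^2 / 2 + c1 t0 + c2 = y0
    return c1, c2


def free_fall_position(g, c1, c2, t):
    """Evaluate the general solution (1.1.4) for given constants."""
    return 0.5 * g * t**2 + c1 * t + c2


def rk4_system(f, t0, state, t_end, n_steps):
    """Classical RK4 for a first-order system s' = f(t, s), s a list of floats."""
    h = (t_end - t0) / n_steps
    t, s = t0, list(state)
    for _ in range(n_steps):
        k1 = f(t, s)
        k2 = f(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k1)])
        k3 = f(t + h / 2, [si + h / 2 * ki for si, ki in zip(s, k2)])
        k4 = f(t + h, [si + h * ki for si, ki in zip(s, k3)])
        s = [si + h / 6 * (a + 2 * b + 2 * c + d)
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
        t += h
    return s


g = 9.81  # assumed value of g in m/s^2; positive y-direction is downward


def free_fall_rhs(t, s):
    """y'' = g written as the first-order system y' = v, v' = g."""
    return [s[1], g]


# Illustrative initial data: at t0 = 1 s the object is at y = 2 m moving with v = 4 m/s.
t0, y0, v0 = 1.0, 2.0, 4.0
c1, c2 = free_fall_constants(g, t0, y0, v0)
y_formula = free_fall_position(g, c1, c2, 3.0)
y_numeric, v_numeric = rk4_system(free_fall_rhs, t0, [y0, v0], 3.0, 100)
print(c1, c2, y_formula, y_numeric)
```

Setting $t_0 = 0$ recovers the result of Section 1.1, $c_1 = v_0$ and $c_2 = y_0$. For example, `free_fall_constants(9.81, 0.0, 5.0, -3.0)` returns `(-3.0, 5.0)`.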
Of particular interest to us is the case of the initial-value problem, defined for an $n$th-order differential equation as follows.

DEFINITION 1.2.13

An $n$th-order differential equation together with $n$ auxiliary conditions of the form

$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1},$$

where $y_0, y_1, \ldots, y_{n-1}$ are constants, is called an initial-value problem.

Example 1.2.14

Solve the initial-value problem

$$y'' = e^{-x}, \tag{1.2.5}$$

$$y(0) = 1, \quad y'(0) = 4. \tag{1.2.6}$$

Solution: From Example 1.2.12, the general solution to Equation (1.2.5) is

$$y(x) = e^{-x} + c_1x + c_2. \tag{1.2.7}$$

We now impose the auxiliary conditions (1.2.6). Setting $x = 0$ in (1.2.7), we see that $y(0) = 1$ if and only if $1 = 1 + c_2$, so $c_2 = 0$. Using this value for $c_2$ in (1.2.7) and differentiating the result yields

$$y'(x) = -e^{-x} + c_1.$$

Consequently, $y'(0) = 4$ if and only if $4 = -1 + c_1$, and hence $c_1 = 5$. Thus the given auxiliary conditions pick out the particular solution to the differential equation (1.2.5) with $c_1 = 5$ and $c_2 = 0$, so that the initial-value problem has the unique solution

$$y(x) = e^{-x} + 5x.$$

Initial-value problems play a fundamental role in the theory and applications of differential equations. In the previous example, the initial-value problem had a unique solution. More generally, suppose we have a differential equation that can be written in the normal form

$$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)}).$$

According to Definition 1.2.13, the initial-value problem for such an $n$th-order differential equation is the following:

Statement of the initial-value problem: Solve

$$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)})$$

subject to

$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1},$$

where $y_0, y_1, \ldots, y_{n-1}$ are constants.

It can be shown that this initial-value problem always has a unique solution on some interval containing $x_0$, provided that $f$ and its partial derivatives with respect to $y, y', \ldots, y^{(n-1)}$ are continuous in an appropriate region.
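The bookkeeping in Example 1.2.14 is easy to check by machine. The following minimal sketch, in Python with only the standard library, assumes nothing beyond the general solution (1.2.7): it solves the two initial conditions for $c_1$ and $c_2$, then uses central finite differences (an illustrative numerical check, not a method from the text) to confirm that $y(x) = e^{-x} + 5x$ satisfies $y'' = e^{-x}$, $y(0) = 1$, and $y'(0) = 4$. The function names and the step sizes `h` are hypothetical choices.

```python
import math


def constants_from_initial_data(y0: float, y1: float) -> tuple:
    """Solve for (c1, c2) in y = e^{-x} + c1*x + c2 given y(0) = y0, y'(0) = y1.

    y(0)  = 1 + c2   =>  c2 = y0 - 1
    y'(0) = -1 + c1  =>  c1 = y1 + 1
    """
    return y1 + 1.0, y0 - 1.0


def make_solution(c1: float, c2: float):
    """Return the member of the general solution (1.2.7) with the given constants."""
    return lambda x: math.exp(-x) + c1 * x + c2


def first_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    c1, c2 = constants_from_initial_data(1.0, 4.0)
    print("c1 =", c1, "c2 =", c2)  # Example 1.2.14: c1 = 5, c2 = 0
    y = make_solution(c1, c2)
    print("y(0) =", y(0.0), " y'(0) ~", first_derivative(y, 0.0))
    for x in (-1.0, 0.0, 0.5, 2.0):
        # Residual of y'' = e^{-x}; only finite-difference error should remain.
        print(x, abs(second_derivative(y, x) - math.exp(-x)))
```

Changing the arguments of `constants_from_initial_data` repeats the same two-line computation for any initial data $y(0) = y_0$, $y'(0) = y_1$: each pair $(y_0, y_1)$ picks out exactly one pair $(c_1, c_2)$.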
This is a fundamental result in the theory of differential equations. In Chapter 6 we will show how the following special case can be used to develop the theory for linear differential equations.

Theorem 1.2.15

Let $a_1, a_2, \ldots, a_n, F$ be functions that are continuous on an interval $I$. Then, for any $x_0$ in $I$, the initial-value problem

$$y^{(n)} + a_1(x)y^{(n-1)} + \cdots + a_{n-1}(x)y' + a_n(x)y = F(x),$$

$$y(x_0) = y_0, \quad y'(x_0) = y_1, \quad \ldots, \quad y^{(n-1)}(x_0) = y_{n-1}$$

has a unique solution on $I$.

The next example, which we will refer back to on many occasions throughout the remainder of the text, illustrates the power of the preceding theorem.

Example 1.2.16

Prove that the general solution to the differential equation

$$y'' + \omega^2 y = 0, \quad -\infty < x < \infty, \tag{1.2.8}$$

where $\omega$ is a nonzero constant, is

$$y(x) = c_1\cos\omega x + c_2\sin\omega x, \tag{1.2.9}$$

where $c_1, c_2$ are arbitrary constants.

Solution: It is a routine computation to verify that $y(x) = c_1\cos\omega x + c_2\sin\omega x$ is a solution to the differential equation (1.2.8) on $(-\infty, \infty)$. According to Definition 1.2.11 we must now establish that every solution to (1.2.8) is of the form (1.2.9). To that end, suppose that $y = f(x)$ is any solution to (1.2.8). Then, according to the preceding theorem, $y = f(x)$ is the unique solution to the initial-value problem

$$y'' + \omega^2 y = 0, \quad y(0) = f(0), \quad y'(0) = f'(0). \tag{1.2.10}$$

However, consider the function

$$y(x) = f(0)\cos\omega x + \frac{f'(0)}{\omega}\sin\omega x. \tag{1.2.11}$$

This is of the form $y(x) = c_1\cos\omega x + c_2\sin\omega x$, where $c_1 = f(0)$ and $c_2 = f'(0)/\omega$, and therefore solves the differential equation (1.2.8). Further, evaluating (1.2.11) and its derivative at $x = 0$ yields $y(0) = f(0)$ and $y'(0) = f'(0)$. Consequently, (1.2.11) solves the initial-value problem (1.2.10). But, by assumption, $y(x) = f(x)$ solves the same initial-value problem. Owing to the uniqueness of the solution to this initial-value problem, it follows that these two solutions must coincide.
Therefore,

$$f(x) = f(0)\cos\omega x + \frac{f'(0)}{\omega}\sin\omega x = c_1\cos\omega x + c_2\sin\omega x.$$

Since $f(x)$ was an arbitrary solution to the differential equation (1.2.8), we can conclude that every solution to (1.2.8) is of the form $y(x) = c_1\cos\omega x + c_2\sin\omega x$, and therefore this is the general solution on $(-\infty, \infty)$.

In the remainder of this chapter we will focus primarily on first-order differential equations and some of their elementary applications. We will investigate such differential equations qualitatively, analytically, and numerically.

Exercises for 1.2

Key Terms

Differential equation, Order of a differential equation, Linear differential equation, Nonlinear differential equation, General solution to a differential equation, Particular solution to a differential equation, Initial-value problem.

Skills

• Be able to determine the order of a differential equation.
• Be able to determine whether a given differential equation is linear or nonlinear.
• Be able to determine whether or not a given function $y(x)$ is a particular solution to a given differential equation.
• Be able to determine whether or not a given implicit relation defines a particular solution to a given differential equation.
• Be able to find the general solution to differential equations of the form $y^{(n)} = f(x)$ via $n$ integrations.
• Be able to use initial conditions to find the solution to an initial-value problem.

True-False Review

For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.

1. The order of a differential equation is the order of the lowest derivative appearing in the differential equation.
2. The general solution to a third-order differential equation must contain three constants.
3. An initial-value problem always has a unique solution if the functions and partial derivatives involved are continuous.
4. The general solution to $y'' + y = 0$ is $y(x) = c_1\cos x + 5c_2\cos x$.
5. The general solution to $y'' + y = 0$ is $y(x) = c_1\cos x + 5c_1\sin x$.
6. The general solution to a differential equation of the form $y^{(n)} = F(x)$ can be obtained by $n$ consecutive integrations of the function $F(x)$.

Problems

For Problems 1–6, determine the order of the given differential equation and state whether it is linear or nonlinear.

1. $\dfrac{d^2y}{dx^2} + e^{xy}\dfrac{dy}{dx} = x^2$.
2. $\dfrac{d^3y}{dx^3} + 4\dfrac{d^2y}{dx^2} + \sin x\,\dfrac{dy}{dx} = xy + \tan x$.
3. y + 3x(y )3 − y = 1 + 3x .
4. sin x · ey + y − tan y = cos x .
5. ln x xy + = 3x 3 . y
6. $\dfrac{d^4y}{dx^4} + 3\dfrac{d^2y}{dx^2} = x$.

For Problems 7–18, verify that the given function is a solution to the given differential equation ($c_1$ and $c_2$ are arbitrary constants), and state the maximum interval over which the solution is valid.

7. $y(x) = c_1e^x\cos 2x + c_2e^x\sin 2x$, $y'' - 2y' + 5y = 0$.
8. $y(x) = c_1e^x + c_2e^{-2x}$, $y'' + y' - 2y = 0$.
9. $y(x) = \dfrac{1}{x + 4}$, $y' = -y^2$.
10. $y(x) = c_1x^{1/2}$, $y' = \dfrac{y}{2x}$.
11. $y(x) = e^{-x}\sin 2x$, $y'' + 2y' + 5y = 0$.
12. $y(x) = c_1\cosh 3x + c_2\sinh 3x$, $y'' - 9y = 0$.
13. $y(x) = c_1x^{-3} + c_2x^{-1}$, $x^2y'' + 5xy' + 3y = 0$.
14. $y(x) = c_1x^{1/2} + 3x^2$, $2x^2y'' - xy' + y = 9x^2$.
15. $y(x) = c_1x^2 + c_2x^3 - x^2\sin x$, $x^2y'' - 4xy' + 6y = x^4\sin x$.
16. $y(x) = c_1e^{ax} + c_2e^{bx}$, $y'' - (a + b)y' + aby = 0$, where $a$ and $b$ are constants and $a \neq b$.
17. $y(x) = e^{ax}(c_1 + c_2x)$, $y'' - 2ay' + a^2y = 0$, where $a$ is a constant.
18. $y(x) = e^{ax}(c_1\cos bx + c_2\sin bx)$, $y'' - 2ay' + (a^2 + b^2)y = 0$, where $a$ and $b$ are constants.

For Problems 19–22, determine all values of the constant $r$ such that the given function solves the given differential equation.

19. $y(x) = e^{rx}$, $y'' + 2y' - 3y = 0$.
20. $y(x) = e^{rx}$, $y'' - 8y' + 16y = 0$.
21. $y(x) = x^r$, $x^2y'' + xy' - y = 0$.
22. $y(x) = x^r$, $x^2y'' + 5xy' + 4y = 0$.

23. When $N$ is a positive integer, the Legendre equation

$$(1 - x^2)y'' - 2xy' + N(N + 1)y = 0, \quad -1 < x < 1,$$

has a solution that is a polynomial of degree $N$. Show by substitution into the differential equation that in the case $N = 3$ such a solution is

$$y(x) = \tfrac{1}{2}x(5x^2 - 3).$$

24. Determine a solution to the differential equation $(1 - x^2)y'' - xy' + 4y = 0$ of the form $y(x) = a_0 + a_1x + a_2x^2$ satisfying the normalization condition $y(1) = 1$.

For Problems 25–29, show that the given relation defines an implicit solution to the given differential equation, where $c$ is an arbitrary constant.

25. $x\sin y - e^x = c$, $y' = \dfrac{e^x - \sin y}{x\cos y}$.
26. $xy^2 + 2y - x = c$, $y' = \dfrac{1 - y^2}{2(1 + xy)}$.
27. $e^{xy} - x = c$, $y' = \dfrac{1 - ye^{xy}}{xe^{xy}}$. Determine the solution with $y(1) = 0$.
28. $e^{y/x} + xy^2 - x = c$, $y' = \dfrac{x^2(1 - y^2) + ye^{y/x}}{x(e^{y/x} + 2x^2y)}$.
29. $x^2y^2 - \sin x = c$, $y' = \dfrac{\cos x - 2xy^2}{2x^2y}$. Determine the explicit solution that satisfies $y(\pi) = 1/\pi$.

For Problems 30–33, find the general solution to the given differential equation and the maximum interval on which the solution is valid.

30. y = sin x .
31. y = x −1/2 .
32. y = xex .
33. y = x n , n an integer.

For Problems 34–38, solve the given initial-value problem.

34. $y' = \ln x$, $y(1) = 2$.
35. $y'' = \cos x$, $y(0) = 2$, $y'(0) = 1$.
36. $y''' = 6x$, $y(0) = 1$, $y'(0) = -1$, $y''(0) = 4$.
37. $y'' = xe^x$, $y(0) = 3$, $y'(0) = 4$.
38. Prove that the general solution to $y'' - y = 0$ on any interval $I$ is $y(x) = c_1e^x + c_2e^{-x}$.

A second-order differential equation together with two auxiliary conditions imposed at different values of the independent variable is called a boundary-value problem. For Problems 39–40, solve the given boundary-value problem.

39. $y'' = e^{-x}$, $y(0) = 1$, $y(1) = 0$.
40. $y'' = -2(3 + 2\ln x)$, $y(1) = y(e) = 0$.

41. The differential equation $y'' + y = 0$ has the general solution $y(x) = c_1\cos x + c_2\sin x$.
(a) Show that the boundary-value problem $y'' + y = 0$, $y(0) = 0$, $y(\pi) = 1$ has no solutions.
(b) Show that the boundary-value problem $y'' + y = 0$, $y(0) = 0$, $y(\pi) = 0$ has an infinite number of solutions.

For Problems 42–47, verify that the given function is a solution to the given differential equation. In these problems, $c_1$ and $c_2$ are arbitrary constants. Throughout the text, the symbol refers to exercises for which some form of technology, such as a graphing calculator or computer algebra system (CAS), is recommended.

42. $y(x) = c_1e^{2x} + c_2e^{-3x}$, $y'' + y' - 6y = 0$.
43. $y(x) = c_1x^4 + c_2x^{-2}$, $x^2y'' - xy' - 8y = 0$, $x > 0$.
44. $y(x) = c_1x^2 + c_2x^2\ln x + \tfrac{1}{6}x^2(\ln x)^3$, $x^2y'' - 3xy' + 4y = x^2\ln x$, $x > 0$.
45. $y(x) = x^a[c_1\cos(b\ln x) + c_2\sin(b\ln x)]$, $x^2y'' + (1 - 2a)xy' + (a^2 + b^2)y = 0$, $x > 0$, where $a$ and $b$ are arbitrary constants.
46. $y(x) = c_1e^x + c_2e^{-x}(1 + 2x + 2x^2)$, $xy'' - 2y' + (2 - x)y = 0$, $x > 0$.
47. $y(x) = \displaystyle\sum_{k=0}^{10}\frac{1}{k!}x^k$, $xy'' - (x + 10)y' + 10y = 0$.

48. (a) Derive the polynomial of degree five that satisfies both the Legendre equation

$$(1 - x^2)y'' - 2xy' + 30y = 0$$

and the normalization condition $y(1) = 1$.
(b) Sketch your solution from (a) and determine approximations to all zeros and local maxima and local minima on the interval $(-1, 1)$.

49. One solution to the Bessel equation of (nonnegative) integer order $N$,

$$x^2y'' + xy' + (x^2 - N^2)y = 0,$$

is

$$y(x) = J_N(x) = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(N + k)!}\left(\frac{x}{2}\right)^{2k+N}.$$

(a) Write the first three terms of $J_0(x)$.
(b) Let $J(0, x, m)$ denote the $m$th partial sum

$$J(0, x, m) = \sum_{k=0}^{m}\frac{(-1)^k}{(k!)^2}\left(\frac{x}{2}\right)^{2k}.$$

Plot $J(0, x, 4)$ and use your plot to approximate the first positive zero of $J_0(x)$. Compare your value against a tabulated value or one generated by a computer algebra system.
(c) Plot $J_0(x)$ and $J(0, x, 4)$ on the same axes over the interval $[0, 2]$. How well do they compare?
(d) If your system has built-in Bessel functions, plot $J_0(x)$ and $J(0, x, m)$ on the same axes over the interval $[0, 10]$ for various values of $m$.
What is the smallest value of $m$ that gives an accurate approximation to the first three positive zeros of $J_0(x)$?

1.3 The Geometry of First-Order Differential Equations

The primary aim of this chapter is to study the first-order differential equation

$$\frac{dy}{dx} = f(x, y), \tag{1.3.1}$$

where $f(x, y)$ is a given function of $x$ and $y$. In this section we focus our attention mainly on the geometric aspects of the differential equation and its solutions.

The graph of any solution to the differential equation (1.3.1) is called a solution curve. If we recall the geometric interpretation of the derivative $dy/dx$ as giving the slope of the tangent line at any point on the curve with equation $y = y(x)$, we see that the function $f(x, y)$ in (1.3.1) gives the slope of the tangent line to the solution curve passing through the point $(x, y)$. Consequently, when we solve Equation (1.3.1), we are finding all curves whose slope at the point $(x, y)$ is given by the function $f(x, y)$. According to our definition in the previous section, the general solution to the differential equation (1.3.1) will involve one arbitrary constant, and therefore, geometrically, the general solution gives a family of solution curves in the $xy$-plane, one solution curve corresponding to each value of the arbitrary constant.

Example 1.3.1

Find the general solution to the differential equation $dy/dx = 2x$, and sketch the corresponding solution curves.

Solution: The differential equation can be integrated directly to obtain $y(x) = x^2 + c$. Consequently, the solution curves are a family of parabolas in the $xy$-plane. This is illustrated in Figure 1.3.1.

Figure 1.3.1: Some solution curves for the differential equation $dy/dx = 2x$.

Figure 1.3.2 gives a Mathematica plot of some solution curves to the differential equation

$$\frac{dy}{dx} = y - x^2.$$

This illustrates that, in general, the solution curves of a differential equation are quite complicated. Upon completion of the material in this section, the reader will be able to obtain a sketch like Figure 1.3.2 without needing a computer algebra system.

Figure 1.3.2: Some solution curves for the differential equation $dy/dx = y - x^2$.

Existence and Uniqueness of Solutions

It is useful for the further analysis of the differential equation (1.3.1) to give at this point a brief discussion of the existence and uniqueness of solutions to the corresponding initial-value problem

$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0. \tag{1.3.2}$$

Geometrically, we are interested in finding the particular solution curve to the differential equation that passes through the point in the $xy$-plane with coordinates $(x_0, y_0)$. The following questions arise regarding the initial-value problem:

1. Existence: Does the initial-value problem have any solutions?
2. Uniqueness: If the answer to question 1 is yes, does the initial-value problem have only one solution?

Certainly, in the case of an applied problem we would be interested only in initial-value problems that have precisely one solution. The following theorem establishes conditions on $f$ that guarantee the existence and uniqueness of a solution to the initial-value problem (1.3.2).

Theorem 1.3.2 (Existence and Uniqueness Theorem)

Let $f(x, y)$ be a function that is continuous on the rectangle

$$R = \{(x, y) : a \le x \le b,\ c \le y \le d\}.$$

Suppose further that $\partial f/\partial y$ is continuous in $R$. Then for any interior point $(x_0, y_0)$ in the rectangle $R$, there exists an interval $I$ containing $x_0$ such that the initial-value problem (1.3.2) has a unique solution for $x$ in $I$.

Proof: A complete proof of this theorem can be found, for example, in G. F. Simmons, Differential Equations (New York: McGraw-Hill, 1972).
Figure 1.3.3 gives a geometric illustration of the result.

Remark: From a geometric viewpoint, if $f(x, y)$ satisfies the hypotheses of the existence and uniqueness theorem in a region $R$ of the $xy$-plane, then throughout that region the solution curves of the differential equation $dy/dx = f(x, y)$ cannot intersect. For if two solution curves did intersect at $(x_0, y_0)$ in $R$, then that would imply there was more than one solution to the initial-value problem

$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0,$$

which would contradict the existence and uniqueness theorem.

Figure 1.3.3: Illustration of the existence and uniqueness theorem for first-order differential equations (rectangle $R$ bounded by $x = a$, $x = b$, $y = c$, $y = d$; interior point $(x_0, y_0)$; unique solution on the interval $I$).

The following example illustrates how the preceding theorem can be used to establish the existence of a unique solution to a differential equation, even though at present we do not know how to determine the solution.

Example 1.3.3

Prove that the initial-value problem

$$\frac{dy}{dx} = 3xy^{1/3}, \quad y(0) = a$$

has a unique solution whenever $a \neq 0$.

Solution: In this case the initial point is $x_0 = 0$, $y_0 = a$, and $f(x, y) = 3xy^{1/3}$. Hence, $\partial f/\partial y = xy^{-2/3}$. Consequently, $f$ is continuous at all points in the $xy$-plane, whereas $\partial f/\partial y$ is continuous at all points not lying on the $x$-axis ($y = 0$). Provided $a \neq 0$, we can certainly draw a rectangle containing $(0, a)$ that does not intersect the $x$-axis (see Figure 1.3.4). In any such rectangle the hypotheses of the existence and uniqueness theorem are satisfied, and therefore the initial-value problem does indeed have a unique solution.

Figure 1.3.4: The initial-value problem in Example 1.3.3 satisfies the hypotheses of the existence and uniqueness theorem in the small rectangle, but not in the large rectangle.
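The hypothesis check in Example 1.3.3 can also be seen numerically. The sketch below, assuming Python with only the standard library and using hypothetical function names, samples $|\partial f/\partial y| = |x|\,|y|^{-2/3}$ on the rectangle $-1 \le x \le 1$, $0.5 \le y \le 1.5$ around $(0, 1)$, which stays clear of the $x$-axis, and then evaluates it along $x = 1$ as $y \to 0^+$, where it grows without bound. Sampling a grid only suggests boundedness; the continuity argument itself is the analytic one given in the text.

```python
import math


def cbrt(y: float) -> float:
    """Real cube root; y ** (1/3) would return a complex number for y < 0."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def f(x: float, y: float) -> float:
    """Right-hand side f(x, y) = 3 x y^(1/3) of Example 1.3.3."""
    return 3.0 * x * cbrt(y)


def df_dy(x: float, y: float) -> float:
    """Partial derivative x * y^(-2/3); raises ZeroDivisionError on the x-axis (y = 0)."""
    return x / cbrt(y) ** 2


def max_abs_df_dy(x_min: float, x_max: float, y_min: float, y_max: float, n: int = 50) -> float:
    """Largest |df/dy| over an (n + 1) x (n + 1) grid of sample points in the rectangle."""
    best = 0.0
    for i in range(n + 1):
        x = x_min + (x_max - x_min) * i / n
        for j in range(n + 1):
            y = y_min + (y_max - y_min) * j / n
            best = max(best, abs(df_dy(x, y)))
    return best


if __name__ == "__main__":
    # Rectangle around (0, 1) that does not meet the x-axis: the partial derivative stays bounded.
    print(max_abs_df_dy(-1.0, 1.0, 0.5, 1.5))
    # Approaching the x-axis along x = 1: the partial derivative grows without bound.
    for y in (1e-2, 1e-4, 1e-6):
        print(y, df_dy(1.0, y))
```

The helper `cbrt` is needed because the text's $y^{1/3}$ denotes the real cube root, which is defined for negative $y$ as well.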
Example 1.3.4

Discuss the existence and uniqueness of solutions to the initial-value problem

$$\frac{dy}{dx} = 3xy^{1/3}, \quad y(0) = 0.$$

Solution: The differential equation is the same as in the previous example, but the initial condition is imposed on the $x$-axis. Since $\partial f/\partial y = xy^{-2/3}$ is not continuous along the $x$-axis, there is no rectangle containing $(0, 0)$ in which the hypotheses of the existence and uniqueness theorem are satisfied. We can therefore draw no conclusion from the theorem itself. We leave it as an exercise to verify by direct substitution that the given initial-value problem does in fact have the following two solutions:

$$y(x) = 0 \quad \text{and} \quad y(x) = x^3.$$

Consequently, in this case the initial-value problem does not have a unique solution.

Slope Fields

We now return to our discussion of the geometry of solutions to the differential equation

$$\frac{dy}{dx} = f(x, y).$$

The fact that the function $f(x, y)$ gives the slope of the tangent line to the solution curves of this differential equation leads to a simple and important idea for determining the overall shape of the solution curves. We compute the value of $f(x, y)$ at several points and, through each of the corresponding points in the $xy$-plane, draw a small line segment having slope $f(x, y)$. The resulting sketch is called the slope field for the differential equation. The key point is that each solution curve must be tangent to the line segments that we have drawn, and therefore by studying the slope field we can obtain the general shape of the solution curves.

Example 1.3.5

Sketch the slope field for the differential equation $dy/dx = 2x^2$.

Table 1.3.1: Values of the slope for the differential equation in Example 1.3.5.

    x              0     ±0.2    ±0.4    ±0.6    ±0.8    ±1.0
    Slope = 2x²    0     0.08    0.32    0.72    1.28    2

Solution: The slope of the solution curves to the differential equation at each point in the $xy$-plane depends on $x$ only.
Consequently, the slopes of the solution curves will be the same at every point on any line parallel to the $y$-axis (on such a line, $x$ is constant). Table 1.3.1 contains the values of the slope of the solution curves at various points in the interval $[-1, 1]$. Using this information, we obtain the slope field shown in Figure 1.3.5. In this example, we can integrate the differential equation to obtain the general solution

$$y(x) = \frac{2}{3}x^3 + c.$$

Some solution curves and their relation to the slope field are also shown in Figure 1.3.5.

Figure 1.3.5: Slope field and some representative solution curves for the differential equation $dy/dx = 2x^2$.

In the preceding example, the slope field could be obtained fairly easily because the slopes of the solution curves to the differential equation were constant on lines parallel to the $y$-axis. For more complicated differential equations, further analysis is generally required if we wish to obtain an accurate plot of the slope field and the behavior of the corresponding solution curves. Below we have listed three useful procedures.

1. Isoclines: For the differential equation

$$\frac{dy}{dx} = f(x, y), \tag{1.3.3}$$

the function $f(x, y)$ determines the regions in the $xy$-plane where the slope of the solution curves is positive, as well as those where it is negative. Furthermore, every solution curve has the same slope $k$ wherever it crosses a curve of the family $f(x, y) = k$. These curves are called the isoclines of the differential equation, and they can be very useful in determining slope fields. When sketching a slope field, we often start by drawing several isoclines and the corresponding line segments with slope $k$ at various points along them.

2. Equilibrium Solutions: Any solution to the differential equation (1.3.3) of the form $y(x) = y_0$, where $y_0$ is a constant, is called an equilibrium solution to the differential equation.
The corresponding solution curve is a line parallel to the $x$-axis. From Equation (1.3.3), equilibrium solutions are given by any constant values of $y$ for which $f(x, y) = 0$ for all $x$, and therefore can often be obtained by inspection. For example, the differential equation

$$\frac{dy}{dx} = (y - x)(y + 1)$$

has the equilibrium solution $y(x) = -1$. One reason that equilibrium solutions are useful in sketching slope fields and determining the general behavior of the full family of solution curves is that, from the existence and uniqueness theorem, we know that (in any region where the hypotheses of that theorem hold) no other solution curves can intersect the solution curve corresponding to an equilibrium solution. Consequently, equilibrium solutions serve to divide the $xy$-plane into different regions.

3. Concavity Changes: By differentiating Equation (1.3.3) implicitly with respect to $x$, we can obtain an expression for $d^2y/dx^2$ in terms of $x$ and $y$. This can be useful in determining the concavity of the solution curves to the differential equation (1.3.3).

The remaining examples illustrate the application of the foregoing procedures.

Example 1.3.6

Sketch the slope field for the differential equation

$$\frac{dy}{dx} = y - x. \tag{1.3.4}$$

Solution: By inspection we see that the differential equation has no equilibrium solutions. The isoclines of the differential equation are the family of straight lines $y - x = k$. Thus each solution curve of the differential equation has slope $k$ at all points along the line $y - x = k$. Table 1.3.2 contains several values for the slopes of the solution curves, and the equations of the corresponding isoclines. We note that the isocline $y = x + 1$ is itself a line of slope 1 and that, from Table 1.3.2, every solution curve that meets it also has slope 1 there. This implies that the isocline must in fact coincide with a solution curve.
Hence, one solution to the differential equation (1.3.4) is $y(x) = x + 1$, and, by the existence and uniqueness theorem, no other solution curve can intersect this one.

Table 1.3.2: Slope and isocline information for the differential equation in Example 1.3.6.

    Slope of Solution Curves    Equation of Isocline
    k = −2                      y = x − 2
    k = −1                      y = x − 1
    k = 0                       y = x
    k = 1                       y = x + 1
    k = 2                       y = x + 2

In order to determine the behavior of the concavity of the solution curves, we differentiate the given differential equation implicitly with respect to $x$ to obtain

$$\frac{d^2y}{dx^2} = \frac{dy}{dx} - 1 = y - x - 1,$$

where we have used (1.3.4) to substitute for $dy/dx$ in the second step. We see that the solution curves are concave up ($y'' > 0$) at all points above the line

$$y = x + 1 \tag{1.3.5}$$

and concave down ($y'' < 0$) at all points beneath this line. We also note that Equation (1.3.5) coincides with the particular solution already identified. Putting all of this information together, we obtain the slope field sketched in Figure 1.3.6.

Figure 1.3.6: Hand-drawn slope field, isoclines, and some approximate solution curves for the differential equation in Example 1.3.6.

Generating Slope Fields Using Technology

Many computer algebra systems (CAS) and graphing calculators have built-in programs to generate slope fields. As an example, in the CAS Maple the command

    diffeq := diff(y(x), x) = y(x) - x;

assigns the name diffeq to the differential equation considered in the previous example. After the DEtools package (which provides DEplot) has been loaded with

    with(DEtools):

the further command

    DEplot(diffeq, y(x), x = -3..3, y = -3..3, arrows = line);

then produces a sketch of the slope field for the differential equation on the square $-3 \le x \le 3$, $-3 \le y \le 3$.
Initial conditions such as $y(0) = 0$, $y(0) = 1$, $y(0) = 2$, $y(0) = -1$ can be specified using the command

    IC := {[0, 0], [0, 1], [0, 2], [0, -1]};

Then the command

    DEplot(diffeq, y(x), x = -3..3, IC, y = -3..3, arrows = line);

not only plots the slope field, but also gives a numerical approximation to each of the solution curves satisfying the specified initial conditions. Some of the methods that can be used to generate such numerical approximations will be discussed in Section 1.10. The preceding sequence of Maple commands was used to generate the Maple plot given in Figure 1.3.7. Clearly, the generation of slope fields and approximate solution curves is one area where technology can be extremely helpful.

Figure 1.3.7: Maple plot of the slope field and some approximate solution curves for the differential equation in Example 1.3.6 (axes $-3 \le x \le 3$, $-3 \le y \le 3$).

Example 1.3.7

Sketch the slope field and some approximate solution curves for the differential equation

$$\frac{dy}{dx} = y(2 - y). \tag{1.3.6}$$

Solution: We first note that the given differential equation has the two equilibrium solutions

$$y(x) = 0 \quad \text{and} \quad y(x) = 2.$$

Consequently, from Theorem 1.3.2, the $xy$-plane can be divided into the three distinct regions $y < 0$, $0 < y < 2$, and $y > 2$. From Equation (1.3.6), the sign of the slope of the solution curves in each of these regions is given in the following schematic.

    y-interval:       y < 0    |   0 < y < 2   |   y > 2
    sign of slope:      −      |       +       |     −

The isoclines are determined from

$$y(2 - y) = k.$$

That is, $y^2 - 2y + k = 0$, so that the solution curves have slope $k$ at all points of intersection with the horizontal lines

$$y = 1 \pm \sqrt{1 - k}. \tag{1.3.7}$$

Table 1.3.3 contains some of the isocline equations. Note from Equation (1.3.7) that the largest possible positive slope is $k = 1$. We see that the slopes of the solution curves quickly become very large and negative for $y$ outside the interval $[0, 2]$.
Finally, differentiating Equation (1.3.6) implicitly with respect to $x$ yields

$$\frac{d^2y}{dx^2} = 2\frac{dy}{dx} - 2y\frac{dy}{dx} = 2(1 - y)\frac{dy}{dx} = 2y(1 - y)(2 - y).$$

Table 1.3.3: Slope and isocline information for the differential equation in Example 1.3.7.

    Slope of Solution Curves    Equation of Isocline
    k = 1                       y = 1
    k = 0                       y = 2 and y = 0
    k = −1                      y = 1 ± √2
    k = −2                      y = 1 ± √3
    k = −3                      y = 3 and y = −1
    k = −n, n ≥ 1               y = 1 ± √(n + 1)

The sign of $d^2y/dx^2$ is given in the following schematic.

    y-interval:    y < 0   |   0 < y < 1   |   1 < y < 2   |   y > 2
    sign of y'':     −     |       +       |       −       |     +

Figure 1.3.8: Hand-drawn slope field, isoclines, and some solution curves for the differential equation $dy/dx = y(2 - y)$.

Using this information leads to the slope field sketched in Figure 1.3.8. We have also included some approximate solution curves. We see from the slope field that for any initial condition $y(x_0) = y_0$ with $0 \le y_0 \le 2$, the corresponding unique solution to the differential equation will be bounded. In contrast, if $y_0 > 2$, the slope field suggests that all corresponding solutions approach $y = 2$ as $x \to \infty$, whereas if $y_0 < 0$, then all corresponding solutions approach $y = 0$ as $x \to -\infty$. Furthermore, the behavior of the slope field also suggests that the solution curves that do not lie in the region $0 < y < 2$ may diverge at finite values of $x$. We leave it as an exercise to verify (by substitution into Equation (1.3.6)) that for all values of the constant $c$,

$$y(x) = \frac{2ce^{2x}}{ce^{2x} - 1}$$

is a solution to the given differential equation. We see that any initial condition that yields a positive value for $c$ will indeed lead to a solution that has a vertical asymptote at $x = \frac{1}{2}\ln(1/c)$.

The tools that we have introduced in this section enable us to analyze the solution behavior of many first-order differential equations. However, for complicated functions $f(x, y)$ in Equation (1.3.3), performing these computations by hand can be a tedious task.
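The three procedures used in Example 1.3.7 (equilibrium solutions, isoclines, and the sign of $d^2y/dx^2$) are simple enough to tabulate in a few lines. The following sketch, assuming Python with only the standard library and using hypothetical function names, evaluates the slope and concavity formulas, lists the isocline heights $y = 1 \pm \sqrt{1 - k}$, and checks with a central finite difference that $y(x) = 2ce^{2x}/(ce^{2x} - 1)$ satisfies Equation (1.3.6), blowing up near $x = \frac{1}{2}\ln(1/c)$ when $c > 0$. It is a numerical spot check, not a substitute for the algebraic verification the text asks for.

```python
import math


def slope(y: float) -> float:
    """Right-hand side of Equation (1.3.6): dy/dx = y(2 - y)."""
    return y * (2.0 - y)


def concavity(y: float) -> float:
    """d^2y/dx^2 = 2y(1 - y)(2 - y), obtained by implicit differentiation."""
    return 2.0 * y * (1.0 - y) * (2.0 - y)


def isocline(k: float) -> list:
    """Heights y = 1 +/- sqrt(1 - k) where every solution curve has slope k (none if k > 1)."""
    if k > 1.0:
        return []
    r = math.sqrt(1.0 - k)
    return sorted({1.0 - r, 1.0 + r})


def closed_form(x: float, c: float) -> float:
    """Candidate solution y(x) = 2c e^{2x} / (c e^{2x} - 1)."""
    e = c * math.exp(2.0 * x)
    return 2.0 * e / (e - 1.0)


def asymptote(c: float) -> float:
    """For c > 0 the denominator c e^{2x} - 1 vanishes at x = (1/2) ln(1/c)."""
    return 0.5 * math.log(1.0 / c)


def ode_residual(x: float, c: float, h: float = 1e-5) -> float:
    """|y'(x) - y(2 - y)|, with y'(x) approximated by a central difference."""
    dydx = (closed_form(x + h, c) - closed_form(x - h, c)) / (2.0 * h)
    return abs(dydx - slope(closed_form(x, c)))


if __name__ == "__main__":
    for k in (1.0, 0.0, -1.0, -3.0):
        print("slope", k, "on y =", isocline(k))
    for y in (-1.0, 0.5, 1.5, 3.0):
        print("y =", y, "sign of y'':", math.copysign(1.0, concavity(y)))
    print("asymptote for c = 2:", asymptote(2.0))
    print("residual at x = 0.5, c = 2:", ode_residual(0.5, 2.0))
```

For $c = 2$ the predicted asymptote is $x = \frac{1}{2}\ln(1/2) \approx -0.347$; for $c < 0$ the denominator never vanishes and the solution stays between the equilibria $y = 0$ and $y = 2$.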
Fortunately, as we have illustrated, there are many computer programs available for drawing slope fields and generating solution curves (numerically). Furthermore, several graphing calculators also have these capabilities.

Exercises for 1.3

Key Terms

Solution curve, Existence and uniqueness theorem, Slope field, Isocline, Equilibrium solution.

Skills

• Be able to find isoclines for a differential equation $dy/dx = f(x, y)$.
• Be able to determine equilibrium solutions for a differential equation $dy/dx = f(x, y)$.
• Be able to sketch the slope field for a differential equation, using isoclines, equilibrium solutions, and concavity changes.
• Be able to sketch solution curves to a differential equation.
• Be able to apply the existence and uniqueness theorem to determine whether an initial-value problem has a unique solution.

True-False Review

For Questions 1–7, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.

1. If $f(x, y)$ satisfies the hypotheses of the existence and uniqueness theorem in a region $R$ of the $xy$-plane, then the solution curves to a differential equation $dy/dx = f(x, y)$ cannot intersect in $R$.
2. Every differential equation $dy/dx = f(x, y)$ has at least one equilibrium solution.
3. The differential equation $dy/dx = x(y^2 - 4)$ has no equilibrium solutions.
4. The circle $x^2 + y^2 = 4$ is an isocline for the differential equation $dy/dx = x^2 + y^2$.
5. The equilibrium solutions of a differential equation are always parallel to one another.
6. The isoclines for the differential equation
$$\frac{dy}{dx} = \frac{x^2 + y^2}{2y}$$
are the family of circles $x^2 + (y - k)^2 = k^2$.
7. No solution to the differential equation $dy/dx = f(x, y)$ can intersect with equilibrium solutions of the differential equation.

Problems

For Problems 1–7, determine the differential equation giving the slope of the tangent line at the point $(x, y)$ for the given family of curves.

1. $y = c/x$.
2. $y = cx^2$.
3. $x^2 + y^2 = 2cx$.
4. $y^2 = cx$.
5. $2cy = x^2 - c^2$.
6. $y^2 - x^2 = c$.
7. $(x - c)^2 + (y - c)^2 = 2c^2$.

For Problems 8–11, verify that the given function (or relation) defines a solution to the given differential equation and sketch some of the solution curves. If an initial condition is given, label the solution curve corresponding to the resulting unique solution. (In these problems, $c$ denotes an arbitrary constant.)

8. $x^2 + y^2 = c$, $y' = -x/y$.
9. $y = cx^3$, $y' = 3y/x$, $y(2) = 8$.
10. $y^2 = cx$, $2x\,dy - y\,dx = 0$, $y(1) = 2$.
11. $(x - c)^2 + y^2 = c^2$, $y' = \dfrac{y^2 - x^2}{2xy}$, $y(2) = 2$.

12. Prove that the initial-value problem $y' = x\sin(x + y)$, $y(0) = 1$ has a unique solution.

13. Use the existence and uniqueness theorem to prove that $y(x) = 3$ is the only solution to the initial-value problem

$$y' = \frac{x}{x^2 + 1}(y^2 - 9), \quad y(0) = 3.$$

14. Do you think that the initial-value problem $y' = xy^{1/2}$, $y(0) = 0$ has a unique solution? Justify your answer.

15. Even simple-looking differential equations can have complicated solution curves. In this problem, we study the solution curves of the differential equation

$$y' = -2xy^2. \tag{1.3.8}$$

(a) Verify that the hypotheses of the existence and uniqueness theorem (Theorem 1.3.2) are satisfied for the initial-value problem $y' = -2xy^2$, $y(x_0) = y_0$ for every $(x_0, y_0)$. This establishes that the initial-value problem always has a unique solution on some interval containing $x_0$.
(b) Verify that for all values of the constant $c$, $y(x) = 1/(x^2 + c)$ is a solution to (1.3.8).
(c) Use the solution to (1.3.8) given in (b) to solve the following initial-value problems. For each case, sketch the corresponding solution curve, and state the maximum interval on which your solution is valid.
(i) $y' = -2xy^2$, $y(0) = 1$.
(ii) $y' = -2xy^2$, $y(1) = 1$.
(iii) $y' = -2xy^2$, $y(0) = -1$.
(d) What is the unique solution to the following initial-value problem?
$$y' = -2xy^2, \quad y(0) = 0.$$

16. Consider the initial-value problem

$$y' = y(y - 1), \quad y(x_0) = y_0.$$

(a) Verify that the hypotheses of the existence and uniqueness theorem are satisfied for this initial-value problem for any $x_0$, $y_0$. This establishes that the initial-value problem always has a unique solution on some interval containing $x_0$.
(b) By inspection, determine all equilibrium solutions to the differential equation.
(c) Determine the regions in the $xy$-plane where the solution curves are concave up, and determine those regions where they are concave down.
(d) Sketch the slope field for the differential equation, and determine all values of $y_0$ for which the initial-value problem has bounded solutions. On your slope field, sketch representative solution curves in the three cases $y_0 < 0$, $0 < y_0 < 1$, and $y_0 > 1$.

For Problems 17–24, sketch the slope field and some representative solution curves for the given differential equation.

17. $y' = 4x$.
18. $y' = 1/x$.
19. $y' = x + y$.
20. $y' = x/y$.
21. $y' = -4x/y$.
22. $y' = x^2y$.
23. $y' = x^2\cos y$.
24. $y' = x^2 + y^2$.

25. According to Newton's law of cooling (see Section 1.1), the temperature of an object at time $t$ is governed by the differential equation

$$\frac{dT}{dt} = -k(T - T_m),$$

where $T_m$ is the temperature of the surrounding medium, and $k$ is a constant. Consider the case when $T_m = 70$ and $k = 1/80$. Sketch the corresponding slope field and some representative solution curves. What happens to the temperature of the object as $t \to \infty$? Note that this result is independent of the initial temperature of the object.

For Problems 26–31, determine the slope field and some representative solution curves for the given differential equation.

26. $y' = -2xy$.
27. y=
28. $y' = 3x - y$.
29. $y' = 2x^2\sin y$.
30. $y' = \dfrac{2 + y^2}{3 + 0.5x^2}$.
31. y= 1 − y2 .
2 + 0.5x 2 x sin x . 1 + y2 32. (a) Determine the slope field for the differential equation y = x −1 (3 sin x − y) on the interval (0, 10]. (b) Plot the solution curves corresponding to each of the following initial conditions: y(0.5) = 0, y(1) = 2, y(1) = −1, y(3) = 0. What do you conclude about the behavior as x → 0+ of solutions to the differential equation? i i i i i i i “main” 2007/2/16 page 32 i 32 CHAPTER 1 First-Order Differential Equations (c) Plot the solution curve corresponding to the initial condition y(π/2) = 6/π . How does this fit in with your answer to part (b)? (d) Describe the behavior of the solution curves for large positive x . bers of the given family of curves. Describe the family of orthogonal trajectories. 34. Consider the differential equation Consider the family of curves y = kx 2 , where k is a constant. di + ai = b, dt (a) Show that the differential equation of the family of orthogonal trajectories is 33. where a and b are constants. By drawing the slope fields corresponding to various values of a and b, formulate a conjecture regarding the value of dy x =− . dx 2y lim i(t). t →∞ (b) On the same axes sketch the slope field for the preceding differential equation and several mem- 1.4 Separable Differential Equations In the previous section we analyzed first-order differential equations using qualitative techniques. We now begin an analytical study of these differential equations by developing some solution techniques that enable us to determine the exact solution to certain types of differential equations. The simplest differential equations for which a solution technique can be obtained are the so-called separable equations, which are defined as follows: DEFINITION 1.4.1 A first-order differential equation is called separable if it can be written in the form p(y) dy = q(x). dx (1.4.1) The solution technique for a separable differential equation is given in Theorem 1.4.2. 
Theorem 1.4.2
If $p(y)$ and $q(x)$ are continuous, then Equation (1.4.1) has the general solution
$$\int p(y)\,dy = \int q(x)\,dx + c, \qquad (1.4.2)$$
where $c$ is an arbitrary constant.

Proof  We use the chain rule for derivatives to rewrite Equation (1.4.1) in the equivalent form
$$\frac{d}{dx}\left[\int p(y)\,dy\right] = q(x).$$
Integrating both sides of this equation with respect to $x$ yields Equation (1.4.2).

Remark  In differential form, Equation (1.4.1) can be written as $p(y)\,dy = q(x)\,dx$, and the general solution (1.4.2) is obtained by integrating the left-hand side with respect to $y$ and the right-hand side with respect to $x$. This is the general procedure for solving separable equations.

Example 1.4.3
Solve
$$(1 + y^2)\frac{dy}{dx} = x\cos x.$$
Solution: By inspection we see that the differential equation is separable. Integrating both sides of the differential equation yields
$$\int (1 + y^2)\,dy = \int x\cos x\,dx + c.$$
Using integration by parts to evaluate the integral on the right-hand side, we obtain
$$y + \tfrac{1}{3}y^3 = x\sin x + \cos x + c,$$
or equivalently
$$y^3 + 3y = 3(x\sin x + \cos x) + c_1,$$
where $c_1 = 3c$. As often happens with separable differential equations, the solution is given in implicit form.

In general, the differential equation $dy/dx = f(x)g(y)$ is separable, since it can be written as
$$\frac{1}{g(y)}\frac{dy}{dx} = f(x),$$
which is of the form of Equation (1.4.1) with $p(y) = 1/g(y)$. It is important to note, however, that in writing the given differential equation in this way, we have assumed that $g(y) \neq 0$. Thus the general solution to the resulting differential equation may not include solutions of the original equation corresponding to any values of $y$ for which $g(y) = 0$. (These are the equilibrium solutions for the original differential equation.) We will illustrate with an example.

Example 1.4.4
Find all solutions to
$$y' = -2y^2x. \qquad (1.4.3)$$
Solution: Separating the variables yields
$$y^{-2}\,dy = -2x\,dx. \qquad (1.4.4)$$
Integrating both sides, we obtain $-y^{-1} = -x^2 + c$, so that
$$y(x) = \frac{1}{x^2 - c}. \qquad (1.4.5)$$
This is the general solution to Equation (1.4.4). It is not the general solution to Equation (1.4.3), since there is no value of the constant $c$ for which $y(x) = 0$, whereas by inspection we see that $y(x) = 0$ is a solution to Equation (1.4.3). This solution is not contained in (1.4.5), since in separating the variables we divided by $y^2$ and hence assumed implicitly that $y \neq 0$. Thus the solutions to Equation (1.4.3) are
$$y(x) = \frac{1}{x^2 - c} \quad\text{and}\quad y(x) = 0.$$
The slope field for the given differential equation is depicted in Figure 1.4.1, together with some representative solution curves.

Figure 1.4.1: The slope field and some solution curves for the differential equation $dy/dx = -2xy^2$.

Many difficulties that students encounter with first-order differential equations arise not from the solution techniques themselves, but from the algebraic simplifications used to obtain a simple form for the resulting solution. We will explicitly illustrate some of the standard simplifications using the differential equation
$$\frac{dy}{dx} = -2xy.$$
First notice that $y(x) = 0$ is an equilibrium solution to the differential equation. Consequently, no other solution curves can cross the $x$-axis. For $y \neq 0$ we can separate the variables to obtain
$$\frac{1}{y}\,dy = -2x\,dx. \qquad (1.4.6)$$
Integrating this equation yields $\ln|y| = -x^2 + c$. Exponentiating both sides of this solution gives $|y| = e^{-x^2 + c}$, or equivalently, $|y| = e^{c}e^{-x^2}$. We now introduce a new constant $c_1$ defined by $c_1 = e^{c}$. Then the preceding expression for $|y|$ reduces to
$$|y| = c_1e^{-x^2}. \qquad (1.4.7)$$
Notice that $c_1$ is a positive constant. This is a perfectly acceptable form for the solution.
However, a redefinition of the integration constant can be used to eliminate the absolute-value bars as follows. According to (1.4.7), the solution to the differential equation is
$$y(x) = \begin{cases} c_1e^{-x^2}, & \text{if } y > 0,\\ -c_1e^{-x^2}, & \text{if } y < 0. \end{cases} \qquad (1.4.8)$$
We can now define a new constant $c_2$ by
$$c_2 = \begin{cases} c_1, & \text{if } y > 0,\\ -c_1, & \text{if } y < 0, \end{cases}$$
in terms of which the solutions given in (1.4.8) can be combined into the single formula
$$y(x) = c_2e^{-x^2}. \qquad (1.4.9)$$
The appropriate sign for $c_2$ will be determined from the initial conditions. For example, the initial condition $y(0) = 1$ would require that $c_2 = 1$, with corresponding unique solution $y(x) = e^{-x^2}$. Similarly, the initial condition $y(0) = -1$ leads to $c_2 = -1$, so that $y(x) = -e^{-x^2}$.

We make one further point about the solution (1.4.9). In obtaining the separable form (1.4.6), we divided the given differential equation by $y$, and so the derivation of the solution assumes that $y \neq 0$. However, as we have already noted, $y(x) = 0$ is indeed a solution to this differential equation. Formally this solution is the special case $c_2 = 0$ in (1.4.9) and corresponds to the initial condition $y(0) = 0$. Thus (1.4.9) does give the general solution to the differential equation, provided we allow $c_2$ to assume the value zero. The slope field for the differential equation, together with some particular solution curves, is shown in Figure 1.4.2.

Figure 1.4.2: Slope field and some solution curves for the differential equation $dy/dx = -2xy$.

Example 1.4.5
An object of mass $m$ falls from rest, starting at a point near the earth's surface. Assuming that the air resistance is proportional to the velocity of the object, determine the subsequent motion.

Solution: Let $y(t)$ be the distance traveled by the object at time $t$ from the point it was released, and let the positive $y$-direction be downward.
Then $y(0) = 0$, and the velocity of the object is $v(t) = dy/dt$. Since the object was dropped from rest, we have $v(0) = 0$. The forces acting on the object are those due to gravity, $F_g = mg$, and the force due to air resistance, $F_r = -kv$, where $k$ is a positive constant (see Figure 1.4.3). According to Newton's second law, the differential equation describing the motion of the object is
$$m\frac{dv}{dt} = F_g + F_r = mg - kv.$$

Figure 1.4.3: Particle falling under the influence of gravity and air resistance.

We are also given the initial condition $v(0) = 0$. Thus the initial-value problem governing the behavior of $v$ is
$$m\frac{dv}{dt} = mg - kv, \qquad v(0) = 0. \qquad (1.4.10)$$
Separating the variables in Equation (1.4.10) yields
$$\frac{m}{mg - kv}\,dv = dt,$$
which can be integrated directly to obtain
$$-\frac{m}{k}\ln|mg - kv| = t + c.$$
Multiplying both sides of this equation by $-k/m$ and exponentiating the result yields
$$|mg - kv| = c_1e^{-(k/m)t},$$
where $c_1 = e^{-ck/m}$. By redefining the constant $c_1$, we can write this in the equivalent form
$$mg - kv = c_2e^{-(k/m)t}.$$
Hence,
$$v(t) = \frac{mg}{k} - c_3e^{-(k/m)t}, \qquad (1.4.11)$$
where $c_3 = c_2/k$. Imposing the initial condition $v(0) = 0$ yields $c_3 = mg/k$. So the solution to the initial-value problem (1.4.10) is
$$v(t) = \frac{mg}{k}\left(1 - e^{-(k/m)t}\right). \qquad (1.4.12)$$
Notice that the velocity does not increase indefinitely, but approaches a so-called limiting velocity $v_L$ defined by
$$v_L = \lim_{t\to\infty} v(t) = \lim_{t\to\infty} \frac{mg}{k}\left(1 - e^{-(k/m)t}\right) = \frac{mg}{k}.$$
The behavior of the velocity as a function of time is shown in Figure 1.4.4. Owing to the negative exponent in (1.4.11), we see that this result is independent of the value of the initial velocity.

Figure 1.4.4: The behavior of the velocity of the object in Example 1.4.5 ($v$ plotted against $t$, with the level $mg/k$ marked).

Since $dy/dt = v$, it follows from (1.4.12) that the position of the object at time $t$ can be determined by solving the initial-value problem
$$\frac{dy}{dt} = \frac{mg}{k}\left(1 - e^{-(k/m)t}\right), \qquad y(0) = 0.$$
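The velocity formula (1.4.12) can be sanity-checked numerically before integrating for position. The Python sketch below uses only the standard library. It integrates the initial-value problem (1.4.10) with the classical fourth-order Runge-Kutta method and compares the result with the closed form. The parameter values $m = 2$, $k = 0.5$, $g = 9.8$ and the names `v_exact` and `v_rk4` are hypothetical choices made for illustration and do not come from the example. If the derivation is right, the two columns should agree closely, and the velocity should level off near $mg/k$.

```python
import math


def v_exact(t, m, k, g):
    """Closed-form velocity (1.4.12): v(t) = (mg/k) * (1 - exp(-(k/m) t))."""
    return (m * g / k) * (1.0 - math.exp(-(k / m) * t))


def v_rk4(t_end, m, k, g, n_steps=1000):
    """Integrate m dv/dt = mg - kv, v(0) = 0, with classical RK4 on [0, t_end]."""

    def rhs(v):
        # dv/dt depends only on v here (the equation is autonomous).
        return g - (k / m) * v

    h = t_end / n_steps
    v = 0.0  # dropped from rest
    for _ in range(n_steps):
        s1 = rhs(v)
        s2 = rhs(v + 0.5 * h * s1)
        s3 = rhs(v + 0.5 * h * s2)
        s4 = rhs(v + h * s3)
        v += (h / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return v


# Hypothetical parameter values (SI units assumed), for illustration only.
m, k, g = 2.0, 0.5, 9.8
for t in (1.0, 5.0, 20.0, 60.0):
    print(f"t={t:5.1f}  exact={v_exact(t, m, k, g):9.5f}  rk4={v_rk4(t, m, k, g):9.5f}")
print("limiting velocity mg/k =", m * g / k)
```

RK4 is used instead of Euler's method only so that a modest step count keeps the numerical error far below the printed precision. Any reasonable integrator would serve the same purpose.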
The differential equation can be integrated directly to obtain
$$y(t) = \frac{mg}{k}\left(t + \frac{m}{k}e^{-(k/m)t}\right) + c.$$
Imposing the initial condition $y(0) = 0$ yields
$$c = -\frac{m^2g}{k^2},$$
so that
$$y(t) = \frac{mg}{k}\left[t + \frac{m}{k}\left(e^{-(k/m)t} - 1\right)\right].$$

Example 1.4.6
A hot metal bar whose temperature is 350°F is placed in a room whose temperature is constant at 70°F. After two minutes, the temperature of the bar is 210°F. Using Newton's law of cooling, determine
1. the temperature of the bar after four minutes.
2. the time required for the bar to cool to 100°F.

Solution: According to Newton's law of cooling (see Section 1.1), the temperature of the object at time $t$ is governed by the differential equation
$$\frac{dT}{dt} = -k(T - T_m), \qquad (1.4.13)$$
where, from the statement of the problem, $T_m = 70$°F, $T(0) = 350$°F, and $T(2) = 210$°F. Substituting for $T_m$ in Equation (1.4.13), we have the separable equation
$$\frac{dT}{dt} = -k(T - 70).$$
Separating the variables yields
$$\frac{1}{T - 70}\,dT = -k\,dt,$$
which we can integrate immediately to obtain $\ln|T - 70| = -kt + c$. Exponentiating both sides and solving for $T$ yields
$$T(t) = 70 + c_1e^{-kt}, \qquad (1.4.14)$$
where we have redefined the integration constant. The two constants $c_1$ and $k$ can be determined from the given auxiliary conditions as follows. The condition $T(0) = 350$°F requires that $350 = 70 + c_1$. Hence, $c_1 = 280$. Substituting this value for $c_1$ into (1.4.14) yields
$$T(t) = 70(1 + 4e^{-kt}). \qquad (1.4.15)$$
Consequently, $T(2) = 210$°F if and only if $210 = 70(1 + 4e^{-2k})$, so that $e^{-2k} = \frac{1}{2}$. Hence, $k = \frac{1}{2}\ln 2$, and so, from (1.4.15),
$$T(t) = 70\left(1 + 4e^{-(t/2)\ln 2}\right). \qquad (1.4.16)$$
We can now determine the quantities requested.
1. We have
$$T(4) = 70\left(1 + 4e^{-2\ln 2}\right) = 70\left(1 + 4\cdot\frac{1}{2^2}\right) = 140\text{°F}.$$
2. From (1.4.16), $T(t) = 100$°F when $100 = 70\left(1 + 4e^{-(t/2)\ln 2}\right)$, that is, when $e^{-(t/2)\ln 2} = \frac{3}{28}$. Taking the natural logarithm of both sides and solving for $t$ yields
$$t = \frac{2\ln(28/3)}{\ln 2} \approx 6.4 \text{ minutes}.$$

Exercises for 1.4

Skills
• Be able to recognize whether or not a given differential equation is separable.
• Be able to solve separable differential equations.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. Every differential equation of the form $dy/dx = f(x)g(y)$ is separable.
2. The general solution to a separable differential equation contains one constant whose value can be determined from an initial condition for the differential equation.
3. Newton's law of cooling is a separable differential equation.
4. The differential equation $dy/dx = x^2 + y^2$ is separable.
5. The differential equation $dy/dx = x\sin(xy)$ is separable.
6. The differential equation $\dfrac{dy}{dx} = e^{x+y}$ is separable.
7. The differential equation $\dfrac{dy}{dx} = \dfrac{1}{x^2(1 + y^2)}$ is separable.
8. The differential equation $\dfrac{dy}{dx} = \dfrac{x + 4y}{4x + y}$ is separable.
9. The differential equation $\dfrac{dy}{dx} = \dfrac{x^3y + x^2y^2}{x^2 + xy}$ is separable.

Problems
For Problems 1–11, solve the given differential equation.
1. $\dfrac{dy}{dx} = 2xy$.
2. $\dfrac{dy}{dx} = \dfrac{y^2}{x^2 + 1}$.
3. $e^{x+y}\,dy - dx = 0$.
4. $\dfrac{dy}{dx} = \dfrac{y}{x\ln x}$.
5. $y\,dx - (x - 2)\,dy = 0$.
6. $\dfrac{dy}{dx} = \dfrac{2x(y - 1)}{x^2 + 3}$.
7. $y - x\dfrac{dy}{dx} = 3 - 2x^2\dfrac{dy}{dx}$.
8. $\dfrac{dy}{dx} = \dfrac{\cos(x - y)}{\sin x\sin y} - 1$.
9. $\dfrac{dy}{dx} = \dfrac{x(y^2 - 1)}{2(x - 2)(x - 1)}$.
10. $\dfrac{dy}{dx} = \dfrac{x^2y - 32}{16 - x^2} + 2$.
11. $(x - a)(x - b)y' - (y - c) = 0$, where $a, b, c$ are constants.

In Problems 12–15, solve the given initial-value problem.
12. $(x^2 + 1)y' + y^2 = -1$, $y(0) = 1$.
13. $(1 - x^2)y' + xy = ax$, $y(0) = 2a$, where $a$ is a constant.
14. $\dfrac{dy}{dx} = 1 - \dfrac{\sin(x + y)}{\sin y\cos x}$, $y(\pi/4) = \pi/4$.
15. $y' = y^3\sin x$, $y(0) = 0$.
16. One solution to the initial-value problem
$$\frac{dy}{dx} = \frac{2}{3}(y - 1)^{1/2}, \qquad y(1) = 1$$
is $y(x) = 1$. Determine another solution. Does this contradict the existence and uniqueness theorem (Theorem 1.3.2)? Explain.
17. An object of mass $m$ falls from rest, starting at a point near the earth's surface. Assuming that the air resistance varies as the square of the velocity of the object, a simple application of Newton's second law yields the initial-value problem for the velocity, $v(t)$, of the object at time $t$:
$$m\frac{dv}{dt} = mg - kv^2, \qquad v(0) = 0,$$
where $k, m, g$ are positive constants.
(a) Solve the foregoing initial-value problem for $v$ in terms of $t$.
(b) Does the velocity of the object increase indefinitely? Justify.
(c) Determine the position of the object at time $t$.
18. Find the equation of the curve that passes through the point $(0, \frac{1}{2})$ and whose slope at each point $(x, y)$ is $-x/(4y)$.
19. Find the equation of the curve that passes through the point $(3, 1)$ and whose slope at each point $(x, y)$ is $e^{x-y}$.
20. Find the equation of the curve that passes through the point $(-1, 1)$ and whose slope at each point $(x, y)$ is $x^2y^2$.
21. At time $t$, the velocity $v(t)$ of an object moving in a straight line satisfies
$$\frac{dv}{dt} = -(1 + v^2). \qquad (1.4.17)$$
(a) Show that $\tan^{-1}(v) = \tan^{-1}(v_0) - t$, where $v_0$ denotes the velocity of the object at time $t = 0$ (and we assume $v_0 > 0$). Hence prove that the object comes to rest after a finite time $\tan^{-1}(v_0)$. Does the object remain at rest?
(b) Use the chain rule to show that (1.4.17) can be written as
$$v\frac{dv}{dx} = -(1 + v^2),$$
where $x(t)$ denotes the distance traveled by the object at time $t$, from its position at $t = 0$. Determine the distance traveled by the object when it first comes to rest.
22. The differential equation governing the velocity of an object is
$$\frac{dv}{dt} = -kv^n,$$
where $k > 0$ and $n$ are constants. At $t = 0$, the object is set in motion with velocity $v_0$.
(a) Show that the object comes to rest in a finite time if and only if $n < 1$, and determine the maximum distance traveled by the object in this case.
(b) If $1 \le n < 2$, show that the maximum distance traveled by the object in a finite time is less than
$$\frac{v_0^{2-n}}{(2 - n)k}.$$
(c) If $n \ge 2$, show that there is no limit to the distance that the object can travel.
23. The pressure $p$ and density $\rho$ of the atmosphere at a height $y$ above the earth's surface are related by $dp = -g\rho\,dy$. Assuming that $p$ and $\rho$ satisfy the adiabatic equation of state $p = p_0\left(\rho/\rho_0\right)^{\gamma}$, where $\gamma \neq 1$ is a constant and $p_0$ and $\rho_0$ denote the pressure and density at the earth's surface, respectively, show that
$$p = p_0\left[1 - \frac{\gamma - 1}{\gamma}\cdot\frac{\rho_0gy}{p_0}\right]^{\gamma/(\gamma - 1)}.$$
24. An object whose temperature is 615°F is placed in a room whose temperature is 75°F. At 4 p.m. the temperature of the object is 135°F, and an hour later its temperature is 95°F. At what time was the object placed in the room?
25. A flammable substance whose initial temperature is 50°F is inadvertently placed in a hot oven whose temperature is 450°F. After 20 minutes, the substance's temperature is 150°F. Find the temperature of the substance after 40 minutes. Assuming that the substance ignites when its temperature reaches 350°F, find the time of combustion.
26. At 2 p.m. on a cool (34°F) afternoon in March, Sherlock Holmes measured the temperature of a dead body to be 38°F. One hour later, the temperature was 36°F. After a quick calculation using Newton's law of cooling, and taking the normal temperature of a living body to be 98°F, Holmes concluded that the time of death was 10 a.m. Was Holmes right?
27. At 4 p.m., a hot coal was pulled out of a furnace and allowed to cool at room temperature (75°F). If, after 10 minutes, the temperature of the coal was 415°F, and after 20 minutes, its temperature was 347°F, find the following:
(a) The temperature of the furnace.
(b) The time when the temperature of the coal was 100°F.
28. A hot object is placed in a room whose temperature is 72°F. After one minute the temperature of the object is 150°F and its rate of change of temperature is 20°F per minute. Find the initial temperature of the object and the rate at which its temperature is changing after 10 minutes.

1.5 Some Simple Population Models

In this section we consider two important models of population growth whose mathematical formulation leads to separable differential equations.

Malthusian Growth
The simplest mathematical model of population growth is obtained by assuming that the rate of increase of the population at any time is proportional to the size of the population at that time. If we let $P(t)$ denote the population at time $t$, then
$$\frac{dP}{dt} = kP,$$
where $k$ is a positive constant. Separating the variables and integrating yields
$$P(t) = P_0e^{kt}, \qquad (1.5.1)$$
where $P_0$ denotes the population at $t = 0$. This law predicts an exponential increase in the population with time, which gives a reasonably accurate description of the growth of certain algae, bacteria, and cell cultures. It is called the Malthusian growth model. The time taken for such a culture to double in size is called the doubling time. This is the time $t_d$ when $P(t_d) = 2P_0$. Substituting into (1.5.1) yields $2P_0 = P_0e^{kt_d}$. Dividing both sides by $P_0$ and taking logarithms, we find $kt_d = \ln 2$, so that the doubling time is
$$t_d = \frac{1}{k}\ln 2.$$

Example 1.5.1
The number of bacteria in a certain culture grows at a rate that is proportional to the number present. If the number increased from 500 to 2000 in 2 hours, determine
1. the number present after 12 hours.
2. the doubling time.
Solution: The behavior of the system is governed by the differential equation
$$\frac{dP}{dt} = kP,$$
so that $P(t) = P_0e^{kt}$, where the time $t$ is measured in hours. Taking $t = 0$ as the time when the population was 500, we have $P_0 = 500$. Thus, $P(t) = 500e^{kt}$. Further, $P(2) = 2000$ implies that $2000 = 500e^{2k}$, so that
$$k = \frac{1}{2}\ln 4 = \ln 2.$$
Consequently, $P(t) = 500e^{t\ln 2}$.
1. The number of bacteria present after 12 hours is therefore
$$P(12) = 500e^{12\ln 2} = 500(2^{12}) = 2{,}048{,}000.$$
2. The doubling time of the system is
$$t_d = \frac{1}{k}\ln 2 = 1 \text{ hour}.$$

Logistic Population Model
The Malthusian growth law (1.5.1) does not provide an accurate model for the growth of a population over a long time period. To obtain a more realistic model we need to take account of the fact that as the population increases, several factors will begin to affect the growth rate. For example, there will be increased competition for the limited resources that are available, increases in disease, and overcrowding of the limited available space, all of which would serve to slow the growth rate. In order to model this situation mathematically, we modify the differential equation leading to the simple exponential growth law by adding in a term that slows the growth down as the population increases. If we consider a closed environment (neglecting factors such as immigration and emigration), then the rate of change of population can be modeled by the differential equation
$$\frac{dP}{dt} = [B(t) - D(t)]P,$$
where $B(t)$ and $D(t)$ denote the birth rate and death rate per individual, respectively. The simple exponential law corresponds to the case when $B(t) = k$ and $D(t) = 0$. In the more general situation of interest now, the increased competition as the population grows will result in a corresponding increase in the death rate per individual.
Perhaps the simplest way to take account of this is to assume that the death rate per individual is directly proportional to the instantaneous population, and that the birth rate per individual remains constant. The resulting initial-value problem governing the population growth can then be written as
$$\frac{dP}{dt} = (B_0 - D_0P)P, \qquad P(0) = P_0,$$
where $B_0$ and $D_0$ are positive constants. It is useful to write the differential equation in the equivalent form
$$\frac{dP}{dt} = r\left(1 - \frac{P}{C}\right)P, \qquad (1.5.2)$$
where $r = B_0$ and $C = B_0/D_0$. Equation (1.5.2) is called the logistic equation, and the corresponding population model is called the logistic model. The differential equation (1.5.2) is separable and can be solved without difficulty. Before doing that, however, we give a qualitative analysis of the differential equation.

The constant $C$ in Equation (1.5.2) is called the carrying capacity of the population. We see from Equation (1.5.2) that if $P < C$, then $dP/dt > 0$ and the population increases, whereas if $P > C$, then $dP/dt < 0$ and the population decreases. We can therefore interpret $C$ as representing the maximum population that the environment can sustain. We note that $P(t) = C$ is an equilibrium solution to the differential equation, as is $P(t) = 0$. The isoclines for Equation (1.5.2) are determined from
$$r\left(1 - \frac{P}{C}\right)P = k,$$
where $k$ is a constant. This can be written as
$$P^2 - CP + \frac{kC}{r} = 0,$$
so that the isoclines are the lines
$$P = \frac{1}{2}\left(C \pm \sqrt{C^2 - \frac{4kC}{r}}\right).$$
Since these lines exist only when the square root is real, the slopes of the solution curves satisfy
$$C^2 - \frac{4kC}{r} \ge 0,$$
so that $k \le rC/4$. Furthermore, the largest value that the slope can assume is $k = rC/4$, which corresponds to $P = C/2$. We also note that the slope approaches zero as the solution curves approach the equilibrium solutions $P(t) = 0$ and $P(t) = C$.
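These qualitative facts can be checked numerically: the isoclines come from the quadratic formula, the maximum slope $rC/4$ occurs at $P = C/2$, and no solution curve has a slope larger than $rC/4$. The Python sketch below assumes the hypothetical values $r = 0.5$ and $C = 400$, chosen so the arithmetic is exact in floating point. The helper names `logistic_rate` and `isocline_levels` are illustrative and do not come from the text.

```python
import math


def logistic_rate(P, r, C):
    """Slope dP/dt given by the logistic equation (1.5.2): r(1 - P/C)P."""
    return r * (1.0 - P / C) * P


def isocline_levels(k, r, C):
    """P-levels of the isocline(s) of slope k: P = (C +/- sqrt(C^2 - 4kC/r)) / 2.

    Returns an empty list when k > rC/4, since no solution curve has that slope.
    """
    disc = C * C - 4.0 * k * C / r
    if disc < 0:
        return []
    root = math.sqrt(disc)
    # A set merges the two roots when disc == 0 (the single isocline P = C/2).
    return sorted({0.5 * (C - root), 0.5 * (C + root)})


# Hypothetical parameters (r = 0.5, C = 400), chosen only for illustration.
r, C = 0.5, 400.0
for k in (0.0, 20.0, r * C / 4, 60.0):
    levels = isocline_levels(k, r, C)
    slopes = [round(logistic_rate(P, r, C), 6) for P in levels]
    print(f"k={k:6.2f}  isocline P-levels={levels}  slope there={slopes}")

# Brute-force scan of 0 <= P <= C: the largest slope should be rC/4, at P = C/2.
grid = [C * i / 10_000 for i in range(10_001)]
P_best = max(grid, key=lambda P: logistic_rate(P, r, C))
print("max slope", logistic_rate(P_best, r, C), "at P =", P_best, "; rC/4 =", r * C / 4)
```

For $k = 0$ the sketch recovers the two equilibrium levels $P = 0$ and $P = C$. For $k = rC/4$ the two isoclines merge into the single line $P = C/2$.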
Differentiating Equation (1.5.2) yields
$$\frac{d^2P}{dt^2} = r\left[\left(1 - \frac{P}{C}\right)\frac{dP}{dt} - \frac{P}{C}\frac{dP}{dt}\right] = r\left(1 - \frac{2P}{C}\right)\frac{dP}{dt} = \frac{r^2}{C^2}(C - 2P)(C - P)P,$$
where we have substituted for $dP/dt$ from (1.5.2) and simplified the result. Since $P = C$ and $P = 0$ are solutions to the differential equation (1.5.2), the only points of inflection occur along the line $P = C/2$. The behavior of the concavity is therefore given by the following schematic:

  P-interval:     (0, C/2)    (C/2, C)    (C, ∞)
  sign of P'':       +           −           +

This information determines the general behavior of the solution curves to the differential equation (1.5.2). Figure 1.5.1 gives a Maple plot of the slope field and some representative solution curves. Of course, such a figure could have been constructed by hand, using the information we have obtained. From Figure 1.5.1, we see that if the initial population is less than the carrying capacity, then the population increases monotonically toward the carrying capacity. Similarly, if the initial population is bigger than the carrying capacity, then the population monotonically decreases toward the carrying capacity. Once more this illustrates the power of the qualitative techniques that have been introduced for analyzing first-order differential equations.

Figure 1.5.1: Representative slope field and some approximate solution curves for the logistic equation (the levels $P = C/2$ and $P = C$ are marked).

We turn now to obtaining an analytical solution to the differential equation (1.5.2). Separating the variables in Equation (1.5.2) and integrating yields
$$\int \frac{C}{P(C - P)}\,dP = rt + c_1,$$
where $c_1$ is an integration constant. Using a partial-fraction decomposition on the left-hand side, we find
$$\int \left(\frac{1}{P} + \frac{1}{C - P}\right)dP = rt + c_1,$$
which upon integration gives
$$\ln\left|\frac{P}{C - P}\right| = rt + c_1.$$
Exponentiating, and redefining the integration constant, yields
$$\frac{P}{C - P} = c_2e^{rt},$$
which can be solved algebraically for $P$ to obtain
$$P(t) = \frac{c_2Ce^{rt}}{1 + c_2e^{rt}}, \quad\text{or equivalently,}\quad P(t) = \frac{c_2C}{c_2 + e^{-rt}}.$$
Imposing the initial condition $P(0) = P_0$, we find that $c_2 = P_0/(C - P_0)$. Inserting this value of $c_2$ into the preceding expression for $P(t)$ yields
$$P(t) = \frac{CP_0}{P_0 + (C - P_0)e^{-rt}}. \qquad (1.5.3)$$
We make two comments regarding this formula. First, owing to the negative exponent of the exponential term in the denominator, as $t \to \infty$ the population does indeed tend to the carrying capacity $C$ independently of the initial population $P_0$. Second, by writing (1.5.3) in the equivalent form
$$P(t) = \frac{P_0}{P_0/C + (1 - P_0/C)e^{-rt}},$$
it follows that if $P_0$ is very small compared to the carrying capacity, then for small $t$ the terms involving $P_0$ in the denominator can be neglected, leading to the approximation $P(t) \approx P_0e^{rt}$. Consequently, in this case, the Malthusian population model does approximate the logistic model for small time intervals.

Although we now have a formula for the solution to the logistic population model, the qualitative analysis is certainly enlightening with regard to the general overall properties of the solution. Of course, if we want to investigate specific details of a particular model, then we use the corresponding exact solution (1.5.3).

Example 1.5.2
The initial population (measured in thousands) of a city is 20. After 10 years this has increased to 50.87, and after 15 years to 78.68. Use the logistic model to predict the population after 30 years.

Solution: In this problem we have $P_0 = P(0) = 20$, $P(10) = 50.87$, $P(15) = 78.68$, and we wish to find $P(30)$. Substituting for $P_0$ into Equation (1.5.3) yields
$$P(t) = \frac{20C}{20 + (C - 20)e^{-rt}}. \qquad (1.5.4)$$

Figure 1.5.2: Solution curve corresponding to the population model in Example 1.5.2 ($P$ plotted against $t$ for $0 \le t \le 100$, with the carrying capacity marked). The population is measured in thousands of people.
Imposing the two remaining auxiliary conditions leads to the following pair of equations for determining $r$ and $C$:
$$50.87 = \frac{20C}{20 + (C - 20)e^{-10r}}, \qquad 78.68 = \frac{20C}{20 + (C - 20)e^{-15r}}.$$
This is a pair of nonlinear equations that are tedious to solve by hand. We therefore turn to technology. Using the algebraic capabilities of Maple, we find that $r \approx 0.1$ and $C \approx 500.37$. Substituting these values of $r$ and $C$ in Equation (1.5.4) yields
$$P(t) = \frac{10007.4}{20 + 480.37e^{-0.1t}}.$$
Accordingly, the predicted value of the population after 30 years is
$$P(30) = \frac{10007.4}{20 + 480.37e^{-3}} = 227.87.$$
A sketch of $P(t)$ is given in Figure 1.5.2.

Exercises for 1.5

Key Terms
Malthusian growth model, Doubling time, Logistic growth model, Carrying capacity.

Skills
• Be able to solve the basic differential equations describing the Malthusian and logistic population growth models.
• Be able to solve word problems involving initial conditions, doubling time, etc., for the Malthusian and logistic population growth models.
• Be able to compute the carrying capacity for a logistic population model.
• Be able to discuss the qualitative behavior of a population governed by a Malthusian or logistic model, based on initial values, doubling time, and so on as a function of time.

True-False Review
For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. A population whose growth rate at any given time is proportional to its size at that time obeys the Malthusian growth model.
2. If a population obeys the logistic growth model, then its size can never exceed the carrying capacity of the population.
3. The differential equations which describe population growth according to the Malthusian model and the logistic model are both separable.
4. The rate of change of a population whose growth is described with the logistic model eventually tends toward zero, regardless of the initial population.
5. If the doubling time of a population governed by the Malthusian growth model is five minutes, then the initial population increases 64-fold in a half-hour.
6. If a population whose growth is based on the Malthusian growth model has a doubling time of 10 years, then it takes approximately 30–40 years in order for the initial population size to increase tenfold.
7. The population growth rate according to the Malthusian growth model is always constant.
8. The logistic population model always has exactly two equilibrium solutions.
9. The concavity of the graph of population governed by the logistic model changes if and only if the initial population is less than the carrying capacity.
10. The concavity of the graph of a population governed by the Malthusian growth model never changes, regardless of the initial population.

Problems
1. The number of bacteria in a culture grows at a rate proportional to the number present. Initially there were 10 bacteria in the culture. If the doubling time of the culture is 3 hours, find the number of bacteria that were present after 24 hours.
2. The number of bacteria in a culture grows at a rate proportional to the number present. After 10 hours, there were 5000 bacteria present, and after 12 hours, 6000 bacteria. Determine the initial size of the culture and the doubling time of the population.
3. A certain cell culture has a doubling time of 4 hours. Initially there were 2000 cells present. Assuming an exponential growth law, determine the time it takes for the culture to contain $10^6$ cells.
4. At time $t$, the population $P(t)$ of a certain city is increasing at a rate proportional to the number of residents in the city at that time. In January 1990 the population of the city was 10,000, and by 1995 it had risen to 20,000.
(a) What will the population of the city be at the beginning of the year 2010?
(b) In what year will the population reach one million?

In the logistic population model (1.5.3), if $P(t_1) = P_1$ and $P(2t_1) = P_2$, then it can be shown (through some algebra performed tediously by hand, or easily on a computer algebra system) that
$$r = \frac{1}{t_1}\ln\left[\frac{P_2(P_1 - P_0)}{P_0(P_2 - P_1)}\right], \qquad (1.5.5)$$
$$C = \frac{P_1\left[P_1(P_0 + P_2) - 2P_0P_2\right]}{P_1^2 - P_0P_2}. \qquad (1.5.6)$$
These formulas will be used in Problems 5–7.
5. The initial population in a small village is 500. After 5 years this has grown to 800 and after 10 years to 1000. Using the logistic population model, determine the population after 15 years.
6. An animal sanctuary had an initial population of 50 animals. After two years the population was 62 and after four years 76. Using the logistic population model, determine the carrying capacity and the number of animals in the sanctuary after 20 years.
7. (a) Using Equations (1.5.5) and (1.5.6), and the fact that $r$ and $C$ are positive, derive two inequalities that $P_0, P_1, P_2$ must satisfy in order for there to be a solution to the logistic equation satisfying the conditions $P(0) = P_0$, $P(t_1) = P_1$, $P(2t_1) = P_2$.
(b) The initial population in a town is 10,000. After 5 years this has grown to 12,000, and after 10 years to 18,000. Is there a solution to the logistic equation that fits this data?
8. Of the 1500 passengers, crew, and staff that board a cruise ship, 5 have the flu. After one day of sailing, the number of infected people has risen to 10. Assuming that the rate at which the flu virus spreads is proportional to the product of the number of infected individuals and the number not yet infected, determine how many people will have the flu at the end of the 14-day cruise. Would you like to be a member of the customer relations department for the cruise line the day after the ship docks?
9. Consider the population model
$$\frac{dP}{dt} = r(P - T)P, \qquad P(0) = P_0, \qquad (1.5.7)$$
where $r$, $T$, and $P_0$ are positive constants.
(a) Perform a qualitative analysis of the differential equation in the initial-value problem (1.5.7), following the steps used in the text for the logistic equation. Identify the equilibrium solutions, the isoclines, and the behavior of the slope and concavity of the solution curves.
(b) Using the information obtained in (a), sketch the slope field for the differential equation and include representative solution curves.
(c) What predictions can you make regarding the behavior of the population? Consider the cases $P_0 < T$ and $P_0 > T$. The constant $T$ is called the threshold level. Based on your predictions, why is this an appropriate term to use for $T$?
10. In the preceding problem, a qualitative analysis of the differential equation in (1.5.7) was carried out. In this problem, we determine the exact solution to the differential equation and verify the predictions from the qualitative analysis.
11. As a modification to the population model considered in the previous two problems, suppose that $P(t)$ satisfies the initial-value problem
$$\frac{dP}{dt} = r(C - P)(P - T)P, \qquad P(0) = P_0,$$
where $r, C, T, P_0$ are positive constants, and $0 < T < C$. Perform a qualitative analysis of this model. Sketch the slope field and some representative solution curves in the three cases $0 < P_0 < T$, $T < P_0 < C$, and $P_0 > C$. Describe the behavior of the corresponding solutions.

The next two problems consider the Gompertz population model, which is governed by the initial-value problem
$$\frac{dP}{dt} = rP(\ln C - \ln P), \qquad P(0) = P_0, \qquad (1.5.8)$$
where $r$, $C$, and $P_0$ are positive constants.
12. Determine all equilibrium solutions for the differential equation in (1.5.8), and the behavior of the slope and concavity of the solution curves. Use this information to sketch the slope field and some representative solution curves.
13.
Solve the initial-value problem (1.5.8) and verify that all solutions satisfy lim P (t) = C . t →∞ Problems 14–16 consider the phenomenon of exponential decay. This occurs when a population P (t) is governed by the differential equation dP = kP , dt where k is a negative constant. (a) Solve the initial-value problem (1.5.7). (b) Using your solution from (a), verify that if P0 < T , then lim P (t) = 0. What does this mean for t →∞ the population? (c) Using your solution from (a), verify that if P0 > T , then each solution curve has a vertical asymptote at t = te , where te = 1 P0 ln rT P0 − T . How do you interpret this result in terms of population growth? Note that this was not obvious from the qualitative analysis performed in the previous problem. 14. A population of swans in a wildlife sanctuary is declining due to the presence of dangerous chemicals in the water. If the population of swans is experiencing exponential decay, and if there were 400 swans in the park at the beginning of the summer and 340 swans 30 days later, (a) How many swans are in the park 60 days after the start of summer? 100 days after the start of summer? (b) How long does it take for the population of swans to be cut in half? (This is known as the half-life of the population.) i i i i i i i “main” 2007/2/16 page 49 i 1.6 15. At the conclusion of the Super Bowl, the number of fans remaining in the stadium decreases at a rate proportional to the number of fans in the stadium. Assume that there are 100,000 fans in the stadium at the end of the Super Bowl and ten minutes later there are 80,000 fans in the stadium. P2 = 17. Use some form of technology to solve the pair of equations P1 = CP0 , P0 + (C − P0 )e−rt1 1.6 49 CP0 , P0 + (C − P0 )e−2rt1 for r and C , and thereby derive the expressions given in Equations (1.5.5) and (1.5.6). 18. (a) Thirty minutes after the Super Bowl will there be more or less than 40,000 fans? How do you know this without doing any calculations? 
(b) What is the half-life (see the previous problem) for the fan population in the stadium? (c) When will there be only 15,000 fans left in the stadium? (d) Explain why the exponential decay model for the population of fans in the stadium is not realistic from a qualitative perspective. 16. Cobalt-60, an isotope used in cancer therapy, decays exponentially with a half-life of 5.2 years (i.e., half the original sample remains after 5.2 years). How long does it take for a sample of cobalt-60 to disintegrate to the extent that only 4% of the original amount remains? First-Order Linear Differential Equations According to data from the U.S. Bureau of the Census, the population (measured in millions of people) of the United States in 1950, 1960, and 1970 was, respectively, 151.3, 179.4, and 203.3. (a) Using the 1950 and 1960 population figures, solve the corresponding Malthusian population model. (b) Determine the logistic model corresponding to the given data. (c) On the same set of axes, plot the solution curves obtained in (a) and (b). From your plots, determine the values the different models would have predicted for the population in 1980 and 1990, and compare these predictions to the actual values of 226.54 and 248.71, respectively. 19. In a period of five years, the population of a city doubles from its initial size of 50 (measured in thousands of people). After ten more years, the population has reached 250. Determine the logistic model corresponding to this data. Sketch the solution curve and use your plot to estimate the time it will take for the population to reach 95% of the carrying capacity. First-Order Linear Differential Equations In this section we derive a technique for determining the general solution to any first-order linear differential equation. This is the most important technique in the chapter. 
DEFINITION 1.6.1
A differential equation that can be written in the form

$$a(x)\frac{dy}{dx} + b(x)y = r(x), \tag{1.6.1}$$

where $a(x)$, $b(x)$, and $r(x)$ are functions defined on an interval $(a, b)$, is called a first-order linear differential equation.

We assume that $a(x) \neq 0$ on $(a, b)$ and divide both sides of (1.6.1) by $a(x)$ to obtain the standard form

$$\frac{dy}{dx} + p(x)y = q(x), \tag{1.6.2}$$

where $p(x) = b(x)/a(x)$ and $q(x) = r(x)/a(x)$. The idea behind the solution technique for (1.6.2) is to rewrite the differential equation in the form

$$\frac{d}{dx}[g(x, y)] = F(x)$$

for an appropriate function $g(x, y)$. The general solution to the differential equation can then be obtained by an integration with respect to $x$. First consider an example.

Example 1.6.2
Solve the differential equation

$$\frac{dy}{dx} + \frac{1}{x}y = e^x, \qquad x > 0. \tag{1.6.3}$$

Solution: If we multiply (1.6.3) by $x$, we obtain

$$x\frac{dy}{dx} + y = xe^x.$$

But, from the product rule for differentiation, the left-hand side of this equation is just the expanded form of $\frac{d}{dx}(xy)$. Thus (1.6.3) can be written in the equivalent form

$$\frac{d}{dx}(xy) = xe^x.$$

Integrating both sides of this equation with respect to $x$, we obtain

$$xy = xe^x - e^x + c.$$

Dividing by $x$ yields the general solution to (1.6.3) as

$$y(x) = x^{-1}[e^x(x - 1) + c],$$

where $c$ is an arbitrary constant.

In the preceding example we multiplied the given differential equation by the function $I(x) = x$. This had the effect of reducing the left-hand side of the resulting differential equation to the integrable form $\frac{d}{dx}(xy)$. Motivated by this example, we now consider the possibility of multiplying the general linear differential equation

$$\frac{dy}{dx} + p(x)y = q(x) \tag{1.6.4}$$

by a nonzero function $I(x)$, chosen in such a way that the left-hand side of the resulting differential equation is $\frac{d}{dx}[I(x)y]$. Henceforth we will assume that the functions $p$ and $q$ are continuous on $(a, b)$.
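The general solution found in Example 1.6.2 can be checked by substituting it back into (1.6.3). The Python sketch below does this numerically rather than symbolically: it approximates $dy/dx$ with a central difference and reports the residual $dy/dx + y/x - e^x$, which should be close to zero for every choice of the constant $c$. The function names `y_general` and `residual` and the step size `h` are illustrative choices, not part of the text.

```python
import math


def y_general(x: float, c: float) -> float:
    """General solution of Example 1.6.2: y = x^{-1}[e^x (x - 1) + c], for x > 0."""
    return (math.exp(x) * (x - 1.0) + c) / x


def residual(x: float, c: float, h: float = 1e-5) -> float:
    """Left side minus right side of (1.6.3), i.e. dy/dx + y/x - e^x,
    with dy/dx approximated by a central difference of step h."""
    dydx = (y_general(x + h, c) - y_general(x - h, c)) / (2.0 * h)
    return dydx + y_general(x, c) / x - math.exp(x)


if __name__ == "__main__":
    # Try several values of the arbitrary constant c and several points x > 0.
    for c in (-2.0, 0.0, 3.5):
        for x in (0.5, 1.0, 2.0):
            print(f"c = {c:5.1f}  x = {x:3.1f}  residual = {residual(x, c):.2e}")
```

The printed residuals are small for every value of $c$ tried; what remains comes from the finite-difference step and floating-point rounding, not from the formula itself.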
Multiplying the differential equation (1.6.4) by $I(x)$ yields

$$I\frac{dy}{dx} + p(x)Iy = Iq(x). \tag{1.6.5}$$

Furthermore, from the product rule for derivatives, we know that

$$\frac{d}{dx}(Iy) = I\frac{dy}{dx} + \frac{dI}{dx}y. \tag{1.6.6}$$

Comparing Equations (1.6.5) and (1.6.6), we see that Equation (1.6.5) can indeed be written in the integrable form

$$\frac{d}{dx}(Iy) = Iq(x),$$

provided the function $I(x)$ is a solution to⁵

$$I\frac{dy}{dx} + p(x)Iy = I\frac{dy}{dx} + \frac{dI}{dx}y.$$

This will hold whenever $I(x)$ satisfies the separable differential equation

$$\frac{dI}{dx} = p(x)I. \tag{1.6.7}$$

Separating the variables and integrating yields

$$\ln|I| = \int p(x)\,dx + c,$$

so that

$$I(x) = c_1e^{\int p(x)\,dx},$$

where $c_1$ is an arbitrary constant. Since we require only one solution to Equation (1.6.7), we set $c_1 = 1$, in which case

$$I(x) = e^{\int p(x)\,dx}.$$

We can therefore draw the following conclusion.

Multiplying the linear differential equation

$$\frac{dy}{dx} + p(x)y = q(x) \tag{1.6.8}$$

by $I(x) = e^{\int p(x)\,dx}$ reduces it to the integrable form

$$\frac{d}{dx}\left[e^{\int p(x)\,dx}y\right] = q(x)e^{\int p(x)\,dx}. \tag{1.6.9}$$

The general solution to (1.6.8) can now be obtained from (1.6.9) by integration. Formally we have

$$y(x) = e^{-\int p(x)\,dx}\left[\int q(x)e^{\int p(x)\,dx}\,dx + c\right]. \tag{1.6.10}$$

⁵ This is obtained by equating the left-hand side of Equation (1.6.5) to the right-hand side of Equation (1.6.6).

Remarks
1. The function $I(x) = e^{\int p(x)\,dx}$ is called an integrating factor for the differential equation (1.6.8), since it enables us to reduce the differential equation to a form that is directly integrable.
2. It is not necessary to memorize (1.6.10). In a specific problem, we first evaluate the integrating factor $e^{\int p(x)\,dx}$ and then use (1.6.9).

Example 1.6.3
Solve the initial-value problem

$$\frac{dy}{dx} + xy = xe^{x^2/2}, \qquad y(0) = 1.$$

Solution: An appropriate integrating factor in this case is

$$I(x) = e^{\int x\,dx} = e^{x^2/2}.$$
Multiplying the given differential equation by $I$ and using (1.6.9) yields

$$\frac{d}{dx}\left(e^{x^2/2}y\right) = xe^{x^2}.$$

Integrating both sides with respect to $x$, we obtain

$$e^{x^2/2}y = \tfrac{1}{2}e^{x^2} + c.$$

Hence,

$$y(x) = e^{-x^2/2}\left(\tfrac{1}{2}e^{x^2} + c\right).$$

Imposing the initial condition $y(0) = 1$ yields $1 = \tfrac{1}{2} + c$, so that $c = \tfrac{1}{2}$. Thus the required particular solution is

$$y(x) = \tfrac{1}{2}e^{-x^2/2}\left(e^{x^2} + 1\right) = \tfrac{1}{2}\left(e^{x^2/2} + e^{-x^2/2}\right) = \cosh(x^2/2).$$

Example 1.6.4
Solve

$$x\frac{dy}{dx} + 2y = \cos x, \qquad x > 0.$$

Solution: We first write the given differential equation in standard form. Dividing by $x$ yields

$$\frac{dy}{dx} + 2x^{-1}y = x^{-1}\cos x. \tag{1.6.11}$$

An integrating factor is

$$I(x) = e^{\int 2x^{-1}\,dx} = e^{2\ln x} = x^2,$$

so that upon multiplying Equation (1.6.11) by $I$, we obtain

$$\frac{d}{dx}(x^2y) = x\cos x.$$

Integrating and rearranging gives

$$y(x) = x^{-2}(x\sin x + \cos x + c),$$

where we have used integration by parts on the right-hand side.

Example 1.6.5
Solve the initial-value problem

$$y' - y = f(x), \qquad y(0) = 0,$$

where

$$f(x) = \begin{cases} 1, & \text{if } x < 1,\\ 2 - x, & \text{if } x \geq 1. \end{cases}$$

Solution: We have sketched $f(x)$ in Figure 1.6.1. An integrating factor for the differential equation is $I(x) = e^{-x}$.

[Figure 1.6.1: A sketch of the function f(x) from Example 1.6.5.]

Upon multiplication by the integrating factor, the differential equation reduces to

$$\frac{d}{dx}(e^{-x}y) = e^{-x}f(x).$$

We now integrate this differential equation over the interval $[0, x]$. To do so we need to use a dummy integration variable, which we denote by $w$. We therefore obtain

$$\left[e^{-w}y(w)\right]_0^x = \int_0^x e^{-w}f(w)\,dw,$$

or equivalently,

$$e^{-x}y(x) - y(0) = \int_0^x e^{-w}f(w)\,dw.$$

Multiplying by $e^x$ and substituting $y(0) = 0$ yields

$$y(x) = e^x\int_0^x e^{-w}f(w)\,dw. \tag{1.6.12}$$

Owing to the form of $f(x)$, the value of the integral on the right-hand side will depend on whether $x < 1$ or $x \geq 1$.
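Before splitting the integral by cases, formula (1.6.12) can also be evaluated numerically. The sketch below is an illustration only: it approximates the integral with the composite trapezoidal rule (a numerical technique not used in the text) and compares the result with the piecewise formula that this example goes on to derive. The function names `f`, `y_quadrature`, `y_exact` and the default number of subintervals `n` are arbitrary choices.

```python
import math


def f(x: float) -> float:
    """Driving term of Example 1.6.5: 1 if x < 1, 2 - x if x >= 1."""
    return 1.0 if x < 1.0 else 2.0 - x


def y_quadrature(x: float, n: int = 20000) -> float:
    """Approximate (1.6.12), y(x) = e^x * integral_0^x e^(-w) f(w) dw,
    using the composite trapezoidal rule with n subintervals (x >= 0)."""
    h = x / n
    total = 0.5 * (f(0.0) + math.exp(-x) * f(x))  # endpoints get weight 1/2
    for k in range(1, n):
        w = k * h
        total += math.exp(-w) * f(w)
    return math.exp(x) * h * total


def y_exact(x: float) -> float:
    """Piecewise solution obtained in Example 1.6.5."""
    if x < 1.0:
        return math.exp(x) - 1.0
    return math.exp(x) * (1.0 - math.exp(-1.0)) + x - 1.0


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(f"x = {x:3.1f}  quadrature = {y_quadrature(x):.8f}  exact = {y_exact(x):.8f}")
```

The two columns agree to many decimal places on both sides of $x = 1$, which is a useful sanity check on the case split carried out next.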
If $x < 1$, then $f(w) = 1$, and so (1.6.12) can be written as

$$y(x) = e^x\int_0^x e^{-w}\,dw = e^x(1 - e^{-x}),$$

so that

$$y(x) = e^x - 1, \qquad x < 1.$$

If $x \geq 1$, then the interval of integration $[0, x]$ must be split into two parts. From (1.6.12) we have

$$y(x) = e^x\left[\int_0^1 e^{-w}\,dw + \int_1^x (2 - w)e^{-w}\,dw\right].$$

A straightforward integration leads to

$$y(x) = e^x\left\{(1 - e^{-1}) + \left[-2e^{-w} + we^{-w} + e^{-w}\right]_1^x\right\},$$

which simplifies to

$$y(x) = e^x(1 - e^{-1}) + x - 1.$$

The solution to the initial-value problem can therefore be written as

$$y(x) = \begin{cases} e^x - 1, & \text{if } x < 1,\\ e^x(1 - e^{-1}) + x - 1, & \text{if } x \geq 1. \end{cases}$$

A sketch of the corresponding solution curve is given in Figure 1.6.2.

[Figure 1.6.2: The solution curve for the initial-value problem in Example 1.6.5. The dashed curve is the continuation of y(x) = e^x − 1 for x > 1.]

Differentiating both branches of this function, we find

$$y'(x) = \begin{cases} e^x, & \text{if } x < 1,\\ e^x(1 - e^{-1}) + 1, & \text{if } x \geq 1, \end{cases} \qquad y''(x) = \begin{cases} e^x, & \text{if } x < 1,\\ e^x(1 - e^{-1}), & \text{if } x \geq 1. \end{cases}$$

We see that even though the function $f$ in the original differential equation was not differentiable at $x = 1$, the solution to the initial-value problem has a continuous derivative at that point: both branches of $y'$ equal $e$ at $x = 1$. The discontinuity in the derivative of the driving term does show up in the second derivative of the solution, as indeed it must: $y''$ jumps from $e$ to $e - 1$ at $x = 1$.

Exercises for 1.6

Key Terms
First-order linear differential equation, Integrating factor.

Skills
• Be able to recognize a first-order linear differential equation.
• Be able to find an integrating factor for a given first-order linear differential equation.
• Be able to solve a first-order linear differential equation.

True-False Review
For Questions 1–5, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.

1. There is a unique integrating factor for a differential equation of the form $y' + p(x)y = q(x)$.
2. An integrating factor for the differential equation $y' + p(x)y = q(x)$ is $e^{\int p(x)\,dx}$.
3. Upon multiplying the differential equation $y' + p(x)y = q(x)$ by an integrating factor $I(x)$, the differential equation becomes $(I(x) \cdot y)' = q(x)I$.
4. An integrating factor for the differential equation $\frac{dy}{dx} = x^2y + \sin x$ is $I(x) = e^{\int x^2\,dx}$.
5. An integrating factor for the differential equation $\frac{dy}{dx} = x - \frac{y}{x}$ is $I(x) = 5x$.

Problems
For Problems 1–14, solve the given differential equation.

1. $\frac{dy}{dx} - y = e^{2x}$.
2. $x^2y' - 4xy = x^7\sin x$, $x > 0$.
3. $y' + 2xy = 2x^3$.
4. $\frac{dy}{dx} + \frac{2x}{1 - x^2}y = 4x$, $-1 < x < 1$.
5. $\frac{dy}{dx} + \frac{2x}{1 + x^2}y = \frac{4}{(1 + x^2)^2}$.
6. $2(\cos^2 x)y' + y\sin 2x = 4\cos^4 x$, $0 \leq x < \pi/2$.
7. $y' + \frac{1}{x\ln x}y = 9x^2$.
8. $y' - y\tan x = 8\sin^3 x$.
9. $t\frac{dx}{dt} + 2x = 4e^t$, $t > 0$.
10. $y' = \sin x\,(y\sec x - 2)$.
11. $(1 - y\sin x)\,dx - (\cos x)\,dy = 0$.
12. $y' - x^{-1}y = 2x^2\ln x$.
13. $y' + \alpha y = e^{\beta x}$, where $\alpha$, $\beta$ are constants.
14. $y' + mx^{-1}y = \ln x$, where $m$ is constant.

In Problems 15–20, solve the given initial-value problem.

15. $y' + 2x^{-1}y = 4x$, $y(1) = 2$.
16. $(\sin x)y' - y\cos x = \sin 2x$, $y(\pi/2) = 2$.
17. $\frac{dx}{dt} + \frac{2}{4 - t}x = 5$, $x(0) = 4$.
18. $(y - e^x)\,dx + dy = 0$, $y(0) = 1$.
19. $y' + y = f(x)$, $y(0) = 3$, where $f(x) = \begin{cases} 1, & \text{if } x \leq 1,\\ 0, & \text{if } x > 1. \end{cases}$
20. $y' - 2y = f(x)$, $y(0) = 1$, where $f(x) = \begin{cases} 1 - x, & \text{if } x < 1,\\ 0, & \text{if } x \geq 1. \end{cases}$
21. Solve the initial-value problem in Example 1.6.5 as follows. First determine the general solution to the differential equation on each interval separately. Then use the given initial condition to find the appropriate integration constant for the interval $(-\infty, 1)$. To determine the integration constant on the interval $[1, \infty)$, use the fact that the solution must be continuous at $x = 1$.
22. Find the general solution to the second-order differential equation
$$\frac{d^2y}{dx^2} + \frac{1}{x}\frac{dy}{dx} = 9x, \qquad x > 0.$$
[Hint: Let $u = dy/dx$.]
23. Solve the differential equation for Newton's law of cooling by viewing it as a first-order linear differential equation.
24. Suppose that an object is placed in a medium whose temperature is increasing at a constant rate of $\alpha$ °F per minute. Show that, according to Newton's law of cooling, the temperature of the object at time $t$ is given by
$$T(t) = \alpha(t - k^{-1}) + c_1 + c_2e^{-kt},$$
where $c_1$ and $c_2$ are constants.
25. Between 8 a.m. and 12 p.m. on a hot summer day, the temperature rose at a rate of 10°F per hour from an initial temperature of 65°F. At 9 a.m. the temperature of an object was measured to be 35°F and was, at that time, increasing at a rate of 5°F per hour. Show that the temperature of the object at time $t$ was
$$T(t) = 10t - 15 + 40e^{(1 - t)/8}, \qquad 0 \leq t \leq 4.$$
26. It is known that a certain object has constant of proportionality $k = 1/40$ in Newton's law of cooling. When the temperature of this object is 0°F, it is placed in a medium whose temperature is changing in time according to
$$T_m(t) = 80e^{-t/20}.$$
(a) Using Newton's law of cooling, show that the temperature of the object at time $t$ is
$$T(t) = 80(e^{-t/40} - e^{-t/20}).$$
(b) What happens to the temperature of the object as $t \to +\infty$? Is this reasonable?
(c) Determine the time, $t_{max}$, when the temperature of the object is a maximum. Find $T(t_{max})$ and $T_m(t_{max})$.
(d) Make a sketch to depict the behavior of $T(t)$ and $T_m(t)$.
27. The differential equation
$$\frac{dT}{dt} = -k_1[T - T_m(t)] + A_0, \tag{1.6.13}$$
where $k_1$ and $A_0$ are positive constants, can be used to model the temperature variation $T(t)$ in a building. In this equation, the first term on the right-hand side gives the contribution due to the variation in the outside temperature, and the second term on the right-hand side gives the contribution due to the heating effect from internal sources such as machinery, lighting, people, and so on. Consider the case when
$$T_m(t) = A - B\cos\omega t, \qquad \omega = \pi/12, \tag{1.6.14}$$
where $A$ and $B$ are constants, and $t$ is measured in hours.
(a) Make a sketch of $T_m(t)$. Taking $t = 0$ to correspond to midnight, describe the variation of the external temperature over a 24-hour period.
(b) With $T_m$ given in (1.6.14), solve (1.6.13) subject to the initial condition $T(0) = T_0$.
28. This problem demonstrates the variation-of-parameters method for first-order linear differential equations. Consider the first-order linear differential equation
$$y' + p(x)y = q(x). \tag{1.6.15}$$
(a) Show that the general solution to the associated homogeneous equation $y' + p(x)y = 0$ is
$$y_H(x) = c_1e^{-\int p(x)\,dx}.$$
(b) Determine the function $u(x)$ such that
$$y(x) = u(x)e^{-\int p(x)\,dx}$$
is a solution to (1.6.15), and hence derive the general solution to (1.6.15).

For Problems 29–32, use the technique derived in the previous problem to solve the given differential equation.

29. $y' + x^{-1}y = \cos x$, $x > 0$.
30. $y' + y = e^{-2x}$.
31. $y' + y\cot x = 2\cos x$, $0 < x < \pi$.
32. $xy' - y = x^2\ln x$.

For Problems 33–38, use a differential equation solver to determine the solution to each of the initial-value problems and sketch the corresponding solution curve.

33. The initial-value problem in Problem 15.
34. The initial-value problem in Problem 16.
35. The initial-value problem in Problem 17.
36. The initial-value problem in Problem 18.
37. The initial-value problem in Problem 19.
38. The initial-value problem in Problem 20.

1.7 Modeling Problems Using First-Order Linear Differential Equations

There are many examples of applied problems whose mathematical formulation leads to a first-order linear differential equation. In this section we analyze two in detail.
Mixing Problems

Statement of the Problem: Consider the situation depicted in Figure 1.7.1. A tank initially contains $V_0$ liters of a solution in which $A_0$ grams of a certain chemical are dissolved. A solution containing $c_1$ grams/liter of the same chemical flows into the tank at a constant rate of $r_1$ liters/minute, and the mixture flows out at a constant rate of $r_2$ liters/minute. We assume that the mixture is kept uniform by stirring. Then at any time $t$ the concentration of chemical in the tank, $c_2(t)$, is the same throughout the tank and is given by

$$c_2 = \frac{A(t)}{V(t)}, \tag{1.7.1}$$

where $V(t)$ denotes the volume of solution in the tank at time $t$ and $A(t)$ denotes the amount of chemical in the tank at time $t$.

[Figure 1.7.1: A mixing problem. Solution of concentration c1 grams/liter flows in at a rate of r1 liters/minute; the tank holds a volume V(t) of solution containing an amount A(t) of chemical, at concentration c2(t) = A(t)/V(t); solution of concentration c2 grams/liter flows out at a rate of r2 liters/minute.]

Mathematical Formulation: The two functions in the problem are $V(t)$ and $A(t)$. In order to determine how they change with time, we first consider their change during a short time interval, $\Delta t$ minutes. In time $\Delta t$, $r_1\Delta t$ liters of solution flow into the tank, whereas $r_2\Delta t$ liters flow out. Thus during the time interval $\Delta t$, the change in the volume of solution in the tank is

$$\Delta V = r_1\Delta t - r_2\Delta t = (r_1 - r_2)\Delta t. \tag{1.7.2}$$

Since the concentration of chemical in the inflow is $c_1$ grams/liter (assumed constant), it follows that in the time interval $\Delta t$ the amount of chemical that flows into the tank is $c_1r_1\Delta t$. Similarly, the amount of chemical that flows out in this same time interval is approximately⁶ $c_2r_2\Delta t$. Thus, the total change in the amount of chemical in the tank during the time interval $\Delta t$, denoted by $\Delta A$, is approximately

$$\Delta A \approx c_1r_1\Delta t - c_2r_2\Delta t = (c_1r_1 - c_2r_2)\Delta t. \tag{1.7.3}$$

⁶ This is only an approximation, since $c_2$ is not constant over the time interval $\Delta t$. The approximation will become more accurate as $\Delta t \to 0$.

Dividing Equations (1.7.2) and (1.7.3) by $\Delta t$ yields

$$\frac{\Delta V}{\Delta t} = r_1 - r_2 \qquad \text{and} \qquad \frac{\Delta A}{\Delta t} \approx c_1r_1 - c_2r_2,$$

respectively. These equations describe the rates of change of $V$ and $A$ over the short, but finite, time interval $\Delta t$. In order to determine the instantaneous rates of change of $V$ and $A$, we take the limit as $\Delta t \to 0$ to obtain

$$\frac{dV}{dt} = r_1 - r_2 \tag{1.7.4}$$

and

$$\frac{dA}{dt} = c_1r_1 - \frac{r_2}{V}A, \tag{1.7.5}$$

where we have substituted for $c_2$ from Equation (1.7.1). Since $r_1$ and $r_2$ are constants, we can integrate Equation (1.7.4) directly, obtaining

$$V(t) = (r_1 - r_2)t + V_0,$$

where the integration constant has been chosen so that $V(0) = V_0$. Substituting for $V$ into Equation (1.7.5) and rearranging terms yields the linear equation for $A(t)$:

$$\frac{dA}{dt} + \frac{r_2}{(r_1 - r_2)t + V_0}A = c_1r_1. \tag{1.7.6}$$

This differential equation can be solved, subject to the initial condition $A(0) = A_0$, to determine the behavior of $A(t)$.

Remark: The reader need not memorize Equation (1.7.6), since it is better to derive it for each specific example.

Example 1.7.1
A tank contains 8 L (liters) of water in which is dissolved 32 g (grams) of chemical. A solution containing 2 g/L of the chemical flows into the tank at a rate of 4 L/min, and the well-stirred mixture flows out at a rate of 2 L/min.
1. Determine the amount of chemical in the tank after 20 minutes.
2. What is the concentration of chemical in the tank at that time?

Solution: We are given $r_1 = 4$ L/min, $r_2 = 2$ L/min, $c_1 = 2$ g/L, $V(0) = 8$ L, and $A(0) = 32$ g. For parts 1 and 2, we must find $A(20)$ and $A(20)/V(20)$, respectively. Now, $\Delta V = r_1\Delta t - r_2\Delta t$ implies that

$$\frac{dV}{dt} = 2.$$

Integrating this equation and imposing the initial condition that $V(0) = 8$ yields

$$V(t) = 2(t + 4). \tag{1.7.7}$$

Further, $\Delta A \approx c_1r_1\Delta t - c_2r_2\Delta t$ implies that

$$\frac{dA}{dt} = 8 - 2c_2.$$

That is, since $c_2 = A/V$,

$$\frac{dA}{dt} = 8 - 2\frac{A}{V}.$$

Substituting for $V$ from (1.7.7), we must solve

$$\frac{dA}{dt} + \frac{1}{t + 4}A = 8. \tag{1.7.8}$$

This first-order linear equation has integrating factor

$$I = e^{\int 1/(t + 4)\,dt} = t + 4.$$

Consequently (1.7.8) can be written in the equivalent form

$$\frac{d}{dt}[(t + 4)A] = 8(t + 4),$$

which can be integrated directly to obtain

$$(t + 4)A = 4(t + 4)^2 + c.$$

Hence

$$A(t) = \frac{1}{t + 4}\left[4(t + 4)^2 + c\right].$$

Imposing the given initial condition $A(0) = 32$ g implies that $c = 64$. Consequently

$$A(t) = \frac{4}{t + 4}\left[(t + 4)^2 + 16\right].$$

Setting $t = 20$ gives us the values for parts 1 and 2:
1. We have $A(20) = \frac{1}{6}\left[(24)^2 + 16\right] = \frac{296}{3}$ g.
2. Furthermore, using (1.7.7), $\frac{A(20)}{V(20)} = \frac{1}{48}\cdot\frac{296}{3} = \frac{37}{18}$ g/L.

Electric Circuits

An important application of differential equations arises from the analysis of simple electric circuits. The most basic electric circuit is obtained by connecting the ends of a wire to the terminals of a battery or generator. This causes a flow of charge, $q(t)$, measured in coulombs (C), through the wire, thereby producing a current, $i(t)$, measured in amperes (A), defined to be the rate of change of charge. Thus,

$$i(t) = \frac{dq}{dt}. \tag{1.7.9}$$

In practice a circuit will contain several components that oppose the flow of charge. As current passes through these components, work has to be done, and the loss of energy is described by the resulting voltage drop across each component. For the circuits that we will consider, the behavior of the current in the circuit is governed by Kirchhoff's second law, which can be stated as follows.

Kirchhoff's Second Law: The sum of the voltage drops around a closed circuit is zero.

In order to apply this law we need to know the relationship between the current passing through each component in the circuit and the resulting voltage drop.
The components of interest to us are resistors, capacitors, and inductors. We briefly describe each of these next.

1. Resistors: A resistor is a component that, owing to its composition, directly resists the flow of charge through it. According to Ohm's law, the voltage drop, $V_R$, between the ends of a resistor is directly proportional to the current that is passing through it. This is expressed mathematically as
$$V_R = iR, \tag{1.7.10}$$
where the constant of proportionality, $R$, is called the resistance of the resistor. The units of resistance are ohms (Ω).

2. Capacitors: A capacitor can be thought of as a component that stores charge and thereby opposes the passage of current. If $q(t)$ denotes the charge on the capacitor at time $t$, then the drop in voltage, $V_C$, as current passes through it is directly proportional to $q(t)$. It is usual to express this law in the form
$$V_C = \frac{1}{C}q, \tag{1.7.11}$$
where the constant $C$ is called the capacitance of the capacitor. The units of capacitance are farads (F).

3. Inductors: The third component that is of interest to us is an inductor. This can be considered as a component that opposes any change in the current flowing through it. The drop in voltage as current passes through an inductor is directly proportional to the rate at which the current is changing. We write this as
$$V_L = L\frac{di}{dt}, \tag{1.7.12}$$
where the constant $L$ is called the inductance of the inductor, measured in units of henrys (H).

4. EMF: The final component in our circuits will be a source of voltage that produces an electromotive force (EMF), driving the charge through the circuit. As current passes through the voltage source, there is a voltage gain, which we denote by $E(t)$ volts (that is, a voltage drop of $-E(t)$ volts).

[Figure 1.7.2: A simple RLC circuit, showing the current i(t), the EMF E(t), the inductance L, the capacitance C, the resistance R, and a switch.]
A circuit containing all of these components is shown in Figure 1.7.2. Such a circuit is called an RLC circuit. According to Kirchhoff's second law, the sum of the voltage drops at any instant must be zero. Applying this to the RLC circuit in Figure 1.7.2, we obtain

$$V_R + V_C + V_L - E(t) = 0. \tag{1.7.13}$$

Substituting into Equation (1.7.13) from (1.7.10)–(1.7.12) and rearranging yields the basic differential equation for an RLC circuit, namely,

$$L\frac{di}{dt} + Ri + \frac{q}{C} = E(t). \tag{1.7.14}$$

Three cases are important in applications, two of which are governed by first-order linear differential equations.

Case 1: An RL Circuit. In the case when no capacitor is present, we have what is referred to as an RL circuit. The differential equation (1.7.14) then reduces to

$$\frac{di}{dt} + \frac{R}{L}i = \frac{1}{L}E(t). \tag{1.7.15}$$

This is a first-order linear differential equation for the current in the circuit at any time $t$.

Case 2: An RC Circuit. Now consider the case when no inductor is present in the circuit. Setting $L = 0$ in Equation (1.7.14) and dividing by $R$ yields

$$i + \frac{1}{RC}q = \frac{E}{R}.$$

In this equation we have two unknowns, $q(t)$ and $i(t)$. Substituting from (1.7.9) for $i(t) = dq/dt$, we obtain the following differential equation for $q(t)$:

$$\frac{dq}{dt} + \frac{1}{RC}q = \frac{E}{R}. \tag{1.7.16}$$

In this case, the first-order linear differential equation (1.7.16) can be solved for the charge $q(t)$ on the plates of the capacitor. The current in the circuit can then be obtained from $i(t) = dq/dt$ by differentiation.

Case 3: An RLC Circuit. In the general case, we must consider all three components to be present in the circuit. Substituting from Equation (1.7.9) into Equation (1.7.14) and dividing by $L$ yields the following differential equation for determining the charge on the capacitor:

$$\frac{d^2q}{dt^2} + \frac{R}{L}\frac{dq}{dt} + \frac{1}{LC}q = \frac{1}{L}E(t).$$

We will develop techniques in Chapter 6 that enable us to solve this differential equation without difficulty.
For the remainder of this section we restrict our attention to RL and RC circuits. Since these are both governed by first-order linear differential equations, we can solve them using the technique derived in the previous section, once the applied EMF, $E(t)$, has been specified. The two most important forms for $E(t)$ are

$$E(t) = E_0 \qquad \text{and} \qquad E(t) = E_0\cos\omega t,$$

where $E_0$ and $\omega$ are constants. The first of these corresponds to a source of EMF such as a battery, and the resulting current is called a direct current (DC). The second form of EMF oscillates between $\pm E_0$, and the resulting current is called an alternating current (AC).

Example 1.7.2
Determine the current in an RL circuit if the applied EMF is $E(t) = E_0\cos\omega t$, where $E_0$ and $\omega$ are constants, and the initial current is zero.

Solution: Substituting into Equation (1.7.15) for $E(t)$ yields the differential equation

$$\frac{di}{dt} + \frac{R}{L}i = \frac{E_0}{L}\cos\omega t,$$

which we write as

$$\frac{di}{dt} + ai = \frac{E_0}{L}\cos\omega t, \tag{1.7.17}$$

where $a = R/L$. An integrating factor for (1.7.17) is $I(t) = e^{at}$, so that the equation can be written in the equivalent form

$$\frac{d}{dt}(e^{at}i) = \frac{E_0}{L}e^{at}\cos\omega t.$$

Integrating this equation using the standard integral

$$\int e^{at}\cos\omega t\,dt = \frac{1}{a^2 + \omega^2}e^{at}(a\cos\omega t + \omega\sin\omega t) + c,$$

we obtain

$$e^{at}i = \frac{E_0}{L(a^2 + \omega^2)}e^{at}(a\cos\omega t + \omega\sin\omega t) + c,$$

where $c$ is an integration constant. Consequently,

$$i(t) = \frac{E_0}{L(a^2 + \omega^2)}(a\cos\omega t + \omega\sin\omega t) + ce^{-at}.$$

Imposing the initial condition $i(0) = 0$, we find

$$c = -\frac{E_0a}{L(a^2 + \omega^2)},$$

so that

$$i(t) = \frac{E_0}{L(a^2 + \omega^2)}(a\cos\omega t + \omega\sin\omega t - ae^{-at}). \tag{1.7.18}$$

This solution can be written in the form $i(t) = i_S(t) + i_T(t)$, where

$$i_S(t) = \frac{E_0}{L(a^2 + \omega^2)}(a\cos\omega t + \omega\sin\omega t), \qquad i_T(t) = -\frac{aE_0}{L(a^2 + \omega^2)}e^{-at}.$$

The term $i_T(t)$ decays exponentially with time and is referred to as the transient part of the solution. As $t \to \infty$, the solution (1.7.18) approaches the steady-state solution, $i_S(t)$.
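As a numerical cross-check on (1.7.18), the sketch below integrates (1.7.17) with the classical fourth-order Runge–Kutta method (a standard numerical scheme, not one used in the text) and compares the result with the closed-form current. The circuit values $R = 2\ \Omega$, $L = 0.5$ H, $E_0 = 10$ V, and $\omega = 3$ rad/s are hypothetical, chosen only for illustration, and the function names are likewise assumptions.

```python
import math


def closed_form(t: float, R: float, L: float, E0: float, omega: float) -> float:
    """Current (1.7.18) for an RL circuit driven by E(t) = E0 cos(omega t), with i(0) = 0."""
    a = R / L
    k = E0 / (L * (a * a + omega * omega))
    return k * (a * math.cos(omega * t) + omega * math.sin(omega * t) - a * math.exp(-a * t))


def rk4_current(t_end: float, R: float, L: float, E0: float, omega: float,
                steps: int = 4000) -> float:
    """Integrate (1.7.17), di/dt = (E0/L) cos(omega t) - (R/L) i, from i(0) = 0
    to t = t_end with the classical fourth-order Runge-Kutta method."""
    a = R / L

    def rhs(t: float, i: float) -> float:
        return (E0 / L) * math.cos(omega * t) - a * i

    h = t_end / steps
    t, i = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, i)
        k2 = rhs(t + h / 2, i + h * k1 / 2)
        k3 = rhs(t + h / 2, i + h * k2 / 2)
        k4 = rhs(t + h, i + h * k3)
        i += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return i


if __name__ == "__main__":
    R, L, E0, omega = 2.0, 0.5, 10.0, 3.0  # hypothetical circuit values
    for t in (0.5, 1.0, 2.0, 5.0):
        print(f"t = {t:3.1f}  RK4 = {rk4_current(t, R, L, E0, omega):.8f}  "
              f"closed form = {closed_form(t, R, L, E0, omega):.8f}")
```

For these values $a = R/L = 4$, so by $t = 5$ the transient factor $e^{-at} = e^{-20}$ is about $2 \times 10^{-9}$ and the printed current is essentially the steady-state current $i_S(t)$.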
The steady-state solution can be written in a more illuminating form as follows. If we construct the right-angled triangle (see Figure 1.7.3) with sides a and $\omega$, then the hypotenuse of the triangle is $\sqrt{a^2+\omega^2}$. Consequently, since a and $\omega$ are both positive, there exists a unique angle $\phi$ in $(0, \pi/2)$ such that

$$\cos\phi = \frac{a}{\sqrt{a^2+\omega^2}}, \qquad \sin\phi = \frac{\omega}{\sqrt{a^2+\omega^2}}.$$

Equivalently,

$$a = \sqrt{a^2+\omega^2}\cos\phi, \qquad \omega = \sqrt{a^2+\omega^2}\sin\phi.$$

Figure 1.7.3: Defining the phase angle for an RL circuit.

Substituting for a and $\omega$ into the expression for $i_S$ yields

$$i_S(t) = \frac{E_0}{L\sqrt{a^2+\omega^2}}(\cos\omega t\cos\phi + \sin\omega t\sin\phi),$$

which can be written, using the identity $\cos(A - B) = \cos A\cos B + \sin A\sin B$, as

$$i_S(t) = \frac{E_0}{L\sqrt{a^2+\omega^2}}\cos(\omega t - \phi).$$

This is referred to as the phase-amplitude form of the solution. Comparing this with the original driving term, $E_0\cos\omega t$, we see that the system has responded with a steady-state solution having the same periodic behavior, but with a phase shift of $\phi$ radians. Furthermore, the amplitude of the response is

$$A = \frac{E_0}{L\sqrt{a^2+\omega^2}} = \frac{E_0}{\sqrt{R^2+\omega^2L^2}}, \tag{1.7.19}$$

where we have substituted for $a = R/L$. This is illustrated in Figure 1.7.4.

Figure 1.7.4: The response of an RL circuit to the driving term $E(t) = E_0\cos\omega t$.

The general picture, therefore, is that the transient part of the solution affects i(t) for a short period of time, after which the current settles into a steady state. When the driving EMF has the form $E(t) = E_0\cos\omega t$, the steady-state current oscillates with the same frequency $\omega$ as the driving EMF, shifted in phase by $\phi$ and with the amplitude given in Equation (1.7.19). This general behavior is illustrated in Figure 1.7.5.

Figure 1.7.5: The transient part of the solution for an RL circuit dies out as t increases.

Our next example illustrates the procedure for solving the differential equation (1.7.16) governing the behavior of an RC circuit.
Example 1.7.3 Consider the RC circuit in which R = 0.5 Ω, C = 0.1 F, and the applied EMF is the constant $E_0 = 20$ V. Given that the capacitor has zero initial charge, determine the current in the circuit after 0.25 seconds.

Solution: In this case we first solve Equation (1.7.16) for q(t) and then determine the current in the circuit by differentiating the result. Substituting for R, C, and E into Equation (1.7.16) yields

$$\frac{dq}{dt} + 20q = 40,$$

which has general solution $q(t) = 2 + ce^{-20t}$, where c is an integration constant. Imposing the initial condition q(0) = 0 yields c = −2, so that

$$q(t) = 2\left(1 - e^{-20t}\right).$$

Differentiating this expression for q gives the current in the circuit,

$$i(t) = \frac{dq}{dt} = 40e^{-20t}.$$

Consequently, $i(0.25) = 40e^{-5} \approx 0.27$ A.

Exercises for 1.7

Key Terms
Mixing problem, Concentration, Electric circuit, Kirchhoff's second law, Resistor, Capacitor, Inductor, Electromotive force (EMF), RL circuit, RC circuit, RLC circuit, Direct current, Alternating current, Transient solution, Steady-state solution, Phase, Amplitude.

Skills
• Be able to use information about a mixing problem to provide the correct mathematical formulation of the problem.
• Be able to solve mixing problems by deriving and solving the differential equation (1.7.6) for a specific mixing problem and using initial conditions.
• Know the relationship between the charge and the current in an electric circuit.
• Be familiar with the basic components of an electric circuit, such as electromotive force, resistors, capacitors, and inductors.
• Be able to write down and solve the differential equation for the current in an RL circuit and for the charge in an RC circuit, for either a direct current or an alternating current.
• Be able to identify the transient and steady-state components of current in an electric circuit with an alternating current.
• Be able to put the steady-state component of the current in an RL circuit in phase-amplitude form, and identify the phase shift and the amplitude.

True-False Review
For Questions 1–8, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. The amount of chemical A(t) in a tank at time t is obtained by multiplying the concentration of chemical c(t) in the tank at time t by the volume of the solution, V(t), at time t.
2. If $r_1$ and $r_2$ denote the rates at which fluid is flowing into a tank and out of the tank, respectively, then the rate of change of the volume of the tank is $r_2 - r_1$.
3. For the mixing problems described in this section, we assume that the concentration of the chemical entering the tank is independent of time.
4. For the mixing problems described in this section, we assume that the concentration of the chemical leaving the tank is independent of time.
5. Kirchhoff's second law states the sum of the voltage drops around a closed circuit is independent of time.
6. The larger the resistance in a resistor, the greater the voltage drop between the ends of the resistor.
7. Given an alternating current in an RL circuit, the transient part of the current decays to zero with time, while the steady-state part of the current oscillates with the same frequency as the applied EMF.
8. The higher the frequency of an applied EMF in an RL circuit, the lower the amplitude of the steady-state current.

Problems
1. A container initially contains 10 L of water in which there is 20 g of salt dissolved. A solution containing 4 g/L of salt is pumped into the container at a rate of 2 L/min, and the well-stirred mixture runs out at a rate of 1 L/min. How much salt is in the tank after 40 minutes?
2. A tank initially contains 600 L of solution in which there is dissolved 1500 g of chemical. A solution containing 5 g/L of the chemical flows into the tank at a rate of 6 L/min, and the well-stirred mixture flows out at a rate of 3 L/min. Determine the concentration of chemical in the tank after one hour.
3. A tank whose volume is 40 L initially contains 20 L of water. A solution containing 10 g/L of salt is pumped into the tank at a rate of 4 L/min, and the well-stirred mixture flows out at a rate of 2 L/min. How much salt is in the tank just before the solution overflows?
4. A tank whose volume is 200 L is initially half full of a solution that contains 100 g of chemical. A solution containing 0.5 g/L of the same chemical flows into the tank at a rate of 6 L/min, and the well-stirred mixture flows out at a rate of 4 L/min. Determine the concentration of chemical in the tank just before the solution overflows.
5. A tank initially contains 10 L of a salt solution. Water flows into the tank at a rate of 3 L/min, and the well-stirred mixture flows out at a rate of 2 L/min. After 5 min, the concentration of salt in the tank is 0.2 g/L. Find:
(a) The amount of salt in the tank initially.
(b) The volume of solution in the tank when the concentration of salt is 0.1 g/L.
6. A tank initially contains 20 L of water. A solution containing 1 g/L of chemical flows into the tank at a rate of 3 L/min, and the mixture flows out at a rate of 2 L/min.
(a) Set up and solve the initial-value problem for A(t), the amount of chemical in the tank at time t.
(b) When does the concentration of chemical in the tank reach 0.5 g/L?
7. A tank initially contains w liters of a solution in which is dissolved $A_0$ grams of chemical. A solution containing k g/L of this chemical flows into the tank at a rate of r L/min, and the mixture flows out at the same rate.
(a) Show that the amount of chemical, A(t), in the tank at time t is $A(t) = e^{-rt/w}\left[kw\left(e^{rt/w} - 1\right) + A_0\right]$.
(b) Show that as $t \to \infty$, the concentration of chemical in the tank approaches k g/L. Is this result reasonable? Explain.
8. Consider the double mixing problem depicted in Figure 1.7.6.

Figure 1.7.6: Double mixing problem.

(a) Show that the following are differential equations for $A_1(t)$ and $A_2(t)$:
$$\frac{dA_1}{dt} + \frac{r_2A_1}{(r_1 - r_2)t + V_1} = c_1r_1, \qquad \frac{dA_2}{dt} + \frac{r_3A_2}{(r_2 - r_3)t + V_2} = \frac{r_2A_1}{(r_1 - r_2)t + V_1},$$
where $V_1$ and $V_2$ are constants.
(b) Let $r_1 = 6$ L/min, $r_2 = 4$ L/min, $r_3 = 3$ L/min, and $c_1 = 0.5$ g/L. If the first tank initially holds 40 L of water in which 4 grams of chemical is dissolved, whereas the second tank initially contains 20 g of chemical dissolved in 20 L of water, determine the amount of chemical in the second tank after 10 min.
9. Consider the RL circuit in which R = 4 Ω, L = 0.1 H, and E(t) = 20 V. If no current is flowing initially, determine the current in the circuit for t ≥ 0.
10. Consider the RC circuit which has R = 5 Ω, C = 1/50 F, and E(t) = 100 V. If the capacitor is uncharged initially, determine the current in the circuit for t ≥ 0.
11. An RL circuit has EMF E(t) = 10 sin 4t V. If R = 2 Ω, L = 2/3 H, and there is no current flowing initially, determine the current for t ≥ 0.
12. Consider the RC circuit with R = 2 Ω, C = 1/8 F, and E(t) = 10 cos 3t V. If q(0) = 1 C, determine the current in the circuit for t ≥ 0.
13. Consider the general RC circuit with E(t) = 0. Suppose that q(0) = 5 C. Determine the charge on the capacitor for t > 0. What happens as $t \to \infty$? Is this reasonable? Explain.
14. Determine the current in an RC circuit if the capacitor has zero charge initially and the driving EMF is $E = E_0$, where $E_0$ is a constant. Make a sketch showing the change in the charge q(t) on the capacitor with time and show that q(t) approaches a constant value as $t \to \infty$. What happens to the current in the circuit as $t \to \infty$?
15. Determine the current flowing in an RL circuit if the applied EMF is $E(t) = E_0\sin\omega t$, where $E_0$ and $\omega$ are constants. Identify the transient part of the solution and the steady-state solution.
16. Determine the current flowing in an RL circuit if the applied EMF is constant and the initial current is zero.
17. Determine the current flowing in an RC circuit if the capacitor is initially uncharged and the driving EMF is given by $E(t) = E_0e^{-at}$, where $E_0$ and a are constants.
18. Consider the special case of the RLC circuit in which the resistance is negligible and the driving EMF is zero. The differential equation governing the charge on the capacitor in this case is
$$\frac{d^2q}{dt^2} + \frac{1}{LC}q = 0.$$
If the capacitor has an initial charge of $q_0$ coulombs, and no current is flowing initially, determine the charge on the capacitor for t > 0, and the corresponding current in the circuit. [Hint: Let u = dq/dt and use the chain rule to show that this implies du/dt = u(du/dq).]
19. Repeat the previous problem for the case in which the driving EMF is $E(t) = E_0$, a constant.

1.8 Change of Variables

So far we have introduced techniques for solving separable and first-order linear differential equations. Clearly, most first-order differential equations are not of these two types. In this section, we consider two further types of differential equations that can be solved by using a change of variables to reduce them to one of the types we know how to solve. The key point to grasp, however, is not the specific changes of variables that we discuss, but the general idea of changing variables in a differential equation. Further examples are considered in the exercises. We first require a preliminary definition.

DEFINITION 1.8.1 A function f(x, y) is said to be homogeneous of degree zero⁷ if f(tx, ty) = f(x, y) for all positive values of t for which (tx, ty) is in the domain of f.
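Definition 1.8.1 can be spot-checked numerically: evaluate f at (x, y) and at (tx, ty) for random positive t and compare the two values. The Python sketch below does exactly that. The helper name is_homogeneous_degree_zero, the sampling ranges, and the tolerance are illustrative assumptions; the points are restricted to x, y in [0.5, 3] so that the sample functions stay inside their domains. A passing spot check is evidence, not a proof, whereas a failing one exhibits a genuine counterexample.

```python
import math
import random


def is_homogeneous_degree_zero(f, samples=200, seed=0, tol=1e-9):
    """Spot-check Definition 1.8.1: f(t*x, t*y) == f(x, y) for random t > 0.

    Returns False as soon as a counterexample is found; True only means
    that no counterexample turned up among the sampled points.
    """
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.uniform(0.5, 3.0)
        y = rng.uniform(0.5, 3.0)
        t = rng.uniform(0.1, 10.0)  # only positive t, as the definition requires
        if not math.isclose(f(t * x, t * y), f(x, y), rel_tol=tol, abs_tol=tol):
            return False
    return True


if __name__ == "__main__":
    # y/x, the simplest nonconstant example
    print(is_homogeneous_degree_zero(lambda x, y: y / x))  # True
    # the function of Example 1.8.2
    print(is_homogeneous_degree_zero(
        lambda x, y: (x**2 - y**2) / (2 * x * y + y**2)))  # True
    # x - y is homogeneous of degree one, not zero
    print(is_homogeneous_degree_zero(lambda x, y: x - y))  # False
```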
Remark Equivalently, we can say that f is homogeneous of degree zero if it is invariant under a rescaling of the variables x and y. The simplest nonconstant functions that are homogeneous of degree zero are f(x, y) = y/x and f(x, y) = x/y.

Example 1.8.2 If
$$f(x,y) = \frac{x^2 - y^2}{2xy + y^2},$$
then
$$f(tx,ty) = \frac{t^2(x^2 - y^2)}{t^2(2xy + y^2)} = f(x,y),$$
so that f is homogeneous of degree zero.

In the previous example, if we factor an $x^2$ term from the numerator and denominator, then the function f can be written in the form
$$f(x,y) = \frac{x^2[1 - (y/x)^2]}{x^2[2(y/x) + (y/x)^2]}.$$
That is,
$$f(x,y) = \frac{1 - (y/x)^2}{2(y/x) + (y/x)^2}.$$
Thus f can be considered to depend on the single variable V = y/x. The following theorem establishes that this is a basic property of all functions that are homogeneous of degree zero.

⁷ More generally, f(x, y) is said to be homogeneous of degree m if $f(tx, ty) = t^m f(x, y)$.

Theorem 1.8.3 A function f(x, y) is homogeneous of degree zero if and only if it depends on y/x only.

Proof Suppose that f is homogeneous of degree zero. We must consider two cases separately.
(a) If x > 0, we can take t = 1/x in Definition 1.8.1 to obtain f(x, y) = f(1, y/x), which is a function of V = y/x only.
(b) If x < 0, then we can take t = −1/x (which is positive) in Definition 1.8.1. In this case we obtain f(x, y) = f(−1, −y/x), which once more depends on y/x only.
Conversely, suppose that f(x, y) depends only on y/x. If we replace x by tx and y by ty, then f is unaltered, since y/x = (ty)/(tx), and hence f is homogeneous of degree zero.

Remark Do not memorize the formulas in the preceding theorem. Just remember that a function f(x, y) that is homogeneous of degree zero depends only on the combination y/x and hence can be considered as a function of a single variable, say V, where V = y/x.

We now consider solving differential equations that satisfy the following definition.
DEFINITION 1.8.4 If f(x, y) is homogeneous of degree zero, then the differential equation
$$\frac{dy}{dx} = f(x,y)$$
is called a homogeneous first-order differential equation.

In general, if dy/dx = f(x, y) is a homogeneous first-order differential equation, then we cannot solve it directly. However, our preceding discussion implies that such a differential equation can be written in the equivalent form
$$\frac{dy}{dx} = F(y/x), \tag{1.8.1}$$
for an appropriate function F. This suggests that, instead of using the variables x and y, we should use the variables x and V, where V = y/x, or equivalently,
$$y = xV(x). \tag{1.8.2}$$
Substitution of (1.8.2) into the right-hand side of Equation (1.8.1) has the effect of reducing it to a function of V only. We must also determine how the derivative term dy/dx transforms. Differentiating (1.8.2) with respect to x using the product rule yields the following relationship between dy/dx and dV/dx:
$$\frac{dy}{dx} = x\frac{dV}{dx} + V.$$
Substituting into Equation (1.8.1), we therefore obtain
$$x\frac{dV}{dx} + V = F(V),$$
or equivalently,
$$x\frac{dV}{dx} = F(V) - V.$$
The variables can now be separated to yield
$$\frac{1}{F(V) - V}\,dV = \frac{1}{x}\,dx,$$
which can be solved directly by integration. We have therefore established the next theorem.

Theorem 1.8.5 The change of variables y = xV(x) reduces a homogeneous first-order differential equation dy/dx = f(x, y) to the separable equation
$$\frac{1}{F(V) - V}\,dV = \frac{1}{x}\,dx.$$

Remark The separable equation that results from the previous technique can be integrated to obtain a relationship between V and x. We then obtain the solution to the given differential equation by substituting y/x for V in this relationship.

Example 1.8.6 Find the general solution to
$$\frac{dy}{dx} = \frac{4x + y}{x - 4y}. \tag{1.8.3}$$

Solution: The function on the right-hand side of Equation (1.8.3) is homogeneous of degree zero, so that we have a first-order homogeneous differential equation.
Substituting y = xV into the equation yields
$$\frac{d}{dx}(xV) = \frac{4 + V}{1 - 4V}.$$
That is,
$$x\frac{dV}{dx} + V = \frac{4 + V}{1 - 4V},$$
or equivalently,
$$x\frac{dV}{dx} = \frac{4(1 + V^2)}{1 - 4V}.$$
Separating the variables gives
$$\frac{1 - 4V}{4(1 + V^2)}\,dV = \frac{1}{x}\,dx.$$
We write this as
$$\left[\frac{1}{4(1 + V^2)} - \frac{V}{1 + V^2}\right]dV = \frac{1}{x}\,dx,$$
which can be integrated directly to obtain
$$\frac{1}{4}\arctan V - \frac{1}{2}\ln(1 + V^2) = \ln|x| + c.$$
Substituting V = y/x and multiplying through by 2 yields
$$\frac{1}{2}\arctan\frac{y}{x} - \ln\left(\frac{x^2 + y^2}{x^2}\right) = \ln(x^2) + c_1,$$
which simplifies to
$$\frac{1}{2}\arctan\frac{y}{x} - \ln(x^2 + y^2) = c_1. \tag{1.8.4}$$
Although this technically gives the answer, the solution is more easily expressed in terms of polar coordinates:
$$x = r\cos\theta \ \text{ and } \ y = r\sin\theta \iff r = \sqrt{x^2 + y^2} \ \text{ and } \ \theta = \arctan\frac{y}{x}.$$
Substituting into Equation (1.8.4) yields
$$\frac{1}{2}\theta - \ln(r^2) = c_1,$$
or equivalently,
$$\ln r = \frac{1}{4}\theta + c_2.$$
Exponentiating both sides of this equation gives
$$r = c_3e^{\theta/4}.$$
For each value of $c_3$, this is the equation of a logarithmic spiral. The particular spiral with equation $r = \tfrac{1}{2}e^{\theta/4}$ is shown in Figure 1.8.1.

Figure 1.8.1: Graph of the logarithmic spiral with polar equation $r = \tfrac{1}{2}e^{\theta/4}$, $-5\pi/6 \le \theta \le 22\pi/6$.

Example 1.8.7 Find the equation of the orthogonal trajectories to the family
$$x^2 + y^2 - 2cx = 0. \tag{1.8.5}$$
(Completing the square in x, we obtain $(x - c)^2 + y^2 = c^2$, which represents the family of circles centered at (c, 0), with radius |c|.)

Solution: First we need an expression for the slope of the given family at the point (x, y). Differentiating Equation (1.8.5) implicitly with respect to x yields
$$2x + 2y\frac{dy}{dx} - 2c = 0,$$
which simplifies to
$$\frac{dy}{dx} = \frac{c - x}{y}. \tag{1.8.6}$$
This is not the differential equation of the given family, since it still contains the constant c and hence depends on the individual curves in the family.
Therefore, we must eliminate c to obtain an expression for the slope of the family that is independent of any particular curve in the family. From Equation (1.8.5) we have
$$c = \frac{x^2 + y^2}{2x}.$$
Substituting this expression for c into Equation (1.8.6) and simplifying gives
$$\frac{dy}{dx} = \frac{y^2 - x^2}{2xy}.$$
Therefore, the differential equation for the family of orthogonal trajectories is
$$\frac{dy}{dx} = -\frac{2xy}{y^2 - x^2}. \tag{1.8.7}$$
This differential equation is first-order homogeneous. Substituting y = xV(x) into Equation (1.8.7) yields
$$\frac{d}{dx}(xV) = \frac{2V}{1 - V^2},$$
so that
$$x\frac{dV}{dx} + V = \frac{2V}{1 - V^2}.$$
Hence
$$x\frac{dV}{dx} = \frac{V + V^3}{1 - V^2},$$
or in separated form,
$$\frac{1 - V^2}{V(1 + V^2)}\,dV = \frac{1}{x}\,dx.$$
Decomposing the left-hand side into partial fractions yields
$$\left(\frac{1}{V} - \frac{2V}{1 + V^2}\right)dV = \frac{1}{x}\,dx,$$
which can be integrated directly to obtain
$$\ln|V| - \ln(1 + V^2) = \ln|x| + c,$$
or equivalently,
$$\ln\frac{|V|}{1 + V^2} = \ln|x| + c.$$
Exponentiating both sides and redefining the constant yields
$$\frac{V}{1 + V^2} = c_1x.$$
Substituting back for V = y/x, we obtain
$$\frac{xy}{x^2 + y^2} = c_1x.$$
That is, $x^2 + y^2 = c_2y$, where $c_2 = 1/c_1$. Completing the square in y yields
$$x^2 + (y - k)^2 = k^2, \tag{1.8.8}$$
where $k = c_2/2$. Equation (1.8.8) is the equation of the family of orthogonal trajectories. This is the family of circles centered at (0, k) with radius |k|, that is, circles centered on the y-axis and passing through the origin. (See Figure 1.8.2.)

Figure 1.8.2: The family $(x - c)^2 + y^2 = c^2$ and its orthogonal trajectories $x^2 + (y - k)^2 = k^2$.

Bernoulli Equations

We now consider a special type of nonlinear differential equation that can be reduced to a linear equation by a change of variables.

DEFINITION 1.8.8 A differential equation that can be written in the form
$$\frac{dy}{dx} + p(x)y = q(x)y^n, \tag{1.8.9}$$
where n is a real constant, is called a Bernoulli equation.

If n = 0 or n = 1, Equation (1.8.9) is linear, but otherwise it is nonlinear.
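To see the reduction to a linear equation at work numerically, the Python sketch below takes the illustrative Bernoulli equation $y' + y = y^2$ (so n = 2 and p(x) = q(x) = 1, an assumption chosen because its solution $y = 1/(1 + e^x)$ for y(0) = 1/2 is easy to write down). It integrates the nonlinear equation directly, and separately integrates the linear equation produced by the substitution $u = y^{1-n}$ (Equation (1.8.12) below), then converts back to y. The hand-written RK4 integrator is a standard numerical method assumed for illustration; it is not part of the text's method.

```python
import math

N_EXP = 2  # Bernoulli exponent n (illustrative choice)


def p(x):
    return 1.0  # illustrative coefficient p(x)


def q(x):
    return 1.0  # illustrative coefficient q(x)


def bernoulli_rhs(x, y):
    """Equation (1.8.9) solved for dy/dx: the original nonlinear equation."""
    return q(x) * y**N_EXP - p(x) * y


def linear_rhs(x, u):
    """Linear equation for u = y**(1 - n), solved for du/dx."""
    return (1 - N_EXP) * q(x) - (1 - N_EXP) * p(x) * u


def rk4(f, x0, y0, x_end, n_steps):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    for _ in range(n_steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


def y_closed_form(x):
    """Exact solution of y' + y = y**2 with y(0) = 1/2."""
    return 1.0 / (1.0 + math.exp(x))


def y_via_linear(x_end, y0, n_steps=1000):
    """Solve the linear u-equation from x = 0, then undo u = y**(1 - n).

    Undoing the substitution assumes u stays positive, which holds here
    because u = 1 + e**x.
    """
    u_end = rk4(linear_rhs, 0.0, y0 ** (1 - N_EXP), x_end, n_steps)
    return u_end ** (1.0 / (1 - N_EXP))


if __name__ == "__main__":
    direct = rk4(bernoulli_rhs, 0.0, 0.5, 2.0, 1000)
    print(direct, y_via_linear(2.0, 0.5), y_closed_form(2.0))
```

The design point is that the linear route never touches the nonlinear term $y^n$; it only needs the transformed coefficients $(1-n)p(x)$ and $(1-n)q(x)$.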
When n ≠ 0, 1, we can reduce Equation (1.8.9) to a linear equation as follows. We first divide Equation (1.8.9) by $y^n$ to obtain
$$y^{-n}\frac{dy}{dx} + y^{1-n}p(x) = q(x). \tag{1.8.10}$$
We now make the change of variables
$$u(x) = y^{1-n}, \tag{1.8.11}$$
which implies that
$$\frac{du}{dx} = (1 - n)y^{-n}\frac{dy}{dx}.$$
That is,
$$y^{-n}\frac{dy}{dx} = \frac{1}{1 - n}\frac{du}{dx}.$$
Substituting into Equation (1.8.10) for $y^{1-n}$ and $y^{-n}\,dy/dx$ yields the linear differential equation
$$\frac{1}{1 - n}\frac{du}{dx} + p(x)u = q(x),$$
or in standard form,
$$\frac{du}{dx} + (1 - n)p(x)u = (1 - n)q(x). \tag{1.8.12}$$
The linear equation (1.8.12) can now be solved for u as a function of x. The solution to the original equation is then obtained from (1.8.11).

Example 1.8.9 Solve
$$\frac{dy}{dx} + \frac{3}{x}y = \frac{12y^{2/3}}{\sqrt{1 + x^2}}, \qquad x > 0.$$

Solution: The differential equation is a Bernoulli equation with n = 2/3. Dividing both sides of the differential equation by $y^{2/3}$ yields
$$y^{-2/3}\frac{dy}{dx} + \frac{3}{x}y^{1/3} = \frac{12}{\sqrt{1 + x^2}}. \tag{1.8.13}$$
We make the change of variables
$$u = y^{1/3}, \tag{1.8.14}$$
which implies that
$$\frac{du}{dx} = \frac{1}{3}y^{-2/3}\frac{dy}{dx}.$$
Substituting into Equation (1.8.13) yields
$$3\frac{du}{dx} + \frac{3}{x}u = \frac{12}{\sqrt{1 + x^2}},$$
or in standard form,
$$\frac{du}{dx} + \frac{1}{x}u = \frac{4}{\sqrt{1 + x^2}}. \tag{1.8.15}$$
An integrating factor for this linear equation is
$$I(x) = e^{\int (1/x)\,dx} = e^{\ln x} = x,$$
so that Equation (1.8.15) can be written as
$$\frac{d}{dx}(xu) = \frac{4x}{\sqrt{1 + x^2}}.$$
Integrating, we obtain
$$u(x) = x^{-1}\left(4\sqrt{1 + x^2} + c\right),$$
and so, from (1.8.14), the solution to the original differential equation is
$$y^{1/3} = x^{-1}\left(4\sqrt{1 + x^2} + c\right).$$

Exercises for 1.8

Key Terms
Homogeneous of degree zero, Homogeneous first-order differential equation, Bernoulli equation.

Skills
• Be able to recognize whether or not a function f(x, y) is homogeneous of degree zero, and whether or not a given differential equation is a homogeneous first-order differential equation.
• Know how to change the variables in a homogeneous first-order differential equation in order to get a differential equation that is separable and thus can be solved.
• Be able to recognize whether or not a given first-order differential equation is a Bernoulli equation.
• Know how to change the variables in a Bernoulli equation in order to get a differential equation that is first-order linear and thus can be solved.
• Be able to make other changes of variables to differential equations in order to turn them into differential equations that can be solved by methods from earlier in this chapter.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. The function $f(x,y) = \dfrac{2xy - x^2}{2xy + y^2}$ is homogeneous of degree zero.
2. The function $f(x,y) = \dfrac{y^2}{x + y^2}$ is homogeneous of degree zero.
3. The differential equation $\dfrac{dy}{dx} = \dfrac{1 + xy^2}{1 + x^2y}$ is a first-order homogeneous differential equation.
4. The differential equation $\dfrac{dy}{dx} = \dfrac{x^2y^2}{x^4 + y^4}$ is a first-order homogeneous differential equation.
5. The change of variables y = xV(x) always turns a first-order homogeneous differential equation into a separable differential equation for V as a function of x.
6. The change of variables $u = y^{-n}$ always turns a Bernoulli differential equation into a first-order linear differential equation for u as a function of x.
7. The differential equation $\dfrac{dy}{dx} = \sqrt{x}\,y + x\sqrt{y}$ is a Bernoulli differential equation.
8. The differential equation $\dfrac{dy}{dx} - e^{xy}y = 5x\sqrt{y}$ is a Bernoulli differential equation.
9. The differential equation $\dfrac{dy}{dx} + xy = x^2y^{2/3}$ is a Bernoulli differential equation.

Problems
For Problems 1–8, determine whether the given function is homogeneous of degree zero. Rewrite those that are as functions of the single variable V = y/x.
1. $f(x,y) = \dfrac{x^2 - y^2}{xy}$.
2. $f(x,y) = x - y$.
3. $f(x,y) = \dfrac{x\sin(x/y) - y\cos(y/x)}{y}$.
4. $f(x,y) = \dfrac{\sqrt{x^2 + y^2}}{x - y}$, x > 0.
5. $f(x,y) = \dfrac{y}{x - 1}$.
6. $f(x,y) = \dfrac{x - 3}{y} + \dfrac{5y + 9}{3y}$.
7. $f(x,y) = \dfrac{\sqrt{x^2 + y^2}}{x}$, x < 0.
8. $f(x,y) = \dfrac{\sqrt{x^2 + 4y^2} - x + y}{x + 3y}$, x, y ≠ 0.

For Problems 9–22, solve the given differential equation.
9. $(3x - 2y)\dfrac{dy}{dx} = 3y$.
10. $y' = \dfrac{(x + y)^2}{2x^2}$.
11. $\sin\left(\dfrac{y}{x}\right)(xy' - y) = x\cos\left(\dfrac{y}{x}\right)$.
12. $xy' = \sqrt{16x^2 - y^2} + y$, x > 0.
13. $xy' - y = \sqrt{9x^2 + y^2}$, x > 0.
14. $y(x^2 - y^2)\,dx - x(x^2 + y^2)\,dy = 0$.
15. $xy' + y\ln x = y\ln y$.
16. $\dfrac{dy}{dx} = \dfrac{y^2 + 2xy - 2x^2}{x^2 - xy + y^2}$.
17. $2xy\,dy - \left(x^2e^{-y^2/x^2} + 2y^2\right)dx = 0$.
18. $\dfrac{dy}{dx} = \dfrac{y - \sqrt{4x^2 - y^2}}{x}$, x > 0.
19. $yy' = \sqrt{x^2 + y^2} - x$, x > 0.
20. $2x(y + 2x)y' = y(4x - y)$.
21. $x\dfrac{dy}{dx} = x\tan(y/x) + y$.
22. $\dfrac{dy}{dx} = \dfrac{x\sqrt{x^2 + y^2} + y^2}{xy}$, x > 0.
23. Solve the differential equation in Example 1.8.6 by first transforming it into polar coordinates. [Hint: Write the differential equation in differential form and then express dx and dy in terms of r and θ.]

For Problems 24–26, solve the given initial-value problem.
24. $\dfrac{dy}{dx} = \dfrac{2(2y - x)}{x + y}$, y(1) = 2.
25. $\dfrac{dy}{dx} = \dfrac{2x - y}{x + 4y}$, y(1) = 1.
26. $\dfrac{dy}{dx} = \dfrac{\sqrt{x^2 + y^2} - y}{x}$, y(3) = 4.
27. Find all solutions to $x^2\dfrac{dy}{dx} = y^2 + 3xy + x^2$.
28. (a) Show that the general solution to the differential equation $\dfrac{dy}{dx} = \dfrac{x + ay}{ax - y}$ can be written in polar form as $r = ke^{a\theta}$.
(b) For the particular case when a = 1/2, determine the solution satisfying the initial condition y(1) = 1, and find the maximum x-interval on which this solution is valid. [Hint: When does the solution curve have a vertical tangent?]
(c) On the same set of axes, sketch the spiral corresponding to your solution in (b), and the line y = x/2. Thus verify the x-interval obtained in (b) with the graph.

For Problems 29–30, determine the orthogonal trajectories to the given family of curves. Sketch some curves from each family.
29. $x^2 + y^2 = 2cy$.
30. $(x - c)^2 + (y - c)^2 = 2c^2$.
31. Fix a real number m. Let $S_1$ denote the family of circles, centered on the line y = mx, each member of which passes through the origin.
(a) Show that the equation of $S_1$ can be written in the form $(x - a)^2 + (y - ma)^2 = a^2(m^2 + 1)$, where a is a constant that labels particular members of the family.
(b) Determine the equation of the family of orthogonal trajectories to $S_1$, and show that it consists of the family of circles centered on the line x = −my that pass through the origin.
(c) Sketch some curves from both families when $m = \sqrt{3}/3$.

Let $F_1$ and $F_2$ be two families of curves with the property that whenever a curve from the family $F_1$ intersects one from the family $F_2$, it does so at an angle $\alpha \ne \pi/2$. If we know the equation of $F_2$, then it can be shown (see Problem 26 in Section 1.1) that the differential equation for determining $F_1$ is
$$\frac{dy}{dx} = \frac{m_2 - \tan\alpha}{1 + m_2\tan\alpha}, \tag{1.8.16}$$
where $m_2$ denotes the slope of the family $F_2$ at the point (x, y).

For Problems 32–34, use Equation (1.8.16) to determine the equation of the family of curves that cuts the given family at an angle α = π/4.
32. $x^2 + y^2 = c$.
33. $y = cx^6$.
34. $x^2 + y^2 = 2cx$.
35. (a) Use Equation (1.8.16) to find the equation of the family of curves that intersects the family of hyperbolas y = c/x at an angle $\alpha = \alpha_0$.
(b) When $\alpha_0 = \pi/4$, sketch several curves from each family.
36. (a) Use Equation (1.8.16) to show that the family of curves that intersects the family of concentric circles $x^2 + y^2 = c$ at an angle $\alpha = \tan^{-1}m$ has polar equation $r = ke^{m\theta}$.
(b) When α = π/6, sketch several curves from each family.

For Problems 37–49, solve the given differential equation.
37. $y' - x^{-1}y = 4x^2y^{-1}\cos x$, x > 0.
38. $\dfrac{dy}{dx} + \dfrac{1}{2}(\tan x)y = 2y^3\sin x$.
39. $\dfrac{dy}{dx} - \dfrac{3}{2x}y = 6y^{1/3}x^2\ln x$.
40. $y' + 2x^{-1}y = 6\sqrt{1 + x^2}\,\sqrt{y}$, x > 0.
41. $y' + 2x^{-1}y = 6y^2x^4$.
42. $2x(y' + y^3x^2) + y = 0$.
43. $(x - a)(x - b)(y' - \sqrt{y}) = 2(b - a)y$, where a, b are constants.
44. $y' + 6x^{-1}y = 3x^{-1}y^{2/3}\cos x$, x > 0.
45. $y' + 4xy = 4x^3y^{1/2}$.
46. $\dfrac{dy}{dx} - \dfrac{1}{2x\ln x}y = 2xy^3$.
47. $\dfrac{dy}{dx} - \dfrac{1}{(\pi - 1)x}y = \dfrac{3}{1 - \pi}xy^{\pi}$.
48. $2y' + y\cot x = 8y^{-1}\cos^3x$.
49. $(1 - \sqrt{3})y' + y\sec x = y^{\sqrt{3}}\sec x$.

For Problems 50–51, solve the given initial-value problem.
50. $\dfrac{dy}{dx} + \dfrac{2x}{1 + x^2}y = xy^2$, y(0) = 1.
51. $y' + y\cot x = y^3\sin^3x$, y(π/2) = 1.
52. Consider the differential equation
$$y' = F(ax + by + c), \tag{1.8.17}$$
where a, b ≠ 0, and c are constants. Show that the change of variables from x, y to x, V, where V = ax + by + c, reduces Equation (1.8.17) to the separable form
$$\frac{1}{bF(V) + a}\,dV = dx.$$

For Problems 53–55, use the result from the previous problem to solve the given differential equation. For Problem 53, impose the given initial condition as well.
53. $y' = (9x - y)^2$, y(0) = 0.
54. $y' = (4x + y + 2)^2$.
55. $y' = \sin^2(3x - 3y + 1)$.
56. Show that the change of variables V = xy transforms the differential equation
$$\frac{dy}{dx} = \frac{y}{x}F(xy)$$
into the separable differential equation
$$\frac{1}{V[F(V) + 1]}\frac{dV}{dx} = \frac{1}{x}.$$
57. Use the result from the previous problem to solve
$$\frac{dy}{dx} = \frac{y}{x}[\ln(xy) - 1].$$
58. Consider the differential equation
$$\frac{dy}{dx} = \frac{x + 2y - 1}{2x - y + 3}. \tag{1.8.18}$$
(a) Show that the change of variables defined by x = u − 1, y = v + 1 transforms Equation (1.8.18) into the homogeneous equation
$$\frac{dv}{du} = \frac{u + 2v}{2u - v}. \tag{1.8.19}$$
(b) Find the general solution to Equation (1.8.19), and hence solve Equation (1.8.18).
59. A differential equation of the form
$$y' + p(x)y + q(x)y^2 = r(x) \tag{1.8.20}$$
is called a Riccati equation.
(a) If y = Y(x) is a known solution to Equation (1.8.20), show that the substitution $y = Y(x) + v^{-1}(x)$ reduces it to the linear equation $v' - [p(x) + 2Y(x)q(x)]v = q(x)$.
(b) Find the general solution to the Riccati equation $x^2y' - xy - x^2y^2 = 1$, x > 0, given that $y = -x^{-1}$ is a solution.
60. Consider the Riccati equation
$$y' + 2x^{-1}y - y^2 = -2x^{-2}, \qquad x > 0. \tag{1.8.21}$$
(a) Determine the values of the constants a and r such that $y(x) = ax^r$ is a solution to Equation (1.8.21).
(b) Use the result from part (a) of the previous problem to determine the general solution to Equation (1.8.21).
61. (a) Show that the change of variables $y = x^{-1} + w$ transforms the Riccati differential equation
$$y' + 7x^{-1}y - 3y^2 = 3x^{-2} \tag{1.8.22}$$
into the Bernoulli equation
$$w' + x^{-1}w = 3w^2. \tag{1.8.23}$$
(b) Solve Equation (1.8.23), and hence determine the general solution to (1.8.22).
62. Consider the differential equation
$$y^{-1}y' + p(x)\ln y = q(x), \tag{1.8.24}$$
where p(x) and q(x) are continuous functions on some interval (a, b). Show that the change of variables u = ln y reduces Equation (1.8.24) to the linear differential equation $u' + p(x)u = q(x)$, and hence show that the general solution to Equation (1.8.24) is
$$y(x) = \exp\left\{I^{-1}\left[\int I(x)q(x)\,dx + c\right]\right\},$$
where
$$I = e^{\int p(x)\,dx} \tag{1.8.25}$$
and c is an arbitrary constant.
63. Use the technique derived in the previous problem to solve the initial-value problem $y^{-1}y' - 2x^{-1}\ln y = x^{-1}(1 - 2\ln x)$, y(1) = e.
64. Consider the differential equation
$$f'(y)\frac{dy}{dx} + p(x)f(y) = q(x), \tag{1.8.26}$$
where p and q are continuous functions on some interval (a, b), and f is an invertible function. Show that Equation (1.8.26) can be written as $\dfrac{du}{dx} + p(x)u = q(x)$, where u = f(y), and hence show that the general solution to Equation (1.8.26) is
$$y(x) = f^{-1}\left(I^{-1}\left[\int I(x)q(x)\,dx + c\right]\right),$$
where I is given in (1.8.25), $f^{-1}$ is the inverse of f, and c is an arbitrary constant.
65. Solve
$$\sec^2y\,\frac{dy}{dx} + \frac{1}{2\sqrt{1 + x}}\tan y = \frac{1}{2\sqrt{1 + x}}.$$

1.9 Exact Differential Equations
For the next technique it is best to consider first-order differential equations written in differential form
$$M(x,y)\,dx + N(x,y)\,dy = 0, \tag{1.9.1}$$
where M and N are given functions, assumed to be sufficiently smooth.⁸ The method that we will consider is based on the idea of a differential. Recall from a previous calculus course that if φ = φ(x, y) is a function of two variables, x and y, then the differential of φ, denoted dφ, is defined by
$$d\phi = \frac{\partial\phi}{\partial x}\,dx + \frac{\partial\phi}{\partial y}\,dy. \tag{1.9.2}$$

Example 1.9.1 Solve
$$2x\sin y\,dx + x^2\cos y\,dy = 0. \tag{1.9.3}$$

Solution: This equation is separable, but we will use a different technique to solve it. By inspection, we notice that
$$2x\sin y\,dx + x^2\cos y\,dy = d(x^2\sin y).$$
Consequently, Equation (1.9.3) can be written as $d(x^2\sin y) = 0$, which implies that $x^2\sin y$ is constant; hence the general solution to Equation (1.9.3) is
$$\sin y = \frac{c}{x^2},$$
where c is an arbitrary constant.

In the foregoing example we were able to write the given differential equation in the form dφ(x, y) = 0, and hence obtain its solution. However, we cannot always do this. Indeed, comparing Equation (1.9.1) with (1.9.2), we see that the differential equation M(x, y) dx + N(x, y) dy = 0 can be written as dφ = 0 if and only if
$$M = \frac{\partial\phi}{\partial x} \quad\text{and}\quad N = \frac{\partial\phi}{\partial y}$$
for some function φ. This motivates the following definition.

⁸ This means we assume that the functions M and N have continuous derivatives of sufficiently high order.

DEFINITION 1.9.2 The differential equation M(x, y) dx + N(x, y) dy = 0 is said to be exact in a region R of the xy-plane if there exists a function φ(x, y) such that
$$\frac{\partial\phi}{\partial x} = M, \qquad \frac{\partial\phi}{\partial y} = N, \tag{1.9.4}$$
for all (x, y) in R. Any function φ satisfying (1.9.4) is called a potential function for the differential equation M(x, y) dx + N(x, y) dy = 0.
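Definition 1.9.2 can be checked numerically for a candidate potential function: approximate ∂φ/∂x and ∂φ/∂y by central differences and compare them with M and N. The Python sketch below does this for Example 1.9.1, where φ(x, y) = x² sin y. The helper names, the step size h, the tolerance, and the sample points are illustrative assumptions, and agreement at a few points is evidence rather than a proof that (1.9.4) holds throughout a region.

```python
import math


def M(x, y):
    """Coefficient of dx in Equation (1.9.3)."""
    return 2 * x * math.sin(y)


def N(x, y):
    """Coefficient of dy in Equation (1.9.3)."""
    return x**2 * math.cos(y)


def phi(x, y):
    """Candidate potential function from Example 1.9.1."""
    return x**2 * math.sin(y)


def partial_x(f, x, y, h=1e-6):
    """Central-difference approximation to the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Central-difference approximation to the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def is_potential(f, m_func, n_func, points, tol=1e-6):
    """Check f_x = M and f_y = N, as in (1.9.4), at each sample point."""
    return all(
        abs(partial_x(f, x, y) - m_func(x, y)) < tol
        and abs(partial_y(f, x, y) - n_func(x, y)) < tol
        for x, y in points
    )


if __name__ == "__main__":
    sample_points = [(0.5, 0.3), (1.2, -1.0), (2.0, 2.5)]
    print(is_potential(phi, M, N, sample_points))  # True
    # x sin y is not a potential function for (1.9.3)
    print(is_potential(lambda x, y: x * math.sin(y), M, N, sample_points))  # False
```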
We emphasize that if such a function exists, then the preceding differential equation can be written as $d\phi = 0$. This is why such a differential equation is called an exact differential equation. From the previous example, a potential function for the differential equation $2x\sin y\,dx + x^2\cos y\,dy = 0$ is $\phi(x, y) = x^2\sin y$.

We now show that if a differential equation is exact and we can find a potential function $\phi$, its solution can be written down immediately.

Theorem 1.9.3 The general solution to an exact equation $M(x, y)\,dx + N(x, y)\,dy = 0$ is defined implicitly by $\phi(x, y) = c$, where $\phi$ satisfies (1.9.4) and $c$ is an arbitrary constant.

Proof We rewrite the differential equation in the form
$$M(x, y) + N(x, y)\frac{dy}{dx} = 0.$$
Since the differential equation is exact, there exists a potential function $\phi$ (see (1.9.4)) such that
$$\frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial y}\frac{dy}{dx} = 0.$$
By the chain rule, the left-hand side is the derivative of $\phi(x, y(x))$ with respect to $x$ along a solution $y(x)$, so this equation states that
$$\frac{d}{dx}\,\phi(x, y(x)) = 0.$$
We conclude therefore that $\phi(x, y) = c$, where $c$ is a constant.

Remarks
1. The potential function $\phi$ is a function of two variables $x$ and $y$, and we interpret the relationship $\phi(x, y) = c$ as defining $y$ implicitly as a function of $x$. The preceding theorem states that this relationship defines the general solution to the differential equation for which $\phi$ is a potential function.
2. Geometrically, Theorem 1.9.3 says that the solution curves of an exact differential equation are the family of curves $\phi(x, y) = k$, where $k$ is a constant. These are called the level curves of the function $\phi(x, y)$.

The following two questions now arise:
1. How can we tell whether a given differential equation is exact?
2. If we have an exact equation, how do we find a potential function?
The answers are given in the next theorem and its proof.
Theorem 1.9.4 (Test for Exactness) Let $M$, $N$, and their first partial derivatives $M_y$ and $N_x$ be continuous in a (simply connected⁹) region $R$ of the $xy$-plane. Then the differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$ is exact for all $x, y$ in $R$ if and only if
$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}. \tag{1.9.5}$$

⁹ Roughly speaking, simply connected means that the interior of any closed curve drawn in the region also lies in the region. For example, the interior of a circle is a simply connected region, although the region between two concentric circles is not.

Proof We first prove that exactness implies the validity of Equation (1.9.5). If the differential equation is exact, then by definition there exists a potential function $\phi(x, y)$ such that $\phi_x = M$ and $\phi_y = N$. Thus, taking partial derivatives, $\phi_{xy} = M_y$ and $\phi_{yx} = N_x$. Since $M_y$ and $N_x$ are continuous in $R$, it follows that $\phi_{xy}$ and $\phi_{yx}$ are continuous in $R$. But, from multivariable calculus, this implies that $\phi_{xy} = \phi_{yx}$ and hence that $M_y = N_x$.

We now prove the converse. Thus we assume that Equation (1.9.5) holds and must prove that there exists a potential function $\phi$ such that
$$\frac{\partial\phi}{\partial x} = M \tag{1.9.6}$$
and
$$\frac{\partial\phi}{\partial y} = N. \tag{1.9.7}$$
The proof is constructive. That is, we will actually find a potential function $\phi$. We begin by integrating Equation (1.9.6) with respect to $x$, holding $y$ fixed (this is a partial integration), to obtain
$$\phi(x, y) = \int^x M(s, y)\,ds + h(y), \tag{1.9.8}$$
where $h(y)$ is an arbitrary function of $y$ (this is the integration "constant" that we must allow to depend on $y$, since we held $y$ fixed in performing the integration¹⁰). We now show how to determine $h(y)$ so that the function $\phi$ defined in (1.9.8) also satisfies Equation (1.9.7). Differentiating (1.9.8) partially with respect to $y$ yields
$$\frac{\partial\phi}{\partial y} = \frac{\partial}{\partial y}\int^x M(s, y)\,ds + \frac{dh}{dy}.$$
In order that $\phi$ satisfy Equation (1.9.7) we must choose $h(y)$ to satisfy
$$\frac{\partial}{\partial y}\int^x M(s, y)\,ds + \frac{dh}{dy} = N(x, y).$$
That is,
$$\frac{dh}{dy} = N(x, y) - \frac{\partial}{\partial y}\int^x M(s, y)\,ds. \tag{1.9.9}$$
Since the left-hand side of this expression is a function of $y$ only, we must show, for consistency, that the right-hand side also depends only on $y$. Taking the derivative of the right-hand side with respect to $x$ yields
$$\begin{aligned}
\frac{\partial}{\partial x}\left[N - \frac{\partial}{\partial y}\int^x M(s, y)\,ds\right]
&= \frac{\partial N}{\partial x} - \frac{\partial^2}{\partial x\,\partial y}\int^x M(s, y)\,ds \\
&= \frac{\partial N}{\partial x} - \frac{\partial}{\partial y}\left[\frac{\partial}{\partial x}\int^x M(s, y)\,ds\right] \\
&= \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}.
\end{aligned}$$
Thus, using (1.9.5), we have
$$\frac{\partial}{\partial x}\left[N - \frac{\partial}{\partial y}\int^x M(s, y)\,ds\right] = 0,$$
so that the right-hand side of Equation (1.9.9) does depend only on $y$. It follows that (1.9.9) is a consistent equation, and hence we can integrate both sides with respect to $y$ to obtain
$$h(y) = \int^y N(x, t)\,dt - \int^y\left[\frac{\partial}{\partial t}\int^x M(s, t)\,ds\right]dt.$$
Finally, substituting into (1.9.8) yields the potential function
$$\phi(x, y) = \int^x M(s, y)\,ds + \int^y N(x, t)\,dt - \int^y\left[\frac{\partial}{\partial t}\int^x M(s, t)\,ds\right]dt.$$

Remark There is no need to memorize the final result for $\phi$. For each particular problem, one can construct an appropriate potential function from first principles. This is illustrated in Examples 1.9.6 and 1.9.7.

¹⁰ Throughout the text, $\int^x f(t)\,dt$ means "evaluate the indefinite integral $\int f(t)\,dt$ and replace $t$ with $x$ in the result."

Example 1.9.5 Determine whether the given differential equation is exact.
1. $[1 + \ln(xy)]\,dx + (x/y)\,dy = 0$.
2. $x^2y\,dx - (xy^2 + y^3)\,dy = 0$.

Solution:
1. In this case, $M = 1 + \ln(xy)$ and $N = x/y$, so that $M_y = 1/y = N_x$. It follows from the previous theorem that the differential equation is exact.
2. In this case, we have $M = x^2y$ and $N = -(xy^2 + y^3)$, so that $M_y = x^2$, whereas $N_x = -y^2$. Since $M_y \neq N_x$, the differential equation is not exact.

Example 1.9.6 Find the general solution to
$$2xe^y\,dx + (x^2e^y + \cos y)\,dy = 0.$$
Solution: We have
$$M(x, y) = 2xe^y, \qquad N(x, y) = x^2e^y + \cos y,$$
so that $M_y = 2xe^y = N_x$.
Hence the given differential equation is exact, and so there exists a potential function $\phi$ such that (see Definition 1.9.2)
$$\frac{\partial\phi}{\partial x} = 2xe^y, \tag{1.9.10}$$
$$\frac{\partial\phi}{\partial y} = x^2e^y + \cos y. \tag{1.9.11}$$
Integrating Equation (1.9.10) with respect to $x$, holding $y$ fixed, yields
$$\phi(x, y) = x^2e^y + h(y), \tag{1.9.12}$$
where $h$ is an arbitrary function of $y$. We now determine $h(y)$ such that (1.9.12) also satisfies Equation (1.9.11). Taking the derivative of (1.9.12) with respect to $y$ yields
$$\frac{\partial\phi}{\partial y} = x^2e^y + \frac{dh}{dy}. \tag{1.9.13}$$
Equations (1.9.11) and (1.9.13) give two expressions for $\partial\phi/\partial y$. This allows us to determine $h$. Subtracting Equation (1.9.11) from Equation (1.9.13) gives the consistency requirement
$$\frac{dh}{dy} = \cos y,$$
which implies, upon integration, that $h(y) = \sin y$, where we have set the integration constant equal to zero without loss of generality, since we require only one potential function. Substitution into (1.9.12) yields the potential function
$$\phi(x, y) = x^2e^y + \sin y.$$
Consequently, the given differential equation can be written as $d(x^2e^y + \sin y) = 0$, and so, from Theorem 1.9.3, the general solution is
$$x^2e^y + \sin y = c.$$

Notice that the solution obtained in the preceding example is an implicit solution. Owing to the way in which the potential function for an exact equation is obtained, this is usually the case.

Example 1.9.7 Find the general solution to
$$[\sin(xy) + xy\cos(xy) + 2x]\,dx + [x^2\cos(xy) + 2y]\,dy = 0.$$
Solution: We have
$$M(x, y) = \sin(xy) + xy\cos(xy) + 2x \quad\text{and}\quad N(x, y) = x^2\cos(xy) + 2y.$$
Thus,
$$M_y = 2x\cos(xy) - x^2y\sin(xy) = N_x,$$
and so the differential equation is exact. Hence there exists a potential function $\phi(x, y)$ such that
$$\frac{\partial\phi}{\partial x} = \sin(xy) + xy\cos(xy) + 2x, \tag{1.9.14}$$
$$\frac{\partial\phi}{\partial y} = x^2\cos(xy) + 2y. \tag{1.9.15}$$
In this case, Equation (1.9.15) is the simpler equation, and so we integrate it with respect to $y$, holding $x$ fixed, to obtain
$$\phi(x, y) = x\sin(xy) + y^2 + g(x), \tag{1.9.16}$$
where $g(x)$ is an arbitrary function of $x$. We now determine $g(x)$, and hence $\phi$, from (1.9.14) and (1.9.16). Differentiating (1.9.16) partially with respect to $x$ yields
$$\frac{\partial\phi}{\partial x} = \sin(xy) + xy\cos(xy) + \frac{dg}{dx}. \tag{1.9.17}$$
Equations (1.9.14) and (1.9.17) are consistent if and only if
$$\frac{dg}{dx} = 2x.$$
Hence, upon integrating, $g(x) = x^2$, where we have once more set the integration constant to zero without loss of generality, since we require only one potential function. Substituting into (1.9.16) gives the potential function
$$\phi(x, y) = x\sin(xy) + x^2 + y^2.$$
The original differential equation can therefore be written as $d(x\sin(xy) + x^2 + y^2) = 0$, and hence the general solution is
$$x\sin(xy) + x^2 + y^2 = c.$$

Remark At first sight the above procedure appears to be quite complicated. However, with a little practice, the steps are seen to be fairly straightforward. As we have shown in Theorem 1.9.4, the method works in general, provided one starts with an exact differential equation.

Integrating Factors

Usually a given differential equation will not be exact. However, sometimes it is possible to multiply the differential equation by a nonzero function to obtain an exact equation that can then be solved using the technique we have described in this section. Notice that the solution to the resulting exact equation will be the same as that of the original equation, since we multiply by a nonzero function.

DEFINITION 1.9.8 A nonzero function $I(x, y)$ is called an integrating factor for the differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$ if the differential equation
$$I(x, y)M(x, y)\,dx + I(x, y)N(x, y)\,dy = 0$$
is exact.
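Definition 1.9.8 can be combined with the test for exactness (Theorem 1.9.4) in a quick numerical experiment: compare $\partial(IM)/\partial y$ with $\partial(IN)/\partial x$ at a few points, with and without the factor $I$. The Python sketch below does this for the equation $(3y^2 + 5x^2y)\,dx + (3xy + 2x^3)\,dy = 0$ and the factor $I = x^2y$ treated in Example 1.9.9. The helper names (`d_dx`, `d_dy`, `passes_exactness_test`, `times`), the step size, the tolerance, and the sample points are hypothetical choices for illustration. A finite sample can only refute exactness, never establish it, and the simple-connectedness hypothesis of Theorem 1.9.4 is not checked at all.

```python
def d_dx(F, x, y, h=1e-5):
    """Central-difference approximation to the partial derivative F_x."""
    return (F(x + h, y) - F(x - h, y)) / (2 * h)


def d_dy(F, x, y, h=1e-5):
    """Central-difference approximation to the partial derivative F_y."""
    return (F(x, y + h) - F(x, y - h)) / (2 * h)


def passes_exactness_test(M, N, points, tol=1e-6):
    """Check M_y = N_x (Equation (1.9.5)) at each sample point."""
    return all(abs(d_dy(M, x, y) - d_dx(N, x, y)) <= tol for x, y in points)


def times(I, F):
    """Return the function (x, y) -> I(x, y) * F(x, y)."""
    return lambda x, y: I(x, y) * F(x, y)


M = lambda x, y: 3 * y**2 + 5 * x**2 * y
N = lambda x, y: 3 * x * y + 2 * x**3
I = lambda x, y: x**2 * y  # candidate integrating factor

pts = [(0.5, 1.0), (1.3, -0.4), (2.0, 0.7)]  # arbitrary test points
print(passes_exactness_test(M, N, pts))                      # False
print(passes_exactness_test(times(I, M), times(I, N), pts))  # True
```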
Example 1.9.9 Show that $I = x^2y$ is an integrating factor for the differential equation
$$(3y^2 + 5x^2y)\,dx + (3xy + 2x^3)\,dy = 0. \tag{1.9.18}$$
Solution: Multiplying the given differential equation (which is not exact) by $x^2y$ yields
$$(3x^2y^3 + 5x^4y^2)\,dx + (3x^3y^2 + 2x^5y)\,dy = 0. \tag{1.9.19}$$
Thus,
$$M_y = 9x^2y^2 + 10x^4y = N_x,$$
so that the differential equation (1.9.19) is exact, and hence $I = x^2y$ is an integrating factor for Equation (1.9.18). Indeed we leave it as an exercise to verify that (1.9.19) can be written as
$$d(x^3y^3 + x^5y^2) = 0,$$
so that the general solution to Equation (1.9.19) (and hence the general solution to Equation (1.9.18)) is defined implicitly by
$$x^3y^3 + x^5y^2 = c.$$
That is, $x^3y^2(y + x^2) = c$.

As shown in the next theorem, using the test for exactness, it is straightforward to determine the conditions that a function $I(x, y)$ must satisfy in order to be an integrating factor for the differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$.

Theorem 1.9.10 The function $I(x, y)$ is an integrating factor for
$$M(x, y)\,dx + N(x, y)\,dy = 0 \tag{1.9.20}$$
if and only if it is a solution to the partial differential equation
$$N\frac{\partial I}{\partial x} - M\frac{\partial I}{\partial y} = \left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial x}\right)I. \tag{1.9.21}$$

Proof Multiplying Equation (1.9.20) by $I$ yields
$$IM\,dx + IN\,dy = 0.$$
This equation is exact if and only if
$$\frac{\partial}{\partial y}(IM) = \frac{\partial}{\partial x}(IN),$$
that is, if and only if
$$\frac{\partial I}{\partial y}M + I\frac{\partial M}{\partial y} = \frac{\partial I}{\partial x}N + I\frac{\partial N}{\partial x}.$$
Rearranging the terms in this equation yields Equation (1.9.21).

The preceding theorem is not too useful in general, since it is usually no easier to solve the partial differential equation (1.9.21) to find $I$ than it is to solve the original Equation (1.9.20). However, it sometimes happens that an integrating factor exists that depends only on one variable.
We now show that Theorem 1.9.10 can be used to determine when such an integrating factor exists and also to actually find a corresponding integrating factor.

Theorem 1.9.11 Consider the differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$.
1. There exists an integrating factor that is dependent only on $x$ if and only if $(M_y - N_x)/N = f(x)$, a function of $x$ only. In such a case, an integrating factor is
$$I(x) = e^{\int f(x)\,dx}.$$
2. There exists an integrating factor that is dependent only on $y$ if and only if $(M_y - N_x)/M = g(y)$, a function of $y$ only. In such a case, an integrating factor is
$$I(y) = e^{-\int g(y)\,dy}.$$

Proof For part 1 of the theorem, we begin by assuming that $I = I(x)$ is an integrating factor for $M(x, y)\,dx + N(x, y)\,dy = 0$. Then $\partial I/\partial y = 0$, and so, from (1.9.21), $I$ is a solution to
$$N\frac{dI}{dx} = (M_y - N_x)I.$$
That is,
$$\frac{1}{I}\frac{dI}{dx} = \frac{M_y - N_x}{N}.$$
Since, by assumption, $I$ is a function of $x$ only, it follows that the left-hand side of this expression depends only on $x$, and hence so does the right-hand side. Conversely, suppose that $(M_y - N_x)/N = f(x)$, a function of $x$ only. Then, dividing (1.9.21) by $N$, it follows that $I$ is an integrating factor for $M(x, y)\,dx + N(x, y)\,dy = 0$ if and only if it is a solution to
$$\frac{\partial I}{\partial x} - \frac{M}{N}\frac{\partial I}{\partial y} = If(x). \tag{1.9.22}$$
We must show that this differential equation has a solution $I$ that depends on $x$ only. We do this by explicitly integrating the differential equation under the assumption that $I = I(x)$. Indeed, if $I = I(x)$, then Equation (1.9.22) reduces to
$$\frac{dI}{dx} = If(x),$$
which is a separable equation with solution
$$I(x) = e^{\int f(x)\,dx}.$$
The proof of part 2 is similar, and so we leave it as an exercise (see Problem 30).

Example 1.9.12 Solve
$$(2x - y^2)\,dx + xy\,dy = 0, \quad x > 0. \tag{1.9.23}$$
Solution: The equation is not exact, since $M_y = -2y \neq y = N_x$.
However,
$$\frac{M_y - N_x}{N} = \frac{-2y - y}{xy} = -\frac{3}{x},$$
which is a function of $x$ only. It follows from part 1 of the preceding theorem that an integrating factor for Equation (1.9.23) is
$$I(x) = e^{-\int (3/x)\,dx} = e^{-3\ln x} = x^{-3}.$$
Multiplying Equation (1.9.23) by $I$ yields the exact equation
$$(2x^{-2} - x^{-3}y^2)\,dx + x^{-2}y\,dy = 0. \tag{1.9.24}$$
(The reader should check that this is exact, although it must be, by the previous theorem.) We leave it as an exercise to verify that a potential function for Equation (1.9.24) is
$$\phi(x, y) = \frac{1}{2}x^{-2}y^2 - 2x^{-1},$$
and hence the general solution to (1.9.23) is given implicitly by
$$\frac{1}{2}x^{-2}y^2 - 2x^{-1} = c,$$
or equivalently, $y^2 - 4x = c_1x^2$.

Exercises for 1.9

Key Terms
Exact differential equation, Potential function, Integrating factor.

Skills
• Be able to determine whether or not a given differential equation is exact.
• Given the partial derivatives $\partial\phi/\partial x$ and $\partial\phi/\partial y$ of a potential function $\phi(x, y)$, be able to determine $\phi(x, y)$.
• Be able to find the general solution to an exact differential equation.
• When circumstances allow, be able to use an integrating factor to convert a given differential equation into an exact differential equation with the same solution set.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. The differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$ is exact in a simply connected region $R$ if $M_x$ and $N_y$ are continuous partial derivatives with $M_x = N_y$.
2. The solution to an exact differential equation is called a potential function.
3. If $M(x)$ and $N(y)$ are continuous functions, then the differential equation $M(x)\,dx + N(y)\,dy = 0$ is exact.
4. If $(M_y - N_x)/N(x, y)$ is a function of $x$ only, then the differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$ becomes exact when it is multiplied through by $I(x) = \exp\left(\int \frac{M_y - N_x}{N(x, y)}\,dx\right)$.
5. There is a unique potential function for an exact differential equation $M(x, y)\,dx + N(x, y)\,dy = 0$.
6. The differential equation $(2ye^{2x} - \sin y)\,dx + (e^{2x} - x\cos y)\,dy = 0$ is exact.
7. The differential equation $\dfrac{-2xy}{(x^2 + y)^2}\,dx + \dfrac{x^2}{(x^2 + y)^2}\,dy = 0$ is exact.
8. The differential equation $(y^2 + \cos x)\,dx + 2xy^2\,dy = 0$ is exact.
9. The differential equation $(e^{x\sin y}\sin y)\,dx + (e^{x\sin y}\cos y)\,dy = 0$ is exact.

Problems
For Problems 1–3, determine whether the given differential equation is exact.
1. $(y + 3x^2)\,dx + x\,dy = 0$.
2. $[\cos(xy) - xy\sin(xy)]\,dx - x^2\sin(xy)\,dy = 0$.
3. $ye^{xy}\,dx + (2y - xe^{xy})\,dy = 0$.

For Problems 4–12, solve the given differential equation.
4. $2xy\,dx + (x^2 + 1)\,dy = 0$.
5. $(y^2 + \cos x)\,dx + (2xy + \sin y)\,dy = 0$.
6. $x^{-1}(xy - 1)\,dx + y^{-1}(xy + 1)\,dy = 0$.
7. $(4e^{2x} + 2xy - y^2)\,dx + (x - y)^2\,dy = 0$.
8. $(y^2 - 2x)\,dx + 2xy\,dy = 0$.
9. $\left(\dfrac{1}{x} - \dfrac{y}{x^2 + y^2}\right)dx + \dfrac{x}{x^2 + y^2}\,dy = 0$.
10. $[1 + \ln(xy)]\,dx + xy^{-1}\,dy = 0$.
11. $[y\cos(xy) - \sin x]\,dx + x\cos(xy)\,dy = 0$.
12. $(2xy + \cos y)\,dx + (x^2 - x\sin y - 2y)\,dy = 0$.

For Problems 13–15, solve the given initial-value problem.
13. $(3x^2\ln x + x^2 - y)\,dx - x\,dy = 0$, $y(1) = 5$.
14. $2x^2y' + 4xy = 3\sin x$, $y(2\pi) = 0$.
15. $(ye^{xy} + \cos x)\,dx + xe^{xy}\,dy = 0$, $y(\pi/2) = 0$.

16. Show that if $\phi(x, y)$ is a potential function for $M(x, y)\,dx + N(x, y)\,dy = 0$, then so is $\phi(x, y) + c$, where $c$ is an arbitrary constant. This shows that potential functions are uniquely defined only up to an additive constant.

For Problems 17–19, determine whether the given function is an integrating factor for the given differential equation.
17. $I(x, y) = \cos(xy)$, $[\tan(xy) + xy]\,dx + x^2\,dy = 0$.
18. $I(x) = \sec x$, $[2x - (x^2 + y^2)\tan x]\,dx + 2y\,dy = 0$.
19. $I(x, y) = y^{-2}e^{-x/y}$, $y(x^2 - 2xy)\,dx - x^3\,dy = 0$.

For Problems 20–26, determine an integrating factor for the given differential equation, and hence find the general solution.
20. $(xy - 1)\,dx + x^2\,dy = 0$.
21. $y\,dx - (2x + y^4)\,dy = 0$.
22. $x^2y\,dx + y(x^3 + e^{-3y}\sin y)\,dy = 0$.
23. $(y - x^2)\,dx + 2x\,dy = 0$, $x > 0$.
24. $xy[2\ln(xy) + 1]\,dx + x^2\,dy = 0$, $x > 0$.
25. $\dfrac{dy}{dx} + \dfrac{2x}{1 + x^2}\,y = \dfrac{1}{(1 + x^2)^2}$.
26. $(3xy - 2y^{-1})\,dx + x(x + y^{-2})\,dy = 0$.

For Problems 27–29, determine the values of the constants $r$ and $s$ such that $I(x, y) = x^ry^s$ is an integrating factor for the given differential equation.
27. $(y^{-1} - x^{-1})\,dx + (xy^{-2} - 2y^{-1})\,dy = 0$.
28. $y(5xy^2 + 4)\,dx + x(xy^2 - 1)\,dy = 0$.
29. $2y(y + 2x^2)\,dx + x(4y + 3x^2)\,dy = 0$.

30. Prove that if $(M_y - N_x)/M = g(y)$, a function of $y$ only, then an integrating factor for $M(x, y)\,dx + N(x, y)\,dy = 0$ is $I(y) = e^{-\int g(y)\,dy}$.

31. Consider the general first-order linear differential equation
$$\frac{dy}{dx} + p(x)y = q(x), \tag{1.9.25}$$
where $p(x)$ and $q(x)$ are continuous functions on some interval $(a, b)$.
(a) Rewrite Equation (1.9.25) in differential form, and show that an integrating factor for the resulting equation is
$$I(x) = e^{\int p(x)\,dx}. \tag{1.9.26}$$
(b) Show that the general solution to Equation (1.9.25) can be written in the form
$$y(x) = I^{-1}\left[\int^x I(t)q(t)\,dt + c\right],$$
where $I$ is given in Equation (1.9.26), and $c$ is an arbitrary constant.

1.10 Numerical Solution to First-Order Differential Equations

So far in this chapter we have investigated first-order differential equations geometrically via slope fields, and analytically by trying to construct exact solutions to certain types of differential equations. However, for most first-order differential equations it simply is not possible to find analytic solutions, since they do not fall into the few classes for which solution techniques are available.
Our final approach to analyzing first-order differential equations is to look at the possibility of constructing a numerical approximation to the unique solution to the initial-value problem
$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0. \tag{1.10.1}$$
We consider three techniques that give varying levels of accuracy. In each case, we generate a sequence of approximations $y_1, y_2, \ldots$ to the value of the exact solution at the points $x_1, x_2, \ldots$, where $x_{n+1} = x_n + h$, $n = 0, 1, \ldots$, and $h$ is a real number. We emphasize that numerical methods do not generate a formula for the solution to the differential equation. Rather, they generate a sequence of approximations to the value of the solution at specified points. Furthermore, if we use a sufficient number of points, then by plotting the points $(x_i, y_i)$ and joining them with straight-line segments, we are able to obtain an overall approximation to the solution curve corresponding to the solution of the given initial-value problem. This is how the approximate solution curves were generated in the preceding sections via the computer algebra system Maple. There are many subtle ideas associated with constructing numerical solutions to initial-value problems that are beyond the scope of this text. Indeed, a full discussion of the application of numerical methods to differential equations is best left for a future course in numerical analysis.

Euler's Method

Suppose we wish to approximate the solution to the initial-value problem (1.10.1) at $x = x_1 = x_0 + h$, where $h$ is small. The idea behind Euler's method is to use the tangent line to the solution curve through $(x_0, y_0)$ to obtain such an approximation. (See Figure 1.10.1.) The equation of the tangent line through $(x_0, y_0)$ is
$$y(x) = y_0 + m(x - x_0),$$
where $m$ is the slope of the curve at $(x_0, y_0)$. From Equation (1.10.1), $m = f(x_0, y_0)$, so
$$y(x) = y_0 + f(x_0, y_0)(x - x_0).$$
Figure 1.10.1: Euler's method for approximating the solution to the initial-value problem $dy/dx = f(x, y)$, $y(x_0) = y_0$. (The figure shows the exact solution curve through $(x_0, y_0)$, the tangent lines at $(x_0, y_0)$ and $(x_1, y_1)$, and the approximations $y_1, y_2, y_3$ at the points $x_1, x_2, x_3$, spaced $h$ apart.)

Setting $x = x_1$ in this equation yields the Euler approximation to the exact solution at $x_1$, namely,
$$y_1 = y_0 + f(x_0, y_0)(x_1 - x_0),$$
which we write as
$$y_1 = y_0 + hf(x_0, y_0).$$
Now suppose we wish to obtain an approximation to the exact solution to the initial-value problem (1.10.1) at $x_2 = x_1 + h$. We can use the same idea, except we now use the tangent line to the solution curve through $(x_1, y_1)$. From (1.10.1), the slope of this tangent line is $f(x_1, y_1)$, so that the equation of the required tangent line is
$$y(x) = y_1 + f(x_1, y_1)(x - x_1).$$
Setting $x = x_2$ and substituting $x_2 - x_1 = h$ yields the approximation
$$y_2 = y_1 + hf(x_1, y_1)$$
to the solution to the initial-value problem at $x = x_2$. Continuing in this manner, we determine the sequence of approximations
$$y_{n+1} = y_n + hf(x_n, y_n), \quad n = 0, 1, \ldots$$
to the solution to the initial-value problem (1.10.1) at the points $x_{n+1} = x_n + h$.

In summary, Euler's method for approximating the solution to the initial-value problem $y' = f(x, y)$, $y(x_0) = y_0$ at the points $x_{n+1} = x_0 + (n+1)h$ ($n = 0, 1, \ldots$) is
$$y_{n+1} = y_n + hf(x_n, y_n), \quad n = 0, 1, \ldots. \tag{1.10.2}$$

Example 1.10.1 Consider the initial-value problem
$$y' = y - x, \quad y(0) = \tfrac{1}{2}.$$
Use Euler's method with (a) $h = 0.1$ and (b) $h = 0.05$ to obtain an approximation to $y(1)$.
Given that the exact solution to the initial-value problem is $y(x) = x + 1 - \tfrac{1}{2}e^x$, compare the errors in the two approximations to $y(1)$.

Solution: In this problem we have
$$f(x, y) = y - x, \quad x_0 = 0, \quad y_0 = \tfrac{1}{2}.$$
(a) Setting $h = 0.1$ in (1.10.2) yields
$$y_{n+1} = y_n + 0.1(y_n - x_n).$$
Hence,
$$y_1 = y_0 + 0.1(y_0 - x_0) = 0.5 + 0.1(0.5 - 0) = 0.55,$$
$$y_2 = y_1 + 0.1(y_1 - x_1) = 0.55 + 0.1(0.55 - 0.1) = 0.595.$$
Continuing in this manner, we generate the approximations listed in Table 1.10.1, where we have rounded the calculations to six decimal places.

n    x_n   y_n        Exact solution   Absolute error
1    0.1   0.55       0.547414         0.002585
2    0.2   0.595      0.589299         0.005701
3    0.3   0.6345     0.625070         0.009430
4    0.4   0.66795    0.654088         0.013862
5    0.5   0.694745   0.675639         0.019106
6    0.6   0.714219   0.688941         0.025278
7    0.7   0.725641   0.693124         0.032518
8    0.8   0.728205   0.687229         0.040976
9    0.9   0.721026   0.670198         0.050828
10   1.0   0.703129   0.640859         0.062270

Table 1.10.1: The results of applying Euler's method with $h = 0.1$ to the initial-value problem in Example 1.10.1. We have also listed the values of the exact solution and the absolute value of the error.

In this case, the approximation to $y(1)$ is $y_{10} = 0.703129$, with an absolute error of
$$|y(1) - y_{10}| = 0.062270. \tag{1.10.3}$$
(b) When $h = 0.05$, Euler's method gives
$$y_{n+1} = y_n + 0.05(y_n - x_n), \quad n = 0, 1, \ldots, 19,$$
which generates the approximations given in Table 1.10.2, where we have listed only every other intermediate approximation. We see that the approximation to $y(1)$ is $y_{20} = 0.673351$ and that the absolute error in this approximation is $|y(1) - y_{20}| = 0.032492$.
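The computations in parts (a) and (b) can be reproduced with a few lines of Python. The sketch below is a minimal transcription of formula (1.10.2), not code from the text; the function name `euler` and its signature are hypothetical choices for illustration. Each $x_k$ is recomputed as $x_0 + kh$ rather than accumulated by repeated addition, which avoids a slow drift from floating-point round-off.

```python
import math


def euler(f, x0, y0, h, n):
    """Take n steps of Euler's method (1.10.2); return [(x_0, y_0), ..., (x_n, y_n)]."""
    x, y = x0, y0
    points = [(x, y)]
    for k in range(n):
        y = y + h * f(x, y)   # follow the tangent line at (x_k, y_k) for a step h
        x = x0 + (k + 1) * h  # x_{k+1} = x_0 + (k+1)h
        points.append((x, y))
    return points


f = lambda x, y: y - x                       # right-hand side of y' = y - x
exact = lambda x: x + 1 - 0.5 * math.exp(x)  # exact solution from Example 1.10.1

for h, n in [(0.1, 10), (0.05, 20)]:
    x_n, y_n = euler(f, 0.0, 0.5, h, n)[-1]
    print(f"h = {h}: y_{n} = {y_n:.6f}, error = {abs(exact(x_n) - y_n):.6f}")
# h = 0.1: y_10 = 0.703129, error = 0.062270
# h = 0.05: y_20 = 0.673351, error = 0.032492
```

Halving $h$ roughly halves the final error, which is the first-order behavior discussed after Table 1.10.2.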
n    x_n   y_n        Exact solution   Absolute error
2    0.1   0.54875    0.547414         0.001335
4    0.2   0.592247   0.589299         0.002948
6    0.3   0.629952   0.625070         0.004881
8    0.4   0.661272   0.654088         0.007185
10   0.5   0.685553   0.675639         0.009913
12   0.6   0.702072   0.688941         0.013131
14   0.7   0.710034   0.693124         0.016910
16   0.8   0.708563   0.687229         0.021333
18   0.9   0.696690   0.670198         0.026492
20   1.0   0.673351   0.640859         0.032492

Table 1.10.2: The results of applying Euler's method with $h = 0.05$ to the initial-value problem in Example 1.10.1.

Figure 1.10.2: The exact solution to the initial-value problem considered in Example 1.10.1 and the two approximations obtained using Euler's method.

Comparing this with (1.10.3), we see that the smaller step size has led to a better approximation. In fact, it has almost halved the error at $x = 1$. In Figure 1.10.2 we have plotted the exact solution and the Euler approximations just obtained.

In the preceding example we saw that halving the step size had the effect of essentially halving the error. However, even then the accuracy was not as good as we probably would have liked. Of course, we could keep decreasing the step size to increase the accuracy (provided we did not take $h$ so small that round-off errors started to play a role), but then the number of steps required would make the calculations very cumbersome. A better approach is to derive methods that have a higher order of accuracy. We will consider two such methods.

Modified Euler Method (Heun's Method)

The method that we consider here is an example of what is called a predictor-corrector method. The idea is to use the formula from Euler's method to obtain a first approximation to the solution $y(x_{n+1})$. We denote this approximation by $y^*_{n+1}$, so that
$$y^*_{n+1} = y_n + hf(x_n, y_n).$$
We now improve (or "correct") this approximation by once more applying Euler's method.
But this time, we use the average of the slopes of the solution curves through $(x_n, y_n)$ and $(x_{n+1}, y^*_{n+1})$. This gives
$$y_{n+1} = y_n + \tfrac{1}{2}h\left[f(x_n, y_n) + f(x_{n+1}, y^*_{n+1})\right].$$
As illustrated in Figure 1.10.3 for the first step ($n = 0$), we can interpret the modified Euler approximations as arising from first stepping to the point
$$P\left(x_n + \frac{h}{2},\; y_n + \frac{hf(x_n, y_n)}{2}\right)$$
along the tangent line to the solution curve through $(x_n, y_n)$ and then stepping from $P$ to $(x_{n+1}, y_{n+1})$ along the line through $P$ whose slope is $f(x_{n+1}, y^*_{n+1})$.

Figure 1.10.3: Derivation of the first step in the modified Euler method. (The figure shows the exact solution through $(x_0, y_0)$, the Euler approximation $(x_1, y^*_1)$, the point $P = (x_0 + h/2,\, y_0 + hf(x_0, y_0)/2)$, and the modified Euler approximation $(x_1, y_1)$.)

In summary, the modified Euler method for approximating the solution to the initial-value problem $y' = f(x, y)$, $y(x_0) = y_0$ at the points $x_{n+1} = x_0 + (n+1)h$ ($n = 0, 1, \ldots$) is
$$y_{n+1} = y_n + \tfrac{1}{2}h\left[f(x_n, y_n) + f(x_{n+1}, y^*_{n+1})\right],$$
where
$$y^*_{n+1} = y_n + hf(x_n, y_n), \quad n = 0, 1, \ldots.$$

Example 1.10.2 Apply the modified Euler method with $h = 0.1$ to determine an approximation to the solution to the initial-value problem
$$y' = y - x, \quad y(0) = \tfrac{1}{2}$$
at $x = 1$.

Solution: Taking $h = 0.1$ and $f(x, y) = y - x$ in the modified Euler method yields
$$y^*_{n+1} = y_n + 0.1(y_n - x_n),$$
$$y_{n+1} = y_n + 0.05(y_n - x_n + y^*_{n+1} - x_{n+1}).$$
Hence,
$$y_{n+1} = y_n + 0.05\{y_n - x_n + [y_n + 0.1(y_n - x_n)] - x_{n+1}\}.$$
That is,
$$y_{n+1} = y_n + 0.05(2.1y_n - 1.1x_n - x_{n+1}), \quad n = 0, 1, \ldots, 9.$$
When $n = 0$,
$$y_1 = y_0 + 0.05(2.1y_0 - 1.1x_0 - x_1) = 0.5475,$$
and when $n = 1$,
$$y_2 = y_1 + 0.05(2.1y_1 - 1.1x_1 - x_2) = 0.5894875.$$
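The predictor-corrector pair in the summary above translates directly into code. The following Python sketch is a minimal illustration under the same assumptions as Example 1.10.2; the function name `heun` is hypothetical, and the step is written with the two slopes kept separate rather than with the combined formula $y_{n+1} = y_n + 0.05(2.1y_n - 1.1x_n - x_{n+1})$, which is specific to $f(x, y) = y - x$.

```python
import math


def heun(f, x0, y0, h, n):
    """Take n steps of the modified Euler (Heun) method; return y_n."""
    x, y = x0, y0
    for k in range(n):
        slope = f(x, y)
        y_pred = y + h * slope                         # predictor: Euler step, y*_{k+1}
        x_next = x0 + (k + 1) * h
        y = y + 0.5 * h * (slope + f(x_next, y_pred))  # corrector: average the two slopes
        x = x_next
    return y


f = lambda x, y: y - x
exact_at_1 = 2 - 0.5 * math.e  # y(1) = 1 + 1 - e/2

y10 = heun(f, 0.0, 0.5, 0.1, 10)
print(f"y_1 = {heun(f, 0.0, 0.5, 0.1, 1):.4f}")  # y_1 = 0.5475
print(f"y_10 = {y10:.6f}, error = {abs(exact_at_1 - y10):.6f}")
# y_10 = 0.642960, error = 0.002100
```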
n    x_n   y_n        Exact solution   Absolute error
1    0.1   0.5475     0.547414         0.000085
2    0.2   0.589487   0.589299         0.000189
3    0.3   0.625384   0.625070         0.000313
4    0.4   0.654549   0.654088         0.000461
5    0.5   0.676277   0.675639         0.000637
6    0.6   0.689786   0.688941         0.000845
7    0.7   0.694213   0.693124         0.001089
8    0.8   0.688605   0.687229         0.001376
9    0.9   0.671909   0.670198         0.001711
10   1.0   0.642959   0.640859         0.002100

Table 1.10.3: The results of applying the modified Euler method with $h = 0.1$ to the initial-value problem in Example 1.10.2.

Continuing in this manner, we generate the results displayed in Table 1.10.3. From this table, we see that the approximation to $y(1)$ according to the modified Euler method is $y_{10} = 0.642960$. As seen in the previous example, the value of the exact solution at $x = 1$ is $y(1) = 0.640859$. Consequently, the absolute error in the approximation at $x = 1$ using the modified Euler approximation with $h = 0.1$ is
$$|y(1) - y_{10}| = 0.002100.$$
Comparing this with the results of the previous example, we see that the modified Euler method has picked up approximately one decimal place of accuracy when using a step size $h = 0.1$. This is indicative of the general result that the error in the modified Euler method behaves as order $h^2$ (halving $h$ roughly quarters the error), as compared to the order $h$ behavior of the Euler method (halving $h$ roughly halves the error). In Figure 1.10.4 we have sketched the exact solution to the differential equation and the modified Euler approximation with $h = 0.1$.

Figure 1.10.4: The exact solution to the initial-value problem in Example 1.10.2 and the approximations obtained using the modified Euler method with $h = 0.1$.

Runge-Kutta Method of Order Four

The final method that we consider is somewhat more tedious to use in hand calculations, but is very easily programmed into a calculator or computer.
It is a fourth-order method, which, in the case of a differential equation of the form $y' = f(x)$, reduces to Simpson's rule (which the reader has probably studied in a calculus course) for numerically evaluating definite integrals. Without justification, we state the algorithm.

The fourth-order Runge-Kutta method for approximating the solution to the initial-value problem $y' = f(x, y)$, $y(x_0) = y_0$ at the points $x_{n+1} = x_0 + (n+1)h$ ($n = 0, 1, \ldots$) is
$$y_{n+1} = y_n + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),$$
where
$$\begin{aligned}
k_1 &= hf(x_n, y_n), \\
k_2 &= hf\left(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_1\right), \\
k_3 &= hf\left(x_n + \tfrac{1}{2}h,\; y_n + \tfrac{1}{2}k_2\right), \\
k_4 &= hf(x_{n+1},\; y_n + k_3),
\end{aligned} \qquad n = 0, 1, 2, \ldots.$$

Remark In the previous sections, we used Maple to generate slope fields and approximate solution curves for first-order differential equations. The solution curves were in fact generated using a Runge-Kutta approximation.

Example 1.10.3 Apply the fourth-order Runge-Kutta method with $h = 0.1$ to determine an approximation to the solution to the initial-value problem below at $x = 1$:
$$y' = y - x, \quad y(0) = \tfrac{1}{2}.$$
Solution: We take $h = 0.1$ and $f(x, y) = y - x$ in the fourth-order Runge-Kutta method, and we need to determine $y_{10}$. First we determine $k_1, k_2, k_3, k_4$:
$$\begin{aligned}
k_1 &= 0.1f(x_n, y_n) = 0.1(y_n - x_n), \\
k_2 &= 0.1f(x_n + 0.05,\, y_n + 0.5k_1) = 0.1(y_n + 0.5k_1 - x_n - 0.05), \\
k_3 &= 0.1f(x_n + 0.05,\, y_n + 0.5k_2) = 0.1(y_n + 0.5k_2 - x_n - 0.05), \\
k_4 &= 0.1f(x_{n+1},\, y_n + k_3) = 0.1(y_n + k_3 - x_{n+1}).
\end{aligned}$$
When $n = 0$,
$$\begin{aligned}
k_1 &= 0.1(0.5) = 0.05, \\
k_2 &= 0.1[0.5 + (0.5)(0.05) - 0.05] = 0.0475, \\
k_3 &= 0.1[0.5 + (0.5)(0.0475) - 0.05] = 0.047375, \\
k_4 &= 0.1(0.5 + 0.047375 - 0.1) = 0.0447375,
\end{aligned}$$
so that
$$y_1 = y_0 + \tfrac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4) = 0.5 + \tfrac{1}{6}(0.2844875) = 0.54741458,$$
rounded to eight decimal places. Continuing in this manner, we obtain the results displayed in Table 1.10.4.
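Because the four stage values $k_1, \ldots, k_4$ are evaluated in the same way at every step, the method is short to program. The following Python sketch implements the algorithm as stated above and reproduces $y_1$ and $y_{10}$ of Example 1.10.3; the function name `rk4` is a hypothetical choice, and the sketch is an illustration rather than the Maple implementation mentioned in the Remark.

```python
import math


def rk4(f, x0, y0, h, n):
    """Take n steps of the classical fourth-order Runge-Kutta method; return y_n."""
    x, y = x0, y0
    for k in range(n):
        k1 = h * f(x, y)
        k2 = h * f(x + h / 2, y + k1 / 2)
        k3 = h * f(x + h / 2, y + k2 / 2)
        k4 = h * f(x + h, y + k3)             # x + h is x_{n+1}
        y = y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x = x0 + (k + 1) * h
    return y


f = lambda x, y: y - x
exact_at_1 = 2 - 0.5 * math.e  # y(1) = 1 + 1 - e/2

y1 = rk4(f, 0.0, 0.5, 0.1, 1)
y10 = rk4(f, 0.0, 0.5, 0.1, 10)
print(f"y_1 = {y1:.8f}")  # y_1 = 0.54741458
print(f"y_10 = {y10:.8f}, error = {abs(exact_at_1 - y10):.8f}")
# y_10 = 0.64086013, error = 0.00000104
```

Calling `rk4(f, 0.0, 0.5, 0.2, 5)` gives the coarser approximation with $h = 0.2$ discussed after Table 1.10.4.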
n | $x_n$ | $y_n$ | Exact Solution | Absolute Error
1 | 0.1 | 0.54741458 | 0.54741454 | 0.00000004
2 | 0.2 | 0.58929871 | 0.58929862 | 0.00000009
3 | 0.3 | 0.62507075 | 0.62507060 | 0.00000015
4 | 0.4 | 0.65408788 | 0.65408765 | 0.00000022
5 | 0.5 | 0.67563968 | 0.67563936 | 0.00000032
6 | 0.6 | 0.68894102 | 0.68894060 | 0.00000042
7 | 0.7 | 0.69312419 | 0.69312365 | 0.00000054
8 | 0.8 | 0.68723022 | 0.68722954 | 0.00000068
9 | 0.9 | 0.67019929 | 0.67019844 | 0.00000085
10 | 1.0 | 0.64086013 | 0.64085909 | 0.00000104

Table 1.10.4: The results of applying the fourth-order Runge-Kutta method with h = 0.1 to the initial-value problem in Example 1.10.3.

In particular, we see that the fourth-order Runge-Kutta approximation to $y(1)$ is $y_{10} = 0.64086013$, so that $|y(1) - y_{10}| = 0.00000104$. Clearly this is an excellent approximation. If we increase the step size to $h = 0.2$, the corresponding approximation to $y(1)$ becomes $y_5 = 0.640874$, with absolute error $|y(1) - y_5| = 0.000015$, which is still very impressive.

Exercises for 1.10

Key Terms
Euler's method, Predictor-corrector method, Modified Euler method (Heun's method), Fourth-order Runge-Kutta method.

Skills
• Be able to apply Euler's method to approximate the solution to an initial-value problem at a point near the initial value $x_0$.
• Be able to use the modified Euler method (Heun's method) to approximate the solution to an initial-value problem at a point near the initial value $x_0$.
• Be able to use the fourth-order Runge-Kutta method to approximate the solution to an initial-value problem at a point near the initial value $x_0$.

True-False Review
For Questions 1–4, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. Generally speaking, the smaller the step size in Euler's method, the more accurate the approximation to the solution of an initial-value problem at a point near the initial value $x_0$.
2. Euler's method is based on the equation of a tangent line to a curve at a given point $(x_0, y_0)$.
3. With each additional step that is taken in Euler's method, the error in the approximation obtained from the method can only grow in size.
4. At each step of length $h$, Heun's method requires two applications of Euler's method with step size $h/2$.

Problems
For Problems 1–5, use Euler's method with the specified step size to determine the solution to the given initial-value problem at the specified point.
1. $y' = 4y - 1$, $y(0) = 1$, $h = 0.05$, $y(0.5)$.
2. $y' = -\dfrac{2xy}{1 + x^2}$, $y(0) = 1$, $h = 0.1$, $y(1)$.
3. $y' = x - y^2$, $y(0) = 2$, $h = 0.05$, $y(0.5)$.
4. $y' = -x^2 y$, $y(0) = 1$, $h = 0.2$, $y(1)$.
5. $y' = 2xy^2$, $y(0) = 0.5$, $h = 0.1$, $y(1)$.
For Problems 6–10, use the modified Euler method with the specified step size to determine the solution to the given initial-value problem at the specified point. In each case, compare your answer to that obtained using Euler's method.
6. The initial-value problem in Problem 1.
7. The initial-value problem in Problem 2.
8. The initial-value problem in Problem 3.
9. The initial-value problem in Problem 4.
10. The initial-value problem in Problem 5.
For Problems 11–15, use the fourth-order Runge-Kutta method with the specified step size to determine the solution to the given initial-value problem at the specified point. In each case, compare your answer to that obtained using Euler's method.
11. The initial-value problem in Problem 1.
12. The initial-value problem in Problem 2.
13. The initial-value problem in Problem 3.
14. The initial-value problem in Problem 4.
15. The initial-value problem in Problem 5.
16. Use the fourth-order Runge-Kutta method with $h = 0.5$ to approximate the solution to the initial-value problem
$$y' + \tfrac{1}{10}y = e^{-x/10}\cos x, \qquad y(0) = 0$$
at the points $x = 0.5, 1.0, \dots, 25$.
Plot these points and describe the behavior of the corresponding solution.

1.11 Some Higher-Order Differential Equations

So far we have developed analytical techniques only for solving special types of first-order differential equations. The methods that we have discussed do not apply directly to higher-order differential equations, and so the solution to such equations usually requires the derivation of new techniques. One approach is to replace a higher-order differential equation by an equivalent system of first-order equations. (This will be developed further in Chapter 7.) For example, any second-order differential equation that can be written in the form
$$\frac{d^2y}{dx^2} = F\!\left(x, y, \frac{dy}{dx}\right), \tag{1.11.1}$$
where $F$ is a known function, can be replaced by an equivalent pair of first-order differential equations as follows. We let $v = dy/dx$. Then $d^2y/dx^2 = dv/dx$, and so solving Equation (1.11.1) is equivalent to solving the following two first-order differential equations:
$$\frac{dy}{dx} = v, \tag{1.11.2}$$
$$\frac{dv}{dx} = F(x, y, v). \tag{1.11.3}$$
In general the differential equation (1.11.3) cannot be solved directly, since it involves three variables, $x$, $y$, and $v$. However, for certain forms of the function $F$, Equation (1.11.3) will involve only two variables, and then we can sometimes solve it for $v$ using one of our previous techniques. Having obtained $v$, we can then substitute into Equation (1.11.2) to obtain a first-order differential equation for $y$. We now discuss two forms of $F$ for which this is the case.

Case 1: Second-Order Equations with the Dependent Variable Missing

If $y$ does not occur explicitly in the function $F$, then Equation (1.11.1) assumes the form
$$\frac{d^2y}{dx^2} = F\!\left(x, \frac{dy}{dx}\right). \tag{1.11.4}$$
Substituting $v = dy/dx$ and $dv/dx = d^2y/dx^2$ into this equation allows us to replace it with the two first-order equations
$$\frac{dy}{dx} = v, \tag{1.11.5}$$
$$\frac{dv}{dx} = F(x, v). \tag{1.11.6}$$
Thus, to solve Equation (1.11.4), we first solve Equation (1.11.6) for $v$ in terms of $x$ and then solve Equation (1.11.5) for $y$ as a function of $x$.

Example 1.11.1 Find the general solution to
$$\frac{d^2y}{dx^2} = \frac{1}{x}\left(\frac{dy}{dx} + x^2\cos x\right), \qquad x > 0. \tag{1.11.7}$$

Solution: In Equation (1.11.7) the dependent variable is missing, and so we let $v = dy/dx$, which implies that $d^2y/dx^2 = dv/dx$. Substituting into Equation (1.11.7) yields the following equivalent first-order system:
$$\frac{dy}{dx} = v, \tag{1.11.8}$$
$$\frac{dv}{dx} = \frac{1}{x}\left(v + x^2\cos x\right). \tag{1.11.9}$$
Equation (1.11.9) is a first-order linear differential equation with standard form
$$\frac{dv}{dx} - x^{-1}v = x\cos x. \tag{1.11.10}$$
An appropriate integrating factor is
$$I(x) = e^{-\int x^{-1}\,dx} = e^{-\ln x} = x^{-1}.$$
Multiplying Equation (1.11.10) by $x^{-1}$ reduces it to
$$\frac{d}{dx}\left(x^{-1}v\right) = \cos x,$$
which can be integrated directly to obtain $x^{-1}v = \sin x + c$. Thus,
$$v = x\sin x + cx. \tag{1.11.11}$$
Substituting the expression for $v$ from (1.11.11) into Equation (1.11.8) gives
$$\frac{dy}{dx} = x\sin x + cx,$$
which we can integrate to obtain
$$y(x) = -x\cos x + \sin x + c_1x^2 + c_2,$$
where $c_1 = c/2$.

Case 2: Second-Order Equations with the Independent Variable Missing

If $x$ does not occur explicitly in the function $F$ in Equation (1.11.1), then we must solve a differential equation of the form
$$\frac{d^2y}{dx^2} = F\!\left(y, \frac{dy}{dx}\right). \tag{1.11.12}$$
In this case, we still let $v = dy/dx$, as previously, but now we use the chain rule to express $d^2y/dx^2$ in terms of $dv/dy$. Specifically, we have
$$\frac{d^2y}{dx^2} = \frac{dv}{dx} = \frac{dv}{dy}\frac{dy}{dx} = v\frac{dv}{dy}.$$
Substituting for $dy/dx$ and $d^2y/dx^2$ into Equation (1.11.12) reduces the second-order equation to the equivalent first-order system
$$\frac{dy}{dx} = v, \tag{1.11.13}$$
$$v\frac{dv}{dy} = F(y, v). \tag{1.11.14}$$
In this case, we first solve Equation (1.11.14) for $v$ as a function of $y$ and then solve Equation (1.11.13) for $y$ as a function of $x$.
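Reducing a second-order equation to a first-order system in $y$ and $v = dy/dx$, as in (1.11.2) and (1.11.3), is also the usual route for approximating its solutions numerically, whether or not Case 1 or Case 2 applies. As an illustrative sketch (the function names, the interval $[1, 2]$, the 100 steps, and the choice $c_1 = c_2 = 0$ are our own assumptions), the code below applies the fourth-order Runge-Kutta method to the system (1.11.8)-(1.11.9) of Example 1.11.1. It starts at $x = 1$ (the equation requires $x > 0$) from the values of $y = -x\cos x + \sin x$ and $y' = x\sin x$, and compares the numerical result at $x = 2$ with that member of the general solution.

```python
import math


def rhs(x, state):
    """System (1.11.8)-(1.11.9): state = (y, v) with v = dy/dx; requires x > 0."""
    y, v = state
    return (v, (v + x**2 * math.cos(x)) / x)


def rk4_system_step(F, x, state, h):
    """One fourth-order Runge-Kutta step for a system u' = F(x, u).

    Here k1..k4 are slopes; multiplying them by h gives the k's of Section 1.10.
    """
    def shifted(u, k, c):
        return tuple(ui + c * ki for ui, ki in zip(u, k))

    k1 = F(x, state)
    k2 = F(x + h / 2, shifted(state, k1, h / 2))
    k3 = F(x + h / 2, shifted(state, k2, h / 2))
    k4 = F(x + h, shifted(state, k3, h))
    return tuple(u + h * (a + 2 * b + 2 * c + d) / 6
                 for u, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(F, x0, state0, x_end, n_steps):
    """Advance the system from x0 to x_end using n_steps equal steps."""
    h = (x_end - x0) / n_steps
    state = state0
    for n in range(n_steps):
        state = rk4_system_step(F, x0 + n * h, state, h)
    return state


def y_exact(x):
    """General solution of Example 1.11.1 with c1 = c2 = 0."""
    return -x * math.cos(x) + math.sin(x)


if __name__ == "__main__":
    start = (y_exact(1.0), 1.0 * math.sin(1.0))  # y(1) and y'(1) = x sin x at x = 1
    y_num, v_num = integrate(rhs, 1.0, start, 2.0, 100)
    print(y_num, y_exact(2.0), abs(y_num - y_exact(2.0)))
```

Because the stepping routine only needs the right-hand side of the system, the same `rk4_system_step` could be reused for a Case 2 equation such as (1.11.22), once it is written as a system in $y$ and $v = dy/dt$.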
Example 1.11.2 Find the general solution to
$$\frac{d^2y}{dx^2} = -\frac{2}{1 - y}\left(\frac{dy}{dx}\right)^2. \tag{1.11.15}$$

Solution: In this differential equation, the independent variable does not occur explicitly. Therefore, we let $v = dy/dx$ and use the chain rule to obtain
$$\frac{d^2y}{dx^2} = \frac{dv}{dx} = \frac{dv}{dy}\frac{dy}{dx} = v\frac{dv}{dy}.$$
Substituting into Equation (1.11.15) results in the equivalent system
$$\frac{dy}{dx} = v, \tag{1.11.16}$$
$$v\frac{dv}{dy} = -\frac{2}{1 - y}\,v^2. \tag{1.11.17}$$
Separating the variables in the differential equation (1.11.17) gives
$$\frac{1}{v}\,dv = -\frac{2}{1 - y}\,dy, \tag{1.11.18}$$
which can be integrated to obtain $\ln|v| = 2\ln|1 - y| + c$. Combining the logarithm terms and exponentiating yields
$$v(y) = c_1(1 - y)^2, \tag{1.11.19}$$
where we have set $c_1 = \pm e^c$. Notice that in solving Equation (1.11.17), we implicitly assumed that $v \neq 0$, since we divided by it to obtain Equation (1.11.18). However, the general form (1.11.19) does include the solution $v = 0$, provided we allow $c_1$ to equal zero. Substituting for $v$ into Equation (1.11.16) yields
$$\frac{dy}{dx} = c_1(1 - y)^2.$$
Separating the variables and integrating, we obtain $(1 - y)^{-1} = c_1x + d_1$. That is,
$$1 - y = \frac{1}{c_1x + d_1}.$$
Solving for $y$ gives
$$y(x) = \frac{c_1x + (d_1 - 1)}{c_1x + d_1}, \tag{1.11.20}$$
which can be written in the simpler form
$$y(x) = \frac{x + a}{x + b}, \tag{1.11.21}$$
where the constants $a$ and $b$ are defined by $a = (d_1 - 1)/c_1$ and $b = d_1/c_1$. Notice that the form (1.11.21) does not include the solution $y = $ constant, which is contained in (1.11.20) (set $c_1 = 0$). This is because in dividing by $c_1$, we implicitly assumed that $c_1 \neq 0$. Thus in specifying the solution in the form (1.11.21), we should also include the statement that any constant function $y = k$ ($k$ a constant) is a solution.

Example 1.11.3 Determine the displacement at time $t$ of a simple harmonic oscillator that is extended a distance $A$ units from its equilibrium position and released from rest at $t = 0$.
Solution: According to the derivation in Section 1.1, the motion of the simple harmonic oscillator is governed by the initial-value problem
$$\frac{d^2y}{dt^2} = -\omega^2 y, \tag{1.11.22}$$
$$y(0) = A, \qquad \frac{dy}{dt}(0) = 0, \tag{1.11.23}$$
where $\omega$ is a positive constant. The differential equation (1.11.22) has the independent variable $t$ missing. We therefore let $v = dy/dt$ and use the chain rule to write
$$\frac{d^2y}{dt^2} = v\frac{dv}{dy}.$$
It then follows that Equation (1.11.22) can be replaced by the equivalent first-order system
$$\frac{dy}{dt} = v, \tag{1.11.24}$$
$$v\frac{dv}{dy} = -\omega^2 y. \tag{1.11.25}$$
Separating the variables and integrating Equation (1.11.25) yields
$$\tfrac12 v^2 = -\tfrac12\omega^2y^2 + c,$$
which implies that
$$v = \pm\sqrt{c_1 - \omega^2y^2},$$
where $c_1 = 2c$. Substituting for $v$ into Equation (1.11.24) yields
$$\frac{dy}{dt} = \pm\sqrt{c_1 - \omega^2y^2}. \tag{1.11.26}$$
Setting $t = 0$ in this equation and using the initial conditions (1.11.23), we find that $c_1 = \omega^2A^2$. Equation (1.11.26) therefore gives
$$\frac{dy}{dt} = \pm\omega\sqrt{A^2 - y^2}.$$
By separating the variables and integrating, we obtain $\arcsin(y/A) = \pm\omega t + b$, where $b$ is an integration constant. Thus, $y(t) = A\sin(b \pm \omega t)$. The initial condition $y(0) = A$ implies that $\sin b = 1$, and so we can choose $b = \pi/2$. We therefore have $y(t) = A\sin(\pi/2 \pm \omega t)$. That is,
$$y(t) = A\cos\omega t.$$
Consequently the predicted motion is that the mass oscillates between $\pm A$ for all $t$. This solution makes sense physically, since the simple harmonic oscillator does not include dissipative forces that would slow the motion.

Remark
In Chapter 6 we will see how to solve the initial-value problem (1.11.22), (1.11.23) in just a few lines of work without requiring any integration!

Exercises for 1.11

Skills
• Be familiar with the strategy of solving a higher-order differential equation by replacing it with an equivalent system of first-order differential equations, and be able to carry out this strategy in particular instances.

Problems
For Problems 1–13, solve the given differential equation.
1. $y'' = 2x^{-1}y' + 4x^2$.
2. $(x - 1)(x - 2)y'' = y' - 1$.
3. $y'' + 2y^{-1}(y')^2 = y'$.
4. $y'' = (y')^2\tan y$.
5. $y'' + y'\tan x = (y')^2$.
6. $\dfrac{d^2x}{dt^2} = \left(\dfrac{dx}{dt}\right)^2 + 2\dfrac{dx}{dt}$.
7. $y'' - 2x^{-1}y' = 6x^4$.
8. $t\dfrac{d^2x}{dt^2} = 2\left(t + \dfrac{dx}{dt}\right)$.
9. $y'' - \alpha(y')^2 - \beta y = 0$, where $\alpha$ and $\beta$ are nonzero constants.
10. $y'' - 2x^{-1}y' = 18x^4$.
11. $(1 + x^2)y'' = -2xy'$.
12. $y'' + y^{-1}(y')^2 = ye^{-y}(y')^3$.
13. $y'' - y'\tan x = 1$, $0 \le x < \pi/2$.
In Problems 14–15, solve the given initial-value problem.
14. $yy'' = 2(y')^2 + y^2$, $y(0) = 1$, $y'(0) = 0$.
15. $y'' = \omega^2 y$, $y(0) = a$, $y'(0) = 0$, where $\omega, a$ are positive constants.
16. The following initial-value problem arises in the analysis of a cable suspended between two fixed points:
$$y'' = \frac{1}{a}\sqrt{1 + (y')^2}, \qquad y(0) = a, \quad y'(0) = 0,$$
where $a$ is a nonzero constant. Solve this initial-value problem for $y(x)$. The corresponding solution curve is called a catenary.
17. Consider the general second-order linear differential equation with dependent variable missing:
$$y'' + p(x)y' = q(x).$$
Replace this differential equation with an equivalent pair of first-order equations and express the solution in terms of integrals.
18. Consider the general third-order differential equation of the form
$$y''' = F(x, y''). \tag{1.11.27}$$
(a) Show that Equation (1.11.27) can be replaced by the equivalent first-order system
$$\frac{du_1}{dx} = u_2, \qquad \frac{du_2}{dx} = u_3, \qquad \frac{du_3}{dx} = F(x, u_3),$$
where the variables $u_1, u_2, u_3$ are defined by $u_1 = y$, $u_2 = y'$, $u_3 = y''$.
(b) Solve $y''' = x^{-1}(y'' - 1)$.
19. A simple pendulum consists of a particle of mass $m$ supported by a piece of string of length $L$. Assuming that the pendulum is displaced through an angle $\theta_0$ radians from the vertical and then released from rest, the resulting motion is described by the initial-value problem
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\sin\theta = 0, \qquad \theta(0) = \theta_0, \quad \frac{d\theta}{dt}(0) = 0. \tag{1.11.28}$$
(a) For small oscillations, $|\theta| \ll 1$, we can use the approximation $\sin\theta \approx \theta$ in Equation (1.11.28) to obtain the linear equation
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\theta = 0, \qquad \theta(0) = \theta_0, \quad \frac{d\theta}{dt}(0) = 0.$$
Solve this initial-value problem for $\theta$ as a function of $t$. Is the predicted motion reasonable?
(b) Obtain the following first integral of (1.11.28):
$$\frac{d\theta}{dt} = \pm\sqrt{\frac{2g}{L}\left(\cos\theta - \cos\theta_0\right)}. \tag{1.11.29}$$
(c) Show from Equation (1.11.29) that the time $T$ (equal to one-fourth of the period of motion) required for $\theta$ to go from 0 to $\theta_0$ is given by the elliptic integral of the first kind
$$T = \sqrt{\frac{L}{2g}}\int_0^{\theta_0}\frac{1}{\sqrt{\cos\theta - \cos\theta_0}}\,d\theta. \tag{1.11.30}$$
(d) Show that (1.11.30) can be written as
$$T = \sqrt{\frac{L}{g}}\int_0^{\pi/2}\frac{1}{\sqrt{1 - k^2\sin^2 u}}\,du,$$
where $k = \sin(\theta_0/2)$. [Hint: First express $\cos\theta$ and $\cos\theta_0$ in terms of $\sin^2(\theta/2)$ and $\sin^2(\theta_0/2)$.]

1.12 Chapter Review

Basic Theory of Differential Equations

This chapter has provided an introduction to the theory of differential equations. A differential equation involves one or more derivatives of an unknown function, and the order of the highest derivative that appears is the order of the differential equation. For an nth-order differential equation, the general solution contains $n$ arbitrary constants, and typically every solution can be obtained by assigning appropriate values to the constants (Example 1.11.2 shows that exceptions can occur). This chapter is concerned mainly with first-order differential equations, which may be written in the form
$$\frac{dy}{dx} = f(x, y) \tag{1.12.1}$$
for some given function $f$. If we impose an initial condition specifying the value of a solution $y(x)$ to the differential equation (1.12.1) at a particular point $x_0$, say $y(x_0) = y_0$, then we have an initial-value problem:
$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0. \tag{1.12.2}$$
To solve an initial-value problem of the form (1.12.2), the first step is to determine the general solution to the differential equation (1.12.1), and then use the initial condition to determine the specific value of the arbitrary constant appearing in the general solution.
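As an illustration of this two-step procedure, consider the initial-value problem of Example 1.10.3 (this worked example is supplied here for illustration). The differential equation $y' = y - x$ is first-order linear, so an integrating factor gives the general solution, and the initial condition then fixes the constant:

```latex
\begin{aligned}
y' - y = -x
  &\;\Longrightarrow\; \frac{d}{dx}\bigl(e^{-x}y\bigr) = -x e^{-x}
  \;\Longrightarrow\; e^{-x}y = (x+1)e^{-x} + C,\\
y(x) &= x + 1 + Ce^{x} \qquad \text{(general solution)},\\
y(0) &= 1 + C = \tfrac12 \;\Longrightarrow\; C = -\tfrac12,\\
y(x) &= x + 1 - \tfrac12 e^{x}, \qquad y(1) = 2 - \tfrac{e}{2} \approx 0.640859.
\end{aligned}
```

The final value agrees with the exact value of $y(1)$ used for comparison in Tables 1.10.3 and 1.10.4.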
Solution Techniques for First-Order Differential Equations

One of our main goals in this chapter is to find solutions to first-order differential equations of the form (1.12.1). There are various ways in which we can seek these solutions:

1. Geometrically: The function $f(x, y)$ gives the slope of the tangent line to the solution curve of the differential equation (1.12.1) that passes through the point $(x, y)$. Thus, by computing $f(x, y)$ at various points $(x, y)$, we can draw small line segments through each point $(x, y)$ with slope $f(x, y)$ to depict how a solution curve would pass through $(x, y)$. The resulting picture of line segments is called the slope field of the differential equation, and any solution curve of the differential equation in the $xy$-plane must be tangent to the slope field at every point. For example, the differential equation $dy/dx = -x/y$ determines a slope field consisting of small line segments that encircle the origin. Indeed, the solution curves of this differential equation lie on concentric circles centered at the origin. An important piece of theory is that, when $f$ and $\partial f/\partial y$ are continuous, different solution curves for the same differential equation can never cross (this is a consequence of the fact that, under these hypotheses, an initial-value problem cannot have more than one solution). Thus, for example, if we find a solution to the differential equation (1.12.1) of the form $y(x) = y_0$, for some constant $y_0$ (recall that such a solution is called an equilibrium solution), then all other solution curves to the differential equation must lie entirely above the line $y = y_0$ or entirely below it.

2. Numerically: Suppose we wish to approximate the solution to the initial-value problem (1.12.2) at the point $x = x_1 = x_0 + h$, where $h$ is small. Euler's method uses the slope of the solution at $(x_0, y_0)$, which is $f(x_0, y_0)$, to form the tangent-line approximation to the solution:
$$y(x) \approx y_0 + f(x_0, y_0)(x - x_0).$$
Therefore, we approximate
$$y(x_1) \approx y_0 + f(x_0, y_0)(x_1 - x_0) = y_0 + hf(x_0, y_0).$$
Now, starting from the point $(x_1, y_1)$, where $y_1$ denotes this approximation to $y(x_1)$, we can repeat the process to find approximations to the solution at other points $x_2, x_3, \dots$. The conclusion is that the approximations $y_n \approx y(x_n)$ to the solution of the initial-value problem (1.12.2) at the points $x_n = x_0 + nh$ ($n = 1, 2, \dots$) are given by
$$y_{n+1} = y_n + hf(x_n, y_n), \qquad n = 0, 1, 2, \dots.$$
In Section 1.10, refinements of Euler's method (the modified Euler method and the fourth-order Runge-Kutta method) are also discussed.

3. Analytically: In some situations, we can explicitly obtain an equation for the general solution to the differential equation (1.12.1). These include situations in which the differential equation is separable, first-order linear, first-order homogeneous, Bernoulli, and/or exact. Table 1.12.1 shows the types of differential equations we can solve analytically and summarizes the solution techniques. If a given differential equation cannot be written in one of these forms, then the next step is to try to determine an integrating factor. If that fails, then we might try to find a change of variables that would reduce the differential equation to one of the above types.

Type | Standard Form | Technique
Separable | $p(y)y' = q(x)$ | Separate the variables and integrate.
First-order linear | $y' + p(x)y = q(x)$ | Rewrite as $\frac{d}{dx}(I \cdot y) = I \cdot q(x)$, where $I = e^{\int p(x)\,dx}$, and integrate with respect to $x$.
First-order homogeneous | $y' = f(x, y)$, where $f(tx, ty) = f(x, y)$ | Change variables: $y = xV(x)$, and reduce to a separable equation.
Bernoulli | $y' + p(x)y = q(x)y^n$ | Divide by $y^n$ and make the change of variables $u = y^{1-n}$. This reduces the differential equation to a linear equation.
Exact | $M\,dx + N\,dy = 0$, with $M_y = N_x$ | The solution is $\phi(x, y) = c$, where $\phi$ is determined by integrating $\phi_x = M$, $\phi_y = N$.

Table 1.12.1: A summary of the basic solution techniques for $y' = f(x, y)$.

Example 1.12.1 Determine which of the above types, if any, the following differential equation falls into:
$$\frac{dy}{dx} = -\frac{8x^5 + 3y^4}{4xy^3}.$$

Solution: Since the given differential equation is written in the form $dy/dx = f(x, y)$, we first check whether it is separable or homogeneous. By inspection, we see that it is neither of these. We next check whether it is a linear or a Bernoulli equation. We therefore rewrite the equation in the equivalent form
$$\frac{dy}{dx} + \frac{3}{4x}y = -2x^4y^{-3}, \tag{1.12.3}$$
which we recognize as a Bernoulli equation with $n = -3$. We could therefore solve the equation using the appropriate technique. Owing to the $y^{-3}$ term in Equation (1.12.3), the equation is not linear. Finally, we check for exactness. The natural differential form to try for the given differential equation is
$$(8x^5 + 3y^4)\,dx + 4xy^3\,dy = 0. \tag{1.12.4}$$
In this form, we have $M_y = 12y^3$ and $N_x = 4y^3$, so that the equation is not exact. However, we see that $(M_y - N_x)/N = 2x^{-1}$, so that according to Theorem 1.9.11, $I(x) = x^2$ is an integrating factor. Therefore, we could multiply Equation (1.12.4) by $x^2$ and then solve it as an exact equation.

Examples of First-Order Differential Equations

There are numerous real-world examples of first-order differential equations. Among the applications discussed in this chapter are Newton's law of cooling, families of orthogonal trajectories, Malthusian and logistic population models, mixing problems, electric circuits, and others.

Additional Problems

1. A racquetball player standing at the back wall of the court hits the ball from a height of 2 feet horizontally toward the front wall at 80 miles per hour. The length of a regulation racquetball court is 40 feet. Does the ball reach the front wall before hitting the ground? Neglect air resistance, and assume the acceleration of gravity is 32 feet/sec².
2. A boy 2 meters tall shoots a toy rocket straight up from head level at 10 meters per second. Assume the acceleration of gravity is 9.8 meters/sec².
(a) What is the highest point above the ground reached by the rocket?
(b) When does the rocket hit the ground?

In Problems 3–6, find the equation of the orthogonal trajectories to the given family of curves.
3. $y = cx^3$.
4. $y^2 = cx^3$.
5. $y = \ln(cx)$.
6. $x^4 + y^4 = c$.
7. Consider the family of curves
$$x^2 + 3y^2 = 2cy. \tag{1.12.5}$$
(a) Show that the differential equation of this family is
$$\frac{dy}{dx} = \frac{2xy}{x^2 - 3y^2}.$$
(b) Determine the orthogonal trajectories to the family (1.12.5).

In Problems 8–9, sketch the slope field and some representative solution curves for the given differential equation.
8. $y' = \sin x$.
9. $y' = y/x^2$.
10. At time $t$ the velocity, $v(t)$, of an object is governed by the differential equation
$$\frac{dv}{dt} = \frac12(25 - v), \qquad t > 0.$$
(a) Verify that $v(t) = 25$ is a solution to this differential equation.
(b) Sketch the slope field for $0 \le v \le 25$. What happens to $v(t)$ as $t \to \infty$?
11. An object of mass $m$ is released from rest in a medium in which the frictional forces are proportional to the square of the velocity. The initial-value problem that governs the subsequent motion is
$$mv\frac{dv}{dy} = mg - kv^2, \qquad v(0) = 0, \tag{1.12.6}$$
where $v(t)$ denotes the velocity of the object at time $t$, $y(t)$ denotes the distance traveled by the object at time $t$ as measured from the point at which the object was released, and $k$ is a positive constant.
(a) Solve (1.12.6) and show that
$$v^2 = \frac{mg}{k}\left(1 - e^{-2ky/m}\right).$$
(b) Make a sketch of $v^2$ as a function of $y$.

In Problems 12–37, determine which of the five types of differential equations we have studied the given equation falls into (see Table 1.12.1), and use an appropriate technique to find the general solution.
12. 2 ln x dy = . dx xy + 3xy
13. $xy' - 2y = 2x^2\ln x$.
14. $\dfrac{dy}{dx} = -\dfrac{2xy}{x^2 + 2y}$.
15. $(y^2 + x^2)\,dx - x^2\,dy = 0$.
16. $y' + y(\tan x + y\sin x) = 0$.
17. $\dfrac{dy}{dx} + \dfrac{2e^{2x}}{1 + e^{2x}}\,y = \dfrac{1}{e^{2x} - 1}$.
18. $y' - x^{-1}y = x^{-1}\sqrt{x^2 - y^2}$.
19. $\dfrac{dy}{dx} = \dfrac{\sin y + y\cos x + 1}{1 - x\cos y - \sin x}$.
20. dy √ − x2y = y. dx
21. $e^{2x+y}\,dy - e^{x-y}\,dx = 0$.
22. $y' + y\cot x = \sec x$.
23. $\dfrac{dy}{dx} + \dfrac{2e^x}{1 + e^x}\,y = 2\sqrt{y}\,e^{-x}$.
24. $y[\ln(y/x) + 1]\,dx - x\,dy = 0$.
25. $(1 + 2xe^y)\,dx - (e^y + x)\,dy = 0$.
26. $y' + y\sin x = \sin x$.
27. $(3y^2 + x^2)\,dx - 2xy\,dy = 0$.
28. $2x(\ln x)y' - y = -9x^3y^3\ln x$.
29. $(1 + x)y' = y(2 + x)$.
30. (x² − 1)(y − 1) + 2y = 0.
31. $x\sec^2(xy)\,dy = -[y\sec^2(xy) + 2x]\,dx$.
32. 1 dy + y= dx x 25x 2 ln x 2y
33. dy x2 y =2 +. 2 dx x x −y
34. [ln (xy) + 1] dx + x + 2y y d y = 0.
35. y + 25 ln x y . = x 2x 3 y
36. $(x + xy^2)y' = x^3ye^{x-y}$.
37. $y' = \cos x\,(y\csc x - 1)$, $0 < x < \pi/2$.

For Problems 38–41, determine which of the five types of differential equations we have studied the given differential equation falls into, and use an appropriate technique to find the solution to the initial-value problem.
38. $y' - x^2y = x^2$, $y(0) = 5$.
39. $e^{-3x+2y}\,dx + e^{x-4y}\,dy = 0$, $y(0) = 0$.
40. $(3x^2 + 2xy^2)\,dx + (2x^2y)\,dy = 0$, $y(1) = 3$.
41. $\dfrac{dy}{dx} - (\sin x)y = e^{-\cos x}$, $y(0) = \dfrac{1}{e}$.
42. Determine all values of the constants $m$ and $n$, if there are any, for which the differential equation
$$(x^5 + y^m)\,dx - x^ny^3\,dy = 0$$
is each of the following:
(a) Exact. (b) Separable. (c) Homogeneous. (d) Linear. (e) Bernoulli.
43. A man's sandals are moved from poolside (80°F) to a sauna (180°F) to warm and dry them. If they are 100°F after 3 minutes in the sauna, how much time is required in the sauna to increase their temperature to 140°F, according to Newton's law of cooling?
44. A hot plate (150°F) is placed on a countertop in a room kept at 70°F. If the plate cools 25°F in the first 10 minutes, when does the plate reach 100°F, according to Newton's law of cooling?
45. A simple nonlinear law of cooling states that the rate of change of temperature of an object is proportional to the square of the temperature difference between the object and its surrounding medium (you may assume that the temperature of the surrounding medium is constant). Set up and solve the initial-value problem that governs this cooling process if the initial temperature is $T_0$. What happens to the temperature of the object as $t \to \infty$?
46. The temperature of an object at time $t$ is governed by the linear differential equation
$$\frac{dT}{dt} = -k(T - 5\cos 2t).$$
At $t = 0$, the temperature of the object is 0°F and is, at that time, increasing at a rate of 5°F/min.
(a) Determine the value of the constant $k$.
(b) Determine the temperature of the object at time $t$.
(c) Describe the behavior of the temperature of the object for large values of $t$.
47. Each spring, sandhill cranes migrate through the Platte River valley in central Nebraska. An estimated maximum of a half-million of these birds reach the region by April 1 each year. Suppose there are only 100,000 sandhill cranes 15 days later and the sandhill cranes leave the Platte River valley at a rate proportional to the number of them still in the valley at the time.
(a) How many sandhill cranes remain in the valley 30 days after April 1?
(b) How many sandhill cranes remain in the valley 35 days after April 1?
(c) How many days after April 1 will there be fewer than 1000 sandhill cranes in the valley?
48. A city's population in the year 2000 was 200,000, in 2003 it was 230,000, and in 2006 it was 250,000. Using the logistic model of population, predict the population in 2010 and 2020.
49. Consider an RC circuit with $R = 4\ \Omega$, $C = \tfrac15$ F, and $E(t) = 6\cos 2t$ V. If $q(0) = 3$ C, determine the current in the circuit for $t \ge 0$.
50. Consider an RL circuit with $R = 3\ \Omega$, $L = 0.3$ H, and $E(t) = 10$ V. If $i(0) = 3$ A, determine the current in the circuit for $t \ge 0$.
51. A salt solution containing 3 g/L pours into a tank, initially half full of water, at a rate of 6 L/min. The well-stirred mixture flows out at a rate of 4 L/min. If the tank holds 60 L, find the amount of salt (in grams) in the tank when the solution overflows.

In Problems 52–53, use Euler's method with the specified step size to determine the solution to the given initial-value problem at the specified point.
52. $y' = x^2 + 2y^2$, $y(0) = -3$, $h = 0.1$, $y(1)$.
53. $y' = 3x + \dfrac{2}{y}$, $y(1) = 2$, $h = 0.05$, $y(1.5)$.
In Problems 54–55, use the modified Euler method with the specified step size to determine the solution to the given initial-value problem at the specified point. In each case, compare your answer to that determined by using Euler's method.
54. The initial-value problem in Problem 52.
55. The initial-value problem in Problem 53.
In Problems 56–57, use the fourth-order Runge-Kutta method with the specified step size to determine the solution to the given initial-value problem at the specified point. In each case, compare your answer to that determined by using Euler's method.
56. The initial-value problem in Problem 52.
57. The initial-value problem in Problem 53.

Project: A Cylindrical Tank Problem

Consider an open cylindrical tank of height $h_0$ meters and radius $r$ meters that is filled with water. A circular hole of radius $l$ meters in the bottom of the tank allows the water to flow out under the influence of gravity. According to Torricelli's law, the water flows out with the same speed that it would acquire in falling freely from the water level in the tank to the hole.
1. Use Torricelli's law to derive the following equation for the rate of change of volume of water in the tank,
$$\frac{dV}{dt} = -a\sqrt{2gh},$$
where $h(t)$ denotes the height of water in the tank at time $t$, $a$ denotes the area of the hole, and $g$ denotes the acceleration due to gravity.
[Hint: First show that an object that is released from rest at a height $h$ hits the ground with a speed $\sqrt{2gh}$. Then consider the change in the volume of water in the tank in a time interval $\Delta t$.]
2. Show that the rate of change of volume of water in the tank is also given by
$$\frac{dV}{dt} = \pi r^2\frac{dh}{dt}.$$
3. Using the results from problems (1) and (2), determine the height of the water in the tank at time $t$, and show that the tank will empty when $t = t_e$, where
$$t_e = \frac{\pi r^2}{a}\sqrt{\frac{2h_0}{g}}.$$
4. Suppose now that starting at $t = 0$ chemical is added to the water in the tank at a rate of $w$ grams/second. Derive the following differential equation governing the amount of chemical, $A(t)$, in the tank at time $t$:
$$\frac{dA}{dt} - \frac{2}{t - t_e}A = w, \qquad 0 < t < t_e. \tag{1.12.7}$$
5. Solve the differential equation (1.12.7). Determine the time when $A(t)$ is a maximum.
6. By making an appropriate change of variables in the differential equation (1.12.7), derive a differential equation for the concentration $c(t)$ of chemical in the tank at time $t$. Solve your differential equation and verify that you get the same expression for $c(t)$ as you do by dividing the expression for $A(t)$ obtained in the previous problem by $V(t)$.
7. In the particular case when $h_0 = 16$ m, $r = 5$ m, $l = 0.1$ m, and $w = 15$ g/s, determine $t_e$, and the time when the concentration of chemical in the tank reaches 1 g/L.

CHAPTER 2 Matrices and Systems of Linear Equations

Algebra is the intellectual instrument which has been created for rendering clear the quantitative aspects of the world.
— Alfred North Whitehead

We will see in the later chapters that most problems in linear algebra can be reduced to questions regarding the solutions of systems of linear equations. In preparation for this, the next two chapters provide a detailed introduction to the theory and solution techniques for such systems.
An example of a linear system of equations in the unknowns $x_1, x_2, x_3$ is
$$\begin{aligned}
3x_1 + 4x_2 - 7x_3 &= 5,\\
2x_1 - 3x_2 + 9x_3 &= 7,\\
7x_1 + 2x_2 - 3x_3 &= 4.
\end{aligned}$$
We see that this system is completely determined by the array of numbers
$$\begin{bmatrix} 3 & 4 & -7 & 5\\ 2 & -3 & 9 & 7\\ 7 & 2 & -3 & 4 \end{bmatrix},$$
which contains the coefficients of the unknowns on the left-hand side of the system and the numbers appearing on the right-hand side of the system. Such an array is an example of a matrix. In this chapter we will see that, in general, linear systems of equations are best represented in terms of matrices and that, once such a representation has been made, the set of all solutions to the system can be easily determined. In the first few sections of this chapter we therefore introduce the basics of matrix algebra. We then apply matrices to solve systems of linear equations. In Chapter 7, we will see how matrices also give a natural framework for formulating and solving systems of linear differential equations.

2.1 Matrices: Definitions and Notation

We begin our discussion of matrices with a definition.

DEFINITION 2.1.1
An $m \times n$ (read "m by n") matrix is a rectangular array of numbers arranged in $m$ horizontal rows and $n$ vertical columns. Matrices are usually denoted by uppercase letters, such as $A$ and $B$. The entries in the matrix are called the elements of the matrix.

Example 2.1.2 The following are examples of a 2 × 3 and a 3 × 3 matrix, respectively:
3 A= 2 51 45 0 −3 7 , 5 9
$$B = \begin{bmatrix} 2 & -1 & 3\\ 1 & 1 & -1\\ 0 & 0 & 1 \end{bmatrix}.$$

We will use the index notation to denote the elements of a matrix. According to this notation, the element in the $i$th row and $j$th column of the matrix $A$ will be denoted $a_{ij}$. Thus, for the matrices in the previous example we have $a_{13} = \tfrac15$, $a_{22} = -\tfrac37$, $b_{23} = -1$, and so on. Using the index notation, a general $m \times n$ matrix $A$ is written
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$
or, in a more abbreviated form, $A = [a_{ij}]$.

Remark
The expression $m \times n$ representing the number of rows and columns of a general matrix $A$ is sometimes informally called the size of the matrix $A$. The numbers $m$ and $n$ themselves are sometimes called the dimensions¹ of the matrix $A$.

¹ Be careful not to confuse this usage of the term with the dimension of a vector space, which will be introduced in Chapter 4.

Next we define what is meant by equality of matrices.

DEFINITION 2.1.3
Two matrices $A$ and $B$ are equal, written $A = B$, if
1. They both have the same size, $m \times n$.
2. All corresponding elements in the matrices are equal: $a_{ij} = b_{ij}$ for all $i$ and $j$ with $1 \le i \le m$ and $1 \le j \le n$.

According to Definition 2.1.3, even though the matrices 4 123 A= and B = 3 456 1 2 6 5 contain the same six numbers, and therefore store the same basic information, they are not equal as matrices.

Row Vectors and Column Vectors

Of particular interest to us in the future will be $1 \times n$ and $n \times 1$ matrices. For this reason we give them special names.

DEFINITION 2.1.4
A $1 \times n$ matrix is called a row n-vector. An $n \times 1$ matrix is called a column n-vector. The elements of a row or column n-vector are called the components of the vector.

Remarks
1. We can refer to the objects just defined simply as row vectors and column vectors if the value of $n$ is clear from the context.
2. We will see later in this chapter that when a system of linear equations is written using matrices, the basic unknown in the reformulated system is a column vector. A similar formulation will also be given in Chapter 7 for systems of differential equations.

Example 2.1.5 The matrix
$$\mathbf{a} = \begin{bmatrix} \tfrac23 & -\tfrac15 & \tfrac47 \end{bmatrix}$$
is a row 3-vector and
$$\mathbf{b} = \begin{bmatrix} 1\\ -1\\ 3\\ 4 \end{bmatrix}$$
is a column 4-vector. As indicated here, we usually denote a row or column vector by a lowercase letter in bold print. Associated with any $m \times n$ matrix are $m$ row n-vectors and $n$ column m-vectors.
These are referred to as the row vectors of the matrix and the column vectors of the matrix, respectively.

Example 2.1.6
Associated with the matrix
$$A = \begin{bmatrix} -2 & 1 & 3 & 4\\ 1 & 2 & 1 & 1\\ 3 & -1 & 2 & 5 \end{bmatrix}$$
are the row 4-vectors
$$\begin{bmatrix} -2 & 1 & 3 & 4 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 1 & 1 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 3 & -1 & 2 & 5 \end{bmatrix},$$
and the column 3-vectors
$$\begin{bmatrix} -2\\ 1\\ 3 \end{bmatrix}, \quad \begin{bmatrix} 1\\ 2\\ -1 \end{bmatrix}, \quad \begin{bmatrix} 3\\ 1\\ 2 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 4\\ 1\\ 5 \end{bmatrix}.$$

Conversely, if $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ are each column $m$-vectors, then we let $[\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n]$ denote the $m \times n$ matrix whose column vectors are $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$. Similarly, if $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$ are each row $n$-vectors, then we write
$$\begin{bmatrix} \mathbf{b}_1\\ \mathbf{b}_2\\ \vdots\\ \mathbf{b}_m \end{bmatrix}$$
for the $m \times n$ matrix with row vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$. The reader should observe that a list of vectors arranged in a row will always consist of column vectors, while a list of vectors arranged in a column will always consist of row vectors.

Example 2.1.7
If a1 = 1 5 2 3 , a2 = 4 7 5 9 , and a3 = −1 3 , then 3 11 [a1 , a2 , a3 ] = 1 5 2 3 4 7 5 9 −1 3 3 11 .

DEFINITION 2.1.8
If we interchange the row vectors and column vectors in an $m \times n$ matrix $A$, we obtain an $n \times m$ matrix called the transpose of $A$. We denote this matrix by $A^T$. In index notation, the $(i, j)$th element of $A^T$, denoted $a^T_{ij}$, is given by
$$a^T_{ij} = a_{ji}.$$

Example 2.1.9
If
$$A = \begin{bmatrix} 1 & 2 & 6 & 2\\ 0 & 3 & 4 & 7 \end{bmatrix}, \quad \text{then} \quad A^T = \begin{bmatrix} 1 & 0\\ 2 & 3\\ 6 & 4\\ 2 & 7 \end{bmatrix}.$$
If
$$A = \begin{bmatrix} 1 & 3 & 5\\ 2 & 0 & 7\\ 3 & 4 & 9 \end{bmatrix}, \quad \text{then} \quad A^T = \begin{bmatrix} 1 & 2 & 3\\ 3 & 0 & 4\\ 5 & 7 & 9 \end{bmatrix}.$$

Square Matrices
An $n \times n$ matrix is called a square matrix, since it has the same number of rows as columns. If $A$ is a square matrix, then the elements $a_{ii}$, $1 \le i \le n$, make up the main diagonal, or leading diagonal, of the matrix. (See Figure 2.1.1 for the $3 \times 3$ case.)

Figure 2.1.1: The main diagonal ($a_{11}$, $a_{22}$, $a_{33}$) of a $3 \times 3$ matrix.

The sum of the main diagonal elements of an $n \times n$ matrix $A$ is called the trace of $A$ and is denoted $\operatorname{tr}(A)$. Thus,
$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$
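Index notation, Definition 2.1.3, the transpose, and the trace translate directly into code. The following is a minimal Python sketch under our own assumptions (it is not from the text): a matrix is stored as a list of rows, and the helper names `entry`, `size`, `equal`, `transpose`, and `trace` are hypothetical. Because Python lists are indexed from 0, `entry` shifts the book's 1-based indices $i$ and $j$ down by one.

```python
# A minimal sketch: a matrix is stored as a list of rows (list[list]).
Matrix = list[list[float]]


def entry(A: Matrix, i: int, j: int) -> float:
    """Return a_ij, using the book's 1-based row index i and column index j."""
    return A[i - 1][j - 1]


def size(A: Matrix) -> tuple[int, int]:
    """Return (m, n) for an m x n matrix (every row is assumed to have length n)."""
    return len(A), len(A[0])


def equal(A: Matrix, B: Matrix) -> bool:
    """Definition 2.1.3: the same size, and a_ij == b_ij for all i and j."""
    if size(A) != size(B):
        return False
    m, n = size(A)
    return all(entry(A, i, j) == entry(B, i, j)
               for i in range(1, m + 1) for j in range(1, n + 1))


def transpose(A: Matrix) -> Matrix:
    """Definition 2.1.8: the (i, j) element of A^T is a_ji."""
    m, n = size(A)
    return [[A[j][i] for j in range(m)] for i in range(n)]


def trace(A: Matrix) -> float:
    """tr(A) = a_11 + a_22 + ... + a_nn; defined only for square matrices."""
    m, n = size(A)
    if m != n:
        raise ValueError("the trace is defined only for square matrices")
    return sum(entry(A, i, i) for i in range(1, n + 1))


A = [[1, 3, 5],
     [2, 0, 7],
     [3, 4, 9]]
print(transpose(A))   # [[1, 2, 3], [3, 0, 4], [5, 7, 9]], as in Example 2.1.9
print(trace(A))       # 1 + 0 + 9 = 10
```

Since the column vectors of a matrix are the rows of its transpose, `transpose(A)` also lists the column vectors of `A`. For the $2 \times 4$ matrix of Example 2.1.9, `equal(A, transpose(A))` is `False` simply because the sizes differ.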
An $n \times n$ matrix $A$ is said to be lower triangular if $a_{ij} = 0$ whenever $i < j$ (zeros everywhere above, i.e., "northeast of," the main diagonal), and it is said to be upper triangular if $a_{ij} = 0$ whenever $i > j$ (zeros everywhere below, i.e., "southwest of," the main diagonal). The following are examples of an upper triangular and a lower triangular matrix, respectively:
$$\begin{bmatrix} 1 & -8 & 5\\ 0 & -3 & 9\\ 0 & 0 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 2 & 0 & 0\\ 0 & 1 & 0\\ -6 & 7 & -3 \end{bmatrix}.$$
Observe that the transpose of a lower (upper) triangular matrix is an upper (lower) triangular matrix. If every element on the main diagonal of a lower (upper) triangular matrix is a 1, the matrix is called a unit lower (upper) triangular matrix.

An $n \times n$ matrix $D = [d_{ij}]$ that has all off-diagonal elements equal to zero is called a diagonal matrix. Note that a matrix $D$ is a diagonal matrix if and only if $D$ is simultaneously upper and lower triangular. Such a matrix is completely determined by giving its main diagonal elements, since $d_{ij} = 0$ whenever $i \neq j$. Consequently, we can specify a diagonal matrix in the compact form
$$D = \operatorname{diag}(d_1, d_2, \ldots, d_n),$$
where $d_i$ denotes the diagonal element $d_{ii}$.

Example 2.1.10
The $4 \times 4$ diagonal matrix $D = \operatorname{diag}(1, 2, 0, 3)$ is
$$D = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 3 \end{bmatrix}.$$

The transpose naturally picks out two important types of square matrices as follows.

DEFINITION 2.1.11
1. A square matrix $A$ satisfying $A^T = A$ is called a symmetric matrix.
2. If $A = [a_{ij}]$, then we let $-A$ denote the matrix with elements $-a_{ij}$. A square matrix $A$ satisfying $A^T = -A$ is called a skew-symmetric (or anti-symmetric) matrix.

Example 2.1.12
The matrix
$$A = \begin{bmatrix} 1 & -1 & 1 & 5\\ -1 & 2 & 2 & 6\\ 1 & 2 & 3 & 4\\ 5 & 6 & 4 & 9 \end{bmatrix}$$
is symmetric, whereas
$$B = \begin{bmatrix} 0 & -1 & -5 & 3\\ 1 & 0 & 1 & -2\\ 5 & -1 & 0 & 7\\ -3 & 2 & -7 & 0 \end{bmatrix}$$
is skew-symmetric.

Notice that the main diagonal elements of the skew-symmetric matrix in the preceding example are all zero.
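Each of the definitions above is a condition on the indices $i$ and $j$, so it can be checked entry by entry. The sketch below is our own Python illustration (the function names are hypothetical and matrices are stored as lists of rows). It tests the example matrices above and confirms that the main diagonal of the skew-symmetric matrix $B$ of Example 2.1.12 consists of zeros.

```python
Matrix = list[list[float]]


def is_square(A: Matrix) -> bool:
    """True if A has as many rows as columns."""
    return all(len(row) == len(A) for row in A)


def is_upper_triangular(A: Matrix) -> bool:
    """a_ij = 0 whenever i > j (zeros below the main diagonal)."""
    n = len(A)
    return is_square(A) and all(A[i][j] == 0 for i in range(n) for j in range(i))


def is_lower_triangular(A: Matrix) -> bool:
    """a_ij = 0 whenever i < j (zeros above the main diagonal)."""
    n = len(A)
    return is_square(A) and all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def is_diagonal(A: Matrix) -> bool:
    """Diagonal if and only if simultaneously upper and lower triangular."""
    return is_upper_triangular(A) and is_lower_triangular(A)


def diag(*d: float) -> Matrix:
    """diag(d_1, ..., d_n): d_i on the main diagonal, zeros elsewhere."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]


def is_symmetric(A: Matrix) -> bool:
    """A^T = A, that is, a_ij = a_ji for all i and j."""
    n = len(A)
    return is_square(A) and all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


def is_skew_symmetric(A: Matrix) -> bool:
    """A^T = -A, that is, a_ij = -a_ji for all i and j."""
    n = len(A)
    return is_square(A) and all(A[i][j] == -A[j][i] for i in range(n) for j in range(n))


U = [[1, -8, 5], [0, -3, 9], [0, 0, 4]]      # upper triangular example
L = [[2, 0, 0], [0, 1, 0], [-6, 7, -3]]      # lower triangular example
A = [[1, -1, 1, 5], [-1, 2, 2, 6], [1, 2, 3, 4], [5, 6, 4, 9]]
B = [[0, -1, -5, 3], [1, 0, 1, -2], [5, -1, 0, 7], [-3, 2, -7, 0]]

print(is_upper_triangular(U), is_lower_triangular(L))   # True True
print(is_symmetric(A), is_skew_symmetric(B))            # True True
print([B[i][i] for i in range(len(B))])                 # [0, 0, 0, 0]
```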
This is true in general, since if $A$ is a skew-symmetric matrix, then $a_{ij} = -a_{ji}$, which implies that when $i = j$, $a_{ii} = -a_{ii}$, so that $a_{ii} = 0$.

Matrix and Vector Functions
Later in the text we will be concerned with systems of two or more differential equations. The most effective way to study such systems, as it turns out, is to represent the system using matrices and vectors. However, we will need to allow the elements of the matrices and vectors that arise to contain functions of a single variable, not just real or complex numbers. This leads to the following definition, reminiscent of Definition 2.1.1.

DEFINITION 2.1.13
An $m \times n$ matrix function $A$ is a rectangular array with $m$ rows and $n$ columns whose elements are functions of a single real variable $t$.

Example 2.1.14
Here are two examples of matrix functions:
$$A(t) = \begin{bmatrix} t^3 & t - \cos t & 5\\ e^{t^2} & \ln(t + 1) & te^t \end{bmatrix} \quad \text{and} \quad B(t) = \begin{bmatrix} 5 - t + t^2 & \sin(e^{2t})\\ -1 & \tan t\\ 6 & 6 - t \end{bmatrix}.$$

A matrix function $A(t)$ is defined only for real values of $t$ such that all elements in $A(t)$ assume a well-defined value. The function $A$ above is defined only for real values of $t$ with $t > -1$, since $\ln(t + 1)$ is defined only for $t > -1$. The reader should determine the values of $t$ for which the matrix function $B$ is defined.

Remark
It is possible, of course, to consider matrix functions of more than one variable. However, this will not be particularly relevant for our purposes in this text.

Finally in this section, we have the following special type of matrix function.

DEFINITION 2.1.15
An $n \times 1$ matrix function is called a column $n$-vector function.
For instance,
$$\begin{bmatrix} t^2\\ -6te^t \end{bmatrix}$$
is a column 2-vector function.²

Exercises for 2.1

Key Terms
Matrices, Elements, Size (dimensions) of a matrix, Row vector, Column vector, Square matrix, Main diagonal, Trace, Lower (Upper) triangular matrix, Unit lower (upper) triangular matrix, Diagonal matrix, Symmetric matrix, Skew-symmetric matrix, Matrix function, Column $n$-vector function.

Skills
• Be able to determine the elements of a matrix.
• Be able to identify the size (i.e., dimensions) of a matrix.
• Be able to identify the row and column vectors of a matrix.
• Be able to determine the components of a row or column vector.
• Be able to say whether or not two given matrices are equal.
• Be able to find the transpose of a matrix.
• Be able to compute the trace of a square matrix.
• Be able to recognize square matrices that are upper triangular, lower triangular, or diagonal.
• Be able to recognize square matrices that are symmetric or skew-symmetric.
• Be able to determine the values of the variable $t$ such that a matrix function $A$ is defined.

True-False Review
For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.

1. A diagonal matrix must be both upper triangular and lower triangular.
2. An $m \times n$ matrix has $m$ column vectors and $n$ row vectors.
3. If $A$ is a symmetric matrix, then so is $A^T$.
4. The trace of a matrix is the product of the elements along the main diagonal.
5. A skew-symmetric matrix must have zeros along the main diagonal.
6. A matrix that is both symmetric and skew-symmetric cannot contain any nonzero elements.
7. The matrix functions √ t 3t 2 1 sin 2t |t | and −2 + t ln t esin t −3 are defined for exactly the same values of $t$.
² We could, of course, also speak of row $n$-vector functions as the $1 \times n$ matrix functions, but we will not need them in this text.

8. The matrix function cos t t2 −2 −t t 1 e√ t −3 is defined for all positive real numbers $t$.
9. Any matrix of numbers is a matrix function defined for all real values of the variable $t$.
10. If $A$ and $B$ are matrix functions such that the matrices $A(0)$ and $B(0)$ are the same, then we should consider $A$ and $B$ to be the same matrix function.

Problems

1. If
$$A = \begin{bmatrix} 1 & -2 & 3 & 2\\ 7 & -6 & 5 & -1\\ 0 & 2 & -3 & 4 \end{bmatrix},$$
determine $a_{31}$, $a_{24}$, $a_{14}$, $a_{32}$, $a_{21}$, and $a_{34}$.

For Problems 2–6, write the matrix with the given elements. In each case, specify the dimensions of the matrix.
2. $a_{11} = 1$, $a_{21} = -1$, $a_{12} = 5$, $a_{22} = 3$.
3. $a_{11} = 2$, $a_{12} = 1$, $a_{13} = -1$, $a_{21} = 0$, $a_{22} = 4$, $a_{23} = -2$.
4. $a_{11} = -1$, $a_{41} = -5$, $a_{31} = 1$, $a_{21} = 1$.
5. $a_{11} = 1$, $a_{31} = 2$, $a_{42} = -1$, $a_{32} = 7$, $a_{13} = -2$, $a_{23} = 0$, $a_{33} = 4$, $a_{21} = 3$, $a_{41} = -4$, $a_{12} = -3$, $a_{22} = 6$, $a_{43} = 5$.
6. $a_{12} = -1$, $a_{13} = 2$, $a_{23} = 3$, $a_{ji} = -a_{ij}$, $1 \le i \le 3$, $1 \le j \le 3$.

For Problems 7–9, determine $\operatorname{tr}(A)$ for the given matrix.
10 . 23 7. A = 1 8. A = 3 7 2 3 9. A = 0 2 −1 2 −2 . 5 −3 01 2 5 . 1 −5

For Problems 10–12, write the column vectors and row vectors of the given matrix.
10. $A = \begin{bmatrix} 1 & -1\\ 3 & 5 \end{bmatrix}$.
11. $A = \begin{bmatrix} 1 & 3 & -4\\ -1 & -2 & 5\\ 2 & 6 & 7 \end{bmatrix}$.
12. $A = \begin{bmatrix} 2 & 10 & 6\\ 5 & -1 & 3 \end{bmatrix}$.
13. If $\mathbf{a}_1 = [1 \;\; 2]$, $\mathbf{a}_2 = [3 \;\; 4]$, and $\mathbf{a}_3 = [5 \;\; 1]$, write the matrix
$$A = \begin{bmatrix} \mathbf{a}_1\\ \mathbf{a}_2\\ \mathbf{a}_3 \end{bmatrix}$$
and determine the column vectors of $A$.
14. If
$$\mathbf{b}_1 = \begin{bmatrix} 2\\ -1\\ 4 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 5\\ 7\\ -6 \end{bmatrix}, \quad \mathbf{b}_3 = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}, \quad \mathbf{b}_4 = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix},$$
write the matrix $B = [\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4]$ and determine the row vectors of $B$.
15. If $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ are each column $q$-vectors, what are the dimensions of the matrix that has $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_p$ as its column vectors?

For Problems 16–20, give an example of a matrix of the specified form.
16. $3 \times 3$ diagonal matrix.
17. $4 \times 4$ upper triangular matrix.
18. $4 \times 4$ skew-symmetric matrix.
19. $3 \times 3$ upper triangular symmetric matrix.
20. $3 \times 3$ lower triangular skew-symmetric matrix.

For Problems 21–24, give an example of a matrix function of the specified form.
21. $2 \times 3$ matrix function defined only for values of $t$ with $-2 \le t < 3$.
22. $4 \times 2$ matrix function $A$ such that $A(0) = A(1) = A(2)$.
23. $1 \times 5$ matrix function $A$ that is nonconstant such that all elements of $A(t)$ are positive for all $t$ in $\mathbb{R}$.
24. $2 \times 1$ matrix function $A$ that is nonconstant such that all elements of $A(t)$ are in $[0, 1]$ for every $t$ in $\mathbb{R}$.
25. Construct distinct matrix functions $A$ and $B$ defined on all of $\mathbb{R}$ such that $A(0) = B(0)$ and $A(1) = B(1)$.
26. Prove that a symmetric upper triangular matrix is diagonal.
27. Determine all elements of the $3 \times 3$ skew-symmetric matrix $A$ with $a_{21} = 1$, $a_{31} = 3$, $a_{23} = -1$.

2.2 Matrix Algebra

In the previous section we introduced the general idea of a matrix. The next step is to develop the algebra of matrices. Unless otherwise stated, we assume that all elements of the matrices that appear are real or complex numbers.

Addition and Subtraction of Matrices and Multiplication of a Matrix by a Scalar
Addition and subtraction of matrices are defined only for matrices with the same dimensions. We begin with addition.

DEFINITION 2.2.1
If $A$ and $B$ are both $m \times n$ matrices, then we define addition (or the sum) of $A$ and $B$, denoted by $A + B$, to be the $m \times n$ matrix whose elements are obtained by adding corresponding elements of $A$ and $B$. In index notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$, then $A + B = [a_{ij} + b_{ij}]$.

Example 2.2.2
We have
$$\begin{bmatrix} 2 & -1 & 3\\ 4 & -5 & 0 \end{bmatrix} + \begin{bmatrix} -1 & 0 & 5\\ -5 & 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 8\\ -1 & -3 & 7 \end{bmatrix}.$$

Properties of Matrix Addition: If $A$, $B$, and $C$ are all $m \times n$ matrices, then
$$A + B = B + A \qquad \text{(matrix addition is commutative)},$$
$$A + (B + C) = (A + B) + C \qquad \text{(matrix addition is associative)}.$$
Both of these properties follow directly from Definition 2.2.1.

In order to model oscillatory physical phenomena, in much of the later work we will need to use complex as well as real numbers.
Throughout the text we will use the term scalar to mean a real or complex number.

DEFINITION 2.2.3
If $A$ is an $m \times n$ matrix and $s$ is a scalar, then we let $sA$ denote the matrix obtained by multiplying every element of $A$ by $s$. This procedure is called scalar multiplication. In index notation, if $A = [a_{ij}]$, then $sA = [sa_{ij}]$.

Example 2.2.4
If $A = \begin{bmatrix} 2 & -1\\ 4 & 6 \end{bmatrix}$, then $5A = \begin{bmatrix} 10 & -5\\ 20 & 30 \end{bmatrix}$.

Example 2.2.5
If $A = \begin{bmatrix} 1 + i & i\\ 2 + 3i & 4 \end{bmatrix}$ and $s = 1 - 2i$, where $i = \sqrt{-1}$, find $sA$.

Solution: We have
$$sA = \begin{bmatrix} (1 - 2i)(1 + i) & (1 - 2i)i\\ (1 - 2i)(2 + 3i) & (1 - 2i)4 \end{bmatrix} = \begin{bmatrix} 3 - i & 2 + i\\ 8 - i & 4 - 8i \end{bmatrix}.$$

DEFINITION 2.2.6
We define subtraction of two matrices with the same dimensions by
$$A - B = A + (-1)B.$$
In index notation, $A - B = [a_{ij} - b_{ij}]$. That is, we subtract corresponding elements.

Further properties satisfied by the operations of matrix addition and multiplication of a matrix by a scalar are as follows:

Properties of Scalar Multiplication: For any scalars $s$ and $t$, and for any matrices $A$ and $B$ of the same size,
$$\begin{aligned} 1A &= A && \text{(unit property)},\\ s(A + B) &= sA + sB && \text{(distributivity of scalars over matrix addition)},\\ (s + t)A &= sA + tA && \text{(distributivity of scalar addition over matrices)},\\ s(tA) &= (st)A = (ts)A = t(sA) && \text{(associativity of scalar multiplication)}. \end{aligned}$$

The $m \times n$ zero matrix, denoted $0_{m \times n}$ (or simply $0$, if the dimensions are clear), is the $m \times n$ matrix whose elements are all zeros. In the case of the $n \times n$ zero matrix, we may write $0_n$. We now collect a few properties of the zero matrix. The first of these indicates that the zero matrix plays a role in matrix addition similar to that played by the number zero in the addition of real numbers.

Properties of the Zero Matrix: For all matrices $A$ and the zero matrix of the same size, we have
$$A + 0 = A, \qquad A - A = 0, \qquad \text{and} \qquad 0A = 0.$$
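Definitions 2.2.1, 2.2.3, and 2.2.6 all act element by element, so they can be sketched in a few lines. The Python below is a hypothetical illustration, not code from the text: Python's built-in `complex` type (written with `j`, for example `1 - 2j`) stands in for the scalars, and the size check in `add` enforces the requirement that only matrices with the same dimensions can be added.

```python
Matrix = list[list[complex]]


def same_size(A: Matrix, B: Matrix) -> bool:
    """True if A and B have the same dimensions."""
    return len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))


def add(A: Matrix, B: Matrix) -> Matrix:
    """Definition 2.2.1: A + B = [a_ij + b_ij]."""
    if not same_size(A, B):
        raise ValueError("matrix addition needs matrices with the same dimensions")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scalar_mul(s: complex, A: Matrix) -> Matrix:
    """Definition 2.2.3: sA = [s a_ij]; s may be real or complex."""
    return [[s * a for a in row] for row in A]


def sub(A: Matrix, B: Matrix) -> Matrix:
    """Definition 2.2.6: A - B = A + (-1)B."""
    return add(A, scalar_mul(-1, B))


def zero(m: int, n: int) -> Matrix:
    """The m x n zero matrix."""
    return [[0] * n for _ in range(m)]


# Example 2.2.5, with Python's 1j playing the role of i = sqrt(-1).
A = [[1 + 1j, 1j],
     [2 + 3j, 4]]
s = 1 - 2j
print(scalar_mul(s, A))        # [[(3-1j), (2+1j)], [(8-1j), (4-8j)]]
print(sub(A, A) == zero(2, 2)) # True: A - A = 0
```

Because `add` checks the dimensions first, a call such as `add([[1, 2]], [[1], [2]])` raises `ValueError` instead of returning a result.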
Note that in the last property here, the zero on the left side of the equation is a scalar, while the zero on the right side of the equation is a matrix.

Multiplication of Matrices
The definition we introduced above for how to multiply a matrix by a scalar is essentially the only possibility if, in the case when $s$ is a positive integer, we want $sA$ to be the same matrix as the one obtained when $A$ is added to itself $s$ times. We now define how to multiply two matrices together. In this case the multiplication operation is by no means obvious. However, in Chapter 5 when we study linear transformations, the motivation for the matrix multiplication procedure we are defining here will become quite transparent (see Theorem 5.5.7). We will build up to the general definition of matrix multiplication in three stages.

Case 1: Product of a row $n$-vector and a column $n$-vector. We begin by generalizing a concept from elementary calculus. If $\mathbf{a}$ and $\mathbf{b}$ are either row or column $n$-vectors, with components $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$, respectively, then their dot product, denoted $\mathbf{a} \cdot \mathbf{b}$, is the number
$$\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n.$$
As we will see, this is the key formula in defining the product of two matrices. Now let $\mathbf{a}$ be a row $n$-vector, and let $\mathbf{x}$ be a column $n$-vector. Then their matrix product $\mathbf{a}\mathbf{x}$ is defined to be the $1 \times 1$ matrix whose single element is obtained by taking the dot product of the row vectors $\mathbf{a}$ and $\mathbf{x}^T$. Thus,
$$\mathbf{a}\mathbf{x} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = [a_1x_1 + a_2x_2 + \cdots + a_nx_n].$$

Example 2.2.7
If $\mathbf{a} = \begin{bmatrix} 2 & -1 & 3 & 5 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 3\\ 2\\ -3\\ 4 \end{bmatrix}$, then
$$\mathbf{a}\mathbf{x} = \begin{bmatrix} 2 & -1 & 3 & 5 \end{bmatrix}\begin{bmatrix} 3\\ 2\\ -3\\ 4 \end{bmatrix} = [(2)(3) + (-1)(2) + (3)(-3) + (5)(4)] = [15].$$

Case 2: Product of an $m \times n$ matrix and a column $n$-vector.
If $A$ is an $m \times n$ matrix and $\mathbf{x}$ is a column $n$-vector, then the product $A\mathbf{x}$ is defined to be the $m \times 1$ matrix whose $i$th element is obtained by taking the dot product of the $i$th row vector of $A$ with $\mathbf{x}$. (See Figure 2.2.1.)

Figure 2.2.1: Multiplication of an $m \times n$ matrix with a column $n$-vector (row $i$ of $A$ gives the $i$th element of $A\mathbf{x}$).

The $i$th row vector of $A$, $\mathbf{a}_i$, is
$$\mathbf{a}_i = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix},$$
so that $A\mathbf{x}$ has $i$th element
$$(A\mathbf{x})_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n.$$
Consequently the column vector $A\mathbf{x}$ has elements
$$(A\mathbf{x})_i = \sum_{k=1}^{n} a_{ik}x_k, \qquad 1 \le i \le m. \qquad (2.2.1)$$
As illustrated in the next example, in practice, we do not use the formula (2.2.1); rather, we explicitly take the matrix products of the row vectors of $A$ with the column vector $\mathbf{x}$.

Example 2.2.8
Find $A\mathbf{x}$ if $A = \begin{bmatrix} 2 & 3 & -1\\ 1 & 4 & -6\\ 5 & -2 & 0 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} 7\\ -3\\ 1 \end{bmatrix}$.

Solution: We have
$$A\mathbf{x} = \begin{bmatrix} 2 & 3 & -1\\ 1 & 4 & -6\\ 5 & -2 & 0 \end{bmatrix}\begin{bmatrix} 7\\ -3\\ 1 \end{bmatrix} = \begin{bmatrix} 4\\ -11\\ 41 \end{bmatrix}.$$

The following result regarding multiplication of a column vector by a matrix will be used repeatedly in later chapters.

Theorem 2.2.9
If $A = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n]$ is an $m \times n$ matrix and $\mathbf{c} = \begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix}$ is a column $n$-vector, then
$$A\mathbf{c} = c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_n\mathbf{a}_n. \qquad (2.2.2)$$

Proof The element $a_{ik}$ of $A$ is the $i$th component of the column $m$-vector $\mathbf{a}_k$, so $a_{ik} = (\mathbf{a}_k)_i$. Applying formula (2.2.1) for multiplication of a column vector by a matrix yields
$$(A\mathbf{c})_i = \sum_{k=1}^{n} a_{ik}c_k = \sum_{k=1}^{n} (\mathbf{a}_k)_i c_k = \sum_{k=1}^{n} (c_k\mathbf{a}_k)_i.$$
Consequently,
$$A\mathbf{c} = \sum_{k=1}^{n} c_k\mathbf{a}_k = c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + \cdots + c_n\mathbf{a}_n,$$
as required.

If $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are column $m$-vectors and $c_1, c_2, \ldots, c_n$ are scalars, then an expression of the form
$$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_n\mathbf{x}_n$$
is called a linear combination of the column vectors. Therefore, from Equation (2.2.2), we see that the vector $A\mathbf{c}$ is obtained by taking a linear combination of the column vectors of $A$. For example, if
$$A = \begin{bmatrix} 2 & -1\\ 4 & 3 \end{bmatrix} \quad \text{and} \quad \mathbf{c} = \begin{bmatrix} 5\\ -1 \end{bmatrix},$$
then
$$A\mathbf{c} = c_1\mathbf{a}_1 + c_2\mathbf{a}_2 = 5\begin{bmatrix} 2\\ 4 \end{bmatrix} + (-1)\begin{bmatrix} -1\\ 3 \end{bmatrix} = \begin{bmatrix} 11\\ 17 \end{bmatrix}.$$

Case 3: Product of an $m \times n$ matrix and an $n \times p$ matrix. If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, then the product $AB$ has columns defined by multiplying the matrix $A$ by the respective column vectors of $B$, as described in Case 2. That is, if $B = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_p]$, then $AB$ is the $m \times p$ matrix defined by
$$AB = [A\mathbf{b}_1, A\mathbf{b}_2, \ldots, A\mathbf{b}_p].$$

Example 2.2.10
If $A = \begin{bmatrix} 1 & 4 & 2\\ 3 & 5 & 7 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3\\ 5 & -2\\ 8 & 4 \end{bmatrix}$, determine $AB$.

Solution: We have
$$AB = \begin{bmatrix} 1 & 4 & 2\\ 3 & 5 & 7 \end{bmatrix}\begin{bmatrix} 2 & 3\\ 5 & -2\\ 8 & 4 \end{bmatrix} = \begin{bmatrix} (1)(2) + (4)(5) + (2)(8) & (1)(3) + (4)(-2) + (2)(4)\\ (3)(2) + (5)(5) + (7)(8) & (3)(3) + (5)(-2) + (7)(4) \end{bmatrix} = \begin{bmatrix} 38 & 3\\ 87 & 27 \end{bmatrix}.$$

Example 2.2.11
If $A = \begin{bmatrix} 2\\ -1\\ 3 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 \end{bmatrix}$, determine $AB$.

Solution: We have
$$AB = \begin{bmatrix} 2\\ -1\\ 3 \end{bmatrix}\begin{bmatrix} 2 & 4 \end{bmatrix} = \begin{bmatrix} (2)(2) & (2)(4)\\ (-1)(2) & (-1)(4)\\ (3)(2) & (3)(4) \end{bmatrix} = \begin{bmatrix} 4 & 8\\ -2 & -4\\ 6 & 12 \end{bmatrix}.$$

Another way to describe $AB$ is to note that the element $(AB)_{ij}$ is obtained by computing the matrix product of the $i$th row vector of $A$ and the $j$th column vector of $B$. That is,
$$(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$
Expressing this using summation notation yields the following result:

DEFINITION 2.2.12
If $A = [a_{ij}]$ is an $m \times n$ matrix, $B = [b_{ij}]$ is an $n \times p$ matrix, and $C = AB$, then
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}, \qquad 1 \le i \le m, \quad 1 \le j \le p. \qquad (2.2.3)$$
This is called the index form of the matrix product.

The formula (2.2.3) for the $ij$th element of $AB$ is very important and will often be required in the future. The reader should memorize it.

In order for the product $AB$ to be defined, we see that $A$ and $B$ must satisfy
$$\text{number of columns of } A = \text{number of rows of } B.$$
In such a case, if $C$ represents the product matrix $AB$, then the relationship between the dimensions of the matrices is
$$A_{m \times n}\,B_{n \times p} = C_{m \times p},$$
where the two inner dimensions must be the same, and the two outer dimensions give the dimensions of the result.

Now we give some further examples of matrix multiplication.

Example 2.2.13
If $A = \begin{bmatrix} 1 & 3\\ 2 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & -2 & 0\\ 1 & 5 & 3 \end{bmatrix}$, then
$$AB = \begin{bmatrix} 1 & 3\\ 2 & 4 \end{bmatrix}\begin{bmatrix} 2 & -2 & 0\\ 1 & 5 & 3 \end{bmatrix} = \begin{bmatrix} 5 & 13 & 9\\ 8 & 16 & 12 \end{bmatrix}.$$

Example 2.2.14
If $A = \begin{bmatrix} 1 & 2 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} -1 & 1\\ 0 & 1\\ 1 & 2 \end{bmatrix}$, then
$$AB = \begin{bmatrix} 1 & 2 & -1 \end{bmatrix}\begin{bmatrix} -1 & 1\\ 0 & 1\\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -2 & 1 \end{bmatrix}.$$

Example 2.2.15
If $A = \begin{bmatrix} 2\\ -1\\ 3 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 4 & -6 \end{bmatrix}$, then
$$AB = \begin{bmatrix} 2\\ -1\\ 3 \end{bmatrix}\begin{bmatrix} 1 & 4 & -6 \end{bmatrix} = \begin{bmatrix} 2 & 8 & -12\\ -1 & -4 & 6\\ 3 & 12 & -18 \end{bmatrix}.$$

Example 2.2.16
If $A = \begin{bmatrix} 1 - i & i\\ 2 + i & 1 + i \end{bmatrix}$ and $B = \begin{bmatrix} 3 + 2i & 1 + 4i\\ i & -1 + 2i \end{bmatrix}$, then
$$AB = \begin{bmatrix} 1 - i & i\\ 2 + i & 1 + i \end{bmatrix}\begin{bmatrix} 3 + 2i & 1 + 4i\\ i & -1 + 2i \end{bmatrix} = \begin{bmatrix} 4 - i & 3 + 2i\\ 3 + 8i & -5 + 10i \end{bmatrix}.$$

Notice that in Examples 2.2.13 and 2.2.14 above, the product $BA$ is not defined, since the number of columns of the matrix $B$ does not agree with the number of rows of the matrix $A$.

We can now establish some basic properties of matrix multiplication.

Theorem 2.2.17
If $A$, $B$, and $C$ have appropriate dimensions for the operations to be performed, then
$$\begin{aligned} A(BC) &= (AB)C && \text{(associativity of matrix multiplication)}, && (2.2.4)\\ A(B + C) &= AB + AC && \text{(left distributivity of matrix multiplication)}, && (2.2.5)\\ (A + B)C &= AC + BC && \text{(right distributivity of matrix multiplication)}. && (2.2.6) \end{aligned}$$

Proof The idea behind the proof of each of these results is to use the definition of matrix multiplication to show that the $ij$th element of the matrix on the left-hand side of each equation is equal to the $ij$th element of the matrix on the right-hand side. We illustrate by proving (2.2.6), but we leave the proofs of (2.2.4) and (2.2.5) as exercises. Suppose that $A$ and $B$ are $m \times n$ matrices and that $C$ is an $n \times p$ matrix. Then, from Equation (2.2.3),
$$[(A + B)C]_{ij} = \sum_{k=1}^{n}(a_{ik} + b_{ik})c_{kj} = \sum_{k=1}^{n} a_{ik}c_{kj} + \sum_{k=1}^{n} b_{ik}c_{kj} = (AC)_{ij} + (BC)_{ij} = (AC + BC)_{ij},$$
for $1 \le i \le m$ and $1 \le j \le p$. Consequently, $(A + B)C = AC + BC$.

Theorem 2.2.17 states that matrix multiplication is associative and distributive (over addition). We now consider the question of commutativity of matrix multiplication. If $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix, we can form both of the products $AB$ and $BA$, which are $m \times m$ and $n \times n$, respectively. In the first of these, we say that $B$ has been premultiplied by $A$, whereas in the second, we say that $B$ has been postmultiplied by $A$.
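The three cases above, together with the index form (2.2.3), can be sketched in Python as follows. This is our own minimal illustration (function names hypothetical, matrices stored as lists of rows, vectors as flat lists), not an implementation from the text. `matvec` applies the row-by-row rule (2.2.1), `matvec_by_columns` builds the same vector as the linear combination (2.2.2), and `matmul` uses the index form and rejects operands whose inner dimensions differ. The usage lines form both $AB$ and $BA$ for the $2 \times 3$ matrix $A$ and the $3 \times 2$ matrix $B$ of Example 2.2.10.

```python
Matrix = list[list[float]]
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """a . b = a_1 b_1 + a_2 b_2 + ... + a_n b_n."""
    if len(a) != len(b):
        raise ValueError("the dot product needs two n-vectors")
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, x: Vector) -> Vector:
    """(2.2.1): the i-th element of Ax is the dot product of row i of A with x."""
    return [dot(row, x) for row in A]


def matvec_by_columns(A: Matrix, c: Vector) -> Vector:
    """(2.2.2): Ac = c_1 a_1 + ... + c_n a_n, where a_k is column k of A."""
    if len(A[0]) != len(c):
        raise ValueError("A must have one column per component of c")
    result = [0] * len(A)
    for k, c_k in enumerate(c):      # add c_k times column k of A
        for i in range(len(A)):
            result[i] += c_k * A[i][k]
    return result


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """(2.2.3): c_ij = sum over k of a_ik * b_kj, for A m x n and B n x p."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 4, 2], [3, 5, 7]]       # 2 x 3
B = [[2, 3], [5, -2], [8, 4]]    # 3 x 2
AB, BA = matmul(A, B), matmul(B, A)
print(AB)                                          # [[38, 3], [87, 27]]
print(len(AB), len(AB[0]), len(BA), len(BA[0]))    # 2 2 3 3
```

Writing `matmul` with the index form keeps the dimension rule explicit: the inner dimensions are compared before any arithmetic is done.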
If $m \neq n$, then the matrices $AB$ and $BA$ will have different dimensions, so they cannot be equal. It is important to realize, however, that even if $m = n$, in general (that is, except for special cases) $AB \neq BA$. This is the statement that matrix multiplication is not commutative. With a little bit of thought this should not be too surprising, in view of the fact that the $ij$th element of $AB$ is obtained by taking the matrix product of the $i$th row vector of $A$ with the $j$th column vector of $B$, whereas the $ij$th element of $BA$ is obtained by taking the matrix product of the $i$th row vector of $B$ with the $j$th column vector of $A$. We illustrate with an example.

Example 2.2.18
If $A = \begin{bmatrix} 1 & 2\\ -1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 1\\ 2 & -1 \end{bmatrix}$, find $AB$ and $BA$.

Solution: We have
$$AB = \begin{bmatrix} 1 & 2\\ -1 & 3 \end{bmatrix}\begin{bmatrix} 3 & 1\\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 7 & -1\\ 3 & -4 \end{bmatrix} \quad \text{and} \quad BA = \begin{bmatrix} 3 & 1\\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2\\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 9\\ 3 & 1 \end{bmatrix}.$$
Thus we see that in this example, $AB \neq BA$.

As an exercise, the reader can calculate the matrix $BA$ in Examples 2.2.15 and 2.2.16 and again see that $AB \neq BA$.

For an $n \times n$ matrix $A$ we use the usual power notation to denote the operation of multiplying $A$ by itself. Thus, $A^2 = AA$, $A^3 = AAA$, and so on.

The identity matrix, $I_n$ (or just $I$ if the dimensions are obvious), is the $n \times n$ matrix with ones on the main diagonal and zeros elsewhere. For example,
$$I_2 = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.$$

DEFINITION 2.2.19
The elements of $I_n$ can be represented by the Kronecker delta symbol, $\delta_{ij}$, defined by
$$\delta_{ij} = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \neq j. \end{cases}$$
Then, $I_n = [\delta_{ij}]$.

The following properties of the identity matrix indicate that it plays the same role in matrix multiplication as the number 1 does in the multiplication of real numbers.

Properties of the Identity Matrix:
1. $A_{m \times n}I_n = A_{m \times n}$.
2. $I_mA_{m \times p} = A_{m \times p}$.

Proof We establish property 1 and leave the proof of property 2 as an exercise (Problem 25).
Using the index form of the matrix product, we have
$$(AI)_{ij} = \sum_{k=1}^{n} a_{ik}\delta_{kj} = a_{i1}\delta_{1j} + a_{i2}\delta_{2j} + \cdots + a_{ij}\delta_{jj} + \cdots + a_{in}\delta_{nj}.$$
But, from the definition of the Kronecker delta symbol, we see that all terms in the summation with $k \neq j$ vanish, so that we are left with
$$(AI)_{ij} = a_{ij}\delta_{jj} = a_{ij}, \qquad 1 \le i \le m, \quad 1 \le j \le n.$$

The next example illustrates property 2 of the identity matrix.

Example 2.2.20
If $A = \begin{bmatrix} 2 & -1\\ 3 & 5\\ 0 & -2 \end{bmatrix}$, verify that $I_3A = A$.

Solution: We have
$$I_3A = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & -1\\ 3 & 5\\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 2 & -1\\ 3 & 5\\ 0 & -2 \end{bmatrix} = A.$$

Properties of the Transpose
The operation of taking the transpose of a matrix was introduced in the previous section. The next theorem gives three important properties satisfied by the transpose. These should be memorized.

Theorem 2.2.21
Let $A$ and $C$ be $m \times n$ matrices, and let $B$ be an $n \times p$ matrix. Then
1. $(A^T)^T = A$.
2. $(A + C)^T = A^T + C^T$.
3. $(AB)^T = B^TA^T$.

Proof For all three statements, our strategy is again to show that the $(i, j)$-elements of each side of the equation are the same. We prove statement 3 and leave the proofs of 1 and 2 for the exercises (Problem 24). From the definition of the transpose and the index form of the matrix product, we have
$$\begin{aligned} [(AB)^T]_{ij} &= (AB)_{ji} && \text{(definition of the transpose)}\\ &= \sum_{k=1}^{n} a_{jk}b_{ki} && \text{(index form of the matrix product)}\\ &= \sum_{k=1}^{n} b_{ki}a_{jk} = \sum_{k=1}^{n} b^T_{ik}a^T_{kj} = (B^TA^T)_{ij}. \end{aligned}$$
Consequently, $(AB)^T = B^TA^T$.

Results for Triangular Matrices
Upper and lower triangular matrices play a significant role in the analysis of linear systems of equations. The following theorem and its corollary will be needed in Section 2.7.

Theorem 2.2.22
The product of two lower (upper) triangular matrices is a lower (upper) triangular matrix.

Proof Suppose that $A$ and $B$ are $n \times n$ lower triangular matrices. Then, $a_{ik} = 0$ whenever $i < k$, and $b_{kj} = 0$ whenever $k < j$. If we let $C = AB$, then we must prove that $c_{ij} = 0$ whenever $i < j$.
Using the index form of the matrix product, we have
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \sum_{k=j}^{n} a_{ik}b_{kj} \qquad (\text{since } b_{kj} = 0 \text{ if } k < j). \qquad (2.2.7)$$
We now impose the condition that $i < j$. Then, since $k \ge j$ in (2.2.7), it follows that $k > i$. However, this implies that $a_{ik} = 0$ (since $A$ is lower triangular), and hence, from (2.2.7), that $c_{ij} = 0$ whenever $i < j$, as required.

To establish the result for upper triangular matrices, either we can give an argument similar to that presented above for lower triangular matrices, or we can use the fact that the transpose of a lower triangular matrix is an upper triangular matrix, and vice versa. Hence, if $A$ and $B$ are $n \times n$ upper triangular matrices, then $A^T$ and $B^T$ are lower triangular, and therefore, by what we proved above, $(AB)^T = B^TA^T$ is lower triangular. Thus, $AB$ is upper triangular.

Corollary 2.2.23
The product of two unit lower (upper) triangular matrices is a unit lower (upper) triangular matrix.

Proof Let $A$ and $B$ be unit lower triangular $n \times n$ matrices. We know from Theorem 2.2.22 that $C = AB$ is a lower triangular matrix. We must establish that $c_{ii} = 1$ for each $i$. The elements on the main diagonal of $C$ can be obtained by setting $j = i$ in (2.2.7):
$$c_{ii} = \sum_{k=i}^{n} a_{ik}b_{ki}. \qquad (2.2.8)$$
Since $a_{ik} = 0$ whenever $k > i$, the only nonzero term in the summation in (2.2.8) occurs when $k = i$. Consequently,
$$c_{ii} = a_{ii}b_{ii} = 1 \cdot 1 = 1, \qquad i = 1, 2, \ldots, n.$$
The proof for unit upper triangular matrices is similar and left as an exercise.

The Algebra and Calculus of Matrix Functions
By and large, the algebra of matrix and vector functions is the same as that for matrices and vectors of real or complex numbers. Since vector functions are a special case of matrix functions, we focus here on matrix functions. The main comment here pertains to scalar multiplication.
In the description of scalar multiplication of matrices of numbers, the scalars were required to be real or complex numbers. However, for matrix functions, we can scalar multiply by any scalar function $s(t)$.

Example 2.2.24
If $s(t) = e^t$ and $A(t) = \begin{bmatrix} -2 + t & e^{2t}\\ 4 & \cos t \end{bmatrix}$, then
$$s(t)A(t) = \begin{bmatrix} e^t(-2 + t) & e^{3t}\\ 4e^t & e^t\cos t \end{bmatrix}.$$

Example 2.2.25
Referring to $A$ and $B$ from Example 2.1.14, find $2A - tB^T$.

Solution: We have
$$\begin{aligned} 2A - tB^T &= \begin{bmatrix} 2t^3 & 2t - 2\cos t & 10\\ 2e^{t^2} & 2\ln(t + 1) & 2te^t \end{bmatrix} - \begin{bmatrix} 5t - t^2 + t^3 & -t & 6t\\ t\sin(e^{2t}) & t\tan t & 6t - t^2 \end{bmatrix}\\ &= \begin{bmatrix} t^3 + t^2 - 5t & 3t - 2\cos t & 10 - 6t\\ 2e^{t^2} - t\sin(e^{2t}) & 2\ln(t + 1) - t\tan t & 2te^t + t^2 - 6t \end{bmatrix}. \end{aligned}$$

We can also perform calculus operations on matrix functions. In particular we can differentiate and integrate them. The rules for doing so are as follows:

1. The derivative of a matrix function is obtained by differentiating every element of the matrix. Thus, if $A(t) = [a_{ij}(t)]$, then
$$\frac{dA}{dt} = \left[\frac{da_{ij}}{dt}\right],$$
provided that each of the $a_{ij}$ is differentiable.

2. It follows from (1) and the index form of the matrix product that if $A$ and $B$ are both differentiable and the product $AB$ is defined, then
$$\frac{d}{dt}(AB) = A\frac{dB}{dt} + \frac{dA}{dt}B.$$
The key point to notice is that the order of the multiplication must be preserved.

3. If $A(t) = [a_{ij}(t)]$, where each $a_{ij}(t)$ is integrable on an interval $[a, b]$, then
$$\int_a^b A(t)\,dt = \left[\int_a^b a_{ij}(t)\,dt\right].$$

Example 2.2.26
If $A(t) = \begin{bmatrix} 2t & 1\\ 6t^2 & 4e^{2t} \end{bmatrix}$, determine $dA/dt$ and $\int_0^1 A(t)\,dt$.

Solution: We have
$$\frac{dA}{dt} = \begin{bmatrix} 2 & 0\\ 12t & 8e^{2t} \end{bmatrix},$$
whereas
$$\int_0^1 A(t)\,dt = \begin{bmatrix} \int_0^1 2t\,dt & \int_0^1 1\,dt\\ \int_0^1 6t^2\,dt & \int_0^1 4e^{2t}\,dt \end{bmatrix} = \begin{bmatrix} 1 & 1\\ 2 & 2(e^2 - 1) \end{bmatrix}.$$

Exercises for 2.2

Key Terms
Matrix addition and subtraction, Scalar multiplication, Matrix multiplication, Dot product, Linear combination of column vectors, Index form, Premultiplication, Postmultiplication, Zero matrix, Identity matrix, Kronecker delta symbol.

Skills
• Be able to perform matrix addition, subtraction, and multiplication.
• Know the basic relationships between the dimensions of two matrices $A$ and $B$ in order for $A + B$ to be defined, and in order for $AB$ to be defined.
• Be able to multiply a matrix by a scalar.
• Be able to express the product $Ax$ of a matrix and a column vector as a linear combination of the columns of $A$.
• Be familiar with all of the basic properties of matrix addition, matrix multiplication, scalar multiplication, the zero matrix, the identity matrix, the transpose of a matrix, and lower (upper) triangular matrices.
• Know the basic technique for showing formally that two matrices are equal.
• Be able to perform algebra and calculus operations on matrix functions.

True-False Review
For Questions 1–12, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. For all matrices $A$, $B$, and $C$ of the appropriate dimensions, we have $(AB)C = (CA)B$.
2. If $A$ is an $m \times n$ matrix, $B$ is an $n \times p$ matrix, and $C$ is a $p \times q$ matrix, then $ABC$ is an $m \times q$ matrix.
3. If $A$ and $B$ are symmetric $n \times n$ matrices, then so is $A + B$.
4. If $A$ and $B$ are skew-symmetric $n \times n$ matrices, then $AB$ is a symmetric matrix.
5. For $n \times n$ matrices $A$ and $B$, we have $(A + B)^2 = A^2 + 2AB + B^2$.
6. If $AB = 0$, then either $A = 0$ or $B = 0$.
7. If $A$ and $B$ are square matrices such that $AB$ is upper triangular, then $A$ and $B$ must both be upper triangular.
8. If $A$ is a square matrix such that $A^2 = A$, then $A$ must be the zero matrix or the identity matrix.
9. If $A$ is a matrix of numbers, then if we consider $A$ as a matrix function, its derivative is the zero matrix.
10. If $A$ and $B$ are matrix functions whose product $AB$ is defined, then $\dfrac{d}{dt}(AB) = A\dfrac{dB}{dt} + B\dfrac{dA}{dt}$.
11. If $A$ is an $n \times n$ matrix function such that $A$ and $dA/dt$ are the same function, then $A = ce^tI_n$ for some constant $c$.
12. If $A$ and $B$ are matrix functions whose product $AB$ is defined, then the matrix functions $(AB)^T$ and $B^TA^T$ are the same.

Problems
1. If
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 5 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 4 & 5 \end{bmatrix},$$
find $2A$, $-3B$, $A - 2B$, and $3A + 4B$.
2. If
$$A = \begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 2 & 3 \\ -1 & 1 & 0 \end{bmatrix},$$
find the matrix $D$ such that $2A + B - 3C + 2D = A + 4C$.
3. Let
$$A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -1 & 3 \\ 5 & 1 & 2 \\ 4 & 6 & -2 \end{bmatrix}, \quad C = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad D = \begin{bmatrix} 2 & -2 & 3 \end{bmatrix}.$$
Find, if possible, $AB$, $BC$, $CA$, $DC$, $DB$, $AD$, and $CD$.
For Problems 4–6, determine $AB$ for the given matrices. In these problems $i$ denotes $\sqrt{-1}$.
4. $A = \begin{bmatrix} 2 - i & 1 + i \\ -i & 2 + 4i \end{bmatrix}$, $B = \begin{bmatrix} i & 1 - 3i \\ 0 & 4 + i \end{bmatrix}$.
5. $A = \begin{bmatrix} 3 + 2i & 2 - 4i \\ 5 + i & -1 + 3i \end{bmatrix}$, $B = \begin{bmatrix} -1 + i & 3 + 2i \\ 4 - 3i & 1 + i \end{bmatrix}$.
6. $A = \begin{bmatrix} 3 - 2i & i \\ -i & 1 \end{bmatrix}$, $B = \begin{bmatrix} -1 + i & 2 - i & 0 \\ 1 + 5i & 0 & 3 - 2i \end{bmatrix}$.
7. Let
$$A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ -2 & 3 & 4 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 1 & 5 \\ 4 & -3 \\ -1 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} -3 & 2 \\ 1 & -4 \end{bmatrix}.$$
Find $ABC$ and $CAB$.
8. If
$$A = \begin{bmatrix} 1 & -2 \\ 3 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 2 \\ 5 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 3 \\ -1 \end{bmatrix},$$
find $(2A - 3B)C$.
For Problems 9–11, determine $Ac$ by computing an appropriate linear combination of the column vectors of $A$.
9. $A = \begin{bmatrix} 1 & 3 \\ -5 & 4 \end{bmatrix}$, $c = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$.
10. $A = \begin{bmatrix} 3 & -1 & 4 \\ 2 & 1 & 5 \\ 7 & -6 & 3 \end{bmatrix}$, $c = \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}$.
11. $A = \begin{bmatrix} -1 & 2 \\ 4 & 7 \\ 5 & -4 \end{bmatrix}$, $c = \begin{bmatrix} 5 \\ -1 \end{bmatrix}$.
12. If $A$ is an $m \times n$ matrix and $C$ is an $r \times s$ matrix, what must be the dimensions of $B$ in order for the product $ABC$ to be defined? Write an expression for the $(i, j)$th element of $ABC$ in terms of the elements of $A$, $B$, and $C$.
13. Find $A^2$, $A^3$, and $A^4$ if
(a) $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$.
(b) $A = \begin{bmatrix} 0 & 1 & 0 \\ -2 & 0 & 1 \\ 4 & -1 & 0 \end{bmatrix}$.
14. If $A$ and $B$ are $n \times n$ matrices, prove that
(a) $(A + B)^2 = A^2 + AB + BA + B^2$.
(b) $(A - B)^2 = A^2 - AB - BA + B^2$.
15. If $A = \begin{bmatrix} 3 & -1 \\ -5 & -1 \end{bmatrix}$, calculate $A^2$ and verify that $A$ satisfies $A^2 - 2A - 8I_2 = 0_2$.
16. Find a matrix
$$A = \begin{bmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{bmatrix}$$
such that
$$A^2 + \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} = I_3.$$
17. If $A = \begin{bmatrix} x & 1 \\ -2 & y \end{bmatrix}$, determine all values of $x$ and $y$ for which $A^2 = A$.
18. The Pauli spin matrices $\sigma_1$, $\sigma_2$, and $\sigma_3$ are defined by
$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
Verify that they satisfy
$$\sigma_1\sigma_2 = i\sigma_3, \quad \sigma_2\sigma_3 = i\sigma_1, \quad \sigma_3\sigma_1 = i\sigma_2.$$

If $A$ and $B$ are $n \times n$ matrices, we define their commutator, denoted $[A, B]$, by $[A, B] = AB - BA$. Thus, $[A, B] = 0$ if and only if $A$ and $B$ commute, that is, $AB = BA$. Problems 19–22 require the commutator.
19. If $A = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}$, find $[A, B]$.
20. If
$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},$$
compute all of the commutators $[A_i, A_j]$, and determine which of the matrices commute.
21. If
$$A_1 = \frac{1}{2}\begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}, \quad A_2 = \frac{1}{2}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad A_3 = \frac{1}{2}\begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix},$$
verify that $[A_1, A_2] = A_3$, $[A_2, A_3] = A_1$, and $[A_3, A_1] = A_2$.
22. If $A$, $B$, and $C$ are $n \times n$ matrices, find $[A, [B, C]]$ and prove the Jacobi identity
$$[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0.$$
23. Use the index form of the matrix product to prove properties (2.2.4) and (2.2.5).
24. Prove parts 1 and 2 of Theorem 2.2.21.
25. Prove property 2 of the identity matrix.
26. If $A$ and $B$ are $n \times n$ matrices, prove that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
27. If
$$A = \begin{bmatrix} 1 & -1 & 1 & 4 \\ 2 & 0 & 2 & -3 \\ 3 & 4 & -1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & -1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 1 \end{bmatrix},$$
find $A^T$, $B^T$, $AA^T$, $AB$, and $B^TA^T$.
28. Let $A = \begin{bmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{bmatrix}$, and let $S$ be the matrix with column vectors
$$s_1 = \begin{bmatrix} -x \\ 0 \\ x \end{bmatrix}, \quad s_2 = \begin{bmatrix} -y \\ y \\ -y \end{bmatrix}, \quad s_3 = \begin{bmatrix} z \\ 2z \\ z \end{bmatrix},$$
where $x, y, z$ are constants.
(a) Show that $AS = [s_1, s_2, 7s_3]$.
(b) Find all values of $x, y, z$ such that $S^TAS = \operatorname{diag}(1, 1, 7)$.
29. A matrix that is a multiple of $I_n$ is called an $n \times n$ scalar matrix.
(a) Determine the $4 \times 4$ scalar matrix whose trace is 8.
(b) Determine the $3 \times 3$ scalar matrix such that the product of the elements on the main diagonal is 343.
30. Prove that for each positive integer $n$, there is a unique scalar matrix whose trace is a given constant $k$.

If $A$ is an $n \times n$ matrix, then the matrices $S$ and $T$ defined by
$$S = \tfrac{1}{2}(A + A^T), \qquad T = \tfrac{1}{2}(A - A^T)$$
are referred to as the symmetric and skew-symmetric parts of $A$, respectively. Problems 31–34 investigate properties of $S$ and $T$.
31. Use the properties of the transpose to show that $S$ and $T$ are symmetric and skew-symmetric, respectively.
32. Find $S$ and $T$ for the matrix
$$A = \begin{bmatrix} 1 & -5 & 3 \\ 3 & 2 & 4 \\ 7 & -2 & 6 \end{bmatrix}.$$
33. If $A$ is an $n \times n$ symmetric matrix, show that $T = 0$. What is the corresponding result for skew-symmetric matrices?
34. Show that every $n \times n$ matrix can be written as the sum of a symmetric and a skew-symmetric matrix.
35. Prove that if $A$ is an $n \times p$ matrix and $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$, then $DA$ is the matrix obtained by multiplying the $i$th row vector of $A$ by $d_i$ ($1 \le i \le n$).
36. Use the properties of the transpose to prove that
(a) $AA^T$ is a symmetric matrix.
(b) $(ABC)^T = C^TB^TA^T$.

For Problems 37–40, determine the derivative of the given matrix function.
37. $A(t) = \begin{bmatrix} e^{-2t} \\ \sin t \end{bmatrix}$.
38. $A(t) = \begin{bmatrix} t & \sin t \\ \cos t & 4t \end{bmatrix}$.
39. $A(t) = \begin{bmatrix} e^t & e^{2t} & t^2 \\ 2e^t & 4e^{2t} & 5t^2 \end{bmatrix}$.
40. $A(t) = \begin{bmatrix} \sin t & \cos t & 0 \\ -\cos t & \sin t & t \\ 0 & 3t & 1 \end{bmatrix}$.
41. Let $A = [a_{ij}(t)]$ be an $m \times n$ matrix function and let $B = [b_{ij}(t)]$ be an $n \times p$ matrix function. Use the definition of matrix multiplication to prove that
$$\frac{d}{dt}(AB) = A\frac{dB}{dt} + \frac{dA}{dt}B.$$

For Problems 42–45, determine $\int_a^b A(t)\,dt$ for the given matrix function.
42. $A(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}$, $a = 0$, $b = \pi/2$.
43. $A(t) = \begin{bmatrix} e^t & e^{-t} \\ 2e^t & 5e^{-t} \end{bmatrix}$, $a = 0$, $b = 1$.
44. A(t) = sin 2t e 2t −5 tet 2 t 3t − sin t sec t2, $a = 0$, $b = 1$.
45. The matrix function $A(t)$ in Problem 39, with $a = 0$ and $b = 1$.

Integration of matrix functions was presented in the text using definite integrals, but one can compute indefinite integrals of matrix functions in the same way, by evaluating the indefinite integral of each element of the matrix function. In Problems 46–49, evaluate the indefinite integral $\int A(t)\,dt$ for the given matrix function. You may assume that the constants of all indefinite integrations are zero.
46. $A(t) = \begin{bmatrix} 2t \\ 3t^2 \end{bmatrix}$.
47. The matrix function $A(t)$ in Problem 40.
48. The matrix function $A(t)$ in Problem 43.
49.
The matrix function $A(t)$ in Problem 44.

2.3 Terminology for Systems of Linear Equations

As we mentioned in Section 2.1, a main aim of this chapter is to apply matrices to determine the solution properties of any system of linear equations. We are now in a position to pursue that aim. We begin by introducing some notation and terminology.

DEFINITION 2.3.1 The general $m \times n$ system of linear equations is of the form
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned} \tag{2.3.1}$$
where the system coefficients $a_{ij}$ and the system constants $b_i$ are given scalars and $x_1, x_2, \ldots, x_n$ denote the unknowns in the system. If $b_i = 0$ for all $i$, then the system is called homogeneous; otherwise it is called nonhomogeneous.

DEFINITION 2.3.2 By a solution to the system (2.3.1) we mean an ordered $n$-tuple of scalars $(c_1, c_2, \ldots, c_n)$ which, when substituted for $x_1, x_2, \ldots, x_n$ into the left-hand side of system (2.3.1), yields the values on the right-hand side. The set of all solutions to system (2.3.1) is called the solution set to the system.

Remarks
1. Usually the $a_{ij}$ and $b_i$ will be real numbers, and we will then be interested in determining only the real solutions to system (2.3.1). However, many of the problems that arise in later chapters will require the solution of systems with complex coefficients, in which case the corresponding solutions will also be complex.
2. If $(c_1, c_2, \ldots, c_n)$ is a solution to the system (2.3.1), we will sometimes specify this solution by writing $x_1 = c_1, x_2 = c_2, \ldots, x_n = c_n$. For example, the ordered pair of numbers $(1, 2)$ is a solution to the system
$$x_1 + x_2 = 3, \qquad 3x_1 - 2x_2 = -1,$$
and we could express this solution in the equivalent form $x_1 = 1$, $x_2 = 2$.
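Definition 2.3.2 amounts to a mechanical check: substitute the candidate values into each left-hand side and compare with the corresponding right-hand side. The Python sketch below illustrates that check on the system from Remark 2. The function name `is_solution` and the list-of-rows storage of the coefficients $a_{ij}$ are assumptions made for this illustration, not notation from the text.

```python
def is_solution(coeffs, constants, candidate):
    """Check Definition 2.3.2: does `candidate` satisfy every equation?

    coeffs[i][j] holds a_ij (equation i, unknown j, counting from 0),
    constants[i] holds b_i, and candidate holds (c_1, ..., c_n).
    """
    for row in coeffs:
        if len(row) != len(candidate):
            raise ValueError("candidate must supply one value per unknown")
    for row, b_i in zip(coeffs, constants):
        # Left-hand side a_i1*c_1 + ... + a_in*c_n of this equation.
        lhs = sum(a_ij * c_j for a_ij, c_j in zip(row, candidate))
        if lhs != b_i:
            return False
    return True


# The system x1 + x2 = 3, 3x1 - 2x2 = -1 from Remark 2.
coeffs = [[1, 1], [3, -2]]
constants = [3, -1]
print(is_solution(coeffs, constants, (1, 2)))  # True
print(is_solution(coeffs, constants, (2, 1)))  # False: 3(2) - 2(1) = 4, not -1
```

With integer (or `fractions.Fraction`) entries the comparison is exact; with floating-point data one would instead compare each left-hand side with $b_i$ up to a small tolerance.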
At this point, we pause to introduce some important notation that will be used frequently throughout the remainder of the text.

Notation 2.3.3 The set of all ordered $n$-tuples of real numbers $(c_1, c_2, \ldots, c_n)$ will be denoted by $\mathbb{R}^n$. Therefore, the set of all real solutions to the linear system (2.3.1) forms a subset of $\mathbb{R}^n$. In like manner, the set of all ordered $n$-tuples of complex numbers will be denoted by $\mathbb{C}^n$, and the solution set for a linear system (2.3.1) containing complex coefficients can be viewed as a subset of $\mathbb{C}^n$.

Notice that when we restrict all scalar values to be real, we have a natural correspondence between elements of $\mathbb{R}^n$, row $n$-vectors, and column $n$-vectors:
$$(x_1, x_2, \ldots, x_n) \longleftrightarrow [x_1\ x_2\ \ldots\ x_n] \longleftrightarrow \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Therefore, we may use the operations of addition, subtraction, and scalar multiplication of row $n$-vectors and column $n$-vectors to equip $\mathbb{R}^n$ with these same operations: just as we can perform addition and scalar multiplication of row or column vectors, so too can we perform these operations on $n$-tuples of scalars. In fact, we will often treat ordered $n$-tuples of scalars, row $n$-vectors, and column $n$-vectors as different representations of the same basic object. Of course, if we allow all scalars in question to assume complex values, then the correspondence is between elements of $\mathbb{C}^n$, row $n$-vectors, and column $n$-vectors. We will have much more to say about the sets $\mathbb{R}^n$ and $\mathbb{C}^n$ in Chapter 4.

Returning to the general discussion of system (2.3.1), we will consider some fundamental questions:
1. Does the system (2.3.1) have a solution?
2. If the answer to question 1 is yes, then how many solutions are there?
3. How do we determine all of the solutions?
To obtain an idea of the answer to questions 1 and 2, consider the special case of a system of three equations in three unknowns.
The linear system (2.3.1) then reduces to
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2, \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3, \end{aligned}$$
which can be interpreted as defining three planes in space. An ordered triple $(c_1, c_2, c_3)$ is a solution to this system if and only if it gives the coordinates of a point common to all three planes. There are precisely four possibilities:
1. The planes have no common point of intersection.
2. The planes intersect in just one point.
3. The planes intersect in a line.
4. The planes are all identical.
In case 1, the corresponding system has no solution, whereas in case 2, the system has just one solution. Finally, in cases 3 and 4, every point on the line or plane (respectively) is a solution to the linear system, and hence the system has an infinite number of solutions. Cases 1, 2, and 3 are illustrated in Figure 2.3.1.

[Figure 2.3.1: Possible intersection points for three planes in space. Panels: three parallel planes (no intersection): no solution; no common intersection: no solution; planes intersect at a point: a unique solution; planes intersect in a line: an infinite number of solutions.]

We have therefore shown geometrically that there are precisely three possibilities for the solutions of a system of three equations in three unknowns: the system has no solution, exactly one solution, or an infinite number of solutions. In Section 2.5, we will establish that these are the only possibilities for the general $m \times n$ system (2.3.1).

DEFINITION 2.3.4 A system of equations that has at least one solution is said to be consistent, whereas a system that has no solution is called inconsistent.

Our problem will be to determine whether a given system is consistent and then, if it is, to find its solution set.

DEFINITION 2.3.5 Naturally associated with the system (2.3.1) are the following two matrices:
1. The matrix of coefficients
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
2. The augmented matrix
$$A^{\#} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}.$$

The augmented matrix completely characterizes a system of equations, since it contains all of the system coefficients and system constants. We will see in subsequent sections that the relationship between $A$ and $A^{\#}$ determines the solution properties of a linear system. Notice that the matrix of coefficients consists of the first $n$ columns of $A^{\#}$.

Example 2.3.6 Write the system of equations with the following augmented matrix:
$$\begin{bmatrix} 1 & 2 & 9 & -1 & 1 \\ 2 & -3 & 7 & 4 & 2 \\ 1 & 3 & 5 & 0 & -1 \end{bmatrix}.$$

Solution: The appropriate system is
$$\begin{aligned} x_1 + 2x_2 + 9x_3 - x_4 &= 1, \\ 2x_1 - 3x_2 + 7x_3 + 4x_4 &= 2, \\ x_1 + 3x_2 + 5x_3 &= -1. \end{aligned}$$

Vector Formulation

We next show that the matrix product described in the preceding section can be used to write a linear system as a single equation involving the matrix of coefficients and column vectors. For example, the system
$$x_1 + 3x_2 - 4x_3 = 1, \qquad 2x_1 + 5x_2 - x_3 = 5, \qquad x_1 + 6x_3 = 3$$
can be written as the vector equation
$$\begin{bmatrix} 1 & 3 & -4 \\ 2 & 5 & -1 \\ 1 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix},$$
since this vector equation is satisfied if and only if
$$\begin{bmatrix} x_1 + 3x_2 - 4x_3 \\ 2x_1 + 5x_2 - x_3 \\ x_1 + 6x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix};$$
that is, if and only if each equation of the given system is satisfied. Similarly, the general $m \times n$ system of linear equations (2.3.1) can be written as the vector equation
$$Ax = b,$$
where $A$ is the $m \times n$ matrix of coefficients and
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$
We will refer to the column $n$-vector $x$ as the vector of unknowns, and to the column $m$-vector $b$ as the right-hand-side vector.
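The bookkeeping in Definition 2.3.5 and in the vector formulation can be mirrored in a few lines of Python. This is a sketch under stated assumptions: matrices are stored as lists of rows, and the helper names `augment`, `split_augmented`, and `mat_vec` are hypothetical. The candidate solution used at the end was computed by hand for this illustration; it is not given in the text.

```python
from fractions import Fraction


def augment(A, b):
    """Form the augmented matrix A# by appending b_i to row i of A."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [list(row) + [b_i] for row, b_i in zip(A, b)]


def split_augmented(A_sharp):
    """Recover (A, b): A is the first n columns of A#, b is its last column."""
    return [row[:-1] for row in A_sharp], [row[-1] for row in A_sharp]


def mat_vec(A, x):
    """Compute the column vector Ax, one dot product per row of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# The system x1 + 3x2 - 4x3 = 1, 2x1 + 5x2 - x3 = 5, x1 + 6x3 = 3.
A = [[1, 3, -4], [2, 5, -1], [1, 0, 6]]
b = [1, 5, 3]
print(augment(A, b))  # [[1, 3, -4, 1], [2, 5, -1, 5], [1, 0, 6, 3]]

# Ax = b holds exactly when every equation of the system holds.
x = [Fraction(-9, 11), Fraction(16, 11), Fraction(7, 11)]  # solved by hand for this sketch
print(mat_vec(A, x) == b)  # True
```

Going the other way, `split_augmented` applied to the augmented matrix of Example 2.3.6 returns its $3 \times 4$ matrix of coefficients and the right-hand-side vector $(1, 2, -1)$, from which the system can be written down row by row.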
Assuming that all elements in the system are real, we can view $b$ as an element of $\mathbb{R}^m$ and $x$ as an element of $\mathbb{R}^n$. We denote these statements by $b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$, respectively.³ Therefore, the set of all real solutions to the system $Ax = b$ is
$$S = \{x \in \mathbb{R}^n : Ax = b\},$$
which is a subset of $\mathbb{R}^n$.

³ The symbol $\in$ is the set-theoretic notation declaring membership in a set; it will often be encountered in the text.

Example 2.3.7 It can be shown, using the techniques of the next two sections, that the solution set of the linear system
$$\begin{aligned} x_1 + x_2 + 2x_3 - x_4 &= 0, \\ 3x_1 - 2x_2 + x_3 + 2x_4 &= 0, \\ 5x_1 + 3x_2 + 3x_3 - 2x_4 &= 0 \end{aligned}$$
is the subset of $\mathbb{R}^4$ defined by $S = \{(-t, 4t, t, 5t) : t \in \mathbb{R}\}$.

A similar vector formulation for systems of differential equations can be used not only in developing the theory for such systems, but also in deriving solution techniques. As an example of this formulation, consider the system of differential equations
$$\frac{dx_1}{dt} = 3tx_1 + 9x_2 + 6e^t, \qquad \frac{dx_2}{dt} = 2x_1 - 7x_2 + 3e^t.$$
Using matrix and vector functions, this system can be written as the vector equation
$$\frac{dx}{dt} = A(t)x(t) + b(t),$$
where
$$x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}, \quad \frac{dx}{dt} = \begin{bmatrix} dx_1/dt \\ dx_2/dt \end{bmatrix}, \quad A(t) = \begin{bmatrix} 3t & 9 \\ 2 & -7 \end{bmatrix}, \quad b(t) = \begin{bmatrix} 6e^t \\ 3e^t \end{bmatrix}.$$
In this formulation, the basic unknown is the column 2-vector function $x(t)$.

Example 2.3.8 Give the vector formulation for the system of equations
$$x_1' = 3x_1 + (\sin t)x_2 + e^t, \qquad x_2' = 7tx_1 + t^2x_2 - 4e^{-t}.$$

Solution: We have
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} 3 & \sin t \\ 7t & t^2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} e^t \\ -4e^{-t} \end{bmatrix}.$$
That is, $x'(t) = A(t)x(t) + b(t)$, where
$$x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} 3 & \sin t \\ 7t & t^2 \end{bmatrix}, \quad b(t) = \begin{bmatrix} e^t \\ -4e^{-t} \end{bmatrix}.$$

Exercises for 2.3

Key Terms
System coefficients, System constants, Homogeneous system, Nonhomogeneous system, Solution, Solution set, Consistent system, Inconsistent system, Matrix of coefficients, Augmented matrix, Vector of unknowns, Right-hand-side vector.

Skills
• Be able to write a linear system of equations as a vector equation, and identify the matrix of coefficients, the right-hand-side vector, and the augmented matrix.
• Given a matrix of coefficients and a right-hand-side vector, or an augmented matrix, be able to write the corresponding linear system.
• Understand the geometric difference between a consistent linear system and an inconsistent one.
• Be able to verify that the components of a given vector provide a solution to a linear system.
• Be able to give the vector formulation for a system of differential equations.

True-False Review
For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. If a linear system of equations has an $m \times n$ augmented matrix, then the system has $m$ equations and $n$ unknowns.
2. A linear system that contains three distinct planes can have at most one solution.
3. If the matrix of coefficients of a linear system is an $m \times n$ matrix, then the right-hand-side vector must have $n$ components.
4. It is impossible for a linear system of equations to have exactly two solutions.
5. If a linear system has an $m \times n$ coefficient matrix, then the augmented matrix for the linear system is $m \times (n + 1)$.
6. If $A$ is an $n \times n$ matrix, then the linear systems $Ax = 0$ and $A^Tx = 0$ have the same solution set.

Problems
For Problems 1–2, verify that the given triple of real numbers is a solution to the given system.
1. $(1, -1, 2)$; $2x_1 - 3x_2 + 4x_3 = 13$, $x_1 + x_2 - x_3 = -2$, $5x_1 + 4x_2 + x_3 = 3$.
2. $(2, -3, 1)$;
$$\begin{aligned} x_1 + x_2 - 2x_3 &= -3, \\ 3x_1 - x_2 - 7x_3 &= 2, \\ x_1 + x_2 + x_3 &= 0, \\ 2x_1 + 2x_2 - 4x_3 &= -6. \end{aligned}$$
3. Verify that for all values of $t$, $(1 - t, 2 + 3t, 3 - 2t)$ is a solution to the linear system
$$x_1 + x_2 + x_3 = 6, \qquad x_1 - x_2 - 2x_3 = -7, \qquad 5x_1 + x_2 - x_3 = 4.$$
4. Verify that for all values of $s$ and $t$, $(s, s - 2t, 2s + 3t, t)$ is a solution to the linear system
$$\begin{aligned} x_1 + x_2 - x_3 + 5x_4 &= 0, \\ 2x_2 - x_3 + 7x_4 &= 0, \\ 4x_1 + 2x_2 - 3x_3 + 13x_4 &= 0. \end{aligned}$$
5. By making a sketch in the $xy$-plane, prove that the following linear system has no solution:
$$2x + 3y = 1, \qquad 2x + 3y = 2.$$
For Problems 6–8, determine the coefficient matrix $A$, the right-hand-side vector $b$, and the augmented matrix $A^{\#}$ of the given system.
6. $x_1 + 2x_2 - 3x_3 = 1$, $2x_1 + 4x_2 - 5x_3 = 2$, $7x_1 + 2x_2 - x_3 = 3$.
7. $x + y + z - w = 3$, $2x + 4y - 3z + 7w = 2$.
8. $x_1 + 2x_2 - x_3 = 0$, $2x_1 + 3x_2 - 2x_3 = 0$, $5x_1 + 6x_2 - 5x_3 = 0$.
For Problems 9–10, write the system of equations with the given coefficient matrix and right-hand-side vector.
9. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 1 & 1 & -2 & 6 \\ 3 & 1 & 4 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$.
10. $A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & -1 & 2 \\ 7 & 6 & 3 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}$.
11. Consider the $m \times n$ homogeneous system of linear equations
$$Ax = 0. \tag{2.3.2}$$
(a) If $x = [x_1\ x_2\ \ldots\ x_n]^T$ and $y = [y_1\ y_2\ \ldots\ y_n]^T$ are solutions to (2.3.2), show that $z = x + y$ and $w = cx$ are also solutions, where $c$ is an arbitrary scalar.
(b) Is the result of (a) true when $x$ and $y$ are solutions to the nonhomogeneous system $Ax = b$? Explain.
For Problems 12–15, write the vector formulation for the given system of differential equations.
12. $x_1' = -4x_1 + 3x_2 + 4t$, $x_2' = 6x_1 - 4x_2 + t^2$.
13. $x_1' = t^2x_1 - tx_2$, $x_2' = (-\sin t)x_1 + x_2$.
14. $x_1' = e^{2t}x_2$, $x_2' + (\sin t)x_1 = 1$.
15. $x_1' = (-\sin t)x_2 + x_3 + t$, $x_2' = -e^tx_1 + t^2x_3 + t^3$, $x_3' = -tx_1 + t^2x_2 + 1$.
For Problems 16–17, verify that the given vector function $x$ defines a solution to $x' = Ax + b$ for the given $A$ and $b$.
16. $x(t) = \begin{bmatrix} e^{4t} \\ -2e^{4t} \end{bmatrix}$, $A = \begin{bmatrix} 2 & -1 \\ -2 & 3 \end{bmatrix}$, $b(t) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.
17. $x(t) = \begin{bmatrix} 4e^{-2t} + 2\sin t \\ 3e^{-2t} - \cos t \end{bmatrix}$, $A = \begin{bmatrix} 1 & -4 \\ -3 & 2 \end{bmatrix}$, $b(t) = \begin{bmatrix} -2(\cos t + \sin t) \\ 7\sin t + 2\cos t \end{bmatrix}$.

2.4 Elementary Row Operations and Row-Echelon Matrices

In the next section we will develop methods for solving a system of linear equations. These methods will consist of reducing a given system of equations to a new system that has the same solution set as the given system but is easier to solve. In this section we introduce the requisite mathematical results.

Elementary Row Operations

The first step in deriving systematic procedures for solving a linear system is to determine what operations can be performed on such a system without altering its solution set.

Example 2.4.1 Consider the system of equations
$$\begin{aligned} x_1 + 2x_2 + 4x_3 &= 2, && (2.4.1) \\ 2x_1 - 5x_2 + 3x_3 &= 6, && (2.4.2) \\ 4x_1 + 6x_2 - 7x_3 &= 8. && (2.4.3) \end{aligned}$$

Solution: If we permute (i.e., interchange), say, Equations (2.4.1) and (2.4.2), the resulting system is
$$2x_1 - 5x_2 + 3x_3 = 6, \qquad x_1 + 2x_2 + 4x_3 = 2, \qquad 4x_1 + 6x_2 - 7x_3 = 8,$$
which certainly has the same solution set as the original system. Returning to the original system, if we multiply, say, Equation (2.4.2) by 5, we obtain the system
$$x_1 + 2x_2 + 4x_3 = 2, \qquad 10x_1 - 25x_2 + 15x_3 = 30, \qquad 4x_1 + 6x_2 - 7x_3 = 8,$$
which again has the same solution set as the original system. Finally, if we add, say, twice Equation (2.4.1) to Equation (2.4.3), we obtain the system
$$\begin{aligned} x_1 + 2x_2 + 4x_3 &= 2, && (2.4.4) \\ 2x_1 - 5x_2 + 3x_3 &= 6, && (2.4.5) \\ (4x_1 + 6x_2 - 7x_3) + 2(x_1 + 2x_2 + 4x_3) &= 8 + 2(2). && (2.4.6) \end{aligned}$$
We can verify that if (2.4.4)–(2.4.6) are satisfied, then so are (2.4.1)–(2.4.3), and vice versa. It follows that the system of equations (2.4.4)–(2.4.6) has the same solution set as the original system of equations (2.4.1)–(2.4.3).

More generally, similar reasoning can be used to show that the following three operations can be performed on any $m \times n$ system of linear equations without altering the solution set:
1. Permute equations.
2. Multiply an equation by a nonzero constant.
3.
Add a multiple of one equation to another equation.

Since these operations involve changes only in the system coefficients and constants (and not changes in the variables), they can be represented by the following operations on the rows of the augmented matrix of the system:
1. Permute rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of one row to another row.

These three operations, called elementary row operations, will be a basic computational tool throughout the text, even in cases when the matrix under consideration is not derived from a system of linear equations. The following notation will be used to describe elementary row operations performed on a matrix $A$.
1. $P_{ij}$: Permute the $i$th and $j$th rows in $A$.
2. $M_i(k)$: Multiply every element of the $i$th row of $A$ by a nonzero scalar $k$.
3. $A_{ij}(k)$: Add to the elements of the $j$th row of $A$ the scalar $k$ times the corresponding elements of the $i$th row of $A$.
Furthermore, the notation $A \sim B$ will mean that matrix $B$ has been obtained from matrix $A$ by a sequence of elementary row operations. To reference a particular elementary row operation used in, say, the $n$th step of the sequence of elementary row operations, we will write $A \overset{n}{\sim} B$.

Example 2.4.2 The one-step operations performed on the system in Example 2.4.1 can be described as follows using elementary row operations on the augmented matrix of the system:
$$\begin{bmatrix} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 2 & -5 & 3 & 6 \\ 1 & 2 & 4 & 2 \\ 4 & 6 & -7 & 8 \end{bmatrix}$$
1. $P_{12}$: Permute (2.4.1) and (2.4.2).
$$\begin{bmatrix} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 2 & 4 & 2 \\ 10 & -25 & 15 & 30 \\ 4 & 6 & -7 & 8 \end{bmatrix}$$
1. $M_2(5)$: Multiply (2.4.2) by 5.
$$\begin{bmatrix} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 4 & 6 & -7 & 8 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 2 & 4 & 2 \\ 2 & -5 & 3 & 6 \\ 6 & 10 & 1 & 12 \end{bmatrix}$$
1. $A_{13}(2)$: Add 2 times (2.4.1) to (2.4.3).

It is important to realize that each elementary row operation is reversible: we can "undo" a given elementary row operation by another elementary row operation to bring the modified linear system back into its original form. Specifically, in terms of the notation introduced above, the reverse operations are determined as follows (ERO refers here to "elementary row operation"):
ERO applied to $A$ ($A \sim B$), followed by the reverse ERO applied to $B$ ($B \sim A$):
• $P_{ij}$; reverse $P_{ji}$: Permute rows $j$ and $i$ in $B$.
• $M_i(k)$; reverse $M_i(1/k)$: Multiply the $i$th row of $B$ by $1/k$.
• $A_{ij}(k)$; reverse $A_{ij}(-k)$: Add to the elements of the $j$th row of $B$ the scalar $-k$ times the corresponding elements of the $i$th row of $B$.

We introduce a special term for matrices that are related via elementary row operations.

DEFINITION 2.4.3 Let $A$ be an $m \times n$ matrix. Any matrix obtained from $A$ by a finite sequence of elementary row operations is said to be row-equivalent to $A$.

Thus, all of the matrices in the previous example are row-equivalent. Since elementary row operations do not alter the solution set of a linear system, we have the next theorem.

Theorem 2.4.4 Systems of linear equations with row-equivalent augmented matrices have the same solution sets.

Row-Echelon Matrices

Our methods for solving a system of linear equations will consist of using elementary row operations to reduce the augmented matrix of the given system to a simple form. But how simple a form should we aim for? To answer this question, consider the system
$$\begin{aligned} x_1 + x_2 - x_3 &= 4, && (2.4.7) \\ x_2 - 3x_3 &= 5, && (2.4.8) \\ x_3 &= 2. && (2.4.9) \end{aligned}$$
This system can be solved most easily as follows. From Equation (2.4.9), $x_3 = 2$. Substituting this value into Equation (2.4.8) and solving for $x_2$ yields $x_2 = 5 + 3(2) = 11$. Finally, substituting for $x_3$ and $x_2$ into Equation (2.4.7) and solving for $x_1$, we obtain $x_1 = 4 - 11 + 2 = -5$.
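The substitution procedure just carried out (solve the last equation first, then work upward) can be sketched in Python. The sketch assumes a square coefficient matrix that is upper triangular with nonzero diagonal entries, as in (2.4.7)–(2.4.9); the function name `back_substitute` is hypothetical, and exact `Fraction` arithmetic is a choice made here to avoid round-off.

```python
from fractions import Fraction


def back_substitute(U, c):
    """Solve Ux = c for square upper triangular U with nonzero diagonal.

    The last equation gives x_n directly; each earlier equation is then
    solved for its diagonal unknown after substituting the values found.
    """
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Sum of the already-known terms u_ij * x_j with j > i.
        known = sum(Fraction(U[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(c[i]) - known) / U[i][i]
    return x


# System (2.4.7)-(2.4.9): x1 + x2 - x3 = 4, x2 - 3x3 = 5, x3 = 2.
U = [[1, 1, -1], [0, 1, -3], [0, 0, 1]]
c = [4, 5, 2]
print(back_substitute(U, c) == [-5, 11, 2])  # True
```

A zero on the diagonal makes the division fail with `ZeroDivisionError`; handling such systems is exactly what the row-echelon form introduced next is for.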
Thus, the solution to the given system of equations is $(-5, 11, 2)$, a single vector in $\mathbb{R}^3$. This technique is called back substitution; it applies here because the given system has a simple form. The augmented matrix of the system is
$$\begin{bmatrix} 1 & 1 & -1 & 4 \\ 0 & 1 & -3 & 5 \\ 0 & 0 & 1 & 2 \end{bmatrix}.$$
We see that the submatrix consisting of the first three columns (which corresponds to the matrix of coefficients) is an upper triangular matrix with the leftmost nonzero entry in each row equal to 1. The back-substitution method will work on any system of linear equations with an augmented matrix of this form. Unfortunately, not all systems of equations have augmented matrices that can be reduced to such a form. However, there is a simple type of matrix to which any matrix can be reduced by elementary row operations, and which also represents a system of equations that can be solved (if it has a solution) by back substitution. This is called a row-echelon matrix and is defined as follows:

DEFINITION 2.4.5 An $m \times n$ matrix is called a row-echelon matrix if it satisfies the following three conditions:
1. If there are any rows consisting entirely of zeros, they are grouped together at the bottom of the matrix.
2. The first nonzero element in any nonzero row⁴ is a 1 (called a leading 1).
3. The leading 1 of any row below the first row is to the right of the leading 1 of the row above it.

⁴ A nonzero row (nonzero column) is any row (column) that does not consist entirely of zeros.

Example 2.4.6 Examples of row-echelon matrices are 1 −2 3 7 0 1 5 0 , 0 001 001 0 0 0 , 000 whereas 1 0 −1 0 1 2 0 1 −1 and and 1 0 0 0 1 −1 6 5 9 0 0 1 2 5 0 0 0 1 0 , 0 0000 00 0 0 1 −1 01 are not row-echelon matrices.

The basic result that will allow us to determine the solution set to any system of linear equations is stated in the next theorem.

Theorem 2.4.7 Any matrix is row-equivalent to a row-echelon matrix.

According to this theorem, by applying an appropriate sequence of elementary row operations to any $m \times n$ matrix, we can always reduce it to a row-echelon matrix. When a matrix $A$ has been reduced to a row-echelon matrix in this way, we say that it has been reduced to row-echelon form and refer to the resulting matrix as a row-echelon form of $A$. The proof of Theorem 2.4.7 consists of giving an algorithm that will reduce an arbitrary $m \times n$ matrix to a row-echelon matrix after a finite sequence of elementary row operations. Before presenting such an algorithm, we first illustrate the result with an example.

Example 2.4.8 Use elementary row operations to reduce
$$\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -1 & 2 & 1 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix}$$
to row-echelon form.

Solution: We show each step in detail.

Step 1: Put a leading 1 in the (1, 1) position. This is most easily accomplished by permuting rows 1 and 2.
$$\begin{bmatrix} 2 & 1 & -1 & 3 \\ 1 & -1 & 2 & 1 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 2 & 1 & -1 & 3 \\ -4 & 6 & -7 & 1 \\ 2 & 0 & 1 & 3 \end{bmatrix}$$

Step 2: Use the leading 1 to put zeros beneath it in column 1. This is accomplished by adding appropriate multiples of row 1 to the remaining rows. Step 2 row operations: Add −2 times row 1 to row 2. Add 4 times row 1 to row 3. Add −2 times row 1 to row 4.
$$\overset{2}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 3 & -5 & 1 \\ 0 & 2 & 1 & 5 \\ 0 & 2 & -3 & 1 \end{bmatrix}$$

Step 3: Put a leading 1 in the (2, 2) position. We could accomplish this by multiplying row 2 by 1/3. However, this would introduce fractions into the matrix and thereby complicate the remaining computations. In hand calculations, fewer algebraic errors result if we avoid the use of fractions. In this case, we can obtain a leading 1 without the use of fractions by adding −1 times row 3 to row 2. Step 3 row operation: Add −1 times row 3 to row 2.
$$\overset{3}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 2 & 1 & 5 \\ 0 & 2 & -3 & 1 \end{bmatrix}$$

Step 4: Use the leading 1 in the (2, 2) position to put zeros beneath it in column 2. We now add appropriate multiples of row 2 to the rows beneath it.
For row-echelon form, however, we need not be concerned about the row above it. Step 4 row operations: Add −2 times row 2 to row 3. Add −2 times row 2 to row 4.
$$\overset{4}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 13 & 13 \\ 0 & 0 & 9 & 9 \end{bmatrix}$$

Step 5: Put a leading 1 in the (3, 3) position. This can be accomplished by multiplying row 3 by 1/13.
$$\overset{5}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 9 & 9 \end{bmatrix}$$

Step 6: Use the leading 1 in the (3, 3) position to put zeros beneath it in column 3. The appropriate row operation is to add −9 times row 3 to row 4.
$$\overset{6}{\sim} \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -6 & -4 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

This is a row-echelon matrix; hence the given matrix has been reduced to row-echelon form.

The specific operations used at each step are given next, using the notation introduced previously in this section. In future examples, we will simply indicate briefly the elementary row operation used at each step. The following shows this description for the present example.
1. $P_{12}$
2. $A_{12}(-2)$, $A_{13}(4)$, $A_{14}(-2)$
3. $A_{32}(-1)$
4. $A_{23}(-2)$, $A_{24}(-2)$
5. $M_3(1/13)$
6. $A_{34}(-9)$

Remarks
1. Notice that in steps 2 and 4 of the preceding example we have performed multiple elementary row operations of the type $A_{ij}(k)$ in a single step. With this one exception, the reader is strongly advised not to combine multiple elementary row operations into a single step, particularly when they are of different types. This is a common source of calculation errors.
2. The reader may have noticed that the particular steps taken in the preceding example are not uniquely determined. For instance, we could have achieved a leading 1 in the (1, 1) position in step 1 by multiplying the first row by 1/2, rather than permuting the first two rows. Therefore, there may be multiple strategies for reducing a matrix to row-echelon form, and indeed, many possible row-echelon forms for a given matrix $A$.
In this particular case, we chose not to multiply the first row by 1/2 in order to avoid introducing fractions into the calculations.

The reader is urged to study the foregoing example very carefully, since it illustrates the general procedure for reducing an $m \times n$ matrix to row-echelon form using elementary row operations. This procedure will be used repeatedly throughout the text.

The idea behind reduction to row-echelon form is to start at the upper left-hand corner of the matrix and proceed downward and to the right in the matrix. The following algorithm formalizes the steps that reduce any $m \times n$ matrix to row-echelon form using a finite number of elementary row operations, and thereby provides a proof of Theorem 2.4.7. An illustration of this algorithm is given in Figure 2.4.1.

Algorithm for Reducing an $m \times n$ Matrix $A$ to Row-Echelon Form
1. Start with an $m \times n$ matrix $A$. If $A = 0$, go to step 7.
2. Determine the leftmost nonzero column (this is called a pivot column, and the topmost position in this column is called a pivot position).
3. Use elementary row operations to put a 1 in the pivot position.
4. Use elementary row operations to put zeros below the pivot position.
5. If there are no more nonzero rows below the pivot position, go to step 7; otherwise go to step 6.
6. Apply steps 2 through 5 to the submatrix consisting of the rows that lie below the pivot position.
7. The matrix is a row-echelon matrix.

[Figure 2.4.1: Illustration of an algorithm for reducing an m × n matrix to row-echelon form. Labels: zero columns to the left of the pivot column; pivot column and pivot position; elementary row operations applied to the pivot column, then to the rows beneath the pivot position; submatrix and new submatrix; nonzero elements.]

Remark In order to obtain a row-echelon matrix, we put a 1 in each pivot position.
However, many algorithms for solving systems of linear equations numerically are based on the preceding algorithm, except that in step 3 they place a nonzero number (not necessarily a 1) in the pivot position. Of course, the matrix resulting from such an algorithm differs from a row-echelon matrix, since it will have arbitrary nonzero elements in the pivot positions.

Example 2.4.9 Reduce $\begin{bmatrix} 3 & 2 & -5 & 2 \\ 1 & 1 & -1 & 1 \\ 1 & 0 & -3 & 4 \end{bmatrix}$ to row-echelon form.

Solution: Applying the row-reduction algorithm leads to the following sequence of elementary row operations. The specific row operations used at each step are given at the end of the process.
$$\begin{bmatrix} 3 & 2 & -5 & 2 \\ 1 & 1 & -1 & 1 \\ 1 & 0 & -3 & 4 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 1 & -1 & 1 \\ 3 & 2 & -5 & 2 \\ 1 & 0 & -3 & 4 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & -1 & -2 & -1 \\ 0 & -1 & -2 & 3 \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & -1 & -2 & 3 \end{bmatrix} \overset{4}{\sim} \begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix} \overset{5}{\sim} \begin{bmatrix} 1 & 1 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The successive pivot columns are columns 1, 2, and 4, with pivot positions (1, 1), (2, 2), and (3, 4). This is a row-echelon matrix, and hence we are done. The row operations used are summarized here:

1. $P_{12}$ 2. $A_{12}(-3)$, $A_{13}(-1)$ 3. $M_2(-1)$ 4. $A_{23}(1)$ 5. $M_3(1/4)$

The Rank of a Matrix
We now derive some further results on row-echelon matrices. These will be required in the next section to develop the theory for solving systems of linear equations.

We first observe that a row-echelon form for a matrix $A$ is not unique. Suppose we are given one row-echelon form for $A$ that has at least two nonzero rows. We can then obtain a different one by adding a nonzero multiple of one of its rows to any row above it, and the result is still in row-echelon form. However, even though the row-echelon form of $A$ is not unique, we do have the following theorem. (In Chapter 4 we will see how the proof of this theorem arises naturally from more sophisticated ideas of linear algebra yet to be introduced.)

Theorem 2.4.10 Let $A$ be an $m \times n$ matrix.
All row-echelon matrices that are row-equivalent to $A$ have the same number of nonzero rows.

Theorem 2.4.10 associates a number with any $m \times n$ matrix $A$, namely, the number of nonzero rows in any row-echelon form of $A$. As we will see in the next section, this number is fundamental in determining the solution properties of linear systems. Indeed, it plays a central role in linear algebra in general. For this reason, we give it a special name.

DEFINITION 2.4.11
The number of nonzero rows in any row-echelon form of a matrix $A$ is called the rank of $A$ and is denoted $\operatorname{rank}(A)$.

Example 2.4.12 Determine $\operatorname{rank}(A)$ if $A = \begin{bmatrix} 3 & 1 & 4 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix}$.

Solution: In order to determine $\operatorname{rank}(A)$, we must first reduce $A$ to row-echelon form.
$$\begin{bmatrix} 3 & 1 & 4 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 2 & 1 \\ 4 & 3 & 5 \\ 2 & -1 & 3 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 1 & 2 & 1 \\ 0 & -5 & 1 \\ 0 & -5 & 1 \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -\tfrac{1}{5} \\ 0 & -5 & 1 \end{bmatrix} \overset{4}{\sim} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -\tfrac{1}{5} \\ 0 & 0 & 0 \end{bmatrix}$$

1. $A_{31}(-1)$ 2. $A_{12}(-4)$, $A_{13}(-2)$ 3. $M_2(-1/5)$ 4. $A_{23}(5)$

Since there are two nonzero rows in this row-echelon form of $A$, it follows from Definition 2.4.11 that $\operatorname{rank}(A) = 2$.

In the preceding example, the original matrix $A$ had three nonzero rows, whereas any row-echelon form of $A$ has only two nonzero rows. We can interpret this geometrically as follows. The three row vectors of $A$ can be considered as vectors in $\mathbb{R}^3$ with components
$$\mathbf{a}_1 = (3, 1, 4), \quad \mathbf{a}_2 = (4, 3, 5), \quad \mathbf{a}_3 = (2, -1, 3).$$
In performing elementary row operations on $A$, we are taking combinations of these vectors of the form $c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + c_3\mathbf{a}_3$. Thus the rows of a row-echelon form of $A$ are all of this form; that is, we have been combining the vectors linearly. The row of zeros in the row-echelon form means that there are values of the constants $c_1, c_2, c_3$, not all zero, such that
$$c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + c_3\mathbf{a}_3 = \mathbf{0},$$
where $\mathbf{0}$ denotes the zero vector $(0, 0, 0)$. Equivalently, one of the vectors can be written in terms of the other two. Therefore the three vectors lie in a plane through the origin.
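The zero row can also be traced back to an explicit combination of $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$. The following Python sketch is an illustration, not a procedure from the text. It assumes exact arithmetic via the standard fractions module, and the helper name row_dependencies is hypothetical.

The sketch row-reduces $A$ side by side with an identity block, so each current row records which combination of the original rows it represents. Any row whose $A$-part becomes zero then supplies constants $c_1, c_2, c_3$. For the matrix of Example 2.4.12 it reports $(c_1, c_2, c_3) = (-2, 1, 1)$, that is, $-2\mathbf{a}_1 + \mathbf{a}_2 + \mathbf{a}_3 = \mathbf{0}$.

```python
from fractions import Fraction

def row_dependencies(rows):
    """Row-reduce [A | I], pivoting only on A's columns.

    Returns the identity-part of every row whose A-part became zero.
    Each returned vector c satisfies sum_i c[i] * rows[i] = 0, and c is
    never the zero vector (it is a row of an invertible row-reduced I).
    """
    m, n = len(rows), len(rows[0])
    # Augment row i of A with row i of the m x m identity matrix.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(rows)]
    pivot_row = 0
    for col in range(n):  # pivot columns are chosen among A's columns only
        r = next((i for i in range(pivot_row, m) if M[i][col] != 0), None)
        if r is None:
            continue
        M[pivot_row], M[r] = M[r], M[pivot_row]          # permute rows
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]]     # leading 1
        for i in range(pivot_row + 1, m):                # zeros beneath
            k = M[i][col]
            if k != 0:
                M[i] = [a - k * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    # Rows pivot_row..m-1 now have a zero A-part; their right block holds c.
    return [M[i][n:] for i in range(pivot_row, m)]

if __name__ == "__main__":
    A = [[3, 1, 4], [4, 3, 5], [2, -1, 3]]               # Example 2.4.12
    print([[int(x) for x in c] for c in row_dependencies(A)])  # [[-2, 1, 1]]
```

Any nonzero multiple of the vector found is equally valid. The particular scaling depends on which row operations the sketch happens to perform.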
Reducing the matrix to row-echelon form has uncovered this relationship among the three vectors. We shall have much more to say about this in Chapter 4.

Remark If $A$ is an $m \times n$ matrix, then $\operatorname{rank}(A) \le m$ and $\operatorname{rank}(A) \le n$. This is because the number of nonzero rows in a row-echelon form of $A$ equals the number of pivots in that row-echelon form. There can be at most one pivot per row and at most one pivot per column, so this number cannot exceed either the number of rows or the number of columns of $A$.

Reduced Row-Echelon Matrices
In the future we will need to consider the special row-echelon matrices that arise when zeros are placed above, as well as beneath, each leading 1. Any such matrix is called a reduced row-echelon matrix and is defined precisely as follows.

DEFINITION 2.4.13
An $m \times n$ matrix is called a reduced row-echelon matrix if it satisfies the following conditions:
1. It is a row-echelon matrix.
2. Any column that contains a leading 1 has zeros everywhere else.

Example 2.4.14 The following are examples of reduced row-echelon matrices:
$$\begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 & 7 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 5 & 3 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

An $m \times n$ matrix $A$ need not have a unique row-echelon form. However, in reducing $A$ to a reduced row-echelon matrix we are making a particular choice of row-echelon matrix, since we arrange that all elements above each leading 1 are zeros. In view of this, the following theorem should not be too surprising:

Theorem 2.4.15 An $m \times n$ matrix is row-equivalent to a unique reduced row-echelon matrix.

The unique reduced row-echelon matrix to which a matrix $A$ is row-equivalent will be called the reduced row-echelon form of $A$. As illustrated in the next example, the row-reduction algorithm is easily extended to determine the reduced row-echelon form of $A$: we just put zeros above and beneath each leading 1.

Example 2.4.16 Determine the reduced row-echelon form of $A = \begin{bmatrix} 3 & -1 & 22 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix}$.

Solution: We apply the row-reduction algorithm, but put 0s above and below the leading 1s. In so doing, it is immaterial which of the following we do:
- first reduce $A$ to row-echelon form and then arrange 0s above the leading 1s, or
- arrange 0s both above and below the leading 1s as we proceed from left to right.
$$A = \begin{bmatrix} 3 & -1 & 22 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 9 & 26 \\ -1 & 5 & 2 \\ 2 & 4 & 24 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 1 & 9 & 26 \\ 0 & 14 & 28 \\ 0 & -14 & -28 \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 1 & 9 & 26 \\ 0 & 1 & 2 \\ 0 & -14 & -28 \end{bmatrix} \overset{4}{\sim} \begin{bmatrix} 1 & 0 & 8 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix},$$
which is the reduced row-echelon form of $A$.

1. $A_{21}(2)$ 2. $A_{12}(1)$, $A_{13}(-2)$ 3. $M_2(1/14)$ 4. $A_{21}(-9)$, $A_{23}(14)$

Exercises for 2.4

Key Terms
Elementary row operations, Row-equivalent matrices, Back substitution, Row-echelon matrix, Row-echelon form, Leading 1, Pivot, Rank of a matrix, Reduced row-echelon matrix.

Skills
• Be able to perform elementary row operations on a matrix.
• Be able to determine a row-echelon form or reduced row-echelon form for a matrix.
• Be able to find the rank of a matrix.

True-False Review
For Questions 1–9, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. A matrix $A$ can have many row-echelon forms but only one reduced row-echelon form.
2. Any upper triangular $n \times n$ matrix is in row-echelon form.
3. Any $n \times n$ matrix in row-echelon form is upper triangular.
4. If a matrix $A$ has more rows than a matrix $B$, then $\operatorname{rank}(A) \ge \operatorname{rank}(B)$.
5. For any matrices $A$ and $B$ of the same dimensions, $\operatorname{rank}(A + B) = \operatorname{rank}(A) + \operatorname{rank}(B)$.
6. For any matrices $A$ and $B$ of the appropriate dimensions, $\operatorname{rank}(AB) = \operatorname{rank}(A) \cdot \operatorname{rank}(B)$.
7. If a matrix has rank zero, then it must be the zero matrix.
8. The matrices $A$ and $2A$ must have the same rank.
9. The matrices $A$ and $2A$ must have the same reduced row-echelon form.

Problems
For Problems 1–8, determine whether the given matrices are in reduced row-echelon form, row-echelon form but not reduced row-echelon form, or neither.
1. $\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.
2. $\begin{bmatrix} 1 & 0 & 2 & 5 \\ 1 & 0 & 0 & 2 \\ 0 & 1 & 1 & 0 \end{bmatrix}$.
3. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$.
4. $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
5. $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$.
6. [matrix illegible in source]
7. [matrix illegible in source]
8. $\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.

For Problems 9–18, use elementary row operations to reduce the given matrix to row-echelon form, and hence determine the rank of each matrix.
9. $\begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix}$.
10. $\begin{bmatrix} 2 & -4 \\ -4 & 8 \end{bmatrix}$.
11. $\begin{bmatrix} 2 & 1 & 4 \\ 2 & -3 & 4 \\ 3 & -2 & 6 \end{bmatrix}$.
12. $\begin{bmatrix} 0 & 1 & 3 \\ 0 & 1 & 4 \\ 0 & 3 & 5 \end{bmatrix}$.
13. $\begin{bmatrix} 2 & -1 \\ 3 & 2 \\ 2 & 5 \end{bmatrix}$.
14. $\begin{bmatrix} 2 & -1 & 3 \\ 3 & 1 & -2 \\ 2 & -2 & 1 \end{bmatrix}$.
15. $\begin{bmatrix} 2 & -1 & 3 & 4 \\ 1 & -2 & 1 & 3 \\ 1 & -5 & 0 & 5 \end{bmatrix}$.
16. [matrix illegible in source]
17. $\begin{bmatrix} 4 & 7 & 4 & 7 \\ 3 & 5 & 3 & 5 \\ 2 & -2 & 2 & -2 \\ 5 & -2 & 5 & -2 \end{bmatrix}$.
18. $\begin{bmatrix} 2 & 1 & 3 & 4 & 2 \\ 1 & 0 & 2 & 1 & 3 \\ 2 & 3 & 1 & 5 & 7 \end{bmatrix}$.

For Problems 19–25, reduce the given matrix to reduced row-echelon form and hence determine the rank of each matrix.
19. $\begin{bmatrix} 3 & 2 \\ 1 & -1 \end{bmatrix}$.
20. $\begin{bmatrix} 3 & 7 & 10 \\ 2 & 3 & -1 \\ 1 & 2 & 1 \end{bmatrix}$.
21. $\begin{bmatrix} 3 & -3 & 6 \\ 2 & -2 & 4 \\ 6 & -6 & 12 \end{bmatrix}$.
22. $\begin{bmatrix} 3 & 5 & -12 \\ 2 & 3 & -7 \\ -2 & -1 & 1 \end{bmatrix}$.
23. $\begin{bmatrix} 1 & -1 & -1 & 2 \\ 3 & -2 & 0 & 7 \\ 2 & -1 & 2 & 4 \\ 4 & -2 & 3 & 8 \end{bmatrix}$.
24. $\begin{bmatrix} 1 & -2 & 1 & 3 \\ 3 & -6 & 2 & 7 \\ 4 & -8 & 3 & 10 \end{bmatrix}$.
25. $\begin{bmatrix} 0 & 1 & 2 & 1 \\ 0 & 3 & 1 & 2 \\ 0 & 2 & 0 & 1 \end{bmatrix}$.

Many forms of technology have commands for performing elementary row operations on a matrix $A$. For example, in the linear algebra package of Maple, the three elementary row operations are
• swaprow(A, i, j): permute rows i and j
• mulrow(A, i, k): multiply row i by k
• addrow(A, i, j, k): add k times row i to row j

For Problems 26–28, use some form of technology to determine a row-echelon form of the given matrix.
26. The matrix in Problem 14.
27. The matrix in Problem 15.
28. The matrix in Problem 18.

Many forms of technology also have built-in functions for directly determining the reduced row-echelon form of a given matrix $A$. For example, in the linear algebra package of Maple, the appropriate command is rref(A). In Problems 29–31, use technology to determine directly the reduced row-echelon form of the given matrix.
29. The matrix in Problem 21.
30. The matrix in Problem 24.
31. The matrix in Problem 25.

2.5 Gaussian Elimination
We now illustrate how elementary row operations applied to the augmented matrix of a system of linear equations can be used, first, to determine whether the system is consistent and, second, if the system is consistent, to find all of its solutions. In doing so, we will develop the general theory for linear systems of equations.

Example 2.5.1 Determine the solution set to
$$\begin{aligned} 3x_1 - 2x_2 + 2x_3 &= 9, \\ x_1 - 2x_2 + x_3 &= 5, \\ 2x_1 - x_2 - 2x_3 &= -1. \end{aligned} \qquad (2.5.1)$$

Solution: We first use elementary row operations to reduce the augmented matrix of the system to row-echelon form.
$$\left[\begin{array}{ccc|c} 3 & -2 & 2 & 9 \\ 1 & -2 & 1 & 5 \\ 2 & -1 & -2 & -1 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 3 & -2 & 2 & 9 \\ 2 & -1 & -2 & -1 \end{array}\right] \overset{2}{\sim} \left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 4 & -1 & -6 \\ 0 & 3 & -4 & -11 \end{array}\right] \overset{3}{\sim} \left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 3 & -4 & -11 \end{array}\right] \overset{4}{\sim} \left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 0 & -13 & -26 \end{array}\right] \overset{5}{\sim} \left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & 3 & 5 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

1. $P_{12}$ 2. $A_{12}(-3)$, $A_{13}(-2)$ 3. $A_{32}(-1)$ 4. $A_{23}(-3)$ 5. $M_3(-1/13)$

The system corresponding to this row-echelon form of the augmented matrix is
$$x_1 - 2x_2 + x_3 = 5, \qquad (2.5.2)$$
$$x_2 + 3x_3 = 5, \qquad (2.5.3)$$
$$x_3 = 2, \qquad (2.5.4)$$
which can be solved by back substitution:
- From Equation (2.5.4), $x_3 = 2$.
- Substituting into Equation (2.5.3) and solving for $x_2$, we find that $x_2 = -1$.
- Finally, substituting for $x_3$ and $x_2$ in Equation (2.5.2) and solving for $x_1$ yields $x_1 = 1$.

Thus, our original system of equations has the unique solution $(1, -1, 2)$. The solution set to the system is $S = \{(1, -1, 2)\}$, which is a subset of $\mathbb{R}^3$.

The process of reducing the augmented matrix to row-echelon form and then using back substitution to solve the equivalent system is called Gaussian elimination. The particular case of Gaussian elimination that arises when the augmented matrix is reduced to reduced row-echelon form is called Gauss-Jordan elimination.
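The two procedures just defined are mechanical enough to sketch in a few lines. The following Python sketch is one possible rendering, assuming exact arithmetic via the standard fractions module; the function names are illustrative, not from the text.

Its pivot rule takes the first row with a nonzero entry in the pivot column. It may therefore use different row operations than the text; for instance, it does not perform $P_{12}$ in Example 2.5.1. It nevertheless reaches the same unique solution, and by Theorem 2.4.15 its reduced row-echelon form agrees with any other reduction. The back-substitution helper handles only the unique-solution case.

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce M (a list of rows) to a row-echelon form: find the leftmost
    column with a nonzero entry at or below the current row, put a leading 1
    there, and put zeros beneath it."""
    M = [[Fraction(x) for x in row] for row in M]   # exact arithmetic
    m, n = len(M), len(M[0])
    pivot_row = 0
    for col in range(n):
        r = next((i for i in range(pivot_row, m) if M[i][col] != 0), None)
        if r is None:
            continue                                 # not a pivot column
        M[pivot_row], M[r] = M[r], M[pivot_row]      # permute rows
        p = M[pivot_row][col]
        M[pivot_row] = [x / p for x in M[pivot_row]] # leading 1
        for i in range(pivot_row + 1, m):            # zeros beneath the pivot
            k = M[i][col]
            M[i] = [a - k * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return M

def back_substitute(R):
    """Solve from a row-echelon augmented matrix whose first n rows have
    their leading 1s on the diagonal (unique-solution case only)."""
    n = len(R[0]) - 1
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):                   # last equation first
        x[i] = R[i][n] - sum(R[i][j] * x[j] for j in range(i + 1, n))
    return x

def gaussian_elimination(aug):
    """Row-echelon form followed by back substitution."""
    return back_substitute(row_echelon(aug))

def rref(M):
    """Gauss-Jordan: additionally clear the entries above each leading 1,
    working from the bottom nonzero row upward."""
    R = row_echelon(M)
    for i in range(len(R) - 1, -1, -1):
        lead = next((j for j, v in enumerate(R[i]) if v != 0), None)
        if lead is None:
            continue                                 # zero row
        for k in range(i):
            f = R[k][lead]
            R[k] = [a - f * b for a, b in zip(R[k], R[i])]
    return R

if __name__ == "__main__":
    aug = [[3, -2, 2, 9], [1, -2, 1, 5], [2, -1, -2, -1]]   # Example 2.5.1
    print([int(v) for v in gaussian_elimination(aug)])       # [1, -1, 2]
```

Applied to the matrix of Example 2.4.16, rref returns the same reduced row-echelon form found there.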
Example 2.5.2 Use Gauss-Jordan elimination to determine the solution set to
$$\begin{aligned} x_1 + 2x_2 - x_3 &= 1, \\ 2x_1 + 5x_2 - x_3 &= 3, \\ x_1 + 3x_2 + 2x_3 &= 6. \end{aligned}$$

Solution: In this case, we first reduce the augmented matrix of the system to reduced row-echelon form.
$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 2 & 5 & -1 & 3 \\ 1 & 3 & 2 & 6 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 3 & 5 \end{array}\right] \overset{2}{\sim} \left[\begin{array}{ccc|c} 1 & 0 & -3 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 4 \end{array}\right] \overset{3}{\sim} \left[\begin{array}{ccc|c} 1 & 0 & -3 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \end{array}\right] \overset{4}{\sim} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

1. $A_{12}(-2)$, $A_{13}(-1)$ 2. $A_{21}(-2)$, $A_{23}(-1)$ 3. $M_3(1/2)$ 4. $A_{31}(3)$, $A_{32}(-1)$

The augmented matrix is now in reduced row-echelon form. The equivalent system is
$$x_1 = 5, \quad x_2 = -1, \quad x_3 = 2,$$
and the solution can be read off directly as $(5, -1, 2)$. Consequently, the given system has solution set $S = \{(5, -1, 2)\}$ in $\mathbb{R}^3$.

We see from the preceding two examples that the advantage of Gauss-Jordan elimination over Gaussian elimination is that it does not require back substitution. However, the disadvantage is that reducing the augmented matrix to reduced row-echelon form requires more elementary row operations than reduction to row-echelon form. It can be shown, in fact, that in general Gaussian elimination is the more computationally efficient technique. As we will see in the next section, the main reason for introducing the Gauss-Jordan method is its application to the computation of the inverse of an $n \times n$ matrix.

Remark The Gaussian elimination method is so systematic that it can be programmed easily on a computer. Indeed, many large-scale programs for solving linear systems are based on the row-reduction method.

In both of the preceding examples, $\operatorname{rank}(A) = \operatorname{rank}(A^\#) =$ number of unknowns in the system, and the system had a unique solution. More generally, we have the following lemma:

Lemma 2.5.3 Consider the $m \times n$ linear system $A\mathbf{x} = \mathbf{b}$. Let $A^\#$ denote the augmented matrix of the system. If $\operatorname{rank}(A) = \operatorname{rank}(A^\#) = n$, then the system has a unique solution.
Proof If $\operatorname{rank}(A) = \operatorname{rank}(A^\#) = n$, then there are $n$ leading ones in any row-echelon form of $A$. Hence back substitution gives a unique solution. The row-echelon form of $A^\#$ is shown below, with the $m - n$ rows of zeros at the bottom of the matrix omitted; the $*$'s denote unknown elements of the row-echelon form.
$$\left[\begin{array}{cccccc|c} 1 & * & * & * & \cdots & * & * \\ 0 & 1 & * & * & \cdots & * & * \\ 0 & 0 & 1 & * & \cdots & * & * \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & * \end{array}\right]$$

Note that $\operatorname{rank}(A)$ cannot exceed $\operatorname{rank}(A^\#)$, since the first $n$ columns of any row-echelon form of $A^\#$ form a row-echelon form of $A$. Thus, there are only two possibilities for the relationship between $\operatorname{rank}(A)$ and $\operatorname{rank}(A^\#)$: either $\operatorname{rank}(A) < \operatorname{rank}(A^\#)$ or $\operatorname{rank}(A) = \operatorname{rank}(A^\#)$. We now consider what happens in these cases.

Example 2.5.4 Determine the solution set to
$$\begin{aligned} x_1 + x_2 - x_3 + x_4 &= 1, \\ 2x_1 + 3x_2 + x_3 &= 4, \\ 3x_1 + 5x_2 + 3x_3 - x_4 &= 5. \end{aligned}$$

Solution: We use elementary row operations to reduce the augmented matrix:
$$\left[\begin{array}{cccc|c} 1 & 1 & -1 & 1 & 1 \\ 2 & 3 & 1 & 0 & 4 \\ 3 & 5 & 3 & -1 & 5 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{cccc|c} 1 & 1 & -1 & 1 & 1 \\ 0 & 1 & 3 & -2 & 2 \\ 0 & 2 & 6 & -4 & 2 \end{array}\right] \overset{2}{\sim} \left[\begin{array}{cccc|c} 1 & 1 & -1 & 1 & 1 \\ 0 & 1 & 3 & -2 & 2 \\ 0 & 0 & 0 & 0 & -2 \end{array}\right]$$

1. $A_{12}(-2)$, $A_{13}(-3)$ 2. $A_{23}(-2)$

The last row tells us that the system of equations has no solution (that is, it is inconsistent). It requires $0x_1 + 0x_2 + 0x_3 + 0x_4 = -2$, which is clearly impossible. The solution set to the system is thus the empty set $\emptyset$.

In the previous example, $\operatorname{rank}(A) = 2$, whereas $\operatorname{rank}(A^\#) = 3$. Thus, $\operatorname{rank}(A) < \operatorname{rank}(A^\#)$, and the corresponding system has no solution. Next we establish that this result is true in general.

Lemma 2.5.5 Consider the $m \times n$ linear system $A\mathbf{x} = \mathbf{b}$. Let $A^\#$ denote the augmented matrix of the system. If $\operatorname{rank}(A) < \operatorname{rank}(A^\#)$, then the system is inconsistent.

Proof If $\operatorname{rank}(A) < \operatorname{rank}(A^\#)$, then there will be a row in the reduced row-echelon form of the augmented matrix whose first nonzero element arises in the last column. Such a row corresponds to an equation of the form $0x_1 + 0x_2 + \cdots + 0x_n = 1$, which has no solution. Consequently, the system is inconsistent.
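Lemmas 2.5.3 and 2.5.5 reduce these questions to comparing two ranks. Below is a minimal Python sketch under stated assumptions: exact rational arithmetic from the standard fractions module, and hypothetical helper names rank and classify.

The rank routine counts pivots without scaling them to 1, in the spirit of the numerical variants mentioned in the Remark following the row-reduction algorithm. Skipping the scaling does not change the number of nonzero rows. The remaining case, $\operatorname{rank}(A) = \operatorname{rank}(A^\#) < n$, is treated next in the text; the sketch labels it "infinitely many" in anticipation of that result.

```python
from fractions import Fraction

def rank(M):
    """Number of nonzero rows in a row-echelon form of M, found by
    counting pivots (pivot entries are not scaled to 1 here)."""
    M = [[Fraction(x) for x in row] for row in M]
    m, n = len(M), len(M[0])
    r = 0                                   # pivots found = next pivot row
    for col in range(n):
        piv = next((i for i in range(r, m) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, m):           # zeros beneath the pivot
            f = M[i][col] / M[r][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

def classify(A, b):
    """Compare rank(A) with rank(A#) for the system Ax = b."""
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    r, r_aug, n = rank(A), rank(aug), len(A[0])
    if r < r_aug:
        return "inconsistent"               # Lemma 2.5.5
    if r == n:
        return "unique"                     # Lemma 2.5.3
    return "infinitely many"                # rank(A) = rank(A#) < n

if __name__ == "__main__":
    A = [[1, 1, -1, 1], [2, 3, 1, 0], [3, 5, 3, -1]]   # Example 2.5.4
    print(rank(A), classify(A, [1, 4, 5]))              # 2 inconsistent
```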
Finally, we consider the case when $\operatorname{rank}(A) = \operatorname{rank}(A^\#)$. If $\operatorname{rank}(A) = n$, we have already seen in Lemma 2.5.3 that the system has a unique solution. We now consider an example in which $\operatorname{rank}(A) < n$.

Example 2.5.6 Determine the solution set to
$$\begin{aligned} 5x_1 - 6x_2 + x_3 &= 4, \\ 2x_1 - 3x_2 + x_3 &= 1, \\ 4x_1 - 3x_2 - x_3 &= 5. \end{aligned} \qquad (2.5.5)$$

Solution: We begin by reducing the augmented matrix of the system.
$$\left[\begin{array}{ccc|c} 5 & -6 & 1 & 4 \\ 2 & -3 & 1 & 1 \\ 4 & -3 & -1 & 5 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|c} 1 & -3 & 2 & -1 \\ 2 & -3 & 1 & 1 \\ 4 & -3 & -1 & 5 \end{array}\right] \overset{2}{\sim} \left[\begin{array}{ccc|c} 1 & -3 & 2 & -1 \\ 0 & 3 & -3 & 3 \\ 0 & 9 & -9 & 9 \end{array}\right] \overset{3}{\sim} \left[\begin{array}{ccc|c} 1 & -3 & 2 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 9 & -9 & 9 \end{array}\right] \overset{4}{\sim} \left[\begin{array}{ccc|c} 1 & -3 & 2 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

1. $A_{31}(-1)$ 2. $A_{12}(-2)$, $A_{13}(-4)$ 3. $M_2(1/3)$ 4. $A_{23}(-9)$

The augmented matrix is now in row-echelon form, and the equivalent system is
$$x_1 - 3x_2 + 2x_3 = -1, \qquad (2.5.6)$$
$$x_2 - x_3 = 1. \qquad (2.5.7)$$

Since we have three variables but only two equations relating them, we are free to specify one of the variables arbitrarily. The variable that we choose to specify is called a free variable or free parameter. The remaining variables are then determined by the system of equations and are called bound variables or bound parameters.

In the foregoing system, we take $x_3$ as the free variable and set $x_3 = t$, where $t$ can assume any real value.⁵ It follows from (2.5.7) that $x_2 = 1 + t$. Further, from Equation (2.5.6),
$$x_1 = -1 + 3(1 + t) - 2t = 2 + t.$$
Thus the solution set to the given system of equations is the following subset of $\mathbb{R}^3$:
$$S = \{(2 + t,\ 1 + t,\ t) : t \in \mathbb{R}\}.$$
The system has an infinite number of solutions, obtained by allowing the parameter $t$ to assume all real values. For example, two particular solutions of the system are $(2, 1, 0)$ and $(0, -1, -2)$, corresponding to $t = 0$ and $t = -2$, respectively. Note that we can also write the solution set $S$ above in the form
$$S = \{(2, 1, 0) + t(1, 1, 1) : t \in \mathbb{R}\}.$$

⁵ When considering systems of equations with complex coefficients, we allow free variables to assume complex values as well.
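Once the augmented matrix is in reduced row-echelon form, the bookkeeping in this example can be automated. The sketch below is illustrative only: it uses Python with exact arithmetic via fractions, and the names rref and solve_parametric are hypothetical.

The sketch works as follows:
- It takes as free variables those whose columns contain no leading 1; this is the rule the text makes explicit in Example 2.5.8.
- It sets every free variable to 0 to obtain a particular solution.
- It sets each free variable to 1 in turn (the others 0) to obtain one direction vector per free variable.

Indices are 0-based, so $x_3$ appears as index 2. For Example 2.5.6 it returns the particular solution $(2, 1, 0)$ and the direction $(1, 1, 1)$, matching the second form of $S$ above.

```python
from fractions import Fraction

def rref(M):
    """Single-pass Gauss-Jordan: leading 1s with zeros above and below.
    Returns the reduced matrix and the list of pivot columns (row order)."""
    M = [[Fraction(x) for x in row] for row in M]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for col in range(n):
        piv = next((i for i in range(r, m) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        p = M[r][col]
        M[r] = [x / p for x in M[r]]                 # leading 1
        for i in range(m):                           # clear the whole column
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    return M, pivots

def solve_parametric(aug):
    """Return (particular, directions, free) for the system with augmented
    matrix aug, or None if it is inconsistent."""
    R, pivots = rref(aug)
    n = len(aug[0]) - 1                              # number of unknowns
    if n in pivots:                                  # a row reads 0 = 1
        return None
    free = [j for j in range(n) if j not in pivots]
    particular = [Fraction(0)] * n
    for row, col in enumerate(pivots):
        particular[col] = R[row][n]                  # free variables set to 0
    directions = []
    for f in free:
        d = [Fraction(0)] * n
        d[f] = Fraction(1)                           # this free variable = 1
        for row, col in enumerate(pivots):
            d[col] = -R[row][f]                      # bound variables follow
        directions.append(d)
    return particular, directions, free

if __name__ == "__main__":
    p, dirs, free = solve_parametric([[5, -6, 1, 4], [2, -3, 1, 1], [4, -3, -1, 5]])
    print([int(v) for v in p], [[int(v) for v in d] for d in dirs], free)
    # [2, 1, 0] [[1, 1, 1]] [2]
```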
Remark The geometry of the foregoing solution is as follows. The given system (2.5.5) can be interpreted as consisting of three planes in 3-space. Any solution to the system gives the coordinates of a point of intersection of the three planes. In the preceding example the planes intersect in a line whose parametric equations are $x_1 = 2 + t$, $x_2 = 1 + t$, $x_3 = t$. (See Figure 2.3.1.)

In general, the solution to a consistent $m \times n$ system of linear equations may involve more than one free variable. Indeed, the number of free variables depends on how many nonzero rows arise in any row-echelon form of the augmented matrix $A^\#$ of the system; that is, it depends on the rank of $A^\#$. More precisely, if $\operatorname{rank}(A^\#) = r^\#$, then the equivalent system will have only $r^\#$ relationships between the $n$ variables. Consequently, provided the system is consistent,
$$\text{number of free variables} = n - r^\#.$$
We therefore have the following lemma.

Lemma 2.5.7 Consider the $m \times n$ linear system $A\mathbf{x} = \mathbf{b}$. Let $A^\#$ denote the augmented matrix of the system and let $r^\# = \operatorname{rank}(A^\#)$. If $r^\# = \operatorname{rank}(A) < n$, then the system has an infinite number of solutions, indexed by $n - r^\#$ free variables.

Proof As discussed before, any row-echelon equivalent system will have only $r^\#$ equations involving the $n$ variables, and so there will be $n - r^\# > 0$ free variables. If we assign arbitrary values to these free variables, then the remaining $r^\#$ variables will be uniquely determined from the system by back substitution. Since each free variable can assume infinitely many values, in this case there are an infinite number of solutions to the system.

Example 2.5.8 Use Gaussian elimination to solve
$$\begin{aligned} x_1 - 2x_2 + 2x_3 - x_4 &= 3, \\ 3x_1 + x_2 + 6x_3 + 11x_4 &= 16, \\ 2x_1 - x_2 + 4x_3 + 4x_4 &= 9. \end{aligned}$$

Solution: A row-echelon form of the augmented matrix of the system is
$$\left[\begin{array}{cccc|c} 1 & -2 & 2 & -1 & 3 \\ 0 & 1 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right],$$
so that we have two free variables.
The equivalent system is
$$x_1 - 2x_2 + 2x_3 - x_4 = 3, \qquad (2.5.8)$$
$$x_2 + 2x_4 = 1. \qquad (2.5.9)$$

Notice that we cannot take just any two variables to be free. For example, from Equation (2.5.9), we cannot specify both $x_2$ and $x_4$ independently. The bound variables should be taken as those that correspond to leading 1s in the row-echelon form of $A^\#$. These are the variables that can always be determined by back substitution, since each appears as the leftmost variable in some equation of the system corresponding to the row-echelon form of the augmented matrix. Choose as free variables those variables that do not correspond to a leading 1 in a row-echelon form of $A^\#$.

Applying this rule to Equations (2.5.8) and (2.5.9), we choose $x_3$ and $x_4$ as free variables and therefore set $x_3 = s$, $x_4 = t$. It then follows from Equation (2.5.9) that $x_2 = 1 - 2t$. Substitution into (2.5.8) yields $x_1 = 5 - 2s - 3t$, so that the solution set to the given system is the following subset of $\mathbb{R}^4$:
$$\begin{aligned} S &= \{(5 - 2s - 3t,\ 1 - 2t,\ s,\ t) : s, t \in \mathbb{R}\} \\ &= \{(5, 1, 0, 0) + s(-2, 0, 1, 0) + t(-3, -2, 0, 1) : s, t \in \mathbb{R}\}. \end{aligned}$$

Lemmas 2.5.3, 2.5.5, and 2.5.7 completely characterize the solution properties of an $m \times n$ linear system. Combining the results of these three lemmas gives the next theorem.

Theorem 2.5.9 Consider the $m \times n$ linear system $A\mathbf{x} = \mathbf{b}$. Let $r$ denote the rank of $A$, and let $r^\#$ denote the rank of the augmented matrix of the system. Then
1. If $r < r^\#$, the system is inconsistent.
2. If $r = r^\#$, the system is consistent and
(a) there exists a unique solution if and only if $r^\# = n$;
(b) there exists an infinite number of solutions if and only if $r^\# < n$.

Homogeneous Linear Systems
Many problems that we will meet in the future will require the solution to a homogeneous system of linear equations. The general form for such a system is
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0,$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0,$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0, \qquad (2.5.10)$$
or, in matrix form, $A\mathbf{x} = \mathbf{0}$. Here $A$ is the coefficient matrix of the system and $\mathbf{0}$ denotes the $m$-vector whose elements are all zeros.

Corollary 2.5.10 The homogeneous linear system $A\mathbf{x} = \mathbf{0}$ is consistent for any coefficient matrix $A$, with a solution given by $\mathbf{x} = \mathbf{0}$.

Proof We can see immediately from (2.5.10) that if $\mathbf{x} = \mathbf{0}$, then $A\mathbf{x} = \mathbf{0}$, so $\mathbf{x} = \mathbf{0}$ is a solution to the homogeneous linear system.

Alternatively, we can deduce the consistency of this system from Theorem 2.5.9 as follows. The augmented matrix $A^\#$ of a homogeneous linear system differs from the coefficient matrix $A$ only by the addition of a column of zeros, a feature that does not affect the rank of the matrix. Consequently, for a homogeneous system, we have $\operatorname{rank}(A^\#) = \operatorname{rank}(A)$. Therefore, from Theorem 2.5.9, such a system is necessarily consistent.

Remarks
1. The solution $\mathbf{x} = \mathbf{0}$ is referred to as the trivial solution. Consequently, from Theorem 2.5.9, a homogeneous system either has only the trivial solution or has an infinite number of solutions (one of which must be the trivial solution).
2. Once more it is worth mentioning the geometric interpretation of Corollary 2.5.10 in the case of a homogeneous system with three unknowns. We can regard each equation of such a system as defining a plane. Owing to the homogeneity, each plane passes through the origin; hence the planes intersect at least at the origin.

Often we will be interested in determining whether a given homogeneous system has an infinite number of solutions, rather than in actually obtaining the solutions. The following corollary to Theorem 2.5.9 can sometimes be used to determine by inspection whether a given homogeneous system has nontrivial solutions:

Corollary 2.5.11 A homogeneous system of $m$ linear equations in $n$ unknowns, with $m < n$, has an infinite number of solutions.

Proof Let $r$ and $r^\#$ be as in Theorem 2.5.9.
We have $r = r^\#$ for a homogeneous system, and $r^\# \le m < n$. Hence Theorem 2.5.9 implies that the system has an infinite number of solutions.

Remark If $m \ge n$, then we may or may not have nontrivial solutions, depending on whether the rank of the augmented matrix, $r^\#$, satisfies $r^\# < n$ or $r^\# = n$, respectively. We encourage the reader to construct linear systems that illustrate each of these two possibilities.

Example 2.5.12 Determine the solution set to $A\mathbf{x} = \mathbf{0}$, if $A = \begin{bmatrix} 0 & 2 & 3 \\ 0 & 1 & -1 \\ 0 & 3 & 7 \end{bmatrix}$.

Solution: The augmented matrix of the system is
$$\left[\begin{array}{ccc|c} 0 & 2 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 3 & 7 & 0 \end{array}\right], \quad \text{with reduced row-echelon form} \quad \left[\begin{array}{ccc|c} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
The equivalent system is $x_2 = 0$, $x_3 = 0$. It is tempting, but incorrect, to conclude from this that the solution to the system is $x_1 = x_2 = x_3 = 0$. Since $x_1$ does not occur in the system, it is a free variable and therefore not necessarily zero. Consequently, the correct solution to the foregoing system is $(r, 0, 0)$, where $r$ is a free variable, and the solution set is $\{(r, 0, 0) : r \in \mathbb{R}\}$.

The linear systems that we have so far encountered have all had real coefficients, and we have considered corresponding real solutions. The techniques that we have developed for solving linear systems are also applicable to the case when our system has complex coefficients. The corresponding solutions will also be complex.

Remark In general, the simplest method of putting a leading 1 in a position that contains the complex number $a + ib$ is to multiply the corresponding row by the scalar $\dfrac{1}{a^2 + b^2}(a - ib)$. This is illustrated in steps 1 and 4 in the next example. If difficulties are encountered, consultation of Appendix A is in order.

Example 2.5.13 Determine the solution set to
$$\begin{aligned} (1 + 2i)x_1 + 4x_2 + (3 + i)x_3 &= 0, \\ (2 - i)x_1 + (1 + i)x_2 + 3x_3 &= 0, \\ 5ix_1 + (7 - i)x_2 + (3 + 2i)x_3 &= 0. \end{aligned}$$

Solution: We reduce the augmented matrix of the system.
1 + 2i 4 3 + i 0 1 4 (1 − 2i) 1 − i 5 1 2−i 1+i ∼ 2 − i 1 + i 30 3 5i 7 − i 3 + 2i 0 5i 7 − i 3 + 2i 4 1−i 1 5 (1 − 2i) 0 (1 + i) − 4 (1 − 2i)(2 − i) 3 − (1 − i)(2 − i) ∼ 5 0 (7 − i) − 4i(1 − 2i) (3 + 2i) − 5i(1 − i) 2 0 0 0 0 0 0 i i i i i i i “main” 2007/2/16 page 158 i 158 CHAPTER 2 Matrices and Systems of Linear Equations 1 4 (1 − 2i) 1 − i 0 1 5 3 = 0 1 + 5i 2 + 3i 0 ∼ 0 0 −1 − 5i −2 − 3i 0 0 1 ∼ 0 0 4 4 5 (1 − 2i) 1 0 4 5 (1 − 2i) 1 + 5i 0 1−i 1 (17 − 7i) 26 0 1−i 0 2 + 3i 0 00 0 0 . 0 1. M1 ((1 − 2i)/5) 2. A12 (−(2 − i)), A13 (−5i) 3. A23 (1) 4. M2 ((1 − 5i)/26) This matrix is now in row-echelon form. The equivalent system is x1 + 4 5 (1 − 2i)x2 x2 + + (1 − i)x3 1 26 (17 − 7i)x3 = 0, = 0. There is one free variable, which we take to be x3 = t , where t can assume any complex value. Applying back substitution yields x2 = x1 = = 1 26 t (−17 + 7i) 2 − 65 t (1 − 2i)(−17 + 7i) − t (1 − i) 1 − 65 t (59 + 17i) so that the solution set to the system is the subset of C3 1 1 − 65 t (59 + 17i), 26 t (−17 + 7i), t : t ∈ C . Exercises for 2.5 Key Terms Gaussian elimination, Gauss-Jordan elimination, Free variables, Bound (or leading) variables, Trivial solution. Skills • Be able to solve a linear system of equations by Gaussian elimination and by Gauss-Jordan elimination. • Be able to identify free variables and bound variables and know how they are used to construct the solution set to a linear system. • Understand the relationship between the ranks of A and A# , and how this affects the number of solutions to a linear system. True-False Review For Questions 1–6, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false. i i i i i i i “main” 2007/2/16 page 159 i 2.5 1. 
The process by which a matrix is brought via elementary row operations to row-echelon form is known as Gauss-Jordan elimination. 4. A linear system Ax = b is consistent if and only if the last column of the row-echelon form of the augmented matrix [A b] is not a pivot column. − + + − + − − + x3 x3 2x3 x3 x4 x4 2x4 x4 = = = = 8. 9. x1 + 2x2 + x3 + x4 − 2x5 = 3, x3 + 4x4 − 3x5 = 2, 2x1 + 4x2 − x3 − 10x4 + 5x5 = 0. For Problems 10–15, use Gauss-Jordan elimination to determine the solution set to the given system. 10. 2x1 − x2 − x3 = 2, 4x1 + 3x2 − 2x3 = −1, x1 + 4x2 + x3 = 4. 11. 3x1 + x2 + 5x3 = 2, x1 + x2 − x3 = 1, 2x1 + x2 + 2x3 = 3. − 2x3 = −3, x1 3x1 − 2x2 − 4x3 = −9, x1 − 4x2 + 2x3 = −3. 13. 2x1 − x2 + 3x3 − x4 = 3, 3x1 + 2x2 + x3 − 5x4 = −6, x1 − 2x2 + 3x3 + x4 = 6. 14. x1 x1 x1 x1 15. 2x1 x1 3x1 x1 5x1 6. The columns of the row-echelon form of A# that contain the leading 1s correspond to the free variables. Problems For Problems 1–9, use Gaussian elimination to determine the solution set to the given system. x1 + 2x2 + x3 = 1, 3x1 + 5x2 + x3 = 3, 2x1 + 6x2 + 7x3 = 1. 2. 3x1 − x2 = 1, 2x1 + x2 + 5x3 = 4, 7x1 − 5x2 − 8x3 = −3. 3. 3x1 + 5x2 − x3 = 14, x1 + 2x2 + x3 = 3, 2x1 + 5x2 + 6x3 = 2. 4. 6x1 − 3x2 + 3x3 = 12, 2x1 − x2 + x3 = 4, −4x1 + 2x2 − 2x3 = −8. 5. 2x1 3x1 7x1 5x1 − + + − x2 x2 2x2 x2 + − − − 3x3 2x3 3x3 2x3 = 14, = −1, = 3, = 5. 6. 2x1 3x1 5x1 x1 − + + + x2 2x2 6x2 x2 − − − − 4x3 5x3 6x3 3x3 = 5, = 8, = 20, = −3. 7. x1 + 2x2 − x3 + x4 = 1, 2x1 + 4x2 − 2x3 + 2x4 = 2, 5x1 + 10x2 − 5x3 + 5x4 = 5. 159 1, 2, 1, 3. 12. 5. A linear system is consistent if and only if there are free variables in the row-echelon form of the corresponding augmented matrix. 1. 2x2 3x2 5x2 x2 x1 2x1 x1 4x1 2. A homogeneous linear system of equations is always consistent. 3. For a linear system Ax = b, every column of the row-echelon form of A corresponds to either a bound variable or a free variable, but not both, of the linear system. 
+ − − + Gaussian Elimination + − + − + − − + x3 x3 x3 x3 − − + + x4 x4 x4 x4 x2 3x2 x2 2x2 3x2 + − − + − 3x3 2x3 2x3 x3 3x3 + − − + + x2 x2 x2 x2 − − + + − = 4, = 2, = −2, = −8. x4 x4 x4 2x4 x4 − − + + + x5 2x5 x5 3x5 2x5 = 11, = 2, = −2, = −3, = 2. For Problems 16–20, determine the solution set to the system Ax = b for the given coefficient matrix A and right-hand side vector b. 1 −3 1 8 16. A = 5 −4 1 , b = 15 . 2 4 −3 −4 1 05 0 17. A = 3 −2 11 , b = 2 . 2 −2 6 2 0 1 −1 −2 18. A = 0 5 1 , b = 8 . 02 1 5 i i i i i i i “main” 2007/2/16 page 160 i 160 CHAPTER 2 Matrices and Systems of Linear Equations 1 −1 0 −1 2 19. A = 2 1 3 7 , b = 2 . 3 −2 1 0 4 11 0 1 2 3 1 −2 3 , b = 8 . 20. A = 2 3 1 2 3 −2 3 5 −2 −9 21. Determine all values of the constant k for which the following system has (a) no solution, (b) an infinite number of solutions, and (c) a unique solution. x1 + 2x2 − x3 = 3, 2x1 + 5x2 + x3 = 7, x1 + x2 − k 2 x3 = −k. 22. Determine all values of the constant k for which the following system has (a) no solution, (b) an infinite number of solutions, and (c) a unique solution. 2x1 x1 4x1 3x1 + + + − x2 x2 2x2 x2 − + − + x3 x3 x3 x3 + − + + x4 x4 x4 kx4 = = = = 0, 0, 0, 0. 23. Determine all values of the constants a and b for which the following system has (a) no solution, (b) an infinite number of solutions, and (c) a unique solution. x1 + x2 − 2x3 = 4, 3x1 + 5x2 − 4x3 = 16, 2x1 + 3x2 − ax3 = b. 24. Determine all values of the constants a and b for which the following system has (a) no solution, (b) an infinite number of solutions, and (c) a unique solution. x1 − ax2 = 3, x2 = 6, 2x1 + −3x1 + (a + b)x2 = 1. 25. Show that the system x1 + x2 + x3 = y1 , 2x1 + 3x2 + x3 = y2 , 3x1 + 5x2 + x3 = y3 , has an infinite number of solutions, provided that (y1 , y2 , y3 ) lies on the plane whose equation is y1 − 2y2 + y3 = 0. 6 26. Consider the system of linear equations a11 x1 + a12 x2 = b1 , a21 x1 + a22 x2 = b2 . 
Define $\Delta$, $\Delta_1$, and $\Delta_2$ by
$$\Delta = a_{11}a_{22} - a_{12}a_{21}, \quad \Delta_1 = a_{22}b_1 - a_{12}b_2, \quad \Delta_2 = a_{11}b_2 - a_{21}b_1.$$
(a) Show that the given system has a unique solution if and only if $\Delta \neq 0$, and that the unique solution in this case is $x_1 = \Delta_1/\Delta$, $x_2 = \Delta_2/\Delta$.
(b) If $\Delta = 0$ and $a_{11} \neq 0$, determine the conditions on $\Delta_2$ that would guarantee that the system has (i) no solution, (ii) an infinite number of solutions.
(c) Interpret your results in terms of intersections of straight lines.

Gaussian elimination with partial pivoting uses the following algorithm to reduce the augmented matrix:
1. Start with the augmented matrix $A^\#$.
2. Determine the leftmost nonzero column.
3. Permute rows to put the element of largest absolute value in that column into the pivot position.
4. Use elementary row operations to put zeros beneath the pivot position.
5. If there are no more nonzero rows below the pivot position, go to 7; otherwise, go to 6.
6. Apply (2)–(5) to the submatrix consisting of the rows that lie below the pivot position.
7. The matrix is in reduced form.⁶

⁶ Notice that this reduced form is not a row-echelon matrix.

In Problems 27–30, use the preceding algorithm to reduce $A^\#$ and then apply back substitution to solve the equivalent system. Technology might be useful in performing the required row operations.
27. The system in Problem 1.
28. The system in Problem 5.
29. The system in Problem 6.
30. The system in Problem 10.

31. (a) An $n \times n$ system of linear equations whose matrix of coefficients is a lower triangular matrix is called a lower triangular system. Assuming that $a_{ii} \neq 0$ for each $i$, devise a method for solving such a system that is analogous to the back-substitution method.
(b) Use your method from (a) to solve
$x_1 = 2$, $2x_1 - 3x_2 = 1$, $3x_1 + x_2 - x_3 = 8$.

32. Find all solutions to the following nonlinear system of equations:
$4x_1^3 + 2x_2^2 + 3x_3 = 12$, $x_1^3 - x_2^2 + x_3 = 2$, $3x_1^3 + x_2^2 - x_3 = 2$.
Does your answer contradict Theorem 2.5.9? Explain.

For Problems 33–43, determine the solution set to the given system.
33. $3x_1 + 2x_2 - x_3 = 0$, $2x_1 + x_2 + x_3 = 0$, $5x_1 - 4x_2 + x_3 = 0$.
34. $2x_1 \cdots$, $3x_1 \cdots$, $x_1 \cdots$, $5x_1 \cdots$
35. $2x_1 - x_2 - x_3 = 0$, $5x_1 - x_2 + 2x_3 = 0$, $x_1 + x_2 + 4x_3 = 0$.
36. $(1 + 2i)x_1 + (1 - i)x_2 + x_3 = 0$, $ix_1 + (1 + i)x_2 - ix_3 = 0$, $2ix_1 + x_2 + (1 + 3i)x_3 = 0$.
37. $3x_1 + 2x_2 + x_3 = 0$, $6x_1 - x_2 + 2x_3 = 0$, $12x_1 + 6x_2 + 4x_3 = 0$.
38. $2x_1 \cdots$, $3x_1 \cdots$, $5x_1 \cdots$, $3x_1 \cdots$
39. $x_1 + (1 + i)x_2 + (1 - i)x_3 = 0$, $ix_1 + x_2 + ix_3 = 0$, $(1 - 2i)x_1 - (1 - i)x_2 + (1 - 3i)x_3 = 0$.
40. $\cdots$
41. $2x_1 \cdots$, $3x_1 \cdots$, $x_1 \cdots$, $5x_1 \cdots$
42. $4x_1 - 2x_2 - x_3 - x_4 = 0$, $3x_1 + x_2 - 2x_3 + 3x_4 = 0$, $5x_1 - x_2 - 2x_3 + x_4 = 0$.
43. $2x_1 \cdots$, $x_1 \cdots$, $3x_1 \cdots$, $4x_1 \cdots$

For Problems 44–54, determine the solution set to the system $A\mathbf{x} = \mathbf{0}$ for the given matrix $A$.
44. $A = \begin{bmatrix} 2 & -1 \\ 3 & 4 \end{bmatrix}$.
45. $A = \begin{bmatrix} 1 - i & 2i \\ 1 + i & -2 \end{bmatrix}$.
46. $A = \begin{bmatrix} 1 + i & 1 - 2i \\ -1 + i & 2 + i \end{bmatrix}$.
47. $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$.
48. $A = \begin{bmatrix} 1 & 1 & 1 & -1 \\ -1 & 0 & -1 & 2 \\ 1 & 3 & 2 & 2 \end{bmatrix}$.
49. $A = \begin{bmatrix} 2 - 3i & 1 + i & i - 1 \\ 3 + 2i & -1 + i & -1 - i \\ 5 - i & 2i & -2 \end{bmatrix}$.
50. $A = \begin{bmatrix} 1 & 3 & 0 \\ -2 & -3 & 0 \\ 1 & 4 & 0 \end{bmatrix}$.
51. $A = \begin{bmatrix} 1 & 0 & 3 \\ 3 & -1 & 7 \\ 2 & 1 & 8 \\ 1 & 1 & 5 \\ -1 & 1 & -1 \end{bmatrix}$.
52. $A = \begin{bmatrix} 1 & -1 & 0 & 1 \\ 3 & -2 & 0 & 5 \\ -1 & 2 & 0 & 1 \end{bmatrix}$.
53. $A = \begin{bmatrix} 1 & 0 & -3 & 0 \\ 3 & 0 & -9 & 0 \\ -2 & 0 & 6 & 0 \end{bmatrix}$.
54. $A = \begin{bmatrix} 2 + i & i & 3 - 2i \\ i & 1 - i & 4 + 3i \\ 3 - i & 1 + i & 1 + 5i \end{bmatrix}$.

2.6 The Inverse of a Square Matrix

In this section we investigate the situation when, for a given $n \times n$ matrix $A$, there exists a matrix $B$ satisfying
$$AB = I_n \quad \text{and} \quad BA = I_n \tag{2.6.1}$$
and derive an efficient method for determining $B$ (when it does exist). As a possible application of the existence of such a matrix $B$, consider the $n \times n$ linear system
$$A\mathbf{x} = \mathbf{b}.$$
(2.6.2)

Premultiplying both sides of (2.6.2) by an $n \times n$ matrix $B$ yields $(BA)\mathbf{x} = B\mathbf{b}$. Assuming that $BA = I_n$, this reduces to
$$\mathbf{x} = B\mathbf{b}. \tag{2.6.3}$$
Thus, we have determined a solution to the system (2.6.2) by a matrix multiplication. Of course, this depends on the existence of a matrix $B$ satisfying (2.6.1), and even if such a matrix $B$ does exist, it will turn out that using (2.6.3) to solve $n \times n$ systems is not very efficient computationally. Therefore it is generally not used in practice to solve $n \times n$ systems. However, from a theoretical point of view, a formula such as (2.6.3) is very useful. We begin the investigation by establishing that there can be at most one matrix $B$ satisfying (2.6.1) for a given $n \times n$ matrix $A$.

Theorem 2.6.1 Let $A$ be an $n \times n$ matrix. Suppose $B$ and $C$ are both $n \times n$ matrices satisfying
$$AB = BA = I_n, \tag{2.6.4}$$
$$AC = CA = I_n, \tag{2.6.5}$$
respectively. Then $B = C$.

Proof From (2.6.4), it follows that $C = CI_n = C(AB)$. That is, $C = (CA)B = I_nB = B$, where we have used (2.6.5) to replace $CA$ by $I_n$ in the second step.

Since the identity matrix $I_n$ plays the role of the number 1 in the multiplication of matrices, the properties given in (2.6.1) are the analogs for matrices of the properties
$$xx^{-1} = 1, \quad x^{-1}x = 1,$$
which hold for all nonzero numbers $x$. It is therefore natural to denote the matrix $B$ in (2.6.1) by $A^{-1}$ and to call it the inverse of $A$. The following definition introduces the appropriate terminology.

DEFINITION 2.6.2 Let $A$ be an $n \times n$ matrix. If there exists an $n \times n$ matrix $A^{-1}$ satisfying
$$AA^{-1} = A^{-1}A = I_n,$$
then we call $A^{-1}$ the matrix inverse to $A$, or just the inverse of $A$. We say that $A$ is invertible if $A^{-1}$ exists. Invertible matrices are sometimes called nonsingular, while matrices that are not invertible are sometimes called singular.

Remark It is important to realize that $A^{-1}$ denotes the matrix that satisfies $AA^{-1} = A^{-1}A = I_n$.
It does not mean $1/A$, which has no meaning whatsoever.

Example 2.6.3 If $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix}$, verify that $B = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}$ is the inverse of $A$.

Solution: By direct multiplication, we find that
$$AB = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3$$
and
$$BA = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 2 & -3 & 3 \\ 1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3.$$
Consequently, (2.6.1) is satisfied; hence $B$ is indeed the inverse of $A$. We therefore write
$$A^{-1} = \begin{bmatrix} 0 & -1 & 3 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.$$

We now return to the $n \times n$ system of equations (2.6.2).

Theorem 2.6.4 If $A^{-1}$ exists, then the $n \times n$ system of linear equations $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$ for every $\mathbf{b}$ in $\mathbb{R}^n$.

Proof We can verify by direct substitution that $\mathbf{x} = A^{-1}\mathbf{b}$ is indeed a solution to the linear system. The uniqueness of this solution is contained in the calculation leading from (2.6.2) to (2.6.3).

Our next theorem establishes when $A^{-1}$ exists, and it also uncovers an efficient method for computing $A^{-1}$.

Theorem 2.6.5 An $n \times n$ matrix $A$ is invertible if and only if $\operatorname{rank}(A) = n$.

Proof If $A^{-1}$ exists, then by Theorem 2.6.4, any $n \times n$ linear system $A\mathbf{x} = \mathbf{b}$ has a unique solution. Hence, Theorem 2.5.9 implies that $\operatorname{rank}(A) = n$.
Conversely, suppose $\operatorname{rank}(A) = n$. We must establish that there exists an $n \times n$ matrix $X$ satisfying $AX = I_n = XA$. Let $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ denote the column vectors of the identity matrix $I_n$. Since $\operatorname{rank}(A) = n$, Theorem 2.5.9 implies that each of the linear systems
$$A\mathbf{x}_i = \mathbf{e}_i, \quad i = 1, 2, \ldots, n \tag{2.6.6}$$
has a unique solution⁷ $\mathbf{x}_i$. Consequently, if we let $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]$, where $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are the unique solutions of the systems in (2.6.6), then
$$A[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = [A\mathbf{x}_1, A\mathbf{x}_2, \ldots, A\mathbf{x}_n] = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n];$$
that is,
$$AX = I_n. \tag{2.6.7}$$
We must also show that, for the same matrix $X$, $XA = I_n$. Postmultiplying both sides of (2.6.7) by $A$ yields $(AX)A = A$, that is, $A(XA) = A$. That is,
$$A(XA - I_n) = 0_n.$$
(2.6.8)

Now let $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ denote the column vectors of the $n \times n$ matrix $XA - I_n$. Equating corresponding column vectors on either side of (2.6.8) implies that
$$A\mathbf{y}_i = \mathbf{0}, \quad i = 1, 2, \ldots, n. \tag{2.6.9}$$
But, by assumption, $\operatorname{rank}(A) = n$, and so each system in (2.6.9) has a unique solution that, since the systems are homogeneous, must be the trivial solution. Consequently, each $\mathbf{y}_i$ is the zero vector, and thus $XA - I_n = 0_n$. Therefore,
$$XA = I_n. \tag{2.6.10}$$
Equations (2.6.7) and (2.6.10) imply that $X = A^{-1}$.

⁷ Notice that for an $n \times n$ system $A\mathbf{x} = \mathbf{b}$, if $\operatorname{rank}(A) = n$, then $\operatorname{rank}(A^\#) = n$.

We now have the following converse to Theorem 2.6.4.

Corollary 2.6.6 Let $A$ be an $n \times n$ matrix. If $A\mathbf{x} = \mathbf{b}$ has a unique solution for some column $n$-vector $\mathbf{b}$, then $A^{-1}$ exists.

Proof If $A\mathbf{x} = \mathbf{b}$ has a unique solution, then from Theorem 2.5.9, $\operatorname{rank}(A) = n$, and so from the previous theorem, $A^{-1}$ exists.

Remark In particular, the above corollary tells us that if the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$, then $A^{-1}$ exists.

Other criteria for deciding whether or not an $n \times n$ matrix $A$ has an inverse will be developed in the next three chapters, but our goal at present is to develop a method for finding $A^{-1}$, should it exist. Assuming that $\operatorname{rank}(A) = n$, let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ denote the column vectors of $A^{-1}$. Then, from (2.6.6), these column vectors can be obtained by solving each of the $n \times n$ systems
$$A\mathbf{x}_i = \mathbf{e}_i, \quad i = 1, 2, \ldots, n.$$
As we now show, some computation can be saved if we employ the Gauss-Jordan method in solving these systems. We first illustrate the method when $n = 3$. In this case, from (2.6.6), the column vectors of $A^{-1}$ are determined by solving the three linear systems
$$A\mathbf{x}_1 = \mathbf{e}_1, \quad A\mathbf{x}_2 = \mathbf{e}_2, \quad A\mathbf{x}_3 = \mathbf{e}_3.$$
The augmented matrices of these systems can be written as
$$[A \mid \mathbf{e}_1], \quad [A \mid \mathbf{e}_2], \quad [A \mid \mathbf{e}_3],$$
respectively.
Furthermore, since $\operatorname{rank}(A) = 3$ by assumption, the reduced row-echelon form of $A$ is $I_3$. Consequently, using elementary row operations to reduce the augmented matrix of the first system to reduced row-echelon form will yield, schematically (where $\sim \cdots \sim$ denotes a sequence of elementary row operations),
$$[A \mid \mathbf{e}_1] \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{array}\right],$$
which implies that the first column vector of $A^{-1}$ is
$$\mathbf{x}_1 = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}.$$
Similarly, for the second system, the reduction
$$[A \mid \mathbf{e}_2] \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & b_1 \\ 0 & 1 & 0 & b_2 \\ 0 & 0 & 1 & b_3 \end{array}\right]$$
implies that the second column vector of $A^{-1}$ is
$$\mathbf{x}_2 = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$
Finally, for the third system, the reduction
$$[A \mid \mathbf{e}_3] \sim \cdots \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & c_1 \\ 0 & 1 & 0 & c_2 \\ 0 & 0 & 1 & c_3 \end{array}\right]$$
implies that the third column vector of $A^{-1}$ is
$$\mathbf{x}_3 = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}.$$
Consequently,
$$A^{-1} = [\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3] = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}.$$
The key point to notice is that in solving for $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ we use the same elementary row operations to reduce $A$ to $I_3$. We can therefore save a significant amount of work by combining the foregoing operations as follows:
$$[A \mid I_3] \sim \cdots \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & a_1 & b_1 & c_1 \\ 0 & 1 & 0 & a_2 & b_2 & c_2 \\ 0 & 0 & 1 & a_3 & b_3 & c_3 \end{array}\right].$$
The generalization to the $n \times n$ case is immediate. We form the $n \times 2n$ matrix $[A \mid I_n]$ and reduce $A$ to $I_n$ using elementary row operations. Schematically,
$$[A \mid I_n] \sim \cdots \sim [I_n \mid A^{-1}].$$
This method of finding $A^{-1}$ is called the Gauss-Jordan technique.

Remark Notice that if we are given an $n \times n$ matrix $A$, we likely will not know from the outset whether $\operatorname{rank}(A) = n$; hence we will not know whether $A^{-1}$ exists. However, if at any stage in the row reduction of $[A \mid I_n]$ we find that $\operatorname{rank}(A) < n$, then it will follow from Theorem 2.6.5 that $A$ is not invertible.

Example 2.6.7 Find $A^{-1}$ if $A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \\ 3 & 5 & -1 \end{bmatrix}$.

Solution: Using the Gauss-Jordan technique, we proceed as follows.
$$\left[\begin{array}{ccc|ccc} 1 & 1 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 3 & 5 & -1 & 0 & 0 & 1 \end{array}\right] \overset{1}{\sim} \left[\begin{array}{ccc|ccc} 1 & 1 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 2 & -10 & -3 & 0 & 1 \end{array}\right] \overset{2}{\sim} \left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & -14 & -3 & -2 & 1 \end{array}\right]$$
$$\overset{3}{\sim} \left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & \tfrac{3}{14} & \tfrac{1}{7} & -\tfrac{1}{14} \end{array}\right] \overset{4}{\sim} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac{11}{14} & -\tfrac{8}{7} & \tfrac{1}{14} \\ 0 & 1 & 0 & -\tfrac{3}{7} & \tfrac{5}{7} & \tfrac{1}{7} \\ 0 & 0 & 1 & \tfrac{3}{14} & \tfrac{1}{7} & -\tfrac{1}{14} \end{array}\right].$$
Thus,
$$A^{-1} = \begin{bmatrix} \tfrac{11}{14} & -\tfrac{8}{7} & \tfrac{1}{14} \\ -\tfrac{3}{7} & \tfrac{5}{7} & \tfrac{1}{7} \\ \tfrac{3}{14} & \tfrac{1}{7} & -\tfrac{1}{14} \end{bmatrix}.$$
We leave it as an exercise to confirm that $AA^{-1} = A^{-1}A = I_3$.
1. $A_{13}(-3)$ 2. $A_{21}(-1)$, $A_{23}(-2)$ 3. $M_3(-\tfrac{1}{14})$ 4. $A_{31}(-1)$, $A_{32}(-2)$

Example 2.6.8 Continuing the previous example, use $A^{-1}$ to solve the system
$x_1 + x_2 + 3x_3 = 2$, $x_2 + 2x_3 = 1$, $3x_1 + 5x_2 - x_3 = 4$.

Solution: The system can be written as $A\mathbf{x} = \mathbf{b}$, where $A$ is the matrix in the previous example, and
$$\mathbf{b} = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}.$$
Since $A$ is invertible, the system has a unique solution that can be written as $\mathbf{x} = A^{-1}\mathbf{b}$. Thus, from the previous example we have
$$\mathbf{x} = \begin{bmatrix} \tfrac{11}{14} & -\tfrac{8}{7} & \tfrac{1}{14} \\ -\tfrac{3}{7} & \tfrac{5}{7} & \tfrac{1}{7} \\ \tfrac{3}{14} & \tfrac{1}{7} & -\tfrac{1}{14} \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} \tfrac{5}{7} \\ \tfrac{3}{7} \\ \tfrac{2}{7} \end{bmatrix}.$$
Consequently, $x_1 = \tfrac{5}{7}$, $x_2 = \tfrac{3}{7}$, and $x_3 = \tfrac{2}{7}$, so that the solution to the system is $\left(\tfrac{5}{7}, \tfrac{3}{7}, \tfrac{2}{7}\right)$.

We now return to more theoretical information pertaining to the inverse of a matrix.

Properties of the Inverse

The inverse of an $n \times n$ matrix satisfies the properties stated in the following theorem, which should be committed to memory:

Theorem 2.6.9 Let $A$ and $B$ be invertible $n \times n$ matrices. Then
1. $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
2. $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
3. $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.

Proof The proof of each result consists of verifying that the appropriate matrix products yield the identity matrix.
1. We must verify that $A^{-1}A = I_n$ and $AA^{-1} = I_n$. Both of these follow directly from Definition 2.6.2.
2. We must verify that $(AB)(B^{-1}A^{-1}) = I_n$ and $(B^{-1}A^{-1})(AB) = I_n$. We establish the first equality, leaving the second as an exercise. We have
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n.$$
3. We must verify that $A^T(A^{-1})^T = I_n$ and $(A^{-1})^TA^T = I_n$. Again, we prove the first part, leaving the second part as an exercise.
First recall from Theorem 2.2.21 that $A^TB^T = (BA)^T$. Using this property with $B = A^{-1}$ yields
$$A^T(A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n.$$

The proof of property 2 of Theorem 2.6.9 can easily be extended to a statement about invertibility of a product of an arbitrary finite number of matrices. More precisely, we have the following.

Corollary 2.6.10 Let $A_1, A_2, \ldots, A_k$ be invertible $n \times n$ matrices. Then $A_1A_2\cdots A_k$ is invertible, and
$$(A_1A_2\cdots A_k)^{-1} = A_k^{-1}A_{k-1}^{-1}\cdots A_1^{-1}.$$

Proof The proof is left as an exercise (Problem 28).

Some Further Theoretical Results

Finally, in this section, we establish two results that will be required in Section 2.7 and also in a proof that arises in Section 3.2.

Theorem 2.6.11 Let $A$ and $B$ be $n \times n$ matrices. If $AB = I_n$, then both $A$ and $B$ are invertible and $B = A^{-1}$.

Proof Let $\mathbf{b}$ be an arbitrary column $n$-vector. Then, since $AB = I_n$, we have
$$A(B\mathbf{b}) = I_n\mathbf{b} = \mathbf{b}.$$
Consequently, for every $\mathbf{b}$, the system $A\mathbf{x} = \mathbf{b}$ has the solution $\mathbf{x} = B\mathbf{b}$. But this implies that $\operatorname{rank}(A) = n$. To see why, suppose that $\operatorname{rank}(A) < n$, and let $A^*$ denote a row-echelon form of $A$. Note that, since $\operatorname{rank}(A) < n$, the last row of $A^*$ is zero. Choose $\mathbf{b}^*$ to be any column $n$-vector whose last component is nonzero. Then it follows that the system $A^*\mathbf{x} = \mathbf{b}^*$ is inconsistent. But, applying to the augmented matrix $[A^* \mid \mathbf{b}^*]$ the inverses of the row operations that reduced $A$ to row-echelon form (in reverse order) yields $[A \mid \mathbf{b}]$ for some $\mathbf{b}$. Since $A\mathbf{x} = \mathbf{b}$ has the same solution set as $A^*\mathbf{x} = \mathbf{b}^*$, it follows that $A\mathbf{x} = \mathbf{b}$ is inconsistent. We therefore have a contradiction, and so it must be the case that $\operatorname{rank}(A) = n$, and therefore that $A$ is invertible by Theorem 2.6.5. We now establish that⁸ $A^{-1} = B$. Since $AB = I_n$ by assumption, we have
$$A^{-1} = A^{-1}I_n = A^{-1}(AB) = (A^{-1}A)B = I_nB = B,$$
as required. It now follows directly from property 1 of Theorem 2.6.9 that $B$ is invertible with inverse $A$.

Corollary 2.6.12 Let $A$ and $B$ be $n \times n$ matrices.
If $AB$ is invertible, then both $A$ and $B$ are invertible.

Proof If we let $C = B(AB)^{-1}$ and $D = AB$, then $AC = AB(AB)^{-1} = DD^{-1} = I_n$. It follows from Theorem 2.6.11 that $A$ is invertible. Similarly, if we let $C = (AB)^{-1}A$, then $CB = (AB)^{-1}AB = I_n$. Once more we can apply Theorem 2.6.11 to conclude that $B$ is invertible.

⁸ Note that it now makes sense to speak of $A^{-1}$, whereas prior to proving in the preceding paragraph that $A$ is invertible, it would not have been legal to use the notation $A^{-1}$.

Exercises for 2.6

Key Terms
Inverse, Invertible, Singular, Nonsingular, Gauss-Jordan technique.

Skills
• Be able to find the inverse of an invertible matrix via the Gauss-Jordan technique.
• Be able to use the inverse of a coefficient matrix of a linear system in order to solve the system.
• Be able to check directly whether or not two matrices $A$ and $B$ are inverses of each other.
• Know the basic properties related to how the inverse operation behaves with respect to itself, multiplication, and transpose (Theorem 2.6.9).

True-False Review
For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. An invertible matrix is also known as a singular matrix.
2. Every square matrix that does not contain a row of zeros is invertible.
3. A linear system $A\mathbf{x} = \mathbf{b}$ with an $n \times n$ invertible coefficient matrix $A$ has a unique solution.
4. If $A$ is a matrix such that there exists a matrix $B$ with $AB = I_n$, then $A$ is invertible.
5. If $A$ and $B$ are invertible $n \times n$ matrices, then so is $A + B$.
6. If $A$ and $B$ are invertible $n \times n$ matrices, then so is $AB$.
7. If $A$ is an invertible matrix such that $A^2 = A$, then $A$ is the identity matrix.
8.
If $A$ is an $n \times n$ invertible matrix and $B$ and $C$ are $n \times n$ matrices such that $AB = AC$, then $B = C$.
9. If $A$ is a $5 \times 5$ matrix of rank 4, then $A$ is not invertible.
10. If $A$ is a $6 \times 6$ matrix of rank 6, then $A$ is invertible.

Problems

For Problems 1–3 verify by direct multiplication that the given matrices are inverses of one another.
1. $A = \begin{bmatrix} 2 & -1 \\ 3 & -1 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} -1 & 1 \\ -3 & 2 \end{bmatrix}$.
2. $A = \begin{bmatrix} 4 & 9 \\ 3 & 7 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 7 & -9 \\ -3 & 4 \end{bmatrix}$.
3. $A = \begin{bmatrix} 3 & 5 & 1 \\ 1 & 2 & 1 \\ 2 & 6 & 7 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 8 & -29 & 3 \\ -5 & 19 & -2 \\ 2 & -8 & 1 \end{bmatrix}$.

For Problems 4–16, determine $A^{-1}$, if possible, using the Gauss-Jordan method. If $A^{-1}$ exists, check your answer by verifying that $AA^{-1} = I_n$.
4. $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$.
5. $A = \begin{bmatrix} 1 & 1 + i \\ 1 - i & 1 \end{bmatrix}$.
6. $A = \begin{bmatrix} 1 & -i \\ -1 + i & 2 \end{bmatrix}$.
7. $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$.
8. $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 11 \\ 4 & -3 & 10 \end{bmatrix}$.
9. $A = \begin{bmatrix} 3 & 5 & 1 \\ 1 & 2 & 1 \\ 2 & 6 & 7 \end{bmatrix}$.
10. $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix}$.
11. $A = \begin{bmatrix} 4 & 2 & -13 \\ 2 & 1 & -7 \\ 3 & 2 & 4 \end{bmatrix}$.
12. $A = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 6 & -2 \\ -1 & 1 & 4 \end{bmatrix}$.
13. $A = \begin{bmatrix} 1 & i & 2 \\ 1 + i & -1 & 2i \\ 2 & 2i & 5 \end{bmatrix}$.
14. $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 2 \\ 3 & 3 & 4 \end{bmatrix}$.
15. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 2 & 0 & 3 & -4 \\ 3 & -1 & 7 & 8 \\ 1 & 0 & 3 & 5 \end{bmatrix}$.
16. $A = \begin{bmatrix} 0 & -2 & -1 & -3 \\ 2 & 0 & 2 & 1 \\ 1 & -2 & 0 & 2 \\ 3 & -1 & -2 & 0 \end{bmatrix}$.
17. Let $A = \begin{bmatrix} 2 & -1 & 4 \\ 5 & 1 & 2 \\ 1 & -1 & 3 \end{bmatrix}$. Find the second column vector of $A^{-1}$ without determining the whole inverse.

For Problems 18–22, use $A^{-1}$ to find the solution to the given system.
18. $x_1 + 3x_2 = 1$, $2x_1 + 5x_2 = 3$.
19. $x_1 + x_2 - 2x_3 = -2$, $x_2 + x_3 = 3$, $2x_1 + 4x_2 - 3x_3 = 1$.
20. $x_1 - 2ix_2 = 2$, $(2 - i)x_1 + 4ix_2 = -i$.
21. $3x_1 + 4x_2 + 5x_3 = 1$, $2x_1 + 10x_2 + x_3 = 1$, $4x_1 + x_2 + 8x_3 = 1$.
22. $x_1 + x_2 + 2x_3 = 12$, $x_1 + 2x_2 - x_3 = 24$, $2x_1 - x_2 + x_3 = -36$.

An $n \times n$ matrix $A$ is called orthogonal if $A^T = A^{-1}$. For Problems 23–26, show that the given matrices are orthogonal.
23. $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.
24. $A = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix}$.
25. $A = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}$.
26. $A = \dfrac{1}{1 + 2x^2}\begin{bmatrix} 1 & -2x & 2x^2 \\ 2x & 1 - 2x^2 & -2x \\ 2x^2 & 2x & 1 \end{bmatrix}$.
27. Complete the proof of Theorem 2.6.9 by verifying the remaining properties in parts 2 and 3.
28. Prove Corollary 2.6.10.

For Problems 29–30, use properties of the inverse to prove the given statement.
29. If $A$ is an $n \times n$ invertible symmetric matrix, then $A^{-1}$ is symmetric.
30. If $A$ is an $n \times n$ invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.
31. Let $A$ be an $n \times n$ matrix with $A^4 = 0$. Prove that $I_n - A$ is invertible with $(I_n - A)^{-1} = I_n + A + A^2 + A^3$.
32. Prove that if $A$, $B$, $C$ are $n \times n$ matrices satisfying $BA = I_n$ and $AC = I_n$, then $B = C$.
33. If $A$, $B$, $C$ are $n \times n$ matrices satisfying $BA = I_n$ and $CA = I_n$, does it follow that $B = C$? Justify your answer.
34. Consider the general $2 \times 2$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
and let $\Delta = a_{11}a_{22} - a_{12}a_{21}$ with $a_{11} \neq 0$. Show that if $\Delta \neq 0$, then
$$A^{-1} = \frac{1}{\Delta}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.$$
The quantity $\Delta$ defined above is referred to as the determinant of $A$. We will investigate determinants in more detail in the next chapter.
35. Let $A$ be an $n \times n$ matrix, and suppose that we have to solve the $p$ linear systems
$$A\mathbf{x}_i = \mathbf{b}_i, \quad i = 1, 2, \ldots, p,$$
where the $\mathbf{b}_i$ are given. Devise an efficient method for solving these systems.
36. Use your method from the previous problem to solve the three linear systems $A\mathbf{x}_i = \mathbf{b}_i$, $i = 1, 2, 3$, if
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 1 & 1 & 6 \end{bmatrix}, \quad \mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} -1 \\ 2 \\ 5 \end{bmatrix}, \quad \mathbf{b}_3 = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}.$$
37. Let $A$ be an $m \times n$ matrix with $m \leq n$.
(a) If $\operatorname{rank}(A) = m$, prove that there exists a matrix $B$ satisfying $AB = I_m$. Such a matrix is called a right inverse of $A$.
(b) If $A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 7 & 4 \end{bmatrix}$, determine all right inverses of $A$.

For Problems 38–39, reduce the matrix $[A \mid I_n]$ to reduced row-echelon form and thereby determine, if possible, the inverse of $A$.
38. $A = \begin{bmatrix} 5 & 9 & 17 \\ 7 & 21 & 13 \\ 27 & 16 & 8 \end{bmatrix}$.
39. $A$ is a randomly generated $4 \times 4$ matrix.

For Problems 40–42, use built-in functions of some form of technology to determine $\operatorname{rank}(A)$ and, if possible, $A^{-1}$.
40. $A = \begin{bmatrix} 3 & 5 & -7 \\ 2 & 5 & 9 \\ 13 & -11 & 22 \end{bmatrix}$.
41. $A = \begin{bmatrix} 7 & 13 & 15 & 21 \\ 9 & -2 & 14 & 23 \\ 17 & -27 & 22 & 31 \\ 19 & -42 & 21 & 33 \end{bmatrix}$.
42. $A$ is a randomly generated $5 \times 5$ matrix.
43. For the system in Problem 21, determine $A^{-1}$ and use it to solve the system.
44. Consider the $n \times n$ Hilbert matrix
$$H_n = \left[\frac{1}{i + j - 1}\right], \quad 1 \leq i, j \leq n.$$
(a) Determine $H_4$ and show that it is invertible.
(b) Find $H_4^{-1}$ and use it to solve $H_4\mathbf{x} = \mathbf{b}$ if $\mathbf{b} = [2, -1, 3, 5]^T$.

2.7 Elementary Matrices and the LU Factorization

We now introduce some matrices that can be used to perform elementary row operations on a matrix. Although they are of limited computational use, they do play a significant role in linear algebra and its applications.

DEFINITION 2.7.1 Any matrix obtained by performing a single elementary row operation on the identity matrix is called an elementary matrix.

In particular, an elementary matrix is always a square matrix. In general we will denote elementary matrices by $E$. If we are describing a specific elementary matrix, then in keeping with the notation introduced previously for elementary row operations, we will use the following notation for the three types of elementary matrices:
Type 1: $P_{ij}$: permute rows $i$ and $j$ in $I_n$.
Type 2: $M_i(k)$: multiply row $i$ of $I_n$ by the nonzero scalar $k$.
Type 3: $A_{ij}(k)$: add $k$ times row $i$ of $I_n$ to row $j$ of $I_n$.

Example 2.7.2 Write all $2 \times 2$ elementary matrices.

Solution: From Definition 2.7.1 and using the notation introduced above, we have
1. Permutation matrix: $P_{12} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
2. Scaling matrices: $M_1(k) = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$, $M_2(k) = \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$.
3. Row combinations: $A_{12}(k) = \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$, $A_{21}(k) = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$.

We leave it as an exercise to verify that the $n \times n$ elementary matrices have the following structure:
$P_{ij}$: ones along the main diagonal except in the $(i, i)$ and $(j, j)$ positions, ones in the $(i, j)$ and $(j, i)$ positions, and zeros elsewhere.
$M_i(k)$: the diagonal matrix $\operatorname{diag}(1, 1, \ldots, k, \ldots, 1)$, where $k$ appears in the $(i, i)$ position.
$A_{ij}(k)$: ones along the main diagonal, $k$ in the $(j, i)$ position, and zeros elsewhere.
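The structure of the three types of elementary matrices, and the claim that premultiplication by them performs row operations, can be checked mechanically. The following is a minimal Python sketch using exact `Fraction` arithmetic from the standard library, with rows numbered from 1 as in the text. The helper names `identity`, `perm`, `scale`, `add_multiple`, and `matmul` are our own illustrative choices, not notation from the text. The sketch applies $P_{12}$, then $A_{12}(-2)$, then $M_2(-1/5)$ to the matrix of Example 2.7.4.

```python
from fractions import Fraction


def identity(n):
    """Return the n x n identity matrix with exact Fraction entries."""
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]


def perm(n, i, j):
    """Type 1, P_ij: permute rows i and j of I_n (rows numbered from 1)."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def scale(n, i, k):
    """Type 2, M_i(k): multiply row i of I_n by the nonzero scalar k."""
    if k == 0:
        raise ValueError("k must be nonzero")
    E = identity(n)
    E[i - 1][i - 1] = Fraction(k)
    return E


def add_multiple(n, i, j, k):
    """Type 3, A_ij(k): add k times row i of I_n to row j, so k sits at (j, i)."""
    E = identity(n)
    E[j - 1][i - 1] = Fraction(k)
    return E


def matmul(X, Y):
    """Ordinary matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))]
            for r in range(len(X))]


# Matrix of Example 2.7.4: premultiply by P_12, then A_12(-2), then M_2(-1/5).
A = [[2, 3], [1, 4]]
U = matmul(scale(2, 2, Fraction(-1, 5)),
           matmul(add_multiple(2, 1, 2, -2),
                  matmul(perm(2, 1, 2), A)))
print(U == [[1, 4], [0, 1]])  # True
```

Exact fractions are used so that entries such as $-\tfrac{1}{5}$ are reproduced without floating-point rounding; in practice one would perform the row operation directly rather than form $E$ and multiply, which is why elementary matrices are described above as being of limited computational use.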
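A key point to note about elementary matrices is the following:

Premultiplying an $n \times p$ matrix $A$ by an $n \times n$ elementary matrix $E$ has the effect of performing the corresponding elementary row operation on $A$.

Rather than proving this statement, which we leave as an exercise, we illustrate with an example.

Example 2.7.3 If $A = \begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix}$, then, for example,
$$M_1(k)A = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix} = \begin{bmatrix} 3k & -k & 4k \\ 2 & 7 & 5 \end{bmatrix}.$$
Similarly,
$$A_{21}(k)A = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 3 & -1 & 4 \\ 2 & 7 & 5 \end{bmatrix} = \begin{bmatrix} 3 + 2k & -1 + 7k & 4 + 5k \\ 2 & 7 & 5 \end{bmatrix}.$$

Since elementary row operations can be performed on a matrix by premultiplication by an appropriate elementary matrix, it follows that any matrix $A$ can be reduced to row-echelon form by multiplication by a sequence of elementary matrices. Schematically we can therefore write
$$E_kE_{k-1}\cdots E_2E_1A = U,$$
where $U$ denotes a row-echelon form of $A$ and the $E_i$ are elementary matrices.

Example 2.7.4 Determine elementary matrices that reduce $A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$ to row-echelon form.

Solution: We can reduce $A$ to row-echelon form using the following sequence of elementary row operations:
$$\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 1 & 4 \\ 0 & -5 \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}.$$
1. $P_{12}$ 2. $A_{12}(-2)$ 3. $M_2(-\tfrac{1}{5})$
Consequently,
$$M_2(-\tfrac{1}{5})A_{12}(-2)P_{12}A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix},$$
which we can verify by direct multiplication:
$$M_2(-\tfrac{1}{5})A_{12}(-2)P_{12}A = \begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{5} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{5} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 0 \\ 0 & -\tfrac{1}{5} \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 0 & -5 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}.$$

Since any elementary row operation is reversible, it follows that each elementary matrix is invertible. Indeed, in the $2 \times 2$ case it is easy to see that
$$P_{12}^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad M_1(k)^{-1} = \begin{bmatrix} 1/k & 0 \\ 0 & 1 \end{bmatrix}, \quad M_2(k)^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1/k \end{bmatrix},$$
$$A_{12}(k)^{-1} = \begin{bmatrix} 1 & 0 \\ -k & 1 \end{bmatrix}, \quad A_{21}(k)^{-1} = \begin{bmatrix} 1 & -k \\ 0 & 1 \end{bmatrix}.$$
We leave it as an exercise to verify that in the $n \times n$ case, we have
$$P_{ij}^{-1} = P_{ij}, \quad M_i(k)^{-1} = M_i(1/k), \quad A_{ij}(k)^{-1} = A_{ij}(-k).$$
Now consider an invertible $n \times n$ matrix $A$.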
Since the unique reduced row-echelon form of such a matrix is the identity matrix $I_n$, it follows from the preceding discussion that there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that
$$E_kE_{k-1}\cdots E_2E_1A = I_n.$$
But this implies (by Theorem 2.6.11) that
$$A^{-1} = E_kE_{k-1}\cdots E_2E_1,$$
and hence,
$$A = (A^{-1})^{-1} = (E_k\cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1}\cdots E_k^{-1},$$
which is a product of elementary matrices, since the inverse of an elementary matrix is again an elementary matrix. So any invertible matrix is a product of elementary matrices. Conversely, since elementary matrices are invertible, a product of elementary matrices is a product of invertible matrices, hence is invertible by Corollary 2.6.10. Therefore, we have established the following.

Theorem 2.7.5 Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if $A$ is a product of elementary matrices.

The LU Decomposition of an Invertible Matrix⁹

For the remainder of this section, we restrict our attention to invertible $n \times n$ matrices. In reducing such a matrix to row-echelon form, we have always placed leading ones on the main diagonal in order that we obtain a row-echelon matrix. We now lift the requirement that the main diagonal of the row-echelon form contain ones. As a consequence, the matrix that results from row reduction will be an upper triangular matrix but will not necessarily be in row-echelon form. Furthermore, reduction to such an upper triangular form can be accomplished without the use of Type 2 row operations.

⁹ The material in the remainder of this section is not used elsewhere in the text.

Example 2.7.6 Use elementary row operations to reduce the matrix
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix}$$
to upper triangular form.

Solution: The given matrix can be reduced to upper triangular form using the following sequence of elementary row operations:
$$\begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix} \overset{1}{\sim} \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & \tfrac{9}{2} & \tfrac{5}{2} \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix}.$$
1. $A_{12}(-\tfrac{3}{2})$, $A_{13}(\tfrac{1}{2})$ 2. $A_{23}(\tfrac{9}{13})$

When using elementary row operations of Type 3, the multiple of a specific row that is subtracted from row $i$ to put a zero in the $(i, j)$ position is called a multiplier and denoted $m_{ij}$. Thus, in the preceding example, there are three multipliers, namely,
$$m_{21} = \tfrac{3}{2}, \quad m_{31} = -\tfrac{1}{2}, \quad m_{32} = -\tfrac{9}{13}.$$
The multipliers will be used in the forthcoming discussion.

In Example 2.7.6 we were able to reduce $A$ to upper triangular form using only row operations of Type 3. This is not always the case. For example, the matrix
$$\begin{bmatrix} 0 & 5 \\ 3 & 2 \end{bmatrix}$$
requires that the two rows be permuted to obtain an upper triangular form. For the moment, however, we will restrict our attention to invertible matrices $A$ for which the reduction to upper triangular form can be accomplished without permuting rows. In this case, we can therefore reduce $A$ to upper triangular form using row operations of Type 3 only. Furthermore, throughout the reduction process, we can restrict ourselves to Type 3 operations that add multiples of a row to rows beneath that row, by simply performing the row operations column by column, from left to right. According to our description of the elementary matrices $A_{ij}(k)$, our reduction process therefore uses only elementary matrices that are unit lower triangular. More specifically, in terms of elementary matrices we have
$$E_kE_{k-1}\cdots E_2E_1A = U,$$
where $E_k, E_{k-1}, \ldots, E_2, E_1$ are unit lower triangular Type 3 elementary matrices and $U$ is an upper triangular matrix. Since each elementary matrix is invertible, we can write the preceding equation as
$$A = E_1^{-1}E_2^{-1}\cdots E_k^{-1}U. \tag{2.7.1}$$
But, as we have already argued, each of the elementary matrices in (2.7.1) is a unit lower triangular matrix, and we know from Corollary 2.2.23 that the product of two unit lower triangular matrices is also a unit lower triangular matrix.
Consequently, (2.7.1) can be written as
$$A = LU, \tag{2.7.2}$$
where
$$L = E_1^{-1}E_2^{-1}\cdots E_k^{-1} \tag{2.7.3}$$
is a unit lower triangular matrix and $U$ is an upper triangular matrix. Equation (2.7.2) is referred to as the LU factorization of $A$. It can be shown (Problem 29) that this LU factorization is unique.

Example 2.7.7 Determine the LU factorization of the matrix
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 3 & 1 & -2 \\ -1 & 2 & 1 \end{bmatrix}.$$

Solution: Using the results of Example 2.7.6, we can write
$$E_3E_2E_1A = \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix},$$
where
$$E_1 = A_{12}(-\tfrac{3}{2}), \quad E_2 = A_{13}(\tfrac{1}{2}), \quad E_3 = A_{23}(\tfrac{9}{13}).$$
Therefore,
$$U = \begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix}$$
and from (2.7.3),
$$L = E_1^{-1}E_2^{-1}E_3^{-1}. \tag{2.7.4}$$
Computing the inverses of the elementary matrices, we have
$$E_1^{-1} = A_{12}(\tfrac{3}{2}), \quad E_2^{-1} = A_{13}(-\tfrac{1}{2}), \quad E_3^{-1} = A_{23}(-\tfrac{9}{13}).$$
Substituting these results into (2.7.4) yields
$$L = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\tfrac{1}{2} & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\tfrac{9}{13} & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{9}{13} & 1 \end{bmatrix}.$$
Consequently,
$$A = \begin{bmatrix} 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{9}{13} & 1 \end{bmatrix}\begin{bmatrix} 2 & 5 & 3 \\ 0 & -\tfrac{13}{2} & -\tfrac{13}{2} \\ 0 & 0 & -2 \end{bmatrix},$$
which is easily verified by a matrix multiplication.

Computing the lower triangular matrix $L$ in the LU factorization of $A$ using (2.7.3) can require a significant amount of work. However, if we look carefully at the matrix $L$ in Example 2.7.7, we see that the elements beneath the leading diagonal are just the corresponding multipliers. That is, if $l_{ij}$ denotes the $(i, j)$ element of the matrix $L$, then
$$l_{ij} = m_{ij}, \quad i > j. \tag{2.7.5}$$
Furthermore, it can be shown that this relationship holds in general. Consequently, we do not need to use (2.7.3) to obtain $L$. Instead we use row operations of Type 3 to reduce $A$ to upper triangular form, and then we can use (2.7.5) to obtain $L$ directly.

Example 2.7.8 Determine the LU decomposition for the matrix
$$A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 5 & -1 & 2 & 1 \\ 3 & 2 & 6 & -5 \\ -1 & 1 & 3 & 2 \end{bmatrix}.$$

Solution: To determine $U$, we reduce $A$ to upper triangular form using only row operations of Type 3 in which we add multiples of a given row only to rows below the given row.
$$A \overset{1}{\sim} \begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & \tfrac{13}{2} & \tfrac{9}{2} & -8 \\ 0 & -\tfrac{1}{2} & \tfrac{7}{2} & 3 \end{bmatrix} \overset{2}{\sim} \begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & \tfrac{45}{13} & \tfrac{35}{13} \end{bmatrix} \overset{3}{\sim} \begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & 0 & \tfrac{71}{13} \end{bmatrix} = U.$$
Row operations and corresponding multipliers:
(1) $A_{12}(-\tfrac{5}{2})$, $A_{13}(-\tfrac{3}{2})$, $A_{14}(\tfrac{1}{2})$: $m_{21} = \tfrac{5}{2}$, $m_{31} = \tfrac{3}{2}$, $m_{41} = -\tfrac{1}{2}$.
(2) $A_{23}(-1)$, $A_{24}(\tfrac{1}{13})$: $m_{32} = 1$, $m_{42} = -\tfrac{1}{13}$.
(3) $A_{34}(-\tfrac{9}{13})$: $m_{43} = \tfrac{9}{13}$.
Consequently, from (2.7.5),
$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \tfrac{5}{2} & 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{1}{13} & \tfrac{9}{13} & 1 \end{bmatrix}.$$
We leave it as an exercise to verify that $LU = A$.

The question undoubtedly in the reader's mind is: What is the use of the LU decomposition? In order to answer this question, consider the $n \times n$ system of linear equations $A\mathbf{x} = \mathbf{b}$, where $A = LU$. If we write the system as
$$LU\mathbf{x} = \mathbf{b}$$
and let $U\mathbf{x} = \mathbf{y}$, then solving $A\mathbf{x} = \mathbf{b}$ is equivalent to solving the pair of equations
$$L\mathbf{y} = \mathbf{b}, \qquad U\mathbf{x} = \mathbf{y}.$$
Owing to the triangular form of each of the coefficient matrices $L$ and $U$, these systems can be solved easily: the first by "forward" substitution and the second by back substitution. In the case when we have a single right-hand-side vector $\mathbf{b}$, the LU factorization for solving the system has no advantage over Gaussian elimination. However, if we require the solution of several systems of equations with the same coefficient matrix $A$, say
$$A\mathbf{x}_i = \mathbf{b}_i, \quad i = 1, 2, \ldots, p,$$
then it is more efficient to compute the LU factorization of $A$ once, and then successively solve the triangular systems
$$L\mathbf{y}_i = \mathbf{b}_i, \quad U\mathbf{x}_i = \mathbf{y}_i, \quad i = 1, 2, \ldots, p.$$

Example 2.7.9 Use the LU decomposition of
$$A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 5 & -1 & 2 & 1 \\ 3 & 2 & 6 & -5 \\ -1 & 1 & 3 & 2 \end{bmatrix}$$
to solve the system $A\mathbf{x} = \mathbf{b}$ if $\mathbf{b} = \begin{bmatrix} 2 \\ -3 \\ 5 \\ 7 \end{bmatrix}$.

Solution: We have shown in the previous example that $A = LU$ where
$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \tfrac{5}{2} & 1 & 0 & 0 \\ \tfrac{3}{2} & 1 & 1 & 0 \\ -\tfrac{1}{2} & -\tfrac{1}{13} & \tfrac{9}{13} & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 0 & \tfrac{13}{2} & -\tfrac{1}{2} & -4 \\ 0 & 0 & 5 & -4 \\ 0 & 0 & 0 & \tfrac{71}{13} \end{bmatrix}.$$
We now solve the two triangular systems $L\mathbf{y} = \mathbf{b}$ and $U\mathbf{x} = \mathbf{y}$. Using forward substitution on the first of these systems, we have
$$y_1 = 2,$$
$$y_2 = -3 - \tfrac{5}{2}y_1 = -8,$$
$$y_3 = 5 - \tfrac{3}{2}y_1 - y_2 = 5 - 3 + 8 = 10,$$
$$y_4 = 7 + \tfrac{1}{2}y_1 + \tfrac{1}{13}y_2 - \tfrac{9}{13}y_3 = 8 - \tfrac{8}{13} - \tfrac{90}{13} = \tfrac{6}{13}.$$
Solving $U\mathbf{x} = \mathbf{y}$ via back substitution yields
$$x_4 = \tfrac{13}{71}y_4 = \tfrac{6}{71},$$
$$x_3 = \tfrac{1}{5}(y_3 + 4x_4) = \tfrac{1}{5}\left(10 + \tfrac{24}{71}\right) = \tfrac{734}{355},$$
$$x_2 = \tfrac{2}{13}\left(y_2 + \tfrac{1}{2}x_3 + 4x_4\right) = \tfrac{2}{13}\left(-8 + \tfrac{367}{355} + \tfrac{24}{71}\right) = -\tfrac{362}{355},$$
$$x_1 = \tfrac{1}{2}(y_1 + 3x_2 - x_3 - 2x_4) = \tfrac{1}{2}\left(2 - \tfrac{1086}{355} - \tfrac{734}{355} - \tfrac{12}{71}\right) = -\tfrac{117}{71}.$$
Consequently,
$$\mathbf{x} = \left(-\tfrac{117}{71}, -\tfrac{362}{355}, \tfrac{734}{355}, \tfrac{6}{71}\right).$$

In the more general case when row interchanges are required to reduce an invertible matrix $A$ to upper triangular form, it can be shown that $A$ has a factorization of the form
$$A = PLU, \tag{2.7.6}$$
where $P$ is an appropriate product of elementary permutation matrices, $L$ is a unit lower triangular matrix, and $U$ is an upper triangular matrix. From the properties of the elementary permutation matrices, it follows (see Problem 27) that $P^{-1} = P^T$. Using (2.7.6), the linear system $A\mathbf{x} = \mathbf{b}$ can be written as $PLU\mathbf{x} = \mathbf{b}$, or equivalently, $LU\mathbf{x} = P^T\mathbf{b}$. Consequently, to solve $A\mathbf{x} = \mathbf{b}$ in this case we can solve the two triangular systems
$$L\mathbf{y} = P^T\mathbf{b}, \qquad U\mathbf{x} = \mathbf{y}.$$
For a full discussion of this and other factorizations of $n \times n$ matrices, and their applications, the reader is referred to more advanced texts on linear algebra or numerical analysis [for example, B. Noble and J. W. Daniel, Applied Linear Algebra (Englewood Cliffs, N.J.: Prentice Hall, 1988); J. Ll. Morris, Computational Methods in Elementary Numerical Analysis (New York: Wiley, 1983)].

Exercises for 2.7

Key Terms
Elementary matrix, Multiplier, LU Factorization of a matrix.

Skills
• Be able to determine whether or not a given matrix is an elementary matrix.
Exercises for 2.7

Key Terms
Elementary matrix, Multiplier, LU factorization of a matrix.

Skills
• Be able to determine whether or not a given matrix is an elementary matrix.
• Know the form for the permutation matrices, scaling matrices, and row combination matrices.
• Be able to write down the inverse of an elementary matrix without any computation.
• Be able to determine elementary matrices that reduce a given matrix to row-echelon form.
• Be able to express an invertible matrix as a product of elementary matrices.
• Be able to determine the multipliers of a matrix.
• Be able to determine the LU factorization of a matrix.
• Be able to use the LU factorization of a matrix $A$ to solve a linear system $Ax = b$.

True-False Review
For Questions 1–10, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text. If false, provide an example, illustration, or brief explanation of why the statement is false.
1. Every elementary matrix is invertible.
2. A product of elementary matrices is an elementary matrix.
3. Every matrix can be expressed as a product of elementary matrices.
4. If $A$ is an $m \times n$ matrix and $E$ is an $m \times m$ elementary matrix, then the matrices $A$ and $EA$ have the same rank.
5. If $P_{ij}$ is a permutation matrix, then $P_{ij}^2 = P_{ij}$.
6. If $E_1$ and $E_2$ are $n \times n$ elementary matrices, then $E_1E_2 = E_2E_1$.
7. If $E_1$ and $E_2$ are $n \times n$ elementary matrices of the same type, then $E_1E_2 = E_2E_1$.
8. Every matrix has an LU factorization.
9. In the LU factorization of a matrix $A$, the matrix $L$ is a unit lower triangular matrix and the matrix $U$ is a unit upper triangular matrix.
10. A $4 \times 4$ matrix $A$ that has an LU factorization has 10 multipliers.

Problems
1. Write all $3 \times 3$ elementary matrices and their inverses.
For Problems 2–5, determine elementary matrices that reduce the given matrix to row-echelon form.
2. $\begin{bmatrix} 3 & 5 \\ 1 & -2 \end{bmatrix}$.
3. $\begin{bmatrix} 5 & 8 & 2 \\ 1 & 3 & -1 \end{bmatrix}$.
4. $\begin{bmatrix} 3 & -1 & 4 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{bmatrix}$.
5. $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \end{bmatrix}$.
For Problems 6–12, express the matrix $A$ as a product of elementary matrices.
6. $A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}$.
7. $A = \begin{bmatrix} -2 & -3 \\ 5 & 7 \end{bmatrix}$.
8. $A = \begin{bmatrix} 3 & -4 \\ -1 & 2 \end{bmatrix}$.
9. $A = \begin{bmatrix} 4 & -5 \\ 1 & 4 \end{bmatrix}$.
10. $A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 2 & 2 \\ 3 & 1 & 3 \end{bmatrix}$.
11. $A = \begin{bmatrix} 0 & -4 & -2 \\ 1 & -1 & 3 \\ -2 & 2 & 2 \end{bmatrix}$.
12. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 0 \\ 3 & 4 & 5 \end{bmatrix}$.
13. Determine elementary matrices $E_1, E_2, \ldots, E_k$ that reduce $A = \begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix}$ to reduced row-echelon form. Verify by direct multiplication that $E_1E_2 \cdots E_kA = I_2$.
14. Determine a Type 3 lower triangular elementary matrix $E_1$ that reduces $A = \begin{bmatrix} 3 & -2 \\ -1 & 5 \end{bmatrix}$ to upper triangular form. Use Equation (2.7.3) to determine $L$ and verify Equation (2.7.2).
For Problems 15–20, determine the LU factorization of the given matrix. Verify your answer by computing the product $LU$.
15. $A = \begin{bmatrix} 2 & 3 \\ 5 & 1 \end{bmatrix}$.
16. $A = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$.
17. $A = \begin{bmatrix} 3 & -1 & 2 \\ 6 & -1 & 1 \\ -3 & 5 & 2 \end{bmatrix}$.
18. $A = \begin{bmatrix} 5 & 2 & 1 \\ -10 & -2 & 3 \\ 15 & 2 & -3 \end{bmatrix}$.
19. $A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 2 & 0 & 3 & -4 \\ 3 & -1 & 7 & 8 \\ 1 & 3 & 4 & 5 \end{bmatrix}$.
20. $A = \begin{bmatrix} 2 & -3 & 1 & 2 \\ 4 & -1 & 1 & 1 \\ -8 & 2 & 2 & -5 \\ 6 & 1 & 5 & 2 \end{bmatrix}$.
For Problems 21–24, use the LU factorization of $A$ to solve the system $Ax = b$.
21. $A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$, $b = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$.
22. $A = \begin{bmatrix} 1 & -3 & 5 \\ 3 & 2 & 2 \\ 2 & 5 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 5 \\ -1 \end{bmatrix}$.
23. $A = \begin{bmatrix} 2 & 2 & 1 \\ 6 & 3 & -1 \\ -4 & 2 & 2 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$.
24. $A = \begin{bmatrix} 4 & 3 & 0 & 0 \\ 8 & 1 & 2 & 0 \\ 0 & 5 & 3 & 6 \\ 0 & 0 & -5 & 7 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 5 \end{bmatrix}$.
25. Use the LU factorization of $A = \begin{bmatrix} 2 & -1 \\ -8 & 3 \end{bmatrix}$ to solve each of the systems $Ax_i = b_i$ if $b_1 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$, $b_2 = \begin{bmatrix} 2 \\ 7 \end{bmatrix}$, $b_3 = \begin{bmatrix} 5 \\ -9 \end{bmatrix}$.
26. Use the LU factorization of $A = \begin{bmatrix} -1 & 4 & 2 \\ 3 & 1 & 4 \\ 5 & -7 & 1 \end{bmatrix}$ to solve each of the systems $Ax_i = e_i$ and thereby determine $A^{-1}$.
27. If $P = P_1P_2 \cdots P_k$, where each $P_i$ is an elementary permutation matrix, show that $P^{-1} = P^T$.
28. Prove that
(a) The inverse of an invertible upper triangular matrix is upper triangular. Repeat for an invertible lower triangular matrix.
(b) The inverse of a unit upper triangular matrix is unit upper triangular. Repeat for a unit lower triangular matrix.
29. In this problem, we prove that the LU decomposition of an invertible $n \times n$ matrix is unique in the sense that, if $A = L_1U_1$ and $A = L_2U_2$, where $L_1, L_2$ are unit lower triangular matrices and $U_1, U_2$ are upper triangular matrices, then $L_1 = L_2$ and $U_1 = U_2$.
(a) Apply Corollary 2.6.12 to conclude that $L_2$ and $U_1$ are invertible, and then use the fact that $L_1U_1 = L_2U_2$ to establish that $L_2^{-1}L_1 = U_2U_1^{-1}$.
(b) Use the result from (a) together with Theorem 2.2.22 and Corollary 2.2.23 to prove that $L_2^{-1}L_1 = I_n$ and $U_2U_1^{-1} = I_n$, from which the required result follows.
30. QR Factorization: It can be shown that any invertible $n \times n$ matrix has a factorization of the form $A = QR$, where $Q$ and $R$ are invertible, $R$ is upper triangular, and $Q$ satisfies $Q^TQ = I_n$ (i.e., $Q$ is orthogonal). Determine an algorithm for solving the linear system $Ax = b$ using this QR factorization.
For Problems 31–33, use some form of technology to determine the LU factorization of the given matrix. Verify the factorization by computing the product $LU$.
31. $A = \begin{bmatrix} 3 & 5 & -2 \\ 2 & 7 & 9 \\ -5 & 5 & 11 \end{bmatrix}$.
32. $A = \begin{bmatrix} 27 & -19 & 32 \\ 15 & -16 & 9 \\ 23 & -13 & 51 \end{bmatrix}$.
33. $A = \begin{bmatrix} 34 & 13 & 19 & 22 \\ 53 & 17 & -71 & 20 \\ 21 & 37 & 63 & 59 \\ 81 & 93 & -47 & 39 \end{bmatrix}$.

2.8 The Invertible Matrix Theorem I
In Section 2.6, we defined an $n \times n$ invertible matrix $A$ to be a matrix such that there exists an $n \times n$ matrix $B$ satisfying $AB = BA = I_n$. There are, however, many other important and useful viewpoints on invertibility of matrices. Some of these we have already encountered in the preceding two sections, while others await us in later chapters. It is worthwhile to begin collecting a list of conditions on an $n \times n$ matrix $A$ that are mathematically equivalent to its invertibility. We refer to this theorem as the Invertible Matrix Theorem. As we have indicated, this result is something of a "work in progress," and we shall return to it later in Sections 3.2 and 4.10.

Theorem 2.8.1 (Invertible Matrix Theorem)
Let $A$ be an $n \times n$ matrix with real elements. The following conditions on $A$ are equivalent:
(a) $A$ is invertible.
(b) The equation $Ax = b$ has a unique solution for every $b$ in $\mathbb{R}^n$.
(c) The equation $Ax = 0$ has only the trivial solution $x = 0$.
(d) $\operatorname{rank}(A) = n$.
(e) $A$ can be expressed as a product of elementary matrices.
(f) $A$ is row-equivalent to $I_n$.

Proof: The equivalence of (a), (b), and (d) has already been established in Section 2.6 in Theorems 2.6.4 and 2.6.5, as well as in Corollary 2.6.6. Moreover, the equivalence of (a) and (e) was already established in Theorem 2.7.5. Next we establish that (c) is an equivalent statement by proving that (b) ⇒ (c) ⇒ (d). Assuming that (b) holds, we can conclude that the linear system $Ax = 0$ has a unique solution. However, one solution is evidently $x = 0$; hence this is the unique solution to $Ax = 0$, which establishes (c). Next, assume that (c) holds. The fact that $Ax = 0$ has only the trivial solution means that, in reducing $A$ to row-echelon form, we find no free parameters. Thus, every column (and hence every row) of the row-echelon form of $A$ contains a pivot, which means that the row-echelon form of $A$ has $n$ nonzero rows; that is, $\operatorname{rank}(A) = n$, which is (d). Finally, we prove that (e) ⇒ (f) ⇒ (a). If (e) holds, we can left multiply $I_n$ by a product of elementary matrices (corresponding to a sequence of elementary row operations applied to $I_n$) to obtain $A$. This means that $A$ is row-equivalent to $I_n$, which is (f). Last, if $A$ is row-equivalent to $I_n$, we can write $A$ as a product of elementary matrices, each of which is invertible. Since a product of invertible matrices is invertible (by Corollary 2.6.10), we conclude that $A$ is invertible, as needed.

Exercises for 2.8

Skills
• Know the list of characterizations of invertible matrices given in the Invertible Matrix Theorem.
• Be able to use the Invertible Matrix Theorem to draw conclusions related to the invertibility of a matrix.

True-False Review
For Questions 1–4, decide if the given statement is true or false, and give a brief justification for your answer. If true, you can quote a relevant definition or theorem from the text.
If false, provide an example, illustration, or brief explanation of why the statement is false.
1. If the linear system $Ax = 0$ has a nontrivial solution, then $A$ can be expressed as a product of elementary matrices.
2. A $4 \times 4$ matrix $A$ with $\operatorname{rank}(A) = 4$ is row-equivalent to $I_4$.
3. If $A$ is a $3 \times 3$ matrix with $\operatorname{rank}(A) = 2$, then the linear system $Ax = b$ must have infinitely many solutions.
4. Any $n \times n$ upper triangular matrix is row-equivalent to $I_n$.

Problems
1. Use part (c) of the Invertible Matrix Theorem to prove that if $A$ is an invertible matrix and $B$ and $C$ are matrices of the same size as $A$ such that $AB = AC$, then $B = C$. [Hint: Consider $AB - AC = 0$.]
2. Give a direct proof of the fact that (d) ⇒ (c) in the Invertible Matrix Theorem.
3. Give a direct proof of the fact that (c) ⇒ (b) in the Invertible Matrix Theorem.
4. Use the equivalence of (a) and (e) in the Invertible Matrix Theorem to prove that if $A$ and $B$ are invertible $n \times n$ matrices, then so is $AB$.
5. Use the equivalence of (a) and (c) in the Invertible Matrix Theorem to prove that if $A$ and $B$ are invertible $n \times n$ matrices, then so is $AB$.

2.9 Chapter Review
In this chapter we have investigated linear systems of equations. Matrices provide a convenient mathematical representation for linear systems, and whether or not a linear system has a solution (and if so, how many) can be determined entirely from the matrix for the linear system.

An $m \times n$ matrix $A = [a_{ij}]$ is a rectangular array of numbers arranged in $m$ rows and $n$ columns. The entry in the $i$th row and $j$th column is written $a_{ij}$. More generally, such an array, whose entries are allowed to depend on an indeterminate $t$, is known as a matrix function. Matrix functions can be used to formulate systems of differential equations. If $m = n$, the matrix (or matrix function) is called a square matrix.

Concepts Related to Square Matrices
• Main diagonal: the entries $a_{11}, a_{22}, \ldots, a_{nn}$ in the matrix.
• Trace: the sum of the entries on the main diagonal.
• Upper triangular matrix: $a_{ij} = 0$ for $i > j$.
• Lower triangular matrix: $a_{ij} = 0$ for $i < j$.
• Diagonal matrix: $a_{ij} = 0$ for $i \neq j$.
• Transpose: applicable to any $m \times n$ matrix $A$, this is the $n \times m$ matrix $A^T$ obtained from $A$ by interchanging its rows and columns.
• Symmetric matrix: $A^T = A$; that is, $a_{ij} = a_{ji}$.
• Skew-symmetric matrix: $A^T = -A$; that is, $a_{ij} = -a_{ji}$. In particular, $a_{ii} = 0$ for each $i$.

Matrix Algebra
Given two matrices $A$ and $B$ of the same size $m \times n$, we can perform the following operations:
• Addition/subtraction $A \pm B$: add/subtract the corresponding elements of $A$ and $B$.
• Scalar multiplication $rA$: multiply each entry of $A$ by the real (or complex) scalar $r$.
If $A$ is $m \times n$ and $B$ is $n \times p$, we can form their product $AB$, which is an $m \times p$ matrix whose $(i, j)$-entry is computed by taking the dot product of the $i$th row vector of $A$ with the $j$th column vector of $B$. Note that, in general, $AB \neq BA$.

Linear Systems
The general $m \times n$ system of linear equations is of the form
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$
If each $b_i = 0$, the system is called homogeneous. There are two useful ways to formulate the above linear system:
1. Augmented matrix:
$$A^{\#} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}.$$
2. Vector form: $Ax = b$, where
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$

Elementary Row Operations and Row Echelon Form
There are three types of elementary row operations on a matrix $A$:
1. $P_{ij}$: Permute the $i$th and $j$th rows in $A$.
2. $M_i(k)$: Multiply the entries in the $i$th row of $A$ by the nonzero scalar $k$.
3. $A_{ij}(k)$: Add to the elements of the $j$th row of $A$ the scalar $k$ times the corresponding elements of the $i$th row of $A$.
By performing elementary row operations on the augmented matrix above, we can determine solutions, if any, to the linear system. The strategy is to apply elementary row operations in such a way that the augmented matrix $A^{\#}$ is transformed into row-echelon form, a process known as Gaussian elimination. By applying back substitution to the linear system corresponding to the row-echelon form obtained, we find the solution. Since elementary row operations do not change the solution set, this is also the solution to the original linear system. If necessary, free parameters may be used to express this solution. A leading one in the far right-hand column of the row-echelon form indicates that the system has no solution.
A row-echelon form matrix is one in which
• All rows consisting entirely of zeros are placed at the bottom of the matrix.
• All other rows begin with a (leading) "1", called a pivot.
• The leading ones occur in columns strictly to the right of the leading ones in the rows above.

Invertible Matrices
An $n \times n$ matrix $A$ is invertible if there exists an $n \times n$ matrix $B$ such that $AB = I_n = BA$, where $I_n$ is the $n \times n$ identity matrix (ones on the main diagonal, zeros elsewhere). We write $A^{-1}$ for the (unique) inverse $B$ of $A$. One procedure for determining $A^{-1}$, if it exists, is the Gauss-Jordan technique:
$$[A \mid I_n] \overset{\text{ERO}}{\sim} [I_n \mid A^{-1}].$$
...
Invertible matrices $A$ share all of the following equivalent properties:
• $A$ can be reduced to $I_n$ via a sequence of elementary row operations.
• The linear system $Ax = b$ has a unique solution $x$ for every $b$ in $\mathbb{R}^n$.
• The linear system $Ax = 0$ has only the trivial solution $x = 0$.
• $A$ can be expressed as a product of elementary matrices (matrices obtained from the identity matrix by applying exactly one elementary row operation).
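The Gauss-Jordan technique summarized above can be sketched in Python as follows. This is a minimal illustration, not the text's own procedure:
- The function name `gauss_jordan_inverse` is hypothetical.
- Exact `fractions.Fraction` arithmetic is assumed, so results match hand computation.
- A row interchange is made only when the current pivot entry is zero.

If no nonzero pivot can be found in some column, then $\operatorname{rank}(A) < n$. By the Invertible Matrix Theorem, $A$ then has no inverse, and the sketch returns `None`.

```python
from fractions import Fraction


def gauss_jordan_inverse(A):
    """Return the inverse of the square matrix A, or None if A is singular.

    Row-reduces the augmented matrix [A | I_n] to [I_n | A^{-1}] using the
    three kinds of elementary row operations, in exact rational arithmetic.
    """
    n = len(A)
    # Augmented matrix [A | I_n].
    M = [[Fraction(entry) for entry in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot: rank(A) < n, so A is not invertible
        M[col], M[pivot] = M[pivot], M[col]      # Type 1: interchange rows
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]  # Type 2: scale to a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                k = M[r][col]
                # Type 3: subtract k times the pivot row to clear this entry
                M[r] = [a - k * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]  # the right-hand block is A^{-1}


print(gauss_jordan_inverse([[2, 1], [5, 3]]))
print(gauss_jordan_inverse([[1, 2], [2, 4]]))  # None
```

For the first matrix the sketch returns the rows $[3, -1]$ and $[-5, 2]$ as `Fraction` objects. The second matrix has no pivot in its second column, so `None` is returned.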
Additional Problems

Let
$$A = \begin{bmatrix} 2 & 2 & -6 & -2 \\ -1 & -1 & 5 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} -3 & 0 \\ 4 & 2 \\ 1 & -3 \\ 0 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} -5 \\ 6 \\ 3 \\ 1 \end{bmatrix},$$
and $r = -4$. For Problems 1–6, compute the given expression, if possible.
1. $rA - B^T$.
2. $AB$ and $\operatorname{tr}(AB)$.
3. $(AC)(AC)^T$.
4. $(rB)A$.
5. $(AB)^{-1}$.
6. $C^TC$ and $\operatorname{tr}(C^TC)$.
7. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & b \\ -4 & a \\ a & b \end{bmatrix}$.
(a) Compute $AB$ and determine the values of $a$ and $b$ such that $AB = I_2$.
(b) Using the values of $a$ and $b$ obtained in (a), compute $BA$.
8. Let $A$ be an $m \times n$ matrix and let $B$ be a $p \times n$ matrix. Use the index form of the matrix product to prove that $(AB^T)^T = BA^T$.
9. Let $A$ be an $n \times n$ matrix.
(a) Use the index form of the matrix product to write the $ij$th element of $A^2$.
(b) In the case when $A$ is a symmetric matrix, show that $A^2$ is also symmetric.
10. Let $A$ and $B$ be $n \times n$ matrices. If $A$ is skew-symmetric, use properties of the transpose to establish that $B^TAB$ is also skew-symmetric.
An $n \times n$ matrix $A$ is called nilpotent if $A^p = 0$ for some positive integer $p$. For Problems 11–12, show that the given matrix is nilpotent.
11. $A = \begin{bmatrix} 3 & 9 \\ -1 & -3 \end{bmatrix}$.
12. $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.
For Problems 13–16, let
$$A(t) = \begin{bmatrix} e^{-3t} & -\sec^2 t \\ 2t^3 & \cos t \\ 6 & \ln t \\ 3 & 6 - 5t \end{bmatrix} \quad \text{and} \quad B(t) = \begin{bmatrix} -7 & t^2 \\ 6 - t & 3t^3 + 6t^2 \\ 1 + t & \cos(\pi t/2) \\ 1 - t^3 & e^t \end{bmatrix}.$$
Compute the given expression, if possible.
13. $A'(t)$.
14. $\int_0^1 B(t)\,dt$.
15. $t^3 \cdot A(t) - \sin t \cdot B(t)$.
16. $B'(t) - e^tA(t)$.
For Problems 17–23, determine the solution set to the given linear system of equations.
17. $x_1 + 5x_2 + 2x_3 = -6$, $4x_2 - 7x_3 = 2$, $5x_3 = 0$.
18. $5x_1 - x_2 + 2x_3 = 7$, $-2x_1 + 6x_2 + 9x_3 = 0$, $-7x_1 + 5x_2 - 3x_3 = -7$.
19. $x + 2y - z = 1$, $x + z = 5$, $4x + 4y = 12$.
20. $x_1 - 2x_2 - x_3 + 3x_4 = 0$, $-2x_1 + 4x_2 + 5x_3 - 5x_4 = 3$, $3x_1 - 6x_2 - 6x_3 + 8x_4 = 2$.
21. $3x_1 - x_3 + 2x_4 - x_5 = 1$, $x_1 + 3x_2 + x_3 - 3x_4 + 2x_5 = -1$, $4x_1 - 2x_2 - 3x_3 + 6x_4 - x_5 = 5$, $x_4 + 4x_5 = -2$.
22. $x_1 + x_2 + x_3 - x_4 + 3x_5 = 6$, $x_1 + x_2 + x_3 - 2x_4 + 5x_5 = 8$, $2x_1 + 3x_2 + x_3 - 4x_4 + 9x_5 = 17$, $2x_1 + 2x_2 + 2x_3 - 3x_4 + 8x_5 = 14$.
23. $x_1 - 3x_2 + 2ix_3 = 1$, $-2ix_1 + 6x_2 + 2x_3 = -2$.
For Problems 24–27, determine all values of $k$ for which the given linear system has (a) no solution, (b) a unique solution, and (c) infinitely many solutions.
24. $x_1 - kx_2 + k^2x_3 = 0$, $x_1 + kx_3 = 0$, $x_2 - x_3 = 1$.
25. $x_1 - kx_2 = 6$, $2x_1 + 3x_2 = k$.
26. $kx_1 + 2x_2 - x_3 = 2$, $kx_2 + x_3 = 2$.
27. $10x_1 + kx_2 - x_3 = 0$, $kx_1 + x_2 - x_3 = 0$, $2x_1 + x_2 - x_3 = 0$.
28. Do the three planes $x_1 + 2x_2 + x_3 = 4$, $x_2 - x_3 = 1$, and $x_1 + 3x_2 = 0$ have at least one common point of intersection? Explain.
For Problems 29–34, (a) find a row-echelon form of the given matrix $A$, (b) determine $\operatorname{rank}(A)$, and (c) use the Gauss-Jordan technique to determine the inverse of $A$, if it exists.
29. $A = \begin{bmatrix} 2 & -7 \\ -4 & 14 \end{bmatrix}$.
30. $A = \begin{bmatrix} 4 & 7 \\ -2 & 5 \end{bmatrix}$.
31. $A = \begin{bmatrix} 3 & -1 & 6 \\ 0 & 2 & 3 \\ 3 & -5 & 0 \end{bmatrix}$.
32. $A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 3 & 4 \\ 0 & 0 & 4 & 3 \end{bmatrix}$.
33. $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & -1 \\ 1 & -1 & 2 \end{bmatrix}$.
34. $A = \begin{bmatrix} -2 & -3 & 1 \\ 1 & 4 & 2 \\ 0 & 5 & 3 \end{bmatrix}$.
35. Let $A = \begin{bmatrix} 1 & -1 & 3 \\ 4 & -3 & 13 \\ 1 & 1 & 4 \end{bmatrix}$. Solve each of the systems $Ax_i = e_i$, $i = 1, 2, 3$, where $e_i$ denotes the $i$th column vector of the identity matrix $I_3$.
36. Solve each of the systems $Ax_i = b_i$ if $A = \begin{bmatrix} 2 & 5 \\ 7 & -2 \end{bmatrix}$, $b_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $b_2 = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$, $b_3 = \begin{bmatrix} -2 \\ 5 \end{bmatrix}$.
37. Let $A$ and $B$ be invertible matrices.
(a) By computing an appropriate matrix product, verify that $(A^{-1}B)^{-1} = B^{-1}A$.
(b) Use properties of the inverse to derive $(A^{-1}B)^{-1} = B^{-1}A$.
38. Let $S$ be an invertible $n \times n$ matrix and let $k$ be a nonnegative integer. If $A = SDS^{-1}$, prove that $A^k = SD^kS^{-1}$.
For Problems 39–42, (a) express the given matrix as a product of elementary matrices, and (b) determine the LU decomposition of the matrix.
39. The matrix in Problem 29.
40. The matrix in Problem 32.
41. The matrix in Problem 33.
42. The matrix in Problem 34.
43. (a) Prove that if $A$ and $B$ are $n \times n$ matrices, then
$$(A + B)^3 = A^3 + A^2B + ABA + BA^2 + AB^2 + BAB + B^2A + B^3.$$
(b) How does the formula change for $(A - B)^3$?
(c) Can you make a conjecture about the number of terms in the expansion of $(A + B)^k$, in terms of $k$?
44. Suppose that $A$ and $B$ are invertible matrices. Prove that the block matrix $\begin{bmatrix} A & 0 \\ 0 & B^{-1} \end{bmatrix}$ is invertible.
45. In how many different positions can two leading ones of a row-echelon form of a $2 \times 4$ matrix occur? How about three leading ones for a $3 \times 4$ matrix? How about four leading ones for a $4 \times 6$ matrix? How about $m$ leading ones for an $m \times n$ matrix with $m \leq n$?
46. If the inverse of $A^2$ is the matrix $B$, what is the inverse of the matrix $A^{10}$? Prove your answer.

Project: Circles and Spheres via Gaussian Elimination

Part 1: Circles
In this part, we shall see that any three noncollinear points in the plane lie on a unique circle, and we will use Gaussian elimination to find the center and radius of this circle.
(a) Show geometrically that three noncollinear points in the plane must lie on a unique circle. [Hint: The center must lie on the line that passes through the midpoint of two of the three points and that is perpendicular to the segment connecting the two points.]
(b) A circle in the plane has an equation that can be given in the form $(x - a)^2 + (y - b)^2 = r^2$, where $(a, b)$ is the center and $r$ is the radius. By expanding the formula, we may write the equation of the circle in the form
$$x^2 + y^2 + cx + dy = k,$$
for constants $c$, $d$, and $k$. Using this latter formula together with Gaussian elimination, determine $c$, $d$, and $k$ for each set of points below. Then solve for $(a, b)$ and $r$ to write the equation of the circle.
(i) $(2, -1)$, $(3, 3)$, $(4, -1)$.
(ii) $(-1, 0)$, $(1, 2)$, $(2, 2)$.

Part 2: Spheres
In this part, we shall extend the ideas of Part 1 and consider four noncoplanar points in 3-space. Any three of these four points lie in a plane but are noncollinear (why?). A sphere in 3-space has an equation that can be given in the form $(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2$, where $(a, b, c)$ is the center and $r$ is the radius.
By expanding the formula, we may write the equation of the sphere in the form
$$x^2 + y^2 + z^2 + ux + vy + wz = k,$$
for constants $u$, $v$, $w$, and $k$.
(a) Using the latter formula together with Gaussian elimination, determine $u$, $v$, $w$, and $k$ for each set of points below. Then solve for $(a, b, c)$ and $r$ to write the equation of the sphere.
(i) $(1, -1, 2)$, $(2, -1, 4)$, $(-1, -1, -1)$, $(1, 4, 1)$.
(ii) $(2, 0, 0)$, $(0, 3, 0)$, $(0, 0, 4)$, $(0, 0, 6)$.
(b) What goes wrong with the procedure in (a) if the points lie on a single plane? Choose four points of your own and carry out the procedure in part (a) to see what happens. Can you describe circumstances under which the four coplanar points will lie on a sphere?
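The computation the project asks for can be sketched in Python as follows. This is a minimal illustration, not a worked solution:
- It uses made-up test points rather than the project's data.
- It uses exact `fractions.Fraction` arithmetic.
- The function names `solve_linear_system` and `fit_sphere` are hypothetical.

Rearranging $x^2 + y^2 + cx + dy = k$ (or its three-dimensional analogue with $u$, $v$, $w$) shows that each point $p$ contributes one linear equation $c \cdot p - k = -|p|^2$. Comparing with the expanded center-radius form then gives $a_i = -c_i/2$ and $r^2 = k + \sum a_i^2$. When the points are collinear (for a circle) or coplanar (for a sphere), the coefficient matrix is singular, and the sketch returns `None`.

```python
from fractions import Fraction


def solve_linear_system(M, rhs):
    """Gaussian elimination on [M | rhs] followed by back substitution.

    Returns the unique solution as a list of Fractions, or None when the
    square coefficient matrix M is singular (no unique solution).
    """
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None                      # rank < n
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):          # eliminate below the pivot
            m = aug[r][col] / aug[col][col]
            aug[r] = [a - m * b for a, b in zip(aug[r], aug[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):             # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_sphere(points):
    """Center and squared radius of the circle (d = 2) or sphere (d = 3)
    through d + 1 given points in R^d.

    Each point p gives one linear equation in the unknowns c_1..c_d and k:
        |p|^2 + c . p = k   <=>   c . p - k = -|p|^2.
    Returns None if the points are collinear (d = 2) or coplanar (d = 3).
    """
    d = len(points[0])
    if len(points) != d + 1:
        raise ValueError("need exactly d + 1 points in R^d")
    M = [list(p) + [-1] for p in points]            # unknowns c_1..c_d, k
    rhs = [-sum(v * v for v in p) for p in points]
    sol = solve_linear_system(M, rhs)
    if sol is None:
        return None
    *c, k = sol
    center = [-ci / 2 for ci in c]                  # c_i = -2 a_i
    r_squared = k + sum(a * a for a in center)      # k = r^2 - sum a_i^2
    return center, r_squared


print(fit_sphere([(6, 2), (1, 7), (-4, 2)]))
```

For the three sample points above, the sketch returns center $(1, 2)$ and $r^2 = 25$ (as `Fraction` objects), that is, the circle $(x - 1)^2 + (y - 2)^2 = 25$. Returning $r^2$ rather than $r$ keeps the arithmetic exact.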