Partial Derivatives

Functions of Several Variables

Multivariable calculus is the extension of calculus in one variable to calculus in more than one variable.

Learning Objectives

Identify areas of application of multivariable calculus

Key Takeaways

Key Points

  • Multivariable calculus can be applied to analyze deterministic systems that have multiple degrees of freedom.
  • Unlike a single-variable function f(x), for which limits and continuity need only be checked as x varies along a line (the x-axis), a multivariable function can be approached along infinitely many paths to a single point.
  • In multivariable calculus, the gradient theorem, Stokes' theorem, the divergence theorem, and Green's theorem are specific incarnations of a more general theorem: the generalized Stokes' theorem.


Key Terms

  • deterministic: having exactly predictable time evolution
  • divergence: a vector operator that measures the magnitude of a vector field's source or sink at a given point, in terms of a signed scalar


Multivariable calculus (also known as multivariate calculus) is the extension of calculus in one variable to calculus in more than one variable: the differentiated and integrated functions involve multiple variables, rather than just one. Multivariable calculus can be applied to analyze deterministic systems that have multiple degrees of freedom. Functions with independent variables corresponding to each of the degrees of freedom are often used to model these systems, and multivariable calculus provides tools for characterizing the system dynamics.

A Scalar Field: A scalar field shown as a function of (x,y). Extensions of concepts used for single-variable functions may require caution.

Multivariable calculus is used in many fields of natural and social science and engineering to model and study high-dimensional systems that exhibit deterministic behavior. Non-deterministic, or stochastic, systems can be studied using a different kind of mathematics, such as stochastic calculus. Quantitative analysts in finance also often use multivariate calculus to predict future trends in the stock market.

As we will see, multivariable functions may yield counter-intuitive results when applied to limits and continuity. Unlike a single-variable function f(x), for which limits and continuity need only be checked as x varies along a line (the x-axis), a multivariable function can be approached along infinitely many paths to a single point. Likewise, the path taken to evaluate a derivative or integral should always be specified when multivariable functions are involved.

We have also studied theorems linking derivatives and integrals of single-variable functions: the gradient theorem, Stokes' theorem, the divergence theorem, and Green's theorem. In a more advanced study of multivariable calculus, it is seen that these four theorems are specific incarnations of a more general theorem, the generalized Stokes' theorem, which applies to the integration of differential forms over manifolds.

Limits and Continuity

A study of limits and continuity in multivariable calculus yields counter-intuitive results not demonstrated by single-variable functions.

Learning Objectives

Describe the relationship between multivariate continuity and continuity in each argument

Key Takeaways

Key Points

  • The function f(x,y) = \frac{x^2y}{x^4+y^2} has different limit values at the origin, depending on the path taken for the evaluation.
  • Continuity in each argument does not imply multivariate continuity.
  • When taking different paths toward the same point yields different values for the limit, the limit does not exist.


Key Terms

  • continuity: lack of interruption or disconnection; the quality of being continuous in space or time
  • limit: a value to which a sequence or function converges
  • scalar function: any function whose domain is a vector space and whose value is its scalar field


A study of limits and continuity in multivariable calculus yields many counter-intuitive results not demonstrated by single-variable functions. For example, there are scalar functions of two variables with points in their domain which give a particular limit when approached along any arbitrary line, yet give a different limit when approached along a parabola. The function f(x,y) = \frac{x^2y}{x^4+y^2} approaches zero along any line through the origin; however, when the origin is approached along the parabola y = x^2, it has a limit of 0.5. Since taking different paths toward the same point yields different values for the limit, the limit does not exist.
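
A quick numerical check illustrates this path dependence; the minimal sketch below (plain Python, with the helper name f chosen only for illustration) evaluates the function along the line y = x and along the parabola y = x^2 as the point approaches the origin.

```python
# Sketch: evaluate f(x, y) = x^2*y / (x^4 + y^2) along two paths into the origin.
def f(x, y):
    return x**2 * y / (x**4 + y**2)

for t in [0.1, 0.01, 0.001, 0.0001]:
    along_line = f(t, t)         # approach along the line y = x
    along_parabola = f(t, t**2)  # approach along the parabola y = x^2
    print(t, along_line, along_parabola)

# The values along y = x tend to 0, while the values along y = x^2 stay at 0.5,
# so the two-variable limit at the origin does not exist.
```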

Continuity: Continuity of a single-variable function, as shown, is rather obvious. However, continuity in multivariable functions yields many counter-intuitive results.

Continuity in each argument does not imply multivariate continuity. For instance, in the case of a real-valued function f(x,y) with two real-valued parameters, continuity of f in x for fixed y and continuity of f in y for fixed x do not imply continuity of f. As an example, consider

f(x,y)= \begin{cases} \displaystyle{\frac{y}{x}}-y & \text{if } 1 \geq x > y \geq 0 \\ \displaystyle{\frac{x}{y}}-x & \text{if } 1 \geq y > x \geq 0 \\ 1-x & \text{if } x=y>0 \\ 0 & \text{else}. \end{cases}


It is easy to check that all real-valued functions (with one real-valued argument) given by f_y(x) = f(x,y) are continuous in x (for any fixed y). Similarly, all f_x are continuous, as f is symmetric with respect to x and y. However, f itself is not continuous, as can be seen by considering the sequence f\left(\frac{1}{n},\frac{1}{n}\right) (for natural n), which should converge to f(0,0) = 0 if f were continuous. However, \lim_{n\to\infty} f\left(\frac{1}{n},\frac{1}{n}\right) = 1.
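
A short numerical sketch of this piecewise example (plain Python; only the formula itself comes from the text) shows continuity in each argument failing to give joint continuity.

```python
# Sketch of the counterexample: continuous in each argument, but not jointly continuous.
def f(x, y):
    if 1 >= x > y >= 0:
        return y / x - y
    if 1 >= y > x >= 0:
        return x / y - x
    if x == y > 0:
        return 1 - x
    return 0.0

# Along either axis the function is identically 0, matching f(0, 0) = 0 ...
print(f(0.001, 0.0), f(0.0, 0.001))            # both 0.0
# ... but along the diagonal the values approach 1, not f(0, 0) = 0.
print([f(1/n, 1/n) for n in (10, 100, 1000)])  # 0.9, 0.99, 0.999
```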

Partial Derivatives

A partial derivative of a function of several variables is its derivative with respect to a single variable, with the others held constant.

Learning Objectives

Identify proper ways to express the partial derivative

Key Takeaways

Key Points

  • The partial derivative of a function f with respect to the variable x is variously denoted by f^\prime_x,\ f_{,x},\ \partial_x f, \text{ or } \frac{\partial f}{\partial x}.
  • To every point on this surface describing a multi-variable function, there is an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope.
  • Like an ordinary derivative, a partial derivative is defined as a limit: \frac{\partial}{\partial a_i} f(\mathbf{a}) = \lim_{h \rightarrow 0} \frac{f(a_1, \dots, a_{i-1}, a_i+h, a_{i+1}, \dots, a_n) - f(a_1, \dots, a_i, \dots, a_n)}{h}.


Key Terms

  • differential geometry: the study of geometry using differential calculus
  • Euclidean: adhering to the principles of traditional geometry, in which parallel lines are equidistant


A partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Partial derivatives are used in vector calculus and differential geometry. The partial derivative of a function f with respect to the variable x is variously denoted by f^\prime_x,\ f_{,x},\ \partial_x f, \text{ or } \frac{\partial f}{\partial x}.

Suppose that f is a function of more than one variable, for instance z = f(x, y) = x^2 + xy + y^2. The graph of this function defines a surface in Euclidean space. To every point on this surface, there is an infinite number of tangent lines. Partial differentiation is the act of choosing one of these lines and finding its slope. Usually, the lines of most interest are those which are parallel to the xz-plane and those which are parallel to the yz-plane (which result from holding either y or x constant, respectively).

Graph of z = x^2 + xy + y^2: For the partial derivative at (1, 1, 3) that leaves y constant, the corresponding tangent line is parallel to the xz-plane.

To find the slope of the line tangent to the function at P(1, 1, 3) that is parallel to the xz-plane, the y variable is treated as constant. By finding the derivative of the equation while assuming that y is a constant, the slope of f at the point (x, y, z) is found to be:

\displaystyle{\frac{\partial z}{\partial x} = 2x+y}

So at (1, 1, 3), by substitution, the slope is 3. Therefore,

\displaystyle{\frac{\partial z}{\partial x} = 3}

at the point (1, 1, 3). That is to say, the partial derivative of z with respect to x at (1, 1, 3) is 3.
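
If you want to check this symbolically, a small sketch using the sympy library (assuming it is available) reproduces the partial derivative and its value at (1, 1).

```python
# Sketch: symbolic partial derivative of z = x^2 + x*y + y^2 with respect to x.
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 + x*y + y**2

dz_dx = sp.diff(z, x)                      # 2*x + y
slope_at_point = dz_dx.subs({x: 1, y: 1})  # 3
print(dz_dx, slope_at_point)
```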

Graph of z = x^2 + xy + y^2 at y = 1: A slice of the graph at y = 1.

Formal Definition

Like ordinary derivatives, the partial derivative is defined as a limit. Let U be an open subset of R^n and f:U \rightarrow R a function. The partial derivative of f at the point \mathbf{a} = (a_1, \cdots, a_n) \in U with respect to the i-th variable is defined as:

\displaystyle{\frac{ \partial }{\partial a_i }f(\mathbf{a}) = \lim_{h \rightarrow 0}{ \frac{f(a_1, \cdots, a_{i-1}, a_i+h, a_{i+1}, \cdots,a_n) - f(a_1, \cdots, a_i, \cdots,a_n)}{h} }}
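
The limit definition can also be checked numerically with a forward difference quotient; the sketch below (plain Python, reusing the earlier example surface as an assumed choice) shows the quotient for \partial z/\partial x at (1, 1) approaching the exact value 3 as h shrinks.

```python
# Sketch: the difference quotient from the limit definition, for f(x, y) = x^2 + x*y + y^2.
def f(x, y):
    return x**2 + x*y + y**2

a = (1.0, 1.0)
for h in [1e-1, 1e-3, 1e-5]:
    quotient = (f(a[0] + h, a[1]) - f(a[0], a[1])) / h
    print(h, quotient)   # approaches the exact value 2*x + y = 3 as h -> 0
```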


Tangent Planes and Linear Approximations

The tangent plane to a surface at a given point is the plane that "just touches" the surface at that point.

Learning Objectives

Explain why the tangent plane can be used to approximate the surface near the point

Key Takeaways

Key Points

  • For a surface given by a differentiable multivariable function z=f(x,y), the equation of the tangent plane at (x_0,y_0,z_0) is given as f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) - (z-z_0) = 0.
  • Since a tangent plane is the best approximation of the surface near the point where the two meet, the tangent plane can be used to approximate the surface near the point.
  • The plane describing the linear approximation for a surface described by z=f(x,y) is given as z = z_0 + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0).


Key Terms

  • differentiable: having a derivative, said of a function whose domain and co-domain are manifolds
  • differential geometry: the study of geometry using differential calculus
  • slope: also called gradient; slope or gradient of a line describes its steepness


The tangent line (or simply the tangent) to a plane curve at a given point is the straight line that "just touches" the curve at that point. Similarly, the tangent plane to a surface at a given point is the plane that "just touches" the surface at that point. The concept of a tangent is one of the most fundamental notions in differential geometry and has been extensively generalized.

Tangent Plane to a Sphere: The tangent plane to a surface at a given point is the plane that "just touches" the surface at that point.

Equations

When the curve is given by y = f(x), the slope of the tangent is \frac{dy}{dx}, so by the point-slope formula the equation of the tangent line at (x_0, y_0) is:

\frac{dy}{dx}(x_0,y_0) \cdot (x-x_0) - (y-y_0) = 0


where (x, y) are the coordinates of any point on the tangent line, and where the derivative is evaluated at x = x_0.

The tangent plane to a surface at a given point p is defined in an analogous way to the tangent line in the case of curves. It is the best approximation of the surface by a plane at p, and can be obtained as the limiting position of the planes passing through 3 distinct points on the surface close to p as these points converge to p. For a surface given by a differentiable multivariable function z=f(x,y), the equation of the tangent plane at (x_0,y_0,z_0) is given as:

f_x(x_0,y_0) (x-x_0) + f_y(x_0,y_0) (y-y_0) - (z-z_0) = 0

where (x_0,y_0,z_0) is a point on the surface. Note the similarity of the equations for the tangent line and the tangent plane.

Linear Approximation

Since a tangent plane is the best approximation of the surface near the point where the two meet, the tangent plane can be used to approximate the surface near that point. The approximation works well as long as the point (x,y,z) under consideration is close enough to (x_0,y_0,z_0), where the tangent plane touches the surface. The plane describing the linear approximation for a surface described by z=f(x,y) is given as:

z = z_0 + f_x(x_0,y_0) (x-x_0) + f_y(x_0,y_0) (y-y_0)
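
As a rough illustration, the sketch below (using sympy, with the surface z = x^2 + xy + y^2 from the previous atom as an assumed example) compares the tangent-plane approximation at (1, 1) with the true surface value at a nearby point.

```python
# Sketch: linear (tangent-plane) approximation of z = f(x, y) near (x0, y0) = (1, 1).
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2

x0, y0 = 1, 1
z0 = f.subs({x: x0, y: y0})              # 3
fx = sp.diff(f, x).subs({x: x0, y: y0})  # 3
fy = sp.diff(f, y).subs({x: x0, y: y0})  # 3

def plane(xv, yv):
    # z = z0 + f_x(x0, y0)*(x - x0) + f_y(x0, y0)*(y - y0)
    return z0 + fx*(xv - x0) + fy*(yv - y0)

print(plane(1.1, 1.05), f.subs({x: 1.1, y: 1.05}))  # 3.45 vs. 3.4675, close near (1, 1)
```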

The Chain Rule

For a function U with two variables x and y, the chain rule is given as \frac{dU}{dt} = \frac{\partial U}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial U}{\partial y} \cdot \frac{dy}{dt}.

Learning Objectives

Express a chain rule for a function with two variables

Key Takeaways

Key Points

  • The chain rule can be easily generalized to functions with more than two variables.
  • For single-variable functions, the chain rule is a formula for computing the derivative of the composition of two or more functions. For example, the chain rule for f \circ g (x) \equiv f[g(x)] is \frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}.
  • The chain rule can be used when we want to calculate the rate of change of the function U(x,y) as a function of time t, where x=x(t) and y=y(t).


Key Terms

  • potential energy: the energy possessed by an object because of its position (in a gravitational or electric field), or its condition (as a stretched or compressed spring, as a chemical reactant, or by having rest mass)


The chain rule is a formula for computing the derivative of the composition of two or more functions. That is, if f is a function and g is a function, then the chain rule expresses the derivative of the composite function f \circ g (x) \equiv f[g(x)] in terms of the derivatives of f and g. For example, the chain rule for f \circ g is \frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}.

The chain rule above is for single-variable functions f(x) and g(x). However, the chain rule can be generalized to functions with multiple variables. For example, consider a function U with two variables x and y: U=U(x,y). U could be the electric potential energy at a location (x,y). The motion of a test charge on the xy-plane can be described by x=x(t), y=y(t), where t is a parameter representing time. What we want to calculate is the rate of change of the potential energy U as a function of time t. Assuming x=x(t), y=y(t), and U=U(x,y) are all differentiable at t and (x,y), the chain rule is given as:

\displaystyle{\frac{d U}{dt} = \frac{\partial U}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial U}{\partial y} \cdot \frac{dy}{dt}}


This relation can be easily generalized for functions with more than two variables.

Scalar Field: The chain rule can be used to take derivatives of multivariable functions with respect to a parameter.

Example

For z = (x^2 + xy + y^2)^{1/2}, where x=x(t) and y=y(t), express \frac{dz}{dt} in terms of \frac{dx}{dt} and \frac{dy}{dt}:

\displaystyle{\frac{dz}{dt} = \frac{d}{dt}(x^2 +xy+ y^2)^{1/2}}

\displaystyle{\,\,\,\quad= \frac{1}{2}(x^2 +xy + y^2)^{-1/2}\frac{d}{dt}(x^2 +xy+ y^2)}

\displaystyle{\,\,\,\quad=\frac{1}{2}(x^2 +xy+ y^2)^{-1/2}\left(\frac{d}{dt}(x^2) + \frac{d}{dt}(xy) +\frac{d}{dt}(y^2) \right)}

\displaystyle{\,\,\,\quad= \frac{ \left(x+\frac{1}{2} y \right)\frac{dx}{dt} + \left(y+\frac{1}{2} x \right) \frac{dy}{dt}}{\sqrt{x^2 +xy+ y^2}}}
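
A symbolic check of this result is straightforward with sympy; in the sketch below x(t) and y(t) are left as unspecified functions of t, so the output should agree with the hand-derived expression above (the names are only illustrative).

```python
# Sketch: verify dz/dt for z = sqrt(x^2 + x*y + y^2) with x = x(t), y = y(t).
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)
y = sp.Function('y')(t)

z = sp.sqrt(x**2 + x*y + y**2)
dz_dt = sp.diff(z, t)

# Compare against the hand-derived expression.
expected = ((x + y/2)*sp.diff(x, t) + (y + x/2)*sp.diff(y, t)) / sp.sqrt(x**2 + x*y + y**2)
print(sp.simplify(dz_dt - expected))   # 0
```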


Directional Derivatives and the Gradient Vector

The directional derivative represents the instantaneous rate of change of the function, moving through \mathbf{x} with a velocity specified by \mathbf{v}.

Learning Objectives

Describe properties of a function represented by the directional derivative

Key Takeaways

Key Points

  • The directional derivative is defined by the limit \nabla_{\mathbf{v}}{f}(\mathbf{x}) = \lim_{h \rightarrow 0}{\frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}}.
  • If the function f is differentiable at \mathbf{x}, then the directional derivative exists along any vector \mathbf{v}, and one gets \nabla_{\mathbf{v}}{f}(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}.
  • Many of the familiar properties of the ordinary derivative hold for the directional derivative.


Key Terms

  • chain rule: a formula for computing the derivative of the composition of two or more functions.
  • gradient: of a function y=f(x) or the graph of such a function, the rate of change of y with respect to x; that is, the amount by which y changes for a certain (often unit) change in x.


The directional derivative of a multivariate differentiable function along a given vector \mathbf{v} at a given point \mathbf{x} intuitively represents the instantaneous rate of change of the function, moving through \mathbf{x} with a velocity specified by \mathbf{v}. It therefore generalizes the notion of a partial derivative, in which the rate of change is taken along one of the coordinate curves, all other coordinates being constant.

Definition

The directional derivative of a scalar function f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) along a vector \mathbf{v} = (v_1, \ldots, v_n) is the function defined by the limit:

\displaystyle{\nabla_{\mathbf{v}}{f}(\mathbf{x}) = \lim_{h \rightarrow 0}{\frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}}}


If the function f is differentiable at \mathbf{x}, then the directional derivative exists along any vector \mathbf{v}, and one has \nabla_{\mathbf{v}}{f}(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}, where \nabla f(\mathbf{x}) is the gradient vector and \cdot is the dot product. At any point \mathbf{x}, the directional derivative of f intuitively represents the rate of change of f with respect to time when moving through \mathbf{x} at the speed and direction given by \mathbf{v}. The name "directional derivative" is a bit misleading, since it depends on both the length and the direction of \mathbf{v}.

We can imagine the directional derivative \nabla_{\mathbf{v}}{f}(\mathbf{x}) as the slope of the tangent line to the 2-dimensional slice of the graph of f that lies parallel to the vector \mathbf{v}. However, this slice will be stretched or compressed horizontally unless \mathbf{v} is a unit vector (\lVert\mathbf{v}\rVert = 1).
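
The relation \nabla_{\mathbf{v}} f = \nabla f \cdot \mathbf{v} can be checked numerically; the sketch below (plain Python, with an arbitrarily chosen example function) compares the gradient dot product with a finite-difference estimate of the defining limit.

```python
# Sketch: directional derivative of f(x, y) = x**2 * y at a point, along a vector v.
def f(x, y):
    return x**2 * y

x0, y0 = 1.0, 2.0
v = (0.6, 0.8)   # direction (here a unit vector, but any vector works)

# Gradient of f at (x0, y0): (2*x*y, x**2) = (4, 1).
grad = (2*x0*y0, x0**2)
via_gradient = grad[0]*v[0] + grad[1]*v[1]   # nabla f . v

# Finite-difference approximation of the defining limit.
h = 1e-6
via_limit = (f(x0 + h*v[0], y0 + h*v[1]) - f(x0, y0)) / h

print(via_gradient, via_limit)   # both approximately 3.2
```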

Gradient of a Function: The gradient of the function f(x,y) = -\left((\cos x)^2 + (\cos y)^2\right), depicted as a projected vector field on the bottom plane. The directional derivative represents the rate of change of the function along any direction specified by \mathbf{v}.

Properties

Many of the familiar properties of the ordinary derivative hold for the directional derivative.

The Sum Rule

\nabla_\mathbf{v} (f + g) = \nabla_\mathbf{v} f + \nabla_\mathbf{v} g


The Constant Factor Rule

For any constant c, \nabla_\mathbf{v} (cf) = c\nabla_\mathbf{v} f.

The Product Rule (or Leibniz Rule)

\nabla_\mathbf{v} (fg) = g\nabla_\mathbf{v} f + f\nabla_\mathbf{v} g


The Chain Rule

If g is differentiable at p and h is differentiable at g(p), then \nabla_\mathbf{v} (h \circ g)(p) = h'(g(p)) \nabla_\mathbf{v} g(p).

Maximum and Minimum Values

The second partial derivative test is a method used to determine whether a critical point is a local minimum, maximum, or saddle point.

Learning Objectives

Apply the second partial derivative test to determine whether a critical point is a local minimum, maximum, or saddle point

Key Takeaways

Key Points

  • For a function of two variables, the second partial derivative test is based on the sign of M(x,y) = f_{xx}(x,y)f_{yy}(x,y) - \left( f_{xy}(x,y) \right)^2 and of f_{xx}(a,b), where (a,b) is a critical point.
  • There are substantial differences between the functions of one variable and the functions of more than one variable in the identification of global extrema.
  • The maximum and minimum of a function, known collectively as extrema, are the largest and smallest values that the function takes at a point either within a given neighborhood (local or relative extremum) or on the function domain in its entirety (global or absolute extremum).


Key Terms

  • critical point: a maximum, minimum, or point of inflection on a curve; a point at which the derivative of a function is zero or undefined
  • intermediate value theorem: a statement that claims that, for each value between the least upper bound and greatest lower bound of the image of a continuous function, there is a corresponding point in its domain that the function maps to that value
  • Rolle's theorem: a theorem stating that a differentiable function which attains equal values at two distinct points must have a point somewhere between them where the first derivative (the slope of the tangent line to the graph of the function) is zero


The maximum and minimum of a function, known collectively as extrema, are the largest and smallest values that the function takes at a point either within a given neighborhood (local or relative extremum) or on the function domain in its entirety (global or absolute extremum).

Finding Maxima and Minima of Multivariable Functions

The second partial derivative test is a method in multivariable calculus used to determine whether a critical point (a,b,\cdots) of a function f(x,y,\cdots) is a local minimum, maximum, or saddle point.

Saddle Point: A saddle point on the graph of z = x^2 - y^2 (in red).

For a function of two variables, suppose that M(x,y) = f_{xx}(x,y)f_{yy}(x,y) - \left( f_{xy}(x,y) \right)^2.

  1. If M(a,b)>0 and f_{xx}(a,b)>0, then (a,b) is a local minimum of f.
  2. If M(a,b)>0 and f_{xx}(a,b)<0, then (a,b) is a local maximum of f.
  3. If M(a,b)<0, then (a,b) is a saddle point of f.
  4. If M(a,b)=0, then the second derivative test is inconclusive.
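
The four cases are easy to automate once the second partials are available; this sketch (sympy-based, with the illustrative helper name classify) applies the test to the saddle example z = x^2 - y^2 from the figure above.

```python
# Sketch: the second partial derivative test for a function of two variables.
import sympy as sp

x, y = sp.symbols('x y')

def classify(f, a, b):
    M = sp.diff(f, x, 2)*sp.diff(f, y, 2) - sp.diff(f, x, y)**2
    M_val = M.subs({x: a, y: b})
    fxx_val = sp.diff(f, x, 2).subs({x: a, y: b})
    if M_val > 0 and fxx_val > 0:
        return "local minimum"
    if M_val > 0 and fxx_val < 0:
        return "local maximum"
    if M_val < 0:
        return "saddle point"
    return "inconclusive"

print(classify(x**2 - y**2, 0, 0))   # saddle point: M = (2)(-2) - 0 = -4 < 0
```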


There are substantial differences between functions of one variable and functions of more than one variable in the identification of global extrema. For example, if a bounded differentiable function f defined on a closed interval in the real line has a single critical point, which is a local minimum, then it is also a global minimum (use the intermediate value theorem and Rolle's theorem). In two and more dimensions, this argument fails, as the function f(x,y) = x^2+y^2(1-x)^3, with x,y \in \mathbb{R}, shows. Its only critical point is at (0,0), which is a local minimum with f(0,0) = 0. However, it cannot be a global minimum, because f(4,1) = -11 < 0.
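
A one-line check of this counterexample (plain Python) confirms that the value at (4, 1) undercuts the local minimum at the origin.

```python
# Sketch: the local minimum of f(x, y) = x**2 + y**2*(1 - x)**3 at (0, 0) is not global.
f = lambda x, y: x**2 + y**2 * (1 - x)**3
print(f(0, 0), f(4, 1))   # 0 and -11, so (0, 0) cannot be a global minimum
```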

Lagrange Multipliers

The method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints.

Learning Objectives

Describe application of the method of Lagrange multipliers

Key Takeaways

Key Points

  • To maximize f(x,y) subject to g(x,y)=c, we introduce a new variable \lambda, called a Lagrange multiplier, and study the Lagrange function (or Lagrangian) defined by \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \big(g(x,y)-c\big).
  • Only when the contour line for g = c meets the contour lines of f tangentially do we neither increase nor decrease the value of f; that is, when the contour lines touch but do not cross. This will often be the situation where a solution to the constrained maximum problem exists.
  • Solving \nabla_{x,y,\lambda} \Lambda(x, y, \lambda) = 0 gives a necessary condition for extrema under the given constraint.


Key Terms

  • gradient: of a function y = f(x) or the graph of such a function, the rate of change of y with respect to x; that is, the amount by which y changes for a certain (often unit) change in x
  • contour: a line on a map or chart delineating those points which have the same altitude or other plotted quantity: a contour line or isopleth


In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) is a strategy for finding the local maxima and minima of a function subject to equality constraints.

For instance, consider the following optimization problem: maximize f(x,y) subject to g(x,y)=c. We need both f and g to have continuous first partial derivatives. We introduce a new variable \lambda, called a Lagrange multiplier, and study the Lagrange function (or Lagrangian) defined by:

\Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \left(g(x,y)-c\right)


where the \lambda term may be either added or subtracted. If f(x_0, y_0) is a maximum of f(x,y) for the original constrained problem, then there exists \lambda_0 such that (x_0,y_0,\lambda_0) is a stationary point for the Lagrange function (stationary points are those points where the partial derivatives of \Lambda are zero, i.e., \nabla\Lambda = 0). However, not all stationary points yield a solution of the original problem. Thus, the method of Lagrange multipliers yields a necessary condition for optimality in constrained problems. Sufficient conditions for a minimum or maximum also exist.

Maximizing f(x,y): Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y) = c.

Introduction

One of the most common problems in calculus is that of finding maxima or minima (in general, "extrema") of a function, but it is often difficult to find a closed form for the function being extremized. Such difficulties often arise when one wishes to maximize or minimize a function subject to fixed outside conditions or constraints. The method of Lagrange multipliers is a powerful tool for solving this class of problems without the need to explicitly solve the conditions and use them to eliminate extra variables.

Consider the two-dimensional problem introduced above: maximize f(x,y) subject to g(x,y)=c. We can visualize contours of f given by f(x, y) = d for various values of d, and the contour of g given by g(x, y) = c. Suppose we walk along the contour line with g = c. In general, the contour lines of f and g may be distinct, so following the contour line for g = c, one could intersect with or cross the contour lines of f. This is equivalent to saying that while moving along the contour line for g = c, the value of f can vary. Only when the contour line for g = c meets the contour lines of f tangentially do we neither increase nor decrease the value of f; that is, when the contour lines touch but do not cross.

The contour lines of f and g touch when the tangent vectors of the contour lines are parallel. Since the gradient of a function is perpendicular to the contour lines, this is the same as saying that the gradients of f and g are parallel. Thus, we want points (x,y) where g(x,y)=c and

\nabla_{x,y} f = - \lambda \nabla_{x,y} g

where \nabla_{x,y} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) and \nabla_{x,y} g = \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right) are the respective gradients.

The constant \lambda is required because, although the two gradient vectors are parallel, their magnitudes are generally not equal. Note that \lambda \neq 0; otherwise we cannot assert that the two gradients are parallel.

To incorporate these conditions into one equation, we introduce an auxiliary function, \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \big(g(x,y)-c\big), and solve \nabla_{x,y,\lambda} \Lambda(x, y, \lambda) = 0. This is the method of Lagrange multipliers. Note that \nabla_{\lambda} \Lambda(x, y, \lambda) = 0 implies g(x,y) = c.
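
As a concrete sketch of solving \nabla\Lambda = 0, the sympy code below uses an assumed example (maximize f(x, y) = xy on the circle x^2 + y^2 = 8) rather than any problem from the text.

```python
# Sketch: Lagrange multipliers for f = x*y subject to g = x**2 + y**2 = 8.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x*y
g = x**2 + y**2 - 8

Lagrangian = f + lam*g
stationary = sp.solve([sp.diff(Lagrangian, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)
# Candidates include (x, y) = (2, 2) and (-2, -2) with f = 4 (constrained maxima),
# and (2, -2), (-2, 2) with f = -4 (constrained minima).
```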

When the Lagrange multiplier \lambda = 0, we can have a local extremum at which the two contours cross instead of meeting tangentially. Consider the following example: minimize f(x,y) = \sin(x), given that g(x,y) = x^2 + y^2 = 9. Every point \left(-\frac{\pi}{2}, y\right) is a global minimum of f, with value -1. Therefore, where the constraint g = c crosses the contour line f = -1, there is a local minimum of f on the constraint. The constraint curve and the contour f = -1 cross at the minimum, as we can see in the figure. It is easy to verify that f_x = 0 and f_y = 0 when x = -\frac{\pi}{2}. Since both g_x \neq 0 and g_y \neq 0 there, the Lagrange multiplier \lambda = 0 at the minimum.

Example where the contour and constraint cross at an extremum.

Optimization in Several Variables

To solve an optimization problem, formulate the function f(x,y,\cdots) to be optimized and find all critical points first.

Learning Objectives

Solve a simple problem that requires optimization of several variables

Key Takeaways

Key Points

  • Mathematical optimization is the selection of a best element (with regard to some criteria) from some set of available alternatives.
  • An optimization process that involves only a single variable is rather straightforward. After finding out the function f(x) to be optimized, local maxima or minima at critical points can easily be found. End points may have maximum/minimum values as well.
  • For a rectangular cuboid shape, given the fixed volume, a cube is the geometric shape that minimizes the surface area.


Key Terms

  • optimization: the design and operation of a system or process to make it as good as possible in some defined sense
  • cuboid: a parallelepiped having six rectangular faces


Mathematical optimization is the selection of a best element (with regard to some criteria) from some set of available alternatives. An optimization process that involves only a single variable is rather straightforward. After finding out the function f(x) to be optimized, local maxima or minima at critical points can easily be found. (Of course, end points may have maximum/minimum values as well.) The same strategy applies for optimization with several variables. In this atom, we will solve a simple example to see how optimization involving several variables can be achieved.

Cardboard Box with a Fixed Volume

A packaging company needs cardboard boxes in a rectangular cuboid shape with a given volume of 1000 cubic centimeters and would like to minimize the material cost for the boxes. What should the dimensions x, y, z of a box be?

First of all, the material cost would be proportional to the surface area S of the cuboid. Therefore, the goal of the optimization is to minimize the function S(x,y,z) = 2(xy + yz + zx). The constraint in this case is that the volume is fixed: V = xyz = 1000.

Rectangular Cuboid: Mathematical optimization can be used to solve problems that involve finding the right size of a volume such as a cuboid.

We will first remove z from S(x,y,z). We can do this by using the constraint: z = \frac{1000}{xy}. Inserting this expression for z into S yields a function of x and y only:

\displaystyle{S(x,y) = 2\left(xy + \frac{1000}{x} + \frac{1000}{y}\right)}


To find the critical points:

\displaystyle{\frac{\partial S}{\partial x} = 2 \left(y - \frac{1000}{x^2} \right) = 0\\ \therefore y = \frac{1000}{x^2}}


and

\displaystyle{\frac{\partial S}{\partial y} = 2\left(x - \frac{1000}{y^2}\right) = 0\\ \therefore x = \frac{1000}{y^2}}


Then, substituting the expression found for y above into the second equation yields:

x^3 = 1000


Therefore, we find that:

x = y = z = 10


That is to say, the box that minimizes the cost of materials while maintaining the desired volume should be a 10-by-10-by-10 cube.
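
The same critical point can be found directly with sympy; this sketch solves the two first-order conditions simultaneously and then recovers z from the volume constraint.

```python
# Sketch: minimize S = 2*(x*y + 1000/x + 1000/y) for the fixed-volume box.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
S = 2*(x*y + 1000/x + 1000/y)

solution = sp.solve([sp.diff(S, x), sp.diff(S, y)], [x, y], dict=True)
print(solution)                       # [{x: 10, y: 10}]

z = 1000 / (solution[0][x] * solution[0][y])
print(z)                              # 10, so the optimal box is a 10 x 10 x 10 cube
```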

Applications of Minima and Maxima in Functions of Two Variables

Finding extrema can be a challenge with regard to multivariable functions, requiring careful calculation.

Learning Objectives

Identify steps necessary to find the minimum and maximum in multivariable functions

Key Takeaways

Key Points

  • The second derivative test is a criterion for determining whether a given critical point of a real function of one variable is a local maximum or a local minimum using the value of the second derivative at the point.
  • To find minima/maxima for functions with two variables, we must first find the first partial derivatives of the function with respect to x and y.
  • The function z = f(x, y) = (x+y)(xy + xy^2) has saddle points at (0,-1) and (1,-1) and a local maximum at \left(\frac{3}{8}, -\frac{3}{4}\right).


Key Terms

  • multivariable: concerning more than one variable
  • critical point: a maximum, minimum, or point of inflection on a curve; a point at which the derivative of a function is zero or undefined


We have learned how to find the minimum and maximum in multivariable functions. As previously mentioned, finding extrema can be a challenge with regard to multivariable functions. In particular, we learned about the second derivative test, which is a criterion for determining whether a given critical point of a real function of one variable is a local maximum or a local minimum, using the value of the second derivative at the point. In this atom, we will find extrema for a function with two variables.

Example

Find and label the critical points of the following function:

z = f(x, y) = (x+y)(xy + xy^2)


 

Plot of z = (x+y)(xy+xy^2): The maxima and minima of this plot cannot be found without extensive calculation.

To solve this problem we must first find the first partial derivatives of the function with respect to x and y:

\displaystyle{\frac{\partial z}{\partial x} = y(2x +y)(y+1)}


\displaystyle{\frac{\partial z}{\partial y} = x \left( 3y^2 +2y(x+1) + x \right)}


Looking at \frac{\partial z}{\partial x} = 0, we see that y must equal 0, -1, or -2x.

We plug the first solution, y=0, into the second equation and get:

\displaystyle{\frac{\partial z}{\partial y} = x \left( 3y^2 +2y(x+1) + x \right)\\ \,\quad = x^2}

so x must equal 0.


There were other possibilities for y, so for y=-1 we have:

\displaystyle{\frac{\partial z}{\partial y} = x \left( 3 -2(x+1) + x \right) \\ \,\quad= x(1-x)\\ \,\quad= 0}


So x must be equal to 1 or 0. Finally, for y = -2x:

\displaystyle{\frac{\partial z}{\partial y} = x \left( 3(-2x)^2 +2(-2x)(x+1) + x \right) \\ \,\quad= x^2(8x-3) \\ \,\quad= 0}

So x must equal 0 or \frac{3}{8}, giving y = 0 and y = -\frac{3}{4}, respectively.

Let's list all the critical values now:

\displaystyle{(x,y) \in \left\{(0,0),\ (0, -1),\ (1,-1),\ \left(\frac{3}{8}, -\frac{3}{4}\right)\right\}}


Now we have to label the critical values using the second derivative test, where D(x,y) = f_{xx}(x,y)f_{yy}(x,y) - \left(f_{xy}(x,y)\right)^2. Plugging in all the different critical values we found to label them, we have:

  • D(0, 0) = 0
  • D(0, -1) = -1
  • D(1, -1) = -1
  • D\left(\frac{3}{8}, -\frac{3}{4}\right) = 0.210938


We can now label some of the points:

  • at (0, -1), f(x, y) has a saddle point
  • at (1, -1), f(x, y) has a saddle point
  • at \left(\frac{3}{8}, -\frac{3}{4}\right), f(x, y) has a local maximum, since f_{xx} = -\frac{3}{8} < 0


At the remaining point, (0, 0), the second derivative test is inconclusive, and we need higher-order tests to find out what exactly the function is doing.
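
For reference, the whole classification can be reproduced with sympy; this sketch computes the critical points and the discriminant D used above (variable names are only illustrative).

```python
# Sketch: critical points and second derivative test for z = (x + y)*(x*y + x*y**2).
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = (x + y)*(x*y + x*y**2)

critical_points = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y], dict=True)
D = sp.diff(z, x, 2)*sp.diff(z, y, 2) - sp.diff(z, x, y)**2

for pt in critical_points:
    print(pt, D.subs(pt), sp.diff(z, x, 2).subs(pt))
# (0, 0): D = 0 (inconclusive); (0, -1) and (1, -1): D = -1 (saddle points);
# (3/8, -3/4): D = 27/128 > 0 with f_xx = -3/8 < 0 (local maximum).
```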
