ECONOMICS 100A MATHEMATICAL HANDOUT
Fall 2003
Mark Machina

A. CALCULUS REVIEW¹

Derivatives, Partial Derivatives and the Chain Rule

You should already know what a derivative is. We’ll use the expressions ƒ′(x) or dƒ(x)/dx for the derivative of the function ƒ(x). To indicate the derivative of ƒ(x) evaluated at the point x = x*, we’ll use the expressions ƒ′(x*) or dƒ(x*)/dx.

When we have a function of more than one variable, we can consider its derivatives with respect to each of the variables, that is, each of its partial derivatives. We use the expressions:

∂ƒ(x1,x2)/∂x1 and ƒ1(x1,x2)

interchangeably to denote the partial derivative of ƒ(x1,x2) with respect to its first argument (that is, with respect to x1). To calculate this, just hold x2 fixed (treat it as a constant) so that ƒ(x1,x2) may be thought of as a function of x1 alone, and differentiate it with respect to x1. The notation for partial derivatives with respect to x2 (or in the general case, with respect to xi) is analogous. For example, if ƒ(x1,x2) = x1²⋅x2 + 3x1, we have:

∂ƒ(x1,x2)/∂x1 = 2x1⋅x2 + 3 and ∂ƒ(x1,x2)/∂x2 = x1²

The normal vector of a function ƒ(x1,...,xn) at the point (x1,...,xn) is just the vector (i.e., ordered list) of its n partial derivatives at that point, that is, the vector:

$$\left( \frac{\partial f(x_1,\ldots,x_n)}{\partial x_1}, \frac{\partial f(x_1,\ldots,x_n)}{\partial x_2}, \ldots, \frac{\partial f(x_1,\ldots,x_n)}{\partial x_n} \right) = \big( f_1(x_1,\ldots,x_n), f_2(x_1,\ldots,x_n), \ldots, f_n(x_1,\ldots,x_n) \big)$$

Normal vectors play a key role in the conditions for unconstrained and constrained optimization.

The chain rule gives the derivative for a “function of a function.” Thus, if ƒ(x) ≡ g(h(x)), then

ƒ′(x) = g′(h(x)) ⋅ h′(x)

The chain rule also applies to taking partial derivatives.
For example, if ƒ(x1,x2) ≡ g(h(x1,x2)) then

$$\frac{\partial f(x_1,x_2)}{\partial x_1} = g'(h(x_1,x_2)) \cdot \frac{\partial h(x_1,x_2)}{\partial x_1}$$

Similarly, if ƒ(x1,x2) ≡ g(h(x1,x2), k(x1,x2)) then:

$$\frac{\partial f(x_1,x_2)}{\partial x_1} = g_1(h(x_1,x_2), k(x_1,x_2)) \cdot \frac{\partial h(x_1,x_2)}{\partial x_1} + g_2(h(x_1,x_2), k(x_1,x_2)) \cdot \frac{\partial k(x_1,x_2)}{\partial x_1}$$

The second derivative of the function ƒ(x) is written ƒ″(x) or d²ƒ(x)/dx², and it is obtained by differentiating the function ƒ(x) twice with respect to x (if you want the value of ƒ″(⋅) at some point x*, don’t substitute in x* until after you’ve differentiated twice).
¹ If the material in this section is not already familiar to you, you may have trouble on the 100A midterms and final.

A second partial derivative of a function of two or more variables is analogous, i.e., we will use the expressions:

ƒ11(x1,x2) or ∂²ƒ(x1,x2)/∂x1²

to denote differentiating twice with respect to x1 (and ƒ22(x1,x2) or ∂²ƒ(x1,x2)/∂x2² for twice with respect to x2). We get a cross partial derivative when we differentiate first with respect to x1 and then with respect to x2. We will denote this with the expressions:

ƒ12(x1,x2) or ∂²ƒ(x1,x2)/∂x1∂x2

Here’s a strange and wonderful result: if we had differentiated in the opposite order, that is, first with respect to x2 and then with respect to x1, we would have gotten the same result (provided the second partial derivatives are continuous). In other words, we have ƒ12(x1,x2) ≡ ƒ21(x1,x2) or equivalently ∂²ƒ(x1,x2)/∂x1∂x2 ≡ ∂²ƒ(x1,x2)/∂x2∂x1.
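As a rough numerical check of the partial derivative example ƒ(x1,x2) = x1²⋅x2 + 3x1 and of the equality of cross partials, here is a minimal Python sketch. It uses central finite differences rather than symbolic calculus; the helper name `partial`, the step size and the evaluation point (2, 5) are our own illustrative choices, not part of the handout.

```python
def f(x1, x2):
    """The handout's example function: x1^2 * x2 + 3*x1."""
    return x1 ** 2 * x2 + 3 * x1


def partial(func, i, point, h=1e-4):
    """Central-difference estimate of the partial of func w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


def f1(x1, x2):
    """Numerical partial of f with respect to x1."""
    return partial(f, 0, (x1, x2))


def f2(x1, x2):
    """Numerical partial of f with respect to x2."""
    return partial(f, 1, (x1, x2))


point = (2.0, 5.0)
f1_num = f1(*point)           # analytic value: 2*x1*x2 + 3
f2_num = f2(*point)           # analytic value: x1**2
f12 = partial(f1, 1, point)   # differentiate f1 with respect to x2
f21 = partial(f2, 0, point)   # differentiate f2 with respect to x1

print(f1_num, f2_num, f12, f21)
```

The two cross partials agree up to finite-difference error, as the result above says they should.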
Approximation Formulas for Small Changes in Functions (Total Differentials)

If ƒ(x) is differentiable, we can approximate the effect of a small change in x by:

∆ƒ = ƒ(x+∆x) – ƒ(x) ≈ ƒ′(x)⋅∆x

where ∆x is the change in x. From calculus, we know that as ∆x becomes smaller and smaller, this approximation becomes extremely good. We sometimes write this general idea more formally by expressing the total differential of ƒ(x), namely:

dƒ = ƒ′(x)⋅dx

but it is still just shorthand for saying “We can approximate the change in ƒ(x) by the formula ∆ƒ ≈ ƒ′(x)⋅∆x, and this approximation becomes extremely good for very small values of ∆x.”

When ƒ(⋅) is a “function of a function,” i.e., it takes the form ƒ(x) ≡ g(h(x)), the chain rule lets us write the above approximation formula and above total differential formula as

$$\Delta g(h(x)) \approx \frac{dg(h(x))}{dx} \cdot \Delta x = g'(h(x)) \cdot h'(x) \cdot \Delta x \qquad \text{so} \qquad dg(h(x)) = g'(h(x)) \cdot h'(x) \cdot dx$$

For a function ƒ(x1,...,xn) that depends upon several variables, the approximation formula is:

$$\Delta f = f(x_1+\Delta x_1,\ldots,x_n+\Delta x_n) - f(x_1,\ldots,x_n) \approx \frac{\partial f(x_1,\ldots,x_n)}{\partial x_1} \cdot \Delta x_1 + \cdots + \frac{\partial f(x_1,\ldots,x_n)}{\partial x_n} \cdot \Delta x_n$$

Once again, this approximation formula becomes extremely good for very small values of ∆x1,…,∆xn. As before, we sometimes write this idea more formally (and succinctly) by expressing the total differential of ƒ(x1,...,xn), namely:

$$df = \frac{\partial f(x_1,\ldots,x_n)}{\partial x_1} \cdot dx_1 + \cdots + \frac{\partial f(x_1,\ldots,x_n)}{\partial x_n} \cdot dx_n$$

or in equivalent notation:

dƒ = ƒ1(x1,...,xn)⋅dx1 + ⋅⋅⋅ + ƒn(x1,...,xn)⋅dxn

B. ELASTICITY

Let the variable y depend upon the variable x according to some function, i.e.:

y = ƒ(x)

How responsive is y to changes in x? One measure of responsiveness would be to plot the function ƒ(⋅) and look at its slope. If we did this, our measure of responsiveness would be:

$$\text{slope of } f(x) = \frac{\text{absolute change in } y}{\text{absolute change in } x} = \frac{\Delta y}{\Delta x} \approx \frac{dy}{dx} = f'(x)$$

Elasticity is a different measure of responsiveness than slope. Rather than looking at the ratio of the absolute change in y to the absolute change in x, elasticity is a measure of the proportionate (or percentage) change in y to the proportionate (or percentage) change in x. Formally, if y = ƒ(x), then the elasticity of y with respect to x, written Ey,x, is given by:
$$E_{y,x} = \frac{\text{proportionate change in } y}{\text{proportionate change in } x} = \frac{(\Delta y / y)}{(\Delta x / x)} = \left( \frac{\Delta y}{\Delta x} \right) \cdot \left( \frac{x}{y} \right)$$

If we consider very small changes in x (and hence in y), ∆y/∆x becomes dy/dx = ƒ′(x), so we get that the elasticity of y with respect to x is given by:

$$E_{y,x} = \frac{(\Delta y / y)}{(\Delta x / x)} = \left( \frac{\Delta y}{\Delta x} \right) \cdot \left( \frac{x}{y} \right) \approx \left( \frac{dy}{dx} \right) \cdot \left( \frac{x}{y} \right) = f'(x) \cdot \left( \frac{x}{y} \right)$$

Note that (when x and y are both positive) if ƒ(x) is an increasing function the elasticity will be positive, and if ƒ(x) is a decreasing function, it will be negative.

Recall that since the percentage change in a variable is simply 100 times its proportional change, elasticity is also the ratio of the percentage change in y to the percentage change in x:

$$E_{y,x} = \frac{(\Delta y / y)}{(\Delta x / x)} = \frac{100 \cdot (\Delta y / y)}{100 \cdot (\Delta x / x)} = \frac{\%\ \text{change in } y}{\%\ \text{change in } x}$$

A useful intuitive interpretation: Since we can rearrange the above equation as

% change in y = Ey,x ⋅ % change in x

we see that Ey,x serves as the “conversion factor” between the percentage change in x and the percentage change in y.

Although elasticity and slope are both measures of how responsive y is to changes in x, they are different measures. In other words, elasticity is not the same as slope. For example, if y = 7⋅x, the slope of this curve is obviously 7, but its elasticity is 1:

$$E_{y,x} = \frac{dy}{dx} \cdot \frac{x}{y} = 7 \cdot \frac{x}{7x} \equiv 1$$

That is, if y is exactly proportional to x, the elasticity of y with respect to x will always be one, regardless of the coefficient of proportionality.

Constant Slope Functions versus Constant Elasticity Functions

Another way to see that slope and elasticity are different measures is to consider the simple function ƒ(x) = 3 + 4x. Although ƒ(⋅) has a constant slope, it does not have a constant elasticity:

$$E_{f(\cdot),x} = \frac{df(x)}{dx} \cdot \frac{x}{f(x)} = 4 \cdot \frac{x}{3+4x} = \frac{4x}{3+4x}$$

which is obviously not constant as x changes.

However, some functions do have a constant elasticity for all values of x, namely functions of the form ƒ(x) ≡ c⋅x^β, for any constants c > 0 and β ≠ 0. Since it involves taking x to a fixed power β, this function can be called a power function. Deriving its elasticity gives:
$$E_{f(x),x} = \frac{df(x)}{dx} \cdot \frac{x}{f(x)} = \beta \cdot c \cdot x^{\beta-1} \cdot \frac{x}{c \cdot x^{\beta}} \equiv \beta \quad \text{for all } x$$

Conversely, if a function ƒ(⋅) has a constant elasticity, it must be a power function. In summary:

linear functions all have a constant slope: ƒ(x) ≡ α + β⋅x for all x ⇔ dƒ(x)/dx ≡ β for all x

power functions all have a constant elasticity: ƒ(x) ≡ c⋅x^β for all x ⇔ Eƒ(x),x ≡ β for all x
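To see the difference between slope and elasticity numerically, the following minimal Python sketch estimates Ey,x = ƒ′(x)⋅(x/y) by a central finite difference and applies it to the three kinds of functions discussed in this section. The function name `elasticity`, the step size and the particular power function 2.5⋅x^0.6 are illustrative assumptions, not part of the handout.

```python
def elasticity(func, x, h=1e-6):
    """Approximate E_{y,x} = f'(x) * x / f(x) with a central difference."""
    slope = (func(x + h) - func(x - h)) / (2 * h)
    return slope * x / func(x)


def proportional(x):
    """y = 7x: slope 7, elasticity 1."""
    return 7 * x


def linear(x):
    """f(x) = 3 + 4x: constant slope, non-constant elasticity."""
    return 3 + 4 * x


def power(x):
    """Hypothetical power function c * x**beta with c = 2.5, beta = 0.6."""
    return 2.5 * x ** 0.6


for x in (1.0, 10.0):
    print(x, elasticity(proportional, x), elasticity(linear, x), elasticity(power, x))
```

The elasticity of the linear function changes with x, while the proportional and power functions return the same elasticity at every x.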
C. LEVEL CURVES OF FUNCTIONS

If ƒ(x1,x2) is a function of the two variables x1 and x2, a level curve of ƒ(x1,x2) is just a locus of points in the (x1,x2) plane along which ƒ(x1,x2) takes on some constant value, say the value k. The equation of this level curve is therefore simply ƒ(x1,x2) = k, where we may or may not want to solve for x2. For example, the level curves of a consumer’s utility function are just his or her indifference curves (defined by the equation U(x1,x2) = u0), and the level curves of a firm’s production function are just the isoquants (defined by the equation ƒ(L,K) = Q0).

The slope of a level curve is indicated by the notation:

$$\left. \frac{dx_2}{dx_1} \right|_{f(x_1,x_2)=k} \qquad \text{or} \qquad \left. \frac{dx_2}{dx_1} \right|_{\Delta f = 0}$$

where the subscripted equations are used to remind us that x1 and x2 must vary in a manner which keeps us on the ƒ(x1,x2) = k level curve (i.e., so that ∆ƒ = 0). To calculate this slope, recall the vector of changes (∆x1,∆x2) will keep us on this level curve if and only if it satisfies the equation:

0 = ∆ƒ ≈ ƒ1(x1,x2)⋅∆x1 + ƒ2(x1,x2)⋅∆x2

which implies that ∆x1 and ∆x2 will accordingly satisfy:

$$\left. \frac{\Delta x_2}{\Delta x_1} \right|_{f(x_1,x_2)=k} \approx - \frac{\partial f(x_1,x_2)/\partial x_1}{\partial f(x_1,x_2)/\partial x_2}$$

so that in the limit we have:

$$\left. \frac{dx_2}{dx_1} \right|_{f(x_1,x_2)=k} = - \frac{\partial f(x_1,x_2)/\partial x_1}{\partial f(x_1,x_2)/\partial x_2}$$

This slope gives the rate at which we can “trade off” or “substitute” x2 against x1 so as to leave the value of the function ƒ(x1,x2) unchanged. This concept will be of frequent use in this course.

[Figure: a level curve ƒ(x1,x2) = k in the (x1,x2) plane, with its slope at a point labeled –(∂ƒ(x1,x2)/∂x1)/(∂ƒ(x1,x2)/∂x2).]

An application of this result is that the slope of the indifference curve at a given consumption bundle is given by the negative of the ratio of the marginal utilities of the two commodities at that bundle. Another application is that the slope of an isoquant at a given input bundle is the negative of the ratio of the marginal products of the two factors at that input bundle.

In the case of a function ƒ(x1,...,xn) of several variables, we will have a level surface in n-dimensional space along which the function is constant, that is, defined by the equation ƒ(x1,...,xn) = k. In this case the level surface does not have a unique tangent line. However, we can still determine the rate at which we can trade off any pair of variables xi and xj so as to keep the value of the function constant. By exact analogy with the above derivation, this rate is given by:
$$\left. \frac{dx_i}{dx_j} \right|_{f(x_1,\ldots,x_n)=k} = \left. \frac{dx_i}{dx_j} \right|_{\Delta f = 0} = - \frac{f_j(x_1,\ldots,x_n)}{f_i(x_1,\ldots,x_n)}$$

Given any level curve (or level surface) corresponding to the value k, its better-than set is the set of all points at which the function yields a higher value than k, and its worse-than set is the set of all points at which the function yields a lower value than k.

D. OPTIMIZATION #1: SOLVING OPTIMIZATION PROBLEMS

The General Structure of Optimization Problems

Economics is full of optimization (maximization or minimization) problems: the maximization of utility, the minimization of expenditure, the minimization of cost, the maximization of profits, etc. Understanding these is a lot easier if one knows what is systematic about such problems.
Each optimization problem has an objective function ƒ(x1,...,xn;α1,...,αm) which we are trying to either maximize or minimize (in our examples, we’ll always be maximizing). This function depends upon both the control variables x1,...,xn which we (or the economic agent) are able to set, as well as some parameters α1,...,αm, which are given as part of the problem. Thus a general unconstrained maximization problem takes the form:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n;\alpha_1,\ldots,\alpha_m)$$

Consider the following one-parameter maximization problem:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n;\alpha)$$

(It’s only for simplicity that we assume just one parameter. All of our results will apply to the general case of many parameters α1,...,αm.) We represent the solutions to this problem, which obviously depend upon the values of the parameter(s), by the n solution functions:
$$x_1^* = x_1^*(\alpha), \qquad x_2^* = x_2^*(\alpha), \qquad \ldots, \qquad x_n^* = x_n^*(\alpha)$$

It is often useful to ask “how well have we done?” or in other words, “how high can we get ƒ(x1,...,xn;α), given the value of the parameter α?” This is obviously determined by substituting the optimal solutions back into the objective function, to obtain:

$$\phi(\alpha) \equiv f(x_1^*,\ldots,x_n^*;\alpha) \equiv f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)$$

and φ(α) is called the optimal value function.

Sometimes we will be optimizing subject to a constraint on the control variables (such as the budget constraint of the consumer). Since this constraint may also depend upon the parameter(s), our problem becomes:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n;\alpha) \qquad \text{subject to} \qquad g(x_1,\ldots,x_n;\alpha) = c$$

(Note that we now have an additional parameter, namely the constant c.) In this case we still define the solution functions and optimal value function in the same way; we just have to remember to take into account the constraint. Although it is possible that there could be more than one constraint in a given problem, we will only consider problems with a single constraint. For example, if we were looking at the profit maximization problem, the control variables would be the quantities of inputs and outputs chosen by the firm, the parameters would be the current input and output prices, the constraint would be the production function, and the optimal value function would be the firm’s “profit function,” i.e., the highest attainable level of profits given current input and output prices.

In economics we are interested both in how the optimal values of the control variables and the optimal attainable value vary with the parameters. In other words, we will be interested in differentiating both the solution functions and the optimal value function with respect to the parameters. Before we can do this, however, we need to know how to solve unconstrained or constrained optimization problems.
First Order Conditions for Unconstrained Optimization Problems

The first order conditions for the unconstrained optimization problem:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n)$$

are simply that each of the partial derivatives of the objective function be zero at the solution values (x1*,...,xn*), i.e. that:

$$f_1(x_1^*,\ldots,x_n^*) = 0, \qquad \ldots, \qquad f_n(x_1^*,\ldots,x_n^*) = 0$$

The intuition is that if you want to be at a “mountain top” (a maximum) or the “bottom of a bowl” (a minimum) it must be the case that no small change in any control variable be able to move you up or down. That means that the partial derivatives of ƒ(x1,...,xn) with respect to each xi must be zero.
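The “mountain top” intuition can be checked on a concrete case. The sketch below uses a hypothetical objective ƒ(x1,x2) = 10x1 + 8x2 – x1² – x2² – x1⋅x2 (our own choice, not from the handout), whose first order conditions 10 – 2x1 – x2 = 0 and 8 – 2x2 – x1 = 0 are solved by (x1*,x2*) = (4, 2). It estimates both partial derivatives there by finite differences and compares ƒ at the candidate with ƒ at a few nearby points.

```python
def f(x1, x2):
    """Hypothetical objective function, chosen only for illustration."""
    return 10 * x1 + 8 * x2 - x1 ** 2 - x2 ** 2 - x1 * x2


def partial(func, i, point, h=1e-5):
    """Central-difference estimate of the partial of func w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


candidate = (4.0, 2.0)

# First order conditions: both partial derivatives should be (close to) zero.
gradient = [partial(f, i, candidate) for i in range(2)]

# No small change in either control variable should move us up.
steps = (-0.01, 0.0, 0.01)
nearby = [
    f(candidate[0] + dx, candidate[1] + dy)
    for dx in steps
    for dy in steps
    if (dx, dy) != (0.0, 0.0)
]

print(gradient, f(*candidate), max(nearby))
```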
Second Order Conditions for Unconstrained Optimization Problems

If our optimization problem is a maximization problem, the second order condition for this solution to be a local maximum is that ƒ(x1,...,xn) be a weakly concave function of (x1,...,xn) (i.e., a mountain top) in the locality of this point. Thus, if there is only one control variable, the second order condition is that ƒ″(x*) < 0 at the optimum value of the control variable x. If there are two control variables, it turns out that the conditions are:

$$f_{11}(x_1^*,x_2^*) < 0, \qquad f_{22}(x_1^*,x_2^*) < 0 \qquad \text{and} \qquad \begin{vmatrix} f_{11}(x_1^*,x_2^*) & f_{12}(x_1^*,x_2^*) \\ f_{21}(x_1^*,x_2^*) & f_{22}(x_1^*,x_2^*) \end{vmatrix} > 0$$

When we have a minimization problem, the second order condition for this solution to be a local minimum is that ƒ(x1,...,xn) be a weakly convex function of (x1,...,xn) (i.e., the bottom of a bowl) in the locality of this point. Thus, if there is only one control variable x, the second order condition is that ƒ″(x*) > 0. If there are two control variables, the conditions are:

$$f_{11}(x_1^*,x_2^*) > 0, \qquad f_{22}(x_1^*,x_2^*) > 0 \qquad \text{and} \qquad \begin{vmatrix} f_{11}(x_1^*,x_2^*) & f_{12}(x_1^*,x_2^*) \\ f_{21}(x_1^*,x_2^*) & f_{22}(x_1^*,x_2^*) \end{vmatrix} > 0$$

(yes, this last determinant really is supposed to be positive).

First Order Conditions for Constrained Optimization Problems (VERY important)

The first order conditions for the two-variable constrained optimization problem:

$$\max_{x_1,x_2} f(x_1,x_2) \qquad \text{subject to} \qquad g(x_1,x_2) = c$$

are easy to see from the following diagram.

[Figure: level curves of ƒ(x1,x2) and the constraint curve g(x1,x2) = c in the (x1,x2) plane, with the point (x1*,x2*) marked on the constraint curve.]

The point (x1*,x2*) is clearly not an unconstrained maximum, since increasing both x1 and x2 would move you to a higher level curve for ƒ(x1,x2). However, this change is not “legal” since it does not satisfy the constraint: it would move you off of the level curve g(x1,x2) = c. In order to stay on the level curve, we must jointly change x1 and x2 in a manner which preserves the value of g(x1,x2). That is, we can only trade off x1 against x2 at the “legal” rate:

$$\left. \frac{dx_2}{dx_1} \right|_{g(x_1,x_2)=c} = \left. \frac{dx_2}{dx_1} \right|_{\Delta g = 0} = - \frac{g_1(x_1,x_2)}{g_2(x_1,x_2)}$$

The condition for maximizing ƒ(x1,x2) subject to g(x1,x2) = c is that no tradeoff between x1 and x2 at this “legal” rate be able to raise the value of ƒ(x1,x2). This is the same as saying that the level curve of the constraint function be tangent to the level curve of the objective function. In other words, the tradeoff rate which preserves the value of g(x1,x2) (the “legal” rate) must be the same as the tradeoff rate that preserves the value of ƒ(x1,x2). We thus have the condition:

$$\left. \frac{dx_2}{dx_1} \right|_{\Delta g = 0} = \left. \frac{dx_2}{dx_1} \right|_{\Delta f = 0}$$

which implies that:

$$- \frac{g_1(x_1,x_2)}{g_2(x_1,x_2)} = - \frac{f_1(x_1,x_2)}{f_2(x_1,x_2)}$$

which is in turn equivalent to:

$$f_1(x_1^*,x_2^*) = \lambda \cdot g_1(x_1^*,x_2^*) \qquad\qquad f_2(x_1^*,x_2^*) = \lambda \cdot g_2(x_1^*,x_2^*)$$

for some scalar λ.
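To make the tangency condition concrete, here is a minimal Python sketch for a hypothetical problem of our own choosing: maximize ƒ(x1,x2) = x1⋅x2 subject to g(x1,x2) = 2x1 + x2 = 12. Working the conditions by hand gives x2 = 2λ and x1 = λ, so x2 = 2x1, and the constraint then gives (x1*,x2*) = (3, 6) with λ = 3. The code checks numerically that the two tradeoff rates coincide at that point and that moving along the constraint in either direction lowers ƒ.

```python
def f(x1, x2):
    """Hypothetical objective function."""
    return x1 * x2


def g(x1, x2):
    """Hypothetical constraint function; the constraint is g = C."""
    return 2 * x1 + x2


C = 12.0


def partial(func, i, point, h=1e-5):
    """Central-difference estimate of the partial of func w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


x_star = (3.0, 6.0)
f1, f2 = partial(f, 0, x_star), partial(f, 1, x_star)
g1, g2 = partial(g, 0, x_star), partial(g, 1, x_star)

slope_f = -f1 / f2   # tradeoff rate that preserves f
slope_g = -g1 / g2   # "legal" tradeoff rate that preserves g
lam = f1 / g1        # the scalar lambda implied by the first equation

# "Legal" moves along the constraint: raise x1 by t, lower x2 by 2t.
along_constraint = [f(3.0 + t, 6.0 - 2.0 * t) for t in (-0.5, -0.1, 0.1, 0.5)]

print(slope_f, slope_g, lam, f(*x_star), along_constraint)
```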
To summarize, we have that the first order conditions for the constrained maximization problem:

$$\max_{x_1,x_2} f(x_1,x_2) \qquad \text{subject to} \qquad g(x_1,x_2) = c$$

are that the solutions (x1*,x2*) satisfy the equations

$$f_1(x_1^*,x_2^*) = \lambda \cdot g_1(x_1^*,x_2^*) \qquad\qquad f_2(x_1^*,x_2^*) = \lambda \cdot g_2(x_1^*,x_2^*) \qquad\qquad g(x_1^*,x_2^*) = c$$

for some scalar λ. An easy way to remember these conditions is simply that the normal vector to ƒ(x1,x2) at the optimal point (x1*,x2*) must be a scalar multiple of the normal vector to g(x1,x2) at the optimal point (x1*,x2*), i.e. that:

( ƒ1(x1*,x2*) , ƒ2(x1*,x2*) ) = λ ⋅ ( g1(x1*,x2*) , g2(x1*,x2*) )

and also that the constraint g(x1*,x2*) = c be satisfied.

This same principle extends to the case of several variables. In other words, the conditions for (x1*,...,xn*) to be a solution to the constrained maximization problem:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n) \qquad \text{subject to} \qquad g(x_1,\ldots,x_n) = c$$

are that no legal tradeoff between any pair of variables xi and xj be able to affect the value of the objective function. In other words, the tradeoff rate between xi and xj that preserves the value of g(x1,...,xn) must be the same as the tradeoff rate between xi and xj that preserves the value of ƒ(x1,...,xn). We thus have the condition:

$$\left. \frac{dx_i}{dx_j} \right|_{\Delta g = 0} = \left. \frac{dx_i}{dx_j} \right|_{\Delta f = 0} \qquad \text{for all } i \text{ and } j$$

or in other words, that:

$$- \frac{g_j(x_1,\ldots,x_n)}{g_i(x_1,\ldots,x_n)} = - \frac{f_j(x_1,\ldots,x_n)}{f_i(x_1,\ldots,x_n)} \qquad \text{for all } i \text{ and } j$$

Again, the only way to ensure that these ratios will be equal for all i and j is to have:
$$f_1(x_1^*,\ldots,x_n^*) = \lambda \cdot g_1(x_1^*,\ldots,x_n^*)$$
$$f_2(x_1^*,\ldots,x_n^*) = \lambda \cdot g_2(x_1^*,\ldots,x_n^*)$$
$$\vdots$$
$$f_n(x_1^*,\ldots,x_n^*) = \lambda \cdot g_n(x_1^*,\ldots,x_n^*)$$

To summarize: the first order conditions for the constrained maximization problem:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n) \qquad \text{subject to} \qquad g(x_1,\ldots,x_n) = c$$

are that the solutions (x1*,...,xn*) satisfy the equations:

$$f_1(x_1^*,\ldots,x_n^*) = \lambda \cdot g_1(x_1^*,\ldots,x_n^*)$$
$$f_2(x_1^*,\ldots,x_n^*) = \lambda \cdot g_2(x_1^*,\ldots,x_n^*)$$
$$\vdots$$
$$f_n(x_1^*,\ldots,x_n^*) = \lambda \cdot g_n(x_1^*,\ldots,x_n^*)$$

and the constraint

$$g(x_1^*,\ldots,x_n^*) = c$$

Once again, the easy way to remember this is simply that the normal vector of ƒ(x1,...,xn) be a scalar multiple of the normal vector of g(x1,...,xn) at the optimal point, i.e.:

( ƒ1(x1*,...,xn*) , ... , ƒn(x1*,...,xn*) ) = λ ⋅ ( g1(x1*,...,xn*) , ... , gn(x1*,...,xn*) )

and also that the constraint g(x1*,...,xn*) = c be satisfied.

Lagrangians

The first order conditions for the above constrained maximization problem are just a system of n+1 equations in the n+1 unknowns x1,...,xn and λ. Personally, I suggest that you get these first order conditions the direct way by simply setting the normal vector of ƒ(x1,...,xn) to equal a scalar multiple of the normal vector of g(x1,...,xn) (with the scale factor λ). However, another way to obtain these equations is to construct the Lagrangian function:
L(x1,...,xn,λ) ≡ ƒ(x1,...,xn) + λ⋅[c – g(x1,...,xn)]

(where λ is called the Lagrangian multiplier). Then, if we calculate the partial derivatives ∂L/∂x1,...,∂L/∂xn and ∂L/∂λ and set them all equal to zero, we get the equations:

$$\frac{\partial L(x_1^*,\ldots,x_n^*,\lambda)}{\partial x_1} = f_1(x_1^*,\ldots,x_n^*) - \lambda \cdot g_1(x_1^*,\ldots,x_n^*) = 0$$
$$\frac{\partial L(x_1^*,\ldots,x_n^*,\lambda)}{\partial x_2} = f_2(x_1^*,\ldots,x_n^*) - \lambda \cdot g_2(x_1^*,\ldots,x_n^*) = 0$$
$$\vdots$$
$$\frac{\partial L(x_1^*,\ldots,x_n^*,\lambda)}{\partial x_n} = f_n(x_1^*,\ldots,x_n^*) - \lambda \cdot g_n(x_1^*,\ldots,x_n^*) = 0$$
$$\frac{\partial L(x_1^*,\ldots,x_n^*,\lambda)}{\partial \lambda} = c - g(x_1^*,\ldots,x_n^*) = 0$$

But these equations are the same as our original n+1 first order conditions. In other words, the method of Lagrangians is nothing more than a roundabout way of generating our condition that the normal vector of ƒ(x1,...,xn) be λ times the normal vector of g(x1,...,xn), and the constraint g(x1,...,xn) = c be satisfied.

We will not do second order conditions for constrained optimization: they are a royal pain.
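As an illustration of the Lagrangian route, the sketch below builds L(x1,x2,λ) ≡ ƒ(x1,x2) + λ⋅[c – g(x1,x2)] for a hypothetical problem of our own choosing: ƒ(x1,x2) = ln(x1) + ln(x2) with the linear constraint 2x1 + 5x2 = 100. Solving the first order conditions by hand gives x1* = 25, x2* = 10 and λ = 0.02; the code checks that all three partial derivatives of L are numerically zero at that point. The names `P1`, `P2`, `C` and `partial` are illustrative assumptions.

```python
import math

P1, P2, C = 2.0, 5.0, 100.0   # hypothetical constraint coefficients and constant


def f(x1, x2):
    """Hypothetical objective function."""
    return math.log(x1) + math.log(x2)


def g(x1, x2):
    """Hypothetical constraint function; the constraint is g = C."""
    return P1 * x1 + P2 * x2


def lagrangian(x1, x2, lam):
    """L = f + lam * [c - g], as defined in the text."""
    return f(x1, x2) + lam * (C - g(x1, x2))


def partial(func, i, point, h=1e-6):
    """Central-difference estimate of the partial of func w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


# Candidate solution worked out by hand: x1 = C/(2*P1), x2 = C/(2*P2), lam = 2/C.
candidate = (C / (2 * P1), C / (2 * P2), 2 / C)

# dL/dx1, dL/dx2 and dL/dlam should all vanish at the candidate.
lagrangian_partials = [partial(lagrangian, i, candidate) for i in range(3)]
print(candidate, lagrangian_partials)
```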
E. SCALE PROPERTIES OF FUNCTIONS

A function ƒ(x1,...,xn) is said to exhibit constant returns to scale if:

ƒ(λ⋅x1,...,λ⋅xn) ≡ λ⋅ƒ(x1,...,xn) for all x1,...,xn and all λ > 0

That is, if multiplying all arguments by λ leads to the value of the function being multiplied by λ. Functions that exhibit constant returns to scale are also said to be homogeneous of degree 1.

A function ƒ(x1,...,xn) is said to be scale invariant if:

ƒ(λ⋅x1,...,λ⋅xn) ≡ ƒ(x1,...,xn) for all x1,...,xn and all λ > 0

In other words, if multiplying all the arguments by λ leads to no change in the value of the function. Functions that exhibit scale invariance are also said to be homogeneous of degree 0.

Say that ƒ(x1,...,xn) is homogeneous of degree one, so that we have ƒ(λ⋅x1,...,λ⋅xn) ≡ λ⋅ƒ(x1,...,xn). Differentiating this identity with respect to λ yields:

$$\sum_{i=1}^{n} f_i(\lambda \cdot x_1,\ldots,\lambda \cdot x_n) \cdot x_i \equiv f(x_1,\ldots,x_n) \qquad \text{for all } x_1,\ldots,x_n \text{ and all } \lambda > 0$$

and setting λ = 1 then gives:

$$\sum_{i=1}^{n} f_i(x_1,\ldots,x_n) \cdot x_i \equiv f(x_1,\ldots,x_n) \qquad \text{for all } x_1,\ldots,x_n$$

which is called Euler’s theorem, and which will turn out to have very important implications for the distribution of income among factors of production.

Here’s another useful result: if a function is homogeneous of degree 1, then its partial derivatives are all homogeneous of degree 0. To see this, take the identity ƒ(λ⋅x1,...,λ⋅xn) ≡ λ⋅ƒ(x1,...,xn) and this time differentiate with respect to xi, to get:

λ⋅ƒi(λ⋅x1,...,λ⋅xn) ≡ λ⋅ƒi(x1,...,xn) for all x1,...,xn and λ > 0

or equivalently:

ƒi(λ⋅x1,...,λ⋅xn) ≡ ƒi(x1,...,xn) for all x1,...,xn and λ > 0

which establishes our result. In other words, if a production function exhibits constant returns to scale (i.e., is homogeneous of degree 1), the marginal products of all the factors will be scale invariant (i.e., homogeneous of degree 0).

F. OPTIMIZATION #2: COMPARATIVE STATICS OF SOLUTION FUNCTIONS

Having obtained the first order conditions for a constrained or unconstrained optimization problem, we can now ask how the optimal values of the control variables change when the parameters change (for example, how the optimal quantity of a commodity will be affected by a price change or an income change).

Consider a simple maximization problem with a single control variable x and single parameter α

$$\max_{x} f(x;\alpha)$$

For a given value of α, recall that the solution x* is the value that satisfies the first order condition

$$\frac{\partial f(x^*;\alpha)}{\partial x} = 0$$

Since the values of economic parameters can (and do) change, we have defined the solution function x*(α) as the formula that specifies the optimal value x* for each value of α. Thus, for each value of α, the value of x*(α) satisfies the first order condition for that value of α. So we can basically plug the solution function x*(α) into the first order condition to obtain the identity

$$\frac{\partial f(x^*(\alpha);\alpha)}{\partial x} \equiv 0 \qquad \text{for all } \alpha$$

We refer to this as the identity version of the first order condition.
Comparative statics is the study of how changes in a parameter affect the optimal value of a control variable. For example, is x*(α) an increasing or decreasing function of α? How sensitive is x*(α) to changes in α? To learn this about x*(α), we need to derive its derivative dx*(α)/dα.

The easiest way to get dx*(α)/dα would be to solve the first order condition to get the formula for x*(α) itself, then differentiate it with respect to α to get the formula for dx*(α)/dα. But sometimes first order conditions are too complicated to solve. Are we up a creek? No: there is another approach, implicit differentiation, which always gets the formula for the derivative dx*(α)/dα. In fact, it can get the formula for dx*(α)/dα even when we can’t get the formula for the solution function x*(α) itself!

Implicit differentiation is straightforward. Since the solution function x*(α) satisfies the identity

$$\frac{\partial f(x^*(\alpha);\alpha)}{\partial x} \equiv 0 \qquad \text{for all } \alpha$$

we can just totally differentiate this identity with respect to α, to get

$$\frac{\partial^2 f(x^*(\alpha);\alpha)}{\partial x^2} \cdot \frac{dx^*(\alpha)}{d\alpha} + \frac{\partial^2 f(x^*(\alpha);\alpha)}{\partial x \, \partial \alpha} \equiv 0 \qquad \text{for all } \alpha$$

and solve to get (which requires that ∂²ƒ/∂x² be nonzero at the optimum)

$$\frac{dx^*(\alpha)}{d\alpha} \equiv - \frac{\partial^2 f(x^*(\alpha);\alpha) / \partial x \, \partial \alpha}{\partial^2 f(x^*(\alpha);\alpha) / \partial x^2} \qquad \text{for all } \alpha$$

For example, let’s go back to that troublesome problem max α⋅x² – e^x, with first order condition 2⋅α⋅x* – e^(x*) = 0. Its solution function x* = x*(α) satisfies the first order condition identity

$$2 \cdot \alpha \cdot x^*(\alpha) - e^{x^*(\alpha)} \equiv 0 \qquad \text{for all } \alpha$$

So to get the formula for dx*(α)/dα, totally differentiate this identity with respect to α:

$$2 \cdot x^*(\alpha) + 2 \cdot \alpha \cdot \frac{dx^*(\alpha)}{d\alpha} - e^{x^*(\alpha)} \cdot \frac{dx^*(\alpha)}{d\alpha} \equiv 0 \qquad \text{for all } \alpha$$

and solve, to get

$$\frac{dx^*(\alpha)}{d\alpha} \equiv - \frac{2 \cdot x^*(\alpha)}{2 \cdot \alpha - e^{x^*(\alpha)}}$$
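The formula just derived can be checked numerically even though x*(α) has no closed form. The following minimal Python sketch solves the first order condition 2⋅α⋅x – e^x = 0 by bisection near α = 2, then compares the implicit-differentiation formula with a finite-difference estimate of dx*(α)/dα. The bracket [1.5, 3], the value α = 2 and the function name `solve_foc` are our own assumptions; on that bracket the root is the one at which 2⋅α – e^(x*) < 0, i.e. a local maximum.

```python
import math


def solve_foc(alpha, lo=1.5, hi=3.0, tol=1e-12):
    """Bisection on the first order condition 2*alpha*x - exp(x) = 0.

    Assumes the condition is positive at lo and negative at hi, which
    holds for alpha close to 2 (an illustrative choice).
    """
    def foc(x):
        return 2 * alpha * x - math.exp(x)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 2.0
x_star = solve_foc(alpha)

# Formula obtained by implicit differentiation.
formula = -2 * x_star / (2 * alpha - math.exp(x_star))

# Brute-force check: re-solve the problem for slightly different alpha.
step = 1e-5
finite_diff = (solve_foc(alpha + step) - solve_foc(alpha - step)) / (2 * step)

print(x_star, formula, finite_diff)
```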
Comparative Statics when there are Several Parameters

Implicit differentiation also works when there is more than one parameter. Consider the problem

$$\max_{x} f(x;\alpha,\beta)$$

with first order condition

$$\frac{\partial f(x^*;\alpha,\beta)}{\partial x} = 0$$

Since the solution function x* = x*(α,β) satisfies this first order condition for all values of α and β, we have the identity

$$\frac{\partial f(x^*(\alpha,\beta);\alpha,\beta)}{\partial x} \equiv 0 \qquad \text{for all } \alpha, \beta$$

Note that the optimal value x* = x*(α,β) is affected by both changes in α as well as changes in β. To derive ∂x*(α,β)/∂α, we totally differentiate the above identity with respect to α, and then solve. If we want ∂x*(α,β)/∂β, we totally differentiate the identity with respect to β, then solve.

For example, consider the maximization problem

$$\max_{x} \ \alpha \cdot \ln(x) - \beta \cdot x^2$$

Since the first order condition is α⋅[x*]⁻¹ – 2⋅β⋅x* = 0, its solution function x*(α,β) will satisfy the identity

$$\alpha \cdot [x^*(\alpha,\beta)]^{-1} - 2 \cdot \beta \cdot x^*(\alpha,\beta) \equiv 0 \qquad \text{for all } \alpha, \beta$$

To get ∂x*(α,β)/∂α, totally differentiate this identity with respect to α:

$$[x^*(\alpha,\beta)]^{-1} - \alpha \cdot [x^*(\alpha,\beta)]^{-2} \cdot \frac{\partial x^*(\alpha,\beta)}{\partial \alpha} - 2 \cdot \beta \cdot \frac{\partial x^*(\alpha,\beta)}{\partial \alpha} \equiv 0 \qquad \text{for all } \alpha, \beta$$

and solve to get:

$$\frac{\partial x^*(\alpha,\beta)}{\partial \alpha} = \frac{[x^*(\alpha,\beta)]^{-1}}{\alpha \cdot [x^*(\alpha,\beta)]^{-2} + 2 \cdot \beta}$$

On the other hand, to get ∂x*(α,β)/∂β, totally differentiate the identity with respect to β:

$$-\alpha \cdot [x^*(\alpha,\beta)]^{-2} \cdot \frac{\partial x^*(\alpha,\beta)}{\partial \beta} - 2 \cdot x^*(\alpha,\beta) - 2 \cdot \beta \cdot \frac{\partial x^*(\alpha,\beta)}{\partial \beta} \equiv 0 \qquad \text{for all } \alpha, \beta$$

and solve to get:

$$\frac{\partial x^*(\alpha,\beta)}{\partial \beta} = - \frac{2 \cdot x^*(\alpha,\beta)}{\alpha \cdot [x^*(\alpha,\beta)]^{-2} + 2 \cdot \beta}$$

Comparative Statics when there are Several Control Variables

Implicit differentiation also works when there is more than one control variable, and hence more than one equation in the first order condition. Consider the example
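$$\max_{x_1,x_2} f(x_1,x_2;\alpha)$$

The first order conditions are that x1* and x2* solve the pair of equations

$$\frac{\partial f(x_1^*,x_2^*;\alpha)}{\partial x_1} = 0 \qquad \text{and} \qquad \frac{\partial f(x_1^*,x_2^*;\alpha)}{\partial x_2} = 0$$

so the solution functions x1* = x1*(α) and x2* = x2*(α) satisfy the pair of identities

$$\frac{\partial f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_1} \equiv 0 \qquad \text{and} \qquad \frac{\partial f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_2} \equiv 0 \qquad \text{for all } \alpha$$

To get ∂x1*(α)/∂α and ∂x2*(α)/∂α, totally differentiate both of these identities with respect to α, to get

$$\frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_1^2} \cdot \frac{\partial x_1^*(\alpha)}{\partial \alpha} + \frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_1 \, \partial x_2} \cdot \frac{\partial x_2^*(\alpha)}{\partial \alpha} + \frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_1 \, \partial \alpha} \equiv 0$$

$$\frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_1 \, \partial x_2} \cdot \frac{\partial x_1^*(\alpha)}{\partial \alpha} + \frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_2^2} \cdot \frac{\partial x_2^*(\alpha)}{\partial \alpha} + \frac{\partial^2 f(x_1^*(\alpha),x_2^*(\alpha);\alpha)}{\partial x_2 \, \partial \alpha} \equiv 0$$

for all α. This is a set of two linear equations in the two derivatives ∂x1*(α)/∂α and ∂x2*(α)/∂α, and we can solve for ∂x1*(α)/∂α and ∂x2*(α)/∂α by substitution, or by Cramer’s Rule, or however.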
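As one way of carrying out the last step, here is a minimal Python sketch that solves such a two-equation system by Cramer’s Rule. The objective ƒ(x1,x2;α) = α⋅x1 + x2 – x1² – x2² – x1⋅x2 is a hypothetical example of our own; its second partials are constants (ƒ11 = ƒ22 = –2, ƒ12 = ƒ21 = –1, ƒ1α = 1, ƒ2α = 0), so the linear system can be written down directly. Solving its first order conditions by hand gives x1* = (2α – 1)/3 and x2* = (2 – α)/3, which provides an independent check.

```python
def det2(a11, a12, a21, a22):
    """Determinant of the 2 x 2 matrix [[a11, a12], [a21, a22]]."""
    return a11 * a22 - a12 * a21


def cramer_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*y1 + a12*y2 = b1 and a21*y1 + a22*y2 = b2 by Cramer's Rule."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("coefficient determinant is zero; no unique solution")
    y1 = det2(b1, a12, b2, a22) / d   # replace first column with (b1, b2)
    y2 = det2(a11, b1, a21, b2) / d   # replace second column with (b1, b2)
    return y1, y2


# Second partials of the hypothetical objective
# f = alpha*x1 + x2 - x1**2 - x2**2 - x1*x2
f11, f12, f21, f22 = -2.0, -1.0, -1.0, -2.0
f1a, f2a = 1.0, 0.0   # cross partials with respect to alpha

# Totally differentiated first order conditions, with the alpha terms moved
# to the right-hand side:
#   f11*dx1 + f12*dx2 = -f1a
#   f21*dx1 + f22*dx2 = -f2a
dx1_dalpha, dx2_dalpha = cramer_2x2(f11, f12, f21, f22, -f1a, -f2a)
print(dx1_dalpha, dx2_dalpha)
```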
Comparative Statics of Equilibria

Implicit differentiation isn’t restricted to optimization problems. It also allows us to derive how changes in the parameters affect the equilibrium values in an economic system. Consider a simple market system, with demand and supply functions

QD = D(P,I) and QS = S(P,w)

where P is market price, and the parameters are income I and the wage rate w. Naturally, the equilibrium price is the value Pe that solves the equilibrium condition

D(Pe,I) = S(Pe,w)

It is clear that the equilibrium price function, namely Pe = Pe(I,w), must satisfy the identity

D( Pe(I,w) , I ) ≡ S( Pe(I,w) , w ) for all I, w

So if we want to determine how a rise in income affects equilibrium price, totally differentiate the above identity with respect to I, to get

$$\frac{\partial D(P^e(I,w),I)}{\partial P^e} \cdot \frac{\partial P^e(I,w)}{\partial I} + \frac{\partial D(P^e(I,w),I)}{\partial I} \equiv \frac{\partial S(P^e(I,w),w)}{\partial P^e} \cdot \frac{\partial P^e(I,w)}{\partial I} \qquad \text{for all } I, w$$

then solve to get

$$\frac{\partial P^e(I,w)}{\partial I} = \frac{\partial D(P^e(I,w),I) / \partial I}{\dfrac{\partial S(P^e(I,w),w)}{\partial P^e} - \dfrac{\partial D(P^e(I,w),I)}{\partial P^e}}$$

In class, we’ll analyze this formula to see what it implies about the effect of changes in income upon equilibrium price in a market. For practice, see if you can derive the formula for the effect of changes in the wage rate upon the equilibrium price.
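The income formula can be tried out on a toy market. The sketch below assumes hypothetical linear demand and supply functions (the coefficients are ours, chosen only for illustration), finds the equilibrium price by bisection on excess demand, and compares the formula for ∂Pe/∂I with a brute-force estimate obtained by re-solving the market at slightly different incomes.

```python
def demand(price, income):
    """Hypothetical demand function D(P, I)."""
    return 100 - 2 * price + 0.5 * income


def supply(price, wage):
    """Hypothetical supply function S(P, w)."""
    return 10 + 3 * price - 4 * wage


def equilibrium_price(income, wage, lo=0.0, hi=1000.0, tol=1e-10):
    """Bisection on excess demand, which is decreasing in price here."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if demand(mid, income) - supply(mid, wage) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


income, wage, h = 40.0, 5.0, 1e-4
p_e = equilibrium_price(income, wage)

# Partial derivatives of D and S at the equilibrium, by central differences.
dD_dP = (demand(p_e + h, income) - demand(p_e - h, income)) / (2 * h)
dD_dI = (demand(p_e, income + h) - demand(p_e, income - h)) / (2 * h)
dS_dP = (supply(p_e + h, wage) - supply(p_e - h, wage)) / (2 * h)

# Formula from implicit differentiation of the equilibrium identity.
formula = dD_dI / (dS_dP - dD_dP)

# Brute-force check: re-solve for the equilibrium at nearby incomes.
finite_diff = (
    equilibrium_price(income + h, wage) - equilibrium_price(income - h, wage)
) / (2 * h)

print(p_e, formula, finite_diff)
```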
Summary of the Use of Implicit Differentiation to Obtain Comparative Statics Results

The approach of implicit differentiation is straightforward, yet robust and powerful. It is used extensively in economic analysis, and always consists of the following four steps:

STEP 1: Obtain the first order conditions for the optimization problem, or the equilibrium conditions of the system.
STEP 2: Convert these conditions to identities in the parameters, by substituting in the solution functions.
STEP 3: Totally differentiate these identities with respect to the parameter that is changing.
STEP 4: Solve for the derivatives with respect to that parameter.

G. OPTIMIZATION #3: COMPARATIVE STATICS OF OPTIMAL VALUES

The final question we can ask is how the optimal attainable value of the objective function varies when we change the parameters. This has a surprising aspect to it. In the unconstrained maximization problem:

$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n;\alpha)$$

recall that we get the optimal value function φ(α) by substituting the solutions x1*(α),...,xn*(α) back into the objective function, i.e.:

$$\phi(\alpha) \equiv f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha) \qquad \text{for all } \alpha$$

Thus, we could simply differentiate with respect to α to get:

$$\frac{d\phi(\alpha)}{d\alpha} = \frac{\partial f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)}{\partial x_1} \cdot \frac{dx_1^*(\alpha)}{d\alpha} + \cdots + \frac{\partial f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)}{\partial x_n} \cdot \frac{dx_n^*(\alpha)}{d\alpha} + \frac{\partial f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)}{\partial \alpha}$$

where the last term is obviously the direct effect of α upon the objective function. The first n terms are there because a change in α affects the optimal xi values, which in turn affect the objective function. All in all, this derivative is a big mess. However, if we recall the first order conditions to this problem, we see that since ∂ƒ/∂x1 = ... = ∂ƒ/∂xn = 0 at the optimum, all of these first n terms are zero, so that we just get:

$$\frac{d\phi(\alpha)}{d\alpha} = \frac{\partial f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)}{\partial \alpha}$$

This means that when we evaluate how the optimal value function is affected when we change a parameter, we only have to consider that parameter’s direct effect on the objective function, and can ignore the indirect effects caused by the resulting changes in the optimal values of the control variables. If we keep this in mind, we can save a lot of time.

This also works for constrained maximization problems. Consider the problem
$$\max_{x_1,\ldots,x_n} f(x_1,\ldots,x_n;\alpha) \quad \text{subject to} \quad g(x_1,\ldots,x_n;\alpha) = c$$

Once again, we get the optimal value function by plugging the optimal values of the control variables (namely $x_1^*(\alpha),\ldots,x_n^*(\alpha)$) into the objective function:

$$\phi(\alpha) \;\underset{\alpha}{\equiv}\; f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)$$

Note that since these values must also satisfy the constraint, we also have:

$$c - g(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha) \;\underset{\alpha}{\equiv}\; 0$$

so we can multiply by $\lambda^*(\alpha)$ and add to the previous equation to get:

$$\phi(\alpha) \;\equiv\; f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha) + \lambda^*(\alpha)\cdot\left[c - g(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)\right]$$

which is the same as if we had plugged the optimal values $x_1^*(\alpha),\ldots,x_n^*(\alpha)$ and $\lambda^*(\alpha)$ directly into the Lagrangian formula, or in other words:

$$\phi(\alpha) \;\equiv\; L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha) \;\equiv\; f(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha) + \lambda^*(\alpha)\cdot\left[c - g(x_1^*(\alpha),\ldots,x_n^*(\alpha);\alpha)\right]$$

Now if we differentiate the above identity with respect to $\alpha$, we get:

$$\begin{aligned}
\frac{d\phi(\alpha)}{d\alpha} \;=\;& \frac{\partial L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha)}{\partial x_1}\cdot\frac{dx_1^*(\alpha)}{d\alpha} \;+\; \cdots \;+\; \frac{\partial L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha)}{\partial x_n}\cdot\frac{dx_n^*(\alpha)}{d\alpha} \\
&+\; \frac{\partial L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha)}{\partial \lambda}\cdot\frac{d\lambda^*(\alpha)}{d\alpha} \;+\; \frac{\partial L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha)}{\partial \alpha}
\end{aligned}$$

But once again, since the first order conditions for the constrained maximization problem are $\partial L/\partial x_1 = \cdots = \partial L/\partial x_n = \partial L/\partial \lambda = 0$, all but the last of these right hand terms are zero, so we get:

$$\frac{d\phi(\alpha)}{d\alpha} \;=\; \frac{\partial L(x_1^*(\alpha),\ldots,x_n^*(\alpha),\lambda^*(\alpha);\alpha)}{\partial \alpha}$$

In other words, we only have to take into account the direct effect of $\alpha$ on the Lagrangian function, and can ignore the indirect effects due to changes in the optimal values of the $x_i$'s and $\lambda$. A very helpful thing to know.

H. DETERMINANTS, SYSTEMS OF LINEAR EQUATIONS & CRAMER'S RULE

The Determinant of a Matrix

In order to solve systems of linear equations we need to define the determinant $|A|$ of a square matrix $A$. If $A$ is a 1 × 1 matrix, that is, if $A = [a_{11}]$, we define $|A| = a_{11}$. In the 2 × 2 case:
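if

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{we define} \quad |A| = a_{11}\cdot a_{22} - a_{12}\cdot a_{21}$$

that is, the product along the downward sloping diagonal ($a_{11}\cdot a_{22}$), minus the product along the upward sloping diagonal ($a_{12}\cdot a_{21}$).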
In the 3 × 3 case: if

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

then first form

$$\left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{array}$$

(i.e., recopy the first two columns). Then we define:

$$|A| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}$$

in other words, add the products of all three downward sloping diagonals and subtract the products of all three upward sloping diagonals. Unfortunately, this technique doesn't work for 4 × 4 or bigger matrices, so to hell with them.
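The two diagonal rules above translate directly into code. The following is a minimal sketch, assuming a matrix is stored as a list of rows; the function names `det2` and `det3` are illustrative and not part of the handout.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a11, a12], [a21, a22]]."""
    (a11, a12), (a21, a22) = m
    # Downward sloping diagonal minus upward sloping diagonal
    return a11 * a22 - a12 * a21

def det3(m):
    """Determinant of a 3x3 matrix (list of three rows) by the diagonal rule."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    # Sum of the three downward sloping diagonals
    down = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    # Sum of the three upward sloping diagonals
    up = a13 * a22 * a31 + a11 * a23 * a32 + a12 * a21 * a33
    return down - up
```

For example, `det2([[1, 2], [3, 4]])` evaluates 1·4 − 2·3, and `det3` applied to the 3 × 3 identity matrix gives 1, since only the main downward diagonal is nonzero.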
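Systems of Linear Equations and Cramer's Rule

The general form of a system of n linear equations in the n unknown variables $x_1,\ldots,x_n$ is:

$$\begin{aligned}
a_{11}\cdot x_1 + a_{12}\cdot x_2 + \cdots + a_{1n}\cdot x_n &= c_1 \\
a_{21}\cdot x_1 + a_{22}\cdot x_2 + \cdots + a_{2n}\cdot x_n &= c_2 \\
&\;\;\vdots \\
a_{n1}\cdot x_1 + a_{n2}\cdot x_2 + \cdots + a_{nn}\cdot x_n &= c_n
\end{aligned}$$

for some matrix of coefficients $A$ and vector of constants $C$:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \qquad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

Note that the first subscript in the coefficient $a_{ij}$ refers to its row and the second subscript refers to its column (thus, $a_{ij}$ is the coefficient of $x_j$ in the i'th equation).

We now give Cramer's Rule for solving linear systems. The solutions to the 2 × 2 linear system:

$$\begin{aligned}
a_{11}\cdot x_1 + a_{12}\cdot x_2 &= c_1 \\
a_{21}\cdot x_1 + a_{22}\cdot x_2 &= c_2
\end{aligned}$$

are simply:

$$x_1^* = \frac{\begin{vmatrix} c_1 & a_{12} \\ c_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} \qquad \text{and} \qquad x_2^* = \frac{\begin{vmatrix} a_{11} & c_1 \\ a_{21} & c_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}$$

The solutions to the 3 × 3 system:

$$\begin{aligned}
a_{11}\cdot x_1 + a_{12}\cdot x_2 + a_{13}\cdot x_3 &= c_1 \\
a_{21}\cdot x_1 + a_{22}\cdot x_2 + a_{23}\cdot x_3 &= c_2 \\
a_{31}\cdot x_1 + a_{32}\cdot x_2 + a_{33}\cdot x_3 &= c_3
\end{aligned}$$

are simply:

$$x_1^* = \frac{\begin{vmatrix} c_1 & a_{12} & a_{13} \\ c_2 & a_{22} & a_{23} \\ c_3 & a_{32} & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}} \qquad x_2^* = \frac{\begin{vmatrix} a_{11} & c_1 & a_{13} \\ a_{21} & c_2 & a_{23} \\ a_{31} & c_3 & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}} \qquad x_3^* = \frac{\begin{vmatrix} a_{11} & a_{12} & c_1 \\ a_{21} & a_{22} & c_2 \\ a_{31} & a_{32} & c_3 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}$$

Note that in both the 2 × 2 and the 3 × 3 case, $x_j^*$ is obtained as the ratio of two determinants. The denominator is always the determinant of the coefficient matrix $A$. The numerator is the determinant of a matrix which is just like the coefficient matrix, except that the j'th column has been replaced by the vector of right hand side constants.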
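Cramer's Rule as stated above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a general-purpose solver: it handles only the 2 × 2 and 3 × 3 cases covered in this section, it assumes integer (or `Fraction`) coefficients so that the ratios are exact, and the names `det`, `replace_column` and `cramer` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def det(m):
    """Determinant of a 2x2 or 3x3 matrix (list of rows) by the diagonal rule."""
    if len(m) == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if len(m) == 3:
        (a, b, c), (d, e, f), (g, h, i) = m
        return a * e * i + b * f * g + c * d * h - c * e * g - a * f * h - b * d * i
    raise ValueError("only 2x2 and 3x3 matrices are supported in this sketch")

def replace_column(m, j, col):
    """Copy of m with column j (0-based) replaced by the vector col."""
    n = len(m)
    return [[col[r] if k == j else m[r][k] for k in range(n)] for r in range(n)]

def cramer(a, c):
    """Solve a * x = c by Cramer's Rule; returns the solutions as exact Fractions.

    Assumes integer or Fraction entries (floats are not accepted by Fraction
    when a denominator is supplied).
    """
    d = det(a)
    if d == 0:
        raise ValueError("coefficient matrix has zero determinant")
    # x_j* = |A with j'th column replaced by c| / |A|
    return [Fraction(det(replace_column(a, j, c)), d) for j in range(len(a))]
```

As a usage example, the system 2x1 + x2 = 5, x1 + 3x2 = 10 can be solved with `cramer([[2, 1], [1, 3]], [5, 10])`, which yields x1 = 1 and x2 = 3. The zero-determinant check is an addition of this sketch: the ratio is undefined when the denominator |A| is zero.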