mcleish_chap4

Course: STAT 340, Winter 2007
School: Waterloo

CHAPTER 4

Variance Reduction Techniques

INTRODUCTION

In this chapter we discuss techniques for improving on the speed and efficiency of a simulation, usually called variance reduction techniques. Much of the simulation literature concerns discrete event simulations (DESs), simulations of systems that are assumed to change instantaneously in response to sudden or discrete events. These are most common in operations research, and examples are simulations of processes such as networks or queues. Simulation models in which the process is characterized by a state, with changes only at discrete time points, are DESs. In modeling an inventory system, for example, the arrival of a batch of raw materials can be considered an event that precipitates a sudden change in the state of the system, followed by a demand some discrete time later when the state of the system changes again. A system driven by differential equations in continuous time is not a DES because the changes occur continuously in time.

One approach to DES is future event simulation. This schedules one or more future events at a time, chooses the event in the future event set that has minimum time, updates the state of the system and the clock accordingly, and then repeats this whole procedure. Simulation of a stock price that moves at discrete time points by an amount that may be a continuous random variable is a DES. In fact, this approach is often used in valuing American options by Monte Carlo methods, where we model the stock price path using a binomial or trinomial tree.

Often we identify one or more performance measures by which the system is to be judged, and parameters that may be adjusted to improve the system performance. Examples are the delay for an air traffic control system, customer waiting times for a bank teller scheduling system, delays or throughput for computer networks, and response times for the location of fire stations or supply depots.
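The future event procedure described above can be sketched in a few lines of Python (the chapter's own code is Matlab and R). This is a minimal illustration under our own assumptions:

- The system is a single-server, first-come-first-served queue.
- Arrival and service times are supplied as lists.
- The function name `simulate_queue` is hypothetical.

The future event set is kept as a heap ordered by event time. Each pass removes the event with minimum time, advances the clock, updates the state, and schedules any new events.

```python
import heapq
from collections import deque


def simulate_queue(arrival_times, service_times):
    """Future event simulation of a single-server FIFO queue (illustrative model).

    Returns the waiting time (arrival until start of service) of each customer.
    """
    events = []  # future event set: min-heap ordered by (time, sequence number)
    seq = 0      # tie-breaker so simultaneous events are handled in scheduling order
    for i, t in enumerate(arrival_times):
        heapq.heappush(events, (t, seq, "arrival", i))
        seq += 1

    waiting = deque()        # customers in line, in order of arrival
    server_busy = False
    waits = [0.0] * len(arrival_times)

    while events:
        # Choose the event with minimum time and advance the clock to it.
        clock, _, kind, i = heapq.heappop(events)
        if kind == "arrival":
            if server_busy:
                waiting.append(i)
            else:
                server_busy = True   # service starts at once, so the wait stays 0
                heapq.heappush(events, (clock + service_times[i], seq, "departure", i))
                seq += 1
        else:  # a departure frees the server for the next customer in line
            if waiting:
                j = waiting.popleft()
                waits[j] = clock - arrival_times[j]
                heapq.heappush(events, (clock + service_times[j], seq, "departure", j))
                seq += 1
            else:
                server_busy = False
    return waits


waits = simulate_queue([0.0, 1.0, 2.0], [1.5, 1.5, 1.5])
average_wait = sum(waits) / len(waits)   # a performance measure: mean waiting time
```

With arrivals at times 0, 1 and 2 and service times of 1.5, the waits are 0, 0.5 and 1.0. The mean waiting time, a typical performance measure, is therefore 0.5.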
Performance measures are important in engineering examples or in operations research, but less common in finance. They may be used to calibrate a simulation model, however. For example, our performance measure might be the average distance between observed option prices on a given stock and prices obtained by simulation from a model with specific parameter values.

In all cases, the performance measure is usually the expected value of a complicated function of many variables, often expressible only by a computer program with some simulated random variables as input. Whether these input random variables are generated by inverse transform, acceptance-rejection, or some other method, they are ultimately a function of uniform[0, 1] random variables $U_1, U_2, \ldots$. These uniform random variables determine such quantities as the normally distributed increments of the logarithm of the stock price. In summary, the simulation is used simply to estimate a multidimensional integral of the form

$$E(g(U_1, \ldots, U_d)) = \int_0^1 \cdots \int_0^1 g(u_1, u_2, \ldots, u_d)\, du_1\, du_2 \cdots du_d \qquad (4.1)$$

over the unit cube in $d$ dimensions, where often $d$ is large.

As an example in finance, suppose that we wish to price a European option on a stock price under the following stochastic volatility model.

Example

Suppose the daily asset returns under a risk-neutral distribution are assumed to be a variance mixture of the normal distribution. By this we mean that the variance itself is random, independent of the normal variable, and follows a distribution with moment-generating function $m(s)$.
More specifically, assume under the $Q$ measure that the stock price at time $(n+1)\Delta t$ is determined from

$$S_{(n+1)\Delta t} = S_{n\Delta t}\, \frac{\exp\{r\Delta t + \sigma_{n+1} Z_{n+1}\}}{m(\tfrac12)}$$

where, under the risk-neutral distribution:

- the positive random variables $\sigma_i^2$ are assumed to have a distribution with moment-generating function $m(s) = E\{\exp(s\sigma_i^2)\}$;
- $Z_i$ is standard normal, independent of $\sigma_i^2$;
- both $Z_{n+1}$ and $\sigma_{n+1}^2$ are independent of the process up to time $n\Delta t$.

We wish to determine the price of a European call option with maturity $T$ and strike price $K$.

It should be noted that the rather strange choice of $m(\tfrac12)$ in the denominator above is such that the discounted process is a martingale, since

$$E\left[\frac{\exp\{\sigma_{n+1}Z_{n+1}\}}{m(\tfrac12)}\right] = E\left[\frac{E\left(\exp\{\sigma_{n+1}Z_{n+1}\} \mid \sigma_{n+1}\right)}{m(\tfrac12)}\right] = E\left[\frac{\exp\{\sigma_{n+1}^2/2\}}{m(\tfrac12)}\right] = 1.$$

There are many ways of simulating an option price in the above example, some much more efficient than others. We might, for example, simulate all of the $2n$ random variables $\{\sigma_i, Z_i,\ i = 1, \ldots, n = T/\Delta t\}$ and use these to determine the simulated value of $S_T$. We would then average the discounted payoff from the option, $e^{-rT}(S_T - K)^+$, over many such simulations (say $N$ of them). The price of the option at time 0 is estimated by this average, $\overline{e^{-rT}(S_T - K)^+}$, where the bar denotes the average of the values $e^{-rT}(S_T - K)^+$ over all simulations.

This is a crude and inefficient method of conducting the simulation. Roughly, the time required is proportional to $2Nn$, the total number of random variables generated. This chapter discusses some of the many improvements possible in problems like this. Since each simulation requires at least $d = 2n$ independent uniform random variables to generate the values $\{\sigma_i, Z_i,\ i = 1, \ldots, n\}$, we are trying to estimate a rather complicated integral of the form (4.1) of high dimension $d$. In this case, however, we can immediately see some obvious improvements.
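The crude per-step method just described can be sketched in Python as follows (the chapter itself uses Matlab and R). The sketch rests on assumptions of ours:

- Each $\sigma_i^2$ is gamma with shape $\alpha\Delta t$ and scale $\beta$, so that $m(s) = (1 - \beta s)^{-\alpha\Delta t}$.
- The parameter values $\alpha = 4$, $\beta = 0.01$ are purely illustrative.
- The function names are hypothetical.

Besides the call price, the sketch returns the average discounted terminal price. By the martingale argument above, that average should be close to $S_0$.

```python
import math
import random


def simulate_terminal_price(s0, r, t, n_steps, alpha, beta, rng):
    """One path of the variance-mixture model, stepping S forward n_steps times.

    Illustrative assumption: sigma_i^2 ~ gamma(shape=alpha*dt, scale=beta), with
    m(s) = (1 - beta*s)**(-alpha*dt); beta < 2 is needed for m(1/2) to be finite.
    """
    dt = t / n_steps
    m_half = (1.0 - beta / 2.0) ** (-alpha * dt)    # m(1/2), the martingale correction
    s = s0
    for _ in range(n_steps):
        var_i = rng.gammavariate(alpha * dt, beta)  # sigma_{n+1}^2
        z = rng.gauss(0.0, 1.0)                     # Z_{n+1}, independent of sigma_{n+1}
        s *= math.exp(r * dt + math.sqrt(var_i) * z) / m_half
    return s


def crude_call_price(s0, k, r, t, n_steps, alpha, beta, n_paths, seed=0):
    """Crude estimator: 2*n_steps random variables per path, averaged discounted payoff."""
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    payoff_sum, price_sum = 0.0, 0.0
    for _ in range(n_paths):
        st = simulate_terminal_price(s0, r, t, n_steps, alpha, beta, rng)
        payoff_sum += disc * max(st - k, 0.0)
        price_sum += disc * st
    # Second value: average discounted S_T, which should be close to s0.
    return payoff_sum / n_paths, price_sum / n_paths


price, mean_disc = crude_call_price(10.0, 10.0, 0.05, 0.25, n_steps=10,
                                    alpha=4.0, beta=0.01, n_paths=20_000)
```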
Notice that we can rewrite $S_T$ in the form

$$S_T = S_0\, \frac{\exp\{rT + \sigma Z\}}{m^n(\tfrac12)} \qquad (4.2)$$

where the random variable $\sigma^2 = \sum_{i=1}^n \sigma_i^2$ has moment-generating function $m^n(s)$ and $Z$ is an independent standard normal. Obviously, if we can simulate $\sigma$ directly, we can avoid the computation involved in generating the individual $\sigma_i$.

Further savings are possible in light of the Black-Scholes formula, which provides the price of a call option when a stock price is given by (4.2) and the volatility parameter is nonrandom. Conditional on $\sigma$, $\log S_T$ is normal with variance $\sigma^2$. Here $\sigma^2$ is the total variance over $[0, T]$, playing the role of $\sigma_{BS}^2 T$ in the usual formula. Also, $E(e^{-rT}S_T \mid \sigma) = \tilde S_0$, where $\tilde S_0 = S_0 e^{\sigma^2/2}/m^n(\tfrac12)$. The expected return from the call under the risk-neutral distribution can therefore be written, using the Black-Scholes formula, as

$$E\left[e^{-rT}(S_T - K)^+\right] = E\left\{E\left[e^{-rT}(S_T - K)^+ \mid \sigma\right]\right\} = E\left\{\tilde S_0\, \Phi\!\left(\frac{\log(\tilde S_0/K) + rT + \sigma^2/2}{\sigma}\right) - Ke^{-rT}\, \Phi\!\left(\frac{\log(\tilde S_0/K) + rT - \sigma^2/2}{\sigma}\right)\right\}$$

This is now a one-dimensional integral over the distribution of $\sigma^2$. It can be evaluated either by a one-dimensional numerical integration or by repeatedly simulating the value of $\sigma^2$ and averaging the values of

$$\tilde S_0\, \Phi\!\left(\frac{\log(\tilde S_0/K) + rT + \sigma^2/2}{\sigma}\right) - Ke^{-rT}\, \Phi\!\left(\frac{\log(\tilde S_0/K) + rT - \sigma^2/2}{\sigma}\right)$$

obtained from these simulations.

As a special case we might take the distribution of $\sigma_i^2$ to be gamma$(\alpha\Delta t, \beta)$ with moment-generating function

$$m(s) = \frac{1}{(1 - \beta s)^{\alpha\Delta t}}$$

(we need $\beta < 2$ so that $m(\tfrac12)$ is finite). In that case the distribution of $\sigma^2$ is gamma$(\alpha T, \beta)$. This is the so-called variance-gamma distribution, investigated extensively by Madan and Seneta (1990) and originally suggested as a model for stock prices by McLeish and Pierson (cf. McLeish, 1982).

Alternatively, many other, wider-tailed alternatives to the normal returns model can be written as a variance mixture of the normal distribution, and option prices can be simulated in this way. For example, when the variance is generated having the distribution of the reciprocal of a gamma random variable, the returns have a Student's $t$ distribution. Similarly, the symmetric stable distributions and the Laplace distribution all have a representation as a variance mixture of the normal.
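Here is a minimal Python sketch of the conditioning idea (the chapter's own code is Matlab and R). It assumes the gamma special case with the illustrative values $\alpha = 4$, $\beta = 0.01$. With $T = 0.25$ these give $E\sigma^2 = \alpha T\beta = 0.01$, the Black-Scholes total variance $0.2^2 \times 0.25$.

`conditional_price` averages the Black-Scholes value over simulated $\sigma^2$ only. `crude_price_eq42` simulates (4.2) directly, for comparison. Both function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bs_call_total_var(s0, k, r, t, v):
    """Black-Scholes call price when ln(S_T) has total variance v over [0, t]."""
    if v <= 0.0:  # degenerate case: no randomness remains given sigma
        return max(s0 - k * math.exp(-r * t), 0.0)
    sd = math.sqrt(v)
    d1 = (math.log(s0 / k) + r * t + v / 2.0) / sd
    return s0 * STD_NORMAL.cdf(d1) - k * math.exp(-r * t) * STD_NORMAL.cdf(d1 - sd)


def conditional_price(s0, k, r, t, alpha, beta, n, seed=0):
    """Average the conditional Black-Scholes price over sigma^2 ~ gamma(alpha*t, beta)."""
    mn_half = (1.0 - beta / 2.0) ** (-alpha * t)    # m^n(1/2); requires beta < 2
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v = rng.gammavariate(alpha * t, beta)       # one draw of sigma^2
        s_tilde = s0 * math.exp(v / 2.0) / mn_half  # E(e^{-rT} S_T | sigma)
        total += bs_call_total_var(s_tilde, k, r, t, v)
    return total / n


def crude_price_eq42(s0, k, r, t, alpha, beta, n, seed=1):
    """Crude estimator using (4.2) directly: one gamma and one normal draw per path."""
    mn_half = (1.0 - beta / 2.0) ** (-alpha * t)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v = rng.gammavariate(alpha * t, beta)
        st = s0 * math.exp(r * t + math.sqrt(v) * rng.gauss(0.0, 1.0)) / mn_half
        total += math.exp(-r * t) * max(st - k, 0.0)
    return total / n


cond = conditional_price(10.0, 10.0, 0.05, 0.25, alpha=4.0, beta=0.01, n=50_000)
crude = crude_price_eq42(10.0, 10.0, 0.05, 0.25, alpha=4.0, beta=0.01, n=100_000)
```

The two estimates agree to within simulation error. The conditional version needs one random variable per replication instead of $2n$.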
The rest of this chapter discusses variance reduction techniques, such as the one employed above, for evaluating integrals such as (4.1). We begin with the much simpler case of an integral in one dimension.

VARIANCE REDUCTION FOR ONE-DIMENSIONAL MONTE CARLO INTEGRATION

We wish to use Monte Carlo methods to evaluate the one-dimensional integral $\theta = \int_0^1 f(u)\,du$ for some function $f(u)$. We have already noted that, whatever the distribution of random variables required in our simulation, they are usually generated using uniform[0, 1] random variables $U$. So, without loss of generality, we can assume that the integral is with respect to the uniform[0, 1] probability density function; that is, we wish to estimate

$$\theta = E\{f(U)\} = \int_0^1 f(u)\,du.$$

One simple approach, called crude Monte Carlo, is to randomly sample $U_i \sim$ uniform[0, 1] and then average the values of $f(U_i)$ to obtain

$$\hat\theta_{CR} = \frac{1}{n}\sum_{i=1}^n f(U_i).$$

It is easy to see that $E(\hat\theta_{CR}) = \theta$, so this average is an unbiased estimator of the integral. The variance of the estimator is

$$\operatorname{var}(\hat\theta_{CR}) = \frac{\operatorname{var}(f(U_1))}{n}.$$

Example: A Crude Simulation of a Call Option Price under the Black-Scholes Model

For a simple example that we will use throughout, consider an integral used to price a call option. Suppose a European option has payoff $V(S_T)$, where $S_T$ is the value of the stock at maturity $T$. We have seen that the option can then be valued at the present ($t = 0$) using the discounted future payoff from the option under the risk-neutral measure:

$$e^{-rT}E[V(S_T)] = e^{-rT}E[V(S_0e^X)]$$

In the Black-Scholes model, the random variable $X = \ln(S_T/S_0)$ has a normal distribution with mean $rT - \sigma^2T/2$ and variance $\sigma^2T$. We can generate such a variable by the inverse transform method as $X = \Phi^{-1}(U; rT - \sigma^2T/2, \sigma^2T)$. Here $\Phi^{-1}(U; rT - \sigma^2T/2, \sigma^2T)$ is the inverse of the normal$(rT - \sigma^2T/2, \sigma^2T)$ cumulative distribution function, evaluated at a uniform[0, 1] random variable $U$.
Then the value of the option can be written as an expectation over the distribution of the uniform random variable $U$,

$$E\{f(U)\} = \int_0^1 f(u)\,du$$

where

$$f(u) = e^{-rT}\,V\!\left(S_0\exp\{\Phi^{-1}(u; rT - \sigma^2T/2, \sigma^2T)\}\right).$$

This function is graphed in Figure 4.1 for a simple call option with payoff at maturity $V(S_T) = (S_T - K)^+$ and the following parameters:

- current stock price $S_0 = \$10$;
- exercise price $K = \$10$;
- annual interest rate $r = 5$ percent;
- maturity of three months, or one-quarter of a year ($T = 0.25$);
- annual volatility $\sigma = 0.20$.

[Figure 4.1: The function f(u) whose integral provides the value of a call option; f(u) plotted against u on [0, 1].]

A simple crude Monte Carlo estimator corresponds to evaluating this function at a large number of randomly selected values of $U_i \sim U[0, 1]$ and then averaging the results. For example, the following function in Matlab accepts a vector of inputs $u = (U_1, \ldots, U_n)$ assumed to be uniform[0, 1]. It outputs the values $f(U_1), \ldots, f(U_n)$, which can be averaged to give $\hat\theta_{CR} = \frac1n\sum_{i=1}^n f(U_i)$.

    function v=fn(u)
    % value of the integrand for a call option with exercise price K,
    % r = annual interest rate, sigma = annual volatility, S0 = current stock price,
    % T = maturity. u = vector of uniform (0,1) inputs used to generate
    % normal variates by inverse transform.
    S0 = 10; K = 10; r = .05; sigma = .2; T = .25;   % values of parameters
    ST = S0*exp(norminv(u, r*T-sigma^2*T/2, sigma*sqrt(T)));
    % ST = S0*exp(Phi^{-1}(U; rT - sigma^2 T/2, sigma^2 T)) is the stock price at time T
    v = exp(-r*T)*max(ST-K, 0);   % payoffs from the call option, discounted to the present

The analogous function in R is

    fn <- function(u, So, strike, r, sigma, T){
      # value of the integrand for a call option with exercise price = strike,
      # r = annual interest rate, sigma = annual volatility, So = current stock price,
      # u = uniform (0,1) input used to generate normal variates by inverse transform,
      # T = time to maturity. For the Black-Scholes price, integrate over (0,1).
      x <- So*exp(qnorm(u, mean = r*T - sigma^2*T/2, sd = sigma*sqrt(T)))
      v <- exp(-r*T)*pmax(x - strike, 0)
      v
    }

With these parameter values, the Matlab function is run as

    u = rand(1,500000); mean(fn(u))

and the R function as

    mean(fn(runif(500000), So = 10, strike = 10, r = .05, sigma = .2, T = .25))

This provides an approximate value of the option of $\hat\theta_{CR} = 0.4620$. We may confirm this using the Black-Scholes formula, again in Matlab:

    [CALL,PUT] = blsprice(10,10,0.05,0.25,0.2,0)

The arguments are, in order, $(S_0, K, r, T, \sigma, q)$. The last argument ($q = 0$) is the annual dividend yield, which we assume here to be zero; this is reasonable provided that no dividends are paid on the stock before the maturity of the option. The command returns CALL = 0.4615 and PUT = 0.3373, so our simulated call option price was accurate to within about 0.1 percent. The put option is an option to sell the stock at the specified price \$10 at the maturity date, and it is priced by the same function.

One of the advantages of Monte Carlo methods over numerical techniques is that, because we are using a sample mean, we have a simple estimator of accuracy. In general, when $n$ simulations are conducted, the accuracy is measured by the standard error of the sample mean. Since

$$\operatorname{var}(\hat\theta_{CR}) = \frac{\operatorname{var}(f(U_1))}{n},$$

the standard error of the sample mean is the standard deviation of $\hat\theta_{CR}$, or

$$SE(\hat\theta_{CR}) = \frac{\sigma_f}{\sqrt n} \qquad (4.3)$$

where $\sigma_f^2 = \operatorname{var}(f(U))$. As usual, we estimate $\sigma_f$ using the sample standard deviation. Since fn(u) provides a whole vector of values $(f(U_1), f(U_2), \ldots, f(U_n))$, sqrt(var(fn(u))) is the sample estimator of $\sigma_f$. The standard error $SE(\hat\theta_{CR})$ is therefore given by

    Sf = sqrt(var(fn(u))); Sf/sqrt(length(u))

This gives an estimate of 0.6603 for the standard deviation $\sigma_f$, and hence a standard error $\sigma_f/\sqrt{500\,000}$ of about 0.0009.
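For readers without Matlab or R, here is a minimal Python sketch of the same crude estimator, its standard error (4.3), and the Black-Scholes check. It uses only the standard library: `statistics.NormalDist.inv_cdf` plays the role of `norminv`/`qnorm`. The names `f`, `bs_call` and `crude_mc` are our own, and the sample size is smaller than in the text to keep the run short.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25  # parameters of the running example
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))  # law of X = ln(S_T/S0)


def f(u):
    """Integrand: discounted call payoff at S_T = S0*exp(Phi^{-1}(u; rT - sigma^2 T/2, sigma^2 T))."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # the inverse cdf is infinite at u = 0 and u = 1
    st = S0 * math.exp(X_LAW.inv_cdf(u))
    return math.exp(-R * T) * max(st - K, 0.0)


def bs_call(s0, k, r, t, sigma):
    """Black-Scholes price of a European call with no dividends, used as a check."""
    d1 = (math.log(s0 / k) + (r + sigma**2 / 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    n01 = NormalDist()
    return s0 * n01.cdf(d1) - k * math.exp(-r * t) * n01.cdf(d2)


def crude_mc(n, seed=0):
    """Crude Monte Carlo estimate, sample estimate of sigma_f, and standard error (4.3)."""
    rng = random.Random(seed)
    values = [f(rng.random()) for _ in range(n)]
    sd_f = stdev(values)
    return fmean(values), sd_f, sd_f / math.sqrt(n)


estimate, sd_f, std_err = crude_mc(200_000)
exact = bs_call(S0, K, R, T, SIGMA)
```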
Of course, parameters in statistical problems are usually estimated using an interval estimate, or confidence interval. This is an interval constructed by a method that guarantees capturing the true value of the parameter, under similar circumstances, with high probability (the confidence coefficient, often taken to be 95 percent). Formally,

Definition

A 95 percent confidence interval for a parameter $\theta$ is an interval $[L, U]$ with random endpoints $L, U$ such that the probability $P[L \le \theta \le U] = 0.95$.

Suppose we were to repeat the experiment 100 times, say by running 100 more similar, independent simulations, and in each case use the results to construct a 95 percent confidence interval. This definition implies that roughly 95 of the intervals constructed will contain the true value of the parameter (and, of course, roughly 5 will not).

For an approximately normal$(\mu_X, \sigma_X^2)$ random variable $X$, we can use the approximation

$$P[\mu_X - 2\sigma_X \le X \le \mu_X + 2\sigma_X] \approx 0.95 \qquad (4.4)$$

to build a simple confidence interval; that is, approximately normal variables are within two standard deviations of their mean with probability around 95 percent. Strictly, the value $2\sigma_X$ should be replaced by $1.96\sigma_X$, where 1.96 is taken from the normal distribution tables. The value 2 is very close to correct for a $t$ distribution with 60 degrees of freedom. In any case, these confidence intervals, which assume approximate normality, are typically too short for most real data (i.e., they contain the true value of the parameter less frequently than stated), so a value marginally larger than 1.96 is warranted.

Replacing $\sigma_X$ above by the standard deviation of a sample mean, (4.4) results in the approximate 95 percent confidence interval

$$\left[\hat\theta_{CR} - 2\frac{\sigma_f}{\sqrt n},\ \hat\theta_{CR} + 2\frac{\sigma_f}{\sqrt n}\right]$$

for the true value $\theta$. For our option this interval is $0.462 \pm 2(0.0009)$, or roughly $[0.4602, 0.4638]$. As it happens, this interval does capture the true value of the option, 0.4615.
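The repeated-experiment reading of the definition can be checked directly with a small Python sketch (not the chapter's code). It builds the interval $\hat\theta_{CR} \pm 2\sigma_f/\sqrt n$ from each of 100 independent crude simulations and counts how many contain the Black-Scholes price. The choices of 100 repetitions, 2000 draws each, and the function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))  # law of ln(S_T/S0)


def f(u):
    """Discounted call payoff generated from a uniform u by inverse transform."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # avoid the infinite inverse cdf at 0 and 1
    return math.exp(-R * T) * max(S0 * math.exp(X_LAW.inv_cdf(u)) - K, 0.0)


def bs_call(s0, k, r, t, sigma):
    """Black-Scholes call price, used here as the true value of the parameter."""
    d1 = (math.log(s0 / k) + (r + sigma**2 / 2) * t) / (sigma * math.sqrt(t))
    n01 = NormalDist()
    return s0 * n01.cdf(d1) - k * math.exp(-r * t) * n01.cdf(d1 - sigma * math.sqrt(t))


TRUE_PRICE = bs_call(S0, K, R, T, SIGMA)


def ci95(values):
    """Approximate 95 percent interval: sample mean +/- 2 standard errors, as in the text."""
    m = fmean(values)
    half = 2 * stdev(values) / math.sqrt(len(values))
    return m - half, m + half


def coverage(reps=100, n=2000, seed=1):
    """Repeat the simulation reps times; count the intervals that contain the true price."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = ci95([f(rng.random()) for _ in range(n)])
        hits += lo <= TRUE_PRICE <= hi
    return hits


hits = coverage()
```

The count should come out near 95. Moderately fewer would illustrate the remark that normal-theory intervals tend to be slightly too short.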
So far Monte Carlo has not told us anything we couldn't obtain from the Black-Scholes formula, but what if we used a distribution other than the normal to generate the returns? This is an easy modification of the above. For example, suppose we replace the standard normal by a logistic distribution, which, as we have seen, has a density function very similar to the standard normal if we choose $b = 0.625$. Of course, the Black-Scholes formula does not apply to a process with logistically distributed returns. We need only replace the standard normal inverse cumulative distribution function by the corresponding inverse for the logistic,

$$F^{-1}(U) = b\ln\left(\frac{U}{1 - U}\right)$$

and thus replace the Matlab code

    norminv(u, T*(r-sigma^2/2), sigma*sqrt(T))

by

    T*(r-sigma^2/2) + sigma*sqrt(T)*.625*log(u./(1-u))

This results in a slight increase in the option value (to 0.504) and a considerable (about 50 percent) increase in the variance of the estimator.

We will look at the efficiency of various improvements to crude Monte Carlo. To that end, we record the variance of the estimator based on a single uniform variate in the original normal (Black-Scholes) case:

$$\sigma_{crude}^2 = \sigma_f^2 = \operatorname{var}(f(U)) \approx 0.436$$

The crude Monte Carlo estimator using $n$ function evaluations ($n$ uniform variates) then has variance approximately $0.436/n$. Suppose we could adjust the method so that the variance $\sigma_f^2$ based on a single evaluation of the function $f$, the numerator here, were halved. Then we could achieve the same accuracy from a simulation using half the number of function evaluations. For this reason, when we compare two different methods for conducting a simulation, the ratio of variances corresponding to a fixed number of function evaluations can also be interpreted roughly as the ratio of computational effort required for a given predetermined accuracy.
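The one-line swap from normal to logistic returns looks like this in a Python sketch (not the chapter's code; the names are ours). Both integrands are evaluated on the same uniforms so their means and variances can be compared directly. The scale $b = 0.625$ is taken from the text.

```python
import math
import random
from statistics import NormalDist, fmean, variance

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
B = 0.625                          # logistic scale from the text
DRIFT = R * T - SIGMA**2 * T / 2   # mean of X = ln(S_T/S0) in the normal model
SCALE = SIGMA * math.sqrt(T)
N01 = NormalDist()


def payoff(x):
    """Discounted call payoff when ln(S_T/S0) = x."""
    return math.exp(-R * T) * max(S0 * math.exp(x) - K, 0.0)


def clamp(u):
    """Keep u strictly inside (0, 1), where both inverse cdfs are finite."""
    return min(max(u, 1e-12), 1.0 - 1e-12)


def f_normal(u):
    # same as norminv(u, DRIFT, SCALE) in the Matlab version
    return payoff(DRIFT + SCALE * N01.inv_cdf(clamp(u)))


def f_logistic(u):
    # logistic inverse cdf b*log(u/(1-u)) in place of the standard normal one
    u = clamp(u)
    return payoff(DRIFT + SCALE * B * math.log(u / (1.0 - u)))


rng = random.Random(0)
us = [rng.random() for _ in range(200_000)]
normal_vals = [f_normal(u) for u in us]
logistic_vals = [f_logistic(u) for u in us]
price_normal, price_logistic = fmean(normal_vals), fmean(logistic_vals)
var_ratio = variance(logistic_vals) / variance(normal_vals)
```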
We will often compare various new methods of estimating the same function, based on variance reduction schemes, and quote the efficiency gain over crude Monte Carlo sampling:

$$\text{Efficiency} = \frac{\text{Variance of crude Monte Carlo estimator}}{\text{Variance of new estimator}} \qquad (4.5)$$

Here the numerator and denominator correspond to estimators with the same number of function evaluations, since this is usually the more expensive part of the computation. An efficiency of 100 would indicate that the crude Monte Carlo estimator would require 100 times the number of function evaluations to achieve the same variance or standard error of the estimator.

To begin with, consider a crude estimator obtained from five $U[0, 1]$ variates,

$$U_i = 0.1, 0.3, 0.5, 0.6, 0.8, \quad i = 1, \ldots, 5.$$

The crude Monte Carlo estimator in the case $n = 5$ is displayed in Figure 4.2; the estimator is the sum of the areas of the marked rectangles. Only three of the five points actually contribute to this area. For this particular function,

$$f(u) = e^{-rT}\left(S_0\exp\{\Phi^{-1}(u; rT - \sigma^2T/2, \sigma^2T)\} - K\right)^+ \qquad (4.6)$$

with the parameters chosen, we have $f(0.1) = f(0.3) = 0$. Since these two random numbers contributed 0 and the other three appear to be on average slightly too small, the sum of the areas of the rectangles appears to underestimate the integral.

[Figure 4.2: Crude Monte Carlo estimator based on five observations, $U_i = 0.1, 0.3, 0.5, 0.6, 0.8$; f(u) plotted against u on [0, 1].]

Of course, another selection of five uniform random numbers may prove to be even more badly distributed, and may result in an under- or overestimate. There are various ways of improving the efficiency of this estimator, many of which partially emulate numerical integration techniques. First we should note that most numerical integrals, like $\hat\theta_{CR}$, are weighted averages of the values of the function at certain points $U_i$.
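A short Python sketch (not the chapter's code; the names are ours) reproduces the five-point crude estimate of Figure 4.2 and implements the efficiency ratio (4.5). It shows that $f(0.1) = f(0.3) = 0$ and that the resulting average, about 0.27, falls well below the true value 0.4615. As a usage example, the efficiency is computed from the two variances quoted in this section's antithetic comparison, $4.35 \times 10^{-7}$ and $2.24 \times 10^{-7}$.

```python
import math
from statistics import NormalDist

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))  # law of ln(S_T/S0)


def f(u):
    """The integrand (4.6) for the running call option example."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inverse cdf is infinite at 0 and 1
    return math.exp(-R * T) * max(S0 * math.exp(X_LAW.inv_cdf(u)) - K, 0.0)


def efficiency(var_crude, var_new):
    """Efficiency (4.5); both variances must use the same number of function evaluations."""
    return var_crude / var_new


points = [0.1, 0.3, 0.5, 0.6, 0.8]
values = [f(u) for u in points]
crude_five = sum(values) / len(points)  # sum of the rectangle areas in Figure 4.2
antithetic_gain = efficiency(4.35e-7, 2.24e-7)
```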
What if we evaluated the function at nonrandom points, chosen to attempt a reasonable balance between locations where the function is large and small? Numerical integration techniques and quadrature methods choose both the points at which we evaluate the function and the weights that we attach to these points. The aim is accurate approximations for polynomials of a certain degree. For example, suppose we insist on evaluating the function at equally spaced points, such as $0, 1/n, 2/n, \ldots, (n-1)/n, 1$. In some sense these points are more uniform than we are likely to obtain from $n + 1$ randomly and independently chosen points. The trapezoidal rule corresponds to using such equally spaced points and equal weights (except at the boundary), so that the estimator of the integral is

$$\hat\theta_{TR} = \frac{1}{2n}\left\{f(0) + 2f(1/n) + \cdots + 2f\left(\frac{n-1}{n}\right) + f(1)\right\} \qquad (4.7)$$

In our case $f(1)$ is infinite, since $\Phi^{-1}(1) = \infty$. We therefore use the simpler and very similar alternative (the midpoint rule), here with $n = 5$:

$$\hat\theta_{TR} = \frac15\{f(0.1) + f(0.3) + f(0.5) + f(0.7) + f(0.9)\} \qquad (4.8)$$

[Figure 4.3: Graphical illustration of (4.8).]

A reasonable balance between large and small values of the function is almost guaranteed by such a rule, as shown in Figure 4.3, with the observations equally spaced. Simpson's rule uses equally spaced points and weights that, except at the endpoints, alternate between $4/(3n)$ and $2/(3n)$. In the case when $n$ is even, the integral is estimated by

$$\hat\theta_{SR} = \frac{1}{3n}\left\{f(0) + 4f(1/n) + 2f(2/n) + \cdots + 4f\left(\frac{n-1}{n}\right) + f(1)\right\} \qquad (4.9)$$

The trapezoidal rule is exact for linear functions, and Simpson's rule is exact for quadratic (indeed cubic) functions. These one-dimensional numerical integration rules provide some insight into how to achieve lower variance in Monte Carlo integration. They illustrate some options for increasing accuracy over simple random sampling: we may vary the weights attached to the individual points, vary the points (the $U_i$) themselves, or both.
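The three rules (4.7), (4.8) and (4.9) are short enough to write out in a Python sketch (not the chapter's code; the names are ours). The sketch checks the exactness claims on a linear and a cubic polynomial, then applies the midpoint rule (4.8) to the call option integrand. The trapezoidal and Simpson rules would need the infinite value $f(1)$.

```python
import math
from statistics import NormalDist


def trapezoid(g, n):
    """Trapezoidal rule (4.7) on [0, 1] with n intervals."""
    h = 1.0 / n
    return h * (g(0.0) / 2 + sum(g(i * h) for i in range(1, n)) + g(1.0) / 2)


def midpoint(g, n):
    """Midpoint rule; with n = 5 this is (4.8), using the points 0.1, 0.3, ..., 0.9."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def simpson(g, n):
    """Simpson's rule (4.9); n must be even."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)  # interior weights alternate 4, 2, 4, ...
    return total * h / 3


S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))


def f(u):
    """Call option integrand (4.6); f(u) grows without bound as u approaches 1."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return math.exp(-R * T) * max(S0 * math.exp(X_LAW.inv_cdf(u)) - K, 0.0)


linear_exact = trapezoid(lambda u: 3 * u + 1, 4)           # integral is 2.5
cubic_exact = simpson(lambda u: u**3 - u**2 + 2, 2)        # integral is 23/12
midpoint_call = midpoint(f, 5)                             # compare with 0.4615
```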
Notice that, as long as the $U_i$ individually have distributions that are uniform[0, 1], we can introduce any degree of dependence among them in order to come closer to the equal spacings characteristic of numerical integrals. Even if the $U_i$ are dependent $U[0, 1]$ random variables, an estimator of the form

$$\frac1n\sum_{i=1}^n f(U_i)$$

will continue to be an unbiased estimator, because each of the summands continues to satisfy $E(f(U_i)) = \theta$. Ideally, if we introduce dependence among the various $U_i$ and the expected value remains unchanged, we would wish that the variance

$$\operatorname{var}\left(\frac1n\sum_{i=1}^n f(U_i)\right)$$

is reduced relative to independent uniform sampling. The simplest case of this idea is the use of antithetic random variables.

Antithetic Random Numbers

Consider first the simple case of $n = 2$ function evaluations at possibly dependent points. Then the estimator is

$$\hat\theta = \frac12\{f(U_1) + f(U_2)\}$$

with expected value $\theta = \int_0^1 f(u)\,du$ and variance

$$\operatorname{var}(\hat\theta) = \frac12\{\operatorname{var}(f(U_1)) + \operatorname{cov}[f(U_1), f(U_2)]\}$$

assuming both $U_1, U_2$ are uniform[0, 1]. In the independent case the covariance term disappears, and we obtain the variance of the crude Monte Carlo estimator,

$$\frac12\operatorname{var}(f(U_1)).$$

Notice, however, that if we are able to introduce a negative covariance, the resulting variance of $\hat\theta$ will be smaller than that of the corresponding crude Monte Carlo estimator. The question is how to generate this negative covariance. Suppose, for example, that $f$ is monotone (increasing or decreasing). Then $f(1 - U_1)$ decreases whenever $f(U_1)$ increases, so substituting $U_2 = 1 - U_1$ has the desired effect and produces a negative covariance. (In fact, we will show later that we cannot do any better when the function $f$ is monotone.) Such a choice, $U_2 = 1 - U_1$, which helps reduce the variability in $f(U_1)$, is termed an antithetic variate.
In our example, because the function to be integrated is monotone, there is a negative correlation between $f(U_1)$ and $f(1 - U_1)$, and

$$\frac12\{\operatorname{var}(f(U_1)) + \operatorname{cov}[f(U_1), f(1 - U_1)]\} < \frac12\operatorname{var}(f(U_1)).$$

That is, the variance is decreased relative to simple random sampling. Of course, in practice our sample size is much greater than $n = 2$, but we still enjoy the benefits of this argument if we generate the points in antithetic pairs.

For example, to determine the extent of the variance reduction using antithetic random numbers, suppose we generate 500,000 uniform variates $U$ and use as well the values of $1 - U$, for a total of 1,000,000 function evaluations as before:

    F = (fn(u)+fn(1-u))/2;

This results in mean(F) = 0.46186 and var(F) = 0.1121. The variance of the estimator is

$$\frac{\text{var}(F)}{\text{length}(F)} = \frac{0.1121}{500\,000} \approx 2.24 \times 10^{-7}$$

which corresponds to a standard error of about 0.00047. Each of the 500,000 components of F is obtained from two function evaluations. This variance should therefore be compared with that of a crude Monte Carlo estimator with 1,000,000 function evaluations, $\sigma_{crude}^2/1\,000\,000 = 4.35 \times 10^{-7}$. The efficiency gain due to the use of antithetic random numbers is 4.35/2.24, or about 2, so roughly half as many function evaluations using antithetic random numbers provide the same precision as a crude Monte Carlo estimator. There is the additional advantage that only half as many uniform random variables are required. The introduction of antithetic variates has had the same effect on precision as increasing the sample size under crude Monte Carlo by a factor of approximately 2.

We have noted that antithetic random numbers improve the efficiency whenever the function being integrated is monotone in $u$. What if it is not? For example, suppose we use antithetic random numbers to integrate the function $f(u) = u(1 - u)$ on the interval $0 < u < 1$.
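Both halves of this discussion can be checked with a Python sketch (not the chapter's code; the names are ours). For the monotone call integrand, the antithetic pair average has roughly half the variance of a crude estimator with the same two evaluations. For the non-monotone $g(u) = u(1 - u)$, the pairs are identical, so the efficiency drops to about 0.5. The last function implements the remedy discussed next in the text: antithetic pairs taken separately inside $[0, \frac12]$ and $[\frac12, 1]$, which for this $g$ works out to an efficiency of about 8.

```python
import math
import random
from statistics import NormalDist, variance

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))


def f(u):
    """Monotone call option integrand (4.6)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inverse cdf is infinite at 0 and 1
    return math.exp(-R * T) * max(S0 * math.exp(X_LAW.inv_cdf(u)) - K, 0.0)


def g(u):
    """The non-monotone example from the text, g(u) = u(1 - u)."""
    return u * (1.0 - u)


def antithetic_efficiency(h, n_pairs=100_000, seed=0):
    """Efficiency (4.5) of pairs (h(U) + h(1 - U))/2 versus two independent crude draws."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n_pairs)]
    crude_var = variance([h(u) for u in us])
    pair_var = variance([(h(u) + h(1.0 - u)) / 2 for u in us])
    return (crude_var / 2) / pair_var


def split_term(h, u1, u2):
    """Antithetic pairs inside [0, 1/2] and inside [1/2, 1]: four evaluations."""
    return (h(u1 / 2) + h((1 - u1) / 2) + h((1 + u2) / 2) + h((2 - u2) / 2)) / 4


def split_efficiency(h, n=100_000, seed=0):
    """Efficiency of split_term versus a crude estimator with four evaluations."""
    rng = random.Random(seed)
    crude_var = variance([h(rng.random()) for _ in range(n)])
    split_var = variance([split_term(h, rng.random(), rng.random()) for _ in range(n)])
    return (crude_var / 4) / split_var


eff_call = antithetic_efficiency(f)   # text reports about 2
eff_bad = antithetic_efficiency(g)    # about 0.5: antithetic pairing hurts here
eff_split = split_efficiency(g)       # splitting at 1/2 restores a gain
```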
Here, rather than balancing large values with small values and so reducing the variance of the estimator, $f(U)$ and $f(1 - U)$ are strongly positively correlated; in fact they are equal. The same argument that supports antithetic random numbers for monotone functions therefore shows that here they increase the variance, compared with a crude estimator using the same number of function evaluations. Of course, this problem can be remedied if we can identify intervals in which the function is monotone. In this case we could use antithetic random numbers in the two intervals $[0, \frac12]$ and $[\frac12, 1]$. For example, we might estimate $\int_0^1 f(u)\,du$ by an average of terms like

$$\frac14\left\{f\left(\frac{U_1}{2}\right) + f\left(\frac{1 - U_1}{2}\right) + f\left(\frac{1 + U_2}{2}\right) + f\left(\frac{2 - U_2}{2}\right)\right\}$$

for independent $U[0, 1]$ random variables $U_1, U_2$.

Stratified Sample

One of the reasons for the inaccuracy of the crude Monte Carlo estimator in the above example is the large interval, evident in Figure 4.1, in which the function is zero. Nevertheless, both crude and antithetic Monte Carlo methods sample in that region, and this portion of the sample contributes nothing to our integral. Naturally, we would prefer to concentrate our sample in the region where the function is positive, and to use larger sample sizes where the function is more variable. One method designed to achieve this objective is the use of a stratified sample.

Once again, for a simple example we choose $n = 2$ function evaluations. With $V_1 \sim U[0, a]$ and $V_2 \sim U[a, 1]$, define the estimator

$$\hat\theta_{st} = af(V_1) + (1 - a)f(V_2).$$

Note that this is a weighted average of the two function values, with weights $a$ and $1 - a$ proportional to the lengths of the corresponding intervals.
It is easy to show once again that $\hat\theta_{st}$ is an unbiased estimator of $\theta$, since

$$E(\hat\theta_{st}) = aEf(V_1) + (1 - a)Ef(V_2) = a\int_0^a f(x)\frac1a\,dx + (1 - a)\int_a^1 f(x)\frac{1}{1 - a}\,dx = \int_0^1 f(x)\,dx = \theta.$$

Moreover,

$$\operatorname{var}(\hat\theta_{st}) = a^2\operatorname{var}[f(V_1)] + (1 - a)^2\operatorname{var}[f(V_2)] + 2a(1 - a)\operatorname{cov}[f(V_1), f(V_2)] \qquad (4.10)$$

Suppose $V_1, V_2$ are independent, so that the covariance term vanishes and

$$\operatorname{var}(\hat\theta_{st}) = a^2\operatorname{var}[f(V_1)] + (1 - a)^2\operatorname{var}[f(V_2)].$$

Even then there may be a dramatic improvement in variance over crude Monte Carlo, provided that the variability of $f$ within each of the intervals $[0, a]$ and $[a, 1]$ is substantially less than over the whole interval $[0, 1]$.

Let us return to the call option example above, with $f$ defined by (4.6). Suppose for simplicity we choose independent values of $V_1, V_2$. In this case,

$$\operatorname{var}(\hat\theta_{st}) = a^2\operatorname{var}[f(V_1)] + (1 - a)^2\operatorname{var}[f(V_2)] \qquad (4.11)$$

For example, for $a = 0.7$ this results in a variance of about 0.046, obtained from

    a = 0.7;
    F = a*fn(a*rand(1,500000)) + (1-a)*fn(a+(1-a)*rand(1,500000));
    var(F)

The variance of the sample mean of the components of the vector F is var(F)/length(F), or around $9.2 \times 10^{-8}$. Each component of the vector above corresponds to two function evaluations, so we should compare this with a crude Monte Carlo estimator with $n = 1\,000\,000$, which has variance $\sigma_f^2 \times 10^{-6} = 4.36 \times 10^{-7}$. This corresponds to an efficiency gain of 43.6/9.2, or around 5: we can afford to use one-fifth the sample size simply by splitting the sample into two strata. The improvement is somewhat limited by the fact that we are still sampling in a region in which the function is 0 (although now slightly less often).

A general stratified sample estimator is constructed as follows. We subdivide the interval $[0, 1]$ into convenient subintervals $0 = x_0 < x_1 < \cdots < x_k = 1$. We then select $n_i$ random variables uniform on the corresponding interval, $V_{ij} \sim U[x_{i-1}, x_i]$, $j = 1, 2, \ldots, n_i$.
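The two-strata experiment with $a = 0.7$ can be reproduced with a minimal Python sketch (not the chapter's code; the names are ours, and the sample size is smaller than in the text). Each stratified term uses two function evaluations, so it is compared with half the variance of a single crude draw. The text reports a term variance of about 0.046 and an efficiency of around 5.

```python
import math
import random
from statistics import NormalDist, fmean, variance

S0, K, R, SIGMA, T = 10.0, 10.0, 0.05, 0.20, 0.25
X_LAW = NormalDist(R * T - SIGMA**2 * T / 2, SIGMA * math.sqrt(T))


def f(u):
    """Call option integrand (4.6)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inverse cdf is infinite at 0 and 1
    return math.exp(-R * T) * max(S0 * math.exp(X_LAW.inv_cdf(u)) - K, 0.0)


def two_strata_terms(a, n, seed=0):
    """n independent copies of a*f(V1) + (1 - a)*f(V2), V1 ~ U[0, a], V2 ~ U[a, 1]."""
    rng = random.Random(seed)
    return [a * f(a * rng.random()) + (1 - a) * f(a + (1 - a) * rng.random())
            for _ in range(n)]


terms = two_strata_terms(0.7, 200_000)
est = fmean(terms)
var_term = variance(terms)                 # text: about 0.046

rng = random.Random(1)
crude_var = variance([f(rng.random()) for _ in range(200_000)])
gain = (crude_var / 2) / var_term          # two evaluations per stratified term
```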
Then the estimator of $\theta$ is

$$\hat\theta_{st} = \sum_{i=1}^k (x_i - x_{i-1})\frac{1}{n_i}\sum_{j=1}^{n_i} f(V_{ij}) \qquad (4.12)$$

Once again, the weights $(x_i - x_{i-1})$ on the average of the function in the $i$th interval are proportional to the lengths of these intervals, and the estimator $\hat\theta_{st}$ is unbiased:

$$E(\hat\theta_{st}) = \sum_{i=1}^k (x_i - x_{i-1})\frac{1}{n_i}\sum_{j=1}^{n_i}Ef(V_{ij}) = \sum_{i=1}^k (x_i - x_{i-1})Ef(V_{i1}) = \sum_{i=1}^k (x_i - x_{i-1})\int_{x_{i-1}}^{x_i} f(x)\frac{1}{x_i - x_{i-1}}\,dx = \int_0^1 f(x)\,dx = \theta.$$

In the case that all of the $V_{ij}$ are independent, the variance is given by

$$\operatorname{var}(\hat\theta_{st}) = \sum_{i=1}^k (x_i - x_{i-1})^2\frac{1}{n_i}\operatorname{var}[f(V_{i1})] \qquad (4.13)$$

Again, if we choose our intervals so that the variation within intervals, $\operatorname{var}[f(V_{i1})]$, is small, this provides a substantial improvement over crude Monte Carlo.

Suppose we wish to choose the sample sizes so as to minimize this variance. Obviously, to avoid infinite sample sizes and to keep a ceiling on costs, we need to impose a constraint on the total sample size, say

$$\sum_{i=1}^k n_i = n \qquad (4.14)$$

If we treat the parameters $n_i$ as continuous variables, we can use the method of Lagrange multipliers to solve

$$\min_{\{n_i\}} \sum_{i=1}^k (x_i - x_{i-1})^2\frac{1}{n_i}\operatorname{var}[f(V_{i1})]$$

subject to the constraint (4.14). It is easy to show that the optimal choice of sample sizes within intervals is

$$n_i \propto (x_i - x_{i-1})\sqrt{\operatorname{var}[f(V_{i1})]}$$

or, more precisely,

$$n_i = n\frac{(x_i - x_{i-1})\sqrt{\operatorname{var}[f(V_{i1})]}}{\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\operatorname{var}[f(V_{j1})]}} \qquad (4.15)$$

In practice, of course, this will not necessarily produce an integer value of $n_i$, so we are forced to round to the nearest integer. For this optimal choice of sample sizes, the variance is

$$\operatorname{var}(\hat\theta_{st}) = \frac1n\left(\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\operatorname{var}[f(V_{j1})]}\right)^2$$

The term $\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\operatorname{var}[f(V_{j1})]}$ is a weighted average of the standard deviations of the function $f$ within the intervals $(x_{j-1}, x_j)$. At least for a continuous function, these standard deviations can clearly be made small simply by choosing $k$ large with $|x_i - x_{i-1}|$ small.
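The allocation rule (4.15) and the variance formula (4.13) translate directly into a few lines of Python. This is a sketch, not the chapter's code. The two-stratum numbers below are hypothetical, chosen only so the arithmetic is easy to follow: two strata of equal width, with the second three times as variable as the first. The optimal split puts 75 of the 100 draws in the more variable stratum, which beats an equal split and attains the minimum given by the optimal-variance formula.

```python
import math


def optimal_sizes(widths, sds, n):
    """Sample sizes from (4.15), rounded down, given stratum widths and within-stratum sds."""
    weights = [w * s for w, s in zip(widths, sds)]
    total = sum(weights)
    return [math.floor(n * wt / total) for wt in weights]


def stratified_variance(widths, sds, sizes):
    """Variance (4.13) of the stratified estimator with independent draws (all sizes > 0)."""
    return sum(w**2 * s**2 / ni for w, s, ni in zip(widths, sds, sizes))


def optimal_variance(widths, sds, n):
    """Minimum of (4.13) subject to (4.14), treating the n_i as continuous."""
    return sum(w * s for w, s in zip(widths, sds)) ** 2 / n


# Hypothetical example: equal widths, second stratum three times as variable.
widths, sds, n = [0.5, 0.5], [1.0, 3.0], 100
best = optimal_sizes(widths, sds, n)                   # [25, 75]
v_best = stratified_variance(widths, sds, best)        # about 0.04
v_equal = stratified_variance(widths, sds, [50, 50])   # about 0.05
v_formula = optimal_variance(widths, sds, n)           # about 0.04
```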
In other words, if we ignore the fact that the sample sizes must be integers, then at least for a continuous function $f$ we can achieve arbitrarily small $\operatorname{var}(\hat\theta_{st})$ with a fixed sample size $n$. We simply divide the interval into a very large number of small strata. The intervals should be chosen so that the variances $\operatorname{var}[f(V_{i1})]$ are small, and the sample sizes so that $n_i \propto (x_i - x_{i-1})\sqrt{\operatorname{var}[f(V_{i1})]}$. In summary, optimal sample sizes are proportional to the length of each interval times the standard deviation of the function evaluated at a uniform random variable on that interval. For sufficiently small strata we can achieve arbitrarily small variances.

The following function accepts the strata boundaries $x_0, x_1, \ldots, x_k$ and the desired sample size $n$ as input. It determines approximately optimal sample sizes and the stratified sample estimator as follows:

1. Initially, samples of size 1000 are drawn from each stratum, and these are used to estimate $\operatorname{var}[f(V_{i1})]$.
2. Approximately optimal sample sizes $n_i$ are then calculated from (4.15).
3. Samples of size $n_i$ are then taken, and the function outputs the stratified sample estimator (4.12), its variance (4.13), and the sample sizes $n_i$.
```matlab
function [est,v,n]=stratified(x,nsample)
% function for optimal sample size stratified estimator on call option price example
% [est,v,n]=stratified([0 .6 .85 1],100000) uses three strata (0,.6),(.6,.85),(.85,1)
% and total sample size 100000
est=0;
n=[];
m=length(x);
for i=1:m-1
    % the preliminary sample of size 1000
    v=var(callopt2(unifrnd(x(i),x(i+1),1,1000),10,10,.05,.2,.25));
    n=[n (x(i+1)-x(i))*sqrt(v)];
end
n=floor(nsample*n/sum(n)); % calculation of the optimal sample sizes, rounded down
v=0;
for i=1:m-1
    % evaluate the function f at n(i) uniform points in the interval
    F=callopt2(unifrnd(x(i),x(i+1),1,n(i)),10,10,.05,.2,.25);
    est=est+(x(i+1)-x(i))*mean(F);
    v=v+var(F)*(x(i+1)-x(i))^2/n(i);
end
```

For example, the call `[est,v,n]=stratified([0 .6 .85 1],100000)` generates a stratified sample with three strata, $[0, 0.6]$, $(0.6, 0.85]$, and $(0.85, 1]$. It outputs the estimate est $= 0.4617$, its variance $v = 3.5 \times 10^{-7}$, and the approximately optimal sample sizes $n = (26{,}855,\ 31{,}358,\ 41{,}785)$. To compare this with a crude Monte Carlo estimator, note that a total of 99,998 function evaluations are used, so the efficiency gain is $\sigma_f^2/(99{,}998 \times 3.5 \times 10^{-7}) = 12.8$. This stratified random sample therefore improves efficiency by a factor of about 13. There is a small setup cost (a preliminary sample of size 3000), which we have not included in the calculation. The preliminary sample could also have been combined with the main sample for a very slight further decrease in variance. For comparison, the call `[est,v,n]=stratified([.47 .62 .75 .87 .96 1],1000000)` uses five strata, $[0.47, 0.62]$, $[0.62, 0.75]$, $[0.75, 0.87]$, $[0.87, 0.96]$, and $[0.96, 1]$, and gives an estimator variance of $7.4 \times 10^{-9}$. Since a crude sample of the same size has variance around $4.36 \times 10^{-7}$, the efficiency is about 170. This stratified sample is as good as a crude Monte Carlo estimator with 170 million simulations!
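For readers without Matlab, here is a Python translation of the three steps listed above. It is a sketch, not the author's code. Because callopt2 is not shown in this section, the fn below is our assumption of what it computes: the discounted payoff of a European call with S0 = 10, K = 10, r = 0.05, σ = 0.2, T = 0.25 (the arguments passed to callopt2 above), with the terminal price driven by one uniform u through the inverse normal c.d.f. Its exact integral is the Black-Scholes price, about 0.4615. Forcing at least two draws per stratum is also our addition, so that a within-stratum variance can always be computed.

```python
import math
import random
import statistics
from statistics import NormalDist

# ASSUMPTION: stand-in for the chapter's callopt2(u,10,10,.05,.2,.25) / fn(u), i.e. the
# discounted payoff of a European call (S0=10, K=10, r=0.05, sigma=0.2, T=0.25) whose
# terminal price is driven by one uniform u through the inverse normal c.d.f.
S0, STRIKE, RATE, SIGMA, MATURITY = 10.0, 10.0, 0.05, 0.2, 0.25
_inv_norm = NormalDist().inv_cdf


def fn(u):
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    z = _inv_norm(u)
    s_t = S0 * math.exp((RATE - 0.5 * SIGMA ** 2) * MATURITY + SIGMA * math.sqrt(MATURITY) * z)
    return math.exp(-RATE * MATURITY) * max(s_t - STRIKE, 0.0)


def stratified(x, nsample, npilot=1000, rng=random):
    """Stratified estimator (4.12) with approximately optimal sizes (4.15).

    x lists the stratum endpoints x_0 < x_1 < ... < x_k; returns (est, var, n).
    """
    strata = list(zip(x[:-1], x[1:]))
    # Step 1: pilot sample in each stratum to estimate the sd of f there.
    weights = []
    for a, b in strata:
        pilot = [fn(rng.uniform(a, b)) for _ in range(npilot)]
        weights.append((b - a) * statistics.stdev(pilot))
    # Step 2: sizes proportional to length * sd, rounded down (at least 2: our addition).
    total = sum(weights)
    n = [max(2, math.floor(nsample * w / total)) for w in weights]
    # Step 3: main sample; estimator (4.12) and its variance (4.13).
    est, var = 0.0, 0.0
    for (a, b), ni in zip(strata, n):
        vals = [fn(rng.uniform(a, b)) for _ in range(ni)]
        est += (b - a) * statistics.fmean(vals)
        var += statistics.variance(vals) * (b - a) ** 2 / ni
    return est, var, n


if __name__ == "__main__":
    random.seed(1)
    print(stratified([0, 0.6, 0.85, 1], 100_000))
```

With the endpoints [0, 0.6, 0.85, 1] this should give an estimate near 0.4615 and sample sizes broadly similar to those reported above. The exact numbers depend on the random seed and on our stand-in for callopt2.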
By introducing more strata, we can increase this efficiency as much as we wish. Within a stratified random sample we may also introduce antithetic variates designed to provide negative covariance. If we believe the function is monotone within an interval, we may use antithetic pairs within that interval. If we believe the function is increasing across adjacent strata, we can instead form antithetic pairs between two intervals. For example, we may generate $U \sim U[0, 1]$ and sample the point $V_{ij} = x_{i-1} + (x_i - x_{i-1})U$ from the interval $(x_{i-1}, x_i)$, together with the point $V_{(i+1)j} = x_{i+1} - (x_{i+1} - x_i)U$ from the interval $(x_i, x_{i+1})$. This gives antithetic pairs between intervals. As a simple example applied to the call option valuation above, consider the estimator based on three strata, $[0, 0.47]$, $[0.47, 0.84]$, and $[0.84, 1]$. We do not sample to the left of 0.47, since the function is 0 there, so the sample size in that stratum is set to 0. Using antithetic random numbers within each of the two strata $[0.47, 0.84]$ and $[0.84, 1]$, with $U \sim U[0, 1]$, we obtain the estimator
$$\hat{\theta}_{str\,ant} = \frac{0.37}{2}\left[f(0.47 + 0.37U) + f(0.84 - 0.37U)\right] + \frac{0.16}{2}\left[f(0.84 + 0.16U) + f(1 - 0.16U)\right]$$
To assess this estimator, we evaluated it for $U$ a vector of 1,000,000 uniform random numbers:

```matlab
U=rand(1,1000000);
F=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
mean(F)            % gives 0.4615
var(F)/length(F)   % gives 1.46e-9
```

This should be compared with the crude Monte Carlo estimator that uses the same total number of function evaluations as the vector $F$, namely $n = 4 \times 10^6$. Its variance is $\sigma^2_{crude}/(4 \times 10^6) = 1.117 \times 10^{-7}$. The gain in efficiency is therefore $1.117 \times 10^{-7}/(1.46 \times 10^{-9})$, or approximately 77. The stratified antithetic simulation above, with 1,000,000 input variates and 4,000,000 function evaluations, is equivalent to a crude Monte Carlo simulation with a sample size of 308 million!
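The stratified antithetic estimator and its comparison with crude Monte Carlo can be sketched in Python. The payoff fn below is an assumption: a Black-Scholes call with S0 = 10, K = 10, r = 0.05, σ = 0.2, T = 0.25, driven by one uniform through the inverse normal c.d.f. The crude run gets four times as many function evaluations as there are uniforms in the stratified run, so the comparison matches the one in the text.

```python
import math
import random
import statistics
from statistics import NormalDist

# ASSUMPTION: stand-in for the chapter's fn(u): discounted payoff of a European call
# (S0=10, K=10, r=0.05, sigma=0.2, T=0.25) driven by one uniform u via the normal inverse.
S0, STRIKE, RATE, SIGMA, MATURITY = 10.0, 10.0, 0.05, 0.2, 0.25
_inv_norm = NormalDist().inv_cdf


def fn(u):
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    z = _inv_norm(u)
    s_t = S0 * math.exp((RATE - 0.5 * SIGMA ** 2) * MATURITY + SIGMA * math.sqrt(MATURITY) * z)
    return math.exp(-RATE * MATURITY) * max(s_t - STRIKE, 0.0)


def stratified_antithetic(n, rng=random):
    """Antithetic pairs in the strata [0.47, 0.84] and [0.84, 1]; returns (mean, var of mean)."""
    vals = []
    for _ in range(n):
        u = rng.random()
        vals.append(0.37 / 2 * (fn(0.47 + 0.37 * u) + fn(0.84 - 0.37 * u))
                    + 0.16 / 2 * (fn(0.84 + 0.16 * u) + fn(1.0 - 0.16 * u)))
    return statistics.fmean(vals), statistics.variance(vals) / n


def crude(n_evals, rng=random):
    """Crude Monte Carlo with n_evals function evaluations; returns (mean, var of mean)."""
    vals = [fn(rng.random()) for _ in range(n_evals)]
    return statistics.fmean(vals), statistics.variance(vals) / n_evals


if __name__ == "__main__":
    random.seed(0)
    est, var_sa = stratified_antithetic(20_000)
    _, var_crude = crude(80_000)    # same number of function evaluations (4 per uniform)
    print(est, var_crude / var_sa)  # estimate and efficiency gain per function evaluation
```

The printed ratio is the efficiency gain per function evaluation. With this stand-in payoff it should be of the same order as the factor of about 77 reported above, although the exact value varies from run to run.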
Variance reduction makes the difference between a simulation that is feasible on a laptop and one that would take a very long time on a mainframe computer. For the record, the stratified antithetic run above took approximately 58 seconds on a Pentium IV 2.2 GHz laptop.

Control Variates

Two techniques let us exploit knowledge of a function whose shape is similar to that of $f$. First, we consider a control variate, based on the trivial identity
$$\int f(u)\, du = \int g(u)\, du + \int (f(u) - g(u))\, du \tag{4.16}$$
for an arbitrary function $g(u)$. Assume that the integral of $g$ is known, so we can substitute its known value for the first term above. The second integral, we assume, is more difficult, and we estimate it by crude Monte Carlo. This gives the estimator
$$\hat{\theta}_{cv} = \int g(u)\, du + \frac{1}{n} \sum_{i=1}^{n} [f(U_i) - g(U_i)] \tag{4.17}$$
This estimator is clearly unbiased, and its variance is
$$\operatorname{var}(\hat{\theta}_{cv}) = \operatorname{var}\left( \frac{1}{n} \sum_{i=1}^{n} [f(U_i) - g(U_i)] \right) = \frac{\operatorname{var}[f(U) - g(U)]}{n}$$
The variance is therefore reduced, relative to a crude Monte Carlo estimator with the same sample size $n$, by a factor
$$\frac{\operatorname{var}[f(U)]}{\operatorname{var}[f(U) - g(U)]} \quad \text{for } U \sim U[0, 1] \tag{4.18}$$
Let us return to the example of pricing a call option. Through some experimentation (a preliminary crude simulation, or simply evaluating the function at various points), we found that the function
$$g(u) = 6[(u - 0.47)^+]^2 + (u - 0.47)^+$$
gives a reasonable approximation to $f(u)$. The two functions are compared in Figure 4.4. Moreover, the integral of $g(\cdot)$ over $[0, 1]$, $2(0.53)^3 + \frac{1}{2}(0.53)^2$, is easy to obtain. The figure shows that $f(u) - g(u)$ is generally much smaller and less variable than $f(u)$, so $\operatorname{var}[f(U) - g(U)] < \operatorname{var}[f(U)]$.

FIGURE 4.4 Comparison of the Function f(u) and the Control Variate g(u) (horizontal axis: $u \in [0, 1]$)

The variance of the crude Monte Carlo estimator is determined by the variability of the function $f(u)$ over its full range.
The variance of the control variate estimator is determined by the variance of the difference between the two functions, which in this case is quite small. We used the following Matlab functions. The first evaluates $g(u)$, and the second determines the efficiency gain of the control variate estimator.

```matlab
function g=GG(u)
% this is the function g(u), a control variate for fn(u)
u=max(0,u-.47);
g=6*u.^2+u;
```

```matlab
function [est,var1,var2]=control(f,g,intg,n)
% run using a statement like [est,var1,var2]=control('fn','GG',intg,n)
% runs a simulation on the function f using control variate g
% (both character strings) n times.
% intg is the integral of g over [0,1]
% outputs estimator est and variances var1, var2
% (variances without and with the control variate, respectively).
U=unifrnd(0,1,1,n);
FN=eval(strcat(f,'(U)')); % evaluates f(u) for vector u
CN=eval(strcat(g,'(U)')); % evaluates g(u)
est=intg+mean(FN-CN);
var1=var(FN);
var2=var(FN-CN);
```

The call `[est,var1,var2]=control('fn','GG',2*(.53)^3+(.53)^2/2,1000000)` then yields the estimate 0.4616 with estimator variance $1.46 \times 10^{-8}$ (that is, var2/n). This is an efficiency gain over crude Monte Carlo of around 30. This elementary form of control variate uses the estimator
$$\int g(u)\, du + \frac{1}{n} \sum_{i=1}^{n} [f(U_i) - g(U_i)]$$
but $g(U)$ may well not be the best estimator we can imagine for $f(U)$. Using regression, we can often find a linear function of $g(U)$ that does better. Elementary regression gives
$$f(U) - E(f(U)) = \beta\,(g(U) - E(g(U))) + \varepsilon \tag{4.19}$$
where
$$\beta = \frac{\operatorname{cov}(f(U), g(U))}{\operatorname{var}(g(U))} \tag{4.20}$$
and the errors $\varepsilon$ have expectation 0. It follows that $E(f(U)) + \varepsilon = f(U) - \beta[g(U) - E(g(U))]$, so $f(U) - \beta[g(U) - E(g(U))]$ is an unbiased estimator of $E(f(U))$. For a sample of $n$ uniform random numbers this becomes
$$\hat{\theta}_{cv} = \beta E(g(U)) + \frac{1}{n} \sum_{i=1}^{n} [f(U_i) - \beta g(U_i)] \tag{4.21}$$
Moreover, this estimator has the smallest variance among all linear combinations of $f(U)$ and $g(U)$.
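The two control variate estimators can be compared in a short Python sketch: (4.17), which fixes β = 1, and (4.21), which uses the least squares estimate of β. The payoff fn is an assumption: a Black-Scholes call with S0 = 10, K = 10, r = 0.05, σ = 0.2, T = 0.25, driven by one uniform. GG and the integral 2(0.53)^3 + (0.53)^2/2 follow the text. Returning a dictionary is only a convenience of this sketch.

```python
import math
import random
import statistics
from statistics import NormalDist

# ASSUMPTION: stand-in for the chapter's fn(u): discounted payoff of a European call
# (S0=10, K=10, r=0.05, sigma=0.2, T=0.25) driven by one uniform u via the normal inverse.
S0, STRIKE, RATE, SIGMA, MATURITY = 10.0, 10.0, 0.05, 0.2, 0.25
_inv_norm = NormalDist().inv_cdf


def fn(u):
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    z = _inv_norm(u)
    s_t = S0 * math.exp((RATE - 0.5 * SIGMA ** 2) * MATURITY + SIGMA * math.sqrt(MATURITY) * z)
    return math.exp(-RATE * MATURITY) * max(s_t - STRIKE, 0.0)


def GG(u):
    """Control variate g(u) = 6[(u-0.47)^+]^2 + (u-0.47)^+."""
    v = max(0.0, u - 0.47)
    return 6.0 * v * v + v


INTG = 2 * 0.53 ** 3 + 0.53 ** 2 / 2  # exact integral of g over [0, 1]


def control(n, rng=random):
    us = [rng.random() for _ in range(n)]
    fv = [fn(u) for u in us]
    gv = [GG(u) for u in us]
    # Simple control variate (4.17), i.e. beta = 1.
    diff = [f - g for f, g in zip(fv, gv)]
    est_simple = INTG + statistics.fmean(diff)
    # Regression control variate (4.21) with the least squares estimate of beta.
    fbar, gbar = statistics.fmean(fv), statistics.fmean(gv)
    sfg = sum((f - fbar) * (g - gbar) for f, g in zip(fv, gv))
    sgg = sum((g - gbar) ** 2 for g in gv)
    beta = sfg / sgg
    est_reg = beta * INTG + statistics.fmean(f - beta * g for f, g in zip(fv, gv))
    return {
        "est_simple": est_simple,
        "est_regression": est_reg,
        "beta_hat": beta,
        "var_f": statistics.variance(fv),
        "var_f_minus_g": statistics.variance(diff),
    }


if __name__ == "__main__":
    random.seed(0)
    r = control(100_000)
    print(r, r["var_f"] / r["var_f_minus_g"])  # last value: reduction factor (4.18)
```

The ratio var_f / var_f_minus_g is the variance reduction factor (4.18). The text reports a factor of about 30 for this choice of g.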
Note that when $\beta = 1$, (4.21) reduces to the simpler control variate estimator (4.17) discussed above. The regression form (4.21), however, is generally better at maximizing efficiency. In practice, of course, the covariance and variance in the definition of $\beta$ must be estimated from the simulations themselves. We evaluate $f$ and $g$ at many different uniform random variables $U_i$, $i = 1, 2, \dots, n$, and estimate $\beta$ with the standard least squares estimator
$$\hat{\beta} = \frac{n \sum_{i=1}^{n} f(U_i) g(U_i) - \sum_{i=1}^{n} f(U_i) \sum_{i=1}^{n} g(U_i)}{n \sum_{i=1}^{n} g^2(U_i) - \left( \sum_{i=1}^{n} g(U_i) \right)^2}$$
In theory, substituting the estimator $\hat{\beta}$ for the true value $\beta$ introduces a small bias. For a large number of simulations $n$, however, $\hat{\beta}$ is so close to the true value that this bias can be disregarded.

Importance Sampling

A similar technique is importance sampling. Again we depend on having a reasonably simple function $g$ that, after multiplication by some constant, is similar to $f$. Rather than minimize the difference $f(u) - g(u)$ between the two functions, however, we try to find a $g(u)$ such that $f(u)/g(u)$ is nearly constant. We also require $g$ to be nonnegative and integrable, so that after rescaling it integrates to 1 (i.e., it is a probability density function). Assume we can easily generate random variables from the probability density function $g(z)$. The distribution with probability density function $g(z)$, $z \in [0, 1]$, is called the importance distribution.
If we generate a random variable $Z$ with probability density function $g(z)$, $z \in [0, 1]$, then
$$\int_0^1 f(u)\, du = \int_0^1 \frac{f(z)}{g(z)}\, g(z)\, dz = E\left[ \frac{f(Z)}{g(Z)} \right] \tag{4.22}$$
We can therefore estimate the integral by generating independent random variables $Z_i$ with probability density function $g(z)$ and setting
$$\hat{\theta}_{im} = \frac{1}{n} \sum_{i=1}^{n} \frac{f(Z_i)}{g(Z_i)} \tag{4.23}$$
By (4.22), this is again an unbiased estimator, with variance
$$\operatorname{var}(\hat{\theta}_{im}) = \frac{1}{n} \operatorname{var}\left[ \frac{f(Z_1)}{g(Z_1)} \right] \tag{4.24}$$
Returning to our example, we might consider using the same function $g(u)$ as before. However, it is not easy to generate variates from a density proportional to this $g$ by inverse transform, since that would require solving a cubic equation. Instead, consider the much simpler density $g(u) = 2(0.53)^{-2}(u - 0.47)^+$. Its cumulative distribution function is $G(u) = (0.53)^{-2}[(u - 0.47)^+]^2$, and its inverse cumulative distribution function is $G^{-1}(u) = 0.47 + 0.53\sqrt{u}$. We generate $Z_i = G^{-1}(U_i)$ for $U_i \sim U[0, 1]$. The following function simulates an importance sampling estimator:

```matlab
function [est,v]=importance(f,g,Ginv,u)
% runs a simulation on the function f using importance density g
% (both character strings) and inverse c.d.f. Ginv
% outputs all estimators (should be averaged) and variance.
% IM is the inverse c.d.f. of the importance distribution, evaluated at u
IM=eval(Ginv);  % =.47+.53*sqrt(u);
% IMdens is the density of the importance sampling distribution at IM
IMdens=eval(g); % =2*(IM-.47)/(.53)^2;
FN=eval(strcat(f,'(IM)'));
est=FN./IMdens; % mean(est) provides the estimator
v=var(FN./IMdens)/length(IM); % this is the variance of the estimator mean(est)
```

The function was called with `[est,v]=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',rand(1,1000000));`. This gave the estimate mean(est) $= 0.4616$ with variance $1.28 \times 10^{-8}$, an efficiency gain of around 35 over crude Monte Carlo.
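Here is a Python sketch of the importance sampling estimator (4.23). It uses the density g(z) = 2(z − 0.47)/0.53² on [0.47, 1] and the inverse c.d.f. Z = 0.47 + 0.53√U from the text. The payoff fn is an assumption: a Black-Scholes call with S0 = 10, K = 10, r = 0.05, σ = 0.2, T = 0.25, driven by one uniform. Drawing u from (0, 1] instead of [0, 1) is a small safeguard we added so that g(Z) is never zero.

```python
import math
import random
import statistics
from statistics import NormalDist

# ASSUMPTION: stand-in for the chapter's fn(u): discounted payoff of a European call
# (S0=10, K=10, r=0.05, sigma=0.2, T=0.25) driven by one uniform u via the normal inverse.
S0, STRIKE, RATE, SIGMA, MATURITY = 10.0, 10.0, 0.05, 0.2, 0.25
_inv_norm = NormalDist().inv_cdf


def fn(u):
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    z = _inv_norm(u)
    s_t = S0 * math.exp((RATE - 0.5 * SIGMA ** 2) * MATURITY + SIGMA * math.sqrt(MATURITY) * z)
    return math.exp(-RATE * MATURITY) * max(s_t - STRIKE, 0.0)


def importance(n, rng=random):
    """Estimator (4.23) with importance density g(z) = 2(z-0.47)/0.53^2 on [0.47, 1]."""
    vals = []
    for _ in range(n):
        u = 1.0 - rng.random()            # u in (0, 1], so that g(Z) > 0
        z = 0.47 + 0.53 * math.sqrt(u)    # Z = G^{-1}(U)
        g = 2.0 * (z - 0.47) / 0.53 ** 2  # importance density at Z
        vals.append(fn(z) / g)
    return statistics.fmean(vals), statistics.variance(vals) / n  # (estimate, its variance)


if __name__ == "__main__":
    random.seed(0)
    print(importance(100_000))
```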
Example (Estimating quantiles using importance sampling)

Suppose we can generate random variables $X$ from a probability density function of the form $f_\theta(x)$, and we wish to estimate a quantile such as VaR. That is, we want $x_p$ such that $P_{\theta_0}(X \le x_p) = p$ for a given value $\theta_0$ of the parameter. As a very simple example, let $S$ be the sum of 10 independent random variables, each exponential with mean $\theta$, and let $f_\theta(x_1, \dots, x_{10})$ be the joint probability density function of these 10 observations. Take $\theta_0 = 1$ and $p = 0.999$, so that we seek an extreme quantile of the sum: $x_p$ with $P_{\theta_0}(S \le x_p) = p$. The equation we wish to solve for $x_p$ is
$$E_{\theta_0}\{I(S \le x_p)\} = p \tag{4.25}$$
The crudest estimator generates a large number of independent observations of $S$ under $\theta_0 = 1$ and finds the $p$th quantile of the empirical c.d.f. We generate independent random vectors $X_i = (X_{i1}, \dots, X_{i10})$ from the probability density $f_{\theta_0}(x_1, \dots, x_{10})$. With $S_i = \sum_{j=1}^{10} X_{ij}$, we define
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(S_i \le x) \tag{4.26}$$
We then invert it (possibly with interpolation) to estimate the quantile:
$$\hat{x}_p = \hat{F}^{-1}(p) \tag{4.27}$$
If the true cumulative distribution function is differentiable, the variance of this quantile estimator is asymptotically related to the variance of our estimator of the cumulative distribution function:
$$\operatorname{var}(\hat{x}_p) \approx \frac{\operatorname{var}(\hat{F}(x_p))}{(F'(x_p))^2}$$
Any variance reduction in the estimator of the c.d.f. is therefore reflected, at least asymptotically, in a variance reduction in the estimator of the quantile. Instead of generating the sample $(X_{i1}, \dots, X_{i10})$ with parameter value $\theta_0$, we could generate it with a different parameter value $\theta$. We then replace $\hat{F}(x)$ in (4.27) by the importance sampling empirical c.d.f.
$$\hat{F}_I(x) = \frac{1}{n} \sum_{i=1}^{n} W_i\, I(S_i \le x) \tag{4.28}$$
where
$$W_i = \frac{f_{\theta_0}(X_{i1}, \dots, X_{i10})}{f_\theta(X_{i1}, \dots, X_{i10})}$$
and once again solve for $x_p$.
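The importance sampling quantile estimator can be sketched in Python. The sketch draws exponentials with mean θ = 2.5 (the value the text uses for this example) and weights each sum by W_i = θ^10 exp{−S_i(1 − 1/θ)}, the likelihood ratio for this exponential model. One design choice is our own. Rather than inverting (4.28) directly, the sketch accumulates the weighted upper tail (1/n)Σ W_i I(S_i ≥ x), which has expectation 1 − F(x). This avoids the very large weights attached to small values of S_i. For this example the true x_0.999 is half the 0.999 quantile of a chi-squared distribution with 20 degrees of freedom, about 22.66, which provides a check. The function name is hypothetical.

```python
import math
import random


def is_quantile(p=0.999, theta=2.5, n=20_000, dim=10, rng=random):
    """Importance sampling estimate of x_p with P(S <= x_p) = p, S a sum of `dim` Exp(1)."""
    draws = []
    for _ in range(n):
        s = sum(rng.expovariate(1.0 / theta) for _ in range(dim))  # exponentials with mean theta
        w = theta ** dim * math.exp(-s * (1.0 - 1.0 / theta))       # W_i = f_1 / f_theta
        draws.append((s, w))
    draws.sort(reverse=True)  # largest S first
    tail = 0.0
    for s, w in draws:
        tail += w / n         # weighted estimate of P(S >= s) under theta_0 = 1
        if tail >= 1.0 - p:
            return s
    return draws[-1][0]       # not reached for sensible n and theta


if __name__ == "__main__":
    random.seed(0)
    print(is_quantile())      # should be close to 22.66
```

Setting theta=1.0 makes every weight equal to 1 and recovers the crude empirical quantile, which is far noisier for the same n.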
Ideally, we should choose $\theta$ so that the variance of $\hat{x}_p$, or of $W_i I(S_i \le x_p)$, is as small as possible. This requires a wise guess or experimentation with various choices of $\theta$. For a given $\theta$ there is another possible empirical cumulative distribution function:
$$\hat{F}_{I2}(x) = \frac{1}{\sum_{i=1}^{n} W_i} \sum_{i=1}^{n} W_i\, I(S_i \le x) \tag{4.29}$$
Both of these give fairly crude estimates of the sample quantiles when observations are weighted. As one does with the sample median, one could easily interpolate between adjacent values around $x_p$. The alternative (4.29) is motivated by the fact that the values $W_i$ act as weights attached to the observations $S_i$, so it seems reasonable to divide by the sum of the weights. In fact, the expected value of the denominator is
$$E\left( \sum_{i=1}^{n} W_i \right) = n$$
so the two denominators are similar. In the example where the $X_{ij}$ are independent exponential($\theta_0 = 1$), the weight on $S_i$ determined by $X_i = (X_{i1}, \dots, X_{i10})$ is
$$W_i = \frac{f_{\theta_0}(X_{i1}, \dots, X_{i10})}{f_\theta(X_{i1}, \dots, X_{i10})} = \prod_{j=1}^{10} \frac{\exp(-X_{ij})}{\theta^{-1}\exp(-X_{ij}/\theta)} = \theta^{10} \exp\{-S_i(1 - \theta^{-1})\}$$
The renormalized alternative (4.29) may be necessary for estimating extreme quantiles when the number of simulations is small, but only the first provides a completely unbiased estimating function. In our case, using (4.28) with $\theta = 2.5$, we obtained an estimator of $F(x_{0.999})$ with about 180 times the efficiency of a crude Monte Carlo simulation. Hesterberg (1995) discusses various renormalizations of the importance sampling weights.

Importance Sampling, the Exponential Tilt, and the Saddlepoint Approximation

When searching for a convenient importance distribution, particularly one that increases or decreases the frequency of observations in the tails, it is quite common to embed a given density in an exponential family. For example, suppose we wish to estimate an integral $\int g(x) f(x)\, dx$, where $f(x)$ is a probability density function.
Let $K(s)$ denote the cumulant-generating function (the logarithm of the moment-generating function) of the density $f(x)$:
$$\exp\{K(s)\} = \int e^{xs} f(x)\, dx$$
The cumulant-generating function is a useful summary of the moments of a distribution: the mean is $K'(0)$ and the variance is $K''(0)$. From this single probability density function we can produce a whole (exponential) family of densities,
$$f_\theta(x) = e^{\theta x - K(\theta)} f(x) \tag{4.30}$$
of which $f(x)$ is the special case $\theta = 0$. The density (4.30) is often called an exponential tilt of the original density. It increases the weight in the right tail for $\theta > 0$ and decreases it for $\theta < 0$. This family of densities is closely related to the saddlepoint approximation. Suppose we wish to estimate a probability density function $f(x)$ at a particular point $x$. Since $f(x) = e^{K(\theta) - \theta x} f_\theta(x)$ by (4.30), we could obtain $f(x)$ if we knew $f_\theta(x)$. On the other hand, a normal approximation to a density is often reasonable at or around its mode, particularly for the density of a sum or an average of independent random variables. The cumulant-generating function of $f_\theta(x)$ is easily seen to be $K(\theta + s) - K(\theta)$, so its mean is $K'(\theta)$. If we choose the parameter $\theta = \theta(x)$ so that
$$K'(\theta) = x \tag{4.31}$$
then the density $f_\theta$ has mean $x$ and variance $K''(\theta)$. How do we know that (4.31) has a solution for a given $x$? From the properties of cumulant-generating functions, $K(t)$ is convex and $K(0) = 0$. Hence, as $t$ increases, the slope $K'(t)$ of the cumulant-generating function is nondecreasing. It therefore approaches a limit $x_{max}$ (finite or infinite) as $t$ increases to the upper end of the domain of $K$. As long as we restrict $x$ in (4.31) to the interval $x < x_{max}$, we can find a solution.
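Equation (4.31) rarely has a closed-form solution, so in practice θ(x) is found numerically. Here is a minimal sketch. It assumes only that K′ is available as a function and is nondecreasing, as argued above, and it applies bisection on a bracketing interval. The name solve_saddlepoint and the bracket are illustrative. The gamma cumulant-generating function K(t) = −α ln(1 − t), which appears later in this section, serves as a check because its root θ(x) = 1 − α/x is known.

```python
def solve_saddlepoint(k_prime, x, lo, hi, tol=1e-12, max_iter=200):
    """Solve K'(theta) = x by bisection on [lo, hi].

    Relies only on K' being nondecreasing (K convex); requires K'(lo) < x < K'(hi).
    """
    if not (k_prime(lo) < x < k_prime(hi)):
        raise ValueError("x must lie strictly between K'(lo) and K'(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if k_prime(mid) < x:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def gamma_k_prime(t, alpha=3.0):
    """K'(t) = alpha / (1 - t) for the gamma(alpha, 1) distribution, t < 1."""
    return alpha / (1.0 - t)


if __name__ == "__main__":
    theta = solve_saddlepoint(gamma_k_prime, 5.0, -50.0, 1.0 - 1e-9)
    print(theta, 1.0 - 3.0 / 5.0)  # numerical root vs. closed form 1 - alpha/x
```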
The $N(x, K''(\theta))$ density evaluated at $x$ equals
$$\frac{1}{\sqrt{2\pi K''(\theta)}}$$
so the approximation to the density $f(x)$ is
$$f(x) \approx \frac{1}{\sqrt{2\pi K''(\theta)}}\, e^{K(\theta) - \theta x} \tag{4.32}$$
where $\theta = \theta(x)$ satisfies $K'(\theta) = x$. This is the saddlepoint approximation, discovered by Daniels (1954, 1980). It is usually applied to the distribution of sums or averages of independent random variables, because there the normal approximation is better motivated. Indeed, the saddlepoint approximation to the distribution of the sum of $n$ independent identically distributed random variables is accurate to order $O(n^{-1})$. If we renormalize it to integrate to 1, accuracy to order $O(n^{-3/2})$ is possible, substantially better than the order $O(n^{-1/2})$ of the usual normal approximation.

Consider, for example, the saddlepoint approximation to the gamma$(\alpha, 1)$ distribution. The moment-generating function of the gamma$(\alpha, 1)$ distribution is
$$m(t) = \frac{1}{(1 - t)^\alpha}, \quad t < 1$$
so the cumulant-generating function is
$$K(t) = \ln(m(t)) = -\alpha \ln(1 - t)$$
Then $K'(\theta) = \alpha/(1 - \theta) = x$ implies $\theta(x) = 1 - \alpha/x$. Since $K''(\theta) = \alpha/(1 - \theta)^2$, we have $K''(\theta(x)) = x^2/\alpha$. The saddlepoint approximation to the probability density function is therefore
$$f(x) \approx \left( \frac{2\pi x^2}{\alpha} \right)^{-1/2} \exp\{-\alpha \ln(\alpha/x) - x + \alpha\} = \frac{x^{\alpha - 1} e^{-x}}{\sqrt{2\pi}\, \alpha^{\alpha - 1/2} e^{-\alpha}}$$
This is exactly the gamma density with Stirling's approximation $\sqrt{2\pi}\,\alpha^{\alpha - 1/2} e^{-\alpha}$ in place of $\Gamma(\alpha)$. After renormalization it is exactly the gamma density. It is often computationally expensive to generate random variables whose distribution is a convolution of known densities, so it is natural to ask whether (4.32) makes this any easier. In many cases the saddlepoint approximation can be used to generate, with high efficiency, a random variable whose distribution is close to this convolution.
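The claim that the saddlepoint approximation to the gamma(α, 1) density is the exact density with Stirling's formula in place of Γ(α) can be checked numerically. In the minimal Python sketch below (function names are illustrative), the ratio of the approximation to the exact density should be the same constant Γ(α)/(√(2π) α^{α−1/2} e^{−α}) at every x.

```python
import math


def saddlepoint_gamma(x, alpha):
    """Saddlepoint approximation (4.32) to the gamma(alpha, 1) density at x > 0."""
    theta = 1.0 - alpha / x             # solves K'(theta) = alpha / (1 - theta) = x
    k = -alpha * math.log(1.0 - theta)  # K(theta) = -alpha * ln(1 - theta)
    k2 = x * x / alpha                  # K''(theta(x)) = x^2 / alpha
    return math.exp(k - theta * x) / math.sqrt(2.0 * math.pi * k2)


def gamma_pdf(x, alpha):
    """Exact gamma(alpha, 1) density."""
    return math.exp((alpha - 1.0) * math.log(x) - x - math.lgamma(alpha))


def stirling(alpha):
    """Stirling's approximation sqrt(2 pi) alpha^(alpha - 1/2) e^(-alpha) to Gamma(alpha)."""
    return math.sqrt(2.0 * math.pi) * alpha ** (alpha - 0.5) * math.exp(-alpha)


if __name__ == "__main__":
    alpha = 3.0
    for x in (0.5, 2.0, 7.0, 15.0):
        print(x, saddlepoint_gamma(x, alpha) / gamma_pdf(x, alpha))
    print("Gamma(alpha) / Stirling:", math.gamma(alpha) / stirling(alpha))
```

For α = 3 the constant is about 1.028, so renormalizing the approximation to integrate to 1 recovers the gamma density exactly.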
For example, suppose we wish to generate the random variable $S_n = \sum_{i=1}^{n} X_i$, where each $X_i$ has the noncentral chi-squared distribution with cumulant-generating function
$$K(t) = \frac{\lambda t}{1 - 2t} - \frac{p}{2} \ln(1 - 2t), \quad t < \tfrac{1}{2} \tag{4.33}$$
The parameter $\lambda$ is the noncentrality parameter of the distribution, and $p$ is the number of degrees of freedom. The cumulant-generating function of the sum takes the same form, with $(\lambda, p)$ replaced by $(n\lambda, np)$. In effect, then, we wish to generate a random variable with cumulant-generating function (4.33) for large values of the parameters $(\lambda, p)$. Instead, we generate from the saddlepoint approximation (4.32) to this distribution, and we do so indirectly. Change variables in (4.32) to obtain the density of the new random variable $\theta$ that solves
$$K'(\theta) = X$$
The saddlepoint approximation (4.32) is then equivalent to specifying the following probability density for this variable:
$$\tilde{f}(\theta) = f(K'(\theta))\, \frac{dx}{d\theta} = \text{constant} \times \sqrt{K''(\theta)}\, e^{K(\theta) - \theta K'(\theta)} \tag{4.34}$$
This probability density function can often be bounded above by some density over the range of possible values of $\theta$, so that $\theta$ can be generated by acceptance-rejection. The random variable is then $X = K'(\theta)$. In the noncentral chi-squared example above, (4.34) is bounded, so we may take the dominating density to be the $U[0, \frac{1}{2}]$ density.

Combining Monte Carlo Estimators

We have now seen a number of different variance reduction techniques, and many more are possible. Many of these methods, such as importance sampling and stratified sampling, have associated parameters that can be chosen in different ways. The variance formulas can serve as a basis for choosing a best method, but these variances and efficiencies must themselves be estimated from the simulation. It is rarely clear a priori which sampling procedure and estimator is best.
For example, if a function $f$ is monotone on $[0, 1]$, an antithetic variate can be introduced with an estimator of the form
$$\hat{\theta}_{a1} = \frac{1}{2}[f(U) + f(1 - U)], \quad U \sim U[0, 1] \tag{4.35}$$
If the function instead increases to a maximum somewhere around $\frac{1}{2}$ and decreases thereafter, we might prefer
$$\hat{\theta}_{a2} = \frac{1}{4}[f(U/2) + f((1 - U)/2) + f((1 + U)/2) + f(1 - U/2)] \tag{4.36}$$
Any weighted average of these two unbiased estimators of $\theta$ is also an unbiased estimator of $\theta$. The large number of potential variance reduction techniques is an embarrassment of riches. Which method should we use, and how will we know whether it beats the competitors? Fortunately, the answer is often to use all of them (within reason, of course); choosing a single method is often neither necessary nor desirable. It is preferable to use a weighted average of the available estimators, with the weights chosen optimally by regression. Suppose in general that we have $k$ estimators or statistics $\hat{\theta}_i$, $i = 1, \dots, k$, all unbiased for the same parameter $\theta$, so that $E(\hat{\theta}_i) = \theta$ for all $i$. In vector notation, with $\hat{\theta}^t = (\hat{\theta}_1, \dots, \hat{\theta}_k)$, we write $E(\hat{\theta}) = \mathbf{1}\theta$, where $\mathbf{1}$ is the $k$-dimensional column vector of 1s, so that $\mathbf{1}^t = (1, 1, \dots, 1)$. Suppose for the moment that we know the variance-covariance matrix $V$ of the vector $\hat{\theta}$, defined by
$$V_{ij} = \operatorname{cov}(\hat{\theta}_i, \hat{\theta}_j)$$

Theorem 19 (Best linear combinations of estimators) Among all linear unbiased estimators of $\theta$ built from the $\hat{\theta}_i$, the one with minimum variance is
$$\hat{\theta}_{blc} = \sum_i b_i \hat{\theta}_i \tag{4.37}$$
where the vector $b = (b_1, \dots, b_k)^t$ is given by
$$b = (\mathbf{1}^t V^{-1} \mathbf{1})^{-1} V^{-1} \mathbf{1}$$
The variance of the resulting estimator is
$$\operatorname{var}(\hat{\theta}_{blc}) = b^t V b = 1/(\mathbf{1}^t V^{-1} \mathbf{1})$$

Proof The proof is straightforward.
For any linear combination (4.37), the variance of the estimator is $b^t V b$. We wish to minimize this quadratic form as a function of $b$, subject to the constraint that the coefficients add to 1, or $b^t \mathbf{1} = 1$. Introducing the Lagrangian, we set the derivatives with respect to the components $b_i$ equal to zero:
$$\frac{\partial}{\partial b}\{b^t V b + \lambda(b^t \mathbf{1} - 1)\} = 0 \quad \text{or} \quad 2Vb + \lambda \mathbf{1} = 0$$
so that
$$b = \text{constant} \times V^{-1} \mathbf{1}$$
Requiring the coefficients to add to one shows that the constant is $(\mathbf{1}^t V^{-1} \mathbf{1})^{-1}$. ∎

This theorem shows that the ideal linear combination of estimators has coefficients proportional to the row sums of the inverse covariance matrix. Notably, the variance of a particular estimator $\hat{\theta}_i$ is an ingredient in that sum, but only one of many. In practice, of course, we almost never know the variance-covariance matrix $V$ of a vector of estimators $\hat{\theta}$. However, if the simulation evaluates all of these estimators using the same uniform input, we obtain independent replicated values of $\hat{\theta}$. These replicates let us estimate $V$, and since we typically conduct many simulations, the estimate can be very accurate. Suppose we have $n$ simulated values of the vector $\hat{\theta}$, and call them $\hat{\theta}_1, \dots, \hat{\theta}_n$. As usual, we estimate $V$ by the sample covariance matrix
$$\hat{V} = \frac{1}{n - 1} \sum_{i=1}^{n} (\hat{\theta}_i - \bar{\theta})(\hat{\theta}_i - \bar{\theta})^t, \quad \text{where } \bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_i$$
Let us return to the example and attempt to find the best combination of the many estimators we have considered so far.
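Theorem 19 reduces to solving one linear system. Below is a minimal Python sketch, assuming the covariance matrix is supplied as a list of rows. best_linear_combination returns b = V^{-1}1/(1^t V^{-1} 1) and the variance 1/(1^t V^{-1} 1); it uses Gaussian elimination rather than an explicit matrix inverse. sample_cov forms the estimate V̂ with divisor n − 1. Both names are ours, not the chapter's.

```python
def best_linear_combination(V):
    """Weights b = V^{-1}1 / (1^t V^{-1} 1) and variance 1 / (1^t V^{-1} 1) (Theorem 19)."""
    k = len(V)
    # Augmented system [V | 1]; Gaussian elimination with partial pivoting gives y = V^{-1} 1.
    a = [[float(v) for v in row] + [1.0] for row in V]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        y[r] = (a[r][k] - sum(a[r][c] * y[c] for c in range(r + 1, k))) / a[r][r]
    info = sum(y)                   # 1^t V^{-1} 1
    return [yi / info for yi in y], 1.0 / info


def sample_cov(rows):
    """Sample covariance matrix (divisor n - 1) of replicated estimator vectors."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    print(best_linear_combination([[1.0, 0.0], [0.0, 4.0]]))  # weights 0.8, 0.2; variance 0.8
```

For two uncorrelated estimators with variances 1 and 4, the weights are 0.8 and 0.2 and the combined variance is 0.8, smaller than either variance alone. For two estimators with equal variances 2 and covariance 1, the weights are equal and the variance is 1.5.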
To this end, let
$$\hat{\theta}_1 = \frac{0.53}{2}[f(0.47 + 0.53U) + f(1 - 0.53U)] \quad \text{(an antithetic estimator)}$$
$$\hat{\theta}_2 = \frac{0.37}{2}[f(0.47 + 0.37U) + f(0.84 - 0.37U)] + \frac{0.16}{2}[f(0.84 + 0.16U) + f(1 - 0.16U)] \quad \text{(stratified-antithetic)}$$
$$\hat{\theta}_3 = 0.37 f(0.47 + 0.37U) + 0.16 f(1 - 0.16U) \quad \text{(stratified-antithetic)}$$
$$\hat{\theta}_4 = \int g(x)\, dx + [f(U) - g(U)] \quad \text{(control variate)}$$
$$\hat{\theta}_5 = \hat{\theta}_{im}, \quad \text{the importance sampling estimator (4.23)}$$
Here $\hat{\theta}_2$ and $\hat{\theta}_3$ are both stratified-antithetic estimators, $\hat{\theta}_4$ is a control variate estimator, and $\hat{\theta}_5$ is the importance sampling estimator discussed earlier. All five are obtained from a single input uniform random variate $U$. To determine the optimal linear combination, we generate simulated values of all five estimators using the same uniform random numbers as inputs. We find the best linear combination with

```matlab
function [o,v,b,V]=optimal(U)
% generates optimal linear combination of five estimators and outputs
% average estimator, variance and weights
% input U a row vector of U[0,1] random numbers
T1=(.53/2)*(fn(.47+.53*U)+fn(1-.53*U));
T2=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
T3=.37*fn(.47+.37*U)+.16*fn(1-.16*U);
intg=2*(.53)^3+.53^2/2;
T4=intg+fn(U)-GG(U);
T5=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',U);
X=[T1' T2' T3' T4' T5'];
% columns of X are replications of the same estimator;
% each row holds the five estimators computed from the same U
mean(X)
V=cov(X);             % this estimates the covariance matrix V
on=ones(5,1);
V1=inv(V);            % the inverse of the covariance matrix
b=V1*on/(on'*V1*on);  % vector of coefficients of the optimal linear combination
o=mean(X*b);          % average of the optimal linear combinations
v=1/(on'*V1*on);      % variance of the optimal linear combination based on a single U
```

One run, called with `[o,v,b,V]=optimal(rand(1,1000000))`, yields
$$o = 0.4615, \qquad b = (-0.5499,\ 1.4478,\ 0.1011,\ 0.0491,\ -0.0481)$$
The estimate 0.4615 is accurate to at least four decimal places. This is not surprising, since the variance per uniform random number input is $v = 1.13 \times 10^{-5}$.
In other words, the variance of the mean based on 1,000,000 uniform inputs is $1.13 \times 10^{-5}/10^6 = 1.13 \times 10^{-11}$. The standard error is therefore around $3.4 \times 10^{-6}$, so we can expect accuracy to at least four decimal places. Note that some of the weights are negative and others are greater than 1. Do these negative weights indicate estimators that are worse than useless? No. Subtracting some estimators may make the remaining function more linear and easier to estimate by another method, and negative coefficients are quite common in regression generally. The efficiency gain over crude Monte Carlo is an extraordinary 40,000. However, each uniform input requires 10 function evaluations, so the efficiency adjusted for the number of function evaluations is 4000. This simulation used 1,000,000 uniform random numbers and took 63 seconds on a Pentium IV (2.4 GHz), including the time to generate all five estimators. It is equivalent to 40 billion simulations by crude Monte Carlo, a major task on a supercomputer!

If we intended to use this simulation method repeatedly, we might want to see whether some estimators can be omitted without much loss of information. Since the variance of the optimal estimator is $1/(\mathbf{1}^t V^{-1} \mathbf{1})$, we can use this quantity to select an estimator for deletion. Note that what enters Theorem 19 is not so much the covariance matrix $V$ of the estimators as its inverse $J = V^{-1}$. By analogy with maximum likelihood theory, we can regard $J$ as a type of information matrix. For example, we could delete the $i$th estimator (that is, the $i$th row and column of $V$), choosing $i$ to have the smallest effect on $1/(\mathbf{1}^t V^{-1} \mathbf{1})$ or on its reciprocal $\mathbf{1}^t J \mathbf{1} = \sum_i \sum_j J_{ij}$. In particular, let $V_{(i)}$ be the matrix $V$ with the $i$th row and column deleted, and let $J_{(i)} = V_{(i)}^{-1}$. Then we can identify
$$\mathbf{1}^t J \mathbf{1} - \mathbf{1}^t J_{(i)} \mathbf{1}$$
as the loss of information when the $i$th estimator is deleted.
Not all estimators require the same number of function evaluations, so we should adjust this information by
$$FE(i) = \text{number of function evaluations required by the } i\text{th estimator}$$
That is, if an estimator $\hat{\theta}_i$ is to be deleted, it should be the one that attains
$$\min_i \frac{\mathbf{1}^t J \mathbf{1} - \mathbf{1}^t J_{(i)} \mathbf{1}}{FE(i)}$$
We should drop this $i$th estimator if the minimum is less than the information per function evaluation of the combined estimator, because dropping it then increases the information our simulation obtains per function evaluation. In the example above, with all five estimators included, $\mathbf{1}^t J \mathbf{1} = 88{,}757$ with 10 function evaluations per uniform variate, so the information per function evaluation is 8876. The information losses are given in Table 4.1. If we were to eliminate one estimator, we would likely choose number 3, since it contributes the least information per function evaluation. However, every estimator contributes more than 8876 per function evaluation, so we should likely retain all five.

TABLE 4.1

| $i$ | $\mathbf{1}^t J \mathbf{1} - \mathbf{1}^t J_{(i)} \mathbf{1}$ | $FE(i)$ | $(\mathbf{1}^t J \mathbf{1} - \mathbf{1}^t J_{(i)} \mathbf{1})/FE(i)$ |
|---|---|---|---|
| 1 | 88,048 | 2 | 44,024 |
| 2 | 87,989 | 4 | 21,997 |
| 3 | 28,017 | 2 | 14,008 |
| 4 | 55,725 | 1 | 55,725 |
| 5 | 32,323 | 1 | 32,323 |

Common Random Numbers

We now discuss another variance reduction technique, closely related to antithetic variates, called common random numbers. It is used whenever we wish to estimate a difference, for example the difference in performance between two systems, or a quantity such as the slope of a function.

Example As a simple example, suppose we have two estimators $\hat{\theta}_1, \hat{\theta}_2$ of the center of a symmetric distribution. We want to know which estimator is better, in the sense of having smaller variance when applied to a sample from a specific distribution symmetric about its median. If both are unbiased estimators of the median, the first is better if $\operatorname{var}(\hat{\theta}_1) < \operatorname{var}(\hat{\theta}_2)$. We are therefore interested in estimating a quantity like $Eh_1(X) - Eh_2(X)$, where $X$ is a vector representing a sample from the distribution and $h_1(X) = \hat{\theta}_1^2$, $h_2(X) = \hat{\theta}_2^2$. (Because both estimators have the same mean, $Eh_1(X) - Eh_2(X) = \operatorname{var}(\hat{\theta}_1) - \operatorname{var}(\hat{\theta}_2)$.) There are at least two ways of estimating such differences:

1. Generate samples, and hence values $h_1(X_i)$, $i = 1, \dots, n$, and $h_2(X_j)$, $j = 1, 2, \dots, m$, independently, and use the estimator
$$\frac{1}{n} \sum_{i=1}^{n} h_1(X_i) - \frac{1}{m} \sum_{j=1}^{m} h_2(X_j)$$
2. Generate samples, and hence pairs of values $h_1(X_i)$, $h_2(X_i)$, $i = 1, \dots, n$, independently, and use the estimator
$$\frac{1}{n} \sum_{i=1}^{n} (h_1(X_i) - h_2(X_i))$$

Intuitively, the second method is preferable, since it removes from the comparison the variability due to the particular sample. Estimating the difference between two expected values is a common type of problem. For example, we may be considering a new piece of equipment that will speed up processing at one node of a network, and we wish to estimate the expected improvement in performance of the new system over the old. In general, suppose we wish to estimate the difference between two expectations, say
$$Eh_1(X) - Eh_2(Y) \tag{4.38}$$
where the random variable or vector $X$ has cumulative distribution function $F_X$ and $Y$ has cumulative distribution function $F_Y$. The variance of a Monte Carlo estimator,
$$\operatorname{var}[h_1(X) - h_2(Y)] = \operatorname{var}[h_1(X)] + \operatorname{var}[h_2(Y)] - 2\operatorname{cov}\{h_1(X), h_2(Y)\} \tag{4.39}$$
is small if we can induce a high degree of positive correlation between the generated random variables $h_1(X)$ and $h_2(Y)$. This is precisely the opposite of the problem that led to antithetic random numbers, where we wanted a high degree of negative correlation. The following lemma, due to Hoeffding (1940), gives a useful bound on the joint cumulative distribution function of two random variables $X$ and $Y$. Suppose $X$ and $Y$ have cumulative distribution functions $F_X(x)$ and $F_Y(y)$, respectively, and joint cumulative distribution function $G(x, y) = P[X \le x, Y \le y]$.
Lemma 6 (a) The joint cumulative distribution function $G$ of $(X, Y)$ always satisfies
$$(F_X(x) + F_Y(y) - 1)^+ \le G(x, y) \le \min(F_X(x), F_Y(y)) \tag{4.40}$$
for all $x, y$.
(b) Assume that $F_X$ and $F_Y$ are continuous. If $X = F_X^{-1}(U)$ and $Y = F_Y^{-1}(U)$ for $U$ uniform on $[0, 1]$, equality holds on the right: $G(x, y) = \min(F_X(x), F_Y(y))$. If $X = F_X^{-1}(U)$ and $Y = F_Y^{-1}(1 - U)$, equality holds on the left: $(F_X(x) + F_Y(y) - 1)^+ = G(x, y)$.

Proof (a) Note that
$$P[X \le x, Y \le y] \le P[X \le x] \quad \text{and similarly} \quad P[X \le x, Y \le y] \le P[Y \le y]$$
This shows that $G(x, y) \le \min(F_X(x), F_Y(y))$, which is the right side of (4.40). Similarly, for the left side,
$$P[X \le x, Y \le y] = P[X \le x] - P[X \le x, Y > y] \ge P[X \le x] - P[Y > y] = F_X(x) - (1 - F_Y(y)) = F_X(x) + F_Y(y) - 1$$
Since $G$ is also nonnegative, the left side follows.
(b) Suppose $X = F_X^{-1}(U)$ and $Y = F_Y^{-1}(U)$. Then
$$P[X \le x, Y \le y] = P[F_X^{-1}(U) \le x, F_Y^{-1}(U) \le y] = P[U \le F_X(x), U \le F_Y(y)]$$
since $P[X = x] = 0$ and $P[Y = y] = 0$. But
$$P[U \le F_X(x), U \le F_Y(y)] = \min(F_X(x), F_Y(y))$$
which verifies equality on the right of (4.40) for common random numbers. By a similar argument,
$$P[F_X^{-1}(U) \le x, F_Y^{-1}(1 - U) \le y] = P[U \le F_X(x), 1 - U \le F_Y(y)] = P[U \le F_X(x), U \ge 1 - F_Y(y)] = (F_X(x) - (1 - F_Y(y)))^+$$
which verifies equality on the left. ∎

The following theorem supports using common random numbers to maximize covariance and antithetic random numbers to minimize it.

Theorem 20 (Maximum/minimum covariance) Suppose $h_1$ and $h_2$ are both nondecreasing (or both nonincreasing) functions. Subject to the constraint that $X$ and $Y$ have cumulative distribution functions $F_X$ and $F_Y$, respectively, the covariance $\operatorname{cov}[h_1(X), h_2(Y)]$ is maximized when $Y = F_Y^{-1}(U)$ and $X = F_X^{-1}(U)$ (i.e., for common uniform[0, 1] random numbers). It is minimized when $Y = F_Y^{-1}(U)$ and $X = F_X^{-1}(1 - U)$ (i.e., for antithetic random numbers).

Proof We sketch a proof for the case where the distributions are all continuous and $h_1, h_2$ are differentiable. Define $G(x, y) = P[X \le x, Y \le y]$.
The following representation of covariance is useful. Define
$$H(x, y) = P(X > x, Y > y) - P(X > x)P(Y > y) = G(x, y) - F_X(x)F_Y(y). \tag{4.41}$$
Notice that, using integration by parts,
$$
\begin{aligned}
\iint H(x, y)\,h_1'(x)\,h_2'(y)\,dx\,dy
&= -\iint \frac{\partial H(x, y)}{\partial x}\,h_1(x)\,h_2'(y)\,dx\,dy \\
&= \iint \frac{\partial^2 H(x, y)}{\partial x\,\partial y}\,h_1(x)\,h_2(y)\,dx\,dy \\
&= \iint h_1(x)\,h_2(y)\,g(x, y)\,dx\,dy - \int h_1(x)\,f_X(x)\,dx \int h_2(y)\,f_Y(y)\,dy \\
&= \operatorname{cov}(h_1(X), h_2(Y)),
\end{aligned}
\tag{4.42}
$$
where $g(x, y)$, $f_X(x)$, $f_Y(y)$ denote the joint probability density function, the probability density function of $X$, and that of $Y$, respectively. In fact, this result holds in general even without the assumption that the distributions are continuous. The covariance between $h_1(X)$ and $h_2(Y)$, for differentiable functions $h_1$ and $h_2$, is
$$\operatorname{cov}(h_1(X), h_2(Y)) = \iint H(x, y)\,h_1'(x)\,h_2'(y)\,dx\,dy.$$
The formula shows that to maximize the covariance, if $h_1, h_2$ are both increasing or both decreasing functions, it is sufficient to maximize $H(x, y)$ for each $x, y$, since the product $h_1'(x)\,h_2'(y)$ is nonnegative. Since we are constraining the marginal cumulative distribution functions $F_X, F_Y$, this is equivalent to maximizing $G(x, y)$ subject to the constraints
$$\lim_{y \to \infty} G(x, y) = F_X(x), \qquad \lim_{x \to \infty} G(x, y) = F_Y(y).$$
Lemma 6 shows that the maximum is achieved when common random numbers are used and the minimum is achieved when we use antithetic random numbers. ∎

We can argue intuitively for the use of common random numbers in the case of a discrete distribution with probability on the points indicated in Figure 4.5.

[FIGURE 4.5 Changing Weights on Points to Maximize Covariance: points P1 (upper right), P2 (lower right), P3 (lower left) and P4 (upper left) relative to a fixed point (x, y) in the unit square.]

This figure corresponds to a joint distribution with the following probabilities, say:

x       y       P[X = x, Y = y]
0       0       0.1
0.25    0.25    0.2
0.25    0.75    0.2
0.75    0.25    0.1
0.75    0.75    0.2
1       1       0.2

Suppose we wish to maximize $P[X > x, Y > y]$ subject to the constraint that the probabilities $P[X > x]$ and $P[Y > y]$ are fixed. Since $P[X > x, Y > y] = 1 - F_X(x) - F_Y(y) + G(x, y)$, this is the same as maximizing $G(x, y)$.
We have indicated arbitrary fixed values of $(x, y)$ in the figure. Note that if there is any weight attached to the point in the lower right quadrant (labeled P2), some or all of this weight can be reassigned to the point P3 in the lower left quadrant, provided there is an equal movement of weight from the upper left point P4 to the upper right point P1. Such a movement of weight increases the value of $G(x, y)$ without affecting $P[X \le x]$ or $P[Y \le y]$. The weight that we are able to transfer in this example is 0.1, the minimum of the weights on P4 and P2. In general, this continues until, for every choice of $(x, y)$, there is no weight in one of the off-diagonal quadrants. The resulting distribution in this example is given by

x       y       P[X = x, Y = y]
0       0       0.1
0.25    0.25    0.3
0.25    0.75    0.1
0.75    0.25    0
0.75    0.75    0.3
1       1       0.2

and it is easy to see that such a joint distribution can be generated from common random numbers $X = F_X^{-1}(U)$, $Y = F_Y^{-1}(U)$. Both marginal distributions are unchanged by the transfer; for example, $P[X = 0.25] = 0.4$ and $P[Y = 0.75] = 0.4$ in both tables.

Conditioning

We now consider a simple but powerful generalization of control variates. Suppose that we can decompose a random variable $T$ into two components $T_1$ and $\varepsilon$,
$$T = T_1 + \varepsilon, \tag{4.43}$$
so that $T_1$ and $\varepsilon$ are uncorrelated: $\operatorname{cov}(T_1, \varepsilon) = 0$. Assume as well that $E(\varepsilon) = 0$. Regression is one method for determining such a decomposition, and the error term in regression satisfies these conditions. Then $T_1$ has the same mean as $T$, and it is easy to see that $\operatorname{var}(T) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)$, so $T_1$ has smaller variance than $T$ (unless $\varepsilon = 0$ with probability 1). Thus, if we wish to estimate the common mean of $T$ and $T_1$, the estimator $T_1$ is preferable, since it has the same mean with smaller variance.

One special case is variance reduction by conditioning. One common definition of $E[X|Y]$ is the unique (with probability 1) function $g(Y)$ of $Y$ that minimizes $E\{X - g(Y)\}^2$.
This definition applies only to random variables $X$ that have finite variance, and so it requires some modification when $E(X^2) = \infty$; we will assume here that all random variables under consideration, say $X, Y, Z$, have finite variance. We can define conditional covariance using conditional expectation as
$$\operatorname{cov}(X, Y|Z) = E[XY|Z] - E[X|Z]\,E[Y|Z]$$
and conditional variance as
$$\operatorname{var}(X|Z) = E(X^2|Z) - (E[X|Z])^2.$$
Variance reduction through conditioning is justified by the following well-known result.

Theorem 21
(a) $E(X) = E\{E[X|Y]\}$
(b) $\operatorname{cov}(X, Y) = E\{\operatorname{cov}(X, Y|Z)\} + \operatorname{cov}\{E[X|Z], E[Y|Z]\}$
(c) $\operatorname{var}(X) = E\{\operatorname{var}(X|Z)\} + \operatorname{var}\{E[X|Z]\}$

This theorem is used as follows. Suppose we are considering a candidate estimator $\hat\theta$, an unbiased estimator of $\theta$. We also have an arbitrary random variable $Z$ that is somehow related to $\hat\theta$. Suppose that we have chosen $Z$ carefully so that we are able to calculate the conditional expectation $T_1 = E[\hat\theta|Z]$. Then by part (a) of the above theorem, $T_1$ is also an unbiased estimator of $\theta$. Define $\varepsilon = \hat\theta - T_1$. By part (c), $\operatorname{var}(\hat\theta) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)$, and so $\operatorname{var}(T_1) = \operatorname{var}(\hat\theta) - \operatorname{var}(\varepsilon) \le \operatorname{var}(\hat\theta)$, with strict inequality unless $\varepsilon = 0$ with probability 1. In other words, for any variable $Z$, $E[\hat\theta|Z]$ has the same expectation as $\hat\theta$ but smaller variance, and the decrease in variance is largest if $Z$ and $\hat\theta$ are nearly independent, because in this case $E[\hat\theta|Z]$ is close to a constant and its variance is close to zero.

In general, reducing the variance of an estimator by conditioning requires searching for a random variable $Z$ such that
1. the conditional expectation $E[\hat\theta|Z]$ of the original estimator is computable, and
2. $\operatorname{var}(E[\hat\theta|Z])$ is substantially smaller than $\operatorname{var}(\hat\theta)$.

Example (Hit or miss) Suppose we wish to estimate the area under a certain graph $f(x)$ by the hit-or-miss method. A crude method would involve determining a multiple $c$ of a probability density function $g(x)$ that dominates $f(x)$, so that $cg(x) \ge f(x)$ for all $x$.
We can generate points $(X, Y)$ at random and uniformly distributed under the graph of $cg(x)$ by generating $X$ by inverse transform, $X = G^{-1}(U_1)$, where $G(x)$ is the cumulative distribution function corresponding to the density $g$, and then generating $Y$ from the uniform$[0, cg(X)]$ distribution, say $Y = cg(X)U_2$. An example with $g(x) = 2x$, $0 < x < 1$, and $c = 1/4$ is given in Figure 4.6.

[FIGURE 4.6 Example of the Hit-or-Miss Method: the curves $cg(x)$ and $f(x)$ on $0 \le x \le 1$, with a generated point $(X, Y)$ marked.]

The hit-or-miss estimator of the area under the graph of $f$ is obtained by generating such random points $(X, Y)$ and counting the proportion that fall under the graph of $f$, that is, for which $Y \le f(X)$. This proportion estimates the probability
$$P[Y \le f(X)] = \frac{\text{Area under } f(x)}{\text{Area under } cg(x)} = \frac{\text{Area under } f(x)}{c},$$
since $g(x)$ is a probability density function. Notice that if we define
$$W = \begin{cases} c & \text{if } Y \le f(X), \\ 0 & \text{if } Y > f(X), \end{cases}$$
then
$$E(W) = c\,\frac{\text{Area under } f(x)}{\text{Area under } cg(x)} = \text{Area under } f(x),$$
so $W$ is an unbiased estimator of the parameter that we wish to estimate. We might therefore estimate the area under $f(x)$ using the Monte Carlo estimator $\hat\theta_{HM} = \frac{1}{n}\sum_{i=1}^{n} W_i$ based on independent values of $W_i$. This is the hit-or-miss estimator. However, in this case it is easy to find a random variable $Z$ such that the conditional expectation $E(W|Z)$ can be determined in closed form. In fact, we can choose $Z = X$, obtaining
$$E[W|X] = \frac{f(X)}{g(X)}.$$
This is therefore an unbiased estimator of the same parameter, and it has smaller variance than does $W$. For a sample of size $n$ we should replace the crude hit-or-miss estimator $\hat\theta_{HM}$ by the estimator
$$\hat\theta_{Cond} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{g(X_i)} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{2X_i},$$
with $X_i$ generated from $X_i = G^{-1}(U_i) = \sqrt{U_i}$, $i = 1, 2, \dots, n$, and $U_i$ uniform$[0, 1]$. In this case, the conditional expectation results in a familiar form for the estimator $\hat\theta_{Cond}$.
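The comparison between $\hat\theta_{HM}$ and $\hat\theta_{Cond}$ can be checked with a short simulation. This is a minimal sketch (Python standard library only) that uses $g(x) = 2x$ and $c = 1/4$ from the text together with a hypothetical integrand $f(x) = x^2/2$, chosen only because it satisfies $f(x) \le cg(x) = x/2$ on $(0, 1)$ and has known area $1/6$; the text does not specify $f$ here. For this $f$, $W$ equals $c$ with probability $p = 2/3$, so $\operatorname{var}(W) = c^2 p(1-p) = 1/72$, while $f(X)/g(X) = X/4$ has variance $1/288$: conditioning on $X$ cuts the variance by a factor of 4 in this case.

```python
import math
import random
import statistics

C = 0.25  # the constant c from the text, so that c*g(x) = x/2


def g(x):
    """Density g(x) = 2x on (0, 1); its CDF is G(x) = x**2, so G^{-1}(u) = sqrt(u)."""
    return 2.0 * x


def f(x):
    """Hypothetical integrand with f(x) <= c*g(x) on (0, 1); its area is 1/6."""
    return 0.5 * x * x


def draw_x(rng):
    """X = G^{-1}(U) = sqrt(U); 1 - random() lies in (0, 1], so X > 0 and g(X) > 0."""
    return math.sqrt(1.0 - rng.random())


def hit_or_miss(n, rng):
    """Draws of W: c if Y <= f(X), else 0, where Y = c*g(X)*U2 is uniform on [0, c*g(X)]."""
    w = []
    for _ in range(n):
        x = draw_x(rng)
        y = C * g(x) * rng.random()
        w.append(C if y <= f(x) else 0.0)
    return w


def conditioned(n, rng):
    """Draws of E[W | X] = f(X)/g(X), the terms averaged in the conditioned estimator."""
    return [f(x) / g(x) for x in (draw_x(rng) for _ in range(n))]


rng = random.Random(2007)
n = 20_000
w_hm = hit_or_miss(n, rng)
w_cond = conditioned(n, rng)
print(f"hit-or-miss: estimate {statistics.fmean(w_hm):.4f}, per-draw variance {statistics.variance(w_hm):.5f}")
print(f"conditioned: estimate {statistics.fmean(w_cond):.4f}, per-draw variance {statistics.variance(w_cond):.5f}")
```

Both estimates should be close to $1/6 \approx 0.1667$; the per-draw variances should be near $1/72 \approx 0.0139$ and $1/288 \approx 0.0035$ respectively.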
The estimator $\hat\theta_{Cond}$ is simply an importance sampling estimator with $g(x)$ as the importance distribution. However, this derivation shows that $\hat\theta_{Cond}$ has smaller variance than $\hat\theta_{HM}$.

PROBLEMS

1. Use both crude and antithetic random numbers to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$
(a) What is the efficiency gain attributed to the use of antithetic random numbers?
(b) How large a sample size would we need, using antithetic and crude Monte Carlo, in order to estimate the above integral, correct to four decimal places, with probability at least 95 percent?

2. Under what conditions on $f$ does the use of antithetic random numbers completely correct for the variability of the Monte Carlo estimator (i.e., when is $\operatorname{var}(f(U) + f(1 - U)) = 0$)?

3. Suppose that $F(x)$ is the normal$(\mu, \sigma^2)$ cumulative distribution function. Prove that $F^{-1}(1 - U) = 2\mu - F^{-1}(U)$ and therefore, if we use antithetic random numbers to generate two normal random variables $X_1, X_2$ having mean $\mu$ and variance $\sigma^2$, this is equivalent to setting $X_2 = 2\mu - X_1$. In other words, if we wish to use antithetic random numbers for normal variates, it is not necessary to generate the normal random variables using the inverse transform method.

4. Show that the variance of a weighted average
$$\operatorname{var}(\alpha X + (1 - \alpha)W)$$
is minimized over $\alpha$ when
$$\alpha = \frac{\operatorname{var}(W) - \operatorname{cov}(X, W)}{\operatorname{var}(W) + \operatorname{var}(X) - 2\operatorname{cov}(X, W)}.$$
Determine the resulting minimum variance. What if the random variables $X, W$ are independent?

5. Use a stratified random sample to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$
What do you recommend for choice of strata (two or three) and sample sizes? What is the efficiency gain?

6. Use a combination of stratified random sampling and an antithetic random number in the form
$$\frac{1}{2}\left[f(U/2) + f(1 - U/2)\right]$$
to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$
What is the efficiency gain?

7. The second version of the control variate Monte Carlo estimator,
$$\hat\theta_{cv} = \frac{1}{n}\sum_{i=1}^{n}\bigl\{f(U_i) - \beta[g(U_i) - E(g(U_i))]\bigr\},$$
an improved control variate estimator, is equivalent to the first version in the case $\beta = 1$. In the case $f(x) = \frac{e^x - 1}{e - 1}$, consider using $g(x) = x$ as a control variate to integrate over $[0, 1]$. Determine how much better $\hat\theta_{cv}$ is than the basic control variate ($\beta = 1$) by performing simulations. Show that the variance is reduced by a factor of approximately 60 over crude Monte Carlo. Is there much additional improvement if we use a more general quadratic function of $x$ for $g(x)$?

8. There is considerable evidence that portfolio returns are neither normally nor lognormally distributed but have fatter tails than either distribution. Suppose we approximate the distribution of trading losses using a random variable $X_i$ with probability density function
$$f(x) = \frac{2b^3}{\pi\,\bigl(b^2 + (x - \mu)^2\bigr)^2}.$$
This is the recentered and rescaled Student's t distribution with 3 degrees of freedom. We are told that the parameters are determined by two observations from historical data: the median daily loss is $-\$1{,}000$ (i.e., a profit), and the probability that the daily loss exceeds $\$5{,}000$ is only 0.01. We wish to estimate a weekly value at risk, $\mathrm{VaR}_{0.95}$, a value $v$ such that $P\bigl[\sum_{i=1}^{5} X_i < v\bigr] = 0.95$. Since we do not know the distribution of the sum of independent Student-distributed random variables $X_i$, we may wish to do this by simulation. Suggest appropriate methods involving importance sampling, control variates, and stratified sampling. Implement these methods and estimate the variance reduction achieved by each. How do they compare with the variance reduction achieved using the optimal linear combination?

9. Suppose three different simulation estimates $Y_1, Y_2, Y_3$ are all unbiased estimators of the parameter $\theta$ and all have identical variances $\operatorname{var}(Y_i) = 1$. Assume that $\operatorname{cov}(Y_1, Y_2) = \operatorname{cov}(Y_1, Y_3) = 1/2$ and $\operatorname{cov}(Y_2, Y_3) = 0$.
In order to estimate the parameter $\theta$, should we use one of the estimators $Y_i$ or some linear combination of $Y_1, Y_2, Y_3$? Compare the number of simulations necessary for a certain degree of accuracy if we use a single estimator with that for a linear combination.

10. In the case $f(x) = \frac{e^x - 1}{e - 1}$, use $g(x) = x$ as a control variate to integrate over $[0, 1]$. Find the optimal linear combination using estimators (4.35) and (4.36), an importance sampling estimator, and the control variate estimator above. What is the efficiency gain over crude Monte Carlo?
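As a possible starting point for Problems 7 and 10 (a sketch, not a full solution), the following Python code compares crude Monte Carlo with the control variate estimator for $f(x) = (e^x - 1)/(e - 1)$ and $g(x) = x$, whose mean $1/2$ is known exactly. It assumes $\beta$ is estimated by the sample regression coefficient $\widehat{\operatorname{cov}}(f(U), U)/\widehat{\operatorname{var}}(U)$ computed from the same draws, which introduces a small bias. An exact calculation for this integrand gives $\beta \approx 0.98$ and a crude-to-control-variate variance ratio close to 60 both for $\beta = 1$ and for the optimal $\beta$, in line with the figure quoted in Problem 7.

```python
import math
import random
import statistics


def f(u):
    """Integrand of Problems 7 and 10: (e^u - 1)/(e - 1); its integral is (e - 2)/(e - 1)."""
    return (math.exp(u) - 1.0) / (math.e - 1.0)


rng = random.Random(340)
n = 50_000
us = [rng.random() for _ in range(n)]
fs = [f(u) for u in us]

# Estimate beta = cov(f(U), U) / var(U) from the same sample.
mean_u = statistics.fmean(us)
mean_f = statistics.fmean(fs)
cov_fu = sum((a - mean_f) * (b - mean_u) for a, b in zip(fs, us)) / (n - 1)
beta_hat = cov_fu / statistics.variance(us)

# Control variate g(u) = u with known mean E[g(U)] = 1/2.
cv_basic = [a - (b - 0.5) for a, b in zip(fs, us)]             # beta = 1
cv_beta = [a - beta_hat * (b - 0.5) for a, b in zip(fs, us)]   # beta = beta_hat

v_crude = statistics.variance(fs)
print(f"exact value: {(math.e - 2) / (math.e - 1):.5f}, beta_hat = {beta_hat:.4f}")
for name, d in (("beta = 1", cv_basic), ("beta = beta_hat", cv_beta)):
    ratio = v_crude / statistics.variance(d)
    print(f"{name:>15}: estimate {statistics.fmean(d):.5f}, crude/cv variance ratio {ratio:.1f}")
```

Replacing `cv_basic` by a regression on both $u$ and $u^2$ would be one way to explore the quadratic control variate asked about at the end of Problem 7.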