CHAPTER 4
Variance Reduction Techniques
INTRODUCTION
In this chapter we discuss techniques for improving on the speed and efficiency of a simulation, usually called variance reduction techniques.
Much of the simulation literature concerns discrete event simulations
(DESs), simulations of systems that are assumed to change instantaneously in
response to sudden or discrete events. These are most common in operations
research, and examples are simulations of processes such as networks or
queues. Simulation models in which the process is characterized by a state,
with changes only at discrete time points, are DESs. In modeling an inventory
system, for example, the arrival of a batch of raw materials can be considered
an event that precipitates a sudden change in the state of the system, followed
by a demand some discrete time later when the state of the system changes
again. A system driven by differential equations in continuous time is not
a DES because the changes occur continuously in time. One approach to
DES is future event simulation, which schedules one or more future events
at a time, choosing the event in the future event set that has minimum time,
updating the state of the system and the clock accordingly, and then repeating
this whole procedure. Simulation of a stock price that moves at discrete time
points by an amount that may be a continuous random variable is a DES.
In fact, this approach is often used in valuing American options by Monte
Carlo methods where we model the stock price path using a binomial or
trinomial tree.
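As a rough illustration of future event simulation, the following Matlab sketch simulates a single-server queue. The exponential interarrival and service times, the rates lambda and mu, and the horizon Tend are assumptions made for this illustration and do not come from the text. The future event set holds only two events, the next arrival and the next departure; at each step the earlier of the two is selected, and the state and the clock are updated.

```matlab
% Future event simulation of a single-server queue (illustrative sketch).
% lambda, mu and Tend are hypothetical values, not taken from the text.
rng(1);                               % fix the seed for reproducibility
lambda = 0.8; mu = 1.0; Tend = 1e4;   % arrival rate, service rate, time horizon
clock = 0;                            % simulation clock
q = 0;                                % state: number of customers in the system
area = 0;                             % time integral of q, used for the average
nextArr = -log(rand)/lambda;          % future event set: next arrival time ...
nextDep = Inf;                        % ... and next departure (none scheduled yet)
while true
    tNext = min([nextArr, nextDep, Tend]);   % event with minimum time
    area = area + q*(tNext - clock);         % accumulate the performance measure
    clock = tNext;                           % advance the clock
    if clock >= Tend, break; end
    if nextArr <= nextDep                    % the next event is an arrival
        q = q + 1;
        nextArr = clock - log(rand)/lambda;  % schedule the following arrival
        if q == 1, nextDep = clock - log(rand)/mu; end
    else                                     % the next event is a departure
        q = q - 1;
        if q > 0
            nextDep = clock - log(rand)/mu;  % schedule the next departure
        else
            nextDep = Inf;                   % server idle, nothing scheduled
        end
    end
end
avgQ = area/Tend;                     % time-average number in the system
fprintf('average number in system: %.3f\n', avgQ);
```

The time-average number in the system plays the role of a performance measure of the kind discussed in the next paragraph.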
Often we identify one or more performance measures by which the system is to be judged, and parameters that may be adjusted to improve the
system performance. Examples are the delay for an air traffic control system, customer waiting times for a bank teller scheduling system, delays or
throughput for computer networks, and response times for the location of
fire stations or supply depots. Performance measures are important in engineering examples or in operations research, but less common in finance.
MONTE CARLO SIMULATION AND FINANCE
They may be used to calibrate a simulation model, however. For example,
our performance measure might be the average distance between observed
option prices on a given stock and prices obtained by simulation from a
model with specific parameter values. In all cases, the performance measure
is usually the expected value of a complicated function of many variables,
often expressible only by a computer program with some simulated random
variables as input. Whether these input random variables are generated by
inverse transform, acceptance-rejection, or some other method, they are ultimately a function of uniform[0, 1] random variables $U_1, U_2, \ldots$. These uniform random variables determine such quantities as the normally distributed
increments of the logarithm of the stock price. In summary, the simulation
is used simply to estimate a multidimensional integral of the form
$$E(g(U_1, \ldots, U_d)) = \int \cdots \int g(u_1, u_2, \ldots, u_d)\, du_1\, du_2 \cdots du_d \tag{4.1}$$
over the unit cube in d dimensions, where often d is large.
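A minimal Matlab sketch of estimating an integral of the form (4.1) by simulation is given below. The integrand g is a hypothetical test function, chosen only because its exact integral over the unit cube, d/3, is known; the dimension d and the number of points N are likewise illustrative.

```matlab
% Crude Monte Carlo estimate of E[g(U_1,...,U_d)] over the unit cube.
% g, d and N are illustrative assumptions, not taken from the text.
rng(1);                    % fix the seed for reproducibility
d = 10;                    % dimension of the unit cube
N = 1e5;                   % number of simulated points
U = rand(N, d);            % each row is one point (U_1,...,U_d)
g = sum(U.^2, 2);          % g(u) = u_1^2 + ... + u_d^2, one value per row
est = mean(g);             % Monte Carlo estimate of the integral
se  = std(g)/sqrt(N);      % standard error of the estimate
fprintf('estimate %.4f, exact %.4f, standard error %.4f\n', est, d/3, se);
```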
As an example in finance, suppose that we wish to price a European
option on a stock price under the following stochastic volatility model.
Example
Suppose the daily asset returns under a risk-neutral distribution are assumed
to be a variance mixture of the normal distribution, by which we mean
that the variance itself is random, independent of the normal variable, and
follows a distribution with moment-generating function m(s). More specifically, assume under the Q measure that the stock price at time $(n+1)\Delta t$ is
determined from
$$S_{(n+1)\Delta t} = S_{n\Delta t}\, \frac{\exp\{r\Delta t + \sigma_{n+1} Z_{n+1}\}}{m(\frac{1}{2})}$$
where, under the risk-neutral distribution, the positive random variables $\sigma_i^2$
are assumed to have a distribution with moment-generating function $m(s) = E\{\exp(s\sigma_i^2)\}$, $Z_i$ is standard normal independent of $\sigma_i^2$, and both $Z_{n+1}$ and
$\sigma_{n+1}^2$ are independent of the process up to time $n\Delta t$. We wish to determine
the price of a European call option with maturity T and strike price K.
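It should be noted that the rather strange choice of $m(\frac{1}{2})$ in the denominator above is such that the discounted process is a martingale, since
$$E\left[\frac{\exp\{\sigma_{n+1} Z_{n+1}\}}{m(\frac{1}{2})}\right] = E\left[E\left[\frac{\exp\{\sigma_{n+1} Z_{n+1}\}}{m(\frac{1}{2})} \,\Big|\, \sigma_{n+1}\right]\right] = E\left[\frac{\exp\{\sigma_{n+1}^2/2\}}{m(\frac{1}{2})}\right] = 1$$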
There are many ways of simulating an option price in the above example,
some much more efficient than others. We might, for example, simulate all
of the 2n random variables $\{\sigma_i, Z_i;\ i = 1, \ldots, n = T/\Delta t\}$ and use these to
determine the simulated value of $S_T$, finally averaging the discounted payoff
from the option in this simulation, $e^{-rT}(S_T - K)^+$. The price of this option
at time 0 is the average of many such simulations (say we do this a total of
N times) discounted to the present,
$$\overline{e^{-rT}(S_T - K)^+}$$
where the bar indicates the average of the values $e^{-rT}(S_T - K)^+$
over all simulations. This is a description of a crude and inefficient method
of conducting this simulation. Roughly, the time required for the simulation
is proportional to 2Nn, the total number of random variables generated.
This chapter discusses some of the many improvements possible in problems
like this. Since each simulation requires at least d = 2n independent uniform
random variables to generate the values $\{\sigma_i, Z_i;\ i = 1, \ldots, n\}$, we are trying to
estimate a rather complicated integral of the form (4.1) of high dimension d.
In this case, however, we can immediately see some obvious improvements.
Notice that we can rewrite $S_T$ in the form
$$S_T = S_0\, \frac{\exp\{rT + \sigma Z\}}{m^n(\frac{1}{2})} \tag{4.2}$$
where the random variable $\sigma^2 = \sum_{i=1}^n \sigma_i^2$ has moment-generating function
$m^n(s)$ and Z is independent standard normal. Obviously, if we can simulate $\sigma$
directly, we can avoid the computation involved in generating the individual
$\sigma_i$. Further savings are possible in light of the Black-Scholes formula, which
provides the price of a call option when a stock price is given by (4.2) and
the volatility parameter $\sigma$ is nonrandom. The expected return from the call
under the risk-neutral distribution can be written, using the Black-Scholes
formula, as
$$E\left[e^{-rT}(S_T - K)^+\right] = E\left[E\left[e^{-rT}(S_T - K)^+ \mid \sigma\right]\right]$$
$$= E\left[S_0\,\Phi\left(\frac{\log(S_0/K) + (r + \frac{\sigma^2}{2})T}{\sigma\sqrt{T}}\right) - K e^{-rT}\,\Phi\left(\frac{\log(S_0/K) + (r - \frac{\sigma^2}{2})T}{\sigma\sqrt{T}}\right)\right]$$
which is now a one-dimensional integral over the distribution of $\sigma^2$. This
can now be evaluated either by a one-dimensional numerical integration or
by repeatedly simulating the value of $\sigma^2$ and averaging the values of
$$S_0\,\Phi\left(\frac{\log(S_0/K) + (r + \frac{\sigma^2}{2})T}{\sigma\sqrt{T}}\right) - K e^{-rT}\,\Phi\left(\frac{\log(S_0/K) + (r - \frac{\sigma^2}{2})T}{\sigma\sqrt{T}}\right)$$
obtained from these simulations. As a special case we might take the distribution of $\sigma_i^2$ to be gamma($\Delta t$) with moment-generating function
$$m(s) = \frac{1}{(1 - s)^{\Delta t}}$$
in which case the distribution of $\sigma^2$ is gamma(T). This is the so-called
variance-gamma distribution investigated extensively by Madan and Seneta
(1990) and originally suggested as a model for stock prices by McLeish and
Pierson (cf. McLeish, 1982). Alternatively, many other, wider-tailed alternatives to the normal returns model can be written as a variance mixture
of the normal distribution, and option prices can be simulated in this way.
For example, when the variance is generated having the distribution of the
reciprocal of a gamma random variable, the returns have a Student's t distribution. Similarly, the stable distributions and the Laplace distribution all
have a representation as a variance mixture of the normal.
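The idea of simulating the total variance directly, rather than the individual daily variances, can be sketched in Matlab as follows. This is only an illustration under stated assumptions: the total variance is taken to be gamma with shape a*T and scale b, so that its moment-generating function at 1/2 is (1 - b/2)^(-a*T); the parameters a and b are hypothetical and are chosen so that the expected total variance equals 0.2^2 * T. The function gamrnd (shape, scale) requires the Statistics and Machine Learning Toolbox.

```matlab
% Direct simulation of S_T in the form (4.2) with a gamma-distributed total variance.
% a and b are hypothetical parameters (shape a*T, scale b), not taken from the text.
rng(1);                                  % fix the seed for reproducibility
S0 = 10; K = 10; r = 0.05; T = 0.25;
a = 4; b = 0.01;                         % E[sigma^2] = a*T*b = 0.01 = 0.2^2 * T
N = 1e5;                                 % number of simulations
sig2 = gamrnd(a*T, b, N, 1);             % total variance sigma^2, one draw per path
Z = randn(N, 1);                         % independent standard normal variates
mHalf = (1 - b/2)^(-a*T);                % E[exp(sigma^2/2)], the normalizing constant
ST = S0*exp(r*T + sqrt(sig2).*Z)/mHalf;  % terminal stock price as in (4.2)
payoff = exp(-r*T)*max(ST - K, 0);       % discounted call payoff
price = mean(payoff);                    % Monte Carlo price estimate
se = std(payoff)/sqrt(N);                % its standard error
fprintf('price %.4f (standard error %.4f)\n', price, se);
fprintf('martingale check: %.4f should be near S0 = %g\n', mean(exp(-r*T)*ST), S0);
```

Each simulated value of S_T uses two random variables instead of the 2n required when every daily variance and normal increment is generated separately.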
The rest of this chapter discusses variance reduction techniques such as
the one employed above for evaluating integrals such as (4.1), beginning
with the much simpler case of an integral in one dimension.
VARIANCE REDUCTION FOR ONE-DIMENSIONAL
MONTE CARLO INTEGRATION
We wish to use Monte Carlo methods to evaluate the one-dimensional integral $\theta = \int_0^1 f(u)\,du$ for some function f(u). We have already noted that
whatever the distribution of random variables required in our simulation,
they are usually generated using uniform[0,1] random variables U, so without loss of generality we can assume that the integral is with respect to the
uniform[0,1] probability density function: that is, we wish to estimate
$$\theta = E\{f(U)\} = \int_0^1 f(u)\,du$$
One simple approach, called crude Monte Carlo, is to randomly sample
$U_i \sim$ uniform[0, 1] and then average the values of $f(U_i)$ to obtain
$$\hat\theta_{CR} = \frac{1}{n}\sum_{i=1}^n f(U_i)$$
It is easy to see that $E(\hat\theta_{CR}) = \theta$, so that this average is an unbiased estimator
of the integral and the variance of the estimator is
$$\mathrm{var}(\hat\theta_{CR}) = \frac{\mathrm{var}(f(U_1))}{n}$$
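The unbiasedness and the variance formula for the crude estimator can be checked empirically. The sketch below uses the hypothetical integrand f(u) = exp(u), for which the integral e - 1 and the variance of f(U) are known in closed form; the sample size n and the number of replications M are illustrative choices.

```matlab
% Empirical check that the crude estimator is unbiased with variance var(f(U))/n.
% The integrand exp(u), n and M are illustrative assumptions.
rng(1);                                    % fix the seed for reproducibility
f = @(u) exp(u);                           % test integrand
n = 100;                                   % function evaluations per estimate
M = 2000;                                  % number of independent replications
est = mean(f(rand(n, M)), 1);              % M independent crude estimates
theta = exp(1) - 1;                        % exact value of the integral
varf = (exp(2) - 1)/2 - (exp(1) - 1)^2;    % exact var(f(U))
fprintf('mean of estimates %.4f, exact %.4f\n', mean(est), theta);
fprintf('variance of estimates %.6f, var(f(U))/n = %.6f\n', var(est), varf/n);
```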
Example: A Crude Simulation of a Call Option Price under
the Black-Scholes Model
For a simple example that we will use throughout, consider an integral used
to price a call option. We have seen that if a European option has payoff
$V(S_T)$, where $S_T$ is the value of the stock at maturity T, then the option can
be valued at the present (t = 0) using the discounted future payoff from the
option under the risk-neutral measure:
$$e^{-rT} E[V(S_T)] = e^{-rT} E[V(S_0 e^X)]$$
where, in the Black-Scholes model, the random variable $X = \ln(S_T/S_0)$ has a
normal distribution with mean $rT - \sigma^2 T/2$ and variance $\sigma^2 T$. It is possible
to generate a normally distributed random variable $X = \Phi^{-1}(U;\ rT - \frac{\sigma^2 T}{2},\ \sigma^2 T)$ using the inverse transform method, where $\Phi^{-1}(U;\ rT - \frac{\sigma^2 T}{2},\ \sigma^2 T)$ is
the inverse of the normal($rT - \frac{\sigma^2 T}{2}$, $\sigma^2 T$) cumulative distribution function
evaluated at U, a uniform[0, 1] random variable. Then the value of the option
can be written as an expectation over the distribution of the uniform random
variable U,
$$E\{f(U)\} = \int_0^1 f(u)\,du$$
where
$$f(u) = e^{-rT}\, V\left(S_0 \exp\left\{\Phi^{-1}\left(u;\ rT - \frac{\sigma^2 T}{2},\ \sigma^2 T\right)\right\}\right)$$
This function is graphed in Figure 4.1 in the case of a simple call option
with strike price K, payoff at maturity V(ST) = (ST - K)+, current stock
price S0 = $10, exercise price K = $10, annual interest rate r = 5 percent,
maturity of three months or one-quarter of a year (T = 0.25), and annual
volatility σ = 0.20.
[FIGURE 4.1 The Function f(u) Whose Integral Provides the Value of a Call Option. Horizontal axis: u, from 0 to 1; vertical axis: f(u), from 0 to 4.]
A simple crude Monte Carlo estimator corresponds to evaluating this
function at a large number of randomly selected values of $U_i \sim U[0, 1]$ and
then averaging the results. For example, the following function in Matlab
accepts a vector of inputs $u = (U_1, \ldots, U_n)$ assumed to be uniform[0, 1]
and outputs the values of $f(U_1), \ldots, f(U_n)$, which can be averaged to give
$\hat\theta_{CR} = \frac{1}{n}\sum_{i=1}^n f(U_i)$.
function v=fn(u)
% Value of the integrand for a call option with exercise price K,
% r = annual interest rate, sigma = annual volatility,
% S0 = current stock price, T = maturity (in years).
% u = vector of uniform(0,1) inputs used to generate
% normal variates by inverse transform.
S0 = 10; K = 10; r = .05; sigma = .2; T = .25;   % values of parameters
% ST = S0*exp{Phi^(-1)(U; rT - sigma^2*T/2, sigma^2*T)} is the stock price at time T
ST = S0*exp(norminv(u,r*T-sigma^2*T/2,sigma*sqrt(T)));
% v = payoffs from the call option, discounted to the present
v = exp(-r*T)*max((ST-K),0);
The analogous function in R is
fn <- function(u, So, strike, r, sigma, T) {
  # Value of the integrand for a call option with exercise price = strike,
  # r = annual interest rate, sigma = annual volatility, So = current stock price,
  # u = uniform(0,1) input used to generate normal variates by inverse transform,
  # T = time to maturity. For the Black-Scholes price, integrate over (0,1).
  x <- So*exp(qnorm(u, mean = r*T - sigma^2*T/2, sd = sigma*sqrt(T)))
  v <- exp(-r*T)*pmax((x - strike), 0)
  v
}
In the case of initial stock price = $10, exercise price = $10, annual
volatility σ = 0.20, r = 5 percent, T = 0.25 (three months), this is run as
u = rand(1,500000); mean(fn(u))
and in R,
mean(fn(runif(500000),So = 10,strike = 10,r = .05,sigma = .2,T = .25))
and this provides an approximate value of the option of 0.4620 for the crude estimator. We
may confirm this using the Black-Scholes formula, again in Matlab,
[CALL,PUT] = blsprice(10,10,0.05,0.25,0.2,0)
The arguments are, in order, (S0, K, r, T, σ, q), where the last argument
(q = 0) is the annual dividend yield, which we assume here to be zero.
Provided that no dividends are paid on the stock before the maturity of the
option, this is reasonable. This Matlab command provides the result CALL
= 0.4615 and PUT = 0.3373, indicating that our simulated call option price
was reasonably accurate, off by about 0.1 percent. The put option is an option
to sell the stock at the specified price $10 at the maturity date and is also
priced by this same function.
One of the advantages of Monte Carlo methods over numerical techniques is that, because we are using a sample mean, we have a simple estimator of accuracy. In general, when n simulations are conducted, the accuracy
is measured by the standard error of the sample mean. Since
$$\mathrm{var}(\hat\theta_{CR}) = \frac{\mathrm{var}(f(U_1))}{n}$$
the standard error of the sample mean is the standard deviation of $\hat\theta_{CR}$, or
$$SE(\hat\theta_{CR}) = \frac{\sigma_f}{\sqrt{n}} \tag{4.3}$$
where $\sigma_f^2 = \mathrm{var}(f(U))$. As usual, we estimate $\sigma_f$ using the sample standard deviation. Since fn(u) provides a whole vector of estimators $(f(U_1), f(U_2), \ldots, f(U_n))$, sqrt(var(fn(u))) is the sample estimator of $\sigma_f$, so
the standard error $SE(\hat\theta_{CR})$ is given by
Sf=sqrt(var(fn(u)));
Sf/sqrt(length(u))
giving an estimate 0.6603 of the standard deviation $\sigma_f$ or standard error
$\sigma_f/\sqrt{500\,000}$, or 0.0009. Of course, parameters in statistical problems are
usually estimated using an interval estimate or a confidence interval, an interval constructed using a method that guarantees capturing the true value of the
parameter under similar circumstances with high probability (the confidence
coefficient, often taken to be 95 percent). Formally,
Definition A 95 percent confidence interval for a parameter $\theta$ is an interval [L, U] with random endpoints L, U such that the probability $P[L \le \theta \le U] = 0.95$.
If we were to repeat the experiment 100 times, say, by running 100 more,
similar independent simulations, and in each case use the results to construct
a 95 percent confidence interval, then this definition implies that roughly 95
of the intervals constructed will contain the true value of the parameter (and,
of course, roughly 5 will not). For an approximately normal($\mu_X$, $\sigma_X^2$) random
variable X, we can use the approximation
$$P[\mu_X - 2\sigma_X \le X \le \mu_X + 2\sigma_X] \approx 0.95 \tag{4.4}$$
(i.e., approximately normal variables are within two standard deviations of
their mean with probability around 95 percent) to build a simple confidence
interval. Strictly, the value $2\sigma_X$ should be replaced by $1.96\sigma_X$, where 1.96 is
taken from the normal distribution tables. The value 2 is very close to correct
for a t distribution with 60 degrees of freedom. In any case, these confidence
intervals, which assume approximate normality, are typically too short (i.e.,
contain the true value of the parameter less frequently than stated) for most
real data, and so a value marginally larger than 1.96 is warranted. Replacing
$\sigma_X$ above by the standard deviation of a sample mean, (4.4) results in the
approximate 95 percent confidence interval
$$\hat\theta_{CR} - 2\frac{\sigma_f}{\sqrt{n}} \le \theta \le \hat\theta_{CR} + 2\frac{\sigma_f}{\sqrt{n}}$$
for the true value $\theta$. With confidence 95 percent, the true price of the option is
within the interval $0.462 \pm 2(0.0009)$. As it happens in this case, this interval
does capture the true value, 0.4615, of the option.
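The standard error and the approximate 95 percent confidence interval can be computed together, as in the following Matlab sketch. It restates the integrand as an anonymous function so that the block is self-contained, and it evaluates the Black-Scholes call price directly with normcdf instead of a toolbox pricing function; both choices are conveniences for this illustration. The functions norminv and normcdf require the Statistics and Machine Learning Toolbox.

```matlab
% Crude Monte Carlo estimate, standard error and approximate 95 percent interval
% for the call option example, compared with the Black-Scholes price.
rng(1);                                   % fix the seed for reproducibility
S0 = 10; K = 10; r = 0.05; sigma = 0.2; T = 0.25;
f = @(u) exp(-r*T)*max(S0*exp(norminv(u, r*T - sigma^2*T/2, sigma*sqrt(T))) - K, 0);
n = 500000;
v = f(rand(1, n));                        % f(U_1),...,f(U_n)
thetaCR = mean(v);                        % crude Monte Carlo estimate
se = std(v)/sqrt(n);                      % standard error as in (4.3)
ci = [thetaCR - 2*se, thetaCR + 2*se];    % approximate 95 percent interval
% Black-Scholes call price, for comparison
d1 = (log(S0/K) + (r + sigma^2/2)*T)/(sigma*sqrt(T));
d2 = d1 - sigma*sqrt(T);
bs = S0*normcdf(d1) - K*exp(-r*T)*normcdf(d2);
fprintf('estimate %.4f, standard error %.4f\n', thetaCR, se);
fprintf('interval [%.4f, %.4f], Black-Scholes %.4f\n', ci(1), ci(2), bs);
```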
So far Monte Carlo has not told us anything we couldn't obtain from
the Black-Scholes formula, but what if we used a distribution other than the
normal to generate the returns? This is an easy modification of the above. For
example, suppose we replace the standard normal by a logistic distribution,
which, as we have seen, has a density function very similar to the standard
normal if we choose b = 0.625. Of course, the Black-Scholes formula does
not apply to a process with logistically distributed returns. We need only
replace the standard normal inverse cumulative distribution function by the
corresponding inverse for the logistic,
$$F^{-1}(U) = b \ln\left(\frac{U}{1 - U}\right)$$
and thus replace the Matlab code norminv(u,T*(r-sigma^2/2),sigma*sqrt(T)) by T*(r-sigma^2/2)+sigma*sqrt(T)*.625*log(u./(1-u)).
This results in a slight increase in the option value (to 0.504) and a considerable (about 50 percent) increase in the variance of the estimator.
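A self-contained Matlab sketch of this comparison is given below. It applies the same uniform variates to both the normal and the logistic inverse transforms, which is a choice made here so that the two estimates are directly comparable; the variable names are illustrative.

```matlab
% Call option estimates with normal and with logistic (b = 0.625) returns.
rng(1);                                    % fix the seed for reproducibility
S0 = 10; K = 10; r = 0.05; sigma = 0.2; T = 0.25; b = 0.625;
u = rand(1, 500000);                       % common uniform inputs
xNorm  = norminv(u, T*(r - sigma^2/2), sigma*sqrt(T));            % normal log returns
xLogis = T*(r - sigma^2/2) + sigma*sqrt(T)*b*log(u./(1 - u));     % logistic log returns
payoff = @(x) exp(-r*T)*max(S0*exp(x) - K, 0);                    % discounted call payoff
vNorm = payoff(xNorm);
vLogis = payoff(xLogis);
fprintf('normal:   mean %.4f, variance %.4f\n', mean(vNorm), var(vNorm));
fprintf('logistic: mean %.4f, variance %.4f\n', mean(vLogis), var(vLogis));
```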
We will look at the efficiency of various improvements to crude Monte
Carlo, and to that end, we record the value of the variance of the estimator
based on a single uniform variate in this case:
$$\sigma^2_{crude} = \sigma_f^2 = \mathrm{var}(f(U)) \approx 0.436$$
Then the crude Monte Carlo estimator using n function evaluations or n
uniform variates has variance approximately 0.436/n. If we were able to
adjust the method so that the variance $\sigma_f^2$ based on a single evaluation of the
function f in the numerator were halved, then we could achieve the same
accuracy from a simulation using half the number of function evaluations.
For this reason, when we compare two different methods for conducting a
simulation, the ratio of variances corresponding to a fixed number of function evaluations can also be interpreted roughly as the ratio of computational
effort required for a given predetermined accuracy. We will often compare
various new methods of estimating the same function based on variance
reduction schemes and quote the efficiency gain over crude Monte Carlo
sampling:
$$\text{Efficiency} = \frac{\text{Variance of crude Monte Carlo estimator}}{\text{Variance of new estimator}} \tag{4.5}$$
where the numerator and denominator correspond to estimators with the
same number of function evaluations (since this is usually the more expensive
part of the computation). An efficiency of 100 would indicate that the crude
Monte Carlo estimator would require 100 times the number of function
evaluations to achieve the same variance or standard error of the estimator.
To begin with, consider a crude estimator obtained from five U[0, 1]
variates,
$$U_i = 0.1,\ 0.3,\ 0.5,\ 0.6,\ 0.8, \quad i = 1, \ldots, 5$$
The crude Monte Carlo estimator in the case n = 5 is displayed in Figure 4.2,
the estimator being the sum of the areas of the marked rectangles. Only three
of the five points actually contribute to this area since for this particular
function
$$f(u) = e^{-rT}\left(S_0 \exp\left\{\Phi^{-1}\left(u;\ rT - \frac{\sigma^2 T}{2},\ \sigma^2 T\right)\right\} - K\right)^+ \tag{4.6}$$
and the parameters chosen, f(0.1) = f(0.3) = 0. Since these two random
numbers contributed 0 and the other three appear to be on average slightly
too small, the sum of the area of the rectangles appears to underestimate
the integral. Of course, another selection of five uniform random numbers
may prove to be even more badly distributed and may result in an under- or
overestimate.
[FIGURE 4.2 Crude Monte Carlo Estimator Based on Five Observations, U_i = 0.1, 0.3, 0.5, 0.6, 0.8. Horizontal axis: u, from 0 to 1; vertical axis: f(u).]
There are various ways of improving the efficiency of this estimator,
many of which partially emulate numerical integration techniques. First we
should note that most numerical integrals, like $\hat\theta_{CR}$, are weighted averages
of the values of the function at certain points $U_i$. What if we evaluated
the function at nonrandom points, chosen to attempt reasonable balance between locations where the function is large and small? Numerical integration
techniques and quadrature methods choose both points at which we evaluate
the function and weights that we attach to these points to provide accurate
approximations for polynomials of certain degree. For example, suppose we
insist on evaluating the function at equally spaced points, such as the points
$0, 1/n, 2/n, \ldots, (n-1)/n, 1$. In some sense these points are now more uniform than we are likely to obtain from n + 1 randomly and independently
chosen points $U_i$, $i = 1, 2, \ldots, n$. The trapezoidal rule corresponds to using
such equally spaced points and equal weights (except at the boundary), so
that the estimator of the integral is
$$\hat\theta_{TR} = \frac{1}{2n}\left\{f(0) + 2f(1/n) + \cdots + 2f\left(1 - \frac{1}{n}\right) + f(1)\right\} \tag{4.7}$$
or the simpler and very similar alternative in our case, with n = 5,
$$\hat\theta_{TR} = \frac{1}{5}\{f(0.1) + f(0.3) + f(0.5) + f(0.7) + f(0.9)\} \tag{4.8}$$
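The five-point rule (4.8) evaluates the function at the midpoints of five equal subintervals, and the same idea extends to any number of subintervals. A minimal Matlab sketch for the call option integrand follows; the helper name mid and the choice of 1000 subintervals for comparison are assumptions made for this illustration.

```matlab
% Equally spaced (midpoint) evaluation of the call option integrand, as in (4.8).
S0 = 10; K = 10; r = 0.05; sigma = 0.2; T = 0.25;
f = @(u) exp(-r*T)*max(S0*exp(norminv(u, r*T - sigma^2*T/2, sigma*sqrt(T))) - K, 0);
mid = @(n) mean(f(((1:n) - 0.5)/n));   % average of f at the n midpoints (i - 0.5)/n
theta5 = mid(5);                       % points 0.1, 0.3, 0.5, 0.7, 0.9 as in (4.8)
theta1000 = mid(1000);                 % a much finer grid, for comparison
fprintf('n = 5: %.4f, n = 1000: %.4f\n', theta5, theta1000);
```

The midpoints avoid the endpoint u = 1, where the integrand is unbounded, which is one reason the alternative (4.8) is more convenient here than (4.7).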
[FIGURE 4.3 Graphical Illustration of (4.8)]
A reasonable balance between large and small values of the function is almost
guaranteed by such a rule, as shown in Figure 4.3, with the observations
equally spaced.
Simpson's rule is to generate equally spaced points and weights that
(except for endpoints) alternate: 2/(3n), 4/(3n), 2/(3n), .... In the case when n is
even, the integral is estimated by
$$\hat\theta_{SR} = \frac{1}{3n}\left\{f(0) + 4f(1/n) + 2f(2/n) + \cdots + 4f\left(\frac{n-1}{n}\right) + f(1)\right\} \tag{4.9}$$
The trapezoidal rule is exact for linear functions, and Simpson's rule is exact
for quadratic functions.
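The exactness claims can be checked on a simple polynomial. The Matlab sketch below implements (4.7) and (4.9) as anonymous functions and applies them to a hypothetical quadratic q(u) = 3u^2 - 2u + 1, whose integral over [0, 1] is exactly 1; the names trap, simpson and q are illustrative.

```matlab
% Trapezoidal rule (4.7) and Simpson's rule (4.9) on [0,1] with n subintervals.
trap    = @(f, n) (f(0) + 2*sum(f((1:n-1)/n)) + f(1))/(2*n);
simpson = @(f, n) (f(0) + 4*sum(f((1:2:n-1)/n)) ...
                        + 2*sum(f((2:2:n-2)/n)) + f(1))/(3*n);   % n must be even
q = @(u) 3*u.^2 - 2*u + 1;        % test quadratic with integral exactly 1
tTrap = trap(q, 4);               % trapezoidal rule is not exact for a quadratic
tSimp = simpson(q, 4);            % Simpson's rule reproduces the integral
fprintf('trapezoidal %.5f, Simpson %.5f, exact 1\n', tTrap, tSimp);
```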
These one-dimensional numerical integration rules provide some insight
into how to achieve lower variance in Monte Carlo integration. They illustrate
some options for increasing accuracy over simple random sampling. We may
either vary the weights attached to the individual points or vary the points
(the $U_i$) themselves or both. Notice that as long as the $U_i$ individually have
distributions that are uniform[0, 1], we can introduce any degree of dependence among them in order to come closer to the equal spacings characteristic
of numerical integrals. Even if the $U_i$ are dependent U[0, 1], an estimator
of the form
$$\frac{1}{n}\sum_{i=1}^n f(U_i)$$
will continue to be an unbiased estimator because each of the summands
continues to satisfy $E(f(U_i)) = \theta$. Ideally, if we introduce dependence among
the various $U_i$ and the expected value remains unchanged, we would wish
that the variance
$$\mathrm{var}\left(\frac{1}{n}\sum_{i=1}^n f(U_i)\right)$$
is reduced relative to the case of independent uniforms. The simplest case of this idea is the
use of antithetic random variables.
Antithetic Random Numbers
Consider first the simple case of n = 2 function evaluations at possibly
dependent points. Then the estimator is
$$\hat\theta = \frac{1}{2}\{f(U_1) + f(U_2)\}$$
with expected value $\theta = \int_0^1 f(u)\,du$ and variance given by
$$\mathrm{var}(\hat\theta) = \frac{1}{2}\{\mathrm{var}(f(U_1)) + \mathrm{cov}[f(U_1), f(U_2)]\}$$
assuming both $U_1, U_2$ are uniform[0, 1]. In the independent case the covariance term disappears and we obtain the variance of the crude Monte Carlo
estimator
$$\frac{1}{2}\mathrm{var}(f(U_1))$$
Notice, however, that if we are able to introduce a negative covariance, the
resulting variance of $\hat\theta$ will be smaller than that of the corresponding crude
Monte Carlo estimator, so the question is how to generate this negative
covariance. Suppose, for example, that f is monotone (increasing or decreasing). Then $f(1 - U_1)$ decreases whenever $f(U_1)$ increases, so that substituting $U_2 = 1 - U_1$ has the desired effect and produces a negative covariance (in
fact, we will show later that we cannot do any better when the function f is
monotone). Such a choice of $U_2 = 1 - U_1$, which helps reduce the variability
in $f(U_1)$, is termed an antithetic variate. In our example, because the function to be integrated is monotone, there is a negative correlation between
$f(U_1)$ and $f(1 - U_1)$ and
$$\frac{1}{2}\{\mathrm{var}(f(U_1)) + \mathrm{cov}[f(U_1), f(U_2)]\} < \frac{1}{2}\mathrm{var}(f(U_1))$$
That is, the variance is decreased over simple random sampling. Of course,
in practice our sample size is much greater than n = 2, but we still enjoy the
benefits of this argument if we generate the points in antithetic pairs. For
example, to determine the extent of the variance reduction using antithetic
random numbers, suppose we generate 500,000 uniform variates U and use
as well the values of 1 - U (for a total of 1,000,000 function evaluations).
F=(fn(u)+fn(1-u))/2;
This results in mean(F)=0.46186 and var(F)=0.1121. The variance
of the estimator is
$$\frac{0.1121}{\mathrm{length}(F)} = 2.24 \times 10^{-7}$$
which corresponds to a standard error of about $4.7 \times 10^{-4}$.
Since each of the 500,000 components of F is obtained from two function evaluations, the variance should be compared with a crude Monte Carlo estimator with 1,000,000 function evaluations, $\sigma^2_{crude}/1\,000\,000 = 4.36 \times 10^{-7}$.
The efficiency gain due to the use of antithetic random numbers is 4.36/2.24,
or about 2, so roughly half as many function evaluations using antithetic random numbers provide the same precision as a crude Monte Carlo estimator.
There is the additional advantage that only half as many uniform random
variables are required. The introduction of antithetic variates has had the
same effect on precision as increasing the sample size under crude Monte
Carlo by a factor of approximately 2.
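The efficiency comparison above can be reproduced with a short Matlab sketch. The integrand is restated as an anonymous function so that the block is self-contained, and the crude variance is re-estimated from a fresh sample with the same number of function evaluations; these are choices made for this illustration.

```matlab
% Efficiency of antithetic pairs relative to crude Monte Carlo, following (4.5).
rng(1);                                   % fix the seed for reproducibility
S0 = 10; K = 10; r = 0.05; sigma = 0.2; T = 0.25;
f = @(u) exp(-r*T)*max(S0*exp(norminv(u, r*T - sigma^2*T/2, sigma*sqrt(T))) - K, 0);
n = 500000;                               % number of antithetic pairs
u = rand(1, n);
F = (f(u) + f(1 - u))/2;                  % one value per pair (two evaluations each)
varAnti = var(F)/n;                       % variance of the antithetic estimator
varCrude = var(f(rand(1, 2*n)))/(2*n);    % crude estimator with 2n evaluations
efficiency = varCrude/varAnti;            % efficiency gain as defined in (4.5)
fprintf('antithetic estimate %.4f, efficiency gain %.2f\n', mean(F), efficiency);
```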
We have noted that antithetic random numbers improve the efficiency
whenever the function being integrated is monotone in u. What if it is not?
For example, suppose we use antithetic random numbers to integrate the
function $f(u) = u(1 - u)$ on the interval 0 < u < 1. Rather than balance
large values with small values and so reduce the variance of the estimator, in
this case notice that $f(U)$ and $f(1 - U)$ are strongly positively correlated, in
fact are equal, and so the argument supporting the use of antithetic random
numbers for monotone functions will show that in this case they increase
the variance over a crude estimator with the same number of function evaluations. Of course, this problem can be remedied if we can identify intervals in
which the function is monotone. In this case we could use antithetic random
numbers in the two intervals $[0, \frac{1}{2}]$ and $[\frac{1}{2}, 1]$; so, for example, we might
estimate $\int_0^1 f(u)\,du$ by an average of terms like
$$\frac{1}{4}\left\{f\left(\frac{U_1}{2}\right) + f\left(\frac{1 - U_1}{2}\right) + f\left(\frac{1 + U_2}{2}\right) + f\left(\frac{2 - U_2}{2}\right)\right\}$$
for independent U[0, 1] random variables $U_1, U_2$.
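A small Matlab experiment illustrates both points for $f(u) = u(1 - u)$. To compare schemes that use different numbers of function evaluations per term, the sketch multiplies each variance by the number of evaluations used; this normalization and the variable names are choices made for this illustration.

```matlab
% Antithetic variates for the non-monotone function u(1-u): whole interval
% versus antithetic pairs within [0,1/2] and [1/2,1].
rng(1);                               % fix the seed for reproducibility
g = @(u) u.*(1 - u);                  % integrand, with exact integral 1/6
n = 1e5;
U1 = rand(1, n); U2 = rand(1, n);
crude = (g(U1) + g(U2))/2;            % two independent evaluations per term
anti  = (g(U1) + g(1 - U1))/2;        % antithetic on [0,1]; equals g(U1) here
split = (g(U1/2) + g((1 - U1)/2) + g((1 + U2)/2) + g((2 - U2)/2))/4;   % four evaluations
% variance per term times number of evaluations, so smaller is better
vCrude = 2*var(crude); vAnti = 2*var(anti); vSplit = 4*var(split);
fprintf('crude %.5f, antithetic %.5f, split antithetic %.5f\n', vCrude, vAnti, vSplit);
fprintf('split estimate %.4f, exact %.4f\n', mean(split), 1/6);
```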
Stratified Sample
One of the reasons for the inaccuracy of the crude Monte Carlo estimator
in the above example is the large interval, evident in Figure 4.1, in which
the function is zero. Nevertheless, both crude and antithetic Monte Carlo
methods sample in that region, this portion of the sample contributing nothing to our integral. Naturally, we would prefer to concentrate our sample
in the region where the function is positive, and where the function is more
variable, use larger sample sizes. One method designed to achieve this objective is the use of a stratified sample. Once again, for a simple example we
choose n = 2 function evaluations, and with $V_1 \sim U[0, a]$ and $V_2 \sim U[a, 1]$
define an estimator
$$\hat\theta_{st} = a f(V_1) + (1 - a) f(V_2)$$
Note that this is a weighted average of the two function values with weights
a and 1 - a proportional to the length of the corresponding intervals. It is
easy to show once again that the estimator $\hat\theta_{st}$ is an unbiased estimator of $\theta$,
since
$$\begin{aligned} E(\hat\theta_{st}) &= a E f(V_1) + (1 - a) E f(V_2) \\ &= a \int_0^a f(x)\,\frac{1}{a}\,dx + (1 - a)\int_a^1 f(x)\,\frac{1}{1 - a}\,dx \\ &= \int_0^1 f(x)\,dx \end{aligned}$$
Moreover,
$$\mathrm{var}(\hat\theta_{st}) = a^2\,\mathrm{var}[f(V_1)] + (1 - a)^2\,\mathrm{var}[f(V_2)] + 2a(1 - a)\,\mathrm{cov}[f(V_1), f(V_2)] \tag{4.10}$$
Even when $V_1, V_2$ are independent, and so we obtain $\mathrm{var}(\hat\theta_{st}) = a^2\,\mathrm{var}[f(V_1)] + (1 - a)^2\,\mathrm{var}[f(V_2)]$, there may be a dramatic improvement in variance over
crude Monte Carlo provided that the variability of f in each of the intervals
[0, a] and [a, 1] is substantially less than that in the whole interval [0, 1].
Let us return to the call option example above, with f defined by (4.6).
Suppose for simplicity we choose independent values of $V_1, V_2$. In this case,
$$\mathrm{var}(\hat\theta_{st}) = a^2\,\mathrm{var}[f(V_1)] + (1 - a)^2\,\mathrm{var}[f(V_2)] \tag{4.11}$$
For example, for a = 0.7 this results in a variance of about 0.046, obtained
from
F=a*fn(a*rand(1,500000))+(1-a)*fn(a+(1-a)*rand(1,500000));
var(F)
and the variance of the sample mean of the components of the vector F is
var(F)/length(F), or around $9.2 \times 10^{-8}$. Since each component of the vector
above corresponds to two function evaluations, we should compare this
with a crude Monte Carlo estimator with n = 1,000,000 having variance
$\sigma_f^2 \times 10^{-6} = 4.36 \times 10^{-7}$. This corresponds to an efficiency gain of 43.6/9.2
or around 5. We can afford to use one-fifth the sample size by simply splitting
the sample into two strata. The improvement is somewhat limited by the fact
that we are still sampling in a region in which the function is 0 (although
now slightly less often).
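To see how the choice of the split point a affects the two-stratum estimator, the calculation can be repeated over a few values of a. The grid of values below and the variable names are assumptions made for this illustration; the crude variance 0.436 is the value recorded earlier in the chapter.

```matlab
% Two-stratum estimator a*f(V1) + (1-a)*f(V2) for several split points a.
rng(1);                                   % fix the seed for reproducibility
S0 = 10; K = 10; r = 0.05; sigma = 0.2; T = 0.25;
f = @(u) exp(-r*T)*max(S0*exp(norminv(u, r*T - sigma^2*T/2, sigma*sqrt(T))) - K, 0);
n = 500000;                               % number of (V1, V2) pairs
aGrid = [0.5 0.6 0.7 0.8 0.9];            % illustrative split points
varF = zeros(size(aGrid));
for k = 1:numel(aGrid)
    a = aGrid(k);
    V1 = a*rand(1, n);                    % uniform on [0, a]
    V2 = a + (1 - a)*rand(1, n);          % uniform on [a, 1]
    F = a*f(V1) + (1 - a)*f(V2);          % two function evaluations per component
    varF(k) = var(F);
    eff = (0.436/(2*n))/(varF(k)/n);      % efficiency relative to crude, as in (4.5)
    fprintf('a = %.1f: var(F) = %.4f, efficiency %.2f\n', a, varF(k), eff);
end
```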
A general stratified sample estimator is constructed as follows. We subdivide the interval [0, 1] into convenient subintervals $0 = x_0 < x_1 < \cdots < x_k = 1$, and then select $n_i$ random variables uniform on the corresponding
interval, $V_{ij} \sim U[x_{i-1}, x_i]$, $j = 1, 2, \ldots, n_i$. Then the estimator of $\theta$ is
$$\hat\theta_{st} = \sum_{i=1}^k (x_i - x_{i-1})\,\frac{1}{n_i}\sum_{j=1}^{n_i} f(V_{ij}) \tag{4.12}$$
Once again, the weights $(x_i - x_{i-1})$ on the average of the function in the ith
interval are proportional to the lengths of these intervals, and the estimator
$\hat\theta_{st}$ is unbiased:
$$\begin{aligned} E(\hat\theta_{st}) &= \sum_{i=1}^k (x_i - x_{i-1})\, E\left[\frac{1}{n_i}\sum_{j=1}^{n_i} f(V_{ij})\right] \\ &= \sum_{i=1}^k (x_i - x_{i-1})\, E f(V_{i1}) \\ &= \sum_{i=1}^k (x_i - x_{i-1}) \int_{x_{i-1}}^{x_i} f(x)\,\frac{1}{x_i - x_{i-1}}\,dx \\ &= \int_0^1 f(x)\,dx = \theta \end{aligned}$$
In the case that all of the $V_{ij}$ are independent, the variance is given by
$$\mathrm{var}(\hat\theta_{st}) = \sum_{i=1}^k (x_i - x_{i-1})^2\,\frac{1}{n_i}\,\mathrm{var}[f(V_{i1})] \tag{4.13}$$
Again, if we choose our intervals so that the variation within intervals $\mathrm{var}[f(V_{i1})]$ is small, this provides a substantial improvement over crude Monte
Carlo. Suppose we wish to choose the sample sizes so as to minimize this
variance. Obviously, to avoid infinite sample sizes and to keep a ceiling on
costs, we need to impose a constraint on the total sample size, say
$$\sum_{i=1}^k n_i = n \tag{4.14}$$
If we treat the parameters $n_i$ as continuous variables, we can use the method
of Lagrange multipliers to solve
$$\min_{\{n_i\}} \sum_{i=1}^k (x_i - x_{i-1})^2\,\frac{1}{n_i}\,\mathrm{var}[f(V_{i1})]$$
subject to the constraint (4.14). It is easy to show that the optimal choice of
sample sizes within intervals is
$$n_i \propto (x_i - x_{i-1})\sqrt{\mathrm{var}[f(V_{i1})]}$$
or more precisely that
$$n_i = n\,\frac{(x_i - x_{i-1})\sqrt{\mathrm{var}[f(V_{i1})]}}{\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\mathrm{var}[f(V_{j1})]}} \tag{4.15}$$
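The Lagrange multiplier step behind (4.15) can be written out as follows. The shorthand $c_i = (x_i - x_{i-1})^2\,\mathrm{var}[f(V_{i1})]$ is introduced here only for brevity and is not notation from the text.

```latex
% Shorthand (introduced here): c_i = (x_i - x_{i-1})^2 \, \mathrm{var}[f(V_{i1})].
\begin{align*}
L(n_1,\ldots,n_k,\lambda) &= \sum_{i=1}^k \frac{c_i}{n_i}
    + \lambda\Big(\sum_{i=1}^k n_i - n\Big), \\
\frac{\partial L}{\partial n_i} = -\frac{c_i}{n_i^2} + \lambda = 0
  &\;\Longrightarrow\; n_i = \sqrt{c_i/\lambda}
    \;\propto\; (x_i - x_{i-1})\sqrt{\mathrm{var}[f(V_{i1})]}, \\
\sum_{i=1}^k n_i = n
  &\;\Longrightarrow\; n_i = n\,\frac{\sqrt{c_i}}{\sum_{j=1}^k \sqrt{c_j}}, \\
\sum_{i=1}^k \frac{c_i}{n_i}
  &= \frac{1}{n}\Big(\sum_{j=1}^k \sqrt{c_j}\Big)^2 .
\end{align*}
```

The last line is the minimized value of the variance (4.13) under the constraint (4.14), obtained by substituting the optimal $n_i$ back into the objective.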
In practice, of course, this will not necessarily produce an integer value
of $n_i$, and so we are forced to round to the nearest integer. For this optimal
choice of sample size, the variance is now given by
$$\mathrm{var}(\hat\theta_{st}) = \frac{1}{n}\left\{\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\mathrm{var}[f(V_{j1})]}\right\}^2$$
The term $\sum_{j=1}^k (x_j - x_{j-1})\sqrt{\mathrm{var}[f(V_{j1})]}$ is a weighted average of the standard
deviation of the function f within the interval $(x_{i-1}, x_i)$, and it is clear that,
at least for a continuous function, these standard deviations can be made
small simply by choosing k large with $|x_i - x_{i-1}|$ small. In other words, if we
ignore the fact that the sample sizes must be integers, at least for a continuous
function f, we can achieve arbitrarily small $\mathrm{var}(\hat\theta_{st})$ using a fixed sample
size n simply by division into a very large number of (small) strata. The
intervals should be chosen so that the variances $\mathrm{var}[f(V_{i1})]$ are small, with $n_i \propto (x_i - x_{i-1})\sqrt{\mathrm{var}[f(V_{i1})]}$. In summary, optimal sample sizes are proportional
to the lengths of intervals times the standard deviation of the function evaluated at a uniform random variable on the interval. For sufficiently small
strata we can achieve arbitrarily small variances. The following function
was designed to accept the strata $x_1, x_2, \ldots, x_k$ and the desired sample size n
as input, and then determine optimal sample sizes and the stratified sample
estimator as follows:
1. Initially, samples of size 1000 are chosen from each stratum, and these
are used to estimate $\sqrt{\mathrm{var}[f(V_{i1})]}$.
2. Approximately optimal sample sizes $n_i$ are then calculated from (4.15).
3. Samples of size $n_i$ are then taken and the stratified sample estimator
(4.12), its variance (4.13), and the sample sizes $n_i$ are output.
function [est,v,n]=stratified(x,nsample)
% Optimal-sample-size stratified estimator for the call option price example.
% [est,v,n]=stratified([0 .6 .85 1],100000) uses three strata
% (0,.6), (.6,.85), (.85,1) and total sample size 100000.
% callopt2(u,S0,K,r,sigma,T) is assumed to return the integrand f(u) of (4.6).
est=0;
n=[];
m=length(x);
for i=1:m-1
% the preliminary sample of size 1000
v=var(callopt2(unifrnd(x(i),x(i+1),1,1000),10,10,.05,.2,.25));
n=[n (x(i+1)-x(i))*sqrt(v)];
end
% calculation of the optimal sample sizes, rounded down
n=floor(nsample*n/sum(n));
v=0;
for i=1:m-1
% evaluate the function f at n(i) uniform points in interval i
F=callopt2(unifrnd(x(i),x(i+1),1,n(i)),10,10,.05,.2,.25);
est=est+(x(i+1)-x(i))*mean(F);
v=v+var(F)*(x(i+1)-x(i))^2/n(i);
end
A call to [est,v,n]=stratified([0 .6 .85 1],100000), for example, generates a stratified sample with three strata [0, 0.6], (0.6, 0.85], and (0.85, 1], and outputs the estimate est = 0.4617, its variance $v = 3.5 \times 10^{-7}$, and the approximately optimal choice of sample sizes n = (26,855, 31,358, 41,785). To compare this with a crude Monte Carlo estimator, note that a total of 99,998 function evaluations are used, so the efficiency gain is $\sigma_f^2/(99{,}998 \times 3.5 \times 10^{-7}) = 12.8$. Evidently this stratified random sample can account for an improvement in efficiency of about a factor of 13. Of course, there is a little setup cost here (a preliminary sample of size 3000), which we have not included in our calculation, but the results of that preliminary sample could have been combined with the main sample for a very slight decrease in variance as well. For comparison, the function call
[est,v,n]=stratified([.47 .62 .75 .87 .96 1],1000000)
uses five strata, [0.47, 0.62], [0.62, 0.75], [0.75, 0.87], [0.87, 0.96], [0.96, 1], and gives a variance of the estimator of $7.4 \times 10^{-9}$. Since a crude sample of the same size has variance around $4.36 \times 10^{-7}$, the efficiency is about 170. This stratified sample is as good as a crude Monte Carlo estimator with 170 million simulations! By introducing more strata, we can increase this efficiency as much as we wish.
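The three steps of the stratified procedure can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the author's code: the function f below assumes that callopt2(u,10,10,.05,.2,.25) is the discounted Black-Scholes call payoff with initial price 10, strike 10, interest rate 0.05, volatility 0.2, and maturity 0.25, driven by a single uniform input, and the names f, stratified, and pilot are chosen here for illustration. Unlike the Matlab listing, the sketch forces at least two points per stratum so that a sample variance can always be computed.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of callopt2: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def stratified(x, nsample, pilot=1000, rng=random):
    """Stratified estimator with sample sizes proportional to width * std. dev."""
    lefts = x[:-1]
    widths = [b - a for a, b in zip(x[:-1], x[1:])]
    # Step 1: pilot sample in each stratum to estimate the standard deviation.
    sds = []
    for a, w in zip(lefts, widths):
        vals = [f(a + w * rng.random()) for _ in range(pilot)]
        sds.append(math.sqrt(variance(vals)))
    # Step 2: approximately optimal sample sizes, rounded down (minimum 2).
    weights = [w * s for w, s in zip(widths, sds)]
    total = sum(weights)
    n = [max(2, int(nsample * wt / total)) for wt in weights]
    # Step 3: main sample, estimator, and estimated variance of the estimator.
    est, var_est = 0.0, 0.0
    for a, w, ni in zip(lefts, widths, n):
        vals = [f(a + w * rng.random()) for _ in range(ni)]
        est += w * mean(vals)
        var_est += w ** 2 * variance(vals) / ni
    return est, var_est, n


rng = random.Random(1)
est, v, n = stratified([0.0, 0.6, 0.85, 1.0], 20000, rng=rng)
print(est, v, n)
```

The smaller total sample size (20,000 rather than 100,000) only keeps the pure-Python run short; the estimated variance scales roughly in inverse proportion to the total sample size.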
Within a stratified random sample we may also introduce antithetic variates designed to provide negative covariance. For example, we may use antithetic pairs within an interval if we believe that the function is monotone in the interval, or if we believe that the function is increasing across adjacent strata, we can introduce antithetic pairs between two intervals. For example, we may generate $U \sim$ uniform[0, 1] and then sample the point $V_{ij} = x_{i-1} + (x_i - x_{i-1})U$ from the interval $(x_{i-1}, x_i)$ as well as the point $V_{(i+1)j} = x_{i+1} - (x_{i+1} - x_i)U$ from the interval $(x_i, x_{i+1})$ to obtain antithetic pairs between intervals. For a simple example of this applied to the above call option valuation, consider the estimator based on three strata [0, 0.47], [0.47, 0.84], and [0.84, 1]. Here we have not bothered to sample to the left of 0.47 since the function is 0 there, so the sample size here is set to 0. Then using antithetic random numbers within each of the two strata [0.47, 0.84] and [0.84, 1], and $U \sim$ uniform[0, 1], we obtain the estimator
$$\hat\theta_{str,ant} = \frac{0.37}{2}\,[f(0.47 + 0.37U) + f(0.84 - 0.37U)] + \frac{0.16}{2}\,[f(0.84 + 0.16U) + f(1 - 0.16U)]$$
To assess this estimator, we evaluated, for U a vector of 1,000,000 uniform variates,
U=rand(1,1000000);
F=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
mean(F) % gives 0.4615
var(F)/length(F) % gives 1.46e-9
This should be compared with the crude Monte Carlo estimator having the same number, $n = 4 \times 10^6$, of function evaluations as are used for the components of the vector F: $\sigma^2_{crude}/(4 \times 10^6) = 1.117 \times 10^{-7}$. The gain in efficiency is therefore 1.117/0.0146, or approximately 77. The above stratified antithetic simulation with 1,000,000 input variates and 4,000,000 function evaluations is equivalent to a crude Monte Carlo simulation with sample size of 308 million! Variance reduction makes the difference between a simulation that is feasible on a laptop and one that would require a very long time on a mainframe computer: the stratified antithetic simulation above took approximately 58 seconds to run on a Pentium IV 2.2 GHz laptop.
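The same stratified antithetic estimator can be sketched in Python with the standard library only. As an assumption made for this illustration, f is taken to be the discounted Black-Scholes call payoff (initial price 10, strike 10, rate 0.05, volatility 0.2, maturity 0.25) written as a function of one uniform input, which is what fn appears to compute; the name strat_antithetic is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of fn: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def strat_antithetic(u):
    """One replicate: antithetic pairs inside [0.47, 0.84] and [0.84, 1]."""
    low = 0.37 * 0.5 * (f(0.47 + 0.37 * u) + f(0.84 - 0.37 * u))
    high = 0.16 * 0.5 * (f(0.84 + 0.16 * u) + f(1.0 - 0.16 * u))
    return low + high


rng = random.Random(2)
vals = [strat_antithetic(rng.random()) for _ in range(50000)]
est = mean(vals)
per_input_var = variance(vals)      # variance of one replicate
var_of_mean = per_input_var / len(vals)
print(est, var_of_mean)
```

Each replicate uses four function evaluations, so a fair comparison with crude Monte Carlo divides the crude variance per evaluation by four before forming the efficiency ratio.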
Control Variates
There are two techniques that permit using knowledge about a function with shape similar to that of $f$. First, we consider the use of a control variate, based on the trivial identity
$$\int f(u)\,du = \int g(u)\,du + \int (f(u) - g(u))\,du \tag{4.16}$$
for an arbitrary function $g(u)$. Assume that the integral of $g$ is known, so we can substitute its known value for the first term above. The second integral, we assume, is more difficult and we estimate it by crude Monte Carlo, resulting in the estimator
$$\hat\theta_{cv} = \int g(u)\,du + \frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)] \tag{4.17}$$
This estimator is clearly unbiased and has variance
$$\operatorname{var}(\hat\theta_{cv}) = \operatorname{var}\left\{\frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)]\right\} = \frac{\operatorname{var}[f(U) - g(U)]}{n}$$
so the variance is reduced over that of a crude Monte Carlo estimator having the same sample size $n$ by a factor
$$\frac{\operatorname{var}[f(U)]}{\operatorname{var}[f(U) - g(U)]} \quad \text{for } U \sim U[0, 1] \tag{4.18}$$
Let us return to the example of pricing a call option. By some experimentation, which could involve a preliminary crude simulation or simply evaluating the function at various points, we discovered that the function
$$g(u) = 6[(u - 0.47)^+]^2 + (u - 0.47)^+$$
provided a reasonable approximation to the function $f(u)$. The two functions are compared in Figure 4.4. Moreover, the integral $2(0.53)^3 + \frac{1}{2}(0.53)^2$ of the function $g(\cdot)$ is easy to obtain.
It is obvious from the figure that, since $f(u) - g(u)$ is generally much smaller and less variable than $f(u)$, $\operatorname{var}[f(U) - g(U)] < \operatorname{var}(f(U))$. The variance of the crude Monte Carlo estimator is determined by the variability in the function $f(u)$ over its full range. The variance of the control variate estimator is determined by the variance of the difference between the two functions, which in this case is quite small.
FIGURE 4.4 Comparison of the Function f(u) and the Control Variate g(u)
We used the following Matlab functions, the first to generate the function $g(u)$ and the second to determine the efficiency gain of the control variate estimator.
function g=GG(u)
% this is the function g(u), a control variate for fn(u)
u=max(0,u-.47);
g=6*u.^2+u;

function [est,var1,var2]=control(f,g,intg,n)
% run using a statement like
% [est,var1,var2]=control('fn','GG',intg,n)
% runs a simulation on the function f using control variate g
% (both character strings) n times.
% intg is the integral of g(u) over [0,1]
% outputs estimator est and variances var1,var2,
% the variances without and with the control variate, respectively.
U=unifrnd(0,1,1,n);
FN=eval(strcat(f,'(U)')); % evaluates f(u) for vector u
CN=eval(strcat(g,'(U)')); % evaluates g(u)
est=intg+mean(FN-CN);
var1=var(FN);
var2=var(FN-CN);
Then the call [est,var1,var2]=control('fn','GG',2*(.53)^3+(.53)^2/2,1000000) yields the estimate 0.4616 and variance = $1.46 \times 10^{-8}$, for an efficiency gain over crude Monte Carlo of around 30.
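A minimal Python sketch of the control variate estimator (4.17) follows, using only the standard library. It assumes, for illustration, that f is the discounted Black-Scholes call payoff (initial price 10, strike 10, rate 0.05, volatility 0.2, maturity 0.25) as a function of one uniform input; the names control and INT_G are chosen here and are not taken from the text.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of fn: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def g(u):
    """Control variate g(u) = 6[(u - 0.47)^+]^2 + (u - 0.47)^+."""
    d = max(0.0, u - 0.47)
    return 6.0 * d * d + d


# Known integral of g over [0, 1]: 2(0.53)^3 + (0.53)^2 / 2.
INT_G = 2.0 * 0.53 ** 3 + 0.53 ** 2 / 2.0


def control(n, rng):
    """Return the estimate, var[f(U)], and var[f(U) - g(U)]."""
    fs, diffs = [], []
    for _ in range(n):
        u = rng.random()
        fu = f(u)
        fs.append(fu)
        diffs.append(fu - g(u))
    return INT_G + mean(diffs), variance(fs), variance(diffs)


est, var1, var2 = control(20000, random.Random(3))
print(est, var1, var2, var1 / var2)  # the last value is the ratio (4.18)
```

The ratio var1 / var2 is the sample version of the efficiency factor (4.18); both estimators use one evaluation of f per uniform input, so no further adjustment is needed if the cost of g is ignored.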
This elementary form of control variate suggests using the estimator
$$\int g(u)\,du + \frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)]$$
but it may well be that $g(U)$ is not the best estimator we can imagine for $f(U)$. We can often find a linear function of $g(U)$ that is better by using regression. Since elementary regression yields
$$f(U) - E(f(U)) = \beta\,(g(U) - E(g(U))) + \varepsilon \tag{4.19}$$
where
$$\beta = \frac{\operatorname{cov}(f(U), g(U))}{\operatorname{var}(g(U))} \tag{4.20}$$
and the errors $\varepsilon$ have expectation 0, it follows that $E(f(U)) + \varepsilon = f(U) - \beta[g(U) - E(g(U))]$, and so $f(U) - \beta[g(U) - E(g(U))]$ is an unbiased estimator of $E(f(U))$. For a sample of $n$ uniform random numbers this becomes
$$\hat\theta_{cv} = \beta E(g(U)) + \frac{1}{n}\sum_{i=1}^{n}[f(U_i) - \beta g(U_i)] \tag{4.21}$$
Moreover, this estimator has the smallest variance among all linear combinations of $f(U)$ and $g(U)$. Note that when $\beta = 1$, (4.21) reduces to the simpler form of the control variate technique (4.17) discussed above. However, the regression form (4.21) with the optimal $\beta$ is generally better in terms of maximizing efficiency. Of course, in practice it is necessary to estimate the covariance and the variances in the definition of $\beta$ from the simulations themselves by evaluating $f$ and $g$ at many different uniform random variables $U_i$, $i = 1, 2, \dots, n$, and then estimating $\beta$ using the standard least squares estimator
$$\hat\beta = \frac{n\sum_{i=1}^{n} f(U_i)g(U_i) - \sum_{i=1}^{n} f(U_i)\sum_{i=1}^{n} g(U_i)}{n\sum_{i=1}^{n} g^2(U_i) - \left(\sum_{i=1}^{n} g(U_i)\right)^2}$$
Although in theory the substitution of an estimator $\hat\beta$ for the true value $\beta$ results in a small bias in the estimator, for large numbers of simulations $n$ our estimator $\hat\beta$ is so close to the true value that this bias can be disregarded.
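The regression-adjusted estimator (4.21) can be sketched in Python as follows, with the standard library only. The sketch assumes f is the discounted Black-Scholes call payoff (initial price 10, strike 10, rate 0.05, volatility 0.2, maturity 0.25) as a function of one uniform input, and it computes the least squares slope in the algebraically equivalent centered form, which is numerically steadier than the raw sums; the name regression_control is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of fn: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def g(u):
    """Control variate g(u) = 6[(u - 0.47)^+]^2 + (u - 0.47)^+."""
    d = max(0.0, u - 0.47)
    return 6.0 * d * d + d


INT_G = 2.0 * 0.53 ** 3 + 0.53 ** 2 / 2.0  # E[g(U)], known exactly


def regression_control(n, rng):
    """Estimator (4.21) with beta replaced by its least squares estimate."""
    us = [rng.random() for _ in range(n)]
    fs = [f(u) for u in us]
    gs = [g(u) for u in us]
    fbar, gbar = mean(fs), mean(gs)
    s_fg = sum((a - fbar) * (b - gbar) for a, b in zip(fs, gs))
    s_gg = sum((b - gbar) ** 2 for b in gs)
    beta = s_fg / s_gg
    est = beta * INT_G + mean(a - beta * b for a, b in zip(fs, gs))
    var_beta = variance([a - beta * b for a, b in zip(fs, gs)])
    var_one = variance([a - b for a, b in zip(fs, gs)])  # the beta = 1 case
    return est, beta, var_beta, var_one


est, beta, var_beta, var_one = regression_control(20000, random.Random(4))
print(est, beta, var_beta, var_one)
```

Because the least squares slope minimizes the sample variance of f(U_i) - b g(U_i) over all b, var_beta can never exceed var_one within the same sample.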
Importance Sampling
Another technique that is similar is importance sampling. Again we depend on having a reasonably simple function $g$ that, after multiplication by some constant, is similar to $f$. However, rather than attempt to minimize the difference $f(u) - g(u)$ between the two functions, we try to find $g(u)$ such that $f(u)/g(u)$ is nearly a constant. We also require that $g$ be nonnegative and be integrable so that, after rescaling the function, it integrates to 1 (i.e., it is a probability density function). Assume we can easily generate random variables from the probability density function $g(z)$. The distribution whose probability density function is $g(z)$, $z \in [0, 1]$, is the importance distribution. Note that if we generate a random variable $Z$ having the probability density function $g(z)$, $z \in [0, 1]$, then
$$\int_0^1 f(u)\,du = \int \frac{f(z)}{g(z)}\,g(z)\,dz = E\left[\frac{f(Z)}{g(Z)}\right] \tag{4.22}$$
This can therefore be estimated by generating independent random variables $Z_i$ with probability density function $g(z)$ and then setting
$$\hat\theta_{im} = \frac{1}{n}\sum_{i=1}^{n}\frac{f(Z_i)}{g(Z_i)} \tag{4.23}$$
Once again, according to (4.22), this is an unbiased estimator and the variance is
$$\operatorname{var}\{\hat\theta_{im}\} = \frac{1}{n}\operatorname{var}\left\{\frac{f(Z_1)}{g(Z_1)}\right\} \tag{4.24}$$
Returning to our example, we might consider using the same function as before for $g(u)$. However, it is not easy to generate variates from a density proportional to this function $g$ by inverse transform since this would require solving a cubic equation. Instead, let us consider something much simpler, the density function $g(u) = 2(0.53)^{-2}(u - 0.47)^+$, $0 < u < 1$, having cumulative distribution function $G(u) = (0.53)^{-2}[(u - 0.47)^+]^2$ and inverse cumulative distribution function $G^{-1}(u) = 0.47 + 0.53\sqrt{u}$. In this case we generate $Z_i$ using $Z_i = G^{-1}(U_i)$ for $U_i \sim$ uniform[0, 1]. The following function simulates an importance sample estimator:
function [est,v]=importance(f,g,Ginv,u)
% runs a simulation on the function f using importance density g
% (both character strings) and inverse c.d.f. Ginv
% outputs all estimators (should be averaged) and variance.
% IM is the inverse c.d.f. of the importance distribution evaluated at u
IM=eval(Ginv); % =.47+.53*sqrt(u);
% IMdens is the density of the importance sampling distribution at IM
IMdens=eval(g); % =2*(IM-.47)/(.53)^2;
FN=eval(strcat(f,'(IM)'));
est=FN./IMdens; % mean(est) provides the estimator
v=var(FN./IMdens)/length(IM);
% this is the variance of the averaged estimator mean(est)
The function was called with
[est,v]=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',rand(1,1000000));
giving an estimate mean(est) = 0.4616 with variance $1.28 \times 10^{-8}$ for an efficiency gain of around 35 over crude Monte Carlo.
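The importance sampling estimator (4.23) with the linear importance density can be sketched in Python with the standard library only. The sketch assumes f is the discounted Black-Scholes call payoff (initial price 10, strike 10, rate 0.05, volatility 0.2, maturity 0.25) as a function of one uniform input. It draws the uniform from (0, 1] rather than [0, 1) so that the density in the denominator is never exactly zero; that detail is a choice made here, not something stated in the text.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of fn: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def importance(n, rng):
    """Importance sampling with density g(z) = 2 (z - 0.47)^+ / 0.53^2."""
    ratios = []
    for _ in range(n):
        u = 1.0 - rng.random()               # uniform on (0, 1]
        z = 0.47 + 0.53 * math.sqrt(u)       # Z = G^{-1}(U)
        dens = 2.0 * (z - 0.47) / 0.53 ** 2  # g(Z), strictly positive here
        ratios.append(f(z) / dens)
    return mean(ratios), variance(ratios) / n, variance(ratios)


est, var_of_mean, per_sample_var = importance(20000, random.Random(5))
print(est, var_of_mean)
```

The ratio f(z)/g(z) stays bounded near z = 0.47 because both functions vanish roughly linearly there, which is the practical reason this simple density works reasonably well.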
Example (Estimating quantiles using importance sampling) Suppose we are able to generate random variables $X$ from a probability density function of the form $f_\theta(x)$ and we wish to estimate a quantile such as VaR, that is, estimate $x_p$ such that
$$P_{\theta_0}(X \le x_p) = p$$
for a certain value $\theta_0$ of the parameter.
As a very simple example, suppose $S$ is the sum of 10 independent random variables having the exponential distribution with mean $\theta$ and $f_\theta(x_1, \dots, x_{10})$ is the joint probability density function of these 10 observations. Assume $\theta_0 = 1$ and $p = 0.999$ so that we seek an extreme quantile of the sum; that is, we want to determine $x_p$ such that $P_{\theta_0}(S \le x_p) = p$. The equation that we wish to solve for $x_p$ is
$$E_{\theta_0}\{I(S \le x_p)\} = p \tag{4.25}$$
The crudest estimator of this is obtained by generating a large number of independent observations of $S$ under the parameter value $\theta_0 = 1$ and finding the $p$th quantile (i.e., by defining the empirical c.d.f.). We generate independent random vectors $X_i = (X_{i1}, \dots, X_{i10})$ from the probability density $f_{\theta_0}(x_1, \dots, x_{10})$, and with $S_i = \sum_{j=1}^{10} X_{ij}$ define
$$\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} I(S_i \le x) \tag{4.26}$$
Invert it (possibly with interpolation) to estimate the quantile
$$\hat x_p = \hat F^{-1}(p) \tag{4.27}$$
If the true cumulative distribution function is differentiable, the variance of this quantile estimator is asymptotically related to the variance of our estimator of the cumulative distribution function,
$$\operatorname{var}(\hat x_p) \approx \frac{\operatorname{var}(\hat F(x_p))}{(F'(x_p))^2}$$
so any variance reduction in the estimator of the c.d.f. is reflected, at least asymptotically, in a variance reduction in the estimator of the quantile.
Rather than generate the sample $(X_{i1}, \dots, X_{i10})$ as independent observations having parameter value $\theta_0$, we could generate them using a different parameter value $\theta$ and then replace $\hat F(x)$ in (4.27) by the importance sampling empirical c.d.f.
$$\hat F_I(x) = \frac{1}{n}\sum_{i=1}^{n} W_i\, I(S_i \le x) \tag{4.28}$$
where
$$W_i = \frac{f_{\theta_0}(X_{i1}, \dots, X_{i10})}{f_\theta(X_{i1}, \dots, X_{i10})}$$
and once again solve for $\hat x_p$. Ideally, we should choose the value of $\theta$ so that the variance of $\hat x_p$ or of $W_i I(S_i \le x_p)$ is as small as possible. This requires a wise guess or experimentation with various choices of $\theta$. For a given $\theta$ we have another choice of empirical cumulative distribution function,
$$\hat F_{I2}(x) = \frac{1}{\sum_{i=1}^{n} W_i}\sum_{i=1}^{n} W_i\, I(S_i \le x) \tag{4.29}$$
Both of these provide fairly crude estimates of the sample quantiles when observations are weighted, and, as one does with the sample median, one could easily interpolate between adjacent values around the value of $x_p$.
The alternative (4.29) is motivated by the fact that the values $W_i$ appear as weights attached to the observations $S_i$, and it therefore seems reasonable to divide by the sum of the weights. In fact, the expected value of the denominator is
$$E\left[\sum_{i=1}^{n} W_i\right] = n$$
so the two denominators are similar. In the example where the $X_{ij}$ are independent exponential($\theta_0 = 1$), let us examine the weight on $S_i$ determined by $X_i = (X_{i1}, \dots, X_{i10})$:
$$W_i = \frac{f_{\theta_0}(X_{i1}, \dots, X_{i10})}{f_\theta(X_{i1}, \dots, X_{i10})} = \prod_{j=1}^{10}\frac{\exp(-X_{ij})}{\theta^{-1}\exp(-X_{ij}/\theta)} = \theta^{10}\exp\{-S_i(1 - \theta^{-1})\}$$
The renormalized alternative (4.29) might be necessary for estimating extreme quantiles when the number of simulations is small, but only the first provides a completely unbiased estimating function. In our case, using (4.28) with $\theta = 2.5$, we obtained an estimator of $F(x_{0.999})$ with efficiency about 180 times that of a crude Monte Carlo simulation. There is some discussion of various renormalizations of the importance sampling weights in Hesterberg (1995).
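The quantile example can be sketched in Python with the standard library. One deliberate departure, made here as an assumption and not taken from the text: instead of weighting the indicator $I(S_i \le x)$ as in (4.28), the sketch weights the upper-tail indicator $I(S_i > x)$ and solves for the point where the weighted tail estimate reaches $1 - p$. The two targets agree in expectation because $E(W_i) = 1$, and the tail version only involves small weights. The helper gamma_tail uses the closed-form tail of a sum of m unit exponentials and is included only to check the answer.

```python
import math
import random


def is_quantile(p, theta, n, rng, m=10):
    """Importance sampling estimate of the p-quantile of a sum of m Exp(1)."""
    pairs = []
    for _ in range(n):
        # Sample each term from an exponential with mean theta (rate 1/theta).
        s = sum(rng.expovariate(1.0 / theta) for _ in range(m))
        # Likelihood ratio: theta^m * exp(-S (1 - 1/theta)).
        w = theta ** m * math.exp(-s * (1.0 - 1.0 / theta))
        pairs.append((s, w))
    pairs.sort(reverse=True)         # largest sums first
    tail = 0.0
    for s, w in pairs:
        tail += w / n                # weighted estimate of P(S > current s)
        if tail >= 1.0 - p:
            return s
    return pairs[-1][0]


def gamma_tail(x, m):
    """Exact P(S > x) for S a sum of m independent Exp(1) variables."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(m))


x_hat = is_quantile(0.999, 2.5, 20000, random.Random(6))
print(x_hat, gamma_tail(x_hat, 10))  # second value should be near 0.001
```

No interpolation between adjacent order statistics is attempted; adding it would follow the remark in the text about interpolating around the value of $x_p$.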
Importance Sampling, the Exponential Tilt,
and the Saddlepoint Approximation
In searching for a convenient importance distribution, particularly if we wish to increase or decrease the frequency of observations in the tails, it is quite common to embed a given density in an exponential family. For example, suppose we wish to estimate an integral
$$\int g(x)f(x)\,dx$$
where $f(x)$ is a probability density function. Suppose $K(s)$ denotes the cumulant-generating function (the logarithm of the moment-generating function) of the density $f(x)$:
$$\exp\{K(s)\} = \int e^{xs} f(x)\,dx$$
The cumulant-generating function is a useful summary of the moments of a distribution since the mean can be determined as $K'(0)$ and the variance as $K''(0)$. From this single probability density function, we can now produce a whole (exponential) family of densities
$$f_\theta(x) = e^{\theta x - K(\theta)} f(x) \tag{4.30}$$
of which $f(x)$ is a special case corresponding to $\theta = 0$. The density (4.30) is often referred to as an exponential tilt of the original density function, and it increases the weight in the right tail for $\theta > 0$ and decreases it for $\theta < 0$.
This family of densities is closely related to the saddlepoint approximation. If we wish to estimate the value of a probability density function $f(x)$ at a particular point $x$, note that this could be obtained from (4.30) if we knew the probability density function $f_\theta(x)$. On the other hand, a normal approximation to a density is often reasonable at or around its mode, particularly if we are interested in the density of a sum or an average of independent random variables. The cumulant-generating function of the density $f_\theta(x)$ is easily seen to be $K(\theta + s) - K(\theta)$, and the mean is therefore $K'(\theta)$. If we choose the parameter $\theta = \theta(x)$ so that
$$K'(\theta) = x \tag{4.31}$$
then the density $f_\theta$ has mean $x$ and variance $K''(\theta)$. How do we know for a given value of $x$ there exists a solution to (4.31)? From the properties of cumulant-generating functions, $K(t)$ is convex increasing and $K(0) = 0$. This implies that as $t$ increases, the slope of the cumulant-generating function $K'(t)$ is nondecreasing. It therefore approaches a limit $x_{max}$ (finite or infinite) as $t \to \infty$, and as long as we restrict the value of $x$ in (4.31) to the interval $x < x_{max}$ we can find a solution. The value of the $N(x, K''(\theta))$ density at the value $x$ is
$$f_\theta(x) \approx \frac{1}{\sqrt{2\pi K''(\theta)}}$$
and therefore the approximation to the density $f(x)$ is
$$f(x) \approx \frac{1}{\sqrt{2\pi K''(\theta)}}\,e^{K(\theta) - \theta x} \tag{4.32}$$
where $\theta = \theta(x)$ satisfies $K'(\theta) = x$. This is the saddlepoint approximation, discovered by Daniels (1954, 1980), and usually applied to the distribution of sums or averages of independent random variables because then the normal approximation is better motivated. Indeed, the saddlepoint approximation to the distribution of the sum of $n$ independent identically distributed random variables is accurate to order $O(n^{-1})$, and if we renormalize it to integrate to 1, accuracy to order $O(n^{-3/2})$ is possible, substantially better than the order $O(n^{-1/2})$ of the usual normal approximation.
Consider, for example, the saddlepoint approximation to the gamma($\alpha$, 1) distribution. Because the moment-generating function of the gamma($\alpha$, 1) distribution is
$$m(t) = \frac{1}{(1 - t)^\alpha}, \quad t < 1$$
the cumulant-generating function is
$$K(t) = \ln(m(t)) = -\alpha\ln(1 - t)$$
$K'(\theta) = x$ implies $\theta(x) = 1 - \alpha/x$, and
$$K''(\theta) = \frac{\alpha}{(1 - \theta)^2} \quad \text{so that} \quad K''(\theta(x)) = \frac{x^2}{\alpha}$$
Therefore, the saddlepoint approximation to the probability density function is
$$f(x) \approx \frac{1}{\sqrt{2\pi x^2/\alpha}}\exp\{-\alpha\ln(\alpha/x) - x + \alpha\} = \frac{1}{\sqrt{2\pi}}\,\alpha^{1/2 - \alpha}e^{\alpha}\,x^{\alpha - 1}\exp(-x)$$
This is exactly the gamma density function with Stirling's approximation replacing $\Gamma(\alpha)$, and after renormalization this is exactly the gamma density function.
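The claim about Stirling's approximation can be checked numerically. The short Python sketch below (standard library only; the function names are chosen here for illustration) evaluates the saddlepoint formula (4.32) for the gamma($\alpha$, 1) case and compares it with the exact density: the ratio should not depend on $x$ and should equal $\Gamma(\alpha)$ divided by Stirling's approximation $\sqrt{2\pi/\alpha}\,\alpha^{\alpha}e^{-\alpha}$.

```python
import math


def saddlepoint_gamma(x, alpha):
    """Saddlepoint approximation (4.32) for the gamma(alpha, 1) density."""
    theta = 1.0 - alpha / x                  # solves K'(theta) = x
    k = -alpha * math.log(1.0 - theta)       # K(theta)
    k2 = alpha / (1.0 - theta) ** 2          # K''(theta) = x^2 / alpha
    return math.exp(k - theta * x) / math.sqrt(2.0 * math.pi * k2)


def gamma_pdf(x, alpha):
    """Exact gamma(alpha, 1) density."""
    return x ** (alpha - 1.0) * math.exp(-x) / math.gamma(alpha)


def stirling(alpha):
    """Stirling's approximation to Gamma(alpha)."""
    return math.sqrt(2.0 * math.pi / alpha) * alpha ** alpha * math.exp(-alpha)


def ratio(x, alpha):
    return saddlepoint_gamma(x, alpha) / gamma_pdf(x, alpha)


for x in (1.0, 5.0, 20.0):
    print(x, ratio(x, 3.0), math.gamma(3.0) / stirling(3.0))
```

Since the ratio is constant in $x$, dividing the approximation by its integral recovers the gamma density exactly, which is the renormalization statement in the text.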
Since it is often computationally expensive to generate random variables whose distribution is a convolution of known densities, it is interesting to ask whether (4.32) makes this any easier. In many cases the saddlepoint approximation can be used to generate a random variable whose distribution is close to this convolution with high efficiency. For example, suppose we wish to generate the random variable $S_n = \sum_{i=1}^{n} X_i$ where each $X_i$ has the noncentral chi-squared distribution with cumulant-generating function
$$K(t) = -\frac{p}{2}\ln(1 - 2t) + \frac{2\lambda t}{1 - 2t} \tag{4.33}$$
The parameter $\lambda$ is the noncentrality parameter of the distribution, and $p$ is the number of degrees of freedom. Notice that the cumulant-generating function of the sum takes the same form but with $(\lambda, p)$ replaced by $(n\lambda, np)$, so in effect we wish to generate a random variable with cumulant-generating function (4.33) for large values of the parameters $(\lambda, p)$. Instead we generate from the saddlepoint approximation (4.32) to this distribution, and in fact we do this indirectly. If we change variables in (4.32) to determine the density of the new random variable $\tilde\theta$ that solves the equation
$$K'(\tilde\theta) = X$$
then the saddlepoint approximation (4.32) is equivalent to specifying a probability density for this variable,
$$f_{\tilde\theta}(\theta) = f(K'(\theta))\left|\frac{dx}{d\theta}\right| = \text{constant} \times \sqrt{K''(\theta)}\,e^{K(\theta) - \theta K'(\theta)} \tag{4.34}$$
In general, this probability density function can often be bounded above by some density over the range of possible values of $\theta$, allowing us to generate $\tilde\theta$ by acceptance-rejection. Then the value of the random variable is $X = K'(\tilde\theta)$. In the particular case of the noncentral chi-squared example above, we may take the dominating density to be the $U[0, \frac{1}{2}]$ density since (4.34) is bounded.
Combining Monte Carlo Estimators
We have now seen a number of different variance reduction techniques, and many more are possible. With many of these methods, such as importance and stratified sampling, are associated parameters that may be chosen in different ways. The variance formula may be used as a basis of choosing a best method, but these variances and efficiencies must also be estimated from the simulation, and it is rarely clear a priori which sampling procedure and estimator is best. For example, if a function $f$ is monotone on [0, 1] then an antithetic variate can be introduced with an estimator of the form
$$\hat\theta_{a1} = \frac{1}{2}[f(U) + f(1 - U)], \quad U \sim U[0, 1] \tag{4.35}$$
but if the function is increasing to a maximum somewhere around $\frac{1}{2}$ and then decreasing thereafter, we might prefer
$$\hat\theta_{a2} = \frac{1}{4}[f(U/2) + f((1 - U)/2) + f((1 + U)/2) + f(1 - U/2)] \tag{4.36}$$
Notice that any weighted average of these two unbiased estimators of $\theta$ would also provide an unbiased estimator of $\theta$. The large number of potential variance reduction techniques is an embarrassment of riches. Which variance reduction method should we use, and how will we know whether it is better than the competitors? Fortunately, the answer is often to use all of the methods (within reason, of course); choosing a single method is often neither necessary nor desirable. Rather, it is preferable to use a weighted average of the available estimators with the optimal choice of the weights provided by regression.
Suppose in general that we have $k$ estimators or statistics $\hat\theta_i$, $i = 1, \dots, k$, all unbiased estimators of the same parameter $\theta$ so that $E(\hat\theta_i) = \theta$ for all $i$. In vector notation, letting $\Theta^t = (\hat\theta_1, \dots, \hat\theta_k)$, we write $E(\Theta) = \mathbf{1}\theta$, where $\mathbf{1}$ is the $k$-dimensional column vector of 1s so that $\mathbf{1}^t = (1, 1, \dots, 1)$. Let us suppose for the moment that we know the variance-covariance matrix $V$ of the vector $\Theta$, defined by
$$V_{ij} = \operatorname{cov}(\hat\theta_i, \hat\theta_j)$$
Theorem 19 (Best linear combinations of estimators) The linear combination of the $\hat\theta_i$ that provides an unbiased estimator of $\theta$ and has minimum variance among all linear unbiased estimators is
$$\hat\theta_{blc} = \sum_i b_i\hat\theta_i \tag{4.37}$$
where the vector $b^t = (b_1, \dots, b_k)$ is given by
$$b = (\mathbf{1}^t V^{-1}\mathbf{1})^{-1}\,V^{-1}\mathbf{1}$$
The variance of the resulting estimator is
$$\operatorname{var}(\hat\theta_{blc}) = b^t V b = 1/(\mathbf{1}^t V^{-1}\mathbf{1})$$
Proof The proof is straightforward. It is easy to see that for any linear combination (4.37) the variance of the estimator is
$$b^t V b$$
and we wish to minimize this quadratic form as a function of $b$ subject to the constraint that the coefficients add to 1, or
$$b^t\mathbf{1} = 1$$
Introducing the Lagrangian, we wish to set the derivatives with respect to the components $b_i$ equal to zero,
$$\frac{\partial}{\partial b}\{b^t V b + \lambda(b^t\mathbf{1} - 1)\} = 0 \quad \text{or} \quad 2Vb + \lambda\mathbf{1} = 0$$
$$b = \text{constant} \times V^{-1}\mathbf{1}$$
and upon requiring that the coefficients add to one, we discover the value of the constant above is $(\mathbf{1}^t V^{-1}\mathbf{1})^{-1}$. ∎
This theorem indicates that the ideal linear combination of estimators has coefficients proportional to the row sums of the inverse covariance matrix. Notably, the variance of a particular estimator $\hat\theta_i$ is an ingredient in that sum, but one of many. In practice, of course, we almost never know the variance-covariance matrix $V$ of a vector of estimators $\Theta$. However, when we do simulation evaluating these estimators using the same uniform input to each, we obtain independent replicated values of $\Theta$. This permits us to estimate the covariance matrix $V$, and since we typically conduct many simulations, this estimate can be very accurate. Let us suppose that we have $n$ simulated values of the vectors $\Theta$ and call these $\Theta_1, \dots, \Theta_n$. As usual, we estimate the covariance matrix $V$ using the sample covariance matrix
$$\hat V = \frac{1}{n - 1}\sum_{i=1}^{n}(\Theta_i - \bar\Theta)(\Theta_i - \bar\Theta)^t$$
where
$$\bar\Theta = \frac{1}{n}\sum_{i=1}^{n}\Theta_i$$
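Theorem 19 with an estimated covariance matrix can be sketched in pure Python. To keep the example short it combines only three estimators built from the same uniform input (an antithetic estimator on [0.47, 1], the control variate estimator, and the importance sampling estimator), and it solves $\hat V w = \mathbf{1}$ by Gaussian elimination instead of inverting the matrix. The function f is assumed to be the discounted Black-Scholes call payoff (initial price 10, strike 10, rate 0.05, volatility 0.2, maturity 0.25) as a function of one uniform input; all helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
INT_G = 2.0 * 0.53 ** 3 + 0.53 ** 2 / 2.0  # known integral of g


def f(u, s0=10.0, strike=10.0, r=0.05, sigma=0.2, t=0.25):
    """Assumed form of fn: discounted call payoff driven by one uniform u."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    z = _STD_NORMAL.inv_cdf(u)
    s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
    return math.exp(-r * t) * max(s_t - strike, 0.0)


def g(u):
    d = max(0.0, u - 0.47)
    return 6.0 * d * d + d


def replicate(u):
    """Three unbiased estimators computed from the same uniform u in (0, 1]."""
    t1 = 0.265 * (f(0.47 + 0.53 * u) + f(1.0 - 0.53 * u))  # antithetic
    t4 = INT_G + f(u) - g(u)                                # control variate
    z = 0.47 + 0.53 * math.sqrt(u)
    t5 = f(z) / (2.0 * (z - 0.47) / 0.53 ** 2)              # importance
    return [t1, t4, t5]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            fac = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= fac * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def best_linear_combination(rows):
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    w = solve(cov, [1.0] * k)      # V^{-1} 1
    s = sum(w)                     # 1^t V^{-1} 1
    b = [wi / s for wi in w]
    est = sum(bi * mi for bi, mi in zip(b, means))
    return est, 1.0 / s, b, cov


rng = random.Random(8)
rows = [replicate(1.0 - rng.random()) for _ in range(20000)]
est, v_opt, b, cov = best_linear_combination(rows)
print(est, v_opt, b)
```

Within the sample, the combined variance 1/s cannot exceed any diagonal entry of the estimated covariance matrix, because putting all the weight on a single estimator is one of the feasible choices of b.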
Let us return to the example and attempt to find the best combination of the many estimators we have considered so far. To this end, let
$$\hat\theta_1 = \frac{0.53}{2}[f(0.47 + 0.53U) + f(1 - 0.53U)] \quad \text{(an antithetic estimator)}$$
$$\hat\theta_2 = \frac{0.37}{2}[f(0.47 + 0.37U) + f(0.84 - 0.37U)] + \frac{0.16}{2}[f(0.84 + 0.16U) + f(1 - 0.16U)]$$
$$\hat\theta_3 = 0.37[f(0.47 + 0.37U)] + 0.16[f(1 - 0.16U)] \quad \text{(stratified-antithetic)}$$
$$\hat\theta_4 = \int g(x)\,dx + [f(U) - g(U)] \quad \text{(control variate)}$$
$$\hat\theta_5 = \hat\theta_{im} \quad \text{(the importance sampling estimator (4.23))}$$
Then $\hat\theta_2$ and $\hat\theta_3$ are both stratified-antithetic estimators, $\hat\theta_4$ is a control variate estimator, and $\hat\theta_5$ is the importance sampling estimator discussed earlier, all obtained from a single input uniform random variate $U$. In order to determine the optimal linear combination, we need to generate simulated values of all five estimators using the same uniform random numbers as inputs. We determine the best linear combination of these estimators using
function [o,v,b,V]=optimal(U)
% generates optimal linear combination of five estimators and outputs
% average estimator, variance and weights
% input U a row vector of U[0,1] random numbers
T1=(.53/2)*(fn(.47+.53*U)+fn(1-.53*U));
T2=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
T3=.37*fn(.47+.37*U)+.16*fn(1-.16*U);
intg=2*(.53)^3+.53^2/2;
T4=intg+fn(U)-GG(U);
T5=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',U);
X=[T1' T2' T3' T4' T5'];
% columns of X are replications of the same estimator,
% rows are the five estimators computed from the same U
mean(X)
V=cov(X);
% this estimates the covariance matrix V
on=ones(5,1);
V1=inv(V);
% the inverse of the covariance matrix
b=V1*on/(on'*V1*on);
% vector of coefficients of the optimal linear combination
o=mean(X*b);
% average of the optimal linear combinations
v=1/(on'*V1*on);
% variance of the optimal linear combination based on a single U
One run of this estimator, called with [o,v,b,V]=optimal(rand(1,1000000)), yields
o = 0.4615
b' = [-0.5499 1.4478 0.1011 0.0491 -0.0481]
The estimate 0.4615 is accurate to at least four decimal places, which is not surprising since the variance per uniform random number input is $v = 1.13 \times 10^{-5}$. In other words, the variance of the mean based on 1,000,000 uniform inputs is $1.13 \times 10^{-11}$, and the standard error is around 0.000003, so we can expect accuracy to at least four decimal places. Note that some of the weights are negative and others are greater than 1. Do these negative weights indicate estimators that are worse than useless? The effect of some estimators may be, on subtraction, to render the remaining function more linear and more easily estimated using another method, and negative coefficients are quite common in regression generally. The efficiency gain over crude Monte Carlo is an extraordinary 40,000. However, since there are 10 function evaluations for each uniform variate input, the efficiency when we adjust for the number of function evaluations is 4000. This simulation using 1,000,000 uniform random numbers and taking 63 seconds on a Pentium IV (2.4 GHz) (including the time required to generate all five estimators) is equivalent to 40 billion simulations by crude Monte Carlo, a major task on a supercomputer!
If we intended to use this simulation method repeatedly, we might well wish to see whether some of the estimators can be omitted without too much loss of information. Since the variance of the optimal estimator is $1/(\mathbf{1}^t V^{-1}\mathbf{1})$, we might use this to attempt to select one of the estimators for deletion. Notice that it is not so much the covariance of the estimators $V$ that enters into Theorem 19 but its inverse $J = V^{-1}$, which we can consider a type of information matrix by analogy to maximum likelihood theory. For example, we could choose to delete the $i$th estimator, that is, delete the $i$th row and column of $V$, where $i$ is chosen to have the smallest effect on $1/(\mathbf{1}^t V^{-1}\mathbf{1})$ or its reciprocal $\mathbf{1}^t J\mathbf{1} = \sum_i\sum_j J_{ij}$. In particular, if we let $V_{(i)}$ be the matrix $V$ with the $i$th row and column deleted and $J_{(i)} = V_{(i)}^{-1}$, then we can identify $\mathbf{1}^t J\mathbf{1} - \mathbf{1}^t J_{(i)}\mathbf{1}$ as the loss of information when the $i$th estimator is deleted. Since not all estimators have the same number of function evaluations, we should adjust this information by $FE(i)$ = number of function evaluations required by the $i$th estimator. In other words, if an estimator $i$ is to be deleted, it should be the one corresponding to
$$\min_i\left\{\frac{\mathbf{1}^t J\mathbf{1} - \mathbf{1}^t J_{(i)}\mathbf{1}}{FE(i)}\right\}$$
We should drop this $i$th estimator if the minimum is less than the information per function evaluation in the combined estimator, because this means we will increase the information available in our simulation per function evaluation. In the above example with all five estimators included, $\mathbf{1}^t J\mathbf{1} = 88{,}757$ (with 10 function evaluations per uniform variate), so the information per function evaluation is 8876.
In this case, if we were to eliminate one of the estimators, our choice would likely be number 3 since it contributes the least information per function evaluation (see Table 4.1). However, since all contribute more than 8876 per function evaluation, we should likely retain all five.
Common Random Numbers
We now discuss another variance reduction technique, closely related to antithetic variates, called common random numbers, which is used, for example, whenever we wish to estimate the difference in performance between two systems or any other variable involving a difference, such as a slope of a function.
Example For a simple example, suppose we have two estimators $\hat\theta_1$, $\hat\theta_2$ of the center of a symmetric distribution. We would like to know which of these estimators is better in the sense that it has smaller variance when applied to a sample from a specific distribution symmetric about its median. If both estimators are unbiased estimators of the median, then the first estimator is better if
$$\operatorname{var}(\hat\theta_1) < \operatorname{var}(\hat\theta_2)$$
and so we are interested in estimating a quantity like
$$Eh_1(X) - Eh_2(X)$$
where $X$ is a vector representing a sample from the distribution and $h_1(X) = \hat\theta_1^2$, $h_2(X) = \hat\theta_2^2$. There are at least two ways of estimating these differences:
1. Generate samples and hence values of $h_1(X_i)$, $i = 1, \dots, n$, and $h_2(X_j)$, $j = 1, 2, \dots, m$, independently and use the estimator
$$\frac{1}{n}\sum_{i=1}^{n} h_1(X_i) - \frac{1}{m}\sum_{j=1}^{m} h_2(X_j)$$
2. Generate samples and hence values of $h_1(X_i)$, $h_2(X_i)$, $i = 1, \dots, n$, independently and use the estimator
$$\frac{1}{n}\sum_{i=1}^{n}(h_1(X_i) - h_2(X_i))$$
TABLE 4.1
i    1^t J 1 - 1^t J_(i) 1    FE(i)    [1^t J 1 - 1^t J_(i) 1] / FE(i)
1    88,048                   2        44,024
2    87,989                   4        21,997
3    28,017                   2        14,008
4    55,725                   1        55,725
5    32,323                   1        32,323

It seems intuitive that the second method is preferable since it removes the variability due to the particular sample from the comparison. Estimating the difference between two expected values is a common type of problem. For example, we may be considering investing in a new piece of equipment that will speed up processing at one node of a network and we wish to estimate the expected improvement in performance between the new system and the old. In general, suppose that we wish to estimate the difference between two expectations, say
$$Eh_1(X) - Eh_2(Y) \tag{4.38}$$
where the random variable or vector $X$ has cumulative distribution function $F_X$ and $Y$ has cumulative distribution function $F_Y$. Notice that the variance of a Monte Carlo estimator,
$$\operatorname{var}[h_1(X) - h_2(Y)] = \operatorname{var}[h_1(X)] + \operatorname{var}[h_2(Y)] - 2\operatorname{cov}\{h_1(X), h_2(Y)\} \tag{4.39}$$
is small if we can induce a high degree of positive correlation between the generated random variables $h_1(X)$ and $h_2(Y)$. This is precisely the opposite of the problem that led to antithetic random numbers, where we wished to induce a high degree of negative correlation. The following lemma is due to Hoeffding (1940) and provides a useful bound on the joint cumulative distribution function of two random variables $X$ and $Y$. Suppose $X$, $Y$ have cumulative distribution functions $F_X(x)$ and $F_Y(y)$, respectively, and joint cumulative distribution function $G(x, y) = P[X \le x, Y \le y]$.
Lemma 6
satises
(a) The joint cumulative distribution function G of (X Y ) always
(FX (x) + FY (y) 1)+ G(x y) min(FX (x) FY (y))
(4.40)
for all x y .
(b) Assume that FX and FY are continuous functions. In the case that
X = FX 1 (U ) and Y = FY 1 (U ) for U uniform on [0 1], equality is achieved
on the right: G(x y) = min(FX (x) FY (y)). In the case that X = FX 1 (U ) and
1
+
Y = FY (1 U ), there is equality on the left: (FX (x) + FY (y) 1) = G(x y).
Proof
(a) Note that
$$P[X \le x, Y \le y] \le P[X \le x]$$
and similarly
$$P[X \le x, Y \le y] \le P[Y \le y].$$
This shows that
$$G(x, y) \le \min(F_X(x), F_Y(y)),$$
verifying the right side of (4.40). Similarly, for the left side
$$\begin{aligned}
P[X \le x, Y \le y] &= P[X \le x] - P[X \le x, Y > y] \\
&\ge P[X \le x] - P[Y > y] \\
&= F_X(x) - (1 - F_Y(y)) \\
&= F_X(x) + F_Y(y) - 1.
\end{aligned}$$
Since it is also nonnegative, the left side follows.
(b) Suppose $X = F_X^{-1}(U)$ and $Y = F_Y^{-1}(U)$; then
$$\begin{aligned}
P[X \le x, Y \le y] &= P[F_X^{-1}(U) \le x, F_Y^{-1}(U) \le y] \\
&= P[U \le F_X(x), U \le F_Y(y)]
\end{aligned}$$
since $P[X = x] = 0$ and $P[Y = y] = 0$.
But
$$P[U \le F_X(x), U \le F_Y(y)] = \min(F_X(x), F_Y(y)),$$
verifying the equality on the right of (4.40) for common random numbers.
By a similar argument,
$$\begin{aligned}
P[F_X^{-1}(U) \le x, F_Y^{-1}(1 - U) \le y] &= P[U \le F_X(x), 1 - U \le F_Y(y)] \\
&= P[U \le F_X(x), U \ge 1 - F_Y(y)] \\
&= (F_X(x) - (1 - F_Y(y)))^+,
\end{aligned}$$
verifying the equality on the left.
∎
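A quick numerical check can make the lemma concrete. The sketch below is an illustration only: the two marginals (exponential with mean 1 for $X$, and $F_Y(y) = y^2$ on $[0, 1]$ for $Y$), the evaluation point, the sample size, and all function names are assumptions chosen for convenience, not taken from the text. It estimates $G(x, y)$ empirically under common, antithetic, and independent uniforms and compares the results with the two bounds in (4.40).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
u = rng.random(n)                      # uniform[0, 1) variates


def inv_fx(p):
    """Inverse cdf of the assumed X marginal: exponential with mean 1."""
    return -np.log1p(-p)


def inv_fy(p):
    """Inverse cdf of the assumed Y marginal: F_Y(y) = y**2 on [0, 1]."""
    return np.sqrt(p)


def joint_cdf(xs, ys, x, y):
    """Empirical estimate of G(x, y) = P[X <= x, Y <= y]."""
    return np.mean((xs <= x) & (ys <= y))


x0, y0 = 1.5, 0.8                      # arbitrary evaluation point (assumption)
fx0 = 1.0 - np.exp(-x0)                # F_X(x0)
fy0 = y0 ** 2                          # F_Y(y0)
upper = min(fx0, fy0)                  # right-hand bound in (4.40)
lower = max(fx0 + fy0 - 1.0, 0.0)      # left-hand bound in (4.40)

g_common = joint_cdf(inv_fx(u), inv_fy(u), x0, y0)        # Y built from U
g_anti = joint_cdf(inv_fx(u), inv_fy(1.0 - u), x0, y0)    # Y built from 1 - U
g_indep = joint_cdf(inv_fx(u), inv_fy(rng.random(n)), x0, y0)

print(f"lower bound {lower:.4f}   antithetic  {g_anti:.4f}")
print(f"upper bound {upper:.4f}   common      {g_common:.4f}")
print(f"independent {g_indep:.4f}")
```

Up to sampling error, the common-random-number estimate should sit on the upper bound, the antithetic estimate on the lower bound, and the independent estimate in between.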
The following theorem supports the use of common random numbers to maximize covariance and antithetic random numbers to minimize
covariance.
Theorem 20 (Maximum/minimum covariance) Suppose $h_1$ and $h_2$ are both
nondecreasing (or both nonincreasing) functions. Subject to the constraint
that $X, Y$ have cumulative distribution functions $F_X, F_Y$, respectively, the
covariance
$$\operatorname{cov}[h_1(X), h_2(Y)]$$
is maximized when $Y = F_Y^{-1}(U)$ and $X = F_X^{-1}(U)$ (i.e., for common
uniform $[0, 1]$ random numbers) and is minimized when $Y = F_Y^{-1}(U)$ and
$X = F_X^{-1}(1 - U)$ (i.e., for antithetic random numbers).
Proof We will sketch a proof of the theorem when the distributions are all
continuous and $h_1, h_2$ are differentiable. Define $G(x, y) = P[X \le x, Y \le y]$.
The following representation of covariance is useful: Define
$$\begin{aligned}
H(x, y) &= P(X > x, Y > y) - P(X > x)P(Y > y) \\
&= G(x, y) - F_X(x)F_Y(y).
\end{aligned} \tag{4.41}$$
Notice that, using integration by parts,
$$\begin{aligned}
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} H(x, y)\, h_1'(x) h_2'(y)\, dx\, dy
&= -\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \frac{\partial}{\partial x} H(x, y)\, h_1(x) h_2'(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \frac{\partial^2}{\partial x\, \partial y} H(x, y)\, h_1(x) h_2(y)\, dx\, dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h_1(x) h_2(y) g(x, y)\, dx\, dy - \int_{-\infty}^{\infty} h_1(x) f_X(x)\, dx \int_{-\infty}^{\infty} h_2(y) f_Y(y)\, dy \\
&= \operatorname{cov}(h_1(X), h_2(Y)),
\end{aligned} \tag{4.42}$$
where $g(x, y)$, $f_X(x)$, $f_Y(y)$ denote the joint probability density function, the
probability density function of $X$, and that of $Y$, respectively. In fact, this
result holds in general even without the assumption that the distributions
are continuous. The covariance between $h_1(X)$ and $h_2(Y)$, for $h_1$ and $h_2$
differentiable functions, is
$$\operatorname{cov}(h_1(X), h_2(Y)) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} H(x, y)\, h_1'(x) h_2'(y)\, dx\, dy.$$
The formula shows that to maximize the covariance, if $h_1, h_2$ are both increasing or both decreasing functions, it is sufficient to maximize $H(x, y)$ for
each $x, y$, since the product $h_1'(x) h_2'(y)$ is then nonnegative. Since we are constraining
the marginal cumulative distribution functions $F_X, F_Y$, this is equivalent to
maximizing $G(x, y)$ subject to the constraints
$$\lim_{y \to \infty} G(x, y) = F_X(x), \qquad \lim_{x \to \infty} G(x, y) = F_Y(y).$$
Lemma 6 shows that the maximum is achieved when common random numbers are used and the minimum achieved when we use antithetic random
numbers.
∎
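The effect of the theorem on the variance of a difference estimator can be seen in a small simulation. This is a minimal sketch under assumptions that are not in the text: $X$ and $Y$ are taken to be exponential with means 1 and 1.2, $h_1(x) = \sqrt{x}$ and $h_2(y) = y$ (both nondecreasing), and the names `inv_fx`, `inv_fy`, `h1`, `h2`, `sample_cov` are hypothetical helpers. It compares the sample covariance of $h_1(X)$ and $h_2(Y)$, and the sample variance of $h_1(X) - h_2(Y)$, under common, independent, and antithetic uniforms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Keep the uniforms strictly inside (0, 1) so both U and 1 - U can be inverted.
u = np.clip(rng.random(n), 1e-12, 1.0 - 1e-12)
v = np.clip(rng.random(n), 1e-12, 1.0 - 1e-12)   # independent second stream


def inv_fx(p):
    """Inverse cdf of X (assumed exponential, mean 1)."""
    return -np.log1p(-p)


def inv_fy(p):
    """Inverse cdf of Y (assumed exponential, mean 1.2)."""
    return -1.2 * np.log1p(-p)


def h1(x):
    return np.sqrt(x)      # nondecreasing (assumed choice)


def h2(y):
    return y               # nondecreasing (assumed choice)


def sample_cov(a, b):
    return np.cov(a, b)[0, 1]


schemes = {
    "common": (inv_fx(u), inv_fy(u)),            # X and Y from the same U
    "independent": (inv_fx(u), inv_fy(v)),       # X and Y from separate streams
    "antithetic": (inv_fx(1.0 - u), inv_fy(u)),  # X from 1 - U, Y from U
}

results = {}
for name, (x, y) in schemes.items():
    a, b = h1(x), h2(y)
    results[name] = (sample_cov(a, b), np.var(a - b, ddof=1))
    print(f"{name:12s} cov = {results[name][0]: .4f}   "
          f"var of difference = {results[name][1]:.4f}")
```

With these choices the covariance should be largest for common random numbers and smallest for antithetic ones, and the variance of the difference should be ordered the other way round, as (4.39) predicts.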
We can argue intuitively for the use of common random numbers in the
case of a discrete distribution with probability on the points indicated in
Figure 4.5.
[FIGURE 4.5 Changing Weights on Points to Maximize Covariance. Axes $x$ and $y$, each running from 0 to 1, with points labeled $P_4$ (upper left), $P_1$ (upper right), $P_3$ (lower left), and $P_2$ (lower right).]
This figure corresponds to a joint distribution with the following
probabilities, say
x       y       P[X = x, Y = y]
0       0       0.1
0.25    0.25    0.2
0.25    0.75    0.2
0.75    0.25    0.1
0.75    0.75    0.2
1       1       0.2
Suppose we wish to maximize $P[X > x, Y > y]$ subject to the constraint
that the probabilities $P[X > x]$ and $P[Y > y]$ are fixed. We have indicated
arbitrary fixed values of $(x, y)$ in the figure. Note that if there is any weight
attached to the point in the lower right quadrant (labeled $P_2$), some or all
of this weight can be reassigned to the point $P_3$ in the lower left quadrant
provided there is an equal movement of weight from the upper left $P_4$ to
the upper right $P_1$. Such a movement of weight will increase the value of
$G(x, y)$ without affecting $P[X \le x]$ or $P[Y \le y]$. The weight that we are
able to transfer in this example is 0.1, the minimum of the weights on $P_4$
and $P_2$. In general, this continues until there is no weight in one of the
off-diagonal quadrants for every choice of $(x, y)$. The resulting distribution
in this example is given by
x       y       P[X = x, Y = y]
0       0       0.1
0.25    0.25    0.3
0.25    0.75    0.1
0.75    0.25    0
0.75    0.75    0.3
1       1       0.2
and it is easy to see that such a joint distribution can be generated from
common random numbers $X = F_X^{-1}(U)$, $Y = F_Y^{-1}(U)$.
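The claim about common random numbers can be checked directly for this discrete example. The following sketch is illustrative only: the marginal probabilities are the row and column sums of the first table, and the helper names `inverse_cdf` and `comonotone_joint` are hypothetical. It applies the generalized inverse $F^{-1}(u) = \min\{x : F(x) \ge u\}$ to the same $U$ for both variables and accumulates the exact probability of each resulting pair.

```python
import numpy as np

values = np.array([0.0, 0.25, 0.75, 1.0])    # support shared by X and Y
px = np.array([0.1, 0.4, 0.3, 0.2])          # marginal of X (sums over y)
py = np.array([0.1, 0.3, 0.4, 0.2])          # marginal of Y (sums over x)


def inverse_cdf(u, probs):
    """Generalized inverse cdf: smallest support point with F(x) >= u."""
    cum = np.cumsum(probs)
    idx = np.searchsorted(cum, u, side="left")
    return values[np.minimum(idx, len(values) - 1)]   # guard against rounding


def comonotone_joint(px, py):
    """Exact joint distribution of (F_X^{-1}(U), F_Y^{-1}(U)) for uniform U."""
    cuts = np.concatenate(([0.0], np.cumsum(px), np.cumsum(py)))
    cuts = np.unique(np.round(cuts, 12))     # breakpoints of U in [0, 1]
    joint = {}
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mid = 0.5 * (lo + hi)                # any U in (lo, hi) gives the same pair
        point = (float(inverse_cdf(mid, px)), float(inverse_cdf(mid, py)))
        joint[point] = joint.get(point, 0.0) + float(hi - lo)
    return joint


joint = comonotone_joint(px, py)
for point, weight in sorted(joint.items()):
    print(point, round(weight, 10))
```

For these marginals the routine puts weight of about 0.1, 0.3, 0.1, 0.3 and 0.2 on the points (0, 0), (0.25, 0.25), (0.25, 0.75), (0.75, 0.75) and (1, 1), and none on the lower right point (0.75, 0.25).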
Conditioning
We now consider a simple but powerful generalization of control variates.
Suppose that we can decompose a random variable $T$ into two components
$T_1, \varepsilon$,
$$T = T_1 + \varepsilon, \tag{4.43}$$
so that $T_1, \varepsilon$ are uncorrelated:
$$\operatorname{cov}(T_1, \varepsilon) = 0.$$
Assume as well that $E(\varepsilon) = 0$. Regression is one method for determining such
a decomposition, and the error term in regression satisfies these conditions.
Then $T_1$ has the same mean as $T$ and it is easy to see that
$$\operatorname{var}(T) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon),$$
so $T_1$ has smaller variance than $T$ (unless $\varepsilon = 0$ with probability 1). Thus,
if we wish to estimate the common mean of $T$ or $T_1$, the estimator $T_1$ is
preferable, since it has the same mean with smaller variance.
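Both claims follow in one line each from the two assumptions. Writing $\varepsilon$ for the second component $T - T_1$, which has mean zero and is uncorrelated with $T_1$, the step can be spelled out as:

```latex
\begin{aligned}
E(T) &= E(T_1) + E(\varepsilon) = E(T_1), \\
\operatorname{var}(T) &= \operatorname{var}(T_1 + \varepsilon)
  = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)
    + 2\operatorname{cov}(T_1, \varepsilon) \\
 &= \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)
  \;\ge\; \operatorname{var}(T_1).
\end{aligned}
```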
One special case is variance reduction by conditioning. One common
definition of $E[X|Y]$ is the unique (with probability 1) function $g(y)$ of $Y$ that
minimizes $E\{X - g(Y)\}^2$. This definition applies only to random variables $X$
that have finite variance, and so this definition requires some modification
when $E(X^2) = \infty$, but we will assume here that all random variables under
consideration, say $X, Y, Z$, have finite variance. We can define conditional
covariance using conditional expectation as
$$\operatorname{cov}(X, Y|Z) = E[XY|Z] - E[X|Z]E[Y|Z]$$
and conditional variance as
$$\operatorname{var}(X|Z) = E(X^2|Z) - (E[X|Z])^2.$$
Variance reduction through conditioning is justified by the following well-known result.
Theorem 21
(a) $E(X) = E\{E[X|Y]\}$
(b) $\operatorname{cov}(X, Y) = E\{\operatorname{cov}(X, Y|Z)\} + \operatorname{cov}\{E[X|Z], E[Y|Z]\}$
(c) $\operatorname{var}(X) = E\{\operatorname{var}(X|Z)\} + \operatorname{var}\{E[X|Z]\}$
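Part (c) can be obtained from part (a) in a few lines. The following derivation is a sketch, assuming finite variances as above and applying part (a) with $Z$ as the conditioning variable, first to $X^2$ and then to $X$:

```latex
\begin{aligned}
\operatorname{var}(X) &= E(X^2) - (E X)^2 \\
 &= E\{E[X^2 | Z]\} - \bigl(E\{E[X | Z]\}\bigr)^2 \\
 &= E\{\operatorname{var}(X | Z) + (E[X | Z])^2\} - \bigl(E\{E[X | Z]\}\bigr)^2 \\
 &= E\{\operatorname{var}(X | Z)\} + \operatorname{var}\{E[X | Z]\}.
\end{aligned}
```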
This theorem is used as follows. Suppose we are considering a candidate
estimator $\hat{\theta}$, an unbiased estimator of $\theta$. We also have an arbitrary random
variable $Z$ that is somehow related to $\hat{\theta}$. Suppose that we have chosen $Z$ carefully
so that we are able to calculate the conditional expectation $T_1 = E[\hat{\theta}|Z]$.
Then by part (a) of the above theorem, $T_1$ is also an unbiased estimator of $\theta$.
Define
$$\varepsilon = \hat{\theta} - T_1.$$
By part (c),
$$\operatorname{var}(\hat{\theta}) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)$$
and $\operatorname{var}(T_1) = \operatorname{var}(\hat{\theta}) - \operatorname{var}(\varepsilon) < \operatorname{var}(\hat{\theta})$. In other words, for any variable
$Z$, $E[\hat{\theta}|Z]$ has the same expectation as does $\hat{\theta}$ but smaller variance, and the
decrease in variance is largest if $Z$ and $\hat{\theta}$ are nearly independent, because
in this case $E[\hat{\theta}|Z]$ is close to a constant and its variance close to zero. In
general, the search for an appropriate $Z$ so as to reduce the variance of an
estimator by conditioning requires searching for a random variable $Z$ such
that
1. The conditional expectation $E[\hat{\theta}|Z]$ of the original estimator is computable.
2. $\operatorname{var}(E[\hat{\theta}|Z])$ is substantially smaller than $\operatorname{var}(\hat{\theta})$.
Example (Hit or miss) Suppose we wish to estimate the area under a certain graph $f(x)$ by the hit-or-miss method. A crude method would involve
determining a multiple $c$ of a probability density function $g(x)$ that dominates $f(x)$ so that $cg(x) \ge f(x)$ for all $x$. We can generate points $(X, Y)$
at random and uniformly distributed under the graph of $cg(x)$ by generating $X$ by inverse transform $X = G^{-1}(U_1)$, where $G(x)$ is the cumulative
distribution function corresponding to density $g$, and then generating $Y$
from the uniform$[0, cg(X)]$ distribution, say $Y = cg(X)U_2$. An example with
$g(x) = 2x$, $0 < x < 1$, and $c = 1/4$ is given in Figure 4.6.
[FIGURE 4.6 Example of the Hit-or-Miss Method. Curves $cg(x)$ and $f(x)$ plotted for $0 \le x \le 1$, with a sample point $(X, Y)$ marked.]
The hit-or-miss estimator of the area under the graph of $f$ is obtained by
generating such random points $(X, Y)$ and counting the proportion that fall
under the graph of $f$, that is, for which $Y \le f(X)$. This proportion estimates
the probability
$$P[Y \le f(X)] = \frac{\text{Area under } f(x)}{\text{Area under } cg(x)} = \frac{\text{Area under } f(x)}{c}$$
since $g(x)$ is a probability density function. Notice that if we define
$$W = \begin{cases} c & \text{if } Y \le f(X) \\ 0 & \text{if } Y > f(X) \end{cases}$$
then
$$E(W) = c \times \frac{\text{Area under } f(x)}{\text{Area under } cg(x)} = \text{Area under } f(x),$$
so $W$ is an unbiased estimator of the parameter that we wish to estimate. We
might therefore estimate the area under $f(x)$ using a Monte Carlo estimator
$\hat{\theta}_{HM} = \frac{1}{n}\sum_{i=1}^{n} W_i$ based on independent values of $W_i$. This is the hit-or-miss
estimator. However, in this case it is easy to find a random variable $Z$
such that the conditional expectation $E(W|Z)$ can be determined in closed
form. In fact, we can choose $Z = X$, obtaining
$$E[W|X] = \frac{f(X)}{g(X)}.$$
This is therefore an unbiased estimator of the same parameter and it has
smaller variance than does $W$. For a sample of size $n$ we should replace the
crude estimator $\hat{\theta}_{cr}$ by the estimator
$$\hat{\theta}_{Cond} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{g(X_i)} = \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{2X_i}$$
with $X_i$ generated from $X_i = G^{-1}(U_i) = \sqrt{U_i}$, $i = 1, 2, \ldots, n$, and $U_i$
uniform[0, 1]. In this case, the conditional expectation results in a familiar
form for the estimator $\hat{\theta}_{Cond}$. This is simply an importance sampling estimator
with $g(x)$ the importance distribution. However, this derivation shows that
the estimator $\hat{\theta}_{Cond}$ has smaller variance than $\hat{\theta}_{HM}$.
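The comparison between the two estimators can be tried numerically. The sketch below uses $g(x) = 2x$ and $c = 1/4$ from the example, but the integrand is an assumption: the text does not give $f$, so a hypothetical $f(x) = x(1 - x)/2$ is used, which satisfies $f(x) \le cg(x)$ on $[0, 1]$ and has area $1/12$. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
c = 0.25


def g(x):
    return 2.0 * x                   # dominating density from the example


def f(x):
    return 0.5 * x * (1.0 - x)       # hypothetical integrand, f <= c * g


u1 = 1.0 - rng.random(n)             # uniform on (0, 1], so X is never 0
u2 = rng.random(n)
x = np.sqrt(u1)                      # X = G^{-1}(U1) with G(x) = x**2
y = c * g(x) * u2                    # Y uniform on [0, c g(X)]

w = np.where(y <= f(x), c, 0.0)      # hit-or-miss terms W_i
cond = f(x) / g(x)                   # conditional expectations E[W | X_i]

theta_hm, theta_cond = w.mean(), cond.mean()
var_hm, var_cond = w.var(ddof=1), cond.var(ddof=1)

print(f"hit-or-miss : estimate {theta_hm:.5f}, per-sample variance {var_hm:.5f}")
print(f"conditional : estimate {theta_cond:.5f}, per-sample variance {var_cond:.5f}")
```

Both estimates should be close to $1/12 \approx 0.0833$ for this choice of $f$, with the conditional version showing the smaller per-sample variance.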
PROBLEMS
1. Use both crude and antithetic random numbers to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\, du.$$
(a) What is the efficiency gain attributed to the use of antithetic random
numbers?
(b) How large a sample size would we need, using antithetic and crude
Monte Carlo, in order to estimate the above integral, correct to four
decimal places, with probability at least 95 percent?
2. Under what conditions on $f$ does the use of antithetic random numbers
completely correct for the variability of the Monte Carlo estimator (i.e.,
when is $\operatorname{var}(f(U) + f(1 - U)) = 0$)?
3. Suppose that $F(x)$ is the normal$(\mu, \sigma^2)$ cumulative distribution function.
Prove that $F^{-1}(1 - U) = 2\mu - F^{-1}(U)$ and therefore, if we use antithetic
random numbers to generate two normal random variables $X_1, X_2$ having mean $\mu$ and variance $\sigma^2$, this is equivalent to setting $X_2 = 2\mu - X_1$.
In other words, if we wish to use antithetic random numbers for normal variates, it is not necessary to generate the normal random variables
using the inverse transform method.
4. Show that the variance of a weighted average
$$\operatorname{var}(\alpha X + (1 - \alpha)W)$$
is minimized over $\alpha$ when
$$\alpha = \frac{\operatorname{var}(W) - \operatorname{cov}(X, W)}{\operatorname{var}(W) + \operatorname{var}(X) - 2\operatorname{cov}(X, W)}.$$
Determine the resulting minimum variance. What if the random variables $X, W$ are independent?
5. Use a stratified random sample to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\, du.$$
What do you recommend for choice of strata (two or three) and sample
sizes? What is the efficiency gain?
6. Use a combination of stratified random sampling and an antithetic random number in the form
$$\frac{1}{2}\left[f(U/2) + f(1 - U/2)\right]$$
to integrate the function
$$\int_0^1 \frac{e^u - 1}{e - 1}\, du.$$
What is the efficiency gain?
7. The second version of the control variate Monte Carlo estimator
$$\hat{\theta}_{cv} = \frac{1}{n}\sum_{i=1}^{n} \{f(U_i) - \beta[g(U_i) - E(g(U_i))]\},$$
an improved control variate estimator, is equivalent to the first version
in the case $\beta = 1$. In the case $f(x) = \frac{e^x - 1}{e - 1}$, consider using $g(x) = x$ as
a control variate to integrate over [0, 1]. Determine how much better
$\hat{\theta}_{cv}$ is than the basic control variate ($\beta = 1$) by performing simulations.
Show that the variance is reduced by a factor of approximately 60 over
crude Monte Carlo. Is there much additional improvement if we use a
more general quadratic function of $x$ for $g(x)$?
8. There is considerable evidence that portfolio returns are neither normally
nor lognormally distributed but have fatter tails than either distribution.
Suppose we approximate the distribution of trading losses using a random variable $X_i$ with probability density function
$$f(x) = \frac{2b^3}{\pi\,(b^2 + (x - \mu)^2)^2}.$$
This is the recentered and rescaled Student's t distribution with 3 degrees
of freedom. We are told that the parameters are determined by two observations from historical data, that the median daily loss is -$1,000
(i.e., a profit) and that the probability that the daily loss exceeds $5,000
is only 0.01. We wish to estimate a weekly value at risk, VaR_{0.95}, a value
$v$ such that $P[\sum_{i=1}^{5} X_i < v] = 0.95$. Since we do not know the distribution
of the sum of independent Student-distributed random variables $X_i$,
we may wish to do this by simulation. Suggest appropriate methods involving importance sampling, control variates, and stratified sampling.
Implement these methods and estimate the variance reduction achieved
by each. How do they compare with the variance reduction achieved
using the optimal linear combination?
9. Suppose three different simulation estimates $Y_1, Y_2, Y_3$ are all unbiased
estimators of the parameter $\theta$ and all with identical variances
$$\operatorname{var}(Y_i) = 1.$$
Assume that $\operatorname{cov}(Y_1, Y_2) = \operatorname{cov}(Y_1, Y_3) = 1/2$ and $\operatorname{cov}(Y_2, Y_3) = 0$.
In order to estimate the parameter $\theta$, should we use one of the estimators $Y_i$ or some linear combination of $Y_1, Y_2, Y_3$? Compare the number of
simulations necessary for a certain degree of accuracy if we use a single
estimator with that for a linear combination.
10. In the case $f(x) = \frac{e^x - 1}{e - 1}$, use $g(x) = x$ as a control variate to integrate
over [0, 1]. Find the optimal linear combination using estimators (4.35)
and (4.36), an importance sampling estimator, and the control variate
estimator above. What is the efficiency gain over crude Monte Carlo?