This preview shows page 1. Sign up to view the full content.
Unformatted text preview: Monte Carlo Methods with R: Monte Carlo Optimization [1] Chapter 5: Monte Carlo Optimization
“He invented a game that allowed players to predict the outcome?”
Susanna Gregory
To Kill or Cure This Chapter
◮ Two uses of computergenerated random variables to solve optimization problems.
◮ The ﬁrst use is to produce stochastic search techniques
⊲ To reach the maximum (or minimum) of a function
⊲ Avoid being trapped in local maxima (or minima)
⊲ Are suﬃciently attracted by the global maximum (or minimum).
◮ The second use of simulation is to approximate the function to be optimized. Monte Carlo Methods with R: Monte Carlo Optimization [2] Monte Carlo Optimization
Introduction ◮ Optimization problems can mostly be seen as one of two kinds:
⊲ Find the extrema of a function h(θ) over a domain Θ
⊲ Find the solution(s) to an implicit equation g (θ) = 0 over a domain Θ.
◮ The problems are exchangeable
⊲ The second one is a minimization problem for a function like h(θ) = g 2(θ)
⊲ while the ﬁrst one is equivalent to solving ∂h(θ)/∂θ = 0
◮ We only focus on the maximization problem Monte Carlo Methods with R: Monte Carlo Optimization [3] Monte Carlo Optimization
Deterministic or Stochastic ◮ Similar to integration, optimization can be deterministic or stochastic
◮ Deterministic: performance dependent on properties of the function
⊲ such as convexity, boundedness, and smoothness
◮ Stochastic (simulation)
⊲ Properties of h play a lesser role in simulationbased approaches.
◮ Therefore, if h is complex or Θ is irregular, chose the stochastic approach. Monte Carlo Methods with R: Monte Carlo Optimization [4] Monte Carlo Optimization
Numerical Optimization ◮ R has several embedded functions to solve optimization problems
⊲ The simplest one is optimize (one dimensional)
Example: Maximizing a Cauchy likelihood C (θ, 1)
◮ When maximizing the likelihood of a Cauchy C (θ, 1) sample,
n ℓ (θ  x 1 , . . . , x n ) =
i=1 1
,
2
1 + (x i − θ ) ◮ The sequence of maxima (MLEs) → θ∗ = 0 when n → ∞.
◮ But the journey is not a smooth one... Monte Carlo Methods with R: Monte Carlo Optimization [5] Monte Carlo Optimization
Cauchy Likelihood ◮ MLEs (left) at each sample size, n = 1, 500 , and plot of ﬁnal likelihood (right).
⊲ Why are the MLEs so wiggly?
⊲ The likelihood is not as wellbehaved as it seems Monte Carlo Methods with R: Monte Carlo Optimization [6] Monte Carlo Optimization
Cauchy Likelihood2 ◮ The likelihood ℓ(θx1, . . . , xn) = n
1
i=1 1+(xi −θ)2 ◮ Is like a polynomial of degree 2n
◮ The derivative has 2n zeros
◮ Hard to see if n = 500
◮ Here is n = 5
◮ R code Monte Carlo Methods with R: Monte Carlo Optimization [7] Monte Carlo Optimization
NewtonRaphson ◮ Similarly, nlm is a generic R function using the Newton–Raphson method
◮ Based on the recurrence relation
θi+1 ∂ 2h
= θi −
(θ i )
∂θ∂θT −1 ∂h
(θ i )
∂θ ◮ Where the matrix of the second derivatives is called the Hessian
⊲ This method is perfect when h is quadratic
⊲ But may also deteriorate when h is highly nonlinear
⊲ It also obviously depends on the starting point θ0 when h has several minima. Monte Carlo Methods with R: Monte Carlo Optimization [8] Monte Carlo Optimization
NewtonRaphson; Mixture Model Likelihood
1
◮ Bimodal Mixture Model Likelihood 4 N (µ1, 1) + 3 N (µ2, 1)
4 ◮ Sequences go to the closest mode
◮ Starting point (−1, −1) has a steep gradient
⊲ Bypasses the main mode (−0.68, 1.98)
⊲ Goes to other mode (lower likelihood) Monte Carlo Methods with R: Monte Carlo Optimization [9] Stochastic search
A Basic Solution ◮ A natural if rudimentary way of using simulation to ﬁnd maxθ h(θ)
⊲ Simulate points over Θ according to an arbitrary distribution f positive on Θ
⊲ Until a high value of h(θ) is observed ◮ Recall h(x) = [cos(50x) + sin(20x)]2
◮ Max=3.8325
◮ Histogram of 1000 runs Monte Carlo Methods with R: Monte Carlo Optimization [10] Stochastic search
Stochastic Gradient Methods ◮ Generating direct simulations from the target can be diﬃcult.
◮ Diﬀerent stochastic approach to maximization
⊲ Explore the surface in a local manner. ⊲ A Markov Chain ⊲ Can use θj +1 = θj + ǫj ⊲ The random component ǫj can be arbitrary ◮ Can also use features of the function: NewtonRaphson Variation
θj +1 = θj + αj ∇h(θj ) ,
⊲ Where ∇h(θj ) is the gradient
⊲ αj the step size αj > 0 , Monte Carlo Methods with R: Monte Carlo Optimization [11] Stochastic search
Stochastic Gradient Methods2 ◮ In diﬃcult problems
⊲ The gradient sequence will most likely get stuck in a local extremum of h.
◮ Stochastic Variation
∇h(θj ) ≈ h(θj + βj ζj ) − h(θj + βj ζj )
∆h(θj , βj ζj )
ζj =
ζj ,
2βj
2βj ⊲ (βj ) is a second decreasing sequence
⊲ ζj is uniform on the unit sphere ζ  = 1. ◮ We then use
θj +1 αj
∆h(θj , βj ζj ) ζj
= θj +
2βj Monte Carlo Methods with R: Monte Carlo Optimization [12] Stochastic Search
A Diﬃcult Minimization ◮ Many Local Minima
◮ Global Min at (0, 0)
◮ Code in the text Monte Carlo Methods with R: Monte Carlo Optimization [13] Stochastic Search
A Diﬃcult Minimization – 2
Scenario 1
αj
βj 2 3 4 1/ log(j + 1) 1/100 log(j + 1) 1/(j + 1) 1/(j + 1)
1/ log(j + 1).1 1/ log(j + 1).1
1/(j + 1).5 1/(j + 1).1 ◮ α ↓ 0 slowly, j αj = ∞ ◮ β ↓ 0 more slowly, 2
j (αj /βj ) <∞ ◮ Scenarios 12: Not enough energy
◮ Scenarios 34: Good Monte Carlo Methods with R: Monte Carlo Optimization [14] Simulated Annealing
Introduction ◮ This name is borrowed from Metallurgy:
◮ A metal manufactured by a slow decrease of temperature (annealing)
⊲ Is stronger than a metal manufactured by a fast decrease of temperature.
◮ The fundamental idea of simulated annealing methods
⊲ A change of scale, or temperature
⊲ Allows for faster moves on the surface of the function h to maximize.
⊲ Rescaling partially avoids the trapping attraction of local maxima.
◮ As T decreases toward 0, the values simulated from this distribution become
concentrated in a narrower and narrower neighborhood of the local maxima of h Monte Carlo Methods with R: Monte Carlo Optimization [15] Simulated Annealing
Metropolis Algorithm/Simulated Annealing • Simulation method proposed by Metropolis et al. (1953)
• Starting from θ0, ζ is generated from
ζ ∼ Uniform in a neighborhood of θ0.
• The new value of θ is generated as
θ1 = ζ with probability ρ = exp(∆h/T ) ∧ 1
θ0 with probability 1 − ρ, ◦ ∆ h = h (ζ ) − h (θ 0 )
◦ If h(ζ ) ≥ h(θ0), ζ is accepted
◦ If h(ζ ) < h(θ0), ζ may still be accepted
◦ This allows escape from local maxima Monte Carlo Methods with R: Monte Carlo Optimization [16] Simulated Annealing
Metropolis Algorithm  Comments • Simulated annealing typically modiﬁes the temperature T at each iteration
• It has the form
1. Simulate ζ from an instrumental distribution
with density g (ζ − θi);
2. Accept θi+1 = ζ with probability
ρi = exp{∆hi/Ti} ∧ 1;
take θi+1 = θi otherwise.
3. Update Ti to Ti+1. • All positive moves accepted
• As T ↓ 0
◦ Harder to accept downward moves
◦ No big downward moves
• Not a Markov Chain  diﬃcult to analyze Monte Carlo Methods with R: Monte Carlo Optimization [17] Simulated Annealing
Simple Example ◮ Trajectory: Ti = 1
(1+i)2 ◮ Log trajectory also works
◮ Can Guarantee Finding Global
Max
◮ R code Monte Carlo Methods with R: Monte Carlo Optimization [18] Simulated Annealing
Normal Mixture ◮ Previous normal mixture
◮ Most sequences ﬁnd max
◮ They visit both modes Monte Carlo Methods with R: Monte Carlo Optimization [19] Stochastic Approximation
Introduction ◮ We now consider methods that work with the objective function h
⊲ Rather than being concerned with fast exploration of the domain Θ.
◮ Unfortunately, the use of those methods results in an additional level of error
⊲ Due to this approximation of h.
◮ But, the objective function in many statistical problems can be expressed as
⊲ h(x) = E[H (x, Z )]
⊲ This is the setting of socalled missingdata models Monte Carlo Methods with R: Monte Carlo Optimization [20] Stochastic Approximation
Optimizing Monte Carlo Approximations ◮ If h(x) = E[H (x, Z )], a Monte Carlo approximation is
1
ˆ
h (x ) =
m m H (x, zi),
i=1 ⊲ Zi’s are generated from the conditional distribution f (z x).
◮ This approximation yields a convergent estimator of h(x) for every value of x
⊲ This is a pointwise convergent estimator
⊲ Its use in optimization setups is not recommended
⊲ Changing sample of Zi’s ⇒ unstable sequence of evaluations
⊲ And a rather noisy approximation to arg max h(x) Monte Carlo Methods with R: Monte Carlo Optimization [21] Stochastic Approximation
Bayesian Probit Example: Bayesian analysis of a simple probit model
◮ Y ∈ {0, 1} has a distribution depending on a covariate X :
Pθ (Y = 1X = x) = 1 − Pθ (Y = 0X = x) = Φ(θ0 + θ1x) ,
⊲ Illustrate with Pima.tr dataset, Y = diabetes indicator, X =BMI
◮ Typically infer from the marginal posterior
Φ(θ0 + θ1xn)yi Φ(−θ0 − θ1xn)1−yi dθ1 = arg max h(θ0 ) arg max
θ0 i=1 ⊲ For a ﬂat prior on θ and a sample (x1, . . . , xn). θ0 Monte Carlo Methods with R: Monte Carlo Optimization [22] Stochastic Approximation
Bayesian Probit – Importance Sampling ◮ No analytic expression for h
◮ The conditional distribution of θ1 given θ0 is also nonstandard
⊲ Use importance sampling with a t distribution with 5 df
⊲ Take µ = 0.1 and σ = 0.03 (MLEs)
◮ Importance Sampling Approximation
1
h0(θ0) =
M M
m
m
m
Φ(θ0 + θ1 xn)yi Φ(−θ0 − θ1 xn)1−yi t5(θ1 ; µ, σ )−1 ,
m=1 i=1 Monte Carlo Methods with R: Monte Carlo Optimization [23] Stochastic Approximation
Importance Sampling Evaluation ◮ Plotting this approximation of h with t samples simulated for each value of θ0
⊲ The maximization of the represented h function is not to be trusted as an
approximation to the maximization of h.
◮ But, if we use the same t sample for all values of θ0
⊲ We obtain a much smoother function
◮ We use importance sampling based on a single sample of Zi’s
⊲ Simulated from an importance function g (z ) for all values of x
⊲ Estimate h with 1
ˆ
hm(x) =
m m
i=1 f (zix)
H (x, zi).
g (zi) Monte Carlo Methods with R: Monte Carlo Optimization [24] Stochastic Approximation
Importance Sampling Likelihood Representation ◮ Top: 100 runs, diﬀerent samples
◮ Middle: 100 runs, same sample
◮ Bottom: averages over 100 runs ◮ The averages over 100 runs are the same  but we will not do 100 runs
◮ R code: Run pimax(25) from mcsm Monte Carlo Methods with R: Monte Carlo Optimization [25] Stochastic Approximation
Comments ◮ This approach is not absolutely foolproof
ˆ
⊲ The precision of hm(x) has no reason to be independent of x
⊲ The number m of simulations has to reﬂect the most varying case.
◮ As in every importance sampling experiment
⊲ The choice of the candidate g is inﬂuential
⊲ In obtaining a good (or a disastrous) approximation of h(x).
◮ Checking for the ﬁnite variance of the ratio f (zix)H (x, zi) g (zi)
⊲ Is a minimal requirement in the choice of g Monte Carlo Methods with R: Monte Carlo Optimization [26] MissingData Models and Demarginalization
Introduction ◮ Missing data models are special cases of the representation h(x) = E[H (x, Z )]
◮ These are models where the density of the observations can be expressed as
g (x  θ ) = f (x, z θ) dz .
Z ◮ This representation occurs in many statistical settings
⊲ Censoring models and mixtures
⊲ Latent variable models (tobit, probit, arch, stochastic volatility, etc.)
⊲ Genetics: Missing SNP calls Monte Carlo Methods with R: Monte Carlo Optimization [27] MissingData Models and Demarginalization
Mixture Model Example: Normal mixture model as a missingdata model
◮ Start with a sample (x1, . . . , xn)
◮ Introduce a vector (z1, . . . , zn) ∈ {1, 2}n such that
Pθ (Zi = 1) = 1 − Pθ (Zi = 2) = 1/4 , XiZi = z ∼ N (µz , 1) , ◮ The (observed) likelihood is then obtained as E[H (x, Z)] for
H (x, z) ∝ 1
exp −(xi − µ1)2/2
4
i; z =1
i 3
exp −(xi − µ2)2 /2 ,
4
i; z =2
i ◮ We recover the mixture model
1
3
N (µ1, 1) + N (µ2, 1)
4
4
⊲ As the marginal distribution of Xi. Monte Carlo Methods with R: Monte Carlo Optimization [28] MissingData Models and Demarginalization
Censored–Data Likelihood Example: Censored–data likelihood
◮ Censored data may come from experiments
⊲ Where some potential observations are replaced with a lower bound
⊲ Because they take too long to observe.
◮ Suppose that we observe Y1, . . ., Ym, iid, from f (y − θ)
⊲ And the (n − m) remaining (Ym+1, . . . , Yn) are censored at the threshold a.
◮ The corresponding likelihood function is
m L(θy) = [1 − F (a − θ)]n−m f (y i − θ ),
i=1 ⊲ F is the cdf associated with f Monte Carlo Methods with R: Monte Carlo Optimization [29] MissingData Models and Demarginalization
Recovering the Observed Data Likelihood ◮ If we had observed the last n − m values
⊲ Say z = (zm+1, . . . , zn), with zi ≥ a (i = m + 1, . . . , n),
⊲ We could have constructed the (complete data) likelihood
m Lc(θy, z) = n f (yi − θ)
i=1 f (zi − θ) .
i=m+1 ◮ Note that
Lc(θy, z)k (zy, θ) dz, L(θy) = E[Lc(θy, Z)] =
Z ⊲ Where k (zy, θ) is the density of the missing data
⊲ Conditional on the observed data
⊲ The product of the f (zi − θ)/[1 − F (a − θ)]’s
⊲ f (z − θ) restricted to (a, +∞). Monte Carlo Methods with R: Monte Carlo Optimization [30] MissingData Models and Demarginalization
Comments ◮ When we have the relationship
g (x  θ ) = f (x, z θ) dz .
Z ⊲ Z merely serves to simplify calculations
⊲ it does not necessarily have a speciﬁc meaning
◮ We have the completedata likelihood Lc(θx, z)) = f (x, zθ)
⊲ The likelihood we would obtain
⊲ Were we to observe (x, z),the complete data
◮ REMEMBER:
g (x  θ ) = f (x, z θ) dz .
Z Monte Carlo Methods with R: Monte Carlo Optimization [31] The EM Algorithm
Introduction ◮ The EM algorithm is a deterministic optimization technique
⊲ Dempster, Laird and Rubin 1977
◮ Takes advantage of the missing data representation
⊲ Builds a sequence of easier maximization problems
⊲ Whose limit is the answer to the original problem ◮ We assume that we observe X1, . . . , Xn ∼ g (xθ) that satisﬁes
g (x  θ ) = f ( x , z θ ) d z,
Z ˆ
⊲ And we want to compute θ = arg max L(θx) = arg max g (xθ). Monte Carlo Methods with R: Monte Carlo Optimization [32] The EM Algorithm
First Details ◮ With the relationship g (xθ) = Z f ( x , z θ ) d z, ⊲ ( X , Z ) ∼ f ( x , z θ )
◮ The conditional distribution of the missing data Z
⊲ Given the observed data x is
k (zθ, x) = f (x, zθ) g (xθ) .
◮ Taking the logarithm of this expression leads to the following relationship
log L(θx) = Eθ0 [log Lc(θx, Z)] − Eθ0 [log k (Zθ, x)],
Obs. Data Complete Data Missing Data ◮ Where the expectation is with respect to k (zθ0, x).
◮ In maximizing log L(θx), we can ignore the last term Monte Carlo Methods with R: Monte Carlo Optimization [33] The EM Algorithm
Iterations ◮ Denoting
Q(θθ0, x) = Eθ0 [log Lc(θx, Z)],
◮ EM algorithm indeed proceeds by maximizing Q(θθ0, x) at each iteration
ˆ
ˆ
ˆ
⊲ If θ(1) = argmaxQ(θθ0, x), θ(0) → θ(1)
ˆ
◮ Sequence of estimators {θ(j )}, where
ˆ
ˆ
θ(j ) = argmaxQ(θθ(j −1))
◮ This iterative scheme
⊲ Contains both an expectation step
⊲ And a maximization step
⊲ Giving the algorithm its name. Monte Carlo Methods with R: Monte Carlo Optimization [34] The EM Algorithm
The Algorithm
ˆ
Pick a starting value θ(0) and set m = 0.
Repeat
1. Compute (the Estep)
ˆ
Q(θθ(m), x) = Eθ [log Lc(θx, Z)] ,
ˆ
( m) ˆ
where the expectation is with respect to k (zθ(m), x).
ˆ
2. Maximize Q(θθ(m), x) in θ and take (the Mstep)
ˆ
ˆ
θ(m+1) = arg max Q(θθ(m), x)
θ and set m = m + 1
ˆ
ˆ
until a fixed point is reached; i.e., θ(m+1) = θ(m).fixed point Monte Carlo Methods with R: Monte Carlo Optimization [35] The EM Algorithm
Properties ◮ Jensen’s inequality ⇒ The likelihood increases at each step of the EM algorithm
ˆ
ˆ
L(θ(j +1)x) ≥ L(θ(j )x),
ˆ
ˆ
ˆˆ
⊲ Equality holding if and only if Q(θ(j +1)θ(j ), x) = Q(θ(j )θ(j ), x).
ˆ
◮ Every limit point of an EM sequence {θ(j )} is a stationary point of L(θx)
⊲ Not necessarily the maximum likelihood estimator
⊲ In practice, we run EM several times with diﬀerent starting points.
◮ Implementing the EM algorithm thus means being able to
(a) Compute the function Q(θ′θ, x)
(b) Maximize this function. Monte Carlo Methods with R: Monte Carlo Optimization [36] The EM Algorithm
Censored Data Example ◮ The completedata likelihood is
n m exp{−(zi − θ)2/2} , exp{−(yi − θ)2/2} Lc(θy, z) ∝ i=m+1 i=1 ◮ With expected completedata loglikelihood
1
Q (θ  θ 0 , y ) = −
2 m n 1
(y i − θ )2 −
Eθ0 [(Zi − θ)2] ,
2 i=m+1
i=1 ⊲ the Zi are distributed from a normal N (θ, 1) distribution truncated at a.
◮ Mstep (diﬀerentiating Q(θθ0, y) in θ and setting it equal to 0 gives
¯
ˆ = m y + (n − m )E θ ′ [ Z 1 ] .
θ
n
ϕ( a θ
⊲ With Eθ [Z1] = θ + 1−Φ(−−)θ) ,
a Monte Carlo Methods with R: Monte Carlo Optimization [37] The EM Algorithm
Censored Data MLEs ◮ EM sequence
ˆ(j )
ˆ
ˆ(j +1) = m y + n − m θ(j ) + ϕ(a − θ )
¯
θ
ˆ
n
n
1 − Φ(a − θ(j )) ◮ Climbing the Likelihood
◮ R code Monte Carlo Methods with R: Monte Carlo Optimization [38] The EM Algorithm
Normal Mixture ◮ Normal Mixture Bimodal Likelihood
1
Q(θ θ, x) = −
2 n ′ Eθ Zi(xi − µ1)2 + (1 − Zi)(xi − µ2)2 x .
i=1 Solving the Mstep then provides the closedform expressions
n n µ′1 Zi xi  x = Eθ i=1 i=1 and n n µ′2 (1 − Zi)xix = Eθ Zi  x Eθ i=1 Since
E θ [ Zi  x ] = (1 − Zi)x . Eθ
i=1 ϕ(xi − µ1)
,
ϕ(xi − µ1) + 3ϕ(xi − µ2) Monte Carlo Methods with R: Monte Carlo Optimization [39] The EM Algorithm
Normal Mixture MLEs ◮ EM ﬁve times with various starting points
◮ Two out of ﬁve sequences → higher mode
◮ Others → lower mode Monte Carlo Methods with R: Monte Carlo Optimization [40] Monte Carlo EM
Introduction ◮ If computation Q(θθ0, x) is diﬃcult, can use Monte Carlo
ˆ
◮ For Z1, . . . , ZT ∼ k (zx, θ(m)), maximize
ˆ (θ  θ 0 , x ) = 1
Q
T T log Lc(θx, zi)
i=1 ◮ Better: Use importance sampling
⊲ Since
g (x  θ )
f ( x , z θ )
= arg max log Eθ(0)
x,
arg max L(θx) = arg max log
θ
θ
θ
g (xθ(0))
f (x, zθ(0))
⊲ Use the approximation to the loglikelihood
1
log L(θx) ≈
T T
i=1 Lc(θx, zi)
,
c (θ  x , z )
L (0)
i Monte Carlo Methods with R: Monte Carlo Optimization [41] Monte Carlo EM
Genetics Data Example: Genetic linkage.
◮ A classic example of the EM algorithm
◮ Observations (x1, x2, x3, x4) are gathered from the multinomial distribution
1 θ1
1
θ
M n; + , (1 − θ), (1 − θ),
.
2 44
4
4 ◮ Estimation is easier if the x1 cell is split into two cells
⊲ We create the augmented model
1
θ
1θ1
(z1, z2, x2, x3, x4) ∼ M n; , , (1 − θ), (1 − θ),
244
4
4
with x1 = z1 + z2. ⊲ Completedata likelihood: θz2+x4 (1 − θ)x2+x3
⊲ Observeddata likelihood: (2 + θ)x1 θx4 (1 − θ)x2+x3 Monte Carlo Methods with R: Monte Carlo Optimization [42] Monte Carlo EM
Genetics Linkage Calculations ◮ The expected complete loglikelihood function is
Eθ0 [(Z2 + x4) log θ + (x2 + x3) log(1 − θ)] = θ0
x1 + x4 log θ + (x2 + x3) log(1 − θ),
2 + θ0 ⊲ which can easily be maximized in θ, leading to the EM step
ˆ
θ1 = θ 0 x1
2 + θ0 θ 0 x1
+ x2 + x3 + x4
2 + θ0 ◮ Monte Carlo EM: Replace the expectation with
⊲ zm = 1
m m
i=1 zi , zi ∼ B (x1, θ0/(2 + θ0)) ◮ The MCEM step would then be
zm
,
z m + x2 + x3 + x4
ˆ
which converges to θ1 as m grows to inﬁnity.
θ1 = . Monte Carlo Methods with R: Monte Carlo Optimization [43] Monte Carlo EM
Genetics Linkage MLEs ◮ Note variation in MCEM sequence
◮ Can control with ↑ simulations
◮ R code Monte Carlo Methods with R: Monte Carlo Optimization [44] Monte Carlo EM
Random eﬀect logit model Example: Random eﬀect logit model
◮ Random eﬀect logit model,
⊲ yij is distributed conditionally on one covariate xij as a logit model
exp {βxij + ui}
P (yij = 1xij , ui, β ) =
,
1 + exp {βxij + ui}
⊲ ui ∼ N (0, σ 2) is an unobserved random eﬀect.
⊲ (U1, . . . , Un) therefore corresponds to the missing data Z Monte Carlo Methods with R: Monte Carlo Optimization [45] Monte Carlo EM
Random eﬀect logit model likelihood ◮ For the complete data likelihood with θ = (β, σ ),
Q(θ′θ, x, y) = yij E[β ′xij + Uiβ, σ, x, y]
i,j E[log 1 + exp{β ′xij + Ui}β, σ, x, y] −
i,j E[Ui2β, σ, x, y]/2σ ′2 − n log σ ′ , −
i ⊲ it is impossible to compute the expectations in Ui.
◮ Were those available, the Mstep would be diﬃcult but feasible
◮ MCEM: Simulate the Ui’s conditional on β, σ, x, y from
exp
π (uiβ, σ, x, y) ∝
j j yij ui − u2/2σ 2
i [1 + exp {βxij + ui}] Monte Carlo Methods with R: Monte Carlo Optimization [46] Monte Carlo EM
Random eﬀect logit MLEs ◮ MCEM sequence
⊲ Increases the number of Monte Carlo steps
at each iteration
◮ MCEM algorithm
⊲ Does not have EM monotonicity property
◮ Top: Sequence of β ’s from the MCEM
algorithm
◮ Bottom: Sequence of completed likelihoods ...
View Full
Document
 Fall '11
 Womack

Click to edit the document details