Monte Carlo Methods with R: Monte Carlo Optimization [1]
Chapter 5: Monte Carlo Optimization
“He invented a game that allowed players to predict the outcome?”
Susanna Gregory, To Kill or Cure

This Chapter
◮ Two uses of computer-generated random variables to solve optimization problems.
◮ The first use is to produce stochastic search techniques that
⊲ reach the maximum (or minimum) of a function,
⊲ avoid being trapped in local maxima (or minima),
⊲ and are sufficiently attracted by the global maximum (or minimum).
◮ The second use of simulation is to approximate the function to be optimized.

Monte Carlo Methods with R: Monte Carlo Optimization [2]
Monte Carlo Optimization
Introduction
◮ Optimization problems can mostly be seen as one of two kinds:
⊲ Find the extrema of a function h(θ) over a domain Θ
⊲ Find the solution(s) of an implicit equation g(θ) = 0 over a domain Θ.
◮ The two problems are exchangeable:
⊲ the second is a minimization problem for a function such as h(θ) = g²(θ),
⊲ while the first is equivalent to solving ∂h(θ)/∂θ = 0.
◮ We focus only on the maximization problem.

Monte Carlo Methods with R: Monte Carlo Optimization [3]
Monte Carlo Optimization
Deterministic or Stochastic
◮ As with integration, optimization can be deterministic or stochastic.
◮ Deterministic: performance depends on analytical properties of the function
⊲ such as convexity, boundedness, and smoothness.
◮ Stochastic (simulation):
⊲ properties of h play a lesser role in simulation-based approaches.
◮ Therefore, if h is complex or Θ is irregular, choose the stochastic approach.

Monte Carlo Methods with R: Monte Carlo Optimization [4]
Monte Carlo Optimization
Numerical Optimization
◮ R has several built-in functions to solve optimization problems
⊲ The simplest one is optimize (one-dimensional).
Example: Maximizing a Cauchy likelihood C(θ, 1)
◮ When maximizing the likelihood of a Cauchy C(θ, 1) sample,
ℓ(θ | x1, …, xn) = ∏_{i=1}^{n} 1 / [1 + (xi − θ)²],
◮ the sequence of maxima (MLEs) → θ* = 0 as n → ∞.
◮ But the journey is not a smooth one...

Monte Carlo Methods with R: Monte Carlo Optimization [5]
Monte Carlo Optimization
Cauchy Likelihood
◮ MLEs (left) at each sample size, n = 1, …, 500, and plot of the final likelihood (right).
⊲ Why are the MLEs so wiggly?
⊲ The likelihood is not as well-behaved as it seems.

Monte Carlo Methods with R: Monte Carlo Optimization [6]
Monte Carlo Optimization
Cauchy Likelihood (2)
◮ The likelihood ℓ(θ | x1, …, xn) = ∏_{i=1}^{n} 1 / [1 + (xi − θ)²]
◮ is the reciprocal of a polynomial of degree 2n in θ
◮ so its derivative can have up to 2n zeros
◮ Hard to see when n = 500
◮ Here is n = 5
◮ R code

Monte Carlo Methods with R: Monte Carlo Optimization [7]
Monte Carlo Optimization
Newton–Raphson
◮ Similarly, nlm is a generic R function using the Newton–Raphson method
◮ Based on the recurrence relation
θ_{i+1} = θ_i − [∂²h/∂θ∂θᵀ (θ_i)]⁻¹ ∂h/∂θ (θ_i)
◮ where the matrix of second derivatives is called the Hessian
⊲ This method is perfect when h is quadratic
⊲ but may deteriorate when h is highly nonlinear
⊲ It also obviously depends on the starting point θ0 when h has several minima.
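As a quick illustration of the optimize and nlm calls mentioned above, here is a minimal R sketch (not the book's code; the sample size, search interval, seed, and starting value are arbitrary choices):

    set.seed(1)
    x <- rcauchy(500)                               # C(theta, 1) sample with theta = 0
    loglik <- function(theta) sum(-log(1 + (x - theta)^2))     # Cauchy log-likelihood
    optimize(loglik, interval = c(-10, 10), maximum = TRUE)$maximum
    nlm(function(theta) -loglik(theta), p = median(x))$estimate  # nlm minimizes, so negate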
Monte Carlo Methods with R: Monte Carlo Optimization [8]
Monte Carlo Optimization
Newton–Raphson: Mixture Model Likelihood
◮ Bimodal mixture model likelihood: (1/4) N(µ1, 1) + (3/4) N(µ2, 1)
◮ Newton–Raphson sequences go to the closest mode
◮ The starting point (−1, −1) has a steep gradient
⊲ It bypasses the main mode (−0.68, 1.98)
⊲ and goes to the other mode (lower likelihood)

Monte Carlo Methods with R: Monte Carlo Optimization [9]
Stochastic Search
A Basic Solution
◮ A natural, if rudimentary, way of using simulation to find max_θ h(θ):
⊲ simulate points over Θ according to an arbitrary distribution f positive on Θ
⊲ until a high value of h(θ) is observed.
◮ Recall h(x) = [cos(50x) + sin(20x)]²
◮ Max = 3.8325
◮ Histogram of 1000 runs

Monte Carlo Methods with R: Monte Carlo Optimization [10]
Stochastic Search
Stochastic Gradient Methods
◮ Generating direct simulations from the target can be difficult.
◮ A different stochastic approach to maximization:
⊲ explore the surface in a local manner,
⊲ as a Markov chain.
⊲ Can use θ_{j+1} = θ_j + ε_j
⊲ The random component ε_j can be arbitrary
◮ Can also use features of the function, as in this Newton–Raphson variation:
θ_{j+1} = θ_j + α_j ∇h(θ_j),   α_j > 0,
⊲ where ∇h(θ_j) is the gradient
⊲ and α_j the step size.

Monte Carlo Methods with R: Monte Carlo Optimization [11]
Stochastic Search
Stochastic Gradient Methods (2)
◮ In difficult problems
⊲ the gradient sequence will most likely get stuck in a local extremum of h.
◮ Stochastic variation:
∇h(θ_j) ≈ [h(θ_j + β_j ζ_j) − h(θ_j − β_j ζ_j)] / (2β_j) · ζ_j = [∆h(θ_j, β_j ζ_j) / (2β_j)] ζ_j,
⊲ where (β_j) is a second decreasing sequence
⊲ and ζ_j is uniform on the unit sphere ||ζ|| = 1.
◮ We then use
θ_{j+1} = θ_j + [α_j / (2β_j)] ∆h(θ_j, β_j ζ_j) ζ_j

Monte Carlo Methods with R: Monte Carlo Optimization [12]
Stochastic Search
A Difficult Minimization
◮ Many local minima
◮ Global minimum at (0, 0)
◮ Code in the text

Monte Carlo Methods with R: Monte Carlo Optimization [13]
Stochastic Search
A Difficult Minimization (2)

Scenario   α_j                 β_j
1          1/log(j + 1)        1/log(j + 1)^0.1
2          1/[100 log(j + 1)]  1/log(j + 1)^0.1
3          1/(j + 1)           1/(j + 1)^0.5
4          1/(j + 1)           1/(j + 1)^0.1

◮ α ↓ 0 slowly, Σ_j α_j = ∞
◮ β ↓ 0 more slowly, Σ_j (α_j/β_j)² < ∞
◮ Scenarios 1–2: not enough energy
◮ Scenarios 3–4: good

Monte Carlo Methods with R: Monte Carlo Optimization [14]
Simulated Annealing
Introduction
◮ The name is borrowed from metallurgy:
◮ a metal manufactured by a slow decrease of temperature (annealing)
⊲ is stronger than a metal manufactured by a fast decrease of temperature.
◮ The fundamental idea of simulated annealing methods:
⊲ a change of scale, or temperature,
⊲ allows for faster moves on the surface of the function h to maximize,
⊲ and the rescaling partially avoids the trapping attraction of local maxima.
◮ As T decreases toward 0, the values simulated from this distribution become concentrated in a narrower and narrower neighborhood of the local maxima of h.

Monte Carlo Methods with R: Monte Carlo Optimization [15]
Simulated Annealing
Metropolis Algorithm / Simulated Annealing
• Simulation method proposed by Metropolis et al. (1953)
• Starting from θ0, ζ is generated from ζ ∼ Uniform in a neighborhood of θ0.
• The new value of θ is generated as
θ1 = ζ with probability ρ = exp(∆h/T) ∧ 1,
θ1 = θ0 with probability 1 − ρ,
◦ where ∆h = h(ζ) − h(θ0)
◦ If h(ζ) ≥ h(θ0), ζ is accepted
◦ If h(ζ) < h(θ0), ζ may still be accepted
◦ This allows escape from local maxima
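Referring back to the "Basic Solution" slide, a minimal R sketch of the crude uniform search for h(x) = [cos(50x) + sin(20x)]² (assuming the domain [0, 1]; not the book's code):

    h <- function(x) (cos(50 * x) + sin(20 * x))^2
    set.seed(2)
    u <- runif(10^4)                    # uniform draws over [0, 1]
    u[which.max(h(u))]                  # crude argmax estimate
    max(h(u))                           # close to the maximum 3.8325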
Monte Carlo Methods with R: Monte Carlo Optimization [16]
Simulated Annealing
Metropolis Algorithm – Comments
• Simulated annealing typically modifies the temperature T at each iteration
• It has the form
1. Simulate ζ from an instrumental distribution with density g(|ζ − θ_i|);
2. Accept θ_{i+1} = ζ with probability ρ_i = exp{∆h_i/T_i} ∧ 1; take θ_{i+1} = θ_i otherwise;
3. Update T_i to T_{i+1}.
• All moves that increase h are accepted
• As T ↓ 0
◦ it becomes harder to accept downward moves
◦ and there are no big downward moves
• Not a (homogeneous) Markov chain – difficult to analyze

Monte Carlo Methods with R: Monte Carlo Optimization [17]
Simulated Annealing
Simple Example
◮ Trajectory: T_i = 1/(1 + i)²
◮ A logarithmic trajectory also works
◮ Can guarantee finding the global max
◮ R code

Monte Carlo Methods with R: Monte Carlo Optimization [18]
Simulated Annealing
Normal Mixture
◮ Previous normal mixture likelihood
◮ Most sequences find the maximum
◮ They visit both modes

Monte Carlo Methods with R: Monte Carlo Optimization [19]
Stochastic Approximation
Introduction
◮ We now consider methods that work with the objective function h
⊲ rather than being concerned with fast exploration of the domain Θ.
◮ Unfortunately, the use of those methods results in an additional level of error
⊲ due to the approximation of h they require.
◮ But the objective function in many statistical problems can be expressed as
⊲ h(x) = E[H(x, Z)]
⊲ This is the setting of so-called missing-data models.

Monte Carlo Methods with R: Monte Carlo Optimization [20]
Stochastic Approximation
Optimizing Monte Carlo Approximations
◮ If h(x) = E[H(x, Z)], a Monte Carlo approximation is
ĥ(x) = (1/m) Σ_{i=1}^{m} H(x, zi),
⊲ where the Zi’s are generated from the conditional distribution f(z|x).
◮ This approximation yields a convergent estimator of h(x) for every value of x
⊲ It is a pointwise convergent estimator
⊲ Its use in optimization setups is not recommended:
⊲ a changing sample of Zi’s ⇒ an unstable sequence of evaluations
⊲ and a rather noisy approximation to arg max h(x).

Monte Carlo Methods with R: Monte Carlo Optimization [21]
Stochastic Approximation
Bayesian Probit
Example: Bayesian analysis of a simple probit model
◮ Y ∈ {0, 1} has a distribution depending on a covariate X:
Pθ(Y = 1 | X = x) = 1 − Pθ(Y = 0 | X = x) = Φ(θ0 + θ1 x),
⊲ illustrated with the Pima.tr dataset, Y = diabetes indicator, X = BMI.
◮ Typically infer from the marginal posterior
arg max_{θ0} ∫ ∏_{i=1}^{n} Φ(θ0 + θ1 xi)^{yi} Φ(−θ0 − θ1 xi)^{1−yi} dθ1 = arg max_{θ0} h(θ0),
⊲ for a flat prior on θ and a sample (x1, …, xn).

Monte Carlo Methods with R: Monte Carlo Optimization [22]
Stochastic Approximation
Bayesian Probit – Importance Sampling
◮ No analytic expression for h
◮ The conditional distribution of θ1 given θ0 is also nonstandard
⊲ Use importance sampling with a t distribution with 5 df
⊲ Take µ = 0.1 and σ = 0.03 (the MLEs)
◮ Importance sampling approximation:
ĥ0(θ0) = (1/M) Σ_{m=1}^{M} ∏_{i=1}^{n} Φ(θ0 + θ1^m xi)^{yi} Φ(−θ0 − θ1^m xi)^{1−yi} t5(θ1^m; µ, σ)^{−1},
⊲ where the θ1^m are simulated from the t5(µ, σ) importance distribution.

Monte Carlo Methods with R: Monte Carlo Optimization [23]
Stochastic Approximation
Importance Sampling Evaluation
◮ Plotting this approximation of h with a new t sample simulated for each value of θ0:
⊲ the maximization of the represented h function is not to be trusted as an approximation to the maximization of h.
◮ But if we use the same t sample for all values of θ0
⊲ we obtain a much smoother function.
◮ We therefore use importance sampling based on a single sample of Zi’s
⊲ simulated from an importance function g(z), for all values of x,
⊲ and estimate h with
ĥm(x) = (1/m) Σ_{i=1}^{m} [f(zi|x) / g(zi)] H(x, zi).
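A rough R sketch of this single-sample importance-sampling approximation for the Pima.tr probit example (not the mcsm pimax code; M, the seed, and the θ0 grid are arbitrary choices):

    library(MASS)                                     # for the Pima.tr data
    y <- as.numeric(Pima.tr$type == "Yes"); x <- Pima.tr$bmi
    mu <- 0.1; sigma <- 0.03; M <- 500
    set.seed(3)
    theta1 <- mu + sigma * rt(M, df = 5)              # ONE t5 sample, reused for every theta0
    hhat <- function(theta0) {
      lik <- sapply(theta1, function(t1)
        prod(pnorm(theta0 + t1 * x)^y * pnorm(-theta0 - t1 * x)^(1 - y)))
      mean(lik * sigma / dt((theta1 - mu) / sigma, df = 5))   # weight = 1 / t5 density
    }
    theta0 <- seq(-6, -1, length = 50)
    plot(theta0, sapply(theta0, hhat), type = "l")    # smooth in theta0 by construction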
Monte Carlo Methods with R: Monte Carlo Optimization [24]
Stochastic Approximation
Importance Sampling Likelihood Representation
◮ Top: 100 runs, different samples
◮ Middle: 100 runs, same sample
◮ Bottom: averages over 100 runs
◮ The averages over 100 runs are the same – but we will not do 100 runs
◮ R code: run pimax(25) from mcsm

Monte Carlo Methods with R: Monte Carlo Optimization [25]
Stochastic Approximation
Comments
◮ This approach is not absolutely foolproof:
⊲ the precision of ĥm(x) has no reason to be independent of x,
⊲ so the number m of simulations has to reflect the most varying case.
◮ As in every importance sampling experiment
⊲ the choice of the candidate g is influential
⊲ in obtaining a good (or a disastrous) approximation of h(x).
◮ Checking for the finite variance of the ratio
f(zi|x) H(x, zi) / g(zi)
⊲ is a minimal requirement in the choice of g.

Monte Carlo Methods with R: Monte Carlo Optimization [26]
Missing-Data Models and Demarginalization
Introduction
◮ Missing-data models are special cases of the representation h(x) = E[H(x, Z)]
◮ These are models where the density of the observations can be expressed as
g(x|θ) = ∫_Z f(x, z|θ) dz.
◮ This representation occurs in many statistical settings
⊲ Censoring models and mixtures
⊲ Latent variable models (tobit, probit, ARCH, stochastic volatility, etc.)
⊲ Genetics: missing SNP calls

Monte Carlo Methods with R: Monte Carlo Optimization [27]
Missing-Data Models and Demarginalization
Mixture Model
Example: Normal mixture model as a missing-data model
◮ Start with a sample (x1, …, xn)
◮ Introduce a vector (z1, …, zn) ∈ {1, 2}ⁿ such that
Pθ(Zi = 1) = 1 − Pθ(Zi = 2) = 1/4,   Xi | Zi = z ∼ N(µz, 1),
◮ The (observed) likelihood is then obtained as E[H(x, Z)] for
H(x, z) ∝ ∏_{i; zi=1} (1/4) exp{−(xi − µ1)²/2} ∏_{i; zi=2} (3/4) exp{−(xi − µ2)²/2},
◮ We recover the mixture model
(1/4) N(µ1, 1) + (3/4) N(µ2, 1)
⊲ as the marginal distribution of Xi.

Monte Carlo Methods with R: Monte Carlo Optimization [28]
Missing-Data Models and Demarginalization
Censored-Data Likelihood
Example: Censored-data likelihood
◮ Censored data may come from experiments
⊲ where some potential observations are replaced with a lower bound
⊲ because they take too long to observe.
◮ Suppose that we observe Y1, …, Ym, iid from f(y − θ),
⊲ and that the (n − m) remaining (Y_{m+1}, …, Yn) are censored at the threshold a.
◮ The corresponding likelihood function is
L(θ|y) = [1 − F(a − θ)]^{n−m} ∏_{i=1}^{m} f(yi − θ),
⊲ where F is the cdf associated with f.

Monte Carlo Methods with R: Monte Carlo Optimization [29]
Missing-Data Models and Demarginalization
Recovering the Observed-Data Likelihood
◮ If we had observed the last n − m values
⊲ say z = (z_{m+1}, …, zn), with zi ≥ a (i = m + 1, …, n),
⊲ we could have constructed the (complete-data) likelihood
Lᶜ(θ|y, z) = ∏_{i=1}^{m} f(yi − θ) ∏_{i=m+1}^{n} f(zi − θ).
◮ Note that
L(θ|y) = E[Lᶜ(θ|y, Z)] = ∫_Z Lᶜ(θ|y, z) k(z|y, θ) dz,
⊲ where k(z|y, θ) is the density of the missing data
⊲ conditional on the observed data,
⊲ namely the product of the f(zi − θ)/[1 − F(a − θ)]’s,
⊲ i.e., f(z − θ) restricted to (a, +∞).
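A minimal R sketch of the censored-data (observed) log-likelihood above, assuming a standard normal f, a threshold a = 1, and simulated data (all illustrative choices, not from the text):

    set.seed(4)
    n <- 40; a <- 1
    yfull <- rnorm(n, mean = 0.8)                     # latent complete data
    y <- yfull[yfull <= a]; m <- length(y)            # observed (uncensored) values
    loglik <- function(theta)
      (n - m) * pnorm(a - theta, lower.tail = FALSE, log.p = TRUE) +  # [1 - F(a - theta)]^(n-m)
      sum(dnorm(y - theta, log = TRUE))
    theta.grid <- seq(-1, 3, length = 200)
    plot(theta.grid, sapply(theta.grid, loglik), type = "l")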
Monte Carlo Methods with R: Monte Carlo Optimization [30]
Missing-Data Models and Demarginalization
Comments
◮ When we have the relationship
g(x|θ) = ∫_Z f(x, z|θ) dz,
⊲ Z merely serves to simplify calculations;
⊲ it does not necessarily have a specific meaning.
◮ We have the complete-data likelihood Lᶜ(θ|x, z) = f(x, z|θ),
⊲ the likelihood we would obtain
⊲ were we to observe (x, z), the complete data.
◮ REMEMBER: g(x|θ) = ∫_Z f(x, z|θ) dz.

Monte Carlo Methods with R: Monte Carlo Optimization [31]
The EM Algorithm
Introduction
◮ The EM algorithm is a deterministic optimization technique
⊲ (Dempster, Laird and Rubin, 1977)
◮ It takes advantage of the missing-data representation,
⊲ building a sequence of easier maximization problems
⊲ whose limit is the answer to the original problem.
◮ We assume that we observe X1, …, Xn ∼ g(x|θ) satisfying
g(x|θ) = ∫_Z f(x, z|θ) dz,
⊲ and we want to compute θ̂ = arg max L(θ|x) = arg max g(x|θ).

Monte Carlo Methods with R: Monte Carlo Optimization [32]
The EM Algorithm
First Details
◮ With the relationship g(x|θ) = ∫_Z f(x, z|θ) dz,
⊲ (X, Z) ∼ f(x, z|θ)
◮ The conditional distribution of the missing data Z
⊲ given the observed data x is
k(z|θ, x) = f(x, z|θ) / g(x|θ).
◮ Taking the logarithm of this expression leads to the following relationship:
log L(θ|x) = Eθ0[log Lᶜ(θ|x, Z)] − Eθ0[log k(Z|θ, x)]
(observed data = complete data − missing data),
◮ where the expectation is with respect to k(z|θ0, x).
◮ In maximizing log L(θ|x), we can ignore the last term.

Monte Carlo Methods with R: Monte Carlo Optimization [33]
The EM Algorithm
Iterations
◮ Denoting
Q(θ|θ0, x) = Eθ0[log Lᶜ(θ|x, Z)],
◮ the EM algorithm proceeds by maximizing Q(θ|θ0, x) at each iteration
⊲ If θ̂(1) = arg max Q(θ|θ0, x), then θ̂(0) → θ̂(1)
◮ Sequence of estimators {θ̂(j)}, where
θ̂(j) = arg max Q(θ|θ̂(j−1), x)
◮ This iterative scheme
⊲ contains both an expectation step
⊲ and a maximization step,
⊲ giving the algorithm its name.

Monte Carlo Methods with R: Monte Carlo Optimization [34]
The EM Algorithm
The Algorithm
Pick a starting value θ̂(0) and set m = 0. Repeat
1. Compute (the E-step)
Q(θ|θ̂(m), x) = E_{θ̂(m)}[log Lᶜ(θ|x, Z)],
where the expectation is with respect to k(z|θ̂(m), x).
2. Maximize Q(θ|θ̂(m), x) in θ and take (the M-step)
θ̂(m+1) = arg max_θ Q(θ|θ̂(m), x)
and set m = m + 1,
until a fixed point is reached; i.e., θ̂(m+1) = θ̂(m).

Monte Carlo Methods with R: Monte Carlo Optimization [35]
The EM Algorithm
Properties
◮ Jensen’s inequality ⇒ the likelihood increases at each step of the EM algorithm:
L(θ̂(j+1)|x) ≥ L(θ̂(j)|x),
⊲ with equality holding if and only if Q(θ̂(j+1)|θ̂(j), x) = Q(θ̂(j)|θ̂(j), x).
◮ Every limit point of an EM sequence {θ̂(j)} is a stationary point of L(θ|x)
⊲ not necessarily the maximum likelihood estimator;
⊲ in practice, we run EM several times with different starting points.
◮ Implementing the EM algorithm thus means being able to
(a) compute the function Q(θ′|θ, x), and
(b) maximize this function.

Monte Carlo Methods with R: Monte Carlo Optimization [36]
The EM Algorithm
Censored Data Example
◮ The complete-data likelihood is
Lᶜ(θ|y, z) ∝ ∏_{i=1}^{m} exp{−(yi − θ)²/2} ∏_{i=m+1}^{n} exp{−(zi − θ)²/2},
◮ with expected complete-data log-likelihood
Q(θ|θ0, y) = −(1/2) Σ_{i=1}^{m} (yi − θ)² − (1/2) Σ_{i=m+1}^{n} Eθ0[(Zi − θ)²],
⊲ where the Zi are distributed from a normal N(θ0, 1) distribution truncated at a.
◮ M-step: differentiating Q(θ|θ0, y) in θ and setting it equal to 0 gives
θ̂ = [m ȳ + (n − m) Eθ0[Z1]] / n,
⊲ with Eθ0[Z1] = θ0 + φ(a − θ0) / [1 − Φ(a − θ0)].
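A minimal R sketch of this EM iteration for the censored normal model (reusing the simulated y, m, n, a from the sketch above; the starting value and number of iterations are arbitrary):

    ybar <- mean(y); theta <- ybar                    # EM starting value
    for (j in 1:100) {
      EZ <- theta + dnorm(a - theta) / pnorm(a - theta, lower.tail = FALSE)  # E[Z1]: truncated-normal mean
      theta <- (m * ybar + (n - m) * EZ) / n          # M-step
    }
    theta                                             # EM fixed point (approximate MLE)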
Monte Carlo Methods with R: Monte Carlo Optimization [37]
The EM Algorithm
Censored Data MLEs
◮ EM sequence:
θ̂(j+1) = (m/n) ȳ + [(n − m)/n] { θ̂(j) + φ(a − θ̂(j)) / [1 − Φ(a − θ̂(j))] }
◮ Climbing the likelihood
◮ R code

Monte Carlo Methods with R: Monte Carlo Optimization [38]
The EM Algorithm
Normal Mixture
◮ Normal mixture bimodal likelihood:
Q(θ′|θ, x) = −(1/2) Σ_{i=1}^{n} Eθ[ Zi (xi − µ′1)² + (1 − Zi)(xi − µ′2)² | x ].
◮ Solving the M-step then provides the closed-form expressions
µ′1 = Eθ[ Σ_{i=1}^{n} Zi xi | x ] / Eθ[ Σ_{i=1}^{n} Zi | x ]
and
µ′2 = Eθ[ Σ_{i=1}^{n} (1 − Zi) xi | x ] / Eθ[ Σ_{i=1}^{n} (1 − Zi) | x ],
◮ since
Eθ[Zi | x] = φ(xi − µ1) / [φ(xi − µ1) + 3φ(xi − µ2)].

Monte Carlo Methods with R: Monte Carlo Optimization [39]
The EM Algorithm
Normal Mixture MLEs
◮ EM run five times with various starting points
◮ Two out of five sequences → higher mode
◮ The others → lower mode

Monte Carlo Methods with R: Monte Carlo Optimization [40]
Monte Carlo EM
Introduction
◮ If computing Q(θ|θ0, x) is difficult, we can use Monte Carlo
◮ For Z1, …, ZT ∼ k(z|x, θ̂(m)), maximize
Q̂(θ|θ̂(m), x) = (1/T) Σ_{i=1}^{T} log Lᶜ(θ|x, zi)
◮ Better: use importance sampling
⊲ Since
arg max_θ L(θ|x) = arg max_θ log [g(x|θ)/g(x|θ(0))] = arg max_θ log E_{θ(0)}[ f(x, z|θ)/f(x, z|θ(0)) | x ],
⊲ use the approximation to the log-likelihood
log L(θ|x) ≈ (1/T) Σ_{i=1}^{T} Lᶜ(θ|x, zi) / Lᶜ(θ(0)|x, zi).

Monte Carlo Methods with R: Monte Carlo Optimization [41]
Monte Carlo EM
Genetics Data
Example: Genetic linkage
◮ A classic example of the EM algorithm
◮ Observations (x1, x2, x3, x4) are gathered from the multinomial distribution
M(n; 1/2 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4).
◮ Estimation is easier if the x1 cell is split into two cells
⊲ We create the augmented model
(z1, z2, x2, x3, x4) ∼ M(n; 1/2, θ/4, (1 − θ)/4, (1 − θ)/4, θ/4)
with x1 = z1 + z2.
⊲ Complete-data likelihood: θ^{z2+x4} (1 − θ)^{x2+x3}
⊲ Observed-data likelihood: (2 + θ)^{x1} θ^{x4} (1 − θ)^{x2+x3}

Monte Carlo Methods with R: Monte Carlo Optimization [42]
Monte Carlo EM
Genetics Linkage Calculations
◮ The expected complete-data log-likelihood function is
Eθ0[(Z2 + x4) log θ + (x2 + x3) log(1 − θ)] = [θ0 x1/(2 + θ0) + x4] log θ + (x2 + x3) log(1 − θ),
⊲ which can easily be maximized in θ, leading to the EM step
θ̂1 = [θ0 x1/(2 + θ0) + x4] / [θ0 x1/(2 + θ0) + x2 + x3 + x4].
◮ Monte Carlo EM: replace the expectation with
⊲ z̄m = (1/m) Σ_{i=1}^{m} zi, where zi ∼ B(x1, θ0/(2 + θ0)).
◮ The MCEM step would then be
θ̂1 = (z̄m + x4) / (z̄m + x2 + x3 + x4),
which converges to the EM step θ̂1 as m grows to infinity.

Monte Carlo Methods with R: Monte Carlo Optimization [43]
Monte Carlo EM
Genetics Linkage MLEs
◮ Note the variation in the MCEM sequence
◮ It can be controlled by increasing the number of simulations
◮ R code
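A minimal R sketch of this MCEM recursion for the genetic-linkage example, using the classical counts (x1, x2, x3, x4) = (125, 18, 20, 34) as illustrative data (the slides do not specify a data set):

    x <- c(125, 18, 20, 34)
    theta <- 0.5; m <- 10^4                                  # Monte Carlo sample size
    set.seed(5)
    for (j in 1:25) {
      zbar  <- mean(rbinom(m, x[1], theta / (2 + theta)))    # simulate Z2 given x1 and theta
      theta <- (zbar + x[4]) / (zbar + x[2] + x[3] + x[4])   # MCEM update
    }
    theta                                                    # about 0.627 for these counts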
Monte Carlo Methods with R: Monte Carlo Optimization [44]
Monte Carlo EM
Random Effect Logit Model
Example: Random effect logit model
◮ yij is distributed, conditionally on one covariate xij, as a logit model:
P(yij = 1 | xij, ui, β) = exp{β xij + ui} / (1 + exp{β xij + ui}),
⊲ where ui ∼ N(0, σ²) is an unobserved random effect.
⊲ (U1, …, Un) therefore corresponds to the missing data Z.

Monte Carlo Methods with R: Monte Carlo Optimization [45]
Monte Carlo EM
Random Effect Logit Model Likelihood
◮ For the complete-data likelihood with θ = (β, σ),
Q(θ′|θ, x, y) = Σ_{i,j} yij E[β′ xij + Ui | β, σ, x, y]
− Σ_{i,j} E[log(1 + exp{β′ xij + Ui}) | β, σ, x, y]
− Σ_i E[Ui² | β, σ, x, y]/(2σ′²) − n log σ′,
⊲ and it is impossible to compute the expectations in Ui.
◮ Were those available, the M-step would be difficult but feasible.
◮ MCEM: simulate the Ui’s conditional on β, σ, x, y from
π(ui | β, σ, x, y) ∝ exp{ Σ_j yij ui − ui²/(2σ²) } / ∏_j [1 + exp{β xij + ui}]

Monte Carlo Methods with R: Monte Carlo Optimization [46]
Monte Carlo EM
Random Effect Logit MLEs
◮ The MCEM sequence
⊲ increases the number of Monte Carlo steps at each iteration.
◮ The MCEM algorithm
⊲ does not have the EM monotonicity property.
◮ Top: sequence of β’s from the MCEM algorithm
◮ Bottom: sequence of completed likelihoods
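A rough R sketch of the E-step simulation of a single Ui from π(ui | β, σ, x, y) via a random-walk Metropolis step; all values below (β, σ, the covariates and responses for subject i, the proposal scale) are hypothetical illustrations, not taken from the text:

    logpi <- function(u, beta, sigma, xi, yi)       # log of pi(u_i | beta, sigma, x, y), up to a constant
      sum(yi) * u - u^2 / (2 * sigma^2) - sum(log(1 + exp(beta * xi + u)))
    beta <- 0.5; sigma <- 1
    xi <- c(-1, 0, 1, 2); yi <- c(0, 0, 1, 1)       # hypothetical data for subject i
    set.seed(6)
    u <- 0; draws <- numeric(1000)
    for (t in 1:1000) {
      prop <- u + rnorm(1, sd = 0.5)                # random-walk proposal
      if (log(runif(1)) < logpi(prop, beta, sigma, xi, yi) - logpi(u, beta, sigma, xi, yi))
        u <- prop
      draws[t] <- u
    }
    mean(draws)                                     # Monte Carlo estimate of E[U_i | beta, sigma, x, y]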