4. Stochastic methods

4.1 Simulated annealing

Gradient-based optimization and least-squares methods rely on the multivariate function being minimized being smooth and having few local minima. The polytope method does not require smoothness, but it works well only for small numbers of parameters and it, too, performs poorly in the presence of local minima. All of these are "greedy" optimization methods. Simulated Annealing (SA) is an alternative, probabilistic optimization method that can deal with local minima. The price to be paid for this generality is that SA is much more CPU-intensive than the greedy algorithms. Recall that the problem with direct search or gradient-based optimization is that it always gets stuck in the local minimum closest to the starting point.

The basic concept of SA is borrowed from statistical mechanics, where one studies the properties of a large collection of atoms or particles in a liquid or solid. Annealing is a process in which one creates a large crystal structure (a minimum-energy state) by first heating a liquid and then slowly cooling it down in stages; cooling too quickly (quenching) produces a "glass" state, which is not the minimum-energy state. The gradual decrease of temperature is critical:

    Liquid → Liquid → Solid → Crystal      (T = temperature, decreasing left to right)
      T ↓       T ↓      T ↓

The analogy is that the optimization process can be seen as cooling a liquid to form a solid in a minimum-energy state. A state of the system is a particular configuration of particles. When annealing, at each temperature T the solid or liquid is allowed to reach thermal equilibrium. At temperature T, each state i has a specific internal energy E_i. The probability of observing state i with energy E_i at temperature T is given by a Boltzmann distribution.
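As a small numerical illustration of the Boltzmann distribution over a finite set of states, the probabilities can be computed directly by normalizing the exponential weights (a sketch; the energies, the temperature, and units with K = 1 are made-up assumptions):

```python
import math

def boltzmann_probs(energies, T, K=1.0):
    """Prob(i) = exp(-E_i / (K T)) / sum_j exp(-E_j / (K T))."""
    weights = [math.exp(-E / (K * T)) for E in energies]
    Z = sum(weights)  # normalization constant (sum over all possible states)
    return [w / Z for w in weights]

# Hypothetical energies of four states, in units where K = 1.
probs = boltzmann_probs([0.0, 1.0, 2.0, 5.0], T=1.0)
# Lower-energy states are exponentially more probable; as T decreases,
# the probability mass concentrates on the minimum-energy state.
```

Note that computing Z requires summing over all states, which is infeasible for large configuration spaces; that is exactly what the Metropolis algorithm below avoids.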
Prob(observing state i with E_i) = exp(-E_i / KT) / Σ_j exp(-E_j / KT)

where the sum in the denominator runs over all possible states j (the normalization constant) and K is the famous Boltzmann constant. So at temperature T you wait until the system reaches equilibrium, calculate Prob(state i) at equilibrium, then reduce T, reach equilibrium again, and so on, until T → 0.

In optimization you form an objective function to be minimized, say E(θ), where θ are the parameters to be found. In the SA analogy we identify energy with this objective function, while θ is considered to be a state i, i.e. a configuration of particles. Suppose we are now in the annealing process at a certain temperature T: how do we find the minimum-energy (equilibrium) state at that temperature? To do this we use the so-called Metropolis algorithm, as follows (this procedure essentially avoids calculating the normalization constant).

Metropolis algorithm: at temperature T, loop over a number of proposed changes in θ. Proposal 1...
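The scheme described above, Metropolis moves inside a cooling loop, can be sketched as follows. This is a minimal illustration, not the course's reference implementation; the objective, proposal step size, initial temperature, and geometric cooling factor are all assumed values:

```python
import math
import random

def metropolis_sa(energy, theta0, step, T0=1.0, cooling=0.95,
                  n_temps=50, n_moves_per_temp=100):
    """Minimize `energy` (the role of E(theta)) by simulated annealing."""
    theta = theta0
    E = energy(theta)
    best_theta, best_E = theta, E
    T = T0
    for _ in range(n_temps):                 # cooling schedule: T -> cooling * T
        for _ in range(n_moves_per_temp):    # Metropolis moves toward equilibrium at this T
            proposal = theta + random.uniform(-step, step)
            E_new = energy(proposal)
            dE = E_new - E
            # Metropolis rule: accept downhill moves always; accept uphill
            # moves with probability exp(-dE / T). Only the ratio of
            # Boltzmann factors appears, so no normalization constant is needed.
            if dE <= 0 or random.random() < math.exp(-dE / T):
                theta, E = proposal, E_new
                if E < best_E:
                    best_theta, best_E = theta, E
        T *= cooling
    return best_theta, best_E

# Example use on a simple one-parameter objective (hypothetical):
random.seed(0)
theta_min, E_min = metropolis_sa(lambda x: (x - 2.0) ** 2, theta0=10.0, step=1.0)
```

Because uphill moves are sometimes accepted while T is still large, the search can climb out of a local minimum that would trap a greedy method; as T → 0 the acceptance of uphill moves vanishes and the state freezes near a minimum.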