…al Alignment

EMMA bears some similarity to methods used for evaluating and adjusting geometrical alignment. These similarities may be seen by revisiting the entropy derivative of Equation 3.28 and comparing it to the derivative of the following construct. We define D(T), half the averaged Mahalanobis distance between values in B and their nearest correspondences in A,

\[
D(T) \;\equiv\; \frac{1}{N_B} \sum_{z_i \in B} \;\min_{z_j \in A}\; \frac{1}{2}\, D(z_i, z_j) \; . \tag{7.1}
\]

Locally, away from discontinuities, the derivative of the above expression is

\[
\frac{d}{dT} D(T) \;=\; \frac{1}{N_B} \sum_{z_i \in B} \;\min_{z_j \in A}\; \frac{d}{dT}\, \frac{1}{2}\, D(z_i, z_j) \; .
\]

Comparing the above expression with Equation 3.28, we see the following analogy: if the transformation T is adjusted to reduce the averaged "squared differences" between points in B and their counterparts from A that are nearest in signal value, then a reduction in entropy is obtained. This is intuitive, in that entropy will be lower if clusters in "signal value" are tighter, so that nearby signal differences will be smaller. The analogy is only approximate because of the dissimilarity between max and softmax. Equation 7.1 is essentially the measure used in chamfer matching techniques, such as the method described by Borgefors (Borgefors, 1988). Huttenlocher (Huttenlocher et al., 1991) has used a related measure in feature matching applications, the Hausdorff distance, which uses a maximum instead of the sum that appears in Equation 7.1. The similarity between geometrical matching and entropy becomes even stronger if one uses the softmax operation to weight the closest element, rather than simply selecting the closest, as Wells has (Wells III, 1992b; Wells III, 1992a). We reiterate that in vision applications these methods have typically been used to measure aggregate geometrical distance, while here we are measuring aggregate distances among signal values (typically intensities, brightnesses, or surface properties).
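To make the analogy concrete, the following Python sketch (not from the thesis) computes the measure of Equation 7.1 with a hard minimum over A, alongside a softmax-weighted variant in the spirit of Wells' approach. The function names `chamfer_distance` and `softmax_weighted_distance`, the parameter `beta`, and the use of a plain squared Euclidean distance in place of the Mahalanobis distance D(., .) are all illustrative assumptions.

```python
# Illustrative sketch (not from the thesis): the chamfer-style measure of
# Equation 7.1 and a softmax-weighted variant. A plain squared Euclidean
# distance stands in for the Mahalanobis distance D(., .).
import numpy as np

def chamfer_distance(A, B):
    """Average over z_i in B of half the squared distance to its
    nearest correspondence z_j in A (Equation 7.1)."""
    # Pairwise squared distances: diffs[i, j] = ||B[i] - A[j]||^2
    diffs = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    return np.mean(0.5 * diffs.min(axis=1))

def softmax_weighted_distance(A, B, beta=1.0):
    """Same measure, but the hard min over A is replaced by a softmax
    weighting of all elements of A."""
    diffs = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    # Weights favour the closest correspondences; as beta grows the
    # weighting approaches the hard minimum.
    w = np.exp(-beta * diffs)
    w /= w.sum(axis=1, keepdims=True)
    return np.mean(0.5 * (w * diffs).sum(axis=1))

# Example with one-dimensional "signal values" (hypothetical data):
A = np.array([[0.0], [1.0], [4.0]])
B = np.array([[0.2], [3.5]])
print(chamfer_distance(A, B))
print(softmax_weighted_distance(A, B, beta=5.0))
```

As beta grows the softmax weights concentrate on the nearest correspondence, so the second function approaches the first; the gap between the two is the max-versus-softmax approximation mentioned above.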
Appendix A

A.1 Gradient Descent

In a number of problems described in this thesis one must find a set of parameters that extremizes an evaluation function. Examples include: (1) finding the parameters of a density so that the likelihood of a sample is maximized; (2) finding the pose parameters that best align a model and an image; and (3) finding the weights of a neural network so that it best approximates a function. In each case there is a function of a parameter set, F(p), whose value is to be either maximized or minimized. The parameters are continuous variables, and we are therefore faced with an infinite number of possible solutions. The gradient descent procedure is an effective, though greedy, technique for searching such a space.

There are many closely related gradient descent algorithms. Here we will describe the simplest: steepest descent, or hill climbing. Starting from an initial guess for the parameters, steepest descent is an iterative procedure that uses the partial derivatives of a function to construct an improved estimate for its parameters. Each parameter is updated by

\[
p \;\leftarrow\; p + \lambda \, \frac{\partial F(p)}{\partial p} \; .
\]

The update rate \(\lambda\), which is also known as the learning rate, must be chosen carefully. When \(\lambda\) is sufficiently small one can use a Taylor expansion of F(.) to prove that

\[
F\!\left(p + \lambda \, \frac{\partial F(p)}{\partial p}\right) \;\ge\; F(p) \; .
\]

When \(\lambda\) is too small, p might take arbitrarily long to approach a maximum. If \(\lambda\) is chosen correctly, p will converge toward the maximum relatively rapidly.

There are many gradient-based techniques that attempt to speed the rate of convergence of p. Second-order techniques such as Levenberg-Marquardt and Newton-Raphson use the second derivatives of F(p) to re-estimate \(\lambda\). Conjugate gradient techniques attempt to find better directions than the gradient of F. In every case one must be careful that the theoretical advantages of the algorithm are not outweighed by the costs of computing it. Researchers in neural networks have found that for many problems it is difficult to realize any actual improvement in convergence speed. The problems for which steepest descent works as well as more complex techniques include functions with a large number of parameters; this makes computing the second derivatives quite expensive.
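As a concrete illustration of the basic steepest descent loop described in this appendix, here is a minimal Python sketch written for maximization, following the update rule p &lt;- p + lambda * dF/dp above. It is not from the thesis: the names `steepest_ascent` and `numerical_gradient`, the finite-difference gradient (used so that any evaluation function F(p) can be plugged in without an analytic derivative), and the default values of `lam` and `n_steps` are all illustrative assumptions.

```python
# Illustrative sketch (not from the thesis): steepest ascent,
# repeatedly applying p <- p + lam * dF/dp to maximize F(p).
import numpy as np

def numerical_gradient(F, p, eps=1e-6):
    """Central-difference estimate of dF/dp for each parameter."""
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (F(p + step) - F(p - step)) / (2 * eps)
    return grad

def steepest_ascent(F, p0, lam=0.01, n_steps=1000):
    """Iterate the update p <- p + lam * dF/dp from the initial guess p0."""
    p = np.array(p0, dtype=float)
    for _ in range(n_steps):
        p += lam * numerical_gradient(F, p)
    return p

# Example: maximize F(p) = -(p - 3)^2, whose maximum is at p = 3.
p_hat = steepest_ascent(lambda p: -np.sum((p - 3.0) ** 2), p0=[0.0])
print(p_hat)  # approximately [3.]
```

The fixed step size `lam` reflects the discussion above: too small and convergence is arbitrarily slow, too large and the Taylor argument guaranteeing improvement no longer applies.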