1995_Viola_thesis_registrationMI


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY

A.I. Technical Report No. 1548, June 1995

Alignment by Maximization of Mutual Information
Paul A. Viola

This publication can be retrieved by anonymous ftp to publications.ai.mit.edu.

Abstract

A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images.

The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation.

Finally, we will describe a number of additional real-world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).

Copyright © Massachusetts Institute of Technology, 1995

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology.
Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N0001494-01-0994. Paul Viola was also supported by the USAF ASSERT program, Parent Grant: F49620-93-1-0263.

Alignment by Maximization of Mutual Information
by Paul A. Viola

Submitted to the Department of Electrical Engineering and Computer Science in June 1995, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Abstract

Over the last 30 years the problems of image registration and recognition have proven more difficult than even the most pessimistic might have predicted. Progress has been hampered by the sheer complexity of the relationship between an object and its image, which involves the object's shape, surface properties, position, and illumination.

Changes in illumination can radically alter the intensity and shading of an image. Nevertheless, the human visual system can use shading both for recognition and image interpretation. We will present a measure for comparing objects and images that uses shading information, yet is explicitly insensitive to changes in illumination. This measure is unique in that it compares 3D object models directly to raw images. No pre-processing or edge detection is required. We will show that when the mutual information between model and image is large they are likely to be aligned. Toward making this technique a reality we have defined a concrete and efficient technique for evaluating entropy called EMMA.

In our derivation of mutual information based alignment few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can be used in a wide variety of imaging situations. Experiments demonstrate this approach aligning a number of complex 3D object models to real images.
In addition, we demonstrate that the same technique can be used to solve problems in medical registration.

Alignment is accomplished by adjusting the pose of an object until the mutual information between image and object is maximized. We will present a gradient descent alignment procedure based on stochastic approximation that has a very efficient implementation. For this application stochastic approximation affords a speed up of at least a factor of 500 over gradient descent. In addition, stochastic approximation can be used to accelerate a variety of other vision applications. We will describe an existing vision application which can be accelerated by a factor of 30 using stochastic approximation.

Finally, we will describe a number of additional real-world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).

Thesis Committee:
Prof. Tomás Lozano-Pérez, Co-Supervisor
Prof. Christopher G. Atkeson, Co-Supervisor
Prof. W. Eric L. Grimson
Prof. Berthold K. P. Horn

Acknowledgments

I would like to acknowledge the MIT Artificial Intelligence Laboratory for being a haven of intellectual freedom. Professors Tomás Lozano-Pérez, Christopher Atkeson, Eric Grimson, Rodney Brooks, and others have worked to build an environment where every resource is unfettered. Chris and Tomás have supported me through thick and through thin. They've taught me how truly valuable unconditional support can be.

Students are the AI lab's most precious resource and I benefited from discussions with Phil Agre, Davi Geiger, David Chapman, Jose Robles, Tao Alter, Misha Bolotski, Jonathan Connel, Karen Sarachik, Maja Mataric, Ian Horswill, Colin Angle, Cynthia Ferrel, Henry Minsky, Saed Younis, Rick Lathrope, Barbara Moore, and many others.
William Wells has stood out among all those I've met at MIT. His unique approach to vision, his care in research and his advice as a friend have proven invaluable.

I owe much to John Shewchuk, then of Brown University, for introducing me to the field of statistical learning. Memories of his irrepressible intellectual curiosity serve to continually motivate my own thinking.

Outside of MIT I have spent many productive months at the Salk Institute in the Computational Neurobiology Laboratory of Terrence Sejnowski. Terry is the most tirelessly devoted scientist that I have ever met. In his lab I learned that science is an uncompromising pursuit of truth, that science is about building on the work of others, and that to build one must first understand. In Terry's lab I have had the pleasure of working with David Lawrence, Rich Zemel, Nici Shraudolph, Tony Bell, and Peter Dayan. Each of them has had an effect on some part of this thesis.

Most importantly I must recognize the technical contributions of Sara Billey. She worked tirelessly with me to understand the most cryptic mathematics and clear up my own cryptic thinking. Without her this thesis would not exist.

To:
Sara Billey for being the love of my life.
My parents Mary Ancona-Viola and Alfredo Viola for making it all possible.
Contents

1 Introduction
  1.1 An Introduction to Alignment
    1.1.1 An Alignment Example
  1.2 Overview of the Thesis
2 Probability and Entropy
  2.1 Random Variables
  2.2 Entropy
    2.2.1 Differential Entropy
  2.3 Samples versus Distributions
    2.3.1 Model Selection, Likelihood and Cross Entropy
  2.4 Modeling Densities
    2.4.1 The Gaussian Density
    2.4.2 Other Parametric Densities
    2.4.3 Parzen Window Density Estimation
3 Empirical Entropy Manipulation and Stochastic Gradient Descent
  3.1 Empirical Entropy
  3.2 Estimating Entropy with Parzen Densities
  3.3 Stochastic Maximization Algorithm
    3.3.1 Estimating the Covariance
  3.4 Principal Components Analysis and Information
  3.5 Conclusion
4 Matching and Alignment
  4.1 Alignment
    4.1.1 Correlation as a Maximum Likelihood Technique
    4.1.2 Correlation and Mutual Information
  4.2 Weighted Neighbor Likelihood vs. EMMA
    4.2.1 Non-functional Signals
  4.3 Alignment Derivation
  4.4 Matching and Minimum Description Length
  4.5 Summary
5 Alignment Experiments
  5.1 Alignment of 3D Objects to Video
    5.1.1 Alignment of Skull Model
    5.1.2 Alignment of Head Model
    5.1.3 Alignment of Curved Surfaces
  5.2 Medical Registration Experiments
    5.2.1 Three Dimensional MR-CT Alignment
  5.3 View Based Recognition Experiments
    5.3.1 Photometric Stereo
  5.4 Limitations of EMMA Alignment
6 Other Applications of EMMA
  6.1 Bias Compensation
  6.2 Alignment of Line Drawings
7 Conclusion
  7.1 Related Work
  7.2 A Parallel with Geometrical Alignment
A Appendix
  A.1 Gradient Descent

Chapter 1
Introduction

This thesis is about a new information theoretic approach for solving several standing problems in computer vision and image processing.
For example, this approach can be used to find the correct alignment between a three dimensional model and an image. While alignment is a critical component of the object recognition problem, it is also useful by itself in medical and military applications. We will also describe several other applications, including an image processing application and a new form of unsupervised learning. While the form of these applications is quite different, the underlying theory and derivations are very similar. Preliminary investigations imply that the theory presented here will have wide application.

Computer vision has proven more difficult than even the most pessimistic might have predicted. While the problem has been of interest for over 30 years, progress has been painstakingly slow. Even the best computer vision systems stand in stark contrast to the human visual system: our perception of images is effortless and robust; computer vision systems are at best slow and unreliable. Among other difficulties, progress has been hampered by the sheer complexity of the relationship between image and object, which involves the object's shape, surface properties, position, and illumination.

A computer vision program is faced with the task of interpreting an image of intensities. While information about the shape and location of objects is somehow embedded in these intensities, the actual intensities that arise in an image are difficult to interpret. For example, changes in illumination can radically alter the intensity and shading of an image. Though the human visual system can use shading both for recognition and image interpretation, most existing computer object recognition systems cannot. These systems throw out shading information in an effort to obtain "illumination invariance".

We will present a measure for comparing objects and images that uses shading information, yet is explicitly insensitive to changes in illumination.
This measure is unique in that it compares 3D object models directly to raw images. No pre-processing or edge detection is required. This image model comparison measure has been rigorously derived from information theory. Both the theory and algorithms involved are new, and are based on an efficient scheme for evaluating mutual information called EMMA.¹ The derivation of the alignment procedure requires few assumptions about the nature of the imaging process. As a result the algorithms are quite general and can be used in a wide variety of imaging situations. Experiments demonstrate that this approach can align a number of complex 3D object models to real images. In addition, the same technique can be used to solve problems in medical registration.

Alignment adjusts the pose of an object until the mutual information between image and object is maximized. Pose adjustment can be accomplished by ascending the gradient of mutual information. We will present an alignment procedure based on stochastic approximation that affords a speed up of at least a factor of 500 over gradient ascent. In addition, stochastic approximation can be used to accelerate a variety of other vision applications. We will describe an existing vision application which can be accelerated by a factor of 30 using stochastic approximation.

EMMA has also proven useful in a number of tasks besides alignment. For example, an entropy minimization framework can be used to detect and correct corruption in magnetic resonance images (MRI). EMMA can also be used to define a new form of unsupervised learning. Unsupervised learning has been popularized in the neural network literature as a scheme for simplifying the representations of complex data. EMMA can be used to find low-dimensional projections of a high dimensional input space that are maximally informative.

¹ EMMA is a random but pronounceable subset of the letters in the words "EMpirical entropy Manipulation and Analysis".

1.1 An Introduction to Alignment

The general problem of alignment entails comparing a predicted image of an object with an actual image. Given an object model and a pose (coordinate transformation), a model for the imaging process could be used to predict the image that will result. If we had a good imaging model then deciding whether an image contained a particular model at a given pose would be straightforward: compute the predicted image and compare it to the actual image directly. Given a perfect imaging model the two images will be identical, or close to it. Of course finding the correct alignment is still a remaining challenge.

The relationship between an object model (no matter how accurate) and the object's image is a complex one. The appearance of a small patch of a surface is a function of the surface properties, the patch's orientation, the position of the lights and the position of the observer. Given a model $u(x)$ and an image $v(y)$ we can formulate an imaging equation,

    $v(T(x)) = F(u(x), q)$    (1.1)

or equivalently,

    $v(y) = F(u(T^{-1}(y)), q)$.    (1.2)

The imaging equation is separable into two distinct components. The first component is called a transformation, or pose, denoted $T$. It relates the coordinate frame of the model to the coordinate frame of the image. The transformation tells us which point in the model is responsible for a particular point in the image. The second component is the imaging function, $F(u(x), q)$. The imaging function determines the value of image point $v(T(x))$. In general a pixel's value may be a function both of the model and other exogenous factors. For example an image of a three dimensional object depends not only on the object but also on the lighting. The parameter, $q$, collects all of the exogenous influences into a single vector.

One reason that it is, in principle, possible to define $F$ is that the image does convey information about the model.
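As a rough illustration of the imaging equation, the following Python sketch builds a toy one-dimensional model and image and checks that forms (1.1) and (1.2) agree. Everything specific in it is an assumption made for illustration and does not come from the thesis: the random model `u`, the cyclic shift used as the transformation `T`, the quadratic imaging function `F`, the values in `q`, and the histogram-based estimate of mutual information, which stands in for (and is much cruder than) the EMMA estimator. The sketch also scores a handful of candidate shifts to show that the image carries information about the model at the correct pose even though the two signals are nearly uncorrelated there.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Plug-in histogram estimate of I(a; b) in nats (not the EMMA estimator)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

rng = np.random.default_rng(0)
n = 2000
x = np.arange(n)                            # model coordinates
u = rng.uniform(-1.0, 1.0, size=n)          # hypothetical model u(x)

def F(u_vals, q):
    """Hypothetical imaging function; q = (gain, offset) plays the role of q."""
    gain, offset = q
    return gain * u_vals ** 2 + offset

true_shift = 7                              # T(x) = (x + true_shift) mod n
q = (2.0, 0.5)

# Equation (1.1): v(T(x)) = F(u(x), q)
v = np.empty(n)
v[(x + true_shift) % n] = F(u, q)

# Equation (1.2): v(y) = F(u(T^{-1}(y)), q), with T^{-1}(y) = (y - true_shift) mod n
v_alt = F(u[(x - true_shift) % n], q)

# Score candidate poses by the estimated mutual information between u(x) and v(T_s(x)).
candidates = np.arange(20)
scores = [mutual_information(u, v[(x + s) % n]) for s in candidates]
best_shift = int(candidates[np.argmax(scores)])

# Linear correlation at the correct pose, for comparison.
corr = np.corrcoef(u, v[(x + true_shift) % n])[0, 1]
print("best shift:", best_shift, "correlation at true shift:", round(corr, 3))
```

Because the toy `F` is an even function of `u`, linear correlation between model and image is close to zero at the correct shift, while the mutual information estimate peaks there. An exhaustive search over shifts is used only to keep the sketch short.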
Clearly if there were no mutual information between $u$ and $v$, there could be no meaningful $F$. We propose to finesse the problem of finding and computing $F$ by dealing with this mutual information directly. We will present an algorithm that aligns by maximizing the mutual information between model and image. It requires no a priori model of the relationship between surface properties and scene intensities; it only assumes that the model tells more about the scene when it is correctly aligned.

1.1.1 An Alignment Example

One of the alignment problems that we will address involves finding the pose of a three-dimensional object that appears in a video image. This problem involves comparing two very different kinds of representations: a three-dimensional model of the shape of the object and a video image of that object. For example, Figure 1.1 contains a video image of an example object on the left and a depth map of that same object on the right (the object in question is a person's head: Ron). A depth map is an image that displays the depth from the camera to every visible point on the object model. A depth map is a complete description of the shape of the object, at least the visible parts.

From the depth map alone it might be difficult to see that the image and the model are aligned. The task can be made much easier, at least for us, if we simulate the imaging process and construct an image from the 3D model. Figure 1.2 contains two computer graphics renderings of the object model. These synthetic images are constructed assuming that the 3D model has a Lambertian surface and that the lighting comes from the right. It is almost immediately obvious that the model on the left is more closely aligned to the true image than the model on the right. Unfortunately, what we find trivial is very difficult for a computer. The intensities of the true video image and the synthetic images are very different.
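The rendering step described above, constructing a synthetic image from a depth map under a Lambertian assumption with light from the right, can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the renderer used for Figure 1.2: the function name `lambertian_render`, the hemispherical test surface, the unit albedo, the distant light source and the particular light direction are all hypothetical choices.

```python
import numpy as np

def lambertian_render(depth, light_dir, pixel_size=1.0):
    """Shade a depth map with Lambert's law, assuming unit albedo and a distant light.

    depth:     2D array of distances from the camera (larger means farther away).
    light_dir: 3-vector pointing from the surface toward the light, in image
               coordinates (x to the right, y down, z toward the camera).
    """
    # Surface height toward the camera is -depth; its normal is (-dz/dx, -dz/dy, 1).
    dz_dy, dz_dx = np.gradient(-depth, pixel_size)
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    light = np.asarray(light_dir, dtype=float)
    light /= np.linalg.norm(light)
    # Lambert's law: brightness is the cosine of the angle between normal and light,
    # clipped at zero for patches that face away from the light.
    return np.clip(normals @ light, 0.0, None)

# Hypothetical test surface: a hemisphere bulging toward the camera.
size, radius = 65, 28.0
ys, xs = np.mgrid[0:size, 0:size] - size // 2
height = np.sqrt(np.clip(radius ** 2 - xs ** 2 - ys ** 2, 0.0, None))
depth = 100.0 - height

# Light from the right, tilted toward the camera.
image = lambertian_render(depth, light_dir=(1.0, 0.0, 1.0))

inside = xs ** 2 + ys ** 2 < (radius - 2.0) ** 2
right_mean = image[inside & (xs > 0)].mean()
left_mean = image[inside & (xs < 0)].mean()
print("mean brightness, right half vs. left half:", right_mean, left_mean)
```

With the light on the right, the right half of the hemisphere comes out brighter than the left half. Changing `light_dir` changes the rendered intensities substantially while the underlying shape stays the same.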
The true image and the correct model image are in fact uncorrelated. Yet any person can glance at these images and decide that both are images of a head and that both heads are looking in roughly the same direction. The human visual system is capable of ignoring the superficial differences that arise from changes in illumination and surface properties. It is not easy to build an automated alignment procedure that can make this kind of comparison. It is harder still to construct a system that can find the correct model pose. We have built such a system. That system selected the pose of the model shown at left in Figure 1.2.

As mentioned above, the synthetic images of Ron were generated under the assumption that the model surface is Lambertian and the lighting is from the right. Lambert's law is perhaps the simplest model of surface reflectivity. It is an accurate model of the reflectance of a matte

Figure 1.1: Two different views of Ron. On the left is a video image. On the right is a depth map of a model of Ron. A depth map describes the distance to each of the visible points of the model. White denotes points that are closer, bl...