A Bayesian Joint Mixture Framework for the Integration of Anatomical Information in Functional Image Reconstruction

Anand Rangarajan, Ing-Tsung Hsiao, and Gene Gindi

30th December 1998

Departments of Electrical Engineering and Diagnostic Radiology, Yale University, New Haven, CT. Departments of Electrical Engineering and Radiology, SUNY at Stony Brook, Stony Brook, NY.

Abstract

We present a Bayesian joint mixture framework for integrating anatomical image intensity and region segmentation information into emission tomographic reconstruction in medical imaging. The joint mixture framework is particularly well suited to this problem and allows us to integrate additional available information, such as anatomical region segmentation information, into the Bayesian model. Since this information is independently available, as opposed to being estimated, it acts as a good constraint on the joint mixture model. After specifying the joint mixture model, we combine it with the standard emission tomographic likelihood. The Bayesian posterior is a combination of this likelihood and the joint mixture prior. Since well known EM algorithms separately exist for both the emission tomography (ET) likelihood and the joint mixture prior, we have designed a novel EM algorithm that comprises two EM algorithms: one for the likelihood and one for the prior. Despite being dove-tailed in this manner, the resulting EM algorithm is an alternating descent algorithm that is guaranteed to converge to a local minimum of the negative log Bayesian posterior. Results are shown on synthetic images with bias/variance plots used to gauge performance. The EM algorithm resulting from the joint mixture framework has the best bias/variance performance when compared with six other closely related algorithms that incorporate anatomical information to varying degrees.
Keywords: Mixture modeling, emission tomography, Bayesian reconstruction, expectation-maximization (EM), alternating algorithms, joint mixtures, algebraic transformations, conjugate priors, gamma distributions, bias/variance performance, region segmentation, projection data, Radon transform

1 Introduction

Maximum likelihood (ML) reconstruction in positron emission tomography (PET) and single-photon emission computed tomography (SPECT) (Shepp and Vardi, 1982) is hampered by the loss of information due to noise and other factors in the projection imaging process. One approach to obtaining improved reconstructions is to use priors (Hebert and Leahy, 1989) that reflect the nature of the underlying radionuclide distribution. Recently, in emission tomography (ET), there has been considerable interest in incorporating side information derived from highly correlated anatomical information, such as co-registered magnetic resonance (MR) images, in the form of Bayesian priors (see the extensive review in Section 2). The main attraction of this data fusion approach is that one expects to obtain improved reconstructions to the extent that function follows anatomy. While the correlation of the spatial distribution of the intensities in functional images and their co-registered anatomical counterparts is often high, there is no relationship between the actual values of the two intensity patterns in common anatomical regions. Consequently, we can expect to gain more purchase by using the anatomical segmentation label information rather than the raw anatomical intensity as prior information in tomographic reconstruction. This correlation between function and anatomy is not perfect: there may be regions in the functional image that do not have an anatomical counterpart. For example, there could be a lesion which is apparent in the radionuclide distribution as a localized “hot spot” of high intensity pixels without a corresponding signature lesion in the anatomical label map.
Therefore, we need a prior model which takes into account the good, but not perfect, correlation between a segmented and co-registered anatomical image and the evolving emission tomographic reconstruction. We can capture the notion of anatomy-function correlation via the joint histogram formed from the raw intensity values of the anatomical and functional images. Qualitatively, we would expect the joint histogram to comprise several localized peaks, each corresponding to a joint cluster of anatomy and function pixels. Where the anatomy-function correlation is high, the volume of the peaks is large. For a function class with no anatomical counterpart, e.g. functional lesions, there could be multiple low-volume peaks in the joint histogram. If there are $K_f$ intensity classes in function and $K_a$ intensity classes in anatomy, a cross-product of as many as $K_f K_a$ classes may appear in the joint histogram. However, since function follows anatomy for the most part, the actual number of peaks seen in the joint histogram is typically much less than $K_f K_a$. The joint histogram is typically seen as a non-parametric estimate of the joint density between anatomy and function. In this paper, we have chosen to use a joint mixture framework to model the joint density between anatomy and function. As opposed to a non-parametric density estimator, the joint mixture framework allows us to associate a parametric density function with each joint anatomy-function class. We also have more information than is present in the joint histogram: often we have a very good estimate of the anatomical segmentation labels, which act as a strong constraint on the joint mixture model. We also employ a prior on the joint mixture occupation probability to express the tight coupling between anatomy and function. The Bayesian maximum a posteriori (MAP) estimate then has the novel interpretation of including mutual information maximization between anatomy and function.
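The joint-histogram picture above can be made concrete with a small sketch. This is purely illustrative (the images, class intensities and noise level below are invented, not taken from the paper): when function roughly follows anatomy, only a few of the possible joint intensity bins are actually occupied.

```python
import numpy as np

# Illustrative sketch: a joint histogram of two co-registered images.
# All image contents here are synthetic placeholders.
rng = np.random.default_rng(0)
anatomy = rng.choice([10.0, 40.0, 80.0], size=(64, 64))     # three anatomical intensity classes
function = 2.0 * anatomy + rng.normal(0.0, 3.0, (64, 64))   # function roughly follows anatomy

# The joint histogram is a non-parametric estimate of the joint density.
hist, a_edges, f_edges = np.histogram2d(
    anatomy.ravel(), function.ravel(), bins=32, density=True)

# Function following anatomy shows up as a few localized peaks:
# far fewer occupied bins than the full cross-product of intensities.
n_occupied = int(np.count_nonzero(hist))
print(n_occupied, "of", hist.size, "joint bins are occupied")
```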
We approach the ET reconstruction problem from a Bayesian perspective wherein the prior is formulated using the aforementioned joint mixture model. We use the familiar Poisson likelihood as a forward model for generating the observed projection data from the underlying radionuclide source distribution. Then, the log-posterior is proportional to the sum of the Poisson log-likelihood and the joint mixture log-prior. Our overarching goal is a MAP estimate of the functional image. The parameters of the joint mixture prior are unknown and also have to be estimated from the projection data (and the known anatomical labels). Since the Poisson log-likelihood is maximized by the well known emission tomographic (ET) expectation-maximization (EM) algorithm (Shepp and Vardi, 1982) and well known EM algorithms exist for mixture decomposition (McLachlan and Basford, 1988; Redner and Walker, 1984), it is natural to combine these two algorithms into one EM algorithm. The result is an alternating descent algorithm, which we simply term the EM algorithm, consisting of an EM outer loop dealing with the likelihood and an EM inner loop dealing with the joint mixture model parameters. It is guaranteed to find a local minimum of the negative log posterior cost function, by virtue of being an alternating descent algorithm.

2 Previous Work

The integration of anatomical side information into ET reconstruction, a problem of some practical import especially in brain imaging, has been addressed by several groups in the past decade. The strategies fall mainly into two categories: those that employ explicit boundary information from the anatomical image, and those that employ implicit boundary information derived from anatomical tissue classification labels at each pixel. In both cases, the anatomy-function correlation is premised on the physiologically plausible assumption of approximately uniform activity in distinct anatomical regions.
In the first category, loci of discontinuities from the co-registered anatomical image are used to spatially modulate the formation of discontinuities in the ET reconstruction. For one class of these methods, the mathematical machinery for imposing side information is couched within a Bayesian MAP estimation problem that uses an edge-preserving Gibbs prior. A spatially varying weight (for line-site or discontinuity formation) modulates the Gibbs potential so as to encourage the formation of discontinuities in the ET reconstruction at sites corresponding to discontinuities in the anatomical image. Implementational details differ, but such strategies have been employed in (Leahy and Yan, 1991; Gindi et al., 1993; Ouyang et al., 1994; Lipinski et al., 1997). A related strategy was employed in (Ardekani et al., 1996) in which an adaptive edge-preserving smoothing filter is calculated using multispectral segmented MR data and applied to the ET reconstruction with the aid of a cross-entropy objective function. All of these approaches are distinguished in the ways they model uncertainty in the location of anatomical discontinuities, errors in the registration, and mismatched resolution of the anatomical and functional images. In cases where the anatomy-function correlation is high, these methods work well. The basic idea in the second approach is similar, but here, use is made of tissue label information at each pixel in the co-registered anatomical image (e.g. “gray”, “white” and “csf” labels derived from a segmented MR brain scan). Through a variety of mathematical strategies, local differences in the label field may be used to again modulate the strength of a smoothing prior. At loci of label discontinuities, the smoothing weight is lowered and discontinuities are allowed to form. Since a given pixel may be composed of different tissues, non-binary “partial-volume” labels are typically used in these models.
In addition, some models posit a uniform concentration of radionuclide for each tissue class, thus accounting for the spatial variation of radionuclide by a weighted (by spatially varying partial-volume tissue labels) sum of activities in each class. At the interface of anatomical regions, usually corresponding to gradual transitions in ET intensity, this use of partial-volume modeling is efficacious for accurate ET reconstruction. This strategy was used in the form of a post correction to a conventionally reconstructed PET brain scan in (Meltzer et al., 1990; Videen et al., 1988; Muller-Gartner et al., 1992), with a later, more sophisticated extension of this approach reported in (Rousset et al., 1998). The incorporation of label information within a Bayesian MAP estimation procedure is reported in (Sastry and Carson, 1997). In a different Bayesian MAP approach (Hsu and Leahy, 1997), pixel labels that are not dominated by one tissue class are flagged and used to invoke a second-order “thin-plate” prior (Lee et al., 1997), corresponding in the ET reconstruction to graded intensity regions; otherwise, conventional first-order smoothing is imposed. This strategy leads to an ET reconstruction containing realistic smooth transition regions separating those areas of uniform activity corresponding to relatively pure tissue types. Again, these models work well where the anatomy-function correlation is high. More complex models, not clearly identified with either of the two categories, have been proposed. In (Zhang et al., 1994), a joint MAP estimation approach is proposed wherein, as with other approaches, anatomical label differences are used to modulate discontinuities in ET. Unlike other approaches, a separate provision is made to account for the possibility of localized intensity regions in the emission reconstruction that do not correspond to label regions in the anatomy image. Some user interaction with an initial ET reconstruction is required, however.
An elegant but very elaborate hierarchical Bayesian model proposed in (Bowsher et al., 1996) is able to robustly account for exceptions to the presumed anatomy-function correlation and, in fact, utilizes anatomical information in the form of an initial estimate. However, the model expressiveness is balanced by the need to estimate several hyperparameters on whose values the performance depends. The framework proposed in this paper is motivated by a desire to capture some of the modeling flexibility afforded in (Zhang et al., 1994) and (Bowsher et al., 1996), but without the need for user interaction or difficult hyperparameter determination. Finally, this work is similar in spirit to recent work on unsupervised category learning using mixture models in the neural net literature (Tenenbaum and Freeman, 1997; de Sa and Ballard, 1998). The themes are very similar; for example, in (Tenenbaum and Freeman, 1997), a mixture model is used to model the joint space of font styles and types, and in (de Sa and Ballard, 1998), the focus is squarely on the unsupervised learning of categories from multimodality data.

3 A Bayesian Joint Mixture Framework

In this section, we begin with a brief description of the standard likelihood model for emission tomographic reconstruction. We then proceed with the development of the joint mixture model. After describing the overall joint mixture framework in some detail, we specialize the mixture parametric density to a gamma distribution. With the likelihood and priors specified, we then describe the EM algorithm: an alternating descent algorithm that minimizes the negative log-posterior.

3.1 The Likelihood Model

We now write down the likelihood for our problem. The likelihood specifies the forward model for generating the observed projection data from the underlying radionuclide (PET or SPECT) source distribution.
We use the conventional model of Poisson noise at the detector (Shepp and Vardi, 1982) with the standard assumption that the detector bin counts are independent of one another. Therefore, the likelihood is

$$\Pr(g \mid f) = \prod_{i=1}^{M} \mathrm{Poisson}\Bigl(g_i ; \sum_{j=1}^{N} H_{ij} f_j\Bigr) = \prod_{i=1}^{M} \frac{e^{-\sum_j H_{ij} f_j} \bigl(\sum_j H_{ij} f_j\bigr)^{g_i}}{g_i!} \qquad (1)$$

where $g$ is the vector of projection data, $f$ is the underlying radionuclide source distribution vector (to be estimated) and $H$ is the forward projection matrix; $G$ and $F$ are the associated random fields. The measurement physics is incorporated in $H$. (While this linear model is appropriate for SPECT, it has to be extended to an affine model for PET. This extension does not affect our formulation.) Throughout this paper, we use $i \in \{1, \ldots, M\}$ as the index for detector measurements and $j \in \{1, \ldots, N\}$ as the index for the radionuclide source distribution. The corresponding negative log-likelihood is

$$E_L(f) = \sum_{i=1}^{M} \Bigl[ \sum_{j=1}^{N} H_{ij} f_j - g_i \log \sum_{j=1}^{N} H_{ij} f_j \Bigr] + \text{terms independent of } f. \qquad (2)$$

Despite the non-quadratic nature of the above log-likelihood, the well known EM algorithm that maximizes the log-likelihood has a simple, elegant form (Shepp and Vardi, 1982). The EM algorithm is conventionally derived using an incomplete/complete data parameterization (Dempster et al., 1977). Lack of space precludes us from presenting this derivation of the EM algorithm for emission tomography. Instead, we present the intuition behind this conventional derivation and then present a shorter, idiosyncratic derivation.

The complete data random field $Z_{ij}$ for emission tomography is the number of photons that are emitted from pixel $j$ and detected at detector bin $i$. The incomplete projection data random field is derived from the straightforward many-to-one mapping $g_i = \sum_{j=1}^{N} z_{ij}$. The original incomplete data likelihood in (1) is now written in terms of the complete data likelihood.
Since the actual complete data is unknown, the incomplete data log-likelihood becomes a function of the expected value of the complete data. These computed expectations comprise the E-step. The M-step is a maximization step carried out after the E-step. For emission tomography, the EM algorithm turns out to be an extremely simple (but usually inefficient) discrete-time dynamical system that is guaranteed to maximize the incomplete data log-likelihood.

We now present an idiosyncratic derivation of the EM algorithm for ET. Borrowing from a similar EM algorithm for mixture decompositions (Hathaway, 1986; Gold et al., 1996; Yuille et al., 1994), we write down the following complete data energy function:

$$E_L(f, z, \lambda) = \sum_{i=1}^{M} \sum_{j=1}^{N} \Bigl[ z_{ij} \log \frac{z_{ij}}{H_{ij} f_j} - z_{ij} + H_{ij} f_j \Bigr] + \sum_{i=1}^{M} \lambda_i \Bigl( g_i - \sum_{j=1}^{N} z_{ij} \Bigr). \qquad (3)$$

In (3), $z_{ij}$ is the complete data corresponding to the expected number of photons emanating from pixel $j$ that are detected at detector $i$. The vector $\lambda$ is a Lagrange parameter vector that expresses the complete-incomplete data constraint $g_i = \sum_j z_{ij}$. It is possible to solve for the three unknowns $(z, \lambda, f)$ by differentiating with respect to each variable and setting the result to zero. Rather surprisingly, this yields the emission tomography EM algorithm originally derived in (Shepp and Vardi, 1982):

$$z_{ij} = \frac{H_{ij} f_j g_i}{\sum_{k=1}^{N} H_{ik} f_k}, \qquad f_j = \frac{\sum_{i=1}^{M} z_{ij}}{\sum_{i=1}^{M} H_{ij}}. \qquad (4)$$

The final algorithm after eliminating $z$ has a simple iterative form which at time step $n$ is

$$f_j^{(n+1)} = \frac{f_j^{(n)}}{\sum_{i=1}^{M} H_{ij}} \sum_{i=1}^{M} \frac{H_{ij}\, g_i}{\sum_{k=1}^{N} H_{ik} f_k^{(n)}}. \qquad (5)$$

While the above derivation is somewhat idiosyncratic, it is especially useful for our purposes. We were able to convert the original incomplete data log-likelihood energy function in (2) into a constrained energy function from which the well known EM algorithm emerges as a straightforward alternating descent algorithm.
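To make the iteration concrete, here is a minimal runnable sketch of the multiplicative update in (5) on a toy system. The matrix sizes, data and iteration count are invented for illustration; the notation follows this section, not any particular implementation.

```python
import numpy as np

# Toy ML-EM (Shepp-Vardi) sketch:
#   f_j <- f_j / (sum_i H_ij) * sum_i H_ij g_i / (H f)_i
# All problem sizes below are made up.
rng = np.random.default_rng(1)
N, M = 16, 24                        # pixels, detector bins
H = rng.random((M, N))               # toy forward projection matrix
f_true = rng.random(N) * 10.0        # toy radionuclide source
g = rng.poisson(H @ f_true)          # Poisson projection data

f = np.ones(N)                       # strictly positive initial estimate
sens = H.sum(axis=0)                 # sensitivity term sum_i H_ij
for _ in range(200):
    ratio = g / (H @ f)              # g_i / (H f)_i
    f = f / sens * (H.T @ ratio)     # multiplicative EM update

# Each pass never decreases the Poisson log-likelihood.
print("relative error:", np.linalg.norm(f - f_true) / np.linalg.norm(f_true))
```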
Exactly the same approach is used later in the constrained joint mixture energy function.

3.2 The joint mixture model

Assume that the anatomical image and the (evolving) functional image are perfectly registered and that the anatomical image has been resampled so that the pixel size is the same as that of the functional image. While it is reasonable to expect that the pixel classifications of the anatomical and functional images will have some degree of overlap, there is also reason to expect that some functional image region class may overlap several anatomical region classes. This notion will be further explicated after notational definitions have been established. The joint mixture prior is written as follows:

$$\Pr(f_j, a_j \mid \Pi, \theta^f, \theta^a) = \sum_{k=1}^{K_f} \sum_{l=1}^{K_a} \pi_{kl} \Pr(f_j, a_j \mid \theta^f_k, \theta^a_l). \qquad (6)$$

In (6), $a$ is the anatomical image intensity with $A$ being the associated random field. The matrix $\Pi = \{\pi_{kl}\}$ is the occupation probability of the two intensity images in a label cross-product space. The constraints on $\Pi$ are $\pi_{kl} \ge 0$ and $\sum_{k=1}^{K_f} \sum_{l=1}^{K_a} \pi_{kl} = 1$. Throughout this paper, $k \in \{1, \ldots, K_f\}$ is the subscript index used for the ET classes and $l \in \{1, \ldots, K_a\}$ is the subscript index used for the anatomical classes. The common index $j$ on $f$ and $a$ denotes a pixel location in either image. Typically, $N$ is much larger than $K_f K_a$ (e.g. for a 2D SPECT reconstruction problem with anatomical MR side information). The parameters $\theta^f$ and $\theta^a$ are associated with the basic modeling distributions of the joint mixture model and will be described below. After specifying the joint mixture model (6), we turn to the prior on the occupation probability matrix $\Pi$.
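A small sketch may help fix ideas. The component densities and all numbers below are invented for illustration (Gaussians stand in for the modeling distributions, which Section 3.3 specializes to a gamma density for the functional component), and the factored form uses the conditional-independence simplification introduced there:

```python
import numpy as np

# Sketch of evaluating a joint mixture density of the form (6) with
# made-up occupation probabilities Pi and Gaussian components.
Kf, Ka = 3, 2
Pi = np.array([[0.30, 0.02],
               [0.03, 0.35],
               [0.10, 0.20]])        # occupation probabilities; Pi.sum() == 1
mu_f = np.array([1.0, 4.0, 9.0])     # functional class centers (invented)
mu_a = np.array([2.0, 7.0])          # anatomical class centers (invented)
sig = 0.5

def normal_pdf(x, mu):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

def joint_density(f_j, a_j):
    # Pr(f_j, a_j) = sum_{k,l} Pi[k, l] * p(f_j | k) * p(a_j | l)
    return float(np.sum(Pi * np.outer(normal_pdf(f_j, mu_f),
                                      normal_pdf(a_j, mu_a))))

print(joint_density(4.1, 6.8))   # near a heavily weighted joint class
print(joint_density(9.0, 2.0))   # a weakly coupled anatomy-function pairing
```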
If a perfect overlap existed between the labels in the functional and anatomical images, the matrix $\Pi$ would be a square matrix and, without loss of generality, the diagonal elements would represent the numbers of common pixels sharing the same label. On the other hand, if the anatomical and functional labels were independent of one another, $\Pi$ could be factored into a product of a column and a row vector, with the column (row) vector consisting of the occupancy probabilities of the functional (anatomical) image. The prior on $\Pi$ can be viewed as a complexity term which, when combined with the joint mixture model, has the novel interpretation of “maximizing the mutual information of $\Pi$” (Wells III et al., 1996). We write the prior on $\Pi$ with an eye toward maximizing the mutual information:

$$\Pr(\Pi) = \frac{1}{Z} \exp\Bigl\{ -\eta \Bigl[ \sum_{k=1}^{K_f} \bar\pi_{k\cdot} \log \bar\pi_{k\cdot} + \sum_{l=1}^{K_a} \bar\pi_{\cdot l} \log \bar\pi_{\cdot l} \Bigr] \Bigr\}, \qquad \bar\pi_{k\cdot} \equiv \sum_{l} \pi_{kl}, \quad \bar\pi_{\cdot l} \equiv \sum_{k} \pi_{kl}, \qquad (7)$$

where $Z$ is the partition function corresponding to this prior density. This partition function is a configuration space integral over all $\Pi$ satisfying the constraints $\pi_{kl} \ge 0$ and $\sum_{k,l} \pi_{kl} = 1$. The parameter $\eta$ is a free parameter (which we later set to the number of ET pixels $N$). The corresponding negative log prior energy function is

$$E_\Pi(\Pi) = \eta \Bigl[ \sum_{k=1}^{K_f} \bar\pi_{k\cdot} \log \bar\pi_{k\cdot} + \sum_{l=1}^{K_a} \bar\pi_{\cdot l} \log \bar\pi_{\cdot l} \Bigr] + \text{terms independent of } \Pi. \qquad (8)$$

This energy function can be viewed as a summation over the negatives of the marginal entropies of $\Pi$.

The only remaining piece of information left unspecified is the anatomical label information. We assume that the anatomical image has been previously segmented and registered with the ET image; for registration purposes, a filtered backprojection (FBP) or EM reconstruction will suffice. We also assume that every anatomical pixel (in the ET coordinate frame) has a region label. We represent this anatomical image segmentation information as a matrix $S$ with each entry denoted by $s_{jl}$.
(Recall that we use $l$ to denote an anatomical class index throughout this paper.) Essentially, the matrix $S$ contains the degree of membership $s_{jl}$ of pixel $j$ in the anatomical image to class $l$. The constraints on $S$ are $s_{jl} \ge 0$ and $\sum_{l=1}^{K_a} s_{jl} = 1$ for each $j \in \{1, \ldots, N\}$. The segmentation information derived from a segmentation algorithm is assumed known. It plays a vital role in the computation of the pixel classifications in the functional image.

Using Bayes’ rule, one could write down the incomplete data posterior energy function consisting of the tomographic likelihood in (1), the joint mixture model in (6) and the prior on $\Pi$ in (7). However, in order to derive the EM algorithm, we switch to the complete data spaces for both the tomographic likelihood and the mixture prior and then write down the complete data posterior energy function.

Having specified the prior on $\Pi$, we return to the joint mixture model. The negative log-prior energy function corresponding to the joint mixture model in (6) is

$$E_m(f, a, \Pi, \theta^f, \theta^a) = -\sum_{j=1}^{N} \log \sum_{k=1}^{K_f} \sum_{l=1}^{K_a} \pi_{kl} \Pr(f_j, a_j \mid \theta^f_k, \theta^a_l). \qquad (9)$$

As in the case of the tomographic likelihood, it is possible to go from the incomplete data constrained joint mixture energy function to the complete data constrained joint mixture energy function. This conversion was (to our knowledge) first pointed out in (Hathaway, 1986) for the canonical single mixture case. The extension to the joint mixture model is straightforward. The complete data energy function uses a new complete data variable $r_{jkl}$, which is a joint classification variable. Since $r_{jkl}$ is jointly the degree of membership of ET pixel $j$ in ET class $k$ and the degree of membership of the anatomical pixel $j$ in anatomical class $l$, the constraints on $r$ are $r_{jkl} \ge 0$ and $\sum_{k=1}^{K_f} r_{jkl} = s_{jl}$.
Note the appearance of the anatomical segmentation labels $s_{jl}$ in this constraint. Taken in conjunction with the constraint $\sum_{l} s_{jl} = 1$, these constraints imply $\sum_{k,l} r_{jkl} = 1$. An example of the complete data $r$ is shown in Figure 1. Note the differences between the anatomical “MR” and functional image regions. Specifically, in this example, there are two “very hot” regions in the functional image that do not exist in the anatomical image. One of the “very hot” regions extends across two anatomical classes. The complete data constrained joint mixture energy function [after adding the log-prior on $\Pi$ terms from (8)] is

$$E_m(r, \Pi, \theta^f, \theta^a, f) = \sum_{j=1}^{N} \sum_{k=1}^{K_f} \sum_{l=1}^{K_a} r_{jkl} \bigl[ \log r_{jkl} - \log \pi_{kl} - \log \Pr(f_j, a_j \mid \theta^f_k, \theta^a_l) \bigr] + \eta \Bigl[ \sum_{k} \bar\pi_{k\cdot} \log \bar\pi_{k\cdot} + \sum_{l} \bar\pi_{\cdot l} \log \bar\pi_{\cdot l} \Bigr] + \sum_{j,l} \mu_{jl} \Bigl( s_{jl} - \sum_{k} r_{jkl} \Bigr) + \nu \Bigl( 1 - \sum_{k,l} \pi_{kl} \Bigr), \qquad (10)$$

where $\mu$ and $\nu$ are Lagrange parameters (vector and scalar, respectively) expressing the constraints $\sum_k r_{jkl} = s_{jl}$ and $\sum_{k,l} \pi_{kl} = 1$. From Bayes’ rule, we know that the log-posterior is proportional to the sum of the log-likelihood and the log-prior. Combining the complete data mixture energy function in (10) with the complete data tomography likelihood energy function in (3), we get the generic form of the final energy function used in this paper. This posterior energy function is

$$E(f, z, r, \Pi, \theta^f, \theta^a) = E_L(f, z, \lambda) + E_m(r, \Pi, \theta^f, \theta^a, f). \qquad (11)$$

3.3 Mixtures of conjugate priors

We have not yet specified the modeling distributions $\Pr(f_j, a_j \mid \theta^f_k, \theta^a_l)$. The first simplifying assumption we make is the independence of the functional image pixel conditional density function with respect to the corresponding anatomical image pixel; i.e. $\Pr(f_j, a_j \mid \theta^f_k, \theta^a_l) = \Pr(f_j \mid \theta^f_k) \Pr(a_j \mid \theta^a_l)$.
This assumption is clearly limiting and forces $\Pi$ to mediate all of the real “coupling” between the anatomical and functional images. The main reason for this assumption is ignorance. While we could model the joint density function using (say) a multivariate Gaussian for each pair $(k, l)$, this greatly increases the complexity of the estimation procedure for the mixture parameters. A second reason for refraining from modeling the above joint density for each pair in the cross-product space is more technical: it turns out that using multivariate Gaussian priors (on $f$) complicates the estimation procedure for $f$.

We use a gamma prior for the prior density $\Pr(f_j \mid \theta^f_k)$. The use of the gamma prior is motivated by the fact that it is conjugate to the Poisson distribution used in the likelihood model. Conjugate priors usually result in highly analytically tractable posterior estimates (Bernardo and Smith, 1994). In addition, the gamma prior naturally constrains $f_j$ to be positive, unlike Gaussian densities. The gamma density with parameters $(\alpha_k, \beta_k)$ is defined as

$$\Pr(f_j \mid \alpha_k, \beta_k) = \frac{\beta_k^{\alpha_k}}{\Gamma(\alpha_k)} f_j^{\alpha_k - 1} e^{-\beta_k f_j}. \qquad (12)$$

The mean of a gamma distribution parameterized in this manner is $\alpha_k / \beta_k$ and the variance is $\alpha_k / \beta_k^2$.

After resorting to the complete data energy function reformulation trick pioneered in (Hathaway, 1986), the complete data constrained joint mixture energy function, which is a specialization of (10) to the gamma prior, is written as

$$E_m(r, \Pi, \alpha, \beta, \theta^a, f) = \sum_{j,k,l} r_{jkl} \bigl[ \log r_{jkl} - \log \pi_{kl} - \alpha_k \log \beta_k + \log \Gamma(\alpha_k) - (\alpha_k - 1) \log f_j + \beta_k f_j - \log \Pr(a_j \mid \theta^a_l) \bigr] + \eta \Bigl[ \sum_{k} \bar\pi_{k\cdot} \log \bar\pi_{k\cdot} + \sum_{l} \bar\pi_{\cdot l} \log \bar\pi_{\cdot l} \Bigr] + \text{(Lagrange terms as in (10))}. \qquad (13)$$
Using the identity $\sum_{j,k,l} r_{jkl} \log \pi_{kl} = N \sum_{k,l} \hat\pi_{kl} \log \pi_{kl}$, where $\hat\pi_{kl} \equiv \frac{1}{N} \sum_{j} r_{jkl}$, setting $\pi_{kl} = \hat\pi_{kl}$ and choosing $\eta = N$, (13) can be rewritten as

$$E_m = \sum_{j,k,l} r_{jkl} \bigl[ \log r_{jkl} - \log \Pr(f_j \mid \alpha_k, \beta_k) - \log \Pr(a_j \mid \theta^a_l) \bigr] - N \sum_{k,l} \hat\pi_{kl} \log \frac{\hat\pi_{kl}}{\hat\pi_{k\cdot}\, \hat\pi_{\cdot l}} + \text{(Lagrange terms)}. \qquad (14)$$

The last term in (14) can be interpreted as the mutual information of the occupancy probability $\Pi$. The prior on $\Pi$ was reverse-engineered in order to obtain the mutual information in the posterior. Equation (14) can be interpreted as a posterior energy function wherein the mutual information of $\Pi$ is maximized subject to the constraints on the complete data and the choice of the modeling distribution (in this case, gamma).

We have now specified in great detail the likelihood, the joint mixture with the gamma distribution as the basic modeling distribution, the prior on $\Pi$ (motivated by mutual information) and the constraints due to the available anatomical segmentation information $s_{jl}$. In addition, we have written down constrained complete data joint mixture energy functions as well as the complete data likelihood. Next, we develop the EM algorithm to minimize the complete data log-posterior energy function.

3.4 The EM algorithm

We now write the full complete data energy function which comprises the complete data tomographic likelihood as well as the complete data joint mixture prior (which includes the prior on $\Pi$). The final energy function used in this paper is a summation of the energy functions in (3) and (13):

$$E(f, z, r, \Pi, \alpha, \beta, \theta^a) = \sum_{i,j} \Bigl[ z_{ij} \log \frac{z_{ij}}{H_{ij} f_j} - z_{ij} + H_{ij} f_j \Bigr] + \sum_{j,k,l} r_{jkl} \bigl[ \log r_{jkl} - \log \pi_{kl} - \log \Pr(f_j \mid \alpha_k, \beta_k) - \log \Pr(a_j \mid \theta^a_l) \bigr] + \eta \Bigl[ \sum_{k} \bar\pi_{k\cdot} \log \bar\pi_{k\cdot} + \sum_{l} \bar\pi_{\cdot l} \log \bar\pi_{\cdot l} \Bigr] + \text{(Lagrange terms)}. \qquad (15)$$
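Before turning to the detailed updates, the alternating structure that this energy function induces can be outlined schematically. The outline below is a sketch in generic notation, not the paper's exact update equations (those are derived in Appendix A):

```
initialize f, Pi, the gamma parameters (alpha, beta) and the anatomical parameters
repeat until f converges:                          # outer EM loop (likelihood)
    E-step: z_ij <- H_ij f_j g_i / sum_k H_ik f_k
    repeat until the mixture estimates converge:   # inner EM loop (joint mixture)
        E-step: update the classifications r_jkl subject to sum_k r_jkl = s_jl
        M-step: update Pi and the gamma parameters
    M-step: update f from z plus the feedback contribution from the mixture prior
```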
In Appendix A, we derive the final set of update equations from the above energy function. The final set of update equations is derived by differentiating w.r.t. each variable in the transformed energy function and setting the result to zero. The result is an alternating descent algorithm that consists of two EM algorithms: one for the likelihood with a feedback contribution from the prior, and one for the joint mixture prior. The mixture EM algorithm is nested inside the outer likelihood EM. The pseudocode is presented below in the box entitled “The EM Algorithm: Mutual Information”.

[Pseudocode box “The EM Algorithm: Mutual Information”: initialize the reconstruction and mixture parameters; Begin A (EM outer loop for the likelihood), do A until the reconstruction converges; within A, Begin B (EM inner loop for the joint mixture parameters), do B until the mixture estimates converge; End B; End A.]

4 Experimental Results

In this section, we compare the performance of the EM algorithm with six other closely related variants. The different algorithms model to varying degrees the inclusion of the various forms of the anatomical information present in the EM algorithm. The seven algorithms are briefly summarized in Appendix B; they are designated EM-1, EM-2, MIX, MR, JENT, MI and MIz. EM-1 and EM-2 are standard EM emission-tomography maximum-likelihood algorithms as in (5), with the only difference between them being the stopping criterion, i.e. the number of iterations.
(The EM maximum likelihood algorithms for ET have an over-fitting problem and hence never actually converge (Shepp and Vardi, 1982).) The algorithm MIX uses a mixture-of-gammas prior to stabilize the reconstruction and does not include any anatomical information. It is a standard Bayesian MAP algorithm with a mixture-of-gammas prior. The fourth algorithm, MR, is derived based on the assumption that the anatomical and functional classes have a perfect overlap. In MR, the complete data classification variable $r_{jk}$ (note the absence of the second subscript $l$) is directly constrained by the anatomical label information $s_{jl}$ and is inhibited from forming “hot spot” lesions in ET that do not appear in anatomy. The fifth and sixth algorithms, JENT and MI, respectively, are EM algorithms with the only difference between them being the prior on $\Pi$. MI, the EM algorithm derived in this paper, is the only algorithm that has the interpretation of mutual information maximization. JENT can be derived from the joint mixture formulation by dropping the prior on $\Pi$. MR can be derived by forcing the classification spaces in anatomy and function to be identical. Finally, the last algorithm, MIz, is used for validation purposes only. It assumes that the complete data variable $r$ is known. Whereas $r$ has to be estimated by the preceding algorithms MR, JENT and MI, MIz assumes it from the outset, leading to a simple EM algorithm with the various mixture parameters “frozen” at their true values.

The functional source “ground truth” image for all the reconstruction experiments is shown in Figure 2. It comprises four regions including a uniform background. The anatomical source image is chosen to be almost the same as the functional source image except that the two central bright regions seen in the functional “ground truth” image are absent. Note that the actual intensity values in the anatomical source image are irrelevant.
For this "ground truth" image, the functional classes outnumber the anatomical classes by one, since the two central bright regions form a functional class with no anatomical counterpart. We generated 300K simulated projection data counts using the phantom in Figure 2. Noiseless projections were generated via line integrals (Radon transform), followed by the addition of Poisson noise. Anecdotal reconstructions using all seven algorithms are shown in Figure 3. The EM-1 reconstruction was obtained with a stopping criterion of 20 iterations. EM-2 used a much larger iteration cap of about 100 iterations. MIX (the mixture-of-gammas prior with no MR information) was executed from an EM-1 initial condition. The inner EM loop in MIX used a reasonable convergence criterion based on the total change in the mixture parameters. We noticed that one of the gamma mixture parameters (regardless of region label) would usually increase without approaching a finite limiting value. To circumvent this problem, we imposed a ceiling of 500 on it. We do not yet understand why this parameter does not approach a finite fixed point. JENT and MI used the same initial conditions as MIX for the mixture parameters. For the occupation probabilities, we used an initial condition which somewhat favored a good overlap between the first three MR and ET classes. This initial condition was not necessary for the algorithm MR, since the anatomical pixel-region segmentation labels exert a much stronger constraint in this case. Finally, when we assume perfect knowledge of the joint complete-data classification matrix in the algorithm MIz, the values of the mixture parameters are frozen and never updated. Since the occupation probability estimate depends on the classification matrix, it is never updated either.

The results shown in Figure 3 show the qualitative behaviors associated with each of the seven algorithms. Since EM-1 uses an early stopping criterion, the reconstructed image looks smoother than its late-stopping-rule EM-2 counterpart. The EM maximum-likelihood reconstruction progressively becomes less biased with increasing number of iterations, at the cost of higher variance. Addition of the mixture-of-gammas prior regularizes the reconstruction.
This can be seen in the MIX reconstruction, which has a piecewise flat appearance (in contrast to EM-1 and EM-2). The inclusion of the anatomical segmentation labels appears from the second row onward in Figure 3. In the MR reconstruction, the high bias incurred from the misclassification of the two central bright regions (in ET) is apparent. The central regions have been assigned to the same class as the outer non-background class. Consequently, the intensities in this region are completely wrong, as anticipated. This anecdotal reconstruction clearly illustrates the bias produced when the ET reconstruction is overly constrained by the anatomical segmentation labels. The JENT and MI reconstructions are quite similar, and both avoid the problems created by the aforementioned over-commitment. Finally, the MIz reconstruction, when compared with the original phantom, demonstrates the improvement that can be obtained when assuming perfect knowledge of the joint classification matrix.

The JENT and MI classification images shown in Figures 4 and 5, respectively, depict the classification matrix in image form. The classification matrices are not binary, despite appearances to the contrary. The differences between JENT and MI are somewhat subtle. Note the big difference in the "outlier" class: the two ET central regions which have no homology in the anatomical image. In JENT, this class is under-represented; the opposite is true for MI.

The bias/variance images shown in Figure 6 demonstrate performance differences amongst the seven algorithms. The bias and variance images were computed using $N$ noise trials. We define the bias image as
$$ b_j = \frac{1}{N} \sum_{n=1}^{N} \hat{f}_j^{(n)} - f_j, $$
where $\hat{f}^{(n)}$ is the estimated reconstruction in trial $n$ and $f$ is the "ground truth" functional source image. We define the variance image as
$$ v_j = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{f}_j^{(n)} - \frac{1}{N} \sum_{m=1}^{N} \hat{f}_j^{(m)} \right)^2. $$
Note that the bias image, being bipolar, is depicted so that gray corresponds to zero bias and lighter (darker) regions correspond to positive (negative) bias. Also, for reasons of dynamic range, we display the standard deviation image as opposed to the variance image; this image is unipolar. The EM-2 reconstruction has a lower bias at the expense of increased variance in comparison to the EM-1 reconstruction. The MR reconstruction shows a high negative bias in the "hot spot" central regions and has a uniformly very low variance. MI has a lower bias and variance than JENT for the central "hot spot" regions. The perfect-information algorithm MIz has a uniformly low bias and almost zero variance.

To illustrate more quantitatively the performance characteristics of the seven algorithms, we compute scalar summary metrics of bias and variance by averaging the bias magnitude and the standard deviation over the $J$ pixels:
$$ \mathrm{bias} = \frac{1}{J} \sum_{j=1}^{J} |b_j|, \qquad \mathrm{std} = \frac{1}{J} \sum_{j=1}^{J} \sqrt{v_j}, $$
and these are displayed in Figure 7. Note that MIz has the best overall performance in the sense of being closest to the origin. Inclusion of the prior on the occupation probabilities in the MI algorithm leads to an overall performance improvement over JENT. As expected, the relative positions of EM-1 and EM-2 illustrate the well-known bias/variance tradeoff with iteration number. In this case, MIX performs about as well as EM-1. Note the poor bias behavior manifest in MR. This is to be expected, given the inordinate influence of the anatomical segmentation label matrix on the complete-data variable. Overall, the inclusion of the various forms of prior information in MI leads to the best bias/variance performance.
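The per-pixel bias and standard-deviation images, and the scalar summaries of the kind plotted in Figure 7, can be sketched as follows. Global pixel averages are assumed for the scalar metrics; the paper's exact aggregation may differ:

```python
import numpy as np

def bias_std_images(recons, x_true):
    """Per-pixel bias and standard-deviation images over N noise trials.
    recons: array of shape (N, n_pixels); x_true: ground-truth image."""
    mean_rec = recons.mean(axis=0)
    bias = mean_rec - x_true        # bipolar: gray level = zero bias
    std = recons.std(axis=0)        # std displayed instead of variance
    return bias, std

def scalar_summary(bias, std):
    """Scalar bias/std metrics for a bias-vs-std scatter plot.
    Mean absolute bias and mean std over pixels are assumed here."""
    return np.abs(bias).mean(), std.mean()
```

Each algorithm then contributes one (std, bias) point; the best performer is the one closest to the origin.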
5 Discussion and Conclusion

We have introduced a novel Bayesian joint mixture framework for incorporating anatomical information into emission tomographic reconstruction. Unlike many previous approaches, this approach has few free parameters; the principal free parameter is the number of ET classes. The number of anatomical classes is presumed known from an independent anatomical segmentation. This approach does not require that the anatomical and functional regions be exactly homologous; the joint mixture model is designed specifically to handle a mismatch of anatomical and functional regions. As illustrated in Section 4, the statistical trials show a progressive improvement in bias/variance performance as more information is included in the joint mixture model. The EM algorithm using a prior on the occupation probabilities, which we term MI, has the best bias/variance performance. This indicates that careful modeling of the joint classification space of anatomy and function is an important factor in obtaining improved functional reconstructions.

While the present framework has room for modeling the raw anatomical intensity information, the EM algorithm does not, in fact, do so. This is a consequence of using a gamma prior: since there is no multivariate extension of the gamma prior, it was difficult to jointly model the anatomical intensity along with the functional intensity. In future work, we plan to overcome this limitation by modeling the pair of functional and anatomical intensities as a multivariate Gaussian. Note that the additional anatomical segmentation information would act as a constraint in this extension. The joint mixture model using a multivariate Gaussian would be forced to respect the segmentation constraints imposed by the independently obtained anatomical segmentation. Unfortunately, this would increase the number of parameters to be estimated. This requires further investigation.
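The proposed multivariate Gaussian extension amounts to fitting a joint density over the paired intensities. A minimal sketch of the maximum-likelihood fit for one class, ignoring the segmentation constraints discussed above; `fit_joint_gaussian` is an illustrative helper, not part of the paper's algorithm:

```python
import numpy as np

def fit_joint_gaussian(f, a):
    """Model the pair (functional, anatomical) intensity at each pixel
    as a bivariate Gaussian. Returns the MLE mean vector and 2x2
    covariance matrix."""
    pairs = np.stack([f, a], axis=0)   # shape (2, n_pixels)
    mu = pairs.mean(axis=1)
    cov = np.cov(pairs, bias=True)     # bias=True gives the MLE (divide by n)
    return mu, cov
```

In the full extension, one such Gaussian would be fitted per joint (anatomical, functional) class, with the class memberships constrained by the independently obtained anatomical segmentation.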
Currently, a perfect registration between the anatomical and functional images is assumed prior to functional image reconstruction. While it is certainly possible to register an anatomical scan and an EM-reconstructed ET image of the same patient, registration errors are likely to be present. We have not specifically taken into account the mis-registration error between anatomy and ET in this paper. However, we think the joint mixture model is robust enough to accommodate small overlap errors. In future work, we intend to model these and other factors in order to exploit this framework to obtain practical reconstruction algorithms for the clinic.

Acknowledgments

We wish to thank Anthony Quinn, Petar Djuric, Michael Leventon and Thomas Hofmann for helpful discussions and Doug Dougherty for help with the figures. This work was partially supported by NIH grant RO1 NS32879.

A Algebraic transformations of the constrained complete data energy functions

An algebraic transformation, essentially a Legendre transformation, transforms an energy function by introducing auxiliary variables (Mjolsness and Garrett, 1990). An example of an algebraic transformation is
$$ \log x = \min_{\sigma > 0} \left( \sigma x - \log \sigma - 1 \right), $$
with the minimum attained at $\sigma = 1/x$. Algebraic transformations are employed when alternating descent algorithms are being designed to minimize complex nonlinear objective functions. Each auxiliary variable results in a separate phase of optimization. We now transform the complete-data constrained mixture energy function in (10). The final objective function is

[Equation (16): the algebraically transformed complete-data mixture energy function.]

The above algebraically transformed energy function involves the mixture prior only.
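A quick numeric check of one standard auxiliary-variable identity of this type, log x = min over sigma &gt; 0 of (sigma*x - log(sigma) - 1), which is assumed here and may differ in detail from the transforms actually used in the paper:

```python
import math

def transformed(x, sigma):
    """Auxiliary-variable form: log x = min_{sigma>0} (sigma*x - log(sigma) - 1),
    with the minimum attained at sigma = 1/x."""
    return sigma * x - math.log(sigma) - 1.0

def check(x):
    """Return the value at the minimizer and the best value over a few
    nearby trial sigmas, to confirm the bound is tight at sigma = 1/x."""
    at_opt = transformed(x, 1.0 / x)
    nearby = min(transformed(x, s) for s in (0.5 / x, 0.9 / x, 1.1 / x, 2.0 / x))
    return at_opt, nearby
```

At the minimizer the bound is tight (it equals log x exactly), which is what makes the resulting alternating minimization an exact reformulation rather than an approximation.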
Since the variables in the prior alone are involved, this is justified. The alternating update equations can be derived by differentiating with respect to each variable and setting the results to zero:

[Equation (17): the closed-form alternating update equations for the mixture parameters and auxiliary variables.]

Since the solutions with respect to all the variables are inter-dependent, we have merely written down the final closed-form solutions on the right-hand side of all the above implications. For example, the actual partial derivative with respect to the occupation probabilities is

[Equation (18): the partial derivative of the transformed energy with respect to the occupation probabilities.]

The solution depends on the Lagrange parameter enforcing the sum-to-one constraint on the occupation probabilities. Solving for the Lagrange parameter using this constraint and substituting back, we get

[Equation (19): the closed-form solution for the occupation probabilities.]

The EM algorithm can be derived by adding in the complete-data likelihood term from (3) and differentiating with respect to the remaining variables. Since we have already derived the original EM maximum-likelihood algorithm for ET in this manner, this derivation will not be repeated here. In the next section, we summarize the seven different algorithms used in this paper.

B Description of the different algorithms

The seven different algorithms used (for comparison purposes) are described here. The names given to the seven algorithms are EM-1, EM-2, MIX, MR, JENT, MI and MIz. The first and second algorithms are standard EM algorithms operating on the original likelihood energy function in (2).
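A minimal sketch of this standard EM maximum-likelihood iteration for emission tomography (the Shepp-Vardi update for y ~ Poisson(Ax)); per the text, a fixed iteration count is the only difference between EM-1 and EM-2:

```python
import numpy as np

def mlem(A, y, n_iters, x0=None):
    """Shepp-Vardi MLEM for emission tomography: y ~ Poisson(A x).
    EM-1 stops early (higher bias, lower variance); EM-2 stops late
    (lower bias, higher variance)."""
    n_pix = A.shape[1]
    x = np.ones(n_pix) if x0 is None else x0.astype(float).copy()
    sens = A.sum(axis=0)                      # sensitivity image, sum_i A_ij
    for _ in range(n_iters):
        proj = A @ x                          # forward projection
        ratio = y / np.maximum(proj, 1e-12)   # data / model, guarded
        x = x * (A.T @ ratio) / np.maximum(sens, 1e-12)
    return x
```

The update is multiplicative, so a non-negative initial image stays non-negative throughout, which is the usual motivation for this iteration in emission tomography.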
Since the EM algorithm for tomographic reconstruction typically has convergence problems, and quite different intensity images can be obtained for different stopping conditions, we use two different EM algorithms whose only difference is the stopping rule. EM-1 uses an early stopping rule (and hence incurs higher bias) whereas EM-2 uses a late stopping rule (and incurs higher variance).

EM-1 and EM-2: Maximum Likelihood
Initialize the reconstruction.
Begin A: Do A for a fixed number of iterations
    Update the reconstruction via the standard EM maximum-likelihood step.
End A

The third algorithm is the mixture-of-gammas algorithm, termed MIX. A mixture-of-gammas prior is added to the emission tomography likelihood energy function. Note that the anatomical segmentation information does not appear in this algorithm; we merely use it for the sake of comparison. An EM algorithm is used for the mixture prior. This results in an EM algorithm for the likelihood and an EM algorithm for the prior. Hence, this is an EM algorithm which has been specialized for a mixture-of-gammas prior.

MIX: Single Mixture of Gamma Prior
Initialize the reconstruction and the gamma mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Update the reconstruction via the likelihood EM step.
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the gamma mixture parameters via the prior EM step.
    End B
End A

The fourth algorithm directly constrains the ET pixel classification variable using the a priori anatomical segmentation. However, the set of ET classification labels is not allowed to differ from that of anatomy. This has the effect, for example, of misclassifying functional lesions which do not appear in anatomy.
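The lesion-suppression effect just described can be illustrated with a toy multiplicative update; `mr_style_update` is a hypothetical illustration of the coupling, not the paper's exact update equation:

```python
import numpy as np

def mr_style_update(z, anat_label, evidence):
    """Toy MR-style update: the ET class responsibilities are multiplied
    by the anatomical label before renormalizing. Any class whose
    anatomical label is zero stays zero forever, so a functional-only
    lesion class can never form."""
    z = z * anat_label * evidence
    return z / z.sum()
```

Iterating this update from any starting point leaves the anatomically forbidden class at exactly zero, no matter how strong the functional evidence for it becomes.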
In the algorithm description, note the direct modulation of the ET pixel classification labels by the anatomical labels. The algorithm is termed MR.

MR: Anatomical Segmentation Constrained Mixture Prior
Initialize the reconstruction and the mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Update the reconstruction via the likelihood EM step.
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the mixture parameters, with the ET classification labels directly modulated by the anatomical labels.
    End B
End A

The first clue that the algorithm MR introduces a significant, unwarranted bias in the reconstruction is the absence of a second, differentiating segmentation index. Since the anatomical and the ET classes are not allowed to vary separately, it follows that the ET pixel classes are very tightly coupled with the anatomical segmentation labels. Unfortunately, such a coupling prohibits, by its very nature, the formation of a functional lesion that is a separate ET class not found in anatomy. An even more egregious error occurs at the algorithmic level: the update equation for the classification variable depends multiplicatively on the anatomical segmentation labels. Consequently, if any anatomical label is zero, the corresponding ET pixel classification variable can never become non-zero.

The fifth algorithm uses the joint mixture model. Since no prior is added on the occupation probabilities, the algorithm can be interpreted as minimizing the joint entropy between the anatomical and functional images. Hence, the algorithm is termed JENT.

JENT: Joint Entropy
Initialize the reconstruction and the joint mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Update the reconstruction via the likelihood EM step.
    Begin B: EM inner loop. Do B until the joint mixture parameters converge
        Update the joint classification variables and mixture parameters via the prior EM step.
    End B
End A

This algorithm does not suffer from the fatal flaw of the previous algorithm MR. In sharp contrast to MR, this algorithm uses the full joint clustering space of ET and anatomy. The joint classification variable is not directly constrained and overridden by the anatomical segmentation. JENT is the first algorithm that allows the formation of a functional lesion that is a separate class and has the expressive power of cutting across several anatomical classes. (This should be a requirement of any approach that uses anatomical side information.)

The sixth algorithm uses the joint mixture model with a prior on the occupation probabilities that is equivalent to maximizing the mutual information between the anatomical and functional images. The algorithm is termed MI. In the algorithm description, note that the update of the pixel classification variable depends on the ratio of the joint class probability to a marginal class probability. If we interpret the occupation probability as the joint class probability over anatomical and functional labels, then this ratio can be interpreted as the conditional probability of the anatomical class label conditioned on the functional label. This is in contrast to the previous algorithm JENT, where the update depends only on the joint class probability. MI shares with its predecessor JENT the capability of expressing the fact that a functional lesion can "outrun" anatomical regions.

MI: Mutual Information
Initialize the reconstruction and the joint mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Update the reconstruction via the likelihood EM step.
    Begin B: EM inner loop. Do B until the joint mixture parameters converge
        Update the joint classification variables, the mixture parameters, and the occupation probabilities (with the mutual information prior) via the prior EM step.
    End B
End A
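The conditional-probability interpretation in the MI update can be made concrete: dividing the joint class probability by the functional-class marginal yields P(anatomical | functional). A hedged sketch; `anat_given_func` is an illustrative helper, not the paper's exact update:

```python
import numpy as np

def anat_given_func(z_joint):
    """From a joint class probability table z_joint[a, k] over
    (anatomical label a, functional label k), form the conditional
    P(a | k) = z_joint[a, k] / sum_a' z_joint[a', k]."""
    marg_func = z_joint.sum(axis=0, keepdims=True)   # functional marginal P(k)
    return z_joint / np.maximum(marg_func, 1e-12)    # columns sum to one
```

Each column of the result is a proper distribution over anatomical labels, which is the quantity that distinguishes the MI update from the joint-probability-only JENT update.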
Finally, for validation purposes, we run MI with perfect information regarding the joint anatomical and functional pixel classification labels. Consequently, we can estimate the gamma prior parameters with ground-truth pixel classification information. This results in an extremely simple EM algorithm, termed MIz.

MIz: Perfect Information
Initialize the reconstruction; freeze the mixture parameters at their ground-truth values.
Begin A: EM loop. Do A until the reconstruction converges
    Update the reconstruction via the likelihood EM step.
End A

References

Ardekani, B. A., Braun, M., Hutton, B. F., Kanno, I., and Ida, H. (1996). Minimum cross-entropy reconstruction of PET images using prior anatomical information. Phys. Med. Biol., 41(11):2497–2517.

Bernardo, J. and Smith, A. (1994). Bayesian Theory. John Wiley and Sons, New York, NY.

Bowsher, J. E., Johnson, V. E., Turkington, T. G., Jaszczak, R. J., and Floyd Jr., C. E. (1996). Bayesian reconstruction and use of anatomical a priori information for emission tomography. IEEE Trans. Med. Imag., 15(5):673–686.

de Sa, V. and Ballard, D. (1998). Category learning through multimodality sensing. Neural Computation, 10(5):1097–1118.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. Ser. B, 39:1–38.

Gindi, G., Lee, M., Rangarajan, A., and Zubal, I. G. (1993). Bayesian reconstruction of functional images using anatomical information as priors. IEEE Trans. Med. Imag., 12(4):670–680.

Gold, S., Rangarajan, A., and Mjolsness, E. (1996).
Learning with preknowledge: clustering with point and graph matching distance measures. Neural Computation, 8(4):787–804. Hathaway, R. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4:53–56. Hebert, T. and Leahy, R. (1989). A generalized EM algorithm for 3-D Bayesian reconstruction for Poisson data using Gibbs priors. IEEE Trans. Medical Imaging, 8(2):194–202. Hsu, C. H. and Leahy, R. (1997). PET image reconstruction incorporating anatomical information using segmented regression. In Hanson, K., editor, Proc. SPIE: Medical Imaging - Image Processing, Vol. 3034, pages 381–392, Newport Beach, CA. SPIE. Leahy, R. and Yan, X. (1991). Incorporation of anatomical MR data for improved functional imaging with PET. In Ortendahl, D. and Llacer, J., editors, Information Processing in Medical Imaging, pages 105–120. Wiley-Liss. Lee, S., Hsiao, I., and Gindi, G. (1997). The thin plate as a regularizer in Bayesian SPECT reconstruction. IEEE Trans Nucl. Sci., 44(3):1381–1387. Lipinski, B., Herzog, H., Rota Kops, E., Oberschelp, W., and Muller-Gartner, H. W. (1997). Expectation-Maximization reconstruction of positron emission tomography images using anatomical magnetic resonance information. IEEE Trans. Med. Imag., 16(2):129–136. 21 McLachlan, G. J. and Basford, K. E. (1988). Mixture models: inference and applications to clustering. Marcel Dekker, New York. Meltzer, C. C., Leal, J. P., Mayberg, H. S., Wagner Jr., H. N., and Frost, J. J. (1990). Correction of PET data for partial volume effects in human cerebral cortex by MR imaging. J. of Computer Assisted Tomography, 14(4):561–570. Mjolsness, E. and Garrett, C. (1990). Algebraic transformations of objective functions. Neural Networks, 3:651–669. Muller-Gartner, H. W., Links, J. M., Prince, J. L., Bryan, R. N., McVeigh, E., Leal, J. P., Davatzikos, C., and Frost, J. J. (1992). 
Measurement of radiotracer concentration in brain gray matter using positron emission tomography: MRI-based correction for partial volume effects. Journal of Cerebral Blood Flow and Metabolism, 12(4):571–583. Ouyang, X., Wong, W. H., Johnson, V. E., Hu, X., and Chen, C. T. (1994). Incorporation of correlated structural images in PET image reconstruction. IEEE Trans. Med. Imag., 13(4):627–640. Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239. Rousset, O. G., Ma, Y., and Evans, A. C. (1998). Correction of partial volume effects in PET: Principle and validation. J. Nuc. Med., 39(5):904–911. Sastry, S. and Carson, R. E. (1997). Multimodality Bayesian algorithm for image reconstruction in PET: A tissue composition model. IEEE Trans. Med. Imag., 16(6):750–761. Shepp, L. and Vardi, Y. (1982). Maximum likelihood reconstruction for emission tomography. IEEE Trans. on Medical Imaging, 1(2):113–122. Tenenbaum, J. and Freeman, W. (1997). Separating style and content. In Advances in Neural Information Processing Systems (NIPS) 9, pages 662–668. MIT Press, Cambridge, MA. Videen, T. O., Perlmutter, J. S., Mintun, M. A., and Raichle, M. E. (1988). Regional correction of positron emission tomography data for the effects of cerebral atrophy. Journal of Cerebral Blood Flow and Metabolism, 8(5):662–670. Wells III, W., Viola, P., Atsumi, H., Nakajima, S., and Kikinis, R. (1996). Multi-modal volume registration by maximization of mutual information. Medical Image Analysis, 1(1):35–52. Yuille, A. L., Stolorz, P., and Utans, J. (1994). Statistical physics, mixtures of distributions, and the EM algorithm. Neural Computation, 6(2):334–340. Zhang, Y., Fessler, J., Clinthorne, N., and Rogers, W. (1994). Joint estimation for incorporating MRI anatomic images into SPECT reconstruction. In Trendler, R., editor, IEEE Nucl. Sci. Sym. Med. Imaging Conf., pages 1256–1260, Norfolk, Virginia USA. IEEE. 
Figures

Figure 1: Illustration of the joint classification. On the left is an anatomical "MR" image with three regions and three classes (gray, white and csf) delineated in gray, blue and black respectively. On the right is a functional image with five regions and four classes (very hot, hot, warm and cold) delineated in yellow, red, green, and blue respectively. The very hot class occurs twice in two spatially separated regions. The joint classification matrices are shown for two pixels: one in the (csf, hot) region and the other in the (gray, very hot) region. Note that the joint classification, being the complete data, is inaccessible; it is estimated by the algorithm.

Figure 2: The "ground truth" functional source image. Its intensity classes are seen in the gray-scale encoding. With the exception of the two bright central regions, the anatomical label field follows the functional image classification. For the bright regions, the anatomical regions correspond to the outer "gray" matter region.

Figure 3: Anecdotal reconstructions. Top row, left: EM-1; middle: EM-2; right: MIX. Middle row, left: MR; middle: JENT; right: MI. Bottom row: MIz. (See text for description of the different algorithms.)

Figure 4: Classification images for the joint entropy (JENT) reconstruction. The columns correspond to anatomical classes and the rows correspond to ET classes. The bottom left image corresponds to the aberrant ET class that does not have an anatomical counterpart. Note the presence of the two central bright regions in this image.

Figure 5: Classification images for the mutual information (MI) reconstruction.
The columns correspond to anatomical classes and the rows correspond to ET classes. The bottom left image corresponds to the aberrant ET class that does not have an anatomical counterpart. Note the presence of the two central bright regions in this image.

Figure 6: Bias/variance images for the seven different algorithms at 300K counts. Top panels: bias images for EM1, EM2, MIX, MR, JENT, MI and MIz. Bottom panels: standard deviation images for the same algorithms. (See text for description.)

Figure 7: Bias/variance performance for the seven different algorithms, plotted as scalar bias versus scalar standard deviation. (See text for description.)