A Bayesian Joint Mixture Framework for the Integration of
Anatomical Information in Functional Image Reconstruction
Anand Rangarajan, Ing-Tsung Hsiao, and Gene Gindi

30th December 1998

Departments of Electrical Engineering and Diagnostic Radiology, Yale University, New Haven, CT.
Departments of Electrical Engineering and Radiology, SUNY at Stony Brook, Stony Brook, NY.

Abstract
We present a Bayesian joint mixture framework for integrating anatomical image intensity and region segmentation information into emission tomographic reconstruction in medical
imaging. The joint mixture framework is particularly well suited for this problem and allows
us to integrate additional available information such as anatomical region segmentation information into the Bayesian model. Since this information is independently available as opposed
to being estimated, it acts as a good constraint on the joint mixture model. After specifying
the joint mixture model, we combine it with the standard emission tomographic likelihood.
The Bayesian posterior is a combination of this likelihood and the joint mixture prior. Since
well known EM algorithms separately exist for both the emission tomography (ET) likelihood
and the joint mixture prior, we have designed a novel EM algorithm that comprises two EM
algorithms—one for the likelihood and one for the prior. Despite being dovetailed in this
manner, the resulting EM algorithm is an alternating descent algorithm that is guaranteed to
converge to a local minimum of the negative log Bayesian posterior. Results are shown on synthetic images with bias/variance plots used to gauge performance. The EM algorithm resulting from the joint mixture framework has the best bias/variance performance when compared
with six other closely related algorithms that incorporate anatomical information to varying
degrees.
Keywords: Mixture modeling, emission tomography, Bayesian reconstruction, expectation-maximization (EM), alternating algorithms, joint mixtures, algebraic transformations, conjugate priors, gamma distributions, bias/variance performance, region segmentation, projection data, Radon transform

1 Introduction
Maximum likelihood (ML) reconstruction in positron emission tomography (PET) and single-photon
loss of information due to noise and other factors in the projection imaging process. One approach to obtaining improved reconstructions is to use priors (Hebert and Leahy, 1989) that reﬂect
the nature of the underlying radionuclide distribution. Recently, in emission tomography (ET),
there has been considerable interest in incorporating side information derived from highly correlated anatomical information, such as coregistered magnetic resonance (MR) images, in the form
of Bayesian priors (see the extensive review in Section 2). The main attraction of this data fusion approach is that one expects to obtain improved reconstructions to the extent that function follows
anatomy.
While the correlation of the spatial distribution of the intensities in functional images and their
coregistered anatomical counterparts is often high, there is no relationship between the actual
values of the two intensity patterns in common anatomical regions. Consequently, we can expect
to gain more purchase by using the anatomical segmentation label information rather than the raw
anatomical intensity as prior information in tomographic reconstruction. This correlation between
function and anatomy is not perfect: there may be regions in the functional image that do not
have an anatomical counterpart. For example, there could be a lesion which is apparent in the
radionuclide distribution as a localized “hot spot” of high intensity pixels without a corresponding
signature lesion in the anatomical label map. Therefore, we need a prior model which takes into
account the good, but not perfect, correlation between a segmented and coregistered anatomical
image and the evolving emission tomographic reconstruction.
We can capture the notion of anatomy-function correlation via the joint histogram formed from
the raw intensity values of the anatomical and functional images. Qualitatively, we would expect
the joint histogram to comprise several localized peaks, each corresponding to a joint cluster of
anatomy and function pixels. Where the anatomy-function correlation is high, the volume of the peaks is large. For a function class with no anatomical counterpart, e.g. functional lesions, there could be multiple low-volume peaks in the joint histogram. If there are K intensity classes in function and L intensity classes in anatomy, a cross product of as many as K·L classes may appear in the joint histogram. However, since function follows anatomy for the most part, the actual number of peaks seen in the joint histogram would be much less than K·L.
The joint histogram is typically seen as a nonparametric estimate of the joint density between
anatomy and function. In this paper, we have chosen to use a joint mixture framework to model
the joint density between anatomy and function. As opposed to a nonparametric density estimator, the joint mixture framework allows us to associate a parametric density function for each joint
anatomy-function class. We also have more information than is present in the joint histogram:
often we have a very good estimate of the anatomical segmentation labels which act as a strong
constraint on the joint mixture model. We also employ a prior on the joint mixture occupation
probability to express the tight coupling between anatomy and function. The Bayesian maximum a posteriori (MAP) estimate then has the novel interpretation of including mutual information maximization between anatomy and function.
We approach the ET reconstruction problem from a Bayesian perspective wherein the prior
is formulated using the aforementioned joint mixture model. We use the familiar Poisson likelihood as a forward model for generating the observed projection data from the underlying radionuclide source distribution. Then, the log-posterior is proportional to the sum of the Poisson log-likelihood and the joint mixture log-prior. Our overarching goal is a MAP estimate of the functional image. The parameters of the joint mixture prior are unknown and also have to be estimated from the projection data (and the known anatomical labels). Since the Poisson log-likelihood is maximized by the well-known emission tomographic (ET) expectation-maximization (EM) algorithm (Shepp and Vardi, 1982) and well-known EM algorithms exist for mixture decomposition (McLachlan and Basford, 1988; Redner and Walker, 1984), it is natural to combine these two algorithms into one EM algorithm. The result is an alternating descent algorithm which consists of an EM outer loop dealing with the likelihood and an EM inner loop dealing with the joint mixture model parameters. By virtue of being an alternating descent algorithm, it is guaranteed to find a local minimum of the negative log Bayesian posterior cost function.

2 Previous Work
The integration of anatomical side information into the ET reconstruction, a problem of some practical import especially in brain imaging, has been addressed by several groups in the past decade.
The strategies fall mainly into two categories: those that employ explicit boundary information
from the anatomical image, and those that employ implicit boundary information derived from
anatomical tissue classification labels at each pixel. In both cases, the anatomy-function correlation is premised on the physiologically plausible assumption of approximately uniform activity
in distinct anatomical regions.
In the ﬁrst category, loci of discontinuities from the coregistered anatomical image are used
to spatially modulate the formation of discontinuities in the ET reconstruction. For one class of
these methods, the mathematical machinery for imposing side information is couched within a
Bayesian MAP estimation problem that uses an edge-preserving Gibbs prior. A spatially varying weight (for line-site or discontinuity formation) modulates the Gibbs potential so as to encourage
the formation of discontinuities in the ET reconstruction at sites corresponding to discontinuities
in the anatomical image. Implementational details differ, but such strategies have been employed
in (Leahy and Yan, 1991; Gindi et al., 1993; Ouyang et al., 1994; Lipinski et al., 1997). A related
strategy was employed in (Ardekani et al., 1996) in which an adaptive edge-preserving smoothing
ﬁlter is calculated using multispectral segmented MR data and applied to the ET reconstruction
with the aid of a cross-entropy objective function. All of these approaches are distinguished in
the ways they model uncertainty in the location of anatomical discontinuities, errors in the registration, and mismatched resolution of the anatomical and functional images. In cases where the
anatomy-function correlation is high, these methods work well.
The basic idea in the second approach is similar, but here, use is made of tissue label information at each pixel in the coregistered anatomical image (e.g. “gray”, “white” and “csf” labels
derived from a segmented MR brain scan). Through a variety of mathematical strategies, local
differences in the label ﬁeld may be used to again modulate the strength of a smoothing prior.
At loci of label discontinuities, the smoothing weight is lowered and discontinuities are allowed
to form. Since a given pixel may be composed of different tissues, non-binary "partial-volume" labels are typically used in these models. In addition, some models posit a uniform concentration of radionuclide for each tissue class, thus accounting for the spatial variation of radionuclide by a weighted (by spatially varying partial-volume tissue labels) sum of activities in each
class. At the interface of anatomical regions, usually corresponding to gradual transitions in ET
intensity, this use of partialvolume modeling is efﬁcacious for accurate ET reconstruction. This
strategy was used in the form of a post correction to a conventionally reconstructed PET brain
scan in (Meltzer et al., 1990; Videen et al., 1988; Muller-Gartner et al., 1992), with a later, more
sophisticated extension of this approach reported in (Rousset et al., 1998). The incorporation of
label information within a Bayesian MAP estimation procedure is reported in (Sastry and Carson,
1997). In a different Bayesian MAP approach (Hsu and Leahy, 1997), pixel labels that are not dominated by one tissue class are flagged and used to invoke a second-order "thin-plate" prior (Lee et al., 1997), corresponding in the ET reconstruction to graded intensity regions; otherwise, conventional first-order smoothing is imposed. This strategy leads to an ET reconstruction containing
realistic smooth transition regions separating those areas of uniform activity corresponding to relatively pure tissue types. Again, these models work well where the anatomy-function correlation
is high.
More complex models, not clearly identified with either of the two categories, have been
proposed. In (Zhang et al., 1994), a joint MAP estimation approach is proposed wherein, as with
other approaches, anatomical label differences are used to modulate discontinuities in ET. Unlike
other approaches, a separate provision is made to account for the possibility of localized intensity regions in the emission reconstruction that do not correspond to label regions in the anatomy
image. Some user interaction with an initial ET reconstruction is required, however. An elegant
but very elaborate hierarchical Bayesian model proposed in (Bowsher et al., 1996) is able to robustly account for exceptions to the presumed anatomyfunction correlation, and, in fact, utilizes
anatomical information in the form of an initial estimate. However, the model expressiveness
is balanced by the need to estimate several hyperparameters on whose values the performance
depends.
The framework proposed in this paper is motivated by a desire to capture some of the modeling ﬂexibility afforded in (Zhang et al., 1994) and (Bowsher et al., 1996), but without the need for
user interaction or difﬁcult hyperparameter determination.
Finally, this work is similar in spirit to recent work on unsupervised category learning using mixture models in the neural net literature (Tenenbaum and Freeman, 1997; de Sa and Ballard, 1998). The themes are very similar; for example, in (Tenenbaum and Freeman, 1997), a mixture model is used to model the joint space of font styles and types, and in (de Sa and Ballard, 1998), the focus is squarely on the unsupervised learning of categories from multimodality data.

3 A Bayesian Joint Mixture Framework
In this section, we begin with a brief description of the standard likelihood model for emission
tomographic reconstruction. We then proceed with the development of the joint mixture model.
After describing the overall joint mixture framework in some detail, we specialize the mixture parametric density to a gamma distribution. With the likelihood and priors specified, we then describe the EM algorithm, an alternating descent algorithm that minimizes the negative log-posterior.

3.1 The Likelihood Model
We now write down the likelihood for our problem. The likelihood speciﬁes the forward model
for generating the observed projection data from the underlying radionuclide (PET or SPECT)
source distribution. We use the conventional model of Poisson noise at the detector (Shepp and
Vardi, 1982) with the standard assumption that the detector bin counts are independent of one
another. Therefore, the likelihood is

\[
\Pr(G = g \mid F = f) \;=\; \prod_{i=1}^{M} \mathrm{Poisson}\Big(g_i;\ \sum_{j=1}^{N} H_{ij} f_j\Big) \;=\; \prod_{i=1}^{M} \frac{e^{-\sum_{j} H_{ij} f_j}\, \big(\sum_{j} H_{ij} f_j\big)^{g_i}}{g_i!}
\tag{1}
\]

where g is the vector of projection data, f is the underlying radionuclide source distribution vector (to be estimated) and H is the forward projection matrix; G and F are the associated random fields. The measurement physics is incorporated in H. (While this linear model is appropriate for SPECT, it has to be extended to an affine model for PET. This extension does not affect our formulation.) Throughout this paper, we use i \in \{1, \ldots, M\} as the index for the detector measurements and j \in \{1, \ldots, N\} as the index for the radionuclide source distribution. The corresponding negative log-likelihood is

\[
E_{\mathrm{like}}(f) \;=\; \sum_{i=1}^{M} \Big[\sum_{j=1}^{N} H_{ij} f_j \;-\; g_i \log \Big(\sum_{j=1}^{N} H_{ij} f_j\Big)\Big] \;+\; \text{terms independent of } f.
\tag{2}
\]
Despite the non-quadratic nature of the above log-likelihood, the well-known EM algorithm that maximizes the log-likelihood has a simple, elegant form (Shepp and Vardi, 1982). The EM algorithm is conventionally derived using an incomplete/complete data parameterization (Dempster
et al., 1977). Lack of space precludes us from presenting this derivation of the EM algorithm for
emission tomography. Instead, we present the intuition behind this conventional derivation and
then present a shorter, idiosyncratic derivation.
The complete data random field Z for emission tomography is the number of photons z_{ij} that are emitted from pixel j and detected at detector bin i. The incomplete projection data random field G is derived from Z via the straightforward many-to-one mapping g_i = \sum_{j} z_{ij}. The original incomplete data likelihood in (1) is now written in terms of the complete data likelihood. Since the actual complete data is unknown, the incomplete data log-likelihood becomes a function of the expected value of the complete data. These computed expectations comprise the E-step. The M-step is a maximization step carried out after the E-step. For emission tomography, the EM algorithm turns out to be an extremely simple (but usually inefficient) discrete-time dynamical system that is guaranteed to maximize the incomplete data log-likelihood.
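The discrete-time EM iteration just described is easy to exercise numerically. The following is a minimal sketch, not the paper's implementation: the system matrix H, the image size, and the count level are all hypothetical.

```python
import numpy as np

# Minimal sketch of the ML-EM iteration for emission tomography.
# H (detector bins x pixels), f_true, and the count level are toys.
rng = np.random.default_rng(0)

M, N = 40, 16                       # detector bins, image pixels
H = rng.random((M, N))              # hypothetical forward projection matrix
f_true = rng.uniform(1.0, 5.0, N)   # hypothetical radionuclide distribution
g = rng.poisson(H @ f_true)         # simulated Poisson projection counts

f = np.ones(N)                      # nonnegative initial estimate
sens = H.sum(axis=0)                # sensitivity term: sum_i H_ij
for _ in range(200):
    proj = H @ f                    # forward projection: sum_j H_ij f_j
    f = (f / sens) * (H.T @ (g / np.maximum(proj, 1e-12)))  # multiplicative update

# Each update preserves nonnegativity, and the sensitivity-weighted total
# activity matches the total counts after every iteration.
print(float((sens * f).sum()), float(g.sum()))
```

The multiplicative form makes the nonnegativity constraint automatic, which is one reason the Shepp-Vardi iteration is so simple despite the non-quadratic likelihood.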
We now present an idiosyncratic derivation of the EM algorithm for ET. Borrowing from a similar EM algorithm for mixture decompositions (Hathaway, 1986; Gold et al., 1996; Yuille et al., 1994), we write down the following complete data energy function:

\[
E_{\mathrm{like}}(z, f, \lambda) \;=\; \sum_{i=1}^{M} \sum_{j=1}^{N} \Big[H_{ij} f_j - z_{ij} \log(H_{ij} f_j) + z_{ij} \log z_{ij} - z_{ij}\Big] \;+\; \sum_{i=1}^{M} \lambda_i \Big(g_i - \sum_{j=1}^{N} z_{ij}\Big).
\tag{3}
\]

In (3), z_{ij} is the complete data corresponding to the expected number of photons emanating from pixel j that are detected at detector i. The vector \lambda is a Lagrange parameter vector that expresses the complete-incomplete data constraint g_i = \sum_{j} z_{ij}. It is possible to solve for the three unknowns (z, f, \lambda) by differentiating with respect to each variable and setting the result to zero. Rather surprisingly, this yields the emission tomography EM algorithm originally derived in (Shepp and Vardi, 1982):

\[
z_{ij} \;=\; \frac{g_i\, H_{ij} f_j}{\sum_{j'=1}^{N} H_{ij'} f_{j'}}.
\tag{4}
\]

The final algorithm after eliminating z has a simple iterative form, which at time step n is

\[
f_j^{(n+1)} \;=\; \frac{f_j^{(n)}}{\sum_{i=1}^{M} H_{ij}} \sum_{i=1}^{M} \frac{H_{ij}\, g_i}{\sum_{j'=1}^{N} H_{ij'} f_{j'}^{(n)}}.
\tag{5}
\]

While the above derivation is somewhat idiosyncratic, it is especially useful for our purposes. We were able to convert the original incomplete data log-likelihood energy function in (2) into a constrained energy function from which the well-known EM algorithm emerges as a straightforward alternating descent algorithm. Exactly the same approach is used later in the constrained joint mixture energy function.

3.2 The joint mixture model
Assume that the anatomical image and the (evolving) functional image are perfectly registered
and that the anatomical image has been resampled so that the pixel size is the same as that of
the functional image. While it is reasonable to expect that the pixel classiﬁcations of the anatomical
and functional images will have some degree of overlap, there is also reason to expect that some
functional image region class may overlap several anatomical region classes. This notion will be
further explicated after notational deﬁnitions have been established.
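The idea can be made concrete with a generative sketch: each pixel draws a joint class (k, l) with probability given by an occupation matrix, then draws its functional and anatomical intensities from class-conditional densities. All parameter values below are hypothetical illustrations, not the paper's phantom:

```python
import numpy as np

# Generative sketch of a joint mixture over functional/anatomical pixels.
# The occupation matrix Pi, gamma parameters, and anatomical means are toys.
rng = np.random.default_rng(2)

Pi = np.array([[0.45, 0.05],        # occupation probabilities (entries sum to 1);
               [0.05, 0.45]])       # mass on the diagonal = strong overlap
alpha = np.array([9.0, 36.0])       # gamma shape per functional class
beta = np.array([3.0, 6.0])         # gamma rate per functional class
mu_a = np.array([50.0, 120.0])      # anatomical intensity mean per class

n_pix = 1000
joint = rng.choice(Pi.size, size=n_pix, p=Pi.ravel())
k, l = np.unravel_index(joint, Pi.shape)   # joint class labels per pixel
f = rng.gamma(alpha[k], 1.0 / beta[k])     # functional intensity (mean alpha/beta)
a = rng.normal(mu_a[l], 5.0)               # anatomical intensity

print(f.shape, a.shape)
```

A "hot spot" with no anatomical counterpart corresponds to off-diagonal mass in the occupation matrix: a functional class paired with several anatomical classes.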
The joint mixture prior is written as follows:

\[
\Pr(f_j, a_j \mid \Pi, \Theta) \;=\; \sum_{k=1}^{K} \sum_{l=1}^{L} \Pi_{kl}\, \Pr\big(f_j \mid \Theta^{f}_{k}\big)\, \Pr\big(a_j \mid \Theta^{a}_{l}\big).
\tag{6}
\]

In (6), a is the anatomical image intensity with A being the associated random field. The matrix \Pi is the occupation probability of the two intensity images in a K \times L label cross-product space. The constraints on \Pi are \Pi_{kl} \ge 0 and \sum_{k=1}^{K} \sum_{l=1}^{L} \Pi_{kl} = 1. Throughout this paper, k \in \{1, \ldots, K\} is the subscript index used for the ET classes and l \in \{1, \ldots, L\} is the subscript index used for the anatomical classes. The common index j on f and a denotes a pixel location in either image. Typically, N is much larger than K \cdot L (e.g. for a 2D SPECT reconstruction problem with anatomical MR side information). The parameters \Theta^{f}_{k} and \Theta^{a}_{l} are associated with the basic modeling distributions of the joint mixture model and will be described
After specifying the joint mixture model (6), we turn to the prior on the occupation probability
matrix . If a perfect overlap existed between the labels in the functional and anatomical image,
¤ ¤ the matrix would be a square matrix and without loss of generality, the diagonal elements would represent the numbers of common pixels sharing the same label. On the other hand, if the anatomical and functional labels were independent of one another, could be factored into a product of
¤ a column and a row vector with the column (row) vector consisting of the occupancy probabil¤ ities of the functional (anatomical) image. The prior on can be viewed as a complexity term, which, when combined with the joint mixture model has the novel interpretation of “maximizing
the mutual information of ” (Wells III et al., 1996). We write the prior on with an eye toward
¤ ¤ maximizing the mutual information:
@
6 E4 (7) 1
QI
RPG& 6 4 2
CDB58 @A489H¥ 72 F 53&
A 0 ¨ £
¤ ) ¥ QIG
7SH F where
is the partition function corresponding to this prior density. This partition function
is a conﬁguration space integral over all satisfying the constraints I&
T ¤ V ¤ 7 !U
. The parameter I IQQQIH
%W TRRSP%F D H ¤ & and is a free parameter (which we later set to the W IQQQIH D
'¡ TRRRP%F ` #
! ). The corresponding negative log prior energy function is
terms independent of (8) Q ¤ ¤
£¢
¤ ¢
£r y
Pxwv ¤ % ¤ % r P vw
y b number of ET pixels ¡
r ¨ £ Vh¥ e ¤ ¤ V This energy function can be viewed as a summation over the negative of the marginal entropies
¤ of .
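The mutual information in question is just that of the occupation matrix viewed as a joint distribution over the K x L class pairs. A small sketch (the example matrices are hypothetical):

```python
import numpy as np

# Mutual information of an occupation matrix Pi (nonnegative, sums to 1).
def mutual_information(Pi):
    Pi = np.asarray(Pi, dtype=float)
    pf = Pi.sum(axis=1)                  # functional-class marginal
    pa = Pi.sum(axis=0)                  # anatomical-class marginal
    mask = Pi > 0
    return float(np.sum(Pi[mask] * np.log(Pi[mask] / np.outer(pf, pa)[mask])))

# Perfect label overlap (diagonal Pi): MI equals the marginal entropy.
mi_diag = mutual_information(np.diag([0.5, 0.3, 0.2]))
# Independent labels (rank-one Pi): MI is zero.
mi_indep = mutual_information(np.outer([0.5, 0.5], [0.6, 0.4]))
print(mi_diag, mi_indep)
```

The two extremes mirror the text: a diagonal (well-overlapped) occupation matrix scores high, while a factorizable (independent) one scores zero.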
The only remaining piece of information left unspecified is the anatomical label information. We assume that the anatomical image has been previously segmented and registered with the ET image; for registration purposes, a filtered backprojection (FBP) or EM reconstruction will suffice. We also assume that every anatomical pixel (in the ET coordinate frame) has a region label. We represent this anatomical image segmentation information as a matrix q with each entry denoted by q_{jl}, j \in \{1, \ldots, N\}, l \in \{1, \ldots, L\}. (Recall that we use l to denote an anatomical class index throughout this paper.) Essentially, the matrix q contains the degree of membership of pixel j in the anatomical image to class l. The constraints on q are q_{jl} \ge 0 and \sum_{l=1}^{L} q_{jl} = 1. The segmentation information derived from a segmentation algorithm is assumed known. It plays a vital role in the computation of the pixel classifications in the functional image.

Using Bayes' rule, one could write down the incomplete data posterior energy function consisting of the tomographic likelihood in (1), the joint mixture model in (6) and the prior on \Pi in (7). However, in order to derive the EM algorithm, we switch to the complete data spaces for both the tomographic likelihood and the mixture prior and then write down the complete data posterior energy function.
Having specified the prior on \Pi, we return to the joint mixture model. The negative log-prior energy function corresponding to the joint mixture model in (6) is

\[
E_{\mathrm{mix}}(f, a; \Pi, \Theta) \;=\; -\sum_{j=1}^{N} \log \sum_{k=1}^{K} \sum_{l=1}^{L} \Pi_{kl}\, \Pr\big(f_j \mid \Theta^{f}_{k}\big)\, \Pr\big(a_j \mid \Theta^{a}_{l}\big).
\tag{9}
\]

As in the case of the tomographic likelihood, it is possible to go from the incomplete data constrained joint mixture energy function to the complete data constrained joint mixture energy function. This conversion was (to our knowledge) first pointed out in (Hathaway, 1986) for the canonical single mixture case. The extension to the joint mixture model is straightforward. The complete data energy function uses a new complete data variable \tilde{z}_{jkl}, which is a joint classification variable. The constraints on \tilde{z} are \tilde{z}_{jkl} \ge 0 and \sum_{k=1}^{K} \tilde{z}_{jkl} = q_{jl} for j \in \{1, \ldots, N\} and l \in \{1, \ldots, L\}. Note the appearance of the anatomical segmentation labels q. Since \tilde{z}_{jkl} is the degree of membership of ET pixel j in ET class k and q_{jl} is the degree of membership of the anatomical pixel in anatomical class l, the constraint \sum_{k} \tilde{z}_{jkl} = q_{jl} in the above follows. These constraints, taken in conjunction with the constraint \sum_{l=1}^{L} q_{jl} = 1, imply that \sum_{k=1}^{K} \sum_{l=1}^{L} \tilde{z}_{jkl} = 1. An example of the complete data \tilde{z} is shown in Figure 1. Note the differences between the anatomical "MR" and functional image regions. Specifically, in this example, there are two "very hot" regions in the functional image that do not exist in the anatomical image. One of the "very hot" regions extends across two anatomical classes.
The complete data constrained joint mixture energy function [after adding the log-prior on \Pi terms from (8)] is

\[
\begin{aligned}
E_{\mathrm{mix}}(\tilde{z}, \Pi, \Theta, \nu, \mu) \;=\; & -\sum_{j=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \tilde{z}_{jkl} \Big[\log \Pi_{kl} + \log \Pr\big(f_j \mid \Theta^{f}_{k}\big) + \log \Pr\big(a_j \mid \Theta^{a}_{l}\big)\Big] \;+\; \sum_{j,k,l} \tilde{z}_{jkl} \log \tilde{z}_{jkl} \\
& +\; \sum_{j=1}^{N} \sum_{l=1}^{L} \nu_{jl} \Big(q_{jl} - \sum_{k=1}^{K} \tilde{z}_{jkl}\Big) \;+\; \mu \Big(\sum_{k,l} \Pi_{kl} - 1\Big) \;+\; \gamma \Big[\sum_{k=1}^{K} \Pi^{f}_{k} \log \Pi^{f}_{k} + \sum_{l=1}^{L} \Pi^{a}_{l} \log \Pi^{a}_{l}\Big].
\end{aligned}
\tag{10}
\]

In (10), \nu and \mu are Lagrange parameters (vector and scalar, respectively) expressing the constraints \sum_{k} \tilde{z}_{jkl} = q_{jl} and \sum_{k,l} \Pi_{kl} = 1.

From Bayes' rule, we know that the log-posterior is proportional to the sum of the log-likelihood and the log-prior. Combining the complete data mixture energy function in (10) with the complete data tomography likelihood energy function in (3), we get the generic form of the final energy function used in this paper. This posterior energy function is

\[
E_{\mathrm{post}}(f, z, \tilde{z}, \Pi, \Theta, \lambda, \nu, \mu) \;=\; E_{\mathrm{like}}(z, f, \lambda) \;+\; E_{\mathrm{mix}}(\tilde{z}, \Pi, \Theta, \nu, \mu).
\tag{11}
\]

3.3 Mixtures of conjugate priors

We have not yet specified the modeling distributions \Pr(f_j \mid \Theta^{f}_{k}) and \Pr(a_j \mid \Theta^{a}_{l}). The first simplifying assumption we make (already used in writing (6)) is the independence of the functional image pixel conditional density function with respect to the corresponding anatomical image pixel; i.e. \Pr(f_j, a_j \mid \Theta^{f}_{k}, \Theta^{a}_{l}) = \Pr(f_j \mid \Theta^{f}_{k}) \Pr(a_j \mid \Theta^{a}_{l}). This assumption is clearly limiting and forces \Pi to mediate all of the real "coupling" between the anatomical and functional images. The main reason for this assumption is ignorance. While we could model the joint density function \Pr(f_j, a_j) using (say) a multivariate Gaussian for each pair (k, l), this greatly increases the complexity of the estimation procedure for the mixture parameters \Theta. A second reason for refraining from modeling the above joint density for each pair (k, l) in the cross-product space is more technical. It turns out that using multivariate Gaussian priors (on f) complicates the estimation procedure for f.

We use a gamma prior for the prior density \Pr(f_j \mid \Theta^{f}_{k}). The use of the gamma prior is motivated by the fact that it is conjugate to the Poisson distribution used in the likelihood model. Conjugate priors usually result in highly analytically tractable posterior estimates (Bernardo and Smith, 1994). In addition, the gamma prior naturally constrains f_j to be positive, unlike Gaussian densities. The gamma density with parameters \Theta^{f}_{k} = (\alpha_k, \beta_k) is defined as

\[
\Pr\big(f_j \mid \Theta^{f}_{k}\big) \;=\; \mathrm{Ga}(f_j; \alpha_k, \beta_k) \;\stackrel{\mathrm{def}}{=}\; \frac{\beta_k^{\alpha_k}}{\Gamma(\alpha_k)}\, f_j^{\alpha_k - 1}\, e^{-\beta_k f_j}.
\tag{12}
\]

The mean of a gamma distribution parameterized in this manner is \alpha_k / \beta_k and the variance is \alpha_k / \beta_k^2.

After resorting to the complete data energy function reformulation trick pioneered in (Hathaway, 1986), the complete data constrained joint mixture energy function, which is a specialization
of (10) to the gamma prior, is written as

\[
\begin{aligned}
E_{\mathrm{mix}}(\tilde{z}, \Pi, \alpha, \beta, \nu, \mu) \;=\; & -\sum_{j,k,l} \tilde{z}_{jkl} \Big[\log \Pi_{kl} + \log \mathrm{Ga}(f_j; \alpha_k, \beta_k) + \log \Pr\big(a_j \mid \Theta^{a}_{l}\big)\Big] \;+\; \sum_{j,k,l} \tilde{z}_{jkl} \log \tilde{z}_{jkl} \\
& +\; \sum_{j,l} \nu_{jl} \Big(q_{jl} - \sum_{k} \tilde{z}_{jkl}\Big) \;+\; \mu \Big(\sum_{k,l} \Pi_{kl} - 1\Big) \;+\; \gamma \Big[\sum_{k} \Pi^{f}_{k} \log \Pi^{f}_{k} + \sum_{l} \Pi^{a}_{l} \log \Pi^{a}_{l}\Big].
\end{aligned}
\tag{13}
\]

Using the identity \sum_{j,k,l} \tilde{z}_{jkl} \log \Pi_{kl} = N \sum_{k,l} \Pi_{kl} \log \Pi_{kl} [which holds when \Pi_{kl} = \frac{1}{N} \sum_{j} \tilde{z}_{jkl}], and setting \gamma = N, (13) can be rewritten as

\[
\begin{aligned}
E_{\mathrm{mix}}(\tilde{z}, \Pi, \alpha, \beta, \nu, \mu) \;=\; & -\sum_{j,k,l} \tilde{z}_{jkl} \Big[\log \mathrm{Ga}(f_j; \alpha_k, \beta_k) + \log \Pr\big(a_j \mid \Theta^{a}_{l}\big)\Big] \;+\; \sum_{j,k,l} \tilde{z}_{jkl} \log \tilde{z}_{jkl} \\
& +\; \sum_{j,l} \nu_{jl} \Big(q_{jl} - \sum_{k} \tilde{z}_{jkl}\Big) \;+\; \mu \Big(\sum_{k,l} \Pi_{kl} - 1\Big) \;-\; N \sum_{k,l} \Pi_{kl} \log \frac{\Pi_{kl}}{\Pi^{f}_{k}\, \Pi^{a}_{l}}.
\end{aligned}
\tag{14}
\]
The last term in (14) can be interpreted as (the negative of N times) the mutual information of the occupancy probability \Pi. The prior on \Pi was reverse-engineered in order to obtain the mutual information in the posterior. Equation (14) can be interpreted as a posterior energy function wherein the mutual information of \Pi is maximized subject to the constraints on the complete data and the choice of the modeling distribution (in this case, gamma).

We have now specified in great detail the likelihood, the joint mixture with the gamma distribution as the basic modeling distribution, the prior on \Pi (motivated by mutual information) and the constraints due to the available anatomical segmentation information q. In addition, we have written down constrained complete data joint mixture energy functions as well as the complete data likelihood. Next, we develop the EM algorithm to minimize the complete data log-posterior energy function.

3.4 The EM algorithm

We now write the full complete data energy function, which comprises the complete data tomographic likelihood as well as the complete data joint mixture prior (which includes the prior on \Pi). The final energy function used in this paper is a summation of the energy functions in (3) and in (13):
\[
\begin{aligned}
E(f, z, \tilde{z}, \Pi, \alpha, \beta, \lambda, \nu, \mu) \;=\; & \sum_{i=1}^{M} \sum_{j=1}^{N} \Big[H_{ij} f_j - z_{ij} \log(H_{ij} f_j) + z_{ij} \log z_{ij} - z_{ij}\Big] \;+\; \sum_{i=1}^{M} \lambda_i \Big(g_i - \sum_{j=1}^{N} z_{ij}\Big) \\
& -\sum_{j,k,l} \tilde{z}_{jkl} \Big[\log \Pi_{kl} + \log \mathrm{Ga}(f_j; \alpha_k, \beta_k) + \log \Pr\big(a_j \mid \Theta^{a}_{l}\big)\Big] \;+\; \sum_{j,k,l} \tilde{z}_{jkl} \log \tilde{z}_{jkl} \\
& +\; \sum_{j,l} \nu_{jl} \Big(q_{jl} - \sum_{k} \tilde{z}_{jkl}\Big) \;+\; \mu \Big(\sum_{k,l} \Pi_{kl} - 1\Big) \;+\; \gamma \Big[\sum_{k} \Pi^{f}_{k} \log \Pi^{f}_{k} + \sum_{l} \Pi^{a}_{l} \log \Pi^{a}_{l}\Big].
\end{aligned}
\tag{15}
\]
¤ In Appendix A, we derive the ﬁnal set of update equations from the above energy function.
The ﬁnal set of update equations is derived by differentiating w.r.t. each variable in the transformed energy function and setting the result to zero. The result is an alternating descent algorithm that consists of two EM algorithms—one for the likelihood with a feedback contribution
from the prior and one for the joint mixture prior. The mixture EM algorithm is nested inside the
outer likelihood EM. The pseudocode is presented below in the box entitled “The EM Algorithm: Mutual Information”.
The EM Algorithm: Mutual Information

Initialize f (e.g. with an EM reconstruction), \Pi, and the gamma parameters (\alpha, \beta).
Begin A: EM outer loop. Do A until f converges:
    z_{ij} \leftarrow g_i H_{ij} f_j / \sum_{j'} H_{ij'} f_{j'}
    Begin B: EM inner loop. Do B until (\tilde{z}, \Pi, \alpha, \beta) converge:
        \tilde{z}_{jkl} \leftarrow q_{jl}\, \Pi_{kl}\, \mathrm{Ga}(f_j; \alpha_k, \beta_k) / \sum_{k'} \Pi_{k'l}\, \mathrm{Ga}(f_j; \alpha_{k'}, \beta_{k'})
        Update \Pi, \alpha and \beta by minimizing (15) with \tilde{z} held fixed (Appendix A).
    End B
    f_j \leftarrow \big[\sum_i z_{ij} + \sum_{k,l} \tilde{z}_{jkl} (\alpha_k - 1)\big] / \big[\sum_i H_{ij} + \sum_{k,l} \tilde{z}_{jkl}\, \beta_k\big]
End A

4 Experimental Results
In this section, we compare the performance of the EM algorithm with six other closely related
variants. The different algorithms model to varying degrees the inclusion of the various forms of the anatomical information present in the EM algorithm. The seven algorithms are briefly summarized in Appendix B; they are designated by EM1, EM2, MIX, MR, JENT, MI and MIz. The
EM1 and EM2 are standard EM emission-tomography maximum-likelihood algorithms as in (5)
with the only difference between them being the stopping criterion, i.e. the number of iterations.
(The EM maximum-likelihood algorithms for ET have an overfitting problem and hence never actually converge (Shepp and Vardi, 1982).) The algorithm MIX uses a mixture of gamma prior
to stabilize the reconstruction and does not include any anatomical information. It is a standard
Bayesian MAP algorithm with a mixture of gamma prior. The fourth algorithm, MR, is derived
based on the assumption that the anatomical and functional classes have a perfect overlap. In MR, the complete data classification variable \tilde{z}_{jk} (note the absence of the second subscript l) is directly constrained by the anatomical label information q and is inhibited from forming "hot spot" lesions in ET that do not appear in anatomy. The fifth and sixth algorithms, JENT and MI respectively, are EM algorithms with the only difference between them being the prior on \Pi. MI, the EM algorithm derived in this paper, is the only algorithm that has the interpretation of mutual information maximization. JENT can be derived from the joint mixture formulation by dropping the prior on \Pi. MR can be derived by forcing the classification spaces in anatomy and function to be identical. Finally, the last algorithm, MIz, is used for validation purposes only. It assumes that the complete data variable \tilde{z} is known. Whereas \tilde{z} has to be estimated by the preceding algorithms MR, JENT and MI, MIz assumes it from the outset, leading to a simple EM algorithm with the various mixture parameters "frozen" at their true values.
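Bias/variance performance of this kind is typically gauged by reconstructing many independent noise realizations of the same phantom and tallying per-pixel statistics. A minimal sketch, with a plain ML-EM stand-in for the reconstruction algorithm (all sizes and matrices hypothetical):

```python
import numpy as np

# Monte-Carlo bias/variance tally for a reconstruction algorithm.
# `reconstruct` is a stand-in (plain ML-EM); H and f_true are toys.
rng = np.random.default_rng(1)

f_true = np.array([1.0, 4.0, 2.0, 1.0])   # toy ground-truth activity
H = rng.random((12, 4))                    # toy projection matrix

def reconstruct(g, n_iter=100):
    f = np.ones_like(f_true)
    for _ in range(n_iter):
        f = (f / H.sum(axis=0)) * (H.T @ (g / np.maximum(H @ f, 1e-12)))
    return f

trials = np.array([reconstruct(rng.poisson(H @ f_true)) for _ in range(50)])
bias = trials.mean(axis=0) - f_true        # per-pixel bias
variance = trials.var(axis=0)              # per-pixel variance
print(bias, variance)
```

Plotting such bias against variance while sweeping a regularization knob (iteration count, prior strength) yields the bias/variance curves used to compare the seven algorithms.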
The functional source “ground truth” image for all the reconstruction experiments is shown in
Figure 2. It comprises four regions including a uniform background. The anatomical source image
is chosen to be almost the same as the functional source image except that the two central bright
regions seen in the functional “ground truth” image are absent. Note that the actual intensity
values in the anatomical source image are irrelevant. For this "ground truth" image, there is one
more functional class than there are anatomical classes: the two central bright regions form a
functional class with no anatomical counterpart. We generated 300K simulated projection data counts using the phantom in Figure 2. Noiseless
projections were generated via line integrals (Radon transform) followed by the addition of Poisson noise. Anecdotal reconstructions using all seven algorithms are shown in Figure 3. The EM1
reconstruction was obtained with a stopping criterion of 20 iterations. EM2 used a much larger
iteration cap of about 100 iterations. MIX—the mixture of gamma prior with no MR information—
was executed using an EM1 initial condition. The inner EM loop in MIX used a reasonable convergence criterion based on the total change in the mixture parameters. We noticed that one of the
mixture parameters (regardless of region label) would usually increase without approaching a
finite limiting value. To circumvent this problem, we imposed a ceiling of 500 on it. We do not yet
understand why it does not approach a finite fixed point. JENT and MI used the same initial
conditions as MIX for the mixture parameters. For the occupation probabilities, we used
an initial condition which somewhat favored a good overlap between the first three MR and ET
classes. This initial condition was not necessary for the algorithm MR since the anatomical pixel-region segmentation labels exert a much stronger constraint in this case. Finally, when we assume
perfect knowledge about the joint complete data classification matrix in the algorithm MIz, the
values of the classification matrix are frozen and never updated. Since the mixture parameters
depend on it, they are never updated as well.
The results shown in Figure 3 show the qualitative behaviors associated with each of the seven
algorithms. Since EM1 uses an early stopping criterion, the reconstructed image looks smoother
than its late-stopping-rule EM2 counterpart. The EM maximum-likelihood reconstruction progressively becomes less biased with increasing number of iterations at the cost of higher variance.
Addition of the mixture of gamma prior regularizes the reconstruction. This can be seen in the MIX
reconstruction which has a piecewise ﬂat appearance (in contrast to EM1 and EM2). Inclusion of
the anatomical segmentation labels appears from the second row onward in Figure 3. In the MR reconstruction, the high bias incurred from the misclassification of the two central bright regions (in ET)
is apparent. The central regions have been classified in the same class as the outer non-background
class. Consequently, the intensities in this region are completely wrong, as anticipated. This anecdotal reconstruction clearly illustrates the bias produced when the ET reconstruction is overly
constrained by the anatomical segmentation labels. The JENT and MI reconstructions are quite
similar and both avoid the problems created by the aforementioned overcommitment. Finally,
the MIz reconstruction, when compared with the original phantom, demonstrates the improvement that can be obtained when assuming perfect knowledge about the joint classiﬁcation matrix
.
The JENT and MI classification images shown in Figures 4 and 5, respectively, depict the classification matrix in image form. The classification matrices are not binary despite appearances to the contrary. The differences between JENT and MI are somewhat subtle. Note the big difference in the "outlier" class—the two ET central regions which have no homology in the anatomical
image. In JENT, this class is underrepresented and the opposite is true for MI.
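The quantities behind the JENT/MI contrast can be sketched numerically. Assuming a small joint class-probability matrix (the matrix below is invented for illustration and is not the paper's data), joint entropy and mutual information follow from the joint distribution and its marginals:

```python
import numpy as np

def joint_entropy(p):
    """Entropy of a (joint or marginal) class-probability array."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def mutual_information(p):
    """Mutual information I(A; F) = H(A) + H(F) - H(A, F) for a joint matrix p[a, f]."""
    pa = p.sum(axis=1)  # anatomical marginal
    pf = p.sum(axis=0)  # functional marginal
    return joint_entropy(pa) + joint_entropy(pf) - joint_entropy(p)

# Invented 3 (anatomical) x 4 (functional) joint class probabilities; the last
# functional column plays the role of an "outlier" class with no anatomical
# homology, its mass spread across all anatomical classes.
p_joint = np.array([[0.30, 0.02, 0.01, 0.03],
                    [0.02, 0.25, 0.02, 0.03],
                    [0.01, 0.02, 0.25, 0.04]])
p_joint /= p_joint.sum()

je = joint_entropy(p_joint)       # JENT minimizes this quantity
mi = mutual_information(p_joint)  # MI maximizes this quantity
```

Note that mutual information never exceeds either marginal entropy, which is why the two criteria weight the outlier class differently.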
The bias/variance images shown in Figure 6 demonstrate performance differences amongst
the seven algorithms. The bias and variance images were computed over repeated noise trials.
Writing $\hat{\lambda}^{(t)}_j$ for the estimated reconstruction at pixel $j$ in trial $t$
($t = 1, \ldots, T$) and $\lambda^*_j$ for the "ground truth" functional source image, we define
the bias image as

$$b_j = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}^{(t)}_j - \lambda^*_j$$

and the variance image as

$$v_j = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{\lambda}^{(t)}_j - \frac{1}{T} \sum_{s=1}^{T} \hat{\lambda}^{(s)}_j \right)^2 .$$
§ Note that the bias image being bipolar is depicted so that gray corresponds to zero bias and lighter
(darker) regions correspond to positive (negative) bias. Also, for reasons of dynamic range, we
display the standard deviation image as opposed to the variance image. This image is unipolar.
The EM2 reconstruction has a lower bias at the expense of increased variance in comparison to the
EM1 reconstruction. The MR reconstruction shows a high negative bias in the "hot spot" central regions and has a uniformly very low variance. MI has a lower bias and variance than JENT for
the central "hot spot" regions. The perfect information algorithm MIz has a uniformly low bias
and almost zero variance.
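The bias and standard-deviation images of Figure 6 can be reproduced in outline as follows; the ground-truth image, trial count, and noise model below are placeholders, not the paper's phantom or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a "ground truth" image and T noisy reconstructions.
truth = np.ones((8, 8))
T = 50
recons = truth + 0.1 * rng.standard_normal((T, *truth.shape))

mean_recon = recons.mean(axis=0)
bias_image = mean_recon - truth   # bipolar: gray would correspond to zero bias
std_image = recons.std(axis=0)    # unipolar; displayed instead of the variance

# One illustrative scalar aggregation for a bias/std scatter plot.
scalar_bias = np.abs(bias_image).mean()
scalar_std = std_image.mean()
```

Collapsing the images to scalars, as done for Figure 7, can use any pixel aggregation; the simple means above are one illustrative choice.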
To illustrate more quantitatively the performance characteristics of the seven algorithms, we
compute scalar summary metrics of bias and variance from the bias and standard-deviation
images, and these are displayed in Figure 7. Note that MIz has the best overall performance in the
sense of being closest to the origin. Inclusion of the prior on the occupation probabilities in the
MI algorithm leads to an overall performance improvement over JENT. As expected, the relative
positions of EM1 and EM2 illustrate well known bias/variance tradeoffs with iteration number.
In this case, MIX performs about as well as EM1. Note the poor bias behavior manifest in MR.
This is to be expected due to the inordinate influence of the anatomical segmentation label matrix
on the complete data variable. Overall, the inclusion of the various forms of prior information in
MI leads to the best bias/variance performance.

5 Discussion and Conclusion
We have introduced a novel Bayesian joint mixture framework for incorporating anatomical information into emission tomographic reconstruction. Unlike many other previous approaches,
this approach has few free parameters. In fact, the principal free parameter is the number of
ET classes. Note that the number of anatomical classes is presumed known from an independent anatomical segmentation. This approach does not require that the anatomical and functional
regions are exactly homologous; the joint mixture model is designed speciﬁcally to handle a mismatch of anatomical and functional regions. As illustrated in Section 4, the statistical trials show
a progressive improvement in bias/variance performance as more information is included in the
joint mixture model. The EM algorithm using a prior on the occupation probabilities, which
we term MI, has the best bias/variance performance. This indicates that careful modeling of the joint classiﬁcation space of anatomy and function is an important factor in obtaining improved
functional reconstructions.
While in the present framework there is room for modeling the raw anatomical intensity information, the EM algorithm does not, in fact, do so. This is a consequence of using a gamma prior.
Since there is no multivariate extension of the gamma prior, it was difficult to jointly model the
anatomical intensity along with the functional intensity. In future work, we plan to overcome
this limitation by modeling the pair of functional and anatomical intensities as a multivariate
Gaussian. Note that the additional anatomical segmentation information would act as a constraint
in this extension. The joint mixture model using a multivariate Gaussian would be forced to respect
the segmentation constraints imposed by the anatomical label field, which is independently obtained.
Unfortunately, this would increase the number of parameters to be estimated. This requires further investigation.
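As a rough sketch of the proposed multivariate-Gaussian extension (names and data are ours; the paper does not specify an implementation), the per-class parameters of paired intensities could be estimated under a fixed, independently obtained segmentation:

```python
import numpy as np

def class_gaussians(func_img, anat_img, labels):
    """Per-class mean vector and covariance of paired (functional, anatomical)
    intensities, with class membership fixed by an independently obtained
    segmentation, so the labels act as a hard constraint rather than an estimate."""
    params = {}
    pairs = np.stack([func_img.ravel(), anat_img.ravel()], axis=1)
    flat = labels.ravel()
    for k in np.unique(flat):
        x = pairs[flat == k]
        params[k] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return params

# Invented two-class example with distinct functional and anatomical intensities.
rng = np.random.default_rng(1)
labels = np.repeat(np.array([0, 1]), 200).reshape(20, 20)
func_img = np.where(labels == 0, 1.0, 3.0) + 0.1 * rng.standard_normal((20, 20))
anat_img = np.where(labels == 0, 0.5, 2.0) + 0.1 * rng.standard_normal((20, 20))
params = class_gaussians(func_img, anat_img, labels)
```

In a full joint mixture, these class-conditional Gaussians would replace the gamma prior terms while the segmentation constraint keeps the anatomical labels fixed.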
Currently, a perfect registration between the anatomical and functional images is assumed
prior to functional image reconstruction. While it is certainly possible to register an anatomical
scan and an EM reconstructed ET image of the same patient, registration errors are likely to be
present. We have not speciﬁcally taken into account the misregistration error between anatomy
and ET in this paper. However, we think the joint mixture model is robust enough to accommodate
small overlap errors. In future work, we intend to model these and other factors in order to exploit
this framework to obtain practical reconstruction algorithms for the clinic.

Acknowledgments

We wish to thank Anthony Quinn, Petar Djuric, Michael Leventon and Thomas Hofmann for helpful discussions and Doug Dougherty for help with the figures. This work was partially supported
by NIH grant RO1 NS32879.

A Algebraic transformations of the constrained complete data energy functions
An algebraic transformation—essentially a Legendre transformation—transforms an energy function by introducing auxiliary variables (Mjolsness and Garrett, 1990). A standard example of an
algebraic transformation is

$$\log x = \min_{\sigma > 0} \left( \sigma x - \log \sigma - 1 \right),$$

with the minimum attained at $\sigma = 1/x$. Algebraic transformations are employed when alternating descent algorithms are being designed
to minimize complex nonlinear objective functions. Each auxiliary variable results in a separate
phase of optimization.
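One such transformation is the identity $\log x = \min_{\sigma > 0} (\sigma x - \log \sigma - 1)$, attained at $\sigma = 1/x$; a quick numerical check (ours, for illustration only) over a dense grid of auxiliary values:

```python
import numpy as np

def log_via_auxiliary(x, sigmas):
    """Minimize (sigma*x - log(sigma) - 1) over a grid of auxiliary variables;
    in exact arithmetic the minimum equals log(x), attained at sigma = 1/x."""
    return np.min(sigmas * x - np.log(sigmas) - 1.0)

x = 2.5
sigmas = np.linspace(1e-3, 5.0, 200001)  # dense grid covering sigma = 1/x = 0.4
approx = log_via_auxiliary(x, sigmas)
```

In an alternating-descent algorithm, the minimization over sigma becomes its own update phase, which is exactly how the auxiliary variables enter the EM derivation.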
We now transform the complete data constrained mixture energy function in (10). The final
objective function, labeled (16), augments the constrained mixture energy with one auxiliary
variable per transformed term. The algebraically transformed energy function involves the mixture
prior only; since the variables in the prior alone are involved, this is justified. The alternating
update equations can be derived by differentiating with respect to each variable in turn and
setting the results to zero.
These updates, collected in (17), give the closed-form solutions for the occupation probabilities,
the gamma prior parameters, and the auxiliary variables. Since the solutions with respect to all
the variables are interdependent, we have merely written down the final closed-form solutions on
the right-hand side of all the above implications. For example, the actual partial derivative with
respect to the occupation probabilities, shown in (18), depends on the solution for the auxiliary
variables. Solving for the Lagrange parameter using the normalization constraint on the occupation
probabilities and substituting the solution yields the update in (19).

The EM algorithm can be derived by adding in the complete data likelihood term from (3) and
differentiating with respect to the reconstruction and classification variables. Since we have already
derived the original EM maximum likelihood algorithm for ET in this manner, this derivation will
not be repeated here. In the next section, we summarize the seven different algorithms used in this paper.

B Description of the different algorithms
The seven different algorithms used (for comparison purposes) are described here. The names
given to the seven algorithms are EM1, EM2, MIX, MR, JENT, MI and MIz.
The ﬁrst and second algorithms are standard EM algorithms operating on the original likelihood energy function in (2). Since the EM algorithm (for tomographic reconstruction) typically
has convergence problems and quite different intensity images can be obtained for different stopping conditions, we use two different EM algorithms with the only difference between them being
the stopping rule. EM1 uses an early stopping rule (and hence incurs higher bias) whereas EM2
uses a late stopping rule (and incurs higher variance).
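The shared ML-EM iteration can be sketched on a toy problem (the system matrix, counts, and sizes below are invented for illustration; only the update rule follows Shepp and Vardi, 1982):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy system matrix A (detector bins x image pixels) and true emission image.
A = rng.uniform(0.1, 1.0, size=(12, 6))
lam_true = np.array([1.0, 1.0, 4.0, 4.0, 1.0, 1.0])

# Poisson-distributed projection data, as in the paper's simulations.
g = rng.poisson(A @ lam_true)

# ML-EM: lam <- lam * A^T (g / (A lam)) / A^T 1, run for a fixed iteration
# count; the choice of that count is what distinguishes EM1 from EM2.
lam = np.ones(6)
sens = A.sum(axis=0)  # A^T 1, the sensitivity image
for _ in range(100):
    lam *= (A.T @ (g / (A @ lam))) / sens
```

A useful sanity check on any implementation is the count-preservation property of ML-EM: after each full iteration, the total predicted counts equal the total measured counts.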
EM1 and EM2: Maximum Likelihood

Initialize the reconstruction.
Begin A: Do A for a fixed number of iterations
    Update the reconstruction via the ML-EM step.
End A

The third algorithm is the mixture of gamma algorithm, termed MIX. A mixture of gamma
prior is added to the emission tomography likelihood energy function. Note that the anatomical
segmentation information does not appear in this algorithm. We merely use this algorithm for
the sake of comparison. An EM algorithm is used for the mixture prior. This results in an EM
algorithm for the likelihood and an EM algorithm for the prior. Hence, this is an EM algorithm which has been specialized for a mixture of gamma prior.

MIX: Single Mixture of Gamma Prior

Initialize the reconstruction and the mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the mixture parameters via the mixture EM step.
    End B
    Update the reconstruction via the EM reconstruction step.
End A
The fourth algorithm directly constrains the ET pixel classification variable using the a priori
anatomical segmentation. However, the set of ET classification labels is not allowed to be different
from that of anatomy. This has the effect, for example, of misclassification of functional lesions
which do not appear in anatomy. In the algorithm description, note the direct modulation of the
ET pixel classification labels by the anatomical labels. The algorithm is termed MR.

MR: Anatomical Segmentation Constrained Mixture Prior

Initialize the reconstruction and the mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the mixture parameters via the mixture EM step, with the classification variable directly modulated by the anatomical labels.
    End B
    Update the reconstruction via the EM reconstruction step.
End A
The first clue that the algorithm MR introduces a significant, unwarranted bias in the reconstruction is the absence of a second, differentiating segmentation index. Since the anatomical and the
ET classes are not allowed to separately vary, it follows that the ET pixel classes are very tightly
coupled with the anatomical segmentation labels. Unfortunately, such a coupling prohibits, by its
very nature, the formation of a functional lesion that is a separate ET class not found in anatomy.
An even more egregious error occurs at the algorithmic level. The update equation for the
classification variable depends multiplicatively on the anatomical segmentation labels. Consequently, if any anatomical label is zero, the corresponding ET pixel classification variable can never be nonzero.
The ﬁfth algorithm uses the joint mixture model. Since no prior is added on the occupation
probabilities, the algorithm can be interpreted as minimizing the joint entropy between the
anatomical and functional images. Hence, the algorithm is termed JENT.

JENT: Joint Entropy

Initialize the reconstruction and the mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the joint classification variable and the mixture parameters via the mixture EM step.
    End B
    Update the reconstruction via the EM reconstruction step.
End A

This algorithm does not suffer from the fatal flaw of the previous algorithm MR. In sharp contrast
to MR, this algorithm uses the full joint clustering space of ET and anatomy. The joint classification variable is not directly constrained and overridden by the anatomical labels. JENT is the first
algorithm that allows the formation of a functional lesion that is a separate class and has the
expressive power of cutting across several anatomical classes. (This should be a requirement of
any approach that uses anatomical side information.)
The sixth algorithm uses the joint mixture model with a prior on the occupation probabilities
that is equivalent to maximizing the mutual information between the anatomical and functional
images. The algorithm is termed MI. In the algorithm description, note that the update of the pixel
classification variable depends on the occupation probabilities. If we interpret the occupation
probability as the joint class probability, then the update involves the conditional probability of
the anatomical class label conditioned on the functional label, obtained by normalizing the joint
probability by the appropriate marginal. This is in contrast to the previous algorithm JENT, where
the update depends only on the joint class probability. MI shares with its predecessor JENT the
capability of expressing the fact that a functional lesion can "outrun" anatomical regions.
MI: Mutual Information

Initialize the reconstruction and the mixture parameters.
Begin A: EM outer loop. Do A until the reconstruction converges
    Begin B: EM inner loop. Do B until the mixture parameters converge
        Update the joint classification variable and the mixture parameters via the mixture EM step, with the occupation probability update modified by the prior.
    End B
    Update the reconstruction via the EM reconstruction step.
End A

Finally, for validation purposes, we run MI with perfect information regarding the joint anatomical and functional pixel classification labels. Consequently, we can estimate the gamma prior
parameters with ground-truth pixel classification information. This results in an extremely simple
EM algorithm, termed MIz.

MIz: Perfect Information

Initialize the reconstruction, with the mixture parameters frozen at the values computed from the ground-truth classification.
Begin A: EM loop. Do A until the reconstruction converges
    Update the reconstruction via the EM reconstruction step.
End A

References
Ardekani, B. A., Braun, M., Hutton, B. F., Kanno, I., and Ida, H. (1996). Minimum cross-entropy reconstruction of PET images using prior anatomical information. Phys. Med. Biol., 41(11):2497–
2517.
Bernardo, J. and Smith, A. (1994). Bayesian Theory. John Wiley and Sons, New York, NY.
Bowsher, J. E., Johnson, V. E., Turkington, T. G., Jaszczak, R. J., and Floyd Jr., C. E. (1996). Bayesian
reconstruction and use of anatomical a priori information for emission tomography. IEEE
Trans. Med. Imag., 15(5):673–686.
de Sa, V. and Ballard, D. (1998). Category learning through multimodality sensing. Neural Computation, 10(5):1097–1118.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data
via the EM algorithm. J. R. Statist. Soc. Ser. B, 39:1–38.
Gindi, G., Lee, M., Rangarajan, A., and Zubal, I. G. (1993). Bayesian reconstruction of functional
images using anatomical information as priors. IEEE Trans. Med. Imag., 12(4):670–680.
Gold, S., Rangarajan, A., and Mjolsness, E. (1996). Learning with preknowledge: clustering with
point and graph matching distance measures. Neural Computation, 8(4):787–804.
Hathaway, R. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics and Probability Letters, 4:53–56.
Hebert, T. and Leahy, R. (1989). A generalized EM algorithm for 3D Bayesian reconstruction for
Poisson data using Gibbs priors. IEEE Trans. Med. Imag., 8(2):194–202.
Hsu, C. H. and Leahy, R. (1997). PET image reconstruction incorporating anatomical information using segmented regression. In Hanson, K., editor, Proc. SPIE: Medical Imaging - Image
Processing, Vol. 3034, pages 381–392, Newport Beach, CA. SPIE.
Leahy, R. and Yan, X. (1991). Incorporation of anatomical MR data for improved functional imaging with PET. In Ortendahl, D. and Llacer, J., editors, Information Processing in Medical Imaging,
pages 105–120. Wiley-Liss.
Lee, S., Hsiao, I., and Gindi, G. (1997). The thin plate as a regularizer in Bayesian SPECT reconstruction. IEEE Trans Nucl. Sci., 44(3):1381–1387.
Lipinski, B., Herzog, H., Rota Kops, E., Oberschelp, W., and Müller-Gärtner, H. W. (1997).
Expectation-maximization reconstruction of positron emission tomography images using
anatomical magnetic resonance information. IEEE Trans. Med. Imag., 16(2):129–136.

McLachlan, G. J. and Basford, K. E. (1988). Mixture models: inference and applications to clustering.
Marcel Dekker, New York.
Meltzer, C. C., Leal, J. P., Mayberg, H. S., Wagner Jr., H. N., and Frost, J. J. (1990). Correction of
PET data for partial volume effects in human cerebral cortex by MR imaging. J. of Computer
Assisted Tomography, 14(4):561–570.
Mjolsness, E. and Garrett, C. (1990). Algebraic transformations of objective functions. Neural
Networks, 3:651–669.
Müller-Gärtner, H. W., Links, J. M., Prince, J. L., Bryan, R. N., McVeigh, E., Leal, J. P., Davatzikos,
C., and Frost, J. J. (1992). Measurement of radiotracer concentration in brain gray matter using
positron emission tomography: MRI-based correction for partial volume effects. Journal of
Cerebral Blood Flow and Metabolism, 12(4):571–583.
Ouyang, X., Wong, W. H., Johnson, V. E., Hu, X., and Chen, C. T. (1994). Incorporation of correlated
structural images in PET image reconstruction. IEEE Trans. Med. Imag., 13(4):627–640.
Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239.
Rousset, O. G., Ma, Y., and Evans, A. C. (1998). Correction of partial volume effects in PET:
Principle and validation. J. Nuc. Med., 39(5):904–911.
Sastry, S. and Carson, R. E. (1997). Multimodality Bayesian algorithm for image reconstruction in
PET: A tissue composition model. IEEE Trans. Med. Imag., 16(6):750–761.
Shepp, L. and Vardi, Y. (1982). Maximum likelihood reconstruction for emission tomography.
IEEE Trans. Med. Imag., 1(2):113–122.
Tenenbaum, J. and Freeman, W. (1997). Separating style and content. In Advances in Neural Information Processing Systems (NIPS) 9, pages 662–668. MIT Press, Cambridge, MA.
Videen, T. O., Perlmutter, J. S., Mintun, M. A., and Raichle, M. E. (1988). Regional correction of
positron emission tomography data for the effects of cerebral atrophy. Journal of Cerebral Blood
Flow and Metabolism, 8(5):662–670.
Wells III, W., Viola, P., Atsumi, H., Nakajima, S., and Kikinis, R. (1996). Multimodal volume
registration by maximization of mutual information. Medical Image Analysis, 1(1):35–52.
Yuille, A. L., Stolorz, P., and Utans, J. (1994). Statistical physics, mixtures of distributions, and the
EM algorithm. Neural Computation, 6(2):334–340.
Zhang, Y., Fessler, J., Clinthorne, N., and Rogers, W. (1994). Joint estimation for incorporating MRI
anatomic images into SPECT reconstruction. In Trendler, R., editor, IEEE Nucl. Sci. Sym. Med.
Imaging Conf., pages 1256–1260, Norfolk, Virginia USA. IEEE.
Figures

Figure 1: Illustration of the joint classification. On the left is an anatomical "MR" image with
three regions and three classes (gray, white and csf) delineated in gray, blue and black colors
respectively. On the right is a functional image with five regions and four classes (very hot, hot,
warm and cold) delineated in yellow, red, green, and blue respectively. The very hot class occurs
twice in two spatially separated regions. The joint classification matrices are shown for two
pixels—one in the (csf, hot) region and the other in the (gray, very hot) region. Note that the joint
classification, being the complete data, is inaccessible. It is estimated by the algorithm.
Figure 2: The "ground truth" functional source image. It has several intensity classes as seen in
the gray scale encoding. With the exception of the two bright central regions, the anatomical label
field follows the functional image classification. For the bright regions, the anatomical regions
correspond to the outer "gray" matter region.

Figure 3: Anecdotal Reconstructions: Top row. Left: EM1. Middle: EM2. Right: MIX. Middle
row. Left: MR. Middle: JENT. Right: MI. Bottom row: MIz. (See text for description of the
different algorithms.)
Figure 4: Classification images for the joint entropy (JENT) reconstruction. The columns correspond to the anatomical classes and the rows correspond to the ET classes. The bottom left
image corresponds to the aberrant ET class that does not have an anatomical counterpart. Note
the presence of the two central bright regions in this image.

Figure 5: Classification images for the mutual information (MI) reconstruction. The columns
correspond to the anatomical classes and the rows correspond to the ET classes. The bottom
left image corresponds to the aberrant ET class that does not have an anatomical counterpart. Note
the presence of the two central bright regions in this image.

Figure 6: Bias/variance images for the seven different algorithms (300K counts). (See text for description.)

Figure 7: Bias/variance performance for the seven different algorithms. (See text for description.)