v other than the bias parameter are set to zero in (13). Thus, the form of the interaction potential given in (13) effectively combines both terms of the earlier model in (8). A geometric interpretation of the interaction potential is that it partitions the space induced by the relational features $\mu_{ij}(y)$ between the pairs that have the same labels and the pairs that have different labels. Hence (13) acts as a data-dependent discontinuity adaptive model that moderates smoothing when the data from the two sites are 'different'. The data-dependent smoothing can be especially useful for absorbing errors in modeling the association potential. Anisotropy can be easily included in the DRF model by parametrizing the interaction potentials of different directional pairwise cliques with different sets of parameters $v$.

4.2. Parameter learning and inference

Let $\theta$ be the set of DRF parameters, where $\theta = \{w, v\}$. As shown in Section 3.5.3, pseudolikelihood tends to overestimate the interaction parameters, causing the MAP estimates of the field to be very poor solutions. Our experiments in Section 4.3 verify these observations for the interaction parameters $v$ in modified DRFs too. To alleviate this problem, we take a Bayesian approach to get the maximum a posteriori estimates of the parameters. Similar to the concept of weight decay in the neural learning literature, we assume a Gaussian prior over the interaction parameters $v$ such that $p(v \mid \tau) = \mathcal{N}(v; 0, \tau^2 I)$, where $I$ is the identity matrix. Using a prior over the parameters $w$ that leads to weight decay or shrinkage might also be beneficial, but we leave that for future exploration. The prior over the parameters $w$ is assumed to be uniform. Thus, given $M$ independent training images,

$$
\hat{\theta} = \arg\max_{\theta} \sum_{m=1}^{M} \sum_{i \in S} \left\{ \log \sigma\!\left(x_i w^T h_i(y)\right) + \sum_{j \in \mathcal{N}_i} x_i x_j v^T \mu_{ij}(y) - \log z_i \right\} - \frac{1}{2\tau^2} v^T v \qquad (15)
$$

where

$$
z_i = \sum_{x_i \in \{-1, 1\}} \exp\left\{ \log \sigma\!\left(x_i w^T h_i(y)\right) + \sum_{j \in \mathcal{N}_i} x_i x_j v^T \mu_{ij}(y) \right\}
$$

If $\tau$ is given, the penalized log pseudolikelihood in (15) is concave with respect to the model parameters (equivalently, its negative is convex) and can be easily maximized using gradient ascent. In related work regarding the estimation of $\tau$, Mackay (1996) has suggested the use of type II marginal likelihood. But in the DRF formulation, integrating out the parameters $v$ is a hard problem. Another choice is to integrate out $\tau$ by choosing a non-informative hyperprior on $\tau$, as in (Williams, 1995) and (Figueiredo, 2001). However, our experiments showed that these methods do not yield good estimates of the parameters because of the use of pseudolikelihood in our framework. In the present work we choose $\tau$ by cross-validation. Alternative ways of parameter estimation include the use of contrastive divergence (Hinton, 2002) and saddle point approximations resembling perceptron learning rules (Collins, 2002). We are currently exploring other possibilities of parameter learning, as discussed in our recent work (Kumar et al., 2005).

Footnote 6: An early version of this work appeared in Advances in Neural Information Processing Systems (NIPS 03) (Kumar and Hebert, 2003a).
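As an illustration of the penalized log pseudolikelihood in (15), the following is a minimal NumPy sketch for a single training image, not the authors' implementation. Several choices here are assumptions made only for the example: the sites form a 1-D ring with two neighbours each, the site features h_i(y) are a bias plus the observed value, the relational features μ_ij(y) are a bias plus an absolute data difference, and the optimizer is plain fixed-step gradient ascent. The names `make_toy_chain`, `penalized_log_pl`, and `fit` are hypothetical.

```python
import numpy as np

def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    return -np.logaddexp(0.0, -t)

def make_toy_chain(n, seed=0):
    """Toy data (assumed for illustration): n sites on a ring, labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    x = np.where((idx // 10) % 2 == 0, 1.0, -1.0)       # piecewise-constant labels
    y = x + 0.8 * rng.standard_normal(n)                 # noisy observations
    h = np.column_stack([np.ones(n), y])                 # h_i(y): bias + datum
    nbrs = np.column_stack([(idx - 1) % n, (idx + 1) % n])
    # mu_ij(y): bias + |y_i - y_j|, shape (n, 2 neighbours, 2 features)
    mu = np.stack([np.ones((n, 2)), np.abs(y[:, None] - y[nbrs])], axis=-1)
    return x, h, mu, nbrs

def penalized_log_pl(w, v, x, h, mu, nbrs, tau):
    """Objective of (15) for one image, plus its gradients w.r.t. w and v."""
    t = h @ w                                            # w^T h_i(y)
    m = np.einsum("ik,ikd->id", x[nbrs], mu)             # sum_j x_j mu_ij(y)
    s = m @ v                                            # sum_j x_j v^T mu_ij(y)
    term_pos = log_sigmoid(t) + s                        # bracket for x_i = +1
    term_neg = log_sigmoid(-t) - s                       # bracket for x_i = -1
    log_z = np.logaddexp(term_pos, term_neg)             # log z_i
    term_obs = np.where(x == 1, term_pos, term_neg)
    obj = np.sum(term_obs - log_z) - (v @ v) / (2.0 * tau**2)

    p_pos = np.exp(term_pos - log_z)                     # local conditional P(x_i = +1)
    p_neg = 1.0 - p_pos
    # d/dt of log sigma(x_i t) for each label: sigma(-t) for +1, -sigma(t) for -1.
    gw_pos = np.exp(log_sigmoid(-t))
    gw_neg = -np.exp(log_sigmoid(t))
    gw_obs = np.where(x == 1, gw_pos, gw_neg)
    coef_w = gw_obs - (p_pos * gw_pos + p_neg * gw_neg)  # observed minus expected
    coef_v = x - (p_pos - p_neg)
    grad_w = h.T @ coef_w
    grad_v = m.T @ coef_v - v / tau**2                   # Gaussian prior on v only
    return obj, grad_w, grad_v

def fit(x, h, mu, nbrs, tau, lr=None, n_iter=500):
    """Fixed-step gradient ascent on the penalized log pseudolikelihood."""
    lr = 0.05 / len(x) if lr is None else lr             # conservative step (assumption)
    w = np.zeros(h.shape[1])
    v = np.zeros(mu.shape[2])
    for _ in range(n_iter):
        _, gw, gv = penalized_log_pl(w, v, x, h, mu, nbrs, tau)
        w += lr * gw
        v += lr * gv
    return w, v

if __name__ == "__main__":
    x, h, mu, nbrs = make_toy_chain(100)
    w_hat, v_hat = fit(x, h, mu, nbrs, tau=1.0)
    print("w:", w_hat, "v:", v_hat)
```

With several training images, the objective and gradients would be summed over images before adding the single prior term on v. Since the text selects τ by cross-validation, `fit` could be run for a handful of candidate τ values and the one with the best held-out labeling performance kept.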