6 Conditional Densities

A number of machine learning algorithms can be derived by using conditional exponential families of distributions (Section 2.3). Assume that the training set {(x_1, y_1), ..., (x_m, y_m)} was drawn iid from some underlying distribution. Using Bayes rule (1.15) one can write the likelihood

    p(θ | X, Y) ∝ p(θ) p(Y | X, θ) = p(θ) ∏_{i=1}^m p(y_i | x_i, θ)                      (6.1)

and hence the negative log-likelihood

    −log p(θ | X, Y) = −∑_{i=1}^m log p(y_i | x_i, θ) − log p(θ) + const.                (6.2)

Because we do not have any prior knowledge about the data, we choose a zero-mean, unit-variance isotropic normal distribution for p(θ). This yields

    −log p(θ | X, Y) = (1/2)‖θ‖² − ∑_{i=1}^m log p(y_i | x_i, θ) + const.               (6.3)

Finally, if we assume a conditional exponential family model for p(y | x, θ), that is,

    p(y | x, θ) = exp(⟨φ(x, y), θ⟩ − g(θ | x)),                                          (6.4)

then

    −log p(θ | X, Y) = (1/2)‖θ‖² + ∑_{i=1}^m [g(θ | x_i) − ⟨φ(x_i, y_i), θ⟩] + const.   (6.5)
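The objective in (6.5) can be evaluated directly once a feature map φ(x, y) is fixed. As a minimal sketch (not from the text), the following assumes multiclass softmax regression, where φ(x, y) places x in the weight block of class y, so that g(θ | x) is the log-sum-exp over class scores:

```python
import numpy as np

def neg_log_posterior(theta, X, y, num_classes):
    """Objective (6.5): 0.5*||theta||^2 + sum_i [g(theta|x_i) - <phi(x_i, y_i), theta>],
    for the softmax instance where phi(x, y) stacks x into the block of class y."""
    m, d = X.shape
    W = theta.reshape(num_classes, d)      # one weight row per class
    scores = X @ W.T                       # <phi(x_i, y), theta> for every class y
    g = np.logaddexp.reduce(scores, axis=1)  # log-partition g(theta | x_i), stable
    fit = scores[np.arange(m), y]          # <phi(x_i, y_i), theta> for observed labels
    return 0.5 * theta @ theta + np.sum(g - fit)

# toy check with 3 points and 2 classes (hypothetical data)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 1])
theta = np.zeros(2 * 2)
val = neg_log_posterior(theta, X, y, num_classes=2)
# at theta = 0 every class is equally likely, so the objective is m * log(num_classes)
```

Since (6.5) is convex in θ, this function could be handed to any gradient-based optimizer to recover the MAP estimate.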