
$$\pi_i = \frac{1}{Z}\exp\left(-\frac{E_i}{T}\right)$$

$$q_{ij} = \frac{1}{1+\exp\left(-\frac{1}{T}(E_i - E_j)\right)} = \frac{1}{1+\exp(\Delta E/T)}$$

$$\sum_i \pi_i = 1, \qquad Z = \sum_i \exp\left(-\frac{E_i}{T}\right)$$

Problem 11.11

We start with the Kullback-Leibler divergence

$$D_{p^+\|p^-} = \sum_\alpha p_\alpha^+ \log\frac{p_\alpha^+}{p_\alpha^-} \tag{1}$$

The probability distribution $p_\alpha^+$ in the clamped condition is naturally independent of the synaptic weights $w_{ji}$ in the Boltzmann machine, whereas the probability distribution $p_\alpha^-$ depends on $w_{ji}$. Hence, differentiating (1) with respect to $w_{ji}$:
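$$\frac{\partial D_{p^+\|p^-}}{\partial w_{ji}} = -\sum_\alpha \frac{p_\alpha^+}{p_\alpha^-}\,\frac{\partial p_\alpha^-}{\partial w_{ji}} \tag{2}$$

To minimize $D_{p^+\|p^-}$, we use the method of gradient descent:

$$\Delta w_{ji} = -\varepsilon\,\frac{\partial D_{p^+\|p^-}}{\partial w_{ji}} = \varepsilon\sum_\alpha \frac{p_\alpha^+}{p_\alpha^-}\,\frac{\partial p_\alpha^-}{\partial w_{ji}} \tag{3}$$

where $\varepsilon$ is a positive constant.

Let $p_{\alpha\beta}^-$ denote the joint probability that the visible neurons are in state $\alpha$ and the hidden neurons are in state $\beta$, given that the network is in its free-running condition. We may then write

$$p_\alpha^- = \sum_\beta p_{\alpha\beta}^-$$

Assuming that the network is in thermal equilibrium, we may use the Gibbs distribution to write

$$p_{\alpha\beta}^- = \frac{1}{Z}\exp\left(-\frac{E_{\alpha\beta}}{T}\right), \qquad p_\alpha^- = \frac{1}{Z}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) \tag{4}$$

where $E_{\alpha\beta}$ is the energy of the network when the visible neurons are in state $\alpha$ and the hidden neurons are in state $\beta$. The partition function $Z$ is itself defined by

$$Z = \sum_\alpha\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)$$

The energy $E_{\alpha\beta}$ is defined in terms of the synaptic weights $w_{ji}$ by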
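$$E_{\alpha\beta} = -\frac{1}{2}\sum_i\sum_{j,\,j\neq i} w_{ji}\,x_j^{\alpha\beta}x_i^{\alpha\beta} \tag{5}$$

where $x_i^{\alpha\beta}$ is the state of neuron $i$ when the visible neurons are in state $\alpha$ and the hidden neurons are in state $\beta$. Therefore, using (4):

$$\frac{\partial p_\alpha^-}{\partial w_{ji}} = -\frac{1}{ZT}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)\frac{\partial E_{\alpha\beta}}{\partial w_{ji}} - \frac{1}{Z^2}\,\frac{\partial Z}{\partial w_{ji}}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) \tag{6}$$

From (5) we have (remembering that in a Boltzmann machine $w_{ji} = w_{ij}$, so the weight appears twice in the double sum, once as $w_{ji}$ and once as $w_{ij}$, which cancels the factor 1/2)

$$\frac{\partial E_{\alpha\beta}}{\partial w_{ji}} = -x_j^{\alpha\beta}x_i^{\alpha\beta} \tag{7}$$

The first term on the right-hand side of (6) is therefore

$$-\frac{1}{ZT}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)\frac{\partial E_{\alpha\beta}}{\partial w_{ji}} = \frac{1}{ZT}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)x_j^{\alpha\beta}x_i^{\alpha\beta} = \frac{1}{T}\sum_\beta p_{\alpha\beta}^-\,x_j^{\alpha\beta}x_i^{\alpha\beta}$$

where we have made use of the Gibbs distribution

$$p_{\alpha\beta}^- = \frac{1}{Z}\exp\left(-\frac{E_{\alpha\beta}}{T}\right)$$

as the probability that the visible neurons are in state $\alpha$ and the hidden neurons are in state $\beta$ in the free-running condition. Consider next the second term on the right-hand side of (6). Except for the minus sign, we may express this term as the product of two factors:

$$\frac{1}{Z^2}\,\frac{\partial Z}{\partial w_{ji}}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) = \left[\frac{1}{Z}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)\right]\left[\frac{1}{Z}\,\frac{\partial Z}{\partial w_{ji}}\right] \tag{8}$$

The first factor in (8) is recognized as the Gibbs distribution $p_\alpha^-$ defined by

$$p_\alpha^- = \frac{1}{Z}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) \tag{9}$$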
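To make Eqs. (1), (4), (5), (7) and (9) concrete, the sketch below enumerates every state of a tiny machine by brute force. It assumes bipolar states $x_i \in \{-1, +1\}$, two visible neurons and one hidden neuron; the temperature `T`, the weight matrix `W` and the clamped distribution `P_PLUS` are made-up illustrative values, and all function names are hypothetical. It also checks Eq. (7) with a central difference that moves $w_{ji}$ and $w_{ij}$ together, which is what the symmetry $w_{ji} = w_{ij}$ requires.

```python
import itertools
import math

# Hypothetical toy machine: neurons 0 and 1 are visible, neuron 2 is hidden.
# Bipolar states (+1/-1); T, W and P_PLUS are made-up illustrative values.
T = 1.5
N_VISIBLE, N_HIDDEN = 2, 1
W = [[0.0, 0.8, -0.5],
     [0.8, 0.0, 0.3],
     [-0.5, 0.3, 0.0]]  # symmetric (w_ji = w_ij), zero diagonal
P_PLUS = {(-1, -1): 0.1, (-1, 1): 0.4, (1, -1): 0.4, (1, 1): 0.1}  # clamped p_a^+

def states(n):
    """All bipolar state vectors of length n, as tuples."""
    return list(itertools.product((-1, 1), repeat=n))

def energy(w, x):
    """Eq. (5): E = -1/2 * sum over i != j of w_ji x_j x_i."""
    n = len(x)
    return -0.5 * sum(w[j][i] * x[j] * x[i]
                      for i in range(n) for j in range(n) if i != j)

def free_running(w):
    """Gibbs joint p_ab^- (Eq. 4) and visible marginal p_a^- (Eq. 9)."""
    joint = {(a, b): math.exp(-energy(w, a + b) / T)
             for a in states(N_VISIBLE) for b in states(N_HIDDEN)}
    Z = sum(joint.values())  # partition function: sum over both a and b
    joint = {k: v / Z for k, v in joint.items()}
    marginal = {a: sum(joint[(a, b)] for b in states(N_HIDDEN))
                for a in states(N_VISIBLE)}
    return joint, marginal

def kl(p_plus, p_minus):
    """Eq. (1): D = sum_a p_a^+ log(p_a^+ / p_a^-)."""
    return sum(p * math.log(p / p_minus[a]) for a, p in p_plus.items() if p > 0)

def perturbed(w, j, i, h):
    """Shift w_ji and w_ij together so the weights stay symmetric."""
    w2 = [row[:] for row in w]
    w2[j][i] += h
    w2[i][j] += h
    return w2

joint, marginal = free_running(W)
print("D(p+ || p-) =", kl(P_PLUS, marginal))

# Eq. (7): dE/dw_ji = -x_j x_i, checked by a central difference.
x, j, i, h = (1, -1, 1), 0, 2, 1e-6
dE = (energy(perturbed(W, j, i, h), x) - energy(perturbed(W, j, i, -h), x)) / (2 * h)
print(dE, -x[j] * x[i])
```

Because the energy is linear in the weights, the central difference for Eq. (7) agrees with $-x_j x_i$ up to floating-point rounding.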
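To evaluate the second factor in (8), we write

$$\begin{aligned}
\frac{1}{Z}\,\frac{\partial Z}{\partial w_{ji}} &= \frac{1}{Z}\,\frac{\partial}{\partial w_{ji}}\sum_\alpha\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) \\
&= -\frac{1}{TZ}\sum_\alpha\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)\frac{\partial E_{\alpha\beta}}{\partial w_{ji}} \\
&= \frac{1}{TZ}\sum_\alpha\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right)x_j^{\alpha\beta}x_i^{\alpha\beta} \\
&= \frac{1}{T}\sum_\alpha\sum_\beta p_{\alpha\beta}^-\,x_j^{\alpha\beta}x_i^{\alpha\beta}
\end{aligned} \tag{10}$$

Using (9) and (10) in (8), and renaming the summation index of (10) to $\alpha'$ to keep it distinct from the fixed state $\alpha$:

$$\frac{1}{Z^2}\,\frac{\partial Z}{\partial w_{ji}}\sum_\beta \exp\left(-\frac{E_{\alpha\beta}}{T}\right) = \frac{p_\alpha^-}{T}\sum_{\alpha'}\sum_\beta p_{\alpha'\beta}^-\,x_j^{\alpha'\beta}x_i^{\alpha'\beta} \tag{11}$$

We are now ready to revisit (6) and thus write

$$\frac{\partial p_\alpha^-}{\partial w_{ji}} = \frac{1}{T}\left[\sum_\beta p_{\alpha\beta}^-\,x_j^{\alpha\beta}x_i^{\alpha\beta} - p_\alpha^-\sum_{\alpha'}\sum_\beta p_{\alpha'\beta}^-\,x_j^{\alpha'\beta}x_i^{\alpha'\beta}\right]$$

We now make the following observations:

1. The sum of the probability $p_\alpha^+$ over the states $\alpha$ is unity, that is,

$$\sum_\alpha p_\alpha^+ = 1 \tag{12}$$

2. The joint probability factors into a conditional and a marginal probability:

$$p_{\alpha\beta}^- = p_{\beta|\alpha}^-\,p_\alpha^- \tag{13}$$

Similarly,

$$p_{\alpha\beta}^+ = p_{\beta|\alpha}^+\,p_\alpha^+ \tag{14}$$

3. The probability of a hidden state, given some visible state, is naturally the same whether the visible neurons of the network in thermal equilibrium are clamped in that state by the external environment or arrive at that state by free running of the network, as shown by

$$p_{\beta|\alpha}^+ = p_{\beta|\alpha}^- \tag{15}$$

In light of this relation we may rewrite Eq. (13) as

$$p_{\alpha\beta}^- = p_{\beta|\alpha}^+\,p_\alpha^-$$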
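The expression for $\partial p_\alpha^-/\partial w_{ji}$ obtained by revisiting (6), and its use in Eq. (2), can be checked numerically. The sketch below is a minimal, self-contained check under the same assumptions as a brute-force toy model: bipolar states, two visible neurons and one hidden neuron, and made-up values for `T`, `W` and `P_PLUS`; every function name is hypothetical. It compares the analytic derivatives with central finite differences in which $w_{ji}$ and $w_{ij}$ move together.

```python
import itertools
import math

# Hypothetical toy machine: neurons 0, 1 visible, neuron 2 hidden, bipolar states.
# T, W and P_PLUS are made-up values chosen only for illustration.
T = 1.5
N_VISIBLE, N_HIDDEN = 2, 1
W = [[0.0, 0.8, -0.5],
     [0.8, 0.0, 0.3],
     [-0.5, 0.3, 0.0]]  # symmetric, zero diagonal
P_PLUS = {(-1, -1): 0.1, (-1, 1): 0.4, (1, -1): 0.4, (1, 1): 0.1}

def states(n):
    return list(itertools.product((-1, 1), repeat=n))

def energy(w, x):
    """Eq. (5)."""
    n = len(x)
    return -0.5 * sum(w[j][i] * x[j] * x[i]
                      for i in range(n) for j in range(n) if i != j)

def gibbs(w):
    """Free-running joint p_ab^- and visible marginal p_a^-, Eqs. (4) and (9)."""
    joint = {(a, b): math.exp(-energy(w, a + b) / T)
             for a in states(N_VISIBLE) for b in states(N_HIDDEN)}
    Z = sum(joint.values())
    joint = {k: v / Z for k, v in joint.items()}
    marginal = {a: sum(joint[(a, b)] for b in states(N_HIDDEN))
                for a in states(N_VISIBLE)}
    return joint, marginal

def kl(w):
    """Eq. (1) with p_a^- computed from the weights w."""
    _, marginal = gibbs(w)
    return sum(p * math.log(p / marginal[a]) for a, p in P_PLUS.items())

def perturbed(w, j, i, h):
    """Shift w_ji and w_ij together so that the weights stay symmetric."""
    w2 = [row[:] for row in w]
    w2[j][i] += h
    w2[i][j] += h
    return w2

def dp_minus_analytic(w, j, i):
    """(1/T) * [sum_b p_ab^- x_j x_i - p_a^- * sum_{a',b} p_a'b^- x_j x_i]."""
    joint, marginal = gibbs(w)
    corr = {k: (k[0] + k[1])[j] * (k[0] + k[1])[i] for k in joint}  # x_j x_i per state
    rho_minus = sum(joint[k] * corr[k] for k in joint)
    return {a: (sum(joint[(a, b)] * corr[(a, b)] for b in states(N_HIDDEN))
                - marginal[a] * rho_minus) / T
            for a in states(N_VISIBLE)}

def dp_minus_numeric(w, j, i, h=1e-6):
    """Central-difference estimate of dp_a^-/dw_ji."""
    _, up = gibbs(perturbed(w, j, i, h))
    _, down = gibbs(perturbed(w, j, i, -h))
    return {a: (up[a] - down[a]) / (2 * h) for a in up}

def dkl_analytic(w, j, i):
    """Eq. (2): -sum_a (p_a^+ / p_a^-) dp_a^-/dw_ji."""
    _, marginal = gibbs(w)
    dp = dp_minus_analytic(w, j, i)
    return -sum(P_PLUS[a] / marginal[a] * dp[a] for a in marginal)

def dkl_numeric(w, j, i, h=1e-6):
    return (kl(perturbed(w, j, i, h)) - kl(perturbed(w, j, i, -h))) / (2 * h)

for j, i in [(0, 1), (0, 2), (1, 2)]:
    ana, num = dp_minus_analytic(W, j, i), dp_minus_numeric(W, j, i)
    worst = max(abs(ana[a] - num[a]) for a in ana)
    print((j, i), "max |analytic - numeric| =", worst,
          "dD/dw analytic:", dkl_analytic(W, j, i), "numeric:", dkl_numeric(W, j, i))
```

Exact enumeration costs a number of terms that grows as $2^n$ in the number of neurons, so a check like this is only practical for a handful of neurons; a working Boltzmann machine estimates the same averages by sampling instead.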