Notes on the Infomax Algorithm

Upamanyu Madhow

Abstract

We briefly review the maximum likelihood interpretation of the extended Infomax algorithm for independent component analysis (ICA), including the concept of relative gradient used for iterative updates.

1 Maximum Likelihood Formulation

Consider a single snapshot of the mixing model

$$X = AS,$$

where $X$, $S$ are $n \times 1$ and $A$ is $n \times n$. We would like to "unmix" the sources by applying an $n \times n$ matrix $W$ to get

$$Y = WX.$$

In maximum likelihood (ML) estimation, we estimate a parameter $\theta$ based on an observation $x$ by maximizing the conditional density $p(x \mid \theta)$. In order to apply this approach to the estimation of $W$, we must know the conditional density of $x$ given $W$. Given $W$, we can compute $Y = WX$, and we apply ML estimation to this setting by assuming that we know the density of $Y$. For the "right" $W$, we assume that (a) the components of $Y$ are independent, and (b) they have known marginal densities $p_i(y_i)$, $i = 1, \ldots, n$. In practical terms, these marginal densities do not need to be the same as those of the actual independent components: all they do is provide nonlinearities of the form $\frac{d}{dy_i} \log p_i(y_i)$ for the iterative update of $W$. As we have seen from our discussion of the fastICA algorithm, there is a broad range of nonlinearities that can move us towards non-Gaussianity and independence (although only the fourth-order nonlinearity is guaranteed to converge to a global optimum). Thus, it makes sense that there should be some flexibility in the choice of nonlinearities in the Infomax algorithm, which is essentially similar in philosophy (except that it uses different nonlinearities and a gradient-based update rather than a Newton update).
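As a concrete illustration of the point that the assumed marginals enter only through $\frac{d}{dy} \log p(y)$, here is a minimal Python sketch of two such score functions. The particular density choices (Laplacian and logistic) and the function names are illustrative assumptions, not choices prescribed by these notes.

```python
import numpy as np

# Score functions g(y) = d/dy log p(y) for two commonly assumed marginal
# densities. These particular choices are illustrative; the formulation
# only requires that *some* such nonlinearity be supplied.

def score_laplacian(y):
    # p(y) proportional to exp(-|y|)  =>  g(y) = -sign(y)   (super-Gaussian)
    return -np.sign(y)

def score_logistic(y):
    # p(y) = sigma(y) * (1 - sigma(y)), with sigma the logistic sigmoid
    # =>  g(y) = 1 - 2*sigma(y) = -tanh(y / 2)
    return -np.tanh(y / 2.0)
```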
Equating the probabilities of small volumes, we have

$$p(x \mid W)\,|dx| = p(y)\,|dy|.$$

Since

$$\frac{|dy|}{|dx|} = |\det(W)|,$$

we have

$$p(x \mid W) = |\det(W)|\, p(y).$$

Taking the log and using the independence of the components of $Y$, we obtain that the cost to be maximized over $W$ is the log-likelihood

$$\log p(x \mid W) = \log |\det(W)| + \sum_{i=1}^{n} \log p_i(y_i).$$
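The iterative update itself is not shown in this excerpt, but the abstract refers to a relative-gradient iteration on this cost. As a hedged sketch of that standard update: ascending the log-likelihood above with the relative (natural) gradient gives $W \leftarrow W + \mu\,(I + \mathrm{E}[g(Y)Y^{T}])\,W$, where $g$ is the assumed score function. A minimal Python version follows; the helper name `infomax_step`, the step size `lr`, and the batch-averaging convention are illustrative choices rather than anything specified in the notes.

```python
import numpy as np

def infomax_step(W, X, score, lr=0.01):
    """One relative-gradient ascent step on
    L(W) = log|det W| + (1/T) * sum_t sum_i log p_i(y_i[t]),
    where score(y) evaluates the assumed g(y) = d/dy log p(y) elementwise.

    W : (n, n) current unmixing matrix.
    X : (n, T) observed mixtures, one snapshot per column.
    """
    n, T = X.shape
    Y = W @ X                      # current source estimates
    G = score(Y)                   # elementwise nonlinearity g(y)
    # Relative (natural) gradient: (I + E[g(y) y^T]) W
    grad = (np.eye(n) + (G @ Y.T) / T) @ W
    return W + lr * grad

# Example usage with the logistic score sketched earlier, starting from the
# identity (X is whatever (n, T) mixture data is available):
# W = np.eye(X.shape[0])
# for _ in range(500):
#     W = infomax_step(W, X, score_logistic, lr=0.05)
```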