CS229 Lecture notes
Andrew Ng

Part XII
Independent Components Analysis

Our next topic is Independent Components Analysis (ICA). Similar to PCA, this will find a new basis in which to represent our data. However, the goal is very different.

As a motivating example, consider the “cocktail party problem.” Here, $n$ speakers are speaking simultaneously at a party, and any microphone placed in the room records only an overlapping combination of the $n$ speakers’ voices. But let’s say we have $n$ different microphones placed in the room, and because each microphone is a different distance from each of the speakers, it records a different combination of the speakers’ voices. Using these microphone recordings, can we separate out the original $n$ speakers’ speech signals?

To formalize this problem, we imagine that there is some data $s \in \mathbb{R}^n$ that is generated via $n$ independent sources. What we observe is $x = As$, where $A$ is an unknown square matrix called the mixing matrix. Repeated observations give us a dataset $\{x^{(i)}; i = 1, \ldots, m\}$, and our goal is to recover the sources $s^{(i)}$ that had generated our data ($x^{(i)} = As^{(i)}$).

In our cocktail party problem, $s^{(i)}$ is an $n$-dimensional vector, and $s^{(i)}_j$ is the sound that speaker $j$ was uttering at time $i$. Also, $x^{(i)}$ is an $n$-dimensional vector, and $x^{(i)}_j$ is the acoustic reading recorded by microphone $j$ at time $i$.

Let $W = A^{-1}$ be the unmixing matrix. Our goal is to find $W$, so that given our microphone recordings $x^{(i)}$, we can recover the sources by computing $s^{(i)} = W x^{(i)}$. For notational convenience, we also let $w_i^T$ denote the $i$-th row of $W$, so that

$$W = \begin{bmatrix} \text{--- } w_1^T \text{ ---} \\ \vdots \\ \text{--- } w_n^T \text{ ---} \end{bmatrix}.$$

Thus, $w_i \in \mathbb{R}^n$, and the $j$-th source can be recovered by computing $s^{(i)}_j = w_j^T x^{(i)}$.

1 ICA ambiguities

To what degree can $W = A^{-1}$ be recovered?
If we have no prior knowledge about the sources and the mixing matrix, it is not hard to see that there are some inherent ambiguities in $A$ that are impossible to resolve, given only the $x^{(i)}$’s.

Specifically, let $P$ be any $n$-by-$n$ permutation matrix. This means that each row and each column of $P$ has exactly one “1.” Here are some examples of permutation matrices:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}; \quad P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}; \quad P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

If $z$ is a vector, then $Pz$ is another vector that contains a permuted version of $z$’s coordinates. Given only the $x^{(i)}$’s, there will be no way to distinguish between $W$ and $PW$. Specifically, the permutation of the original sources is ambiguous, which should be no surprise. Fortunately, this does not matter for most applications....
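
To make the notation and the permutation ambiguity concrete, here is a minimal NumPy sketch. It is only an illustration under stated assumptions: the mixing matrix `A`, the sizes `n` and `m`, and the Laplace-distributed sources are all made up for the example, and `A` is treated as known so that $W = A^{-1}$ can be computed directly. In the actual ICA problem $A$ is unknown and $W$ must be estimated from the $x^{(i)}$'s alone.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 5                       # illustrative sizes: n sources/microphones, m time steps
S = rng.laplace(size=(m, n))      # row i holds s^(i); Laplace is an arbitrary choice here

# Hypothetical mixing matrix (strictly diagonally dominant, hence invertible).
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.6, 1.0]])

X = S @ A.T                       # row i holds x^(i) = A s^(i)

W = np.linalg.inv(A)              # unmixing matrix W = A^{-1}
S_rec = X @ W.T                   # row i holds W x^(i), which equals s^(i)

# A permutation matrix: exactly one 1 in each row and each column.
P = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

S_perm = X @ (P @ W).T            # row i holds (P W) x^(i) = P s^(i)

# The pair (A P^T, P s) reproduces exactly the same observations as (A, s),
# so the x^(i)'s alone cannot tell W apart from P W.
X_alt = S_perm @ (A @ P.T).T
```

Here `S_rec` matches `S`, `S_perm` is `S` with its first two columns swapped, and `X_alt` matches `X`: the same recordings are explained equally well by the sources in either order.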