distribution over all possible words.
For example, given the topic is “Sports”, the probability of having the word
“football” might be high; if the topic were “Weather”, the probability of having
the word “football” might be lower. Other words, like “the”, will have a
high probability regardless of the topic. If words are chosen from a set of W
possible words, then we let φk, a distribution over the W words (a point on the
W-dimensional probability simplex), be the multinomial parameter over words
for topic k. Word j of document i, denoted wi,j, will be generated by the
distribution over words corresponding to its topic zi,j: wi,j ∼ Multinomial(φzi,j).
Finally, we give prior distributions for the parameters θi and φk. The
multinomial distribution is a generalization of the binomial distribution, and its
conjugate prior is a generalization of the beta distribution: the Dirichlet
distribution. Thus we model the data with the following generative model:
1. For document i = 1, . . . , m, choose the document’s topic distribution
θi ∼ Dirichlet(α), where the K-dimensional vector α is the prior hyperparameter.
2. For topic k = 1, . . . , K, choose the topic’s word distribution φk ∼
Dirichlet(β), where β is the corresponding W-dimensional hyperparameter.
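The generative process described above can be sketched in NumPy. This is a minimal illustration, not the notes' own code: the sizes (m, K, W, n), the random seed, and the hyperparameter values for alpha and beta are all assumptions chosen for demonstration; the per-word topic assignment z_{i,j} ∼ Multinomial(θi) follows the notation used earlier in the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the notes):
# m documents, K topics, a W-word vocabulary, n words per document.
m, K, W, n = 3, 2, 5, 8
alpha = np.ones(K)  # Dirichlet hyperparameter over topics (assumed symmetric)
beta = np.ones(W)   # Dirichlet hyperparameter over words (assumed symmetric)

# Each topic's word distribution: phi_k ~ Dirichlet(beta)
phi = rng.dirichlet(beta, size=K)          # shape (K, W); rows sum to 1

docs = []
for i in range(m):
    # Each document's topic distribution: theta_i ~ Dirichlet(alpha)
    theta = rng.dirichlet(alpha)           # shape (K,); sums to 1
    words = []
    for j in range(n):
        z = rng.choice(K, p=theta)         # topic z_{i,j} ~ Multinomial(theta_i)
        w = rng.choice(W, p=phi[z])        # word w_{i,j} ~ Multinomial(phi_{z_{i,j}})
        words.append(w)
    docs.append(words)

print(docs)  # m lists of n word indices, each in {0, ..., W-1}
```

Note how conjugacy shows up even in this sketch: every topic and word draw is multinomial, so Dirichlet priors on θi and φk keep the posterior in the same family.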