This preview shows page 1. Sign up to view the full content.
Unformatted text preview: the documents are related. For example, if the documents
are all news articles, the article “Patriots game canceled due to hurricane”
is related to the article “New York Giants lose superbowl” because they are
both about football. The article “Patriots game canceled due to hurricane”
is also related to the article “Record snowfall in May” because they are both
about the weather. We will now develop a hierarchical model for ﬁnding topic
relationships between documents in an unsupervised setting. The method is
called Latent Dirichlet Allocation (LDA) and it was developed by David Blei,
Andrew Ng, and Michael Jordan.
5.1.1 LDA formulation The model has several components. The data are m documents, with docu
ment i consisting of ni words. Each word in the document will be associated
with one of K topics. We let zi,j denote the topic of word j in document i.
We model zi,j ∼ Multinomial(θi ), where θi ∈ JK describes the topic mixture
of document i.
For each topic, we deﬁne a multinomial...
View Full Document
This note was uploaded on 03/24/2014 for the course MIT 15.097 taught by Professor Cynthiarudin during the Spring '12 term at MIT.
- Spring '12