STAT 760, November 15, 2011
Homework 3, Due November 23

1. Summarize your proposed final project (about 1 page). Email this to Prof. Newton separately from the other homework solutions.

2. Linear Discriminant Analysis (LDA-1): Apply LDA-1 to build a classifier that discriminates between patients whose tumor would be controlled by radiotherapy and those whose tumor would not, using a data set of $n = 858$ cancer patients available in bir.RData. Seven covariates characterizing the radiation treatment are in the data file.

3. Latent Dirichlet allocation (LDA-2): Recall LDA-2 as discussed in class last week. There are $K$ topics $\{k\}$ and a vocabulary of $W$ words $\{w\}$. Each topic is a probability distribution over words: $\phi_k = (\phi_{k,1}, \ldots, \phi_{k,W})$. A document $i$ comprising $n_i$ words is governed, additionally, by a document-specific distribution $\theta_i = (\theta_{i,1}, \ldots, \theta_{i,K})$ over topics. Topics are latent, and all we get to know for each document is how many words of each type $w$ appear. A more complete data representation (useful in computation) is to know $D_{\mathrm{comp}} = \{X_{i,k,w}\}$, over all documents, topics, and words, where each
