[Table 16.3: The EM clustering algorithm. The table shows a set of documents (a): docID 7 "sweet sugar", 8 "sugar cane brazil", 9 "sweet sugar beet", 10 "sweet cake icing", 11 "cake black forest", and parameter values for selected iterations (2, 3, 4, 5, 15, 25) during EM clustering (b). Parameters shown are the prior α, the soft assignment scores r (both omitted for cluster 2), and the lexical parameters.]

Experimental Results with Basic EM
• Define the data log likelihood to be L(y, x) + L(x), computed on the labeled and unlabeled data respectively.
• Find parameters that maximize this total likelihood. [Nigam, McCallum, Mitchell, 2006]
• The paper also presents a number of other, fancier models where the unlabeled data helps more.

[Figure 3.1 (Section 3.3, EM for Semi-supervised Learning): Classification accuracy on the 20 Newsgroups data set as a function of the number of labeled documents (10 to 5,000), both with 10,000 unlabeled documents and with no unlabeled documents. With small amounts of training data, using EM yields more accurate classifiers. With large amounts of labeled training data, accurate parameter estimates can be obtained without the use of unlabeled data, and the classification accuracies of the two methods begin to converge.]

Additive and multiplicative constants are dropped, but the relative values are maintained.

Unsupervised Tagging?
• AKA part-of-speech induction
• Task:
  • Raw sentences in
  • Tagged sentences out
• Obvious thing to do:
  • Start with a (mostly) uniform HMM
  • Run EM
  • Inspect the results

EM for HMMs: Process
• The ML estimates (only possible with full supervision) are
    e_ML(x | y) = c(y, x) / c(y)
    q_ML(y_i | y_{i-1}) = c(y_{i-1}, y_i) / c(y_{i-1})
• Instead, alternate between recomputing distributions over the hidden variables (the tags) and re-estimating the parameters.
• Crucial step: we want to tally up fractional (expected) counts (see the code sketch below):
    c*(y, x)  = Σ_{j : x_j = x} p(y_j = y | x_1 ... x_m)
    c*(y)     = Σ_j p(y_j = y | x_1 ... x_m)
    c*(y, y') = Σ_j p(y_{j-1} = y, y_j = y' | x_1 ... x_m)
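The posteriors needed for these fractional counts can be computed with the forward-backward algorithm. Below is a minimal runnable sketch in Python/NumPy, assuming a toy tag set and vocabulary, a (mostly) uniform random initialization, and illustrative names (noisy_uniform, forward_backward, em_step, corpus) that are not from the slides. It tallies c*(y, x) and c*(y, y') from the posterior marginals and re-estimates e(x | y) and q(y' | y) in the M-step.

```python
# Sketch of one EM iteration for an unsupervised HMM tagger.
# Toy sizes and parameter names are illustrative, not from the slides.
import numpy as np

T, V = 3, 5                      # number of tags, vocabulary size
rng = np.random.default_rng(0)

def noisy_uniform(rows, cols):
    """(Mostly) uniform rows with a little noise so EM can break symmetry."""
    p = np.full((rows, cols), 1.0 / cols) + 0.01 * rng.random((rows, cols))
    return p / p.sum(axis=1, keepdims=True)

start = np.full(T, 1.0 / T)      # p(y_1); kept fixed in this sketch for brevity
trans = noisy_uniform(T, T)      # q(y_j | y_{j-1})
emit  = noisy_uniform(T, V)      # e(x_j | y_j)

def forward_backward(x):
    """Posterior tag marginals for one sentence x (a list of word ids).
    Returns gamma[j, y] = p(y_j = y | x) and
    xi[j, y, y'] = p(y_j = y, y_{j+1} = y' | x)."""
    m = len(x)
    alpha = np.zeros((m, T))
    beta = np.zeros((m, T))
    alpha[0] = start * emit[:, x[0]]
    for j in range(1, m):
        alpha[j] = (alpha[j - 1] @ trans) * emit[:, x[j]]
    beta[m - 1] = 1.0
    for j in range(m - 2, -1, -1):
        beta[j] = trans @ (emit[:, x[j + 1]] * beta[j + 1])
    Z = alpha[m - 1].sum()                        # p(x_1 ... x_m)
    gamma = alpha * beta / Z
    xi = np.zeros((m - 1, T, T))
    for j in range(m - 1):
        xi[j] = (alpha[j][:, None] * trans *
                 (emit[:, x[j + 1]] * beta[j + 1])[None, :]) / Z
    return gamma, xi

def em_step(corpus):
    """E-step: tally fractional counts c*(y, x) and c*(y, y').
    M-step: re-estimate emit and trans from those counts."""
    c_yx = np.zeros((T, V))
    c_yy = np.zeros((T, T))
    for x in corpus:
        gamma, xi = forward_backward(x)
        for j, w in enumerate(x):
            c_yx[:, w] += gamma[j]                # c*(y, x) += p(y_j = y | x)
        c_yy += xi.sum(axis=0)                    # c*(y, y') += p(y_j = y, y_{j+1} = y' | x)
    new_emit = c_yx / c_yx.sum(axis=1, keepdims=True)    # e(x | y)  = c*(y, x) / c*(y)
    new_trans = c_yy / c_yy.sum(axis=1, keepdims=True)   # q(y' | y) = c*(y, y') / c*(y)
    return new_emit, new_trans

# Usage: run a few EM iterations on a toy corpus of word-id sentences,
# then inspect which words each induced tag prefers.
corpus = [[0, 1, 2], [3, 1, 4, 2], [0, 4]]
for _ in range(5):
    emit, trans = em_step(corpus)
```

A full implementation would also re-estimate the start distribution from the position-1 posteriors and work in log space for longer sentences; both are omitted here to keep the sketch short.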