where n_{kem} is the count of the number of times word w_k appears in example B_e in slot s_m, and

L(c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} |d_{em}|    (5)

denotes the total weighted length of the documents in category c_j and slot s_m. These parameters are "smoothed" using Laplace estimates to avoid zero probability estimates for words that do not appear in the limited training sample, redistributing some of the probability mass to these items using the method recommended in [14]. Finally, calculation with logarithms of probabilities is used to avoid underflow.

The computational complexity of the resulting training (testing) algorithm is linear in the size of the training (testing) data. Empirically, the system is quite efficient. In the experiments on the Lit1 data described below, the current Lisp implementation running on a Sun Ultra 1 trained on 20 examples in an average of 0.4 seconds and on 840 examples in an average of 11.5 seconds, and probabilistically categorized new test examples at an average rate of about 200 books per second. An optimized implementation could no doubt significantly improve performance even further.

A profile can be partially illustrated by listing the features most indicative of a positive or negative rating. Table 1 presents the top 20 features for a sample profile learned for recommending science books. Strength measures how much more likely a word in a slot is to appear in a positively rated book than a negatively rated one, computed as:

Strength(w_k, s_j) = log( P(w_k | c_1, s_j) / P(w_k | c_0, s_j) )    (6)

2.3 Producing, Explaining, and Revising Recommendations

Once a profile is learned, it is used to predict the preferred ranking of the remaining books based on the posterior probability of a positive categorization, and the top-scoring recommendations are presented to the user. The system also has a limited ability to "explain" its recommendations by listing the features that most contributed to its high rank.
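The smoothed per-slot word probabilities and the strength metric of equation (6) can be sketched as follows. This is a minimal illustration, not the paper's Lisp implementation: the add-one smoothing variant, the toy counts, the slot name, and the vocabulary size are all assumptions (the paper defers the exact smoothing method to [14]).

```python
import math

def smoothed_prob(count, total_length, vocab_size):
    """Laplace-style smoothing: every word, including an unseen one,
    receives a small share of the probability mass, so no estimate
    is ever zero. (Assumed variant: add-one smoothing.)"""
    return (count + 1) / (total_length + vocab_size)

def strength(word, slot, counts, lengths, vocab_size):
    """Equation (6): log-odds of a word appearing in a positively (c_1)
    vs. negatively (c_0) rated book, for a given slot."""
    p_pos = smoothed_prob(counts.get((word, slot, 1), 0), lengths[(slot, 1)], vocab_size)
    p_neg = smoothed_prob(counts.get((word, slot, 0), 0), lengths[(slot, 0)], vocab_size)
    # Working with logarithms also avoids underflow when many such
    # small probabilities are combined during categorization.
    return math.log(p_pos / p_neg)

# Hypothetical counts: (word, slot, category) -> weighted count; 1 = positive.
counts = {("universe", "description", 1): 40, ("universe", "description", 0): 2}
lengths = {("description", 1): 1000, ("description", 0): 1000}

s = strength("universe", "description", counts, lengths, vocab_size=5000)
# s > 0: the word is more indicative of a positive rating
```

Summing such log-probabilities over the words of a document, rather than multiplying the raw probabilities, is what keeps the posterior computation from underflowing.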
Table 1: Sample Positive Profile Features

Word          Strength
MULTIVERSE    75.12
UNIVERSES     25.08
REALITY       22.96
UNIVERSE      15.55
QUANTUM       14.54
INTELLECT     13.86
OKAY          13.75
RESERVATIONS  11.56
DENIES        11.56
EVOLUTION     11.02
WORLDS        10.10
SMOLIN         9.39
ONE            8.50
IDEAS          8.35
THEORY         8.28
IDEA           6.96
REALITY        6.78
PARALLEL       6.76
IMPLY          6.47
GENIUSES       6.47

For example, given the profile illustrated above, Libra presented the explanation shown in Table 2.

Table 2: Sample Recommendation Explanation

The strength of a cue in this case is multiplied by the number of times it appears in the description in order to fully indicate its influence on the ranking. The positiveness of a feature can in turn be explained by listing the user's training examples that most influenced its strength, as illustrated in Table 3 where C...

The word UNIVERSES is positive due to your ratings:

Title                                            Rating  Count
The Life of the Cosmos                               10     15
Before the Beginning : Our Universe and Others        8      7
Unveiling the Edge of Time                           10      3
Black Holes : A Traveler's Guide                      9      3
The Inflationary Universe                             9      2

Table 3: Sample Feature Explanation
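The explanation mechanism described above, ranking cues by strength multiplied by occurrence count in the description, can be sketched as below. The strengths and the sample description are made-up values for illustration, not Libra's actual profile or output:

```python
from collections import Counter

# Hypothetical learned strengths for words in one slot (cf. Table 1).
strengths = {"multiverse": 75.12, "universes": 25.08, "quantum": 14.54}

def explain(description_words, strengths, top_n=3):
    """Rank features by strength * count, mirroring how a cue's
    total influence on a book's rank is reported to the user."""
    counts = Counter(description_words)
    contributions = {w: strengths[w] * c
                     for w, c in counts.items() if w in strengths}
    return sorted(contributions.items(), key=lambda kv: -kv[1])[:top_n]

words = "the quantum multiverse implies many universes quantum theory".split()
top = explain(words, strengths)
# "quantum" appears twice, so its contribution is 2 * 14.54 = 29.08
```

Note how a moderately strong cue that repeats ("quantum") can outrank a stronger cue that appears once ("universes"), which is why the count must be factored in to fully indicate influence.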