[...] too difficult and, in any case, does not provide feedback for representative test examples. Therefore, it is important to continue to explore a range of alternative datasets and evaluation methods and to avoid prematurely committing to a specific methodology or overinterpreting the results of individual studies.

3.2 Basic Results

The results are summarized in Table 6, where N represents the number of training examples utilized and results are shown for a number of representative points along the learning curve. Overall, the results are quite encouraging even when the system is given relatively small training sets, and performance generally improves quite rapidly as the number of training examples is increased. The SF data set is clearly the most difficult since there are very few highly rated books. Although accuracy for SF is lower than that obtained by always choosing the most common class (negative), the other metrics are more informative.

The "top n" metrics are perhaps the most relevant to many users. Consider precision at top 3, which is fairly consistently in the 90% range after only 20 training examples (the exceptions are Lit1 until 70 examples¹ and SF until 450 examples). Therefore, Libra's top recommendations are highly likely to be viewed positively by the user. Note that the "% Positive" column in Table 4 gives the probability that a randomly chosen example from a given data set will be positively rated. Therefore, for every data set, the top 3 and top 10 recommendations are always substantially more likely than random to be rated positively, even after only 5 training examples.

The average rating of the top 3 recommendations is fairly consistently above 8 after only 20 training examples (the exceptions again are Lit1 until 100 examples and SF). For every data set, the top 3 and top 10 recommendations are always rated substantially higher than a randomly selected example (cf. the average rating from Table 4).
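As an illustration of the two "top n" metrics discussed above, the following is a minimal sketch and not the authors' evaluation code. It assumes that each test example has a predicted score and a user rating on a 10-point scale, and that a rating above 5 counts as positive; the threshold, the function names, and the sample data are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def top_n_metrics(
    predicted_scores: Sequence[float],
    user_ratings: Sequence[float],
    n: int = 3,
    positive_threshold: float = 5.0,  # assumed cut-off, not taken from the text
) -> Tuple[float, float]:
    """Return (precision at top n, average rating of top n)."""
    if len(predicted_scores) != len(user_ratings):
        raise ValueError("scores and ratings must have the same length")
    if n < 1 or not predicted_scores:
        raise ValueError("need at least one example and n >= 1")

    # Rank test examples from highest to lowest predicted score.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    top_ratings = [user_ratings[i] for i in order[:n]]

    # Precision at top n: fraction of the top n that the user rated positively.
    precision = sum(r > positive_threshold for r in top_ratings) / len(top_ratings)
    # Average rating of top n: mean user rating over the same items.
    average_rating = sum(top_ratings) / len(top_ratings)
    return precision, average_rating


def positive_rate(user_ratings: Sequence[float], positive_threshold: float = 5.0) -> float:
    """Chance that a randomly chosen example is positively rated (the random baseline)."""
    return sum(r > positive_threshold for r in user_ratings) / len(user_ratings)


# Hypothetical data: five test books with predicted scores and user ratings.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
ratings = [9, 2, 8, 6, 4]
precision_at_3, avg_rating_top_3 = top_n_metrics(scores, ratings, n=3)
baseline = positive_rate(ratings)
```

Comparing `precision_at_3` with `baseline` mirrors the comparison in the text between the top recommendations and a randomly chosen example.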
Looking at the rank correlation, except for SF, there is at least a moderate correlation ($r_s \geq 0.3$) after only 10 examples, and SF exhibits a moderate correlation after 40 examples. This becomes a strong correlation ($r_s \geq 0.6$) for Lit1 after only 20 examples, for Lit2 after 40 examples, for Sci after 70 examples, for Myst after 300 examples, and for [...]

¹ References to performance at 70 and 300 examples are based on learning curve data not included in the summary in Table 6.

[Figure 1: Lit1 Rank Correlation. Correlation coefficient versus number of training examples, for LIBRA and LIBRA-NR.]

[Figure 2: Myst precision at top 10 versus number of training examples, for LIBRA and LIBRA-NR.]

[Figure: SF Average Rating of Top 3 versus number of training examples.]

[...] results shown in Figure 1, there is a consistent, statistically significant difference in performance from 20 examples onward. For the Myst results on precision at top 10 shown in Figure 2, there is a consistent, stati...
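The rank correlation thresholds mentioned earlier (0.3 for moderate, 0.6 for strong) can be made concrete with a short sketch. It assumes that r_s denotes Spearman's rank correlation coefficient between the system's predicted ordering and the user's ratings, and that ties receive average ranks; both the tie handling and the helper names are assumptions for illustration, not details taken from the paper.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(predicted_scores: Sequence[float], user_ratings: Sequence[float]) -> float:
    """Spearman's r_s, computed as the Pearson correlation of the two rank vectors."""
    n = len(predicted_scores)
    if n != len(user_ratings) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(predicted_scores), average_ranks(user_ratings)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("r_s is undefined when one sequence is constant")
    return cov / (sx * sy)


def describe_correlation(rs: float) -> str:
    """Label r_s using the 0.3 and 0.6 cut-offs quoted in the text."""
    if rs >= 0.6:
        return "strong"
    if rs >= 0.3:
        return "moderate"
    return "weak"


# Hypothetical data: predicted scores and user ratings for five test books.
rs = spearman_rs([0.9, 0.1, 0.8, 0.4, 0.7], [9, 2, 8, 6, 4])
label = describe_correlation(rs)
```

Computing Pearson's correlation on the ranks, instead of using the shortcut formula based on squared rank differences, keeps the result correct when ratings are tied, which is common on a coarse rating scale.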

This document was uploaded on 09/12/2013.
