Selected training examples are less effective than


...ly randomly selected training examples are less effective than user-selected ones, we still prefer complete random sampling since this is the more conservative approach which, to the extent it produces inaccurate results, probably tends to under-estimate true performance.

Ideally, users should also provide informed opinions of examples, which, if they do not select the examples themselves, may require a time-consuming process such as reading a book, listening to a CD, or watching a movie. Some metrics, such as top-N measures, only require user ratings for a specific subset of test examples, and therefore may be accurately estimated by obtaining informed ratings on a smaller set of examples after the system has made its predictions. However, this requires users to be available and willing to dedicate significant effort during the experimental evaluation, rather than allowing a system to be automatically evaluated on an archived set of existing data. In our experiments, ratings of unfamiliar items were based only on the information available from Amazon, and therefore are not ideal.

Data    N   Acc   Rec    Pr    Pr3  Pr10     F   Rt3  Rt10    rs
Lit1    5  63.5  49.0  50.3   63.3  62.0  46.5  5.87  6.02  0.31
Lit1   10  65.5  51.3  53.3   86.7  76.0  49.7  6.63  6.65  0.35
Lit1   20  73.4  64.8  62.6   86.7  81.0  62.6  7.53  7.20  0.62
Lit1   40  73.9  65.1  63.6   86.7  81.0  63.4  7.40  7.32  0.64
Lit1  100  79.0  70.7  71.1   96.7  86.0  70.5  8.03  7.44  0.69
Lit1  840  79.8  62.8  75.9   96.7  94.0  68.5  8.57  8.03  0.74
Lit2    5  59.0  57.6  52.4   70.0  74.0  53.3  6.80  6.82  0.31
Lit2   10  65.0  64.5  56.7   80.0  82.0  59.2  7.33  7.33  0.48
Lit2   20  69.5  67.2  63.2   93.3  91.0  64.1  8.20  7.84  0.59
Lit2   40  74.3  72.1  68.9   93.3  91.0  69.0  8.53  7.94  0.69
Lit2  100  78.0  78.5  71.2   96.7  94.0  74.4  8.77  8.22  0.72
Lit2  840  80.2  71.9  78.6  100.0  97.0  74.8  9.13  8.48  0.77
Myst    5  73.2  83.4  82.1   86.7  89.0  81.5  8.20  8.40  0.36
Myst   10  75.6  87.9  82.4   90.0  90.0  83.8  8.40  8.34  0.40
Myst   20  81.6  89.3  86.4   96.7  91.0  87.3  8.23  8.43  0.46
Myst   40  85.2  95.4  85.9   96.7  94.0  90.3  8.37  8.52  0.50
Myst  100  86.6  95.2  87.2   93.3  94.0  90.9  8.70  8.69  0.55
Myst  450  85.8  93.2  88.1   96.7  98.0  90.5  8.90  8.97  0.61
Sci     5  62.8  63.8  46.3   73.3  60.0  51.1  6.97  6.17  0.35
Sci    10  67.6  61.9  51.2   80.0  67.0  54.3  7.30  6.32  0.37
Sci    20  75.4  66.0  64.2   96.7  80.0  63.1  8.37  7.03  0.51
Sci    40  79.6  69.5  68.7   93.3  80.0  68.3  8.43  7.23  0.59
Sci   100  81.8  74.4  72.2   93.3  83.0  72.3  8.50  7.29  0.65
Sci   450  85.2  79.1  76.8   93.3  89.0  77.2  8.57  7.71  0.71
SF      5  67.0  38.3  32.9   40.0  29.0  28.2  5.23  4.34  0.02
SF     10  64.6  49.0  28.9   53.3  36.0  31.5  5.83  4.72  0.15
SF     20  71.8  45.8  37.4   66.7  37.0  37.8  6.23  5.04  0.21
SF     40  72.6  58.9  40.1   70.0  43.0  43.0  6.47  5.26  0.39
SF    100  76.4  65.7  46.2   80.0  56.0  52.4  7.00  5.75  0.40
SF    450  79.2  82.2  49.1   90.0  63.0  60.6  7.70  6.26  0.61

Table 6: Summary of Results

Overall, it is clear that all existing experimental methods and metrics have strengths and weaknesses. Conducting quality, controlled user experiments is difficult, expensive, and time-consuming. Obtaining proprietary data from existing commercial systems is als...
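The remark that top-N measures need ratings for only a subset of test examples can be illustrated with a small sketch. It is not the authors' evaluation code: the function name `top_n_precision`, the callback `ask_user`, and the positive-rating threshold of 6 are all hypothetical choices made for illustration. The idea is that the system ranks every test item first, and informed ratings are then requested only for the N highest-ranked items.

```python
from typing import Callable, Dict, List, Tuple


def top_n_precision(
    predicted_scores: Dict[str, float],
    get_rating: Callable[[str], int],
    n: int,
    positive_threshold: int = 6,  # assumed cutoff for a "positive" rating
) -> Tuple[float, List[str]]:
    """Return (precision over the top-n items, the items that needed a rating)."""
    # Rank items from highest to lowest predicted score; ties broken by item id.
    ranked = sorted(predicted_scores, key=lambda item: (-predicted_scores[item], item))
    top_items = ranked[:n]
    if not top_items:
        return 0.0, []
    # Only the top-n items are ever rated; the rest of the test set is untouched.
    hits = sum(1 for item in top_items if get_rating(item) >= positive_threshold)
    return hits / len(top_items), top_items


# Usage: five test items, but the user is asked about only three of them.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4, "e": 0.8}
ratings = {"a": 9, "e": 4, "c": 7}  # ratings the user supplies on request
requested: List[str] = []


def ask_user(item: str) -> int:
    requested.append(item)  # record which items required user effort
    return ratings[item]


precision, rated = top_n_precision(scores, ask_user, n=3)
print(precision, rated)
```

In this toy run the items "b" and "d" never need a rating, which is the saving the text describes; the cost is that the user must be available after the system has produced its ranking.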