09-recsys

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu

Training data
  - 100 million ratings, 480,000 users, 17,770 movies
  - 6 years of data: 2000-2005
Test data
  - Last few ratings of each user (2.8 million)
  - Evaluation criterion: root mean squared error (RMSE)
  - Netflix Cinematch RMSE: 0.9514
Competition
  - 2,700+ teams
  - $1 million prize for 10% improvement on Cinematch
  - $50,000 progress prize for 8.43% improvement
2/2/2011, Jure Leskovec, Stanford CS246: Mining Massive Datasets
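
The prize thresholds can be turned into target RMSE values with a little arithmetic. This is a minimal sketch, assuming that "improvement" means a relative reduction of the Cinematch RMSE; the function name `target_rmse` is hypothetical.

```python
# Cinematch baseline RMSE, as stated on the slide.
CINEMATCH_RMSE = 0.9514


def target_rmse(baseline: float, improvement: float) -> float:
    """RMSE needed to beat `baseline` by a relative `improvement` (0.10 = 10%)."""
    return baseline * (1.0 - improvement)


# Assumption: improvement is measured relative to the Cinematch RMSE.
grand_prize = target_rmse(CINEMATCH_RMSE, 0.10)
progress_prize = target_rmse(CINEMATCH_RMSE, 0.0843)

print(f"Grand prize target:    {grand_prize:.4f}")     # 0.8563
print(f"Progress prize target: {progress_prize:.4f}")  # 0.8712
```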

Sudden rise in the average rating (early 2004):
  - Improvements in Netflix
  - GUI improvements
  - Meaning of rating changed?
Ratings increase with the movie's age at the time of the rating.
[BellKor Team]

Count     Avg rating   Most Loved Movies
137812    4.593        The Shawshank Redemption
133597    4.545        Lord of the Rings: The Return of the King
180883    4.306        The Green Mile
150676    4.460        Lord of the Rings: The Two Towers
139050    4.415        Finding Nemo
117456    4.504        Raiders of the Lost Ark

Most Rated Movies
  - Miss Congeniality
  - Independence Day
  - The Patriot
  - The Day After Tomorrow
  - Pretty Woman
  - Pirates of the Caribbean

Highest Variance
  - The Royal Tenenbaums
  - Lost In Translation
  - Pearl Harbor
  - Miss Congeniality
  - Napoleon Dynamite
  - Fahrenheit 9/11
[BellKor Team]

User ID    # Ratings   Mean Rating
305344     17,651      1.90
387418     17,432      1.81
2439493    16,560      1.22
1664010    15,811      4.26
2118461    14,829      4.08
1461435    9,820       1.37
1639792    9,764       1.33
1314869    9,739       2.95
[BellKor Team]

[Figure: sparse matrix of ratings (values 1 to 5) for 480,000 users and 17,770 movies]
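
[Figure: ratings matrix for 480,000 users and 17,770 movies, with the most recent ratings replaced by "?"; these held-out entries form the test data set]

Mean square error over the test set $R$:

$$\mathrm{MSE} = \frac{1}{|R|} \sum_{(u,i) \in R} \left(\hat{r}_{ui} - r_{ui}\right)^2$$

Here $r_{ui}$ is the true rating of movie $i$ by user $u$, and $\hat{r}_{ui}$ is the predicted rating. RMSE, the evaluation criterion, is the square root of this quantity.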
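
The error measure can be sketched in a few lines of Python. This is a minimal illustration, not the competition's scoring code: the function name `rmse` and the toy ratings and predictions are hypothetical.

```python
import math


def rmse(actual: dict, predicted: dict) -> float:
    """Root mean squared error over the (user, movie) pairs in `actual`."""
    squared_errors = [(predicted[key] - r) ** 2 for key, r in actual.items()]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy test set R: (user, movie) -> true rating.
test_ratings = {(1, 10): 4, (1, 20): 2, (2, 10): 5, (3, 30): 1}
# Hypothetical predictions from some model for the same pairs.
predictions = {(1, 10): 3.5, (1, 20): 2.5, (2, 10): 4.0, (3, 30): 2.0}

# Squared errors are 0.25, 0.25, 1, 1, so MSE = 0.625.
print(round(rmse(test_ratings, predictions), 4))  # 0.7906
```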

[Figure: RMSE scale running from erroneous to accurate, with the "Personalization" region marked]
  Grand Prize:     0.8563 (10% improvement)
  BellKor:         0.8693 (8.63% improvement)
  Cinematch:       0.9514 (baseline)
  Movie average:   1.0533
  User average:    1.0651
  Global average:  1.1296
  Inherent noise:  ????
[BellKor Team]
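
The three non-personalized baselines on this scale (global, user, and movie average) can be sketched as follows. All data below is a hypothetical toy set, so the resulting RMSE values will not match the Netflix figures; the helper names `group_means` and `rmse` are also assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (user, movie, rating) triples.
train = [
    ("u1", "m1", 5), ("u1", "m3", 3),
    ("u2", "m2", 4), ("u2", "m4", 2),
    ("u3", "m1", 5), ("u3", "m2", 5), ("u3", "m3", 3), ("u3", "m4", 3),
]
test = [("u1", "m2", 5), ("u1", "m4", 3), ("u2", "m1", 4), ("u2", "m3", 2)]


def group_means(triples, key_index):
    """Mean rating per user (key_index=0) or per movie (key_index=1)."""
    buckets = defaultdict(list)
    for triple in triples:
        buckets[triple[key_index]].append(triple[2])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


global_mean = sum(r for _, _, r in train) / len(train)
user_mean = group_means(train, 0)
movie_mean = group_means(train, 1)


def rmse(predict):
    """RMSE of a predictor `predict(user, movie)` on the test triples."""
    errors = [(predict(u, m) - r) ** 2 for u, m, r in test]
    return math.sqrt(sum(errors) / len(errors))


# Users or movies unseen in training fall back to the global mean.
results = {
    "global average": rmse(lambda u, m: global_mean),
    "user average": rmse(lambda u, m: user_mean.get(u, global_mean)),
    "movie average": rmse(lambda u, m: movie_mean.get(m, global_mean)),
}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

On this toy set the ordering happens to mirror the slide (global average worst, movie average best), but that is a property of the made-up data, not a general guarantee.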

Earliest and most popular collaborative filtering method
  - Derive unknown ratings from those of “similar” items (movie-movie variant)
  - A parallel user-user flavor: rely on ratings of like-minded users (not in this talk)
[BellKor Team]

[Figure: utility matrix of 6 movies (rows) by 12 users (columns); each cell is either a rating between 1 and 5 or an unknown rating]
[BellKor Team]

[Figure: utility matrix of 6 movies by 12 users, with the entry for movie 1 and user 5 marked "?"]
Task: estimate the rating of movie 1 by user 5.
[BellKor Team]

[Figure: utility matrix of 6 movies by 12 users, with the entry for movie 1 and user 5 marked "?"]
Neighbor selection: identify movies similar to movie 1 that were rated by user 5.
[BellKor Team]
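
The neighbor-selection step can be sketched in Python. This is a minimal sketch under stated assumptions, not the method as presented in the lecture: the matrix layout is hypothetical (6 movies by 12 users, with the placement of ratings across users assumed), the similarity measure is mean-centered cosine (the slide does not say which measure is used), and the neighborhood size `k=2` and the similarity-weighted average in `estimate` are common choices assumed here.

```python
import math

# Hypothetical 6-movie x 12-user utility matrix: movie -> {user: rating}.
# Assumption: the placement of ratings across users is illustrative only.
ratings = {
    1: {1: 1, 3: 3, 6: 5, 9: 5, 11: 4},
    2: {3: 5, 4: 4, 7: 4, 10: 2, 11: 1, 12: 3},
    3: {1: 2, 2: 4, 4: 1, 5: 2, 7: 3, 9: 4, 10: 3, 11: 5},
    4: {2: 2, 3: 4, 5: 5, 8: 4, 11: 2},
    5: {3: 4, 4: 3, 5: 4, 6: 2, 11: 2, 12: 5},
    6: {1: 1, 3: 3, 5: 3, 8: 2, 11: 4},
}


def centered(row):
    """Subtract the movie's mean rating from each of its known ratings."""
    mean = sum(row.values()) / len(row)
    return {user: r - mean for user, r in row.items()}


def similarity(a, b):
    """Mean-centered cosine similarity; unknown ratings contribute zero."""
    ca, cb = centered(ratings[a]), centered(ratings[b])
    dot = sum(ca[u] * cb[u] for u in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def neighbors(target, user, k=2):
    """The k movies most similar to `target` among those `user` has rated."""
    candidates = [
        (similarity(target, movie), movie)
        for movie, row in ratings.items()
        if movie != target and user in row
    ]
    return sorted(candidates, reverse=True)[:k]


def estimate(target, user, k=2):
    """Similarity-weighted average of the user's ratings of the neighbors."""
    top = neighbors(target, user, k)
    weight = sum(sim for sim, _ in top)
    if weight <= 0:
        raise ValueError("no positively weighted neighbors")
    return sum(sim * ratings[movie][user] for sim, movie in top) / weight


top = neighbors(1, 5)
print([(movie, round(sim, 2)) for sim, movie in top])  # [(6, 0.59), (3, 0.41)]
print(round(estimate(1, 5), 1))                        # 2.6
```

With this assumed layout, user 5 has rated movies 3, 4, 5, and 6; movies 6 and 3 are the only ones positively correlated with movie 1, so they form the neighborhood.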