09-recsys - CS246 Mining Massive Datasets Jure Leskovec...

CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
Training data
  100 million ratings, 480,000 users, 17,770 movies
  6 years of data: 2000-2005
Test data
  Last few ratings of each user (2.8 million)
  Evaluation criterion: root mean squared error (RMSE)
  Netflix Cinematch RMSE: 0.9514
Competition
  2700+ teams
  $1 million prize for 10% improvement on Cinematch
  $50,000 progress prize for 8.43% improvement
2/2/2011 Jure Leskovec, Stanford CS246: Mining Massive Datasets
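The evaluation criterion and prize thresholds above can be made concrete with a short sketch. It assumes that "improvement" means a relative reduction in RMSE against the Cinematch baseline; the function names `rmse` and `target_rmse` are illustrative and do not come from the slides.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

CINEMATCH_RMSE = 0.9514  # baseline quoted on the slide

def target_rmse(baseline, improvement):
    """RMSE needed to beat `baseline` by a fractional `improvement`.

    Assumption: improvement is a relative reduction in RMSE.
    """
    return baseline * (1.0 - improvement)

grand_prize_rmse = target_rmse(CINEMATCH_RMSE, 0.10)       # about 0.8563
progress_prize_rmse = target_rmse(CINEMATCH_RMSE, 0.0843)  # about 0.8712
```

Under that assumption, a 10% improvement corresponds to an RMSE of roughly 0.8563 on the test set.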
Sudden rise in the avg. rating (early 2004):
  Improvements in Netflix
  GUI improvements
  Meaning of rating changed?
Ratings increase with the movie age at the time of the rating
[Bellkor Team]
Count     Avg rating   Most Loved Movies
137812    4.593        The Shawshank Redemption
133597    4.545        Lord of the Rings: The Return of the King
180883    4.306        The Green Mile
150676    4.460        Lord of the Rings: The Two Towers
139050    4.415        Finding Nemo
117456    4.504        Raiders of the Lost Ark

Most Rated Movies
  Miss Congeniality
  Independence Day
  The Patriot
  The Day After Tomorrow
  Pretty Woman
  Pirates of the Caribbean

Highest Variance
  The Royal Tenenbaums
  Lost In Translation
  Pearl Harbor
  Miss Congeniality
  Napoleon Dynamite
  Fahrenheit 9/11

[Bellkor Team]
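The three rankings above (most loved, most rated, highest variance) all derive from per-movie rating counts, means, and variances. A minimal sketch of that aggregation follows. The toy ratings and the names `movie_stats` and `ratings` are hypothetical, and population variance is an assumption, since the slides do not say which variance estimator was used.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical toy data: (user_id, movie_title, rating) triples.
ratings = [
    (1, "Movie A", 5), (2, "Movie A", 4), (3, "Movie A", 5),
    (1, "Movie B", 1), (2, "Movie B", 5),
    (3, "Movie C", 3),
]

def movie_stats(triples):
    """Group ratings by movie and compute count, mean, and population variance."""
    by_movie = defaultdict(list)
    for _user, movie, rating in triples:
        by_movie[movie].append(rating)
    return {
        movie: {"count": len(rs), "mean": mean(rs), "variance": pvariance(rs)}
        for movie, rs in by_movie.items()
    }

stats = movie_stats(ratings)
most_rated = max(stats, key=lambda m: stats[m]["count"])
most_loved = max(stats, key=lambda m: stats[m]["mean"])
highest_variance = max(stats, key=lambda m: stats[m]["variance"])
```

In practice a "most loved" ranking would likely also require a minimum rating count, so that a movie with a single 5-star rating does not top the list; that threshold is not specified in the slides.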
User ID    # Ratings   Mean Rating
305344     17,651      1.90
387418     17,432      1.81
2439493    16,560      1.22
1664010    15,811      4.26
2118461    14,829      4.08
1461435    9,820       1.37
1639792    9,764       1.33
1314869    9,739       2.95
[Bellkor Team]
[Figure: sparse matrix of ratings (values 1-5), 17,770 movies by 480,000 users]
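The ratings matrix is mostly empty, which a quick calculation makes visible. The sketch below uses the approximate training-data figures quoted earlier (100 million ratings, 480,000 users, 17,770 movies). The dictionary-of-observed-cells layout and the names `density`, `observed`, and `get_rating` are illustrative assumptions, not the representation used in the course.

```python
def density(num_ratings, num_users, num_movies):
    """Fraction of (movie, user) cells that hold a rating."""
    return num_ratings / (num_users * num_movies)

# Approximate figures from the training-data description.
netflix_density = density(100_000_000, 480_000, 17_770)  # about 0.0117

# Hypothetical toy matrix: store only the observed cells, keyed by (movie, user).
observed = {
    ("movie_1", "user_1"): 1,
    ("movie_1", "user_3"): 3,
    ("movie_2", "user_2"): 5,
}

def get_rating(matrix, movie, user):
    """Return the stored rating, or None when the user has not rated the movie."""
    return matrix.get((movie, user))
```

With these figures only about 1.2% of the cells are filled, which is why storing observed entries alone is far cheaper than a dense array.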