08-recsys

The usual heuristic is TF.IDF: term frequency times inverse document frequency.
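Below is a minimal Python sketch of the TF.IDF heuristic. It uses the slide's quantities $f_{ij}$, $n_i$ and $N$ and builds a doc profile from the highest-scoring words. The preview does not show how TF and IDF are normalized, so the sketch assumes the common choices $TF_{ij} = f_{ij} / \max_k f_{kj}$ and $IDF_i = \log_2(N / n_i)$. The helper names `tf_idf` and `doc_profile` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: w_ij} dict per document j.

    Assumed variant (not shown in the slide text):
      TF_ij = f_ij / max_k f_kj   and   IDF_i = log2(N / n_i)
    """
    n_docs = len(docs)  # N
    # n_i: number of documents that mention term i at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # f_ij for every term i in document j
        max_count = max(counts.values(), default=1)
        weights.append({
            term: (f / max_count) * math.log2(n_docs / doc_freq[term])
            for term, f in counts.items()
        })
    return weights

def doc_profile(weights, top_n):
    """Doc profile: the top_n words by TF.IDF score, with their scores."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])

# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
print(sorted(doc_profile(w[0], top_n=2)))  # ['cat', 'mat']
```

Under this variant, a word that appears in every document gets $IDF_i = \log_2 1 = 0$, so it sinks to the bottom of every profile no matter how often it occurs.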


1/30/2011 Jure Leskovec, Stanford C246: Mining Massive Datasets

$f_{ij}$ = frequency of term $t_i$ in document $d_j$
$n_i$ = number of docs that mention term $i$
$N$ = total number of docs
TF.IDF score: $w_{ij} = TF_{ij} \times IDF_i$
Doc profile = set of words with the highest TF.IDF scores, together with their scores

User profile possibilities:
- Weighted average of rated item profiles
- Variation: weight by difference from average rating for item
- …

Prediction heuristic: given user profile $c$ and item profile $s$, estimate
$u(c,s) = \cos(c,s) = \frac{c \cdot s}{|c|\,|s|}$
We need an efficient method to find items with high utility (covered later).

Pros:
- No need for data on other users.
- No cold-start or sparsity problems.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items (no first-rater problem).
- Can provide explanations of recommended items by listing the content features that caused an item to be recommended.

Cons:
- Finding the appropriate features is hard, e.g., for images, movies, music.
- Overspecialization:
  - Never recommends items outside the user's content profile.
  - People might have multiple interests.
  - Unable to exploit quality judgments of other users.
- Recommendations for new users: how to build a profile?
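The user-profile and prediction heuristics can be sketched in Python as follows. Item profiles are sparse `{feature: weight}` dicts, for example TF.IDF weights. The profile is a rating-weighted average of the rated item profiles, and $u(c,s)$ is their cosine. The `baseline` parameter is a hypothetical way to express the "weight by difference from average rating" variation: the slide does not pin down which average is meant, so the caller supplies it. Dividing by the sum of absolute weights is also an assumption, and it does not change the cosine scores. All names and data here are illustrative.

```python
import math

def weighted_profile(item_profiles, ratings, baseline=0.0):
    """Build user profile c as a weighted average of the rated item profiles.

    item_profiles: {item: {feature: weight}}
    ratings: {item: rating} for the items this user rated
    baseline: hypothetical parameter; each item's weight is rating - baseline
              (0.0 gives the plain rating-weighted average)
    """
    profile = {}
    total = 0.0
    for item, rating in ratings.items():
        weight = rating - baseline
        total += abs(weight)
        for feature, value in item_profiles[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight * value
    if total == 0:
        return {}
    # Assumed normalizer: sum of |weight| (cosine below is scale-invariant anyway).
    return {feature: value / total for feature, value in profile.items()}

def cosine(c, s):
    """u(c, s) = cos(c, s) = c.s / (|c| |s|) for sparse {feature: weight} dicts."""
    dot = sum(value * s.get(feature, 0.0) for feature, value in c.items())
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    norm_s = math.sqrt(sum(v * v for v in s.values()))
    if norm_c == 0 or norm_s == 0:
        return 0.0
    return dot / (norm_c * norm_s)

# Hypothetical toy item profiles.
items = {
    "A": {"action": 1.0, "comedy": 0.0},
    "B": {"action": 0.0, "comedy": 1.0},
    "C": {"action": 1.0, "comedy": 0.5},
}
user = weighted_profile(items, {"A": 5, "B": 1})
scores = {item: cosine(user, profile) for item, profile in items.items()}
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'C', 'B']
```

Scoring every item this way is a linear scan over the catalog, which is exactly why the slide defers the question of an efficient method for finding high-utility items.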
Collaborative filtering:
Consider user $c$.
Find the set $D$ of other users whose ratings are "similar" to $c$'s ratings.
Estimate $c$'s ratings based on the ratings of users in $D$.

Let $r_x$ be the vector of user $x$'s ratings.
- Cosine similarity measure: $\mathrm{sim}(x,y) = \cos(r_x, r_y)$
- Pearson correlation coefficient, computed over $S_{xy}$ = items rated by both users $x$ and $y$

Let $D$ be the set of $k$ users most similar to $c$ who have rated item $s$.
Possibilities for the prediction function (item $s$):
- $r_{cs} = \frac{1}{k} \sum_{d \in D} r_{ds}$
- $r_{cs} = \frac{\sum_{d \in D} \mathrm{sim}(c,d)\, r_{ds}}{\sum \ldots}$
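A minimal Python sketch of this user-user scheme follows. Ratings are stored as `{user: {item: rating}}`. Several details are assumptions not confirmed by the preview: cosine similarity treats unrated items as 0; the Pearson means are taken over each user's full rating vector; and because the weighted prediction's denominator is cut off, the sketch assumes it is $\sum_{d \in D} \mathrm{sim}(c,d)$. The function names and toy data are hypothetical.

```python
import math

def cosine_sim(rx, ry):
    """sim(x, y) = cos(r_x, r_y); unrated items count as 0 (assumption)."""
    dot = sum(r * ry[item] for item, r in rx.items() if item in ry)
    nx = math.sqrt(sum(r * r for r in rx.values()))
    ny = math.sqrt(sum(r * r for r in ry.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson_sim(rx, ry):
    """Pearson correlation over S_xy, the items rated by both x and y.

    Assumption: each user's mean is taken over all of that user's ratings.
    """
    common = rx.keys() & ry.keys()  # S_xy
    if not common:
        return 0.0
    mx = sum(rx.values()) / len(rx)
    my = sum(ry.values()) / len(ry)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in common)
    dx = math.sqrt(sum((rx[i] - mx) ** 2 for i in common))
    dy = math.sqrt(sum((ry[i] - my) ** 2 for i in common))
    return num / (dx * dy) if dx and dy else 0.0

def predict(ratings, c, s, k=2, sim=cosine_sim, weighted=True):
    """Estimate r_cs from D, the k users most similar to c who rated item s."""
    neighbors = [
        (sim(ratings[c], ratings[d]), d)
        for d in ratings
        if d != c and s in ratings[d]
    ]
    D = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    if not D:
        return None
    if not weighted:
        # r_cs = 1/k * sum_{d in D} r_ds  (|D| < k if too few users rated s)
        return sum(ratings[d][s] for _, d in D) / len(D)
    # ASSUMPTION: the preview truncates the denominator; we use sum of sim(c, d).
    denom = sum(w for w, _ in D)
    if denom == 0:
        return None
    return sum(w * ratings[d][s] for w, d in D) / denom

ratings = {  # hypothetical toy data: {user: {item: rating}}
    "c": {"A": 5, "B": 4, "C": 1},
    "x": {"A": 5, "B": 5, "C": 1, "S": 5},
    "y": {"A": 1, "B": 2, "C": 5, "S": 1},
    "z": {"A": 4, "B": 4, "S": 4},
}
print(predict(ratings, "c", "S", weighted=False))  # 4.5
print(predict(ratings, "c", "S"))
```

Here `x` and `z` are the two nearest neighbours of `c`, and `y`, whose tastes run opposite to `c`'s, is excluded from $D$. Passing `sim=pearson_sim` would make that opposition explicit through a negative similarity; if negative similarities are allowed into $D$, the assumed plain-sum denominator may need revisiting.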