Project 1: Book Recommendations
When buying things online you have probably noticed that you are often presented
with other items that “you might also like” or that “other customers also bought”. In
this project you will recommend books to a reader based on what other readers with
similar tastes have liked.
Netflix awarded one million dollars to the winners of the Netflix Prize. They asked
competitors to find an algorithm that would perform 10% better than their own
algorithm. Making good predictions about people's preferences was that important to
this company. It is also a very current area of research in machine learning, which is
part of the area of computer science called artificial intelligence.
In this assignment you will discover book recommendations for readers based on
other readers with similar tastes in books. (See the sample execution below on page
3.) The purpose of this assignment is to use common Python data structures (lists,
dictionaries, sets) and file input and text processing operations in a program that is a
little larger than any you may have encountered up to this point. It is important that
you understand the different parts of the program and plan ahead of time how you
will implement them. A suggested order for design and development appears later on
in this document.
There are two input data files used in this project.
First, there’s a list of books in “author,title” format in the file booklist.txt in
Files/Projects/Project 1 on Canvas:
Douglas Adams,The Hitchhiker's Guide To The Galaxy
Richard Adams,Watership Down
Mitch Albom,The Five People You Meet in Heaven
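As a minimal sketch of reading this format, the lines can be split into (author, title) pairs. The function names parse_book_line and read_books are hypothetical, and splitting on the first comma only is an assumption made so that a title containing a comma would stay intact.

```python
def parse_book_line(line):
    """Split one 'author,title' line into an (author, title) tuple."""
    # Assumption: the first comma separates author from title, so any
    # later commas are treated as part of the title.
    author, title = line.strip().split(",", 1)
    return author.strip(), title.strip()


def read_books(lines):
    """Build a list of (author, title) tuples, skipping blank lines."""
    return [parse_book_line(line) for line in lines if line.strip()]


# The three sample lines from the handout, held in memory for the demo.
sample = [
    "Douglas Adams,The Hitchhiker's Guide To The Galaxy\n",
    "Richard Adams,Watership Down\n",
    "Mitch Albom,The Five People You Meet in Heaven\n",
]
books = read_books(sample)
print(books[0])
```

Because an open file yields its lines one at a time, the same function should also accept a file object, for example inside "with open('booklist.txt') as f:".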
There is also a file there with user ratings for each book (ratings.txt):
Ben
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
Moose
The position of each rating matches the position of the corresponding book in the
booklist.txt file. For example, Ben’s first rating of 5 applies to The Hitchhiker’s Guide
To The Galaxy (the first book in booklist.txt), and the next rating of 0 means Ben
hasn’t read Watership Down (the second book).
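The positional matching can be sketched as follows. This assumes that ratings.txt alternates a line holding a reader's name with a line of whitespace-separated integer ratings; check the actual file before relying on that layout. The name read_ratings is hypothetical, and the sample rows are shortened to three books, with Moose's values made up for illustration.

```python
def read_ratings(lines):
    """Return {reader_name: [rating, ...]} from alternating name/ratings lines."""
    # Assumed layout: a name line, then a line of whitespace-separated integers.
    cleaned = [line.strip() for line in lines if line.strip()]
    ratings = {}
    for name, scores in zip(cleaned[0::2], cleaned[1::2]):
        ratings[name] = [int(score) for score in scores.split()]
    return ratings


# Shortened sample: Ben's first three ratings, plus made-up values for Moose.
sample = ["Ben\n", "5 0 0\n", "Moose\n", "5 5 0\n"]
titles = [
    "The Hitchhiker's Guide To The Galaxy",
    "Watership Down",
    "The Five People You Meet in Heaven",
]

ratings = read_ratings(sample)

# Position i in a reader's list refers to book i in the book list,
# so zip pairs each title with that reader's rating for it.
unread = [title for title, score in zip(titles, ratings["Ben"]) if score == 0]
print(unread)
```

Keeping each reader's ratings as a list in book order means no extra lookup table is needed to find which book a rating belongs to.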
The meaning of the rating numbers is explained in the table below.
Rating  Meaning
-5      Hated it!
-3      Didn’t like it
0       Haven’t read it
1       It’s okay
3       Liked it
5       Really liked it!
You will determine recommendations for a reader by looking at other readers that are
“close” to him or her in their tastes. This is done quite cleverly by computing the dot
product of their respective ratings, which we will call an “affinity score”. A dot product
of two vectors (i.e., same-size lists) is the sum of the products of their values in
corresponding list positions, as explained below. We will call the two readers with the
highest affinity scores relative to a given reader the “friends” of that reader.
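As a small worked example, the ratings [5, 0, 3, -5] and [5, 1, 0, -3] have an affinity score of 5*5 + 0*1 + 3*0 + (-5)*(-3) = 40. Both readers liked the first book and disliked the last, so both products are positive, while any book that either reader has not read contributes 0. The sketch below computes this score and picks the two friends; the function names, the reader names, and the rule of breaking ties alphabetically are all assumptions made for illustration, not requirements of the assignment.

```python
def affinity(ratings_a, ratings_b):
    """Dot product of two equal-length rating lists."""
    return sum(a * b for a, b in zip(ratings_a, ratings_b))


def friends(reader, ratings, count=2):
    """Return the `count` other readers with the highest affinity to `reader`."""
    others = [name for name in ratings if name != reader]
    # Sort by score, highest first; ties fall back to alphabetical order
    # (an assumption, chosen only to make the result deterministic).
    others.sort(key=lambda name: (-affinity(ratings[reader], ratings[name]), name))
    return others[:count]


# Made-up ratings for four hypothetical readers and four books.
ratings = {
    "ann": [5, 0, 3, -5],
    "bob": [5, 1, 0, -3],
    "cat": [-5, 0, 3, 5],
    "dan": [3, 0, 5, 0],
}

print(affinity(ratings["ann"], ratings["bob"]))  # 40
print(friends("ann", ratings))                   # ['bob', 'dan']
```

Here cat scores -41 against ann because their opinions on the first and last books are opposite, which is why cat is not among ann's friends.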