Hw3 - CS 124 / LINGUIST 180 Winter 2011 Homework 3: Movie Reviews Sentiment Classification (Due Thursday Jan 27 9:30am)

CS 124 / LINGUIST 180 - Winter 2011
Homework 3: Movie Reviews Sentiment Classification
Due: Thursday Jan 27 9:30am

Your goal for this homework is to do sentiment analysis: classifying movie reviews as positive or negative. Recall from lecture that sentiment analysis can be used to extract people's opinions about all sorts of things (congressional debates, presidential speeches, reviews, blogs) and at many levels of granularity (the sentence, the paragraph, the entire document). Our goal in this task is to look at an entire movie review and classify it as positive or negative.

Algorithm

You will be using Naive Bayes with Laplace smoothing, following the pseudocode in Manning, Raghavan, and Schütze on page 241 of the printed edition (page 260 in the "pdf-for-printing" version that's online). Your classifier will use words as features, add the logprob scores for each token, and make a binary decision between positive and negative.

You will also explore the effects of stop word filtering. This means removing common words like "the", "a" and "it" from your train and test sets. We have provided a stop list with the starter code, which can be found at:

/usr/class/cs124/assignments/hw3/english.stop

This algorithm is a simplified version of the one in this paper that you read for class:

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79-86.

Evaluation

You are given two data sets, imdb1 and imdb2. The first data set, imdb1, is the actual data set used in the original Pang and Lee paper. The second was collected several years later by Chris Potts here at Stanford. You will evaluate your model on the two data sets using two methods.
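
To make the algorithm described above concrete, here is a minimal sketch of multinomial Naive Bayes with Laplace (add-one) smoothing and optional stop word filtering. It is not the starter code or a reference solution: the function names `train_nb` and `classify_nb`, the `(tokens, label)` input format, and the choice to ignore test tokens that were never seen in training are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(docs, stop_words=frozenset()):
    """Train on an iterable of (tokens, label) pairs (hypothetical input format).

    Returns (log_prior, log_likelihood, vocab).
    """
    class_doc_counts = Counter()   # number of documents per class
    token_counts = {}              # class -> Counter of token frequencies
    vocab = set()
    for tokens, label in docs:
        class_doc_counts[label] += 1
        counts = token_counts.setdefault(label, Counter())
        for tok in tokens:
            if tok in stop_words:  # stop word filtering on the training set
                continue
            counts[tok] += 1
            vocab.add(tok)

    n_docs = sum(class_doc_counts.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    log_likelihood = {}
    for c, counts in token_counts.items():
        # Laplace smoothing: add 1 to every count, add |V| to the denominator.
        denom = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {t: math.log((counts[t] + 1) / denom) for t in vocab}
    return log_prior, log_likelihood, vocab


def classify_nb(model, tokens, stop_words=frozenset()):
    """Return the class with the highest summed log-probability score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for tok in tokens:
            # Assumption: tokens outside the training vocabulary are skipped.
            if tok in vocab and tok not in stop_words:
                score += log_likelihood[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)
```

A call might look like `model = train_nb(training_pairs, stop_words)` followed by `classify_nb(model, review_tokens, stop_words)`, where `stop_words` is a set of words read from the provided stop list. Working in log space avoids numeric underflow when multiplying many small probabilities over a long review.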

This document was uploaded on 06/01/2011.
