124.11.lec4

124.11.lec4 - Dan Jurafsky Lecture 4: Text Classification;...

This preview shows pages 1–11.


Dan Jurafsky
Lecture 4: Text Classification; The Naïve Bayes algorithm
CS 124/LINGUIST 180: From Languages to Information

IP notice: most slides for today from Chris Manning, plus some from William Cohen, Chien Chin Chen, Jason Eisner, David Yarowsky, P Nakov, Marti Hearst, Barbara Rosario

Outline
    Introduction to Text Classification
        Also called text categorization.
    Naïve Bayes text classification

Is this spam?

More Applications of Text Classification
    Authorship identification
    Age/gender identification
    Language identification
    Assigning topics such as Yahoo-categories
        e.g., "finance," "sports," "news>world>asia>business"
    Genre detection
        e.g., "editorials" "movie-reviews" "news"
    Opinion/sentiment analysis on a person/product
        e.g., like, hate, neutral
    Labels may be domain-specific
        e.g., "contains adult language" : "doesn't"

Text Classification: definition
    The classifier:
        Input: a document $d$
        Output: a predicted class $c$ from some fixed set of labels $c_1, \ldots, c_K$
    The learner:
        Input: a set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$
        Output: a learned classifier $f: d \to c$
    Slide from William Cohen

Document Classification
    Test data: planning language proof intelligence
    Classes and training data:
        (AI)
            ML: learning intelligence algorithm reinforcement network ...
            Planning: planning temporal reasoning plan language ...
        (Programming)
            Semantics: programming semantics language proof ...
            Garb.Coll.: garbage collection memory optimization region ...
        (HCI)
            Multimedia: ...
            GUI: ...
    Slide from Chris Manning

Classification Methods: Hand-coded rules
    Some spam/email filters, etc.
    E.g., assign category if document contains a given boolean combination of words
    Accuracy is often very high if a rule has been carefully refined over time by a subject expert
    Building and maintaining these rules is expensive
    Slide from Chris Manning

Classification Methods: Machine Learning
    Supervised Machine Learning
        To learn a function from documents (or sentences) to labels
        Naive Bayes (simple, common method)
        Others
            k-Nearest Neighbors (simple, powerful)
            Support-vector machines (new, more powerful)
            plus many other methods
    No free lunch: requires hand-classified training data
    But data can be built up (and refined) by amateurs
    Slide from Chris Manning

Naïve Bayes Intuition

Representing text for classification
    Slide from William Cohen
    ARGENTINE 1986/87 GRAIN/OILSEED REGISTRATIONS
    BUENOS AIRES, Feb 26
    Argentine grain board figures show crop registrations of grains, oilseeds and their products to February 11, in thousands of tonnes, showing those for future shipments month, 1986/87 total and 1985/86 total to February 12, 1986, in brackets:
    Bread wheat prev 1,655.8, Feb 872.0, March 164.6, total 2,692.4 (4,161.0)....
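
The hand-coded-rules slide describes assigning a category when a document contains a given boolean combination of words. The following is a minimal Python sketch of that idea, not code from the lecture. The rule table, the label names "grain" and "spam", and the helpers tokenize and classify are all hypothetical names introduced for illustration. It is run on a shortened copy of the Argentine grain text quoted above.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# A rule is a predicate over the set of distinct lowercase words in a document.
Rule = Callable[[Set[str]], bool]


def tokenize(document: str) -> Set[str]:
    """Lowercase the text and return its distinct alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", document.lower()))


# Hypothetical hand-written rules, checked in order; the first match wins.
# Each rule is a boolean combination (AND / OR / NOT) of word-presence tests.
RULES: List[Tuple[str, Rule]] = [
    ("grain", lambda w: "grain" in w and ("wheat" in w or "oilseed" in w)),
    ("spam", lambda w: "free" in w and "offer" in w and "unsubscribe" not in w),
]


def classify(document: str, default: Optional[str] = None) -> Optional[str]:
    """Return the label of the first matching rule, or `default` if none match."""
    words = tokenize(document)
    for label, rule in RULES:
        if rule(words):
            return label
    return default


# Shortened copy of the example text from the slide.
doc = (
    "Argentine grain board figures show crop registrations of grains, "
    "oilseeds and their products to February 11. Bread wheat prev 1,655.8"
)
print(classify(doc))
```

In this sketch classify plays the role of the classifier in the definition slide: it takes a document d and returns a class c from a fixed label set. Nothing is learned from labeled data, so every rule has to be written and maintained by hand, which is the cost the slide points out.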

This document was uploaded on 06/01/2011.

Page 1 / 49
