lecture11-vector-classify-handout-6-per

space?

Introduction to Information Retrieval

Sec. 14.1 Classification Using Vector Spaces

Documents in a Vector Space
- As before, the training set is a set of documents, each labeled with its class (e.g., topic).
- In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space.
- Premise 1: Documents in the same class form a contiguous region of space.
- Premise 2: Documents from different classes don't overlap (much).
- We define surfaces to delineate classes in the space.
[Figure: class regions labeled Government, Science, Arts]

Test Document of what class?
[Figure: class regions labeled Government, Science, Arts]

Test Document = Government
Is this similarity hypothesis true in general?
[Figure: class regions labeled Government, Science, Arts]

Our main topic today is how to find good separators.

Aside: 2D/3D graphs can be misleading

Sec. 14.2 Using Rocchio for text classification
- Relevance feedback methods can be adapted for text categorization.
  - As noted before, relevance feedback can be viewed as 2-class classification: relevant vs. nonrelevant documents.
- Use standard tf-idf weighted vectors to represent text documents.
- For training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category.
  - Prototype = centroid of...
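The premise that a labeled training set becomes a labeled set of points in vector space can be sketched in Python. This minimal sketch turns a tiny, made-up set of Government/Science/Arts documents into unit-length tf-idf vectors. The toy corpus, the whitespace tokenizer, and the particular weighting are assumptions made for illustration. The weighting used here is log-scaled term frequency times log10(N/df), followed by cosine normalization. The slides themselves only say "standard tf-idf".

```python
import math
from collections import Counter

# Hypothetical toy training set: (document text, class label) pairs.
TRAINING_SET = [
    ("parliament passes budget vote", "Government"),
    ("senate debates tax bill", "Government"),
    ("telescope observes distant galaxy", "Science"),
    ("physicists measure particle mass", "Science"),
    ("museum opens painting exhibit", "Arts"),
    ("orchestra performs new symphony", "Arts"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def tf_idf_vectors(docs):
    """Map each document to a sparse tf-idf vector (dict: term -> weight).

    Assumed weighting: (1 + log10 tf) * log10(N / df), then each vector is
    scaled to unit Euclidean length.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log10(c)) * math.log10(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Leave all-zero vectors as-is to avoid dividing by zero.
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else vec)
    return vectors


texts = [text for text, _ in TRAINING_SET]
labels = [label for _, label in TRAINING_SET]
# The labeled training set becomes a labeled set of points in term space.
labeled_points = list(zip(tf_idf_vectors(texts), labels))
```

Each entry of `labeled_points` pairs a sparse vector with its class label. A real system would use a proper tokenizer and a much larger vocabulary.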
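The Rocchio recipe for text classification builds one prototype vector per category and assigns a test document to the category with the most similar prototype. It can be sketched as follows. This is a minimal sketch under several assumptions:

- The vectors are dense and already weighted.
- The three-dimensional training vectors are invented.
- Cosine similarity is the comparison. Euclidean distance to the centroid is another common choice.
- The helper names `train_rocchio` and `classify` are hypothetical.

Under cosine similarity, summing a class's vectors and averaging them (the centroid) produce the same assignments, because the two differ only by a positive scale factor.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length vectors (the class prototype)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_rocchio(labeled_vectors):
    """Compute one prototype (centroid) per class from (vector, label) pairs."""
    by_class = defaultdict(list)
    for vec, label in labeled_vectors:
        by_class[label].append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def classify(prototypes, doc_vec):
    """Assign the class whose prototype is most cosine-similar to doc_vec."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], doc_vec))


# Hypothetical, already-weighted 3-term vectors for a toy training set.
training = [
    ([0.9, 0.1, 0.0], "Government"),
    ([0.8, 0.2, 0.1], "Government"),
    ([0.1, 0.9, 0.1], "Science"),
    ([0.0, 0.8, 0.3], "Science"),
    ([0.1, 0.0, 0.9], "Arts"),
    ([0.2, 0.1, 0.8], "Arts"),
]
prototypes = train_rocchio(training)
print(classify(prototypes, [0.7, 0.2, 0.1]))  # → Government
```

Each prototype is a single point per class, so classification costs one similarity computation per class, regardless of how many training documents there were.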