lecture11-vector-classify-handout-6-per



Introduction to Information Retrieval

... members of class
• Assign test documents to the category with the closest prototype vector based on cosine similarity.

Definition of centroid (Sec. 14.2)

$$\mu(c) = \frac{1}{|D_c|} \sum_{d \in D_c} v(d)$$

• Where $D_c$ is the set of all documents that belong to class c and v(d) is the vector space representation of d.
• Note that centroid will in general not be a unit vector even when the inputs are unit vectors.

Illustration of Rocchio Text Categorization (Sec. 14.2)
[Figure: Illustration of Rocchio Text Categorization]

Rocchio Properties (Sec. 14.2)
• Forms a simple generalization of the examples in each class (a prototype).
• Prototype vector does not need to be averaged or otherwise normalized for length since cosine similarity is insensitive to vector length.
• Classification is based on similarity to class prototypes.
• Does not guarantee classifications are consistent with the given training data. Why not?

Rocchio Anomaly (Sec. 14.2)
• Prototype models have problems with polymorphic (disjunctive) categories.

Rocchio classification (Sec. 14.2)
• Rocchio forms a simple representation for each class: the centroid/prototype
• Classification is based on similarity to / distance from the prototype/centroid
• It does not guarantee tha...

k Nearest Neighbor Classification
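
To make the Rocchio slides concrete, here is a minimal Python sketch of the procedure they describe: compute one centroid per class as the average of that class's document vectors, then assign a test document to the class whose centroid has the highest cosine similarity. The function names (`normalize`, `cosine`, `train_rocchio`, `classify`), the three-dimensional vectors, and the class labels are all hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u, v):
    """Cosine similarity; insensitive to the lengths of u and v."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def train_rocchio(docs, labels):
    """Return {class: centroid}, where centroid = (1/|D_c|) * sum of v(d) over d in D_c."""
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(vec)
    centroids = {}
    for label, vecs in by_class.items():
        dim = len(vecs[0])
        centroids[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return centroids


def classify(centroids, vec):
    """Assign vec to the class whose centroid is closest by cosine similarity."""
    return max(centroids, key=lambda c: cosine(centroids[c], vec))


# Made-up toy data: unit-length document vectors in a 3-term vocabulary.
train_docs = [normalize(v) for v in ([3, 1, 0], [2, 2, 0], [0, 1, 3], [0, 2, 2])]
train_labels = ["sports", "sports", "politics", "politics"]

centroids = train_rocchio(train_docs, train_labels)

# The centroid of unit vectors pointing in different directions is shorter than 1.
sports_norm = math.sqrt(sum(x * x for x in centroids["sports"]))
print("norm of sports centroid:", sports_norm)

prediction = classify(centroids, normalize([4, 1, 1]))
print("predicted class:", prediction)
```

Because `cosine` divides by both vector lengths, the sketch would give the same predictions if `train_rocchio` summed the vectors instead of averaging them, which is the point the Rocchio Properties slide makes about length normalization.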