Introduction to Information Retrieval

…polymorphic categories better than Rocchio/NB.

Nearest Neighbor with Inverted Index (Sec. 14.3)
- Naively, finding nearest neighbors requires a linear search through the |D| documents in the collection.
- But determining the k nearest neighbors is the same as determining the k best retrievals, using the test document as a query against a database of training documents.
- So use standard vector space inverted index methods to find the k nearest neighbors.
- Testing time: O(B|Vt|), where B is the average number of training documents in which a test-document word appears.
- Typically B << |D|.

kNN: Discussion (Sec. 14.3)
- No feature selection necessary.
- Scales well with a large number of classes:
  - Don't need to train n classifiers for n classes.
- Classes can influence each other:
  - Small changes to one class can have a ripple effect.
- Scores can be hard to convert to probabilities.
- No training necessary:
  - Actually, perhaps not true (data editing, etc.).
- May be expensive at test time.
- In most cases it is more accurate than NB or Rocchio.

Bias vs. variance: Choosing the correct model capacity (Sec. 14.6)

kNN vs. Naive Bayes (Sec. 14.6)
- Bias/variance tradeoff.
- Variance ≈ capacity.
- kNN has high variance and low bias:
  - Infinite memory.
- NB ha…
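The inverted-index idea above can be sketched in a few lines: score each training document against the test document by walking only the postings lists for the test document's terms, so the work is proportional to |Vt| terms times B postings each, rather than a linear scan over all |D| documents. This is an illustrative sketch, not the slides' implementation; the raw-term-frequency weighting, cosine length normalization, and the toy "china"/"japan" training set are all assumptions made for the example.

```python
import math
from collections import defaultdict, Counter

def build_index(train_docs):
    """train_docs: list of (label, term_list). Returns postings, labels, norms."""
    postings = defaultdict(list)   # term -> [(doc_id, tf), ...]
    labels, norms = [], []
    for doc_id, (label, terms) in enumerate(train_docs):
        labels.append(label)
        tf = Counter(terms)
        for term, count in tf.items():
            postings[term].append((doc_id, count))
        # Euclidean length of the tf vector, for cosine normalization.
        norms.append(math.sqrt(sum(c * c for c in tf.values())))
    return postings, labels, norms

def knn_classify(test_terms, postings, labels, norms, k=3):
    """Majority vote over the k training docs with highest cosine score."""
    tf_test = Counter(test_terms)
    scores = defaultdict(float)
    for term, w_test in tf_test.items():              # |Vt| test-doc terms ...
        for doc_id, w_train in postings.get(term, []):  # ... times B postings each
            scores[doc_id] += w_test * w_train
    # Divide by training-doc length; the test doc's own norm is a constant
    # factor and does not change the ranking.
    ranked = sorted(scores, key=lambda d: scores[d] / norms[d], reverse=True)
    return Counter(labels[d] for d in ranked[:k]).most_common(1)[0][0]

# Hypothetical toy training collection for illustration.
train = [
    ("china", "chinese beijing chinese".split()),
    ("china", "chinese chinese shanghai".split()),
    ("china", "chinese macao".split()),
    ("japan", "tokyo japan chinese".split()),
]
postings, labels, norms = build_index(train)
print(knn_classify("chinese chinese tokyo japan".split(), postings, labels, norms, k=3))
```

Note that only the postings lists for "chinese", "tokyo", and "japan" are touched when classifying the test document; the rest of the index is never read, which is the source of the O(B|Vt|) testing cost.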
This document was uploaded on 02/26/2014.
