Precision specifies the fraction of retrieved documents that are in fact relevant, i.e. belong to the target class. Recall indicates which fraction of the relevant documents is retrieved.

precision = #{relevant ∩ retrieved} / #retrieved
recall = #{relevant ∩ retrieved} / #relevant        (6)

Obviously there is a trade-off between precision and recall. Most classifiers internally determine some "degree of membership" in the target class. If only documents of high degree are assigned to the target class, precision is high; however, many relevant documents might have been overlooked, which corresponds to low recall. When, on the other hand, the search is more exhaustive, recall increases and precision goes down. The F-score is a compromise between the two for measuring the overall performance of classifiers:

F = 2 / (1/recall + 1/precision)        (7)

3.1.1 Index Term Selection

As document collections often contain more than 100,000 different words, we may select the most informative ones for a specific classification task to reduce the number of words and thus the complexity of the c...
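As a concrete illustration of equations (6) and (7), precision, recall, and the F-score can be computed from sets of relevant and retrieved document IDs. This is a minimal sketch; the function and variable names are illustrative, not taken from the text:

```python
def precision_recall_f(relevant, retrieved):
    """Compute precision, recall, and F-score (Eqs. 6 and 7)
    from sets of relevant and retrieved document IDs."""
    hits = len(relevant & retrieved)  # #{relevant ∩ retrieved}
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F = 2 / (1/recall + 1/precision), the harmonic mean of the two;
    # equivalently 2*p*r / (p + r), which also handles hits == 0 safely.
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

# Example: 8 retrieved documents, 6 of which are among the 10 relevant ones.
p, r, f = precision_recall_f(relevant=set(range(10)),
                             retrieved={0, 1, 2, 3, 4, 5, 20, 21})
# p = 0.75, r = 0.6, F = 2 / (1/0.6 + 1/0.75) = 2/3
```

Note how the harmonic mean punishes imbalance: a classifier with precision 1.0 but recall near 0 still gets an F-score near 0, matching the trade-off described above.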

This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.
