We may classify the documents of this test set with

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: the specific method employed, a data mining classification task starts with a training set D = (d1 , . . . , dn ) of documents that are already labelled with a class L ∈ L (e.g. sport, politics). The task is then to determine a classification model f :D→L f (d) = L (5) which is able to assign the correct class to a new document d of the domain. To measure the performance of a classification model a random fraction of the labelled documents is set aside and not used for training. We may classify the documents of this test set with the classification model and compare the estimated labels with the true labels. The fraction of correctly classified documents in relation to the total number of documents is called accuracy and is a first performance measure. 30 LDV-FORUM A Brief Survey of Text Mining Often, however, the target class covers only a small percentage of the documents. Then we get a high accuracy if we assign each document to the alternative class. To avoid this effect different measures of classification success are often used. Precision quanti...
View Full Document

This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.

Ask a homework question - tutors are online