This preview shows page 1. Sign up to view the full content.
Unformatted text preview: the
speciﬁc method employed, a data mining classiﬁcation task starts with a training
set D = (d1 , . . . , dn ) of documents that are already labelled with a class L ∈ L
(e.g. sport, politics). The task is then to determine a classiﬁcation model
f :D→L f (d) = L (5) which is able to assign the correct class to a new document d of the domain.
To measure the performance of a classiﬁcation model a random fraction
of the labelled documents is set aside and not used for training. We may
classify the documents of this test set with the classiﬁcation model and compare
the estimated labels with the true labels. The fraction of correctly classiﬁed
documents in relation to the total number of documents is called accuracy and is
a ﬁrst performance measure. 30 LDV-FORUM A Brief Survey of Text Mining
Often, however, the target class covers only a small percentage of the documents. Then we get a high accuracy if we assign each document to the alternative class. To avoid this effect different measures of classiﬁcation success are
often used. Precision quanti...
View Full Document
This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.
- Summer '11