10.1.1.153.6679

# The three measures return values in the interval [0, 1]


lling, is treated as if it is the desired set of documents for a query which leads to the same definitions for precision, recall and f-score as defined in Equations 6 and 7. The two partitions $\mathcal{P}$ and $\mathcal{L}$ are then compared as follows. The precision of a cluster $P \in \mathcal{P}$ for a given category $L \in \mathcal{L}$ is given by

$$\mathrm{Precision}(P, L) := \frac{|P \cap L|}{|P|}. \tag{11}$$

The overall value for purity is computed by taking the weighted average of maximal precision values:

$$\mathrm{Purity}(\mathcal{P}, \mathcal{L}) := \sum_{P \in \mathcal{P}} \frac{|P|}{|D|} \max_{L \in \mathcal{L}} \mathrm{Precision}(P, L). \tag{12}$$

The counterpart of purity is:

$$\mathrm{InversePurity}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \mathrm{Recall}(P, L), \tag{13}$$

where $\mathrm{Recall}(P, L) := \mathrm{Precision}(L, P)$ and the well known

$$\text{F-Measure}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \frac{2 \cdot \mathrm{Recall}(P, L) \cdot \mathrm{Precision}(P, L)}{\mathrm{Recall}(P, L) + \mathrm{Precision}(P, L)}, \tag{14}$$

which is based on the F-score as defined in Eq. 7. The three measures return values in the interval [0, 1], with 1 indicating optimal agreement. Purity measures the homogeneity of the resulting clusters when evaluated against a pre-categorization, while inverse purity measures how stable the pre-defined categories are when split up into clusters. Thus, purity achieves an “optimal” value of 1 when the number of clusters k equals |D|, whereas inverse purity achieves an “optimal” value of 1 when k equals 1. Another name...
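To make Equations 11 to 14 concrete, here is a minimal Python sketch. It is an illustration rather than the authors' implementation: the function names and the toy data are hypothetical, clusters and categories are assumed to be lists of Python sets that each partition the same document set D, and returning 0 for the F-score when precision and recall are both 0 is an assumption the excerpt does not state.

```python
def precision(p, l):
    """Eq. 11: share of cluster p that falls into category l."""
    return len(p & l) / len(p)


def recall(p, l):
    """Recall(P, L) := Precision(L, P)."""
    return precision(l, p)


def f_score(p, l):
    """Harmonic mean of precision and recall for one cluster/category pair."""
    pr, rc = precision(p, l), recall(p, l)
    # Assumption: treat the 0/0 case (disjoint sets) as 0.
    return 0.0 if pr + rc == 0 else 2 * rc * pr / (rc + pr)


def purity(clusters, categories):
    """Eq. 12: size-weighted average of the best precision per cluster."""
    n = sum(len(p) for p in clusters)  # |D|, assuming clusters partition D
    return sum(len(p) / n * max(precision(p, l) for l in categories)
               for p in clusters)


def inverse_purity(clusters, categories):
    """Eq. 13: size-weighted average of the best recall per category."""
    n = sum(len(l) for l in categories)  # |D|, assuming categories partition D
    return sum(len(l) / n * max(recall(p, l) for p in clusters)
               for l in categories)


def f_measure(clusters, categories):
    """Eq. 14: size-weighted average of the best F-score per category."""
    n = sum(len(l) for l in categories)
    return sum(len(l) / n * max(f_score(p, l) for p in clusters)
               for l in categories)


# Hypothetical toy data: six documents, two categories, three clusters.
categories = [{1, 2, 3}, {4, 5, 6}]
clusters = [{1, 2}, {3, 4}, {5, 6}]

print(purity(clusters, categories))
print(inverse_purity(clusters, categories))
print(f_measure(clusters, categories))
```

The two extreme cases from the text can be checked with the same functions: putting each document in its own cluster (k = |D|) drives purity to 1, and putting all documents in a single cluster (k = 1) drives inverse purity to 1.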