lling, is treated as if it is the desired set of documents
for a query, which leads to the same definitions for precision, recall and f-score
as defined in Equations 6 and 7. The two partitions $\mathcal{P}$ and $\mathcal{L}$ are then compared as follows.
The precision of a cluster $P \in \mathcal{P}$ for a given category $L \in \mathcal{L}$ is given by
\[
\mathrm{Precision}(P, L) := \frac{|P \cap L|}{|P|} \tag{11}
\]
The overall value for purity is computed by taking the weighted average of
maximal precision values:
\[
\mathrm{Purity}(\mathcal{P}, \mathcal{L}) := \sum_{P \in \mathcal{P}} \frac{|P|}{|D|} \max_{L \in \mathcal{L}} \mathrm{Precision}(P, L). \tag{12}
\]
The counterpart of purity is:
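\[
\mathrm{InversePurity}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \mathrm{Recall}(P, L), \tag{13}
\]
where $\mathrm{Recall}(P, L) := \mathrm{Precision}(L, P)$, and the well-known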
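\[
\mathrm{F\text{-}Measure}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \frac{2 \cdot \mathrm{Recall}(P, L) \cdot \mathrm{Precision}(P, L)}{\mathrm{Recall}(P, L) + \mathrm{Precision}(P, L)}, \tag{14}
\]
which is based on the F-score as defined in Eq. 7.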
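The definitions in Equations 11 to 14 can be sketched in a few lines of Python. This is a minimal illustration, not code from the original text: the function names, the representation of clusters and categories as lists of non-empty sets, and the six-document toy data are all assumptions made for this example. It further assumes that both lists partition the same document set D, so |D| can be obtained by summing the set sizes.

```python
from typing import Hashable, Sequence, Set

Group = Set[Hashable]  # one cluster P or one category L (assumed non-empty)


def precision(p: Group, l: Group) -> float:
    """Precision(P, L) = |P ∩ L| / |P|, as in Eq. 11."""
    return len(p & l) / len(p)


def recall(p: Group, l: Group) -> float:
    """Recall(P, L) := Precision(L, P)."""
    return precision(l, p)


def f_score(p: Group, l: Group) -> float:
    """Harmonic mean of recall and precision; 0 if P and L are disjoint."""
    r, pr = recall(p, l), precision(p, l)
    return 0.0 if r + pr == 0 else 2 * r * pr / (r + pr)


def purity(clusters: Sequence[Group], categories: Sequence[Group]) -> float:
    """Weighted average of the maximal precision per cluster (Eq. 12)."""
    n = sum(len(p) for p in clusters)  # |D|, assuming a partition
    return sum(len(p) / n * max(precision(p, l) for l in categories)
               for p in clusters)


def inverse_purity(clusters: Sequence[Group], categories: Sequence[Group]) -> float:
    """Weighted average of the maximal recall per category (Eq. 13)."""
    n = sum(len(l) for l in categories)  # |D|, assuming a partition
    return sum(len(l) / n * max(recall(p, l) for p in clusters)
               for l in categories)


def f_measure(clusters: Sequence[Group], categories: Sequence[Group]) -> float:
    """Weighted average of the maximal F-score per category (Eq. 14)."""
    n = sum(len(l) for l in categories)  # |D|, assuming a partition
    return sum(len(l) / n * max(f_score(p, l) for p in clusters)
               for l in categories)


# Hypothetical toy data: six documents, two clusters, two categories.
clusters = [{1, 2}, {3, 4, 5, 6}]
categories = [{1, 2, 3}, {4, 5, 6}]

print("purity:", purity(clusters, categories))
print("inverse purity:", inverse_purity(clusters, categories))
print("F-measure:", f_measure(clusters, categories))
```

In this toy setting the first cluster is pure while the second contains one document of the other category, so purity and inverse purity both fall below 1.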
The three measures return values in the interval [0, 1], with 1 indicating
optimal agreement. Purity measures the homogeneity of the resulting clusters
when evaluated against a pre-categorization, while inverse purity measures
how stable the pre-defined categories are when split up into clusters. Thus,
purity trivially achieves an “optimal” value of 1 when every document forms its
own cluster, i.e. when the number of clusters k equals |D|, whereas inverse purity
achieves an “optimal” value of 1 when all documents fall into a single cluster (k = 1).
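The two degenerate cases can be checked directly from the definitions. The short derivation below is a sketch under the assumption that the categories in $\mathcal{L}$ partition D; the symbol $L_d$, denoting the category that contains document $d$, is introduced here only for illustration.

```latex
% Case k = |D|: every cluster is a singleton P = \{d\}.
\mathrm{Precision}(\{d\}, L_d) = \frac{|\{d\} \cap L_d|}{|\{d\}|} = 1
\quad\Longrightarrow\quad
\mathrm{Purity}(\mathcal{P}, \mathcal{L}) = \sum_{d \in D} \frac{1}{|D|} \cdot 1 = 1.

% Case k = 1: the only cluster is P = D.
\mathrm{Recall}(D, L) = \mathrm{Precision}(L, D) = \frac{|L \cap D|}{|L|} = 1
\quad\Longrightarrow\quad
\mathrm{InversePurity}(\mathcal{P}, \mathcal{L}) = \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \cdot 1 = 1.
```

Neither clustering is useful in practice, which is why the two measures are usually read together or combined via the F-measure.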