

...lling, is treated as if it were the desired set of documents for a query, which leads to the same definitions for precision, recall and F-score as in Equations 6 and 7. The two partitions $\mathcal{P}$ and $\mathcal{L}$ are then compared as follows. The precision of a cluster $P \in \mathcal{P}$ for a given category $L \in \mathcal{L}$ is given by

\[ \operatorname{Precision}(P, L) := \frac{|P \cap L|}{|P|}. \tag{11} \]

The overall value for purity is computed by taking the weighted average of maximal precision values:

\[ \operatorname{Purity}(\mathcal{P}, \mathcal{L}) := \sum_{P \in \mathcal{P}} \frac{|P|}{|D|} \max_{L \in \mathcal{L}} \operatorname{Precision}(P, L). \tag{12} \]

The counterpart of purity is

\[ \operatorname{InversePurity}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \operatorname{Recall}(P, L), \tag{13} \]

where $\operatorname{Recall}(P, L) := \operatorname{Precision}(L, P)$, and the well-known

\[ \operatorname{F-Measure}(\mathcal{P}, \mathcal{L}) := \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} \frac{2 \cdot \operatorname{Recall}(P, L) \cdot \operatorname{Precision}(P, L)}{\operatorname{Recall}(P, L) + \operatorname{Precision}(P, L)}, \tag{14} \]

which is based on the F-score as defined in Eq. 7.

The three measures return values in the interval $[0, 1]$, with 1 indicating optimal agreement. Purity measures the homogeneity of the resulting clusters when evaluated against a pre-categorization, while inverse purity measures how stable the pre-defined categories are when split up into clusters. Thus, purity achieves an "optimal" value of 1 when the number of clusters $k$ equals $|D|$, whereas inverse purity achieves an "optimal" value of 1 when $k$ equals 1. Another name...
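The measures in Equations 11 to 14 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that clusters and categories are given as non-empty Python sets of document ids, that both partitions cover the same document set D, and the function names (`precision`, `purity`, `inverse_purity`, `f_measure`) are chosen here for illustration.

```python
def precision(p, l):
    """Eq. 11: |P ∩ L| / |P| for two non-empty sets of document ids."""
    return len(p & l) / len(p)


def recall(p, l):
    """Recall(P, L) := Precision(L, P)."""
    return precision(l, p)


def f_score(p, l):
    """Harmonic mean of recall and precision; 0 when the sets are disjoint."""
    r, pr = recall(p, l), precision(p, l)
    return 0.0 if r + pr == 0 else 2 * r * pr / (r + pr)


def purity(clusters, categories):
    """Eq. 12: size-weighted average of each cluster's best precision."""
    n = sum(len(p) for p in clusters)  # |D|, assuming clusters partition D
    return sum(len(p) / n * max(precision(p, l) for l in categories)
               for p in clusters)


def inverse_purity(clusters, categories):
    """Eq. 13: size-weighted average of each category's best recall."""
    n = sum(len(l) for l in categories)  # |D|, assuming categories partition D
    return sum(len(l) / n * max(recall(p, l) for p in clusters)
               for l in categories)


def f_measure(clusters, categories):
    """Eq. 14: size-weighted average of each category's best F-score."""
    n = sum(len(l) for l in categories)
    return sum(len(l) / n * max(f_score(p, l) for p in clusters)
               for l in categories)


# Hypothetical toy data: six documents, three clusters, two categories.
clusters = [{1, 2, 3}, {4, 5}, {6}]
categories = [{1, 2}, {3, 4, 5, 6}]

print(purity(clusters, categories))
print(inverse_purity(clusters, categories))
print(f_measure(clusters, categories))
```

For this toy data the three values work out to 5/6, 2/3 and 32/45. The sketch also reproduces the degenerate cases described above: putting every document into its own cluster yields a purity of 1, and putting all documents into a single cluster yields an inverse purity of 1.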