Evaluation_And_Redundancy

# Evaluation_And_Redundancy - 1 Practical Remarks the problem...


1 Practical Remarks: the problem of near duplicates or exact duplicates

We are making some changes this semester in the way that we ask you to evaluate systems for your term paper. These changes are intended to recognize the fact that a search engine may return several snippets that link to different instances of exactly [or essentially] the same information. Each instance is, in the most technical sense, "relevant". But only the first one is useful to you.

Here is an example. We use capital letters to represent relevant web pages, and lower-case letters to represent non-relevant web pages [as represented by their snippets]. If the first relevant item in the list is called "A", we can call the second instance of the same information A1, and the third A2. Here is what might happen when we are comparing just two systems.

| S1 | S2 |
|----|----|
| x  | A3 |
| A  | d  |
| g  | e  |
| A2 | B2 |
| h  | A4 |
| B  | g  |
| m  | C  |
| A3 | A  |
| B2 | m  |
| x2 | g  |

Calculation rules. For computing precision, assume that, for each search engine, only the first instance of each relevant page is counted as relevant. Let's count all the others as "not relevant". So it is as if the table really looked like this:

| S1 | S2 |
|----|----|
| x  | A3 |
| A  | d  |
| g  | e  |
| A2 | B2 |
| h  | A4 |
| B  | g  |
| m  | C  |
| A3 | A  |
| B2 | m  |
| x2 | g  |
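The counting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the course: it assumes the example table is read row by row (one S1 entry and one S2 entry per rank), that the leading letter of a label identifies the underlying page (so A, A2 and A3 are all instances of page A), and that an upper-case leading letter means "relevant". The names `first_instances` and `precision_at_k` are hypothetical helpers introduced here.

```python
def first_instances(ranked):
    """Return one flag per result: True only for the first instance
    of each relevant page in this system's ranked list."""
    seen = set()
    flags = []
    for label in ranked:
        page = label[0]  # assumption: leading letter names the page
        is_first = page.isupper() and page not in seen
        if is_first:
            seen.add(page)
        flags.append(is_first)
    return flags


def precision_at_k(ranked, k):
    """Precision over the top k results, counting duplicates as not relevant."""
    return sum(first_instances(ranked)[:k]) / k


# Ranked lists as read row by row from the example table (an assumption).
S1 = ["x", "A", "g", "A2", "h", "B", "m", "A3", "B2", "x2"]
S2 = ["A3", "d", "e", "B2", "A4", "g", "C", "A", "m", "g"]

print(precision_at_k(S1, 10))  # 0.2 (only A and B count)
print(precision_at_k(S2, 10))  # 0.3 (A3, B2 and C count)
```

Under these assumptions S1 retrieves five relevant snippets but gets credit for only two, because A2, A3 and B2 repeat information already seen higher in its list.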

We would like to calculate the set-based similarity (sometimes called "overlap" or "Dice coefficient") for the relevant items that have been retrieved. We have used strikethrough
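As a rough sketch of the set-based similarity mentioned here, the standard Dice coefficient of two sets X and Y is 2|X ∩ Y| / (|X| + |Y|). The Python below applies it to the sets of distinct relevant pages retrieved by two systems. It assumes that the notes intend this standard formula, that the example table is read row by row, and that the leading upper-case letter of a label identifies a relevant page. The helper names `relevant_pages` and `dice` are hypothetical.

```python
def relevant_pages(ranked):
    """Set of distinct relevant pages in a ranked list.
    Assumption: an upper-case leading letter marks a relevant page,
    and that letter identifies the page (A, A2, A3 are all page A)."""
    return {label[0] for label in ranked if label[0].isupper()}


def dice(x, y):
    """Standard Dice coefficient of two sets: 2|X & Y| / (|X| + |Y|)."""
    if not x and not y:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(x & y) / (len(x) + len(y))


# Ranked lists as read row by row from the example table (an assumption).
S1 = ["x", "A", "g", "A2", "h", "B", "m", "A3", "B2", "x2"]
S2 = ["A3", "d", "e", "B2", "A4", "g", "C", "A", "m", "g"]

print(sorted(relevant_pages(S1)))                      # ['A', 'B']
print(sorted(relevant_pages(S2)))                      # ['A', 'B', 'C']
print(dice(relevant_pages(S1), relevant_pages(S2)))    # 0.8
```

Collapsing duplicates into sets before comparing means that a system is not rewarded for returning the same information several times.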

## This note was uploaded on 02/20/2012 for the course 790 373 taught by Professor Boros during the Fall '09 term at Rutgers.
