1. Suppose that we have a standard IR evaluation data set containing 1000 documents. Assume that, for a particular query Q in this data set, the following 20 documents in the collection are judged relevant:

REL = {d1, d5, d10, d58, d150, d200, d210, d250, d300, d400, d405, d472, d500, d501, d545, d600, d635, d700, d720, d800}

Two different retrieval systems, S1 and S2, are used to retrieve ranked lists of documents from this collection for the above query. The top 10 retrieved documents for the two systems are given below (each list is in decreasing order of relevance):

RET(S1) = d2, d5, d150, d250, d11, d33, d50, d600, d500, d520

RET(S2) = d250, d400, d150, d210, d999, d3, d501, d800, d205, d300
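Every metric below starts from the same step: marking, rank by rank, whether each retrieved document is in REL. The following is a minimal Python sketch, assuming the document IDs are the strings shown above; the names `REL`, `RET`, and `relevance_vector` are hypothetical helpers for illustration, not part of the spreadsheet.

```python
# Relevance judgments for query Q and the top-10 rankings, as listed above.
REL = {"d1", "d5", "d10", "d58", "d150", "d200", "d210", "d250", "d300", "d400",
       "d405", "d472", "d500", "d501", "d545", "d600", "d635", "d700", "d720", "d800"}
RET = {
    # Fifth S1 entry read as d11; it is not in REL.
    "S1": ["d2", "d5", "d150", "d250", "d11", "d33", "d50", "d600", "d500", "d520"],
    "S2": ["d250", "d400", "d150", "d210", "d999", "d3", "d501", "d800", "d205", "d300"],
}


def relevance_vector(ranking, relevant):
    """Return 1 for each rank whose document is in the relevant set, else 0."""
    return [1 if doc in relevant else 0 for doc in ranking]


for system, ranking in RET.items():
    print(system, relevance_vector(ranking, REL))
```

For the lists above this yields [0, 1, 1, 1, 0, 0, 0, 1, 1, 0] for S1 and [1, 1, 1, 1, 0, 0, 1, 1, 0, 1] for S2.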

For your convenience, the above information is provided in the Excel spreadsheet Relevance.xlsx.

a. Compute the Precision and Recall graphs for each system as a function of the number of documents returned (1 document returned, 2 documents returned, etc.). First, compile your calculations in a table showing the Precision and Recall of S1 and S2 for these query results as a function of the number of documents returned (from 1 to 10). Then, create separate graphs for Precision and Recall, in each case comparing the two systems. For each graph, use the number of retrieved documents (from 1 to 10) as the x-axis and the value of the metric (precision or recall) as the y-axis.
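The table in part (a) follows from two running ratios: precision at cutoff k is (relevant documents in the top k) / k, and recall at cutoff k is (relevant documents in the top k) / |REL|, with |REL| = 20 here. The sketch below is one possible way to build that table in Python; the function name `precision_recall_at_k` is hypothetical, and the data are re-declared so the block runs on its own.

```python
# Relevance judgments and rankings from the problem statement.
REL = {"d1", "d5", "d10", "d58", "d150", "d200", "d210", "d250", "d300", "d400",
       "d405", "d472", "d500", "d501", "d545", "d600", "d635", "d700", "d720", "d800"}
RET = {
    "S1": ["d2", "d5", "d150", "d250", "d11", "d33", "d50", "d600", "d500", "d520"],
    "S2": ["d250", "d400", "d150", "d210", "d999", "d3", "d501", "d800", "d205", "d300"],
}


def precision_recall_at_k(ranking, relevant, max_k=10):
    """Return a list of (k, precision@k, recall@k) for k = 1..max_k."""
    rows = []
    hits = 0  # relevant documents seen in the top k so far
    for k, doc in enumerate(ranking[:max_k], start=1):
        if doc in relevant:
            hits += 1
        rows.append((k, hits / k, hits / len(relevant)))
    return rows


table1 = precision_recall_at_k(RET["S1"], REL)
table2 = precision_recall_at_k(RET["S2"], REL)

# One row per cutoff k, matching the table requested in part (a).
print(f"{'k':>2} {'P(S1)':>6} {'R(S1)':>6} {'P(S2)':>6} {'R(S2)':>6}")
for (k, p1, r1), (_, p2, r2) in zip(table1, table2):
    print(f"{k:>2} {p1:6.2f} {r1:6.2f} {p2:6.2f} {r2:6.2f}")
```

At k = 10 this gives P = 0.50 and R = 0.25 for S1, and P = 0.70 and R = 0.35 for S2. The precision and recall columns can then be plotted against k with any charting tool, including Excel itself.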

b. A single metric that can be used to combine precision and recall is the F measure (see Section 8.3 of the IR Book). Using the F1 measure (the F measure with β = 1), create a graph similar to those above comparing the two systems.
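With β = 1 the F measure reduces to the harmonic mean F1 = 2PR / (P + R), taken here as 0 when P = R = 0. The sketch below computes F1 at each cutoff from the same running counts; `f1` and `f1_at_k` are hypothetical names chosen for illustration.

```python
# Relevance judgments and rankings from the problem statement.
REL = {"d1", "d5", "d10", "d58", "d150", "d200", "d210", "d250", "d300", "d400",
       "d405", "d472", "d500", "d501", "d545", "d600", "d635", "d700", "d720", "d800"}
RET = {
    "S1": ["d2", "d5", "d150", "d250", "d11", "d33", "d50", "d600", "d500", "d520"],
    "S2": ["d250", "d400", "d150", "d210", "d999", "d3", "d501", "d800", "d205", "d300"],
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (F with beta = 1); 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_at_k(ranking, relevant, max_k=10):
    """Return F1 at each cutoff k = 1..max_k."""
    scores = []
    hits = 0  # relevant documents seen in the top k so far
    for k, doc in enumerate(ranking[:max_k], start=1):
        if doc in relevant:
            hits += 1
        scores.append(f1(hits / k, hits / len(relevant)))
    return scores


f1_s1 = f1_at_k(RET["S1"], REL)
f1_s2 = f1_at_k(RET["S2"], REL)
for k, (a, b) in enumerate(zip(f1_s1, f1_s2), start=1):
    print(f"k={k:>2}  F1(S1)={a:.3f}  F1(S2)={b:.3f}")
```

A useful sanity check: with h relevant documents in the top k, F1 simplifies to 2h / (k + |REL|). At k = 10 that is 10/30 ≈ 0.333 for S1 and 14/30 ≈ 0.467 for S2.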

c. Next, using the graded relevance (gain) scores provided in the spreadsheet, compute the Discounted Cumulative Gain (DCG) and the Normalized DCG (NDCG) values for the retrieved documents for both S1 and S2.
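The graded gains live in the spreadsheet and are not reproduced here, so the sketch below works on any list of gains in rank order. It assumes the common discount DCG_k = rel_1 + Σ_{i=2..k} rel_i / log2(i), and it assumes NDCG_k = DCG_k / IDCG_k, where the ideal ranking sorts all judged gains for Q in decreasing order. Both are assumptions; other textbook variants (for example (2^rel − 1) / log2(i + 1)) would change only the per-rank term. The gain values at the bottom are HYPOTHETICAL placeholders, not data from the assignment.

```python
import math


def dcg(gains):
    """Cumulative DCG at each rank: rel_1 + sum over i >= 2 of rel_i / log2(i)."""
    total = 0.0
    out = []
    for i, g in enumerate(gains, start=1):
        total += g if i == 1 else g / math.log2(i)
        out.append(total)
    return out


def ndcg(gains, ideal_pool):
    """NDCG at each rank; ideal_pool holds the gains of all judged docs for the query."""
    # Ideal ranking: best gains first, cut to the same depth, padded with zeros.
    ideal = sorted(ideal_pool, reverse=True)[:len(gains)]
    ideal += [0] * (len(gains) - len(ideal))
    actual, best = dcg(gains), dcg(ideal)
    return [a / b if b > 0 else 0.0 for a, b in zip(actual, best)]


# HYPOTHETICAL gains for illustration only; substitute the spreadsheet values
# for each system's top-10 list and for the full set of judged documents.
example_gains = [3, 2, 0, 1]
example_pool = [3, 3, 2, 1, 1]
print([round(x, 3) for x in dcg(example_gains)])
print([round(x, 3) for x in ndcg(example_gains, example_pool)])
```

For the placeholder gains, DCG grows as 3.0, 5.0, 5.0, 5.5, and NDCG starts at 1.0 because the first retrieved document already carries the maximum gain.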