IR-part2

Term-document count matrices: consider the number ...



- [...] terms in a collection are more informative than frequent terms.
- Jaccard doesn't consider this information.
- We need a more sophisticated way of normalizing for length.
- Later in this lecture, we'll use |A ∩ B| / √|A ∪ B| . . . instead of |A ∩ B| / |A ∪ B| (Jaccard) for length normalization.

Introduction to Information Retrieval

Scoring with the Jaccard coefficient

Term frequency weighting

Sec. 6.2
Recall: Binary term-document incidence matrix

            Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony      1                      1               0             0        0         1
Brutus      1                      1               0             1        0         0
Caesar      1                      1               0             1        1         1
Calpurnia   0                      1               0             0        0         0
Cleopatra   1                      0               0             0        0         0
mercy       1                      0               1             1        1         1
worser      1                      0               1             1        1         0

Each document is represented by a binary vector ∈ {0,1...
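
To make Jaccard scoring over the incidence matrix concrete, here is a minimal Python sketch. It treats each play as the set of terms whose entry in the matrix is 1 and scores a query by |A ∩ B| / |A ∪ B|. The names PLAYS, INCIDENCE, doc_terms and jaccard, as well as the example query, are assumptions made for illustration and are not taken from the slides.

```python
# Binary term-document incidence matrix from the slide.
# Column order follows PLAYS; a 1 means the term occurs in that play.
PLAYS = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

INCIDENCE = {
    "Antony":    [1, 1, 0, 0, 0, 1],
    "Brutus":    [1, 1, 0, 1, 0, 0],
    "Caesar":    [1, 1, 0, 1, 1, 1],
    "Calpurnia": [0, 1, 0, 0, 0, 0],
    "Cleopatra": [1, 0, 0, 0, 0, 0],
    "mercy":     [1, 0, 1, 1, 1, 1],
    "worser":    [1, 0, 1, 1, 1, 0],
}


def doc_terms(play):
    """Return the set of terms whose incidence entry for this play is 1."""
    col = PLAYS.index(play)
    return {term for term, row in INCIDENCE.items() if row[col] == 1}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical query, chosen only to exercise the functions above.
query = {"Brutus", "Caesar", "mercy"}

scores = {play: jaccard(query, doc_terms(play)) for play in PLAYS}
for play, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{play}: {score:.2f}")
```

Because the vectors are binary, a term that occurs once and a term that occurs a hundred times contribute equally to the score, and every term counts the same regardless of how rare it is in the collection. Those are the limitations the bullets point out.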