IR-part2

Long and short documents now have comparable weights



Introduction to Information Retrieval

Sec. 6.3
Binary → count → weight matrix

            Antony and   Julius   The                                  
            Cleopatra    Caesar   Tempest   Hamlet   Othello   Macbeth
Antony      5.25         3.18     0         0        0         0.35
Brutus      1.21         6.1      0         1        0         0
Caesar      8.59         2.54     0         1.51     0.25      0
Calpurnia   0            1.54     0         0        0         0
Cleopatra   2.85         0        0         0        0         0
mercy       1.51         0        1.9       0.12     5.25      0.88
worser      1.37         0        0.11      4.15     0.25      1.95

Each document is now represented by a real-valued vector of tf-idf weights ∈ R^|V|

tf-idf weighting

The Vector Space Model (VSM)

Sec. 6.3
Documents as vectors

Now we have a |V|-dimensional vector space
Terms are axes of the space
Documents are points or vectors in this space
Very high-dimensional: tens of millions of dimensions when you apply this to a web search engine
...
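To make "documents are points or vectors in this space" concrete, here is a minimal Python sketch that reads one document's vector off the weight matrix above. The weights are copied from the table; the helper names doc_vector and sparse_vector are hypothetical and chosen only for illustration. The sparse form hints at how a very high-dimensional vector might be stored when most coordinates are zero, which is an assumption about storage and not something the slides specify.

```python
# Terms are the axes of the space; plays are the documents.
# All numbers are copied from the weight matrix above.
terms = ["Antony", "Brutus", "Caesar", "Calpurnia", "Cleopatra", "mercy", "worser"]
docs = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
        "Hamlet", "Othello", "Macbeth"]

# One row per term, one column per document.
weights = [
    [5.25, 3.18, 0,    0,    0,    0.35],  # Antony
    [1.21, 6.1,  0,    1,    0,    0],     # Brutus
    [8.59, 2.54, 0,    1.51, 0.25, 0],     # Caesar
    [0,    1.54, 0,    0,    0,    0],     # Calpurnia
    [2.85, 0,    0,    0,    0,    0],     # Cleopatra
    [1.51, 0,    1.9,  0.12, 5.25, 0.88],  # mercy
    [1.37, 0,    0.11, 4.15, 0.25, 1.95],  # worser
]


def doc_vector(doc_name):
    """Return one column of the matrix: a point in R^|V| (hypothetical helper)."""
    j = docs.index(doc_name)
    return [row[j] for row in weights]


def sparse_vector(doc_name):
    """Keep only the non-zero coordinates, keyed by term (hypothetical helper)."""
    return {t: w for t, w in zip(terms, doc_vector(doc_name)) if w != 0}


hamlet = doc_vector("Hamlet")
print(len(hamlet))              # number of dimensions, |V| = 7 in this toy vocabulary
print(sparse_vector("Hamlet"))  # only the terms with non-zero weight
```

In this toy example |V| is 7; in a web search engine the same vector would have tens of millions of coordinates, almost all of them zero.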

This document was uploaded on 02/14/2014.
