lecture6-tfidf-handout-6-per



Introduction to Information Retrieval (Sec. 6.3)

[...] itself) from earlier slide: they have identical vectors after length normalization.
Long and short documents now have comparable weights.
But how, and why, should we be computing cosines?

cosine(query, document)

$$\cos(\vec{q},\vec{d}) = \frac{\vec{q}\cdot\vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\vec{q}}{|\vec{q}|}\cdot\frac{\vec{d}}{|\vec{d}|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$$

The numerator $\vec{q}\cdot\vec{d}$ is the dot product; $\vec{q}/|\vec{q}|$ and $\vec{d}/|\vec{d}|$ are unit vectors.
$q_i$ is the tf-idf weight of term $i$ in the query.
$d_i$ is the tf-idf weight of term $i$ in the document.
$\cos(\vec{q},\vec{d})$ is the cosine similarity of $\vec{q}$ and $\vec{d}$ or, equivalently, the cosine of the angle between $\vec{q}$ and $\vec{d}$.

Cosine for length-normalized vectors

For length-normalized vectors, cosine similarity is simply the dot product (or scalar product):

$$\cos(\vec{q},\vec{d}) = \vec{q}\cdot\vec{d} = \sum_{i=1}^{|V|} q_i d_i$$

for $\vec{q}$, $\vec{d}$ length-normalized.

Cosine similarity illustrated [figure]

Cosine similarity amongst 3 documents

How similar are the novels
SaS: Sense and Sensibility,
PaP: Pride and Prejudice, and
WH: Wuthering Heights?

Term frequencies (counts)

term        SaS    PaP    WH
affection   115    58     20
jealous     10     7      11
gossip      2      0      6
wuthering   0      0      38

Note: To simplify this example, we don't do idf weighting.

3 documents example contd.

Log frequency weighting

term        SaS    PaP    WH
affection   3.06   2.76   2.30
jealous     2.00   1.85   2.04
gossip      1.30   0      1.78
wuthering   0      0      2.58

After length normalization

term        SaS     PaP     WH
affection   0.789   0.832   0.524
jealous     0.515   0.555   0.465
gossip      0.335   0       0.405
wuthering   0       0       0.588

Computing cosine scores

cos(SaS,PaP) ≈ 0.789 × 0.832 + 0.515 × 0.555 + 0.335 × 0.0 + 0.0 × 0.0 ≈ 0.94
cos(SaS,WH) ≈ 0.79
cos(PaP,WH) ≈ 0.69

Why do we have cos(SaS,PaP) > cos...
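
The three-novel computation can be reproduced in a few lines. The following is a minimal Python sketch, not code from the lecture. It assumes the log-frequency weight is 1 + log10(tf) for tf > 0 and 0 otherwise, which is the weighting the tabulated values imply (for example, 115 maps to 3.06), and it applies no idf weighting, as the slides note. The names `log_tf`, `normalize`, and `cosine` are illustrative.

```python
import math

# Raw term counts from the three-novel example (term -> novel -> count).
counts = {
    "affection": {"SaS": 115, "PaP": 58, "WH": 20},
    "jealous":   {"SaS": 10,  "PaP": 7,  "WH": 11},
    "gossip":    {"SaS": 2,   "PaP": 0,  "WH": 6},
    "wuthering": {"SaS": 0,   "PaP": 0,  "WH": 38},
}
docs = ["SaS", "PaP", "WH"]


def log_tf(tf):
    """Assumed log-frequency weight: 1 + log10(tf) if tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def normalize(vec):
    """Divide a vector by its Euclidean (L2) length; leave a zero vector unchanged."""
    norm = math.sqrt(sum(w * w for w in vec))
    return [w / norm for w in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity of two length-normalized vectors, i.e. their dot product."""
    return sum(a * b for a, b in zip(u, v))


# One length-normalized, log-weighted vector per novel (no idf weighting).
vectors = {d: normalize([log_tf(counts[t][d]) for t in counts]) for d in docs}

for a, b in [("SaS", "PaP"), ("SaS", "WH"), ("PaP", "WH")]:
    print(f"cos({a},{b}) = {cosine(vectors[a], vectors[b]):.2f}")
```

Because the vectors are normalized once up front, each pairwise score is a plain dot product. Under the stated weighting assumption, the printed scores should agree with the slide values of roughly 0.94, 0.79, and 0.69.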