s10-correlation-lsi


3/12/12

So many ways things can go wrong

Reasons that ideal effectiveness is hard to achieve:
1. Document representation loses information.
2. Users are unable to describe queries precisely.
3. The similarity function used may not be good enough.
4. The importance/weight of a term in representing a document and query may be inaccurate.
5. The same term may have multiple meanings, and different terms may have similar meanings.

Techniques that address these problems: query expansion, relevance feedback, LSI, co-occurrence analysis.

Improving Vector Space Ranking

We will consider three techniques:
Relevance feedback, which tries to improve the query quality. (Will do later with text classification.)
Correlation analysis, which looks at correlations between keywords (and thus effectively computes a thesaurus based on word occurrence in the documents) to do query elaboration.
Principal Components Analysis (also called Latent Semantic Indexing), which subsumes correlation analysis and does dimensionality reduction.

Correlation/Co-occurrence analysis

Co-occurrence analysis: terms that are related to terms in the original query may be added to the query. Two terms are related if they have high co-occurrence in documents.

Let $n$ be the number of documents, $n_1$ and $n_2$ the number of documents containing terms $t_1$ and $t_2$ respectively, and $m$ the number of documents containing both $t_1$ and $t_2$.

If $t_1$ and $t_2$ are independent: $\frac{n_1}{n} \cdot \frac{n_2}{n} \approx \frac{m}{n}$
If $t_1$ and $t_2$ are correlated: $\frac{n_1}{n} \cdot \frac{n_2}{n} \ll \frac{m}{n}$ (the size of the gap measures the degree of correlation)
If $t_1$ and $t_2$ are inversely correlated: $\frac{n_1}{n} \cdot \frac{n_2}{n} \gg \frac{m}{n}$

Correlation

There are 1000 documents in my corpus. The keyword k1 is in 25% of them. The keyword k2 is in 32% of them. Are k1 and k2 correlated positively? Correlated negatively? Not correlated?

Let the fraction of documents that have k1 and k2 together be x%. Now can you answer? If k1 and k2 were independent, the expected fraction would be 25% × 32% = 8%, so:
If x >> 8%: positive correlation
If x << 8%: negative correlation
If x ≈ 8%: independent

Term-document matrix (documents a through i; nonzero entries listed per term, column positions lost in extraction):
Interface: 1
User: 1 1 1
System: 2 1 1
Human: 1 1
Computer: 1 1
Response: 1 1
Time: 1 1
EPS: 1 1
Survey: 1 1
Trees: 1 1 1
Graph: 1 1 1
Minors: 1 1

Terms and Docs as mutually dependent vectors

Each column of the matrix is a document vector; each row is a term vector.

If terms are independent, the T-T similarity matrix should be diagonal.
If it is not diagonal, we can use the correlations to add related terms to the query.
But we can also ask the question: are there independent dimensions which define the space where terms and docs are vectors?

In addition to doc-doc similarity, we can compute term-term distance...
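The co-occurrence test and the query-expansion idea from these slides can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the toy corpus in TERM_DOCS, and the 1.5 threshold are all assumptions made for the example. The ratio m*n / (n1*n2) is simply the observed co-occurrence m/n divided by the (n1/n)(n2/n) that independence would predict, so a value near 1 suggests independence, well above 1 positive correlation, and well below 1 inverse correlation.

```python
def association_ratio(n, n1, n2, m):
    """Observed co-occurrence m/n divided by the independence baseline (n1/n)*(n2/n).

    Algebraically this is m*n / (n1*n2).
    """
    if n1 == 0 or n2 == 0:
        return 0.0  # a term that never occurs cannot co-occur with anything
    return (m * n) / (n1 * n2)


def expand_query(query_terms, term_docs, n_docs, threshold=1.5):
    """Append terms whose association ratio with some query term meets the threshold.

    term_docs maps each term to the set of document ids containing it.
    The threshold is an arbitrary cutoff chosen for this sketch.
    """
    expanded = list(query_terms)
    for q in query_terms:
        q_docs = term_docs.get(q, set())
        for term in sorted(term_docs):
            if term in expanded:
                continue
            shared = len(q_docs & term_docs[term])
            ratio = association_ratio(n_docs, len(q_docs), len(term_docs[term]), shared)
            if ratio >= threshold:
                expanded.append(term)
    return expanded


# Hypothetical toy corpus of 8 documents (not the matrix from the slides).
N_DOCS = 8
TERM_DOCS = {
    "graph": {1, 2, 3},
    "trees": {1, 2, 4},
    "user": {5, 6, 7},
    "system": {5, 6, 8},
}

# Slide example: 1000 documents, k1 in 250, k2 in 320, both in 80 (x = 8%).
print(association_ratio(1000, 250, 320, 80))
print(expand_query(["graph"], TERM_DOCS, N_DOCS))
```

With the slide's numbers, x = 8% gives a ratio of exactly 1, the independence case. In the toy corpus, "graph" and "trees" share two of their three documents, so the query "graph" picks up "trees" but neither "user" nor "system".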

This note was uploaded on 03/11/2012 for the course CSE 494 taught by Professor Rao during the Spring '08 term at ASU.
