# lsi_notes - CS/INFO 4300 notes on LSI for Assignment 2. P....


CS/INFO 4300 notes on LSI for Assignment 2. P. Ginsparg

A query is expressed as a vector $\vec{q}$ in term space, with components $q_i = 1$ if term $i$ is in the query, and $q_i = 0$ otherwise. (By convention, any query terms not in the original term vector space are ignored.) In the VSM framework, the similarity between a query vector $\vec{q}$ and the $j$th document vector $\vec{d}^{(j)}$ is given by the "cosine measure":

$$\frac{\vec{q} \cdot \vec{d}^{(j)}}{|\vec{q}|\,|\vec{d}^{(j)}|}$$

(the dot product divided by the lengths $|\vec{q}|$ and $|\vec{d}^{(j)}|$). Using the term–document matrix $C_{ij}$, this dot product is given by the $j$th component of $\vec{q} \cdot C$. Let $\vec{e}^{(j)}$ denote the $j$th basis vector (with a single 1 in the $j$th position, 0 elsewhere); then we can write $\vec{d}^{(j)} = C\,\vec{e}^{(j)}$, and the similarity measure is

$$\mathrm{Similarity}(\vec{q}, \vec{d}^{(j)}) = \cos(\theta) = \frac{\vec{q} \cdot \vec{d}^{(j)}}{|\vec{q}|\,|\vec{d}^{(j)}|} = \frac{\vec{q} \cdot C \cdot \vec{e}^{(j)}}{|\vec{q}|\,|C\,\vec{e}^{(j)}|}$$

...
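The cosine measure above can be illustrated with a minimal sketch in plain Python. The function name `cosine_similarities`, the toy matrix `C`, and the query `q` are hypothetical and chosen only for illustration; the rule of returning 0.0 when a vector has zero length is also an assumption, not something stated in the notes.

```python
import math

def cosine_similarities(C, q):
    """Return cos(theta) between query q and each document column of C.

    C is a term-document matrix given as a list of rows, so C[i][j] is the
    weight of term i in document j. q is a 0/1 list over the same terms.
    """
    n_terms = len(C)
    n_docs = len(C[0])
    # The j-th component of q . C is the dot product q . d^(j).
    dots = [sum(q[i] * C[i][j] for i in range(n_terms)) for j in range(n_docs)]
    # |q| is the Euclidean length of the query vector.
    q_len = math.sqrt(sum(x * x for x in q))
    # |C e^(j)| is the Euclidean length of column j of C.
    d_lens = [
        math.sqrt(sum(C[i][j] ** 2 for i in range(n_terms)))
        for j in range(n_docs)
    ]
    # Assumption: a zero-length query or document gets similarity 0.0.
    return [
        dots[j] / (q_len * d_lens[j]) if q_len and d_lens[j] else 0.0
        for j in range(n_docs)
    ]

# Hypothetical toy data: 3 terms (rows) by 3 documents (columns).
C = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
q = [1, 1, 0]  # the query contains terms 0 and 1
sims = cosine_similarities(C, q)
```

With this toy data, document 0 contains exactly the query terms, so its similarity is 1 up to floating-point rounding, while the other two documents each share only one term with the query and score lower.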

## This note was uploaded on 01/25/2012 for the course INFO 4300 taught by Professor Staff during the Fall '08 term at Cornell University (Engineering School).
