CS/INFO 4300 notes on LSI for Assignment 2
P. Ginsparg

A query is expressed as a vector $\vec q$ in term space, with components $q_i = 1$ if term $i$ is in the query, and 0 otherwise. (By convention, any query terms not in the original term vector space are ignored.)

In the VSM framework, the similarity between a query vector $\vec q$ and the $j$th document vector $\vec d^{(j)}$ is given by the "cosine measure":

$$\frac{\vec q \cdot \vec d^{(j)}}{|\vec q|\,|\vec d^{(j)}|}$$

(the dot product divided by the lengths $|\vec q|$ and $|\vec d^{(j)}|$). Using the term–document matrix $C_{ij}$, this dot product is given by the $j$th component of $\vec q \cdot C$. Let $\vec e^{(j)}$ denote the $j$th basis vector (with a single 1 in the $j$th position and 0 elsewhere); then we can write $\vec d^{(j)} = C\,\vec e^{(j)}$, and the similarity measure is

$$\mathrm{Similarity}(\vec q, \vec d^{(j)}) = \cos\theta = \frac{\vec q \cdot \vec d^{(j)}}{|\vec q|\,|\vec d^{(j)}|} = \frac{\vec q \cdot C \cdot \vec e^{(j)}}{|\vec q|\,|C\,\vec e^{(j)}|}$$

...
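The cosine measure above can be sketched in a few lines of pure Python. This is a minimal illustration, not code from the notes: it assumes the term–document matrix is stored as a list of rows (one row per term, one column per document) and uses the binary query weights described above. The function names `query_vector` and `cosine_scores`, the toy vocabulary and matrix, and the convention of scoring 0 when either vector has zero length are all hypothetical choices made for this example.

```python
import math


def query_vector(query_terms, vocabulary):
    """Binary query vector: q_i = 1 if term i appears in the query, else 0.

    Query terms that are not in the vocabulary are simply ignored.
    """
    wanted = set(query_terms)
    return [1 if term in wanted else 0 for term in vocabulary]


def cosine_scores(q, C):
    """Cosine similarity between q and every document column of C.

    C is a term-document matrix stored as a list of rows:
    C[i][j] is the weight of term i in document j.
    """
    n_terms = len(C)
    n_docs = len(C[0]) if C else 0
    # j-th component of the row vector q . C, i.e. the dot product q . d^(j)
    dots = [sum(q[i] * C[i][j] for i in range(n_terms)) for j in range(n_docs)]
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = []
    for j in range(n_docs):
        # |d^(j)| = |C e^(j)|, the Euclidean length of column j
        d_norm = math.sqrt(sum(C[i][j] ** 2 for i in range(n_terms)))
        # Assumption: define the score as 0 when either vector has zero length
        if q_norm == 0 or d_norm == 0:
            scores.append(0.0)
        else:
            scores.append(dots[j] / (q_norm * d_norm))
    return scores


if __name__ == "__main__":
    # Hypothetical toy vocabulary and 4-term x 3-document matrix
    vocabulary = ["latent", "semantic", "index", "matrix"]
    C = [
        [1, 0, 1],  # latent
        [1, 1, 0],  # semantic
        [0, 1, 0],  # index
        [0, 0, 1],  # matrix
    ]
    q = query_vector(["latent", "semantic", "svd"], vocabulary)  # "svd" is ignored
    print(q)  # [1, 1, 0, 0]
    print([round(s, 3) for s in cosine_scores(q, C)])  # [1.0, 0.5, 0.5]
```

Computing every dot product at once as the row vector $\vec q \cdot C$ mirrors the matrix form of the similarity; for a realistic collection one would store $C$ as a sparse matrix rather than nested lists.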
This note was uploaded on 01/25/2012 for the course INFO 4300 taught by Professor Staff during the Fall '08 term at Cornell University (Engineering School).