20 - Introduction to Information Retrieval Ch. 18 ...


Today's topic: Latent Semantic Indexing
- Term-document matrices are very large.
- But the number of topics that people talk about is small (in some sense): clothes, movies, politics, ...
- Can we represent the term-document space by a lower-dimensional latent space?

Linear Algebra Background

Eigenvalues and eigenvectors (Sec. 18.1)
- Eigenvectors (for a square m x m matrix S): a non-zero vector v is a (right) eigenvector, with eigenvalue λ, if Sv = λv.
- How many eigenvalues are there at most? The equation (S - λI)v = 0 only has a non-zero solution if det(S - λI) = 0. This is an m-th order equation in λ, which can have at most m distinct solutions (the roots of the characteristic polynomial); these can be complex even though S is real.

Matrix-vector multiplication (Sec. 18.1)
- Suppose S has eigenvalues 30, 20, 1 with corresponding eigenvectors v1, v2, v3.
- On each eigenvector, S acts as a multiple of the identity matrix, but as a different multiple on each.
- Any vector x can be viewed as a combination of the eigenvectors, say x = 2v1 + 4v2 + 6v3.
- Thus a matrix-vector multiplication such as Sx (S, x as above) can be rewritten in terms of the eigenvalues and eigenvectors:
  Sx = S(2v1 + 4v2 + 6v3) = 2Sv1 + 4Sv2 + 6Sv3 = 2λ1v1 + 4λ2v2 + 6λ3v3 = 60v1 + 80v2 + 6v3.
- Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues and eigenvectors.
- Suggestion: the effect of "small" eigenvalues is small. If we ignored the smallest eigenvalue (1), then instead of 60v1 + 80v2 + 6v3 we would get 60v1 + 80v2. These two vectors are similar (in cosine similarity, etc.).

Eigenvalues and eigenvectors of symmetric matrices (Sec. 18.1)
- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal.
- All eigenvalues of a real symmetric matrix are real.
- All eigenvalues of a positive semidefinite matrix are non-negative.

Example (Sec. 18.1)
- Let S = [2 1; 1 2], which is real and symmetric.
- Then det(S - λI) = (2 - λ)^2 - 1 = 0.
- The eigenvalues are 1 and 3 (nonnegative, real). Plug in these values and solve for the eigenvectors.
- The eigenvectors, (1, -1) and (1, 1), are orthogonal (and real).

Eigen/diagonal decomposition (Sec. 18.1)
- Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).
- Theorem: there exists an eigen decomposition S = U Λ U^-1 with Λ diagonal, unique for distinct eigenvalues (cf. the matrix diagonalization theorem).
- The columns of U are the eigenvectors of S.
- The diagonal elements of Λ are the eigenvalues of S.

Diagonal decomposition: why/how (Sec. 18.1)
- Let U have the eigenvectors as columns: U = [v1 ... vm].
- Then SU = S[v1 ... vm] = [λ1v1 ... λmvm] = U Λ, where Λ = diag(λ1, ..., λm).
- Thus SU = UΛ, or U^-1 S U = Λ, and S = U Λ U^-1.

Diagonal decomposition: example (Sec. 18.1)
- Recall S = [2 1; 1 2]. The eigenvectors (1, -1) and (1, 1) form U = [1 1; -1 1].
- Inverting, we have U^-1 = (1/2)[1 -1; 1 1] (recall U U^-1 = I).
- Then S = U Λ U^-1 = [1 1; -1 1] [1 0; 0 3] (1/2)[1 -1; 1 1].

Example continued (Sec. 18.1)
- Let's divide U (and multiply U^-1) by sqrt(2). Then S = Q Λ Q^T, where Q^-1 = Q^T. Why? Stay tuned ...

Symmetric eigen decomposition (Sec. 18.1)
- If S is a symmetric matrix, then there exists a (unique) eigen decomposition S = Q Λ Q^T, where Q is orthogonal: Q^-1 = Q^T.
- The columns of Q are normalized eigenvectors, and they are mutually orthogonal. (Everything is real.)

Exercise (Sec. 18.1)
- Examine the symmetric eigen decomposition, if any, for each of the matrices given on the slide.
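As a quick numerical check of the symmetric eigen decomposition above, here is a minimal Python/NumPy sketch. It assumes the 2 x 2 example S = [2 1; 1 2] used above; numpy.linalg.eigh is the eigensolver for symmetric matrices and returns orthonormal eigenvectors.

    import numpy as np

    # The symmetric 2 x 2 example; the eigenvalues should come out as 1 and 3.
    S = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # eigh handles symmetric (Hermitian) matrices: eigenvalues in ascending
    # order, eigenvectors as orthonormal columns of Q.
    eigvals, Q = np.linalg.eigh(S)
    Lam = np.diag(eigvals)

    print(eigvals)        # approximately [1. 3.]
    print(Q.T @ Q)        # identity: Q is orthogonal, so Q^-1 = Q^T
    print(Q @ Lam @ Q.T)  # reconstructs S = Q Lambda Q^T

Running it should print eigenvalues close to 1 and 3, the 2 x 2 identity for Q^T Q, and a reconstruction of S (up to rounding).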
Time out!
- I came to this class to learn about text retrieval and mining, not to have my linear algebra past dredged up again ...
- But if you want to dredge, Strang's Applied Mathematics is a good place to start.
- What do these matrices have to do with text?
- Recall M x N term-document matrices ...
- But everything so far needs square matrices, so ...

Singular Value Decomposition (Sec. 18.2)
- For an M x N matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) A = U Σ V^T, where U is M x M, Σ is M x N, and V is N x N.
- The columns of U are orthogonal eigenvectors of A A^T.
- The columns of V are orthogonal eigenvectors of A^T A.
- The eigenvalues λ1, ..., λr of A A^T are also the eigenvalues of A^T A; the singular values are σi = sqrt(λi), and they appear on the diagonal of Σ.
- (Slide: illustration of the SVD dimensions and sparseness.)

SVD example (Sec. 18.2)
- Let A be the small 3 x 2 example matrix shown on the slide (so M = 3, N = 2); the slide works out its SVD.
- Typically, the singular values are arranged in decreasing order.

Low-rank approximation (Sec. 18.3)
- The SVD can be used to compute optimal low-rank approximations.
- Approximation problem: find the matrix A_k of rank k that minimizes ||A - X||_F over all rank-k matrices X, where ||.||_F is the Frobenius norm. A_k and X are both M x N matrices. Typically, we want k << r.
- Solution via SVD: set the smallest r - k singular values to zero, i.e. A_k = U diag(σ1, ..., σk, 0, ..., 0) V^T.
- In column notation this is a sum of rank-1 matrices: A_k = sum_{i=1..k} σi u_i v_i^T.

Reduced SVD (Sec. 18.3)
- If we retain only k singular values and set the rest to 0, then we don't need the parts of the matrices shown in brown on the slide.
- Then Σ is k x k, U is M x k, V^T is k x N, and A_k is M x N.
- This is referred to as the reduced SVD.
- It is the convenient (space-saving) and usual form for computational applications. It's what Matlab gives you.

Approximation error (Sec. 18.3)
- How good (bad) is this approximation? It's the best possible, measured by the Frobenius norm of the error:
  min over rank-k X of ||A - X||_F = ||A - A_k||_F = sqrt(σ_{k+1}^2 + ... + σ_r^2),
  where the σi are ordered such that σi >= σ_{i+1}.
- This suggests why the Frobenius error drops as k is increased.

SVD low-rank approximation (Sec. 18.3)
- Whereas the term-document matrix A may have M = 50,000 and N = 10 million (and rank close to 50,000), we can construct an approximation A_100 with rank 100.
- Of all rank-100 matrices, it would have the lowest Frobenius error.
- Great ... but why would we?
- Answer: Latent Semantic Indexing.
- C. Eckart and G. Young, The approximation of one matrix by another of lower rank. Psychometrika 1, 211-218, 1936.

Latent Semantic Indexing via the SVD

What it is (Sec. 18.4)
- From the term-document matrix A, we compute the approximation A_k.
- There is a row for each term and a column for each doc in A_k.
- Thus docs live in a space of k << r dimensions.
- These dimensions are not the original axes.
- But why?
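A minimal NumPy sketch of the truncated SVD and the rank-k approximation described above. The 3 x 2 matrix A is made up for illustration (it is not necessarily the slide's own example), and numpy.linalg.svd already returns the singular values in decreasing order.

    import numpy as np

    # Illustrative 3 x 2 matrix (a stand-in for a tiny term-document matrix).
    A = np.array([[1.0, -1.0],
                  [0.0,  1.0],
                  [1.0,  0.0]])

    # Reduced SVD: U is M x r, s holds singular values (decreasing), Vt is r x N.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 1  # keep only the largest singular value
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation

    # Frobenius error equals the sqrt of the sum of squares of the dropped
    # singular values.
    print(np.linalg.norm(A - A_k, 'fro'))
    print(np.sqrt(np.sum(s[k:] ** 2)))

The last two printed numbers agree, illustrating that the Frobenius error of A_k is the square root of the sum of the squared discarded singular values.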
Vector Space Model: pros
- Automatic selection of index terms.
- Partial matching of queries and documents (dealing with the case where no document contains all search terms).
- Ranking according to similarity score (dealing with large result sets).
- Term weighting schemes (improves retrieval performance).
- Various extensions: document clustering, relevance feedback (modifying the query vector).
- Geometric foundation.

Problems with lexical semantics
- Ambiguity and association in natural language.
- Polysemy: words often have a multitude of meanings and different types of usage (more severe in very heterogeneous collections). The vector space model is unable to discriminate between different meanings of the same word.
- Synonymy: different terms may have an identical or a similar meaning (weaker: words indicating the same topic). No associations between words are made in the vector space representation.

Polysemy and context
- Document similarity at the single-word level: polysemy and context.
- (Slide figure: one sense of a word occurs with ring, jupiter, planet, saturn, space, voyager (meaning 1), another with car, company, dodge, ford (meaning 2); the word contributes to similarity if it is used in the first meaning in both documents, but not if one document uses the second meaning.)

Latent Semantic Indexing (LSI) (Sec. 18.4)
- Perform a low-rank approximation of the document-term matrix (typical rank 100-300).
- General idea:
  - Map documents (and terms) to a low-dimensional representation.
  - Design the mapping such that the low-dimensional space reflects semantic associations (the latent semantic space).
  - Compute document similarity based on the inner product in this latent semantic space.

Goals of LSI (Sec. 18.4)
- Similar terms map to similar locations in the low-dimensional space.
- Noise reduction by dimension reduction.

Latent Semantic Analysis (Sec. 18.4)
- Latent semantic space: (slide shows an illustrating example, courtesy of Susan Dumais).

Performing the maps (Sec. 18.4)
- Each row and column of A gets mapped into the k-dimensional LSI space by the SVD.
- Claim: this is not only the mapping with the best (Frobenius error) approximation to A, but it in fact improves retrieval.
- A query q is also mapped into this space, by q_k = Σ_k^-1 U_k^T q, where U_k and Σ_k are the truncated SVD factors.
- Note that the mapped query is no longer a sparse vector. (A concrete sketch of this mapping follows the empirical-evidence and failure-mode slides below.)

Empirical evidence (Sec. 18.4)
- Experiments on TREC 1/2/3 (Dumais).
- Lanczos SVD code (available on netlib), due to Berry, was used in these experiments.
- Running times of about one day on tens of thousands of docs (still an obstacle to use).
- Dimensions: various values in the range 250-350 reported; reducing k improves recall. (Under 200 reported unsatisfactory.)
- Generally we expect recall to improve; what about precision?

Empirical evidence, continued (Sec. 18.4)
- Precision at or above median TREC precision.
- Top scorer on almost 20% of TREC topics.
- Slightly better on average than straight vector spaces.
- Effect of dimensionality:
    Dimensions   Precision
    250          0.367
    300          0.371
    346          0.374

Failure modes (Sec. 18.4)
- Negated phrases: TREC topics sometimes negate certain query terms/phrases, which precludes automatic conversion of topics to the latent semantic space.
- Boolean queries: as usual, the free-text/vector-space syntax of LSI queries precludes, say, "Find any doc having to do with the following 5 companies".
- See Dumais for more.
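To make the document and query mapping concrete, here is a small hedged sketch. The tiny term-document counts and the query are made up for illustration; the fold-in follows the formula q_k = Σ_k^-1 U_k^T q given above, and documents are represented by the columns of V_k^T so that query and documents live in the same coordinates.

    import numpy as np

    # Toy term-document matrix: rows = terms, columns = docs (counts made up).
    # terms: ship, boat, ocean, voyage, trip
    A = np.array([
        [1, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
    ], dtype=float)

    k = 2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

    # Each column of Vt_k is a document's k-dimensional LSI representation.
    docs_k = Vt_k

    # Fold the query into the same space: q_k = S_k^-1 U_k^T q.
    q = np.array([1, 0, 0, 0, 0], dtype=float)   # query: "ship"
    q_k = np.linalg.inv(S_k) @ U_k.T @ q

    # Rank documents by cosine similarity in the latent space.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    scores = [cos(q_k, docs_k[:, j]) for j in range(A.shape[1])]
    print(scores)

Other scalings of the document and query coordinates appear in the literature; the design choice here is simply to use the same convention on both sides of the cosine.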
But why is this clustering? (Sec. 18.4)
- We've talked about docs, queries, retrieval and precision here. What does this have to do with clustering?
- Intuition: dimension reduction through LSI brings together "related" axes in the vector space.

Intuition from block matrices
- Imagine an M x N term-document matrix (M terms, N documents) in which the vocabulary is partitioned into k topics (clusters) and each doc discusses only one topic: the matrix is block diagonal, with k homogeneous non-zero blocks and 0's everywhere else.
- What's the rank of this matrix? Each homogeneous block contributes rank 1, so the rank is k.
- In practice there are also a few non-zero entries outside the blocks (the slide's example shows car-related terms such as wiper, tire, V6 in one block, with "car" and "automobile" occurring in different documents). What's the best rank-k approximation to such a matrix? Likely there's still a good rank-k approximation; a small numerical sketch follows at the end of this section.

Simplistic picture
- (Slide figure: a term-document matrix organized into blocks labeled Topic 1, Topic 2, Topic 3.)

Some wild extrapolation
- The "dimensionality" of a corpus is the number of distinct topics represented in it.
- More mathematical wild extrapolation: if A has a rank-k approximation of low Frobenius error, then there are no more than k distinct topics in the corpus.

LSI has many other applications
- In many settings in pattern recognition and retrieval, we have a feature-object matrix. For text, the terms are features and the docs are objects; it could also be opinions and users ...
- This matrix may be redundant in dimensionality, so we can work with a low-rank approximation. If entries are missing (e.g., users' opinions), we can recover them if the dimensionality is low.
- A powerful general analytical technique, and a close, principled analog to clustering methods.

Resources
- IIR, Chapter 18.
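As promised above, a small numerical sketch of the block-matrix intuition. All sizes and entries here are made up purely for illustration; np.linalg.matrix_rank and the singular values make the rank argument concrete.

    import numpy as np

    # Block-diagonal "ideal" term-document matrix: 3 topics, each a
    # homogeneous all-ones block (9 terms, 6 documents).
    A = np.zeros((9, 6))
    A[0:3, 0:2] = 1.0   # topic 1 block
    A[3:6, 2:4] = 1.0   # topic 2 block
    A[6:9, 4:6] = 1.0   # topic 3 block
    print(np.linalg.matrix_rank(A))   # 3: one per homogeneous block

    # Add a few stray off-block entries (cross-topic word occurrences).
    B = A.copy()
    B[0, 3] = 1.0
    B[7, 1] = 1.0
    print(np.linalg.matrix_rank(B))   # the exact rank goes up ...

    # ... but the largest three singular values still carry most of the
    # matrix, so a rank-3 approximation remains close in Frobenius norm.
    print(np.round(np.linalg.svd(B, compute_uv=False), 2))

With the stray entries added, the exact rank rises, yet the top three singular values dominate; that is the sense in which a good rank-k approximation still exists for a nearly block-diagonal corpus.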