Introduction to Information Retrieval, Ch. 18

Today's topic: Latent Semantic Indexing

Term-document matrices are very large. But the number of topics that people talk about is small (in some sense): clothes, movies, politics, ... Can we represent the term-document space by a lower-dimensional latent space?

Linear Algebra Background

Sec. 18.1: Eigenvalues & Eigenvectors

Eigenvectors (for a square m×m matrix S): a (right) eigenvector v with eigenvalue λ satisfies Sv = λv, v ≠ 0. How many eigenvalues are there at most? The equation (S − λI)v = 0 has a nonzero solution v only if det(S − λI) = 0. This is an mth-order equation in λ, which can have at most m distinct solutions (the roots of the characteristic polynomial); they can be complex even though S is real.
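These facts can be checked numerically. The matrix below is our own illustration (the slides do not fix one); it shows that a real matrix can have complex eigenvalues, which are exactly the roots of its characteristic polynomial:

```python
import numpy as np

# A real matrix can still have complex eigenvalues: a 90-degree rotation.
S = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# Coefficients of the characteristic polynomial det(S - lambda*I),
# highest degree first: lambda^2 + 1 for this S.
coeffs = np.poly(S)

# Its roots are the eigenvalues: +i and -i, complex although S is real.
roots = np.roots(coeffs)
eigvals = np.linalg.eigvals(S)

assert np.allclose(np.sort_complex(roots), np.sort_complex(eigvals))
```

For an m×m matrix the polynomial has degree m, so there are at most m distinct roots, matching the claim above.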
Sec. 18.1: Matrix-vector multiplication

Suppose a 3×3 matrix S (shown on the slide) has eigenvalues 30, 20, 1 with corresponding eigenvectors v1, v2, v3. On each eigenvector, S acts as a multiple of the identity matrix, but as a different multiple on each. Any vector, say x = 2v1 + 4v2 + 6v3, can be viewed as a combination of the eigenvectors.

Thus a matrix-vector multiplication such as Sx (S, x as above) can be rewritten in terms of the eigenvalues/vectors:

  Sx = S(2v1 + 4v2 + 6v3) = 2λ1v1 + 4λ2v2 + 6λ3v3 = 60v1 + 80v2 + 6v3

Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues/vectors.

Suggestion: the effect of "small" eigenvalues is small. If we ignored the smallest eigenvalue (1), then instead of 60v1 + 80v2 + 6v3 we would get 60v1 + 80v2. These vectors are similar (in cosine similarity, etc.).

Sec. 18.1: Eigenvalues & Eigenvectors

For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal. All eigenvalues of a real symmetric matrix are real. All eigenvalues of a positive semidefinite matrix are nonnegative.

Sec. 18.1: Example

Let S = [[2, 1], [1, 2]]: real, symmetric. Then det(S − λI) = (2 − λ)² − 1 = 0. The eigenvalues are 1 and 3 (nonnegative, real). Plug in these values and solve for the eigenvectors: v1 = (1, −1) for λ = 1 and v2 = (1, 1) for λ = 3. The eigenvectors are orthogonal (and real).
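A minimal numpy check of the symmetric case, using the 2×2 symmetric example with eigenvalues 1 and 3:

```python
import numpy as np

# The symmetric example matrix with eigenvalues 1 and 3.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric (Hermitian) matrices: it returns
# real eigenvalues in ascending order and orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(S)

assert np.allclose(eigvals, [1.0, 3.0])          # real, nonnegative
# Eigenvectors for distinct eigenvalues are orthogonal:
assert abs(eigvecs[:, 0] @ eigvecs[:, 1]) < 1e-8
```

Using `eigh` rather than the general `eig` both guarantees real output and exploits symmetry.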
Sec. 18.1: Eigen/diagonal decomposition

Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix). Theorem: there exists an eigendecomposition S = UΛU⁻¹, with Λ diagonal; it is unique for distinct eigenvalues (cf. the matrix diagonalization theorem). The columns of U are eigenvectors of S, and the diagonal elements of Λ are the eigenvalues of S.

Sec. 18.1: Diagonal decomposition: why/how

Let U have the eigenvectors as columns: U = [v1 ... vm]. Then SU = [Sv1 ... Svm] = [λ1v1 ... λmvm] = UΛ. Thus SU = UΛ, or U⁻¹SU = Λ, and S = UΛU⁻¹.
Sec. 18.1: Diagonal decomposition: example

Recall S = [[2, 1], [1, 2]]. The eigenvectors (1, −1) and (1, 1) form U = [[1, 1], [−1, 1]]. Inverting, we have U⁻¹ = (1/2)[[1, −1], [1, 1]] (recall UU⁻¹ = I). Then S = UΛU⁻¹ = [[1, 1], [−1, 1]] · diag(1, 3) · (1/2)[[1, −1], [1, 1]].

Sec. 18.1: Example continued

Let's divide U (and multiply U⁻¹) by √2. Then S = QΛQᵀ, where Q = (1/√2)[[1, 1], [−1, 1]] satisfies Q⁻¹ = Qᵀ. Why? Stay tuned ...

Sec. 18.1: Symmetric eigen decomposition

If S is a symmetric matrix: Theorem: there exists a (unique) eigendecomposition S = QΛQᵀ, where Q is orthogonal: Q⁻¹ = Qᵀ. The columns of Q are normalized eigenvectors, and the columns are orthogonal. (Everything is real.)

Sec. 18.1: Exercise

Examine the symmetric eigendecomposition, if any, for each of the matrices given on the slide.

Time out! "I came to this class to learn about text retrieval and mining, not to have my linear algebra past dredged up again ..." But if you want to dredge, Strang's Applied Mathematics is a good place to start. What do these matrices have to do with text? Recall M × N term-document matrices ... But everything so far needs square matrices, so ...

Sec. 18.2: Singular Value Decomposition

For an M × N matrix A of rank r there exists a factorization (Singular Value Decomposition = SVD) A = UΣVᵀ, where U is M×M, Σ is M×N, and V is N×N. The columns of U are orthogonal eigenvectors of AAᵀ, the columns of V are orthogonal eigenvectors of AᵀA, and the eigenvalues λ1 ... λr of AAᵀ are the eigenvalues of AᵀA.
The singular values are σi = √λi, placed along the diagonal of Σ.

Sec. 18.2: Singular Value Decomposition

(Illustration of SVD dimensions and sparseness: see the figure on the slide.)

Sec. 18.2: SVD example

Let A = [[1, −1], [0, 1], [1, 0]]; thus M = 3, N = 2. Its SVD is A = UΣVᵀ with singular values √3 and 1. Typically, the singular values are arranged in decreasing order.
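numpy makes the connection between the SVD and the eigendecomposition of AᵀA concrete. The 3×2 matrix below matches the stated shape M = 3, N = 2 (that it is exactly the slide's example is an assumption; the relations it demonstrates hold for any A):

```python
import numpy as np

# A 3x2 example matrix (M=3, N=2).
A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

# full_matrices=False gives the reduced SVD: U is M x r, Vt is r x N.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back in decreasing order: sqrt(3), 1.
assert np.allclose(s, [np.sqrt(3.0), 1.0])

# They are the square roots of the eigenvalues of A^T A (which equal
# the nonzero eigenvalues of A A^T).
evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s ** 2, evals)

# The factorization reconstructs A.
assert np.allclose(U * s @ Vt, A)
```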
Sec. 18.3: Low-rank approximation

SVD can be used to compute optimal low-rank approximations. Approximation problem: find A_k of rank k such that A_k = argmin over X of rank k of ‖A − X‖_F (the Frobenius norm). A_k and X are both M×N matrices. Typically, we want k << r.

Sec. 18.3: Low-rank approximation

Solution via SVD: set the smallest r − k singular values to zero, giving A_k = U diag(σ1, ..., σk, 0, ..., 0) Vᵀ. In column notation this is a sum of k rank-1 matrices: A_k = σ1·u1v1ᵀ + ... + σk·ukvkᵀ.

Sec. 18.3: Reduced SVD

If we retain only k singular values and set the rest to 0, then we don't need the matrix parts shown in brown on the slide. Then Σ is k×k, U is M×k, Vᵀ is k×N, and A_k is M×N. This is referred to as the reduced SVD. It is the convenient (space-saving) and usual form for computational applications; it's what Matlab gives you.

Sec. 18.3: Approximation error

How good (bad) is this approximation? It's the best possible, as measured by the Frobenius norm of the error: min over X of rank k of ‖A − X‖_F = ‖A − A_k‖_F = sqrt(σ_{k+1}² + ... + σ_r²), where the σi are ordered such that σi ≥ σi+1. This suggests why the Frobenius error drops as k is increased.
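The truncation recipe and the error formula above can be sketched numerically (the matrix sizes here are arbitrary, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random stand-in for the matrix A; sizes are illustrative.
M, N, k = 40, 30, 5
A = rng.standard_normal((M, N))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep the k largest singular values, zero the rest.
Ak = U[:, :k] * s[:k] @ Vt[:k, :]
assert np.linalg.matrix_rank(Ak) == k

# Frobenius error equals the root of the sum of the discarded sigma_i^2.
err = np.linalg.norm(A - Ak, "fro")
assert np.allclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```

Note that only the first k columns of U and rows of Vᵀ are used: this is exactly the reduced SVD described above.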
Sec. 18.3: SVD low-rank approximation

Whereas the term-doc matrix A may have M = 50,000 and N = 10 million (and rank close to 50,000), we can construct an approximation A_100 with rank 100. Of all rank-100 matrices, it would have the lowest Frobenius error. Great ... but why would we?? Answer: Latent Semantic Indexing.

C. Eckart, G. Young, The approximation of a matrix by another of lower rank. Psychometrika, 1, 211-218, 1936.

Latent Semantic Indexing via the SVD

Sec. 18.4: What it is

From the term-doc matrix A, we compute the approximation A_k. There is a row for each term and a column for each doc in A_k. Thus docs live in a space of k << r dimensions. These dimensions are not the original axes. But why?

Vector Space Model: Pros

- Automatic selection of index terms
- Partial matching of queries and documents (dealing with the case where no document contains all search terms)
- Ranking according to similarity score (dealing with large result sets)
- Term weighting schemes (improves retrieval performance)
- Various extensions: document clustering; relevance feedback (modifying the query vector)
- Geometric foundation

Problems with Lexical Semantics

Ambiguity and association in natural language. Polysemy: words often have a multitude of meanings and different types of usage (more severe in very heterogeneous collections). The vector space model is unable to discriminate between different meanings of the same word.

Problems with Lexical Semantics

Synonymy: different terms may have an identical or a similar meaning (weaker: words indicating the same topic). No associations between words are made in the vector space representation.

Polysemy and Context

Document similarity on the single-word level: polysemy and context. (Figure: a word such as "saturn" sits between two neighborhoods of terms: "ring", "jupiter", "planet", "space", "voyager" for meaning 1, and "car", "company", "dodge", "ford" for meaning 2. A co-occurring term contributes to similarity if the word is used in the 1st meaning, but not if it is used in the 2nd.)
Sec. 18.4: Latent Semantic Indexing (LSI)

Perform a low-rank approximation of the document-term matrix (typical rank 100-300). General idea: map documents (and terms) to a low-dimensional representation; design the mapping such that the low-dimensional space reflects semantic associations (latent semantic space); compute document similarity based on the inner product in this latent semantic space.

Sec. 18.4: Goals of LSI

Similar terms map to similar locations in the low-dimensional space. Noise reduction by dimension reduction.

Sec. 18.4: Latent Semantic Analysis

Latent semantic space: illustrating example (courtesy of Susan Dumais).

Sec. 18.4: Performing the maps

Each row and column of A gets mapped into the k-dimensional LSI space by the SVD. Claim: this is not only the mapping with the best (Frobenius error) approximation to A, but it in fact improves retrieval. A query q is also mapped into this space, by q_k = Σ_k⁻¹ U_kᵀ q. Note the mapped query is NOT a sparse vector.

Sec. 18.4: Empirical evidence

Experiments on TREC 1/2/3 (Dumais). The Lanczos SVD code (available on netlib) due to Berry was used in these experiments. Running times of about one day on tens of thousands of docs [still an obstacle to use]. Dimensions: various values in the range 250-350 reported. Reducing k improves recall (under 200 was reported unsatisfactory). Generally we expect recall to improve; what about precision?

Sec. 18.4: Empirical evidence

Precision at or above the median TREC precision; top scorer on almost 20% of TREC topics; slightly better on average than straight vector spaces. Effect of dimensionality:

  Dimensions  Precision
  250         0.367
  300         0.371
  346         0.374

Sec. 18.4: Failure modes

Negated phrases: TREC topics sometimes negate certain query terms/phrases, which precludes automatic conversion of topics to the latent semantic space. Boolean queries: as usual, the free-text/vector-space syntax of LSI queries precludes (say) "Find any doc having to do with the following 5 companies". See Dumais for more.

Sec. 18.4: But why is this clustering?

We've talked about docs, queries, retrieval and precision here. What does this have to do with clustering? Intuition: dimension reduction through LSI brings together "related" axes in the vector space.
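The mapping described under "Performing the maps" can be sketched end to end: docs live as columns of Σ_k V_kᵀ, and a query is folded in by q_k = Σ_k⁻¹ U_kᵀ q. The toy term-document matrix and vocabulary below are invented for illustration:

```python
import numpy as np

# Toy term-document count matrix (terms x docs); vocabulary and counts
# are made up. Terms: ship, boat, ocean, vote, election.
A = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 0.0]])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Docs in the k-dimensional LSI space: columns of Sigma_k Vt_k.
docs_k = sk[:, None] * Vtk

# Fold the query into the same space: q_k = Sigma_k^-1 U_k^T q.
q = np.array([0.0, 1.0, 1.0, 0.0, 0.0])   # "boat ocean"
q_k = (Uk.T @ q) / sk

# Rank docs by cosine similarity in the latent space.
sims = (docs_k.T @ q_k) / (
    np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k) + 1e-12)
ranking = np.argsort(-sims)
# The three maritime docs rank ahead of the two politics docs.
```

With k = 2 the two topic blocks collapse onto one latent dimension each, so the politics documents get similarity near zero for this query even though no term overlaps are computed explicitly.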
Intuition from block matrices

Picture an M-term × N-document matrix laid out as Block 1, Block 2, ..., Block k along the diagonal, with 0's everywhere else. What's the rank of this matrix, if the blocks are homogeneous nonzero blocks? Each homogeneous block has rank 1, so the rank is k.

Intuition from block matrices

Here the vocabulary is partitioned into k topics (clusters), and each doc discusses only one topic.

Intuition from block matrices

Now suppose the regions outside the blocks also contain nonzero entries. What's the best rank-k approximation to this matrix?

Intuition from block matrices

In practice there are likely only a few nonzero entries outside the blocks: terms like "wiper", "tire", "V6" all sit in one block, while "car" and "automobile" may occur in the documents of different blocks even though they indicate the same topic. Likely there's a good rank-k approximation to this matrix.

Simplistic picture

(Figure: three separated clusters of documents, labeled Topic 1, Topic 2, Topic 3.)

Some wild extrapolation

The "dimensionality" of a corpus is the number of distinct topics represented in it. More mathematical wild extrapolation: if A has a rank-k approximation of low Frobenius error, then there are no more than k distinct topics in the corpus.
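The block-matrix intuition, that homogeneous blocks give rank exactly k and that a few stray cross-topic entries still allow a good rank-k approximation, can be verified numerically (block sizes here are illustrative):

```python
import numpy as np

# Block-diagonal term-document matrix: k topics with homogeneous blocks.
k, bm, bn = 3, 4, 5
A = np.zeros((k * bm, k * bn))
for i in range(k):
    A[i * bm:(i + 1) * bm, i * bn:(i + 1) * bn] = 1.0

# Each homogeneous block has rank 1, so the whole matrix has rank k.
assert np.linalg.matrix_rank(A) == k

# Perturb with a couple of stray cross-topic entries; the matrix is no
# longer rank k, but a rank-k SVD approximation is still very close.
B = A.copy()
B[0, bn] = 1.0        # a topic-1 term in a topic-2 document
B[bm, 0] = 1.0        # a topic-2 term in a topic-1 document
U, s, Vt = np.linalg.svd(B, full_matrices=False)
Bk = U[:, :k] * s[:k] @ Vt[:k, :]
rel_err = np.linalg.norm(B - Bk, "fro") / np.linalg.norm(B, "fro")
assert rel_err < 0.2
```

The bound follows from optimality: since the unperturbed A has rank k, the best rank-k approximation of B can be no farther from B than A is.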
LSI has many other applications

In many settings in pattern recognition and retrieval, we have a feature-object matrix. For text, the terms are features and the docs are objects; it could instead be opinions and users ... This matrix may be redundant in dimensionality, so we can work with a low-rank approximation. If entries are missing (e.g., users' opinions), they can be recovered if the dimensionality is low. This is a powerful general analytical technique, with a close, principled analog to clustering methods.

Resources

IIR Chapter 18
This note was uploaded on 01/21/2011 for the course CSCP 689 taught by Professor James during the Spring '10 term at Texas A&M.