LatentSemantics.2011

# LatentSemantics.2011 - 1 Latent Semantics We recall that we...

## Latent Semantics

We recall that we were able to re-express the huge matrix that relates words (terms) to documents in terms of concepts, by introducing a matrix that relates words to concepts and doing a matrix multiplication.

*Figure 1. Terms × Documents; Topics × Terms; Topics × Documents.*

By the way, let's recall the horrible formula for matrix multiplication. If we call the pink matrix M, the term-document matrix D, and the topic-document (or concept-document) matrix C, then the relation is:

$$C_{kj} = \sum_i M_{ki}\, D_{ij}, \qquad \text{i.e.} \quad C = MD \quad \text{(matrix multiplication)}.$$

In this equation, the number $M_{ki}$ (in the k-th row and the i-th column of M) tells how related the specific word or term i is to the concept labeled k. The number $D_{ij}$ (in the i-th row and the j-th column of D) tells us something about how often the i-th word occurs in the document labeled j. And the entry $C_{kj}$ of the matrix product tells us how important concept k is in the document labeled j. Since M is Topics × Terms and D is Terms × Documents, the product C is Topics × Documents.

Recall what this means: for each particular document, we get a new vector that is much shorter and still represents the document. But it represents the document in terms of the topics that are in it, rather than the terms. In the yellow (term-document) matrix, each document is represented by a column (shown here in blue). That same document is represented, in terms of concepts or topics, by the shorter vector shown in yellow in the topics-by-documents matrix (see Figure 2).
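
To make the formula concrete, here is a minimal sketch in plain Python. The vocabulary, the two topics, the three documents, every number in M and D, and the helper names `matmul` and `column` are hypothetical, chosen only for illustration. The code computes each entry as $C_{kj} = \sum_i M_{ki} D_{ij}$, exactly as in the formula, and then shows one document's 4-entry term column shrinking to a 2-entry topic column.

```python
# Hypothetical toy data for illustration only (not from the notes).
TERMS = ["ball", "goal", "vote", "election"]
TOPICS = ["sports", "politics"]

# M ("pink matrix"): topics x terms. M[k][i] = how related term i is to topic k.
M = [
    [1.0, 1.0, 0.0, 0.0],  # sports
    [0.0, 0.0, 1.0, 1.0],  # politics
]

# D: terms x documents. D[i][j] = how often term i occurs in document j.
D = [
    [3, 0, 1],  # ball
    [2, 0, 0],  # goal
    [0, 4, 1],  # vote
    [0, 2, 0],  # election
]


def matmul(A, B):
    """Return the product A B, computed entry by entry as sum_i A[k][i] * B[i][j]."""
    n_inner = len(B)
    if any(len(row) != n_inner for row in A):
        raise ValueError("inner dimensions do not match")
    n_cols = len(B[0])
    return [
        [sum(A[k][i] * B[i][j] for i in range(n_inner)) for j in range(n_cols)]
        for k in range(len(A))
    ]


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


C = matmul(M, D)  # topics x documents

if __name__ == "__main__":
    for k, topic in enumerate(TOPICS):
        print(topic, C[k])  # sports [5.0, 0.0, 1.0] / politics [0.0, 6.0, 1.0]
    # Document 0 as a term vector (length 4) versus a topic vector (length 2).
    print(column(D, 0), "->", column(C, 0))  # [3, 2, 0, 0] -> [5.0, 0.0]
```

With a real vocabulary one would normally use a numerical library (for example NumPy's `M @ D`); the explicit sum is written out here only so that it mirrors the formula term by term.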

*Figure 2.*

In this case we had to make up the “pink matrix” ourselves, and we did it for only a few words. If we tried to do it by hand for all the words in the language and all the reasonable topics, it would certainly keep all the librarians in the world busy for centuries. So people looked for a different way to get at it. They turned to the idea that “where a word appears” tells you something about its meaning.

There are two senses to the notion of “where a word appears”. One is the local context. The other is the distribution across documents or web pages.

For example, earlier in this section the word “something” appeared in the context “tells you something about its”. I could have taken a larger or smaller context, but here I am looking at just two words to the left and two words to the right. This context tells us that whatever the word “something” means, it makes sense to put it into the local context “tells you ___ about its”. That actually tells us a little about what the word “something” means.

On the other hand, distribution across web pages also tells us something about a word. Depending on what a word means, it will tend to appear in some pages and not others.
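
The two senses of “where a word appears” can be sketched in a few lines of Python. This is an illustrative sketch, not the method from the notes: the crude tokenizer, the two-word window, the helper names (`tokenize`, `context_windows`, `as_slot`, `document_distribution`), and the sample pages are all assumptions. `context_windows` collects the local context around each occurrence of a word; `document_distribution` counts how often the word occurs in each page, which is exactly that word's row in a term-document matrix like D.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (an assumption for this sketch): lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def context_windows(tokens, target, window=2):
    """Return (left, right) word tuples around each occurrence of target."""
    windows = []
    for pos, tok in enumerate(tokens):
        if tok == target:
            left = tuple(tokens[max(0, pos - window):pos])
            right = tuple(tokens[pos + 1:pos + 1 + window])
            windows.append((left, right))
    return windows


def as_slot(left, right):
    """Render a context with the target word replaced by a blank."""
    return " ".join(left + ("___",) + right)


def document_distribution(docs, target):
    """Count occurrences of target in each document: one count per document."""
    return [Counter(tokenize(doc))[target] for doc in docs]


if __name__ == "__main__":
    sentence = "where a word appears tells you something about its meaning"
    for left, right in context_windows(tokenize(sentence), "something"):
        print(as_slot(left, right))  # tells you ___ about its

    # Hypothetical mini "web pages", for illustration only.
    pages = [
        "the striker scored a goal and the crowd cheered the goal",
        "the senate will vote on the bill",
        "a late goal decided the match",
    ]
    print(document_distribution(pages, "goal"))  # [2, 0, 1]
```

Widening `window` trades a more specific context for fewer exact matches; the notes' choice of two words on each side is just one convenient setting.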