jurafsky&martin_3rdEd_17 (1).pdf

# Semantics with Dense Vectors


represent a function of word meaning, instead of having to learn tens of thousands of weights for each of the sparse dimensions. Because they contain fewer parameters than sparse vectors of explicit counts, dense vectors may generalize better and help avoid overfitting. Dense vectors may also do a better job of capturing synonymy than sparse vectors. For example, *car* and *automobile* are synonyms, but in a typical sparse vector representation the *car* dimension and the *automobile* dimension are distinct. Because the relationship between these two dimensions is not modeled, sparse vectors may fail to capture the similarity between a word that has *car* as a neighbor and a word that has *automobile* as a neighbor.

We will introduce three methods of generating very dense, short vectors: (1) using dimensionality reduction methods like SVD; (2) using neural nets like the popular skip-gram or CBOW approaches; and (3) a quite different approach, based on neighboring words, called Brown clustering.

## 16.1 Dense Vectors via SVD

We begin with a classic method for generating dense vectors: singular value decomposition, or SVD, first applied to the task of generating embeddings from term-document matrices by Deerwester et al. (1988) in a model called Latent Semantic Indexing or Latent Semantic Analysis (LSA).

Singular Value Decomposition (SVD) is a method for finding the most important dimensions of a data set, those dimensions along which the data varies the most. It can be applied to any rectangular matrix. SVD is part of a family of methods that can approximate an N-dimensional dataset using fewer dimensions, including Principal Components Analysis (PCA), Factor Analysis, and so on.

In general, dimensionality reduction methods first rotate the axes of the original dataset into a new space. The new space is chosen so that the highest order dimension captures the most variance in the original dataset, the next dimension captures


the next most variance, and so on. Fig. 16.1 shows a visualization. A set of points (vectors) in two dimensions is rotated so that the first new dimension captures the most variation in the data. In this new space, we can represent data with a smaller number of dimensions (for example using one dimension instead of two) and still capture much of the variation in the original data.

**Figure 16.1** Visualizing principal components analysis: Given original data (a), find the rotation of the data (b) such that the first dimension captures the most variation, and the second dimension is the one orthogonal to the first that captures the next most variation. Use this new rotated space (c) to represent each point on a single dimension (d). While some information about the relationship between the original points is necessarily lost, the remaining dimension preserves the most that any one dimension could.
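The rotate-then-truncate procedure of Fig. 16.1 can be sketched in a few lines of numpy. This is a minimal illustration, not the text's implementation: the point cloud below is invented purely for the example, and the principal axes are found by applying SVD to the centered data.

```python
import numpy as np

# Toy 2-D point cloud stretched along a diagonal, like panel (a) of
# Fig. 16.1. (The data here is invented purely for illustration.)
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
points = np.hstack([t, 0.9 * t + 0.1 * rng.normal(size=(200, 1))])

# Step (b): center the data and find the principal axes with SVD.
centered = points - points.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Rows of Vt are the new (rotated) axes; S**2 is proportional to the
# variance the data has along each axis.
explained = S**2 / (S**2).sum()

# Steps (c)-(d): project every point onto the first principal axis,
# so each 2-D point is represented by a single number.
one_dim = centered @ Vt[0]

print(explained[0])  # most of the variance survives the projection
```

Because the second dimension here is mostly a scaled copy of the first plus a little noise, nearly all of the variance lies along the first principal axis, so the one-dimensional representation loses very little.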
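Applied to a term-document matrix, the same truncated SVD yields the short dense word vectors the chapter motivates. Below is a hedged sketch with a tiny invented count matrix (a real LSA model would use a large corpus, typically with tf-idf weighting); it shows how truncating to the top k singular dimensions produces k-dimensional embeddings in which the synonyms *car* and *automobile* end up close together.

```python
import numpy as np

# Toy term-document matrix: rows are words, columns are documents.
# (Counts are invented for illustration only.)
words = ["car", "automobile", "banana"]
X = np.array([
    [3.0, 4.0, 0.0, 1.0],   # car
    [2.0, 5.0, 0.0, 1.0],   # automobile
    [0.0, 0.0, 6.0, 2.0],   # banana
])

# Full SVD: X = U @ np.diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top k singular dimensions to get short, dense vectors.
k = 2
embeddings = U[:, :k] * S[:k]   # one k-dimensional row per word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The synonyms are close in the dense space; the unrelated word is not.
print(cosine(embeddings[0], embeddings[1]))  # car vs. automobile: high
print(cosine(embeddings[0], embeddings[2]))  # car vs. banana: low
```

The choice of k here is arbitrary; in practice LSA models keep on the order of a few hundred dimensions.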