Introduction to Information Retrieval

Sec. 16.1  Scatter/Gather: Cutting, Karger, and Pedersen

For visualizing a document collection and its themes
  Wise et al., "Visualizing the non-visual," PNNL
  ThemeScapes, Cartia
  [Mountain height = cluster size]

Sec. 16.1  For improving search recall
  Cluster hypothesis: documents in the same cluster behave similarly with respect to relevance to information needs.
  Therefore, to improve search recall:
    Cluster the docs in the corpus a priori.
    When a query matches a doc D, also return the other docs in the cluster containing D.
  The hope: the query "car" will also return docs containing "automobile",
  because clustering grouped the docs containing "car" together with those containing "automobile".
  Why might this happen?

Sec. 16.2  Issues for clustering
  Representation for clustering
    Document representation
      Vector space? Normalization?
    Centroids a...
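The recall-improvement recipe from the "For improving search recall" slide (cluster a priori, then return the rest of the matched doc's cluster) can be sketched as follows. This is a minimal illustration, not the method from the slides: the function name `expand_with_clusters`, the `doc_to_cluster` mapping, the toy documents, and the whitespace keyword match are all hypothetical stand-ins, and any flat clustering algorithm could have produced the cluster assignment.

```python
from collections import defaultdict


def expand_with_clusters(matched_docs, doc_to_cluster):
    """Return the directly matched docs, followed by every other doc that
    shares a cluster with one of them (no duplicates, order preserved).

    matched_docs:   doc ids that matched the query directly.
    doc_to_cluster: hypothetical precomputed (a priori) map doc id -> cluster id.
    """
    # Invert the assignment so we can list the members of each cluster.
    # In a real system this inverse would be built once, at clustering time.
    members = defaultdict(list)
    for doc, cluster in doc_to_cluster.items():
        members[cluster].append(doc)

    results = list(matched_docs)
    seen = set(matched_docs)
    for doc in matched_docs:
        cluster = doc_to_cluster.get(doc)
        if cluster is None:  # doc was never clustered: nothing to add
            continue
        for other in members[cluster]:
            if other not in seen:
                seen.add(other)
                results.append(other)
    return results


if __name__ == "__main__":
    # Toy corpus; assume clustering already put d1 and d2 together.
    docs = {
        "d1": "car engine repair",
        "d2": "automobile engine repair",
        "d3": "banana bread recipe",
    }
    clusters = {"d1": 0, "d2": 0, "d3": 1}
    matched = [d for d, text in docs.items() if "car" in text.split()]
    print(expand_with_clusters(matched, clusters))  # → ['d1', 'd2']
```

The query "car" matches only d1 by keyword, yet d2 (which says "automobile") is returned too, because the two docs share a cluster. The expansion is only as good as the clustering: if unrelated docs share a cluster, recall rises at the cost of precision.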

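The representation questions under "Issues for clustering" (vector space? normalization?) can be made concrete with a small sketch. It assumes raw term-frequency vectors from whitespace tokenization, Euclidean (L2) length normalization, and the usual definition of a centroid as the component-wise mean of its member vectors; the helper names (`tf_vector`, `l2_normalize`, `centroid`, `dot`) are hypothetical and illustrate one possible set of choices, not the ones the slides prescribe.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    # Raw term frequencies; lowercase + whitespace split is a simplifying assumption.
    return Counter(text.lower().split())


def l2_normalize(vec):
    # Scale to unit Euclidean length so a doc's length does not inflate its scores.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return {}
    return {t: w / norm for t, w in vec.items()}


def centroid(vectors):
    # Component-wise mean of the member vectors (sparse dicts).
    if not vectors:
        return {}
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()}


def dot(a, b):
    # Sparse dot product; equals cosine similarity when both inputs are unit length.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


if __name__ == "__main__":
    short = tf_vector("car engine")
    long_ = tf_vector("car engine car engine")
    print(dot(short, long_))                              # raw tf: grows with length
    print(dot(l2_normalize(short), l2_normalize(long_)))  # ~1.0 after normalization
    print(centroid([l2_normalize(short), l2_normalize(tf_vector("banana"))]))
```

Without normalization, the doubled document scores higher simply because it is longer; after L2 normalization both point in the same direction and their similarity is about 1. Note that the mean of unit-length vectors is generally shorter than unit length, so a centroid built this way is not itself length normalized unless it is rescaled.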