imilarity

Introduction to Information Retrieval

Hard vs. soft clustering
  Hard clustering: Each document belongs to exactly one cluster.
    More common and easier to do.
  Soft clustering: A document can belong to more than one cluster.
    Makes more sense for applications like creating browsable hierarchies.
    You may want to put a pair of sneakers in two clusters: (i) sports apparel and (ii) shoes.
    You can only do that with a soft clustering approach.
  We won't do soft clustering today. See IIR 16.5, 18.

Partitioning Algorithms
  Partitioning method: Construct a partition of n documents into a set of K clusters.
    Given: a set of documents and the number K.
    Find: a partition of K clusters that optimizes the chosen partitioning criterion.
  Globally optimal
    Intractable for many objective functions
    Ergo, exhaustively enumerate all partitions
  Effective heuristic methods: K-means an...
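
To make concrete why exhaustive enumeration of all partitions is intractable, here is a minimal Python sketch. It is not taken from the slides: the function names `stirling2` and `partitions_into_k` and the document labels are hypothetical, chosen only for illustration. It counts the hard partitions of n documents into exactly K non-empty clusters (the Stirling number of the second kind) and, for a tiny input, enumerates them explicitly.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Count the ways to split n labelled items into exactly k non-empty clusters."""
    if n == k:
        return 1  # also covers the empty case n == k == 0
    if k == 0 or k > n:
        return 0
    # The n-th item either forms a new cluster on its own,
    # or joins one of the k clusters built from the other n - 1 items.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def partitions_into_k(items, k):
    """Yield every hard partition of `items` into exactly k non-empty clusters."""
    if not items:
        if k == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` is alone in its own cluster.
    for part in partitions_into_k(rest, k - 1):
        yield [[first]] + part
    # Case 2: `first` joins one cluster of a k-cluster partition of the rest.
    for part in partitions_into_k(rest, k):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


if __name__ == "__main__":
    docs = ["d1", "d2", "d3", "d4"]  # hypothetical document IDs
    for partition in partitions_into_k(docs, 2):
        print(partition)
    # The count grows very quickly with n, which is why brute force is impractical.
    for n in (10, 20, 50):
        print(n, stirling2(n, 3))
```

Four documents and K = 2 already give 7 partitions, and the count grows roughly like K^n / K!, so evaluating the partitioning criterion on every candidate is only feasible for toy collections. This is the motivation for heuristic methods of the kind the slide goes on to name.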
