lecture9-queryexpansion-handout-6-per

# 0.491, 07/09/91, Soviets May Adapt Parts of SS-20 Missile

Introduction to Information Retrieval, Sec. 9.1.1

## Expanded query after relevance feedback

- …ianespace
- 3.004 bundespost
- 2.806 ss
- 2.790 rocket
- 2.053 scientist
- 2.003 broadcast
- 1.172 earth
- 0.836 oil
- 0.646 measure

## Results for expanded query

3. 0.493, 08/07/89, When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
4. 0.493, 07/31/89, NASA Uses Warm Superconductors For Fast Circuit
5. 0.492, 12/02/87, Telecommunications Tale of Two Companies
6. 0.491, 07/09/91, Soviets May Adapt Parts of SS-20 Missile For Commercial Use
7. 0.490, 07/12/88, Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
8. 0.490, 06/14/90, Rescue of Satellite By Space Agency To Cost \$90 Million

## Key concept: Centroid

- The centroid is the center of mass of a set of points.
- Recall that we represent documents as points in a high-dimensional space.
- Definition: Centroid

$$\mu(C) = \frac{1}{|C|} \sum_{d \in C} d$$

where $C$ is a set of documents.

## Rocchio Algorithm

- The Rocchio algorithm uses the vector space model to pick a relevance feedback query.
- Rocchio seeks the query $q_{opt}$ that maximizes

$$q_{opt} = \arg\max_{q} \left[ \cos(q, \mu(C_r)) - \cos(q, \mu(C_{nr})) \right]$$

- Tries to separate docs marked relevant and non-relevant:

$$q_{opt} = \frac{1}{|C_r|} \sum_{d_j \in C_r} d_j - \frac{1}{|C_{nr}|} \sum_{d_j \notin C_r} d_j$$

- Problem: we don't know the truly relevant docs.

Th...
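To make the centroid definition and the Rocchio formula concrete, here is a minimal sketch in plain Python. The function names (`centroid`, `cosine`, `rocchio_optimal_query`) and the three-dimensional toy vectors are assumptions made for illustration; they do not come from the slides. It also assumes the relevant and non-relevant sets are fully known, which is exactly what the slides flag as unavailable in practice.

```python
import math


def centroid(docs):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(docs)
    return [sum(component) / n for component in zip(*docs)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rocchio_optimal_query(relevant, nonrelevant):
    """Centroid of the relevant docs minus centroid of the non-relevant docs."""
    mu_r = centroid(relevant)
    mu_nr = centroid(nonrelevant)
    return [r - n for r, n in zip(mu_r, mu_nr)]


# Hypothetical toy document vectors (three term dimensions each).
relevant = [[1.0, 0.0, 1.0], [3.0, 0.0, 1.0]]
nonrelevant = [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]

q_opt = rocchio_optimal_query(relevant, nonrelevant)
print(q_opt)  # [2.0, -3.0, 0.0]

# The resulting query points toward the relevant centroid and away
# from the non-relevant one.
print(cosine(q_opt, centroid(relevant)) > cosine(q_opt, centroid(nonrelevant)))  # True
```

In this toy example the third dimension has the same average weight in both sets, so it cancels to zero in `q_opt`: a term that does not distinguish relevant from non-relevant documents contributes nothing to the query.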
