ir-s10

1/26
■ Start of IR
■ End of ιριμιι Pledge Week
■ 22+1 in 598, 22 in 494
■ Roster pictures

Outline of IR topics
■ Background
  ◆ Definitions, etc.
■ The Problem
  ◆ 100,000+ pages
■ The Solution
  ◆ Ranking docs
  ◆ Vector space
■ Extensions
  ◆ Relevance feedback,
  ◆ clustering,
  ◆ query expansion, etc.

Information Retrieval
■ Traditional model
  ◆ Given
    ✦ a set of documents
    ✦ a query expressed as a set of keywords
  ◆ Return
    ✦ a ranked set of documents most relevant to the query
  ◆ Evaluation
    ✦ Precision: fraction of returned documents that are relevant
    ✦ Recall: fraction of relevant documents that are returned
    ✦ Efficiency
    ✦ (a small precision/recall sketch appears at the end of these notes)
■ Web-induced headaches
  ◆ Scale (billions of documents)
  ◆ Hypertext (inter-document connections)
■ Consequently
  ◆ Ranking that takes link structure into account
    ✦ Authority/Hub
  ◆ Indexing and retrieval algorithms that are ultra fast

What is Information Retrieval?
■ Given a large repository of documents, how do I get at the ones that I want?
  ◆ Examples: Lexis/Nexis, medical reports, AltaVista
    ✦ Keyword based
■ Different from databases
  ◆ Unstructured (or semi-structured) data
  ◆ Information is (typically) text
  ◆ Requests are (typically) word-based
■ In principle, this requires NLP; NLP is still too hard, so IR tries to get by with syntactic methods.
■ Catch 22: since IR doesn't do NLP, users tend to write cryptic keyword queries.

[Diagram: Docs are represented by index terms; the user's information need is expressed as a query; ranking matches the doc representations against the query. A vector-space ranking sketch appears at the end of these notes.]

Information vs. Data
■ Data retrieval
  ✦ Which docs contain a set of keywords?
  ✦ Well-defined semantics
  ✦ A single erroneous object implies failure!
    • A single missed object implies failure too.
■ Information retrieval
  ✦ Information about a subject or topic
  ✦ Semantics is frequently loose
  ✦ Small errors are tolerated
■ IR system:
  ✦ interpret contents of information items
  ✦ generate a ranking which reflects relevance
  ✦ the notion of relevance is the most important one

Relevance: the most over-loaded word in IR
■ We want to rank and return documents that are "relevant" to the user's query.
  ◆ Easy if each document has a relevance number R(.); just sort the documents by R(.).
■ What does relevance R(.) depend on?
  ◆ The document d
  ◆ The query Q
  ◆ The user U
  ◆ The other documents already shown {d1, d2, …, dk}
■ So relevance goes from R(d | Q, U) to R(d | Q, U, {d1, d2, …, dk})
  ◆ (a re-ranking sketch that uses the already-shown documents appears at the end of these notes)

How to get R(d | Q, U, {d1, d2, …, dk})
■ Specify up front
  ◆ Too hard: one for each query, user, and shown-results combination
■ Learn
  ◆ Active (utility elicitation)
  ◆ Passive (learn from what the user does)
■ Make up the user's mind
  ◆ "What you are really looking for is…"
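
Sketch: precision and recall. The Evaluation bullets above define precision and recall in words; the short Python below computes both for a single query. The document IDs and relevance judgments are made-up illustrative values, not anything from the lecture.

    # Precision and recall for one query, as defined on the Traditional Model slide.
    def precision_recall(returned, relevant):
        """returned: doc ids the system returned; relevant: doc ids judged relevant."""
        returned_set = set(returned)
        hits = returned_set & set(relevant)
        precision = len(hits) / len(returned_set) if returned_set else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    returned = ["d1", "d3", "d7", "d9"]        # hypothetical system output
    relevant = {"d1", "d2", "d3", "d5", "d7"}  # hypothetical ground truth
    p, r = precision_recall(returned, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60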
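
Sketch: vector-space ranking. The outline names "Ranking docs" and "Vector space" as the solution, and the architecture diagram matches a query against indexed documents. This is a minimal sketch under common assumptions: documents and the query become term-frequency vectors and are ranked by cosine similarity. The toy corpus is invented, stopwords are kept, and terms are unweighted; a real system would drop stopwords, weight terms by tf-idf, and use an inverted index instead of scoring every document.

    import math
    from collections import Counter

    def cosine(a, b):
        # a, b: Counter term-frequency vectors
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    docs = {  # invented toy corpus
        "d1": "information retrieval ranks documents by relevance",
        "d2": "databases answer structured queries over structured data",
        "d3": "web retrieval must handle billions of documents",
    }
    query = "retrieval of relevant documents"

    qvec = Counter(query.lower().split())
    ranked = sorted(((cosine(Counter(text.lower().split()), qvec), doc)
                     for doc, text in docs.items()), reverse=True)
    for score, doc in ranked:
        print(f"{doc}: {score:.3f}")  # d3 and d1 outrank d2, which shares no terms

Because the stopword "of" is kept and terms are unweighted, d3 edges out d1 here; dropping stopwords and weighting by tf-idf is the usual refinement.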
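
Sketch: re-ranking with already-shown documents. The relevance slides end with R(d | Q, U, {d1, …, dk}): the score of the next document also depends on what has already been shown. The slides do not say how to realize this; the sketch below uses a greedy re-ranker in the spirit of maximal marginal relevance, which is one common interpretation rather than the lecture's method. The user U is not modeled, and the trade-off weight and toy vectors are arbitrary illustrative choices.

    import math
    from collections import Counter

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def rerank(doc_vecs, query_vec, k=3, trade_off=0.7):
        """Greedily pick k docs, scoring each candidate by R(d | Q, {already shown}):
        similarity to the query minus similarity to the closest already-shown doc."""
        shown, remaining = [], dict(doc_vecs)
        while remaining and len(shown) < k:
            def marginal(doc):
                sim_q = cosine(remaining[doc], query_vec)
                sim_shown = max((cosine(remaining[doc], doc_vecs[s]) for s in shown),
                                default=0.0)
                return trade_off * sim_q - (1 - trade_off) * sim_shown
            best = max(remaining, key=marginal)
            shown.append(best)
            del remaining[best]
        return shown

    docs = {  # d1 and d2 are near-duplicates; d3 is different but still on-topic
        "d1": Counter("ir ranks documents".split()),
        "d2": Counter("ir ranks documents too".split()),
        "d3": Counter("ir for structured databases".split()),
    }
    print(rerank(docs, Counter("ir documents".split()), k=2, trade_off=0.5))
    # -> ['d1', 'd3']: the near-duplicate d2 is passed over for the more distinct d3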