lecture1-intro-handout-6-per

Introduction to Information Retrieval
CS276 Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 1: Boolean retrieval

Information Retrieval

Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Unstructured (text) vs. structured (database) data in 1996
[Figure not included in this copy]

Unstructured (text) vs. structured (database) data in 2009
[Figure not included in this copy]

Unstructured data in 1680 (Sec. 1.1)

Which plays of Shakespeare contain the words Brutus AND Caesar but NOT Calpurnia?

One could grep all of Shakespeare's plays for Brutus and Caesar, then strip out lines containing Calpurnia? Why is that not the answer?
- Slow (for large corpora).
- NOT Calpurnia is non-trivial.
- Other operations (e.g., finding the word Romans near countrymen) are not feasible.
- Ranked retrieval (best documents to return): covered in later lectures.

Term-document incidence (Sec. 1.1)
[Incidence matrix not included in this copy; each entry is 1 if the play contains the word, 0 otherwise]
Query: Brutus AND Caesar BUT NOT Calpurnia
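As a minimal sketch of the incidence-matrix idea, assuming a made-up four-document corpus (the `docs` list and the functions `incidence_vectors` and `boolean_query` are hypothetical, not from the lecture), each term's row can be stored as a Python integer used as a bit vector. AND is then a bitwise AND, and AND NOT is a bitwise AND with the term's complement, masked to the number of documents.

```python
# Minimal sketch: Boolean retrieval over a term-document incidence matrix.
# The corpus below is a hypothetical stand-in, not the actual plays.
docs = [
    "brutus caesar antony",     # doc 0
    "caesar calpurnia brutus",  # doc 1
    "caesar cleopatra",         # doc 2
    "brutus caesar hamlet",     # doc 3
]


def incidence_vectors(docs):
    """Map each term to an int whose bit i is 1 iff doc i contains the term."""
    vectors = {}
    for i, text in enumerate(docs):
        for term in set(text.split()):
            vectors[term] = vectors.get(term, 0) | (1 << i)
    return vectors


def boolean_query(vectors, n_docs, must, must_not):
    """Return ids of docs containing every term in `must` and none in `must_not`."""
    all_docs = (1 << n_docs) - 1  # n_docs one-bits; used to complement within range
    result = all_docs
    for term in must:
        result &= vectors.get(term, 0)  # AND: keep docs that contain the term
    for term in must_not:
        result &= all_docs & ~vectors.get(term, 0)  # AND NOT via masked complement
    return [i for i in range(n_docs) if (result >> i) & 1]


vectors = incidence_vectors(docs)
print(boolean_query(vectors, len(docs), ["brutus", "caesar"], ["calpurnia"]))  # [0, 3]
```

In this sketch bit i corresponds to doc i, so the binary values read right to left, the reverse of the left-to-right 0/1 vectors written on the slides.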

Incidence vectors (Sec. 1.1)

So we have a 0/1 vector for each term. To answer the query, take the vectors for Brutus, Caesar and Calpurnia (complemented) and combine them with a bitwise AND:

110100 AND 110111 AND 101111 = 100100

Answers to query (Sec. 1.1)

Antony and Cleopatra, Act III, Scene ii
Agrippa [Aside to DOMITIUS ENOBARBUS]: Why, Enobarbus,
When Antony found Julius Caesar dead,
He cried almost to roaring; and he wept
When at Philippi he found Brutus slain.

Hamlet, Act III, Scene ii
Lord Polonius: I did enact Julius Caesar: I was killed i' the Capitol; Brutus killed me.

Basic assumptions of Information Retrieval (Sec. 1.1)
- Collection: a fixed set of documents.
- Goal: retrieve documents with information that is relevant to the user's information need and helps the user complete a task.

The classic search model
[Figure: TASK → Info need → Verbal form → Query → SEARCH ENGINE (searching the Corpus) → Results, with a Query refinement loop. Example info need: "Info about removing mice without killing them"; example query: "mouse trap". Possible failure points along the way: Misconception? Mistranslation? Misformulation?]

How good are the retrieved docs? (Sec. 1.1)
- Precision: fraction of retrieved docs that are relevant to the user's information need.
- Recall: fraction of relevant docs in the collection that are retrieved.
More precise definitions and measurements follow in later lectures.

Bigger collections
Consider N = 1 million documents, each with about 1000 words.
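The precision and recall definitions above can be sketched for a single query by treating the retrieved and relevant documents as sets. This is a minimal illustration; the function name `precision_recall`, the document ids, and the relevance judgments are hypothetical, and returning 0.0 when a denominator is empty is an assumed convention rather than one stated in the lecture.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    Returning 0.0 for an empty denominator is an assumed convention here.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant docs that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: the system returns docs 1-4; docs 2, 4 and 5 are relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, r)  # 0.5 0.6666666666666666
```

Here two of the four retrieved docs are relevant (precision 2/4), and two of the three relevant docs were found (recall 2/3).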