class02-boolean

class02-boolean - Information Storage and Retrieval CSCE...

Information Storage and Retrieval
CSCE 670
Instructor: Prof. James Caverlee
Boolean Retrieval
21 January 2010

Today
  Brief history of IR (or, there’s more to IR than Google)
  Boolean retrieval
  Python tutorial (Jeremy)

History of the (IR) World
3rd Century BC: Library of Alexandria
  500,000 volumes
  catalogs and classifications
  “controlled vocabularies”

Controlled Vocabulary Indexing
History of the (IR) World
48 BC

History of the (IR) World
1247: First concordance of the Bible
  invention of the inverted list data structure

History of the (IR) World
1600: University of Oxford Library
  all books printed in England
1755: Johnson’s Dictionary
  Set standard for dictionaries
  Included common language
  Helped standardize spelling
1800: Library of Congress
1828: Webster’s Dictionary
  significantly larger than previous dictionaries
  standardized American spelling
1852: Roget’s Thesaurus
1876: Dewey Decimal classification
1880s: Carnegie Public Libraries
  1681 built (first public library 1850)
Carnegie Libraries
  Between 1883 and 1929, 2509 libraries were built in the US, Britain, Ireland, ...
  By 1919, of the 3500 libraries in the US, half were built with construction grants provided by Carnegie

History of the (IR) World
1930s: Punched card retrieval systems
1945: Vannevar Bush’s Memex
  Personal information store based on associations

History of the (IR) World
late 1950s: H.P. Luhn’s papers
1963: Salton’s SMART system
  Vector space retrieval!!
1964: NLM introduces MEDLINE
1967: Lockheed introduces DIALOG
1976: Simple probabilistic retrieval model
1992: West Publishing introduces WIN
  first major commercial use of ranked retrieval!!!

History of the (IR) World
early 1990s: Information storage revolution
  cheap disks
1994: Web search engines
1998: Statistical language models for IR
1998: Google PageRank
2000s: Advances in ML
2000s: Massive social info explosion
2000s: Ubiquitous mobile devices
Boolean Retrieval

Let’s get started ...
  Which plays of Shakespeare contain the words Brutus AND Caesar but NOT Calpurnia?
  One could grep all of Shakespeare’s plays for Brutus and Caesar, then strip out lines containing Calpurnia?
  Slow (for large corpora)
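
The grep-style approach on this slide can be sketched in Python as a linear scan. This is only a minimal illustration under stated assumptions: the corpus `PLAYS` is a made-up stand-in (not real play text), the names `tokens` and `linear_scan` are hypothetical, and the query is evaluated per play rather than per line as grep would do.

```python
import re

# Toy stand-in corpus: invented snippets, NOT real Shakespeare text.
PLAYS = {
    "play_a": "Brutus speaks to Caesar while Calpurnia waits",
    "play_b": "Caesar is mentioned and so is Brutus",
    "play_c": "Only Caesar appears here",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def linear_scan(plays, must_have, must_not_have):
    """Return sorted titles containing every required word and no excluded word.

    Every query re-reads every document, so the cost of a single query
    grows with the total size of the corpus.
    """
    hits = []
    for title, text in plays.items():
        words = tokens(text)
        has_all = all(w in words for w in must_have)
        has_excluded = any(w in words for w in must_not_have)
        if has_all and not has_excluded:
            hits.append(title)
    return sorted(hits)


# Brutus AND Caesar AND NOT Calpurnia
result = linear_scan(PLAYS, ["brutus", "caesar"], ["calpurnia"])
print(result)  # ['play_b']
```

Because the scan touches the full text of every play on every query, it illustrates why the slide calls this approach slow for large corpora.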
