s10-info-extraction

s10-info-extraction - 1 2 Can we automatically Extract this...

Info iconThis preview shows pages 1–6. Sign up to view the full content.

View Full Document Right Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: 1 2 Can we automatically Extract this information From the text (instead of depending on creators To provide automated annotations?) Information Extraction What is Information Extraction Filling slots in a database from sub-segments of text. As a task: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access. Richard Stallman, founder of the Free Software Foundation, countered saying NAME TITLE ORGANIZATION Slides from Cohen & McCallum What is Information Extraction Filling slots in a database from sub-segments of text. As a task: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte , a Microsoft VP . "That's a super-important shift for us in terms of code access. Richard Stallman , founder of the Free Software Foundation , countered saying NAME TITLE ORGANIZATION Bill Gates CEO Microsoft Bill Veghte VP Microsoft Richard Stallman founder Free Soft.. IE Slides from Cohen & McCallum Tapping into the Collective Unconscious Another thread of exciting research is driven by the realization that WEB is not random at all! It is written by humans so analyzing its structure and content allows us to tap into the collective unconscious .. Meaning can emerge from syntactic notions such as co-occurrences and connectedness Examples: Analyzing term co-occurrences in the web-scale corpora to capture semantic information (todays paper) Analyzing the link-structure of the web graph to discover communities DoD and NSA are very much into this as a way of breaking terrorist cells Analyzing the transaction patterns of customers (collaborative filtering) Big Idea 3 How can we possibly do this without full NLP?...
View Full Document

This note was uploaded on 03/11/2012 for the course CSE 494 taught by Professor Rao during the Spring '08 term at ASU.

Page1 / 95

s10-info-extraction - 1 2 Can we automatically Extract this...

This preview shows document pages 1 - 6. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online