Learning First-Order Horn Clauses from Web Text

Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld
Turing Center, University of Washington
Computer Science and Engineering
Box 352350, Seattle, WA 98125, USA
stef,etzioni,[email protected]

Jesse Davis
Katholieke Universiteit Leuven
Department of Computer Science
POBox 02402, Celestijnenlaan 200a
B-3001 Heverlee, Belgium
[email protected]

Abstract

Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule-scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy, and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system, which merely extracts facts explicitly stated in Web text.

1 Introduction

Today's Web search engines locate pages that match keyword queries. Even sophisticated Web-based Q/A systems merely locate pages that contain an explicit answer to a question. These systems are helpless if the answer must be inferred from multiple sentences, possibly on different pages. To solve this problem, Schoenmackers et al. (2008) introduced the HOLMES system, which infers answers from tuples extracted from text. HOLMES's distinction is that it is domain-independent and that its inference time is linear in the size of its input corpus, which enables it to scale to the Web. However, HOLMES's Achilles heel is that it requires hand-coded, first-order Horn clauses as input.
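As a concrete illustration of the kind of rule at issue, the sketch below represents a first-order Horn clause over extracted tuples and applies one step of forward chaining to derive an implicit fact. The predicate names and ground facts are invented for illustration; they are not taken from the actual corpus or from SHERLOCK's output.

```python
# A Horn clause: the head holds whenever every body atom can be
# unified against known ground facts.
# Illustrative rule (invented, not from the actual corpus):
#   IsInCountry(x, z) :- IsInCity(x, y), CityInCountry(y, z)

facts = {
    ("IsInCity", "Space Needle", "Seattle"),
    ("CityInCountry", "Seattle", "USA"),
}

rule = {
    "head": ("IsInCountry", "?x", "?z"),
    "body": [("IsInCity", "?x", "?y"), ("CityInCountry", "?y", "?z")],
}

def unify(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`, or return None."""
    if atom[0] != fact[0]:
        return None
    binding = dict(binding)
    for a, f in zip(atom[1:], fact[1:]):
        if a.startswith("?"):          # variable: bind or check consistency
            if binding.get(a, f) != f:
                return None
            binding[a] = f
        elif a != f:                   # constant: must match exactly
            return None
    return binding

def apply_rule(rule, facts):
    """One forward-chaining step: every head instantiation the rule derives."""
    bindings = [{}]
    for atom in rule["body"]:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := unify(atom, f, b)) is not None]
    head = rule["head"]
    return {(head[0],) + tuple(b[v] for v in head[1:]) for b in bindings}

print(apply_rule(rule, facts))
# derives {('IsInCountry', 'Space Needle', 'USA')}
```

Neither of the two ground facts states where the Space Needle's country is; the rule makes that answer explicit, which is the gap inference fills over mere extraction.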
Thus, while HOLMES's inference run time is highly scalable, it requires substantial labor and expertise to hand-craft the appropriate set of Horn clauses for each new domain.

Is it possible to learn effective first-order Horn clauses automatically from Web text in a domain-independent and scalable manner? We refer to the set of ground facts derived from Web text as open-domain theories. Learning Horn clauses has been studied extensively in the Inductive Logic Programming (ILP) literature (Quinlan, 1990; Muggleton, 1995). However, learning Horn clauses from open-domain theories is particularly challenging for several reasons. First, the theories denote instances of an unbounded and unknown set of relations. Second, the ground facts in the theories are noisy and incomplete. Negative examples are mostly absent, and we certainly cannot make the closed-world assumption typically made by ILP systems. Finally, the names used to denote both entities and relations are rife with both synonyms and polysemes, making their referents ambiguous and resulting in a particu-...
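The absence of negative examples is where the Statistical Relevance idea mentioned in the abstract applies: instead of penalizing a rule for unseen counterexamples, one asks whether the rule's body raises the probability of its head above the head's base rate, which can be estimated from positive extractions alone. SHERLOCK's actual scoring function is more involved; the sketch below only illustrates that core intuition, and all counts are invented for illustration.

```python
# Statistical-relevance intuition (after Salmon et al., 1971): a body B
# is positively relevant to a head H when P(H | B) > P(H).
# All counts below are hypothetical, not real extraction statistics.

def relevance_lift(n_head_and_body, n_body, n_head, n_total):
    """Ratio P(H | B) / P(H); a value > 1 suggests the body is relevant."""
    p_head_given_body = n_head_and_body / n_body
    p_head = n_head / n_total
    return p_head_given_body / p_head

# Hypothetical counts over entity pairs:
#   1,000 pairs total, 100 satisfy the head,
#   50 satisfy the body, of which 40 also satisfy the head.
lift = relevance_lift(n_head_and_body=40, n_body=50, n_head=100, n_total=1000)
print(lift)  # 8.0 -> observing the body raises the head's probability 8x
```

Because the score compares conditional and prior probabilities over observed facts, it needs no closed-world assumption and no explicit negatives, which is what makes it suitable for noisy, incomplete Web extractions.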
This note was uploaded on 02/10/2012 for the course CSE 5800 taught by Professor Staff during the Fall '09 term at FIT.