emnlp-2010-stef - Learning First-Order Horn Clauses from...

Learning First-Order Horn Clauses from Web Text

Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld
Turing Center, University of Washington
Computer Science and Engineering, Box 352350, Seattle, WA 98125, USA
stef,etzioni,[email protected]

Jesse Davis
Katholieke Universiteit Leuven
Department of Computer Science, POBox 02402, Celestijnenlaan 200a, B-3001 Heverlee, Belgium
[email protected]

Abstract

Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from?

This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

1 Introduction

Today’s Web search engines locate pages that match keyword queries. Even sophisticated Web-based Q/A systems merely locate pages that contain an explicit answer to a question. These systems are helpless if the answer has to be inferred from multiple sentences, possibly on different pages. To solve this problem, Schoenmackers et al. (2008) introduced the HOLMES system, which infers answers from tuples extracted from text.

HOLMES’s distinction is that it is domain independent and that its inference time is linear in the size of its input corpus, which enables it to scale to the Web. However, HOLMES’s Achilles heel is that it requires hand-coded, first-order, Horn clauses as input.
Thus, while HOLMES’s inference run time is highly scalable, it requires substantial labor and expertise to hand-craft the appropriate set of Horn clauses for each new domain.

Is it possible to learn effective first-order Horn clauses automatically from Web text in a domain-independent and scalable manner? We refer to the set of ground facts derived from Web text as open-domain theories. Learning Horn clauses has been studied extensively in the Inductive Logic Programming (ILP) literature (Quinlan, 1990; Muggleton, 1995). However, learning Horn clauses from open-domain theories is particularly challenging for several reasons. First, the theories denote instances of an unbounded and unknown set of relations. Second, the ground facts in the theories are noisy and incomplete. Negative examples are mostly absent, and certainly we cannot make the closed-world assumption typically made by ILP systems. Finally, the names used to denote both entities and relations are rife with both synonyms and polysemes, making their referents ambiguous and resulting in a particularly noisy and ambiguous set of ground facts.
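
To make the setting concrete, the following is a minimal sketch of how a single first-order Horn clause can derive an implicit fact from extracted ground tuples. It is not the HOLMES or SHERLOCK implementation; it uses naive forward chaining, and the rule, the relation names (IsBasedIn, IsLocatedIn, IsHeadquarteredIn), the facts, and the "?"-prefix convention for variables are all assumptions chosen for illustration.

```python
# Illustrative sketch only: naive forward chaining over ground tuples.
# The facts and the rule below are hypothetical, not taken from the paper.

FACTS = {
    ("IsBasedIn", "Microsoft", "Redmond"),
    ("IsLocatedIn", "Redmond", "Washington"),
}

# Horn clause: IsHeadquarteredIn(?c, ?s) :- IsBasedIn(?c, ?y), IsLocatedIn(?y, ?s)
RULES = [
    (
        ("IsHeadquarteredIn", "?c", "?s"),
        [("IsBasedIn", "?c", "?y"), ("IsLocatedIn", "?y", "?s")],
    ),
]


def is_var(term):
    """Variables are strings starting with '?' (a convention assumed here)."""
    return term.startswith("?")


def match(atom, fact, bindings):
    """Match one body atom against one ground fact.

    Returns the extended variable bindings, or None if they conflict.
    """
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(bindings)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def apply_rule(head, body, facts):
    """Return every ground head atom derivable by one application of the rule."""
    partial = [{}]
    for atom in body:
        next_partial = []
        for bindings in partial:
            for fact in facts:
                extended = match(atom, fact, bindings)
                if extended is not None:
                    next_partial.append(extended)
        partial = next_partial
    return {
        (head[0],) + tuple(b[t] if is_var(t) else t for t in head[1:])
        for b in partial
    }


def forward_chain(rules, facts):
    """Apply all rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            derived |= apply_rule(head, body, known)
        if derived <= known:
            return known
        known |= derived


if __name__ == "__main__":
    for fact in sorted(forward_chain(RULES, FACTS)):
        print(fact)
```

Note that a fact missing from the result is simply unknown here, not false: the sketch derives only positive facts and makes no closed-world assumption, which matches the open-domain setting described above.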