Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 683–691, Suntec, Singapore, 2–7 August 2009. © 2009 ACL and AFNLP

Automatic sense prediction for implicit discourse relations in text

Emily Pitler, Annie Louis, Ani Nenkova
Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
epitler,lannie,firstname.lastname@example.org

Abstract

We present a series of experiments on automatically identifying the sense of implicit discourse relations, i.e. relations that are not marked with a discourse connective such as but or because. We work with a corpus of implicit relations present in newspaper text and report results on a test set that is representative of the naturally occurring distribution of senses. We use several linguistically informed features, including polarity tags, Levin verb classes, length of verb phrases, modality, context, and lexical features. In addition, we revisit past approaches using lexical pairs from unannotated text as features, explain some of their shortcomings and propose modifications. Our best combination of features outperforms the baseline from data intensive approaches by 4% for comparison and 16% for contingency.

1 Introduction

Implicit discourse relations abound in text and readers easily recover the sense of such relations during semantic interpretation. But automatic sense prediction for implicit relations is an outstanding challenge in discourse processing.

Discourse relations, such as causal and contrast relations, are often marked by explicit discourse connectives (also called cue words) such as because or but. It is not uncommon, though, for a discourse relation to hold between two text spans without an explicit discourse connective, as the example below demonstrates:

(1) The 101-year-old magazine has never had to woo advertisers with quite so much fervor before.
[because] It largely rested on its hard-to-fault demographics.

In this paper we address the problem of automatic sense prediction for discourse relations in newspaper text. For our experiments, we use the Penn Discourse Treebank, the largest existing corpus of discourse annotations for both implicit and explicit relations. Our work is also informed by the long tradition of data intensive methods that rely on huge amounts of unannotated text rather than on manually tagged corpora (Marcu and Echihabi, 2002; Blair-Goldensohn et al., 2007).

In our analysis, we focus only on implicit discourse relations and clearly separate these from explicits. Explicit relations are easy to identify. The most general senses (comparison, contingency, temporal and expansion) can be disambiguated in explicit relations with 93% accuracy based solely on the discourse connective used to signal the relation (Pitler et al., 2008). So reporting results on explicit and implicit relations separately will allow for clearer tracking of progress.
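The lexical-pair features revisited in the abstract (in the style of Marcu and Echihabi) pair every word of the first text span with every word of the second, so that co-occurring word pairs can signal the relation's sense. A minimal sketch of that feature extraction, assuming simple lowercased whitespace tokenization (the function name and feature encoding are illustrative, not the authors' implementation):

```python
from itertools import product

def word_pair_features(arg1, arg2):
    """All (w1, w2) pairs with w1 drawn from the first text span
    and w2 from the second; each pair becomes one binary feature."""
    words1 = arg1.lower().split()
    words2 = arg2.lower().split()
    return {f"{w1}|{w2}" for w1, w2 in product(words1, words2)}

# Features for the two arguments of a (possibly implicit) relation.
feats = word_pair_features("the magazine never had to woo advertisers",
                           "it rested on its demographics")
```

Note that the feature space grows as the product of the two spans' vocabularies, which is one reason such models need very large amounts of unannotated text.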