SIGDIAL27 - Proceedings of SIGDIAL 2010 the 11th Annual...

Proceedings of SIGDIAL 2010: the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 147–156, The University of Tokyo, September 24-25, 2010. © 2010 Association for Computational Linguistics

Discourse indicators for content selection in summarization

Annie Louis, Aravind Joshi, Ani Nenkova
University of Pennsylvania
Philadelphia, PA 19104, USA
{lannie,joshi,nenkova}

Abstract

We present analyses aimed at eliciting which specific aspects of discourse provide the strongest indication of text importance. In the context of content selection for single-document summarization of news, we examine the benefits of both the graph structure of text provided by discourse relations and the semantic sense of these relations. We find that structure information is the most robust indicator of importance. Semantic sense only provides constraints on content selection but is not indicative of important content by itself. However, sense features complement structure information and lead to improved performance. Further, both types of discourse information prove complementary to non-discourse features. While our results establish the usefulness of discourse features, we also find that lexical overlap provides a simple and cheap alternative to discourse for computing text structure, with comparable performance for the task of content selection.

1 Introduction

Discourse relations such as cause, contrast or elaboration are considered critical for text interpretation, as they signal in what way parts of a text relate to each other to form a coherent whole. For this reason, the discourse structure of a text can be seen as an intermediate representation over which an automatic summarizer can perform computations in order to identify important spans of text to include in a summary (Ono et al., 1994; Marcu, 1998; Wolf and Gibson, 2004).
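The abstract's observation that lexical overlap can stand in for discourse when computing text structure can be illustrated with a small sketch. It assumes a very simple setup that is not taken from the paper: sentences are nodes, two sentences are linked when they share at least `min_shared` content words, and a sentence's importance is its degree in that graph. The stopword list, the threshold, the degree-based ranking, and all function names (`content_words`, `overlap_graph`, `select_by_degree`) are hypothetical illustrations, not the authors' feature set.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "was",
             "for", "on", "with", "that", "it", "as", "by", "at", "its"}


def content_words(sentence: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap_graph(sentences: list[str], min_shared: int = 1) -> dict[int, set[int]]:
    """Undirected graph: link two sentences sharing >= min_shared content words."""
    words = [content_words(s) for s in sentences]
    graph: dict[int, set[int]] = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def select_by_degree(sentences: list[str], k: int, min_shared: int = 1) -> list[int]:
    """Indices of the k best-connected sentences, returned in document order."""
    graph = overlap_graph(sentences, min_shared)
    # Rank by degree (descending); ties go to the earlier sentence.
    ranked = sorted(graph, key=lambda i: (-len(graph[i]), i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    doc = [
        "The storm hit the coast on Monday.",
        "Officials said the storm damaged hundreds of homes.",
        "Local schools will reopen next week.",
        "Repairs to homes on the coast could take months.",
    ]
    print(select_by_degree(doc, k=2))
```

Degree is only one way to read importance off such a graph; a centrality measure such as PageRank over the same edges would be a natural alternative under the same assumptions.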
In our work, we study the content selection performance of different types of discourse-based features. Discourse relations interconnect units of a text, and discourse formalisms have proposed different resulting structures for the full text, i.e., a tree (Mann and Thompson, 1988) and a graph (Wolf and Gibson, 2005). This structure is one source of information from discourse which can be used to compute the importance of text units. The semantics of the discourse relations between sentences could be another indicator of content importance. For example, text units connected by “cause” and “contrast” relationships might be more important content for summaries compared to those conveying “elaboration”. While previous work has focused on developing content selection methods based upon individual frameworks (Marcu, 1998; Wolf and Gibson, 2004; Uzda et al., 2008), little is known about which aspects of discourse are actually correlated with content selection power...
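The distinction drawn above between the structure of discourse relations and their semantic sense can be made concrete as two separate feature groups per sentence. The sketch below assumes relations are already available as (sentence index, sentence index, sense label) triples from some annotation or parser, which it does not produce. It treats the relation graph as undirected and uses degree as the structure feature and per-sense counts as the sense features. The `Relation` class, the feature names, and the choice of degree are hypothetical illustrations, not the paper's exact features; the output would typically be fed to a classifier or ranker.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical input: one discourse relation between two sentences."""
    arg1: int   # index of the first argument's sentence
    arg2: int   # index of the second argument's sentence
    sense: str  # e.g. "cause", "contrast", "elaboration"


def structure_features(n: int, relations: list[Relation]) -> list[dict[str, int]]:
    """Structure only: how many relations each sentence takes part in."""
    degree = [0] * n
    for r in relations:
        for idx in {r.arg1, r.arg2}:  # a set, so a self-relation counts once
            degree[idx] += 1
    return [{"degree": d} for d in degree]


def sense_features(n: int, relations: list[Relation]) -> list[dict[str, int]]:
    """Sense only: per-sentence counts of each relation sense it takes part in."""
    counts = [Counter() for _ in range(n)]
    for r in relations:
        for idx in {r.arg1, r.arg2}:
            counts[idx][f"sense={r.sense}"] += 1
    return [dict(c) for c in counts]


def combined_features(n: int, relations: list[Relation]) -> list[dict[str, int]]:
    """Merge both groups into one feature dict per sentence."""
    return [
        {**s, **t}
        for s, t in zip(structure_features(n, relations), sense_features(n, relations))
    ]


if __name__ == "__main__":
    rels = [
        Relation(0, 1, "cause"),
        Relation(1, 2, "elaboration"),
        Relation(1, 3, "contrast"),
    ]
    for i, feats in enumerate(combined_features(4, rels)):
        print(i, feats)
```

Keeping the two groups in separate functions makes it easy to compare structure-only, sense-only, and combined feature sets, which mirrors the kind of comparison the introduction motivates.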

This note was uploaded on 03/06/2012 for the course CIS 630 taught by Professor Cis630 during the Spring '08 term at UPenn.
