fulltext - University of Pennsylvania ScholarlyCommons...


University of Pennsylvania ScholarlyCommons
Departmental Papers (CIS), Department of Computer & Information Science
2-10-2008

Sense Annotation in the Penn Discourse Treebank

Eleni Miltsakaki, University of Pennsylvania, [email protected]
Livio Robaldo, University of Turin
Alan Lee, University of Pennsylvania, [email protected]
Aravind K. Joshi, University of Pennsylvania, [email protected]

Postprint version. Published in Lecture Notes in Computer Science, Volume 4919, Computational Linguistics and Intelligent Text, 2008, pages 275-286. Publisher URL: http://dx.doi.org/10.1007/978-3-540-78135-6_23

This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_papers/388 For more information, please contact [email protected].

Sense Annotation in the Penn Discourse Treebank

Eleni Miltsakaki 1, Livio Robaldo 2, Alan Lee 1, and Aravind Joshi 1

1 Institute for Research in Cognitive Science, University of Pennsylvania {elenimi,aleewk,joshi}@linc.cis.upenn.edu
2 Department of Computer Science, University of Turin [email protected]

Abstract. An important aspect of discourse understanding and generation involves the recognition and processing of discourse relations. These are conveyed by discourse connectives, i.e., lexical items like "because" and "as a result" or implicit connectives expressing an inferred discourse relation. The Penn Discourse TreeBank (PDTB) provides annotations of the argument structure, attribution and semantics of discourse connectives. In this paper, we provide the rationale of the tagset, detailed descriptions of the senses with corpus examples, simple semantic definitions of each type of sense tag as well as informal descriptions of the inferences allowed at each level.

1 Introduction

Large scale annotated corpora have played and continue to play a critical role in natural language processing.
The continuously growing demand for more powerful and sophisticated NLP applications is evident in recent efforts to produce corpora with richer annotations [6], including annotations at the discourse level [2], [8], [4]. The Penn Discourse Treebank is, to date, the largest annotation effort at the discourse level, providing annotations of explicit and implicit connectives. The design of this annotation effort is based on the view that discourse connectives are predicates taking clausal arguments. In Spring 2006, the first version of the Penn Discourse Treebank was released, making available thousands of annotations of discourse connectives and the textual spans that they relate.

Discourse connectives, however, like verbs, can have more than one meaning. Being able to correctly identify the intended sense of connectives is crucial for every natural language task which relies on understanding relationships between events or situations in the discourse. The accuracy of information retrieval from text can be significantly impaired if, for example, a temporal relation anchored on the connective...

