Proceedings of the SIGDIAL 2011: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 194–203, Portland, Oregon, June 17-18, 2011. © 2011 Association for Computational Linguistics

Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation

Thomas Meyer and Andrei Popescu-Belis
Idiap Research Institute
Rue Marconi 19, 1920 Martigny, Switzerland
Thomas.Meyer@idiap.ch, Andrei.Popescu-Belis@idiap.ch

Sandrine Zufferey and Bruno Cartoni
Department of Linguistics, University of Geneva
Rue de Candolle 2, 1211 Geneva 4, Switzerland
Sandrine.Zufferey@unige.ch, Bruno.Cartoni@unige.ch

Abstract

Many discourse connectives can signal several types of relations between sentences. Their automatic disambiguation, i.e. the labeling of the correct sense of each occurrence, is important for discourse parsing, but could also be helpful to machine translation. We describe new approaches for improving the accuracy of manual annotation of three discourse connectives (two English, one French) by using parallel corpora. An appropriate set of labels for each connective can be found using information from their translations. Our results for automatic disambiguation are state-of-the-art, at up to 85% accuracy using surface features. Using feature analysis, contextual features are shown to be useful across languages and connectives.

1 Introduction

Discourse connectives are generally considered as indicators of discourse structure, relating two sentences of a written or spoken text, and making explicit the rhetorical or coherence relation between them. Leaving aside the cases when connectives are only implicit, the presence of a connective does not unambiguously signal a specific discourse relation. In fact, many connectives can indicate several types of relations between sentences, i.e. they have several possible senses in context.
This paper studies the manual and automated disambiguation of three ambiguous connectives in two languages: alors que in French, since and while in English. We will show how the multilingual perspective helps to improve the accuracy of annotation, and how it helps to find appropriate labels for automated processing and MT. Results from automatic annotation experiments, which are close to the state of the art, as well as feature analysis, help to assess the usefulness of the proposed labels.

The paper is organized as follows. Section 2 explains the motivation of our experiments, and offers a wider perspective on our research goals, illustrating them with examples of translation problems which arise from ambiguous discourse connectives. Current resources and methods for discourse annotation are discussed in Section 3. Section 4 analyzes our experiments in manual annotation and in particular the influence of the set of labels on the reliability of annotation. The automatic disambiguation experiments, the features used, the results and the analysis...