C08-1124 - Extractive Summarization Using Supervised and...

Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 985–992
Manchester, August 2008

Extractive Summarization Using Supervised and Semi-supervised Learning

Kam-Fai Wong*, Mingli Wu*
*Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
New Territories, Hong Kong
{kfwong,[email protected]

Wenjie Li
Department of Computing
The Hong Kong Polytechnic University
Kowloon, Hong Kong
[email protected]

© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.

Abstract

It is difficult to identify sentence importance from a single point of view. In this paper, we propose a learning-based approach to combine various sentence features. They are categorized as surface, content, relevance and event features. Surface features are related to extrinsic aspects of a sentence. Content features measure a sentence based on its content-conveying words. Event features represent sentences by the events they contain. Relevance features evaluate a sentence by its relatedness to other sentences. Experiments show that the combined features improve summarization performance significantly. Although the evaluation results are encouraging, the supervised learning approach requires a large amount of labeled data. We therefore investigate co-training, which combines labeled and unlabeled data. Experiments show that this semi-supervised learning approach achieves performance comparable to its supervised counterpart and saves about half of the labeling time cost.

1 Introduction

Automatic text summarization involves condensing a document or a document set to produce a human-comprehensible summary. Two kinds of summarization approaches have been suggested in the past: extractive (Radev et al., 2004; Li et al., 2006) and abstractive summarization (Dejong, 1978). The abstractive approaches typically need to "understand" and then paraphrase the salient concepts across documents. Due to the limitations of natural language processing technology, abstractive approaches are restricted to specific domains. In contrast, extractive approaches commonly select sentences that contain the most significant concepts in the documents. These approaches tend to be more practical.

Recently, various effective sentence features have been proposed for extractive summarization, such as signature words, events and sentence relevance. Although encouraging results have been reported, most of these features have been investigated individually. We argue that it is ineffective to identify sentence importance from a single point of view. Each sentence feature makes its own unique contribution, and combining them would be advantageous. We therefore investigate combined sentence features for extractive summarization. To determine the weights of the different features, we employ a supervised learning framework to estimate how likely a sentence is to be important. Some re-

