
Automatic Text Summarization using a Machine Learning Approach

Joel Larocca Neto, Alex A. Freitas, Celso A. A. Kaestner

Pontifical Catholic University of Parana (PUCPR)
Rua Imaculada Conceicao, 1155
Curitiba – PR. 80.215-901. BRAZIL
{joel, alex, kaestner}@ppgia.pucpr.br

Abstract. In this paper we address the automatic summarization task. Recent research on extractive-summary generation employs some heuristics, but few works indicate how to select the relevant features. We present a summarization procedure based on trainable Machine Learning algorithms, which employs a set of features extracted directly from the original text. These features are of two kinds: statistical, based on the frequency of some elements in the text; and linguistic, extracted from a simplified argumentative structure of the text. We also present computational results obtained by applying our summarizer to some well-known text databases, and we compare these results to some baseline summarization procedures.

1 Introduction

Automatic text processing is currently an extremely active research field. One important task in this field is automatic summarization, which consists of reducing the size of a text while preserving its information content [9], [21]. A summarizer is a system that produces a condensed representation of its input for user consumption [12].

Summary construction is, in general, a complex task which ideally would involve deep natural language processing capacities [15]. In order to simplify the problem, current research is focused on extractive-summary generation [21]. An extractive summary is simply a subset of the sentences of the original text. Such summaries do not guarantee good narrative coherence, but they can conveniently represent the approximate content of the text for relevance judgement.
A summary can be employed in an indicative way, as a pointer to some parts of the original document, or in an informative way, to cover all relevant information of the text [12]. In both cases the most important advantage of using a summary is its reduced reading time. Summary generation by an automatic procedure also has other advantages: (i) the size of the summary can be controlled; (ii) its content is deterministic; and (iii) the link between a text element in the summary and its position in the original text can be easily established.
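
The three advantages listed above can be illustrated with a minimal sketch. It assumes a simple word-frequency scoring rule chosen only for illustration; it is not the trainable procedure addressed in this paper. The names `split_sentences` and `extractive_summary` and the parameter `max_sentences` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, max_sentences=2):
    """Return (position, sentence) pairs selected from the original text."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]

    # Statistical feature: how often each word occurs in the whole text.
    freq = Counter(word for words in tokenized for word in words)

    def score(words):
        # Mean word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by descending score; ties are broken by position, so the same
    # input always yields the same summary (advantage ii).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-score(tokenized[i]), i))

    # Keep at most max_sentences (advantage i) and restore original order.
    chosen = sorted(ranked[:max_sentences])

    # Each sentence is returned with its index in the source (advantage iii).
    return [(i, sentences[i]) for i in chosen]


if __name__ == "__main__":
    sample = "Cats chase mice. Birds sing. Dogs chase cats."
    print(extractive_summary(sample, max_sentences=2))
    # → [(0, 'Cats chase mice.'), (2, 'Dogs chase cats.')]
```

Returning the sentence index together with the sentence keeps the link back to the original text explicit, and the size limit is a single parameter rather than a property of the scoring rule.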
Our work deals with a trainable automatic summarization procedure based on the application of machine learning techniques. Projects involving extractive-summary generation have shown that the success of this task depends strongly on the use of heuristics [5], [7]; unfortunately, little indication is given of how to choose the relevant features for this task. Here we employ statistical and linguistic features, extracted directly and automatically from the original text.
