nlpspo-aaai05 - Impact of Linguistic Analysis on the...

Impact of Linguistic Analysis on the Semantic Graph Coverage and Learning of Document Extracts

Jurij Leskovec (1,3), Natasa Milic-Frayling (2), Marko Grobelnik (3)

1 Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
2 Microsoft Research Ltd, Roger Needham Bldg., 7 J J Thomson Avenue, Cambridge CB3 0FB, United Kingdom
3 Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia

Abstract

Automatic document summarization is the problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate a machine learning method for extracting full sentences from documents based on the structure of the document semantic graph. In particular, we explore how the Support Vector Machines (SVM) learning method is affected by the quality of the linguistic analysis and the corresponding semantic graph representations. We apply two types of linguistic analysis: (1) simple part-of-speech tagging of noun phrases and verbs, and (2) full logical form analysis, which identifies Subject-Predicate-Object triples; we then build semantic graphs from the results. We train the SVM classifier to identify summary nodes and use these nodes to extract sentences. Experiments with the DUC 2002 and CAST datasets show that SVM-based sentence extraction does not differ significantly between the simple and the sophisticated syntactic analysis. In both cases, the graph attributes used in learning are essential for classifier performance and for the quality of the extracted summaries.

Introduction

Document summarization refers to the task of creating document surrogates that are smaller in size but retain various characteristics of the original document, depending on the intended use. The ultimate objective of summarization systems is to enable automatic abstracting of the document text, with all the properties that humans bring to that process.
However, that task stretches beyond text analysis to domain knowledge, inference, and language generation. Most research has therefore concentrated on methods for text processing and for extracting textual segments that approximate human abstracts. Recently, document summarization research has received a significant boost from the Document Understanding Conference (DUC 2002), which provides an experimentation framework and a forum for exchanging ideas. Work by Vanderwende et al. (2004) and Leskovec et al. (2005) demonstrates the use of rich document semantic structure for document summarization. Both represent the document text as a semantic graph that

Copyright © 2005, American Association for Artificial Intelligence. All rights reserved.
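The abstract outlines a pipeline: linguistic analysis yields Subject-Predicate-Object triples, the triples are assembled into a semantic graph, a classifier labels summary nodes from graph attributes, and the labelled nodes select sentences. The Python sketch below illustrates that flow under several assumptions that do not come from the paper: subjects and objects become nodes while the predicate only labels the edge between them, the node attributes are just degree, sentence frequency and earliest sentence index, and `score_node` is a hypothetical stand-in for a trained SVM's decision function (here hand-set linear weights, not learned ones). All function and variable names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical input format: sentence index -> SPO triples found in that sentence.
Triple = Tuple[str, str, str]
Document = Dict[int, List[Triple]]


def build_semantic_graph(doc: Document) -> Tuple[Dict[str, Set[str]], Dict[str, Set[int]]]:
    """Build an undirected graph over subjects and objects.

    Assumption: the predicate only labels the subject-object edge, so it is
    not stored as a separate node in this sketch.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    node_sentences: Dict[str, Set[int]] = defaultdict(set)
    for sent_id, triples in doc.items():
        for subj, _pred, obj in triples:
            adjacency[subj].add(obj)
            adjacency[obj].add(subj)
            node_sentences[subj].add(sent_id)
            node_sentences[obj].add(sent_id)
    return adjacency, node_sentences


def node_features(
    adjacency: Dict[str, Set[str]], node_sentences: Dict[str, Set[int]]
) -> Dict[str, List[float]]:
    """Illustrative graph attributes: degree, sentence frequency, earliest sentence."""
    return {
        node: [
            float(len(nbrs)),
            float(len(node_sentences[node])),
            float(min(node_sentences[node])),
        ]
        for node, nbrs in adjacency.items()
    }


def extract_sentences(
    doc: Document, score_node: Callable[[List[float]], float], k: int
) -> List[int]:
    """Return up to k sentence indices, in document order."""
    adjacency, node_sentences = build_semantic_graph(doc)
    feats = node_features(adjacency, node_sentences)
    # A positive score marks a "summary node" (the sign rule of a linear SVM).
    summary_nodes = {n for n, f in feats.items() if score_node(f) > 0}
    # Assumption: rank sentences by how many summary nodes they mention,
    # breaking ties by sentence position.
    counts = {sid: 0 for sid in doc}
    for node in summary_nodes:
        for sid in node_sentences[node]:
            counts[sid] += 1
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return sorted(s for s in ranked[:k] if counts[s] > 0)


example_doc: Document = {
    0: [("SVM", "classify", "node"), ("node", "belong", "graph")],
    1: [("author", "use", "SVM")],
    2: [("weather", "be", "sunny")],
}

# Hypothetical hand-set weights standing in for a trained SVM decision
# function w.x + b; they are NOT learned from data.
WEIGHTS, BIAS = [1.0, 0.5, -0.2], -1.5


def example_score(features: List[float]) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS


print(extract_sentences(example_doc, example_score, k=2))  # prints [0, 1]
```

Replacing `example_score` with a trained model's decision function over the same feature vectors would bring the sketch closer to the SVM setting the abstract describes; ranking sentences by summary-node count is only one simple reading of "use these nodes to extract sentences".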

This note was uploaded on 03/06/2012 for the course CIS 630 taught by Professor Cis630 during the Spring '08 term at UPenn.
