TextTiling: A Quantitative Approach to Discourse Segmentation

Marti A. Hearst
Computer Science Division, 571 Evans Hall
University of California, Berkeley
Berkeley, CA 94720
[email protected]

Abstract

This paper presents TextTiling, a method for partitioning full-length text documents into coherent multi-paragraph units. The layout of text tiles is meant to reflect the pattern of subtopics contained in an expository text. The approach uses lexical analyses based on tf.idf, an information retrieval measurement, to determine the extent of the tiles, incorporating thesaural information via a statistical disambiguation algorithm. The tiles have been found to correspond well to human judgements of the major subtopic boundaries of science magazine articles.

1 Introduction

Expository texts such as science magazine articles and environmental impact reports can be viewed as being composed of a few main topics and a series of short, sometimes densely discussed, subtopics. For example, consider a 23-paragraph article from Discover magazine whose main topic is the exploration of Venus by the Magellan space probe. A reader divided this text into the following segments, with the labels shown, where the numbers indicate paragraph numbers:

1-2 Intro to Magellan space probe
3-4 Intro to Venus
5-7 Lack of craters
8-11 Evidence of volcanic action
12-15 River Styx
16-18 Crustal spreading
19-21 Recent volcanism
22-23 Future of Magellan

The capability to automate the recognition of this kind of structure in a full-text document should be useful for improving a variety of computational tasks, e.g., hypertext, text summarization and information retrieval. Toward this end, this paper describes TextTiling, a computational approach to segmenting written expository text into contiguous, non-overlapping discourse units that correspond to the pattern of subtopics in a text.¹

(Skorochod'ko 1972) has suggested discovering a text's structure by dividing it up into sentences and seeing how much word overlap appears among the sentences.
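Skorochod'ko's idea of splitting a text into sentences and measuring their word overlap can be illustrated with a small Python sketch. It illustrates the lexical-overlap idea only and is not Hearst's TextTiling algorithm, which, per the abstract, relies on tf.idf-based lexical analyses and thesaural information. The naive regex sentence splitter, the tiny stopword list, and the names `split_sentences`, `content_words`, and `overlap_graph` are hypothetical choices made for this example. Each pair of sentences sharing at least one content word becomes an edge, so the result can be read as a sentence-connectivity graph.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real system would use a much fuller list.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "by", "its", "it", "on"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def content_words(sentence: str) -> set[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def overlap_graph(text: str) -> dict[tuple[int, int], set[str]]:
    """Map each sentence pair (i, j), i < j, to the content words they share.

    Only pairs with at least one shared word are kept, so the keys are the
    edges of a sentence-connectivity graph.
    """
    words = [content_words(s) for s in split_sentences(text)]
    edges: dict[tuple[int, int], set[str]] = {}
    for i, j in combinations(range(len(words)), 2):
        shared = words[i] & words[j]  # set intersection = word overlap
        if shared:
            edges[(i, j)] = shared
    return edges
```

For instance, on the toy text "The probe mapped Venus. Venus has few craters. Volcanoes reshaped the surface. The surface shows lava flows." the function returns `{(0, 1): {'venus'}, (2, 3): {'surface'}}`: two disconnected pairs of sentences, which under this reading would hint at two separate subtopics.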
The overlap forms a kind of intra-structure; fully connected graphs might indicate dense discussions of a topic, while long spindly chains of connectivity might indicate a sequential account. The crucial idea is that of defining the structure of a text as a function of the connectivity patterns of the terms that comprise it. This is in contrast with segmenting guided primarily by fine-grained discourse cues such as register change, focus shift, and cue words. From a computational viewpoint, deducing textual topic structure from lexical connectivity alone is appealing, both because it is easy to compute, and also because discourse cues are sometimes misleading with respect to the topic structure (Brown & Yule 1983) (ch. 3).

Following Skorochod'ko, TextTiling attempts to discover coherent, interrelated subdiscussions by analyz-

¹ The use of 'topic' here is meant to signify pieces of text 'about' something, as opposed to the topic/comment distinction found within individual sentences. The intended sense is that described by (Brown & Yule 1983) (p. 69): "In order to divide up a lengthy recording of conversational data into chunks which can be investigated in detail, the analyst is often forced to depend on intuitive notions about where one part of a conversation ends and another begins The notion of 'topic' is clearly an intuitively satisfactory