I05-3028

Chinese Word Segmentation with Multiple Postprocessors in HIT-IRLab

Huipeng Zhang, Ting Liu, Jinshan Ma, Xiantao Liao
Information Retrieval Lab, Harbin Institute of Technology, Harbin, 150001 CHINA
{zhp,tliu,mjs,taozi}@ir.hit.edu.cn

Abstract

This paper presents the results of the system IRLAS¹ from HIT-IRLab in the Second International Chinese Word Segmentation Bakeoff. IRLAS consists of several basic components and multiple postprocessors. The basic components are basic segmentation, factoid recognition, and named entity recognition. Together, these components maintain a segment graph. The postprocessors are merging of adjoining words, morphologically derived word recognition, and new word identification. They modify the best word sequence selected from the segment graph. Our system participated in the open and closed tracks of the PK corpus and ranked #4 and #3 respectively. Our scores were very close to the best scores, which suggests that our system has reached the current state of the art.

¹ IRLAS is the abbreviation for "Information Retrieval Lab Lexical Analysis System".

1 Introduction

IRLAS participated in both the open and closed tracks of the PK corpus. The sections below describe in detail the components of our system and the tracks we participated in.

The structure of this paper is as follows. Section 2 presents the system description. Section 3 describes in detail the tracks we participated in. Section 4 gives some experiments and discussions. Section 5 enumerates some external factors that affect our performance. Section 6 gives our conclusion.

2 System Description

2.1 Basic Segmentation

When a line is input into the system, it is first split into sentences at periods. We split a line into sentences because, in named entity recognition, processing several shorter sentences yields a higher named entity recall than processing one long sentence. We split only at periods because this is simple to program and the resulting sentences are short enough to process.

Every sentence is then segmented into single atoms. For example, a sentence like "HIT-IRLab SIGHAN " will be segmented as "HIT-IRLab/ / / / / / /SIGHAN/ / / / / ".

After atom segmentation, a segment graph is created. The number of nodes in the graph is the number of atoms plus 1, and every atom corresponds to an arc in the graph. Then all the words in the dictionary² that appear in the sentence are added to the segment graph. The graph also stores information such as the bigram probability of every word. Figure 1 shows the segment graph of the above sentence after basic segmentation.

Figure 1: The segment graph. Note: the probability of each word is not shown in the graph.

² The dictionary is trained with the training corpus.

2.2 Factoid Recognition

After basic segmentation, a graph with all the atoms and all the dictionary words is set up. On this basis, we find all the factoids, such as numbers, times and e-mails, with a set of rules. We then add all these factoids to the segment graph as well.

2.3 Named Entity Recognition

Next we recognize named entities such as persons and locations. First, we select the N³ best paths from the segment graph with the Dijkstra algorithm. Then, for each of the N+1 paths⁴ (the N best paths and the atom path), we perform Roles Tagging with an HMM model (Zhang et al. 2003). The process is much like Part of Speech Tagging. With the best role sequence of every path, we can find all the named entities and add them to the segment graph as usual.

Take the sentence " " for example. After basic segmentation and factoid recognition, the N+1 paths are as follows:

/ / / / / / / / / / / / / / /

Then for each path, Roles Tagging is performed and the following role sequences are generated:

X/S/W/N/O/O/O/O⁵
X/S/W/N/O/O/O/O/O

From these role sequences, we can see that "XSW" is a 3-character Chinese name. So the word " " is recognized as a person name and is added to the segment graph.

³ N is a constant, which is 8 in our system.
⁴ The number of paths may be smaller than N+1 if the sentence is short enough; strictly, N+1 is the upper bound on the number of paths.
⁵ X, S, W, N and O are all roles for person name recognition: X is the surname, S is the first character of the given name, W is the second character of the given name, N is the word following a person name, and O is other remote context. We defined 17 roles for person name recognition and 10 roles for location name recognition.

2.4 Merging of Adjoining Words

After the steps above, the segment graph is complete and a best word sequence is generated with the Dijkstra algorithm. This merging operation and all the following operations are applied to the best word sequence.

There are many inconsistencies in the PK corpus. For example, in the PK training corpus, the word " " is sometimes treated as one word and sometimes as two separate words, " ". These inconsistencies lower the system's performance to some extent. To solve this problem, we first estimate from the training corpus the probability that a string is one word and the probability that it is two separate words. Then we perform merging: if two adjoining words in the best word sequence are more likely to be one word, we merge them.

2.5 Morphologically Derived Word Recognition

To deal with words that carry a postfix such as " ", " ", " " and so on, we merge the preceding word and the postfix into one word. We train a list of postfixes from the training corpus. Then we scan the best word sequence; if a single-character word appears in the postfix list, we merge the preceding word and this postfix into one word. For example, a best word sequence like " " will be converted to " " after this operation.

2.6 New Word Identification

For words that are not in the dictionary and cannot be identified with the steps above, we perform New Word Identification (NWI). We estimate from the training corpus the probability that a word is independent and the probability that it is a specific part of another word. In our system, we only consider words that have one or two characters. Then we scan the best word sequence; if the product of the probabilities of two adjoining words exceeds a threshold, we merge the two words into one word.

Take the word " " for example. It is segmented as " " after all the above steps, since this word is not in the dictionary. We find that the word " " has a probability of 0.83 of being the first character of a two-character word, and the word " " has a probability of 0.94 of being the last character of a two-character word. Their product is 0.78, which is larger than 0.65, the threshold in our system. So the word " " is recognized as a single word.

3 Tracks

3.1 Closed Track

For the PK closed track, we first extract all the common words and tokens from the training corpus and set up a dictionary of 55,335 entries. Then we extract each kind of named entity separately. With these named entities, we train the parameters for Roles Tagging. We also train all the other parameters mentioned in Section 2 with the training corpus.

3.2 Open Track

The PK open track is similar to the closed one. In the open track, we use the full six-month People's Daily corpus and set up a dictionary of 107,749 entries. Additionally, we find 101 new words from the Web and add them to the dictionary. We train the parameters of named entity recognition with a person list and a location list from our laboratory. The other parameters are trained in the same way as in the closed track.

4 Experiments and Discussions

We do several experiments on the PK test corpus to see the contribution of each postprocessor. We cut off one postprocessor at a time from the complete system and record the resulting F-score. The evaluation results are shown in Table 1. In the table, MDW stands for Morphologically Derived Word Recognition, and NWI stands for New Word Identification.

                    PK open   PK closed
Complete System     96.5%     94.9%
Without Merging     96.3%     94.7%
Without MDW         96.6%     94.4%
Without NWI         96.5%     94.9%

Table 1: Evaluation results of IRLAS with each postprocessor cut off at a time

From Table 1, we can draw some interesting conclusions:

• The Merging of Adjoining Words has a good effect on both the open and closed tracks: removing it lowers the F-score by 0.2 points in each. So we can conclude that this module solves the problem of the inconsistent training corpus to some extent.

• Morphologically Derived Word Recognition does some harm in the open track, but it has a very good effect in the closed track. This may be because in the open track we can build a comparatively larger dictionary, since we can use any resource we have. Most MDWs are therefore already in the dictionary, and the MDWs that are not in the dictionary are mostly difficult to recognize, so the module does more harm than good in many cases. In the closed track, however, we have a small dictionary and many common MDWs are not in it, so the module does much more good.

• The effect of New Word Identification is minimal in both the open and closed tracks. This may be because the preceding steps have already recognized most OOV words, so it is hard to recognize any more new words.

5 External Factors That Affect Our Performance

Differences in the definition of a word are the main factor that affects our performance. In many cases, strings such as " ", " ", " " are all considered as one word in our system but not so in t...
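As an illustration of the segment graph of Section 2.1 and the best-path selection mentioned in Section 2.4, the following Python sketch builds a graph with one node more than the number of atoms, adds one arc per atom and one arc per dictionary word, and picks the cheapest path with Dijkstra's algorithm. It is a minimal sketch under stated assumptions, not the IRLAS implementation: the function names, the toy dictionary, its probabilities and the example sentence are all hypothetical, and arc costs use unigram probabilities where the paper speaks of bigram probabilities.

```python
import heapq
import math


def build_segment_graph(atoms, word_probs, oov_prob=1e-6):
    """Return arcs, where arcs[i] lists (j, word, cost) for word = atoms[i:j].

    word_probs is a hypothetical unigram dictionary (an assumption; the
    paper stores bigram probabilities). Cost is the negative log probability.
    """
    n = len(atoms)
    arcs = [[] for _ in range(n + 1)]  # n atoms -> n + 1 nodes
    for i in range(n):
        # Every atom is an arc, so the end node is always reachable.
        atom = atoms[i]
        arcs[i].append((i + 1, atom, -math.log(word_probs.get(atom, oov_prob))))
        # Add every multi-atom dictionary word that starts at atom i.
        for j in range(i + 2, n + 1):
            word = "".join(atoms[i:j])
            if word in word_probs:
                arcs[i].append((j, word, -math.log(word_probs[word])))
    return arcs


def best_word_sequence(arcs):
    """Dijkstra shortest path from node 0 to the last node."""
    n = len(arcs) - 1
    dist = [math.inf] * (n + 1)
    back = [None] * (n + 1)  # back[j] = (previous node, word on the arc)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, word, cost in arcs[i]:
            if d + cost < dist[j]:
                dist[j] = d + cost
                back[j] = (i, word)
                heapq.heappush(heap, (dist[j], j))
    words = []
    node = n
    while node != 0:
        prev, word = back[node]
        words.append(word)
        node = prev
    return words[::-1]


# Hypothetical toy dictionary; the probabilities are made up for illustration.
toy_probs = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "起源": 0.008, "命": 0.001}
toy_atoms = list("研究生命起源")  # one atom per Chinese character
print(best_word_sequence(build_segment_graph(toy_atoms, toy_probs)))
# prints ['研究', '生命', '起源']
```

Factoids and named entities would enter the same structure as further arcs; selecting the N best paths rather than a single best path would need an extension that is not shown here.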
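The three postprocessors of Sections 2.4 to 2.6 all scan the best word sequence and merge adjoining words under different conditions, which can be sketched in Python as below. This is a simplified illustration under assumptions, not the IRLAS code: the function names and probability tables are hypothetical, the merging rule is read as "probability of being one word above 0.5", the New Word Identification rule is restricted to the case of two single-character words used in the paper's example, and the scan is greedy from left to right. Only the values 0.83, 0.94 and the threshold 0.65 come from the text.

```python
def merge_pairs(words, should_merge):
    """Greedy left-to-right scan that merges adjoining words on demand."""
    out = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and should_merge(words[i], words[i + 1]):
            out.append(words[i] + words[i + 1])
            i += 2  # skip both merged words
        else:
            out.append(words[i])
            i += 1
    return out


def merge_adjoining(words, p_one_word):
    """Section 2.4: merge a pair that is more likely one word than two.

    p_one_word[(a, b)] is a hypothetical table estimated from training data.
    """
    return merge_pairs(words, lambda a, b: p_one_word.get((a, b), 0.0) > 0.5)


def merge_postfix(words, postfixes):
    """Section 2.5: attach a single-character postfix to the preceding word."""
    return merge_pairs(words, lambda a, b: len(b) == 1 and b in postfixes)


def identify_new_words(words, p_first, p_last, threshold=0.65):
    """Section 2.6, restricted to two adjoining single-character words.

    p_first[c]: probability that c starts a two-character word.
    p_last[c]:  probability that c ends a two-character word.
    """
    def is_new_word(a, b):
        if len(a) != 1 or len(b) != 1:
            return False
        return p_first.get(a, 0.0) * p_last.get(b, 0.0) > threshold

    return merge_pairs(words, is_new_word)


# "A" and "B" are placeholders for the two characters of the paper's example.
print(identify_new_words(["A", "B"], {"A": 0.83}, {"B": 0.94}))
# prints ['AB'] because 0.83 * 0.94 = 0.7802 > 0.65
```

Running the three functions in the order of the paper (merging, then postfix attachment, then new word identification) gives one possible pipeline; the ablation in Table 1 corresponds to leaving out one of the three calls.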
