P03-1011

Loosely Tree-Based Alignment for Machine Translation

Daniel Gildea
University of Pennsylvania
dgildea@cis.upenn.edu

Abstract

We augment a model of translation based on re-ordering nodes in syntactic trees in order to allow alignments not conforming to the original tree structure, while keeping computational complexity polynomial in the sentence length. This is done by adding a new subtree cloning operation to either tree-to-string or tree-to-tree alignment algorithms.

1 Introduction

Systems for automatic translation between languages have been divided into transfer-based approaches, which rely on interpreting the source string into an abstract semantic representation from which text is generated in the target language, and statistical approaches, pioneered by Brown et al. (1990), which estimate parameters for a model of word-to-word correspondences and word re-orderings directly from large corpora of parallel bilingual text. Only recently have hybrid approaches begun to emerge, which apply probabilistic models to a structured representation of the source text. Wu (1997) showed that restricting word-level alignments between sentence pairs to observe syntactic bracketing constraints significantly reduces the complexity of the alignment problem and allows a polynomial-time solution. Alshawi et al. (2000) also induce parallel tree structures from unbracketed parallel text, modeling the generation of each node's children with a finite-state transducer. Yamada and Knight (2001) present an algorithm for estimating probabilistic parameters for a similar model which represents translation as a sequence of re-ordering operations over children of nodes in a syntactic tree, using automatic parser output for the initial tree structures. The use of explicit syntactic information for the target language in this model has led to excellent translation results (Yamada and Knight, 2002), and raises the prospect of training a statistical system using syntactic information for both sides of the parallel corpus.

Tree-to-tree alignment techniques such as probabilistic tree substitution grammars (Hajič et al., 2002) can be trained on parse trees from parallel treebanks. However, real bitexts generally do not exhibit parse-tree isomorphism, whether because of systematic differences between how languages express a concept syntactically (Dorr, 1994), or simply because of relatively free translations in the training material.

In this paper, we introduce loosely tree-based alignment techniques to address this problem. We present analogous extensions for both tree-to-string and tree-to-tree models that allow alignments not obeying the constraints of the original syntactic tree (or tree pair), although such alignments are dispreferred because they incur a cost in probability. This is achieved by introducing a clone operation, which copies an entire subtree of the source language syntactic structure, moving it anywhere in the target language sentence. Careful parameterization of the probability model allows it to be estimated at no additional cost in computational complexity. We expect our relatively unconstrained clone operation to allow for various types of structural divergence by providing a sort of hybrid between tree-based and unstructured, IBM-style models.

We first present the tree-to-string model, followed by the tree-to-tree model, before moving on to alignment results for a parallel syntactically annotated Korean-English corpus, measured in terms of alignment perplexities on held-out test data, and agreement with human-annotated word-level alignments.

2 The Tree-to-String Model

We begin by summarizing the model of Yamada and Knight (2001), which can be thought of as representing translation as an Alexander Calder mobile. If we follow the process of an English sentence's transformation into French, the English sentence is first given a syntactic tree representation by a statistical parser (Collins, 1999). As the first step in the translation process, the children of each node in the tree can be re-ordered. For any node with $m$ children, $m!$ re-orderings are possible, each of which is assigned a probability $P_{order}$ conditioned on the syntactic categories of the parent node and its children. As the second step, French words can be inserted at each node of the parse tree. Insertions are modeled in two steps, the first predicting whether an insertion to the left, an insertion to the right, or no insertion takes place with probability $P_{ins}$, conditioned on the syntactic category of the node and that of its parent. The second step is the choice of the inserted word $P_t(f \mid \mathrm{NULL})$, which is predicted without any conditioning information. The final step, a French translation of each original English word, at the leaves of the tree, is chosen according to a distribution $P_t(f \mid e)$. The French word is predicted conditioned only on the English word, and each English word can generate at most one French word, or can generate a NULL symbol, representing deletion. Given the original tree, the re-ordering, insertion, and translation probabilities at each node are independent of the choices at any other node. These independence relations are analogous to those of a stochastic context-free grammar, and allow for efficient parameter estimation by an inside-outside Expectation Maximization (EM) algorithm. The computation of inside probabilities $\beta$, outlined below, considers possible reordering of nodes in the original tree in a bottom-up manner:

for all nodes $\varepsilon_i$ in input tree $T$ do
  for all $k, l$ such that $1 < k < l < N$ do
    for all orderings $\rho$ of the children $\varepsilon_1 \ldots \varepsilon_m$ of $\varepsilon_i$ do
      for all partitions of span $k, l$ into $k_1, l_1 \ldots k_m, l_m$ do
        $\beta(\varepsilon_i, k, l) \mathrel{+}= P_{order}(\rho \mid \varepsilon_i) \prod_{j=1}^{m} \beta(\varepsilon_j, k_j, l_j)$
      end for
    end for
  end for
end for

This algorithm has computational complexity $O(|T| N^{m+2})$, where $m$ is the maximum number of children of any node in the input tree $T$, and $N$ the length of the input string. By storing partially completed arcs in the chart and interleaving the inner two loops, complexity of $O(|T| N^3 m! 2^m)$ can be achieved. Thus, while the algorithm is exponential in $m$, the fan-out of the grammar, it is polynomial in the size of the input string. Assuming $|T| = O(N)$, the algorithm is $O(N^4)$.

The model's efficiency, however, comes at a cost. Not only are many independence assumptions made, but many alignments between source and target sentences simply cannot be represented. As a minimal example, take the tree:

        A
       / \
      B   Z
     / \
    X   Y

Of the six possible re-orderings of the three terminals, the two which would involve crossing the bracketing of the original tree (XZY and YZX) are not allowed. While this constraint gives us a way of using syntactic information in translation, it may in many cases be too rigid. In part to deal with this problem, Yamada and Knight (2001) flatten the trees in a pre-processing step by collapsing nodes with the same lexical head-word. This allows, for example, an English subject-verb-object (SVO) structure, which is analyzed as having a VP node spanning the verb and object, to be re-ordered as VSO in a language such as Arabic. Larger syntactic divergences between the two trees may require further relaxation of this constraint, and in practice we expect such divergences to be frequent.
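
The bracketing constraint on the three-leaf example tree can be checked by brute force. The following is a minimal sketch, not part of the original model code: the tuple encoding of the tree and the helper name `yields` are assumptions made for illustration. It enumerates every leaf order reachable by permuting children at each node and reports the orders that remain unreachable.

```python
from itertools import permutations, product

# Toy tree from the text: A -> (B, Z), B -> (X, Y).
# Internal nodes are (label, children) tuples; leaves are plain strings.
TREE = ("A", [("B", ["X", "Y"]), "Z"])

def yields(node):
    """Return every leaf sequence reachable by re-ordering children at each node."""
    if isinstance(node, str):  # a leaf can only yield itself
        return {(node,)}
    _label, children = node
    child_yields = [yields(c) for c in children]
    results = set()
    for order in permutations(range(len(children))):
        # Pick one yield per child, then concatenate them in the chosen order.
        for choice in product(*(child_yields[i] for i in order)):
            results.add(tuple(leaf for part in choice for leaf in part))
    return results

reachable = {"".join(y) for y in yields(TREE)}
all_orders = {"".join(p) for p in permutations("XYZ")}
blocked = sorted(all_orders - reachable)

print(sorted(reachable))  # the four orders that respect the bracketing
print(blocked)            # the two orders that cross it
```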
For example, a nominal modifier in one language may show up as an adverbial in the other, or, due to choices such as which information is represented by a main verb, the syntactic correspondence between the two sentences may break down completely.

Figure 1: Original Korean parse tree, above, and transformed tree after reordering of children, subtree cloning (indicated by the arrow), and word translation. After the insertion operation (not shown), the tree's English yield is: How many pairs of gloves is each of you issued in winter?

2.1 Tree-to-String Clone Operation

In order to provide some flexibility, we modify the model to allow for a copy of a (translated) subtree from the English sentence to occur, with some cost, at any point in the resulting French sentence. For example, in the case of the input tree

        A
       / \
      B   Z
     / \
    X   Y

a clone operation making a copy of node 3 (the leaf Z) as a new child of B would produce the tree:

        A
       / \
      B   Z
     /|\
    X Z Y

This operation, combined with the deletion of the original node Z, produces the alignment (XZY) that was disallowed by the original tree reordering model. Figure 1 shows an example from our Korean-English corpus where the clone operation allows the model to handle a case of wh-movement in the English sentence that could not be realized by any reordering of subtrees of the Korean parse.

The probability of adding a clone of original node $\varepsilon_i$ as a child of node $\varepsilon_j$ is calculated in two steps: first, the choice of whether to insert a clone under $\varepsilon_j$, with probability $P_{ins}(\mathrm{clone} \mid \varepsilon_j)$, and second, the choice of which original node to copy, with probability

$$P_{clone}(\varepsilon_i \mid \mathrm{clone} = 1) = \frac{P_{makeclone}(\varepsilon_i)}{\sum_k P_{makeclone}(\varepsilon_k)}$$

where $P_{makeclone}$ is the probability of an original node producing a copy. In our implementation, for simplicity, $P_{ins}(\mathrm{clone})$ is a single number, estimated by the EM algorithm but not conditioned on the parent node $\varepsilon_j$, and $P_{makeclone}$ is a constant, meaning that the node to be copied is chosen from all the nodes in the original tree with uniform probability.

It is important to note that $P_{makeclone}$ is not dependent on whether a clone of the node in question has already been made, and thus a node may be reused any number of times. This independence assumption is crucial to the computational tractability of the algorithm, as the model can be estimated using the dynamic programming method above, keeping counts for the expected number of times each node has been cloned, at no increase in computational complexity. Without such an assumption, the parameter estimation becomes a problem of parsing with crossing dependencies, which is exponential in the length of the input string (Barton, 1985).

3 The Tree-to-Tree Model

The tree-to-tree alignment model has tree transformation operations similar to those of the tree-to-string model described above. However, the transformed tree must not only match the surface string of the target language, but also the tree structure assigned to the string by the treebank annotators. In order to provide enough flexibility to make this possible, additional tree transformation operations allow a single node in the source tree to produce two nodes in the target tree, or two nodes in the source tree to be grouped together and produce a single node in the target tree. The model can be thought of as a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.

The probability $P(T_b \mid T_a)$ of transforming the source tree $T_a$ into target tree $T_b$ is modeled in a sequence of steps proceeding from the root of the target tree down. At each level of the tree:

1. At most one of the current node's children is grouped with the current node in a single elementary tree, with probability $P_{elem}(t_a \mid \varepsilon_a \Rightarrow \mathrm{children}(\varepsilon_a))$, conditioned on the current node $\varepsilon_a$ and its children (i.e. the CFG production expanding $\varepsilon_a$).

2. An alignment of the children of the current elementary tree is chosen, with probability $P_{align}(\alpha \mid \varepsilon_a \Rightarrow \mathrm{children}(t_a))$. This alignment operation is similar to the re-order operation in the tree-to-string model, with the extension that 1) the alignment $\alpha$ can include insertions and deletions of individual children, as nodes in either the source or target may not correspond to anything on the other side, and 2) in the case where two nodes have been grouped into $t_a$, their children are re-ordered together in one step.

In the final step of the process, as in the tree-to-string model, lexical items at the leaves of the tree are translated into the target language according to a distribution $P_t(f \mid e)$.

Allowing non-1-to-1 correspondences between nodes in the two trees is necessary to handle the fact that the depth of corresponding words in the two trees often differs. A further consequence of allowing elementary trees of size one or two is that some reorderings not allowed when reordering the children of each individual node separately are now possible. For example, with our simple tree

        A
       / \
      B   Z
     / \
    X   Y

if nodes A and B are considered as one elementary tree, with probability $P_{elem}(t_a \mid A \Rightarrow BZ)$, their collective children will be reordered with probability $P_{align}(\{(1,1)(2,3)(3,2)\} \mid A \Rightarrow XYZ)$

        A
       /|\
      X Z Y

giving the desired word ordering XZY. However, computational complexity as well as data sparsity prevent us from considering arbitrarily large elementary trees, and the number of nodes considered at once still limits the possible alignments. For example, with our maximum of two nodes, no transformation of the tree

          A
         / \
        B   C
       / \ / \
      W  X Y  Z

is capable of generating the alignment WYXZ.

In order to generate the complete target tree, one more step is necessary to choose the structure on the target side, specifically whether the elementary tree has one or two nodes, what labels the nodes have, and, if there are two nodes, whether each child attaches to the first or the second. Because we are ultimately interested in predicting the correct target string, regardless of its structure, we do not assign probabilities to these steps. The nonterminals on the target side are ignored entirely, and while the alignment algorithm considers possible pairs of nodes as elementary trees on the target side during training, the generative probability model should be thought of as only generating single nodes on the target side. Thus, the alignment algorithm is constrained by the bracketing on the target side, but does not generate the entire target tree structure.

While the probability model for tree transformation operates from the top of the tree down, probability estimation for aligning two trees takes place by iterating through pairs of nodes from each tree in bottom-up order, as sketched below:

for all nodes $\varepsilon_a$ in source tree $T_a$ in bottom-up order do
  for all elementary trees $t_a$ rooted in $\varepsilon_a$ do
    for all nodes $\varepsilon_b$ in target tree $T_b$ in bottom-up order do
      for all elementary trees $t_b$ rooted in $\varepsilon_b$ do
        for all alignments $\alpha$ of the children of $t_a$ and $t_b$ do
          $\beta(\varepsilon_a, \varepsilon_b) \mathrel{+}= P_{elem}(t_a \mid \varepsilon_a) \, P_{align}(\alpha \mid \varepsilon_i) \prod_{(i,j) \in \alpha} \beta(\varepsilon_i, \varepsilon_j)$
        end for
      end for
    end for
  end for
end for

The outer two loops, iterating over nodes in each tree, require $O(|T|^2)$. Because we restrict our elementary trees to include at most one child of the root node on either side, choosing elementary trees for a node pair is $O(m^2)$, where $m$ refers to the maximum number of children of a node. Computing the alignment between the $2m$ children of the elementary tree on either side requires choosing which subset of source nodes to delete, $O(2^{2m})$, which subset of target nodes to insert (or clone), $O(2^{2m})$, and how to reorder the remaining nodes from source to target tree, $O((2m)!)$. Thus overall complexity of the algorithm is $O(|T|^2 m^2 4^{2m} (2m)!)$, quadratic in the size of the input sentences, but exponential in the fan-out of the grammar.

Table 1: Model parameterization

Operation                  Tree-to-String                                                                Tree-to-Tree
elementary tree grouping   -                                                                             $P_{elem}(t_a \mid \varepsilon_a \Rightarrow \mathrm{children}(\varepsilon_a))$
re-order                   $P_{order}(\rho \mid \varepsilon \Rightarrow \mathrm{children}(\varepsilon))$   $P_{align}(\alpha \mid \varepsilon_a \Rightarrow \mathrm{children}(t_a))$
insertion                  $P_{ins}(\mathrm{left}, \mathrm{right}, \mathrm{none} \mid \varepsilon)$        $\alpha$ can include "insertion" symbol
lexical translation        $P_t(f \mid e)$                                                               $P_t(f \mid e)$
with cloning               $P_{ins}(\mathrm{clone} \mid \varepsilon)$, $P_{makeclone}(\varepsilon)$        $\alpha$ can include "clone" symbol, $P_{makeclone}(\varepsilon)$

3.1 Tree-to-Tree Clone Operation

Allowing m-to-n matching of up to two nodes on either side of the parallel treebank allows for limited non-isomorphism between the trees, as in Hajič et al. (2002). However, even given this flexibility, requiring alignments to match two input trees rather than one often makes tree-to-tree alignment more constrained than tree-to-string alignment. For example, even alignments with no change in word order may not be possible if the structures of the two trees are radically mismatched. This leads us to think it may be helpful to allow departures from the constraints of the parallel bracketing, if it can be done without dramatically increasing computational complexity.

For this reason, we introduce a clone operation, which allows a copy of a node from the source tree to be made anywhere in the target tree. After the clone operation takes place, the transformation of source into target tree takes place using the tree decomposition and subtree alignment operations as before.
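
The effect of a clone operation, in either model, can be illustrated on the small example tree of Section 2.1. The following is a minimal sketch under stated assumptions: the dictionary node representation, the helper names `node`, `surface`, `clone_under` and `clone_prob`, and the value 0.1 used for $P_{ins}(\mathrm{clone})$ are all hypothetical and chosen only for illustration. It clones Z under B, deletes the original Z, and computes the clone probability under the uniform $P_{makeclone}$ described in the text.

```python
from copy import deepcopy

def node(label, *children):
    """Hypothetical node record: a label, ordered children, and a deletion flag."""
    return {"label": label, "children": list(children), "deleted": False}

def surface(n):
    """Left-to-right leaf labels, skipping any deleted subtree."""
    if n["deleted"]:
        return []
    if not n["children"]:
        return [n["label"]]
    return [leaf for c in n["children"] for leaf in surface(c)]

def clone_under(parent, original, position):
    """Insert a deep copy of `original` as a new child of `parent` at `position`."""
    parent["children"].insert(position, deepcopy(original))

def clone_prob(p_ins_clone, n_nodes):
    """P_ins(clone) times a uniform choice among the n_nodes original nodes."""
    return p_ins_clone * (1.0 / n_nodes)

# Build A -> (B, Z), B -> (X, Y).
x, y, z = node("X"), node("Y"), node("Z")
b = node("B", x, y)
a = node("A", b, z)

clone_under(b, z, 1)   # copy Z between X and Y
z["deleted"] = True    # the original Z is then deleted

order = "".join(surface(a))
prob = clone_prob(0.1, 5)  # 0.1 is a made-up value; the tree has 5 nodes
print(order)
print(prob)
```

Because the copy is chosen independently of any earlier copies, the same node could be cloned again elsewhere without changing this probability, which is the independence property the dynamic program relies on.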
The basic algorithm of the previous section remains unchanged, with the exception that the alignments $\alpha$ between children of two elementary trees can now include cloned, as well as inserted, nodes on the target side. Given that $\alpha$ specifies a new cloned node as a child of $\varepsilon_j$, the choice of which node to clone is made as in the tree-to-string model:

$$P_{clone}(\varepsilon_i \mid \mathrm{clone} \in \alpha) = \frac{P_{makeclone}(\varepsilon_i)}{\sum_k P_{makeclone}(\varepsilon_k)}$$

Because a node from the source tree is cloned with equal probability regardless of whether it has already been used or not, the probability of a clone operation can be computed under the same dynamic programming assumptions as the basic tree-to-tree model. As with the tree-to-string cloning operation, this independence assumption is essential to keep the complexity polynomial in the size of the input sentences.

For reference, the parameterization of all four models is summarized in Table 1.

4 Data

For our experiments, we used a parallel Korean-English corpus from the military domain (Han et al., 2001). Syntactic trees have been annotated by hand for both the Korean and English sentences; in this paper we will be using only the Korean trees, modeling their transformation into the English text. The corpus contains 5083 sentences, of which we used 4982 as training data, holding out 101 sentences for evaluation. The average Korean sentence length was 13 words. Korean is an agglutinative language, and words often contain sequences of meaning-bearing suffixes. For the purposes of our model, we represented the syntax trees using a fairly aggressive tokenization, breaking multi-morphemic words into separate leaves of the tree. This gave an average of 21 tokens for the Korean sentences. The average English sentence length was 16. The maximum number of children of a node in the Korean trees was 23 (this corresponds to a comma-separated list of items). 77% of the Korean trees had no more than four children at any node, 92% had no more than five children, and 96% no more than six children. The vocabulary size (number of unique types) was 4700 words in English, and 3279 in Korean; before splitting multi-morphemic words, the Korean vocabulary size was 10059. For reasons of computation speed, trees with more than 5 children were excluded from the experiments described below.

5 Experiments

We evaluate our translation models in terms of agreement with human-annotated word-level alignments between the sentence pairs. For scoring the Viterbi alignments of each system against gold-standard annotated alignments, we use the alignment error rate (AER) of Och and Ney (2000), which measures agreement at the level of pairs of words [1]:

$$\mathrm{AER} = 1 - \frac{2\,|A \cap G|}{|A| + |G|}$$

where $A$ is the set of word pairs aligned by the automatic system, and $G$ the set aligned in the gold standard.

[1] While Och and Ney (2000) differentiate between sure and possible hand-annotated alignments, our gold standard alignments come in only one variety.

We provide a comparison of the tree-based models with the sequence of successively more complex models of Brown et al. (1993). Results are shown in Table 2.

Table 2: Alignment error rate on Korean-English corpus

Model                                    Alignment Error Rate
IBM Model 1                              .37
IBM Model 2                              .35
IBM Model 3                              .43
Tree-to-String                           .42
Tree-to-String, Clone                    .36
Tree-to-String, Clone, P_ins = .5        .32
Tree-to-Tree                             .49
Tree-to-Tree, Clone                      .36

The error rates shown in Table 2 represent the minimum over training iterations; training was stopped for each model when error began to increase. IBM Models 1, 2, and 3 refer to Brown et al. (1993). Tree-to-String is the model of Yamada and Knight (2001), and Tree-to-String, Clone allows the node cloning operation of Section 2.1. Tree-to-Tree indicates the model of Section 3, while Tree-to-Tree, Clone adds the node cloning operation of Section 3.1. Model 2 is initialized from the parameters of Model 1, and Model 3 is initialized from Model 2.
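
The alignment error rate used in this section is simple to compute once alignments are represented as sets of word-index pairs. The following is a minimal sketch of that single-variety AER (no sure/possible distinction); the function name, the pair encoding, and the convention of returning 0.0 when both sets are empty are assumptions made for illustration, and the example alignments are invented.

```python
def alignment_error_rate(predicted, gold):
    """AER = 1 - 2|A & G| / (|A| + |G|) over sets of (source_index, target_index) pairs."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 0.0  # assumed convention: two empty alignments agree perfectly
    overlap = len(predicted & gold)
    return 1.0 - 2.0 * overlap / (len(predicted) + len(gold))

# Invented toy alignments: the system proposes four links, the gold standard three.
A = {(0, 0), (1, 2), (2, 1), (3, 3)}
G = {(0, 0), (1, 2), (2, 2)}

aer = alignment_error_rate(A, G)
print(round(aer, 3))
```

Here two of the links agree, so the score is 1 - 4/7. Proposing fewer links shrinks |A| in the denominator, which is one way to see why a higher insertion probability can lower AER when a model over-aligns.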
The lexical translation probabilities $P_t(f \mid e)$ for each of our tree-based models are initialized from Model 1, and the node re-ordering probabilities are initialized uniformly. Figure 1 shows the Viterbi alignment produced by the Tree-to-String, Clone system on one sentence from our test set.

We found better agreement with the human alignments when fixing $P_{ins}(\mathrm{left})$ in the Tree-to-String model to a constant rather than letting it be determined through the EM training. While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall. As seen for other tasks (Carroll and Charniak, 1992; Merialdo, 1994), the likelihood criterion used in EM training may not be optimal when evaluating a system against human labeling. The approach of optimizing a small number of metaparameters has been applied to machine translation by Och and Ney (2002). It is likely that the IBM models could similarly be optimized to minimize alignment error; an open question is whether the optimization with respect to alignment error will correspond to optimization for translation accuracy.

Within the strict EM framework, we found roughly equivalent performance between the IBM models and the two tree-based models ...

Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more. Course Hero has millions of course specific materials providing students with the best way to expand their education.

Below is a small sample set of documents:

UPenn - C - 90
Toward Memory-based TranslationSatoshi S A T O and Ma.koto N A G A O Dept. of Electrical Engineering, K y o t o University Y o s h i d a - h o n m a c h i , Sa.kyo, K.yoto, 606, Ja.pan sa.to@kuee.kyoto-u.ac.jpAbstractAn essential problem of examp
UPenn - J - 93
Machine Translation: A Knowledge-Based Approach Sergei Nirenburg, Jaime Carbonell, Masaru Tomita, and Kenneth Goodman(Carnegie Mellon University) San Mateo, CA: Morgan Kaufmann Publishers, 1992, xiv + 258 pp. Hardbound, ISBN 1-55860-128-7, $39.95T
UPenn - C - 00
Automatic Corpus-Based Thai Word Extraction with the C4.5 Learning AlgorithmVIRACH SORNLERTLAMVANICH, TANAPONG POTIPITI AND THATSANEE CHAROENPORN National Electronics and Computer Technology Centel, National Science and Technology Development Agency
UPenn - C - 90
Reversible Unification Based M a c h m . FranslatlonGertjan van Noord OTS RUU Trans 10 3,512 JK Utrecht Valmoord~hutruu59.BH~netMarch 28, 1990Abstract[n this paper it will be shown how unification g r a m m a r s can be used to build a reversib
UPenn - C - 00
Chart-Based Transfer Rule Application in Machine TranslationAdam MeyersNew York University meyers@cs.nyu.edu M i c h i k o Kosaka Monlnouth University kosaka@monmouth.eduR a l p h GrishInanNew York University grishman@cs.nyu.eduAbstract35&quot;ans
UPenn - P - 99
Corpus-Based Identification of Non-Anaphoric N o u n PhrasesD a v i d L. B e a n and E l l e n R i l o f fD e p a r t m e n t of C o m p u t e r Science University of U t a h Salt Lake City, U t a h 84112 {bean,riloff}@cs.utah.eduAbstract Corefer
UPenn - P - 90
ZERO MORPHEMES IN UNIFICATION-BASED COMBINATORY CATEGORIAL GRAMMAR Chinatsu Aone The University of Texas at Austin &amp; MCC 3500 West Balcones Center Dr. Austin, TX 78759 (aone@mcc.com) ABSTRACT In this paper, we report on our use of zero morphemes in U
UPenn - P - 96
A N e w Statistical Parser Based on B i g r a m Lexical D e p e n d e n c i e sCollins* Dept. of Computer and Information Science University of Pennsylvania P h i l a d e l p h i a , P A , 19104, U . S . A . mcollins@gradient, cis.upenn, eduMichae
UPenn - P - 99
Designing a Task-Based Evaluation M e t h o d o l o g y for a Spoken Machine Translation S y s t e mKavita Thomas L a n g u a g e Technologies I n s t i t u t e Carnegie Mellon University 5000 Forbes Avenue P i t t s b u r g h , PA 15213, USAkavita
UPenn - P - 03
An Ontology-based Semantic Tagger for IE systemNarj` s Boufaden e Department of Computer Science Universit de Montr al e e Quebec, H3C 3J7 Canada boufaden@iro.umontreal.caAbstractIn this paper, we present a method for the semantic tagging of word
UPenn - C - 96
NL Domain Explanations in Knowledge Based MATGalia Angelova, Kalina Bontcheva 1Bulgarian Academy of Sciences, Linguistic Modelling Laboratory A c a d . G, B o n c h e v Str. 2 5 A , 1113 S o f i a , B u l g a r i a , { galja,kalina} @ b g c i c t .
UPenn - P - 03
Deverbal Compound Noun Analysis Based on Lexical Conceptual StructureTeruo Koyama Koichi Takeuchi Kyo Kageura Human and Social Information Research Division National Institute of Informatics 2-1-2 Hitotsubashi, Chiyodaku, Tokyo 101-8430, Japan koich
UPenn - D - 07
Large-Scale Named Entity Disambiguation Based on Wikipedia DataSilviu CucerzanMicrosoft Research One Microsoft Way, Redmond, WA 98052, USA silviu@microsoft.comAbstractThis paper presents a large-scale system for the recognition and semantic disa
UPenn - P - 01
A Syntax-based Statistical Translation ModelKenji Yamada and Kevin Knight Information Sciences Institute University of Southern California 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292 kyamada,knight @isi.edu AbstractWe present a syntax-b
UPenn - C - 02
Semantics-based Representation for Multimodal Interpretation in Conversational SystemsJoyce ChaiIBM T. J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532, USA{jchai@us.ibm.com}Abstract To support context-based multimodal interpretati
UPenn - A - 92
A Simple Rule-Based Part of Speech TaggerEric Brill * D e p a r t m e n t of C o m p u t e r S c i e n c e University of Pennsylvania P h i l a d e l p h i a , P e n n s y l v a n i a 19104U.S.A.brill@unagi.cis.upenn.edu Abstract Automatic part o
UPenn - P - 05
A Hierarchical Phrase-Based Model for Statistical Machine TranslationDavid Chiang Institute for Advanced Computer Studies (UMIACS) University of Maryland, College Park, MD 20742, USA dchiang@umiacs.umd.eduAbstractWe present a statistical phrase-b
UPenn - P - 06
Investigations on Event-Based SummarizationMingli Wu Department of Computing The Hong Kong Polytechnic University Kowloon, Hong Kong csmlwu@comp.polyu.edu.hkAbstractWe investigate independent and relevant event-based extractive mutli-document su
UPenn - N - 06
Thai Grapheme-Based Speech RecognitionPaisarn Charoenpornsawat, Sanjika Hewavitharana, Tanja SchultzInteractive Systems Laboratories, School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 {paisarn, sanjika, tanja}@cs.cmu.eduA
UPenn - P - 01
An Algebra for Semantic Construction in Constraint-based GrammarsAnn Copestake Computer Laboratory University of Cambridge New Museums Site Pembroke St, Cambridge, UKaac@cl.cam.ac.ukAlex Lascarides Division of Informatics University of Edinburgh
UPenn - C - 02
Machine Translation Based on NLG from XML-DBYohei Seki Aoyama Gakuin / Department of Informatics, University The Graduate University for Advanced Studies (Sokendai) Abstract Ken'ichi Harada Department of Computing Science Keio UniversityThe purpos
UPenn - E - 06
Word Sense Induction: Triplet-Based Clustering and Automatic EvaluationStefan Bordag Natural Language Processing Department University of Leipzig Germany sbordag@informatik.uni-leipzig.deAbstractIn this paper a novel solution to automatic and uns
UPenn - P - 89
Unification-BasedSemantic InterpretationRobert C. Moore Artificial Intelligence Center SRI International Menlo Park, CA 94025 AbstractWe show how unification can be used to specify the semantic interpretation of natural-language expressions, inc
UPenn - N - 04
Feature-based Pronunciation Modeling for Speech RecognitionKaren Livescu and James Glass MIT Computer Science and Articial Intelligence Laboratory Cambridge, MA 02139, USA {klivescu, glass}@csail.mit.eduAbstractWe present an approach to pronuncia
UPenn - J - 92
Class-Based n-gram Models of Natural LanguageP e t e r F. B r o w n &quot; P e t e r V. d e S o u z a * R o b e r t L. Mercer* IBM T. J. Watson Research Center V i n c e n t J. D e l l a Pietra* J e n i f e r C. Lai*We address the problem of predicting
UPenn - J - 95
Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech TaggingEric Brill*The Johns Hopkins UniversityRecently, there has been a rebirth of empiricism in the field of natural language processing.
UPenn - E - 06
Adaptive Transformation-based Learning for Improving Dictionary TaggingBurcu Karagol-Ayan, David Doermann, and Amy Weinberg Institute for Advanced Computer Studies (UMIACS) University of Maryland College Park, MD 20742 {burcu,doermann,weinberg}@umia
UPenn - E - 06
Phrase-Based Backoff Models for Machine Translation of Highly Inected LanguagesMei Yang Department of Electrical Engineering University of Washington Seattle, WA, USA yangmei@ee.washington.edu Katrin Kirchhoff Department of Electrical Engineering Un
UPenn - P - 04
Towards a Semantic Classication of Spanish Verbs Based on Subcategorisation InformationEva Esteve Ferrer Department of Informatics University of Sussex Brighton, BN1 9QH, UK E.Esteve-Ferrer@sussex.ac.uk AbstractWe present experiments aiming at an a
UPenn - N - 03
A Phrase-Based Unigram Model for Statistical Machine TranslationChristoph Tillmann and Fei Xia IBM T.J. Watson Research Center Yorktown Heights, NY 10598 {ctill,feixia}@us.ibm.comAbstractIn this paper, we describe a phrase-based unigram model fo
UPenn - CIS - 610
TensorTextures: Multilinear Image-Based RenderingM. Alex O. Vasilescu and Demetri Terzopoulos University of Toronto, Department of Computer Science New York University, Courant Institute of Mathematical SciencesFigure 1: Frames from the Treasure C
UPenn - P - 84
Features and ValuesLauri Karttunen University of Texas at Austin Artificial Intelligence Center SRI International and Center for the Study of Language and Information Stanford UniversityAbstractThe paper discusses the linguistic aspects of a new
UPenn - T - 87
Unification a n d the n e w g r a m m a t i s m Steve Pulman University of Cambridge Computer Laboratory Corn Exchange Street Cambridge C B 2 3QG, UK.Whatare w e talking about?The prototypical unification grammar consists of a context-free skel
UPenn - H - 01
Guidelines for Annotating Temporal InformationInderjeet Mani, George WilsonThe MITRE Corporation, W640 11493 Sunset Hills Road Reston, Virginia 20190-5214, USA +1-703-883-6149Lisa FerroThe MITRE Corporation, K329 202 Burlington Road, Rte. 62 Bed
UPenn - C - 86
The computational complexity of sentence derivation in functional unification grammar Graeme Ritchie Department of Artificial Intelligence University of Edinburgh Edinburgh EH1 1HN Abstract Functional unification (FU) grammar is a general linguisti
UPenn - E - 87
DECLARATIVE MODEL FOR DEPENDENCY PARSING - A VIEW INTO BLACKBOARD METHODOLOGY Valkonen, K., Jäppinen, H., Lehtola, A. and Ylilammi, KIELIKONE-project, SITRA Foundation P.O. Box 329, SF-00121 Helsinki Finland tel. in
UPenn - A - 97
Layout & Language: Preliminary experiments in assigning logical structure to table cells Matthew Hurst and Shona Douglas Language Technology Group, Human Communication Research Centre, University of Edinburgh, Edinburgh EH8 9LW UK {Matthew.H
UPenn - C - 88
AN INTEGRATED MODEL FOR THE TREATMENT OF TIME IN MT-SYSTEMS M. Meya Siemens CDS c/Luis Muntadas, 5 CORNELLA, 08940-BARCELONA Spain J. Vidul EUROTRA-E Ctra. Vallvidriera, 25.27 08017-BARCELONA Abstract One of the ways to achieve a goo
UPenn - C - 90
Towards a Unification-Based Phonology Richard Wiese Seminar für Allgem. Sprachwissenschaft Heinrich-Heine-Universität Düsseldorf D-4000 Düsseldorf 1 wiese@dd0rud81.bitnet 1 Introduction. The Problem Phonological theory has undergone a number of m
UPenn - C - 86
ELEMENTARY CONTRACTS AS A PRAGMATIC BASIS OF LANGUAGE INTERACTION E.L. Pershina AI Laboratory, Computer Center Siberian Division of the USSR Ac. Sci. Novosibirsk 630090, USSR ABSTRACT Language interaction (LI) as a part of interpersonal communica
UPenn - E - 85
ON THE REPRESENTATION OF QUERY TERM RELATIONS BY SOFT BOOLEAN OPERATORS Gerard Salton Department of Computer Science Cornell University Ithaca, NY 14853, USA ABSTRACT The language analysis component in most text
UPenn - C - 86
Conceptual Lexicon Using an Object-Oriented Language Shoichi YOKOYAMA Electrotechnical Laboratory Tsukuba, Ibaraki, Japan Kenji HANAKATA Universität Stuttgart Stuttgart, F. R. Germany Abstract This paper describes the
UPenn - E - 91
STRUCTURAL NON-CORRESPONDENCE IN TRANSLATION Henry S. Thompson, Human Communication Research Centre, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, UK ht@uk.ac.ed.cogsci Louisa Sadler, Dept. of Language and Linguistics, University
UPenn - C - 82
ADAPTIVE DIALOGUE - THE BASIS FOR PERSONAL COMPUTER SYSTEM Victor Briabrin Computing Center, Academy of Sciences, Moscow, USSR 1. Personal Computer Systems (PCS) represent nowadays a significant tren
UPenn - C - 90
Complex Features in Description of Chinese Language Feng Zhiwei Institute of Applied Linguistics Chinese Academy of Social Sciences 51 Chaoyangmen Nanxiaojie 100010 Beijing, China Abstract In this paper, the similarity of "multi-value label function" a
UPenn - C - 96
CHINESE STRING SEARCHING USING THE KMP ALGORITHM Robert W.P. Luk Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong E-mail: csrluk@comp.polyu.edu.hk Abstract This paper is about the modification of KMP (Knuth, Morris and
UPenn - CIT - 591
Arrays (Apr 10, 2009). A problem with simple variables: One variable holds one value. The value may change over time, but at any given time, a variable holds a single value. If you want to keep track of many values, you need many variables. All
UPenn - CIT - 591
Numbers and Arrays. Widening and narrowing: Numeric types are arranged in a continuum: Wider double float long int short byte, char Narrower. You can easily assign a narrower type to a wider type: double wide; int narrow; wide = narrow; But if you want
UPenn - STAT - 112
Stat 112 Review Notes for Chapter 4, Lecture Notes 6-9. 1. Best Simple Linear Regression: Among the variables X_1, ..., X_K, the variable which best predicts Y based on a simple linear regression is the variable for which the simple linear regressi
Auburn Montgomery - MATH - 190
Decimal Expansion of Fractions Brent Murphy P/Q Problem: Under what conditions will the decimal expansion of p/q terminate? Under what conditions will it repeat? p/q can be investigated as p*(1/q). Terminating: When placing 1 over q as a fraction t
Auburn Montgomery - MATH - 190
Project 1.2 Decimal Expansions of Rational Numbers Jacob Brozenick, Anthony Mayle, Kenny Milnes, and Tim Sweetser Problem Descriptions 1. Determine which values of q in the expression p/q will cause the termination of the resulting decimal expansion.
Auburn Montgomery - MATH - 190
Calculus Project 1.2 By Dorothy McCammon, Tammy Boals, George Reeves, Robert Stevens Part 1 When you have a fraction x/y, y can be divided into x to obtain that fraction in decimal form. There are two different types of decimal numbers you can obt
Auburn Montgomery - MATH - 190
PROBLEM 1: Under what conditions will the decimal expansion p/q terminate? Repeat? PROBLEM 2: Suppose that we are given the decimal expansion of a rational number. How can we represent the decimal in the rational form p/q? PROBLEM 3: Express each
UPenn - M - 95
STATISTICAL SIGNIFICANCE OF MUC-6 RESULTS Nancy Chinchor, Ph.D. Science Applications International Corporation 10260 Campus Point Drive, M/S A2-F San Diego, CA 92121 chinchor@gso.saic.com (619) 458-2614 INTRODUCTION The results of the MUC-6 eva
UPenn - D - 07
Improving Query Spelling Correction Using Web Search Results Qing Chen Natural Language Processing Lab Northeastern University Shenyang, Liaoning, China, 110004 chenqing@ics.neu.edu.cn Ming Zhou Microsoft Research Asia 5F Sigma Center Zhichun Road, H
UPenn - M - 92
USC: MUC-4 Test Results and Analysis D. Moldovan, S. Cha, M. Chung, K. Hendrickson, J. Kim, and S. Kowalski Parallel Knowledge Processing Laboratory University of Southern California Los Angeles, California 90089-2562 moldovan@gringo.usc.
UPenn - H - 90
Recent Results from the ARM Continuous Speech Recognition Project Martin Russell and Keith Ponting Speech Research Unit RSRE, Malvern, Worcs WR14 3PS, UK Introduction This paper describes some of the most recent work on continuous s
UPenn - M - 92
BBN PLUM: MUC-4 Test Results and Analysis Ralph Weischedel, Damaris Ayuso, Sean Boisen, Heidi Fox, Herbert Gish, Robert Ingria, BBN Systems and Technologies 10 Moulton St. Cambridge, MA 02138 weischedel@bbn.com GOALS Our mid-term to long-term g
UPenn - M - 93
THE STATISTICAL SIGNIFICANCE OF THE MUC-5 RESULTS Nancy Chinchor, Ph.D. Science Applications International Corporation 10260 Campus Point Drive, M/S A2-F San Diego, CA 92121 chinchor@gso.saic.com (619) 458-2614 INTRODUCTION The statistical
UPenn - M - 91
SRI INTERNATIONAL'S TACITUS SYSTEM: MUC-3 TEST RESULTS AND ANALYSIS Jerry R. Hobbs SRI International Menlo Park, California 94025 hobbs@ai.sri.com (415) 859-2229 RESULTS This site report is intended as a companion piece to the System Summary a
UPenn - M - 92
CRL/NMSU and Brandeis MucBruce: MUC-4 Test Results and Analysis Jim Cowie, Louise Guthrie, Yorick Wilks Computing Research Laboratory New Mexico State University James Pustejovsky Computer Science Department Brandeis University INTRODUCTION The