lecture14_S2009



18.417: Complements on language theory: Multi-tape grammars & Tree adjoining grammars

Jérôme Waldispühl
jeromew@mit.edu
Department of Mathematics, M.I.T.

18.417 - Complement in language theory


Multi-tape S-attribute grammar

Definition 1. Multi-tape alphabet: $\Sigma = \prod_{i=1 \cdots m} (\Sigma^{(i)} \cup \{\varepsilon\})$

Definition 2. Multi-tape context-free grammar: $G = \{V_T, V_N, P, S\}$, where $V_T$ is an m-tape alphabet.

Definition 3. Multi-tape S-attribute grammar: $G = \{V_T, V_N, P, S, A, \lambda_A, F_P\}$, where $V_T$ is an m-tape alphabet.


Example: RNA sequence alignment

Here [x/y] denotes the 2-tape symbol with x on the first tape and y on the second; − is the gap symbol.

S → S S | mat | del | ins | mut
mat → [a/a] | [u/u] | [g/g] | [c/c]
del → [−/a] | [−/u] | [−/g] | [−/c]
ins → [a/−] | [u/−] | [g/−] | [c/−]
mut → [a/u] | [a/g] | [a/c] | [u/a] | [u/g] | [u/c] | [g/a] | [g/u] | [g/c] | [c/a] | [c/u] | [c/g]

Attributes: $A = \mathbb{Z}$, $\lambda(\bullet) = 0$, and $F_P$ is given by

$f_{S \to SS}(x, y) = x + y$
$f_{S \to \mathrm{mat}|\mathrm{del}|\mathrm{ins}|\mathrm{mut}}(x) = x$
$f_{\mathrm{mat} \to \bullet}(x) = 0$
$f_{\mathrm{del} \to \bullet}(x) = 1$
$f_{\mathrm{ins} \to \bullet}(x) = 1$
$f_{\mathrm{mut} \to \bullet}(x) = 1$


Example: RNA sequence alignment (derivation)

AC-UUGCAUU-C
AACUGG-AUGUC
Cost = 6

[Figure: derivation tree of this alignment. Each S node carries the accumulated cost, up to 6 at the root. The leaves, from left to right, are: mat,0 mut,1 del,1 mat,0 mut,1 mat,0 ins,1 mat,0 mat,0 mut,1 del,1 mat,0.]


ε-deletion

An alignment is a 2-tape word whose columns may contain ε. Deleting the ε symbols on each tape gives a 2-tape word, for example [abba/dcd].

[Figure: two different alignments with ε columns that both reduce to [abba/dcd].]

The same 2-tape word may therefore come from several alignments.

Application: the best alignment of a 2-tape word [ω1/ω2] is computed by finding the optimal alignment with ε-insertion.


Example: RNA sequence alignment (optimal alignment)

ACUUGCAUUC
AACUGGAUGUC
Cost = ??

-ACUUGCAU-UC
AACUGG-AUGUC
Cost = 4

[Figure: derivation tree of this second alignment, with cost 4 at the root.]


Example (2): Unification of structure and alignment

S → [(/(] S [)/)] | [−/a] | [a/−] | [a/a] | S S

[Figure: derivation tree combining base-pair columns and alignment columns.]


Modeling structures with MTCFG

1. define an m-tape language modeling the structure,
2. design production rules generating the language,
3. substitute the tokens by nucleotides or amino acids.


Example (3): RNA with pseudoknot

[Figure: pseudoknotted RNA drawn from 5' to 3', with two stems of three base pairs each.]

$L_{psn} = \{\, x \, (^i \, u \, [^j \, v \, )^i \, w \, ]^j \, y \;\mid\; i > 0,\ j > 0,\ \text{and } u, v, w, x, y \in \{.\}^* \,\}$

F. Lefebvre (1996)

The sequence and its pseudoknotted structure:

AAAGUACCUCUGACUUGAACACAGU
.(((.....[[[.))).....]]].

⟹ The structure is split over two tapes, one per stem:

.(((.........))).........
.........[[[.........]]].

[Figure: derivation tree with nonterminals S, P, A0 to A3 and B0 to B3. Successive builds of the slide show the leaves as two-tape structure symbols, then as nucleotides, then with their attribute (a, c, g, u), and finally with scores on every node: −9 at S and P, −4 at A0, −5 at B0.]


Definition: Tree Adjoining Grammar (TAG) (Joshi et al., 1975)

• a set of elementary trees, divided into initial and auxiliary trees,
• operations of adjunction and substitution, which build derived trees from elementary trees.
• An initial tree is a tree whose interior nodes are all labelled with non-terminal symbols, and whose frontier nodes are labelled either with terminal symbols or with non-terminal symbols marked with the substitution marker (↓).
• An auxiliary tree is defined like an initial tree, except that exactly one of its frontier nodes must be marked as the foot node (*). The foot node must be labelled with the same non-terminal symbol as the root node.


Example (1): TAG

3 initial and 4 auxiliary trees: np is a substitution node, left is a terminal symbol.


Example (1): TAG Substitution

Substituting a tree α in a tree α′ simply replaces a substitution node in α′ with α, under the convention that the non-terminal symbol of the substitution node is the same as that of the root node of α. For example, substituting α2 in α3 gives the following tree:

[Figure: tree obtained by substituting α2 in α3.]

Only initial trees, and derived trees, can be substituted in another tree.


Example (1): TAG Adjunction

Adjoining an auxiliary tree β at some node n of a derived tree γ:

• First, the non-terminal symbol of the root node of β (and hence of its foot node) must be the same as the non-terminal symbol associated with n.
• The subtree t of γ rooted at n is removed from γ and β is put in its place; t is then substituted at the foot node of β.


TAG modeling RNA pseudoknots

Set of elementary trees used to model RNA pseudo-knotted secondary structures (Uemura, 1994):

• simple linear TAG: every auxiliary tree has at most one node on which adjunction is allowed, and this node lies on its spine. Parseable in O(n^4) time.
• It can be extended to extended simple linear TAG, where a second adjunction is allowed off the spine.

[Figure: example of derivation.]

[Figure: pseudoknot in hepatitis delta virus ribozyme.]

Other grammars:

• (Rivas & Eddy, 2000): crossed-interaction grammar (CIG). A CIG has two modules: a set of context-free productions and a set of rearrangement rules. The productions work as in a CFG, to which we add a hole string and a set of special nonterminals. The rearrangement rules apply after the productions. Parseable in O(n^6) time.
• (Cai et al., 2003): parallel communicating grammar systems (PCGSs).


Other TAGs

Regular-form TAG

(a) Off-spine adjunction, allowed.
(b) Acyclic spine adjunction, allowed.
(c) Cyclic spine adjunction, not allowed.
(d) Root adjunction, not allowed.
(e) Foot adjunction, allowed.

Definition 4. We say that a TAG is in regular form, or an RF-TAG, if there exists some partial ordering ⪯ over nonterminal symbols such that if β is an auxiliary tree whose root and foot nodes are labeled X, and η is a node labeled Y on β's spine where adjunction is allowed, then X ⪯ Y, and X = Y only if η is a foot node.

Regular-form TAG

• generates the same languages as CFG, with the same parsing complexity (i.e. O(n^3) in time),
• has greater derivational generative capacity than CFG (Chiang, 2002),
• can be used for modeling limited RNA tertiary interactions (the number of such self-contacts is bounded).


TAG for modeling protein structure

(Abe and Mamitsuka, 1997): ranked node-rewriting grammars (RNRG) for β-sheets.

An RNRG is essentially a TAG with multiple foot nodes on elementary trees. When a node η is rewritten with a tree β, the children of η are identified with the foot nodes of β, matched up according to linear precedence.

[Figure: RNRG for a β-sheet of five strands.]

[Figure: example of rewriting in an RNRG.]

RNRGs are equivalent to classical TAGs.

[Figure: (a) RNRG elementary tree, (b) TAG elementary tree.]

Example of permuted β-sheets

Problem: how to distinguish all β-sheet architectures?

Set-local multicomponent TAG for β-sheets

[Figure: set-local multicomponent TAG for β-sheets.]

Limitations of the approach

• structural ambiguity in multicomponent TAG: (a) can be modeled using (b) and (c), but either of these trees can be used by itself to generate (d).
• parsing of these grammars is exponential in the number of strands per sheet.
• every grammar imposes some upper bound ⇒ there is no single grammar that can generate all β-sheets.
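
The cost attributes of the RNA sequence alignment example (0 for a mat column, 1 for a del, ins or mut column, summed by the rule for S → S S) can be sketched in Python. This is only an illustration under stated assumptions: the function names `column_cost`, `alignment_cost` and `optimal_cost` are hypothetical and do not come from the lecture, `-` stands for the gap symbol, and the search for the best alignment uses the standard unit-cost edit-distance recursion rather than a parser for the multi-tape grammar.

```python
def column_cost(x: str, y: str) -> int:
    """Attribute of one alignment column; '-' is the gap symbol (assumption)."""
    if x == "-" and y == "-":
        raise ValueError("a column cannot hold two gaps")
    if x == y:
        return 0  # mat column
    return 1  # del, ins or mut column


def alignment_cost(top: str, bottom: str) -> int:
    """Sum the column attributes, as the rule for S -> S S adds x + y."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return sum(column_cost(x, y) for x, y in zip(top, bottom))


def optimal_cost(w1: str, w2: str) -> int:
    """Minimum cost over all alignments of two gap-free words.

    Standard edit-distance dynamic programming, kept one row at a time.
    """
    prev = list(range(len(w2) + 1))  # empty prefix of w1: gaps on tape 1 only
    for i, x in enumerate(w1, start=1):
        curr = [i]  # empty prefix of w2: gaps on tape 2 only
        for j, y in enumerate(w2, start=1):
            curr.append(min(
                prev[j - 1] + column_cost(x, y),  # mat or mut column
                prev[j] + 1,                      # x over a gap (ins)
                curr[j - 1] + 1,                  # gap over y (del)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_cost("AC-UUGCAUU-C", "AACUGG-AUGUC"))  # 6
    print(alignment_cost("-ACUUGCAU-UC", "AACUGG-AUGUC"))  # 4
    print(optimal_cost("ACUUGCAUUC", "AACUGGAUGUC"))
```

The first two calls reproduce the costs 6 and 4 shown on the slides for the two alignments of the same pair of sequences. The last call searches over all ε-insertions, so its result can be no larger than 4.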

This note was uploaded on 06/16/2011 for the course MATH 18.417 taught by Professor Jérôme Waldispühl during the Spring '11 term at MIT.
