Chapter 3 Context-Free Grammars and Parsing

Parsing (Syntax Analysis)
Parsing decides which parts of the incoming token stream should be grouped together.
The output of parsing is some representation of a parse tree.
The intermediate code generator transforms the parse tree into an intermediate language.

Comparisons between r.e. (regular expressions) and c.f.g. (context-free grammars)
r.e.: describes tokens; uses an F.A. (finite automaton) to test for a valid token.
c.f.g.: describes programming language constructs; uses a P.F.A. (pushdown automaton) to test for a valid program (sentence).

Features of programming languages
Contents:
  declarations
  sequential statements
  iterative statements
  conditional statements
Features:
  constructs are declared/stated recursively and repeatedly
  hierarchical specification, e.g., compound statement -> statement -> expression -> id
  nested structures
  similarity

Description of the syntax of programming languages
  Syntax diagrams (see Sec. 3.5.2)
  Context-free grammars (CFG)

Context-Free Grammar (in BNF)
  exp -> exp addop term | term
  addop -> + | -
  term -> term mulop factor | factor
  mulop -> *
  factor -> ( exp ) | number
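To make the BNF grammar above concrete, here is a minimal sketch in which each nonterminal (exp, term, factor) becomes one Python function. It assumes integer numbers and a tiny regex tokenizer, neither of which comes from the slides, and the names `tokenize`, `Parser`, and `evaluate` are hypothetical. The left-recursive rules are written as loops, a common rewrite that keeps the operators left associative.

```python
import re


def tokenize(text):
    """Split input into number tokens and the symbols + - * ( ).

    Simplification: any other character is silently skipped.
    """
    return re.findall(r"\d+|[-+*()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def exp(self):
        # exp -> exp addop term | term, with the left recursion written as a loop
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term -> term mulop factor | factor, again as a loop
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        # factor -> ( exp ) | number
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.exp()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    """Parse `text` as an exp and return its integer value."""
    parser = Parser(tokenize(text))
    value = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Because term is called from inside exp, multiplication binds tighter than addition, which is exactly what the layering of exp, term, and factor in the grammar expresses.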
History
In 1956, Chomsky introduced context-free grammars for the description of natural language. Algol 60 used BNF (Backus-Naur Form), an equivalent notation, to describe its syntax.
The syntactic specification of programming languages: CFG (a BNF description).

Capabilities of context-free grammars
  They give a precise syntactic specification of programming languages.
  A parser can be constructed automatically from a CFG.
  The syntactic entities specified in a CFG can be used for translation into object code.
  They are useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, etc.
Def. of context-free grammars
A CFG is a 4-tuple (V, T, P, S), where
  V: a finite set of variables (nonterminals)
  T: a finite set of terminal symbols (tokens)
  P: a finite set of productions (or grammar rules)
  S: a start symbol
and V ∩ T = ∅, S ∈ V.
Productions are of the form A -> α, where A ∈ V and α ∈ (V+T)*.
A CFG generates a CFL (context-free language).

An Example
G = ({E}, {+, *, (, ), id}, P, E)
P: { E -> E + E
     E -> E * E
     E -> ( E )
     E -> id }

Rules from F.A. (r.e.) to CFG
1. For each state there is a nonterminal symbol.
2. If state A has a transition to state B on symbol a, introduce A -> aB.
3. If A goes to B on input ε, introduce A -> B.
4. If A is an accepting state, introduce A -> ε.
5. Make the start state of the NFA be the start symbol of the grammar.

Examples
(1) r.e.: (a|b)(a|b|0|1)*
    c.f.g.: S -> aA | bA
            A -> aA | bA | 0A | 1A | ε
(2) r.e.: (a|b)*abb
    c.f.g.: S -> aS | bS | aA
            A -> bB
            B -> bC
            C -> ε

Why don't we use c.f.g. to replace r.e.?
  r.e. => easy and clear description of tokens
  r.e. => efficient token recognizers
  modularizing the components

Derivations (How does a CFG define a language?)
Definitions:
  =>     directly derive: A => α, α ∈ (V+T)*
  =>*    derive in zero or more steps
  =>+    derive in one or more steps
  =>i    derive in i steps
  sentential form: α such that S =>* α, α ∈ (V+T)*
  sentence: w such that S =>* w, w ∈ T*
  language: { w | S =>+ w, w ∈ T* }
  leftmost derivations
  rightmost derivations

G = ({exp, op}, {+, -, *, (, ), number}, P, exp)
P: { exp -> exp op exp | ( exp ) | number
     op -> + | - | * }
Example sentence: (number - number) * number

Parse trees
=> A graphical representation of derivations. (Note the difference between a parse tree and a syntax tree.)
=> Often the parse tree is produced only in a figurative sense; in reality, the parse tree exists only as a sequence of actions made by stepping through the tree construction process.
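The idea that a parse tree may exist only as a sequence of actions can be illustrated with a leftmost derivation of (number - number) * number in the grammar with exp and op. This is a minimal sketch: the dictionary `GRAMMAR`, the function `leftmost_derivation`, and the encoding of each step as an index into the list of alternatives are illustrative assumptions, not notation from the slides.

```python
# Productions of the example grammar; keys are the nonterminals.
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative to apply.
    Returns every sentential form, beginning with [start].
    """
    form = [start]
    steps = [form]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:idx] + GRAMMAR[form[idx]][choice] + form[idx + 1:]
        steps.append(form)
    return steps


# One action per derivation step for (number - number) * number.
steps = leftmost_derivation("exp", [0, 1, 0, 2, 1, 2, 2, 2])
for form in steps:
    print(" ".join(form))
```

The list of choices plays the role of the "sequence of actions": it determines the parse tree completely even though no tree object is ever built.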
Ambiguity
Ambiguous grammar
Def.: A context-free grammar that can produce more than one parse tree for some sentence.
Ways to disambiguate a grammar:
(1) Specify the intention (e.g., associativity and precedence for arithmetic operators, and others).
(2) Rewrite the grammar to incorporate the intention into the grammar itself.

For (1)
Precedence: negate > exponent (^) > * / > + -
Associativity: exponent ==> right associativity; others ==> left associativity
In yacc, a "specification rule" is used to solve the problem of (1), e.g., the alignment order, the special syntax, the default value (refer to the yacc manual for the disambiguating rules).
For (2)
Introduce one nonterminal for each precedence level.

Example 1
E -> E + E | E - E | E * E | E / E | E ^ E | ( E ) | - E | id
is ambiguous. (^ is the exponent operator, with right associativity.)

[Figure: two parse trees for id + id * id, one with E -> E + E at the root and one with E -> E * E at the root]
More than one parse tree for the sentence id + id * id

[Figure: two syntax trees for id + id * id, one with + at the root and one with * at the root]
More than one syntax tree for the sentence id + id * id
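The ambiguity can be checked mechanically. The following is a rough sketch that enumerates every syntax tree for a token list, restricted (as an assumption, to keep it short) to the productions E -> E + E | E * E | id. The function name `syntax_trees` and the tuple encoding (operator, left, right) are hypothetical.

```python
def syntax_trees(tokens):
    """Return every syntax tree of `tokens` under E -> E + E | E * E | id."""
    if len(tokens) == 1:
        return ["id"] if tokens[0] == "id" else []
    trees = []
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens) - 1, 2):
        op = tokens[i]
        if op not in ("+", "*"):
            continue
        for left in syntax_trees(tokens[:i]):
            for right in syntax_trees(tokens[i + 1:]):
                trees.append((op, left, right))
    return trees


trees = syntax_trees(["id", "+", "id", "*", "id"])
for tree in trees:
    print(tree)
```

For id + id * id the function finds two trees, one rooted at + and one rooted at *, which is the ambiguity the figures illustrate.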
The corresponding grammar shown below is unambiguous.
element -> ( expression ) | id    /* ((expression)) */
primary -> - primary | element
factor -> primary ^ factor | primary    /* has right associativity */
term -> term * factor | term / factor | factor
expression -> expression + term | expression - term | term

Ex: id + id * id
[Figure: the single parse tree for id + id * id under this grammar. The root expression expands to expression + term; the left expression derives id through term, factor, primary, and element; the right term expands to term * factor, and each side derives id through primary and element.]

Example 2
stat -> IF cond THEN stat
      | IF cond THEN stat ELSE stat
      | other-stat
is an ambiguous grammar.

Dangling else problem
[Figure: two parse trees for "if c1 then if c2 then s2 else s3". In one, else s3 is attached to the inner "if c2 then s2"; in the other, it is attached to the outer "if c1 then".]

The corresponding grammar shown below is unambiguous.
stat -> matched-stat | unmatched-stat
matched-stat -> if cond then matched-stat else matched-stat | other-stat
unmatched-stat -> if cond then stat | if cond then matched-stat else unmatched-stat

Non-context-free language constructs
L = { wcw | w is in (a|b)* }
L = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
L = { a^n b^n c^n | n ≥ 0 }

Basic Parsing Techniques
1. How to check whether an input string is a sentence of a given grammar? (Syntax checking is not only used for programming languages.)
2. How to construct a parse tree for the input string, if desired?

Method          classic approach       modern approach
1. top-down     recursive descent      LL parsing (produces a leftmost derivation)
2. bottom-up    operator precedence    LR parsing (shift-reduce parsing; produces a rightmost derivation in reverse order)

An Example (for LR Parsing)
S -> aABe
A -> Abc | b
B -> d
w = abbcde

Rightmost derivation: S =>rm aABe =>rm aAde =>rm aAbcde =>rm abbcde    (rm: rightmost derivation step)
LR parsing: abbcde ==> aAbcde ==> aAde ==> aABe ==> S
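The reduction sequence above can be reproduced with a small shift-reduce sketch. Note the swapped-in technique: a real LR parser consults a precomputed table to decide between shifting and reducing, whereas this illustration simply backtracks over both choices. The names `PRODUCTIONS`, `shift_reduce`, and `parse` are hypothetical, and single characters stand for grammar symbols.

```python
# Productions of the example grammar as (left-hand side, right-hand side).
PRODUCTIONS = [
    ("S", "aABe"),
    ("A", "Abc"),
    ("A", "b"),
    ("B", "d"),
]


def shift_reduce(stack, remaining, trace):
    """Backtracking shift-reduce search.

    Returns the sentential forms from the input down to S,
    or None if no sequence of shifts and reductions succeeds.
    """
    if stack == "S" and not remaining:
        return trace
    # Try every reduction whose right-hand side is on top of the stack.
    for lhs, rhs in PRODUCTIONS:
        if stack.endswith(rhs):
            new_stack = stack[: -len(rhs)] + lhs
            result = shift_reduce(new_stack, remaining, trace + [new_stack + remaining])
            if result is not None:
                return result
    # Otherwise shift the next input symbol onto the stack.
    if remaining:
        return shift_reduce(stack + remaining[0], remaining[1:], trace)
    return None


def parse(w):
    """Return the reduction sequence for w, or None if w is not a sentence."""
    return shift_reduce("", w, [w])


print(parse("abbcde"))
```

Read from last to first, the returned forms are the rightmost derivation of abbcde. The backtracking matters here: with aAb on the stack, reducing b to A leads to a dead end, so the search must shift c and reduce Abc instead, which is the decision an LR table encodes in advance.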
Assignment #3a
1. Do exercises 3.3, 3.5, 3.24, 3.25.
Use the grammar in BNF of the TINY language in Fig. 3.6 to derive, step by step, the sequence of tokens of the program in Fig. 3.8. (for practice only)
...
This note was uploaded on 06/28/2011 for the course ENGINEERIN 100 taught by Professor Yangwei during the Spring '10 term at National Cheng Kung University.