c02 - CS 421 COMPILERS AND INTERPRETE...


CS 421 COMPILERS AND INTERPRETERS
Copyright 1994 - 2010 Zhong Shao, Yale University
Lexical Analysis: Page 1 of 40

Lexical Analysis

Read the source program and produce a list of tokens (linear analysis).
The lexical structure is specified using regular expressions.
Other secondary tasks:
    (1) get rid of whitespace (e.g., \t, \n, \sp) and comments
    (2) line numbering

[Diagram: source program -> lexical analyzer -> parser. The parser issues "get next token" and the lexical analyzer returns a "token".]

Example: Source Code

A Sample Toy Program:

    (* define valid mutually recursive procedures *)
    let
     function do_nothing1(a: int, b: string)=
      do_nothing2(a+1)
     function do_nothing2(d: int) =
      do_nothing1(d, "str")
    in
     do_nothing1(0, "str2")
    end

What do we really care about here?

The Lexical Structure

Output after the Lexical Analysis: token + associated value

    LET 51
    FUNCTION 56  ID (do_nothing1) 65  LPAREN 76  ID (a) 77  COLON 78  ID (int) 80
        COMMA 83  ID (b) 85  COLON 86  ID (string) 88  RPAREN 94  EQ 95
    ID (do_nothing2) 99  LPAREN 110  ID (a) 111  PLUS 112  INT (1) 113  RPAREN 114
    FUNCTION 117  ID (do_nothing2) 126  LPAREN 137  ID (d) 138  COLON 139  ID (int) 141
        RPAREN 144  EQ 146
    ID (do_nothing1) 150  LPAREN 161  ID (d) 162  COMMA 163  STRING (str) 165  RPAREN 170
    IN 173
    ID (do_nothing1) 177  LPAREN 188  INT (0) 189  COMMA 190  STRING (str2) 192  RPAREN 198
    END 200
    EOF 203

Tokens

Tokens are the atomic units of a language, and are usually specific strings or instances of classes of strings.
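As a rough illustration of "token + associated value", a token could be modeled as a small record. This is only a sketch: the `Token` class and its field names are hypothetical, and it assumes that the number printed after each token is a position in the source text, which the slides do not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; the field names are illustrative only."""
    kind: str                    # token class, e.g. "ID", "INT", "LPAREN"
    pos: int                     # number listed after the token (assumed: a source position)
    value: Optional[str] = None  # associated value; only some classes (ID, INT, STRING) carry one

    def __str__(self) -> str:
        # Print in the same "KIND (value) pos" layout as the token listing.
        label = self.kind if self.value is None else f"{self.kind} ({self.value})"
        return f"{label} {self.pos}"


# The first four tokens of the sample program's token listing.
tokens = [
    Token("LET", 51),
    Token("FUNCTION", 56),
    Token("ID", 65, "do_nothing1"),
    Token("LPAREN", 76),
]

listing = " ".join(str(t) for t in tokens)
print(listing)  # LET 51 FUNCTION 56 ID (do_nothing1) 65 LPAREN 76
```

Keywords and punctuation are "specific strings" and need no value, while ID, INT, and STRING are "classes of strings" whose matched text is kept in `value`.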
    Tokens    Sample Values                  Informal Description
    LET       let                            keyword LET
    END       end                            keyword END
    PLUS      +
    LPAREN    (
    COLON     :
    STRING    "str"
    RPAREN    )
    INT       49, 48                         integer constants
    ID        do_nothing1, a, int, string    letter followed by letters, digits, and underscores
    EQ        =
    EOF                                      end of file

Lexical Analysis, How?

First, write down the lexical specification (how each token is defined), using regular expressions to specify the lexical structure:

    identifier = letter (letter | digit | underscore)*
    letter     = a | ... | z | A | ... | Z
    digit      = 0 | 1 | ... | 9

Second, based on the above lexical specification, build the lexical analyzer (to recognize tokens) by hand:

    Regular Expression Spec ==> NFA ==> DFA ==> Transition Table ==> Lexical Analyzer

Or just use lex, the lexical analyzer generator:

    Regular Expression Spec (in lex format) ==> feed to lex ==> Lexical Analyzer
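To make the "by hand" route concrete, here is a minimal sketch of the last two stages (transition table and recognizer) for the identifier rule only. The DFA is written down directly rather than derived from an NFA, and the names `char_class`, `TRANSITIONS`, and `match_identifier` are hypothetical, chosen for this illustration rather than taken from the course materials.

```python
import string

START, IN_ID = 0, 1   # DFA states: nothing read yet / inside an identifier
ACCEPTING = {IN_ID}


def char_class(c: str) -> str:
    """Map a character to the classes named in the specification."""
    if c in string.ascii_letters:
        return "letter"
    if c in string.digits:
        return "digit"
    if c == "_":
        return "underscore"
    return "other"


# Transition table: (state, character class) -> next state.
# A missing entry means the DFA has no move and stops.
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
    (IN_ID, "underscore"): IN_ID,
}


def match_identifier(text: str, start: int = 0):
    """Return the longest identifier beginning at `start`, or None if there is none."""
    state = START
    last_accept = None          # end index of the longest match seen so far
    i = start
    while i < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:
            break               # no transition: stop scanning
        state = nxt
        i += 1
        if state in ACCEPTING:
            last_accept = i
    return None if last_accept is None else text[start:last_accept]


print(match_identifier("do_nothing1(a: int, b: string)"))  # do_nothing1
print(match_identifier("1abc"))                            # None
```

A full lexical analyzer would hold one such table covering every token class and apply it repeatedly from the current position; a generator such as lex builds that table from the regular expression specification instead of having it written by hand.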