c02 - CS421 Compilers and Interpreters: Lexical Analysis
Copyright 1994-2010 Zhong Shao, Yale University

Lexical Analysis

- Read the source program and produce a list of tokens ("linear" analysis).
- The lexical structure is specified using regular expressions.
- Other secondary tasks: (1) discard white space (e.g., \t, \n, \sp) and comments; (2) line numbering.
- (Diagram: the source program feeds the lexical analyzer; the parser repeatedly issues "get next token" requests and the lexical analyzer returns the next token.)


Example: Source Code

A sample toy program:

  (* define valid mutually recursive procedures *)
  let
    function do_nothing1(a: int, b: string) =
      do_nothing2(a+1)
    function do_nothing2(d: int) =
      do_nothing1(d, "str")
  in
    do_nothing1(0, "str2")
  end

What do we really care about here?


The Lexical Structure

Output after the lexical analysis: token + associated value.

  LET 51
  FUNCTION 56  ID (do_nothing1) 65  LPAREN 76  ID (a) 77  COLON 78  ID (int) 80  COMMA 83  ID (b) 85  COLON 86  ID (string) 88  RPAREN 94  EQ 95
  ID (do_nothing2) 99  LPAREN 110  ID (a) 111  PLUS 112  INT (1) 113  RPAREN 114
  FUNCTION 117  ID (do_nothing2) 126  LPAREN 137  ID (d) 138  COLON 139  ID (int) 141  RPAREN 144  EQ 146
  ID (do_nothing1) 150  LPAREN 161  ID (d) 162  COMMA 163  STRING (str) 165  RPAREN 170
  IN 173
  ID (do_nothing1) 177  LPAREN 188  INT (0) 189  COMMA 190  STRING (str2) 192  RPAREN 198
  END 200
  EOF 203


Tokens

Tokens are the atomic units of a language; they are usually specific strings or instances of classes of strings.

  Tokens    Sample Values                  Informal Description
  LET       let                            keyword LET
  END       end                            keyword END
  PLUS      +
  LPAREN    (
  COLON     :
  STRING    "str"
  RPAREN    )
  INT       49, 48                         integer constants
  ID        do_nothing1, a, int, string    letter followed by letters, digits, and underscores
  EQ        =
  EOF                                      end of file
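
To make the token list and the token table concrete, here is a minimal OCaml sketch of one possible token representation; OCaml and the names token, lexeme, tok, pos, and parser_driver are illustrative choices, not the course's actual code, and the sketch assumes the integer printed after each token above is the token's starting position in the source text.

  (* A minimal sketch (not the course's actual datatype) of a token
     representation: each token kind carries its associated value, and
     every token is paired with the position reported by the lexer. *)
  type token =
    | LET | IN | END | FUNCTION | EOF
    | LPAREN | RPAREN | COLON | COMMA | PLUS | EQ
    | ID of string                   (* e.g. ID "do_nothing1" *)
    | INT of int                     (* e.g. INT 1 *)
    | STRING of string               (* e.g. STRING "str" *)

  type lexeme = { tok : token; pos : int }

  (* The interface the parser sees: it repeatedly asks for the next token
     until it receives EOF. *)
  let parser_driver (get_next_token : unit -> lexeme) =
    let rec loop l = if l.tok = EOF then () else loop (get_next_token ()) in
    loop (get_next_token ())

  (* The first few entries of the token list on the slide above. *)
  let example : lexeme list =
    [ { tok = LET;              pos = 51 };
      { tok = FUNCTION;         pos = 56 };
      { tok = ID "do_nothing1"; pos = 65 };
      { tok = LPAREN;           pos = 76 } ]

Keeping the position alongside every token lets later compiler phases report errors in terms of the original source program rather than the token stream.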

Lexical Analysis, How?

First, write down the lexical specification (how is each token defined?) using regular expressions to specify the lexical structure:

  identifier = letter (letter | digit | underscore)*
  letter     = a | ... | z | A | ... | Z
  digit      = 0 | 1 | ... | 9

Second, based on the above lexical specification, build the lexical analyzer (to recognize tokens) either by hand (a sketch of this route appears after the next slide):

  Regular Expression Spec ==> NFA ==> DFA ==> Transition Table ==> Lexical Analyzer

or by using lex, the lexical analyzer generator:

  Regular Expression Spec (in lex format) ==> feed to lex ==> Lexical Analyzer


Regular Expressions

Regular expressions are a concise, linguistic characterization of regular languages (regular sets).

Each regular expression defines a regular language: a set of strings over some alphabet, such as the ASCII characters. Each member of this set is called a sentence, or a word (for example, the identifier language specified above contains the sentences a, x1, and do_nothing1).
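
As a concrete illustration of the hand-written route (Regular Expression Spec ==> NFA ==> DFA ==> Transition Table ==> Lexical Analyzer), here is a small OCaml sketch of the two-state DFA that the identifier specification gives rise to; the function names and the match-based encoding of the transition table are illustrative, not code from the course.

  (* A sketch of the DFA obtained from
       identifier = letter (letter | digit | underscore)*
     State 0 is the start state; state 1 is the only accepting state. *)
  let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
  let is_digit  c = '0' <= c && c <= '9'

  (* Transition function: returns None when the DFA gets stuck. *)
  let step state c =
    match state with
    | 0 -> if is_letter c then Some 1 else None
    | 1 -> if is_letter c || is_digit c || c = '_' then Some 1 else None
    | _ -> None

  (* Run the DFA over a whole string and accept iff it ends in state 1. *)
  let is_identifier s =
    let rec run state i =
      if i = String.length s then state = 1
      else
        match step state s.[i] with
        | None -> false
        | Some state' -> run state' (i + 1)
    in
    run 0 0

  let () =
    assert (is_identifier "do_nothing1");
    assert (not (is_identifier "2cool"))

A real scanner would run such a DFA with the maximal-munch rule, consuming the longest prefix that still leads to an accepting state; the lex route generates the table-driven equivalent of this code directly from the regular expression spec.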