MP 7 – A Lexer for PicoML
CS 421 – Fall 2007
Revision 1.

Assigned October 16, 2007
Due October 23, 2007, at 23:59
Extension 48 hours (20% penalty)

1 Change Log

1.1 Corrected "string to int" and "string to float" to "string of int" and "float of string".
1.0 Initial Release.

2 Overview

To complete this MP, make sure you are familiar with the lectures on DFAs and NFAs, regular expressions, and lexing. After completing this MP, you should understand how to implement a practical lexer using a lexer generator such as Lex. You should also come to appreciate having lexer generators available, instead of having to code a lexer completely from scratch.

The language we are making a parser for is called PicoML, which is basically a subset of OCaml. It includes functions, lists, integers, strings, let expressions, etc.

3 Overview of Lexical Analysis (Lexing)

Recall from lecture that the process of transforming program code (i.e., ASCII or Unicode text) into an abstract syntax tree (AST) has two parts. First, the lexical analyzer (lexer) scans over the text of the program and converts it into a sequence of tokens, usually represented as values of a user-defined disjoint datatype. These tokens are then fed into the parser, which builds the actual AST.

Note that it is not the job of the lexer to check for correct syntax; this is done by the parser. In fact, our lexer will accept (and correctly tokenize) strings such as "if if let let if if else", which are not valid programs.

4 Lexer Generators

The tokens of a programming language are specified using regular expressions, and thus the lexing process involves a great deal of regular-expression matching. It would be tedious to take the specification for the tokens of our language, convert the regular expressions to a DFA, and then implement the DFA in code to actually scan the text.
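To make the tedium concrete, here is a minimal hand-written tokenizer sketch in OCaml. It is an illustration only: the token type, the constructor names (IF, ELSE, LET, INT, IDENT), and the function tokenize are assumptions made for this example, not the MP's actual definitions, and the sketch covers only a tiny fragment of what a PicoML lexer would need.

```ocaml
(* Hypothetical token type for illustration; NOT the MP's actual tokens. *)
type token =
  | IF
  | ELSE
  | LET
  | INT of int
  | IDENT of string

let is_digit c = c >= '0' && c <= '9'
let is_lower c = c >= 'a' && c <= 'z'
let is_ident_char c =
  is_lower c || (c >= 'A' && c <= 'Z') || is_digit c || c = '_'
let is_space c = c = ' ' || c = '\t' || c = '\n'

(* Advance from index i while pred holds; return the first index where it fails. *)
let rec scan_while pred s i =
  if i < String.length s && pred s.[i] then scan_while pred s (i + 1) else i

(* Map a scanned word to a keyword token, or to an identifier otherwise. *)
let keyword_or_ident = function
  | "if" -> IF
  | "else" -> ELSE
  | "let" -> LET
  | word -> IDENT word

(* Tokenize a whole string; raises Failure on a character no case accepts. *)
let tokenize (s : string) : token list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let c = s.[i] in
      if is_space c then go (i + 1) acc
      else if is_digit c then
        let j = scan_while is_digit s i in
        go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      else if is_lower c then
        let j = scan_while is_ident_char s i in
        go j (keyword_or_ident (String.sub s i (j - i)) :: acc)
      else failwith (Printf.sprintf "unexpected character %C at offset %d" c i)
  in
  go 0 []
```

With these definitions, tokenize "if if let let if if else" evaluates to [IF; IF; LET; LET; IF; IF; ELSE]: the string is tokenized without complaint even though it is not a valid program, because rejecting it is the parser's job. Every new token class adds another hand-coded branch and scanning loop, which is exactly the bookkeeping a lexer generator takes over.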
Instead, most languages come with tools that automate much of the process of implementing a lexer in those languages. To implement a lexer with these tools, you simply need to define the lexing behavior in the tool's specification language. The tool will then compile your specification into source code for an actual lexer that you can use. In this MP, we will use a tool called ocamllex to build our lexer.

4.1 ocamllex specification

The lexer specification for ocamllex is documented here:

http://caml.inria.fr/pub/docs/manual-ocaml/manual026.html

What follows below is only the short version. If it doesn't make sense, or you need more details, consult the link above. You will need to become especially familiar with ocamllex's regular expression syntax....
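As a rough illustration of what such a specification can look like, here is a small ocamllex sketch. The token type, the rule name token, and the named regular expressions digit and ident are all assumptions chosen for this example; they are not the definitions the MP provides or expects. The file would be saved with an .mll extension and compiled by ocamllex into an OCaml source file.

```ocaml
{
(* Header: plain OCaml copied into the generated lexer.
   Hypothetical token type for illustration only. *)
type token =
  | IF
  | ELSE
  | LET
  | INT of int
  | IDENT of string
  | EOF
}

(* Named regular expressions. *)
let digit = ['0'-'9']
let ident = ['a'-'z'] ['a'-'z' 'A'-'Z' '0'-'9' '_']*

(* Each case pairs a regular expression with an OCaml action. *)
rule token = parse
  | [' ' '\t' '\n']   { token lexbuf }            (* skip whitespace *)
  | "if"              { IF }
  | "else"            { ELSE }
  | "let"             { LET }
  | digit+ as n       { INT (int_of_string n) }
  | ident as s        { IDENT s }
  | eof               { EOF }
  | _ as c            { failwith (Printf.sprintf "unexpected character %C" c) }
```

ocamllex picks the longest match and, on a tie, the case listed first. That is why the keyword cases appear before ident: the input "if" matches both and yields IF, while a longer word such as "iffy" matches only ident in full and yields an identifier.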