COP4020 Programming Languages
Programming Assignment 1: Lexical Analyzer

Lexical Analyzer

In this assignment, you will write a lexical analyzer for a Pascal-like language called PASC. The analyzer will be written in Lex.

Token Specification

The table on page 3 defines the tokens that must be recognized, with their associated symbolic names. All multi-symbol tokens are separated by blanks, tabs, newlines, comments, or delimiters. Comments are enclosed in (* ... *) and cannot be nested.

An identifier is a sequence of letters (upper or lower case) and digits, beginning with a letter. PASC does not distinguish upper and lower case. There is no limit on the length of identifiers. However, you may impose limits on the total number of distinct identifiers and string lexemes, and on the total number of characters in all distinct identifiers and strings taken together. There should be no other limitation on the number of lexemes that the lexical analyzer will process.

An integer constant is an unsigned sequence of digits representing a base-10 number. A character constant is a single character enclosed in single quotes (e.g., 'a' is the character constant a). A string constant is a sequence of characters surrounded by single quotes (e.g., 'Hello, world'). The internal representations of character constants and string constants are different in PASC.

Hard-to-type or invisible characters can be represented in character and string constants by escape sequences; these sequences look like two characters but represent only one. The escape sequences supported by the PASC language are \n for newline, \t for tab,...
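
The separator, identifier, and integer-constant rules described above can be illustrated with a small hand-written scanner in plain C. This is only a sketch under stated assumptions, not the Lex specification the assignment asks for: the token kinds (TOK_ID, TOK_INT, TOK_OTHER, TOK_EOF) and the function names skip_separators and next_token are hypothetical and do not come from the assignment's token table. It also truncates lexemes to the caller's buffer, which a real analyzer could not do for identifiers, since their length is unlimited.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only; the assignment's real
   symbolic names are defined in its token table. */
enum tok_kind { TOK_EOF, TOK_ID, TOK_INT, TOK_OTHER };

/* Skip blanks, tabs, newlines and non-nested (* ... *) comments.
   Returns a pointer to the first character that can start a token. */
const char *skip_separators(const char *p)
{
    assert(p != NULL);
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '(' && p[1] == '*') {
            /* Comments do not nest, so the first "*)" closes the comment. */
            const char *end = strstr(p + 2, "*)");
            if (end == NULL)
                return p + strlen(p);   /* unterminated: consume the rest */
            p = end + 2;
        } else {
            return p;
        }
    }
}

/* Scan one token starting at *pp and advance *pp past it. The lexeme is
   copied into buf (truncated to fit); identifiers are folded to lower
   case because PASC does not distinguish case. */
enum tok_kind next_token(const char **pp, char *buf, size_t bufsize)
{
    assert(pp != NULL && *pp != NULL && buf != NULL && bufsize > 0);
    const char *p = skip_separators(*pp);
    size_t n = 0;
    enum tok_kind kind;

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        /* Identifier: a letter followed by letters or digits. */
        kind = TOK_ID;
        while (isalnum((unsigned char)*p)) {
            if (n + 1 < bufsize)
                buf[n++] = (char)tolower((unsigned char)*p);
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {
        /* Integer constant: an unsigned run of decimal digits. */
        kind = TOK_INT;
        while (isdigit((unsigned char)*p)) {
            if (n + 1 < bufsize)
                buf[n++] = *p;
            p++;
        }
    } else {
        /* Anything else is returned as a one-character token. */
        kind = TOK_OTHER;
        if (n + 1 < bufsize)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *pp = p;
    return kind;
}
```

Calling next_token repeatedly on the input "(* c *) Count1 042" would yield an identifier with the lexeme count1 and then an integer constant with the lexeme 042. In a Lex specification the same rules would normally be written as patterns, with the case folding done in the action attached to the identifier pattern.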
This note was uploaded on 02/20/2012 for the course COP 4020 taught by Professor Engelen during the Spring '11 term at FSU.