ECE 251 Assignment #2: Be the ANTLR
Patrick Lam
Due: November 5

1 Problem Description

One of the themes I've been repeating this term is that compiler technology is useful for many tasks besides building compilers for general-purpose programming languages. SQL, or the Structured Query Language, is a ubiquitous domain-specific language for talking to databases. Basic SQL is not difficult to pick up, but it is beyond the scope of this course. However, the parsing of SQL is very much on-topic for this course, and it is actually fairly simple.

In this lab, you will build a lexer and parser for a small SQL subset by hand, using the recursive-descent parser construction techniques we saw in class. Please do not use a parser generator for this assignment; I would like you to build at least one parser by hand in this course.

You may wish to consult the SQLite documentation's syntax diagrams for information on SQL:

http://www.sqlite.org/syntaxdiagrams.html

We will only be implementing a subset of this language.

2 Task 1: Lexical Analysis

We've seen that the first two tasks in creating a compiler are lexical analysis and parsing. The first task will be to create a lexer for your language, which will account for 20% of the marks for this lab. Specifically, I provide a class which splits the stream of characters into a stream of words. Your task is to create Tokens for these words by plugging the appropriate regular expressions into the Token class.

I've put up a simple test suite for lexical analysis. You should also create a couple of test cases (but, this time, you don't need to hand them in). I'll only run the test cases that I post.

Token specifications. SQL is case-insensitive. Your lexer must differentiate between keywords, identifiers, and literals (boolean, numeric and string). The enum type Token.Type contains all of the tokens that you need to recognize. Keywords are obvious.
Identifiers start with a letter (a-z) or an underscore and continue with letters, underscores, and digits; alternatively, an identifier may contain arbitrary characters between two double-quote marks ("). (To include a double quote, write two double quotes.) Boolean literals may be the strings TRUE or FALSE.
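
As an illustration of the identifier and boolean rules above, here is a minimal sketch in Java. The names TokenPatternSketch, Kind, and classify are hypothetical stand-ins: the provided Token class and the Token.Type enum are not shown in this excerpt, so the real code may be organized differently. The sketch also assumes that an empty quoted identifier ("") is acceptable, which the text does not say either way.

```java
import java.util.regex.Pattern;

class TokenPatternSketch {
    // Hypothetical token kinds; the assignment's real Token.Type enum is not shown here.
    enum Kind { BOOLEAN, IDENTIFIER, UNKNOWN }

    // TRUE or FALSE, matched case-insensitively because SQL is case-insensitive.
    static final Pattern BOOLEAN =
        Pattern.compile("true|false", Pattern.CASE_INSENSITIVE);

    // Bare form: a letter or underscore, then letters, underscores, and digits.
    // Quoted form: any characters between double quotes, where "" stands for one quote.
    static final Pattern IDENTIFIER =
        Pattern.compile("[a-z_][a-z_0-9]*|\"(?:[^\"]|\"\")*\"", Pattern.CASE_INSENSITIVE);

    // Classify one word; matches() requires the whole word to fit the pattern.
    static Kind classify(String word) {
        if (BOOLEAN.matcher(word).matches()) return Kind.BOOLEAN;
        if (IDENTIFIER.matcher(word).matches()) return Kind.IDENTIFIER;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] samples = { "TRUE", "false", "_tmp1", "\"a \"\"b\"\" c\"", "9lives" };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The boolean check runs before the identifier check because the words TRUE and FALSE also fit the bare identifier pattern. The same ordering concern would apply to keywords in a fuller lexer.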
This note was uploaded on 10/28/2010 for the course ECE 493 taught by Professor Lam during the Spring '09 term at Waterloo.