Lecture 11 Notes

... units. This grouping is divided into two stages: scanning and parsing.

Scanning divides the sequence of characters into words, punctuation, etc. These units are called lexical items, lexemes, or, most often, tokens. We refer to this as the lexical structure of the language.

Parsing organizes the sequence of tokens into hierarchical syntactic structures such as expressions, statements, and blocks. This is like organizing (diagramming) an English sentence into clauses, etc. We refer to this as the syntactic or grammatical structure of the language.

Typical pieces of lexical specification:

  - Any sequence of spaces and newlines is equivalent to a single space.
  - A comment begins with % and continues until the end of the line.
  - An identifier is a sequence of letters and digits starting with a letter, and a variable is an identifier that is not a keyword.

Example: the scanner must distinguish punctuation and keywords from identifiers, and it ignores spaces and comments. Given the input

    foo bar %here is a comment
    ) begin baz

the pieces are classified as follows:

    foo                   ident
    (space)               ignored
    bar                   ident
    %here is a comment    comment, ignored
    )                     ")"
    begin                 "begin"
    baz                   ident

2.4.1 What’s in a token?

The data structure for a token consists of three pieces:

  1. A class, a Scheme symbol that describes what kind of token you’ve found. The set of classes is part of the lexical specification.

  2. A piece of data describing the particular token. The nature of this data is also part of the lexical specification. For our system, the data will be as follows:

       - For an identifier, the datum is a Scheme symbol built from the string in the token. In a language that didn’t have symbols, we might instead use a string (the name of the identifier) or an entry in a hash table indexed by identifiers (a symbol table). Using Scheme spares us these annoyances.
       - For a number, the datum is the number described by the number literal.
       - For a literal string (used for keywords and punctuation), the datum is the string.

  3. Debugging information, such as line and character numbers.

The job of the scanner is to go through the input and analyze it to produce these tokens.
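The three-piece token and the lexical rules above can be sketched in a few lines. This is a minimal illustration, not the scanner used in the course. It is written in Python, so plain strings stand in for Scheme symbols. The class names ("ident", "number", "literal-string"), the keyword set, the punctuation set, and the names Token and scan are all assumptions made for the example.

```python
import string
from typing import List, NamedTuple, Union

# Assumed for illustration: the notes do not list the keywords or punctuation.
KEYWORDS = {"begin", "end"}
PUNCTUATION = {"(", ")"}
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


class Token(NamedTuple):
    cls: str                # token class (a string stands in for a Scheme symbol)
    datum: Union[str, int]  # identifier name, number value, or literal text
    line: int               # debugging information: 1-based line number
    column: int             # debugging information: 1-based column number


def scan(text: str) -> List[Token]:
    """Split text into tokens, ignoring whitespace and % comments."""
    tokens = []
    i, line, col = 0, 1, 1
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            i, line, col = i + 1, line + 1, 1
        elif ch.isspace():
            i, col = i + 1, col + 1
        elif ch == "%":
            # A comment runs to the end of the line; the newline itself is
            # handled by the first branch, which also resets the column.
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch in LETTERS:
            start = i
            while i < len(text) and (text[i] in LETTERS or text[i] in DIGITS):
                i += 1
            word = text[start:i]
            # A keyword looks like an identifier but gets the literal class.
            cls = "literal-string" if word in KEYWORDS else "ident"
            tokens.append(Token(cls, word, line, col))
            col += i - start
        elif ch in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(Token("number", int(text[start:i]), line, col))
            col += i - start
        elif ch in PUNCTUATION:
            tokens.append(Token("literal-string", ch, line, col))
            i, col = i + 1, col + 1
        else:
            raise SyntaxError(
                f"unexpected character {ch!r} at line {line}, column {col}")
    return tokens
```

Under these assumptions, scan("foo bar %here is a comment\n) begin baz") yields five tokens: ident foo, ident bar, literal-string ")", literal-string "begin", and ident baz. The spaces and the comment produce no tokens, and each token records the line and column where it started.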
