units. This grouping is divided into two stages: scanning and parsing.
Scanning divides the sequence of characters into words, punctuation, etc.
These units are called lexical items, lexemes, or most often tokens. We refer to this
as the lexical structure of the language.
Parsing organizes the sequence of tokens into hierarchical syntactic structures
such as expressions, statements, and blocks. This is like organizing (diagramming) an English sentence into clauses, etc.
We refer to this as the syntactic or grammatical structure of the language.
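The two stages can be sketched in a few lines of Python. This is only an illustration, not the system these notes describe: the function names `scan` and `parse` and the tiny parenthesized input language are assumptions chosen to keep the example short. Scanning yields a flat list of tokens; parsing turns that list into a nested structure.

```python
import re

def scan(text):
    """Stage 1 (scanning): split the characters into words and punctuation."""
    # Hypothetical token shapes: identifiers, integers, and a few punctuation marks.
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|\d+|[()+*-]", text)

def parse(tokens):
    """Stage 2 (parsing): group the flat token list into nested lists,
    one nesting level per pair of parentheses."""
    def parse_items(pos):
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                sub, pos = parse_items(pos + 1)
                if pos >= len(tokens):
                    raise SyntaxError("missing ')'")
                pos += 1  # skip the closing ")"
                items.append(sub)
            else:
                items.append(tokens[pos])
                pos += 1
        return items, pos

    tree, pos = parse_items(0)
    if pos != len(tokens):
        raise SyntaxError("unexpected ')'")
    return tree

tokens = scan("(add x (mul y 2))")   # flat: lexical structure
tree = parse(tokens)                 # nested: syntactic structure
```

Here `tokens` is a flat sequence, while `tree` nests the inner `(mul y 2)` group inside the outer one, which is the hierarchical structure the parser is responsible for.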
Typical pieces of lexical specification:
Any sequence of spaces and newlines is equivalent to a single space.
A comment begins with % and continues until the end of the line.
An identifier is a sequence of letters and digits starting with a letter, and a variable is an identifier that is not a keyword.
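Rules like these are commonly written as regular expressions. The sketch below is one possible rendering in Python, under assumptions: comments start with `%` (as in the example input in these notes), and the keyword set is hypothetical, since the notes do not list the keywords. The names `KEYWORDS`, `is_variable`, and `strip_layout` are illustrative only.

```python
import re

# Hypothetical keyword set; the notes do not enumerate the keywords.
KEYWORDS = {"begin", "end"}

WHITESPACE = re.compile(r"[ \n]+")                 # any run of spaces and newlines
COMMENT = re.compile(r"%[^\n]*")                   # "%" up to the end of the line
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # a letter, then letters or digits

def is_variable(word):
    """A variable is an identifier that is not a keyword."""
    return IDENTIFIER.fullmatch(word) is not None and word not in KEYWORDS

def strip_layout(text):
    """Drop comments, then collapse each whitespace run to a single space."""
    return WHITESPACE.sub(" ", COMMENT.sub("", text)).strip()
```

For example, `strip_layout("foo   bar %here is a comment\n) begin baz")` gives `"foo bar ) begin baz"`, and `is_variable("begin")` is false because `begin` is treated as a keyword here.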
Example (the scanner must distinguish punctuation and keywords from identifiers):
Input:   foo bar %here is a comment
Tokens:  ident ident
Input:   ) begin baz
Tokens:  ")" "begin" ident
2.4.1 What’s in a token?
The data structure for a token consists of three pieces:
A class, a Scheme symbol that describes what kind of token you’ve found. The set of classes is part of the lexical specification.
A piece of data describing the particular token. The nature of this data is also part of the lexical specification. For our system, the data will be as follows:
  For identifiers, the datum is a Scheme symbol built from the string in the token.
  For a number, the datum is the number described by the number literal.
  For a literal string, the datum is the string (used for keywords and punctuation).
  In a language that didn’t have symbols, we might use a string (the name of the identifier), or an entry into a hash table indexed by identifiers (a symbol table) instead. Using Scheme spares us these annoyances.
Debugging information, such as line and character numbers.
The job of the scanner is to go through the input and analyze it to produce these tokens....
- Fall '09