Supplementary: Finite state machines

1 Purpose and background

This document covers (for CS 136 students) a general method for breaking an input stream or a list of characters into syntactic elements called tokens, using a model of computation called a finite state machine. It can be read during or after lecture module 02 on interaction. The method described here forms the basis for tokenizing in CS 241; students will be given “scanner” software very similar to what is developed in this document, and will have to modify and extend it.

2 General tokenizing

In CS 135, we saw how to break a list of characters into “lines” using newline characters, or into “words” using whitespace. Lecture module 02 of CS 136 discusses how to write the read-token function, which reads one of four types of tokens (left/right parens, names, and numbers) from the input, using read-char and peek-char to handle one character at a time. This was a low-level helper function for a simple S-expression reader.

read-token used the fact that looking at the first character of a token determined its type. This is not always true. For example, there are many different styles of specifying Scheme constants using #. Adding more conditional code to make-token could be tricky. Consider the example of treating 123abc as a name ('123abc is a legal symbol in Scheme). As make-token was presented, it applies one of two helper functions, make-name or make-number, based on the first character. We would have to change the code for make-number to suddenly switch to the task of recognizing a name.

One approach to tokenizing that is general enough to handle such situations is based on the idea of a finite state machine (FSM). We will develop the idea in the context of deciding whether a list of characters forms a valid token. It can easily be extended to breaking a list of characters into tokens as in CS 135, or reading tokens from the input as in CS 136.
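
The first-character problem described above can be made concrete with a small sketch. It is written in Python rather than the course's Scheme, and the name classify_by_first_char and its category labels are hypothetical; this is not the course's read-token or make-token code. The function picks a token type from the first character alone, so it labels 123abc a number even though we want to treat it as a name.

```python
def classify_by_first_char(text):
    """Guess a token's type from its first character only.

    Hypothetical illustration of first-character dispatch; the
    category labels are assumptions, not the course's definitions.
    """
    if not text:
        return "empty"
    first = text[0]
    if first == "(":
        return "lparen"
    if first == ")":
        return "rparen"
    if "0" <= first <= "9":
        # Commits to "number" without looking at the rest of the token.
        return "number"
    return "name"


if __name__ == "__main__":
    for sample in ["(", "abc", "123", "123abc"]:
        print(sample, classify_by_first_char(sample))
```

Once the function has committed to "number" for 123abc, the only way to recover is to teach the number-handling code how to become name-handling code partway through, which is the awkward change the text describes.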

3 Finite state machines

The idea is to maintain the notion of a current state while scanning the list. We will represent the state by a symbol. The beginning state, for example, can be the symbol 'start ....
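
As a rough illustration of keeping a current state while scanning a list of characters, here is a minimal sketch in Python rather than Scheme. Strings stand in for Scheme symbols, so "start" plays the role of 'start. The other state names, the character classes, and the transition table are all assumptions made for this example, not the machine developed in the course. The sketch decides whether a list of characters forms a single valid token, and it classifies 123abc as a name by moving from the number state to the name state.

```python
def char_class(ch):
    """Map a character to a coarse class (assumed classes for this sketch)."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == "(":
        return "lparen"
    if ch == ")":
        return "rparen"
    if ch.isspace():
        return "space"
    return "other"


# (current state, character class) -> next state.
# Any pair missing from the table means "no transition": not a valid token.
TRANSITIONS = {
    ("start", "digit"): "number",
    ("start", "other"): "name",
    ("start", "lparen"): "lparen",
    ("start", "rparen"): "rparen",
    ("number", "digit"): "number",
    ("number", "other"): "name",   # 123abc: a number so far, now a name
    ("name", "digit"): "name",
    ("name", "other"): "name",
}

# States in which the characters seen so far form a complete token.
ACCEPTING = {"number", "name", "lparen", "rparen"}


def token_type(chars):
    """Return the final state if chars form one valid token, else None."""
    state = "start"
    for ch in chars:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state if state in ACCEPTING else None


if __name__ == "__main__":
    for sample in ["123", "123abc", "abc", "(", "((", "a b", ""]:
        print(repr(sample), token_type(list(sample)))
```

Because the rules live in a table rather than in nested conditionals, handling a new kind of token under these assumptions means adding states and table entries instead of rewriting a helper function.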
This note was uploaded on 10/21/2010 for the course CS 136 taught by Professor Becker during the Winter '08 term at Waterloo.