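11-711: Algorithms for NLP
Homework Assignment #1: Formal Language Theory
Solutions

Out: September 10, 2009
Due: September 24, 2009

Problem 1 [10 points]

Prove that, for any deterministic FSA $A = (Q, \Sigma, \delta, q_0, F)$,

$$\hat{\delta}(q, xy) = \hat{\delta}(\hat{\delta}(q, x), y)$$

for $x, y \in \Sigma^*$. Use the definition of $\hat{\delta}$ provided in lecture:

(1) $\hat{\delta}(q, \epsilon) = q$
(2) $\hat{\delta}(q, x\sigma) = \delta(\hat{\delta}(q, x), \sigma)$

where $\epsilon$ is the empty string, $x \in \Sigma^*$, and $\sigma \in \Sigma$.

Solution

The proof is by induction on $|y|$.

Base: $|y| = 0$. If $|y| = 0$, then $y = \epsilon$.

$$
\begin{aligned}
\hat{\delta}(q, xy) &= \hat{\delta}(q, x) && \text{by definition of } y \\
&= \hat{\delta}(\hat{\delta}(q, x), \epsilon) && \text{by definition (1) of } \hat{\delta} \\
&= \hat{\delta}(\hat{\delta}(q, x), y) && \text{by definition of } y
\end{aligned}
$$

Induction: $|y| = n + 1$. We rewrite $y$ as $w\sigma$, where $w \in \Sigma^*$ and $\sigma \in \Sigma$. Thus $|w| = n$, and we assume by the inductive hypothesis that $\hat{\delta}(q, xw) = \hat{\delta}(\hat{\delta}(q, x), w)$.

$$
\begin{aligned}
\hat{\delta}(q, xy) &= \hat{\delta}(q, xw\sigma) && \text{by definition of } y \\
&= \delta(\hat{\delta}(q, xw), \sigma) && \text{by definition (2) of } \hat{\delta} \\
&= \delta(\hat{\delta}(\hat{\delta}(q, x), w), \sigma) && \text{by inductive hypothesis} \\
&= \hat{\delta}(\hat{\delta}(q, x), w\sigma) && \text{by definition (2) of } \hat{\delta} \\
&= \hat{\delta}(\hat{\delta}(q, x), y) && \text{by definition of } y
\end{aligned}
$$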
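The identity proved in Problem 1 can also be spot-checked by brute force on a small machine. The following is a minimal sketch, not part of the original solution: the two-state DFA (tracking the parity of the number of a's) and the names `DELTA`, `delta_hat`, and `check_identity` are illustrative assumptions. `delta_hat` follows definitions (1) and (2) literally, peeling off the last symbol of the string.

```python
from itertools import product

# Hypothetical example DFA over {a, b}: the state records whether the
# number of a's read so far is even or odd.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
STATES = ("even", "odd")


def delta_hat(q, w):
    """Extended transition function, following definitions (1) and (2)."""
    if w == "":
        return q  # definition (1): reading the empty string changes nothing
    x, sigma = w[:-1], w[-1]  # split w as x followed by a final symbol sigma
    return DELTA[(delta_hat(q, x), sigma)]  # definition (2)


def check_identity(max_len=4):
    """Check delta_hat(q, xy) == delta_hat(delta_hat(q, x), y) exhaustively
    for every state q and all strings x, y of length at most max_len."""
    strings = ["".join(p)
               for n in range(max_len + 1)
               for p in product("ab", repeat=n)]
    return all(
        delta_hat(q, x + y) == delta_hat(delta_hat(q, x), y)
        for q in STATES for x in strings for y in strings
    )
```

Calling `check_identity()` exercises the identity on every pair of strings up to the chosen length. Such a check is only evidence for one particular machine; the induction above is what establishes the claim for every deterministic FSA.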

Problem 2 [20 points]

Give deterministic FSAs accepting the following languages:

1. [6 points] The set of strings over $\{a, b, c\}$ in which all the a's precede the b's, which in turn precede the c's. It is possible that there are no a's, b's, or c's. (Sudkamp Problem 6.5)
2. [7 points] The set of strings over $\{a, b, c\}$ in which every b is immediately followed by at least one c. (Sudkamp Problem 6.10)
3. [7 points] The set of strings over $\{0, 1\}$ such that the third symbol from the right end is the same as the last symbol.
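As an illustration, the three languages can be encoded as table-driven DFAs and simulated. This is a sketch of one possible construction under stated assumptions, not a transcription of the original solution diagrams: the state names, the helper `run_dfa`, and the three `accepts_*` functions are all hypothetical. For the third language, the sketch assumes strings shorter than three symbols are rejected, and it uses a state for every suffix of length at most three, which is easy to verify but not minimal.

```python
from itertools import product


def run_dfa(delta, start, accepting, s):
    """Simulate a DFA; delta maps (state, symbol) to the next state."""
    state = start
    for symbol in s:
        state = delta[(state, symbol)]
    return state in accepting


# 2.1: a's, then b's, then c's. State = latest block entered; "dead" is a trap.
ORDERED = {
    ("A", "a"): "A", ("A", "b"): "B", ("A", "c"): "C",
    ("B", "a"): "dead", ("B", "b"): "B", ("B", "c"): "C",
    ("C", "a"): "dead", ("C", "b"): "dead", ("C", "c"): "C",
    ("dead", "a"): "dead", ("dead", "b"): "dead", ("dead", "c"): "dead",
}

# 2.2: every b immediately followed by a c. "need_c" means a b was just read.
B_THEN_C = {
    ("ok", "a"): "ok", ("ok", "b"): "need_c", ("ok", "c"): "ok",
    ("need_c", "a"): "dead", ("need_c", "b"): "dead", ("need_c", "c"): "ok",
    ("dead", "a"): "dead", ("dead", "b"): "dead", ("dead", "c"): "dead",
}

# 2.3: the state is the suffix (up to three symbols) read so far.
SUFFIXES = ["".join(p) for n in range(4) for p in product("01", repeat=n)]
THIRD = {(q, bit): (q + bit)[-3:] for q in SUFFIXES for bit in "01"}
THIRD_ACCEPT = {q for q in SUFFIXES if len(q) == 3 and q[0] == q[2]}


def accepts_ordered(s):
    return run_dfa(ORDERED, "A", {"A", "B", "C"}, s)


def accepts_b_then_c(s):
    return run_dfa(B_THEN_C, "ok", {"ok"}, s)


def accepts_third(s):
    return run_dfa(THIRD, "", THIRD_ACCEPT, s)
```

For instance, `accepts_ordered("aabcc")` holds while `accepts_ordered("ba")` does not, and `accepts_third("010")` holds while `accepts_third("0100")` does not.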
Please note that there are simpler solutions for this question than the one given above.

Problem 3 [20 points]

Give non-deterministic FSAs (possibly with $\epsilon$-moves) accepting the following languages:

1. [10 points] The set of strings over $\{a, b\}$

