ECS 120 Lesson 11 – Chomsky Normal Form
Oliver Kreylos
Monday, April 23rd, 2001

Today we are going to look at a special way to write down context-free grammars that will make reasoning about them easier. This special form was introduced by Noam Chomsky himself and is called the Chomsky Normal Form (CNF). We will show that for every context-free grammar G, there is an equivalent grammar G′ that is in Chomsky Normal Form. The constructive proof of this claim will provide an algorithm to transform G into G′.

1 Definition of Chomsky Normal Form

A context-free grammar G = (V, Σ, R, S) is said to be in Chomsky Normal Form (CNF) if and only if every rule in R is of one of the following forms:

1. A → a, for some A ∈ V and some a ∈ Σ
2. A → BC, for some A ∈ V and B, C ∈ V \ {S}
3. S → ε

In other words: every rule replaces a variable either by a single character or by a pair of variables, neither of which is the start symbol, and the only rule that can have the empty word ε as its right-hand side must have the start symbol as its left-hand side. From the above definition it follows that every parse tree for a grammar in CNF must be a binary tree, and that the parse tree for any nonempty word cannot have any leaves labeled with ε.

The purpose of the Chomsky Normal Form is to make many of the proofs about context-free languages we will encounter later much easier, by allowing us to assume that every context-free grammar we want to reason about is in Chomsky Normal Form. We will first see the usefulness of CNF in the proof of the Context-Free Pumping Lemma.

2 Transforming a Grammar to CNF

In order to construct the grammar G′ in CNF that is equivalent to a given grammar G, we first have to identify how exactly G can violate the rules for a CNF. Since the CNF only restricts the rules in G, we have to look only at R. Here are the "bad" cases of rules:

1. A → uSv, where A ∈ V and u, v ∈ (V ∪ Σ)*. The start symbol must not appear on the right-hand side of any rule. We call rules of this type start symbol rules.
2. A → ε, where A ∈ V \ {S}. The only symbol that can be replaced by the empty word is the start symbol. We call rules of this type ε-rules.
3. A → B, where A, B ∈ V. Any rule with variables on its right-hand side must have exactly two of them. We call rules of this type unit rules.
4. A → w, where A ∈ V, w ∈ (V ∪ Σ)*, and w contains at least one character and at least one variable. Any rule with a character on its right-hand side must have exactly that one character as its right-hand side. We call rules of this type mixed rules.
5. A → w, where A ∈ V and w ∈ (V ∪ Σ)* with |w| > 2. Rules must have either one symbol (one character) or two symbols (two variables) as the right-hand side. We call rules of this type long rules.

...
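
The definition and the five "bad" cases above can be checked mechanically. The following is a minimal Python sketch, not part of the original notes. It assumes a rule is stored as a pair (left-hand side, tuple of right-hand-side symbols), that the empty tuple stands for ε, and that every symbol not in the variable set counts as a character. The names `violations` and `is_cnf` are made up for this illustration.

```python
from typing import List, Set, Tuple

# Assumed representation: (left-hand side, right-hand side); () stands for ε.
Rule = Tuple[str, Tuple[str, ...]]


def violations(rule: Rule, variables: Set[str], start: str) -> List[str]:
    """Return the names of the "bad" cases that a single rule falls under."""
    lhs, rhs = rule
    found = []
    if start in rhs:                              # case 1: A → uSv
        found.append("start symbol rule")
    if len(rhs) == 0 and lhs != start:            # case 2: A → ε with A ≠ S
        found.append("ε-rule")
    if len(rhs) == 1 and rhs[0] in variables:     # case 3: A → B
        found.append("unit rule")
    has_variable = any(sym in variables for sym in rhs)
    has_character = any(sym not in variables for sym in rhs)
    if has_variable and has_character:            # case 4: characters and variables mixed
        found.append("mixed rule")
    if len(rhs) > 2:                              # case 5: |w| > 2
        found.append("long rule")
    return found


def is_cnf(rules: List[Rule], variables: Set[str], start: str) -> bool:
    """Test the three allowed rule forms of the CNF definition directly."""
    for lhs, rhs in rules:
        if len(rhs) == 0:
            ok = lhs == start                     # S → ε
        elif len(rhs) == 1:
            ok = rhs[0] not in variables          # A → a
        elif len(rhs) == 2:
            ok = all(s in variables and s != start for s in rhs)  # A → BC
        else:
            ok = False
        if not ok:
            return False
    return True


if __name__ == "__main__":
    # Example grammar S → aSb | ε, which is not in CNF.
    variables = {"S"}
    rules = [("S", ("a", "S", "b")), ("S", ())]
    for rule in rules:
        print(rule, violations(rule, variables, "S"))
    print("in CNF:", is_cnf(rules, variables, "S"))
```

For the rule S → aSb, the sketch reports a start symbol rule, a mixed rule, and a long rule at the same time, which illustrates that the five cases are not mutually exclusive. The rule S → ε is reported as fine, since the start symbol may be replaced by the empty word.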
