Chomsky normal form: Difference between revisions

Revision as of 23:21, 3 February 2014

In formal language theory, a context-free grammar is said to be in Chomsky normal form (named for Noam Chomsky) if all of its production rules Template:Disambiguation needed are of the form:

A \to B C

or

A \to α

or

S \to ε

,

where $A$ , $B$ and $C$ are nonterminal symbols, $α$ is a terminal symbol (a symbol that represents a constant value), $S$ is the start symbol, and $ε$ is the empty string. Also, neither $B$ nor $C$ may be the start symbol, and the third production rule can only appear if $ε$ is in $L (G)$ , namely, the language produced by the context-free grammar $G$ .

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as Hopcroft and Ullman, 1979.^[1] As pointed out by Lange and Leiß,^[2] the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. The size of a grammar is the sum of the sizes of its production rules, where the size of a rule is one plus the length of its right-hand side. Using $| G |$ to denote the size of the original grammar $G$ , the size blow-up in the worst case may range from $| G |^{2}$ to $2^{2 | G |}$ , depending on the transformation algorithm used.

Alternative definition

Another way to define the Chomsky normal form (e.g., Hopcroft and Ullman 1978,and Hopcroft et al. 2006) is:

A formal grammar is in Chomsky reduced form if all of its production rules are of the form:

A \to B C

or

A \to α

,

where $A$ , $B$ and $C$ are nonterminal symbols, and $α$ is a terminal symbol. When using this definition, $B$ or $C$ may be the start symbol. Only those context-free grammars which do not generate the empty string can be transformed into Chomsky reduced form.

Converting a grammar to Chomsky Normal Form

Introduce $S_{0}$
Introduce a new start variable, $S_{0}$ and a new rule $S_{0} \to S$ where $S$ is the previous start variable.
Eliminate all $ε$ rules
$ε$ rules are rules of the form $A \to ε$ , where $A = S_{0}$ and $A \in V$ , where $V$ is the CFG's variable alphabet.

Remove every rule with $ε$ on its right hand side (RHS). For each rule with $A$ in its RHS, add a set of new rules consisting of the different possible combinations of $A$ replaced or not replaced with $ε$ . If a rule has $A$ as a singleton on its RHS, add a new rule $R = A \to ε$ unless $R$ has already been removed through this process. For example, examine the following grammar $G$ :
$S \to A b A ∣ B$

$B \to b ∣ c$

$A \to ε$

$G$ has one $ε$ rule. When the $A \to ε$ is removed, we get the following:
$S \to A b A ∣ A b ∣ b A ∣ b ∣ B$

$B \to b ∣ c$

Notice that we have to account for all possibilities of $A \to ε$ and so we actually end up adding 3 rules.
Eliminate all unit rules
$A \to B; A, B \in V$

After all the $ε$ rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF).
To remove $A \to B$

$\forall B \to U$ , where $U$ is a string of variables and terminals, add rule $A \to U$ unless this is a unit rule which has already been removed.
Clean up the remaining rules that are not in Chomsky normal form.
Replace $A \to u_{1} u_{2} \dots u_{k}, k \geq 3, u_{1} \in V \cup Σ$ with $A \to u_{1} A_{1}, A_{1} \to u_{2} A_{2}, \dots, A_{k - 2} \to u_{k - 1} u_{k}$ , where $A_{i}$ are new variables.

If $u_{i} \in Σ$ , replace $u_{i}$ in above rules with some new variable $V_{i}$ and add rule $V_{i} \to u_{i}$ .

Footnotes

↑ * John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation, 3rd Edition, Addison-Wesley, 2006. ISBN 0-321-45536-3 (see subsection 7.1.5, page 272.)
↑ Lange, Martin and Leiß, Hans. To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm. Informatica Didactica 8, 2009.

References

John E. Hopcroft and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-02988-X. (See chapter 4.)
20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

My blog: http://www.primaboinca.com/view_profile.php?userid=5889534 (Pages 98–101 of section 2.1: context-free grammars. Page 156.)
20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

My blog: http://www.primaboinca.com/view_profile.php?userid=5889534 (Pages 237–240 of section 6.6: simplified forms and normal forms.)
20 year-old Real Estate Agent Rusty from Saint-Paul, has hobbies and interests which includes monopoly, property developers in singapore and poker. Will soon undertake a contiki trip that may include going to the Lower Valley of the Omo.

My blog: http://www.primaboinca.com/view_profile.php?userid=5889534 (Pages 103–106.)
Cole, Richard. Converting CFGs to CNF (Chomsky Normal Form), October 17, 2007. (pdf)
Sipser, Michael. Introduction to the Theory of Computation, 2nd edition.

[1] * John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation, 3rd Edition, Addison-Wesley, 2006. ISBN 0-321-45536-3 (see subsection 7.1.5, page 272.)

[2] Lange, Martin and Leiß, Hans. To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm. Informatica Didactica 8, 2009.

[1]

[2]

@@ Line 1: / Line 1: @@
-I’m Diego from Lorient doing my final year engineering in Hotel Administration. I did my schooling, secured 91% and hope to find someone with same interests in Insect collecting.<br><br>Also visit my webpage; [http://www.recentarticless.info/Bookbyte-Coupons-Advice-For-Everyone-To-Make-Use-Of.htm bookbyte coupon Code]
+In [[formal language]] theory, a [[context-free grammar]] is said to be in '''Chomsky normal form''' (named for [[Noam Chomsky]]) if all of its [[production rule]]s{{disambiguation needed|date=February 2014}} are of the form:
+: <math>A \rightarrow BC</math> or
+: <math>A \rightarrow \alpha</math> or
+: <math>S \rightarrow \varepsilon</math>,
+where <math>A</math>, <math>B</math> and <math>C</math> are nonterminal symbols, <math>\alpha</math> is a [[terminal symbol]] (a symbol that represents a constant value), <math>S</math> is the start symbol, and <math>\varepsilon</math> is the [[empty string]].  Also, neither <math>B</math> nor <math>C</math> may be the start symbol, and the third production rule can only appear if <math>\varepsilon</math> is in <math>L(G)</math>, namely, the language produced by the context-free grammar <math>G</math>.
+Every grammar in Chomsky normal form is [[Context-free grammar|context-free]], and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as Hopcroft and Ullman, 1979.<ref>* John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. ''Introduction to Automata Theory, Languages, and Computation'', 3rd Edition, Addison-Wesley, 2006. ISBN 0-321-45536-3 ''(see subsection 7.1.5, page 272.)''</ref> As pointed out by Lange and Leiß,<ref>[http://ddi.cs.uni-potsdam.de/InformaticaDidactica/LangeLeiss2009.pdf Lange, Martin and Leiß, Hans. ''To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm.'' Informatica Didactica 8, 2009.]</ref> the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. The size of a grammar is the sum of the sizes of its production rules, where the size of a rule is one plus the length of its right-hand side. Using <math>|G|</math> to denote the size of the original grammar <math>G</math>, the size blow-up in the worst case may range from <math>|G|^2</math> to <math>2^{2 |G|}</math>, depending on the transformation algorithm used.
+==Alternative definition==
+Another way to define the Chomsky normal form (e.g., Hopcroft and Ullman 1978,and Hopcroft et al. 2006) is:
+A [[formal grammar]] is in '''Chomsky reduced form''' if all of its production rules are of the form:
+: <math>A \rightarrow\, BC</math> or
+: <math>A \rightarrow\, \alpha</math>,
+where <math>A</math>, <math>B</math> and <math>C</math> are nonterminal symbols, and <math>\alpha</math> is a [[terminal symbol]].  When using this definition, <math>B</math> or <math>C</math> may be the start symbol.  Only those context-free grammars which do not generate the [[empty string]] can be transformed into Chomsky reduced form.
+==Converting a grammar to Chomsky Normal Form==
+# Introduce <math>S_0</math></dt>
+#: Introduce a new start variable, <math>S_0 </math> and a new rule <math>S_0 \rightarrow S</math> where <math>S </math> is the previous start variable.</dd>
+# Eliminate all ''<math>\varepsilon</math> rules''</dt>
+#: <math>\varepsilon</math> rules are rules of the form <math>A \rightarrow \varepsilon</math>, where <math>A \not= S_0</math> and <math>A \in V</math>, where <math>V</math> is the [[Context-free grammar|CFG]]'s variable alphabet.
+#: Remove every rule with <math>\varepsilon</math> on its right hand side (RHS).  For each rule with <math>A</math> in its RHS, add a set of new rules consisting of the different possible combinations of <math>A</math> replaced or not replaced with <math>\varepsilon</math>.  If a rule has <math>A</math> as a singleton on its RHS, add a new rule <math>R = A \rightarrow \varepsilon</math> ''unless'' <math>R</math> has already been removed through this process.  For example, examine the following grammar <math>G</math>:
+#:: <math>S \rightarrow AbA \mid B</math>
+#:: <math>B \rightarrow b \mid c</math>
+#:: <math>A \rightarrow \varepsilon</math>
+#:<math>G</math> has one <math>\varepsilon</math> rule.  When the <math>A \rightarrow \varepsilon</math> is removed, we get the following:
+#:: <math>S \rightarrow AbA \mid Ab \mid bA \mid b \mid B</math>
+#:: <math>B \rightarrow b \mid c</math>
+#:Notice that we have to account for all possibilities of <math>A \rightarrow \varepsilon</math> and so we actually end up adding 3 rules.
+# Eliminate all ''unit rules''
+#:<math>A \rightarrow B; A,B \in V</math>
+#:After all the <math>\varepsilon</math> rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF).
+#:: To remove <math>A \rightarrow B</math>
+#:: <math>\forall B \rightarrow U</math>, where <math>U</math> is a string of variables and terminals, add rule <math>A \rightarrow U</math> unless this is a unit rule which has already been removed.</dd>
+# Clean up the remaining rules that are not in Chomsky normal form.
+#: Replace <math>A \rightarrow u_1 u_2 \dotso u_k, k \ge 3, u_1 \in V \cup \Sigma</math> with <math>A \rightarrow u_1 A_1 , A_1 \rightarrow u_2 A_2 , \dotsc , A_{k-2} \rightarrow u_{k-1} u_k</math>, where <math>A_i</math> are new variables.
+#: If <math>u_i \in \Sigma</math>, replace <math>u_i</math> in above rules with some new variable <math>V_i</math> and add rule <math>V_i \rightarrow u_i</math>.
+== See also ==
+*[[Backus-Naur form]]
+*[[CYK algorithm]]
+*[[Greibach normal form]]
+*[[Kuroda normal form]]
+== Footnotes ==
+<references />
+== References ==
+* John E. Hopcroft and Jeffrey D. Ullman, ''Introduction to Automata Theory, Languages and Computation'', Addison-Wesley Publishing, Reading Massachusetts, 1979. ISBN 0-201-02988-X. ''(See chapter 4.)''
+* {{cite book
+ | authorlink = Michael Sipser
+ | author = Michael Sipser
+ | year = 1997
+ | title = Introduction to the Theory of Computation
+ | publisher = PWS Publishing
+ | isbn = 0-534-94728-X
+ }} ''(Pages 98–101 of section 2.1: context-free grammars. Page 156.)''
+* {{cite book
+ | author = John Martin
+ | year = 2003
+ | title = Introduction to Languages and the Theory of Computation
+ | publisher = McGraw Hill
+ | isbn = 0-07-232200-4
+ }} ''(Pages 237–240 of section 6.6: simplified forms and normal forms.)''
+* {{cite book
+ | authorlink = Michael A. Harrison
+ | author = Michael A. Harrison
+ | year = 1978
+ | title = Introduction to Formal Language Theory
+ | publisher = Addison-Wesley
+ | isbn = 978-0-201-02955-0
+ }} ''(Pages 103–106.)''
+* Cole, Richard.  ''Converting CFGs to CNF (Chomsky Normal Form)'', October 17, 2007. [http://cs.nyu.edu/courses/fall07/V22.0453-001/cnf.pdf (pdf)]
+* Sipser, Michael. ''Introduction to the Theory of Computation,'' 2nd edition.
+[[Category:Formal languages]]
+[[Category:Noam Chomsky]]

Chomsky normal form: Difference between revisions

Revision as of 23:21, 3 February 2014

Contents

Alternative definition

Converting a grammar to Chomsky Normal Form

See also

Footnotes

References

Navigation menu

Chomsky normal form: Difference between revisions

Revision as of 23:21, 3 February 2014

Alternative definition

Converting a grammar to Chomsky Normal Form

See also

Footnotes

References

Navigation menu

Search