Visual binary: Difference between revisions

Revision as of 07:59, 17 February 2014


@@ Line 1: / Line 1: @@
-In [[computer science]], the '''Cocke–Younger–Kasami (CYK) algorithm''' (alternatively called '''CKY''') is a [[parsing]] [[algorithm]] for [[context-free grammar]]s. It is named after its inventors, [[John Cocke]], Daniel Younger and [[Tadao Kasami]]. It employs [[bottom-up parsing]] and [[dynamic programming]].
+Friends call her Roni. What she really enjoys doing is films and he or she would never stop working. Nevada is largest I love most. Taking care of animals will be the I support my family but I've always wanted my own home based business. Check out my website here: http://primacleanse.net/<br><br>my page; [http://primacleanse.net/ Prima Cleanse]
-The standard version of CYK operates only on context-free grammars given in [[Chomsky normal form]] (CNF). However, any context-free grammar may be transformed to a CNF grammar expressing the same language {{harv|Sipser|1997}}.
-The importance of the CYK algorithm stems from its high efficiency in certain situations. Using [[Landau symbol]]s, the [[Analysis of algorithms|worst case running time]] of CYK is <math>\Theta(n^3 \cdot |G|)</math>, where ''n'' is the length of the parsed string and ''|G|'' is the size of the CNF grammar ''G''. This makes it one of the most efficient parsing algorithms in terms of worst-case [[asymptotic complexity]], although other algorithms exist with better average running time in many practical scenarios.
-==Standard form==
-The algorithm requires the context-free grammar to be rendered into [[Chomsky normal form]] (CNF), because it tests every way of splitting the current sequence into two parts. Any context-free grammar that does not generate the empty string can be represented in CNF using only [[Formal grammar#The syntax of grammars|production rules]] of the forms <math>A\rightarrow \alpha</math> and <math>A\rightarrow B C</math>, where <math>A</math>, <math>B</math> and <math>C</math> are nonterminals and <math>\alpha</math> is a terminal.
-==Algorithm==
-===As pseudocode===
-The algorithm in [[pseudocode]] is as follows:
- '''let''' the input be a string ''S'' consisting of ''n'' characters: ''a''<sub>1</sub> ... ''a''<sub>''n''</sub>.
- '''let''' the grammar contain ''r'' nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>.
- This grammar contains the subset ''R''<sub>''s''</sub> which is the set of start symbols.
- '''let''' ''P''[''n'',''n'',''r''] be an array of booleans. Initialize all elements of ''P'' to false.
- '''for each''' ''i'' = 1 to ''n''
-   '''for each''' terminal production ''R''<sub>''j''</sub> -> ''a''<sub>''i''</sub>
-     set ''P''[''i'',''1'',''j''] = true
- '''for each''' ''i'' = 2 to ''n'' ''-- Length of span''
-   '''for each''' ''j'' = 1 to ''n''-''i''+1 ''-- Start of span''
-     '''for each''' ''k'' = 1 to ''i''-1 ''-- Partition of span''
-       '''for each''' production ''R''<sub>''A''</sub> -> ''R''<sub>''B''</sub> ''R''<sub>''C''</sub>
-         '''if''' ''P''[''j'',''k'',''B''] and ''P''[''j''+''k'',''i''-''k'',''C''] '''then''' set ''P''[''j'',''i'',''A''] = true
- '''if''' any of ''P''[1,''n'',''x''] is true (''x'' is iterated over the set ''s'', where ''s'' are all the indices for ''R''<sub>''s''</sub>) '''then'''
-   ''S'' is member of language
- '''else'''
-   ''S'' is not member of language
-===As prose===
-In informal terms, this algorithm considers every possible subsequence of the sequence of words and sets <math>P[i,j,k]</math> to true if the subsequence of length <math>j</math> starting at position <math>i</math> can be generated from <math>R_k</math>. It first handles subsequences of length 1, then subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts and checks whether there is some production <math>A \to B \; C</math> such that <math>B</math> matches the first part and <math>C</math> matches the second part. If so, it records <math>A</math> as matching the whole subsequence. Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire sentence is matched by the start symbol.
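The prose description can also be written as a recurrence. This is a sketch using the same indexing as the pseudocode (start position, span length, nonterminal index); writing ''G'' for the set of productions and using ''A'', ''B'', ''C'' as nonterminal indices are notational assumptions made here:
:<math>P[i,1,A] = \text{true} \iff (R_A \to a_i) \in G</math>
:<math>P[i,j,A] = \bigvee_{k=1}^{j-1} \; \bigvee_{(R_A \to R_B R_C) \in G} \left( P[i,k,B] \land P[i+k,\,j-k,\,C] \right), \qquad 2 \le j \le n</math>
The string is accepted exactly when <math>P[1,n,s]</math> is true for some start symbol index <math>s</math>.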
-==Example==
-This is an example grammar:
-:<math>\begin{array}{lcl}
-S &\to& NP \;\; VP\\
-VP &\to& VP \;\; PP\\
-VP &\to& V \;\; NP\\
-VP &\to& \textit{eats}\\
-PP &\to& P \;\; NP\\
-NP &\to& Det \;\; N\\
-NP &\to& \textit{she}\\
-V &\to& \textit{eats}\\
-P &\to& \textit{with}\\
-N &\to& \textit{fish}\\
-N &\to& \textit{fork}\\
-Det &\to& a
-\end{array}</math>
-Now the sentence ''she eats a fish with a fork'' is analyzed using the CYK algorithm. In the following table, in <math>P[i,j,k]</math>, <math>i</math> is the number of the column (starting at the left at 1), and <math>j</math> is the number of the row (starting at the bottom at 1).
-{| class="wikitable"
-|+CYK table
-|-
-| '''S'''
-|-
-|       || VP
-|-
-|       || &nbsp;||
-|-
-| '''S'''     ||       ||      ||
-|-
-|       || VP    ||      ||      || PP
-|-
-| '''S'''||       || NP   ||      ||       || NP
-|-
-| NP    || V, VP || Det  || N    || P     || Det || N
-|- style="border-top:3px solid grey;"
-| she   || eats  || a    || fish || with  || a    || fork
-|}
-Since <math>P[1,7,R_S]</math> is true, the example sentence can be generated by the grammar.
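The table above can be reproduced with a short program. The following is a minimal Python sketch of the recognizer, not a reference implementation: the function names <code>cyk_table</code> and <code>recognizes</code> and the tuple encoding of the rules are assumptions made for illustration. The table is keyed by (start, length) with 1-based start positions, matching the column and row numbering used in the example.

```python
from collections import defaultdict

# Example grammar from the text, encoded as tuples (encoding is an assumption).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "VP", "PP"),
    ("VP", "V", "NP"),
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
]
TERMINAL_RULES = [
    ("VP", "eats"), ("NP", "she"), ("V", "eats"), ("P", "with"),
    ("N", "fish"), ("N", "fork"), ("Det", "a"),
]


def cyk_table(words, binary_rules, terminal_rules):
    """Map (start, length) to the set of nonterminals deriving that span."""
    n = len(words)
    table = defaultdict(set)
    # Spans of length 1: match each word against the terminal productions.
    for start, word in enumerate(words, start=1):
        for lhs, terminal in terminal_rules:
            if terminal == word:
                table[(start, 1)].add(lhs)
    # Longer spans: try every split point and every binary production.
    for length in range(2, n + 1):
        for start in range(1, n - length + 2):
            for split in range(1, length):
                left = table[(start, split)]
                right = table[(start + split, length - split)]
                for lhs, b, c in binary_rules:
                    if b in left and c in right:
                        table[(start, length)].add(lhs)
    return table


def recognizes(words, start_symbol="S"):
    """Return True if the example grammar generates the word sequence."""
    if not words:
        return False  # a CNF grammar of this form cannot derive the empty string
    table = cyk_table(words, BINARY_RULES, TERMINAL_RULES)
    return start_symbol in table[(1, len(words))]


if __name__ == "__main__":
    print(recognizes("she eats a fish with a fork".split()))
```

Calling <code>cyk_table</code> directly gives access to individual cells, for instance the cell for column 2, row 6 of the table.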
-==Extensions==
-===Generating a parse tree===
-It is simple to extend the above algorithm to not only determine if a sentence is in a language, but to also construct a [[parse tree]], by storing parse tree nodes as elements of the array, instead of booleans. Since the grammars being recognized can be ambiguous, it is necessary to store a list of nodes (unless one wishes to only pick one possible parse tree); the end result is then a forest of possible parse trees.
-An alternative formulation employs a second table B[n,n,r] of so-called ''backpointers''.
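A minimal Python sketch of the backpointer idea follows. It is an illustration under stated assumptions rather than a definitive implementation: the name <code>cyk_parse</code>, the dictionary used in place of the table B[n,n,r], and the nested-tuple tree format are all hypothetical choices. Only the first backpointer found for each cell is kept, so the function returns a single parse tree even when the grammar is ambiguous.

```python
# Example grammar from this article, encoded as tuples (encoding is an assumption).
BINARY_RULES = [
    ("S", "NP", "VP"), ("VP", "VP", "PP"), ("VP", "V", "NP"),
    ("PP", "P", "NP"), ("NP", "Det", "N"),
]
TERMINAL_RULES = [
    ("VP", "eats"), ("NP", "she"), ("V", "eats"), ("P", "with"),
    ("N", "fish"), ("N", "fork"), ("Det", "a"),
]


def cyk_parse(words, binary_rules, terminal_rules, start_symbol):
    """Return one parse tree as nested tuples, or None if there is no parse."""
    n = len(words)
    # back[(start, length, A)] is the word itself for spans of length 1,
    # otherwise a backpointer (split, B, C) for a production A -> B C.
    back = {}
    for start, word in enumerate(words, start=1):
        for lhs, terminal in terminal_rules:
            if terminal == word:
                back[(start, 1, lhs)] = word
    for length in range(2, n + 1):
        for start in range(1, n - length + 2):
            for split in range(1, length):
                for lhs, b, c in binary_rules:
                    left = (start, split, b)
                    right = (start + split, length - split, c)
                    if left in back and right in back:
                        # Keep only the first derivation found for this cell.
                        back.setdefault((start, length, lhs), (split, b, c))

    def build(start, length, symbol):
        entry = back[(start, length, symbol)]
        if length == 1:
            return (symbol, entry)
        split, b, c = entry
        return (symbol,
                build(start, split, b),
                build(start + split, length - split, c))

    if (1, n, start_symbol) not in back:
        return None
    return build(1, n, start_symbol)


if __name__ == "__main__":
    print(cyk_parse("she eats a fish".split(), BINARY_RULES, TERMINAL_RULES, "S"))
```

Storing a list of backpointers per cell instead of a single one would yield the forest of parse trees mentioned above, at the cost of more memory.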
-===Parsing non-CNF context-free grammars===
-As pointed out by {{harvtxt|Lange|Leiß|2009}}, the drawback of all known transformations into Chomsky normal form is that they can lead to an undesirable bloat in grammar size. The size of a grammar is the sum of the sizes of its production rules, where the size of a rule is one plus the length of its right-hand side. Using <math>g</math> to denote the size of the original grammar, the size blow-up in the worst case may range from <math>g^2</math> to <math>2^{2 g}</math>, depending on the transformation algorithm used. For the use in teaching, Lange and Leiß propose a slight generalization of the CYK algorithm, "without compromising efficiency of the algorithm, clarity of its presentation, or simplicity of proofs" {{harv|Lange|Leiß|2009}}.
-===Parsing weighted context-free grammars===
-It is also possible to extend the CYK algorithm to parse strings using [[weighted context-free grammar|weighted]] and [[stochastic context-free grammar]]s. Weights (probabilities) are then stored in the table P instead of booleans, so P[i,j,A] will contain the minimum weight (maximum probability) with which the substring of length j starting at i can be derived from A. Further extensions of the algorithm allow all parses of a string to be enumerated from lowest to highest weight (highest to lowest probability).
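The maximum-probability variant can be sketched in Python as follows. This is a hedged illustration only: the function name <code>best_probability</code> is hypothetical, and the rule probabilities attached to the example grammar are invented for demonstration and do not come from any source. Each cell stores the highest probability found so far instead of a boolean.

```python
import math

# Example grammar with made-up probabilities (assumed values, for illustration only).
BINARY_PROBS = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "VP", "PP"): 0.3,
    ("VP", "V", "NP"): 0.5,
    ("PP", "P", "NP"): 1.0,
    ("NP", "Det", "N"): 0.6,
}
TERMINAL_PROBS = {
    ("VP", "eats"): 0.2, ("NP", "she"): 0.4, ("V", "eats"): 1.0,
    ("P", "with"): 1.0, ("N", "fish"): 0.5, ("N", "fork"): 0.5,
    ("Det", "a"): 1.0,
}


def best_probability(words, binary_probs, terminal_probs, start_symbol):
    """Return the probability of the most likely derivation, or 0.0 if none exists."""
    n = len(words)
    best = {}  # (start, length, nonterminal) -> highest probability seen
    for start, word in enumerate(words, start=1):
        for (lhs, terminal), p in terminal_probs.items():
            if terminal == word:
                key = (start, 1, lhs)
                best[key] = max(best.get(key, 0.0), p)
    for length in range(2, n + 1):
        for start in range(1, n - length + 2):
            for split in range(1, length):
                for (lhs, b, c), p in binary_probs.items():
                    left = best.get((start, split, b), 0.0)
                    right = best.get((start + split, length - split, c), 0.0)
                    candidate = p * left * right
                    key = (start, length, lhs)
                    # Keep the maximum over all splits and productions.
                    if candidate > best.get(key, 0.0):
                        best[key] = candidate
    return best.get((1, n, start_symbol), 0.0)


if __name__ == "__main__":
    p = best_probability("she eats a fish".split(), BINARY_PROBS, TERMINAL_PROBS, "S")
    print(math.isclose(p, 0.06))
```

For long sentences, summing negative log-probabilities and taking the minimum (the "minimum weight" formulation) avoids floating-point underflow from repeated multiplication.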
-===Valiant's algorithm===
-The [[Analysis of algorithms|worst case running time]] of CYK is <math>\Theta(n^3 \cdot |G|)</math>, where ''n'' is the length of the parsed string and ''|G|'' is the size of the CNF grammar ''G''. This makes it one of the most efficient algorithms for recognizing general context-free languages in practice. {{harvtxt|Valiant|1975}} gave an extension of the CYK algorithm. His algorithm computes the same parsing table as the CYK algorithm, but he showed that [[Matrix multiplication#Algorithms for efficient matrix multiplication|algorithms for efficient multiplication]] of [[Boolean matrix|matrices with 0-1-entries]] can be used to perform this computation.
-Using the [[Coppersmith–Winograd algorithm]] for multiplying these matrices, this gives an asymptotic worst-case running time of <math>O(n^{2.38} \cdot |G|)</math>. However, the constant factor hidden by the [[Big O notation]] is so large that the Coppersmith–Winograd algorithm is only worthwhile for matrices that are too large to handle on present-day computers {{harv|Knuth|1997}}, and this approach requires subtraction and so is only suitable for recognition. The dependence on efficient matrix multiplication cannot be avoided altogether: {{harvtxt|Lee|2002}} has proved that any parser for context-free grammars working in time <math>O(n^{3-\varepsilon} \cdot |G|)</math> can be effectively converted into an algorithm computing the product of <math>(n \times n)</math>-matrices with 0-1-entries in time <math>O(n^{3 - \varepsilon/3})</math>.
-==See also==
-* [[GLR parser]]
-* [[Earley parser]]
-* [[Packrat parser]]
-==References==
-{{Reflist}}
-* [[John Cocke]] and Jacob T. Schwartz (1970). Programming languages and their compilers: Preliminary notes. Technical report, [[Courant Institute of Mathematical Sciences]], [[New York University]].
-* [[Tadao Kasami|T. Kasami]] (1965). An efficient recognition and syntax-analysis algorithm for context-free languages.  Scientific report AFCRL-65-758, Air Force Cambridge Research Lab, [[Bedford, MA]].
-* Daniel H. Younger (1967). Recognition and parsing of context-free languages in time ''n''<sup>3</sup>. ''Information and Control'' 10(2): 189&ndash;208.
-* {{citation |last=Knuth |first=Donald E. |authorlink=Donald E. Knuth |title=The Art of Computer Programming Volume 2: Seminumerical Algorithms |publisher=Addison-Wesley Professional |edition=3rd |date=November 14, 1997 |isbn=978-0-201-89684-8 |pages=501 }}
-* {{Citation
-| last=Lange
-| first=Martin
-| last2=Leiß
-| first2=Hans
-| title=To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm
-| year=2009
-| journal=Informatica Didactica
-| volume=8
-| url=http://www.informatica-didactica.de/cmsmadesimple/index.php?page=LangeLeiss2009
-| place=[http://www.informatica-didactica.de/cmsmadesimple/uploads/Artikel/LangeLeiss2009/LangeLeiss2009.pdf pdf]
-}}
-*{{Citation
- | last=Sipser
- | first=Michael
- | title=Introduction to the Theory of Computation
- | publisher=PWS Publishing
- | year=1997
- | edition=1st
- | page=99
- | isbn =0-534-94728-X
-}}
-*{{Citation
-  | last = Lee
-  | first = Lillian
-  | title = Fast context-free grammar parsing requires fast Boolean matrix multiplication
-  | journal = [[Journal of the ACM]]
-  | volume = 49
-  | issue = 1
-  | pages = 1–15
-  | year = 2002
-  | doi = 10.1145/505241.505242
-  | postscript = .
-}}
-*{{citation |last=Valiant |first=Leslie G. |authorlink=Leslie G. Valiant |title=General context-free recognition in less than cubic time |journal=Journal of Computer and System Sciences |volume=10 |issue=2 |year=1975 |pages=308–314 }}
-==External links==
-* [http://www.diotavelli.net/people/void/demos/cky.html CYK parsing demo in JavaScript]
-* [http://www.informatik.uni-leipzig.de/alg/lehre/ss08/AUTO-SPRACHEN/Java-Applets/CYK-Algorithmus.html Interactive applet from the University of Leipzig demonstrating the CYK algorithm (site is in German)]
-* [http://www.swisseduc.ch/compscience/exorciser/ Exorciser is a Java application to generate exercises in the CYK algorithm as well as Finite State Machines, Markov algorithms etc]
-[[Category:Parsing algorithms]]
