In [[computer science]], the '''Cocke–Younger–Kasami (CYK) algorithm''' (alternatively called '''CKY''') is a [[parsing]] [[algorithm]] for [[context-free grammar]]s. It is named after its inventors, [[John Cocke]], Daniel Younger and [[Tadao Kasami]]. It employs [[bottom-up parsing]] and [[dynamic programming]].
 
The standard version of CYK operates only on context-free grammars given in [[Chomsky normal form]] (CNF). However, any context-free grammar may be transformed into a CNF grammar expressing the same language {{harv|Sipser|1997}}.
 
The importance of the CYK algorithm stems from its high efficiency in certain situations. Using [[Landau symbol]]s, the [[Analysis of algorithms|worst-case running time]] of CYK is <math>\Theta(n^3 \cdot |G|)</math>, where ''n'' is the length of the parsed string and ''|G|'' is the size of the CNF grammar ''G''. This makes it one of the most efficient parsing algorithms in terms of worst-case [[asymptotic complexity]], although other algorithms exist with better average running time in many practical scenarios.
 
==Standard form==
 
The algorithm requires the context-free grammar to be rendered into [[Chomsky normal form]] (CNF), because it tests every possible way of splitting the current sequence into two shorter parts. Any context-free grammar that does not generate the empty string can be represented in CNF using only [[Formal grammar#The syntax of grammars|production rules]] of the forms <math>A\rightarrow \alpha</math> and <math>A\rightarrow B C</math>, where <math>\alpha</math> is a terminal symbol and <math>A</math>, <math>B</math> and <math>C</math> are nonterminal symbols.
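
The two permitted rule forms can be checked mechanically. The following Python sketch is only an illustration: the function name <code>is_cnf</code> and the representation of a rule as a pair of a left-hand side and a tuple of right-hand-side symbols are assumptions made here, not part of any standard library.

```python
def is_cnf(rules, nonterminals):
    """Return True if every rule has the form A -> a or A -> B C.

    rules:        iterable of (lhs, rhs) pairs, rhs being a tuple of symbols
                  (hypothetical representation chosen for this sketch)
    nonterminals: set of nonterminal symbols; every other symbol is
                  treated as a terminal
    """
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        # Form A -> a: exactly one terminal on the right-hand side.
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue
        # Form A -> B C: exactly two nonterminals on the right-hand side.
        if len(rhs) == 2 and all(symbol in nonterminals for symbol in rhs):
            continue
        return False
    return True


nonterminals = {"S", "A", "B"}
print(is_cnf([("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))], nonterminals))  # True
print(is_cnf([("S", ("A",))], nonterminals))  # False: S -> A is not a CNF rule
```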
 
==Algorithm==
 
===As pseudocode===
The algorithm in [[pseudocode]] is as follows:
 
'''let''' the input be a string ''S'' consisting of ''n'' characters: ''a''<sub>1</sub> ... ''a''<sub>''n''</sub>.
'''let''' the grammar contain ''r'' nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>.
This grammar contains the subset ''R''<sub>''s''</sub> which is the set of start symbols.
'''let''' ''P''[''n'',''n'',''r''] be an array of booleans. Initialize all elements of ''P'' to false.
'''for each''' ''i'' = 1 to ''n''
  '''for each''' terminal production ''R''<sub>''j''</sub> -> ''a''<sub>''i''</sub>
    set ''P''[''i'',''1'',''j''] = true
'''for each''' ''i'' = 2 to ''n'' ''-- Length of span''
  '''for each''' ''j'' = 1 to ''n''-''i''+1 ''-- Start of span''
    '''for each''' ''k'' = 1 to ''i''-1 ''-- Partition of span''
      '''for each''' production ''R''<sub>''A''</sub> -> ''R''<sub>''B''</sub> ''R''<sub>''C''</sub>
        '''if''' ''P''[''j'',''k'',''B''] and ''P''[''j''+''k'',''i''-''k'',''C''] '''then''' set ''P''[''j'',''i'',''A''] = true
'''if''' any of ''P''[1,''n'',''x''] is true (''x'' is iterated over the set ''s'', where ''s'' are all the indices for ''R''<sub>''s''</sub>) '''then'''
  ''S'' is a member of the language
'''else'''
  ''S'' is not a member of the language
 
===As prose===
In informal terms, this algorithm considers every possible subsequence of the sequence of words and sets <math>P[i,j,k]</math> to be true if the subsequence of words starting from <math>i</math> of length <math>j</math> can be generated from <math>R_k</math>. Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks to see if there is some production <math>R_A \to R_B \; R_C</math> such that <math>R_B</math> matches the first part and <math>R_C</math> matches the second part. If so, it records <math>R_A</math> as matching the whole subsequence. Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire sentence is matched by the start symbol.
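
The procedure described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name <code>cyk_recognize</code>, the rule representation (pairs for terminal productions, triples for binary productions) and the use of a single start symbol are assumptions made for the sketch. The table stores, for each start position and length, the set of nonterminals that derive the span, which is equivalent to the boolean array ''P''.

```python
from collections import defaultdict


def cyk_recognize(words, terminal_rules, binary_rules, start_symbol="S"):
    """Return True if `words` can be derived from `start_symbol`.

    terminal_rules: iterable of (A, a) pairs for productions A -> a
    binary_rules:   iterable of (A, B, C) triples for productions A -> B C
    """
    n = len(words)
    if n == 0:
        return False  # the CNF rule forms used here cannot derive the empty string
    # table[(begin, length)] is the set of nonterminals deriving the span that
    # starts at position `begin` (1-based, as in the pseudocode).
    table = defaultdict(set)
    for i, word in enumerate(words, start=1):
        for lhs, terminal in terminal_rules:
            if terminal == word:
                table[(i, 1)].add(lhs)
    for length in range(2, n + 1):                # length of span
        for begin in range(1, n - length + 2):    # start of span
            for split in range(1, length):        # partition of span
                left = table[(begin, split)]
                right = table[(begin + split, length - split)]
                for lhs, b, c in binary_rules:
                    if b in left and c in right:
                        table[(begin, length)].add(lhs)
    return start_symbol in table[(1, n)]


# Toy grammar (hypothetical): S -> A B, A -> a, B -> b
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B")]
print(cyk_recognize(["a", "b"], terminal_rules, binary_rules))  # True
print(cyk_recognize(["b", "a"], terminal_rules, binary_rules))  # False
```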
 
==Example==
This is an example grammar:
 
:<math>\begin{array}{lcl}
S &\to& NP \;\; VP\\
VP &\to& VP \;\; PP\\
VP &\to& V \;\; NP\\
VP &\to& \textit{eats}\\
PP &\to& P \;\; NP\\
NP &\to& Det \;\; N\\
NP &\to& \textit{she}\\
V &\to& \textit{eats}\\
P &\to& \textit{with}\\
N &\to& \textit{fish}\\
N &\to& \textit{fork}\\
Det &\to& a
\end{array}</math>
 
The sentence ''she eats a fish with a fork'' is now analyzed using the CYK algorithm. In the following table, the cell in column <math>i</math> (counting from 1 at the left) and row <math>j</math> (counting from 1 at the bottom) lists the nonterminals <math>R_k</math> for which <math>P[i,j,k]</math> is true.
 
{| class="wikitable"
|+CYK table
|-
| '''S'''
|-
|      || VP
|-
|      || &nbsp;||
|-
| '''S'''    ||      ||      ||
|-
|      || VP    ||      ||      || PP
|-
| '''S'''||      || NP  ||      ||      || NP
|-
| NP    || V, VP || Det  || N    || P    || Det || N
|- style="border-top:3px solid grey;"
| she   || eats  || a    || fish || with  || a    || fork
|}
 
Since <math>P[1,7,R_S]</math> is true, the example sentence can be generated by the grammar.
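
The table can be reproduced with a short Python sketch. The rule lists below transcribe the example grammar; the function name <code>cyk_table</code> and the dictionary layout keyed by (column, row) are assumptions made for illustration.

```python
# Example grammar from this section, split into terminal and binary rules.
TERMINAL_RULES = [
    ("VP", "eats"), ("NP", "she"), ("V", "eats"), ("P", "with"),
    ("N", "fish"), ("N", "fork"), ("Det", "a"),
]
BINARY_RULES = [
    ("S", "NP", "VP"), ("VP", "VP", "PP"), ("VP", "V", "NP"),
    ("PP", "P", "NP"), ("NP", "Det", "N"),
]


def cyk_table(words):
    """Return {(column, row): set of nonterminals}, indexed as in the table."""
    n = len(words)
    table = {(i, j): set() for j in range(1, n + 1) for i in range(1, n - j + 2)}
    for i, word in enumerate(words, start=1):
        for lhs, terminal in TERMINAL_RULES:
            if terminal == word:
                table[(i, 1)].add(lhs)
    for length in range(2, n + 1):
        for begin in range(1, n - length + 2):
            for split in range(1, length):
                for lhs, b, c in BINARY_RULES:
                    if (b in table[(begin, split)]
                            and c in table[(begin + split, length - split)]):
                        table[(begin, length)].add(lhs)
    return table


words = "she eats a fish with a fork".split()
table = cyk_table(words)
# Print the rows from top (longest span) to bottom, as in the table above.
for row in range(len(words), 0, -1):
    cells = [",".join(sorted(table[(col, row)])) or "-"
             for col in range(1, len(words) - row + 2)]
    print(row, cells)
print("S" in table[(1, 7)])  # True
```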
 
==Extensions==
===Generating a parse tree===
It is simple to extend the above algorithm so that it not only determines whether a sentence is in a language but also constructs a [[parse tree]]: parse tree nodes are stored as elements of the array instead of booleans. Since the grammars being recognized can be ambiguous, it is necessary to store a list of nodes (unless one wishes to pick only one possible parse tree); the end result is then a forest of possible parse trees.
An alternative formulation employs a second table B[n,n,r] of so-called ''backpointers''.
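
A minimal Python sketch of the backpointer formulation is given below. It keeps only the first derivation found for each entry, so it returns a single parse tree rather than a forest; the function name <code>cyk_parse</code>, the rule representation and the nested-tuple tree format are assumptions made for illustration.

```python
def cyk_parse(words, terminal_rules, binary_rules, start_symbol="S"):
    """Return one parse tree as nested tuples, or None if there is no parse.

    terminal_rules: iterable of (A, a) pairs for productions A -> a
    binary_rules:   iterable of (A, B, C) triples for productions A -> B C
    """
    n = len(words)
    # back[(begin, length, A)] records how A derives the span: the word itself
    # for length 1, otherwise the backpointer (split, B, C).
    back = {}
    for i, word in enumerate(words, start=1):
        for lhs, terminal in terminal_rules:
            if terminal == word:
                back[(i, 1, lhs)] = word
    for length in range(2, n + 1):
        for begin in range(1, n - length + 2):
            for split in range(1, length):
                for lhs, b, c in binary_rules:
                    if ((begin, split, b) in back
                            and (begin + split, length - split, c) in back
                            and (begin, length, lhs) not in back):
                        back[(begin, length, lhs)] = (split, b, c)

    def build(begin, length, symbol):
        # Follow the backpointers recursively to rebuild the tree.
        entry = back[(begin, length, symbol)]
        if length == 1:
            return (symbol, entry)
        split, b, c = entry
        return (symbol,
                build(begin, split, b),
                build(begin + split, length - split, c))

    if (1, n, start_symbol) not in back:
        return None
    return build(1, n, start_symbol)


# Two rules taken from the example grammar plus its terminal rules for these words.
terminal_rules = [("NP", "she"), ("VP", "eats"), ("V", "eats")]
binary_rules = [("S", "NP", "VP")]
print(cyk_parse(["she", "eats"], terminal_rules, binary_rules))
# ('S', ('NP', 'she'), ('VP', 'eats'))
```

Storing a list of backpointers per entry instead of a single one would yield the forest of all parse trees mentioned above.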
 
===Parsing non-CNF context-free grammars===
 
As pointed out by {{harvtxt|Lange|Leiß|2009}}, the drawback of all known transformations into Chomsky normal form is that they can lead to an undesirable bloat in grammar size. The size of a grammar is the sum of the sizes of its production rules, where the size of a rule is one plus the length of its right-hand side. Using <math>g</math> to denote the size of the original grammar, the size blow-up in the worst case may range from <math>g^2</math> to <math>2^{2 g}</math>, depending on the transformation algorithm used. For use in teaching, Lange and Leiß propose a slight generalization of the CYK algorithm, "without compromising efficiency of the algorithm, clarity of its presentation, or simplicity of proofs" {{harv|Lange|Leiß|2009}}.
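
The size measure defined above is easy to compute. The following Python sketch applies it to the example grammar from this article; the function name <code>grammar_size</code> and the rule representation are assumptions made for illustration.

```python
def grammar_size(rules):
    """Sum over all rules of one plus the length of the right-hand side.

    rules: iterable of (lhs, rhs) pairs, rhs being a tuple of symbols
           (hypothetical representation chosen for this sketch)
    """
    return sum(1 + len(rhs) for _lhs, rhs in rules)


# The example grammar: five binary rules (size 3 each) and
# seven terminal rules (size 2 each).
example_rules = [
    ("S", ("NP", "VP")), ("VP", ("VP", "PP")), ("VP", ("V", "NP")),
    ("VP", ("eats",)), ("PP", ("P", "NP")), ("NP", ("Det", "N")),
    ("NP", ("she",)), ("V", ("eats",)), ("P", ("with",)),
    ("N", ("fish",)), ("N", ("fork",)), ("Det", ("a",)),
]
print(grammar_size(example_rules))  # 29 = 5 * 3 + 7 * 2
```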
 
===Parsing weighted context-free grammars===
It is also possible to extend the CYK algorithm to parse strings using [[weighted context-free grammar|weighted]] and [[stochastic context-free grammar]]s. Weights (probabilities) are then stored in the table P instead of booleans, so that P[i,j,A] contains the minimum weight (maximum probability) with which the substring starting at i of length j can be derived from A. Further extensions of the algorithm allow all parses of a string to be enumerated from lowest to highest weight (highest to lowest probability).
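
A minimal Python sketch of the probabilistic variant follows. It computes only the maximum probability of a derivation, not the parse itself; the function name <code>cyk_best_probability</code>, the dictionary-based rule representation and the probabilities of the toy grammar are all assumptions made for illustration.

```python
def cyk_best_probability(words, terminal_rules, binary_rules, start_symbol="S"):
    """Return the maximum probability of deriving `words` from `start_symbol`.

    terminal_rules: dict mapping (A, a) to the probability of A -> a
    binary_rules:   dict mapping (A, B, C) to the probability of A -> B C
    Returns 0.0 if the string cannot be derived.
    """
    n = len(words)
    # best[(begin, length, A)] is the highest probability with which A derives
    # the span; missing entries count as probability 0.
    best = {}
    for i, word in enumerate(words, start=1):
        for (lhs, terminal), prob in terminal_rules.items():
            if terminal == word:
                key = (i, 1, lhs)
                best[key] = max(best.get(key, 0.0), prob)
    for length in range(2, n + 1):
        for begin in range(1, n - length + 2):
            for split in range(1, length):
                for (lhs, b, c), prob in binary_rules.items():
                    left = best.get((begin, split, b), 0.0)
                    right = best.get((begin + split, length - split, c), 0.0)
                    candidate = prob * left * right
                    if candidate > best.get((begin, length, lhs), 0.0):
                        best[(begin, length, lhs)] = candidate
    return best.get((1, n, start_symbol), 0.0)


# Toy ambiguous grammar with made-up probabilities: S -> S S (0.5), S -> a (0.5)
terminal_rules = {("S", "a"): 0.5}
binary_rules = {("S", "S", "S"): 0.5}
print(cyk_best_probability(["a", "a"], terminal_rules, binary_rules))       # 0.125
print(cyk_best_probability(["a", "a", "a"], terminal_rules, binary_rules))  # 0.03125
```

For weights that are to be minimized, the same structure applies with products replaced by sums and <code>max</code> replaced by <code>min</code>.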
 
===Valiant's algorithm===
The [[Analysis of algorithms|worst case running time]] of CYK is <math>\Theta(n^3 \cdot |G|)</math>, where ''n'' is the length of the parsed string and ''|G|'' is the size of the CNF grammar ''G''. This makes it one of the most efficient algorithms for recognizing general context-free languages in practice. {{harvtxt|Valiant|1975}} gave an extension of the CYK algorithm. His algorithm computes the same parsing table as the CYK algorithm; yet he showed that [[Matrix multiplication#Algorithms for efficient matrix multiplication|algorithms for efficient multiplication]] of [[Boolean matrix|matrices with 0-1-entries]] can be utilized for performing this computation.
 
Using the [[Coppersmith–Winograd algorithm]] for multiplying these matrices, this gives an asymptotic worst-case running time of <math>O(n^{2.38} \cdot |G|)</math>. However, the constant factor hidden by the [[Big O notation]] is so large that the Coppersmith–Winograd algorithm is only worthwhile for matrices that are too large to handle on present-day computers {{harv|Knuth|1997}}, and this approach requires subtraction and so is only suitable for recognition. The dependence on efficient matrix multiplication cannot be avoided altogether: {{harvtxt|Lee|2002}} has proved that any parser for context-free grammars working in time <math>O(n^{3-\varepsilon} \cdot |G|)</math> can be effectively converted into an algorithm computing the product of <math>(n \times n)</math>-matrices with 0-1-entries in time <math>O(n^{3 - \varepsilon/3})</math>.
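
The link between the CYK table and Boolean matrix products can be illustrated with a small Python sketch. This is '''not''' Valiant's algorithm: it uses naive cubic matrix multiplication and simply repeats the products until all spans are covered, so it is slower than plain CYK. It only shows that combining two adjacent spans under a rule ''A'' → ''B'' ''C'' amounts to a Boolean matrix product. All names and the rule representation are assumptions made for illustration.

```python
def bool_matmul(x, y):
    """Naive Boolean product of two square matrices given as lists of lists."""
    size = len(x)
    return [[any(x[i][k] and y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def recognize_by_matrix_products(words, terminal_rules, binary_rules,
                                 start_symbol="S"):
    """Recognize `words` by repeated Boolean matrix products (illustration only)."""
    n = len(words)
    size = n + 1
    symbols = {lhs for lhs, _ in terminal_rules}
    symbols |= {symbol for rule in binary_rules for symbol in rule}
    symbols.add(start_symbol)
    # m[X][i][j] is True when X derives words[i:j] (0-based, end-exclusive).
    m = {x: [[False] * size for _ in range(size)] for x in symbols}
    for i, word in enumerate(words):
        for lhs, terminal in terminal_rules:
            if terminal == word:
                m[lhs][i][i + 1] = True
    # After round r every span of length at most r + 1 is complete,
    # so n - 1 rounds are enough.
    for _ in range(max(n - 1, 0)):
        for lhs, b, c in binary_rules:
            product = bool_matmul(m[b], m[c])
            for i in range(size):
                for j in range(size):
                    if product[i][j]:
                        m[lhs][i][j] = True
    return m[start_symbol][0][n]


# Toy grammar (hypothetical) for strings of the form a...ab...b with equal counts:
# S -> A T | A B, T -> S B, A -> a, B -> b
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "T"), ("T", "S", "B"), ("S", "A", "B")]
print(recognize_by_matrix_products(list("aabb"), terminal_rules, binary_rules))  # True
print(recognize_by_matrix_products(list("abb"), terminal_rules, binary_rules))   # False
```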
 
==See also==
* [[GLR parser]]
* [[Earley parser]]
* [[Packrat parser]]
 
==References==
{{Reflist}}
* [[John Cocke]] and Jacob T. Schwartz (1970). Programming languages and their compilers: Preliminary notes. Technical report, [[Courant Institute of Mathematical Sciences]], [[New York University]].
* [[Tadao Kasami|T. Kasami]] (1965). An efficient recognition and syntax-analysis algorithm for context-free languages.  Scientific report AFCRL-65-758, Air Force Cambridge Research Lab, [[Bedford, MA]].
* Daniel H. Younger (1967). Recognition and parsing of context-free languages in time ''n''<sup>3</sup>. ''Information and Control'' 10(2): 189&ndash;208.
* {{citation |last=Knuth |first=Donald E. |authorlink=Donald E. Knuth |title=The Art of Computer Programming Volume 2: Seminumerical Algorithms |publisher=Addison-Wesley Professional |edition=3rd |date=November 14, 1997 |isbn=978-0-201-89684-8 |pages=501 }}
* {{Citation
| last=Lange
| first=Martin
| last2=Leiß
| first2=Hans
| title=To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm
| year=2009
| journal=Informatica Didactica
| volume=8
| url=http://www.informatica-didactica.de/cmsmadesimple/index.php?page=LangeLeiss2009
| place=[http://www.informatica-didactica.de/cmsmadesimple/uploads/Artikel/LangeLeiss2009/LangeLeiss2009.pdf pdf]
}}
*{{Citation
| last=Sipser
| first=Michael
| title=Introduction to the Theory of Computation
| publisher=PWS Publishing
| year=1997
| edition=1st
| page=99
| isbn =0-534-94728-X
}}
*{{Citation
  | last = Lee
  | first = Lillian
  | title = Fast context-free grammar parsing requires fast Boolean matrix multiplication
  | journal = [[Journal of the ACM]]
  | volume = 49
  | issue = 1
  | pages = 1–15
  | year = 2002
  | doi = 10.1145/505241.505242
  | postscript = .
}}
*{{citation |last=Valiant |first=Leslie G. |authorlink=Leslie G. Valiant |title=General context-free recognition in less than cubic time |journal=Journal of Computer and System Sciences |volume=10 |issue=2 |year=1975 |pages=308–314 }}
 
==External links==
* [http://www.diotavelli.net/people/void/demos/cky.html CYK parsing demo in JavaScript]
* [http://www.informatik.uni-leipzig.de/alg/lehre/ss08/AUTO-SPRACHEN/Java-Applets/CYK-Algorithmus.html Interactive applet from the University of Leipzig to demonstrate the CYK algorithm (site is in German)]
* [http://www.swisseduc.ch/compscience/exorciser/ Exorciser is a Java application to generate exercises in the CYK algorithm as well as finite state machines, Markov algorithms, etc.]
 
[[Category:Parsing algorithms]]
