|
|
(394 intermediate revisions by more than 100 users not shown) |
Line 1: |
Line 1: |
| A '''finite state transducer''' ('''FST''') is a [[finite state machine]] with two tapes: an input tape and an output tape. This contrasts with an ordinary [[finite state automaton]] (or [[finite state acceptor]]), which has a single tape.
| | This is a preview for the new '''MathML rendering mode''' (with SVG fallback), which is availble in production for registered users. |
|
| |
|
| ==Overview==
| | If you would like use the '''MathML''' rendering mode, you need a wikipedia user account that can be registered here [[https://en.wikipedia.org/wiki/Special:UserLogin/signup]] |
| | * Only registered users will be able to execute this rendering mode. |
| | * Note: you need not enter a email address (nor any other private information). Please do not use a password that you use elsewhere. |
|
| |
|
| An automaton can be said to ''recognize'' a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton ''generates'' strings, which means viewing its tape as an output tape. On this view, the automaton generates a [[formal language]], which is a set of strings. The two views of automata are equivalent: the function that the automaton computes is precisely the [[indicator function]] of the set of strings it generates. The class of languages generated by finite automata is known as the class of [[regular language]]s.
| | Registered users will be able to choose between the following three rendering modes: |
|
| |
|
| The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a transducer is said to ''transduce'' (i.e., translate) the contents of its input tape to its output tape, by accepting a string on its input tape and generating another string on its output tape. It may do so [[nondeterministic]]ally and it may produce more than one output for each input string. A transducer may also produce no output for a given input string, in which case it is said to ''reject'' the input. In general, a transducer computes a [[relation (mathematics)|relation]] between two formal languages. The class of relations computed by finite state transducers is known as the class of [[rational relation]]s.
| | '''MathML''' |
| | :<math forcemathmode="mathml">E=mc^2</math> |
|
| |
|
| Finite-state transducers are often used for [[phonology|phonological]] and [[morphology (linguistics)|morphological analysis]] in [[natural language processing]] research and applications. Pioneers in this field include [[Ronald Kaplan]], [[Lauri Karttunen]], [[Martin Kay]] and [[Kimmo Koskenniemi]].<ref>{{Harvnb|Koskenniemi|1983}}</ref>
| | <!--'''PNG''' (currently default in production) |
| A common way of using transducers is in a so-called "cascade", where transducers for various operations are combined into a single transducer by repeated application of the composition operator (defined below).
| | :<math forcemathmode="png">E=mc^2</math> |
|
| |
|
| ==Formal construction== | | '''source''' |
| | :<math forcemathmode="source">E=mc^2</math> --> |
|
| |
|
| Formally, a finite transducer ''T'' is a 6-tuple (''Q'', Σ, Γ, ''I'', ''F'', δ) such that:
| | <span style="color: red">Follow this [https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering link] to change your Math rendering settings.</span> You can also add a [https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering-skin Custom CSS] to force the MathML/SVG rendering or select different font families. See [https://www.mediawiki.org/wiki/Extension:Math#CSS_for_the_MathML_with_SVG_fallback_mode these examples]. |
|
| |
|
| * ''Q'' is a [[finite set]], the set of ''states'';
| | ==Demos== |
| * Σ is a finite set, called the ''input alphabet'';
| |
| * Γ is a finite set, called the ''output alphabet'';
| |
| * ''I'' is a [[subset]] of ''Q'', the set of ''initial states'';
| |
| * ''F'' is a subset of ''Q'', the set of ''final states''; and
| |
| * <math>\delta \subseteq Q \times (\Sigma\cup\{\epsilon\}) \times (\Gamma\cup\{\epsilon\}) \times Q</math> (where ε is the [[empty string]]) is the ''transition relation''.
| |
|
| |
|
| We can view (''Q'', δ) as a labeled [[directed graph]], known as the ''transition graph'' of ''T'': the set of vertices is ''Q'', and <math>(q,a,b,r)\in\delta</math> means that there is a labeled edge going from vertex ''q'' to vertex ''r''. We also say that ''a'' is the ''input label'' and ''b'' the ''output label'' of that edge.
| | Here are some [https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/Frederic.wang demos]: |
|
| |
|
| NOTE: This definition of finite transducer is also called ''letter transducer'' (Roche and Schabes 1997); alternative definitions are possible, but can all be converted into transducers following this one.
| |
|
| |
|
| Define the ''extended transition relation'' <math>\delta^*</math> as the smallest set such that:
| | * accessibility: |
| | ** Safari + VoiceOver: [https://commons.wikimedia.org/wiki/File:VoiceOver-Mac-Safari.ogv video only], [[File:Voiceover-mathml-example-1.wav|thumb|Voiceover-mathml-example-1]], [[File:Voiceover-mathml-example-2.wav|thumb|Voiceover-mathml-example-2]], [[File:Voiceover-mathml-example-3.wav|thumb|Voiceover-mathml-example-3]], [[File:Voiceover-mathml-example-4.wav|thumb|Voiceover-mathml-example-4]], [[File:Voiceover-mathml-example-5.wav|thumb|Voiceover-mathml-example-5]], [[File:Voiceover-mathml-example-6.wav|thumb|Voiceover-mathml-example-6]], [[File:Voiceover-mathml-example-7.wav|thumb|Voiceover-mathml-example-7]] |
| | ** [https://commons.wikimedia.org/wiki/File:MathPlayer-Audio-Windows7-InternetExplorer.ogg Internet Explorer + MathPlayer (audio)] |
| | ** [https://commons.wikimedia.org/wiki/File:MathPlayer-SynchronizedHighlighting-WIndows7-InternetExplorer.png Internet Explorer + MathPlayer (synchronized highlighting)] |
| | ** [https://commons.wikimedia.org/wiki/File:MathPlayer-Braille-Windows7-InternetExplorer.png Internet Explorer + MathPlayer (braille)] |
| | ** NVDA+MathPlayer: [[File:Nvda-mathml-example-1.wav|thumb|Nvda-mathml-example-1]], [[File:Nvda-mathml-example-2.wav|thumb|Nvda-mathml-example-2]], [[File:Nvda-mathml-example-3.wav|thumb|Nvda-mathml-example-3]], [[File:Nvda-mathml-example-4.wav|thumb|Nvda-mathml-example-4]], [[File:Nvda-mathml-example-5.wav|thumb|Nvda-mathml-example-5]], [[File:Nvda-mathml-example-6.wav|thumb|Nvda-mathml-example-6]], [[File:Nvda-mathml-example-7.wav|thumb|Nvda-mathml-example-7]]. |
| | ** Orca: There is ongoing work, but no support at all at the moment [[File:Orca-mathml-example-1.wav|thumb|Orca-mathml-example-1]], [[File:Orca-mathml-example-2.wav|thumb|Orca-mathml-example-2]], [[File:Orca-mathml-example-3.wav|thumb|Orca-mathml-example-3]], [[File:Orca-mathml-example-4.wav|thumb|Orca-mathml-example-4]], [[File:Orca-mathml-example-5.wav|thumb|Orca-mathml-example-5]], [[File:Orca-mathml-example-6.wav|thumb|Orca-mathml-example-6]], [[File:Orca-mathml-example-7.wav|thumb|Orca-mathml-example-7]]. |
| | ** From our testing, ChromeVox and JAWS are not able to read the formulas generated by the MathML mode. |
|
| |
|
| * <math>\delta\subseteq\delta^*</math>;
| | ==Test pages == |
| * <math>(q,\epsilon,\epsilon,q)\in\delta^*</math> for all <math>q\in Q</math>; and
| |
| * whenever <math>(q,x,y,r) \in \delta^*</math> and <math>(r,a,b,s) \in \delta</math> then <math>(q,xa,yb,s) \in \delta^*</math>.
| |
|
| |
|
| The extended transition relation is essentially the reflexive [[transitive closure]] of the transition graph that has been augmented to take edge labels into account. The elements of <math>\delta^*</math> are known as ''paths''. The edge labels of a path are obtained by concatenating the edge labels of its constituent transitions in order.
| | To test the '''MathML''', '''PNG''', and '''source''' rendering modes, please go to one of the following test pages: |
| | *[[Displaystyle]] |
| | *[[MathAxisAlignment]] |
| | *[[Styling]] |
| | *[[Linebreaking]] |
| | *[[Unique Ids]] |
| | *[[Help:Formula]] |
|
| |
|
| The ''behavior'' of the transducer ''T'' is the rational relation [''T''] defined as follows: <math>x[T]y</math> [[if and only if]] there exists <math>i \in I</math> and <math>f \in F</math> such that <math>(i,x,y,f) \in \delta^*</math>. This is to say that ''T'' transduces a string <math>x\in\Sigma^*</math> into a string <math>y\in\Gamma^*</math> if there exists a path from an initial state to a final state whose input label is ''x'' and whose output label is ''y''.
| | *[[Inputtypes|Inputtypes (private Wikis only)]] |
| | | *[[Url2Image|Url2Image (private Wikis only)]] |
| Finite State Transducers can be weighted, where each transition is labeled with a weight in addition to the input and output labels. This property makes FSTs very useful for machine learning applications. A Weighted Finite State Transducer (WFST) over a [[semiring]] ''K'' can be defined similarly to an unweighted one as an 8-tuple ''T''=(''Q'', Σ, Γ, ''I'', ''F'', E, λ, ρ), where:
| | ==Bug reporting== |
| * ''Q'', Σ, Γ, ''I'', ''F'' are defined as above;
| | If you find any bugs, please report them at [https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=Math&version=master&short_desc=Math-preview%20rendering%20problem Bugzilla], or write an email to math_bugs (at) ckurs (dot) de . |
| * <math> E \subseteq Q \times (\Sigma\cup\{\epsilon\}) \times (\Gamma\cup\{\epsilon\}) \times Q \times K</math> (where ε is the [[empty string]]) is the finite set of transitions;
| |
| * <math>\lambda: I \rightarrow K </math> maps initial states to weights;
| |
| * <math>\rho: F \rightarrow K </math> maps final states to weights.
| |
| In order to make certain operations on WFSTs well-defined, the weights have to form a [[Semiring]]. Two typical semirings used in practice are Log and Tropical Semirings.
| |
| | |
| ==Operations on finite state transducers==
| |
| | |
| The following operations defined on finite automata also apply to finite transducers:
| |
| | |
| * [[Union (set theory)|Union]]. Given transducers ''T'' and ''S'', there exists a transducer <math>T\cup S</math> such that <math>x[T\cup S]y</math> if and only if <math>x[T]y</math> or <math>x[S]y</math>.
| |
| | |
| * Concatenation. Given transducers ''T'' and ''S'', there exists a transducer <math>T\cdot S</math> such that <math>wx[T\cdot S]yz</math> if and only if <math>w[T]y</math> and <math>x[S]z</math>.
| |
| | |
| * [[Kleene closure]]. Given a transducer ''T'', there exists a transducer <math>T^*</math> with the following properties: (1) <math>\epsilon[T^*]\epsilon</math>; (2) if <math>w[T^*]y</math> and <math>x[T]z</math> then <math>wx[T^*]yz</math>; and <math>x[T^*]y</math> does not hold unless mandated by (1) or (2).
| |
| | |
| * [[Intersection (set theory)|Intersection]]. Given transducers ''T'', ''S'', the intersection of these transducers ''H'' has the property that (1) x[''H'']y if and only if x[''T'']y and x[''S'']y.
| |
| | |
| * [[Composition of relations|Composition]]. Given a transducer ''T'' on alphabets Σ and Γ and a transducer ''S'' on alphabets Γ and Δ, there exists a transducer <math>T \circ S</math> on Σ and Δ such that <math>x[T \circ S]z</math> if and only if there exists a string <math>y\in\Gamma^*</math> such that <math>x[T]y</math> and <math>y[S]z</math>. This operation extends to the weighted case.<ref>{{harvnb|Mohri|2004|pp=3–5}}</ref> | |
| | |
| :This definition uses the same notation which is used in mathematics for [[Composition of relations|relation composition]]. However, the conventional reading for relation composition is the other way around: given two relations <math>T</math> and <math>S</math>, <math>(x,z)\in T\circ S</math> when there exist some <math>y</math> such that <math>(x,y)\in S</math> and <math>(y,z)\in T</math>.
| |
| | |
| * [[Projection (mathematics)|Projection]] to an automaton. There are two projection functions: <math>\pi_1</math> preserves the input tape, and <math>\pi_2</math> preserves the output tape. The first projection, <math>\pi_1</math> is defined as follows:
| |
| | |
| :* Given a transducer ''T'', there exists a finite automaton <math>\pi_1 T</math> such that <math>\pi_1 T</math> accepts ''x'' if and only if there exists a string ''y'' for which <math>x[T]y</math>.
| |
| | |
| :The second projection, <math>\pi_2</math> is defined similarly.
| |
| | |
| * [[Determinization]]. Given a transducer ''T'', we want to build an equivalent transducer which has a unique initial state and such that no two transitions leaving any state share the same input label. The [[powerset construction]] can be extended to transducers, or even weighted transducers, but sometimes fails to halt; indeed, some non-deterministic transducers do not admit equivalent deterministic transducers.<ref>[http://www.let.rug.nl/~vannoord/papers/preds/node22.html]</ref> [[Characterization (mathematics)|Characterizations]] of determinizable transducers have been proposed<ref>{{harvnb|Mohri|2004|pp=5–6}}</ref> along with efficient algorithms to test them<ref>{{harvnb|Allauzen|2003}}</ref>: they rely on the [[semiring]] used in the weighted case as well as a general property on the structure of the transducer (the [[twins property]]).
| |
| | |
| * Weight pushing for the weighted case.<ref>{{harvnb|Mohri|2004|pp=7–9}}</ref>
| |
| | |
| * Minimization for the weighted case.<ref>{{harvnb|Mohri|2004|pp=9–11}}</ref>
| |
| | |
| * Removal of [[epsilon-transitions]].
| |
| | |
| ==Additional properties of finite state transducers== | |
| | |
| * It is [[decidable]] whether the relation [''T''] of a transducer ''T'' is empty.
| |
| | |
| * It is decidable whether there exists a string ''y'' such that ''x''[''T'']''y'' for a given string ''x''.
| |
| | |
| * It is [[undecidable]] whether two transducers are equivalent.
| |
| | |
| * If one defines the alphabet of labels <math>L=(\Sigma\cup\{\epsilon\}) \times (\Gamma\cup\{\epsilon\})</math>, finite state transducers are isomorphic to [[nondeterministic finite automata|NDFA]] over the alphabet <math>L</math>, and may therefore be determinized (turned into [[deterministic finite automaton|deterministic finite automata]] over the alphabet <math>L=[(\Sigma\cup\{\epsilon\}) \times \Gamma] \cup [\Sigma \times (\Gamma\cup\{\epsilon\})]</math> ) and subsequently minimized so that they have the minimum number of states.{{Citation needed|date=September 2008}}
| |
| | |
| ==Applications==
| |
| | |
| * Context-sensitive rewriting rules of the form ''a → b / c _ d'', used in [[linguistics]] to model [[phonological rule]]s and [[sound change]], are computationally equivalent to finite-state transducers, provided that application is nonrecursive, i.e. the rule is not allowed to rewrite the same substring twice. <ref>{{cite web | url=http://acl.ldc.upenn.edu/J/J94/J94-3001.pdf | title=Regular Models of Phonological Rule Systems | accessdate=August 25, 2012}}</ref>
| |
| | |
| ==See also==
| |
| | |
| === Internal links ===
| |
| * [[Mealy machine]]
| |
| * [[Moore machine]]
| |
| * [[Morphological dictionary]]
| |
| * [[foma (software)]]
| |
| | |
| === External links ===
| |
| * [http://openfst.org/ OpenFst], an open-source library for FST operations.
| |
| * [http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/SFST.html Stuttgart Finite State Transducer Tools], another open-source FST toolkit
| |
| * [http://jsalatas.ictpro.gr/java-fst-framework-api-review/ java FST Framework], an open-source java FST Framework capable of handling OpenFst text format.
| |
| | |
| ==Notes==
| |
| {{reflist|3}}
| |
| | |
| == References ==
| |
| | |
| <references/>
| |
| <div class="references-small">
| |
| *{{cite journal
| |
| | last1 = Allauzen
| |
| | first1 = Cyril
| |
| | last2 = Mohri
| |
| | first2 = Mehryar
| |
| | title = Efficient Algorithms for Testing the Twins Property
| |
| | journal = Journal of Automata, Languages and Combinatorics
| |
| | year = 2003
| |
| | volume = 8
| |
| | issue = 2
| |
| | pages = 117–144
| |
| | url = http://www.cs.nyu.edu/~mohri/pub/twins.pdf
| |
| }}
| |
| *{{citation
| |
| | last = Koskenniemi
| |
| | first = Kimmo
| |
| | authorlink = Kimmo Koskenniemi
| |
| | title = Two-level morphology: A general computational model of word-form recognition and production
| |
| | publisher = Department of General Linguistics, [[University of Helsinki]]
| |
| | year = 1983
| |
| | url = http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf
| |
| }}
| |
| *{{cite journal
| |
| | last = Mohri
| |
| | first = Mehryar
| |
| | title = Weighted Finite-State Transducer Algorithms: An Overview
| |
| | journal = Formal Languages and Applications
| |
| | year = 2004
| |
| | volume = 148
| |
| | issue = 620
| |
| | pages = 551–564
| |
| | url = http://www.cs.nyu.edu/~mohri/pub/fla.pdf
| |
| }}
| |
| </div>
| |
| | |
| ==Further reading==
| |
| | |
| * {{cite book |last= Jurafsky|first= Daniel |coauthors= James H. Martin|authorlink=Daniel Jurafsky|title= [[Speech and Language Processing]] |publisher= [[Prentice Hall]] |year= 2000 |isbn= 0-13-095069-6 |pages=71–83}}
| |
| * {{cite book |last= Kornai|first= Andras|authorlink=Andras Kornai|title=[[Extended Finite State Models of Language]] |publisher= [[Cambridge University Press]] |year=1999 |isbn= 0-521-63198-X}}
| |
| * {{cite book |last= Roche|first= Emmanuel |coauthors= Yves Schabes|authorlink=Emmanuel Roche|title= [[Finite-state language processing]] |publisher= [[MIT Press]] |year= 1997 |isbn= 0-262-18182-7|pages=1–65}}
| |
| * {{cite book |last= Beesley|first= Kenneth R. |coauthors= Lauri Karttunen|authorlink=Kenneth R. Beesley|title= [[Finite State Morphology]] |publisher= [[Center for the Study of Language and Information]] |year= 2003 |isbn= 1-57586-434-7}}
| |
| * {{cite book |last= Roark|first= Brian |coauthors= Richard Sproat|authorlink=Brian Roark|title= [[Computational Approaches to Morphology and Syntax]] |publisher= [[Oxford University Press]] |year= 2007 |isbn= 0-19-927478-9}}
| |
| * {{cite book |last= Berstel|first= Jean |title= [[Transductions and Context-free Languages]] |publisher= [[Teubner Verlag]] |year= 1979 }}. [http://www-igm.univ-mlv.fr/~berstel/LivreTransductions/LivreTransductions.html Free PDF version]
| |
| | |
| {{DEFAULTSORT:Finite State Transducer}}
| |
| [[Category:Models of computation]]
| |
| [[Category:Formal languages]]
| |
| [[Category:Automata theory]]
| |
| | |
| [[bs:Konačni transduktor]]
| |
| [[ca:Transductor d'estats finits]]
| |
| [[de:Transduktor (Informatik)]]
| |
| [[es:Transductor de estados finitos]]
| |
| [[fr:Transducteur fini]]
| |
| [[hr:Konačni pretvornik]]
| |
| [[sr:Коначни трансдуктор]]
| |