In [[computer science]], in the area of [[formal language theory]], frequent use is made of a variety of [[string functions]]; however, the notation used is different from that used in [[computer programming]], and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

==Strings and languages==

A string is a finite sequence of characters.
The [[empty string]] is denoted by <math>\varepsilon</math>.
The concatenation of two strings <math>s</math> and <math>t</math> is denoted by <math>s \cdot t</math>, or more briefly by <math>s t</math>.
Concatenating with the empty string makes no difference: <math>s \cdot \varepsilon = s = \varepsilon \cdot s</math>.
Concatenation of strings is associative: <math>s \cdot (t \cdot u) = (s \cdot t) \cdot u</math>.

For example, <math>(\langle b \rangle \cdot \langle l \rangle) \cdot (\varepsilon \cdot \langle ah \rangle) = \langle bl \rangle \cdot \langle ah \rangle = \langle blah \rangle</math>.

A [[language (computer science)|language]] is a finite or infinite set of strings.
Besides the usual set operations such as union and intersection, concatenation can also be applied to languages: if both <math>S</math> and <math>T</math> are languages, their concatenation <math>S \cdot T</math> is defined as the set of concatenations of any string from <math>S</math> and any string from <math>T</math>, formally <math>S \cdot T = \{ s \cdot t \mid s \in S \land t \in T \}</math>.
Again, the concatenation dot <math>\cdot</math> is often omitted for brevity.

The language <math>\{\varepsilon\}</math> consisting of just the empty string is to be distinguished from the empty language <math>\{\}</math>.
Concatenating any language with the former makes no change: <math>S \cdot \{\varepsilon\} = S = \{\varepsilon\} \cdot S</math>, while concatenating with the latter always yields the empty language: <math>S \cdot \{\} = \{\} = \{\} \cdot S</math>.
Concatenation of languages is associative: <math>S \cdot (T \cdot U) = (S \cdot T) \cdot U</math>.

For example, abbreviating <math>D = \{ \langle 0 \rangle, \langle 1 \rangle, \langle 2 \rangle, \langle 3 \rangle, \langle 4 \rangle, \langle 5 \rangle, \langle 6 \rangle, \langle 7 \rangle, \langle 8 \rangle, \langle 9 \rangle \}</math>, the set of all three-digit decimal numbers (including those written with leading zeros, such as <math>\langle 007 \rangle</math>) is obtained as <math>D \cdot D \cdot D</math>. The set of all decimal numbers of arbitrary length is an example of an infinite language.

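For finite languages, the definition of language concatenation translates directly into code. The following is a minimal Python sketch, assuming languages are represented as sets of strings and <math>\varepsilon</math> as the empty string <code>""</code>; the helper name <code>concat</code> is illustrative only.

```python
from itertools import product

def concat(S, T):
    """Concatenation S . T = {s t | s in S, t in T} of two finite languages."""
    return {s + t for s, t in product(S, T)}

EPS = ""                           # the empty string
D = {str(d) for d in range(10)}    # the one-digit language D

# {eps} is neutral for concatenation, while the empty language is absorbing.
assert concat(D, {EPS}) == D == concat({EPS}, D)
assert concat(D, set()) == set() == concat(set(), D)

# Associativity on this instance: D . (D . D) == (D . D) . D
assert concat(D, concat(D, D)) == concat(concat(D, D), D)

DDD = concat(concat(D, D), D)
print(len(DDD))        # 1000
print("007" in DDD)    # True
```

Only finite languages can be stored this way; the infinite language of all decimal numbers would need a different representation, such as an automaton.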
==Alphabet of a string==

The '''alphabet of a string''' is the set of all characters that occur in a particular string. If ''s'' is a string, its [[alphabet (computer science)|alphabet]] is denoted by

:<math>\operatorname{Alph}(s)</math>

The '''alphabet of a language''' <math>S</math> is the set of all characters that occur in any string of <math>S</math>, formally:

:<math>\operatorname{Alph}(S) = \bigcup_{s \in S} \operatorname{Alph}(s)</math>.

For example, the set <math>\{\langle a \rangle,\langle c \rangle,\langle o \rangle\}</math> is the alphabet of the string <math>\langle cacao \rangle</math>, and the [[#Strings_and_languages|above]] <math>D</math> is the alphabet of the [[#Strings_and_languages|above]] language <math>D \cdot D \cdot D</math> as well as of the language of all decimal numbers.

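A minimal Python sketch of <math>\operatorname{Alph}</math>, assuming strings are Python <code>str</code> values and finite languages are sets of strings (the function names are hypothetical):

```python
def alph(s):
    """Alph(s): the set of characters that occur in the string s."""
    return set(s)

def alph_language(S):
    """Alph(S) for a finite language S: the union of Alph(s) over all s in S."""
    return set().union(*(alph(s) for s in S))

print(sorted(alph("cacao")))     # ['a', 'c', 'o']

D = {str(d) for d in range(10)}
DDD = {x + y + z for x in D for y in D for z in D}
print(alph_language(DDD) == D)   # True
```

The empty language and the empty string both have the empty alphabet here, since the union over no sets is empty.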
==String substitution==

Let ''L'' be a [[language (computer science)|language]], and let <math>\Sigma</math> be its alphabet. A '''string substitution''' or simply a '''substitution''' is a mapping ''f'' that maps letters in <math>\Sigma</math> to languages (possibly over a different alphabet). Thus, for example, given a letter <math>a\in \Sigma</math>, one has <math>f(a)=L_a</math> where <math>L_a\subseteq\Delta^*</math> is some language whose alphabet is <math>\Delta</math>. This mapping may be extended to strings as

:<math>f(\varepsilon)=\{\varepsilon\}</math>

for the [[empty string]] <math>\varepsilon</math>, and

:<math>f(sa)=f(s)f(a)</math>

for every string <math>s\in\Sigma^*</math> and letter <math>a\in\Sigma</math>, where the right-hand side is a concatenation of languages. String substitution may be extended to the entire language as

:<math>f(L)=\bigcup_{s\in L} f(s)</math>

[[Regular language]]s are closed under string substitution. That is, if each letter of the alphabet of a regular language is substituted by a regular language, the result is still a regular language.

A simple example is the conversion <math>f_{uc}(\cdot)</math> to upper case, which may be defined, e.g., as follows:

{| class="wikitable"
|-
! letter <math>x</math> !! mapped to language <math>f_{uc}(x)</math> !! remark
|-
| <math>\langle a \rangle</math> || <math>\{\langle A \rangle\}</math> || map lower-case char to corresponding upper-case char
|-
| <math>\langle A \rangle</math> || <math>\{\langle A \rangle\}</math> || map upper-case char to itself
|-
| <math>\langle \text{ß} \rangle</math> || <math>\{\langle SS \rangle\}</math> || no upper-case char available, map to two-char string
|-
| <math>\langle 0 \rangle</math> || <math>\{\varepsilon\}</math> || map digit to empty string
|-
| <math>\langle ! \rangle</math> || <math>\{\}</math> || forbid punctuation, map to empty language
|-
| <math>\ldots</math> || || similar for other chars
|}

For the extension of <math>f_{uc}</math> to strings, we have, e.g.,
* <math>f_{uc}(\langle \text{Straße} \rangle) = \{\langle S \rangle\} \cdot \{\langle T \rangle\} \cdot \{\langle R \rangle\} \cdot \{\langle A \rangle\} \cdot \{\langle SS \rangle\} \cdot \{\langle E \rangle\} = \{ \langle STRASSE \rangle \}</math>,
* <math>f_{uc}(\langle u2 \rangle) = \{\langle U \rangle\} \cdot \{\varepsilon\} = \{\langle U \rangle\}</math>, and
* <math>f_{uc}(\langle Go! \rangle) = \{\langle G \rangle\} \cdot \{\langle O \rangle\} \cdot \{\} = \{\}</math>.

For the extension of <math>f_{uc}</math> to languages, we have, e.g.,
* <math>f_{uc}(\{\langle \text{Straße} \rangle, \langle u2 \rangle, \langle Go! \rangle\}) = \{ \langle STRASSE \rangle \} \cup \{\langle U \rangle\} \cup \{\} = \{ \langle STRASSE \rangle, \langle U \rangle\}</math>.

Another example is the conversion of an [[EBCDIC]]-encoded string to [[ASCII]].

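The upper-case substitution <math>f_{uc}</math> described above can be sketched in Python for finite languages. This assumes a substitution is stored as a dict from letters to sets of strings, with <math>\varepsilon</math> as <code>""</code>; the table <code>F_UC</code> is a hypothetical, partial encoding covering only ASCII letters, ß, digits and ASCII punctuation (any other character raises <code>KeyError</code>).

```python
import string

def concat(S, T):
    """Concatenation of two finite languages."""
    return {s + t for s in S for t in T}

def subst_string(f, s):
    """Extend f (letter -> finite language) to the string s:
    f(eps) = {eps} and f(t a) = f(t) . f(a)."""
    result = {""}                      # f(eps) = {eps}
    for a in s:
        result = concat(result, f[a])  # empty once any letter maps to {}
    return result

def subst_language(f, L):
    """f(L) = union of f(s) over all s in the finite language L."""
    out = set()
    for s in L:
        out |= subst_string(f, s)
    return out

# Hypothetical partial table for f_uc.
F_UC = {c: {c.upper()} for c in string.ascii_lowercase}
F_UC.update({c: {c} for c in string.ascii_uppercase})
F_UC["ß"] = {"SS"}
F_UC.update({d: {""} for d in string.digits})        # digits -> {eps}
F_UC.update({p: set() for p in string.punctuation})  # punctuation -> {}

print(subst_string(F_UC, "Straße"))  # {'STRASSE'}
print(subst_string(F_UC, "u2"))      # {'U'}
print(subst_string(F_UC, "Go!"))     # set()
```

Note how the distinction between <math>\{\varepsilon\}</math> (a digit) and <math>\{\}</math> (punctuation) shows up directly: the former leaves the running result unchanged, the latter empties it.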
==String homomorphism==

A '''string homomorphism''' (often referred to simply as a [[Homomorphism#Homomorphisms_and_e-free_homomorphisms_in_formal_language_theory|homomorphism]] in [[formal language theory]]) is a string substitution such that each letter is replaced by a single string. That is, <math>f(a)=s</math>, where ''s'' is a string, for each letter ''a''.

String homomorphisms are [[monoid morphism]]s on the [[free monoid]], preserving the [[binary operation]] of [[string concatenation]]. Given a language ''L'', the set <math>f(L)</math> is called the '''homomorphic image''' of ''L''. The '''inverse homomorphic image''' of a string ''s'' is defined as

:<math>f^{-1}(s)=\{w\vert f(w)=s\}</math>

while the inverse homomorphic image of a language ''L'' is defined as

:<math>f^{-1}(L)=\{s\vert f(s)\in L\}</math>

Note that, in general, <math>f(f^{-1}(L))\ne L</math>, while one does have

:<math>f(f^{-1}(L)) \subseteq L</math>

and

:<math>L \subseteq f^{-1}(f(L))</math>

for any language ''L''.

A string homomorphism is said to be <math>\varepsilon</math>-free (or e-free) if <math>f(a) \ne \varepsilon</math> for all <math>a</math> in the alphabet <math>\Sigma</math>. Simple single-letter [[substitution cipher]]s are examples of (<math>\varepsilon</math>-free) string homomorphisms.

An example string homomorphism <math>g_{uc}</math> can be defined similarly to the [[#String_substitution|above]] substitution: <math>g_{uc}(\langle a \rangle) = \langle A \rangle</math>, ..., <math>g_{uc}(\langle 0 \rangle) = \varepsilon</math>, but leaving <math>g_{uc}</math> undefined on punctuation characters. Besides this restriction of its input domain, <math>g_{uc}</math> differs from <math>f_{uc}</math> in that it returns strings, while the latter returns singleton sets of strings. Since upper-case letters are mapped to themselves and digits are mapped to <math>\varepsilon</math>, the full inverse homomorphic images are infinite (any number of digits may be inserted into a preimage); restricting attention to preimages consisting of lower-case letters only, examples are
* <math>g_{uc}^{-1}(\{ \langle SSS \rangle \}) = \{ \langle sss \rangle, \langle \text{sß} \rangle, \langle \text{ßs} \rangle\} </math>, since <math>g_{uc}(\langle sss \rangle) = g_{uc}(\langle \text{sß} \rangle) = g_{uc}(\langle \text{ßs} \rangle) = \langle SSS \rangle</math>, and
* <math>g_{uc}^{-1}(\{ \langle A \rangle, \langle bb \rangle \}) = \{ \langle a \rangle\} </math>, since <math>g_{uc}(\langle a \rangle) = \langle A \rangle</math>, while <math>\langle bb \rangle</math> cannot be reached by <math>g_{uc}</math>.

For the latter language, <math>g_{uc}(g_{uc}^{-1}(\{ \langle A \rangle, \langle bb \rangle \})) = g_{uc}(\{ \langle a \rangle\}) = \{ \langle A \rangle \} \neq \{ \langle A \rangle, \langle bb \rangle \}</math>.
The homomorphism <math>g_{uc}</math> is not <math>\varepsilon</math>-free, since it maps, e.g., <math>\langle 0 \rangle</math> to <math>\varepsilon</math>.

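The homomorphic image and a bounded inverse image can be sketched in Python. This is a minimal illustration, assuming a homomorphism is a dict from letters to strings; <code>G_UC</code> is a hypothetical encoding of <math>g_{uc}</math>. Because letters mapped to <math>\varepsilon</math> (the digits) make inverse images infinite, the search only enumerates preimages over a caller-supplied set of letters whose images are non-empty, here assumed to be the lower-case letters and ß.

```python
import string

def hom(g, w):
    """Image g(w) of the string w under a homomorphism g (dict letter -> string)."""
    return "".join(g[c] for c in w)

def inverse_image(g, L, letters):
    """All strings w over `letters` with hom(g, w) in the finite language L.
    Every letter used must have a non-empty image, which keeps the result finite."""
    if any(g[c] == "" for c in letters):
        raise ValueError("every letter in `letters` needs a non-empty image")
    found = set()

    def search(target, pos, prefix):
        # prefix is a preimage of target[:pos]; try to extend it by one letter
        if pos == len(target):
            found.add(prefix)
            return
        for c in letters:
            if target.startswith(g[c], pos):
                search(target, pos + len(g[c]), prefix + c)

    for target in L:
        search(target, 0, "")
    return found

# Hypothetical g_uc: lower case -> upper case, upper case -> itself,
# ß -> SS, digits -> eps; punctuation is left undefined.
G_UC = {c: c.upper() for c in string.ascii_lowercase}
G_UC.update({c: c for c in string.ascii_uppercase})
G_UC["ß"] = "SS"
G_UC.update({d: "" for d in string.digits})
LOWER = set(string.ascii_lowercase) | {"ß"}

print(hom(G_UC, "u2"))                              # U
print(sorted(inverse_image(G_UC, {"SSS"}, LOWER)))  # ['sss', 'sß', 'ßs']
pre = inverse_image(G_UC, {"A", "bb"}, LOWER)
print(pre, {hom(G_UC, w) for w in pre})             # {'a'} {'A'}
print(all(v != "" for v in G_UC.values()))          # False: not eps-free
```

The last two lines mirror <math>f(f^{-1}(L)) \subseteq L</math>: mapping the preimages back gives only <math>\langle A \rangle</math>, not <math>\langle bb \rangle</math>.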
==String projection==

If ''s'' is a string, and <math>\Sigma</math> is an alphabet, the '''string projection''' of ''s'' is the string that results from removing all letters which are not in <math>\Sigma</math>. It is written as <math>\pi_\Sigma(s)\,</math>. It is formally defined by removal of letters from the right-hand side:

:<math>\pi_\Sigma(s) = \begin{cases}
\varepsilon & \mbox{if } s=\varepsilon \mbox{ the empty string} \\
\pi_\Sigma(t) & \mbox{if } s=ta \mbox{ and } a \notin \Sigma \\
\pi_\Sigma(t)a & \mbox{if } s=ta \mbox{ and } a \in \Sigma
\end{cases}</math>

Here <math>\varepsilon</math> denotes the [[empty string]]. The projection of a string is essentially the same as a [[projection in relational algebra]].

String projection may be promoted to the '''projection of a language'''. Given a [[formal language]] ''L'', its projection is given by

:<math>\pi_\Sigma (L)=\{\pi_\Sigma(s) \vert s\in L \}</math>

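The recursive definition of <math>\pi_\Sigma</math> can be transcribed almost literally. A minimal Python sketch, assuming <math>\Sigma</math> is given as a set of one-character strings and a finite language as a set of strings (function names are hypothetical):

```python
def project(sigma, s):
    """pi_sigma(s), following the recursive definition: split s = t a,
    project t, and keep the last letter a only if it belongs to sigma."""
    if s == "":
        return ""                                   # pi(eps) = eps
    t, a = s[:-1], s[-1]
    return project(sigma, t) + a if a in sigma else project(sigma, t)

def project_language(sigma, L):
    """pi_sigma(L) = {pi_sigma(s) | s in L} for a finite language L."""
    return {project(sigma, s) for s in L}

print(project({"a", "b"}, "abcab"))                           # abab
print(sorted(project_language({"a"}, {"abc", "bca", "b"})))   # ['', 'a']
```

For long strings, the equivalent non-recursive form <code>"".join(c for c in s if c in sigma)</code> avoids Python's recursion limit.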
==Right quotient==

The '''right quotient''' of a letter ''a'' from a string ''s'' is the truncation of the letter ''a'' from the right-hand end of ''s''. It is denoted as <math>s/a</math>. If the string does not end with ''a'', the result is the empty string. Thus:

:<math>(sa)/ b = \begin{cases}
s & \mbox{if } a=b \\
\varepsilon & \mbox{if } a \ne b
\end{cases}</math>

The quotient of the empty string may be taken:

:<math>\varepsilon / a = \varepsilon</math>

Similarly, given a subset <math>S\subset M</math> of a monoid <math>M</math>, one may define the quotient subset as

:<math>S/a=\{s\in M \vert sa\in S\}</math>

Left quotients may be defined similarly, with operations taking place on the left of a string.

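Both forms of the right quotient can be sketched in Python for the free monoid of strings, assuming <math>a</math> is a single character and <math>S</math> a finite set of strings (the function names are hypothetical):

```python
def right_quotient(s, a):
    """s / a for a single letter a: drop a trailing a, otherwise return eps."""
    if len(a) != 1:
        raise ValueError("a must be a single letter")
    return s[:-1] if s.endswith(a) else ""

def quotient_set(S, a):
    """S / a = {s | s a in S} for a finite set S of strings."""
    return {t[:-1] for t in S if t.endswith(a)}

print(right_quotient("abc", "c"))                     # ab
print(right_quotient("abc", "b") == "")               # True
print(right_quotient("", "a") == "")                  # True
print(sorted(quotient_set({"ab", "cb", "ba"}, "b")))  # ['a', 'c']
```

The two behave differently on non-matching strings: the string quotient returns <math>\varepsilon</math>, whereas the set quotient simply leaves such strings out.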
==Syntactic relation==

The right quotient of a subset <math>S\subset M</math> of a monoid <math>M</math> defines an [[equivalence relation]], called the '''right [[syntactic relation]]''' of ''S''. It is given by

:<math>\sim_S \;\,=\, \{(s,t)\in M\times M \vert S/s = S/t \}</math>

The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family of right quotients is finite; that is, if

:<math>\{S/m \vert m\in M\}</math>

is finite. In this case, ''S'' is a [[recognizable language]], that is, a language that can be recognized by a [[finite state automaton]]. This is discussed in greater detail in the article on [[syntactic monoid]]s.

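The classes of <math>\sim_S</math> can be illustrated for a small finite language by grouping strings according to their quotient <math>S/m = \{s \vert sm\in S\}</math>, with <math>m</math> now an arbitrary string. This Python sketch assumes a finite <math>S</math> and only inspects strings <math>m</math> up to a chosen length, so it illustrates the classes rather than proving anything about the index; for a finite <math>S</math>, every <math>m</math> longer than all strings of <math>S</math> gives the empty quotient. The names are hypothetical.

```python
from itertools import product

def quotient(S, m):
    """S / m = {s | s m in S} for a finite language S and any string m."""
    return frozenset(t[:len(t) - len(m)] for t in S if t.endswith(m))

def syntactic_classes(S, alphabet, max_len):
    """Group all strings over `alphabet` of length <= max_len by S / m;
    strings in the same group are related by ~_S."""
    classes = {}
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            m = "".join(letters)
            classes.setdefault(quotient(S, m), []).append(m)
    return classes

S = {"ab", "b"}
for q, members in syntactic_classes(S, {"a", "b"}, 2).items():
    print(sorted(q), members)
```

For <math>S = \{ab, b\}</math> this yields four groups: <math>\varepsilon</math> alone, <math>b</math> alone, <math>ab</math> alone, and all remaining strings, whose quotient is empty.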
==Right cancellation==

The '''right cancellation''' of a letter ''a'' from a string ''s'' is the removal of the first occurrence of the letter ''a'' in the string ''s'', starting from the right-hand side. It is denoted as <math>s\div a</math> and is recursively defined as

:<math>(sa)\div b = \begin{cases}
s & \mbox{if } a=b \\
(s\div b)a & \mbox{if } a \ne b
\end{cases}</math>

The empty string is always cancellable:

:<math>\varepsilon \div a = \varepsilon</math>

Clearly, right cancellation and projection [[Commutative property|commute]]:

:<math>\pi_\Sigma(s)\div a = \pi_\Sigma(s \div a )</math>

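A Python sketch of right cancellation that follows the recursive definition case by case, together with a spot check of the commutation with projection on a few sample strings. It assumes strings are <code>str</code> values and letters single characters; the names are hypothetical, and the loop checks examples rather than proving the identity.

```python
def right_cancel(s, a):
    """s ÷ a: remove the rightmost occurrence of the letter a (s unchanged if absent)."""
    if s == "":
        return ""                          # eps ÷ a = eps
    t, last = s[:-1], s[-1]
    if last == a:
        return t                           # (t a) ÷ a = t
    return right_cancel(t, a) + last       # (t b) ÷ a = (t ÷ a) b  for b != a

def project(sigma, s):
    """pi_sigma(s): keep only the letters of s that lie in sigma."""
    return "".join(c for c in s if c in sigma)

print(right_cancel("abcab", "a"))   # abcb
print(right_cancel("bcd", "a"))     # bcd

# Spot check: right cancellation and projection commute on these samples.
for sigma in ({"a", "b"}, {"b"}, {"c"}, set()):
    for s in ("abcab", "cab", "aaa", ""):
        for a in "abc":
            assert right_cancel(project(sigma, s), a) == project(sigma, right_cancel(s, a))
```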
==Prefixes==

The set of '''prefixes of a string''' ''s'' is the set of all [[prefix (computer science)|prefixes]] of ''s'', with respect to a given language ''L'':

:<math>\operatorname{Pref}_L(s) = \{t \vert s=tu \mbox { for } t,u\in \operatorname{Alph}(L)^*\}</math>

where <math>s\in L</math>.

The '''prefix closure of a language''' is

:<math>\operatorname{Pref} (L) = \bigcup_{s\in L} \operatorname{Pref}_L(s) = \left\{ t\vert s=tu; s\in L; t,u\in \operatorname{Alph}(L)^* \right\}</math>

'''Example:''' if <math>L=\left\{abc\right\}</math>, then <math>\operatorname{Pref}(L)=\left\{\varepsilon, a, ab, abc\right\}</math>.

A language is called '''prefix closed''' if <math>\operatorname{Pref} (L) = L</math>.

The prefix closure operator is [[idempotent]]:

:<math>\operatorname{Pref} (\operatorname{Pref} (L)) =\operatorname{Pref} (L)</math>

The '''prefix relation''' is a [[binary relation]] <math>\sqsubseteq</math> such that <math>s\sqsubseteq t </math> if and only if <math>s \in \operatorname{Pref}_L(t)</math>. This relation is a particular example of a [[prefix order]].

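For finite languages, the prefix closure and the prefix relation are straightforward to compute. A minimal Python sketch, assuming languages are sets of strings; since every prefix of a string <math>s\in L</math> is automatically over <math>\operatorname{Alph}(L)</math>, <code>prefixes(s)</code> coincides with <math>\operatorname{Pref}_L(s)</math> here. The function names are hypothetical.

```python
def prefixes(s):
    """All prefixes of s, from eps up to s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def pref(L):
    """Prefix closure Pref(L) of a finite language L."""
    closure = set()
    for s in L:
        closure |= prefixes(s)
    return closure

def is_prefix_closed(L):
    return pref(L) == set(L)

def is_prefix(s, t):
    """The prefix relation: s ⊑ t."""
    return t.startswith(s)

L = {"abc"}
print(sorted(pref(L)))                                # ['', 'a', 'ab', 'abc']
print(is_prefix_closed(L), is_prefix_closed(pref(L))) # False True
print(pref(pref(L)) == pref(L))                       # True
print(is_prefix("ab", "abc"), is_prefix("b", "abc"))  # True False
```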
==See also==

* [[Comparison of programming languages (string functions)]]
* [[Levi's lemma]]

==References==

{{reflist}}
* {{cite book | first1=John E. | last1=Hopcroft | first2=Jeffrey D. | last2=Ullman | title=Introduction to Automata Theory, Languages and Computation | publisher=Addison-Wesley Publishing | location=Reading, Massachusetts | year=1979 | isbn=0-201-02988-X | zbl=0426.68001 }} ''(See chapter 3.)''

[[Category:Formal languages]]
[[Category:Relational algebra]]
[[Category:String (computer science)|Operations]]