Transition of state: Difference between revisions

en>KConWiki
en>Colonies Chris
m sp, date & link fixes; unlinking common words, replaced: ’s → 's (2), bra-ket → bra–ket using AWB
 
'''Linguistic sequence complexity''' (LC) is a measure of the 'vocabulary richness' of a text.<ref name=Trifonov1990>{{cite book| author=[[Edward N. Trifonov]] |year=1990| book=Structure & Methods| title=Structure and Methods| series= Human Genome Initiative and DNA Recombination| volume=1| pages=69–77|chapter=Making sense of the human genome|publisher=Adenine Press, New York}}</ref>
When a [[nucleotide]] sequence is written as text using a four-letter alphabet, the repetitiveness of the text, that is, the repetition of its [[N-gram|N-grams (words)]], can be calculated and serves as a measure of sequence complexity. Thus, the more complex a [[DNA sequence]], the richer its [[oligonucleotide]] vocabulary, whereas repetitious sequences have relatively lower complexities. Subsequent work improved the original algorithm described in ([[Edward Trifonov|Trifonov]] 1990)<ref name=Trifonov1990/> without changing the essence of the linguistic complexity approach.<ref name=Gabrielian1999>{{cite doi|10.1016/S0097-8485(99)00007-8|noedit}}</ref><ref name=Orlov2004>{{cite doi|10.1093/nar/gkh466|noedit}}</ref><ref name=Janson2004>{{cite doi|10.1016/j.tcs.2004.06.023|noedit}}</ref>
 
The meaning of LC may be better understood by representing a sequence as a [[Tree (data structure)|tree]] of all subsequences of the given sequence. The most complex sequences have maximally balanced trees, and the degree of imbalance, or tree asymmetry, serves as a complexity measure. The number of nodes at tree level {{math|<var>i</var>}} equals the actual vocabulary size of words of length {{math|<var>i</var>}} in the given sequence; in the most balanced tree, which corresponds to the most complex sequence of length N, the number of nodes at level {{math|<var>i</var>}} is either 4<sup>i</sup> or N-i+1, whichever is smaller. The complexity ({{math|<var>C</var>}}) of a sequence fragment (of length RW) can be calculated directly as the product of vocabulary-usage measures (U<sub>i</sub>):<ref name=Gabrielian1999 />
 
<math>C = U_1 U_2 \cdots U_i \cdots U_W</math>
 
Vocabulary usage for [[oligomers]] of a given size {{math|<var>i</var>}} can be defined as the ratio of the actual vocabulary size of a given sequence to the maximal possible vocabulary size for a sequence of that length. For example, U<sub>2</sub> for the sequence ACGGGAAGCTGATTCCA is 14/16, as it contains 14 of the 16 possible different dinucleotides; U<sub>3</sub> for the same sequence is 15/15 and U<sub>4</sub> is 14/14, because a sequence of length 17 has only 15 trinucleotide and 14 tetranucleotide positions, and all of them hold different words. For the sequence ACACACACACACACACA, U<sub>1</sub> = 1/2; U<sub>2</sub> = 2/16 = 0.125, as it has a simple vocabulary of only two dinucleotides; and U<sub>3</sub> = 2/15. k-tuples with k from two to W are considered, where W depends on RW. For RW less than 18, W is equal to 3; for RW less than 67, W is equal to 4; for RW less than 260, W is equal to 5; for RW less than 1029, W is equal to 6; and so on. The value of {{math|<var>C</var>}} provides a measure of sequence complexity in the range 0<C<1 for various DNA sequence fragments of a given length.<ref name=Gabrielian1999 />
This formula is different from the original LC measure<ref name=Trifonov1990/> in two respects: in the way vocabulary usage U<sub>i</sub> is calculated, and because {{math|<var>i</var>}} is not in the range of 2 to N-1 but only up to W. This limitation on the range of U<sub>i</sub> makes the algorithm substantially more efficient without loss of power.<ref name=Gabrielian1999 />
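
The worked examples above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the published implementation: the function names are hypothetical, the rule used to extend W beyond the listed thresholds (RW < 4<sup>W-1</sup> + W - 1) is inferred from the values 18, 67, 260 and 1029, and the product starts at U<sub>1</sub> by default, following the formula, with a <code>first</code> parameter for starting at 2 instead.

```python
from fractions import Fraction


def max_vocabulary(n, i, alphabet_size=4):
    """Largest possible number of distinct words of length i in a sequence of length n."""
    return min(alphabet_size ** i, n - i + 1)


def vocabulary_usage(seq, i, alphabet_size=4):
    """U_i: observed distinct i-mers divided by the maximal possible number."""
    n = len(seq)
    if i < 1 or i > n:
        raise ValueError("word length must be between 1 and the sequence length")
    observed = {seq[k:k + i] for k in range(n - i + 1)}
    return Fraction(len(observed), max_vocabulary(n, i, alphabet_size))


def word_limit(rw):
    """W for a fragment of length rw.

    Assumption: the thresholds 18, 67, 260, 1029 follow 4**(W-1) + (W-1),
    which is inferred from the listed values rather than stated in the text.
    """
    w = 3
    while rw >= 4 ** (w - 1) + (w - 1):
        w += 1
    return w


def complexity(seq, first=1):
    """C as the product of U_i for i from `first` to W (hypothetical helper)."""
    c = Fraction(1)
    for i in range(first, word_limit(len(seq)) + 1):
        c *= vocabulary_usage(seq, i)
    return c


if __name__ == "__main__":
    for s in ("ACGGGAAGCTGATTCCA", "ACACACACACACACACA"):
        print(s, [str(vocabulary_usage(s, i)) for i in (1, 2, 3)], float(complexity(s)))
```

Exact fractions are used so that the values can be compared directly with the ratios quoted in the text; for long sequences, floating-point arithmetic would serve equally well.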
 
In sequence analysis, this complexity calculation can be used to search for conserved regions between compared sequences and to detect low-complexity regions, including simple sequence repeats, imperfect [[Direct repeat|direct]] or [[inverted repeat]]s, polypurine and polypyrimidine [[Triple-stranded DNA|triple-stranded DNA structures]], and four-stranded structures (such as [[G-quadruplex]]es).<ref name=Kalendar2011>{{cite doi|10.1016/j.ygeno.2011.04.009|noedit}}</ref>
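
One way such a search for low-complexity regions might be organised is a sliding-window scan, sketched below in Python. The window length, the cut-off value and the function names are assumptions chosen for illustration; the cited work does not prescribe them here.

```python
def window_complexity(window, max_word=3, alphabet_size=4):
    """Product of U_1..U_max_word for one window (floating point)."""
    n = len(window)
    c = 1.0
    for i in range(1, max_word + 1):
        observed = {window[k:k + i] for k in range(n - i + 1)}
        c *= len(observed) / min(alphabet_size ** i, n - i + 1)
    return c


def low_complexity_windows(seq, window_size=17, threshold=0.5, max_word=3):
    """Return (start, C) for every window whose complexity is below the threshold.

    window_size and threshold are illustrative assumptions, not published values.
    """
    hits = []
    for start in range(len(seq) - window_size + 1):
        c = window_complexity(seq[start:start + window_size], max_word)
        if c < threshold:
            hits.append((start, c))
    return hits


if __name__ == "__main__":
    # A complex fragment followed by a simple AC repeat.
    example = "ACGGGAAGCTGATTCCA" + "ACACACACACACACACA"
    for start, c in low_complexity_windows(example):
        print(start, round(c, 4))
```

Overlapping hits would normally be merged into contiguous regions before reporting; that step is omitted to keep the sketch short.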
 
== References ==
{{reflist}}
 
[[Category:Nucleic acids]]
[[Category:Bioinformatics]]

Latest revision as of 22:09, 14 March 2014
