Main Page: Difference between revisions

From formulasearchengine
No edit summary
No edit summary
 
(294 intermediate revisions by more than 100 users not shown)
Line 1: Line 1:
{{About|Bioinformatics|the disease in horses known by the acronym "PSSM"|Equine polysaccharide storage myopathy}}
This is a preview for the new '''MathML rendering mode''' (with SVG fallback), which is available in production for registered users.
{{Expert-subject|date=September 2011}}


A '''position weight matrix (PWM)''', also called '''position-specific weight matrix (PSWM)''' or '''position-specific scoring matrix (PSSM)''', is a commonly used representation of [[sequence motif|motifs]] (patterns) in biological sequences.<ref name="Ben-Gal2005">{{cite journal |author=Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S, Grosse I |title=Identification of Transcription Factor Binding Sites with Variable-order Bayesian Networks |journal=Bioinformatics |volume=21 |issue=11 |year=2005 |pages=2657–2666 |url=http://bioinformatics.oxfordjournals.org/cgi/reprint/bti410?ijkey=KkxNhRdTSfvtvXY&keytype=ref |doi=10.1093/bioinformatics/bti410 |pmid=15797905}}</ref>
If you would like to use the '''MathML''' rendering mode, you need a Wikipedia user account, which can be registered here [[https://en.wikipedia.org/wiki/Special:UserLogin/signup]]
* Only registered users will be able to execute this rendering mode.
* Note: you need not enter an email address (nor any other private information). Please do not use a password that you use elsewhere.


A PWM is a matrix of score values that gives a weighted match to any given [[substring]] of fixed length. It has one row for each symbol of the alphabet, and one column for each position in the pattern. The score assigned by a PWM to a [[substring]] <math>s=(s_j)_{j=1}^N</math> is defined as <math>\textstyle \sum_{j=1}^{N}{m_{s_j,j}}</math>, where <math>j</math> represents position in the substring, <math>s_j</math> is the symbol at position <math>j</math> in the substring, and <math>m_{\alpha,j}</math> is the score in row <math>\alpha</math>, column <math>j</math> of the matrix. In other words, a PWM score is the sum of position-specific scores for each symbol in the substring.
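The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the matrix values, the DNA alphabet, and the names <code>PWM</code> and <code>pwm_score</code> are assumptions made for the example and do not come from the article.

```python
# Hypothetical PWM for a length-3 DNA motif; the scores are made-up values.
# PWM[symbol][j] plays the role of m_{symbol, j}: one row per symbol,
# one column per position in the pattern.
PWM = {
    "A": [1.0, -2.0, 0.5],
    "C": [-1.0, 0.5, -0.5],
    "G": [-1.0, 1.5, -2.0],
    "T": [0.5, -1.0, 1.0],
}


def pwm_score(pwm, substring):
    """Sum the position-specific scores m_{s_j, j} over the substring."""
    width = len(next(iter(pwm.values())))
    if len(substring) != width:
        raise ValueError("substring length must equal the PWM width")
    return sum(pwm[symbol][j] for j, symbol in enumerate(substring))


score_agt = pwm_score(PWM, "AGT")  # 1.0 + 1.5 + 1.0
score_ccc = pwm_score(PWM, "CCC")  # -1.0 + 0.5 - 0.5
```

Each position contributes exactly one table lookup, which is why a substring that is shorter or longer than the matrix is rejected here rather than scored.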
Registered users will be able to choose between the following three rendering modes:


==Basic PWM with log-likelihoods==
'''MathML'''
A PWM assumes independence between positions in the pattern, as it calculates scores at each position independently from the symbols at other positions.
:<math forcemathmode="mathml">E=mc^2</math>
The score of a substring aligned with a PWM can be interpreted as the [[likelihood function|log-likelihood]] of the substring under a product multinomial distribution. Each column holds log-likelihoods for the different symbols, and the likelihoods in a column sum to one, so each column corresponds to a [[Multinomial distribution]]. Because a PWM's score is a sum of log-likelihoods, which corresponds to a product of likelihoods, the PWM as a whole defines a product-multinomial distribution. The PWM scores can also be interpreted in a physical framework as the sum of binding energies for all [[nucleotide]]s (symbols of the substring) aligned with the PWM.


==Incorporating background distribution==
<!--'''PNG'''  (currently default in production)
Instead of using log-likelihood values in the PWM, as described in the previous paragraph, several methods use [[log-odds]] scores in the PWMs. An element in a PWM is then calculated as <math>m_{i,j}=\log(p_{i,j} / b_i)</math>, where <math>p_{i,j}</math> is the probability of observing symbol <math>i</math> at position <math>j</math> of the motif, and <math>b_i</math> is the probability of observing symbol <math>i</math> in a background model. The PWM score then corresponds to the log-odds of the substring being generated by the motif versus being generated by the background, in a [[generative model]] of the sequence.
:<math forcemathmode="png">E=mc^2</math>


==Information content of a PWM==
'''source'''
The [[information content]] (IC) of a PWM is sometimes of interest, as it says something about how different a given PWM is from a [[uniform distribution (discrete)|uniform distribution]].
:<math forcemathmode="source">E=mc^2</math> -->


The [[self-information]] of observing a particular symbol at a particular position of the motif is:
<span style="color: red">Follow this [https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering link] to change your Math rendering settings.</span> You can also add a [https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering-skin Custom CSS] to force the MathML/SVG rendering or select different font families. See [https://www.mediawiki.org/wiki/Extension:Math#CSS_for_the_MathML_with_SVG_fallback_mode these examples].
:<math>-\log(p_{i,j})</math>


The expected (average) self-information of a particular element in the PWM is then:
==Demos==
:<math>-p_{i,j} \cdot \log(p_{i,j})</math>


Finally, the IC of the PWM is then the sum of the expected self-information of every element:
Here are some [https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/Frederic.wang demos]:
:<math>\textstyle -\sum_{i,j} p_{i,j}\cdot \log(p_{i,j})</math>


Often, it is more useful to calculate the information content with the background letter frequencies of the sequences you are studying rather than assuming equal probabilities of each letter (e.g., the GC-content of the DNA of [[thermophilic]] bacteria ranges from 65.3% to 70.8%,<ref name="Aleksandrushkina1978">{{cite journal |author=Aleksandrushkina NI, Egorova LA |title=Nucleotide makeup of the DNA of thermophilic bacteria of the genus Thermus |journal=Mikrobiologiia |volume=47 |issue=2 |pages=250–2 |year=1978 |pmid=661633}}</ref> so in such genomes a motif of ATAT would contain much more information than a motif of CCGG). The equation for information content thus becomes
:<math>\textstyle \sum_{i,j} p_{i,j}\cdot \log(p_{i,j}/p_{b})</math>
where <math>p_{b}</math> is the background frequency for that letter. This corresponds to the [[Kullback-Leibler divergence]] or relative entropy. However, it has been shown that when using PSSM to search genomic sequences (see below) this uniform correction can lead to overestimation of the importance of the different bases in a motif, due to the uneven distribution of n-mers in real genomes, leading to a significantly larger number of false positives.<ref name="Erill2009">{{cite journal |author=Erill I, O'Neill MC |title=A reexamination of information theory-based methods for DNA-binding site identification |journal=BMC Bioinformatics |volume=10 |year=2009 |pmid=19210776 |pages=57 |doi=10.1186/1471-2105-10-57 |pmc=2680408}}</ref>
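As a rough numerical illustration of the background-dependent measure, the sketch below computes the Kullback-Leibler divergence of two toy motifs against a uniform background and against a GC-rich one. The base-2 logarithm, the function names, the one-hot motif columns and the 70% GC background are all assumptions chosen for the example, not values from the cited studies.

```python
import math


def column_entropy(column):
    """Expected self-information -sum_i p_i * log2(p_i) of one column, in bits."""
    return -sum(p * math.log2(p) for p in column if p > 0)


def relative_entropy(columns, background):
    """Sum over all columns of sum_i p_i * log2(p_i / b_i), in bits."""
    return sum(
        p * math.log2(p / b)
        for column in columns
        for p, b in zip(column, background)
        if p > 0  # terms with p = 0 contribute nothing
    )


# Probabilities are ordered A, C, G, T; each inner list is one motif position.
at_motif = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # hypothetical "AT"
cg_motif = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # hypothetical "CG"

uniform = [0.25, 0.25, 0.25, 0.25]
gc_rich = [0.15, 0.35, 0.35, 0.15]  # assumed background with 70% GC

at_uniform = relative_entropy(at_motif, uniform)
cg_uniform = relative_entropy(cg_motif, uniform)
at_gc_rich = relative_entropy(at_motif, gc_rich)
cg_gc_rich = relative_entropy(cg_motif, gc_rich)

half_half = column_entropy([0.5, 0.0, 0.0, 0.5])  # one bit of uncertainty
```

Against the uniform background both motifs score the same, while against the GC-rich background the AT motif scores higher than the CG motif, which is the effect described in the paragraph above.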


==Using PWMs==
* accessibility:
Various algorithms scan sequences for hits of PWMs. One example is the MATCH algorithm,<ref name="Kel2003">{{cite journal |author=Kel AE, ''et al.'' |title=MATCH™: a tool for searching transcription factor binding sites in DNA sequences |journal=Nucleic Acids Research |volume=31 |pages=3576–3579 |year=2003 |doi=10.1093/nar/gkg585 |pmid=12824369 |issue=13 |pmc=169193}}</ref> which has been implemented in ModuleMaster.<ref name="Wrzodek2010">{{Cite journal
** Safari + VoiceOver: [https://commons.wikimedia.org/wiki/File:VoiceOver-Mac-Safari.ogv video only], [[File:Voiceover-mathml-example-1.wav|thumb|Voiceover-mathml-example-1]], [[File:Voiceover-mathml-example-2.wav|thumb|Voiceover-mathml-example-2]], [[File:Voiceover-mathml-example-3.wav|thumb|Voiceover-mathml-example-3]], [[File:Voiceover-mathml-example-4.wav|thumb|Voiceover-mathml-example-4]], [[File:Voiceover-mathml-example-5.wav|thumb|Voiceover-mathml-example-5]], [[File:Voiceover-mathml-example-6.wav|thumb|Voiceover-mathml-example-6]], [[File:Voiceover-mathml-example-7.wav|thumb|Voiceover-mathml-example-7]]
  | last1 = Wrzodek  | first1 = Clemens
** [https://commons.wikimedia.org/wiki/File:MathPlayer-Audio-Windows7-InternetExplorer.ogg Internet Explorer + MathPlayer (audio)]
  | last2 = Schröder | first2 = Adrian
** [https://commons.wikimedia.org/wiki/File:MathPlayer-SynchronizedHighlighting-WIndows7-InternetExplorer.png Internet Explorer + MathPlayer (synchronized highlighting)]
  | last3 = Dräger | first3 = Andreas
** [https://commons.wikimedia.org/wiki/File:MathPlayer-Braille-Windows7-InternetExplorer.png Internet Explorer + MathPlayer (braille)]
  | last4 = Wanke | first4 = Dierk
** NVDA+MathPlayer: [[File:Nvda-mathml-example-1.wav|thumb|Nvda-mathml-example-1]], [[File:Nvda-mathml-example-2.wav|thumb|Nvda-mathml-example-2]], [[File:Nvda-mathml-example-3.wav|thumb|Nvda-mathml-example-3]], [[File:Nvda-mathml-example-4.wav|thumb|Nvda-mathml-example-4]], [[File:Nvda-mathml-example-5.wav|thumb|Nvda-mathml-example-5]], [[File:Nvda-mathml-example-6.wav|thumb|Nvda-mathml-example-6]], [[File:Nvda-mathml-example-7.wav|thumb|Nvda-mathml-example-7]].
  | last5 = Berendzen | first5 = Kenneth W.
** Orca: There is ongoing work, but no support at all at the moment [[File:Orca-mathml-example-1.wav|thumb|Orca-mathml-example-1]], [[File:Orca-mathml-example-2.wav|thumb|Orca-mathml-example-2]], [[File:Orca-mathml-example-3.wav|thumb|Orca-mathml-example-3]], [[File:Orca-mathml-example-4.wav|thumb|Orca-mathml-example-4]], [[File:Orca-mathml-example-5.wav|thumb|Orca-mathml-example-5]], [[File:Orca-mathml-example-6.wav|thumb|Orca-mathml-example-6]], [[File:Orca-mathml-example-7.wav|thumb|Orca-mathml-example-7]].
  | last6 = Kronfeld | first6 = Marcel
** From our testing, ChromeVox and JAWS are not able to read the formulas generated by the MathML mode.
  | last7 = Harter | first7 = Klaus
  | last8 = Zell | first8 = Andreas
  | title = ModuleMaster: A new tool to decipher transcriptional regulatory networks
  | journal = Biosystems
  | volume = 99
  | issue = 1
  | pages = 79–81
  | publisher = Elsevier
  | location = Ireland
  | date = 9 October 2009
  | year = 2010
  | doi = 10.1016/j.biosystems.2009.09.005
  | issn = 0303-2647
  | pmid = 19819296
}}</ref> More sophisticated algorithms for fast database searching with nucleotide as well as amino acid PWMs/PSSMs are implemented in the possumsearch software and are described by Beckstette, ''et al.'' (2006).<ref name="Beckstette2006">{{cite journal |author=Beckstette, M. |title=Fast index based algorithms and software for matching position specific scoring matrices |journal=BMC Bioinformatics |volume=7 |year=2006 |doi=10.1186/1471-2105-7-389 |pmid=16930469 |pages=389 |pmc=1635428 |display-authors=1 |last2=Homann |first2=Robert |last3=Giegerich |first3=Robert |last4=Kurtz |first4=Stefan}}</ref>
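
For orientation only, the following is a naive sliding-window scan with a log-odds PWM of the form <math>m_{i,j}=\log(p_{i,j}/b_i)</math>. It is not the MATCH algorithm and not the index-based method used by possumsearch. The count table, the pseudocount, the base-2 logarithm, the threshold and the function names are assumptions introduced for this sketch.

```python
import math

ALPHABET = "ACGT"


def log_odds_pwm(counts, background, pseudocount=1.0):
    """Build m[i][j] = log2(p_ij / b_i) from per-position symbol counts.

    The pseudocount is an assumption of this sketch; it keeps p_ij above
    zero so that the logarithm is always defined.
    """
    width = len(counts[ALPHABET[0]])
    pwm = {symbol: [] for symbol in ALPHABET}
    for j in range(width):
        total = sum(counts[s][j] + pseudocount for s in ALPHABET)
        for s in ALPHABET:
            p = (counts[s][j] + pseudocount) / total
            pwm[s].append(math.log2(p / background[s]))
    return pwm


def scan(pwm, sequence, threshold):
    """Return (offset, score) for every window scoring at least threshold.

    Symbols outside ALPHABET (for example N) are not handled here.
    """
    width = len(pwm[ALPHABET[0]])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[symbol][j] for j, symbol in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical counts from 8 aligned sites of a length-4 motif resembling TATA.
counts = {
    "A": [0, 8, 0, 7],
    "C": [0, 0, 1, 0],
    "G": [1, 0, 0, 1],
    "T": [7, 0, 7, 0],
}
background = {s: 0.25 for s in ALPHABET}

pwm = log_odds_pwm(counts, background)
hits = scan(pwm, "GGCTATAGGC", threshold=4.0)
```

This scan costs time proportional to the sequence length times the motif width, which is the cost that faster algorithms aim to reduce.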


==References==
==Test pages ==
{{Reflist|2}}


==External links==
To test the '''MathML''', '''PNG''', and '''source''' rendering modes, please go to one of the following test pages:
* [http://www.biodatamining.org/content/2/1/8 3PFDB] &mdash; a database of Best Representative PSSM Profiles (BRPs) of Protein Families generated using a novel data mining approach.
*[[Displaystyle]]
* [http://ugene.unipro.ru/ UGENE] &mdash; PSS matrices design, integrated interface to JASPAR, Uniprobe and SITECON databases.
*[[MathAxisAlignment]]
*[[Styling]]
*[[Linebreaking]]
*[[Unique Ids]]
*[[Help:Formula]]


{{Use dmy dates|date=September 2011}}
*[[Inputtypes|Inputtypes (private Wikis only)]]
 
*[[Url2Image|Url2Image (private Wikis only)]]
[[Category:Bioinformatics]]
==Bug reporting==
[[Category:Evaluation methods]]
If you find any bugs, please report them at [https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=Math&version=master&short_desc=Math-preview%20rendering%20problem Bugzilla], or write an email to math_bugs (at) ckurs (dot) de .
 
[[fa:ماتریس وزن موقعیت خاص]]

Latest revision as of 22:52, 15 September 2019

This is a preview for the new MathML rendering mode (with SVG fallback), which is available in production for registered users.

If you would like to use the MathML rendering mode, you need a Wikipedia user account, which can be registered here [[1]]

  • Only registered users will be able to execute this rendering mode.
  • Note: you need not enter an email address (nor any other private information). Please do not use a password that you use elsewhere.

Registered users will be able to choose between the following three rendering modes:

MathML

E=mc²


Follow this link to change your Math rendering settings. You can also add a Custom CSS to force the MathML/SVG rendering or select different font families. See these examples.

Demos

Here are some demos:


Test pages

To test the MathML, PNG, and source rendering modes, please go to one of the following test pages:

Bug reporting

If you find any bugs, please report them at Bugzilla, or write an email to math_bugs (at) ckurs (dot) de .