{{Use dmy dates|date=June 2013}}
{{More footnotes|date=April 2009}}
'''Belief propagation''', also known as '''sum-product message passing''', is a [[Message-passing method|message passing]] [[algorithm]] for performing [[inference]] on [[graphical model]]s, such as [[Bayesian network]]s and [[Markov random field]]s. It calculates the [[marginal distribution]] for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in [[artificial intelligence]] and [[information theory]] and has demonstrated empirical success in numerous applications including [[low-density parity-check codes]], [[turbo codes]], [[Thermodynamic free energy|free energy]] approximation, and [[satisfiability]].<ref name="Sat"/>
 
The algorithm was first proposed by [[Judea Pearl]] in 1982,<ref name="Pearl-1982">
{{cite conference|last=Pearl |first=Judea |authorlink=Judea Pearl
|year=1982
|title=Reverend Bayes on inference engines:  A distributed hierarchical approach
|booktitle=Proceedings of the Second National Conference on Artificial Intelligence
|conference=AAAI-82: Pittsburgh, PA
|conferenceurl=http://www.aaai.org/Library/AAAI/aaai82contents.php
|publisher=AAAI Press |location=Menlo Park, California
|pages=133&ndash;136
|url=https://www.aaai.org/Papers/AAAI/1982/AAAI82-032.pdf |accessdate=2009-03-28
}}
</ref> who formulated this algorithm on [[Tree (graph theory)|tree]]s, and was later extended to [[polytree]]s.<ref name="KimPearl-1983">
{{cite conference
|last=Kim |first=Jin H.
|last2=Pearl |first2=Judea |authorlink2=Judea Pearl
|year=1983
|title=A computational model for combined causal and diagnostic reasoning in inference systems
|booktitle=Proceedings of the Eighth International Joint Conference on Artificial Intelligence
|conference=IJCAI-83: Karlsruhe, Germany
|conferenceurl=http://ijcai.org/Past%20Proceedings/IJCAI-83-VOL-1/CONTENT/content.htm
|volume=1
|pages=190&ndash;193
|url=http://ijcai.org/Past%20Proceedings/IJCAI-83-VOL-1/PDF/041.pdf
|accessdate=2013-01-03
}}
</ref> It has since been shown to be a useful approximate algorithm on general graphs.<ref name="Pearl-88">
{{Cite book
|last1=Pearl |first1=Judea |authorlink1=Judea Pearl
|year=1988
|title=Probabilistic Reasoning in Intelligent Systems:  Networks of Plausible Inference
|edition=2nd
|location=San Francisco, CA |publisher=Morgan Kaufmann
|isbn=1-55860-479-0
}}
</ref>
 
If ''X''=(''X''<sub>''v''</sub>) is a set of [[Discrete probability distribution|discrete]] [[random variable]]s with a [[joint distribution|joint]] [[Probability mass function|mass function]] ''p'', the [[marginal distribution]] of a single ''X''<sub>''i''</sub> is simply the summation of ''p'' over all other variables:
 
:<math>p_{X_i}(x_i) = \sum_{\mathbf{x}': x'_i=x_i} p(\mathbf{x}').</math>
 
However this quickly becomes computationally prohibitive: if there are 100 binary variables, then one needs to sum over 2<sup>99</sup>&nbsp;≈&nbsp;6.338&nbsp;×&nbsp;10<sup>29</sup> possible values. By exploiting the [[Graph (mathematics)|graphical]] structure, belief propagation allows the marginals to be computed much more efficiently.
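The exponential cost of direct summation is easy to see in code. The following sketch marginalizes a toy joint mass function by brute force; the chain factorization over four binary variables is a hypothetical example, but the point is that the sum ranges over all 2<sup>''n''</sup> assignments:

```python
# Brute-force marginalization of a toy joint mass function over four
# binary variables. The chain factorization below is a hypothetical
# example; the sum ranges over every one of the 2^n assignments.
from itertools import product

def joint(x):
    # Unnormalized p(x) = f(x1,x2) f(x2,x3) f(x3,x4)
    f = lambda a, b: 2.0 if a == b else 1.0
    return f(x[0], x[1]) * f(x[1], x[2]) * f(x[2], x[3])

def marginal(i, value, n=4):
    # Sum the joint over every assignment with x_i fixed: O(2^n) work.
    return sum(joint(x) for x in product([0, 1], repeat=n) if x[i] == value)

Z = sum(joint(x) for x in product([0, 1], repeat=4))
p = marginal(0, 0) / Z          # normalized marginal p(x_1 = 0)
```

Belief propagation computes the same marginals while visiting each factor only a small number of times.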
 
==Description of the sum-product algorithm==
Variants of the belief propagation algorithm exist for several types of graphical models ([[Bayesian network]]s and [[Markov random field]]s,<ref name = "yedidia2003">{{Cite book
|last1=Yedidia |first1=J.S.
|last2=Freeman |first2=W.T. |last3=Weiss |first3=Y.
|chapterurl=http://www.merl.com/publications/TR2001-022/  |accessdate=2009-03-30
|chapter=Understanding Belief Propagation and Its Generalizations
|title=Exploring Artificial Intelligence in the New Millennium
|isbn=1-55860-811-7
|pages=239&ndash;269
|date=January 2003
|publisher=Morgan Kaufmann
|editor1-first=Gerhard |editor1-last=Lakemeyer
|editor2-first=Bernhard |editor2-last=Nebel
}}</ref> in particular). We describe here the variant that operates on a [[factor graph]]. A factor graph is a [[bipartite graph]] containing nodes corresponding to variables ''V'' and factors ''U'', with edges between variables and the factors in which they appear. We can write the joint mass function:
 
:<math>p(\mathbf{x}) = \prod_{u \in U} f_u (\mathbf{x}_u)</math>
 
where '''x'''<sub>''u''</sub> is the vector of neighbouring variable nodes to the factor node ''u''. Any [[Bayesian network]] or [[Markov random field]] can be represented as a factor graph.
 
The algorithm works by passing real valued functions called ''messages'' along the edges between the nodes. More precisely, if ''v'' is a variable node and ''u'' is a factor node connected to ''v'' in the factor graph, the messages from ''v'' to ''u'' (denoted by <math>\mu_{v \to u}</math>) and from ''u'' to ''v'' (<math>\mu_{u \to v}</math>) are real-valued functions whose domain is Dom(''v''), the set of values that can be taken by the random variable associated with ''v''. These messages contain the "influence" that one variable exerts on another. The messages are computed differently depending on whether the node receiving the message is a variable node or a factor node. Keeping the same notation:
* A message from a variable node ''v'' to a factor node ''u'' is the product of the messages from all other neighbouring factor nodes (the recipient is excluded; equivalently, one can say the recipient sends the constant function "1" as its message):
 
::<math>\forall x_v\in Dom(v),\; \mu_{v \to u} (x_v) = \prod_{u^* \in N(v)\setminus\{u\} } \mu_{u^* \to v} (x_v).</math>
 
:where ''N''(''v'') is the set of neighbouring (factor) nodes to ''v''. If <math>N(v)\setminus\{u\}</math> is empty, then <math>\mu_{v \to u}(x_v)</math> is set to the uniform distribution.
 
* A message from a factor node ''u'' to a variable node ''v'' is the product of the factor with messages from all other nodes, marginalised over all variables except the one associated with ''v'':
 
::<math>\forall x_v\in Dom(v),\; \mu_{u \to v} (x_v) = \sum_{\mathbf{x}'_u:x'_v = x_v } f_u (\mathbf{x}'_u) \prod_{v^* \in N(u) \setminus \{v\}} \mu_{v^* \to u} (x'_{v^*}).</math>
 
:where ''N''(''u'') is the set of neighbouring (variable) nodes to ''u''. If <math>N(u) \setminus \{v\}</math> is empty then <math>\mu_{u \to v} (x_v) = f_u(x_v)</math>.
The name of the algorithm is clear from the previous formula: the complete marginalisation is reduced to a sum of products of simpler terms than the ones appearing in the full joint distribution.
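The two update equations translate directly into code. Below is a minimal sketch, assuming a factor graph stored as plain dictionaries over finite variable domains; the names `var_to_factor` and `factor_to_var` are illustrative, not a standard API:

```python
# Direct transcription of the two message equations. Variables and
# factors are identified by arbitrary hashable keys; messages are
# dicts mapping each value in Dom(v) to a nonnegative real.
from itertools import product

def var_to_factor(v, u, factor_msgs, neighbors_of_var, domain):
    # mu_{v->u}(x_v) = product over u* in N(v)\{u} of mu_{u*->v}(x_v).
    # An empty product (v is a leaf) yields the constant function 1.
    msg = {}
    for x in domain[v]:
        m = 1.0
        for u_star in neighbors_of_var[v]:
            if u_star != u:
                m *= factor_msgs[(u_star, v)][x]
        msg[x] = m
    return msg

def factor_to_var(u, v, f, var_msgs, neighbors_of_factor, domain):
    # mu_{u->v}(x_v) = sum over assignments to N(u) with x_v fixed of
    #   f(assignment) * product over v* in N(u)\{v} of mu_{v*->u}(x_{v*}).
    others = [w for w in neighbors_of_factor[u] if w != v]
    msg = {}
    for x_v in domain[v]:
        total = 0.0
        for assignment in product(*(domain[w] for w in others)):
            full = dict(zip(others, assignment))
            full[v] = x_v
            term = f(full)
            for w, x_w in zip(others, assignment):
                term *= var_msgs[(w, u)][x_w]
            total += term
        msg[x_v] = total
    return msg
```

Here `f` is called with a dict mapping each neighbouring variable to a value, so the same code works for factors of any arity.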
 
In a typical run, each message is updated iteratively from the previous values of the neighbouring messages. Different schedules can be used for updating the messages. In the case where the graphical model is a tree, an optimal schedule reaches convergence after computing each message only once (see the next subsection). When the factor graph has cycles, no such optimal schedule exists, and a typical choice is to update all messages simultaneously at each iteration.
 
Upon convergence (if convergence occurred), the estimated marginal distribution of each node is proportional to the product of all messages from adjoining factors (up to a normalization constant):
 
:<math> p_{X_v} (x_v) \propto \prod_{u \in N(v)} \mu_{u \to v} (x_v). </math>
 
Likewise, the estimated joint marginal distribution of the set of variables belonging to one factor is proportional to the product of the factor and the messages from the variables:
 
:<math> p_{X_u} (\mathbf{x}_u) \propto f_u(\mathbf{x}_u) \prod_{v \in N(u)} \mu_{v \to u} (x_v). </math>
 
In the case where the factor graph is acyclic (i.e. is a tree or a forest), these estimated marginals actually converge to the true marginals in a finite number of iterations. This can be shown by [[mathematical induction]].
 
===Exact algorithm for trees===
In the case when the factor graph is a [[tree (graph theory)|tree]], the belief propagation algorithm will compute the exact marginals. Furthermore, with proper scheduling of the message updates, it will terminate after two passes through the tree. This optimal scheduling can be described as follows:
 
Before starting, the graph is orientated by designating one node as the ''root''; any non-root node which is connected to only one other node is called a ''leaf''.
 
In the first step, messages are passed inwards: starting at the leaves, each node passes a message along the (unique) edge towards the root node. The tree structure guarantees that it is possible to obtain messages from all other adjoining nodes before passing the message on. This continues until the root has obtained messages from all of its adjoining nodes.
 
The second step involves passing the messages back out: starting at the root, messages are passed in the reverse direction. The algorithm is completed when all leaves have received their messages.
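The two-pass schedule can be sketched on the smallest non-trivial tree, a chain ''x''<sub>1</sub> – ''f''<sub>12</sub> – ''x''<sub>2</sub> – ''f''<sub>23</sub> – ''x''<sub>3</sub> with ''x''<sub>2</sub> as root; the shared pairwise potential is a toy example, and the result is checked against brute-force marginalization:

```python
# Two-pass sum-product on the chain x1 - f12 - x2 - f23 - x3 (a tree),
# with x2 as the root. The pairwise potential is a toy example.
from itertools import product

dom = [0, 1]
f = lambda a, b: 2.0 if a == b else 1.0

# Inward pass: leaves x1 and x3 send constant-1 messages to their
# factors, which marginalize and forward to the root x2.
m_f12_x2 = {x2: sum(f(x1, x2) for x1 in dom) for x2 in dom}
m_f23_x2 = {x2: sum(f(x2, x3) for x3 in dom) for x2 in dom}

# Root belief: product of incoming factor messages, normalized.
b2 = {x2: m_f12_x2[x2] * m_f23_x2[x2] for x2 in dom}
p_x2 = {x2: b2[x2] / sum(b2.values()) for x2 in dom}

# Brute-force reference marginal for comparison.
Z = sum(f(a, c) * f(c, e) for a, c, e in product(dom, repeat=3))
p_ref = {v: sum(f(a, v) * f(v, e) for a, e in product(dom, repeat=2)) / Z
         for v in dom}
```

The outward pass (root back to leaves) would be computed the same way and yields the remaining marginals.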
 
===Approximate algorithm for general graphs===
Curiously, although it was originally designed for acyclic graphical models, it was found that the Belief Propagation algorithm can be used in general [[graph (mathematics)|graph]]s. The algorithm is then sometimes called "loopy" belief propagation, because graphs typically contain [[cycle (graph theory)|cycle]]s, or loops.  The initialization and scheduling of message updates must be adjusted slightly  (compared with the previously described schedule for acyclic graphs) because graphs might not contain any leaves.  Instead, one initializes all variable messages to 1 and uses the same message definitions above, updating all messages at every iteration (although messages coming from known leaves or tree-structured subgraphs may no longer need updating after sufficient iterations).  It is easy to show that in a tree, the message definitions of this modified procedure will converge to the set of message definitions given above within a number of iterations equal to the [[diameter]] of the tree.
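The synchronous ("flooding") schedule described above can be sketched on the smallest loopy graph, a 3-cycle with pairwise factors. The potentials below are a toy example chosen to be symmetric, so the exact marginals are 1/2 and loopy belief propagation recovers them; the data layout is illustrative, not a standard API:

```python
# Loopy belief propagation with a synchronous ("flooding") schedule on
# a 3-cycle of pairwise factors. All messages start uniform and every
# message is recomputed from the previous iteration's messages.
dom = [0, 1]
f = lambda a, b: 2.0 if a == b else 1.0
edges = [(0, 1), (1, 2), (2, 0)]          # one pairwise factor per edge

# mu[(e, v)]: message from factor e to variable v, initialized uniform.
mu = {(e, v): {x: 1.0 for x in dom} for e in edges for v in e}

for _ in range(50):
    new = {}
    for e in edges:
        for v in e:
            w = e[0] if e[1] == v else e[1]        # the other endpoint
            # Variable-to-factor message from w: product of messages
            # into w from all factors except e.
            inc = {x: 1.0 for x in dom}
            for e2 in edges:
                if e2 != e and w in e2:
                    for x in dom:
                        inc[x] *= mu[(e2, w)][x]
            msg = {x: sum(f(y, x) * inc[y] for y in dom) for x in dom}
            s = sum(msg.values())                   # normalize for stability
            new[(e, v)] = {x: msg[x] / s for x in dom}
    delta = max(abs(new[k][x] - mu[k][x]) for k in mu for x in dom)
    mu = new
    if delta < 1e-10:                               # convergence test
        break

# Belief at variable 0: product of messages from its incident factors.
belief = {x: 1.0 for x in dom}
for e in edges:
    if 0 in e:
        for x in dom:
            belief[x] *= mu[(e, 0)][x]
p0 = belief[0] / sum(belief.values())
```

In general the flooding schedule may fail to converge, or converge to incorrect marginals; the `delta` test only detects a fixed point.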
 
The precise conditions under which loopy belief propagation will converge are still not well understood; it is known that on graphs containing a single loop it converges in most cases, but the probabilities obtained might be incorrect.<ref>
{{Cite journal
|last=Weiss |first=Yair
|title=Correctness of Local Probability Propagation in Graphical Models with Loops
|journal=[[Neural Computation]]
|year=2000
|volume=12 |issue=1 |pages=1&ndash;41
|doi=10.1162/089976600300015880
}}
</ref> Several sufficient (but not necessary) conditions for convergence of loopy belief propagation to a unique fixed point exist.<ref>
{{Cite journal
|last1=Mooij |first1=J
|last2=Kappen |first2=H
|title=Sufficient Conditions for Convergence of the Sum–Product Algorithm
|journal=[[IEEE Transactions on Information Theory]]
|volume=53 |issue=12 |pages=4422&ndash;4437 |year=2007
|doi=10.1109/TIT.2007.909166
}}
</ref> There exist graphs which will fail to converge, or which will oscillate between multiple states over repeated iterations.  Techniques like [[EXIT chart]]s can provide an approximate visualisation of the progress of belief propagation and an approximate test for convergence.
 
There are other approximate methods for marginalization including [[Variational Bayesian methods|variational method]]s and [[Monte Carlo method]]s.
 
One method of exact marginalization in general graphs is called the [[junction tree]] algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree.  The basic premise is to eliminate cycles by clustering them into single nodes.
 
==Related algorithm and complexity issues==
A related algorithm, commonly called the max-product or min-sum algorithm, of which the [[Viterbi algorithm]] is a special case, solves the related problem of maximization, or most probable explanation. Instead of computing the marginals, the goal here is to find the values <math>\mathbf{x}</math> that maximise the global function (i.e. the most probable values in a probabilistic setting), and it can be defined using the [[arg max]]:
 
:<math>\arg\max_{\mathbf{x}} g(\mathbf{x}).</math>
 
An algorithm that solves this problem is nearly identical to belief propagation, with the sums replaced by maxima in the definitions.<ref>
{{Cite journal
|last=Loeliger |first=Hans-Andrea
|title=An Introduction to Factor Graphs
|journal=[[IEEE Signal Processing Magazine]]
|year=2004
|volume=21 |pages=28&ndash;41
}}
</ref>
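The sum-to-max substitution is mechanical. Here is a minimal sketch on a toy chain ''g''(''x''<sub>1</sub>)''f''(''x''<sub>1</sub>,''x''<sub>2</sub>)''f''(''x''<sub>2</sub>,''x''<sub>3</sub>), where the unary factor ''g'' is a hypothetical example; the forward message carries the value of the best partial assignment, exactly the Viterbi idea:

```python
# Max-product messages on the chain g(x1) f(x1,x2) f(x2,x3): each
# forward message stores, for every value of the receiving variable,
# the value of the best partial assignment so far (a max, not a sum).
from itertools import product

dom = [0, 1]
f = lambda a, b: 2.0 if a == b else 1.0
g = lambda a: 3.0 if a == 1 else 1.0     # toy unary factor biasing x1

m1 = {x2: max(g(x1) * f(x1, x2) for x1 in dom) for x2 in dom}
m2 = {x3: max(m1[x2] * f(x2, x3) for x2 in dom) for x3 in dom}
best_value = max(m2.values())            # value of the MAP assignment

# Brute-force check over all 8 assignments.
brute = max(g(a) * f(a, b) * f(b, c) for a, b, c in product(dom, repeat=3))
```

Recovering the maximizing assignment itself requires additionally storing, at each max, which value achieved it (back-pointers), as in the Viterbi algorithm.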
 
[[Inference]] problems like marginalization and maximization are [[NP-hard]] to solve exactly and approximately (at least for [[approximation error|relative error]]) in a graphical model.  More precisely, the marginalization problem defined above is [[Sharp-P-complete|#P-complete]] and maximization is [[NP-complete]].
 
The memory usage of belief propagation can be reduced through the use of the [[Island algorithm]] (at a small cost in time complexity).
 
==Relation to free energy==
The sum-product algorithm is related to the calculation of [[Thermodynamic free energy|free energy]] in [[thermodynamics]]. Let ''Z'' be the [[partition function (mathematics)|partition function]]. A probability distribution
 
:<math>P(\mathbf{X}) = \frac{1}{Z} \prod_{f_j} f_j(x_j)</math>
 
(as per the factor graph representation) can be associated with the [[internal energy]] of each configuration, defined as

:<math>E(\mathbf{X}) = -\log \prod_{f_j} f_j(x_j).</math>
 
The free energy of the system is then
 
:<math>F = U - H = \sum_{\mathbf{X}} P(\mathbf{X}) E(\mathbf{X}) + \sum_{\mathbf{X}}  P(\mathbf{X}) \log P(\mathbf{X}).</math>
 
It can then be shown that the points of convergence of the sum-product algorithm represent the points where the free energy in such a system is minimized.  Similarly, it can be shown that a fixed point of the iterative belief propagation algorithm in graphs with cycles is a stationary point of a free energy approximation.<ref name="GBP-2005">
{{Cite journal
|last1=Yedidia |first1=J.S.
|last2=Freeman |first2=W.T.
|last3=Weiss
|title=Constructing free-energy approximations and generalized belief propagation algorithms
|journal=[[IEEE Transactions on Information Theory]]
|volume=51 |issue=7 |pages=2282&ndash;2312 |date=July 2005
|doi=10.1109/TIT.2005.850085
|url=http://www.merl.com/publications/TR2004-040/ |accessdate=2009-03-28
|first3=Y.
}}
</ref>
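With the internal-energy convention ''E''(''x'') = &minus;log&nbsp;Π<sub>''j''</sub>''f''<sub>''j''</sub>(''x''<sub>''j''</sub>), the free energy evaluated at the exact distribution reduces to &minus;log&nbsp;''Z''. This is easy to verify numerically on a toy two-factor model (the potentials below are hypothetical):

```python
# Numerical check that F = U - H equals -log Z at the exact
# distribution, for a toy two-factor model over three binary variables.
import math
from itertools import product

dom = [0, 1]
f12 = lambda a, b: 2.0 if a == b else 1.0
f23 = lambda a, b: 3.0 if a != b else 1.0

states = list(product(dom, repeat=3))
w = {s: f12(s[0], s[1]) * f23(s[1], s[2]) for s in states}  # unnormalized
Z = sum(w.values())
P = {s: w[s] / Z for s in states}

U = sum(P[s] * -math.log(w[s]) for s in states)   # average internal energy
H = -sum(P[s] * math.log(P[s]) for s in states)   # entropy
F = U - H                                          # free energy
```

Algebraically, ''F'' = Σ ''P'' log(''P''/''w'') = &minus;log ''Z'', since ''P'' = ''w''/''Z''; approximating ''F'' over trial distributions is what the Bethe free energy of loopy belief propagation does.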
 
==Generalized belief propagation (GBP)==
Belief propagation algorithms are normally presented as message update equations on a factor graph, involving messages between variable nodes and their neighboring factor nodes and vice versa. Considering messages between ''regions'' in a graph is one way of generalizing the belief propagation algorithm.<ref name="GBP-2005" /> There are several ways of defining the set of regions in a graph that can exchange messages. One method uses ideas introduced by [[Ryoichi Kikuchi|Kikuchi]] in the physics literature, and is known as Kikuchi's [[cluster variation method]].
 
Improvements in the performance of belief propagation algorithms are also achievable by breaking the replica symmetry in the distributions of the fields (messages). This generalization leads to a new kind of algorithm called [[survey propagation]] (SP), which has proved to be very efficient on [[NP-complete]] problems like [[satisfiability]]<ref name="Sat">
{{Cite journal
|last1=Braunstein |first1=A.
|last2=Mézard |first2=M.
|last3=Zecchina
|title=Survey propagation: An algorithm for satisfiability
|journal=Random Structures & Algorithms
|volume=27 |issue=2 |pages=201&ndash;226 |year=2005
|doi=10.1002/rsa.20057
|first3=R.
}}
</ref>
and [[graph coloring]].
 
The cluster variational method and the survey propagation algorithms are two different improvements to belief propagation. The name [[generalized survey propagation]] (GSP) has been proposed for an algorithm that merges both generalizations.
 
==Gaussian belief propagation (GaBP)==
Gaussian belief propagation is a variant of the belief propagation algorithm for the case when the underlying [[normal distribution|distributions are Gaussian]]. The first work analyzing this special model was the seminal paper of Weiss and Freeman.<ref name="GPbA">
{{Cite journal
|last1=Weiss |first1=Yair
|last2=Freeman |first2=William T.
|title=Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology
|journal=[[Neural Computation]]
|volume=13 |issue=10 |pages=2173&ndash;2200 |date=October 2001
|doi=10.1162/089976601750541769
|pmid=11570995
}}
</ref>
 
The GaBP algorithm solves the following marginalization problem:
 
:<math> P(x_i) = \frac{1}{Z} \int_{j \ne i} \exp\left(-\tfrac{1}{2}x^T A x + b^T x\right)\,dx_j</math>
 
where ''Z'' is a normalization constant, ''A'' is a symmetric positive definite matrix (the inverse covariance matrix, also known as the precision matrix) and ''b'' is the shift vector.
 
Equivalently, it can be shown that using the Gaussian model, the solution of the marginalization problem is equivalent to the [[Maximum A Posteriori|MAP]] assignment problem:
 
: <math>\underset{x}{\operatorname{argmax}}\  P(x) = \frac{1}{Z} \exp\left(-\tfrac{1}{2}x^T A x + b^T x\right).</math>
 
This problem is also equivalent to the following minimization problem of the quadratic form:
 
: <math> \min_{x}\ \tfrac{1}{2}x^T A x - b^T x.</math>
 
This is in turn equivalent to the linear system of equations
 
: <math> Ax = b.</math>
 
Convergence of the GaBP algorithm is easier to analyze than in the general BP case, and there are two known sufficient convergence conditions. The first was formulated by Weiss et al. in 2000: GaBP converges when the information matrix ''A'' is [[diagonally dominant]]. The second convergence condition was formulated by Johnson et al.<ref name="johnson">
{{Cite journal
|first1=Dmitry M. |last1=Malioutov
|first2=Jason K. |last2=Johnson
|first3=Alan S. |last3=Willsky
|title=Walk-sums and belief propagation in Gaussian graphical models
|journal=[[Journal of Machine Learning Research]]
|volume=7 |pages=2031–2064 |date=October 2006
|url=http://jmlr.csail.mit.edu/papers/v7/malioutov06a.html |accessdate=2009-03-28
}}
</ref> in 2006: GaBP converges when the [[spectral radius]] of the matrix satisfies

:<math>\rho (|I - D^{-1/2}AD^{-1/2}|) < 1 \, </math>

where ''D'' = diag(''A'') and |&middot;| denotes the entrywise absolute value.
 
The GaBP algorithm was linked to the linear algebra domain,<ref name="Bickson">Gaussian belief propagation solver for systems of linear equations. By O. Shental, D. Bickson, P. H. Siegel, J. K. Wolf, and D. Dolev, IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/</ref> and it was shown that the GaBP algorithm can be
viewed as an iterative algorithm for solving the linear system of equations
''Ax'' = ''b'' where ''A'' is the information matrix and ''b'' is the shift vector. The known convergence conditions of the GaBP algorithm are
identical to the sufficient conditions of the [[Jacobi method]]. Empirically, the GaBP algorithm is shown to converge faster than classical iterative methods like the Jacobi method, the [[Gauss&ndash;Seidel method]], [[successive over-relaxation]], and others.<ref name="Bickson2">Linear Detection via Belief Propagation. Danny Bickson, Danny Dolev, Ori Shental, Paul H. Siegel and Jack K. Wolf. In the 45th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Illinois, Sept. 2007. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/</ref> Additionally, the GaBP algorithm is shown to be immune to numerical problems of the preconditioned [[Conjugate gradient method|conjugate gradient]] method.<ref name="Bickson3">Distributed large scale network utility maximization. D. Bickson, Y. Tock, A. Zymnis, S. Boyd and D. Dolev. In the International symposium on information theory (ISIT), July 2009. http://www.cs.huji.ac.il/labs/danss/p2p/gabp/</ref>
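The linear-algebra view can be sketched with a plain Jacobi iteration on a small diagonally dominant system; this illustrates the fixed point GaBP computes (the mean vector of the corresponding Gaussian), and is not a full GaBP implementation:

```python
# Jacobi iteration for Ax = b with a symmetric, diagonally dominant A,
# the same sufficient convergence condition stated above for GaBP.
# Each update solves row i for x_i using the previous iterate.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 5.0]]
b = [1.0, 2.0, 3.0]
n = len(b)

x = [0.0] * n
for _ in range(200):
    x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
         for i in range(n)]

# Residual of the linear system at the final iterate.
residual = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))
```

GaBP additionally returns approximate variances for each ''x''<sub>''i''</sub>, which a plain linear solver does not provide.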
 
==Notes==
{{Reflist}}
 
==References==
* Frey, Brendan (1998). ''Graphical Models for Machine Learning and Digital Communication''.  MIT Press
* Loeliger, Hans-Andrea (2004). ''An Introduction to Factor Graphs''. IEEE Signal Proc. Mag. Vol.21. pages 28–41
* [[David J.C. MacKay]] (2003). Exact Marginalization in Graphs. In David J.C. MacKay, ''Information Theory, Inference, and Learning Algorithms'', pp.&nbsp;334–340. Cambridge: Cambridge University Press.
* Mackenzie, Dana (2005). [http://www.newscientist.com/channel/info-tech/mg18725071.400 ''Communication Speed Nears Terminal Velocity'']  New Scientist. 9 July 2005. Issue 2507 (Registration required)
* {{Cite journal
|last1=Yedidia |first1=J.S.
|last2=Freeman |first2=W.T.
|last3=Weiss
|title=Constructing free-energy approximations and generalized belief propagation algorithms
|journal=[[IEEE Transactions on Information Theory]]
|volume=51 |issue=7 |pages=2282&ndash;2312 |date=July 2005
|doi=10.1109/TIT.2005.850085
|url=http://www.merl.com/publications/TR2004-040/ |accessdate=2009-03-28
|first3=Y.
}}
* {{Cite book
|last1=Yedidia |first1=J.S.
|last2=Freeman |first2=W.T. |last3=Weiss |first3=Y.
|chapterurl=http://www.merl.com/publications/TR2001-022/  |accessdate=2009-03-30
|chapter=Understanding Belief Propagation and Its Generalizations
|title=Exploring Artificial Intelligence in the New Millennium
|isbn=1-55860-811-7
|pages=239&ndash;269
|date=January 2003
|publisher=Morgan Kaufmann
|editor1-first=Gerhard |editor1-last=Lakemeyer
|editor2-first=Bernhard |editor2-last=Nebel
}}
* {{Cite book
|last=Bishop |first=Christopher M
|title=Pattern Recognition and Machine Learning
|chapterurl=http://research.microsoft.com/%7Ecmbishop/PRML/Bishop-PRML-sample.pdf |accessdate=2009-03-30
|chapter=Chapter 8: Graphical models
|isbn=0-387-31073-8
|publisher=Springer
|year=2006
|pages=359&ndash;418
}}
* Koch, Volker M. (2007). [http://www.volker-koch.com/diss/''A Factor Graph Approach to Model-Based Signal Separation''] --- A tutorial-style dissertation
* {{Cite book
    | last = Wymeersch
    | first = Henk
    | title = Iterative Receiver Design
    | year = 2007
    | publisher = Cambridge University Press
    | url = http://www.cambridge.org/us/catalogue/catalogue.asp?isbn=9780521873154
    | isbn = 0-521-87315-0 }}
* Bickson, Danny. (2009). [http://www.cs.cmu.edu/~bickson/gabp/index.html''Gaussian Belief Propagation Resource Page''] --- Webpage containing recent publications as well as Matlab source code.
* Coughlan, James. (2009). [http://www.ski.org/Rehab/Coughlan_lab/General/TutorialsandReference/BPtutorial.pdf''A Tutorial Introduction to Belief Propagation''].
 
{{DEFAULTSORT:Belief Propagation}}
[[Category:Graph algorithms]]
[[Category:Graphical models]]
[[Category:Coding theory]]
[[Category:Probability theory]]
