{{Probability distribution
| name =Wishart
| type =density
| pdf_image =
| cdf_image =
| notation =<math>X \sim W_p(\mathbf{V},n)</math>
| parameters =<math> n > p-1\!</math> [[degrees of freedom (statistics)|degrees of freedom]] ([[real numbers|real]])<br /><math>\mathbf{V} > 0\,</math> [[scale matrix]] (<math>p\times p</math> [[positive definite matrix|pos. def]])
| support =<math>\mathbf{X}\!</math> <math>(p\times p)</math> [[positive definite matrix]]
| pdf =<math>\frac{1}{2^\frac{np}{2}\left|{\mathbf V}\right|^\frac{n}{2}\Gamma_p(\frac{n}{2})} {\left|\mathbf{X}\right|}^{\frac{n-p-1}{2}} e^{-\frac{1}{2}{\rm tr}({\mathbf V}^{-1}\mathbf{X})}</math>
*<math>\Gamma_p</math> is the [[multivariate gamma function]]
*<math>\mathrm{tr}</math> is the [[trace (linear algebra)|trace]] function
| cdf =
| mean =<math>n \mathbf{V}</math>
| median =
| mode =<math>(n-p-1)\mathbf{V}\text{ for }n \geq p+1</math>
| variance =<math>\operatorname{Var}(\mathbf{X}_{ij}) = n(v_{ij}^2+v_{ii}v_{jj})</math>
| skewness =
| kurtosis =
| entropy =[[#Entropy|see below]]
| mgf =
| char =<math>\Theta \mapsto \left|{\mathbf I} - 2i\,{\mathbf\Theta}{\mathbf V}\right|^{-n/2}</math>
}}

In [[statistics]], the '''Wishart distribution''' is a generalization to multiple dimensions of the [[chi-squared distribution]], or, in the case of non-integer degrees of freedom, of the [[gamma distribution]]. It is named in honor of [[John Wishart (statistician)|John Wishart]], who first formulated the distribution in 1928.<ref>{{cite journal
 |first=J. |last=Wishart |authorlink=John Wishart (statistician)
 |title=The generalised product moment distribution in samples from a normal multivariate population
 |journal=[[Biometrika]]
 |volume=20A |issue=1–2 |pages=32–52 |year=1928
 |doi=10.1093/biomet/20A.1-2.32 |jfm=54.0565.02 |jstor=2331939
}}</ref>

It is any of a family of [[probability distribution]]s defined over symmetric, [[nonnegative-definite]] [[matrix (math)|matrix]]-valued [[random variable]]s ("random matrices"). These distributions are of great importance in the [[estimation of covariance matrices]] in [[multivariate statistics]]. In [[Bayesian inference|Bayesian statistics]], the Wishart distribution is the [[conjugate prior]] of the [[matrix inverse|inverse]] [[covariance matrix|covariance matrix]] of a [[multivariate normal distribution|multivariate normal random vector]].

==Definition==
Suppose ''X'' is an ''n'' × ''p'' matrix, each row of which is [[statistical independence|independently]] drawn from a [[multivariate normal distribution|''p''-variate normal distribution]] with zero mean:

:<math>X_{(i)}{=}(x_i^1,\dots,x_i^p)^T\sim N_p(0,V).</math>

Then the Wishart distribution is the [[probability distribution]] of the ''p'' × ''p'' random matrix

:<math>S=X^T X \,\!</math>

known as the [[scatter matrix]]. One indicates that ''S'' has that probability distribution by writing

:<math>S\sim W_p(V,n).</math>

The positive integer ''n'' is the number of ''[[degrees of freedom (statistics)|degrees of freedom]]''. Sometimes this is written ''W''(''V'', ''p'', ''n''). For ''n'' ≥ ''p'', the matrix ''S'' is invertible with probability 1 if ''V'' is invertible.

If ''p'' = 1 and ''V'' = 1, then this distribution is the [[chi-squared distribution]] with ''n'' degrees of freedom.
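As a quick numerical check of this construction, one can draw many scatter matrices ''S'' = ''X''<sup>T</sup>''X'' and compare their average with the mean ''nV'' given in the infobox. A sketch in Python with NumPy (the particular ''V'', ''n'', and replicate count are arbitrary illustrations, not taken from the text):

```python
import numpy as np

# Empirically verify that S = X^T X, with rows X_(i) ~ N_p(0, V),
# has expectation E[S] = n V (the mean stated in the infobox).
rng = np.random.default_rng(0)
p, n, reps = 2, 5, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Draw `reps` independent n-by-p matrices X and form S = X^T X for each.
X = rng.multivariate_normal(np.zeros(p), V, size=(reps, n))  # (reps, n, p)
S = np.einsum('rni,rnj->rij', X, X)                          # (reps, p, p)

print(np.round(S.mean(axis=0), 2))  # should be close to n * V
```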

==Occurrence==
The Wishart distribution arises as the distribution of the sample covariance matrix for a sample from a [[multivariate normal distribution]].{{Citation needed|date=October 2010}} It occurs frequently in [[likelihood-ratio test]]s in multivariate statistical analysis. It also arises in the spectral theory of [[random matrix|random matrices]]{{Citation needed|date=October 2010}} and in multidimensional Bayesian analysis.{{Citation needed|date=October 2010}}

==Probability density function==
The Wishart distribution can be [[characterization (mathematics)|characterized]] by its [[probability density function]] as follows:

Let <math>\mathbf{X}</math> be a ''p'' × ''p'' symmetric matrix of random variables that is [[positive-definite matrix|positive definite]], and let <math>\mathbf{V}</math> be a (fixed) positive definite matrix of size ''p'' × ''p''.

Then, if ''n'' ≥ ''p'', <math>\mathbf{X}</math> has a Wishart distribution with ''n'' degrees of freedom if it has a [[probability density function]] given by

:<math>\frac{1}{2^\frac{np}{2}\left|{\mathbf V}\right|^\frac{n}{2}\Gamma_p(\frac{n}{2})} {\left|\mathbf{X}\right|}^{\frac{n-p-1}{2}} e^{-\frac{1}{2}{\rm tr}({\mathbf V}^{-1}\mathbf{X})}</math>

where <math>\Gamma_p(\cdot)</math> is the [[multivariate gamma function]], defined as

:<math>\Gamma_p(n/2) = \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma\left[ n/2+(1-j)/2\right].</math>

In fact, the above definition can be extended to any real ''n'' > ''p'' − 1. If ''n'' ≤ ''p'' − 1, the Wishart distribution no longer has a density; instead it is a singular distribution.<ref>Harald Uhlig, "On singular Wishart and singular multivariate beta distributions", ''The Annals of Statistics'', 1994, pp. 395–405. [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176325375 projecteuclid]</ref>
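The density formula can be evaluated directly. The sketch below (Python; `wishart_pdf` and `multigamma` are illustrative helper names, not from the text) implements it and checks that for ''p'' = 1 and ''V'' = 1 it reduces to the chi-squared density with ''n'' degrees of freedom, as noted in the Definition section:

```python
import math
import numpy as np

def multigamma(p, a):
    """Multivariate gamma function Gamma_p(a) via the product formula above."""
    return math.pi ** (p * (p - 1) / 4) * math.prod(
        math.gamma(a + (1 - j) / 2) for j in range(1, p + 1))

def wishart_pdf(X, V, n):
    """Density of W_p(V, n) at a positive definite matrix X (requires n > p - 1)."""
    X, V = np.atleast_2d(X), np.atleast_2d(V)
    p = X.shape[0]
    norm = 2 ** (n * p / 2) * np.linalg.det(V) ** (n / 2) * multigamma(p, n / 2)
    return (np.linalg.det(X) ** ((n - p - 1) / 2)
            * math.exp(-0.5 * np.trace(np.linalg.inv(V) @ X)) / norm)

# For p = 1, V = 1 the density must reduce to the chi-squared density
# with n degrees of freedom.
n, x = 4, 3.0
chi2_pdf = x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))
print(abs(wishart_pdf(x, 1.0, n) - chi2_pdf) < 1e-12)  # True
```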

==Use in Bayesian statistics==
In [[Bayesian statistics]], in the context of the [[multivariate normal distribution]], the Wishart distribution is the conjugate prior for the precision matrix <math>\mathbf{\Omega} = \mathbf{\Sigma}^{-1}</math>, where <math>\mathbf{\Sigma}</math> is the covariance matrix.

===Choice of parameters===
The least informative, proper Wishart prior is obtained by setting <math>n = p</math>.

The prior mean of <math>W_p(\mathbf{V}, n)</math> is <math>n\mathbf{V}</math>, suggesting that a reasonable choice for <math>\mathbf{V}</math> is <math>n^{-1}\mathbf{\Sigma}_0^{-1}</math>, where <math>\mathbf{\Sigma}_0</math> is some prior guess for the covariance matrix.
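This heuristic can be sketched in code. The conjugate update used below (posterior scale <math>\mathbf{V}_N</math> with <math>\mathbf{V}_N^{-1} = \mathbf{V}_0^{-1} + \textstyle\sum_i x_i x_i^T</math> and degrees of freedom <math>n_0 + N</math>, for a zero-mean normal with known mean) is the standard textbook result; it is not spelled out in this article, and the numbers are arbitrary:

```python
import numpy as np

# Sketch of the standard conjugate update for the precision matrix of a
# zero-mean multivariate normal. The prior follows the heuristic above:
# n0 = p, and V0 chosen so the prior mean n0 * V0 equals a prior guess
# Sigma0^{-1} of the precision.
rng = np.random.default_rng(1)
p, N = 2, 5000
Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 2.0]])
data = rng.multivariate_normal(np.zeros(p), Sigma_true, size=N)

Sigma0 = np.eye(p)               # prior guess for the covariance matrix
n0 = p                           # least informative proper choice
V0 = np.linalg.inv(Sigma0) / n0  # so that n0 * V0 = Sigma0^{-1}

# Posterior: W_p(V_N, n0 + N) with V_N^{-1} = V0^{-1} + sum_i x_i x_i^T
VN = np.linalg.inv(np.linalg.inv(V0) + data.T @ data)
post_mean_precision = (n0 + N) * VN  # posterior mean of the precision Omega

print(np.round(np.linalg.inv(post_mean_precision), 2))  # close to Sigma_true
```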

==Properties==

===Log-expectation===
Note the following formula:<ref name="bishop693">C. M. Bishop, ''Pattern Recognition and Machine Learning'', Springer, 2006, p. 693.</ref>

:<math>\operatorname{E}[\ln|\mathbf{X}|] = \sum_{i=1}^p \psi\left(\tfrac{1}{2}(n+1-i)\right) + p\ln(2) + \ln|\mathbf{V}|</math>

where <math>\psi</math> is the [[digamma function]] (the derivative of the logarithm of the [[gamma function]]).

This formula plays a role in [[variational Bayes]] derivations for [[Bayes network]]s involving the Wishart distribution.
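The log-expectation formula lends itself to a Monte Carlo check: average <math>\ln|S|</math> over simulated scatter matrices and compare with the closed form. A sketch (Python/NumPy; the finite-difference `digamma` helper is an illustrative stand-in for a proper special-function routine, and the parameters are arbitrary):

```python
import math
import numpy as np

def digamma(x, h=1e-6):
    """Crude digamma via a central difference of log-gamma (stdlib only)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

rng = np.random.default_rng(2)
p, n, reps = 2, 5, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Closed form: E[ln|X|] = sum_i psi((n+1-i)/2) + p ln 2 + ln|V|
analytic = (sum(digamma((n + 1 - i) / 2) for i in range(1, p + 1))
            + p * math.log(2) + math.log(np.linalg.det(V)))

# Monte Carlo estimate via scatter matrices S = X^T X
X = rng.multivariate_normal(np.zeros(p), V, size=(reps, n))
S = np.einsum('rni,rnj->rij', X, X)
mc = np.mean(np.log(np.linalg.det(S)))

print(abs(analytic - mc))  # should be small (sampling error only)
```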

===Entropy===
The [[information entropy]] of the distribution has the following formula:<ref name="bishop693"/>

:<math>\operatorname{H}[\mathbf{X}] = -\ln \left( B(\mathbf{V},n) \right) - \tfrac{1}{2}(n-p-1) \operatorname{E}[\ln|\mathbf{X}|] + \frac{np}{2}</math>

where <math>B(\mathbf{V},n)</math> is the [[normalizing constant]] of the distribution:

:<math>B(\mathbf{V},n) = \frac{1}{\left|\mathbf{V}\right|^\frac{n}{2} 2^\frac{np}{2}\Gamma_p(\frac{n}{2})}.</math>

This can be expanded as follows:

:<math>\begin{align}
\operatorname{H}[\mathbf{X}] &= \tfrac{n}{2}\ln|\mathbf{V}| + \tfrac{np}{2}\ln(2) + \ln\left(\Gamma_p(\tfrac{n}{2})\right) - \tfrac{1}{2}(n-p-1) \operatorname{E}[\ln|\mathbf{X}|] + \tfrac{np}{2} \\
&= \tfrac{n}{2}\ln|\mathbf{V}| + \tfrac{np}{2}\ln(2) + \tfrac{1}{4} p(p-1) \ln(\pi) + \sum_{i=1}^p \ln\left(\Gamma\left(\tfrac{n}{2}+\tfrac{1-i}{2}\right)\right) \\
&\qquad\qquad - \tfrac{1}{2}(n-p-1)\left(\sum_{i=1}^p \psi\left(\tfrac{1}{2}(n+1-i)\right) + p\ln(2) + \ln|\mathbf{V}|\right) + \tfrac{np}{2} \\
&= \tfrac{n}{2}\ln|\mathbf{V}| + \tfrac{np}{2}\ln(2) + \tfrac{1}{4} p(p-1) \ln(\pi) + \sum_{i=1}^p \ln\left(\Gamma\left(\tfrac{n}{2}+\tfrac{1-i}{2}\right)\right) \\
&\qquad\qquad - \left( \tfrac{1}{2}(n-p-1)\sum_{i=1}^p \psi\left(\tfrac{1}{2}(n+1-i)\right) + \tfrac{1}{2}(n-p-1)p\ln(2) + \tfrac{1}{2}(n-p-1)\ln|\mathbf{V}|\right) + \tfrac{np}{2} \\
&= \tfrac{p+1}{2}\ln|\mathbf{V}| + \tfrac{1}{2}p(p+1)\ln(2) + \tfrac{1}{4}p(p-1)\ln(\pi) + \sum_{i=1}^p \ln\left(\Gamma\left(\tfrac{n}{2}+\tfrac{1-i}{2}\right)\right) - \tfrac{1}{2}(n-p-1)\sum_{i=1}^p \psi\left(\tfrac{1}{2}(n+1-i)\right) + \tfrac{np}{2}
\end{align}</math>
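Since the entropy is <math>\operatorname{E}[-\ln f(\mathbf{X})]</math>, the identity can be checked by averaging the negative log-density over simulated draws. A sketch (Python/NumPy; helper names are illustrative, the finite-difference digamma is a stand-in, and all parameters are arbitrary):

```python
import math
import numpy as np

def log_multigamma(p, a):
    """log of the multivariate gamma function Gamma_p(a)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, p + 1))

def wishart_logpdf(X, V, n):
    """log-density of W_p(V, n) at a positive definite matrix X."""
    p = X.shape[0]
    log_norm = ((n * p / 2) * math.log(2) + (n / 2) * math.log(np.linalg.det(V))
                + log_multigamma(p, n / 2))
    return (((n - p - 1) / 2) * math.log(np.linalg.det(X))
            - 0.5 * np.trace(np.linalg.inv(V) @ X) - log_norm)

def digamma(x, h=1e-6):
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

rng = np.random.default_rng(3)
p, n, reps = 2, 5, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Monte Carlo estimate of H = E[-ln f(X)] over simulated scatter matrices
X = rng.multivariate_normal(np.zeros(p), V, size=(reps, n))
S = np.einsum('rni,rnj->rij', X, X)
mc_entropy = -np.mean([wishart_logpdf(Si, V, n) for Si in S])

# Closed form: H = -ln B(V, n) - (n-p-1)/2 * E[ln|X|] + np/2,
# using the log-expectation formula from the previous subsection.
e_logdet = (sum(digamma((n + 1 - i) / 2) for i in range(1, p + 1))
            + p * math.log(2) + math.log(np.linalg.det(V)))
log_B = -((n / 2) * math.log(np.linalg.det(V)) + (n * p / 2) * math.log(2)
          + log_multigamma(p, n / 2))
analytic = -log_B - 0.5 * (n - p - 1) * e_logdet + n * p / 2

print(abs(analytic - mc_entropy))  # should be small (sampling error only)
```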

===Characteristic function===
The [[characteristic function (probability theory)|characteristic function]] of the Wishart distribution is

:<math>\Theta \mapsto \left|{\mathbf I} - 2i\,{\mathbf\Theta}{\mathbf V}\right|^{-\frac{n}{2}}.</math>

In other words,

:<math>\Theta \mapsto \operatorname{E}\left[ \exp\left(i \operatorname{tr}(\mathbf{X}{\mathbf\Theta})\right)\right] = \left|{\mathbf I} - 2i{\mathbf\Theta}{\mathbf V}\right|^{-\frac{n}{2}},</math>

where E[·] denotes expectation. (Here <math>\Theta</math> and <math>\mathbf I</math> are matrices the same size as <math>\mathbf V</math>, <math>\mathbf I</math> being the [[identity matrix]], and ''i'' is the square root of −1.)<ref>{{cite book |last=Anderson |first=T. W. |authorlink=T. W. Anderson |title=An Introduction to Multivariate Statistical Analysis |publisher=[[Wiley Interscience]] |edition=3rd |location=Hoboken, N.J. |year=2003 |page=259 |isbn=0-471-36091-0}}</ref>
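For ''p'' = 1 the characteristic function reduces to that of a scaled chi-squared variable, which makes a simple numerical check possible. A sketch (the values of ''v'', ''n'', and θ are arbitrary):

```python
import numpy as np

# One-dimensional sanity check: W_1(v, n) is v times a chi-squared with n
# degrees of freedom, so E[exp(i*theta*x)] should match (1 - 2i*theta*v)^(-n/2).
rng = np.random.default_rng(4)
v, n, theta, reps = 1.5, 4, 0.3, 200000

x = v * rng.chisquare(n, size=reps)
empirical = np.mean(np.exp(1j * theta * x))
analytic = (1 - 2j * theta * v) ** (-n / 2)

print(abs(empirical - analytic))  # should be small (Monte Carlo error only)
```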

==Theorem==
If <math>\scriptstyle \mathbf{X}</math> has a Wishart distribution with ''m'' degrees of freedom and variance matrix <math>\scriptstyle {\mathbf V}</math> (written <math>\scriptstyle \mathbf{X}\sim\mathcal{W}_p({\mathbf V},m)</math>), and <math>\scriptstyle{\mathbf C}</math> is a ''q'' × ''p'' matrix of [[rank (matrix theory)|rank]] ''q'', then<ref name="rao">Rao, C. R., ''Linear Statistical Inference and Its Applications'', Wiley, 1965, p. 535.</ref>

:<math>{\mathbf C}\mathbf{X}{\mathbf C}^T \sim \mathcal{W}_q\left({\mathbf C}{\mathbf V}{\mathbf C}^T,m\right).</math>

===Corollary 1===
If <math>{\mathbf z}</math> is a nonzero <math>p\times 1</math> constant vector, then<ref name="rao"/> <math>{\mathbf z}^T\mathbf{X}{\mathbf z}\sim\sigma_z^2\chi_m^2</math>.

In this case, <math>\chi_m^2</math> is the [[chi-squared distribution]] and <math>\sigma_z^2={\mathbf z}^T{\mathbf V}{\mathbf z}</math> (note that <math>\sigma_z^2</math> is a constant; it is positive because <math>{\mathbf V}</math> is positive definite).
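Corollary 1 implies in particular that <math>\operatorname{E}[{\mathbf z}^T\mathbf{X}{\mathbf z}] = m\,\sigma_z^2</math>, which is easy to check by simulation. A sketch (the matrix ''V'', vector ''z'', and ''m'' are arbitrary illustrations):

```python
import numpy as np

# Corollary 1: z^T X z ~ sigma_z^2 * chi^2_m with sigma_z^2 = z^T V z.
# Check the implied mean m * sigma_z^2 by Monte Carlo.
rng = np.random.default_rng(5)
p, m, reps = 2, 6, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
z = np.array([1.0, 2.0])
sigma2 = z @ V @ z                    # z^T V z = 8 for these choices

Y = rng.multivariate_normal(np.zeros(p), V, size=(reps, m))
S = np.einsum('rni,rnj->rij', Y, Y)   # X ~ W_p(V, m)
q = np.einsum('i,rij,j->r', z, S, z)  # z^T X z for each replicate

print(q.mean())  # expectation is m * sigma_z^2 = 48
```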

===Corollary 2===
Consider the case where <math>{\mathbf z}^T=(0,\ldots,0,1,0,\ldots,0)</math> (that is, the ''j''th element is one and all others zero). Then Corollary 1 above shows that

:<math>w_{jj}\sim\sigma_{jj}\chi^2_m</math>

gives the marginal distribution of each of the elements on the matrix's diagonal.

The statistician [[George Seber]] points out{{Citation needed|date=October 2010}} that the Wishart distribution is not called the "multivariate chi-squared distribution" because the marginal distribution of the off-diagonal elements is not chi-squared. Seber prefers{{Citation needed|date=October 2010}} to reserve the term [[multivariate statistics|multivariate]] for the case when all univariate marginals belong to the same family.

==Estimator of the multivariate normal distribution==
The Wishart distribution is the [[sampling distribution]] of the [[maximum likelihood|maximum-likelihood estimator]] (MLE) of the [[covariance matrix]] of a [[multivariate normal distribution]].<ref>C. Chatfield and A. J. Collins, ''Introduction to Multivariate Analysis'', 1980, pp. 103–108.</ref> A [[estimation of covariance matrices|derivation of the MLE]] uses the [[spectral theorem]].
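A related standard fact (consistent with, but not stated in, this section) is that the scatter matrix about the sample mean is <math>W_p(\mathbf{\Sigma}, N-1)</math> distributed for normal data, so its expectation is <math>(N-1)\mathbf{\Sigma}</math>; the MLE divides this scatter by ''N'', which is the source of its slight downward bias. A sketch with arbitrary numbers:

```python
import numpy as np

# The scatter matrix about the sample mean,
#   sum_i (x_i - xbar)(x_i - xbar)^T,
# has expectation (N - 1) * Sigma, so scatter / (N - 1) is unbiased
# while the MLE scatter / N is slightly biased downward.
rng = np.random.default_rng(6)
p, N, reps = 2, 10, 20000
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

data = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, N))
centered = data - data.mean(axis=1, keepdims=True)
scatter = np.einsum('rni,rnj->rij', centered, centered)

print(np.round(scatter.mean(axis=0) / (N - 1), 2))  # close to Sigma
```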

==Bartlett decomposition==
The '''Bartlett decomposition''' of a matrix <math>\mathbf{X}</math> from a ''p''-variate Wishart distribution with scale matrix <math>\mathbf{V}</math> and ''n'' degrees of freedom is the factorization

:<math>\mathbf{X} = {\textbf L}{\textbf A}{\textbf A}^T{\textbf L}^T,</math>

where <math>\mathbf{L}</math> is the [[Cholesky decomposition|Cholesky factor]] of <math>\mathbf{V}</math> (so that <math>\mathbf{V} = \mathbf{L}\mathbf{L}^T</math>), and

:<math>\mathbf A = \begin{pmatrix}
\sqrt{c_1} & 0 & 0 & \cdots & 0\\
n_{21} & \sqrt{c_2} & 0 & \cdots & 0 \\
n_{31} & n_{32} & \sqrt{c_3} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n_{p1} & n_{p2} & n_{p3} & \cdots & \sqrt{c_p}
\end{pmatrix}</math>

where <math>c_i \sim \chi^2_{n-i+1}</math> and <math>n_{ij} \sim N(0,1)\,</math> independently.<ref>{{cite book |last=Anderson |first=T. W. |authorlink=T. W. Anderson |title=An Introduction to Multivariate Statistical Analysis |publisher=[[Wiley Interscience]] |edition=3rd |location=Hoboken, N.J. |year=2003 |page=257 |isbn=0-471-36091-0}}</ref> This provides a useful method for obtaining random samples from a Wishart distribution.<ref>{{cite journal |title=Algorithm AS 53: Wishart Variate Generator |first1=W. B. |last1=Smith |first2=R. R. |last2=Hocking |journal=[[Journal of the Royal Statistical Society, Series C]] |volume=21 |issue=3 |year=1972 |pages=341–345 |jstor=2346290}}</ref>
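The decomposition above translates directly into a sampler. A sketch (Python/NumPy; `sample_wishart_bartlett` is an illustrative name, and ''V'' and ''n'' are arbitrary choices):

```python
import numpy as np

# Wishart sampler via the Bartlett decomposition: X = L A A^T L^T, with L the
# Cholesky factor of V, sqrt(chi-squared) variates on the diagonal of A and
# independent standard normals below it.
def sample_wishart_bartlett(V, n, rng):
    p = V.shape[0]
    L = np.linalg.cholesky(V)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(n - i))  # c_i ~ chi^2_{n-i+1} (1-based i)
        A[i, :i] = rng.standard_normal(i)        # n_ij ~ N(0, 1) for j < i
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(7)
p, n, reps = 2, 5, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
mean = sum(sample_wishart_bartlett(V, n, rng) for _ in range(reps)) / reps

print(np.round(mean, 2))  # should be close to n * V
```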

==The possible range of the shape parameter==
It can be shown<ref>{{cite journal |doi=10.1214/aop/1176990455 |last1=Peddada |first1=Shyamal Das |last2=Richards |first2=Donald St. P. |title=Proof of a Conjecture of M. L. Eaton on the Characteristic Function of the Wishart Distribution |journal=[[Annals of Probability]] |volume=19 |issue=2 |pages=868–874 |year=1991}}</ref> that the Wishart distribution can be defined if and only if the shape parameter ''n'' belongs to the set

:<math>\Lambda_p:=\{0,\dots,p-1\}\cup \left(p-1,\infty\right).</math>

This set is named after Gindikin, who introduced it<ref>{{cite journal |doi=10.1007/BF01078179 |first=S. G. |last=Gindikin |title=Invariant generalized functions in homogeneous domains |journal=[[Funct. Anal. Appl.]] |volume=9 |issue=1 |pages=50–52 |year=1975}}</ref> in the 1970s in the context of gamma distributions on homogeneous cones. For parameters in the discrete part of the Gindikin set, namely

:<math>\Lambda_p^*:=\{0,\dots,p-1\},</math>

the corresponding Wishart distribution has no Lebesgue density.

==Relationships to other distributions==
* The Wishart distribution is related to the [[inverse-Wishart distribution]], denoted by <math>W_p^{-1}</math>, as follows: if <math>\mathbf{X}\sim W_p(\mathbf{V},n)</math> and we make the change of variables <math>\mathbf{C}=\mathbf{X}^{-1}</math>, then <math>\mathbf{C}\sim W_p^{-1}(\mathbf{V}^{-1},n)</math>. This relationship may be derived by noting that the absolute value of the [[Jacobian determinant]] of this change of variables is <math>|\mathbf{C}|^{p+1}</math>; see, for example, equation (15.15) in Dwyer.<ref>Paul S. Dwyer, "Some Applications of Matrix Derivatives in Multivariate Analysis", ''JASA'', 1967; 62: 607–625. [http://www.jstor.org/pss/2283988 JSTOR]</ref>
* In [[Bayesian statistics]], the Wishart distribution is a [[conjugate prior]] for the [[Precision (statistics)|precision parameter]] of the [[multivariate normal distribution]] when the mean parameter is known.<ref>C. M. Bishop, ''Pattern Recognition and Machine Learning'', Springer, 2006.</ref>
* A generalization is the [[multivariate gamma distribution]].
* A different type of generalization is the [[normal-Wishart distribution]], essentially the product of a [[multivariate normal distribution]] with a Wishart distribution.
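The first relationship can be checked numerically using the known mean of the inverse-Wishart distribution, <math>\mathbf{V}^{-1}/(n-p-1)</math> for <math>n > p + 1</math> (a standard fact not stated in this article). A sketch with arbitrary parameters:

```python
import numpy as np

# If X ~ W_p(V, n) then C = X^{-1} ~ W_p^{-1}(V^{-1}, n), whose mean is
# V^{-1} / (n - p - 1). Compare that against a Monte Carlo average.
rng = np.random.default_rng(8)
p, n, reps = 2, 10, 20000
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])

X = rng.multivariate_normal(np.zeros(p), V, size=(reps, n))
S = np.einsum('rni,rnj->rij', X, X)          # X ~ W_p(V, n)
C_mean = np.mean(np.linalg.inv(S), axis=0)   # average of C = X^{-1}

expected = np.linalg.inv(V) / (n - p - 1)
print(np.abs(C_mean - expected).max())  # should be small
```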

==See also==
{{Colbegin}}
* [[Chi-squared distribution]]
* [[F-distribution]]
* [[Gamma distribution]]
* [[Hotelling's T-squared distribution]]
* [[Inverse-Wishart distribution]]
* [[Multivariate gamma distribution]]
* [[Student's t-distribution]]
* [[Wilks' lambda distribution]]
{{Colend}}

==References==
{{reflist}}

==External links==
* [https://github.com/zweng/rmg A C++ library for random matrix generation]

{{ProbDistributions|multivariate}}

{{DEFAULTSORT:Wishart Distribution}}
[[Category:Continuous distributions]]
[[Category:Multivariate continuous distributions]]
[[Category:Multivariate statistics]]
[[Category:Random matrices]]
[[Category:Conjugate prior distributions]]
[[Category:Exponential family distributions]]
[[Category:Probability distributions]]