https://en.formulasearchengine.com/api.php?action=feedcontributions&user=192.35.44.24&feedformat=atom formulasearchengine - User contributions [en] 2020-08-10T20:34:53Z User contributions MediaWiki 1.35.0-alpha https://en.formulasearchengine.com/index.php?title=Jeffreys_prior&diff=240441 Jeffreys prior 2012-08-24T14:42:05Z <p>192.35.44.24: /* One-parameter case */ be more specific about why the derivation follows</p> <hr /> <div>In [[Bayesian probability]], the '''Jeffreys prior''', named after [[Harold Jeffreys]], is a [[non-informative prior|non-informative]] (objective) [[prior distribution]] for a parameter space; its density function is proportional to the [[square root]] of the [[determinant]] of the [[Fisher information]] matrix:<br /> <br /> : &lt;math&gt;p\left(\vec\theta\right) \propto \sqrt{\det \mathcal{I}\left(\vec\theta\right)}.\,&lt;/math&gt;<br /> <br /> It has the key feature that it is [[Parametrization#Parametrization invariance|invariant under reparameterization]] of the parameter vector &lt;math&gt;\vec\theta&lt;/math&gt;. This makes it of special interest for use with ''scale parameters''.&lt;ref&gt;Jaynes, E. T. (1968) &quot;Prior Probabilities&quot;, ''IEEE Trans. 
on Systems Science and Cybernetics'', '''SSC-4''', 227 [http://bayes.wustl.edu/etj/articles/prior.pdf pdf].&lt;/ref&gt;<br /> <br /> == Reparameterization ==<br /> === One-parameter case ===<br /> For an alternate parameterization &lt;math&gt;\varphi&lt;/math&gt; we can derive<br /> <br /> : &lt;math&gt;p(\varphi) \propto \sqrt{I(\varphi)}\,&lt;/math&gt;<br /> <br /> from<br /> <br /> : &lt;math&gt;p(\theta) \propto \sqrt{I(\theta)}\,&lt;/math&gt;<br /> <br /> using the [[change of variables theorem]] and the definition of Fisher information:<br /> <br /> : &lt;math&gt;<br /> \begin{align}<br /> p(\varphi) &amp; = p(\theta) \left|\frac{d\theta}{d\varphi}\right|<br /> \propto \sqrt{I(\theta) \left(\frac{d\theta}{d\varphi}\right)^2}<br /> = \sqrt{\operatorname{E}\!\left[\left(\frac{d \ln L}{d\theta}\right)^2\right] \left(\frac{d\theta}{d\varphi}\right)^2} \\<br /> &amp; = \sqrt{\operatorname{E}\!\left[\left(\frac{d \ln L}{d\theta} \frac{d\theta}{d\varphi}\right)^2\right]}<br /> = \sqrt{\operatorname{E}\!\left[\left(\frac{d \ln L}{d\varphi}\right)^2\right]}<br /> = \sqrt{I(\varphi)}.<br /> \end{align}<br /> &lt;/math&gt;<br /> <br /> === Multiple-parameter case ===<br /> For an alternate parameterization &lt;math&gt;\vec\varphi&lt;/math&gt; we can derive<br /> <br /> : &lt;math&gt;p(\vec\varphi) \propto \sqrt{\det I(\vec\varphi)}\,&lt;/math&gt;<br /> <br /> from<br /> <br /> : &lt;math&gt;p(\vec\theta) \propto \sqrt{\det I(\vec\theta)}\,&lt;/math&gt;<br /> <br /> using the [[change of variables theorem]], the definition of Fisher information, and that the product of determinants is the determinant of the matrix product:<br /> <br /> : &lt;math&gt;<br /> \begin{align}<br /> p(\vec\varphi) &amp; = p(\vec\theta) \left|\det\frac{\partial\theta_i}{\partial\varphi_j}\right| \\<br /> &amp; \propto \sqrt{\det I(\vec\theta)\, {\det}^2\frac{\partial\theta_i}{\partial\varphi_j}} \\<br /> &amp; = \sqrt{\det \frac{\partial\theta_k}{\partial\varphi_i}\, \det 
\operatorname{E}\!\left[\frac{\partial \ln L}{\partial\theta_k} \frac{\partial \ln L}{\partial\theta_l} \right]\, \det \frac{\partial\theta_l}{\partial\varphi_j}} \\<br /> &amp; = \sqrt{\det \operatorname{E}\!\left[\sum_{k,l} \frac{\partial\theta_k}{\partial\varphi_i} \frac{\partial \ln L}{\partial\theta_k} \frac{\partial \ln L}{\partial\theta_l} \frac{\partial\theta_l}{\partial\varphi_j} \right]} \\<br /> &amp; = \sqrt{\det \operatorname{E}\!\left[\frac{\partial \ln L}{\partial\varphi_i} \frac{\partial \ln L}{\partial\varphi_j}\right]}<br /> = \sqrt{\det I(\vec\varphi)}.<br /> \end{align}<br /> &lt;/math&gt;<br /> <br /> == Attributes ==<br /> From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, such as those obtained as limits of conjugate families of distributions, is that it does not depend on which parameters are chosen to describe the parameter space.<br /> <br /> Sometimes the Jeffreys prior cannot be [[Normalizing constant|normalized]], and thus one must use an [[improper prior]]. For example, the Jeffreys prior for the mean of a [[Gaussian distribution]] of known variance is uniform over the entire real line.<br /> <br /> Use of the Jeffreys prior violates the strong version of the [[likelihood principle]], which is accepted by many, but by no means all, statisticians. When using the Jeffreys prior, inferences about &lt;math&gt;\vec\theta&lt;/math&gt; depend not just on the probability of the observed data as a function of &lt;math&gt;\vec\theta&lt;/math&gt;, but also on the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over the chosen universe. 
Accordingly, the Jeffreys prior, and hence the inferences made using it, may be different for two experiments involving the same &lt;math&gt;\vec\theta&lt;/math&gt; parameter even when the likelihood functions for the two experiments are the same, which is a violation of the strong likelihood principle.<br /> <br /> == Minimum description length ==<br /> <br /> In the [[minimum description length]] approach to statistics, the goal is to describe data as compactly as possible, where the length of a description is measured in bits of the code used. For a parametric family of distributions, one compares a code with the best code based on one of the distributions in the parameterized family. The main result is that in [[exponential family|exponential families]], asymptotically for large sample size, the code based on the mixture of the distributions in the exponential family, with the Jeffreys prior as the mixing distribution, is optimal. This result holds if one restricts the parameter set to a compact subset in the interior of the full parameter space. 
If the full parameter space is used, a modified version of the result should be used.<br /> <br /> ==Examples==<br /> The Jeffreys prior for a parameter (or a set of parameters) depends upon the statistical model.<br /> <br /> ===Gaussian distribution with mean parameter===<br /> For the [[Gaussian distribution]] of the real value &lt;math&gt;x&lt;/math&gt; with fixed standard deviation &lt;math&gt;\sigma&lt;/math&gt;<br /> : &lt;math&gt;f(x|\mu) = \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sqrt{2 \pi \sigma^2}}&lt;/math&gt;<br /> the Jeffreys prior for the mean &lt;math&gt;\mu&lt;/math&gt; is<br /> : &lt;math&gt;\begin{align} p(\mu) &amp; \propto \sqrt{I(\mu)}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{d}{d\mu} \log f(x|\mu) \right)^2\right]}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{x - \mu}{\sigma^2} \right)^2 \right]} \\<br /> &amp; = \sqrt{\int_{-\infty}^{+\infty} f(x|\mu) \left(\frac{x-\mu}{\sigma^2}\right)^2 dx}<br /> = \sqrt{\frac{\sigma^2}{\sigma^4}}<br /> \propto 1.\end{align}&lt;/math&gt;<br /> That is, the Jeffreys prior for &lt;math&gt;\mu&lt;/math&gt; does not depend upon &lt;math&gt;\mu&lt;/math&gt;; it is the unnormalized uniform distribution on the real line: the distribution that is 1 (or some other fixed constant) for all points. 
This is an [[improper prior]], and is, up to the choice of constant, the unique ''translation''-invariant distribution on the reals (the [[Haar measure]] with respect to addition of reals), corresponding to the mean being a measure of ''location'' and translation-invariance corresponding to no information about location.<br /> <br /> ===Gaussian distribution with standard deviation parameter===<br /> For the [[Gaussian distribution]] of the real value &lt;math&gt;x&lt;/math&gt; with fixed mean &lt;math&gt;\mu&lt;/math&gt;<br /> : &lt;math&gt;f(x|\sigma) = \frac{e^{-(x - \mu)^2 / 2 \sigma^2}}{\sqrt{2 \pi \sigma^2}},&lt;/math&gt;<br /> the Jeffreys prior for the standard deviation σ&amp;nbsp;&gt;&amp;nbsp;0 is<br /> : &lt;math&gt;\begin{align}p(\sigma) &amp; \propto \sqrt{I(\sigma)}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{d}{d\sigma} \log f(x|\sigma) \right)^2\right]}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{(x - \mu)^2-\sigma^2}{\sigma^3} \right)^2 \right]} \\<br /> &amp; = \sqrt{\int_{-\infty}^{+\infty} f(x|\sigma)\left(\frac{(x-\mu)^2-\sigma^2}{\sigma^3}\right)^2 dx}<br /> = \sqrt{\frac{2}{\sigma^2}}<br /> \propto \frac{1}{\sigma}.<br /> \end{align}&lt;/math&gt;<br /> Equivalently, the Jeffreys prior for log&amp;nbsp;σ&lt;sup&gt;2&lt;/sup&gt; (or log&amp;nbsp;σ) is the unnormalized uniform distribution on the real line, and thus this distribution is also known as the '''{{visible anchor|logarithmic prior}}'''. It is the unique (up to a multiple) prior (on the positive reals) that is ''scale''-invariant (the [[Haar measure]] with respect to multiplication of positive reals), corresponding to the standard deviation being a measure of ''scale'' and scale-invariance corresponding to no information about scale. 
As with the uniform distribution on the reals, it is an [[improper prior]].<br /> <br /> ===Poisson distribution with rate parameter===<br /> For the [[Poisson distribution]] of the non-negative integer &lt;math&gt;n&lt;/math&gt;, <br /> : &lt;math&gt;f(n | \lambda) = e^{-\lambda}\frac{\lambda^n}{n!},&lt;/math&gt;<br /> the Jeffreys prior for the rate parameter λ&amp;nbsp;&gt;&amp;nbsp;0 is<br /> : &lt;math&gt;\begin{align}p(\lambda) &amp;\propto \sqrt{I(\lambda)}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{d}{d\lambda} \log f(n|\lambda) \right)^2\right]}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{n-\lambda}{\lambda} \right)^2\right]} \\<br /> &amp; = \sqrt{\sum_{n=0}^{+\infty} f(n|\lambda) \left( \frac{n-\lambda}{\lambda} \right)^2}<br /> = \sqrt{\frac{1}{\lambda}}.\end{align}&lt;/math&gt;<br /> Equivalently, the Jeffreys prior for &lt;math&gt;\sqrt{\lambda}&lt;/math&gt; is the unnormalized uniform distribution on the non-negative real line.<br /> <br /> ===Bernoulli trial===<br /> For a coin that is &quot;heads&quot; with probability γ&amp;nbsp;∈&amp;nbsp;[0,1] and is &quot;tails&quot; with probability 1&amp;nbsp;−&amp;nbsp;γ, for a given (H,T)&amp;nbsp;∈&amp;nbsp;{(0,1),&amp;nbsp;(1,0)} the probability is &lt;math&gt;f(H,T|\gamma) = \gamma^H (1-\gamma)^T&lt;/math&gt;. The Jeffreys prior for the parameter &lt;math&gt;\gamma&lt;/math&gt; is<br /> <br /> : &lt;math&gt;\begin{align}p(\gamma) &amp; \propto \sqrt{I(\gamma)}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{d}{d\gamma} \log f(H,T|\gamma) \right)^2\right]}<br /> = \sqrt{\operatorname{E}\!\left[ \left( \frac{H}{\gamma} - \frac{T}{1-\gamma}\right)^2 \right]} \\<br /> &amp; = \sqrt{\gamma \left( \frac{1}{\gamma} - \frac{0}{1-\gamma}\right)^2 + (1-\gamma)\left( \frac{0}{\gamma} - \frac{1}{1-\gamma}\right)^2}<br /> = \frac{1}{\sqrt{\gamma(1-\gamma)}}\,.\end{align}&lt;/math&gt;<br /> <br /> This is the [[arcsine distribution]] and is a [[beta distribution]] with &lt;math&gt;\alpha = \beta = 1/2&lt;/math&gt;. 
Furthermore, if &lt;math&gt;\gamma = \sin^2(\theta)&lt;/math&gt;, the Jeffreys prior for &lt;math&gt;\theta&lt;/math&gt; is uniform in the interval &lt;math&gt;[0, \pi / 2]&lt;/math&gt;. Equivalently, &lt;math&gt;\theta&lt;/math&gt; may be taken to be uniform on the whole circle &lt;math&gt;[0, 2 \pi]&lt;/math&gt;, which induces the same prior on &lt;math&gt;\gamma&lt;/math&gt;.<br /> <br /> ===''N''-sided die with biased probabilities===<br /> Similarly, for a throw of an &lt;math&gt;N&lt;/math&gt;-sided die with outcome probabilities &lt;math&gt;\vec{\gamma} = (\gamma_1, \ldots, \gamma_N)&lt;/math&gt;, each non-negative and satisfying &lt;math&gt;\sum_{i=1}^N \gamma_i = 1&lt;/math&gt;, the Jeffreys prior for &lt;math&gt;\vec{\gamma}&lt;/math&gt; is the [[Dirichlet distribution]] with all (alpha) parameters set to &lt;math&gt;1/2&lt;/math&gt;. In particular, if we write &lt;math&gt;\gamma_i = {\phi_i}^2&lt;/math&gt; for each &lt;math&gt;i&lt;/math&gt;, then the Jeffreys prior for &lt;math&gt;\vec{\phi}&lt;/math&gt; is uniform on the (''N''&amp;minus;1)-dimensional [[unit sphere]] (''i.e.'', it is uniform on the surface of an ''N''-dimensional [[unit sphere|unit ball]]).<br /> <br /> ==References==<br /> *{{cite journal<br /> | last= Jeffreys | first=H. | authorlink=Harold Jeffreys<br /> | year = 1946<br /> | title = An Invariant Form for the Prior Probability in Estimation Problems<br /> | journal = Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences<br /> | volume = 186<br /> | issue = 1007<br /> | pages = 453–461<br /> | jstor = 97883<br /> | doi = 10.1098/rspa.1946.0056 <br /> }}<br /> <br /> *{{cite book<br /> | last= Jeffreys | first=H. | authorlink=Harold Jeffreys<br /> | year = 1939<br /> | title = Theory of Probability<br /> | publisher = Oxford University Press<br /> }}<br /> <br /> == Footnotes ==<br /> <br /> &lt;references/&gt;<br /> <br /> [[Category:Bayesian statistics]]<br /> <br /> [[ru:Априорная вероятность Джеффри]]</div> 192.35.44.24
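The closed-form Fisher informations in the Examples section of the Jeffreys prior article can be cross-checked numerically. The following is a minimal Python sketch that uses only the standard library. The function names, the midpoint-rule quadrature over μ ± 12σ, and the Poisson truncation point are illustrative choices and assumptions, not a standard API. Each function evaluates I = E[(d log f/dθ)²] directly from the score given in the text. The final loops then check two reparameterization statements: that √I(γ)·|dγ/dθ| does not depend on θ when γ = sin²θ, and that √I(λ)·|dλ/dφ| does not depend on φ when λ = φ².

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the Gaussian distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def gaussian_expectation(g, mu, sigma, half_width=12.0, steps=20_000):
    """Approximate E[g(X)], X ~ N(mu, sigma^2), by the midpoint rule on mu +/- half_width*sigma.

    The interval and step count are assumed values; the Gaussian tail beyond 12 sigma is negligible.
    """
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += gaussian_pdf(x, mu, sigma) * g(x)
    return total * h


def fisher_gaussian_mean(mu, sigma):
    # Score with sigma known: d/dmu log f(x|mu) = (x - mu) / sigma^2.
    return gaussian_expectation(lambda x: ((x - mu) / sigma**2) ** 2, mu, sigma)


def fisher_gaussian_sd(mu, sigma):
    # Score with mu known: d/dsigma log f(x|sigma) = ((x - mu)^2 - sigma^2) / sigma^3.
    return gaussian_expectation(
        lambda x: (((x - mu) ** 2 - sigma**2) / sigma**3) ** 2, mu, sigma
    )


def fisher_poisson(lam):
    # Score: d/dlambda log f(n|lambda) = (n - lambda) / lambda.
    # The sum over n is truncated 40 standard deviations (plus 40) above the mean (assumed cut-off).
    n_max = int(lam + 40 * math.sqrt(lam) + 40)
    total = 0.0
    for n in range(n_max + 1):
        pmf = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
        total += pmf * ((n - lam) / lam) ** 2
    return total


def fisher_bernoulli(gamma):
    # Score H/gamma - T/(1 - gamma) for the outcomes (H, T) = (1, 0) and (0, 1).
    score_heads, score_tails = 1 / gamma, -1 / (1 - gamma)
    return gamma * score_heads**2 + (1 - gamma) * score_tails**2


if __name__ == "__main__":
    mu, sigma, lam, gamma = 0.3, 1.7, 4.5, 0.2
    # Numerical Fisher information printed next to the closed form derived in the text.
    print("Gaussian mean:", fisher_gaussian_mean(mu, sigma), 1 / sigma**2)
    print("Gaussian sd:  ", fisher_gaussian_sd(mu, sigma), 2 / sigma**2)
    print("Poisson:      ", fisher_poisson(lam), 1 / lam)
    print("Bernoulli:    ", fisher_bernoulli(gamma), 1 / (gamma * (1 - gamma)))

    # Change of variables: sqrt(I(old)) * |d old / d new| should not depend on the new parameter.
    for theta in (0.2, 0.7, 1.3):  # gamma = sin^2(theta), theta in (0, pi/2)
        dgamma_dtheta = 2 * math.sin(theta) * math.cos(theta)
        print(theta, math.sqrt(fisher_bernoulli(math.sin(theta) ** 2)) * abs(dgamma_dtheta))
    for phi in (0.5, 1.0, 3.0):  # lambda = phi^2, so d lambda / d phi = 2 phi
        print(phi, math.sqrt(fisher_poisson(phi**2)) * 2 * phi)
```

Under these assumptions each printed pair should agree to many digits. Both reparameterization loops should print values close to 2 for every input, which is consistent with the Jeffreys prior being uniform in θ and in √λ. The expectations are computed by quadrature and summation rather than taken from the closed forms so that the check does not presuppose the results being checked.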