# Cauchy distribution


The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution (after Hendrik Lorentz), Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The simplest Cauchy distribution is called the standard Cauchy distribution. It is the distribution of a random variable that is the ratio of two independent standard normal variables and has the probability density function

${\displaystyle f(x;0,1)={\frac {1}{\pi (1+x^{2})}}.\!}$

Its cumulative distribution function has the shape of an arctangent function arctan(x):

${\displaystyle F(x;0,1)={\frac {1}{\pi }}\arctan \left(x\right)+{\frac {1}{2}}}$

The Cauchy distribution is often used in statistics as the canonical example of a "pathological" distribution since both its mean and its variance are undefined. (But see the section Explanation of undefined moments below.) The Cauchy distribution does not have finite moments of order greater than or equal to one; only fractional absolute moments exist.[1] The Cauchy distribution has no moment generating function.

Its importance in physics is the result of it being the solution to the differential equation describing forced resonance.[2] In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. In spectroscopy, it is the description of the shape of spectral lines which are subject to homogeneous broadening in which all atoms interact in the same way with the frequency range contained in the line shape. Many mechanisms cause homogeneous broadening, most notably collision broadening, and Chantler–Alda radiation.[3] In its standard form, it is the maximum entropy probability distribution for a random variate X for which[4]

${\displaystyle \operatorname {E} \!\left[\ln(1+X^{2})\right]=\ln(4)}$

## History

Functions with the form of the Cauchy distribution were studied by mathematicians in the 17th century, but in a different context and under the title of the Witch of Agnesi. Despite its name, the first explicit analysis of the properties of the Cauchy distribution was published by the French mathematician Poisson in 1824, with Cauchy only becoming associated with it during an academic controversy in 1853.[5] As such, the name of the distribution is a case of Stigler's Law of Eponymy. Poisson noted that if the mean of observations following such a distribution were taken, the mean error did not converge to any finite number. As such, Laplace's use of the Central Limit Theorem with such a distribution was inappropriate, as it assumed a finite mean and variance. Despite this, Poisson did not regard the issue as important, in contrast to Bienaymé, who was to engage Cauchy in a long dispute over the matter.

## Characterisation

### Probability density function

The Cauchy distribution has the probability density function

${\displaystyle f(x;x_{0},\gamma )={\frac {1}{\pi \gamma \left[1+\left({\frac {x-x_{0}}{\gamma }}\right)^{2}\right]}}={1 \over \pi \gamma }\left[{\gamma ^{2} \over (x-x_{0})^{2}+\gamma ^{2}}\right],}$

where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter, which specifies the half-width at half-maximum (HWHM); equivalently, 2γ is the full width at half maximum (FWHM). γ is also equal to half the interquartile range and is sometimes called the probable error. Augustin-Louis Cauchy exploited such a density function in 1827 with an infinitesimal scale parameter, defining what would now be called a Dirac delta function.

The amplitude of the above Lorentzian function is given by

${\displaystyle {\text{Amplitude (or height)}}={\frac {1}{\pi \gamma }}.}$

The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution with the probability density function

${\displaystyle f(x;0,1)={\frac {1}{\pi (1+x^{2})}}.\!}$

In physics, a three-parameter Lorentzian function is often used:

${\displaystyle f(x;x_{0},\gamma ,I)={\frac {I}{\left[1+\left({\frac {x-x_{0}}{\gamma }}\right)^{2}\right]}}=I\left[{\gamma ^{2} \over (x-x_{0})^{2}+\gamma ^{2}}\right],}$

where I is the height of the peak. The three-parameter Lorentzian function indicated is not, in general, a probability density function, since it does not integrate to 1, except in the special case where ${\displaystyle I={\frac {1}{\pi \gamma }}.\!}$

### Cumulative distribution function

The cumulative distribution function of the Cauchy distribution is

${\displaystyle F(x;x_{0},\gamma )={\frac {1}{\pi }}\arctan \left({\frac {x-x_{0}}{\gamma }}\right)+{\frac {1}{2}}}$

and the quantile function (inverse cdf) of the Cauchy distribution is

${\displaystyle Q(p;x_{0},\gamma )=x_{0}+\gamma \,\tan \left[\pi \left(p-{\tfrac {1}{2}}\right)\right].}$

It follows that the first and third quartiles are (x0−γ, x0+γ), and hence the interquartile range is 2γ.
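The density, distribution function and quantile function above translate directly into code. The following is a minimal sketch using only the Python standard library; the function names `cauchy_pdf`, `cauchy_cdf` and `cauchy_quantile` and the example parameters are illustrative choices, not part of any particular library.

```python
import math


def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Density f(x; x0, gamma) of the Cauchy distribution."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cumulative distribution function F(x; x0, gamma)."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5


def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Quantile function Q(p; x0, gamma), the inverse of the CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return x0 + gamma * math.tan(math.pi * (p - 0.5))


# Illustrative parameters (assumed for this example): x0 = 2, gamma = 3.
q1 = cauchy_quantile(0.25, x0=2.0, gamma=3.0)
q3 = cauchy_quantile(0.75, x0=2.0, gamma=3.0)
print(q1, q3, q3 - q1)  # quartiles near x0 - gamma and x0 + gamma, range near 2 * gamma
```

Up to floating-point rounding, the quartiles come out as x0 − γ and x0 + γ, in line with the statement above.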

The derivative of the quantile function, the quantile density function, for the Cauchy distribution is:

${\displaystyle Q'(p;\gamma )=\gamma \,\pi \,{\sec }^{2}\left[\pi \left(p-{\tfrac {1}{2}}\right)\right].\!}$

The differential entropy of a distribution can be defined in terms of its quantile density,[6] specifically

${\displaystyle h(\gamma )=\int _{0}^{1}\log \,(Q'(p;\gamma ))\,{\mathrm {d} }p=\log(\gamma )\,+\,\log(4\,\pi ).\!}$
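The value of this integral can be checked by hand. The following sketch substitutes u = π(p − 1/2) and relies on the standard integral of log(cos u) over (−π/2, π/2), which equals −π log 2:

${\displaystyle \int _{0}^{1}\log \left(\gamma \,\pi \,\sec ^{2}\left[\pi \left(p-{\tfrac {1}{2}}\right)\right]\right)\mathrm {d} p=\log(\gamma \pi )-{\frac {2}{\pi }}\int _{-\pi /2}^{\pi /2}\log(\cos u)\,\mathrm {d} u=\log(\gamma \pi )+2\log 2=\log(\gamma )+\log(4\pi ).}$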

## Properties

The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its mode and median are well defined and are both equal to x0.

When U and V are two independent normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution.
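The ratio construction can be checked empirically. The sketch below is illustrative only: the helper name `ratio_of_normals`, the seed and the sample size are assumptions of this example. It draws ratios of independent standard normal variates and compares the sample quartiles with the standard Cauchy quartiles −1, 0 and 1.

```python
import random
import statistics


def ratio_of_normals(n, seed=0):
    """Draw n ratios U/V of independent standard normal variates."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]


samples = ratio_of_normals(100_000)
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(samples, n=4)
print(q1, q2, q3)  # expected to lie close to -1, 0 and 1
```

Quartiles are used for the comparison rather than the sample mean and variance, since those do not converge for this distribution.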

If X1, ..., Xn are independent and identically distributed random variables, each with a standard Cauchy distribution, then the sample mean (X1+ ... +Xn)/n has the same standard Cauchy distribution. To see that this is true, compute the characteristic function of the sample mean:

${\displaystyle \phi _{\overline {X}}(t)=\mathrm {E} \left[e^{i{\overline {X}}t}\right]}$

where ${\displaystyle {\overline {X}}}$ is the sample mean. This example serves to show that the hypothesis of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case.
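The computation referred to above can be sketched as follows. It uses the independence of the Xi and the standard Cauchy characteristic function ${\displaystyle \phi _{X}(t)=e^{-|t|}}$, which is the case x0 = 0, γ = 1 of the general formula given in the section on the characteristic function:

${\displaystyle \phi _{\overline {X}}(t)=\mathrm {E} \left[e^{i(t/n)(X_{1}+\cdots +X_{n})}\right]=\prod _{j=1}^{n}\phi _{X}\!\left({\frac {t}{n}}\right)=\left(e^{-|t|/n}\right)^{n}=e^{-|t|}=\phi _{X}(t).}$

Since the characteristic function determines the distribution, the sample mean has the same standard Cauchy distribution as each Xi.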

The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution.[7]

The standard Cauchy distribution coincides with the Student's t-distribution with one degree of freedom.

Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear transformations with real coefficients. In addition, the Cauchy distribution is the only univariate distribution which is closed under linear fractional transformations with real coefficients.[8] In this connection, see also McCullagh's parametrization of the Cauchy distributions.

### Characteristic function

Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by

${\displaystyle \phi _{X}(t;x_{0},\gamma )={\mathrm {E} }\left[e^{iXt}\right]=\int _{-\infty }^{\infty }f(x;x_{0},\gamma )e^{ixt}\,dx=e^{ix_{0}t-\gamma |t|}.}$

which is just the Fourier transform of the probability density. The original probability density may be expressed in terms of the characteristic function, essentially by using the inverse Fourier transform:

${\displaystyle f(x;x_{0},\gamma )={\frac {1}{2\pi }}\int _{-\infty }^{\infty }\phi _{X}(t;x_{0},\gamma )e^{-ixt}\,dt\!}$

Observe that the characteristic function is not differentiable at the origin: this corresponds to the fact that the Cauchy distribution does not have an expected value.

## Explanation of undefined moments

### Mean

If a probability distribution has a density function f(x), then the mean is

${\displaystyle \int _{-\infty }^{\infty }xf(x)\,dx.\qquad \qquad (1)\!}$

The question is now whether this is the same thing as

${\displaystyle \int _{a}^{\infty }xf(x)\,dx+\int _{-\infty }^{a}xf(x)\,dx.\qquad \qquad (2)\!}$

for an arbitrary real number a.

If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy distribution, both the positive and negative terms of (2) are infinite. Hence (1) is undefined.[9]

Although we may take (1) to mean

${\displaystyle \lim _{a\to \infty }\int _{-a}^{a}xf(x)\,dx,\!}$

and this is its Cauchy principal value, which is zero, we could also take (1) to mean, for example,

${\displaystyle \lim _{a\to \infty }\int _{-2a}^{a}xf(x)\,dx,\!}$

which is not zero, as can be seen easily by computing the integral.
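For the standard Cauchy density, the integral in question can be worked out explicitly, using the antiderivative ln(1 + x²)/(2π) of x f(x):

${\displaystyle \int _{-2a}^{a}{\frac {x}{\pi (1+x^{2})}}\,dx={\frac {1}{2\pi }}\left[\ln(1+x^{2})\right]_{-2a}^{a}={\frac {1}{2\pi }}\ln {\frac {1+a^{2}}{1+4a^{2}}}\;\longrightarrow \;-{\frac {\ln 4}{2\pi }}=-{\frac {\ln 2}{\pi }}\quad (a\to \infty ).}$

The same calculation with symmetric limits −a and a gives zero for every a, which is the principal value mentioned above.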

Various results in probability theory about expected values, such as the strong law of large numbers, will not work in such cases.[9]

### Higher moments

The Cauchy distribution does not have finite moments of any order greater than or equal to one. Some of the higher raw moments do exist and have a value of infinity, for example the raw second moment:

${\displaystyle {\begin{aligned}\mathrm {E} [X^{2}]&\propto \int _{-\infty }^{\infty }{\frac {x^{2}}{1+x^{2}}}\,dx=\int _{-\infty }^{\infty }\left(1-{\frac {1}{1+x^{2}}}\right)dx\\[8pt]&=\int _{-\infty }^{\infty }dx-\int _{-\infty }^{\infty }{\frac {1}{1+x^{2}}}\,dx=\int _{-\infty }^{\infty }dx-\pi =\infty .\end{aligned}}}$

By re-arranging the formula, one can see that the second moment is essentially the infinite integral of a constant (here 1). Higher even-powered raw moments will also evaluate to infinity. Odd-powered raw moments, however, do not exist at all (i.e. are undefined), which is distinctly different from existing with the value of infinity. The odd-powered raw moments are undefined because their values are essentially equivalent to ${\displaystyle \infty -\infty }$ since the two halves of the integral both diverge and have opposite signs. The first raw moment is the mean, which, being odd, does not exist. (See also the discussion above about this.) This in turn means that all of the central moments and standardized moments do not exist (are undefined), since they are all based on the mean. The variance — which is the second central moment — is likewise non-existent (despite the fact that the raw second moment exists with the value infinity).

The results for higher moments follow from Hölder's inequality, which implies that higher moments (or halves of moments) diverge if lower ones do.

## Estimation of parameters

Because the parameters of the Cauchy distribution don't correspond to a mean and variance, attempting to estimate the parameters of the Cauchy distribution by using a sample mean and a sample variance will not succeed.[10] For example, if n samples are taken from a Cauchy distribution, one may calculate the sample mean as:

${\displaystyle {\bar {x}}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}}$

Although the sample values xi will be concentrated about the central value x0, the sample mean does not become less variable as more samples are taken, because larger samples are increasingly likely to contain points with a large absolute value. In fact, the distribution of the sample mean is equal to the distribution of the samples themselves; i.e., the sample mean of a large sample is no better (or worse) an estimator of x0 than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more samples are taken.
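The claim that averaging does not help can be illustrated by simulation. The sketch below is an illustration under assumed settings (seed, 2000 replications, samples of size 100; the helper names are hypothetical). It compares the spread of single standard Cauchy draws with the spread of sample means, measuring spread by half the interquartile range because the variance is undefined.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    """One standard Cauchy draw by inverting the CDF at a uniform variate."""
    return math.tan(math.pi * (rng.random() - 0.5))


def half_iqr(values):
    """Half the sample interquartile range, an estimate of the scale gamma."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2.0


rng = random.Random(1)
singles = [standard_cauchy(rng) for _ in range(2000)]
means = [
    statistics.fmean(standard_cauchy(rng) for _ in range(100))
    for _ in range(2000)
]

spread_singles = half_iqr(singles)
spread_means = half_iqr(means)
print(spread_singles, spread_means)  # both expected to be near gamma = 1
```

For a distribution with finite variance, the second number would be roughly ten times smaller than the first for samples of size 100; here the two are expected to be comparable.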

Therefore, more robust means of estimating the central value x0 and the scaling parameter γ are needed. One simple method is to take the median value of the sample as an estimator of x0 and half the sample interquartile range as an estimator of γ. Other, more precise and robust methods have been developed.[11][12] For example, the truncated mean of the middle 24% of the sample order statistics produces an estimate for x0 that is more efficient than using either the sample median or the full sample mean.[13][14] However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than 24% of the sample is used.[13][14]
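The simple estimators described above can be sketched in a few lines. The function names, the seed and the true parameter values below are assumptions made for this example. The rounding rule used to pick the "middle 24%" is one of several reasonable conventions and is not taken from the cited sources.

```python
import math
import random
import statistics


def median_iqr_estimate(sample):
    """Estimate (x0, gamma) by the sample median and half the sample IQR."""
    q1, med, q3 = statistics.quantiles(sample, n=4)
    return med, (q3 - q1) / 2.0


def middle_truncated_mean(sample, keep=0.24):
    """Mean of the central `keep` fraction of the order statistics."""
    ordered = sorted(sample)
    n = len(ordered)
    k = max(1, round(keep * n))   # number of central order statistics kept
    start = (n - k) // 2          # drop (roughly) equal counts from each tail
    return statistics.fmean(ordered[start:start + k])


rng = random.Random(42)
true_x0, true_gamma = 3.0, 2.0    # assumed values for the demonstration
sample = [
    true_x0 + true_gamma * math.tan(math.pi * (rng.random() - 0.5))
    for _ in range(20_000)
]

x0_med, gamma_hat = median_iqr_estimate(sample)
x0_trim = middle_truncated_mean(sample)
print(x0_med, gamma_hat, x0_trim)
```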

Maximum likelihood can also be used to estimate the parameters x0 and γ. However, this tends to be complicated by the fact that this requires finding the roots of a high degree polynomial, and there can be multiple roots that represent local maxima.[15] Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples.[16] The log-likelihood function for the Cauchy distribution for sample size n is:

${\displaystyle {\hat {\ell }}(\!x_{0},\gamma \mid x_{1},\dotsc ,x_{n})=n\log(\gamma )-\sum _{i=1}^{n}(\log[(\gamma )^{2}+(x_{i}-x_{0})^{2}])-n\log(\pi )}$

Maximizing the log likelihood function with respect to x0 and γ produces the following system of equations:

${\displaystyle \sum _{i=1}^{n}{\frac {x_{i}-x_{0}}{\gamma ^{2}+[x_{i}-\!x_{0}]^{2}}}=0}$
${\displaystyle \sum _{i=1}^{n}{\frac {\gamma ^{2}}{\gamma ^{2}+[x_{i}-x_{0}]^{2}}}-{\frac {n}{2}}=0}$

Note that

${\displaystyle \sum _{i=1}^{n}{\frac {\gamma ^{2}}{\gamma ^{2}+[x_{i}-x_{0}]^{2}}}}$

is a monotone function in γ and that the solution γ must satisfy

${\displaystyle \min |x_{i}-x_{0}|\leq \gamma \leq \max |x_{i}-x_{0}|.}$

Solving just for x0 requires solving a polynomial of degree 2n−1,[15] and solving just for γ requires solving a polynomial of degree n (first for γ², then x0). Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating x0 using the sample median is only about 81% as asymptotically efficient as estimating x0 by maximum likelihood.[14][17] The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of x0 as the maximum likelihood estimate.[14] When Newton's method is used to find the solution for the maximum likelihood estimate, the truncated mean of the middle 24% order statistics can be used as an initial value for x0.
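As an illustration of a numerical solution, the sketch below solves the two likelihood equations with an EM-style reweighting iteration. This is the scheme commonly used for Student's t with one degree of freedom, swapped in here because it needs no derivatives; it is not the Newton iteration mentioned above. The starting values (sample median and half the interquartile range), tolerances, seed and function names are assumptions of this example. At a fixed point of the iteration both equations of the system above hold.

```python
import math
import random
import statistics


def cauchy_mle(sample, tol=1e-12, max_iter=1000):
    """EM-style fixed-point iteration for (x0, gamma).

    Assumes the sample is not degenerate (non-zero interquartile range).
    """
    n = len(sample)
    q1, x0, q3 = statistics.quantiles(sample, n=4)   # start at the median
    gamma = (q3 - q1) / 2.0                          # and half the IQR
    for _ in range(max_iter):
        g2 = gamma * gamma
        weights = [1.0 / (g2 + (x - x0) ** 2) for x in sample]
        # Weighted mean: fixed points satisfy the x0 likelihood equation.
        new_x0 = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        # Weighted scale: fixed points satisfy the gamma likelihood equation.
        new_gamma = math.sqrt(
            2.0 * g2 / n
            * sum(w * (x - new_x0) ** 2 for w, x in zip(weights, sample))
        )
        converged = abs(new_x0 - x0) + abs(new_gamma - gamma) < tol
        x0, gamma = new_x0, new_gamma
        if converged:
            break
    return x0, gamma


def likelihood_equations(sample, x0, gamma):
    """Left-hand sides of the two likelihood equations (zero at the MLE)."""
    g2 = gamma * gamma
    eq_x0 = sum((x - x0) / (g2 + (x - x0) ** 2) for x in sample)
    eq_gamma = sum(g2 / (g2 + (x - x0) ** 2) for x in sample) - len(sample) / 2.0
    return eq_x0, eq_gamma


rng = random.Random(3)
sample = [1.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]
x0_hat, gamma_hat = cauchy_mle(sample)
s_x0, s_gamma = likelihood_equations(sample, x0_hat, gamma_hat)
print(x0_hat, gamma_hat, s_x0, s_gamma)
```

The residuals `s_x0` and `s_gamma` give a direct check that the returned pair solves the system, independently of how it was found.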

## Circular Cauchy distribution

If X is Cauchy distributed with median μ and scale parameter γ, then the complex variable

${\displaystyle Z={\frac {X-i}{X+i}}}$

has unit modulus and is distributed on the unit circle with density:

${\displaystyle P_{cc}(\theta ;\zeta )={\frac {1}{2\pi }}{\frac {1-|\zeta |^{2}}{|e^{i\theta }-\zeta |^{2}}}}$

with respect to the angular variable θ = arg(z),[citation needed] where ${\displaystyle \zeta ={\frac {\psi -i}{\psi +i}}}$ and ψ expresses the two parameters of the associated linear Cauchy distribution for x as a complex number: ${\displaystyle \psi =\mu +i\gamma .\,}$ The distribution ${\displaystyle P_{cc}(\theta ;\zeta )}$ is called the circular Cauchy distribution[18][19] (also the complex Cauchy distribution)[citation needed] with parameter ζ. The circular Cauchy distribution is related to the wrapped Cauchy distribution. If ${\displaystyle P_{wc}(\theta ;\psi )}$ is a wrapped Cauchy distribution with the parameter ψ = μ + i γ representing the parameters of the corresponding "unwrapped" Cauchy distribution in the variable y where θ = y mod 2π, then

${\displaystyle P_{wc}(\theta ;\psi )=P_{cc}(\theta ,e^{i\psi })\,}$

See also McCullagh's parametrization of the Cauchy distributions and Poisson kernel for related concepts.

The circular Cauchy distribution expressed in complex form has finite moments of all orders

${\displaystyle \operatorname {E} [Z^{r}]=\zeta ^{r},\quad \operatorname {E} [{\bar {Z}}^{r}]={\bar {\zeta }}^{r}}$

for integer r ≥ 1. For |φ| < 1, the transformation

${\displaystyle U(z,\phi )={\frac {z-\phi }{1-{\bar {\phi }}z}}}$

is holomorphic on the unit disk, and the transformed variable U(Z, φ) is distributed as complex Cauchy with parameter U(ζ, φ).

Given a sample z1, ..., zn of size n > 2, the maximum-likelihood equation

${\displaystyle n^{-1}U\left(z,{\hat {\zeta }}\right)=n^{-1}\sum U\left(z_{j},{\hat {\zeta }}\right)=0}$

can be solved by a simple fixed-point iteration:

${\displaystyle \zeta ^{(r+1)}=U\left(n^{-1}U(z,\zeta ^{(r)}),\,-\zeta ^{(r)}\right)\,}$

starting with ζ(0) = 0. The sequence of likelihood values is non-decreasing, and the solution is unique for samples containing at least three distinct values.[20]

The maximum-likelihood estimate for the median (${\displaystyle {\hat {\mu }}}$) and scale parameter (${\displaystyle {\hat {\gamma }}}$) of a real Cauchy sample is obtained by the inverse transformation:

${\displaystyle {\hat {\mu }}\pm i{\hat {\gamma }}=i{\frac {1+{\hat {\zeta }}}{1-{\hat {\zeta }}}}.}$
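The fixed-point iteration and the inverse transformation can be sketched with Python's built-in complex numbers. The function names, stopping tolerance, iteration cap, seed and true parameters are assumptions of this example. The scale is taken as the absolute value of the imaginary part, reflecting the ± in the formula above.

```python
import math
import random


def mobius(z, phi):
    """The transformation U(z, phi) = (z - phi) / (1 - conj(phi) z)."""
    return (z - phi) / (1.0 - phi.conjugate() * z)


def to_circle(x):
    """Map a real observation x to the unit circle via (x - i) / (x + i)."""
    return complex(x, -1.0) / complex(x, 1.0)


def circular_cauchy_mle(xs, tol=1e-12, max_iter=10_000):
    """Iterate zeta <- U(mean_j U(z_j, zeta), -zeta), starting from zeta = 0."""
    zs = [to_circle(x) for x in xs]
    zeta = 0j
    for _ in range(max_iter):
        m = sum(mobius(z, zeta) for z in zs) / len(zs)
        if abs(m) < tol:          # the likelihood equation holds numerically
            break
        zeta = mobius(m, -zeta)
    return zeta


def to_linear_parameters(zeta):
    """Invert zeta = (psi - i) / (psi + i) and read off (mu, gamma)."""
    psi = 1j * (1.0 + zeta) / (1.0 - zeta)
    return psi.real, abs(psi.imag)


rng = random.Random(7)
xs = [3.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]
zeta_hat = circular_cauchy_mle(xs)
mu_hat, gamma_hat = to_linear_parameters(zeta_hat)
residual = abs(sum(mobius(to_circle(x), zeta_hat) for x in xs) / len(xs))
print(mu_hat, gamma_hat, residual)
```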

For n ≤ 4, closed-form expressions are known for ${\displaystyle {\hat {\zeta }}}$.[15] The density of the maximum-likelihood estimator at t in the unit disk is necessarily of the form:

${\displaystyle {\frac {1}{4\pi }}{\frac {p_{n}(\chi (t,\zeta ))}{(1-|t|^{2})^{2}}},}$

where

${\displaystyle \chi (t,\zeta )={\frac {|t-\zeta |^{2}}{4(1-|t|^{2})(1-|\zeta |^{2})}}}$.

Formulae for p3 and p4 are available.[21]

## Multivariate Cauchy distribution

A random vector X = (X1, ..., Xk)′ is said to have the multivariate Cauchy distribution if every linear combination of its components Y = a1X1 + ... + akXk has a Cauchy distribution. That is, for any constant vector ${\displaystyle a\in \mathbb {R} ^{k}}$, the random variable Y = a′X should have a univariate Cauchy distribution.[22] The characteristic function of a multivariate Cauchy distribution is given by:

${\displaystyle \phi _{X}(t)=e^{ix_{0}(t)-\gamma (t)},\!}$

where x0(t) and γ(t) are real functions with x0(t) a homogeneous function of degree one and γ(t) a positive homogeneous function of degree one.[22] More formally:[22]

${\displaystyle x_{0}(at)=ax_{0}(t),}$
${\displaystyle \gamma (at)=|a|\gamma (t),}$

for all t.

An example of a bivariate Cauchy distribution can be given by:[23]

${\displaystyle f(x,y;x_{0},y_{0},\gamma )={1 \over 2\pi }\left[{\gamma \over ((x-x_{0})^{2}+(y-y_{0})^{2}+\gamma ^{2})^{1.5}}\right].}$

Note that in this example, even though there is no analogue to a covariance matrix, x and y are not statistically independent.[23]

Analogously to the univariate density, the multidimensional Cauchy density also relates to the multivariate Student distribution. They are equivalent when the degrees of freedom parameter is equal to one. The density of a k-dimensional Student distribution with one degree of freedom becomes:

${\displaystyle f({\mathbf {x} };{\mathbf {\mu } },{\mathbf {\Sigma } },k)={\frac {\Gamma \left({\frac {1+k}{2}}\right)}{\Gamma ({\frac {1}{2}})\pi ^{\frac {k}{2}}\left|{\mathbf {\Sigma } }\right|^{\frac {1}{2}}\left[1+({\mathbf {x} }-{\mathbf {\mu } })^{T}{\mathbf {\Sigma } }^{-1}({\mathbf {x} }-{\mathbf {\mu } })\right]^{\frac {1+k}{2}}}}.}$

Properties and details for this density can be obtained by taking it as a particular case of the multivariate Student density.

## Transformation properties

If X ~ Cauchy(x0, γ0) and Y ~ Cauchy(x1, γ1) are independent, then ${\displaystyle X+Y\sim {\textrm {Cauchy}}(x_{0}+x_{1},\gamma _{0}+\gamma _{1})\,}$[citation needed]

Expressing a Cauchy distribution in terms of one complex parameter ${\displaystyle \psi =x_{0}+i\gamma }$, define X ~ Cauchy(ψ) to mean X ~ Cauchy(x0, |γ|). If X ~ Cauchy(ψ) then:

${\displaystyle {\frac {aX+b}{cX+d}}}$ ~ Cauchy${\displaystyle \left({\frac {a\psi +b}{c\psi +d}}\right)}$

where a, b, c and d are real numbers.
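The rule above amounts to applying the same linear fractional map to the complex parameter ψ. The sketch below is illustrative: the function name `transform_cauchy` is hypothetical, Cauchy(ψ) is read as location Re ψ and scale |Im ψ|, and the check that ad − bc is non-zero is an added assumption to exclude constant maps.

```python
def transform_cauchy(x0, gamma, a, b, c, d):
    """Parameters of (aX + b) / (cX + d) when X ~ Cauchy(x0, gamma), gamma > 0."""
    if a * d - b * c == 0:
        # Assumption of this sketch: a degenerate map sends X to a constant.
        raise ValueError("ad - bc must be non-zero")
    psi = complex(x0, gamma)
    image = (a * psi + b) / (c * psi + d)
    return image.real, abs(image.imag)


# Reciprocal of a centred Cauchy variable: 1/X with X ~ Cauchy(0, 2).
loc_inv, scale_inv = transform_cauchy(0.0, 2.0, a=0, b=1, c=1, d=0)

# Affine map 3X + 1 with X ~ Cauchy(2, 1).
loc_aff, scale_aff = transform_cauchy(2.0, 1.0, a=3, b=1, c=0, d=1)

print(loc_inv, scale_inv, loc_aff, scale_aff)
```

The first call returns location 0 and scale 1/2, and the second returns location 7 and scale 3, up to floating-point rounding.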

## References

1. {{#invoke:citation/CS1|citation |CitationClass=book }}, Chapter 16.
2. http://webphysics.davidson.edu/Projects/AnAntonelli/node5.html Note that the intensity, which follows the Cauchy distribution, is the square of the amplitude.
3. {{#invoke:citation/CS1|citation |CitationClass=book }}
4. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
5. Stigler, S. M., "Cauchy and the Witch of Agnesi", in Statistics on the Table, Harvard (1999), Chapter 18.
6. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
7. {{#invoke:citation/CS1|citation |CitationClass=book }}
8. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
9. Template:Cite web
10. Illustration of instability of sample means
11. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
12. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
13. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
14. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
15. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
16. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
17. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
18. McCullagh, P., "Conditional inference and Cauchy models", Biometrika, volume 79 (1992), pages 247–259. PDF from McCullagh's homepage.
19. {{#invoke:citation/CS1|citation |CitationClass=book }}Template:Page needed
20. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
21. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
22. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
23. {{#invoke:Citation/CS1|citation |CitationClass=journal }}