# Poisson distribution

In probability theory and statistics, the Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average of 4 letters per day. If receiving any particular piece of mail doesn't affect the arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive independently of one another, then a reasonable assumption is that the number of pieces of mail received per day obeys a Poisson distribution. Other examples that may follow a Poisson distribution include the number of phone calls received by a call center per hour, the number of decay events per second from a radioactive source, and the number of taxis passing a particular street corner per hour.

## History

The distribution was first introduced by Siméon Denis Poisson (1781–1840) and published, together with his probability theory, in 1837 in his work Recherches sur la probabilité des jugements en matière criminelle et en matière civile (“Research on the Probability of Judgments in Criminal and Civil Matters”). The work theorized about the number of wrongful convictions in a given country by focusing on certain random variables N that count, among other things, the number of discrete occurrences (sometimes called "events" or “arrivals”) that take place during a time-interval of given length. The result had been given previously by Abraham de Moivre (1711) in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus in Philosophical Transactions of the Royal Society, p. 219.

A practical application of this distribution was made by Ladislaus Bortkiewicz in 1898 when he was given the task of investigating the number of soldiers in the Prussian army killed accidentally by horse kicks; this experiment introduced the Poisson distribution to the field of reliability engineering.

## Definition

A discrete random variable X is said to have a Poisson distribution with parameter λ > 0 if, for k = 0, 1, 2, …, the probability mass function of X is given by

$\!f(k;\lambda )=\Pr(X{=}k)={\frac {\lambda ^{k}e^{-\lambda }}{k!}},$

where e is Euler's number and k! is the factorial of k.

The positive real number λ is equal to the expected value of X and also to its variance:

$\lambda =\operatorname {E} (X)=\operatorname {Var} (X).$

The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. How many such events will occur during a fixed time interval? Under the right circumstances, this is a random number with a Poisson distribution.
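As a rough numerical illustration of the definition, the following sketch evaluates the probability mass function and checks that the mean and variance both come out equal to λ. The function name `poisson_pmf`, the rate λ = 4 (borrowed from the mail example) and the truncation of the infinite sums at k = 100 are choices made here for illustration only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Pois(lam), evaluated in log space."""
    if k < 0:
        return 0.0
    # log f(k; lam) = k*log(lam) - lam - log(k!)
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


lam = 4.0              # illustrative rate: 4 letters per day
support = range(101)   # truncation point; the tail beyond 100 is negligible here

total = sum(poisson_pmf(k, lam) for k in support)
mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)

print(f"sum of pmf = {total:.6f}")
print(f"mean       = {mean:.6f}")
print(f"variance   = {variance:.6f}")
```

Working in log space avoids overflow in λ^k and k! for larger arguments; for small λ a direct evaluation of the formula would do equally well.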

## Properties

### Mean

The mean absolute deviation about the mean is

$\operatorname {E} |X-\lambda |=2\exp(-\lambda ){\frac {\lambda ^{\lfloor \lambda \rfloor +1}}{\lfloor \lambda \rfloor !}}.$

### Median

Bounds for the median (ν) of the distribution are known and are sharp:

$\lambda -\ln 2\leq \nu <\lambda +{\frac {1}{3}}.$

### Higher moments

The higher moments $m_{k}$ of the Poisson distribution about the origin are Touchard polynomials in λ:

$m_{k}=\sum _{i=1}^{k}\lambda ^{i}\left\{{\begin{matrix}k\\i\end{matrix}}\right\},$

where the {braces} denote Stirling numbers of the second kind. The coefficients of the polynomials have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, Dobiński's formula says that the nth moment equals the number of partitions of a set of size n.
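The moment formula can be checked numerically. The sketch below is only an illustration: the helper names `stirling2`, `poisson_moment` and `poisson_moment_direct` are made up here, and the direct sum is truncated at 200 terms, which is assumed to be ample for small λ.

```python
import math


def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def poisson_moment(k: int, lam):
    """k-th moment about the origin from the Stirling-number formula."""
    return sum(lam ** i * stirling2(k, i) for i in range(1, k + 1))


def poisson_moment_direct(k: int, lam: float, terms: int = 200) -> float:
    """k-th moment by summing j**k * P(X = j) over a truncated support."""
    return sum(
        j ** k * math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
        for j in range(terms)
    )


lam = 2.5
for k in range(1, 5):
    print(k, poisson_moment(k, lam), round(poisson_moment_direct(k, lam), 9))

# With lam = 1 the moments are the numbers of set partitions (Bell numbers).
print([poisson_moment(n, 1) for n in range(1, 6)])
```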
• Sums of Poisson-distributed random variables: if $X_{i}\sim {\mathrm {Pois} }(\lambda _{i})$ for $i=1,\dots ,n$ are independent, and $\lambda =\sum _{i=1}^{n}\lambda _{i}$ , then $Y=\left(\sum _{i=1}^{n}X_{i}\right)\sim {\mathrm {Pois} }(\lambda )$ . A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.
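The additivity property can be illustrated for two variables by convolving their probability mass functions and comparing the result with the Poisson distribution whose parameter is the sum of the two. This is a minimal sketch; the names `poisson_pmf` and `pmf_of_sum` and the rates 1.5 and 2.5 are arbitrary choices made for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def pmf_of_sum(k: int, lam1: float, lam2: float) -> float:
    """P(X1 + X2 = k) for independent Poisson X1, X2, by discrete convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


lam1, lam2 = 1.5, 2.5
for k in range(6):
    convolved = pmf_of_sum(k, lam1, lam2)
    direct = poisson_pmf(k, lam1 + lam2)
    print(f"k={k}: convolution={convolved:.6f}  Pois(lam1+lam2)={direct:.6f}")
```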

### Other properties

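The Kullback–Leibler divergence of $\mathrm {Pois} (\lambda _{0})$ from $\mathrm {Pois} (\lambda )$ is given by

$D_{\mathrm {KL} }(\lambda \|\lambda _{0})=\lambda _{0}-\lambda +\lambda \log {\frac {\lambda }{\lambda _{0}}}.$

Bounds for the tail probabilities of a Poisson random variable $X\sim \mathrm {Pois} (\lambda )$ can be derived using a Chernoff bound argument:

$P(X\geq x)\leq {\frac {e^{-\lambda }(e\lambda )^{x}}{x^{x}}},{\text{ for }}x>\lambda ,$

$P(X\leq x)\leq {\frac {e^{-\lambda }(e\lambda )^{x}}{x^{x}}},{\text{ for }}x<\lambda .$

The probability mass function f satisfies the recurrence relation

$\left\{(k+1)f(k+1)-\lambda f(k)=0,f(0)=e^{-\lambda }\right\}$

### Poisson Races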

Let $X\sim \mathrm {Pois} (\lambda )$ and $Y\sim \mathrm {Pois} (\mu )$ be independent random variables with $\lambda <\mu$ . Then

${\frac {e^{-({\sqrt {\mu }}-{\sqrt {\lambda }})^{2}}}{(\lambda +\mu )^{2}}}-{\frac {e^{-(\lambda +\mu )}}{2{\sqrt {\lambda \mu }}}}-{\frac {e^{-(\lambda +\mu )}}{4\lambda \mu }}\leq P(X-Y\geq 0)\leq e^{-({\sqrt {\mu }}-{\sqrt {\lambda }})^{2}}$

The upper bound is proved using a standard Chernoff bound.

## Related distributions

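If $X_{1}\sim {\mathrm {Pois} }(\lambda _{1})$ and $X_{2}\sim {\mathrm {Pois} }(\lambda _{2})$ are independent, then the distribution of $X_{1}$ conditional on $X_{1}+X_{2}$ is a binomial distribution. Specifically, given $X_{1}+X_{2}=k$ , $\!X_{1}\sim {\mathrm {Binom} }(k,\lambda _{1}/(\lambda _{1}+\lambda _{2}))$ .

More generally, if $X_{1},X_{2},\dots ,X_{n}$ are independent Poisson random variables with parameters $\lambda _{1},\lambda _{2},\dots ,\lambda _{n}$ , then, given $\sum _{j=1}^{n}X_{j}=k,$ $X_{i}\sim {\mathrm {Binom} }\left(k,{\frac {\lambda _{i}}{\sum _{j=1}^{n}\lambda _{j}}}\right)$ . In fact, $\{X_{i}\}\sim {\mathrm {Multinom} }\left(k,\left\{{\frac {\lambda _{i}}{\sum _{j=1}^{n}\lambda _{j}}}\right\}\right)$ .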
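
The Poisson distribution is a limiting case of the binomial distribution as the number of trials n grows and the success probability p shrinks with np held fixed, so for large n and small p

$F_{\mathrm {Binomial} }(k;n,p)\approx F_{\mathrm {Poisson} }(k;\lambda =np)\,$

For sufficiently large values of λ, the normal distribution with mean λ and variance λ is an approximation to the Poisson distribution:

$F_{\mathrm {Poisson} }(x;\lambda )\approx F_{\mathrm {normal} }(x;\mu =\lambda ,\sigma ^{2}=\lambda )\,$

Beyond the square-root transformation, other, slightly more complicated, variance-stabilizing transformations are available, one of which is the Anscombe transform. See Data transformation (statistics) for more general uses of transformations.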

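The cumulative distribution functions of the Poisson and chi-squared distributions are related in the following ways:

$F_{\text{Poisson}}(k;\lambda )=1-F_{\chi ^{2}}(2\lambda ;2(k+1))\quad \quad {\text{ integer }}k,$

and, taking the difference of this expression at k and k − 1,

$\Pr(X=k)=F_{\chi ^{2}}(2\lambda ;2k)-F_{\chi ^{2}}(2\lambda ;2(k+1)).$

## Occurrence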

Applications of the Poisson distribution can be found in many fields related to counting.

The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete properties (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include:

• The number of deaths per year in a given age group.
• The number of jumps in a stock price in a given time interval.
• Under an assumption of homogeneity, the number of times a web server is accessed per minute.
• The number of mutations in a given stretch of DNA after a certain amount of radiation.
• The proportion of cells that will be infected at a given multiplicity of infection.
• The arrival of photons on a pixel circuit at a given illumination and over a given time period.
• The targeting of V-1 flying bombs on London during World War II.

Gallagher in 1976 showed that the counts of prime numbers in short intervals obey a Poisson distribution provided a certain version of an unproved conjecture of Hardy and Littlewood is true.

### Law of rare events

Figure: Comparison of the Poisson distribution (black lines) and the binomial distribution with n=10 (red circles), n=20 (blue circles), n=1000 (green circles). All distributions have a mean of 5. The horizontal axis shows the number of events k. Notice that as n gets larger, the Poisson distribution becomes an increasingly better approximation for the binomial distribution with the same mean.

The rate of an event is related to the probability of an event occurring in some small subinterval (of time, space or otherwise). In the case of the Poisson distribution, one assumes that there exists a small enough subinterval for which the probability of an event occurring twice is "negligible". With this assumption one can derive the Poisson distribution from the binomial one, given only the expected number of total events in the whole interval. Let this total number be $\lambda$ . Divide the whole interval into $n$ subintervals $I_{1},\dots ,I_{n}$ of equal size, such that $n>\lambda$ (since we are only interested in very small portions of the interval this assumption is meaningful). This means that the expected number of events in an interval $I_{k}$ for each $k$ is equal to $\lambda /n$ . Now we assume that the occurrence of events in the whole interval can be seen as a sequence of $n$ Bernoulli trials, where the $i$-th trial corresponds to checking whether an event happens in the subinterval $I_{i}$ , which it does with probability $\lambda /n$ . The expected number of total events in $n$ such trials is $\lambda$ , the expected number of total events in the whole interval. Hence for each subdivision of the interval we have approximated the number of events by a binomial distribution of the form ${\textrm {B}}(n,\lambda /n)$ . As noted before, we want to consider only very small subintervals, so we take the limit as $n$ goes to infinity. In this limit the binomial distribution converges to the Poisson distribution by the Poisson limit theorem.
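The limiting argument can be made concrete with a small numerical experiment. The sketch below compares B(n, λ/n) with Pois(λ) for λ = 5 and n = 10, 20, 1000 and reports the largest pointwise difference between the two probability mass functions. The helper names and the cut-off `kmax = 40` are illustrative assumptions, not part of the derivation.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def max_pmf_gap(n: int, lam: float, kmax: int = 40) -> float:
    """Largest |B(n, lam/n) pmf - Pois(lam) pmf| over k = 0..kmax."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1)
    )


lam = 5.0
gaps = {n: max_pmf_gap(n, lam) for n in (10, 20, 1000)}
for n, gap in gaps.items():
    print(f"n={n:5d}  max |binomial - Poisson| = {gap:.6f}")
```

The gap shrinks as n grows, which is the behaviour the Poisson limit theorem predicts.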

In several of the above examples (such as the number of mutations in a given sequence of DNA) the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution, that is

$X\sim {\textrm {B}}(n,p).\,$

In such cases n is very large and p is very small (and so the expectation np is of intermediate magnitude). Then the distribution may be approximated by the less cumbersome Poisson distribution

$X\sim {\textrm {Pois}}(np).\,$

This approximation is sometimes known as the law of rare events, since each of the n individual Bernoulli events rarely occurs. The name may be misleading because the total count of success events in a Poisson process need not be rare if the parameter np is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution with the events appearing frequent to the operator, but they are rare from the point of view of the average member of the population who is very unlikely to make a call to that switchboard in that hour.

The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law of small numbers because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is a book by Ladislaus Bortkiewicz (Bortkevitch) about the Poisson distribution, published in 1898. Some have suggested that the Poisson distribution should have been called the Bortkiewicz distribution.

### Multi-dimensional Poisson process

The Poisson distribution arises as the distribution of counts of occurrences of events in (multidimensional) intervals in multidimensional Poisson processes in a directly equivalent way to the result for unidimensional processes. Thus, if D is any region of the multidimensional space for which |D|, the area or volume of the region, is finite, and if N(D) is the count of the number of events in D, then

$P(N(D)=k)={\frac {(\lambda |D|)^{k}e^{-\lambda |D|}}{k!}}.$

### Other applications in science

In a Poisson process, the number of observed occurrences fluctuates about its mean λ with a standard deviation $\sigma _{k}={\sqrt {\lambda }}$ . These fluctuations are called Poisson noise or (particularly in electronics) shot noise. The correlation of the mean and standard deviation in counting independent discrete occurrences is useful scientifically. By monitoring how the fluctuations vary with the mean signal, one can estimate the contribution of a single occurrence, even if that contribution is too small to be detected directly. For example, the charge e on an electron can be estimated by correlating the magnitude of an electric current with its shot noise. If N electrons pass a point in a given time t on the average, the mean current is $I=eN/t$ ; since the current fluctuations should be of the order $\sigma _{I}=e{\sqrt {N}}/t$ (i.e., the standard deviation of the Poisson process), the charge $e$ can be estimated from the ratio $t\sigma _{I}^{2}/I$ .

An everyday example is the graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in the number of reduced silver grains, not to the individual grains themselves. By correlating the graininess with the degree of enlargement, one can estimate the contribution of an individual grain (which is otherwise too small to be seen unaided). Many other molecular applications of Poisson noise have been developed, e.g., estimating the number density of receptor molecules in a cell membrane.

For a Poisson process with rate λ, the number of events $N_{t}$ observed in a time interval of length t satisfies

$\Pr(N_{t}=k)=f(k;\lambda t)={\frac {e^{-\lambda t}(\lambda t)^{k}}{k!}}.$

In Causal Set theory the discrete elements of spacetime follow a Poisson distribution in the volume.

## Generating Poisson-distributed random variables

A simple algorithm to generate random Poisson-distributed numbers (pseudo-random number sampling) has been given by Knuth (see References below):

    algorithm poisson random number (Knuth):
        init:
            Let L ← e^(−λ), k ← 0 and p ← 1.
        do:
            k ← k + 1.
            Generate uniform random number u in [0,1] and let p ← p × u.
        while p > L.
        return k − 1.

While simple, the complexity is linear in the returned value k, which is λ on average. There are many other algorithms to overcome this. Some are given in Ahrens & Dieter, see References below. Also, for large values of λ, there may be numerical stability issues because of the term $e^{-\lambda }$ . One solution for large values of λ is rejection sampling, another is to use a Gaussian approximation to the Poisson.

Inverse transform sampling is simple and efficient for small values of λ, and requires only one uniform random number u per sample. Cumulative probabilities are examined in turn until one exceeds u.

    algorithm Poisson generator based upon the inversion by sequential search:
        init:
            Let x ← 0, p ← e^(−λ), s ← p.
            Generate uniform random number u in [0,1].
        while u > s do:
            x ← x + 1.
            p ← p * λ / x.
            s ← s + p.
        return x.

"This algorithm ... requires expected time proportional to λ as λ→∞. For large λ, round-off errors proliferate, which provides us with another reason for avoiding large values of λ."

## Parameter estimation

### Maximum likelihood

Given a sample of n measured values $k_{i}=0,1,2,\dots$ , for i = 1, ..., n, we wish to estimate the value of the parameter λ of the Poisson population from which the sample was drawn. The maximum likelihood estimate is

${\widehat {\lambda }}_{\mathrm {MLE} }={\frac {1}{n}}\sum _{i=1}^{n}k_{i}.\!$

Since each observation has expectation λ, so does this sample mean. Therefore the maximum likelihood estimate is an unbiased estimator of λ. It is also an efficient estimator, i.e. its estimation variance achieves the Cramér–Rao lower bound (CRLB). Hence it is the minimum-variance unbiased estimator (MVUE). It can also be proved that the sum (and hence the sample mean, as it is a one-to-one function of the sum) is a complete and sufficient statistic for λ.
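To connect the estimator with the two sampling algorithms from the previous section, the following sketch draws samples with each of them and computes the sample mean as the maximum likelihood estimate. It is an illustration under stated assumptions rather than a reference implementation: the function names, the seed, the sample size and the rate λ = 4 are arbitrary, and the extra `p > 0.0` guard in the inversion loop is an addition made here to rule out a non-terminating loop caused by round-off.

```python
import math
import random


def poisson_knuth(lam: float, rng: random.Random) -> int:
    """Multiply uniforms until the product drops to e^(-lam) or below (Knuth)."""
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1


def poisson_inversion(lam: float, rng: random.Random) -> int:
    """Inversion by sequential search over the cumulative probabilities."""
    x = 0
    p = math.exp(-lam)   # P(X = 0)
    s = p                # running cumulative probability
    u = rng.random()
    # The p > 0.0 guard is not in the pseudocode; it stops the search if
    # round-off leaves the cumulative sum just below u.
    while u > s and p > 0.0:
        x += 1
        p *= lam / x
        s += p
    return x


def mle_rate(samples) -> float:
    """Maximum likelihood estimate of lam: the sample mean."""
    return sum(samples) / len(samples)


rng = random.Random(12345)   # fixed seed so the run is reproducible
true_lam = 4.0
n = 20000

knuth_samples = [poisson_knuth(true_lam, rng) for _ in range(n)]
inversion_samples = [poisson_inversion(true_lam, rng) for _ in range(n)]

print("MLE from Knuth samples:    ", mle_rate(knuth_samples))
print("MLE from inversion samples:", mle_rate(inversion_samples))
```

Both estimates should land close to the true rate, with a standard error of roughly $\sqrt{\lambda /n}$.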

To prove sufficiency we may use the factorization theorem. Consider partitioning the probability mass function of the joint Poisson distribution for the sample into two parts: one that depends solely on the sample $\mathbf {x}$ (called $h(\mathbf {x} )$ ) and one that depends on the parameter $\lambda$ and the sample $\mathbf {x}$ only through the function $T({\mathbf {x} })$ . Then $T({\mathbf {x} })$ is a sufficient statistic for $\lambda$ .

$P({\mathbf {x} })=\prod _{i=1}^{n}{\frac {\lambda ^{x_{i}}e^{-\lambda }}{x_{i}!}}={\frac {1}{\prod _{i=1}^{n}x_{i}!}}\times \lambda ^{\sum _{i=1}^{n}x_{i}}e^{-n\lambda }$

The first factor, $h(\mathbf {x} )$ , depends only on the sample, while the second depends on the sample only through $T({\mathbf {x} })=\sum _{i=1}^{n}x_{i}$ . Thus $T({\mathbf {x} })$ is sufficient.

To find the parameter λ that maximizes the probability function for the Poisson population, we can use the logarithm of the probability function:

${\begin{aligned}L(\lambda )&=\ln \prod _{i=1}^{n}f(k_{i}\mid \lambda )\\&=\sum _{i=1}^{n}\ln \!\left({\frac {e^{-\lambda }\lambda ^{k_{i}}}{k_{i}!}}\right)\\&=-n\lambda +\left(\sum _{i=1}^{n}k_{i}\right)\ln(\lambda )-\sum _{i=1}^{n}\ln(k_{i}!).\end{aligned}}$

We take the derivative of L with respect to λ and set it equal to zero:

${\frac {\mathrm {d} }{\mathrm {d} \lambda }}L(\lambda )=0\iff -n+\left(\sum _{i=1}^{n}k_{i}\right){\frac {1}{\lambda }}=0.\!$

Solving for λ gives a stationary point:

$\lambda ={\frac {\sum _{i=1}^{n}k_{i}}{n}}$

So λ is the average of the $k_{i}$ values. Obtaining the sign of the second derivative of L at the stationary point will determine what kind of extreme value λ is.

${\frac {\partial ^{2}L}{\partial \lambda ^{2}}}=-\lambda ^{-2}\sum _{i=1}^{n}k_{i}$

Evaluating the second derivative at the stationary point gives

${\frac {\partial ^{2}L}{\partial \lambda ^{2}}}=-{\frac {n^{2}}{\sum _{i=1}^{n}k_{i}}}$

which is the negative of n times the reciprocal of the average of the $k_{i}$ . This expression is negative when the average is positive. If this is satisfied, then the stationary point maximizes the probability function.

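To show completeness, note that $T({\mathbf {x} })=\sum _{i=1}^{n}x_{i}$ has a Poisson distribution with parameter nλ, so for any function g the condition that $E(g(T))=0$ for all λ reads

$E(g(T))=\sum _{t=0}^{\infty }g(t){\frac {(n\lambda )^{t}e^{-n\lambda }}{t!}}=0$

Since this power series in λ vanishes for every λ > 0, each g(t) must be zero, which shows that the statistic is complete.

### Confidence interval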

The confidence interval for the mean of a Poisson distribution can be expressed using the relationship between the cumulative distribution functions of the Poisson and chi-squared distributions. The chi-squared distribution is itself closely related to the gamma distribution, and this leads to an alternative expression. Given an observation k from a Poisson distribution with mean μ, a confidence interval for μ with confidence level 1 – α is

${\tfrac {1}{2}}\chi ^{2}(\alpha /2;2k)\leq \mu \leq {\tfrac {1}{2}}\chi ^{2}(1-\alpha /2;2k+2),$

or equivalently,

$F^{-1}(\alpha /2;k,1)\leq \mu \leq F^{-1}(1-\alpha /2;k+1,1),$

where $\chi ^{2}(p;n)$ is the quantile function (corresponding to a lower tail area p) of the chi-squared distribution with n degrees of freedom and $F^{-1}(p;n,1)$ is the quantile function of a Gamma distribution with shape parameter n and scale parameter 1. This interval is 'exact' in the sense that its coverage probability is never less than the nominal 1 – α.

When quantiles of the Gamma distribution are not available, an accurate approximation to this exact interval has been proposed (based on the Wilson–Hilferty transformation):

$k\left(1-{\frac {1}{9k}}-{\frac {z_{\alpha /2}}{3{\sqrt {k}}}}\right)^{3}\leq \mu \leq (k+1)\left(1-{\frac {1}{9(k+1)}}+{\frac {z_{\alpha /2}}{3{\sqrt {k+1}}}}\right)^{3},$

where $z_{\alpha /2}$ denotes the standard normal deviate with upper tail area α/2.

For application of these formulae in the same context as above (given a sample of n measured values $k_{i}$ , each drawn from a Poisson distribution with mean λ), one would set

$k=\sum _{i=1}^{n}k_{i},\!$

calculate an interval for μ = nλ, and then derive the interval for λ.
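The Wilson–Hilferty approximation needs only a standard normal quantile, so it can be sketched with the Python standard library. The function names `poisson_mean_interval` and `rate_interval` are hypothetical, and returning a lower limit of 0 when k = 0 is an assumption made here because the lower formula is undefined in that case.

```python
import math
from statistics import NormalDist


def poisson_mean_interval(k: int, alpha: float = 0.05):
    """Approximate 1 - alpha interval for mu from a single Poisson count k."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal deviate
    if k == 0:
        lower = 0.0   # assumption: take 0 as the lower limit when k = 0
    else:
        lower = k * (1 - 1 / (9 * k) - z / (3 * math.sqrt(k))) ** 3
    upper = (k + 1) * (1 - 1 / (9 * (k + 1)) + z / (3 * math.sqrt(k + 1))) ** 3
    return lower, upper


def rate_interval(counts, alpha: float = 0.05):
    """Interval for lam from n counts: interval for mu = n*lam, divided by n."""
    n = len(counts)
    lower, upper = poisson_mean_interval(sum(counts), alpha)
    return lower / n, upper / n


low, high = poisson_mean_interval(10)
print(f"k = 10: approximately ({low:.3f}, {high:.3f})")

low, high = rate_interval([3, 5, 2])
print(f"counts 3, 5, 2: lam in approximately ({low:.3f}, {high:.3f})")
```

The second call pools the three counts into k = 10, computes the interval for μ = 3λ and rescales it, following the procedure described above.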

### Bayesian inference

In Bayesian inference, the conjugate prior for the rate parameter λ of the Poisson distribution is the gamma distribution. Let

$\lambda \sim {\mathrm {Gamma} }(\alpha ,\beta )\!$

denote that λ is distributed according to the gamma density g parameterized in terms of a shape parameter α and an inverse scale parameter β:

$g(\lambda \mid \alpha ,\beta )={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}\;\lambda ^{\alpha -1}\;e^{-\beta \,\lambda }\qquad {\text{ for }}\lambda >0\,\!.$

Then, given the same sample of n measured values $k_{i}$ as before, and a prior of Gamma(α, β), the posterior distribution is

$\lambda \sim {\mathrm {Gamma} }\left(\alpha +\sum _{i=1}^{n}k_{i},\beta +n\right).\!$

The posterior mean E[λ] approaches the maximum likelihood estimate ${\widehat {\lambda }}_{\mathrm {MLE} }$ in the limit as $\alpha \to 0,\ \beta \to 0$ .

The posterior predictive distribution for a single additional observation is a negative binomial distribution, sometimes called a Gamma–Poisson distribution.
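The conjugate update amounts to two additions, which the sketch below spells out together with the negative binomial predictive probabilities. The function names, the example counts and the Gamma(2, 1) prior are illustrative assumptions; the predictive formula used is the standard Gamma–Poisson mixture with shape α and rate β.

```python
import math


def gamma_posterior(alpha: float, beta: float, counts):
    """Posterior (shape, rate) after observing Poisson counts under Gamma(alpha, beta)."""
    return alpha + sum(counts), beta + len(counts)


def predictive_pmf(k: int, alpha: float, beta: float) -> float:
    """P(next count = k) when lam ~ Gamma(alpha, beta): a negative binomial pmf."""
    log_p = (
        math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(beta / (beta + 1))
        - k * math.log(beta + 1)
    )
    return math.exp(log_p)


counts = [3, 5, 4, 2, 6]                   # hypothetical data, sample mean 4
post_alpha, post_beta = gamma_posterior(2.0, 1.0, counts)
print("posterior:", (post_alpha, post_beta))
print("posterior mean:", post_alpha / post_beta)

# A nearly flat prior gives a posterior mean close to the MLE (the sample mean).
weak_alpha, weak_beta = gamma_posterior(1e-9, 1e-9, counts)
print("posterior mean, weak prior:", weak_alpha / weak_beta)

print("P(next count = 4):", predictive_pmf(4, post_alpha, post_beta))
```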

### Simultaneous estimation of multiple Poisson means

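Suppose $X_{1},\dots ,X_{p}$ are independent Poisson random variables with means $\lambda _{1},\dots ,\lambda _{p}$ that are to be estimated simultaneously. A family of shrinkage estimators, indexed by constants b and c, takes the form

${\hat {\lambda }}_{i}=\left(1-{\frac {c}{b+\sum _{j=1}^{p}X_{j}}}\right)X_{i},\qquad i=1,\dots ,p.$

## Bivariate Poisson distribution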

This distribution has been extended to the bivariate case. The generating function for this distribution is

$g(u,v)=\exp[(\theta _{1}-\theta _{12})(u-1)+(\theta _{2}-\theta _{12})(v-1)+\theta _{12}(uv-1)]$

with

$\theta _{1},\theta _{2}>\theta _{12}>0\,$

The marginal distributions are Poisson(θ1) and Poisson(θ2). The covariance is $\theta _{12}$ , so the correlation coefficient $\rho =\theta _{12}/{\sqrt {\theta _{1}\theta _{2}}}$ is limited to the range

$0\leq \rho \leq \min \left\{{\sqrt {\frac {\theta _{1}}{\theta _{2}}}},{\sqrt {\frac {\theta _{2}}{\theta _{1}}}}\right\}$

A simple way to generate a bivariate Poisson pair $X_{1},X_{2}$ is to take three independent Poisson random variables $Y_{1},Y_{2},Y_{3}$ with means $\lambda _{1},\lambda _{2},\lambda _{3}$ and then set $X_{1}=Y_{1}+Y_{3},X_{2}=Y_{2}+Y_{3}$ (so that $\theta _{1}=\lambda _{1}+\lambda _{3}$ , $\theta _{2}=\lambda _{2}+\lambda _{3}$ and $\theta _{12}=\lambda _{3}$ ). The probability function of the bivariate Poisson distribution is

${\begin{aligned}&\Pr(X_{1}=k_{1},X_{2}=k_{2})\\={}&\exp \left(-\lambda _{1}-\lambda _{2}-\lambda _{3}\right){\frac {\lambda _{1}^{k_{1}}}{k_{1}!}}{\frac {\lambda _{2}^{k_{2}}}{k_{2}!}}\sum _{k=0}^{\min(k_{1},k_{2})}{\binom {k_{1}}{k}}{\binom {k_{2}}{k}}k!\left({\frac {\lambda _{3}}{\lambda _{1}\lambda _{2}}}\right)^{k}\end{aligned}}$
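The joint probability function can be evaluated directly and checked against the properties stated above. In this sketch the name `bivariate_poisson_pmf`, the means (1.0, 2.0, 0.5) and the truncation of the grid at 40 are illustrative assumptions; the checks are that the probabilities sum to 1, that the marginal of $X_{1}$ is Poisson with mean $\lambda _{1}+\lambda _{3}$ , and that the covariance equals $\lambda _{3}$ .

```python
import math


def bivariate_poisson_pmf(k1: int, k2: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X1 = k1, X2 = k2) for X1 = Y1 + Y3, X2 = Y2 + Y3 with independent Poisson Y's."""
    prefactor = (
        math.exp(-(lam1 + lam2 + lam3))
        * lam1 ** k1 / math.factorial(k1)
        * lam2 ** k2 / math.factorial(k2)
    )
    ratio = lam3 / (lam1 * lam2)
    series = sum(
        math.comb(k1, k) * math.comb(k2, k) * math.factorial(k) * ratio ** k
        for k in range(min(k1, k2) + 1)
    )
    return prefactor * series


lam1, lam2, lam3 = 1.0, 2.0, 0.5
grid = range(41)   # truncation; the mass beyond 40 is negligible for these means

joint = {
    (a, b): bivariate_poisson_pmf(a, b, lam1, lam2, lam3) for a in grid for b in grid
}
total = sum(joint.values())
mean1 = sum(a * p for (a, b), p in joint.items())
mean2 = sum(b * p for (a, b), p in joint.items())
covariance = sum(a * b * p for (a, b), p in joint.items()) - mean1 * mean2
marginal_x1_at_2 = sum(joint[(2, b)] for b in grid)

print(f"total probability = {total:.6f}")
print(f"E[X1] = {mean1:.6f}, E[X2] = {mean2:.6f}")
print(f"Cov(X1, X2) = {covariance:.6f}")
print(f"P(X1 = 2) = {marginal_x1_at_2:.6f}")
```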