# Nested sampling algorithm


The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling.

## Background

Bayes' theorem can be applied to a pair of competing models $M_1$ and $M_2$ for data $D$, one of which may be true (though which one is unknown) but which cannot both be true simultaneously, as follows:

$$
\begin{aligned}
P(M_1|D) &= \frac{P(D|M_1)\,P(M_1)}{P(D)} \\
&= \frac{P(D|M_1)\,P(M_1)}{P(D|M_1)\,P(M_1) + P(D|M_2)\,P(M_2)} \\
&= \frac{1}{1 + \frac{P(D|M_2)}{P(D|M_1)}\,\frac{P(M_2)}{P(M_1)}}
\end{aligned}
$$

Given no a priori information in favor of $M_1$ or $M_2$, it is reasonable to assign prior probabilities $P(M_1) = P(M_2) = 1/2$, so that $P(M_2)/P(M_1) = 1$. The remaining ratio $P(D|M_2)/P(D|M_1)$ is not so easy to evaluate, since in general it requires marginalizing nuisance parameters. Generally, $M_1$ has a collection of parameters that can be lumped together and called $\theta$, and $M_2$ has its own vector of parameters that may differ in dimensionality but is still denoted $\theta$. The marginalization for $M_1$ is
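With equal priors, the posterior model probability above reduces to a function of the evidence ratio alone. As a minimal illustration, the following hypothetical helper (the name and signature are assumptions, not part of any library) computes $P(M_1|D)$ from the two log-evidences, working in log space to avoid underflow when the evidences are tiny:

```python
import math

def posterior_model_prob(log_Z1, log_Z2, prior1=0.5, prior2=0.5):
    """Posterior probability of model M1, given log-evidences log P(D|M_i)."""
    # Log of the Bayes factor P(D|M2)/P(D|M1).
    log_bayes_factor = log_Z2 - log_Z1
    # P(M1|D) = 1 / (1 + BF * P(M2)/P(M1)), as in the derivation above.
    return 1.0 / (1.0 + math.exp(log_bayes_factor) * (prior2 / prior1))

# Equal priors and equal evidences give probability exactly 1/2.
p = posterior_model_prob(-10.0, -10.0)
```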

$$P(D|M_1) = \int P(D|\theta, M_1)\,P(\theta|M_1)\,d\theta$$

and likewise for $M_2$. This integral is often analytically intractable, in which case a numerical approximation must be employed. The nested sampling algorithm was developed by John Skilling specifically to approximate these marginalization integrals, and it has the added benefit of generating samples from the posterior distribution $P(\theta|D, M_1)$. It is an alternative to methods from the Bayesian literature such as bridge sampling and defensive importance sampling.

A simple version of the nested sampling algorithm is as follows, with the evidence accumulator $Z$ initialized to zero and $X_0 = 1$:

```
Start with N points θ_1, ..., θ_N sampled from the prior.
Z := 0;  X_0 := 1;
for i = 1 to j do            % the number of iterations j is chosen by guesswork
    L_i := min(current likelihood values of the points);
    X_i := exp(-i/N);
    w_i := X_{i-1} - X_i;
    Z := Z + L_i * w_i;
    Save the point with least likelihood as a sample point with weight w_i.
    Update the point with least likelihood with some Markov chain Monte Carlo
    steps according to the prior, accepting only steps that keep the
    likelihood above L_i.
end
return Z;
```
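The loop above can be sketched as a runnable program. The following is a toy Python implementation on an assumed test problem (uniform prior on $[-5, 5]$, standard-normal likelihood, for which the true evidence is about $0.1$); the random-walk replacement step, step size, and iteration counts are illustrative simplifications of the MCMC update described above, not Skilling's full scheme:

```python
import math
import random

def nested_sampling(loglike, prior_sample, prior_bounds,
                    n_live=100, n_iter=1000, seed=0):
    """Toy nested-sampling sketch returning an estimate of the evidence Z."""
    rng = random.Random(seed)
    live = [prior_sample(rng) for _ in range(n_live)]
    logL = [loglike(t) for t in live]
    Z, X_prev = 0.0, 1.0
    lo, hi = prior_bounds
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: logL[k])
        L_i = math.exp(logL[worst])
        X_i = math.exp(-i / n_live)        # deterministic prior-mass shrinkage
        w_i = X_prev - X_i                 # w_i = X_{i-1} - X_i
        Z += L_i * w_i
        X_prev = X_i
        # Replace the worst point: random-walk steps from a random survivor,
        # accepting only moves that keep the likelihood above the threshold.
        threshold = logL[worst]
        start = rng.randrange(n_live)
        while start == worst:
            start = rng.randrange(n_live)
        theta = live[start]
        for _ in range(20):
            prop = theta + rng.gauss(0.0, 0.5)
            if lo <= prop <= hi and loglike(prop) > threshold:
                theta = prop
        live[worst] = theta
        logL[worst] = loglike(theta)
    # Credit the remaining prior mass to the surviving live points.
    Z += X_prev * sum(math.exp(l) for l in logL) / n_live
    return Z

# Toy model: standard-normal likelihood, uniform prior on [-5, 5].
ll = lambda t: -0.5 * t * t - 0.5 * math.log(2 * math.pi)
Z = nested_sampling(ll, lambda r: r.uniform(-5, 5), (-5, 5))
```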

$$
\begin{aligned}
P(D|M) &= \int P(D|\theta, M)\,P(\theta|M)\,d\theta \\
&= \int P(D|\theta, M)\,dP(\theta|M)
\end{aligned}
$$

The idea is to chop up the range of $f(\theta) = P(D|\theta, M)$ and estimate, for each interval $[f(\theta_{i-1}), f(\theta_i)]$, how likely it is a priori that a randomly chosen $\theta$ would map into this interval. This can be thought of as a Bayesian way to numerically implement Lebesgue integration.
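This Lebesgue view can be checked by brute force on a toy model: sort likelihood values drawn from the prior so that the fraction of samples above a level approximates the enclosed prior mass $X(\lambda)$, then sum the areas $X(\lambda)\,d\lambda$. The toy choices below (uniform prior on $[-5, 5]$, unit Gaussian likelihood, true evidence $\approx 0.1$) are assumptions for illustration:

```python
import math
import random

rng = random.Random(1)
N = 200_000
loglike = lambda t: -0.5 * t * t - 0.5 * math.log(2 * math.pi)

# Draw from the uniform prior on [-5, 5]; sort likelihoods, largest first.
L = sorted((math.exp(loglike(rng.uniform(-5, 5))) for _ in range(N)),
           reverse=True)

# Empirically, X = (k+1)/N of the prior mass has likelihood above L[k].
# The evidence is the area under X(lambda):  Z = integral_0^Lmax X dlambda,
# accumulated here interval by interval (the Lebesgue sum).
Z = sum((k + 1) / N * (L[k] - (L[k + 1] if k + 1 < N else 0.0))
        for k in range(N))
```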

## Applications

Since nested sampling was proposed in 2004, it has been used in multiple settings within the field of astronomy. One paper suggested using nested sampling for cosmological model selection and object detection, as it "uniquely combines accuracy, general applicability and computational feasibility." A refinement of the nested sampling algorithm to handle multimodal posteriors has also been suggested as a means of detecting astronomical objects in existing datasets.