# Particle filter

Particle filters, or sequential Monte Carlo (SMC) methods, are a set of on-line posterior density estimation algorithms that estimate the posterior density of the state space by directly implementing the Bayesian recursion equations. The term "sequential Monte Carlo" was first coined by Liu and Chen (1998). SMC methods take an ensemble-based sampling approach: samples from the distribution are represented by a set of particles, and each particle carries a weight that represents the probability of that particle being drawn from the probability density function. The state-space model can be non-linear, and the initial-state and noise distributions can take any form required, so SMC methods provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to high-dimensional systems.

Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms; however, it can be mitigated by including a resampling step before the weights become too uneven. In the resampling step, particles with negligible weights are replaced by new particles in the proximity of particles with higher weights.

## History

The first traces of particle filters date back to the 1950s: the "poor man's Monte Carlo" proposed by Hammersley et al. in 1954 contained hints of the SMC methods used today. Similar attempts were made in the control community in the 1970s. However, it was in 1993 that Gordon et al. published their seminal work "Novel approach to nonlinear/non-Gaussian Bayesian state estimation", which provided the first true implementation of the SMC methods used today. The authors named their algorithm the bootstrap filter and demonstrated that, compared to other filtering methods, it requires no assumptions about the state space or the noise of the system.

## Objective

The objective of a particle filter is to estimate the posterior density of the state variables given the observation variables. The particle filter is designed for a hidden Markov model, in which the system consists of hidden and observable variables. The observable variables (observation process) are related to the hidden variables (state process) by a known functional form. Similarly, the dynamical system describing the evolution of the state variables is known probabilistically.

A generic particle filter estimates the posterior distribution of the hidden states using the observation measurement process. Consider the state-space model shown in the diagram (Figure 2).

Figure 2. The State-Space (The Hidden states given by the vector x, and the observation states given by the vector y)

The objective of the particle filter is to estimate the values of the hidden states x, given the values of the observation process y.

The particle filter aims to estimate the sequence of hidden parameters, xk for k = 0,1,2,3,…, based only on the observed data yk for k = 0,1,2,3,…. All Bayesian estimates of xk follow from the posterior distribution p(xk | y0,y1,…,yk). In contrast, the MCMC or importance sampling approach would model the full posterior p(x0,x1,…,xk | y0,y1,…,yk).

## Model

Particle methods assume ${\displaystyle x_{k}}$ and the observations ${\displaystyle y_{k}}$ can be modeled in this form:

${\displaystyle x_{0},x_{1},\ldots }$ is a first-order Markov process such that

${\displaystyle x_{k}|x_{k-1}\sim p_{x_{k}|x_{k-1}}(x|x_{k-1})}$

and with an initial distribution ${\displaystyle p(x_{0})}$.

In other words, each ${\displaystyle y_{k}}$ only depends on ${\displaystyle x_{k}}$. This conditional distribution for ${\displaystyle y_{k}}$ is written as
${\displaystyle y_{k}|x_{k}\sim p_{y|x}(y|x_{k})}$

An example system with these properties is:

${\displaystyle x_{k}=g(x_{k-1})+w_{k}\,}$
${\displaystyle y_{k}=h(x_{k})+v_{k}\,}$

where both ${\displaystyle w_{k}}$ and ${\displaystyle v_{k}}$ are mutually independent and identically distributed sequences with known probability density functions, and ${\displaystyle g(\cdot )}$ and ${\displaystyle h(\cdot )}$ are known functions. These two equations can be viewed as state-space equations and look similar to the state-space equations for the Kalman filter. If the functions ${\displaystyle g(\cdot )}$ and ${\displaystyle h(\cdot )}$ are linear, and if both ${\displaystyle w_{k}}$ and ${\displaystyle v_{k}}$ are Gaussian, the Kalman filter finds the exact Bayesian filtering distribution. If not, Kalman-filter-based methods are a first-order approximation (EKF) or a second-order approximation (UKF in general, but if the probability distribution is Gaussian a third-order approximation is possible). Particle filters are also an approximation, but with enough particles they can be much more accurate.
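As a concrete sketch of such a system, the following simulates a scalar non-linear model of this form; the particular choices of ${\displaystyle g}$, ${\displaystyle h}$, and the Gaussian noise scales are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) nonlinear transition and observation functions:
def g(x):
    return 0.5 * x + 25 * x / (1 + x**2)  # state transition g(.)

def h(x):
    return x**2 / 20                      # observation map h(.)

def simulate(T, q=1.0, r=1.0):
    """Draw a trajectory x_0..x_{T-1} and observations y_0..y_{T-1}."""
    x = np.empty(T)
    y = np.empty(T)
    x[0] = rng.normal(0.0, 1.0)                 # initial distribution p(x_0)
    y[0] = h(x[0]) + rng.normal(0.0, r)
    for k in range(1, T):
        x[k] = g(x[k - 1]) + rng.normal(0.0, q)  # x_k = g(x_{k-1}) + w_k
        y[k] = h(x[k]) + rng.normal(0.0, r)      # y_k = h(x_k) + v_k
    return x, y

x, y = simulate(50)
```

Because ${\displaystyle g}$ and ${\displaystyle h}$ above are non-linear, a Kalman filter would only approximate the filtering distribution for this system, while a particle filter can approximate it to arbitrary accuracy as the number of particles grows.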

## Monte Carlo approximation

Particle methods, like all sampling-based approaches (e.g., MCMC), generate a set of samples that approximate the filtering distribution ${\displaystyle p(x_{k}|y_{0},\dots ,y_{k})}$. For example, we may have ${\displaystyle N}$ samples from the approximate posterior distribution of ${\displaystyle x_{k}}$, where the samples are labeled with superscripts as ${\displaystyle x_{k}^{(1)},x_{k}^{(2)},\ldots ,x_{k}^{(N)}}$. Then, expectations with respect to the filtering distribution are approximated by

${\displaystyle \int f(x_{k})p(x_{k}|y_{0},\dots ,y_{k})\,dx_{k}\approx \sum _{L=1}^{N}f\left(x_{k}^{(L)}\right){\frac {1}{N}}}$

and, in the usual Monte Carlo way, different choices of ${\displaystyle f(\cdot )}$ give all the moments etc. of the distribution up to some degree of approximation.
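A minimal numerical illustration of this sample-mean approximation; the synthetic Gaussian draws below merely stand in for the ${\displaystyle N}$ posterior samples (an assumption for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these N draws are samples x_k^(1..N) from p(x_k | y_0..y_k);
# here they are faked as N(2, 1) so the true expectation is known.
N = 100_000
samples = rng.normal(2.0, 1.0, size=N)

# E[f(x_k)] ~= sum_L f(x_k^(L)) * (1/N); with f(x) = x this estimates the mean.
f = lambda s: s
estimate = np.sum(f(samples)) / N   # should be close to the true mean 2.0
```

Taking ${\displaystyle f(x)=x}$ recovers the posterior mean; ${\displaystyle f(x)=x^{2}}$ would give the second moment, and so on.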

## Sequential importance resampling (SIR)

Sequential importance resampling (SIR), the original particle filtering algorithm (Gordon et al. 1993), approximates the filtering distribution ${\displaystyle p(x_{k}|y_{0},\ldots ,y_{k})}$ by a weighted set of P particles

${\displaystyle \{(w_{k}^{(L)},x_{k}^{(L)})~:~L\in \{1,\ldots ,P\}\}.}$

The importance weights ${\displaystyle w_{k}^{(L)}}$ are approximations to the relative posterior probabilities (or densities) of the particles such that ${\displaystyle \sum _{L=1}^{P}w_{k}^{(L)}=1}$.

SIR is a sequential (i.e., recursive) version of importance sampling. As in importance sampling, the expectation of a function ${\displaystyle f(\cdot )}$ can be approximated as a weighted average

${\displaystyle \int f(x_{k})p(x_{k}|y_{0},\dots ,y_{k})dx_{k}\approx \sum _{L=1}^{P}w_{k}^{(L)}f(x_{k}^{(L)}).}$

For a finite set of particles, the algorithm performance is dependent on the choice of the proposal distribution

${\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})\,}$.

The optimal proposal distribution is given as the target distribution

${\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})=p(x_{k}|x_{k-1},y_{k}).\,}$

However, the transition prior probability distribution is often used as importance function, since it is easier to draw particles (or samples) and perform subsequent importance weight calculations:

${\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})=p(x_{k}|x_{k-1}).\,}$

Sequential importance resampling (SIR) filters that use the transition prior probability distribution as the importance function are commonly known as the bootstrap filter and the condensation algorithm.

Resampling is used to avoid the problem of degeneracy of the algorithm, that is, avoiding the situation that all but one of the importance weights are close to zero. The performance of the algorithm can be also affected by proper choice of resampling method. The stratified sampling proposed by Kitagawa (1996) is optimal in terms of variance.
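A sketch of the stratified resampling scheme mentioned above, assuming normalized weights; the function returns the indices of the surviving particles (one uniform draw per stratum, mapped through the inverse CDF of the weights):

```python
import numpy as np

def stratified_resample(weights, rng):
    """Stratified resampling (Kitagawa 1996): draw one uniform per stratum
    [i/P, (i+1)/P) and look each position up in the weight CDF."""
    P = len(weights)
    positions = (np.arange(P) + rng.random(P)) / P   # one point per stratum
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                             # guard against rounding
    return np.searchsorted(cumulative, positions)    # inverse-CDF lookup

rng = np.random.default_rng(2)
w = np.array([0.1, 0.2, 0.3, 0.4])
idx = stratified_resample(w, rng)
# High-weight particles are duplicated roughly in proportion to their weights.
```

Compared with plain multinomial resampling, stratifying the uniforms guarantees each particle is copied either ⌊Pw⌋ or ⌈Pw⌉ times, which is why this scheme has lower variance.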

A single step of sequential importance resampling is as follows:

1) For ${\displaystyle L=1,\ldots ,P}$ draw samples from the proposal distribution
${\displaystyle x_{k}^{(L)}\sim \pi (x_{k}|x_{0:k-1}^{(L)},y_{0:k})}$
2) For ${\displaystyle L=1,\ldots ,P}$ update the importance weights up to a normalizing constant:
${\displaystyle {\hat {w}}_{k}^{(L)}=w_{k-1}^{(L)}{\frac {p(y_{k}|x_{k}^{(L)})p(x_{k}^{(L)}|x_{k-1}^{(L)})}{\pi (x_{k}^{(L)}|x_{0:k-1}^{(L)},y_{0:k})}}.}$
Note that when we use the transition prior probability distribution as the importance function, ${\displaystyle \pi (x_{k}^{(L)}|x_{0:k-1}^{(L)},y_{0:k})=p(x_{k}^{(L)}|x_{k-1}^{(L)})}$, this simplifies to the following:
${\displaystyle {\hat {w}}_{k}^{(L)}=w_{k-1}^{(L)}p(y_{k}|x_{k}^{(L)}),}$
3) For ${\displaystyle L=1,\ldots ,P}$ compute the normalized importance weights:
${\displaystyle w_{k}^{(L)}={\frac {{\hat {w}}_{k}^{(L)}}{\sum _{J=1}^{P}{\hat {w}}_{k}^{(J)}}}}$
4) Compute an estimate of the effective number of particles as
${\displaystyle {\hat {N}}_{\mathit {eff}}={\frac {1}{\sum _{L=1}^{P}\left(w_{k}^{(L)}\right)^{2}}}}$
5) If the effective number of particles is less than a given threshold, ${\displaystyle {\hat {N}}_{\mathit {eff}}<N_{thr}}$, then perform resampling:
a) Draw ${\displaystyle P}$ particles from the current particle set with probabilities proportional to their weights. Replace the current particle set with this new one.
b) For ${\displaystyle L=1,\ldots ,P}$ set ${\displaystyle w_{k}^{(L)}=1/P.}$
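The five steps above can be sketched as a single bootstrap SIR update. The linear-Gaussian transition and observation model, the noise scales, and the resampling threshold below are illustrative assumptions, and multinomial resampling is used for brevity (stratified resampling would be the lower-variance choice):

```python
import numpy as np

rng = np.random.default_rng(3)

def sir_step(particles, weights, y_k, obs_std=1.0, resample_frac=0.5):
    """One bootstrap SIR update under an assumed model:
    x_k = 0.9 x_{k-1} + w_k with w_k ~ N(0,1), y_k = x_k + v_k."""
    P = len(particles)
    # 1) Propose from the transition prior p(x_k | x_{k-1}).
    particles = 0.9 * particles + rng.normal(0.0, 1.0, size=P)
    # 2) With the transition prior as proposal, the unnormalized update
    #    reduces to w_hat = w_{k-1} * p(y_k | x_k).
    likelihood = np.exp(-0.5 * ((y_k - particles) / obs_std) ** 2)
    w_hat = weights * likelihood
    # 3) Normalize the importance weights.
    weights = w_hat / np.sum(w_hat)
    # 4) Effective number of particles.
    n_eff = 1.0 / np.sum(weights ** 2)
    # 5) Resample if N_eff falls below the threshold, then reset weights.
    if n_eff < resample_frac * P:
        idx = rng.choice(P, size=P, p=weights)
        particles = particles[idx]
        weights = np.full(P, 1.0 / P)
    return particles, weights

particles = rng.normal(0.0, 1.0, size=500)
weights = np.full(500, 1.0 / 500)
particles, weights = sir_step(particles, weights, y_k=0.3)
```

Running `sir_step` once per observation, and reading off the weighted mean `np.sum(weights * particles)` after each call, yields the filtering estimate of the hidden state.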

The term Sampling Importance Resampling is also sometimes used when referring to SIR filters.

## Sequential importance sampling (SIS)

Sequential importance sampling (SIS) is the same as sequential importance resampling, but without the resampling stage.

## "Direct version" algorithm

The "direct version" algorithm is rather simple (compared to other particle filtering algorithms) and uses composition and rejection. To generate a single sample ${\displaystyle x}$ at ${\displaystyle k}$ from ${\displaystyle p_{x_{k}|y_{1:k}}(x|y_{1:k})}$:

1) Set n=0 (This will count the number of particles generated so far)
2) Uniformly choose an index L from the range ${\displaystyle \{1,...,P\}}$
3) Generate a test ${\displaystyle {\hat {x}}}$ from the distribution ${\displaystyle p_{x_{k}|x_{k-1}}(x|x_{k-1|k-1}^{(L)})}$
4) Generate the probability of ${\displaystyle {\hat {y}}}$ using ${\displaystyle {\hat {x}}}$ from ${\displaystyle p_{y|x}(y_{k}|{\hat {x}})}$ where ${\displaystyle y_{k}}$ is the measured value
5) Generate another uniform u from ${\displaystyle [0,m_{k}]}$ where ${\displaystyle m_{k}=\sup _{x}p_{y|x}(y_{k}|x)}$
6) Compare u and ${\displaystyle p\left({\hat {y}}\right)}$
6a) If u is larger, then repeat from step 2
6b) If u is smaller, then save ${\displaystyle {\hat {x}}}$ as ${\displaystyle x_{k|k}^{(p)}}$ and increment n
7) If n == P then quit

The goal is to generate P "particles" at ${\displaystyle k}$ using only the particles from ${\displaystyle k-1}$. This requires that a Markov equation can be written (and computed) to generate a ${\displaystyle x_{k}}$ based only upon ${\displaystyle x_{k-1}}$. This algorithm uses composition of the P particles from ${\displaystyle k-1}$ to generate a particle at ${\displaystyle k}$ and repeats (steps 2–6) until P particles are generated at ${\displaystyle k}$.

This can be more easily visualized if ${\displaystyle x}$ is viewed as a two-dimensional array. One dimension is ${\displaystyle k}$ and the other dimension is the particle number. For example, ${\displaystyle x(k,L)}$ would be the Lth particle at ${\displaystyle k}$ and can also be written ${\displaystyle x_{k}^{(L)}}$ (as done above in the algorithm). Step 3 generates a potential ${\displaystyle x_{k}}$ based on a randomly chosen particle (${\displaystyle x_{k-1}^{(L)}}$) at time ${\displaystyle k-1}$ and rejects or accepts it in step 6. In other words, the ${\displaystyle x_{k}}$ values are generated using the previously generated ${\displaystyle x_{k-1}}$.
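A sketch of the composition/rejection loop under an assumed linear-Gaussian model; the transition `0.9 * x + noise` and unit observation noise are illustrative, not from the text. For a Gaussian likelihood the supremum ${\displaystyle m_{k}}$ is attained at ${\displaystyle x={y_{k}}}$ and equals the Gaussian normalizing constant:

```python
import numpy as np

rng = np.random.default_rng(4)

def direct_version_step(prev_particles, y_k, obs_std=1.0):
    """Generate P new particles at time k from the P particles at k-1
    by composition (random index) and rejection (accept w.p. p(y_k|x)/m_k)."""
    P = len(prev_particles)
    # m_k = sup_x p(y|x): for N(y; x, obs_std^2) this is the density peak.
    m_k = 1.0 / (obs_std * np.sqrt(2 * np.pi))
    new_particles = np.empty(P)
    n = 0                                              # step 1: counter
    while n < P:                                       # step 7: stop at P
        L = rng.integers(P)                            # step 2: random index
        x_hat = 0.9 * prev_particles[L] + rng.normal() # step 3: propose x_hat
        p_y = m_k * np.exp(-0.5 * ((y_k - x_hat) / obs_std) ** 2)  # step 4
        u = rng.uniform(0.0, m_k)                      # step 5: u ~ U[0, m_k]
        if u < p_y:                                    # step 6: accept/reject
            new_particles[n] = x_hat
            n += 1
    return new_particles

prev = rng.normal(0.0, 1.0, size=200)
new = direct_version_step(prev, y_k=0.5)
```

Note the accepted particles are unweighted: the rejection step replaces the importance-weight update of SIR, at the cost of an unbounded (though finite in expectation) number of proposals per accepted particle.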

