Sample entropy

Sample entropy (SampEn) is a modification of approximate entropy (ApEn) that is used extensively to assess the complexity of physiological time-series signals and thereby to help diagnose diseased states.[1] SampEn has two advantages over ApEn: data length independence and a relatively trouble-free implementation.

There is a multiscale version of SampEn as well, suggested by Costa and others.[2]

Definition

Like approximate entropy (ApEn), sample entropy (SampEn) is a measure of complexity.[1] Unlike ApEn, however, it does not count self-matches. For a given embedding dimension ${\displaystyle m}$, tolerance ${\displaystyle r}$ and number of data points ${\displaystyle N}$, SampEn is the negative logarithm of the probability that if two sets of simultaneous data points of length ${\displaystyle m}$ have distance ${\displaystyle <r}$ then two sets of simultaneous data points of length ${\displaystyle m+1}$ also have distance ${\displaystyle <r}$. It is denoted by ${\displaystyle SampEn(m,r,N)}$ (or by ${\displaystyle SampEn(m,r,\tau ,N)}$ when the sampling time ${\displaystyle \tau }$ is included).

Now assume we have a time-series data set of length ${\displaystyle N}$, ${\displaystyle \{x_{1},x_{2},x_{3},...,x_{N}\}}$, with a constant time interval ${\displaystyle \tau }$. We define a template vector of length ${\displaystyle m}$, such that ${\displaystyle X_{m}(i)=\{x_{i},x_{i+1},x_{i+2},...,x_{i+m-1}\}}$, and take the distance function ${\displaystyle d[X_{m}(i),X_{m}(j)]}$ (i≠j) to be the Chebyshev distance (although any distance function, including Euclidean distance, could be used). We count the number of template vector pairs of length ${\displaystyle m}$ having ${\displaystyle d[X_{m}(i),X_{m}(j)]<r}$ and the number of template vector pairs of length ${\displaystyle m+1}$ having ${\displaystyle d[X_{m+1}(i),X_{m+1}(j)]<r}$, and denote these counts by ${\displaystyle B}$ and ${\displaystyle A}$ respectively. We define the sample entropy to be

${\displaystyle SampEn=-\log {A \over B}}$

where

${\displaystyle A}$ = number of template vector pairs of length ${\displaystyle m+1}$ having ${\displaystyle d[X_{m+1}(i),X_{m+1}(j)]<r}$

${\displaystyle B}$ = number of template vector pairs of length ${\displaystyle m}$ having ${\displaystyle d[X_{m}(i),X_{m}(j)]<r}$

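It is clear from the definition that ${\displaystyle A}$ can never be larger than ${\displaystyle B}$, because two templates of length ${\displaystyle m+1}$ that lie within ${\displaystyle r}$ of each other also lie within ${\displaystyle r}$ over their first ${\displaystyle m}$ points. The value of ${\displaystyle SampEn(m,r,\tau )}$ is therefore always zero or positive. A smaller value of ${\displaystyle SampEn}$ indicates more self-similarity in the data set or less noise.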

Generally we take the value of ${\displaystyle m}$ to be ${\displaystyle 2}$ and the value of ${\displaystyle r}$ to be ${\displaystyle 0.2\times std}$, where std stands for the standard deviation of the data.
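
The counting procedure above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the population standard deviation is assumed for std, and the same ${\displaystyle N-m}$ starting positions are assumed for both template lengths so that ${\displaystyle A}$ cannot exceed ${\displaystyle B}$. When either count is zero the logarithm is undefined, and the sketch returns infinity as a placeholder.

```python
import math
import statistics


def chebyshev(u, v):
    """Largest absolute coordinate-wise difference between two templates."""
    return max(abs(a - b) for a, b in zip(u, v))


def count_matches(x, length, r, n_templates):
    """Count template pairs (i < j) of the given length with distance < r."""
    templates = [x[i:i + length] for i in range(n_templates)]
    count = 0
    for i in range(n_templates):
        # j starts at i + 1, so a template is never compared with itself.
        for j in range(i + 1, n_templates):
            if chebyshev(templates[i], templates[j]) < r:
                count += 1
    return count


def sample_entropy(x, m=2, r=None):
    """Return -log(A / B) for the series x (hypothetical helper)."""
    x = list(x)
    if r is None:
        # Assumption: std is the population standard deviation of x.
        r = 0.2 * statistics.pstdev(x)
    # Assumption: use N - m starting positions for both lengths.
    n_templates = len(x) - m
    if n_templates < 2:
        raise ValueError("series too short for this m")
    b = count_matches(x, m, r, n_templates)
    a = count_matches(x, m + 1, r, n_templates)
    if a == 0 or b == 0:
        return math.inf  # the ratio A / B gives no finite estimate
    return -math.log(a / b)
```

For a perfectly periodic series such as `[1, 2] * 10`, every match of length 2 extends to a match of length 3, so ${\displaystyle A=B}$ and the function returns zero. The double loop takes time quadratic in ${\displaystyle N}$, which is acceptable for short records but slow for long ones.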

Multiscale SampEn

The definition given above is a special case of multiscale SampEn with ${\displaystyle \delta =1}$, where ${\displaystyle \delta }$ is called the skipping parameter. In multiscale SampEn, the elements of each template vector are separated by an interval specified by the value of ${\displaystyle \delta }$. The modified template vector is defined as ${\displaystyle X_{m,\delta }(i)=\{x_{i},x_{i+\delta },x_{i+2\times \delta },...,x_{i+(m-1)\times \delta }\}}$ and SampEn can be written as ${\displaystyle SampEn\left(m,r,\delta \right)=-\log {A_{\delta } \over B_{\delta }}}$, where ${\displaystyle A_{\delta }}$ and ${\displaystyle B_{\delta }}$ are calculated as before. Here too, the value of ${\displaystyle m}$ is taken to be ${\displaystyle 2}$ and ${\displaystyle r}$ to be ${\displaystyle 0.2\times std}$.
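
The template construction with a skipping parameter can be sketched in Python as follows. The names are hypothetical, the Chebyshev distance and the population standard deviation are assumed, and the number of templates is assumed to be ${\displaystyle N-m\times \delta }$ for both lengths, so that every starting index admits a template of length ${\displaystyle m+1}$.

```python
import math
import statistics


def delayed_templates(x, length, delta, count):
    """Build `count` templates whose elements are `delta` samples apart."""
    return [[x[i + k * delta] for k in range(length)] for i in range(count)]


def count_close_pairs(templates, r):
    """Count pairs (i < j) whose Chebyshev distance is below r."""
    total = 0
    for i in range(len(templates)):
        for j in range(i + 1, len(templates)):
            dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
            if dist < r:
                total += 1
    return total


def sample_entropy_delayed(x, m=2, r=None, delta=1):
    """Return -log(A_delta / B_delta) for the series x (hypothetical helper)."""
    x = list(x)
    if r is None:
        # Assumption: std is the population standard deviation of x.
        r = 0.2 * statistics.pstdev(x)
    # Assumption: the last usable start index is N - 1 - m * delta.
    count = len(x) - m * delta
    if count < 2:
        raise ValueError("series too short for this m and delta")
    b = count_close_pairs(delayed_templates(x, m, delta, count), r)
    a = count_close_pairs(delayed_templates(x, m + 1, delta, count), r)
    if a == 0 or b == 0:
        return math.inf  # the ratio gives no finite estimate
    return -math.log(a / b)
```

With `delta=1` the templates reduce to runs of consecutive samples, which matches the special case described above.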

Implementation

SampEn can be implemented easily in many different programming languages; MATLAB implementations are among those available.

References

1. {{#invoke:Citation/CS1|citation |CitationClass=journal }}
2. {{#invoke:Citation/CS1|citation |CitationClass=journal }}