# Group testing


In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(r, p)) to more than two outcomes.[1]

Suppose we have an experiment that generates m+1≥2 possible outcomes, {X0,…,Xm}, each occurring with non-negative probabilities {p0,…,pm} respectively. If sampling proceeded until n observations were made, then {X0,…,Xm} would have been multinomially distributed. However, if the experiment is stopped once X0 reaches the predetermined value k0, then the distribution of the m-tuple {X1,…,Xm} is negative multinomial.
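The stopping rule described above can be illustrated with a small simulation. This is only a sketch: the function name `sample_negative_multinomial` and the parameter values are illustrative assumptions, not part of the original text.

```python
import random

def sample_negative_multinomial(k0, probs, rng):
    """Draw one vector (X1, ..., Xm) by repeating trials until outcome 0
    has been observed k0 times.

    probs = [p0, p1, ..., pm] is assumed to sum to 1 with p0 > 0.
    """
    outcomes = list(range(len(probs)))
    counts = [0] * len(probs)
    while counts[0] < k0:
        i = rng.choices(outcomes, weights=probs)[0]
        counts[i] += 1
    return counts[1:]  # X0 is fixed at k0, so only X1..Xm are random

rng = random.Random(0)                  # fixed seed for repeatability
k0, probs = 10, [0.5, 0.2, 0.1, 0.2]    # illustrative values
draws = [sample_negative_multinomial(k0, probs, rng) for _ in range(5000)]
means = [sum(d[j] for d in draws) / len(draws) for j in range(3)]
print(means)  # sample means, to be compared with k0 * p_i / p0
```

With these values the sample means should land near k0·p_i/p0, that is near 4, 2 and 4.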

## Negative multinomial distribution example

The table below shows an example of 400 melanoma (skin cancer) patients where the type and site of the cancer are recorded for each subject.

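| Type (rows) / Site (columns) | Head and Neck | Trunk | Extremities | Totals |
|---|---|---|---|---|
| Hutchinson's melanomic freckle | 22 | 2 | 10 | 34 |
| Superficial | 16 | 54 | 115 | 185 |
| Nodular | 19 | 33 | 73 | 125 |
| Indeterminant | 11 | 17 | 28 | 56 |
| Column Totals | 68 | 106 | 226 | 400 |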

The sites (locations) of the cancer may be independent, but there may be positive dependencies among the types of cancer at a given location (site). For example, localized exposure to radiation implies that an elevated level of one type of cancer at a given location may indicate a higher level of another cancer type at the same location. The negative multinomial distribution may be used to model the cancer rates at each site and to help measure some of the cancer type dependencies within each location.

Let ${\displaystyle x_{i,j}}$ denote the cancer rates for each site (${\displaystyle 0\leq i\leq 2}$) and each type of cancer (${\displaystyle 0\leq j\leq 3}$). For a fixed site (${\displaystyle i_{0}}$), the cancer rates are independent negative multinomial distributed random variables. That is, for each column index (site) the column-vector X has the following distribution:

${\displaystyle X=\{X_{1},X_{2},X_{3}\}\sim NM(k_{0},\{p_{1},p_{2},p_{3}\})}$.

Different columns in the table (sites) are considered to be different instances of the random negative multinomially distributed vector, X. Then we have the following estimates of expected counts (frequencies of cancer):

${\displaystyle {\hat {\mu }}_{i,j}={\frac {x_{i,.}\times x_{.,j}}{x_{.,.}}}}$
${\displaystyle x_{i,.}=\sum _{j=0}^{3}{x_{i,j}}}$
${\displaystyle x_{.,j}=\sum _{i=0}^{2}{x_{i,j}}}$
${\displaystyle x_{.,.}=\sum _{i=0}^{2}\sum _{j=0}^{3}{x_{i,j}}}$
Example (first site, Head and Neck, and first type, Hutchinson's melanomic freckle): ${\displaystyle {\hat {\mu }}_{0,0}={\frac {x_{0,.}\times x_{.,0}}{x_{.,.}}}={\frac {68\times 34}{400}}=5.78}$
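The expected-count formula above can be checked directly against the melanoma table. The following is a minimal sketch; the variable names are illustrative, and the table is stored with types as rows and sites as columns, as it is printed.

```python
# Observed counts: rows are cancer types, columns are sites
# (Head and Neck, Trunk, Extremities).
table = [
    [22, 2, 10],    # Hutchinson's melanomic freckle
    [16, 54, 115],  # Superficial
    [19, 33, 73],   # Nodular
    [11, 17, 28],   # Indeterminant
]

type_totals = [sum(row) for row in table]        # one total per type
site_totals = [sum(col) for col in zip(*table)]  # one total per site
grand_total = sum(type_totals)

# Expected count for each (type, site) cell: product of marginal totals
# divided by the grand total.
expected = [[t * s / grand_total for s in site_totals] for t in type_totals]

print(round(expected[0][0], 2))  # 5.78
```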

For the first site (Head and Neck, i=0), suppose that ${\displaystyle X=\left\{X_{1}=5,X_{2}=1,X_{3}=5\right\}}$ and ${\displaystyle X\sim NM(k_{0}=10,\{p_{1}=0.2,p_{2}=0.1,p_{3}=0.2\})}$. Then:

${\displaystyle p_{0}=1-\sum _{i=1}^{3}{p_{i}}=0.5}$
${\displaystyle NM(X|k_{0},\{p_{1},p_{2},p_{3}\})=0.00465585119998784}$
${\displaystyle cov[X_{1},X_{3}]={\frac {10\times 0.2\times 0.2}{0.5^{2}}}=1.6}$
${\displaystyle \mu _{2}={\frac {k_{0}p_{2}}{p_{0}}}={\frac {10\times 0.1}{0.5}}=2.0}$
${\displaystyle \mu _{3}={\frac {k_{0}p_{3}}{p_{0}}}={\frac {10\times 0.2}{0.5}}=4.0}$
${\displaystyle corr[X_{2},X_{3}]=\left({\frac {\mu _{2}\times \mu _{3}}{(k_{0}+\mu _{2})(k_{0}+\mu _{3})}}\right)^{\frac {1}{2}}}$ and therefore, ${\displaystyle corr[X_{2},X_{3}]=\left({\frac {2\times 4}{(10+2)(10+4)}}\right)^{\frac {1}{2}}=0.21821789023599242.}$
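The quantities in this worked example can be reproduced numerically. The sketch below assumes the standard negative multinomial mass function, Γ(k0 + Σx_i) / (Γ(k0) Πx_i!) · p0^k0 · Πp_i^x_i. The helper name `nm_pmf` is illustrative.

```python
from math import exp, lgamma, log, sqrt

def nm_pmf(x, k0, p):
    """P(X = x) for X ~ NM(k0, p), with p = [p1, ..., pm] and p0 = 1 - sum(p).

    Computed on the log scale to avoid overflow in the gamma terms.
    """
    p0 = 1.0 - sum(p)
    log_pmf = lgamma(k0 + sum(x)) - lgamma(k0) + k0 * log(p0)
    for xi, pi in zip(x, p):
        log_pmf += xi * log(pi) - lgamma(xi + 1)
    return exp(log_pmf)

k0, p, x = 10, [0.2, 0.1, 0.2], [5, 1, 5]
p0 = 1.0 - sum(p)

density = nm_pmf(x, k0, p)                # about 0.0046558512
cov_13 = k0 * p[0] * p[2] / p0 ** 2       # about 1.6
mu = [k0 * pi / p0 for pi in p]           # about [4.0, 2.0, 4.0]
corr_23 = sqrt(mu[1] * mu[2] / ((k0 + mu[1]) * (k0 + mu[2])))  # about 0.2182

print(density, cov_13, mu, corr_23)
```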

Notice that the pair-wise NM correlations are always positive, whereas the correlations between multinomial counts are always negative. As the parameter ${\displaystyle k_{0}}$ increases with the means ${\displaystyle \mu _{i}}$ held fixed, the pairwise correlations tend to zero. Thus, for large ${\displaystyle k_{0}}$, the negative multinomial counts ${\displaystyle X_{i}}$ behave as independent Poisson random variables with respect to their means ${\displaystyle \left(\mu _{i}=k_{0}{\frac {p_{i}}{p_{0}}}\right)}$.
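The decay of the correlation can be seen by evaluating the correlation formula from the example for growing values of k0. This sketch assumes the means are held fixed at 2 and 4, as in the example. The function name `nm_corr` is illustrative.

```python
from math import sqrt

def nm_corr(mu_i, mu_j, k0):
    """Pairwise NM correlation expressed through the means and k0."""
    return sqrt(mu_i * mu_j / ((k0 + mu_i) * (k0 + mu_j)))

# Means held fixed while k0 grows: the correlation shrinks toward zero.
correlations = [nm_corr(2.0, 4.0, k0) for k0 in (1, 10, 100, 1000, 10000)]
for k0, c in zip((1, 10, 100, 1000, 10000), correlations):
    print(k0, c)
```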

The marginal distribution of each of the ${\displaystyle X_{i}}$ variables is negative binomial, as the ${\displaystyle X_{i}}$ count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of ${\displaystyle X=\{X_{1},\cdots ,X_{m}\}}$ is negative multinomial, i.e., ${\displaystyle X\sim NM(k_{0},\{p_{1},\cdots ,p_{m}\})}$ .
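The marginal claim can be checked numerically in a small two-category case by summing the joint mass function over the other count. This sketch rests on an assumption not spelled out in the text: that the marginal of X_1 is negative binomial with k0 stopping events and success probability q = p_1/(p_0 + p_1). The helper names and parameter values are illustrative.

```python
from math import exp, lgamma, log

def nm_pmf(x, k0, p):
    """Joint NM mass function, with p = [p1, ..., pm] and p0 = 1 - sum(p)."""
    p0 = 1.0 - sum(p)
    log_pmf = lgamma(k0 + sum(x)) - lgamma(k0) + k0 * log(p0)
    for xi, pi in zip(x, p):
        log_pmf += xi * log(pi) - lgamma(xi + 1)
    return exp(log_pmf)

def nb_pmf(x, k0, q):
    """Negative binomial: x events of probability q before the k0-th
    event of probability 1 - q."""
    return exp(lgamma(k0 + x) - lgamma(k0) - lgamma(x + 1)
               + k0 * log(1.0 - q) + x * log(q))

k0, p = 10, [0.2, 0.1]   # illustrative values, so p0 = 0.7
p0 = 1.0 - sum(p)
x1 = 3

# Sum the joint mass over X2; the tail beyond 200 is negligible here.
marginal = sum(nm_pmf([x1, x2], k0, p) for x2 in range(200))
direct = nb_pmf(x1, k0, p[0] / (p0 + p[0]))

print(marginal, direct)  # the two values should agree closely
```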

## Parameter estimation

The expected count of each cancer type is estimated by averaging its row total over the 3 sites.
For the Hutchinson's melanomic freckle type of cancer (${\displaystyle X_{0}}$), the estimate is ${\displaystyle {\hat {\mu }}_{0}=34/3=11.33}$.
For the Superficial type of cancer (${\displaystyle X_{1}}$), the estimate is ${\displaystyle {\hat {\mu }}_{1}=185/3=61.67}$.
For the Nodular type of cancer (${\displaystyle X_{2}}$), the estimate is ${\displaystyle {\hat {\mu }}_{2}=125/3=41.67}$.
For the Indeterminant type of cancer (${\displaystyle X_{3}}$), the estimate is ${\displaystyle {\hat {\mu }}_{3}=56/3=18.67}$.
To estimate ${\displaystyle k_{0}}$, start from the usual chi-squared goodness-of-fit statistic, ${\displaystyle \mathrm {X} ^{2}=\sum _{i}{\frac {(x_{i}-\mu _{i})^{2}}{\mu _{i}}}}$. We can replace the expected means (${\displaystyle \mu _{i}}$) by their estimates, ${\displaystyle {\hat {\mu _{i}}}}$, and replace the denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:
${\displaystyle \mathrm {X} ^{2}(k_{0})=\sum _{i}{\frac {(x_{i}-{\hat {\mu _{i}}})^{2}}{{\hat {\mu _{i}}}\left(1+{\frac {\hat {\mu _{i}}}{k_{0}}}\right)}}}$.
Next, we can estimate the ${\displaystyle k_{0}}$ parameter by varying the values of ${\displaystyle k_{0}}$ in the expression ${\displaystyle \mathrm {X} ^{2}(k_{0})}$ and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above.
DF: The number of degrees of freedom for the chi-squared distribution in this case is:
df = (# rows – 1)(# columns – 1) = (4-1)*(3-1) = 6
Median: The value used here for the median of a chi-squared random variable with 6 df is 5.261948 (the exact median is approximately 5.348).
Mean Counts Estimates: The mean count estimates (${\displaystyle \mu _{j}}$) for the 3 cancer types ${\displaystyle X_{1},X_{2},X_{3}}$ that enter the statistic are:
${\displaystyle {\hat {\mu }}_{1}=185/3=61.67}$; ${\displaystyle {\hat {\mu }}_{2}=125/3=41.67}$; and ${\displaystyle {\hat {\mu }}_{3}=56/3=18.67}$.
Thus, we can solve the equation above ${\displaystyle \mathrm {X} ^{2}(k_{0})=5.261948}$ for the single variable of interest -- the unknown parameter ${\displaystyle k_{0}}$. In the cancer example, suppose ${\displaystyle x=\{x_{1}=5,x_{2}=1,x_{3}=5\}}$. Then, the solution is an asymptotic chi-squared distribution driven estimate of the parameter ${\displaystyle k_{0}}$.
${\displaystyle \mathrm {X} ^{2}(k_{0})=\sum _{i=1}^{3}{\frac {(x_{i}-{\hat {\mu _{i}}})^{2}}{{\hat {\mu _{i}}}\left(1+{\frac {\hat {\mu _{i}}}{k_{0}}}\right)}}}$.
${\displaystyle \mathrm {X} ^{2}(k_{0})={\frac {(5-61.67)^{2}}{61.67(1+61.67/k_{0})}}+{\frac {(1-41.67)^{2}}{41.67(1+41.67/k_{0})}}+{\frac {(5-18.67)^{2}}{18.67(1+18.67/k_{0})}}=5.261948.}$ Solving this equation for ${\displaystyle k_{0}}$ provides the desired estimate for the last parameter.
Mathematica provides 3 distinct (${\displaystyle k_{0}}$) solutions to this equation: {-50.5466, -21.5204, 2.40461}. Since ${\displaystyle k_{0}>0}$, only the positive solution, ${\displaystyle k_{0}\approx 2.40461}$, is admissible.
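The positive root can also be found without a computer algebra system. Each term of the statistic has the form a·k0/(k0 + μ) with a > 0, so the statistic increases in k0 on k0 > 0, and bisection on a positive bracket converges to the root near 2.40. The sketch below uses the rounded means and the target value from the text. The function names and the bracket are illustrative assumptions.

```python
def chi2_stat(k0, x, mu):
    """Chi-squared style statistic with negative multinomial variances."""
    return sum((xi - mi) ** 2 / (mi * (1 + mi / k0)) for xi, mi in zip(x, mu))

def solve_k0(x, mu, target, lo=1e-6, hi=1e6, tol=1e-10):
    """Bisection for chi2_stat(k0) = target on k0 > 0.

    Relies on chi2_stat being increasing in k0 for positive k0.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_stat(mid, x, mu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = [5, 1, 5]
mu = [61.67, 41.67, 18.67]          # rounded estimates from the text
k0_hat = solve_k0(x, mu, 5.261948)  # target value from the text
print(k0_hat)  # close to 2.40461
```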
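Taking ${\displaystyle k_{0}\approx 2}$ (the solution 2.40461 rounded to an integer) and using ${\displaystyle \mu _{i}=k_{0}{\frac {p_{i}}{p_{0}}}}$, that is ${\displaystyle p_{i}={\frac {\hat {\mu _{i}}}{k_{0}}}p_{0}}$, gives approximately:
${\displaystyle {\frac {61.67}{k_{0}}}p_{0}=31p_{0}=p_{1}}$
${\displaystyle 20p_{0}=p_{2}}$
${\displaystyle 9p_{0}=p_{3}}$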
Hence, ${\displaystyle 1-p_{0}=p_{1}+p_{2}+p_{3}=60p_{0}}$, and ${\displaystyle p_{0}={\frac {1}{61}}}$, ${\displaystyle p_{1}={\frac {31}{61}}}$, ${\displaystyle p_{2}={\frac {20}{61}}}$ and ${\displaystyle p_{3}={\frac {9}{61}}}$.
Therefore, the best model distribution for the observed sample ${\displaystyle x=\{x_{1}=5,x_{2}=1,x_{3}=5\}}$ is ${\displaystyle X\sim NM\left(2,\left\{{\frac {31}{61}},{\frac {20}{61}},{\frac {9}{61}}\right\}\right).}$
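The probability estimates follow from the normalization p0 + p1 + p2 + p3 = 1 once the ratios p_i/p0 are fixed. The sketch below takes the integer ratios 31, 20 and 9 exactly as they appear in the text and uses exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

k0 = 2
ratios = [31, 20, 9]  # p_i / p0 as used in the text (rounded mu_i / k0)

# p0 + sum(ratios) * p0 = 1  =>  p0 = 1 / (1 + sum(ratios))
p0 = Fraction(1, 1 + sum(ratios))
p = [r * p0 for r in ratios]

# Means implied by the fitted model, mu_i = k0 * p_i / p0
implied_means = [k0 * pi / p0 for pi in p]

print(p0)  # 1/61
print(p, implied_means)
```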

## References

1. Le Gall, F. The modes of a negative multinomial distribution, Statistics & Probability Letters, Volume 76, Issue 6, 15 March 2006, Pages 619-624, ISSN 0167-7152, doi:10.1016/j.spl.2005.09.009.