# Disjunct matrix

Disjunct and separable matrices play a pivotal role in the mathematical area of non-adaptive group testing. This area investigates efficient designs and procedures to identify 'needles in haystacks' by testing groups of items instead of each item alone. The main concept is that if there are very few special items (needles) and the groups are constructed according to certain combinatorial guidelines, then one can test the groups and find all the needles. This can reduce the cost and the labor associated with large-scale experiments.

The grouping pattern can be represented by a ${\displaystyle t\times n}$ binary matrix, where each column represents an item and each row represents a pool. The symbol '1' denotes participation in the pool and '0' absence from the pool. The d-disjunctness and the d-separability of the matrix are sufficient conditions for identifying d special items.

In a matrix that is d-separable, the Boolean sum of every d columns is unique. In a matrix that is d-disjunct, the Boolean sum of every d columns does not contain any other column in the matrix. Theoretically, for the same number of columns (items), one can construct d-separable matrices with fewer rows (tests) than d-disjunct matrices. However, designs based on d-separable matrices are less applicable, since the decoding time to identify the special items is exponential in d. In contrast, the decoding time for d-disjunct matrices is polynomial.
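The two definitions above can be checked by brute force on small matrices. The following is a minimal sketch, assuming the matrix is given as a list of rows of 0/1 entries; the function names are illustrative and not taken from the literature, and the checks take time exponential in d.

```python
from itertools import combinations

def boolean_sum(matrix, cols):
    """Row-wise OR of the selected columns, as a tuple of 0/1 values."""
    return tuple(int(any(row[j] for j in cols)) for row in matrix)

def is_d_separable(matrix, d):
    """True if the Boolean sums of all d-subsets of columns are distinct."""
    n = len(matrix[0])
    seen = set()
    for subset in combinations(range(n), d):
        s = boolean_sum(matrix, subset)
        if s in seen:
            return False
        seen.add(s)
    return True

def is_d_disjunct(matrix, d):
    """True if no column is contained in the Boolean sum of d other columns."""
    n = len(matrix[0])
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for subset in combinations(others, min(d, len(others))):
            union = boolean_sum(matrix, subset)
            # Column j is contained if the union has a 1 wherever column j does.
            if all(union[i] >= row[j] for i, row in enumerate(matrix)):
                return False
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
nested = [[1, 1], [0, 1]]  # column 0 is contained in column 1
```

Here `identity` is both 2-separable and 2-disjunct, while `nested` is 1-separable (its columns differ) but not 1-disjunct, which illustrates that separability is the weaker property.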

## d-separable

### Decoding algorithm

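It helps to first restate the group testing problem in set notation. Let ${\displaystyle \mathbf {x} \in \{0,1\}^{n}}$ be the indicator vector of the special items, with at most ${\displaystyle d}$ ones, let ${\displaystyle M}$ be the ${\displaystyle t\times n}$ testing matrix with ${\displaystyle j}$-th column ${\displaystyle M_{j}}$, and let ${\displaystyle \mathbf {r} \in \{0,1\}^{t}}$ be the vector of test outcomes. Write ${\displaystyle S_{M_{j}}=\{i\mid M_{i,j}=1\}}$ for the set of pools that contain item ${\displaystyle j}$ and ${\displaystyle S_{\mathbf {r} }=\{i\mid \mathbf {r} _{i}=1\}}$ for the set of positive pools. A pool is positive exactly when it contains a special item, so ${\displaystyle S_{\mathbf {r} }=\bigcup _{j:\mathbf {x} _{j}=1}S_{M_{j}}}$.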

This formalizes the relation between ${\displaystyle \mathbf {x} }$, the columns of ${\displaystyle M}$, and ${\displaystyle \mathbf {r} }$ in a form better suited to reasoning about ${\displaystyle d}$-separable and ${\displaystyle d}$-disjunct matrices. The algorithm to decode a ${\displaystyle d}$-separable matrix is as follows:

1. For each ${\displaystyle T\subseteq [n]}$ such that ${\displaystyle |T|\leq d}$, check whether ${\displaystyle S_{\mathbf {r} }=\bigcup _{j\in T}S_{M_{j}}}$; if so, output ${\displaystyle T}$ as the set of special items.

This algorithm runs in time ${\displaystyle n^{{\mathcal {O}}(d)}}$.
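The exhaustive search can be sketched in a few lines. This is an illustrative implementation under the assumption that the matrix is a list of 0/1 rows and the outcome vector is a list of 0/1 values; the helper names are hypothetical.

```python
from itertools import combinations

def support(vector):
    """Set of positions holding a 1."""
    return {i for i, bit in enumerate(vector) if bit}

def decode_separable(matrix, r, d):
    """Try every set T of at most d columns; return the first whose union of
    column supports equals the support of r, or None if there is none."""
    n = len(matrix[0])
    target = support(r)
    column_supports = [support([row[j] for row in matrix]) for j in range(n)]
    for size in range(d + 1):
        for T in combinations(range(n), size):
            union = set().union(*(column_supports[j] for j in T))
            if union == target:
                return set(T)
    return None

# 4 items, 4 pools, each item tested alone (trivially separable).
M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
found = decode_separable(M, [0, 1, 0, 1], d=2)
```

The two nested loops enumerate every subset of size at most d, which is where the ${\displaystyle n^{{\mathcal {O}}(d)}}$ running time comes from.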

## d-disjunct

In the literature, disjunct matrices are also called superimposed codes and d-cover-free families.

### Decoding algorithm

The algorithm for ${\displaystyle d}$-separable matrices runs in time ${\displaystyle n^{{\mathcal {O}}(d)}}$, which is polynomial in ${\displaystyle n}$ only for fixed ${\displaystyle d}$. For ${\displaystyle d}$-disjunct matrices there is a faster algorithm whose running time depends on ${\displaystyle d}$ only through the number of tests ${\displaystyle t}$, so that, given the bounds on ${\displaystyle t}$, ${\displaystyle d}$ appears as a multiplicative factor rather than in the exponent. The algorithm is given in the proof of the following lemma:

Lemma 1: There exists an ${\displaystyle {\mathcal {O}}(nt)}$ time decoding for any ${\displaystyle d}$-disjunct ${\displaystyle t\times n}$ matrix.

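Proof of Lemma 1: Given as input ${\displaystyle \mathbf {r} \in \{0,1\}^{t}}$ and ${\displaystyle M}$, use the following algorithm:

1. For each ${\displaystyle j\in [n]}$, set ${\displaystyle \mathbf {x} _{j}=1}$.
2. For each ${\displaystyle i\in [t]}$ with ${\displaystyle \mathbf {r} _{i}=0}$, set ${\displaystyle \mathbf {x} _{j}=0}$ for every ${\displaystyle j}$ such that ${\displaystyle M_{i,j}=1}$.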

By Observation 1, for any position where ${\displaystyle \mathbf {r} _{i}=0}$, every item in pool ${\displaystyle i}$ is non-special, so step 2 of the algorithm only sets ${\displaystyle \mathbf {x} _{j}}$ to 0 for items that are indeed non-special. By Observation 2, if ${\displaystyle \mathbf {x} _{j}}$ is supposed to be 1, then ${\displaystyle \mathbf {r} _{i}=1}$ for every ${\displaystyle i}$ with ${\displaystyle M_{i,j}=1}$, so step 2 never assigns such an ${\displaystyle \mathbf {x} _{j}}$ the value 0 and it stays 1. Conversely, because ${\displaystyle M}$ is ${\displaystyle d}$-disjunct, every non-special item ${\displaystyle j}$ has at least one row ${\displaystyle i}$ with ${\displaystyle M_{i,j}=1}$ in which no special item participates; there ${\displaystyle \mathbf {r} _{i}=0}$, so step 2 sets ${\displaystyle \mathbf {x} _{j}}$ to 0. The algorithm therefore recovers ${\displaystyle \mathbf {x} }$, and it takes time ${\displaystyle {\mathcal {O}}(nt)}$ overall. ${\displaystyle \Box }$
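The decoding procedure from the proof translates directly into code. The sketch below assumes a list-of-rows matrix; `run_tests` is a hypothetical helper that simulates the pool outcomes, and the example matrix (columns are the six 2-element subsets of four pools, which makes it 1-disjunct) is chosen only for illustration.

```python
def run_tests(matrix, x):
    """Simulate pooling: a pool is positive if it contains a special item."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in matrix]

def decode_disjunct(matrix, r):
    """O(nt) decoding: start with every item marked special, then clear every
    item that takes part in a negative pool."""
    t, n = len(matrix), len(matrix[0])
    x = [1] * n                      # step 1: assume all items are special
    for i in range(t):               # step 2: scan the negative pools
        if r[i] == 0:
            for j in range(n):
                if matrix[i][j] == 1:
                    x[j] = 0
    return x

# 4 pools, 6 items; column j is the indicator of a distinct pair of pools.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
truth = [0, 0, 0, 1, 0, 0]           # item 3 is the only special item
r = run_tests(M, truth)
decoded = decode_disjunct(M, r)
```

With a single special item, four pools suffice here to identify it among six items.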

## Upper bounds for non-adaptive group testing

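The upper bounds below rely mostly on properties of ${\displaystyle d}$-disjunct matrices. Besides giving good bounds on the number of tests, such matrices also admit the ${\displaystyle {\mathcal {O}}(nt)}$ decoding algorithm of Lemma 1. Both constructions rely on the following lemma, which is proved first:

Lemma 2: Let ${\displaystyle M}$ be a binary matrix and let ${\displaystyle w_{\min }}$ and ${\displaystyle a_{\max }}$ be positive integers such that (i) every column of ${\displaystyle M}$ contains at least ${\displaystyle w_{\min }}$ ones and (ii) any two distinct columns of ${\displaystyle M}$ share at most ${\displaystyle a_{\max }}$ ones. Then ${\displaystyle M}$ is ${\displaystyle d}$-disjunct for every ${\displaystyle d\leq \lfloor {\frac {w_{\min }-1}{a_{\max }}}\rfloor }$.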

Note: these conditions are stronger than a condition on subsets of size ${\displaystyle d}$, because they apply to every pair of columns in the matrix. No matter which column ${\displaystyle i}$ of the matrix is chosen, that column contains at least ${\displaystyle w_{\min }}$ ones, and the number of ones shared by any two columns is at most ${\displaystyle a_{\max }}$.

Proof of Lemma 2: Fix an arbitrary ${\displaystyle S\subseteq [n]}$ with ${\displaystyle |S|\leq d}$, a column ${\displaystyle j\notin S}$, and a matrix ${\displaystyle M}$. There is a match between ${\displaystyle i\in S}$ and ${\displaystyle j\notin S}$ whenever column ${\displaystyle i}$ has a 1 in the same row position as column ${\displaystyle j}$. Then the total number of matches is ${\displaystyle \leq a_{\max }\cdot d\leq a_{\max }\cdot \left({\frac {w_{\min }-1}{a_{\max }}}\right)=w_{\min }-1<w_{\min }}$, i.e. column ${\displaystyle j}$ has fewer matches than ones. Therefore there must be a row in which column ${\displaystyle j}$ has a 1 but every column in ${\displaystyle S}$ has a 0. ${\displaystyle \Box }$
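The quantities in Lemma 2 are easy to compute for a concrete matrix. The sketch below, with a hypothetical function name, returns ${\displaystyle w_{\min }}$, ${\displaystyle a_{\max }}$ and the disjunctness level that the argument above guarantees; it assumes at least two columns and treats the case of pairwise non-overlapping columns separately to avoid dividing by zero.

```python
from itertools import combinations

def lemma2_disjunctness(matrix):
    """Return (w_min, a_max, d) with d = floor((w_min - 1) / a_max).
    d is None when no two columns overlap (a_max == 0)."""
    n = len(matrix[0])
    columns = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n)]
    w_min = min(len(c) for c in columns)                  # fewest ones in a column
    a_max = max(len(a & b) for a, b in combinations(columns, 2))  # largest overlap
    if a_max == 0:
        return w_min, a_max, None
    return w_min, a_max, (w_min - 1) // a_max

# Columns are the six 2-element subsets of four rows.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
bound = lemma2_disjunctness(M)
```

For this matrix every column has two ones and any two columns share at most one, so the guaranteed level is 1. The value is only a lower bound on the true disjunctness of a matrix.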

We will now generate constructions for the bounds.

### Randomized construction

This first construction uses a probabilistic argument, in particular the Chernoff bound, to show the desired property. It gives ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log n)}$. The following theorem states the result.

Theorem 1: There exists a random ${\displaystyle d}$-disjunct matrix with ${\displaystyle {\mathcal {O}}(d^{2}\log n)}$ rows.

Note that in this proof ${\displaystyle t=d^{2}\log n}$ thus giving the upper bound of ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log n)}$. ${\displaystyle \Box }$
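A randomized construction can be tried out empirically. The sketch below is not the Chernoff-bound argument of the proof: it simply samples independent Bernoulli entries and verifies disjunctness by brute force, resampling on failure. The constant `c`, the entry probability `1 / (d + 1)`, and all function names are assumptions made for this illustration.

```python
import math
import random
from itertools import combinations

def is_d_disjunct(matrix, d):
    """Brute-force check that no column is covered by d other columns."""
    n = len(matrix[0])
    cols = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for subset in combinations(others, d):
            covered = set().union(*(cols[c] for c in subset))
            if cols[j] <= covered:
                return False
    return True

def random_disjunct_matrix(n, d, c=8, seed=0, max_tries=100):
    """Sample t = ceil(c * d^2 * ln n) rows of i.i.d. Bernoulli(1/(d+1)) bits
    and retry until the sample passes the brute-force check."""
    rng = random.Random(seed)
    t = math.ceil(c * d * d * math.log(n))
    p = 1 / (d + 1)                  # assumed entry probability
    for _ in range(max_tries):
        matrix = [[int(rng.random() < p) for _ in range(n)] for _ in range(t)]
        if is_d_disjunct(matrix, d):
            return matrix
    raise RuntimeError("no d-disjunct sample found; increase c or max_tries")

M = random_disjunct_matrix(n=6, d=2)
```

The brute-force check is only practical for small `n` and `d`; for larger parameters one would rely on the probabilistic guarantee instead of verification.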

### Strongly explicit construction

It is possible to prove a bound of ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log ^{2}{n})}$ using a strongly explicit code. Although this bound is worse by a ${\displaystyle \log n}$ factor, it is preferable because this produces a strongly explicit construction instead of a randomized one.

Theorem 2: There exists a strongly explicit ${\displaystyle d}$-disjunct matrix with ${\displaystyle {\mathcal {O}}(d^{2}\log ^{2}{n})}$ rows.

This proof will use the properties of concatenated codes along with the properties of disjunct matrices to construct a code that will satisfy the bound we are after.

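By Lemma 2, if every column of ${\displaystyle M_{C^{*}}}$ contains at least ${\displaystyle w_{\min }}$ ones and any two columns share at most ${\displaystyle a_{\max }}$ ones, then ${\displaystyle M_{C^{*}}}$ is ${\displaystyle \lfloor {\frac {w_{\min }-1}{a_{\max }}}\rfloor }$-disjunct. To complete the proof, one more concept is needed: code concatenation.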

Kautz-Singleton '64

---

Example: Let ${\displaystyle k=1,q=3,C_{out}=\{(0,0,0),(1,1,1),(2,2,2)\}}$. Here ${\displaystyle M_{C}}$ denotes the matrix of codewords for ${\displaystyle C_{out}}$ and ${\displaystyle M_{C^{*}}}$ denotes the matrix of codewords for ${\displaystyle C^{*}=C_{out}\circ C_{in}}$, where each column is a codeword. (Figure: the transition from the outer code to the concatenated code.)

---
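The example can be reproduced in code. This sketch assumes the usual Kautz-Singleton setup, which is not spelled out in the text above: the outer code is a Reed-Solomon code obtained by evaluating polynomials of degree less than ${\displaystyle k}$ at all points of ${\displaystyle \mathbb {Z} _{q}}$ (with ${\displaystyle q}$ prime), and the inner code maps symbol ${\displaystyle i}$ to the ${\displaystyle i}$-th unit vector of length ${\displaystyle q}$. Function names are illustrative.

```python
from itertools import product

def reed_solomon_codewords(q, k):
    """All codewords of the length-q Reed-Solomon code over Z_q (q prime):
    evaluations of every polynomial of degree < k at the points 0..q-1."""
    words = []
    for message in product(range(q), repeat=k):
        word = tuple(
            sum(c * pow(x, i, q) for i, c in enumerate(message)) % q
            for x in range(q)
        )
        words.append(word)
    return words

def concatenate_with_identity(codewords, q):
    """Replace each symbol by its unit vector and return the q^2 x n binary
    matrix whose columns are the concatenated codewords."""
    columns = []
    for word in codewords:
        column = []
        for symbol in word:
            column.extend(1 if i == symbol else 0 for i in range(q))
        columns.append(column)
    return [list(row) for row in zip(*columns)]  # transpose: columns -> rows

C_out = reed_solomon_codewords(q=3, k=1)
M_C_star = concatenate_with_identity(C_out, q=3)
```

For ${\displaystyle k=1,q=3}$ this yields the three outer codewords of the example and a ${\displaystyle 9\times 3}$ matrix consisting of three stacked ${\displaystyle 3\times 3}$ identity blocks.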

Thus we have a strongly explicit construction for a code that can be used to form a group testing matrix and so ${\displaystyle t(d,n)\leq (d\log n)^{2}}$.
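The bound can be traced through Lemma 2 as follows. This is a sketch under the assumption that the outer code is a Reed-Solomon code of length ${\displaystyle q}$ and dimension ${\displaystyle k\geq 2}$ over an alphabet of size ${\displaystyle q}$, and that the inner code maps each symbol to a unit vector of length ${\displaystyle q}$; neither assumption is spelled out in the text above.

$$
\begin{aligned}
t &= q^{2}, \qquad n = q^{k},\\
w_{\min } &= q, \qquad a_{\max } \leq k-1,\\
d &= \left\lfloor \frac{w_{\min }-1}{a_{\max }} \right\rfloor \geq \left\lfloor \frac{q-1}{k-1} \right\rfloor .
\end{aligned}
$$

Each column has exactly one 1 per block of ${\displaystyle q}$ rows, which gives ${\displaystyle w_{\min }=q}$, and two distinct polynomials of degree less than ${\displaystyle k}$ agree in at most ${\displaystyle k-1}$ points, which gives the bound on ${\displaystyle a_{\max }}$. Taking ${\displaystyle q}$ on the order of ${\displaystyle d\cdot k}$ with ${\displaystyle k=\log _{q}n\leq \log _{2}n}$ gives ${\displaystyle q={\mathcal {O}}(d\log n)}$, hence ${\displaystyle t=q^{2}={\mathcal {O}}(d^{2}\log ^{2}n)}$.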

For non-adaptive testing we have shown that ${\displaystyle \Omega (d\log n)\leq t(d,n)}$ and that (i) ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log ^{2}{n})}$ (strongly explicit) and (ii) ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log n)}$ (randomized). More recently, Porat and Rothschild presented an explicit construction (i.e. deterministic time but not strongly explicit) achieving ${\displaystyle t(d,n)\leq {\mathcal {O}}(d^{2}\log n)}$,[1] which is not shown here. There is also a lower bound for disjunct matrices of ${\displaystyle t(d,n)\geq \Omega ({\frac {d^{2}}{\log d}}\log n)}$,[2][3][4] which is not shown here either.

## Examples

Here is the 2-disjunct matrix ${\displaystyle M_{9\times 12}}$: