# Rand index

The Rand index[1] or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. From a mathematical standpoint, the Rand index is related to accuracy, but it is applicable even when class labels are not used.

## Rand index

### Definition

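Given a set of ${\displaystyle n}$ elements ${\displaystyle S=\{o_{1},\ldots ,o_{n}\}}$ and two partitions of ${\displaystyle S}$ to compare, ${\displaystyle X=\{X_{1},\ldots ,X_{r}\}}$, a partition of ${\displaystyle S}$ into ${\displaystyle r}$ subsets, and ${\displaystyle Y=\{Y_{1},\ldots ,Y_{s}\}}$, a partition of ${\displaystyle S}$ into ${\displaystyle s}$ subsets, define the following:

- ${\displaystyle a}$, the number of pairs of elements in ${\displaystyle S}$ that are in the same subset in ${\displaystyle X}$ and in the same subset in ${\displaystyle Y}$
- ${\displaystyle b}$, the number of pairs of elements in ${\displaystyle S}$ that are in different subsets in ${\displaystyle X}$ and in different subsets in ${\displaystyle Y}$
- ${\displaystyle c}$, the number of pairs of elements in ${\displaystyle S}$ that are in the same subset in ${\displaystyle X}$ and in different subsets in ${\displaystyle Y}$
- ${\displaystyle d}$, the number of pairs of elements in ${\displaystyle S}$ that are in different subsets in ${\displaystyle X}$ and in the same subset in ${\displaystyle Y}$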

The Rand index, ${\displaystyle R}$, is:[1][2]

${\displaystyle R={\frac {a+b}{a+b+c+d}}={\frac {a+b}{n \choose 2}}}$

Intuitively, ${\displaystyle a+b}$ can be considered as the number of agreements between ${\displaystyle X}$ and ${\displaystyle Y}$ and ${\displaystyle c+d}$ as the number of disagreements between ${\displaystyle X}$ and ${\displaystyle Y}$.
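The definition can be checked directly by enumerating all ${\displaystyle {n \choose 2}}$ pairs. The Python sketch below assumes each partition is given as a list of labels, where entry ${\displaystyle i}$ names the subset containing ${\displaystyle o_{i}}$; the helpers `pair_counts` and `rand_index` are hypothetical names chosen for illustration, not a particular library's API (`math.comb` needs Python 3.8 or later).

```python
from itertools import combinations
from math import comb


def pair_counts(x, y):
    """Return (a, b, c, d) for two labelings of the same n elements.

    x[i] and y[i] are the labels of the subsets containing o_i in X and Y.
    Labels are only compared within one labeling, so X and Y may use
    unrelated label names.
    """
    if len(x) != len(y):
        raise ValueError("x and y must label the same n elements")
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):  # every unordered pair once
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x and same_y:
            a += 1  # together in X and together in Y
        elif not same_x and not same_y:
            b += 1  # apart in X and apart in Y
        elif same_x:
            c += 1  # together in X, apart in Y
        else:
            d += 1  # apart in X, together in Y
    return a, b, c, d


def rand_index(x, y):
    """R = (a + b) / C(n, 2); assumes n >= 2 so that at least one pair exists."""
    n = len(x)
    if n < 2:
        raise ValueError("the Rand index needs at least two elements")
    a, b, _, _ = pair_counts(x, y)
    return (a + b) / comb(n, 2)


x = [0, 0, 0, 1, 1, 1]  # X = {{o1, o2, o3}, {o4, o5, o6}}
y = [0, 0, 1, 1, 2, 2]  # Y = {{o1, o2}, {o3, o4}, {o5, o6}}
print(pair_counts(x, y))  # (2, 8, 4, 1)
print(rand_index(x, y))   # 0.6666666666666666
```

In this example 10 of the 15 pairs are agreements, so ${\displaystyle R=10/15=2/3}$. The quadratic loop mirrors the definition; it is meant for checking small cases rather than for large data sets.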

### Properties

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.
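Both extremes can be reached with small inputs. The Python sketch below uses a hypothetical `rand_index` helper on label lists: it compares a one-subset partition with an all-singletons partition, where every pair is together in one and apart in the other, and then compares a partition with a relabelled copy of itself.

```python
from itertools import combinations


def rand_index(x, y):
    """Fraction of unordered element pairs on which labelings x and y agree."""
    pairs = list(combinations(range(len(x)), 2))
    # A pair is an agreement when it is together in both or apart in both.
    agreements = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agreements / len(pairs)


n = 5
one_block = [0] * n           # X: all five elements in a single subset
singletons = list(range(n))   # Y: every element in its own subset
print(rand_index(one_block, singletons))  # 0.0, every pair is together in X but apart in Y

# Renaming the subsets does not change the partition, so R stays at 1.
print(rand_index([0, 0, 1, 1, 2], [7, 7, 3, 3, 9]))  # 1.0
```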

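In mathematical terms, ${\displaystyle a}$, ${\displaystyle b}$, ${\displaystyle c}$ and ${\displaystyle d}$ count the unordered pairs ${\displaystyle \{o_{i},o_{j}\}}$ with ${\displaystyle 1\leq i<j\leq n}$ as follows, where ${\displaystyle k,k_{1},k_{2}\in \{1,\ldots ,r\}}$ with ${\displaystyle k_{1}\neq k_{2}}$ and ${\displaystyle l,l_{1},l_{2}\in \{1,\ldots ,s\}}$ with ${\displaystyle l_{1}\neq l_{2}}$:

- ${\displaystyle a}$: pairs with ${\displaystyle o_{i},o_{j}\in X_{k}}$ and ${\displaystyle o_{i},o_{j}\in Y_{l}}$ for some ${\displaystyle k}$ and ${\displaystyle l}$;
- ${\displaystyle b}$: pairs with ${\displaystyle o_{i}\in X_{k_{1}},o_{j}\in X_{k_{2}}}$ and ${\displaystyle o_{i}\in Y_{l_{1}},o_{j}\in Y_{l_{2}}}$;
- ${\displaystyle c}$: pairs with ${\displaystyle o_{i},o_{j}\in X_{k}}$ for some ${\displaystyle k}$ and ${\displaystyle o_{i}\in Y_{l_{1}},o_{j}\in Y_{l_{2}}}$;
- ${\displaystyle d}$: pairs with ${\displaystyle o_{i}\in X_{k_{1}},o_{j}\in X_{k_{2}}}$ and ${\displaystyle o_{i},o_{j}\in Y_{l}}$ for some ${\displaystyle l}$.

Since every pair falls into exactly one of these four cases, ${\displaystyle a+b+c+d={n \choose 2}}$.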

## Adjusted Rand index

The adjusted Rand index is the corrected-for-chance version of the Rand index.[1][2][3] Whereas the Rand index can only take values between 0 and +1, the adjusted Rand index can take negative values when the index is less than its expected value.[4]
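A small simulation suggests why a chance correction is useful: two clusterings drawn independently at random are unrelated, yet their Rand index stays well above 0. The Python sketch below assumes uniformly random labels over ${\displaystyle k}$ subsets; this is a simple model chosen for illustration and is not necessarily the null model behind the expected index of the adjusted Rand index. The function names are hypothetical.

```python
import random
from itertools import combinations


def rand_index(x, y):
    """Fraction of unordered element pairs on which labelings x and y agree."""
    pairs = list(combinations(range(len(x)), 2))
    agreements = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agreements / len(pairs)


def mean_rand_index_of_random_labelings(n=60, k=3, trials=200, seed=0):
    """Average R over pairs of independent, uniformly random k-label clusterings."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        total += rand_index(x, y)
    return total / trials


k = 3
# Under this model a pair is together in one labeling with probability 1/k,
# so it is an agreement with probability (1/k)**2 + (1 - 1/k)**2.
print("predicted:", (1 / k) ** 2 + (1 - 1 / k) ** 2)
print("simulated:", mean_rand_index_of_random_labelings(k=k))
```

For ${\displaystyle k=3}$ the predicted value is ${\displaystyle 5/9\approx 0.556}$, and the simulated mean should land close to it, far from 0 even though the two clusterings share no structure.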

### The contingency table

Given a set ${\displaystyle S}$ of ${\displaystyle n}$ elements and two groupings (e.g. clusterings) of these elements, namely ${\displaystyle X=\{X_{1},X_{2},\ldots ,X_{r}\}}$ and ${\displaystyle Y=\{Y_{1},Y_{2},\ldots ,Y_{s}\}}$, the overlap between ${\displaystyle X}$ and ${\displaystyle Y}$ can be summarized in a contingency table ${\displaystyle \left[n_{ij}\right]}$, where each entry ${\displaystyle n_{ij}}$ denotes the number of elements that ${\displaystyle X_{i}}$ and ${\displaystyle Y_{j}}$ have in common: ${\displaystyle n_{ij}=|X_{i}\cap Y_{j}|}$.
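From the contingency table, the adjusted index can be computed with sums of binomial coefficients. The sketch below assumes the commonly used Hubert and Arabie form, in which ${\displaystyle a_{i}=\sum _{j}n_{ij}}$ and ${\displaystyle b_{j}=\sum _{i}n_{ij}}$ are the row and column sums (not the pair counts ${\displaystyle a}$ and ${\displaystyle b}$):

${\displaystyle {\text{ARI}}={\frac {\sum _{ij}{\binom {n_{ij}}{2}}-\left[\sum _{i}{\binom {a_{i}}{2}}\sum _{j}{\binom {b_{j}}{2}}\right]/{\binom {n}{2}}}{{\frac {1}{2}}\left[\sum _{i}{\binom {a_{i}}{2}}+\sum _{j}{\binom {b_{j}}{2}}\right]-\left[\sum _{i}{\binom {a_{i}}{2}}\sum _{j}{\binom {b_{j}}{2}}\right]/{\binom {n}{2}}}}}$

The first term of the numerator, ${\displaystyle \sum _{ij}{\binom {n_{ij}}{2}}}$, is exactly the pair count ${\displaystyle a}$, since two elements are together in both partitions only when they share a cell of the table. The Python functions `contingency_table` and `adjusted_rand_index` are hypothetical names, and the value returned in the degenerate case is an assumed convention.

```python
from collections import Counter
from math import comb


def contingency_table(x, y):
    """n_ij = |X_i ∩ Y_j|, stored as a Counter keyed by (label in X, label in Y)."""
    if len(x) != len(y):
        raise ValueError("x and y must label the same n elements")
    return Counter(zip(x, y))


def adjusted_rand_index(x, y):
    """Hubert-Arabie style adjusted Rand index computed from the contingency table."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two elements")
    table = contingency_table(x, y)
    row_sums = Counter(x)  # a_i = |X_i|
    col_sums = Counter(y)  # b_j = |Y_j|

    index = sum(comb(n_ij, 2) for n_ij in table.values())  # equals the pair count a
    sum_rows = sum(comb(a_i, 2) for a_i in row_sums.values())
    sum_cols = sum(comb(b_j, 2) for b_j in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Only happens when both partitions are all singletons or both are a
        # single block; returning 1.0 for these identical partitions is an assumed convention.
        return 1.0
    return (index - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))              # approximately -0.5
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # approximately 0.242 (8/33)
```

The first call gives a negative value because the two partitions agree on fewer pairs than expected by chance (their Rand index is only 1/3). For a cross-check, scikit-learn provides `sklearn.metrics.adjusted_rand_score`.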

## References

1. Journal article.
2. Journal article.
3. Conference paper (PDF).
4. http://i11www.iti.uni-karlsruhe.de/extra/publications/ww-cco-06.pdf