The Rand index[1] or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements, this is the adjusted Rand index. From a mathematical standpoint, Rand index is related to the accuracy, but is applicable even when class labels are not used.
Rand index
Definition
Given a set of
elements
and two partitions of
to compare,
, a partition of S into r subsets, and
, a partition of S into s subsets, define the following:
The Rand index,
, is:[1][2]

Intuitively,
can be considered as the number of agreements between
and
and
as the number of disagreements between
and
.
Properties
The Rand index has a value between 0 and 1, with 0 indicating that the two data clusters do not agree on any pair of points and 1 indicating that the data clusters are exactly the same.
In mathematical terms, a, b, c, d are defined as follows:
for some
Adjusted Rand index
The adjusted Rand index is the corrected-for-chance version of the Rand index.[1][2][3] Though the Rand Index may only yield a value between 0 and +1, the Adjusted Rand Index can yield negative values if the index is less than the expected index.[4]
The contingency table
Given a set
of
elements, and two groupings (e.g. clusterings) of these points, namely
and
, the overlap between
and
can be summarized in a contingency table
where each entry
denotes the number of objects in common between
and
:
.
Definition
The adjusted form of the Rand Index, the Adjusted Rand Index, is
, more specifically
where
are values from the contingency table.
References
External links