# Nearest centroid classifier

Jump to navigation
Jump to search

In machine learning, a **nearest centroid** or **nearest prototype classifier** is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation.

When applied to text classification using tf*idf vectors to represent documents, the nearest centroid classifier is known as the **Rocchio classifier** because of its similarity to the Rocchio algorithm for relevance feedback.^{[1]}

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.^{[2]}

## Algorithm

- Training procedure: given labeled training samples with class labels , compute the per-class centroids where is the set of indices of samples belonging to class .
- Prediction function: the class assigned to an observation is .