Shapley–Shubik power index


In statistics and data mining, k-medians clustering[1][2] is a cluster analysis algorithm. It is a variation of k-means clustering in which each cluster's centroid is determined by calculating the median instead of the mean. This has the effect of minimizing error over all clusters with respect to the 1-norm distance metric, as opposed to the squared 2-norm distance metric minimized by k-means.

This relates directly to the k-median problem, which is the problem of finding $k$ centers such that the clusters formed by them are the most compact. Formally, given a set of data points $x$, the $k$ centers $c_1, \dots, c_k$ are chosen to minimize the sum of the distances from each $x$ to its nearest center, $\sum_{x} \min_{i} d(x, c_i)$, where $d$ is the distance measure (the 1-norm in k-medians clustering).
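
A minimal Python sketch of this objective, assuming the 1-norm as the distance (as in k-medians clustering). The function names `manhattan` and `k_median_cost` and the four sample points are illustrative, not taken from any library.

```python
from typing import Sequence

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """1-norm (Manhattan) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_median_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the 1-norm distance to the nearest center."""
    return sum(min(manhattan(p, c) for c in centers) for p in points)


points = [(0, 0), (1, 0), (10, 10), (11, 10)]
print(k_median_cost(points, [(0, 0), (10, 10)]))  # 2
```

Each point contributes only its distance to the closest center, so choosing centers that sit inside dense groups keeps the total small.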

The criterion function formulated in this way can be a better choice than the one used in k-means clustering, which sums squared distances: because the distances are not squared, points far from their center (such as outliers) have less influence on the result. The sum of distances is widely used in applications such as facility location.
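
One reason the sum of distances can be preferable is its lower sensitivity to outliers. The toy one-dimensional example below (made-up values, and a brute-force scan over integer candidates used only for illustration) finds the single center that minimizes each criterion: the sum of absolute deviations is minimized by the median, the sum of squared deviations by the mean.

```python
values = [1, 2, 3, 4, 100]  # four typical values and one outlier
candidates = range(0, 101)  # integer candidate centers to scan


def abs_cost(c: int) -> int:
    """Sum of absolute deviations: the k-medians criterion in one dimension."""
    return sum(abs(v - c) for v in values)


def sq_cost(c: int) -> int:
    """Sum of squared deviations: the k-means criterion in one dimension."""
    return sum((v - c) ** 2 for v in values)


best_abs = min(candidates, key=abs_cost)
best_sq = min(candidates, key=sq_cost)
print(best_abs, best_sq)  # 3 22
```

The single outlier drags the squared-error center from the bulk of the data to 22, while the absolute-error center stays at the median, 3.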

The algorithm uses Lloyd-style iteration, which alternates between an expectation (E) step and a maximization (M) step, making it an expectation–maximization algorithm. In the E step, every object is assigned to its nearest median. In the M step, each median is recomputed by taking the median of its cluster's values separately in each dimension.
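
The two steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the name `k_medians`, the caller-supplied initial centers, the rule that an empty cluster keeps its previous center, and the stopping test (stop when no median changes, or after `max_iter` rounds) are all assumptions made for the sketch.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """1-norm (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points: List[Point], init: List[Point],
              max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-medians with caller-supplied initial centers."""
    centers = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # E step: assign every point to its nearest center under the 1-norm.
        labels = [min(range(len(centers)), key=lambda j: manhattan(p, centers[j]))
                  for p in points]
        # M step: recompute each center as the per-dimension median of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
                continue
            new_centers.append(tuple(median(m[d] for m in members)
                                     for d in range(len(old))))
        if new_centers == centers:
            break  # converged: no median changed in this round
        centers = new_centers
    return centers, labels


points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centers, labels = k_medians(points, init=[(1, 1), (9, 9)])
print(centers, labels)  # [(1, 1), (8, 8)] [0, 0, 0, 1, 1, 1]
```

The only difference from a Lloyd-style k-means loop is the distance (1-norm instead of squared 2-norm) and the update (per-dimension median instead of mean).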

Medians and medoids

Because the median is computed separately in each dimension, each attribute of the resulting median comes from the data set (or, when a cluster has an even number of members, is the average of two values from it). This makes the algorithm more reliable for discrete or even binary data sets. The median vector as a whole, however, is not necessarily an instance from the data set, because its attributes may come from different instances.
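
Both points can be checked on a tiny binary data set. The three vectors below are made up for illustration; an odd number of instances is used so that each per-dimension median is an actual data value rather than an average of two middle values.

```python
from statistics import median

# Three binary instances (toy data).
data = [(1, 0, 1), (1, 1, 0), (0, 1, 1)]

dims = range(len(data[0]))
coord_median = tuple(median(row[d] for row in data) for d in dims)
coord_mean = tuple(sum(row[d] for row in data) / len(data) for d in dims)

print(coord_median)          # (1, 1, 1): every attribute is 0 or 1
print(coord_median in data)  # False: the median vector is not an instance
print(coord_mean)            # fractional values, not valid binary attributes
```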

This algorithm is often confused with the k-medoids algorithm. However, a medoid must be an actual instance from the data set, whereas for the (multivariate) median only the individual attribute values need to come from the data set. The median can therefore combine attributes from several instances; in that case it does not exist in the original data and thus cannot be a medoid.
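
A small comparison, assuming the medoid is defined as the data point with the smallest total 1-norm distance to all points. The vectors (0, 1), (1, 0) and (2, 2) are chosen here purely as an illustration, and the helper names are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[int, ...]


def manhattan(a: Point, b: Point) -> int:
    """1-norm (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(center: Point, points: List[Point]) -> int:
    """Sum of 1-norm distances from every point to a single center."""
    return sum(manhattan(center, p) for p in points)


def coordinate_median(points: List[Point]) -> Point:
    """Per-dimension median; may combine attributes from different points."""
    return tuple(median(p[d] for p in points) for d in range(len(points[0])))


def medoid(points: List[Point]) -> Point:
    """Data point with the smallest total 1-norm distance (first one on ties)."""
    return min(points, key=lambda c: total_cost(c, points))


points = [(0, 1), (1, 0), (2, 2)]
med = coordinate_median(points)
mdd = medoid(points)
print(med, total_cost(med, points))  # (1, 1) 4
print(mdd, total_cost(mdd, points))  # (0, 1) 5
```

The median (1, 1) takes its first coordinate from (1, 0) and its second from (0, 1), so it is not in the data, yet it has a lower total distance than any medoid candidate; (0, 1) and (1, 0) tie at 5.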

Software

  • ELKI includes various k-means variants, including k-medians.
  • GNU R includes k-medians in the "flexclust" package.
  • Stata provides k-medians clustering through its cluster kmedians command.

References


  1. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Prentice-Hall, 1988.
  2. P. S. Bradley, O. L. Mangasarian, and W. N. Street, "Clustering via Concave Minimization," in Advances in Neural Information Processing Systems, vol. 9, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, pp. 368–374.