G-2016-81
On strategies to fix degenerate k-Means-means solutions
, , , and
BibTeX referenceThe \(k\)-means is a benchmark algorithm used in cluster analysis. It belongs to the large category of
heuristics based on location-allocation steps that alternately locate cluster centers and allocate data points to them
until no further
improvement is possible. Such heuristics are known to suffer from a phenomenon called degeneracy in which some of the clusters are empty. In this paper, we compare and propose a series of strategies to circumvent degenerate solutions during a \(k\)-means execution. 
Our computational experiments
show that these strategies are effective leading to better clustering solutions in the vast majority of 
the cases in which degeneracy appears in  \(k\)-means. Moreover, we compare the use of our fixing strategies within \(k\)-means against the use of two initialization methods found in the literature. These results demonstrate how useful the proposed strategies can be, specially inside memory-based clustering algorithms.
Published October 2016 , 22 pages