ACTA

HK-Means: A Heuristic Approach to Initialize and Estimate the Number of Clusters in Biological Data

D. Reddy Edla, V. Gondlekar and V. Gauns
National Institute of Technology Goa, Farmagudi, Goa, India

The K-means algorithm is one of the simplest and fastest clustering algorithms and has been in use for more than four decades. One of its limitations is that the number of clusters must be specified in advance. The algorithm is also sensitive to the random initialization of its cluster centers. This paper proposes a heuristic that initializes the cluster centers and estimates the number of clusters as a discrete value. The method estimates the number of clusters and initializes many cluster centers successfully when the clusters are dense and well separated. In each iteration it selects a new cluster center, namely the point that is most dissimilar from the previously chosen points. The proposed algorithm was tested on various synthetic data sets, and the results are encouraging.
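
The center-selection idea described above (each new center is the point most dissimilar from those already chosen) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Euclidean distance as the dissimilarity measure, measures dissimilarity to a set of centers as the distance to the nearest one, takes the number of centers k as given rather than estimating it, and starts from an arbitrary first point. The name farthest_first_centers and its parameters are hypothetical.

```python
import math


def farthest_first_centers(points, k, first_index=0):
    """Pick k initial centers by farthest-first (maximin) selection.

    Assumptions (not taken from the paper): Euclidean distance,
    a fixed k, and an arbitrary starting point given by first_index.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    chosen = [first_index]
    # min_dist[i] is the distance from point i to its nearest chosen center.
    min_dist = [math.dist(p, points[first_index]) for p in points]

    while len(chosen) < k:
        # The next center is the point farthest from every chosen center.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Refresh nearest-center distances with the newly added center.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))

    return [points[i] for i in chosen]


# Usage: three visually separated groups, one center picked from each.
sample = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (0.0, 10.0)]
centers = farthest_first_centers(sample, k=3)
```

The returned points could then seed a standard K-means run. How the proposed heuristic decides when to stop adding centers, and therefore how it estimates the number of clusters, is not specified in this abstract and is not modeled here.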

DOI: 10.12693/APhysPolA.130.78
PACS numbers: 07.05.Tp, 29.85.Fj, 93.85.Bc