A Kernel Density Smoothing Method For Determining An Optimal Number Of Clusters In Continuous Data
Price
Free (open access)
Volume
47
Pages
14
Page Range
165 - 178
Published
2014
Size
610 kb
Paper DOI
10.2495/RISK140151
Copyright
WIT Press
Author(s)
J. Bugrien, K. Mwitondi & F. Shuweihdi
Abstract
While data clustering algorithms are becoming increasingly popular across scientific, industrial and social data mining applications, model complexity remains a major challenge. Most clustering algorithms do not incorporate a mechanism for finding an optimal scale parameter that corresponds to an appropriate number of clusters. We propose a kernel-density smoothing-based approach to data clustering. Its main ideas derive from two unsupervised clustering approaches: kernel density estimation (KDE) and scale-space clustering (SSC). The novel method determines the optimal number of clusters by first finding dense regions in the data and then separating them based on data-dependent parameter estimates. The optimal number of clusters is determined from different levels of smoothing after the inherent number of arbitrarily shaped clusters has been detected without a priori information. We demonstrate the applicability of the proposed method under both nested and non-nested hierarchical clustering methodologies. Simulated and real data results are presented to validate the performance of the method, with repeated runs showing high accuracy and reliability.
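The abstract describes choosing the number of clusters from how the density landscape changes across levels of smoothing. The following is a generic illustration of that scale-space idea, not the authors' method: it assumes one-dimensional data and a Gaussian kernel, counts the modes of a kernel density estimate over a log-spaced range of bandwidths, and takes the mode count that persists over the longest consecutive run of bandwidths as the estimate. The helper names (`gaussian_kde`, `count_modes`, `most_persistent_count`), the bandwidth range and the persistence rule are hypothetical choices made for this sketch.

```python
import math
import statistics
from statistics import NormalDist


def gaussian_kde(data, h):
    """Return a callable evaluating a 1-D Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def count_modes(data, h, grid_size=300):
    """Count local maxima of the KDE evaluated on an evenly spaced grid.

    A grid point is a mode if it is strictly higher than its left neighbour and
    at least as high as its right neighbour, so a tied pair at a peak counts once.
    """
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    density = gaussian_kde(data, h)
    ys = [density(lo + i * step) for i in range(grid_size)]
    return sum(1 for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


def most_persistent_count(data, n_scales=30, lo_frac=0.01, hi_frac=1.0):
    """Scan bandwidths on a log scale; return the mode count that stays constant
    over the longest consecutive run of scales, plus the (bandwidth, count) profile.

    Bandwidths run from lo_frac to hi_frac times the sample standard deviation,
    an arbitrary data-dependent range assumed here for illustration.
    """
    s = statistics.pstdev(data)
    ratio = (hi_frac / lo_frac) ** (1.0 / (n_scales - 1))
    bandwidths = [lo_frac * s * ratio ** j for j in range(n_scales)]
    counts = [count_modes(data, h) for h in bandwidths]

    best_k, best_len, run_len = counts[0], 0, 0
    for j, k in enumerate(counts):
        # Extend the current run if the count is unchanged, otherwise start a new run.
        run_len = run_len + 1 if j > 0 and k == counts[j - 1] else 1
        if run_len > best_len:
            best_k, best_len = k, run_len
    return best_k, list(zip(bandwidths, counts))


# Hypothetical test data: three well-separated groups of 30 points each,
# placed deterministically at normal quantiles around 0, 10 and 20.
data = [NormalDist(c, 1.0).inv_cdf((i + 0.5) / 30)
        for c in (0.0, 10.0, 20.0) for i in range(30)]

k, profile = most_persistent_count(data)
print("estimated number of clusters:", k)
for h, m in profile:
    print(f"h = {h:7.3f}  modes = {m}")
```

On this synthetic data, the smallest bandwidths split the sparse tails of each group into spurious modes and the largest merge everything into a single mode, while three modes persist over the widest band of intermediate bandwidths. The longest-run rule is only one possible persistence criterion, and its answer depends on the chosen bandwidth range and grid resolution.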
Keywords
BASINS -1, data clustering, data mining, kernel density estimation, local optimization, scale-space clustering, supervised learning, unsupervised learning.