WIT Press


A Kernel Density Smoothing Method For Determining An Optimal Number Of Clusters In Continuous Data

Price

Free (open access)

Volume

47

Pages

14

Page Range

165 - 178

Published

2014

Size

610 kb

Paper DOI

10.2495/RISK140151

Copyright

WIT Press

Author(s)

J. Bugrien, K. Mwitondi & F. Shuweihdi

Abstract

While data clustering algorithms are becoming increasingly popular across scientific, industrial and social data mining applications, model complexity remains a major challenge. Most clustering algorithms do not incorporate a mechanism for finding an optimal scale parameter that corresponds to an appropriate number of clusters. We propose , a kernel-density smoothing-based approach to data clustering. Its main ideas derive from two unsupervised clustering approaches: kernel density estimation (KDE) and scale-space clustering (SSC). The novel method first finds dense regions in the data and then separates them using data-dependent parameter estimates. The optimal number of clusters is determined from different levels of smoothing, after the inherent number of arbitrarily shaped clusters has been detected without a priori information. We demonstrate the applicability of the proposed method under both nested and non-nested hierarchical clustering methodologies. Simulated and real data results are presented to validate the performance of the method, with repeated runs showing high accuracy and reliability.
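The general idea of reading a cluster count off several smoothing levels can be illustrated with a minimal sketch. This is not the authors' method: it is a generic one-dimensional scale-space heuristic that counts the modes of a Gaussian KDE over a range of bandwidths and keeps the count that persists over the most levels. The function names, the bandwidth range, and the simulated three-cluster data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def count_modes(density):
    """Count strict interior local maxima of a sampled density curve."""
    inner = density[1:-1]
    return int(np.sum((inner > density[:-2]) & (inner > density[2:])))

def modes_by_bandwidth(samples, bandwidths, grid_size=512):
    """Return the number of KDE modes at each smoothing level."""
    pad = 3.0 * max(bandwidths)  # leave room for the widest kernel
    grid = np.linspace(samples.min() - pad, samples.max() + pad, grid_size)
    return [count_modes(gaussian_kde_1d(samples, grid, h)) for h in bandwidths]

def most_persistent_count(counts):
    """Return the mode count observed at the largest number of levels."""
    values, freq = np.unique(counts, return_counts=True)
    return int(values[np.argmax(freq)])

# Hypothetical data: three well-separated Gaussian clusters (an assumption).
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(-4.0, 0.5, 200),
    rng.normal(0.0, 0.5, 200),
    rng.normal(4.0, 0.5, 200),
])

# Log-spaced bandwidths from under-smoothed to over-smoothed (assumed range).
bandwidths = np.geomspace(0.1, 3.0, 30)
counts = modes_by_bandwidth(samples, bandwidths)
k = most_persistent_count(counts)
print("modes per bandwidth:", counts)
print("most persistent number of modes:", k)
```

Small bandwidths tend to produce many spurious modes and large ones merge everything into a single mode, so the count that stays stable over the widest span of bandwidths is a natural candidate for the number of clusters. The paper's data-dependent parameter estimates and its handling of arbitrarily shaped clusters are not reproduced here.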

Keywords

BASINS -1, data clustering, data mining, kernel density estimation, local optimization, scale-space clustering, supervised learning, unsupervised learning.