A Fully Sensitive Correlation Measure For Data Mining

R. J. G. B. Campello; E. R. Hruschka

doi:10.2495/DATA080041

WIT Press

Price: Free (open access)

Transaction: WIT Transactions on Information and Communication Technologies

Page Range: 35-41

Published: 2008

Size: 195 kb

Abstract

This paper introduces a novel sequence correlation measure that is fully sensitive to both the ranks and the magnitudes of the sequences under evaluation. This measure can be more appropriate than existing ones in application scenarios in which such full sensitivity is desired. The applicability of the new measure in data mining tasks is motivated.

Keywords: correlation indexes, clustering analysis.

1 Introduction

A problem that appears in different contexts of data analysis is that of comparing two sequences A = {a1, a2, ..., an} and B = {b1, b2, ..., bn} for which there is a total order relation (≤) on their elements. This problem can be addressed by means of correlation indexes, such as the well-known Pearson correlation coefficient [1, 2]. Aside from the wide applicability of such indexes in statistics [3, 4], there are also several possible scenarios for their application to data mining tasks. One example is the use of sequence correlation indexes for feature selection as a pre-processing step for data clustering or classification [5]. Another is the measurement of similarities in bioinformatics data sets [6]. For example, sequences A and B can refer to the responses of a given pair of genes along a set of experiments (e.g. microarray) [7]. Since the trend of such responses plays a fundamental role in describing the function and behavior of the corresponding genes, correlation indexes have been widely used as measures of similarity when dealing with this sort of data.
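To make the distinction between rank sensitivity and magnitude sensitivity concrete, the following minimal sketch compares two existing indexes on small made-up sequences. It is an illustration under stated assumptions, not the measure proposed in the paper, which this excerpt does not define. The Pearson coefficient is the one cited in the introduction; Spearman's rank correlation is not named in the excerpt and is used here only as a familiar rank-only index. The function names and the sequences A, B and C are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(dev_a, dev_b))
    var_a = sum(x * x for x in dev_a)
    var_b = sum(y * y for y in dev_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_a * var_b)


def average_ranks(seq):
    """1-based ranks of seq; tied values share the average of their positions."""
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    ranks = [0.0] * len(seq)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and seq[order[end + 1]] == seq[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson applied to the ranks (assumed rank-only index)."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical sequences: B and C are ordered like A but differ in magnitudes.
A = [1, 2, 3, 4, 5]
B = [1, 4, 9, 16, 25]
C = [1, 2, 3, 4, 500]

for name, seq in (("B", B), ("C", C)):
    print(name, "pearson:", round(pearson(A, seq), 4), "spearman:", round(spearman(A, seq), 4))
```

In this example the rank-only index cannot tell B from C, since both preserve the ordering of A, whereas Pearson responds to the change in magnitudes. Neither function is meant to stand in for the fully sensitive measure the abstract announces.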
