A Multi-criteria Decision Making Approach In Feature Selection For Enhancing Text Categorization
Price
Free (open access)
Volume
35
Pages
11
Published
2005
Size
420 KB
Paper DOI
10.2495/DATA050081
Copyright
WIT Press
Author(s)
S. Doan & S. Horiguchi
Abstract
This paper considers the problem of feature selection in text categorization. Previous work on feature selection has often used a filter model, in which features are ranked by a measure and then selected according to a given threshold. In this paper, we present a novel approach to feature selection based on multi-criteria decision making for each feature. Instead of a single criterion, multiple criteria of a feature are used, and a procedure based on a threshold for each criterion is proposed. This framework seems to be suitable for text data and can be applied to feature selection in text categorization. Experimental results on the Reuters-21578 benchmark data show that our approach is promising and enhances the performance of a text categorization system.
1 Introduction
Feature selection has recently attracted interest in the machine learning and data mining communities [1, 2, 3]. Up to now, the two most common approaches have been the filter and the wrapper [4, 5]. Both approaches use prior knowledge as a heuristic indicator for selecting the optimal feature subset. The filter approach uses measurements of features as the criterion. In [6], Huang listed four measurements of features belonging to the filter approach, namely information, distance, dependence and consistency measures. Based on a measurement, the optimal subset of features is chosen by filtering out the “noise” or “irrelevant” features. Apart from the filter model, the wrapper approach based on the criterion of
Keywords