A Comparison Of Some Classification Techniques
Price: Free (open access)
Volume: 28
Pages:
Published: 2002
Size: 561 kb
Paper DOI: 10.2495/DATA020551
Copyright: WIT Press
Author(s): P S S Coelho & N F F Ebecken
Abstract
Classification assigns labels, or classes, to distinguish groups of objects. In general, these labels are known beforehand from objects that have already been classified. In Data Mining tasks the objects are records, i.e., each is described by a set of attributes, which may be categorical or continuous. The objective is to build models that characterize the class of each record from its attributes (values, distribution, pattern, etc.). Many techniques for record classification are available today, and they differ in the heuristics they use. This article compares some of the most popular ones: Decision Trees, Bayesian Algorithms (Statistical Methods), Classification Based on Rule Induction, and Classification Based on Association Rules. Predictive accuracy was the main criterion for the comparison. Speed, robustness, scalability and interpretability are also discussed, but they were not quantified for a mathematical comparison. The classification models were built from two relational tables of real data. The first contains data about meteorological conditions in the region of the International Airport of Rio de Janeiro; it has 26482 records with 19 variables, one of which is the class label. The second comes from an insurance company and has 130143 records with 63 independent variables (attributes) and one dependent variable (the class label). Both tables were prepared beforehand. The results of the comparison are presented in tables.

1 Introduction

It can be considered that the activities of Data Mining are concentrated in the development of models that represent some knowledge contained in the data
Keywords