Performance Analysis of Attribute Selection Methods in Decision Tree Induction
Department of Computer Science & Information Technology
Abstract
Decision tree learning algorithms have been used successfully in expert systems to capture
knowledge. The main task these systems perform is to apply inductive methods to the attribute
values of an unknown object in order to determine its appropriate classification according to
decision tree rules. Decision trees are one of the most effective forms for representing and
evaluating the performance of algorithms, owing to several attractive features: simplicity,
comprehensibility, the absence of parameters, and the ability to handle mixed-type data. Many
decision tree algorithms are available, including ID3, C4.5, CART, CHAID, QUEST, GUIDE, CRUISE,
and CTREE. In this paper, I apply the attribute selection methods of ID3, C4.5, and CART
(information gain, gain ratio, and Gini index, respectively) to meteorological data collected
between 2004 and 2008 in the city of Kathmandu, Nepal. A data model for the meteorological data
was developed and used to train decision trees with each of these attribute selection methods,
and the performance of the methods was compared using standard performance metrics.
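As a rough illustration of the three attribute selection measures named in the keywords (information gain for ID3, gain ratio for C4.5, Gini index for CART), the Python sketch below scores one categorical attribute on a tiny, made-up weather sample. The data, the function names, and the simplified CART search (only "value vs. rest" binary splits rather than every subset of values) are illustrative assumptions, not the code used in this study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the value the attribute takes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """ID3 criterion: entropy of the labels minus weighted entropy after the split."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information,
    i.e. the entropy of the attribute's own value distribution."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

def best_binary_gini(values, labels):
    """Simplified CART-style search: try each 'value == v vs. rest' split and
    return (v, weighted Gini impurity) for the lowest-impurity split."""
    n = len(labels)
    best = None
    for v in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x == v]
        right = [y for x, y in zip(values, labels) if x != v]
        if not right:
            continue  # splitting off every instance is not a real split
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (v, score)
    return best

# Hypothetical toy sample (not the Kathmandu data).
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
rain_next = ["no", "no", "yes", "yes", "no"]

print(round(information_gain(outlook, rain_next), 3))  # 0.571
print(round(gain_ratio(outlook, rain_next), 3))        # 0.375
value, score = best_binary_gini(outlook, rain_next)
print(value, round(score, 3))                          # sunny 0.267
```

Gain ratio divides by the split information to penalize attributes with many distinct values, which plain information gain tends to favour.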
Cross-validation is used to test the resulting model, i.e., the decision tree. Specifically,
10-fold cross-validation partitions the dataset into 10 subsets; in each round, nine subsets (90%
of the data) are used for training and the remaining subset (10%) for testing. This
train-and-test cycle is repeated ten times so that each subset serves as the test set once.
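The 10-fold procedure can be sketched in plain Python as follows. This is a minimal, unstratified version (many tools stratify folds by class; this sketch does not), and `k_fold_indices`, `cross_validated_accuracy`, and the majority-class stand-in learner are hypothetical names chosen for illustration. The optional `repetitions` argument reruns the whole 10-fold procedure with fresh shuffles; with the default of 1 it is a single 10-fold run.

```python
import random
from collections import Counter

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, fit, predict, k=10, repetitions=1, seed=0):
    """Pooled accuracy of k-fold cross-validation, averaged over `repetitions` runs.

    Within one run, each fold is the test set exactly once (about 1/k of the data)
    while the other k-1 folds (about (k-1)/k of the data) are used for training.
    """
    rng = random.Random(seed)
    n = len(y)
    run_accuracies = []
    for _ in range(repetitions):
        correct = 0
        for test_idx in k_fold_indices(n, k, rng):
            test_set = set(test_idx)
            train_idx = [i for i in range(n) if i not in test_set]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
        run_accuracies.append(correct / n)
    return sum(run_accuracies) / len(run_accuracies)

# Hypothetical stand-in learner: always predicts the training set's majority class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = list(range(100))
y = ["rain"] * 30 + ["no rain"] * 70
print(cross_validated_accuracy(X, y, fit_majority, predict_majority))  # 0.7
```

Any decision tree learner could be passed in place of `fit_majority` and `predict_majority`, as long as `fit` returns a model and `predict` maps a model and one instance to a label.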
Experimental results show that the CART decision tree achieves slightly higher accuracy on a
large dataset than ID3 and C4.5, while in terms of speed C4.5 is better than the other two
algorithms. The CART decision tree has an average system accuracy of 80.9315%, a system error
rate of 19.0685%, a precision of 83.1%, and a recall of 83.1%. The C4.5 decision tree has an
average system accuracy of 80.6849%, a system error rate of 19.3151%, a precision of 82%, and a
recall of 84.4%. The ID3 decision tree has an average system accuracy of 28.08%, a system error
rate of 4.08%, a precision of 89.4%, and a recall of 91.3%. In terms of time to complete, C4.5
finishes in 0.05 seconds and ID3 in 0.32 seconds, whereas CART takes 251.82 seconds.
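The metrics quoted above are normally derived from a confusion matrix. The sketch below shows the usual definitions for a two-class case using made-up counts, not figures from this study; the study's weather classes may be multi-class, and the abstract does not state how per-class precision and recall were averaged. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, error rate, precision and recall for one positive class,
    computed from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,                     # each instance is either right or wrong
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of predicted positives that are correct
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of actual positives that are found
    }

# Hypothetical counts for a "rain" vs. "no rain" classifier (not results from this study).
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.85, 'error_rate': 0.15, 'precision': 0.8, 'recall': 0.889}
```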
Keywords: Data Mining, Classification, Classifier, ID3, C4.5, CART, Supervised Learning,
Unsupervised Learning, Decision Tree, Information Gain, Gain Ratio, Gini Index.