Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/10185
Title: Performance Analysis of Attribute Selection Methods in Decision Tree Induction
Authors: Yogi, Ganesh
Keywords: Data Mining;Classification;Supervised Learning;Decision Tree
Issue Date: 2018
Publisher: Department of Computer Science & Information Technology
Institute Name: Central Department of Computer Science and Information Technology
Level: Masters
Abstract: Decision tree learning algorithms have been used successfully in expert systems to capture knowledge. The main task in these systems is to apply inductive methods to the given attribute values of an unknown object in order to determine its appropriate classification according to decision tree rules. The decision tree is one of the most effective forms for representing and evaluating the performance of algorithms because of its attractive features: simplicity, comprehensibility, the absence of parameters, and the ability to handle mixed-type data. Many decision tree algorithms are available, including ID3, C4.5, CART, CHAID, QUEST, GUIDE, CRUISE, and CTREE. In this paper, I apply the attribute selection methods of ID3, C4.5, and CART to meteorological data collected between 2004 and 2008 in the city of Kathmandu, Nepal. A data model for the meteorological data was developed and used to train the decision tree with each of these attribute selection methods. The performance of the methods was compared using standard performance metrics. The resulting decision tree models were tested with 10-fold cross-validation, which partitions the dataset into 10 parts and, in each fold, uses 90% of the data for training and 10% for testing. This testing is repeated ten times. The experimental results show that the CART decision tree is slightly more accurate on a large dataset than ID3 and C4.5. In terms of speed, C4.5 is faster than the other two algorithms. The CART decision tree has an average system accuracy rate of 80.9315%, a system error rate of 19.0685%, a precision rate of 83.1%, and a recall rate of 83.1%. Similarly, the C4.5 decision tree has an average system accuracy rate of 80.6849%, a system error rate of 19.3151%, a precision rate of 82%, and a recall rate of 84.4%. The ID3 decision tree has an average system accuracy rate of 28.08%, a system error rate of 4.08%, a precision rate of 89.4%, and a recall rate of 91.3%. In terms of completion time, C4.5 completes in 0.05 seconds and ID3 in 0.32 seconds, whereas CART completes in 251.82 seconds. Keywords: Data Mining, Classification, Classifier, ID3, C4.5, CART, Supervised Learning, Unsupervised Learning, Decision Tree, Information Gain, Gain Ratio, Gini Index.
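The abstract compares three attribute selection measures: information gain (ID3), gain ratio (C4.5), and the Gini index (CART). The following is a minimal Python sketch of how these measures are commonly defined for a categorical attribute. It is not taken from the thesis: the function names and the toy `wind` and `rain` lists are hypothetical, and the Gini computation uses a multiway split for brevity, whereas CART itself normally evaluates binary splits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """ID3-style measure: entropy before the split minus weighted entropy after."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5-style measure: information gain normalised by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info


def gini_reduction(values, labels):
    """CART-style measure (multiway simplification): drop in Gini impurity."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - weighted


if __name__ == "__main__":
    # Hypothetical toy records, not the Kathmandu meteorological data.
    wind = ["high", "high", "low", "low"]
    rain = ["yes", "yes", "no", "no"]
    print(information_gain(wind, rain))
    print(gain_ratio(wind, rain))
    print(gini_reduction(wind, rain))
```

In each case the attribute with the highest score would be chosen for the next split. Gain ratio penalises attributes with many distinct values, which plain information gain tends to favour.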
URI: https://elibrary.tucl.edu.np/handle/123456789/10185
Appears in Collections: Computer Science & Information Technology

Files in This Item:
File: All thesis.pdf
Size: 1.61 MB
Format: Adobe PDF

