A Comparative Study on Document Categorization Using Apriori Algorithm and Naive Bayes Algorithm
Department of Computer Science and Information Technology
Abstract
Automatic document classification is the process of assigning an electronic text document to a
specific category based on its contents. This dissertation addresses document classification as a
means of arranging electronic documents automatically. Document classification has applications in
computer science, information science, library science, and newspaper classification. Typical uses
include spam filtering, news article classification, pornography classification, document indexing,
and email routing. Automated document classification can be approached with supervised,
unsupervised, or semi-supervised machine learning techniques. This dissertation uses both
unsupervised and supervised techniques: the Apriori Algorithm is associated with unsupervised
machine learning, while the Naive Bayes Classifier is a supervised machine learning method.
Training and testing are carried out on documents from three classes: Graphics, Guns, and Sports.
System performance is measured by accuracy and F1 measure, and the Apriori Algorithm performed
better than Naive Bayes.
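To make the supervised baseline and the two evaluation measures concrete, the following is a minimal Python sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, scored by accuracy and F1 over the Graphics, Guns, and Sports classes. The multinomial variant, the whitespace tokenizer, macro averaging of F1, and the toy documents are all illustrative assumptions, not details taken from the dissertation, and the numbers it prints say nothing about the reported results.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model (assumed variant)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    # Log prior P(c) = (documents in class c) / (all documents).
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (ties go to the first class seen)."""
    priors, word_counts, vocab = model
    v = len(vocab)
    best_class, best_score = None, -math.inf
    for c, log_prior in priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothed likelihood P(w | c).
                score += math.log((word_counts[c][w] + 1) / (total + v))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "pixel render image", "polygon shading render",
    "rifle pistol ammo", "handgun rifle license",
    "team goal match", "score team league",
]
train_labels = ["Graphics", "Graphics", "Guns", "Guns", "Sports", "Sports"]
test_docs = ["render image", "rifle ammo", "team score", "render pistol render"]
test_labels = ["Graphics", "Guns", "Sports", "Guns"]

model = train_nb(train_docs, train_labels)
preds = [predict_nb(model, d) for d in test_docs]
acc = accuracy(test_labels, preds)
f1 = macro_f1(test_labels, preds, ["Graphics", "Guns", "Sports"])
print(preds)  # ['Graphics', 'Guns', 'Sports', 'Graphics']
print(f"accuracy={acc:.4f} macro_F1={f1:.4f}")  # accuracy=0.7500 macro_F1=0.7778
```

The last test document is deliberately ambiguous so that one prediction is wrong, which shows how accuracy (3 of 4 correct) and macro F1 (mean of per-class F1 values of 2/3, 2/3, and 1) can differ on the same predictions.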
