A Comparative Study on Document Categorization Using Apriori Algorithm and Naive Bayes Algorithm

Publisher

Department of Computer Science and Information Technology

Abstract

Automatic document classification is the process of assigning an electronic text document to a specific category based on its contents. This dissertation studies document classification as a means of arranging electronic documents automatically. Document classification has applications in computer science, information science, and library science, including spam filtering, news article classification, pornography classification, document indexing, and email routing. The problem of automated document classification can be addressed with supervised, unsupervised, or semi-supervised machine learning techniques. This dissertation uses both unsupervised and supervised techniques: the Apriori algorithm is associated with unsupervised learning, while the Naive Bayes classifier is a supervised method. Training and testing are carried out on three classes of documents: Graphics, Guns, and Sports. System performance is measured by accuracy and F1 measure, on which the Apriori algorithm performed better than Naive Bayes.
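
As an illustration of the two evaluation measures named in the abstract, the following minimal Python sketch computes accuracy and per-class F1 for the three document classes. It rests on several assumptions: the abstract does not say how F1 was aggregated across classes, so macro-averaging is used here only as one common choice; the function names are hypothetical; and the label lists are toy data invented for the example, not results from the dissertation.

```python
from collections import Counter

# Class names taken from the abstract.
CLASSES = ["Graphics", "Guns", "Sports"]


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred, classes=CLASSES):
    """F1 for each class, treating that class as positive (one-vs-rest)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


# Toy labels for illustration only; these are not the dissertation's data.
y_true = ["Graphics", "Graphics", "Guns", "Guns", "Sports", "Sports"]
y_pred = ["Graphics", "Guns", "Guns", "Guns", "Sports", "Graphics"]

print("accuracy:", accuracy(y_true, y_pred))
print("per-class F1:", f1_per_class(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Running the same two functions on the predictions of each classifier over a shared test set would give the kind of side-by-side comparison the abstract describes.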
