A Comparative Study of Naive Bayes and Support Vector Machine Classifier for Nepali News Classification
Date
2015
Abstract
Automated document classification is the task of assigning a given document to a class of
interest. Text classification is a subset of document classification, since a document can be
text, an image, music, etc. Document classification has many applications in library science,
information science, computer science, and other fields. It can be used for intellectual
categorization of documents, document indexing, spam filtering, email routing, language
identification, genre classification, etc.
The problem of automated document classification can be solved in a supervised, unsupervised,
or semi-supervised way. Most learning and classification algorithms use document attributes
and human inference to learn from and classify the given documents. In this dissertation work,
several Natural Language Processing (NLP) techniques are used for document processing and
attribute selection, and two learning-based classification techniques are used: Support
Vector Machine (SVM) and the Naive Bayes classifier.
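As an illustration of the learning-based approach, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the implementation used in the dissertation: the whitespace tokenization, the function names `train_naive_bayes` and `predict`, and the English placeholder documents are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_doc_counts = Counter(labels)      # class -> number of training documents
    word_counts = defaultdict(Counter)      # class -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()               # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_doc_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for one document."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_tokens = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue                                  # skip tokens unseen in training
            # Add-one (Laplace) smoothing avoids zero probabilities
            smoothed = (word_counts[label][token] + 1) / (total_tokens + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data (English placeholders, not the thesis dataset)
train_docs = ["market bank profit", "match goal team",
              "bank loan market", "team player match"]
train_labels = ["Business", "Sports", "Business", "Sports"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "bank profit loan")
```

A real system for Nepali text would replace the whitespace split with language-specific preprocessing and feature selection.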
For the evaluation of the system, we created Nepali text datasets for five classes of
documents: Business, Crime, Education, Health, and Sports. There are two separate datasets
for training and testing the system. The SVM classification system has an average system
accuracy of 86.34%, a precision of 84%, and a recall of 94.4%. Similarly, the Naive Bayes
classification system has an average system accuracy of 88.8%, a precision of 92.23%,
and a recall of 88.87%.
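The reported metrics can be illustrated with a short sketch that computes accuracy together with macro-averaged precision and recall for a multi-class problem. The abstract does not state how the per-class values were averaged, so macro averaging is an assumption here, as are the function name `evaluate` and the toy label lists.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)   # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)   # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)   # members of c that were missed
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    # Macro averaging (assumed): unweighted mean over classes
    macro_precision = sum(precisions) / len(classes)
    macro_recall = sum(recalls) / len(classes)
    return accuracy, macro_precision, macro_recall


# Made-up labels for illustration only, not results from the thesis
y_true = ["Business", "Business", "Sports", "Sports", "Health"]
y_pred = ["Business", "Sports", "Sports", "Sports", "Health"]
accuracy, macro_precision, macro_recall = evaluate(y_true, y_pred)
```

Macro averaging weights every class equally, so precision and recall on a small class count as much as on a large one.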
Keywords:
Automated Document Categorization, Text Classification, Natural Language Processing,
Nepali Language, Preprocessing, Feature Extraction, Artificial Neural Networks, Support
Vector Machine, Naive Bayes Classifier
Keywords
Document categorization, Text classification, Nepali language