Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/15407
Title: A Comparative Study of Naive Bayes and Support Vector Machine Classifier for Nepali News Classification
Authors: Ayer, Ganesh Bahadur
Keywords: Document categorization;Text classification;Nepali language
Issue Date: 2015
Publisher: A Comparative Study of Naive Bayes and Support Vector Machine Classifier for Nepali News Classification
Institute Name: Central Department of Computer Science and Information Technology
Level: Masters
Abstract: Automated document classification is the task of assigning a given document to a class of interest. Text classification is a subset of document classification, since a document can be text, an image, music, etc. Document classification has many applications in library science, information science, computer science and other fields. It can be used for intellectual categorization of documents, document indexing, spam filtering, email routing, language identification, genre classification, etc. The problem of automated document classification can be solved in a supervised, unsupervised or semi-supervised way. Most learning and classification algorithms use document attributes and human inference to learn and classify given documents. In this dissertation work, several Natural Language Processing (NLP) techniques are used for document processing and attribute selection. Two learning-based classification techniques are used, namely Support Vector Machine (SVM) and the Naive Bayes classifier. For the evaluation of the system, we have created Nepali text datasets for five classes of documents: Business, Crime, Education, Health and Sports. There are two separate datasets, one for training and one for testing the system. The SVM classification system has an average system accuracy rate of 86.34%, a precision rate of 84% and a recall rate of 94.4%. Similarly, the Naive Bayes classification system has an average system accuracy rate of 88.8%, a precision rate of 92.23% and a recall rate of 88.87%. Keywords: Automated Document Categorization, Text Classification, Natural Language Processing, Nepali language, Preprocessing, Feature extraction, Artificial Neural Networks, Support Vector Machine, Naive Bayes Classifier
URI: https://elibrary.tucl.edu.np/handle/123456789/15407
Appears in Collections: Computer Science & Information Technology

Files in This Item:
File              Description  Size     Format
Cover page.pdf                 1.99 MB  Adobe PDF
Chapter page.pdf               2.48 MB  Adobe PDF

