Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/15409
Title: Performance Analysis of Nepali Text Classification using Back Propagation and Naive Bayes Algorithm
Authors: Maharjan, Jamuna
Keywords: Text classification;Feature extraction
Issue Date: 2014
Publisher: Department of Computer Science & Information Technology
Institute Name: Central Department of Computer Science and Information Technology
Level: Masters
Abstract: Automated document classification is the task of assigning a given document to a class of interest. Text classification is a subset of document classification, since a document can be text, image, music, etc. Document classification has many applications in library science, information science, computer science, and other fields. It can be used for intellectual categorization of documents, document indexing, spam filtering, email routing, language identification, genre classification, etc. The problem of automated document classification can be solved in a supervised, unsupervised, or semi-supervised way. Most learning and classification algorithms use document attributes and human inference to learn and classify given documents. In this dissertation, several Natural Language Processing (NLP) techniques are used for document processing and attribute selection, and two learning-based classification techniques are used: Artificial Neural Network (ANN) and Naive Bayes classifier. ANN is a biologically inspired model of a learning system, and the Naive Bayes classifier is a probability-based classification technique. For the evaluation of the system, we created Nepali text datasets for five classes of documents: Business, Crime, Education, Health, and Sports. There are two separate datasets for training and testing the system. The training set contains a total of 1253 documents: 243 for Business, 147 for Crime, 250 for Education, 270 for Health, and 343 for Sports. Similarly, the testing dataset contains a total of 89 documents: 19 for Business, 20 for Crime, 12 for Education, 19 for Health, and 19 for Sports. Training and testing are done by splitting the training set into two sets while keeping the testing set separate. Experimental results show that the feed-forward multilayer perceptron (MLP) based neural network classifier has a lower classification error rate than the Naive Bayes based classifier.
The MLP classification system has an average system accuracy of 87.55%, a system error rate of 12.44%, a precision of 80.29%, a recall of 93.41%, and an F-score of 86.55%. Similarly, the Naive Bayes classification system has an average system accuracy of 87.09%, a system error rate of 12.90%, a precision of 79.37%, a recall of 93.87%, and an F-score of 86.05%. Keywords: Automated Document Categorization, Text Classification, Natural Language Processing, Nepali Language, Preprocessing, Feature Extraction, Artificial Neural Networks, Multilayer Perceptron, Naive Bayes Classifier
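To illustrate what a probability-based classifier of the kind named in the abstract does, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the thesis implementation: the abstract does not describe the preprocessing, features, or smoothing actually used, so the function names `train_naive_bayes` and `classify`, the smoothing choice, and the toy English tokens standing in for preprocessed Nepali tokens are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Estimate log priors and per-class word counts.

    docs: list of (tokens, label) pairs. Hypothetical input format,
    assumed here; the thesis preprocessing is not described in the abstract.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_doc_counts.values())
    log_priors = {
        label: math.log(count / total_docs)
        for label, count in class_doc_counts.items()
    }
    return log_priors, word_counts, vocab


def classify(tokens, log_priors, word_counts, vocab):
    """Return the class with the highest posterior log-probability."""
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total_words = sum(word_counts[label].values())
        score = log_prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) avoids zero probabilities.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: English placeholders for two of the five classes.
training_docs = [
    (["match", "goal", "team"], "Sports"),
    (["team", "player", "score"], "Sports"),
    (["market", "bank", "profit"], "Business"),
    (["bank", "loan", "market"], "Business"),
]
model = train_naive_bayes(training_docs)
print(classify(["team", "goal"], *model))    # prints "Sports"
print(classify(["bank", "profit"], *model))  # prints "Business"
```

Scores are accumulated in log space so that products of many small word probabilities do not underflow on long documents.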
URI: https://elibrary.tucl.edu.np/handle/123456789/15409
Appears in Collections:Computer Science & Information Technology

Files in This Item:
File: Full Thesis.pdf
Size: 1.55 MB
Format: Adobe PDF

