Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/20414
Title: Support vector machines based part of speech tagging for Nepali text
Authors: Shahi, Tej Bahadur
Keywords: Machine learning;Computational linguistics
Issue Date: 2012
Publisher: Department of Computer Science and Information Technology
Institute Name: Central Department of Computer Science and Information Technology
Level: Masters
Abstract: Accurate part-of-speech tagging is of great importance in many fields of natural language processing, such as machine translation, information extraction, word sense disambiguation and speech recognition. Because of the nature of the Nepali language, the tagset used and the size of the corpus (training data), building an accurate part-of-speech tagger is challenging. This study aims to build an analytical machine learning model with which the attainable accuracy can be determined. To this end, a support vector machine (SVM) based part-of-speech tagger was developed and tested on various inputs to verify its accuracy. The SVM tagger constructs a feature vector for each word in the input and classifies the word with binary classifiers, each of which separates one tag from all the others (one-vs-rest). The performance analysis covers known words, unknown words and the size of the training data. The tagger is limited to a fixed set of features and uses a small dictionary, both of which affect its performance. The tagger was observed to learn well from a small set of training data, and its rate of learning increases as the training size grows.
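The one-vs-rest scheme described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy English corpus, the feature set (word form, suffixes, neighbouring words), the hyperparameters, and all function names are assumptions chosen for illustration, and the linear SVMs are trained with plain subgradient descent on the regularised hinge loss rather than with any particular SVM library.

```python
# Toy tagged corpus: hypothetical English placeholders, not the thesis data.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]


def features(words, i):
    """Sparse binary features for the word at position i (assumed feature set)."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return ["bias", "w=" + w, "suf1=" + w[-1:], "suf2=" + w[-2:],
            "prev=" + prev_w, "next=" + next_w]


def score(weights, feats):
    """Linear decision value: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in feats)


def train_binary(examples, epochs=30, lr=0.1, lam=0.01):
    """Linear SVM for one tag: subgradient descent on the L2-regularised hinge loss.

    examples is a list of (feature_list, y) pairs with y in {+1, -1}.
    """
    weights = {}
    for _ in range(epochs):
        for feats, y in examples:
            margin = y * score(weights, feats)
            for f in weights:              # shrink step from the L2 penalty
                weights[f] *= 1.0 - lr * lam
            if margin < 1.0:               # hinge loss is active: move towards y
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + lr * y
    return weights


def train_one_vs_rest(sentences):
    """Train one binary classifier per tag (that tag = +1, every other tag = -1)."""
    data = []
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, (_, gold) in enumerate(sent):
            data.append((features(words, i), gold))
    tags = sorted({gold for _, gold in data})
    return {t: train_binary([(f, 1 if gold == t else -1) for f, gold in data])
            for t in tags}


def tag(words, models):
    """Give each word the tag whose binary classifier scores it highest."""
    return [max(models, key=lambda t: score(models[t], features(words, i)))
            for i in range(len(words))]


models = train_one_vs_rest(TRAIN)
print(tag(["the", "dog", "runs"], models))
```

Each tag gets its own binary classifier, and a word receives the tag whose classifier returns the largest decision value. Unknown words have no word-form feature with a learned weight, so in a sketch like this their tag depends on the suffix and context features alone.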
URI: https://elibrary.tucl.edu.np/handle/123456789/20414
Appears in Collections: Computer Science & Information Technology

Files in This Item:
File               Size        Format
chapter page.pdf   2.33 MB     Adobe PDF
cover page.pdf     296.06 kB   Adobe PDF

