Support vector machines based part of speech tagging for Nepal text

Shahi, Tej Bahadur

Files

chapter page.pdf (2.28 MB)

cover page.pdf (296.06 KB)

Date

2012

Authors

Shahi, Tej Bahadur

Publisher

Department of Computer Science and Information Technology

Abstract

Optimal part-of-speech tagging is of great importance in various fields of natural language processing, such as machine translation, information extraction, word sense disambiguation and speech recognition. Because of the nature of the Nepali language, the tagset used and the size of the corpus (training data), building an accurate part-of-speech tagger is a challenging problem. This study aims to build an analytical machine learning model from which the attainable accuracy can be determined. To this end, a support vector machine (SVM) based part-of-speech tagger has been developed and tested on various input instances to verify its accuracy. The SVM tagger constructs a feature vector for each word in the input and, for each tag, classifies the word into one of two classes (one-vs-rest). The performance analysis considers known words, unknown words and the size of the training data. The present SVM-based tagger is limited to a certain set of features and uses a small dictionary, both of which affect its performance. The tagger's learning behaviour was also observed: it learns well from a small set of training data, and its rate of learning increases as the training set grows.
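
The one-vs-rest scheme described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the feature set (word identity, suffixes, neighbouring words), the three-tag toy tagset, the romanized example sentences, and the training routine, which is a plain stochastic subgradient descent on the L2-regularized hinge loss standing in for whatever SVM solver the author actually used.

```python
import random

def features(words, i):
    """Sparse binary features for the word at position i (hypothetical feature set)."""
    w = words[i]
    return {
        "bias",
        "word=" + w,
        "suffix2=" + w[-2:],
        "suffix3=" + w[-3:],
        "prev=" + (words[i - 1] if i > 0 else "<s>"),
        "next=" + (words[i + 1] if i + 1 < len(words) else "</s>"),
    }

def train_binary_svm(samples, epochs=100, eta=0.1, lam=0.01, seed=0):
    """Linear SVM trained by stochastic subgradient descent on the
    L2-regularized hinge loss. samples: list of (feature_set, y), y in {+1, -1}."""
    rng = random.Random(seed)
    samples = list(samples)
    weights = {}
    for _ in range(epochs):
        rng.shuffle(samples)
        for feats, y in samples:
            margin = y * sum(weights.get(f, 0.0) for f in feats)
            # Regularization step: shrink every weight slightly.
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Hinge-loss step: update only if the margin is violated.
            if margin < 1.0:
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + eta * y
    return weights

def train_one_vs_rest(tagged_sentences):
    """Train one binary SVM per tag: that tag (+1) versus all other tags (-1)."""
    examples = []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (_, tag) in enumerate(sent):
            examples.append((features(words, i), tag))
    tags = sorted({tag for _, tag in examples})
    return {
        tag: train_binary_svm([(f, 1 if t == tag else -1) for f, t in examples])
        for tag in tags
    }

def tag_sentence(models, words):
    """Assign each word the tag whose binary classifier gives the highest score."""
    result = []
    for i in range(len(words)):
        feats = features(words, i)
        best = max(sorted(models),
                   key=lambda t: sum(models[t].get(f, 0.0) for f in feats))
        result.append(best)
    return result

# Toy training data (illustrative only; not the corpus or tagset of the thesis).
TRAIN = [
    [("ram", "NN"), ("ghar", "NN"), ("jancha", "VB")],
    [("sita", "NN"), ("kitab", "NN"), ("padhcha", "VB")],
    [("hari", "NN"), ("bhat", "NN"), ("khancha", "VB")],
    [("ramro", "JJ"), ("kitab", "NN")],
    [("thulo", "JJ"), ("ghar", "NN")],
]

models = train_one_vs_rest(TRAIN)
print(tag_sentence(models, ["sita", "ghar", "jancha"]))
```

Because each tag gets its own binary classifier, a word that was never seen in training can still be tagged from its suffix and context features, which is one reason a distinction between known and unknown words matters when evaluating such a tagger.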