Support vector machines based part of speech tagging for Nepali text
Date
2012
Publisher
Department of Computer Science and Information Technology
Abstract
Optimal part-of-speech tagging has great importance in various fields of natural language
processing, such as machine translation, information extraction, word sense disambiguation,
and speech recognition. Owing to the nature of the Nepali language, the tagset used, and the size
of the corpus (training data), building an accurate part-of-speech tagger is a challenging problem.
This study aims to build an analytical machine learning model from which the attainable accuracy
can be determined. To this end, a support vector machine based part-of-speech tagger has been
developed and tested on various input instances to verify its accuracy level. The SVM tagger
constructs a feature vector for each word in the input and, for each tag, classifies the word
into one of two classes (one-vs-rest).
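The one-vs-rest scheme described above can be illustrated with a minimal sketch. It is not the tagger developed in the study: the feature set (current word, neighbouring words, two-character suffix), the hinge-loss subgradient training loop, the tag names NN and VB, and the romanized toy sentences are all assumptions chosen for illustration. One binary linear SVM is trained per tag, and each word receives the tag whose classifier scores it highest.

```python
def features(words, i):
    """Sparse binary features for the word at position i (illustrative set)."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return {"w=" + w, "prev=" + prev_w, "next=" + next_w, "suf2=" + w[-2:]}


def train_binary_svm(samples, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM trained by hinge-loss subgradient descent.

    samples is a list of (feature_set, label) pairs with label +1 or -1.
    """
    weights = {}
    for _ in range(epochs):
        for feats, y in samples:
            margin = y * sum(weights.get(f, 0.0) for f in feats)
            # L2 regularization: shrink every weight slightly.
            for f in weights:
                weights[f] *= 1.0 - lr * lam
            # Hinge loss is active only when the margin is below 1.
            if margin < 1:
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + lr * y
    return weights


def train_one_vs_rest(tagged_sentences):
    """Train one binary SVM per tag: that tag (+1) against all others (-1)."""
    samples = []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (_, tag) in enumerate(sent):
            samples.append((features(words, i), tag))
    tags = sorted({tag for _, tag in samples})
    return {
        t: train_binary_svm([(f, 1 if tag == t else -1) for f, tag in samples])
        for t in tags
    }


def tag_sentence(models, words):
    """Assign each word the tag whose binary classifier scores it highest."""
    result = []
    for i in range(len(words)):
        feats = features(words, i)
        best = max(
            sorted(models),
            key=lambda t: sum(models[t].get(f, 0.0) for f in feats),
        )
        result.append(best)
    return result


# Toy training data (hypothetical tags, romanized words).
train = [
    [("ram", "NN"), ("ghar", "NN"), ("jancha", "VB")],
    [("sita", "NN"), ("bhat", "NN"), ("khancha", "VB")],
]
models = train_one_vs_rest(train)
print(tag_sentence(models, ["ram", "ghar", "jancha"]))
```

A practical tagger would use a much richer feature set and an optimized SVM solver; the sketch only shows how per-word feature vectors and per-tag binary classifiers fit together.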
The performance analysis covers several factors: known words, unknown words, and the size of
the training data. The present support vector machine based part-of-speech tagger is limited
to a certain set of features and uses a small dictionary, both of which affect its
performance.
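A breakdown of accuracy into known and unknown words, as mentioned above, could be computed along the following lines. This is a sketch under assumptions: the function name `accuracy_breakdown`, the data layout (lists of word and tag pairs), and the toy values are hypothetical and do not come from the study. A word counts as known if it occurs in the training vocabulary.

```python
def accuracy_breakdown(train_vocab, gold, predicted):
    """Return accuracy for known words, unknown words, and all words.

    gold and predicted are equal-length lists of (word, tag) pairs;
    train_vocab is the set of words seen in the training data.
    A category with no words gets None instead of an accuracy.
    """
    counts = {"known": [0, 0], "unknown": [0, 0], "overall": [0, 0]}
    for (word, gold_tag), (_, pred_tag) in zip(gold, predicted):
        kind = "known" if word in train_vocab else "unknown"
        correct = int(gold_tag == pred_tag)
        for key in (kind, "overall"):
            counts[key][0] += correct
            counts[key][1] += 1
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}


# Toy example with hypothetical tags and predictions.
vocab = {"ram", "ghar"}
gold = [("ram", "NN"), ("ghar", "NN"), ("pokhara", "NN"), ("jancha", "VB")]
pred = [("ram", "NN"), ("ghar", "VB"), ("pokhara", "VB"), ("jancha", "NN")]
report = accuracy_breakdown(vocab, gold, pred)
print(report)
```

Repeating the same measurement for taggers trained on increasing amounts of data would give the learning behaviour with respect to training size.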
The learning performance of the tagger was observed: it learns well from a small set of
training data, and its rate of learning increases as the training size grows.
Keywords
Machine learning, Computational linguistics