Foreign Word Extraction in Nepali Texts
Publisher
Department of Computer Science and Information Technology
Abstract
In Nepali text, foreign words, mostly transliterations of English words, are used frequently.
Foreign words are usually important index terms in information retrieval, since most of them
are technical terms or names. Accurate foreign word extraction is therefore important for
high-performance information retrieval. In this study we present a foreign word extraction
method for Nepali text documents. To extract foreign words accurately, we developed a
framework based on rule-based syllabification.
The performance analysis considers several factors, including known words, unknown words
and the size of the training data. The supervised rule-based syllabification approach studied
here is limited because Nepali and English words can share the same syllable structure, and
because it uses a small dictionary, which affects its performance.
In this study, efficacy was evaluated on over 12,000 syllabified words taken from different
daily online news sites. The analysis is based on measures such as precision and recall.
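As a rough illustration of the two measures named above, the following minimal sketch computes precision and recall for a set of extracted foreign words against a hand-labelled reference set. The function name `precision_recall` and the sample words are hypothetical and are not taken from the study.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items against a gold set.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold items
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: 4 words extracted, 3 of them correct, 6 in the gold set.
extracted_words = ["computer", "internet", "mobile", "ghar"]
gold_words = ["computer", "internet", "mobile", "doctor", "school", "bus"]
precision, recall = precision_recall(extracted_words, gold_words)
```

Here three of the four extracted words are correct, and three of the six gold words are found, so the sketch yields a precision of 0.75 and a recall of 0.5.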
In this dissertation, we present a syllabification algorithm for the Nepali language.
Syllabification is the task of identifying the syllables in a word. Correct syllabification
rules and algorithms are mainly used in text-to-speech systems to improve the naturalness of
synthesized speech. We propose an algorithm based on syllable rule matching, which achieved
a precision of 83% and a recall of 63%.
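The abstract does not list the syllable rules themselves, so the following is only a sketch of what rule matching over Devanagari text could look like, not the dissertation's algorithm. It assumes a single orthographic rule (an optional consonant cluster joined by virama, a consonant, an optional vowel sign or final virama, and an optional modifier, or else an independent vowel with an optional modifier). The names `SYLLABLE` and `syllabify` are hypothetical.

```python
import re

# Unicode ranges from the Devanagari block (assumed rule inventory).
CONS = "\u0915-\u0939"    # consonants ka..ha
VOWEL = "\u0904-\u0914"   # independent vowels
MATRA = "\u093E-\u094C"   # dependent vowel signs
VIRAMA = "\u094D"         # halant, joins consonants into a cluster
MOD = "\u0901-\u0903"     # chandrabindu, anusvara, visarga

# One orthographic syllable: (C + virama)* C (matra | virama)? modifier?
# or an independent vowel followed by an optional modifier.
SYLLABLE = re.compile(
    f"(?:[{CONS}]{VIRAMA})*[{CONS}](?:[{MATRA}]|{VIRAMA})?[{MOD}]?"
    f"|[{VOWEL}][{MOD}]?"
)


def syllabify(word):
    """Split a Devanagari word into orthographic syllables by rule matching."""
    return SYLLABLE.findall(word)


# The word "Nepal" in Devanagari splits into three units: ne, pa, la.
print(syllabify("\u0928\u0947\u092A\u093E\u0932"))
```

This sketch segments by written syllable units only. It ignores pronunciation effects such as schwa deletion, and characters outside the listed ranges (for example nukta forms) are skipped, so a real rule set would need to be considerably richer.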