Foreign Word Extraction in Nepali Texts

Khadka, Diksha

Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/15432

Full metadata record

DC Field	Value	Language
dc.contributor.author	Khadka, Diksha	-
dc.date.accessioned	2023-03-05T04:58:05Z	-
dc.date.available	2023-03-05T04:58:05Z	-
dc.date.issued	2014	-
dc.identifier.uri	https://elibrary.tucl.edu.np/handle/123456789/15432	-
dc.description.abstract	Foreign words, mostly transliterations of English words, are used frequently in Nepali text. They are usually very important index terms in information retrieval, since most of them are technical terms or names, so accurate foreign word extraction is important for high retrieval performance. In this study we present a foreign word extraction method for Nepali text documents. To extract foreign words accurately, we developed a framework using rule-based syllabification. The performance analysis covers several components, such as known words, unknown words, and the size of the training data. The supervised rule-based syllabification approach studied here is limited because Nepali and English words can share the same syllable structure, and because it uses a small dictionary, which affects its performance. The evaluation used over 12,000 syllabified words taken from different daily online news sites, and the analysis is based on measures such as precision and recall. In this dissertation, we also present a syllabification algorithm for the Nepali language. Syllabification is the task of identifying the syllables in a word; correct syllabification rules and algorithms are mainly used in text-to-speech systems to improve the naturalness of the synthesized speech. We propose an algorithm based on syllable rule matching, which achieved a precision of 83% and a recall of 63%.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Department of Computer Science and Information Technology	en_US
dc.subject	Foreign word	en_US
dc.subject	Nepali texts	en_US
dc.title	Foreign Word Extraction in Nepali Texts	en_US
dc.type	Thesis	en_US
local.institute.title	Central Department of Computer Science and Information Technology	en_US
local.academic.level	Masters	en_US
Appears in Collections:	Computer Science & Information Technology
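
Note on the evaluation measures: the abstract reports precision and recall for the extracted foreign words. The sketch below shows how these two measures are conventionally computed from a set of extracted words and a hand-labelled reference set. It is illustrative only and is not taken from the thesis; the function name `precision_recall` and the romanized sample words are hypothetical.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted foreign words.

    extracted: words the extractor labelled as foreign (hypothetical input)
    gold:      words a human annotator labelled as foreign (hypothetical input)
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted words that really are foreign.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of the truly foreign words that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical romanized examples, not data from the thesis.
gold = {"kampyutar", "intarnet", "mobail", "skul"}
extracted = {"kampyutar", "intarnet", "ghar"}
p, r = precision_recall(extracted, gold)
```

Under this definition, a precision higher than recall, as reported in the abstract, means that most extracted words were truly foreign while a larger share of the foreign words in the text went undetected.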

Files in This Item:

File	Description	Size	Format
Full Thesis (14).pdf		2.04 MB	Adobe PDF
