A RULE BASED STEMMER FOR NEPALI

dc.contributor.author: KOIRALA, PRAVESH
dc.date.accessioned: 2022-01-07T07:18:00Z
dc.date.available: 2022-01-07T07:18:00Z
dc.date.issued: 2017-11
dc.description.abstract: Stemming is an integral part of Natural Language Processing. It is a preprocessing step in almost every NLP application, and arguably its most important use is in Information Retrieval (IR). While much work has been done on stemming for languages like English, Nepali stemming has only a few notable works. This study focuses on creating a rule-based stemmer for Nepali text. Specifically, it is an affix-stripping system that identifies two different types of suffixes in Nepali grammar and strips them separately. Only a single negation prefix, न, is identified and stripped. The study applies a number of techniques, namely exception word identification, morphological normalization, word transformation and stemming limit enforcement, to increase stemming performance. The stemmer is tested intrinsically using Paice’s method and extrinsically on a basic tf-idf based IR system. In testing, the under-stemming error was found to be 5.27% and the over-stemming error 0.2%, which the study reports as better than existing works. The IR system was tested on stemmed versus non-stemmed documents and queries using 14 queries, and the stemming scheme was found to increase the average relevance of retrieved documents by 18.6%.
dc.identifier.uri: https://elibrary.tucl.edu.np/handle/20.500.14540/7151
dc.language.iso: en
dc.publisher: Pulchowk Campus
dc.subject: Nepali
dc.subject: Under-Stemming
dc.subject: Stemming
dc.subject: Over-Stemming
dc.title: A RULE BASED STEMMER FOR NEPALI
dc.type: Thesis
local.academic.level: Masters
local.affiliatedinstitute.title: Pulchowk Campus
local.institute.title: Institute of Engineering
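
The abstract describes suffix stripping combined with exception word identification and a stemming limit. The following is a minimal Python sketch of that general idea, not the thesis's actual rule set, which this record does not give. The suffix list, the exception word, the minimum stem length and the function name `stem` are all illustrative assumptions. The sketch also omits the न prefix, morphological normalization, word transformation and the separate handling of two suffix types.

```python
# Illustrative sketch only: the suffixes, exception list and length limit
# below are assumptions, not the rules used in the thesis.

# A few common Nepali plural/case suffixes (hypothetical inventory).
SUFFIXES = ("हरू", "लाई", "को", "मा", "ले")

# Words that merely look suffixed and must be left alone (hypothetical).
EXCEPTIONS = {"आमा"}

# Stemming limit: never strip below this many code points (assumed value).
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Repeatedly strip known suffixes, honoring exceptions and the limit."""
    if word in EXCEPTIONS:
        return word
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            candidate = word[: -len(suffix)]
            # Strip only if the suffix matches and the remainder is long enough.
            if word.endswith(suffix) and len(candidate) >= MIN_STEM_LEN:
                word = candidate
                stripped = True
                break
    return word


if __name__ == "__main__":
    for w in ("घरहरूमा", "रामले", "आमा"):
        print(w, "->", stem(w))
```

With these assumed rules, `घरहरूमा` loses `मा` and then `हरू`, leaving `घर`, while the exception word `आमा` is returned unchanged even though it ends in `मा`.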
Files
Original bundle
Name: Pravesh Koirala.pdf
Size: 511.54 KB
Format: Adobe Portable Document Format