Text similarity using corpus based semantic word similarity and string similarity for short Nepali texts

Manandhar, Laxman

Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/20404

Full metadata record

DC Field	Value	Language
dc.contributor.author	Manandhar, Laxman	-
dc.date.accessioned	2023-10-13T09:56:45Z	-
dc.date.available	2023-10-13T09:56:45Z	-
dc.date.issued	2013	-
dc.identifier.uri	https://elibrary.tucl.edu.np/handle/123456789/20404	-
dc.description.abstract	Similarity measures for long texts and documents have been researched for a long time, but similarity measures for short texts have received comparatively little attention. Similarity measures for short texts and sentences are now considered an important research topic because of their many applications in natural language processing and information retrieval. Determining the semantic similarity, or semantic distance, between two lexically expressed concepts is a problem that pervades much of natural language processing. This thesis addresses one of the major interests of information retrieval: textual similarity. It presents the study and implementation of a short-text similarity measure for the Nepali language, for which semantic text similarity has not yet been studied. The thesis deals with two main challenges. The first is determining the similarity of two short texts that use different lexical terms; the second is determining semantic similarity with the help of string similarity, so that minor spelling mistakes in the words of a sentence are taken into account. Such measures matter particularly in web retrieval, since users do not always spell words correctly. The Nepali language is written in the Devanagari script and has its own literature. The thesis implements and analyzes string similarity measures (a modified version of the Longest Common Subsequence and string edit distance) together with a corpus-based word similarity measure (Second Order Co-occurrence Pointwise Mutual Information) to compute an overall semantic text similarity. An improvement is also made to the way the word similarity and string similarity measures are integrated.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Department of Computer Science and Information Technology	en_US
dc.subject	Text similarity	en_US
dc.subject	Nepali text	en_US
dc.title	Text similarity using corpus based semantic word similarity and string similarity for short Nepali texts	en_US
dc.type	Thesis	en_US
local.institute.title	Central Department of Computer Science and Information Technology	en_US
local.academic.level	Masters	en_US
Appears in Collections:	Computer Science & Information Technology
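The abstract names two string similarity measures (a modified Longest Common Subsequence and string edit distance) without defining them. The following is a minimal Python sketch of both, assuming they operate on Unicode code points, so a Devanagari vowel sign such as ा counts as a separate character. The squared-LCS normalization in `normalized_lcs`, the length-based scaling in `edit_similarity`, and the equal-weight blend in the hypothetical `string_similarity` are illustrative assumptions; the thesis's actual LCS modification and its integration with SOC-PMI are not described in this record.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] holds the LCS length for the prefix of a seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: |LCS|^2 / (|a| * |b|), giving a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr[j] = min(prev[j] + 1,         # delete ch_a
                          curr[j - 1] + 1,     # insert ch_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed scaling of edit distance to [0, 1]: 1 - distance / max length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def string_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical blend of the two measures; the weight is illustrative only."""
    return weight * normalized_lcs(a, b) + (1 - weight) * edit_similarity(a, b)


if __name__ == "__main__":
    # "नेपल" is "नेपाल" with the vowel sign ा dropped (a minor spelling mistake)
    print(string_similarity("नेपाल", "नेपल"))  # 0.8
```

With five code points in "नेपाल" and four in "नेपल", the LCS has length 4 and the edit distance is 1, so both component scores are 0.8. A misspelled word therefore still scores as highly similar to its correct form, which is the behavior the abstract motivates for web retrieval.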

Files in This Item:

File	Description	Size	Format
Full thesis.pdf		1.56 MB	Adobe PDF

TUCL eLibrary