PARAPHRASE GENERATION OF NEPALI LANGUAGE IN DEVANAGARI SCRIPT USING NATURAL LANGUAGE PROCESSING

SAPKOTA, AAJAY; ACHARYA, ABINASH; BADE, ANISH; UPRETI, MAHESH

Files

Aajay sapkota et al. be report electronics apr2023.pdf (410.34 KB)

Date

2023-04-30

Publisher

I.O.E. Pulchowk Campus

Abstract

The project aims to develop a system for generating Nepali paraphrases using transformer-based models. Pre-trained models were fine-tuned on a large-scale dataset of sentence pairs, each consisting of a source sentence and its corresponding paraphrase, and their performance was evaluated on several benchmarks. To accomplish the project's objectives, several tasks were undertaken: researching and allocating resources, collecting and translating datasets, sampling, filtering, and analyzing the feasibility of the model. This comprehensive approach enabled the development of a powerful tool for generating high-quality paraphrases, which could enhance the natural language processing and generation capabilities of various applications. The quality of the generated paraphrases was assessed with mathematical and statistical metrics such as BLEU and ROUGE scores. The model demonstrated excellent performance on different datasets, showing its ability to generalize across different types of test sets. However, zero-shot evaluation produced weaker results than expected, with a low recall score on new sentences, which highlighted the need for further improvement. The model also faces significant challenges, such as entity mismatches, semantic and syntactic differences, and exact-match problems between the input sentences and their corresponding generated sentences. Furthermore, a web application was implemented that lets users input sentences and receive their paraphrases in real time, demonstrating the practicality of our approach. Overall, this research emphasizes the vast potential of advanced language models to enhance natural language processing capabilities in low-resource languages.
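The abstract reports BLEU and ROUGE scores and a low recall in zero-shot evaluation. As a minimal sketch of what those metrics measure, the Python below computes ROUGE-N precision, recall and F1 and unsmoothed sentence-level BLEU for a Nepali sentence pair. This is not the authors' evaluation code. It assumes whitespace tokenization of Devanagari text and a single reference paraphrase, and the helpers `ngrams`, `rouge_n` and `sentence_bleu` are hypothetical names introduced for illustration.

```python
from collections import Counter
import math

# Illustrative sketch only (assumption): whitespace tokenization, one reference.


def ngrams(tokens, n):
    """Return the list of n-gram tuples in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(reference, generated, n=1):
    """ROUGE-N precision, recall and F1 between a reference and a generated sentence."""
    ref = Counter(ngrams(reference.split(), n))
    gen = Counter(ngrams(generated.split(), n))
    overlap = sum((ref & gen).values())  # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(gen.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def sentence_bleu(reference, generated, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    ref_toks, gen_toks = reference.split(), generated.split()
    if not gen_toks:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        ref = Counter(ngrams(ref_toks, n))
        gen = Counter(ngrams(gen_toks, n))
        total = sum(gen.values())
        clipped = sum((ref & gen).values())
        if total == 0 or clipped == 0:
            return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
        log_sum += math.log(clipped / total)
    c, r = len(gen_toks), len(ref_toks)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum / max_n)


if __name__ == "__main__":
    reference = "राम बजार गयो"
    generated = "राम बजारतिर गयो"
    print(rouge_n(reference, generated))
    print(sentence_bleu(reference, generated))
```

With short sentences and a single reference, unsmoothed BLEU is often zero even when unigram overlap is high, which is why sentence-level BLEU is usually smoothed or reported at corpus level. Some off-the-shelf ROUGE implementations default to ASCII-oriented tokenizers, so Devanagari text may need a custom tokenizer.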
