PARAPHRASE GENERATION OF NEPALI LANGUAGE IN DEVANAGARI SCRIPT USING NATURAL LANGUAGE PROCESSING
Date
2023-04-30
Publisher
I.O.E. Pulchowk Campus
Abstract
The project aims to develop a system for generating paraphrases using transformer-based
models. Pre-trained models were fine-tuned on a large-scale dataset of sentence pairs, each
consisting of a source sentence and its corresponding paraphrase, and their performance was
evaluated on several benchmarks. To accomplish the project’s objectives,
several tasks were undertaken, such as researching and allocating resources, collecting and
translating datasets, sampling, filtering, and analyzing the feasibility of the model. The
comprehensive approach employed in the project has enabled the development of a powerful
tool for generating high-quality paraphrases, which could enhance the natural language
processing and generation capabilities of various applications. Moreover, the model scores
well on mathematical and statistical metrics such as BLEU and ROUGE, which were used to
assess paraphrase quality. Additionally, the model performed well on
different datasets, showing that it generalizes across different types of test sets. However,
the zero-shot evaluation fell short of expectations: a low recall score for
new sentences highlighted the need for further improvements in the model. Similarly,
this model faces significant challenges such as entity mismatches, semantic and syntactic
differences, and exact match problems between the input sentences and their corresponding
generated sentences. Furthermore, the implementation of a web application enabled users to
input sentences and receive their paraphrases in real time, demonstrating the practicality of
our approach. Nonetheless, this research emphasizes the vast potential of advanced language
models to enhance natural language processing capabilities in low-resource languages.
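As a rough illustration of the kind of overlap metrics mentioned in the abstract, the sketch below computes a ROUGE-1-style recall, a unigram precision (the basic ingredient of BLEU, without its brevity penalty or higher-order n-grams), and an exact-match flag for a pair of sentences. It assumes simple whitespace tokenization, which may not match the tokenizer used in the project; the function name `unigram_overlap` and the example sentences are hypothetical and are not taken from the project's code or dataset.

```python
from collections import Counter


def unigram_overlap(reference: str, candidate: str) -> dict:
    """Compare two sentences by clipped unigram overlap.

    Assumption: whitespace tokenization, which is only a rough
    approximation for Devanagari text.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()

    # Clipped overlap: each token counts at most as often as it
    # appears in both sentences.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())

    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0

    return {
        "rouge1_recall": recall,         # share of reference tokens recovered
        "unigram_precision": precision,  # share of candidate tokens found in reference
        "exact_match": ref_tokens == cand_tokens,
    }


# Illustrative sentences only (not from the project's dataset).
scores = unigram_overlap("म घर जान्छु", "म घर फर्कन्छु")
print(scores)
```

Passing the input sentence as `reference` and the generated sentence as `candidate` gives a simple way to flag the exact-match problem the abstract describes, where the output merely repeats the input.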
Description
The project aims to develop a system for generating paraphrases using transformer-based
models. Pre-trained models were fine-tuned on a large-scale dataset of sentence pairs, each
consisting of a source sentence and its corresponding paraphrase, and their performance was
evaluated on several benchmarks.
Keywords
Natural Language Processing, Transformer, ROUGE