QUESTION SIMILARITY DETECTION AND ANALYSIS

dc.contributor.author: SHRESTHA, MILAN
dc.contributor.author: SHAKYA, NISCHAL
dc.contributor.author: SWARNAKAR, NITESH
dc.contributor.author: SUBEDI, ROSHAN
dc.date.accessioned: 2023-07-30T06:26:18Z
dc.date.available: 2023-07-30T06:26:18Z
dc.date.issued: 2023-04-30
dc.description: This project explores the effectiveness of an SBERT model combined with a vector database for question similarity analysis. The project builds a vector database from a sentence transformer model trained on a large corpus of text data. The vector database is then used to retrieve questions similar to a given search query, along with their similarity scores. (en_US)
dc.description.abstract: This project explores the effectiveness of an SBERT model combined with a vector database for question similarity analysis. The project builds a vector database from a sentence transformer model trained on a large corpus of text data. The vector database is then used to retrieve questions similar to a given search query, along with their similarity scores. The model is trained on the AllNLI dataset, on paraphrase datasets such as MRPC and PAWS, and on semantic similarity datasets such as STS, and is finally adapted to a custom-prepared engineering dataset of 9,282 entries. Training uses Multiple Negatives Ranking (MNR) loss as the loss function. The model is evaluated on the STS test set and the MRPC test set. The results show that a sentence transformer model with a vector database outperforms the baseline method of keyword matching for question similarity analysis. The approach achieved a Spearman correlation of 0.863 on the STS benchmark and an accuracy of 88.7% on the MRPC test set. By comparison, the Spearman correlation reported in the SBERT paper for the NLI-large model was below 0.80. These values indicate that continued training on datasets beyond NLI improves performance, including on downstream tasks. The combination of a sentence transformer model and a vector database is therefore a promising approach to question similarity analysis, with potential implications for information retrieval systems. (en_US)
dc.identifier.uri: https://hdl.handle.net/20.500.14540/18803
dc.language.iso: en (en_US)
dc.publisher: I.O.E. Pulchowk Campus (en_US)
dc.subject: Indexing (en_US)
dc.subject: Information retrieval (en_US)
dc.subject: vector database (en_US)
dc.title: QUESTION SIMILARITY DETECTION AND ANALYSIS (en_US)
dc.type: Report (en_US)
local.academic.level: Bachelor (en_US)
local.affiliatedinstitute.title: Pulchowk Campus (en_US)
local.institute.title: Institute of Engineering (en_US)
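
The abstract describes two mechanisms: retrieving similar questions with similarity scores from a vector database, and training with MNR loss. The following is a minimal sketch of both, not the authors' implementation. It assumes cosine similarity as the scoring function, a plain Python list standing in for the vector database, and a similarity scale of 20; the function names and the toy 3-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, index, k=2):
    """Return the k (question, score) pairs most similar to query_vec.

    `index` is a list of (question, embedding) pairs standing in for the
    vector database. This is a linear scan; a real vector database would
    typically use an approximate nearest-neighbour index instead.
    """
    scored = [(question, cosine(query_vec, emb)) for question, emb in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def mnr_loss(anchors, positives, scale=20.0):
    """Multiple Negatives Ranking loss for a batch of (anchor, positive) pairs.

    For anchor i, positives[i] is the correct match and every other positive
    in the batch serves as a negative. The loss is the mean cross-entropy
    over the scaled cosine similarities. `scale` is an assumed value.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Hypothetical toy embeddings; real SBERT vectors have hundreds of dimensions.
index = [
    ("What is Ohm's law?", [0.9, 0.1, 0.0]),
    ("Define Kirchhoff's current law.", [0.1, 0.9, 0.1]),
    ("State Ohm's law.", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]
results = top_k_similar(query, index, k=2)
for question, score in results:
    print(f"{score:.3f}  {question}")
```

In this toy index the two Ohm's law questions rank above the Kirchhoff question for the given query. The loss is small when each anchor is closest to its own positive and grows when the pairs are mismatched, which is the property that pulls paraphrased questions together during training.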
Files
Original bundle
Name: Milan shrestha et al. be project report electronics and computer apr2023.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format