SHRUTI - A NEPALI BOOK READER

dc.contributor.author: PAUDEL, PRABIN
dc.contributor.author: SHAH, RAHUL
dc.contributor.author: G.C., RANJU
dc.contributor.author: KHADKA, SUPRIYA
dc.date.accessioned: 2023-07-31T06:54:48Z
dc.date.available: 2023-07-31T06:54:48Z
dc.date.issued: 2023-05
dc.description: The use of audiobook technology in the classroom has long been a viable instructional intervention for struggling readers. Shruti, an AI-driven Nepali book reader, is an application that generates speech audio for a book. It is a text-to-speech (TTS) system that takes an input book in PDF format. [en_US]
dc.description.abstract: The use of audiobook technology in the classroom has long been a viable instructional intervention for struggling readers. Shruti, an AI-driven Nepali book reader, is an application that generates speech audio for a book. It is a text-to-speech (TTS) system that takes an input book in PDF format. Text is extracted from the PDF using Optical Character Recognition (OCR) and sent to the text-to-speech pipeline. Speech synthesis proceeds in two phases: spectrogram generation and vocoder output. The extracted text is preprocessed, tokenized, and sent to a modified Tacotron2 model, which generates Mel spectrograms. The Mel spectrograms are passed to the HiFi-GAN vocoder, which produces the audio waveform. This audio is post-processed to give the final output. The synthesized speech attained a Mean Opinion Score of 4.04 for naturalness when audio samples were rated by 28 volunteers. The model has been deployed and integrated with a mobile application. [en_US]
dc.identifier.uri: https://hdl.handle.net/20.500.14540/18849
dc.language.iso: en [en_US]
dc.publisher: I.O.E. Pulchowk Campus [en_US]
dc.subject: Text-to-Speech [en_US]
dc.subject: Tacotron2 [en_US]
dc.subject: Mel-spectrogram [en_US]
dc.title: SHRUTI - A NEPALI BOOK READER [en_US]
dc.type: Report [en_US]
local.academic.level: Bachelor [en_US]
local.affiliatedinstitute.title: Pulchowk Campus [en_US]
local.institute.title: Institute of Engineering [en_US]
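
The abstract reports a Mean Opinion Score (MOS) of 4.04 for naturalness from 28 volunteers. As a rough illustration of how such a score is usually computed, the sketch below averages listener ratings on the conventional 1 to 5 scale and adds a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical; they are not taken from the report, which may have aggregated its ratings differently.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for ratings on a 1-5 scale.

    Assumption: each rating is one listener's score for naturalness.
    The interval uses a normal approximation (1.96 standard errors).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, for illustration only (not the report's data).
example_ratings = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean gives a sense of how much the score could shift with a different panel of listeners.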
Files
Original bundle
Name: Prabin Paudel et al. be report computer may 2023.pdf
Size: 2.14 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission