Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/18849
Title: SHRUTI - A NEPALI BOOK READER
Authors: PAUDEL, PRABIN
SHAH, RAHUL
G.C., RANJU
KHADKA, SUPRIYA
Keywords: Text-to-Speech; Tacotron2; Mel-spectrogram
Issue Date: May-2023
Publisher: I.O.E. Pulchowk Campus
Institute Name: Institute of Engineering
Level: Bachelor
Abstract: The use of audiobook technology in the classroom has long been a viable instructional intervention for struggling readers. Shruti, an AI-generated Nepali book reader, is an application that generates spoken audio for a book. It is a text-to-speech (TTS) system that takes a book in PDF format as input. Text is extracted from the PDF using Optical Character Recognition (OCR) and sent to the text-to-speech pipeline. Speech synthesis proceeds in two phases: spectrogram generation and vocoder output. The extracted text is preprocessed, tokenized, and sent to a modified Tacotron2 model, which generates Mel spectrograms. The Mel spectrograms are passed to the HiFi-GAN vocoder, which produces the audio waveform. This audio is post-processed to give the final output. The synthesized speech attained a Mean Opinion Score of 4.04 for naturalness when audio samples were rated by 28 volunteers. The model has been deployed and integrated with a mobile application.
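The stage order described in the abstract (OCR, preprocessing, tokenization, spectrogram generation, vocoder, post-processing) can be sketched as a chain of injected functions. This is a minimal illustration only, not the project's implementation: the class name TTSPipeline, every stub_* function, peak_normalize, and the constants N_MELS and HOP are hypothetical placeholders. The real system uses an OCR engine, a modified Tacotron2 model, and a HiFi-GAN vocoder in place of the stubs, and the record does not say what its post-processing step does.

```python
from dataclasses import dataclass
from typing import Callable, List

Mel = List[List[float]]  # one row per frame, one column per mel bin
Audio = List[float]      # mono waveform samples


@dataclass
class TTSPipeline:
    """Chains the stages named in the abstract; every stage is injected."""
    ocr: Callable[[bytes], str]
    preprocess: Callable[[str], str]
    tokenize: Callable[[str], List[int]]
    acoustic_model: Callable[[List[int]], Mel]
    vocoder: Callable[[Mel], Audio]
    postprocess: Callable[[Audio], Audio]

    def read(self, pdf_bytes: bytes) -> Audio:
        text = self.preprocess(self.ocr(pdf_bytes))
        tokens = self.tokenize(text)
        mel = self.acoustic_model(tokens)            # phase 1: spectrogram generation
        return self.postprocess(self.vocoder(mel))   # phase 2: vocoder output


# --- Placeholder stages (assumptions, NOT the project's models) ---
N_MELS = 80  # assumed number of mel bins
HOP = 4      # assumed samples per frame, kept tiny for the demo


def stub_ocr(pdf_bytes: bytes) -> str:
    return pdf_bytes.decode("utf-8")  # pretends the bytes are already text


def stub_preprocess(text: str) -> str:
    return " ".join(text.split())     # collapses runs of whitespace


def stub_tokenize(text: str) -> List[int]:
    return [ord(ch) for ch in text]   # one integer per Unicode code point


def stub_acoustic_model(tokens: List[int]) -> Mel:
    # One constant-valued frame per token; a real model predicts the frames.
    return [[(t % 7) / 7.0] * N_MELS for t in tokens]


def stub_vocoder(mel: Mel) -> Audio:
    # Repeats the first bin of each frame HOP times; a real vocoder synthesizes audio.
    return [frame[0] for frame in mel for _ in range(HOP)]


def peak_normalize(audio: Audio) -> Audio:
    # Example post-processing step: scale so the loudest sample has magnitude 1.
    peak = max((abs(s) for s in audio), default=0.0)
    return audio if peak == 0.0 else [s / peak for s in audio]


pipeline = TTSPipeline(stub_ocr, stub_preprocess, stub_tokenize,
                       stub_acoustic_model, stub_vocoder, peak_normalize)
samples = pipeline.read("नमस्ते  संसार".encode("utf-8"))
```

Passing the stages in as callables keeps the two synthesis phases separable, so a stub can be swapped for a real model without changing the read method.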
URI: https://elibrary.tucl.edu.np/handle/123456789/18849
Appears in Collections: Computer Engineering

Files in This Item:
File: Prabin Paudel et al. be report computer may 2023.pdf
Size: 2.19 MB
Format: Adobe PDF

