SHRUTI - A NEPALI BOOK READER

Abstract
The use of audiobook technology in the classroom has long been a viable instructional intervention for struggling readers. Shruti, an AI-based Nepali book reader, is an application that converts a book into speech. It is a text-to-speech (TTS) system that takes a book in PDF format as input. Text is extracted from the PDF using Optical Character Recognition (OCR) and passed to the text-to-speech pipeline. Speech synthesis proceeds in two phases: spectrogram generation and vocoding. The extracted text is preprocessed, tokenized, and fed to a modified Tacotron2 model, which generates Mel spectrograms. These Mel spectrograms are passed to the HiFi-GAN vocoder, which produces the audio waveform, and this audio is then post-processed to produce the final output. In a listening test with 28 volunteers, the synthesized speech achieved a Mean Opinion Score (MOS) of 4.04 for naturalness. The model has been deployed and integrated with a mobile application.
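The abstract does not detail the text front end or how the two synthesis phases are chained. The Python sketch below shows one plausible arrangement under stated assumptions. It uses a character-level vocabulary covering the Devanagari Unicode block, which is common for Tacotron2-style models but not confirmed for Shruti. It splits sentences on the danda (।). The trained Tacotron2 and HiFi-GAN models are passed in as plain callables. All names (`preprocess`, `split_sentences`, `tokenize`, `synthesize_book`, `SYMBOLS`) are hypothetical, and OCR is assumed to have already produced one text string per page.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Sequence

# ASSUMPTION: a character-level vocabulary. Shruti's real symbol set is not
# given in the abstract; here it is padding/end-of-sequence markers, space,
# a little punctuation, and the whole Devanagari block (U+0900 to U+097F),
# which also contains the danda (।) and the Nepali digits.
PAD, EOS = "_", "~"
SYMBOLS = [PAD, EOS, " ", ",", "?", "!"] + [chr(cp) for cp in range(0x0900, 0x0980)]
SYMBOL_TO_ID: Dict[str, int] = {s: i for i, s in enumerate(SYMBOLS)}


def preprocess(text: str) -> str:
    """Clean one page of OCR output before tokenization."""
    # Canonical Unicode form so identical-looking text maps to identical IDs.
    text = unicodedata.normalize("NFC", text)
    # Drop characters outside the vocabulary (e.g. Latin OCR noise) but keep
    # whitespace. Number and abbreviation expansion is omitted in this sketch.
    text = "".join(ch for ch in text if ch in SYMBOL_TO_ID or ch.isspace())
    # Collapse line breaks and repeated spaces left over from the page layout.
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split after the danda, '?' or '!' so each model call stays short."""
    parts = re.split(r"(?<=[।?!])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> List[int]:
    """Map characters to integer IDs and terminate with the EOS symbol."""
    return [SYMBOL_TO_ID[ch] for ch in sentence] + [SYMBOL_TO_ID[EOS]]


def synthesize_book(
    pages: Sequence[str],
    acoustic_model: Callable[[List[int]], object],
    vocoder: Callable[[object], object],
) -> List[object]:
    """Run every sentence of every page through the two synthesis phases."""
    chunks = []
    for page in pages:
        for sentence in split_sentences(preprocess(page)):
            mel = acoustic_model(tokenize(sentence))  # phase 1: token IDs -> Mel spectrogram
            chunks.append(vocoder(mel))               # phase 2: Mel spectrogram -> waveform
    return chunks
```

Passing the models as callables keeps the front end testable without GPU weights. For example, `synthesize_book(ocr_pages, tacotron2_infer, hifigan_infer)` (hypothetical names) would return one waveform chunk per sentence. The post-processing step mentioned above, such as joining chunks or normalizing volume, is not shown because its details are not given.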
Citation