SHRUTI - A NEPALI BOOK READER
Date
2023-05
Publisher
I.O.E. Pulchowk Campus
Abstract
The use of audiobook technology in the classroom has long been a viable instructional
intervention for struggling readers. Shruti, an AI-based Nepali book reader, is an
application that generates spoken narration for a book. It is a text-to-speech (TTS) system
that takes a book in PDF format as input. Text is extracted from the PDF using Optical
Character Recognition (OCR) and sent to the text-to-speech pipeline. Speech
synthesis proceeds in two phases: spectrogram generation and vocoding. The extracted
text is preprocessed, tokenized, and sent to a modified Tacotron2 model, which
generates Mel spectrograms. The Mel spectrograms are passed to the HiFi-GAN vocoder,
which produces the audio waveform. This audio is post-processed to give the final
output. Synthesized speech samples attained a Mean Opinion Score (MOS) of 4.04 for
naturalness when rated by 28 volunteers. The model has been deployed and integrated
with a mobile application.
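The flow described in the abstract (preprocess, tokenize, generate a Mel spectrogram, vocode) can be sketched roughly as follows. This is an illustrative outline only, not the project's actual code: the symbol inventory, the function names, and the character-level tokenization are all assumptions, the OCR step is omitted, and the Tacotron2 and HiFi-GAN models are stood in for by plain callables supplied by the caller.

```python
import re
import unicodedata
from typing import Callable, List

# Hypothetical symbol inventory: space, a little ASCII punctuation, and the
# Devanagari Unicode block (U+0900..U+097F, which already contains the danda).
SYMBOLS = [" ", ",", "?", "!"] + [chr(c) for c in range(0x0900, 0x0980)]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def preprocess(text: str) -> str:
    """Normalize Unicode and collapse whitespace left over from extraction."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split on the danda, '?' or '!', keeping the terminator with its sentence."""
    chunks = re.findall(r"[^।?!]+[।?!]?", text)
    return [c.strip() for c in chunks if c.strip()]


def tokenize(sentence: str) -> List[int]:
    """Map each known character to an integer ID; unknown characters are dropped."""
    return [SYMBOL_TO_ID[ch] for ch in sentence if ch in SYMBOL_TO_ID]


def synthesize(
    text: str,
    acoustic_model: Callable[[List[int]], List[List[float]]],
    vocoder: Callable[[List[List[float]]], List[float]],
) -> List[List[float]]:
    """Run both phases per sentence: token IDs -> Mel frames -> waveform."""
    waveforms = []
    for sentence in split_sentences(preprocess(text)):
        mel = acoustic_model(tokenize(sentence))  # phase 1: spectrogram generation
        waveforms.append(vocoder(mel))            # phase 2: vocoder output
    return waveforms
```

Passing the two models in as callables keeps the sketch independent of any particular framework; in a real system they would wrap the trained networks, and the per-sentence waveforms would be concatenated during post-processing.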
Keywords
Text-to-Speech, Tacotron2, Mel-spectrogram