Nepali Text to Speech using Time Domain Pitch Synchronous Overlap Add Method

Publisher

Pulchowk Campus

Abstract

Nepali text to speech has a wide range of applications. A Nepali text-to-speech synthesizer helps differently abled and illiterate people with reading. Text to speech here means converting written Nepali text into speech through text analysis followed by speech synthesis. Since Nepali is a phonetically rich language, the system uses a concatenative approach. Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) is a simple and efficient concatenative synthesis method: it cuts pre-recorded speech samples into small segments and recombines them to form new speech. The raw Nepali text first undergoes letter-to-sound conversion, which is based on diphone concatenation. The diphone dictionary is created from pre-recorded speech signals, and the speech database stores each diphone together with the start and end positions of its segment in the recorded speech. For each diphone signal, the pitch is determined using the autocorrelation method. Hanning and Hamming windows are used to window the speech signal. In the TD-PSOLA method, the windowed segments are then overlapped and added to generate the synthetic speech signal. In this way, the TD-PSOLA concatenative method has been implemented to generate synthetic speech for Nepali text.
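The two signal-processing steps named in the abstract, autocorrelation pitch detection and windowed overlap-add, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the 60 to 400 Hz search range, the evenly spaced pitch marks, and the two-period Hann segments are all assumptions made for illustration, and only the standard library is used.

```python
import math


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def estimate_pitch_period(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Return the pitch period in samples at the autocorrelation peak.

    The lag search range (f_min..f_max) is an assumed range for speech.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(signal) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def psola_resynthesize(signal, period, pitch_scale=1.0):
    """Overlap-add two-period Hann-windowed segments at a rescaled spacing.

    Assumes a constant pitch period, so analysis pitch marks are evenly
    spaced. pitch_scale > 1 raises the pitch, pitch_scale < 1 lowers it.
    """
    window = hann_window(2 * period + 1)
    new_period = max(1, round(period / pitch_scale))
    marks = list(range(period, len(signal) - period, period))
    output = [0.0] * len(signal)
    target = period
    while target < len(signal) - period:
        # Reuse the analysis segment whose pitch mark is closest to this
        # synthesis mark, then overlap-add it into the output.
        mark = min(marks, key=lambda m: abs(m - target))
        for k in range(-period, period + 1):
            output[target + k] += signal[mark + k] * window[k + period]
        target += new_period
    return output
```

With `pitch_scale=1.0` the synthesis marks coincide with the analysis marks and the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged. A real system would locate pitch marks per voiced frame and join segments taken from different diphones, which this sketch does not attempt.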

Description

Nepali text to speech has a wide range of applications. A Nepali text-to-speech synthesizer helps differently abled and illiterate people with reading.

Citation

MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING