Most current high-quality speech synthesizers offer only a few voices per language. This is mostly due to the cost of building a speech inventory for a new voice, which requires a professional speaker who articulates each utterance in a predictable way, as well as manual tuning of the phonetic segmentation of the recordings. At L2F we have been focusing our efforts on the development of tools to automate the process of creating new voices for a speech synthesizer. Precise automatic phonetic alignment has been achieved by combining HMM and DTW techniques [Paulo 2003, 2004], and the speaker's regional dialect and disfluencies have also been taken into account [Paulo 2005]. The resulting voices have been integrated into synthesizers for both limited- and unlimited-domain applications. We have been working with the Festival Speech Synthesis System, a free software synthesis toolkit and engine developed at the University of Edinburgh, using CMU's FestVox tools for building new voices and Flite, a small-footprint synthesis engine.
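To make the DTW side of the alignment concrete, the following is a minimal sketch, not the actual L2F implementation: classical dynamic time warping over two sequences of acoustic frames, returning the warping path that maps frames of one sequence onto the other (a path that can be used to transfer segment boundaries from a reference to a new recording). Real systems would align vectors of spectral features (e.g. MFCCs) with a vector distance; scalars are used here only for brevity.

```python
def dtw(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping between two frame sequences.

    Returns the total alignment cost and the warping path as a list of
    (i, j) index pairs mapping frames of seq_a onto frames of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Example: the second sequence stretches the middle frame of the first.
total, path = dtw([0, 1, 2], [0, 1, 1, 2])
# total == 0.0; path == [(0, 0), (1, 1), (1, 2), (2, 3)]
```

In a voice-building pipeline, an HMM forced alignment provides an initial phonetic segmentation, and a DTW pass of this kind can refine boundary placement by warping the new recording against a reference whose boundaries are trusted.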
The synthesizers developed at L2F have been integrated into a variety of applications, namely a dialogue system for home automation [Neto 2003, 2004] and speech output for synthetic characters [Cabral 2006a]. These applications require not only the ability to modify the rhythm and intonation of the synthesized speech, as standard speech synthesizers do, but also the ability to perform voice quality transformations that produce more expressive speech [Cabral 2005, 2006b].
Current research in this area at L2F also includes the following topics: