OUISPER: CORPUS-BASED SYNTHESIS DRIVEN BY ARTICULATORY DATA

Thomas Hueber1, Gérard Chollet2, Bruce Denby3, Maureen Stone4 & Leïla Zouari2
1Laboratoire d'Electronique, ESPCI / CNRS-LTCI, ENST; 2CNRS-LTCI, ENST; 3Laboratoire d'Electronique, ESPCI / UPMC-Paris VI; 4Dept. of Biomedical Sciences and Orthodontics, University of Maryland Dental School, Baltimore, MD, USA

ID 1513

Many applications require the production of intelligible speech from articulatory data. This paper outlines a research program (Ouisper: Oral Ultrasound synthetIc SPEech souRce) to synthesize speech from ultrasound acquisitions of tongue movement and video sequences of the lips. The video data are used to search a multistream corpus that associates images of the vocal tract and lips with the audio signal. The search is driven by the recognition of phone units using Hidden Markov Models trained on the video sequences. Preliminary results support the feasibility of this approach.
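The two-stage pipeline described above (HMM-based phone recognition on visual features, followed by selection of matching audio units from the corpus) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the phone set, the discrete observation symbols standing in for quantized visual features, and the corpus dictionary are all hypothetical placeholders, and a real system would use continuous-density HMMs over image features.

```python
import numpy as np

# Toy phone inventory; a real system would use a full phone set.
PHONES = ["a", "t", "u"]

def viterbi(obs, log_start, log_trans, log_emit):
    """Viterbi decoding: most likely phone-state sequence for a
    sequence of discrete observation symbols (stand-ins for
    quantized ultrasound/lip-video features)."""
    T, N = len(obs), log_start.shape[0]
    delta = np.full((T, N), -np.inf)   # best log-score ending in state j at t
    back = np.zeros((T, N), dtype=int) # backpointers for path recovery
    delta[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_trans[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + log_emit[j, obs[t]]
    # Trace back the best path from the final frame.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [PHONES[s] for s in reversed(path)]

def select_units(phone_seq, corpus):
    """Corpus-based synthesis step: look up an audio unit for each
    recognized phone (here just filenames in a toy dictionary)."""
    return [corpus[p] for p in phone_seq]

# Toy model: each phone emits 'its' symbol with high probability.
log_start = np.log(np.full(3, 1 / 3))
log_trans = np.log(np.full((3, 3), 1 / 3))
log_emit = np.log(np.where(np.eye(3, dtype=bool), 0.8, 0.1))

phones = viterbi([0, 0, 1, 2], log_start, log_trans, log_emit)
units = select_units(phones, {"a": "a.wav", "t": "t.wav", "u": "u.wav"})
```

In the full system the recognized phone sequence would index multistream corpus segments (ultrasound, lip video, and audio), and the concatenated audio of the selected units would form the synthetic speech output.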