Session Poster IV:

Type: poster
Chair: Helene Loevenbruck, Elina Savino
Date: Wednesday - August 08, 2007
Time: 10:00
Room: Poster Area


Jalaleddin Al-Tamimi, Laboratoire Dynamique du Langage – CNRS – Université Lyon 2 (UMR 5596), 14, Avenue Berthelot – 69007 Lyon – France
  The aim of this paper is to examine the role of dynamic cues (i.e. formant slopes obtained from a linear regression analysis) in comparison with static ones (i.e. vowel targets) in the classification of Jordanian and Moroccan vowels, using Discriminant Analysis. Ten speakers per dialect produced a list of vowels in C1VC2, C1VC2V, or C1VC2VC words, where C1 and C2 were either /b/, /d/, /d/ (pharyngealized) or /k/, and V was each vowel in turn. Results show that vowels from the two dialects can be separated in a specific consonantal environment. Using dynamic cues improves the correct classification rates by about 5% for Moroccan Arabic and 13% for Jordanian Arabic.
Poster IV-3 Tone distribution and its effect on subglottal pressure during speech
Helen M. Hanson, MIT
Janet Slifka, MIT
Stefanie Shattuck-Hufnagel, MIT
James Kobler, Massachusetts General Hospital
  The current work is part of a project to characterize the subglottal pressure (Ps) contour in terms of the distribution of pitch accents and of phrase and boundary tones. Declination of the working phase, and the transition from the working phase to the termination phase are studied. It is found that the nuclear pitch accent does not define the start of the termination phase; the utterance offset is a better marker. Declination rate of the working phase and its relation to the phrase and boundary tones at utterance offset are found to vary among speakers. These differences could result in variations in SPL and F0 that contribute to a speaker's individuality. The results have implications for models of speech production, and for applications such as computer speech synthesis and recognition.
Liane Lovatto, Laboratoire de Phonétique et de Phonologie, UMR 7018 CNRS/ Sorbonne Nouvelle, 19 rue des Bernardins, 75005 Paris
Angélique Amelot, Institut de la Communication Parlée, Université Stendhal - 1180, Avenue Centrale, BP 25, 38040 CEDEX 9 Tél. +33 (0)4 76 82 43 37; Laboratoire de Phonétique et de Phonologie, UMR 7018 CNRS/ Sorbonne Nouvelle, 19 rue des Bernardins, 75005 Paris
Lise Crevier-Buchman, Unité Voix-Parole-Déglutition Hôpital Européen G. Pompidou, 20 rue Leblanc, 75015, Paris, Chargée de Recherche CNRS, LPP/UMR 7018, Université Paris 3
Patricia Basset, Laboratoire de Phonétique et de Phonologie, UMR 7018 CNRS/ Sorbonne Nouvelle, 19 rue des Bernardins, 75005 Paris
Jacqueline Vaissière, Laboratoire de Phonétique et de Phonologie, UMR 7018 CNRS/ Sorbonne Nouvelle, 19 rue des Bernardins, 75005 Paris
  This paper examines velar movements during the production of the nasal vowels /, i, u/ in Brazilian Portuguese (BP). Velum movements were measured for a female Brazilian speaker using fiberscopic video-recording synchronized with acoustic recording. The nasal vowel (Vn) was placed in initial, medial and final positions in nonwords with the following structure: VnCoVo, CoVnCoVo, and CoVoCoVn. The oral vowel Vo was /a, i, u/ and the oral consonant Co= /p/, /b/ or /f/. Our results based on fibroscopy confirm that (i) a nasal “tail” (/N/) is clearly observed in 85% of nasal vowel productions, (ii) the nasal tail is about the same length as the previous part of the vowel. This suggests that (iii) when Vn is in medial or final position, the maximum lowering of the velum is free to occur either before the nasal tail or during it.
Poster IV-7 Implicit rate and speaker normalization in a context-rich phonetic exemplar model
Travis Wade, Institute for Natural Language Processing, University of Stuttgart
  In this study we present a model of speech perception in which (1) memory includes a single, ordered collection of acoustic cues extracted in real time at salient landmark locations from previously heard signals, and (2) identification of newly encountered sounds involves comparing the sounds and their surrounding contexts with similar sequences occurring in memory. Under these assumptions, perceptual speaker and rate normalization and context dependence in general follow implicitly from the statistics of the language environment and do not require traditionally assumed processes or levels of representation. We verify this by means of a simulation in which the model simultaneously acquires VOT and F1 cues to consonant voicing and vowel height, and their dependence on speaking rate and speaker gender, based on exposure to productions from the TIMIT database.
Poster IV-9 Acoustic and Auditory Analyses of N|uu lingual and pulmonic stop bursts
Amanda Miller, Cornell University
Johanna Brugman, Cornell University
Bonny Sands, Northern Arizona University
  We provide auditory data for the center of gravity (COG) and the resonances of two spectral peaks for the lingual stop bursts of all five N|uu click types that differ in the location of the anterior and posterior constrictions. The COG and the resonance of two spectral peaks for lingual bursts in lingual and linguo-pulmonic (LP) stops do not differ between the two sets of clicks. We provide auditory spectra and the COG in the pulmonic posterior bursts of LP stops and [k] and [q]. Results support the claim that place of articulation for the pulmonic portion of LP stops is contrastive, uvular in the case of uvular clicks and upper pharyngeal for dental and palatal clicks, as with lingual stops. LP stops differ from lingual stops in that they are contour segments on the airstream dimension: a new type of segment.
Poster IV-11 Acoustic correlates of emphasis in Arabic
Allard Jongman, University of Kansas
Wendy Herd, University of Kansas
Mohammad Al-Masri, Hashemite University
  The effects of emphasis, a secondary articulation in the posterior vocal tract, were investigated in the speech of 8 speakers of Jordanian Arabic. A number of acoustic parameters were measured in the consonants and vowels of mono- and bisyllabic minimal pairs containing plain or emphatic consonants in initial, medial, or final position. In general, the acoustic correlates of emphasis include a raised F1, lowered F2, and raised F3 in the vowel adjacent to the emphatic consonant. This pattern across the three formants suggests that emphasis involves a constriction near the epiglottis. In addition, the present results indicate that the spectral mean of the consonant itself is also a reliable acoustic correlate of emphasis. However, while the spread of emphasis can be detected throughout both vowels of bisyllabic words, only the target consonants themselves show an effect of emphasis.
Fangfang Li, Department of Linguistics, the Ohio State University
Jan Edwards, Department of Communicative Disorders, University of Wisconsin-Madison
Mary Beckman, Department of Linguistics, the Ohio State University
  Most acoustic studies of sibilant fricatives focus on languages that have a place distinction like the English distinction between coronal alveolar /s/ and coronal post-alveolar /ʃ/. Much less attention has been paid to languages such as Japanese, where the contrast involves tongue posture as much as position. That is, the Japanese sibilant that contrasts with /s/ is /ɕ/, an alveolopalatal fricative that has a “palatalized” tongue shape (a bunched predorsum). This paper describes measures that can be calculated from the fricative interval alone, which we applied both to the place distinction of English and the “palatalization” or posture distinction of Japanese. The measures were further tested on Mandarin Chinese, a language that has a three-way contrast in sibilant fricatives contrasting in both tongue position and posture.
Haruo KUBOZONO, Department of Linguistics, Kobe University
  The primary goal of this paper is to propose a new mora-based analysis of loanword accent in the South Kyungsang dialect of Korean (SKK) on the basis of data from original fieldwork. This paper first points out some critical errors in previous studies concerning the description of loanword accent in SKK. It then proposes a new, much simpler generalization based on the notion ‘mora’. Specifically, several seemingly different accent patterns can be generalized as a rule assigning an accent on the penultimate mora. This rule, as well as some other pitch features of SKK, is strikingly similar to the loanword accent of (Tokyo) Japanese. These cross-linguistic similarities can be uncovered if and only if the mora is recognized as a relevant unit of description in Korean just as it is in Japanese.
Fang Liu, Department of Linguistics, The University of Chicago
Yi Xu, Department of Phonetics and Linguistics, University College London
  The intonational realizations of statements and declarative questions in American English are studied by examining their interaction with focus and word stress. Five native speakers read 24 sentences eight times. Results of F0 analyses indicate that focus has no effect on the pre-focus region of either statements or questions. In the on-focus region, the pitch range of the stressed syllable is expanded in both statements and questions. The post-focus pitch range is compressed and lowered in statements, but compressed and raised in questions. Furthermore, the pitch target of the stressed syllable in a content word is high or falling in statements but rising in questions, depending on the focus condition and the stress pattern of the word. These results suggest that a particular combination of word stress, focus, and sentence type in an English utterance largely determines its local and global pitch contours.
Thomas Judd Magnuson, University of Victoria
  Since even before Lindau’s Story of /r/, the search for a single phonetic (acoustic or articulatory) characteristic which defines rhotics as a class has met with little success [13]. In light of an alternative way of conceptualizing the vocal tract [6, 7], however, this paper proposes that there is indeed an articulatory basis for classifying at least some phonologically rhotic speech sounds as phonetically rhotic insofar as they necessarily involve some degree of constriction or expansion of the pharynx. This paper further proposes a model (Fig. 1) of rhotic association parameters which builds on Lindau’s 1985 [13] model by providing for the contribution of the laryngeal vocal tract to the production of r-like sounds.
Natalia Segal, France Telecom R&D Lannion
Katarina Bartkova, France Telecom R&D Lannion
  Automatic speech processing has recently turned to the treatment of continuous spontaneous speech, which demands, among many other things, a representation of its prosodic organization. This paper presents a new approach to automatic prosodic boundary detection and prosodic unit structuring, based, with certain changes, on a descriptive theory of the French prosodic system initially proposed for prepared speech. This theory was transformed into a set of rules so as to create a hierarchical representation of a phrase in spontaneous French in the form of a prosodic tree. The method was manually verified and then applied to a spontaneous speech database in order to obtain a statistical description of prosodic structures.
Poster IV-23 Looking for rhythms in conversational speech
Michael O'Dell, University of Tampere
Mietta Lennes, University of Helsinki
Stefan Werner, University of Joensuu
Tommi Nieminen, University of Jyväskylä
  Our exploratory study looks for units of temporal structure in conversational Finnish speech. The relative significance of different hierarchical levels of rhythm was evaluated using Bayesian inference on a linear regression model of coupled oscillators. Results suggest that mora, stress and foot timing as rhythmic factors in Finnish are more relevant than traditionally assumed.
Saandia Ali, CNRS, Parole et Langage Université de Provence
Daniel Hirst, CNRS, Parole et Langage Université de Provence
  This paper presents a general model for the relation between representations of form and function for speech prosody on a multilingual basis. It outlines a procedure for analysing prosody by synthesis: generating formal representations from a minimal representation of prosodic functions and comparing the output with the observed data. The functional representation can then be enriched and tested to see whether it provides a closer fit to the data. This is specifically applied to the intonation patterns of British English. Five successively more complex models are presented and applied to fifteen continuous passages from the Eurom1 corpus. The quality of fit of the models is finally measured by linear correlation with hand-corrected modelled fundamental frequency curves. It is argued that such a process will provide a starting point for an analysis which eventually could provide fully automatic functional annotation of prosody on a multilingual basis.
Poster IV-27 Producing Phrasal Prominence in German
Bistra Andreeva, Institute of Phonetics, Saarland University, Saarbrücken
William J. Barry, Institute of Phonetics, Saarland University, Saarbrücken
Ingmar Steiner, Institute of Phonetics, Saarland University, Saarbrücken
  This study examines the relative change in a number of acoustic parameters usually associated with the production of prominences. The production of six German sentences under different question-answer conditions provides de-accented and accented versions of the same words in broad and narrow focus. Normalised energy, F0, duration and spectral measures were found to form a stable hierarchy in their exponency of the three degrees of accentuation.
Svetlana Stepanova, St. Petersburg State University
  This article examines the results of research conducted on different varieties of hesitation phenomena. The research, based on spontaneous speech recordings of 10 Russian speakers, compares the spectral characteristics of these speakers’ vocalizations in hesitation pauses with those of the vowels /a/ and /e/ within words from spontaneous monologues.
GONGGUAN PENG, Department of Chinese, Translation and Linguistics, City University of Hong Kong
  An interesting phenomenon in Fuzhou Chinese is the co-variation of the rhyme and the tone. The alternating rhymes assume different forms when associated with different sets of tones. Several suggestions have been put forward to explain the relation between the intrinsic pitch of the vowel and the tongue height of the vowel. However, these suggestions rely on different sources which differ somewhat in their descriptions of the number of diphthongs that show alternations depending on the tone. The considerable variation in the use of symbols suggests the desirability of more acoustic data. Results show that there are no significant differences between those finals with /e, o, a/ as the nucleus occurring in different sets of tones. The duration results for the diphthongs show that the distribution of time across the components of the falling diphthongs changes when they are associated with different sets of tones.
Poster IV-33 Post-oralized nasal consonants in Chinese dialects - Aerodynamic and acoustic data
Fang Hu, Phonetics Lab, Institute of Linguistics, Chinese Academy of Social Sciences
  Denasalization is a widely detected but not well documented and thus poorly understood phonetic or phonological process in Chinese dialects. The plain nasal consonants in Middle Chinese may remain as plain nasals such as in Wu dialects, have conditionally changed into plain fricatives or approximants such as in Mandarin dialects, or become post-oralized. This paper discusses acoustic and aerodynamic data of the post-oralized nasal consonants from four major Chinese dialect groups—Shanxi Jin dialects, Cantonese dialects around the Zhongshan area, southern Min dialects (Xiamen and Chao-Shan areas), and the Qingxin Hakka. The presented phonetic data reveal details of the denasalization process in Chinese dialects in particular and shed light on the understanding of historical sound change in general.
Poster IV-35 A Role for Phonotactic Constraints in Speech Perception
Keren Shatzman, Utrecht Institute of Linguistics OTS, University of Utrecht
René Kager, Utrecht Institute of Linguistics OTS, University of Utrecht
  This study investigated whether abstract phonotactic constraints play a role in speech processing. Dutch listeners performed an auditory lexical decision task, in which the nonword stimuli either did or did not violate a phonotactic constraint. Listeners were faster to reject nonwords that violated a phonotactic constraint. This effect remained significant even after partialling out the effects of lexical factors, such as the similarity of the nonwords to existing words in the lexicon. This finding constitutes, to our knowledge, the first demonstration of the involvement of pure abstract phonotactic constraints in on-line speech perception.
Mohamed Yeou, Université Chouaib Doukkali, El Jadida
Mohamed Embarki, Praxiling UMR 5267 CNRS-Montpellier III
Sallal Al Maqtari, Université de Sanaa
Christelle Dodane, Université Franche-Comté, Besançon
  A comparison of F0 alignment values was carried out for three Arabic dialects (Moroccan Arabic, Kuwaiti Arabic and Yemeni Arabic) using five speakers from each variety. Clear differences found in peak and valley alignments enable separation of Moroccan Arabic from the two other dialects: (1) values of the F0 valley differed significantly, with Moroccan Arabic showing a later synchronisation than Kuwaiti Arabic and Yemeni Arabic; (2) there was variation as to the effect of syllable structure on F0 peaks. The effect is not significant in Yemeni Arabic and Kuwaiti Arabic as the F0 peak is aligned within the stressed vowel in both CV: and CV:C. In Moroccan Arabic, however, the effect of syllable structure is significant: the F0 peak is earlier in closed syllables than open syllables.
Poster IV-39 Lenition of voiceless fricatives in two varieties of southern Italian
Nadia Nocchi, Phonogrammarchiv der Universität Zürich
Stephan Schmid, Phonetisches Laboratorium der Universität Zürich
  The purpose of this study is to verify if the traditionally acknowledged lenition of intervocalic plosives in the varieties of Southern Italy also applies to voiceless fricatives. Data from a recent corpus of semi-spontaneous speech collected in Naples and Palermo are analysed according to the parameters of duration, intensity, and voicing. It is demonstrated that, in the intervocalic position, the realisations of /s/ and /f/ are significantly shorter, whereas intensity does not prove to be affected by the phonotactic position; sonorization does occur, to some extent, in the Neapolitan data, being marginal among the Sicilian speakers.
Poster IV-41 Variability of rhotics in Punjabi-English bilinguals
Allen Hirson, City University London
Sohail Nabiah, City & Hackney Primary Care Trust, London
  This paper examines variation of /r/-pronunciation as a function of social identification in Punjabi-English bilinguals. This is clearly different from previous studies of linguistic stratification based upon geography, gender, age or social network [1], [2]. The study presented here examines group affiliation of second generation speakers of Punjabi in south east Britain. Specifically, the subject selection for the study partitions British-Asian speakers of English (from the Indian Subcontinent) on the basis of self-identification as either ‘British’ or as ‘Asian’. The main research question addressed was whether ‘British Asian’ speakers of English with particular social affiliations acquire the local (south east British) pattern of /r/-pronunciation, or retain features of Punjabi rhotic pronunciation.
Christelle DODANE, Laboratoire Dipralang, Université Paul Valéry Montpellier 3
Jalaleddin AL-TAMIMI, Laboratoire Dynamique du Langage, UMR CNRS 5596, Université Lumière Lyon II
  This research investigated the role of child-directed speech (CDS) in the acquisition of vowel systems from a cross-linguistic perspective. In order to determine whether vocalic systems are extended in child-directed speech and whether this extension varies cross-linguistically, child-directed speech was compared to adult-directed speech (ADS) in three different languages: French, English and Japanese. The same short story was successively read by mothers to their infant and to an adult (5 mother-infant dyads per language). The acoustic analyses reveal a downward shift of the vowel triangle on the high-low dimension of vowel space (F1). In the three languages, mothers tend to produce more open vowels in CDS than in ADS.
Poster IV-45 An Automatic Phonetic Transcription Marker as a Phonetics Teaching Tool
Laurence Paris-Delrue, Université de Lille 3
Jean-Claude Desruque, Université de Lille 3
  This paper is a report on an experiment carried out with French students of English as a second language. It aims to test the validity of combining a multimedia tool with a constructivist approach to phonetics teaching at university level.
Poster IV-47 Linguistic factors in L2 word stress acquisition: A comparison of Chinese and Vietnamese EFL learners’ development
Shu-chen OU, National University of Kaohsiung
  This paper disambiguates two linguistic factors in L2 English word stress acquisition. Chinese EFL learners have been found to prefer a syllable to be stressed when it is closed by a sonorant [5]. This non-English-like pattern is open to at least two interpretations: (a) an effect of L1 transfer, and (b) an effect of universal sonority-weight mapping. In order to evaluate the analyses, data were collected from L1 Vietnamese speakers, whose native language allows both sonorant and obstruent codas. If the Vietnamese speakers do not show a preference for sonorant codas, then the L1 transfer interpretation is supported. If both groups acquire L2 English stress in a similar way, then an effect of phonological universals might be possible. These predictions are tested in a perceptual experiment. The results support the hypothesis that the L2 English stress pattern shown by L1 Chinese speakers is due to phonological universals.
Poster IV-49 Perception of English Lexical Stress: Effect of F0 Peak Location on English and Japanese Speakers
Shinichi Tokuma, Chuo University
  This study investigated the perceptual effect of duration and F0 peak location on L1/L2 perception of English lexical stress. A nonsense bisyllabic English word embedded in a frame sentence, whose F0 was set to reach its peak after the word, was used as the stimulus in the perceptual experiment. Native English and Japanese speakers were asked to determine lexical stress locations. The results showed that in the perception of English lexical stress, F0 peaks that immediately followed the stimulus words affected the two groups of subjects in opposite ways: Japanese speakers perceived these F0 peaks as a cue to lexical stress in the preceding syllable, while English speakers perceived them as an independent prominence peak and showed perceptual stress shift. The findings also confirmed the claim by previous studies that, while the perception of Japanese subjects is scarcely affected by duration, English subjects show great sensitivity to it.
Poster IV-51 Phonetic criteria of attractive male voices
Vivien Zuta, Institute of Phonetics Frankfurt
  In German voice and language science we act on the assumption that a male’s voice has to be deep in order to leave an attractive impression on the female listener. This study shows that this is not the case and that even voices with a middle or high fundamental frequency can be judged as attractive. Furthermore, the analysis shows that a combination of various parameters is responsible for leaving an impression of any kind (positive or negative) on the listener.
Poster IV-53 Developmental changes in cerebral responses to native and non-native vowels: a NIRS study
Yasuyo Minagawa-Kawai, LSCP, EHESS-ENS-CNRS
Nahoko Nishijima, Keio University
Nozomi Naoi, Keio University
Emmanuel Dupoux, LSCP, EHESS-ENS-CNRS
Shozo Kojima, Keio University
  While newborn infants discriminate speech sounds from languages that they have never heard, 6-month-olds demonstrate the beginnings of vowel classification specific to their native language. The neuronal correlates involved in such a dramatic perceptual reorganization process, however, are not well understood. Using near-infrared spectroscopy (NIRS), this study compares the neural responses of Japanese infants at 3-4 months and 7-8 months of age as well as of adults to native ([i] vs. [uu]) and non-native vowel contrasts ([uu] vs. [u]) within pseudo-word contexts. The findings demonstrated longitudinal developmental changes of functional temporal cortex asymmetries associated with exposure to the native language.
Poster IV-55 Inhibition of Processing Due to Reduction of the American English Flap
Benjamin V. Tucker, University of Arizona
Natasha Warner, University of Arizona
  The speech we encounter in casual daily-life conversation often contains impoverished or reduced acoustic information, in comparison to careful speech, and yet listeners can understand such speech with ease. This study explores differences in processing between reduced/conversational speech and unreduced/careful speech. In a cross-modal identity priming experiment, listeners heard reduced vs. careful pronunciations of real words and then saw visual stimuli and decided whether the visual stimulus was a real word. This experiment investigates processing differences between reduced and unreduced speech using the American English flapped /d/ and word-medial /g/. American English listeners are shown to process unreduced (clear) targets more quickly than reduced targets.
Anke Blech, Department of Phoniatrics, Pedaudiology and Communication Disorders, UKAachen and Aachen University
Luise Springer, School of Logopedics, University Hospital, Aachen
Bernd J. Kröger, Department of Phoniatrics, Pedaudiology and Communication Disorders, UKAachen and Aachen University
  Purpose: Childhood Apraxia of Speech (CAS) is a developmental disorder affecting speech motor programming and planning. This study aims to investigate deviant vowel and diphthong articulations of German children with suspected CAS. Methods and Data: A corpus of 115 isolated stimulus words evoked by picture naming and 33 pseudo words evoked by repetition was collected for three German children with suspected CAS aged 5;9 to 6;3 years and for 21 controls. Perceptual and acoustic analyses were carried out in order to judge the vowel and diphthong realisations of the suspected CAS children vs. control speakers. Results: The perceptual evaluation shows vowel and diphthong errors in the suspected CAS children in contrast to the controls. Discussion: This study shows that incorrect vowel and diphthong productions can be detected in children with suspected CAS by perceptual and acoustic evaluation.
Poster IV-59 An acoustic investigation of pitch accent contrasts in the speech of a Norwegian patient with the foreign accent syndrome
Inger Moen, Department of Linguistics and Scandinavian Studies, University of Oslo
Frank Becker, Sunnaas Hospital
Live Günther, Sunnaas Hospital
Mari Berntsen, Sunnaas Hospital
  In 2005 a middle-aged Norwegian man became aphasic as a result of a left hemisphere stroke. After a few months his aphasic condition had improved. He was mildly agrammatic with word finding problems and what sounded like a foreign accent. Deviant prosody was an important feature of his foreign-sounding speech, in particular the lack of a clear distinction between the two Norwegian word tones (pitch accents). Acoustic analysis of his speech revealed limited F0 variation at word and utterance level and a similar F0 pattern on the two word tones. His deviant prosody is assumed to be the result of reduced ability to produce appropriate F0 variation, a dysarthric condition. There was no indication of apraxia of speech.
Philippe Boula de Mareüil, LIMSI-CNRS
Martine Adda-Decker, LIMSI-CNRS
Cécile Woehrling, LIMSI-CNRS
  We present data on the pronunciation of oral and nasal vowels in northern and southern French varieties. In particular, a sharp contrast exists between the fronting of the open /O/ towards [œ] in the North and the denasalisation of nasal vowels in the South. We examine how linguistic changes in progress may affect these vowels, which are governed by the left/right context, and bring to light differences between reading and spontaneous speech. This study was made possible by automatic phoneme alignment on a large corpus of over 100 speakers.
Poster IV-63 Prosodic modelling of synthesised German words
Ursula Hirschfeld, Martin-Luther-Universität Halle-Wittenberg
Rüdiger Hoffmann, Technische Universität Dresden
Friderike Lange, Martin-Luther-Universität Halle-Wittenberg
  During the development of an “exemplary” synthesis of words and phonetic words for a “speaking pronunciation dictionary”, considerable deviations from German pronunciation norms are being found, particularly in the prosodic field. On the basis of listening experiments new possibilities of modelling accent patterns arranged specifically for the German vocabulary are being tested.
Meelis Mihkla, Institute of Estonian Language
  Traditionally, durational models of speech units have been developed without paying much heed to morphology and part-of-speech information while predicting speech temporal structure. The aim of the present study was to find out whether the rich morphology of the Estonian language could provide some additional information (besides syntactic and part-of-speech information) that could be used in predicting durations. The project is a continuation of prosody studies for Estonian TTS synthesis. Sound durations in the speech of radio newsreaders were modelled by means of different statistical methods (linear regression and neural networks). Model input consisted not only of descriptors of sound context and position, but also of information on part of speech, part of sentence and morphological features. The results indicated a decrease of error in the prediction of segmental durations. Such results were in line with our expectations concerning a morphologically rich language.
Poster IV-67 Investigating Larynx Height With An Articulatory Synthesizer
Eva B. Lasarcyk, Saarland University, Germany
  In this paper, we present a comparative study of natural and synthetic speech samples which vary in larynx height. The acoustics of isolated vowels were analyzed with respect to formant frequency changes and changes in voice quality. The synthetic stimuli show the same characteristics as the natural stimuli when special attention is paid to the synthetic excitation quality. One issue addressed in this study is how the naturalness of speech synthesis can be improved by manipulating voice quality. Another is how well the articulatory speech synthesizer used matches the real speech production process.
