Session Poster III:

Poster III

Type: poster
Chair: Linda Shockey, Petur Helgason
Date: Tuesday - August 07, 2007
Time: 14:20
Room: Poster Area


Béatrice VAXELAIRE, Phonetics Institute of Strasbourg - Speech and Cognition Group
Rudolph SOCK, Phonetics Institute of Strasbourg - Speech and Cognition Group
Fabrice HIRSCH, Phonetics Institute of Strasbourg - Speech and Cognition Group
Johanna-Pascale ROY, Département de linguistique et de didactique des langues
  This investigation deals with VCV sequences produced by French speakers, with particular focus on larynx position and trajectory. X-ray data are extracted from a database for four speakers, uttering sentences or VCV sequences at two speaking rates: normal-conversational and fast. Results obtained from a frame-by-frame analysis of midsagittal profiles reveal: (1) a high positive correlation between the larynx and the hyoid bone in their vertical displacements; (2) a confirmation of previous findings that the position of the larynx is lower for high vowels than for low vowels; (3) anticipatory laryngeal gestures in both /aCu/ and /uCa/ sequences; (4) that these anticipatory gestures are resistant to the behaviour of supraglottal structures, and also to speech rate conditions.
Poster III-4 Speech Synchronization: Investigating the links between perception and action in speech production
Fred Cummins, University College Dublin
  Speakers can achieve a high degree of synchrony when reading a prepared text together. Under these constraints, there is necessarily a very tight coupling of production and perception. In a first experiment, we demonstrate that speakers can successfully synchronize with selected recordings of others obtained in a synchronous speaking condition. We then have speakers attempt to synchronize with modified recordings, in which the original recording is replaced with altered speech. The goal is to identify the physical properties of the speech signal that permit the coupling required for synchronization. It is demonstrated that the energy envelope itself is not sufficient to support coupling, while pitch information is essentially unimportant.
Louis-Jean Boë, ICP – Depart Speech and Cognition, GIPSA, CNRS, Université Stendhal, Grenoble
Jean GRANAT, Muséum National Histoire Naturelle, CNRS, Paris, France
Pierre BADIN, ICP – Depart Speech and Cognition, GIPSA, CNRS, INPG, Grenoble
Denis AUTESSERRE, ICP – Depart Speech and Cognition, GIPSA, CNRS, Université Stendhal, Grenoble
David POCHIC, École Nationale Supérieure d’Électronique, Grenoble, France
Nassim ZGA, École Nationale Supérieure d’Électronique, Grenoble, France
Nathalie HENRICH, ICP – Depart Speech and Cognition, GIPSA, CNRS, Université Stendhal, Grenoble, France
Lucie Ménard, Départ. Linguistique et Didactique des Langues, Univ. du Québec, Montréal, Canada
  The objective of this work is twofold. First, a model of the vocal tract is positioned into the bony architecture of the male and female skulls from birth to adulthood. Second, vowel spaces are determined and vowel prototypes, for the cardinal vowels, are synthesized using a simulation of the laryngeal source. Results of this modeling study during ontogeny allow for a better understanding of speech acquisition processes in infants and of vocal tract reconstruction for fossil hominids.
Poster III-8 Temporal compensation in Czech?
Pavel Machač, Institute of Phonetics, Charles University in Prague
Radek Skarnitzl, Institute of Phonetics, Charles University in Prague
  Temporal compensation on the segmental level refers to the tendency towards temporal equalization of CV (and possibly VC) sequences: a shorter duration of one segment leads to a longer duration of the neighbouring segment, and vice versa. Any comprehensive description of sound properties of a language must take the possible existence of this tendency into account. This research, which constitutes a pilot study into this area for the Czech language, addresses the question of a possible temporal effect of a consonant on a vowel (‘vocalic compensation’), and of a vowel on a consonant (‘consonantal compensation’). The results suggest that there is indeed a tendency towards bilateral temporal effects between a consonant and a vowel (CV).
Poster III-10 Relational timing or absolute duration? Cue weighting in the perception of Japanese singleton-geminate stops
Kaori Idemaru, Carnegie Mellon University
Lori Holt, Carnegie Mellon University
  Relational timing has been proposed as a solution to the problem of variability across durational properties of speech arising with changes in speaking rate. The current study investigates the role of absolute and relational timing cues in the perceptual categorization of Japanese stop length (singleton/geminate). Absolute (stop duration) and relational (ratio of stop duration to preceding mora duration) duration cues were independently varied in a categorization test. Although Ratio was shown previously to classify speakers’ productions more accurately (Idemaru, 2005), listeners’ category responses showed strong individual differences in cue use. These results demonstrate that a highly reliable acoustic cue in the distribution of cues available in speech production does not necessarily predict its primacy in speech perception.
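The two cues contrasted in the abstract above can be illustrated with a minimal sketch. The function name, the millisecond units and the example durations are assumptions made for illustration only; they are not taken from the study.

```python
def duration_cues(stop_ms, preceding_mora_ms):
    """Return the absolute and relational duration cues for one stop token.

    stop_ms: duration of the stop (hypothetical measurement, in ms).
    preceding_mora_ms: duration of the preceding mora (hypothetical, in ms).
    """
    if preceding_mora_ms <= 0:
        raise ValueError("preceding mora duration must be positive")
    return {
        "absolute_ms": stop_ms,                 # absolute cue: stop duration itself
        "ratio": stop_ms / preceding_mora_ms,   # relational cue: stop / preceding mora
    }


# Invented example: the same token at a normal rate and at twice the rate.
normal = duration_cues(180.0, 120.0)
fast = duration_cues(90.0, 60.0)
print(normal, fast)
```

In this invented example the absolute cue halves at the faster rate while the ratio stays the same, which is the motivation usually given for relational timing.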
Poster III-12 The influence of dynamic F0 on the perception of vowel duration: Cross-linguistic evidence
Heike Lehnert-LeHouillier, University at Buffalo & Haskins Laboratories
  This paper investigates the influence of a dynamic fundamental frequency (F0) on the perception of vowel duration. The perception of the duration of the vowels [a], [e], and [i] with a falling versus a level F0 was investigated. Native speakers of Thai, Japanese, German, and Latin American Spanish were presented with monosyllabic CV nonsense words, and their perception of the duration of vowels with a level F0 was compared to that of vowels with an F0 falling from 160 Hz to 80 Hz. The results show that only Japanese listeners perceived the vowels with a falling F0 as longer. Hence the cross-linguistic investigation shows that the influence of F0 on the perceived duration of vowels is language specific rather than universally present in speech perception.
SPYROS ARMOSTI, University of Cambridge
  In Cypriot Greek, word-final /n/ assimilates to word-initial fricative and sonorant geminates producing ‘super-geminates’. This study examines whether these super-geminates are perceptually distinct from other types of word-initial and post-lexical geminates. The results of the study indicate that super-geminates were not readily identified by the subjects, while the contrast between word-initial geminates and singletons was more marked.
Poster III-16 Vowel Nasalization in American English: Acoustic Variability due to Phonetic Context
Nancy F. Chen, Speech & Hearing Bioscience & Technology, Harvard-MIT Health Sciences and Technology; Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology
Janet Slifka, Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology
Kenneth N. Stevens, Speech & Hearing Bioscience & Technology, Harvard-MIT Health Sciences and Technology; Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology
  This study quantifies acoustic variation of vowel nasalization arising from phonetic context in American English with an emphasis on carryover contexts. While qualitative articulatory trajectories and phonetic descriptions suggest that a vowel is nasalized in carryover contexts, few acoustic studies have examined this issue. Our acoustic analyses investigate the vowel /i/ and show that: (1) a vowel can be nasalized with at least one adjacent nasal consonant, even if the nasal consonant is pre-vocalic; (2) vowels with nasal consonants on both sides (NVN) do not guarantee more vowel nasalization.
Poster III-18 An Acoustic Study of Devoicing of the Voiced Geminate Obstruents in Japanese
Aki Hirose, Institute of International Education in London
Michael Ashby, University College London
  Historically, the phonological system of Japanese did not allow voiced geminate obstruents, and they can be found only in recent loanwords such as /baggu/ “bag” and /kiddo/ “kid”. However, the voicing of geminates in such loanwords is problematic, and seemingly voiceless pronunciations are often to be heard. In a nonsense word study, three speakers read words exemplifying all potential voiced geminate obstruents, together with their voiceless and singleton counterparts, and measures were made of four possible voicing cues. The duration of closure voicing, and to a lesser extent F0 perturbation, suggest unvoicing of the geminates; but F1 transition resembles that for voiced sounds, while preceding vowels are actually longer before geminates than before singletons. Overall, it seems that laryngeal activity in geminates results from a pattern of deliberate control rather than the aerodynamic challenge of maintaining voicing during a long obstruent articulation.
Poster III-20 Inter-subject agreement in rhythm evaluation for four languages (English, French, German, Italian)
Paolo Mairano, Fac. di Lingue e Lett. Straniere - Univ. di Torino (Italy)
Antonio Romano, Dip. Scienze del Linguaggio - Univ. di Torino (Italy)
  This paper deals with the acoustic correlates of stress-timed and syllable-timed languages proposed by Ramus et al. (1999). An experiment was conducted in order to verify the validity of the three correlates using data from four languages (and some of their varieties). The novelty of this study lies in the fact that the segmentation of the acoustic data (based on "The North Wind and the Sun" of the IPA) was carried out independently by both authors. The results show significant differences both between the two segmentations and with respect to those of other studies. However, the general tendency seems to confirm, at least partially, the validity of the three correlates even though they have been obtained from narrative texts.
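As an illustration of the kind of measures involved, the following minimal sketch computes the three correlates generally attributed to Ramus et al. (1999): %V (proportion of vocalic interval duration), ΔV and ΔC (standard deviations of vocalic and consonantal interval durations). The input format, the function name, the use of the population standard deviation and the example durations are assumptions made here, not details taken from the poster.

```python
from statistics import pstdev


def rhythm_correlates(intervals):
    """Compute %V, delta-V and delta-C from a segmented utterance.

    intervals: list of (label, duration) pairs, where label is "V" for a
    vocalic interval or "C" for a consonantal interval (hypothetical format).
    """
    vocalic = [dur for label, dur in intervals if label == "V"]
    consonantal = [dur for label, dur in intervals if label == "C"]
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_V": 100 * sum(vocalic) / total,  # share of vocalic duration
        "delta_V": pstdev(vocalic),               # variability of vocalic intervals
        "delta_C": pstdev(consonantal),           # variability of consonantal intervals
    }


# Invented durations in ms, for illustration only.
segmentation = [("C", 100), ("V", 100), ("C", 300), ("V", 100)]
print(rhythm_correlates(segmentation))
```

Running the same function on two independent segmentations of one recording would give a simple way to see how far segmentation choices shift the three values.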
Poster III-22 Pitch range variation in English tonal contrasts: continuous or categorical?
Laura Dilley, Department of Communication Disorders and Department of Psychology, Bowling Green State University
  The importance of pitch range variation for theories of intonation is well-known, but whether pitch range variation gives rise to distinctive linguistic categories in English is unclear. To test this possibility, three intonation continua were constructed for use in an imitation experiment; all had endpoints with distinct tonal representations under autosegmental-metrical (AM) theory [1]. Responses to all three stimulus sets showed continuous variation in pitch range. The results suggest that pitch range is a dimension which is gradient in English.
Yosuke Igarashi, Japan Society for the Promotion of Science, National Institute for Japanese Language
  This study analyzes Goshogawara Japanese (GJ), which has a rising lexical pitch accent. Accented words in this dialect are known to show a pitch lowering in the final syllable only when the word is followed by a juncture. The results of this experiment contradict earlier reports, revealing that this lowering is always present. They indicate that pitch accent in GJ is not simply rising (LH) but contains falling (HL) elements in its representation.
Poster III-26 Moraic anchoring of f0 in Washo
Justin Murphy, Phonology Laboratory, The University of Chicago
Alan C. L. Yu, Phonology Laboratory, The University of Chicago
  Recent research shows that the minima and maxima of pitch accent and tonal contours are often aligned with segmental anchors. This study examines f0 alignment in Washo, an endangered American language. Washo is interesting because, unlike other languages which have been studied, it not only has a vowel length distinction, but also what is known as ‘stress-sensitive quantity alternations’ [1]: long stressed vowels are followed by a short consonant (V:C), while short stressed vowels are followed by a geminate (VC:). This paper reports the results of an acoustic experiment demonstrating that the anchoring of f0 landmarks in Washo makes reference to anchors at the moraic rather than the segmental level. It is found that H anchors consistently with the second mora of the stressed vowel. L, meanwhile, cannot be anchored to the onset of the stressed vowel without reference to the sonority of segments preceding the tonic vowel.
Poster III-28 Effect of utterance length on F0 scaling
Maria del Mar Vanrell, Universitat Autònoma de Barcelona
  This study examines the effect of utterance length on utterance-initial F0 values and H and L scaling of nuclear accent in Majorcan Catalan. Research on the correlation between utterance length and initial F0 values has thus far yielded contradictory answers to the question of whether utterance length is a determining factor for initial pitch height. Regarding the impact of utterance length on scaling of nuclear accents (known as downstep), it has been shown that downstep may be under the conscious control of the speaker and be governed by a clearly communicative function. Firstly, the results reveal that there exists a correlation between sentence length and initial pitch height, but this correlation is not constant across speakers and sentence-types, suggesting that this is an instance of soft preplanning. Secondly, our results show that downstep or, more precisely, the failure of downstep may be grammaticalized in a particular phonological context.
Poster III-30 Production and perception of word prosody in three dialects of Korean
Kenji Yoshida, Indiana University
Junghyoe Yoon, Indiana University
Hyun-jin Kim, Indiana University
  This paper examines the relationship between production and perception of prosodically marked lexical contrast, comparing 16 native speakers from three dialects of Korean known to exhibit variation in the use of prosodic features for lexical marking. A set of synthesized stimuli was constructed, where both F0 contour and syllable duration were manipulated. South Kyungsang speakers have F0 distinction in their productions and are sensitive to F0 variations in perception. Cholla speakers are sensitive to F0 information but in the opposite direction to Kyungsang speakers, suggesting that their 'interpretation' of the F0 is the critical factor of perceptual judgment. Some of the Seoul speakers show a duration contrast and are sensitive only to duration change. The results reveal general though incomplete correlation between production and perception of word prosody, suggestive of the different typological status of the three dialects.
Poster III-32 Effects of Syllable Structure and Nuclear Pitch Accents on Peak Alignment: A Corpus-Based Analysis
Bernd Möbius, University of Stuttgart
Matthias Jilka, University of Stuttgart
  This paper describes the use of a unit selection corpus in carrying out an investigation of factors influencing specific aspects of the phonetic realization of tonal categories, concentrating on the alignment of peaks in H*L pitch accents in German. Three major linguistic parameters potentially influencing peak alignment are investigated. Two of them (syllable structure, nuclear pitch accents) are established influences, while vowel quality is usually not considered relevant. Results from other studies are confirmed (peaks occur earlier in nuclear pitch accents, coda type influences peak position) and new findings are offered (in interaction, onset type is more important than coda type). The presented procedure both describes the characteristics of the voice providing the corpus (allowing a more detailed phonetic realization of tonal categories, e.g., for speech synthesis) and offers general insights into which factors are relevant to the alignment of H*L peaks in German.
Alice Turk, University of Edinburgh
Snezhina Dimitrova, Sofia University
  Durations of syllables in phrasally stressed English 4-syllable words like democratic, with primary stress on the penultimate syllable and secondary stress on the first syllable, were compared with their counterparts in words without phrasal stress. These comparisons showed considerable variation in lengthening patterns across subjects, where two subjects showed reliable lengthening on only a single syllable in the phrasally-stressed words (primary stressed syllable for one subject, final syllable for the other). The other two subjects reliably lengthened the first syllable, the primary stressed syllable, and final syllable, with the greatest magnitude of lengthening on the primary stressed syllable. Taken together, these results suggest that the initial, secondary stressed syllable, the primary stressed syllable, and the final syllable are all distinct but optional lengthening sites in English.
Yuan Jia, English Department, Nankai University, China
Aijun Li, Institute of Linguistics, Chinese Academy of Social Sciences, China
Ziyu Xiong, Institute of Linguistics, Chinese Academy of Social Sciences, China
Yiya Chen, Department of Linguistics, Radboud Universiteit Nijmegen, Holland
  This paper examines the influence of focus on the durational patterns of five-syllable words with various positions and different tones in Standard Chinese. Target sentences were constructed to elicit focus on constituents located at the beginning of the sentences. The syllables within the focused constituents were placed in various positions in the words and carried tone 1, tone 2 or tone 4. Results of the experiments show that although focus induces significant lengthening of the focused constituents, the internal durational adjustment of each focused syllable is by no means symmetric, and the magnitude of such lengthening is determined by the metrical structure of the focused constituents. Keywords: focus; five-syllable words; durational pattern
Richard Ogden, Dept. of Language & Linguistic Science, University of York
  Complaints might be thought a priori to be a good place to find paralinguistic features in their natural setting. Using conversation analytic methodology, I argue that an account of the phonetics of complaints needs to take into consideration other sequential features of the turn in which the complaint is delivered. In particular, a turn delivering a complaint can either be marked as designed to receive an affiliative response (and thus a continuation of the activity of complaining), or marked as closing down the complaint sequence.
Poster III-40 The effect of onset and position in the realization of Tone 1 in two dialects of Taiwan Mandarin
Janice Fon, Graduate Institute of Linguistics, National Taiwan University
Huiju Hsu, Department of Applied Linguistics and Language Studies, Chung Yuan Christian University
Yi-Hsuan Huang, Graduate Institute of Linguistics, National Taiwan University
Sally Chen, Graduate Institute of Linguistics, National Taiwan University
  This study investigates how onset and sentence positioning affect the realization of Tone 1 in two dialects of Taiwan Mandarin. Results showed that the central dialect was higher in register when placed in isolation, but lower when placed in a sentential context. When there was a tonal mismatch, coarticulatory effects were more robust in the northern dialect. This implies that speakers of the central dialect (nonstandard) might be more self-conscious about the standard-vernacular distinction than those of the northern dialect (standard), and overcorrection tended to occur. The effect of onset type was also significant but fairly localized. Obstruent-initial syllables had higher initial pitch than sonorant ones. The declination effect was also significant, with a higher rate in the central variety. In addition, sentential stress tended to raise the sentence-final H targets in both varieties. However, the PENTA model was not fully supported.
Matt Bauer, Illinois Institute of Technology
Frank Parker, Parlay Press
  Functional (speaker-based) and non-functional (listener-based) accounts are often equally satisfactory in explaining internally motivated diachronic sound change. Here we report a case clearly favoring the non-functional account: In some dialects of English, /æ/ is raised before /g/ but not /k/. The raising may be an attempt to reduce the conflict between producing the low front vowel before the voiced velar, or it may be due to listener misapprehension. Using acoustic and articulatory data from General American English to simulate the conditions prior to /æ/-raising, we show the precipitating stimulus for /æ/-raising had to have been listener misapprehension. Specifically, even though both /g/ and /k/ exert a coarticulatory effect on /æ/, acoustic evidence for the coarticulatory effect is found only before /g/.
Poster III-44 Phonetic Factors in /r/-Liaison Usage: A First Report
Pilar Mompean-Guillamon, University of Murcia
Jose Antonio Mompean-Gonzalez, University of Murcia
  Variability in /r/-liaison usage in non-rhotic accents of English has been explained by reference to linguistic, sociolinguistic and phonetic factors. This paper looks at two phonetic factors that might condition such variability: a) the type of vowel phoneme at the end of the syllable likely to make the link; and b) the presence/absence of /r/ at the beginning of that syllable. A corpus of Received Pronunciation (RP) English newscasts from the years 2004 and 2005 available from the BBC Learning English website [16] was investigated. Potential contexts were detected and analysed auditorily. The results show that intrusive /r/ is more frequent after back vowels than after central vowels and that the presence of /r/ in the syllable that would make the /r/-link does not seem to have a great effect on the presence of /r/-link.
Poster III-46 How Universal is the Sonority Hierarchy?: A Cross-Linguistic Acoustic Study
Carmen Jany, University of California, Santa Barbara
Matthew Gordon, University of California, Santa Barbara
Carlos M Nash, University of California, Santa Barbara
Nobutaka Takara, University of California, Santa Barbara
  Parker (2002) explores the hypothesis that segmental sonority in the phonological sense has concrete measurable physical correlates. In a study of English and Spanish, Parker concludes that intensity is the most reliable correlate of sonority. This paper extends Parker’s study to four more genetically diverse languages: Egyptian Arabic, Hindi, Mongolian, and Malayalam, thereby examining the universality of the acoustic basis for the sonority hierarchy: glides > liquids > nasals > obstruents. It is shown that disputed sonority contrasts, such as a) laterals vs. rhotics, b) voiceless fricatives vs. voiced stops, c) affricates vs. stops, and d) sibilants vs. other fricatives, follow language-specific patterns, while undisputed contrasts, such as sonorants > obstruents, are cross-linguistically consistent in their acoustic patterns. Differences in sonority as a result of prosodic position and interspeaker variation are not observed in the present study.
Poster III-48 Accent variation in adolescents in Aberdeen: first results for (hw) and (th)
Thorsten Brato, Department of English, University of Giessen
  This paper presents preliminary results of a major study into accent variation in a socially-stratified sample of urban adolescents in Aberdeen. The variables (hw) and (th) were analysed in word list style and reading style. The results indicate changes in progress for the first variable. The status of the second variable is as yet unclear. TH-fronting was found only infrequently and seems to be restricted to some speakers.
Tetsuo Harada, Waseda University
  This study examines to what extent English-speaking adults who have attended a Japanese immersion program in childhood, in which many content subjects are taught in Japanese, can retain their L2 pronunciation ability even if L2 input dramatically decreases after they exit the program. The results show that the immersion graduates still retained their ability to control segmental timing (i.e., voice onset time (VOT), contrast between single and geminate stops in Japanese), although their L2 sounds were not exactly the same as monolingual speakers’.
Won Tokuma, Seijo University
  This study investigates how spatial configurations of English fricatives change for Japanese learners at advanced, intermediate and pre-intermediate levels, in comparison to those of native speakers. The perceptual representations obtained from Multidimensional Scaling analysis of similarity judgements showed a clear sibilance/nonsibilance division for advanced and intermediate learners, but a place of articulation feature was not observed. The perceptual configuration of pre-intermediate level students showed strong L1 phonological influence. The results show that the spatial modelling of similarity data can provide an alternative to the conventional approaches to cross-language perception.
Bogdan Rozborski, Polish-Japanese Institute of Information Technology
  The aim of this paper is to demonstrate the spectral differences in the formant structure of a chosen vowel that arise after compressing PCM sound data using a given compression method. The experiments carried out by the author show that sound data compression does not significantly affect the stability of formant frequency distributions in statistical terms, as long as it does not introduce a random, stochastic component into the original speech signal.
Daniel Callan, ATR Computational Neuroscience Laboratories
Mitsuo Kawato, ATR Computational Neuroscience Laboratories
  This study investigates neural processes related to phoneme identification in the presence of noise. Differential brain activity for a difficult consonant identification task (/b/-/d/) relative to an easier vowel identification task (/a/-/o/) was present in brain regions involved with articulatory planning control (Broca’s area, anterior insula, premotor cortex), instantiation of internal models (cerebellum), and auditory processing regions (STG/S). The results of a correlation analysis of behavioral performance with brain activity, as well as an analysis of incorrect versus correct responses, suggest that activity in brain regions involved with articulatory planning control is related to poorer performance. These results are inconsistent with hypotheses that articulatory planning areas are utilized to facilitate speech perception. Considerable activity in the cerebellum for correct relative to incorrect responses is consistent with the hypothesis that articulatory-auditory internal models instantiated in the cerebellum are utilized to facilitate phoneme perceptual identification performance.
Beate Wendt, Leibniz-Institute for Neurobiology
Ines Bose, Martin-Luther-University of Halle/S.; Department of Speech Communication and Phonetics
Henning Scheich, Leibniz-Institute for Neurobiology
Michael Sailer, MEDIAN Klinik NRZ Magdeburg
Hermann Ackermann, Eberhard-Karls-Universitätsklinikum Tübingen; Clinic for Neurology
  This paper deals with a case of foreign accent syndrome (FAS), a rare form of speech disorder, in a German-speaking woman with a Russian-sounding accent. We focused on the patient's prosody while reading aloud, especially on temporal structures (speech rate and speech rhythm) in comparison to features of a Russian and a German native speaker. The aim of the auditory and acoustic analysis was to identify potential key features of pronunciation which could be characteristic of Russian-accented German speech and which might lead listeners to judge the patient's speech as sounding Russian. There are similarities between the patient and the Russian native speaker with regard to some phonetic features (structuring of prosodic phrases). But the patient's speech often shows a lack of some of the most typical features of a true Russian foreign accent, and there are more similarities between the Russian and the German native speaker.
Poster III-60 Multimodal Analysis of Anger in Natural Speech Data
Catherine Mathon, EA 333, ARP, Université Paris Diderot, UFRL case 7003, 2 place Jussieu, 75251 Paris Cedex
  This paper reports a study on detection and expression of anger in French, conducted on natural speech data. Perceptual tests showed that both linguistic and prosodic cues could convey information about the affective state of the speaker. Pragmatic, segmental and supra-segmental analyses of the corpus were conducted in order to reveal the real cues that permit the detection of emotion and the classification of anger in degrees.
Poster III-62 Expressing the inexpressible: A phonetic study of nonstandard use of a diacritic for voiced obstruents in Japanese
Keiko Masuda, Chuo University
  This paper investigates phonetic features of vowels with a diacritic for voiced obstruents (dakuten) in Japanese, which are phonologically and orthographically nonstandard but often observed recently in informal linguistic media. Recorded data of the vowels were analysed in terms of auditory impression, visual inspection, formant frequencies, phonation type, F0 and acoustic intensity. It was revealed that the productions of /a/ with a dakuten exhibited positive spectral tilt in the low frequency range and lowering of F0, both of which are indicative of creaky voice. On the other hand, increase in acoustic intensity, which has been claimed by some previous work, was not clearly observed in this analysis.
Poster III-64 Expressive Speech Corpus Validation by Mapping Subjective Perception to Automatic Classification Based on Prosody and Voice Quality
Ignasi Iriondo, Enginyeria i Arquitectura La Salle. Ramon Llull University
Santiago Planet, Enginyeria i Arquitectura La Salle. Ramon Llull University
Joan Claudi Socoró, Enginyeria i Arquitectura La Salle. Ramon Llull University
Francesc Alías, Enginyeria i Arquitectura La Salle. Ramon Llull University
Carlos Monzo, Enginyeria i Arquitectura La Salle. Ramon Llull University
Elisa Martínez, Enginyeria i Arquitectura La Salle. Ramon Llull University
  This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis, since this kind of emotional speech can be rather lacking in authenticity. The goal is to obtain an automatic classifier able to prune the bad utterances (from an expressiveness point of view). The results of a previous subjective test are used for training a multistage emotional identification system based on statistical features computed from the speech prosody and voice quality. Finally, the system provides a set of utterances to be checked and definitively eliminated if appropriate.
Anton Batliner, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen Nürnberg
Stefan Steidl, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen Nürnberg
Björn Schuller, Institute for Human-Machine-Communication, Technische Universität München
Dino Seppi, ITC-IRST
Thurid Vogt, Multimedia Concepts and their Applications, University of Augsburg
Laurence Devillers, LIMSI-CNRS
Laurence Vidrascu, LIMSI-CNRS
Noam Amir, Dep. of Communication Disorders, Sackler Faculty of Medicine, Tel Aviv University
Loic Kessous, Dep. of Communication Disorders, Sackler Faculty of Medicine, Tel Aviv University
Vered Aharonson, Tel Aviv Academic College of Engineering, Tel Aviv
  Traditionally, it has been assumed that pitch is the most important prosodic feature for the marking of prominence, and of other phenomena such as the marking of boundaries or emotions. This role has been called into question by recent studies. As larger databases are nowadays always processed automatically, it is not clear to what extent the possibly lower relevance of pitch can be attributed to extraction errors or to other factors. We present some ideas on a phenomenological difference between pitch and duration, and compare the performance of automatically extracted F0 values and of manually corrected F0 values for the automatic recognition of prominence and emotion in spontaneous speech (children giving commands to a pet robot). The difference in classification performance between corrected and automatically extracted pitch features turns out to be consistent but not very pronounced.

