IMPLICIT RATE AND SPEAKER NORMALIZATION IN A CONTEXT-RICH PHONETIC EXEMPLAR MODEL

Travis Wade
Institute for Natural Language Processing, University of Stuttgart

ID 1345

In this study we present a model of speech perception in which (1) memory includes a single, ordered collection of acoustic cues extracted in real time at salient landmark locations from previously heard signals, and (2) identification of newly encountered sounds involves comparing the sounds and their surrounding contexts with similar sequences occurring in memory. Under these assumptions, perceptual speaker and rate normalization, and context dependence in general, follow implicitly from the statistics of the language environment and do not require the processes or levels of representation that are traditionally assumed. We verify this by means of a simulation in which the model simultaneously acquires voice onset time (VOT) and first-formant (F1) cues to consonant voicing and vowel height, along with their dependence on speaking rate and speaker gender, based on exposure to productions from the TIMIT database.
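
The two assumptions can be illustrated with a minimal sketch. It is not the model described in the paper. The names (memory, window, classify), the similarity function (an exponential of city-block distance, a common choice in exemplar models), the sensitivity value, and all cue values are assumptions made for illustration, and the numbers are toy data, not TIMIT measurements. Memory is one ordered list of cue events. A new sound is identified by comparing the sound together with its neighbouring cues against every stored window of the same size.

```python
import math
from collections import defaultdict

# Toy memory (hypothetical values, in ms): a single ordered list of cue events.
# Unlabelled events are vowel durations, used here as a stand-in for rate context.
# Labelled events are VOT values with their voicing category.
memory = [
    (80, None), (10, "voiced"), (80, None), (40, "voiceless"), (80, None),      # fast speech
    (200, None), (20, "voiced"), (200, None), (80, "voiceless"), (200, None),   # slow speech
]

def window(seq, i, k):
    """Cue values from k events before index i to k events after it."""
    return [seq[j][0] for j in range(i - k, i + k + 1)]

def classify(seq, probe, k=1, sensitivity=0.1):
    """Label the centre of `probe` (2k+1 cue values) by summed similarity
    to every labelled window in memory. The similarity function is an assumption."""
    support = defaultdict(float)
    for i in range(k, len(seq) - k):
        label = seq[i][1]
        if label is None:
            continue  # context-only events are never targets
        dist = sum(abs(a - b) for a, b in zip(window(seq, i, k), probe))
        support[label] += math.exp(-sensitivity * dist)
    return max(support, key=support.get)

# The same 35 ms VOT is labelled differently depending on its context.
print(classify(memory, [80, 35, 80]))    # fast context: voiceless
print(classify(memory, [200, 35, 200]))  # slow context: voiced
```

In this sketch no separate normalization step is applied: the rate dependence of the voicing decision comes entirely from which stored windows the probe and its context resemble.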