Of Mouths, Ears, Eyes and Brains -- The Sensory Motor Foundations of Spoken Language
What Makes Speech Stick?
Steven Greenberg, Silicon Speech
Robustness and reliability are the essence of speech communication. The senses collaborate with memory and other brain mechanisms to decode spoken language in the harsh and often unpredictable environments of the real world. How the brain makes speech “stick” is the focus of this special session, which examines how the senses and motor system coordinate during speech perception and production.
Using Auditory Feedback and Rhythmicity for Diphone Discrimination of Degraded Speech
Oded Ghitza, Sensimetrics Corporation
We describe a computational model of diphone perception based on salient properties of peripheral and central auditory processing. The model comprises an efferent-inspired closed-loop model of the auditory periphery connected to a template-matching neuronal circuit with a gamma rhythm at its core. We show that, by exploiting auditory feedback, a place/rate model of central processing is sufficient for the prediction of human performance in diphone discrimination of minimal pairs embedded in background noise, in contrast to the need for additional temporal information when open-loop models of the periphery are used. We also demonstrate that the template-matching circuit exhibits properties, such as time-scaling insensitivity, consistent with (and desirable for) perception of spoken language.
Sensory Goals and Control Mechanisms for Phonemic Articulations
Joseph Perkell, Massachusetts Institute of Technology
An overview of speech production is presented in which the goals of phonemic speech movements are implemented in auditory and somatosensory domains and the movements are controlled by a combination of feedback and feedforward mechanisms. Findings of motor-equivalent trading relations in producing /u/ and /r/, cross-speaker relations between vowel and consonant production and perception, and speakers’ use of a “saturation effect” in producing /s/ support the idea that the goals are in sensory domains. Results of production experiments in which auditory feedback was modified and interrupted provide insight into the nature of feedback and feedforward control mechanisms. The findings are all compatible with the DIVA model of speech motor planning, which makes it possible to quantify relations among phonemic specifications of utterances, brain activity, articulatory movements and the speech sound output.
Analysis-by-Synthesis in Auditory-Visual Speech Perception: Multi-Sensory Motor Interfacing
Virginie van Wassenhove, California Institute of Technology
In conversation, one sees as much as one hears the interlocutor. Compelling demonstrations of auditory-visual (AV) integration in speech perception are the classic McGurk effects: in McGurk “fusion,” an auditory [p] dubbed onto a face articulating [k] is perceived as a single fused percept [t], but in McGurk “combination,” an auditory [k] dubbed onto a visual [p] is heard as combinations of [k] and [p]. The brain likely exploits the spatiotemporal co-occurrence of AV speech signals. AV integration offers interesting challenges for neuroscience and speech science alike. How, when, where, and in what format do auditory and visual speech signals integrate? Several studies are described, suggesting that multisensory speech integration relies on a dynamic set of predictive computations involving large-scale cortical, sensorimotor networks. Within an “analysis-by-synthesis” framework, it is suggested that speech perception entails a predictive brain network operating on abstract speech units.