PRESERVING FINE PHONETIC DETAIL USING EPISODIC MEMORY: AUTOMATIC SPEECH RECOGNITION WITH MINERVA2

Roger Moore & Viktoria Maier
Dept. Computer Science, University of Sheffield, UK

ID 1724

Previous research has demonstrated competitive recognition results using a simulation of episodic memory, 'MINERVA2', on the Peterson & Barney corpus of vowel formant data. This paper presents a modified implementation designed to work on real speech data and reports results from isolated-word recognition experiments conducted using the TI-ALPHA corpus. It is shown that access to fine phonetic detail is critical for achieving high recognition accuracy, whether that detail is provided by the episodic model or by hidden Markov models incorporating large numbers of Gaussian mixture components. However, it is also confirmed that, although MINERVA2 offers a powerful means of generalizing by accessing the fine detail retained in all the training data, it is severely hampered by its inability to model temporal sequence. It is concluded that a new episodic model is needed, one that is based on the principles of MINERVA2 but overcomes these limitations.
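For readers unfamiliar with MINERVA2, the following is a minimal sketch of the retrieval step as originally formulated by Hintzman: every stored trace is compared with the probe, each similarity is cubed to give an activation, and the activations are summed into an echo. It is not the modified implementation this paper describes. The function names, the toy feature vectors, and the rule of picking the class with the highest echo intensity are assumptions made for illustration only.

```python
# Illustrative sketch of classic MINERVA2 retrieval (Hintzman's formulation).
# Assumption: features are fixed-length vectors with values in {-1, 0, +1}.
# All function names are hypothetical, not taken from the paper.

def similarity(probe, trace):
    """Dot product normalised by the number of features that are
    non-zero in either the probe or the trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo(probe, traces):
    """Return (echo intensity, echo content) for a probe.

    Cubing the similarity keeps its sign while letting the traces
    closest to the probe dominate the echo."""
    activations = [similarity(probe, t) ** 3 for t in traces]
    intensity = sum(activations)
    content = [
        sum(a * t[j] for a, t in zip(activations, traces))
        for j in range(len(probe))
    ]
    return intensity, content


def classify(probe, memory):
    """Assumed decision rule: choose the label whose stored traces
    give the highest echo intensity. `memory` maps label -> traces."""
    return max(memory, key=lambda label: echo(probe, memory[label])[0])


if __name__ == "__main__":
    memory = {
        "a": [[1, 1, -1, -1], [1, 1, -1, 1]],
        "b": [[-1, -1, 1, 1]],
    }
    print(classify([1, 1, -1, -1], memory))
```

Because probe and trace are compared feature by feature, every item must be a vector of the same fixed length. Nothing in this retrieval step represents the order or duration of events within an utterance, which is one way to read the temporal-sequence limitation noted in the abstract.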