Abstract
We introduce a discriminative model for speech recognition that integrates acoustic, duration and language components. In the framework of finite state machines, a general model for speech recognition G is a finite state transduction from acoustic state sequences to word sequences (e.g., search graph in many speech recognizers). The lattices from a baseline recognizer can be viewed as an a posteriori version of G after having observed an utterance. So far, discriminative language models have been proposed to correct the output side of G and is applied on the lattices. The acoustic state sequences on the input side of these lattice can also be exploited to improve the choice of the best hypotheses through the lattice. Taking this view, the model proposed in this paper jointly estimates the parameters for acoustic and language components in a discriminative setting. The resulting model can be factored as corrections for the input and the output sides of the general model G. This formulation allows us to incorporate duration cues seamlessly. Empirical results on a large vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.6% absolute. Through a series of experiments we analyze the contributions from and interactions between acoustic, duration and language components to find that duration cues play an important role in Arabic task.
Keywords
Affiliated Institutions
Related Publications
Morphology-based and sub-word language modeling for Turkish speech recognition
We explore morphology-based and sub-word language modeling approaches proposed for morphologically rich languages, and evaluate and contrast them for Turkish broadcast news tran...
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast,...
Fundamentals of speech recognition
1. Fundamentals of Speech Recognition. 2. The Speech Signal: Production, Perception, and Acoustic-Phonetic Characterization. 3. Signal Processing and Analysis Methods for Speech...
Short-term Memory for Word Sequences as a Function of Acoustic, Semantic and Formal Similarity
Experiment I studied short-term memory (STM) for auditorily presented five word sequences as a function of acoustic and semantic similarity. There was a large adverse effect of ...
Publication Info
- Year
- 2010
- Type
- article
- Citations
- 24
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1109/icassp.2010.5495227