Abstract
In recent years, the development of a feature-based general statistical framework has been pursued for automatic speech recognition via novel designs of minimal or atomic units of speech, aiming at a parsimonious scheme to share the interword and interphone speech data and at a unified way to account for the context-dependent behaviors in speech. The basic design philosophy has been motivated by the theory of distinctive features and by a new form of phonology which argues for use of multidimensional articulatory structures. In this paper, the most recently developed feature-based recognizer is presented, which is capable of operating on all classes of English sounds. Detailed descriptions of the design considerations for the recognizer and of key aspects of the design process are provided. This process, which is called lexicon "compilation," consists of three elements: (1) establishing a feature-specification system; (2) constructing a probabilistic and fractional temporal overlapping pattern across the features; and (3) mapping from the feature-overlap pattern to a state-transition graph. A standard phonetic classification task from the TIMIT database is used as a test bed to evaluate the performance of the recognizer. The experimental results provide preliminary evidence for the effectiveness of the feature-based approach to speech recognition.
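The three compilation elements named in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tier names, the feature values in `FEATURE_SPEC`, the rule that only the velum tier may start early (`SPREADING_TIERS`), and the helper names `overlap_bundle` and `compile_word`. The sketch is deterministic and omits the probabilistic and fractional aspects of the overlap pattern.

```python
# Toy sketch of lexicon "compilation"; all names and values are hypothetical.

# Element 1: a feature-specification system (one value per articulatory tier).
TIERS = ("lips", "tongue_blade", "tongue_dorsum", "velum", "larynx")
FEATURE_SPEC = {
    "m": ("closed", "neutral", "neutral", "lowered", "voiced"),
    "ae": ("open", "neutral", "low_front", "raised", "voiced"),
    "n": ("open", "alveolar_closure", "neutral", "lowered", "voiced"),
}

# Element 2 (simplified): tiers of the NEXT phone that may begin early.
SPREADING_TIERS = {"velum"}


def overlap_bundle(left, right):
    """Return the mixed feature bundle at a phone boundary, or None if
    spreading produces nothing distinct from either phone target."""
    lb, rb = FEATURE_SPEC[left], FEATURE_SPEC[right]
    mixed = tuple(
        r if tier in SPREADING_TIERS else l
        for tier, l, r in zip(TIERS, lb, rb)
    )
    return mixed if mixed not in (lb, rb) else None


def compile_word(phones):
    """Element 3: map the overlap pattern to a state-transition graph.

    Returns (states, edges): states is a list of feature bundles and
    edges is a set of (from_index, to_index) pairs, self-loops included.
    """
    states = [FEATURE_SPEC[phones[0]]]
    edges = {(0, 0)}
    for left, right in zip(phones, phones[1:]):
        prev = len(states) - 1
        mixed = overlap_bundle(left, right)
        mid = None
        if mixed is not None:
            states.append(mixed)
            mid = len(states) - 1
            edges |= {(prev, mid), (mid, mid)}
        states.append(FEATURE_SPEC[right])
        cur = len(states) - 1
        # The overlap state is optional, so the direct path is kept too.
        edges |= {(prev, cur), (cur, cur)}
        if mid is not None:
            edges.add((mid, cur))
    return states, edges


states, edges = compile_word(["m", "ae", "n"])
for i, bundle in enumerate(states):
    print(i, dict(zip(TIERS, bundle)))
print(sorted(edges))
```

In this toy word, the state between "ae" and "n" keeps the vowel's tongue and lip values but already has a lowered velum, which is the kind of context-dependent behavior (here, anticipatory nasalization) that a feature-overlap representation is meant to capture with shared units.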
Related Publications
An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition
Modeling phonological units of speech is a critical issue in speech recognition. In this paper, our recent development of an overlapping-feature-based phonological model that re...
Learning the hidden structure of speech
In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/outpu...
Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR
The use of exemplar-based methods, such as support vector machines (SVMs), k-nearest neighbors (kNNs) and sparse representations (SRs), in speech recognition has thus far been l...
Backpropagation training for multilayer conditional random field based phone recognition
Conditional random fields (CRFs) have recently found increased popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective com...
Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition
Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech rec...
Publication Info
- Year: 1994
- Type: article
- Volume: 95
- Issue: 5
- Pages: 2702-2719
- Citations: 144
- Access: Closed
Identifiers
- DOI: 10.1121/1.409839