Abstract
Recently, Deep Belief Networks (DBNs) have been proposed for phone recognition and were found to achieve highly competitive performance. In the original DBNs, only frame-level information was used to train the DBN weights, although it has long been known that sequential or full-sequence information can help improve speech recognition accuracy. In this paper we investigate approaches to optimizing the DBN weights, state-to-state transition parameters, and language model scores using a sequential discriminative training criterion. We describe and analyze the proposed training algorithm and strategy, and discuss practical issues and how they affect the final results. We show that DBNs learned with the sequence-based training criterion outperform those learned with the frame-based criterion for both three-layer and six-layer models, although the optimization procedure for the deeper DBN is more difficult under the sequence-based criterion.
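To make the contrast between the two criteria concrete, here is a minimal numerical sketch; it is not the paper's implementation, and the toy sizes, flat transition model, and variable names are all illustrative assumptions. Frame-level training scores each frame independently against its forced-alignment state (cross-entropy), whereas the sequence-level (MMI-style) criterion scores the whole reference state sequence against a forward-algorithm sum over all competing state sequences.

```python
# Minimal sketch, assuming a toy 3-state model and random "DBN" outputs;
# names, sizes, and the flat transition matrix are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

T, S = 6, 3                               # frames, HMM states (toy sizes)
logits = rng.normal(size=(T, S))          # stand-in for per-frame DBN outputs
ref_states = rng.integers(0, S, size=T)   # forced-alignment state labels
log_trans = np.log(np.full((S, S), 1.0 / S))  # flat state-transition log-probs


def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def logsumexp(x, axis=None):
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))
    return np.squeeze(s)


log_post = log_softmax(logits)            # frame-level state posteriors

# 1) Frame-level criterion: average cross-entropy against the aligned states.
frame_ce = -log_post[np.arange(T), ref_states].mean()

# 2) Sequence-level (MMI-style) criterion: log-score of the reference state
#    sequence minus the log-sum over all state sequences (forward algorithm).
#    A uniform initial-state prior is assumed and dropped as a constant.
numerator = log_post[0, ref_states[0]] + sum(
    log_trans[ref_states[t - 1], ref_states[t]] + log_post[t, ref_states[t]]
    for t in range(1, T)
)

alpha = log_post[0].copy()                # forward recursion over all sequences
for t in range(1, T):
    alpha = log_post[t] + logsumexp(alpha[:, None] + log_trans, axis=0)
denominator = logsumexp(alpha)

seq_loss = -(numerator - denominator)     # negated so it reads as a loss

print(f"frame-level CE loss     : {frame_ce:.3f}")
print(f"sequence-level MMI loss : {seq_loss:.3f}")
```

In a real system the gradient of the sequence-level term with respect to the network outputs reduces to the difference between reference-path state occupancies and forward-backward posteriors over all competing paths, and it is this difference that would be backpropagated into the network weights.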
Related Publications
Comparing multilayer perceptron to Deep Belief Network Tandem features for robust ASR
In this paper, we extend the work done on integrating multilayer perceptron (MLP) networks with HMM systems via the Tandem approach. In particular, we explore whether the use of...
Deep Belief Networks using discriminative features for phone recognition
Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of fe...
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast,...
Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition
Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech rec...
Backpropagation training for multilayer conditional random field based phone recognition
Conditional random fields (CRFs) have recently found increased popularity in automatic speech recognition (ASR) applications. CRFs have previously been shown to be effective com...
Publication Info
- Year: 2010
- Type: article
- Citations: 213
- Access: Closed
Identifiers
- DOI: 10.21437/interspeech.2010-304