Abstract
Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically improves generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid overfitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time, and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame-level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
Keywords
neural networks, deep learning, dropout, acoustic modeling, broadcast news, LVCSR, rectified linear units, Bayesian optimization
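As a concrete sketch of the two ingredients the abstract combines, the NumPy snippet below implements a single ReLU hidden layer with “inverted” dropout applied at training time. This is a minimal, hypothetical illustration rather than the paper's implementation: the layer sizes, the 0.8 keep probability, and all names (`relu`, `dropout_layer`, `W`, `b`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit: element-wise max(0, x).
    return np.maximum(0.0, x)

def dropout_layer(h, keep_prob, training):
    # Randomly omit each hidden unit with probability 1 - keep_prob.
    # Dividing by keep_prob ("inverted" dropout) keeps the expected
    # activation unchanged, so the net runs unmodified at test time.
    if not training:
        return h
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

# Toy forward pass through one hidden layer (sizes are illustrative).
batch, n_in, n_hidden = 4, 40, 128
W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
b = np.zeros(n_hidden)
x = rng.normal(size=(batch, n_in))  # stand-in for a window of acoustic features

h = dropout_layer(relu(x @ W + b), keep_prob=0.8, training=True)
```

Per the abstract, the paper applies dropout only during frame-level training and tunes hyper-parameters with publicly available Bayesian optimization code. The inverted-dropout scaling shown here is one common convention; the original dropout formulation instead rescales the weights at test time.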
Related Publications
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast,...
Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR
The use of exemplar-based methods, such as support vector machines (SVMs), k-nearest neighbors (kNNs) and sparse representations (SRs), in speech recognition has thus far been...
An exploration of large vocabulary tools for small vocabulary phonetic recognition
While research in large vocabulary continuous speech recognition (LVCSR) has sparked the development of many state-of-the-art research ideas, research in this domain suffers...
Auto-encoder bottleneck features using deep belief networks
Neural network (NN) bottleneck (BN) features are typically created by training a NN with a middle bottleneck layer. Recently, an alternative structure was proposed which trains ...
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each...
Publication Info
- Year: 2013
- Type: article
- Citations: 1270
- Access: Closed
Identifiers
- DOI: 10.1109/icassp.2013.6639346