Algorithms for hyper-parameter optimization

James Bergstra; R. Bardenet; Yoshua Bengio; Balázs Kégl

Abstract

Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
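To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows random search over a small conditional search space (layer widths exist only for layers that are present, so they are irrelevant otherwise) and the closed-form expected improvement for minimization under the assumption of a Gaussian predictive distribution. The search space, the names sample_config, toy_loss, random_search and expected_improvement, and the toy objective are all hypothetical and chosen only for illustration.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from a hypothetical conditional search space."""
    config = {
        "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
        "n_layers": rng.randint(1, 3),              # inclusive bounds
    }
    # A width is sampled only for layers that actually exist, so
    # n_hidden_2 is simply absent (irrelevant) when n_layers < 3.
    for i in range(config["n_layers"]):
        config[f"n_hidden_{i}"] = rng.choice([128, 256, 512, 1024])
    return config


def toy_loss(config):
    """Stand-in objective (assumption); a real run would train a model
    and return its validation error."""
    lr_term = (math.log10(config["learning_rate"]) + 2) ** 2
    depth_term = 0.1 * abs(config["n_layers"] - 2)
    return lr_term + depth_term


def random_search(n_trials, seed=0):
    """Evaluate n_trials independent random configurations, keep the best."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = toy_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def expected_improvement(mu, sigma, y_star):
    """E[max(y_star - y, 0)] for y ~ Normal(mu, sigma^2), minimization.

    The Gaussian predictive distribution is an assumption of this sketch.
    """
    if sigma <= 0:
        return max(y_star - mu, 0.0)
    z = (y_star - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (y_star - mu) * cdf + sigma * pdf
```

A sequential method in this spirit would use a score such as expected_improvement to choose the next configuration to evaluate, whereas random_search ignores past results entirely.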

Keywords

Computer science, Artificial neural network, Artificial intelligence, Machine learning, Deep learning, Feature (linguistics), Algorithm, Random search

Related Publications

Random search for hyper-parameter optimization

James Bergstra, Yoshua Bengio

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials ar...

2012 7916 citations

Investigation of full-sequence training of deep belief networks for speech recognition

Abdelrahman Mohamed, Dong Yu, Li Deng

Recently, Deep Belief Networks (DBNs) have been proposed for phone recognition and were found to achieve highly competitive performance. In the original DBNs, only frame-level in...

2010 213 citations

Comparing multilayer perceptron to Deep Belief Network Tandem features for robust ASR

Oriol Vinyals, Suman Ravuri

In this paper, we extend the work done on integrating multilayer perceptron (MLP) networks with HMM systems via the Tandem approach. In particular, we explore whether the use of...

2011 55 citations

Exploring Strategies for Training Deep Neural Networks

Hugo Larochelle, Yoshua Bengio, Jérôme Louradour et al.

Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently...

2009 Journal of Machine Learning Research 1114 citations

Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, Pascal Lamblin, Dan Popovici et al.

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computationa...

2007 The MIT Press eBooks 4659 citations

Publication Info

Year: 2016
Type: preprint
Citations: 3180
Access: Closed

Citation Metrics

3180 (OpenAlex)

Cite This

APA Style

                            
                                    James Bergstra, 
                                
                                    R. Bardenet, 
                                
                                    Yoshua Bengio
                                
                                et al.
                            
                            (2016). 
                            Algorithms for hyper-parameter optimization. 
                            
                            .