Abstract

During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unexpectedly, optimization algorithms such as stochastic gradient descent show amazing performance on large-scale problems. In particular, second-order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass over the training set.
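
To make the single-pass setting concrete, here is a minimal sketch (not the paper's code) of plain stochastic gradient descent with Polyak-Ruppert averaging on a least-squares objective. The function names (make_data, averaged_sgd) and the step-size schedule eta0 / (1 + lambda * eta0 * t) are illustrative assumptions, not choices taken from the chapter.

```python
# Sketch: one-pass SGD and averaged SGD on a synthetic least-squares problem.
import numpy as np

def make_data(n=10_000, d=20, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y, w_true

def averaged_sgd(X, y, eta0=0.05, lam=1e-3):
    """Single pass over the data; returns the last iterate and the averaged iterate."""
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)                        # running Polyak-Ruppert average
    for t in range(n):
        x_t, y_t = X[t], y[t]
        eta_t = eta0 / (1.0 + lam * eta0 * t)  # decreasing step size (assumed schedule)
        grad = (x_t @ w - y_t) * x_t           # gradient of 0.5 * (x.w - y)^2
        w -= eta_t * grad
        w_bar += (w - w_bar) / (t + 1)         # incremental mean of the iterates
    return w, w_bar

if __name__ == "__main__":
    X, y, w_true = make_data()
    w_last, w_avg = averaged_sgd(X, y)
    print("error of last iterate :", np.linalg.norm(w_last - w_true))
    print("error of averaged SGD :", np.linalg.norm(w_avg - w_true))
```

On this synthetic problem the averaged iterate is typically closer to the true parameter than the last iterate, which is the behavior the abstract's efficiency claim refers to.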

Keywords

Stochastic gradient descent, Computer science, Scale (ratio), Stochastic optimization, Gradient descent, Set (abstract data type), Online machine learning, Context, Sample, Artificial intelligence, Algorithm, Mathematical optimization, Machine learning, Mathematics, Active learning (machine learning), Artificial neural network

Related Publications

Optimization for training neural nets

Various techniques of optimizing criterion functions to train neural-net classifiers are investigated. These techniques include three standard deterministic techniques (variable...

1992 · IEEE Transactions on Neural Networks · 210 citations

Publication Info

Year: 2010
Type: book-chapter
Pages: 177-186
Citations: 5479
Access: Closed


Citation Metrics

Citations (OpenAlex): 5479

Cite This

Léon Bottou (2010). Large-Scale Machine Learning with Stochastic Gradient Descent. In Proceedings of COMPSTAT'2010, 177-186. https://doi.org/10.1007/978-3-7908-2604-3_16

Identifiers

DOI: 10.1007/978-3-7908-2604-3_16