Abstract

During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unexpectedly, optimization algorithms such as stochastic gradient descent show amazing performance on large-scale problems. In particular, second-order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass over the training set.
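
To make the single-pass setting concrete, here is a minimal sketch (not the paper's code) of plain stochastic gradient descent with Polyak-Ruppert averaging on a least-squares objective. The function names (make_data, averaged_sgd) and the step-size schedule eta0 / (1 + lambda * eta0 * t) are illustrative assumptions, not choices taken from the chapter.

```python
# Sketch: one-pass SGD and averaged SGD on a synthetic least-squares problem.
import numpy as np

def make_data(n=10_000, d=20, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y, w_true

def averaged_sgd(X, y, eta0=0.05, lam=1e-3):
    """Single pass over the data; returns the last iterate and the averaged iterate."""
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)                        # running Polyak-Ruppert average
    for t in range(n):
        x_t, y_t = X[t], y[t]
        eta_t = eta0 / (1.0 + lam * eta0 * t)  # decreasing step size (assumed schedule)
        grad = (x_t @ w - y_t) * x_t           # gradient of 0.5 * (x.w - y)^2
        w -= eta_t * grad
        w_bar += (w - w_bar) / (t + 1)         # incremental mean of the iterates
    return w, w_bar

if __name__ == "__main__":
    X, y, w_true = make_data()
    w_last, w_avg = averaged_sgd(X, y)
    print("error of last iterate :", np.linalg.norm(w_last - w_true))
    print("error of averaged SGD :", np.linalg.norm(w_avg - w_true))
```

On this synthetic problem the averaged iterate is typically closer to the true parameter than the last iterate, which is the behavior the abstract's efficiency claim refers to.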

Keywords

Stochastic gradient descent, Computer science, Scale (ratio), Stochastic optimization, Gradient descent, Set (abstract data type), Online machine learning, Context, Sample, Artificial intelligence, Algorithm, Mathematical optimization, Machine learning, Mathematics, Active learning (machine learning), Artificial neural network

Related Publications

Optimization for training neural nets

Various techniques of optimizing criterion functions to train neural-net classifiers are investigated. These techniques include three standard deterministic techniques (variable...

1992 · IEEE Transactions on Neural Networks · 210 citations

Publication Info

Year: 2010
Type: book-chapter
Pages: 177-186
Citations: 5479
Access: Closed


Citation Metrics

Citations (OpenAlex): 5479

Cite This

Léon Bottou (2010). Large-Scale Machine Learning with Stochastic Gradient Descent. In Proceedings of COMPSTAT'2010, 177-186. https://doi.org/10.1007/978-3-7908-2604-3_16

Identifiers

DOI: 10.1007/978-3-7908-2604-3_16