Large Batch vs. Small Batch Training: Generalization Tradeoffs in Deep Neural Networks

Anish Kumar Pal

doi:10.31224/7356

Authors:

Anish Kumar Pal Indian Institute of Technology, Bombay https://orcid.org/0000-0001-6167-1383

DOI:

https://doi.org/10.31224/7356

Keywords:

Batch size, deep learning, generalization gap, SGD, learning-rate scaling, sharp minima, flat minima, gradient noise, implicit regularization, CIFAR-10

Abstract

Batch size is a foundational hyperparameter in stochastic gradient descent (SGD) training of deep neural networks, governing both computational efficiency and model generalization. Although the adverse effect of large batch sizes on test-set performance (the generalization gap) is empirically well known, its dependence on dataset scale, model architecture, and learning-rate scheduling has not been systematically characterized within a unified experimental framework. In this work we present a comprehensive empirical study on five datasets (three synthetic sets of 1K, 10K, and 50K samples; MNIST; and CIFAR-10) and three neural network architectures, sweeping batch sizes from 1 to 1024. Our experiments confirm that large-batch SGD converges to sharp minimizers of the training loss, while small-batch SGD, by exploiting gradient noise as implicit regularization, finds substantially flatter minima that generalize better. We quantify a 6.9× sharpness ratio between batch sizes of 32 and 512, and demonstrate that gradient variance follows the theoretical Var ∝ 1/B law, where B is the batch size, with R² = 0.996. Controlled ablations validate that the linear scaling rule (scaling the learning rate proportionally with batch size) is essential: omitting it degrades test accuracy by up to 10%. Finally, we identify that the critical batch size, the threshold beyond which accuracy degrades by more than 1%, scales approximately as √N, where N is the dataset size, and we translate all findings into a practical batch-size selection protocol for practitioners.
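Two of the quantitative claims in the abstract, the linear scaling rule and the Var ∝ 1/B law, can be illustrated with a minimal sketch. This is not the paper's experimental code: the function names `scaled_lr` and `minibatch_grad_variance`, the Gaussian stand-in for per-sample gradients, and the number of draws are all assumptions made for illustration.

```python
import numpy as np


def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * batch / base_batch


def minibatch_grad_variance(per_sample_grads, batch, n_draws, rng):
    """Empirical variance of the mini-batch mean gradient over random batches."""
    # Sample n_draws mini-batches (with replacement) and average within each.
    idx = rng.integers(0, len(per_sample_grads), size=(n_draws, batch))
    return per_sample_grads[idx].mean(axis=1).var()


rng = np.random.default_rng(0)

# Hypothetical stand-in for the per-sample gradients of one scalar parameter.
grads = rng.normal(loc=0.5, scale=2.0, size=50_000)

# Batch sizes spanning the 1 to 1024 range swept in the study.
batches = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
variances = np.array(
    [minibatch_grad_variance(grads, b, 2000, rng) for b in batches]
)

# If Var is proportional to 1/B, log(Var) vs. log(B) is a line with slope near -1.
slope, intercept = np.polyfit(np.log(batches), np.log(variances), 1)

print(f"log-log slope: {slope:.3f}")
print(f"lr for batch 256 given lr 0.1 at batch 32: {scaled_lr(0.1, 32, 256):.3f}")
```

Under these assumptions the fitted slope should land close to -1, which is the log-log signature of the 1/B law. Real per-sample gradients are high-dimensional and not Gaussian, so the sketch only shows the form of the check, not the reported R² value.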

