Preprint / Version 1

Large Batch vs. Small Batch Training: Generalization Tradeoffs in Deep Neural Networks

DOI:

https://doi.org/10.31224/7356

Keywords:

Batch size, deep learning, generalization gap, SGD, learning-rate scaling, sharp minima, flat minima, gradient noise, implicit regularization, CIFAR-10

Abstract

Batch size is a foundational hyperparameter in stochastic gradient descent (SGD) training of deep neural networks, governing both computational efficiency and model generalization. Although the adverse effect of large batch sizes on test-set performance (the generalization gap) is empirically well known, its dependence on dataset scale, model architecture, and learning-rate scheduling has not been systematically characterized within a unified experimental framework. In this work we present a comprehensive empirical study on five datasets (three synthetic sets of 1K, 10K, and 50K samples; MNIST; and CIFAR-10) and three neural network architectures, sweeping batch sizes from 1 to 1024. Our experiments confirm that large-batch SGD converges to sharp minimizers of the training loss, while small-batch SGD, by exploiting gradient noise as implicit regularization, finds substantially flatter minima that generalize better. We quantify a 6.9× sharpness ratio between batch sizes of 32 and 512, and demonstrate that gradient variance follows the theoretical Var ∝ 1/B law, where B is the batch size, with R² = 0.996. Controlled ablations validate that the linear scaling rule (scaling the learning rate proportionally with batch size) is essential: omitting it degrades test accuracy by up to 10%. Finally, we identify that the critical batch size, the threshold beyond which accuracy degrades by more than 1%, scales approximately as √N, where N is the dataset size, and we translate all findings into a practical batch-size selection protocol.
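
The two quantitative relationships named in the abstract, the linear scaling rule and the Var ∝ 1/B law, can be illustrated with a minimal sketch. It is not the authors' experimental code: the synthetic linear-regression problem, the helper names `scaled_lr` and `minibatch_grad_variance`, the batch sizes, and the trial count are all assumptions chosen for illustration. Mini-batches are sampled with replacement, and the slope of a log-log fit is compared against the value of -1 that the 1/B law predicts.

```python
import numpy as np


def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * batch / base_batch


def minibatch_grad_variance(X, y, w, batch, trials, rng):
    """Total variance (summed over coordinates) of the mini-batch MSE gradient."""
    grads = np.empty((trials, w.size))
    for t in range(trials):
        idx = rng.integers(0, X.shape[0], size=batch)  # sample with replacement
        Xb, yb = X[idx], y[idx]
        grads[t] = 2.0 * Xb.T @ (Xb @ w - yb) / batch
    return grads.var(axis=0).sum()


# Hypothetical toy data: 10,000 samples, 5 features, noisy linear targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = X @ rng.normal(size=5) + rng.normal(size=10_000)
w = np.zeros(5)  # evaluate gradient noise at a fixed parameter vector

batches = np.array([1, 4, 16, 64, 256, 1024])
variances = np.array(
    [minibatch_grad_variance(X, y, w, b, 2000, rng) for b in batches]
)

# Fit log Var = slope * log B + c; Var ∝ 1/B predicts a slope close to -1.
slope, intercept = np.polyfit(np.log(batches), np.log(variances), 1)
print(f"fitted log-log slope: {slope:.3f}")

# Learning rate for batch size 512, given an assumed base of 0.1 at batch size 32.
print(f"scaled learning rate: {scaled_lr(0.1, 32, 512):.3f}")
```

A fitted slope near -1 is what the 1/B law predicts for this toy problem. It says nothing about the R² reported in the abstract, which comes from the authors' own measurements.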

Author Biography

Anish Kumar Pal, Indian Institute of Technology, Bombay

Department of Electrical Engineering
M.Tech Student (Research Assistant)

Posted

2026-06-18