# Gradient Accumulation

- [PyTorch](https://pytorch.org/docs/stable/notes/amp_examples.html)
- helps when the model cannot be trained with a large enough batch size, often because of [GPU](GPU) memory limitations
- accumulate the gradients (for each trainable parameter) over several forward/backward passes, then update the weights once after $N$ such steps (see the sketch below)
- with appropriate loss scaling, this is equivalent to training with an $N$-times larger batch size
- example with SGD: $\theta_{i} = \theta_{i-1} - \alpha \cdot \sum_{n=1}^{N} \mathrm{grad}_{\theta}^{(n)}$, where $\mathrm{grad}_{\theta}^{(n)}$ is the gradient from the $n$-th forward/backward pass
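
A minimal sketch of the pattern in PyTorch; the toy model, synthetic data, and `accumulation_steps` value are assumptions for illustration. The key point is that `loss.backward()` adds into `param.grad`, so skipping `optimizer.zero_grad()` between mini-batches accumulates the gradients:

```python
import torch
from torch import nn

# Hypothetical toy setup; any model/optimizer/dataloader works the same way
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

accumulation_steps = 4  # effective batch size = 8 * 4 = 32

model.train()
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data):
    loss = criterion(model(inputs), targets)
    # Scale so the accumulated gradient equals the mean over the effective
    # batch (matching what a single large batch with mean reduction would give)
    (loss / accumulation_steps).backward()  # gradients add up in param.grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # apply the accumulated gradient
        optimizer.zero_grad()  # clear gradients for the next accumulation window
```

Dividing the loss by `accumulation_steps` is only needed for mean-reduction losses; with sum reduction, the plain sum of gradients (as in the SGD formula above) already matches the large-batch gradient.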