The goal of machine learning is to use data to minimize some objective (loss) function. The most widespread way to do this is gradient descent, which repeatedly takes small steps in the direction of steepest descent of the objective until it settles at a minimum; because it only follows the local slope, it is in general guaranteed to find a local minimum, not a global one.

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

where $\theta$ are the model parameters, $L$ is the objective (loss), and $\eta$ is the learning rate.
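To make the update rule concrete, here is a minimal sketch of vanilla gradient descent in Python on a toy quadratic loss $L(\theta) = \lVert\theta - \text{target}\rVert^2$; the names `gradient_descent`, `grad_fn`, and `target`, along with the step size and step count, are illustrative assumptions rather than a fixed recipe.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, lr=0.1, n_steps=100):
    """Repeatedly step opposite the gradient: theta <- theta - lr * grad."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Toy quadratic loss L(theta) = ||theta - target||^2, whose gradient
# is 2 * (theta - target); the minimum sits exactly at `target`.
target = np.array([3.0, -2.0])
grad = lambda theta: 2.0 * (theta - target)

theta_star = gradient_descent(grad, theta0=np.zeros(2))
print(theta_star)  # converges toward [3.0, -2.0]
```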

In practice, we rarely use the full dataset for every step. Stochastic gradient descent (SGD) estimates the gradient from a single data point per update, while mini-batch gradient descent uses a small random subset of data points at each step; both trade an exact gradient for far cheaper, noisier updates.
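As an illustration, here is a minimal single-sample SGD sketch on a toy 1-D linear regression; the synthetic data, learning rate, and step count are arbitrary assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D linear model: y = 2 * x + noise.
x = rng.normal(size=500)
y = 2.0 * x + 0.1 * rng.normal(size=500)

w, lr = 0.0, 0.01
for _ in range(5000):
    i = rng.integers(len(x))                 # draw one random sample
    grad_i = 2.0 * (w * x[i] - y[i]) * x[i]  # gradient of (w*x_i - y_i)^2
    w -= lr * grad_i                         # noisy step toward the minimum
print(w)  # hovers around the true slope 2.0
```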

| Method | Data Used per Update | Update Rule | Speed | Stability | Noise | Best Used When |
| --- | --- | --- | --- | --- | --- | --- |
| Vanilla GD / Batch GD | All training data | $\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$ | Slow | Very stable | No noise | Small datasets |
| Stochastic GD (SGD) | One sample | $\theta \leftarrow \theta - \eta \nabla_\theta L_i(\theta)$ | Fast | High variance | High noise | Large datasets or online learning |
| Mini-batch GD | A small batch of samples | $\theta \leftarrow \theta - \eta \frac{1}{\lvert B\rvert}\sum_{i \in B} \nabla_\theta L_i(\theta)$ | Medium | Good tradeoff | Medium noise | Deep learning (standard default) |

Here $L_i$ is the loss on sample $i$ and $B$ is the sampled mini-batch.
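Mini-batch SGD, the standard default noted in the table, can be sketched as follows; the synthetic linear-regression data, `batch_size`, learning rate, and epoch count are assumptions chosen for the demo. Setting `batch_size=1` recovers pure SGD, and `batch_size=len(y)` recovers batch GD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = X @ w_true + noise.
X = rng.normal(size=(1000, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)

def minibatch_sgd(X, y, lr=0.05, batch_size=32, n_epochs=20):
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_epochs):
        perm = rng.permutation(n)              # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of mean squared error on this mini-batch only.
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

print(minibatch_sgd(X, y))  # approaches w_true = [1.5, -2.0, 0.5]
```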