TL;DR
Regularization reduces overfitting by penalizing model complexity, which lowers variance at the cost of a small increase in bias.
The regularized loss has the general form L_reg(w) = L(w) + λ·R(w), where:
- λ is the regularization strength
- R(w) is the penalty term that discourages complex models
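A minimal NumPy sketch of this general form, assuming a mean-squared-error base loss; the names `regularized_loss`, `l1`, and `l2` are illustrative, not from the original notes:

```python
import numpy as np

def regularized_loss(w, X, y, lam, penalty):
    """L(w) + λ·R(w): squared-error base loss plus a weighted penalty."""
    base_loss = np.mean((X @ w - y) ** 2)  # L(w): mean squared error
    return base_loss + lam * penalty(w)    # add λ·R(w)

l1 = lambda w: np.sum(np.abs(w))  # R(w) for Lasso
l2 = lambda w: np.sum(w ** 2)     # R(w) for Ridge
```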
L1 Regularization (Lasso)
- Adds the sum of absolute values of the weights to the loss: R(w) = Σᵢ |wᵢ|
- Encourages sparsity — many weights become exactly zero (see the sketch after this list)
- Performs feature selection by eliminating irrelevant features
- Not differentiable at 0 → uses subgradient methods or coordinate descent
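A quick sketch of the sparsity effect using scikit-learn's `Lasso` (its `alpha` parameter plays the role of λ; the synthetic data here is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually influence y.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # the irrelevant coefficients are driven to exactly 0.0
```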
L2 Regularization (Ridge)
- Adds the squared magnitudes of the weights to the loss: R(w) = Σᵢ wᵢ²
- Tends to spread weight across correlated features; coefficients shrink toward zero but rarely become exactly zero, so no feature selection (see the sketch below)
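For contrast with L1, the squared penalty keeps the loss differentiable everywhere, so ridge regression even has a closed-form solution. A minimal sketch (the helper name `ridge_fit` is illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (XᵀX + λI)⁻¹ Xᵀ y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
```

Unlike Lasso, the coefficients returned here are typically all nonzero, just shrunk toward zero.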