Summary
A technique that helps deep neural networks learn more effectively by letting each layer focus on “what’s different” rather than having to learn everything from scratch.
Fail
In a deep network, layers are stacked on top of each other to learn complex patterns. However, when we add too many layers, networks sometimes struggle to learn effectively, and accuracy may even start to degrade instead of improve. Ideally, extra layers should be able to simply “do nothing” when they don’t need to transform the input, i.e., learn an identity function that passes the input through unchanged. In practice, deep layers often struggle to learn this identity function naturally.
Important
- Residual connections, or skip connections, address this problem by letting each layer learn only the difference (or residual) between its desired output and its input.
- Instead of learning the entire output from scratch, each layer only has to learn what it should add to (or subtract from) the input to get closer to the target, as the short derivation below shows. This is much easier and faster for the network to learn, especially in deep networks.
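In symbols, this is the standard residual-learning formulation (the symbols H, F, and x are generic, not notation taken from this note): if H(x) is the mapping a block would ideally compute, the block is trained to learn only the residual F(x) = H(x) − x.

```latex
% Residual learning, standard formulation (H, F, x are illustrative symbols).
% H(x): the mapping the block would ideally compute
% F(x): what the block actually learns -- the residual
\[
  F(x) := H(x) - x
  \qquad\Longrightarrow\qquad
  H(x) = F(x) + x
\]
% If H is close to the identity, the target residual F(x) is close to zero,
% which is far easier to learn than the identity mapping itself.
```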
Important
- In practice, a residual block passes its output plus the original input forward. Mathematically, a residual block for input x and learned function F(x) can be written as y = F(x) + x, where y is the block’s output.
- This shortcut connection, the “+ x” term, enables each layer to modify only what’s necessary while leaving the rest of the input unchanged (see the code sketch below).
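A minimal code sketch of such a block, assuming PyTorch; the class name `ResidualBlock`, the feature size, and the two-linear-layer choice of F are illustrative assumptions, not something specified in this note.

```python
# Minimal residual block sketch (assumes PyTorch is installed).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # F(x): the transformation the block actually has to learn
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = F(x) + x  -- the "+ x" is the shortcut (skip) connection
        return self.f(x) + x

x = torch.randn(8, 64)      # a batch of 8 inputs with 64 features
block = ResidualBlock(64)
y = block(x)                # same shape as x: the block only adds a correction
print(y.shape)              # torch.Size([8, 64])
```

Note that the output of F must have the same shape as x for the addition to work; if F learns to output zeros, the block reduces to the identity.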
Info
Residual connections make it easier for networks to learn patterns because keeping the input unchanged only requires outputting a residual of zero, which is useful when extra layers don’t need to make major changes.
Note
Residual connections also allow gradients to flow through the network more easily, preventing them from becoming too large or too small and thus mitigating the exploding and vanishing gradients problem. This helps even the deeper layers train just as effectively as the layers closer to the output, as the short derivation below illustrates.
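To see why the gradient cannot die at a residual block, differentiate the block output under the formulation above, y = F(x) + x (a sketch with generic symbols L for the loss and I for the identity matrix, not notation from this note):

```latex
% Backward pass through one residual block y = F(x) + x.
% L is the training loss; I is the identity matrix.
\[
  \frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + I
  \qquad\Longrightarrow\qquad
  \frac{\partial L}{\partial x}
    = \frac{\partial L}{\partial y}\,\frac{\partial F(x)}{\partial x}
    + \frac{\partial L}{\partial y}
\]
% Even if the Jacobian of F is very small, the second term passes the
% upstream gradient through unchanged, so it does not vanish at this block.
```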