TL;DR
A generative model that creates new data by learning how data is gradually destroyed by added noise, and then learning to reverse that corruption process.
Forward Process (diffusion)
- The model starts with a clean image and adds noise step by step (this process is fixed, not learned).
- After many steps, the image becomes completely unrecognizable noise.
- The clean image transitions to a noisy one, with each step adding more and more noise.
- Mathematically this is a Markov chain, where at each step Gaussian noise is added:
- $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$
- $\beta_t$ controls how much noise is added at each step.
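The forward process also has a convenient closed form, $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ with $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$, which lets you jump to any timestep directly. A minimal NumPy sketch (the linear beta schedule and the toy all-zeros "image" are assumptions for illustration):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Sample x_t from x_0 in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    if rng is None:
        rng = np.random.default_rng()
    alpha_bar = np.cumprod(1.0 - betas)      # cumulative product of alpha_s = 1 - beta_s
    eps = rng.standard_normal(x0.shape)      # the Gaussian noise that gets added
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Linear beta schedule over T = 1000 steps (a common, assumed choice)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
x0 = np.zeros((32, 32))                      # toy "clean image"
xT, _ = forward_diffuse(x0, T - 1, betas)    # after many steps: nearly pure noise
```

Because $\bar{\alpha}_T$ is almost zero at the last step, `xT` is statistically indistinguishable from a standard Gaussian, matching the "completely unrecognizable noise" endpoint above.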
Reverse Process (Denoising)
- The reverse process recovers the original clean image step by step from the noisy image.
- This is modeled as another Markov chain, where at each step noise is removed to move closer to the original image:
- $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$
- Mathematically, a single reverse step can be expressed as (with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$):
- $x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sigma_t z$
- Here:
- $z \sim \mathcal{N}(0, I)$ (optional noise for stochasticity)
- $\sigma_t$ controls the variance during the reverse step.
- ==The reverse process is learned using a neural network that predicts the amount of noise added at each step. This prediction is used to iteratively denoise.==
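The iterative denoising loop can be sketched as below. Since training a real network is out of scope here, the noise predictor is a stand-in that always returns zero; everything about it (and the short 50-step schedule) is an illustrative assumption, not a working model:

```python
import numpy as np

def reverse_sample(eps_model, shape, betas, rng=None):
    """DDPM-style ancestral sampling: start from pure noise x_T and repeatedly apply
    x_{t-1} = (x_t - (1-alpha_t)/sqrt(1-alpha_bar_t) * eps_theta) / sqrt(alpha_t) + sigma_t * z."""
    if rng is None:
        rng = np.random.default_rng()
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)           # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(x, t)            # network's prediction of the added noise
        mean = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        z = rng.standard_normal(shape) if t > 0 else 0.0   # no extra noise at the final step
        sigma = np.sqrt(betas[t])            # a common fixed choice: sigma_t^2 = beta_t
        x = mean + sigma * z
    return x

dummy_model = lambda x, t: np.zeros_like(x)  # placeholder for a trained network
betas = np.linspace(1e-4, 0.02, 50)
sample = reverse_sample(dummy_model, (4, 4), betas)
```

With a trained `eps_model`, the same loop turns pure noise into a sample from the data distribution; with the zero stand-in it just illustrates the mechanics of the chain.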
Training Objective
- The model is trained to minimize the difference between the predicted noise and the actual noise added at each forward step:
- $L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon} \left[ \left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2 \right]$
- Training is performed by maximizing the ELBO, which is a lower bound on the log-likelihood.
- $\epsilon_\theta(x_t, t)$: the neural network's prediction of the noise for input $x_t$ at time $t$.
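A minimal sketch of evaluating this objective for one random timestep, using the closed-form forward sample $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$; the zero-noise stand-in model is a placeholder assumption for a real network:

```python
import numpy as np

def diffusion_loss(eps_model, x0, betas, rng=None):
    """Simplified DDPM objective for one sample: || eps - eps_theta(x_t, t) ||^2,
    averaged over pixels, at a uniformly random timestep t."""
    if rng is None:
        rng = np.random.default_rng()
    alpha_bar = np.cumprod(1.0 - betas)
    t = int(rng.integers(len(betas)))        # sample a random timestep
    eps = rng.standard_normal(x0.shape)      # the actual noise added in the forward step
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = eps_model(xt, t)               # network's prediction of that noise
    return float(np.mean((eps - eps_hat) ** 2))

betas = np.linspace(1e-4, 0.02, 100)
x0 = np.zeros((16, 16))                      # toy "clean image"
dummy = lambda x, t: np.zeros_like(x)        # placeholder network
loss = diffusion_loss(dummy, x0, betas)      # expectation is E[eps^2] = 1 for a zero predictor
```

In real training, this scalar would be backpropagated through `eps_model` and averaged over a minibatch of images and timesteps.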
Key Components
- Forward Process (Noise Addition):
- Progressively adds Gaussian noise to transition $x_0 \to x_T$.
- Defined by: $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$
- Reverse Process (Noise Removal):
- Approximates the denoising distribution: $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$
- Where:
- $\mu_\theta(x_t, t)$: mean predicted by the neural network.
- $\Sigma_\theta(x_t, t)$: variance (can be fixed or learned).
Connection to Variational Inference
- The forward process defines a fixed distribution $q(x_{1:T} \mid x_0)$.
- The reverse process $p_\theta(x_{0:T})$ tries to approximate the true posterior $q(x_{t-1} \mid x_t, x_0)$ by optimizing the Evidence Lower Bound (ELBO):
- $\log p_\theta(x_0) \ge \mathbb{E}_{q}\left[ \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)} \right]$
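As a sketch of why optimizing the ELBO trains the denoiser, it can be decomposed (as in the standard DDPM treatment) into a reconstruction term, a prior-matching term, and per-step denoising KL terms:

```latex
\mathbb{E}_q\!\left[
  \underbrace{\log p_\theta(x_0 \mid x_1)}_{\text{reconstruction}}
  \;-\; \underbrace{D_{\mathrm{KL}}\!\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{\text{prior matching}}
  \;-\; \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{\text{denoising matching}}
\right]
```

The sum of KL terms is what reduces, under Gaussian assumptions, to the simple noise-prediction loss in the Training Objective section.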
Resources
- Diffusion Without Tears
- Step-by-Step Diffusion: An Elementary Tutorial (arxiv.org)