TL;DR

A diffusion model is a generative model that creates new data by learning to reverse a gradual noising process: noise is added to the data step by step, and the model learns to undo those steps.

Forward Process (Diffusion)

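  • The model starts with a clean image $x_0$ and adds noise step by step. This process is fixed: it has no learned parameters.
  • After many steps, the image becomes completely unrecognizable noise.
    • The clean image $x_0$ transitions through increasingly noisy images $x_1, \dots, x_T$, with each step adding more noise.
    • Mathematically, this is a Markov chain in which Gaussian noise is added at each step: $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I\right)$
      • $\beta_t$ controls how much noise is added at each step.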
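
The forward chain can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the linear schedule from 1e-4 to 0.02, the four-value toy "image", and the helper names `linear_beta_schedule`, `forward_step`, and `forward_chain` are all assumptions made for the example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise levels beta_1..beta_T (an assumed, common choice; num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I) for a list of floats."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_chain(x0, betas, rng):
    """Run the whole Markov chain and return [x_0, x_1, ..., x_T]."""
    trajectory = [list(x0)]
    for beta_t in betas:
        # Each step depends only on the previous state (Markov property).
        trajectory.append(forward_step(trajectory[-1], beta_t, rng))
    return trajectory

rng = random.Random(0)
betas = linear_beta_schedule(1000)
x0 = [1.0, -0.5, 0.25, 2.0]  # a toy "image" with four pixels
trajectory = forward_chain(x0, betas, rng)
```

With this schedule, almost none of the original signal survives in `trajectory[-1]`, which is the "unrecognizable noise" end state described above.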

Reverse Process (Denoising)

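  • The reverse process recovers the original clean image $x_0$ step by step from the noisy image $x_T$.
  • This is modeled as another Markov chain, where at each step the predicted noise is subtracted to move closer to the original image:
    • Mathematically, the reverse step can be expressed as: $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right) + \sigma_t z$
      • Here:
        • $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, where $\beta_t$ is the noise level of forward step $t$
        • $\epsilon_\theta(x_t, t)$ is the noise predicted by the neural network
        • $z \sim \mathcal{N}(0, I)$ (optional noise for stochasticity)
        • $\sigma_t$ controls the variance during the reverse step.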
  • ==The reverse process is learned using a neural network that predicts the amount of noise added at each step. This prediction is used to iteratively denoise.==
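
The iterative denoising loop can be sketched as follows. It is a toy illustration under stated assumptions: a DDPM-style update with $\sigma_t^2 = \beta_t$ (one common choice among several), a linear noise schedule, and a hypothetical `oracle_noise` function standing in for the trained network. The oracle is only possible here because the "dataset" is a single known point, `TARGET`; a real model would be a learned predictor.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alphas, alpha_bars) for an assumed linear schedule (num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars

def reverse_step(x_t, t, predict_noise, schedule, rng):
    """One update x_t -> x_{t-1}; t is 1-based and runs from T down to 1."""
    betas, alphas, alpha_bars = schedule
    i = t - 1
    eps_hat = predict_noise(x_t, t)
    coef = (1.0 - alphas[i]) / math.sqrt(1.0 - alpha_bars[i])
    mean = [(x - coef * e) / math.sqrt(alphas[i]) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean  # no noise is added on the final step
    sigma_t = math.sqrt(betas[i])  # assumed choice: sigma_t^2 = beta_t
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]

def sample(dim, predict_noise, schedule, rng):
    """Start from Gaussian noise and denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(schedule[0]), 0, -1):
        x = reverse_step(x, t, predict_noise, schedule, rng)
    return x

TARGET = [1.0, -0.5, 0.25]  # toy dataset consisting of one point
schedule = make_schedule(1000)

def oracle_noise(x_t, t):
    """Hypothetical stand-in for the network: exact noise when x_0 is known to be TARGET."""
    abar = schedule[2][t - 1]
    return [(x - math.sqrt(abar) * c) / math.sqrt(1.0 - abar) for x, c in zip(x_t, TARGET)]

rng = random.Random(0)
x0_hat = sample(len(TARGET), oracle_noise, schedule, rng)
```

Because the noise prediction is exact in this toy setup, the loop ends at `TARGET` up to floating-point error. With a learned network, the result would instead be a new sample that resembles the training data.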

Training Objective

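  • The model is trained to minimize the difference between the predicted noise and the actual noise added in the forward process:

    $L = \mathbb{E}_{x_0, \epsilon, t}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]$

  • Training maximizes the ELBO, a lower bound on the log-likelihood; the noise-prediction loss is a simplified form of that bound.

    • $\epsilon_\theta(x_t, t)$: Neural network's prediction of the noise for input $x_t$ at time $t$.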
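
A Monte Carlo estimate of the noise-prediction loss might look like the sketch below. It relies on the standard closed form $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$ to jump straight to step $t$. The linear schedule, the toy data point, and the two stand-in predictors (`zero_predictor`, `oracle_predictor`) are assumptions for illustration; in practice the predictor is a neural network updated by gradient descent on this loss.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule (num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        running *= 1.0 - (beta_start + i * step)
        alpha_bars.append(running)
    return alpha_bars

def noise_prediction_loss(x0, predict_noise, alpha_bars, rng):
    """One-sample estimate of E[(eps - eps_theta(x_t, t))^2], averaged over pixels."""
    t = rng.randint(1, len(alpha_bars))      # uniform timestep in 1..T
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # the actual noise
    abar = alpha_bars[t - 1]
    # Closed-form forward sample: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps
    x_t = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e for x, e in zip(x0, eps)]
    eps_hat = predict_noise(x_t, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)

alpha_bars = alpha_bar_schedule(1000)
x0 = [1.0, -0.5, 0.25, 2.0]  # toy data point

def zero_predictor(x_t, t):
    """Untrained baseline: always predicts zero noise."""
    return [0.0 for _ in x_t]

def oracle_predictor(x_t, t):
    """Hypothetical perfect predictor; it cheats by knowing x0."""
    abar = alpha_bars[t - 1]
    return [(x - math.sqrt(abar) * c) / math.sqrt(1.0 - abar) for x, c in zip(x_t, x0)]

rng = random.Random(0)
loss_zero = sum(noise_prediction_loss(x0, zero_predictor, alpha_bars, rng) for _ in range(200)) / 200
loss_oracle = sum(noise_prediction_loss(x0, oracle_predictor, alpha_bars, rng) for _ in range(200)) / 200
```

The baseline's loss stays near the variance of the noise, while the perfect predictor drives it to essentially zero. Training moves the network from the first regime toward the second.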

Key Components

  1. Forward Process (Noise Addition):

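    • Progressively adds Gaussian noise to transition $x_0 \to x_T$.
    • Defined by: $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I\right)$, where $\beta_t$ is the noise level at step $t$.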
  2. Reverse Process (Noise Removal):

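    • Approximates the denoising distribution: $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$
    • Where:
      • $\mu_\theta(x_t, t)$: Mean predicted by the neural network.
      • $\Sigma_\theta(x_t, t)$: Variance (can be fixed or learned).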
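
A useful consequence of the Gaussian forward step is that $x_t$ can be sampled directly from $x_0$ without simulating every intermediate step. The short derivation below is a sketch that assumes the notation $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, with $\epsilon_t$, $\bar{\epsilon}$, $\epsilon$ all standard Gaussian:

$$
\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon} \\
    &= \dots \\
    &= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
\end{aligned}
$$

The second line merges two independent Gaussian terms, whose variances add: $\alpha_t (1 - \alpha_{t-1}) + (1 - \alpha_t) = 1 - \alpha_t \alpha_{t-1}$. Hence $q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I\right)$.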

Connection to Variational Inference

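  • The forward process defines a fixed distribution $q(x_{1:T} \mid x_0)$.
  • The reverse process $p_\theta(x_{t-1} \mid x_t)$ tries to approximate the true reverse conditional $q(x_{t-1} \mid x_t)$ by optimizing the Evidence Lower Bound (ELBO): $\log p_\theta(x_0) \geq \mathbb{E}_{q}\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]$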
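
For reference, the bound $\log p_\theta(x_0) \geq \mathbb{E}_q\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]$ is usually rewritten as a sum of per-step terms. The decomposition below is the standard one from the DDPM literature, stated as a sketch with $p(x_T)$ assumed to be a standard Gaussian prior:

$$
-\log p_\theta(x_0) \leq \mathbb{E}_q\Big[
\underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
+ \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
\underbrace{-\log p_\theta(x_0 \mid x_1)}_{L_0}
\Big]
$$

$L_T$ has no trainable parameters because the forward process is fixed. Each $L_{t-1}$ compares two Gaussians, which is what makes the noise-prediction form of the training loss possible.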

Resources

Diffusion Without Tears Step-by-Step Diffusion: An Elementary Tutorial (arxiv.org)