The probability of the current state depends only on the state at the previous step. Under this Markov assumption,

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \dots, \mathbf{x}_0) = q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$$
Reparameterization
For a neural network, if we want to sample directly from a Gaussian distribution $z \sim \mathcal{N}(\mu, \sigma^2)$, the sampling operation is not differentiable with respect to $\mu$ and $\sigma$. Thus, we can first sample $\epsilon \sim \mathcal{N}(0, 1)$ from the standard normal distribution and then compute $z = \mu + \sigma \epsilon$. The randomness is transferred to $\epsilon$, while $\mu$ and $\sigma$ become part of a differentiable affine transformation of the network.
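As a minimal sketch of the trick (the tensor shapes and variable names here are illustrative, not from the original text), the reparameterized sample keeps gradients flowing into `mu` and `log_sigma`:

```python
import torch

# mu and log_sigma would normally be outputs of a network; here they are
# plain leaf tensors so we can check that gradients flow through the sample.
mu = torch.zeros(4, requires_grad=True)
log_sigma = torch.zeros(4, requires_grad=True)  # predict log(sigma) for positivity

eps = torch.randn(4)                    # all randomness lives in eps
z = mu + log_sigma.exp() * eps          # differentiable affine transform

z.sum().backward()
print(mu.grad)         # ones: dz/dmu = 1
print(log_sigma.grad)  # sigma * eps = eps here, since sigma = 1
```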
Forward Diffusion Process
Fig.1 Diffusion Probabilistic Model
Given a data point sampled from a real data distribution $\mathbf{x}_0 \sim q(\mathbf{x})$, let us define a forward diffusion process in which we add a small amount of Gaussian noise to the sample in $T$ steps, producing a sequence of noisy samples $\mathbf{x}_1, \dots, \mathbf{x}_T$. The step sizes are controlled by a variance schedule $\{\beta_t \in (0, 1)\}_{t=1}^T$:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\big(\mathbf{x}_t; \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I}\big), \qquad q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^T q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$$
The data sample $\mathbf{x}_0$ gradually loses its distinguishable features as the step $t$ becomes larger. Eventually, when $T \to \infty$, $\mathbf{x}_T$ is equivalent to an isotropic Gaussian distribution.
Let $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$, then

$$\begin{aligned}
\mathbf{x}_t &= \sqrt{\alpha_t}\,\mathbf{x}_{t-1} + \sqrt{1 - \alpha_t}\,\boldsymbol{\epsilon}_{t-1} && \text{where } \boldsymbol{\epsilon}_{t-1}, \boldsymbol{\epsilon}_{t-2}, \dots \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\
&= \sqrt{\alpha_t \alpha_{t-1}}\,\mathbf{x}_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\,\bar{\boldsymbol{\epsilon}}_{t-2} && \text{where } \bar{\boldsymbol{\epsilon}}_{t-2} \text{ merges two Gaussians (*)} \\
&= \dots \\
&= \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}
\end{aligned}$$

Therefore $q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I}\big)$.
(*) Recall that when we merge two Gaussians with different variance, $\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I})$ and $\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I})$, the new distribution is $\mathcal{N}(\mathbf{0}, (\sigma_1^2 + \sigma_2^2)\mathbf{I})$. Here the merged standard deviation is $\sqrt{(1 - \alpha_t) + \alpha_t(1 - \alpha_{t-1})} = \sqrt{1 - \alpha_t \alpha_{t-1}}$.
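A quick numeric sanity check of this merge (purely illustrative; the values of `a_t` and `a_tm1` are arbitrary, not from the text):

```python
import torch

a_t, a_tm1 = 0.98, 0.99   # arbitrary alpha_t and alpha_{t-1}
eps1, eps2 = torch.randn(1_000_000), torch.randn(1_000_000)

# sqrt(alpha_t) * (sqrt(1 - alpha_{t-1}) * eps1) + sqrt(1 - alpha_t) * eps2
merged = (a_t * (1 - a_tm1)) ** 0.5 * eps1 + (1 - a_t) ** 0.5 * eps2

print(merged.std().item())       # empirical standard deviation
print((1 - a_t * a_tm1) ** 0.5)  # predicted sqrt(1 - alpha_t * alpha_{t-1})
```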
Usually, we can afford a larger update step when the sample gets noisier, so $\beta_1 < \beta_2 < \dots < \beta_T$ and therefore $\bar{\alpha}_1 > \dots > \bar{\alpha}_T$.
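Putting the closed form $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}$ into code, here is a minimal sketch of the forward process (the linear schedule endpoints `1e-4` and `0.02` follow the DDPM paper; the function name `q_sample` is my own choice):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # increasing variance schedule
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0: torch.Tensor, t: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in one shot using the closed form."""
    ab = alphas_bar[t].view(-1, *[1] * (x0.dim() - 1))  # broadcast over batch
    return ab.sqrt() * x0 + (1 - ab).sqrt() * eps

x0 = torch.randn(8, 3, 32, 32)   # e.g. a batch of images
t = torch.randint(0, T, (8,))    # a random timestep per sample
x_t = q_sample(x0, t, torch.randn_like(x0))
```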
Reverse Diffusion Process
Fig.2 Diffusion Process
If we can reverse the above process and sample from $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$, we will be able to recreate the true sample from a Gaussian noise input, $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
Note that if $\beta_t$ is small enough, $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ will also be Gaussian.
Unfortunately, we cannot easily estimate $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ because it needs to use the entire dataset, and therefore we need to learn a model $p_\theta$ to approximate these conditional probabilities in order to run the reverse diffusion process:

$$p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t), \qquad p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\big(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\big)$$
It is noteworthy that the reverse conditional probability is tractable when conditioned on $\mathbf{x}_0$:

$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_{t-1}; \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0), \tilde{\beta}_t \mathbf{I}\big)$$
As we know, $q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I}\big)$. Using Bayes' rule, we have

$$\begin{aligned}
q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) &= q(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0)\, \frac{q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)}{q(\mathbf{x}_t \mid \mathbf{x}_0)} \\
&\propto \exp\Big(-\frac{1}{2}\Big(\frac{(\mathbf{x}_t - \sqrt{\alpha_t}\,\mathbf{x}_{t-1})^2}{\beta_t} + \frac{(\mathbf{x}_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\,\mathbf{x}_0)^2}{1 - \bar{\alpha}_{t-1}} - \frac{(\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0)^2}{1 - \bar{\alpha}_t}\Big)\Big) \\
&= \exp\Big(-\frac{1}{2}\Big(\Big(\frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}}\Big)\mathbf{x}_{t-1}^2 - \Big(\frac{2\sqrt{\alpha_t}}{\beta_t}\mathbf{x}_t + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}}\mathbf{x}_0\Big)\mathbf{x}_{t-1} + C(\mathbf{x}_t, \mathbf{x}_0)\Big)\Big)
\end{aligned}$$
where $C(\mathbf{x}_t, \mathbf{x}_0)$ is some function not involving $\mathbf{x}_{t-1}$ and details are omitted. Thus, following the standard Gaussian density function, the mean and variance can be parameterized as follows:

$$\tilde{\beta}_t = 1 \Big/ \Big(\frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}}\Big) = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \cdot \beta_t$$

$$\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) = \Big(\frac{\sqrt{\alpha_t}}{\beta_t}\mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}}\mathbf{x}_0\Big)\, \tilde{\beta}_t = \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\mathbf{x}_0$$
As discussed above, we can represent $\mathbf{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\big(\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}_t\big)$, plug it into the above equation, and obtain

$$\tilde{\boldsymbol{\mu}}_t = \frac{1}{\sqrt{\alpha_t}}\Big(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_t\Big)$$
where $\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ is the noise that produced $\mathbf{x}_t$ from $\mathbf{x}_0$. Thus, if a network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ is trained to predict this noise, one reverse step samples

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\Big(\mathbf{x}_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\Big) + \sigma_t \mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
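A sketch of this single reverse step in PyTorch, reusing `betas`, `alphas`, and `alphas_bar` from the forward-process sketch above (`model` stands in for any noise-prediction network, and $\sigma_t^2 = \beta_t$ is one common choice):

```python
import torch

@torch.no_grad()
def p_sample(model, x_t: torch.Tensor, t: int) -> torch.Tensor:
    """One reverse step: sample x_{t-1} from p_theta(x_{t-1} | x_t)."""
    eps_theta = model(x_t, torch.full((x_t.shape[0],), t))  # predicted noise
    coef = (1 - alphas[t]) / (1 - alphas_bar[t]).sqrt()
    mean = (x_t - coef * eps_theta) / alphas[t].sqrt()
    if t == 0:
        return mean                    # no noise added at the final step
    z = torch.randn_like(x_t)
    return mean + betas[t].sqrt() * z  # sigma_t^2 = beta_t
```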
PyTorch Implementation
Fig.3 The training and sampling algorithms in DDPM
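As a minimal sketch of the training algorithm in Fig.3, using the simplified objective $\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\|^2$ from the DDPM paper (the `model` and `optimizer` are placeholders; `q_sample` and `T` come from the forward-process sketch above):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x0: torch.Tensor) -> float:
    """One optimization step of the simplified DDPM objective."""
    t = torch.randint(0, T, (x0.shape[0],))  # uniform random timesteps
    eps = torch.randn_like(x0)               # target noise
    x_t = q_sample(x0, t, eps)               # forward-diffuse in one shot
    loss = F.mse_loss(model(x_t, t), eps)    # predict the injected noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```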
Notice
`nan` and `inf` may appear if the numerical precision is not high enough; for example, $\bar{\alpha}_t$ becomes extremely small for large $t$, so terms like $1/\sqrt{\bar{\alpha}_t}$ can overflow in low-precision arithmetic.
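One common defensive choice (an assumption on my part, not a prescription from the original text) is to compute the schedule constants in float64 and cast only at the end:

```python
import torch

T = 1000
# Accumulate the cumulative product in float64 to avoid drift / underflow,
# then cast to the working precision for use inside the model.
betas64 = torch.linspace(1e-4, 0.02, T, dtype=torch.float64)
alphas_bar64 = torch.cumprod(1.0 - betas64, dim=0)
alphas_bar = alphas_bar64.to(torch.float32)
```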