Denoising Diffusion Probabilistic Models

Basics of Probability

Conditional Probability

For events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is

$$P(A \mid B) = \frac{P(A, B)}{P(B)}, \quad \text{i.e.,} \quad P(A, B) = P(A \mid B)\, P(B).$$

Markov Chain

In a Markov chain, the probability of the current state depends only on the previous state. For example, if the sequence $x_0, x_1, \dots, x_T$ satisfies the Markov property, then

$$q(x_t \mid x_{t-1}, x_{t-2}, \dots, x_0) = q(x_t \mid x_{t-1}),$$

so the joint distribution factorizes as

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^T q(x_t \mid x_{t-1}).$$

Reparameterization

For a neural network, if we want to sample $z$ directly from a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, the result is not differentiable with respect to $\mu$ and $\sigma$. Thus, we can first sample $\epsilon \sim \mathcal{N}(0, 1)$ from the standard distribution and then set

$$z = \mu + \sigma \epsilon.$$

The randomness is transferred to $\epsilon$, while $\mu$ and $\sigma$ become part of a differentiable affine transformation in the network.
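A minimal sketch of the trick in PyTorch (the `mu` / `log_sigma` parameters are illustrative): sampling from $\mathcal{N}(\mu, \sigma^2)$ directly would block gradients, while the reparameterized sample keeps them flowing.

```python
import torch

mu = torch.tensor([0.5], requires_grad=True)         # learnable mean
log_sigma = torch.tensor([0.0], requires_grad=True)  # learnable log-std

# Randomness lives in eps; z depends on mu and sigma only through a
# differentiable affine transform.
eps = torch.randn(1)
z = mu + log_sigma.exp() * eps

loss = (z ** 2).mean()
loss.backward()
print(mu.grad, log_sigma.grad)  # both gradients are populated
```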

Forward Diffusion Process

Fig.1 Diffusion Probabilistic Model

Given a data point sampled from a real data distribution $x_0 \sim q(x)$, let us define a forward diffusion process in which we add a small amount of Gaussian noise to the sample in $T$ steps, producing a sequence of noisy samples $x_1, \dots, x_T$. The step sizes are controlled by a variance schedule $\{\beta_t \in (0, 1)\}_{t=1}^T$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t \mathbf{I}\big), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^T q(x_t \mid x_{t-1}).$$

The data sample $x_0$ gradually loses its distinguishable features as the step $t$ becomes larger. Eventually, when $T \to \infty$, $x_T$ is equivalent to an isotropic Gaussian distribution.

Let $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$, then, using the reparameterization trick with $\epsilon_{t-1}, \epsilon_{t-2}, \dots \sim \mathcal{N}(0, \mathbf{I})$,

$$\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_{t-1} \\
&= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2} \quad (*) \\
&= \dots \\
&= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\end{aligned}$$

where $\bar{\epsilon}_{t-2}$ merges the two Gaussians. Thus

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t)\mathbf{I}\big).$$

(*) Recall that when we merge two Gaussians with different variances, $\mathcal{N}(0, \sigma_1^2 \mathbf{I})$ and $\mathcal{N}(0, \sigma_2^2 \mathbf{I})$, the new distribution is $\mathcal{N}(0, (\sigma_1^2 + \sigma_2^2)\mathbf{I})$. Here the merged standard deviation is $\sqrt{(1 - \alpha_t) + \alpha_t (1 - \alpha_{t-1})} = \sqrt{1 - \alpha_t \alpha_{t-1}}$.
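As a quick numerical sanity check of the closed form above (a sketch; the linear schedule values below are the common DDPM defaults, not something fixed by the derivation), step-by-step noising and the one-shot sample from $q(x_t \mid x_0)$ produce statistically identical results:

```python
import torch

torch.manual_seed(0)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # hypothetical linear schedule
alphas = 1. - betas
alpha_bar = torch.cumprod(alphas, dim=0)

print(alpha_bar[-1])  # ~4e-5: x_T is almost pure Gaussian noise

# Compare step-by-step noising with the one-shot closed form q(x_t | x_0).
t = 500
x0 = torch.randn(100_000)  # toy 1-D "data"

xt = x0.clone()
for s in range(t + 1):  # apply q(x_s | x_{s-1}) up to step t
    xt = torch.sqrt(alphas[s]) * xt + torch.sqrt(betas[s]) * torch.randn_like(xt)

xt_direct = torch.sqrt(alpha_bar[t]) * x0 + torch.sqrt(1. - alpha_bar[t]) * torch.randn_like(x0)

print(xt.mean(), xt_direct.mean())  # both ~0
print(xt.std(), xt_direct.std())    # nearly identical
```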

Usually, we can afford a larger update step when the sample gets noisier, so $\beta_1 < \beta_2 < \dots < \beta_T$ and therefore $\bar{\alpha}_1 > \bar{\alpha}_2 > \dots > \bar{\alpha}_T$.

Reverse Diffusion Process

Fig.2 Diffusion Process

If we can reverse the above process and sample from $q(x_{t-1} \mid x_t)$, we will be able to recreate the true sample from a Gaussian noise input, $x_T \sim \mathcal{N}(0, \mathbf{I})$.

Note that if $\beta_t$ is small enough, $q(x_{t-1} \mid x_t)$ will also be Gaussian.

Unfortunately, we cannot easily estimate $q(x_{t-1} \mid x_t)$ because it needs to use the entire dataset, and therefore we need to learn a model $p_\theta$ to approximate these conditional probabilities in order to run the reverse diffusion process:

$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^T p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big).$$

It is noteworthy that the reverse conditional probability is tractable when conditioned on $x_0$:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t \mathbf{I}\big).$$

As we know, $q(x_t \mid x_{t-1})$, $q(x_t \mid x_0)$, and $q(x_{t-1} \mid x_0)$ are all Gaussian, and $q(x_t \mid x_{t-1}, x_0) = q(x_t \mid x_{t-1})$ by the Markov property. Using Bayes' rule, we have

$$\begin{aligned}
q(x_{t-1} \mid x_t, x_0) &= q(x_t \mid x_{t-1}, x_0)\, \frac{q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)} \\
&\propto \exp\left(-\frac{1}{2}\left(\frac{(x_t - \sqrt{\alpha_t}\, x_{t-1})^2}{\beta_t} + \frac{(x_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\, x_0)^2}{1 - \bar{\alpha}_{t-1}} - \frac{(x_t - \sqrt{\bar{\alpha}_t}\, x_0)^2}{1 - \bar{\alpha}_t}\right)\right) \\
&= \exp\left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}}\right) x_{t-1}^2 - \left(\frac{2\sqrt{\alpha_t}}{\beta_t}\, x_t + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}}\, x_0\right) x_{t-1} + C(x_t, x_0)\right)\right),
\end{aligned}$$

where $C(x_t, x_0)$ is some function not involving $x_{t-1}$ and details are omitted. Thus, following the standard Gaussian density function, the mean and variance can be parameterized as follows:

$$\tilde{\beta}_t = 1 \Big/ \left(\frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}}\right) = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \cdot \beta_t,$$

$$\tilde{\mu}_t(x_t, x_0) = \left(\frac{\sqrt{\alpha_t}}{\beta_t}\, x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}}\, x_0\right) \tilde{\beta}_t = \frac{\sqrt{\alpha_t}(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0.$$

As discussed above, we can represent $x_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\left(x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_t\right)$, plug it into the above equation, and obtain

$$\tilde{\mu}_t = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_t\right),$$

where $\epsilon_t$ is sampled from the distribution $\mathcal{N}(0, \mathbf{I})$. Thus, replacing $\epsilon_t$ with a network prediction $\epsilon_\theta(x_t, t)$, each reverse step samples

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right) + \sqrt{\tilde{\beta}_t}\, z, \qquad z \sim \mathcal{N}(0, \mathbf{I}).$$
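The two parameterizations of $\tilde{\mu}_t$ can be checked numerically; a minimal sketch, assuming the same linear schedule as above:

```python
import torch

torch.manual_seed(0)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas = 1. - betas
alpha_bar = torch.cumprod(alphas, dim=0)

t = 300
x0 = torch.randn(10)
eps = torch.randn(10)
xt = torch.sqrt(alpha_bar[t]) * x0 + torch.sqrt(1. - alpha_bar[t]) * eps

# Posterior mean written in terms of (x_t, x_0) ...
mu1 = (torch.sqrt(alphas[t]) * (1. - alpha_bar[t - 1]) / (1. - alpha_bar[t]) * xt
       + torch.sqrt(alpha_bar[t - 1]) * betas[t] / (1. - alpha_bar[t]) * x0)

# ... and in terms of (x_t, eps): both are the same quantity.
mu2 = (xt - (1. - alphas[t]) / torch.sqrt(1. - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])

print(torch.allclose(mu1, mu2, atol=1e-5))  # True
```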

PyTorch Implementation

Fig.3 The training and sampling algorithms in DDPM

Notice

  • `nan` and `inf` may appear if the floating-point precision is not sufficient, so the schedule constants are computed in `float64`
  • the timestep $t$ participates in training through an embedding: the model conditions on an embedding of $t$, and the per-step constants are likewise looked up with `F.embedding`
```python
import torch
from torch import nn
from torch.nn import functional as F


class DDPM(nn.Module):
    def __init__(self, model, T, beta_1, beta_T):
        super().__init__()
        self.model = model  # noise-prediction network eps_theta(x_t, t)
        self.T = T          # number of diffusion steps
        self.beta_1 = beta_1
        self.beta_T = beta_T
        self._init_constant()

    def _init_constant(self):
        # Compute the schedule in float64 to avoid nan/inf from accumulated
        # rounding, then store as buffers so the constants follow the module
        # across devices.
        betas = torch.linspace(self.beta_1, self.beta_T, self.T, dtype=torch.float64)
        alphas = 1. - betas
        cumprod_alphas = torch.cumprod(alphas, dim=0)                       # \bar{alpha}_t
        prev_cumprod_alphas = F.pad(cumprod_alphas[:-1], (1, 0), value=1.)  # \bar{alpha}_{t-1}

        # q(x_t | x_0): x_t = sqrt(\bar{alpha}_t) x_0 + sqrt(1 - \bar{alpha}_t) eps
        self.register_buffer('forward_coef1', torch.sqrt(cumprod_alphas).float())
        self.register_buffer('forward_coef2', torch.sqrt(1. - cumprod_alphas).float())
        # sqrt(\tilde{beta}_t), the posterior std of q(x_{t-1} | x_t, x_0)
        self.register_buffer('reverse_std',
                             torch.sqrt(betas * (1. - prev_cumprod_alphas) / (1. - cumprod_alphas)).float())
        # Posterior mean: mu = x_t / sqrt(alpha_t) - beta_t / (sqrt(alpha_t) sqrt(1 - \bar{alpha}_t)) * eps.
        # Note the 1 / sqrt(alpha_t) factor, not 1 / sqrt(\bar{alpha}_t).
        self.register_buffer('reverse_mean_coef1', torch.sqrt(1. / alphas).float())
        self.register_buffer('reverse_mean_coef2',
                             (torch.sqrt(1. / alphas) * betas / torch.sqrt(1. - cumprod_alphas)).float())

    def _extract(self, coef, t):
        # Look up the per-step constant for each sample in the batch and
        # broadcast it over the (C, H, W) dimensions.
        return F.embedding(t, coef.unsqueeze(-1)).unsqueeze(-1).unsqueeze(-1)

    def forward(self, x0):
        b = x0.shape[0]
        t = torch.randint(0, self.T, size=(b,), device=x0.device)
        noise = torch.randn_like(x0)  # Gaussian noise (randn, not rand)

        xt = self._extract(self.forward_coef1, t) * x0 + self._extract(self.forward_coef2, t) * noise
        denoise = self.model(xt, t)

        return noise, denoise

    @torch.no_grad()
    def sample(self, xT):
        xt = xT
        b = xT.shape[0]

        for step in reversed(range(self.T)):
            t = torch.full((b,), step, dtype=torch.long, device=xT.device)
            denoise = self.model(xt, t)
            mean = (self._extract(self.reverse_mean_coef1, t) * xt
                    - self._extract(self.reverse_mean_coef2, t) * denoise)
            # No noise is added at the final step (t = 0).
            noise = torch.randn_like(xt) if step > 0 else 0.
            xt = mean + self._extract(self.reverse_std, t) * noise

        return torch.clip(xt, min=-1., max=1.)
```
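A hypothetical usage sketch: `ToyModel` below merely has the `(x_t, t)` signature the wrapper expects (the paper uses a time-conditioned U-Net), and the hyperparameters are the common DDPM defaults.

```python
# Hypothetical stand-in for the noise-prediction network; a real DDPM
# would use a time-conditioned U-Net here.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, xt, t):
        return self.net(xt)  # ignores t, for demonstration only

ddpm = DDPM(ToyModel(), T=1000, beta_1=1e-4, beta_T=0.02)
optimizer = torch.optim.Adam(ddpm.parameters(), lr=2e-4)

# One training step (Algorithm 1): regress the injected noise.
x0 = torch.rand(8, 3, 32, 32) * 2 - 1  # images scaled to [-1, 1]
noise, denoise = ddpm(x0)
loss = F.mse_loss(denoise, noise)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Sampling (Algorithm 2): start from pure Gaussian noise.
samples = ddpm.sample(torch.randn(4, 3, 32, 32))
```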

Reference


Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS 2020. arXiv:2006.11239.