Graph Normalizing Flows

Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

U Toronto, UCB, U Toronto, Google Research, Google Research

Intro

把normalizing flows的想法延伸到graph上，有graph normalizing flows (GNFs)。

Background

Graph Neural Networks

用attention机制，来衡量neighbor nodes在message passing过程中的重要性。

multi-headed attention使用multiple weights $$W$$作为attention，最后再将结果进行concatenate。

Normalizing Flows

normalizing flows (NFs)是一系列generative model，使用可逆的mapping。$$x = f^{-1} (f(x))$$。根据normalizing flow，有

$$ p(z) = P(x) \Big| \frac{\partial f(x)}{\partial x} \Big|^{-1}

注：根据normalizing flow（blog），有 $$p(x) dx = p(z) dz$$，因此

$$ p(z) = p(x) \frac{dx}{dz} = p(x) \frac{dx}{d f(x)} = p(x) \Big| \frac{d f(x)}{d x} \Big|^{-1}

如果有sufficiently expressive mapping，NFs可以讲一个很复杂的分布映射到Gaussian；关键在于找到expressive mapping，同时计算上computable determinant。我们的构造是基于non-volume preserving flows, 也就是RealNVP。尤其是中间的affine coupling layer可以将x的维度分解为两组变量，$$x^{(0)}, x^{(1)}$$，然后各自映射，$$z^{(0)}, z^{(1)}$$

$$ z^{(0)} = x^{(0)}\ z^{(1)} = x^{(1)} \odot \exp (s(x^{(0)})) + t(x^{(0)})

Methods

Reversible Graph Neural Networks (GRevNets)

是一系列的reversible message passing neural networks。为了实现reversibility，将node feature matrix分成了两个部分，$$H_0$$ 和 $$H_1$$。对于graph中的某个特定节点，step t对应的message passing阶段的特征分别叫做 $$h_t^0$$ 和 $$h_t^1$$，同时$$h_t^{v} = concat(h_t^0, h_t^1)$$。

同时message passing的一步也可以分作两个中间步骤，每一个都被当作是半步。有

$$ H{t+1/2}^{(0)} = H_t^{(0)} \odot (F_1(H_t^{(1)})) + F_2 (H_t^{(1)})\ H{t+1/2}^{(1)} = Ht^{(1)}\ H{t+1}^{(0)} = H{t+1/2}^{(0)} \ H{t+1}^{(1)} = H{t+1/2}^{(1)} \odot \exp (G_1(H{t+1/2}^{(0)})) + G2(H{T+1/2}^{(0)})

而相对应的reverse版本也可以相对应的得到。

GNFs for Structured Density Estimation

和NFs一样的思路，我们也可以通过change of variables来告诉我们rules for exact density transformation。如果我们假设 $$Ht \sim P(H_t)$$，那么 $$P(H{t-1})$$ 的density就是 $$P(H{t-1}) = \det \Big| \frac{\partial H{t-1}}{\partial H_t} \Big| P(H_t)$$，因此有

$$ P(G) = \det \Big| \frac{\partial HT}{\partial H_0} \Big| P(H_T) = \Big| P(H_T) \prod{t=1}^T \det \Big| \frac{\partial H{t}}{\partial H{t-1}}

其中$$H_0$$是node features。Jacobian是通过下三角矩阵给出的，因此可以进行density计算。

Graph Auto-Encoders

graph的auto-encoder有一个需要解决的问题是离散化。我们分为两步来解决这个问题

训练一个permutation invariant graph auto-encoder
训练一个GNF来给graph embedding的分布进行建模，然后再用一个decoder生成图

GAE中是生成了一个vector，然后用这个vector和自己的transpose做乘法，从而得到了adjacency matrix。这里使用的方法是embed a set of nodes in a graph jointly, each node will be mapped to its own embedding vector。这样可以避免在decoder的时候跑matching。

loss function就是针对每一条边的BCE。

我们使用一个相对简单的decoder。给定两个node的embedding，我们的decoder输出对应edge的概率为

$$ \hat A_{ij} = \frac{1}{1 + \exp (C (| x_i - x_j|^2-1)) }

其中C是叫做temperature hyperparameter，我们默认为10。这也反映了如果两个节点在embedding space中比较接近，那么这两个点更可能相互连接。

encoder是标准的GNN，with multi-head dot-product attention。如果我们没有并且对node feature不关心，那么可以随机生成Gaussian variables来表示每一个节点。

Appendix

In submission

Graph Normalizing Flows

Graph Normalizing Flows