Graph convolutional networks for computational drug development and discovery

Mengying Sun, etc.

Spatial convolution

structure2vec是将graph structured data表示成某种latent representation，使用graphical model中的approximate inference。而inference algorithm的解决方案也暗示了某种propagation equation：一个node的representation是关于neighborhood marginal（nodes）和messages from neighbors的函数。后来有大量的GCN都是基于这个概念展开，也就是熟知的spatial convolution。

spatial convolution是为了在vertex domain上直接构造convolution。（这个点在我们N-Gram里面证明不太好用）紧跟着这个想法，某些节点的representation就通过不断地从neighborhood更新。这个与Weisfeiler-Lehman算法也是顺应的。这里暗含的一个机制就是将neighborhood信息当作graph substructures，然后将这一的substructure通过可导函数，迭代地将不同/相同的substructure映射到不同/相同的feature spaces上。

这中类似message-passing的propagation规则被归纳成两个阶段：message passing阶段和readout阶段。如下

$$ mi^{l+1} = \sum{j \in Ni} \mathcal{M} (h_i^l, h_j^l, h{ij}^l)\ h_i^{l+1} = \mathcal{U}(h_i^l, m_i^{l+1})

其中$$h_{ij}$$是edge feature。

Spectral convolution

spectral convolution是按照在图上定义的Fourier-like基底上的signal。主要是关注在graph Laplacian matrix。

Fourier transform

经典的Fourier transform是将time domain上的signal转换到了frequency domain，作为一系列basis function的线性组合。basis function的权重可以通过以下式子得到：

$$ F(w) = F[f(t)] = \int f(t) e^{-iwt} dt

其中$$w$$是frequency，with basis function $$e^{iwt}$$，而$$f(t)$$是time domain上的时间。Fourier transform运行两个signal之间的convolution，通过frequency domain上简单的乘积和一个projection回到time domain。（？）

graph Fourier transform，可以通过defining convolutions as linear operators that diagonalize in the Fourier basis表示为eigenvectors of the Laplacian operator。（？）

考虑Laplacian L的特征值降解 $$L = U \Lambda U^T$$，因为是real symmetric positive semi-definite，所以他有一个完整的set of orthonormal eigenvectors，也叫做graph Fourier modes，而它们相互之间的顺序也反映了图上的frequency。

Graph Fourier transform

signal上的graph Fourier transform因此定义为

$$ \hat x = U^T x

而紧接着，两个signal的convolution定义为

$$ x * h = U((U^T x) \odot (U^T h))

其中$$\odot$$表示element-wise product。既然有了两个signal之间的convolution，那么很自然地可以将一个signal $$x$$ filtered by $$g_\theta$$ as

$$ \tilde x = g\theta(L) x = U g\theta(\Lambda) U^T x

（filter $$g_\theta$$是什么？）但这里有几个限制：1. filter并不是localized 2. learning complexity是$$\mathcal{O}(n^2)$$ 3. parameter的数量取决于input size。

为了解决上述问题，有一个work提出了polynomial filter，将parameter的数量降低为polynomial级别。(?)

$$ \tilde x = U g\theta(\Lambda) U^T x \approx U \begin{bmatrix} \sum{j=0}^{K} \thetaj \lambda_1^j & &\ & \ddots & \ & & \sum{j=0}^{K} \theta_j \lambda_n^j \end{bmatrix}

这样的filter就能够将neighbors（K步以外）的信息全部都localized，并且与input size无关。并且为了避免explicit多项式级别的计算，使用了Chebyshev approximation：

$$ g\theta (L) = \sum{k=0}^{K} \theta_k T_k (\tilde L)

其中$$Tk(x) = 2x T{k-1}(x) = T_{k-2}(x)$$能够用来迭代计算polynomial。后面有work只考虑了1st order neighborhood，并且将formulation简化为

$$ g_\theta(L) = \theta \tilde A

其中$$\tilde A = D^{-1/2} (1+A) D^{-1/2}$$表示normalized adjacency matrix，并且每一个节点都增加一个自循环。

Appendix

这里要注意一下spatial convolution、spectral convolution与spatial distance、spectral distance的区别。

Graph convolutional networks for computational drug development and discovery

Graph convolutional networks for computational drug development and discovery