Graph-Bert: Only Attention is Needed for Learning Graph Representations

Method

Graph-Bert会在一个linkless subgraph上（从complete graph sample）先进行训练。为了控制sampling过程，这里引入了top-k intimacy sampling，使用intimacy matrix。

这里使用类似pagerank的算法来定义intimacy matrix $$S$$。

$$ S = \alpha \cdot ( I - (1-\alpha) \cdot \bar A )^{-1}

$$\bar A = A D^{-1}$$ 是normalized adjacency matrix。

Node Context: $$\tau (v_i) = { v_j | v_j \in \mathcal{V} \setminus {v_i} \wedge S(i,j) \ge \theta_i }$$。就是某个node的context是其他所有和他相似度大于$$\theta$$的node。

然后根据node context的概念，我们可以将sampled graph batches表示为$$G = g1, g_2, ..., g{|\mathcal{V}|}$$, $$g_i$$是node i对应的node context subgraph。

input vector包含四个部分

Raw Feature Vector Embedding $$e_j^x = Embed(x_j$$

WL Absolute Role Embedding Weisfeiler-Lehman (WL) algorithm根据structural roles将node进行label。 $$e_j^r = Position-Embed(WL(v_j))$$

Intimacy-Based Relative Positional Embedding $$e_j^p = Position-Embed(P(v_j))$$

Hop-Based Relative Distance Embedding $$e_j^d = Position-Embed(H(v_j, v_i))$$

initial node embedding 是 $$h_j^0 = Aggregate(e_j^x, e_j^r, e_j^p, e_j^d)$$。

下面写graph-transformer

$$ H = softmax (\frac{QK^T}{\square{d_h}}) + G-Res(H^{(l-1)}, X_i)\ Q = H^{(l-1)} W_Q^{(l)}\ K = H^{(l-1)} W_K^{(l)}\ V = H^{(l-1)} W_V^{(l)}

$$G-Res$$是graph residual term，而$$X_i$$是raw features。

处理两类问题：node attribute reconstruction和graph structure recovery。同时，根据应用场景，分为node classification和graph clustering。

Node Raw Attribute Reconstruction

$$\ell_1 = \frac{1}{|\mathcal{V}|} \sum | x_i - \hat x_i |_2$$

Graph Structure Recovery

$$\ell_2 = \frac{1}{|\mathcal{V}|^2} | S - \hat S |_F$$

Node Classification

$$h_i = softmax(FC(z_i))$$

$$\sum_{v_i} \sum_m -y_i(m) \log \hat y_i (m)$$

Graph Clustering

$$\min{\mu_1, ..., \mu_l} \min_c \sum{j=1}^l \sum_{v_i} | z_i - \mu_j |_2$$