Structured Prediction Energy Networks

David Belanger, Andrew McCallum

2 The Structured Prediction Energy Networks (SPEN)

SPEN parameterizes the energy function as a neural network. Applying SPEN to our setting takes two steps. First, the node representations are computed by a GCN. For example, a two-layer GCN gives the node representation

$$ F(X) = \sigma(\sigma(X W_1) W_2) $$

where $$\sigma$$ is the activation function, $$X$$ is the message passing operation in GCN, and each $$W_i, \forall i \in \{1, 2\}$$, performs feature projection.
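As a minimal sketch, the two-layer representation can be computed literally as the formula is written. The ReLU activation, the array sizes, and the helper name `two_layer_features` are assumptions for illustration; the formula does not show an explicit graph propagation step, so none is added here.

```python
import numpy as np

def relu(z):
    # Assumed activation; the text only says sigma is "the activation function".
    return np.maximum(z, 0.0)

def two_layer_features(X, W1, W2, activation=relu):
    """Compute F(X) = sigma(sigma(X W1) W2), exactly as written in the text."""
    hidden = activation(X @ W1)      # first projection and nonlinearity
    return activation(hidden @ W2)   # second projection and nonlinearity

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))     # 5 nodes, 8 input features (illustrative sizes)
W1 = rng.normal(size=(8, 16))   # first-layer projection
W2 = rng.normal(size=(16, 4))   # second-layer projection
F = two_layer_features(X, W1, W2)
print(F.shape)
```

Each row of `F` is the representation of one node, which the energy function then consumes.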

Then the total energy is obtained by adding up the local energy and the global energy, i.e., $$ E(y, X) = E^{\text{local}} + E^{\text{global}} = \sum_{i=1}^{m} y_i^T B_1 F(X) + c_2^T \sigma(C_1 y) $$, where $$c_2$$ is a learnable vector and $$B_1, C_1$$ are learnable matrices.
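The energy above can be sketched in a few lines of NumPy. The shapes are assumptions, since the text does not fix them: each node $$i$$ has a relaxed label vector $$y_i \in [0,1]^k$$, the local term pairs $$y_i$$ with row $$i$$ of $$F(X)$$, and the global term sees all labels stacked into one vector. The softplus activation and the name `spen_energy` are likewise illustrative.

```python
import numpy as np

def softplus(z):
    # Assumed smooth activation for the global energy.
    return np.logaddexp(0.0, z)

def spen_energy(Y, F, B1, C1, c2, activation=softplus):
    """E(y, X) = sum_i y_i^T B1 F_i + c2^T sigma(C1 y).

    Y: (m, k) relaxed labels, F: (m, d) node features,
    B1: (k, d), C1: (h, m*k), c2: (h,). All shapes are assumptions.
    """
    local = np.sum((Y @ B1) * F)                  # sum_i y_i^T B1 F_i
    global_term = c2 @ activation(C1 @ Y.ravel()) # c2^T sigma(C1 y)
    return local + global_term

rng = np.random.default_rng(0)
m, k, d, h = 4, 3, 5, 6          # nodes, classes, feature dim, hidden units
Y = rng.uniform(size=(m, k))
F = rng.normal(size=(m, d))
B1 = rng.normal(size=(k, d))
C1 = rng.normal(size=(h, m * k))
c2 = rng.normal(size=h)
energy = spen_energy(Y, F, B1, C1, c2)
print(float(energy))
```

Note that only the local term touches `F`; the global term is a function of the labels alone.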

In the energy function defined above, the global term depends only on $$y$$ and is independent of $$X$$; as mentioned in the paper, this may lead to overfitting. An alternative is to condition the global energy on $$X$$: $$ E(y, X) = E^{\text{local}} + E^{\text{global}} = \sum_{i=1}^{m} y_i^T B_1 F(X) + d_2^T \sigma(D_1 [y; F(X)]) $$, where $$d_2$$ is a learnable vector and $$D_1$$ is a learnable matrix.
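The conditioned global term differs from the unconditioned one only in its input, which becomes the concatenation $$[y; F(X)]$$. A small sketch, assuming the labels and features are flattened before concatenation and using `tanh` as a stand-in activation (both choices, and the function name, are hypothetical):

```python
import numpy as np

def conditioned_global_energy(Y, F, D1, d2, activation=np.tanh):
    """Global term d2^T sigma(D1 [y; F(X)]).

    Y: (m, k) labels, F: (m, d) features, D1: (h, m*k + m*d), d2: (h,).
    Flattening Y and F before concatenating is an assumption.
    """
    joint = np.concatenate([Y.ravel(), F.ravel()])  # the vector [y; F(X)]
    return d2 @ activation(D1 @ joint)

rng = np.random.default_rng(0)
m, k, d, h = 2, 3, 4, 5
Y = rng.uniform(size=(m, k))
F = rng.normal(size=(m, d))
D1 = rng.normal(size=(h, m * k + m * d))
d2 = rng.normal(size=h)
print(float(conditioned_global_energy(Y, F, D1, d2)))
```

With this form, the same labeling can receive a different global energy for different inputs.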

With the energy function above, SPEN is trained with the structured SVM (SSVM) objective, minimizing $$ \sum_{\{x_i, y_i\}} \max_y \big[ \Delta(y_i, y) - E(x_i, y) + E(x_i, y_i) \big]_+ $$, where $$\Delta(y_p, y_g)$$ measures the error between the predicted label $$y_p$$ and the ground truth $$y_g$$, and $$[\cdot]_+ = \max(0, \cdot)$$.
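For a single training pair and a fixed candidate labeling, the bracketed hinge term is simple to evaluate. The sketch below uses toy stand-ins that are not from the source: a squared-error $$\Delta$$ and a quadratic energy, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def ssvm_hinge(energy_fn, delta_fn, x, y_true, y_cand):
    """[Delta(y_i, y) - E(x_i, y) + E(x_i, y_i)]_+ for one candidate y."""
    margin = delta_fn(y_true, y_cand) - energy_fn(x, y_cand) + energy_fn(x, y_true)
    return max(0.0, float(margin))

# Hypothetical toy choices, for illustration only.
def toy_delta(y_true, y):
    return np.sum((y_true - y) ** 2)

def toy_energy(x, y):
    return 0.5 * np.sum((y - x) ** 2)

x = np.array([1.0, 0.0])
y_true = np.array([1.0, 0.0])
y_cand = np.array([0.0, 1.0])
hinge = ssvm_hinge(toy_energy, toy_delta, x, y_true, y_cand)
print(hinge)  # 2.0 - 1.0 + 0.0 = 1.0
```

The hinge is zero whenever the candidate's energy exceeds the ground truth's energy by at least $$\Delta$$, so only margin-violating candidates contribute to the loss.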

Evaluating this loss requires solving the inner maximization over $$y$$ (loss-augmented inference), which is equivalent to

$$ \hat{y} = \arg \min_{y} \big( -\Delta(y_i, y) + E(x_i, y) \big) $$
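One way to approximate this minimization is projected gradient descent over labels relaxed to $$[0, 1]$$. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses a finite-difference gradient in place of automatic differentiation, a fixed step size, and the same hypothetical toy $$\Delta$$ and energy as a stand-in for a trained network.

```python
import numpy as np

def numerical_grad(f, y, eps=1e-5):
    # Central finite differences; a real implementation would use autodiff.
    grad = np.zeros_like(y)
    for j in range(y.size):
        step = np.zeros_like(y)
        step.flat[j] = eps
        grad.flat[j] = (f(y + step) - f(y - step)) / (2 * eps)
    return grad

def loss_augmented_inference(energy_fn, delta_fn, x, y_true, steps=200, lr=0.1):
    """Approximate argmin_y (-Delta(y_i, y) + E(x_i, y)) over y in [0, 1]."""
    objective = lambda y: -delta_fn(y_true, y) + energy_fn(x, y)
    y = np.full_like(y_true, 0.5, dtype=float)   # start from the box center
    for _ in range(steps):
        y = np.clip(y - lr * numerical_grad(objective, y), 0.0, 1.0)
    return y

# Hypothetical toy choices, for illustration only.
toy_delta = lambda y_true, y: np.sum((y_true - y) ** 2)
toy_energy = lambda x, y: 0.5 * np.sum((y - x) ** 2)

x = np.array([1.0, 0.0])
y_true = np.array([1.0, 0.0])
y_star = loss_augmented_inference(toy_energy, toy_delta, x, y_true)
print(y_star)
```

In this toy case the procedure moves away from the ground truth toward the opposite corner of the box, which is the most margin-violating labeling under these stand-in functions.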
