Learning to Reweight Examples for Robust Deep Learning

Mengye Ren et al.

Learning to Reweight Examples

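The method is derived by starting from a meta-learning objective and moving towards an online approximation. The paper gives an implementation that works for any deep network, together with a theoretical guarantee that the convergence rate is $$O(1/\epsilon^2)$$.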

From a meta-learning objective to an online approximation

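Let $$(x_i, y_i), 1 \le i \le N$$ be the training data. We assume there is a small, unbiased, clean validation set $$(x_i^v, y_i^v), 1 \le i \le M$$, with $$M \ll N$$. We also assume that the training set contains the validation set.

$$\Phi(x, \theta)$$ is the neural network model. We want to minimize a loss function $$C(\hat y, y)$$, where $$\hat y = \Phi(x,\theta)$$.

In standard training, we minimize the expected loss $$\frac{1}{N} \sum_{i=1}^N C(\hat y_i, y_i) = \frac{1}{N} \sum_{i=1}^N f_i(\theta)$$, where every sample has equal weight and $$f_i(\theta)$$ is the loss on the i-th example. Here the goal is instead to learn a reweighting of the inputs: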

$$
\theta^*(w) = \arg\min_\theta \sum_{i=1}^N w_i f_i(\theta)
$$

The weights $$w_i$$ are unknown at the start. The weight vector $$w$$ is chosen according to validation performance:

$$
w^* = \arg\min_{w, w \ge 0} \frac{1}{M} \sum_{i=1}^M f_i^v(\theta^*(w))
$$
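To make the two-level objective concrete, here is a minimal sketch on a toy problem that is not from the paper: a scalar parameter with squared loss $$f_i(\theta) = (\theta - y_i)^2$$, for which the inner minimizer $$\theta^*(w)$$ is simply the weighted mean of the training labels. The function names, the data, and the two hand-picked weight vectors are all illustrative assumptions.

```python
def inner_solution(weights, train_labels):
    """theta*(w) for f_i(theta) = (theta - y_i)^2, i.e. the weighted mean.

    Assumes the weights are non-negative and do not all vanish.
    """
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_labels)) / total


def validation_loss(theta, val_labels):
    """Outer objective: mean squared error on the clean validation set."""
    return sum((theta - y) ** 2 for y in val_labels) / len(val_labels)


# Toy data (assumed for illustration): the last training label is corrupted.
train_labels = [1.0, 1.2, 0.8, 9.0]
val_labels = [1.0, 1.1]

uniform = [1.0, 1.0, 1.0, 1.0]      # standard training: equal weights
reweighted = [1.0, 1.0, 1.0, 0.0]   # the corrupted example is switched off

loss_uniform = validation_loss(inner_solution(uniform, train_labels), val_labels)
loss_reweighted = validation_loss(inner_solution(reweighted, train_labels), val_labels)

print(loss_uniform, loss_reweighted)
```

With equal weights the corrupted label pulls $$\theta^*(w)$$ away from the clean labels, so the validation loss is much larger than when that example gets zero weight. Searching over $$w$$ this way needs the inner problem to be solved again for every candidate $$w$$, which is only cheap here because the toy problem has a closed form.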

Online approximation

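Computing the optimal $$w_i$$ requires two nested optimization loops, and every single loop can be very expensive. The motivation of the method is to learn $$w$$ online, through a single optimization loop.

As in SGD-style optimization, each step uses a mini-batch of $$n$$ training examples, with $$n \ll N$$ and step size $$\alpha$$: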

$$ \theta_{t+1} = \theta_t - \alpha \nabla(\frac{1}{n} \sum_i f_i(\theta_t))

$$
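As a small illustration of this update, the sketch below applies one unweighted mini-batch step to a linear model with squared loss $$f_i(\theta) = \frac{1}{2}(\theta^\top x_i - y_i)^2$$. The model, the loss, and the helper names are assumptions chosen to keep the example short; the paper's setting is a general network $$\Phi(x, \theta)$$.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def grad_sq_loss(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    residual = dot(theta, x) - y
    return [residual * xi for xi in x]


def sgd_step(theta, batch, alpha):
    """theta_{t+1} = theta_t - alpha * grad of the mean mini-batch loss."""
    n = len(batch)
    mean_grad = [0.0] * len(theta)
    for x, y in batch:
        g = grad_sq_loss(theta, x, y)
        mean_grad = [m + gi / n for m, gi in zip(mean_grad, g)]
    return [t - alpha * m for t, m in zip(theta, mean_grad)]


# Toy mini-batch (assumed for illustration).
theta_t = [0.0, 0.0]
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
theta_next = sgd_step(theta_t, batch, alpha=0.5)
print(theta_next)
```

Every example contributes to the mean gradient with the same factor $$1/n$$ here; the reweighting method replaces that fixed factor with a learned per-example weight.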

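We want to understand what effect training example i has on the performance on the validation set at step t. Following the analysis in an earlier paper (Koh and Liang, 2017; see the Appendix), we consider perturbing the weighting by $$\epsilon_i$$ for each training example in the mini-batch: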

$$
\begin{aligned}
f_{i,\epsilon}(\theta) &= \epsilon_i f_i(\theta) \\
\hat\theta_{t+1}(\epsilon) &= \theta_t - \alpha \nabla \sum_{i=1}^n f_{i,\epsilon}(\theta) \Big|_{\theta=\theta_t}
\end{aligned}
$$

Next, we look for the optimal $$\epsilon^*$$ that minimizes the validation loss $$f^v$$ at step t:

$$
\epsilon_t^* = \arg\min_\epsilon \frac{1}{M} \sum_{i=1}^M f_i^v(\hat\theta_{t+1}(\epsilon))
$$

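This can still be quite time-consuming. To get a cheap estimate of $$w_i$$ at step t, we take a single gradient descent step with respect to $$\epsilon$$ on a mini-batch of $$m$$ validation examples, with step size $$\eta$$, and then rectify the output to get a non-negative weighting: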

$$
\begin{aligned}
u_{i,t} &= -\eta \frac{\partial}{\partial \epsilon_{i,t}} \frac{1}{m} \sum_{j=1}^m f_j^v(\hat\theta_{t+1}(\epsilon)) \Big|_{\epsilon_{i,t}=0} \\
\bar w_{i,t} &= \max(u_{i,t}, 0)
\end{aligned}
$$
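It may help to see what this derivative reduces to. Assuming the derivative is evaluated with the whole perturbation vector at zero, $$\hat\theta_{t+1}(0) = \theta_t$$ and $$\partial \hat\theta_{t+1} / \partial \epsilon_{i,t} = -\alpha \nabla f_i(\theta_t)$$, so the chain rule gives $$u_{i,t} = \eta \alpha \, \nabla f_i(\theta_t)^\top \frac{1}{m} \sum_j \nabla f_j^v(\theta_t)$$. A training example gets a positive weight when its gradient points in a similar direction to the mean validation gradient, and is clipped to zero otherwise. The sketch below computes this for a linear model with squared loss; the model, the loss, the data, and the function names are illustrative assumptions, and a real network would obtain the same quantity through automatic differentiation rather than hand-written gradients.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def grad_sq_loss(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    residual = dot(theta, x) - y
    return [residual * xi for xi in x]


def example_weights(theta, train_batch, val_batch, alpha=0.1, eta=1.0):
    """Rectified one-step weights max(u_i, 0) for each training example."""
    m = len(val_batch)
    # Mean validation gradient at theta_t.
    g_val = [0.0] * len(theta)
    for x, y in val_batch:
        g = grad_sq_loss(theta, x, y)
        g_val = [a + b / m for a, b in zip(g_val, g)]

    weights = []
    for x, y in train_batch:
        g_i = grad_sq_loss(theta, x, y)
        # u_i = eta * alpha * <grad f_i(theta_t), mean validation gradient>
        u_i = eta * alpha * dot(g_i, g_val)
        weights.append(max(u_i, 0.0))
    return weights


# Toy data (assumed for illustration): the last training label is corrupted.
theta = [0.0]
train_batch = [([1.0], 2.0), ([1.0], 2.2), ([1.0], -5.0)]
val_batch = [([1.0], 2.0), ([1.0], 2.1)]

w = example_weights(theta, train_batch, val_batch)
print(w)  # the two clean examples get positive weights, the corrupted one gets 0.0
```

The gradient of the corrupted example opposes the validation gradient, so its $$u_{i,t}$$ is negative and the rectification removes it from the update.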

Examples: learning to reweight examples in a multi-layer perceptron network

Consider a multi-layered network, specifically a multi-layer perceptron (MLP) network.

Appendix

The earlier paper referred to above is: Koh, Pang Wei and Liang, Percy. Understanding black-box predictions via influence functions. ICML 2017.
