Multi-Task Learning as Multi-Objective Optimization

Intro

Stein's paradox

Stein's paradox is the classic motivation here: when estimating the means of three or more Gaussian variables, shrinking the estimates toward each other (as in the James-Stein estimator) gives lower total squared error than estimating each mean independently, even when the variables are unrelated. In other words, jointly "learning" several tasks can beat learning them in isolation, which is the intuition behind multi-task learning.

Multi-Task Learning as Multi-Objective Optimization

The per-task hypothesis is defined as $$f^t(x;\theta^{sh},\theta^t)$$, where $$\theta^{sh}$$ are the parameters shared across tasks and $$\theta^t$$ are the task-specific parameters.
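
A minimal PyTorch sketch of this parameterization, just to make the shared/task-specific split concrete (the module and attribute names are ours, not from the paper):

```python
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Shared trunk (theta^sh) feeding one head (theta^t) per task."""

    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        # theta^sh: parameters shared by every task.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # theta^t: one task-specific head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        z = self.shared(x)
        # f^t(x; theta^sh, theta^t) for every task t.
        return [head(z) for head in self.heads]
```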

The training objective is then the weighted sum of the per-task losses:

$$
\min_{\theta^{sh},\,\theta^1,\dots,\theta^T} \sum_{t=1}^{T} c^t \mathcal{L}^t(\theta^{sh}, \theta^t)
$$

where $$c^t$$ is a per-task weight, either fixed in advance or computed dynamically during training.

A basic correction here: in the MTL setting, the weighted-sum formulation generally has no solution that is globally optimal for every task at once. Instead, MTL can be cast as multi-objective optimization: optimizing a set of possibly conflicting objectives. The goal of multi-objective optimization is Pareto optimality. (Intuitively, standard MTL minimizes a single aggregate loss, whereas a Pareto-optimal solution is one whose loss on any task cannot be reduced without increasing the loss on some other task.)
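
Concretely, the scalar objective above is replaced by a vector of per-task losses (this is the formulation used in the paper):

$$
\min_{\theta^{sh},\,\theta^1,\dots,\theta^T} \mathbf{L}(\theta^{sh},\theta^1,\dots,\theta^T) = \min_{\theta^{sh},\,\theta^1,\dots,\theta^T} \left( \mathcal{L}^1(\theta^{sh},\theta^1), \dots, \mathcal{L}^T(\theta^{sh},\theta^T) \right)^{\top}
$$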

Def 1 (Pareto optimality for MTL)

1. A solution $$\theta$$ dominates a solution $$\bar\theta$$ if $$\theta$$'s loss is no higher than $$\bar\theta$$'s on every task, and strictly lower on at least one task.
2. A solution $$\theta^*$$ is called Pareto optimal if no other solution $$\theta$$ dominates it.
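
A minimal sketch of the dominance check from this definition, assuming per-task losses are collected as plain Python sequences (the function name `dominates` is ours, for illustration only):

```python
def dominates(losses_a, losses_b):
    """Return True if solution A dominates solution B.

    losses_a, losses_b: one loss value per task, in the same task
    order. A dominates B when A is no worse on every task and
    strictly better on at least one.
    """
    assert len(losses_a) == len(losses_b)
    no_worse = all(a <= b for a, b in zip(losses_a, losses_b))
    strictly_better = any(a < b for a, b in zip(losses_a, losses_b))
    return no_worse and strictly_better


# A ties on task 1 and is strictly better on task 2, so A dominates B.
print(dominates([0.5, 0.3], [0.5, 0.4]))  # True
# Identical losses: no strict improvement, so no domination.
print(dominates([0.5, 0.4], [0.5, 0.4]))  # False
```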

But there may be more than one Pareto-optimal solution. The set of all Pareto-optimal solutions is called the Pareto set $$\mathcal{P}_\theta$$.

Multiple Gradient Descent Algorithm (MGDA)

MGDA exploits the Karush-Kuhn-Tucker (KKT) conditions, which are necessary for optimality; a point satisfying them is called Pareto stationary. Every Pareto-optimal point is Pareto stationary, though the converse does not hold in general.
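
For the shared/task-specific parameterization above, the KKT conditions state (as in the paper): there exist $$\alpha^1,\dots,\alpha^T \geq 0$$ with $$\sum_t \alpha^t = 1$$ such that

$$
\sum_{t=1}^{T} \alpha^t \nabla_{\theta^{sh}} \mathcal{L}^t(\theta^{sh},\theta^t) = 0 \quad \text{and} \quad \nabla_{\theta^t} \mathcal{L}^t(\theta^{sh},\theta^t) = 0 \ \text{ for all } t
$$

MGDA therefore solves the min-norm problem over the simplex:

$$
\min_{\alpha^1,\dots,\alpha^T} \left\{ \left\| \sum_{t=1}^{T} \alpha^t \nabla_{\theta^{sh}} \mathcal{L}^t(\theta^{sh},\theta^t) \right\|_2^2 \ \middle|\ \sum_{t=1}^{T} \alpha^t = 1,\ \alpha^t \geq 0 \ \forall t \right\}
$$

Either the minimum is 0 and the current point is Pareto stationary, or the minimizing $$\alpha$$ yields a combined direction $$\sum_t \alpha^t \nabla_{\theta^{sh}} \mathcal{L}^t$$ along whose negative every task's loss decreases (Désidéri, 2012).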

Solving the Optimization Problem
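
For two tasks, the min-norm problem has a closed-form solution: it reduces to $$\min_{\alpha \in [0,1]} \|\alpha \bar g_1 + (1-\alpha) \bar g_2\|_2^2$$ with $$\bar g_t = \nabla_{\theta^{sh}} \mathcal{L}^t$$, whose minimizer is

$$
\hat\alpha = \left[ \frac{(\bar g_2 - \bar g_1)^{\top} \bar g_2}{\|\bar g_1 - \bar g_2\|_2^2} \right]_{[0,1]}
$$

where $$[\cdot]_{[0,1]}$$ denotes clipping to the unit interval. For $$T > 2$$ tasks the paper runs Frank-Wolfe over the simplex, using this two-point solution as the line-search subroutine. A minimal NumPy sketch of the two-task case, assuming the gradients have been flattened into vectors (the function name `min_norm_2task` is ours, not from the paper):

```python
import numpy as np

def min_norm_2task(g1, g2):
    """Closed-form solution of min_{a in [0,1]} ||a*g1 + (1-a)*g2||^2.

    g1, g2: flattened gradients of the two task losses w.r.t. the
    shared parameters. Returns (alpha, combined gradient); stepping
    along the negative of the combined gradient decreases both losses.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:          # gradients coincide: any alpha works
        alpha = 0.5
    else:
        # Unconstrained minimizer, then clip to [0, 1].
        alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    combined = alpha * g1 + (1.0 - alpha) * g2
    return alpha, combined

# Conflicting gradients: the weighted combination shrinks toward a
# common direction instead of favoring either task.
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
alpha, d = min_norm_2task(g1, g2)
print(alpha, d)  # 0.5, [0.5, 0.5]
```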

Appendix

For MGDA, see this paper:

J.-A. Désidéri. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathematique, 350(5):313–318, 2012.

There is also a baseline worth comparing against:

A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In CVPR, 2018.
