Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi

Google Research

Intro

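In recent years it has become clear that machine learning models often carry some degree of bias. More and more work sets out to define what fairness is and how to remove bias from ML algorithms.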

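One common cause of bias is data skewness, for example when one group of users is under-represented in the training data. Recent work has paid relatively little attention to this kind of problem.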

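The setting considered here is one where it is difficult or costly to determine whether a data point comes from the under-represented group. This is common, because many features are sensitive (they contain personal information) or the sensitive attribute itself is not precisely defined. The scarcity of such data can be compounded by other factors, for example when the underlying data distribution is itself uneven.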

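This leads to two requirements: (1) model training must account for the scarcity and uneven distribution of this data; (2) once the de-biased model is trained and deployed as a classifier, it cannot rely on knowing whether the current example comes from the protected class.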

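Here we use adversarial learning to learn a de-biased latent representation. We build a multi-head DNN in which one head tries to predict the target class, while the model simultaneously prevents a second head from accurately predicting the sensitive attribute. The paper's contributions are as follows: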

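- Theoretically connecting different definitions of fairness to the adversarial training objective and the choice of dataset
- Empirically exploring how much data is enough to train an effective de-biased ML model
- Empirically studying how different data distributions used in adversarial learning affect the model's fairness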

Model Structure and Learning

The pipeline is $$x \to g(x) \to f(g(x))$$, where $$g(x)$$ serves as the intermediate hidden layer.

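We assume there is a feature $$Z$$ that is sensitive or protected, i.e., the final prediction should be independent of it. Importantly, even if this feature is not used as an input to $$g$$, it may still be correlated with other features.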

We assume $$Z$$ can still be observed on some subset of $$X$$, called $$S$$. We then train a second classifier $$a$$ so that $$a(g(S))$$ predicts $$Z$$. The flow here is $$S \to g(S) \to a(g(S))$$.
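
To make the two heads concrete, here is a minimal pure-Python sketch, assuming a single tanh hidden layer for $$g$$ and logistic heads for $$f$$ and $$a$$. The class `MultiHeadModel`, its layer sizes, and the toy rows are hypothetical and not taken from the paper. The point it illustrates is where the data goes: $$f(g(x))$$ is computed for every example, while $$a(g(s))$$ is computed only on the subset $$S$$ where $$Z$$ is known (marked here by `z is not None`).

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


class MultiHeadModel:
    """Hypothetical toy model: shared encoder g, target head f, adversary head a.

    Layer sizes and activations are illustrative assumptions, not the paper's.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W_g = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_f = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.w_a = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def g(self, x):
        # Shared hidden representation h = tanh(W_g x).
        return [math.tanh(dot(row, x)) for row in self.W_g]

    def f(self, h):
        # Target head: estimate of P(Y=1 | h).
        return sigmoid(dot(self.w_f, h))

    def a(self, h):
        # Adversary head: estimate of P(Z=1 | h).
        return sigmoid(dot(self.w_a, h))


# Rows are (x, y, z); z is None where the sensitive attribute was not observed.
data = [
    ([0.2, 1.0], 1, 0),
    ([0.9, -0.3], 0, None),
    ([-0.5, 0.4], 1, 1),
    ([0.1, 0.1], 0, None),
]
model = MultiHeadModel(n_in=2, n_hidden=3)

# f(g(x)) is computed for every example...
y_probs = [model.f(model.g(x)) for x, _, _ in data]
# ...but a(g(s)) only for the subset S where Z is known.
S = [(x, z) for x, _, z in data if z is not None]
z_probs = [model.a(model.g(x)) for x, _ in S]
```

At inference time only `model.f(model.g(x))` is needed, so the deployed classifier never has to know $$Z$$.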

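Our goal is to obtain $$f(g(x))$$ that predicts $$Y$$, while the adversary $$a(h)$$ tries to predict $$Z$$ from the hidden representation $$h = g(x)$$ as well as possible, and $$g$$ tries to make the adversary's prediction difficult. The full procedure is described in Section 3.2 of the paper. To implement it, the paper uses an identity function with negative gradient, $$J_\lambda$$: in the forward pass it returns $$h$$ unchanged, and in the backward pass it multiplies the incoming gradient by $$-\lambda$$. Put simply, of the two parts in Figure 1, the left half (predicting $$Y$$) is trained with ordinary gradient descent, while the gradient flowing from the other half (the adversary) back into $$g$$ is reversed, so $$g$$ performs gradient ascent on the adversary's loss. The net effect is that training makes the prediction of $$Y$$ more accurate while making the prediction of $$Z$$ worse.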
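
The effect of $$J_\lambda$$ can be checked on a scalar toy model. The sketch below assumes $$h = w_g x$$, $$\hat y = \sigma(w_f h)$$, $$\hat z = \sigma(w_a J_\lambda(h))$$ and binary cross-entropy losses; these modelling choices and the names `GradientReversal`, `grads` and `sgd_step` are hypothetical. Because $$J_\lambda$$ is the identity in the forward pass but scales the gradient by $$-\lambda$$ in the backward pass, the adversary weight $$w_a$$ still descends on its own loss $$L_Z$$, while the shared weight $$w_g$$ effectively descends on $$L_Y - \lambda L_Z$$.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def bce(p, target):
    # Binary cross-entropy for one prediction p = P(label = 1).
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


class GradientReversal:
    """J_lambda: identity in the forward pass, gradient times -lambda in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, h):
        return h

    def backward(self, grad_out):
        return -self.lam * grad_out


def grads(params, x, y, z, rev):
    """Gradients for the hypothetical toy model h = w_g*x,
    y_hat = sigmoid(w_f*h), z_hat = sigmoid(w_a*J(h))."""
    w_g, w_f, w_a = params
    h = w_g * x
    y_hat = sigmoid(w_f * h)
    z_hat = sigmoid(w_a * rev.forward(h))
    # For sigmoid + BCE, dLoss/dlogit = prediction - label.
    dy_logit = y_hat - y
    dz_logit = z_hat - z
    d_w_f = dy_logit * h
    d_w_a = dz_logit * h  # adversary: ordinary descent on L_Z
    dh_from_f = dy_logit * w_f
    dh_from_a = rev.backward(dz_logit * w_a)  # sign flipped before reaching g
    d_w_g = (dh_from_f + dh_from_a) * x
    return d_w_g, d_w_f, d_w_a


def sgd_step(params, x, y, z, rev, lr=0.1):
    # One plain gradient-descent update on all three weights.
    return tuple(p - lr * d for p, d in zip(params, grads(params, x, y, z, rev)))


rev = GradientReversal(lam=0.7)
params = (0.5, 1.0, -0.8)
params = sgd_step(params, x=1.5, y=1, z=0, rev=rev)
```

A finite-difference derivative of $$L_Y - \lambda L_Z$$ with respect to $$w_g$$ should match the `d_w_g` returned by `grads`, which is a quick way to confirm the reversal is wired in the right place.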

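Note that no reconstruction error term is included in the objective, and that the adversary influences $$g$$ only through the negative gradient of $$J_\lambda$$.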
Which fairness notion is encouraged depends on which examples of $$S$$ the adversarial head is trained on:
- with both $$Y=0$$ and $$Y=1$$: the model is encouraged never to encode information about $$Z$$, so the hidden layer $$g(x)$$ becomes uncorrelated with $$Z$$. This corresponds to demographic parity (statistical parity): $$P(\hat Y | Z=0) = P(\hat Y | Z=1)$$
- with $$Y=1$$ only: $$g$$ should be uncorrelated with $$Z$$ when $$Y=1$$, i.e., equality of opportunity (equal true positive rates): $$P(\hat Y = 1 | Y=1, Z=0) = P(\hat Y = 1 | Y=1, Z=1)$$
- with $$Y=0$$ only: $$g$$ should be uncorrelated with $$Z$$ when $$Y=0$$, i.e., equal true negative rates: $$P(\hat Y = 0 | Y=0, Z=0) = P(\hat Y = 0 | Y=0, Z=1)$$
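
The three conditions can be measured directly from predictions. This is a minimal sketch, assuming hard 0/1 predictions and a binary $$Z$$; `rate`, `fairness_gaps` and the dictionary keys are hypothetical names. For a binary $$\hat Y$$, comparing $$P(\hat Y = 1 | Z)$$ across groups is enough for demographic parity. Each entry is the absolute difference between the $$Z=0$$ and $$Z=1$$ rates, so 0 means the condition holds exactly on this sample.

```python
def rate(preds, labels, groups, group, pred_value, label_value=None):
    """Empirical P(Y_hat = pred_value | Z = group[, Y = label_value])."""
    selected = [
        p
        for p, y, z in zip(preds, labels, groups)
        if z == group and (label_value is None or y == label_value)
    ]
    if not selected:
        return float("nan")  # no examples in this cell
    return sum(1 for p in selected if p == pred_value) / len(selected)


def fairness_gaps(preds, labels, groups):
    """Absolute Z=0 vs Z=1 gaps for the three conditions listed above."""

    def gap(pred_value, label_value=None):
        r0 = rate(preds, labels, groups, 0, pred_value, label_value)
        r1 = rate(preds, labels, groups, 1, pred_value, label_value)
        return abs(r0 - r1)

    return {
        "demographic_parity": gap(1),          # P(Y_hat=1 | Z)
        "equality_of_opportunity": gap(1, 1),  # P(Y_hat=1 | Y=1, Z)
        "equal_tnr": gap(0, 0),                # P(Y_hat=0 | Y=0, Z)
    }


groups = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0]
print(fairness_gaps(preds, labels, groups))
# → {'demographic_parity': 0.25, 'equality_of_opportunity': 0.5, 'equal_tnr': 0.0}
```

In this toy sample the true negative rates are equal across groups while the positive rates and true positive rates are not, so satisfying one condition says nothing about the others.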

Data Selection and Fairness Definition

This section explores what information the dataset $$S$$ should contain.

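If the adversarial head is trained on examples with both $$Y=0$$ and $$Y=1$$, the learned model will contain no information about the feature $$Z$$. This independence between the prediction $$\hat Y$$ and $$Z$$ is called demographic parity.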

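If the adversarial head is trained only on examples with $$Y=1$$, the model will contain no information about $$Z$$ among examples with $$Y=1$$, whereas among examples with $$Y=0$$ it may still encode information about $$Z$$.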

(That said, I think these are all intuitive explanations; whether they actually work still needs theoretical and experimental support.)
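
The data decision itself amounts to a filter over the rows where $$Z$$ is observed. A minimal sketch, assuming rows of `(x, y, z)` with `z = None` when the sensitive attribute is unknown; the function `adversary_subset` and its `target` values are hypothetical labels for the choices discussed above.

```python
def adversary_subset(rows, target):
    """Select the adversary's training set S from (x, y, z) rows.

    target (hypothetical names): "parity" keeps both labels,
    "y1" keeps only Y=1, "y0" keeps only Y=0.
    """
    label_filters = {"parity": None, "y1": 1, "y0": 0}
    if target not in label_filters:
        raise ValueError(f"unknown target: {target!r}")
    wanted = label_filters[target]
    # Rows without an observed Z can train f, but never the adversary.
    return [
        (x, y, z)
        for x, y, z in rows
        if z is not None and (wanted is None or y == wanted)
    ]


rows = [
    ([0.3], 1, 0),
    ([0.7], 0, 1),
    ([0.1], 1, None),  # Z unknown
    ([0.5], 1, 1),
    ([0.9], 0, None),  # Z unknown
]
S_parity = adversary_subset(rows, "parity")  # 3 rows
S_y1 = adversary_subset(rows, "y1")          # 2 rows
```

Restricting $$S$$ to one label shrinks it further, which matters when examples with an observed $$Z$$ are already scarce.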

Appendix

Related papers worth reading:

- Rich Zemel et al., "Learning Fair Representations", ICML 2013
- Konstantinos Bousmalis et al., "Domain Separation Networks", NIPS 2016
- Yaroslav Ganin et al., "Domain-Adversarial Training of Neural Networks", JMLR 2016