Learning Adversarially Fair and Transferable Representations


David Madras et al.

Abstract

Two key claims: first, representation learning is the right place to address fairness; second, adversarial training can be used to learn such representations.

Background

An introduction to several group fairness criteria.

All of these group fairness criteria conflict with well-calibrated classifiers.

Adversarially Fair Representations

The proposed model is Learned Adversarially Fair and Transferable Representations (LAFTR). The adversary objective is maximized while the classification loss and reconstruction error are minimized.
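The three terms can be combined into a single min-max objective. Below is a minimal numpy sketch of that trade-off on one batch; the function and argument names are ours, and for simplicity the adversary term uses a plain batch mean rather than the group-normalized averages defined in the next section.

```python
import numpy as np

def laftr_objective(x, x_recon, y, y_pred, a, a_pred, gamma=1.0):
    """Combined LAFTR-style objective (illustrative sketch).

    The encoder, classifier, and decoder minimize this value while the
    adversary maximizes the subtracted last term."""
    class_loss = np.mean((y_pred - y) ** 2)       # utility (classification)
    recon_loss = np.mean((x_recon - x) ** 2)      # reconstruction error
    adv_gain = 1.0 - np.mean(np.abs(a_pred - a))  # adversary's objective
    return class_loss + recon_loss - gamma * adv_gain
```

With perfect classification and reconstruction, the first two terms vanish and the objective reduces to minus `gamma` times the adversary's gain, which the encoder then drives down by hiding the sensitive attribute.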

Learning

The form of the adversarial objective depends on which notion of fairness we want to achieve.

For demographic parity, the adversarial objective is the average error within each group:

$$L(h) = 1 - \sum_{i\in \{0,1\}} \frac{1}{|D_i|} \sum_{(x,a)\in D_i} | h(f(x,a)) - a|$$
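This group-averaged objective is straightforward to compute empirically; a small numpy sketch (the function name `adv_loss_dp` is ours, not the paper's), where `adv_pred` holds the adversary's guesses $$h(f(x,a))$$ and `a` holds the binary sensitive attribute:

```python
import numpy as np

def adv_loss_dp(adv_pred, a):
    adv_pred, a = np.asarray(adv_pred, float), np.asarray(a)
    loss = 1.0
    for i in (0, 1):
        group = a == i                           # the subset D_i with a = i
        loss -= np.mean(np.abs(adv_pred[group] - i))
    return loss
```

A perfect adversary (`adv_pred` equal to `a`) scores 1; an adversary that is wrong on every example scores -1.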

For equalized odds, the adversarial objective averages over each (group, label) combination. Define $$D_{i}^j = \{(x,y,a) \in D \mid a=i, y=j\}$$.

$$L(h) = 2 - \sum_{(i,j) \in \{0, 1\}^2} \frac{1}{|D_i^j|} \sum_{(x,a) \in D_i^j} |h(f(x,a)) - a|$$

For equal opportunity, only the $$Y=0$$ terms are kept in the sum:

$$L(h) = 2 - \sum_{i \in \{0,1\},\, j=0} \frac{1}{|D_i^j|} \sum_{(x,a) \in D_i^j} |h(f(x,a)) - a|$$
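Both of the label-conditioned objectives can be computed with one helper; this sketch (names are ours) averages over the four subsets $$D_i^j$$, and passing `ys=(0,)` keeps only the $$Y=0$$ terms, giving the equal opportunity variant:

```python
import numpy as np

def adv_loss_eo(adv_pred, a, y, ys=(0, 1)):
    adv_pred, a, y = (np.asarray(v) for v in (adv_pred, a, y))
    loss = 2.0
    for i in (0, 1):
        for j in ys:
            group = (a == i) & (y == j)          # D_i^j; assumed non-empty
            loss -= np.mean(np.abs(adv_pred[group].astype(float) - i))
    return loss
```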

Motivation

The data owner wants to protect against two types of vendors:

  1. the indifferent vendor, who cares only about maximizing performance and is indifferent to whether predictions are fair or unfair;
  2. the adversarial vendor, who actively tries to discriminate based on the sensitive attribute.

In the adversarial model defined above, the encoder is the data owner's component: its output is the representation eventually sold to vendors. Once this latent representation is learned, the other two components ensure the representation behaves as intended: (1) the classifier guarantees utility by simulating an indifferent vendor on a downstream task; (2) the adversary guarantees fairness by simulating an adversarial vendor who deliberately discriminates, with the encoder trained to maximally defeat that vendor.

Theoretical Properties

We compare the two distributions induced by the learned group representations. Consider two distributions $$D_0, D_1$$ over a sample space $$\Omega_D$$ and a binary test function $$\mu: \Omega_D \to \{0, 1\}$$. $$\mu$$ is called a test because it can distinguish samples from the two distributions via the absolute difference of its expected value under each. This absolute difference, the test discrepancy, is

$$d_\mu(D_0, D_1) = \left| \mathbb{E}_{x \sim D_0} [\mu(x)] - \mathbb{E}_{x \sim D_1} [\mu(x)] \right|$$

The statistical distance between the two distributions is then defined as the maximal test discrepancy:

$$\Delta^*(D_0, D_1) = \sup_\mu d_\mu (D_0, D_1)$$

When learning fair representations, we are also interested in two further distributions: the distributions of $$Z$$ conditioned on the sensitive attribute $$A$$, namely $$p(Z|A=0)$$ and $$p(Z|A=1)$$, abbreviated $$Z_0, Z_1$$.

By the earlier definition, the adversary loss can be written as $$L_{Adv}^{DP}(h) = 1 - \mathbb{E}_{a=0}|h(f(x,0))-0| - \mathbb{E}_{a=1}|h(f(x,1))-1|$$.

The theorem states that $$L_{Adv}^{DP}(h^*) \ge \Delta_{DP}(g)$$.

The proof is straightforward once the notation is clear.

WLOG, assume $$\mathbb{E}_{Z|A=0}[g] \ge \mathbb{E}_{Z|A=1}[g]$$, so the absolute value can be dropped:

$$\Delta_{DP}(g) = \mathbb{E}_{Z|A=0}[g] - \mathbb{E}_{Z|A=1}[g] = \mathbb{E}_{Z|A=0}[g] + \mathbb{E}_{Z|A=1}[1-g] - 1$$

Now consider an adversary that guesses the opposite of g, i.e., $$h=1-g$$. Then we have

$$L_{Adv}^{DP}(h^*) \ge L_{Adv}^{DP}(h) = \Delta_{DP}(g)$$

Bounding Demographic Parity

In supervised learning we want a classifier $$g$$ that predicts the label $$Y$$. Its demographic parity distance is

$$\Delta_{DP}(g) = d_g(Z_0, Z_1) = \left| \mathbb{E}_{Z_0} [g] - \mathbb{E}_{Z_1} [g] \right|$$

This measures how differently $$g$$ behaves on the encoded representations of group $$A=0$$ versus group $$A=1$$. Note that $$\Delta_{DP}(g) \le \Delta^*(Z_0, Z_1)$$, and $$\Delta_{DP}(g) = 0$$ iff $$g(Z) \perp A$$, i.e., demographic parity holds.

Now consider an adversary $$h: \Omega_Z \to \{0, 1\}$$ with objective function

$$L(h) = \mathbb{E}_{Z_0} [1-h] + \mathbb{E}_{Z_1}[h] - 1$$

This is the same quantity as equation (3).

Theorem: Consider a classifier $$g: \Omega_Z \to \Omega_Y$$ and an adversary $$h: \Omega_Z \to \Omega_A$$, both binary functions, i.e., $$\Omega_Y = \Omega_A = \{0, 1\}$$. Then $$L(h^*) \ge \Delta_{DP}(g)$$: the demographic parity distance of $$g$$ is bounded by the optimal objective value of the adversary.
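On a discrete $$\Omega_Z$$ the theorem can be checked numerically by brute force: enumerate every binary adversary, take the best objective value, and compare it against the demographic parity distance of every binary classifier. The setup below (distributions, names) is illustrative:

```python
import itertools
import numpy as np

def dp_distance(g, p0, p1):
    g = np.asarray(g, float)
    return abs(g @ p0 - g @ p1)              # |E_{Z_0}[g] - E_{Z_1}[g]|

def adv_objective(h, p0, p1):
    h = np.asarray(h, float)
    return (1 - h) @ p0 + h @ p1 - 1         # E_{Z_0}[1-h] + E_{Z_1}[h] - 1

rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(4))               # p(Z|A=0), illustrative
p1 = rng.dirichlet(np.ones(4))               # p(Z|A=1), illustrative
best = max(adv_objective(h, p0, p1)
           for h in itertools.product((0, 1), repeat=4))
for g in itertools.product((0, 1), repeat=4):
    assert best >= dp_distance(g, p0, p1) - 1e-12
```

The bound holds here because the enumeration of adversaries contains both every classifier and its complement; in fact `best` equals the statistical distance $$\Delta^*(Z_0, Z_1)$$, which upper-bounds $$\Delta_{DP}(g)$$ for any $$g$$.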

Bounding Equalized Odds

First, extend the notation: $$Z_a^y = p(Z|A=a, Y=y)$$. The equalized odds distance of a classifier $$g: \Omega_Z \to \{0, 1\}$$ can then be written as

$$\Delta_{EO}(g) = \left|\mathbb{E}_{Z_0^0}[g] - \mathbb{E}_{Z_1^0}[g]\right| + \left|\mathbb{E}_{Z_0^1}[1-g] - \mathbb{E}_{Z_1^1}[1-g]\right|$$

$$\Delta_{EO}(g)=0$$ iff $$g$$ satisfies equalized odds.

Theorem: Let the classifier $$g: \Omega_Z \to \Omega_Y$$ and the adversary $$h: \Omega_Z \times \Omega_Y \to \Omega_A$$ be binary functions. Then $$L(h^*) \ge \Delta_{EO}(g)$$: the equalized odds distance of $$g$$ is bounded by the optimal objective value of the adversary.

The adversary loss is:

$$\begin{aligned} L_{Adv}^{EO}(h) & = 2 - \left[\mathbb{E}_{Z_0^0}[h] + \mathbb{E}_{Z_1^0}[1-h] + \mathbb{E}_{Z_0^1}[h] + \mathbb{E}_{Z_1^1}[1-h]\right] \\ & = 1-\mathbb{E}_{Z_0^0}[h] - \mathbb{E}_{Z_1^0}[1-h] + 1 - \mathbb{E}_{Z_0^1}[h] - \mathbb{E}_{Z_1^1}[1-h] \\ & = \mathbb{E}_{Z_0^0}[1-h] + \mathbb{E}_{Z_1^0}[h] + \mathbb{E}_{Z_0^1}[1-h] + \mathbb{E}_{Z_1^1}[h] - 2 \end{aligned}$$

The proof also explains, in reverse, why the adversary loss takes this form: it measures the gap between these expectations and the target labels.

First, WLOG, let $$\mathbb{E}_{Z_0^0}[g] - \mathbb{E}_{Z_1^0}[g] = \alpha \in [0, \Delta_{EO}(g)]$$ and $$\mathbb{E}_{Z_0^1}[1-g] - \mathbb{E}_{Z_1^1}[1-g] = \Delta_{EO}(g) - \alpha$$. So we have

$$\begin{aligned} \mathbb{E}_{Z_0^0}[g] + \mathbb{E}_{Z_1^0}[1-g] & = 1 + \alpha \\ \mathbb{E}_{Z_0^1}[1-g] + \mathbb{E}_{Z_1^1}[g] & = 1 + \Delta_{EO}(g) - \alpha \end{aligned}$$

Now consider the following adversary: $$h(z, y) = \begin{cases} g(z), & y=1 \\ 1-g(z), & y=0 \end{cases}$$

Then we have

$$\begin{aligned} \mathbb{E}_{Z_0^0}[1-h] + \mathbb{E}_{Z_1^0}[h] & = 1 + \alpha \\ \mathbb{E}_{Z_0^1}[1-h] + \mathbb{E}_{Z_1^1}[h] & = 1 + \Delta_{EO}(g) - \alpha \end{aligned}$$

Thus $$L_{Adv}(h^*) \ge L_{Adv}(h) = \mathbb{E}_{Z_0^0}[1-h] + \mathbb{E}_{Z_1^0}[h] + \mathbb{E}_{Z_0^1}[1-h] + \mathbb{E}_{Z_1^1}[h] - 2 = 1 + \alpha + 1 + \Delta_{EO}(g) - \alpha - 2 = \Delta_{EO}(g)$$.
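This construction can be verified numerically on a discrete $$\Omega_Z$$. In the sketch below (all values illustrative), `p[a][y]` plays $$p(Z|A=a, Y=y)$$ as a probability vector, `g` is a binary classifier, and the per-branch flips make the proof's WLOG sign choice explicit:

```python
import numpy as np

rng = np.random.default_rng(1)
p = [[rng.dirichlet(np.ones(4)) for _ in (0, 1)] for _ in (0, 1)]
g = np.array([1.0, 0.0, 1.0, 0.0])

E = lambda a, y, v: v @ p[a][y]          # expectation of v under Z_a^y

d0 = E(0, 0, g) - E(1, 0, g)             # signed gap on the Y=0 branch
d1 = E(0, 1, 1 - g) - E(1, 1, 1 - g)     # signed gap on the Y=1 branch
delta_eo = abs(d0) + abs(d1)

# h guesses the opposite of g when y=0 and g itself when y=1,
# flipping a branch whenever its signed gap is negative (the WLOG step).
h0 = 1 - g if d0 >= 0 else g
h1 = g if d1 >= 0 else 1 - g

l_adv = (E(0, 0, 1 - h0) + E(1, 0, h0)
         + E(0, 1, 1 - h1) + E(1, 1, h1) - 2)
assert abs(l_adv - delta_eo) < 1e-12     # L_Adv(h) = Delta_EO(g)
```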
