Focal Loss for Dense Object Detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar

Intro

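Current state-of-the-art object detectors are based on a two-stage, proposal-driven mechanism. The first stage generates a sparse set of candidate object locations; the second stage uses a CNN to classify each candidate location as foreground or background.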

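A natural question is whether a single one-stage detector can do both jobs. Related work already achieves fairly good performance, but its accuracy still trails that of two-stage detectors.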

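This paper goes a step further: we propose a one-stage object detector that reaches state-of-the-art performance. To get there, we identify class imbalance during training as the main obstacle that keeps one-stage detectors from state-of-the-art accuracy.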

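This paper proposes a new loss function as a more effective alternative for dealing with class imbalance. The loss is a dynamically scaled cross entropy, where the scaling factor decays to 0 as confidence in the correct class increases. Intuitively, this scaling factor down-weights easy examples so that training focuses on hard examples. Experiments show that the proposed Focal Loss is highly effective and more accurate than existing one-stage detectors.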

Class Imbalance

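In object detection, a detector may evaluate $$10^4$$ to $$10^5$$ candidate locations per image, but only a few of those locations contain an actual object. This causes two problems:

1. Training is inefficient, because most locations are easy negatives that contribute no useful learning signal.
2. Easy negatives can overwhelm training and lead to degenerate models.

The proposed focal loss handles this class imbalance naturally.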

Focal Loss

$$ CE(p,y) = \begin{cases}
-\log (p) & \text{if } y = 1 \\
-\log (1-p) & \text{otherwise} \end{cases} $$

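Here $$y\in \{\pm 1\}$$ is the ground-truth class and $$p\in [0, 1]$$ is the model's estimated probability that the example belongs to the class $$y=1$$. For notational convenience, we define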

$$ p_t = \begin{cases}
p & \text{if } y = 1 \\
1-p & \text{otherwise} \end{cases} $$

so that $$CE(p,y) = CE(p_t) = -\log(p_t)$$.
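As a minimal sketch of the two definitions above, the following pure-Python snippet computes $$p_t$$ and the cross entropy for a single prediction. The function names `p_t` and `cross_entropy` and the clamping constant `eps` are illustrative choices, not something taken from the paper.

```python
import math

def p_t(p: float, y: int) -> float:
    """Probability assigned to the ground-truth class; y is +1 or -1."""
    return p if y == 1 else 1.0 - p

def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """CE(p, y) = -log(p_t). eps is an assumption added to avoid log(0)."""
    return -math.log(max(p_t(p, y), eps))

# A confident, correct prediction gives a small loss;
# the same prediction with the opposite label gives a large one.
loss_correct = cross_entropy(0.9, 1)
loss_wrong = cross_entropy(0.9, -1)
```

Writing the loss in terms of $$p_t$$ means one expression covers both classes, which is what lets the later formulas stay compact.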

Balanced Cross Entropy

A common remedy is to introduce a weighting factor $$\alpha$$ for class 1 and $$1-\alpha$$ for class -1. In practice, $$\alpha$$ is usually set to the inverse class frequency or treated as a hyperparameter chosen by cross validation.
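Defining $$\alpha_t$$ analogously to $$p_t$$ (that is, $$\alpha$$ when $$y=1$$ and $$1-\alpha$$ otherwise), the weighted loss can be written as $$CE(p_t) = -\alpha_t \log(p_t)$$. A small sketch, assuming a hypothetical function name `balanced_cross_entropy` and an `eps` clamp that is not part of the definition:

```python
import math

def balanced_cross_entropy(p: float, y: int, alpha: float,
                           eps: float = 1e-12) -> float:
    """alpha-weighted CE: -alpha_t * log(p_t), with y in {+1, -1}."""
    pt = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * math.log(max(pt, eps))  # eps avoids log(0)

# With alpha = 0.75 (an arbitrary illustrative value), a positive and a
# negative example with the same p_t receive different weights.
pos_loss = balanced_cross_entropy(0.5, 1, alpha=0.75)
neg_loss = balanced_cross_entropy(0.5, -1, alpha=0.75)
```

Note that the weight depends only on the class label, not on how confident the prediction is.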

Focal Loss Definition

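Easily classified negatives comprise the majority of the loss and dominate the gradient. Although $$\alpha$$ balances the importance of positive and negative examples, it does not distinguish between easy and hard examples. Instead, we propose down-weighting easy examples so that training focuses on hard negatives.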
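To make the dominance claim concrete, here is a small arithmetic sketch with made-up numbers: 100,000 easy examples at $$p_t = 0.99$$ against 100 hard examples at $$p_t = 0.1$$. The counts and probabilities are assumptions chosen only for illustration.

```python
import math

def ce(pt: float) -> float:
    """Cross entropy as a function of p_t."""
    return -math.log(pt)

# Hypothetical batch: many easy examples, few hard ones.
n_easy, pt_easy = 100_000, 0.99
n_hard, pt_hard = 100, 0.1

easy_total = n_easy * ce(pt_easy)   # each easy loss is tiny, but there are many
hard_total = n_hard * ce(pt_hard)   # each hard loss is large, but there are few
easy_share = easy_total / (easy_total + hard_total)  # about 0.81 here
```

Even though each easy example contributes a loss of only about 0.01, their sheer number makes them account for most of the total in this toy setting.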

We add a modulating factor $$(1-p_t)^\gamma$$ to the cross entropy, where the focusing parameter $$\gamma \ge 0$$ is tunable. We define the focal loss as

$$ FL(p_t) = -(1-p_t)^\gamma \log (p_t) $$

That is,

$$ FL(p,y) = \begin{cases}
-(1-p)^\gamma \log (p) & \text{if } y = 1 \\
-p^\gamma \log (1-p) & \text{otherwise} \end{cases} $$

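Alternatively, if the labels are encoded as $$y \in \{0, 1\}$$ instead of $$\{\pm 1\}$$, it can be written as

$$ FL(p,y) = -y(1-p)^\gamma \log p - (1-y)p^\gamma \log(1-p) $$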

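The focal loss has two properties:

1. When an example is misclassified and $$p_t$$ is small, the modulating factor is close to 1 and the loss is essentially unchanged. As $$p_t \to 1$$, the factor goes to 0 and the loss for well-classified examples is down-weighted.
2. The focusing parameter $$\gamma$$ smoothly adjusts the rate at which easy examples are down-weighted. With $$\gamma = 0$$, the focal loss reduces to the cross entropy.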
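The two properties can be checked numerically with a minimal sketch of $$FL(p_t) = -(1-p_t)^\gamma \log(p_t)$$. The function name `focal_loss`, the `eps` clamp, and the choice $$\gamma = 2$$ in the usage lines are assumptions made for illustration, not a reference implementation.

```python
import math

def focal_loss(p: float, y: int, gamma: float, eps: float = 1e-12) -> float:
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t), with y in {+1, -1}."""
    pt = p if y == 1 else 1.0 - p
    pt = min(max(pt, eps), 1.0)          # eps avoids log(0)
    return -((1.0 - pt) ** gamma) * math.log(pt)

# gamma = 0 recovers the plain cross entropy.
ce_easy = focal_loss(0.9, 1, gamma=0.0)
ce_hard = focal_loss(0.1, 1, gamma=0.0)

# With gamma = 2, the easy example (p_t = 0.9) is scaled by 0.1^2 = 0.01,
# while the hard example (p_t = 0.1) is scaled by 0.9^2 = 0.81.
fl_easy = focal_loss(0.9, 1, gamma=2.0)
fl_hard = focal_loss(0.1, 1, gamma=2.0)

easy_ratio = fl_easy / ce_easy   # roughly 0.01
hard_ratio = fl_hard / ce_hard   # roughly 0.81
```

In this example the easy prediction loses about two orders of magnitude of weight while the hard one keeps most of its loss, which is the behavior the modulating factor is meant to produce.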
