Learning Deep Representation for Imbalanced Classification

Chen Huang, et al.

Intro

One option is to use only a small amount of data.

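The first approach is re-sampling, which balances the class priors: under-sample the majority class or over-sample the minority class.
The second is cost-sensitive learning: assign a higher weight to the minority class.

The third approach is construction. (working)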

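These approaches have been studied thoroughly for shallow models but not yet enough for deep learning, and each has limitations: it either overfits or discards important information in the data.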
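The re-sampling and cost-sensitive options above can be sketched in a few lines. This is only an illustration, not the paper's method: the helper names `random_oversample` and `inverse_frequency_weights` are hypothetical, and inverse-frequency weighting is just one common choice of cost.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    is as large as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class frequency, scaled so
    that a perfectly balanced dataset gives every class weight 1.0."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


# Toy data: 8 majority samples and 2 minority samples.
labels = ["maj"] * 8 + ["min"] * 2
samples = list(range(10))

bal_x, bal_y = random_oversample(samples, labels)
weights = inverse_frequency_weights(labels)
```

Over-sampling repeats the same few minority samples, which is where the overfitting risk comes from; under-sampling would instead throw majority samples away, which is the information loss mentioned above.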

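Our approach rests on one observation: a minority class usually has very few samples, yet their visual variability is high. As a result, the genuine neighborhood of these samples is easily invaded by imposter nearest neighbors.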

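This paper makes the following contributions:

It shows how to learn a deep feature embedding from imbalanced data.
It proposes a new quintuplet sampling scheme whose associated triple-header loss preserves both the locality within each cluster and the separation between clusters. Using the learned features, we show that classification can be done with a fast cluster-wise kNN search followed by a local large-margin decision. The proposed method is called Large Margin Local Embedding (LMLE)-kNN.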

Learning Deep Representation from Class-Imbalanced Data

Our goal is to learn a Euclidean embedding $$f(x)$$ such that the embedded features are discriminative.

Quintuplet Sampling

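$$x_i$$, the anchor
$$x_i^{p+}$$, the anchor's most distant neighbor within the same cluster
$$x_i^{p-}$$, the anchor's nearest within-class neighbor from a different cluster
$$x_i^{p--}$$, the anchor's most distant within-class neighbor
$$x_i^n$$, the anchor's nearest between-class neighbor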
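The five roles listed above can be made concrete with a brute-force search. This is a minimal sketch, assuming the features are already embedded and each sample carries a class label plus a precomputed cluster id that is only meaningful within its class; the function names `sq_dist` and `sample_quintuplet` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def sample_quintuplet(i, feats, classes, clusters):
    """Return the indices (anchor, p+, p-, p--, n) for anchor i,
    or None when one of the required neighbor sets is empty."""
    others = [j for j in range(len(feats)) if j != i]
    d = {j: sq_dist(feats[i], feats[j]) for j in others}

    same_class = [j for j in others if classes[j] == classes[i]]
    same_cluster = [j for j in same_class if clusters[j] == clusters[i]]
    other_cluster = [j for j in same_class if clusters[j] != clusters[i]]
    other_class = [j for j in others if classes[j] != classes[i]]
    if not (same_cluster and other_cluster and other_class):
        return None

    p_plus = max(same_cluster, key=d.get)     # most distant, same cluster
    p_minus = min(other_cluster, key=d.get)   # nearest, same class, other cluster
    p_minus2 = max(same_class, key=d.get)     # most distant, same class
    neg = min(other_class, key=d.get)         # nearest, other class
    return i, p_plus, p_minus, p_minus2, neg


# Toy layout: class "a" has two clusters, class "b" has one.
feats = [(0, 0), (1, 0), (3, 0), (4, 0), (10, 0), (11, 0)]
classes = ["a", "a", "a", "a", "b", "b"]
clusters = [0, 0, 1, 1, 0, 0]

quint = sample_quintuplet(0, feats, classes, clusters)
```

The search is quadratic in the number of samples, which is fine for illustration but would need an index or per-batch mining on real data.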

We want the following relationship to hold:

$$ D(f(x_i), f(x_i^{p+})) < D(f(x_i), f(x_i^{p-})) < D(f(x_i), f(x_i^{p--})) < D(f(x_i), f(x_i^{n})) $$

where $$D(f(x_i), f(x_j)) = \| f(x_i) - f(x_j) \|_2^2$$ is the squared Euclidean distance.

This ordering presupposes that all samples have already been clustered well.
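To see what the ordering asks of an embedding, the three inequalities can be checked one by one with hinge penalties. The sketch below is an assumption-laden illustration rather than the paper's exact loss: the function name `ordering_violations` and the `margins` parameter are hypothetical, and the inputs are taken to be already-embedded vectors $$f(x)$$.

```python
def sq_dist(a, b):
    """Squared Euclidean distance D between two embedded vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def ordering_violations(anchor, p_plus, p_minus, p_minus2, neg,
                        margins=(0.0, 0.0, 0.0)):
    """Return one hinge penalty per inequality in
    D(a, p+) < D(a, p-) < D(a, p--) < D(a, n).

    A penalty is 0.0 when the two distances are ordered correctly and
    separated by at least the corresponding (hypothetical) margin.
    """
    d = [sq_dist(anchor, x) for x in (p_plus, p_minus, p_minus2, neg)]
    return tuple(max(0.0, g + d[k] - d[k + 1]) for k, g in enumerate(margins))


anchor = (0, 0)
# Distances 1 < 9 < 16 < 100: the ordering holds.
ok = ordering_violations(anchor, (1, 0), (3, 0), (4, 0), (10, 0))
# p- and p-- swapped: distances 1, 16, 9, 100 break the middle inequality.
bad = ordering_violations(anchor, (1, 0), (4, 0), (3, 0), (10, 0))
```

Summing the three penalties over many quintuplets gives a quantity that is zero exactly when every sampled quintuplet respects the ordering with the chosen margins, which is the intuition behind a loss with three hinge terms.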
