Bayesian Learning via Stochastic Gradient Langevin Dynamics

Max Welling, Yee Whye Teh

Preliminaries

A problem with ML and MAP estimation is that they fail to capture parameter uncertainty and can therefore overfit the data. The Bayesian approach captures this uncertainty via MCMC. This paper builds on one particular MCMC method, called Langevin dynamics.

Langevin dynamics injects Gaussian noise into the gradient update, where the gradient is computed over the entire dataset:

$$
\Delta \theta_t = \frac{\epsilon}{2} \Big( \nabla \log p(\theta_t) + \sum_{i=1}^{N} \nabla \log p(x_i|\theta_t) \Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon)
$$
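
To make the update concrete, here is a minimal sketch of full-batch Langevin dynamics on a toy model (standard normal prior on $\theta$, unit-variance Gaussian likelihood). The model, step size, and synthetic data are illustrative assumptions, not from the paper, and the sketch omits the Metropolis-Hastings accept/reject step used to correct for discretization error.

```python
import numpy as np

def langevin_step(theta, x, epsilon, rng):
    """One full-batch Langevin dynamics update for a toy model:
    prior theta ~ N(0, 1), likelihood x_i ~ N(theta, 1)."""
    grad_log_prior = -theta                    # d/dtheta log N(theta; 0, 1)
    grad_log_lik = np.sum(x - theta)           # summed over ALL N data points
    noise = rng.normal(0.0, np.sqrt(epsilon))  # eta_t ~ N(0, epsilon)
    return theta + (epsilon / 2) * (grad_log_prior + grad_log_lik) + noise

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=1000)  # synthetic dataset (assumed)
theta = 0.0
for _ in range(5000):
    theta = langevin_step(theta, x, epsilon=1e-4, rng=rng)
# After burn-in, the theta values are approximate posterior samples.
```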

There are also more sophisticated techniques, such as Hamiltonian dynamics with momentum variables, but they require gradients over the whole dataset (GD rather than SGD) and therefore incur a very high computational cost.

Stochastic Gradient Langevin Dynamics

SGLD combines SGD with Langevin dynamics: the log-likelihood gradient is estimated from a minibatch of $n$ points $x_{t_1}, \dots, x_{t_n}$ and rescaled by $N/n$, and the step sizes $\epsilon_t$ decay toward zero, satisfying $\sum_t \epsilon_t = \infty$ and $\sum_t \epsilon_t^2 < \infty$ (the paper uses $\epsilon_t = a(b+t)^{-\gamma}$):

$$
\Delta \theta_t = \frac{\epsilon_t}{2} \Big( \nabla \log p(\theta_t) + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_{t_i}|\theta_t) \Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)
$$
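
Below is a minimal SGLD sketch for the same toy model; the hyperparameters ($a$, $b$, $\gamma$, batch size) are illustrative assumptions. The key details are the $N/n$ rescaling, which makes the minibatch gradient an unbiased estimate of the full-data gradient, and the polynomially decaying step sizes.

```python
import numpy as np

def sgld(x, n_iters=10_000, batch_size=32, a=1e-3, b=10.0, gamma=0.55, seed=0):
    """SGLD for the toy model theta ~ N(0, 1), x_i ~ N(theta, 1).
    Step sizes eps_t = a * (b + t)^(-gamma) satisfy sum(eps) = inf and
    sum(eps^2) < inf for 0.5 < gamma <= 1, as the paper requires."""
    rng = np.random.default_rng(seed)
    N = len(x)
    theta = 0.0
    samples = []
    for t in range(n_iters):
        eps_t = a * (b + t) ** (-gamma)
        batch = rng.choice(x, size=batch_size, replace=False)  # minibatch
        grad_log_prior = -theta
        # Rescale by N/n so the minibatch gradient is an unbiased
        # estimate of the full-data log-likelihood gradient.
        grad_log_lik = (N / batch_size) * np.sum(batch - theta)
        noise = rng.normal(0.0, np.sqrt(eps_t))  # eta_t ~ N(0, eps_t)
        theta += (eps_t / 2) * (grad_log_prior + grad_log_lik) + noise
        samples.append(theta)
    return np.array(samples)

x = np.random.default_rng(1).normal(1.0, 1.0, size=1000)
samples = sgld(x)
print(samples[-2000:].mean(), samples[-2000:].std())  # ~posterior mean/std
```

Early on, the stochastic gradient term dominates and the update behaves like SGD; as $\epsilon_t \to 0$, the injected noise dominates and the iterates transition into Langevin sampling, which is the paper's argument for why no Metropolis-Hastings accept/reject step is needed.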
