Bayesian Learning via Stochastic Gradient Langevin Dynamics
Max Welling, Yee Whye Teh
Preliminaries
One problem with ML or MAP estimation is that it does not capture parameter uncertainty and can therefore overfit the data. The Bayesian approach captures this uncertainty via MCMC. In this paper we consider one MCMC method, called Langevin dynamics.
Langevin dynamics adds Gaussian noise to the (full-batch) gradient update:
$$
\Delta \theta_t = \frac{\epsilon}{2} \Big( \nabla \log p(\theta_t) + \sum_{i=1}^{N} \nabla \log p(x_i \mid \theta_t) \Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon)
$$
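A minimal NumPy sketch of one such update, assuming hypothetical user-supplied callables `grad_log_prior(theta)` and `grad_log_lik(x, theta)` for the two gradient terms (neither name appears in the paper):

```python
import numpy as np

def langevin_step(theta, X, grad_log_prior, grad_log_lik, eps, rng):
    """One full-batch Langevin dynamics update of the parameters theta.

    grad_log_prior and grad_log_lik are hypothetical callables returning
    the gradients of log p(theta) and log p(x_i | theta), respectively.
    """
    # Gradient of the log posterior: prior term plus likelihood terms
    # summed over the entire dataset X (full-batch, hence costly).
    grad = grad_log_prior(theta)
    for x in X:
        grad = grad + grad_log_lik(x, theta)
    # Injected Gaussian noise with variance eps (standard deviation sqrt(eps)).
    eta = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + eta
```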
There are more sophisticated techniques that use Hamiltonian dynamics with momentum variables, but they require the whole dataset per update (GD rather than SGD), which incurs a very high computational cost.
Stochastic Gradient Langevin Dynamics
Combining SGD with Langevin dynamics gives the stochastic gradient Langevin dynamics update, where the likelihood gradient is estimated from a minibatch of $n$ of the $N$ data points:
$$
\Delta \theta_t = \frac{\epsilon_t}{2} \Big( \nabla \log p(\theta_t) + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_i \mid \theta_t) \Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t)
$$
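A minimal NumPy sketch of one SGLD step, under the same hypothetical gradient callables as above. The $N/n$ factor turns the minibatch likelihood gradient into an unbiased estimate of the full-data gradient; the paper additionally requires step sizes decreasing such that $\sum_t \epsilon_t = \infty$ and $\sum_t \epsilon_t^2 < \infty$, e.g. $\epsilon_t = a(b+t)^{-\gamma}$:

```python
import numpy as np

def sgld_step(theta, minibatch, N, grad_log_prior, grad_log_lik, eps_t, rng):
    """One SGLD update from a minibatch of n points out of N in total."""
    n = len(minibatch)
    # Prior gradient plus the rescaled minibatch likelihood gradient:
    # the N/n factor makes it an unbiased estimate of the full-data sum.
    grad = grad_log_prior(theta)
    grad = grad + (N / n) * sum(grad_log_lik(x, theta) for x in minibatch)
    # Injected noise now uses the per-step step size eps_t.
    eta = rng.normal(0.0, np.sqrt(eps_t), size=theta.shape)
    return theta + 0.5 * eps_t * grad + eta

def step_size(t, a=1e-2, b=10.0, gamma=0.55):
    # Polynomially decaying schedule eps_t = a * (b + t)^(-gamma) from the
    # paper; gamma in (0.5, 1] satisfies the convergence conditions.
    # The values of a and b here are illustrative, not from the paper.
    return a * (b + t) ** (-gamma)
```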