SGD on Neural Networks Learns Functions of Increasing Complexity
SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran, etc.
In Submission to NeurIPS 2019
Intro
一直推测SGD 通过产生low complexity model 来提供了某种implicit regularization。这篇paper就证明了dynamics of SGD很重要,并且SGD得到的模型能够进行generalize是因为
- 刚开始learning的epoch,SGD更倾向于学习到简单的分类器
- 随着epoch增加,SGD相对比较稳定,并且能保存从简单的classifier(开始几个epoch)得到的信息