Jian Zhan, Ioannis Mitliagkas, Christopher Re
Stanford
正好和上一次读的ArXiv paper: The Marginal Value of Adaptive Gradient Methods in Machine Learning 对应。都是Adam的应用有局限。