Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky
Intro
In deep learning, the number of parameters is typically much larger than the number of samples (#param >> #sample).
- VC theory, $$VC = O(pL)$$, where p is # params, L is # layers
- Neuralnet
- CIFAR10, but with random labels
- test error is very high.
- Margin analysis
- Linear classifier => neuralnet
- Margin distribution
Figure 2 shows that data that is easier to learn should have a flatter margin distribution.
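A minimal sketch of how the per-sample margins behind such a distribution could be computed, assuming the usual multiclass margin $$f(x)_y - \max_{j \ne y} f(x)_j$$. The function name `margins` and the toy scores are made up for illustration.

```python
import numpy as np

def margins(scores, labels):
    """Multiclass margin f(x)_y - max_{j != y} f(x)_j for each sample."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(scores.shape[0])
    true_scores = scores[idx, labels]
    # Mask out the true class before taking the max over the other classes.
    masked = scores.copy()
    masked[idx, labels] = -np.inf
    return true_scores - masked.max(axis=1)

# Toy example: 2 samples, 3 classes (hypothetical network outputs).
scores = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.3, 0.2]])
labels = np.array([0, 2])
m = margins(scores, labels)  # positive = correct, negative = misclassified
```

A histogram of `m` over the training set gives the (unnormalized) margin distribution.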
- Spectral norm: $$\|A\|_\sigma$$, the largest singular value of $$A$$
- Neuralnet: $$F_A(x) = \sigma_L(A_L(...))$$
- L layers
- Spectral complexity
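- $$R_A = \left(\prod_{i=1}^L \rho_i \|A_i\|_\sigma\right) \left(\sum_{i=1}^L \frac{\|A_i^T - M_i^T\|_{2,1}^{2/3}}{\|A_i\|_\sigma^{2/3}}\right)^{3/2}$$, where $$\rho_i$$ is the Lipschitz constant of $$\sigma_i$$ and $$M_i$$ is a reference matrix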
- $$M_i = I$$ corresponds to a ResNet layer: $$Wx + x = (W + I)x$$, so $$A_i = W + I$$ and $$A_i - M_i = W$$
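A rough sketch of computing the spectral complexity with NumPy, following the paper's definition (including the outer 3/2 power). The function name `spectral_complexity` and its defaults are assumptions for illustration: reference matrices default to zero and Lipschitz constants default to 1 (as for ReLU).

```python
import numpy as np

def spectral_complexity(weights, references=None, lipschitz=None):
    """R_A = (prod_i rho_i ||A_i||_sigma) * (sum_i (||A_i^T - M_i^T||_{2,1} / ||A_i||_sigma)^(2/3))^(3/2)."""
    if references is None:
        references = [np.zeros_like(A) for A in weights]  # assumed default: M_i = 0
    if lipschitz is None:
        lipschitz = [1.0] * len(weights)                  # assumed default: 1-Lipschitz activations
    product = 1.0
    total = 0.0
    for A, M, rho in zip(weights, references, lipschitz):
        spec = np.linalg.norm(A, 2)  # spectral norm = largest singular value
        # (2,1) norm of A^T - M^T: sum of the 2-norms of its columns,
        # which are the rows of A - M.
        norm_21 = np.linalg.norm(A - M, axis=1).sum()
        product *= rho * spec
        total += (norm_21 / spec) ** (2.0 / 3.0)
    return product * total ** 1.5

ws = [np.eye(2), 2.0 * np.eye(2)]
R = spectral_complexity(ws)
# With M_1 = I the first layer contributes nothing to the sum (ResNet-style reference).
R_res = spectral_complexity(ws, references=[np.eye(2), np.zeros((2, 2))])
```

Choosing the reference matrices close to the learned weights shrinks the sum but leaves the product of spectral norms unchanged.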
Theorem 1.1
- If $$\|x_i\|_2 \le B$$ for every sample, then the data matrix satisfies $$\|X\|_2 = \sqrt{\sum_i \|x_i\|_2^2} \le \sqrt{n}B$$
- $$F_A \le r$$
- return $$l_n$$ terms
This tells us how to obtain a bound that does not depend on the dimension.
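To make the dimension point concrete, here is a small sketch of the leading complexity term in the paper's Theorem 1.1, $$\frac{\|X\|_2 R_A}{\gamma n} \ln(W)$$ with $$W$$ the maximum layer width, with the constants and extra log factors hidden by $$\tilde{O}$$ dropped. The function name `complexity_term` and the toy inputs are assumptions, not anything from the paper's code.

```python
import numpy as np

def complexity_term(X, R_A, gamma, width):
    """Leading term ||X||_2 * R_A * ln(W) / (gamma * n), up to hidden factors."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    data_norm = np.linalg.norm(X)  # Frobenius norm = sqrt(sum_i ||x_i||_2^2)
    return data_norm * R_A * np.log(width) / (gamma * n)

# Toy data: n = 4 unit-norm samples in d = 3, so ||X||_2 = sqrt(n) * B with B = 1.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
term = complexity_term(X, R_A=1.0, gamma=0.5, width=np.e)

# Embedding the same data in a higher-dimensional space leaves the term unchanged.
X_padded = np.hstack([X, np.zeros((4, 100))])
term_padded = complexity_term(X_padded, R_A=1.0, gamma=0.5, width=np.e)
```

The input dimension only enters through the norms of the data and of the weights; the width appears only inside a logarithm.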
Analysis of margin bound
ramp loss
- step 1, covering bound per layer
- step 2, induction
- step 3, whole-network covering bound
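For reference, the ramp loss that this analysis starts from can be sketched as a function of the margin: it equals 1 for margins at or below 0, 0 for margins at or above $$\gamma$$, and interpolates linearly in between, so it is $$1/\gamma$$-Lipschitz. The function name `ramp_loss` is an assumption for illustration.

```python
import numpy as np

def ramp_loss(margin, gamma):
    """Ramp loss as a function of the margin: clip(1 - margin / gamma, 0, 1)."""
    margin = np.asarray(margin, dtype=float)
    return np.clip(1.0 - margin / gamma, 0.0, 1.0)

losses = ramp_loss([-1.0, 0.0, 0.5, 1.0, 2.0], gamma=1.0)
```

Because the loss is sandwiched between the 0-1 loss and the indicator of "margin at most $$\gamma$$", bounding its Rademacher complexity (via the covering bounds in the steps above) controls the misclassification probability.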
Appendix
This paper happened to come out during the Simons Institute course.