
[deleted] t1_j4qu0xg wrote

> I attended a talk last year in which one speaker claimed that due to the way stochastic gradient descent works, it could be that some minimums are never reachable from some initialization states no matter how long one trains. Unfortunately I cannot find what paper/theorem he was referring to.

A concrete example of a network failing to learn is constant initialization: every hidden unit computes the same output and receives the same gradient, so the symmetry between units is never broken. It's especially easy to see with zero initialization, where the hidden-layer gradients vanish outright (a toy sketch follows the link below).

https://medium.com/@safrin1128/weight-initialization-in-neural-network-inspired-by-andrew-ng-e0066dc4a566
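A minimal numpy sketch of the symmetry problem (my own toy example, not from the linked post; the network, data, and learning rate are all made up for illustration):

```python
# Toy 2-layer MLP with constant initialization: the columns of W1 never
# diverge, so the net behaves like a single hidden unit forever.
# With all-zero weights it's even worse: every gradient below the output is zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))        # toy inputs
y = rng.normal(size=(32, 1))        # toy regression targets

H = 8                               # hidden width
W1 = np.full((4, H), 0.5)           # constant init (try 0.0: nothing updates at all)
W2 = np.full((H, 1), 0.5)
lr = 0.01

for step in range(100):
    h = np.tanh(X @ W1)             # forward pass
    pred = h @ W2
    err = pred - y                  # dL/dpred for 0.5 * MSE
    dW2 = h.T @ err / len(X)
    dh = err @ W2.T * (1 - h**2)    # backprop through tanh
    dW1 = X.T @ dh / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

# All hidden units are still identical after training:
print(np.allclose(W1, W1[:, :1]))   # True
```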


As for a general framework: if you're familiar with the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem), some papers build on that line of work to discuss convergence rates - https://proceedings.neurips.cc/paper/2020/file/2000f6325dfc4fc3201fc45ed01c7a5d-Paper.pdf

I think it would be a function of your particular problem. I've seen this examined from the perspective of learning frameworks as well, such as PAC learning; reading into that may answer some of your specific questions. I'm not aware of any general result beyond some comments on bounding generalization error based on how the data is split.
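For concreteness, the classical realizable PAC sample-complexity bound for a finite hypothesis class (a standard textbook statement, not something specific to the linked paper) looks like:

```latex
% Realizable PAC learning with a finite hypothesis class H:
% with probability at least 1 - \delta over the sample, any hypothesis
% consistent with the m training examples has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```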

At first blush, your question about knowing you'll reach the minimum in K steps feels halting-problem-ish, so I'd have to think about it more to convince myself fully.

5