
[deleted] t1_j4qu0xg wrote

> I attended a talk last year in which one speaker claimed that due to the way stochastic gradient descent works, it could be that some minimums are never reachable from some initialization states no matter how long one trains. Unfortunately I cannot find what paper/theorem he was referring to.

A concrete example of a network failing to learn is constant initialization: every hidden unit computes the same output and receives the same gradient, so the symmetry between units is never broken. It's especially easy to see with zero initialization, where the hidden-layer gradients vanish outright (a toy sketch follows the link below).

https://medium.com/@safrin1128/weight-initialization-in-neural-network-inspired-by-andrew-ng-e0066dc4a566
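A minimal numpy sketch of the symmetry problem (my own toy example, not from the linked post; the network, data, and learning rate are all made up for illustration):

```python
# Toy 2-layer MLP with constant initialization: the columns of W1 never
# diverge, so the net behaves like a single hidden unit forever.
# With all-zero weights it's even worse: every gradient below the output is zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))        # toy inputs
y = rng.normal(size=(32, 1))        # toy regression targets

H = 8                               # hidden width
W1 = np.full((4, H), 0.5)           # constant init (try 0.0: nothing updates at all)
W2 = np.full((H, 1), 0.5)
lr = 0.01

for step in range(100):
    h = np.tanh(X @ W1)             # forward pass
    pred = h @ W2
    err = pred - y                  # dL/dpred for 0.5 * MSE
    dW2 = h.T @ err / len(X)
    dh = err @ W2.T * (1 - h**2)    # backprop through tanh
    dW1 = X.T @ dh / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

# All hidden units are still identical after training:
print(np.allclose(W1, W1[:, :1]))   # True
```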


As for a general framework: if you're familiar with the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem), some papers build on that line of work to discuss convergence rates - https://proceedings.neurips.cc/paper/2020/file/2000f6325dfc4fc3201fc45ed01c7a5d-Paper.pdf

I think it would be a function of your particular problem. I've seen this examined from the perspective of learning frameworks as well, such as PAC learning; reading into that may answer some of your specific questions. I'm not aware of any general result beyond some comments on bounding generalization error based on how the data is split.
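For concreteness, the classical realizable PAC sample-complexity bound for a finite hypothesis class (a standard textbook statement, not something specific to the linked paper) looks like:

```latex
% Realizable PAC learning with a finite hypothesis class H:
% with probability at least 1 - \delta over the sample, any hypothesis
% consistent with the m training examples has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```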

At first blush, your question about knowing you'll reach the minimum in K steps feels halting-problem-ish, so I'd have to think about it more to convince myself fully.

5