picardythird t1_iroxxxd wrote
Reply to comment by dpkingma in [D] What kind of mental framework/thought process the researchers have when working on solving/proving the math of the new algorithms? by aviisu
I'll admit that VAEs are one of the (imo very few) methods that are motivated from first principles. The original VAE paper is actually one of my favorite papers in all of ML; it's well-structured, easy to read, and the steps leading from problem to solution are clear.
This is not at all the case for the vast majority of ML research. How many times have we seen "provably optimal" bounds on something that are then shown to, whoops, not actually be optimal a few years (or months! or weeks!) later? How many times have we seen "theoretical proofs" for why something works where it turns out that, whoops, the proofs were wrong (looking at you, Adam)? Perhaps most damningly, how many times have we seen a mathematical model or structure that seems to work well in the lab completely fail when exposed to real-world data that does not follow the extraordinarily convenient assumptions and simplifications serving as its foundational underpinnings? This last point is extremely common in niche subfields such as continual learning, adversarial ML, and noisy label learning.
Certainly, some (perhaps even "much"... I hesitate to say "most" or "all") of the work that comes out of the very top tier labs is motivated by theory before practice. Outside of these extraordinary groups, however, it is plainly evident that most methods you see in the wild are empirical, not theoretical.
dpkingma t1_irpaffs wrote
I agree with the (milder) claim that most methods in the wild are empirical.
With regards to Adam: I wouldn't say that this method was ad hoc. The method was motivated by the ideas in Sections 2 and 3 of the paper (update rule / initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but it didn't play a role in the conception of the method, and it wasn't very relevant in practice anyway due to its convexity assumption.
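For readers who haven't seen it written out: the update rule and initialization bias correction referred to above can be sketched in a few lines. This is a minimal scalar-parameter sketch following the paper's Algorithm 1 (the function name and state layout here are my own, not from the paper); the moment estimates m and v start at zero, which biases them toward zero early on, and the `1 - beta**t` divisions correct for exactly that.

```python
import math

def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # state = (first moment m, second raw moment v, timestep t); all start at zero
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (biased toward 0 at small t)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (also biased)
    m_hat = m / (1 - beta1 ** t)              # initialization bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Minimize f(x) = x^2, whose gradient is 2x:
x, state = 5.0, (0.0, 0.0, 0)
for _ in range(2000):
    x, state = adam_step(x, 2 * x, state, lr=0.1)
```

Note that at t = 1 the correction makes m_hat equal the raw gradient, so the very first step is already on the right scale; without it, the default beta1 = 0.9 would shrink the first update by a factor of ten.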