[Discussion] If we had enough memory to always do full-batch gradient descent, would we still need RMSProp/momentum/Adam?
Submitted by 029187 on October 1, 2022 at 5:01 PM in MachineLearning · 20 comments · 3 points
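One way to make the question concrete: adaptive methods are not only about taming minibatch noise; they also compensate for ill-conditioned loss surfaces, which exact full-batch gradients do nothing to fix. Below is a minimal sketch (not from the thread; the problem and all hyperparameters are arbitrary illustrative choices) comparing plain full-batch gradient descent, heavy-ball momentum, and Adam on an ill-conditioned quadratic, where every gradient is exact yet plain GD still lags.

import numpy as np

# Toy comparison (illustrative only): full-batch gradients on an
# ill-conditioned quadratic f(w) = 0.5 * w^T H w. Even with exact
# gradients, plain GD's step size is capped by the largest curvature
# (here 100), so progress along the flat direction (curvature 1) is slow.
H = np.diag([100.0, 1.0])
grad = lambda w: H @ w  # exact full-batch gradient, no minibatch noise

def gd(w, lr=0.009, steps=200):
    # Plain gradient descent; lr must stay below 2/100 for stability.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def heavy_ball(w, lr=0.009, beta=0.9, steps=200):
    # Momentum accumulates velocity along the consistent flat direction.
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

def adam(w, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    # Adam rescales each coordinate by a running RMS of its gradients,
    # roughly equalizing progress across directions of unequal curvature.
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        w = w - lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    return w

w0 = np.array([1.0, 1.0])
for name, opt in [("GD", gd), ("momentum", heavy_ball), ("Adam", adam)]:
    w = opt(w0.copy())
    print(f"{name:8s} final loss = {0.5 * w @ H @ w:.2e}")

On a quadratic like this, heavy-ball momentum's convergence rate scales with the square root of the condition number rather than the condition number itself, which is why it pulls ahead of plain GD even though both see exact gradients.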
crrrr30 wrote on October 2, 2022 at 2:48 AM:
I feel like with that much memory available, testing scaling laws would be a better research direction than testing full-batch training. (1 point)