ClearlyCylindrical t1_iqlqykj wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
I always thought it was because its derivative is easy to compute: just sigmoid(x) * (1 - sigmoid(x)).
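A minimal sketch of that point: the derivative can be written entirely in terms of the sigmoid's own output, which is cheap during backprop since the forward value is already available. The check against a finite difference below is just an illustration, not from the original comment.

```python
import numpy as np

def sigmoid(x):
    # logistic function
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative reuses the forward output: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# sanity check against a central finite difference
x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```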
ClearlyCylindrical t1_iqna0cr wrote
Reply to [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
Even if it were possible to do full batch all the time, minibatches would likely still be used. The stochasticity introduced by minibatch gradient descent generally improves a model's generalisation performance.
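To make the mechanism concrete, here is a hypothetical toy example (all names and numbers are my own, not from the comment): linear regression trained with minibatch SGD, where each step uses the gradient of a random subset rather than the full dataset. Each minibatch gradient is a noisy estimate of the full-batch gradient, yet the iterates still converge.

```python
import numpy as np

# Toy dataset: y = X @ true_w + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=256)

def grad(w, Xb, yb):
    # gradient of mean squared error on the given (mini)batch
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

w = np.zeros(3)
for step in range(500):
    idx = rng.choice(len(y), size=32, replace=False)  # random minibatch
    w -= 0.05 * grad(w, X[idx], y[idx])               # noisy SGD step
```

Despite every step using only 32 of the 256 examples, `w` ends up close to `true_w`; the per-step noise is the stochasticity the comment credits with better generalisation.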