ClearlyCylindrical t1_iqlqykj wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
I always thought it was because its derivative is easy to compute: just sigmoid(x) * (1 - sigmoid(x)).
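A minimal sketch of that point: the derivative can be written entirely in terms of the sigmoid's own output, which is cheap during backprop since the forward value is already available. The check against a finite difference below is just an illustration, not from the original comment.

```python
import numpy as np

def sigmoid(x):
    # logistic function
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative reuses the forward output: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# sanity check against a central finite difference
x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```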
ClearlyCylindrical t1_iqna0cr wrote
Reply to [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
Even if it were possible to do full batch all the time, minibatches would likely still be used. The stochasticity introduced by minibatch gradient descent generally improves a model's generalisation performance.
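To make the mechanism concrete, here is a hypothetical toy example (all names and numbers are my own, not from the comment): linear regression trained with minibatch SGD, where each step uses the gradient of a random subset rather than the full dataset. Each minibatch gradient is a noisy estimate of the full-batch gradient, yet the iterates still converge.

```python
import numpy as np

# Toy dataset: y = X @ true_w + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=256)

def grad(w, Xb, yb):
    # gradient of mean squared error on the given (mini)batch
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

w = np.zeros(3)
for step in range(500):
    idx = rng.choice(len(y), size=32, replace=False)  # random minibatch
    w -= 0.05 * grad(w, X[idx], y[idx])               # noisy SGD step
```

Despite every step using only 32 of the 256 examples, `w` ends up close to `true_w`; the per-step noise is the stochasticity the comment credits with better generalisation.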