Submitted by Ttttrrrroooowwww t3_y9a0j3 in deeplearning
suflaj t1_it66q7y wrote
Reply to comment by Lee8846 in EMA / SWA / SAM by Ttttrrrroooowwww
While it is true that the age of a method does not determine its value, the older a method is, the more likely it is that its performance gains have been surpassed by some other method or model.
Specifically, I do not see why I would use any weight averaging instead of a better model or training technique.
> In this case, an ensemble of models might not help.
Because you'd just use a bigger batch size.