suflaj t1_it66q7y wrote

Reply to comment by Lee8846 in EMA / SWA / SAM by Ttttrrrroooowwww

While the age of a method does not determine its value, the older a method is, the more likely its performance gains have been surpassed by some other method or model.

Specifically, I do not see why I would use any weight averaging over a better model or training technique.

> In this case, an ensemble of models might not help.

Because you'd just use a bigger batch size.
