
Lee8846 wrote:

Reply to comment by suflaj in EMA / SWA / SAM by Ttttrrrroooowwww

I wouldn't say so. One cannot judge the value of a specific method by whether it's old or new. For example, in self-supervised learning, as in MoCo, people still use a moving average: it's a nice technique for keeping the key encoder consistent as it tracks the query encoder. Also, EMA actually helps smooth out weight fluctuations in some cases, which may be caused by patterns in the data. In this case, an ensemble of models might not help.
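A minimal sketch of the weight EMA being discussed, assuming NumPy arrays stand in for model parameters. `ema_update` and the toy noise setup are hypothetical illustrations, not MoCo's actual code. The update has the same form as MoCo's momentum update (key weights ← m · key weights + (1 − m) · query weights), and the demo shows how it smooths out weights that fluctuate around a target.

```python
import numpy as np

def ema_update(ema_params, params, momentum=0.999):
    """In-place EMA update: ema <- m * ema + (1 - m) * params.

    Same form as a MoCo-style momentum update; the default `momentum`
    is an assumed value, not taken from any specific paper.
    """
    for name, p in params.items():
        ema_params[name] *= momentum
        ema_params[name] += (1.0 - momentum) * p

# Toy demo (hypothetical data): "weights" that fluctuate around a fixed target.
rng = np.random.default_rng(0)
target = np.ones(4)
params = {"w": target + rng.normal(scale=0.5, size=4)}
ema_params = {k: v.copy() for k, v in params.items()}

raw_err, ema_err = [], []
for step in range(2000):
    # Stand-in for an optimizer step: noisy weights around the target.
    params["w"] = target + rng.normal(scale=0.5, size=4)
    ema_update(ema_params, params, momentum=0.99)
    if step >= 1000:  # measure only after the EMA has warmed up
        raw_err.append(np.abs(params["w"] - target).mean())
        ema_err.append(np.abs(ema_params["w"] - target).mean())

print(f"raw weights mean |error|: {np.mean(raw_err):.3f}")
print(f"EMA weights mean |error|: {np.mean(ema_err):.3f}")
```

With `momentum=0.99` the EMA effectively averages over roughly the last 100 steps, so on this i.i.d. noise its error comes out far smaller than that of the raw weights. Real training noise is correlated across steps, so the benefit there is smaller and not guaranteed.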


suflaj wrote:

While it is true that the age of a method does not determine its value, the older a method is, the more likely it is that the performance gains it gives you have been surpassed by some other method or model.

Specifically, I do not see why I would use any weight averaging over a better model or training technique.

> In this case, an ensemble of models might not help.

Because you'd just use a bigger batch size.
