Lee8846 t1_it5hqba wrote on October 21, 2022 at 2:00 AM

Reply to comment by suflaj in EMA / SWA / SAM by Ttttrrrroooowwww

I wouldn't say so. One can't judge the value of a method by whether it's old or new. For example, in self-supervised learning, work such as MoCo still uses a moving average. It's a nice technique for keeping the key encoder consistent: its weights are updated as a moving average of the query encoder's weights. By the way, EMA also helps smooth out weight fluctuations in some cases, which may be caused by patterns in the data. In such cases, an ensemble of models might not help.
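To make the smoothing point concrete, here is a rough sketch of an EMA update in plain Python. The function name `ema_update`, the momentum of 0.9, and the toy "noisy weight" series are all my own assumptions for illustration, not code from MoCo or any library.

```python
import random


def ema_update(avg_weights, new_weights, momentum=0.999):
    """Elementwise EMA: momentum * avg + (1 - momentum) * new.

    Hypothetical helper for illustration; weights are plain lists of floats.
    """
    return [momentum * a + (1.0 - momentum) * w
            for a, w in zip(avg_weights, new_weights)]


# Toy stand-in for a single weight that fluctuates around 1.0 during training.
random.seed(0)
raw = [1.0 + random.uniform(-0.5, 0.5) for _ in range(200)]

avg = [raw[0]]   # initialise the average from the first value
trace = []       # EMA value after each step
for w in raw:
    avg = ema_update(avg, [w], momentum=0.9)
    trace.append(avg[0])

# Compare the spread of the raw series and the averaged one after warm-up.
raw_spread = max(raw[50:]) - min(raw[50:])
ema_spread = max(trace[50:]) - min(trace[50:])
print(f"raw spread: {raw_spread:.3f}, EMA spread: {ema_spread:.3f}")
```

In a MoCo-style setup the same update would be applied to the key encoder's parameters, with the query encoder's parameters as `new_weights`, usually with a momentum much closer to 1.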