Submitted by Ttttrrrroooowwww t3_y9a0j3 in deeplearning
Hello,
I’ve been looking for comparisons between EMA (Exponential Moving Average), SWA (Stochastic Weight Averaging), and SAM (Sharpness-Aware Minimization).
When would I use which? Am I correct in assuming that SAM is used “on top” and can complement EMA/SWA, while SWA and EMA basically do the same thing, i.e. “ensembling” the model’s weights during training?
To go further, wouldn’t label smoothing also be somewhat comparable?
I’m looking for pointers to studies/articles here, and would also love to hear about your personal experiences.
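For concreteness, here is a minimal framework-agnostic sketch of the three techniques, using plain Python dicts/lists of floats as stand-in “weights.” The names and signatures are my own for illustration, not any library’s actual API (PyTorch, for example, ships its own `torch.optim.swa_utils`):

```python
import math

# --- Weight averaging: EMA vs. SWA, on toy dict-of-floats "weights" ---

def ema_update(avg, current, decay=0.999):
    # EMA: exponentially decaying running average, updated every step;
    # recent weights dominate the average.
    return {k: decay * avg[k] + (1 - decay) * current[k] for k in avg}

def swa_update(avg, current, n_averaged):
    # SWA: equal-weight running mean of snapshots (typically collected
    # late in training); every included snapshot counts the same.
    return {k: (avg[k] * n_averaged + current[k]) / (n_averaged + 1)
            for k in avg}

# --- SAM: a modification of the update step itself, so it composes
# with EMA/SWA rather than replacing them ---

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    # 1) ascend to an approximate worst-case point inside an L2 ball
    #    of radius rho around the current weights
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # 2) descend from the ORIGINAL weights using the gradient evaluated
    #    at the perturbed point
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

This is roughly why SAM can sit “on top” of EMA/SWA: it changes how each optimizer step is computed, while EMA/SWA average the sequence of weights those steps produce.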
suflaj t1_it4h2vs wrote
I wouldn't use any of them because they don't seem to be worth it, and they're largely unproven on modern, relevant models. If I wanted to minimize variance, I'd just build an ensemble of models.
The best advice I can give you is to disregard older papers; model averaging is an idea that's about 4 years old and doesn't seem to be used much in practice.