
suflaj t1_it686mk wrote

This would depend on whether you believe newer noisy data is more important. I would not use it in general, because it's not a property you can guarantee for all data; it would have to be confirmed theoretically beforehand, which might be impossible for a given task.

If I wanted to reduce the noisiness of pseudo-labels, I would not want to introduce additional biases into the data itself, so I'd rather do sample selection, which seems to be what the newest papers suggest. Weight averaging introduces biases akin to what weight normalization techniques did; those were partially abandoned in favour of different approaches, e.g. larger batch sizes, because the alternatives proved more robust and performant in practice as models grew increasingly different from the ML baselines the original findings were based on.
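For what it's worth, a common form of sample selection for noisy pseudo-labels is confidence-based filtering: keep only samples whose predicted probability for their pseudo-label exceeds a threshold. This is just an illustrative sketch (the function name and threshold are my own, not from any specific paper):

```python
import numpy as np

def select_confident_samples(probs, pseudo_labels, threshold=0.9):
    """Return indices of samples whose softmax confidence for the
    assigned pseudo-label is at least `threshold`.
    probs: (N, C) array of softmax outputs.
    pseudo_labels: (N,) array of integer class assignments."""
    confidence = probs[np.arange(len(pseudo_labels)), pseudo_labels]
    return np.flatnonzero(confidence >= threshold)

# toy example: 3 samples, 2 classes
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.10, 0.90]])
pseudo_labels = np.array([0, 0, 1])
kept = select_confident_samples(probs, pseudo_labels)
# only samples 0 and 2 survive the 0.9 cutoff
```

The point is that you filter the training set rather than altering labels or weights, so no extra bias is injected into the data itself.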

Now, if I weren't aware of papers that came out this year, maybe I wouldn't be saying this. That's why I recommended you stick to newer papers: problems are never really fully solved, and newer solutions tend to make bigger strides than optimizing older ones.


Ttttrrrroooowwww OP t1_it6dz0l wrote

Can you point me to the papers you reference?

I've only come across 2019 papers about sample selection (assuming you mean data sampling).
