Submitted by MazenAmria t3_zhvwvl in deeplearning
MazenAmria OP t1_izpii1s wrote
Reply to comment by sqweeeeeeeeeeeeeeeps in Advices for Deep Learning Research on SWIN Transformer and Knowledge Distillation by MazenAmria
To examine whether SWIN itself is overparameterized or not.
sqweeeeeeeeeeeeeeeps t1_izspv5o wrote
Showing you can create a smaller model with the same performance means SWIN is overparameterized for that given task. Give it datasets of varying complexity, not just a single one.
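As a rough sketch of what that comparison could look like (assuming PyTorch and timm are available; the model IDs are real timm names, but the datasets and training loop are placeholders, not a fixed protocol):

```python
# Rough sketch: compare Swin variants of decreasing size on a task and
# check whether the smaller ones match the larger ones' accuracy.
# Assumes timm is installed; fine-tuning details are left as placeholders.
import timm

variants = [
    "swin_base_patch4_window7_224",
    "swin_small_patch4_window7_224",
    "swin_tiny_patch4_window7_224",
]
for name in variants:
    model = timm.create_model(name, pretrained=True, num_classes=100)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M params")
    # ...fine-tune on each dataset (e.g. CIFAR-100, then harder ones)
    # and record accuracy vs. parameter count.
```

If the tiny variant matches the base variant on every dataset you try, that's evidence of overparameterization for those tasks.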
sqweeeeeeeeeeeeeeeps t1_izq6vbc wrote
It is.
pr0d_ t1_izqjmmk wrote
yeah, as per my comment, the DeiT paper explored knowledge distillation based on Vision Transformers. What you want to do here is probably similar, and the resources needed to prove it are huge, to say the least. Any chance you've discussed this with your advisor?
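For reference, the core objective is roughly this (a minimal sketch of Hinton-style soft distillation, which DeiT builds on; assumes PyTorch, and the temperature/alpha values are illustrative, not the paper's settings):

```python
# Minimal soft knowledge distillation loss (Hinton et al.).
# T and alpha are illustrative hyperparameters, not from DeiT.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The expensive part isn't the loss itself; it's running the teacher forward pass on every batch across full training runs.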
MazenAmria OP t1_izrgnco wrote
I remember reading it; I'll read it again and discuss it. Thanks.