Submitted by TensorDudee t3_zloof9 in MachineLearning
Internal-Diet-514 t1_j07qmb0 wrote
Reply to comment by pyepyepie in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
I agree with you. It’s just that nowadays, when people say they have created an architecture that outperforms some baseline, they really mean it outperforms that baseline on ImageNet, CIFAR, or some other established dataset. All data is different, and I really think the focus should be on what added ability the architecture has to model relationships in the input data that a baseline doesn’t, and how that helps with the specific problem at hand. That’s why the transformer was such a great architecture for NLP to begin with: it demonstrated the ability to model longer-range dependencies than an LSTM-like architecture could. I’m just not sure that advantage translated well to vision when we start to say it’s better than a pure CNN-based architecture.
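The long-range-dependency point can be made concrete with a minimal sketch. This is a plain NumPy toy of single-head self-attention (not the OP's TensorFlow implementation, and the function name and shapes are my own illustration): the score matrix compares every position with every other position directly, so any two tokens are one step apart, whereas an LSTM has to carry information through O(seq_len) recurrent updates.

```python
import numpy as np

def self_attention(X):
    # X: (seq_len, d) token embeddings (toy example, no learned projections).
    d = X.shape[-1]
    # Scores relate every position to every other position in ONE step --
    # an LSTM would need O(seq_len) sequential updates to connect
    # the first and last tokens.
    scores = X @ X.T / np.sqrt(d)                      # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of ALL input tokens.
    return weights @ X                                 # (seq_len, d)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))
print(self_attention(tokens).shape)  # (5, 4)
```

The flip side, relevant to the vision question, is that this all-pairs comparison costs O(seq_len²) and has no built-in locality bias, which CNNs get for free on images.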
pyepyepie t1_j08pa80 wrote
Ideas > performance, for sure :)