Submitted by radi-cho t3_110s8ui in MachineLearning
MysteryInc152 t1_j8ppoiq wrote
Reply to comment by swegmesterflex in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
I'd rather the basic senses at least (vision as well as audio) be pretrained directly. We know from Multimodal Chain-of-Thought, as well as from scaling laws for generative mixed-modal language models, that multimodal models far outperform single-modal models at the same data and scale. You won't get that kind of performance gain by offloading those basic senses to outside tools.