MysteryInc152 t1_j8ppoiq wrote on February 16, 2023 at 1:36 AM

Reply to comment by swegmesterflex in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho

I'd rather at least the basic senses (vision and audio) be pretrained into the model too. We know from Multimodal Chain-of-Thought as well as Scaling Laws for Generative Mixed-Modal Language Models that multimodal models far outperform single-modal models on the same data and scale. You won't get that kind of performance gain by offloading those basic senses to outside tools.

https://arxiv.org/abs/2302.00923

https://arxiv.org/abs/2301.03728