Do we really need 100B+ parameters in a large language model? Submitted by Vegetable-Skill-9700 on March 25, 2023 at 4:24 AM in deeplearning (54 comments, 43 points)
Single_Blueberry wrote on March 25, 2023 at 8:52 PM: Just because there's a more efficient architecture doesn't mean it will also benefit further from increasing its size. "We" didn't build a 100B+ parameter model because that's exactly what we need, but because that's the current limit of what we can do.