Single_Blueberry t1_jdnz55p wrote

Just because there's a more efficient architecture doesn't mean it will also benefit further from increasing its size.

"We" didn't build a 100B+ parameter model because that's exactly what we need, but because that's the current limit of what we can do.
