We learned that you can get the same result from fewer parameters and more training. It's a tradeoff, so I'm not entirely surprised. We can't assume GPT's approach is the most efficient one out there; if anything, it's brute-force effectiveness, and we should hope that the same or better results can ultimately be achieved with far less hardware. So far, that appears to be the case.