Anenome5 t1_j9n56lk wrote

We learned that you can get the same result from fewer parameters and more training. It's a tradeoff, so I'm not entirely surprised. We can't assume that GPT's approach is the most efficient one out there; if anything, it's just brute-force effectiveness, and we should desperately hope that the same or better results can ultimately be achieved with much less hardware. So far, that appears to be the case.


NoidoDev t1_j9nelfr wrote

>same result from fewer parameters and more training

Thanks, good to know.