SatisfyingLatte t1_ittn52b wrote
Once all the useful representations in the training data have been extracted and learned. Beyond that point, increasing model size just overfits the training data. Only language tasks might be solvable by naively scaling current techniques.
ReasonablyBadass t1_ittryn7 wrote
Overfitting isn't really an issue anymore since the discovery of double descent/grokking: past the interpolation threshold, test error can start falling again as models get larger.
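Double descent is easy to reproduce in a toy setting. Below is a minimal sketch (not from this thread, just an illustrative random-feature regression) where test error peaks when the number of features roughly equals the number of training samples, then drops again in the overparameterized regime; all names and sizes are arbitrary toy choices:

```python
import numpy as np

# Toy double-descent demo: random ReLU features + minimum-norm
# least squares. Test error rises near the interpolation threshold
# (n_features ~ n_train) and falls again past it.
rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 20

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

for n_features in [10, 25, 45, 50, 55, 100, 500, 2000]:
    # Random feature map shared by train and test.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    phi_train = np.maximum(X_train @ W, 0)
    phi_test = np.maximum(X_test @ W, 0)

    # pinv gives the least-squares fit when underparameterized and
    # the minimum-norm interpolating fit when overparameterized.
    beta = np.linalg.pinv(phi_train) @ y_train
    test_mse = np.mean((phi_test @ beta - y_test) ** 2)
    print(f"features={n_features:5d}  test MSE={test_mse:.3f}")
```

Running it, the printed test MSE should spike around `n_features ≈ 50` and then decrease again at 500 and 2000 features, which is the double-descent shape the comment is referring to.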