rjromero t1_j12aza8 wrote
Reply to [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
> We use the model architecture and initial weights of RoBERTa large (Liu et al., 2019), consisting of 354M parameters. Training is done for 100,000 steps, using thirty-two 32GB GPUs.
354M parameters? At FP32 that's about 1.42 GB of weights. It's tiny.
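Back-of-the-envelope check of that figure, weights only (this ignores optimizer state, activations, and the external corpus a nonparametric model retrieves from):

```python
# Rough sketch with my own numbers, not from the paper:
# memory for the model weights alone at FP32.
params = 354_000_000        # RoBERTa-large parameter count quoted above
bytes_per_param = 4         # FP32 = 4 bytes per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"FP32 weights: {weights_gb:.2f} GB")  # -> FP32 weights: 1.42 GB
```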
rjromero t1_ivch6er wrote
Reply to [D] Simple Questions Thread by AutoModerator
How did InstructGPT completely go under the radar?
I remember trying GPT-3 a while ago and being unimpressed. The results were mostly illogical copypasta. I couldn't believe the hype that preceded it in the media.
That is... until I tried it again very recently, post-InstructGPT. The text generation itself, prompting aside, has improved greatly. Prompting feels unreal, especially some of the Q/A and command-extraction tasks: a few shots are enough to perform what would otherwise take mountains of data to train with traditional NLP approaches.
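As a concrete illustration of what I mean by "a few shots", here is a toy command-extraction prompt. The examples and field names are made up for illustration, and the commented-out call assumes the OpenAI completions endpoint of the time with an `openai` client already configured:

```python
# Toy few-shot prompt for command extraction (my own example, not an official recipe).
few_shot_prompt = """Extract the command and its target from each request.

Request: please turn off the kitchen lights
Command: turn_off | Target: kitchen lights

Request: set a timer for ten minutes
Command: set_timer | Target: ten minutes

Request: play some jazz in the living room
Command:"""

# Roughly how a call would look against the completions API of that era:
# import openai
# response = openai.Completion.create(
#     model="text-davinci-003",
#     prompt=few_shot_prompt,
#     max_tokens=20,
#     temperature=0,
# )
# print(response["choices"][0]["text"])
```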
GPT-3 has served the InstructGPT models by default since January of this year. So why wasn't there more hype around InstructGPT? I feel it warrants a rename, or at least a major version bump of GPT.
rjromero t1_j61ytag wrote
Reply to [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
This is incredible research. Finally, a lead on how we might get to "true" one-shot / few-shot learning.