
rjromero t1_j12aza8 wrote

> We use the model architecture and initial weights of RoBERTa large (Liu et al., 2019), consisting of 354M parameters. Training is done for 100,000 steps, using thirty-two 32GB GPUs.

354M parameters? At FP32 that's about 1.42 GB of weights. It's tiny.

55
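For reference, here's the back-of-the-envelope math behind that figure: a sketch in Python that multiplies the quoted 354M parameter count by the byte width of each precision (the FP16/INT8 rows are extra context, not from the quote).

```python
# Rough sketch: parameter count -> raw weight-storage size at various precisions.
# The 354M figure is from the quoted paper; byte widths are the standard IEEE sizes.

def model_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Return raw weight storage in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9

n_params = 354_000_000  # RoBERTa-large parameter count from the quote

for name, width in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1)]:
    print(f"{name}: {model_size_gb(n_params, width):.2f} GB")

# FP32: 1.42 GB
# FP16/BF16: 0.71 GB
# INT8: 0.35 GB
```

Note this counts only the weights; optimizer state and activations during training take several times more memory, which is why the paper still uses 32GB GPUs.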

vwings t1_j13pguc wrote

That's to be expected, right? A retrieval system should be far more parameter-efficient than memorizing text in network weights the way GPT does...

6