rjromero t1_j12aza8 wrote
> We use the model architecture and initial weights of RoBERTa large (Liu et al., 2019), consisting of 354M parameters. Training is done for 100,000 steps, using thirty-two 32GB GPUs.
354M parameters? At FP32 that's roughly 1.42 GB. It's tiny.
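For reference, a quick back-of-the-envelope check (a sketch, assuming 4 bytes per FP32 parameter and GB = 10^9 bytes):

```python
# Rough memory estimate for raw model weights. Illustrative only;
# actual memory use also includes activations, optimizer state, etc.

def model_size_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Return the raw weight size in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

if __name__ == "__main__":
    params = 354_000_000  # RoBERTa-large parameter count quoted above
    print(f"FP32: {model_size_gb(params):.2f} GB")     # ~1.42 GB
    print(f"FP16: {model_size_gb(params, 2):.2f} GB")  # ~0.71 GB
```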
ItsTheUltimateBob t1_j12goke wrote
That's a puny number of GPUs, too.
vwings t1_j13pguc wrote
It was expected, right? A retrieval system should be much more efficient than storing knowledge directly in neural-net weights, as GPT does...