Best GPUs for pretraining RoBERTa-size LLMs with a $50K budget: 4x RTX A6000 vs. 4x RTX 6000 Ada vs. 2x A100 80GB
Submitted by AngrEvv (t3_11vb220) on March 19, 2023 at 4:17 AM in deeplearning · 7 comments · 18 points
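Since the choice hinges partly on whether a RoBERTa-size model even stresses these cards' memory, here is a rough back-of-envelope estimate (my own, not from the thread) using the standard mixed-precision Adam accounting:

```python
# Rough memory budget (a sketch, not from the thread) for pretraining a
# RoBERTa-large-size model (~355M parameters) with Adam in mixed precision.
# Standard accounting: 2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master
# weights + 4 B + 4 B Adam moments = 16 B per parameter.
params = 355e6
bytes_per_param = 2 + 2 + 4 + 4 + 4  # weights, grads, master copy, m, v
model_state_gb = params * bytes_per_param / 1e9
print(f"Model/optimizer state: {model_state_gb:.1f} GB")  # ~5.7 GB

# Activations scale with batch size and usually dominate, but any of the
# cards in the title (48-80 GB) holds a RoBERTa-size model comfortably;
# the real differentiator is throughput and interconnect, not capacity.
```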
CKtalon t1_jctb1c0 wrote on March 19, 2023 at 12:10 PM
Don't be tricked by memory pooling. NVLink may not improve performance much on the A6000s (it's a different story for the A100s). I think it will be a tough choice between 2x A100 and 4x RTX 6000 Ada.
5 points
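To sanity-check the point about memory pooling on a given box, you can verify whether CUDA peer-to-peer access (the mechanism NVLink accelerates) is actually enabled between each pair of cards; `nvidia-smi topo -m` shows the link topology, and a minimal PyTorch sketch (my own, assuming a working CUDA install) does the per-pair check:

```python
# A minimal sketch (mine, not from the thread) to check whether CUDA
# peer-to-peer access is available between each pair of GPUs. Without
# P2P, "pooled" memory traffic falls back to going through host RAM.
import torch

def check_p2p() -> None:
    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: P2P {'available' if ok else 'unavailable'}")

if __name__ == "__main__":
    check_p2p()
```

Note the RTX 6000 Ada dropped the NVLink connector entirely, so on that option inter-GPU traffic goes over PCIe regardless.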