Submitted by N3urAlgorithm t3_1115h5o in deeplearning
CKtalon t1_j8e0j48 wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
You’ll have to use something like DeepSpeed to split the layers across multiple GPUs. Of course, if the model can fit on one GPU, then you can go crazier with bigger batch sizes.
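
To make "split the layers across multiple GPUs" a bit more concrete, here is a rough sketch of the idea in plain Python. It is not DeepSpeed's API, and the function name `partition_layers` and the per-layer sizes are made up for illustration. It only shows how contiguous blocks of layers might be assigned to GPUs so that each one holds a roughly equal share of the model; tools like DeepSpeed handle this, plus the actual communication between devices, for you.

```python
from typing import List


def partition_layers(layer_sizes_gb: List[float], num_gpus: int) -> List[List[int]]:
    """Assign contiguous layer indices to GPUs, aiming for an equal memory share each.

    Hypothetical helper for illustration only; not part of DeepSpeed.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # Each GPU should hold roughly this much of the model.
    target = sum(layer_sizes_gb) / num_gpus
    stages: List[List[int]] = [[] for _ in range(num_gpus)]

    gpu, used = 0, 0.0
    for idx, size in enumerate(layer_sizes_gb):
        # Once the current GPU has reached its share, move on to the next one.
        # The last GPU takes whatever is left.
        if used >= target and gpu < num_gpus - 1:
            gpu += 1
            used = 0.0
        stages[gpu].append(idx)
        used += size
    return stages


# Assumed example: 8 layers of 2 GB each (made-up numbers).
layers = [2.0] * 8
split_plan = partition_layers(layers, num_gpus=4)
single_plan = partition_layers(layers, num_gpus=1)
```

With one GPU, every layer lands on the same device, which is the case where the leftover memory can go toward bigger batches instead.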