lambda_matt t1_j8facir wrote
Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Short answer: it's complicated. Some workloads can tolerate being distributed across slower memory buses.
Frameworks have also implemented strategies for single-node distributed training; see the PyTorch overview: https://pytorch.org/tutorials/beginner/dist_overview.html
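For reference, here's a minimal sketch of one such strategy, single-node multi-GPU training with PyTorch's DistributedDataParallel (one process per GPU, gradients all-reduced over NCCL). The model, optimizer, and training loop are placeholders, not anything specific to the thread.

```python
# Minimal single-node DDP sketch (assumes 2+ CUDA GPUs; model/loop are placeholders).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU; NCCL backend handles GPU-to-GPU communication.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = torch.nn.Linear(1024, 1024).to(rank)   # placeholder model
    ddp_model = DDP(model, device_ids=[rank])      # wraps model for gradient sync
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    for _ in range(10):                            # placeholder training loop
        x = torch.randn(32, 1024, device=rank)
        loss = ddp_model(x).sum()
        optimizer.zero_grad()
        loss.backward()                            # gradients all-reduced here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```

How much the interconnect matters depends on how often those gradient syncs happen relative to compute, which is why some workloads cope fine with slower buses and others don't.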