Submitted by N3urAlgorithm t3_1115h5o in deeplearning
N3urAlgorithm OP t1_j8dokrh wrote
Reply to comment by lambda_matt in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
So basically the RTX 6000 Ada does not support pooled/shared memory, and a stack of Ada cards would only be useful to accelerate throughput, is that right?
For the H100, is it possible to do something like that instead?
And is the price difference, roughly $7.5k for the RTX 6000 versus $30k for the H100, justified?
lambda_matt t1_j8facir wrote
Short answer is, it’s complicated. Some workloads can handle being distributed across slower memory buses.
Frameworks have also implemented strategies for single-node distributed training: https://pytorch.org/tutorials/beginner/dist_overview.html
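For context, here's a minimal sketch of what one of those single-node strategies (DistributedDataParallel) looks like in PyTorch; the model, data, and hyperparameters are placeholders, not anything specific to this thread:

```python
# Minimal single-node DDP sketch, assuming launch via:
#   torchrun --nproc_per_node=<num_gpus> train.py
# torchrun sets LOCAL_RANK / RANK / WORLD_SIZE in the environment.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Toy model and synthetic data just to show the wiring.
    model = nn.Linear(1024, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(4096, 1024),
                            torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)  # shards data across GPUs/processes
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each GPU holds a full copy of the model and only gradients are synchronized, so this scales fine over PCIe; model-parallel schemes that shard a single model across cards are the ones that lean harder on fast GPU-to-GPU links.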