Recent comments in /f/deeplearning

MisterManuscript t1_jcxo0zv wrote

You don't need the 40 series; it's designed to provide ML features for games. You're paying extra for a built-in optical-flow accelerator that you're not going to use for model training.

The optical-flow accelerator is meant for computing dense optical-flow fields, which are one of many inputs to the DLSS feature that most new games use.

You're better off with the 30 series or lower.

17

RichardBJ1 t1_jcxcu7e wrote

Probably need an answer from someone who has both and has benchmarked some examples (EDIT: and I do not!). Personally I find a lot of “law of diminishing returns” with this type of thing. Also, since I spend 100x more time coding and testing with dummy sets, the actual run speed is not as critical for me as people would expect…

1

bentheaeg t1_jcsvdy1 wrote

Not able to reply for sure right now (the A6000 Ada is missing open tests), and I don't think that many people can. I work at a scale-up though (PhotoRoom), and we're getting a 4x A6000 Ada server next week; we were planning to publish benchmarks vs. our other platforms (DGXs, custom servers, … from A100 to A6000 and 3090), so stay tuned!

From a distance, a semi-educated guess:

- The A6000 Ada is really, really good at compute. So models that are really compute-bound (think Transformers with very big embeddings) should do well; models that are more IO-bound (convnets, for instance) will not do as well, especially vs. the A100, which has much faster memory.

- The impact of NVLink is not super clear to me; its bandwidth was not really big to begin with anyway. My guess is that it may be more useful for latency-bound inter-GPU communications, like when using SyncBatchNorm.

- There are a lot of training tweaks you can use (model or pipeline parallelism, FSDP, gradient accumulation to cut down on comms…), so the best training setup may differ per platform; it's also a game of apples to oranges, and this is by design.

- I would take extra care around the cooling system; if you're not a cloud operator, a server going down will be a mess in your lab. This happened to us 3 times in the past 6 months, always because of the cooling. These machines can draw 2kW+ around the clock; that heat has to be extracted, and from our limited experience some setups (even from really big names) are not up to the task and go belly up in the middle of a job. 80GB A100s are 400 to 450W; A6000s (Ada or not) are 300W, which is easier to cool if your setup isn't bulletproof. Not a point against the A100 per se, but a point against the A100 plus unproven cooling, let's say.
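The compute-bound vs. IO-bound distinction in the first bullet can be sketched with a back-of-envelope roofline check: compare a kernel's arithmetic intensity (FLOPs per byte moved) against the GPU's ridge point (peak FLOPS divided by memory bandwidth). The specs below are illustrative A100-like placeholders, not measured numbers for any of these cards:

```python
def matmul_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul with fp16 operands."""
    flops = 2 * m * n * k                                  # multiply-accumulate count
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, read B, write C
    return flops / bytes_moved

def likely_bound(intensity, peak_tflops, mem_bw_tb_s):
    """Compare arithmetic intensity to the roofline ridge point (FLOPs/byte)."""
    ridge = peak_tflops / mem_bw_tb_s
    return "compute-bound" if intensity > ridge else "memory-bound"

# Illustrative numbers: ~312 TFLOPS fp16, ~2 TB/s HBM (A100-like placeholders).
print(likely_bound(matmul_arithmetic_intensity(4096, 4096, 4096), 312, 2.0))
# → compute-bound  (big Transformer-style matmul)
print(likely_bound(matmul_arithmetic_intensity(64, 64, 64), 312, 2.0))
# → memory-bound   (tiny matmul: not enough reuse per byte)
```

The same logic explains the convnet remark: layers with little data reuse per byte sit below the ridge point, so a card with faster memory (the A100) wins there regardless of raw compute.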
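Of the tweaks in the third bullet, gradient accumulation is the easiest to sketch: average gradients over several micro-batches and pay for one weight update (and, in the distributed case, one gradient sync) per window. A framework-free toy version with a scalar "model", just to show the bookkeeping (the function name and setup are made up for illustration):

```python
def sgd_with_accumulation(param, micro_batch_grads, lr=0.1, accum_steps=4):
    """Toy scalar SGD: accumulate accum_steps micro-batch gradients,
    then apply a single update -- one (simulated) sync per window."""
    grad_buffer, num_updates = 0.0, 0
    for i, g in enumerate(micro_batch_grads, start=1):
        grad_buffer += g / accum_steps      # running average over the window
        if i % accum_steps == 0:            # window full: step, then reset
            param -= lr * grad_buffer
            grad_buffer = 0.0
            num_updates += 1                # == number of gradient syncs paid
    return param, num_updates

# 8 micro-batches with a window of 4 -> only 2 updates/syncs instead of 8.
param, updates = sgd_with_accumulation(0.0, [1.0] * 8)
print(param, updates)  # → -0.2 2
```

This is why accumulation "cuts down on comms": the communication cost scales with the number of windows, not the number of micro-batches, at the price of less frequent updates.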

3