Viewing a single comment thread. View all comments

Exarctus t1_iyd7r5i wrote

I am an ML scientist. And the statement you're making about AMD GPUs only "being fine in limited circumstances" is absolutely false. Any network that you can create for a CUDA-enabled GPU can also be ported into an AMD-enabled GPU when working with PyTorch with a single code line change.

The issues arise when developers of particular external libraries that you might want to use only develop for one platform. This is **only** an issue when these developers make customized CUDA C implementations for specific part of their network, but don't use HIP for cross-compatibility. This is not an issue if the code is pure PyTorch.

This is not an issue with AMD, it's purely down to laziness (and possibly ill-experience) of the developer.

Regardless, whenever I work with AMD GPUs and implement or derive from other people work, it does sometimes include extra development time to convert e.g any customized CUDA C libraries that have been created by the developer to HIP libraries, but this in itself isn't too difficult as there are conversion tools available.

−11

Ronny_Jotten t1_iydddfe wrote

> the statement you're making about AMD GPUs only "being fine in limited circumstances" is absolutely false

Sorry, but there are limitations to the circumstances in which AMD cards are "fine". There are many real-world cases where Nvidia/CUDA is currently required for something to work. The comment you replied to was:

> Limited use in neural network applications at present due to many application's CUDA requirements (though the same could be said of AMD)

It was not specificaly about "code that is pure PyTorch", nor self-developed systems, but neural network applications in general.

It's fair of you to say that CUDA requirements can be met with HIP and ROCm if the developer supports it, though there are numerous issues and flaws in ROCm itself. But there are still issues and limitations in some circumstances, where they don't, as you've just described yourself! You can say that's due to the "laziness" of the developer, but it doesn't change the fact that it's broken. At the least it requires extra development time to fix, if you have the skills. I know a lot of people would appreciate it if you would convert the bitsandbytes library! Just because it could work, doesn't mean it does work.

The idea that there's just no downside to AMD cards for ML, because of the existence of ROCm, is true only in limited circumstances. "Limited" does not mean "very few", it means that ROCm is not a perfect drop-in replacement for CUDA in all circumstances; there are issues and limitations. The fact that Dreambooth doesn't run on AMD proves the point.

8