After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs. One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today.
I'm really curious who at AMD thought it to be a great idea to develop a CUDA compatibility layer but not to release it. As stated, the release was only made because AMD ended financial support.
The problem is that if they make CUDA the standard, they put nVidia in control of that standard. nVidia could then manipulate the situation in future versions of CUDA by reworking it to fuck with this implementation, giving AMD a shaky name in the space.
We saw this happen with Wine, where, although probably not deliberately, MS made Windows compatibility a moving and very unstable target.
That is tolerable for open source communities, but it isn't something that will fly for official support.
Basically it means that AMD is now a possible contender for the rather large market of scientific researchers and private industry who have CUDA-based software for 'AI'-driven development or research on huge banks of GPUs.
Probably this initial implementation still has some kinks to iron out, but it could eventually result in Nvidia not having a functional monopoly in that market.
Also, it's neat from a hobbyist perspective if you're looking to do a small version of CUDA-based stuff along the same lines.
I'd say it's more like they're failing upwards. It's certainly good for AMD, but it seems like it happened in spite of their involvement, not because of it:
For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product. But the good news was that there was a clause in case of this eventuality: Janik could open-source the work if/when the contract ended.
AMD didn't want this advertised or released, and even canned this project despite it reaching better performance than the OpenCL alternative. I really don't get their thought process. It's surreal. Do they not want to support AI? Do they not like selling GPUs?
I really don’t get their thought process. It’s surreal.
Maybe they see it as something that would undermine their efforts to increase ROCm/HIP adoption? (But why fund its development for two years then? I agree with you: It all seems so weird!)
Can someone please explain like I’m five what the meaning and impact of this will be? Past posts and comments don’t seem to be very clear. As someone who uses both Linux and macOS professionally for design, this could be a massive game changer for me.
ok, I get that much. what I’d like to know, if you’re willing to explain: what’s it going to be like deploying that on, say, a Mac workstation? a pop_os workstation? (edit: such as: how do I set it up, can I use it on macOS, will it work with After Effects, etc.)
CUDA is when a program can use the NVIDIA GPU in addition to the CPU for some complicated calculations. AMD now made it possible to use their cards for it too.
I know what CUDA does (as someone who likes rendering stuff, but with AMD cards, I’ve missed it). I’m trying to figure out, realistically, how I can easily deploy and make use of it on my linux and Mac workstations.
the details I’ve come across lately have been a bit… vague.
edit: back when I was in design school, I heard, “when Adobe loves a video card very much, it will hardware accelerate. We call this ‘CUDA’.”
CUDA is the kernel driver API for Nvidia. Stuff like AI runs directly on it.
ROCm is the AMD kernel driver API for their hardware. Think of it like Windows calling it a folder while Linux calls it a directory: the same kind of naming difference, but in code.
An API is an abstraction layer, like how Linux gives you open, close, and print-like commands and translates them into the actual byte code that runs on your specific hardware.
HIP is a translation layer that automatically converts CUDA API calls to the ROCm API equivalent.
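To make the "same thing, different name" idea concrete, here is a toy sketch in Python. The CUDA and HIP function names in the table are real runtime API names, but `toy_hipify` and the table itself are made up for illustration. This is only a text-renaming demo under that assumption; it is not how AMD's hipify tools or ZLUDA are actually implemented.

```python
import re

# A few real CUDA runtime names and their HIP counterparts.
# The table is deliberately tiny; real tooling covers far more of the API.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Word boundaries make sure only whole identifiers are renamed, so a name
# that is not in the table (e.g. cudaMemcpyAsync) is left untouched.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to HIP ones (hypothetical helper)."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_snippet = (
    "cudaMalloc(&d_x, n); "
    "cudaMemcpy(d_x, x, n, cudaMemcpyHostToDevice); "
    "cudaFree(d_x);"
)
print(toy_hipify(cuda_snippet))
```

The point is only that most of the calls line up one to one, which is why a translation layer is feasible at all.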
One problem with ROCm/HIP is that it only targets the latest 7000 series cards onwards. AMD directly took over the API with the 7k stuff. Older AMD cards are almost like a whole different unrelated thing. At least this is the best I understand it from researching this before making a purchase to run AI offline 7 months ago.
ZLUDA is new and looks to be something like a HIP translation layer between the older 6000 series cards and CUDA, indicating likely support for these in the near future.
If you are on modern hardware with secure boot and you have not gone through the hassle of setting up your own TPM secure boot keys so that you can sign your own kernel modules, you want to pay attention to the DKMS info like:
There are some known limitations though like currently only targeting the ROCm 5.x API and not the newly-released ROCm 6.x releases. In turn having to stick to ROCm 5.7 series as the latest means that using the ROCm DKMS modules don't build against the Linux 6.5 kernel now shipped by Ubuntu 22.04 LTS HWE stacks, for example. Hopefully there will be enough community support to see ZLUDA ported to ROCm 6 so at least it can be maintained with current software releases.
The DKMS module is how the secure boot key shim can automatically build the GPU kernel driver module from source when changes are made. Your use case and what you run may vary here, but I think the article is hinting that support is sketchy in the kernel across distros. You'll probably still have some update headaches at times when stuff might break, but that is pure speculation on my part.
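If you want to sanity-check your own setup against the two limitations in that quote, something like this rough Python sketch would do. The function names are hypothetical, and treating every kernel from 6.5 upward as affected is my assumption; the article only mentions Linux 6.5.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '6.5.0-14-generic' or '5.7.1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def zluda_stack_warnings(rocm_version: str, kernel_version: str) -> list[str]:
    """Return warnings based on the limitations quoted from the article."""
    rocm = parse_version(rocm_version)
    kernel = parse_version(kernel_version)
    warnings = []
    if rocm[:1] != (5,):
        # From the quote: ZLUDA targets the ROCm 5.x API, not ROCm 6.x.
        warnings.append("ZLUDA currently targets the ROCm 5.x API only")
    elif kernel[:2] >= (6, 5):
        # From the quote: ROCm 5.7 DKMS modules do not build against 6.5.
        # Assumption: newer kernels are affected as well.
        warnings.append("ROCm 5.x DKMS modules may not build on this kernel")
    return warnings


# Example: ROCm 5.7 on an Ubuntu 22.04 HWE-style kernel string.
for warning in zluda_stack_warnings("5.7.1", "6.5.0-14-generic"):
    print(warning)
```

On a real machine you would feed it the output of `uname -r` and whatever ROCm version your package manager reports.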
To me it means likely support for AI with older Radeon hardware is in the cards in the near future.