This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Split dot7 feature
ClosedPublic

Authored by rampitec on Jan 24 2023, 2:25 PM.

Diff Detail

Event Timeline

rampitec created this revision.Jan 24 2023, 2:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 2:25 PM
rampitec requested review of this revision.Jan 24 2023, 2:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2023, 2:25 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Jan 24 2023, 2:28 PM
clang/include/clang/Basic/BuiltinsAMDGPU.def
239

I have even less idea what these numbers mean now than I did before. This is also a bitcode compatibility break

rampitec added inline comments.Jan 24 2023, 2:55 PM
clang/include/clang/Basic/BuiltinsAMDGPU.def
239

They actually never meant anything just because there is no system in the support matrix. I know this one will need simultaneous update of the device lib downstream.

foad added inline comments.Jan 25 2023, 2:35 AM
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
1083–1084

Is this because Real instructions copy predicates from the Pseudo? Seems like this could go in as a separate obvious cleanup.

arsenm added inline comments.Jan 25 2023, 8:25 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
239

why not name these as just the exact instruction name?

rampitec added inline comments.Jan 25 2023, 11:33 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
239

This is legacy thing. When it first appeared it was a single instruction set. Changing it now completely will break a lot of stuff.

rampitec updated this revision to Diff 492220.Jan 25 2023, 12:08 PM

Split the cleanup NFCI.

rampitec marked an inline comment as done.
rampitec added inline comments.
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
1083–1084
foad accepted this revision.Jan 26 2023, 2:51 AM

LGTM.

This revision is now accepted and ready to land.Jan 26 2023, 2:51 AM
This revision was landed with ongoing or failed builds.Jan 26 2023, 10:34 AM
This revision was automatically updated to reflect the committed changes.
rampitec marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptJan 26 2023, 10:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Would it be possible to backport this to Clang 16?

If https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be makes it into ROCm 5.5 no distro would be able to build it with "vanilla" Clang 16, potentially causing pain for users that try to build ROCm 5.5 with a Clang from a package manager (a realistic scenario, considering that one may want to invest 5 min to build ROCm but not 40 min to build Clang). ROCm 5.5 will be the first release to officially support the 7900XT and 7900XTX, so not having this potentially causes issues for users with recent AMD hardware. (See https://github.com/RadeonOpenCompute/ROCm/issues/1880 for extensive, related discussion).

@jhuber6 This wouldn't exactly "solve" https://github.com/llvm/llvm-project/issues/60660, but I think this could also be a workaround (with potentially better user experience), as allowing users build ROCm with regular Clang 16 prevents that deadlock where we can't build ROCm anymore. This is entirely based on speculation that ROCm 5.5 won't introduce other breakages before its release though, so I'd totally understand if this is not a satisfactory solution.

Would it be possible to backport this to Clang 16?

If https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be makes it into ROCm 5.5 no distro would be able to build it with "vanilla" Clang 16, potentially causing pain for users that try to build ROCm 5.5 with a Clang from a package manager (a realistic scenario, considering that one may want to invest 5 min to build ROCm but not 40 min to build Clang). ROCm 5.5 will be the first release to officially support the 7900XT and 7900XTX, so not having this potentially causes issues for users with recent AMD hardware. (See https://github.com/RadeonOpenCompute/ROCm/issues/1880 for extensive, related discussion).

@jhuber6 This wouldn't exactly "solve" https://github.com/llvm/llvm-project/issues/60660, but I think this could also be a workaround (with potentially better user experience), as allowing users build ROCm with regular Clang 16 prevents that deadlock where we can't build ROCm anymore. This is entirely based on speculation that ROCm 5.5 won't introduce other breakages before its release though, so I'd totally understand if this is not a satisfactory solution.

It shall be complimented by the device-lib change in the corresponding release, so it is not that simple.

arsenm added inline comments.Feb 14 2023, 11:02 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
239

Then why bother renaming this? We really need to stop breaking feature names

It shall be complimented by the device-lib change in the corresponding release, so it is not that simple.

@rampitec I'm not sure I understand. Does this mean that this is breaking in a way that Clang 17 won't be able to build ROCm 5.4?

I thought it was like "we need D142507 to build device-libs after 8dc779e" and for older device libs we just fall back to some older behavior.

It shall be complimented by the device-lib change in the corresponding release, so it is not that simple.

@rampitec I'm not sure I understand. Does this mean that this is breaking in a way that Clang 17 won't be able to build ROCm 5.4?

I thought it was like "we need D142507 to build device-libs after 8dc779e" and for older device libs we just fall back to some older behavior.

Since the feature is actually used by the device-lib it had to be updated in lock step with the compiler change, not after or before. That's what was done in the downstream.

Well, I can already feel the pain that distro maintainers having to build the next ROCm releases 😅

I wonder what the better course of action is here:

  1. Port this patch to Clang 16 so that users with new hardware will be able to build ROCm 5.5, but make it impossible to build ROCm 5.4 and older with clang 16.
  2. Don't port this patch and have a ~6 months gap during which users with the 7900 GPUs won't be able to build ROCm with a stable Clang version, requiring distro maintainers to use several toolchains and source-based distro users to use differentl compatibility patches for different ROCm releases. So basically when 8900 GPUs are announced, clang would support ROCm for 7900 GPUs 😅

Would there be a way to retain at least *some* backwards compatibility or version interoperability? For instance, via an #ifdef CLANG_VERSION_MAJOR in the device libs and an #ifdef INCOMPATIBLE_AMDGPU_INSTS in Clang?

This would obviously very ugly, but it still seems better to me than locking out users (and more likely, ROCm contributors) from using 7900 GPUs if they are unable to build Clang themselves. Users already complain about how hard it is to build ROCm, and they also complain about the frequent breaking changes Clang. I'm very much in favor of moving fast, but I'm worried that complete disregard for backwards compatibility like this with no clear upgrade path or fallback mechanism could cause a lot of frustration for users and distro maintainers.

Maybe there is some other, prettier way to solve this? 🥹

Well, I can already feel the pain that distro maintainers having to build the next ROCm releases 😅

I wonder what the better course of action is here:

  1. Port this patch to Clang 16 so that users with new hardware will be able to build ROCm 5.5, but make it impossible to build ROCm 5.4 and older with clang 16.
  2. Don't port this patch and have a ~6 months gap during which users with the 7900 GPUs won't be able to build ROCm with a stable Clang version, requiring distro maintainers to use several toolchains and source-based distro users to use differentl compatibility patches for different ROCm releases. So basically when 8900 GPUs are announced, clang would support ROCm for 7900 GPUs 😅

Would there be a way to retain at least *some* backwards compatibility or version interoperability? For instance, via an #ifdef CLANG_VERSION_MAJOR in the device libs and an #ifdef INCOMPATIBLE_AMDGPU_INSTS in Clang?

This would obviously very ugly, but it still seems better to me than locking out users (and more likely, ROCm contributors) from using 7900 GPUs if they are unable to build Clang themselves. Users already complain about how hard it is to build ROCm, and they also complain about the frequent breaking changes Clang. I'm very much in favor of moving fast, but I'm worried that complete disregard for backwards compatibility like this with no clear upgrade path or fallback mechanism could cause a lot of frustration for users and distro maintainers.

Maybe there is some other, prettier way to solve this? 🥹

I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it the Builtins.def.

I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it the Builtins.def.

Hmm maybe ifdefs in the device libs would also just delay the issue. Maybe it really is best to pull this change into Clang 16 and accept the fact that it's an unfortunate situation, but at least give users with very recent hardware the option to use a regular Clang to build ROCm. Realistically, those actually upgrading to Clang 16 early will also be those upgrading to ROCm5.5 early and likely also be those most likely to have 7900 GPUs.

Somehow, telling users "if you have a new GPU you need new Clang + ROCm" and "if you want new ROCm for your old GPU you need to also upgrade Clang" sounds better to me than telling them "if you have a new GPU you are SOL unless you use binary releases or build the amd-llvm-fork" 😅

I cannot say there was much choice. The only real choice was to postpone the split and magnify the problem in the future. As for the ifdefs, this might be possible in the device-libs but I do not see how to do it the Builtins.def.

Hmm maybe ifdefs in the device libs would also just delay the issue. Maybe it really is best to pull this change into Clang 16 and accept the fact that it's an unfortunate situation, but at least give users with very recent hardware the option to use a regular Clang to build ROCm. Realistically, those actually upgrading to Clang 16 early will also be those upgrading to ROCm5.5 early and likely also be those most likely to have 7900 GPUs.

Somehow, telling users "if you have a new GPU you need new Clang + ROCm" and "if you want new ROCm for your old GPU you need to also upgrade Clang" sounds better to me than telling them "if you have a new GPU you are SOL unless you use binary releases or build the amd-llvm-fork" 😅

In fact pulling it into clang-16 does not automatically mean it should be the same in the rocm clang build... So this may be a way to go. @b-sumner do you have any objections to backport this into clang-16?

@aaronmondal what exactly backport will look like?

I think unless conflicts arise creating an issue similar to this https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

I think unless conflicts arise creating an issue similar to this https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

I believe it will need D142407 to be cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. So the c-p need to go into release/16.x, right?
Let's wait for @b-sumner first anyway, he is maintaining device-lib.

I think unless conflicts arise creating an issue similar to this https://github.com/llvm/llvm-project/issues/60600 with the cherry-pick line set to this commit should be enough. (See also https://llvm.org/docs/GitHub.html).

I believe it will need D142407 to be cherry-picked as well to apply cleanly. Otherwise I do not expect conflicts. So the c-p need to go into release/16.x, right?
Let's wait for @b-sumner first anyway, he is maintaining device-lib.

I have no objection to backporting this, but it may need to be accompanied with a device-libs patch, and I don't know where that patch would be checked in. The ROCm-Device-Libs in github certainly doesn't have a "clang-16" branch.

I have no objection to backporting this, but it may need to be accompanied with a device-libs patch, and I don't know where that patch would be checked in. The ROCm-Device-Libs in github certainly doesn't have a "clang-16" branch.

My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

I wouldn't recommend it. I would patch whatever device libs are being built in association with clang-16, not staging. Staging device libs is only appropriate for the staging compiler. A hash of device libs from around the time that clang-16 stable released would probably be safe.

My current understanding is the c-p will go into already forked clang-16, but not to rocm 5.4. So rocm device-libs will be accompanied by the older clang-16 w/o this and stay compatible. Someone building from scratch will use latest clang-16 and staging device-libs with this change. Do you think this will work?

I wouldn't recommend it. I would patch whatever device libs are being built in association with clang-16, not staging. Staging device libs is only appropriate for the staging compiler. A hash of device libs from around the time that clang-16 stable released would probably be safe.

In general the idea is that compiler and device-libs should match. I guess the correct answer then users of clang-16 shall use rocm-5.4.x branch of the device libs?