Currently, wavfrontsize32 is being appended by device-libs to some functions for gfx9 targets.
For gfx9 and below targets, this change deletes the incompatible function, since FeatureWavefrontSize32 is not supported.
Details
Diff Detail
Event Timeline
Currently, wavfrontsize32 is being appended by device-libs to some functions for gfx9 targets.
Why? That seems wrong.
It is wrong but that’s how the library is currently structured. All code coexists in the same IR, and there are some wave32 only functions
One concern is whether removing +wavefrontsize32 will make them work. They may contain an instruction sequence that only works with wavefront size 32.
If device libs doesn't want to change that, we could provide an attribute or similar that device libs annotates functions with. Wave32-only or similar, with semantics that it gets erased by clang without warning on wave64. Would be as if that function never existed on wave64 targets. Might need it to apply to objects as well, depending on how much of this stuff device-libs contains.
But if you're compiling for gfx900 then this pass should remove any functions that are marked with +wavefrontsize32. It should not just remove the +wavefrontsize32 from the function.
I thought we had agreed that we were going to eliminate the body of incompatible functions and that had been implemented. If that is not enough, then maybe we should strip the problematic attributes of such functions as well?
It looks like you just want a one-liner patch that adds FeatureWavefrontSize32 to the FeaturesToCheck table?
That used to be broken. Clang mlink-builtin-bitcode applies attributes to functions in device libs that were both required and missing from device libs. Caused a bunch of pain in openmp because we really wanted to link device libs at link time, not into every TU up front in clang. Has someone fixed that requirement?
Indeed, if your goal is just to remove the functions that have +wavefrontsize32 on targets that doesn't support it, this is all you need to do.
I'm also not a fan of removing attributes in this pass. The pass is documented as "removes any function with incompatible attributes". If it also starts removing attributes then its description (& the CL option) would need to be updated to reflect the change (but really, I think this should be done somewhere else, perhaps in yet another pass).
OpenCL uses AMDGPUSimplifyLibCalls with -amdgpu-prelink, which transforms some calls of device lib functions into calls of new device lib functions, e.g. sincos. Since this happens after clang's -mlink-builtin-bitcode, these newly introduced device lib functions need to be linked in by comgr directly.
But that is not the only situation that device libs are linked by non-clang programs. There are other FE's that generate IR for amdgpu and need link with device libs, e.g. XLA.
That said, I think the question is that whether simply removing +wavefrontsize32 makes a function valid for wave64. There must be a reason that a function in device lib is marked with +wavefrontsize32. I assume that a function marked with +wavefrontsize32 means it can only be used with wave32. If that is true, then such functions should just be treated as incompatible functions and treated the same way.
I think a function with wavefrontsize32 is incompatible with a wavefrontsize64 compilation and similarly for a wavefrontsize64 function with a wavefrontsize32 compilation and we should take the usual steps for incompatible functions.
Updated patch to delete the function if FeatureWavefrontSize32 is not supported by current GPU.
I'd need to think harder about whether it's ok to simply remove functions with wavefrontsize32 if you aren't compiling for wave32. It's not quite like the other features, as it behaves more like a selectable mode. I would expect we delete wave32 functions for <gfx10 targets that don't support the mode
llvm/test/CodeGen/AMDGPU/remove-incompatible-wave32-feature.ll | ||
---|---|---|
26 | -NOT checks are fragile and should be as loose as possible. Drop the "define void" and trailing ( |
I think we need to similarly remove a wavefrontsize64 function if compiling in wave32 mode.
llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp | ||
---|---|---|
146 | @arsenm AFAIU, This condition checks that (for ex.) if wave32 mode is enabled and current GPU doesn't support waveforntsize32 feature, then that function should be deleted. |
expandImpliedFeatures gets all the features implied by current GPU. But wavefrontsize32 and 64 are not part of the feature list for gfx10 and above targets in AMDGPU.td. AFAIU, It is assumed those subtarget supports both the features if they are not part of the list. So GPUFeatureBits cannot be relied on to query FeatureWavefrontSize32/64.
In the latest revision, made changes to delete FeatureWavefrontSize32 functions for gfx9 and below targets that don't support the mode
llvm/lib/Target/AMDGPU/AMDGPURemoveIncompatibleFunctions.cpp | ||
---|---|---|
162–163 | Can you explain in the comment why we need to check this separately? | |
164 | IMO there is no need to make a separate bool, just inline it in the if, but no strong preference here (though make the bool const if you keep it) | |
164–165 | IMO the first variable isn't really needed, just inline the check in the if. The second one could also just be inlined because the feature won't change, but no strong preference here. If you keep either/both of them, make them const though. | |
167–173 | Moving this into a helper (e.g. reportFunctionRemoved(Fn, Feature)) would be better than duplicating the code |
I think we need to similarly remove a wavefrontsize64 function if compiling in wave32 mode.
I think this is a reasonable request but it's an extension to what this pass currently does.
- Currently the pass removes functions that rely on features not supported by the -mcpu= GPU.
- The extension would be to remove functions that rely on features not supported by the -mcpu= GPU as modified by any -mattr= options.
Does that make sense? If so can we please implement that in a generic way instead of having a special hack for wave32/wave4 features?
It makes sense to me. The library doesn't need it currently, but there has been a use case to detect CUmode in the headers, and that could arrive in the library where it would be useful to rely on either the CUmode or WGPmode version of a function to be removed depending on the mode.
llvm/test/CodeGen/AMDGPU/remove-incompatible-wave32-feature.ll | ||
---|---|---|
31–33 | Note these functions don't really "require" wave 32. They will codegen just fine without it |
@arsenm AFAIU, This condition checks that (for ex.) if wave32 mode is enabled and current GPU doesn't support waveforntsize32 feature, then that function should be deleted.
This inherently fulfills the condition "delete wave32 functions for <gfx10 targets that don't support the mode" ? Please let me know if I'm missing anything here.