This is an archive of the discontinued LLVM Phabricator instance.

[clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking
ClosedPublic

Authored by Pierre-vh on Jun 6 2023, 2:55 AM.

Details

Summary

Device libs make use of patterns like this:

__attribute__((target("gfx11-insts")))
static unsigned do_intrin_stuff(void)
{
  return __builtin_amdgcn_s_sendmsg_rtnl(0x0);
}

For functions that are assumed to be eliminated if the currennt GPU target doesn't support them.
At O0 such functions aren't eliminated by common optimizations but often by AMDGPURemoveIncompatibleFunctions instead, which sees the "+gfx11-insts" attribute on, say, GFX9 and knows it's not valid, so it removes the function.

D142907 accidentally made it so such attributes were dropped during bitcode linking, making it impossible for RemoveIncompatibleFunctions to catch the functions and causing ISel to catch fire eventually.

This fixes the issue and adds a new test to ensure we don't accidentally fall into this trap again.

Fixes SWDEV-403642

Diff Detail

Event Timeline

Pierre-vh created this revision.Jun 6 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 2:55 AM
Pierre-vh requested review of this revision.Jun 6 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 2:55 AM
Pierre-vh edited the summary of this revision. (Show Details)Jun 6 2023, 2:56 AM
arsenm added inline comments.Jun 6 2023, 3:37 AM
clang/test/CodeGenCUDA/Inputs/ocml-sample-target-attrs.cl
3

Sound really be ulong

clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
35

Also should make sure target-cpu was set

43

ulong

Pierre-vh updated this revision to Diff 528793.Jun 6 2023, 4:32 AM
Pierre-vh marked 3 inline comments as done.

target-cpu wasn't set so I tried something a bit different, but now I still need to remove the target-cpu check in the old test.
For some reason filecheck doesn't match it but it's there?

/home/pierre/work/trunk/llvm-project/clang/test/CodeGenCUDA/link-builtin-bitcode-denormal-fp-mode.cu:150:28: error: IEEEF32-PSZF64-DYNFULL: expected string not found in input
// IEEEF32-PSZF64-DYNFULL: #[[$FUNCATTR]] = { {{.*}} "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" {{.*}} "target-cpu"="gfx803" {{.*}} }
                           ^
<stdin>:260:427: note: scanning from here
attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx803" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
                                                                                                                                                                                                                                                                                                                                                                                                                                          ^
<stdin>:260:427: note: with "$FUNCATTR" equal to "1"
attributes #0 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx803" "target-features"="+16-bit-insts,+ci-insts,+dpp,+gfx8-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" "uniform-work-group-size"="true" }
                                                                                                                                                                                                                                                                                                                                                                                                                                          ^
<stdin>:261:85: note: possible intended match here
attributes #1 = { convergent mustprogress nofree norecurse nosync nounwind willreturn memory(none) "denormal-fp-math"="preserve-sign,preserve-sign" "denormal-fp-math-f32"="ieee,ieee" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx803" }
Pierre-vh updated this revision to Diff 528794.Jun 6 2023, 4:40 AM

Fix check lines, I think it's just FileCheck weirdness

arsenm accepted this revision.Jun 6 2023, 1:37 PM
arsenm added a reviewer: yaxunl.
arsenm added inline comments.
clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
35

Did this previously receive the target-features spam implied by the target?

This revision is now accepted and ready to land.Jun 6 2023, 1:37 PM
yaxunl accepted this revision.Jun 7 2023, 6:09 AM

LGTM. Thanks

clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
36

Do we know why internalize keeps the target-cpu attribute but non-internalize does not?

Pierre-vh marked 2 inline comments as done.Jun 7 2023, 6:50 AM
Pierre-vh added inline comments.
clang/test/CodeGenCUDA/link-builtin-bitcode-gpu-attrs-preserved.cu
35

Did this previously receive the target-features spam implied by the target?

I think it did, the attributes were filled with things like "+gfx9-insts", etc.

Do we know why internalize keeps the target-cpu attribute but non-internalize does not?

PropagateAttrs is only set for -mlink-builtin-bitcode, see CompilerInvocation.cpp@1888 (where OPT_mlink_builtin_bitcode is processed)

This revision was landed with ongoing or failed builds.Jun 7 2023, 6:51 AM
This revision was automatically updated to reflect the committed changes.
Pierre-vh marked an inline comment as done.