This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] replace hostcall module flag with function attribute
ClosedPublic

Authored by sameerds on Feb 7 2022, 11:35 PM.

Details

Summary

The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Diff Detail

Event Timeline

sameerds created this revision.Feb 7 2022, 11:35 PM
sameerds requested review of this revision.Feb 7 2022, 11:35 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
jdoerfert added inline comments.Feb 8 2022, 12:20 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

You can use AAPointerInfo for the call site return IRPosition. It will (through the iterations) gather all accesses and put them into "bins" based on offset and size. It deals with uses in calls, etc. and if there is stuff missing it is better to add it in one place so we benefit throughout.

588

Always check the return value, if it is false something went wrong and some calls might have been missed. Basically,

if (!A.check...)
  NeedsHostCallPtr = true;

then you also can just return false in the callback w/o setting it.

sameerds added inline comments.Feb 8 2022, 4:05 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

I am not following what you have in mind. "implicitarg_ptr" is a pointer returned by an intrinsic that reads an ABI-defined register. I need to check that for a given call-graph, a particular range of bytes relative to that base pointer are never accessed. The above DFS on the uses conservatively assumes that such a load exists unless it can conclusively trace every use of the base pointer. This may include the pointer being passed to an extern function or being stored into a different memory location (although we don't expect ABI registers being capture this way). I am not seeing how to construct this around AAPointerInfo. As far as I can see, the public interface only talks about uses that are recognized as loads and stores.

sameerds updated this revision to Diff 406785.Feb 8 2022, 5:33 AM

eliminated NeedsHostcall by simply checking the return value of the walk over all calls

sameerds added inline comments.Feb 8 2022, 5:35 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
588

Thanks! This eliminated the variable. Also used the opportunity to get more creative with the callback names!

arsenm added inline comments.Feb 8 2022, 6:32 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
198

The ABI should not be a property of the subtarget, and the global subtarget shouldn't be used

llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-v5.ll
56

Can you also add cases where the load is wider and the hostcall pointer is a subset of it?

arsenm added inline comments.Feb 8 2022, 6:40 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
520

Can you use checkForAllUses instead of creating your own worklist?

jdoerfert added inline comments.Feb 8 2022, 7:47 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Not actually tested, replaces the function body. Depends on D119249.

const auto PointerInfoAA = A.getAAFor<AAPointerInfo>(*this, IRPosition::callback_returned(cast<CallBase>(Ptr)), DepClassTy::Required);
if (!PointerInfoAA.getState().isValidState())
  return true; // Abort (which is weird as false is abort in the other CB).
AAPointerInfo::OffsetAndSize OAS(*Position, /* probably look pointer width up in DL */ 8);
return !forallInterferingAccesses(OAS, [](const AAPointerInfo::Access &Acc, bool IsExact) {
   return Acc.getRemoteInst()->isDroppable(); });
580–583

I'd suggest to switch the return value of ArgUsed... so it matches this functions (and other callbacks).

jdoerfert added inline comments.Feb 8 2022, 8:28 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

You don't actually need the state check.
And as I said, this will take care of following pointers passed into callees or through memory to other places, all while ignoring dead code, etc.

sameerds updated this revision to Diff 406860.Feb 8 2022, 8:47 AM

further simplified the callback return value; moved hostcall info out of the subtarget

arsenm added inline comments.Feb 8 2022, 8:58 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
198

This is still depending on the global subtarget. There's no reason isHsaAbiVersion* should depend on the subtarget. All it's doing is checking global properties

sameerds added inline comments.Feb 8 2022, 9:01 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
198

I don't understand what the objection is. Existing functions that check the ABI version clearly do so by accessing the subtarget. I am merely following existing practice.

I have now rearranged the code a bit. Maybe this works? To be honest, I am not very familiar with how ABI information is tracked. I will heartily welcome any advice on how to retrieve the ABI version and then determine the location of the hostcall implicit arg.

520

checkForAllUses does not track sufficient state to catch every load that happens to overlap with the hostcall pointer. The only pattern likely to happen in real life is "GEP to offset 24, typecast to i64*, load 8 bytes". But unless we can guarantee that this is the only pattern, we need something robust.

@jdoerfert pointed me in the correct direction to make full use of existing machinery, but that is now dependent on D119249

565

I see now. forallInterferingAccesses() does check for valid state on entry, which is sufficient to take care of all the opaque uses like a call to an extern function or a complicated phi or a capturing store. Thanks a ton ... this has been very educational!

580–583

Agreed. In fact switching the return value allowed me to merge the two callbacks into one. I am still keeping the positive name on the outermost function "funcRetrievesHostcallPtr" because it matches the sense near the callsite.

arsenm added inline comments.Feb 8 2022, 9:04 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
198

Fundamentally the ABI is/needs to be a module level property. The subtarget is a function level property, and the subtarget read from the global target machine has been deprecated for years. getHsaABIVersion isn't actually reading anything from the subtarget, and is only using it to query the triple. Really getHsaAbiVersion should be passed the module, which could query the triple/module flag/compiler flag, none of which are subtarget properties

kpyzhov accepted this revision.Feb 8 2022, 10:29 AM
This revision is now accepted and ready to land.Feb 8 2022, 10:29 AM
sameerds updated this revision to Diff 406889.Feb 8 2022, 10:29 AM

added tests for i128 load. hostcall position is now independent of subtarget.

jdoerfert added inline comments.Feb 8 2022, 10:33 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Yes, all "forAll" functions will return false if we cannot visit "all". Though, these methods will utilize the smarts, e.g., ignore what is dead, look at loads if the value is stored in a way we can track it through memory, transfer accesses in a callee to the caller "space" if a pointer is passed to the callee,... etc.

kpyzhov accepted this revision.Feb 8 2022, 10:34 AM
jdoerfert requested changes to this revision.Feb 8 2022, 10:34 AM

@kpyzhov Usually people don't accept without comment, especially if there is clearly ongoing discussion.

This revision now requires changes to proceed.Feb 8 2022, 10:34 AM
t-tye removed a reviewer: tony-tye.Feb 8 2022, 11:27 AM
t-tye added a reviewer: t-tye.
t-tye removed a subscriber: t-tye.
t-tye added a subscriber: t-tye.
t-tye removed a subscriber: t-tye.
sameerds added inline comments.Feb 8 2022, 6:01 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Yes, all "forAll" functions will return false if we cannot visit "all". Though, these methods will utilize the smarts, e.g., ignore what is dead, look at loads if the value is stored in a way we can track it through memory, transfer accesses in a callee to the caller "space" if a pointer is passed to the callee,... etc.

@jdoerfert, do you see D119249 landing soon? We are kinda on a short runway here (less than a handful of days) and hoping to land this review quickly because it solves an important issue. I would prefer to have the check that you outlined, but the alternative is to let my version through now, and then update it when the new interface becomes available.

jdoerfert added inline comments.Feb 9 2022, 12:43 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Did you test my proposed code? I'll land my patch tomorrow, if yours works as you expect with the proposed AAPointerInfo use, great. If it doesn't I don't necessarily mind you merging something else with a clear intention to address issues and remove duplication as we go.

I can lift my commit block as we go and @arsenm can also give it a good-to-go if need be.

sameerds marked 2 inline comments as not done.Feb 9 2022, 1:20 AM
sameerds added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Lifting your commit block would be useful in general. I certainly do not intend to submit something in a hurry.

I applied your change and tested the proposed code. It gives more pessimistic results than my original crude version, i.e., it marks some uses of the implicitarg_ptr as accessing the target range, when I am pretty sure it does not. See test_kernel42 in the lit test hsa-metadata-hostcall-v3.ll in this review. It is intended to access eight bytes from 16 onwards, which clearly does not overlap with offset 24, which is the start of the target range (hostcall pointer).

Strangely, in the debug messages, the AAPointerInfoCallsiteReturned() constructor is called, but I don't see any messages marked [AAPointerInfo].

sameerds added inline comments.Feb 9 2022, 3:07 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

Good news. The PointerInfo works as expected ... it's just that AMDGPUAttributor has a very short Allowed list, and I needed to add PointerInfo there.

So @jdoerfert, D119249 works for me with the suggested changes. If that makes it to mainline, I'll update this change to use it.

@arsenm do you see any issues with enabling AAPointerInfo in the AMDGPUAttributor? Will there be concerns about it being "too heavy" or something?

arsenm added inline comments.Feb 9 2022, 8:39 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
565

No, that's what it's there for

sameerds updated this revision to Diff 407762.Feb 10 2022, 9:17 PM

use AAPointerInfo; add more tests to demonstrate the same

jdoerfert accepted this revision.Feb 11 2022, 7:18 AM

Attributor use looks good. @arsenm will need to look at the rest.

This revision is now accepted and ready to land.Feb 11 2022, 7:18 AM
arsenm accepted this revision.Feb 11 2022, 7:20 AM
kpyzhov accepted this revision.Feb 11 2022, 7:31 AM
This revision was landed with ongoing or failed builds.Feb 11 2022, 9:22 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/hsa-metadata-hostcall-present-v3.ll