This is an archive of the discontinued LLVM Phabricator instance.

Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack"
ClosedPublic

Authored by JanekvO on Feb 7 2023, 7:11 AM.

Details

Summary

Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures.

This diff fixes said failures by eliding most of adjustInliningThresholdUsingCallee for indirect calls as the callee function is unavailable for indirect calls.

Diff Detail

Event Timeline

JanekvO created this revision.Feb 7 2023, 7:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2023, 7:11 AM
JanekvO requested review of this revision.Feb 7 2023, 7:11 AM
Herald added a project: Restricted Project. · View Herald Transcript
JanekvO added reviewers: arsenm, scchan, Restricted Project.Feb 7 2023, 7:12 AM
arsenm added inline comments.Feb 7 2023, 7:16 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

This also will fail for constexpr casts and aliases. You don’t need to give up on these cases. You should be going off the callee type you can get from the CallBase

JanekvO added inline comments.Feb 7 2023, 7:43 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

I couldn't get isArgPassedInSGPR working with only the CallBase. Do indirect calls always pass args through VGPR?

arsenm added inline comments.Feb 7 2023, 7:52 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

The calling convention is always the same and can’t be different for direct or indirect calls. The ABI is fully determined by the call type, the calling convention and the attribute set. Is there just a missing attribute wrapper for inreg that checks callee and callsite attributes?

JanekvO added inline comments.Feb 7 2023, 8:11 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

Ah, I thought isArgPassedInSGPR depended more on information on the Argument and its associated called Function. Seem it only really needs the inreg/byval attribute info of the arguments.

scchan added inline comments.Feb 7 2023, 12:13 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

If it's an indirect call, where's no inlining opportunity or am I missing something?

JanekvO updated this revision to Diff 495811.Feb 8 2023, 5:34 AM
  • Stop bailing out for unknown callees and consider args of indirect calls
JanekvO added inline comments.Feb 8 2023, 5:45 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1172

It can possibly still inline in specific cases (such as the one in the provided test). Moreover, the llvm::getInlineCost (CallBase &Call, Function *Callee, ... API that eventually calls this TTI hook may have calls (e.g., from LTO) where the CallBase &Call is for an indirect call without a known Called function but the user of the API provides a known Function *Callee (i.e., Call.getCalledFunction() != Callee).

arsenm added inline comments.Feb 8 2023, 11:47 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

You should go through CallBase::paramHasAttr rather than directly inspecting the AttributeList

JanekvO added inline comments.Feb 9 2023, 6:26 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

I tried to retain existing API for isArgPassedInSGPR(const Argument *A) which doesn't have access to any CallBase. I think I could force something with function_ref but not sure if that's the way to go.

arsenm added inline comments.Feb 9 2023, 6:27 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

Do we really need to retain the Argument version?

JanekvO added inline comments.Feb 9 2023, 8:14 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

Looking into it, there seems to be a only 2 uses of isArgPassedInSGPR(const Argument *A), however, I couldn't find a way to rewrite them to use a more CallBase friendly API.

arsenm added inline comments.Feb 9 2023, 5:24 PM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

I'm not really sure what the policy is for mismatched call site and declaration ABI attributes policy is supposed to be. We don't have a verifier check for it. My current assumption is for inreg, it would be present on the declaration and never in the call site attributes which won't work here. If you were to only pick one, you would be better off going off the callee's attribute set.

The Argument and call case are also different. The Argument version is for incoming and CallBase for outgoing. You might just need to split the two

JanekvO added inline comments.Feb 10 2023, 5:44 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

My current assumption is for inreg, it would be present on the declaration and never in the call site attributes which won't work here. If you were to only pick one, you would be better off going off the callee's attribute set.

This might be tough in the case of indirect calls since the CallBase is all that's provided to the TTI hook and CallBase::getCalledfunction will return nullptr. Perhaps adding the Callee function to the existing TTI or a new TTI hook to adjust cost/threshold where the Callee function can be accessed might be more appropriate?

arsenm added inline comments.Feb 10 2023, 6:04 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2488–2489

I think just two duplicate functions for the incoming and outgoing case is fine. One will use Argument and the other CallBase and parameter index

JanekvO updated this revision to Diff 496508.Feb 10 2023, 9:03 AM
  • Duplicate isArgPassedInSGPR
arsenm accepted this revision.Feb 10 2023, 5:01 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2504

This is kind of meaningless but not really wrong for entry points

This revision is now accepted and ready to land.Feb 10 2023, 5:01 PM
This revision was landed with ongoing or failed builds.Feb 13 2023, 4:17 AM
This revision was automatically updated to reflect the committed changes.