This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve indirect call support in closed modules
Needs ReviewPublic

Authored by jdoerfert on Jul 20 2023, 2:01 PM.

Details

Summary

If we see all functions that can be called, thus in a "closed module",
we can perform better reasoning in the presence of unknown callees of
indirect calls. We now collect all indirectly callable functions, and
match them against an indirect call site with an unknown callee. Every
function that is a candidate is potentially called.

The AMDGPU backend is the only user for now. We should enable this for
AMDGPU (and NVIDIA GPUs in certain cases) also when we run the
Attributor (or OpenMP-opt) earlier in the pipeline.

TODOs

  • We should improve the "is candidate" logic.

Diff Detail

Event Timeline

jdoerfert created this revision.Jul 20 2023, 2:01 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added subscribers: hoy, ormris, foad and 10 others. · View Herald Transcript
jdoerfert requested review of this revision.Jul 20 2023, 2:01 PM
Herald added a project: Restricted Project. · View Herald Transcript
jdoerfert added inline comments.Jul 20 2023, 2:03 PM
llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll
84

We should have realized this is UB from the beginning. I think we didn't because AAPotentialValues (and maybe more AAs) is not allowed in the AMDGPUAttributor run.

arsenm added inline comments.Jul 20 2023, 2:03 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
957

This should probably feed from a pass option like other end-of-LTO things

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
10411 ↗(On Diff #542665)

Move to a separate real function?

10416 ↗(On Diff #542665)

The float and double cases are covered by T1 == T2?

jdoerfert added inline comments.Jul 20 2023, 2:06 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
957

Those do not feed into the backend passes, as far as I can tell. I am tempted to add the Module flag back.

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
10416 ↗(On Diff #542665)

Right.

arsenm added inline comments.Jul 20 2023, 2:06 PM
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
10444 ↗(On Diff #542665)

Need tests for all this type pun stuff

jdoerfert updated this revision to Diff 542709.Jul 20 2023, 3:42 PM

Split out TTI stuff

jdoerfert updated this revision to Diff 542723.Jul 20 2023, 4:58 PM

update tests

scott.linder added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3250 ↗(On Diff #542723)

Comment should describe the new case

3254–3267 ↗(On Diff #542723)

I don't understand the context fully, but is the intention to have &F pushed into IndirectlyCallableFunctions repeatedly if there are repeated calls to it? And that earlier-ordered calls are pushed to it, but none are considered once any must-tail-call are reached? I would have expected the push to be pulled out of the loop, as those constraints don't sound meaningful to me. The continue case is also confusing to me, unless there is some side-effect in the loop I'm missing? The same question applies to the original code, where I would have expected a break after CalledViaMustTail is set to true, so I assume I am just missing something?

In addition to all the above confusion the condition on the outermost if also seems strange, as there are two separable cases, one of which depends on the other, but they are combined in a non-obvious way, namely around the starting value of FI.CalledViaMustTail.

I added a suggestion of what makes sense to me, but there are far too many things I'm confused about to have any confidence my suggestions are useful. Any help in understanding would be appreciated!

jdoerfert added inline comments.Jul 24 2023, 9:59 AM
llvm/lib/Transforms/IPO/Attributor.cpp
3254–3267 ↗(On Diff #542723)

I don't understand the context fully, but is the intention to have &F pushed into IndirectlyCallableFunctions repeatedly if there are repeated calls to it?

No. It always ends up in there once. I might have placed it in the wrong scope. I'll check your code.

The continue case is also confusing to me, unless there is some side-effect in the loop I'm missing? The same question applies to the original code, where I would have expected a break after CalledViaMustTail is set to true, so I assume I am just missing something?

The original code seems to be missing a break.

In addition to all the above confusion the condition on the outermost if also seems strange, as there are two separable cases, one of which depends on the other, but they are combined in a non-obvious way, namely around the starting value of FI.CalledViaMustTail.

The outer if has two conditions separated with a ||, no?
If either is true, we want to look at users.
First is isClosedWorldModule(), second is (!isModulePass() && !FI.CalledViaMustTail).
Why do they depend on each other? Maybe the confusion is that we might search for must tail if we don't need to, which is a fair criticism. If we set closed world in the backend (for now), I'll clean this up.

scott.linder added inline comments.Jul 24 2023, 1:15 PM
llvm/lib/Transforms/IPO/Attributor.cpp
3254–3267 ↗(On Diff #542723)

The outer if has two conditions separated with a ||, no?
If either is true, we want to look at users.
First is isClosedWorldModule(), second is (!isModulePass() && !FI.CalledViaMustTail).
Why do they depend on each other? Maybe the confusion is that we might search for must tail if we don't need to, which is a fair criticism. If we set closed world in the backend (for now), I'll clean this up.

Ah, maybe my confusion is just with the relationship between CalledViaMustTail and IndirectlyCallableFunctions, or possibly the lack of relationship between them? In the current version my reading was !CalledViaMustTail && !F.hasLocalLinkage() implied we don't want to update IndirectlyCallableFunctions. The fact that something might just be in the wrong scope might explain the whole thing, and the alternative condition I suggested might make no sense if there is no actual connection between the two.

Anyway, if the two things can be separated entirely that would make it much easier to read. As it stands there are too many possible dynamic execution paths to keep track of easily, IMO.

arsenm added inline comments.Aug 11 2023, 8:44 AM
llvm/include/llvm/Transforms/IPO/Attributor.h
1318 ↗(On Diff #542723)

Should this return ArrayRef?

jdoerfert updated this revision to Diff 553656.Aug 25 2023, 3:52 PM

rebase on top of the changes in main

jdoerfert edited the summary of this revision. (Show Details)Aug 26 2023, 11:30 AM
jdoerfert updated this revision to Diff 554526.Aug 29 2023, 4:58 PM
jdoerfert retitled this revision from [Attributor][AMDGPU] Improve indirect call support in closed modules to [AMDGPU] Improve indirect call support in closed modules.
jdoerfert edited the summary of this revision. (Show Details)

Remove AA code paths and focus on AMDGPU

arsenm added inline comments.Sep 12 2023, 4:12 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
957

Not sure what you mean, there are pass parameters like any other

arsenm added inline comments.Sep 12 2023, 4:15 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
975–977

I don't really understand what optional with null is supposed to mean. Just drop the optional?

979
  • instead of .value()?
llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
22

poison

arsenm added inline comments.Sep 12 2023, 4:18 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
966

Can you move this to a real function with a name?

982–984

AMDGPU::isArgPassedInSGPR. Alternatively we could add some kind of variant of TTI::isAlwaysUniform which covers arguments

jdoerfert marked 2 inline comments as done.Sep 15 2023, 3:37 AM
llvm/test/CodeGen/AMDGPU/enable-scratch-only-dynamic-stack.ll