Page MenuHomePhabricator

[AMDGPU] Set implicit arg attributes for indirect calls
ClosedPublic

Authored by madhur13490 on Mar 25 2021, 8:41 AM.

Details

Summary

This patch adds attributes corresponding to
implicits to functions/kernels if

  1. it has an indirect call OR
  2. it's address is taken.

Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.

Diff Detail

Event Timeline

madhur13490 created this revision.Mar 25 2021, 8:41 AM
madhur13490 requested review of this revision.Mar 25 2021, 8:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2021, 8:41 AM

This patch depends on https://reviews.llvm.org/D98884 for all tests to work.

llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
275

Is the reasoning here that an indirect call site to an unknown callee will have set up all the implicit state, so for consistency a function which is ever called indirectly must always have all the state set up?

I would have hoped the granularity is such that functions can be annotated with only what they need, so that indirect call sites pass (and require) everything but direct call sites don't need to. With consistent register location for implicit values, I would hope that will work out of the box.

299

As discussed, presence of inline asm does not need to be taken to imply presence of an indirect call within said asm. If someone wants to make an indirect branch from asm, it's on them to get it right. If someone wants to use asm without indirect branches, they won't want the extra overhead from conservatively setting up all the state.

383

This doesn't make sense to me.

An indirect call site implies all implicit args are needed, so the caller of that function will acquire those attributes. Then a direct call to said function will have those attributes, and so on.

hasAddressTaken on the kernel itself makes no difference here - is the comment out of date?

madhur13490 added inline comments.Mar 25 2021, 10:17 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
275

Is the reasoning here that an indirect call site to an unknown callee will have set up all the implicit state, so for consistency a function which is ever called indirectly must always have all the state set up?

We are not setting up implicit attributes per call site. This pass works at function granularity and it does not take into account the call site. Implicit attributes are not a property of call site but the functions themselves. If function's hasAddressTaken() is true, we set all attributes corresponding to the implicits.

I would have hoped the granularity is such that functions can be annotated with only what they need, so that indirect call sites pass (and require) everything but direct call sites don't need to. With consistent register location for implicit values, I would hope that will work out of the box.

That won't work out. You can't ask call site to pass all implicits when you setup only a subset of them. The arguments setup must be at superset of what is passed. We use attributes on the function (not call site) to setup implicits and thus pass this pass sets them up before we begin codegen so that codegen works out-of-the-box. So your comment - " that functions can be annotated with only what they need" is invalid.

299

The code is setting the flag when there is *noinlineasm()*. If you set the flag even when there is inlineAsm() then all exisiting apps which are using inlieAsm() would be requesting more implicits than current ones. This is what we discussed and aligned internally.

383

I don't understand this comment.

This doesn't make sense to me.

An indirect call site implies all implicit args are needed, so the caller of that function will acquire those attributes. Then a direct call to said function will have those attributes, and so on.

which function?

hasAddressTaken on the kernel itself makes no difference here - is the comment out of date?

hasAddressTaken() on kernels makes difference because hasAddressTaken() for kernels is naturally false. How would set implicits then? This snippet is explicitly for functions which makes an indirect call and not for the address taken functions (that is handled at the beginning).

The comment is not out of date. Look at the test cases - simple-indirect-call.ll and direct-indirect-call.ll.

rampitec added inline comments.Mar 25 2021, 10:41 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
277

Pass all arguments to hasAddressTaken().

llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
275

Argument passing is per call site, derived from on the attributes of the function being called (or a pessimistic superset when the callee is uncertain). What you're doing here makes every call to a function slower, as opposed to only the indirect calls.

The attributes on a function should be derived from the attributes on the functions it calls, and whether any of those call sites is indirect, not based on whether the function itself is called indirectly.

299

I missed the !.

383

The function that was making an indirect call, and is itself being called directly.

The code makes approximate sense here but I can't relate it to the comment.

Approximate in that it should probably be:

if (hasIndirectCall) {
  for (auto AttrName : ImplicitAttrNames) {
    if (callingconv.supports(AttrName) {
      F.addFnAttr(AttrName);
      Changed = true;
    } else {
      // sometimes an error
    }
  }  
}

Could the comment be:
// An indirect call may use any implicit arguments. Conservatively ensure all are available.

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
5

This doesn't seem right. The function 'indirect' uses no implicit arguments, it should have none of those attributes appended.

12

This (kernel) function should however be annotated with all the implicit arguments, as it contains an indirect call (that has not been devirtualized in this test)

add all params to hasAddressTaken()

madhur13490 marked an inline comment as done.Mar 25 2021, 10:56 AM
madhur13490 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
277

Ah, Right! missed it.

madhur13490 marked an inline comment as done.Mar 25 2021, 11:10 PM
madhur13490 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
275

Argument passing is per call site, derived from on the attributes of the function being called (or a pessimistic superset when the callee is uncertain). What you're doing here makes every call to a function slower, as opposed to only the indirect calls.

The attributes on a function should be derived from the attributes on the functions it calls, and whether any of those call sites is indirect, not based on whether the function itself is called indirectly.

I don't think we're on same page here. Our objective is to enable -fixed-abi only for those apps which makes an indirect call and making optimization per call site is not an immediate goal. While per call site optimization is useful, it is appropriate in the world where we have address taken functions and apps making direct as well indirect calls to them. We should handle this in the longer term solution where we want to use Attributor framework.

Our immediate goal is to minimize the potential regression problem.

383

The function that was making an indirect call, and is itself being called directly.

The code makes approximate sense here but I can't relate it to the comment.

Approximate in that it should probably be:

if (hasIndirectCall) {
  for (auto AttrName : ImplicitAttrNames) {
    if (callingconv.supports(AttrName) {
      F.addFnAttr(AttrName);
      Changed = true;
    } else {
      // sometimes an error
    }
  }  
}

Could the comment be:
// An indirect call may use any implicit arguments. Conservatively ensure all are available.

383

Can you please elaborate what do you mean by "callingconv.supports(AttrName)"?
Also, please explain the word "approximate" in this context.

I don't have any intention of using CC per AttrName.

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
5

Spelling out the immediate objective should address this. We want to set all implicit attributes on hasAddressTaken() functions *as well as* functions making indirect calls. This would address our immediate goal.

JonChesterfield resigned from this revision.Mar 29 2021, 8:32 AM

I don't understand why fine grained attributes would be introduced and then all the fine grained attributes set on the function. I also don't understand why setting the attributes on a function based on which they call is considered difficult. We don't seem to be reaching consensus. I'll leave the assessment of this patch to others.

arsenm accepted this revision.Mar 29 2021, 3:54 PM

This is good enough as a bandaid until the pass is replaced with the inverse pass

llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
275

Codegen does not attempt to treat implicit arguments per call site, only per function. We could optimize this, but I don't think it would be particularly worthwhile

278

I believe kernels can have their address taken, but not be called (e.g. you could have a global table with a bunch of kernel addresses in it)

This revision is now accepted and ready to land.Mar 29 2021, 3:54 PM

change tests at HEAD

arsenm accepted this revision.Apr 12 2021, 12:10 PM