This is an archive of the discontinued LLVM Phabricator instance.

RFD: Do not CSE convergent calls in different basic blocks
ClosedPublic

Authored by foad on Apr 27 2023, 7:43 AM.

Details

Summary

"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

x = convergent_call();
if (cond) {
  y = convergent_call();
  use y;
}

then we get this:

x = convergent_call();
if (cond) {
  use x;
}

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.

Diff Detail

Event Timeline

foad created this revision.Apr 27 2023, 7:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 7:43 AM
foad requested review of this revision.Apr 27 2023, 7:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 7:43 AM
foad added inline comments.Apr 27 2023, 7:51 AM
llvm/test/CodeGen/AMDGPU/cse-convergent.ll
37

This is the effect of the fix: we repeat the DPP subgroup operation over a reduced set of lanes, instead of reusing the result of the first DPP subgroup operation over all lanes.

llvm/test/Transforms/SimplifyCFG/convergent.ll
85 ↗(On Diff #517552)

This is a completely accidental hoisting improvement due to https://reviews.llvm.org/D129370#inline-1442432. Convergent calls in the "then" and "else" branches are now treated as not identical, which weirdly allows *more* hoisting than when they were considered identical.

nikic requested changes to this revision.Apr 27 2023, 7:54 AM
nikic added a subscriber: nikic.
nikic added inline comments.
llvm/lib/IR/Instruction.cpp
529 ↗(On Diff #517552)

This check should not be part of hasSameSpecialState(). It's not the job of this function to model any dependencies, be they memory dependencies or convergence.

This revision now requires changes to proceed.Apr 27 2023, 7:54 AM

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

foad updated this revision to Diff 517570.Apr 27 2023, 8:20 AM

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

foad added a comment.Apr 27 2023, 8:34 AM

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

Failing that I can move it up into SimpleValue's isEqualImpl in EarlyCSE.

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

I don't know. I guess that is an Attributor question? @arsenm @jdoerfert

nikic added a comment.Apr 27 2023, 8:54 AM

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

Not really. All of those functions are about purely structural comparison of instructions, they don't reason about effects. Moving the check to EarlyCSE would be fine.

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

I don't know. I guess that is an Attributor question? @arsenm @jdoerfert

Removing convergent is easy. It

llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block

foad updated this revision to Diff 517609.Apr 27 2023, 9:36 AM

Restrict the changes to EarlyCSE itself.

foad added inline comments.Apr 27 2023, 11:26 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

Surely it's correct? You could even CSE them in different blocks if there is no divergent control flow between them. When we have convergence tokens perhaps this will become trivial: you can CSE them iff they use the convergence token.

nikic resigned from this revision.Apr 27 2023, 12:07 PM

(My concern has been addressed and I can't comment on the change itself.)

ruiling added inline comments.
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block

Very good question. CSE convergent operations in the same block may still be helpful if there are several subgroup operations(like mbcnt(ballot)) in one block.

Regarding to whether it is correct to CSE convergent operation. I think into two cases:

  1. WWM operations with normal operations. As the inputs to the WWM operation are already set.inactiveed, so we will not do wrong CSE between WWM operation and normal operation.
  2. convergent operations before and after amdgcn.kill. A ballot before an amdgcn.kill obviously cannot be CSEed with another one after the kill.

So I think we should not try to CSE ANY convergent operation up till now.
Last time I think into kill, my idea is we need to add a new type of basic block terminator for it. But I have not go into the details of how to make it work with other parts of our compiler.

foad added inline comments.Apr 28 2023, 3:19 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

So I think we should not try to CSE ANY convergent operation up till now.

I would like to push the current patch because I think it is justified by the target-independent meaning of "convergent".

The problem with amdgcn.kill is very AMDGPU-specific, so I think ideally we should find an AMDGPU-specific way to fix it, instead of pessimizing target-independent code.

sameerds added inline comments.Apr 28 2023, 6:24 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

I kinda agree with that sentiment. Is it correct to assume that the kill intrinsic is itself experimental, and its semantics needs to be worked out for convergent operations? Shouldn't that happen before using the intrinsic in production?

nhaehnle added inline comments.Apr 28 2023, 6:31 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

I'd say the change should go ahead as-is. kill is AMDGPU-specific, and in any case, it looks like this change doesn't make the situation regarding kill worse than it is today.

Further, I agree with @ruiling that kill should conceptually be a terminator. We should consider just using it with callbr.

sameerds added inline comments.Apr 28 2023, 6:34 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

Convergence tokens are not sufficient by themselves for CSE. The set of threads at the use of a token can be a subset of the threads at the definition, if there is divergent control flow along the way. Then two uses of the same token can be reached by different sets of the original set, which is no different from the current situation. The same can happen across a kill intrinsic call.

arsenm accepted this revision.Apr 28 2023, 6:48 AM
arsenm added inline comments.
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

Yes, I’ve thought about only allowing callbr kills. It would fix this ugly edge case

This revision is now accepted and ready to land.Apr 28 2023, 6:48 AM
This revision was landed with ongoing or failed builds.Apr 28 2023, 6:57 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Apr 28 2023, 6:58 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

in any case, it looks like this change doesn't make the situation regarding kill worse than it is today.

Right, this patch is a step in the direction of @ruiling's suggestion of disallowing all CSE of convergent calls.

ruiling added inline comments.Apr 28 2023, 7:08 AM
llvm/lib/IR/Instruction.cpp
578 ↗(On Diff #517570)

Yes, I agree on submit the change, I was just going too much specific to AMDGPU specific stuff when thinking about possible issues. Generally speaking, this change is correct.