This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/Scalar/
-
Transforms/
-
Scalar/
-
EarlyCSE.cpp
-
test/CodeGen/AMDGPU/
-
CodeGen/
-
AMDGPU/
1/1
cse-convergent.ll

Differential D149348

RFD: Do not CSE convergent calls in different basic blocks
ClosedPublic

Authored by foad on Apr 27 2023, 7:43 AM.

Download Raw Diff

Details

Reviewers

nhaehnle
sameerds
yaxunl
arsenm
nikic

Group Reviewers

Restricted Project

Commits

rG56af0e913ce7: [EarlyCSE] Do not CSE convergent calls in different basic blocks

Summary

"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

x = convergent_call();
if (cond) {
  y = convergent_call();
  use y;
}

then we get this:

x = convergent_call();
if (cond) {
  use x;
}

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision.Apr 27 2023, 7:43 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 7:43 AM

Herald added subscribers: kosarev, StephenFan, kerbowa and 2 others. · View Herald Transcript

foad requested review of this revision.Apr 27 2023, 7:43 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 7:43 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B228552: Diff 517552.Apr 27 2023, 7:44 AM

foad added a parent revision: D149349: [CSE] Precommit an AMDGPU test case for D149348.Apr 27 2023, 7:46 AM

foad added inline comments.Apr 27 2023, 7:51 AM

llvm/test/CodeGen/AMDGPU/cse-convergent.ll
37	This is the effect of the fix: we repeat the DPP subgroup operation over a reduced set of lanes, instead of reusing the result of the first DPP subgroup operation over all lanes.
llvm/test/Transforms/SimplifyCFG/convergent.ll
85 ↗	(On Diff #517552)	This is a completely accidental hoisting improvement due to https://reviews.llvm.org/D129370#inline-1442432. Convergent calls in the "then" and "else" branches are now treated as not identical, which weirdly allows more hoisting than when they were considered identical.

nikic requested changes to this revision.Apr 27 2023, 7:54 AM

nikic added a subscriber: nikic.

nikic added inline comments.

llvm/lib/IR/Instruction.cpp
529 ↗	(On Diff #517552)	This check should not be part of hasSameSpecialState(). It's not the job of this function to model any dependencies, be they memory dependencies or convergence.

This revision now requires changes to proceed.Apr 27 2023, 7:54 AM

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

Failing that I can move it up into SimpleValue's isEqualImpl in EarlyCSE.

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

I don't know. I guess that is an Attributor question? @arsenm @jdoerfert

In D149348#4302188, @foad wrote:

Move logic from Instruction::hasSameSpecialState to
Instruction::isIdenticalToWhenDefined. @nikic is that more acceptable?

Not really. All of those functions are about purely structural comparison of instructions, they don't reason about effects. Moving the check to EarlyCSE would be fine.

Harbormaster completed remote builds in B228566: Diff 517570.Apr 27 2023, 8:55 AM

In D149348#4302261, @foad wrote:

How confident are we in our ability to strip the convergent attribute off of functions that don't need it? Seems like this could cause performance regressions.

I don't know. I guess that is an Attributor question? @arsenm @jdoerfert

Removing convergent is easy. It

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block

Restrict the changes to EarlyCSE itself.

Harbormaster completed remote builds in B228587: Diff 517609.Apr 27 2023, 9:36 AM

foad added inline comments.Apr 27 2023, 11:26 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	Surely it's correct? You could even CSE them in different blocks if there is no divergent control flow between them. When we have convergence tokens perhaps this will become trivial: you can CSE them iff they use the convergence token.

(My concern has been addressed and I can't comment on the change itself.)

ruiling added a subscriber: ruiling.Apr 27 2023, 7:12 PM

ruiling added inline comments.

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	I’m questioning whether it’s worth or correct to allow CSE of convergent operations in the same block Very good question. CSE convergent operations in the same block may still be helpful if there are several subgroup operations(like mbcnt(ballot)) in one block. Regarding to whether it is correct to CSE convergent operation. I think into two cases: WWM operations with normal operations. As the inputs to the WWM operation are already `set.inactive`ed, so we will not do wrong CSE between WWM operation and normal operation. convergent operations before and after amdgcn.kill. A ballot before an amdgcn.kill obviously cannot be CSEed with another one after the kill. So I think we should not try to CSE ANY convergent operation up till now. Last time I think into kill, my idea is we need to add a new type of basic block terminator for it. But I have not go into the details of how to make it work with other parts of our compiler.

foad added inline comments.Apr 28 2023, 3:19 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	So I think we should not try to CSE ANY convergent operation up till now. I would like to push the current patch because I think it is justified by the target-independent meaning of "convergent". The problem with amdgcn.kill is very AMDGPU-specific, so I think ideally we should find an AMDGPU-specific way to fix it, instead of pessimizing target-independent code.

sameerds added inline comments.Apr 28 2023, 6:24 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	I kinda agree with that sentiment. Is it correct to assume that the kill intrinsic is itself experimental, and its semantics needs to be worked out for convergent operations? Shouldn't that happen before using the intrinsic in production?

nhaehnle added inline comments.Apr 28 2023, 6:31 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	I'd say the change should go ahead as-is. `kill` is AMDGPU-specific, and in any case, it looks like this change doesn't make the situation regarding `kill` worse than it is today. Further, I agree with @ruiling that `kill` should conceptually be a terminator. We should consider just using it with `callbr`.

sameerds added inline comments.Apr 28 2023, 6:34 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	Convergence tokens are not sufficient by themselves for CSE. The set of threads at the use of a token can be a subset of the threads at the definition, if there is divergent control flow along the way. Then two uses of the same token can be reached by different sets of the original set, which is no different from the current situation. The same can happen across a kill intrinsic call.

arsenm accepted this revision.Apr 28 2023, 6:48 AM

arsenm added inline comments.

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	Yes, I’ve thought about only allowing callbr kills. It would fix this ugly edge case

This revision is now accepted and ready to land.Apr 28 2023, 6:48 AM

Herald added a subscriber: wdng. · View Herald TranscriptApr 28 2023, 6:48 AM

This revision was landed with ongoing or failed builds.Apr 28 2023, 6:57 AM

Closed by commit rG56af0e913ce7: [EarlyCSE] Do not CSE convergent calls in different basic blocks (authored by foad). · Explain Why

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG56af0e913ce7: [EarlyCSE] Do not CSE convergent calls in different basic blocks.

foad mentioned this in rG5534d1d834c2: [CSE] Precommit an AMDGPU test case for D149348.

foad added inline comments.Apr 28 2023, 6:58 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	in any case, it looks like this change doesn't make the situation regarding kill worse than it is today. Right, this patch is a step in the direction of @ruiling's suggestion of disallowing all CSE of convergent calls.

ruiling added inline comments.Apr 28 2023, 7:08 AM

llvm/lib/IR/Instruction.cpp
578 ↗	(On Diff #517570)	Yes, I agree on submit the change, I was just going too much specific to AMDGPU specific stuff when thinking about possible issues. Generally speaking, this change is correct.

foad mentioned this in D153151: [EarlyCSE] Do not CSE convergent calls with memory effects.Jun 16 2023, 9:13 AM

foad mentioned this in rG9ff71814cb5d: [EarlyCSE] Do not CSE convergent calls with memory effects.Jul 14 2023, 3:48 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Scalar/

EarlyCSE.cpp

18 lines

test/

CodeGen/

AMDGPU/

cse-convergent.ll