
Details

Reviewers

rtereshin
dsanders

Summary

At the moment, MachineCSE allows CSE-ing convergent instrs which are non-local to each other. This can cause illegal codegen as convergent instrs are control flow dependent. The patch prevents non-local CSE of convergent instrs by adding a check in isProfitableToCSE and rejecting CSE-ing if we're considering CSE-ing non-local convergent instrs. We can still CSE convergent instrs which are in the same control flow scope, so the patch purposely does not make all convergent instrs non-CSE candidates in isCSECandidate.
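The rule described in the summary can be sketched with a toy model. This is not the code from the patch: `ToyInstr`, `mayCSE`, and `demoMayCSE` are hypothetical names, and the `Convergent` flag merely stands in for what `isConvergent` reports on a real machine instruction. The sketch assumes two instructions are "equivalent" when their `Expr` ids match.

```cpp
#include <cassert>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr).
struct ToyInstr {
  int Expr;        // id of the expression the instruction computes
  int Block;       // id of the basic block that contains it
  bool Convergent; // models the isConvergent property
};

// Returns true if MI may be replaced by the earlier, equivalent instruction CSMI.
bool mayCSE(const ToyInstr &MI, const ToyInstr &CSMI) {
  if (MI.Expr != CSMI.Expr)
    return false; // not a common subexpression at all
  // The rule from the summary: a convergent instruction is only CSE'd
  // against an instruction in the same basic block.
  if (MI.Convergent && MI.Block != CSMI.Block)
    return false;
  return true;
}

// Small usage example exercising the three interesting cases.
void demoMayCSE() {
  ToyInstr ConvA{1, 0, true}, ConvSameBB{1, 0, true}, ConvOtherBB{1, 1, true};
  ToyInstr PlainA{1, 0, false}, PlainOtherBB{1, 1, false};
  assert(mayCSE(ConvSameBB, ConvA));    // local convergent CSE is still allowed
  assert(!mayCSE(ConvOtherBB, ConvA));  // non-local convergent CSE is rejected
  assert(mayCSE(PlainOtherBB, PlainA)); // non-convergent instrs are unaffected
}
```

Keeping the restriction on the pair of instructions, rather than on the candidate alone, is what lets same-block convergent instrs remain CSE candidates.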

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkitzan created this revision. Apr 23 2021, 11:02 AM

Herald added a subscriber: hiraditya. Apr 23 2021, 11:02 AM

mkitzan requested review of this revision. Apr 23 2021, 11:02 AM

Herald added a project: Restricted Project. Apr 23 2021, 11:02 AM

Herald added a subscriber: llvm-commits.

The change LGTM, but could you add a test case? There probably aren't many convergent instructions upstream, but it should be possible to make a test case using AMDGPU instructions or via G_INTRINSIC.

Harbormaster completed remote builds in B100630: Diff 340107. Apr 23 2021, 1:39 PM

Update: added handwritten MIR unit test for the MachineCSE change using AMDGPU's DS_SWIZZLE_B32 instr (which is marked isConvergent in llvm/lib/Target/AMDGPU/DSInstructions.td).

Herald added subscribers: kerbowa, nhaehnle, jvesely. Apr 23 2021, 3:09 PM

dsanders accepted this revision. Apr 23 2021, 4:15 PM

dsanders added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
9 ↗	(On Diff #340176)	CHECK-LABEL is about partitioning the input into multiple pieces that can be checked independently rather than about labels. LGTM with this and the bb.2 one below as either CHECK/CHECK-NEXT

This revision is now accepted and ready to land. Apr 23 2021, 4:15 PM

Ah ok, good to know. Thanks for the review! Changing them to CHECK.

Update: changed basic block checks from CHECK-LABEL to CHECK

mkitzan mentioned this in rG59f2dd5f1acd: [MachineCSE] Prevent CSE of non-local convergent instrs. Apr 23 2021, 4:53 PM

Harbormaster completed remote builds in B100679: Diff 340176. Apr 23 2021, 5:25 PM

dsanders requested changes to this revision. Apr 23 2021, 6:17 PM

dsanders added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗	(On Diff #340193)	It's been pointed out to me off-list that CSE'ing to here isn't actually banned by isConvergent; it's just one of the cases we conservatively decline to CSE in the change. To be covered by isConvergent, it would have to be CSE'd into a more or differently predicated block (less is OK). Furthermore, the other cases where we wouldn't be conservative are already prevented by other checks in CSE. If we can find the field we actually mean, this patch will only need a small change. I haven't been able to find it, though; it doesn't seem to exist in the backend, and that's probably what's gotten me confused (I don't think this is the first time either :-)). That actually reminded me of something else to double-check: does this CSE without the change too?

This revision now requires changes to proceed. Apr 23 2021, 6:17 PM

arsenm added a subscriber: arsenm. Apr 23 2021, 6:22 PM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗	(On Diff #340193)	The definition of convergent is pretty broken. For AMDGPU, in the MIR the control flow as represented by basic blocks no longer expresses the lane-level CFG, which is what we're concerned with for convergent ops. In the future, when we have convergence tokens, it's not clear to me whether we'll somehow preserve those through codegen. It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile, if it's even legal.

Harbormaster completed remote builds in B100696: Diff 340193. Apr 23 2021, 6:44 PM

rtereshin added inline comments. Apr 23 2021, 6:59 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗	(On Diff #340193)	Maybe we should at least put a comment saying just that, Matt, on the change made in MachineCSE? Otherwise I'm afraid it's way too easy to remove the check and be technically right about it. Thoughts? A green light from AMDGPU for this patch is very helpful, though. Thank you.

lkail added a subscriber: lkail. Apr 24 2021, 1:17 AM

foad added a subscriber: foad. Apr 26 2021, 1:40 AM

foad added inline comments.

llvm/lib/CodeGen/MachineCSE.cpp
437	If this is a correctness issue then surely it should not be done inside "is profitable to cse"?

dsanders added inline comments. Apr 26 2021, 2:58 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗	(On Diff #340193)	"It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile if it's even legal" I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR.

dsanders added inline comments. Apr 26 2021, 3:01 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗	(On Diff #340193)	"I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR." Just to clarify, I don't mean to prevent cases within the same BB there. Those happen and are legal and worthwhile.

My suggestion is to keep making progress here:

1. Move the check out of isProfitableToCSE to the top level of ProcessBlockCSE.
2. Put a comprehensive comment on it outlining the issues discussed here (and off Phabricator) so far.
3. Do (2) in the test as well (and keep the test otherwise as is).

The issues include:
a) isConvergent, as currently defined in LLVM, does not prove cross-block MachineCSE illegal. With this change, however, the MachineCSE pass takes the liberty of extending the definition of isConvergent as a practical necessity. The extension is: "assume it is illegal to make a convergent operation dependent not only on additional conditions, but also on fewer conditions than originally".
b) The current open-source GPU backends do not appear to allow a reasonably simple test case that provably and undeniably breaks functionally without the proposed MachineCSE change. As a result, the test being added is merely a coverage test for the change, not a reproducer of an actual (execution) problem in the AMDGPU backend.

And we merge it from there. This is a conditional LGTM from me; the conditions are above. Thanks!
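The difference between the existing definition and the extension in issue (a) can be illustrated with a small sketch. It assumes, purely for illustration, that the control dependence of an operation can be modelled as a set of named conditions; `Conditions`, `legalPerDefinition`, `legalPerExtension`, and `demoLegality` are hypothetical names and do not exist in LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Toy model: the conditions an operation's execution depends on.
using Conditions = std::set<std::string>;

// Existing rule as described in the thread: a convergent operation must not
// become dependent on additional conditions, so the new set has to be a
// subset of the old one (fewer conditions is fine).
bool legalPerDefinition(const Conditions &Before, const Conditions &After) {
  return std::includes(Before.begin(), Before.end(),
                       After.begin(), After.end());
}

// Extension taken by the change: fewer conditions is treated as illegal
// too, so the set must stay exactly the same.
bool legalPerExtension(const Conditions &Before, const Conditions &After) {
  return Before == After;
}

// Small usage example covering more, fewer, and unchanged conditions.
void demoLegality() {
  Conditions Orig{"c1", "c2"};
  Conditions Fewer{"c1"};
  Conditions More{"c1", "c2", "c3"};
  assert(legalPerDefinition(Orig, Fewer));  // allowed by the definition
  assert(!legalPerExtension(Orig, Fewer));  // declined under the extension
  assert(!legalPerDefinition(Orig, More));  // illegal under both rules
  assert(!legalPerExtension(Orig, More));
  assert(legalPerExtension(Orig, Orig));    // same scope stays legal
}
```

In this model, CSE'ing an instruction against an equivalent one in a dominating block corresponds to the "fewer conditions" case, which is the one the two rules disagree on.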

Following Roman's suggestions, the update:

Moves the code preventing CSE of isConvergent instrs from isProfitableToCSE into ProcessBlockCSE
Adds comments in MachineCSE and the test explaining why isConvergent is checked to prevent CSE
Adds a comment in the test explaining that the test does not reproduce an AMDGPU backend bug, but rather is a coverage test for the MachineCSE change
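The shape of the update, with the convergence restriction applied in the block-processing loop and kept separate from the profitability heuristic, can be sketched roughly as below. This is a toy model under stated assumptions, not the committed code: `ToyInstr`, `isProfitable`, `processInstrs`, and `demoProcessInstrs` are hypothetical, the instruction list is assumed to be in an order where every earlier instruction dominates every later one, and the profitability check is a placeholder that always agrees.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical).
struct ToyInstr {
  int Expr;        // id of the computed expression
  int Block;       // id of the enclosing basic block
  bool Convergent; // models the isConvergent property
};

// Placeholder cost heuristic; the real pass uses register-pressure style
// reasoning, this toy version always says yes.
bool isProfitable(const ToyInstr &, const ToyInstr &) { return true; }

// Returns the indices of the instructions the toy pass would eliminate.
std::vector<std::size_t> processInstrs(const std::vector<ToyInstr> &Instrs) {
  std::unordered_map<int, std::size_t> Available; // Expr -> index of available def
  std::vector<std::size_t> Eliminated;
  for (std::size_t I = 0; I < Instrs.size(); ++I) {
    const ToyInstr &MI = Instrs[I];
    auto It = Available.find(MI.Expr);
    if (It == Available.end()) {
      Available.emplace(MI.Expr, I);
      continue;
    }
    const ToyInstr &CSMI = Instrs[It->second];
    bool DoCSE = true;
    // Correctness-motivated restriction, checked before and independently
    // of the profitability question.
    if (MI.Convergent && MI.Block != CSMI.Block)
      DoCSE = false;
    if (DoCSE && !isProfitable(CSMI, MI))
      DoCSE = false;
    if (DoCSE)
      Eliminated.push_back(I);
    else
      It->second = I; // MI stays and becomes the available definition
  }
  return Eliminated;
}

// Usage example: four equivalent convergent instrs, two per block.
void demoProcessInstrs() {
  std::vector<ToyInstr> Instrs{
      {7, 0, true}, {7, 0, true}, {7, 1, true}, {7, 1, true}};
  std::vector<std::size_t> Expected{1, 3};
  assert(processInstrs(Instrs) == Expected);
}
```

Only the second instruction in each block is eliminated; the first instruction of block 1 is kept even though an equivalent one exists in block 0.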

Thanks for all the feedback!

Harbormaster completed remote builds in B101502: Diff 341316. Apr 28 2021, 4:38 PM

foad added inline comments. Apr 29 2021, 2:40 AM

llvm/lib/CodeGen/MachineCSE.cpp
604	Do we also need this check in ProcessBlockPRE?

LGTM with the check in ProcessBlockPRE

llvm/lib/CodeGen/MachineCSE.cpp
604	I think it's needed there too

This revision is now accepted and ready to land. Apr 29 2021, 9:57 AM

rtereshin added inline comments. Apr 29 2021, 2:03 PM

llvm/lib/CodeGen/MachineCSE.cpp
604	@mkitzan IIUC (which might not be the case), PRE not checking for isConvergent is a genuine bug, unlike the CSE part: PRE moves ops into predicated blocks, making them more predicated than before, which is illegal for isConvergent. If that's the case, perhaps for PRE the `isConvergent` check could be part of `isPRECandidate`.

Update:

Added isConvergent check in ProcessBlockPRE

Note: @rtereshin and I talked off-list about whether PRE not checking for isConvergent is a bug, and it was determined that for MachineCSE's implementation of PRE it is not a bug.
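A check of this kind on the PRE side might look roughly like the sketch below. It is a toy model and an assumption about the general shape only, not the code that was committed: it assumes PRE would place a copy of the instruction in the nearest common dominator of the two blocks holding the equivalent instructions, and it represents the dominator tree as an immediate-dominator array in which the entry block is its own parent. `ToyInstr`, `nearestCommonDominator`, `mayHoistForPRE`, and `demoPRE` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical).
struct ToyInstr {
  int Expr;        // id of the computed expression
  int Block;       // id of the enclosing basic block
  bool Convergent; // models the isConvergent property
};

// IDom[N] is the immediate dominator of block N; the entry block is its own
// immediate dominator. Both blocks are assumed to be in the same tree.
int nearestCommonDominator(const std::vector<int> &IDom, int A, int B) {
  std::set<int> Ancestors; // A and every block that dominates it
  for (int N = A;; N = IDom[N]) {
    Ancestors.insert(N);
    if (IDom[N] == N)
      break;
  }
  int N = B;
  while (!Ancestors.count(N))
    N = IDom[N];
  return N;
}

// Returns true if the toy PRE may place a copy of MI in the nearest common
// dominator of MI's block and OtherBlock (which holds an equivalent instr).
bool mayHoistForPRE(const std::vector<int> &IDom, const ToyInstr &MI,
                    int OtherBlock) {
  int CMBB = nearestCommonDominator(IDom, MI.Block, OtherBlock);
  // Convergent instrs are not moved out of their own block.
  if (MI.Convergent && CMBB != MI.Block)
    return false;
  return true;
}

// Usage example on a diamond CFG: 0 -> {1, 2} -> 3.
void demoPRE() {
  std::vector<int> IDom{0, 0, 0, 0};
  assert(nearestCommonDominator(IDom, 1, 2) == 0);
  assert(!mayHoistForPRE(IDom, ToyInstr{4, 1, true}, 2)); // would leave block 1
  assert(mayHoistForPRE(IDom, ToyInstr{4, 1, false}, 2)); // non-convergent is fine
  assert(mayHoistForPRE(IDom, ToyInstr{4, 0, true}, 1));  // already in the dominator
}
```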

Harbormaster completed remote builds in B102365: Diff 342509. May 3 2021, 3:48 PM

LGTM

Forgot to link the differential before pushing, but latest update is in a11489ae3e36063c64921439cbab89d1f3280f4a

This is an archive of the discontinued LLVM Phabricator instance.

[MachineCSE] Prevent CSE of non-local convergent instrs
ClosedPublic


Revision Contents

Diff 340107

llvm/lib/CodeGen/MachineCSE.cpp
