
[MachineCSE] Prevent CSE of non-local convergent instrs
ClosedPublic

Authored by mkitzan on Apr 23 2021, 11:02 AM.

Details

Summary

At the moment, MachineCSE allows CSE-ing convergent instrs which are non-local to each other. This can cause illegal codegen because convergent instrs are control-flow dependent. The patch prevents non-local CSE of convergent instrs by adding a check in isProfitableToCSE that rejects the CSE when the instrs under consideration are convergent and non-local. Convergent instrs in the same control-flow scope can still be CSE'd, so the patch purposely does not make all convergent instrs non-candidates in isCSECandidate.
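
The rule in the summary can be modelled in a few lines. This is only a sketch, not the code from the patch: `Instr`, `blockId` and `rejectNonLocalConvergentCSE` are hypothetical stand-ins rather than LLVM types or functions, and the sketch assumes that "non-local" means the two instrs sit in different basic blocks.

```cpp
// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Instr {
  int blockId;      // identifies the basic block that contains the instr
  bool convergent;  // models the isConvergent property
};

// Returns true when CSE of `candidate` against the earlier, identical
// `existing` instr should be rejected under the rule in the summary:
// convergent instrs are only CSE'd within the same block (assumption).
bool rejectNonLocalConvergentCSE(const Instr &candidate, const Instr &existing) {
  return candidate.convergent && candidate.blockId != existing.blockId;
}
```

Under this model, a convergent instr in the same block as the existing one remains a CSE candidate, a convergent instr in another block is rejected, and non-convergent instrs are unaffected.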


Event Timeline

mkitzan created this revision. Apr 23 2021, 11:02 AM
mkitzan requested review of this revision. Apr 23 2021, 11:02 AM
Herald added a project: Restricted Project. Apr 23 2021, 11:02 AM

The change LGTM, but could you add a test case? There probably aren't many convergent instructions upstream, but it should be possible to make a test case using AMDGPU instructions or via G_INTRINSIC.

mkitzan updated this revision to Diff 340176. Apr 23 2021, 3:09 PM

Update: added handwritten MIR unit test for the MachineCSE change using AMDGPU's DS_SWIZZLE_B32 instr (which is marked isConvergent in llvm/lib/Target/AMDGPU/DSInstructions.td)

dsanders accepted this revision. Apr 23 2021, 4:15 PM
dsanders added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
9 ↗(On Diff #340176)

CHECK-LABEL is about partitioning the input into multiple pieces that can be checked independently, rather than about labels. LGTM with this and the bb.2 one below changed to either CHECK or CHECK-NEXT.

This revision is now accepted and ready to land. Apr 23 2021, 4:15 PM

Ah ok, good to know. Thanks for the review! Changing them to CHECK.

mkitzan updated this revision to Diff 340193. Apr 23 2021, 4:22 PM

Update: changed basic block checks from CHECK-LABEL to CHECK

dsanders requested changes to this revision. Apr 23 2021, 6:17 PM
dsanders added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗(On Diff #340193)

It's been pointed out to me off-list that CSE'ing to here isn't actually banned by isConvergent; it's just one of the cases we conservatively decline to CSE in the change. To be covered by isConvergent it'd have to be CSE'd into a more/differently predicated block (less is ok). Furthermore, the other cases where we wouldn't be conservative are already prevented by other checks in CSE. If we can find the field we actually mean, this patch will only need a small change. I haven't been able to find it though; it doesn't seem to exist in the backend, and that's probably what's gotten me confused (I don't think this is the first time either :-))

That actually reminded me of something else to double check: Does this CSE without the change too?

This revision now requires changes to proceed. Apr 23 2021, 6:17 PM
arsenm added a subscriber: arsenm. Apr 23 2021, 6:22 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗(On Diff #340193)

The definition of convergent is pretty broken. For AMDGPU, the control flow as represented by basic blocks in MIR no longer expresses the lane-level CFG, which is what we're concerned with for convergent ops. In the future, when we have convergence tokens, it's not clear to me whether we'll somehow preserve those through codegen. It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile if it's even legal.

rtereshin added inline comments. Apr 23 2021, 6:59 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗(On Diff #340193)

Maybe we should at least put a comment saying just that, Matt, on the change made in MachineCSE? Otherwise I'm afraid it's way too easy to remove the check and be technically right about it. Thoughts?

A green light from AMDGPU for this patch though is very helpful, thank you.

lkail added a subscriber: lkail. Apr 24 2021, 1:17 AM
foad added a subscriber: foad. Apr 26 2021, 1:40 AM
foad added inline comments.
llvm/lib/CodeGen/MachineCSE.cpp
437

If this is a correctness issue then surely it should not be done inside "is *profitable* to cse"?

dsanders added inline comments. Apr 26 2021, 2:58 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗(On Diff #340193)

> It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile if it's even legal

I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR.

dsanders added inline comments. Apr 26 2021, 3:01 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/no-cse-nonlocal-convergent-instrs.mir
53 ↗(On Diff #340193)

> I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened during LLVM-IR.

Just to clarify, I don't mean to prevent cases within the same BB there. Those happen and are legal and worthwhile

rtereshin accepted this revision. Apr 27 2021, 1:45 PM

My suggestion is to keep making progress here:

  1. move the check out of isProfitableToCSE to the top level of the block-processing function
  2. put a comprehensive comment on it outlining the issues discussed so far here (and off Phabricator)
  3. do (2) in the test as well (and keep the test otherwise as is)

The issues include:
a) isConvergent, as currently defined in LLVM, does not prove cross-block MachineCSE illegal. With the change, however, the MachineCSE pass takes the liberty of extending the definition of isConvergent as a practical necessity. The extension is: "assume it is illegal to make a convergent operation dependent not only on additional conditions, but also on fewer conditions than originally"
b) The current open-source GPU backends, as they stand, do not appear to allow a reasonably simple test case that provably and undeniably breaks functionally without the proposed MachineCSE change. As a result, the test being added is merely a coverage test for the change being made, not a reproducer of an actual (execution) problem in the AMDGPU backend.
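
The difference between the current definition and the extension in (a) can be illustrated with a toy model. This is a sketch under stated assumptions: control dependence is represented as a plain set of condition names, which is not how LLVM IR or MIR represents it, and `Conditions`, `legalUnderCurrentDefinition` and `legalUnderExtendedDefinition` are hypothetical names invented for the illustration.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Toy model: the set of conditions an operation is control-dependent on.
// Real LLVM IR/MIR does not store control dependence this way.
using Conditions = std::set<std::string>;

// Current definition as described in this thread: a convergent operation may
// end up dependent on fewer conditions, but not on additional or different ones.
bool legalUnderCurrentDefinition(const Conditions &before, const Conditions &after) {
  // std::includes is true when every element of `after` is also in `before`.
  return std::includes(before.begin(), before.end(), after.begin(), after.end());
}

// Extension from (a): any change in the set of conditions is treated as illegal.
bool legalUnderExtendedDefinition(const Conditions &before, const Conditions &after) {
  return before == after;
}
```

In this model, going from conditions {a, b} to {a} is legal under the current definition but rejected under the extension, which is exactly the set of cases the change declines conservatively.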

And we merge it from there. This is a conditional LGTM from me, conditions are above. Thanks!

mkitzan updated this revision to Diff 341316. Edited Apr 28 2021, 2:39 PM

Following Roman's suggestions, the update:

  • Moves the code preventing CSE of isConvergent instrs from isProfitableToCSE into ProcessBlockCSE
  • Adds comments in MachineCSE and the test explaining why isConvergent is checked to prevent CSE
  • Adds a comment in the test explaining that the test is not reproducing an AMDGPU backend bug, but rather is a coverage test for the MachineCSE change
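
The point of moving the check can be sketched as an ordering of two tests. This is a hypothetical model, not the code in MachineCSE.cpp: `Candidate`, `Decision` and `decide` are invented names, and the `profitable` flag merely stands in for whatever a profitability heuristic would return.

```cpp
// Hypothetical summary of one CSE opportunity; not an LLVM type.
struct Candidate {
  bool convergent;  // the instr is marked convergent
  bool sameBlock;   // the matching earlier instr is in the same block
  bool profitable;  // stands in for the result of a profitability heuristic
};

enum class Decision { Eliminate, KeepConvergent, KeepUnprofitable };

// The convergence guard runs before the profitability test, so changing the
// heuristic cannot re-enable cross-block CSE of convergent instrs.
Decision decide(const Candidate &c) {
  if (c.convergent && !c.sameBlock)
    return Decision::KeepConvergent;
  if (!c.profitable)
    return Decision::KeepUnprofitable;
  return Decision::Eliminate;
}
```

Keeping the guard separate from the heuristic also addresses the earlier objection that a correctness-related check does not belong inside a function named for profitability.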

Thanks for all the feedback!

foad added inline comments. Apr 29 2021, 2:40 AM
llvm/lib/CodeGen/MachineCSE.cpp
604

Do we also need this check in ProcessBlockPRE?

dsanders accepted this revision. Apr 29 2021, 9:57 AM

LGTM with the check in ProcessBlockPRE

llvm/lib/CodeGen/MachineCSE.cpp
604

I think it's needed there too

This revision is now accepted and ready to land. Apr 29 2021, 9:57 AM
rtereshin added inline comments. Apr 29 2021, 2:03 PM
llvm/lib/CodeGen/MachineCSE.cpp
604

@mkitzan IIUC (which might not be the case), PRE not checking for isConvergent is a genuine bug, unlike the CSE part: PRE moves ops into predicated blocks, making them more predicated than before, which is illegal for isConvergent.

If that's the case, perhaps for PRE the isConvergent check could be part of isPRECandidate.

mkitzan updated this revision to Diff 342509. May 3 2021, 12:39 PM

Update:

  • Added isConvergent check in ProcessBlockPRE

Note: @rtereshin and I talked off-list about whether PRE not checking for isConvergent is a bug, and it was determined that for MachineCSE's implementation of PRE it is not a bug.

mkitzan closed this revision. May 5 2021, 2:32 PM

Forgot to link the differential before pushing, but latest update is in a11489ae3e36063c64921439cbab89d1f3280f4a