At the moment, MachineCSE allows CSE-ing convergent instrs which are non-local to each other. This can cause illegal codegen, as convergent instrs are control-flow dependent. The patch prevents non-local CSE of convergent instrs by adding a check in isProfitableToCSE that rejects the CSE when the two convergent instrs are non-local. We can still CSE convergent instrs which are in the same control flow scope, so the patch purposely does not make all convergent instrs non-CSE candidates in isCSECandidate.
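To make the intended rule concrete, here is a minimal sketch of the decision, written as a toy model rather than the actual MachineCSE code. `ToyInstr`, `isSameExpression`, and `mayCSE` are hypothetical names introduced only for illustration, and "non-local" is modelled here as "in a different basic block", which is an assumption of the sketch.

```cpp
#include <cassert>

// Toy stand-in for illustration only; this is NOT an LLVM type.
struct ToyInstr {
  int Opcode;        // identifies the operation being computed
  int Operand;       // a single operand, enough to model "same expression"
  bool IsConvergent; // models the isConvergent flag on the instruction
  int ParentBlock;   // id of the basic block containing the instruction
};

// True when the two instructions compute the same expression.
inline bool isSameExpression(const ToyInstr &A, const ToyInstr &B) {
  return A.Opcode == B.Opcode && A.Operand == B.Operand;
}

// Decide whether MI may be replaced by the earlier, equivalent CSMI.
inline bool mayCSE(const ToyInstr &MI, const ToyInstr &CSMI) {
  if (!isSameExpression(MI, CSMI))
    return false;
  // Reject non-local CSE of convergent instrs (assumption: non-local means
  // the two instructions live in different blocks).
  if (MI.IsConvergent && MI.ParentBlock != CSMI.ParentBlock)
    return false;
  // Local convergent instrs and all non-convergent instrs stay candidates.
  return true;
}
```

Note that convergent instrs are not rejected outright: two identical convergent instrs in the same block still pass, which is why the check is not placed in isCSECandidate.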
The change LGTM, but could you add a test case? There probably aren't many convergent instructions upstream, but it should be possible to make a test case using AMDGPU instructions or via G_INTRINSIC.
Update: added handwritten MIR unit test for the MachineCSE change using AMDGPU's DS_SWIZZLE_B32 instr (which is marked isConvergent in llvm/lib/Target/AMDGPU/DSInstructions.td)
CHECK-LABEL is about partitioning the input into multiple pieces that can be checked independently rather than about labels. LGTM with this and the bb.2 one below as either CHECK/CHECK-NEXT
It's been pointed out to me off-list that CSE'ing to here isn't actually banned by isConvergent; it's just one of the cases we conservatively decline to CSE in the change. To be covered by isConvergent, it'd have to be CSE'd into a more/differently predicated block (less is ok). Furthermore, the other cases where we wouldn't be conservative are already prevented by other checks in CSE. If we can find the field we actually mean, this patch will only need a small change. I haven't been able to find it though; it doesn't seem to exist in the backend, and that's probably what's gotten me confused (I don't think this is the first time either :-))
That actually reminded me of something else to double check: Does this CSE without the change too?
The definition of convergent is pretty broken. For AMDGPU, in the MIR, the control flow as represented by basic blocks no longer expresses the lane-level CFG, which is what we're concerned with for convergent ops. In the future, when we have convergence tokens, it's not clear to me whether we'll somehow preserve those through codegen. It's best to just not CSE convergent operations. It's really unlikely it would be worthwhile, if it's even legal.
Maybe we should at least put a comment saying just about that, Matt, on the change made in MachineCSE? Otherwise I'm afraid it's way too easy to remove the check and be technically right about it. Thoughts?
A green light from AMDGPU for this patch though is very helpful, thank you.
If this is a correctness issue then surely it should not be done inside "is *profitable* to cse"?
I'd be ok with that (with an explanatory comment). I could believe that any that were legal and worthwhile probably already happened at the LLVM-IR level.
Just to clarify, I don't mean to prevent cases within the same BB there. Those happen and are legal and worthwhile
My suggestion is to keep making progress here:
- move the check out of isProfitableToCSE to the top level of ProcessBlockCSE
- put a comprehensive comment on it outlining the issues discussed here (and off Phabricator) so far
- do (2) in the test as well (and keep the test otherwise as is)
The issues include:
a) isConvergent, as currently defined in LLVM, does not prove cross-block MachineCSE illegal. With this change, however, the MachineCSE pass takes the liberty of extending the definition of isConvergent as a practical necessity. The extension is: "assume it is illegal to make a convergent operation dependent not only on additional conditions, but also on fewer conditions than originally"
b) The current open-source GPU backends do not appear to allow a reasonably simple test case that provably and undeniably breaks functionally without the proposed MachineCSE change. As a result, the test being added is merely a coverage test for the change, not a reproducer of an actual (execution) problem in the AMDGPU backend.
And we merge it from there. This is a conditional LGTM from me, conditions are above. Thanks!
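For reference, the difference between the current definition and the extension in (a) can be sketched with a deliberately simplified model in which each block is guarded by a set of conditions. `ToyConditions`, `isSubsetOf`, `legalUnderCurrentDefinition`, and `legalUnderExtendedDefinition` are hypothetical names, and treating predication as a plain set is an assumption of the sketch; it does not capture the lane-level CFG concern raised above for AMDGPU.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical model: the set of branch conditions guarding a block.
using ToyConditions = std::set<std::string>;

// True when every condition in Sub also appears in Super.
inline bool isSubsetOf(const ToyConditions &Sub, const ToyConditions &Super) {
  // std::set iterates in sorted order, as std::includes requires.
  return std::includes(Super.begin(), Super.end(), Sub.begin(), Sub.end());
}

// Current definition: a convergent op must not become dependent on
// additional conditions, so the destination may only drop conditions.
inline bool legalUnderCurrentDefinition(const ToyConditions &From,
                                        const ToyConditions &To) {
  return isSubsetOf(To, From);
}

// Extension described in (a): fewer conditions are rejected as well, so
// the guarding conditions must be unchanged.
inline bool legalUnderExtendedDefinition(const ToyConditions &From,
                                         const ToyConditions &To) {
  return From == To;
}
```

Under this model, the only moves that the extension rejects and the current definition accepts are those into a less predicated block, which is the conservative case discussed earlier in the thread.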
Following Roman's suggestions, the update:
- Moves the code preventing CSE of isConvergent instrs from isProfitableToCSE into ProcessBlockCSE
- Adds comments in MachineCSE and the test explaining why isConvergent is checked to prevent CSE
- Adds a comment in the test explaining that the test does not reproduce an AMDGPU backend bug, but rather is a coverage test for the MachineCSE change
Thanks for all the feedback!
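As a rough illustration of the restructuring, the sketch below shows a per-block loop in which the convergent check sits at the top level, ahead of and separate from the profitability heuristic. It is a toy model, not the code in MachineCSE: `ToyInstr`, `ToyValueTable`, `toyIsProfitable`, and `toyProcessBlock` are hypothetical names, the flat map stands in for the pass's scoped value table, and re-recording a rejected instruction so that later local duplicates can match it is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for illustration only; this is NOT an LLVM type.
struct ToyInstr {
  int Opcode;
  int Operand;
  bool IsConvergent;
  int ParentBlock;
};

using ToyKey = std::pair<int, int>;             // (opcode, operand)
using ToyValueTable = std::map<ToyKey, ToyInstr>;

// Placeholder heuristic (assumption: everything is profitable here).
inline bool toyIsProfitable(const ToyInstr &, const ToyInstr &) {
  return true;
}

// Returns how many instructions in Block would be eliminated by CSE.
inline int toyProcessBlock(const std::vector<ToyInstr> &Block,
                           ToyValueTable &Seen) {
  int NumCSEd = 0;
  for (const ToyInstr &MI : Block) {
    ToyKey Key{MI.Opcode, MI.Operand};
    auto It = Seen.find(Key);
    if (It == Seen.end()) {
      Seen.emplace(Key, MI); // first occurrence of this expression
      continue;
    }
    const ToyInstr &CSMI = It->second;
    // Correctness check, kept apart from the profitability heuristic.
    if (MI.IsConvergent && MI.ParentBlock != CSMI.ParentBlock) {
      It->second = MI; // later duplicates in this block can match MI
      continue;
    }
    if (!toyIsProfitable(MI, CSMI))
      continue;
    ++NumCSEd;
  }
  return NumCSEd;
}
```

With this ordering, changing or removing the heuristic cannot re-enable cross-block CSE of convergent instrs, which is the concern behind moving the check.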
LGTM with the check in ProcessBlockPRE
I think it's needed there too
@mkitzan IIUC (which might not be the case), PRE not checking for isConvergent is a genuine bug, unlike the CSE part: PRE moves ops into predicated blocks, making them more predicated than before, which is illegal for isConvergent.
If that's the case, perhaps for PRE the isConvergent check could be part of isPRECandidate.
- Added isConvergent check in ProcessBlockPRE
Note: @rtereshin and I talked off-list about whether PRE not checking for isConvergent is a bug, and it was determined that for MachineCSE's implementation of PRE it is not a bug.
Forgot to link the differential before pushing, but the latest update is in a11489ae3e36063c64921439cbab89d1f3280f4a