This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop. This is also a small code improvement.
Details
Diff Detail
Unit Tests
Event Timeline
I would recommend saying "irreducible cycle" instead of "unstructured loop". An unstructured loop could also be an unstructured acyclic region where the entry is the header and the exit is the latch.
The underlying problem seems need more time to fix. So I think it is ok to let the patch in as a workaround. And it seems make generated code shorter from the test check changes.
llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp | ||
---|---|---|
606 | I am not sure what kind of unstructured loops may come here. Maybe we need more test for this pass. But in the test case you provided, I think the loop is just a natural loop (with bb.1 as header and bb.3 as latch/existing). I felt the function name findLoop is a little bit confusing here. The reason it returns 0 for the test case is because there is no loop that does not contain PostDomBound (that is bb.3 in the test case). | |
llvm/test/CodeGen/AMDGPU/lower-i1-copies-implicit-def-unstructured-loop.mir | ||
7 | Yes, the situation sounds a little bit awkward to handle. In fact, we can easily get another failure case by just replacing %34 = IMPLICIT_DEF with %34:sreg_64 = V_CMP_EQ_U32_e64 %17:vgpr_32, 1, implicit $exec. Could you share with me an IR reproducer for this issue? That may help me better understand the problem. |
llvm/test/CodeGen/AMDGPU/lower-i1-copies-implicit-def-unstructured-loop.mir | ||
---|---|---|
7 | I started this patch month ago and seem to have misplaced the original reduced IR case. I could re-reduce a second case I saw |
Seems to cause 2 lit test failures on my Release+Asserts build:
LLVM :: CodeGen/AMDGPU/loop_break.ll LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll
I am not sure what kind of unstructured loops may come here. Maybe we need more test for this pass. But in the test case you provided, I think the loop is just a natural loop (with bb.1 as header and bb.3 as latch/existing). I felt the function name findLoop is a little bit confusing here. The reason it returns 0 for the test case is because there is no loop that does not contain PostDomBound (that is bb.3 in the test case).