This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
ClosedPublic

Authored by arsenm on Jul 21 2021, 2:29 PM.

Details

Summary

This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop. This is also a small code improvement.

Diff Detail

Event Timeline

arsenm created this revision.Jul 21 2021, 2:29 PM
arsenm requested review of this revision.Jul 21 2021, 2:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 2:29 PM
Herald added a subscriber: wdng. · View Herald Transcript
sameerds accepted this revision.Jul 22 2021, 8:08 PM

I would recommend saying "irreducible cycle" instead of "unstructured loop". An unstructured loop could also be an unstructured acyclic region where the entry is the header and the exit is the latch.

This revision is now accepted and ready to land.Jul 22 2021, 8:08 PM

The underlying problem seems need more time to fix. So I think it is ok to let the patch in as a workaround. And it seems make generated code shorter from the test check changes.

llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
606

I am not sure what kind of unstructured loops may come here. Maybe we need more test for this pass. But in the test case you provided, I think the loop is just a natural loop (with bb.1 as header and bb.3 as latch/existing). I felt the function name findLoop is a little bit confusing here. The reason it returns 0 for the test case is because there is no loop that does not contain PostDomBound (that is bb.3 in the test case).

llvm/test/CodeGen/AMDGPU/lower-i1-copies-implicit-def-unstructured-loop.mir
7

Yes, the situation sounds a little bit awkward to handle. In fact, we can easily get another failure case by just replacing %34 = IMPLICIT_DEF with %34:sreg_64 = V_CMP_EQ_U32_e64 %17:vgpr_32, 1, implicit $exec. Could you share with me an IR reproducer for this issue? That may help me better understand the problem.

arsenm closed this revision.Jul 27 2021, 8:45 AM
llvm/test/CodeGen/AMDGPU/lower-i1-copies-implicit-def-unstructured-loop.mir
7

I started this patch month ago and seem to have misplaced the original reduced IR case. I could re-reduce a second case I saw

foad added a comment.Jul 27 2021, 9:11 AM

Seems to cause 2 lit test failures on my Release+Asserts build:

LLVM :: CodeGen/AMDGPU/loop_break.ll
LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll

Seems to cause 2 lit test failures on my Release+Asserts build:

LLVM :: CodeGen/AMDGPU/loop_break.ll
LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll

9b1bcaea4e0e32636e13e767ecee4de398ce7bd2, I forgot to squash this