This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed incorrect break from loop
ClosedPublic

Authored by tpr on Mar 2 2018, 2:43 PM.

Details

Summary

Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.

The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.

Diff Detail

Repository
rL LLVM

Event Timeline

tpr created this revision.Mar 2 2018, 2:43 PM
tpr added a reviewer: rampitec.Mar 2 2018, 2:46 PM
arsenm added inline comments.Mar 2 2018, 2:55 PM
lib/Target/AMDGPU/SILowerControlFlow.cpp
353 ↗(On Diff #136857)

Braces

354 ↗(On Diff #136857)

Can you add a testcase where this will fail if there is no defining instruction (e.g. the source value was a function argument)

356 ↗(On Diff #136857)

This doesn't seem like the right condition to me. This should probably be a predicate that checks the specific operand for being a result that is masked with exec? Theoretically this could be something exotic like a write lane.

This also could be = parentmatch && isFoo() rather than an if

362–363 ↗(On Diff #136857)

Identation of these is weird

rampitec added inline comments.Mar 2 2018, 3:48 PM
lib/Target/AMDGPU/SILowerControlFlow.cpp
354 ↗(On Diff #136857)

Can this be a partial def?

I am not sure why this new and instruction shall be needed. As far as I understand one of the arguments of SI_IF_BREAK shall be full copy of exec mask saved before the loop, isn't it?
If that is not so, shall we fix it instead?

tpr added a comment.Mar 3 2018, 12:23 AM

The arguments of SI_IF_BREAK appear to be the "I want to break this time" mask and the running "already exited loop" mask. Therefore we need to ensure that the first one of those is masked with exec before it is ORed into the second one. See the test case for an example of where it is needed.

lib/Target/AMDGPU/SILowerControlFlow.cpp
354 ↗(On Diff #136857)

I don't think it can be a partial def, as we know that it is something that was an i1 in the IR.

354 ↗(On Diff #136857)

I'll give the suggested test a try, but I suspect it won't like me passing an i1 as an arg.

356 ↗(On Diff #136857)

The operand is the break condition, which is something that was an i1 in the IR, but is now a pair of sgprs (including the case of vcc). If it's a valu, I don't think that can be anything other than an instruction with carry-out, i.e. v_cmp (or possibly the carry-out from v_add_u). I should probably add a comment to this effect.

As an optimization, it could maybe also spot other cases, principally an s_and_b64 with exec that's already there. But I don't think that is needed for this commit.

tpr updated this revision to Diff 136898.Mar 3 2018, 12:51 AM

V2: Addressed some review comments.

nhaehnle accepted this revision.Mar 5 2018, 3:08 AM

LGTM

This revision is now accepted and ready to land.Mar 5 2018, 3:08 AM
rampitec accepted this revision.Mar 5 2018, 9:56 AM

LGTM

Is there anything holding up this patch? I've run into this problem in another testcase

tpr added a comment.Apr 17 2018, 8:54 AM

Apologies for the delay. I found a higher than expected number of test cases where this adds an extra unnecessary "and" instruction. So I plan to revisit this.

Perhaps I should land it anyway, and address the unnecessary "and" instructions in a future patch.

This revision was automatically updated to reflect the committed changes.