This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed incorrect uniform branch condition
ClosedPublic

Authored by tpr on Dec 15 2017, 8:02 AM.

Details

Summary

I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.

There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have moved that code into SILowerControlFlow so it
also handles the case that the branch is instruction selected as
s_cbranch_vccnz.

I have also added some code to analyze whether the s_and_b64 needs to be
inserted at all. If vcc was defined by a v_cmp, or a combination via
s_and/s_or of a number of v_cmp instructions, then it is not needed at
all. The s_and_b64 is now not inserted in these cases.

Several tests had such an unnecessary s_and_b64, so needed modifying to
cope with its disappearance. The indirect-addressing-si.ll test also
seemed to be affected by some kind of loop transformation, so the lit
script needed changing to cope with that.

Diff Detail

Repository
rL LLVM

Event Timeline

tpr created this revision.Dec 15 2017, 8:02 AM

The clearing extra clearing when moving branch on scc was a leftover that should have been removed in r286134. I also don't understand why it would be needed in that case, since the moved condition is going to be a V_CMP* that does the right thing for inactive lanes.

I think this is backwards. We shouldn't be looking at the already lowered control flow instructions and trying to insert semantically required instructions. This pass is specifically for expanding the exec mask modifications, not general control flow. It could perhaps be renamed. I think you should partially revert r286134 unless the condition source is known to be a compare.

An optimisation pass should then try to eliminate the unnecessary ands, perhaps SIOptimizeExecMaskingPreRA

tpr added a comment.Dec 15 2017, 9:47 AM

Ah, I didn't see r286134.

How do you mean, partially revert it? It looks like I want to completely revert it and fix the comment for semantic correctness, then think about optimizing the cases when the s_and is not needed.

In D41292#956918, @tpr wrote:

Ah, I didn't see r286134.

How do you mean, partially revert it? It looks like I want to completely revert it and fix the comment for semantic correctness, then think about optimizing the cases when the s_and is not needed.

I mean mostly keep it, just re-insert the and when the branch condition isn't known to be a compare like it implicitly assumes

In D41292#956918, @tpr wrote:

Ah, I didn't see r286134.

How do you mean, partially revert it? It looks like I want to completely revert it and fix the comment for semantic correctness, then think about optimizing the cases when the s_and is not needed.

I mean mostly keep it, just re-insert the and when the branch condition isn't known to be a compare like it implicitly assumes

I suppose there are a few other special cases, like CMP_CLASS and other known-to-be-i1 produced instructions

tpr updated this revision to Diff 127258.Dec 16 2017, 2:14 PM

V2: Always add the s_and. We should add a later pass to remove it when it is not needed.

tpr added a comment.Jan 4 2018, 11:23 AM

If there are no further comments, I'll land this.

arsenm added inline comments.Jan 4 2018, 11:47 AM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1680 ↗(On Diff #127258)

Can you add further explanation? Should it still be OK as long as the incoming condition can be seen to be a VCC producer / not a bit op?

tpr updated this revision to Diff 128788.Jan 5 2018, 1:15 PM

Expanded comment.

tpr marked an inline comment as done.Jan 5 2018, 1:17 PM
arsenm accepted this revision.Jan 9 2018, 7:55 AM

LGTM. Are you planning on doing the optimization soon?

This revision is now accepted and ready to land.Jan 9 2018, 7:55 AM
This revision was automatically updated to reflect the committed changes.