This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Iterate LoweredEndCf in the reverse order
ClosedPublic

Authored by cdevadas on Dec 24 2021, 1:54 PM.

Details

Summary

The function that optimally inserts the exec mask
restore operations by combining the blocks currently
visits the lowered END_CF pseudos in the forward
direction as it iterates the setvector in the order
the entries are inserted in it.

Due to the absence of BranchFolding at -O0, the
irregularly placed BBs cause the forward traversal
to incorrectly place two unconditional branches in
certain BBs while combining them, especially when
an intervening block later gets optimized away in
subsequent iterations.

It is avoided by reverse iterating the setvector.
The blocks at the bottom of a function will get
optimized first before processing those at the top.

Fixes: SWDEV-315215

Diff Detail

Event Timeline

cdevadas created this revision.Dec 24 2021, 1:54 PM
cdevadas requested review of this revision.Dec 24 2021, 1:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 24 2021, 1:54 PM
cdevadas updated this revision to Diff 396190.Dec 25 2021, 1:55 AM

lit test clean up.

foad added a comment.Dec 31 2021, 3:43 AM

What was the effect of inserting multiple branch instructions? Did it fail MIR verification?

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
585

There's a "using namespace llvm" at the top of this file.

What was the effect of inserting multiple branch instructions? Did it fail MIR verification?

No, MIR verifier won't flag it as an error.
It becomes an assertion later at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILateBranchLowering.cpp#L149

In the case of multiple unconditional branches, most targets (in their analyzeBranch implementation) remove all but the first one (Arch64, ARM, and a few more).
But we don't do any such thing in the analyzeBranch. Not sure if there is any specific reason not to do so for AMDGPU.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
585

Thanks for pointing it out. Will simplify it.

cdevadas updated this revision to Diff 397200.Jan 3 2022, 10:44 PM
cdevadas edited the summary of this revision. (Show Details)
cdevadas added a reviewer: foad.

Incorporated Jay's suggestion.

rampitec added inline comments.Jan 4 2022, 9:26 AM
llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
704

It seems you need test running at -O0 to cover it.

715

Wasn't update_mir_test_checks.py changed to use -NEXT checks?

cdevadas updated this revision to Diff 397615.Jan 5 2022, 9:37 AM

Pre-committed the regenerated test collapse-endcf.mir separately.
Rebased the test case and autogenerated the pattern for the new test.

rampitec accepted this revision.Jan 5 2022, 9:53 AM

LGTM.

As discussed it would be worth to do a separate change to skip the optimization at -O0 alltogether.

This revision is now accepted and ready to land.Jan 5 2022, 9:53 AM
This revision was landed with ongoing or failed builds.Jan 5 2022, 9:27 PM
This revision was automatically updated to reflect the committed changes.