Page MenuHomePhabricator

[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT
ClosedPublic

Authored by samtebbs on Sep 14 2020, 8:02 AM.

Details

Summary

This patch combines a VCMP followed by a VPST into a VPT, which has the same semantics as the combination of the former two.

Diff Detail

Event Timeline

samtebbs created this revision.Sep 14 2020, 8:02 AM
samtebbs requested review of this revision.Sep 14 2020, 8:02 AM
samparker added inline comments.Sep 15 2020, 4:32 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1301

Shouldn't you be searching for any VCMP opcode? RDA would be a nicer way of finding the VPR def, but that shouldn't be unnecessary anyway - I'm pretty certain the VCMP should be the 'Divergent' instruction.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll
2

Probably best not to run at -O3, just in case upstream/downstream have different optimisation pipelines.

samtebbs updated this revision to Diff 291888.Sep 15 2020, 6:28 AM

Remove -O3 from test and improve VCMP detection.

samtebbs updated this revision to Diff 291890.Sep 15 2020, 6:33 AM

Clean up the formatting a little.

samtebbs marked 2 inline comments as done.Sep 15 2020, 6:36 AM
samtebbs added inline comments.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1301

Checking the Divergent is much better, thanks. I also realised that I wasn't decrementing I if the ++I == E check failed so this new logic is more robust.

samtebbs marked an inline comment as done.Sep 15 2020, 9:00 AM
samparker accepted this revision.Sep 15 2020, 11:51 PM

LGTM, thanks.

This revision is now accepted and ready to land.Sep 15 2020, 11:51 PM
dmgreen added inline comments.Sep 16 2020, 10:13 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1326

The VPT block pass has some very similar code. Do we need to check that Operand 1 and Operand 2 have not been modified between the VCMP and where we are materializing the VPT to?

samtebbs marked an inline comment as done.Sep 21 2020, 7:27 AM
samtebbs added inline comments.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1326

That is a good spot. I'll submit a check for this in a follow-up.