This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Favour PL/MI over GE/LT when possible
ClosedPublic

Authored by dmgreen on Jul 3 2019, 1:48 PM.

Details

Summary

The arm condition codes for GE is N==V (and for LT is N!=V). If the source of flags cannot set V (overflow), such as a cmp against #0, then we can use the simpler PL and MI conditions that only check N. As these PL/MI conditions are simpler than GE/LT, other passes like the peephole optimiser can have a better time optimising away the redundant CMPs.

The exception is the VSEL instruction, which cannot take these codes, so there the transform favours GE. I have added some extra tests to codegen/arm/vsel.ll to check against 0 (which are not shown here because they do not change). Some of the other tests here have been updated, for which I am showing the difference.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision.Jul 3 2019, 1:48 PM

I wrote a very similar patch at one point, but I didn't submit it because I couldn't demonstrate any significant benefit. Then again, I only tested it with Thumb1; maybe it's more useful for Thumb2?

llvm/lib/Target/ARM/ARMISelLowering.cpp
4086 ↗(On Diff #207881)

isNullConstant

4730 ↗(On Diff #207881)

Do you not need to check for ARMCC::MI here? Or do we avoid that somehow?

dmgreen updated this revision to Diff 207894.Jul 3 2019, 2:40 PM
dmgreen marked 3 inline comments as done.

Now uses isNullConstant

Thanks for taking a look. The codesize changes here are minor, but present and consistently down. It was actually performance that I was looking for though, with this coming up on a couple of hot loops I was seeing. They do something like if ((a-b) < 0) ... a few times which previously was leaving the extra cmp instructions in. The changes in long shift also look like they may be useful.

llvm/lib/Target/ARM/ARMISelLowering.cpp
4730 ↗(On Diff #207881)

The vsel can only handle GE, GT, VS and EQ apparently. The code above tries to invert the conditions to get one of these legal codes. So I don't believe that will usually be generated, and if it is then choosing MI over LT won't make much of a difference.

Without this change, we only end up generating a "vmovpl.f32 s0, s1" instead of a "vselge.f32 s0, s1, s0" in the tests I've added, so it didn't seem much worse in either case. The vsel is presumably better in general though.

efriedma accepted this revision.Jul 3 2019, 3:58 PM

LGTM

This revision is now accepted and ready to land.Jul 3 2019, 3:58 PM
This revision was automatically updated to reflect the committed changes.