
[Target][ARM] Fold or(A, B) more aggressively for I1 Vectors
ClosedPublic

Authored by Pierre-vh on Apr 1 2020, 2:20 AM.

Details

Summary

This is a change to the ARM backend that folds or(A, B) into not(and(not(A), not(B))) more often.
This only affects vectors of i1 (v4i1, v8i1 and v16i1), because PerformORCombine_i1 is only called by PerformORCombine when these conditions are met:

if (Subtarget->hasMVEIntegerOps() &&
    (VT == MVT::v4i1 || VT == MVT::v8i1 || VT == MVT::v16i1))
  return PerformORCombine_i1(N, DCI, Subtarget);

This actually generates better code in my tests, since not and and are essentially free compared to or when manipulating the VPR register:

  • and becomes a VPT block (no extra instruction)
  • not becomes a vpnot, which is often removed by the MVE VPT Block Insertion pass when creating VPT blocks (no extra instruction).

However, I'm not fully sure this is a good change, and I believe the implementation could be better, which is why I'd like some help reviewing it.

Diff Detail

Event Timeline

Pierre-vh created this revision. Apr 1 2020, 2:20 AM

I have no thoughts on the patch itself, but the commit message looks quite alarming out of context. Perhaps it should mention that you're doing this specifically for i1 and vectors of i1, and not for bitwise OR of ordinary integers?

Pierre-vh retitled this revision from [Target][ARM] Fold or(A, B) more aggressively to [Target][ARM] Fold or(A, B) more aggressively for I1 Vectors. Apr 1 2020, 3:01 AM
Pierre-vh edited the summary of this revision. (Show Details)

I have no thoughts on the patch itself, but the commit message looks quite alarming out of context. Perhaps it should mention that you're doing this specifically for i1 and vectors of i1, and not for bitwise OR of ordinary integers?

Sure, I've updated the commit message and the description of the change.

The code that already exists in the function you are changing essentially does the same thing as what you add here, just under a more constrained set of circumstances. It says that if the operands are obviously invertible, then the VPNOT will be really free and we can go ahead and invert the and to an or. The question is whether that is true in all cases. For the test cases you have here the operands are easily invertible, but that won't be true for everything.

With VPT blocks the NOTs will also be free in a lot of cases, but again not all, and it's difficult to see where in practice it would be better.

I think that we should either always be doing this, or doing this when the operands are (recursively) freely invertible. I'm not sure which will be better in practice.

Pierre-vh updated this revision to Diff 254159. Apr 1 2020, 4:05 AM

Rebased the patch

I reworked the implementation of the patch; it should be cleaner now.

The tests here look like an improvement, but I'm not sure that would be true for all cases.

The old code said something like "if we know that both operands can be inverted, invert the OR."
The new code looks more like "if we know the operands can be inverted, invert them; else, if we know either can't be inverted, don't invert; else, if we know _nothing_ about the operands, invert them and hope for the best".

If we are going to go this route (if we think it's generally profitable; it's hard to tell with how many other little problems we have in predicate code generation), I think it might make more sense to improve the checks for when we invert or not. We could just always try to invert it, but I doubt that works very well. What do you think about testing whether at least one side of the 'and' can be inverted? I think that might be enough to justify the transform. That should at least either remove a NOT or convert the AND to an OR. And as long as one of the sides is a VCMP, we will be able to fold the and into the compare.

It doesn't look like we have a fold for "(not vcmp) -> vcmp". It might be better to have PerformORCombine_i1 just produce not(and(not, not)) (provided it looks profitable) and have other combines fold the nots into vcmps/anything else.

Pierre-vh updated this revision to Diff 260252. Apr 27 2020, 2:36 AM

Updated the patch: the transformation now only happens if one of the operands is a condition that can be immediately inverted.
It isn't as good as the previous version (in terms of improvements), but it's safer (there is less risk of generating terrible code in some situations).

This looks like an improvement on its own, but I think it would be cleaner if there were a separate fold for (not vcmp) -> vcmp, so that it needn't be done here. Then we can convert to not(and(not, not)) here, and we can handle swapped operands too.

Pierre-vh updated this revision to Diff 261754. May 4 2020, 1:04 AM
  • Moved the (not(vcmp)) -> !vcmp fold to PerformXORCombine

Nice. One last round of me nitpicking details I think.

What happened to the test changes in mve-pred-or.ll?

llvm/lib/Target/ARM/ARMISelLowering.cpp
12657

This can just be return (ARMCC::CondCodes)N->getConstantOperandVal(2)

12669

N->getOperand(0).getValueType(),...

12684–12685

Maybe make an IsFreelyInvertable() function/lambda. Then this will just be

if (IsFreelyInvertable(N0) || IsFreelyInvertable(N1))

We can then add things like swapping operands to it, if we teach it those tricks.

12828

Just create a SDLoc for N0. Same above in the other function.

Pierre-vh updated this revision to Diff 261791. May 4 2020, 5:52 AM
Pierre-vh marked 4 inline comments as done.
  • Refactorings (see comments marked "done")
  • Fold even when only one side is free to invert. This brings back the mve-pred-or changes.
dmgreen accepted this revision. May 4 2020, 11:42 PM

Thanks. LGTM, with one extra comment.

llvm/lib/Target/ARM/ARMISelLowering.cpp
12686

These can use DL too. The not is, in a way, coming from the Or.

This revision is now accepted and ready to land. May 4 2020, 11:42 PM
Pierre-vh updated this revision to Diff 262018. May 5 2020, 1:11 AM
Pierre-vh marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.