Page MenuHomePhabricator

[Reassociate] swap binop operands to increase factoring potential
Changes PlannedPublic

Authored by spatel on Apr 19 2018, 3:30 PM.

Details

Summary

If we have a pair of binops feeding another pair of binops, rearrange the operands so the matching pair are together because that allows easy factorization folds to happen in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)

--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)

This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098

Note that there's an instcombine version of this patch attached there. This reassociate patch took about 10x more effort, so I hope this is the preferred direction. :)

For reasons I still don't completely understand, reassociate does this kind of transform sometimes, but misses everything in my motivating cases.

This patch on its own is gluing an independent cleanup chunk to the end of the existing RewriteExprTree() loop. But if it's approved, I think we can build on it and do something stronger to better order the full expression tree like D40049. That might be an alternative to the proposal to add a separate reassociation pass like D41574.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 19 2018, 3:30 PM
lebedev.ri added inline comments.
lib/Transforms/Scalar/Reassociate.cpp
2121 ↗(On Diff #143169)

I'd think this could be

if (!B.isAssociative() || !B.isCommutative() ||
    !match(&B, m_BinOp(m_BinOp(B0), m_BinOp(B1))))
  return;
spatel added inline comments.Apr 26 2018, 12:08 PM
lib/Transforms/Scalar/Reassociate.cpp
2121 ↗(On Diff #143169)

Sure - I can change that if the general functionality of the patch is approved.

What are your plans for deeper associate-operand trees, such as in the tests I've suggested at D41574? Will you support such cases?

bjope added a subscriber: bjope.Apr 27 2018, 7:39 AM

As far as I can see this is another case of reusing existing Instructions/Values, while changing the actual value that the Instruction produce, right?

Take a look at the debug-info fixes I made here for a similar problem in RewriteExprTree: https://reviews.llvm.org/D45975
I suspect that you may need to discard debug-info in a similar way as in D45975, somewhere inside your new function swapOperandsToMatchBinops.

What are your plans for deeper associate-operand trees, such as in the tests I've suggested at D41574? Will you support such cases?

This patch is independent of D41574 from what I see. Ie, that patch makes no difference on any of these tests. Could it be enhanced to catch these?

If we want to do something like D41574 (provide stronger sorting of the expression tree based on opcode/operand) here in the existing -reassociate, then an addition to ReassociateExpression() like this could be used:

for (unsigned i = 0; i < Ops.size() - 1; ++i) {
  for (unsigned j = i + 1; j < Ops.size(); ++j) {
    BinaryOperator *B0, *B1;
    if (!match(Ops[i].Op, m_BinOp(B0)))
      continue;
    if (match(Ops[j].Op, m_BinOp(B1)) && B0->getOpcode() < B1->getOpcode())
      continue;
    if (B0->getOpcode() == B1->getOpcode()) {
      const APInt *B0C, *B1C;
      if (match(B0->getOperand(1), m_APInt(B0C)) && match(B1->getOperand(1), m_APInt(B1C)))
        if (B0C->ule(*B1C))
          continue;
    }
    std::swap(Ops[i], Ops[j]);
  }
}

That creates some of the factoring folds that we want, but it won't reduce as far as shown in the other patch. I'm not sure if that's a limitation of this pass or if I just botched the code (and this is untested, so may not be correct).

As far as I can see this is another case of reusing existing Instructions/Values, while changing the actual value that the Instruction produce, right?

Take a look at the debug-info fixes I made here for a similar problem in RewriteExprTree: https://reviews.llvm.org/D45975
I suspect that you may need to discard debug-info in a similar way as in D45975, somewhere inside your new function swapOperandsToMatchBinops.

Thanks! You're correct that we're recycling instructions here (not sure why we don't just create new instructions with a Builder?). The funny thing about all the examples here is that after we swap operands in swapOperandsToMatchBinops(), we end up going through the main reassociation loop again and make more changes which triggers your code D45975, so I already see 'undef' in the right places.

Nevertheless, it's a small change to extract and call that code, so let me do that to be safe.

spatel updated this revision to Diff 144408.Apr 27 2018, 3:20 PM

Patch updated:

  1. Extract discardDebugInfo() as a helper function and use it.
  2. Added a test case with debug info (baseline added at rL331083).
  3. Use another m_BinOp to shrink the code.
bjope added a comment.May 4 2018, 3:53 AM

As far as I can see this is another case of reusing existing Instructions/Values, while changing the actual value that the Instruction produce, right?

Take a look at the debug-info fixes I made here for a similar problem in RewriteExprTree: https://reviews.llvm.org/D45975
I suspect that you may need to discard debug-info in a similar way as in D45975, somewhere inside your new function swapOperandsToMatchBinops.

Thanks! You're correct that we're recycling instructions here (not sure why we don't just create new instructions with a Builder?). The funny thing about all the examples here is that after we swap operands in swapOperandsToMatchBinops(), we end up going through the main reassociation loop again and make more changes which triggers your code D45975, so I already see 'undef' in the right places.

Nevertheless, it's a small change to extract and call that code, so let me do that to be safe.

Great! Your last update seems to fix my concern about debug-info. I'll let someone else review the actual code transformation done here.

spatel added a comment.May 4 2018, 9:19 AM

Ping * 2.

Note that D46336 is proposing to solve the reassociation problems shown here within instcombine (but in a bigger way than my draft patch would have done it).

Overall, this seems to make sense to me, but the comments seem misdirecting.

lib/Transforms/Scalar/Reassociate.cpp
2138 ↗(On Diff #144408)

And by "matching" you really mean "with the same type as the parent binop"

2141 ↗(On Diff #144408)

Why can't we swap if B1 has more than one use?
And why is it not a problem for B0? (and positive test for this multi-use would be awesome)

2147 ↗(On Diff #144408)

// If B0 is still not matching, or both operands already have the same opcode, nothing to do.

2152–2153 ↗(On Diff #144408)

I find this comment not strictly true.
We only check that B00->getOpcode() != OtherOpc, we don't check B01.
Also s/V01/B01/

2157 ↗(On Diff #144408)

And here we don't care about one-use?

2161 ↗(On Diff #144408)

B01 ?

2163–2165 ↗(On Diff #144408)

I see, so V01 and B01 are actually the same thing, but with different type.
Would it be better to condense it into one if, like:

BinaryOperator *B01;
if (( match(B0->getOperand(0), m_BinOp(B00)) && B00->getOpcode() == OtherOpc) && 
    (!match(B0->getOperand(1), m_BinOp(B01)) || B01->getOpcode() != OtherOpc)) {
  Value *V01 = B0->getOperand(1);

?

test/Transforms/Reassociate/matching-binops.ll
186 ↗(On Diff #144408)

The logic behind one-use restrictions in this case is not clear to me,
so i'd personally prefer to have more one-use tests, where one-use is
the only thing that is preventing the reassociation.

Hmm, is only test/Transforms/Reassociate/matching-binops.ll regenerated here?
I'm wondering why D46336 changes so many more tests.

Hmm, is only test/Transforms/Reassociate/matching-binops.ll regenerated here?
I'm wondering why D46336 changes so many more tests.

Yes, and this is intentional. I don't think we usually want to have IR regression tests that depend on multiple passes. Although in this case - because I've left the actual factoring/distributive optimization out of this patch (at least for now) - it may be worth adding tests under PhaseOrdering to make sure that nothing is interfering with this transform before instcombine has a chance to reduce it.

Hmm, is only test/Transforms/Reassociate/matching-binops.ll regenerated here?
I'm wondering why D46336 changes so many more tests.

Yes, and this is intentional.

Oh right, this is a reassociate pass, not instcombine :)

I don't think we usually want to have IR regression tests that depend on multiple passes. Although in this case - because I've left the actual factoring/distributive optimization out of this patch (at least for now) - it may be worth adding tests under PhaseOrdering to make sure that nothing is interfering with this transform before instcombine has a chance to reduce it.

spatel updated this revision to Diff 145523.May 7 2018, 1:52 PM
spatel marked 8 inline comments as done.

Patch updated:
I think what made this patch confusing and overly complex is trying to recycle existing instructions (swapping operands rather than just creating new instructions).

As I mentioned in an earlier comment, I don't see a good reason to do that, so let's do the easy thing: use IRBuilder to create new instructions. This has 3 benefits and shrinks the patch:

  1. It simplifies the code needed for commutative canonicalization (looks more like the original instcombine patch now).
  2. It means we don't need to manually update debug info (IRBuilder does the right thing automatically).
  3. It makes the IR flag clearing/propagation cleaner.
spatel added a comment.May 7 2018, 2:19 PM

FWIW, I added a debug statistic for this transform and tested with test-suite, and it fires 1301 times.

lebedev.ri accepted this revision.May 7 2018, 2:35 PM

Patch updated:
I think what made this patch confusing and overly complex is trying to recycle existing instructions (swapping operands rather than just creating new instructions).

As I mentioned in an earlier comment, I don't see a good reason to do that, so let's do the easy thing: use IRBuilder to create new instructions. This has 3 benefits and shrinks the patch:

  1. It simplifies the code needed for commutative canonicalization (looks more like the original instcombine patch now).
  2. It means we don't need to manually update debug info (IRBuilder does the right thing automatically).
  3. It makes the IR flag clearing/propagation cleaner.

Nice!
Looks much better, much easier to follow the logic / read the code.

I think this is ok, but maybe wait for a second opinion, i don't trust myself too much..

This revision is now accepted and ready to land.May 7 2018, 2:35 PM
bjope added a subscriber: aprantl.May 7 2018, 11:55 PM
bjope added inline comments.
test/Transforms/Reassociate/matching-binops.ll
297 ↗(On Diff #145523)

nit: Verify that we "discard" the dbg.value for variable "a" (metadata !19 in the input, metadata !18 in the output), since we do nto calculate the value %and after the transformation.

@aprantl once told me that is was better to use metadata i32 undef instead of metadata !{} when a dbg.value is "discarded". I think it is out-of-scope for this patch, but maybe the code that picks metadata !2 in this solution should insert an undef value instead (or maybe later passes should handle metadata !{} the same way as if we have an explicit undef value, in case there really is a difference today).

One important take-away from https://reviews.llvm.org/D46336#1090588:
The InstCombine and Reassociate need to be run after each another in a loop (what't the correct term, internal pipeline?) until neither of them produces any more changes.

bool madeChanges = false;
do {
  madeChanges |= instcombine();
  madeChanges |= reassociate();
} while(madeChanges);
aprantl added a subscriber: vsk.May 8 2018, 9:29 AM
aprantl added inline comments.
test/Transforms/Reassociate/matching-binops.ll
297 ↗(On Diff #145523)

Do you think it would help to have a common facility along the lines of DbgInstrinsicInstr::replaceWithUndef() to make it easier to do the right thing?

spatel added inline comments.May 8 2018, 9:46 AM
test/Transforms/Reassociate/matching-binops.ll
297 ↗(On Diff #145523)

Definitely. I don't know anything about debuginfo, so I don't know why one option is better than the other or what is best.

As you can see, I'm just using a default IRBuilder, so can we get it to do the optimal thing automatically?

spatel added a comment.May 8 2018, 9:53 AM

One important take-away from https://reviews.llvm.org/D46336#1090588:
The InstCombine and Reassociate need to be run after each another in a loop (what't the correct term, internal pipeline?) until neither of them produces any more changes.

bool madeChanges = false;
do {
  madeChanges |= instcombine();
  madeChanges |= reassociate();
} while(madeChanges);

There are parallel conversations going on about the general direction, so let's move this to llvm-dev and reach consensus before we get into the details:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123117.html

If there are no objections, I'll commit this soon. The larger D41574 looks stalled. In the meantime, this is a small patch/improvement that will help limit hyper-extension proposals to instcombine. The patch still applies to trunk cleanly.

This revision was automatically updated to reflect the committed changes.
spatel reopened this revision.Sep 12 2018, 2:34 PM

Reopening - reverted with rL342083 because this patch can cause indeterminate output.
At first glance, it's not incorrect output, but there is a difference in the order of operands.

This revision is now accepted and ready to land.Sep 12 2018, 2:34 PM
spatel planned changes to this revision.Sep 12 2018, 2:34 PM
uabelho added a subscriber: uabelho.May 3 2019, 5:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2019, 5:16 AM