Page MenuHomePhabricator

[InstSimplify] fold extracting from std::pair (1/2)
ClosedPublic

Authored by inouehrs on Jul 2 2018, 6:03 AM.

Details

Summary

This patch intends to enable jump threading with a method whose return type is std::pair<int, bool> or std::pair<bool, int>
For example, jump threading does not work for the if statement in func.

std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}

int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }

SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).

This series of patch add two patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs.
These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading.

Later, in the CFG simplification pass, the similar code modification happens. But it is too late to help jump threading.
This series of patches replaces my old patch D44626 which do similar optimization in InstCombine as suggested by reviewers.

This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr.

Alive proof for the shift case: https://rise4fun.com/Alive/epNB, mask Y case: https://rise4fun.com/Alive/vgH

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lebedev.ri added a subscriber: lebedev.ri.

The InstSimplify change needs it's own test set in test/Transforms/InstSimplify.

lib/Analysis/InstructionSimplify.cpp
1278–1282

I think this shouldn't talk about C++ here. Just talk about IR.

// Given  Op0 l>> Op1.
// If Op1 is a constant, and Op0 is  (X nuw<< Op1) | Y
// If Y l>> Op1 == 0. we can extract `X` without extra instructions.
1285

Extra unneeded brace ()

1285

What about commutativity? This should be m_c_Or.

1290

Why do we check this? I think nuw already implies that.

inouehrs updated this revision to Diff 155670.Jul 16 2018, 7:01 AM
  • add more test cases
  • make the algorithm more general using m_c_Or instead of m_Or
inouehrs marked 3 inline comments as done.Jul 16 2018, 7:03 AM
inouehrs added inline comments.
lib/Analysis/InstructionSimplify.cpp
1278–1282

Is this better? I mention std::pair as an example.

1285

Modified. Thank you for pointing this out.

1290

Fixed.

lebedev.ri added inline comments.Jul 16 2018, 7:32 AM
lib/Analysis/InstructionSimplify.cpp
1278–1282
There are LLVM IR structures https://llvm.org/docs/LangRef.html#structure-types
Is the comment talking about them? Huh, why does it operate on integers then?

And other thoughts the readers will have later on. I'd concentrate on IR.

1286–1290

Can you add a comment explaining what/how this does?
(Checks that (%x << %op1) | %y does not touch any bits in %x)

1289

But this isn't the width of Y, that's Y->getType()->getScalarSizeInBits().
Maybe effective width of Y.

test/Transforms/InstSimplify/pair.ll
1

Use ./utils/update_test_checks.py please.

All of the tests for instsimplify are already folded if you run -instcombine. Could the motivating problem also be considered a pass ordering bug?

If this is going to be an instsimplify patch, then I agree with the previous feedback: the code comments should use IR examples and explain what is happening to analyze/transform the IR instructions. C++ examples just confuse things.

inouehrs updated this revision to Diff 155856.Jul 17 2018, 5:41 AM
inouehrs marked 2 inline comments as done.

update comments and test cases

@lebedev.ri @spatel
Thak you so much for the advices. I avoid mentioning about C++ in the comment.

@lebedev.ri

Use ./utils/update_test_checks.py please.

I updated the test cases using update_test_checks.py and moved them into existing ll files since pair.ll is no longer good file name.

@spatel
The simplified test cases can be optimized by current instcombine pass. But to enable jump threading for std::pair, which is the original motivation of the patch, we must apply this folding over a phi node as discussed in https://reviews.llvm.org/D44626.
Executing jump threading pass again after CFG simplification may catch this opportunity. But I think it is better to do jump threading early to help other optimizers.

All of the tests for instsimplify are already folded if you run -instcombine. Could the motivating problem also be considered a pass ordering bug?

@spatel
The simplified test cases can be optimized by current instcombine pass.

I do agree that it it concerning that instcombine already handles this.
However this pattern really looks like something for instsimplify.

So i guess my question is, what in instcombine does this fold?
Is it very general, and this is only one of the cases it handles?
If not, maybe it should be refactored into instsimplify.

But to enable jump threading for std::pair, which is the original motivation of the patch,
we must apply this folding over a phi node as discussed in https://reviews.llvm.org/D44626.

Can you explain that in layman terms?
I don't see anything phi-related in instsimplify changes.

Executing jump threading pass again after CFG simplification may catch this opportunity. But I think it is better to do jump threading early to help other optimizers.

lib/Analysis/InstructionSimplify.cpp
1817

I do not understand.
Why is this only handling the case where Y is bool?

inouehrs updated this revision to Diff 156241.Jul 19 2018, 5:22 AM

So i guess my question is, what in instcombine does this fold?
Is it very general, and this is only one of the cases it handles?
If not, maybe it should be refactored into instsimplify.

For the shl->or->lshr case, instcombine first identifies or is redundant and eliminates it. Then shl shr pair is eliminated. I think it is not general compared to my code.
For the shl->or->and case, instcombine does more general but more costly analysis in SimplifyDemandedInstructionBits. The same analysis seems too costly, but I expand the scope of my code for non-boolean cases.

Can you explain that in layman terms?
I don't see anything phi-related in instsimplify changes.

To help jump threading, actually we need to optimize the code sequence spanning multiple BBs. For an example of shl->or->and case looks like

BB1:
  %shl = shl nuw i64 %val, 32
  %or = or i64 %shl, 1
  br %BB2
BB2:
  %phi = phi i64 [ %or, %BB1 ], ... 
  %and = and i64 %phi, 1

The current instcombine cannot optimize such cases.
My instsimplify patch does not handle phi by itself. The NewGVN calls instsimplify to check opportunities for simplifying instructions over phi.

lib/Analysis/InstructionSimplify.cpp
1817

I made the code more generic. What we really need to check is that this AND op selects all bits of X or Y, and no bit from the another.

Some more comments.
Please enhance test coverage, and note the note about possible miscompile.

! In D48828#1166791, @lebedev.ri wrote:
Can you explain that in layman terms?
I don't see anything phi-related in instsimplify changes.

To help jump threading, actually we need to optimize the code sequence spanning multiple BBs. For an example of shl->or->and case looks like

BB1:
  %shl = shl nuw i64 %val, 32
  %or = or i64 %shl, 1
  br %BB2
BB2:
  %phi = phi i64 [ %or, %BB1 ], ... 
  %and = and i64 %phi, 1

The current instcombine cannot optimize such cases.
My instsimplify patch does not handle phi by itself. The NewGVN calls instsimplify to check opportunities for simplifying instructions over phi.

Aha, that is the bit i was looking for. Please update the differential's description with that.

lib/Analysis/InstructionSimplify.cpp
1287–1288

Please add some negative test where this fails.

1817

What we really need to check is that this AND op selects all bits of X or Y, and no bit from the another.

That was exactly my point :)

1826

I think you can use const APInt& Mask here.

1828

But it seems like this only supports extraction of a bit-wide values?
Please at least add a FIXME comment then.

1835–1836

Aha. This will miscompile if you are operating on types wider than i64.

Please add tests with wider types (i128, with elements of i64, e.g.),
and use APInt in these calculations.

1846

This also needs some negative tests.

test/Transforms/InstSimplify/AndOrXor.ll
994 ↗(On Diff #156241)

Please also add two tests with i32 | i32 - test extracting low part, and test high part.

inouehrs updated this revision to Diff 156761.Jul 23 2018, 6:05 AM
inouehrs edited the summary of this revision. (Show Details)
  • fix a bug with an integer larger than 64 bit
  • add more test cases
  • remove an unnecessary check
inouehrs marked 11 inline comments as done.Jul 23 2018, 6:08 AM
inouehrs added inline comments.
lib/Analysis/InstructionSimplify.cpp
1828

Actually, this is an unnecessary check. I removed it to increase the optimization opportunities.

1835–1836

Right. Fixed using APInt.

test/Transforms/InstSimplify/AndOrXor.ll
994 ↗(On Diff #156241)

More tests (including negative tests) added.

inouehrs marked 3 inline comments as done.Jul 23 2018, 6:09 AM
lebedev.ri added inline comments.Jul 23 2018, 6:12 AM
lib/Analysis/InstructionSimplify.cpp
1826

I was indeed specifically talking about const APInt&, not const APInt, note the &.

inouehrs updated this revision to Diff 156765.Jul 23 2018, 6:21 AM
inouehrs updated this revision to Diff 156767.Jul 23 2018, 6:24 AM
inouehrs added inline comments.
lib/Analysis/InstructionSimplify.cpp
1826

Fixed. Thank you for the repeated comments.

Hmm, this looks about right.
SimplifyRightShift() change looks good,
but i have hopefully a last portion of comments on the SimplifyAndInst()..

lib/Analysis/InstructionSimplify.cpp
1823–1825
Value *Y, *Shift;
if (isa<ConstantInt>(Op1) &&
    match(Op0, m_c_Or(m_CombineAnd(m_NUWShl(m_Value(X), m_APInt(ShAmt)),
                                   m_Value(Shift)),
                      m_Value(Y)))) {
}
1835

Pedantically, i still somehow don't like these calculations :/
I think at least this should be:

const APInt EffBitsY = APInt::getLowBitsSet(Width, EffWidthY);

Not sure how to more nicely express the EffBitsX.

1839–1841

Please see if APInt::intersects() and APInt::isSubsetOf() could be used here.

1842–1845
return Shift;
test/Transforms/InstSimplify/AndOrXor.ll
973 ↗(On Diff #156767)

Please give names to all these variables in all the tests, prefix the numeric id with tmp - s/%/%tmp/.

1078–1091 ↗(On Diff #156767)

Could you please move this to before the negative tests?

inouehrs updated this revision to Diff 157477.Jul 26 2018, 7:08 AM

addressed the comments from @lebedev.ri

lebedev.ri edited the summary of this revision. (Show Details)Jul 26 2018, 12:39 PM
lebedev.ri set the repository for this revision to rL LLVM.
lebedev.ri edited the summary of this revision. (Show Details)Jul 26 2018, 1:13 PM
lebedev.ri accepted this revision.Jul 26 2018, 1:24 PM

Yeah, ok, i've convinced myself that this is ok. LGTM.
I do believe that the reasoning for this to be in instsimplify are sound (gvn needs it, running instcombine won't cut it.)

@inouehrs please commit tests now (as of the current trunk, to not angry the bots).
And please wait a bit (2 days?) before committing the transform itself (and the test changes themselves) in case @spatel / others want to comment.

lib/Analysis/InstructionSimplify.cpp
1838

Hmm, right, nice!
Even fits within 80-char width :)

1841–1844

isSubsetOf() - This operation checks that all bits set in this APInt are also set in RHS.

So we check that the mask covers all the possibly-set bits of X / Y.

intersects() - This operation tests if there are any pairs of corresponding bits between this APInt and RHS that are both set.

And we are checking that the mask does *not* cover any possibly-set bits of the Y / X.
Looks about right..

This revision is now accepted and ready to land.Jul 26 2018, 1:25 PM

Yeah, ok, i've convinced myself that this is ok. LGTM.
I do believe that the reasoning for this to be in instsimplify are sound (gvn needs it, running instcombine won't cut it.)

@inouehrs please commit tests now (as of the current trunk, to not angry the bots).

  1. Yes, I'd also like to see the tests get committed now with baseline CHECKs.
  2. We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine? Hopefully, there are existing regression tests for instcombine to verify its transforms - would those tests pass when this patch is applied?
  3. This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.
  4. At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.
  5. Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.
inouehrs updated this revision to Diff 157850.Jul 28 2018, 6:10 AM

Yes, I'd also like to see the tests get committed now with baseline CHECKs.

I have committed unit tests in https://reviews.llvm.org/rL338107

We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine?

I have investigated related instcombine patterns. But so far, I do not find something redundant with this instsimplify patch.
The pattern for 'and' is handled by SimplifyDemandedInstructionBits in instcombine and it has much wider coverage than this pattern.
The pattern for 'lshr' is handled by a combination of multiple patterns for or and shifts in instcombine. My instsimplify patch fully covers neither patterns in instcombine.

This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.

I will commit the transformation in lshr and and separately.

At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.

I fixed the test. I intend to make the generated code simpler.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

Fixed. Thank you for the suggestion.

Yes, I'd also like to see the tests get committed now with baseline CHECKs.

I have committed unit tests in https://reviews.llvm.org/rL338107

We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine?

I have investigated related instcombine patterns. But so far, I do not find something redundant with this instsimplify patch.
The pattern for 'and' is handled by SimplifyDemandedInstructionBits in instcombine and it has much wider coverage than this pattern.
The pattern for 'lshr' is handled by a combination of multiple patterns for or and shifts in instcombine. My instsimplify patch fully covers neither patterns in instcombine.

This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.

I will commit the transformation in lshr and and separately.

Commit - yes. But it wouldn't be bad to review (concurrently) these things as two differentials, too.

At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.

I fixed the test. I intend to make the generated code simpler.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

Fixed. Thank you for the suggestion.

I'm not sure of the exact problem (vector tests needed?), but added one nit.

lib/Analysis/InstructionSimplify.cpp
1281–1283

I'm not sure this is better, or the full fix (tests needed.)
I would think you'd need

const APInt *ShAmt0, *ShAmt1;
if (match(Op1, m_APInt(ShAmt2)) &&
    match(Op0, m_c_Or(m_NUWShl(m_Value(X), m_APInt(ShAmt1)), m_Value(Y))) &&
    *ShAmt0 == *ShAmt1) {
  const APInt *ShAmt = ShAmt1;
1840–1843

intersects() is commutative, so this shouldn't matter.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

Fixed. Thank you for the suggestion.

I'm not sure of the exact problem (vector tests needed?), but added one nit.

Yes - I think we should have at least 1 minimal test with vector types for each transform, so we know if that's working as expected. I think it will work, but I haven't stepped through to confirm that.

inouehrs updated this revision to Diff 157942.Jul 30 2018, 4:56 AM
inouehrs retitled this revision from [InstSimplify] fold extracting from std::pair to [InstSimplify] fold extracting from std::pair (1/2).
inouehrs edited the summary of this revision. (Show Details)
  • Separate the patch into two; this one is the first of the two.
  • Add test cases with vector data type.
inouehrs added inline comments.Jul 30 2018, 4:59 AM
lib/Analysis/InstructionSimplify.cpp
1281–1283

Sorry but I cannot catch why you use m_APInt in the matcher and then compare the values instead of using m_Specific. What kind of code sequences you want to cover with this?

lebedev.ri added inline comments.Jul 30 2018, 5:34 AM
lib/Analysis/InstructionSimplify.cpp
1281–1283

I'm not sure it matters right now.
m_APInt() could potentially (not right now) match splat constant with undef's - <i32 42, i32 undef, i32 42>
But m_Specific() compares the pointers, not the underlying data. So if ShAmt0 and ShAmt1 are both splat,
but have different undefs (e.g. only one of them has undef elements), they would not have the same constant.

Theoretically, that code in my comment would still match this case.
But this does not matter right now since m_APInt() does not accept constants with undef elements.

Thanks for splitting it up. This is close to good IMO - just a few minor points:

  1. Please add/adjust the tests with baseline checks as a preliminary step; we don't want to lose those in case the code change gets reverted.
  2. Seeing this diff on its own makes it clear that we're overspecifying the more general SimplifyDemandedBits transform (what about the case where the shifts are in the opposite order?). That should be noted as a code comment and in the commit message.*
  3. I don't expect any compile-time problems given that the computeKnownBits is buried under the other pattern checks, but be aware of that concern and watch for regressions.
  • It has been discussed before that SimplifyDemandedBits really shouldn't be included in InstCombine; it should be its own pass. If that structural change was made, would it make adjusting the optimization pipeline a more appealing solution than this?
test/Transforms/InstSimplify/shift.ll
218 ↗(On Diff #157942)

Swap the 'or' operands here so we have coverage for the commuted case?
In general, I like to see a test comment that points that out too, so we know what's changing between the tests.
Also, it's a matter of taste, but the more common test format would put the test comment above the test definition, so it's clearly separated from the auto-generated CHECK lines.

inouehrs updated this revision to Diff 158182.Jul 31 2018, 12:44 AM
inouehrs marked an inline comment as done.Jul 31 2018, 12:57 AM

Please add/adjust the tests with baseline checks as a preliminary step; we don't want to lose those in case the code change gets reverted.

I have updated baseline checks.

Seeing this diff on its own makes it clear that we're overspecifying the more general SimplifyDemandedBits transform (what about the case where the shifts are in the opposite order?). That should be noted as a code comment and in the commit message.*

I added comments. I will also mention in the commit message.

It has been discussed before that SimplifyDemandedBits really shouldn't be included in InstCombine; it should be its own pass. If that structural change was made, would it make adjusting the optimization pipeline a more appealing solution than this?

For the original motivating examples on jump threading, it needs inter-BB optimization (e.g. code below). So If SimplifyDemandedBits pass can support inter-BB opt as well as intra-BB opt and executed before the jump threading, it will be more general than this patch.

BB1:
  %shl = shl nuw i64 1, 32
  %or = or i64 %shl, %v
  br %BB2
BB2:
  %phi = phi i64 [ %or, %BB1 ], ... 
  %shr = lshr i64 %phi, 32
lib/Analysis/InstructionSimplify.cpp
1281–1283

I got it. I rewrite the code as you suggested for safety.

spatel accepted this revision.Jul 31 2018, 5:20 AM

LGTM

This revision was automatically updated to reflect the committed changes.