This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold.
ClosedPublic

Authored by XChy on Jul 9 2023, 7:32 AM.

Details

Summary

This extends foldCastedBitwiseLogic to handle the similar cases.
I have recently submitted a patch to implement a single fold like:

(A > 0) | (A < 0) -> zext (A != 0)

But it is not general enough, and some problems like a < b & a >= b - 1 happen again.

So I generalize this fold by matching the pattern bitwise(A >> C - 1, zext(icmp)), and replace A >> C - 1 with zext(A < 0) here.
(C is the scalar size bits of the type of A)
Then we get bitwise(zext(A < 0), zext(icmp)), this will be folded by original code in foldCastedBitwiseLogic, into zext(bitwise(A < 0, icmp)).
And finally, any related icmp fold will be automatically implemented because bitwise(icmp,icmp) had been implemented.

The proof of the correctness is obvious, because the folds below were previously proved and implemented.
A >> C - 1 -> zext(A < 0)
bitwise(zext(A), zext(B)) -> zext(bitwise(A, B))
And the fold of this patch is the combination of folds above.

Related issue:
a < b | a >b
a < b & a >= b - 1
Related patch:
D154126

Diff Detail

Event Timeline

XChy created this revision.Jul 9 2023, 7:32 AM
XChy requested review of this revision.Jul 9 2023, 7:32 AM
nikic added a comment.Jul 9 2023, 7:46 AM

The basic idea here is reasonable, but you need to be very careful about infinite loops: If you replace the shift with zext+icmp and it does *not* get folded afterwards, it will be converted back to the shift, and so on. I don't think the fold is guaranteed to happen, e.g. due to some unlucky interaction with shouldOptimizeCast().

I would recommend to instead directly produce the zext(binop(icmp, icmp)) sequence, rather than letting the following fold handle it.

Please add:

  • Multi-use test.
  • Test where we do not get any beneficial fold out of converting the lshr back into an icmp.

Depending on how the latter case looks like, we might want to further limit this -- e.g. does it make sense to do this if the lshr and icmp work on different variables or not?

goldstein.w.n added a comment.EditedJul 9 2023, 11:56 AM

Throughout the comments/summary/title can you replace X with C to indicate its a constant. Also in a few places you forget the -1 in its description as sizeof_bits(A) - 1.

goldstein.w.n added inline comments.Jul 9 2023, 11:58 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1741

lshr? The comments are all shl. Can you clarify one of them (looks like comments/summary is wrong).

goldstein.w.n added inline comments.Jul 9 2023, 12:00 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736

'and/or' -> 'bitwise' no?

1744

Is there a reason you create IsMatched as opposed to just embedding the match(...) logic in the if statement?

XChy retitled this revision from [InstCombine] Transform bitwise (A << X, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold. to [InstCombine] Transform bitwise (A << C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold..
XChy edited the summary of this revision. (Show Details)
XChy retitled this revision from [InstCombine] Transform bitwise (A << C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold. to [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold..Jul 9 2023, 5:26 PM
XChy edited the summary of this revision. (Show Details)
XChy added a comment.Jul 9 2023, 11:18 PM

The basic idea here is reasonable, but you need to be very careful about infinite loops: If you replace the shift with zext+icmp and it does *not* get folded afterwards, it will be converted back to the shift, and so on. I don't think the fold is guaranteed to happen, e.g. due to some unlucky interaction with shouldOptimizeCast().

I would recommend to instead directly produce the zext(binop(icmp, icmp)) sequence, rather than letting the following fold handle it.

Please add:

  • Multi-use test.
  • Test where we do not get any beneficial fold out of converting the lshr back into an icmp.

Depending on how the latter case looks like, we might want to further limit this -- e.g. does it make sense to do this if the lshr and icmp work on different variables or not?

Thanks for your review! I agree with you. Actually, I came across infinite loops during developing and seem to solve it by letting the following fold handle IR. But it's just solved with few tests.

My original purpose is to use foldAndOrOfICmps to fold icmps as fold and/or( A < 0, icmp). From my perspective, if foldAndOrOfICmps doesn't fold the transformed IR, this fold should not happend.

I will add related tests here and try to find out where we do not get any beneficial fold later.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736

Maybe and/or is right, since foldAndOrOfICmps do not fold xor.
I'm not sure what xor(icmp,icmp) is folded into.

1741

You're right. I mess it up in last patch too. I'll fix it.

1744

No, it's my personal habit to set a boolean variable when expression is too long. If needed, I can remove it.

XChy updated this revision to Diff 538616.Jul 10 2023, 6:44 AM
goldstein.w.n added inline comments.Jul 10 2023, 9:09 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736

In that case, you need to only match and/or and probably should mention that in the summary.

1744

Its fine.

nikic added a comment.Jul 10 2023, 1:05 PM

This is still missing multi-use tests. We'll need some m_OneUse guards to prevent unprofitable transforms.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1716–1732

X -> BW

1734–1740
1743

Needs clang-format.

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2886

The main multi-use test I'm looking for is one where the resulting binop does not get folded. That's the case where your current code will increase instructions, I believe.

XChy updated this revision to Diff 539026.Jul 11 2023, 4:48 AM
XChy marked 9 inline comments as done.
XChy edited the summary of this revision. (Show Details)
goldstein.w.n added inline comments.Jul 11 2023, 9:47 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1716–1732

comment incorrect shift.

1726

Does Op1 need to be an icmp, or can it just be any i1?

1727

nit:: LogicOp != Instruction::Xor should go before the match(...) imo.

XChy updated this revision to Diff 539399.Jul 11 2023, 11:27 PM
XChy marked 2 inline comments as done.
XChy edited the summary of this revision. (Show Details)
XChy set the repository for this revision to rG LLVM Github Monorepo.
XChy updated this revision to Diff 539492.Jul 12 2023, 4:19 AM

[clang-format]

XChy added inline comments.Jul 12 2023, 8:45 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1726

When folding A >> BW - 1 -> A < 0, there are many possible folds for (A < 0) bitwise (icmp).
However, if replacing icmp with an arbitrary i1, it seldom folds and just produces this single fold A >> BW - 1 -> A < 0, which is inefficient.

1736

I noticed foldXorOfICmps fold just now. Maybe I can add some related tests here.

goldstein.w.n added inline comments.Jul 12 2023, 3:11 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1726

I see.

1727

It seems in addressing this you just remove the LogicOp != Xor ingeneral...

XChy updated this revision to Diff 539971.Jul 13 2023, 5:28 AM
XChy marked 2 inline comments as done.

I update the diff to ensure that we fold exactly what can be folded by foldXorOfICmps and foldAndOrOfICmps.

However, I'm not sure what unexpected result the code below may bring.

// remove the deferred 2 instructions : 
// icmp slt A, 0
// bitwise (A < 0, icmp) 
// otherwise there will be infinite loops of combining
Worklist.popDeferred()->eraseFromParent();
Worklist.popDeferred()->eraseFromParent();
XChy added a comment.Jul 13 2023, 6:42 AM

I update the diff to ensure that we fold exactly what can be folded by foldXorOfICmps and foldAndOrOfICmps.

However, I'm not sure what unexpected result the code below may bring.

// remove the deferred 2 instructions : 
// icmp slt A, 0
// bitwise (A < 0, icmp) 
// otherwise there will be infinite loops of combining
Worklist.popDeferred()->eraseFromParent();
Worklist.popDeferred()->eraseFromParent();

Actually, I copy and edit the code to avoid infinite loops from:

Instruction *eraseInstFromFunction(Instruction &I) override {
  LLVM_DEBUG(dbgs() << "IC: ERASE " << I << '\n');
  assert(I.use_empty() && "Cannot erase instruction that is used!");
  salvageDebugInfo(I);

  // Make sure that we reprocess all operands now that we reduced their
  // use counts.
  SmallVector<Value *> Ops(I.operands());
  Worklist.remove(&I);
  I.eraseFromParent();
  for (Value *Op : Ops)
    Worklist.handleUseCountDecrement(Op);
  MadeIRChange = true;
  return nullptr; // Don't do anything with FI
}

I omit MadeIRChange = true; to avoid the infinite loops, which are caused by MadeIRChange with the same instructions deferred and erased(Actually, IR do not change).

goldstein.w.n added inline comments.Jul 13 2023, 3:53 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1756

@nikic, is this right?

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2901

Since you're now supporting xor, you need to add some tests for it.

XChy updated this revision to Diff 540411.Jul 14 2023, 7:19 AM
XChy set the repository for this revision to rG LLVM Github Monorepo.

Add xor tests

XChy marked an inline comment as done.Jul 15 2023, 6:32 AM
XChy updated this revision to Diff 540804.Jul 16 2023, 6:32 AM

Limit the fold for multiuse cases

goldstein.w.n added inline comments.Jul 16 2023, 11:06 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1754

I'm mostly okay with the change, but a little unhappy about this. It seems like a worrisome practice.
I guess it works, but it would be nice if there where a better way to accomplish it.

Generally I'd argue the best solution would be to refactor fold<BinOp>OfICmp to take the components rather than the final instructions, but those are both fairly complicated and propagate the instructions to other functions that would then need to be refactored. All in all more work than its worth.

I'm going to defer my opinion to @nikic about whether this is okay.

Other than my concern here, I'm basically ready to approve.

XChy added inline comments.Jul 16 2023, 8:26 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1754

Thanks for your comment! I agree with you. It's not a good way to erase some deferred instructions in a fold function. It's better to let InstCombinerImpl::run() control the logic of instruction erasion.

But there aren't other ways to determine whether some instructions can be folded by fold<BinOp>OfICmp, unless we just call it with the instructions built and deferred or we can extract a canFold<BinOp>OfICmp function to take the components.

The latter costs too much, since it may require refactoring all sub folds.

For that reason, I applied the former but I didn't find any similar situations in InstCombine. I just try to copy some of the logic of instructions erasion to solve the problem.

I'll search for more similar cases to get a better solution if possible. I'll highly appreciate it if you could give me some other advice.

goldstein.w.n added inline comments.Jul 17 2023, 9:14 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1754

Unfortunately I don't really have a better idea than what you have here, but want to here nikic's opinion

nikic added inline comments.Jul 18 2023, 8:17 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1754

We shouldn't do this kind of speculative transform. Either you can always do the transform based on a reasonably close heuristic (like zext and icmp on the same value) or change APIs in a way that allows doing the transform without materializing the icmp (I think this is not worth the trouble).

My 2c here is that it would be okay to convert to the zext(bitwise(icmp, icmp)) form even if it doesn't always fold, as this seems like the better representation at the IR level to me. Even if it doesn't fold in InstCombine, this is the form that is more likely to be usefully optimized by other passes. If we really care, we can undo this in the backend.

XChy updated this revision to Diff 541812.Jul 18 2023, 7:14 PM
XChy marked 4 inline comments as done.
XChy set the repository for this revision to rG LLVM Github Monorepo.

Produce zext(bitwise(icmp.icmp))

XChy added inline comments.Jul 20 2023, 9:15 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1754

I see. I have reverted to the version producing zext(bitwise(icmp.icmp))

nikic added a comment.Jul 23 2023, 1:09 PM

Looks mostly fine, but I think the one-use check isn't quite correct.

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2670

Not _fail?

2914

Creating more instructions here. You should probably always require one-use on the lshr rather than one-use on one of the operands.

XChy updated this revision to Diff 543357.Jul 23 2023, 8:34 PM
XChy marked an inline comment as done.
  • Apply m_OneUse guard to lshr.
  • Fix nits
XChy marked an inline comment as done.Jul 23 2023, 8:35 PM
XChy added inline comments.Jul 23 2023, 8:45 PM
llvm/test/Transforms/InstCombine/and-or-icmps.ll
2914

If replacing icmp eq 100 with other icmp that can be optimized along with icmp slt 0, there will be a better IR, except just one extra instruction. Is there some principles that determine whether a one-use guard is necessary or whether a fold is too agressive/conservative? Maybe I can apply them to similar situation in future contributions.

nikic added inline comments.Jul 24 2023, 3:16 AM
llvm/test/Transforms/InstCombine/and-or-icmps.ll
2914

For multi-use we want to avoid instruction increase in the worst case.

2947

Based on this we should have m_OneUse on the zext(icmp) as well.

XChy updated this revision to Diff 543448.Jul 24 2023, 3:27 AM
XChy marked 2 inline comments as done.
  • Add m_OneUse guard to zext(icmp).
llvm/test/Transforms/InstCombine/and-or-icmps.ll
2914

OK. Thanks for suggestion.

nikic accepted this revision.Jul 24 2023, 3:28 AM

LGTM

This revision is now accepted and ready to land.Jul 24 2023, 3:28 AM
XChy added a comment.Jul 24 2023, 3:32 AM

LGTM

I don't have commit access, can you please land this for me? Please use 'XChy xxs_chy@outlook.com' for the commit.

This revision was landed with ongoing or failed builds.Jul 24 2023, 4:07 AM
This revision was automatically updated to reflect the committed changes.